#ceph IRC Log

IRC Log for 2011-05-06

Timestamps are in GMT/BST.

[0:00] <gregaf> Tv: since you aborted, we lost the logs again, right?
[0:00] <sagewk> just need librados and rados, actually, but we'll probably need to beat configure over the head several times to build it
[0:02] <Tv> gregaf: yeah i think so, sorry
[0:02] <gregaf> okay
[0:02] <Tv> gregaf: i have a fix for the log lossage, hold on
[0:02] <gregaf> we really need at least the MDS log (hopefully just that one)
[0:02] <Tv> gregaf: doing so many things i lose track..
[0:05] <Tv> gregaf: my philosophy: don't wait for me to run a test for you, just run it; feel free to clone one of the existing tests
[0:05] <Tv> gregaf: that gets you e.g. to access live cluster with the problem occurring & pokeable
[0:11] <Tv> (as the test hangs, you won't see the results in the web ui until my bugfix goes in, anyway)
[0:17] <Tv> but to do that, i need to fix another bug...
[0:22] <Tv> didn't work :(
[0:27] <Tv> ok i need to pass an explicit --chmod= to rsync.. first time i use that option..
[0:27] <Tv> live and learn
[0:31] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) Quit (Quit: Leaving)
[0:36] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[0:40] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[0:46] <yehuda_hm> sagewk: building on opensolaris probably much easier than building on windows
[0:46] <sagewk> small comfort!
[0:47] <yehuda_hm> well.. we probably want some configure --client-only option
[0:48] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) has joined #ceph
[0:48] <yehuda_hm> the crypto library is a problem
[0:48] <yehuda_hm> and we might be including some stuff directly out of include/linux
[0:49] <sagewk> yeah..
[0:50] <yehuda_hm> though I did some linux->solaris ports before and it was never too bad
[0:59] <cmccabe> sagewk: should rados export assume that if object sizes and creation times are the same, they are the same?
[0:59] <cmccabe> sagewk: otherwise, we basically will have to download everything, every time.
[0:59] <sagewk> there should be an option that does that (-u)
[1:00] <sagewk> there's also an mtime on the object
[1:00] <cmccabe> well the option for not doing that can be rm -rf?
[1:00] <cmccabe> stat gives me size and mtime
[1:00] <sagewk> oh. and a version that is actually unique and ordered! not currently exposed via librados though
[1:00] <cmccabe> sorry, I guess I was confused about the ctime thing... not sure if librados has that, let me check
[1:01] <sagewk> same thing in this case
[1:01] <cmccabe> pool stat has a lot of fields, but for an object, that's all you get.
[1:04] <sagewk> use those for now.. we can add version soon.
[1:04] <cmccabe> ok
[1:05] <cmccabe> it will probably work pretty well in practice with just those two
[1:05] <sagewk> yeah
[1:05] <sagewk> and we can expose version cleanly later
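
The size-and-mtime comparison discussed above can be sketched roughly as follows. This is only an illustration: `ObjectStat` and `needs_download` are hypothetical names, not part of librados or the rados tool, and the record is assumed to hold the two fields that stat returns for an object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectStat:
    """Hypothetical record: the two fields stat gives for an object."""
    size: int   # object size in bytes
    mtime: int  # modification time, seconds since the epoch


def needs_download(remote: ObjectStat, local: Optional[ObjectStat]) -> bool:
    """Return True when the local copy is missing or looks different.

    Two objects are assumed identical when both size and mtime match.
    """
    if local is None:
        return True
    return (remote.size, remote.mtime) != (local.size, local.mtime)
```

A same-size rewrite that lands on the same mtime would be missed by this check, which is presumably why a unique, ordered version would be the better key once it is exposed.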
[1:07] <sagewk> cmccabe: any idea why --log-file wouldn't work with rados tool?
[1:07] <cmccabe> is rados tool using dout?
[1:07] <sagewk> for the librados stuff...
[1:08] <cmccabe> looks like there are a few uses
[1:08] <sagewk> i'm adding --debug-objecter 20 --log-file foo
[1:08] <sagewk> don't care about the stuff in rados.cc itself
[1:14] <cmccabe> sagewk: I'm looking at it
[1:14] <cmccabe> sagewk: rebuild taking a long time here for some reason
[1:14] <sagewk> thanks
[1:16] <cmccabe> sagewk: looks like it's working, but there is a bug in the argv parsing that turns dashes to underscores, even after equals signs
[1:17] <sagewk> --log_file then?
[1:17] <cmccabe> sagewk: either --log-file or --log_file will work
[1:18] <cmccabe> sagewk: I was just commenting that --log-file=/tmp/log-foo would actually log to /tmp/log_foo
[1:18] <sagewk> oh
[1:18] <sagewk> you tested on master or stable?
[1:18] <cmccabe> master
[1:18] <sagewk> not doing it for me... :/
[1:19] <sagewk> './rados lspools --debug-ms 1 --log-file c' spams stdout
[1:19] <cmccabe> try --debug-ms=1, you should get a lot of output
[1:19] <sagewk> yeah, just not to the 'c' file
[1:19] <gregaf> did we break the ' ' and '=' equivalence at some point?
[1:19] <sagewk> 2> c works, but --log-file should too, right?
[1:20] <sagewk> = or space, that part is fine (--debug-ms 1 generates output)
[1:20] <sagewk> it's the --log-file part that's not working
[1:20] <sagewk> is this a library thing?
[1:20] <cmccabe> let me just finish this thing and I'll retry
[1:21] <sagewk> k no rush, 2> works for now
[1:26] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Read error: Operation timed out)
[1:27] <cmccabe> sagewk: --log-file works fine for me
[1:27] <sagewk> weird. let me try on another machine
[1:27] <cmccabe> sagewk: is it possible that you just want no logging to stderr?
[1:28] <cmccabe> you can get that with --log-to-stderr=0
[1:28] <sagewk> oh, is it doing both?
[1:28] <cmccabe> the two options have nothing to do with one another
[1:28] <sagewk> oh! i see
[1:28] <cmccabe> you can log to as many sinks or as few as you want
[1:28] <sagewk> got it, that makes sense.
[1:28] <cmccabe> k
[1:28] <sagewk> :)
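
The argv bug mentioned earlier in this exchange (dashes turned into underscores even after the equals sign) can be illustrated with a small sketch. `normalize_option` is a hypothetical helper, not the actual Ceph argument parser; it only shows the intended behaviour of normalizing the option name while leaving the value alone.

```python
def normalize_option(arg: str) -> str:
    """Turn dashes into underscores in the option name only.

    Anything after the first '=' is the value and is left untouched, so
    --log-file=/tmp/log-foo keeps its path intact.
    """
    if not arg.startswith("--"):
        # Not an option (for example a bare value such as "1").
        return arg
    name, sep, value = arg.partition("=")
    return "--" + name[2:].replace("-", "_") + sep + value
```

With this shape, `--log-file` and `--log_file` end up as the same option name, which matches the statement that either spelling works.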
[1:32] * DanielFriesen (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[1:40] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[1:48] <Tv> anyone know how to get umask of a running process?
[1:51] <cmccabe> tv: there isn't even a way to get the umask without setting it in glibc
[1:51] <cmccabe> tv: the really sad thing is, if there were a proc entry, it would often be preferable to calling umask, since it wouldn't blow away the old value
[1:51] <Tv> yeah but you can bounce that back and forth
[1:51] <Tv> threads are evil anyway ;)
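
The "bounce it back and forth" trick for reading the umask of the current process looks roughly like this in Python. `current_umask` is a name chosen for this sketch; `os.umask` is the standard library call, which sets a new mask and returns the previous one.

```python
import os


def current_umask() -> int:
    """Read the process umask by setting it and restoring it at once.

    Between the two calls the mask is briefly 0, so another thread that
    creates a file in that window gets the wrong permissions. That race
    is the thread problem alluded to above.
    """
    old = os.umask(0)  # returns the previous mask
    os.umask(old)      # put the original mask back
    return old
```

This only works from inside the process itself; it does not answer the original question of inspecting some other running process.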
[1:53] <cmccabe> threads are bad, but sometimes they're hard to avoid
[1:53] <cmccabe> like when you're modifying software that uses threads :)
[1:55] <cmccabe> the worst part is that once your threaded application grows to giant size, fork() becomes inefficient due to TLB copying
[1:55] <cmccabe> which of course is another argument for more and more threads :)
[2:00] <cmccabe> I think that's one problem the hadoop guys have had, is long fork times
[2:00] <cmccabe> due to the large size of the JVM
[2:00] <cmccabe> I think there might be some Linux-specific workarounds that they're not aware of though
[2:01] <Tv> there's almost no aspect of hadoop that'd be well designed to perform on linux
[2:01] <Tv> all the mapreduce power they have is the pigs & sufficient thrust method
[2:01] <cmccabe> tv: heh
[2:02] <cmccabe> well, at the end of the day, it comes down to economics
[2:02] <cmccabe> writing stuff in Java does often make sense
[2:02] <Tv> especially when you have a big corp full of java programmers
[2:03] <cmccabe> it probably would have made more sense to write the framework in C, and the application code in Java, but oh well
[2:03] <Tv> actually it'd be pretty nasty in C
[2:03] <Tv> at least it doesn't do bad things when it crashes, now
[2:03] <cmccabe> the code for distributing out jobs to nodes and such?
[2:04] * Juul (~Juul@slim.dhcp.lbl.gov) has joined #ceph
[2:11] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:12] * Dantman (~dantman@74-115-199-40.eng.wind.ca) has joined #ceph
[2:22] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[2:24] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[2:37] * Dantman (~dantman@74-115-199-40.eng.wind.ca) Quit (Ping timeout: 482 seconds)
[2:43] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[2:43] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[2:52] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[3:03] * Juul (~Juul@slim.dhcp.lbl.gov) Quit (Quit: Leaving)
[3:06] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[3:23] * maswan (maswan@kennedy.acc.umu.se) Quit (Server closed connection)
[3:23] * maswan (~maswan@kennedy.acc.umu.se) has joined #ceph
[3:30] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[3:34] * zwu (~root@202.108.130.138) has joined #ceph
[3:46] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[4:12] * darkfaded (~floh@188.40.175.2) has joined #ceph
[4:13] * darkfader (~floh@188.40.175.2) Quit (Write error: connection closed)
[4:13] * monrad-51468 (~mmk@domitian.tdx.dk) Quit (Write error: connection closed)
[4:13] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[4:37] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[5:00] * zwu (~root@202.108.130.138) has joined #ceph
[5:02] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) Quit (Server closed connection)
[5:02] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[5:13] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[5:17] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[5:29] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[5:30] * zwu (~root@202.108.130.138) has joined #ceph
[5:34] * iggy (~iggy@theiggy.com) Quit (Server closed connection)
[5:34] * iggy (~iggy@theiggy.com) has joined #ceph
[5:53] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[5:55] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[5:56] * zwu (~root@202.108.130.138) has joined #ceph
[6:22] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[6:27] * zwu (~root@202.108.130.138) has joined #ceph
[6:43] * atg (~atg@please.dont.hacktheinter.net) Quit (Server closed connection)
[6:43] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[7:29] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[7:33] * zwu (~root@202.108.130.138) has joined #ceph
[7:33] * tjikkun_ (~tjikkun@195-240-187-63.ip.telfort.nl) Quit (Server closed connection)
[7:34] * tjikkun_ (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[7:56] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[7:57] * hijacker (~hijacker@213.91.163.5) Quit (Remote host closed the connection)
[7:59] * DanielFriesen (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[8:02] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (Read error: Operation timed out)
[8:15] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:26] * lidongyang (~lidongyan@222.126.194.154) Quit (Remote host closed the connection)
[8:29] * allsystemsarego (~allsystem@188.25.132.227) has joined #ceph
[8:31] * lidongyang (~lidongyan@222.126.194.154) has joined #ceph
[9:07] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[9:12] * zwu (~root@202.108.130.138) has joined #ceph
[9:14] * Tsipa_ (~Yulya@ip-95-220-153-255.bb.netbynet.ru) has joined #ceph
[9:16] * allsystemsarego (~allsystem@188.25.132.227) Quit (Quit: Leaving)
[9:18] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:21] * Tsipa (~Yulya@ip-95-220-180-110.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[9:44] * joshd (~jdurgin@75.28.69.238) has joined #ceph
[9:56] * joshd (~jdurgin@75.28.69.238) Quit (Quit: Leaving.)
[10:06] * Tsipa_ is now known as Meguka_meduca
[10:09] * Meguka_meduca is now known as Meduka_Meguca
[10:09] * allsystemsarego (~allsystem@188.25.132.227) has joined #ceph
[10:10] * Yoric (~David@87-231-38-145.rev.numericable.fr) has joined #ceph
[10:43] * gregorg (~Greg@78.155.152.6) Quit (Quit: Quitte)
[10:44] * gregorg (~Greg@78.155.152.6) has joined #ceph
[11:03] * chraible (~chraible@blackhole.science-computing.de) Quit (Remote host closed the connection)
[11:47] * zwu (~root@202.108.130.138) Quit (Ping timeout: 480 seconds)
[11:51] * zwu (~root@202.108.130.138) has joined #ceph
[12:06] * allsystemsarego (~allsystem@188.25.132.227) Quit (resistance.oftc.net charm.oftc.net)
[12:06] * DanielFriesen (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (resistance.oftc.net charm.oftc.net)
[12:08] * iggy (~iggy@theiggy.com) Quit (Ping timeout: 480 seconds)
[12:08] * nolan (~nolan@phong.sigbus.net) Quit (Ping timeout: 480 seconds)
[12:16] * allsystemsarego (~allsystem@188.25.132.227) has joined #ceph
[12:17] * DanielFriesen (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[12:17] <trollface> ugh
[12:17] <trollface> sagewk: so cls_rbd is active, but still EIO of that form above.
[12:41] * Yoric (~David@87-231-38-145.rev.numericable.fr) Quit (Quit: Yoric)
[13:17] * Yoric (~David@87-231-38-145.rev.numericable.fr) has joined #ceph
[13:28] * Yoric (~David@87-231-38-145.rev.numericable.fr) Quit (Quit: Yoric)
[13:43] * Yoric (~David@87-231-38-145.rev.numericable.fr) has joined #ceph
[14:00] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:35] * allsystemsarego_ (~allsystem@188.27.164.67) has joined #ceph
[14:40] * allsystemsarego (~allsystem@188.25.132.227) Quit (Ping timeout: 480 seconds)
[14:54] <trollface> thereifixed it
[14:54] <trollface> needed to mkdir /var/lib/ceph/tmp
[14:59] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:14] <trollface> now authx doesn't like me
[15:14] <trollface> le fu~
[16:03] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[16:07] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[16:14] * Meduka_Meguca (~Yulya@ip-95-220-153-255.bb.netbynet.ru) Quit (Quit: leaving)
[16:14] * Yulya (~Yulya@ip-95-220-153-255.bb.netbynet.ru) has joined #ceph
[16:14] * Yulya is now known as Meduka_Meguca
[16:15] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[16:33] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[16:35] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[16:36] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[16:48] * greglap (~Adium@198.228.211.250) has joined #ceph
[16:56] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[17:04] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[17:09] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[17:29] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:38] * greglap (~Adium@198.228.211.250) Quit (Read error: Connection reset by peer)
[17:53] <trollface> quick question: how to rate limit recovery?
[17:56] <gregaf> trollface: I think you want to set osd_recovery_threads, osd_recovery_max_active, and osd_recovery_max_chunk
[17:56] <gregaf> they default to 1, 5, 1<<20
[17:57] <gregaf> I'd focus on osd_recovery_max_active, that's the number of PGs each OSD will allow to be in recovery at a time
[17:57] <trollface> thx
[17:57] <trollface> lets try setting it to 1 everywhere
[17:57] <trollface> :)
[17:57] <trollface> it is injectable?
[17:58] <trollface> hmm doesn't look that way
[17:59] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:03] <gregaf> trollface: it won't reduce the running number but I think it will prevent extras from starting up
[18:04] * Yoric (~David@87-231-38-145.rev.numericable.fr) Quit (Quit: Yoric)
[18:11] <trollface> gregaf: I tried increasing the number of placement groups for a pool and it triggered a massive failure of osds. all osds segfaulted
[18:12] <gregaf> trollface: yeah, pg splitting is busted right now :(
[18:12] <trollface> it was very funny
[18:13] <gregaf> hopefully we'll get it in the next release; it's not been well-tested in a while but we're getting to the point where it needs to be
[18:13] <trollface> is there a way to run posix layer over different pair of pools (not data&metadata)
[18:13] <trollface> ?
[18:13] <trollface> I guess there isn't and I'm probably the only person that asked about it
[18:14] <gregaf> yeah, you can set the pool using cephfs on the root
[18:14] <trollface> but I cannot do 2 pools at once?
[18:14] <gregaf> I think maybe metadata is hard-coded in right now? sagewk?
[18:14] <gregaf> trollface: you can set the pool for data storage on individual inodes if you like, so you can use as many for data storage as you can create
[18:14] <sagewk> mostly. in theory it's possible, the tools just don't let you do it easily
[18:15] <trollface> okay
[18:22] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:36] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:45] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[18:54] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[19:00] * cmccabe (~cmccabe@208.80.64.174) has joined #ceph
[19:04] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:09] <Tv> joshd: ok so the email from Fyodor Ustinov looks like there's kernel memory corruption :(
[19:09] <Tv> corrupt file contents, "Bad page state"
[19:10] <joshd> tv: that sounds pretty bad
[19:12] <joshd> tv: what's the best way to debug this? bisect?
[19:13] <Tv> joshd: you could do a bisect with a uml build from ceph-client.git
[19:13] <Tv> joshd: if you can reproduce it nicely
[19:13] <Tv> joshd: step 1: reproduce with current head
[19:15] <joshd> tv: sounds like a plan
[19:39] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[19:46] <Tv> fyi: autotest violated its own db constraints, troubleshooting..
[19:46] <Tv> (3rd time for that)
[19:47] <cmccabe> tv: what ORM does autotest use?
[19:48] <Tv> django
[19:48] <Tv> but this is not on that level
[19:48] <cmccabe> k
[20:00] <Tv> alright ceph logs should show up in the autotest web ui again, even for aborted jobs
[20:12] * Meths_ is now known as Meths
[21:02] * Meduka_M1guca (~Yulya@ip-95-220-133-98.bb.netbynet.ru) has joined #ceph
[21:08] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[21:09] * Meduka_Meguca (~Yulya@ip-95-220-153-255.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[21:12] <bchrisman> I'm guessing libceph doesn't have a shared object cache between different instances? If I have two processes mounting the same filesystem via libceph and reading from the same file, will I have two cached objects in memory?
[21:13] <bchrisman> My concern here is that for samba3, each connection has its own process... so the vfs layer we're building would potentially have a copy of a particular object by each connection accessing it.
[21:14] <gregaf> bchrisman: yeah, the cache isn't shared between processes
[21:14] <bchrisman> will continue on with this, but we'll need to find a way to fix that for samba eventually
[21:15] <bchrisman> either a) object cache in shmem or mmap file b) having a libceph daemon which all connections query... or something along those lines
[21:15] <gregaf> (b) sounds a lot easier to me!
[21:15] <bchrisman> yeah.. I was thinking of just having the vfs layer talk fuse :)
[21:19] <bchrisman> gregaf: do you know whether fuse locks weren't implemented due to time, or due to some trickiness that was going to be extremely difficult to resolve?
[21:19] <gregaf> bchrisman: just time
[21:19] <gregaf> there may be trickiness but I never even looked at them enough to find out
[21:19] <bchrisman> ahh ok.. cool
[21:20] <gregaf> the file locking was due to requests from people who needed it but nobody asked about cfuse after they went into the kclient
[21:20] <cmccabe> gregaf: how does file locking work in the kclient
[21:22] <gregaf> it sends requests off to the MDS, then if the MDS replies positively it integrates the locks into its local lock list
[21:23] <gregaf> on reconnect it reads the locks off its local lock list and sends them back
[21:23] <cmccabe> so multiple MDSes can have different parts of the same directory right
[21:23] <gregaf> most of the magic's in the MDS to handle the different kinds of locks and overlapping/merging/etc
[21:23] <gregaf> uh, yeah?
[21:24] <cmccabe> it seems like the locks would go into the mdsmap?
[21:24] <cmccabe> how do you make sure all the MDSes know about them
[21:24] <cmccabe> I guess since they're advisory-- rather than mandatory-- locks, you only have to do that kind of lookup when someone asks
[21:25] <gregaf> yeah
[21:25] <gregaf> I don't remember if they're journalled or not but I think they are
[21:25] <gregaf> and each one's associated with the inode it belongs to
[21:26] <cmccabe> so since inodes are partitioned among MDSes, locks are too
[21:28] <gregaf> yep
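
The client side of the locking flow described above (ask the MDS, record the lock locally on a positive reply, resend the local list on reconnect) can be modelled with a toy sketch. Everything here is hypothetical: `LockingClient`, the `(inode, start, length)` tuple and the `send_to_mds` callback are illustration only and do not reflect the kclient's real data structures or wire protocol.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical lock shape: (inode number, start offset, length).
Lock = Tuple[int, int, int]


@dataclass
class LockingClient:
    # Callback standing in for a request to the MDS; True means granted.
    send_to_mds: Callable[[Lock], bool]
    # Local list of locks the MDS has granted to this client.
    held: List[Lock] = field(default_factory=list)

    def acquire(self, lock: Lock) -> bool:
        """Ask the MDS first; record the lock locally only if granted."""
        if self.send_to_mds(lock):
            self.held.append(lock)
            return True
        return False

    def replay_on_reconnect(self) -> int:
        """Resend every locally recorded lock; return how many were accepted."""
        return sum(1 for lock in self.held if self.send_to_mds(lock))
```

Because each lock is keyed by an inode, a model like this needs no global lock table: the request simply goes to whichever MDS is responsible for that inode.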
[21:59] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[22:03] <sagewk> btw we have ceph.com now!
[22:04] <gregaf> wow, it looks just like ceph.newdream.net somehow...
[22:04] <gregaf> ;)
[22:04] <sagewk> magic!
[22:06] <gregaf> Tv: is there a good way to adjust debug levels for the client?
[22:06] <Tv> gregaf: if you're developing a test, it's doable; from just the control file, not yet
[22:06] <gregaf> it looks like the daemons get all their options set via the conf file but I can't even figure out where cfuse gets mounted so I dunno if it's looking at the conf or whatever
[22:06] <Tv> gregaf: it must look at the conf, otherwise it'd fail to auth
[22:07] <Tv> gregaf: in ceph-autotest.git, create your own branch, either edit teuthology/ceph.conf directly, or add a hook that edits it programmatically, commit, push your branch, point to it in the control file -- just be careful not to merge to master accidentally
[22:08] <Tv> gregaf: i want to make that a lot more developer-friendly, but it's not gonna happen overnight
[22:08] <gregaf> heh, okay
[22:08] <gregaf> I was trying to figure out a way to do it in the test since editing the conf seems like a really blunt instrument, but I can do that
[22:09] <Tv> in the test, you'd do it like...
[22:10] <Tv> def init_022_conf_kludge(self):
[22:10] <Tv> self.ceph_conf.setdefault('cfuse') # or whatever the section is
[22:10] <Tv> self.ceph_conf['cfuse']['debug foo bar'] = 9000
[22:10] <Tv> err
[22:11] <Tv> def init_022_conf_kludge(self):
[22:11] <Tv> self.ceph_conf.setdefault('cfuse', {})
[22:11] <Tv> self.ceph_conf['cfuse']['debug foo bar'] = 9000
[22:11] <gregaf> right
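
The reason for the "err" and the second version of the hook can be shown with plain dictionaries. This sketch assumes `ceph_conf` behaves like an ordinary nested dict, which the snippet above suggests but does not confirm; the key names are copied from the chat.

```python
# First attempt: setdefault() without a default stores None under the key,
# so the nested assignment that follows fails.
ceph_conf = {}
ceph_conf.setdefault('cfuse')
try:
    ceph_conf['cfuse']['debug foo bar'] = 9000
    first_attempt_error = None
except TypeError as exc:
    first_attempt_error = exc

# Corrected attempt: pass an empty dict as the default so the section
# exists and can take the option.
fixed_conf = {}
fixed_conf.setdefault('cfuse', {})
fixed_conf['cfuse']['debug foo bar'] = 9000
```

Using `setdefault` rather than a plain assignment also keeps any options already present in the section.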
[22:18] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[22:19] <Tv> ooh none of the tests have needed to hook into init or shutdown so far, that simplifies this a lot
[22:20] <sagewk> wait: off_t should be 64 bits on a 64bit arch right?
[22:20] <sagewk> since we have FILE_OFFSET64 defined?
[22:20] <Tv> sagewk: if you compiled with that, yes
[22:20] <sagewk> and size_t too
[22:23] <Tv> and you mean -D_FILE_OFFSET_BITS=64
[22:24] <Tv> i wonder what's the exact for size_t
[22:24] <Tv> maybe it's 64 always on 64-bit arch
[22:25] <Tv> yup seems to be always
[22:28] <sagewk> wido: around?
[22:28] <sagewk> unsigned long, yeah.
[22:29] <Tv> oh wow SIZE_MAX is really low
[22:30] <Tv> interesting, i wonder if the actual C spec has any hope in real-world programming..
[22:46] <sagewk> tv: anybody good with man pages? (is it nroff markup?)
[22:46] <Tv> yeah it's nroff
[22:46] <Tv> you can write them with saner formats, these days
[22:49] <Tv> here's a sample manpage written in restructuredtext: http://docutils.sourceforge.net/sandbox/manpage-writer/rst2man.txt
[22:50] <Tv> i've wanted to set up sphinx (= a fancy variant of restructuredtext) doc tree for ceph for a while
[22:50] <Tv> you can say that subsections of the whole doc should be written out as individual man pages, etc
[23:02] <cmccabe> tv: a 16-bit size_t doesn't sound too bad. I'm not sure if structs bigger than 32kb are allowed anyway
[23:03] <cmccabe> tv: even if they were, you'd have to be insane
[23:04] <cmccabe> (looks like the C standard says size_t is at least uint16_t, and on linux it's uint32_t)
[23:05] <cmccabe> the only place you might get burned is on things like strlen, I guess
[23:07] <cmccabe> actually, I take that back... the argument to malloc is size_t
[23:07] <cmccabe> yeah, a 16-bit size_t is ludicrous
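
The width of size_t on a given machine can be checked without writing any C, for example through Python's ctypes. This is just a quick probe of the local platform, not a statement about every architecture; the variable names are chosen for this sketch.

```python
import ctypes

# Width of size_t and of a data pointer on this platform, in bits.
size_t_bits = ctypes.sizeof(ctypes.c_size_t) * 8
pointer_bits = ctypes.sizeof(ctypes.c_void_p) * 8

# ctypes integer types do no overflow checking, so -1 wraps to the
# largest representable value, which equals SIZE_MAX here.
size_max = ctypes.c_size_t(-1).value

print(f"size_t: {size_t_bits} bits, pointer: {pointer_bits} bits")
print(f"SIZE_MAX: {size_max}")
```

The C standard only guarantees that SIZE_MAX is at least 65535, which is the "really low" figure mentioned above; mainstream platforms use a size_t as wide as a pointer.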
[23:17] * allsystemsarego_ (~allsystem@188.27.164.67) Quit (Quit: Leaving)
[23:35] * neurodrone (~neurodron@64.206.151.218) has joined #ceph
[23:38] <bchrisman> Is this an authentication problem? http://pastebin.com/1VACsBkT
[23:41] <bchrisman> ... and with debug_ms set to 20: http://pastebin.com/vRLtraX8
[23:42] <bchrisman> but doesn't look like that helps.
[23:42] <bchrisman> 2011-05-06 21:40:11.448803 7f9a74e4a720 monclient(hunting): authenticate timed out after 30
[23:46] <cmccabe> bchrisman: not sure that that is an auth problem
[23:47] <cmccabe> bchrisman: looks more like a generic messenger timeout
[23:47] <gregaf> it looks more like the mon messenger is busted somehow...
[23:51] <bchrisman> any better check on those than:
[23:51] <bchrisman> 2011-05-06 21:51:32.659282 mon e1: 3 mons at {0=192.168.98.109:6789/0,1=192.168.98.110:6789/0,2=192.168.98.111:6789/0}
[23:52] <Tv> bchrisman: auth problems should never lead to timeout
[23:52] <gregaf> turn on the mon messenger debug for whichever node the client is trying to connect to, and get logs from both sides to compare
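
As a first sanity check before comparing logs, one could confirm that each monitor address from the status line is at least reachable over TCP. The sketch below is an assumption-laden helper, not a Ceph tool: `parse_mon_addrs` and `tcp_reachable` are hypothetical names, and the regular expression only handles IPv4 entries in the `id=addr:port/nonce` form shown in the line above.

```python
import re
import socket
from typing import List, Tuple


def parse_mon_addrs(line: str) -> List[Tuple[str, int]]:
    """Extract (host, port) pairs from entries like 0=192.168.98.109:6789/0."""
    pairs = re.findall(r"=(\d+\.\d+\.\d+\.\d+):(\d+)/", line)
    return [(host, int(port)) for host, port in pairs]


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful connect only shows that something is listening on the port; it does not replace the messenger debug logs from both sides suggested above.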
[23:53] * neurodrone_ (~neurodron@64.206.151.218) has joined #ceph
[23:57] * neurodrone (~neurodron@64.206.151.218) Quit (Ping timeout: 480 seconds)
[23:57] * neurodrone_ is now known as neurodrone

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.