#ceph IRC Log


IRC Log for 2011-03-29

Timestamps are in GMT/BST.

[0:00] <Tv> neurodrone: are you perhaps compiling it on a fairly slow / low-memory machine?
[0:00] <neurodrone> Tv: I am doing it on a 512Megs VirtualBox VM.
[0:01] <Tv> neurodrone: that might be too little for c++ compilation :(
[0:01] <neurodrone> oh wow, is it?
[0:01] <Tv> i bumped our build vm to 6GB when i got frustrated with it.. :-)
[0:01] <neurodrone> Oh I see.
[0:02] <neurodrone> Cause there are no errors. The machine doesn't respond at all though.
[0:02] <neurodrone> 6GB is a lot. My MBP has 4gigs at max. :)
[0:02] <Tv> yeah probably in swap hell
[0:02] <neurodrone> hahah..
[0:02] <Tv> well the 6GB vm currently has ~2.5GB used
[0:02] <Tv> the rest is disk cache = makes it faster
[0:03] <gregaf> the MDCache is probably just the first big compile it has to do :/
[0:03] <neurodrone> ah
[0:03] <neurodrone> first big compile? Nooooo.
[0:03] <neurodrone> I thought it was that Message.cc thingy. Took me 20 mins to crawl through.
[0:03] <gregaf> ooof
[0:03] <neurodrone> s/me/my\ machine//
[0:03] <gregaf> Message.cc is pretty small...
[0:03] <neurodrone> ah
[0:04] <neurodrone> that's grim. :(
[0:04] <neurodrone> let's see. Maybe I will bump up my RAM.
[0:04] <neurodrone> but good to know it's not an exceptional (bad) behavior. :)
[0:04] <neurodrone> thanks Tv, gregaf :)
[0:08] <cmccabe> yeah, C++ is special.
[0:08] <cmccabe> patches to remove code from header files and reduce unnecessary #includes are welcome!
[0:09] <Tv> C++ is the special needs child of C ;)
[0:09] <gregaf> sjust is really excited: the C++ 2011 spec got released last week
[0:09] <cmccabe> auto is a good idea
[0:10] <cmccabe> static_assert is good, but you can already do it in a library
[0:11] <cmccabe> TLS is good, but we already have __thread
[0:11] <gregaf> auto
[0:12] <gregaf> and completions
[0:12] <gregaf> those are the most exciting for us
[0:12] <gregaf> maybe eventually we can kill the ridiculous Contexts
[0:12] <cmccabe> lambdas you mean?
[0:12] <sjust1> yes
[0:12] <cmccabe> I think when you open your debugger, you'll want to see a real class as your context
[0:13] <gregaf> bchrisman: hmm, what debug settings did you use on the OSD?
[0:13] <cmccabe> most attempts to kludge in lambdas to a language like that are doomed to suck
[0:13] <cmccabe> c.f. Java
[0:13] <gregaf> or is the log maybe not from the same time period as the client?
[0:13] <sjust1> cmccabe: that may be a good point, but from what I have seen of the spec, it looks pretty good
[0:13] <bchrisman> gregaf: should I see that transaction id in the osd log?
[0:14] <gregaf> bchrisman: all I see in the log is osd heartbeats and similar maintenance bits
[0:14] <bchrisman> gregaf: I thought I tracked it down to the proper osd...
[0:14] <gregaf> not any actual work
[0:14] <bchrisman> gregaf: log files were awful small.
[0:14] <bchrisman> gregaf: lemme check again
[0:14] <gregaf> did they maybe get rotated out from under you?
[0:14] <bchrisman> gregaf: any canaries that'll show me that I'm getting the logging I'm looking for?
[0:15] <gregaf> the time stamps in the osd log start ~20 minutes after the client log you sent, it looks like?
[0:15] <bchrisman> I'm not doing rotation right now.. at least not intentionally. I'm setting these through injectargs
[0:15] <bchrisman> and then hupping
[0:15] <neurodrone> yay! it got through! 2gigs did the magic. :)
[0:15] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[0:15] <neurodrone> phew!
[0:15] <gregaf> you should see osd_op messages being passed around
[0:15] <bchrisman> hmm… I might've goofed on the pull.. will check again.
[0:15] <bchrisman> okie
[0:15] <gregaf> and a lot of stuff about "client41xx: yyyy"
[0:15] <bchrisman> cool.. that'll help me identify what I'm looking at.
[0:15] <gregaf> I think there's log rotation by default, although I don't remember how frequent it is
[0:17] <cmccabe> gregaf: I don't think there's any log rotation by default
[0:17] <Tv> i'd say rotation yes, pruning no
[0:17] <cmccabe> except that if you have log_per_instance set, it will rotate each time you restart the daemon
[0:18] <gregaf> Tv: looks like debian gitbuilder ran outta space
[0:18] <cmccabe> log_per_instance is false by default
[0:19] <Tv> gregaf: yeah sadly it's a fork of my gitbuilder stuff, and it didn't get the clean-up-old-stuff fix yet
[0:19] <cmccabe> we open logfiles O_APPEND and never truncate, no matter how big it gets
[0:20] <cmccabe> I asked sage about this once, but he believes rotation is best done by a separate daemon, like logrotate
[0:20] <Tv> gregaf: i'll clean up manually and try to poke Sage at some point to figure out what all he did
[0:20] <gregaf> okay, I just thought it got set up by default at some point
[0:20] <cmccabe> nah
[0:20] <cmccabe> well, if you use vstart, it sets log_per_instance
[0:20] <Tv> gregaf: umm it still has 2GB free?
[0:20] <gregaf> Tv: hmm
[0:21] <gregaf> I just saw that latest master failed
[0:21] <cmccabe> but you didn't use vstart... did you? :)
[0:21] <gregaf> "/usr/bin/install: writing `/tmp/buildd/ceph-0.25-454-g98dd2d1/debian/tmp/usr/bin/radosgw_admin': No space left on device"
[0:21] <Tv> gregaf: perhaps the source tree consumes >2GB while compiling
[0:21] <gregaf> and oh man, don't paste a line like that into an irc client without preceding characters other than /
[0:23] <Tv> oh found it, 9.4GB of logs
[0:27] * pombreda (~Administr@ has joined #ceph
[0:29] <pombreda> Howdy: gregaf, is this appropriate to chat about issues I have with the ceph playground @ dreamhost here?
[0:29] <gregaf> pombreda: nowhere else to do it that I know of :)
[0:31] <pombreda> gregaf: some things are very wrong there : If I touch a new file, it gets created with some content, and as root. I cannot edit it further. just read it. the content that is put in a file I touch seems random
[0:31] <pombreda> gregaf: let me paste you privately a typical session output
[0:31] <Tv> gregaf: log compression automated, space cleaned up, failed builds flagged for recompilation
[0:32] <gregaf> pombreda: hmm, that's not good
[0:32] <pombreda> gregaf: sounds like there is something wacko going on :P
[0:32] <gregaf> yes
[0:32] <pombreda> which is what a playground is for :P
[0:33] <pombreda> gregaf it has likely been going since Friday, but I could not really pinpoint what was wrong
[0:48] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[0:55] <sagewk> tv: where does autobuild-ceph.git live now?
[0:56] <Tv> sagewk: i think it didn't end up hosted properly yet, not sure why
[0:56] <Tv> i don't think it contains any secrets, re-checking
[0:56] <sagewk> then i can push the deb builder stuff
[0:57] <Tv> ok setting it up on ceph.newdream just like the others
[0:57] <sagewk> k
[1:06] <Tv> sagewk: /git/autobuild-ceph.git at your service
[1:08] <sagewk> tv: thanks
[1:08] <Tv> sagewk: kclient branch ceph-crypto-key now has keytype "ceph"
[1:08] <Tv> sagewk: next up, mount.ceph
[1:09] <Tv> sagewk: the two last commits on that branch might be better squashed together, i don't have strong feelings but felt that this way it's easier for you to see what's new
[1:38] <gregaf> pombreda: looks like this issue is fallout from another (very short-lived) bug and how we handled the cleanup
[1:38] <gregaf> we have an idea of how to fix it but want to look at what happened a little more closely
[1:38] <gregaf> will let you know when it's resolved
[1:42] * imcsk8 (~ichavero@ Quit (Quit: Leaving)
[1:44] <Tv> hmm.. my small ceph cluster uses 10MB of disk per 2 minutes, only client is kclient mount-umount loop, no files touched
[1:45] <Tv> perhaps atime of root dir or something that silly
[1:47] <gregaf> Tv: you mean the logging it's generating?
[1:47] <Tv> 2011-03-28 16:46:47.680889 pg v196: 24 pgs: 24 active+clean+degraded; 683 KB data, 33328 MB used, 100 GB / 140 GB avail; 54/108 degraded (50.000%)
[1:47] <Tv> the "used" number there
[1:48] <Tv> i understand that's journaling; i'm curious what operations need permanent storage when it's just connect/unconnect
[1:48] <sagewk> the mds is journaling session opens/closes.
[1:48] <Tv> ahh so it actually keeps track of those, reliably?
[1:48] <gregaf> Tv: and depending on what drives stuff is stored on your debug logs might be part of that...
[1:48] <gregaf> Tv: has to for correctness
[1:49] <gregaf> to keep track of capabilities and ensure reliable data recovery on daemon failure
[1:49] <gregaf> pombreda: issue should be fixed now!
[1:49] <gregaf> there was a short-lived issue that lost some data on the cluster
[1:49] <gregaf> while the data was still cached in the client
[1:50] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[1:50] <gregaf> which was resulting in the odd results from touching
[1:50] <gregaf> but your files are now your own, and the empty ones are empty as they should be :)
[2:07] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:12] <pombreda> gregaf: let me check :P
[2:13] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[2:13] <pombreda> gregaf: was this the metadata server getting confused somehow? just curious
[2:16] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[2:24] <bchrisman> gregaf: okay… think I've got it there now.
[2:24] <bchrisman> didn't quite know what all I was looking for before and just grabbed the last 100k lines on my way out the door.. relevant logs were some 3-400k lines from the bottom...
[2:25] <bchrisman> so I took 20 minutes before the cfuse crash up until a short bit after the last osd_op
[2:28] * samsung (~samsung@ has joined #ceph
[2:32] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[2:37] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:48] <pombreda> gregaf: thx mucho: all seems back to a-ok normal for now :P
[2:49] <pombreda> gregaf: i'll tell you if there has been more corruption on the 11TB dataset in a day or two, once the veruf of integrity is complete
[2:49] <pombreda> *verification
[2:55] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:19] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[3:30] * cmccabe (~cmccabe@ has left #ceph
[4:59] * pombreda (~Administr@ Quit (Remote host closed the connection)
[5:15] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[5:16] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[5:30] * joshd (~jdurgin@adsl-75-28-69-238.dsl.irvnca.sbcglobal.net) has joined #ceph
[5:32] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[5:56] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[6:13] * joshd (~jdurgin@adsl-75-28-69-238.dsl.irvnca.sbcglobal.net) Quit (Quit: Leaving.)
[7:30] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: neurodrone)
[9:45] * allsystemsarego (~allsystem@ has joined #ceph
[11:32] * verwilst (~verwilst@dD576FAAE.access.telenet.be) has joined #ceph
[11:35] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[11:37] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit ()
[11:43] * morse (~morse@ has joined #ceph
[12:29] * samsung (~samsung@ Quit (Ping timeout: 480 seconds)
[12:43] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[12:50] * samsung (~samsung@ has joined #ceph
[13:52] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:10] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[14:48] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[14:49] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:53] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit ()
[15:05] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[15:30] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[16:22] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[16:23] * morse (~morse@ Quit (Quit: Bye, see you soon)
[16:23] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[16:32] * samsung (~samsung@ Quit (Quit: Leaving)
[16:32] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[16:33] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: neurodrone)
[16:42] * hijacker_ (~hijacker@ Quit (Ping timeout: 480 seconds)
[16:45] * hijacker_ (~hijacker@ has joined #ceph
[17:24] * lxo (~aoliva@ Quit (Read error: Connection reset by peer)
[17:24] * lxo (~aoliva@ has joined #ceph
[17:27] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[17:30] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[17:36] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[18:08] * neurodrone (~neurodron@dhcp210-105.wireless.buffalo.edu) has joined #ceph
[18:17] * verwilst (~verwilst@dD576FAAE.access.telenet.be) Quit (Quit: Ex-Chat)
[18:38] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[18:43] <sagewk> cmccabe: kill_after is old
[18:43] <sagewk> max_open_files is used in the init scripts only
[18:47] <cmccabe> oh, was that the thing we read in init-ceph
[18:47] <cmccabe> yeah, there is it
[18:47] <cmccabe> it is
[18:48] * neurodrone (~neurodron@dhcp210-105.wireless.buffalo.edu) Quit (Quit: neurodrone)
[18:50] <cmccabe> what's the point of "mon addr" in cmon.cc
[18:54] <sagewk> that's the address cmon binds to
[18:54] <cmccabe> I don't think so
[18:55] <sagewk> well, it binds to the monmap address for the rank.
[18:55] <cmccabe> seems to be parsed and then thrown away
[18:55] <sagewk> but on the client side that's what's used to know who to connect to, so it should always match. if not you may have trouble connecting
[18:55] <sagewk> which is why it warns if there is a difference.
[18:56] <cmccabe> oh, yeah, on the client side it has an effect
[18:57] <cmccabe> we're not doing metavariable expansion on it correctly at the moment
[18:58] <cmccabe> on the other hand, I guess that might not be useful for an IP address
[18:58] <sagewk> yeah
[19:00] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:03] * neurodrone (~neurodron@dhcp210-105.wireless.buffalo.edu) has joined #ceph
[19:07] * neurodrone (~neurodron@dhcp210-105.wireless.buffalo.edu) Quit ()
[19:31] <Tv> 10:30?
[19:54] * imcsk8 (~ichavero@ has joined #ceph
[20:12] <Tv> in & out of the post office in 20 min, and that included going through the line twice
[20:12] <Tv> today must be a good day
[20:28] <stingray> huh
[20:28] <stingray> so, somebody killed 2 out of my 3 monitor nodes
[20:28] <stingray> now monitor appears to never succeed in an election
[20:28] <stingray> what I am doing wrong
[20:29] <stingray> (by killed I mean took them out of the rack and threw into a window)
[20:29] <stingray> s/into/out of the/
[20:30] <Tv> stingray: your remaining monitor node thinks the two others are just partitioned off, alive, and maintain the majority of the decision making power
[20:30] <stingray> yeah, I guess so
[20:30] <Tv> otherwise a network split between A&B and C would make them both be active clusters
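Tv's explanation is the usual strict-majority rule: a monitor can only act when it can reach more than half of the monitors listed in the monmap, so two sides of a network split can never both be active. A minimal sketch of that rule, assuming nothing about how cmon implements it (`has_quorum` is a hypothetical helper, not a Ceph function):

```cpp
#include <cstddef>

// Hypothetical helper: true when `reachable` monitors form a strict
// majority of the `total` monitors listed in the monmap.
bool has_quorum(std::size_t reachable, std::size_t total) {
    return total > 0 && reachable > total / 2;
}
```

With stingray's cluster, `has_quorum(1, 3)` is false, which matches the survivor never winning an election; `has_quorum(2, 3)` is true, and so is the 3-out-of-5 case stingray raises later.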
[20:30] <stingray> the question is, how to beat it into submission
[20:31] <Tv> i'm not sure about this, but perhaps just take out the other mons from the config..
[20:31] <Tv> perhaps gregaf or someone can voice a more informed opinion
[20:31] <stingray> does it have to do something with monmap
[20:31] <sagewk> stingray: you can recreate the other two by just rsyncing the surviving monitor to new locations
[20:32] <sagewk> the monmap still has the old addresses, though, so removing them from the cluster is tricky. not sure it's easily done without first having a healthy cluster or manual tweaking
[20:32] <Tv> ah
[20:33] <sagewk> possibly just updating the monmap file in the mon data dir is enough, but i'm not certain about that. at some point we should document the procedure.
[20:33] <stingray> so no "ceph mon delete" etc
[20:33] <stingray> ?
[20:34] <stingray> sagewk: yeah, I can rsync the monmap
[20:34] <stingray> but the question how to update addresses still stays
[20:34] <Tv> sagewk: and automate a test of it... ;)
[20:34] <sagewk> there is a wiki page on adding/removing monitors on a healthy cluster
[20:34] <stingray> only adding.
[20:34] <sagewk> so once you get 2 or 3 up that should work
[20:34] <sagewk> hmm.
[20:34] <sagewk> should be analogous?
[20:35] <stingray> $ ceph mon add beta
[20:35] <stingray> I guess "ceph mon del" will fail if I don't have quorum, right?
[20:35] <stingray> or "ceph mon del"
[20:35] <stingray> ceph mon add, sorry
[20:35] <sagewk> ceph mon remove ...
[20:36] <stingray> I can in theory create 2 entirely new nodes which shall bring the number to 3 out of 5 which in theory should be enough for quorum
[20:36] <sagewk> once the monitor is active (you need two)
[20:36] <sagewk> you started with 3?
[20:36] <stingray> but I need to do "ceph mon add" on the original node which is partitioned and doesn't seem to accept anything
[20:36] <stingray> yes
[20:36] <stingray> I had 3, then 2 were destroyed
[20:37] <sagewk> you just need to get one up. rsync the mon data dir, start up another cmon for another rank (binding to the correct address)
[20:37] <stingray> and the remaining one restarted
[20:37] <sagewk> once it's up, 'ceph mon remove that_one'
[20:37] <stingray> what if I can't bind to that address? there's another machine there now
[20:37] <sagewk> "that should do the trick..."
[20:37] <stingray> which will not appreciate the disruption
[20:38] <sagewk> the cmon needs to bind to the correct address.. can you run it on the node with the ip now?
[20:39] <stingray> no.
[20:39] <sagewk> ok, let me make a fix...
[20:40] <stingray> that node is offline, the replacement node is setup in such a way I can't add that ip there
[20:40] <stingray> well, it's not that urgent
[20:40] <stingray> I'll be flying from SFO back to ireland soon
[20:40] <stingray> which will take almost 20 hours as I have to go through JFK
[20:41] <stingray> so I'll be able to do something useful to it only on thursday
[20:41] <stingray> I was just kind of curious about exercising this recovery scenario
[20:42] <stingray> where I lost all but one monitor but still need the data
[20:47] <Tv> alright, key api change looks good to me.. ceph.git branch mount-keys, ceph-client.git branch ceph-crypto-key, both are backwards compatible against each other
[20:47] <Tv> sagewk, who else wants to review?
[20:51] <sagewk> stingray: pushed a patch to next branch that lets you do
[20:51] <sagewk> ./cmon -i a --inject-monmap foo
[20:51] <sagewk> to inject a (modified) monmap into a down monitor
[20:52] <sagewk> the last map will be in $mon_data/monmap/<highest integer> .. copy that to a temp location, modify it with monmaptool (--rm the old monitors), then inject it into the survivor, then start it up. you should be all set
[20:52] <sagewk> don't forget to remove it from ceph.conf too
[20:52] <sagewk> (the old ones)
[20:52] <cmccabe> tv: in this line, "+ ret = ceph_crypto_key_decode(ckey, &p, (void*)data+datalen);"
[20:52] <cmccabe> tv: pointer arithmetic on void pointers is undefined :(
[20:52] <Tv> cmccabe: good catch
[20:52] <cmccabe> tv: thx
[20:53] <cmccabe> everyone just casts to char* to get around it. it's a little annoying
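The cast cmccabe mentions looks roughly like the sketch below. ISO C does not define arithmetic on `void *` (GCC accepts it as an extension that treats the size of void as 1), and C++ rejects it outright, so the portable form goes through `char *`. `byte_offset` is a hypothetical helper written for illustration, not the code that went into the ceph-crypto-key branch.

```cpp
#include <cstddef>

// Hypothetical helper: return a pointer `offset` bytes past `base`.
// void has no size, so the arithmetic is done on a char pointer instead.
const void* byte_offset(const void* base, std::size_t offset) {
    return static_cast<const char*>(base) + offset;
}
```

An end-of-buffer pointer like the one in the quoted line would then be computed as `byte_offset(data, datalen)`.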
[20:54] <stingray> sagewk: thanks!
[20:55] <sagewk> tv: looks okay to me. does it work? :)
[20:55] <Tv> btw
[20:56] <Tv> if (list_empty(&req->r_osd_item))
[20:56] <Tv> req->r_osd == NULL;
[20:56] <Tv> ==..
[20:56] <Tv> sagewk: worked earlier, rebasing to fix bug cmccabe just pointed out, will test again after that
[20:57] <sagewk> tv: right.
[20:57] <sagewk> net/ceph/osd_client.c:886: warning: statement with no effect
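The warning comes from `==` being used where `=` was meant: the comparison is evaluated and its result thrown away, so the pointer is never cleared. A minimal sketch of the intended form, using a made-up `Request` struct in place of the kernel's request type (the field names here are assumptions for illustration only):

```cpp
// Hypothetical stand-in for the request structure in the quoted snippet.
struct Request {
    void* osd;     // plays the role of r_osd
    bool on_list;  // stands in for !list_empty(&req->r_osd_item)
};

// Clear the osd pointer once the request is no longer on any osd list.
void clear_osd_if_unlinked(Request& req) {
    if (!req.on_list)
        req.osd = nullptr;  // assignment; `req.osd == nullptr;` would only compare
}
```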
[20:59] <cmccabe> tv: nice fix in osd_client.c too, with the event_work check
[20:59] <cmccabe> tv: I don't know if they'll want that as a separate patch or not. It could be one
[21:00] <Tv> sagewk: ceph-crypto-key branch updated with (void*)+foo fix
[21:01] <Tv> sagewk: as far as i can see, this is all we need right now; there's future clean up in 1) removing secret= support 2) using the common key struct more and deeper in ceph
[21:01] <Tv> sagewk: oh and yes it still worked ;)
[21:02] <sagewk> cool.
[21:03] <sagewk> merge window is still open, shall i send it?
[21:07] <Tv> sagewk: i'd say yes
[21:10] <cmccabe> an owl is totally going nuts outside making owl noises
[21:10] <cmccabe> I didn't think they were active during the day
[21:14] <sagewk> cmccabe: are you halfway through the g_conf string conversion?
[21:14] <cmccabe> sagewk: yep
[21:14] <sagewk> i'm having trouble with the g_conf.monmap checks in build_initial_monmap on current master
[21:14] <sagewk> ok
[21:14] <cmccabe> sagewk: it turns out I need to modify confutils to a certain extent
[21:14] <cmccabe> sagewk: hmm, let me look at the monmap thing
[21:15] <sagewk> - if (g_conf.monmap) {
[21:15] <sagewk> + if (g_conf.monmap && g_conf.monmap[0]) {
[21:15] <sagewk> fixed it for radosgw
[21:15] <cmccabe> fair enough I guess
[21:15] <sagewk> that'll go away shortly though right? those are turning into std::string?
[21:15] <cmccabe> yeah
[21:15] <sagewk> cool :)
[21:16] <cmccabe> so you can add that now if it's urgent I guess
[21:16] <sagewk> naw
[21:16] <cmccabe> a lot of checks against nil/char0 will become !empty()
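The two styles of check being discussed can be put side by side. This is a sketch with hypothetical helper names (`cstr_is_set`, `string_is_set`), not code from g_conf: for a `char *` option, "set" has to mean both non-null and non-empty, while a `std::string` option only needs `!empty()`.

```cpp
#include <string>

// C-string option: set only if the pointer is non-null AND the first
// character is not the terminator. Testing the pointer alone lets ""
// through, which is how an empty filename can reach open().
bool cstr_is_set(const char* value) {
    return value != nullptr && value[0] != '\0';
}

// std::string option: there is no null state, so empty() is the whole test.
bool string_is_set(const std::string& value) {
    return !value.empty();
}
```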
[21:17] * lxo (~aoliva@ Quit (Ping timeout: 480 seconds)
[22:42] <Tv> i'm replying to a list email but need info
[22:42] <Tv> how would one tell e.g. osd.0 to listen on a specific port?
[22:43] <Tv> quick greps are not giving me the config option
[22:44] <Tv> public_addr it seems
[22:45] <bchrisman> Tv: throw a :(port num) on it?
[22:46] <Tv> yeah trying first before i give that out as advice ;)
[22:46] <bchrisman> heh
[23:04] <Tv> "2011-03-29 14:04:02.635646 7f9e3d090720 can't open : error 2: No such file or directory"
[23:04] <Tv> now who's trying to use an empty filename...
[23:04] <Tv> cmccabe: does this sound like a g_conf c string vs string bug?
[23:05] <cmccabe> tv: where are you encountering that?
[23:05] <Tv> at least this
[23:05] <Tv> unable to read/decode monmap from : No such file or directory
[23:06] <Tv> maybe others
[23:06] <Tv> running vstart.sh
[23:06] <cmccabe> ok, let me check
[23:07] <Tv> if (g_conf.monmap) {
[23:07] <Tv> i guess that needs to stay if !g_conf.monmap.empty(), these days?
[23:07] <cmccabe> tv: hasn't been converted to std::string yet
[23:07] <Tv> hmm, what then..
[23:11] <cmccabe> 2011-03-29 14:12:34.586845 7fca7b29b720 auth: error reading file: .ceph_keyring
[23:26] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[23:27] <cmccabe> tv: it seems to be a problem with .ceph_keyring
[23:27] <Tv> cmccabe: yeah sorry got interrupted by sjust, will be back in a bit
[23:37] <sagewk> tv,cmccabe: thats the problem i hit earlier.
[23:37] <sagewk> testing for g_conf.monmap[0] etc. is the temp fix
[23:37] <Tv> open("", O_RDONLY) = -1 ENOENT (No such file or directory)
[23:37] <Tv> yeah
[23:37] <cmccabe> ah
[23:37] <cmccabe> duh
[23:38] <cmccabe> I'll submit the fix. don't want breakage in head of line
[23:38] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Remote host closed the connection)
[23:39] <cmccabe> makes it harder for me to submit more
[23:40] <cmccabe> now I'm getting a crash in pick_random_mon
[23:41] <cmccabe> from MonClient::authenticate
[23:42] <cmccabe> I'm kind of confused about where to go from here
[23:42] <cmccabe> are we supposed to function without monitors or what
[23:42] <sagewk> no
[23:42] <Tv> cmccabe: SIGFPE? me too
[23:42] <sagewk> get_initial_monmap should fail before if there are no monitors
[23:42] <cmccabe> it's just because the code is doing foo % 0
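Integer modulo by zero is undefined behaviour in C++, and on x86 Linux it typically shows up as the SIGFPE Tv mentions. A minimal sketch of guarding the pick, assuming C++17 for `std::optional`; `pick_index` is a hypothetical helper and not the actual pick_random_mon code:

```cpp
#include <cstddef>
#include <optional>

// Hypothetical helper: map a random value onto one of `count` entries.
// Returns no value for an empty list instead of evaluating `x % 0`.
std::optional<std::size_t> pick_index(std::size_t random_value,
                                      std::size_t count) {
    if (count == 0)
        return std::nullopt;
    return random_value % count;
}
```

The caller then has to handle the empty case explicitly, which fits sagewk's point that an empty monitor list should have been rejected earlier.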
[23:43] <sagewk> it sounds like it's taking the wrong path because of the string stuff
[23:44] <cmccabe> well, what path is it supposed to take when g_conf.monmap is not set?
[23:44] <sagewk> tv: btw autobuilder is failing due to keyutils dep
[23:44] <Tv> sagewk: naturally.. will fix
[23:44] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[23:45] <cmccabe> vstart is not supplying -m
[23:45] <sagewk> can you fix the ceph.spec.in too?
[23:45] <Tv> sagewk: i can try.. i don't know much about spec files
[23:45] <sagewk> cmccabe: there's a ceph.conf in the current dir that's passed to everyone who needs it
[23:45] <sagewk> it's pretty obvious, the hard part is figuring out the package name
[23:45] <cmccabe> I guess it should be adding the monitors from that conf
[23:46] <sagewk> cmccabe: no, it just passed -c ceph.conf to everyone
[23:46] <cmccabe> right
[23:46] <cmccabe> and that has the monitors in it
[23:46] <sagewk> oh right, misunderstood :)
[23:51] <cmccabe> ok, just needed another foo && foo[0]
[23:51] <cmccabe> I'm not sure why this surfaced now... I haven't changed that recently
[23:52] <sagewk> tv: doh, just missed -rc1 it looks like. maybe he'll merge it anyway since it's mostly bug fixes.
[23:53] <Tv> sagewk: well, it's not like any major feature would depend on that change
[23:53] <sagewk> yep

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.