#ceph IRC Log


IRC Log for 2012-01-05

Timestamps are in GMT/BST.

[0:04] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:39] * fronlius (~fronlius@d217032.adsl.hansenet.de) Quit (Quit: fronlius)
[2:01] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:07] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[2:08] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[2:41] * jojy (~jvarghese@ Quit (Quit: jojy)
[2:42] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[2:57] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Read error: Operation timed out)
[3:02] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (Remote host closed the connection)
[3:04] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:33] * Tobarja (~chatzilla@ has joined #ceph
[3:37] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[3:37] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) Quit ()
[4:16] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[4:17] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[4:23] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[4:31] * s[X]_ (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[4:44] * vodka (~paper@6.Red-88-11-191.dynamicIP.rima-tde.net) has joined #ceph
[4:56] * vodka (~paper@6.Red-88-11-191.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[6:20] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[8:21] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[8:23] * s[X]_ (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[8:26] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[8:36] * Tobarja (~chatzilla@ Quit (Quit: ChatZilla 0.9.88-rdmsoft [XULRunner])
[8:53] * NaioN (~stefan@andor.naion.nl) Quit (Remote host closed the connection)
[9:06] * NaioN (stefan@andor.naion.nl) has joined #ceph
[9:29] * fghaas (~florian@85-127-155-32.dynamic.xdsl-line.inode.at) has joined #ceph
[9:53] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[10:06] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:07] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[11:41] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[11:48] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[11:55] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[12:06] * BManojlovic (~steki@93-87-148-183.dynamic.isp.telekom.rs) has joined #ceph
[12:27] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[12:45] * aneesh (~aneesh@ Quit (Remote host closed the connection)
[14:35] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[14:40] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[14:41] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[14:50] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:01] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[16:26] <nhm_> oh well, there is already an option to run the daemons through valgrind.
[16:26] <nhm_> s/well/wow
[16:50] * fghaas (~florian@85-127-155-32.dynamic.xdsl-line.inode.at) has left #ceph
[16:59] * BManojlovic (~steki@93-87-148-183.dynamic.isp.telekom.rs) Quit (Remote host closed the connection)
[17:41] <sagewk> elder: ping
[17:41] <elder> Here.
[17:41] <sagewk> elder: just saw your xattr email.
[17:41] <sagewk> elder: don't we just need to prealloc the blob in removexattr?
[17:42] <elder> I have a fix.
[17:42] <elder> Yes.
[17:42] <elder> I made the removexattr code pretty much match the createxattr
[17:42] <sagewk> great
[17:42] <elder> I tested it and it's fine.
[17:42] <elder> I wanted to factor out the common stuff but haven't done that yet. I'm about to send my patch to the list though.
[17:44] <sagewk> great
[17:45] * nhm_ is now known as nhm
[17:46] <dwm_> Hmm, I think I've just triggered the enormous-truncate bug again.
[17:46] <dwm_> ... this time with rather more OSD-side logging, in theory..
[17:46] <dwm_> (Told all mds+osd processes to restart. Only the MDS and one OSD came back..)
[17:48] <dwm_> Hmm, will have to go for the network syslogs; nuclear logging on the hosts themselves filled /var.
[18:06] <sagewk> dwm_: the mds didn't crash? (were you running recent master?)
[18:11] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[18:14] <sagewk> elder: looks good. are you going to refactor into a helper now as well?
[18:15] <elder> I am about to do that. I've done some already but wanted to get the fix out first.
[18:16] <elder> Meanwhile I'm continuing running through xfstests--now running 074.
[18:18] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[18:19] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:30] * jojy (~jvarghese@ has joined #ceph
[18:30] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[18:30] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[18:52] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:14] <gregaf> dwm_: it might just be the log running out of space that killed them, then
[19:57] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Quit: fronlius)
[20:03] <wido> joshd: You said that binding to IPv6 is not needed for a client, but if I specify a hostname for the monitor which has both an A and an AAAA record, it seems that librados only tries over IPv4 and then fails
[20:05] <joshd> wido: I didn't think the client called bind at all
[20:07] <gregaf> it doesn't, but it does need to specify the connection target, and it probably defaults to the IPv4 address
[20:09] <joshd> seems like a bug in the client code then - it could try ipv6 before failing
[20:13] * jmlowe (~Adium@129-79-134-204.dhcp-bl.indiana.edu) has joined #ceph
[20:14] <jmlowe> How long is too long to wait for a rbd rm to complete?
[20:14] <joshd> jmlowe: it's proportional to the size of the image
[20:14] <jmlowe> 80G
[20:15] * fronlius (~fronlius@e176056251.adsl.alicedsl.de) has joined #ceph
[20:16] <joshd> so that's 20480 objects it's trying to remove, if they're each 4M
[20:16] <dwm_> gregaf: It's only logging via syslog, not directly.
[20:16] <wido> I made the mistake of creating a 4TB image today... Removing that took some time
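The object count joshd quotes follows from dividing the image size by the object size. A minimal sketch of that arithmetic, assuming binary units (GiB, TiB) and the 4M object size mentioned above; `object_count` is a hypothetical helper name, not part of any Ceph tool:

```python
def object_count(image_bytes, object_bytes=4 * 1024**2):
    """Number of fixed-size objects needed to cover an image (rounded up)."""
    return -(-image_bytes // object_bytes)  # ceiling division on integers

GiB = 1024**3
TiB = 1024**4

print(object_count(80 * GiB))  # 20480, the figure quoted for the 80G image
print(object_count(4 * TiB))   # 1048576, for a 4TB image like wido's
```

The removal time scales with this count, which is why the 4TB image took so much longer than an 80G one would.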
[20:16] <dwm_> sagewk: Still need to dig into the logs; the MDS restarted cleanly without a crash, as far as I've found so far.
[20:16] <gregaf> dwm_: well you said they filled /var so you'd need to switch to syslog… ;)
[20:19] <joshd> wido: I can see how the ipv6 preference option makes sense, but it seems to me like the kind of thing that should work by default - trying one way and falling back if it doesn't work
[20:20] <joshd> anyone else have any thoughts on ipv6 preference?
[20:21] <wido> joshd: I agree that it should work by default. Imho the client should try IPv4 first and then fall back to IPv6
[20:22] <wido> That could inflict a connection delay however, so the recommendation is to have just an A or an AAAA record for your hostname
[20:22] <wido> otherwise you would go through the IPv4-first-try process on every connect
[20:23] <wido> joshd: http://pastebin.com/7arbWi4z
[20:25] <joshd> yeah, that makes sense - I'll make a bug
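The try-one-family-then-fall-back behaviour discussed above can be sketched in a few lines. This is only an illustration of the idea, not the librados implementation: `connect_any` is a hypothetical name, and the `resolve` and `make_socket` parameters exist only so the logic can be exercised without a network.

```python
import socket

def connect_any(host, port, timeout=5.0,
                resolve=socket.getaddrinfo, make_socket=socket.socket):
    """Try every address the name resolves to, in resolver order.

    Returns the first socket that connects; raises the last error if
    none of the addresses (IPv4 or IPv6) is reachable.
    """
    last_error = None
    for family, socktype, proto, _canon, sockaddr in resolve(
            host, port, type=socket.SOCK_STREAM):
        sock = None
        try:
            sock = make_socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as err:
            # Remember the failure and move on to the next address.
            last_error = err
            if sock is not None:
                sock.close()
    raise last_error or OSError("no addresses found for %r" % host)
```

A call such as `connect_any("mon.example.com", 6789)` (hostname and port are placeholders) would pay one connect timeout per unreachable address, which is the delay wido points out for hostnames carrying both record types.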
[20:30] <dwm_> gregaf: Indeed, but syslog was also logging locally.. oops.
[20:35] * vodka (~paper@41.Red-88-15-116.dynamicIP.rima-tde.net) has joined #ceph
[20:47] <jmlowe> I'm stuck in a degraded state, any suggestions?
[20:52] * BManojlovic (~steki@ has joined #ceph
[21:16] * jmlowe (~Adium@129-79-134-204.dhcp-bl.indiana.edu) Quit (Quit: Leaving.)
[21:23] * jmlowe (~Adium@129-79-134-204.dhcp-bl.indiana.edu) has joined #ceph
[21:27] <jmlowe> ok, so I think I now have a ceph cluster broken in a new and interesting way
[21:29] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[21:31] <jmlowe> I'm not sure how to succinctly express how it's broken
[21:32] <gregaf> jmlowe: sorry, we were all at lunch, but joshd or I would love to hear about interesting things! ;)
[21:32] <jmlowe> 2012-01-05 15:32:11.699822 mon <- [osd,stat]
[21:32] <jmlowe> 2012-01-05 15:32:11.700729 mon.1 -> 'e1242: 12 osds: 12 up, 12 in' (0)
[21:33] <jmlowe> 2012-01-05 15:31:52.109524 pg v2837470: 1200 pgs: 1186 active+clean, 14 active+clean+degraded; 256 GB data, 515 GB used, 21497 GB / 22355 GB avail; 2494/135700 degraded (1.838%)
[21:33] <jmlowe> and stuck at 1.838%
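The numbers in the status line above are internally consistent, which a short script can confirm. A minimal sketch, assuming the line layout is exactly as pasted; `parse_pg_status` is a hypothetical helper and the regular expressions are tailored to this one line rather than to any documented output format:

```python
import re

STATUS = ("pg v2837470: 1200 pgs: 1186 active+clean, 14 active+clean+degraded; "
          "256 GB data, 515 GB used, 21497 GB / 22355 GB avail; "
          "2494/135700 degraded (1.838%)")

def parse_pg_status(line):
    """Pull the PG total, per-state counts and degraded object ratio out of one line."""
    total = int(re.search(r"(\d+) pgs:", line).group(1))
    # State names are lowercase words joined by '+', followed by ',' or ';'.
    states = {name: int(n) for n, name in re.findall(r"(\d+) ([a-z+]+)[,;]", line)}
    degraded, objects = map(int, re.search(r"(\d+)/(\d+) degraded", line).groups())
    return total, states, degraded, objects

total, states, degraded, objects = parse_pg_status(STATUS)
print(total == sum(states.values()))          # True: 1186 + 14 == 1200
print(round(100.0 * degraded / objects, 3))   # 1.838
```

Running the same check on successive status lines shows whether the degraded ratio is actually moving or, as here, stuck.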
[21:33] <jmlowe> can't seem to mount or use rbd
[21:33] <gregaf> ah, stuck recovery
[21:33] <gregaf> no rbd? that is different
[21:33] <gregaf> what version are you running?
[21:33] <jmlowe> mount error 5 = Input/output error
[21:34] <jmlowe> ceph version 0.39 (commit:321ecdaba2ceeddb0789d8f4b7180a8ea5785d83)
[21:34] <jmlowe> any suggestions for stuck recovery?
[21:34] <jmlowe> perhaps if I poke it and get the recovery going everything will free up?
[21:34] <gregaf> there are suggestions and I don't remember them…joshd?
[21:35] <jmlowe> ceph -a stop;sleep 60; ceph -a start didn't do much
[21:35] <joshd> jmlowe: do 'ceph pg dump | grep degraded' - this should point you to the osds that are having problems
[21:35] <joshd> check their logs (might need to turn up debugging)
[21:38] <joshd> in particular look at the primaries (the first osd in the acting set shown by pg dump) for the degraded pgs
[21:39] <jmlowe> looks like osd.8 if I read it correctly
[21:39] <jmlowe> [8] [8]
[21:39] <jmlowe> for all of them
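Picking out the primaries by eye works for 14 PGs; for more, the same step can be scripted. A rough sketch, assuming each degraded line of `ceph pg dump` ends with the up set and then the acting set as bracketed lists (as in the `[8] [8]` excerpt above). The sample lines are made up for illustration, and the real column layout may differ between versions:

```python
import re
from collections import Counter

def primaries_of_degraded(pg_dump_lines):
    """Count how often each OSD is primary (first in the acting set) of a degraded PG."""
    counts = Counter()
    for line in pg_dump_lines:
        if "degraded" not in line:
            continue
        # Bracketed OSD lists such as [8] or [3,8]; the last one is taken as the acting set.
        sets = re.findall(r"\[(\d+(?:,\d+)*)\]", line)
        if not sets:
            continue
        acting = sets[-1].split(",")
        counts[int(acting[0])] += 1
    return counts

# Hypothetical, heavily abbreviated dump lines.
sample = [
    "0.1a active+clean+degraded [8] [8]",
    "0.2b active+clean [3,8] [3,8]",
    "0.3c active+clean+degraded [8] [8]",
]
print(primaries_of_degraded(sample))  # Counter({8: 2})
```

An OSD that dominates the count is the one whose logs are worth reading first.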
[21:41] <joshd> can you turn up logging on osd 8 and pastebin the result? (probably want debug ms 1, debug osd 20, debug filestore 20)
[21:42] <jmlowe> can you remind me of the way to do that so I don't have to go hunting
[21:44] <joshd> ceph osd tell 8 injectargs --debug-ms 1 --debug-osd 20 --debug-filestore 20
[21:46] <joshd> err, make that ceph osd tell 8 injectargs '--debug-ms 1 --debug-osd 20 --debug-filestore 20'
[21:46] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:46] <jmlowe> I should probably lay out the timeline of events: rbd snap, osd's start crashing, reboot osd, btrfs corrupt can't mount osd.8, follow replacement osd procedure
[21:46] <jmlowe> anything in particular I'm looking for?
[21:47] <joshd> errors from btrfs maybe (negative return values)
[21:49] <joshd> so this is a new osd.8 with a new btrfs underneath?
[21:49] <jmlowe> yes
[21:50] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[21:50] <jmlowe> http://pastebin.com/F1jgmMhi
[21:51] <jmlowe> that repeats
[21:56] <joshd> it doesn't look like the debugging took effect... I know there was a bug fixed recently with injectargs, but I thought it was working in .39
[22:00] <joshd> the rbd watch stuff happens whenever you try to open an image, so it might be repeating due to a client trying to reconnect
[22:01] <jmlowe> ok
[22:01] <jmlowe> clients are deadlocked so without a reboot
[22:02] <jmlowe> is this the bug you were referring to: log 2012-01-05 10:44:15.297145 osd.8 1 : [INF] ignoring empty injectargs
[22:03] <joshd> no, that's probably from when I forgot to put quotes around the debug arguments
[22:04] <jmlowe> doh, I see your correction now
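The difference between the two injectargs commands is only in how the shell splits them into words. The sketch below uses Python's `shlex` to mimic POSIX shell splitting; how the ceph tool treats the extra words in the unquoted form is not stated in the log, beyond the "ignoring empty injectargs" message joshd attributes to the missing quotes.

```python
import shlex

unquoted = "ceph osd tell 8 injectargs --debug-ms 1 --debug-osd 20 --debug-filestore 20"
quoted = "ceph osd tell 8 injectargs '--debug-ms 1 --debug-osd 20 --debug-filestore 20'"

# Everything after 'injectargs' (word index 5 onward) as the program would see it.
print(shlex.split(unquoted)[5:])
# ['--debug-ms', '1', '--debug-osd', '20', '--debug-filestore', '20']
print(shlex.split(quoted)[5:])
# ['--debug-ms 1 --debug-osd 20 --debug-filestore 20']
```

With quotes, the whole option string arrives as a single argument to injectargs; without them, it arrives as six separate words.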
[22:04] <joshd> I just tested, and it does seem to work in 0.39
[22:04] <jmlowe> yep, spitting out tons of logs
[22:05] <jmlowe> how about this one http://pastebin.com/Grr21asW
[22:07] <joshd> that's normal stuff, with stuck recovery we may have to try to kickstart recovery with the logs on to see if it gets stuck again
[22:07] <jmlowe> ok, how do I do that
[22:07] <joshd> ceph osd out 8; sleep 10; ceph osd in 8; should be good
[22:08] <joshd> maybe more time in between
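The out/sleep/in sequence can be wrapped so the pause is easy to lengthen. A minimal sketch: `kick_recovery` is a hypothetical helper, the two `ceph osd` commands are the ones quoted above, and the `run` and `sleep` parameters are there only so the sequence can be checked without a cluster.

```python
import subprocess
import time

def kick_recovery(osd_id, pause=10, run=subprocess.check_call, sleep=time.sleep):
    """Mark an OSD out, wait, then mark it back in to restart recovery."""
    run(["ceph", "osd", "out", str(osd_id)])
    sleep(pause)  # joshd suggests 10 seconds, possibly more
    run(["ceph", "osd", "in", str(osd_id)])
```

For example, `kick_recovery(8, pause=60)` would apply the "maybe more time in between" suggestion to osd 8.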
[22:08] <nhm> odd, can't get to pastebin to look at your output.
[22:12] <jmlowe> well two things happened, pastebin went belly up and so did one of the clients banging away at this thing
[22:16] <jmlowe> how about this, does it have anything good in it? http://pastebin.com/uTSzNJ5k
[22:17] <joshd> can you paste the pg dump?
[22:26] <jmlowe> http://pastebin.com/XwRpD7ZH
[22:30] <joshd> hmm, all the stuck pgs are in pool 0
[22:33] <jmlowe> sounds about right, I don't think I use any other pools
[22:33] <jmlowe> I'm really only interested in rbd
[22:34] <jmlowe> do you think my best bet is to rebuild the cluster or try to salvage something?
[22:37] <joshd> depends on how long it takes to rebuild the cluster
[22:38] <joshd> one thing that might cause this stuck recovery is a crush bug - there were a couple corner cases fixed since 0.39 I think
[22:39] <joshd> you can grab the crushmap and osdmap with 'ceph osd getcrushmap > /tmp/crushmap; ceph osd getmap > /tmp/osdmap'
[22:40] <joshd> then see if the stuck pgs are mapping to [8] using the osdmaptool
[22:43] <jmlowe> is it just osdmaptool --print osdmap?
[22:44] <joshd> that'll show you it
[22:44] <jmlowe> osdmaptool --print osdmap
[22:44] <jmlowe> osdmaptool: osdmap file 'osdmap'
[22:44] <jmlowe> osdmaptool: error decoding osdmap 'osdmap'
[22:44] <jmlowe> tried 'ceph osd getmap > osdmap' a second time just to be sure
[22:45] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) has joined #ceph
[22:45] <joshd> hmm, it's not working on master, but that's worse if it's not working in 0.39
[22:46] <jmlowe> so my osdmap may be borked?
[22:46] <joshd> no, I think there's a bug in the osdmaptool
[22:46] <jmlowe> ah
[22:46] <joshd> hopefully I just misremembered the dump syntax
[22:50] <jmlowe> so...
[22:50] <joshd> no, it seems to be broken
[22:50] <joshd> in light of that, I'd say rebuilding is simpler at this point
[22:51] <joshd> sorry :(
[22:52] <jmlowe> it's not the bleeding edge if you don't exsanguinate every now and then
[22:59] <joshd> oh, the bug is actually in the ceph tool - 'ceph osd getmap -o osdmap' will work
[23:00] <joshd> there must be some extraneous output on stdout
[23:03] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) has left #ceph
[23:15] * jmlowe (~Adium@129-79-134-204.dhcp-bl.indiana.edu) Quit (Quit: Leaving.)
[23:20] <sagewk> joshd, jmlowe: best to do '-o filename' to write the binary blob to a file.
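Why `-o filename` is safer than a shell redirect can be shown with a toy model. This is a simulation only: `fake_getmap`, `BLOB` and `NOTICE` are made-up stand-ins, and the presence of extra text on stdout is joshd's guess above rather than something confirmed in the log.

```python
import io
import os
import tempfile

BLOB = b"\x01\x02binary-map-bytes"  # stand-in for the encoded map
NOTICE = b"status: done\n"          # made-up stand-in for any extra stdout text

def fake_getmap(stdout, out_path=None):
    """Toy tool: always prints a notice; emits the blob to stdout or to a file."""
    stdout.write(NOTICE)
    if out_path is None:
        stdout.write(BLOB)
    else:
        with open(out_path, "wb") as f:
            f.write(BLOB)

# Redirect style ('> osdmap'): everything written to stdout ends up in the file.
redirected = io.BytesIO()
fake_getmap(redirected)

# '-o osdmap' style: only the blob is written to the file.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "osdmap")
    fake_getmap(io.BytesIO(), out_path=path)
    with open(path, "rb") as f:
        written = f.read()

print(redirected.getvalue() == BLOB)  # False: the notice is mixed in with the blob
print(written == BLOB)                # True: the file holds the blob alone
```

A decoder fed the redirected file would hit the stray text first, which would be consistent with the "error decoding osdmap" message earlier in the session.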
[23:23] * jmlowe (~Adium@129-79-134-204.dhcp-bl.indiana.edu) has joined #ceph
[23:28] <sagewk> wido: when you have a moment would you mind verifying my fix for http://tracker.newdream.net/issues/1835 ?
[23:44] * jmlowe (~Adium@129-79-134-204.dhcp-bl.indiana.edu) Quit (Quit: Leaving.)
[23:48] * MarkDude (~MT@c-67-170-237-59.hsd1.ca.comcast.net) has joined #ceph
[23:55] * fronlius (~fronlius@e176056251.adsl.alicedsl.de) Quit (Quit: fronlius)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.