#ceph IRC Log


IRC Log for 2010-11-29

Timestamps are in GMT/BST.

[0:15] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[3:05] * lidongyang_ (~lidongyan@ Quit (Remote host closed the connection)
[3:58] * lidongyang (~lidongyan@ has joined #ceph
[6:39] * ijuz__ (~ijuz@p4FFF5E7F.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[6:48] * ijuz__ (~ijuz@p4FFF662E.dip.t-dialin.net) has joined #ceph
[7:25] * f4m8_ is now known as f4m8
[7:28] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[7:38] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[8:21] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[9:12] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:25] <jantje> *yawn*
[10:47] * allsystemsarego (~allsystem@ has joined #ceph
[11:04] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[11:09] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Read error: Connection reset by peer)
[14:59] * tnt (~tnt@mojito.smartwebsearching.be) has left #ceph
[15:12] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[15:52] * andret (~andre@pcandre.nine.ch) Quit (Remote host closed the connection)
[15:58] * andret (~andre@pcandre.nine.ch) has joined #ceph
[16:10] * f4m8 is now known as f4m8_
[16:22] <failboat> ugh
[17:10] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[17:34] * verwilst (~verwilst@router.begen1.office.netnoc.eu) has joined #ceph
[17:42] * verwilst (~verwilst@router.begen1.office.netnoc.eu) Quit (Quit: Ex-Chat)
[17:51] * greglap (~Adium@ has joined #ceph
[18:12] * fred_ (~fred@80-219-183-100.dclient.hispeed.ch) has joined #ceph
[18:12] <fred_> hi
[18:13] <greglap> hi fred_
[18:15] <fred_> gregaf1, maybe you can help me... I'm wondering if the testing branch has a fix for #590, or if such a fix is still a work in progress
[18:18] <greglap> hmmm
[18:19] <greglap> I know they did some work on it, I'm not sure what the status is
[18:20] <cmccabe> fred_: let me check to see if there was anything in the commit logs
[18:20] <fred_> ok, I still haven't restarted my cluster as I'm waiting for a fix, but that's no problem I'll check back one of these days
[18:21] <greglap> yeah, I'm not seeing anything :/
[18:22] <fred_> same here
[18:26] <fred_> gotta go... I'm checking #590 almost every day, so if I see a fix, I'll test and report back
[18:26] <fred_> thanks, good day
[18:26] * fred_ (~fred@80-219-183-100.dclient.hispeed.ch) Quit (Quit: Leaving)
[18:34] * sjust (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:42] * greglap (~Adium@ Quit (Quit: Leaving.)
[18:46] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[20:53] <failboat> http://tracker.newdream.net/issues/605 is killing me
[20:54] <gregaf1> failboat: the anchor table?
[21:31] * johnl plays with custom crush maps
[22:27] <failboat> gregaf1: yeah
[22:27] <failboat> I have one corrupted directory on the fs
[22:27] <failboat> can't do anything with it
[22:27] <failboat> :|
[22:28] <gregaf1> oh, I see
[22:29] <failboat> I will probably wipe the fs and start from scratch
[22:29] <gregaf1> yeah
[22:29] <gregaf1> fsck isn't going to be done for a while
[22:29] <gregaf1> did you submit a report when you noticed the corruption?
[22:30] <failboat> yeah
[22:30] <gregaf1> that's good
[22:30] <failboat> http://tracker.newdream.net/issues/593
[22:30] <failboat> that was me
[22:30] <gregaf1> we are starting to do design work on fsck but of course we'd rather not ever need it
[22:31] <failboat> well
[22:31] <failboat> in this case the best thing would be to have a recover/quarantine strategy in mds itself
[22:31] <sagewk> failboat: what would be really helpful, incidentally, is a workload that would reproduce the error
[22:32] <failboat> sagewk: that would create such an entry?
[22:32] <sagewk> yeah
[22:32] <sagewk> i.e. create new fs, run script x that creates a bunch of hardlinks, voilà.
[22:32] <failboat> because I can delete pretty much everything except that directory and then targzip the osd :)
[22:32] <failboat> ah
[22:32] <failboat> ok
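The reproduction workload sagewk asks for above is not in the log; a minimal sketch of the kind of hardlink-heavy script he describes might look like the following (script name, directory, and count are all hypothetical, not from the conversation):

```shell
#!/bin/sh
# Hypothetical sketch of a hardlink stress workload (not from the log):
# exercise the MDS anchor table by creating many hard links to one file,
# as suggested in the conversation above.
# Usage: ./hardlink-stress.sh [target-dir] [count]
DIR="${1:-./hardlink-test}"
COUNT="${2:-100}"

rm -rf "$DIR"          # start from a clean directory so reruns behave the same
mkdir -p "$DIR"
echo "payload" > "$DIR/original"

i=1
while [ "$i" -le "$COUNT" ]; do
    ln "$DIR/original" "$DIR/link.$i"   # each hard link adds an anchor-table entry
    i=$((i + 1))
done

echo "created $COUNT hard links in $DIR"
```

Run on a freshly created Ceph filesystem, a loop like this would generate the burst of hardlink metadata activity the developers wanted for reproducing the bug.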
[23:10] <johnl> I've got my cluster into a situation where all the osds are crashing in loops :/
[23:10] <johnl> even if I shut down the entire cluster and start just one node.
[23:10] <sagewk> johnl: what's the stack trace look like?
[23:10] <johnl> http://pastebin.com/Yw0gjDnL
[23:11] <sagewk> were you creating/removing pools by any chance?
[23:11] <johnl> yeah
[23:12] <johnl> latest debian packages (on ubuntu lucid)
[23:13] <johnl> i changed the crushmap (quite significantly) and was watching it rebalance
[23:13] <sagewk> removing?
[23:13] <johnl> thought I'd delete some pools to speed it up :)
[23:13] <sagewk> can you reproduce the crash on an osd with 'debug osd = 20'?
[23:13] <johnl> yeah, np. does it on every start. 1min.
[23:13] <sagewk> and open an issue in the tracker to attach the log to? :)
[23:13] <sagewk> thanks
[23:14] <johnl> yep, np.
[23:15] <johnl> actually, not getting any log output in the log file
[23:15] <johnl> just that stack trace
[23:15] <johnl> repeatedly, over and over
[23:17] <sagewk> even with debug osd = 20 in the [osd] section of your ceph.conf? you should get all sorts of chatter for that
[23:18] <johnl> ah, there is some preamble, then the loop. sorry - was quick!
[23:18] <johnl> will file bug
[23:21] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[23:30] <johnl> opened #614
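The logging change sagewk describes above amounts to a small ceph.conf edit; a sketch, assuming only the section and option named in the conversation (the surrounding file layout is hypothetical):

```ini
; Sketch of the debug setting from the conversation above.
; Only "debug osd = 20" in the [osd] section comes from the log;
; everything else here is an assumed placeholder.
[osd]
	debug osd = 20   ; maximum OSD debug verbosity, for capturing a crash log
```

With this in place, restarting the crashing osd should produce verbose output in its log file, suitable for attaching to a tracker issue as requested.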

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.