IRC Log for 2010-11-29

Timestamps are in GMT/BST.

[9:25] <jantje> *yawn*
[16:22] <failboat> ugh
[18:12] <fred_> hi
[18:13] <greglap> hi fred_
[18:15] <fred_> gregaf1, maybe you can help me... I'm wondering if the testing branch has a fix for #590, or if such a fix is still a work in progress
[18:18] <greglap> hmmm
[18:19] <greglap> I know they did some work on it, I'm not sure what the status is
[18:20] <cmccabe> fred_: let me check to see if there was anything in the commit logs
[18:20] <fred_> ok, I still haven't restarted my cluster as I'm waiting for a fix, but that's no problem I'll check back one of these days
[18:21] <greglap> yeah, I'm not seeing anything :/
[18:22] <fred_> same here
[18:26] <fred_> gotta go... I'm checking #590 almost every day, so if I see a fix, I'll test and report back
[18:26] <fred_> thanks, good day
[20:53] <failboat> http://tracker.newdream.net/issues/605 is killing me
[20:54] <gregaf1> failboat: the anchor table?
[21:31] * johnl plays with custom crush maps
[22:27] <failboat> gregaf1: yeah
[22:27] <failboat> I have one corrupted directory on the fs
[22:27] <failboat> can't do anything with it
[22:27] <failboat> :|
[22:28] <gregaf1> oh, I see
[22:29] <failboat> I will probably wipe the fs and start from scratch
[22:29] <gregaf1> yeah
[22:29] <gregaf1> fsck isn't going to be done for a while
[22:29] <gregaf1> did you submit a report when you noticed the corruption?
[22:30] <failboat> yeah
[22:30] <gregaf1> that's good
[22:30] <failboat> http://tracker.newdream.net/issues/593
[22:30] <failboat> that was me
[22:30] <gregaf1> we are starting to do design work on fsck but of course we'd rather not ever need it
[22:31] <failboat> well
[22:31] <failboat> in this case the best thing would be to have a recover/quarantine strategy in mds itself
[22:31] <sagewk> failboat: what would be really helpful, incidentally, is a workload that would reproduce the error
[22:32] <failboat> sagewk: that would create suck an entry?
[22:32] <failboat> such
[22:32] <sagewk> yeah
[22:32] <failboat> damn keyboards
[22:32] <sagewk> i.e. create new fs, run script x that creates a bunch of hardlinks, voilà.
[22:32] <failboat> because I can delete pretty much everything except that directory and then targzip the osd :)
[22:32] <failboat> ah
[22:32] <failboat> ok
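
The kind of "script x" sagewk describes could look roughly like the sketch below. It is only an illustration of the workload shape (many hard links on a fresh filesystem); it is not known to reproduce #593 or #605. The function name, directory layout, and default counts are all hypothetical, and spreading the links across several directories is an assumption, not something stated in the channel.

```python
import os
import tempfile


def make_hardlinks(root, n_files=10, links_per_file=5):
    """Create n_files small files under root/src and hard-link each one
    links_per_file times, one link per root/links<j> directory.

    Returns the number of hard links created. All names here are
    hypothetical; nothing in the log specifies the layout.
    """
    src_dir = os.path.join(root, "src")
    os.makedirs(src_dir, exist_ok=True)
    created = 0
    for i in range(n_files):
        src = os.path.join(src_dir, f"file{i}")
        with open(src, "w") as f:
            f.write(f"payload {i}\n")
        for j in range(links_per_file):
            link_dir = os.path.join(root, f"links{j}")
            os.makedirs(link_dir, exist_ok=True)
            # os.link creates a hard link (a second name for the same inode).
            os.link(src, os.path.join(link_dir, f"file{i}"))
            created += 1
    return created
```

To use it, point `root` at a directory on a freshly created test filesystem (the mount point is whatever your setup uses), and raise the counts until the problem shows up or it becomes clear that it does not.
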
[23:10] <johnl> I've got my cluster into a situation where all the osds are crashing in loops :/
[23:10] <johnl> even if I shut down the entire cluster and start just one node.
[23:10] <sagewk> johnl: what's the stack trace look like?
[23:10] <johnl> http://pastebin.com/Yw0gjDnL
[23:11] <sagewk> were you creating/removing pools by any chance?
[23:11] <johnl> yeah
[23:12] <johnl> latest debian packages (on ubuntu lucid)
[23:13] <johnl> i changed the crushmap (quite significantly) and was watching it rebalance
[23:13] <sagewk> removing?
[23:13] <johnl> thought I'd delete some pools to speed it up :)
[23:13] <sagewk> can you reproduce the crash on an osd with 'debug osd = 20'?
[23:13] <johnl> yeah, np. does it on every start. 1min.
[23:13] <sagewk> and open an issue in the tracker to attach the log to? :)
[23:13] <sagewk> thanks
[23:14] <johnl> yep, np.
[23:15] <johnl> actually, not getting any log output in the log file
[23:15] <johnl> just that stack trace
[23:15] <johnl> repeatedly, over and over
[23:17] <sagewk> even with debug osd = 20 in the [osd] section of your ceph.conf? you should get all sorts of chatter for that
[23:18] <johnl> ah, there is some preamble, then the loop. sorry - was quick!
[23:18] <johnl> will file bug
[23:30] <johnl> opened #614
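
For reference, the logging setting sagewk asks for above would sit in ceph.conf roughly as follows. This fragment shows only the section and option named in the conversation; the rest of the file is omitted, and the indentation is a matter of taste.

```ini
[osd]
    debug osd = 20
```

The OSD needs to be restarted after the change so that the crash is captured with the more verbose output.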

