#ceph IRC Log


IRC Log for 2012-02-25

Timestamps are in GMT/BST.

[0:09] <pulsar> sjust2: Tv|work gregaf1 you guys around? i have some test results after stress testing ceph you might be interested in. otherwise i will be shutting it down and giving it a try some time later with a newer kernel / ceph version
[0:09] <sjust2> pulsar: here
[0:09] <sjust2> and intereseted
[0:09] <sjust2> *interested
[0:09] <pulsar> oh, great.
[0:10] <pulsar> so basically i made a script to create directories like crazy. ended up with mds locked up at 14gb ram and several dead osds crashing right after restarting them.
[0:10] <sjust2> ah
[0:10] <pulsar> did not look into the logfiles, i do not feel it is an option for me right now
[0:11] <pulsar> so, if you want to take a look, i can patch you through. otherwise killall -9 ceph :)
[0:11] <sjust2> sagewk: are you interested in seeing the mds logs?
[0:12] <sagewk> not the mds logs... shouldn't be hard to recreate the situation
[0:12] <sagewk> lots of subdirs in the same dir, or were individual dirs relatively small?
[0:12] <pulsar> not that much, i tried to hit 1b directories / files
[0:13] <pulsar> let me see how far i actually got
[0:14] <sagewk> were you careful to keep individual dirs small, or were there any that were big?
[0:14] <pulsar> ~ 6720000 directories / files
[0:15] <pulsar> and maximum number of children per directory might be around ....
[0:15] <pulsar> not even 20k
[0:15] <pulsar> i have 3 osds which will die after attempting a restart
[0:16] <pulsar> and a couple of deadlocked fuse mounts
[0:16] <sagewk> that may still be pushing into problem area, given that dir fragmentation is off.
[0:16] <sagewk> ok. we're definitely interested in the osd crashes!
[0:17] <pulsar> i can get you some log files then, if you want to watch over my shoulder i can give you screen sharing over skype
[0:18] <sagewk> just the stack traces in the log file may be enough
[0:18] <pulsar> ok, i'll see what i can get you and msg you the download link
[0:18] <pulsar> i am just wondering...
[0:18] <pulsar> you guys are working full time on ceph?
[0:20] <sagewk> pulsar: yep!
[0:20] <sagewk> x ~10 people so far
[0:20] <pulsar> pretty much the best support i came across so far for an open source project
[0:21] <pulsar> usually i end up in a dead irc channel or forum with plenty of users and maybe one person answering a question per day. so, thanks! has been a pleasure!
[0:21] <sagewk> you're welcome :)
[0:24] <pulsar> http://dl.dropbox.com/u/3343578/logs/crashing.ods.log
[0:25] <pulsar> there is one, crashing every time i restart it
[0:25] <pulsar> actually these are two instances
[0:26] <pulsar> on the same machine
[0:26] <pulsar> i have another server with only one osd crashing
[0:27] <sagewk> pulsar: any chance you can start it up with 'debug filestore = 10' and let it run to crash one more time?
[0:27] <pulsar> sure
[0:29] <pulsar> sagewk: [osd] .... debug filestore = 10
[0:29] <pulsar> or is it [osd] .... osd debug file store = 10
[0:29] <pulsar> ?
[0:29] <sagewk> [osd]
[0:29] <sagewk> debug filestore = 10
[0:30] <pulsar> that space/underscore substitution thing is a bit confusing :)
[0:31] <pulsar> logging...
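A minimal sketch of the space/underscore substitution mentioned at [0:30], assuming only what the exchange above suggests: that spaces and underscores in an option name are interchangeable. This is not Ceph's parser; `normalize_option` is a hypothetical helper for illustration.

```python
def normalize_option(name: str) -> str:
    """Return a canonical form of an option name (hypothetical helper).

    Assumption: spaces and underscores are interchangeable, so both
    spellings collapse to a single underscore-separated form.
    """
    # Turn underscores into spaces, split on any run of whitespace,
    # then rejoin the words with single underscores.
    return "_".join(name.strip().replace("_", " ").split())


if __name__ == "__main__":
    for spelling in ("debug filestore", "debug_filestore", "osd debug file store"):
        print(repr(spelling), "->", normalize_option(spelling))
```

Under this assumption the two spellings of `debug filestore` name the same option, while `osd debug file store` is a different name altogether, which is why only the first form was confirmed at [0:29].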
[0:32] <pulsar> if i was going to try to put that many directories into a ceph filesystem again, i take it it is a good idea to limit the number of children per directory to keep the mds memory usage low?
[0:33] <pulsar> i could come up with some path encoding based on md5 hashes to keep every fs node < 0xff entries for instance
[0:44] <sagewk> for now, until mds frag = true by default
[0:46] <pulsar> which is unstable?
[0:46] <pulsar> or not available for 0.41?
[0:58] <sagewk> unstable with recovery
[1:02] <pulsar> ic
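The md5-based path encoding floated at [0:33] could look roughly like the sketch below. It is one possible reading of the idea, not something anyone in the channel posted: `sharded_path` and its `levels` parameter are hypothetical, and the scheme caps each intermediate directory at 256 children (one per two-digit hex value) rather than strictly below 0xff.

```python
import hashlib


def sharded_path(name: str, levels: int = 2) -> str:
    """Place `name` under nested directories derived from its MD5 digest.

    Each level uses two hex digits of the digest, so an intermediate
    directory can have at most 256 children. The leaf directories are
    not capped; their size depends on how many names hash into them.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Consecutive two-character slices of the digest, one per level.
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return "/".join(parts + [name])


if __name__ == "__main__":
    for n in ("file-000001", "file-000002", "some-directory"):
        print(sharded_path(n))
```

With two levels there are 65,536 leaf directories, so the roughly 6,720,000 entries reported at [0:14] would average about 100 per leaf, assuming the hash spreads names evenly.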
[5:38] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[12:12] <pulsar> sagewk: logs are ready, see query/privmsg
These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.