[1:50] <wido> sagewk: here now
[1:50] <sagewk> wido: ah! sorry i'm just heading out the door
[1:50] <wido> vacation time here at work!
[1:50] <wido> underpowered these weeks
[1:50] <sagewk> wido: wanted to ask more about the high pool count problems you saw on friday... maybe we can talk tomorrow? or you can send an email with more detail?
[1:51] <wido> yeah, sure. we'll talk tomorrow
[1:51] <sagewk> wido: sounds good. don't work too hard!
[1:51] <wido> tnx!
[1:51] <wido> ttyl
[18:01] <wido> sagewk: I'll be online in about 5 hours
[18:11] <gregaf> wido: I think we'll all be around then :)
[22:10] <wido> hi
[22:11] <cmccabe> hi wido, just about to head to lunch
[22:11] <cmccabe> I think sage is coming back from lunch soonish
[22:11] <cmccabe> bbl
[22:13] <wido> ok, I'll hang around a bit
[22:17] * aliguori (~anthony@ has joined #ceph
[22:22] <wido> sagewk: Back from lunch yet?
[22:25] <sagewk> wido: yeah
[22:27] <wido> ah, ok
[22:27] <wido> What I noticed during the last weeks was the fact that the recovery seems to struggle when there are a lot of PGs
[22:27] <wido> We all know that the recovery right now hammers the hell out of the machines
[22:28] <wido> But during some tests I found out that on my Atom platform, when I go over ~10% of its capacity, it starts to struggle
[22:28] <wido> it's 74TB, so that's 7.4TB of data
[22:29] <wido> each OSD then has about 200G on its disk, which really starts to become heavy
[22:29] <wido> If for some reason an OSD goes down, then 90% of the time you're done
[22:30] <wido> I couldn't really get a grip on it, but I kept getting crash after crash. All seemed to be osdmap encode/decode related. Like the osd map changes were following each other too fast
[22:31] <wido> Right now I'm reaching the 10% fill level again, that is almost a recipe for disaster. I had some disk failures, no way I could recover from those. The whole cluster started bouncing around
[22:32] <wido> I'm currently at 8k PGs and that's also the "magic" limit I noticed, everything below 8k goes fine
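[Note: a rough sketch of the arithmetic behind the figures wido quotes above (74TB total, ~10% full, ~200G per OSD, ~8k PGs). It assumes data is spread evenly across OSDs and uses decimal units. The replica count is not stated in the log, so `replicas=2` is purely a hypothetical value, and the function names are made up for illustration.]

```python
def implied_osd_count(total_tb, fill_fraction, per_osd_gb):
    """Estimate how many OSDs hold the data, assuming an even spread."""
    used_gb = total_tb * fill_fraction * 1000  # decimal TB -> GB
    return round(used_gb / per_osd_gb)


def pg_copies_per_osd(pg_count, replicas, osd_count):
    """Average number of PG copies each OSD carries (replicas is an assumption)."""
    return pg_count * replicas / osd_count


osds = implied_osd_count(total_tb=74, fill_fraction=0.10, per_osd_gb=200)
# replicas=2 is hypothetical; the log does not give the replication level.
load = pg_copies_per_osd(pg_count=8000, replicas=2, osd_count=osds)
print(osds, round(load))
```

[Under these assumptions each OSD carries several hundred PG copies, which would all need peering after a failure. That fits Tv's point below about recovery costing far more CPU than steady state.]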
[22:33] <Tv> wido: my understanding is that peering/recovering consumes way more cpu than steady state
[22:33] <sjust> Tv: yeah, but the crashes indicate that we have a more significant problem
[22:33] <Tv> yes
[22:34] <Tv> just saying, that's why Atoms might be problematic *once something crashes*
[22:34] <wido> But the weird thing is, the crashes are so random, I could keep reporting them, but I would then create 5 issues a day
[22:34] <wido> or sometimes even more
[22:34] <sjust> wido: actually, out of memory errors might manifest this way
[22:35] <wido> Sure, I know that recovery is heavy and that Atoms might be a bit underpowered, but imho an OSD should be able to run on an Atom
[22:35] <sjust> wido: yeah, I agree
[22:35] <sagewk> it would crash in decode?
[22:35] <sjust> sagewk: just guessing, but in encode, at least, the bufferlist is expanding
[22:36] <sjust> unlikely to be causing all of this, though
[22:36] <wido> sagewk: No, encode, my bad
[22:36] <sagewk> what kind of crash stack traces are you seeing?
[22:36] <wido> First the encode stack traces, but second a lot of PG state related ones
[22:37] <wido> But what I also noticed, if I start a crashed OSD again, it could take up to 30 minutes before it comes up, for the first 30 minutes it's only scanning its local dir
[22:37] <sagewk> can you pastebin some of them? or stick them in a tracker issue?
[22:37] <sagewk> hmm, that is probably the missing set calculation
[22:38] <wido> I don't have them right now, but since my cluster is reaching the magical 10% again I'll kill some OSDs and it will start bouncing again
[22:39] <sagewk> that'd be great.
[22:39] <wido> sure
[22:40] <sagewk> sjust: i suspect the slow startup is for any pgs that have a backlog.. PG::read_log() scans the log items for anything missing.
[22:43] <wido> during that slow startup the OSD is eating 100% CPU btw
[22:43] <sjust> sagewk: yeah
[22:44] <wido> What I'll do is kill some OSDs and slowly start them up again. I'll keep a close eye on the logs of what they are doing during the slow startup and gather all the stack traces I can
[22:44] <wido> I'll stick to v0.31 btw
[22:45] <sagewk> k
[22:45] <wido> but still, my small cluster with 6 OSDs is still running fine :) No crashes whatsoever. But that's only 50G of data
[22:46] <wido> Oh btw, the bash completion hasn't been merged yet? Something wrong with it?
[22:49] <sagewk> oh yeah. not really.. just some hesitation to enshrine the syntax in another location.
[22:50] <sagewk> probably worth it, unless anyone else objects?
[22:50] <sagewk> brb
[22:50] <wido> get your point, might be useful for new users though, just tab their way through the commands
[22:52] <Tv> i wish there was a standard for "this is my calling convention"
[22:53] <Tv> something where the man page synopsis and bash completion are extracted from the actual source
[22:53] <gregaf> that would be awesome, but scary
[22:53] <Tv> yeah, complex beast
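[Note: a minimal sketch of the idea Tv describes above, where one command table in the source drives both the synopsis text and tab completion. Everything here is hypothetical: `mytool`, the command names, and the helper functions are placeholders for illustration, not the actual ceph syntax or its bash completion script.]

```python
# Hypothetical single source of truth for the command syntax.
PROGRAM = "mytool"
COMMANDS = {
    "disk": {"list": "show known disks", "remove": "drop a disk"},
    "pool": {"create": "make a new pool", "delete": "remove a pool"},
}


def synopsis():
    """Return one synopsis line per command, as a man page would list them."""
    return [
        f"{PROGRAM} {group} {action}"
        for group in sorted(COMMANDS)
        for action in sorted(COMMANDS[group])
    ]


def complete(words):
    """Return completion candidates.

    `words` holds what was typed after the program name; the last entry
    is the partial word under the cursor (possibly an empty string).
    """
    if len(words) <= 1:
        prefix = words[0] if words else ""
        return sorted(g for g in COMMANDS if g.startswith(prefix))
    if len(words) == 2 and words[0] in COMMANDS:
        return sorted(a for a in COMMANDS[words[0]] if a.startswith(words[1]))
    return []


print("\n".join(synopsis()))
print(complete(["disk", ""]))
```

[Because both outputs are derived from the same table, the syntax is written down in one place only, which is the concern sagewk raises about enshrining it in another location.]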
[23:01] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[23:05] <sagewk> wido: applied
[23:05] <wido> sagewk: cool :) Now it's the only docs for some magic commands ;)
[23:05] * aliguori (~anthony@ Quit (Read error: Operation timed out)
[23:06] <wido> I grabbed most of it out of the source code
[23:18] * aliguori (~anthony@ has joined #ceph
[23:29] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph

