#ceph IRC Log


IRC Log for 2011-07-26

Timestamps are in GMT/BST.

[0:19] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[0:20] * lxo (~aoliva@ has joined #ceph
[1:01] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) Quit (Remote host closed the connection)
[1:12] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) has joined #ceph
[1:24] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) Quit (Remote host closed the connection)
[1:29] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[1:31] * Tv (~Tv|work@ip-64-111-111-107.dreamhost.com) Quit (Ping timeout: 480 seconds)
[1:50] <wido> sagewk: here now
[1:50] <sagewk> wido: ah! sorry i'm just heading out the door
[1:50] <wido> vacation time here at work!
[1:50] <wido> under powered these weeks
[1:50] <sagewk> wido: wanted to ask more about the high pool count problems you saw on friday... maybe we can talk tomorrow? or you can send an email with more detail?
[1:51] <wido> yeah, sure. we'll talk tomorrow
[1:51] <sagewk> wido: sounds good. don't work too hard!
[1:51] <wido> tnx!
[1:51] <wido> ttyl
[2:13] * cmccabe (~cmccabe@ Quit (Quit: Leaving.)
[2:27] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[2:44] * pruby (~tim@leibniz.catalyst.net.nz) Quit (Ping timeout: 480 seconds)
[2:50] * Juul (~Juul@3408ds2-vbr.4.fullrate.dk) Quit (Quit: Leaving)
[3:01] * joshd (~joshd@ip-64-111-111-107.dreamhost.com) Quit (Quit: Leaving.)
[3:05] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Remote host closed the connection)
[3:22] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[4:13] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[5:17] * darkfaded (~floh@ Quit (Ping timeout: 480 seconds)
[5:17] * darkfader (~floh@ has joined #ceph
[5:31] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[5:50] * eternaleye_ is now known as eternaleye
[7:33] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[8:17] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[8:35] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) Quit (Remote host closed the connection)
[9:01] * verwilst (~verwilst@dD576F3A5.access.telenet.be) has joined #ceph
[9:24] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[9:37] * votz (~votz@pool-72-78-219-212.phlapa.fios.verizon.net) has joined #ceph
[9:54] * DanielFriesen (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Remote host closed the connection)
[10:30] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[10:58] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[11:23] * yoshi (~yoshi@p4094-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[11:27] * yoshi (~yoshi@p4094-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:21] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[12:23] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) has joined #ceph
[12:24] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[12:43] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[12:56] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[13:09] * andret (~andre@pcandre.nine.ch) Quit (Remote host closed the connection)
[13:17] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[15:43] * lxo (~aoliva@ Quit (Read error: No route to host)
[15:47] * lxo (~aoliva@19NAACRKS.tor-irc.dnsbl.oftc.net) has joined #ceph
[16:47] * phil__ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[16:51] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[17:19] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:52] * Tv (~Tv|work@ip-64-111-111-107.dreamhost.com) has joined #ceph
[18:01] <wido> sagewk: I'll be online in about 5 hours
[18:11] <gregaf> wido: I think we'll all be around then :)
[18:26] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[18:41] * joshd (~joshd@ip-64-111-111-107.dreamhost.com) has joined #ceph
[18:51] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[18:57] * aliguori (~anthony@ has joined #ceph
[19:01] * cmccabe (~cmccabe@ has joined #ceph
[19:19] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * sjust (~sam@ip-64-111-111-107.dreamhost.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * MarkN (~nathan@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * rsharpe (~Adium@70-35-37-146.static.wiline.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * aliguori (~anthony@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * Tv (~Tv|work@ip-64-111-111-107.dreamhost.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * lxo (~aoliva@19NAACRKS.tor-irc.dnsbl.oftc.net) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * votz (~votz@pool-72-78-219-212.phlapa.fios.verizon.net) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * yehudasa (~yehudasa@ip-64-111-111-107.dreamhost.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * iggy (~iggy@theiggy.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * u3q (~ben@jupiter.tspigot.net) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * maswan (maswan@kennedy.acc.umu.se) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * [ack]_ (ANONYMOUS@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * nolan (~nolan@phong.sigbus.net) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * dwm (~dwm@vm-shell4.doc.ic.ac.uk) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * jjchen1 (~jjchen@lo4.cfw-a-gci.greatamerica.corp.yahoo.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * cmccabe (~cmccabe@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * joshd (~joshd@ip-64-111-111-107.dreamhost.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * darkfader (~floh@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * eternaleye (~eternaley@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * Meths (rift@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * sagewk (~sage@ip-64-111-111-107.dreamhost.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * ajm (adam@adam.gs) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * badari (~badari@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * cclien_ (~cclien@ec2-175-41-146-71.ap-southeast-1.compute.amazonaws.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * hachiya (~hachiya@encyclical.net) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * todin_ (tuxadero@kudu.in-berlin.de) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * gregaf (~Adium@ip-64-111-111-107.dreamhost.com) Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * MK_FG (~MK_FG@ Quit (reticulum.oftc.net resistance.oftc.net)
[19:19] * atg (~atg@please.dont.hacktheinter.net) Quit (reticulum.oftc.net resistance.oftc.net)
[19:20] * dwm (~dwm@vm-shell4.doc.ic.ac.uk) has joined #ceph
[19:20] * jjchen1 (~jjchen@lo4.cfw-a-gci.greatamerica.corp.yahoo.com) has joined #ceph
[19:20] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[19:20] * cclien_ (~cclien@ec2-175-41-146-71.ap-southeast-1.compute.amazonaws.com) has joined #ceph
[19:20] * hachiya (~hachiya@encyclical.net) has joined #ceph
[19:20] * todin_ (tuxadero@kudu.in-berlin.de) has joined #ceph
[19:20] * badari (~badari@ has joined #ceph
[19:20] * ajm (adam@adam.gs) has joined #ceph
[19:20] * sagewk (~sage@ip-64-111-111-107.dreamhost.com) has joined #ceph
[19:20] * Meths (rift@ has joined #ceph
[19:20] * eternaleye (~eternaley@ has joined #ceph
[19:20] * darkfader (~floh@ has joined #ceph
[19:20] * joshd (~joshd@ip-64-111-111-107.dreamhost.com) has joined #ceph
[19:20] * cmccabe (~cmccabe@ has joined #ceph
[19:20] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[19:20] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[19:20] * sjust (~sam@ip-64-111-111-107.dreamhost.com) has joined #ceph
[19:20] * MarkN (~nathan@ has joined #ceph
[19:20] * rsharpe (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[19:22] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) has joined #ceph
[19:22] * gregaf (~Adium@ip-64-111-111-107.dreamhost.com) has joined #ceph
[19:22] * MK_FG (~MK_FG@ has joined #ceph
[19:22] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[19:24] * aliguori (~anthony@ has joined #ceph
[19:24] * Tv (~Tv|work@ip-64-111-111-107.dreamhost.com) has joined #ceph
[19:24] * lxo (~aoliva@19NAACRKS.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:24] * votz (~votz@pool-72-78-219-212.phlapa.fios.verizon.net) has joined #ceph
[19:24] * yehudasa (~yehudasa@ip-64-111-111-107.dreamhost.com) has joined #ceph
[19:24] * iggy (~iggy@theiggy.com) has joined #ceph
[19:24] * u3q (~ben@jupiter.tspigot.net) has joined #ceph
[19:24] * maswan (maswan@kennedy.acc.umu.se) has joined #ceph
[19:24] * [ack]_ (ANONYMOUS@ has joined #ceph
[19:37] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[19:57] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[20:01] * Dantman (~dantman@ has joined #ceph
[20:34] * verwilst (~verwilst@dD576F3A5.access.telenet.be) Quit (Quit: Ex-Chat)
[21:55] * todin_ (tuxadero@kudu.in-berlin.de) Quit (Read error: Connection reset by peer)
[21:57] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[22:05] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[22:10] <wido> hi
[22:11] <cmccabe> hi wido, just about to head to lunch
[22:11] <cmccabe> I think sage is coming back from lunch soonish
[22:11] <cmccabe> bbl
[22:13] <wido> ok, I'll hang around a bit
[22:17] * aliguori (~anthony@ has joined #ceph
[22:22] <wido> sagewk: Back from lunch yet?
[22:25] <sagewk> wido: yeah
[22:27] <wido> ah, ok
[22:27] <wido> What I noticed over the last few weeks is that recovery seems to struggle when there are a lot of PG's
[22:27] <wido> We all know that the recovery right now hammers the hell out of the machines
[22:28] <wido> But during some tests I found out that on my Atom platform, when I go over ~10% of its capacity, it starts to struggle
[22:28] <wido> it's 74TB, so that's 7.4TB of data
[22:29] <wido> each OSD then has about 200G on its disk, which really starts to become heavy
[22:29] <wido> If for some reason an OSD goes down, then 90% of the time you're done
[22:30] <wido> I couldn't really get a grip on it, but I kept getting crash after crash. All seemed to be osdmap encode/decode related. Like the osd map changes were following each other too fast
[22:31] <wido> Right now I'm reaching the 10% filling again, that is almost a recipe for disaster. I had some disk failures, no way I could recover from those. The whole cluster started bouncing around
[22:32] <wido> I'm currently at 8k PG's and that's also the "magic" limit I noticed, everything below 8k goes fine
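The figures wido quotes above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: the OSD count and the PGs-per-OSD ratio are not stated in the log, they are derived here under the assumption that the 7.4TB is spread evenly, that "200G" and "TB" are decimal units, and that "8k" means roughly 8000. All names are hypothetical.

```python
# Figures quoted in the conversation.
TOTAL_CAPACITY_TB = 74   # raw cluster capacity
FILL_PERCENT = 10        # fill level at which trouble starts
PER_OSD_GB = 200         # approximate data per OSD at that fill level
PG_COUNT = 8000          # "8k" PGs, taken as a round number (assumption)


def data_at_fill(total_tb, percent):
    """Amount of data (TB) stored when the cluster is `percent` full."""
    return total_tb * percent / 100


def implied_osd_count(data_tb, per_osd_gb):
    """OSD count implied by an even spread of data (decimal units assumed)."""
    return round(data_tb * 1000 / per_osd_gb)


data_tb = data_at_fill(TOTAL_CAPACITY_TB, FILL_PERCENT)
osds = implied_osd_count(data_tb, PER_OSD_GB)
pgs_per_osd = PG_COUNT / osds

print(f"data at {FILL_PERCENT}% fill: {data_tb} TB")
print(f"implied OSD count: {osds}")
print(f"PGs per OSD (before replication): {pgs_per_osd:.0f}")
```

Under those assumptions the numbers are consistent with a cluster of a few dozen OSDs, each carrying on the order of a couple of hundred PGs before replicas are counted.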
[22:33] <Tv> wido: my understanding is that peering/recovering consumes way more cpu than steady state
[22:33] <sjust> Tv: yeah, but the crashes indicate that we have a more significant problem
[22:33] <Tv> yes
[22:34] <Tv> just saying, that's why Atoms might be problematic *once something crashes*
[22:34] <wido> But the weird thing is, the crashes are so random, I could keep reporting them, but I would then create 5 issues a day
[22:34] <wido> or sometimes even more
[22:34] <sjust> wido: actually, out of memory errors might manifest this way
[22:35] <wido> Sure, I know that recovery is heavy and that Atoms might be a bit underpowered, but imho an OSD should be able to run on an Atom
[22:35] <sjust> wido: yeah, I agree
[22:35] <sagewk> it would crash in decode?
[22:35] <sjust> sagewk: just guessing, but in encode, at least, the bufferlist is expanding
[22:36] <sjust> unlikely to be causing all of this, though
[22:36] <wido> sagewk: No, encode, my bad
[22:36] <sagewk> what kind of crash stack traces are you seeing?
[22:36] <wido> First the encode stack traces, but second a lot of PG state related
[22:37] <wido> But what I also noticed, if I start a crashed OSD again, it could take up to 30 minutes before it comes up, for the first 30 minutes it's only scanning its local dir
[22:37] <sagewk> can you pastebin some of them? or stick them in a tracker issue?
[22:37] <sagewk> hmm, that is probably the missing set calculation
[22:38] <wido> I don't have them right now, but since my cluster is reaching the magical 10% again I'll kill some OSD's and it will start bouncing again
[22:39] <sagewk> that'd be great.
[22:39] <wido> sure
[22:40] <sagewk> sjust: i suspect the slow startup is for any pgs that have a backlog.. PG::read_log() scans the log items for anything missing.
[22:43] <wido> during that slow startup the OSD is eating 100% CPU btw
[22:43] <sjust> sagewk: yeah
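The slow startup sagewk describes, where the log is scanned for anything missing, can be illustrated with a toy model. This is a minimal sketch and not Ceph's actual `PG::read_log()` code: `LogEntry`, `compute_missing`, and the `local_versions` dictionary (standing in for a per-object lookup on the OSD's disk) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """One entry in a (hypothetical) per-PG operation log."""
    oid: str
    version: int
    is_delete: bool = False


def compute_missing(log, local_versions):
    """Return {oid: needed_version} for objects whose local copy is stale.

    `log` is ordered oldest to newest. `local_versions` maps object id to the
    version held locally; in a real OSD each lookup would be a disk access,
    which is why a long log makes this scan expensive.
    """
    missing = {}
    seen = set()
    for entry in reversed(log):      # newest first: only the latest entry per object matters
        if entry.oid in seen:
            continue
        seen.add(entry.oid)
        if entry.is_delete:          # a deleted object cannot be missing
            continue
        have = local_versions.get(entry.oid)
        if have is None or have < entry.version:
            missing[entry.oid] = entry.version
    return missing


log = [
    LogEntry("a", 1),
    LogEntry("b", 2),
    LogEntry("a", 3),
    LogEntry("c", 4, is_delete=True),
]
local_versions = {"a": 1, "b": 2}
print(compute_missing(log, local_versions))
```

In this model the cost grows with the number of log entries times the cost of one local lookup, which fits the observation that an OSD holding many PGs with a backlog spends a long time at 100% CPU before it comes up.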
[22:44] <wido> What I'll do is kill some OSD's and slowly start them up again. I'll keep a close eye on the logs of what they are doing during the slow startup and gather all the stack traces I can
[22:44] <wido> I'll stick to v0.31 btw
[22:45] <sagewk> k
[22:45] <wido> but still, my small cluster with 6 OSD's is still running fine :) No crashes whatsoever. But that's only 50G of data
[22:46] <wido> Oh btw, the bash completion hasn't been merged yet? Something wrong with it?
[22:49] <sagewk> oh yeah. not really.. just some hesitation to enshrine the syntax in another location.
[22:50] <sagewk> probably worth it, unless anyone else objects?
[22:50] <sagewk> brb
[22:50] <wido> get your point, might be useful for new users though, just tab their way through the commands
[22:52] <Tv> i wish there was a standard for "this is my calling convention"
[22:53] <Tv> something where the man page synopsis and bash completion are extracted from the actual source
[22:53] <gregaf> that would be awesome, but scary
[22:53] <Tv> yeah, complex beast
[23:01] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[23:05] <sagewk> wido: applied
[23:05] <wido> sagewk: cool :) Now it's the only docs for some magic commands ;)
[23:05] * aliguori (~anthony@ Quit (Read error: Operation timed out)
[23:06] <wido> I grabbed most of it out of the source code
[23:18] * aliguori (~anthony@ has joined #ceph
[23:29] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.