#ceph IRC Log


IRC Log for 2013-06-06

Timestamps are in GMT/BST.

[0:05] * yan (~Adium@75-101-14-52.static.sonic.net) has joined #ceph
[0:06] * yan (~Adium@75-101-14-52.static.sonic.net) has left #ceph
[0:06] * yan (~Adium@75-101-14-52.static.sonic.net) has joined #ceph
[0:06] * jfriedly_ (~jfriedly@50-0-250-146.dedicated.static.sonic.net) has joined #ceph
[0:06] * yan (~Adium@75-101-14-52.static.sonic.net) has left #ceph
[0:07] * Tamil (~tamil@ Quit (Quit: Leaving.)
[0:08] * jfriedly (~jfriedly@50-0-250-146.dedicated.static.sonic.net) Quit (Ping timeout: 480 seconds)
[0:08] * redeemed (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) Quit (Quit: bia)
[0:17] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:19] * Tamil (~tamil@ has joined #ceph
[0:24] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[0:24] * PerlStalker (~PerlStalk@ Quit (Quit: ...)
[0:25] * mschiff (~mschiff@ Quit (Remote host closed the connection)
[0:29] * BillK (~BillK@124-148-124-185.dyn.iinet.net.au) has joined #ceph
[0:31] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Read error: Operation timed out)
[0:33] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[0:36] * portante (~user@ Quit (Ping timeout: 480 seconds)
[0:40] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[0:44] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[0:48] * markl (~mark@tpsit.com) Quit (Ping timeout: 480 seconds)
[0:56] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[1:04] * markl (~mark@tpsit.com) has joined #ceph
[1:06] * jfriedly_ (~jfriedly@50-0-250-146.dedicated.static.sonic.net) Quit (Remote host closed the connection)
[1:06] * jfriedly (~jfriedly@50-0-250-146.dedicated.static.sonic.net) has joined #ceph
[1:14] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Read error: Connection reset by peer)
[1:15] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[1:15] * ChanServ sets mode +o scuttlemonkey
[1:21] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[1:22] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[1:25] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[1:26] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[1:27] <sage> paravoid: there?
[1:28] * loicd (~loic@brln-4db819d8.pool.mediaWays.net) Quit (Quit: Leaving.)
[1:31] * kyle__ (~kyle@ Quit (Quit: Leaving)
[1:34] * BManojlovic (~steki@fo-d- Quit (Quit: I'm off, you do whatever you want...)
[1:37] <paravoid> sage: here now
[1:37] <paravoid> saw the mail, replied to the report
[1:37] <sage> k cool
[1:37] <paravoid> sage: so I just upgraded one box
[1:37] <paravoid> I have 12 of them, so I can repeat the test
[1:38] * diegows (~diegows@ has joined #ceph
[1:39] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[1:44] <sage> brb
[1:45] <paravoid> huh
[1:50] <joao> paravoid, still around?
[1:50] <paravoid> okay, another strange thing put in the bug report
[1:50] <paravoid> yes :)
[1:50] <joao> paravoid, do you have the log for mon.2?
[1:50] <paravoid> I have no mon.2
[1:50] <joao> i.e., whatever monitor is ranked 2
[1:50] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[1:51] <paravoid> that would be ms-be1005, let me get that for you
[1:51] <joao> thanks
[1:51] <paravoid> anything you're looking for specifically?
[1:51] <paravoid> or do you want the whole log?
[1:52] * rturk is now known as rturk-away
[1:53] <joao> I'm interested in the portion from 2013-06-05 18:25:32.458023 until slightly past 2013-06-05 18:25:33.071758
[1:53] <joao> okay, say, 18:25 up to 18:26 or something :p
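The request above (cut the 18:25 to 18:26 window out of a monitor log) can be sketched in Python. This is only an illustration: it assumes each log entry begins with a timestamp in the same form joao quotes (`2013-06-05 18:25:32.458023`); the function name and the sample lines are hypothetical and not taken from the real log.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_WIDTH = 26  # length of a stamp such as "2013-06-05 18:25:32.458023"


def lines_in_window(lines, start, end):
    """Yield the log lines whose leading timestamp lies within [start, end]."""
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TS_WIDTH], TS_FORMAT)
        except ValueError:
            # Lines without a leading timestamp (e.g. continuation lines) are skipped.
            continue
        if start <= stamp <= end:
            yield line


# Hypothetical sample lines, only to show the call.
sample = [
    "2013-06-05 18:24:59.000000 before the window",
    "2013-06-05 18:25:32.458023 inside the window",
    "   continuation line without a timestamp",
    "2013-06-05 18:26:30.000000 after the window",
]
window = list(lines_in_window(
    sample,
    datetime(2013, 6, 5, 18, 25, 0),
    datetime(2013, 6, 5, 18, 26, 0),
))
```

Skipping unstamped lines keeps the sketch short; a real extraction would probably want to keep continuation lines that follow a matching entry.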
[1:54] <paravoid> http://p.defau.lt/?0uIyz78_wLMxIrWGd0Y6uQ
[1:54] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[1:54] <paravoid> more context?
[1:58] <paravoid> joao: do you need more context than the one above?
[1:58] <joao> paravoid, sorry, yeah, if you don't mind
[1:59] <joao> although I'm assuming you have default log levels in place?
[1:59] <paravoid> yes
[1:59] <joao> yeah, I'll take a bit more context just in case, but it might not give me the answers I'm looking for :\
[2:00] <paravoid> what are you trying to figure out?
[2:00] <paravoid> why it crashed?
[2:00] <paravoid> I think there are multiple bugs here
[2:00] <paravoid> one is the initial crash, the other one is that conversion isn't resuming or restarting
[2:01] <paravoid> the third one is that after recognizing that there's a half-baked conversion, it still tries to operate without a monmap and then tries to add itself to an empty monmap
[2:01] <paravoid> aiui, you would know better :)
[2:02] <paravoid> the fourth one is that ceph-create-keys is executing mon_status on the asok and that doesn't exist anymore?
[2:02] <joao> paravoid, there's a couple of things I need to look into, and one of them I already figured why it's happening but need to audit the code to see if it's happening somewhere else too
[2:03] <joao> apparently, when we abort a sync we will clean up the store, just as intended, to avoid inconsistencies; however, we're also removing our backup monmap
[2:03] <joao> which isn't a problem if the monitor doesn't crash
[2:03] <joao> so that one is the one I have figured out
[2:04] <joao> now, the whole conversion thing is weird
[2:04] <joao> I would assume that the monitor did convert prior to synchronizing, no?
[2:04] <paravoid> no idea
[2:05] <joao> well, regardless, the bug here is that if there's an aborted conversion the monitor shouldn't even start
[2:05] <paravoid> my vague recollection of the events is that I upgraded ms-fe1001, then ms-fe1003, then ms-be1005, then ms-fe1003 crashed and was unable to recover
[2:06] <paravoid> that's one of the bugs, right?
[2:06] <paravoid> the third one that I listed above
[2:07] <joao> yes, the third you listed is a bug; the second is by design
[2:07] <paravoid> aha
[2:07] <joao> if a conversion fails, it should require user intervention
[2:07] <paravoid> okay
[2:07] <paravoid> and what would that intervention be? :)
[2:08] <joao> and by that we mean, rerun the monitor with higher debug levels, stash the log
[2:08] <paravoid> I mean, it's non-obvious to me on how to restart that conversion
[2:08] <joao> mv mon_data_dir/store.db mon_data_dir/store.db.bak
[2:08] <joao> rerun the monitor
[2:08] <paravoid> I tried that
[2:08] <paravoid> it failed
[2:08] <paravoid> let me retry
[2:08] <paravoid> to give you more info than "it failed" :)
[2:09] <joao> paravoid, this time increase the debug level to 20
[2:09] <joao> debug mon = 20
[2:09] <joao> if the conversion is failing we need to know why
[2:10] <joao> and I'm off to the kitchen for ~5-10 minutes
[2:10] <joao> brb
[2:10] <paravoid> # /usr/bin/ceph-mon --cluster=ceph -i ms-fe1003 -f --debug-mon=20
[2:10] <paravoid> Invalid argument: /var/lib/ceph/mon/ceph-ms-fe1003/store.db: does not exist (create_if_missing is false)
[2:11] <paravoid> oh wait
[2:11] <paravoid> it converts :)
[2:11] <paravoid> slowly but it seems to be doing work
[2:11] * The_Bishop (~bishop@2001:470:50b6:0:6473:bc33:4957:72d) Quit (Quit: Who the hell is this Peer? If I catch him, I'll reset his connection!)
[2:13] * iggy_ (~iggy@theiggy.com) Quit (Quit: leaving)
[2:14] * iggy (~iggy@theiggy.com) Quit (Remote host closed the connection)
[2:16] * yehuda_hm (~yehuda@2602:306:330b:1410:baac:6fff:fec5:2aad) Quit (Ping timeout: 480 seconds)
[2:18] <joao> paravoid, store conversion can be slow for large stores
[2:18] <paravoid> yep, it worked
[2:18] <paravoid> it's done just now
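The retry that worked here (move the half-converted `store.db` aside, then rerun the monitor in the foreground with `debug mon = 20`) can be written down as a small Python helper that only builds the two command lines and does not execute anything. The paths and flags are the ones paravoid and joao typed above; the helper name and its defaults are hypothetical, and the data-directory layout `<mon_root>/<cluster>-<id>` is assumed from the single path shown in the log.

```python
import posixpath


def conversion_retry_plan(mon_id, cluster="ceph",
                          mon_root="/var/lib/ceph/mon", debug_mon=20):
    """Return the commands for retrying a failed monitor store conversion.

    Nothing is executed; the caller decides whether and how to run them.
    """
    data_dir = posixpath.join(mon_root, "%s-%s" % (cluster, mon_id))
    store = posixpath.join(data_dir, "store.db")
    return [
        # Step 1: set the partially converted store aside instead of deleting it.
        ["mv", store, store + ".bak"],
        # Step 2: rerun the monitor in the foreground with verbose mon logging.
        ["/usr/bin/ceph-mon", "--cluster=%s" % cluster, "-i", mon_id,
         "-f", "--debug-mon=%d" % debug_mon],
    ]


plan = conversion_retry_plan("ms-fe1003")
for command in plan:
    print(" ".join(command))
```

Keeping the old store as `store.db.bak` follows joao's `mv` suggestion, so the failed state stays available for debugging.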
[2:19] <joao> now to see why it progressed despite finding an aborted conversion before tackling the rest
[2:21] <paravoid> I have a HEALTH_WARN, with nothing obvious in ceph -s
[2:22] <paravoid> could this be because of
[2:22] <paravoid> 2013-06-06 00:18:38.291303 mon.2 41 : [WRN] reached concerning levels of available space on data store (9% free)
[2:22] * LeaChim (~LeaChim@ Quit (Ping timeout: 480 seconds)
[2:22] <joao> yeah
[2:22] <paravoid> damn
[2:22] <joao> that will do it
[2:22] <joao> paravoid, 'ceph health detail' will tell you more on what's happening
[2:22] <joao> but yeah, that will do it
[2:23] <joao> (I'm repeating myself, always a sign of being time for bed)
[2:23] * wigly (~tra26@tux64-13.cs.drexel.edu) has joined #ceph
[2:24] <paravoid> sage: the out of order osdmaps could be because of 5256
[2:24] <paravoid> also see above
[2:24] <paravoid> this happened at basically the same time
[2:24] * iggy (~iggy@theiggy.com) has joined #ceph
[2:24] <paravoid> one of the monitors is on a box with osds which happens to be the box in 5257
[2:25] <wigly> Is there something I should do if I get the following in the kernel logs (of a client): libceph: get_reply unknown tid 1167414 from osd7?
[2:26] <sage> wigly: nope, ignore it. we turned down the volume on that a while back, iirc.
[2:28] <wigly> sage: even if I have a few hundred of those?
[2:29] <wigly> Turned on a few more services to hit rbds and those started + slow requests
[2:31] <joao> paravoid, do you have any recollection on why the conversion was aborted initially?
[2:31] <joao> was it the crash?
[2:31] <paravoid> I think so
[2:32] <sage> which crash?
[2:32] <paravoid> but too many things happening at the same time
[2:32] <joao> the crash should only have happened way after the conversion had finished
[2:32] <joao> sage, a sync crash, which makes little sense to happen during or before conversion has finished
[2:32] <sage> yeah
[2:33] <joao> maybe the store was in fact converted but somehow we didn't clean up the flags?
[2:33] <joao> that would be a new thing altogether
[2:33] <paravoid> it had quorum for a while
[2:33] * joao checks
[2:33] <paravoid> with that box
[2:33] <paravoid> so I'm guessing conversion did finish
[2:34] <paravoid> I upgraded ms-fe1001, then ms-fe1003 (the 5256 crashing one), then those two were now cuttlefish and established quorum
[2:34] <paravoid> the cluster was responsive at the time
[2:34] <paravoid> then I upgraded ms-be1005
[2:34] <paravoid> and then it established quorum with ms-fe1001 and ms-fe1003 crashed
[2:34] <paravoid> that's my recollection of the events
[2:34] <cjh_> i think i'm doing something wrong with export diff. each export i do shows only 64k of changes
[2:35] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[2:35] <paravoid> also do keep in mind that ms-be1005 also had osds that were restarted and this had the effect seen on 5257, in which sage correctly pointed out osdmap out of order epochs
[2:36] * yehuda_hm (~yehuda@2602:306:330b:1410:baac:6fff:fec5:2aad) has joined #ceph
[2:36] <sage> do you have a full mon log?
[2:37] * elder (~elder@ has joined #ceph
[2:37] <sage> paravoid: ^
[2:37] <paravoid> ms-fe1003's is on cephdrop
[2:37] <sage> k
[2:37] <paravoid> as 5256.log.gz
[2:37] <paravoid> joao has been investigating it for another bug for a while
[2:38] <paravoid> do you want the rest?
[2:38] <paravoid> the other mons that is
[2:39] <joao> fwiw, the on-disk flags marking an on-going conversion are indeed removed after conversion finished; that hasn't changed
[2:40] <sage> did your mons have debug levels higher?
[2:40] <paravoid> I'm afraid not
[2:41] * wigly (~tra26@tux64-13.cs.drexel.edu) has left #ceph
[2:41] <paravoid> I always have interesting bugs, haven't I
[2:42] <sage> yeah seriously
[2:42] <sage> well, ok. i think you're right that the osd down thing was related to the osdmap freakout. we do need to get to the bottom of that, though. can you upload your set of mon logs so we can piece together a timeline?
[2:43] * Tamil (~tamil@ Quit (Quit: Leaving.)
[2:43] <paravoid> ok
[2:43] <paravoid> I didn't expect this to go this bad, so I didn't record a timestamped log of my actions nor increased debug levels
[2:44] <paravoid> in retrospect, silly of me
[2:45] <sage> no worries. thanks-
[2:47] <sage> paravoid: have to run.. if you attach the logs to the bug or cephdrop them i will dig in tonight. delaying 0.61.3 package upload until we understand what that is all about...
[2:49] <paravoid> 5256-ceph-mon.ms-be1005.log.gz 5256-ceph-mon.ms-fe1001.log.gz 5256-ceph-mon.ms-fe1003.log.gz
[2:49] <paravoid> @ cephdrop
[2:49] <cephalobot`> paravoid: Error: "cephdrop" is not a valid command.
[2:49] <paravoid> hah
[2:50] <dmick> do not anger cephalobot :)
[2:50] <joao> well, I'm off to bed
[2:50] <joao> I'll look into this again tomorrow
[2:50] <paravoid> joao: not at PDT?
[2:50] * elder (~elder@ Quit (Quit: Leaving)
[2:50] <joao> paravoid, GMT
[2:50] <paravoid> GMT? is anyone at GMT this time of the year?
[2:50] <joao> GMT-ish
[2:50] <paravoid> heh, okay
[2:51] <paravoid> UTC+3 here, another late night chasing bugs :)
[2:51] <joao> okay, according to google I'm on WEST (UTC+1)
[2:51] <joao> go figure, TIL
[2:57] * The_Bishop (~bishop@f052096139.adsl.alicedsl.de) has joined #ceph
[3:07] * iggy_ (~iggy@theiggy.com) has joined #ceph
[3:16] <sage> paravoid: did mons get upgraded on this second box?
[3:17] <paravoid> I posted a timeline on the 5256 bug
[3:21] <paravoid> sage: basically 5256 == 5257 as I see it and now we're posting on both bug reports :)
[3:24] <paravoid> on an unrelated note, pg upgrade on my 2T disks takes ~50mins
[3:24] <paravoid> more actually, it still hasn't finished
[3:28] * frank9999 (~frank@kantoor.transip.nl) Quit (Read error: Connection reset by peer)
[3:30] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) Quit (Quit: Leaving.)
[3:31] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[3:34] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) has joined #ceph
[3:36] <paravoid> -26/822691943 degraded (-0.000%)
[3:36] * redeemed (~redeemed@cpe-192-136-224-78.tx.res.rr.com) has joined #ceph
[3:36] <paravoid> that's... interesting:)
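The odd-looking "-0.000%" is simply what three-decimal formatting does to a tiny negative ratio. A short Python sketch of the arithmetic, using the two numbers from the status line above; the function is hypothetical and is not Ceph's own formatting code, and the log does not say why the degraded counter went negative in the first place.

```python
def degraded_summary(degraded, total):
    """Format a degraded-object count the way the status line above appears."""
    percent = 100.0 * degraded / total
    # A value of about -0.000003 rounds to zero at three decimals but keeps its sign.
    return "%d/%d degraded (%.3f%%)" % (degraded, total, percent)


summary = degraded_summary(-26, 822691943)
print(summary)
```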
[3:40] * stass (stas@ssh.deglitch.com) Quit (Ping timeout: 480 seconds)
[3:45] * Midnightmyth (~IceChat9@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[3:45] * stass (stas@ssh.deglitch.com) has joined #ceph
[3:46] <Midnightmyth> Hi, I'm totally new to Ceph, but I do have some input on the development of Ceph. Can someone tell me how to comment on this?
[3:48] <Midnightmyth> hmm really no one online?
[3:50] <Cube> Midnightmyth: The mailing list might be a good place to start, have you joined ceph-devel?
[3:51] <Midnightmyth> No I have not, I never used a mailing list in my life actually
[3:52] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[3:52] <Midnightmyth> Does the mailing list also concern proposals like this one http://wiki.ceph.com/01Planning/02Blueprints/Dumpling/Erasure_encoding_as_a_storage_backend
[3:52] <Midnightmyth> Or should I comment to the author instead?
[3:53] <Cube> Mailing list would work
[3:53] <paravoid> sage: here?
[3:54] <Midnightmyth> Because the software suggested for use in that erasure code proposal would be a really, really bad mistake, can't stress that enough.
[3:54] <Midnightmyth> But I guess I can put it all in a mail
[3:55] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) Quit (Quit: Leaving.)
[3:59] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[4:21] * Midnightmyth (~IceChat9@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[4:40] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) has joined #ceph
[4:42] * redeemed (~redeemed@cpe-192-136-224-78.tx.res.rr.com) Quit (Quit: bia)
[4:44] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[5:02] * jfriedly (~jfriedly@50-0-250-146.dedicated.static.sonic.net) Quit (Ping timeout: 480 seconds)
[5:02] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[5:07] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) Quit (Quit: Leaving.)
[5:09] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[5:10] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[5:12] * jfriedly (~jfriedly@c-24-6-102-119.hsd1.ca.comcast.net) has joined #ceph
[5:16] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) Quit (Quit: Leaving.)
[5:34] * jfriedly (~jfriedly@c-24-6-102-119.hsd1.ca.comcast.net) Quit (Quit: leaving)
[5:34] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Read error: Operation timed out)
[5:45] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[5:47] <sage> paravoid: weird, but harmless.
[5:58] * Vanony_ (~vovo@i59F7A451.versanet.de) has joined #ceph
[6:05] * Vanony (~vovo@i59F7A9B1.versanet.de) Quit (Ping timeout: 480 seconds)
[6:15] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[6:27] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[6:28] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[7:22] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 482 seconds)
[7:31] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[7:35] * codice (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) Quit (Remote host closed the connection)
[7:41] * loicd (~loic@brln-4db819d8.pool.mediaWays.net) has joined #ceph
[7:46] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:47] * fmarchand (~fmarchand@a.clients.kiwiirc.com) has joined #ceph
[7:51] * bergerx_ (~bekir@ has joined #ceph
[7:51] * tnt (~tnt@ has joined #ceph
[8:00] * BillK (~BillK@124-148-124-185.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[8:09] * BillK (~BillK@124-169-216-2.dyn.iinet.net.au) has joined #ceph
[8:13] * codice (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) has joined #ceph
[8:26] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[8:32] * capri (~capri@ has joined #ceph
[8:36] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[8:54] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[8:54] * zindello (~zindello@ has joined #ceph
[8:56] <zindello> Evening
[9:00] * rongze1 (~zhu@ has joined #ceph
[9:03] * rongze (~zhu@ Quit (Ping timeout: 480 seconds)
[9:05] * rongze (~zhu@ has joined #ceph
[9:09] * rongze1 (~zhu@ Quit (Ping timeout: 480 seconds)
[9:11] * athrift_ (~nz_monkey@ Quit (Remote host closed the connection)
[9:12] * Muhlemmer (~kvirc@cable-88-137.zeelandnet.nl) has joined #ceph
[9:12] * athrift (~nz_monkey@ has joined #ceph
[9:15] * mabeki (~makiefer@2001:8d8:1fe:301:a2b3:ccff:fef7:21d5) has joined #ceph
[9:15] <Muhlemmer> g'day
[9:15] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[9:16] <Muhlemmer> I am looking into CEPH training from inktank. (CEPH-100,110 & 120).
[9:17] <Muhlemmer> where is this training performed? US or also somewhere in Europe?
[9:17] * mabeki (~makiefer@2001:8d8:1fe:301:a2b3:ccff:fef7:21d5) Quit (Remote host closed the connection)
[9:17] <Muhlemmer> and what does it (approx) cost?
[9:18] <Muhlemmer> Kind of want to know to design a business plan
[9:18] * rongze1 (~zhu@ has joined #ceph
[9:19] * mabeki (~makiefer@2001:8d8:1fe:301:a2b3:ccff:fef7:21d5) has joined #ceph
[9:19] * eschnou (~eschnou@ has joined #ceph
[9:23] <zindello> Anyone around? I'm trying to get a testing cluster set up on CentOS 6.4 and I'm having a hell of a time using the ceph-deploy tool
[9:23] * rongze (~zhu@ Quit (Ping timeout: 480 seconds)
[9:28] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:29] * leseb (~Adium@ has joined #ceph
[9:32] * frank9999 (~frank@kantoor.transip.nl) has joined #ceph
[9:32] * BManojlovic (~steki@ has joined #ceph
[9:34] <zindello> Anyone?
[9:35] <Gugge-47527> zindello: ask something, and maybe someone will answer.
[9:35] <ofu> zindello: yes, the path is wrong
[9:36] <wogri_risc> heh :) thought the same...
[9:36] <ofu> I tried to document everything last week... http://oliver.fuckner.net/index.php?/archives/263-ceph-Testinstallation,-Teil1.html
[9:36] <ofu> correct path would be:
[9:36] <ofu> if os.path.exists('/usr/lib/python2.6/site-packages/ceph_deploy'): sys.path.insert(0,'/usr/lib/python2.6/site-packages/ceph_deploy')
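ofu's one-liner relies on `os` and `sys` already being imported in the script it is pasted into. A self-contained sketch of the same idea follows; the directory is the one ofu gives, while the helper name and the duplicate check are additions made here for illustration and are not part of ceph-deploy.

```python
import os
import sys

# Path quoted by ofu for a CentOS 6 install; adjust for other systems.
CEPH_DEPLOY_DIR = "/usr/lib/python2.6/site-packages/ceph_deploy"


def prepend_if_present(path, search_path=None):
    """Put `path` at the front of the module search path if it exists on disk.

    Returns True when the path was inserted, False otherwise.
    """
    if search_path is None:
        search_path = sys.path
    if os.path.exists(path) and path not in search_path:
        search_path.insert(0, path)
        return True
    return False


prepend_if_present(CEPH_DEPLOY_DIR)
```

On a machine without that directory the call is a no-op, which matches the guarded behaviour of the original line.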
[9:37] <zindello> Ok
[9:37] <wogri_risc> ofu, I don't think that 'ceph is poorly documented' - maybe not so well for CentOS/Redhat.
[9:38] <ofu> ok, i am new to ceph, perhaps i dont know the right places to look
[9:38] <zindello> Well, I downloaded and installed the ceph-deploy package onto my local workstation at work (Fedora 15) and followed the instructions as per the preflight-checklist
[9:38] <zindello> Then prepped my 3 nodes with CentOS 6.4, updated and EPEL installed
[9:38] <zindello> Disabled selinux and rebooted, disabled iptables
[9:39] <zindello> I created a cluster with 3 hosts, and then created all 3 of those as monitors
[9:40] <zindello> Now, here's the tricky part. On two of those nodes, that worked ok, on one of them it's failing, ceph is starting and the ceph-create-keys process is hanging around in the background. If I kill it and run it manually it's giving me an error about not being in quorum
[9:41] <ofu> you forgot ceph gatherkeys?
[9:41] <zindello> I've tried all manner of things, including completely blowing away the ceph install on that node and re-adding it using ceph-deploy command and still getting the same error
[9:41] <zindello> The gatherkeys is meant to be run after adding the mons isn't it
[9:41] <zindello> ?*
[9:41] <zindello> At least according to the instructions ..
[9:43] <ofu> have your keyrings been placed everywhere?
[9:44] <zindello> File structure on the failing server is the same as the working server
[9:45] <zindello> Keys all *appear* to be ok
[9:47] <ofu> try gatherkeys -v
[9:47] <zindello> Furthermore - if I run the ceph command line util on the server with the problem, I can query the status, but it's not in quorum
[9:48] <wogri_risc> I am not familiar with the new ceph-deploy tool, but if your mon's aren't running yet it won't be in quorum.
[9:49] <zindello> The mon is running on the box, or at least it should be. I'm starting it with /etc/init.d/ceph start
[9:49] <zindello> It's listed as a mon in the config file
[9:49] <wogri_risc> how many nodes with mon's you say you have?
[9:49] <wogri_risc> it should also be running as the process named 'ceph-mon'
[9:49] <zindello> 3 - 2 are working and in quorum. One isn't
[9:50] <zindello> Mmm, hang 2 I'm rebooting the failing box at the moment. The working boxes do have the ceph-mon process running
[9:50] <zindello> (Dell Servers take about 5 minutes to reboot)
[9:51] * LeaChim (~LeaChim@ has joined #ceph
[9:52] <zindello> Ok, rebooted
[9:52] <zindello> Yep, ceph-mon is running
[9:52] <fmarchand> Hi !
[9:53] <wogri_risc> check your iptables -vnL output, zindello.
[9:53] <zindello> Yeah, iptables is disabled, as is selinux
[9:54] <fmarchand> I have a question ... why when I delete a file through radosgw (http delete) ... the file seems to be removed ... but does not seem to release disk space in my ceph cluster ... is that normal ?
[9:54] <wogri_risc> I know you said that, but I think you should double-check.
[9:54] <wogri_risc> just to be sure :)
[9:55] <wogri_risc> yes, fmarchand, it is.
[9:55] <zindello> Yeah, I checked anyway, it's definitely dead
[9:55] <wogri_risc> it will be freed when the system thinks it's a good time to do this.
[9:56] <zindello> On the failing node I'm seeing errors like
[9:56] <zindello> >> pipe(0x1d20500 sd=21 :0 s=1 pgs=0 cs=0 l=0).fault
[9:56] <zindello> On startup
[9:56] <fmarchand> wogri:oh so what do I have to do to definitely  remove it
[9:56] <fmarchand> ?
[9:57] <wogri_risc> nothing, fmarchand. wait.
[9:57] <fmarchand> wogri: gc ?
[9:58] * itamar_ (~itamar@ has joined #ceph
[9:58] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[9:58] <wogri_risc> fmarchand - don't know what gc stands for
[9:58] <fmarchand> wogri: garbage collector. how long should I wait ?
[9:59] <wogri_risc> I don't know. inktank folks will do. you should probably not worry about this.
[9:59] * ScOut3R (~ScOut3R@ has joined #ceph
[9:59] <wogri_risc> zindello: no idea what that error means. mailing list might help.
[10:07] <zindello> Mmm
[10:08] <zindello> I'm going to blow away the nodes again and perform a fresh install. See if that helps at all
[10:14] * capri (~capri@ Quit (Ping timeout: 480 seconds)
[10:17] * madkiss (~madkiss@tmo-110-188.customers.d1-online.com) has joined #ceph
[10:19] * dcasier (~dcasier@ has joined #ceph
[10:20] * capri (~capri@ has joined #ceph
[10:26] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has left #ceph
[10:28] * rongze1 (~zhu@ has left #ceph
[10:29] * tziOm (~bjornar@ has joined #ceph
[10:33] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[10:36] * SWAT (~swat@cyberdyneinc.xs4all.nl) has joined #ceph
[10:36] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[10:36] * ChanServ sets mode +v andreask
[10:45] * athrift (~nz_monkey@ Quit (Remote host closed the connection)
[10:47] * itamar_ (~itamar@ Quit (Remote host closed the connection)
[10:48] * athrift (~nz_monkey@ has joined #ceph
[10:53] * itamar_ (~itamar@ has joined #ceph
[10:54] * itamar_ (~itamar@ Quit (Remote host closed the connection)
[10:54] * itamar_ (~itamar@ has joined #ceph
[10:59] * madkiss1 (~madkiss@tmo-107-6.customers.d1-online.com) has joined #ceph
[11:01] * schlitzer|work (~schlitzer@ has joined #ceph
[11:04] * madkiss2 (~madkiss@p5099e2ec.dip0.t-ipconnect.de) has joined #ceph
[11:04] * madkiss1 (~madkiss@tmo-107-6.customers.d1-online.com) Quit (Read error: Connection reset by peer)
[11:05] * madkiss (~madkiss@tmo-110-188.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[11:06] * leseb (~Adium@ Quit (Quit: Leaving.)
[11:07] <schlitzer|work> hey folks, i'm running ubuntu 12.04 with a ceph + rgw installation. trying to use s3 + multipart upload gives this:
[11:08] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[11:08] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[11:08] <schlitzer|work> http://pastebin.com/bPKPxVUb
[11:10] <loicd> schlitzer|work: what client do you use ?
[11:16] <schlitzer|work> one moment, i have to ask the developer, but i guess it was jcloud
[11:16] <loicd> schlitzer|work: it will be more efficient if he asks directly, I think. Can you suggest him to join this chan ?
[11:17] <loicd> or even better, write down a bug report with a detailed description of the conditions to reproduce the problem :-)
[11:17] <loicd> I have to go now, bbl
[11:17] <schlitzer|work> firewall stuff, he cannot get out, i digged myself out thru a proxy
[11:17] <schlitzer|work> kk
[11:17] <schlitzer|work> will ask him to do so
[11:18] * zindello (~zindello@ Quit (Quit: Computer went to sleep)
[11:21] <fmarchand> What is the "good" way to delete files through radosgw ... I miss something I think because disk space is never released ...
[11:23] * terje-_ (~terje@75-166-102-61.hlrn.qwest.net) Quit (Read error: Operation timed out)
[11:23] * terje_ (~joey@75-166-102-61.hlrn.qwest.net) Quit (Read error: Operation timed out)
[11:27] * Midnightmyth (~IceChat9@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[11:27] * loicd (~loic@brln-4db819d8.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[11:36] * leseb (~Adium@ has joined #ceph
[11:43] <tnt> Is there a way to dump the full binary pgmap ?
[11:47] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[11:48] <tnt> Interestingly the cpu_sys load (linked to IO cpu usage presumably) of monitors is much higher on the master. Which I find weird because the peons have to commit the same data to disk ...
[11:49] * leseb (~Adium@ Quit (Ping timeout: 480 seconds)
[11:59] * maximilian (~maximilia@ has joined #ceph
[12:04] * terje (~joey@97-118-115-214.hlrn.qwest.net) has joined #ceph
[12:10] * leseb (~Adium@ has joined #ceph
[12:11] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:12] * mschiff (~mschiff@tmo-106-252.customers.d1-online.com) has joined #ceph
[12:13] * madkiss2 (~madkiss@p5099e2ec.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[12:19] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[12:23] * leseb (~Adium@ Quit (Quit: Leaving.)
[12:25] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[12:25] * ChanServ sets mode +v andreask
[12:26] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[12:33] * diegows (~diegows@ has joined #ceph
[12:36] * leseb (~Adium@ has joined #ceph
[12:44] * The_Bishop_ (~bishop@e179004203.adsl.alicedsl.de) has joined #ceph
[12:49] * The_Bishop (~bishop@f052096139.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[13:03] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[13:06] * mschiff (~mschiff@tmo-106-252.customers.d1-online.com) Quit (Remote host closed the connection)
[13:33] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Ping timeout: 480 seconds)
[13:34] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[13:48] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[13:49] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:49] * ChanServ sets mode +v andreask
[13:49] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[13:56] * hijacker (~hijacker@bgva.sonic.taxback.ess.ie) Quit (Quit: Leaving)
[13:56] * hijacker (~hijacker@bgva.sonic.taxback.ess.ie) has joined #ceph
[14:13] * _et (~yogesh@ has joined #ceph
[14:16] <_et> hi, is python-pushy a required package for Ceph on Ubuntu?
[14:16] <_et> cuz it seems to be unavailable in the repos.
[14:19] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[14:23] * leseb (~Adium@ Quit (Quit: Leaving.)
[14:34] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[14:34] <andrei> hello guys
[14:34] <andrei> i was wondering if someone could help me with my small ceph cluster
[14:35] <andrei> i was having issues with mon crash after doing some benchmarks yesterday
[14:35] <andrei> as a result one of the mons got corrupted and no longer starts
[14:35] <andrei> i was left with 2 mons
[14:36] <andrei> a few hours ago i had a crash of the second mon server, which has brought down my cluster once again.
[14:36] <andrei> the second mon server could be successfully restarted, so my cluster is back alive
[14:37] * nhm (~nhm@65-128-142-169.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[14:37] <joao> andrei, do you have logs for the mon crashes?
[14:37] <andrei> i was wondering if there is a ceph release which addresses the leveldb bug that i have been told caused the problem
[14:37] <andrei> joao: yeah, I do, I pasted the logs yesterday
[14:37] <joao> andrei, which version are you using?
[14:37] <andrei> and sage told me that it looks like the leveldb bug
[14:37] <andrei> i am on 0.61.2 at the moment
[14:38] <andrei> sage also mentioned that there is a new release coming in a few hours
[14:38] <andrei> but that release will not address the bug that i had
[14:39] * yanzheng (~zhyan@ has joined #ceph
[14:40] <andrei> I am also not seeing any new updates for ceph in debian/centos repos
[14:40] <joao> andrei, can't seem to find the logs; can you point me to them again?
[14:40] <andrei> joao, sure
[14:40] <andrei> one moment
[14:40] <joao> ty
[14:40] <andrei> joao: http://ur1.ca/e7106
[14:41] <andrei> here is the log from yesterdays crash
[14:41] <andrei> it happened shortly after i've fired fio testing
[14:41] <andrei> on 4 vms running off rbd
[14:41] <andrei> as a result the vms crashed
[14:41] <andrei> ((
[14:41] <andrei> it happened about an hour into the tests
[14:41] <andrei> i was using 200gb files with random read/writes
[14:42] <andrei> today's crash happened shortly after doing similar tests.
[14:42] <joao> andrei, same crash?
[14:42] <andrei> i will need to check
[14:42] <andrei> however, i didn't do it the same way this time
[14:43] <andrei> the tests i did were from one of the storage servers
[14:43] <andrei> i've rbd mapped the disk
[14:43] <andrei> and started fio
[14:43] <andrei> let me check the second crash
[14:43] <tnt> joao: btw, while you're here did you see my message from 3h ago? (it wasn't addressed to you specifically but it was a mon question and no one else answered).
[14:44] * elder (~elder@ has joined #ceph
[14:44] <joao> tnt, sorry, didn't notice it
[14:44] <joao> tnt, I'm not sure there is; let me check
[14:44] <joao> hey elder :)
[14:46] <elder> Hi Joao
[14:46] <paravoid> joao: hey
[14:46] <andrei> joao: here you go, the crash log from the second mon which died today: http://ur1.ca/e7gz7
[14:46] <paravoid> joao: sage asked for my mon dirs
[14:46] <paravoid> they're 1.6G each
[14:46] <paravoid> should I just put them on cephdrop?
[14:46] <joao> paravoid, yeah :)
[14:46] <paravoid> heh, fun
[14:47] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[14:47] <joao> andrei, thanks
[14:47] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[14:47] <andrei> the other question that I have is how do I replace the broken mon which has crashed and refuses to start?
[14:47] <andrei> joao: you are welcome. Did both crashes happen because of the same bug?
[14:48] <joao> andrei, set debug mon = 20 on that monitor's ceph.conf, restart it and paste the resulting log; we'll go from there
[14:48] <joao> andrei, don't think so
[14:48] <joao> might be related though
[14:48] <andrei> joao: in the ceph.conf?
[14:48] <joao> not sure what the hell happened with leveldb
[14:49] <joao> andrei, yeah, under [mon.foo], 'foo' being whatever id that monitor has
[14:49] <andrei> joao: can i trash my store.db folder?
[14:50] <andrei> it has grown to 8gb and used the entire disk space of my /root partition ((
[14:50] <andrei> that could possibly be the reason why it wasn't starting anymore
[14:50] <andrei> it had no space?
[14:51] <joao> ah
[14:51] <joao> well, yeah
[14:51] * elder (~elder@ Quit (Quit: Leaving)
[14:52] <joao> if there's no more space on the data store, it will certainly make leveldb do all sorts of things, and even more so the monitor will shut itself down as soon as it notices the data dir has no space available
[14:52] <andrei> joao: i've cleared some space and ceph-mon has started )))
[14:52] <andrei> joao: is it normal for mon to consume 8gb of disk space?
[14:52] <joao> eh, I'm assuming your ':' doesn't work and those ')))' are smilies? :p
[14:53] <joao> andrei, there's a leveldb bug that will cause that
[14:53] * leseb (~Adium@ has joined #ceph
[14:53] <joao> we've addressed it, not sure it went into 0.61.2 though
[14:53] <andrei> all other mons that i have are using around 1.5gb
[14:53] <andrei> apart from that one which crashed
[14:53] <joao> I just can't recall anymore which fixes have gone where :\
[14:54] <tnt> no. 0.61.2 is still affected by basically all the mon issues.
[14:54] <joao> tnt, 'pg getmap -o /tmp/foo' should get you the map
[14:54] <andrei> do you know if I can clean the store.db folder?
[14:55] <andrei> i need to free up some space
[14:55] <joao> tnt, sorry it took so long, got distracted by andrei's issues :)
[14:55] * Midnightmyth (~IceChat9@93-167-84-102-static.dk.customer.tdc.net) Quit (Quit: Few women admit their age. Few men act theirs.)
[14:55] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[14:55] <joao> andrei, you can, but you'll have to recreate that monitor from scratch after that
[14:55] <joao> that's not really advisable
[14:56] <joao> best would be to move the data store to some other partition
[14:56] <joao> and adjust that on your ceph.conf
[14:56] <tnt> joao: ah thanks ! and np. And do you know why the master mon would have significantly (like 10x more, up to 30% average) higher "CPU Sys" usage than peons ? The actual IO looks very similar (and that's what I'd expect).
[14:56] <joao> restarting the monitor usually helps with the growth
[14:56] <andrei> joao: does it clean the used space?
[14:57] <joao> tnt, I believe I saw something concerning leveldb compaction a while ago
[14:57] <andrei> or does it keep on growing forever?
[14:57] <joao> tnt, which version are you on? 0.61.2?
[14:57] <joao> also, have you enabled compression?
[14:57] <tnt> joao: well, compaction also runs on the peons, that's what's weird. The difference between master/peon.
[14:57] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[14:57] * Vjarjadian (~IceChat77@ has joined #ceph
[14:58] <tnt> joao: I have a 0.61.2 with a bunch of fixes over it. Among other things the trimming issue and also the async compaction patches.
[14:58] <joao> andrei, it actually cleans up unused space that wasn't freed by leveldb as it should
[14:58] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[14:58] <andrei> so, you think that if I restart the mon process it will clean up the 8GB or so?
[14:58] <joao> andrei, it should, yeah
[14:59] <andrei> let me try that
[14:59] <andrei> i guess I should disable the debug
[14:59] <tnt> joao: I don't have compression enabled. I tried it and it reduced IO a lot, but used a lot of "User" CPU (~60%). Which I find weird because the map is not that big (5 MB) and a benchmark shows that machine can compress/decompress >600 MB/s.
[14:59] <joao> also, there was yet another bug that would cause the mon to keep a whole lot of versions around due to lack of trimming
[14:59] <joao> one workaround would be to restart the monitor
[15:00] <joao> tnt, we've seen 10-30% increases in cpu when using compression, iirc
[15:00] <joao> andrei, might be worth checking the size of store.db/LOG too
[15:01] <joao> I believe it was tnt who stumbled upon LOG growing quite large on xfs
[15:01] <tnt> yup.
[15:01] <tnt> I configured it on /dev/null now
[15:03] <joao> tnt, fwiw, I would expect that at any time the leader would be under heavier load than the peons, but I wouldn't expect it to be 10x more
[15:03] <joao> but nothing comes to mind right now
[15:03] <joao> err, I mean, that justifies the 10x
[15:04] <andrei> joao: that is what i am using - xfs
[15:04] <joao> andrei, then du -chs <mon_data_dir>/store.db/LOG
[15:04] <andrei> my LOG file is actually very small
[15:04] <joao> ok
[15:04] <andrei> 400k or so
[15:05] <andrei> i do have a bunch of *.sst files
[15:05] <andrei> which are several megs each
[15:05] <joao> yeah, that's how leveldb stores things
[15:05] <andrei> about 1500 files
[15:06] <andrei> joao: do you know if i can remove these files? or at least older sst files?
[15:06] <andrei> or would it break mon?
[15:06] <joao> andrei, don't touch them
[15:06] <andrei> okay
[15:06] <joao> leveldb takes care of them automatically and you'll probably end up breaking it if you mess around with those files
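A read-only way to keep an eye on the store growth discussed above is to total the directory and count its .sst files without modifying anything. The following is a minimal Python sketch, not a Ceph tool: the function name store_usage is an assumption made for illustration, and the script only reads file sizes.

```python
import os
import sys


def store_usage(store_dir):
    """Return (total_bytes, sst_count, log_bytes) for a leveldb store directory.

    Only file sizes are read; nothing in the directory is changed.
    """
    total = 0
    sst_count = 0
    log_bytes = 0
    for root, _dirs, files in os.walk(store_dir):
        for name in files:
            size = os.path.getsize(os.path.join(root, name))
            total += size
            if name.endswith(".sst"):
                sst_count += 1
            # leveldb's text log sits directly inside the store directory
            if name == "LOG" and root == store_dir:
                log_bytes = size
    return total, sst_count, log_bytes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path to <mon_data_dir>/store.db as the first argument.
    total, ssts, log_size = store_usage(sys.argv[1])
    print("total bytes:", total)
    print(".sst files:", ssts)
    print("LOG bytes:", log_size)
```

Running it periodically (for example from cron) and alerting on the total is one possible way to notice growth before the partition fills; that usage is a suggestion, not something described in the channel.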
[15:06] <tnt> joao: yeah and what's surprising is that it's "Sys" time so that's like syscall or kernel time on behalf of the process.
[15:07] <tnt> joao: it also has a bit more "user" cpu but I expected that and it's almost nothing.
[15:07] <tnt> andrei: you can enable compact on start as well to help clear files on restart
[15:07] <joao> where's nhm when one needs him? :p
[15:08] <andrei> tnt: please let me know how to do that
[15:08] * leseb (~Adium@ Quit (Quit: Leaving.)
[15:08] <joao> andrei, set 'mon compact on start = true' on your ceph.conf
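The two settings mentioned in this conversation ('debug mon = 20' under the monitor's own section, and 'mon compact on start = true') could be rendered into a ceph.conf-style fragment as sketched below. This is only an illustration in Python: the helper name render_mon_conf is hypothetical, 'foo' stands for whatever id the monitor has, and placing the compaction option under a general [mon] section is an assumption, since the log only says to set it in ceph.conf.

```python
import configparser
import io


def render_mon_conf(mon_id, debug_level=20, compact_on_start=True):
    """Build an INI-style fragment holding the two monitor settings."""
    conf = configparser.ConfigParser()
    # Assumption: the compaction option is applied to all monitors via [mon].
    conf["mon"] = {
        "mon compact on start": "true" if compact_on_start else "false",
    }
    # Per-monitor section, named after the monitor id as described in the log.
    conf["mon." + mon_id] = {"debug mon": str(debug_level)}
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


print(render_mon_conf("foo"))
```

The printed fragment would be merged by hand into the existing ceph.conf, and the monitor restarted for the settings to take effect, as joao describes.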
[15:08] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Remote host closed the connection)
[15:13] * portante (~user@ has joined #ceph
[15:14] <_et> any idea how i can get python-pushy installed on ubuntu?
[15:14] <andrei> joao: cheers, will do it now
[15:14] <_et> it seems to be a dependency for ceph-deploy
[15:14] <paravoid> this mon corruption sounds worrying
[15:14] <paravoid> maybe I should go from 3 to 5 mons
[15:15] <tnt> paravoid: well, you really need to make sure your mons don't grow to fill the disk.
[15:15] <andrei> tnt: how do I do that?
[15:15] <paravoid> two of the three have plenty of space, the third one does not
[15:15] <andrei> frequent mon restarts?
[15:15] <tnt> that, or you deploy a git version with the fix :)
[15:16] <paravoid> git version wfm
[15:17] <paravoid> works today, didn't work yesterday, so there's always an increased risk
[15:18] <tnt> it's been working stable for me for more than a week now.
[15:18] <tnt> I'll go to 0.61.3 when it's out.
[15:19] <tnt> note that you only need to use a recent version on the master. That will fix most of the issues.
[15:19] <paravoid> I had #5255, #5256/#5257
[15:21] <tnt> ah damn elder left.
[15:26] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[15:26] * ChanServ sets mode +v andreask
[15:27] * _et (~yogesh@ has left #ceph
[15:27] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) has joined #ceph
[15:27] <joelio> does ceph-deploy not support md devices?
[15:28] * redeemed (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[15:32] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) has joined #ceph
[15:32] * PerlStalker (~PerlStalk@ has joined #ceph
[15:37] <bergerx_> joelio: seems like ceph-deploy tries to create a partition table and a partition on the given block device using sgdisk
[15:38] <bergerx_> i've been trying the same with a partition, but it blindly tries to create a partition table in it and fails on partprobe
[15:39] <joelio> yea, I wanted to get my osds as 3 disk RAID0's - I could do it fine with mkcephfs a while ago.. now it fails on ceph-deploy, and mkcephfs is giving me confusing errors..
[15:40] <bergerx_> its already written: Users who want fine-control over security settings, partitions or directory locations should use a tool such as Juju, Puppet ...
[15:40] <joelio> I create the xfs filesystems and mount up an osd and journal in the mount for each MD device
[15:40] <joelio> mkcephfs worked before, but now it says it's already a directory
[15:40] <joelio> I remove the dir and it says no directory
[15:41] <joelio> I don't want it to mkfs
[15:41] <joelio> it's not set to mkfs
[15:42] <joelio> bergerx_: I only have 6 nodes. I have a puppet infra but the modules were a headache
[15:42] <joelio> more work to get going than running the *1 command* that did it all previously
[15:42] <bergerx_> ahaha, same here :)
[15:42] <joelio> really annoying
[15:43] <bergerx_> maybe we just need to wait for ceph-deploy to be improved for this kind of situations
[15:43] <joelio> it's like.. oh, here lets make it easier.. and then let's make it more difficult
[15:43] <joelio> I really don't like ceph-deploy tbh
[15:44] <joelio> I find myself duplicating a lot of commands
[15:44] <joelio> or having MASSIVE one-liners
[15:45] <joelio> or scripting it all up
[15:45] <joelio> and making me just wish mkcephfs worked as it once did
[15:46] <joelio> I'm going to have to bin the RAID devs, annoyingly, if it won't work and just hope it's robust enough until I can afford some SSDs
[15:46] <joelio> had some great results from the test cluster too.. back to the drawing board I guess
[15:46] * joelio grumbles
[15:47] * mschiff (~mschiff@tmo-106-252.customers.d1-online.com) has joined #ceph
[15:47] <bergerx_> joelio: do you really need raid, because ceph can already do raid for you
[15:48] <joelio> it's not RAID for redundancy
[15:48] <bergerx_> and if you use raid with ssd disks
[15:48] <bergerx_> and if you use raid with ssd disks it seems like you lose trim support
[15:48] <darkfaded> bergerx_: "officially" recent md versions can pass through trim
[15:50] * yanzheng (~zhyan@ Quit (Remote host closed the connection)
[15:50] <joelio> bergerx_: it's RAID to increase IOPS available per OSD
[15:51] <joelio> (and with journal)
[15:51] <joelio> I got much better results and lower latency in RADOS benches doing it this way
[15:51] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[15:51] <joelio> as opposed to using raw single devices per osd
[15:52] <joelio> but I guess I can't now - easily at least
[15:53] <tnt> joelio: configuring OSD manually is really not that hard.
[15:54] <joelio> it's certainly more involved than it used to be
[15:55] <fmarchand> tnt joelio : not that hard and you can then stop or restart your daemons manually ! which is not possible with ceph-deploy !
[15:55] <fmarchand> I don't know why ...
[15:55] <joelio> hostnames
[15:55] <fmarchand> I checked ...
[15:55] <joelio> no sections in ceph config so the init script doesn't know what to restart
[15:56] <joelio> on the mkcephfs route, this worked as it was reading the ceph config
[15:56] <fmarchand> I missed something I think ! I added sections in the ceph.conf also
[15:56] <joelio> it no longer works as, well, not sure what ceph-deploy does under the hood
[15:58] <joelio> another excruciatingly annoying thing
[15:59] <fmarchand> Where can I have price for support if Ceph is chosen ?
[15:59] <tnt> request a quote from inktank, there is a form online.
[16:00] <fmarchand> oh oki
[16:00] <fmarchand> thx
[16:01] <itamar_> Hi,
[16:01] <itamar_> anyone knows about any other major company
[16:01] <itamar_> using ceph
[16:01] <itamar_> aside from dreamhost?
[16:03] <joelio> GRRRR
[16:03] <tnt> http://www.inktank.com/customers/
[16:06] <wschulze> itamar_: https://www.openstack.org/summit/portland-2013/session-videos/presentation/keynote-bloomberg-user-spotlight
[16:07] * mschiff (~mschiff@tmo-106-252.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[16:08] * leseb (~Adium@ has joined #ceph
[16:08] <madkiss> huhu wschulze :P
[16:10] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[16:13] * mschiff (~mschiff@tmo-106-252.customers.d1-online.com) has joined #ceph
[16:13] * joelio gives up
[16:15] <joelio> ceph-deploy is making me tear my hair out
[16:16] * leseb (~Adium@ Quit (Ping timeout: 480 seconds)
[16:16] * Vjarjadian (~IceChat77@ Quit (Quit: Say What?)
[16:17] <fmarchand> joelio : hang on !
[16:19] <joelio> I can't face using it any more - I'm going to have to fix up the puppet modules. They're designed for Bobtail, so yet another interesting prospect
[16:19] <joelio> disk zap doesn't actually do the job properly.. random issues when running it
[16:20] <joelio> it seems to miss off one of the devices in the list
[16:20] <joelio> I have no idea about how it manages the config on the filesystem
[16:20] <joelio> a seemingly descriptive ceph.conf is now so minimal it's useless
[16:22] <joelio> I'd just like to describe my infrastructure in a config file, not have to manually add stuff via the cli
[16:22] * aliguori (~anthony@ has joined #ceph
[16:23] <fmarchand> You made it with bobtail and puppet ?
[16:23] <joelio> https://github.com/enovance/puppet-ceph
[16:23] <joelio> hardcoded for bobtail
[16:23] <joelio> don't know what userland stuff will have changed or whether the libs work still
[16:25] <joelio> it's just frustrating that mkcephfs - worked.
[16:25] <joelio> no messing about with all this :)
[16:32] <andrei> does anyone know if I can have 4 mon servers?
[16:32] <andrei> or do I need to have either 3 or 5?
[16:32] * The_Bishop__ (~bishop@e179013061.adsl.alicedsl.de) has joined #ceph
[16:33] <andrei> joao: i've tried to add the compression option to all 3 monitors. it worked for 2 just fine, mons started without any issues
[16:33] <paravoid> andrei: you can in theory, doesn't make sense in practice
[16:33] <paravoid> it's worse than having 3
[16:33] <andrei> the 3rd mon (which has crashed previously and created 8gb of data files) has been starting for a while
[16:34] <andrei> paravoid: I seem to be having an issue with one of the mons and I was wondering how to replace it.
[16:34] <andrei> if i simply remove it, the cluster will have 2 mons and it will not work
[16:34] <andrei> how can I replace it?
[16:34] <andrei> i thought perhaps i could add a 4th mon
[16:34] <andrei> and delete the crashed mon afterwards
[16:35] <andrei> so that the cluster would always have 3+ mons
[16:35] <paravoid> I guess that would work, yes
[16:35] <andrei> would this work?
[16:35] <paravoid> pretty sure actually, I've done this
[16:35] <andrei> cool
[16:35] <andrei> nice, i will do that
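paravoid's remark that 4 monitors is worse than 3 follows from the majority rule: monitors need a strict majority of the monmap to form a quorum, so an even count adds a member without adding any failure tolerance. A small Python sketch of that arithmetic (the function names are made up for illustration):

```python
def majority(monmap_size):
    """Smallest number of monitors that is a strict majority of the monmap."""
    return monmap_size // 2 + 1


def tolerated_failures(monmap_size):
    """How many monitors can be down while a majority is still reachable."""
    return monmap_size - majority(monmap_size)


for n in (3, 4, 5):
    print(n, "mons: majority", majority(n),
          "tolerated failures", tolerated_failures(n))
```

With 3 or 4 monitors only one failure is tolerated, while 5 tolerate two, which is why odd counts are the usual choice.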
[16:35] <andrei> i was hoping someone could answer another question that i've got
[16:35] <andrei> basically, i have two osd servers
[16:35] <andrei> one is a new and fast server
[16:35] <andrei> and second one is not so fast
[16:36] <andrei> at the moment i have both servers in the cluster
[16:36] <andrei> with replication 2
[16:36] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[16:36] <andrei> when I was doing testing, i've noticed that the clients tend to read only from the fast server
[16:36] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[16:36] <andrei> i've hardly seen any reads from the second, slow server
[16:37] * mschiff_ (~mschiff@tmo-102-20.customers.d1-online.com) has joined #ceph
[16:37] <andrei> that is probably down to weighting which got automatically assigned when I was setting up the cluster and adding osds
[16:37] <andrei> my question is: if I set the same weight on both servers, would it adversely affect performance?
[16:37] <andrei> or would it improve performance as clients would be reading concurrently from both servers
[16:37] <andrei> ?
[16:38] * The_Bishop_ (~bishop@e179004203.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[16:40] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[16:40] * mschiff (~mschiff@tmo-106-252.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[16:41] * portante (~user@ Quit (Ping timeout: 480 seconds)
[16:42] * jahkeup (~jahkeup@ has joined #ceph
[16:46] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[16:48] * mschiff (~mschiff@tmo-102-20.customers.d1-online.com) has joined #ceph
[16:49] * mschiff_ (~mschiff@tmo-102-20.customers.d1-online.com) Quit (Read error: Connection reset by peer)
[16:52] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[16:55] * mschiff (~mschiff@tmo-102-20.customers.d1-online.com) Quit (Read error: Connection reset by peer)
[16:59] * rongze (~zhu@ has joined #ceph
[17:01] * mschiff (~mschiff@tmo-102-20.customers.d1-online.com) has joined #ceph
[17:02] * bergerx_ (~bekir@ Quit (Quit: Leaving.)
[17:04] * mschiff (~mschiff@tmo-102-20.customers.d1-online.com) Quit (Read error: Connection reset by peer)
[17:04] * tziOm (~bjornar@ Quit (Remote host closed the connection)
[17:04] * leseb (~Adium@ has joined #ceph
[17:07] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:09] * eschnou (~eschnou@ Quit (Remote host closed the connection)
[17:10] * mschiff (~mschiff@tmo-102-20.customers.d1-online.com) has joined #ceph
[17:10] <andrei> paravoid: i was wondering if you could help me with the monitor addition problem?
[17:10] <andrei> i've added the 4th monitor
[17:10] <andrei> and the cluster seems to have died
[17:10] <andrei> not sure how to recover now
[17:11] <andrei> ceph -s gives me nothing
[17:11] <andrei> just sits there
[17:11] <andrei> no messages, nothing at all
[17:11] <tnt> what was the previous state of the cluster ?
[17:11] <andrei> health HEALTH_WARN 1 mons down, quorum 1,2 a,b
[17:12] <andrei> tnt: i wanted to add the 4th mon and after that remove the one that has previously crashed
[17:12] <tnt> well ... adding a mon in that state was not a good idea.
[17:12] <andrei> and consumed my root partition
[17:12] <tnt> you should have removed the failed one before.
[17:12] <tnt> because now you have 4 mons in the monmap so you need N/2+1 = 3 mon up to have quorum.
[17:12] <andrei> tnt: i would have had only 2 mons, which is also not a good idea?
[17:12] * sagelap (~sage@2600:1012:b025:9ab:e9b5:72f4:5d25:daa5) has joined #ceph
[17:13] <tnt> except the new one can't start because it doesn't have any data and so it needs to get a sync from an existing quorum first, which you don't have anymore.
[17:13] <andrei> i see
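tnt's N/2+1 explanation can be checked against the three states of this cluster. The sketch below is plain arithmetic in Python, not anything that talks to Ceph; has_quorum is a hypothetical helper and the monitor counts are taken from the conversation above.

```python
def has_quorum(monmap_size, mons_up):
    """True if the running monitors form a strict majority of the monmap."""
    return mons_up >= monmap_size // 2 + 1


scenarios = [
    # (description, monitors in monmap, monitors up and in sync)
    ("before: 3 in monmap, 1 broken", 3, 2),
    ("4th added but not yet synced", 4, 2),
    ("broken mon removed instead", 2, 2),
]

for description, size, up in scenarios:
    state = "quorum" if has_quorum(size, up) else "NO quorum"
    print(description + ":", state)
```

Only the middle case loses quorum, which matches what happened here: the new monitor counted towards the monmap size before it could sync and vote.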
[17:14] <andrei> that doesn't sound good (((
[17:14] <joao> it should sync from any existing monitor if there's no quorum formed
[17:14] <tnt> joao: it's not my experience.
[17:14] <joao> afaik, it may very well be synchronizing right now
[17:14] <joao> it may take a long time though
[17:14] <andrei> 2013-06-06 16:14:49.571067 7f3e12fd6700 1 mon.a@2(electing) e6 discarding message auth(proto 0 27 bytes epoch 6) v1 and sending client elsewhere
[17:15] <andrei> i am seeing a bunch of messages like these in the logs
[17:15] <tnt> joao: AFAIK the mon will just discard all incoming messages when not in quorum.
[17:15] <joao> tnt, it was designed with that purpose in mind, we may have incorrectly changed it to address some bug however
[17:15] * sagelap (~sage@2600:1012:b025:9ab:e9b5:72f4:5d25:daa5) Quit (Remote host closed the connection)
[17:15] <imjustmatthew> andrei: when that happened to me I had to follow the procedure to remove an unhealthy mon from the cluster
[17:15] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:16] <joao> tnt, it won't; a good example of it not being that way is that the monitor will have to handle *some* messages to get in the quorum :)
[17:16] <joao> it will however drop most messages while synchronizing
[17:16] * sagelap1 (~sage@ Quit (Ping timeout: 480 seconds)
[17:16] <tnt> joao: well, it's definitely not working ATM. Several people had issues when going from 1 to 2 mons when following http://ceph.com/docs/next/rados/operations/add-or-rm-mons/
[17:16] <tnt> because in step 7, after the 'ceph mon add x', it loses quorum.
[17:17] <joao> andrei, those messages are normal; the monitor won't handle client requests if it's not in the quorum
[17:17] <tnt> the solution is to just skip step 7. Start the new mon first, it will sync, then add itself to the monmap.
[17:17] <andrei> thanks
[17:18] * diegows (~diegows@ has joined #ceph
[17:18] <joao> tnt, I'll look into it
[17:18] * joao adds that to the pile
[17:18] * oliver1 (~oliver@p4FD06CA9.dip0.t-ipconnect.de) has joined #ceph
[17:18] <andrei> so, before i added the monitor i had 2 good mons and 1 broken
[17:18] <andrei> i've added a 4th mon and it broke things
[17:18] <andrei> so, which ones should i remove first?
[17:18] <andrei> the new mon and the broken one
[17:19] <andrei> and add a third one?
[17:19] <tnt> just remove the broken one.
[17:19] <tnt> it will do the trick.
[17:19] <tnt> technically you could just have deleted the data from the broken one, redone a mkfs and restarted it and it would have resynced.
[17:19] <tnt> without needing to add a 4th mon at all.
[17:22] * capri (~capri@ Quit (Quit: Verlassend)
[17:32] <paravoid> sage: slow peering is still there :(
[17:34] * leseb (~Adium@ Quit (Quit: Leaving.)
[17:34] * haomaiwang (~haomaiwan@ has joined #ceph
[17:36] * itamar_ (~itamar@ Quit (Remote host closed the connection)
[17:40] <andrei> Guys, i've followed the guide "Removing Monitors from an Unhealthy Cluster", but that did not solve my problem
[17:40] <andrei> after injecting the new map with 2 mons I am still struggling
[17:41] <andrei> ceph -s gives me fault messages
[17:41] <andrei> should I now try to add a 3rd mon?
[17:41] <andrei> as at the moment my monmap has 2 mons
[17:41] <andrei> as the 3rd broken mon has been removed
[17:43] * portante (~user@ has joined #ceph
[17:45] <joao> andrei, what do the logs for those 2 monitors say?
[17:46] <joao> also, if you're having so many problems, might be a good idea to set 'debug mon = 10' on both monitors to see what's happening
[17:48] <andrei> 2013-06-06 16:48:04.996443 7f3690156700 1 mon.b@2(peon) e6 discarding message auth(proto 0 26 bytes epoch 6) v1 and sending client elsewhere
[17:48] <andrei> messages like these
[17:48] <andrei> 2013-06-06 16:48:47.063295 7f90690f0700 1 mon.a@2(electing) e6 discarding message mon_subscribe({monmap=0+,osdmap=1337}) and sending client elsewhere
[17:48] <andrei> nothing else
[17:49] <joao> andrei, increase the debug level then
[17:49] <andrei> will do now
[17:49] <andrei> thanks
[17:51] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[17:52] <andrei> damn, one of the monitors just crashed
[17:53] * yan (~Adium@75-101-14-52.static.sonic.net) has joined #ceph
[17:53] * yan (~Adium@75-101-14-52.static.sonic.net) has left #ceph
[17:55] <andrei> joao: there are a lot of log entries
[17:55] <joao> well, I'd expect as much :)
[17:55] <andrei> should I paste the last couple of hundred?
[17:55] <andrei> into fpaste?
[17:56] <joao> paste as much as you can, or point me to the raw log
[17:56] <joao> I can also provide you a place to drop the log
[17:56] * janos stifles an immature snicker
[17:56] <andrei> joao: that would be ideal if you could
[17:57] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:58] <joelio> right, I really give up now.. been nice being here but this is eating up too much time
[17:59] <andrei> joao: I am looking at the logs now and i can see that for some reason the monmap still has 4 mons
[17:59] <andrei> even though i've removed 2 mons
[17:59] <andrei> and reinjected it
[18:00] <andrei> the reinjection did not give me any errors
[18:00] <joao> andrei, have you injected it on all mons?
[18:00] <andrei> (((
[18:00] <andrei> nope
[18:01] <andrei> should I inject it to the remaining two or all four?
[18:02] <joao> you should inject it to all monitors contained on the monmap
[18:02] <andrei> sorry, silly mistake. will now do it
[18:02] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[18:03] * haomaiwang (~haomaiwan@ Quit (Quit: 离开)
[18:05] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Quit: my troubles seem so far away, now yours are too...)
[18:06] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[18:06] * ChanServ sets mode +o scuttlemonkey
[18:08] * BillK (~BillK@124-169-216-2.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:08] * tnt (~tnt@ has joined #ceph
[18:08] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[18:09] <joao> andrei, can you check which version of leveldb you are running?
[18:10] <andrei> joao: how do I check that?
[18:11] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[18:12] <joao> andrei, if you're on ubuntu/debian, I believe dpkg -s libleveldb-dev would do it
[18:12] <tnt> isn't leveldb a static lib only ?
[18:12] * Vjarjadian (~IceChat77@ has joined #ceph
[18:12] <tnt> so if he's using the official package, it'd be whatever is on the build machine.
[18:13] <joao> tnt, I don't think we've packaged leveldb in quite a while; glowell ?
[18:13] <andrei> tnt: i am using repo version
[18:13] <andrei> and I do not have leveldb packages installed
[18:13] <andrei> just checked
[18:13] <andrei> i am on ubuntu 12.04
[18:13] <joao> ok
[18:13] <tnt> joao: you don't package it, but it's "included" in the binary itself.
[18:13] <andrei> with latest patches
[18:14] <joao> tnt, alright, I'll check it :)
[18:14] <andrei> joao: I will be sending you logs shortly. something odd is going on
[18:14] <andrei> after i've reinjected the monmap and restarted two remaining mons
[18:14] <andrei> i am having one of the mons crashing all the time
[18:14] <joao> kay
[18:14] <andrei> the other one is running without crashes
[18:15] <andrei> do you want just the mon logs or anything else?
[18:16] * leseb (~Adium@ has joined #ceph
[18:16] <joelio> what flaming directory are you supposed to run ceph-deploy? I'm having real difficulty understanding this
[18:17] <joelio> I will not be beaten
[18:18] <joelio> I get loads of keyring files and config written to the dir I'm running in - wtf
[18:20] * rongze (~zhu@ Quit (Read error: Connection reset by peer)
[18:21] <andrei> joelio: i've tried ceph-deploy
[18:21] <andrei> and that didn't work for me either
[18:21] <joelio> it doesn't work full stop
[18:21] <andrei> it got stuck at getting keys
[18:21] <joelio> root@vm-ds-01:/tmp# ceph-deploy gatherkeys vm-ds-01
[18:21] <joelio> Unable to find /etc/ceph/ceph.client.admin.keyring on ['vm-ds-01']
[18:21] <joelio> Unable to find /var/lib/ceph/bootstrap-osd/ceph.keyring on ['vm-ds-01']
[18:21] <andrei> so, i've followed the 5 minute guide from the ceph docs
[18:21] <andrei> and that worked like a charm
[18:22] <andrei> i've added just one mds, one mon and two osd servers
[18:22] <andrei> that has worked flawlessly
[18:22] * oliver1 (~oliver@p4FD06CA9.dip0.t-ipconnect.de) has left #ceph
[18:22] <andrei> i think there are still unresolved issues with ceph-deploy
[18:22] <joelio> surely this is putting off newbies.. I feel ashamed as I've been using ceph for 9 months and have no idea what is meant to be what
[18:23] * rongze (~zhu@ has joined #ceph
[18:23] <andrei> i am surprised it is listed as a preferred way to deploy ceph
[18:23] <andrei> as adding a simple cluster doesn't work
[18:23] <andrei> ))
[18:23] <andrei> i guess these things will be ironed out with time
[18:24] <andrei> i wish ceph would have a nice cli that would allow admin to do everything from one place
[18:24] <andrei> similar to what glusterfs does
[18:24] <andrei> it is so nice and simple to use
[18:25] <andrei> it took me minutes to roll out a new glusterfs cluster
[18:25] <andrei> but unfortunately glusterfs was not the right solution for me
[18:25] <tnt> andrei: well when the cluster is up the 'ceph' command can control a lot of stuff ...
[18:26] <andrei> tnt: sure
[18:26] <andrei> it's really good
[18:26] <joelio> andrei: that won't work for me now - it's what I've been attempting to do based on my test cluster config. mkcephfs won't allow me to just instantiate the OSDs without mkfs (it used to)
[18:26] <tnt> joelio: well without the mkfs it's pretty much just 'ceph osd create' ...
[18:26] <andrei> joelio: i've done this about a week ago
[18:26] <andrei> with 0.61.2
[18:27] <andrei> worked for me
[18:27] <andrei> i've pre-created osd partitions
[18:27] <andrei> and added them using the docs howto
[18:27] <andrei> didn't have any issues
[18:27] * mschiff (~mschiff@tmo-102-20.customers.d1-online.com) Quit (Read error: Connection reset by peer)
[18:27] <andrei> joao: did you find the logs that i've uploaded?
[18:29] * Tamil (~tamil@ has joined #ceph
[18:29] <andrei> joao: I need to leave the office now, but will be back in about an hour
[18:29] <andrei> hopefully I will catch you online
[18:29] * loicd (~loic@2a01:e35:2eba:db10:f59f:7722:b1c4:a589) has joined #ceph
[18:29] * mattbenjamin (~matt@aa2.linuxbox.com) has joined #ceph
[18:30] <joelio> andrei: ok, so where did you get the keyring set?
[18:31] <andrei> joelio: i've used this guide: http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
[18:31] <andrei> step by step
[18:31] <andrei> and it all worked for me
[18:32] <joelio> in the 5 minute guide
[18:32] <joelio> mkcephfs requires a keyring
[18:32] <joelio> where is it?
[18:32] <joelio> fresh install don't forget
[18:32] * leseb (~Adium@ Quit (Quit: Leaving.)
[18:36] <andrei> i think it will generate it
[18:36] <joelio> what will generate it?
[18:36] <joelio> -k is to describe where to find it
[18:37] <joelio> so there's an order issue
[18:37] <andrei> i've not had to do anything else apart from following the 5 mins guide
[18:37] <andrei> and not had any issues
[18:37] <joelio> ok, I'm following it now
[18:37] <andrei> sorry, got to run
[18:37] <andrei> back in about an hour with my mon problem )))
[18:38] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) has joined #ceph
[18:41] <joelio> andrei: +1 nice - finally got something
[18:42] <joelio> maybe not
[18:42] <joelio> root@vm-ds-01:/etc/ceph# ceph health
[18:42] <joelio> 2013-06-06 17:42:39.057056 7fad623b8780 -1 unable to authenticate as client.admin
[18:42] <joelio> 2013-06-06 17:42:39.057255 7fad623b8780 -1 ceph_tool_common_init failed.
[18:42] <joelio> ffs
[18:43] <joelio> that's copied from the 5 minute guide verbatim
[18:43] <joelio> there's not really much to get wrong
[18:45] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[18:46] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[18:48] <cjh_> if you write with librbd to an rbd and don't close the image are your writes lost?
[18:48] * Oliver1 (~oliver1@ip-37-201-199-233.unitymediagroup.de) has joined #ceph
[18:48] <joelio> this 5 minute guide doesn't work either
[18:49] <joelio> https://gist.github.com/anonymous/52f0aba677e4d34e2108
[18:49] <joelio> errors with auth - as expected
[18:50] <joelio> this is really frustrating, spent A LOT of money getting a production cluster for Ceph
[18:51] <joelio> and now I can't even get the damn thing installed
[18:55] <cjh_> what does the ceph osd cluster_snap command do?
[18:55] <sjusthm> cjh_: nothing, it's deprecated, I think
[18:55] <cjh_> ok thanks
[18:56] <cjh_> sjusthm: i'm kinda wondering why there isn't a rados snap export command. Wouldn't it make sense to snapshot in the underlying rados layer instead of inside each client layer (rbd, radosgw, etc)?
[18:56] <sjusthm> snap export/
[18:56] <sjusthm> ?
[18:57] <cjh_> well there's a snap-diff export command for rbd
[18:57] <sjusthm> there actually is a rados level snapshot mechanism, but it doesn't have the kind of atomicity you'd want for rbd
[18:57] <cjh_> i see
[18:57] <sjusthm> it cannot be used with rbd
[18:57] <cjh_> i was thinking if you snap'd at the rados level the cephfs would also get snaps for 'free'
[18:57] <cjh_> i see
[18:57] <sjusthm> yeah, same with cephfs actually
[18:57] <cjh_> ok i guess i didn't understand the complexity of it
[18:58] <cjh_> does the metadata server actually store anything or is it all in the rados objects themselves?
[18:58] <sjusthm> in reality, the rbd, cephfs, and rados snapshots are all based on the same lower level rados snapshots
[18:58] <sjusthm> the MDS stores everything in rados
[18:58] <sjusthm> including operation logs
[18:58] <cjh_> ok
[18:58] <cjh_> that's what i figured was the case
[18:58] <cjh_> i thought that you were taking the metadata key/value pairs and creating a filesystem representation out of that
[18:59] <sjusthm> metadata key/value pairs/
[18:59] <sjusthm> ?
[18:59] <cjh_> yeah in rados everything is an object
[18:59] <sjusthm> true
[18:59] <cjh_> and it has metadata associated with it
[18:59] <sjusthm> yeah
[18:59] * dpippenger (~riven@cpe-75-85-17-224.socal.res.rr.com) Quit (Quit: Leaving.)
[19:00] <cjh_> i figured cephfs was marking in the metadata that some objects belong to others and that's how your filesystem hierarchy is created
[19:01] <sjusthm> inodes are objects in the metadata pool or something like that with associated metadata, with data blocks stored in objects in other pools (gregaf feel free to step in?)
[19:02] <cjh_> interesting
[19:02] <gregaf> what's up?
[19:03] <sjusthm> I said a thing about cephfs
[19:03] <sjusthm> you should correct the wrong parts
[19:03] <gregaf> each directory (or fragment thereof) gets its own object in the metadata pool
[19:03] <gregaf> that object contains all the information about each inode which lives in that directory
[19:03] <cjh_> ok that's roughly how i guessed it worked
[19:04] <gregaf> the rados objects which hold file data are named deterministically based on inode number and offset
[19:05] <cjh_> ok thanks :)
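A minimal sketch of the deterministic naming gregaf describes. The exact format is an assumption here, not something stated in the log: the inode number in hex, a dot, then the object index as eight hex digits, with an unstriped layout and 4 MiB objects. The helper name `data_object_name` is hypothetical.

```python
# Assumed layout: unstriped file data, 4 MiB per RADOS object.
OBJECT_SIZE = 4 * 1024 * 1024


def data_object_name(ino: int, offset: int, object_size: int = OBJECT_SIZE) -> str:
    """Return the assumed RADOS object name holding byte `offset` of inode `ino`."""
    if ino < 0 or offset < 0 or object_size <= 0:
        raise ValueError("ino and offset must be non-negative, object_size positive")
    index = offset // object_size  # which fixed-size chunk the offset falls in
    return f"{ino:x}.{index:08x}"  # assumed "<inode hex>.<index hex>" pattern


if __name__ == "__main__":
    print(data_object_name(0x10000000000, 0))
    print(data_object_name(0x10000000000, 9 * 1024 * 1024))
```

Because the name depends only on the inode number and the offset, a client can locate file data without asking the MDS where each block lives.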
[19:05] <gregaf> filesystem-level snapshots consist of 1) a little bit of extra metadata on each associated inode (mostly a name->id mapping, and a list of IDs on each file), and 2) the data, which is stored in the file objects as rados user-managed snapshots
[19:06] <cjh_> does it currently have support for snaps?
[19:06] <gregaf> the filesystem?
[19:06] <cjh_> yeah
[19:07] <gregaf> well, you can create snapshots and play with them, yes
[19:07] <cjh_> at the pool level i'm guessing?
[19:07] <gregaf> but those code paths have less testing and are somewhat less stable than the rest of it
[19:07] <sjusthm> cjh_: any directory, I think
[19:07] <gregaf> nope, they're rooted in the directory you create them at
[19:07] * Vjarjadian (~IceChat77@ Quit (Quit: When the chips are down, well, the buffalo is empty)
[19:07] <cjh_> i see
[19:08] <gregaf> there's a hidden ".snaps" directory you can access via the shell
[19:08] <cjh_> what command would i run to do that snap?
[19:08] <cjh_> i'm having trouble finding it
[19:08] <gregaf> and you create/remove snapshots with mkdir/rmdir
[19:08] <cjh_> oh
[19:08] <gregaf> it won't show up in listings, you have to ask for it explicitly
[19:08] <cjh_> weird, ok
[19:08] <gregaf> (it *could* be ".snap" instead?)
[19:09] <cjh_> so i mkdir .snap and it snapshots it?
[19:09] <paravoid> sjusthm: your slow peering fix didn't cut it for me
[19:09] <sjusthm> paravoid: yeah
[19:09] <cjh_> that's both interesting and strange haha
[19:10] <sjusthm> paravoid: did you send osd logs at some point during the slow peering?
[19:10] <sjusthm> oh, here we go
[19:11] * jjgalvez1 (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[19:11] <cjh_> so i know it's theoretically possible but are there any examples out there of ceph clusters bigger than say 1,000 hosts?
[19:12] <gregaf> cjh_: mkdir .snap/my_snap, to be more accurate
[19:12] <cjh_> gregaf: ok cool, that sounds better
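The mkdir/rmdir mechanism above can be sketched in Python. This assumes a mounted CephFS directory and that the hidden directory is named ".snap", as gregaf's correction suggests; the function names and the example mount path are hypothetical.

```python
import os


def snapshot_path(directory: str, name: str, snap_dir: str = ".snap") -> str:
    """Build the path whose creation triggers a snapshot of `directory`."""
    return os.path.join(directory, snap_dir, name)


def create_snapshot(directory: str, name: str) -> str:
    """Snapshot `directory` by creating a subdirectory of its hidden snap dir."""
    path = snapshot_path(directory, name)
    os.mkdir(path)  # equivalent to: mkdir <directory>/.snap/<name>
    return path


def remove_snapshot(directory: str, name: str) -> None:
    """Delete a snapshot with the equivalent of rmdir."""
    os.rmdir(snapshot_path(directory, name))


if __name__ == "__main__":
    # Hypothetical mount point; only meaningful on a real CephFS mount.
    print(snapshot_path("/mnt/ceph/projects", "my_snap"))
```

The snapshot is rooted at the directory it is created in, so the same calls work at any level of the tree.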
[19:13] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[19:13] <sjusthm> paravoid: to summarize, you can reproduce by restarting osd 6?
[19:13] <gregaf> and I'm not aware of any clusters that are that size yet; the customers who need to build out that much storage tend to be a little more conservative
[19:14] <cjh_> conservative in what way?
[19:14] <cjh_> in advertising it :)
[19:14] <paravoid> sjusthm: I reproduced by restarting osd.0
[19:14] <paravoid> sjusthm: or are you asking me to restart osd.6 now?
[19:14] <sjusthm> no
[19:14] <sjusthm> just getting my bearing
[19:14] <sjusthm> s
[19:14] <paravoid> right
[19:14] <paravoid> I attached the mon log
[19:14] <gregaf> if you need 1000 hosts you've got…tens of petabytes(?) in a single cluster
[19:14] <gregaf> at this point
[19:14] <cjh_> probably yeah
[19:14] <sjusthm> paravoid: you also had this problem on bobtail?
[19:15] <paravoid> yes
[19:15] <sjusthm> worse?
[19:15] <gregaf> DreamHost has discussed their cloud publicly, and it's 2.5-3 petabytes with 900 daemons — which is only like 100 machines
[19:15] <paravoid> the issue still says "(bobtail)" on the title, this should probably get changed :)
[19:15] <cjh_> gregaf: i think it's 15PB at that point
[19:15] <sjusthm> yeah, I know
[19:15] <paravoid> I can't say if it's worse or better
[19:15] <paravoid> it's bad
[19:15] <gregaf> cjh_: more than that with modern disks
[19:15] <sjusthm> most of the moving parts changed, it's entirely possible that the bobtail version of the problem is unrelated
[19:15] <cjh_> well i'm using 3x replication and 3T disks to estimate
[19:15] <cjh_> and 15 disks per host
[19:16] <gregaf> ah, 15PB replicated, that's probably the right range
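The estimate cjh_ gives (1,000 hosts, 15 disks per host, 3T disks, 3x replication) works out as follows. This is a rough sketch using decimal units (1 PB = 1000 TB) and ignoring filesystem overhead and free-space headroom; the function name is hypothetical.

```python
def usable_capacity_pb(hosts: int, disks_per_host: int, disk_tb: float, replicas: int) -> float:
    """Usable capacity in PB after replication, ignoring overhead and headroom."""
    raw_tb = hosts * disks_per_host * disk_tb  # total raw disk space
    return raw_tb / replicas / 1000            # divide by copies, then TB -> PB


if __name__ == "__main__":
    # 1000 hosts * 15 disks * 3 TB = 45 PB raw, or 15 PB at 3x replication.
    print(usable_capacity_pb(hosts=1000, disks_per_host=15, disk_tb=3, replicas=3))
```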
[19:16] <joelio> finally got there, I'll stop complaining about chef-deploy now.. got it going with mkcephfs
[19:16] <cjh_> gregaf: has dreamhost run into scaling issues at 900 daemons?
[19:16] <gregaf> they did on argonaut :p
[19:16] <cjh_> i see
[19:16] <paravoid> sjusthm: as I wrote to that bug report, upgrading/restarting all of the OSDs was a very bad experience because of this...
[19:17] <cjh_> what problems did they run into?
[19:17] <gregaf> bunch of little things that you don't see emerge until that kind of scale
[19:17] <sjusthm> paravoid: ok, if you're willing to reproduce, I'd like to get osd logs with logging after the restart
[19:17] <sjusthm> to start, can you restart osd.0 with
[19:17] <sjusthm> debug osd = 20
[19:17] <gregaf> mostly it was the recovery issues that we did a lot of work on in bobtail
[19:17] <sjusthm> debug filestore = 5
[19:17] <sjusthm> debug ms = 1
[19:17] <sjusthm> in the ceph.conf
[19:17] <sjusthm> ?
[19:17] <gregaf> and map processing
[19:17] <cjh_> gregaf: right, makes sense
[19:17] <paravoid> --debug-ms=1 --debug-filestore=5 --debug-osd=20 wouldn't work?
[19:18] <sjusthm> oh, yeah, that would be fine
[19:18] <paravoid> cool
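The two forms discussed here (ceph.conf entries versus command-line flags) carry the same three debug levels. A small sketch of that correspondence; the helper names are hypothetical and the output strings are exactly the ones quoted in the conversation.

```python
# Debug levels sjusthm asked for when restarting the OSD.
DEBUG_LEVELS = {"osd": 20, "filestore": 5, "ms": 1}


def conf_lines(levels: dict) -> list:
    """Render the levels as ceph.conf-style 'debug <subsys> = <n>' lines."""
    return [f"debug {subsys} = {level}" for subsys, level in levels.items()]


def cli_flags(levels: dict) -> list:
    """Render the same levels as '--debug-<subsys>=<n>' command-line flags."""
    return [f"--debug-{subsys}={level}" for subsys, level in levels.items()]


if __name__ == "__main__":
    print("\n".join(conf_lines(DEBUG_LEVELS)))
    print(" ".join(cli_flags(DEBUG_LEVELS)))
```

Either form takes effect from daemon start, which matters here because the interesting log output is produced right after the restart.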
[19:18] <sjusthm> injectargs might be too slow though
[19:18] <paravoid> right
[19:18] <paravoid> I'll do osd.1, in bobtail the problem was less intense for osds that had not been running for a while
[19:18] <gregaf> cjh_: when you create a cluster an order of magnitude bigger than the previous largest you tend to find new things turn into bottlenecks, but it's all implementation details ;)
[19:19] <cjh_> gregaf: oh i'm sure
[19:19] <paravoid> could be a different issue as you say, but let's not risk it :)
[19:20] <cjh_> gregaf: did Dreamhost deploy too many monitors or something?
[19:20] <gregaf> nah, it was stuff like what happens when you have 900 OSDs turning on at the same time, or 400 OSDs fail and then come back up (due to network flapping)
[19:20] <cjh_> gotcha
[19:21] <cjh_> i can see that being an issue
[19:21] <cjh_> you melt the monitors when that happens? :D
[19:21] <gregaf> our fixes for those issues should scale to more or less whatever size (I hope), but there may be something new we haven't foreseen
[19:21] <gregaf> actually you tend to melt the OSDs
[19:21] <cjh_> interesting
[19:21] <cjh_> why the osds?
[19:21] <gregaf> not anymore, but they used to get 40 new OSD maps a second and try and process them all synchronously
[19:21] <cjh_> oh haha
[19:22] <cjh_> that would do it
[19:22] <gregaf> which meant taking a bunch of big locks
[19:22] <cjh_> right
[19:22] <cjh_> while it figures out the state of the world
[19:22] <gregaf> now they take a couple of small locks, and we don't force all the PGs to write out new data on every map all at the same time, etc
[19:23] <gregaf> sjust could talk more about this as he did all the work (I just reviewed it) and probably has a better sense for what kinds of issues are likely to appear in the future
[19:23] <cjh_> ok maybe i'll hit him up if he has time
[19:23] <paravoid> sjusthm: so, peering now lasted for 30s, although now I'm seeing pgs in "active" (not "active+clean") and "active+recovery_wait" and that produces slow requests too
[19:24] <cjh_> gregaf: at a 100 host level how do you ensure that with your 3x replication you don't lose exactly the wrong hosts at the same time?
[19:24] <gregaf> set up a good crush map?
[19:24] <cjh_> or is that all in the crush implementation to ensure your failure domains are properly represented
[19:24] <cjh_> ok
[19:25] <gregaf> I mean, somebody could maliciously start hitting your hard drives
[19:25] <cjh_> yeah that's always a possibility
[19:25] <gregaf> and if you lose 3 OSDs in three separate failure domains at the same time you might lose a small fraction, but that's pretty much life
[19:25] <gregaf> the numbers I hear are that 3 replicas gets your odds of data loss below your odds of data center loss
[19:26] <cjh_> yeah what i mean is it's hard to say which 3 machines i can't lose with ceph
[19:26] <cjh_> ok
[19:26] <gregaf> availability is a different story, but I'm not sure by how much
[19:28] <cjh_> that's odd cause i thought dreamhost was far past 100 hosts by now
[19:28] <gregaf> well, let's see what's public
[19:28] <cjh_> ok
[19:28] <gregaf> they've got two clusters and I think they're both about 3 PB
[19:28] <cjh_> so that puts them at about 200 hosts
[19:28] <paravoid> sjusthm: should I file a new bug report or append the data to this one?
[19:29] * DarkAce-Z (~BillyMays@ has joined #ceph
[19:29] <gregaf> that requires 1000 3TB disks, and you can stick 9 to 12 disks in a 2U (?) host depending on which model you get from which vendor
[19:29] <cjh_> gregaf: make that 400 hosts
[19:29] <gregaf> which is 100ish per cluster
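gregaf's host count can be checked with a short calculation. This sketch assumes the 3 PB figure is raw disk capacity in decimal units, which is how the 1000-disk number falls out; the function names are hypothetical.

```python
def disks_needed(capacity_pb: int, disk_tb: int) -> int:
    """Number of disks needed to reach the capacity (1 PB = 1000 TB), rounded up."""
    capacity_tb = capacity_pb * 1000
    return -(-capacity_tb // disk_tb)  # ceiling division on integers


def hosts_needed(disks: int, disks_per_host: int) -> int:
    """Number of hosts needed to hold that many disks, rounded up."""
    return -(-disks // disks_per_host)


if __name__ == "__main__":
    disks = disks_needed(capacity_pb=3, disk_tb=3)
    # 9 to 12 disks per host brackets the host count per cluster.
    print(disks, hosts_needed(disks, 12), hosts_needed(disks, 9))
```

With 9 to 12 disks per host the result lands between 84 and 112 hosts, which matches the "100ish per cluster" estimate.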
[19:29] <cjh_> yeah i was thinking more like 15 disks but you're prob right
[19:30] <cjh_> so they've got a ways to go
[19:30] <gregaf> it doesn't take that much physical space any more to store more data than you can possibly imagine filling :p
[19:30] <cjh_> the genomics people that wanted the erasure coding are probably looking to build the biggest cluster yet
[19:30] <gregaf> unless you're one of those people who routinely works in quantities nobody else can believe
[19:31] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[19:31] <cjh_> haha
[19:31] <gregaf> I heard somewhere that Netflix's whole catalogue was a small number of PB and I just couldn't believe it
[19:31] * rongze (~zhu@ Quit (Read error: Connection timed out)
[19:31] <cjh_> that seems low
[19:31] <gregaf> but then I sat down and did the math and 3PB is actually many million hour-long titles at 5mb/s
[19:32] * DarkAceZ (~BillyMays@ Quit (Ping timeout: 480 seconds)
[19:32] <cjh_> my guess is they keep multiple copies of everything at different levels of encoding
[19:32] <gregaf> which is actually the right size
[19:32] <cjh_> hmm ok
[19:32] <gregaf> yeah, but their top-end is 5mb/s and they encode every half level or whatever so you can say it's 10mb overall, and then carry on the math
[19:32] * rongze (~zhu@ has joined #ceph
[19:33] <cjh_> right
[19:33] <gregaf> so, you know, Netflix, only 3PB
[19:33] <cjh_> so depending on the titles they have, etc you could estimate it pretty accurately
[19:33] <gregaf> there aren't many people who want to store more than they do ;)
[19:33] <cjh_> indeed
[19:33] <gregaf> yeah, I was just picking out one of their advertised title counts though I don't remember it now
[19:36] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[19:36] <cjh_> gregaf: i'm willing to bet bioinformatics people could dwarf that amount of data
[19:36] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:36] <gregaf> yeah, the repositories definitely can
[19:36] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[19:37] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:37] <cjh_> gregaf: so if one of those guys said we wanted to build a 50PB cluster would you say go for it?
[19:37] <gregaf> sure
[19:38] <cjh_> nice :)
[19:38] <gregaf> though to be fair I'm not sure how else you would go about building one at that size
[19:39] <cjh_> oh i'm sure there's ways
[19:40] * tkensiski (~tkensiski@179.sub-70-197-10.myvzw.com) has joined #ceph
[19:41] * tkensiski (~tkensiski@179.sub-70-197-10.myvzw.com) has left #ceph
[19:41] <gregaf> I'm not saying there aren't, just that I don't know what they are :)
[19:42] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[19:43] <gregaf> I guess GPFS gets pretty big, so several of those clustered together? or Lustre, but even ORNL's Titan install is only 40PB and they've probably got all the world's Lustre expertise working on setting that up and keeping it alive
[19:43] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:43] <cjh_> wow i didn't know there was a lustre that big
[19:43] <gregaf> it's fairly new
[19:43] <gregaf> and probably a couple times larger than the next-biggest
[19:44] <cjh_> that's crazy
[19:44] <gregaf> Wikipedia says April
[19:44] <cjh_> i'd hope they have a clustered mds setup :)
[19:46] <gregaf> no idea
[19:52] <sjusthm> paravoid: probably just add the file to the bug
[19:53] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) has joined #ceph
[19:54] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[20:07] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[20:14] <Midnightmyth> when subscribing to ceph-devel do you receive a confirmation?
[20:17] <gregaf> huh, I don't remember
[20:18] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Quit: Leaving.)
[20:18] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[20:21] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[20:22] <gregaf> but I did just send an email to the ceph-devel list; if that shows up shortly I'd say you're subscribed ;) (or the converse)
[20:24] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Quit: Leaving.)
[20:24] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[20:24] * ShaunR (~ShaunR@staff.ndchost.com) has joined #ceph
[20:30] * athrift (~nz_monkey@ Quit (Ping timeout: 480 seconds)
[20:30] <sjusthm> paravoid: logs?
[20:30] <paravoid> sec.
[20:30] <sjusthm> k
[20:32] <erwan_taf> hi there
[20:32] * jfriedly (~jfriedly@50-0-250-146.dedicated.static.sonic.net) has joined #ceph
[20:32] <erwan_taf> are any of the ceph people keeping an eye on the LocalFS initiative ?
[20:33] <erwan_taf> swift is currently working on allowing multiple backend support
[20:33] <erwan_taf> gluster will be the first implementation
[20:33] <erwan_taf> wonder if ceph people will provide a backend for swift ?
[20:34] <sjusthm> we do have a swift compatible backend
[20:36] * ShaunR (~ShaunR@staff.ndchost.com) Quit (Ping timeout: 480 seconds)
[20:36] <mattbenjamin> davidz: around?
[20:38] <davidz> mattbenjamin: yes
[20:39] <mattbenjamin> Hi David.
[20:40] <mattbenjamin> I had on my todo list to revisit wip-libcephfs and wip-libcephfs-rebased.
[20:40] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[20:40] <mattbenjamin> What I pushed initially was just a draft.
[20:41] <mattbenjamin> The stuff on rebased has our changes, but loses all attribution, and mixes in lots of other changes. Should I be rebasing wip-libcephfs again on something else?
[20:42] * wer (~wer@206-248-239-142.unassigned.ntelos.net) Quit (Ping timeout: 480 seconds)
[20:45] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:47] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:49] <Midnightmyth> hmm can't get the mailing list to work, I'm sending the subscribe ceph-devel to majordomo but just get an error
[20:49] <mattbenjamin> Or, some other changes. I lack context, I guess.
[20:50] <dmick> Midnightmyth: the kind of error probably matters
[20:50] <sagewk> mattbenjamin: the rebase patches are easier to follow so that or something close to it is ideally what we want to merge, but of course the attribution needs to be fixed up on the commits
[20:50] <Midnightmyth> Im getting "The address you supplied, = is not a complete address."
[20:50] <sagewk> also, ideally the branch would be against master for a clean merge. but that is easy to fix later.
[20:51] <paravoid> sjusthm: done
[20:52] <sjusthm> thanks
[20:52] * tkensiski (~tkensiski@ has joined #ceph
[20:52] * tkensiski (~tkensiski@ has left #ceph
[20:53] <mattbenjamin> Ok, that just threw me. I'm interested in going in and making Ganesha actually run against the branch.
[20:54] <mattbenjamin> I'd like to fix the attribution for what we did, but don't want to create any ambiguity about how you're adapting things yourselves.
[20:56] <dmick> Midnightmyth: indeed, = is not a complete address
[20:56] <mattbenjamin> Sorry I didn't look at it earlier. The emails from Ilya made me want to get this off the back burner.
[20:57] <Midnightmyth> dmick I did put my address ..
[20:58] * loicd (~loic@2a01:e35:2eba:db10:f59f:7722:b1c4:a589) Quit (Quit: Leaving.)
[20:58] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:59] <Midnightmyth> the error I got : It says When providing an address, you must give**** the full name of the machine including the domain part (like **** host.corp.com
[21:01] <mattbenjamin> sagewk,davidz: or, should I focus on wip-libcephfs, and not worry about rebased for now?
[21:02] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[21:04] <Midnightmyth> ah I see why it wont work... MS ruins it
[21:05] <Midnightmyth> need to get another mail adr then
[21:05] * mattbenjamin1 (~matt@aa2.linuxbox.com) has joined #ceph
[21:05] * viperafk (~baron@n1-38-250.dhcp.drexel.edu) has joined #ceph
[21:08] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[21:11] * mattbenjamin (~matt@aa2.linuxbox.com) Quit (Ping timeout: 480 seconds)
[21:11] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[21:11] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[21:17] * nhm (~nhm@65-128-142-169.mpls.qwest.net) has joined #ceph
[21:17] * ChanServ sets mode +o nhm
[21:18] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[21:19] * portante_ (~user@ has joined #ceph
[21:20] * flakrat (~flakrat@eng-bec264la.eng.uab.edu) has joined #ceph
[21:22] * portante (~user@ Quit (Ping timeout: 480 seconds)
[21:30] * BManojlovic (~steki@fo-d- has joined #ceph
[21:32] * rongze (~zhu@ Quit (Read error: Connection reset by peer)
[21:32] * rongze (~zhu@ has joined #ceph
[21:34] * eschnou (~eschnou@220.177-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:35] <jfriedly> Any radosgw people around? I'm wondering if we can configure radosgw with a stripe unit, stripe count, and object size to get fast writes in addition to the fast reads that we get by default
[21:37] <paravoid> joao: so, the verdict is that I messed up and restarted one of the mons in a critical moment?
[21:37] * jasdeepH (~jasdeepH@50-0-250-146.dedicated.static.sonic.net) has joined #ceph
[21:38] <paravoid> I don't remember that but I won't dispute it
[21:41] * viperafk (~baron@n1-38-250.dhcp.drexel.edu) Quit (Quit: leaving)
[21:43] * mattbenjamin1 (~matt@aa2.linuxbox.com) Quit (Quit: Leaving.)
[21:44] * paradigm (~paradigm@ has joined #ceph
[21:44] * mattbenjamin (~matt@aa2.linuxbox.com) has joined #ceph
[21:52] <sjustlaptop> paravoid: what do you have set for osd op threads? default?
[21:53] <paravoid> yes
[21:54] <paravoid> sjustlaptop: my settings are https://git.wikimedia.org/blob/operations%2Fpuppet.git/d06f424819e6cce15ac9d5d105044028785b2d13/manifests%2Frole%2Fceph.pp
[21:55] <paravoid> some of them are weird, remnants of older bugs, I should probably remove them
[21:55] * athrift (~nz_monkey@ has joined #ceph
[21:55] <paravoid> osd op thread timeout / osd recovery thread timeout at least
[21:56] * Oliver1 (~oliver1@ip-37-201-199-233.unitymediagroup.de) has left #ceph
[21:58] * portante_ (~user@ Quit (Ping timeout: 480 seconds)
[22:01] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) Quit (Remote host closed the connection)
[22:05] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Quit: Leaving.)
[22:05] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[22:07] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) has joined #ceph
[22:07] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) Quit (Quit: leaving)
[22:10] <sjustlaptop> paravoid: I've got a small patch I'd like you to try when it builds
[22:10] <andrei> hello guys
[22:10] <andrei> i am back with my mons headache.
[22:11] <andrei> is there anyone who could help me bring back my ceph cluster?
[22:13] <mattbenjamin> lieb (others): nfs-ganesha/ntirpc duplex-10 is updated, default branch now duplex-10, and our next has a commit retargeting the submodule to the official repo.
[22:13] <mattbenjamin> (er, mix: why I don't always join this channel, sorry :(
[22:15] <paravoid> sjustlaptop: sure
[22:15] <paravoid> would it need a restart of all osds to test?
[22:15] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:15] <sjustlaptop> paravoid: for now, better to just update 1 osd and restart 1 osd
[22:15] <sjustlaptop> it's a 1 line change
[22:15] <gregaf> andrei: what's up?
[22:24] * portante (~user@ has joined #ceph
[22:33] <sjusthm> paravoid: wip_5238_cuttlefish
[22:33] <sjusthm> osd.2 again
[22:33] <sjusthm> and can you add debug-filestore=20
[22:33] <sjusthm> ?
[22:33] <joao> paravoid, that's my current verdict that I'm trying to dispute as we speak
[22:36] <andrei> gregaf: thanks
[22:36] <andrei> i've uploaded the log files to ceph.com sftp server
[22:36] <andrei> in andrei_irc folder
[22:37] <andrei> basically, i had 3 mons before one of the mons crashed
[22:37] <andrei> apparently due to the issue with leveldb
[22:37] <joao> andrei, haven't had the opportunity to look into them
[22:37] <joao> sorry about that
[22:37] <joao> trying to wrap up other bug
[22:37] <andrei> i tried to add a 4th mon so that i could remove the crashed one
[22:37] <andrei> joao: no probs
[22:38] <andrei> joao: is there a way for me to start clean with the mons
[22:38] <andrei> ?
[22:38] <andrei> but keep the data?
[22:38] <andrei> i would like to resume my testing
[22:39] <andrei> i was hoping to start moving production vms onto ceph early next week
[22:40] <davidz> mattbenjamin, sagewk: I would work off of wip-libcephfs-rebased, since Sage tweaked the commits, but the code is identical except for some dead code removal.
[22:41] <gregaf> andrei: so adding the fourth mon failed? or what's the problem?
[22:42] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:42] * ChanServ sets mode +v andreask
[22:42] <andrei> gregaf: yeah, after i've added it my cluster failed
[22:42] <andrei> it stopped responding
[22:42] <andrei> ceph -s hangs
[22:42] <andrei> i've tried to follow the guide to remove the mon
[22:43] <andrei> i've removed 2 mons - the newly added one and the crashed one
[22:43] <andrei> after that i've restarted the mons that were previously working okay
[22:43] <andrei> but that failed as well
[22:44] <andrei> one of the mons just keeps crashing
[22:44] <gregaf> it sounds like maybe you managed to add the fourth monitor without actually connecting it
[22:44] <andrei> and other one starts, but i guess i can't do much
[22:44] <gregaf> ah, is the backtrace on the crash in your logs?
[22:44] <andrei> i've followed the adding guide step by step
[22:45] <andrei> yeah, there should be a backtrace in the logs
[22:45] <andrei> which i've sent as per joao's instructions
[22:47] <paravoid> sjusthm: sorry, just came back, doing it now
[22:47] <sjusthm> no problem
[22:48] <paravoid> hm, installing the package will restart all the osds
[22:49] <sjusthm> still, you can do it on just one machine at least
[22:49] * eschnou (~eschnou@220.177-201-80.adsl-dyn.isp.belgacom.be) Quit (Read error: Operation timed out)
[22:50] <andrei> does anyone know if i can start fresh with ceph monitors without killing the data in the cluster?
[22:51] <sjustlaptop> andrei: not easily
[22:51] <andrei> in that case, how do i recover from the state i am in now?
[22:52] <andrei> should i follow the guide again and remove all mons apart from one?
[22:52] <andrei> and add other mons after?
[22:52] <andrei> would that work?
[22:52] <andrei> or do i need another approach?
[22:55] <gregaf> andrei: are you sure you removed the monitors you think you did? it looks like your monitor that's crashing got a corrupted disk state (possibly by never having all the state to begin with)
[22:56] <andrei> the monmap after removal had 2 mons in it
[22:56] <andrei> and i had 4 before the removal
[22:57] <gregaf> joao: check_osd_map (PGMonitor) is asserting because it's getting out an error from osdmon()->get_version() (presumably ENOENT?)
[22:57] <paravoid> sjusthm: much much worse
[22:57] <paravoid> sjusthm: I did it just on osd.2
[22:57] <paravoid> now I got stuck peering
[22:57] <sjustlaptop> um
[22:58] <sjustlaptop> can you get the ceph-osd -v on another node?
[22:58] <paravoid> but, I now have -filestore=20, -osd=20, -ms=1
[22:58] <sjustlaptop> and post the log
[22:58] <gregaf> joao: so my guess is that he wants to clear out the store (keeping backups) and let it re-sync from the other, but I don't remember what needs to stay where to make that happen
[22:58] <joao> gregaf, I've seen that happen before, just can't recall which bug
[22:58] <joao> or maybe an iteration of that sort of thing
[22:58] <paravoid> ceph version 0.61.2-58-g7d549cb (7d549cb82ab8ebcf1cc104fc557d601b486c7635)
[22:58] <paravoid> that's 0.61.3 right before the version bump
[22:59] <joao> andrei, did you upgrade your monitors from bobtail?
[22:59] <sjustlaptop> yeah
[22:59] <andrei> joao: nope, clean install from ubuntu repo
[22:59] <paravoid> still peering
[22:59] <joao> right
[22:59] <sjustlaptop> I cannot imagine a way in which that patch would have resulted in hung peering
[22:59] <sjustlaptop> interesting
[23:00] <paravoid> well
[23:00] <sjustlaptop> you can reinstall the old version and restart and post the logs
[23:00] <paravoid> I saw stuck peering before
[23:00] <joao> gregaf, checking what and where is causing the assert would be nice before getting rid of the store
[23:00] <paravoid> not with the logs you saw
[23:00] <sjustlaptop> meh, no point in speculating, this instance has log output
[23:00] <paravoid> but it has happened with .2
[23:00] <paravoid> yeah
[23:00] <joao> andrei, can you install 'ceph-tests', and run a couple of commands for me?
[23:01] <joao> or just send the store my way? :)
[23:01] <paravoid> just finished
[23:01] <andrei> sure
[23:01] <paravoid> almost 5'
[23:02] <sjustlaptop> it finished peering?
[23:03] <paravoid> yes
[23:03] <sjustlaptop> hmm
[23:03] <sjustlaptop> ok
[23:04] * dcasier (~dcasier@ Quit (Ping timeout: 480 seconds)
[23:04] <paravoid> bzipping logs and uploading
[23:05] <andrei> joao: i've installed it on one of the mon/mds/osd servers
[23:05] <andrei> should i install it on the clients as well?
[23:05] <andrei> or one node would be sufficient?
[23:07] * Vjarjadian (~IceChat77@ has joined #ceph
[23:07] <paravoid> sjusthm: slowpeer2-ceph-osd.2.log.bz2 & slowpeer2-osd2-ceph.log
[23:10] * dcasier (~dcasier@ has joined #ceph
[23:11] * DarkAce-Z is now known as DarkAceZ
[23:16] * mschiff (~mschiff@port-50463.pppoe.wtnet.de) has joined #ceph
[23:16] <andrei> joao: what commands would you like me to run?
[23:18] <sjustlaptop> paravoid: can you try it again without debug-filestore=20?
[23:19] <sjustlaptop> something is fishy
[23:19] <joao> andrei, 'ceph_test_store_tool <path-to-mon-data-dir>/store.db list osdmap > store.dump && ceph_test_store_tool <path-to-mon-data-dir>/store.db get osdmap first_committed >> store.dump && ceph_test_store_tool <path-to-mon-data-dir>/store.db get osdmap last_committed >> store.dump'
[23:19] <joao> and drop store.dump with the rest of your logs in cephdrop
[23:20] <andrei> joao: will run it now
[23:20] <andrei> should I do it on the mon which keeps crashing?
[23:20] <joao> k thx
[23:20] <andrei> or on all 4 mons?
[23:22] <joao> andrei, yes please; change the name of the dump file accordingly for each mon though
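Since joao asks for the same three queries on every monitor with a distinct dump file each time, the command line can be generated per mon. A sketch only: the tool name and its arguments are taken from joao's message, while the mon ids, the data directory layout and the `store-<id>.dump` naming are assumptions for illustration.

```python
import shlex


def store_dump_command(mon_id: str, mon_data_dir: str) -> str:
    """Build the chained shell command that dumps osdmap info for one monitor."""
    store = shlex.quote(f"{mon_data_dir}/store.db")
    dump = shlex.quote(f"store-{mon_id}.dump")  # one dump file per monitor
    steps = [
        f"ceph_test_store_tool {store} list osdmap > {dump}",
        f"ceph_test_store_tool {store} get osdmap first_committed >> {dump}",
        f"ceph_test_store_tool {store} get osdmap last_committed >> {dump}",
    ]
    return " && ".join(steps)


if __name__ == "__main__":
    # Hypothetical mon ids and data directories; adjust to the real cluster.
    for mon_id in ("a", "b"):
        print(store_dump_command(mon_id, f"/var/lib/ceph/mon/ceph-{mon_id}"))
```

The sketch only prints the commands, so they can be reviewed before being run on each monitor host.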
[23:22] <paravoid> sjustlaptop: this was with debug-filestore=20
[23:22] <sjustlaptop> yeah, without this time
[23:22] <paravoid> oh
[23:22] <paravoid> btw
[23:22] <paravoid> 2013-06-06 21:22:21.876723 mon.0 [INF] pgmap v8355185: 16760 pgs: 16514 active+clean, 1 active+degraded+wait_backfill, 152 active+degraded+backfilling, 12 active+degraded+remapped+wait_backfill, 8 active+clean+scrubbing, 73 active+degraded+remapped+backfilling; 44530 GB data, 137 TB used, 117 TB / 254 TB avail; 1784KB/s rd, 1982KB/s wr, 502op/s; 3395684/825624677 degraded (0.411%); recovering 1088 o/s, 173MB/s
[23:23] <paravoid> now it's like that
[23:23] <paravoid> (it out'ed osd.2)
[23:23] <paravoid> and I'm getting tons of slow requests
[23:23] <sjustlaptop> is it back in?
[23:23] <paravoid> I haven't started it yet, no
[23:23] <sjustlaptop> that's why it outed it
[23:23] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[23:23] <paravoid> 2013-06-06 21:23:35.604779 osd.51 [WRN] 4 slow requests, 1 included below; oldest blocked for > 612.576969 secs
[23:23] <paravoid> I know
[23:24] <paravoid> I'm just saying, out'ing didn't use to block requests for 10 mins
[23:24] <mikedawson> Just apt-get upgraded to 0.61.3. All ceph processes got a sigterm and gracefully shut down. Is that expected? I don't recall a package upgrade ever sending sigterm's to the daemons before.
[23:24] <paravoid> different bug I guess
[23:24] <sjustlaptop> paravoid: yeah, one bug at a time
[23:24] <paravoid> heh
[23:24] <gregaf> sagewk: I thought you fixed that sigterm-on-update that mikedawson is seeing...
[23:25] <paravoid> okay, I'll start osd.2 without any debug options, let it clean itself up and be healthy again
[23:25] <paravoid> and I'll retry the test
[23:26] <paravoid> wow, that out -> slow reqs is very very bad
[23:26] <paravoid> digging my hole deeper and deeper
[23:28] <sjustlaptop> right
[23:29] <andrei> joao: okay, first one uploaded in andrei_irc folder
[23:29] <andrei> called mon-a.tar.bz2
[23:29] <andrei> doing the rest now
[23:29] * mschiff (~mschiff@port-50463.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[23:30] <paravoid> sjustlaptop: so, debug-filestore=5 now?
[23:30] <Karcaw> how do i start an osd that is 'full'? it got so full that it's getting an ENOSPC. i can reweight it, but it won't even start enough to let it reduce its usage. is it safe to just remove a few things to get it started?
[23:30] <sjustlaptop> paravoid: whatever you used for the first one
[23:31] <paravoid> that was it
[23:31] <paravoid> fwiw, I started it without any debug options and it's still taking minutes to peer
[23:31] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[23:33] <sjustlaptop> how's the cpu usage?
[23:33] <sjustlaptop> memory?
[23:34] <Vjarjadian> anyone here used Cephwin? supposedly a client for windows server 2008
[23:34] <paravoid> https://ganglia.wikimedia.org/latest/?c=Ceph%20eqiad&h=ms-be1001.eqiad.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2
[23:34] <sjustlaptop> paravoid: that's with the patched version?
[23:35] <joao> eh, is there such a thing? first time I'm hearing about this :p
[23:35] <paravoid> sjustlaptop: yes
[23:35] <Vjarjadian> http://code.google.com/p/cephwin/
[23:35] <paravoid> cpu & memory above
[23:35] <joao> Vjarjadian, oh wow
[23:35] * mschiff (~mschiff@port-50463.pppoe.wtnet.de) has joined #ceph
[23:35] * jjgalvez1 (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[23:36] <Vjarjadian> i'm hoping maybe it will work with Hyper-V server 2012... so i can avoid having to have iSCSI VMs...
[23:36] <paravoid> I was telling sagewk how I have interesting bugs
[23:37] <joao> Vjarjadian, it looks like the project is empty
[23:37] <Vjarjadian> indeed, i emailed ages ago, but no response
[23:37] <andrei> joao, on the mon server which initially crashed I get a seg fault running the first command
[23:38] <joao> Vjarjadian, considering most of their sentences are written as "will be built", my guess is that the project never took off
[23:38] <andrei> the second command also causes a seg fault
[23:38] <Vjarjadian> guess i'll need to have the iSCSI VMs then
[23:38] <joao> andrei, oh yeah? is there a stack trace?
[23:39] <andrei> that's the only thing that i get
[23:39] <andrei> ceph_test_store[16233]: segfault at 40 ip 000000000041fa10 sp 00007fff54061138 error 4 in ceph_test_store_tool[400000+71000]
[23:41] * rongze (~zhu@ Quit (Ping timeout: 480 seconds)
[23:42] <andrei> i can run an strace or something else if you wish
[23:42] <andrei> the newly added monitor also produces a seg fault when i run the command
[23:43] <joao> andrei, maybe run the ceph_test_store_tool commands with gdb and see what's causing the segfault?
[23:44] <andrei> let me finish the final mon and i will get back to seg fault
[23:44] <joao> k
[23:44] <joao> no worries
[23:44] <andrei> in the meantime, could you please remind me how to run gdb and what i should do. i've not done this for years
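One common way to get a backtrace for a crashing command, sketched here as a possible answer to andrei's question: run it under gdb in batch mode, let it crash, and print the stack. The wrapper name gdb_backtrace is hypothetical; the gdb options used (-batch, -ex, --args) are standard, and the output is easier to read when debug symbols for the binary are installed.

```shell
#!/bin/sh
# Sketch only: gdb_backtrace is a hypothetical wrapper name.
# -batch exits once the listed commands finish, each -ex runs one gdb
# command, and --args passes everything after the program name to it.
gdb_backtrace() {
    gdb -batch -ex run -ex bt --args "$@"
}

# Example (path elided as in joao's command); saves the trace to a file:
# gdb_backtrace ceph_test_store_tool <path-to-mon-data-dir>/store.db list osdmap > gdb-mon.txt 2>&1
```

The interactive equivalent is to start gdb with --args and the same command line, type run, wait for the segfault, and then type bt.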
[23:50] <andrei> joao: okay, the mon-b files have been uploaded
[23:50] <andrei> now i can check the segfault issue
[23:50] * portante (~user@ Quit (Read error: Operation timed out)
[23:53] <paravoid> sjustlaptop: heh, now it worked, dammit
[23:53] <paravoid> it didn't take long to peer that is
[23:53] * mschiff_ (~mschiff@2a02:2028:256:f531:290:f5ff:fec3:eac5) has joined #ceph
[23:54] <sjustlaptop> paravoid: try again?
[23:54] <Midnightmyth> arg srsly how come it's so difficult to subscribe to the mailing list, all the web-based mail systems use weird systems making it impossible
[23:54] <paravoid> I will
[23:54] <paravoid> I've kinda seen this before
[23:54] <paravoid> I think the observation I've made has to do with osd's uptime
[23:55] <paravoid> if it's been running for a while or not, but I could be wrong.
[23:55] <sjustlaptop> for some problems, probably
[23:55] <gregaf> Midnightmyth: hmm, it works fine for me as long as I send plain text emails (there's an easy switch for that in gmail)
[23:55] <sjustlaptop> for this particular one, I think it's probably not very strongly related
[23:55] <gregaf> just "subscribe ceph-devel" from the email I wanted to receive it at seems to have worked
[23:56] <Midnightmyth> gregaf: I did send a textmail from my gmx mail
[23:56] * mschiff (~mschiff@port-50463.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[23:59] <Midnightmyth> oh well nevermind that
[23:59] <Midnightmyth> how often is ceph released?

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.