#ceph IRC Log


IRC Log for 2013-05-26

Timestamps are in GMT/BST.

[0:09] * tnt (~tnt@91.177.204.242) Quit (Read error: Operation timed out)
[0:13] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[0:17] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[0:17] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) has joined #ceph
[0:29] * redeemed (~redeemed@cpe-192-136-224-78.tx.res.rr.com) Quit (Quit: bia)
[0:32] * BillK (~BillK@124-169-236-155.dyn.iinet.net.au) has joined #ceph
[0:42] * frank9999 (~frank@kantoor.transip.nl) Quit (Ping timeout: 480 seconds)
[0:43] * frank9999 (~frank@kantoor.transip.nl) has joined #ceph
[0:52] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[0:55] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:55] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:56] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[0:59] <Kioob> Hi !
[0:59] <Kioob> I have a big problem tonight: store.db keeps growing (nearly 9GB!) and the disk is almost full
[1:00] <Kioob> mon.a addr 10.0.0.1:6789/0 has 25% avail disk space -- low disk space!
[1:00] <Kioob> mon.b addr 10.0.0.2:6789/0 has 21% avail disk space -- low disk space!
[1:00] <Kioob> mon.e addr 10.0.0.3:6789/0 has 14% avail disk space -- low disk space!
[1:04] <Kioob> (ok I'm reading http://tracker.ceph.com/issues/4895 )
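
For anyone hitting the same warnings, the quickest way to confirm that it is the monitor store eating the disk is to check the store size directly. A minimal sketch, assuming the default data path and a mon id of "a":

    # how big is the monitor's leveldb store?
    du -sh /var/lib/ceph/mon/ceph-a/store.db

    # and how full is the partition it lives on?
    df -h /var/lib/ceph/mon/ceph-a
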
[1:08] * danieagle (~Daniel@177.205.182.70.dynamic.adsl.gvt.net.br) has joined #ceph
[1:18] <Kioob> has anyone tried the wip-4895-cuttlefish version from Sage?
[1:22] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[1:25] <nhm> Kioob: there were some suggestions in the tracker too. It's definitely something we are working on.
[1:27] <Kioob> yes nhm, I tried "mon compact on start = true"
[1:27] <Kioob> but the compaction needs some space
[1:27] <Kioob> so... the disk is now 100% full, and the mon process crashed
[1:28] <nhm> Kioob: ugh, no good.
[1:28] <Kioob> yes...
[1:28] <nhm> Kioob: I'm coming at this from the performance side. I have a test cluster that had OSDs stalling ops because they couldn't get monmap updates.
[1:28] <nhm> sorry, osdmap
[1:29] <nhm> Kioob: I haven't been involved in a whole lot of the discussions yet, just discovered the stuff I was seeing was related this morning.
[1:29] <Kioob> ok... so Mon should have fast storage too
[1:30] <nhm> Kioob: maybe, but I suspect there's something strange going on.
[1:30] <Kioob> right now, can I move my crashed mon to an SSD (on a bigger partition)?
[1:30] <nhm> Kioob: I was seeing 100% CPU utilization on all mons, seemingly due to compaction or something.
[1:30] <nhm> Kioob: I don't know enough about the mons to feel comfortable saying if that's a good idea or not. :/
[1:31] <Kioob> well... in a few hours my whole cluster will be dead
[1:31] <nhm> Kioob: production?
[1:31] <Kioob> of course... Murphy ;)
[1:31] <nhm> Kioob: ugh. Support contract? ;)
[1:32] <Kioob> not yet...
[1:32] <nhm> Kioob: ok. It's going to be tough to find anyone on the weekend then.
[1:32] <Kioob> I suppose yes
[1:32] <nhm> Kioob: If you can slow down cluster traffic it might let the mons catch up, but I don't know for sure.
[1:33] * jtang1 (~jtang@80.111.97.194) Quit (Quit: Leaving.)
[1:35] <nhm> Kioob: You could try Sage's patch. No idea if it works though.
[1:36] <Kioob> oh. I see that all my "mon"s use ext4 storage. Don't know if that's the recommended setup
[1:37] <nhm> Kioob: did you see Mike Dawson's temporary fix of stopping the OSDs and MONs, then starting the MONs only with mon compact on start = true?
[1:37] <nhm> If you have no alternative and the cluster is going to go down anyway, that might be a way to recover gracefully.
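
A rough sketch of the workaround described above, assuming a sysvinit-style deployment driven by the ceph service wrapper; daemon names and the init system will vary with your setup:

    # on each node: stop the OSDs and MONs
    service ceph stop osd
    service ceph stop mon

    # add compaction-on-startup to the [mon] section of ceph.conf
    #   [mon]
    #       mon compact on start = true

    # start only the monitors so they can compact store.db without OSD traffic
    service ceph start mon

    # once the mons are back in quorum and compacted, bring the OSDs back
    service ceph start osd
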
[1:38] <Kioob> in fact, that's what I tried. My disk was at 86% usage; the mon started a compaction, filled the disk to 100%, then crashed
[1:38] <nhm> doh
[1:38] <Kioob> so to do the compaction, it needs some room
[1:38] <nhm> Ok, I see your problem.
[1:39] <Kioob> I'm trying to move the mon to a bigger partition
[1:39] <Kioob> but I don't know if I can safely move /var/lib/ceph/mon/
[1:39] * jtang1 (~jtang@80.111.97.194) has joined #ceph
[1:40] <nhm> Kioob: Sadly I don't know either.
[1:40] <Kioob> I can't find any information about xattrs
[1:40] <Kioob> ;)
[1:40] <nhm> Kioob: I mostly work with clusters that live for like an hour or two. :)
[1:40] <nhm> if that
[1:40] <nhm> unless I'm doing aging tests
[1:48] <Kioob> nhm : I see that the mon does a lot of "fsync"; doesn't it use "syncfs" like the osd? So having a mon on the same host as an osd is a very very bad idea, no?
[1:53] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:57] <Kioob> (moving the data to a bigger partition seems to solve my problem)
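
A minimal sketch of the kind of move Kioob is describing, with the mon id "a" and the new mount point /srv/ceph-mon as placeholders; this is untested here, so stop the daemon first and preserve ownership when copying:

    service ceph stop mon.a
    # copy the monitor store to the bigger partition, preserving attributes
    rsync -a /var/lib/ceph/mon/ceph-a/ /srv/ceph-mon/ceph-a/
    # point the old path at the new location (or set "mon data" in ceph.conf instead)
    mv /var/lib/ceph/mon/ceph-a /var/lib/ceph/mon/ceph-a.old
    ln -s /srv/ceph-mon/ceph-a /var/lib/ceph/mon/ceph-a
    service ceph start mon.a
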
[2:15] <Kioob> strange: when the mon "compacts" the 9.7GB store.db, it grows to 18GB, then shrinks to 9.5GB... Great :p
[2:22] <Kioob> → I had shut down a mon, then all the other mons shrank to 1.2GB. Don't know if it's related
[2:24] <nhm> Kioob: Could you mention all of that on the tracker?
[2:24] <nhm> Kioob: don't know if it's related or not either, but it wouldn't hurt to mention it.
[2:33] <Kioob> ok
[2:41] * jtang1 (~jtang@80.111.97.194) Quit (Quit: Leaving.)
[2:53] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Ping timeout: 480 seconds)
[2:54] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[3:00] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[3:08] * Psi-jack (~psi-jack@psi-jack.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:09] * danieagle (~Daniel@177.205.182.70.dynamic.adsl.gvt.net.br) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[3:14] * treaki_ (588095a022@p4FDF665F.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[3:15] * LeaChim (~LeaChim@2.127.72.50) Quit (Ping timeout: 480 seconds)
[3:19] <Kioob> nhm: I'm really interested in your (future) analysis of the mons, performance-wise. Now that all my mons are on SSD, I see that an "rbd rm XXX" runs without impacting cluster performance. When I tried it a few hours ago, I had to cancel the "rm" to let production keep working
[3:20] <Kioob> and since all my mons are on the same hosts as the osds, I suppose that doesn't help.
[3:22] <nhm> Kioob: yeah, I need to start paying more attention to the mons, especially in larger setups.
[3:23] <nhm> Kioob: I've been really focused on raw OSD/RBD/FS performance, and less on cluster scaling.
[3:23] <nhm> Wasn't expecting the MON to become a big bottleneck.
[3:24] <nhm> at least until hitting a lot of clients and lots of PGs.
[3:24] * Psi-jack (~psi-jack@psi-jack.user.oftc.net) has joined #ceph
[3:27] <Psi-jack> I have 3 ceph servers all saying HEALTH_WARN 142 pgs peering; 142 pgs stuck inactive; 345 pgs stuck unclean -- and I don't know why.... Yet since I wasn't actually here, I had to solve it myself because it took down my firewalls, which use RBD disks on the ceph cluster. LOL
[3:27] <Psi-jack> So I ended up hard rebooting one of the 3 servers and it recovered..
[3:33] <nhm> Kioob: btw, just talked to Sage. Sounds like the fix appears to be working. Assuming no major issues there should be a point release coming soon.
[3:33] <Psi-jack> Wait. 0.61?
[3:34] <nhm> Psi-jack: yeah
[3:34] <Kioob> great news! I had recompiled a Ceph version with this patch; I will deploy it then. Thanks nhm! (and Sage)
[3:34] <nhm> Kioob: do keep the "no major issues" comment in mind. ;)
[3:35] <nhm> Kioob: not sure how much testing has been done yet...
[3:35] <Psi-jack> but.. But.. Cuttlefish is named.. meaning it's another LTS?
[3:35] <nhm> Psi-jack: point release, like 0.61.3
[3:35] <Kioob> oh you're right... Since I have a lot of room now, I should wait for the point release
[3:36] <Psi-jack> But, 0.56, what about it?
[3:36] <nhm> Psi-jack: I'm not sure if this bug affects 0.56
[3:36] <Psi-jack> heh. I'm not asking about the bug you're referring to.
[3:36] <Kioob> (0.56 didn't have store.db)
[3:36] <Psi-jack> I'm asking is if 0.61.x is LTS like 0.56 is.
[3:37] <nhm> Psi-jack: oh, I think so.
[3:37] <Psi-jack> Wow.. That's way earlier than expected.
[3:37] <nhm> Well, I have no idea really. I should probably pay more attention to those things. ;)
[3:37] <nhm> Psi-jack: basically just assume I have no idea what I'm talking about most of the time. ;)
[3:38] <Psi-jack> hehe
[3:38] <Psi-jack> Usually ceph only names releases when they're LTS stables.
[3:38] <nhm> Psi-jack: yeah, now we are on this 3 month release cycle though. Not sure if that's changed things.
[3:38] <Psi-jack> Hmmm
[3:39] <Psi-jack> 3 months?!?
[3:39] <nhm> Yeah, I think dumpling is supposed to be out in Aug.
[3:39] <Psi-jack> Wow..
[3:40] <Kioob> yes... and the next version (dumpling) is planned for 07/07/2013
[3:40] <nhm> I ignore such things and just occasionally make random performance predictions from my ivory tower.
[3:40] <Psi-jack> This means I need to hurry up and change my Arch Linux ceph servers to gentoo like, yesterday!
[3:42] <Psi-jack> heh, lotsa changes from 0.56 to 0.61
[3:42] <nhm> "Your SSDs are too slow! Those MONs are chewing through CPU! Our crc32c implementation should be using native instructions!"
[3:43] <nhm> Psi-jack: yes, 0.61 has a lot of work in it. It's too bad this MON thing cropped up because it really is a nice release.
[3:43] <Psi-jack> heh
[3:43] <Psi-jack> Eh, I'm still on 0.56.3, not even .4 yet, because I've put a hold on any updates since Arch Linux is too much of a PITA to properly maintain.
[3:44] <Psi-jack> hence why I'm working on upgrading it all to Gentoo-based servers, to fix that issue, since at least Gentoo has a full @world system, which is sorely lacking in Arch.
[3:44] <Kioob> well, wait for the next version of Cuttlefish, to avoid MON problems
[3:45] <Psi-jack> Oh, I plan to, for that.
[3:45] <Psi-jack> I could still do 0.56.4 on Gentoo, until 0.61.x is stabilized.
[3:45] * mikedawson (~chatzilla@mobile-166-147-103-115.mycingular.net) has joined #ceph
[3:45] * mikedawson (~chatzilla@mobile-166-147-103-115.mycingular.net) Quit ()
[3:45] <Psi-jack> heck, I still have the packages themselves built as such. heh
[3:45] <Kioob> the current version can also work without any problem, but if it's a production cluster, yes, you should wait
[3:46] * mikedawson (~chatzilla@mobile-166-147-103-115.mycingular.net) has joined #ceph
[3:46] <Psi-jack> yeah, my ceph cluster houses the RBD and CephFS disks for ~16 VM's in my home infrastructure.
[3:50] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[3:51] * loicd (~loic@magenta.dachary.org) has joined #ceph
[4:09] * mikedawson (~chatzilla@mobile-166-147-103-115.mycingular.net) Quit (Read error: Connection reset by peer)
[4:29] * BillK (~BillK@124-169-236-155.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[4:33] * noahmehl_ (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) has joined #ceph
[4:37] * BillK (~BillK@58-7-155-66.dyn.iinet.net.au) has joined #ceph
[4:39] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:39] * noahmehl_ is now known as noahmehl
[4:51] * redeemed (~redeemed@cpe-192-136-224-78.tx.res.rr.com) has joined #ceph
[4:53] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) Quit (Ping timeout: 480 seconds)
[5:13] * Vanony (~vovo@i59F7A824.versanet.de) has joined #ceph
[5:18] * Vanony_ (~vovo@i59F7A0AB.versanet.de) Quit (Read error: Operation timed out)
[5:25] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) has joined #ceph
[5:25] * diegows (~diegows@190.190.2.126) Quit (Read error: Operation timed out)
[5:40] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Read error: Operation timed out)
[5:47] * DarkAceZ (~BillyMays@50.107.54.92) has joined #ceph
[5:55] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[6:23] * redeemed (~redeemed@cpe-192-136-224-78.tx.res.rr.com) Quit (Quit: bia)
[6:47] * coyo (~unf@00017955.user.oftc.net) Quit (Quit: F*ck you, I'm a daemon.)
[7:01] * vipr_ (~vipr@78-23-112-42.access.telenet.be) has joined #ceph
[7:08] * vipr (~vipr@78-23-118-188.access.telenet.be) Quit (Ping timeout: 480 seconds)
[7:50] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) Quit (Read error: No route to host)
[7:51] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) has joined #ceph
[8:15] * KindTwo (KindOne@h83.25.131.174.dynamic.ip.windstream.net) has joined #ceph
[8:18] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[8:18] * KindTwo is now known as KindOne
[8:24] * esammy (~esamuels@host-2-101-237-68.as13285.net) has joined #ceph
[8:28] * esammy (~esamuels@host-2-101-237-68.as13285.net) has left #ceph
[8:32] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) Quit (Ping timeout: 480 seconds)
[9:15] * illuminatis (~illuminat@0001adba.user.oftc.net) has joined #ceph
[9:21] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) Quit (Quit: ZNC - http://znc.sourceforge.net)
[9:22] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) has joined #ceph
[9:31] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) Quit (Quit: ZNC - http://znc.sourceforge.net)
[9:33] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) has joined #ceph
[9:46] * tnt (~tnt@91.177.204.242) has joined #ceph
[9:47] * humbolt (~elias@91-113-98-46.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[9:56] * humbolt (~elias@178-190-249-124.adsl.highway.telekom.at) has joined #ceph
[10:29] * LeaChim (~LeaChim@2.127.72.50) has joined #ceph
[10:45] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) has joined #ceph
[10:56] * jtang1 (~jtang@80.111.97.194) has joined #ceph
[10:58] * ghartz (~ghartz@33.ip-5-135-148.eu) has joined #ceph
[11:25] * joset (~joset@59.92.199.116) has joined #ceph
[11:33] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[11:33] * ChanServ sets mode +v andreask
[11:36] * eschnou (~eschnou@148.95-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[12:10] * vipr (~vipr@78-23-118-217.access.telenet.be) has joined #ceph
[12:17] * vipr_ (~vipr@78-23-112-42.access.telenet.be) Quit (Ping timeout: 480 seconds)
[12:52] * eschnou (~eschnou@148.95-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[12:57] <jksM> how can I read out the "min_size" of a pool? (it is not listed in ceph osd dump)
[13:00] <Kioob> jksM: ceph osd dump | grep ^pool
[13:00] <Kioob> it's columns 7 & 8
[13:01] <Kioob> for example : pool 4 'hdd2copies' rep size 2 min_size 1
[13:01] <Kioob> or if you prefer : ceph osd dump | grep ^pool | awk '{ print $3" "$7" "$8 }'
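
As an alternative to parsing the dump, the pool settings can also be read and changed directly through the CLI on newer releases; whether min_size is exposed this way on bobtail is an assumption to verify against your version:

    # read the replica counts for a pool (pool name taken from the example above)
    ceph osd pool get hdd2copies size
    ceph osd pool get hdd2copies min_size

    # adjust min_size if your release supports it
    ceph osd pool set hdd2copies min_size 1
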
[13:35] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[13:40] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (Quit: upgrading)
[13:42] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[13:45] * joset (~joset@59.92.199.116) Quit (Remote host closed the connection)
[13:56] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:01] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has left #ceph
[14:01] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[14:01] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[14:02] * john_barbee_ is now known as john_barbee
[14:03] * codice_ (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) has joined #ceph
[14:03] * codice (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) Quit (Read error: Connection reset by peer)
[14:08] * Elbandi_ (~ea333@elbandi.net) Quit (Remote host closed the connection)
[14:08] * Elbandi (~ea333@elbandi.net) has joined #ceph
[14:10] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[14:11] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:11] * john_barbee_ is now known as john_barbee
[14:11] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[14:15] * eschnou (~eschnou@148.95-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[14:15] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[14:20] * mikedawson (~chatzilla@mobile-166-147-103-115.mycingular.net) has joined #ceph
[14:34] * eschnou (~eschnou@148.95-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[14:36] * coyo (~unf@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[14:40] * coyo (~unf@00017955.user.oftc.net) Quit ()
[14:45] * mikedawson (~chatzilla@mobile-166-147-103-115.mycingular.net) Quit (Read error: Connection reset by peer)
[16:01] <jksM> Kioob, it's not listed in my osd dump
[16:02] <jksM> does this require something newer than bobtail?
[16:44] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[16:48] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[17:14] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[17:20] * tnt_ (~tnt@91.177.218.48) has joined #ceph
[17:21] * humbolt (~elias@178-190-249-124.adsl.highway.telekom.at) Quit (Quit: Leaving.)
[17:21] * tnt (~tnt@91.177.204.242) Quit (Ping timeout: 480 seconds)
[17:25] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Read error: Operation timed out)
[17:29] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[17:29] * loicd (~loic@magenta.dachary.org) has joined #ceph
[17:39] * jtang1 (~jtang@80.111.97.194) Quit (Ping timeout: 480 seconds)
[17:47] * jtang1 (~jtang@80.111.97.194) has joined #ceph
[17:50] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[17:52] * Machske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[18:00] <Machske> hi guys, running into a problem with my mons. All of a sudden, my mon disks are filling up, and for now I don't have a clue what's causing the fill-up
[18:01] <Machske> I'm running v0.62, with 3 mons and 6 osds in a 4-machine setup.
[18:02] <Machske> At the moment I'm taking mons out and back in on resized volumes to prevent a fill-up. It looks like it started a day ago
[18:02] <Machske> But no idea what triggers it
[18:05] * jtang2 (~jtang@80.111.97.194) has joined #ceph
[18:05] * jtang1 (~jtang@80.111.97.194) Quit (Read error: Connection reset by peer)
[18:07] * jtang2 (~jtang@80.111.97.194) Quit (Read error: Connection reset by peer)
[18:07] * jtang1 (~jtang@80.111.97.194) has joined #ceph
[18:08] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[18:08] * BillK (~BillK@58-7-155-66.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:08] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:15] * jtang2 (~jtang@80.111.97.194) has joined #ceph
[18:15] * jtang1 (~jtang@80.111.97.194) Quit (Read error: Connection reset by peer)
[18:19] <Machske> don't get it :s, my other ceph clusters running 0.58 only have 50MB of mon data for a 16-node ceph cluster
[18:19] <Machske> but this cluster in trouble already has 3.9GB of mon data and it's still growing (v0.62)
[18:20] <Machske> apparently it started when adding an extra osd to the cluster
[18:20] <Machske> looks like new mon data is added, but old data is not cleaned up?
[18:20] <Machske> really need some help here
[18:23] * mikedawson (~chatzilla@mobile-166-147-102-031.mycingular.net) has joined #ceph
[18:24] * jtang2 (~jtang@80.111.97.194) Quit (Read error: Connection reset by peer)
[18:25] * jtang1 (~jtang@80.111.97.194) has joined #ceph
[18:34] <jmlowe> Machske: I can't help, but there have been some other people working on this
[18:38] <Machske> I'm thinking one of the mons is "bad"
[18:38] <Machske> had 3 mons,
[18:38] <Machske> removed the one I think was "bad"
[18:39] <Machske> everything froze. Restarted the 2 remaining mons, and everything unfroze. I'm gonna monitor the mon disk usage to see how it progresses
[18:39] <Machske> when stable again, I'll need to add a third mon
[18:39] <Machske> scary shit :)
[18:41] <Machske> Another thing that looks inconsistent: I have/had 3 mons, mon.1/2/3, but ceph -w shows all lines starting with mon.0
[18:56] <tnt_> only the master will do anything; the others are just 'peons' and listen only.
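
To see which monitor is currently the leader and how the ranks map to the configured names, the monitor status commands help; output fields vary by release, so treat this as a sketch:

    # short summary of the monitors and the current quorum
    ceph mon stat

    # detailed view including the leader and each monitor's rank
    ceph quorum_status
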
[18:59] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has joined #ceph
[19:00] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has left #ceph
[19:03] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) Quit (Read error: No route to host)
[19:04] * jtang1 (~jtang@80.111.97.194) Quit (Read error: Connection reset by peer)
[19:04] * jtang1 (~jtang@80.111.97.194) has joined #ceph
[19:12] * jtang1 (~jtang@80.111.97.194) Quit (Ping timeout: 480 seconds)
[19:15] * jtang1 (~jtang@80.111.97.194) has joined #ceph
[19:32] * ScOut3R (~ScOut3R@540240A4.dsl.pool.telekom.hu) has joined #ceph
[19:49] * mikedawson (~chatzilla@mobile-166-147-102-031.mycingular.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 21.0/20130511120803])
[19:51] * tkensiski1 (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has joined #ceph
[19:51] * tkensiski1 (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has left #ceph
[19:53] <jksM> how can I read out the "min_size" of a pool? It is not listed in ceph osd dump... I'm running bobtail, does this require cuttlefish?
[20:03] <Machske> on my 0.62 version it is shown in ceph osd dump, in my v0.58 version it is also shown
[20:06] * diegows (~diegows@190.190.2.126) has joined #ceph
[20:23] * jtang1 (~jtang@80.111.97.194) Quit (Quit: Leaving.)
[20:26] <jksM> Machske, hmm okay... I have 0.56.6 - perhaps it's not in that version?
[20:26] <jksM> and if it is not, I wonder whether that's because it's not possible to set min_size at all, or just not possible to display the min_size you have set?
[20:30] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) has joined #ceph
[20:30] <andrei> hello guys
[20:30] <andrei> anyone alive ?
[20:31] <andrei> i am having some performance questions I was wondering if someone could help me with?
[20:40] * jtang1 (~jtang@80.111.97.194) has joined #ceph
[20:41] * jtang1 (~jtang@80.111.97.194) Quit ()
[20:46] * DarkAce-Z (~BillyMays@50.107.54.92) has joined #ceph
[20:46] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Read error: Operation timed out)
[20:49] * danieagle (~Daniel@177.205.181.189.dynamic.adsl.gvt.net.br) has joined #ceph
[20:51] * eschnou (~eschnou@148.95-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:01] * ekarlso (~ekarlso@cloudistic.me) Quit (Quit: Lost terminal)
[21:01] <loicd> andrei: I'm alive
[21:02] * loicd touches his head, checks his legs
[21:06] * The_Bishop (~bishop@f052098020.adsl.alicedsl.de) has joined #ceph
[21:13] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) has joined #ceph
[21:16] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[21:20] * tnt_ thinks he's alive too.
[21:29] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[21:35] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[21:35] <andrei> )))
[21:36] <andrei> nice one
[21:36] <andrei> loads of guys
[21:36] <andrei> tnt thanks for your help earlier on
[21:37] <andrei> i am doing some performance testing from a virtual machine which is running on kvm with rbd support
[21:37] <andrei> i am looking at performance figures with dd
[21:37] <andrei> single and multiple threads
[21:37] <andrei> and I am noticing slow performance
[21:38] <andrei> was wondering if you could comment
[21:38] <andrei> i've got two servers
[21:38] <andrei> each with 8 disks
[21:38] <andrei> each disk is capable of doing around 160mb seq reads
[21:38] <andrei> replication is set to 2
[21:39] <andrei> i am running dd if=<file> of=/dev/null bs=4M iflag=direct
[21:39] <andrei> at the same time I am using iotop on the servers and ceph -s to check out what's happening with ceph
[21:39] <andrei> iotop is showing me that it's reading from all disks
[21:40] <andrei> but it is reading only about 150MB/s in total
[21:40] <andrei> around 20 or so MB/s per disk
[21:40] <andrei> the obvious question is why is it not reading at faster speeds?
[21:40] <andrei> the second question is why is it not reading from both servers at the same time?
[21:40] <andrei> it seems to load only one server during reads
[21:41] <tnt_> reads always hit the master OSD only
[21:43] <nhm> andrei: is that 1 copy of dd or more?
[21:44] <andrei> well, one copy, coz even if i specify iflag=direct it still caches data on the backend server
[21:44] <tnt_> I wonder if with iflag=direct it does a read then waits for the response before submitting another ... in which case, you'll be hitting each OSD sequentially ...
[21:44] <andrei> and next time i run the command it reads from cache
[21:44] <andrei> if i do not use iflag=direct the speed is even less
[21:44] <andrei> i am getting around 110mb/s
[21:44] <andrei> from 1 thread
[21:44] <nhm> andrei: how many copies of dd?
[21:44] <andrei> just one copy
[21:45] <andrei> one dd thread
[21:45] <andrei> as i've mentioned multiple dd copies do not generate load on disk
[21:45] <tnt_> how are the server interconnected ?
[21:45] <andrei> they just get fed from cache
[21:45] <andrei> tnt: ip over infiniband
[21:46] <andrei> tnt: there is no issue with networking I think, as I can run 20 dd threads and I get around 700MB/s cumulative throughput from 20 threads
[21:47] <andrei> however, during 20 threads there is no disk activity as everything is read from cache
[21:47] <nhm> andrei: fyi, from a single VM using QEMU/KVM with RBD cache and virtio device I can do about 110MB/s with fio using iodepth=1 and up to about 890MB/s using iodepth=16.
[21:47] <andrei> nhm: could you please let me know what fio syntax you use, and I will try it on mine
[21:48] <tnt_> yeah, I suspect that dd is doing sequential sync calls, so it waits for the return before submitting the next ..
[21:48] <andrei> nhm: is that disk throughput or from cache?
[21:48] <nhm> andrei: I run sync and drop_caches on every machine in the cluster before the tests.
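
The cache-dropping step mentioned above is the standard Linux mechanism; run as root on each OSD host (and in the guest) before a read test:

    # flush dirty pages, then drop the page cache, dentries and inodes
    sync
    echo 3 > /proc/sys/vm/drop_caches
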
[21:48] <tnt_> andrei: try increasing the read ahead in the VM
[21:50] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[21:50] * ChanServ sets mode +v andreask
[21:51] <nhm> andrei: something like "fio --rw=read --ioengine=libaio --runtime=60 --numjobs=1 --direct=1 --bs=4M --iodepth=16 --size=64G"
[21:51] <nhm> andrei: I think that should work.
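
Note that fio also expects a job name (and a target file or device) on the command line; a slightly more complete variant of the invocation above, with the filename being a placeholder:

    # sequential 4M direct reads, queue depth 16, against a test file inside the VM
    fio --name=seqread --rw=read --ioengine=libaio --direct=1 \
        --bs=4M --iodepth=16 --numjobs=1 --runtime=60 --size=64G \
        --filename=/mnt/rbd/fio-testfile
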
[21:51] <andrei> i will give it a go
[21:51] <andrei> tnt: how do I increase the read ahead?
[21:51] <nhm> andrei: increasing read_ahead might help like tnt said too.
[21:52] <nhm> andrei: /sys/block/<device>/queue/read_ahead_kb on the OSDs.
[21:53] <nhm> andrei: default is 128k, something like 512k or even several megs might help, but potentially could hurt iops if you set it too high.
[21:53] <nhm> andrei: also, modifying it in /sys won't persist across reboots.
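
A minimal sketch of checking and raising the readahead on one OSD data disk; the device name (sdb) and the 4096 KB value are placeholders to tune for your hardware:

    # current readahead for the device, in KB (default is usually 128)
    cat /sys/block/sdb/queue/read_ahead_kb

    # raise it for this boot only
    echo 4096 > /sys/block/sdb/queue/read_ahead_kb

    # equivalent via blockdev (512-byte sectors, so 8192 = 4 MB); make it
    # persistent with a udev rule or rc.local if it helps
    blockdev --setra 8192 /dev/sdb
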
[21:54] <andrei> thanks
[21:54] <andrei> i will try that
[21:54] <andrei> and in terms of ceph itself, are there any performance improvements that I can try?
[21:55] <nhm> andrei: one other thing, I've seen on a couple of IPoIB clusters that read performance was really dramatically hurt by tcp autotuning. Basically the tcp buffers were filling up causing a bunch of retransmits.
[21:55] <nhm> andrei: Jim Schutt discovered it first last year.
[21:56] * stxShadow (~Jens@ip-88-152-161-249.unitymediagroup.de) has joined #ceph
[21:56] <nhm> andrei: This may or may not help: http://ceph.com/community/ceph-bobtail-jbod-performance-tuning/
[21:56] <andrei> oh, yeah, true
[21:56] <andrei> there are module options for ipoib that could help
[21:58] <andrei> the third issue I've noticed: I set a dd thread to write a large file to the disk. While it was running I rebooted one of the servers. The write process died without recovering. Is this normal? I assumed it would freeze for a short period until figuring out that it should write to the second server
[21:59] <tnt_> yeah, I have noticed that as well ... it seems that any pending op is not retried
[21:59] <nhm> andrei: hrm, I think there is some subtlety there that a write may need to be resubmitted, but I don't remember the details.
[22:00] <andrei> tnt: so effectively you will start losing data on the server if there are server restarts? or am I not interpreting this correctly?
[22:01] <tnt_> no, it will just freeze
[22:01] * codice_ is now known as codice
[22:01] <tnt_> because they're not acked either ...
[22:01] <nhm> If the application doesn't actually check to make sure the write succeeded...
[22:04] <tnt_> well ... the app will actually block.
[22:04] <tnt_> any well-designed app should have checkpoints to ensure on-disk consistency.
[22:05] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has left #ceph
[22:05] * noahmehl_ (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) has joined #ceph
[22:10] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) Quit (Ping timeout: 480 seconds)
[22:10] * noahmehl_ is now known as noahmehl
[22:12] * jahkeup (~jahkeup@199.232.79.20) has joined #ceph
[22:12] <andrei> i see
[22:15] <andrei> I am noticing the kernel is complaining about a number of tasks that have been blocked for more than 120 seconds
[22:15] <andrei> on the vm which is running on kvm with rbd
[22:16] <andrei> and I can't seems to umount the disk
[22:16] <andrei> the process just hangs
[22:17] <andrei> has anyone noticed this behaviour?
[22:17] * joao (~JL@89-181-159-84.net.novis.pt) has joined #ceph
[22:17] * ChanServ sets mode +o joao
[22:22] <nhm> andrei: btw, what version of ceph is this and do you have rbd cache enabled?
[22:22] <andrei> nhm: i am using the latest version from repo, which is 0.61.2 i believe
[22:23] <andrei> and I've not made any changes at all to the ceph default setup
[22:23] <tnt_> andrei: yes, when I kill an osd while some IO has been submitted to it but it hasn't acked it, I get the same.
[22:23] <andrei> i've just created a very simple ceph.conf file
[22:23] <tnt_> when I restart the osd, it clears out.
[22:23] <andrei> added 3 mons
[22:23] <andrei> and that's it
[22:24] <andrei> tnt: i've got these problems without killing or restarting an osd
[22:24] <andrei> while I was doing read tests I didn't have any problems
[22:24] <andrei> soon after I started write tests these messages showed up
[22:26] <andrei> from what I can see, all osd processes are running on both servers
[22:26] <tnt_> does ceph -w show "slow ops"?
[22:29] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[22:39] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[22:41] <andrei> one sec
[22:41] <andrei> will check
[22:43] <andrei> here is the output:
[22:43] <andrei> http://ur1.ca/e1xhx
[22:44] <andrei> I don't see any slow ops
[22:50] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[22:51] * cce (~cce@50.56.54.167) has joined #ceph
[22:52] * cce is trying to find a simple S3-like server I could run that would have an append-only file system I could encrypt and archive.
[22:52] <cce> I'm looking for a WORM (write once read many) file interface.
[22:53] <cce> So that my developers have a place to store large media files that I can backup reliably and keep consistent with my postgresql database.
[22:53] <cce> Although, perhaps I'm thinking about this all wrong.
[22:53] * cce saw ceph and was curious.
[23:00] * codice (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) Quit (Remote host closed the connection)
[23:06] * stxShadow (~Jens@ip-88-152-161-249.unitymediagroup.de) Quit (Read error: Connection reset by peer)
[23:15] * DarkAce-Z is now known as DarkAceZ
[23:28] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[23:28] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) Quit (Read error: Operation timed out)
[23:39] <lightspeed> hmm, I had been running 0.56.4 for quite a while, and upgraded to 0.61.1 just now
[23:39] <lightspeed> but one of my monitors fails to start, with the logs claiming "Existing monitor store has not been converted to 0.52 (bobtail) format"
[23:39] <lightspeed> I tried then additionally upgrading to 0.61.2, with the same result
[23:40] <lightspeed> the other two monitors are happy
[23:41] <lightspeed> I saw http://tracker.ceph.com/issues/4747, but am not sure from that how to go about resolving it in my cluster
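
If the store on that one monitor cannot be converted, one common (if blunt) recovery path is to remove it from the monmap and re-add it as a fresh monitor so it resyncs from the healthy quorum. This is the generic documented procedure, not a fix specific to issue 4747, so verify it against the docs for your release first; the mon id "c" and the address are placeholders:

    # on the broken node: stop the daemon and set its store aside
    service ceph stop mon.c
    mv /var/lib/ceph/mon/ceph-c /var/lib/ceph/mon/ceph-c.broken

    # remove it from the monmap using the surviving quorum
    ceph mon remove c

    # rebuild it from the current cluster state and add it back
    ceph auth get mon. -o /tmp/mon.keyring
    ceph mon getmap -o /tmp/monmap
    ceph-mon -i c --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    ceph mon add c 192.168.0.3:6789
    service ceph start mon.c
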
[23:52] * BillK (~BillK@58-7-155-66.dyn.iinet.net.au) has joined #ceph
[23:57] * eschnou (~eschnou@148.95-201-80.adsl-dyn.isp.belgacom.be) Quit (Read error: Operation timed out)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.