#ceph IRC Log


IRC Log for 2013-08-19

Timestamps are in GMT/BST.

[0:00] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[0:03] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[0:19] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:21] * BManojlovic (~steki@fo-d- Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:26] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[0:26] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:30] * lx0 is now known as lxo
[0:34] * chris__ (~chris@ has joined #ceph
[0:35] * chris__ (~chris@ Quit (Read error: Connection reset by peer)
[0:41] * chris_ (~chris@ Quit (Ping timeout: 480 seconds)
[1:07] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[1:20] * AfC (~andrew@2407:7800:200:1011:ede3:46ba:43c9:da5b) has joined #ceph
[1:21] * mschiff (~mschiff@port-51725.pppoe.wtnet.de) Quit (Remote host closed the connection)
[1:25] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[1:26] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) has joined #ceph
[1:44] * Cube (~Cube@ has joined #ceph
[1:57] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[2:11] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[2:11] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[2:38] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:44] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:45] * kenneth_ (~kenneth@ has joined #ceph
[2:45] * huangjun (~kvirc@ has joined #ceph
[2:46] * yy-nm (~Thunderbi@ has joined #ceph
[2:48] * Cube (~Cube@ Quit (Read error: Connection reset by peer)
[2:50] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) has joined #ceph
[2:55] <ChoppingBrocoli> mozg: I think I have found the answer to kvm and ceph
[3:01] * madkiss (~madkiss@ has joined #ceph
[3:02] * madkiss1 (~madkiss@ has joined #ceph
[3:02] * madkiss (~madkiss@ Quit (Read error: Connection reset by peer)
[3:03] * Cube (~Cube@ has joined #ceph
[3:07] * julian (~julianwa@ has joined #ceph
[3:10] * danieagle (~Daniel@ has joined #ceph
[3:11] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[3:33] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) Quit (Quit: Leaving.)
[3:34] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[3:34] * danieagle (~Daniel@ Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[3:35] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) Quit (Read error: Operation timed out)
[3:41] * DarkAce-Z (~BillyMays@ Quit (Ping timeout: 480 seconds)
[3:42] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[3:42] * ChanServ sets mode +o elder
[3:49] * haomaiwang (~haomaiwan@notes4.com) has joined #ceph
[3:56] * haomaiwa_ (~haomaiwan@ Quit (Read error: Operation timed out)
[4:08] * yanzheng (~zhyan@ has joined #ceph
[4:09] * DarkAce-Z (~BillyMays@ has joined #ceph
[4:12] * DarkAce-Z is now known as DarkAceZ
[4:12] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[4:12] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[4:14] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[4:18] * rongze (~quassel@li565-182.members.linode.com) has joined #ceph
[4:19] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[4:25] * rongze_ (~quassel@ Quit (Ping timeout: 480 seconds)
[4:27] * kenneth_ (~kenneth@ Quit (Quit: Leaving)
[4:49] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[5:02] * rongze_ (~quassel@ has joined #ceph
[5:05] * fireD_ (~fireD@93-139-190-39.adsl.net.t-com.hr) has joined #ceph
[5:07] * fireD (~fireD@93-139-162-125.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:09] * rongze (~quassel@li565-182.members.linode.com) Quit (Read error: Operation timed out)
[5:12] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[5:17] * rongze (~quassel@li565-182.members.linode.com) has joined #ceph
[5:19] * julian_ (~julianwa@ has joined #ceph
[5:22] * rongze__ (~quassel@ has joined #ceph
[5:24] * rongze_ (~quassel@ Quit (Read error: Operation timed out)
[5:26] * julian (~julianwa@ Quit (Ping timeout: 480 seconds)
[5:28] * rongze (~quassel@li565-182.members.linode.com) Quit (Ping timeout: 480 seconds)
[5:32] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[5:36] * smiley (~smiley@cpe-67-251-108-92.stny.res.rr.com) Quit (Quit: smiley)
[5:39] * KindTwo (~KindOne@h189.62.186.173.dynamic.ip.windstream.net) has joined #ceph
[5:40] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Quit: my troubles seem so far away, now yours are too...)
[5:41] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[5:41] * KindTwo is now known as KindOne
[5:52] * lightspeed (~lightspee@ Quit (Ping timeout: 480 seconds)
[5:53] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[6:01] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[6:03] * haomaiwa_ (~haomaiwan@ has joined #ceph
[6:03] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit ()
[6:09] * haomaiwang (~haomaiwan@notes4.com) Quit (Ping timeout: 480 seconds)
[6:21] * mdjunaid (uid13426@id-13426.ealing.irccloud.com) Quit (Ping timeout: 480 seconds)
[6:21] * mkoderer (uid11949@ealing.irccloud.com) Quit (Ping timeout: 480 seconds)
[6:21] * scalability-junk (uid6422@ealing.irccloud.com) Quit (Ping timeout: 480 seconds)
[6:22] * mdjunaid (uid13426@ealing.irccloud.com) has joined #ceph
[6:41] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[6:52] * torment (~torment@pool-72-64-189-67.tampfl.fios.verizon.net) has joined #ceph
[6:57] * loopy (~torment@pool-72-91-109-202.tampfl.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:19] * torment1 (~torment@pool-72-64-180-146.tampfl.fios.verizon.net) has joined #ceph
[7:21] * torment (~torment@pool-72-64-189-67.tampfl.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:31] * tnt (~tnt@ has joined #ceph
[7:39] <yy-nm> hey, all. i have a question about the ENCODE_DUMP macro definition.
[7:39] <yy-nm> where is it defined?
[7:42] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) has joined #ceph
[7:43] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[7:54] * haomaiwang (~haomaiwan@notes4.com) has joined #ceph
[7:56] * jasoncn (~jasoncn@ has joined #ceph
[8:01] * haomaiwa_ (~haomaiwan@ Quit (Ping timeout: 480 seconds)
[8:07] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[8:22] * scalability-junk (uid6422@ealing.irccloud.com) has joined #ceph
[8:22] * mkoderer (uid11949@id-11949.ealing.irccloud.com) has joined #ceph
[8:26] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[8:29] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[8:42] * Cube (~Cube@ Quit (Quit: Leaving.)
[8:48] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[8:56] * X3NQ (~X3NQ@ Quit (Read error: Connection timed out)
[8:57] * X3NQ (~X3NQ@ has joined #ceph
[9:00] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:04] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:08] * jbd_ (~jbd_@2001:41d0:52:a00::77) has joined #ceph
[9:11] * BManojlovic (~steki@ has joined #ceph
[9:12] * nerdtron (~kenneth@ has joined #ceph
[9:16] * odyssey4me3 (~odyssey4m@ has joined #ceph
[9:18] * vipr (~vipr@78-23-120-14.access.telenet.be) has joined #ceph
[9:19] <nerdtron> hi all
[9:19] <nerdtron> how do you change the number of backfill operations?
[9:21] <yy-nm> reference on http://ceph.com/docs/master/rados/configuration/osd-config-ref/#backfilling
[9:24] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[9:24] * ChanServ sets mode +v andreask
[9:25] * AfC (~andrew@2407:7800:200:1011:ede3:46ba:43c9:da5b) Quit (Quit: Leaving.)
[9:26] * Cube (~Cube@et-0-30.gw-nat.bs.kae.de.oneandone.net) has joined #ceph
[9:40] * mschiff (~mschiff@p4FD7C905.dip0.t-ipconnect.de) has joined #ceph
[9:41] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) Quit (Quit: Leaving.)
[9:42] * jjgalvez1 (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) has joined #ceph
[9:43] <nerdtron> yy-nm how do i declare those?
[9:43] * jjgalvez1 (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) Quit ()
[9:50] <yy-nm> in conf file
[9:51] <yy-nm> under [osd] section
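The placement yy-nm describes can be sketched as below. This is only an illustration: it assumes the option nerdtron wants is `osd max backfills` from the linked backfilling reference, and the value 1 and the helper name `render_osd_backfill_conf` are hypothetical. A real ceph.conf would normally contain other sections as well.

```python
import configparser
import io


def render_osd_backfill_conf(max_backfills: int) -> str:
    """Return ceph.conf-style text with a backfill limit under [osd]."""
    conf = configparser.ConfigParser()
    # Assumed option name; the value is whatever limit the operator picks.
    conf["osd"] = {"osd max backfills": str(max_backfills)}
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Print the fragment so it can be compared against an existing ceph.conf.
    print(render_osd_backfill_conf(1))
```

The output is a fragment to merge by hand into the existing file rather than a replacement for it.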
[9:56] <nerdtron> then what should i do to apply changes?
[10:01] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[10:06] * allsystemsarego (~allsystem@5-12-241-157.residential.rdsnet.ro) has joined #ceph
[10:10] <yy-nm> nerdtron: restart
[10:11] <nerdtron> restart all nodes?
[10:11] * lightspeed (~lightspee@ has joined #ceph
[10:16] <yy-nm> nerdtron: yes
[10:17] * haomaiwa_ (~haomaiwan@ has joined #ceph
[10:24] * haomaiwang (~haomaiwan@notes4.com) Quit (Ping timeout: 480 seconds)
[10:37] <loicd> ccourtaut: morning sir
[10:37] <ccourtaut> loicd: morning
[10:37] * _robbat2|irssi (nobody@www2.orbis-terrarum.net) Quit (Ping timeout: 480 seconds)
[10:38] * jasoncn (~jasoncn@ Quit (Ping timeout: 480 seconds)
[10:38] * jasoncn (~jasoncn@180-198-91-108.nagoya1.commufa.jp) has joined #ceph
[10:45] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[10:47] * jasoncn (~jasoncn@180-198-91-108.nagoya1.commufa.jp) Quit (Read error: Operation timed out)
[10:48] * jasoncn (~jasoncn@180-198-91-108.nagoya1.commufa.jp) has joined #ceph
[10:50] * X3NQ (~X3NQ@ Quit (Quit: Leaving)
[10:53] * nerdtron (~kenneth@ Quit (Quit: Leaving)
[10:57] * yanzheng (~zhyan@ Quit (Quit: Leaving)
[11:00] * jasoncn (~jasoncn@180-198-91-108.nagoya1.commufa.jp) Quit (Read error: Operation timed out)
[11:01] * jasoncn (~jasoncn@ has joined #ceph
[11:01] * haomaiwa_ (~haomaiwan@ Quit (Remote host closed the connection)
[11:02] * haomaiwang (~haomaiwan@li565-182.members.linode.com) has joined #ceph
[11:03] * _robbat2|irssi (nobody@ has joined #ceph
[11:09] * haomaiwa_ (~haomaiwan@ has joined #ceph
[11:14] * haomaiwang (~haomaiwan@li565-182.members.linode.com) Quit (Read error: Operation timed out)
[11:15] * joao (~joao@89-181-153-226.net.novis.pt) has joined #ceph
[11:15] * ChanServ sets mode +o joao
[11:15] <joao> morning #ceph
[11:20] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[11:22] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[11:31] * jasoncn (~jasoncn@ Quit (Read error: Connection timed out)
[11:31] * jasoncn (~jasoncn@ has joined #ceph
[11:37] * ruhe (~ruhe@ has joined #ceph
[11:37] * _robbat2|irssi (nobody@ Quit (Ping timeout: 480 seconds)
[11:38] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[11:47] <loicd> joao: morning ! you're back in Lisboa ?
[11:48] <joao> loicd, indeed I am
[11:48] <joao> back to the right side of the pond :)
[11:52] * jabadia (~jabadia@ has joined #ceph
[11:57] * Cube (~Cube@et-0-30.gw-nat.bs.kae.de.oneandone.net) Quit (Quit: Leaving.)
[12:02] * jabadia (~jabadia@ Quit (Ping timeout: 480 seconds)
[12:07] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[12:08] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[12:16] * ruhe (~ruhe@ Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[12:17] * tnt (~tnt@ has joined #ceph
[12:40] * yy-nm (~Thunderbi@ Quit (Quit: yy-nm)
[12:43] * ruhe (~ruhe@ has joined #ceph
[12:53] * Cube (~Cube@et-0-30.gw-nat.bs.kae.de.oneandone.net) has joined #ceph
[12:54] * jasoncn (~jasoncn@ Quit (Quit: Leaving)
[12:57] * BillK (~BillK-OFT@220-253-232-124.dyn.iinet.net.au) has joined #ceph
[12:57] <joelio> Ahh, a Ceph day in London, good stuff
[12:57] <joao> yep
[13:03] * huangjun (~kvirc@ Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[13:04] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[13:05] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:12] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[13:18] <mozg> really?
[13:18] <mozg> when is it taking place?
[13:18] <mozg> do you have a link?
[13:18] <joelio> http://cephdaylondon.eventbrite.com/?discount=London
[13:20] <darkfader> wow cool
[13:20] <darkfader> i'm going to london for some musical and didnt (yet) know which date to pick :)
[13:21] <joelio> oops, erm, erm not sure if discount code should be in the URL (it was in a generic mail to me though)
[13:42] <mozg> Nice, i've just signed up )))
[13:42] <mozg> is anyone else coming?
[13:42] <mozg> it would be nice to have a chat and socialise
[13:43] <mozg> discuss challenges and successes )))
[13:49] * mxmln (~maximilia@ has joined #ceph
[13:51] * mxmln (~maximilia@ Quit ()
[14:00] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:00] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:04] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[14:05] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:11] <joao> mozg, afaik, sage is definitely going
[14:11] <joao> I'm planning to go as well
[14:17] <BManojlovic> is someone specific maintainer of opensuse rpm spec files? or is it generic one
[14:17] <BManojlovic> (single spec file for all rpm distributions)
[14:18] <joao> don't know; but Gary Lowell (glowell) should be the one to talk to wrt packaging
[14:19] <BManojlovic> ok will do thanks for info
[14:21] <joelio> mozg: yea, I should be too
[14:25] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla [Firefox 23.0/20130730113002])
[14:31] <ChoppingBrocoli> mozg I have information for you if you would like
[14:31] * julian_ (~julianwa@ Quit (Quit: afk)
[14:35] * sleinen (~Adium@2001:620:0:46:7147:2707:c5df:2ab4) has joined #ceph
[14:44] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Remote host closed the connection)
[14:46] * clayb (~kvirc@ has joined #ceph
[14:47] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[14:50] * loicd reading https://www.kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
[14:50] <mozg> yeah, please
[14:50] <mozg> i would love to share my thoughts and experiences
[14:50] <mozg> how do I find you guys?
[14:50] <mozg> shall we exchange twitter/facebook accounts or phone numbers, etc?
[14:51] <loicd> mozg: I plan to go as well :-)
[14:51] <mozg> nice
[14:51] <mozg> )))
[14:51] <mozg> the more people the better for everyone i guess
[14:53] <mozg> by the way, has anyone upgraded from 0.61 to 0.67 yet?
[14:54] <mozg> i am planning to do it this week
[14:54] <mozg> wanted to get an idea how hard it will be
[14:54] * loicd rollback to implementing 2+1 XOR instead of diving in RAID 6 ;-)
[14:56] * vipr_ (~vipr@78-23-114-68.access.telenet.be) has joined #ceph
[14:57] * haomaiwang (~haomaiwan@li565-182.members.linode.com) has joined #ceph
[14:58] * rongze__ (~quassel@ Quit (Quit: No Ping reply in 180 seconds.)
[14:58] * vipr (~vipr@78-23-120-14.access.telenet.be) Quit (Ping timeout: 480 seconds)
[14:59] * yanzheng (~zhyan@ has joined #ceph
[14:59] * sleinen1 (~Adium@ has joined #ceph
[15:00] <ChoppingBrocoli> Yes, it was easy
[15:00] * rongze (~quassel@ has joined #ceph
[15:00] <ChoppingBrocoli> it appears as if .67 runs much better, I have noticed a lot of improvements
[15:00] <ChoppingBrocoli> I also fixed my KVM RBD issue
[15:00] * sleinen2 (~Adium@2001:620:0:25:186c:dcae:a82e:4659) has joined #ceph
[15:00] * rongze (~quassel@ Quit (Remote host closed the connection)
[15:01] * rongze (~quassel@ has joined #ceph
[15:02] * rongze (~quassel@ Quit (Remote host closed the connection)
[15:02] <loicd> ccourtaut: thanks for taking a look at https://github.com/ceph/ceph/tree/wip-5878 :-)
[15:04] * haomaiwa_ (~haomaiwan@ Quit (Ping timeout: 480 seconds)
[15:07] * sleinen (~Adium@2001:620:0:46:7147:2707:c5df:2ab4) Quit (Ping timeout: 480 seconds)
[15:07] * sleinen1 (~Adium@ Quit (Ping timeout: 480 seconds)
[15:10] <ccourtaut> loicd: you're welcome
[15:13] <joelio> mozg: I thought it was obligatory to attend all conferences wearing a sandwich board with your IRC handle in flashing neon? :)
[15:18] <mozg> ChoppingBrocoli, thanks, what kind of issues did you have with kvm/rbd?
[15:18] <mozg> i am on kvm/rbd using 0.61.7 at the moment
[15:18] <mozg> joelio, nice! ))
[15:19] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:19] <mozg> ChoppingBrocoli, sage mentioned to me yesterday that someone has reported slower io performance for kvm/qemu use
[15:19] <mozg> did you notice any difference at all?
[15:19] <ChoppingBrocoli> disk access was really slow
[15:20] <ChoppingBrocoli> in my case it was not the network or lat
[15:20] <ChoppingBrocoli> i used the osd bench tool, iostat and htop to really take a closer look at what was going on
[15:21] <mozg> i see
[15:21] <mozg> did you have slow disk access for any particular work loads?
[15:22] <mozg> like small block size only?
[15:22] <mozg> or pretty much all work loads?
[15:26] * jamespage (~jamespage@culvain.gromper.net) Quit (Quit: Coyote finally caught me)
[15:26] * jamespage (~jamespage@culvain.gromper.net) has joined #ceph
[15:28] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[15:28] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[15:29] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[15:29] * DarkAce-Z (~BillyMays@ has joined #ceph
[15:30] * vipr_ (~vipr@78-23-114-68.access.telenet.be) Quit (Quit: leaving)
[15:32] * DarkAceZ (~BillyMays@ Quit (Ping timeout: 480 seconds)
[15:32] * vipr (~vipr@78-23-114-68.access.telenet.be) has joined #ceph
[15:34] <joao> mozg, how far back on 0.61 are you?
[15:34] <mozg> i am on 0.61.7
[15:35] <joao> then you should be fine
[15:35] <mozg> ChoppingBrocoli, how's the performance now? what tests did you run so far?
[15:35] <joao> was going to point out the mon protocol change from 0.61.4 to 0.61.5, but in that case there's no need to worry
[15:36] <ChoppingBrocoli> i found some osds were benching at only 40mb, some said they were up and in but were not, and some had a heavy load. i replaced the troubled disks and adjusted crush according to my plan. After all of this my rbd speed has doubled
[15:36] <ChoppingBrocoli> using osd bench, iostat, htop you can track down where the slowness in your cluster is
[15:36] * zhyan_ (~zhyan@ has joined #ceph
[15:36] <yo61> Afternoon
[15:38] <ChoppingBrocoli> Also, having a 100K IOP ssd really does make a difference for the journal
[15:39] <ChoppingBrocoli> Here is the new speed of one of my osds
[15:39] <ChoppingBrocoli> ceph tell osd.4 bench
[15:39] <ChoppingBrocoli> { "bytes_written": 1073741824,
[15:39] <ChoppingBrocoli> "blocksize": 4194304,
[15:39] <ChoppingBrocoli> "bytes_per_sec": "352300618.000000"}
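The bench result pasted above is easier to compare across osds once it is converted to MiB/s. A minimal sketch, assuming the three fields arrive as one JSON object exactly as shown in the log; the function name `summarize_bench` is hypothetical and not part of any Ceph tooling.

```python
import json

# The output ChoppingBrocoli pasted, joined back into one JSON document.
BENCH_OUTPUT = """
{ "bytes_written": 1073741824,
  "blocksize": 4194304,
  "bytes_per_sec": "352300618.000000"}
"""


def summarize_bench(raw: str) -> dict:
    """Convert an osd bench result into throughput, duration and write count."""
    result = json.loads(raw)
    # bytes_per_sec is reported as a string holding a float in this log.
    rate = float(result["bytes_per_sec"])
    return {
        "mib_per_sec": rate / 2**20,
        "seconds": result["bytes_written"] / rate,
        "writes": result["bytes_written"] // result["blocksize"],
    }


if __name__ == "__main__":
    for key, value in summarize_bench(BENCH_OUTPUT).items():
        print(key, round(value, 2))
```

For the figures in the log this comes to roughly 336 MiB/s, that is 1 GiB written as 256 writes of 4 MiB in about 3 seconds.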
[15:42] * yanzheng (~zhyan@ Quit (Ping timeout: 480 seconds)
[15:43] <darkfader> nice speed
[15:43] <darkfader> and it's quite odd to see bytes as a float :)
[15:43] <darkfader> what hardware is in there?
[15:47] <ChoppingBrocoli> that osd has the journal on the 100K IOP OCZ, Data is WD Raptor 1TB 10K, Sata 6 to SAS2008
[15:49] * ruhe (~ruhe@ Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[15:50] * ruhe (~ruhe@ has joined #ceph
[15:51] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[15:53] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) has joined #ceph
[15:55] <mozg> that is nice indeed
[15:55] <mozg> ChoppingBrocoli, what ocz drive do you have? could you share your link please?
[15:55] <mozg> also, how many osds do you run per ssd disk?
[15:55] * kyann (~oftc-webi@AMontsouris-652-1-204-73.w86-212.abo.wanadoo.fr) has joined #ceph
[15:57] * jeff-YF (~jeffyf@ has joined #ceph
[15:57] * zhyan_ (~zhyan@ Quit (Ping timeout: 480 seconds)
[15:58] <ChoppingBrocoli> http://www.neweggbusiness.com/Product/Product.aspx?Item=9B-20-227-916
[15:58] <ChoppingBrocoli> I use primary partitions meaning you can do 4 at most
[16:01] * ruhe (~ruhe@ Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[16:04] * symmcom (~wahmed@ has joined #ceph
[16:04] <darkfader> partitions are so 90s :)
[16:05] <ChoppingBrocoli> call me old fashion
[16:05] <darkfader> nah i'm sure you have a reason
[16:06] <symmcom> Hello all. first time in Ceph IRC and in any IRC :)
[16:07] * madkiss1 (~madkiss@ Quit (Quit: Leaving.)
[16:10] * storage (~wahmed@ has joined #ceph
[16:11] * storage (~wahmed@ has left #ceph
[16:11] * ruhe (~ruhe@ has joined #ceph
[16:11] <mozg> symmcom, welcome!
[16:12] <symmcom> thanks mozg
[16:12] * kyann (~oftc-webi@AMontsouris-652-1-204-73.w86-212.abo.wanadoo.fr) Quit (Quit: Page closed)
[16:12] <symmcom> i was having some issues with CEPH and while trying to find some source of help, came across this IRC channel info
[16:13] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) Quit (Quit: leaving)
[16:13] <mozg> there are people to help ))
[16:14] <mozg> some ppl are more experienced than others
[16:14] <mozg> i am a newbie myself
[16:14] * BillK (~BillK-OFT@220-253-232-124.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[16:17] <symmcom> it seems so quiet here. do i just ask my question and wait for somebody to respond?
[16:17] <symmcom> glad to see another newbie in my CEPH venture :)
[16:17] <absynth> most people here are from the US
[16:18] <absynth> at least those who are inktank employees
[16:18] <clayb> symmcom, yup ask away; if someone can answer they will
[16:18] <symmcom> i m in Calgary, Canada
[16:18] * smiley (~smiley@cpe-67-251-108-92.stny.res.rr.com) has joined #ceph
[16:19] <symmcom> i have been running a CEPH cluster for the last 4 months, no complaints, running great. Last night i started upgrading my Cuttlefish 0.61.7 cluster to Dumpling. I did the upgrade on the Admin host and 1 of the 9 monitors. After the upgrade the monitor does not come online. ceph -s says it's down
[16:20] <tnt> that's normal.
[16:20] <tnt> new monitors can't talk to old ones.
[16:20] <tnt> you need to read the release notes.
[16:20] <jmlowe> symmcom: upgrade 4 more and you should be in business with a new quorum
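The arithmetic behind jmlowe's "upgrade 4 more" can be sketched as follows, assuming (as tnt says) that upgraded monitors can only form a quorum among themselves and that a quorum is a strict majority of all monitors. The function names are hypothetical.

```python
def quorum_size(total_monitors: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    return total_monitors // 2 + 1


def more_upgrades_needed(total_monitors: int, upgraded: int) -> int:
    """How many more monitors must be upgraded before the new ones have quorum."""
    return max(0, quorum_size(total_monitors) - upgraded)


if __name__ == "__main__":
    # symmcom's cluster: 9 monitors, 1 already upgraded.
    print(quorum_size(9), more_upgrades_needed(9, 1))
```

With 9 monitors the majority is 5, so with 1 already upgraded, 4 more are needed.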
[16:21] <symmcom> i had a feeling about that, but did not want to continue upgrading the rest in case none of them came online
[16:22] <symmcom> upgrading 4 more mons now....
[16:22] <ChoppingBrocoli> Am I the only one who basically has a heart attack everytime I have to mess with a mon?
[16:22] <symmcom> I m with u there lol
[16:24] <symmcom> is it necessary to restart all mon machine after the upgrade?
[16:24] <absynth> what do you mean? Reboot them? No.
[16:24] <ChoppingBrocoli> restart each mon 1 at a time, then restart osd
[16:25] <absynth> if you upgrade across versions (bobtail -> dumpling) etc., you have to restart the mons after updating them, and wait for the new quorum
[16:25] <symmcom> is it the right command for restart? service ceph restart mon.[id] ?
[16:26] <jmlowe> ChoppingBrocoli: I get butterflies just thinking about doing anything with any component
[16:26] <ChoppingBrocoli> yea we need therapy, ptsd lol
[16:26] <janos> symmcom: on rpm-based distro it should be fine. though i like to go long-hand and do /etc/init.d/ceph blah blah
[16:27] <jmlowe> symmcom: that's what I do for ubuntu, but I use 'ceph -a' instead of 'ceph'
[16:28] <jmlowe> also, joao is probably online (he's based in Europe, Portugal I think) and is the go-to guy for monitors
[16:29] <symmcom> i used ceph-deploy to deploy all my mons/osd. and the entire installation did not write anything in ceph.conf file like [mon.0] etc etc. so everytime i run /etc/init.d/ceph blah blah, it says my mon id does not exist in /etc/ceph/ceph.conf , which is why i rebooted entire machine
[16:29] <alfredodeza> symmcom: what version of ceph-deploy are you using?
[16:30] <jmlowe> symmcom: how is it going, did you get your quorum back?
[16:30] <jmlowe> <- upgrades vicariously
[16:30] <symmcom> i m running Ubuntu 12.04. ceph-deploy 1.2.1
[16:30] * julian (~julianwa@ has joined #ceph
[16:30] <mozg> jmlowe, yeah man, same here
[16:31] * julian (~julianwa@ Quit ()
[16:31] <mozg> on a live cluster i am worried sh*tless with each upgrade
[16:31] <jmlowe> I caused 0.61.1
[16:31] <mozg> or when one of the mons is down and it takes a bit of time when running ceph -s
[16:31] <symmcom> my palms r sweating lol
[16:32] <jmlowe> keep us posted
[16:33] <symmcom> by the way, this is how i m upgrading. I upgraded the admin host first , then running this command from the admin host to upgrade the mons.... #ceph-deploy install ceph-mon-01
[16:33] <joelio> I'd really read http://ceph.com/docs/master/release-notes/#v0-67-dumpling
[16:34] <joelio> UPGRADING FROM V0.61 “CUTTLEFISH”
[16:34] <mozg> symmcom, yeah, for sure, read the upgrade guide
[16:34] <mozg> if you do not want to mess things up
[16:34] <joelio> section marked "UPGRADE SEQUENCING"
[16:37] <symmcom> ok, i think i have missed something. b
[16:37] <joelio> btw Ubuntu 12.04 uses upstart, so service commands are fairly moot
[16:37] <joelio> yes, reading upgrade instructions before upgrading is always handy
[16:40] <symmcom> i was under the impression that running #ceph-deploy install host from admin upgrades the destination host with whatever files are needed
[16:40] <yo61> so, I'm deploying a ceph cluster using puppet
[16:41] <yo61> On CentOS
[16:41] <yo61> So using http://ceph.com/docs/next/install/rpm/
[16:41] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[16:41] <yo61> When I get to the step about creating the user key, is this a key that is shared across all nodes, or is it node-specific?
[16:43] <joelio> symmcom: don't assume ceph-deploy does much that's sane in its current guise (especially if you have older cuttlefish). It's had a few issues
[16:43] <joelio> although they should be on the whole fixed now
[16:44] <symmcom> i have cuttlefish 0.61.7... upgrading ceph-common on all mons right now.......
[16:47] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[16:47] * ChanServ sets mode +o scuttlemonkey
[16:55] * sleinen2 (~Adium@2001:620:0:25:186c:dcae:a82e:4659) Quit (Quit: Leaving.)
[16:55] <symmcom> and just like that.. the 5 newly upgraded mons are now up and the older 4 mons are down :)
[16:55] * sleinen (~Adium@2001:620:0:30:48db:7a73:117e:2cd6) has joined #ceph
[16:55] <symmcom> i can breathe again !!
[16:55] <ChoppingBrocoli> Crack open a beer!
[16:56] <symmcom> the step i missed was i did not run this on every mon.. #apt-get upgrade ceph-common
[16:56] <jmlowe> I'll drink some coffee to that!
[16:56] <symmcom> :) upgrading rest of the 4 mons now
[17:00] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[17:00] <symmcom> how frequently can i ask questions here without annoying people to death :)
[17:00] <janos> keep going~
[17:00] <janos> ;)
[17:01] <joelio> it's what we're here for :)
[17:01] <janos> the people here are older and professionals. if you ask good questions and learn from the answers, i don't think it's a big deal
[17:01] <jmlowe> symmcom: 1/60 Hz
[17:02] <symmcom> great! good to know! and thank you to all !
[17:03] <symmcom> ok, upgrade related question.... seems like i have to reboot each mon host in order to activate the upgrade, how can it be done without the reboot? or can it?
[17:03] * sleinen (~Adium@2001:620:0:30:48db:7a73:117e:2cd6) Quit (Ping timeout: 480 seconds)
[17:03] <joelio> stop ceph-all && sleep 2 && start ceph-all
[17:04] <joelio> as I mentioned, on Ubuntu 12.04 you use upstart
[17:04] <joelio> so no /etc/init.d/ or service - uses stop/start
[17:04] <joelio> http://ceph.com/docs/master/rados/operations/operating/
[17:04] <symmcom> i m new to Ubuntu, got into it same time i stumbled upon ceph 4 months ago
[17:05] <joelio> the service commands used to work via mkcephfs in bobtail, but in cuttlefish and with ceph-deploy they don't (or didn't)
[17:06] <joelio> you can still use service commands, but they are generally wrappers
[17:06] <symmcom> this one? #service ceph start|restart?
[17:06] <joelio> um, no
[17:06] <joelio> http://ceph.com/docs/master/rados/operations/operating/
[17:08] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:08] <symmcom> so using this i can start any mon without reboot? sudo start ceph-mon id=hostname?
[17:10] * ruhe_ (~ruhe@ has joined #ceph
[17:11] <joelio> symmcom: if it's not running already
[17:12] <joelio> or ssh into the host in question and run stop ceph-mon all
[17:12] <joelio> start ceph-mon all
[17:12] * ruhe (~ruhe@ Quit (Ping timeout: 480 seconds)
[17:16] <symmcom> #stop ceph-mon all from the host in question does not stop ALL the mons in the cluster ?
[17:17] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[17:20] <symmcom> it worked! i ran the stop ceph-mon-all command from the host then start command and the cluster can see it now :) but still not sure if stop ceph-mon-all stops all the mons or just the mon in the host i m running the command from
[17:20] <joelio> it's just the host you're on afaik
[17:21] <joelio> --allhosts is for all hosts
[17:21] <symmcom> upgrading the last mon now, using the knowledge i just learned here
[17:21] <joelio> reading the docs is always wise ;)
[17:21] * haomaiwa_ (~haomaiwan@notes4.com) has joined #ceph
[17:22] * joao (~joao@89-181-153-226.net.novis.pt) Quit (Remote host closed the connection)
[17:22] <symmcom> i think i stayed away from the commands on that page when i saw they had -all at the end. didnt want to stop ALL mon/osd in the cluster and cause sleepless nights :)
[17:23] <joelio> you can have more than one osd per node for example though
[17:23] <joelio> so the commands are pretty generic
[17:23] <joelio> -a or --allhosts will make it run on all hosts from an admin host with ssh access
[17:23] <symmcom> at this moment i have 2 storage nodes with 4 OSDs in each
[17:27] * ruhe (~ruhe@ has joined #ceph
[17:27] * ruhe (~ruhe@ Quit (Remote host closed the connection)
[17:28] * ruhe (~ruhe@ has joined #ceph
[17:28] * haomaiwang (~haomaiwan@li565-182.members.linode.com) Quit (Read error: Operation timed out)
[17:28] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[17:29] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[17:29] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Read error: Connection reset by peer)
[17:30] * Dieter_be (~Dieterbe@dieter2.plaetinck.be) has joined #ceph
[17:30] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[17:30] * ruhe_ (~ruhe@ Quit (Ping timeout: 480 seconds)
[17:30] <Dieter_be> has anyone used the ceph filesystem as a graphite (whisper) storage directory?
[17:31] <symmcom> ceph says HEALTH_OK :)
[17:32] <joelio> symmcom: sounds like win to me
[17:32] <jmlowe> symmcom: yea! I'm currently having a discussion with my manager about the best time to do our upgrade from cuttlefish to dumpling
[17:33] <symmcom> now on to OSD upgrade. is it same procedure as mon?
[17:33] <jmlowe> afaik
[17:34] <jmlowe> except no quorum of course
[17:34] <absynth> what's the release after dumpling going to be named?
[17:34] <tnt> emperor
[17:34] <absynth> as in Palpatine?
[17:35] <janos> haha
[17:35] <joelio> symmcom: if you've restarted using ceph-all and already have upgraded ceph then it'll be done
[17:35] <joelio> no need to do anything
[17:36] <symmcom> i havent touched the osd hosts yet
[17:36] <jmlowe> I'm expecting to see nhm post some cuttlefish vs dumpling numbers soon, unless I missed it and he already did
[17:36] <absynth> uh, did you guys read about that performance issue in dumpling?
[17:36] <joelio> symmcom: ok, so upgrade the software and restart the osds
[17:36] <absynth> i'd stay away from an upgrade until that gets resolved
[17:36] <joelio> symmcom: osd's are much less sensitive to any upgrade issues (imho)
[17:37] <symmcom> ok, here i go........
[17:39] <joelio> absynth: I'm waiting for other people to make the jump before. In no rush
[17:39] <joelio> quite happy with Cuttlefish atm, no need to rice it :)
[17:40] <jmlowe> I've got six more osd's to add, I think I need dumpling's improved peering to make it less painful
[17:43] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[17:44] <odyssey4me3> I need a little help. I just can't seem to get this right and I'm getting conflicting advice from various websites. I'm trying to get a ceph-mds running.
[17:44] <odyssey4me3> It's with dumpling in a new environment.
[17:44] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:44] <odyssey4me3> I do not plan to use cephx if at all possible.
[17:44] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[17:45] <odyssey4me3> Ideally I also want to do as little as possible config in /etc/ceph/ceph.conf
[17:45] <joelio> I added mds post cluster build using ceph-deploy (admittedly on cuttlefish) didn't have any issues
[17:45] <joelio> where are you stuck?
[17:46] <lxo> isn't it a bit disturbing that sync doesn't? if rsync changes a bunch of dir timestamps on a ceph.ko mount, syncs, and then the client crashes before a umount, it seems like many of the dirs revert to their earlier timestamps if nobody else accessed them. something to do with how timestamps are flushed to the mds only when caps/locks are released, it seems
[17:46] <odyssey4me3> As I understand it, all I should have to do is: Create /var/lib/ceph/mds/mds.${id}; then generate a keyring with something like ceph-authtool /var/lib/ceph/mds/mds.0/mds.0.keyring --create-keyring --gen-key --name=mds.0 --cap mds 'allow' --cap osd 'allow *' --cap mon 'allow rwx'
[17:46] * sagelap (~sage@2607:f298:a:607:7067:94d7:68ab:4e08) has joined #ceph
[17:47] <joelio> odyssey4me3: umm that's manual - why not use ceph-deploy?
[17:47] <odyssey4me3> then start the service... but it won't start - ERROR: failed to authenticate: (1) Operation not permitted
[17:47] <odyssey4me3> joelio - I'm working out the method for the purpose of creating the chef recipe... so ceph-deploy isn't an option
[17:48] <joelio> grok ceph-deploy's code
[17:48] <alfredodeza> oh, don't attempt to grok ceph-deploy's code :)
[17:48] <joelio> although the manual additions should work :)
[17:48] <alfredodeza> odyssey4me3: what version of ceph-deploy are you using?
[17:48] <alfredodeza> the latest releases are fairly verbose and should get you a good chunk of info on what it is doing
[17:49] <odyssey4me3> alfredodeza - I'm not. I want to do it manually as this'll be done by a chef run.
[17:49] <alfredodeza> I'm working on improving other commands besides `install` and `mon create` but those already tell you a lot
[17:49] <alfredodeza> odyssey4me3: you should definitely use ceph-deploy to "see" what is going on and apply that to chef
[17:49] <alfredodeza> I am not saying you should use it for/instead-of chef
[17:50] <zackc> chef can't use ceph-deploy?
[17:50] <alfredodeza> it sure can
[17:50] <alfredodeza> if you want it to
[17:50] <alfredodeza> but you lose some control I guess
[17:50] <joelio> REST admin API please :)
[17:51] <joelio> oh, wait a minute.. :)
[17:51] <ChoppingBrocoli> mozg: I made some more changes check out this bench....
[17:51] <ChoppingBrocoli> Total time run: 51.571456
[17:51] <ChoppingBrocoli> Total writes made: 6082
[17:51] <ChoppingBrocoli> Write size: 4194304
[17:51] <ChoppingBrocoli> Bandwidth (MB/sec): 471.734
[17:51] <ChoppingBrocoli> Stddev Bandwidth: 161.856
[17:51] <ChoppingBrocoli> Max bandwidth (MB/sec): 656
[17:51] <ChoppingBrocoli> Min bandwidth (MB/sec): 0
[17:51] <ChoppingBrocoli> Average Latency: 0.134921
[17:51] <ChoppingBrocoli> Stddev Latency: 0.192999
[17:51] <odyssey4me3> zackc - it could, I guess, but that's not the current method for all the rest of the cookbooks
[17:51] <ChoppingBrocoli> Max latency: 2.29017
[17:51] <ChoppingBrocoli> Min latency: 0.027948
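The bandwidth figure in the pasted summary can be cross-checked from the other numbers in the same paste. A minimal sketch, assuming the tool counts 1 MB as 1024 * 1024 bytes (the variable and function names are illustrative, not part of any Ceph tool):

```python
# Figures copied from the pasted rados bench write summary.
total_time_s = 51.571456
total_writes = 6082
write_size_bytes = 4194304  # 4 MiB per object


def bandwidth_mb_per_s(writes, size_bytes, seconds):
    """Average throughput, counting 1 MB as 1024 * 1024 bytes (assumption)."""
    return writes * size_bytes / (1024 * 1024) / seconds


bandwidth = bandwidth_mb_per_s(total_writes, write_size_bytes, total_time_s)
print(f"{bandwidth:.3f} MB/sec")
```

Under that assumption the result agrees with the reported 471.734 MB/sec, so the summary is internally consistent.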
[17:51] <joelio> ChoppingBrocoli: pastebin dude!
[17:51] <mozg> nice!!!!
[17:51] <ChoppingBrocoli> Sorry!
[17:52] <joelio> :)
[17:52] <mozg> do you mind sharing your config changes
[17:52] <odyssey4me3> I'm not all that keen to rewrite the whole bunch just yet... we're planning on using the API down the line
[17:52] <ChoppingBrocoli> yea, what is your email?
[17:52] * odyssey4me3 is now known as odyssey4me
[17:53] <mozg> has anyone looked at optimising for small block size performance with kvm/rbd?
[17:53] <zackc> odyssey4me: ah ok, understandable
[17:53] * odyssey4me is now known as odyssey4me3
[17:53] * ruhe (~ruhe@ Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[17:53] <mozg> i am not having much luck with getting decent performance figures
[17:53] <mozg> ChoppingBrocoli, how did you run the last benchmark? do you mind sharing?
[17:54] <mozg> what tool did you use?
[17:54] <joelio> that's rados bench iirc
[17:54] * tnt (~tnt@ has joined #ceph
[17:54] <ChoppingBrocoli> rados bench -p pbench 50 write
[17:54] <joelio> mozg: http://ceph.com/w/index.php?title=Benchmark
[17:54] <kraken> php is just terrible
[17:55] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[17:55] <mozg> thanks
[17:55] <mozg> let me try that from my end
[17:55] <mozg> i assume you need to have a pool for that benchmark, right?
[17:56] <joelio> yes and you need data in a pool
[17:56] <joelio> I just create a pool for benchmarking so it's easily removed
[17:56] <joelio> using same pg's as other
[17:57] <joelio> if you use write with --no-cleanup you get data so you can read
[17:57] <mozg> cheers
[17:59] * ruhe (~ruhe@ has joined #ceph
[18:00] * Cube (~Cube@et-0-30.gw-nat.bs.kae.de.oneandone.net) Quit (Quit: Leaving.)
[18:00] <odyssey4me3> joelio - good call on looking into ceph-deploy... I'm finding differences in the process :)
[18:01] <joelio> anytime bud
[18:02] <mozg> joelio, are you based in the UK?
[18:02] <joelio> yep, (sunny old) Manchester
[18:02] <joelio> Manchester even
[18:02] <symmcom> ok, upgraded 1 of the 2 osd nodes. doesn't seem like anything odd happened. how can i know for sure it upgraded ok? #ceph -v on the osd node shows the new ver 67.1
[18:06] <ChoppingBrocoli> mozg: just to be nosy what did you get?
[18:06] <mozg> one sec
[18:06] <mozg> let me run it
[18:06] <mozg> give me a few minutes
[18:07] <joelio> symmcom: that's just showing you the version of the ceph command line binaries, not the osd versions. You could check OSDmap versions, but I'd just upgrade the other osd host and restart the osds
[18:07] <mozg> i will get much lower speeds as i only have 2 ssd disks serving 8 osds out of 16
[18:07] <mozg> i've got two osd servers
[18:07] <mozg> with 8 osds each
[18:07] <mozg> they are sas 7k 3TB and sas 7k 1.5TB disks
[18:07] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[18:08] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[18:09] <mozg> ChoppingBrocoli, by the way, were you testing from the storage server or from one of your client host servers?
[18:10] <jmlowe> symmcom: I've always used 'service ceph -a status' but my cluster is pre ceph-deploy so ymmv
[18:10] <cjh_> i'm noticing with ceph dumpling that my init files don't seem to know when the osd's are running. They all say down even though the processes are up
[18:11] * ruhe (~ruhe@ Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[18:11] * flickerdown (~flickerdo@westford-nat.juniper.net) has joined #ceph
[18:13] <cjh_> nevermind. i nuked the processes and restarted the init script. i think they were just stuck
[18:14] <symmcom> ok, done upgrading both OSD nodes. now how can i check osd version
[18:15] <symmcom> all OSDs stopped and restarted
[18:15] <ChoppingBrocoli> Storage
[18:18] <mozg> can you run it from the client side and check if the numbers are any different?
[18:19] <ChoppingBrocoli> Yea 1 sec
[18:19] <mozg> ChoppingBrocoli, here are my results: http://ur1.ca/f479r
[18:20] <ChoppingBrocoli> mozg: still the same from client, I don't think it matters where you run from
[18:20] <mozg> cool
[18:20] <mozg> how many osds do you have?
[18:21] <mozg> ChoppingBrocoli, how many osds do you have?
[18:22] <mozg> joelio, cool, I am in London ))
[18:22] <joelio> symmcom: upgrading will have replaced the osd binaries, so if you have restarted, they will be using the newer version
[18:23] <joelio> mozg: yea, going to come down to the Ceph day I think. Our main office is in London anyway, so may drag some other engineers along
[18:23] <mozg> nice!
[18:23] <ChoppingBrocoli> 11 now
[18:24] <symmcom> joelio, i restarted all OSDs. The cluster went to degraded mode, but came back as active+clean in less than a minute. Now the health is OK
[18:24] <mozg> ChoppingBrocoli, that is not bad
[18:24] <joelio> symmcom: that makes sense if you have not set the noout flag
[18:24] <mozg> i guess all your ssds are doing the hard job )))
[18:24] <mozg> do you have 3 ssds for 11 osds?
[18:25] <joelio> symmcom: as it'll think an OSD has gone offline and start to rebalance, but the OSD will have come back up and meant rebalancing is minimal
[18:25] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) has joined #ceph
[18:25] <symmcom> opss ya i did not set noout flag
[18:25] <symmcom> ceph osd tree only showed OSD down though
[18:26] <ChoppingBrocoli> mozg: 5 ssds
[18:26] <mozg> ChoppingBrocoli, how are your read benchmarks?
[18:26] <mozg> 5 ssds - very nice
[18:26] <mozg> )))
[18:26] <joelio> symmcom: it may have been in a quiescing/coalescing state, try the command again?
[18:27] <ChoppingBrocoli> mozg: I cannot get read to work
[18:27] <ChoppingBrocoli> Says must write data first
[18:27] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Quit: Leaving.)
[18:27] <mozg> ChoppingBrocoli, you need to specify the flag to keep the data
[18:27] <mozg> by default it erases everything
[18:27] <ChoppingBrocoli> what flag?
[18:27] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Read error: Connection reset by peer)
[18:28] * Guest3625 (~Adium@c-67-176-54-246.hsd1.co.comcast.net) Quit (Remote host closed the connection)
[18:28] <mozg> --no-cleanup
[18:28] <mozg> as joelio pointed out ))
[18:28] <joelio> mozg: it doesn't erase, just leaves crud
[18:28] * mtl1 (~Adium@c-67-176-54-246.hsd1.co.comcast.net) has joined #ceph
[18:28] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[18:28] <mozg> so, run the write test with the flag
[18:28] <mozg> and do the read after
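The write-then-read sequence described here can be sketched as two command lines. This is a dry-run helper only, assuming the pool name `pbench` and the 50-second duration from the earlier command; `rados_bench_cmd` is a hypothetical helper name, and nothing is executed against a cluster:

```python
import shlex


def rados_bench_cmd(pool, seconds, mode, no_cleanup=False):
    """Build an argv list for `rados bench`; nothing is executed here."""
    cmd = ["rados", "bench", "-p", pool, str(seconds), mode]
    if no_cleanup:
        # Keep the benchmark objects so a later read test has data to read.
        cmd.append("--no-cleanup")
    return cmd


write_cmd = rados_bench_cmd("pbench", 50, "write", no_cleanup=True)
read_cmd = rados_bench_cmd("pbench", 50, "seq")
print(shlex.join(write_cmd))
print(shlex.join(read_cmd))
```

Running the write with `--no-cleanup` first and the `seq` read second matches the order suggested in the channel; using a throwaway pool keeps the leftover objects easy to remove.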
[18:28] <symmcom> i meant when i did start/stop OSDs after upgrade, #ceph osd tree showed OSD as down, but came back up in less than a minute. right now all OSDs/MONs are up and Health is OK.
[18:28] <joelio> mozg: so I create a pool just to fill with crud so it's easier to remove after
[18:28] <joelio> symmcom: yea, all good then
[18:29] <joelio> symmcom: the take away point here is that if you've upgraded the ceph software, then when you restart the osd, it will use the newer version of the software (as there is no old software left on the system as it's been upgraded)
[18:30] <joelio> just like you would if upgrading apache for example
[18:30] <symmcom> awesome! seems like i m all up to date with Dumplings :)
[18:30] <joelio> :)
[18:30] <ChoppingBrocoli> mozg: read is Bandwidth (MB/sec): 326.467
[18:31] <symmcom> to make sure all works, all these upgrades that i just did, i did them from inside a VM which is stored on the cluster
[18:31] <symmcom> my VM never disconnected or stopped during the whole cluster upgrade process
[18:31] <joelio> yep, that's the point in shared storage, did the same today (although just a cuttlefish point release) for several dozen VMs :D
[18:32] <joelio> also with live migrations, can evacuate a compute node, keeping all VMs running, do maintenance, rebalance and users never even know :)
[18:33] <symmcom> i used FreeNAS, OmniOS, GlusterFS for shared storage, but Ceph topped them all and i dont even think there is anything like CEPH at this moment
[18:35] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:35] * ishkabob (~c7a82cc0@webuser.thegrebs.com) has joined #ceph
[18:36] * Cube (~Cube@ has joined #ceph
[18:36] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[18:36] <ishkabob> hey guys, does anyone know why i might be having trouble loading the rbd kernel module on one of my cluster machines?
[18:36] <ishkabob> is this unsupported?
[18:36] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[18:37] <joelio> ishkabob: errors, logs etc?
[18:38] <ishkabob> hey joelio: i have a 3-node cluster running just fine, i logged in to one of the nodes which is running a monitor and 6 osds
[18:38] <ishkabob> and then ran: modprobe rbd
[18:38] <ishkabob> and got
[18:38] <ishkabob> FATAL: Module rbd not found.
[18:38] <ishkabob> its SL 6.4, Ceph Dumpling
[18:39] <ishkabob> ceph version 0.67.1 (436941e0664b648459cc27cf9d1a685f060c09cc)
[18:39] <joelio> well, afaik, that's down to the running kernel, not the ceph version right
[18:39] <ishkabob> true, which package provides the kernel module?
[18:40] <joelio> that error is indicative that the module has not been compiled for your kernel
[18:40] <joelio> ishkabob: no package provides the kernel module as it's upstreamed (so is in kernel sources)
[18:41] <joelio> although there may be something different for CentOS/Scientific/RHEL etc
[18:41] <ishkabob> yeah, the kernel for 6.4 is pretty old 2.6.32
[18:41] <joelio> there you go then :)
[18:42] <ishkabob> thx
[18:42] <joelio> did this work on any others with that kernel?
[18:42] <ishkabob> i don't believe so no
[18:42] <joelio> ok, well there's an angle of attack then
[18:42] <joelio> fwiw I have a better time with -fuse as it's more tightly linked with the ceph libs
[18:43] <joelio> on raring kernel I got all kinds of strangeness
[18:43] <ishkabob> does fuse use librbd or some such?
[18:43] <joelio> yea, it's userspace
[18:44] <joelio> not sure on performance or relevance to your workloads
[18:44] <ishkabob> do you use fuse commands to map the drive?
[18:44] <sagewk> joshd dmick or anyone else familiar with the python bindings and tests: https://github.com/ceph/ceph/commits/wip-6052
[18:44] <joelio> ishkabob: yea, ceph-fuse -m {host:6789} /path/to/mount
[18:45] <joelio> that gives me correct representation of disk usage and won't fall over on tunables
[18:45] <joelio> been running it as the system datastore for my opennebula stack for a good few weeks and all good so far
[18:45] <joelio> not massive amounts of concurrent writes though, so again, YMMV! :_
[18:45] <joelio> :)
[18:46] <ishkabob> joelio: how do you specify which rbd image you want?
[18:47] <joelio> ishkabob: sorry, ignore me, confused myself with cephfs
[18:47] <joelio> time for home I think :)
[18:47] <ishkabob> ah right
[18:47] <joelio> but same goes for kernel stuff
[18:48] * ruhe (~ruhe@ has joined #ceph
[18:48] <joelio> I use via libvirt so it's all userspace
[18:48] <ishkabob> does anyone know how to use rbd-fuse? I'm just trying to expose a samba share from my ceph cluster
[18:48] <joelio> no need to be as stringent with kernels, so less admin overhead for me
[18:48] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:49] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[18:49] * gregmark (~Adium@ has joined #ceph
[18:50] <joelio> ishkabob: man rbd-fuse :)
[18:50] <joelio> rbd-fuse [ -p pool ] [-c conffile] mountpoint [ fuse options ]
[18:50] <ishkabob> joelio: yeah i saw that
[18:50] <ishkabob> it mounts everything in the pool
[18:50] <ishkabob> and each image is a file
[18:50] <ishkabob> do i then subsequently mount that file as a block device?
[18:50] * sleinen1 (~Adium@2001:620:0:26:283b:e3d3:8923:46fe) has joined #ceph
[18:50] <mozg> ChoppingBrocoli, your read speeds are nice
[18:51] <mozg> try and see how your writes/reads work with small block sizes
[18:51] <mozg> like 4k
[18:51] <joelio> ishkabob: they'll be stored as RAW so I guess.. this sounds messy to me though
[18:51] <mozg> my 4k performance is not that good
[18:52] <flickerdown> if your file types are larger, 4k block sizes may not be optimal anyway.
[18:52] * ShaunR (~ShaunR@staff.ndchost.com) has joined #ceph
[18:54] <mozg> flickerdown, if my vms are running sql with small random read/writes, what kind of block sizes would ceph cluster hit?
[18:54] <mozg> i am assuming it will not aggregate them into 4M chunks
[18:54] <flickerdown> like MS-SQL?
[18:54] <mozg> yeah, or mysql
[18:54] <flickerdown> or what?
[18:54] <flickerdown> ah..
[18:55] <flickerdown> so....i don't believe you'll get any write/read coalescing ...
[18:55] <flickerdown> though, i could be wrong. are you presenting via SMB/CIFS/NFS or block?
[18:56] * smiley (~smiley@cpe-67-251-108-92.stny.res.rr.com) Quit (Quit: smiley)
[18:57] * nhm (~nhm@184-97-168-219.mpls.qwest.net) has joined #ceph
[18:57] * ChanServ sets mode +o nhm
[18:57] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[18:57] <mozg> i am using rbd via qemu/kvm
[18:58] <mozg> there is rbd cache options which could speed things up
[18:58] <flickerdown> ok...
[18:58] <joelio> only writes
[18:58] * mschiff (~mschiff@p4FD7C905.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[18:58] <flickerdown> @joelio caching would be write only?
[18:59] <cephalobot> flickerdown: Error: "joelio" is not a valid command.
[18:59] <mozg> guys, could you please try rados bench using -b 4K option?
[18:59] <mozg> does it work for you?
[18:59] <nhm> mozg: are you saying 4K or 4096?
[18:59] <mozg> it segfaults on my cluster
[18:59] <mozg> ah
[18:59] <joelio> flickerdown: rbd caching is only for writes
[18:59] <mozg> that could be the issue
[18:59] <flickerdown> thanks, Joelio...
[18:59] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[18:59] <nhm> mozg: rados bench isn't too smart. :)
[18:59] <mozg> rados bench -b 4096 -p pbench 60 seq
[18:59] <mozg> *** Caught signal (Segmentation fault) **
[18:59] <mozg> in thread 7f5614bb4780
[18:59] <mozg> *** glibc detected *** rados: malloc(): memory corruption (fast): 0x00000000015daf40 ***
[19:00] <nhm> oh my
[19:00] <flickerdown> figured there was some sort of caching mechanism in there.
[19:00] <mozg> ))
[19:00] <flickerdown> :)
[19:00] <sagewk> mozg: does librados2 version match ceph-common version?
[19:00] <sagewk> and what version is that?
[19:00] <flickerdown> so, you're getting write caching which should coalesce writes before commit to the fs....
[19:00] <mozg> librados2 0.61.7-1precise
[19:00] <mozg> yes it does
[19:00] <joelio> flickerdown: you'll get caching inside the VM's memory, so there is still some read/write performance gains to be had. rbd caching is just for writing though
[19:01] <joelio> mozg: I have segfault too
[19:02] * ruhe (~ruhe@ Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[19:02] <flickerdown> yup...i know the virt side pretty darn well . ;)
[19:04] <yo61> Eveing
[19:04] <yo61> Er, evening
[19:04] <yo61> I'm installing a poc ceph cluster on 5 nodes
[19:04] <yo61> This is not production - just learning
[19:04] <joelio> mozg: I don't segfault on write
[19:04] <nhm> mozg: joelio are these the pre-built packages?
[19:05] <joelio> yes
[19:05] <joelio> fresh 0.61.8 today
[19:05] <joelio> would mixing rados block sizes make a difference in the same pool?
[19:05] <nhm> joelio: same kind of error as mozg?
[19:05] <joelio> root@vm-hv-01:~# rados bench -b 4096 -p tester2 60 seq
[19:05] <joelio> *** Caught signal (Segmentation fault) ** in thread 7f81805ec7c0
[19:06] <yo61> Is it OK to, eg. use all 5 for OSDs and put MON on, say, three?
[19:06] <joelio> nhm: write is ok
[19:06] <joelio> and really slow, as you'd expect for 4k
[19:06] <nhm> joelio: data already exists with a write that kept the objects around?
[19:06] <yo61> Am trying to get to grips with all the various moving parts
[19:07] <joelio> nhm: I'm using a test pool, can purge
[19:08] <joelio> nhm: yes, I can confirm that an empty pool works with 4k writes and then can be read with seq
[19:08] <joelio> otherwise if data exists, the reads fail and segfault
[19:08] <nhm> joelio: I was thinking the other way around I think. IE you did a rados bench write test and used the flag to keep the objects so the seq read would succeed?
[19:09] <nhm> joelio: oh interesting
[19:09] <nhm> joelio: can you file a bug in the tracker for that with the steps you took and the segfault?
[19:09] <joelio> I can indeed, I will do some further passes to see if it's the quantity of data in the pool too
[19:09] <joelio> as there's quite a lot in my test pool that was failing
[19:10] <joelio> I'm on -raring too whereas mozg is on precise I think
[19:12] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:12] <nhm> joelio: ok, good to know, and thanks!
[19:12] * ruhe (~ruhe@ has joined #ceph
[19:12] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[19:12] * sleinen1 (~Adium@2001:620:0:26:283b:e3d3:8923:46fe) Quit (Quit: Leaving.)
[19:18] * smiley (~smiley@cpe-67-251-108-92.stny.res.rr.com) has joined #ceph
[19:26] * Gamekiller77 (~oftc-webi@128-107-239-235.cisco.com) has joined #ceph
[19:27] <Gamekiller77> good day all, I have a problem with Cinder Volume not starting. it is stating in the log files that it can not execute rados lspools. I can run the command just fine at the command line so i expect this to be a permissions problem at some level. any ideas?
[19:29] <Gamekiller77> i answered my own question
[19:29] <Gamekiller77> the admin key was 700
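A mode of 700 grants access to the file's owner only, so a service running as a different user cannot read the key. A minimal sketch of that check on the permission bits alone; `readable_beyond_owner` is a hypothetical helper, and which user owns the key or runs the service is not stated in the log:

```python
import stat


def readable_beyond_owner(mode):
    """True if group or other has read permission in a st_mode value."""
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))


print(readable_beyond_owner(0o700))  # owner-only, as reported above
print(readable_beyond_owner(0o644))  # owner read/write, group and other read
```

For a real file the mode would come from `os.stat(path).st_mode`; whether widening the mode or changing ownership is the better fix depends on the deployment.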
[19:29] * symmcom (~wahmed@ Quit ()
[19:29] * ruhe (~ruhe@ Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[19:30] * mschiff (~mschiff@ has joined #ceph
[19:53] <cmdrk> well this is interesting! i have cephFS mounted on a host, and apparently one of the directories has disappeared. an "ls" shows 2 directories, and when I try to "mkdir" the missing directory, I get permission denied with "file exists". then i 'ls' and it's there (along with other missing directories). in a few seconds it disappears into the aether again
[19:53] <cmdrk> any thoughts?
[19:53] <cmdrk> (this is with cuttlefish and kernel 3.10.5)
[19:57] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[19:59] * odyssey4me3 (~odyssey4m@ Quit (Ping timeout: 480 seconds)
[19:59] <gregaf> uh, what?
[19:59] <cmdrk> let me make a pastebin
[19:59] <gregaf> cmdrk: it disappears meaning it doesn't show up in "ls"? and you mean it's the third directory?
[19:59] <gregaf> yes, pastebin would be perfect
[19:59] * yo61 (~yo61@lin001.yo61.net) Quit (Remote host closed the connection)
[20:00] * ishkabob (~c7a82cc0@webuser.thegrebs.com) Quit (Quit: TheGrebs.com CGI:IRC)
[20:00] <cmdrk> http://pastebin.com/0xQQttCb
[20:00] * Cube (~Cube@ Quit (Quit: Leaving.)
[20:01] * psiekl (psiekl@wombat.eu.org) has joined #ceph
[20:01] * psieklFH (psiekl@wombat.eu.org) Quit (Read error: Connection reset by peer)
[20:01] * Gamekiller77 (~oftc-webi@128-107-239-235.cisco.com) Quit (Remote host closed the connection)
[20:03] <gregaf> *stares*
[20:03] <gregaf> I've never seen anything like that before
[20:04] <cmdrk> woo
[20:04] <cmdrk> i'm the winner! ;)
[20:04] <gregaf> 3.10.5 is…very new, right?
[20:04] <cmdrk> its the latest stable kernel, yeah.
[20:05] <gregaf> sagewk: I think you've merged some stuff from Yan for the kclient; is there anything that might have changed the visibility of dentries (in cap fixes, perhaps?)
[20:05] <gregaf> cmdrk: what usage does this cluster see?
[20:05] * yo61 (~yo61@lin001.yo61.net) has joined #ceph
[20:05] <gregaf> I have a theory but I don't work in the kclient much and more data would help direct our search efforts
[20:06] <cmdrk> at the moment only me testing transferring a variety of files into it.
[20:06] <sjust> loicd: wip-erasure-coded-doc is now in master
[20:06] <gregaf> cmdrk: so it's not mounted anywhere else?
[20:06] <sjust> there are still some urls to that branch so I'll leave it up as well
[20:06] <cmdrk> gregaf: actually, it is. it's mounted on two 10Gb hosts
[20:06] <cmdrk> both are using the same kernel etc
[20:07] <gregaf> are they both accessing it at the same time, or is one sitting idle?
[20:07] <cmdrk> one is sitting idle
[20:07] <gregaf> hrm, that makes my theory somewhat less likely :/
[20:07] <gregaf> anyway, I think this is a new regression in the kernel client; can you create a ticket at tracker.ceph.com?
[20:08] <cmdrk> yep
[20:08] <gregaf> thanks
[20:08] <gregaf> I'll poke Sage about it later since he's the defacto maintainer right now, but he's in a meeting atm
[20:08] <sagewk> gregaf: i have yan's d_prune patch in our testing branch that hopefully fixes the ENOENT on rm -r
[20:08] <sagewk> ... is that what you mean?
[20:08] <gregaf> sagewk: this bug is failing to display existing directories
[20:09] <sagewk> hmm
[20:09] <sagewk> milosz?
[20:09] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[20:09] <gregaf> but not consistently; the pastebin (http://pastebin.com/0xQQttCb) has it going back and forth
[20:09] <gregaf> I haven't kept close track of the kclient patches over the last several months but I know there's been some activity
[20:10] * jbd_ (~jbd_@2001:41d0:52:a00::77) has left #ceph
[20:10] <sagewk> that could be the d_prune issue
[20:10] <sagewk> maybe something else, i forget. the dprune thing has to go upstream through al's tree in the next window tho
[20:10] <sagewk> if they can reproduce this they could test with our kernel
[20:12] <gregaf> cmdrk, any interest in testing a custom kernel?
[20:13] <cmdrk> gregaf: I can try
[20:13] <sagewk> what kernel v is it now btw?
[20:14] <cmdrk> my usual workflow is to build an RPM of the kernel from source
[20:14] <cmdrk> 3.10.5
[20:14] <cmdrk> with some small custom patches that should be irrelevant to ceph (network driver firmware)
[20:14] <sagewk> trying to reproduce this against the testing branch of ceph-client.git would be very helpful :)
[20:14] <sagewk> which is -rc6 basically
[20:15] <sagewk> (3.11-rc6)
[20:15] <cmdrk> sure
[20:18] <gregaf> sagewk or somebody: review for https://github.com/ceph/ceph/pull/512 at some point would be appreciated
[20:18] <gregaf> …or maybe you're already done, thanks!
[20:18] <sagewk> :)
[20:22] <cmdrk> is the ceph-client.git on github the latest or should i use the one from ceph.com ?
[20:22] <sagewk> github
[20:22] <cmdrk> ok
[20:23] <sagewk> ceph.com is a mirror (lags by ~60s)
[20:25] <joelio> yo61: yes, you can have 5 OSD hosts and 3 mons on the same.. I have 6 OSD hosts (with 6 OSDs per host) and 3 mons on 3 of the OSD servers too. They have lots of RAM though and enough CPU
[20:25] <joelio> yo61: you're best with 3 rather than 5 in that configuration too, it'll be less overheads
[20:36] * alram (~alram@ has joined #ceph
[20:43] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[20:54] * flickerdown|2 (~flickerdo@97-95-180-66.dhcp.oxfr.ma.charter.com) has joined #ceph
[20:58] * ruhe (~ruhe@95-29-196-189.broadband.corbina.ru) has joined #ceph
[21:01] * flickerdown (~flickerdo@westford-nat.juniper.net) Quit (Ping timeout: 480 seconds)
[21:04] <cmdrk> btw, i see a call trace for ceph in dmesg
[21:04] <cmdrk> http://pastebin.com/naJxu4kr
[21:07] * zoltan (~zoltan@178-83-84-204.dynamic.hispeed.ch) has joined #ceph
[21:07] <zoltan> hi
[21:09] * ruhe (~ruhe@95-29-196-189.broadband.corbina.ru) Quit (Quit: Textual IRC Client: www.textualapp.com)
[21:26] * flickerdown (~flickerdo@westford-nat.juniper.net) has joined #ceph
[21:31] * allsystemsarego (~allsystem@5-12-241-157.residential.rdsnet.ro) Quit (Quit: Leaving)
[21:32] * dmick (~dmick@2607:f298:a:607:345b:3f9f:be42:5ae0) Quit (Ping timeout: 480 seconds)
[21:33] * flickerdown|2 (~flickerdo@97-95-180-66.dhcp.oxfr.ma.charter.com) Quit (Ping timeout: 480 seconds)
[21:41] * dmick (~dmick@2607:f298:a:607:6d69:5957:f7a4:b0b6) has joined #ceph
[21:42] * BManojlovic (~steki@fo-d- has joined #ceph
[21:46] <sagewk> k
[21:46] <dmick> sagewk: wrt wip-6052: looks right; should probably add to ceph-rest-api as well
[21:46] <dmick> test: are you trying to suggest that FOO isn't set, and therefore parses empty without error?
[21:50] * roald (~oftc-webi@ has joined #ceph
[21:56] <Fetch_> I tried to upgrade from 0.67.0 to 0.67.1 from a HEALTH_OK status, and it seems like the upgrade wrecked mon maps. Now I've got a problem in that I only have one mon out of 3 up, and it is running version 0.67.0
[21:56] <Fetch_> on one of the other 3 mons, I'm getting [14321]: (33) Numerical argument out of domain
[21:57] <Fetch_> the 3rd mon seems to be stalled on startup
[21:57] <zoltan> sagewk, I've tried to get more debug output, but didn't really succeed.
[21:58] * roald (~oftc-webi@ Quit (Quit: Page closed)
[22:00] <dmick> Fetch_: I remember that error, but am having problems finding the bug/issue
[22:00] <dmick> why 76.1?
[22:00] <dmick> er,67.1?
[22:02] * m0zes (~mozes@beocat.cis.ksu.edu) has joined #ceph
[22:04] * ishkabob (~c7a82cc0@webuser.thegrebs.com) has joined #ceph
[22:04] <m0zes> hello all, I am trying to setup openstack with ceph 0.67.1, the docs show that the cinder-volume host should have a libvirt secret for cephx auth. http://ceph.com/docs/master/rbd/rbd-openstack/#configure-openstack-to-use-ceph
[22:04] <dmick> Fetch_: did you get a backtrace, and if so, can you put it on fpaste.org?
[22:04] <m0zes> should I have multiple cinder-volume hosts?
[22:05] <Fetch_> dmick: 0.67.1 because openstack/rbd
[22:05] <zoltan> m0zes, you can if you want.
[22:06] <zoltan> m0zes, then the scheduler service will distribute requests among all your cinder nodes
[22:06] <m0zes> the cinder-volume host is typically the controller node, they don't usually have libvirt on them.
[22:07] <zoltan> in my test setup I have them on one node... never tried to break compute and cinder up so far.
[22:07] <zoltan> I mean with my ceph-based setup.
[22:07] <zoltan> with iSCSI, it works :)
[22:08] <Fetch_> dmick: http://fpaste.org/33250/76942840/ debug dump of startup on node that throws "Numerical argument out of domain"
[22:08] <zoltan> on a separate note, have any of you tried using ceph from vmware via iSCSI? I tried both with the patched tgt and with reexporting the kernel based rbd devices, but it was waaay too slow from vmware.
[22:09] <zoltan> <1MB/s slow.
[22:09] <ishkabob> hey guys, i just created a cluster with ceph-deploy and some puppet scripts, I noticed 10 or so minutes after everything came up I'm still not getting a good health status
[22:09] <ishkabob> i'm getting HEALTH_WARN 40 pgs degraded; 45 pgs stale; 45 pgs stuck stale; 192 pgs stuck unclean
[22:09] <ishkabob> any ideas on how to troubleshoot?
[22:10] <dmick> Fetch_: that looks like a SEGV, not an EDOM?
[22:10] <Fetch_> dmick: yeah I've no explanation - that paste is what happens when I add -d to the command line
[22:10] * dmick wonders what -d is and why Fetch_ is adding it
[22:10] <dmick> ah
[22:11] <dmick> even if you don't do that, there's probably a backtrace in the monitor log?
[22:11] <dmick> (not that SEGV with -d is great either, but..)
[22:12] <Fetch_> http://fpaste.org/33251/37694315/
[22:12] <Fetch_> that's the log, without -d
[22:12] <Fetch_> still catching a SEGV
[22:13] <dmick> hm. so maybe it changed behavior?...
[22:13] <Fetch_> no, it still dumped [19748]: (33) Numerical argument out of domain
[22:13] <Fetch_> that's just what the backtrace in the log is
[22:14] <dmick> and, 67.1, not 61.7; sorry, my dyslexia
[22:14] <Fetch_> correct, 67.1, the new point release of dumpling
[22:14] <dmick> yeah
[22:14] <Fetch_> upgraded from 0.67.0
[22:15] <Fetch_> I'm now at less than quorum on my cluster (1 mon up out of 3 expected) - should I bring down my remaining mon and/or configure it to believe it's the only mon?
[22:15] * zoltan (~zoltan@178-83-84-204.dynamic.hispeed.ch) Quit (Quit: Leaving)
[22:15] <Fetch_> I'm unable to get a ceph status anymore
[22:15] <dmick> yeah, without quorum you can't
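The quorum point can be made concrete with a little arithmetic: monitors need a strict majority of the configured set, so 1 mon up out of 3 is not enough. This is a minimal sketch of that rule only; the function names are illustrative and are not Ceph APIs.

```python
def quorum_size(total_mons):
    """Smallest number of monitors that forms a strict majority."""
    return total_mons // 2 + 1


def has_quorum(mons_up, total_mons):
    """True if the monitors that are up can form a majority."""
    return mons_up >= quorum_size(total_mons)


# Fetch_'s situation: 1 mon up out of 3 expected.
print(has_quorum(1, 3))   # False: 2 of 3 are needed
print(has_quorum(2, 3))   # True
print(quorum_size(5))     # 3
```

The same rule explains why shrinking the monmap to a single monitor restores a working cluster: one out of one is a majority.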
[22:16] <dmick> not sure; hold a sec
[22:16] * DarkAce-Z (~BillyMays@ Quit (Ping timeout: 480 seconds)
[22:17] <dmick> joshd: I notice you're actually on call ATM; I'm not sure which direction would be best here; can you step in?
[22:17] <Fetch_> seems like I should probably down the last good mon, extract its monmap, and copy that over to the other 2 mons
[22:22] <dmick> not sure if that will make it better or worse
[22:22] <kraken> ≖_≖
[22:26] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[22:26] <Tamil> ishkabob: it would be nice to have more details about your test setup
[22:27] <Tamil> ishkabob: how many nodes are you using? which ceph version?
[22:28] * mozg (~andrei@host109-151-35-94.range109-151.btcentralplus.com) has joined #ceph
[22:30] <Fetch_> dmick: can extract-monmap be run on a running mon?
[22:30] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[22:34] * alram (~alram@ Quit (Read error: Connection reset by peer)
[22:40] * Cube (~Cube@ has joined #ceph
[22:43] <Fetch_> dmick: well, I'm back to HEALTH_OK, albeit with just one mon
[22:45] * smiley (~smiley@cpe-67-251-108-92.stny.res.rr.com) Quit (Quit: smiley)
[22:47] <Fetch_> http://fpaste.org/33259/69452111/ | dmick, sagewk - mkfs on a new mon directory after successful creation of the keyring and monmap files
[22:48] <Fetch_> pthread lock: Invalid argument
[22:48] <sagewk> fetch_: let me guess.. rhel 6.4?
[22:48] <Fetch_> got it in one
[22:48] <Fetch_> bad pthread lib mismatch?
[22:48] <sagewk> http://tracker.ceph.com/issues/6022
[22:48] <sagewk> i'm hoping it is just the leveldb package we backported
[22:49] <sagewk> glowell: ^
[22:49] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) has joined #ceph
[22:49] * DarkAceZ (~BillyMays@ has joined #ceph
[22:49] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[22:50] <glowell> I'll have a look. Maybe we need a 6.4 build.
[22:51] <sagewk> oh, it's not built on 6.4?
[22:51] <Fetch_> sage: should I give that leveldb rpm y'all have in ceph-extras a try, then?
[22:51] <sagewk> fetch_: which leveldb package are you using now?
[22:52] <yo61> joelio: thanks
[22:52] <Fetch_> leveldb-1.12.0-3.el6.x86_64 - I couldn't say if it's from your repo or from redhat
[22:52] <Fetch_> well maybe I can, checking
[22:52] <yo61> When you say: /win 1
[22:52] <yo61> Ooops
[22:52] <Fetch_> according to yum, that's from the ceph.com extra-packages repo
[22:53] <sagewk> yeah i think that is the problem package. is there a version in epel?
[22:53] <yo61> joelio: when you say "you're best with 3 rather than 5 in that configuration", what do you mean?
[22:53] <sagewk> might try against that just to see
[22:53] <Fetch_> older version, 1.7.0 . I'll try it if you think it's likely
[22:54] <sagewk> yeah please give it a go
[22:55] <Fetch_> that passed the mkfs, giving daemon start a try
[22:56] <Fetch_> ok it's started and seems to be running
[22:57] <Fetch_> is there an exciting reason for that leveldb backport that I will discover at some point in the future?
[22:57] * roald (~oftc-webi@ has joined #ceph
[22:59] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[22:59] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) has joined #ceph
[23:01] <sagewk> fetch_: probably not anything you'll notice.
[23:02] <sagewk> glowell: let's yank the el6 leveldb package for now
[23:02] <glowell> ok
[23:02] <glowell> I think ceph-deploy will install the leveldb-1.7.0 package from epel by default.
[23:02] <joelio> yo61: I mean you're better with 3 mons, rather than 5 for that number of hosts. You'll have less network overheads
[23:03] <sagewk> once it's removed from the ceph-extras, you mean?
[23:04] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[23:06] <glowell> We shouldn't need to have the ceph-extras repo enabled to install ceph.
[23:06] <glowell> I'll check that to be sure.
[23:07] <Fetch_> (as no data point whatsoever, I didn't use ceph-deploy and have that repo mainly for the qemu packages)
[23:08] <joelio> nhm: I've been unable to reproduce the issue with 4096 block sizes, bizarrely
[23:09] * ishkabob (~c7a82cc0@webuser.thegrebs.com) Quit (Quit: TheGrebs.com CGI:IRC)
[23:09] <Fetch_> sagewk: btw thanks. As always, you are freakishly good at debugging Ceph
[23:10] <sagewk> yum probably just grabbed that leveldb bc its newer?
[23:10] <Fetch_> yup
[23:10] <sagewk> thanks for confirming the workaround :)
[23:10] <Fetch_> I did a yum update, which pulled in the newer ceph (intended) and newer leveldb (happy accident)
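Why the update preferred leveldb 1.12.0 over 1.7.0 comes down to comparing version components as numbers rather than as text. The sketch below is a simplification for illustration: `version_key` is a hypothetical helper, and real RPM version comparison also takes epoch and release fields into account, which are ignored here.

```python
def version_key(version):
    """Turn a dotted version such as '1.12.0' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


epel = "1.7.0"      # version mentioned as available in EPEL
extras = "1.12.0"   # version mentioned as coming from ceph-extras

# Numeric comparison: 12 > 7, so 1.12.0 is the newer package.
print(version_key(extras) > version_key(epel))   # True

# Plain string comparison gets it wrong, because '1' sorts before '7'.
print(extras > epel)                             # False
```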
[23:10] <sagewk> glowell: i bet the build machine just didn't match the libraries properly (centos vs rhel, or 6.3 vs 6.4, or something)
[23:10] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[23:11] <Kioob> I have just upgraded my cluster from 0.61.7 to 0.61.8 (39 OSD), without any problem. Thanks a lot !
[23:12] <Kioob> PS : the channel title is not up to date
[23:12] <Fetch_> sagewk: hey I'm a derp and was pointing my centos box at the rhel repos. I can install the proper binary real quick if you think there's a pthread API/ABI difference between centos/rhel
[23:13] * dpippenger (~riven@tenant.pas.idealab.com) Quit (Ping timeout: 480 seconds)
[23:13] <sagewk> Fetch_: that would be interesting to confirm!
[23:13] <nhm> joelio: hrm!
[23:13] <sagewk> glowell: ^
[23:14] <Fetch_> sagewk: same failure (as rhel package), so it's not a rhel/centos lib mismatch on your build server, anyway
[23:14] <sagewk> k
[23:14] <glowell> got it.
[23:15] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[23:15] <Fetch_> glowell: if you need a tester in the next couple hours give me a shout. I'll be on, and owe y'all for the quick fix
[23:17] <glowell> I'll have the el6 packages pulled from the repo in an hour or so, then installation should default to leveldb 1.7.
[23:17] <glowell> New 6.3/64 builds will likely be sometime tomorrow.
[23:22] * mozg (~andrei@host109-151-35-94.range109-151.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[23:26] * mtl2 (~Adium@c-67-176-54-246.hsd1.co.comcast.net) has joined #ceph
[23:26] * mtl1 (~Adium@c-67-176-54-246.hsd1.co.comcast.net) Quit (Ping timeout: 480 seconds)
[23:27] * mozg (~andrei@host109-151-35-94.range109-151.btcentralplus.com) has joined #ceph
[23:27] <mozg> hello guys
[23:28] <mozg> would you recommend going to 0.61.8 or straight to 0.67.1?
[23:28] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[23:29] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[23:29] * ChanServ sets mode +v andreask
[23:29] <Fetch_> mozg: I'm not an Inktank employee or developer of Ceph, but I'd go for 0.67.1 . Updates are always easier if you're at a more recent version, and performance is generally a good deal better
[23:35] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[23:37] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[23:39] <glowell> leveldb-1.12 has been removed from ceph-extras for centos6.3/rhel6.3. Installs should now default to leveldb-1.7.0
[23:40] <yo61> joelio: OK, gotcha
[23:42] <mozg> Fetch_, thanks
[23:43] <sagewk> gregaf: librados helloworld ready to merge?
[23:43] <gregaf> I'll squash it
[23:45] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[23:59] <Fetch_> sagewk: that patch you just created for the CEPH_ARGES environment variable - I see your patch is to ceph.in, but that should end up in /usr/bin/ceph, correct? And that patch should work on 0.67.1 ?
[23:59] <Fetch_> CEPH_ARGS*
[23:59] * jeff-YF (~jeffyf@ Quit (Quit: jeff-YF)
[23:59] <sagewk> Fetch_: make ceph builds ceph from ceph.in

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.