#ceph IRC Log

IRC Log for 2013-10-23

Timestamps are in GMT/BST.

[0:00] <mozg> as I was working on several servers at a time, i didn't want to lose the cluster
[0:00] <mozg> i am planning to remove two in a week or so
[0:00] <mozg> wrencsok, do you think 5 is too many?
[0:01] * ksingh (~Adium@b-v6-0001.vpn.csc.fi) Quit (Ping timeout: 480 seconds)
[0:01] <mozg> mikedawson, thanks for pointing out the cause
[0:01] <wrencsok> not sure, we use 3.
[0:01] <mozg> i will try to start that mon again and see if things become happy
[0:01] <wrencsok> you want an odd number tho.
[0:02] <wrencsok> so that votes have a tie breaker
[0:02] <wrencsok> its usually not an issue unless some severe chaos happens to the cluster.
[0:02] <wrencsok> at least from my testing.
[0:03] <mozg> mikedawson, yeah, it seems that mon is causing the issue and after the start my quorum got broken again
[0:04] <mozg> it keeps calling for a new quorum
[0:04] <mozg> every 5 seconds as you've pointed out
[0:04] <mozg> this doesn't sound normal
[0:04] <wrencsok> i will say, i finally am getting my hardware changes pushed thru. can hit 10 15k spinning disks at 100% utilization simultaneously now :)
[0:04] <wrencsok> per node. across the entire cluster.
[0:05] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[0:07] <wrencsok> and it stays stable and my load no longer skyrockets to the lands of the absurd.
[0:07] <mozg> wrencsok, how many osds are in your cluster?
[0:08] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[0:08] <wrencsok> currently its being reduced on this particular cluster to 144 15k 3TB spinners.
[0:08] * dxd828 (~dxd828@host-2-97-72-213.as13285.net) Quit (Quit: Textual IRC Client: www.textualapp.com)
[0:09] <mozg> what is the model for the 15k 3tb spinners
[0:10] <wrencsok> assorted vendors. hitachi, seagate, western digital, 4 intel SSD's per box for os and journals. 12 storage nodes.
[0:10] <mozg> i thought they don't make 15k sas in 3TB sizes
[0:11] <mozg> wrencsok, very nice cluster you've got
[0:11] <mozg> what do you use it for? vms?
[0:12] * shang (~ShangWu@70.35.39.20) has joined #ceph
[0:12] <wrencsok> right now just remote block storage for some beta customers, we've got the object store up and running too.
[0:12] <wrencsok> remote block devices for vm's.
[0:12] <wrencsok> not hosting vm's, but that may change.
[0:13] <mozg> how many vms do you have that generate so much io?
[0:13] <mozg> if you don't mind me asking
[0:13] <mozg> i am a little curious
[0:13] * ksingh1 (~Adium@91-157-122-80.elisa-laajakaista.fi) Quit (Quit: Leaving.)
[0:13] * sarob (~sarob@nat-dip6.cfw-a-gci.corp.ne1.yahoo.com) Quit (Ping timeout: 480 seconds)
[0:13] <wrencsok> oh... that's me doing that type of io.
[0:13] <wrencsok> looking for the breaking points and bottle necks.
[0:13] <wrencsok> forcing recovery operations, in addition to customer usage.
[0:14] <wrencsok> highest throughput i have seen since the ops guys are pushing my hardware changes out, has been about 35GB/s and for a change we stay very stable now.
[0:15] <wrencsok> in about a week, i'll spin up 50 to 100 clients of assorted vm configurations and hammer it with customer workflows to further fine tune things.
[0:15] <wrencsok> and start getting a feel for when we need to scale out with additional nodes.
[0:15] <wrencsok> we only have a few hundred beta customers on it, atm.
[0:16] <wrencsok> who barely stress it.
[0:16] * sagelap (~sage@2600:1012:b01e:fc6a:18f7:88d2:cd17:da33) Quit (Read error: Connection reset by peer)
[0:17] <wrencsok> it helps having a separate cluster with very broken hardware to test things as well, like we have in our lab.
[0:18] <wrencsok> i call it the franken cluster. that one. is pretty damn fast now too, on very broken and wornout hardware. ceph is great once you get the hardware and kernels tuned right. even with bad stuff, imho.
[0:19] <Knorrie> if parts of a cluster are half-broken or slow etc, don't they slow down the entire system then?
[0:21] <wrencsok> they can, but not too much anymore.
[0:21] <wrencsok> during most failure scenarios i lose maybe 3% of our throughput per client.
[0:22] <wrencsok> which recovers when the updates from the failures take effect. in less than a few seconds.
[0:22] <mozg> wrencsok, nice!!!
[0:22] <wrencsok> you see little blips when we yank a drive. or kill a switch, but all less than a second.
[0:23] * sarob (~sarob@nat-dip6.cfw-a-gci.corp.ne1.yahoo.com) has joined #ceph
[0:23] <mozg> wrencsok, what network do you run it on to achieve 35GB/s speeds?
[0:23] <wrencsok> right now bonded 10GE
[0:24] <mozg> what is the max speed you get per client?
[0:24] <wrencsok> not sure on this one yet.
[0:24] <wrencsok> still waiting for the rest of my hardware changes to roll out. to get valid numbers.
[0:24] <mozg> did you run rados benchmarks that gave you 35GB speed?
[0:24] <wrencsok> in my franken cluster with rbd caching 550 to 600MB/s
[0:24] <wrencsok> per client
[0:25] <mozg> that's not bad
[0:25] <wrencsok> on bonded GE. it levels off to a sane saturation tho.
[0:25] <wrencsok> once i surpass the cache size.
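
The 550-600MB/s burst that levels off once the cache is exceeded is the behaviour of librbd's client-side cache. A minimal sketch of enabling it on a client, assuming a QEMU/librbd setup; the size shown is illustrative, not wrencsok's actual configuration:

# hypothetical client-side ceph.conf fragment; values are illustrative
cat >> /etc/ceph/ceph.conf <<'EOF'
[client]
    rbd cache = true
    rbd cache size = 67108864    # 64 MB of write-back cache per client
EOF
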
[0:25] <mozg> how much do you get from the spindles?
[0:25] <mozg> coz i don't get much if it's not coming from ram
[0:25] <wrencsok> i saturate them.
[0:25] <wrencsok> i pushed my bottlenecks to the disks.
[0:26] <wrencsok> they're my slow point.
[0:26] <wrencsok> as they should be.
[0:26] <mozg> i can get over 1GB/s per client when the servers have the data in ram
[0:26] <wrencsok> yeah. i can see that.
[0:26] <mozg> however, when reading from spindles i don't get much disk utilisation at all
[0:26] <wrencsok> i have seen that. tested it very thoroughly.
[0:26] <mozg> my speeds drop down to around 100-200MB/s max
[0:27] <mozg> and my storage servers are not utilised much
[0:27] <wrencsok> it depends on the load.
[0:27] <mozg> like being idle 50% at least
[0:27] <wrencsok> more readers and writers and data floating around. should ramp up your utilization.
[0:27] <mozg> io utilisation on osds are also not reaching even 60-70%
[0:28] <mozg> so, you reckon if i do this from 10+ clients i will get full utilisation?
[0:28] <wrencsok> ours barely do too. again our beta users have very tame workflows. mainly backups right now.
[0:29] <mozg> have you tested your cluster with sql io?
[0:29] <wrencsok> yes :(
[0:29] <mozg> like small block sizes?
[0:29] <wrencsok> and postgres
[0:29] <wrencsok> and a few other database workflows
[0:29] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[0:30] <wrencsok> not happy with that. working on that currently.
[0:30] <mozg> do you get good performance?
[0:30] <wrencsok> in every other workflow, but database types we rock.
[0:30] <mozg> what figures do you get?
[0:30] <wrencsok> in database types not so much. i can change that tho.
[0:30] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[0:32] <wrencsok> this is old data before i really started tweaking from a 1GE client.
[0:32] <wrencsok> http://67.225.231.70/test_results/merge-6526/composite.xml
[0:32] * shang (~ShangWu@70.35.39.20) Quit (Read error: Operation timed out)
[0:33] <wrencsok> we use that as a baseline, since we ran similar tests with different size vm's against rackspace, amazon, and vps providers.
[0:34] <wrencsok> that other data, i won't share. ;) it's embarrassing for our competition in every case save the db ones.
[0:34] <mozg> is this on bobtail?
[0:35] <wrencsok> that's all 3 versions, bobtail, cuttle, and dumpling. same hardware.
[0:35] <wrencsok> same vm
[0:35] <wrencsok> before i started tuning and re-architecting the hardware.
[0:36] <wrencsok> i use our smallest vm's for that baseline. you can tell the dev's really overuse flexible io for tuning perf. based on the progression.
[0:38] <wrencsok> we had horrendous memory fragmentation and management issues too, that got worse with each build. 3.2 kernel sucks, imho. i tuned most of it out then swapped our kernels for a 3.8. no more memory issues
[0:39] <mozg> nice thanks for sharing
[0:40] <mozg> i am on 3.11 now
[0:40] * thomnico (~thomnico@70.35.39.20) Quit (Quit: Ex-Chat)
[0:41] * ircolle (~Adium@2601:1:8380:2d9:4cf:f5a8:9108:e140) Quit (Quit: Leaving.)
[0:41] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[0:45] <wrencsok> eek
[0:45] <wrencsok> i wanted to go there. i'll wait til 3.12
[0:45] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[0:45] <wrencsok> we can't use a dev kernel in production.
[0:46] <wrencsok> i can't wait to retest. and also start spinning up some 10GE clients.
[0:47] <wrencsok> that won't be for a week or so tho. we're not done rolling all our hardware changes out to the cluster.
[0:47] <wrencsok> only done a few nodes.
[0:49] * sjm (~sjm@38.98.115.250) has left #ceph
[0:49] * rongze (~rongze@117.79.232.203) has joined #ceph
[0:54] * AfC1 (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[0:54] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) Quit (Read error: Connection reset by peer)
[0:54] * shang (~ShangWu@70.35.39.20) has joined #ceph
[0:58] * rongze (~rongze@117.79.232.203) Quit (Ping timeout: 480 seconds)
[0:59] * gsaxena (~gsaxena@pool-108-56-185-167.washdc.fios.verizon.net) has joined #ceph
[1:01] * AfC1 is now known as AfC
[1:01] <wrencsok> fwiw, i took that site down if folks try to click on the link. once we get done, i'll share the new numbers for different size vms, os's, and such. have a good night ceph channel.
[1:04] * thomnico (~thomnico@70.35.39.20) Quit (Read error: Operation timed out)
[1:08] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[1:19] * sagelap (~sage@12.130.118.19) has joined #ceph
[1:20] * shang (~ShangWu@70.35.39.20) Quit (Ping timeout: 480 seconds)
[1:21] * mschiff (~mschiff@85.182.236.82) Quit (Ping timeout: 480 seconds)
[1:24] * nwat (~nwat@eduroam-225-58.ucsc.edu) Quit (Ping timeout: 480 seconds)
[1:25] * danieagle (~Daniel@179.176.57.59.dynamic.adsl.gvt.net.br) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[1:34] * shang (~ShangWu@38.126.120.10) has joined #ceph
[1:37] * Steki (~steki@198.199.65.141) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:37] * nwat (~nwat@eduroam-225-58.ucsc.edu) has joined #ceph
[1:39] * themgt_ (~themgt@181.72.252.96) has joined #ceph
[1:40] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[1:41] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) Quit (Ping timeout: 480 seconds)
[1:41] * themgt (~themgt@201-223-204-108.baf.movistar.cl) Quit (Ping timeout: 480 seconds)
[1:41] * themgt_ is now known as themgt
[1:47] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[1:47] * sarob (~sarob@nat-dip6.cfw-a-gci.corp.ne1.yahoo.com) Quit (Remote host closed the connection)
[1:47] * sarob (~sarob@nat-dip6.cfw-a-gci.corp.ne1.yahoo.com) has joined #ceph
[1:50] * nwat (~nwat@eduroam-225-58.ucsc.edu) Quit (Quit: leaving)
[1:53] * JoeGruher (~JoeGruher@jfdmzpr02-ext.jf.intel.com) has joined #ceph
[2:04] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[2:06] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[2:06] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[2:10] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[2:11] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[2:11] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[2:17] * Cube (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[2:19] * The_Bishop (~bishop@2a02:2450:102f:4:d10e:5b68:fd24:ff14) Quit (Ping timeout: 480 seconds)
[2:20] * The_Bishop (~bishop@2001:470:50b6:0:d10e:5b68:fd24:ff14) has joined #ceph
[2:23] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:25] * sjustlaptop (~sam@172.56.9.102) Quit (Quit: Leaving.)
[2:30] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit (Ping timeout: 480 seconds)
[2:32] * angdraug (~angdraug@64-79-127-122.static.wiline.com) Quit (Quit: Leaving)
[2:35] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) has joined #ceph
[2:36] * BillK (~BillK-OFT@106-68-202-154.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[2:37] * shang (~ShangWu@38.126.120.10) Quit (Ping timeout: 480 seconds)
[2:38] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) has joined #ceph
[2:38] * glanzi (~glanzi@201.75.202.207) Quit (Quit: glanzi)
[2:39] * BillK (~BillK-OFT@106-68-144-144.dyn.iinet.net.au) has joined #ceph
[2:40] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) has joined #ceph
[2:44] * sarob (~sarob@nat-dip6.cfw-a-gci.corp.ne1.yahoo.com) Quit (Ping timeout: 480 seconds)
[2:45] * shang (~ShangWu@70.35.39.20) has joined #ceph
[2:47] * BillK (~BillK-OFT@106-68-144-144.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[2:49] * BillK (~BillK-OFT@106-69-76-247.dyn.iinet.net.au) has joined #ceph
[2:54] * yy-nm (~Thunderbi@122.224.154.38) has joined #ceph
[2:59] * freedomhui (~freedomhu@117.79.232.235) has joined #ceph
[3:01] * The_Bishop (~bishop@2001:470:50b6:0:d10e:5b68:fd24:ff14) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[3:02] * LeaChim (~LeaChim@host86-162-2-255.range86-162.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[3:05] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) Quit (Quit: smiley)
[3:07] * JoeGruher (~JoeGruher@jfdmzpr02-ext.jf.intel.com) Quit (Remote host closed the connection)
[3:08] * Cube (~Cube@66-87-66-203.pools.spcsdns.net) has joined #ceph
[3:12] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[3:13] * haomaiwang (~haomaiwan@119.4.172.74) has joined #ceph
[3:15] * huangjun (~kvirc@111.172.153.78) has joined #ceph
[3:17] * freedomhui (~freedomhu@117.79.232.235) Quit (Quit: Leaving...)
[3:21] * haomaiwang (~haomaiwan@119.4.172.74) Quit (Ping timeout: 480 seconds)
[3:22] * thomnico (~thomnico@70.35.39.20) Quit (Ping timeout: 480 seconds)
[3:22] * shang (~ShangWu@70.35.39.20) Quit (Ping timeout: 480 seconds)
[3:22] * BillK (~BillK-OFT@106-69-76-247.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[3:23] * BillK (~BillK-OFT@58-7-51-2.dyn.iinet.net.au) has joined #ceph
[3:26] * Pedras (~Adium@64.191.206.83) Quit (Quit: Leaving.)
[3:26] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Quit: shimo)
[3:27] * yehudasa__ (~yehudasa@2607:f298:a:607:ea03:9aff:fe98:e8ff) Quit (Ping timeout: 480 seconds)
[3:28] * freedomhui (~freedomhu@117.79.232.204) has joined #ceph
[3:29] * mozg (~andrei@host86-184-120-113.range86-184.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[3:34] * bandrus (~Adium@c-98-238-148-252.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[3:36] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) has joined #ceph
[3:38] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) Quit ()
[3:47] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) has joined #ceph
[3:47] * BillK (~BillK-OFT@58-7-51-2.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[3:49] * freedomhui (~freedomhu@117.79.232.204) Quit (Quit: Leaving...)
[3:49] * BillK (~BillK-OFT@106-69-88-100.dyn.iinet.net.au) has joined #ceph
[3:50] * rongze (~rongze@117.79.232.204) has joined #ceph
[3:52] * huangjun (~kvirc@111.172.153.78) Quit (Read error: Connection reset by peer)
[3:52] * huangjun (~kvirc@111.172.153.78) has joined #ceph
[3:54] * themgt (~themgt@181.72.252.96) Quit (Quit: Pogoapp - http://www.pogoapp.com)
[3:58] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) has joined #ceph
[4:01] <huangjun> hello
[4:02] * freedomhui (~freedomhu@117.79.232.204) has joined #ceph
[4:03] <huangjun> a problem while running make check
[4:03] * freedomhui (~freedomhu@117.79.232.204) Quit ()
[4:04] <huangjun> CCLD unittest_arch
[4:04] <huangjun> ./.libs/libglobal.a(probe.o):(.data.DW.ref.__gxx_personality_v0[DW.ref.__gxx_personality_v0]+0x0): undefined reference to `__gxx_personality_v0'
[4:04] <huangjun> collect2: ld returned 1 exit status
[4:04] <huangjun> make[4]: *** [unittest_arch] Error 1
[4:12] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[4:13] * BillK (~BillK-OFT@106-69-88-100.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[4:13] * Cube (~Cube@66-87-66-203.pools.spcsdns.net) Quit (Ping timeout: 480 seconds)
[4:14] * BillK (~BillK-OFT@124-169-105-117.dyn.iinet.net.au) has joined #ceph
[4:22] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[4:22] * davidz (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (Quit: Leaving.)
[4:22] * sagelap (~sage@12.130.118.19) Quit (Remote host closed the connection)
[4:25] * rongze (~rongze@117.79.232.204) Quit (Remote host closed the connection)
[4:25] * rongze (~rongze@106.120.176.84) has joined #ceph
[4:26] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[4:26] * rongze (~rongze@106.120.176.84) Quit (Read error: Connection reset by peer)
[4:26] * rongze (~rongze@117.79.232.204) has joined #ceph
[4:36] * haomaiwang (~haomaiwan@183.220.21.115) has joined #ceph
[4:36] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[4:51] <pmatulis_> is it possible to create a format 2 rbd image with qemu-img? if so, what is the syntax?
[4:54] <joshd1> pmatulis_: https://www.redhat.com/archives/libvirt-users/2013-October/msg00095.html
[4:55] <joshd1> huangjun: you may need to do a make clean; make distclean; to clear out old build products due to recent makefile changes
[4:56] <huangjun> joshd1: thanks, i will try it now
[4:58] * rongze (~rongze@117.79.232.204) Quit (Remote host closed the connection)
[4:59] <pmatulis_> joshd1: wow ok!
[4:59] <pmatulis_> my google failed me
[4:59] * rongze (~rongze@117.79.232.204) has joined #ceph
[4:59] <mikedawson> joshd1: is there a way to force osd journals on to ssd partitions in teuthology?
[5:00] <pmatulis_> is there any reason to *not* use format 2?
[5:01] <lurbs> RBD drivers in older kernels don't support it.
[5:02] <pmatulis_> besides that
[5:02] <joshd1> mikedawson: I think you can specify particular journal devices - check out task/ceph.py
[5:03] <joshd1> pmatulis_: no other reasons
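
For readers without access to that mailing-list post: one way this is commonly done (a hedged sketch, not necessarily the exact syntax joshd1 linked) is to make librbd default to format 2 via the client configuration and then let qemu-img create the image as usual. The pool and image names are placeholders, and this assumes a librbd recent enough to honour 'rbd default format'.

cat >> /etc/ceph/ceph.conf <<'EOF'
[client]
    rbd default format = 2
EOF
qemu-img create -f rbd rbd:rbd/myimage 10G
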
[5:05] * rongze_ (~rongze@211.155.113.241) has joined #ceph
[5:05] * fireD_ (~fireD@93-139-135-235.adsl.net.t-com.hr) has joined #ceph
[5:07] * rongze (~rongze@117.79.232.204) Quit (Ping timeout: 480 seconds)
[5:07] * fireD (~fireD@93-139-180-195.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:08] * freedomhui (~freedomhu@117.79.232.204) has joined #ceph
[5:11] * BillK (~BillK-OFT@124-169-105-117.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[5:13] * BillK (~BillK-OFT@124-169-89-28.dyn.iinet.net.au) has joined #ceph
[5:13] * The_Bishop (~bishop@2001:470:50b6:0:5d8:43e2:642c:5286) has joined #ceph
[5:16] <huangjun> joshd1: unfortunately, it failed again
[5:17] <huangjun> with the same reason
[5:17] <huangjun> ./.libs/libglobal.a(probe.o):(.data.DW.ref.__gxx_personality_v0[DW.ref.__gxx_personality_v0]+0x0): undefined reference to `__gxx_personality_v0'
[5:19] <huangjun> and if i rename the test_arch.c to test_arch.cc, this error disappears, but other tests fail
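
__gxx_personality_v0 is the exception-handling personality routine from the C++ runtime, so the error means an object built by the C++ compiler is being linked by the plain C driver. That is consistent with huangjun's observation: renaming test_arch.c to test_arch.cc makes automake link with g++. A hedged illustration of the failure mode, with made-up file names standing in for Ceph's real build:

g++ -c helper.cc -o helper.o                 # hypothetical C++ object, standing in for probe.o
gcc -c test_arch.c -o test_arch.o
gcc test_arch.o helper.o -o unittest_arch    # fails: undefined reference to __gxx_personality_v0
g++ test_arch.o helper.o -o unittest_arch    # works: g++ pulls in the C++ runtime
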
[5:25] * glowell1 (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[5:25] * glowell1 (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit ()
[5:32] * sagelap (~sage@2600:1001:b100:2f2:18f7:88d2:cd17:da33) has joined #ceph
[5:35] <huangjun> and make check-local passed
[5:37] * julian (~julianwa@125.70.133.130) has joined #ceph
[5:42] * janos (~janos@static-71-176-211-4.rcmdva.fios.verizon.net) has left #ceph
[5:45] * freedomhui (~freedomhu@117.79.232.204) Quit (Quit: Leaving...)
[5:49] * yy-nm (~Thunderbi@122.224.154.38) Quit (Quit: yy-nm)
[5:58] <huangjun> [ FAILED ] 1 test, listed below:
[5:58] <huangjun> [ FAILED ] BufferList.is_page_aligned
[5:58] <huangjun>
[5:58] <huangjun> 1 FAILED TEST
[5:58] <huangjun> FAIL: unittest_bufferlist
[5:58] * sagelap (~sage@2600:1001:b100:2f2:18f7:88d2:cd17:da33) Quit (Read error: Connection reset by peer)
[5:59] <huangjun> is this related to the unittest_arch
[6:11] * JoeGruher (~JoeGruher@134.134.137.71) has joined #ceph
[6:19] * sagelap (~sage@2600:1001:b100:2f2:18f7:88d2:cd17:da33) has joined #ceph
[6:20] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) has joined #ceph
[6:25] * JoeGruher (~JoeGruher@134.134.137.71) Quit (Remote host closed the connection)
[6:27] * Pedras (~Adium@c-24-130-196-123.hsd1.ca.comcast.net) has joined #ceph
[6:27] * sleinen (~Adium@2001:620:0:25:50a6:a3e2:bf8f:f4da) has joined #ceph
[6:29] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[6:30] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) Quit (Quit: Leaving.)
[9:00] -magnet.oftc.net- *** Looking up your hostname...
[9:00] -magnet.oftc.net- *** Checking Ident
[9:00] -magnet.oftc.net- *** Couldn't look up your hostname
[9:00] -magnet.oftc.net- *** No Ident response
[9:00] * CephLogBot (~PircBot@92.63.168.213) has joined #ceph
[9:00] * Topic is 'Latest stable (v0.67.4 "Dumpling" or v0.61.8 "Cuttlefish") -- http://ceph.com/get || CDS Vids and IRC logs posted http://ceph.com/cds/ || New dev channel #ceph-devel'
[9:00] * Set by dmick!~dmick@2607:f298:a:607:fda5:f05e:4bd1:9153 on Wed Oct 09 03:04:48 CEST 2013
[9:02] * ksingh (~Adium@hermes1-231.csc.fi) has joined #ceph
[9:03] * ksingh (~Adium@hermes1-231.csc.fi) Quit ()
[9:03] * JustEra (~JustEra@ALille-555-1-116-86.w90-7.abo.wanadoo.fr) Quit (Quit: This computer has gone to sleep)
[9:04] * ksingh1 (~Adium@2001:708:10:10:a46e:da5d:29ac:7cce) Quit (Ping timeout: 480 seconds)
[9:12] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:12] * aarontc (~aaron@static-50-126-79-226.hlbo.or.frontiernet.net) Quit (Quit: Bye...)
[9:14] * mschiff (~mschiff@p4FD7E6B7.dip0.t-ipconnect.de) has joined #ceph
[9:18] * JustEra (~JustEra@89.234.148.11) has joined #ceph
[9:19] * aarontc (~aaron@static-50-126-79-226.hlbo.or.frontiernet.net) has joined #ceph
[9:19] * JustEra (~JustEra@89.234.148.11) Quit ()
[9:22] * mschiff_ (~mschiff@tmo-108-227.customers.d1-online.com) has joined #ceph
[9:26] * mschiff (~mschiff@p4FD7E6B7.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[9:26] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) has joined #ceph
[9:27] * mattt_ (~textual@94.236.7.190) has joined #ceph
[9:30] * RuediR1 (~Adium@2001:620:0:2d:cae0:ebff:fe18:5325) has joined #ceph
[9:33] * RuediR (~Adium@macrr.switch.ch) Quit (Ping timeout: 480 seconds)
[9:39] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[9:39] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:44] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[9:44] * ChanServ sets mode +v andreask
[9:44] * RuediR1 (~Adium@2001:620:0:2d:cae0:ebff:fe18:5325) Quit (Quit: Leaving.)
[9:46] * RuediR (~Adium@130.59.94.164) has joined #ceph
[9:47] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[9:48] * renzhi (~renzhi@116.226.38.214) has joined #ceph
[10:05] * sleinen (~Adium@2001:620:610:f0e:10ef:72f0:e600:3940) has joined #ceph
[10:06] * ismell_ (~ismell@host-64-17-89-79.beyondbb.com) Quit (Ping timeout: 480 seconds)
[10:10] * ksingh (~Adium@2001:708:10:91:c186:9010:9db9:df57) has joined #ceph
[10:10] * ksingh1 (~Adium@teeri.csc.fi) has joined #ceph
[10:13] * sleinen (~Adium@2001:620:610:f0e:10ef:72f0:e600:3940) Quit (Ping timeout: 480 seconds)
[10:16] * jcfischer (~fischer@macjcf.switch.ch) has joined #ceph
[10:17] * sleinen (~Adium@2001:620:0:26:41ee:9d74:1e68:c4bc) has joined #ceph
[10:18] * julian (~julianwa@125.70.133.130) Quit (Read error: Connection reset by peer)
[10:18] <jcfischer> I have installed a new ceph cluster (dumpling 67.4) with ceph-deploy and can create pools etc (rados mkpool, rados lspool). When I try to do "rados --pool foo ls", I get numerous connect claims to be [2001:620:0:6::11f]:6804/1016455 not [2001:620:0:6::11f]:6804/17552 - wrong node! errors
[10:18] * ksingh (~Adium@2001:708:10:91:c186:9010:9db9:df57) Quit (Ping timeout: 480 seconds)
[10:23] <huangjun> are you using ipv6?
[10:24] <yanzheng> 2001:620:0:6::11f is ipv6 address
[10:25] * ismell (~ismell@host-64-17-89-79.beyondbb.com) has joined #ceph
[10:28] <jcfischer> yes
[10:31] <jcfischer> I will try to rebuild the cluster (its just a test one) - I seem to have messed up ceph.conf anyway
[10:32] * rongze_ (~rongze@117.79.232.220) has joined #ceph
[10:33] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[10:35] * jtlebigot (~jlebigot@proxy.ovh.net) has joined #ceph
[10:35] <jcfischer> woah - I try to kill the different osd processes, but they respawn immediately - what process is controlling this? (service ceph stop doesn't seem to work)
[10:36] * ismell (~ismell@host-64-17-89-79.beyondbb.com) Quit (Read error: Operation timed out)
[10:37] * rongze (~rongze@211.155.113.241) Quit (Read error: Operation timed out)
[10:37] <Gdub> hi everyone
[10:37] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Read error: Connection reset by peer)
[10:38] <Gdub> i succeeded in creating a cluster with rbd and iscsi
[10:38] <Gdub> but i put all my server downs
[10:38] <Gdub> and restarted them
[10:38] <Gdub> i had lots of errors
[10:38] <Gdub> PG degraded
[10:38] <Gdub> and so on
[10:39] <Gdub> well, in any case, i decided to start over. So i purged the data and "flushed" the keys
[10:39] <huangjun> jcfischer: on ubuntu, you should use stop ceph-all
[10:39] <Gdub> i can create the new cluster but i can't create the mon anymore
[10:39] <jcfischer> huangjun: thanks, I got it nuked - will reinstall a new one
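
The respawning jcfischer saw is upstart restarting daemons that are killed directly; stopping the upstart jobs avoids it. A hedged sketch using the standard ceph upstart job names of that era; the id values are placeholders:

sudo stop ceph-all                     # every ceph daemon on this node
sudo stop ceph-osd-all                 # only the OSDs
sudo stop ceph-osd id=12               # a single OSD
sudo stop ceph-mon id=$(hostname -s)   # a single monitor
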
[10:40] * LeaChim (~LeaChim@host86-162-2-255.range86-162.btcentralplus.com) has joined #ceph
[10:40] <Gdub> http://pastebin.com/SGQ02kBj
[10:41] <foosinn> i might have a dumb question: how do i handle a failed disk in a ceph cluster? can i simply stop the osd, replace the disk and start it again?
[10:42] <foosinn> the documentation somehow seems to not cover this: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
[10:45] <huangjun> foosinn: after you replace the disk, you should use ceph-deploy to add this disk to the cluster
[10:45] <huangjun> as an osd
[10:46] <foosinn> as a new osd again?
[10:46] * jbd_ (~jbd_@2001:41d0:52:a00::77) has joined #ceph
[10:46] <foosinn> so i don't fix a osd, i simply create a new one?
[10:46] <yanzheng> yes
[10:46] <huangjun> yes
[10:47] <foosinn> also i guess i have to delete the failed one? :)
[10:47] <huangjun> thanks, zheng, for redirecting my bug tracker
[10:48] <huangjun> foosinn: you can delete it by ceph osd rm osd.N, and if you don't, it will be marked out after 5 min, then the cluster will recover the failed disk's data.
[10:49] * ksingh1 (~Adium@teeri.csc.fi) Quit (Quit: Leaving.)
[10:51] <foosinn> huangjun, i assume i have to delete it so i have a clean 'ceph osd stat' again?
[10:53] <huangjun> and you should remove the failed osd in your config file
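
Putting the advice above together: a hedged sketch of the remove-and-re-add flow for a failed disk, using the standard commands rather than anything quoted verbatim in the channel; osd.12, node3 and /dev/sdX are placeholders.

ceph osd out osd.12                 # if it was not already marked out
stop ceph-osd id=12                 # Ubuntu/upstart; use your init system's equivalent
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm osd.12
# after physically replacing the disk, add it back as a brand-new OSD
ceph-deploy disk zap node3:/dev/sdX
ceph-deploy osd create node3:/dev/sdX
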
[10:54] * ksingh (~Adium@teeri.csc.fi) has joined #ceph
[10:56] <jcfischer> huangjun: so I have a new cluster with just one mon and still have the same symptom: 2001:620:0:6::11c]:0/1001972 >> [2001:620:0:6::11f]:6800/12421 pipe(0x7f8d4adf1a40 sd=4 :37282 s=1 pgs=0 cs=0 l=1 c=0x7f8d4adf1ca0).connect claims to be [2001:620:0:6::11f]:6800/1012890 not [2001:620:0:6::11f]:6800/12421 - wrong node!
[10:57] <jcfischer> which is interesting, because the mon actually runs on ::11e - not ::11f
[10:58] <huangjun> and what's in your config file
[10:58] <huangjun> i didn't test ipv6 before
[10:59] <jcfischer> we have a running ceph cluster on ipv6 (with a hand crafted ceph.conf)
[10:59] <wido> jcfischer: using bonding?
[10:59] <jcfischer> wido: of course
[10:59] <wido> and using Router Advertisements
[10:59] <wido> yes, so, that is causing the issues. Since the IP address is generated based on the MAC
[11:00] <jcfischer> no idea
[11:00] <wido> when the machine changes NIC the MAC address changes
[11:00] <foosinn> thanks for you help
[11:00] <wido> and thus the IP address changes
[11:00] <jcfischer> hmm - those are freshly installed boxes
[11:00] <ksingh> dear ceph experts please help me out , http://pastebin.com/ucWBjFY9 , unable to add monitors from 1 to 3
[11:01] <jcfischer> wido: our resident network guru sleinen is out of the office today - any idea on how to fix this?
[11:01] <wido> jcfischer: Assign a static IPv6 address to the nodes
[11:01] <huangjun> ksingh: have you installed ceph on the host you deployed to ?
[11:01] * jcfischer scrambles off to find out how to do that
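
For reference, pinning a static IPv6 address on a Debian/Ubuntu bonded interface looks roughly like the fragment below; the interface name and addresses are placeholders (documentation prefix), not this cluster's real values.

cat >> /etc/network/interfaces <<'EOF'
iface bond0 inet6 static
    address 2001:db8:0:6::11e
    netmask 64
    gateway 2001:db8:0:6::1
EOF
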
[11:02] <ksingh> huangjun: yes i followed each step of documentation of ceph and installed using ceph-deploy install ( monitor node name ) command
[11:03] <huangjun> ksingh: does ceph -w work?
[11:03] * yy-nm (~Thunderbi@122.224.154.38) Quit (Quit: yy-nm)
[11:04] <ksingh> yes it works from 1st monitor node
[11:05] <ksingh> i have 1 admin node ( admin-node ) , 1 monitor node ( thats working ) now want to add 2 more monitor nodes
[11:06] * ksingh (~Adium@teeri.csc.fi) Quit (Remote host closed the connection)
[11:06] * ksingh (~Adium@2001:708:10:10:15fd:1559:ae6d:d64e) has joined #ceph
[11:06] <huangjun> adding a mon is buggy for me
[11:07] <Gdub> same for me ^^ (http://pastebin.com/SGQ02kBj) :)
[11:07] <ksingh> Gdub : how many running monitors currently you have
[11:07] <huangjun> we just have one
[11:07] <Gdub> none
[11:08] <Gdub> am trying to create one
[11:08] <huangjun> if we want to use 3, we will build it on the startup
[11:08] <Gdub> and i had a workin cluster with everything but decided to start all over
[11:08] <ksingh> huangjun: i also have 1 mon running , now scaling to 3 is a problem
[11:08] <Gdub> and now i can't create mon, neither gather keys
[11:08] <ksingh> there should be some solution to this
[11:09] <ksingh> huangjun: are u using ceph-deploy or manual metho
[11:09] <ksingh> *method
[11:10] <huangjun> ceph-deploy
[11:11] <ksingh> ok so do you have a separate nodes for ceph-deploy and MON and OSD
[11:11] <ksingh> ?
[11:12] <huangjun> yes
[11:12] <ksingh> alright , do you have ceph packages installed on ceph-deploy node ?
[11:13] <huangjun> i think you can try to add the new mon addr to the ceph.conf mon_initial_members and mon_host
[11:13] <huangjun> yes
[11:14] <huangjun> all nodes you want to use as mon or osd should have ceph installed
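
The two ceph.conf fields huangjun mentioned above look roughly like this; the host names and addresses are placeholders, not taken from ksingh's setup.

cat >> /etc/ceph/ceph.conf <<'EOF'
[global]
    mon initial members = ceph-mon1, ceph-mon2, ceph-mon3
    mon host = 192.168.1.28, 192.168.1.29, 192.168.1.30
EOF
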
[11:14] <ksingh> so are u able to run ceph status from your ceph-deploy node , is the command getting completed
[11:14] <ksingh> ?
[11:15] * allsystemsarego (~allsystem@188.27.166.164) has joined #ceph
[11:15] <huangjun> ceph status will look for the mon in ceph.conf
[11:15] <huangjun> so if you have a mon running, it will work
[11:16] <ksingh> yes , what about your case, is it working? i have the same setup and for me it's not working
[11:22] <ksingh> ?
[11:22] * ksingh (~Adium@2001:708:10:10:15fd:1559:ae6d:d64e) Quit (Quit: Leaving.)
[11:23] * ksingh (~Adium@2001:708:10:10:15fd:1559:ae6d:d64e) has joined #ceph
[11:24] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[11:24] <sleinen> wido: We do use static IPv6 address assignment (DHCPv6 actually). But thanks for the hint!
[11:25] <wido> sleinen: Yes, it's something very annoying
[11:25] <wido> So for now, always use static IPv6 with Ceph when using bonding
[11:25] <ksingh> huangjun : is the command ceph status working for you from ceph-deploy node , if yes can you share your ceph.conf on pastebin
[11:27] <huangjun> you can try the ceph -m MONaddr status
[11:27] <huangjun> if this works for you, the deploy node is not your mon node?
[11:28] <ksingh> nope, this doesn't work from the ceph-deploy node , but works from the ceph-monitor node
[11:29] <ksingh> my question is ceph status should work both from deploy node as well as monitor nodes
[11:29] <jcfischer> wido, sleinen: that still doesn't solve my problem though - I am wondering why ceph tries to talk to an address (::11f) where no mon is running in the first place. I have amended /etc/ceph.conf similar to our production cluster with a specific entry for the one mon, specifying its address
[11:29] <ksingh> what change should i do in ceph.conf to establish this relation / connection
[11:30] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[11:31] * csaesumu (~csaesumu@inf-205-218.inf.um.es) has joined #ceph
[11:31] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) has joined #ceph
[11:31] <csaesumu> Hello there
[11:31] <Gdub> since i have problem with ceph-deploy mon create, i am trying to do manual install of one mon
[11:31] <Gdub> and yet i have the following error
[11:32] <Gdub> http://pastebin.com/ZtLNEaXh
[11:32] <Gdub> where could that come from ?
[11:32] <Gdub> thx
[11:33] * sleinen (~Adium@2001:620:0:26:41ee:9d74:1e68:c4bc) Quit (Ping timeout: 480 seconds)
[11:33] <ksingh> hello csaesumu , do you have some idea about my question above
[11:34] <ksingh> Gdub , can you past complete output
[11:34] <csaesumu> I have a small question. How are object identifiers assigned? Is it a hash function on the pathname? Because I have seen CRUSH gets an oid as input, but couldn't figure out how that oid was assigned
[11:34] <ksingh> no idea at least to me
[11:35] <csaesumu> Oh ksingh I hope I could but I am a pretty new ceph's user
[11:35] <ksingh> hello huangjun : where are u dear , need your help
[11:37] <jcfischer> I'm purging everything, I think I had some leftovers and starting fresh after lunch
[11:38] * glanzi (~glanzi@201.75.202.207) has joined #ceph
[11:38] * glanzi (~glanzi@201.75.202.207) Quit ()
[11:39] <ksingh> jcfischer : did you get your monitors running
[11:39] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[11:40] * sleinen (~Adium@2001:620:0:26:68bc:fa1c:f87a:d344) has joined #ceph
[11:42] <huangjun> ksingh: sorry,
[11:42] * sleinen2 (~Adium@2001:620:0:26:e178:1534:94aa:4e3b) has joined #ceph
[11:42] <ksingh> for what dear
[11:43] <Gdub> ksingh: i pasted the complete log :)
[11:44] <ksingh> Gdub link pls , i will try but not sure i knew this
[11:45] <Gdub> http://pastebin.com/ZtLNEaXh
[11:48] * sleinen (~Adium@2001:620:0:26:68bc:fa1c:f87a:d344) Quit (Ping timeout: 480 seconds)
[11:48] * sleinen (~Adium@2001:620:610:f0e:78d7:4133:52b7:5d39) has joined #ceph
[11:49] * sleinen1 (~Adium@eduforum-149-183.unil.ch) has joined #ceph
[11:49] * sleinen1 (~Adium@eduforum-149-183.unil.ch) Quit (Quit: Leaving.)
[11:49] * sleinen (~Adium@2001:620:610:f0e:78d7:4133:52b7:5d39) has left #ceph
[11:50] * sleinen2 (~Adium@2001:620:0:26:e178:1534:94aa:4e3b) Quit (Ping timeout: 480 seconds)
[11:51] * glanzi (~glanzi@201.75.202.207) has joined #ceph
[11:53] * mschiff (~mschiff@tmo-108-224.customers.d1-online.com) has joined #ceph
[11:53] * mschiff_ (~mschiff@tmo-108-227.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[11:54] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) Quit (Remote host closed the connection)
[11:55] * ScOut3R_ (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[11:55] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) Quit (Read error: Connection reset by peer)
[11:59] <ksingh> Gdub : this link still does not show the full output , including the command that you executed till the end
[11:59] <ksingh> well what are u trying to achieve
[11:59] <Gdub> hmm
[11:59] <Gdub> i had a working cluster
[11:59] <Gdub> but then shutdown my 3 servers
[11:59] <Gdub> when it got back
[11:59] <ksingh> details pls , like how many nodes
[12:00] <Gdub> 3 nodes
[12:00] <Gdub> 3 mon
[12:00] <Gdub> so when it got back, everything was degraded
[12:00] <Gdub> so i decided to purge
[12:00] <Gdub> and redo the setup
[12:00] <Gdub> from the ceph-deploy docu
[12:01] <Gdub> i can create my cluster
[12:01] <Gdub> but when its about creating the mon
[12:01] <Gdub> its failing
[12:01] <Gdub> http://pastebin.com/9hjkppVY
[12:02] <Gdub> so since the cephdeploy did not work, i wanted to test the manual install
[12:02] <Gdub> manual creation of mon sorry
[12:02] <Gdub> but failing
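
For context, the manual single-monitor bootstrap Gdub is attempting normally goes roughly like this (a hedged outline of the documented procedure, not the commands he actually ran; the host name, IP and paths are placeholders):

ceph-authtool --create-keyring /tmp/ceph.mon.keyring \
    --gen-key -n mon. --cap mon 'allow *'
ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring \
    --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
ceph-authtool /tmp/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
monmaptool --create --add ceph-node1 192.168.1.10 --fsid $(uuidgen) /tmp/monmap
mkdir -p /var/lib/ceph/mon/ceph-ceph-node1
ceph-mon --mkfs -i ceph-node1 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
start ceph-mon id=ceph-node1        # Ubuntu/upstart
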
[12:05] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[12:06] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit ()
[12:16] <AndreyGrebennikov> hello there people! Is it possible to set up ceph engine so that nodes could communicate through ip addresses instead of names?
[12:16] <aarontc> AndreyGrebennikov: I believe that is the default behavior
[12:17] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[12:17] <AndreyGrebennikov> aarontc, hm.. as I remember we should use hostnames in ceph.conf by default...
[12:17] <AndreyGrebennikov> aarontc, no?
[12:17] <aarontc> That is just so the nodes can identify themselves, i.e. the init script will know which mon to start when it runs on a specific machine
[12:19] <aarontc> Your ceph.conf should also have the IP address specified for each mon
[12:23] <ksingh> Gdub : did ceph-deploy install worked fine before this step
[12:26] <Gdub> yep
[12:26] <Gdub> went perfect
[12:26] <Gdub> http://pastebin.com/7TqEDTfh
[12:30] * mschiff_ (~mschiff@p4FD7E6B7.dip0.t-ipconnect.de) has joined #ceph
[12:32] * mschiff_ (~mschiff@p4FD7E6B7.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[12:35] <huangjun> ksingh: do you run ceph status on the mon node normally?
[12:36] <huangjun> if it works, then i recommend you rebuild the cluster, bc adding the new mon failed.
[12:36] * mschiff (~mschiff@tmo-108-224.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[12:36] <ksingh> yes, from the mon node ceph status is working fine , my next short goal is to run ceph status from the deploy node , pls help me with that
[12:36] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[12:38] <huangjun> now, backup your ceph.conf on deploy node, and then scp it *from* the mon node
[12:38] <huangjun> and then run ceph status
[12:39] * i_m (~ivan.miro@deibp9eh1--blueice4n1.emea.ibm.com) has joined #ceph
[12:40] <ksingh> you mean copy the ceph.conf file from mon node -> deploy node
[12:40] <ksingh> Gdub : i know u have done it correctly , check this http://switzernet.com/3/public/130925-ceph-cluster/ if you get something
[12:41] <Gdub> ok thank you ksingh, checking that out now
[12:42] <huangjun> ksingh: yes
[12:43] <huangjun> we should first locate the problem
[12:43] <ksingh> Huangjun : i did that and tried ceph status
[12:43] <ksingh> got this
[12:43] <ksingh> [root@ceph-admin ceph]# ceph status
[12:43] <ksingh> 2013-10-23 13:42:14.525231 7fdf02dd4700 0 -- :/26319 >> 192.168.1.28:6789/0 pipe(0x1cfa510 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
[12:43] <ksingh> 2013-10-23 13:42:17.525720 7fdefc4bc700 0 -- :/26319 >> 192.168.1.28:6789/0 pipe(0x7fdef4000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
[12:43] <ksingh> 2013-10-23 13:42:20.526052 7fdf02dd4700 0 -- :/26319 >> 192.168.1.28:6789/0 pipe(0x7fdef4003010 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
[12:43] <ksingh> one interesting thing , i cannot get service status on ceph-deploy node
[12:44] <ksingh> [root@ceph-admin ceph]# service ceph status
[12:44] <ksingh> [root@ceph-admin ceph]# service ceph start
[12:44] <ksingh> [root@ceph-admin ceph]#
[12:44] <huangjun> bc you can not connect to the mon
[12:44] <ksingh> ceph packages are installed on ceph-admin ( my ceph-deploy node ) but services are not running
[12:44] <ksingh> Alright , understand
[12:44] <huangjun> 192.168.1.28 is your mon ip?
[12:44] <ksingh> yes
[12:45] <ksingh> my working mon
[12:45] <ksingh> so how to locate problem now :/
[12:47] <huangjun> can you ping 192.168.1.28 works?
[12:49] <huangjun> and do you enabled authx?
[12:52] <ksingh> yes ping works , authx also added
[12:53] <ksingh> now i destroyed entire cluster coz i was getting several problems , lets see how it works on recreating
[12:53] <ksingh> one advice i need from you , i am using ceph-deploy to create cluster from my ceph-admin node
[12:54] <ksingh> so there will be a clustername.conf file gets created on current working directory of ceph-admin node
[12:54] <ksingh> do i need to push this file to other nodes , or the cluster deployment takes care of this
[12:54] <huangjun> you can specify the --overwrite-conf
[12:55] <huangjun> then it will overwrite the conf on deploy dir to remote or local /etc/ceph
[12:55] <huangjun> and i just saw http://switzernet.com/3/public/130925-ceph-cluster/, it's nice
[12:56] <huangjun> if you follow the steps, it will work
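
A hedged usage sketch of the flag huangjun mentions, with placeholder node names; --overwrite-conf is a global ceph-deploy option, so it also works in front of mon create:

ceph-deploy --overwrite-conf config push ceph-mon1 ceph-mon2 ceph-mon3
ceph-deploy --overwrite-conf mon create ceph-mon1 ceph-mon2 ceph-mon3
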
[12:56] * rongze_ (~rongze@117.79.232.220) Quit (Remote host closed the connection)
[12:58] <ksingh> Thanks : i followed several tutorials but still far away from a running ceph cluster
[13:00] <huangjun> my advice, you should build a cluster with 1 mon and 1 mds and multi osd at first
[13:01] <huangjun> then you can build cluster with muti mon and multi mds
[13:02] <huangjun> from our test, ceph is much more stable than 3 years ago, and for a small cluster, 1 mon and 1 mds is ok
[13:03] * laithshadeed (~Owner@80.227.44.122) has joined #ceph
[13:07] <jcfischer> weird: ceph-deploy mon create … and ceph-deploy gather keys … doesn't create/find the admin.keyring: Unable to find /etc/ceph/ceph.client.admin.keyring on ['h0s']
[13:07] <huangjun> is ceph-deploy gatherkeys
[13:10] <jcfischer> typo in irc - it gathered the other keys
[13:10] <jcfischer> re-building the cluster one more time
[13:11] * ksingh (~Adium@2001:708:10:10:15fd:1559:ae6d:d64e) Quit (Quit: Leaving.)
[13:12] <jcfischer> it gets worse: http://pastebin.com/9Nxt90ic
[13:12] <jcfischer> now it doesn't find any keys
[13:14] <huangjun> when you rebuild your cluster, you should run ceph-deploy forgetkeys
[13:14] <huangjun> i'll go now,
[13:15] * huangjun (~kvirc@111.172.153.78) Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[13:15] * sleinen (~Adium@2001:620:0:26:9b7:fe0f:7b7f:3be6) has joined #ceph
[13:16] <jcfischer> I did
[13:16] <jcfischer> I will do another round
[13:16] <andreask> jcfischer: you also did the ceph-deploy purgedata h0s ?
[13:17] <jcfischer> ah - I did a purge and removed /var/lib/ceph manually
[13:18] <jcfischer> running ceph-deploy {new, install} h0s h1s s0s now
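
The full teardown-and-rebuild cycle being discussed here, as a hedged recap of the ceph-deploy subcommands of that era; the host names are jcfischer's from the log:

ceph-deploy purge h0s h1s s0s        # remove the packages
ceph-deploy purgedata h0s h1s s0s    # remove /etc/ceph and /var/lib/ceph contents
ceph-deploy forgetkeys               # drop the locally cached keyrings
ceph-deploy new h0s h1s s0s
ceph-deploy install h0s h1s s0s
ceph-deploy mon create h0s h1s s0s
ceph-deploy gatherkeys h0s
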
[13:20] * rongze (~rongze@117.79.232.235) has joined #ceph
[13:20] <jcfischer> andreask: same problem: no admin, and no bootstrap keys on h0s (after I created a mon on h0s)
[13:21] <andreask> and permissions are all fine?
[13:21] <andreask> so ssh connect and creating files is no problem?
[13:21] <jcfischer> I'm doing it as root (and it worked yesterday on the same machines)
[13:23] <andreask> so nothing in the tmp dirs on the mon?
[13:24] <jcfischer> exactly
[13:24] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[13:25] <andreask> hmm ... also not this "done" file?
[13:25] <jcfischer> here's the log from the install I just did:
[13:25] <jcfischer> http://pastebin.com/CN6n2ify
[13:26] <jcfischer> the done file is in root@h0s:/var/lib/ceph/mon/ceph-h0s#
[13:26] <jcfischer> (with keyring, store.db and upstart)
[13:26] * ksingh (~Adium@teeri.csc.fi) has joined #ceph
[13:28] <ksingh> jcfischer are you expanding your cluster with with one more mon , or this is the first mon
[13:29] <jcfischer> this is the first mon - I purged, purged data, installed
[13:29] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has joined #ceph
[13:29] <jcfischer> I basically nuked the cluster and start totally fresh
[13:31] <andreask> jcfischer: and /etc/ceph is empty on h0s?
[13:32] <jcfischer> andreask: yes - both /etc/ceph and /var/lib/ceph are gone
[13:34] <jcfischer> I'm redoing it again from scratch - will paste the log in a second
[13:35] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[13:35] <jcfischer> (creating the mon on a different host for kicks)
[13:37] <jcfischer> ksingh, andreask: same result: http://pastebin.com/AT0B4fr2
[13:37] <andreask> jcfischer: /etc/hosts is also fine on all nodes ... resolves the shortnames to the correct ips?
[13:37] <jcfischer> yes
[13:37] <andreask> and you use all ipv6?
[13:38] <jcfischer> yes
[13:39] <jcfischer> I'll try a manual install next
[13:40] <ksingh> jcfischer : cd to /etc/ceph
[13:40] <ksingh> again gather keys
[13:46] <AndreyGrebennikov> aarontc, sorry for being away. So, when I run ceph-deploy may it operate with ips rather than names?
[13:49] * janos (~janos@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[13:50] <jcfischer> actually, trying with the github version of ceph-deploy first
[13:52] * yanzheng (~zhyan@jfdmzpr03-ext.jf.intel.com) has joined #ceph
[13:52] <andreask> jcfischer: have you checked the ceph-mon is running?
[13:52] <jcfischer> nope - I didn't
[13:52] <jcfischer> let me try with the new ceph-deploy and see what happens
[13:56] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[13:56] <jcfischer> andreask: ok - mon is running - but so is ceph-create-keys
[13:57] <andreask> jcfischer: hmm ... what ceph release are you running?
[13:58] <jcfischer> ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
[13:58] <jcfischer> root 32108 0.1 0.0 34248 7504 ? Ss 13:55 0:00 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i h0s
[13:59] <jcfischer> if that doesn't finish, that would explain the missing key rings, no?
[13:59] <andreask> exactly
[13:59] <andreask> there was an older bug in the ceph-tools that should be already fixed
[14:00] <jcfischer> I just pulled ceph-deploy from github
[14:00] <jcfischer> do you know where I can look for that bug (or its fix)?
[14:01] <andreask> no iptables on the mon?
[14:01] <topro> anyone knows if there is a regular point-release-schedule, like for 0.67.5?
[14:01] <jcfischer> andreask: all wide open
[14:01] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[14:02] <andreask> jcfischer: I read that in http://tracker.ceph.com/issues/4924
[14:03] <pmatulis_> where can i see a list of values that the CEPH_ARGS environment variable can be set to and where is it typically defined?
[14:03] * dxd828 (~dxd828@195.191.107.205) Quit (Quit: Computer has gone to sleep.)
[14:03] * jcfischer is reading
[14:03] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[14:03] * dxd828 (~dxd828@195.191.107.205) Quit ()
[14:04] * sleinen (~Adium@2001:620:0:26:9b7:fe0f:7b7f:3be6) Quit (Quit: Leaving.)
[14:05] <andreask> jcfischer: you had a look at the mon status?
[14:06] <jcfischer> "state": "probing",
[14:06] <jcfischer> ah - I know the problem - I set up 3 mons in the "new" call, but only installed one...
[14:06] * jcfischer hits head on desk
[14:07] <jcfischer> and guess what I just found in /var/lib/ceph/bootstrap-osd?
[14:07] <ksingh> hello andreask :^)
[14:07] <andreask> oh ... no quorum
[14:08] * BillK (~BillK-OFT@124-169-89-28.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[14:08] <jcfischer> Installed the other mons, got quorum, all good!
[14:08] <jcfischer> can I provide a doc patch to the wiki?
[14:08] <ksingh> i removed everything and now i am going to build a cluster
[14:09] * BillK (~BillK-OFT@124-169-102-184.dyn.iinet.net.au) has joined #ceph
[14:09] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[14:09] <ksingh> andreask , i need to have 3 mons , so should i create all the mons in ceph-deploy create mon mon1 mon2 mon3 , like this
[14:10] <jcfischer> thanks for the help
[14:10] <ksingh> or the other way round , create 1st mon and after that add 2 more , previously i got a problem adding 2 mons at a later stage
[14:10] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) Quit (Quit: Leaving.)
[14:10] <ksingh> jcfischer : do you now have 1 mon or 3 mons running ?
[14:11] <jcfischer> I had only 1 mon running (but had defined 3), so there was no quorum and ceph-create-keys never finished
[14:11] <jcfischer> as soon as I installed the other 2 mons, things started working
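
What jcfischer hit is that ceph-create-keys blocks until the monitors reach quorum, and with three mons declared in mon initial members a majority (two) must be running. A hedged way to check this on a mon host, using the admin-socket path pattern that appears elsewhere in this log:

ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok mon_status
ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok quorum_status
# "state": "probing" means the mon is still looking for its peers;
# "leader" or "peon" means it has joined a quorum.
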
[14:11] <andreask> jcfischer: sure, patches to the docs are always welcome ;-)
[14:12] <jcfischer> are the sources on github as well?
[14:15] <andreask> ksingh: yes, if you installed 3 mons you can create them like this
[14:15] <jcfischer> andreask: now I'm back to square 1: root@hxs:~/ceph# rados --pool data ls
[14:15] <jcfischer> 2013-10-23 14:15:14.695126 7f684ad0e700 0 -- [2001:620:0:6::11c]:0/1021146 >> [2001:620:0:6::11f]:6800/24721 pipe(0x19c8050 sd=4 :52418 s=1 pgs=0 cs=0 l=1 c=0x19c82b0).connect claims to be [2001:620:0:6::11f]:6800/1025199 not [2001:620:0:6::11f]:6800/24721 - wrong node!
[14:15] <andreask> jcfischer: yes, should be on github
[14:16] <ksingh> thanks you should get a cup of coffee , i am starting my installation
[14:17] * freedomhui (~freedomhu@106.120.176.65) Quit (Quit: Leaving...)
[14:18] <jcfischer> andreask: do you know which repo? search for doc doesn't immediately yield a result
[14:19] * markbby (~Adium@168.94.245.2) has joined #ceph
[14:20] <andreask> jcfischer: hm .. isn't it all in ceph/doc?
[14:21] <jcfischer> andreask: na - that would be too obvious… wait!
[14:21] <jcfischer> (yes of course, found it)
[14:26] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[14:29] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[14:31] * ksingh (~Adium@teeri.csc.fi) Quit (Quit: Leaving.)
[14:32] * yanzheng (~zhyan@jfdmzpr03-ext.jf.intel.com) Quit (Remote host closed the connection)
[14:33] * ksingh (~Adium@teeri.csc.fi) has joined #ceph
[14:34] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) Quit (Quit: smiley)
[14:35] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[14:36] * sleinen1 (~Adium@2001:620:0:25:fd95:1b34:dfba:2e4d) has joined #ceph
[14:37] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[14:37] * dxd828 (~dxd828@195.191.107.205) Quit (Ping timeout: 480 seconds)
[14:40] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[14:43] <ksingh> andreask : can you check this http://pastebin.com/e3L06Xgi , my ceph-deploy new and ceph-deploy install steps were fine
[14:43] <jcfischer> andreask: https://github.com/ceph/ceph/pull/759
[14:44] <ksingh> getting problem with ceph-deploy mon create , 30 minutes back i removed everything and now i am rebuilding my cluster
[14:44] <ksingh> jcfischer : can you also check this http://pastebin.com/e3L06Xgi
[14:45] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[14:47] <jcfischer> ksingh: did you install ceph on the cluster before running "mon create" (It looks like /etc/ceph is not around - for me that was the case when I purged the software from the system)
[14:48] <jcfischer> I had to install it, so that /etc/ceph was created
[14:48] <jcfischer> and then could create the mon
[14:48] <andreask> yes, looks like there are steps missing before
[14:48] * jcsp (~jcsp@212.20.242.100) has joined #ceph
[14:49] <ksingh> guys i purged from admin-node
[14:50] <jcfischer> ksingh: you will also need to install from admin-node then
[14:50] * claenjoy (~leggenda@37.157.33.36) has joined #ceph
[14:51] <ksingh> after that i did 1) [root@ceph-admin ceph]# ceph-deploy new ceph-mon1 ceph-mon2 ceph-mon3 then 2) [root@ceph-admin ceph]# ceph-deploy install ceph-mon1 ceph-mon2 ceph-mon3 this also went fine
[14:51] * themgt (~themgt@181.72.252.96) has joined #ceph
[14:51] <ksingh> and finally 3) [root@ceph-admin ceph]# ceph-deploy mon create ceph-mon1 ceph-mon2 ceph-mon3 this is giving error
[14:52] <ksingh> there is a local directory on ceph-admin node /etc/ceph and it has files in it ceph.conf , ceph.log and ceph.mon.keyring
[14:52] <andreask> and /etc/ceph is there on the mons?
[14:53] <ksingh> so ceph packages should be installed on all the nodes in the second step which was OK , so ceph packages are there on all nodes
[14:53] <ksingh> andreask : no
[14:53] <ksingh> do i need to create an empty /etc/ceph directory on the MONs
[14:53] <andreask> ksingh: well ...then something went wrong ... packages are installed, you checked?
[14:54] <ksingh> yes packages are there
[14:54] <ksingh> [root@ceph-mon3 ~]# rpm -qa | grep ceph
[14:54] <ksingh> libcephfs1-0.67.4-0.el6.x86_64
[14:54] <ksingh> python-ceph-0.67.4-0.el6.x86_64
[14:54] <ksingh> ceph-0.67.4-0.el6.x86_64
[14:54] <ksingh> ceph-release-1-0.el6.noarch
[14:54] <ksingh> [root@ceph-mon3 ~]#
[14:54] <ksingh> similarly on other mons as well
[14:55] <andreask> strange ... the install of ceph-0.67.4-0.el6.x86_64 should create this
[14:56] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[14:56] <andreask> ksingh: hmm ... selinux enabled?
[14:57] <ksingh> andreask : i manually created /etc/ceph on all nodes
[14:57] <ksingh> now i am back to square one , i encountered what i saw yesterday ,
[14:58] <jcfischer> andreask, wido: I have a clue to my osd "wrong node" problem. On our old cluster, the OSDs are binding to :::6839 (etc) while on the new cluster (the one that doesn't work) they bind to the ipv6 address 2001:620:0:6::11e:6800
[14:58] <jcfischer> where can I convince the OSD to bind to all ipv6 addresses?
[15:00] * shang (~ShangWu@38.126.120.10) has joined #ceph
[15:01] <jcfischer> ah - ms bind ipv6 = true might be the ticket
[15:02] <ksingh> jcfischer : few minutes back you were facing something similar http://pastebin.com/2vpT44kt
[15:02] <ksingh> andreask : what did you suggest on http://pastebin.com/2vpT44kt
[15:02] <jcfischer> ksingh: do your mons have quorum?
[15:03] <ksingh> i created 3 mons , so it should have
[15:03] <jcfischer> ksingh: ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon1.asok mon_status
[15:03] <ksingh> cluster is not ready withouth this step
[15:04] <ksingh> jcfischer : this is getting executed on ceph-mon1 node but not on other nodes
[15:04] <jcfischer> yes
[15:05] <ksingh> [root@ceph-admin ceph]# ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon1.asok mon_status
[15:05] <ksingh> connect to /var/run/ceph/ceph-mon.ceph-mon1.asok failed with (2) No such file or directory
[15:05] <ksingh> [root@ceph-admin ceph]#
[15:05] * yanzheng (~zhyan@101.82.61.79) has joined #ceph
[15:05] <jcfischer> this needs to be executed on (one) mon node (adjust the name of the mon in the command line)
[15:05] <ksingh> oops sorry
[15:06] <ksingh> my apologies , yes the command is working on all the 3 mons
[15:06] <ksingh> what does this signify
[15:06] <andreask> it gives you information about the current state and configuration of the mons
[15:07] <ksingh> okey
[15:07] <jcfischer> andreask, wido: adding "ms bind ipv6 = true" to ceph.conf made the "wrong nodes" error go away - it seems my cluster is up and running!
[15:08] <andreask> jcfischer: cool ;-)
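
The setting jcfischer added, sketched in context; only 'ms bind ipv6 = true' is from the log, the [global] placement is the usual (assumed) location:

cat >> /etc/ceph/ceph.conf <<'EOF'
[global]
    ms bind ipv6 = true    # have daemons bind to their IPv6 addresses
EOF
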
[15:08] <ksingh> hey guys now my turn
[15:08] <ksingh> what should i do for this warning http://pastebin.com/2vpT44kt
[15:08] <jcfischer> back to openstack havana glance & cinder with rbd then
[15:08] <jcfischer> can you paste bin the output of the mon status?
[15:09] <ksingh> u mean this command ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon1.asok mon_status
[15:10] <andreask> yes
[15:11] <ksingh> here u go http://pastebin.com/540mQB0J
[15:12] <jcfischer> there's your problem: "state": "probing",
[15:12] <jcfischer> your mons don't have quorum
[15:12] <ksingh> ohoooooo not again
[15:12] <jcfischer> not sure why not, though
[15:14] <andreask> ksingh: of course no firewall?
[15:14] <ksingh> yeah no firewall , this is testing box
[15:14] <ksingh> ping resolution is nice
[15:15] <andreask> so iptables-save is empty?
[15:16] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[15:17] * markbby (~Adium@168.94.245.2) has joined #ceph
[15:17] <ksingh> yes ,
[15:18] <ksingh> and also just now i disabled firewall to be on safer side
[15:19] <andreask> any logs from the mons?
[15:19] * WebSpider (~webspider@37-251-7-113.FTTH.ispfabriek.nl) Quit (Read error: Connection reset by peer)
[15:20] * sleinen1 (~Adium@2001:620:0:25:fd95:1b34:dfba:2e4d) Quit (Quit: Leaving.)
[15:23] * WebSpider (~webspider@2001:7b8:1518:0:908a:2be2:f1a7:1545) has joined #ceph
[15:24] * Teduardo (~DW-10297@dhcp92.cmh.ee.net) has joined #ceph
[15:24] * The_Bishop (~bishop@2001:470:50b6:0:5d8:43e2:642c:5286) Quit (Ping timeout: 480 seconds)
[15:25] <ksingh> andreask nothing much found in monitor logs http://pastebin.com/AfDhD2v3
[15:25] <ksingh> i am now checking the cluster logs as well
[15:25] <Teduardo> Has any large scale ceph deployment published findings on the observed suitability of various classes of SATA hard disks as it pertains to ceph clusters? For instance, within the Western Digital product line there are at least 6 different classes of 4TB SATA hard drives.
[15:26] <Teduardo> Black, Red, SE, RE and others
[15:27] <ksingh> andreask : nothing on ceph.logs as well except this
[15:27] <ksingh> 2013-10-23 16:00:22,603 [ceph_deploy.gatherkeys][WARNING] Unable to find /var/lib/ceph/bootstrap-mds/ceph.keyring on ['ceph-mon1']
[15:27] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[15:29] <Gdub> ksingh: i started all over
[15:29] <Gdub> and i got something working ;)
[15:29] <Gdub> just a small question
[15:30] <Gdub> i wanted to add a 2nd mon
[15:30] <Gdub> and yet i have this error: http://pastebin.com/aXZjyngr
[15:30] <foosinn> is there a way to edit the osdmap? i deployed a cluster using ceph-deploy. now i would like to set the cluster_addr values for my osds. is this possible?
[15:30] <andreask> ksingh: the ips are correct of the mons?
[15:30] <ksingh> yes sir correct
[15:31] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[15:32] <ksingh> Gdub : try ceph-deploy --overwrite-conf mon create ceph-node2-osd to overwrite the existing conf
[15:33] <andreask> ksingh: does it help to restart the mons?
[15:33] <ksingh> restart from the ceph-deploy node, or manually go to each node's cli and do it ??
[15:34] <ksingh> andreask - i restarted services of all the MON nodes manually
[15:34] <andreask> ksingh: manually
[15:35] <ksingh> services came up with no problem , but the cluster status is not showing
[15:35] <ksingh> [root@ceph-mon1 ceph]# ceph status
[15:35] <ksingh> 2013-10-23 16:33:56.622159 7f0938adf700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
[15:35] <ksingh> 2013-10-23 16:33:56.622231 7f0938adf700 0 librados: client.admin initialization error (2) No such file or directory
[15:35] <ksingh> Error connecting to cluster: ObjectNotFound
[15:35] <ksingh> [root@ceph-mon1 ceph]#
[15:35] <ksingh> cephx for authentication ??? this is bugging me
[15:36] <Gdub> ksingh: a bit better but .. http://pastebin.com/46S2t8XQ :)
[15:37] <Gdub> [WARNIN] ceph-node2-osd is not defined in `mon initial members`
[15:37] <Gdub> should i really add the node to the members?
[15:38] <ksingh> Gdub - does this exist /var/run/ceph/ceph-mon.ceph-node2-osd.asok
[15:39] <Gdub> nah nothing
[15:39] <Gdub> only /var/run/ceph
[15:39] <andreask> ksingh: is the keyring existing on mon1?
[15:41] * sleinen (~Adium@eduroam-2-026.epfl.ch) has joined #ceph
[15:41] <ksingh> nopes
[15:41] <ksingh> andreask : the ceph-deploy gatherkeys ceph-mon1 step is responsible for creating keyrings on mon1 , in my case this is giving me the warnings that i showed you earlier
[15:41] <ksingh> thats why there is no keyring on any mon
[15:41] <andreask> ksingh: true ... no mons up no keys ...
[15:42] <ksingh> Gdub : are u running the commands as the root user
[15:42] <Gdub> no
[15:42] <ksingh> try with root
[15:43] <ksingh> i trust this is not a production environment ??
[15:43] * sleinen1 (~Adium@2001:620:0:26:eddd:bcf7:642e:6b92) has joined #ceph
[15:43] <Gdub> nope :)
[15:43] <ksingh> kill it then :^)
[15:44] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Quit: wogri_risc)
[15:44] <andreask> ksingh: so your mons are still probing? ... and listening on the network shows you packets from the other mons?
[15:44] <Gdub> it works with root :p
[15:45] <ksingh> andreask - after the service restart , no, still probing ,
[15:46] <ksingh> gdub - I am not getting my cluster UP :'( thank god at least yours is :^) keep it up
[15:46] <andreask> ksingh: hmm ... maybe debugging output of the mons reveals something
[15:47] <ksingh> where to get it from
[15:47] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Ping timeout: 480 seconds)
[15:48] <andreask> ksingh: here are examples http://ceph.com/docs/master/rados/troubleshooting/log-and-debug/
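A sketch of turning up monitor debugging along the lines of that page, either live over the admin socket or persistently in ceph.conf (the mon id ceph-mon1 is an assumption):

    ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon1.asok config set debug_mon 10
    ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon1.asok config set debug_ms 1

    # or add to ceph.conf and restart the mon
    [mon]
        debug mon = 10
        debug ms = 1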
[15:49] * sleinen (~Adium@eduroam-2-026.epfl.ch) Quit (Ping timeout: 480 seconds)
[15:50] * dmsimard (~Adium@2607:f748:9:1666:908c:55c9:a8b4:5272) has joined #ceph
[15:50] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[15:51] <Gdub> ahah ksingh. thanx anyway for ur help
[15:53] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[15:54] * zhyan_ (~zhyan@101.83.104.57) has joined #ceph
[15:56] <ksingh> Gdub : Most welcome
[15:56] <pmatulis_> ksingh: 'gatherkeys ceph-mon1' does not create keys on ceph-mon1
[15:57] * yanzheng (~zhyan@101.82.61.79) Quit (Ping timeout: 480 seconds)
[15:57] <ksingh> pmatulis - so how to do it
[15:57] <ksingh> [root@ceph-mon3 ceph]# ceph status
[15:57] <ksingh> 2013-10-23 16:54:41.787309 7fc029f92700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
[15:57] <ksingh> 2013-10-23 16:54:41.787356 7fc029f92700 0 librados: client.admin initialization error (2) No such file or directory
[15:57] <ksingh> Error connecting to cluster: ObjectNotFound
[15:57] <ksingh> [root@ceph-mon3 ceph]#
[15:58] <ksingh> client.admin error
[15:58] * dmsimard (~Adium@2607:f748:9:1666:908c:55c9:a8b4:5272) has left #ceph
[15:58] * dmsimard (~Adium@2607:f748:9:1666:908c:55c9:a8b4:5272) has joined #ceph
[16:00] <pmatulis_> ksingh: gatherkeys gathers keys from the monitor and places them locally
[16:00] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[16:01] <ksingh> i do not have file ceph.client.admin.keyring on any node in my cluster
[16:01] <ksingh> i need to get it generated , and as per the documentation it is generated using the ceph-deploy gatherkeys command
[16:01] <pmatulis_> ksingh: in order to run a command without specifying a key ('ceph status' <--- no key specified), you need to put the admin keyring under /etc/ceph; the admin keyring is used by default and will be picked up if found there
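A sketch of the two options pmatulis_ describes, assuming the default cluster name ceph and that a valid admin keyring exists somewhere (paths are illustrative):

    # let the client find the keyring in the default location
    cp ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring
    ceph status

    # or point the command at a keyring explicitly
    ceph --id admin --keyring /path/to/ceph.client.admin.keyring status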
[16:03] <ksingh> on my ceph-admin node under /etc/ceph i have this
[16:03] <ksingh> [root@ceph-admin ceph]# ls
[16:03] <ksingh> ceph.conf ceph.log ceph.mon.keyring
[16:03] <ksingh> [root@ceph-admin ceph]#
[16:03] <ksingh> do you mean i should manually copy ceph.mon.keyring file from admin node to MONITOR nodes
[16:04] <pmatulis_> ksingh: i said the admin keyring (ceph.client.admin.keyring is the default name; but replace 'ceph' if that is not your cluster name)
[16:04] <pmatulis_> ksingh: are you following any documentation?
[16:04] <andreask> ksingh: you got some debug logs?
[16:06] * ismell (~ismell@host-64-17-89-79.beyondbb.com) has joined #ceph
[16:06] <ksingh> andreask - i am still in process of getting the logs
[16:06] * haomaiwang (~haomaiwan@175.152.18.101) has joined #ceph
[16:06] <ksingh> pmatulis - sir i am following http://ceph.com/docs/next/start/quick-ceph-deploy/ , please see step no 4
[16:07] <ksingh> i got some warnings on step no 4 those warnings are http://pastebin.com/2vpT44kt
[16:07] <ksingh> this is the reason i do not have the file {cluster-name}.client.admin.keyring in the /etc/ceph directory of my admin node
[16:08] <ksingh> so to proceed further step no 4 should be completed
[16:08] <pmatulis_> ksingh: correct, don't go any further if the gatherkeys command failed
[16:09] <pmatulis_> ksingh: how did the install of the monitors go?
[16:10] <pmatulis_> ksingh: remember that you should be invoking all ceph-deploy commands from the same directory, have you been doing that?
[16:10] * freedomhui (~freedomhu@117.79.232.203) has joined #ceph
[16:11] <pmatulis_> ksingh: also, it prolly won't matter much here, but try using the up-to-date docs:
[16:11] <pmatulis_> http://ceph.com/docs/master/start/quick-ceph-deploy/
[16:14] * alfredodeza (~alfredode@172.56.1.178) has joined #ceph
[16:15] <ksingh> pmatulis - yes i am invoking ceph-deploy from the same directory
[16:16] * alfredodeza (~alfredode@172.56.1.178) has left #ceph
[16:16] <ksingh> you can see my install logs at http://pastebin.com/wPfUH81n , a few errors were there but finally it got installed correctly
[16:16] <pmatulis_> ksingh: why don't you purge a monitor and install it again?
[16:17] <ksingh> :-D no problem, i can do that, but during the last 4 days this would be the 4th time i am doing it
[16:17] <ksingh> do you suggest me to do it again ?
[16:17] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[16:18] <pmatulis_> ksingh: what was your method?
[16:19] * shang (~ShangWu@38.126.120.10) Quit (Quit: Ex-Chat)
[16:20] <ksingh> for clearing everything
[16:20] <ksingh> ceph-deploy purgedata {ceph-node} [{ceph-node}]
[16:20] <ksingh> ceph-deploy forgetkeys
[16:20] <ksingh> remove contents of /etc/ceph if they are still there
[16:20] <ksingh> and remove contents of /var/lib/ceph
[16:20] <ksingh> anything i am missing here
[16:20] <ksingh> ??
[16:20] * smiley (~smiley@205.153.36.170) has joined #ceph
[16:21] * claenjoy (~leggenda@37.157.33.36) Quit (Quit: Leaving.)
[16:23] <andreask> ksingh: have you ever done a low-level check and tried to connect with nc or telnet from one mon to the other on port 6789?
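A low-level check of the kind andreask suggests, run from one monitor toward another (ceph-mon2 is just ksingh's naming and an assumption here):

    # is the local mon listening?
    netstat -lnt | grep 6789
    # can this host reach another mon on 6789?
    nc -vz ceph-mon2 6789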
[16:24] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Ping timeout: 480 seconds)
[16:31] * zhyan_ (~zhyan@101.83.104.57) Quit (Read error: Connection timed out)
[16:32] * wrencsok1 (~wrencsok@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[16:32] * zhyan_ (~zhyan@101.83.104.57) has joined #ceph
[16:32] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) Quit (Quit: Leaving.)
[16:33] * freedomhui (~freedomhu@117.79.232.203) Quit (Quit: Leaving...)
[16:34] * claenjoy (~leggenda@37.157.33.36) has joined #ceph
[16:35] * allsystemsarego (~allsystem@188.27.166.164) Quit (Ping timeout: 480 seconds)
[16:35] * allsystemsarego (~allsystem@188.27.166.164) has joined #ceph
[16:36] <pmatulis_> ksingh: prolly should do 'ceph-deploy purge <host>' as well. this will remove ceph packages and purge all data. not sure if 'purgedata' takes care of the second bit
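A sketch of the fuller teardown being suggested, with ksingh's hostnames assumed; purge removes the packages, purgedata and forgetkeys clear the data and the generated keys, and the final rm is only for stale files left in the ceph-deploy working directory:

    ceph-deploy purge ceph-mon1 ceph-mon2 ceph-mon3
    ceph-deploy purgedata ceph-mon1 ceph-mon2 ceph-mon3
    ceph-deploy forgetkeys
    rm -f ceph.conf ceph.log ceph.mon.keyring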
[16:36] * wrencsok (~wrencsok@wsip-174-79-34-244.ph.ph.cox.net) Quit (Ping timeout: 480 seconds)
[16:38] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[16:38] * markbby (~Adium@168.94.245.3) has joined #ceph
[16:39] * sleinen1 (~Adium@2001:620:0:26:eddd:bcf7:642e:6b92) Quit (Quit: Leaving.)
[16:40] <wido> jcfischer: there is no way to set an OSD to bind to all addresses
[16:40] * claenjoy (~leggenda@37.157.33.36) Quit (Quit: Leaving.)
[16:41] * rongze (~rongze@117.79.232.235) Quit (Remote host closed the connection)
[16:41] <jcfischer> wido: adding ms_bind_ipv6 = true in ceph.conf took care of that problem
[16:42] <pmatulis_> ksingh: what about provisioning the monitor?
[16:43] * gregmark (~Adium@68.87.42.115) has joined #ceph
[16:43] <ksingh> what do you mean by this
[16:44] * BillK (~BillK-OFT@124-169-102-184.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[16:48] <mattt_> question -- what happens if an object goes missing on the primary OSD for the pg in question?
[16:48] <mattt_> does the read get deferred to a replica?
[16:53] * lofejndif (~lsqavnbok@109.163.233.195) has joined #ceph
[16:55] * rongze (~rongze@117.79.232.203) has joined #ceph
[16:55] * sarob (~sarob@mobile-198-228-235-204.mycingular.net) has joined #ceph
[16:56] * freedomhui (~freedomhu@117.79.232.203) has joined #ceph
[16:59] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) has joined #ceph
[16:59] <pmatulis_> ksingh: how do you install the monitor?
[17:01] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) has joined #ceph
[17:01] <ksingh> using documentation ceph-deploy install
[17:02] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) Quit (Read error: Connection reset by peer)
[17:04] <pmatulis_> ksingh: provide exact commands
[17:05] <ksingh> ceph-deploy install ceph-mon1 ceph-mon2 ceph-mon3
[17:06] <ksingh> ceph-deploy mon create ceph-mon1 ceph-mon2 ceph-mon3
[17:06] <ksingh> ceph-deploy gatherkeys ceph-mon1 ceph-mon2 ceph-mon3
[17:07] <ksingh> after this i got this ****ing string : [ceph_deploy.gatherkeys][WARNIN] Unable to find /etc/ceph/ceph.client.admin.keyring on ['ceph-mon1', 'ceph-mon2', 'ceph-mon3']
[17:08] <peetaur> ksingh: here's what I had to do to clean out a mixed version install before ceph-deploy would work: http://pastebin.com/wdxUDHHA (note I have hardcoded paths... not sure if they are right for you)
[17:08] <pmatulis_> ksingh: are you sure the first 2 commands go ok?
[17:10] <peetaur> ksingh: added some more if you hit reload on that. I also remember I had to mkdir some dirs before it would work again (although I think it makes them automatically on a totally clean system)
[17:11] <ksingh> thanks peetaur : i would rather try to troubleshoot this instead of rebuilding , i tried 3 times earlier and got stuck every time
[17:11] <peetaur> oh well I thought you were rebuilding already
[17:12] <peetaur> before you said you made node1, then tried adding 2 and 3; and the commands above have all 3
[17:12] * ajazdzewski (~quassel@lpz-66.sprd.net) Quit (Ping timeout: 480 seconds)
[17:13] <ksingh> Pmatulis : in step 1 i had a few errors http://pastebin.com/wPfUH81n but finally it completed , in step 2 : got some warnings like [ceph-mon2][WARNIN] No data was received after 7 seconds, disconnecting... and [ceph_deploy.lsb][WARNIN] lsb_release was not found - inferring OS details
[17:14] <ksingh> peetaur : yes, yesterday i tried creating them one at a time , so 1 monitor got installed but after that mon2 and mon3 were giving errors , then i removed everything and installed using these commands so all monitors at one time
[17:14] <ksingh> now i got stuck again
[17:14] <ksingh> i dont know where i am going wrong
[17:14] <ksingh> do you guys also follow the same documentation for building your clusters ??
[17:16] <pmatulis_> ksingh: yes, except i use ubuntu, you seem to be using fedora
[17:16] <ksingh> centos
[17:16] <janos> i have a .67.3 cluster on fedora with no problems
[17:16] * sarob (~sarob@mobile-198-228-235-204.mycingular.net) Quit (Remote host closed the connection)
[17:17] <ksingh> you are the lucky man
[17:17] <janos> centos? what kernel version?
[17:17] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:17] * sarob (~sarob@mobile-198-228-235-204.mycingular.net) has joined #ceph
[17:17] <ksingh> 2.6.32
[17:17] <janos> yikes
[17:18] <janos> i don't know enough to know how much that impacts ceph, but that's a red flag
[17:18] <pmatulis_> how can anyone be running ceph on such an old kernel?
[17:18] * gsaxena (~gsaxena@pool-108-56-185-167.washdc.fios.verizon.net) Quit (Remote host closed the connection)
[17:18] <pmatulis_> ksingh: i presume you are being forced?
[17:19] * sarob (~sarob@mobile-198-228-235-204.mycingular.net) Quit (Read error: Connection reset by peer)
[17:19] <tsnider> DQOTD --- I had to reboot storage nodes - after the nodes came up I did "ceph-osd-all-starter" on the nodes. However osd processes were not started. What command is used to start osds after reboot? Is there an /etc/rc.d/ file that would automatically start osds at boot time?
[17:19] <tsnider> mount
[17:19] <tsnider> oops
[17:20] <ksingh> centos 6.4 is the latest release and by default it comes with this kernel version
[17:20] <ksingh> nopes i am not forced , i am forced to build a ceph cluster on centos
[17:23] <pmatulis_> right, forced onto centos, that's what i meant
[17:23] <pmatulis_> ksingh: when you say you 'removed everything', do you mean you reinstalled your nodes completely? at the OS level?
[17:23] * RuediR (~Adium@130.59.94.164) Quit (Quit: Leaving.)
[17:23] <pmatulis_> ksingh: and is this on bare metal?
[17:28] * foosinn (~stefan@office.unitedcolo.de) Quit (Quit: Leaving)
[17:28] * shang (~ShangWu@70.35.39.20) has joined #ceph
[17:29] <ksingh> removed everything means purged only ceph and keys , no OS reinstall
[17:31] * shang (~ShangWu@70.35.39.20) Quit (Remote host closed the connection)
[17:32] <peetaur> ksingh: I had issues and fixed them with things like what I put in my pastebin
[17:33] <ksingh> peetaur : cleaning and starting over again right ? yep that's the last option i have
[17:34] <ksingh> but i am pretty sad now
[17:34] <ksingh> :|
[17:34] <peetaur> ksingh: well I mean I think the install failed because it needed more cleaning than the purge and purgedata steps
[17:34] <ksingh> yeah i also think the same
[17:37] * rongze (~rongze@117.79.232.203) Quit (Remote host closed the connection)
[17:42] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[17:43] * Cube (~Cube@66-87-66-203.pools.spcsdns.net) has joined #ceph
[17:46] * shang (~ShangWu@38.126.120.10) has joined #ceph
[17:47] * Cube (~Cube@66-87-66-203.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[17:49] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[17:49] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[17:54] * haomaiwang (~haomaiwan@175.152.18.101) Quit (Remote host closed the connection)
[17:54] * haomaiwang (~haomaiwan@175.152.18.101) has joined #ceph
[17:59] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[18:02] * nhm (~nhm@172.56.7.171) has joined #ceph
[18:02] * ChanServ sets mode +o nhm
[18:02] * mattt_ (~textual@94.236.7.190) Quit (Quit: Computer has gone to sleep.)
[18:02] * haomaiwang (~haomaiwan@175.152.18.101) Quit (Ping timeout: 480 seconds)
[18:06] * hflai (~hflai@alumni.cs.nctu.edu.tw) Quit (Read error: Operation timed out)
[18:06] * shang (~ShangWu@38.126.120.10) Quit (Ping timeout: 480 seconds)
[18:07] * rongze (~rongze@117.79.232.203) has joined #ceph
[18:09] * i_m (~ivan.miro@deibp9eh1--blueice4n1.emea.ibm.com) Quit (Ping timeout: 480 seconds)
[18:10] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit (Read error: Operation timed out)
[18:10] * d` (dana@kokshark.techbandits.com) has joined #ceph
[18:11] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[18:17] * jcfischer (~fischer@macjcf.switch.ch) Quit (Quit: jcfischer)
[18:18] <d`> is there a feature or ticket open on asynchronous replication that I can track? I'm really interested in replacing our swift stack with ceph, but async replication is a critical feature for us
[18:18] * xarses1 (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[18:19] * rongze (~rongze@117.79.232.203) Quit (Ping timeout: 480 seconds)
[18:20] * davidz (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[18:20] * shang (~ShangWu@70.35.39.20) has joined #ceph
[18:20] <peetaur> d`: by async, you mean like some cronjob can copy the system at any time, rather than some master+slave setup?
[18:22] <peetaur> if yes, then the answer is yes for rbd, incremental backup: http://ceph.com/dev-notes/incremental-snapshots-with-rbd/ and here are my notes, including missing steps from the other one that guys here helped me with: http://pastebin.com/Q0R8Wk2r
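A minimal sketch of the incremental RBD flow from that blog post; the pool, image, and snapshot names are purely illustrative:

    rbd snap create rbd/vm1@snap1
    # ... time passes, image changes ...
    rbd snap create rbd/vm1@snap2
    rbd export-diff --from-snap snap1 rbd/vm1@snap2 vm1.snap1-to-snap2.diff
    # replay the diff onto a copy of the image elsewhere
    rbd import-diff vm1.snap1-to-snap2.diff rbd/vm1-backup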
[18:24] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:26] <mikedawson> d: Asynchronous Replication for radosgw (the swift/s3 compatible object store) should land in the Emperor release in November http://www.inktank.com/about-inktank/roadmap/
[18:26] * alram (~alram@216.103.134.250) has joined #ceph
[18:27] <mikedawson> d`: ^
[18:28] <mikedawson> d`: http://wiki.ceph.com/01Planning/02Blueprints/Dumpling/RGW_Geo-Replication_and_Disaster_Recovery and http://wiki.ceph.com/01Planning/02Blueprints/Emperor/rgw%3A_Multi-region_%2F%2F_Disaster_Recovery_%28phase_2%29
[18:29] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:32] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[18:34] * angdraug (~angdraug@64-79-127-122.static.wiline.com) has joined #ceph
[18:36] <d`> peetaur: I was looking for the answer that mikedawson gave mostly, but thanks for the cool solution
[18:37] <d`> mikedawson: I'm aware of that item on the roadmap, I was interested in tracking/testing this feature, so I'm looking for a bug/feature ticket to subscribe to
[18:37] * zhyan_ (~zhyan@101.83.104.57) Quit (Ping timeout: 480 seconds)
[18:38] * JoeGruher (~JoeGruher@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[18:40] * thomnico (~thomnico@70.35.39.20) Quit (Ping timeout: 480 seconds)
[18:40] * shang (~ShangWu@70.35.39.20) Quit (Quit: Ex-Chat)
[18:40] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[18:42] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[18:42] * ChanServ sets mode +v andreask
[18:42] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit ()
[18:46] <mikedawson> d`: lots of finished features to review here -> http://tracker.ceph.com/projects/ceph/issues?utf8=%E2%9C%93&set_filter=1&f[]=assigned_to_id&op[assigned_to_id]=%3D&v[assigned_to_id][]=4&f[]=tracker_id&op[tracker_id]=%3D&v[tracker_id][]=2&f[]=&c[]=project&c[]=tracker&c[]=status&c[]=priority&c[]=subject&c[]=assigned_to&c[]=updated_on&c[]=category&c[]=fixed_version&c[]=story_points&c[]=cf_3&c[]=cf_4&g
[18:46] <mikedawson> roup_by=
[18:47] * sjusthm (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[18:47] * ircolle (~Adium@2601:1:8380:2d9:40fe:2e07:e1b5:9806) has joined #ceph
[18:48] * yehudasa__ (~yehudasa@2602:306:330b:1980:ea03:9aff:fe98:e8ff) has joined #ceph
[18:51] * ksingh (~Adium@teeri.csc.fi) has left #ceph
[18:51] * The_Bishop (~bishop@2001:470:50b6:0:5d8:43e2:642c:5286) has joined #ceph
[18:55] * sakari (sakari@turn.ip.fi) Quit (Ping timeout: 480 seconds)
[18:58] * glanzi (~glanzi@201.75.202.207) Quit (Quit: glanzi)
[19:00] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) has joined #ceph
[19:00] * joelio (~Joel@88.198.107.214) Quit (Ping timeout: 480 seconds)
[19:01] * mtanski (~mtanski@69.193.178.202) Quit (Quit: mtanski)
[19:05] * sagelap (~sage@2600:1001:b113:8719:c685:8ff:fe59:d486) has joined #ceph
[19:05] * Shmouel (~Sam@fny94-12-83-157-27-95.fbx.proxad.net) has joined #ceph
[19:07] * csaesumu (~csaesumu@inf-205-218.inf.um.es) Quit (Remote host closed the connection)
[19:09] * wusui (~Warren@2607:f298:a:607:c960:8ea:5374:da97) Quit (Ping timeout: 480 seconds)
[19:10] * wusui (~Warren@2607:f298:a:607:69d3:d139:6363:a6d7) has joined #ceph
[19:11] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) Quit (Ping timeout: 480 seconds)
[19:12] * rongze (~rongze@117.79.232.203) has joined #ceph
[19:12] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[19:15] * sagelap (~sage@2600:1001:b113:8719:c685:8ff:fe59:d486) Quit (Quit: Leaving.)
[19:16] * jbd_ (~jbd_@2001:41d0:52:a00::77) has left #ceph
[19:17] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:20] * rongze (~rongze@117.79.232.203) Quit (Ping timeout: 480 seconds)
[19:26] * lofejndif (~lsqavnbok@82VAAFVRK.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[19:27] <JoeGruher> does it matter if i mount my OSD and journal or just point ceph-deploy to the raw devices? ceph-deploy osd prepare host01:/dev/sdc:/dev/sdb1 versus host01:/path/to/mounted/OSD:/path/to/mounted/journal ?
[19:27] <JoeGruher> i know both work functionally but is one better in some way, say for performance reasons?
[19:29] <mikedawson> JoeGruher: point ceph-deploy to the raw devices
[19:30] <mikedawson> JoeGruher: udev handles mounting on modern deployments (and allows for things like OSD portability)
[19:31] <JoeGruher> k
[19:31] <JoeGruher> thx
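A sketch of the raw-device form mikedawson recommends, reusing JoeGruher's example device names (disk zap is destructive and only belongs on disks you intend to wipe):

    ceph-deploy disk zap host01:/dev/sdc
    ceph-deploy osd create host01:/dev/sdc:/dev/sdb1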
[19:37] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[19:45] * jtlebigot (~jlebigot@proxy.ovh.net) Quit (Quit: Leaving.)
[19:47] * glzhao_ (~glzhao@118.195.65.67) Quit (Quit: leaving)
[19:51] * Pedras (~Adium@216.207.42.132) has joined #ceph
[19:52] * albionandrew (~albionand@64.25.15.100) has joined #ceph
[19:52] * albionandrew (~albionand@64.25.15.100) Quit (Remote host closed the connection)
[19:57] <tsnider> after reboot some osds aren't up. log file has messages about journal fsids not matching expected fsids. Cluster was created yesterday using ceph-deploy. fsids in /var/log/ceph/osd/ceph-*/fsid differ for the same cluster. Is that expected? no osd processes are started on any storage nodes. using start ceph-osd --verbose id=xx wasn't effective. What do I need to do?
[19:57] <tsnider> to get the osds up?
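For reference, the usual ways OSDs are started after a reboot; which one applies depends on the init system (sysvinit on CentOS, upstart on Ubuntu), and none of them addresses the fsid mismatch tsnider reports, so the id and commands below are illustrative:

    # sysvinit
    sudo service ceph start osd.2
    # upstart
    sudo start ceph-osd-all
    sudo start ceph-osd id=2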
[19:58] * Cube (~Cube@66-87-65-237.pools.spcsdns.net) has joined #ceph
[20:02] <d`> mikedawson: thanks!
[20:03] <mikedawson> d`: yw
[20:12] * sarob (~sarob@166.137.82.181) has joined #ceph
[20:13] * sarob (~sarob@166.137.82.181) Quit (Read error: Connection reset by peer)
[20:14] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[20:16] * alram (~alram@216.103.134.250) Quit (Ping timeout: 480 seconds)
[20:24] * yanzheng (~zhyan@101.82.182.104) has joined #ceph
[20:34] <JoeGruher> should osd_pool_default_pg_num and osd_pool_default_pgp_num apply to the default pools? i put them in ceph.conf before creating any OSDs but after bringing up the OSDs the default pools are using a value of 64
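A ceph.conf fragment of the kind JoeGruher describes, with illustrative values; whether the initial pools pick them up depends on when the monitors create those pools, and an existing pool can be resized afterwards:

    [global]
        osd pool default pg num = 512
        osd pool default pgp num = 512

    # resize an already-created pool instead
    ceph osd pool set rbd pg_num 512
    ceph osd pool set rbd pgp_num 512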
[20:37] * yanzheng (~zhyan@101.82.182.104) Quit (Ping timeout: 480 seconds)
[20:44] * davidz (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (Quit: Leaving.)
[20:44] * Cube (~Cube@66-87-65-237.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[20:44] * Cube (~Cube@66-87-66-139.pools.spcsdns.net) has joined #ceph
[20:46] * yanzheng (~zhyan@101.82.187.88) has joined #ceph
[20:49] <JoeGruher> what's the equivalent of "apt-get install ceph-common" for centos? "yum install ceph-common" reports no ceph-common package exists
[20:50] * zhyan_ (~zhyan@101.83.180.157) has joined #ceph
[20:54] * yanzheng (~zhyan@101.82.187.88) Quit (Ping timeout: 480 seconds)
[20:57] * mschiff (~mschiff@85.182.236.82) has joined #ceph
[20:58] * mozg (~andrei@host86-184-120-113.range86-184.btcentralplus.com) has joined #ceph
[21:05] * sarob (~sarob@166.137.81.119) has joined #ceph
[21:07] <xarses> JoeGruher just ceph
[21:07] <xarses> JoeGruher see https://github.com/Mirantis/fuel/blob/master/deployment/puppet/ceph/manifests/params.pp
[21:08] <xarses> for conversions of most things between debian and redhat
[21:08] <JoeGruher> xarses, thanks, I have ceph installed (Package ceph-0.67.4-0.el6.x86_64 already installed and latest version) but I don't have radosgw-admin... is that in a different package? i'll check out that link.
[21:09] * freedomhui (~freedomhu@117.79.232.203) Quit (Quit: Leaving...)
[21:09] * alram (~alram@216.103.134.250) has joined #ceph
[21:11] <tsnider> after having to reboot storage nodes osds aren't up (no ceph-osd processes were started). log file has messages about journal fsids not matching expected fsids. Cluster was created yesterday using ceph-deploy. fsids in /var/log/ceph/osd/ceph-*/fsid differ for the same cluster. Is that expected? using start ceph-osd --verbose id=xx wasn't effective. What do I need to do to jump start the osds? Journals are on separate unmounted devices.
[21:11] <JoeGruher> aha, i have to yum install ceph-radosgw. thanks xarses.
[21:13] * rongze (~rongze@117.79.232.203) has joined #ceph
[21:20] <xarses> JoeGruher: np
[21:21] <xarses> JoeGruher: also make sure you use mod_fastcgi and not fcgid, i was never able to get that to work
[21:21] * rongze (~rongze@117.79.232.203) Quit (Ping timeout: 480 seconds)
[21:21] <xarses> JoeGruher: and on top of using mod_fastcgi, use the version with patches from inktank/ceph http://gitbuilder.ceph.com/mod_fastcgi-rpm-centos6-x86_64-basic/ref/master/x86_64/
[21:22] * nigwil_ (~chatzilla@2001:44b8:5144:7b00:39ff:fd0b:6dee:4268) has joined #ceph
[21:23] <JoeGruher> xarses, thanks for the tips, will do
[21:25] * dty (~derek@proxy00.umiacs.umd.edu) has joined #ceph
[21:28] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[21:28] <cjh_> i'm not sure i get why openstack project manilla is needed
[21:28] * nigwil (~chatzilla@2001:44b8:5144:7b00:39ff:fd0b:6dee:4268) Quit (Ping timeout: 480 seconds)
[21:29] * sleinen1 (~Adium@2001:620:0:26:c564:529a:479b:f150) has joined #ceph
[21:29] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[21:31] <dty> I am still trying to debug a problem with a production ceph cluster running radosgw: showing the usage statistics. Even digging into showing the logs is confusing to me
[21:31] <dty> So i have a log entry in radosgw-admin log list that says "2013-10-23-15-default.8463.3-hmp"
[21:32] <dty> if i try to run the command radosgw-admin log show --date=2013-10-23 --bucket=hmp --bucket-id=default.8463.3
[21:32] <dmsimard> leseb: ping
[21:32] <dty> i get an error, error reading log 2013-10-23-default.8463.3-: (2) No such file or directory
[21:36] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[21:36] * Shmouel (~Sam@fny94-12-83-157-27-95.fbx.proxad.net) Quit (Quit: Leaving.)
[21:36] * sarob (~sarob@166.137.81.119) Quit (Remote host closed the connection)
[21:36] * Shmouel (~Sam@fny94-12-83-157-27-95.fbx.proxad.net) has joined #ceph
[21:37] * sarob (~sarob@166.137.81.119) has joined #ceph
[21:37] <dty> i can get the object out of rados
[21:37] <dty> rados --pool .log get 2013-10-23-15-default.8463.3-hmp /tmp/foo
[21:38] <dty> and there are logs in it
[21:39] * sarob (~sarob@166.137.81.119) Quit (Read error: Connection reset by peer)
[21:45] * danieagle (~Daniel@179.186.126.49.dynamic.adsl.gvt.net.br) has joined #ceph
[21:46] <leseb> dmsimard: pong
[21:46] <dmsimard> leseb: yay! Did you get news from your manager about making puppet-ceph Apache 2 ?
[21:47] <pmatulis_> how do i know what permissions are required for a key that will manage rbd images? is this enough?:
[21:47] <pmatulis_> ceph auth get-or-create client.images_pool osd 'allow rw' > images_pool.keyring
[21:47] <leseb> dmsimard: not yet :(
[21:47] <leseb> dmsimard: I'll ping someone else
[21:47] * yehudasa__ (~yehudasa@2602:306:330b:1980:ea03:9aff:fe98:e8ff) Quit (Ping timeout: 480 seconds)
[21:47] <dmsimard> leseb: Okay, we started working on puppet-ceph already. I'll include the AGPL license/copyright information with the related files.
[21:47] <dmsimard> (for the time being)
[21:49] <leseb> dmsimard: I sent the vote to the contributors, and got everyone's approval :)
[21:49] <mikedawson> pmatulis_: my caps look like caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images
[21:49] <dmsimard> leseb: awesome, so is that a yes ? Or you still need that approval from Enovance
[21:50] <pmatulis_> mikedawson: thanks, but what do you mean by 'caps look like caps'?
[21:50] <leseb> dmsimard: still need enovance's approval
[21:51] <dmsimard> leseb: Okay, no problem. Can you poke me when you have an update? :)
[21:51] <mikedawson> pmatulis_: look at the output of 'ceph auth list' and it should make sense
[21:51] * zhyan_ (~zhyan@101.83.180.157) Quit (Ping timeout: 480 seconds)
[21:51] <leseb> dmsimard: sure I will
[21:52] <mikedawson> pmatulis_: the reference to the images pool is for openstack which may not be relevant to you
[21:53] <pmatulis_> mikedawson: alright, thank you
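A hedged combination of pmatulis_'s command and mikedawson's caps; the mon cap and the volumes/images pool names follow the usual OpenStack examples and may not match pmatulis_'s pools:

    ceph auth get-or-create client.images_pool \
        mon 'allow r' \
        osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images' \
        > images_pool.keyring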
[21:54] * amospalla (~amospalla@0001a39c.user.oftc.net) Quit (Ping timeout: 480 seconds)
[21:55] * zhyan_ (~zhyan@101.83.180.157) has joined #ceph
[21:55] <JoeGruher> what's the best way to install the rbd client module on centos? - [ceph@joceph05 ceph]$ sudo modprobe rbd - FATAL: Module rbd not found.
[21:55] * carif (~mcarifio@146-115-183-141.c3-0.wtr-ubr1.sbo-wtr.ma.cable.rcn.com) has joined #ceph
[21:56] * ircolle (~Adium@2601:1:8380:2d9:40fe:2e07:e1b5:9806) Quit (Ping timeout: 480 seconds)
[21:56] * ircolle (~Adium@c-67-172-132-222.hsd1.co.comcast.net) has joined #ceph
[21:57] * carif_ (~mcarifio@146-115-183-141.c3-0.wtr-ubr1.sbo-wtr.ma.cable.rcn.com) has joined #ceph
[21:57] * carif (~mcarifio@146-115-183-141.c3-0.wtr-ubr1.sbo-wtr.ma.cable.rcn.com) Quit (Read error: Connection reset by peer)
[21:58] * amospalla (~amospalla@0001a39c.user.oftc.net) has joined #ceph
[22:00] * carif (~mcarifio@146-115-183-141.c3-0.wtr-ubr1.sbo-wtr.ma.cable.rcn.com) has joined #ceph
[22:00] * carif_ (~mcarifio@146-115-183-141.c3-0.wtr-ubr1.sbo-wtr.ma.cable.rcn.com) Quit (Read error: Connection reset by peer)
[22:02] * carif (~mcarifio@146-115-183-141.c3-0.wtr-ubr1.sbo-wtr.ma.cable.rcn.com) Quit (Remote host closed the connection)
[22:02] <pmatulis_> JoeGruher: get a kernel that has the rbd module?
[22:03] <JoeGruher> ungh
[22:03] <pmatulis_> JoeGruher: what kernel do you have?
[22:03] * zhyan_ (~zhyan@101.83.180.157) Quit (Ping timeout: 480 seconds)
[22:03] <JoeGruher> pmatulis_: 3.11.6
[22:03] <pmatulis_> er
[22:04] <JoeGruher> pmatulis_: on CentOS 6.4
[22:04] <pmatulis_> 3.11 is very recent, you should have it
[22:04] <JoeGruher> hmm
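A quick check for the situation JoeGruher hits; modinfo only reports on modules built for the running kernel, so a failure here generally means the kernel was built without the rbd driver or the matching modules package is not installed:

    uname -r
    modinfo rbd && sudo modprobe rbd && lsmod | grep rbd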
[22:06] * carif (~mcarifio@146-115-183-141.c3-0.wtr-ubr1.sbo-wtr.ma.cable.rcn.com) has joined #ceph
[22:07] <pmatulis_> re cephx keys, what is this for:
[22:07] <pmatulis_> class-read = can call class methods that are reads
[22:12] * carif (~mcarifio@146-115-183-141.c3-0.wtr-ubr1.sbo-wtr.ma.cable.rcn.com) Quit (Quit: Ex-Chat)
[22:14] <lurbs> Has anyone here tried to do the math on the likelihood of data loss due to multiple disk failure, for different replica sizes?
[22:16] <lurbs> Or, more specifically, have any idea as to how much more likely a disk is to die while under the strain of the rebuild process caused by the failure of a previous disk.
[22:19] * JoeGruher (~JoeGruher@jfdmzpr05-ext.jf.intel.com) Quit (Remote host closed the connection)
[22:24] <smiley> Is Alfredo around?
[22:25] * ScOut3R_ (~scout3r@BC24BF84.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[22:37] * gregsfortytwo1 (~Adium@38.122.20.226) has joined #ceph
[22:37] * allsystemsarego (~allsystem@188.27.166.164) Quit (Quit: Leaving)
[22:38] * yehudasa (~yehudasa@2607:f298:a:607:ea03:9aff:fe98:e8ff) has joined #ceph
[22:43] * dxd828 (~dxd828@host-2-97-72-213.as13285.net) has joined #ceph
[22:45] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[22:46] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[22:50] * gregsfortytwo (~Adium@2607:f298:a:607:dcd:f9b1:80f4:84cc) Quit (Quit: Leaving.)
[22:54] * gregsfortytwo1 (~Adium@38.122.20.226) Quit (Quit: Leaving.)
[22:56] * gregsfortytwo (~Adium@38.122.20.226) has joined #ceph
[22:56] * glanzi (~glanzi@201.75.202.207) has joined #ceph
[22:56] * glanzi (~glanzi@201.75.202.207) Quit (Remote host closed the connection)
[22:57] * jhujhiti (~jhujhiti@00012a8b.user.oftc.net) has left #ceph
[22:57] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[22:57] * glanzi (~glanzi@201.75.202.207) has joined #ceph
[23:03] * Pedras (~Adium@216.207.42.132) Quit (Ping timeout: 480 seconds)
[23:08] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[23:09] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[23:09] * ircolle (~Adium@c-67-172-132-222.hsd1.co.comcast.net) Quit (Ping timeout: 480 seconds)
[23:10] * ircolle (~Adium@c-67-172-132-222.hsd1.co.comcast.net) has joined #ceph
[23:10] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[23:11] * yehudasa (~yehudasa@2607:f298:a:607:ea03:9aff:fe98:e8ff) Quit (Read error: Connection reset by peer)
[23:12] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[23:14] * rongze (~rongze@106.120.176.65) has joined #ceph
[23:18] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[23:19] * danieagle (~Daniel@179.186.126.49.dynamic.adsl.gvt.net.br) Quit (Ping timeout: 480 seconds)
[23:22] * rongze (~rongze@106.120.176.65) Quit (Read error: Operation timed out)
[23:23] * Pedras (~Adium@216.207.42.132) has joined #ceph
[23:27] * sleinen1 (~Adium@2001:620:0:26:c564:529a:479b:f150) Quit (Quit: Leaving.)
[23:27] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[23:28] * gregsfortytwo1 (~Adium@2607:f298:a:607:250a:643:46b8:2692) has joined #ceph
[23:28] <mikedawson> lurbs: i think loicd did some math in regard to replication levels vs. different erasure coding schemes
[23:28] * loicd wakes up
[23:28] <mikedawson> loicd: sorry!
[23:29] <loicd> metaphorically that is ;-)
[23:32] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[23:35] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[23:36] * gregsfortytwo (~Adium@38.122.20.226) Quit (Quit: Leaving.)
[23:36] * sagelap (~sage@2600:1001:b12e:b7b4:2467:e849:938f:f70e) has joined #ceph
[23:36] <loicd> lurbs: I actually don't know the math and I'd be interested to have actual numbers :-) My understanding is that the odds of losing data when you have something like 100 machines and 3 replicas are lower than the chance that the building containing them is destroyed by an accident.
[23:42] * glanzi (~glanzi@201.75.202.207) Quit (Quit: glanzi)
[23:45] * BillK (~BillK-OFT@58-7-61-12.dyn.iinet.net.au) has joined #ceph
[23:47] * ircolle (~Adium@c-67-172-132-222.hsd1.co.comcast.net) Quit (Ping timeout: 480 seconds)
[23:49] * dty (~derek@proxy00.umiacs.umd.edu) Quit (Ping timeout: 480 seconds)
[23:52] <lurbs> loicd: I'm currently trying to convince the Powers That Be that 3 replicas is basically a requirement for a production grade cluster.
[23:53] <loicd> lurbs: what are the odds of a disk being destroyed in your environment ?
[23:54] <lurbs> Uh, non-zero? :)
[23:54] <loicd> the maths are probably trivial for someone who knows probability ( which I don't )
[23:56] <lurbs> I may have to do a bit more digging. I believe that Google did a study of failure rates with a pretty decent sample size.
[23:57] <loicd> say it takes at most X seconds in your reference architecture to recover what a lost disk contains. The odds of losing data when you have two replicas are the probability of losing a disk holding the other copy during this interval.
[23:57] <lurbs> http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/disk_failures.pdf
[23:58] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[23:58] <lurbs> loicd: Yeah, but the probability of losing another disk during that time is tricky to determine, especially if the rebuild itself exposes the failure in the disk.
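A rough back-of-the-envelope version of loicd's argument, with illustrative numbers only and ignoring the failure correlation lurbs raises: with an annual disk failure rate around 4% (the upper end of the Google figures) and a 2-hour recovery window, the chance that one specific surviving replica dies inside that window is roughly 0.04 * 2 / 8760 ≈ 1e-5; summed over every disk that shares PGs with the failed one the exposure grows, while with 3 replicas two such failures have to land in the same window, which multiplies the small probabilities together and is why the third copy buys so much safety.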

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.