#ceph IRC Log

IRC Log for 2013-08-07

Timestamps are in GMT/BST.

[0:01] <dirk___> anything specific I should look for?
[0:01] <sjust> no, you'll have to sftp the gzipped log to cephdrop@ceph.com
[0:02] * jjgalvez (~jjgalvez@38.122.20.226) Quit (Ping timeout: 480 seconds)
[0:02] <dirk___> hmm, anything I should wait for before doing that?
[0:02] <sjust> I'd give it about 2 minutes
[0:02] <dirk___> it logs a few thousand lines per second, all in the same format
[0:02] <sjust> yep
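
A sketch of the upload flow sjust describes; the filename follows what dirk___ reports below, and the non-interactive sftp form is an assumption:

    # compress the multi-GB debug log, then push it to the drop point
    gzip ceph-osd.0.log.debug                    # yields ceph-osd.0.log.debug.gz
    echo "put ceph-osd.0.log.debug.gz" | sftp cephdrop@ceph.com
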
[0:03] * sprachgenerator (~sprachgen@130.202.135.209) Quit (Quit: sprachgenerator)
[0:04] * dosaboy (~dosaboy@host86-136-164-81.range86-136.btcentralplus.com) has joined #ceph
[0:04] * mschiff (~mschiff@85.182.236.82) Quit (Remote host closed the connection)
[0:05] * danieagle (~Daniel@186.214.57.154) has joined #ceph
[0:08] * BillK (~BillK-OFT@124-148-246-233.dyn.iinet.net.au) has joined #ceph
[0:10] <dirk___> sjust: hmm, it is 5gb after 5 min.. compressing takes a while
[0:10] <sjust> dirk___: yep
[0:15] <dirk___> sjust: ceph-osd.0.log.debug.gz upload finished
[0:17] <dirk___> there is a bit of traces from the run before at the beginning
[0:19] * devoid (~devoid@130.202.135.211) Quit (Quit: Leaving.)
[0:20] * bandrus1 (~Adium@2607:f298:a:607:9d92:ff4a:7784:418c) has joined #ceph
[0:24] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) Quit (Quit: ...)
[0:26] <sjust> dirk___: can you post the output of ceph -s?
[0:26] * bandrus (~Adium@38.122.20.226) Quit (Ping timeout: 480 seconds)
[0:27] <dirk___> sjust: http://pastebin.com/a7jaxSDP
[0:27] <dirk___> hmm, the stuck stale is new
[0:30] <sjust> did you just create a pool?
[0:31] <dirk___> you mean due to the "1 creating" in the pgmap?
[0:32] <sjust> yes
[0:33] <dirk___> that's fallout from a debug attempt an hour ago. I tried the force_pg_creating command according to some mailing list posting, on one of the pgs that was empty
[0:33] <dirk___> and inactive/peering before
[0:33] <dirk___> didn't really help, other than that particular pg changed from peering to creating, but then got stuck there too
[0:35] <sjust> try marking the osds down manually
[0:36] <sjust> ceph osd down 0
[0:36] <sjust> ceph osd down 1
[0:36] <sjust> ceph osd down 2
[0:36] <sjust> and then restart the osd daemons
[0:36] * junglebells (~junglebel@0001b1b9.user.oftc.net) Quit (Quit: YAY for dinner)
[0:37] <dirk___> any particular order for the restart?
[0:37] <sjust> no
[0:38] <dmick> sjust: need noup first?
[0:38] <sjust> no, don't think so
[0:38] <dmick> k
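
A minimal sketch of the sequence sjust suggests; the restart command assumes the sysvinit packaging of that era and is not from the log:

    # mark each osd down; they will be allowed straight back up (no noup needed)
    for i in 0 1 2; do ceph osd down $i; done
    # then restart the daemons on their hosts, in any order
    sudo service ceph restart osd.0    # likewise osd.1 and osd.2
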
[0:38] * sjm (~oftc-webi@c73-103.rim.net) Quit (Quit: Page closed)
[0:40] <dirk___> hmm, a bit better than before
[0:41] <dirk___> pgmap v5129085: 192 pgs: 31 active+clean, 97 peering, 3 active+remapped, 24 down+peering, 37 remapped+peering; 524 GB data, 1058 GB used, 1734 GB / 2793 GB avail
[0:42] <dirk___> ceph -w complaining about slow requests on osd.0 and .1
[0:44] * KevinPerks (~Adium@2607:f298:a:607:258e:7bbb:7b2f:3ba7) Quit (Quit: Leaving.)
[0:45] * doxavore (~doug@99-89-22-187.lightspeed.rcsntx.sbcglobal.net) Quit (Quit: :qa!)
[0:45] <dirk___> hmm, no other change though.. two scrubs (that turned out clean)
[0:48] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[0:50] <dirk___> sjust: hmm, pg query on the peering tagged pg's looks quite different now
[0:50] <mikedawson> joshd: have you had a chance to look at the rbd / qemu issue and logs I sent?
[0:51] <paravoid> hey folks, what's your plans for the 0.67 release?
[0:51] <paravoid> next week or two?
[0:51] * Cube (~Cube@38.122.20.226) Quit (Remote host closed the connection)
[0:51] <paravoid> (also, blog is quite outdated, doesn't mention 0.67-rc3 nor the summit & ceph days)
[0:51] * Cube (~Cube@38.122.20.226) has joined #ceph
[0:53] <scuttlemonkey> paravoid: yeah, content is getting back on track next week
[0:54] <scuttlemonkey> we have all been traveling
[0:55] * KevinPerks (~Adium@38.122.20.226) has joined #ceph
[0:55] <joshd> mikedawson: a little, it's nothing obvious - I'll need to take a closer look, and maybe give you a wip branch with more logging
[0:55] * KevinPerks (~Adium@38.122.20.226) Quit (Read error: Connection reset by peer)
[0:55] * KevinPerks (~Adium@2607:f298:a:607:1099:c8c5:bfb9:97c2) has joined #ceph
[0:56] * nwat (~nwat@38.122.20.226) has joined #ceph
[0:56] <paravoid> sjust: so after a few weeks of running dumpling, upgrading to -rc3, rebooting nodes under traffic and other tests:
[0:56] <scuttlemonkey> paravoid: however, CDS and ceph day were in there...are you on a stale page?
[0:56] <scuttlemonkey> http://ceph.com/community/ceph-developer-summit-emperor/
[0:56] <scuttlemonkey> http://ceph.com/events/countdown-to-ceph-day-nyc/
[0:57] <scuttlemonkey> just no recaps yet
[0:57] <paravoid> sjust: peering is definitely nothing like it used to be; peering on boot is usually instantaneous, although there have been cases where it took longer but not long enough for me to collect pg dumps yet
[0:57] * Cube (~Cube@38.122.20.226) Quit (Read error: Connection reset by peer)
[0:57] * Cube (~Cube@2607:f298:a:697:ac1b:c979:2553:f418) has joined #ceph
[0:57] <paravoid> sjust: peering on "down" events is still not great, though, albeit bearable :) I remember sage kind of expecting this
[0:58] <dirk___> sjust: doing the osd out and restart in a loop seems to kick it alive again.. it is actively replicating data now
[0:58] <paravoid> scuttlemonkey: yeah my bad. CDS was back from July 18th which is why I missed this now
[0:59] <scuttlemonkey> yeah
[0:59] <scuttlemonkey> CDS vids are all posted w/ the timecode splits on sessions now though :)
[0:59] <scuttlemonkey> so I'll do a recap later this week or early next
[0:59] <paravoid> cool!
[0:59] <dirk___> sjust: thanks for the help :)
[0:59] <dirk___> not sure if it will eventually finish though
[1:05] * tnt (~tnt@109.130.80.16) Quit (Ping timeout: 480 seconds)
[1:06] * Cube (~Cube@2607:f298:a:697:ac1b:c979:2553:f418) Quit (Quit: Leaving.)
[1:16] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[1:17] * bandrus1 (~Adium@2607:f298:a:607:9d92:ff4a:7784:418c) Quit (Quit: Leaving.)
[1:24] <mikedawson> joshd: Thanks! I'll help in any way I can
[1:26] * bandrus (~Adium@38.122.20.226) has joined #ceph
[1:26] <sjust> dirk___: cool
[1:26] <sjust> paravoid: good to hear
[1:26] * bandrus (~Adium@38.122.20.226) Quit ()
[1:26] * scuttlemonkey (~scuttlemo@2607:f298:a:607:91ef:d414:2ad6:6a62) Quit (Ping timeout: 480 seconds)
[1:27] * AfC (~andrew@2001:44b8:31cb:d400:590c:f208:eacc:c3b) has joined #ceph
[1:29] * ishkabob (~c7a82cc0@webuser.thegrebs.com) Quit (Quit: TheGrebs.com CGI:IRC)
[1:33] * nwat (~nwat@38.122.20.226) Quit (Ping timeout: 480 seconds)
[1:33] <paravoid> sjust: :)
[1:38] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[1:41] * sagelap (~sage@2607:f298:a:607:c5e:7bb0:c323:186c) Quit (Remote host closed the connection)
[1:42] <lurbs> Anyone else seen an error where ceph-deploy fails to create a second OSD, if you've specified the same block device (/dev/vdc in this case) for the journal?
[1:42] <lurbs> It creates a second partition for it, but fails with:
[1:42] <lurbs> http://paste.nothing.net.nz/fbfa9f
[1:43] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[1:44] * KevinPerks (~Adium@2607:f298:a:607:1099:c8c5:bfb9:97c2) Quit (Quit: Leaving.)
[1:47] * danieagle (~Daniel@186.214.57.154) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[1:48] * zhu_ (~quassel@117.79.232.188) Quit (Read error: Operation timed out)
[1:48] * haomaiwa_ (~haomaiwan@117.79.232.202) Quit (Ping timeout: 480 seconds)
[2:04] * scuttlemonkey (~scuttlemo@38.122.20.226) has joined #ceph
[2:04] * ChanServ sets mode +o scuttlemonkey
[2:04] * houkouonchi_work (~houkouonc@38.122.20.226) Quit (Read error: Connection reset by peer)
[2:12] * danieagle (~Daniel@186.214.57.154) has joined #ceph
[2:12] * mmercer (~kvirc@199.127.107.196) Quit (Ping timeout: 480 seconds)
[2:22] * dosaboy_ (~dosaboy@host86-156-255-255.range86-156.btcentralplus.com) has joined #ceph
[2:23] * dosaboy (~dosaboy@host86-136-164-81.range86-136.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[2:33] * huangjun (~kvirc@111.173.155.201) has joined #ceph
[2:34] * dirk___ (~dirk@nrbg-4dbe3c45.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[2:37] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) has joined #ceph
[2:39] * LeaChim (~LeaChim@2.122.34.66) Quit (Ping timeout: 480 seconds)
[2:39] * john (~john@2607:f298:a:607:d6be:d9ff:fe04:efa6) Quit (Quit: Leaving)
[2:48] * scuttlemonkey (~scuttlemo@38.122.20.226) Quit (Ping timeout: 480 seconds)
[2:49] * KevinPerks (~Adium@38.122.20.226) has joined #ceph
[2:49] * scuttlemonkey (~scuttlemo@2607:f298:a:607:4d5:9e17:42c1:e3df) has joined #ceph
[2:49] * ChanServ sets mode +o scuttlemonkey
[2:50] * gentleben (~sseveranc@12.250.97.26) Quit (Quit: gentleben)
[2:54] * xmltok_ (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[2:55] * danieagle (~Daniel@186.214.57.154) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[2:56] * KevinPerks (~Adium@38.122.20.226) Quit (Quit: Leaving.)
[2:58] * yy-nm (~chatzilla@115.196.74.105) has joined #ceph
[3:04] <Tamil> lurbs: around?
[3:04] <lurbs> Yep.
[3:04] <Tamil> lurbs: on which distro are you trying the osd create?
[3:05] * jeroenmoors (~quassel@193.104.8.40) Quit (Ping timeout: 480 seconds)
[3:05] <lurbs> 12.04 LTS, using ceph-deploy 1.0-1 and ceph 0.67-rc3-1precise.
[3:06] <lurbs> It works if I pre-create the partitions for the journals, and specify those directly rather than the raw block device.
[3:06] * alfredodeza (~alfredode@38.122.20.226) Quit (Remote host closed the connection)
[3:06] <lurbs> Similar sort of problem to one I just saw on the mailing list: http://www.spinics.net/lists/ceph-users/msg03239.html
[3:06] <Tamil> lurbs: alright, let me check
[3:07] <lurbs> Although they get it with the third journal, not the second.
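
A sketch of the workaround lurbs describes, pre-creating the journal partitions and passing them explicitly; device names and sizes here are hypothetical:

    # carve one journal partition per OSD out of the shared journal device
    sudo sgdisk --new=1:0:+1G /dev/vdc
    sudo sgdisk --new=2:0:+1G /dev/vdc
    # HOST:DATA:JOURNAL form, pointing at the pre-made partitions
    ceph-deploy osd create node1:/dev/vdb:/dev/vdc1
    ceph-deploy osd create node1:/dev/vdd:/dev/vdc2
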
[3:08] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[3:09] * alram (~alram@38.122.20.226) Quit (Quit: leaving)
[3:10] <lurbs> I was going to try with a more recent ceph-deploy (1.1-1, built from master), but that has a dependency on python-pushy >= 0.5.2 and there's no package built for it.
[3:12] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[3:13] * ishkabob (~c7a82cc0@webuser.thegrebs.com) has joined #ceph
[3:13] <ishkabob> hey guys
[3:13] * danieagle (~Daniel@186.214.57.154) has joined #ceph
[3:14] <ishkabob> when you're using ceph-deploy, should "ceph-deploy mon create blah" create the monitor directory for you?
[3:25] <scuttlemonkey> ishkabob: yes
[3:30] <ishkabob> scuttlemonkey: will it still work if you define a custom location for ceph mon directories?
[3:30] <ishkabob> i basically create a ceph.conf using ceph-deploy new blah1 blah2
[3:31] <ishkabob> and then add some lines telling it where to put mon, mds, and osd directories
[3:31] <ishkabob> for example, monitors should go in /srv/mon-$name
[3:31] <ishkabob> then i try to run "ceph-deploy mon create camelot"
[3:31] <ishkabob> and i get [Errno 2] No such file or directory: '/var/lib/ceph/mon/ceph-camelot'
[3:35] <scuttlemonkey> ishkabob: you should be able to set that in the conf
[3:35] <scuttlemonkey> http://ceph.com/docs/master/rados/configuration/mon-config-ref/#data
[3:35] <scuttlemonkey> as long as it's uniform across the cluster you should be ok
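
Per the mon-config-ref link, the path ishkabob wants would be set in ceph.conf before creating the monitors, uniformly on every node; a sketch using his /srv layout:

    [mon]
        mon data = /srv/mon-$name
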
[3:38] <ishkabob> scuttlemonkey: thanks, i think that might actually be broken. It doesn't seem to want to create directories if you define the non-default paths
[3:38] <ishkabob> its no big deal though
[3:38] * AfC (~andrew@2001:44b8:31cb:d400:590c:f208:eacc:c3b) Quit (Quit: Leaving.)
[3:39] <scuttlemonkey> are you using a packaged version, or did you clone from github?
[3:42] * danieagle (~Daniel@186.214.57.154) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[3:43] <scuttlemonkey> ishkabob: I actually have to run, but feel free to hit me tomorrow if I'm around
[3:44] * nerdtron (~kenneth@202.60.8.252) has joined #ceph
[3:44] <nerdtron> hi all, has anyone managed to use ceph as a datastore in OpenNebula?
[3:50] * joao (~JL@2607:f298:a:607:9eeb:e8ff:fe0f:c9a6) Quit (Ping timeout: 480 seconds)
[3:51] * nhm_ (~nhm@38.122.20.226) has joined #ceph
[3:51] * scuttlemonkey (~scuttlemo@2607:f298:a:607:4d5:9e17:42c1:e3df) Quit (Ping timeout: 480 seconds)
[3:51] * ishkabob (~c7a82cc0@webuser.thegrebs.com) Quit (Quit: TheGrebs.com CGI:IRC (Ping timeout))
[3:53] * nhm (~nhm@ma30436d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[3:57] * john_barbee (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) has joined #ceph
[4:06] * diegows (~diegows@190.190.11.42) Quit (Ping timeout: 480 seconds)
[4:08] * nhm_ (~nhm@38.122.20.226) Quit (Ping timeout: 480 seconds)
[4:12] * nhm (~nhm@38.122.20.226) has joined #ceph
[4:25] * dpippenger (~riven@tenant.pas.idealab.com) Quit (Quit: Leaving.)
[4:32] * nhm (~nhm@38.122.20.226) Quit (Ping timeout: 480 seconds)
[4:41] * nhm_ (~nhm@38.122.20.226) has joined #ceph
[4:49] * nhm (~nhm@38.122.20.226) has joined #ceph
[4:51] * nhm_ (~nhm@38.122.20.226) Quit (Ping timeout: 480 seconds)
[5:03] * nhm (~nhm@38.122.20.226) Quit (Ping timeout: 480 seconds)
[5:05] * fireD_ (~fireD@93-139-175-22.adsl.net.t-com.hr) has joined #ceph
[5:07] * fireD (~fireD@93-142-223-18.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:38] * nhm (~nhm@38.122.20.226) has joined #ceph
[5:40] * huangjun (~kvirc@111.173.155.201) Quit (Ping timeout: 480 seconds)
[5:43] * dosaboy (~dosaboy@host86-152-196-168.range86-152.btcentralplus.com) has joined #ceph
[5:50] * dosaboy_ (~dosaboy@host86-156-255-255.range86-156.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[5:53] * nhm (~nhm@38.122.20.226) Quit (Ping timeout: 480 seconds)
[5:57] * nhm (~nhm@38.122.20.226) has joined #ceph
[6:20] * nhm_ (~nhm@38.122.20.226) has joined #ceph
[6:20] * madkiss (~madkiss@2001:6f8:12c3:f00f:b540:54ff:945:9838) Quit (Quit: Leaving.)
[6:22] * huangjun (~kvirc@221.234.156.126) has joined #ceph
[6:23] * madkiss (~madkiss@2001:6f8:12c3:f00f:24f4:40e7:22e3:d5ff) has joined #ceph
[6:25] * haomaiwang (~haomaiwan@106.120.176.101) has joined #ceph
[6:27] * nhm (~nhm@38.122.20.226) Quit (Ping timeout: 480 seconds)
[6:30] * nhm_ (~nhm@38.122.20.226) Quit (Read error: Connection reset by peer)
[6:35] * nhm (~nhm@38.122.20.226) has joined #ceph
[6:38] * root_ (~chatzilla@218.94.22.130) Quit (Ping timeout: 480 seconds)
[6:43] * nhm (~nhm@38.122.20.226) Quit (Read error: Connection reset by peer)
[6:58] * nhm (~nhm@38.122.20.226) has joined #ceph
[7:04] * tnt (~tnt@109.130.80.16) has joined #ceph
[7:34] * nhm (~nhm@38.122.20.226) Quit (Ping timeout: 480 seconds)
[7:38] * bergerx_ (~bekir@78.188.204.182) has joined #ceph
[7:42] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[7:47] * nhm (~nhm@216.1.187.162) has joined #ceph
[7:52] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has left #ceph
[7:55] * nhm (~nhm@216.1.187.162) Quit (Quit: Lost terminal)
[8:00] <yy-nm> hi, all. how do I generate the keyring files under bootstrap-osd/ and bootstrap-mds/ using ceph-deploy?
[8:07] <nerdtron> it should be automatically generated when you create a monitor or osd
[8:15] * gentleben (~sseveranc@c-98-207-40-73.hsd1.ca.comcast.net) has joined #ceph
[8:19] <yy-nm> i just transitioned from mkcephfs, and there is no other mon or osd to create.
[8:20] * scuttlemonkey (~scuttlemo@mdc2036d0.tmodns.net) has joined #ceph
[8:20] * ChanServ sets mode +o scuttlemonkey
[8:29] * john_barbee (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[8:31] * john_barbee (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) has joined #ceph
[8:37] <nerdtron> what command do you use to create mons?
[8:40] * shoosah (~ssha637@en-279303.engad.foe.auckland.ac.nz) has joined #ceph
[8:40] * rongze (~quassel@li565-182.members.linode.com) has joined #ceph
[8:41] * dirk (~dirk@nrbg-4dbfd808.pool.mediaWays.net) has joined #ceph
[8:41] <shoosah> hello
[8:42] <yy-nm> ceph-deploy mon create
[8:43] <shoosah> I am trying to add mds, but I get this error all the time >
[8:43] <shoosah> ssha637@en-439--215-005:/var/lib/ceph/mds/ceph-en-439--215-005$ sudo ceph-deploy mds create en-439--215-005
[8:43] <shoosah> 2013-08-07 17:56:13.910786 7fdf26416780 -1 unable to authenticate as client.bootstrap-mds
[8:43] <shoosah> 2013-08-07 17:56:13.910998 7fdf26416780 -1 ceph_tool_common_init failed.
[8:43] <shoosah> Traceback (most recent call last):
[8:43] <shoosah> File "/usr/bin/ceph-deploy", line 21, in <module>
[8:43] <shoosah> main()
[8:43] <shoosah> File "/usr/lib/pymodules/python2.7/ceph_deploy/cli.py", line 112, in main
[8:43] <shoosah> return args.func(args)
[8:43] <shoosah> File "/usr/lib/pymodules/python2.7/ceph_deploy/mds.py", line 195, in mds
[8:43] <shoosah> mds_create(args)
[8:43] <shoosah> File "/usr/lib/pymodules/python2.7/ceph_deploy/mds.py", line 182, in mds_create
[8:43] <shoosah> init=init,
[8:43] <shoosah> File "/usr/lib/python2.7/dist-packages/pushy/protocol/proxy.py", line 255, in <lambda>
[8:43] <shoosah> (conn.operator(type_, self, args, kwargs))
[8:43] <shoosah> File "/usr/lib/python2.7/dist-packages/pushy/protocol/connection.py", line 66, in operator
[8:43] <shoosah> return self.send_request(type_, (object, args, kwargs))
[8:43] <shoosah> File "/usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py", line 323, in send_request
[8:43] <shoosah> return self.__handle(m)
[8:43] <shoosah> File "/usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py", line 639, in __handle
[8:43] <shoosah> raise e
[8:43] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[8:43] <shoosah> pushy.protocol.proxy.ExceptionProxy: Command '['ceph', '--cluster', 'ceph', '--name', 'client.bootstrap-mds', '--keyring', '/var/lib/ceph/bootstrap-mds/ceph.keyring', 'auth', 'get-or-create', 'mds.en-439--215-005', 'osd', 'allow *', 'mds', 'allow', 'mon', 'allow rwx', '-o', '/var/lib/ceph/mds/ceph-en-439--215-005/keyring']' returned non-zero exit status 1
[8:43] <shoosah> does anybody have any idea?!
[8:49] <yy-nm> do you check the keyring file under /var/lib/ceph/bootstrap-mds/ ?
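
One way to do the check yy-nm suggests is to compare the local bootstrap key with the one the monitors hold (assumes admin access on a mon node):

    sudo cat /var/lib/ceph/bootstrap-mds/ceph.keyring
    ceph auth get client.bootstrap-mds     # the two keys must match
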
[8:49] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[8:49] * ChanServ sets mode +v andreask
[8:50] <nerdtron> ceph-deploy purge [nodes] to destroy all settings and start over
[8:51] * sleinen (~Adium@eduroam-4-152.epfl.ch) has joined #ceph
[8:51] <nerdtron> ceph-deploy install [nodes] to automatically create the keys..
[8:51] <nerdtron> follow this http://ceph.com/docs/master/start/quick-start-preflight/
[8:52] <joelio> nerdtron: get the latest ceph-deploy
[8:52] <nerdtron> and this http://ceph.com/docs/master/start/quick-ceph-deploy/
[8:52] <shoosah> nerdtron thanks but Im following the same link
[8:52] <nerdtron> you won't be lost when you follow that
[8:53] <shoosah> ceph actually doesnt create the keyring in the right folder
[8:53] <shoosah> then I had to copy the keyring manually!
[8:53] <shoosah> or sometimes I do need to make directory
[8:54] <nerdtron> during the "install ceph" and "add mons" steps the keys should be generated automatically...if not, you uninstall ceph and "purge" it
[8:54] * sleinen1 (~Adium@2001:620:0:25:1e2:46ae:a53a:17ef) has joined #ceph
[8:54] <nerdtron> so that the next time you run the command again, it will generate the keys for you
[8:56] <shoosah> yy-nm: I just check the bootstrap-mds, the ceph.keyring is already in there
[8:56] * dirk (~dirk@nrbg-4dbfd808.pool.mediaWays.net) Quit (Remote host closed the connection)
[8:56] <yy-nm> ....., are you sure about the key-generation part?
[8:57] <yy-nm> shoosah, do you have admin keyring file ?
[8:57] <shoosah> I already created two osds, so I'm pretty sure the keys are generated already
[8:58] <nerdtron> shoosah copy the contents of the /etc/ceph/ of the nodes to your front end
[8:59] <nerdtron> be sure that the permissions are 644
[8:59] * sleinen (~Adium@eduroam-4-152.epfl.ch) Quit (Ping timeout: 480 seconds)
[9:00] * allsystemsarego (~allsystem@188.25.130.190) has joined #ceph
[9:01] <joelio> get the latest ceph-deploy, from gitbuilder or master - there have been lots of bugs squished in the past 2 weeks
[9:03] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:05] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:06] <shoosah> nerdtron : where shall I exactly copy ceph.client.admin.keyring in?
[9:06] * rongze (~quassel@li565-182.members.linode.com) Quit (Remote host closed the connection)
[9:06] <nerdtron> on what folder are you executing ceph-deploy command?
[9:07] <shoosah> home/usr/my-cluster
[9:07] * rongze (~quassel@notes4.com) has joined #ceph
[9:07] <nerdtron> /etc/ceph/ go to that folder, copy the /etc/ceph/ of the nodes, also copy the keyrings...
[9:07] <shoosah> the content of etc/ceph are already in there
[9:07] <nerdtron> wait, gather keys should copy the keyrings to you
[9:08] <shoosah> yes it did already!
[9:08] * mschiff (~mschiff@p4FD7EB63.dip0.t-ipconnect.de) has joined #ceph
[9:08] <nerdtron> permissions?
[9:08] <shoosah> I didnt get you 644?!
[9:08] <nerdtron> yes
[9:09] <nerdtron> persmission of the ceph.conf and the keyrings
[9:10] <shoosah> here is the content of my ceph.conf :
[9:10] <shoosah> [global]
[9:10] <shoosah> fsid = 14ff0ac1-5ecf-4491-9d38-198c903c6e37
[9:10] <shoosah> mon initial members = en-439--215-005
[9:10] <shoosah> mon host = 130.216.217.46
[9:10] <shoosah> auth supported = cephx
[9:10] <shoosah> osd journal size = 1024
[9:10] <shoosah> filestore xattr use omap = true
[9:10] <shoosah> osd crush chooseleaf type = 0
[9:10] <joelio> pastebin dude
[9:10] * tnt (~tnt@109.130.80.16) Quit (Ping timeout: 480 seconds)
[9:11] <joelio> also, there is a ceph-deploy admin command for deploying admin keys and config
[9:11] <joelio> as I said though, I'd use latest ceph-deploy. You may find you're hitting bugs
[9:11] <nerdtron> ls -l will list permissions? are you sure the permissions are 644?
[9:12] <joelio> seriously, if you're messing about with file perms, something's wrong :)
[9:12] <shoosah> nerdtron I just checked them out, its not actually 644
[9:13] <shoosah> how I am supposed to change it?
[9:13] <shoosah> joelio Im struggling to not go through the whole thing again :D
[9:13] <nerdtron> yeah dude...just made a working ceph cluster a few weeks ago
[9:13] <nerdtron> cd /etc/ceph
[9:13] <nerdtron> chmod 644 *
[9:14] <nerdtron> wait....you don't know how to change permissions and you're building a ceph cluster?? wow
[9:15] <joelio> shoosah: you will waste more time hitting the bugs, than what it would take to purge, get the gitbuilder package, install - but that's just me
[9:15] <shoosah> Im totally new and so confused
[9:16] <joelio> I'd try and do the install with someone who has some unix skills though; it's not hard, but you may struggle if not (plus you learn too!) :)
[9:17] <nerdtron> so was it? you managed to change the permissions?
[9:17] <nerdtron> do it on all nodes just to make sure
[9:18] <shoosah> yes nerdtron
[9:18] <shoosah> thats what I did
[9:18] <nerdtron> try ceph deploy again
[9:18] <shoosah> and it still didnt work out!
[9:18] * joelio gives up
[9:18] <nerdtron> with sudo?
[9:20] <shoosah> yeah
[9:20] * saabylaptop (~saabylapt@46.30.211.3) has joined #ceph
[9:20] <nerdtron> still the same error?
[9:21] <shoosah> yes unfortunately, I guess joelio is right
[9:21] <shoosah> I do need to purge and clean everything!
[9:21] <nerdtron> yes..that what i said..ceph-deploy uninstall
[9:22] <nerdtron> ceph-deploy purge and purge all
[9:22] <nerdtron> also be sure that you can ssh passwordless
[9:22] <nerdtron> and run sudo passwordless too
[9:22] <nerdtron> on all nodes ofcourse
[9:22] <shoosah> I did that too
[9:23] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:23] <shoosah> alright
[9:23] <joelio> shoosah: what distro/version?
[9:23] <joelio> sudo passwordless isn't needed
[9:25] <shoosah> nerdtron purge all doesnt really work, does it?!
[9:26] * shoosah (~ssha637@en-279303.engad.foe.auckland.ac.nz) Quit (Quit: Konversation terminated!)
[9:27] <nerdtron> sorry, purgedata and purge and uninstall....it purges the keys and conf so that when you run ceph-deploy install and mon create, new keys will be generated
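
A sketch of the start-over flow nerdtron outlines; hostnames are placeholders:

    ceph-deploy purgedata node1 node2
    ceph-deploy purge node1 node2
    # reinstall and recreate the mon so that fresh keys are generated
    ceph-deploy new node1
    ceph-deploy install node1 node2
    ceph-deploy mon create node1
    ceph-deploy gatherkeys node1
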
[9:28] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[9:29] * scuttlemonkey (~scuttlemo@mdc2036d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[9:32] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Always try to be modest, and be proud about it!)
[9:38] * john_barbee (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) Quit (Read error: Operation timed out)
[9:38] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[9:39] * MACscr (~Adium@c-98-214-103-147.hsd1.il.comcast.net) Quit (Ping timeout: 480 seconds)
[9:41] * john_barbee (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) has joined #ceph
[9:48] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:52] * doubleg (~doubleg@69.167.130.11) Quit (Remote host closed the connection)
[9:52] * sleinen1 (~Adium@2001:620:0:25:1e2:46ae:a53a:17ef) Quit (Read error: Connection reset by peer)
[9:52] * doubleg (~doubleg@69.167.130.11) has joined #ceph
[9:57] <Gugge-47527> 2013-08-07 09:55:37.065563 7fd97de17700 0 log [WRN] : slow request 960.926798 seconds old, received at 2013-08-07 09:39:36.138694: osd_op(client.17481.1:11555349 rb.0.273f.238e1f29.000000035651 [read 1114112~65536] 9.416b754e RETRY=-1 e5593) currently reached pg
[9:57] <Gugge-47527> how do i fix an error like that?
[9:57] <Gugge-47527> should there be a pg id in the end or something?
[9:59] * sleinen1 (~Adium@2001:620:0:26:1052:a4ab:45d8:ea38) has joined #ceph
[9:59] <nerdtron> have you configured ntp?
[10:01] <absynth> no, the error message is complete as is
[10:01] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[10:01] <absynth> slow requests are often a result of i/o saturation on an OSD
[10:01] <nerdtron> ceph -s what is the output?
[10:03] * Vincent_Valentine (~Vincent_V@115.119.113.218) has joined #ceph
[10:04] <Gugge-47527> absynth: osd.0 is bored, and that paste was after a restart, before the restart the slow request was 10 hours old
[10:04] * LeaChim (~LeaChim@2.222.172.106) has joined #ceph
[10:04] <Gugge-47527> i guess i could crank up debug and restart the osd again :)
[10:04] <madkiss> Gugge-47527: do you have your journals on an SSD?
[10:05] <absynth> Gugge-47527: a likely fix is often marking the respective OSD as down instead of restarting it
[10:05] <absynth> that often fixes singular slow requests (i.e. they simply disappear)
[10:05] <madkiss> err. the solution to slow requests is not kicking OSDs out of the cluster at random
[10:05] * tziOm (~bjornar@194.19.106.242) has joined #ceph
[10:05] <Gugge-47527> madkiss: yes, the journal is on ssd
[10:05] <absynth> exactly, that's why you usually mark them down instead of restarting them
[10:05] <madkiss> Gugge-47527: how many OSDs share the SSD for their journals?
[10:06] <Gugge-47527> 4
[10:06] <madkiss> and what does iostat tell you about the utilization of the ssd?
[10:07] <Gugge-47527> util at around 40%, wwait at 3-4ms and rwait at 0.8-1ms
[10:07] <madkiss> okay, what ceph version?
[10:07] <Gugge-47527> 0.61.7
[10:07] <madkiss> i see
[10:08] <Gugge-47527> and its just that single io that stalls (a single rbd)
[10:08] <Gugge-47527> all other rbd's work fine
[10:08] <Gugge-47527> with read and writes, to osd.0 too :)
[10:08] <madkiss> hm
[10:09] <absynth> is that a singular issue i.e. are we talking about *one* slow request ever, or does this happen often?
[10:09] <Gugge-47527> first time
[10:09] <madkiss> in that case, absynth is probably right ;)
[10:09] <absynth> doesn't sound very scientific, but i'd say this has to do with sun radiation, the moon phase or something
[10:09] <absynth> don't worry too much about it
[10:10] <Gugge-47527> marking it down, and up again?
[10:10] <absynth> you don't need to mark it up
[10:10] <absynth> it will report that it was wrongly marked down and nothing will happen actually
[10:10] <absynth> at least that was the behavior in 0.56
[10:11] <Gugge-47527> done, and the rbd retried the write and i now have
[10:11] <Gugge-47527> 2013-08-07 10:12:05.233398 7fad89f2a700 0 log [WRN] : slow request 30.906445 seconds old, received at 2013-08-07 10:11:34.326899: osd_op(client.17481.1:11555349 rb.0.273f.238e1f29.000000035651 [read 1114112~65536] 9.416b754e RETRY=-1 e5605) currently reached pg
[10:12] <madkiss> ahum. fishy OSD?
[10:12] <Gugge-47527> nothing strange in any logs
[10:14] <absynth> hm, that's strange
[10:14] <Gugge-47527> can i check which pg is in use?
[10:14] <Gugge-47527> and run a scrub or something on that?
[10:14] <absynth> yeah, but i don't remember how
[10:14] <absynth> lemme ask oliver
[10:16] <absynth> you can dump the operations that are in flight
[10:16] <absynth> ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight
[10:16] <absynth> (on osd.0, of course)
[10:17] <Gugge-47527> yep :)
[10:17] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[10:18] <Gugge-47527> http://paste2.org/9yYPbp1f
[10:18] <Gugge-47527> the new op, is me trying to do a rados get on the object
[10:18] <Gugge-47527> it stalled too
[10:19] <absynth> uh
[10:19] <absynth> waiting_for_osdmap
[10:19] <absynth> that is a weird reason for stalling
[10:19] <absynth> madkiss: any idea?
[10:19] * oliver1 (~oliver@p4FD06F72.dip0.t-ipconnect.de) has joined #ceph
[10:21] <madkiss> ahum
[10:21] <Gugge-47527> and if i keep dumping the ops in flight, i see other ops going by fine :)
[10:22] <madkiss> some network weirdones?
[10:22] <Gugge-47527> the rados get was from the local machine running the osd
[10:22] * jefferai (~quassel@corkblock.jefferai.org) Quit (Quit: No Ping reply in 180 seconds.)
[10:22] * odyssey4me (~odyssey4m@165.233.205.190) has joined #ceph
[10:23] * jefferai (~quassel@corkblock.jefferai.org) has joined #ceph
[10:23] <Gugge-47527> if i could find the pg containing rb.0.273f.238e1f29.000000035651, i could try to read other stuff from the same pg
[10:25] <oliver1> Gugge-47527: Hi, if you have the image-name you could try a "ceph osd map <pool> <image>", output ala:
[10:25] <oliver1> "osdmap e59832 pool '9997' (15) object 'vm-103-disk-1.rbd' -> pg 15.ceda74ab (15.2b) -> up [1,62] acting [1,62]"
[10:25] <madkiss> ceph osd getmap -o /tmp/osdmap && osdmaptool --test-map-object rb.0.273f.238e1f29.000000035651 /tmp/osdmap
[10:26] <madkiss> should tell you which pg it is.
[10:26] <Gugge-47527> image is the object?
[10:26] <Gugge-47527> ohh, the rbd
[10:27] <oliver1> the rbd, yes. Having all the same prefix "rb.0.273f.238e1f29"
[10:28] <Gugge-47527> so image would be rb.0.273f.238e1f29.000000035651 or rb.0.273f.238e1f29 ?
[10:28] <madkiss> http://www.hastexo.com/resources/hints-and-kinks/which-osd-stores-specific-rados-object
[10:28] <madkiss> might come in handy
[10:29] <Gugge-47527> ohh this is interresting
[10:30] <Gugge-47527> up [5,3] acting [0,5]
[10:30] <Gugge-47527> osdmap e5606 pool 'rbd' (9) object 'rb.0.273f.238e1f29.000000035651' -> pg 9.416b754e (9.54e) -> up [5,3] acting [0,5]
[10:32] <Gugge-47527> the cluster is right now both changing size from 1 to 2, and adding a new osd, so i guess its in a not normal state :)
[10:35] <yy-nm> try out the osd.5 maybe can work, i guess
[10:35] <absynth> ah, i think that explains the "waiting for osdmap"
[10:35] <Gugge-47527> 9.54e is active+degraded+remapped+wait_backfill
[10:35] <Gugge-47527> i guess its not on anything else than 0 yet :P
[10:36] <absynth> yeah
[10:36] <absynth> wait for the backfill to complete, then check if your slow request is still there
[10:36] <absynth> ...i guess...
[10:36] <Gugge-47527> ill be back in a week :)
[10:36] <Gugge-47527> if it doesnt fix itself :)
[10:36] <absynth> did you increase the replica count to 2 AND add an osd at the same time?
[10:37] <Gugge-47527> yes :)
[10:37] <absynth> that is... courageous
[10:37] <Gugge-47527> its not in production yet :)
[10:49] * Vincent_Valentine (~Vincent_V@115.119.113.218) Quit (Read error: Connection reset by peer)
[10:49] * VincentValentine (~Vincent_V@115.119.113.218) has joined #ceph
[10:51] * odyssey4me (~odyssey4m@165.233.205.190) Quit (Ping timeout: 480 seconds)
[10:54] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[10:59] * sleinen2 (~Adium@2001:620:0:25:80e0:82c0:8366:15b4) has joined #ceph
[11:03] * sleinen1 (~Adium@2001:620:0:26:1052:a4ab:45d8:ea38) Quit (Ping timeout: 480 seconds)
[11:04] * yy-nm (~chatzilla@115.196.74.105) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 22.0/20130618035212])
[11:05] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) Quit (Remote host closed the connection)
[11:19] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[11:24] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[11:24] * KindOne- (KindOne@h104.211.89.75.dynamic.ip.windstream.net) has joined #ceph
[11:25] * KindOne- is now known as KindOne
[11:29] * steki (~steki@198.199.65.141) has joined #ceph
[11:33] * BManojlovic (~steki@91.195.39.5) Quit (Ping timeout: 480 seconds)
[11:34] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[11:38] * steki (~steki@198.199.65.141) Quit (Ping timeout: 480 seconds)
[11:38] <ccourtaut> loicd: hi
[11:39] <loicd> ccourtaut: \o I'm writing a transcript of the erasure coding session to make sure I did not miss anything. This is tedious ... but useful ;-)
[11:39] <ccourtaut> loicd: indeed
[11:39] <ccourtaut> loicd: i was looking the talk about s3 compliance of radosgw
[11:39] <ccourtaut> i wrote a proposal of naming convention in BNF form
[11:40] <ccourtaut> https://github.com/kri5/ceph/blob/wip-s3-compliance-doc/doc/radosgw/s3_compliance.rst
[11:40] <ccourtaut> loicd: can you take a look at it for advice?
[11:59] * john_barbee (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[12:01] <loicd> ccourtaut: sure
[12:01] * sleinen1 (~Adium@2001:620:0:25:48:4600:f1d8:1fd) has joined #ceph
[12:01] * john_barbee (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) has joined #ceph
[12:02] <loicd> ccourtaut: could you give me an example of how that would be used ?
[12:03] * bergerx_ (~bekir@78.188.204.182) Quit (Quit: Leaving.)
[12:04] * sleinen2 (~Adium@2001:620:0:25:80e0:82c0:8366:15b4) Quit (Ping timeout: 480 seconds)
[12:21] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:29] * sleinen1 (~Adium@2001:620:0:25:48:4600:f1d8:1fd) Quit (Quit: Leaving.)
[12:29] * sleinen (~Adium@eduroam-4-152.epfl.ch) has joined #ceph
[12:31] * VincentValentine (~Vincent_V@115.119.113.218) Quit (Ping timeout: 480 seconds)
[12:37] * sleinen (~Adium@eduroam-4-152.epfl.ch) Quit (Ping timeout: 480 seconds)
[12:40] * huangjun (~kvirc@221.234.156.126) Quit (Read error: Connection reset by peer)
[12:48] * Vincent_Valentine (~Vincent_V@115.119.113.218) has joined #ceph
[12:50] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[12:50] * ChanServ sets mode +v andreask
[12:57] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Read error: Operation timed out)
[13:00] * nerdtron (~kenneth@202.60.8.252) Quit (Ping timeout: 480 seconds)
[13:02] * Vincent_Valentine (~Vincent_V@115.119.113.218) Quit (Ping timeout: 480 seconds)
[13:05] * Vincent_Valentine (~Vincent_V@183.82.2.214) has joined #ceph
[13:09] * john_barbee (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) Quit (Read error: Operation timed out)
[13:14] * Fetch_ (fetch@gimel.cepheid.org) Quit (Remote host closed the connection)
[13:14] * AfC (~andrew@2001:44b8:31cb:d400:f4e1:b1cd:48bf:d89c) has joined #ceph
[13:15] * Vincent_Valentine (~Vincent_V@183.82.2.214) Quit (Ping timeout: 480 seconds)
[13:16] <cfreak201> Can I somehow reduce the ceph-osd load? Currently a 250 MB/s dd-write is eating 4 cores on my two storage nodes... since the ceph-osd process shows up in top as high cpu consumption, I doubt it's related to the underlying FS?
[13:16] <cfreak201> network connection is 10g between all nodes...
[13:25] <ccourtaut> loicd: as an example, the GET Bucket acl header option AccessControlList would be something like req_GET_Bucket_acl_AccessControlList
[13:26] <ccourtaut> to have a unique name to put in the code, so we can track it back
[13:27] <loicd> so you would have a permalink such as https://github.com/kri5/ceph/blob/wip-s3-compliance-doc/doc/radosgw/s3_compliance.rst#req_GET_Bucket_acl_AccessControlList that would lead you directly to the relevant entry in the table ?
[13:29] <loicd> if reading the table you would read "req_GET_Bucket_acl_AccessControlList" in the (left cell ?) and then get log -S'req_GET_Bucket_acl_AccessControlList' to find the code.
[13:30] <loicd> if reading the code you would just click on the // https://github.com/kri5/ceph/blob/wip-s3-compliance-doc/doc/radosgw/s3_compliance.rst#req_GET_Bucket_acl_AccessControlList URL and your browser would do the rest ? probably most convenient when browsing the code from the web too.
[13:30] <loicd> Am I understanding correctly what you're after ?
[13:33] * CliMz (~CliMz@194.88.193.33) has joined #ceph
[13:34] * julian (~julianwa@125.69.104.58) has joined #ceph
[13:34] <ccourtaut> loicd: yes, exactly as we discussed it yesterday
[13:34] * julian (~julianwa@125.69.104.58) Quit ()
[13:35] <loicd> ok, sounds perfect to me. Nice idea to propose a naming scheme that makes it simple to avoid name clashes ;-)
[13:35] * Fetch_ (fetch@gimel.cepheid.org) has joined #ceph
[13:37] <ccourtaut> loicd: thanks, the idea came from the discussion; since naming would be an issue, a naming convention seemed appropriate to me :)
[13:37] * CliMz (~CliMz@194.88.193.33) Quit (Remote host closed the connection)
[13:41] * CliMz (~CliMz@194.88.193.33) has joined #ceph
[13:46] * john_barbee (~jbarbee@108-236-99-170.lightspeed.iplsin.sbcglobal.net) has joined #ceph
[13:51] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[13:51] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[13:51] * ChanServ sets mode +v andreask
[13:58] <loicd> ccourtaut: working on the transcript of http://www.youtube.com/watch?v=-K8bSHx7zJ0 using VLC to watch it and Alt-left Alt-right to seek 10 seconds back/forward. Neat trick to avoid installing flash ( because gnash, for some reason, seeks back 10 minutes instead of 10 seconds when used in firefox ). A life saver :-)
[13:58] <ccourtaut> loicd: great! :)
[13:59] * AfC (~andrew@2001:44b8:31cb:d400:f4e1:b1cd:48bf:d89c) Quit (Quit: Leaving.)
[13:59] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[14:03] <Kioob`Taff> Hi
[14:04] <Kioob`Taff> is there a CLI tool to easily store objects in ceph ?
[14:04] <Kioob`Taff> or should I install radosgw + a swift client ?
[14:04] * sleinen (~Adium@2001:620:0:26:88c5:8d40:accd:b370) has joined #ceph
[14:05] <Kioob`Taff> I can use RBD + dd, but it looks ugly :p
[14:10] * mikedawson_ (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[14:15] <Kioob`Taff> What I would like is, from a bash script, be able to do : MY_VAR=`rados get conf/foo`
[14:15] <brother> Kioob`Taff: 'rados put'
[14:15] * diegows (~diegows@190.190.11.42) has joined #ceph
[14:15] <Kioob`Taff> great :)
[14:16] <Kioob`Taff> « Write object name to the cluster with contents from infile »
[14:16] <Kioob`Taff> no way to use STDIN or STDOUT ?
[14:16] <Kioob`Taff> "-"
[14:16] <Kioob`Taff> wonderful :)
[14:16] <Kioob`Taff> thanks brother
[14:17] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:17] <brother> you're welcome
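
The round-trip Kioob`Taff was after, using "-" for stdin/stdout as brother points out; the pool name "conf" is from his own example:

    # store stdin as object "foo" in pool "conf"
    echo "some value" | rados -p conf put foo -
    # read it back into a shell variable
    MY_VAR=$(rados -p conf get foo -)
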
[14:19] * john_barbee (~jbarbee@108-236-99-170.lightspeed.iplsin.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[14:19] <loicd> sjust: would you mind moving https://github.com/ceph/ceph/blob/wip-erasure-coding-doc/doc/dev/osd_internals/erasure_coding.rst to master ? I'm inserting in the transcript of the session & other places, it would be good that they are permalinks ;)
[14:20] * john_barbee (~jbarbee@165.sub-70-198-67.myvzw.com) has joined #ceph
[14:20] <loicd> s/inserting/inserting URLs/
[14:22] * agh (~oftc-webi@gw-to-666.outscale.net) has joined #ceph
[14:22] <agh> Hello, do you have any best practice to fully monitor a Ceph Cluster ? (tools, way to get metrics, etc) ?
[14:23] <agh> I used to use Zabbix, is there somewhere a Zabbix template already done ?
[14:23] * sprachgenerator (~sprachgen@c-50-141-192-36.hsd1.il.comcast.net) has joined #ceph
[14:24] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[14:25] * markbby (~Adium@168.94.245.2) has joined #ceph
[14:37] <loicd> ccourtaut: unfortunately vlc 2.0.6 / 2.0.8 crashes every 5 minutes when I use Alt-left. But reporting on #videolan
[14:38] <ccourtaut> loicd: :/
[14:39] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:39] * Vincent_Valentine (Vincent_Va@49.206.158.155) has joined #ceph
[14:41] <loicd> ccourtaut: I wonder if vlc can be used to download youtube video. Maybe part of the problem comes from code dealing with streamed videos ?
[14:42] <ccourtaut> loicd: i really have no clue on this one
[14:54] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) has joined #ceph
[14:56] * sleinen (~Adium@2001:620:0:26:88c5:8d40:accd:b370) Quit (Quit: Leaving.)
[14:57] * sleinen (~Adium@eduroam-4-236.epfl.ch) has joined #ceph
[15:04] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) Quit (Remote host closed the connection)
[15:05] * sleinen (~Adium@eduroam-4-236.epfl.ch) Quit (Ping timeout: 480 seconds)
[15:11] * BillK (~BillK-OFT@124-148-246-233.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[15:12] * john_barbee (~jbarbee@165.sub-70-198-67.myvzw.com) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 22.0/20130618035212])
[15:13] * sleinen1 (~Adium@2001:620:0:26:b880:7c20:eca6:d7c7) has joined #ceph
[15:13] * agh (~oftc-webi@gw-to-666.outscale.net) Quit (Quit: Page closed)
[15:14] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[15:14] * ChanServ sets mode +v andreask
[15:16] * sleinen1 (~Adium@2001:620:0:26:b880:7c20:eca6:d7c7) Quit ()
[15:20] * sprachgenerator (~sprachgen@c-50-141-192-36.hsd1.il.comcast.net) Quit (Quit: sprachgenerator)
[15:20] * sleinen1 (~Adium@2001:620:0:25:f10a:4bb7:b988:bf0) has joined #ceph
[15:27] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[15:30] <niklas> what does it mean if "ceph osd map" lists four osds at the bottom which do not seem to belong to any host?
[15:31] * KevinPerks (~Adium@216.1.187.162) has joined #ceph
[15:32] <joelio> niklas: ceph osd map? root@vm-ds-01:~# ceph osd map
[15:32] <joelio> unknown command map
[15:32] <joelio> ceph osd tree?
[15:32] <niklas> yep, osd tree was what I meant
[15:32] <niklas> sry
[15:32] <joelio> n/p
[15:33] <joelio> want to pastebin it so we can see?
[15:36] <niklas> http://pastebin.com/WQisPgyQ
[15:36] <niklas> There are more hosts before that
[15:36] <niklas> but they seem fine
[15:37] <joelio> hmm, strange, looks like they're orphaned from a host
[15:37] <joelio> you should be able to readd I would have though
[15:37] <niklas> how would I do that?
[15:38] <niklas> I can find out on which host they actually run with osd dump, but that does not solve the problem
[15:40] * sprachgenerator (~sprachgen@c-50-141-192-36.hsd1.il.comcast.net) has joined #ceph
[15:41] * sprachgenerator (~sprachgen@c-50-141-192-36.hsd1.il.comcast.net) Quit ()
[15:43] * sprachgenerator (~sprachgen@c-50-141-192-36.hsd1.il.comcast.net) has joined #ceph
[15:45] * sprachgenerator (~sprachgen@c-50-141-192-36.hsd1.il.comcast.net) Quit ()
[15:57] * rongze (~quassel@notes4.com) Quit (Remote host closed the connection)
[15:57] * rongze (~quassel@117.79.232.203) has joined #ceph
[16:02] * sleinen1 (~Adium@2001:620:0:25:f10a:4bb7:b988:bf0) Quit (Ping timeout: 480 seconds)
[16:02] <niklas> Any ideas?
[16:05] <joelio> niklas: sorry, been afk - I'd imagine you'd need to check the crushmap - either that or delete the orphaned OSDs and readd
[16:05] <janos> niklas - i'd do a manual crush set
[16:06] <janos> something like ceph osd crush set {osd name} {weight} root=blah rack=blah2 host=hostblah
[16:06] <janos> with the appropriate values of course ;)
[16:06] <joelio> janos: good to know
[16:07] <niklas> I'll check on that as soon as my cluster recovered from the changes I just made
[16:07] <janos> they appear to be weight 0 in your paste
[16:07] <janos> i'd add them one by one and verify after each
[16:07] <janos> and let rebalances occur
[16:07] <janos> or heck
[16:07] <janos> put in at weight 0
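
A concrete instance of janos's template, added at weight 0 as he suggests; the osd name and location buckets are placeholders:

    ceph osd crush set osd.10 0 root=default rack=rack1 host=node3
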
[16:09] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[16:10] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit ()
[16:17] * ishkabob (~c7a82cc0@webuser.thegrebs.com) has joined #ceph
[16:18] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[16:22] <niklas> works, thanks
[16:23] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Quit: Konversation terminated!)
[16:25] * sleinen1 (~Adium@2001:620:0:26:3144:ea4c:4a15:b3f3) has joined #ceph
[16:25] * sleinen1 (~Adium@2001:620:0:26:3144:ea4c:4a15:b3f3) Quit ()
[16:26] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[16:27] <joelio> /wi2
[16:27] <joelio> dohhhh
[16:28] <niklas> Is it that when I remove an osd from a cluster and then start creating new osds, some of the new osds get the same id as the old ones?
[16:28] <niklas> thus everything breaks when I decide to readd the old ones?
[16:29] * ishkabob (~c7a82cc0@webuser.thegrebs.com) Quit (Quit: TheGrebs.com CGI:IRC (EOF))
[16:34] * sleinen1 (~Adium@2001:620:0:25:f158:5d94:2ea5:5754) has joined #ceph
[16:37] <niklas> I have one unfound pg
[16:37] <niklas> how do I find out which one it is?
[16:38] * sleinen1 (~Adium@2001:620:0:25:f158:5d94:2ea5:5754) Quit ()
[16:39] <Kioob`Taff> niklas: ceph health detail
[16:39] <janos> niklas, there really should not be old ones to add
[16:39] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[16:39] * BManojlovic (~steki@91.195.39.5) Quit (Ping timeout: 480 seconds)
[16:39] <janos> remove old ones - making them effectively not-ones
[16:39] <janos> so even though id's get re-used, it's still only new ones
[16:41] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[16:41] * n1md4 (~nimda@anion.cinosure.com) Quit (Ping timeout: 480 seconds)
[16:42] * alfredodeza (~alfredode@216.1.187.162) has joined #ceph
[16:47] * The_Bishop (~bishop@2001:470:50b6:0:b90d:9906:f15e:d46b) has joined #ceph
[16:51] * alfredodeza (~alfredode@216.1.187.162) Quit (Remote host closed the connection)
[16:57] * alfredodeza (~alfredode@216.1.187.162) has joined #ceph
[16:58] <cmdrk> Do my RBD client machines need to have the same ceph.conf as the rest of the cluster? Or do they just need to contain the [client] section of ceph.conf?
[16:59] <tnt> cmdrk: they at least need the [mon] sections to get the IP of the mons.
[16:59] * sleinen1 (~Adium@2001:620:0:26:e8f2:c655:8528:b6fb) has joined #ceph
[17:00] * MACscr (~Adium@c-98-214-103-147.hsd1.il.comcast.net) has joined #ceph
[17:00] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[17:00] <cmdrk> gotcha, ok
[17:00] <cmdrk> thanks
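
A minimal client-side ceph.conf along the lines tnt describes, just enough to find the monitors and authenticate; the address, mon name, and keyring path are placeholders:

    [global]
        auth supported = cephx
        keyring = /etc/ceph/ceph.client.admin.keyring

    [mon.a]
        mon addr = 192.0.2.1:6789
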
[17:01] <cmdrk> oh, one more question -- can a rbd disk be mounted on multiple hosts simultaneously?
[17:01] <cmdrk> i'm assuming no
[17:02] <guppy> cmdrk: no
[17:02] <cmdrk> thanks
[17:03] * joao (~JL@2607:f298:a:607:9eeb:e8ff:fe0f:c9a6) has joined #ceph
[17:03] * ChanServ sets mode +o joao
[17:05] * saabylaptop (~saabylapt@46.30.211.3) Quit (Quit: Leaving.)
[17:09] * BManojlovic (~steki@91.195.39.5) Quit (Ping timeout: 480 seconds)
[17:10] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has left #ceph
[17:12] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[17:15] * KevinPerks (~Adium@216.1.187.162) Quit (Quit: Leaving.)
[17:15] * diegows (~diegows@190.190.11.42) Quit (Ping timeout: 480 seconds)
[17:18] * joao (~JL@2607:f298:a:607:9eeb:e8ff:fe0f:c9a6) Quit (Ping timeout: 480 seconds)
[17:20] * sleinen1 (~Adium@2001:620:0:26:e8f2:c655:8528:b6fb) Quit (Quit: Leaving.)
[17:20] * CliMz (~CliMz@194.88.193.33) Quit (Ping timeout: 480 seconds)
[17:22] * tziOm (~bjornar@194.19.106.242) Quit (Remote host closed the connection)
[17:25] * alfredodeza (~alfredode@216.1.187.162) Quit (Remote host closed the connection)
[17:31] * Cube (~Cube@38.122.20.226) has joined #ceph
[17:33] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Read error: Operation timed out)
[17:37] * sleinen1 (~Adium@2001:620:0:26:448b:f29f:73be:5a4) has joined #ceph
[17:44] * sprachgenerator (~sprachgen@130.202.135.222) has joined #ceph
[17:47] * devoid (~devoid@130.202.135.223) has joined #ceph
[17:49] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:52] * Cube (~Cube@38.122.20.226) Quit (Quit: Leaving.)
[17:58] * oliver1 (~oliver@p4FD06F72.dip0.t-ipconnect.de) has left #ceph
[18:00] <Gugge-47527> cmdrk: an rbd can be mapped on all the hosts you want at the same time
[18:00] <Gugge-47527> cmdrk: most filesystems can only be mounted on one host at a time though
[18:03] * CliMz (~CliMz@AAnnecy-651-1-352-57.w90-10.abo.wanadoo.fr) has joined #ceph
[18:03] * joao (~JL@mc35636d0.tmodns.net) has joined #ceph
[18:03] * ChanServ sets mode +o joao
[18:03] <loicd> ccourtaut: done ! http://wiki.ceph.com/01Planning/CDS/Emperor/Transcript_:_Erasure_coded_storage_backend_%28step_2%29
[18:04] * tnt (~tnt@109.130.80.16) has joined #ceph
[18:07] * leseb (~leseb@88-190-214-97.rev.dedibox.fr) Quit (Killed (NickServ (Too many failed password attempts.)))
[18:07] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[18:07] * leseb (~leseb@88-190-214-97.rev.dedibox.fr) has joined #ceph
[18:09] * alfredodeza (~alfredode@mba2036d0.tmodns.net) has joined #ceph
[18:11] <cmdrk> Gugge-47527: Ok - do I get a nice error or something if it's attempting to be mounted multiple places simultaneously?
[18:12] <janos> cmdrk: i'd assume you just get nice corruption
[18:14] * bandrus (~Adium@38.106.55.34) has joined #ceph
[18:17] * CliMz (~CliMz@AAnnecy-651-1-352-57.w90-10.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[18:24] * alfredodeza (~alfredode@mba2036d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[18:26] <cmdrk> janos: well, at least it's nice :)
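
What Gugge-47527 and janos describe, in command form; mapping is safe everywhere, mounting a non-cluster filesystem is not (pool/image names are placeholders):

    sudo rbd map rbd/myimage                  # fine on any number of hosts
    sudo mount /dev/rbd/rbd/myimage /mnt      # ext4/xfs: on ONE host only, or you get corruption
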
[18:28] * sleinen1 (~Adium@2001:620:0:26:448b:f29f:73be:5a4) Quit (Quit: Leaving.)
[18:29] * sleinen (~Adium@eduroam-4-246.epfl.ch) has joined #ceph
[18:29] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[18:32] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:32] * devoid (~devoid@130.202.135.223) Quit (Ping timeout: 480 seconds)
[18:37] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Quit: Konversation terminated!)
[18:37] * sleinen (~Adium@eduroam-4-246.epfl.ch) Quit (Ping timeout: 480 seconds)
[18:40] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[18:40] * joao (~JL@mc35636d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[18:43] * bandrus (~Adium@38.106.55.34) Quit (Quit: Leaving.)
[18:46] * LeaChim (~LeaChim@2.222.172.106) Quit (Ping timeout: 480 seconds)
[18:54] * LeaChim (~LeaChim@97e00998.skybroadband.com) has joined #ceph
[18:57] <troug> Getting a failure on ceph-deploy prepare for a virtual disk in a ubuntu vm. Any reason why this shouldn't work?
[18:57] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:58] <troug> INFO:ceph-disk:Will colocate journal with data on /dev/sdb
[18:58] <troug> umount: /var/lib/ceph/tmp/mnt.MwSmOC: device is busy.
[19:00] * Vincent_Valentine (Vincent_Va@49.206.158.155) Quit (Ping timeout: 480 seconds)
[19:03] * Vincent_Valentine (~Vincent_V@49.206.158.155) has joined #ceph
[19:11] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[19:23] * diegows (~diegows@200.68.116.185) has joined #ceph
[19:27] * odyssey4me (~odyssey4m@41-133-58-101.dsl.mweb.co.za) has joined #ceph
[19:59] * mschiff (~mschiff@p4FD7EB63.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[20:02] * BillK (~BillK-OFT@124-148-246-233.dyn.iinet.net.au) has joined #ceph
[20:03] * odyssey4me (~odyssey4m@41-133-58-101.dsl.mweb.co.za) Quit (Ping timeout: 480 seconds)
[20:04] * Machske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[20:05] * haomaiwang (~haomaiwan@106.120.176.101) Quit (Ping timeout: 480 seconds)
[20:05] * rongze (~quassel@117.79.232.203) Quit (Ping timeout: 480 seconds)
[20:15] * indeed (~indeed@206.124.126.33) has joined #ceph
[20:31] * devoid (~devoid@130.202.135.223) has joined #ceph
[20:38] * dhsmith (~dhsmith@206-212-235-34.dynamic.onlinenw.com) has joined #ceph
[20:39] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[20:43] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[20:49] * ntranger (~ntranger@proxy2.wolfram.com) Quit ()
[20:57] <grepory> do inconsistent pgs eventually repair themselves?
[20:58] <grepory> i think in like 4 weeks of uptime i just got my first alert about a pg not being active+clean… active+clean+inconsistent. i guess i could have waited to see what happened, but started the repair manually.
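
The manual repair grepory mentions is driven by the pgid reported by ceph health detail; the pgid below is a placeholder:

    ceph health detail | grep inconsistent    # note the pgid, e.g. 2.7b
    ceph pg repair 2.7b
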
[21:01] * Vincent_Valentine (~Vincent_V@49.206.158.155) Quit (Ping timeout: 480 seconds)
[21:03] * alfredodeza (~alfredode@mba2036d0.tmodns.net) has joined #ceph
[21:06] * alfredodeza (~alfredode@mba2036d0.tmodns.net) Quit (Read error: Connection reset by peer)
[21:09] * rongze (~quassel@117.79.232.191) has joined #ceph
[21:11] * BillK (~BillK-OFT@124-148-246-233.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[21:16] * Vincent_Valentine (~Vincent_V@49.206.158.155) has joined #ceph
[21:31] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[21:34] * allsystemsarego (~allsystem@188.25.130.190) Quit (Quit: Leaving)
[21:37] * devoid (~devoid@130.202.135.223) Quit (Quit: Leaving.)
[21:43] * devoid (~devoid@130.202.135.223) has joined #ceph
[21:54] * alfredodeza (~alfredode@mba2036d0.tmodns.net) has joined #ceph
[21:55] * madkiss (~madkiss@2001:6f8:12c3:f00f:24f4:40e7:22e3:d5ff) Quit (Ping timeout: 480 seconds)
[21:56] * haomaiwang (~haomaiwan@117.79.232.191) has joined #ceph
[21:59] * mschiff (~mschiff@85.182.236.82) has joined #ceph
[22:02] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[22:02] * doubleg (~doubleg@69.167.130.11) Quit (Ping timeout: 480 seconds)
[22:02] * markbby (~Adium@168.94.245.2) has joined #ceph
[22:15] * KevinPerks (~Adium@38.106.55.34) has joined #ceph
[22:18] * KevinPerks (~Adium@38.106.55.34) has left #ceph
[22:20] * alfredodeza (~alfredode@mba2036d0.tmodns.net) Quit (Remote host closed the connection)
[22:22] * alfredodeza (~alfredode@mba2036d0.tmodns.net) has joined #ceph
[22:23] * Vincent_Valentine (~Vincent_V@49.206.158.155) Quit (Ping timeout: 480 seconds)
[22:25] * joao (~JL@mc35636d0.tmodns.net) has joined #ceph
[22:25] * ChanServ sets mode +o joao
[22:32] * alfredod_ (~alfredode@mba2036d0.tmodns.net) has joined #ceph
[22:32] * alfredodeza (~alfredode@mba2036d0.tmodns.net) Quit (Remote host closed the connection)
[22:37] * jeff-YF (~jeffyf@67.23.117.122) Quit (Ping timeout: 480 seconds)
[22:59] * sleinen (~Adium@2001:620:0:25:8ce0:88a9:4629:accd) has joined #ceph
[23:08] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[23:08] * ChanServ sets mode +v andreask
[23:08] * dhsmith (~dhsmith@206-212-235-34.dynamic.onlinenw.com) Quit (Ping timeout: 480 seconds)
[23:16] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[23:18] * alfredod_ (~alfredode@mba2036d0.tmodns.net) Quit (Remote host closed the connection)
[23:24] * indeed (~indeed@206.124.126.33) Quit (Remote host closed the connection)
[23:26] * sprachgenerator (~sprachgen@130.202.135.222) Quit (Quit: sprachgenerator)
[23:27] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley)
[23:32] * joao (~JL@mc35636d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[23:33] * indeed (~indeed@206.124.126.33) has joined #ceph
[23:45] * portante is now known as portante|afk
[23:59] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.