#ceph IRC Log

IRC Log for 2013-08-12

Timestamps are in GMT/BST.

[0:12] * janisg (~troll@85.254.50.23) Quit (Ping timeout: 480 seconds)
[0:13] * janisg (~troll@85.254.50.23) has joined #ceph
[0:19] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[0:20] * AaronSchulz_ (~chatzilla@216.38.130.164) has joined #ceph
[0:21] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[0:22] * AaronSchulz (~chatzilla@216.38.130.164) Quit (Ping timeout: 480 seconds)
[0:23] * AaronSchulz_ is now known as AaronSchulz
[0:24] * LeaChim (~LeaChim@176.27.136.68) has joined #ceph
[0:29] * mschiff_ (~mschiff@port-28418.pppoe.wtnet.de) has joined #ceph
[0:33] * mschiff (~mschiff@port-28418.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[0:35] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:57] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[0:57] * ChanServ sets mode +o scuttlemonkey
[1:06] <lurbs> Anyone else getting a 403 on http://gitbuilder.ceph.com ?
[1:26] <tnt> yup same here
[1:39] * mschiff_ (~mschiff@port-28418.pppoe.wtnet.de) Quit (Remote host closed the connection)
[1:52] * danieagle_ (~Daniel@177.97.249.166) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[1:58] * AfC (~andrew@ppp244-218.static.internode.on.net) has joined #ceph
[2:10] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[2:11] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[2:12] * AfC (~andrew@ppp244-218.static.internode.on.net) Quit (Remote host closed the connection)
[2:12] * AfC (~andrew@2407:7800:200:1011:7199:70c5:7209:403d) has joined #ceph
[2:14] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[2:17] * AfC (~andrew@2407:7800:200:1011:7199:70c5:7209:403d) Quit (Remote host closed the connection)
[2:17] * AfC (~andrew@2407:7800:200:1011:7199:70c5:7209:403d) has joined #ceph
[2:21] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:32] * huangjun (~kvirc@111.174.91.224) has joined #ceph
[2:40] * AfC (~andrew@2407:7800:200:1011:7199:70c5:7209:403d) Quit (Remote host closed the connection)
[2:40] * AfC (~andrew@2407:7800:200:1011:7199:70c5:7209:403d) has joined #ceph
[2:55] * danieagle (~Daniel@177.97.249.166) has joined #ceph
[2:55] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley)
[2:57] * nerdtron (~kenneth@202.60.8.252) has joined #ceph
[2:57] * nerdtron (~kenneth@202.60.8.252) Quit ()
[3:01] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[3:05] * yy-nm (~chatzilla@122.233.47.137) has joined #ceph
[3:09] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[3:10] * LeaChim (~LeaChim@176.27.136.68) Quit (Ping timeout: 480 seconds)
[3:34] * silversurfer (~jeandanie@124x35x46x15.ap124.ftth.ucom.ne.jp) has joined #ceph
[3:35] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[3:37] * jaydee (~jeandanie@124x35x46x12.ap124.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[3:38] * AfC (~andrew@2407:7800:200:1011:7199:70c5:7209:403d) Quit (Read error: Connection reset by peer)
[3:38] * AfC (~andrew@2407:7800:200:1011:7199:70c5:7209:403d) has joined #ceph
[3:40] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[3:42] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[3:43] <shimo> are the gitbuilder builds up? getting lots of 403 forbiddens
[3:44] * DarkAce-Z (~BillyMays@50.107.55.36) Quit (Read error: Operation timed out)
[3:46] * jaydee (~jeandanie@124x35x46x12.ap124.ftth.ucom.ne.jp) has joined #ceph
[3:47] <lurbs> shimo: I'm getting the same thing.
[3:50] * silversurfer (~jeandanie@124x35x46x15.ap124.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[3:51] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[3:55] <shimo> kind of sucks because ceph's forks of apache and fastcgi are distributed through the gitbuilder
[3:56] <shimo> so falling back to a stable release is not enough..
[4:00] * DarkAceZ (~BillyMays@50.107.55.36) has joined #ceph
[4:06] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[4:15] <lurbs> Is there a fix for bug 5599 (http://tracker.ceph.com/issues/5599) in the works? It jams up passing a block device, rather than an already existing partition, for the second and subsequent journals to ceph-deploy.
[4:16] * yy-nm (~chatzilla@122.233.47.137) Quit (Read error: Connection reset by peer)
[4:17] * yy-nm (~chatzilla@122.233.47.137) has joined #ceph
[4:20] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[4:26] * julian (~julianwa@125.70.132.20) has joined #ceph
[4:26] * danieagle (~Daniel@177.97.249.166) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[4:26] * julian (~julianwa@125.70.132.20) Quit (Read error: Connection reset by peer)
[4:27] * julian (~julianwa@125.70.132.20) has joined #ceph
[4:27] * julian (~julianwa@125.70.132.20) Quit (Read error: Connection reset by peer)
[4:27] * julian (~julianwa@125.70.132.20) has joined #ceph
[4:28] * julian_ (~julianwa@125.70.132.20) has joined #ceph
[4:28] * julian_ (~julianwa@125.70.132.20) Quit (Read error: Connection reset by peer)
[4:29] * julian_ (~julianwa@125.70.132.20) has joined #ceph
[4:31] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[4:35] <lurbs> Naive patch to fix it, BTW: http://paste.nothing.net.nz/48e17a
[4:35] * julian (~julianwa@125.70.132.20) Quit (Ping timeout: 480 seconds)
[4:47] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:47] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[5:05] * fireD (~fireD@93-139-188-49.adsl.net.t-com.hr) has joined #ceph
[5:07] * fireD_ (~fireD@78-0-237-141.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:23] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[5:24] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit ()
[5:25] * L2SHO_ (~adam@office-nat.choopa.net) has joined #ceph
[5:29] * L2SHO (~adam@office-nat.choopa.net) Quit (Ping timeout: 480 seconds)
[5:38] * nerdtron (~kenneth@202.60.8.252) has joined #ceph
[5:53] * fireD (~fireD@93-139-188-49.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:55] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[6:07] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[6:09] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit ()
[7:18] * jjgalvez (~jjgalvez@ip68-231-104-168.ph.ph.cox.net) has joined #ceph
[7:19] * Machske (~Bram@81.82.216.124) Quit ()
[7:27] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[7:27] * KindTwo (KindOne@h98.215.89.75.dynamic.ip.windstream.net) has joined #ceph
[7:27] * KindTwo is now known as KindOne
[7:32] * madkiss (~madkiss@2001:6f8:12c3:f00f:d005:8e65:ae52:766e) has joined #ceph
[7:33] * madkiss1 (~madkiss@2001:6f8:12c3:f00f:54b3:ddf0:d0ab:2531) has joined #ceph
[7:40] * madkiss (~madkiss@2001:6f8:12c3:f00f:d005:8e65:ae52:766e) Quit (Ping timeout: 480 seconds)
[7:41] * madkiss1 (~madkiss@2001:6f8:12c3:f00f:54b3:ddf0:d0ab:2531) Quit (Quit: Leaving.)
[7:41] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[7:41] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit ()
[7:48] * Rom (~Rom@c-107-3-156-152.hsd1.ca.comcast.net) has joined #ceph
[7:51] <Rom> Hi all... I'm trying to setup ceph without ceph-deploy because I need a custom config (public and cluster networks) - does anyone know if there are instructions that do not involve ceph-deploy? I can't seem to find anything complete on that.
[7:55] * lightspeed (~lightspee@81.187.0.153) Quit (Ping timeout: 480 seconds)
[8:00] <jjgalvez> You can actually modify the ceph.conf generated by the ceph-deploy new command, before running any of the next steps. You can implement the public and cluster network directives in that file then run the rest of the deployment commands.
[8:01] <Rom> ahh, I wasn't sure if ceph-deploy would change it again
[8:01] <Rom> I already have the config file I want to use
[8:01] <Rom> thanks!
[8:01] <yy-nm> using mkcephfs?
[8:01] * wer (~wer@206-248-239-142.unassigned.ntelos.net) Quit (Remote host closed the connection)
[8:02] <huangjun> i want to use ceph in hadoop, what should i do?
[8:02] <Rom> Since mkcephfs is deprecated I wasn't sure if it still worked right..
[8:04] <yy-nm> it still works, but you need to do more work using mkcephfs than using ceph-deploy
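A minimal sketch of what jjgalvez describes above: after "ceph-deploy new" writes the initial ceph.conf, the network directives go into its [global] section before the remaining deploy steps are run (the subnets here are placeholders):

    [global]
    # client-facing traffic
    public network = 192.168.10.0/24
    # OSD replication and heartbeat traffic
    cluster network = 192.168.20.0/24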
[8:09] * KrisK (~krzysztof@213.17.226.11) has joined #ceph
[8:10] * wer (~wer@206-248-239-142.unassigned.ntelos.net) has joined #ceph
[8:16] * AfC1 (~andrew@59.167.244.218) has joined #ceph
[8:16] * AfC (~andrew@2407:7800:200:1011:7199:70c5:7209:403d) Quit (Read error: Connection reset by peer)
[8:24] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:30] * AfC1 (~andrew@59.167.244.218) Quit (Quit: Leaving.)
[8:31] * AfC (~andrew@2407:7800:200:1011:7199:70c5:7209:403d) has joined #ceph
[8:35] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[8:46] * wiwengweng (~oftc-webi@183.62.249.162) has joined #ceph
[8:55] <Rom> Hrm, why is ceph-deploy not creating keys under bootstrap-osd or bootstrap-mds... The monitor creates fine and starts up
[8:56] <Rom> it doesn't even create the directories
[9:02] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[9:03] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:05] <Rom> is /var/lib/ceph/bootstrap-osd/ceph.keyring the same as /var/lib/ceph/mon/ceph-XXX/keyring ? I have no idea which step actually creates the bootstrap directories
[9:06] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[9:07] <jjgalvez> the keyring under /var/lib/ceph/mon/ceph-*/keyring will be different than the one under the bootstrap-osd directory. Have the monitors reached a quorum?
[9:11] <Rom> 3 mons at {sn1=xxx.xxx.xxx.11:6789/0,sn2=xxx.xxx.xxx.12:6789/0,sn3=xxx.xxx.xxx.13:6789/0}, election epoch 8, quorum 0,1,2 node1,node2,node3
[9:12] <Rom> is there any way for me to manually create the bootstrap stuff?
[9:12] <jjgalvez> have you run the gatherkeys command?
[9:13] <Rom> yep, fails because it can't find the keys under bootstrap-osd and bootstrap-mds
[9:16] <Rom> according to some of the documentation I found, the files won't be there if the monitors don't create correctly... I tried the mon create command a couple of times, but no difference. I'm wondering if it expects me to delete the mons entirely first and try it again
[9:16] <jjgalvez> do you see the ceph-create-keys process running? I believe that is what makes the bootstrap keys
[9:16] <Rom> no, although I wasn't watching the processes when creating the monitors - everything seemed to work fine
[9:18] * vipr (~vipr@78-21-225-176.access.telenet.be) has joined #ceph
[9:18] <jjgalvez> my running cluster actually still has the process running, maybe try running that manually:
[9:18] <jjgalvez> /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i <mon_host>
[9:18] <jjgalvez> make sure it's on the mon_host
[9:19] * fireD (~fireD@93-139-146-165.adsl.net.t-com.hr) has joined #ceph
[9:19] <Rom> does each monitor have unique bootstrap keys, or are they the same among them?
[9:25] <Rom> Weird... looks like node3 actually did have the keys, but node1 and node2 did not!
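A rough sketch of the manual check jjgalvez suggests, run on the monitor host itself (the mon id "sn1" is a placeholder taken from Rom's quorum listing):

    # re-run the key bootstrap by hand on the monitor host
    sudo /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i sn1
    # the bootstrap keyrings should then exist
    ls /var/lib/ceph/bootstrap-osd/ceph.keyring /var/lib/ceph/bootstrap-mds/ceph.keyring
    # after which gatherkeys should succeed from the admin node
    ceph-deploy gatherkeys sn1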
[9:31] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:36] * tnt (~tnt@91.177.243.62) Quit (Ping timeout: 480 seconds)
[9:36] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[9:45] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:46] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Read error: Operation timed out)
[9:48] * AfC (~andrew@2407:7800:200:1011:7199:70c5:7209:403d) Quit (Quit: Leaving.)
[9:52] <Rom> okay, looks like I am active+clean with 18 OSD's.. I want to add public/cluster addresses to each OSD - do I just add the appropriate definitions in ceph.conf, and restart? Right now it's dynamically picking up the OSD's (they are not in ceph.conf at all)
[9:53] <Gugge-47527> yes, just add the public/private networks in ceph.conf, and it will pick up the right addresses when you restart the osd's
[9:54] <Rom> cool - thanks!
[9:54] <jjgalvez> oh cool, glad you figured out the keys
[9:54] <Rom> yep, thanks jjgalvez :)
[9:59] <jjgalvez> np, enjoy your new ceph cluster :)
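For reference, a hedged sketch of the restart step Gugge-47527 mentions, on a Cuttlefish-era node; ceph-deploy-created OSDs on Ubuntu are upstart jobs, while sysvinit hosts use the init script (the osd id is a placeholder):

    sudo restart ceph-osd id=0            # upstart-managed OSD
    sudo /etc/init.d/ceph restart osd.0   # sysvinit-managed OSD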
[10:00] * jjgalvez (~jjgalvez@ip68-231-104-168.ph.ph.cox.net) has left #ceph
[10:04] * infinitytrapdoor (~infinityt@134.95.27.132) has joined #ceph
[10:07] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[10:09] * lightspeed (~lightspee@81.187.0.153) has joined #ceph
[10:31] * allsystemsarego (~allsystem@188.25.131.161) has joined #ceph
[10:35] * mschiff (~mschiff@pD9510218.dip0.t-ipconnect.de) has joined #ceph
[10:37] * CliMz (~CliMz@194.88.193.33) has joined #ceph
[10:38] * vipr (~vipr@78-21-225-176.access.telenet.be) Quit (Ping timeout: 480 seconds)
[10:38] <Rom> Is there a good way of visualizing the storage cluster? Almost something that you could put up on one of your NOC screens and it shows the status of all your OSDs, capacities, read/write/iops activity, etc?
[10:38] * X3NQ (~X3NQ@195.191.107.205) has joined #ceph
[10:38] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[10:39] * LeaChim (~LeaChim@176.27.136.68) has joined #ceph
[10:40] * indego (~indego@91.232.88.10) has joined #ceph
[10:43] * jaydee (~jeandanie@124x35x46x12.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[10:47] * vipr (~vipr@78-21-225-176.access.telenet.be) has joined #ceph
[10:49] * mathlin (~mathlin@dhcp2-pc112059.fy.chalmers.se) Quit (Read error: Connection reset by peer)
[10:49] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[10:52] * sakari (sakari@turn.ip.fi) Quit (Ping timeout: 480 seconds)
[10:53] * mathlin (~mathlin@dhcp2-pc112059.fy.chalmers.se) has joined #ceph
[10:53] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[10:55] * bergerx_ (~bekir@78.188.204.182) has joined #ceph
[10:56] * sakari (sakari@turn.ip.fi) has joined #ceph
[11:01] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) Quit (Quit: Leaving)
[11:18] * yy-nm (~chatzilla@122.233.47.137) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 22.0/20130618035212])
[11:54] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Quit: shimo)
[11:57] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[11:58] * wiwengweng (~oftc-webi@183.62.249.162) Quit (Remote host closed the connection)
[12:04] <nerdtron> ceph osd tree
[12:04] <nerdtron> ceph -w
[12:04] <nerdtron> ceph osd dump
[12:05] <nerdtron> Then write a script that will extract information from these commands and will display the output to a webpage or something
[12:11] <joelio> --format=json also handy option
[12:12] <joelio> ceph health --format=json
[12:12] <joelio> ceph osd dump --format=json
[12:16] <nerdtron> how do you use the json files?
[12:17] <joelio> JSON is a standard data structure format, so it'd be easier to create a script that would parse the output properly, in an extensible fashion
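As an illustration of joelio's point, a small sketch (the "osds" key reflects the osd dump JSON layout as best remembered, so treat it as an assumption):

    # pretty-print the structured health output
    ceph health --format=json | python -m json.tool
    # count OSDs from the structured dump
    ceph osd dump --format=json | python -c 'import sys, json; print(len(json.load(sys.stdin)["osds"]))'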
[12:18] * deadsimple (~infinityt@134.95.27.132) has joined #ceph
[12:23] * infinitytrapdoor (~infinityt@134.95.27.132) Quit (Ping timeout: 480 seconds)
[12:24] * joelio about to embark on writing a nagios plugin
[12:25] <joelio> I've seen some dreamhost stuff via a google, looks quite rudimentary, so I'm going to extend one with perfdata etc.
[12:27] * brother| (foobaz@2a01:7e00::f03c:91ff:fe96:ab16) has joined #ceph
[12:27] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[12:29] * brother (foobaz@vps1.hacking.dk) Quit (Ping timeout: 480 seconds)
[12:31] <loicd> I'm seeing 403 on http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/master/version . Is it just me ?
[12:32] <shimo> it's been like that for ~12 hours
[12:32] <shimo> US people still sleeping probably
[12:32] <loicd> oh well :-)
[12:33] <loicd> 403 => don't wake me up, sleeping
[12:55] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Download IceChat at www.icechat.net)
[12:57] * huangjun (~kvirc@111.174.91.224) Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[13:01] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[13:20] * yanzheng (~zhyan@134.134.139.72) has joined #ceph
[13:23] * SpamapS (~clint@xencbyrum2.srihosting.com) Quit (Ping timeout: 480 seconds)
[13:52] * mathlin (~mathlin@dhcp2-pc112059.fy.chalmers.se) Quit (Read error: Connection reset by peer)
[13:54] * mathlin (~mathlin@dhcp2-pc112059.fy.chalmers.se) has joined #ceph
[13:55] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[14:06] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[14:06] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[14:11] * nerdtron (~kenneth@202.60.8.252) Quit (Ping timeout: 480 seconds)
[14:11] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:11] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[14:12] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:14] * brother| is now known as brother
[14:19] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[14:21] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:34] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley)
[14:53] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:58] * dosaboy (~dosaboy@host31-51-54-207.range31-51.btcentralplus.com) has joined #ceph
[14:59] * yanzheng (~zhyan@134.134.139.72) Quit (Ping timeout: 480 seconds)
[15:03] * dosaboy_ (~dosaboy@host81-156-124-131.range81-156.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[15:03] * jeff-YF (~jeffyf@64.191.222.109) has joined #ceph
[15:06] * allsystemsarego (~allsystem@188.25.131.161) Quit (Quit: Leaving)
[15:11] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has left #ceph
[15:12] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[15:13] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[15:13] * jeff-YF (~jeffyf@64.191.222.109) Quit (Ping timeout: 480 seconds)
[15:21] * yanzheng (~zhyan@101.83.185.144) has joined #ceph
[15:23] * dosaboy_ (~dosaboy@host81-152-9-117.range81-152.btcentralplus.com) has joined #ceph
[15:28] * dosaboy__ (~dosaboy@host109-157-181-219.range109-157.btcentralplus.com) has joined #ceph
[15:29] * dosaboy (~dosaboy@host31-51-54-207.range31-51.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[15:33] * dosaboy_ (~dosaboy@host81-152-9-117.range81-152.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[15:43] * KrisK (~krzysztof@213.17.226.11) Quit (Quit: KrisK)
[15:47] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[15:47] * ChanServ sets mode +v andreask
[15:53] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[15:54] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[15:58] * Josh_ (~IceChat9@rrcs-74-218-204-10.central.biz.rr.com) has joined #ceph
[15:59] <Josh_> Hello is inktank in here yet?
[15:59] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[15:59] <alfredodeza> Josh_: what do you mean?
[16:00] <Josh_> This is my first time doing this, is this the ceph geek on duty channel?
[16:00] * alram (~alram@208.86.100.62) has joined #ceph
[16:00] <alfredodeza> usually a lot of ceph devs are here, yes
[16:01] <Josh_> Has anyone had an issue where OSDs will not start after host reboot?
[16:01] <Josh_> *using ceph-deploy
[16:02] <alfredodeza> what are your logs saying?
[16:04] <Josh_> Nothing, that is the hard part. I used ceph-deploy osd create osdserver1:sdb:/dev/ssd1. Ceph log shows proper creation, osd is created, marked up and in. OSD stays up and is part of crush, however when I issue initctl list | grep ceph the osd does not show. ceph-deploy osd stop/start does not work. Restart and the OSD will never come back up, with no log info
[16:09] * huangjun (~kvirc@58.51.149.211) has joined #ceph
[16:10] <Josh_> is there another location for logs other than /etc/ceph/ceph.log?
[16:12] <Josh_> I looked in /var/log/ceph/ceph-osd.X.log and they are all empty
[16:12] * BillK (~BillK-OFT@124-169-72-15.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[16:14] * lx0 is now known as lxo
[16:14] <huangjun> Josh_: if you didn't set the location, it will under /var/log/ceph
[16:16] <Josh_> yes, osd logs are empty
[16:16] * deadsimple (~infinityt@134.95.27.132) Quit ()
[16:17] <Josh_> I just created another osd storing the journal on a ssd, the log creates it fine however near the end it says journal _open /var/lib/ceph/osd/ceph-12/journal fd 26: 68568481792 bytes, block size 4096 bytes, directio = 1, aio = 1 : does that mean it did not store the journal correctly?
[16:17] <Josh_> the journal should be placed on /dev/sda3
[16:23] * gregmark (~Adium@68.87.42.115) has joined #ceph
[16:23] * yanzheng (~zhyan@101.83.185.144) Quit (Quit: Leaving)
[16:25] <huangjun> no, if you use the ceph-deploy tool to deploy your cluster, it will create a journal for you
[16:26] <Josh_> So it looks like it is fine and working. Now when I restart the host that OSD will never come back up. Do you know of anything I can try?
[16:27] <huangjun> never come back? does the osd process fail?
[16:27] <huangjun> or /etc/init.d/ceph just output nothing?
[16:28] <Josh_> The creation process works fine. init.d does not work since it is ceph-deploy. sudo start ceph-osd id=1 gives me unknown ceph/1
[16:29] * clayb (~kvirc@proxy-ny2.bloomberg.com) has joined #ceph
[16:29] <alfredodeza> why would init.d not work?
[16:29] <alfredodeza> ceph-deploy helps you get things configured, but you can certainly use init.d to manage everything
[16:29] <Josh_> I get /etc/init.d/ceph: osd.1 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )
[16:29] <alfredodeza> it is basically a jump-start
[16:29] <alfredodeza> so you don't need to configure everything manually
[16:29] <alfredodeza> that is all
[16:30] <alfredodeza> aha, well that seems like that is a problem
[16:30] <alfredodeza> what version of ceph-deploy are you using?
[16:31] <Josh_> how to I find that
[16:32] <alfredodeza> actually, I am working on adding that flag. But I can tell from when you installed it
[16:32] <Gugge-47527> ceph-deploy actually uses upstart on ubuntu, so the init.d script wont work :)
[16:32] <alfredodeza> there was a big release on Friday
[16:32] <huangjun> if you want to start a daemon on a remote host, you should set it in ceph.conf, and if you want to start a local osd daemon, then you should mount /dev/sdX on /var/lib/ceph
[16:33] <alfredodeza> ah, right, in Ubuntu it will use upstart
[16:33] <alfredodeza> so what Gugge-47527 said is spot on
[16:33] <Josh_> ok I used apt-get about 2 weeks ago
[16:35] * sprachgenerator (~sprachgen@130.202.135.205) has joined #ceph
[16:36] <Josh_> So, steps (correct me if I am wrong): I still need to place the mount points in fstab, and manually edit ceph.conf?
[16:37] <Josh_> or does the update fix the need
[16:37] * julian_ (~julianwa@125.70.132.20) Quit (Quit: afk)
[16:39] <alfredodeza> I am not sure if we have the new RPM/DEB packages for ceph-deploy
[16:39] <alfredodeza> we did a release for the Python Package Index on friday
[16:39] <alfredodeza> if you are familiar with Python install tools (e.g. `pip`) you can try that
[16:39] <alfredodeza> otherwise, circle around in a few hours so I can confirm the OS packages are ready
[16:39] <alfredodeza> in the meantime, yes, you could add them manually
[16:40] <Josh_> perfect I will give that a try thank you
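A hedged sketch of the manual fallback huangjun and alfredodeza describe (device, osd id and hostname are placeholders); upstart/ceph-disk OSDs are normally mounted automatically from their GPT partition labels, so this is only needed when that does not happen:

    # /etc/fstab - mount the OSD data partition where the init script expects it
    /dev/sdb1  /var/lib/ceph/osd/ceph-1  xfs  noatime  0 0

    # /etc/ceph/ceph.conf - tell the sysvinit script which daemons live on this host
    [osd.1]
        host = osdserver1

    # then: sudo /etc/init.d/ceph start osd.1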
[16:40] * jeff-YF (~jeffyf@67.23.117.122) Quit (Quit: jeff-YF)
[16:45] * jeff-YF (~jeffyf@216.14.83.26) has joined #ceph
[16:49] * SpamapS (~clint@xencbyrum2.srihosting.com) has joined #ceph
[16:58] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[17:01] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[17:05] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Operation timed out)
[17:13] * jeff-YF (~jeffyf@216.14.83.26) Quit (Quit: jeff-YF)
[17:18] * CliMz (~CliMz@194.88.193.33) Quit (Quit: Leaving)
[17:18] * jeff-YF (~jeffyf@216.14.83.26) has joined #ceph
[17:18] * madkiss (~madkiss@chello080108036100.31.11.vie.surfer.at) has joined #ceph
[17:21] * jeff-YF_ (~jeffyf@67.23.123.228) has joined #ceph
[17:24] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:26] * jeff-YF (~jeffyf@216.14.83.26) Quit (Ping timeout: 480 seconds)
[17:26] * jeff-YF_ is now known as jeff-YF
[17:27] * verdurin (~adam@46-65-111-12.zone16.bethere.co.uk) has joined #ceph
[17:31] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Quit: jlogan1)
[17:33] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:34] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[17:43] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[17:44] * Guest2911 (~coyo@thinks.outside.theb0x.org) Quit (Quit: om nom nom delicious bitcoins...)
[17:46] * jeff-YF (~jeffyf@67.23.123.228) Quit (Quit: jeff-YF)
[17:51] * devoid (~devoid@130.202.135.246) has joined #ceph
[17:53] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[17:54] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[17:54] * xdeller (~xdeller@91.218.144.129) has joined #ceph
[17:58] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[17:59] * madkiss (~madkiss@chello080108036100.31.11.vie.surfer.at) Quit (Quit: Leaving.)
[18:02] * yanzheng (~zhyan@101.83.108.65) has joined #ceph
[18:02] * Coyo (~coyo@thinks.outside.theb0x.org) has joined #ceph
[18:02] * Coyo is now known as Guest3106
[18:02] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:10] * huangjun (~kvirc@58.51.149.211) Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[18:11] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[18:12] * NelsonJeppesen (~oftc-webi@199.181.135.135) has joined #ceph
[18:14] * tnt (~tnt@91.177.243.62) has joined #ceph
[18:15] * joao (~JL@2607:f298:a:607:9eeb:e8ff:fe0f:c9a6) has joined #ceph
[18:15] * ChanServ sets mode +o joao
[18:16] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[18:20] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[18:23] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[18:28] <NelsonJeppesen> Morning Joao, wonder if you have time to work with me today?
[18:28] <joao> NelsonJeppesen, I'll sure find the time; give me just a few minutes to try wrapping something up
[18:29] <NelsonJeppesen> Thank you.
[18:31] * markbby1 (~Adium@168.94.245.2) has joined #ceph
[18:33] * alram (~alram@208.86.100.62) Quit (Ping timeout: 480 seconds)
[18:38] <joao> NelsonJeppesen, sorry, just a few more minutes
[18:38] * yanzheng (~zhyan@101.83.108.65) Quit (Ping timeout: 480 seconds)
[18:38] <joao> NelsonJeppesen, in the meantime, did you have the chance to read my last email?
[18:39] * madkiss (~madkiss@089144192103.atnat0001.highway.a1.net) has joined #ceph
[18:39] <joao> those waiting times you experienced, which looked like the monitor had hung, were in fact due to store compaction
[18:40] * bergerx_ (~bekir@78.188.204.182) Quit (Quit: Leaving.)
[18:40] <NelsonJeppesen> I did
[18:40] <NelsonJeppesen> I think you're right
[18:40] <joao> you should be able to check if the store is performing IO with iostat or iotop to make sure there's still io going on
[18:41] <NelsonJeppesen> Yea, it was doing IO also I could see disk usage grow then shrink
[18:41] <joao> that would probably be the simplest approach at solving your problems
[18:41] <joao> yeah, compaction works like that
[18:41] <NelsonJeppesen> when the monitor is running normally I see little to no change in disk usage
[18:41] <NelsonJeppesen> How long can the cluster run without a monitor?
[18:42] <joao> NelsonJeppesen, how long has that monitor store been on 220GB of size?
[18:42] <NelsonJeppesen> At least a month
[18:42] <NelsonJeppesen> It was a bug in the early .61 releases
[18:42] <joao> yeah
[18:42] <joao> makes sense
[18:43] <NelsonJeppesen> Before upgrading I had three monitors
[18:43] <NelsonJeppesen> but after the upgrade I could only get one online
[18:43] <joao> it grew to that point and after we introduced the new compaction shenanigans it stopped growing as much
[18:43] <NelsonJeppesen> I wouldn't care about the disk usage, as long as I had three monitors
[18:43] <joao> NelsonJeppesen, that's most likely because of the store size
[18:44] <NelsonJeppesen> Is there a way I could copy the monitor store.db to create a new one?
[18:44] <joao> leveldb iterators can have a tough time with stores that haven't been compacted; or so it seems
[18:45] <NelsonJeppesen> Well, at least cpu and memory load is low.
[18:45] <joao> NelsonJeppesen, yeah, that would be feasible but not the simplest approach -- and it would still mean downtime
[18:45] <joao> now, it would probably be less downtime than the compaction...
[18:45] <NelsonJeppesen> Do you see anyway of getting out of this with less than 15 min of downtime?
[18:47] * Meths (rift@2.25.214.150) Quit (Read error: Connection reset by peer)
[18:47] <joao> I don't know how long it would take to copy the data from one store to the other
[18:47] * Meths (rift@2.25.214.150) has joined #ceph
[18:47] <NelsonJeppesen> I was thinking of a rsync while the node is online, because most of the files seem not to change
[18:47] <NelsonJeppesen> then shut down the monitor, do a final rsync
[18:47] <joao> NelsonJeppesen, I doubt that will help you
[18:48] <joao> because you'd end up with the same files
[18:48] <joao> the same store, with 220GB worth of mostly junk
[18:48] <joao> and you'd still need to compact it
[18:48] <NelsonJeppesen> True, but I can do that twice and get to 3 monitors
[18:48] <NelsonJeppesen> I would have 3 monitors with junk, but could compact one and keep quorum
[18:49] <joao> maybe, if copying it doesn't corrupt the db
[18:49] <joao> well, true
[18:49] <joao> you could try that
[18:49] <joao> let us know if it works
[18:49] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Quit: jlogan1)
[18:49] * alram (~alram@208.86.100.62) has joined #ceph
[18:50] <NelsonJeppesen> in the store.db there are 21k files as an fyi
[18:51] <joao> NelsonJeppesen, if that doesn't work, we can always try to copy the data *out* of the store into a new one; that will mean a patch to one of our tools, but should be feasible to do it in a couple of hours
[18:52] <joao> well, grabbing more coffee
[18:52] <joao> brb
[18:52] * Guest3106 (~coyo@thinks.outside.theb0x.org) Quit (Quit: om nom nom delicious bitcoins...)
[18:52] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Read error: No route to host)
[18:53] <Rom> coffee sounds good..
[18:54] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[18:54] * tchmnkyz (~jeremy@0001638b.user.oftc.net) has joined #ceph
[18:54] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[18:54] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[18:55] <Rom> What is the general consensus on replicas... Is 2 enough, or should we use 3 to be safe? Customer data is always critical, but I'd be interested in knowing what the accepted norm is in production environments.
[18:56] * psieklFH (psiekl@wombat.eu.org) has joined #ceph
[18:57] <tnt> Rom: I use 3 + weekly backups for the 'originals'. Then 2 for everything else that could be regenerated (like thumbnails / converted videos / ...) ...
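For reference, the replica count is a per-pool setting; a hedged example (the pool name is a placeholder):

    ceph osd pool set rbd size 3       # keep three copies of each object
    ceph osd pool set rbd min_size 2   # still serve I/O while only two copies are available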
[18:58] * nlopes (~nlopes@a89-154-18-198.cpe.netcabo.pt) Quit (Quit: leaving)
[19:00] <NelsonJeppesen> joao, could you explain that last option?
[19:00] <NelsonJeppesen> do you think it's the better one?
[19:04] * Coyo (~coyo@thinks.outside.theb0x.org) has joined #ceph
[19:04] <joao> NelsonJeppesen, I believe it is the safest
[19:04] <Rom> Thanks tnt - is that from experience, or just felt like the best way to run it? That is, did you run it with 2 before but had an issue and decided to increase to 3?
[19:04] * Coyo is now known as Guest3115
[19:05] <Gugge-47527> I use 2 copies, and sync all data offsite every 30 minutes
[19:05] <joao> NelsonJeppesen, the best option would be to wait for the compaction to finish; but considering that is not an option because we have no idea how long that will take, the safest approach to copy the store is to do it properly: open two leveldb instances, copy keys and data from one to another
[19:06] <Rom> every 30 mins? How much data are you generating? I'm assuming a small enough dataset that it completes within 30mins?
[19:06] <joao> but maybe copying the stores would work as well; no idea.
[19:06] <Gugge-47527> Rom: using zfs on top of RBD, so incremental sends are easy :)
[19:06] <NelsonJeppesen> To do your recommendation, I would need custom tooling?
[19:07] <joao> NelsonJeppesen, yes
[19:07] <joao> Sage and I agreed that would be a cool feature to have on our ceph-monstore-tool, but we haven't gotten around to implement it
[19:07] <NelsonJeppesen> Is that something you, or your team, could do? I have some scripting and light programming background but I think it's beyond my ability
[19:07] <joao> shouldn't take long assuming I don't get interrupted by other stuff
[19:08] <joao> like monitors being blown to bits
[19:08] <NelsonJeppesen> heh :)
[19:08] * devoid (~devoid@130.202.135.246) Quit (Ping timeout: 480 seconds)
[19:08] <NelsonJeppesen> Thank you. What's the best way to move forward? Should I just ask you about it in a week in this channel?
[19:08] <Rom> Gugge-47527: ZFS on Linux, or BSD? Not even sure if ceph has an RBD driver for BSD...
[19:09] <joao> I'll send an email to the list whenever I got it available for testing
[19:09] <Gugge-47527> Rom: on linux
[19:09] <Gugge-47527> I wish someone would make a GEOM-RBD :)
[19:09] <Gugge-47527> I wish i had the skills to do it :)
[19:10] <joao> NelsonJeppesen, you should really contemplate issuing a full store compaction though; that would be the best way to get it done -- I don't have a bit enough store to test my approach on, so I can't guarantee that it will be faster
[19:10] <joao> s/bit/big
[19:10] <Rom> Gugge-47527: I've used ZFS a lot on OpenSolaris (and derivatives) - in fact my primary SAN is the commercial version of Nexenta, with ZFS at the core - but I've always been reluctant to try it on Linux... Isn't it still FUSE based there?
[19:11] * darkfaded (~floh@88.79.251.60) Quit (Ping timeout: 480 seconds)
[19:11] <NelsonJeppesen> the custom tooling option; would that be a long outage?
[19:12] <Gugge-47527> Rom: zfsonlinux.org, no fuse is not used :)
[19:13] <Rom> Gugge-47527: Cool - I'll take a look. Any stability issues with Ceph RBD/ZFS? How much daily data do you get and how active is your cluster?
[19:13] <Josh_> Are snapshots considered a first line in terms of backups? I have been doing snapshots of all VM images nightly, and then every Sunday I dump all ceph images to a large server with just a bunch of disks. Is this safe, do you think? I have 5 server nodes with 20 OSDs
[19:13] * mschiff (~mschiff@pD9510218.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[19:14] <Gugge-47527> Rom: i get about 30GB daily data, no problems yet
[19:14] <sjust> Josh_: snapshots are not a backup
[19:15] <sjust> if you lose data from the main copy, you will have lost pretty much the same data from the copies
[19:15] * madkiss (~madkiss@089144192103.atnat0001.highway.a1.net) Quit (Quit: Leaving.)
[19:15] * markbby1 (~Adium@168.94.245.2) Quit (Ping timeout: 480 seconds)
[19:15] <sjust> *from the snapshots
[19:15] <Josh_> What is the best option for backing up VM RBD images?
[19:16] <sjust> Josh_: probably take a snapshot and backup from the snapshot
[19:16] <Rom> Josh_: Which VM platform are you using? Are the disks being quiesced at snapshot time?
[19:16] <Rom> Gugge-47527: Thanks :)
[19:19] <Josh_> KVM, I am going to use openstack after their next release. Guests are still running at time of backup. What I had been doing when they were on a local FS was using crashplan to back up the .img file daily. Now I have 70 mixed guests. When you say backup from the snapshot do you have any guide you can point me to?
[19:21] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:25] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[19:26] <Rom> Josh_: What I would be worried about is data-in-flight. Quiescing allows a filesystem to flush all buffers before a snapshot operation. For example, with VMware their tools will automatically quiesce a supported operating system before doing a VM snapshot. For web server type images it's generally not an issue - not many local file changes occur in general. However, on a database or database-backed mailserver, you could end
[19:26] <Rom> up with corrupted database files. In general I would look at what I was backing up and then use the best method for that. For databases I would use native tools, for example, that know how to flush data, lock tables, etc, before the backup is taken. Of course, the alternative is to shut down the image to get a perfect consistency, but that is generally not desirable!
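A rough sketch of the snapshot-then-backup flow sjust and Rom describe (pool, image and paths are placeholders); freezing the guest filesystem first is optional but addresses the in-flight data concern Rom raises:

    # inside the guest, if possible: flush and freeze the filesystem before the snapshot
    fsfreeze -f /srv        # ...and fsfreeze -u /srv once the snapshot exists
    # on a ceph client: snapshot the image, then export the snapshot to a file
    rbd snap create rbd/vm-disk-1@nightly-20130812
    rbd export rbd/vm-disk-1@nightly-20130812 /backups/vm-disk-1-20130812.img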
[19:30] * nhm (~nhm@184-97-255-87.mpls.qwest.net) has joined #ceph
[19:30] * ChanServ sets mode +o nhm
[19:32] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[19:35] * darkfader (~floh@88.79.251.60) has joined #ceph
[19:41] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[19:42] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[19:48] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[19:48] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[19:51] * tnt_ (~tnt@91.176.3.64) has joined #ceph
[19:51] * sjustlaptop (~sam@38.122.20.226) Quit (Read error: Connection reset by peer)
[19:51] * sagelap (~sage@2600:1010:b02a:f6cf:6d05:83c9:ae92:2fdb) has joined #ceph
[19:52] * mschiff (~mschiff@85.182.236.82) has joined #ceph
[19:52] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[19:53] * tnt (~tnt@91.177.243.62) Quit (Ping timeout: 480 seconds)
[19:58] <wrencsok1> how do i verify my mon version? the osd style command (ceph tell osd.$i version) doesn't work, and i don't see anything in the admin daemon stating a version. i'd really like to verify that the version is what i expect it to be beyond the apt-get update/install. tips?
[20:00] * fets (~stef@ylaen.iguana.be) has left #ceph
[20:02] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[20:02] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Read error: Connection reset by peer)
[20:02] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[20:03] <joao> wrencsok1, 'ceph-mon --version'
[20:08] * sjustlaptop (~sam@38.122.20.226) Quit (Quit: Leaving.)
[20:10] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[20:13] * MK_FG (~MK_FG@00018720.user.oftc.net) Quit (Remote host closed the connection)
[20:18] * MK_FG (~MK_FG@00018720.user.oftc.net) has joined #ceph
[20:30] * sjustlaptop (~sam@38.122.20.226) Quit (Quit: Leaving.)
[20:31] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[20:33] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[20:33] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[20:34] * devoid (~devoid@130.202.135.246) has joined #ceph
[20:39] * alfredodeza is now known as alfredo|afk
[20:39] * sjustlaptop (~sam@38.122.20.226) Quit (Ping timeout: 480 seconds)
[20:42] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[20:52] * madkiss (~madkiss@2001:6f8:12c3:f00f:99f7:c177:7146:2e9e) has joined #ceph
[20:53] * sjustlaptop (~sam@38.122.20.226) Quit (Ping timeout: 480 seconds)
[20:53] * allsystemsarego (~allsystem@5-12-241-157.residential.rdsnet.ro) has joined #ceph
[20:54] <NelsonJeppesen> joao, if I were to copy the store.db, would these steps make sense? Assuming the new monitor id is 3?
[20:54] <NelsonJeppesen> 1) ceph-mon -i 3 --mkfs --monmap /tmp/map --keyring /tmp/auth 2) copy store.db from working monitor 3) ceph mon add 3> <ip>[:<port>] 4) ceph-mon -i 3 --public-addr {ip:port}
[20:54] * LeaChim (~LeaChim@176.27.136.68) Quit (Ping timeout: 480 seconds)
[20:55] <joao> NelsonJeppesen, I'd say so, yes
[20:55] <joao> s/3>/3/
[20:56] <joao> NelsonJeppesen, personally, before adding the monitor to the map, I would first try to run the monitor with the copied store
[20:56] <joao> just to make sure the store isn't corrupted
[20:56] * Machske (~Bram@81.82.216.124) has joined #ceph
[20:56] <NelsonJeppesen> What would be the signs of corruption?
[20:57] <joao> it's also likely that you'll have to end up injecting the monmap on the new monitor
[20:57] <joao> NelsonJeppesen, leveldb should fail to open
[20:59] <joao> I can't recall what the code is like, but I'm fairly certain that starting the monitor like that on step 4) will cause it to commit suicide for not being on the monmap -- and since you copied the store, there's this flag in the store that states that said monitor has once been in a quorum; the monitor will infer it was removed from the cluster given it is not on the current monmap, and will commit suicide
[20:59] <joao> so you'll have to inject the monmap into it
[21:00] <NelsonJeppesen> joao, ah ok; Thanks. Yea, I've done the monmap injections before to drop from 3 to 1 monitors. Think that'll be ok.
[21:00] <joao> considering this, the steps would be sort of like '1) grab current monmap; 2) monmaptool --add 3 ip:port /tmp/foo.monmap ; 3) inject monmap into monitors ; 4) restart monitors
[21:01] <joao> NelsonJeppesen, if you decide to go with that approach, make sure you test first if the monitor is able to open the store
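A hedged sketch of joao's outline, with the new monitor's store.db already copied into place (the mon id "3" and the address are placeholders, paths follow the defaults):

    # first, test that the copied store opens at all by running the new mon in the foreground
    ceph-mon -i 3 -d
    # 1) grab the current monmap from the running cluster
    ceph mon getmap -o /tmp/monmap
    # 2) add the new monitor to it
    monmaptool --add 3 192.0.2.13:6789 /tmp/monmap
    # 3) inject the map into the new monitor
    ceph-mon -i 3 --inject-monmap /tmp/monmap
    # 4) then start it normally
    ceph-mon -i 3 --public-addr 192.0.2.13:6789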
[21:01] <joao> I'm going to grab some lunch
[21:01] <joao> let me know how it goes
[21:01] <joao> brb
[21:02] <NelsonJeppesen> joao, thanks. I'll try it tonight.
[21:02] <NelsonJeppesen> I'll reply in the message list with results
[21:04] * LeaChim (~LeaChim@176.248.81.121) has joined #ceph
[21:04] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[21:05] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[21:16] * sjustlaptop (~sam@38.122.20.226) Quit (Ping timeout: 480 seconds)
[21:17] * sagelap (~sage@2600:1010:b02a:f6cf:6d05:83c9:ae92:2fdb) Quit (Read error: Connection reset by peer)
[21:27] * dmick (~dmick@2607:f298:a:607:345b:3f9f:be42:5ae0) has joined #ceph
[21:30] * alfredo|afk is now known as alfredodeza
[21:31] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[21:34] * mschiff (~mschiff@85.182.236.82) Quit (Ping timeout: 480 seconds)
[21:41] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[21:41] * ChanServ sets mode +v andreask
[21:45] <sjust> sage: are you around?
[21:46] * sjustlaptop (~sam@38.122.20.226) Quit (Ping timeout: 480 seconds)
[21:49] * mozg (~andrei@host109-151-35-94.range109-151.btcentralplus.com) has joined #ceph
[21:58] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[22:00] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: If you think nobody cares, try missing a few payments)
[22:02] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[22:04] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (Quit: Leaving.)
[22:14] * sjustlaptop (~sam@38.122.20.226) Quit (Ping timeout: 480 seconds)
[22:14] * yanzheng (~zhyan@101.82.254.119) has joined #ceph
[22:26] <joao> NelsonJeppesen, cool thanks
[22:32] <Kioob> Hi
[22:32] <Kioob> # ceph osd status
[22:32] <Kioob> 2013-08-12 22:24:48.828371 7f991f889700 0 -- :/29563 >> 192.168.0.11:6789/0 pipe(0x2c23a80 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
[22:32] <sjust> that is usually somewhat harmless
[22:32] <sjust> or the mons are down, also possible
[22:32] <Kioob> I have this error on an OSD, the problem here is that I have no 192.168.0.* network
[22:33] <Kioob> so I probably made a mistake somewhere :D but I don't see that network in ceph.conf too
[22:33] <sjust> 192.168.0.11 must be configured somewhere as a mon address
[22:33] <Kioob> how can I dump the mon conf ?
[22:33] <sjust> it would be in your ceph.conf
[22:33] <Kioob> it's not
[22:34] <sjust> cat /etc/ceph/ceph.conf on that machine
[22:34] <Kioob> rofl... you're right... bad synchronisation
[22:34] <Kioob> sorry for the noise
[22:34] <sjust> no worries
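For what it's worth, the address a client dials comes from the local ceph.conf, so a quick check on the node showing the fault would be something like the following (the ceph-conf flags are from memory, so treat them as an assumption):

    grep -A2 '^\[mon' /etc/ceph/ceph.conf
    ceph-conf --name mon.a --lookup 'mon addr'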
[22:36] * sprachgenerator (~sprachgen@130.202.135.205) Quit (Quit: sprachgenerator)
[22:37] * sprachgenerator (~sprachgen@130.202.135.205) has joined #ceph
[22:40] * mschiff (~mschiff@tmo-103-66.customers.d1-online.com) has joined #ceph
[22:42] * allsystemsarego (~allsystem@5-12-241-157.residential.rdsnet.ro) Quit (Quit: Leaving)
[22:47] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Quit: Leaving.)
[22:51] * mschiff (~mschiff@tmo-103-66.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[22:52] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[22:58] * sagelap (~sage@182.sub-70-197-5.myvzw.com) has joined #ceph
[22:59] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[23:00] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[23:01] * zhyan_ (~zhyan@101.82.225.231) has joined #ceph
[23:03] * zhyan__ (~zhyan@101.82.244.95) has joined #ceph
[23:05] * psieklFH (psiekl@wombat.eu.org) Quit (Remote host closed the connection)
[23:05] * psieklFH (psiekl@wombat.eu.org) has joined #ceph
[23:07] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[23:07] * yanzheng (~zhyan@101.82.254.119) Quit (Ping timeout: 480 seconds)
[23:10] * zhyan_ (~zhyan@101.82.225.231) Quit (Ping timeout: 480 seconds)
[23:16] * mschiff (~mschiff@tmo-103-66.customers.d1-online.com) has joined #ceph
[23:18] * jharley (~jharley@69.165.148.187) has joined #ceph
[23:18] <jharley> hey, any rbd/openstack people got a moment for a question? my copy on write glance/cinder behaviour isn't happening, and I've set "glance_api_version" to '2' most everywhere
[23:20] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Quit: Leaving.)
[23:20] * devoid1 (~devoid@130.202.135.246) has joined #ceph
[23:20] * sjustlaptop (~sam@38.122.20.226) Quit (Ping timeout: 480 seconds)
[23:21] * devoid (~devoid@130.202.135.246) Quit (Read error: Connection reset by peer)
[23:22] <sjust> joshd: ^
[23:22] * zhyan__ (~zhyan@101.82.244.95) Quit (Ping timeout: 480 seconds)
[23:23] <joshd> jharley: you've got show_direct_url=True in glance-api.conf's default section?
[23:23] <jharley> joshd: no, 'cause I'm on Grizzly
[23:23] <jharley> do I still need that on Grizzly?
[23:23] <joshd> jharley: yeah, even in folsom
[23:24] <jharley> wait, isn't Folsom older than Grizzly
[23:24] <jharley> ?
[23:24] <jharley> regardless, Grizzly needs 'show_direct_url=True' and 'glance_api_version=2'?
[23:24] * yehudasa_ (~yehudasa@2602:306:330b:1410:95ae:485a:7b55:dc8c) Quit (Remote host closed the connection)
[23:25] <joshd> yes
[23:25] <jharley> ahhh...
[23:25] <jharley> lemme try. Thanks!
[23:25] <joshd> np
[23:26] <jharley> oh, is it "show_image_direct_url"?
[23:26] <joshd> that's the one
[23:26] <jharley> cool.
[23:27] <jharley> sjust: and, thanks for waking up joshd -- much appreciated!
[23:30] <jharley> sweet! that worked
[23:30] <jharley> thanks so much!
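Summarizing the exchange as a hedged config sketch (paths are the usual defaults, and the rest of the RBD driver settings are assumed to be in place already); glance-api and cinder services need a restart to pick this up:

    # /etc/glance/glance-api.conf
    [DEFAULT]
    show_image_direct_url = True

    # /etc/cinder/cinder.conf
    [DEFAULT]
    glance_api_version = 2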
[23:31] * mjevans (~mje@209.141.34.79) has joined #ceph
[23:31] <sjust> jharley: that's what I'm here for
[23:35] * BillK (~BillK-OFT@124-169-72-15.dyn.iinet.net.au) has joined #ceph
[23:36] <mjevans> So, I was setting up a 3 monitor (2 of which host OSDs (one per block device), absolute minimal network raid-1 storage cluster) and restarted it before discovering noout. After re-importing a copy of my desired crushmap and letting things settle I still have ~128 pgs stuck unclean; all of which are either 'active' or 'active+remapped'; how can I force these (or just everything) to re-map and repair?
[23:37] <mjevans> I'm using debian 7.1 with some light pinning to get in 3.9.x kernels and ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
[23:39] <sjust> mjevans: can you post the output of ceph osd tree?
[23:39] <sjust> pool size?
[23:39] * sagelap (~sage@182.sub-70-197-5.myvzw.com) Quit (Read error: Connection reset by peer)
[23:39] <mjevans> sjust: I'll pastebin it
[23:41] <sjust> k
[23:43] <mjevans> sjust: http://pastebin.com/cDaL0XaS
[23:43] <sjust> mjevans: which root is pool 3 using?
[23:44] <sjust> ssd_all?
[23:44] <mjevans> Should be SSD, rule 3, pool -4
[23:44] <mjevans> yeah
[23:44] <sjust> can you post your crush map?
[23:44] <mjevans> All of the OSDs are up
[23:44] <sjust> eyah
[23:44] <mjevans> I'll dump the active map to be sure
[23:45] <sjust> it's probably one of the crush edge cases, crush map?
[23:46] * doxavore (~doug@99-7-52-88.lightspeed.rcsntx.sbcglobal.net) has joined #ceph
[23:47] * jharley (~jharley@69.165.148.187) Quit (Quit: jharley)
[23:47] <mjevans> sjust: http://pastebin.com/gb7x7i4T
[23:47] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:51] <mjevans> Oh, It seems that I hadn't set the default rbd pool to use the hdd_all rule. Well that's fixed now.
[23:52] <sjust> was that the problem?
[23:52] <mjevans> No, my problem is still there
[23:52] <mjevans> I just found something sub-optimal that I corrected which was unrelated to it.
[23:53] <sjust> in rule ssd:
[23:53] <sjust>         step chooseleaf firstn 0 type host
[23:53] <sjust> you only have 1 host in that tree
[23:54] <mjevans> That should be using the hdd_all root (-7) which points to -8 and -9 (though the weight is off) which are hdd_rack0_host0 and hdd_rack1_host1
[23:54] <lurbs> Is the fix for bug 5599 (http://tracker.ceph.com/issues/5599) as simple as copying the code block in ceph-disk that does a partprobe on the data device and modifying it to apply to the journal device, or am I missing something?
[23:55] <sjust> mjevans: can you attach the output of ceph osd dump?
[23:55] <mjevans> Let me see how big that is...
[23:55] <sjust> dmick, alfredodeza: ^ (lurbs)
[23:56] <lurbs> http://paste.uber.geek.nz/a01849
[23:56] <lurbs> Works for me, anyway.
[23:56] <mjevans> sjust: http://pastebin.com/mCCKrscV
[23:56] <dmick> lurbs: shrug? maybe.
[23:57] <sjust> mjevans: pool 3 'ssd' rep size 2 min_size 1 crush_ruleset 3 object_hash rjenkins pg_num 128 pgp_num 128 last_change 43 owner 0
[23:58] <sjust> ruleset 3 has 2 replicas split across hosts on root ssd_all (step take ssd_all)
[23:58] <mjevans> Yes, that sounds correct; does it not fan out until it hits the host level?
[23:58] <sjust> ssd_all appears to contain only 1 host
[23:59] <sjust>         step chooseleaf firstn 0 type host
[23:59] <sjust> mean choose <replication level> hosts with 1 leaf from each
[23:59] <sjust> *means
[23:59] <mjevans> OH, I need to set each of those roots to a type level don't I?
[23:59] <sjust> you probably want: step choose firstn 0 type osd
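A hedged sketch of the rule change sjust suggests, reusing the names from mjevans' map (the min_size/max_size values are assumptions):

    rule ssd {
            ruleset 3
            type replicated
            min_size 1
            max_size 10
            step take ssd_all
            step choose firstn 0 type osd
            step emit
    }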

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.