#ceph IRC Log


IRC Log for 2013-07-22

Timestamps are in GMT/BST.

[0:27] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[0:35] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[0:48] * lautriv (~lautriv@f050082113.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[0:58] * lautriv (~lautriv@f050082011.adsl.alicedsl.de) has joined #ceph
[1:04] * dake (~Vince@ has joined #ceph
[1:11] * BManojlovic (~steki@fo-d- Quit (Quit: I'm off, and you lot do what you like...)
[1:28] * madkiss (~madkiss@2001:6f8:12c3:f00f:3838:f44b:2ebb:6f0b) has joined #ceph
[1:36] * madkiss (~madkiss@2001:6f8:12c3:f00f:3838:f44b:2ebb:6f0b) Quit (Ping timeout: 480 seconds)
[1:39] * dake (~Vince@ Quit (Quit: Leaving)
[2:10] * mschiff_ (~mschiff@port-50293.pppoe.wtnet.de) has joined #ceph
[2:17] * mschiff (~mschiff@port-33202.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[2:28] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[2:28] * huangjun (~huangjun@ has joined #ceph
[2:32] * AfC (~andrew@2001:44b8:31cb:d400:bc5e:b0f0:5bd2:f8b2) has joined #ceph
[2:36] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[2:37] * LeaChim (~LeaChim@ Quit (Ping timeout: 480 seconds)
[2:39] * trytol (~dg233@bl5-182-7.dsl.telepac.pt) has joined #ceph
[2:46] * danieagle (~Daniel@ Quit (Quit: See you later :-) and thank you very much for everything!!! ^^)
[2:51] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) has joined #ceph
[2:52] * yy-nm (~chatzilla@ has joined #ceph
[2:56] * trytol (~dg233@bl5-182-7.dsl.telepac.pt) Quit (autokilled: Do not spam. Mail support@oftc.net with questions. (2013-07-22 00:56:34))
[3:04] <huangjun> can i set a user for an osd?
[3:06] <huangjun> i.e. not the root user
[3:13] * sprachgenerator (~sprachgen@c-50-141-192-36.hsd1.il.comcast.net) has joined #ceph
[3:24] <huangjun> another question: if i deploy a cluster and write an [osd.0] section in ceph.conf, then after everything is ok i restart the cluster, it starts the osd.0 daemon twice.
[3:25] <huangjun> and when i reboot the machine, it runs normally and starts osd.0 only once.
[3:27] <yy-nm> huangjun, you mean direct to osd not a pool?
[3:29] <huangjun> yes
[3:29] * madkiss (~madkiss@2001:6f8:12c3:f00f:a462:b665:c057:3e10) has joined #ceph
[3:29] * KindTwo (KindOne@ has joined #ceph
[3:30] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:30] <yy-nm> there doesn't seem to be a way yet, you can check http://ceph.com/docs/next/man/8/ceph-authtool/#osd-capabilities
[3:30] * KindTwo is now known as KindOne
[3:37] * madkiss (~madkiss@2001:6f8:12c3:f00f:a462:b665:c057:3e10) Quit (Ping timeout: 480 seconds)
[3:39] <yy-nm> i have a question about the tdump file in /var/log/ceph/ (default)
[3:47] <yanzheng> i think it's leveldb's transaction dump file
[3:50] <yy-nm> can the file be turned off? the tdump file keeps growing without limit
[3:59] * smiley (~smiley@c-71-200-71-128.hsd1.md.comcast.net) Quit (Quit: smiley)
[4:14] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) has joined #ceph
[4:14] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[4:17] <yanzheng> put 'debug dump_transactions = 0' in the [mon] section
[4:22] * julian (~julianwa@ has joined #ceph
[4:29] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[4:36] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[4:37] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[4:39] <yy-nm> can you tell me where to find that variable? on a website?
[4:41] <huangjun> you can find it in config_opts.h file
[4:42] <yy-nm> source code?
[4:44] <yy-nm> thanks, i get it
[4:46] * drokita (~drokita@97-92-254-72.dhcp.stls.mo.charter.com) has joined #ceph
[4:46] * drokita (~drokita@97-92-254-72.dhcp.stls.mo.charter.com) Quit ()
[4:46] * drokita (~drokita@97-92-254-72.dhcp.stls.mo.charter.com) has joined #ceph
[4:54] * drokita (~drokita@97-92-254-72.dhcp.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[5:06] * fireD_ (~fireD@93-139-163-75.adsl.net.t-com.hr) has joined #ceph
[5:07] * fireD (~fireD@93-142-209-141.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:25] * smiley (~smiley@c-71-200-71-128.hsd1.md.comcast.net) has joined #ceph
[5:27] * smiley (~smiley@c-71-200-71-128.hsd1.md.comcast.net) Quit ()
[5:30] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[5:38] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[5:54] * sha (~kvirc@ has joined #ceph
[6:12] * Almaty (~san@ has joined #ceph
[6:19] <sha> hi to all. Can anyone help us? we have only 1 monitor left. this is the log from dead mon.b http://pastebin.com/RMbdEjiw
[6:21] <sha> no quorum. only 1 live monitor.
[6:23] <Almaty> is it possible to run ceph with one monitor?
[6:23] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[6:24] <Almaty> Who knows?
[6:25] <huangjun> yes, can run ceph with only one mon
[6:26] <huangjun> sha: i cannot open that link
[6:26] <sha> how can we do it?
[6:27] <huangjun> did you have 2 mons, and mon.b died?
[6:27] <Almaty> 3 mons; a and b died
[6:27] <sha> huangjun: we have only 1 mon.c....mon.b, mon.a - dead
[6:28] <sha> no map, no quorum, no keyring
[6:29] <huangjun> so your cluster works fine now?
[6:29] <sha> huangjun: no)... its only 1 monitor.
[6:30] <huangjun> i have tested this before: we built 3 mons and then stopped 2 of them one after the other; unfortunately the cluster could not work.
[6:30] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[6:31] <sha> huangjun: yes. but at the moment 2 of our monitors are broken ...
[6:31] * silversurfer972 (~jeandanie@124x35x46x12.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[6:32] <sha> how do we get out of this situation when only 1 monitor is alive?
[6:32] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Ping timeout: 480 seconds)
[6:34] <sha> huangjun: is it possible to remove the non-working monitors in this situation?
[6:36] * paravoid (~paravoid@scrooge.tty.gr) Quit (Quit: leaving)
[6:38] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[6:39] * matt__ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[6:40] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) Quit (Quit: ChatZilla [Firefox 22.0/20130618035212])
[6:43] <sage> sha: yes, this is documented on ceph.com/docs/master under removing monitors from an unhealthy cluster
[6:43] <sage> you should also be able to add the failed mons back in, though.
[6:44] <sha> sage: hi! 1 min
[6:45] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[6:46] <sha> sage:http://pastebin.com/ygNyEU5b
[6:47] * Psi-Jack_ (~psi-jack@psi-jack.user.oftc.net) has joined #ceph
[6:54] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Read error: Connection reset by peer)
[6:55] * sh_t (~sht@ Quit (Ping timeout: 480 seconds)
[6:57] <sha> sage: ceph-mon -i c --extract-monmap /tmp/monmapc - didn't work... no map in /tmp/
[6:57] <sha> sage: no map was added to /tmp/ by the command ceph-mon -i c --extract-monmap /tmp/monmap
[6:58] <sage> why are the other 2 mons down?
[6:59] <sha> sage: log from dead mon.b http://pastebin.com/RMbdEjiw
[6:59] <sage> what version is this?
[7:00] <sage> hmm, stefan hit this. let me push a workaround for you.
[7:00] <sage> what os?
[7:00] <sha> sage: we simply rebooted 2 mons. Ubuntu 12.04 LTS (GNU/Linux 3.2.0-23-generic x86_64)
[7:01] * paravoid (~paravoid@scrooge.tty.gr) has joined #ceph
[7:03] <sha> sage: mon.c log http://pastebin.com/vqgfUciD
[7:04] * paravoid (~paravoid@scrooge.tty.gr) Quit ()
[7:04] <sage> pushed wip-cuttlefish-osdmap, build will appear at gitbuilder.ceph.com in about 10 minutes
[7:04] <sage> http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-cuttlefish-osdmap
[7:05] * paravoid (~paravoid@scrooge.tty.gr) has joined #ceph
[7:05] <sha> sage: 404
[7:05] <sage> yeah it's building, will take about 10 minutes
[7:05] <sha> sage: log mon.a http://pastebin.com/PDqRTsex
[7:06] <sage> if you don't mind, i would love to look at a copy of your mon data dir on one of those mons before it recovers to see where it went wrong
[7:08] <sage> yanzheng: hi
[7:08] <yanzheng> hi
[7:08] <sage> yanzheng: did you by chance look to see if d_delete triggers d_prune, or if we need to clear the dir complete flag on that d_delete() in invalidate_dentries()?
[7:09] <sage> i can't remember what the rules were there
[7:10] <sage> btw sorry for the slow review on all of this stuff.. i haven't forgotten! focused on squashing bugs for dumpling
[7:10] <yanzheng> no it doesn't
[7:11] <yanzheng> but i don't think we need to clear the dir complete in this case
[7:11] <sage> yeah i guess long before this the dir cap would have been revoked
[7:11] <yanzheng> agree
[7:13] <sage> k. i'll pull the 3 kernel patches into the testing branch. we're not testing master right now, though, and focusing on next.. we'll probably start again in another week or so, and i'll pull the other mds stuff in then
[7:14] <sha> sage:archiving
[7:15] <sage> and i'm not sure i mentioned this before, but thanks for all the hard work on these fs issues! really happy to see the bug list shrinking and multi-mds stability improving :) :)
[7:15] <yanzheng> ok, thanks
[7:16] * paravoid (~paravoid@scrooge.tty.gr) Quit (Quit: leaving)
[7:17] <sage> sha: can you make a note with the filename on http://tracker.ceph.com/issues/5704 when you've uploaded the tar? thanks!
[7:17] <sage> and also please let me know on the bug if that branch fixes it or not
[7:17] <sage> thanks!
[7:17] <sage> ttyl
[7:17] * paravoid (~paravoid@scrooge.tty.gr) has joined #ceph
[7:21] * sh_t (~sht@ has joined #ceph
[7:31] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[7:33] <sha> sage: what is this - http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-cuttlefish-osdmap
[7:39] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[7:46] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[8:23] * waxzce (~waxzce@2a01:e35:2e1e:260:7dcb:1273:5f81:fc73) Quit (Remote host closed the connection)
[8:24] * sha (~kvirc@ Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[8:50] * sleinen (~Adium@macsl.switch.ch) has joined #ceph
[9:12] * bergerx_ (~bekir@ has joined #ceph
[9:22] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Remote host closed the connection)
[9:22] * madkiss (~madkiss@2001:6f8:12c3:f00f:2870:e33a:5d26:a4fc) has joined #ceph
[9:23] * jamespage (~jamespage@culvain.gromper.net) Quit (Quit: Coyote finally caught me)
[9:23] * jamespage (~jamespage@culvain.gromper.net) has joined #ceph
[9:28] * mxmln (~maximilia@ Quit ()
[9:32] * yy-nm (~chatzilla@ Quit (Read error: Connection reset by peer)
[9:33] * yy-nm (~chatzilla@ has joined #ceph
[9:43] * syed_ (~chatzilla@ has joined #ceph
[9:43] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[9:44] * sha (~kvirc@ has joined #ceph
[9:45] <sha> sage: your branch helped us. we removed the non-working monitors
[9:50] * allsystemsarego (~allsystem@ has joined #ceph
[9:51] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[9:53] * leseb1 (~Adium@ has joined #ceph
[9:55] * BManojlovic (~steki@fo-d- has joined #ceph
[10:08] * sleinen (~Adium@macsl.switch.ch) Quit (Ping timeout: 480 seconds)
[10:12] * LeaChim (~LeaChim@ has joined #ceph
[10:12] * sha (~kvirc@ Quit (Read error: Connection reset by peer)
[10:35] * X3NQ (~X3NQ@ has joined #ceph
[10:38] * fireD_ is now known as fireD
[10:45] * sha (~kvirc@ has joined #ceph
[10:54] * Almaty (~san@ Quit (Quit: Ex-Chat)
[11:01] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[11:13] * sleinen (~Adium@2001:620:0:46:e59f:3863:f077:238c) has joined #ceph
[11:14] * infinitytrapdoor (~infinityt@ has joined #ceph
[11:29] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) Quit (Remote host closed the connection)
[11:36] * agh (~oftc-webi@gw-to-666.outscale.net) has joined #ceph
[11:36] <agh> Hello,
[11:36] <agh> I want to setup a Ceph cluster "by hand"
[11:37] <agh> without ceph-deploy
[11:37] <agh> (i will manage it via Salt Stack later)
[11:37] <agh> is there a way to do so ? A doc ?
[11:40] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[11:40] <huangjun> you can only use 0.55 or earlier if you want to use mkcephfs to build the cluster
[11:42] <huangjun> agh: http://ceph.com/docs/master/rados/deployment/mkcephfs/?highlight=mkcephfs
[11:43] <agh> yes, but with last version ?
[11:43] <agh> Is ceph-deploy mandatory ?
[11:43] <huangjun> uhh, i haven't tried it yet, you can try.
[11:46] <agh> how to define the cluster name (ceph, by default) ?
[11:46] <huangjun> yes, ceph is the default
[11:46] <agh> yes, but, how to change that ?
[11:48] <Pauline> --cluster NAME
[11:48] <Pauline> (on ceph-deploy)
[11:49] <agh> ok
[11:49] <agh> thanks
[11:49] * deadsimple (~infinityt@ has joined #ceph
[11:51] <huangjun> or you can use "ceph-deploy -h" to get the usage of ceph-deploy
[11:53] * infinitytrapdoor (~infinityt@ Quit (Ping timeout: 480 seconds)
[11:59] <paravoid> using <= 0.55 sounds like a terrible idea
[11:59] <paravoid> it's possible to configure ceph without ceph-deploy
[11:59] <paravoid> I'd look at the chef cookbook or one of the puppet modules out there
[12:05] <agh> paravoid: ok. thanks
[12:07] * jasoncn (~jasoncn@ has joined #ceph
[12:08] <jasoncn> mdsmap e40: 1/1/1 up {0=node-11815=up:replay}, 1 up:standby
[12:08] <jasoncn> what's wrong with mds?
[12:09] <jasoncn> version 0.65
[12:09] <jasoncn> hello anyone?
[12:11] <Gugge-47527> jasoncn: is anything wrong?
[12:12] * deadsimple (~infinityt@ Quit ()
[12:16] <huangjun> is the mds always in replay status?
[12:23] <jasoncn> yes
[12:23] <jasoncn> how to fix it?
[12:23] <jasoncn> i have 3 mons,4 osds,2 mds
[12:24] * yy-nm (~chatzilla@ Quit (Quit: ChatZilla [Firefox 22.0/20130618035212])
[12:25] <jasoncn> and ceph-fuse cannot mount ceph fs.
[12:25] <huangjun> you can restart the mds in replay status, and if that doesn't work you can stop that one. in theory the other mds will take over from the
[12:26] <huangjun> replay mds
[12:27] <jasoncn> command line?
[12:28] <huangjun> yes
[12:28] <jasoncn> ceph replay mds?
[12:29] <huangjun> jasoncn:/etc/init.d/ceph stop mds
[12:30] <jasoncn> still standby
[12:31] <huangjun> can you show us the mds log?
[12:31] <jasoncn> stopped 2 mds, and now it shows: mdsmap e41: 1/1/1 up {0=node-1186=up:replay}
[12:31] <jasoncn> sure
[12:32] <jasoncn> how to show you?
[12:33] <jasoncn> 2013-07-22 18:30:45.735399 7fb3d2ff7700 20 mds.0.bal get_load no root, no load
[12:33] <jasoncn> 2013-07-22 18:30:45.735458 7fb3d2ff7700 15 mds.0.bal get_load mdsload<[0,0 0]/[0,0 0], req 0, hr 0, qlen 0, cpu 0.08>
[12:33] <jasoncn> 2013-07-22 18:30:45.843952 7fb3d2ff7700 10 mds.0.11 beacon_send up:replay seq 1001 (currently up:replay)
[12:33] <jasoncn> 2013-07-22 18:30:45.844012 7fb3d2ff7700 1 -- --> -- mdsbeacon(5697/node-1186 up:replay seq 1001 v41) v2 -- ?+0 0x7fb3b411de50 con 0x28d8610
[12:33] <jasoncn> 2013-07-22 18:30:45.845232 7fb3d57fb700 1 -- <== mon.0 1045 ==== mdsbeacon(5697/node-1186 up:replay seq 1001 v41) v2 ==== 111+0+0 (2522235786 0 0) 0x7fb3bc0015b0 con 0x28d8610
[12:33] <jasoncn> 2013-07-22 18:30:45.845262 7fb3d57fb700 10 mds.0.11 handle_mds_beacon up:replay seq 1001 rtt 0.001279
[12:33] <jasoncn> 2013-07-22 18:30:49.844096 7fb3d2ff7700 10 mds.0.11 beacon_send up:replay seq 1002 (currently up:replay)
[12:33] <jasoncn> 2013-07-22 18:30:49.844158 7fb3d2ff7700 1 -- --> -- mdsbeacon(5697/node-1186 up:replay seq 1002 v41) v2 -- ?+0 0x7fb3b411de50 con 0x28d8610
[12:33] <jasoncn> 2013-07-22 18:30:49.845479 7fb3d57fb700 1 -- <== mon.0 1046 ==== mdsbeacon(5697/node-1186 up:replay seq 1002 v41) v2 ==== 111+0+0 (2959849299 0 0) 0x7fb3bc0015b0 con 0x28d8610
[12:33] <jasoncn> 2013-07-22 18:30:49.845517 7fb3d57fb700 10 mds.0.11 handle_mds_beacon up:replay seq 1002 rtt 0.001385
[12:33] <jasoncn> 2013-07-22 18:30:50.735525 7fb3d2ff7700 1 -- --> -- ping v1 -- ?+0 0x7fb3b411de50 con 0x7fb3c80a5c70
[12:33] <joelio> jasoncn: pastebin!
[12:38] <huangjun> jasoncn: i can not see anything abnormal, sorry
[12:39] <sha> pastebin!
[12:39] <jasoncn> ok
[12:39] <jasoncn> waiting
[12:42] <jasoncn> how to upload file?
[12:42] <joelio> https://gist.github.com/ http://pastebin.com/ ?
[12:44] <jasoncn> i have no account for that web site
[12:44] <jasoncn> sorry
[12:45] <joelio> erm, you don't need one
[12:45] <joelio> you paste in what you want, it's really not that hard
[12:48] * infinitytrapdoor (~infinityt@ has joined #ceph
[12:49] * jjgalvez1 (~jjgalvez@ip72-193-215-88.lv.lv.cox.net) Quit (Quit: Leaving.)
[12:49] * jjgalvez (~jjgalvez@ip72-193-215-88.lv.lv.cox.net) has joined #ceph
[12:49] * infinitytrapdoor (~infinityt@ Quit ()
[12:51] <jasoncn> https://gist.github.com/anonymous/6053003
[12:51] * infinitytrapdoor (~infinityt@ has joined #ceph
[12:51] <jasoncn> hello, i pasted it
[12:52] <jasoncn> anyone help me?
[12:52] <jasoncn> https://gist.github.com/anonymous/6053003
[12:53] <jasoncn> i just know the mds up and out
[12:53] <jasoncn> /** down and out */
[12:53] <jasoncn> case CEPH_MDS_STATE_DNE: return "down:dne";
[12:53] <jasoncn> case CEPH_MDS_STATE_STOPPED: return "down:stopped";
[12:53] <jasoncn> /** up and out */
[12:53] * agh (~oftc-webi@gw-to-666.outscale.net) Quit (Quit: Page closed)
[12:53] <jasoncn> case CEPH_MDS_STATE_BOOT: return "up:boot";
[12:53] <jasoncn> case CEPH_MDS_STATE_STANDBY: return "up:standby";
[12:53] <jasoncn> case CEPH_MDS_STATE_STANDBY_REPLAY: return "up:standby-replay";
[12:53] <jasoncn> case CEPH_MDS_STATE_CREATING: return "up:creating";
[12:53] <jasoncn> case CEPH_MDS_STATE_STARTING: return "up:starting";
[12:53] <jasoncn> /** up and in */
[12:53] <jasoncn> case CEPH_MDS_STATE_REPLAY: return "up:replay";
[12:54] <jasoncn> case CEPH_MDS_STATE_RESOLVE: return "up:resolve";
[12:54] <jasoncn> case CEPH_MDS_STATE_RECONNECT: return "up:reconnect";
[12:54] <joelio> dude!
[12:54] <jasoncn> case CEPH_MDS_STATE_REJOIN: return "up:rejoin";
[12:54] <joelio> jasoncn: pastebin!!!
[12:54] <jasoncn> case CEPH_MDS_STATE_CLIENTREPLAY: return "up:clientreplay";
[12:54] <jasoncn> case CEPH_MDS_STATE_ACTIVE: return "up:active";
[12:54] <jasoncn> case CEPH_MDS_STATE_STOPPING: return "up:stopping";
[12:54] <jasoncn> https://gist.github.com/anonymous/6053003
[12:54] <joelio> not in the channel, this is poor etiquette
[12:55] <jasoncn> ok,i see,thanks
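The state names pasted above fall into three groups, going by the comments in that snippet. The following is a minimal Python sketch of that grouping, together with a parser for the `mdsmap` summary line quoted at 12:31. It assumes the summary always has the `{rank=name=state}` shape seen in this log; `MDS_STATE_GROUPS` and `parse_mdsmap_ranks` are hypothetical names for illustration and are not part of Ceph.

```python
import re

# Grouping copied from the comments in the snippet pasted in the channel.
MDS_STATE_GROUPS = {
    "down:dne": "down and out",
    "down:stopped": "down and out",
    "up:boot": "up and out",
    "up:standby": "up and out",
    "up:standby-replay": "up and out",
    "up:creating": "up and out",
    "up:starting": "up and out",
    "up:replay": "up and in",
    "up:resolve": "up and in",
    "up:reconnect": "up and in",
    "up:rejoin": "up and in",
    "up:clientreplay": "up and in",
    "up:active": "up and in",
    "up:stopping": "up and in",
}


def parse_mdsmap_ranks(line):
    """Return {rank: (daemon_name, state)} from the {rank=name=state,...} part."""
    match = re.search(r"\{([^}]*)\}", line)
    if not match:
        return {}
    ranks = {}
    for entry in match.group(1).split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an empty "{}" block
        rank, name, state = entry.split("=", 2)
        ranks[int(rank)] = (name, state)
    return ranks


line = "mdsmap e41: 1/1/1 up {0=node-1186=up:replay}"
for rank, (name, state) in parse_mdsmap_ranks(line).items():
    print(rank, name, state, "->", MDS_STATE_GROUPS.get(state, "unknown"))
```

Per the pasted comments, `up:replay` is already in the "up and in" group, so the rank is held by a daemon; what the log shows is that it never advances to a later state such as `up:active`.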
[12:57] * jjgalvez (~jjgalvez@ip72-193-215-88.lv.lv.cox.net) Quit (Ping timeout: 480 seconds)
[12:58] * huangjun (~huangjun@ Quit (Quit: HydraIRC -> http://www.hydrairc.com <- Chicks dig it)
[13:07] * jasoncn (~jasoncn@ Quit (Quit: Leaving)
[13:08] * syed_ (~chatzilla@ Quit (Quit: ChatZilla [Firefox 22.0/20130627172038])
[13:24] * eternaleye (~eternaley@2002:3284:29cb::1) Quit (Ping timeout: 480 seconds)
[13:24] * eternaleye (~eternaley@2002:3284:29cb::1) has joined #ceph
[13:30] * KindTwo (KindOne@ has joined #ceph
[13:30] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:31] * KindTwo is now known as KindOne
[13:32] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[13:34] * sleinen1 (~Adium@2001:620:0:2d:d85d:a1aa:6416:7bbc) has joined #ceph
[13:35] * sleinen2 (~Adium@2001:620:0:26:65f4:a733:e24a:b518) has joined #ceph
[13:37] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:40] * eternaleye (~eternaley@2002:3284:29cb::1) Quit (Ping timeout: 480 seconds)
[13:41] * sleinen (~Adium@2001:620:0:46:e59f:3863:f077:238c) Quit (Ping timeout: 480 seconds)
[13:42] * sleinen1 (~Adium@2001:620:0:2d:d85d:a1aa:6416:7bbc) Quit (Ping timeout: 480 seconds)
[13:43] * eternaleye (~eternaley@2002:3284:29cb::1) has joined #ceph
[13:48] * Machske2 (~bram@d5152D8A3.static.telenet.be) has joined #ceph
[13:49] <Machske2> Hi guys, just rebooted one of my mons, (cleanly) and after reboot of the server the mon cannot start/crashes
[13:49] <Machske2> running 0.61.5, seems bad considering it's a stable release
[13:50] <joelio> Machske2: any diagnostics?
[13:50] <Machske2> I've got a log
[13:52] <Machske2> http://pastebin.com/hFHfWG5U
[13:53] <Machske2> I can do some debugging to get more info if someone tells me how :)
[13:59] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:04] * sprachgenerator (~sprachgen@c-50-141-192-36.hsd1.il.comcast.net) Quit (Quit: sprachgenerator)
[14:21] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:22] * infinitytrapdoor (~infinityt@ Quit ()
[14:36] * diegows (~diegows@ has joined #ceph
[14:39] <Machske2> all my mons went corrupt
[14:41] * infinitytrapdoor (~infinityt@ has joined #ceph
[14:48] * huangjun (~huangjun@ has joined #ceph
[14:50] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) has joined #ceph
[14:53] <Machske2> Hi guys I need some urgent help here, is there anyone I can contact of someone who can help us
[14:57] <Machske2> http://pastebin.com/qwJCWKhK
[14:57] <joelio> Inktank do paid support if it's mission critical
[14:57] <joelio> Not something I've seen, so can't comment
[14:58] <joelio> (your issue, that is, not the excellent Inktank support!)
[14:58] <alfredodeza> Machske2: it is also a bit early for most of our west-coast team, so you might be unanswered for a bit
[15:01] <huangjun> Machske2: can you describe your problem briefly? i can't open the link you pasted
[15:02] <huangjun> maybe others do
[15:03] <joelio> huangjun: interesting line is;
[15:03] <joelio> 2013-07-22 14:56:45.995786 7fabc3b10780 -1 mon/OSDMonitor.cc: In function 'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7fabc3b10780 time 2013-07-22 14:56:45.994938
[15:03] <joelio> mon/OSDMonitor.cc: 132: FAILED assert(latest_bl.length() != 0)
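When a pastebin link is not reachable for everyone, pulling just the assertion line out of a long monitor log makes it easier to quote in the channel. A minimal Python sketch, assuming the `FILE: LINE: FAILED assert(COND)` layout of the line quoted above; `find_failed_asserts` is a hypothetical helper, not a Ceph tool.

```python
import re

# Matches lines of the form "mon/OSDMonitor.cc: 132: FAILED assert(...)".
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<cond>.*)\)$"
)


def find_failed_asserts(log_text):
    """Return (source_file, line_number, condition) for each failed assert."""
    hits = []
    for raw in log_text.splitlines():
        m = ASSERT_RE.match(raw.strip())
        if m:
            hits.append((m.group("file"), int(m.group("line")), m.group("cond")))
    return hits


sample = (
    "2013-07-22 14:56:45.995786 7fabc3b10780 -1 mon/OSDMonitor.cc: In function "
    "'virtual void OSDMonitor::update_from_paxos(bool*)' thread 7fabc3b10780 "
    "time 2013-07-22 14:56:45.994938\n"
    "mon/OSDMonitor.cc: 132: FAILED assert(latest_bl.length() != 0)\n"
)
for source_file, line_number, condition in find_failed_asserts(sample):
    print(source_file, line_number, condition)
```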
[15:03] <Machske2> I have /had 3 mons
[15:03] <Machske2> shut one down (reboot server) after reboot, did not come
[15:03] <Machske2> up
[15:03] <Machske2> restarted other mons, same error
[15:03] <Machske2> looks like an issue in the paxos database
[15:04] <joelio> Machske2: was this a fresh install, or an upgrade from an earlier Ceph version?
[15:04] <Machske2> an upgrade, last weekend from 0.61.4 to 0.61.5
[15:04] <joelio> and you restarted the cluster components?
[15:05] <Machske2> yes
[15:05] <joelio> no issues post restart then?
[15:05] <Machske2> stopped the whole cluster
[15:05] <Machske2> but starting any mon fails
[15:05] <huangjun> this is a problem that sage tracked today http://tracker.ceph.com/issues/5704
[15:05] <Machske2> all with the same error
[15:06] <Machske2> cannot remove a failing mon, that would mean that I have to remove them all
[15:06] <Machske2> but then I lose 4.3 TB data
[15:07] <Machske2> looks to be the same error indeed as: http://tracker.ceph.com/issues/5704
[15:07] <joelio> I didn't think a monmap held osd info/metadata?
[15:07] <Machske2> but i cannot start any mon
[15:08] <Machske2> which one to remove :)
[15:10] <huangjun> maybe you can preserve your logs, which will help with debugging
[15:11] * julian (~julianwa@ Quit (Quit: afk)
[15:11] <Machske2> can I remove all mons and add new mons ?
[15:11] <huangjun> a newly added mon needs to communicate with the old mons
[15:12] <huangjun> so i think this will not work, but you can try it
[15:14] * jcfischer (~fischer@user-28-9.vpn.switch.ch) Quit (Quit: jcfischer)
[15:15] <Machske2> any way to rebuild the mons ?
[15:15] <Machske2> osdmap crush map, mrs, ...
[15:15] * john_barbee_ (~jbarbee@173-16-234-208.client.mchsi.com) has joined #ceph
[15:16] * mikedawson (~chatzilla@50-195-193-105-static.hfc.comcastbusiness.net) has joined #ceph
[15:17] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) Quit (Ping timeout: 480 seconds)
[15:18] * john_barbee_ is now known as john_barbee
[15:19] * sh_t (~sht@ Quit (Ping timeout: 480 seconds)
[15:19] <ofu_> i rebuilt one mon once by copying /var/lib/ceph/mon/ceph-ceph0/store.db and keyring from another mon because one mon had a monmap that was too old
[15:20] <ofu_> problem looked like this: 2013-06-19 11:43:35.486652 7ff9ce78b700 10 mon.ceph3@3(probing) e3 got newer/committed monmap epoch 7, mine was 3
[15:20] <ofu_> 2013-06-19 11:43:35.486664 7ff9ce78b700 0 mon.ceph3@3(probing) e7 removed from monmap, suicide.
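The two log lines above show a monitor discovering that its own monmap epoch (3) is behind the committed one (7). A minimal Python sketch for spotting that condition in a monitor log, assuming the message wording is exactly as quoted here; `monmap_lag` is a hypothetical helper name.

```python
import re

# Wording taken from the probing message quoted in the channel.
EPOCH_RE = re.compile(r"got newer/committed monmap epoch (\d+), mine was (\d+)")


def monmap_lag(log_line):
    """Return how many epochs behind the local monmap is, or None if no match."""
    m = EPOCH_RE.search(log_line)
    if m is None:
        return None
    newer, mine = int(m.group(1)), int(m.group(2))
    return newer - mine


line = (
    "2013-06-19 11:43:35.486652 7ff9ce78b700 10 mon.ceph3@3(probing) e3 "
    "got newer/committed monmap epoch 7, mine was 3"
)
print(monmap_lag(line))
```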
[15:23] * mikedawson_ (~chatzilla@23-25-19-14-static.hfc.comcastbusiness.net) has joined #ceph
[15:29] * mikedawson (~chatzilla@50-195-193-105-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[15:30] * jcfischer (~fischer@ has joined #ceph
[15:30] * mikedawson (~chatzilla@50-195-193-105-static.hfc.comcastbusiness.net) has joined #ceph
[15:30] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[15:31] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[15:31] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[15:33] * mikedawson_ (~chatzilla@23-25-19-14-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[15:34] * mikedawson_ (~chatzilla@50-195-193-105-static.hfc.comcastbusiness.net) has joined #ceph
[15:34] * jcfischer_ (~fischer@user-23-15.vpn.switch.ch) has joined #ceph
[15:35] * mikedawson__ (~chatzilla@50-195-193-105-static.hfc.comcastbusiness.net) has joined #ceph
[15:38] * jcfischer (~fischer@ Quit (Ping timeout: 480 seconds)
[15:38] * jcfischer_ is now known as jcfischer
[15:38] * drokita (~drokita@ has joined #ceph
[15:39] * mikedawson (~chatzilla@50-195-193-105-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[15:39] * mikedawson__ is now known as mikedawson
[15:39] * agh (~oftc-webi@gw-to-666.outscale.net) has joined #ceph
[15:39] <agh> hello to all.
[15:40] <agh> I've a question about ceph-deploy
[15:40] <agh> i'm deploying a new cluster with this tool
[15:40] <agh> but, i do not understand :
[15:40] <agh> the generated ceph.conf file does not contain any info about osds
[15:40] <agh> i.e [osd.0] ... [osd.1] etc
[15:40] <agh> is it the normal behaviour ?
[15:41] <alfredodeza> agh: what command are you using exactly?
[15:41] <alfredodeza> also, did you follow the getting started guide for ceph-deploy?
[15:41] <agh> alfredodeza: I followed the doc
[15:41] <agh> alfredodeza: 1) ceph-deploy new node1
[15:42] <agh> 2) ceph-deploy mon create node1
[15:42] <agh> 3) ceph-deploy gatherkeys node1
[15:42] * mikedawson_ (~chatzilla@50-195-193-105-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[15:42] <Pauline> fyi: ceph-deploy seems to add some flags to the global part of ceph.conf. osd's pick it up from there. you only need the [osd.x] sections for osd specific config
[15:42] <agh> 4) ceph-deploy osd create node1:sda:/dev/journal/0
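The four steps agh lists can be collected into a small dry-run script, which is handy when the sequence will later be driven from a config management tool. A minimal Python sketch that only prints the commands exactly as given in the channel and does not execute anything; `ceph_deploy_steps` is a hypothetical helper, and the node, disk and journal values are the ones from agh's messages.

```python
import shlex


def ceph_deploy_steps(node, disk, journal):
    """Build the four ceph-deploy invocations quoted in the channel."""
    return [
        ["ceph-deploy", "new", node],
        ["ceph-deploy", "mon", "create", node],
        ["ceph-deploy", "gatherkeys", node],
        ["ceph-deploy", "osd", "create", f"{node}:{disk}:{journal}"],
    ]


# Dry run: print each command instead of running it.
for cmd in ceph_deploy_steps("node1", "sda", "/dev/journal/0"):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping each command as an argument list rather than a single string means the same data could later be handed to `subprocess.run` without any shell quoting concerns.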
[15:42] <joelio> agh: there is no config for osd's when generated via ceph-deploy - it creates fstab entries for the mount and uses anything in the /var/lib/ceph/osd dirs
[15:43] <agh> joelio: mm... there is nothing in fstab
[15:44] <alfredodeza> agh: does your log file look ok? do you see any warnings/errors ?
[15:44] <agh> for the moment, everything is ok
[15:44] <agh> but, maybe there is no more need of [osd.x] sections in ceph.conf
[15:44] <huangjun> and from the init-ceph code, it first looks for the sysvinit file in the /var/lib/ceph/osd dir, so you can stop and restart the osd as long as you haven't rebooted the machine
[15:45] <agh> i'm not new to ceph, but new with ceph-deploy :)
[15:45] <huangjun> if you reboot the machine, and use the default ceph.conf then the command /etc/init.d/ceph start osd will not work
[15:46] <agh> huangjun: hu. That's not cool. Why ?
[15:46] <joelio> agh: sorry, not fstab, but the init scripts iirc - somewhere anyway - point being it reads /var/lib/ceph/osd/*
[15:46] <huangjun> yes,not cool, i think it should set the related section in ceph.conf
[15:47] <huangjun> let's wait for the developers to clear up this confusion
[15:47] <joelio> I agree personally, I really don't like how ceph-deploy works, but hey ho..
[15:47] <agh> joelio: i was looking for a way to deploy Ceph 100% by hand...
[15:47] <huangjun> agh: did you try the chef cookbook?
[15:48] <agh> no, because i don't want Chef :) I want SaltStack
[15:48] <alfredodeza> agh if you want something non-chef I did a bit of work with Ansible
[15:48] <joelio> you could just grok the puppet/chef scripts if you're feeling masochistic
[15:48] <alfredodeza> I got as far as the 'getting started guide'
[15:49] <alfredodeza> agh: https://github.com/alfredodeza/ceph-ansible
[15:49] <agh> i know how to add mons/osds by hand
[15:49] <agh> but, i don't know how to create a virgin cluster
[15:49] <joelio> just read the ceph-deploy script then, it's python
[15:49] <joelio> see what it does
[15:50] <agh> joelio: yes sure
[15:50] <joelio> or any other config management of choice..
[15:50] <Machske2> guys! http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg15422.html saved my ass
[15:51] <Machske2> I have backups of the mon data dirs for further analysis
[16:05] * sleinen (~Adium@2001:620:0:26:65f4:a733:e24a:b518) has joined #ceph
[16:05] * WarrenTheAardvarkUsui (~WarrenUsu@ has joined #ceph
[16:06] * sprachgenerator (~sprachgen@vis-v410v141.mcs.anl-external.org) has joined #ceph
[16:07] * Cybertinus (~Cybertinu@2001:828:405:30:83:96:177:42) Quit (Read error: Connection reset by peer)
[16:07] * sleinen2 (~Adium@2001:620:0:26:65f4:a733:e24a:b518) Quit (Read error: Connection reset by peer)
[16:07] * Cybertinus (~Cybertinu@2001:828:405:30:83:96:177:42) has joined #ceph
[16:07] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Read error: Connection reset by peer)
[16:07] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[16:11] * WarrenUsui (~WarrenUsu@ Quit (Ping timeout: 480 seconds)
[16:12] * mikedawson (~chatzilla@50-195-193-105-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[16:14] * huangjun (~huangjun@ Quit (Quit: HydraIRC -> http://www.hydrairc.com <- Wibbly Wobbly IRC)
[16:14] * infinitytrapdoor (~infinityt@ Quit (Ping timeout: 480 seconds)
[16:20] * huangjun (~huangjun@ has joined #ceph
[16:20] <alfredodeza> do we have a 'getting started' guide that uses CentOS rather than Ubuntu?
[16:20] <alfredodeza> all the examples I could find were about DEBs and sources.list, which is fine, but I need to know how to get a similar setup in CentOS
[16:22] * sprachgenerator (~sprachgen@vis-v410v141.mcs.anl-external.org) Quit (Ping timeout: 480 seconds)
[16:22] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:23] * yanzheng (~zhyan@ has joined #ceph
[16:23] * markbby (~Adium@ has joined #ceph
[16:25] <huangjun> what do you want to do on centos?
[16:25] <joelio> alfredodeza: http://ceph.com/docs/next/install/ - distinctly has an RPM package section?
[16:25] <Machske2> Btw guys, I'm guessing that anyone running 0.61.5 has potentially the same problem. A healthy setup all of the sudden becomes unavailable. I'm guessing the guys at Inktank are going to get busy. It occurs after a restart.
[16:25] <alfredodeza> joelio: you beat me to it
[16:25] <alfredodeza> I just found it
[16:25] <Machske2> so don't restart :)
[16:25] <alfredodeza> huangjun: I wanted to know how to do this: sudo rpm --import 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
[16:25] <alfredodeza> thanks guys
[16:29] <alfredodeza> woah, I could not install the ceph rpm on CentOS
[16:29] <alfredodeza> a whole bunch of things missing it seems: Error: Package: libcephfs1-0.61.5-0.el6.x86_64 (ceph)
[16:29] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[16:29] <alfredodeza> `uname -r` gives me this CentOS kernel version: 2.6.32-279.el6.x86_64
[16:30] <alfredodeza> which according to the docs seems to be what we support (CentOS 6 (el6))
[16:30] <huangjun> use uname -a; the ceph package depends on libcephfs1, librados2 and librbd
[16:30] <alfredodeza> ok
[16:31] <alfredodeza> `uname -a` gives me: Linux localhost.localdomain 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
[16:31] <alfredodeza> still, looks like what we support
[16:31] <alfredodeza> shouldn't the RPM install the dependencies?
[16:32] <Pauline> 279 is bloody old though
[16:32] <joelio> since when did RPM's install their own deps? :) EPEL or via yum maybe :)
[16:32] <Pauline> i'm at -358
[16:32] <joelio> looks really old that kernel btw
[16:32] <alfredodeza> joelio: sure, via yum
[16:32] <alfredodeza> joelio: but our docs say we support that
[16:33] <joelio> don't know I'm afraid
[16:33] <Pauline> and as for your yum import, you forgot to add the ceph repo to your set.
[16:33] * joelio debian/ubuntu user
[16:33] <huangjun> the newest centos 6.4 kernel version is also 2.6.32
[16:34] <huangjun> nothing to do with kernel, we can build ceph on centos6.4 successfully.
[16:34] <alfredodeza> Pauline: I followed the instructions here: http://ceph.com/docs/next/install/rpm/
[16:34] <Pauline> 279 was released Jun 13 2012, almost a year old now. 358 is from Jun 11 2013
[16:34] <alfredodeza> what section should I do to add the ceph repo?
[16:35] * alfredodeza has followed everything from the top
[16:35] <Pauline> alfredodeza: somewhere further down is the repo to add.
[16:35] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[16:35] <joelio> the 'Add release packages' should contain the repo list non?
[16:35] <alfredodeza> joelio: that is what I thought too
[16:36] <Pauline> maybe you left DISTRO in? ^^
[16:36] <joelio> alfredodeza: so, check your yum repos.d list then, what's in it?
[16:37] <alfredodeza> Pauline: no, this is the actual command I ran: su -c 'rpm -Uvh http://ceph.com/rpm-cuttlefish/el6/x86_64/ceph-release-1-0.el6.noarch.rpm'
[16:37] <alfredodeza> followed by `sudo yum install ceph`
[16:37] <joelio> yum updater
[16:37] <joelio> yum update - even
[16:37] * alfredodeza dislikes `yum` with a passion
[16:37] <joelio> use debian then :)
[16:38] <alfredodeza> joelio: I am fixing a bug with CentOS :)
[16:38] <Pauline> you have a ceph.repo in your /etc/yum.repos.d?
[16:38] <Pauline> and what can I say... it worked for me! ^^
[16:38] * alfredodeza checks
[16:38] * Pauline loves `yum` with a passion :P
[16:39] <alfredodeza> :)
[16:39] * joelio likes any package manager that doesn't mean compiling from source
[16:39] <alfredodeza> yep, I have a ceph.repo in /etc/yum.repos.d/
[16:39] <Pauline> aww, so 'make' is out ^^
[16:40] <Pauline> alfredodeza: if you do "yum update" does it show getting stuff from ceph? maybe do a "yum clean all" first.
[16:41] <alfredodeza> `yum update` said it needed to download 170MB of stuff
[16:41] <alfredodeza> so it is probably that
[16:41] <Pauline> well, i guess thats mostly due to the old kernel and stuff. sounds like you're running a tad behind.
[16:42] <Pauline> but by all means, update and return when your system is sparkly fresh.
[16:42] <alfredodeza> nope, it completed, I ran `sudo yum install ceph` and it failed in the same way
[16:43] <alfredodeza> there is stuff missing: http://pastebin.com/fp2pZ5Pf
[16:43] <alfredodeza> libsnappy being one of them
[16:44] <joelio> do you need to enable EPEL or something?
[16:44] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Quit: Ex-Chat)
[16:44] <joelio> although those deps look very ceph specific
[16:45] <Pauline> yes, and your yum is NOT using the ceph repository
[16:45] <Pauline> is it possible you have enabled=0 in that ceph.repo?
[16:45] <joelio> hah, yea
[16:45] <alfredodeza> Pauline: let me check
[16:46] <joelio> although how could you get ceph as a listing at all? If there was no ceph repo?
[16:46] <joelio> surely it'd just say, sorry, no ceph found
[16:46] <alfredodeza> it does say `enabled=1`
[16:47] <Pauline> there is no ceph repo listed at the top, only base, extras and updates...
[16:47] <Pauline> do a "yum clear all", something is messed up.
[16:47] * agh (~oftc-webi@gw-to-666.outscale.net) Quit (Quit: Page closed)
[16:47] <Pauline> clean*
[16:47] * alfredodeza tries that
[16:48] <Pauline> it should say something like "Cleaning repos: ceph ceph-deploy cobbler-base cobbler-updates epel"
[16:48] <alfredodeza> similar, yes: Cleaning repos: base ceph extras updates
[16:49] <Pauline> ok, now a yum update?
[16:49] <alfredodeza> aha
[16:49] <alfredodeza> no dice
[16:49] <alfredodeza> weird, not sure what I am missing
[16:50] <Pauline> let me check where libsnappy comes from on my system
[16:51] <Pauline> oi. EPEL.
[16:51] <Pauline> you might want to add EPEL as a repo then.
[16:51] * yanzheng (~zhyan@ Quit (Remote host closed the connection)
[16:52] <Pauline> and same for leveldb. So you really need it.
[16:52] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[16:52] * bergerx_ (~bekir@ Quit (Ping timeout: 480 seconds)
[16:54] <joelio> this is why I don't like yum (not the tool, just the ecosystem in general)
[16:55] <alfredodeza> Pauline: so I am not sure how to add EPEL
[16:55] * alfredodeza is clearly not an rpm person
[16:55] <joelio> although that may be down to painful RPM days of yesteryear - circular deps.. rpmfind.net, Dag Wieers rpms... etc..
[16:55] <Pauline> alfredodeza, follow this: http://fedoraproject.org/wiki/EPEL
[16:56] <Pauline> or skip ahead of the class and do "yum install http://download.fedoraproject.org/pub/epel/6/i386/repoview/epel-release.html" ^^
[16:56] <Pauline> yes, these days its just centos, epel, rpmforge...
[16:56] <Pauline> and just about any developer with too much disk and bandwidth ^^
[16:57] <Pauline> oops, ignore that skip ahead thing. that points to an html file. sorry
[16:59] * bergerx_ (~bekir@ has joined #ceph
[17:01] * yanzheng (~zhyan@ has joined #ceph
[17:01] * huangjun (~huangjun@ Quit (Quit: HydraIRC -> http://www.hydrairc.com <- Like it? Visit #hydrairc on EFNet)
[17:05] <alfredodeza> w00t it is now installing Ceph!
[17:05] <alfredodeza> thanks Pauline
[17:05] <alfredodeza> and joelio
[17:06] <Pauline> enjoy!
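
A rough sketch of the checks Pauline walked through above, for anyone hitting the same dependency errors: confirm that ceph.repo is enabled, then look at the "Cleaning repos:" line that `yum clean all` prints to see whether both ceph and epel are active. The helper names `repo_enabled` and `missing_repos` are made up for illustration, and the assumption that only the ceph and epel repos matter comes from this thread, not from the Ceph docs.

```shell
#!/bin/sh
# Hypothetical helpers; they only inspect text and never call yum themselves.

# Succeeds if the given yum .repo file contains a line "enabled=1".
repo_enabled() {
    grep -Eq '^[[:space:]]*enabled[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

# Takes the "Cleaning repos: ..." line from `yum clean all` and prints each
# of the repos needed in this thread (ceph, epel) that is not listed in it.
missing_repos() {
    line=$1
    for repo in ceph epel; do
        case " $line " in
            *" $repo "*) ;;          # repo id is present as a whole word
            *) echo "$repo" ;;       # repo id is absent
        esac
    done
}

# Example with the line alfredodeza pasted at 16:48:
missing_repos "Cleaning repos: base ceph extras updates"
```

With the 16:48 output this prints `epel`, which matches what Pauline found by hand. Once EPEL is added via the wiki page she linked, `yum clean all` followed by `yum install ceph` is the sequence that worked here.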
[17:09] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:10] * aliguori (~anthony@ has joined #ceph
[17:14] * sleinen (~Adium@2001:620:0:26:65f4:a733:e24a:b518) Quit (Quit: Leaving.)
[17:19] <joelio> alfredodeza: good stuff - sounds like a doc update to add EPEL needed
[17:19] <alfredodeza> joelio: on it :)
[17:22] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[17:22] * yanzheng (~zhyan@ Quit (Remote host closed the connection)
[17:31] * alram (~alram@ has joined #ceph
[17:31] * jcfischer (~fischer@user-23-15.vpn.switch.ch) Quit (Quit: jcfischer)
[17:34] * jcfischer (~fischer@ has joined #ceph
[17:37] * jcfischer_ (~fischer@user-23-13.vpn.switch.ch) has joined #ceph
[17:38] * devoid (~devoid@ has joined #ceph
[17:39] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[17:41] * madkiss (~madkiss@2001:6f8:12c3:f00f:2870:e33a:5d26:a4fc) Quit (Remote host closed the connection)
[17:42] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[17:42] * jcfischer (~fischer@ Quit (Ping timeout: 480 seconds)
[17:42] * jcfischer_ is now known as jcfischer
[17:44] * Machske2 (~bram@d5152D8A3.static.telenet.be) Quit (Quit: Leaving)
[17:46] <jtang> yum isnt too bad these days
[17:46] <jtang> its pretty actually
[17:46] <jtang> better than apt imho for most enterprisey kinda stuff
[17:46] <jtang> then again ubuntu/debian is a bit better these days too
[17:46] <jtang> i always found apt to be a bit hairy for doing automated deploys (its too verbose)
[17:47] <jtang> always asking questions and stuff
[17:48] <jtang> pretty good that is
[17:49] <joelio> jtang: you know all you need to do for automation with dpkg is just to preseed the answers so it doesn't try interactive mode, right?
[17:50] <jtang> joelio: yea
[17:50] <jtang> joelio: its that, which i dont like
[17:50] <paravoid> no need to preseed
[17:50] <jtang> i guess you can save the answers from the package system then dump them into the preseed
[17:50] <joelio> DEBIAN_FRONTEND="noninteractive" apt-get install -y --allow-unauthenticated --force-yes -o DPkg::Options::="--force-overwrite" -o DPkg::Options::="--force-confdef" stuff
[17:51] <paravoid> yeah, noninteractive just works
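
A minimal sketch of the noninteractive install being discussed, assuming the documented debconf variable `DEBIAN_FRONTEND`. The wrapper name `apt_install_quiet` and the `DRY_RUN` switch are hypothetical. The sketch deliberately leaves out the `--allow-unauthenticated`, `--force-yes` and `--force-overwrite` flags from joelio's line, since those skip safety checks, and `--force-confold` is an addition that is not in the original message.

```shell
#!/bin/sh
# Hypothetical wrapper around apt-get for unattended installs.
# DRY_RUN=1 prints the command instead of running it.
apt_install_quiet() {
    # Prepend the fixed part of the command to the package names given.
    set -- env DEBIAN_FRONTEND=noninteractive apt-get install -y \
        -o Dpkg::Options::=--force-confdef \
        -o Dpkg::Options::=--force-confold "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show what would run for a single package:
DRY_RUN=1
apt_install_quiet ceph
```

The two dpkg options tell dpkg to take the default action on changed config files and to keep the existing file when there is no default, so no prompt appears.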
[17:51] <jtang> joelio: im aware of that ;) we used to do that with cfengine/FAI
[17:51] <joelio> ahh, FAI :)
[17:51] <jtang> that was a nightmare :P
[17:52] <joelio> I really liked it :)
[17:52] <jtang> well it was for us when we had one of the first ib clusters in europe
[17:52] <jtang> we had to roll custom kernels
[17:52] <jtang> and all sorts of packages
[17:52] * matt__ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Ping timeout: 480 seconds)
[17:52] <joelio> that's not FAI's fault though :)
[17:52] <jtang> im much happier with kickstart
[17:52] <jtang> yea i suppose ;)
[17:53] <joelio> I use foreman nowadays - love it
[17:53] <jtang> i think kickstart/rpm is just cleaner and easier to get going
[17:53] <jtang> heh, we do minimalistic kickstarts here, then provision boxes with either custom repos and rpms, puppet or more recently ansible
[17:54] <joelio> it's a vi/emacs type thing I guess.. I prefer preseed/debs - but that's what I'm used to now.
[17:54] <jtang> we never really got into cobbler, foreman and the likes of that
[17:54] * sagelap (~sage@202.sub-70-197-69.myvzw.com) has joined #ceph
[17:54] <joelio> I'm using foreman mainly for puppet type stuff tbh
[17:54] <joelio> that's where it shines (for me)
[17:56] * Cube (~Cube@ has joined #ceph
[17:58] * bandrus (~Adium@ has joined #ceph
[18:00] <jtang> time to go, its 5pm!
[18:00] * jcfischer (~fischer@user-23-13.vpn.switch.ch) Quit (Quit: jcfischer)
[18:01] * wrencsok (~wrencsok@wsip-174-79-34-244.ph.ph.cox.net) has left #ceph
[18:09] * off_rhoden (~anonymous@pool-173-79-66-35.washdc.fios.verizon.net) has joined #ceph
[18:14] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[18:16] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[18:20] * wrencsok (~wrencsok@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[18:20] * sagelap (~sage@202.sub-70-197-69.myvzw.com) Quit (Read error: Connection reset by peer)
[18:26] * madkiss (~madkiss@2001:6f8:12c3:f00f:2870:e33a:5d26:a4fc) has joined #ceph
[18:26] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[18:27] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) Quit ()
[18:30] * bergerx_ (~bekir@ Quit (Quit: Leaving.)
[18:40] * rturk-away is now known as rturk
[18:41] * leseb1 (~Adium@ Quit (Quit: Leaving.)
[18:43] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[18:43] * aliguori (~anthony@ Quit (Quit: Ex-Chat)
[18:47] * sleinen (~Adium@ has joined #ceph
[18:53] * sleinen (~Adium@ Quit (Remote host closed the connection)
[18:54] * smiley (~smiley@c-71-200-71-128.hsd1.md.comcast.net) has joined #ceph
[19:00] * jjgalvez (~jjgalvez@ip72-193-215-88.lv.lv.cox.net) has joined #ceph
[19:06] * LeaChim (~LeaChim@ Quit (Ping timeout: 480 seconds)
[19:07] * rturk is now known as rturk-away
[19:11] * dmick (~dmick@2607:f298:a:607:391e:4fd:d328:a6ee) Quit (Ping timeout: 480 seconds)
[19:13] <gregaf> yehuda_hm: I'm a bit confused about wip-5691
[19:13] <gregaf> I guess the lost ACLs was not just a memory thing, but we were actually removing them?
[19:16] * smiley (~smiley@c-71-200-71-128.hsd1.md.comcast.net) has left #ceph
[19:18] <paravoid> the fix works
[19:18] <yehuda_hm> gregaf: we didn't remove them
[19:18] <yehuda_hm> gregaf: we just didn't read them, in case the bucket info still resided in the bucket entry point
[19:18] <gregaf> I saw that it worked in your comment, just confused
[19:19] <yehuda_hm> originally bucket info ( + bucket attrs) resided in what we now call the entry point object
[19:20] <gregaf> yeah
[19:20] <yehuda_hm> new buckets are now written to two different objects, where all the relevant bucket instance info goes to the bucket instance object
[19:20] <yehuda_hm> but the problem was that the new code that handles that didn't return the bucket attrs in case it was read from the entry point
[19:20] <gregaf> which includes ACLs, right
[19:22] <gregaf> okay, but this patch looks to just be grabbing stuff into a function-local variable and then ignoring it, so I don't understand how it's fixing anything...
[19:22] * dmick (~dmick@2607:f298:a:607:99c2:3046:25c8:89c) has joined #ceph
[19:23] <yehuda_hm> gregaf: we don't need these attrs everywhere
[19:24] <gregaf> oh, wait, get_bucket_info is the important one in that sense, there we go
[19:24] <yehuda_hm> the interesting place is at get_bucket_info()
[19:24] <yehuda_hm> right
[19:24] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Remote host closed the connection)
[19:25] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:25] <gregaf> and the reason we're passing them into put() is because that is in fact writing them out in the new correct location farther down the chain?
[19:26] <yehuda_hm> to put()?
[19:26] <gregaf> put_bucket_entrypoint_info
[19:27] <yehuda_hm> yes
[19:27] <gregaf> okay
[19:27] <yehuda_hm> iirc
[19:27] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) Quit (Ping timeout: 480 seconds)
[19:28] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[19:28] <gregaf> okay, looks good, Reviewed-by
[19:29] <yehuda_hm> cool, thanks
[19:29] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[19:29] <gregaf> you want to merge where appropriate?
[19:29] <yehuda_hm> you can just merge it to next
[19:31] <sagewk> sjust: wip-osd-leaks
[19:37] <paravoid> yehuda_hm: so, how do I convert old buckets to new ones?
[19:37] <paravoid> new format that is
[19:37] <yehuda_hm> paravoid: you don't need to
[19:38] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Quit: foo)
[19:38] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[19:38] <paravoid> does that mean it'll be done automatically?
[19:38] <gregaf> it will do so when the bucket changes in any way
[19:38] <yehuda_hm> yeah, when they're written
[19:41] * Tamil (~tamil@ has joined #ceph
[19:41] <gregaf> sagewk: joao: we don't have a pull request for wip-paxos?
[19:43] * aliguori_ (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[19:44] <sagewk> i'll open one
[19:49] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[19:50] * jjgalvez (~jjgalvez@ip72-193-215-88.lv.lv.cox.net) Quit (Read error: Connection reset by peer)
[19:55] * jane_dfg (~G25@ has joined #ceph
[19:56] * jane_dfg (~G25@ Quit (Remote host closed the connection)
[19:59] * markbby1 (~Adium@ has joined #ceph
[19:59] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) has joined #ceph
[20:00] * diegows (~diegows@ has joined #ceph
[20:00] * jjgalvez (~jjgalvez@ip72-193-215-88.lv.lv.cox.net) has joined #ceph
[20:02] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[20:02] * markbby (~Adium@ Quit (Remote host closed the connection)
[20:08] * tchmnkyz (~jeremy@0001638b.user.oftc.net) Quit (Quit: Lost terminal)
[20:09] * LeaChim (~LeaChim@0540adc6.skybroadband.com) has joined #ceph
[20:18] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) Quit (Read error: Connection reset by peer)
[20:24] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) has joined #ceph
[20:33] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[20:43] * Tamil (~tamil@ Quit (Quit: Leaving.)
[20:49] * Tamil (~tamil@ has joined #ceph
[20:51] * wer_ (~wer@206-248-239-142.unassigned.ntelos.net) Quit (Remote host closed the connection)
[20:51] * wer (~wer@206-248-239-142.unassigned.ntelos.net) has joined #ceph
[20:58] * lx0 is now known as lxo
[21:16] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[21:18] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[21:20] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[21:22] * Cube1 (~Cube@ has joined #ceph
[21:24] <gregaf> yehuda_hm: comments on https://github.com/ceph/ceph/pull/458 for the copy updates
[21:25] <yehuda_hm> thanks
[21:25] <paravoid> so why am I getting
[21:25] <paravoid> 2013-07-22 15:39:40.984622 osd.25 [WRN] slow request 30.884952 seconds old, received at 2013-07-22 15:39:10.099558: osd_op(client.32373.0:1205987 .dir.10267.409 [call rgw.bucket_prepare_op] 3.ed9872d4 e195956) v4 currently waiting for degraded object
[21:25] <paravoid> occasionally?
[21:25] <paravoid> occasionally when I restart OSDs that is :)
[21:26] <paravoid> what does "waiting for degraded object" mean?
[21:26] * Cube (~Cube@ Quit (Read error: Operation timed out)
[21:26] <gregaf> generally that the object doesn't exist (can just be out of date) on the primary so it's fetching it from elsewhere
[21:27] <gregaf> surprised it's taking that long though
[21:27] <gregaf> another thing for you to throw at sjust
[21:30] <sjust> paravoid: it is waiting to recover the item before performing the write
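
To see how often these warnings show up and what they are blocked on, a small awk filter over the cluster log can help. This is only a sketch based on the single line paravoid pasted: the function name `slow_requests` is hypothetical, and the field positions are assumed from that one sample rather than from any documented log format.

```shell
#!/bin/sh
# Hypothetical filter: reads cluster log lines on stdin and, for each
# "slow request" warning, prints the OSD, the age in seconds, and the
# state after the word "currently".
slow_requests() {
    awk '/slow request/ {
        osd = $3; age = ""; reason = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "request") age = $(i + 1)
            if ($i == "currently") {
                for (j = i + 1; j <= NF; j++)
                    reason = reason (reason == "" ? "" : " ") $j
            }
        }
        print osd, age, reason
    }'
}

# Example using the line from the channel:
echo '2013-07-22 15:39:40.984622 osd.25 [WRN] slow request 30.884952 seconds old, received at 2013-07-22 15:39:10.099558: osd_op(client.32373.0:1205987 .dir.10267.409 [call rgw.bucket_prepare_op] 3.ed9872d4 e195956) v4 currently waiting for degraded object' | slow_requests
```

For the sample line this prints `osd.25 30.884952 waiting for degraded object`. Piping the output through `sort | uniq -c` would group repeated warnings per OSD and state.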
[21:30] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Ping timeout: 480 seconds)
[21:31] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[21:36] * haomaiwa_ (~haomaiwan@ has joined #ceph
[21:42] * haomaiwang (~haomaiwan@ Quit (Ping timeout: 480 seconds)
[21:44] * scuttlemonkey (~scuttlemo@75-150-32-73-Oregon.hfc.comcastbusiness.net) has joined #ceph
[21:44] * ChanServ sets mode +o scuttlemonkey
[21:48] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) Quit (Ping timeout: 480 seconds)
[21:51] * jskinner (~jskinner@ has joined #ceph
[21:58] * markbby1 (~Adium@ Quit (Quit: Leaving.)
[21:59] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) has joined #ceph
[22:02] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Read error: Connection reset by peer)
[22:03] * markbby (~Adium@ has joined #ceph
[22:23] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:25] <yehuda_hm> gregaf: just pushed wip-5664, need review
[22:34] * jakes (~oftc-webi@128-107-239-233.cisco.com) has joined #ceph
[22:34] <jakes> what is the way to reconstruct a cluster if it is in a bad state?
[22:37] <Gugge-47527> fix the broken part :)
[22:37] <Gugge-47527> that is the best answer you will get without telling what bad state it is in :)
[22:38] <janos> move it to a nice state with good tax laws and low unemployment
[22:38] <janos> ;)
[22:40] <jakes> :). all 17:36:53.786648 mon.0 [INF] pgmap v3390: 192 pgs: 48 creating, 144 stale+active+clean; 31982 bytes data, 8230 MB used, 18300 MB / 27964 MB avail
[22:42] <Gugge-47527> and all osds are running?
[22:44] <jakes> yup
[22:45] <jakes> all osd's are running. I restarted osd's as told in troubleshooting
[22:53] <Gugge-47527> what is the output of "ceph osd tree" ?
[22:55] * markbby (~Adium@ Quit (Quit: Leaving.)
[22:57] <jakes> i restarted the machine. And now all osd's are down and out.
[23:01] <jakes> I have used ceph-deploy to start the cluster. `service ceph start osd.0` gives an error that osd.0 is not found in the conf file.
[23:02] <Tamil> jakes: ceph-deploy uses upstart to start the daemons
[23:04] <Tamil> jakes: http://ceph.com/docs/master/rados/operations/operating/#running-ceph-with-upstart
[23:05] <Tamil> jakes: looks like ceph-deploy supports both init.d scripts and upstart but from the error you mentioned, i guess in your case, it has started using upstart
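
One way to tell which init system an OSD was set up for, sketched under an assumption: that the OSD data directory contains an empty marker file named `upstart` or `sysvinit`. That marker convention and the function name `osd_start_cmd` are not confirmed anywhere in this log, so check the data directory by hand before relying on it. The function only prints the command to run.

```shell
#!/bin/sh
# Hypothetical helper: prints the command that would start one OSD,
# chosen by the marker file assumed to sit in its data directory.
# Usage: osd_start_cmd ID [DATA_DIR]
osd_start_cmd() {
    id=$1
    dir=${2:-/var/lib/ceph/osd/ceph-$id}
    if [ -e "$dir/upstart" ]; then
        echo "start ceph-osd id=$id"          # upstart job, as in the linked docs
    elif [ -e "$dir/sysvinit" ]; then
        echo "service ceph start osd.$id"     # init.d script
    else
        echo "unknown"
        return 1
    fi
}

# Example: osd_start_cmd 0
```

This would also explain the error jakes saw: the init.d script looks daemons up in ceph.conf, which has no [osd.0] section when the OSD was created with ceph-deploy.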
[23:06] <jakes> yup. I have done and it says start/running. but, ceph -w says all are down and out
[23:07] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[23:08] <Tamil> jakes: which distro are you trying this on?
[23:08] <jakes> ubuntu 13.04
[23:09] * sleinen1 (~Adium@2001:620:0:26:9c5d:377c:568b:782c) has joined #ceph
[23:10] <Tamil> jakes: anything from the osd logs - /var/log/ceph/ceph-osd.*.log?
[23:13] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[23:14] <jakes> errors are present in http://pastebin.com/xSwGa7gr
[23:14] <yehuda_hm> gregaf: pushed a few fixes to wip-5693, you can take a look
[23:15] * via (~via@smtp2.matthewvia.info) Quit (Ping timeout: 480 seconds)
[23:15] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[23:17] <gregaf> yehuda_hm: looks good, can you squash that cast into the original commit and repush?
[23:18] <gregaf> one github comment on the swift forwarding (looks like a useless check to me?), rest looks good
[23:20] <yehuda_hm> gregaf: thanks
[23:23] <jakes> all nodes have same error in the logs
[23:24] <Tamil> jakes: i hope the disks are still mounted
[23:24] * zackc (~zack@0001ba60.user.oftc.net) has joined #ceph
[23:25] <dmick> welcome zackc
[23:25] * alram (~alram@ Quit (Quit: leaving)
[23:25] <jakes> i haven't used separate partition for journal. just one ceph-deploy osd create
[23:25] <Machske> Guys, had all my mons failing in 0.61.5 today, bug: http://tracker.ceph.com/issues/5704
[23:25] <Machske> I have the original mon data of all 3 mons
[23:26] <zackc> thanks!
[23:26] <Machske> I "fixed" my mons by patching the symptoms so I had my 4.3TB of data back
[23:26] <Machske> but if I can help provide data for debugging, I can still reproduce it, or deliver the mon data so it can be reproduced
[23:26] <sagewk> Machske: can you try wip-cuttlefish-osdmap and see if it successfully works around the issue?
[23:27] <sagewk> http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-cuttlefish-osdmap/ if you are on ubuntu 12.04
[23:27] <Tamil> jakes: what is the osd create command that you used ?
[23:27] <Machske> sagewk, I'm running debian squeeze, but with custom kernel (3.6.1 atm) and ceph has been compiled from source
[23:27] * DarkAce-Z (~BillyMays@ has joined #ceph
[23:28] <Machske> is there a patch available against the source ?
[23:28] <sagewk> then cherry-pick the origin/wip-cuttlefish-osdmap patch
[23:29] <jakes> ceph-deploy osd prepare master:/var/lib/ceph/tmp/ and ceph-deploy osd activate master:/var/lib/ceph/tmp/
[23:30] * jskinner (~jskinner@ Quit (Remote host closed the connection)
[23:31] * DarkAceZ (~BillyMays@ Quit (Ping timeout: 480 seconds)
[23:33] <Tamil> jakes: could you please give more info about your cluster?
[23:33] * via (~via@smtp2.matthewvia.info) has joined #ceph
[23:33] * sleinen1 (~Adium@2001:620:0:26:9c5d:377c:568b:782c) Quit (Quit: Leaving.)
[23:33] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[23:33] <yehuda_hm> gregaf: I re-pushed both branches (wip-5664, wip-5693)
[23:35] <jakes> Tamil: I have a three-node cluster. The master node has mds+mon+osd while the other two nodes have only osds. Now, I can see only two osds but both are down and out. http://pastebin.com/qbPa9HzA
[23:35] <jakes> after a reboot*
[23:36] * mschiff_ (~mschiff@port-50293.pppoe.wtnet.de) Quit (Quit: No Ping reply in 180 seconds.)
[23:36] * mschiff (~mschiff@port-50293.pppoe.wtnet.de) has joined #ceph
[23:39] <Tamil> jakes: please file a bug with the logs that you have.
[23:39] <jakes> fine. I will do
[23:39] <jakes> Tamil, How can i restore the setup?
[23:41] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[23:42] <jakes> Tamil: I can recreate the cluster as this is a test setup. What can I do for that?
[23:50] <Tamil> jakes: to recreate, you may have to purge followed by purgedata on the admin node and create the cluster fresh
[23:50] * DarkAce-Z is now known as DarkAceZ
[23:51] <jakes> Tamil: can you give me link to these commands ?
[23:52] <Tamil> jakes: sure, it should be in ./ceph-deploy -help
[23:53] <Tamil> jakes: http://ceph.com/docs/master/rados/deployment/ceph-deploy-purge/
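
The order Tamil describes (purge, then purgedata) can be written down as a small plan printer. This is a sketch only: the function name `recreate_plan` is hypothetical, and the host names `node1` and `node2` in the example are placeholders, since jakes only named the host `master`. It prints the commands rather than running them, because both are destructive.

```shell
#!/bin/sh
# Hypothetical helper: prints the ceph-deploy commands for wiping the given
# hosts, in the order suggested in the channel. Nothing is executed.
recreate_plan() {
    if [ "$#" -eq 0 ]; then
        echo "usage: recreate_plan HOST..." >&2
        return 1
    fi
    echo "ceph-deploy purge $*"        # remove the ceph packages
    echo "ceph-deploy purgedata $*"    # remove the ceph data directories
}

# Example for a three-node test cluster:
recreate_plan master node1 node2
```

After that the cluster is created fresh from the admin node as in the original setup.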
[23:53] <jakes> Tamil: Thanks
[23:53] <Tamil> jakes: np
[23:54] <jakes> Tamil: which logs do I need for filing the issue?
[23:55] <Tamil> jakes: osd, mon and mds logs. in your case i suspect the way you created the osds. do you see the symlinks created for /var/lib/ceph/osd/ceph-0,etc..?
[23:58] <jakes> yes. Is a journal always needed? I didn't give one
[23:58] <Tamil> jakes: thats ok

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.