#ceph IRC Log


IRC Log for 2013-05-22

Timestamps are in GMT/BST.

[0:02] <hufman> but! i must go home, i will continue fiddling with this more later tonight!
[0:02] * jtang1 (~jtang@ has joined #ceph
[0:02] * hufman (~hufman@rrcs-67-52-43-146.west.biz.rr.com) has left #ceph
[0:12] * The_Bishop (~bishop@2001:470:50b6:0:8855:874:5466:7580) has joined #ceph
[0:15] * aliguori (~anthony@ Quit (Remote host closed the connection)
[0:35] * PerlStalker (~PerlStalk@ Quit (Quit: ...)
[0:39] * tziOm (~bjornar@ti0099a340-dhcp0870.bb.online.no) Quit (Remote host closed the connection)
[0:39] * portante (~user@ Quit (Quit: gotta go)
[0:40] * esammy (~esamuels@host-2-102-69-49.as13285.net) Quit (Quit: esammy)
[0:41] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[0:54] <kyle__> hello all. I'm having trouble getting my OSDs to activate on a new cluster. I see this in the OSD log:
[0:54] <kyle__> ...
[0:54] <kyle__> filestore(/var/lib/ceph/tmp/mnt.u6hL4v) FileStore::mount : stale version stamp 0. Please run the FileStore update script before starting the OSD, or set filestore_update_to to 3
[0:57] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[1:05] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[1:13] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[1:16] * mtanski (~mtanski@ Quit (Ping timeout: 482 seconds)
[1:20] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:20] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[1:21] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:30] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Read error: Connection reset by peer)
[1:30] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[1:32] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[1:36] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:36] * MarkN (~nathan@ has joined #ceph
[1:36] * MarkN (~nathan@ has left #ceph
[1:39] * jtang1 (~jtang@ Quit (Quit: Leaving.)
[1:58] * alram (~alram@ Quit (Quit: leaving)
[2:00] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[2:01] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[2:05] * DarkAceZ (~BillyMays@ Quit (Ping timeout: 480 seconds)
[2:07] * DarkAceZ (~BillyMays@ has joined #ceph
[2:08] * LeaChim (~LeaChim@ Quit (Read error: Operation timed out)
[2:10] * buck (~buck@bender.soe.ucsc.edu) Quit (Quit: Leaving.)
[2:13] * rahmu (~rahmu@ip-147.net-81-220-131.standre.rev.numericable.fr) Quit (Remote host closed the connection)
[2:25] * DarkAce-Z (~BillyMays@ has joined #ceph
[2:27] * DarkAceZ (~BillyMays@ Quit (Ping timeout: 480 seconds)
[2:34] * yehuda_hm (~yehuda@2602:306:330b:1410:6d5b:cd48:2d40:7a01) Quit (Ping timeout: 480 seconds)
[2:37] * yehuda_hm (~yehuda@2602:306:330b:1410:c5a7:1d24:87e7:190f) has joined #ceph
[2:39] * jtang1 (~jtang@ has joined #ceph
[2:43] * Tamil (~tamil@ Quit (Quit: Leaving.)
[2:47] * jtang1 (~jtang@ Quit (Ping timeout: 480 seconds)
[2:51] * DarkAce-Z is now known as DarkAceZ
[2:58] * hufman (~hufman@CPE-72-128-64-245.wi.res.rr.com) has joined #ceph
[2:58] <hufman> heeeello!
[2:58] <hufman> how is everyone's evening?
[2:59] * Cube (~Cube@66-87-112-70.pools.spcsdns.net) Quit (Quit: Leaving.)
[3:00] <hufman> so i have a two node cluster, and one node is down due to skipping the bobtail upgrade
[3:01] <hufman> should i remove that node from the monmap, or try to reinit that node with the current 2-node monmap?
[3:01] <hufman> i can't figure out how to extract the auth keys from a down cluster, so i'm leaning towards hacking it out of the monmap and readding it
[3:01] * yehuda_hm (~yehuda@2602:306:330b:1410:c5a7:1d24:87e7:190f) Quit (Ping timeout: 480 seconds)
[3:05] * BillK (~BillK@124-169-186-145.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[3:13] <hufman> and now i have a running 1mon cluster, weeeee
[3:14] * BillK (~BillK@124-169-236-155.dyn.iinet.net.au) has joined #ceph
[3:21] * yehuda_hm (~yehuda@2602:306:330b:1410:c5a7:1d24:87e7:190f) has joined #ceph
[3:22] <hufman> and the mon cluster is alive!
[3:22] <hufman> thanks so much, sage!
[3:22] <dmick> good for you. what did you do?
[3:33] * jtang1 (~jtang@ has joined #ceph
[3:34] * Cube (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) has joined #ceph
[3:38] <hufman> now, how to fix a crashing mds?
[3:41] * jtang1 (~jtang@ Quit (Ping timeout: 480 seconds)
[3:47] * Cube (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[3:49] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[3:56] <hufman> mds/MDCache.cc: In function 'void MDCache::open_snap_parents()': FAILED assert(reconnected_snaprealms.empty())
[4:03] * treaki__ (0b0c2150c1@p4FDF714E.dip0.t-ipconnect.de) has joined #ceph
[4:03] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[4:05] <dmick> http://tracker.ceph.com/issues/5031 but I don't know more
[4:06] * treaki_ (2cf4f0764e@p4FDF7C7E.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[4:17] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has joined #ceph
[4:18] * markl (~mark@tpsit.com) Quit (Quit: leaving)
[4:20] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has left #ceph
[4:20] <hufman> ok :)
[4:21] <hufman> is there any documentation about mds? the articles i found didn't say much, like about shrinking and reinitializing mds nodes
[4:22] <dmick> not that would help with that error, but what we have is under CephFS
[4:22] <dmick> http://ceph.com/docs/master/cephfs/
[4:23] <hufman> oh sneaky
[4:24] <hufman> well, one node of my mds is working, which is enough for ceph-fuse
[4:24] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[4:25] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[4:26] <dmick> add comments to that bug if you can
[4:27] * jtang1 (~jtang@ has joined #ceph
[4:28] <hufman> i dunno what more information i can give
[4:28] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[4:28] <hufman> started happening after the upgrade?
[4:28] <hufman> ummm
[4:29] <dmick> just a "me too" would be useful
[4:35] * jtang1 (~jtang@ Quit (Ping timeout: 480 seconds)
[4:36] * coyo (~unf@00017955.user.oftc.net) Quit (Remote host closed the connection)
[4:38] <hufman> done :)
[4:42] <hufman> have you guys tried out qemu-kvm with librbd?
[4:54] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[4:54] <dmick> hufman: yes, many people use it
[4:57] <hufman> has anyone experienced a thing where, if the underlying storage is slow, the vm will outright pause?
[4:57] <hufman> it doesn't make the vm's io pause, it makes the entire vm pause
[4:58] <dmick> I've heard people talk about it
[4:58] <dmick> there's been some work to make qemu less synchronous in the last few months
[4:59] <hufman> ok, so they are working on it? awesome :)
[4:59] <dmick> they. we. yes.
[5:02] <hufman> weee
[5:03] <hufman> my ceph install apparently can't handle many iops, so one of my VMs gets to stay local
[5:03] <hufman> it's because of munin, i think
[5:21] * jtang1 (~jtang@ has joined #ceph
[5:24] * The_Bishop (~bishop@2001:470:50b6:0:8855:874:5466:7580) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[5:29] * jtang1 (~jtang@ Quit (Ping timeout: 480 seconds)
[5:41] <hufman> but now i sleep!
[5:41] <hufman> good night wonderful sirs!!
[5:41] * hufman (~hufman@CPE-72-128-64-245.wi.res.rr.com) Quit (Quit: leaving)
[5:45] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[5:58] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[6:01] * MarkN (~nathan@ has joined #ceph
[6:01] * MarkN (~nathan@ has left #ceph
[6:03] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[6:07] * vipr (~vipr@78-23-118-188.access.telenet.be) Quit (Read error: Connection reset by peer)
[6:07] * vipr (~vipr@78-23-118-188.access.telenet.be) has joined #ceph
[6:15] * jtang1 (~jtang@ has joined #ceph
[6:23] * jtang1 (~jtang@ Quit (Ping timeout: 480 seconds)
[6:28] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[6:40] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[6:49] * The_Bishop (~bishop@f052098056.adsl.alicedsl.de) has joined #ceph
[7:06] * Cube (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) has joined #ceph
[7:10] * jtang1 (~jtang@ has joined #ceph
[7:10] * drokita1 (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Read error: Operation timed out)
[7:11] * Kdecherf (~kdecherf@shaolan.kdecherf.com) Quit (Ping timeout: 480 seconds)
[7:18] * jtang1 (~jtang@ Quit (Ping timeout: 480 seconds)
[7:19] * Kdecherf (~kdecherf@shaolan.kdecherf.com) has joined #ceph
[7:26] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[7:30] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[7:47] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) Quit (Quit: Leaving.)
[7:57] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) has joined #ceph
[8:02] * tnt (~tnt@ has joined #ceph
[8:03] * Bram (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[8:03] * Bram (~Bram@d5152D87C.static.telenet.be) Quit ()
[8:04] * jtang1 (~jtang@ has joined #ceph
[8:04] * esammy (~esamuels@host-2-102-69-49.as13285.net) has joined #ceph
[8:08] * janos (~janos@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[8:08] <tnt> sagewk: ping
[8:09] * janos (~janos@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[8:12] * jtang1 (~jtang@ Quit (Ping timeout: 480 seconds)
[8:17] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Read error: Connection reset by peer)
[8:17] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[8:21] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[8:24] * xmltok_ (~xmltok@pool101.bizrate.com) has joined #ceph
[8:24] * xmltok (~xmltok@pool101.bizrate.com) Quit (Read error: Connection reset by peer)
[8:29] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) Quit (Ping timeout: 480 seconds)
[8:41] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[8:42] * Vjarjadian (~IceChat77@ Quit (Quit: Why is the alphabet in that order? Is it because of that song?)
[8:44] * jtang1 (~jtang@ has joined #ceph
[8:46] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[9:04] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[9:07] * wogri (~wolf@nix.wogri.at) Quit (Quit: leaving)
[9:07] * wogri (~wolf@nix.wogri.at) has joined #ceph
[9:08] * eschnou (~eschnou@ has joined #ceph
[9:12] * ChanServ sets mode +v wogri
[9:14] * loicd (~loic@magenta.dachary.org) has joined #ceph
[9:15] * tnt (~tnt@ Quit (Read error: Operation timed out)
[9:25] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:32] * bergerx_ (~bekir@ has joined #ceph
[9:35] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[9:38] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[9:43] * leseb (~Adium@ has joined #ceph
[9:45] * coyo (~unf@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[9:46] * humbolt (~elias@213-33-1-180.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[9:52] * jtang1 (~jtang@ Quit (Quit: Leaving.)
[9:57] * humbolt (~elias@178-190-255-82.adsl.highway.telekom.at) has joined #ceph
[9:58] <tnt> So ... at some point the mon stops trimming. Not really sure how that could happen.
[10:01] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[10:02] * ScOut3R (~ScOut3R@ has joined #ceph
[10:07] <tnt> AFAICT, the only way that could happen (and not print any debug "trim_to" after a "finish_proposal") is that Paxos::should_trim() returns false.
[10:09] <tnt> and the only durable thing that could cause that is going_to_trim being stuck at True.
[10:12] * capri (~capri@ has joined #ceph
[10:17] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has left #ceph
[10:19] * LeaChim (~LeaChim@ has joined #ceph
[10:20] * mynameisbruce (~mynameisb@tjure.netzquadrat.de) Quit (Remote host closed the connection)
[10:22] * Cube (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[10:30] * KindTwo (KindOne@ has joined #ceph
[10:30] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Remote host closed the connection)
[10:30] * KindTwo is now known as KindOne
[10:32] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[10:34] <joao> tnt, it may also stop trimming according to service-specific conditions
[10:35] <joao> for instance, the osdmonitor won't trim unless the osd cluster is considered clean
[10:35] <joao> there's also a backoff to avoid frequent trimming
[10:40] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[10:40] <tnt> joao: mmm, where is that implemented ? Here I'm just looking at the "paxos" trimming and not the "paxosservice(...)" trimming. The latter still seems to be occurring.
[10:47] <loicd> ccourtaut: \o
[10:48] <ccourtaut> loicd: o/
[10:58] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[11:01] <joao> tnt, ah, sorry about that, misread the whole thing
[11:01] <joao> tnt, do you happen to have a sync going on by any chance?
[11:01] <joao> or have you noticed a sync going while Paxos stops trimming?
[11:02] <joao> that would disable Paxos trimming for the duration of the sync and then some to give time for a monitor to recover
[11:03] * rahmu_ (~rahmu@ has joined #ceph
[11:07] <joao> ah!
[11:07] <joao> there's a possibility of going_to_trim staying True forever
[11:08] <joao> tnt, awesome
[11:08] <joao> thank you
[11:09] <joao> fyi, when we queue a trim proposal we set going_to_trim = true; we then set it back to false when the proposal goes through and its callback of type C_Trimmed is called
[11:10] <joao> but if there's an election in-between, for instance, during Monitor::reset() we call Paxos::restart(), which in turn clears the list of queued proposals on Paxos
[11:11] <tnt> Ah, there you go, I got an election right about the time it started (because of spurious election due to 'compact' causing time out as we discussed last week).
[11:11] <joao> we should be finishing all contexts on that list with -EAGAIN, similarly to what we do for other queues, but instead we're just getting rid of them all
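The failure mode joao describes can be sketched as a toy model. This is not Ceph code: the class and method names below are hypothetical, and only going_to_trim, the C_Trimmed callback role and the -EAGAIN idea come from the discussion above.

```python
import errno


class ToyPaxos:
    """Toy model of the trim bookkeeping discussed above (not Ceph code)."""

    def __init__(self):
        self.going_to_trim = False
        self.proposals = []  # queued (name, callback) pairs

    def queue_trim(self):
        # Refuse to queue a second trim while one is believed to be pending.
        if self.going_to_trim:
            return False
        self.going_to_trim = True
        self.proposals.append(("trim", self._trimmed))
        return True

    def _trimmed(self, result):
        # Stand-in for the C_Trimmed callback: clear the flag either way.
        self.going_to_trim = False

    def restart_buggy(self):
        # Behaviour described in the log: queued proposals are just dropped,
        # so the callback never runs and the flag stays set.
        self.proposals.clear()

    def restart_fixed(self):
        # Behaviour joao proposes: finish each queued context with -EAGAIN.
        pending, self.proposals = self.proposals, []
        for _name, callback in pending:
            callback(-errno.EAGAIN)
```

With restart_buggy() a later queue_trim() keeps returning False, which matches a monitor that never trims again after an election; with restart_fixed() the flag is cleared and trimming can resume.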
[11:13] * brother (foobaz@vps1.hacking.dk) Quit (Ping timeout: 480 seconds)
[11:24] <tnt> looking forward to the patch :)
[11:25] <absynth> hrrrm
[11:25] <absynth> that PG "clone without head" thread is really unsettling
[11:25] <absynth> joao: why is there no official inktank answer there, yet? this is really a KO criterion, IMHO
[11:26] <joao> absynth, might have gone through the cracks if it came yesterday
[11:26] <joao> let me take a look
[11:28] <joao> ah, from May 7th
[11:28] <joao> well, no idea; my guess is that nobody noticed it
[11:28] * brother (foobaz@vps1.hacking.dk) has joined #ceph
[11:32] * alex_ (~chatzilla@d24-141-198-231.home.cgocable.net) has joined #ceph
[11:33] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[11:33] <andrei> hello guys
[11:33] <andrei> i am doing a small PoC with 2 servers
[11:33] <andrei> i've got things configured and working
[11:34] <andrei> i have tried restarting one of the osd servers and things got almost fully repaired after the osd server got powered up
[11:34] <andrei> however, the health state is shown as health_warn
[11:35] <andrei> and it doesn't seem to change
[11:35] <andrei> i've left it running for a day
[11:35] <andrei> and nothing has changed
[11:35] <andrei> it is showing 2.056% degraded
[11:35] <andrei> and there are some pgs degraded and stuck unclean
[11:35] <andrei> i was wondering if someone could help me with fixing the problem?
[11:38] <wogri_risc> andrei, did you fiddle with the crushmap?
[11:38] <andrei> nope, i've not tried anything
[11:38] <andrei> i've just done an init 6 on one of the servers
[11:38] <andrei> after it restarted i've done service ceph -a start
[11:39] <andrei> it started the service without any errors
[11:39] <andrei> and the repair started
[11:39] <andrei> it took about 30 mins or so to repair from 50% degraded
[11:39] <andrei> but it still left 2.056% degraded
[11:40] <wogri_risc> yeah, I remember that from yesterday.
[11:40] <andrei> yeah
[11:40] <andrei> so it actually didn't change since we spoke
[11:41] <andrei> i thought that the information was changing, but it wasn't
[11:41] <wogri_risc> hm.
[11:41] <wogri_risc> can you paste ceph osd tree
[11:41] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) has joined #ceph
[11:42] <andrei> http://ur1.ca/dza8u
[11:43] <wogri_risc> hm. you have an osd for the non-existing host arh/ibstorage2-ib?
[11:44] <wogri_risc> sorry, this is not an OSD
[11:44] <wogri_risc> it's just a host-entry.
[11:44] <andrei> that is strange
[11:44] <joao> is host arh-ibstorage2 supposed to be not-under-unknownrack ?
[11:44] <andrei> I am supposed to have only two hosts - arh-ibstorage1-ib and arh-ibstorage2-ib
[11:45] <andrei> these are the ipoib network
[11:45] <andrei> the arh-ibstorage2 is a 1gbit/s network
[11:45] <wogri_risc> I suspect this might cause you troubles
[11:45] <andrei> which is not supposed to run ceph at all
[11:45] <wogri_risc> you should change your crushmap
[11:46] <andrei> i do not see any mentioning of ib-storage2 in the ceph.conf
[11:46] <wogri_risc> but I'm not really sure if this will solve your problems
[11:46] <wogri_risc> this is in the crushmap
[11:46] <andrei> where did it get it from I wonder
[11:47] <andrei> so, what do I need to do?
[11:49] <andrei> how do i change the crash map?
[11:49] <tnt> :) crush
[11:49] <alex_> this is going to be a dumb question, but if you create OSD's via ceph-deploy, you dont specify a filestore or nothing, so is it reliant on client's filestore settings?
[11:50] <wogri_risc> andrei, http://ceph.com/docs/master/rados/operations/crush-map/
[11:55] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[11:55] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[11:57] <andrei> thanks I will look at it
[12:02] <alex_> oh i see, it uses xfs, should have read a little better
[12:07] <andrei> wogri_risc: thanks for the link. i've got the map and decompiled it
[12:07] <andrei> however, i can't figure out what is wrong with it
[12:07] <andrei> http://ur1.ca/dzald
[12:08] <andrei> all osds look okay to me
[12:08] <andrei> hostname too
[12:09] <andrei> ah, yeah
[12:09] <andrei> actually I do see
[12:09] <andrei> the hostname is wrong
[12:09] <andrei> host arh-ibstorage2 should be arh-ibstorage2-ib
[12:09] <andrei> however, it is still the same server, just different IP range
[12:12] <wogri_risc> andrei
[12:12] <wogri_risc> you're running a test-cluster, right?
[12:12] <Gugge-47527> andrei: the host is either named arh-ibstorage2 or arh-ibstorage2-ib, not both :)
[12:12] <wogri_risc> I would say maybe it's a good idea to start over again.
[12:13] <wogri_risc> you might have trouble getting those 2% away with an inconsistent crush-map. at least I would start over, and re-do the test, with a tidy osd tree.
[12:14] * ggreg_ is now known as ggreg
[12:16] <andrei> yes, that's the test cluster
[12:16] <andrei> non-production
[12:17] <andrei> wogri_risc: I can start over, no probs. However, i've created the cluster using 5 min quick start guide. I've not made any changes at all. the only thing that i've done is i've restarted the second osd server, which did not run mon or mds
[12:17] <andrei> so, the crush map got corrupted somehow
[12:17] <andrei> without me doing anything manually
[12:18] <andrei> apart from init 6 on the second server
[12:18] <wogri_risc> andrei, I've tried this, tool.
[12:18] <wogri_risc> (too, not tool)
[12:18] <andrei> so, it would be very useful for me to find out what has happened and how to get it fixed
[12:18] <wogri_risc> and in my case it worked. I could always get a 100% consistent cluster back.
[12:18] <andrei> without redoing the cluster
[12:19] <andrei> as it might happen in production and I can't redo a production cluster in the future )))
[12:19] <wogri_risc> did you rename the host in the crush-map, compile it and push it back in?
[12:19] <andrei> wogri_risc: I've not done this yet.
[12:19] <andrei> will do it now
[12:19] <wogri_risc> try that first.
[12:19] <andrei> i've just retrieved the map and decompiled it
[12:19] <andrei> thanks for your help, i will do that now
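The round trip wogri_risc asks about (fetch the map, decompile, edit by hand, compile, push back) can be sketched as below. The file paths and the helper names are made up for illustration; the ceph and crushtool invocations are the standard ones from the CRUSH map documentation linked earlier.

```python
import subprocess


def crush_edit_commands(workdir="/tmp"):
    """Return (fetch, push) command lists for a manual CRUSH map edit."""
    binary = f"{workdir}/crushmap.bin"   # hypothetical file names
    text = f"{workdir}/crushmap.txt"
    rebuilt = f"{workdir}/crushmap.new"
    fetch = [
        ["ceph", "osd", "getcrushmap", "-o", binary],  # dump the compiled map
        ["crushtool", "-d", binary, "-o", text],       # decompile to text
    ]
    push = [
        ["crushtool", "-c", text, "-o", rebuilt],      # recompile the edited text
        ["ceph", "osd", "setcrushmap", "-i", rebuilt], # inject it into the cluster
    ]
    return fetch, push


def run_all(commands, run=subprocess.check_call):
    """Run each command in order; 'run' is injectable so this can be dry-run."""
    for argv in commands:
        run(argv)
```

Run the fetch list, edit the text file, then run the push list. Passing run=print gives a dry run that only shows the commands.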
[12:22] * leseb (~Adium@ Quit (Quit: Leaving.)
[12:25] * leseb (~Adium@ has joined #ceph
[12:34] <andrei> wogri_risc: does that indicate a problem with the crushmap: http://ur1.ca/dzb0u ?
[12:35] <wogri_risc> yes, syntactically incorrect crushmap
[12:35] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[12:35] <wogri_risc> as it says, you have defined an item in the bucket unknownrack that doesn't exist as such
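The kind of error wogri_risc points at (a bucket listing an item that is not defined anywhere) can be caught before compiling with a small check. This is a rough sketch that assumes the usual decompiled layout of "device N name" lines and "type name { ... item X ... }" buckets; it is much less strict than crushtool itself, and the pasted maps from the log are not available, so the sample in the note is invented.

```python
import re

BUCKET_RE = re.compile(r"^\s*(\w+)\s+([\w.-]+)\s*\{")
ITEM_RE = re.compile(r"^\s*item\s+([\w.-]+)")
DEVICE_RE = re.compile(r"^\s*device\s+\d+\s+([\w.-]+)")


def undefined_items(crush_text):
    """List (bucket, item) pairs whose item is neither a device nor a bucket."""
    defined, refs, current = set(), [], None
    for line in crush_text.splitlines():
        if m := DEVICE_RE.match(line):
            defined.add(m.group(1))
        elif m := BUCKET_RE.match(line):
            current = m.group(2)          # entering a bucket definition
            defined.add(current)
        elif (m := ITEM_RE.match(line)) and current:
            refs.append((current, m.group(1)))
        elif line.strip() == "}":
            current = None                # leaving the bucket
    return [(bucket, item) for bucket, item in refs if item not in defined]
```

For example, renaming "host arh-ibstorage2" to "host arh-ibstorage2-ib" while leaving "item arh-ibstorage2" inside unknownrack would be reported as ("unknownrack", "arh-ibstorage2").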
[12:36] * diegows (~diegows@ has joined #ceph
[12:40] * sh_t (~sht@lu.privatevpn.com) Quit (Ping timeout: 480 seconds)
[12:40] <andrei> okay, i will try again
[12:42] * alex_ (~chatzilla@d24-141-198-231.home.cgocable.net) Quit (Ping timeout: 480 seconds)
[12:46] <andrei> okay, found the problems and compilation of crushmap didn't give any errors
[12:48] * Karcaw (~evan@68-186-68-219.dhcp.knwc.wa.charter.com) Quit (Remote host closed the connection)
[12:54] <andrei> i've updated the crush map and it is now rebuilding things
[12:58] <mikedawson> tnt: run your monitor log you uploaded to issue 4895 through cat mon-log-growing | grep paxos | grep updating | grep accepted
[12:59] <mikedawson> tnt: you'll see a pattern. I see the same pattern. Wonder if that is expected
[12:59] * Karcaw (~evan@68-186-68-219.dhcp.knwc.wa.charter.com) has joined #ceph
[13:00] <mikedawson> joao: ^ does that look right?
[13:00] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[13:00] <wogri_risc> andrei, so it works now?
[13:01] <tnt> mikedawson: the problem has been identified btw.
[13:02] <tnt> mikedawson: when there is an election while a trim is pending, it will erroneously just suppress all pending proposals without finishing them and so 'going_to_trim' is stuck to true and it's never going to submit another trim again.
[13:03] <mikedawson> tnt: great! Thanks for your help tracking this one down!
[13:04] <tnt> mikedawson: and wrt the election pattern it's because leases expire during the leveldb compact ... which is another problem and this causes a bunch of elections when there is no need.
[13:04] <tnt> I tracked that down last week but don't really have a fix.
[13:05] <mikedawson> tnt: I see. Reading back through irc logs now.
[13:07] <andrei> wogri_risc: yeah, it just finished, health is OK
[13:07] <andrei> everything got recovered
[13:08] <joao> mikedawson, what sort of pattern?
[13:08] <mikedawson> joao: http://pastebin.com/raw.php?i=uDnqwhVm
[13:08] <joao> ah
[13:09] <andrei> i will try to restart again to see if the same thing happens
[13:09] <joao> well, that's how paxos works
[13:09] <joao> say, the leader starts a proposal and automatically "accepts" it
[13:09] <joao> then you have mon.1 accepting it
[13:09] <joao> and then you have mon.2 accepting it
[13:09] <joao> and then there's a commit
[13:10] <joao> and during a proposal, paxos is set to be on STATE_UPDATING
[13:10] <andrei> do I need to do anything on a server before shutting it down? like do I need to enable a maintenance mode of some sort?
[13:10] <andrei> or is this not necessary with ceph?
[13:10] <joao> so I'm kind of glad that pattern is there, as it means paxos is working just fine :)
[13:11] <tnt> oh yeah, I thought "now" was "won" and it was an election :p
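The sequence joao walks through (leader proposes and accepts, each peon accepts, then a commit) can be written down as a toy event generator. It is a deliberately simplified sketch: every peer is assumed to accept, there are no timeouts or elections, and the function name and message wording are invented.

```python
def run_proposal(leader, peers, value):
    """Return the event sequence for one successful proposal round (toy model)."""
    events = [f"{leader} begin {value} (updating)"]
    accepted = {leader}                 # the leader accepts its own proposal
    events.append(f"{leader} accepted")
    for peer in peers:
        accepted.add(peer)              # assumption: every peer accepts
        events.append(f"{peer} accepted")
    if accepted == {leader, *peers}:
        events.append(f"{leader} commit {value}")
    return events
```

Calling run_proposal("mon.0", ["mon.1", "mon.2"], "pgmap") yields one begin, three accepts and one commit, which is the repeating shape one would expect to see in the grep output once per proposal.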
[13:11] <tnt> andrei: if the shutdown is of short duration of an OSD, you can set the 'noout' flag.
[13:11] <mikedawson> joao: that makes sense, does it normally happen that frequently?
[13:12] <joao> mikedawson, it depends on cluster load
[13:12] <joao> that can be all sorts of proposals
[13:12] <andrei> tnt: how do i do that?
[13:12] <tnt> ceph osd set noout
[13:12] <joao> pg stats, logging, mds shenanigans
[13:12] <andrei> is this done on the server which is being restarted?
[13:12] <tnt> andrei: nope, it's a global flag.
[13:13] <tnt> andrei: just do it on any server having the admin key
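A small wrapper makes it harder to forget to clear the flag after a short maintenance window. This is only a sketch: the context manager name is made up, and it assumes it runs on a host with the admin key, as tnt says. The set and unset commands are the ones the ceph CLI provides.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def noout(run=subprocess.check_call):
    """Set the cluster-wide noout flag for the duration of the block."""
    run(["ceph", "osd", "set", "noout"])
    try:
        yield
    finally:
        # Always clear the flag, even if the maintenance step failed.
        run(["ceph", "osd", "unset", "noout"])
```

Usage would be "with noout(): ..." around whatever triggers the reboot and waits for the OSDs to come back; 'run' can be replaced with a recorder for a dry run.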
[13:14] <joao> mikedawson, you can check your original log file for paxos services dealing with messages
[13:15] <tnt> joao: btw, can the proposals.clear() in leader_init be an issue as well ? I guess any leader_init() will go through reset() before.
[13:15] <joao> usually these proposals are triggered by some message being handled
[13:15] <joao> tnt, yeah, I'm working on a rework for some other bug and will take care of that while I'm in there
[13:15] <tnt> Most of my updates are the pgmap ... then right after a second one with the log message regarding that new pgmap :p
[13:16] <tnt> joao: ok, looking forward to it. I guess it'll be backported to cuttlefish ?
[13:17] <joao> I would think so
[13:18] <mikedawson> tnt: my pgmap updated about every 1s, just like the paxos proposals in the logs. joao: what is changing every second to require a new pgmap?
[13:21] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Remote host closed the connection)
[13:21] <tnt> stats I think
[13:22] <joao> yeah, stats
[13:22] <joao> I would think too
[13:22] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[13:31] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[13:31] <tnt> joao: btw, what kind of timeframe do you expect the fix in ? just want to know if it's worth it for me to deploy a quick fix or if I should wait for the official one (or even 0.61.3)
[13:32] <joao> tnt, after lunch, still this afternoon
[13:32] <tnt> oh ok, then I'll just wait.
[13:32] <joao> and in the context of lunch, brb
[13:45] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[13:50] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has left #ceph
[13:52] * KindTwo (KindOne@ has joined #ceph
[13:57] * rahmu_ is now known as rahmu
[13:57] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[13:58] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:59] * KindTwo is now known as KindOne
[14:07] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:08] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[14:12] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[14:12] * ChanServ sets mode +o elder
[14:15] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[14:16] * andrei (~andrei@host217-36-17-226.in-addr.btopenworld.com) has joined #ceph
[14:16] <andrei> hi once again.
[14:16] <andrei> i am trying to stop several osds before doing a reboot
[14:17] <andrei> and i run /etc/init.d/ceph stop osd.9
[14:17] <andrei> it doesn't give me any errors, but the osd tree is still showing osd.9 as up
[14:17] <andrei> what am I doing wrong?
[14:20] * DarkAceZ (~BillyMays@ Quit (Ping timeout: 480 seconds)
[14:20] <tnt> is the process still running ?
[14:20] <tnt> what version ?
[14:22] <tnt> andrei: ^^
[14:23] * DarkAceZ (~BillyMays@ has joined #ceph
[14:25] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:29] <andrei> one sec
[14:31] <andrei> tnt: I do have this process running: /usr/bin/ceph-osd -i 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf
[14:31] <andrei> ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60)
[14:31] <andrei> on ubuntu 12.04.2
[14:31] <tnt> try service ceph stop osd.9
[14:32] <andrei> done this, but the process is still running
[14:33] <tnt> and does it say "stopping ..." ?
[14:33] <andrei> nope, it just completes command without any errors or any other output
[14:33] <tnt> my guess is that the ceph.conf doesn't have the right osd.9 entry.
[14:33] * mnash (~chatzilla@vpn.expressionanalysis.com) has joined #ceph
[14:34] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[14:35] <andrei> it seems correct to me
[14:35] <andrei> [osd.9]
[14:35] <andrei> host = arh-ibstorage2-ib
[14:35] <andrei> devs = /dev/sda
[14:36] <andrei> and from the osd tree i have
[14:36] <tnt> what does "uname -n" say ?
[14:36] <andrei> -4 0 host arh-ibstorage2-ib
[14:36] <andrei> 9 1 osd.9 up 1
[14:36] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[14:37] <andrei> damn, thanks man
[14:37] <andrei> the uname is not right
[14:37] <andrei> i need to fix it
[14:40] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:41] <andrei> sorry, the hostname
[14:42] <andrei> tnt: that has worked like a charm
[14:42] <andrei> sorry for stupid questions
[14:43] <andrei> still getting used to ceph
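The exchange above suggests that the init script silently skips any daemon whose "host" entry in ceph.conf does not match the local hostname. A quick way to see which sections would match a given name is sketched below; the function name is hypothetical, and the comparison is a plain string match, which may differ in detail from what the script actually does.

```python
import configparser


def daemons_for_host(conf_text, hostname):
    """Return the ceph.conf sections whose 'host' option equals hostname."""
    parser = configparser.ConfigParser(strict=False)
    parser.read_string(conf_text)
    return [
        section
        for section in parser.sections()
        if parser.get(section, "host", fallback=None) == hostname
    ]
```

With the [osd.9] section pasted above, daemons_for_host(text, "arh-ibstorage2-ib") includes "osd.9", while "arh-ibstorage2" gives an empty list, which would explain a stop command that does nothing and prints nothing. Note that configparser treats indented lines as continuations, so the sample text should not be indented.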
[14:45] * Vjarjadian (~IceChat77@ has joined #ceph
[14:59] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:16] <mikedawson> jamespage: yesterday, you asked me to file a bug against qemu-kvm, but I don't use it (it was deprecated in qemu 1.3). Should I file it against qemu-system-common or something else, instead?
[15:16] <jamespage> mikedawson, yeah - thats fine - it will end up against the right source package
[15:16] <mikedawson> thanks jamespage
[15:22] <tchmnkyz> dmick: you around?
[15:22] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[15:23] <jamespage> dmick, if you are I also have a question about potentially enabling a --distro source option in ceph-deploy :-)
[15:24] * markl (~mark@tpsit.com) has joined #ceph
[15:24] * markl (~mark@tpsit.com) Quit ()
[15:27] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[15:27] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[15:28] <elder> jamespage, dmick may not be out of bed yet. Try again in a few hours.
[15:28] <jamespage> elder, thanks
[15:28] <elder> tchmnkyz, ^^
[15:28] * The_Bishop (~bishop@f052098056.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[15:28] <tchmnkyz> thnx
[15:29] <tchmnkyz> elder: maybe you can help me? i am having a mds problem where 1 out of 3 is working just fine
[15:29] <tchmnkyz> the other two commit suicide
[15:29] <elder> I am not the right person to help you on that, I'm sorry.
[15:30] <tchmnkyz> damn
[15:30] <tchmnkyz> was worth a try
[15:30] <tchmnkyz> it is funny
[15:30] <tchmnkyz> the server that mds daemon is fine on
[15:30] <tchmnkyz> is the only one that mon wont start on...
[15:30] <tchmnkyz> so i have 2 mds and 1 mon down
[15:31] <tnt> tchmnkyz: what does the mon say ?
[15:31] <tchmnkyz> mon says auth failed
[15:31] <tchmnkyz> but when i do a auth list it is in the list
[15:31] <tnt> all are at the same version ?
[15:31] <tchmnkyz> yes
[15:32] <tnt> does its keyring match the other mon ?
[15:32] <tchmnkyz> yea
[15:33] <tnt> can you pastebin the exact log ?
[15:33] <tchmnkyz> it was working fine until i upgraded to 0.61.2
[15:33] <tchmnkyz> ya
[15:33] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[15:33] <tnt> tchmnkyz: did you restart all mons already ?
[15:35] <joao> tnt, I pushed wip-4999 to gh with the intended patches; I tested them a bit, but they weren't subject to a realistic workload
[15:35] <joao> if you're willing to test it, it would be much appreciated
[15:36] <tchmnkyz> yes
[15:38] <tnt> tchmnkyz: well, I would just nuke it (i.e. stop it, move the data dir to some temp location, then redo a mkfs, copy the keyring from another mon to the data dir and restart).
[15:38] <tchmnkyz> both errors
[15:38] <tchmnkyz> http://pastebin.com/XmKzA66F
[15:40] <tnt> joao: only the 3 last commits are relevant right ? (I need to build custom packages with some other patches before putting them in prod and I want to take only the required changes)
[15:40] <joao> tnt, yeah, just the last 3
[15:40] <joao> oh, that's going into production?
[15:40] <joao> I'm not very comfortable with that
[15:40] <tnt> well that's the only place where I have the issue.
[15:41] <joao> not that it shouldn't work (it should), but if it doesn't it would be a pity
[15:41] <tnt> I was planning on putting it on the quorum leader only. (so I can just shutdown that mon if it doesn't work).
[15:41] <joao> now that I think of it, it's unlikely it would cause damage at all; worst case scenario would be a couple of hit asserts
[15:42] <joao> tnt, yeah, that should be a safe approach
[15:42] <tchmnkyz> tnt: what about the two mds's that wont start
[15:42] <tchmnkyz> i dont want to wipe out this node because it has the only working mds
[15:42] <tnt> tchmnkyz: I never used mds so I really can't help there.
[15:43] <tchmnkyz> what is mds used for?
[15:43] <tnt> tchmnkyz: I said to wipe the _mon_ data directory ... not all the node.
[15:43] <tchmnkyz> i just thought i had to have it
[15:43] <tnt> mds is used for cephfs
[15:43] <tnt> you don't need mds for rbd or rgw
[15:43] <tchmnkyz> really
[15:43] <tchmnkyz> nice
[15:43] <tchmnkyz> i did not know that
[15:43] <tchmnkyz> i am just using ceph for KVM storage
[15:46] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[15:47] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[15:50] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[15:57] * PerlStalker (~PerlStalk@ has joined #ceph
[16:04] <tchmnkyz> i keep getting this HEALTH_WARN mds 0 is laggy
[16:04] <tchmnkyz> but it wont let me clear it
[16:04] <tchmnkyz> everything i find online says to use ceph mds newfs metadata data
[16:05] <tchmnkyz> but that does not work for me
[16:07] <tnt> tchmnkyz: you removed all the mds from cluster with "ceph mds rm mds.X" ?
[16:07] <tnt> http://www.sebastien-han.fr/blog/2012/07/04/remove-a-mds-server-from-a-ceph-cluster/
[16:09] <tchmnkyz> yea i did that
[16:09] <tchmnkyz> and they are out of the conf too
[16:09] <tnt> did you read the note "Note: as Gregory Farnum mentioned it, the new cephfs command needs pool IDs as an argument and not pool name."
[16:09] <tchmnkyz> o ok
[16:09] <tchmnkyz> missed that
[16:09] <tchmnkyz> sorrt
[16:09] <tchmnkyz> sorry
[16:09] <via> is it possible to downgrade from .61 back to .56?
[16:10] * drokita (~drokita@ has joined #ceph
[16:10] <tnt> via: not if you ever _ran_ the cluster with 0.61 AFAIK.
[16:10] <via> ok
[16:10] <tnt> any reason you want to downgrade ?
[16:11] * The_Bishop (~bishop@2001:470:50b6:0:59c8:d63e:cbe6:ad15) has joined #ceph
[16:12] <tchmnkyz> thnx
[16:12] <tchmnkyz> fixed
[16:12] <tnt> and the 3 mon are running now as well ?
[16:14] <tchmnkyz> i recreated the third mon
[16:14] <tchmnkyz> and still getting auth failed
[16:15] <tchmnkyz> http://pastebin.com/pFMnAHWE
[16:15] <tchmnkyz> error logs
[16:16] <tnt> tchmnkyz: can you restart one of the working mon and pastebin its start log as well ?
[16:16] * dosaboy_ (~dosaboy@faun.canonical.com) has joined #ceph
[16:16] <tchmnkyz> ya
[16:17] <tchmnkyz> http://pastebin.com/JSiENxvJ
[16:17] <tchmnkyz> that is the good one
[16:19] <tnt> mmm, are you sure it's at 0.61.2 ? it doesn't print its version ... while the failing one does.
[16:20] <tchmnkyz> jeremy.may@jeremymay-macpro:~ $ for i in `seq 50 52` ; do vk 10.15.7.$i ceph -v ; done
[16:20] <tchmnkyz> ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60)
[16:20] <tchmnkyz> ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60)
[16:20] <tchmnkyz> taken from all 3
[16:20] <tchmnkyz> ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60)
[16:21] * drokita1 (~drokita@ has joined #ceph
[16:21] <tnt> what distribution do you use ?
[16:22] <tchmnkyz> the two working ones are debian
[16:22] <tchmnkyz> the third one is ubuntu
[16:22] * barryo (~borourke@cumberdale.ph.ed.ac.uk) Quit (Read error: Connection reset by peer)
[16:22] <tchmnkyz> the other two are going to become ubuntu when i can plan downtime
[16:22] <tchmnkyz> all of the OSD are ubuntu
[16:23] <tnt> can you do a "apt-cache policy ceph ceph-common librados2" on the 3 ?
[16:23] <tchmnkyz> ya
[16:23] <tnt> there was some dependency weirdness that meant that you could have some part of it in one version and some other part at an older version ...
[16:24] <tnt> and that "auth error" is typical of having two old mons in quorum that can't talk to a newer mon (and so that newer mon can't get session cipher keys)
[16:25] <tchmnkyz> http://pastebin.com/TrWjNLnd
[16:25] <tnt> there you go ... ceph package is still at 0.56.4-1~bpo60+1
[16:25] <tchmnkyz> looks like the deb servers are using bpo
[16:25] * drokita (~drokita@ Quit (Ping timeout: 480 seconds)
[16:25] <tchmnkyz> The following packages have been kept back: ceph ceph-mds
[16:26] <tchmnkyz> so how can i force it
[16:26] <tnt> yes, they have new dependencies so you need to do a "apt-get install ceph ceph-mds"
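The mixed-version situation tnt describes can be sketched in Python. This is only an illustration: `find_stale_packages` is a hypothetical helper, not part of any Ceph or apt tool, and the `ceph-common` and `librados2` version strings are made-up sample data (only `0.56.4-1~bpo60+1` appears in the log).

```python
def find_stale_packages(installed, expected_prefix):
    """Return {package: version} for packages whose version lacks the expected prefix."""
    return {
        pkg: ver
        for pkg, ver in installed.items()
        if not ver.startswith(expected_prefix)
    }


# Hypothetical per-host data, as one might copy by hand from `apt-cache policy`.
installed = {
    "ceph": "0.56.4-1~bpo60+1",        # version string quoted in the log
    "ceph-common": "0.61.2-1~bpo60+1",  # assumed for illustration
    "librados2": "0.61.2-1~bpo60+1",    # assumed for illustration
}

# The trailing "-" keeps "0.61.2" from also matching something like "0.61.20".
stale = find_stale_packages(installed, "0.61.2-")
print(stale)
```

Any package left in the result is one that was held back and still needs an explicit install.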
[16:26] <tchmnkyz> lol
[16:26] <tchmnkyz> k
[16:26] <tchmnkyz> E: Package 'gdisk' has no installation candidate
[16:26] <tchmnkyz> ok
[16:28] <tchmnkyz> wonder why gdisk is not in squeeze
[16:28] <tnt> mmm, no idea how 'gdisk' is needed ... maybe try --no-install-recommends
[16:28] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has joined #ceph
[16:29] <tchmnkyz> i can try it
[16:29] <tchmnkyz> i was just trying to find a way to get it installed
[16:29] <tchmnkyz> lol
[16:35] * dosaboy_ (~dosaboy@faun.canonical.com) Quit (Quit: leaving)
[16:37] * rahmu (~rahmu@ Quit (Ping timeout: 480 seconds)
[16:37] * dosaboy_ (~dosaboy@faun.canonical.com) has joined #ceph
[16:39] <tnt> tchmnkyz: how is it going ?
[16:44] <tchmnkyz> fixed on node 1
[16:52] <tchmnkyz> ok now it is complaining of a 3 second time skew
[16:53] <tchmnkyz> i have ntp running on all 3 nodes so i dont understand the time skew
[16:53] <tchmnkyz> Wed May 22 09:53:18 CDT 2013
[16:53] <tchmnkyz> Wed May 22 09:53:18 CDT 2013
[16:53] <tchmnkyz> Wed May 22 09:53:21 CDT 2013
[16:54] <tnt> might not be synced to the same time server (although 3 sec is pretty big for a ntp error)
[16:54] <tnt> you can see the ntp status using ntpq then the 'peers' command at the ntpq shell
[16:55] <tchmnkyz> ok i was able to get it back in sync
[16:55] <tchmnkyz> it seems node 3 was using time-b not time-a like the others as the primary time server
[16:56] <tchmnkyz> switched and things seem fine now
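The three `date` outputs pasted above can be compared mechanically. A rough sketch, assuming all hosts report the same time zone and an English locale; the 0.05 second threshold is an assumption about the monitor's allowed clock drift, so check your own configuration before relying on it.

```python
from datetime import datetime

# Assumed allowed drift between monitors, in seconds (not taken from the log).
ALLOWED_DRIFT_SECONDS = 0.05


def parse_date_output(line):
    """Parse `date` output such as 'Wed May 22 09:53:18 CDT 2013'."""
    parts = line.split()
    # Drop the zone token: strptime's handling of abbreviations like CDT
    # depends on the platform, and all hosts are assumed to share one zone.
    del parts[4]
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")


def max_skew_seconds(lines):
    """Largest difference between any two of the reported times."""
    times = [parse_date_output(line) for line in lines]
    return (max(times) - min(times)).total_seconds()


samples = [
    "Wed May 22 09:53:18 CDT 2013",
    "Wed May 22 09:53:18 CDT 2013",
    "Wed May 22 09:53:21 CDT 2013",
]
skew = max_skew_seconds(samples)
print(skew, "seconds;", "too much" if skew > ALLOWED_DRIFT_SECONDS else "ok")
```

Running `date` over ssh only has one-second resolution and adds its own delay, so this is a coarse check at best; `ntpq` remains the better tool.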
[16:57] <tnt> so all 3 mons are in quorum now ?
[16:58] <tnt> you might want to check the apt-cache policy on your OSDs as well to make sure version are consistent too.
[17:01] <tchmnkyz> all of the osd are ubuntu and look good
[17:01] <tchmnkyz> but now i get a warn status with no explanation of why
[17:01] <tnt> pastebin ceph -s
[17:01] <tchmnkyz> http://pastebin.com/2trXxcHe
[17:02] <tchmnkyz> was doing that as you typed it
[17:02] <tnt> mdsmap e666: 0/0/1 up
[17:02] <tchmnkyz> ok?
[17:02] <tnt> I think it thinks you still have a mds
[17:02] <tchmnkyz> but i removed them all
[17:03] <tnt> ceph mds dump
[17:03] * lightspeed (~lightspee@i01m-62-35-37-66.d4.club-internet.fr) Quit (Ping timeout: 480 seconds)
[17:03] <tchmnkyz> http://pastebin.com/nSDk6wCB
[17:05] <tnt> ok, so that's not the problem.
[17:06] * andrei (~andrei@host217-36-17-226.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[17:07] <tnt> does "ceph health detail" say anything ?
[17:09] * Wolff_John (~jwolff@ has joined #ceph
[17:11] <tchmnkyz> yea it does
[17:11] <tchmnkyz> lol
[17:11] <tchmnkyz> mon.2 addr has 13% avail disk space -- low disk space!
[17:13] <tchmnkyz> turns out there is a 43G core file in /
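The low disk space warning boils down to a percentage check on the filesystem holding the mon data. A minimal sketch using only the Python standard library; the 30% threshold is an assumption for illustration and may not match the value your monitors use.

```python
import shutil


def percent_available(total_bytes, free_bytes):
    """Free space as a percentage of the total."""
    return 100.0 * free_bytes / total_bytes


def is_low(total_bytes, free_bytes, warn_percent=30.0):
    """True when free space falls below the (assumed) warning threshold."""
    return percent_available(total_bytes, free_bytes) < warn_percent


# Check the root filesystem, where the 43G core file in the log was found.
usage = shutil.disk_usage("/")
print("%.1f%% available, low=%s" % (
    percent_available(usage.total, usage.free),
    is_low(usage.total, usage.free),
))
```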
[17:15] <tnt> :)
[17:17] <tchmnkyz> wahoo ceph is clean again
[17:17] * loicd (~loic@magenta.dachary.org) has joined #ceph
[17:18] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[17:19] * lightspeed (~lightspee@lns-c10k-ld-01-m-62-35-37-66.dsl.sta.abo.bbox.fr) has joined #ceph
[17:24] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Quit: Ex-Chat)
[17:25] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[17:25] <andrei> hello guys
[17:25] <andrei> i am trying to add more monitors to my test cluster
[17:25] <andrei> i currently have 1 monitor
[17:26] <andrei> i've added the second one using http://ceph.com/docs/master/rados/operations/add-or-rm-mons/
[17:26] <tnt> and now you don't have a quorum anymore :p
[17:26] <andrei> however, the last command where I start the monitor doesn't seem to do anything
[17:26] <andrei> it Exits 1
[17:27] <andrei> and I do not have ceph-mon process running on the second mon server I am trying to add
[17:27] <andrei> is this because of the quorum?
[17:27] <joao> tnt, any luck with those patches?
[17:28] <tnt> joao: I've just built the package and deployed them to our test cluster, I'll let it run there for a couple of hours and then update the quorum master in prod.
[17:28] <joao> alright
[17:28] <joao> let us know how it goes :)
[17:28] <tnt> andrei: did you run step 7 yet ?
[17:29] <tnt> andrei: does ceph -s do something ?
[17:29] <tnt> joao: I will !
[17:29] <andrei> tnt: i ran step 7 and 8
[17:29] <andrei> ceph -s shows: 2013-05-22 16:29:00.519618 7f299c320700 0 -- >> pipe(0x7f2988010860 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
[17:29] * tkensiski (~tkensiski@44.sub-70-197-6.myvzw.com) has joined #ceph
[17:29] <andrei> every once in a while
[17:29] * tkensiski (~tkensiski@44.sub-70-197-6.myvzw.com) has left #ceph
[17:30] <tnt> andrei: yes ... I think the best option when going from 1->2 is to skip step 7 and run step 8 first, let it sync, then run step 7.
[17:31] <tnt> andrei: you may have to remove it manually and start over.
[17:31] <andrei> what happens when I try to add a third one now?
[17:31] <andrei> no joy?
[17:32] <tnt> andrei: The problem is that when you do a "ceph mon add", you go from 1 mon to 2 mon meaning you need 2 mon to have a quorum and the mon you just added doesn't have a chance to do the initial sync.
[17:32] <tnt> nope, you need to go back to 1 mon first.
[17:32] <tnt> That process of going from 1->3 mon is not very well documented ...
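The quorum arithmetic behind tnt's explanation is a strict majority of the monitors listed in the monmap. A small sketch; the function names are made up for illustration.

```python
def quorum_size(num_mons):
    """Smallest strict majority of the monitors in the monmap."""
    return num_mons // 2 + 1


def has_quorum(num_in_map, num_running):
    """True when enough monitors are up to form a majority."""
    return num_running >= quorum_size(num_in_map)


for total in (1, 2, 3):
    print(total, "mons in map -> need", quorum_size(total))

# After "ceph mon add" the map lists 2 mons, but only the original one is running.
print(has_quorum(2, 1))
```

With two monitors in the map both must be up, which is why the freshly added (and not yet synced) monitor blocks the cluster.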
[17:34] <andrei> ceph mon delete b
[17:34] <andrei> that also gives me a fault
[17:34] <andrei> so, how do I remove it?
[17:34] <tnt> "Removing Monitors from an Unhealthy Cluster"
[17:34] <tnt> http://ceph.com/docs/master/rados/operations/add-or-rm-mons/
[17:34] <andrei> and do you know proper steps for going from 1 > 3?
[17:34] <andrei> cheers
[17:35] <tnt> I did it back at 0.48 so my memory might be a bit rusty, but AFAIR, you just skip step "7" of "Adding Monitors" on that page.
[17:36] <tnt> then when running step 8, the new mon will see it has no data, ask the existing quorum (of 1 mon) for the data and then try to join. The join will fail, but it will have done the sync. From that point, you can execute step 7 and restart both mon.
[17:39] <andrei> tnt: the syntax of step 2 from the link is not correct I think
[17:39] <andrei> http://ur1.ca/dzf8e
[17:40] <tnt> arf, it is ... but it's only in git master.
[17:41] <andrei> damn
[17:42] * lofejndif (~lsqavnbok@19NAAC8ZE.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:42] <tnt> how large is your store.db directory on the mon ?
[17:42] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[17:46] * eschnou (~eschnou@ Quit (Remote host closed the connection)
[17:46] <andrei> it has a bunch of files
[17:46] <andrei> 50mb in total
[17:47] <andrei> is there way around this problem?
[17:47] <tnt> can you tar it and upload it somewhere I can download ? I'll extract the monmap for you.
[17:47] <andrei> thanks man!
[17:47] <andrei> really appreciate it
[17:48] * portante (~user@ has joined #ceph
[17:49] * leseb (~Adium@ Quit (Quit: Leaving.)
[17:49] * leseb (~Adium@ has joined #ceph
[17:50] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[17:54] <andrei> tnt: did you get my link?
[17:55] <tnt> andrei: yes I did. But my own mons have started growing again, so I need to take care of that first :p
[17:55] <andrei> ah, okay )))
[18:00] * jackhill (jackhill@pilot.trilug.org) Quit (Remote host closed the connection)
[18:00] <tnt> andrei: http://ge.tt/8BNt1Mh/v/0
[18:00] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[18:00] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:01] <tnt> now you can use monmaptool to rm the second mon, then re-inject it into mon.a
[18:01] <tnt> then finally restart mon.a (you had stopped it right ?)
[18:02] * alex_ (~chatzilla@d24-141-198-231.home.cgocable.net) has joined #ceph
[18:03] * alex_ (~chatzilla@d24-141-198-231.home.cgocable.net) Quit ()
[18:03] <andrei> i will try that
[18:04] <andrei> thanks
[18:04] * alexxy[home] (~alexxy@ has joined #ceph
[18:04] <andrei> yes, i have stopped
[18:04] <andrei> thanks for your help!
[18:06] <sagewk> mikedawson: around?
[18:07] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Ping timeout: 480 seconds)
[18:07] * _are__ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[18:08] <andrei> tnt: is this not your map?
[18:08] <andrei> i do not have ip range ))
[18:08] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Read error: Connection reset by peer)
[18:09] <tnt> andrei: mmm, hum :p
[18:09] <andrei> mine should be 192.168.168 range
[18:09] <tnt> yes, and I checked it was, so I must have uploaded the wrong file ... ah yes, I have both /tmp/monmap and ~/monmap ...
[18:10] <andrei> ))
[18:11] <tnt> http://ge.tt/8qG66Mh/v/0
[18:12] <andrei> thanks
[18:12] <andrei> that's the correct one
[18:13] <andrei> so, from the guide to add monitors, i should do steps 1-6 for mon.b and mon.c and then do step 7?
[18:13] <andrei> which monitor do I add in step 7?
[18:13] <andrei> mon.b or mon.c?
[18:14] <tnt> andrei: first start with mon.b alone. Do steps 1-6 and then skip 7 and try to start it (like in step 8).
[18:15] * portante (~user@ Quit (Ping timeout: 480 seconds)
[18:16] <andrei> thanks!
[18:16] <andrei> the monmap got recovered
[18:16] <andrei> my cluster is back with HEALTH_OK
[18:16] <andrei> cheers for that
[18:16] <andrei> i will try adding mons once again
[18:17] <andrei> i guess monitor adding procedure is not only broken in the documentation but in the mkcephfs and ceph-deploy
[18:17] <andrei> as i've had monitor related issues when I was trying to create my clusters with 3 mons from the start
[18:19] <andrei> tnt: in my ceph.conf file, should I add monitors one by one or all three rightaway?
[18:19] <andrei> do you know?
[18:20] <tnt> doesn't matter much.
[18:24] * Wolff_John (~jwolff@ Quit (Ping timeout: 480 seconds)
[18:24] * alram (~alram@ has joined #ceph
[18:26] <tnt> joao: just restarted the leader with the fix. It didn't crash on start so that's a good sign :p
[18:27] <loicd> make check on master fails because of http://pastebin.com/qQAHES0K . It's so small it's probably fixed already ;-)
[18:28] <andrei> tnt: i've done steps 1-6 again
[18:28] <andrei> and step 8
[18:28] <andrei> the ceph.mon is running on the second server
[18:28] <tnt> now do step 7
[18:29] <andrei> ah
[18:29] <andrei> okay
[18:29] <tnt> what does ceph -s say ?
[18:29] <andrei> so I do not go to do the 3rd mon just yet?
[18:29] <tnt> (before running)
[18:29] <andrei> ceph -s shows health ok
[18:29] <tnt> and how many mons ?
[18:29] <andrei> but monmap shows just 1 monitor
[18:29] <andrei> not two
[18:29] <tnt> ok, so now try to do step 7
[18:29] <andrei> oh
[18:30] <andrei> hold on
[18:30] <andrei> it does show 2 mons now
[18:30] <joao> tnt, eh nice :p
[18:30] <andrei> monmap e4: 2 mons at {a=,b=}, election epoch 2, quorum 0,1 a,b
[18:30] <andrei> initially it showed just one
[18:30] <andrei> after a few moments it showed the second one
[18:30] <tnt> andrei: perfect then :) you don't even need to do 7) I guess it 'auto added' itself after the sync.
[18:30] <andrei> so, i have to do the 3rd one now, yes?
[18:31] <tnt> yup
[18:31] <andrei> cheers!
[18:31] <tnt> actually I think the doc should not even mention the step 7 since it only leads to issues and seems to be taken care of automatically when the mon daemon does the initial sync.
[18:32] * vata (~vata@2607:fad8:4:6:98cb:c558:8f0:e7fb) has joined #ceph
[18:33] <tnt> joao: of course now the issue is that it only happened from time to time ... so just need to wait "long enough" see if it still happens :p
[18:35] <tnt> gotta move home now. bbl.
[18:36] * leseb (~Adium@ Quit (Quit: Leaving.)
[18:36] <joao> yeah, and if doesn't happen ever again it might or might not be due to the fix
[18:36] <joao> :p
[18:38] * rturk-away is now known as rturk
[18:40] * tkensiski (~tkensiski@ has joined #ceph
[18:42] <andrei> tnt: thanks a lot! all 3 mons are up )))
[18:42] <andrei> tnt: a question, what would happen if I restart one of the monitors?
[18:42] <andrei> would ceph still function okay with just 2 mons?
[18:43] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:43] <sagewk> loicd: fixed it up, thanks
[18:44] * tkensiski (~tkensiski@ has left #ceph
[18:45] * Tamil (~tamil@ has joined #ceph
[18:45] * NightDog (~Karl@ has joined #ceph
[18:45] <andrei> guys, how many mds servers should I have on a small production cluster?
[18:45] <andrei> at least 2?
[18:45] <andrei> or more?
[18:47] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Remote host closed the connection)
[18:49] * yehuda_hm (~yehuda@2602:306:330b:1410:c5a7:1d24:87e7:190f) Quit (Ping timeout: 480 seconds)
[18:49] * yehuda_hm (~yehuda@2602:306:330b:1410:22:96ef:51be:2ef9) has joined #ceph
[18:50] * tnt (~tnt@ has joined #ceph
[18:50] <loicd> sagewk: you beat me :-) I'll withdraw https://github.com/ceph/ceph/pull/313
[18:51] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[18:52] <nhm> andrei: personally I feel like 5 is the minimum you really should have in production, but there is nothing stopping you from using even just 1 if you wanted.
[18:53] <nhm> andrei: preferably you'd have more like 10-20.
[18:53] <andrei> well, i only have two osd servers
[18:53] <andrei> and the total of 4 servers
[18:53] <andrei> 2 of which are clients
[18:53] <andrei> and 2 file servers
[18:53] <andrei> so, i have a very small cluster )))
[18:54] <andrei> but I want to have a setup which would allow me to shut down any one of the two storage servers
[18:54] <andrei> without clients losing access to the vm images
[18:54] <andrei> which will be stored in ceph using rbd
[18:54] <nhm> andrei: it'll work, it just won't handle a node failure nearly as well, especially if the disks are full.
[18:55] <andrei> what would happen if I fill up the disks?
[18:55] <nhm> andrei: if one of the nodes fails and you have the disks full, it can't create new replicas of the data.
[18:56] <andrei> nhm: my plan is to have two servers for now with replica of 2
[18:56] <andrei> so if one server is down, it will have the data on the second server
[18:56] <andrei> is that not what would happen?
[18:56] <nhm> andrei: that'll work. You'll want a mon on a 3rd node or just have a single mon.
[18:57] <andrei> nhm: tnt helped me with going from 1 mon to 3 just now
[18:57] <nhm> andrei: yes, but you won't have any copies of the data at that point. I'm not sure if the default crush rules would then try to create local replicas.
[18:57] <andrei> i have installed a 3rd mon on one of the clients
[18:57] <nhm> s/any copies/any extra copies
[18:58] <andrei> do you think ceph will try to create doubles on the same server?
[18:58] <andrei> if I switch off one of the servers?
[18:58] <nhm> andrei: Not sure.
[18:58] <nhm> andrei: you should be able to tweak it either way.
[18:59] * Vjarjadian (~IceChat77@ Quit (Quit: It's a dud! It's a dud! It's a du...)
[18:59] <andrei> somewhere in the ceph.conf file i guess?
[18:59] <nhm> andrei: the point is though that either way, the impact of losing one of those servers is big. You'll either be running with only a single copy of the data or you'll have to re-replicate *all* of the data in your cluster.
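nhm's point about a two-server cluster can be made concrete with a toy model. This is a sketch under a simplifying assumption (at most one replica of an object per host, and no re-replication yet); it is not how CRUSH actually computes placement.

```python
def worst_case_copies(replica_size, hosts_total, hosts_down):
    """Copies left of the unluckiest object right after hosts go down.

    Assumes at most one replica per host and that no recovery has run yet.
    """
    placed = min(replica_size, hosts_total)
    lost = min(hosts_down, placed)
    return placed - lost


# andrei's setup: 2 storage servers, replica size 2, one server shut down.
print(worst_case_copies(2, 2, 1))
# For comparison: 3 servers with replica size 3, one server down.
print(worst_case_copies(3, 3, 1))
```

In the two-server case every object drops to a single copy, so either that is accepted for the maintenance window or all data has to be copied again.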
[19:01] <andrei> nhm: I should be okay with having a single copy of the data for short period of time
[19:01] <andrei> like when I do maintenance, etc
[19:02] <andrei> but i can't afford for vms to stop working while one of the servers is down
[19:02] <andrei> i need to have them up all the time
[19:02] <andrei> previously i had a single nfs server
[19:03] <andrei> but every maintenance required me to shut down a bunch of vms and schedule downtimes, etc
[19:03] <andrei> which is not ideal at all
[19:03] <andrei> that's why i am switching to 2 servers and ceph to prevent this happening
[19:03] <andrei> overtime i would like to add a 3rd storage server
[19:03] * mikedawson (~chatzilla@ has joined #ceph
[19:04] <tnt> andrei: personally i use servers as both storage and compute nodes
[19:05] * danieagle (~Daniel@ has joined #ceph
[19:06] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Quit: Leaving)
[19:11] <nhm> andrei: one thing to keep in mind is that if the cluster tries to re-replicate, it will be doing a ton of IO with 2 servers.
[19:12] <nhm> IE recovery will probably be somewhat impactful in your scenario.
[19:13] <nhm> just fyi
[19:13] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[19:16] * sh_t (~sht@lu.privatevpn.com) has joined #ceph
[19:17] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[19:18] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:23] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[19:27] * joelio (~Joel@ Quit (Quit: leaving)
[19:31] * Wolff_John (~jwolff@vpn.monarch-beverage.com) has joined #ceph
[19:35] <loicd> sjust I'm looking at https://github.com/dachary/ceph/blob/b2332e60988fccdffd374eb976ed83ddc561cba2/src/osd/PGLog.h and trying to think about what the API should really look like ( as per https://github.com/ceph/ceph/pull/308 )
[19:36] <loicd> merge_log https://github.com/dachary/ceph/blob/b2332e60988fccdffd374eb976ed83ddc561cba2/src/osd/PGLog.h#L361
[19:36] <loicd> and proc_replica_log https://github.com/dachary/ceph/blob/b2332e60988fccdffd374eb976ed83ddc561cba2/src/osd/PGLog.h#L351
[19:37] <loicd> were moved from PG but not modified ( minimal modification required to move the code out of PG )
[19:38] <gregaf> sjusthm: ^
[19:39] <loicd> gregaf: thanks :-) I'm confused by sam's multiple personality
[19:41] <paravoid> sagewk: re: #4967, what do you need?
[19:42] * rturk is now known as rturk-away
[19:43] <sagewk> i pushed a branch to master that adds some debugging for other ways osds get marked down..
[19:43] <sagewk> i can push a cuttlefish branch that has it
[19:43] <sagewk> and we will be able to tell hopefully what is going on
[19:44] <paravoid> I'm still on bobtail
[19:44] <paravoid> I guess I might upgrade to cuttlefish in two weeks time
[19:44] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[19:45] <paravoid> rumour has it cuttlefish still has mon leveldb issues so I hadn't felt very comfortable risking it :)
[19:46] <tnt> paravoid: it has. There is also a fix for them being tested right now :p
[19:46] <tnt> (although technically they turned out to be trimming issues rather than leveldb issues)
[19:46] * ScOut3R (~ScOut3R@dsl51B614D7.pool.t-online.hu) has joined #ceph
[19:46] * john_barbee_ (~jbarbee@ has joined #ceph
[19:47] <paravoid> sagewk: which branch is that?
[19:48] <sagewk> haven't created it yet :) sorry, will do that in a minute
[19:48] <sagewk> in the middle of a bug scrub
[19:48] <paravoid> oh
[19:48] <paravoid> no worries
[19:48] <paravoid> it's not like I can easily reproduce the issue...
[19:56] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[19:59] * NightDog (~Karl@ Quit (Ping timeout: 480 seconds)
[19:59] * dwt (~dwt@128-107-239-234.cisco.com) has joined #ceph
[20:00] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[20:05] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) has joined #ceph
[20:08] * john_barbee__ (~jbarbee@ has joined #ceph
[20:09] <imjustmatthew> Does the assert "mon/Monitor.cc: 1480: FAILED assert(state == STATE_SYNCHRONIZING)" look familiar to anyone on 61.2?
[20:10] * john_barbee_ (~jbarbee@ Quit (Ping timeout: 480 seconds)
[20:14] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[20:20] * BillK (~BillK@124-169-236-155.dyn.iinet.net.au) Quit (Ping timeout: 481 seconds)
[20:22] * bergerx_ (~bekir@ Quit (Quit: Leaving.)
[20:24] * pja (~pja@a.clients.kiwiirc.com) has joined #ceph
[20:24] <pja> hello
[20:24] <pja> anybody familiar with radosgw setup?
[20:25] <tnt> pja: ask and see if anyone knows the answer ...
[20:26] <pja> radosgw service doesnt start
[20:27] <sagewk> joao: see ^
[20:28] <sagewk> imjustmatthew: are you still seeing leveldb growth?
[20:28] <sagewk> mikedawson: ping!
[20:28] <mikedawson> pong
[20:28] <pja> when i want to start there comes no new entry to the log file, which i defined in the ceph.conf
[20:28] <sagewk> mikedawson: can you try out wip-4895-cuttlefish and see if it fixes your leveldb growth?
[20:28] * sagewk crosses his fingers
[20:29] <mikedawson> sagewk: mine aren't growing now. Would it be better to get into growth mode, then apply it to one?
[20:29] <joao> imjustmatthew, looks familiar indeed
[20:29] <joao> imjustmatthew, do you have logs?
[20:29] <imjustmatthew> sagewk: No, everything's been behaving pretty well, this was just a single assert the other night
[20:29] <joao> if so, #4999
[20:29] <tnt> sagewk: fyi, I'm running those patches, installed a few hours ago.
[20:29] <imjustmatthew> joao: Only at the default level, I'll post what I have
[20:29] <sagewk> mikedawson: we think it's triggered by an election.. i bet if you killall -STOP ceph-mon on one node for ~14 seconds and then killall -CONT ceph-mon it will start growing
[20:30] <joao> imjustmatthew, thanks
[20:30] <sagewk> (~15 seconds, that is :)
[20:30] <tnt> well, you need an election right when a trim is being queued as proposal ...
[20:31] <sagewk> oh right
[20:31] <sagewk> in that case, i'd say install that branch, and check back in a day or two and make sure it's not growing
[20:31] <mikedawson> sagewk: I'll try to repeatably create growth using that method on 0.61.2, if I can get it growing reliably, I'll upgrade and let you know if its better
[20:31] <mikedawson> ahh..
[20:31] <Tamil> pja: what is the error you see?
[20:31] <imjustmatthew> joao: http://goo.gl/VNawj
[20:32] <mikedawson> I'll run it now and wait
[20:32] <Tamil> pja: which distro are you trying this on?
[20:34] <sagewk> mikedawson: thanks :)
[20:34] <tnt> pja: you need to make sure the hostname matches and the name of the entry is [client.radosgw.something]
[20:34] <mikedawson> sagewk: Thank you (and you too tnt)!
[20:34] <sagewk> df
[20:34] <sagewk> np
[20:36] <joao> imjustmatthew, thanks :)
[20:36] * yehuda_hm (~yehuda@2602:306:330b:1410:22:96ef:51be:2ef9) Quit (Ping timeout: 480 seconds)
[20:37] * yehuda_hm (~yehuda@2602:306:330b:1410:22:96ef:51be:2ef9) has joined #ceph
[20:38] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[20:38] * LeaChim (~LeaChim@ Quit (Ping timeout: 480 seconds)
[20:38] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:39] <mikedawson> sagewk: 404's for Raring. I assume gitbuilder will do its thing eventually, right?
[20:39] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[20:41] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[20:41] * john_barbee_ (~jbarbee@ has joined #ceph
[20:42] * john_barbee__ (~jbarbee@ Quit (Read error: Connection reset by peer)
[20:43] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) has joined #ceph
[20:45] * The_Bishop (~bishop@2001:470:50b6:0:59c8:d63e:cbe6:ad15) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[20:47] <pja> sry, im back
[20:47] <pja> ubuntu 12.04 lts
[20:47] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) has joined #ceph
[20:47] <pja> yeah it starts with client.radosgw
[20:47] <pja> [client.radosgw.connect2]
[20:48] <pja> host = hcrgwko2
[20:48] <pja> keyring = /etc/ceph/keyring.radosgw.connect2
[20:48] <pja> rgw socket path = /tmp/connect2.sock
[20:48] <pja> log file = /var/log/ceph/connect2.log
[20:48] <pja> rgw dns name = hcrgwko2.cgm.ag
[20:48] <pja> [client.radosgw.connect2]
[20:48] <tnt> try /etc/init.d/radosgw start instead of using service xxxx.
[20:48] <pja> ups without last
[20:48] * LeaChim (~LeaChim@ has joined #ceph
[20:48] <pja> i use etc/init.d
[20:48] <pja> service doesnt work with radosgw in general
[20:48] <tnt> what does uname -n says ?
[20:49] <pja> uname -n says hcrgwko2.cgm.ag
[20:49] <pja> its the fqdn
[20:49] <tnt> that's the problem
[20:50] <tnt> look in the script, it compares what is in the ceph.conf host = xxx with the result of `hostname` ...
[20:52] <pja> ive changed /etc/hostname to hcrgwko2 and it doesnt work
[20:52] <pja> in the /etc/hosts i've entered the netbios and the fqdn
[20:53] <tnt> you'll need to also run 'hostname hcrgwko2' (or reboot) for it to update the hostname live.
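The mismatch tnt points out is a plain string comparison between the `host =` value and the system hostname. A sketch of that comparison based only on tnt's description, not on the actual init script; `short_hostname` and `host_matches` are hypothetical helpers.

```python
import socket


def short_hostname(name):
    """Strip the domain part: 'hcrgwko2.cgm.ag' -> 'hcrgwko2'."""
    return name.split(".", 1)[0]


def host_matches(conf_host, system_hostname):
    """Exact comparison, as the init script is described as doing."""
    return conf_host == system_hostname


conf_host = "hcrgwko2"             # 'host =' value from pja's ceph.conf
reported = "hcrgwko2.cgm.ag"       # what 'uname -n' returned for pja

print(host_matches(conf_host, reported))                  # FQDN vs short name
print(host_matches(conf_host, short_hostname(reported)))  # after setting the short name
print("this machine reports:", socket.gethostname())
```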
[20:54] <pja> oh i forgot to do that, now it starts, thx a lot ! :)
[20:54] * eschnou (~eschnou@203.39-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:54] <pja> ive another question
[20:54] <pja> ive deactivated cephx in the config
[20:54] <pja> do i need any keyrings?
[20:55] <pja> do i need key rings for permissions so for example radosgw can connect to the object store?
[20:55] <tnt> don't think so ...
[20:58] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[20:58] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:59] <pja> dont think or do you know exactly?
[20:59] <pja> its not clear for me
[21:00] * dosaboy_ (~dosaboy@faun.canonical.com) Quit (Quit: leaving)
[21:00] <tnt> I'm pretty confident you don't need a keyring but I wouldn't bet my life on it either ...
[21:02] <pja> the question is, when i deactivate cephx, its not clear for me if you dont need keyrings anymore for doing things with the ceph tools (ceph, ceph-authtool, etc)
[21:02] <pja> nevertheless needing them at the same time for permissions to look into the object storage
[21:03] <pja> bad language :(
[21:03] <pja> i think cephx is  on the one hand for authentication for the tools like ceph, ceph-authtool etc
[21:04] * dwt (~dwt@128-107-239-234.cisco.com) Quit (Ping timeout: 480 seconds)
[21:05] <pja> but to access the object store(rgw, ceph-fs, block access) you need permissions via a key in the keyring, regardless you have activated cephx or not
[21:06] <tnt> nope
[21:06] <pja> so when deactivated cephx you dont need any key/ring?
[21:06] <tnt> if you disable cephx, you can access object store without key.
[21:07] <pja> so anybody can see everything in the object storage?
[21:07] <tnt> yes
[21:09] <pja> thx
[21:09] <pja> another question
[21:09] <pja> ive setted up 3 mons and 3 osd servers
[21:09] * vipr (~vipr@78-23-118-188.access.telenet.be) Quit (Remote host closed the connection)
[21:09] <pja> with service ceph -a start i can start them all
[21:10] <pja> when i do this on an osd
[21:10] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[21:11] * vipr (~vipr@78-23-118-188.access.telenet.be) has joined #ceph
[21:11] <pja> it says starting mon1 and then "ceph-create-keys "
[21:11] <pja> what is ceph-create-keys  ?
[21:12] <pja> it hangs at this point
[21:12] <dmick> ceph-create-keys is a Python script that creates keys for newly-discovered daemons
[21:12] <tnt> no idea, it's some new stuff and I never looked into it yet.
[21:13] <dmick> waits for the mon to become ready, then adds bootstrapping keys.
[21:14] <pja> what do you mean with newly discovered.. they were started before
[21:14] * john_barbee_ (~jbarbee@ Quit (Quit: ChatZilla 0.9.90 [Firefox 22.0/20130514181517])
[21:14] <pja> it hangs at this point for minutes now
[21:14] * eschnou (~eschnou@203.39-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[21:15] <pja> when i connect to the mon and start there ceph, it will resume on the osd with the other daemons
[21:15] <dmick> are you saying you get ceph-create-keys back from the 'service ceph -a start' command?
[21:15] <pja> yes
[21:16] <pja> root@osd1:~# service ceph -a start
[21:16] <pja> === mon.a ===
[21:16] <pja> Starting Ceph mon.a on monitor1...
[21:16] <pja> Starting ceph-create-keys on monitor1...
[21:16] <dmick> do you really mean you see "Starting ceph-create-keys on $host"?
[21:16] <dmick> ij'
[21:16] <dmick> ok
[21:16] <pja> everytime
[21:16] <pja> and it hangs now at this point
[21:16] <dmick> yes. If you look at the script, you see that it does that every time you start a mon
[21:16] <dmick> certainly it shouldn't hang.
[21:17] <pja> when i connect to monitor1 and type service ceph start
[21:17] <pja> it will start the mon and resume the "ceph -a start"-process on the osd
[21:17] <dmick> strace it and see what it's doing. it's either looping trying to wait for the monitors to reach quorum or looping trying to contact the mon to add keys
[21:17] <dmick> and, as I said, it's a script, so you can investigate.
[21:18] <pja> whats the meaning of strace?
[21:18] <dmick> it's a Linux command
[21:18] <pja> can you explain what i have to do
[21:18] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[21:18] * eschnou (~eschnou@203.39-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:19] <mikedawson> pja: is your cluster in quorum?
[21:19] * diegows (~diegows@ has joined #ceph
[21:19] <pja> it can contact the mon without problems with ssh, ive tested this
[21:19] <pja> how can it be in quorum, when the services are stopped on every host?
[21:19] <pja> it's a test setup that we set up 1 day ago
[21:19] <mikedawson> pja: the monitor processes are stopped?
[21:20] <pja> yes
[21:20] <pja> everything
[21:20] <pja> ceph -a stop
[21:20] <dmick> but you said ceph -a start.
[21:20] <dmick> and you said "it starts the mon and then..."
[21:20] <dmick> which is it?
[21:20] <pja> ceph a stop  before
[21:20] <mikedawson> ceph-create-keys tends to hang until monitors for a quorum
[21:20] <mikedawson> form
[21:20] <pja> i want to understand the functionality, how it works
[21:20] <dmick> s/tends to hang/loops/
[21:21] <dmick> it insists on live monitors
[21:21] <pja> when i install it, ceph is not started on any host
[21:21] <pja> so i connect to any host, for example osd1 or mon2
[21:22] <dmick> of course not, and that's not relevant; installation is separate from startup
[21:22] <pja> and can start ceph on all cluster hosts with ceph -a start
[21:22] <pja> i thought
[21:23] <dmick> 1) what tool did you use to set up the daemons?
[21:24] <dmick> you installed packages, then what?
[21:24] <pja> and as i said, when i go to a 2. host, and start ceph there, it will resume also on host1
[21:24] <pja> mkcephfs --mkfs
[21:24] <pja> at this moment only 1 osd
[21:25] <dmick> you've wiped out your previous installs and have now installed only one osd?
[21:26] <pja> ive started with one osd, want to add another one next week
[21:26] <pja> first installation ever
[21:27] <pja> starting with one osd, trying to connect. when it works, adding another 2 osds next week
[21:27] <pja> hardware is there, only have to install
[21:29] <dmick> ok, but, you realize you must have mons to have a cluster, right?
[21:30] <pja> ive installed 3mons,1rgw,1osd
[21:31] <pja> when i can connect, i will install another 2 osds, 2. rgw and load balancer for rgw
[21:31] <dmick> so if you have a ceph.conf that describes them all, that's been copied to all hosts, and all hosts have passwordless ssh as root
[21:31] <pja> yes
[21:31] <dmick> then service ceph -a start will start all daemons on all hosts
[21:31] <dmick> that's what -a is for
[21:31] <pja> yeah i know that a is for all hosts
[21:32] <pja> but as i said
[21:32] <pja> when every ceph service is stopped
[21:32] <pja> and i make service ceph -a start on one of this hosts
[21:32] <pja> it will hang at the point creating keys
[21:32] <pja> when i go to a 2. host and make ceph -a start it will resume
[21:32] <dmick> what is "a 2. host"?
[21:33] <pja> mon1
[21:33] <pja> first run ceph -a start on osd1
[21:33] <pja> than it hangs
[21:33] <pja> i go to mon 1 and make there also ceph -a start
[21:33] <pja> and it resumes on mon1
[21:33] <dmick> "a 2." is "a second"?
[21:34] <dmick> ok, maybe I'll just assume that's what you mean by "a 2."
[21:34] <dmick> so
[21:34] <dmick> we need to discover why the first ceph -a start has problems
[21:34] <dmick> doing it again is not the solution
[21:34] <dmick> so when you do it the first time
[21:34] <dmick> and ceph-create-keys hangs
[21:34] <dmick> there are two theories why:
[21:34] <dmick> (12:17:38 PM) dmick: strace it and see what it's doing. it's either looping trying to wait for the monitors to reach quorum or looping trying to contact the mon to add keys
[21:35] <pja> yes i mean second
[21:35] <dmick> if you don't understand what I mean by 'strace it'
[21:35] <dmick> man strace
[21:36] * mikedawson (~chatzilla@ Quit (Ping timeout: 480 seconds)
[21:36] <pja> i've read the first part, but i don't know what i have to do now
[21:37] <dmick> you can also read the ceph-create-keys script and see that it's doing ceph --admin-daemon=/var/run/ceph/$cluster-mon-$mon_id.asok mon_status and looking for its return code
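The check described above can be pictured as a retry loop around a status command. This is a minimal sketch under stated assumptions, not the actual ceph-create-keys code (which is a Python script and may check more than the exit status); `wait_for` and the attempt limit are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper, not part of Ceph: retry a command until it exits 0,
# giving up after a fixed number of attempts.
wait_for() {
    tries=$1
    shift
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Success is judged purely by the exit status of the command.
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Example use with the command quoted in the log (socket path as given there):
#   wait_for 30 ceph --admin-daemon=/var/run/ceph/$cluster-mon-$mon_id.asok mon_status
```

Unlike the behaviour described in the channel, this sketch gives up after a bounded number of attempts, so a monitor that never becomes ready shows up as a failure instead of an indefinite wait.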
[21:37] <dmick> as for strace, so, have you run it against the ceph-create-keys process?
[21:37] <dmick> (-p)
[21:38] <dmick> (this is basic "what's a process doing on Linux" diagnosis)
[21:40] <pja> no, i haven't run it at this point
[21:40] <sagewk> dmick: https://github.com/ceph/ceph/pull/316
[21:40] <pja> do i have to run it on the host, which hangs in the ceph-a process?
[21:43] <pja> should i type strace ceph-create-keys?
[21:44] * Wolff_John (~jwolff@vpn.monarch-beverage.com) Quit (Ping timeout: 480 seconds)
[21:48] <dmick> pja: that would run a new ceph-create-keys under strace
[21:48] <dmick> which is probably not what you want, since you want to observe the existing process
[21:48] <dmick> it also doesn't use the -p switch, which I typed above
[21:48] <dmick> so you probably want strace -p $(pgrep ceph-create-keys)
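A small sketch of the attach step, assuming the PID is already known (for example from `ps ax`). `attach_cmd` is a hypothetical helper that only prints the command to run; in strace, `-p` attaches to an existing process and `-f` additionally follows the child processes it forks.

```shell
# Hypothetical helper: print the strace invocation for an existing process.
# It checks that the PID can be signalled but does not start strace itself.
attach_cmd() {
    pid=$1
    # kill -0 sends no signal; it only tests whether the process can be signalled.
    if ! kill -0 "$pid" 2> /dev/null; then
        echo "process $pid not found or not accessible" >&2
        return 1
    fi
    echo "strace -f -p $pid"
}

# Example: attach_cmd 1234   (prints the command only if PID 1234 exists)
```

Printing the command rather than running it keeps the sketch safe to try; attaching to another user's process normally needs root.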
[21:48] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[21:50] <elder> joshd, do you see any reason why *all* osd request submitted by rbd shouldn't wait for the safe callback?
[21:51] * portante (~user@ has joined #ceph
[21:51] * Wolff_John (~jwolff@ has joined #ceph
[21:51] <joshd> elder: no, I see no reason not to
[21:52] <elder> Right now they all wait for the acknowledgement callback only. It only matters for writes, right?
[21:52] <joshd> yeah, reads only get the ack
[21:52] <elder> Will a read ack also have the SAFE bit set though?
[21:52] <elder> Or whatever?
[21:53] <elder> It would be nicer to just set up the same safe callback in either case, rather than having to do the safe for writes and ack for reads.
[21:54] <joshd> no, although you could modify the osd_client to call the safe callback and the ack callback if they exist for reads
[21:54] <elder> OK.
[21:54] <elder> I'll do that.
[21:56] <pja> so i go to the mon1 (ive started ceph -a start on osd1, but it hangs for create-keys on mon1)
[21:57] <pja> there is a create key process running (ps ax) with pid 6137
[21:57] <pja> i do strace -p 6137
[21:57] <pja> on mon1
[21:58] <pja> strace -p 6137
[21:58] <pja> Process 6137 attached - interrupt to quit
[21:58] <pja> select(0, NULL, NULL, NULL, {0, 422915}) = 0 (Timeout)
[21:58] <pja> pipe([4, 5])                            = 0
[21:58] <pja> fcntl(4, F_GETFD)                       = 0
[21:58] <pja> fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
[21:58] <pja> fcntl(5, F_GETFD)                       = 0
[21:58] <pja> fcntl(5, F_SETFD, FD_CLOEXEC)           = 0
[21:58] <pja> pipe([6, 7])                            = 0
[21:58] <pja> fcntl(6, F_GETFD)                       = 0
[21:58] <pja> fcntl(6, F_SETFD, FD_CLOEXEC)           = 0
[21:58] <pja> fcntl(7, F_GETFD)                       = 0
[21:58] <pja> fcntl(7, F_SETFD, FD_CLOEXEC)           = 0
[21:58] <pja> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f16b2cef9d0) = 9692
[21:58] <pja> close(7)                                = 0
[21:58] <pja> close(5)                                = 0
[21:58] <pja> mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f16b0b38000
[21:58] <dmick> STOP
[21:58] <pja> read(6, "", 1048576)                    = 0
[21:58] <pja> mremap(0x7f16b0b38000, 1052672, 4096, MREMAP_MAYMOVE) = 0x7f16b0b38000
[21:58] <pja> close(6)                                = 0
[21:58] <pja> munmap(0x7f16b0b38000, 4096)            = 0
[21:58] <pja> fcntl(4, F_GETFL)                       = 0 (flags O_RDONLY)
[21:58] <pja> fstat(4, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
[21:58] <pja> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f16b2cf9000
[21:58] <pja> lseek(4, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
[21:58] <pja> fstat(4, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
[21:58] <pja> munmap(0x7f16b2cf9000, 4096)            = 0
[21:58] <pja> close(3)                                = 0
[21:58] <pja> fstat(4, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
[21:58] <pja> lseek(4, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
[21:58] <pja> read(4, "{ \"nam", 6)                   = 6
[21:58] <pja> fstat(4, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
[21:58] <pja> lseek(4, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
[21:58] <pja> read(4, "e\": \"a", 6)                  = 6
[21:58] <pja> fstat(4, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
[21:58] <tnt> OMG
[21:58] <pja> lseek(4, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
[21:58] <pja> sorry
[21:59] <dmick> you'll want to add -f to see the children it's executing (the clone above)
[21:59] <dmick> and you'll want to examine the output for signs of what the proc is doing; there's noise, and there's signal
[21:59] <dmick> if it really does all look like complete gibberish to you, use pastebin to share it
[22:01] <pja> -f does not exist, i add -F and post it
[22:04] * pja (~pja@a.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[22:05] * pja (~pja@a.clients.kiwiirc.com) has joined #ceph
[22:05] <pja> did you write something? i was disconnected
[22:06] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[22:08] <dmick> -f Trace child processes as they are created by currently
[22:08] <dmick> traced processes as a result of the fork(2)
[22:08] <dmick> system call.
[22:10] <pja> ive used -F
[22:11] * pja (~pja@a.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[22:11] * pja (~pja@a.clients.kiwiirc.com) has joined #ceph
[22:16] <pja> strace -p pid -f
[22:16] <pja> http://pastebin.com/w4GW6u6g
[22:16] * stxShadow (~Jens@ip-88-152-161-249.unitymediagroup.de) has joined #ceph
[22:20] <dmick> write(2, "INFO:ceph-create-keys:ceph-mon i"..., 60) = 60
[22:20] <dmick> is the log message from ceph-create-keys
[22:20] <dmick> it's saying
[22:21] <dmick> LOG.info('ceph-mon is not in quorum: %r', state)
[22:21] <dmick> so
[22:21] <dmick> the monitors are not healthy
[22:21] <dmick> are the mon processes running on each host where they should be?
[22:23] <dmick> sage: 316 looks OK to me I guess
[22:24] <dmick> do you want me to merge it?
[22:24] <pja> what do you mean
[22:24] <pja> as i said
[22:24] <pja> all daemons are stopped
[22:24] <pja> so nothing is in quorum when everything is stopped
[22:24] <pja> when i want to start them via ceph -a start
[22:25] <pja> it hangs
[22:25] <dmick> but you're *TRYING TO START THEM*
[22:25] <dmick> and in the process *OF STARTING* them
[22:25] <pja> yes
[22:25] <dmick> that's when the hang occurs, right?
[22:25] <pja> yes
[22:25] <dmick> as I've said
[22:25] <dmick> ceph-create-keys requires the monitor to be in quorum
[22:25] <dmick> that implies that the monitor must be running
[22:25] <dmick> clearly, as part of the startup process
[22:26] <pja> in quorum means for me, that all 3 mons have to run at a certain moment
[22:26] <dmick> so if the startup process is not first starting the monitors and then doing ceph-create-keys, that will make it fail
[22:26] <dmick> yes
[22:26] <dmick> and that's what ceph -a start does
[22:26] <pja> yes
[22:26] <sagewk> dmick: tnx, i merged it.
[22:26] <pja> and it hangs after the first mon, when the other 2 are not running
[22:26] <dmick> sagewk: k
[22:27] <dmick> it should have started the other two as well
[22:27] <pja> service ceph -a start
[22:27] <pja> === mon.a ===
[22:27] <pja> Starting Ceph mon.a on hcmonko1...
[22:27] <pja> Starting ceph-create-keys on hcmonko1...
[22:27] <pja> it's serial, not parallel, i thought?
[22:27] <dmick> let me see your ceph.conf from hcmonko1
[22:27] <pja> service ceph -a start
[22:27] <pja> === mon.a ===
[22:27] <pja> Starting Ceph mon.a on mon1...
[22:27] <pja> Starting ceph-create-keys on mon1...
[22:28] <dmick> echo Starting ceph-create-keys on $host...
[22:28] <dmick> cmd2="$SBINDIR/ceph-create-keys -i $id 2> /dev/null &"
[22:28] <dmick> do_cmd "$cmd2"
[22:28] <pja> dont know what you mean
[22:29] <dmick> (01:27:50 PM) dmick: let me see your ceph.conf from hcmonko1
[22:32] * fridudad (~oftc-webi@ has joined #ceph
[22:32] <pja> http://pastebin.com/WC4a3eqX
[22:33] <pja> copied per scp on every host
[22:38] * rturk-away is now known as rturk
[22:40] <pja> got it?
[22:45] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 21.0/20130511120803])
[22:45] <dmick> pja: yes, day job, sorry, looking now
[22:46] <dmick> so each mon is meant to be started serially, but the "&" on the command means that it won't wait for ceph-create-keys to finish
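The effect of the trailing `&` can be seen in a small stand-in for the start loop. This is only a sketch: `start_all` and the sleeping subshell are hypothetical placeholders for the init script and for ceph-create-keys, chosen to show that a backgrounded command does not block the next iteration.

```shell
# Hypothetical stand-in for the per-monitor start loop.
start_all() {
    for id in "$@"; do
        echo "Starting mon.$id"
        # Placeholder for ceph-create-keys: pretend to wait for quorum.
        # The trailing & puts it in the background, so the loop moves on.
        ( sleep 2 ) > /dev/null 2>&1 &
    done
    echo "all mons started"
}

start_all a b c
```

Run as written, the three "Starting" lines and the final line print right away, before any placeholder has finished; removing the `&` would make each iteration wait for its placeholder.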
[22:47] <dmick> if all the mons aren't running, that's definitely the problem, and you need to sort out why that is; the ceph.conf mentions them all, so it should be trying to start them all
[22:47] * joao (~JL@89-181-145-37.net.novis.pt) Quit (Ping timeout: 480 seconds)
[22:47] <dmick> you might try either -v to the ceph startup, or running it with sh -x and tracing it
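As a sketch of what `sh -x` tracing provides: the shell prints each command to stderr before running it, so the last traced line shows where a script stalls. The tiny script below is a hypothetical stand-in; against the real init script the equivalent would be something like `sh -x /etc/init.d/ceph -a start`.

```shell
# Write a small stand-in script to a temporary file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
host=monitor1
echo "Starting mon on $host"
EOF

# -x echoes each command (prefixed with "+ ") to stderr as it executes;
# capture that trace separately from the script's normal output.
trace=$(sh -x "$demo" 2>&1 > /dev/null)
echo "$trace"
rm -f "$demo"
```

The exact quoting in the trace differs between shells, but each traced line starts with the `+ ` prefix by default.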
[22:52] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Quit: Leaving.)
[22:52] <pja> sorry, i asked because of my inet connection ;)
[22:53] <pja> which &? i dont type it ?
[22:54] <pja> "you might try either -v to the ceph startup, or running it with sh -x and tracing it " can you explain this?
[22:54] <pja> ceph -v is for version; typing "service ceph -v start" gives an error
[22:55] <dmick> (01:28:26 PM) dmick: cmd2="$SBINDIR/ceph-create-keys -i $id 2> /dev/null &"
[22:55] <dmick> that &
[22:55] <dmick> from the /etc/init.d/ceph script
[22:55] <pja> ok i havent looked into the service script
[22:56] <pja> i can start the mons locally
[22:56] <pja> but not over the -a parameter on another node like the osd
[22:57] <dmick> service ceph should run /etc/init.d/ceph
[22:57] <dmick> /etc/init.d/ceph is supposed to take -v for verbose
[22:57] * ScOut3R (~ScOut3R@dsl51B614D7.pool.t-online.hu) Quit (Ping timeout: 480 seconds)
[22:58] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) has left #ceph
[23:03] <dmick> pja: I don't know why it's not working, but since it's a script, you can experiment
[23:05] * Cube (~Cube@ has joined #ceph
[23:05] <pja> when i make ceph -a start
[23:06] <pja> it hangs by mon 1 ceph-create-keys
[23:06] <pja> when i go to mon 2 or 3
[23:06] <pja> the ceph service is not started
[23:06] <pja> i dont think i have to modify the ceph init.d script ?!
[23:07] * rturk is now known as rturk-away
[23:08] <pja> i think it has something to do with quorum
[23:09] <pja> when i start another mon direct on the machine
[23:09] <pja> for example on mon3: service ceph start
[23:10] <pja> it will resume on osd1 with the ceph -a, which was hanging on create-keys for mon1 before
[23:10] <dmick> ceph -a start should start all mons on all machines
[23:10] <pja> when mon3 is started, the ceph-a resumes with creating keys for mon1, goes over to mon2, and for sure says mon3 is started already
[23:10] <elder> OK joshd I guess I'm going to use the safe callback after all.
[23:10] <dmick> ceph-create-keys will hang until mons reach quorum, but that does not make the /etc/init.d/ceph script hang
[23:11] <pja> it seems like this, because when it hangs for creating keys for mon1, it does not start mon2 or mon3 at the same time
[23:11] <dmick> and that's the problem that needs to be solved
[23:11] <dmick> because that's not how it's designed to work
[23:12] <elder> However, joshd, the point of the bug I was looking at was regarding watch/notify. That is, it assumes that the watch request--being a in-osd state changing request--will issue an ONDISK callback?
[23:12] <pja> yeah i think thats the problem
[23:12] * stxShadow (~Jens@ip-88-152-161-249.unitymediagroup.de) Quit (Read error: Connection reset by peer)
[23:12] * fridudad (~oftc-webi@ Quit (Remote host closed the connection)
[23:12] <elder> Is that true? I was going to only treat the watch request as registered if I got a safe callback, and consider it unregistered only after I receive a safe callback for that.
[23:12] <pja> but i don't know how to fix it, i don't have the know-how to edit the init.d script or something like this
[23:13] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[23:13] <elder> Oh, I see, it looks to me like you get an ONDISK ack if you request one in the flags.
[23:16] * Romeo (~romeo@ has joined #ceph
[23:16] * kyle_ (~kyle@ has joined #ceph
[23:16] * Tamil2 (~tamil@ has joined #ceph
[23:16] * NXCZ_ (~chatzilla@ip72-199-155-185.sd.sd.cox.net) has joined #ceph
[23:17] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[23:17] * coyo|2 (~unf@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[23:17] <pja> don't know where to look, they all can reach each other without entering a password, on every server the same ceph.conf
[23:17] * mnash_ (~chatzilla@vpn.expressionanalysis.com) has joined #ceph
[23:18] * jksM (~jks@3e6b5724.rev.stofanet.dk) has joined #ceph
[23:18] <pja> im not sure about the mkdirs in var/lib/ceph
[23:19] <pja> i've created the folders for the mons only on the mon servers, the folders for the osd only on the osd server
[23:19] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[23:19] <pja> i think thats right?
[23:20] * rtek_ (~sjaak@rxj.nl) has joined #ceph
[23:20] * Meths_ (rift@ has joined #ceph
[23:20] * dosaboy_ (~dosaboy@host86-161-206-107.range86-161.btcentralplus.com) has joined #ceph
[23:20] * mdxi_ (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[23:20] * nooky_ (~nooky@ has joined #ceph
[23:21] <pja> there are new folders on everyhost, for example the bootstrap folders(i dont know what they do) and also "osd" folder on a mon, which i have not created personally
[23:21] <pja> bootstrap-mds  bootstrap-osd  mds  mon  osd  tmp
[23:21] * via_ (~via@smtp2.matthewvia.info) has joined #ceph
[23:21] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * Wolff_John (~jwolff@ Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * LeaChim (~LeaChim@ Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * Tamil (~tamil@ Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * lightspeed (~lightspee@lns-c10k-ld-01-m-62-35-37-66.dsl.sta.abo.bbox.fr) Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * mnash (~chatzilla@vpn.expressionanalysis.com) Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * coyo (~unf@00017955.user.oftc.net) Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * xmltok_ (~xmltok@pool101.bizrate.com) Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * janisg (~troll@ Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * via (~via@smtp2.matthewvia.info) Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * NXCZ (~chatzilla@ip72-199-155-185.sd.sd.cox.net) Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * rtek (~sjaak@rxj.nl) Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * kyle__ (~kyle@ Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * jks (~jks@3e6b5724.rev.stofanet.dk) Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * Meths (rift@ Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * Romeo_ (~romeo@ Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * nooky (~nooky@ Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * dxd828 (~dxd828@ Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * dosaboy (~dosaboy@host86-161-206-107.range86-161.btcentralplus.com) Quit (synthon.oftc.net oxygen.oftc.net)
[23:21] * NXCZ_ is now known as NXCZ
[23:21] * mnash_ is now known as mnash
[23:21] <pja> when i look into the osd folder on a mon-server, theres nothing inside
[23:21] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[23:21] * ChanServ sets mode +o scuttlemonkey
[23:22] <dmick> pja: I would be trying to run the ceph script with -v or with bash -x
[23:22] * janisg (~troll@ has joined #ceph
[23:22] <dmick> if you have never debugged a shell script before, maybe you can find someone to help you with that
[23:22] <pja> ceph with -v without -a ?
[23:22] <dmick> but I'm sorry, I can't afford the time to walk you through that remotely today
[23:22] <dmick> no, I'd be doing it with -a as well
[23:23] <dmick> -v just adds verbosity
[23:23] <dmick> read the script
[23:23] * dxd828 (~dxd828@ has joined #ceph
[23:23] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[23:24] * LeaChim (~LeaChim@ has joined #ceph
[23:24] * lightspeed (~lightspee@lns-c10k-ld-01-m-62-35-37-66.dsl.sta.abo.bbox.fr) has joined #ceph
[23:24] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[23:27] <tchmnkyz> ok dmick got a sec to help me with something new that has popped up recently
[23:27] <pja> ceph health -s
[23:27] <pja>    health HEALTH_WARN clock skew detected on mon.b, mon.c
[23:28] <pja> could it have to do with this?
[23:28] <pja> i will pastebin the -v
[23:29] <tchmnkyz> i have been seeing a rash of issues with vm disks going readonly or becoming corrupt. not quite as bad as i had the time before where the vm disk was completely wiped out. but still a pain in the ass. health status is green and i dont think i am maxing out the IO
[23:29] <tchmnkyz> is there any way to accurately benchmark my max IO on the entire ceph cluster?
[23:30] <tnt> maxing out the IO would/should not corrupt an image anyway ...
[23:30] <tchmnkyz> tnt: that is what i thought too
[23:31] <tchmnkyz> but what would
[23:31] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[23:31] <pja> http://pastebin.com/Epni6KBF
[23:31] <tchmnkyz> i am using proxmox/kvm with ceph backend
[23:31] <tchmnkyz> for some reason vm's have started to have IO errors
[23:32] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[23:32] <tchmnkyz> almost like if it was a sata disk and the disk was going bad
[23:33] * jtang1 (~jtang@ has joined #ceph
[23:34] <tchmnkyz> the only thing that has saved me is i have backups of the vm's
[23:34] <tchmnkyz> but it is getting old having to redo them
[23:36] <dmick> tchmnkyz: you're accessing each rbd image only from one VM, yes?
[23:36] <dmick> and yeah, max IO has nothing to do with it, I'm sure
[23:36] <tchmnkyz> yes
[23:36] <tchmnkyz> no duplicate mounts
[23:38] <tchmnkyz> it just makes no sense why they would go readonly or even corrupt
[23:41] <tnt> tchmnkyz: and you're using the kvm rbd driver I guess ?
[23:41] <tchmnkyz> yup
[23:41] <tnt> a recent one ?
[23:41] <paravoid> yehuda_hm: so, out of curiosity, what's the plan with #5136 ("
[23:41] <paravoid> rgw: revise user stats
[23:41] <dmick> well, if they get I/O errors, the kernel will mark them readonly, I think
[23:41] <paravoid> oops, newline :)
[23:41] <dmick> so the root cause is something's making I/O requests return errors
[23:41] <dmick> have you looked at the logs?
[23:42] <paravoid> yehuda_hm: i.e. what's going to change, if it's scheduled for some release etc.
[23:42] <tchmnkyz> yea i see nothing in the logs on the vm nodes
[23:42] <tchmnkyz> http://imgur.com/MmmXbmq
[23:42] <paravoid> yehuda_hm: (sage closed #4754 and pointed me to #5136)
[23:42] <tchmnkyz> the error i am seeing
[23:42] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[23:42] <tchmnkyz> and it seems that only RedHat EL vms are having this issue
[23:43] <yehuda_hm> paravoid: 5136 is about doing the user stats in a different way
[23:44] <yehuda_hm> keep track of it differently so that when you list the user you'd get that info immediately
[23:44] <paravoid> aha
[23:45] <paravoid> and is it scheduled for dumpling?
[23:45] <yehuda_hm> nope
[23:45] <yehuda_hm> that'd be post dumpling
[23:45] <paravoid> damn
[23:45] <tchmnkyz> tnt/dmick: if it would help i can post dmesg and anything else from the vm nodes. but i get absolutely 0 error messages from them
[23:45] * eschnou (~eschnou@203.39-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[23:46] * Meths_ is now known as Meths
[23:46] <tnt> ... they're giving errors, they're on the screen so they must be in the vm logs as well ...
[23:47] <tchmnkyz> that is just it: on the parent node there are no logs
[23:47] <tchmnkyz> syslog/dmesg are both empty of any errors like that
[23:48] <dmick> tchmnkyz: I mean the Ceph logs
[23:48] <tchmnkyz> k
[23:48] <tnt> well yeah, I wouldn't expect the syslog or dmesg of the parent to show anything since at that point it's ceph and all userspace stuff ... ceph logs may show something.
[23:49] <tchmnkyz> where should i look first
[23:49] * portante (~user@ Quit (Quit: softball)
[23:49] <tchmnkyz> i have the mon running on one set of nodes and osd on another
[23:49] <tnt> I'd say the OSDs
[23:50] <tchmnkyz> k
[23:50] <tchmnkyz> fun the osd log on the first one is empty
[23:51] <dmick> well I'd say the client logs, but joshd says there aren't any by default
[23:51] <dmick> so I'd probably start by turning some on
[23:51] <dmick> I'm not super familiar with proxmox configuration
[23:51] <tnt> there should still be a ceph.conf if it uses librbd
[23:52] <tnt> Not sure _where_ the logs will be written out though ...
[23:52] <tchmnkyz> k
[23:57] * jtang1 (~jtang@ Quit (Quit: Leaving.)
[23:59] * jtang1 (~jtang@ has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.