#ceph IRC Log

IRC Log for 2013-04-22

Timestamps are in GMT/BST.

[0:07] * LeaChim (~LeaChim@176.250.220.3) Quit (Ping timeout: 480 seconds)
[0:28] * BillK (~BillK@58-7-203-182.dyn.iinet.net.au) has joined #ceph
[0:33] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[0:42] * tnt (~tnt@91.176.19.114) Quit (Ping timeout: 480 seconds)
[0:48] * dxd828_ (~dxd828@host-92-24-117-34.ppp.as43234.net) Quit (Quit: Textual IRC Client: www.textualapp.com)
[0:50] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: I'm off, and you do as you please...)
[0:52] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Remote host closed the connection)
[0:52] * KindTwo (~KindOne@h119.3.40.162.dynamic.ip.windstream.net) has joined #ceph
[0:52] * KindTwo is now known as KindOne
[0:57] * BillK (~BillK@58-7-203-182.dyn.iinet.net.au) Quit (Remote host closed the connection)
[0:58] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[0:58] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[1:01] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:01] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has left #ceph
[1:02] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[1:11] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:18] * BillK (~BillK@58-7-203-182.dyn.iinet.net.au) has joined #ceph
[1:23] * BillK (~BillK@58-7-203-182.dyn.iinet.net.au) Quit ()
[1:24] * jefferai (~quassel@quassel.jefferai.org) Quit (Remote host closed the connection)
[1:37] * madkiss (~madkiss@2001:6f8:12c3:f00f:8c55:a426:36d9:ef1) has joined #ceph
[1:40] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[1:44] * madkiss1 (~madkiss@2001:6f8:12c3:f00f:31ee:8ef6:53c4:dacd) has joined #ceph
[1:49] * gmason (~gmason@c-68-61-135-223.hsd1.mi.comcast.net) Quit (Quit: Computer has gone to sleep.)
[1:49] * madkiss (~madkiss@2001:6f8:12c3:f00f:8c55:a426:36d9:ef1) Quit (Ping timeout: 480 seconds)
[1:50] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[2:05] * Yen (~Yen@ip-81-11-213-7.dsl.scarlet.be) Quit (Ping timeout: 480 seconds)
[2:06] * Yen (~Yen@ip-81-11-244-122.dsl.scarlet.be) has joined #ceph
[2:33] * Ormod (~valtha@ohmu.fi) Quit (Remote host closed the connection)
[2:33] * liiwi (liiwi@idle.fi) Quit (Remote host closed the connection)
[2:33] * liiwi (liiwi@idle.fi) has joined #ceph
[2:34] * juuva (~juuva@dsl-hkibrasgw5-58c05e-231.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[2:34] * Ormod (~valtha@ohmu.fi) has joined #ceph
[2:37] * juuva (~juuva@dsl-hkibrasgw5-58c05e-231.dhcp.inet.fi) has joined #ceph
[2:57] * diegows (~diegows@190.190.2.126) has joined #ceph
[3:33] <nigwil> using RADOSgw for Swift capability, is it 100% compatible?
[3:35] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[3:36] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit ()
[3:51] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[3:57] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[4:02] * treaki_ (53241a3ae7@p4FDF70FE.dip0.t-ipconnect.de) has joined #ceph
[4:03] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Ping timeout: 480 seconds)
[4:06] * treaki (7a477b89bb@p4FDF6755.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[4:07] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[4:08] * doubleg (~doubleg@69.167.130.11) Quit (Ping timeout: 480 seconds)
[4:16] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[4:21] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[4:40] * doubleg (~doubleg@69.167.130.11) has joined #ceph
[4:48] * DarkAceZ (~BillyMays@50.107.54.92) has joined #ceph
[5:29] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[5:55] * dosaboy (~dosaboy@66-112-78-242.stat.centurytel.net) has joined #ceph
[5:56] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[6:11] * rustam (~rustam@94.15.91.30) has joined #ceph
[6:15] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[6:15] * dosaboy (~dosaboy@66-112-78-242.stat.centurytel.net) Quit (Quit: leaving)
[6:23] <mega_au> jabadia: your osds are of similar size?
[6:24] <mega_au> should the paxos version keep increasing at a rate of a few hundred per minute, or should it be more or less stable?
[6:51] * ScOut3R (~ScOut3R@5400E808.dsl.pool.telekom.hu) has joined #ceph
[7:00] * ScOut3R (~ScOut3R@5400E808.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[7:22] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[7:33] * trond (~trond@trh.betradar.com) Quit (Remote host closed the connection)
[7:33] * trond (~trond@trh.betradar.com) has joined #ceph
[7:51] * tnt (~tnt@91.176.19.114) has joined #ceph
[8:03] * Havre (~Havre@2a01:e35:8a2c:b230:819a:5a7e:7f4:6cd9) has joined #ceph
[8:20] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[8:25] * squiggy (559eb342@ircip4.mibbit.com) has joined #ceph
[8:27] * capri (~capri@pd95c3284.dip0.t-ipconnect.de) has joined #ceph
[8:31] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Read error: Connection reset by peer)
[8:39] * ScOut3R (~ScOut3R@5400E808.dsl.pool.telekom.hu) has joined #ceph
[8:40] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[8:52] * ScOut3R (~ScOut3R@5400E808.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[8:57] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[9:05] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:06] * rustam (~rustam@94.15.91.30) has joined #ceph
[9:12] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[9:12] * LeaChim (~LeaChim@176.250.220.3) has joined #ceph
[9:13] * joelio_ (~Joel@88.198.107.214) Quit (Remote host closed the connection)
[9:14] * tziOm (~bjornar@194.19.106.242) has joined #ceph
[9:15] * leseb (~Adium@83.167.43.235) has joined #ceph
[9:20] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[9:25] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[9:27] * tnt (~tnt@91.176.19.114) Quit (Ping timeout: 480 seconds)
[9:36] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:39] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:43] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[9:47] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[9:49] * LeaChim (~LeaChim@176.250.220.3) Quit (Ping timeout: 480 seconds)
[9:51] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[9:51] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[9:52] * l0nk (~alex@83.167.43.235) has joined #ceph
[10:02] * LeaChim (~LeaChim@176.250.220.3) has joined #ceph
[10:09] * rustam (~rustam@94.15.91.30) has joined #ceph
[10:12] * joelio (~Joel@88.198.107.214) has joined #ceph
[10:13] * loicd (~loic@185.10.252.15) has joined #ceph
[10:14] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[10:24] * joao (~JL@89.181.154.215) has joined #ceph
[10:24] * ChanServ sets mode +o joao
[10:24] <joao> good morning #ceph
[10:25] <loicd> good morning joao ;-)
[10:27] <joao> hello loicd
[10:30] <vipr> Morning!
[10:32] <vipr> I'm testing stability at the moment, with multiple clients doing a dd to an RBD on a 3-machine Ceph cluster (each machine has 1 mon and 3 OSDs)
[10:32] <vipr> during this dd, I stop a random OSD
[10:32] <vipr> 4 of 6 clients get kernel panics like this one:
[10:33] <vipr> http://tracker.ceph.com/issues/4685
[10:33] <vipr> any thoughts on this?
[10:34] * jefferai (~quassel@corkblock.jefferai.org) has joined #ceph
[10:44] * SvenPHX1 (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[10:48] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) Quit (Ping timeout: 480 seconds)
[10:49] * rustam (~rustam@94.15.91.30) has joined #ceph
[10:49] * v0id (~v0@62-46-175-181.adsl.highway.telekom.at) has joined #ceph
[10:50] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[10:52] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[10:57] * vo1d (~v0@212-183-97-144.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[11:08] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[11:09] <Kioob`Taff> Hi
[11:09] <Kioob`Taff> I have a data loss problem... but not sure if it's a MySQL (MyISAM...) problem, or Ceph/RBD
[11:10] <Kioob`Taff> I have snapshots, but they show the same problem: the end of a file/table is missing (filled with zeros)
[11:11] * mcclurmc_laptop (~mcclurmc@client-7-213.eduroam.oxuni.org.uk) has joined #ceph
[11:11] * mcclurmc_laptop (~mcclurmc@client-7-213.eduroam.oxuni.org.uk) Quit (Max SendQ exceeded)
[11:11] * mcclurmc_laptop (~mcclurmc@client-7-213.eduroam.oxuni.org.uk) has joined #ceph
[11:12] * mcclurmc_laptop (~mcclurmc@client-7-213.eduroam.oxuni.org.uk) Quit (Max SendQ exceeded)
[11:12] * mcclurmc_laptop (~mcclurmc@client-7-213.eduroam.oxuni.org.uk) has joined #ceph
[11:17] <Kioob`Taff> So, I would like to see the content of replicas
[11:17] <Kioob`Taff> is there an easy way to map an RBD image from a replica?
[11:18] <Kioob`Taff> or should I calculate the RADOS object and search for it manually on the OSD?
[11:18] <wogri_risc> you can just deep scrub
[11:18] <wogri_risc> then the replicas will compare stuff
[11:18] <wogri_risc> and 'do the right thing'
[11:19] <Kioob`Taff> and... is there a way to launch a deep scrub on a specific image ?
[11:19] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[11:19] <wogri_risc> FYI: I haven't heard of anybody losing data in an rbd image because of ceph, recently.
[11:19] * virsibl (~virsibl@94.231.117.244) has joined #ceph
[11:19] <jerker> Kioob`Taff: slightly off topic, but I have personally had a lot better luck with InnoDB than with MyISAM.
[11:20] * Havre (~Havre@2a01:e35:8a2c:b230:819a:5a7e:7f4:6cd9) Quit (Remote host closed the connection)
[11:20] <Kioob`Taff> jerker: I totally agree with you... I don't know why the customer chose MyISAM here
[11:20] <Kioob`Taff> wogri_risc: me too, but I have to check anyway ;)
[11:21] * mcclurmc_laptop (~mcclurmc@client-7-213.eduroam.oxuni.org.uk) Quit (Ping timeout: 480 seconds)
[11:21] <wogri_risc> you have to deep-scrub per OSD. ceph osd deep-scrub <osd-id>
[11:22] <Kioob`Taff> so... the whole cluster :/
[11:22] <wogri_risc> true :)
[11:22] <Kioob`Taff> great
[11:23] <wogri_risc> you can also scrub per pg
[11:23] <wogri_risc> but that sucks, too :)
[11:23] <Kioob`Taff> yes, then I have to calculate the offset of the file in the image, to find the «good» PG :p
[11:24] <Kioob`Taff> or wait that all deep scrubs are done...
[11:25] <wogri_risc> I'd rather choose the second option.
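Kioob`Taff's first option (work out which PG holds the damaged part of the image and scrub only that) starts with finding the RADOS object that backs a given byte offset in the image. The sketch below assumes the default 4 MiB object size (order 22) and the usual librbd object naming: format 1 objects are `<block_name_prefix>.%012x`, format 2 objects `rbd_data.<id>.%016x`, where the prefix is the `block_name_prefix` printed by `rbd info`. The function name and the example prefix are hypothetical, and translating a file offset inside the guest filesystem into an image offset is a separate step not covered here.

```python
def rbd_object_for_offset(block_name_prefix: str, offset: int,
                          order: int = 22, image_format: int = 1) -> str:
    """Return the RADOS object name that holds byte `offset` of an RBD image.

    Assumes the default striping (one object per 2**order bytes) and the
    hex-suffix naming used by librbd: 12 hex digits for format 1 images,
    16 hex digits for format 2 images.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    object_size = 1 << order          # order 22 -> 4 MiB objects
    index = offset // object_size     # which object the byte falls into
    width = 12 if image_format == 1 else 16
    return f"{block_name_prefix}.{index:0{width}x}"


# Hypothetical format 1 prefix; byte 10 MiB lands in the third 4 MiB object.
print(rbd_object_for_offset("rb.0.1a2b.238e1f29", 10 * 1024 * 1024))
```

The resulting name can then be passed to `ceph osd map <pool> <object>` to get the PG id, and that PG scrubbed with `ceph pg deep-scrub <pgid>` on versions that support per-PG deep scrub.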
[11:26] <Kioob`Taff> of course... but I have to explain that to the customer :D
[11:26] <wogri_risc> my guess: your customer somehow managed to mount the rbd image twice
[11:26] <wogri_risc> and corrupted his filesystem, or some inodes.
[11:26] <virsibl> Hello. Help, please. There is an error. Where could the problem be?
[11:26] <virsibl> 2013-04-22 09:12:23.266912 osd.5 10.90.90.9:6803/1130 1693 : [INF] 2.55 deep-scrub ok
[11:26] <virsibl> 2013-04-22 09:24:46.844317 mon.0 10.90.90.7:6789/0 1351646 : [DBG] osd.5 10.90.90.9:6803/1130 reported failed by osd.3 10.90.90.8:6803/1129
[11:26] <virsibl> 2013-04-22 09:24:51.730302 mon.0 10.90.90.7:6789/0 1351651 : [DBG] osd.5 10.90.90.9:6803/1130 reported failed by osd.2 10.90.90.8:6800/1040
[11:26] <virsibl> 2013-04-22 09:24:51.779931 mon.0 10.90.90.7:6789/0 1351652 : [DBG] osd.5 10.90.90.9:6803/1130 reported failed by osd.3 10.90.90.8:6803/1129
[11:26] <virsibl> 2013-04-22 09:24:51.780155 mon.0 10.90.90.7:6789/0 1351653 : [INF] osd.5 10.90.90.9:6803/1130 failed (3 reports from 2 peers after 2013-04-22 09:25:11.779907 >= grace 20.000000)
[11:26] <virsibl> 2013-04-22 09:29:55.962627 mon.0 10.90.90.7:6789/0 1351907 : [INF] osd.5 out (down for 303.687060)
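The monitor log above summarizes the failure as "3 reports from 2 peers". A small parser over the `reported failed by` lines makes it easy to see which peers reported which OSD, for example to check whether the reports always come from the same host. This is a sketch that only assumes the line format shown in the paste; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# "... [DBG] osd.5 10.90.90.9:6803/1130 reported failed by osd.3 10.90.90.8:6803/1129"
REPORT_RE = re.compile(r"(osd\.\d+) \S+ reported failed by (osd\.\d+)")


def failure_reporters(log_lines):
    """Map each reported OSD to a Counter of {reporting OSD: number of reports}."""
    counts = {}
    for line in log_lines:
        match = REPORT_RE.search(line)
        if match:
            target, reporter = match.groups()
            counts.setdefault(target, Counter())[reporter] += 1
    return counts
```

Fed the three `[DBG]` lines above, it yields two reports from osd.3 and one from osd.2 against osd.5, matching the monitor's summary; both reporters live on 10.90.90.8, so only one other host was complaining.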
[11:27] <wogri_risc> your osd 5 is down, virsibl
[11:27] <wogri_risc> did your disk break? did you look at the log of the osd?
[11:30] <virsibl> After a restart everything is OK; this happens about once a week.
[11:30] <virsibl> ceph osd tree
[11:30] <virsibl> # id weight type name up/down reweight
[11:30] <virsibl> -1 6 root default
[11:30] <virsibl> -3 6 rack unknownrack
[11:30] <virsibl> -2 2 host ceph1
[11:30] <virsibl> 0 1 osd.0 up 1
[11:30] <virsibl> 1 1 osd.1 up 1
[11:30] <virsibl> -4 2 host ceph2
[11:30] <virsibl> 2 1 osd.2 up 1
[11:30] <virsibl> 3 1 osd.3 up 1
[11:30] <virsibl> -5 2 host ceph3
[11:30] <virsibl> 4 1 osd.4 up 1
[11:30] <virsibl> 5 1 osd.5 up 1
[11:35] <virsibl> 2013-04-22 09:24:17.171021 7fd84f645700 -1 FileStore: sync_entry timed out after 600 seconds.
[11:35] <virsibl> ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
[11:35] <virsibl> 1: (SafeTimer::timer_thread()+0x425) [0x826925]
[11:35] <virsibl> 2: (SafeTimerThread::entry()+0xd) [0x82756d]
[11:35] <virsibl> 3: (()+0x7e9a) [0x7fd85b48fe9a]
[11:35] <virsibl> 4: (clone()+0x6d) [0x7fd859efbcbd]
[11:35] <virsibl> 2013-04-22 09:24:17.197331 7fd84f645700 -1 os/FileStore.cc: In function 'virtual void SyncEntryTimeout::finish(int)' thread 7fd84f645700 time 2013-04-22 09:24:17.195485
[11:35] <virsibl> os/FileStore.cc: 3298: FAILED assert(0)
[11:35] <virsibl> ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
[11:35] <virsibl> 1: (SyncEntryTimeout::finish(int)+0x9e) [0x720b7e]
[11:35] <virsibl> 2: (SafeTimer::timer_thread()+0x425) [0x826925]
[11:35] <virsibl> 3: (SafeTimerThread::entry()+0xd) [0x82756d]
[11:35] <virsibl> 4: (()+0x7e9a) [0x7fd85b48fe9a]
[11:35] <virsibl> 5: (clone()+0x6d) [0x7fd859efbcbd]
[11:35] <virsibl> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
[11:35] <virsibl> --- begin dump of recent events ---
[11:40] <virsibl> The system disk is installed and working fine, but for some reason Ceph periodically loses it
[11:47] <joelio> virsibl: PASTEBIN please!
[11:52] * rustam (~rustam@94.15.91.30) has joined #ceph
[11:53] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[12:00] * rustam (~rustam@94.15.91.30) has joined #ceph
[12:02] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[12:03] <virsibl> http://pastebin.com/kyGuLQzS
[12:08] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[12:22] * loicd (~loic@185.10.252.15) Quit (Ping timeout: 480 seconds)
[12:26] * mcclurmc_laptop (~mcclurmc@client-7-213.eduroam.oxuni.org.uk) has joined #ceph
[12:37] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[12:38] * madkiss1 (~madkiss@2001:6f8:12c3:f00f:31ee:8ef6:53c4:dacd) Quit (Quit: Leaving.)
[12:56] * mcclurmc_laptop (~mcclurmc@client-7-213.eduroam.oxuni.org.uk) Quit (Ping timeout: 480 seconds)
[12:56] * diegows (~diegows@190.190.2.126) has joined #ceph
[13:00] * coyo (~unf@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[13:08] * MK_FG (~MK_FG@00018720.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:11] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[13:12] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[13:22] * thorus (~jonas@pf01.intranet.centron.de) has joined #ceph
[13:22] <thorus> in which formats can I export rbd images with rbd export <image-name> <image-dest>?
[13:23] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:29] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:39] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[13:40] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[13:44] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[13:45] * leseb (~Adium@83.167.43.235) has joined #ceph
[13:46] * MK_FG (~MK_FG@00018720.user.oftc.net) has joined #ceph
[13:52] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[13:57] * virsibl (~virsibl@94.231.117.244) has left #ceph
[14:01] * Niklas (~ngoerke@et-1-41.gw-nat.bs.ka.oneandone.net) has joined #ceph
[14:02] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) has joined #ceph
[14:03] <Niklas> it seems to me, that the default apache config in http://ceph.com/docs/master/start/quick-rgw/ is kinda broken
[14:03] <Niklas> this one works for me http://pastebin.com/2BY8Wkuy
[14:04] <Niklas> I'm not an apache expert, but the one posted on ceph.com does not cover the ssl part, and the rewrite rule does not work…
[14:08] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[14:14] <tnt> \names
[14:14] <tnt> meh, wrong slash
[14:16] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[14:17] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[14:23] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[14:23] <wogri_risc> thorus rbd images are binary - there is no format here. it's just a binary blob
[14:24] <wogri_risc> at least after exporting it (to a file)
[14:24] <thorus> wogri_risc: is it possible to convert it after exporting?
[14:31] <wogri_risc> yes
[14:31] <wogri_risc> you mean converting from format 1 to format 2?
[14:32] <wogri_risc> you could also rbd export to stdout and rbd import from stdin, as long as the rbd names are not identical.
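The export-to-stdout / import-from-stdin idea is a shell pipeline between two invocations of the `rbd` CLI. The helper below only builds that pipeline string; it does not run anything. It assumes that your `rbd` version accepts `-` as the path for stdout/stdin and that the import side takes `--image-format` to choose the new image format (older releases may spell this differently); check `rbd help` before relying on either. The function and image names are hypothetical.

```python
import shlex
from typing import Optional


def export_import_pipeline(src: str, dst: str,
                           image_format: Optional[int] = None) -> str:
    """Build an `rbd export ... - | rbd import ... - ...` shell pipeline.

    `src` and `dst` must differ, since both images exist at the same time.
    """
    if src == dst:
        raise ValueError("source and destination image names must differ")
    export_cmd = ["rbd", "export", src, "-"]        # "-" = write to stdout (assumed)
    import_cmd = ["rbd", "import"]
    if image_format is not None:
        import_cmd += ["--image-format", str(image_format)]  # flag name assumed
    import_cmd += ["-", dst]                        # "-" = read from stdin (assumed)
    return " | ".join(shlex.join(cmd) for cmd in (export_cmd, import_cmd))


print(export_import_pipeline("rbd/vm1", "rbd/vm1-v2", image_format=2))
```

Streaming through a pipe avoids needing scratch space for a full intermediate file, which is the main attraction over exporting to disk first.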
[14:34] * squiggy (559eb342@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[14:34] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[14:43] * leseb (~Adium@83.167.43.235) has joined #ceph
[14:48] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[14:48] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[14:50] <Niklas> is rados4j the "official" binding to be used for librados and java?
[14:52] <Niklas> because this ticket (https://github.com/syuu1228/rados4j) suggests that there isn't really any binding that can be used at the moment, is there?
[14:53] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:54] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Remote host closed the connection)
[14:55] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[14:56] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[14:57] * vipr_ (~vipr@78-23-112-198.access.telenet.be) has joined #ceph
[14:59] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[15:01] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:03] * vipr (~vipr@78-23-116-42.access.telenet.be) Quit (Ping timeout: 480 seconds)
[15:14] <Niklas> wido: are you Wido den Hollander?
[15:15] <Niklas> is rados-java usable yet? How far did you get implementing it? (http://tracker.ceph.com/issues/4232)
[15:15] <wido> Niklas: yes, that's me
[15:15] <wido> It's kind of usable, but it's not fully implemented
[15:15] <wido> basic stuff like writing and reading should work
[15:15] <wido> The unit tests tell me it does
[15:17] <Niklas> wido: cool, I'll try it
[15:17] <wido> Niklas: nice! Feedback is welcome
[15:17] <Niklas> sure
[15:17] <Niklas> Might take some time, though
[15:19] * Niklas (~ngoerke@et-1-41.gw-nat.bs.ka.oneandone.net) Quit (Quit: leaving)
[15:19] * xdeller (~xdeller@performiks.starlink.ru) has joined #ceph
[15:20] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:24] <mikedawson> joao: do you have any time to look into my two emails to ceph-devel? I think there is a critical bug or two in the new monitor code.
[15:25] <joao> mikedawson, skimmed through them; will take a proper look as soon as I wrap up this bug I'm working on now
[15:25] <joao> mikedawson, those are the LogMonitor asserts, right?
[15:26] <mikedawson> joao: mon/Monitor.cc: 1105: FAILED assert(0 == "We should never reach this") and mon/Monitor.cc: 1126: FAILED assert(!(sync_role & SYNC_ROLE_REQUESTER))
[15:27] <joao> huh, yeah, mega_au also hit that one
[15:27] <joao> can't for the life of me understand how that would be triggered
[15:27] <joao> but will look into that asap
[15:27] <mikedawson> joao: biggest problem is I can't keep quorum
[15:28] <joao> my guess is that's a sync happening again and again
[15:35] <matt_> joao, whilst people are on the topic of monitor issues. Have you ever seen a new monitor getting stuck sync'ing and never reaching quorum in 0.60?
[15:36] <matt_> I get the following repeating in the logs - mon.Storage2@0(synchronizing sync( requester state chunks )).data_health(0) update_stats avail 53% total 30357324 used 12425372 avail 16389868
[15:36] <joao> never seen it, but I suspect that's what's happening to mikedawson
[15:36] <joao> huh, yeah, I see how that would happen; but that shouldn't affect the sync itself
[15:37] <joao> that's a health service reporting stats on the monitor
[15:37] <joao> it shouldn't be running though
[15:38] <matt_> What should the log settings be to shed some light on this one? I have some time tonight to do a proper bug report
[15:40] * virsibl (~virsibl@94.231.117.244) has joined #ceph
[15:40] <joao> debug mon = 20, debug paxos = 10, debug ms = 1
[15:40] <joao> that would be nice
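joao's requested log levels would normally go into the monitor section of `ceph.conf` on each monitor host, followed by a monitor restart. The sketch below renders them with Python's standard `configparser`; putting them under `[mon]` rather than `[global]` is an assumption, and the helper name is hypothetical.

```python
import configparser
import io

# Levels requested by joao for debugging monitor sync.
MON_DEBUG_LEVELS = {"debug mon": 20, "debug paxos": 10, "debug ms": 1}


def mon_debug_section(levels: dict) -> str:
    """Render the given debug levels as a ceph.conf-style [mon] section."""
    cfg = configparser.ConfigParser()
    cfg["mon"] = {key: str(value) for key, value in levels.items()}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(mon_debug_section(MON_DEBUG_LEVELS))
```

`debug mon = 20` in particular is very verbose, so these levels are best reverted once the logs for the bug report have been captured.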
[15:40] <matt_> No worries, I'll get the logs to you a bit later on
[15:41] <joao> thanks
[15:49] * virsibl (~virsibl@94.231.117.244) has left #ceph
[15:49] <mikedawson> matt_: I am seeing that quite often
[15:52] <mikedawson> joao: I'm turning up my logs as well
[15:54] <joao> cool, thanks
[15:59] <mikedawson> matt_: Could you try something for me assuming ceph isn't in production? Stop all MONs and OSDs, then start the MONs one at a time. If it behaves like mine, I believe you'll get a quorum with all monitors (whereas you couldn't with the OSDs running). Then start the OSDs and you'll be set.
[16:00] <matt_> mikedawson, sorry, I have a few things running at the moment that I can't stop.
[16:00] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[16:00] <mikedawson> matt_: ok
[16:01] <matt_> I did have them in quorum a week ago but they have a memory leak and one stopped. When I restarted it wasn't able to get back in quorum again
[16:01] <matt_> I have two in quorum at the moment and I can start/stop them no problems and they always achieve quorum between the two of them
[16:02] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[16:03] * gmason (~gmason@hpcc-fw.net.msu.edu) has joined #ceph
[16:06] * yehuda_hm (~yehuda@2602:306:330b:1410:9dbe:9b5c:6236:13d2) has joined #ceph
[16:07] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Remote host closed the connection)
[16:07] * BillK (~BillK@58-7-98-144.dyn.iinet.net.au) has joined #ceph
[16:15] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Quit: wogri_risc)
[16:16] <mikedawson> matt_: that's exactly what I've seen. Once one falls out of quorum, it will never catch up if the OSDs are running.
[16:18] <mikedawson> matt_, joao: I've seen the monitor that will not join the quorum jump from one monitor to another depending on the order in which I stop and start the mon daemons, but in this state I can't get all three back into quorum without stopping OSDs
[16:20] <mega_au> mikedawson: just hit the same stuff. in my case I have increased paxos max join drift to 100 and mons did catch up with each other
[16:21] * portante (~user@66.187.233.206) has joined #ceph
[16:24] * Havre (~Havre@2a01:e35:8a2c:b230:e0ba:7dd8:8fa9:7050) has joined #ceph
[16:24] <mikedawson> mega_au: http://ceph.com/docs/master/rados/configuration/mon-config-ref/ says "paxos max join drift" is deprecated since 0.58? I'm on 0.60. What version are you on?
[16:26] <mega_au> 0.60. I've seen that notice, but the source code says otherwise :) And I trust the code. Do you have "too far behind" in your logs?
[16:28] <mikedawson> mega_au: that wouldn't be the first documentation mismatch I've seen. I'll check the logs after a meeting. Thanks for the tip!
[16:28] <mega_au> I see the paxos version increase every second. paxos max join drift by default is 10, so the mon effectively expects to complete a sync in 10 seconds. That does not happen: for me it takes between 60 and 250 seconds.
[16:28] <joao> yeah, paxos max join drift shouldn't be marked as deprecated yet
[16:28] <joao> I'll pass that along to John
[16:29] <joao> mega_au, yeah, paxos max join drift should be tuned for that, but the recovery mechanism itself isn't the best way to catch up; the store sync should do it
[16:29] <joao> my guess is that the sync is currently way too pessimistic
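mega_au's reasoning is simple arithmetic: while a monitor syncs, the quorum keeps committing paxos versions, so the gap the joining monitor sees is roughly (versions per second) × (sync time). The sketch below is a back-of-envelope model of that explanation, not the monitor's actual join logic; the function name is hypothetical.

```python
import math


def paxos_versions_during_sync(versions_per_second: float,
                               sync_seconds: float) -> int:
    """Approximate how many paxos versions the quorum commits during a sync."""
    if versions_per_second < 0 or sync_seconds < 0:
        raise ValueError("rate and duration must be non-negative")
    return math.ceil(versions_per_second * sync_seconds)


DEFAULT_MAX_JOIN_DRIFT = 10  # default quoted by mega_au

# mega_au's numbers: about one version per second, sync taking 60 to 250 s.
for seconds in (60, 250):
    drift = paxos_versions_during_sync(1, seconds)
    print(seconds, drift, drift > DEFAULT_MAX_JOIN_DRIFT)
```

Even the fast case (60 versions) is six times the default of 10. The model is rough, though: mega_au reported that raising the value to 100 was already enough for his monitors to catch up, and joao's point is that the store sync, not a larger drift allowance, should be doing the catching up.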
[16:30] * rustam (~rustam@94.15.91.30) has joined #ceph
[16:32] <mega_au> Now I have a new issue: I had some inconsistent pgs. I could not run pg repair because the osd kept crashing. I took the osd completely out and recovered the cluster on 5 osds. Now when I want to add a fresh osd, the mons crash completely, and they refuse to run until I remove this new osd.
[16:33] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[16:33] * mcclurmc_laptop (~mcclurmc@client-7-201.eduroam.oxuni.org.uk) has joined #ceph
[16:33] * mcclurmc_laptop (~mcclurmc@client-7-201.eduroam.oxuni.org.uk) Quit (Max SendQ exceeded)
[16:34] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[16:34] * mcclurmc_laptop (~mcclurmc@client-7-201.eduroam.oxuni.org.uk) has joined #ceph
[16:34] * mcclurmc_laptop (~mcclurmc@client-7-201.eduroam.oxuni.org.uk) Quit (Max SendQ exceeded)
[16:34] * mcclurmc_laptop (~mcclurmc@client-7-201.eduroam.oxuni.org.uk) has joined #ceph
[16:37] <mega_au> joao: something else I have noticed: when I have 3 mons in quorum and want to start the remaining two, I get very interesting behaviour: they appear to try to sync from each other and not from the mons in quorum. Weird!
[16:37] * SvenPHX1 (~scarter@wsip-174-79-34-244.ph.ph.cox.net) Quit (Ping timeout: 480 seconds)
[16:37] <mega_au> If I start them one by one, they join the cluster without problems.
[16:38] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[16:38] <joao> that should be okay to happen, but they should have picked a new monitor afterwards
[16:39] <joao> mega_au, if you have logs of that happening, I'd like to take a look
[16:40] <mega_au> I have left them in that state for quite a while and they did not go anywhere else. If I shut down one, the second one goes into standby.
[16:40] <mega_au> Of course I have all the logs. I was collecting them over the weekend. Will upload in a few hours.
[16:40] <joao> thanks
[16:43] * mcclurmc_laptop (~mcclurmc@client-7-201.eduroam.oxuni.org.uk) Quit (Ping timeout: 480 seconds)
[16:52] * tziOm (~bjornar@194.19.106.242) Quit (Remote host closed the connection)
[16:55] * mistur (~yoann@kewl.mistur.org) Quit (Ping timeout: 480 seconds)
[16:59] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[17:03] * mib_fhxzia (559eb342@ircip3.mibbit.com) has joined #ceph
[17:03] <mib_fhxzia> Does anybody know when we'll see a new bobtail release?
[17:07] <fghaas> mib_fhxzia: http://tracker.ceph.com/projects/ceph/roadmap would indicate that the dev emphasis is on cuttlefish right now, but if you need stable release updates I'm sure inktank would be happy to talk to you about subscriptions :)
[17:08] * vata (~vata@2607:fad8:4:6:44c1:aa21:33e3:45bc) has joined #ceph
[17:09] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has joined #ceph
[17:10] <mib_fhxzia> thanks, I just wanted to know whether there will be a bobtail release before cuttlefish - I already sent a mail regarding a subscription ;-)
[17:10] <mib_fhxzia> thanks
[17:13] * mib_fhxzia (559eb342@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[17:23] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[17:24] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[17:24] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[17:28] * wwformat (~chatzilla@222.240.181.108) has joined #ceph
[17:29] * wwformat (~chatzilla@222.240.181.108) Quit ()
[17:30] * mistur (~yoann@kewl.mistur.org) has joined #ceph
[17:43] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[17:44] <jmlowe> I would like to run something by the author of http://ceph.com/docs/master/rbd/qemu-rbd/
[17:44] <jmlowe> I had to use something different for my libvirt xml and I'm wondering if it's just me before I go filing a bug
[17:45] <jmlowe> specifically the discard argument
[17:45] <scuttlemonkey> lemme poke jwilkins, I don't see him in here
[17:47] * rustam (~rustam@94.15.91.30) has joined #ceph
[17:48] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[17:51] <scuttlemonkey> jmlowe: he doesn't seem to be in yet. I told him to poke you when he gets in
[17:51] <jmlowe> ok, thanks
[17:56] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Remote host closed the connection)
[17:59] * Cube (~Cube@12.248.40.138) has joined #ceph
[18:08] * BillK (~BillK@58-7-98-144.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:08] * BManojlovic (~steki@91.195.39.5) Quit (Quit: I'm off, and you do as you please...)
[18:13] * ebo^ (~ebo@koln-5d8119bb.pool.mediaWays.net) has joined #ceph
[18:13] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Quit: Leaving)
[18:13] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[18:15] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) has joined #ceph
[18:20] * BillK (~BillK@124-148-237-131.dyn.iinet.net.au) has joined #ceph
[18:22] <ebo^> I recently had to clear out some disks in my ceph cluster and now they are backfilling at only 10 MB/s. I have journal and storage on the same disk/partition. Is 10 MB/s the most I can expect?
[18:22] * mcclurmc_laptop (~mcclurmc@cpc1-oxfd21-2-0-cust70.4-3.cable.virginmedia.com) has joined #ceph
[18:24] <pioto> with the libvirt rbd storage pool... if I make a 'clone' using its APIs, is that going to give me a thin clone through ceph? or just a "thick" one, copying all the data to a new image?
[18:25] <sagewk> pioto: thin
[18:25] <sagewk> ebo^: depends on the average object size, 'osd recovery max active', and some other factors.
[18:26] <ebo^> mostly cephfs objects, 100 MB-1 GB files
[18:26] <pioto> sagewk: cool. and i guess libvirt snapshotting of a guest is gonna use ceph snapshots natively too? cool
[18:27] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:27] <ebo^> i tried tuning the backfill parameter but it had little influence
[18:27] <sagewk> i believe so, although iirc there is some mismatch because the libvirt snapshot concept includes vm state, not just storage?
[18:27] * portante` (~user@66.187.233.206) has joined #ceph
[18:28] <jmlowe> sagewk: correct, it includes the memory and state of the vm
[18:28] <ebo^> do backfills go through the journal?
[18:28] <sagewk> ebo^: yeah
[18:28] <ebo^> that seems like a waste
[18:29] * John (~john@astound-64-85-225-33.ca.astound.net) has joined #ceph
[18:29] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has left #ceph
[18:29] <sagewk> it's all about consistency of what is on disk to avoid an expensive scan on resync, or ad hoc recovery checks
[18:30] <ebo^> true
[18:32] <John> jmlowe: Patrick indicated that you have some comments on http://ceph.com/docs/master/rbd/qemu-rbd/
[18:32] <jmlowe> I do
[18:33] <John> Love to hear them, or you can send me an email...
[18:34] <jmlowe> I had to use some different arguments to make it work
[18:34] * dxd828 (~dxd828@195.191.107.205) Quit (Quit: Leaving)
[18:34] <jmlowe> <qemu:arg value='device.scsi0-0-0-0.discard_granularity=4096'/>
[18:36] <jmlowe> I'm using virtio-scsi, the docs indicate ide however the libvirt xml has scsi, I'm wondering if it actually works for ide that way (I haven't tried) and should there be a virtio-scsi example?
[18:36] * MEMMYBOY89 (687ABUKDF@79.126.197.206) has joined #ceph
[18:36] <John> I think I've updated since then...
[18:37] <fghaas> scuttlemonkey: there's someone for you to kick here
[18:37] * MEMMYBOY89 (687ABUKDF@79.126.197.206) Quit (autokilled: Spam. Mail support@oftc.net if you feel this is in error (2013-04-22 16:37:32))
[18:37] <fghaas> ah, that was easy. an auto-scuttlemonkey :)
[18:38] <John> Ahh... you're right. It's in the repo, but on the "next" branch.
[18:38] <jmlowe> got a link handy?
[18:40] <John> Just looked. I actually closed the bug, so I've got to check why I'm not seeing it. I had changed it to use virtio instead of ide.
[18:40] * nhm (~nh@65-128-150-185.mpls.qwest.net) has joined #ceph
[18:40] <John> jmlowe: You're talking about http://ceph.com/docs/master/rbd/libvirt/#configuring-the-vm step 1, correct?
[18:41] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[18:41] <jmlowe> http://ceph.com/docs/master/rbd/qemu-rbd/#enabling-discard-trim
[18:42] <jmlowe> specifically I don't think this works block.scsi0-0-0
[18:42] * rustam (~rustam@94.15.91.30) has joined #ceph
[18:42] * gregaf1 (~Adium@2607:f298:a:607:7c9a:6ba6:7f70:222) Quit (Quit: Leaving.)
[18:43] <jmlowe> definitely not for virtio-scsi, and I'm thinking it wouldn't work for ide since it says scsi0 not something ide related
[18:44] * gregaf (~Adium@2607:f298:a:607:559a:91fd:7482:b0f5) has joined #ceph
[18:44] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[18:44] <John> Actually, I'm going to file this as a bug in light of your second comment. I think we need some additional examples.
[18:45] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[18:45] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:45] <jmlowe> ok, let me know the number and I can attach some of my live xml as an example
[18:46] <jmlowe> also while you are at it this strictly speaking isn't true "Note that this uses the IDE driver. The virtio driver does not support discard." probably should be more like this "Note that this uses the IDE driver. The virtio-blk driver does not support discard; however, the virtio-scsi driver does support discard."
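jmlowe's working setup passes the discard granularity straight to QEMU through libvirt's qemu passthrough. The Python sketch below builds that passthrough element with the standard library. It assumes the value is preceded by a `-set` argument (QEMU's `-set group.id.arg=value` form) and that the virtio-scsi disk has the device alias scsi0-0-0-0 from jmlowe's snippet; the helper name discard_commandline is hypothetical, and the enclosing <domain> element also needs the qemu namespace declared for libvirt to accept it.

```python
import xml.etree.ElementTree as ET

# Namespace libvirt uses for QEMU command-line passthrough elements.
QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"
ET.register_namespace("qemu", QEMU_NS)


def discard_commandline(device_alias="scsi0-0-0-0", granularity=4096):
    """Hypothetical helper: build a <qemu:commandline> element passing
    '-set device.<alias>.discard_granularity=<granularity>' to QEMU.
    The default alias assumes a virtio-scsi disk (controller 0, bus 0,
    target 0, unit 0), as in jmlowe's example."""
    cmd = ET.Element(f"{{{QEMU_NS}}}commandline")
    ET.SubElement(cmd, f"{{{QEMU_NS}}}arg", value="-set")
    ET.SubElement(
        cmd,
        f"{{{QEMU_NS}}}arg",
        value=f"device.{device_alias}.discard_granularity={granularity}",
    )
    return cmd


if __name__ == "__main__":
    # Print the fragment to paste inside the guest's <domain> element.
    print(ET.tostring(discard_commandline(), encoding="unicode"))
```

Generating the fragment rather than hand-editing it makes it easy to swap the device alias, which is exactly the part jmlowe reports the docs got wrong (block.scsi0-0-0 vs device.scsi0-0-0-0).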
[18:48] * noob2 (~cjh@173.252.71.2) has joined #ceph
[18:49] <John> jmlowe: Noted, and thanks. I'll run this by Josh and get it resolved.
[18:51] * l0nk (~alex@83.167.43.235) Quit (Ping timeout: 480 seconds)
[18:52] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[18:55] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[19:00] <t0rn> When using tgtadm option '--bstype rbd' with cephx where do you tell it what user to authenticate to the cluster with?
[19:00] * mrjack (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[19:00] * tnt (~tnt@91.176.19.114) has joined #ceph
[19:02] * mrjack_ (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[19:05] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[19:06] * l0nk (~alex@83.167.43.235) has joined #ceph
[19:06] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: When the chips are down, well, the buffalo is empty)
[19:09] <Kioob> Hi
[19:09] * BillK (~BillK@124-148-237-131.dyn.iinet.net.au) Quit (Read error: Connection reset by peer)
[19:10] <Kioob> it's confirmed, I see data loss on multiple RBD images
[19:11] <jmlowe> Kioob: could you elaborate a little?
[19:12] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[19:13] <Kioob> yes: I have Xen VMs which use RBD images (via the kernel client)
[19:13] <pioto> sagewk: hm, well. actually, my version of libvirt claims "error: Failed to clone vol from [...] error: this function is not supported by the connection driver: storage pool does not support volume creation from an existing volume
[19:13] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) Quit (Quit: Leaving.)
[19:13] <Kioob> and on some VMs I have empty files, corrupted filesystems, and corrupted databases
[19:13] <pioto> maybe that was implemented after 1.0.2
[19:13] <wido> Kioob: what Ceph version and kernel version?
[19:13] <jmlowe> Kioob: osd kernel version, osd filesystem, ceph version ?
[19:14] <sagewk> pioto: oh, not sure what libvirt clone is/means. that probably isn't wired up to the rbd clone functionality.
[19:14] <Kioob> maybe it's a hardware problem, but I'm looking for a way to confirm that
[19:14] <wido> pioto: I wrote that implementation
[19:14] <wido> let me check the later
[19:14] <Kioob> wido & jmlowe: ceph 0.56.4, all OSD and client kernels are 3.6.* (xen VM kernels 3.8.*)
[19:15] <Kioob> and I use XFS storage
[19:15] <jmlowe> Kioob: sounds like me, except for the corruption and kvm instead of xen
[19:19] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[19:22] <Kioob> I have empty files on OSD too
[19:22] <Kioob> like this one : /var/lib/ceph/osd/ceph-10/current/3.63_head/DIR_3/DIR_6/DIR_2/DIR_D/rb.0.2f06a.238e1f29.000000000217__snapdir_0733D263__3
[19:22] <Kioob> I suppose it's not "normal"
[19:23] <Kioob> /var/lib/ceph/osd/ceph-39/current/3.7c_head/DIR_C/DIR_7/DIR_3/DIR_2/rb.0.15c26.238e1f29.000000000a76__12d7_7B52237C__3 too
[19:23] <Kioob> but it's on different OSD, on different hosts
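The two on-disk paths Kioob quotes can be decoded by hand. The Python sketch below assumes a layout inferred only from those two filenames, not from Ceph's source: object name, "__", the snap ("head", "snapdir", or a hex snapshot id), "_", an 8-hex-digit hash, "__", and the pool id. It also assumes the DIR_* levels are the hash's hex digits read from the least significant end, which both quoted paths agree with (0733D263 gives DIR_3/DIR_6/DIR_2/DIR_D). FileStore's real escaping rules for unusual names may differ, and both function names are hypothetical.

```python
import re

# Layout inferred from the two filenames quoted in this log (an assumption):
#   <object name>__<head|snapdir|hex snap id>_<8 hex digit hash>__<pool id>
FILENAME_RE = re.compile(
    r"^(?P<name>.+?)__(?P<snap>head|snapdir|[0-9a-f]+)"
    r"_(?P<hash>[0-9A-F]{8})__(?P<pool>[0-9a-f]+)$"
)


def decode_filestore_name(filename):
    """Hypothetical helper: split an object filename into its parts."""
    m = FILENAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognized object filename: {filename!r}")
    snap = m.group("snap")
    return {
        "name": m.group("name"),
        # "head" and "snapdir" stay strings; a clone carries a hex snap id.
        "snap": snap if snap in ("head", "snapdir") else int(snap, 16),
        "hash": m.group("hash"),
        "pool": m.group("pool"),
    }


def hash_dirs(obj_hash, depth):
    """Expected DIR_* components: hash hex digits, last digit first."""
    return ["DIR_" + c for c in reversed(obj_hash)][:depth]
```

Under these assumptions the second file is a snapshot clone (snap id 0x12d7) of object rb.0.15c26.238e1f29.000000000a76 rather than its head, which matters when deciding which copy holds the "good" data.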
[19:24] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[19:24] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[19:25] * BillK (~BillK@124-148-198-148.dyn.iinet.net.au) has joined #ceph
[19:25] * rturk-away is now known as rturk
[19:36] * mrjack (mrjack@office.smart-weblications.net) has joined #ceph
[19:42] <Kioob> *snapdir* files are often empty, so this one is probably not a problem
[19:45] * l0nk (~alex@83.167.43.235) Quit (Ping timeout: 480 seconds)
[19:50] * alram (~alram@38.122.20.226) has joined #ceph
[19:51] * KevinPerks2 (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[19:51] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[19:52] * l0nk (~alex@83.167.43.235) has joined #ceph
[19:55] <sagewk> snapdir files should always be empty; the snap info is in an xattr
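Given sagewk's point, a scan for suspicious empty files should skip snapdir entries. The Python sketch below walks only the PG directories (entries ending in "_head") of an OSD's FileStore current directory, such as /var/lib/ceph/osd/ceph-10/current from the paths above, and reports zero-length files that are not snapdir entries. Restricting the walk to PG directories is an assumption meant to avoid non-object files such as the omap store's lock file. An empty file is only a lead to inspect, not proof of corruption, since a RADOS object can legitimately have no data.

```python
import os


def empty_object_files(osd_current):
    """Return sorted paths of zero-length object files under the PG
    directories (*_head) of an OSD 'current' dir. snapdir entries are
    skipped because they are expected to be empty (their snap info
    lives in an xattr)."""
    hits = []
    for entry in sorted(os.listdir(osd_current)):
        pg_dir = os.path.join(osd_current, entry)
        # Only PG collections; skips e.g. the omap store and meta files.
        if not (entry.endswith("_head") and os.path.isdir(pg_dir)):
            continue
        for root, _dirs, files in os.walk(pg_dir):
            for name in files:
                if "__snapdir_" in name:
                    continue  # expected to be empty
                path = os.path.join(root, name)
                if os.path.getsize(path) == 0:
                    hits.append(path)
    return sorted(hits)
```

Running it on each OSD host and comparing the results across the replicas of a PG is one way to tell a replica-only anomaly from an object that is empty everywhere.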
[20:01] * l0nk (~alex@83.167.43.235) Quit (Ping timeout: 480 seconds)
[20:01] <Kioob> ok, great sagewk
[20:02] <Kioob> so I don't have many "corrupted" files
[20:04] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[20:05] * dmick (~dmick@2607:f298:a:607:a034:ace:685b:f0b2) has joined #ceph
[20:10] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[20:16] * calebamiles (~caleb@c-50-138-218-203.hsd1.vt.comcast.net) has joined #ceph
[20:17] * BillK (~BillK@124-148-198-148.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[20:18] * drokita (~drokita@199.255.228.128) has joined #ceph
[20:20] <mikedawson> glowell: thanks for your fix on 4752. In light of http://tracker.ceph.com/issues/4756 , what permissions am I missing if "ceph auth list" doesn't have a mon. section?
[20:22] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[20:23] <glowell> mikedawson: What I used for my testing was: [root@gary-centos-02 ceph]# diff /tmp/keyring /var/lib/ceph/mon/ceph-gary-centos-02/keyring
[20:23] <glowell> 3c3
[20:23] <glowell> < caps mon = "allow *"
[20:23] <glowell> ---
[20:23] <glowell> > caps mon = ""
[20:24] <glowell> That was sufficient to test the authentication failure, but I think it also leaves the monitor unreachable by client programs.
[20:25] <dmick> the mon. key is really supposed to only be for bootstrapping; normally the mons communicate their keys amongst themselves. Later operation sets keys for the other
[20:25] <dmick> daemons and clients explicitly rather than the 'global' mon.
[20:27] <mikedawson> glowell, dmick: Thanks! mkcephfs doesn't even put a caps line in there with 0.59 / 0.60 (#4756). If I want to get past this issue, should I add the caps mon = "allow *" in /var/lib/ceph/mon/ceph-a/keyring ? Or run next when gitbuilder spins a new version?
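Whether a monitor keyring has the caps line glowell's diff toggles can be checked mechanically. The Python sketch below assumes the keyring layout shown in that diff: "[entity]" section headers followed by indented "key = value" lines, with caps written as caps <subsystem> = "<capability>". It reports the caps per entity, so a keyring written without a caps line (the #4756 case) shows up as an empty entry. The function name is hypothetical, and per dmick the mon. key is meant only for bootstrapping.

```python
import re


def keyring_caps(text):
    """Hypothetical helper: map each [entity] in a keyring to its caps,
    e.g. {'mon.': {'mon': 'allow *'}}. Entities without caps map to {}."""
    entities = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        header = re.fullmatch(r"\[(.+)\]", line)
        if header:
            current = header.group(1)
            entities[current] = {}
            continue
        if current is None or "=" not in line:
            continue
        # Split on the first '=' only: base64 keys may end in '=='.
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("caps "):
            subsystem = key[len("caps "):].strip()
            entities[current][subsystem] = value.strip().strip('"')
    return entities
```

Because caps mon = "" and a missing caps line both leave the mon. entity without usable mon caps, the result distinguishes them ("" versus no entry) so either case can be spotted.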
[20:27] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[20:28] <dmick> depends on which issue you're interested in getting past. If you don't need ceph-create-keys behavior you can just disable that executable for now
[20:28] <dmick> other than the logging, it's not doing any harm, tho
[20:29] <mikedawson> dmick: comment it out of the init script?
[20:30] * BillK (~BillK@58-7-166-94.dyn.iinet.net.au) has joined #ceph
[20:30] <dmick> or move it out of the way in its bin dir, or change its executable mode, or edit it to make it exit immediately, or...
[20:30] <dmick> try to pick something you'll remember to undo or that the packaging will undo, I suppose
[20:30] <mikedawson> dmick: thanks
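Of dmick's options, changing the executable mode is the easiest to undo if the original mode is recorded. The Python sketch below clears the execute bits on a file and returns the previous permission bits so they can be restored later. The path is an assumption: packaged installs commonly put the tool at /usr/sbin/ceph-create-keys, but check your own layout. Note that a package upgrade may also reset the mode, which is the "packaging will undo" case.

```python
import os
import stat

# Execute permission for owner, group and others.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def disable_exec(path):
    """Clear the execute bits on path; return the previous permission
    bits so the change can be undone with restore_mode()."""
    old_mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, old_mode & ~EXEC_BITS)
    return old_mode


def restore_mode(path, mode):
    """Undo disable_exec() by putting back the saved permission bits."""
    os.chmod(path, mode)


if __name__ == "__main__":
    # Assumed install location; adjust for your packaging.
    target = "/usr/sbin/ceph-create-keys"
    saved = disable_exec(target)
    print(f"{target}: saved mode {saved:o}; restore with restore_mode()")
```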
[20:32] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[20:34] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[20:38] * xdeller (~xdeller@performiks.starlink.ru) Quit (Quit: Leaving)
[20:38] * rturk is now known as rturk-away
[20:39] * rturk-away is now known as rturk
[20:39] <mikedawson> joao: I have logs showing the error condition of two leaders and one probing. Here's a preview http://pastebin.com/raw.php?i=2iSU5yvY
[20:39] * rturk is now known as rturk-away
[20:39] * rturk-away is now known as rturk
[20:40] * gmason_ (~gmason@hpcc-fw.net.msu.edu) has joined #ceph
[20:40] <joao> mikedawson, that's the weirdest thing I've seen in a long while
[20:41] * gmason (~gmason@hpcc-fw.net.msu.edu) Quit (Ping timeout: 480 seconds)
[20:41] * gmason_ is now known as gmason
[20:43] <Kioob> I found empty files on 2 OSDs: on one of these OSDs it's only a replica, and the other OSD is down, so it can't be the source of my problem
[20:44] <mikedawson> joao: then mon.b and mon.c held an election (which mon.b won). http://pastebin.com/raw.php?i=cyNDk77G
[20:46] <mikedawson> joao: I've seen this several times in the last week with 0.60.
[20:46] <joao> mon.a appears to be oblivious to any on-going elections
[20:47] <mikedawson> joao: At this point, yes.
[20:48] <joao> was it a participant in the election?
[20:50] * senner (~Wildcard@68-113-232-90.dhcp.stpt.wi.charter.com) has joined #ceph
[20:51] <mikedawson> joao: mon.a logged something about an election about 4 hours earlier: http://pastebin.com/raw.php?i=sEZtctQa
[20:53] * senner (~Wildcard@68-113-232-90.dhcp.stpt.wi.charter.com) Quit ()
[20:53] * eschnou (~eschnou@131.165-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:55] <joao> mikedawson, point me to the full logs and I'll look into that when I'm through with the bugs I'm wrapping up?
[20:56] * verwilst (~verwilst@dD576F6A2.access.telenet.be) has joined #ceph
[20:59] * rtek (~sjaak@rxj.nl) Quit (Ping timeout: 480 seconds)
[21:22] * BillK (~BillK@58-7-166-94.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[21:27] * senner (~Wildcard@68-113-232-90.dhcp.stpt.wi.charter.com) has joined #ceph
[21:33] * senner (~Wildcard@68-113-232-90.dhcp.stpt.wi.charter.com) Quit (Quit: Leaving.)
[21:40] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[21:42] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[21:47] * Havre (~Havre@2a01:e35:8a2c:b230:e0ba:7dd8:8fa9:7050) Quit (Remote host closed the connection)
[21:48] * Havre (~Havre@2a01:e35:8a2c:b230:e0ba:7dd8:8fa9:7050) has joined #ceph
[21:56] * Havre (~Havre@2a01:e35:8a2c:b230:e0ba:7dd8:8fa9:7050) Quit (Ping timeout: 480 seconds)
[21:59] * rtek (~sjaak@rxj.nl) has joined #ceph
[21:59] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[22:08] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:10] * calebamiles (~caleb@c-50-138-218-203.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[22:11] * mr_fribble (5b09cc47@ircip2.mibbit.com) has joined #ceph
[22:21] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:22] * stxShadow (~Jens@ip-88-152-161-249.unitymediagroup.de) has joined #ceph
[22:29] * Havre (~Havre@2a01:e35:8a2c:b230:e0ba:7dd8:8fa9:7050) has joined #ceph
[22:29] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[22:31] * BillK (~BillK@58-7-139-175.dyn.iinet.net.au) has joined #ceph
[22:33] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has left #ceph
[22:34] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[22:35] * portante` (~user@66.187.233.206) Quit (Ping timeout: 480 seconds)
[22:38] <mikedawson> joao: Now I have a core dump. I have several GB of files for you. I'll post them and try to point you at times of interest. Is there anything else that would be helpful?
[22:40] * mnash (~chatzilla@vpn.expressionanalysis.com) has joined #ceph
[22:40] * Havre (~Havre@2a01:e35:8a2c:b230:e0ba:7dd8:8fa9:7050) Quit (Ping timeout: 480 seconds)
[22:41] * mrjack_ (mrjack@office.smart-weblications.net) has joined #ceph
[22:44] * rustam (~rustam@94.15.91.30) has joined #ceph
[22:50] * stxShadow (~Jens@ip-88-152-161-249.unitymediagroup.de) Quit (Quit: Leaving.)
[22:59] <mr_fribble> what are ceph tunables?
[23:01] * verwilst (~verwilst@dD576F6A2.access.telenet.be) Quit (Quit: Ex-Chat)
[23:10] <joao> mikedawson, logs and core are just fine, thanks :)
[23:12] <Kioob> How can I find the internal OSD file corresponding to an offset in an RBD image?
[23:12] <mr_fribble> joao: is wip-bobtail-rbd-backports-req-order safe?
[23:12] <Kioob> does "rados --pool $POOL ls | grep $objectPrefix" do the job?
[23:13] <joao> mr_fribble, don't know; whoever committed to that branch should know better than I do
[23:13] <Kioob> (I have snapshots for that RBD image, so I'm not sure where to find the "good" object)
[23:14] <mr_fribble> joao: sorry meant Josh
[23:16] * Havre (~Havre@2a01:e35:8a2c:b230:e0ba:7dd8:8fa9:7050) has joined #ceph
[23:17] * ebo^ (~ebo@koln-5d8119bb.pool.mediaWays.net) Quit (Quit: Verlassend)
[23:21] * mrjack (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[23:26] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[23:28] * mr_fribble (5b09cc47@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[23:32] <mikedawson> joao: how many OSDs have you had running with 0.60?
[23:33] <mikedawson> joao: I can get all three mons in quorum (leader, peon, peon) and it's totally stable without OSDs. Then I start 66 OSDs and the monitors start having issues
[23:38] * eschnou (~eschnou@131.165-201-80.adsl-dyn.isp.belgacom.be) Quit (Quit: Leaving)
[23:40] <Kioob> for the record: "rados ls" is not necessary; the file name can be computed as $prefix.$blockNum
[23:40] <Kioob> now, I have to find how it works with snapshot
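Kioob's $prefix.$blockNum rule can be sketched in Python as below. It assumes a format-1 image (object names starting with rb.0.), with block_name_prefix and order taken from `rbd info`; order 22, the default, means 4 MiB objects. The block number is zero-padded to 12 hex digits, matching the object names quoted earlier in this log. The function name is hypothetical, and snapshot clones of an object (Kioob's open question) are not handled here.

```python
def rbd_object_for_offset(block_name_prefix, offset, order=22):
    """Hypothetical helper: map a byte offset in a format-1 RBD image
    to (object name, byte offset inside that object).
    Objects are 2**order bytes; order 22 means 4 MiB objects."""
    object_size = 1 << order
    block_num = offset // object_size
    return f"{block_name_prefix}.{block_num:012x}", offset % object_size


if __name__ == "__main__":
    # Prefix taken from an object name quoted earlier in this log.
    print(rbd_object_for_offset("rb.0.2f06a.238e1f29", 2 * 1024**3))
```

The returned name is the RADOS object name; the file on the OSD additionally carries the snap, hash and pool suffixes seen in the paths Kioob pasted.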
[23:41] * vata (~vata@2607:fad8:4:6:44c1:aa21:33e3:45bc) Quit (Quit: Leaving.)
[23:43] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[23:46] * drokita (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[23:50] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.