#ceph IRC Log


IRC Log for 2013-01-07

Timestamps are in GMT/BST.

[0:06] * madkiss (~madkiss@p5792CB7A.dip.t-dialin.net) Quit (Quit: Leaving.)
[0:09] * illuminatis (~illuminat@0001adba.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:09] * Gugge_47527 (gugge@kriminel.dk) has joined #ceph
[0:12] * Gugge-47527 (gugge@kriminel.dk) Quit (Ping timeout: 480 seconds)
[0:12] * Gugge_47527 is now known as Gugge-47527
[0:14] * CloudGuy (~CloudGuy@5356416B.cm-6-7b.dynamic.ziggo.nl) has joined #ceph
[0:16] * illuminatis (~illuminat@0001adba.user.oftc.net) has joined #ceph
[0:18] * Cube (~Cube@173-112-127-33.pools.spcsdns.net) has joined #ceph
[0:30] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[0:36] * CloudGuy (~CloudGuy@5356416B.cm-6-7b.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[0:36] * Leseb (~Leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) Quit (Quit: Leseb)
[0:38] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:38] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:03] * The_Bishop (~bishop@2001:470:50b6:0:84ff:28d5:ef9e:5d70) Quit (Read error: Operation timed out)
[1:03] <loicd> Hi, I'm looking for low-hanging fruit to fix bugs ( http://tracker.newdream.net/projects/ceph/issues ) while learning the ceph code base. Any advice?
[1:14] * The_Bishop (~bishop@2001:470:50b6:0:a4c0:2576:9777:861f) has joined #ceph
[1:17] * LeaChim (~LeaChim@b01bde88.bb.sky.com) Quit (Ping timeout: 480 seconds)
[1:17] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:21] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:27] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[1:29] * sagelap (~sage@ has joined #ceph
[1:33] * sagelap1 (~sage@ has joined #ceph
[1:33] * sagelap (~sage@ Quit (Read error: Connection reset by peer)
[1:38] * BManojlovic (~steki@ has joined #ceph
[1:39] * Cube (~Cube@173-112-127-33.pools.spcsdns.net) Quit (Quit: Leaving.)
[1:40] * sagelap (~sage@ has joined #ceph
[1:41] * sagelap1 (~sage@ Quit (Ping timeout: 480 seconds)
[1:43] * sagelap1 (~sage@ has joined #ceph
[1:48] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[1:50] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:51] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:53] * korgon (~Peto@isp-korex- Quit (Quit: Leaving.)
[1:53] * Cube (~Cube@173-112-127-33.pools.spcsdns.net) has joined #ceph
[2:02] * jlogan (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[2:04] * tnt (~tnt@112.169-67-87.adsl-dyn.isp.belgacom.be) Quit (Read error: Operation timed out)
[2:13] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[2:19] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[2:20] * jlogan (~Thunderbi@2600:c00:3010:1:b121:611b:9c01:6f68) has joined #ceph
[2:21] * Cube (~Cube@173-112-127-33.pools.spcsdns.net) Quit (Quit: Leaving.)
[2:41] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[2:41] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:42] * loicd (~loic@magenta.dachary.org) has joined #ceph
[2:59] * markl (~mark@tpsit.com) Quit (Ping timeout: 480 seconds)
[3:01] * yoshi (~yoshi@p2100-ipngn4002marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[4:11] * BManojlovic (~steki@ Quit (Ping timeout: 480 seconds)
[4:22] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[4:26] * jlogan (~Thunderbi@2600:c00:3010:1:b121:611b:9c01:6f68) Quit (Ping timeout: 480 seconds)
[4:52] * mattbenjamin (~matt@adsl-75-45-227-140.dsl.sfldmi.sbcglobal.net) Quit (Quit: Leaving.)
[4:52] * mattbenjamin (~matt@adsl-75-45-227-140.dsl.sfldmi.sbcglobal.net) has joined #ceph
[5:00] * mattbenjamin (~matt@adsl-75-45-227-140.dsl.sfldmi.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[5:00] <dec> so we're seeing a heap of slow requests on 0.53 and found a ceph-devel thread discussing some fixes; is it worth moving to 0.56 to see if that fixes them?
[5:33] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[5:33] * loicd (~loic@magenta.dachary.org) has joined #ceph
[5:39] <dec> building 0.56 now - do I need --with-debug?
[5:43] <iggy> dec: i think 0.56.1 will be out tomorrow with some fixes
[5:48] <dec> doh
[5:49] <dec> so, given you're here too iggy... the timing issues I was seeing on VMs before have returned (even with ntpd stopped on the VM)
[5:49] <dec> it seems to be ceph that's causing the issue; the VMs are backed by ceph rbd disks, and the ceph slowdowns are delaying the VMs for ~15-30 sec at a time, causing big issues
[5:50] <dec> is there anything major wrong in 0.56 that means I should wait for 0.56.1? otherwise I'll try 0.56 (desperately trying to fix this problem)
[5:56] <iggy> one of them was upgrading from older versions
[5:58] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[6:07] * ScOut3R (~ScOut3R@dsl5401A397.pool.t-online.hu) has joined #ceph
[6:10] <dec> iggy: hmm, eek; any idea where that's tracked?
[6:10] <dec> I'll dig around.
[6:15] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[6:17] <dec> ah: http://tracker.newdream.net/issues/3731
[6:27] * mattbenjamin (~matt@adsl-75-45-227-140.dsl.sfldmi.sbcglobal.net) has joined #ceph
[6:41] * ScOut3R (~ScOut3R@dsl5401A397.pool.t-online.hu) Quit (Remote host closed the connection)
[6:42] <dec> ok; building now against 0.56 testing @ git a10950f
[6:42] <dec> which includes those backwards compat fixes
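For readers following along: the build dec describes used Ceph's autotools flow of the time. A minimal sketch; the commit hash and branch name come from the log, while the clone URL and configure flags are assumptions (`--with-debug` is optional):

```shell
# Fetch the source and check out the commit dec mentions.
git clone https://github.com/ceph/ceph.git
cd ceph
git checkout a10950f          # commit on the 'testing' branch, per the log

# Autotools build flow used by Ceph in this era.
./autogen.sh
./configure --with-debug      # adds extra debugging support; optional
make -j"$(nproc)"             # parallel build across all available cores
```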
[6:48] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[6:48] * loicd (~loic@magenta.dachary.org) has joined #ceph
[6:50] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Read error: Connection reset by peer)
[6:51] * silversurfer (~silversur@ has joined #ceph
[6:52] * silversurfer (~silversur@ Quit (Read error: Connection reset by peer)
[6:52] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[7:01] * mattbenjamin (~matt@adsl-75-45-227-140.dsl.sfldmi.sbcglobal.net) Quit (Quit: Leaving.)
[7:47] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[8:11] * themgt (~themgt@71-90-234-152.dhcp.gnvl.sc.charter.com) has joined #ceph
[8:12] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Read error: Connection reset by peer)
[8:12] * tnt (~tnt@112.169-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:12] * silversurfer (~silversur@ has joined #ceph
[8:14] * themgt (~themgt@71-90-234-152.dhcp.gnvl.sc.charter.com) Quit ()
[8:14] * silversurfer (~silversur@ Quit (Read error: Connection reset by peer)
[8:15] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:28] * themgt (~themgt@71-90-234-152.dhcp.gnvl.sc.charter.com) has joined #ceph
[8:30] * Morg (d4438402@ircip3.mibbit.com) has joined #ceph
[8:40] * sleinen (~Adium@2001:620:0:46:c969:ad77:cee7:ec9c) has joined #ceph
[8:45] * themgt (~themgt@71-90-234-152.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[8:46] * ninkotech (~duplo@ip-94-113-217-68.net.upcbroadband.cz) Quit (Read error: Connection reset by peer)
[8:52] * tnt (~tnt@112.169-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:05] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:11] * madkiss (~madkiss@ has joined #ceph
[9:12] * madkiss (~madkiss@ Quit ()
[9:12] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: Take my advice. I don't use it anyway)
[9:13] * Leseb (~Leseb@ has joined #ceph
[9:21] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[9:21] * loicd (~loic@magenta.dachary.org) has joined #ceph
[9:22] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:37] * ScOut3R (~ScOut3R@ has joined #ceph
[9:50] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:52] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[9:53] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:54] * ScOut3R (~ScOut3R@ Quit (Remote host closed the connection)
[9:55] * f4m8_ (f4m8@kudu.in-berlin.de) has left #ceph
[9:55] * ScOut3R (~ScOut3R@ has joined #ceph
[10:02] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[10:05] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[10:05] <Kioob`Taff> hi
[10:05] <Kioob`Taff> "osd: experimental support for PG 'splitting' (pg_num adjustment for existing pools)"
[10:05] <Kioob`Taff> great!
[10:05] <Kioob`Taff> but I don't like the "experimental" :p
[10:06] <Kioob`Taff> so, I suppose it's not safe for now...
[10:06] <Kioob`Taff> but how does it work? It will rebalance all the data in the pool, right?
[10:08] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[10:23] * LeaChim (~LeaChim@b01bde88.bb.sky.com) has joined #ceph
[10:30] * loicd (~loic@ has joined #ceph
[10:31] * LeaChim (~LeaChim@b01bde88.bb.sky.com) Quit (Ping timeout: 480 seconds)
[10:40] * korgon (~Peto@isp-korex- has joined #ceph
[10:40] * LeaChim (~LeaChim@b01bde88.bb.sky.com) has joined #ceph
[10:43] * korgon (~Peto@isp-korex- Quit ()
[10:50] * korgon (~Peto@isp-korex- has joined #ceph
[10:58] * loicd (~loic@ Quit (Ping timeout: 480 seconds)
[11:15] * korgon (~Peto@isp-korex- Quit (Read error: Connection reset by peer)
[11:26] * loicd (~loic@jem75-2-82-233-234-24.fbx.proxad.net) has joined #ceph
[11:55] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[12:00] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[12:17] * ScOut3R (~ScOut3R@ Quit (Remote host closed the connection)
[12:17] * ScOut3R (~ScOut3R@ has joined #ceph
[12:17] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[12:20] * allsystemsarego (~allsystem@5-12-241-245.residential.rdsnet.ro) has joined #ceph
[12:22] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:25] * tnt (~tnt@112.169-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[12:28] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[12:47] * ScOut3R_ (~ScOut3R@ has joined #ceph
[12:53] * sleinen1 (~Adium@2001:620:0:25:ad61:4aca:ca45:8bff) has joined #ceph
[12:53] * tnt (~tnt@112.169-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[12:54] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[13:00] * sleinen (~Adium@2001:620:0:46:c969:ad77:cee7:ec9c) Quit (Ping timeout: 480 seconds)
[13:01] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[13:02] * low (~low@ has joined #ceph
[13:28] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[13:43] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[13:49] * agh (~agh@www.nowhere-else.org) has joined #ceph
[13:49] <agh> hello to all
[13:49] <agh> one question :
[13:50] <agh> qemu-img create -f rbd rbd:data/foo 10G is going to create an rbd in format 1.
[13:50] <agh> is there a way to create a format 2 rbd? (to do cloning...)
[13:55] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[13:55] <iggy> i think only newer rbd tools can do that
[13:56] <agh> iggy: ok, that's what i thought...
[13:56] <agh> thanks
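A hedged sketch of what iggy is pointing at: newer `rbd` CLIs of this era could create format 2 images directly (the flag was `--format 2`; the image and pool names follow agh's example):

```shell
# Create a 10 GB format 2 image with the rbd tool instead of qemu-img.
# Format 2 supports layering, which is what cloning needs.
rbd create --format 2 --size 10240 data/foo   # --size is in megabytes

# qemu can then use the pre-created image directly, e.g.:
# qemu-img info rbd:data/foo
```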
[14:00] * ScOut3R (~ScOut3R@ has joined #ceph
[14:01] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:03] * ScOut3R_ (~ScOut3R@ Quit (Read error: Operation timed out)
[14:10] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[14:21] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[14:23] <jtang> good morning #ceph
[14:25] <tnt> Any idea when 0.56 will be tagged bobtail ?
[14:37] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[14:37] <agh> for OSD hosts, do you recommend Intel or AMD procs ?
[14:38] <tnt> I'm not sure it has any influence ...
[14:39] <loicd> Hi, I'm compiling ceph ( 0.56 ) with 6 cores in 3.45 min. What's the fastest I can hope for ?
[14:44] <agh> tnt: ok, thanks
[14:45] <janos> agh, i have some of both. i don't notice a difference
[14:48] * Morg (d4438402@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[14:50] <agh> janos: ok, thanks a lot
[14:51] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[14:53] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[14:55] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[15:08] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[15:09] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:11] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[15:14] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[15:15] <dec> ok, cluster upgraded to 0.56 + testing patches; let's see how we go!
[15:15] <dec> so far so good - all clients (0.53 librados) still talking ok.
[15:17] * tnt is still waiting for the "bobtail" stamp ...
[15:18] * dec couldn't wait - having major performance issues with 0.53
[15:18] * tnt is still at argonaut :p
[15:18] <tnt> How "major" ?
[15:19] <dec> any reasonable I/O load on the cluster, or any operation (like restarting an osd/mon daemon), would cause severely delayed IO (15-80 seconds) on all of our RBD images (which we're using to serve VMs)
[15:20] <dec> so frequently all the VMs backed by this cluster would have ~30 seconds where they'd all just hang, completely unresponsive
[15:21] <janos> dec: fixed up wth .56+ ?
[15:21] <dec> janos: I've only been on 0.56+ for about 15 minutes now, but I'm putting some serious IO load on the cluster and I just restarted an OSD and I saw none of the IO delay I was seeing before
[15:21] <dec> so it's looking promising
[15:21] <janos> cool
[15:22] <dec> previously, restarting an OSD was guaranteed to kill IO
[15:22] <janos> doh
[15:22] <tnt> Oh nice. Here I get delays on RBD only when restarting a lot of OSDs (like half of them at once).
[15:23] <janos> i'm on fedora and was having issues with 0.52 (the most recent in the repo), so i've built my own repo with the latest as of about a week ago and it's been doing ok
[15:23] <dec> right; I just built EL6 RPMs for 0.56 + patches from 'testing' branch
[15:24] <tnt> is 'testing' what's scheduled for 0.56.1 ?
[15:25] <dec> apparently, yes
[15:28] * vata (~vata@ has joined #ceph
[15:28] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[15:31] * mikedawson_ (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:32] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[15:32] * mikedawson_ is now known as mikedawson
[15:32] <dec> ok - I'm happy enough with this, and it's now 1am, so better get to bed :/
[15:33] <dec> see how the cluster handles traffic in the morning :)
[15:34] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[15:36] * agh (~agh@www.nowhere-else.org) Quit (Remote host closed the connection)
[15:36] * agh (~agh@www.nowhere-else.org) has joined #ceph
[15:39] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[15:42] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:50] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[15:51] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[15:53] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[15:54] * loicd (~loic@jem75-2-82-233-234-24.fbx.proxad.net) Quit (Quit: Leaving.)
[16:01] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[16:01] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:16] * PerlStalker (~PerlStalk@ has joined #ceph
[16:31] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[16:33] * jtangwk (~Adium@2001:770:10:500:48e:a072:72ca:9727) Quit (Read error: Connection reset by peer)
[16:34] * jtangwk (~Adium@2001:770:10:500:ac68:810:d319:bdd4) has joined #ceph
[16:34] * jlogan1 (~Thunderbi@2600:c00:3010:1:b121:611b:9c01:6f68) has joined #ceph
[16:39] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[16:40] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:42] * jtangwk (~Adium@2001:770:10:500:ac68:810:d319:bdd4) Quit (Ping timeout: 480 seconds)
[16:44] * noob2 (~noob2@ext.cscinfo.com) has joined #ceph
[16:50] * loicd (~loic@magenta.dachary.org) has joined #ceph
[16:55] * markl (~mark@tpsit.com) has joined #ceph
[16:57] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[17:00] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) has joined #ceph
[17:01] * sagelap1 (~sage@ Quit (Ping timeout: 480 seconds)
[17:04] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[17:18] * sleinen1 (~Adium@2001:620:0:25:ad61:4aca:ca45:8bff) Quit (Quit: Leaving.)
[17:18] * sleinen (~Adium@ has joined #ceph
[17:22] * sagelap (~sage@121.sub-70-197-145.myvzw.com) has joined #ceph
[17:23] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:26] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[17:29] * ninkotech (~duplo@ip-94-113-217-68.net.upcbroadband.cz) has joined #ceph
[17:33] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[17:33] * loicd (~loic@2a01:e35:2eba:db10:349a:a81d:9d6c:f6d) has joined #ceph
[17:34] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[17:38] <noob2> anyone know how the rbd tag works?
[17:40] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[17:46] <loicd> Hi, where can I find unit tests for OSD ? I see https://github.com/ceph/ceph/tree/master/src/test/osd but contrary to what I expected from the directory name, it relates to librados.
[17:46] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[17:49] * sleinen1 (~Adium@2001:620:0:25:34f4:2b26:b63c:36cb) has joined #ceph
[17:52] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:53] * low (~low@ Quit (Quit: Leaving)
[17:54] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[17:55] * zaitcev (~zaitcev@lembas.zaitcev.us) has joined #ceph
[18:00] * agh (~agh@www.nowhere-else.org) Quit (Remote host closed the connection)
[18:00] * tnt (~tnt@112.169-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:01] * agh (~agh@www.nowhere-else.org) has joined #ceph
[18:03] * sagelap (~sage@121.sub-70-197-145.myvzw.com) Quit (Read error: Connection reset by peer)
[18:10] * jtangwk (~Adium@2001:770:10:500:ac68:810:d319:bdd4) has joined #ceph
[18:16] <noob2> can i use the same --shared tag for multiple rbd mounts?
[18:17] <noob2> or does it have to be unique for each rbd pair?
[18:17] * sagelap (~sage@2607:f298:a:607:c554:b663:176f:da4d) has joined #ceph
[18:17] <sstan> I think it's tagged shared by default. Unless you tag it locked for a specific client .. I think
[18:18] <noob2> yeah i have multiple esx servers that are going to share a rbd image
[18:18] <noob2> i was thinking i should use the --shared lun_name as the tag
[18:19] <Karcaw> i have 31 pgs stuck in an 'active+remapped' state; ceph -s says they are stuck unclean. how do i get the system to re-map and/or clean these up? i've made sure all osds are active and working.
[18:19] <noob2> you might be short on osd's to get it to remap. that is usually my case
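For diagnosing stuck PGs like Karcaw's, the usual first steps look roughly like the following (the PG id is only a hypothetical example, following wubo's `1.a1` later in the log):

```shell
# Show which PGs are stuck unclean and where they currently map.
ceph pg dump_stuck unclean

# Query one of the stuck PGs for its peering/recovery state.
ceph pg 1.a1 query            # hypothetical PG id; substitute a real one

# Confirm every OSD is up and in before forcing any remapping.
ceph osd tree
```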
[18:20] * The_Bishop (~bishop@2001:470:50b6:0:a4c0:2576:9777:861f) Quit (Ping timeout: 480 seconds)
[18:20] <noob2> sstan: i've tried to round-robin IOs from multiple clients before and it caused a kernel panic on both proxies at the same time.
[18:20] <noob2> i wasn't using the shared lock though, which is probably my problem
[18:28] * The_Bishop (~bishop@2001:470:50b6:0:31a9:74eb:eb95:cf60) has joined #ceph
[18:29] * wubo (80f40d05@ircip3.mibbit.com) has joined #ceph
[18:33] * Leseb (~Leseb@ Quit (Quit: Leseb)
[18:33] <wubo> i'm having trouble getting back to a healthy state. Running ceph version 0.56 (1a32f0a0b42f169a7b55ed48ec3208f6d4edc1e8)
[18:33] <wubo> back when I was running argonaut I had a sequence of osd failures that I never fully recovered from
[18:34] <wubo> though, it never said that data was lost
[18:34] <wubo> now I've been limiting access to the cluster and rebuilding for days. I've stopped making progress at 0.324% degraded
[18:34] <wubo> 2013-01-07 12:33:41.470343 mon.0 [INF] pgmap v632793: 1544 pgs: 1523 active+clean, 3 active+remapped, 17 active+degraded, 1 active+clean+scrubbing; 4833 GB data, 15903 GB used, 5611 GB / 22356 GB avail; 12579/3884520 degraded (0.324%)
[18:34] <wubo> ceph pg health: http://pastebin.com/yciad5xw
[18:35] <wubo> ceph pg 1.a1 query: http://pastebin.com/2hm10UCU
[18:36] <wubo> i'm especially stumped by the the query result. I can't tell what recovery is blocking on
[18:36] <paravoid> I have the exact same issue with 0.56.
[18:36] <paravoid> well, I think it's the same
[18:37] <jamespag`> noob2, did you figure out that issue with the libcls_lock object/symlink?
[18:37] <noob2> not yet no :(
[18:37] <paravoid> actually, I have 61 pgs active+remapped
[18:37] <paravoid> for days
[18:37] <paravoid> they are all spread into multiple osds
[18:38] <paravoid> it was initially 95 but I restarted one of these osds and it got to 61
[18:38] <wubo> paravoid: glad I'm not alone. yeah, there's no clear single offending osd
[18:38] <wubo> no pattern I can see
[18:38] <paravoid> presumably if I restart all the OSDs that have those, it'll get fixed
[18:38] <wubo> hm. maybe I just need to start cycling osds and waiting
[18:38] <paravoid> but I left it like that as to debug it with people here :-)
[18:38] <paravoid> sjust, sagelap: around? :)
[18:39] <jamespag`> noob2, I don't see that object in the 0.48 package at all
[18:39] <jamespag`> which I think is correct (i.e. I don't see any issues)
[18:39] <noob2> ok
[18:39] <noob2> that's prob my problem then. i'm using a mix of ceph and ubuntu packages
[18:39] <noob2> i need to correct that
[18:39] <jamespag`> noob2, that was my thinking
[18:40] <noob2> yeah
[18:40] <wubo> paravoid: i've also been seeing stuff like this since the update: 2013-01-07 12:39:03.094738 mon.0 [INF] mdsmap e22149: 1/1/1 up {0=a=up:active(laggy or crashed)}
[18:40] <wubo> paravoid: have you been getting anything like that?
[18:40] <paravoid> I don't use an mds
[18:41] <wubo> paravoid: a good choice
[18:42] <paravoid> heh :)
[18:44] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[18:44] <paravoid> wubo: btw, I have no active+degraded ones, just active+remapped
[18:48] * nwat (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[18:52] * justinwarner (~ceg442049@osis111.cs.wright.edu) has joined #ceph
[18:56] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[18:58] * loicd (~loic@2a01:e35:2eba:db10:349a:a81d:9d6c:f6d) Quit (Quit: Leaving.)
[18:58] * loicd (~loic@magenta.dachary.org) has joined #ceph
[19:01] <justinwarner> I recently set up a small cluster (following the 5-min. guide) and it seemed to work, but when doing a "ceph osd tree" I get this: http://pastebin.com/xk3FVFX3. The two machines say DNE (assuming that means Does Not Exist), but they've both been restarted and should be working. Am I missing anything?
[19:02] <justinwarner> (Using Argonaut, can't get Bobtail to work correctly).
[19:02] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: We be chillin - IceChat style)
[19:04] <justinwarner> I think I figured it out, sorry.
[19:04] * justinwarner (~ceg442049@osis111.cs.wright.edu) has left #ceph
[19:04] <sstan> are osd daemons running on Wilkinson ?
[19:04] <mikedawson> sstan: he apparently didn't want to stick around
[19:04] <sstan> hah ok
[19:05] <sstan> appeared .. and vanished
[19:06] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[19:07] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[19:09] <noob2> sstan: do i need to do a rbd lock add name id first before i can use the --shared option?
[19:10] <sstan> hmm honestly I never tried, just read about it
[19:10] <sstan> tell us if it works : )
[19:10] <noob2> lol
[19:10] <noob2> i get this at mount time
[19:10] <noob2> rbd: only the lock add command uses the --shared option
[19:12] <sjust> paravoid: here now
[19:12] <noob2> i don't get it. there's a lock add --shared tag but no tag option when you mount it?
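What the error message means, roughly: `--shared` is an option of `rbd lock add` itself, not of the map/mount step. A sketch using noob2's `lun_name` tag (the lock ids are hypothetical):

```shell
# Each client takes its own lock, but with the same --shared tag,
# so the locks can coexist instead of excluding each other.
rbd lock add --shared lun_name data/foo esx-node1   # on the first client
rbd lock add --shared lun_name data/foo esx-node2   # on the second client

# Inspect the current lock holders and their tags.
rbd lock list data/foo
```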
[19:13] <mikedawson> Is anyone getting any RBD client logs? I'm using qemu to mount RBD volumes. I didn't get any client logging at all, so I specified "log file = /var/log/ceph/client.$pid.log"
[19:13] <mikedawson> under [clients] in ceph.conf, but I just get 0 byte files
[19:13] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:14] <paravoid> sjust: saw backlog?
[19:14] <paravoid> sjust: I have 61 pgs stuck in active+remapped for no apparent reason
[19:14] <paravoid> sjust: wubo had a similar issue, I think his are active+degraded though
[19:14] * sstan (~chatzilla@dmzgw2.cbnco.com) Quit (Remote host closed the connection)
[19:14] <paravoid> (or hers?)
[19:14] <sjust> not yet, I'll take a look shortly
[19:15] <mikedawson> I also can't connect to the RBD admin socket. I have "admin socket = /var/run/ceph/client.$pid.asok" under [clients] in ceph.conf. I get another 0 byte file, but can never connect to the socket
[19:15] * scuttlemonkey_ is now known as scuttlemonkey
[19:17] <mikedawson> paravoid: wubo: Have you tried setting tunables?
[19:18] * sstan (~chatzilla@dmzgw2.cbnco.com) has joined #ceph
[19:18] <paravoid> I haven't, but does it matter?
[19:18] <mikedawson> paravoid: joshd had me do it when I had some PGs stuck in active+degraded and it worked http://ceph.com/docs/master/rados/operations/crush-map/#tunables
[19:19] <mikedawson> paravoid: not sure if it is applicable to your issue or not
[19:20] <mikedawson> http://tracker.newdream.net/issues/3720
[19:20] <paravoid> I've read that, but it needs some very scary argument if I recall correctly :)
[19:20] <paravoid> or it had a big fat warning or something
[19:20] <paravoid> so I chickened out
[19:21] <mikedawson> yep, if you don't add --enable-unsafe-tunables it just issues a warning and quits
[19:21] <mikedawson> but this isn't production data for me, so went ahead
[19:23] <mikedawson> paravoid: my understanding is I remapped from 2x replication to 3x replication and something failed for some of my PGs (most worked), but Ceph didn't retry remapping the failed PGs so they were left with 2 replicas and therefore degraded
[19:23] <mikedawson> setting the tunables allowed Ceph to try again.
[19:23] <paravoid> that wasn't what I tried to do, I just added a bunch of OSDs
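The tunables procedure mikedawson refers to went through crushtool in this era; a sketch, with the flag values taken from the linked bobtail-era documentation:

```shell
# Export the current CRUSH map.
ceph osd getcrushmap -o /tmp/crush.orig

# Rewrite it with the newer tunables; --enable-unsafe-tunables is required
# because older clients may not understand the resulting map.
crushtool -i /tmp/crush.orig \
    --set-choose-local-tries 0 \
    --set-choose-local-fallback-tries 0 \
    --set-choose-total-tries 50 \
    --enable-unsafe-tunables \
    -o /tmp/crush.tuned

# Inject the adjusted map back into the cluster.
ceph osd setcrushmap -i /tmp/crush.tuned
```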
[19:23] <buck> teuthology question: I have a node that didn't come up right after a scheduled test and now I need to nuke it / release the lock. Is there a simple teuthology command to do this?
[19:28] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[19:28] <slang> buck: teuthology-nuke
[19:31] <buck> slang: thanks
[19:33] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[19:36] * chutzpah (~chutz@ has joined #ceph
[19:36] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[19:36] * ScOut3R (~ScOut3R@dsl5401A397.pool.t-online.hu) has joined #ceph
[19:48] <sstan> Is there any advantage to using an RBD block device as an LVM PV ?
[19:48] <sstan> in the context of virtualization
[19:48] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[19:49] <sstan> vs the 1 block per VM approach
[19:55] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[19:57] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[19:58] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:00] * Cube1 (~Cube@ has joined #ceph
[20:01] * agh (~agh@www.nowhere-else.org) Quit (Remote host closed the connection)
[20:01] * agh (~agh@www.nowhere-else.org) has joined #ceph
[20:09] <sjust> dspano: you there?
[20:10] <dspano> Yeah.
[20:10] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[20:14] <dspano> sjust: Are the heap files I sent okay?
[20:14] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[20:15] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:15] <sjust> dspano: do you have the git hash from the ceph-osd binary you were running?
[20:15] <dspano> sjust: Sorry if I sound stupid. How do I get that?
[20:16] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:16] <sjust> dspano: no worries, ceph-osd --version
[20:16] <sjust> I think
[20:16] <sjust> or, you could just sftp the binary to cephdrop@ceph.com
[20:17] <sjust> that would be easiest for me
[20:20] <dspano> sjust: Sure thing.
[20:21] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[20:21] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[20:23] <sjust> sage: wip-3734 looks fine
[20:26] <dspano> sjust: It's asking for a password.
[20:26] <sjust> I just pm'd it to you
[20:26] * LeaChim (~LeaChim@b01bde88.bb.sky.com) Quit (Ping timeout: 480 seconds)
[20:35] * LeaChim (~LeaChim@b0faeeb0.bb.sky.com) has joined #ceph
[20:37] <sstan> has anyone been able to load the RBD module with SLES ?
[20:39] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[20:42] * dmick (~dmick@2607:f298:a:607:8063:62c7:78d5:6751) has joined #ceph
[20:45] * gregaf1 (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[20:45] * sel (~sel@90-228-233-162-no223.tbcn.telia.com) has joined #ceph
[20:47] <loicd> This is the coverage report generated from the results of make check-coverage in the master branch of https://github.com/ceph/ceph.git : http://dachary.org/loic/ceph/html/ . It shows 35% of the LOC are covered. Is it correct or did I miss something ?
[20:54] <sel> Hello, I've set up a cluster with four nodes, two osds on each node. My config is more or less equal to http://wiki.debian.org/OpenStackCephHowto. The kernel is 3.7.1, compiled based on the debian wheezy 3.2 config. I've created an rbd with btrfs on top, and all seems ok, except if I try to do a dd if=/dev/zero of=/<mnt>/testfile bs=4k count=100k. 10k blocks work, but not 100k blocks. What have I done wrong?
[20:54] <sel> ceph packages from ceph's stable repository on debian
[20:55] <sel> wheezy
[20:56] <loicd> sel: that would be 0.48 / argonaut, right ?
[20:57] <sel> loicd: that's coeect
[20:57] <sel> correct
[20:57] <sel> 0.48.2argonaut-1~bpo70+
[20:59] <gregaf1> loicd: joshd can correct me on the coverage, but I believe that looks about right — we don't have make-time checks for a lot of the codebase :(
[21:00] <gregaf1> sel: before testing with rbd, can you just run "rados -p data bench 60 write" and see if that completes successfully?
[21:01] <loicd> gregaf1: coverage should be better with https://github.com/ceph/teuthology though, right ?
[21:03] <gregaf1> yeah; not sure what it looks like but it's definitely better
[21:03] <loicd> :-)
[21:03] <sel> gregaf1: that seems to work fine. http://pastebin.com/RNYDa0H1
[21:05] <gregaf1> sel: okay, just wanted to make sure you could actually write to the OSDs
[21:05] <mikedawson> gregaf1: can you help me get a client admin socket working? Trying "admin socket = /root/ceph/run/client.$pid.asok" under [client] in ceph.conf but I just get 0 byte files
[21:06] <sel> Tried with ext4 on the rbd device, and that seems to work better.
[21:07] <gregaf1> sel: try adding fsync to the dd as well and make sure that the small one works that way
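gregaf1's suggestion, spelled out (the mount-point path follows sel's example and is a placeholder):

```shell
# conv=fsync forces a final flush to the RBD-backed filesystem, so the
# timing reflects storage throughput rather than how fast pages land in
# the page cache.
dd if=/dev/zero of=/mnt/testfile bs=4k count=10k conv=fsync
```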
[21:07] <gregaf1> hrmm, ext4 vs btrfs shouldn't make that much of a difference
[21:07] <gregaf1> you're sure you have the same keys and things set up?
[21:07] <gregaf1> joshd: this is the point at which I call for you because rbd or something is being weird ;)
[21:08] <gregaf1> mikedawson: it should be a 0-byte file, I believe
[21:08] <gregaf1> it's a socket which you can connect to using "ceph --admin-daemon" (iirc)
[21:08] <gregaf1> try "ceph --admin-daemon /root/ceph-run/client.$pid.asok help" and see what that spits back
[21:09] <mikedawson> gregaf1: I guess the mon and osds are 0 byte as well. I can connect to them, but not the client
[21:10] <mikedawson> gregaf1: and I can't get the client to log... I just get more 0 byte files
[21:10] <gregaf1> what happens when you try?
[21:10] <gregaf1> "I can't get the client to log" meaning what?
[21:10] <gregaf1> you're turning on a client and it doesn't do anything?
[21:10] <mikedawson> [client] log file = /root/ceph/log/client.$pid.log
[21:11] <mikedawson> and I get a bunch of 0 byte files in /root/ceph/log/client.*.log
[21:13] <mikedawson> gregaf1: connect to /root/ceph/run/client.12583.asok failed with (111) Connection refused
[21:13] <wubo> paravoid: "him"
[21:13] <paravoid> heh :)
[21:14] <wubo> mikedawson: (tuneables) no, i'll check it out
[21:14] <mikedawson> gregaf1: none of the values substituted for $pid are processes running on my system
[21:15] <gregaf1> mikedawson: are any of them off-by-one?
[21:15] <mikedawson> wubo: worked for me. I tried it against non-valuable data in a test setting
[21:15] <gregaf1> that does sound odd though
[21:15] <wubo> mikedawson: I also went from 2 to 3 replication but I thought that had completed successfully before the latest fun started.
[21:18] <mikedawson> gregaf1: I have hundreds of the client log files after trying to get this working
[21:19] <mikedawson> actually, a few have data, but 99% are 0 bytes long
[21:19] <mikedawson> root@node1:~/ceph/log# ls client.* | wc -l
[21:19] <mikedawson> 551
[21:20] <gregaf1> mikedawson: okay, so you've probably got a log for each time you run "ceph", which is why all the logs
[21:22] <gregaf1> do you have working clients? or were you just trying to use the admin socket because you saw it and that's not working?
[21:22] <mikedawson> gregaf1: How should I try to log RBD client stuff? Maybe using $pid isn't the right idea
[21:23] <mikedawson> I'm trying to tune RBD writeback caching
[21:23] <gregaf1> do you have separate RBD and admin client keys?
[21:23] <mikedawson> gregaf1: I was hoping to get an admin socket to confirm settings are being applied correctly
[21:24] <mikedawson> yes. I have OpenStack Folsom, Cinder, libvirt/qemu/kvm, and Ceph playing nicely together. Copy on Write working as well
[21:25] <mikedawson> now I'm just trying to get a handle on RBD writeback caching
[21:25] <gregaf1> so use the $name identifier to segregate them into different folders or something :)
[21:25] <gregaf1> s/them/the output/
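gregaf1's suggestion translates into a ceph.conf fragment along these lines. This is a hypothetical sketch: the paths and the "volumes" client id are assumptions taken from this conversation; `$name` expands to e.g. `client.volumes` and `$pid` to the process id, so each client instance gets its own files.

```ini
; hypothetical [client] section sketch, paths assumed from the log
[client]
    log file = /var/log/ceph/$name.log
    admin socket = /var/run/ceph/$name.$pid.asok
```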
[21:26] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Remote host closed the connection)
[21:27] <mikedawson> gregaf1: if I have two VMs with two RBD volumes each, how many admin sockets should I expect to see?
[21:28] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[21:28] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:32] <gregaf1> mikedawson: if they're running on a single QEMU you'll see one admin socket
[21:32] <gregaf1> joshd: right?
[21:32] <gregaf1> oh wait, he's not here today, dur
[21:32] <gregaf1> damn
[21:32] <gregaf1> umm, dmick, right?
[21:34] <dmick> uh
[21:35] <dmick> I don't think there's anything special about the client, no, but, I think it's one VM to one qemu proc
[21:35] <gregaf1> okay, wasn't quite sure of the model on that side
[21:35] <mikedawson> dmick: that's what I was thinking as well
[21:35] <sel> gregaf1: seems that the trouble is btrfs: dd conv=fsync if=/dev/zero of=/mnt/testfile bs=4k count=750 works, but dd conv=fsync if=/dev/zero of=/mnt/testfile bs=4k count=1k doesn't. With an ext4 fs I don't see this kind of problem. Are there any known problems with kernel 3.7.1?
[21:36] <gregaf1> so each QEMU process will have an RBD client inside of it, but the number of disks they get doesn't matter
[21:36] <dmick> but if the default socket name doesn't have PID in it there might be issues
[21:36] <gregaf1> sel: not that I know of — how big is your virtual disk?
[21:36] <gregaf1> btrfs might just be running out of free space if it's small
[21:37] <gregaf1> …although that would be much smaller than I was thinking it was
[21:38] <sel> I'm testing with a 1GB rbd so that shouldn't be the problem...
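The sizes in play here are easy to misjudge, so a quick back-of-the-envelope check (assuming dd's `4k` means 4096 bytes and `100k` means 102400 blocks):

```shell
#!/bin/sh
# Back-of-the-envelope: bytes written by each dd invocation in this thread.
bs=4096
echo "count=750:  $((bs * 750)) bytes"      # ~2.9 MiB, worked on btrfs
echo "count=1k:   $((bs * 1024)) bytes"     # 4 MiB, failed on btrfs
echo "count=100k: $((bs * 102400)) bytes"   # 400 MiB, the original failure
# All of these fit comfortably inside a 1 GB rbd, which is why btrfs's
# fixed metadata reservation on small filesystems becomes the suspect.
```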
[21:39] <mikedawson> so I changed to [client] log file = /root/ceph/log/$name.log but now I just get a 0-byte /root/ceph/log/client.admin.log ... Set the file to 777 and still nothing logged...
[21:39] <gregaf1> mikedawson: running daemons aren't going to change their file while running, did you restart them?
[21:40] <gregaf1> sel: actually that might be it; I don't remember the numbers but btrfs doesn't play nice with small disks
[21:40] <gregaf1> something about reserving a constant amount of space for metadata ops and things
[21:40] <sel> Ok, I'll try 50GB
[21:40] <gregaf1> anyway, looks to be btrfs and not rbd :)
[21:42] <mikedawson> gregaf1: restarted ceph, libvirt-bin, nova-compute. Restarted instances. Still no logging into /root/ceph/log/client.admin.log
[21:42] <sel> a bit scary since I'm using btrfs for the osd
[21:42] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[21:42] <gregaf1> mikedawson: I wouldn't expect any logs there if you're using different keys for RBD; they'd be in /root/ceph/log/client.rbd.log (or whatever)
[21:42] <gregaf1> assuming they have perms to write to /root, of course
[21:43] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[21:44] <mikedawson> I have /root/ceph/logs as 777, but the files that get created are root:root 644
[21:47] <gregaf1> I believe those are the permissions that would be assigned if the creator is being run as the root user
[21:48] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[21:48] <dmick> mikedawson: fwiw "id=" can be set in the qemu string; otherwise your name is client.admin, I believe
[21:48] <mikedawson> gregaf1: the whole issue I'm trying to get resolved is around RBD writeback caching. When benchmarking volumes that appear to have writeback disabled vs volumes that appear to have it enabled, I get the same write performance. So I'm wondering if it is working at all
[21:49] <gregaf1> mikedawson: can you pastebin your ceph.conf somewhere? and then reboot the host machine and start everything up again and see what kinds of logs are available?
[21:49] <tziOm> Is it possible to have crushmaptool read from stdin ?
[21:49] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[21:49] <mikedawson> dmick: Here is a snippet from the long kvm process -drive file=rbd:volumes/volume-1fefc724-418f-4bb6-a92b-f7a9f2fad9dc:id=volumes:key=AQDvZ+RQ6NENBhAAjokqIsN5jEr1TVaPCJL1FA==:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
[21:50] <mikedawson> dmick: id=volumes
[21:51] <mikedawson> gregaf1: http://pastebin.com/3Kqri1T6
[21:51] <dmick> yup. so I would expect every process to use client.volumes for its name
[21:53] <mikedawson> dmick: That's what I was thinking too, but I don't see any log files named client.volumes.log anywhere
[21:53] <dmick> tziOm: assuming you mean crushtool, I don't think so.
[21:53] <gregaf1> tziOm: I don't think that works, sorry
[21:54] <dmick> mikedawson: I would expect /root/ceph/log/client.volumes.log; however, know that apparmor probably prevents opening files there unless you've allowed it explicitly (and is probably warning about it in /var/log somewhere)
[21:55] <sel> gregaf1: the size was the problem it seems, tried with a 4gb file now, and it worked as it should. Thanks for the help.
[21:55] <mikedawson> dmick: is there someplace very permissive to send logs to avoid apparmour pitfalls? /tmp?
[21:55] <gregaf1> np!
[21:56] <dmick> see /etc/apparmor.d/abstractions/libvirt-qemu
[21:57] <sel> btw is there a timeframe for a production ready cephfs?
[21:57] <dmick> doesn't look like much is writable by default
[21:57] * agh (~agh@www.nowhere-else.org) Quit (Remote host closed the connection)
[21:58] * agh (~agh@www.nowhere-else.org) has joined #ceph
[21:59] <mikedawson> dmick: Don't know much about apparmor. Can I add a line like "/var/log/ceph rw," and " /var/run/ceph rw," or something like that?
[22:03] <gregaf1> sel: nope; we're targeting a stable single-MDS system for cuttlefish, but no promises
[22:05] <sel> Ok, so for a production system, an HA nfs cluster with the data on an rbd device is the right path?
[22:05] <dmick> mikedawson: me either, but templating from the lines that are in there, I'd say so
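The kind of additions being discussed would look roughly like this in /etc/apparmor.d/abstractions/libvirt-qemu. This is a sketch templated from the existing lines in that file, as dmick suggests; the exact rule syntax is documented in apparmor.d(5), and the paths are the ones from this conversation.

```
  # hypothetical additions for Ceph client logs and admin sockets
  /var/log/ceph/ rw,
  /var/log/ceph/** rw,
  /var/run/ceph/ rw,
  /var/run/ceph/** rw,
```

After editing, the profile has to be reloaded (e.g. with `apparmor_parser -r` on the relevant profile) before libvirt/qemu picks up the change.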
[22:05] <tziOm> dmick, huh.. so you accept patches?
[22:06] <dmick> tziOm: sure
[22:06] <tziOm> ok, lets take a look at it.
[22:06] <gregaf1> sel: I dunno enough about the other options to say if it's the right path, but it's certainly the most supported path involving Ceph, yes
[22:06] <dmick> github fork-to-pull-requests too
[22:08] <dmick> mikedawson: see apparmor.d(5) if there are problems
[22:09] <wido> hi
[22:10] <wido> I've got some PGs in "remapped" state
[22:10] <wido> 1736 pgs: 1590 active+clean, 146 active+remapped
[22:10] <wido> 2 hosts, 4 osds per host. I restarted one OSD and now those PGs stay remapped
[22:11] <wido> replication is 3x, it seems that those PGs are stuck
[22:12] <wido> If I check those PGs I see two OSDs as "up" for that PG and 3 as "acting"
[22:14] <sjust> dspano: can you get a heap dump from a longer time period?
[22:14] <sjust> paravoid: can you post ceph pg dump?
[22:18] <gregaf1> wido: sounds like the standard issue where CRUSH isn't mapping enough OSDs to the PG, so it keeps all the copies that exist as a special override
[22:18] <sjust> paravoid: ^ is likely what is causing your issue as well
[22:18] <wido> gregaf1: But this can be triggered after a OSD restart
[22:18] <wido> since all PGs were clean before the restart
[22:18] <gregaf1> wido: although hmm, what are your rules?
[22:19] <wido> unknownrack, 2 hosts, 4 osds per host
[22:19] <wido> didn't touch it, let mkcephfs do the hard work
[22:19] <wido> 0.56 btw
[22:19] <gregaf1> okay, so that should just be grabbing three OSDs without regard to the hosts
[22:19] <gregaf1> hrm
[22:19] <sjust> if you are doing replication by host, 3x replication won't necessarily work with 2 osds
[22:19] <sjust> *2 hosts
[22:20] <wido> sjust: I know for data safety, this was just another test
[22:20] <gregaf1> mkcephfs won't have set up rules to segregate hosts with only two hosts though
[22:20] <wido> Adding a 3rd host later this week
[22:20] <sjust> gregaf1: ok
[22:20] <gregaf1> unless that changed at some point
[22:20] <wido> gregaf1: I'll check the map just to be sure
[22:20] <gregaf1> calebamiles/calebamiles1 might remember the shape of this better than I do
[22:21] <gregaf1> but the part where it said 2 Up, 3 Acting makes me think it's some sort of CRUSH collision
[22:21] <wido> gregaf1: step chooseleaf firstn 0 type host
[22:21] <wido> default has unknownrack with two hosts
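For reference, the line wido quotes sits inside a crushmap rule block shaped roughly like this (a reconstructed sketch in decompiled-crushmap syntax, not wido's actual map). With `chooseleaf ... type host` and only two hosts, CRUSH can place at most two replicas, which matches the 2-up/3-acting PGs described above.

```
rule data {
	ruleset 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}
```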
[22:22] <sjust> yeah, that is choosing by host
[22:22] <wido> it can't select 3 hosts there, right?
[22:22] <sjust> I assume that all pgs in that pool are remapped or degraded?
[22:22] <gregaf1> yeah, that's bizarre — how many hosts are in your ceph.conf?
[22:22] <dspano> sjust: I stopped the profiler and restarted the OSD at around 2:46 EST. Do you want me to send the last one, or restart the profiler and let things run for a certain amount of time?
[22:22] <sjust> I would just restart the profiler and let it run for a while
[22:23] <sjust> how long does it usually take to get to the point where you have to restart it?
[22:23] <wido> gregaf1: two hosts
[22:23] <wido> sjust: No, that's the weird thing. Just 146 out of the 1736
[22:23] <wido> AFTER I restarted osd.0
[22:23] <wido> before everything was clean
[22:24] <sjust> wido: can you package up ceph pg dump, ceph osd dump, and the osdmap?
[22:24] <sjust> and put it on cephdrop@ceph.com
[22:24] <dspano> sjust: It takes a long time. I noticed with the profiler running, it crept up there faster.
[22:24] <wido> sjust: sure, give me a sec
[22:24] <sjust> dspano: well, the profiler uses memory, that's probably a red herring
[22:25] <sjust> dspano: it would probably suffice to let it run until the memory use becomes a problem
[22:25] <sjust> ...or tomorrow morning, whichever comes first
[22:26] <dspano> sjust: My only concern is that it may crash in the middle of the night.
[22:26] <wido> sjust: Put it on, is that scp or do you mean per mail?
[22:26] <sjust> scp is easiest
[22:26] <wido> scp asks for a password
[22:26] <sjust> sorry, sftp
[22:27] <sjust> dspano: a few hours should be enough then
[22:27] <dspano> sjust: Around the time of the last dump the memory usage was up to 63%. Do you want that one for now?
[22:28] <sjust> dspano: yeah
[22:28] <sjust> you can sftp it
[22:28] <dspano> Alright.
[22:28] <sjust> that one will probably be enough
[22:28] <glowell1> elder: saw your last update to the bug. After I get the 56.1 build kicked off, I'll update those packages.
[22:29] <elder> I don't know if it'll fix it or not.
[22:29] <elder> Now I'm back to my old kernel config and am getting errors in "perf"
[22:29] <elder> I'm afraid I might have updated my own environment and now it's incompatible with the old or something.
[22:29] <elder> Be careful about updating...
[22:29] <elder> (I.e., make sure you can go back I guess)
[22:30] <dspano> sjust: The filename is osd.0.profile.0019.heap
[22:31] <wido> sjust: I just uploaded remapped-pgs.tar.gz
[22:32] <sjust> dspano: thanks, how long did it take to get to that point?
[22:32] <elder> glowell1, looks like I installed asciidoc, and maybe bison and libelf-dev
[22:33] <mikedawson> dmick: apparmor was the culprit
[22:33] <glowell1> ok, so just xmlto potentially missing
[22:34] <wido> sjust: gregaf1: I "fixed" it by bringing the replication level down to 2, mark osd.0 down and out
[22:34] <wido> and restart osd.0, now all PGs are clean again
[22:34] <mikedawson> dmick: Thank you for your help! After adding "/var/log/ceph rw," to /etc/apparmor.d/abstractions/libvirt-qemu, I now get the log files I've been looking for
[22:34] <gregaf1> wido: don't think you even needed to restart the daemon :)
[22:35] <sjust> wido: there does appear to be a bug and it's that the other pgs in pool 2 are not marked degraded
[22:35] <gregaf1> mikedawson: let us know if you can't get the admin socket working properly with that knowledge :)
[22:35] <sjust> the ones marked remapped are correct
[22:35] <wido> gregaf1: I know, could have marked it in again, restart did the same
[22:35] <dspano> sjust: I started the profiler at 9:54am, so it had been running for almost 5 hours at that point.
[22:35] <wido> sjust: Ah, ok. Since the crushmap would never allow replication 3x with just two hosts
[22:36] <mikedawson> gregaf1: not yet ... root@node1:~# tail -100 /var/log/ceph/client.volumes.log
[22:36] <mikedawson> 2013-01-07 16:29:28.544115 7f8a4a266780 -1 asok(0x1541a70) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/client.client.volumes.asok': (13) Permission denied
[22:36] <wido> Very reasonable
[22:36] <sjust> dspano: that's weird, it shows only 160MB of allocations
[22:36] <sjust> wido: yeah, but it should have been showing degraded for the rest, very odd
[22:36] <sjust> v0.56?
[22:36] <gregaf1> mikedawson: I believe that's another permissions access thing
[22:36] <gregaf1> does the daemon have write permission to that directory?
[22:36] <mikedawson> gregaf1: yep
[22:36] <dspano> sjust: I noticed that too.
[22:38] <wido> sjust: 0.56-1precise debian package. I use the debian-testing repo for this one
[22:39] <mikedawson> gregaf1: do you know which daemon it would be?
[22:39] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Ping timeout: 480 seconds)
[22:39] <mikedawson> gregaf1: I changed the dir to 777, and bounced everything I can think of
[22:39] <gregaf1> mikedawson: well, assuming that only the QEMU daemon is using client.volumes it'll be that one which needs the perms
[22:40] <mikedawson> and have allowed it in apparmor
[22:40] <wido> I'm going off. Tnx gregaf1 and sjust
[22:41] <noob2> if you start out your ceph with a split public/ replication network can you go back to a merged one again if you wanted to?
[22:41] <gregaf1> mikedawson: I don't know what to tell you; directory or apparmor access permissions are pretty much the only thing that would cause that error message
[22:41] <gregaf1> maybe try a different directory and see if you can get it working elsewhere
[22:41] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[22:42] <gregaf1> noob2: yes, you can switch the OSD IP addresses whenever you like with a restart
[22:42] <mikedawson> gregaf1: yep. I'll get it. Thanks for your guidance!
[22:42] <noob2> gregaf1: sweet :)
[22:43] <gregaf1> although obviously your life will be easier in terms of uptime and such if both networks can communicate with each other for the switchover
[22:43] <noob2> yeah networking was smart and made the replication network routed
[22:44] <dspano> sjust: Would it be prudent to send you the stats I get from top at the time of the heap dump?
[22:44] <sjust> couldn't hurt
[22:45] <elder> glowell1, nevermind, I had a leftover thing in my build script from trying to diagnose the perf problem.
[22:46] <glowell1> Ok.
[22:50] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Ping timeout: 480 seconds)
[22:50] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[22:50] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit ()
[22:50] <dspano> sjust: I just did a heap dump of my second OSD. Top says this:
[22:50] <dspano> 2414 root 20 0 1035m 632m 5772 S 86 3.9 182:32.44 ceph-osd
[22:51] <dspano> But the dump says it's only using 3.6 MB of memory.
[22:54] <dmick> mikedawson: glad you got the logs going
[22:55] <dmick> mikedawson: you can prove whether it's apparmor or not by looking for REJECT in the logs
[22:55] * dmick (~dmick@2607:f298:a:607:8063:62c7:78d5:6751) has left #ceph
[22:55] * dmick (~dmick@2607:f298:a:607:8063:62c7:78d5:6751) has joined #ceph
[22:55] * ChanServ sets mode +o dmick
[22:55] <mikedawson> dmick: I changed /var/run/ceph to 777 to get past the permissions issue. Now when I service cinder-volumes restart, the file /var/run/ceph/client.volumes.asok is removed
[22:56] <dmick> mikedawson: you mean you did both apparmor changes *and* the perms change? I think apparmor will stop it creating a socket as well
[22:57] * sel (~sel@90-228-233-162-no223.tbcn.telia.com) Quit (Quit: Leaving)
[22:57] <mikedawson> dmick: ahh - I added /var/run/ceph rw to apparmor, but that's probably not sufficient
[22:57] <mikedawson> dmick: I probably need something to do with sockets
[22:57] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[23:00] <dmick> yeah, create at least
[23:01] <dmick> but again, checking the logs for REJECT would be good
[23:01] <dmick> see ERRORS in apparmor(7)
[23:02] <mikedawson> dmick: I don't see REJECT anywhere under /var/log
[23:02] <dmick> seems unlikely that's it then
[23:05] <sjust> well, keep in mind that the dump only shows the heap delta from when you started the profiler until the dump
[23:05] <sjust> dspano: ^
[23:10] <gregaf1> sjust: dspano: there should be stats dumped along with it (to the OSD log and maybe the central logger) that account for lifetime runs though
[23:10] <gregaf1> or you can get them again by saying "heap dump" instead of "heap start_profiler"
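The profiler cycle being discussed is driven through `ceph osd tell`; here it is spelled out as command strings rather than run directly, since these need a live cluster. This is a hedged sketch: the `ceph osd tell N heap ...` spelling is the form used in this era of Ceph and may differ on other versions.

```shell
#!/bin/sh
# Hedged sketch: the tcmalloc heap-profiling cycle for one OSD, expressed
# as the commands to run (verify the exact spelling on your Ceph version).
osd=0
start="ceph osd tell $osd heap start_profiler"  # begin recording allocations
dump="ceph osd tell $osd heap dump"             # write a .heap file; stats also land in the OSD log
stats="ceph osd tell $osd heap stats"           # lifetime totals, not just the delta since start
stop="ceph osd tell $osd heap stop_profiler"
printf '%s\n' "$start" "$dump" "$stats" "$stop"
```

As sjust notes above, a dump taken while the profiler is running only shows the heap delta since `start_profiler`; `heap stats` (or the stats emitted alongside a dump) is what reflects total usage.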
[23:10] <gregaf1> if you want to see how much it thinks is allocated and in use in total
[23:13] <dspano> gregaf1: sjust: Alright. I'll start a fresh run early tomorrow morning when I can watch it, and send you everything once memory usage becomes an issue.
[23:15] * benpol (~benp@garage.reed.edu) has joined #ceph
[23:19] * noob2 (~noob2@ext.cscinfo.com) Quit (Quit: Leaving.)
[23:19] <benpol> So if all my OSDs are up and running, but I have 51 stuck in stale+active+clean, do I have to give up on the data in those stale PGs?
[23:21] <benpol> (This is a small test cluster, so it wouldn't be the end of the world to lose some data.)
[23:25] * ScOut3R (~ScOut3R@dsl5401A397.pool.t-online.hu) Quit (Remote host closed the connection)
[23:26] <paravoid> gregaf1, sjust, wubo: I have 2 replicas and 48 osds
[23:26] * gregaf1 (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit (Quit: Leaving.)
[23:26] <sjust> paravoid: can you sftp the output of 'ceph pg dump' 'ceph osd dump' and the osdmap to cephdrop@ceph.com?
[23:27] <paravoid> sure
[23:27] <tziOm> how come data is not cleared when deleted from rados?
[23:27] <paravoid> sjust: sftp?
[23:28] <sjust> paravoid: yes
[23:28] <paravoid> no password?
[23:28] <dmick> tziOm: you mean like wiping the disk blocks?
[23:33] * nwat (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[23:36] <paravoid> sjust: done
[23:36] * gregaf1 (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[23:37] <sjust> paravoid: what's the file called?
[23:37] <paravoid> wmf-*
[23:37] <paravoid> wmf-pg-dump, wmf-osd-dump, wmf-osdmap
[23:38] <sjust> paravoid: what is the output of ceph-osd -v
[23:38] <sjust> ?
[23:38] <paravoid> ceph version 0.56 (1a32f0a0b42f169a7b55ed48ec3208f6d4edc1e8)
[23:38] <paravoid> ceph.com precise packages.
[23:41] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Read error: Connection reset by peer)
[23:42] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[23:44] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[23:45] <mikedawson> dmick: it's like I can create a new client admin socket with a service cinder-volume restart, but it goes away right away
[23:46] <sjust> paravoid: did you change the replication level of pool 3?
[23:46] <paravoid> no but I doubled the OSDs last week
[23:46] <sjust> how?
[23:46] <paravoid> it was 29% degraded or so initially
[23:46] <paravoid> I just added more OSDs
[23:46] <sjust> to both racks?
[23:46] <sjust> did you change the crush rule?
[23:46] <paravoid> yes I did
[23:47] <paravoid> to chooseleaf rack, instead of osd
[23:48] * mattbenjamin (~matt@wsip-24-234-55-160.lv.lv.cox.net) has joined #ceph
[23:48] <sjust> as an experiment, try marking osd0 out and then back in
[23:48] <mikedawson> dmick: actually, I can confirm that using iwatch. I see IN_CREATE then IN_DELETE on the socket back to back when I restart the cinder-volume service
[23:48] <paravoid> oh I said that before
[23:49] <paravoid> there were 95 active+remapped initially
[23:49] <paravoid> I did a pg dump and restarted one of the osds in question
[23:49] <sjust> crud
[23:49] <paravoid> and seemed to fix it for those and get it down to 61
[23:49] <sjust> which osd?
[23:49] <paravoid> but I left the rest for further debugging
[23:50] <paravoid> 10 I think
[23:50] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[23:50] <paravoid> I'm also hesitant to restart more, the cluster doesn't exactly behave very nice when I do that
[23:50] <sjust> yeah
[23:51] <paravoid> partly because of 3714
[23:51] <paravoid> oh yeah, I have a few (3-4) osds running off wip-3714
[23:51] <paravoid> not sure if it matters
[23:51] <sjust> it shouldn't
[23:52] <dmick> mikedawson: odd. sounds like the process is starting and exiting
[23:52] <mikedawson> dmick: it is odd. cinder-volume is running properly though
[23:53] <mikedawson> dmick: I wonder if apparmor is killing the socket as soon as it sees it
[23:53] * agh (~agh@www.nowhere-else.org) Quit (Remote host closed the connection)
[23:53] <dmick> doubt it; it just wouldn't allow it to be created
[23:53] * agh (~agh@www.nowhere-else.org) has joined #ceph
[23:53] <dmick> does iwatch give you the pid?
[23:54] * zaitcev (~zaitcev@lembas.zaitcev.us) Quit (Quit: Bye)
[23:54] <mikedawson> if I have $pid in ceph.conf I see it in the filename created and immediately deleted
[23:55] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Quit: slang)
[23:55] <dmick> right, I'm just wondering which proc it is
[23:56] <mikedawson> 18227 last time, but now process #18227 stays around
[23:57] <dmick> and is that a kvm process?
[23:57] <mikedawson> dmick: cinder-volume is 18222 or 18215
[23:58] <dmick> but what is 18227?
[23:59] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.