#ceph IRC Log


IRC Log for 2013-02-13

Timestamps are in GMT/BST.

[0:00] <rturk> customer?
[0:01] * vata (~vata@2607:fad8:4:6:345e:43e0:3ced:a3d2) Quit (Quit: Leaving.)
[0:01] <madkiss> ya
[0:03] <Kolobok> I have performed the following http://pastebin.com/iJZZ4E81
[0:04] * jtang1 (~jtang@ Quit (Quit: Leaving.)
[0:05] <Kolobok> Single machine with multiple disks: HEALTH_ERR 576 pgs stuck inactive; 576 pgs stuck unclean; no osds
[0:08] * aliguori (~anthony@ Quit (Remote host closed the connection)
[0:09] * The_Bishop (~bishop@2001:470:50b6:0:e8cb:1adf:280:a93d) has joined #ceph
[0:10] * sleinen (~Adium@user-23-12.vpn.switch.ch) Quit (Quit: Leaving.)
[0:10] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[0:12] <ShaunR> can ceph even be run on a single machine?
[0:13] <gregaf> it's silly, but sure
[0:13] <gregaf> we do it a lot for development
[0:14] <Kolobok> Well, if it is, what might be the problem?
[0:14] <Kolobok> logs look ok
[0:14] <gregaf> Kolobok: if you don't have any OSDs, that's the problem
[0:15] <gregaf> dunno why they didn't get created by mkcephfs; maybe your ceph.conf is wrong?
[0:16] <Kolobok> sec http://pastebin.com/50aLE8yg
[0:16] <Kolobok> I made the conf as simple as possible
[0:17] <Kolobok> when i remove devs = /dev/sdb
[0:17] <gregaf> oh, I believe that you only use the "devs" option if you aren't handling the mounting yourself :)
[0:17] <gregaf> so you should just put in the directory path to use
[0:18] <gregaf> (which you are already doing generically, hurray)
[0:19] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[0:19] <Kolobok> oh :) so just leave it as [osd.0] host = ceph [osd.1] host = ceph
[0:20] <gregaf> yeah
[0:20] <Kolobok> gregaf thanks man
[0:20] <gregaf> np
[0:27] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[0:28] <Kolobok> YES :) ceph -k /etc/ceph/keyring.admin -c /etc/ceph/ceph.conf health HEALTH_OK
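gregaf's point above is that `devs` is only needed when mkcephfs should create and mount the OSD filesystem itself; if the data directories are already mounted, each `[osd.N]` section only needs its `host`. The sketch below rebuilds the per-OSD part of that conf with Python's `configparser`. The helper name `minimal_osd_conf` is hypothetical, the host name `ceph` comes from Kolobok's setup, and the `[global]`/`[mon]` sections a real ceph.conf also needs are deliberately left out.

```python
import configparser
import io


def minimal_osd_conf(host, osd_ids, devs=None):
    """Return the [osd.N] sections of a ceph.conf as text (sketch).

    devs is an optional {osd_id: block_device} mapping. Only pass it when
    mkcephfs should create and mount the OSD filesystem itself; leave it out
    when the OSD data directories are already mounted by hand.
    """
    conf = configparser.ConfigParser()
    for osd_id in osd_ids:
        section = f"osd.{osd_id}"
        conf[section] = {"host": host}
        if devs and osd_id in devs:
            conf[section]["devs"] = devs[osd_id]
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


# Kolobok's working layout: two OSDs on one host named "ceph", no devs lines.
print(minimal_osd_conf("ceph", [0, 1]))
```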
[0:39] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[0:50] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[0:50] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[0:50] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[1:04] * nwat (~Adium@soenat3.cse.ucsc.edu) has left #ceph
[1:14] <Kdecherf> Hm interesting behavior, gregaf: I observe the same storm (hangs half of clients) when I switch the active mds
[1:25] * jlogan1 (~Thunderbi@2600:c00:3010:1:f10b:fe00:c3e7:1d31) Quit (Ping timeout: 480 seconds)
[1:32] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[1:35] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:40] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:49] * loicd1 (~loic@2a01:e35:2eba:db10:710b:b6a1:1908:4a44) Quit (Quit: Leaving.)
[1:51] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[1:52] * LeaChim (~LeaChim@5e0d73fe.bb.sky.com) Quit (Ping timeout: 480 seconds)
[1:59] <ShaunR> gregaf: any special mount options i should use with xfs?
[1:59] <ShaunR> one site i'm looking at is saying i should do... storage1
[1:59] <ShaunR> storage2
[1:59] <ShaunR> kvm1
[1:59] <ShaunR> bahhh
[1:59] <ShaunR> rw,noexec,nodev,noatime,nodiratime,barrier=0
[2:01] <ShaunR> the ceph.com docs show only adding user_xattr
[2:02] <ShaunR> but i think that's only for ext4, right?
[2:04] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:11] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[2:11] * jjgalvez (~jjgalvez@ Quit (Ping timeout: 480 seconds)
[2:27] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[2:29] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[2:36] <joshd> ShaunR: never use barrier=0 underneath ceph, it gives up data safety
[2:39] <paravoid> unless you have a BBU?
[2:39] * Lennie`away (~leen@lennie-1-pt.tunnel.tserv11.ams1.ipv6.he.net) Quit (Ping timeout: 480 seconds)
[2:39] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[2:41] <joshd> yeah
[2:41] <paravoid> s/underneath ceph// then :)
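joshd's rule, refined by paravoid, can be written as a quick pre-flight check on an OSD's mount options. This is a minimal sketch: the function name is hypothetical, `nobarrier` is treated as equivalent to `barrier=0`, and the ext4 `user_xattr` check follows ShaunR's (unconfirmed in this thread) reading that the docs' `user_xattr` advice applies to ext4.

```python
def check_osd_mount_options(fs_type, options, has_bbu=False):
    """Return warnings for the mount options of an OSD data filesystem."""
    opts = {o.strip() for o in options.split(",") if o.strip()}
    warnings = []
    # Without barriers, a power loss can lose writes the OSD believes are on
    # disk; only acceptable when the write cache is battery-backed (BBU).
    if opts & {"barrier=0", "nobarrier"} and not has_bbu:
        warnings.append("write barriers disabled without a BBU")
    # Assumption: user_xattr is the ext4-specific advice from the docs.
    if fs_type == "ext4" and "user_xattr" not in opts:
        warnings.append("ext4 mounted without user_xattr")
    return warnings


print(check_osd_mount_options("xfs", "rw,noexec,nodev,noatime,nodiratime,barrier=0"))
```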
[2:47] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[2:52] * madkiss1 (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[2:53] * noob2 (~noob2@pool-71-244-111-36.phlapa.fios.verizon.net) has joined #ceph
[2:56] * Lennie`away (~leen@524A9CD5.cm-4-3c.dynamic.ziggo.nl) has joined #ceph
[2:58] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[2:59] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[2:59] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[3:03] * Cube1 (~Cube@ Quit (Ping timeout: 480 seconds)
[3:07] * rturk is now known as rturk-away
[3:16] * danieagle (~Daniel@ Quit (Quit: See ya :-) and Thank You Very Much For Everything!!! ^^)
[3:17] * noob2 (~noob2@pool-71-244-111-36.phlapa.fios.verizon.net) Quit (Quit: Leaving.)
[3:20] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[3:21] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[3:23] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[3:32] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[3:37] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[4:04] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[4:07] * joshd1 (~jdurgin@2602:306:c5db:310:7dd2:ff03:e100:4283) Quit (Ping timeout: 480 seconds)
[4:17] * joshd1 (~jdurgin@2602:306:c5db:310:e8ea:d203:4ee9:9ddb) has joined #ceph
[4:26] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[4:43] * KindTwo (KindOne@h175.177.130.174.dynamic.ip.windstream.net) has joined #ceph
[4:44] * KindOne (KindOne@h149.20.131.174.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[4:44] * KindTwo is now known as KindOne
[4:44] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[4:50] * KindOne (KindOne@h175.177.130.174.dynamic.ip.windstream.net) Quit (Read error: No route to host)
[4:51] * KindOne (KindOne@h175.177.130.174.dynamic.ip.windstream.net) has joined #ceph
[4:58] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[5:00] * chutzpah (~chutz@ Quit (Quit: Leaving)
[5:01] * yehuda_hm (~yehuda@2602:306:330b:a40:5046:9efc:4382:29bf) Quit (Ping timeout: 480 seconds)
[5:10] * yehuda_hm (~yehuda@2602:306:330b:a40:29a8:3ee1:e94f:c893) has joined #ceph
[5:16] * KindOne (KindOne@h175.177.130.174.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[5:16] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[5:17] * KindOne (KindOne@ has joined #ceph
[5:18] * nyeates (~nyeates@pool-173-59-239-231.bltmmd.fios.verizon.net) has joined #ceph
[5:36] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[5:39] * rturk-away is now known as rturk
[5:49] * rturk is now known as rturk-away
[5:54] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[5:55] * capri_on (~capri@ has joined #ceph
[6:05] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[6:10] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Ping timeout: 480 seconds)
[6:11] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[6:13] * nyeates (~nyeates@pool-173-59-239-231.bltmmd.fios.verizon.net) Quit (Quit: nyeates)
[6:13] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[6:14] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Quit: leaving)
[6:30] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[6:33] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[6:33] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[6:37] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[6:49] * xdeller (~xdeller@broadband-77-37-224-84.nationalcablenetworks.ru) Quit (Quit: Leaving)
[6:50] * bulent (~bulent@adsl-75-22-79-118.dsl.irvnca.sbcglobal.net) has joined #ceph
[7:13] * bulent (~bulent@adsl-75-22-79-118.dsl.irvnca.sbcglobal.net) Quit (Quit: Leaving)
[7:18] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[7:24] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[7:24] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has left #ceph
[7:44] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:49] <Karcaw> i've recently upgraded to 0.56.2. Does the 'osd target transaction size' change with the new code, or do i need to run something to get it to change? I am seeing errors about timeouts and OSDs are killing themselves; the internet seems to say this is related to the timeout being too big in old versions
[7:59] * itamar (~itamar@ has joined #ceph
[8:13] * gaveen (~gaveen@ has joined #ceph
[8:23] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[8:40] * itamar (~itamar@ Quit (Quit: Ex-Chat)
[8:40] * itamar (~itamar@ has joined #ceph
[8:45] * jtang1 (~jtang@ has joined #ceph
[8:46] * gerard_dethier (~Thunderbi@ has joined #ceph
[8:54] <absynth_> Karcaw: "hit suicide timer" type of kill?
[8:58] * sleinen (~Adium@2001:620:0:25:a553:a90f:15c4:8413) has joined #ceph
[9:04] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[9:07] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[9:12] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[9:12] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[9:15] * ScOut3R (~ScOut3R@ has joined #ceph
[9:17] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Remote host closed the connection)
[9:19] * eschnou (~eschnou@ has joined #ceph
[9:22] * ghbizness2 (~ghbizness@host-208-68-233-254.biznesshosting.net) Quit ()
[9:31] * BManojlovic (~steki@ has joined #ceph
[9:34] * leseb (~leseb@mx00.stone-it.com) has joined #ceph
[9:38] <Rocky> wido: g-damn hippies ;)
[9:47] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:50] * jtang1 (~jtang@ Quit (Quit: Leaving.)
[9:52] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[9:57] * loicd (~loic@2a01:e35:2eba:db10:710b:b6a1:1908:4a44) has joined #ceph
[10:00] * low (~low@ has joined #ceph
[10:03] * LeaChim (~LeaChim@5e0d73fe.bb.sky.com) has joined #ceph
[10:07] <absynth_> hm, what causes slow requests marked as "sparse read"?
[10:10] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) has joined #ceph
[10:10] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) has left #ceph
[10:12] * loicd (~loic@2a01:e35:2eba:db10:710b:b6a1:1908:4a44) Quit (Quit: Leaving.)
[10:12] * loicd (~loic@magenta.dachary.org) has joined #ceph
[10:15] * dosaboy (~gizmo@faun.canonical.com) has joined #ceph
[10:15] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[10:19] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[10:21] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:24] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[10:29] * sm_ (~sm@ has joined #ceph
[10:36] <sm_> Hi! Is there a detailed description anywhere of how the Ceph rebalancing algorithm works and when it kicks in? We have a setup with 4 OSDs; 3 of them have ~500G of data but one holds 1.1TB. Is there a way to manually trigger rebalancing?
[10:39] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:40] * dosaboy (~gizmo@faun.canonical.com) Quit (Read error: No route to host)
[10:40] <absynth_> probably, you have a pool that is much bigger than the others although it has the same pgnum?
[10:41] * dosaboy (~gizmo@faun.canonical.com) has joined #ceph
[10:43] <sm_> well we use radosgw so all data is in the .rgw.buckets pool
[10:45] <absynth_> as far as i know, there is no automatic or manually triggered rebalancing of actual OSD storage
[10:45] <absynth_> you can manually trigger a small reweight to see what happens, but in my experience it won't change a lot
[10:47] <sm_> hm ok, the problem probably also is that the .rgw.buckets was not created manually before we started using it, so it has only 8 PGs
[10:47] <absynth_> yep
[10:47] <absynth_> that is suboptimal
[10:47] <sm_> is there a way to make radosGW use another pool?
[10:47] <sm_> we can add a pool
[10:47] <sm_> but it won't use it
[10:47] <absynth_> sorry, i don't use radosgw at all
[10:52] <sm_> thanks anyhow :) maybe someone else has an idea
[10:52] <absynth_> not a lot of people awake yet
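With only 8 PGs, each PG carries roughly 1/8 of the pool, so whichever OSDs happen to receive an extra PG copy end up visibly fuller. The sketch below shows that arithmetic for a hypothetical 8-PG, 2-replica mapping over 4 OSDs (not real CRUSH output and not sm_'s actual cluster), plus the rule of thumb from the Ceph placement-group docs of roughly (OSDs * 100) / replicas PGs, rounded up to a power of two. Creating the pool up front with `ceph osd pool create <pool> <pg_num>`, before radosgw first uses it, is where that number is chosen; the function names are hypothetical.

```python
def suggested_pg_count(num_osds, replicas, pgs_per_osd=100):
    """Rule of thumb: ~(OSDs * 100) / replicas, rounded up to a power of two."""
    target = num_osds * pgs_per_osd / replicas
    power = 1
    while power < target:
        power *= 2
    return power


def data_per_osd(pg_to_osds, pool_gb):
    """Approximate GB stored per OSD, assuming every PG holds an equal share."""
    per_pg = pool_gb / len(pg_to_osds)
    usage = {}
    for osds in pg_to_osds.values():
        for osd in osds:  # each replica of the PG lands on one OSD
            usage[osd] = usage.get(osd, 0) + per_pg
    return usage


# Hypothetical placement of 8 PGs x 2 replicas over 4 OSDs (not CRUSH output).
placement = {0: (0, 1), 1: (0, 2), 2: (0, 3), 3: (0, 1),
             4: (0, 2), 5: (1, 3), 6: (2, 3), 7: (0, 3)}
print(data_per_osd(placement, pool_gb=1600))  # osd.0 holds 6 of 16 PG copies
print(suggested_pg_count(4, 2))
```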
[11:08] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[11:21] * ScOut3R (~ScOut3R@5400CAE0.dsl.pool.telekom.hu) has joined #ceph
[11:23] * jtang1 (~jtang@2001:770:10:500:edec:4577:b410:f27b) has joined #ceph
[11:23] * jtang2 (~jtang@2001:770:10:500:c110:6158:959c:8f49) has joined #ceph
[11:31] * jtang1 (~jtang@2001:770:10:500:edec:4577:b410:f27b) Quit (Ping timeout: 480 seconds)
[11:36] * l0nk (~alex@ has joined #ceph
[11:47] <madkiss1> how do I set the default pool that cephfs is supposed to use again
[11:58] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[11:58] <leseb> madkiss1: ceph mds add_data_pool <pool-id>?
[11:59] <madkiss1> awesome. and what's the counterpart of that?
[11:59] <madkiss1> My MDSes still think they need to take care of pool 0, which has, well, gone
[11:59] <madkiss1> hello leseb, btw :)
[11:59] <leseb> madkiss1: hello hello :)
[11:59] <leseb> madkiss1: cool :)
[11:59] <madkiss1> I guess I am looking for "del_data_pool", but that doesn't exist
[12:00] <leseb> :)
[12:00] <madkiss1> i'm serious, i think this keeps my MDSes crashing, so any ideas? where did you find the info on add_data_pool?
[12:00] <madkiss1> "ceph mds" is de facto undocumented
[12:03] <leseb> maybe there is no pool with the id 0? maybe you deleted the default pools?
[12:03] <madkiss1> that is what I just said.
[12:04] <madkiss1> still, "ceph mds dump" shows me pool 0.
[12:05] <leseb> madkiss1: oops yes
[12:08] * dosaboy (~gizmo@faun.canonical.com) Quit (Quit: Leaving.)
[12:08] * bstaz_ (~bstaz@ext-itdev.tech-corps.com) has joined #ceph
[12:08] * bstaz (~bstaz@ext-itdev.tech-corps.com) Quit (Read error: Connection reset by peer)
[12:10] * dosaboy (~gizmo@faun.canonical.com) has joined #ceph
[12:10] * krypto (~crypto@ has joined #ceph
[12:11] * krypto (~crypto@ Quit ()
[12:11] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:11] * krypto (~crypto@ has joined #ceph
[12:21] * CrashHD (~na@ Quit ()
[12:28] * ShaunR (~ShaunR@staff.ndchost.com) Quit ()
[12:29] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[12:36] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[12:39] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit ()
[12:39] <madkiss1> my ceph mds tells me "load_2 found no table"
[12:39] <madkiss1> so what do I do to fix this?
[12:40] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) has joined #ceph
[12:48] * ScOut3R (~ScOut3R@5400CAE0.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[12:50] * ScOut3R (~ScOut3R@5400CAE0.dsl.pool.telekom.hu) has joined #ceph
[12:52] * ScOut3R_ (~scout3r@5400CAE0.dsl.pool.telekom.hu) has joined #ceph
[12:52] * ScOut3R (~ScOut3R@5400CAE0.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[12:55] * ScOut3R_ (~scout3r@5400CAE0.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[12:56] * ScOut3R (~scout3r@5400CAE0.dsl.pool.telekom.hu) has joined #ceph
[12:56] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[13:04] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[13:18] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:21] * scalability-junk (~stp@188-193-201-35-dynip.superkabel.de) has joined #ceph
[13:25] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[13:27] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit ()
[13:28] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[13:44] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[13:47] <phillipp1> yehudasa: i'm trying to setup radosgw with nginx, but i'm getting 400 errors and it seems like the radosgw is not doing anything with the request
[13:48] <phillipp1> WARNING: RGWRados::log_usage(): user name empty (bucket=), skipping
[13:48] <phillipp1> like it can't parse the bucket name
[13:59] * darktim (~andre@pcandre.nine.ch) has joined #ceph
[13:59] * andret (~andre@pcandre.nine.ch) Quit (Read error: Connection reset by peer)
[14:28] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) has joined #ceph
[14:30] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) Quit ()
[14:36] * aliguori (~anthony@cpe-70-112-157-97.austin.res.rr.com) has joined #ceph
[14:48] * dosaboy (~gizmo@faun.canonical.com) Quit (Ping timeout: 480 seconds)
[15:13] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[15:15] * dosaboy (~gizmo@faun.canonical.com) has joined #ceph
[15:18] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[15:26] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[15:31] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[15:34] * aliguori (~anthony@cpe-70-112-157-97.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[15:40] <madkiss1> 0> 2013-02-13 15:18:18.126442 7f704f9d7700 -1 mds/MDCache.cc: In function 'void MDCache::populate_mydir()' thread 7f704f9d7700 time 2013-02-13 15:18:18.124262
[15:43] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[15:44] * aliguori (~anthony@cpe-70-112-157-97.austin.res.rr.com) has joined #ceph
[15:50] <slang1> madkiss1: ceph mds remove_data_pool
[15:50] <slang1> madkiss1: I think that's what you want
[15:51] <madkiss1> aha? why?
[15:51] <madkiss1> the two pools I have in there actually are the ones I want now.
[15:51] <slang1> madkiss1: you wanted to remove the ..
[15:51] <madkiss1> ah
[15:51] <slang1> madkiss1: oh sorry - I must have misread
[15:51] <madkiss1> slang1: ya, but leseb helped with that one already.
[15:52] <madkiss1> I had the MDSes up and running again nicely, and now they fail with the above error message
[15:52] <slang1> madkiss1: ah ok
[15:52] <slang1> madkiss1: its just that one liner, or is there more failure output?
[15:53] <madkiss1> there's a shitload of additional failure output
[15:54] * gerard_dethier (~Thunderbi@ has left #ceph
[15:55] <madkiss1> slang1: if you tell me your email address, i can send it to you
[15:55] * gerard_dethier (~Thunderbi@ has joined #ceph
[15:56] <slang1> madkiss1: slang@inktank.com
[16:00] * loicd1 (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[16:01] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[16:03] * jtangwk (~Adium@2001:770:10:500:d97f:2952:4d27:da7f) has joined #ceph
[16:05] * aliguori (~anthony@cpe-70-112-157-97.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[16:06] <scalability-junk> yeah, spam address found @slang1 :P
[16:06] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) has joined #ceph
[16:08] * vata (~vata@2607:fad8:4:6:345e:43e0:3ced:a3d2) has joined #ceph
[16:09] * aliguori (~anthony@cpe-70-112-157-97.austin.res.rr.com) has joined #ceph
[16:11] * jtangwk1 (~Adium@2001:770:10:500:2869:8c8d:3b8d:16c6) Quit (Ping timeout: 480 seconds)
[16:15] * noob2 (~noob2@ext.cscinfo.com) has joined #ceph
[16:20] <Karcaw> i've recently upgraded to 0.56.2. Does the 'osd target transaction size' change with the new code, or do i need to run something to get it to change? I am seeing errors about timeouts and OSDs are killing themselves with the hit suicide message; the internet seems to say this is related to the timeout being too big in old versions
[16:21] * aliguori (~anthony@cpe-70-112-157-97.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[16:22] * ScOut3R (~scout3r@5400CAE0.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[16:22] * ScOut3R (~ScOut3R@5400CAE0.dsl.pool.telekom.hu) has joined #ceph
[16:24] * phillipp (~phil@p5B3AFF20.dip.t-dialin.net) has joined #ceph
[16:30] * phillipp1 (~phil@p5B3AFA06.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[16:33] * aliguori (~anthony@cpe-70-112-157-4.austin.res.rr.com) has joined #ceph
[16:33] * aliguori (~anthony@cpe-70-112-157-4.austin.res.rr.com) Quit (Remote host closed the connection)
[16:33] * aliguori (~anthony@cpe-70-112-157-4.austin.res.rr.com) has joined #ceph
[16:37] * gaveen (~gaveen@ Quit (Ping timeout: 480 seconds)
[16:39] * stefunel- (~stefunel@static. has joined #ceph
[16:46] * gaveen (~gaveen@ has joined #ceph
[16:49] * KindOne (KindOne@ Quit (Read error: Connection reset by peer)
[16:57] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) has joined #ceph
[16:58] * gerard_dethier (~Thunderbi@ Quit (Quit: gerard_dethier)
[16:58] * low (~low@ Quit (Quit: Leaving)
[16:59] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[17:04] * mynameisbruce (~mynameisb@tjure.netzquadrat.de) has joined #ceph
[17:08] * itamar (~itamar@ Quit (Quit: Ex-Chat)
[17:09] * itamar (~itamar@ has joined #ceph
[17:12] * nmartin (~nmartin@adsl-98-90-198-125.mob.bellsouth.net) has joined #ceph
[17:15] <nmartin> where do you get to define the level of redundancy (ie replication) you want to build into a ceph cluster? I read the 5 minute install, and didn't see much in the way of configuring redundancy
[17:16] * sm_ (~sm@ Quit (Quit: sm_)
[17:16] <itamar> when you define your rados pool you can set the parameter for replication
[17:16] * krypto (~crypto@ Quit (Remote host closed the connection)
[17:16] <nmartin> so, if my ceph osd cluster is 4 servers, each with 4x2 TB drives, in jbod, i'd want to handle losing an entire server with 4 disks
[17:16] <nmartin> I'd define that on pool creation?
[17:16] <slang1> nmartin: http://ceph.com/docs/master/rados/operations/pools/#set-the-number-of-object-replicas
[17:17] <nmartin> perfect, thanks! the docs are a bit overwhelming for a newb
[17:17] <itamar> the default replication is *2
[17:17] <itamar> by default the crush map hierarchy spreads the replicas between the PCs
[17:18] <itamar> if you have two PCs only, two replicas will not reside on one PC
[17:18] * BManojlovic (~steki@ Quit (Quit: I'm off, and you do what you want...)
[17:18] <nmartin> so, it sounds like I'd still want some RAID on each OSD server? or is JBOD preferred?
[17:18] * diegows (~diegows@ has joined #ceph
[17:19] <andreask> nmartin: jbod is fine, crush map by default never places two replicas on the same server
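The replication answers above reduce to some capacity arithmetic. A minimal sketch for nmartin's layout (4 servers, 4 x 2 TB disks each), assuming the default 2 replicas and CRUSH's default of one replica per host: data must fit in raw / size, and to fully re-replicate after losing a whole server it has to fit in what the remaining three servers provide. Replica count is a per-pool setting (see the pools doc slang1 linked, e.g. `ceph osd pool set <pool> size 3`); the function name is hypothetical and overhead such as journals and free-space headroom is ignored.

```python
def capacity_plan(hosts, disks_per_host, disk_tb, size=2):
    """Rough capacity numbers for a replicated pool (ignores overhead)."""
    raw_tb = hosts * disks_per_host * disk_tb
    return {
        "raw_tb": raw_tb,
        # every object is stored `size` times
        "usable_tb": raw_tb / size,
        # to restore full replication after losing one host, all copies must
        # fit on the surviving hosts
        "max_data_for_full_recovery_tb": (hosts - 1) * disks_per_host * disk_tb / size,
        # with one replica per host, losing one host leaves size - 1 copies
        "survives_host_loss": size >= 2 and hosts >= size,
        # the surviving hosts must still be able to hold `size` distinct copies
        "can_rereplicate_after_host_loss": hosts - 1 >= size,
    }


print(capacity_plan(hosts=4, disks_per_host=4, disk_tb=2, size=2))
```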
[17:20] * loicd1 (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[17:21] <nmartin> got it. Would it be suitable to run OSD daemons colocated on a KVM Dom0 host, to be able to utilize the local storage on my hypervisors?
[17:21] <nmartin> or is osd too compute intensive?
[17:21] <andreask> this works yes
[17:21] <nmartin> these are dual hex core boxes with 96 GB ram, and 8x2tb drive bays
[17:22] <andreask> high cpu load can happen during recovery
[17:22] <nmartin> thanks! just getting ready to build an Apache Cloudstack 4 cloud, and am excited to see RBD support!
[17:22] <nmartin> that means my storage scales with my cloud
[17:22] <andreask> yeah, that is great
[17:22] <nmartin> sweet thanks
[17:23] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[17:24] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[17:26] * jlogan (~Thunderbi@2600:c00:3010:1:f10b:fe00:c3e7:1d31) has joined #ceph
[17:27] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit ()
[17:27] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[17:28] * itamar (~itamar@ Quit (Quit: Ex-Chat)
[17:35] * steve (~astalsi@c-69-255-38-71.hsd1.md.comcast.net) has joined #ceph
[17:35] * steve is now known as astalsi
[17:37] <astalsi> Hey all, quick question. I'm using 0.56.1 (haven't gotten to the 0.56.2 upgrade). I'm wondering if I can filter which devices get scanned for btrfs filesystems?
[17:43] * cdblack (86868b46@ircip2.mibbit.com) has joined #ceph
[17:44] * ricksos (c0373727@ircip1.mibbit.com) has joined #ceph
[17:45] <slang1> astalsi: what are you using to deploy?
[17:46] <astalsi> slang1: At the moment, this is happening on every osd spinup, but I'm using xCat plus some scripts I wrote
[17:47] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[17:47] <wer> crap. The repositories updated and now where is 0.55.1-1~bpo70+1?
[17:47] * mengesb (~bmenges@servepath-gw3.servepath.com) has joined #ceph
[17:48] * mengesb (~bmenges@servepath-gw3.servepath.com) has left #ceph
[17:49] <wer> 0.56.1-1~bpo70+1 is uninstallable in wheezy. This is the second time this has happened to me. Do I need to maintain my own repo at this point?
[17:50] <slang1> astalsi: the osds will use the device specified in the config file
[17:51] <slang1> [osd.0]
[17:51] <slang1> host = foo
[17:51] <slang1> devs = /dev/sdd
[17:53] <astalsi> slang1: I have that specified in the [osd] general section (I have homogeneous nodes, so the device is the same). Guessing that's not effective?
[17:53] <slang1> astalsi: it should be
[17:53] <slang1> astalsi: maybe I don't understand what you're trying to avoid
[17:53] <astalsi> slang1: Long scan times on fd0
[17:54] <astalsi> (don't ask why I have floppy drives)
[17:54] <astalsi> Also, it definitely isn't:
[17:54] <astalsi> [osd]
[17:54] <astalsi> osd mkfs type = btrfs
[17:54] <astalsi> devs = /dev/sda3
[17:54] <janos> astalsi: 1 floppy = 1 osd, of course!
[17:54] <janos> ;)
[17:54] <janos> bask in the latency
[17:54] <astalsi> janos: I wish!
[17:55] <astalsi> Mostly I'm still in the middle of testing, so I'm getting annoyed with the wait time to run tests. In production the wait won't be much of an issue at all
[17:55] <slang1> astalsi: do you have output you can post somewhere to show the osd scanning fd0?
[17:56] <astalsi> slang1: 1 sec, will pastebin as soon as this round hits that point
[17:57] * leseb (~leseb@mx00.stone-it.com) Quit (Remote host closed the connection)
[18:00] <astalsi> slang1: http://pastebin.com/6QuJeRWq
[18:00] <astalsi> (from syslog)
[18:01] * sm_ (~sm@xdsl-195-14-196-155.netcologne.de) has joined #ceph
[18:11] <slang1> astalsi: do those Buffer I/O messages only occur in the log while you're starting the osd?
[18:14] <astalsi> slang1: yes
[18:23] * l0nk (~alex@ Quit (Quit: Leaving.)
[18:29] * ShaunR (~ShaunR@staff.ndchost.com) has joined #ceph
[18:30] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[18:33] * diegows (~diegows@ has joined #ceph
[18:35] <janos> i'm trying to unmap some RBD's - but i'm told device or resource busy - how do i tell what's occupying it?
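No answer followed in the channel, but a common reason for "device or resource busy" on `rbd unmap` is that the device is still mounted. A minimal sketch that checks /proc/mounts for the device; it assumes the image was mapped at `/dev/rbd0` (hypothetical), matches the device name exactly as it appears in /proc/mounts, and does not detect other holders such as processes with the device open or LVM/device-mapper on top, for which `fuser` or `lsof` on the device is the usual next step.

```python
def mounts_using(device, mounts_text):
    """Return the mount points whose source device is `device`.

    mounts_text uses the /proc/mounts format: one mount per line, with the
    source device in the first field and the mount point in the second.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


def read_proc_mounts(path="/proc/mounts"):
    with open(path) as f:
        return f.read()


# Usage on the host that mapped the image (hypothetical device name):
#   busy = mounts_using("/dev/rbd0", read_proc_mounts())
#   if busy: umount those mount points before running `rbd unmap /dev/rbd0`
```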
[18:36] * hybrid5121 (~w.moghrab@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Read error: Connection reset by peer)
[18:38] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: He who laughs last, thinks slowest)
[18:40] * eschnou (~eschnou@ Quit (Remote host closed the connection)
[18:42] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:43] <ShaunR> hmm... this could just be me, but my cluster converted to 8 single-disk raid0 arrays looks to be a lot slower than it was in a raid10 setup
[18:44] <nhm_> ShaunR: what controller?
[18:45] <ShaunR> LSI 9266-4i
[18:45] <nhm_> 1 big raid10?
[18:45] <ShaunR> after i run some bench's on this, i'm going to move it to onboard controller
[18:45] <ShaunR> two servers 4 disks each.
[18:45] <ShaunR> so before each server had a 4 disk raid10 array
[18:46] <ShaunR> now each server has 4 raid0 arrays (one per disk)
[18:46] <nhm_> some things to check: WB cache enabled? How many PGs? Is syncfs available?
[18:46] <ShaunR> i'm doing rbd to vms
[18:51] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:52] <ShaunR> this is an out-of-the-box setup, so PGs are whatever ceph comes with by default.
[18:52] <ShaunR> VMs are using writeback cache
[18:53] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[18:53] * gucki (~smuxi@HSI-KBW-095-208-162-072.hsi5.kabel-badenwuerttemberg.de) has joined #ceph
[18:55] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[18:55] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) has joined #ceph
[19:00] * jtang2 (~jtang@2001:770:10:500:c110:6158:959c:8f49) Quit (Quit: Leaving.)
[19:06] * rturk-away is now known as rturk
[19:08] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[19:10] * Cube (~Cube@ has joined #ceph
[19:12] * sleinen (~Adium@2001:620:0:25:a553:a90f:15c4:8413) Quit (Quit: Leaving.)
[19:12] * sleinen (~Adium@ has joined #ceph
[19:16] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) has joined #ceph
[19:20] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[19:27] <sstan> is this normal : time ceph osd pool get rbd pg_num
[19:27] <sstan> 10 seconds
[19:29] <sstan> I have 3 nodes. Two of them do it really fast, but it takes 10 seconds on the last one
[19:30] * chutzpah (~chutz@ has joined #ceph
[19:31] <noob2> disks ok on the last one?
[19:33] <sstan> 3 osds: 3 up, 3 in , health is ok
[19:33] * Cube (~Cube@ Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * ShaunR (~ShaunR@staff.ndchost.com) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * nmartin (~nmartin@adsl-98-90-198-125.mob.bellsouth.net) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * vata (~vata@2607:fad8:4:6:345e:43e0:3ced:a3d2) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * Kolobok (426d18f4@ircip1.mibbit.com) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * via (~via@smtp2.matthewvia.info) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * jochen_ (~jochen@laevar.de) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * yehudasa (~yehudasa@2607:f298:a:607:51be:f300:de79:d2d2) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * sagewk (~sage@2607:f298:a:607:799c:4aca:4834:466d) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * rtek (~sjaak@empfindlichkeit.nl) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * Psi-jack (~psi-jack@psi-jack.user.oftc.net) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * nwl (~levine@atticus.yoyo.org) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * dwm37_ (~dwm@northrend.tastycake.net) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * psieklFH (psiekl@wombat.eu.org) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * jantje (~jan@paranoid.nl) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * NaioN (stefan@andor.naion.nl) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * MrNPP (~mr.npp@ Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * masterpe (~masterpe@2001:990:0:1674::1:82) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * janos (~janos@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * MooingLemur (~troy@phx-pnap.pinchaser.com) Quit (synthon.oftc.net graviton.oftc.net)
[19:33] * ivan` (~ivan`@000130ca.user.oftc.net) Quit (synthon.oftc.net graviton.oftc.net)
[19:34] * Cube (~Cube@ has joined #ceph
[19:34] * ShaunR (~ShaunR@staff.ndchost.com) has joined #ceph
[19:34] * nmartin (~nmartin@adsl-98-90-198-125.mob.bellsouth.net) has joined #ceph
[19:34] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[19:34] * vata (~vata@2607:fad8:4:6:345e:43e0:3ced:a3d2) has joined #ceph
[19:34] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) has joined #ceph
[19:34] * Kolobok (426d18f4@ircip1.mibbit.com) has joined #ceph
[19:34] * via (~via@smtp2.matthewvia.info) has joined #ceph
[19:34] * yehudasa (~yehudasa@2607:f298:a:607:51be:f300:de79:d2d2) has joined #ceph
[19:34] * jochen_ (~jochen@laevar.de) has joined #ceph
[19:34] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:34] * sagewk (~sage@2607:f298:a:607:799c:4aca:4834:466d) has joined #ceph
[19:34] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[19:34] * janos (~janos@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[19:34] * rtek (~sjaak@empfindlichkeit.nl) has joined #ceph
[19:34] * Psi-jack (~psi-jack@psi-jack.user.oftc.net) has joined #ceph
[19:34] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) has joined #ceph
[19:34] * nwl (~levine@atticus.yoyo.org) has joined #ceph
[19:34] * dwm37_ (~dwm@northrend.tastycake.net) has joined #ceph
[19:34] * psieklFH (psiekl@wombat.eu.org) has joined #ceph
[19:34] * jantje (~jan@paranoid.nl) has joined #ceph
[19:34] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) has joined #ceph
[19:34] * NaioN (stefan@andor.naion.nl) has joined #ceph
[19:34] * MrNPP (~mr.npp@ has joined #ceph
[19:34] * masterpe (~masterpe@2001:990:0:1674::1:82) has joined #ceph
[19:34] * MooingLemur (~troy@phx-pnap.pinchaser.com) has joined #ceph
[19:34] * ivan` (~ivan`@000130ca.user.oftc.net) has joined #ceph
[19:36] <gregaf> sstan: that request takes 10 seconds to one of the monitors? it's not an OSD thing
[19:37] <sstan> ah good point .. I'll try to send the request to each monitor to see which one is the slow one
[19:39] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:40] <sstan> the last node can connect to only one monitor for some reason
[19:40] * mjevans (~mje@ has joined #ceph
[19:42] <sstan> so two out of three mons are unreachable
[19:42] <mjevans> What's a good way of testing access to an rbd file as a specific user? I'm trying to setup ...
[19:42] <mjevans> sstan: ouch, that means you've lost quorum
[19:43] <gregaf> sounds more like a filtering problem of some kind
[19:43] <sstan> it's ok , that's only an experimental cluster
[19:44] <sstan> is there a command for checking the mon cluster status?
[19:45] <sstan> mjevans : you pass rbd commands with the user options (involves keys, etc.)
[19:45] <gucki> sstan: ceph -w
[19:45] <mjevans> sstan: thanks, I'll try that and get back here if it isn't a permissions issue.
[19:47] * dosaboy (~gizmo@faun.canonical.com) Quit (Remote host closed the connection)
[19:47] <sstan> I tried ceph mon stat, but there seems to be no problems
[19:48] <gregaf> check your network filtering
[19:49] <sstan> e1: 3 mons at {a=,b=,c=}, election epoch 10, quorum 0,1,2 a,b,c
[19:49] <sstan> I disabled firewalls to make sure .. but doesn't help. Machines can ping each other
[19:50] <wer> does running irqbalance lend better throughput performance in a 24 osd node? I am capping out around 1.6Gbps.... and trying to find the bottleneck.
[19:51] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[19:51] <mjevans> Yay, it's either a permissions issue of some kind or I'm issuing the command incorrectly: rbd -n client.libvirt --keyfile /etc/ceph/libvirt.key ls -l libvirt-pool-test (the pool name is the last argument I provided) returns rbd: list: (1) Operation not permitted
[19:51] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit ()
[19:51] <mjevans> Mostly I'm asking if I did the test correctly so I can focus on making the test pass
[19:56] <sstan> try the same command with the admin user; I never did that, tell us if it works
[19:57] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Quit: Leaving.)
[20:00] <gucki> is a new bobtail release coming soon? i see lots of new commits since 0.56.2 :)
[20:01] <joshd> mjevans: that command is correct, assuming /etc/ceph/libvirt.key contains just the base64 secret key associated with client.libvirt (as shown in ceph auth list)
[20:02] <sm_> Hi! Is there anywhere a detailed description of how the ceph rebalancing algorithm works and when it kicks in? We have a setup with 4 OSDs, 3 of them have ~500G of Data but one holds 1,1TB. Is there a way to manually trigger rebalancing somehow?
[20:03] <sstan> did you modify the crush map ?
[20:04] <sstan> I think you'd have to increase that osd's weight. After applying the map, it should start rebalancing
[20:04] <scuttlemonkey> gucki: new point release should show up soonish modulo QA holds
[20:05] <gucki> scuttlemonkey: ;-)
[20:05] <scuttlemonkey> sm_: sstan is right, on pushing a new crushmap the cluster will start rebalancing on its own...for a detailed description of the tech, lemme see if we have something easily consumable
[20:06] <mjevans> joshd: it does, that part I figured out easily enough from the error it produced before
[20:07] <sm_> the crushmap is plain, did not change that
[20:07] <sm_> also is there a way to make radosgw use another pool? i can add a pool for it but it won't use it
[20:07] <mjevans> About sm_'s question; would downloading and re-uploading the default crushmap cause a rebalance?
[20:08] <joshd> mjevans: what caps does client.libvirt have? (ceph auth get client.libvirt)
[20:08] <sm_> currently we use the default .rgw.buckets which has only 8 PGs
[20:08] <mjevans> caps mon = "allow r"
[20:08] <mjevans> caps osd = "allow class-read object_prefix rbd_children, allow pool libvirt-pool-test rwx"
[20:08] <sm_> which is suboptimal of course
[20:08] <scuttlemonkey> sm_: http://ceph.com/docs/master/man/8/radosgw-admin/
[20:08] <mjevans> joshd: Following the various quickstart guides, though the syntax of allow/permissions is a bit confusing since I see the permissions both before and after pool commands.
[20:08] <joshd> no, uploading the same crushmap (or one that reaches the same placements) would not cause rebalancing
[20:09] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Ping timeout: 480 seconds)
[20:10] <scuttlemonkey> sm_: for detailed look at CRUSH you can read: http://ceph.com/papers/weil-crush-sc06.pdf
[20:10] <gucki> joshd: in fact i think i had a minor rebalance because of rounding errors...
[20:10] <joshd> mjevans: the grammar is in the ceph-authtool manpage: http://ceph.com/docs/master/man/8/ceph-authtool/#osd-capabilities
[20:10] <scuttlemonkey> for a distilled version of "this is how it rebalances on new crushmap" you'll have to get someone more knowledgeable than I
[20:11] <sm_> scuttlemonkey: i tried adding a pg but it is rarely used, the main pg has 1,3 TB and the added one 14G i added it when we had about 100GB stored
[20:12] <sm_> scuttlemonkey: crush map details look quite complicated, i will dig into that, thanks :)
[20:13] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[20:13] <gucki> does anyone know when 3945 will be fixed so i can finally upgrade my argonaut setup? :-)
[20:13] <mjevans> joshd: Thanks, though from reading that it looks like I specified the permissions correctly?
[20:14] <scuttlemonkey> sm_: when you say you added a pg, you mean added a pool? or did you monkey w/ pgs directly?
[20:15] <sm_> scuttlemonkey: yes sorry a pool
[20:15] <sm_> which has a more optimal number of pg's for my 4 OSDs
[20:15] <scuttlemonkey> how did you create the pool?
[20:15] <dmick> gucki: we just discussed it. It's happening today most likely
[20:15] <scuttlemonkey> 'ceph osd pool create..." ?
[20:16] <gucki> dmick: wow, great news :-)
[20:16] <noob2> awesome :D
[20:16] <noob2> the ceph iops are about 50% faster than our netapp
[20:16] <joshd> mjevans: yeah, they look correct. could you pastebin the output of the same command with --debug-ms 1 and --debug-auth 20?
[20:17] <scuttlemonkey> sm_: also did you use the radosgw-admin to add the pool to the gateway?
[20:17] <nhm_> noob2: really? That's excellent
[20:17] <noob2> nhm_: yeah the netapp is about 671 and the ceph is about 960
[20:17] <rturk> cool
[20:18] <sm_> scuttlemonkey: yes it was added via radosgw-admin and shows up in a "radosgw-admin pools list"
[20:18] <scuttlemonkey> noob2: I'd love to get you to write a profiling blog entry about that comparison :)
[20:18] <sm_> scuttlemonkey: i do not remember the exact settings as i created it a while ago
[20:18] <scuttlemonkey> ok, no biggie
[20:18] <noob2> scuttlemonkey: good idea :)
[20:19] <nhm_> scuttlemonkey: btw, good job on the last blog post. Looks like it's getting a lot of views!
[20:19] <scuttlemonkey> nhm_: sweet, always nice to see...love how every time I walk out the door I hear about some new way people have put ceph into play :)
[20:22] <mjevans> joshd: http://pastebin.com/MWAGSUsr ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061) Linux h-0 3.7.7-1-ARCH #1 SMP PREEMPT Mon Feb 11 20:20:58 EET 2013 x86_64 GNU/Linux
[20:22] <scuttlemonkey> sm_: haven't forgotten you, just checking on something real quick
[20:22] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[20:23] <sm_> scuttlemonkey: no problem i am reading the crushmap paper ;)
[20:23] <gucki> btw, is there any good reason why the journal isn't flushed when shutting down an osd?
[20:26] <sm_> I do not quite get how my ceph cluster got so unbalanced in the first place, the weight is not set for any of the osds in the crushmaps and hardware-wise they are identical. From the numbers it looks like it distributed all files to 3 nodes and made the backup for everything on the 4th node ;) So if i change the weight now i will have to play with that all the time to get the cluster balanced
[20:26] <mjevans> gucki: have you checked the usual suspects of bad reasons files don't get saved?
[20:28] <gucki> mjevans: sry, not sure what you mean? i mean my cluster is running fine, but the docs say one should flush the journal before recreating it...so i wonder why they aren't flushed on normal shutdown, are they?
[20:28] <scuttlemonkey> sm_: looks like the only way to do it really is to pool add then pool rm the default via radosgw-admin
[20:29] <scuttlemonkey> any buckets you created on the old pool will remain there
[20:29] * chegk (~devnull@ Quit (Ping timeout: 480 seconds)
[20:29] <dmick> gucki: one thing is that there isn't really any "normal shutdown"
[20:29] <dmick> because OSDs have to deal with all sorts of crashes, we don't really have an orderly shutdown procedure
[20:29] <scuttlemonkey> unfortunately there is no way to move them right now, but the new geo replication stuff coming next will solve that
[20:29] <scuttlemonkey> just no way to do it now
[20:29] <scuttlemonkey> you could bucket rm and then recreate...but no migration path
[20:29] <dmick> you just shoot it in the head, and its replacement has to look around and see what all needs doing
[20:30] <mjevans> Validate existing data; sounds costly but sane.
[20:31] <gucki> dmick: it's good to know that osds are made quite crash safe, but anyway crashes of the software (if the hardware fails it'd replace the whole osd anyway) should be the exception?
[20:31] <sm_> scuttlemonkey: hm .. well all my data is in a single bucket, i would have to move it, which would take a while for the ~2TB of data .. when will the geo replication stuff be stable?
[20:31] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Ping timeout: 480 seconds)
[20:32] <dmick> gucki: it's more of a "no matter what" philosophy
[20:32] <dmick> kernel crashes happen. admin fatfingering happens. OOM killers run wild. etc.
[20:32] <scuttlemonkey> sm_: no real firm date...it's not being looked at until cuttlefish...but it's pretty high on our priorities, right next to the admin API
[20:33] <sm_> scuttlemonkey: ok, that sounds not too far away. Any ideas why the cluster got so offset in the first place?
[20:33] <joshd> mjevans: perhaps one of your osds is still running an older version?
[20:34] <mjevans> joshd: There's only one osd, this is a freshly compiled setup (which I might have done wrong) following the quick start guide.
[20:35] <joshd> mjevans: check ceph-osd -v just to be sure
[20:35] <mjevans> I only say I may have done it wrong as this is my very first time setting up ceph and have yet to reach 'success'
[20:35] <mjevans> ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
[20:35] <scuttlemonkey> sm_: hard to say (for me) without digging
[20:35] <mjevans> I would be amazed if that's different from what I posted above, but it is from the command you just asked me to run.
[20:36] <sm_> scuttlemonkey: ok, thanks for your help :)!
[20:36] <mjevans> the mds, osd, and mon and client are all on the same test server.
[20:37] <scuttlemonkey> sm_: np, glad I could be of service :)
[20:37] * chegk (~devnull@ has joined #ceph
[20:37] <gucki> dmick: ok, i think this is all great, but doing a "normal shutdown" with flushing the journal on a sigquit would be nice. i'd say it's just what one expects, as it's the norm (any database does it?)
[20:38] <dmick> gucki: it bothered me too; then I wondered if I could tell the difference. The only way is "moving an OSD's data dir intact", which is pretty infrequent. So I let go of my "clean up after yourself" philosophy :)
[20:38] <joshd> mjevans: try doing "ceph osd tell 0 injectargs '--debug-osd 20'", and then running the command again. the osd log should then have a clue as to why it's returning EPERM
[20:40] <gucki> dmick: yeah, i was hit by it when i moved one journal to an ssd. i mean if it's just like adding one line of code i'd say, add it. if it's more complicated, lets keep it as it is.. :)
[20:40] <mjevans> joshd: it has been run, do I have to kill and reload the osd to clear that? Also where would that be logged? (running the other command doesn't seem to show much difference)
[20:41] <joshd> mjevans: it's changed at runtime, no need to restart the osd. it's going to /var/log/ceph/ceph-osd.0.log by default
[20:42] <mjevans> /var/log/ceph/ods.test.log I guess? Ok I'll look in there.
[20:42] * sm_ (~sm@xdsl-195-14-196-155.netcologne.de) Quit (Quit: sm_)
[20:43] <mjevans> joshd: yeah, but that log file is gaining lines at... a rather amazing rate.
[20:43] <mjevans> About 512K / min
[20:44] <joshd> mjevans: could you pastebin the last 500 lines after running the rbd command?
[20:44] <mjevans> I'll see if I can filter out something useful
[20:44] <joshd> it's very verbose, but you can set it back to --debug-osd 0 afterwards too
[20:45] <mjevans> Ah, I just killed it and let systemd relaunch the osd
[20:45] <mjevans> If that's incorrect behavior I'd better learn that before this is used in a production system.
[20:46] <joshd> that's fine
[20:47] <joshd> mjevans: you can look for lines around rbd_directory in the log
[20:49] <mjevans> joshd: http://pastebin.com/bffyzMGm
[20:49] <mjevans> I just lined up the timestamps; being a pure testing environment with nothing else working yet makes getting that really simple.
[20:50] <mjevans> Ah, is it saying the client is too old?
[20:51] <joshd> it's failing the caps check: op_has_sufficient_caps pool=3 (libvirt-pool-test) owner=0 need_read_cap=1 need_write_cap=0 need_class_read_cap=0 need_class_write_cap=0 -> NO
[20:51] <joshd> but I don't see why; those caps work for me
[20:51] <mjevans> Well, let me double check the caps again
[20:52] <mjevans> Hum... yes that seems to be correct
[20:53] <mjevans> The log even shows exactly what caps it has
[20:54] <mjevans> the rwx should include the read cap with the x part as the documentation states...
[20:55] <joshd> hmm, it seems to show nul bytes there too: osdcap[grant(object_prefix rbd^@children  class-read),grant(pool libvirt^@pool^@test rwx)]
[20:55] <noob2> netapp with FIO testing it: Cbs: 16 (f=16): [rrrrrrrrrrrrrrrr] [9.4% done] [13688K/0K /s] [1711 /0 iops] [eta 19m:22s]
[20:55] <noob2> ceph: Cbs: 16 (f=16): [rrrrrrrrrrrrrrrr] [9.3% done] [25800K/0K /s] [3225 /0 iops] [eta 11m:49s]
[20:55] <noob2> smokes it :D
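The two fio status lines above can be compared directly. A minimal sketch that pulls the read throughput and read IOPS fields out of exactly those two lines and derives the IOPS ratio and the implied average read size; the regex is an assumption fitted to this status-line layout only, not a general fio output parser.

```python
import re

# The two fio status lines quoted above (netapp first, then ceph).
NETAPP = "Cbs: 16 (f=16): [rrrrrrrrrrrrrrrr] [9.4% done] [13688K/0K /s] [1711 /0 iops] [eta 19m:22s]"
CEPH = "Cbs: 16 (f=16): [rrrrrrrrrrrrrrrr] [9.3% done] [25800K/0K /s] [3225 /0 iops] [eta 11m:49s]"

# Assumed layout: "[<read>K/<write>K /s] [<read> /<write> iops]"
PATTERN = re.compile(r"\[(\d+)K/(\d+)K /s\] \[(\d+) /(\d+) iops\]")

def read_stats(line):
    """Return (read K/s, read IOPS) parsed from one fio status line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("unrecognised fio status line")
    return int(m.group(1)), int(m.group(3))

netapp_kbs, netapp_iops = read_stats(NETAPP)
ceph_kbs, ceph_iops = read_stats(CEPH)

print(f"IOPS ratio (ceph/netapp): {ceph_iops / netapp_iops:.2f}")
print(f"avg read size, netapp: {netapp_kbs / netapp_iops:.1f}K")
print(f"avg read size, ceph:   {ceph_kbs / ceph_iops:.1f}K")
```

Both runs work out to exactly 8K per read, so they appear to use the same block size, and ceph completes roughly 1.88x the read IOPS of the netapp at this point in the run. The earlier 671 vs 960 figures work out to about 1.43x.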
[20:55] <joshd> doing that on my cluster doesn't show any ^@ like that: client.libvirt has caps osdcap[grant(object_prefix rbd_children class-read),grant(pool libvirt-pool
[20:55] <joshd> -test rwx)]
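joshd's diagnosis is that the parsed cap on mjevans' cluster holds NUL bytes (`^@`) where the `-` and `_` characters should be, so the grant names a pool that does not exist. A simplified illustration of why that yields EPERM, assuming a grant applies only on an exact pool-name match; this is a toy model, not Ceph's actual OSDCap matching code, and both helper functions are hypothetical.

```python
# Simplified illustration only: NOT Ceph's OSDCap code, just a model of
# "a pool grant applies when the granted pool name equals the requested one".

def decode_caret(text):
    """Convert caret notation from the log ('^@' means a NUL byte) to a raw string."""
    return text.replace("^@", "\x00")

def pool_grant_matches(granted_pool, requested_pool):
    """Hypothetical check: the grant applies only on an exact pool-name match."""
    return granted_pool == requested_pool

requested = "libvirt-pool-test"                 # the pool rbd is trying to list
broken = decode_caret("libvirt^@pool^@test")    # pool name as parsed in mjevans' log
healthy = "libvirt-pool-test"                   # pool name as parsed on joshd's cluster

print(repr(broken))                             # 'libvirt\x00pool\x00test'
print(pool_grant_matches(broken, requested))    # False: the cap never applies -> EPERM
print(pool_grant_matches(healthy, requested))   # True
print("\x00" in decode_caret("rbd^@children"))  # True: the object_prefix is mangled too
```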
[20:56] <mjevans> joshd: the pool's name is libvirt-pool-test
[20:56] <mjevans> Can pools not have dash in their name?
[20:56] <noob2> i'm not sure. i think they can
[20:56] <joshd> they can, I'm guessing the caps just got screwed up somehow when you set them
[20:57] <mjevans> I'll look in to removing the user grants and re-adding after lunch then.
[20:57] <joshd> I'd suggest trying to set them again: ceph auth get-or-create client.libvirt mon 'allow r' osd "allow class-read object_prefix rbd_children, allow pool libvirt-pool-test rwx"
[20:57] <mjevans> Thanks for the help, and now I know how to crank up the logging when needed.
[20:57] <joshd> you're welcome
[20:57] <mjevans> I did that already earlier this morning; better to start from scratch with a fresh perspective.
[21:00] <wer> so running rados bench instead of rest-bench I get 366MB throughput vs 200MB throughput..... hmm.
[21:00] <ShaunR> noob2: you have a netapp?
[21:01] <wer> I think rados bench is able to saturate the disk IO whereas radosgw (4 of them) does not. If I add more nodes can I expect more throughput?
[21:02] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[21:03] * markw (0c9d41c2@ircip1.mibbit.com) has joined #ceph
[21:03] * KindOne (KindOne@h174.186.130.174.dynamic.ip.windstream.net) has joined #ceph
[21:03] <wer> Anyone have any suggestions on getting more throughput out of my 4 node 96 osd setup? Cause I am in failville right now.
[21:03] * markw is now known as Guest1662
[21:05] <noob2> ShaunR: yeah at work
[21:08] <Guest1662> Having problems setting up radosgw. Is there a good way to test the Apache fastcgi setup?
[21:10] <dmick> Guest1662: I've been known to strace the running radosgw, to see if it gets contacted when the Apache instance is hit
[21:10] * loicd (~loic@bdv75-4-82-226-115-123.fbx.proxad.net) has joined #ceph
[21:11] <nhm_> wer: more radosgw instances might get you more throughput.
[21:11] <ShaunR> noob2: what type of fio test you running? specs on the netapp and specs on your ceph cluster :)
[21:12] <nhm_> wer: If radosbench is doing significantly better that is.
[21:12] <nhm_> wer: you may need to put some kind of load balancer in front of them.
[21:15] <wer> nhm_: ty. hmm. yeah... so I have load balancing between the four instances. They are actually running on the nodes which maybe sucks. But even rados bench is not getting me the throughput I was hoping for. hmm.
[21:19] <ShaunR> noob2: i've always been curious how a netapp would perform compared
[21:24] * sleinen (~Adium@2001:620:0:26:d51e:8c9b:8b6f:6815) has joined #ceph
[21:28] * chegk (~devnull@ Quit (Ping timeout: 480 seconds)
[21:29] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has left #ceph
[21:29] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[21:33] * yehuda_hm (~yehuda@2602:306:330b:a40:29a8:3ee1:e94f:c893) Quit (Ping timeout: 480 seconds)
[21:33] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[21:37] * chegk (~devnull@ has joined #ceph
[21:37] * yehuda_hm (~yehuda@2602:306:330b:a40:29a8:3ee1:e94f:c893) has joined #ceph
[21:38] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[21:43] * drokita (~drokita@ has joined #ceph
[21:43] <drokita> Any idea why I would be logging out-of-quorum errors with only one monitor running?
[21:48] <sagewk> woot, merged wip-deploy!
[21:49] <joao> drokita, how many monitors do you have?
[21:49] <joao> in the monmap, I mean
[21:49] <nhm_> sagewk: yay! That's good right? ;)
[21:50] <sagewk> step forward for ceph-deploy
[21:50] * gucki (~smuxi@HSI-KBW-095-208-162-072.hsi5.kabel-badenwuerttemberg.de) Quit (Ping timeout: 480 seconds)
[21:50] <nhm_> I still haven't actually used it. :D
[21:50] <drokita> I haven't checked yet, but upon further inspection it is entirely possible that there are 3 in the map
[21:51] <joao> drokita, then you'll be getting those messages as long as you only have one monitor running
[21:51] <joao> you'll need at least 2 out of 3 to get a quorum
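joao's "2 out of 3" is the majority rule: the monitors form a quorum only when strictly more than half of the monitors in the monmap are up, no matter how many ceph.conf lists. A minimal sketch of that arithmetic:

```python
def quorum_size(monmap_size):
    """Smallest number of monitors that is a strict majority of the monmap."""
    return monmap_size // 2 + 1

def has_quorum(running, monmap_size):
    """True when enough monitors are up to form a majority."""
    return running >= quorum_size(monmap_size)

for n in (1, 2, 3, 4, 5):
    print(n, "mons in monmap -> need", quorum_size(n), "up")

print(has_quorum(1, 3))  # False: drokita's case, 1 of 3 running
```

Note that going from 3 to 4 monitors raises the requirement to 3 up, which is why odd monitor counts are the usual choice.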
[21:51] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[21:51] <drokita> Will that reset if I comment out the other 2 in ceph.conf and restart the mon
[21:51] <joao> no
[21:52] <drokita> Okay, I will read up on the monmap tool
[21:52] <joao> the ceph.conf is only used by the other daemons and clients to find out where to contact the monitors
[21:52] <drokita> thanks!
[21:52] <joao> let us know if you bump into some weirdness
[21:52] <dmick> nhm_: it wasn't super-usable before. these patches add a *lot* of functionality
[21:53] <mjevans> joshd: I've tested it again both with pool= and just pool, but my line keeps coming out like this in the log. "session 0x22a53b0 client.libvirt has caps osdcap[grant(object_prefix rbd^@children class-read),grant(pool libvirt^@pool^@test rwx)] 'allow class-read object_prefix rbd_children, allow pool libvirt-pool-test rwx'"
[21:54] <mjevans> This is EXACTLY the command I am using to create a completely new set of permissions, does anything look off? ceph auth get-or-create client.libvirt mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow pool=libvirt-pool-test rwx'
[21:54] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[21:55] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) has joined #ceph
[21:55] <mjevans> The env contains LANG=en_US.UTF-8 ; I'm starting to think it's some kind of library issue.
[21:55] <joshd> that looks fine. maybe it is something to do with the locale
[21:56] <joshd> I've also got LANG=en_US.UTF-8 though
[22:00] <joshd> it seems like you could work around this for now by not using - in your pool names, but I am curious what's causing the change
[22:06] <mjevans> My thoughts too
[22:06] <mjevans> Also, if this is an issue, might it indicate something else that would disqualify it from production use
[22:06] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:07] * LeaChim (~LeaChim@5e0d73fe.bb.sky.com) Quit (Ping timeout: 481 seconds)
[22:07] <drokita> joao: How does one view the current monmap if the monitor itself is unresponsive?
[22:07] <drokita> Any command sent to the monitor just hangs and doesn't return
[22:08] <joao> grab it directly from the monitor store
[22:08] <ShaunR> i'm trying to split up the public/cluster network... i'm a bit confused about the global part... it looks like i need to put the public and cluster network cidr range?
[22:08] <joao> drokita, will give you the commands in just a sec
[22:08] <ShaunR> so... public network =
[22:09] * LeaChim (~LeaChim@5e0d73fe.bb.sky.com) has joined #ceph
[22:11] <joao> drokita, cp /var/lib/ceph/mon/ceph-<mon-id> /tmp/mon.<id>.monmap ; monmaptool /tmp/mon.<id>.monmap --print
[22:11] <joao> err
[22:11] <joao> wait
[22:11] <joao> forgot to actually use the correct path
[22:12] <joao> drokita, again 'lc=`cat /var/lib/ceph/mon/ceph-mon<id>/monmap/last_committed` ; cp /var/lib/ceph/mon/ceph-<id>/monmap/$lc /tmp/mon.<id>.monmap ; monmaptool /tmp/mon.<id>.monmap --print'
[22:13] <joao> and I borked the path again; just substitute whatever /var/lib/ceph/foo stuff there with your monitor data directory
[22:14] <drokita> No worries, my path is diff anyway.
[22:14] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) Quit (Quit: Leaving)
[22:14] <joao> the point is to check what's the value in the 'monmap/last_committed' within the mon store, and copy 'monmap/<said-value>' to somewhere, and that should be your monmap
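joao's procedure is: read the epoch stored in `monmap/last_committed` inside the monitor data directory, copy `monmap/<epoch>` somewhere, and inspect the copy with `monmaptool --print`. A minimal Python sketch of the copy step, assuming the file-per-key store layout he describes (with `last_committed` holding the epoch as text, as his `cat` implies); the data directory path in the example is a placeholder, and the `monmaptool` step is left as a comment.

```python
import os
import shutil

def extract_latest_monmap(mon_data_dir, dest):
    """Copy the most recently committed monmap out of the monitor store.

    Assumes the layout described above: <mon_data_dir>/monmap/last_committed
    holds an epoch number as text, and <mon_data_dir>/monmap/<epoch> is the map.
    Returns the epoch that was copied.
    """
    monmap_dir = os.path.join(mon_data_dir, "monmap")
    with open(os.path.join(monmap_dir, "last_committed")) as f:
        epoch = f.read().strip()
    shutil.copyfile(os.path.join(monmap_dir, epoch), dest)
    return epoch

# Example (placeholder path; substitute your monitor data directory):
# epoch = extract_latest_monmap("/var/lib/ceph/mon/ceph-a", "/tmp/mon.a.monmap")
# then inspect the copy with: monmaptool /tmp/mon.a.monmap --print
```

Working on a copy keeps the live monitor store untouched while you inspect it.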
[22:14] <drokita> got it
[22:15] <joao> brb; dinner
[22:16] * jjgalvez (~jjgalvez@ has joined #ceph
[22:22] <ShaunR> Hmm.. this is interesting... I have a 2 server ceph cluster. each server has 1 MON and 4 OSD and is uplinked via 2 100mbit links. I was running a 'rados bench -p rbd 300 write' from one of the storage servers and noticed that the primary eth device was saturated and i was pulling right around 10mb/s. I decided to setup the second nic on both servers as the cluster network. I'm running the
[22:22] <ShaunR> same test, running it from the command line on one of the cluster servers (rados bench -p rbd 300 write) and now i'm seeing around 17.7MB/s and i'm noticing that traffic on eth1 is maxed and eth0 shows 70ish mbit... Does ceph by default load balance the nics?
[22:23] <ShaunR> it looks like it's using both the public and cluster networks
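Converting ShaunR's link speeds into the MB/s that rados bench reports makes the numbers easier to compare. A back-of-the-envelope sketch that ignores Ethernet/TCP overhead: one 100 Mbit/s NIC tops out at 12.5 MB/s, so about 10 MB/s on a saturated eth0 (reading the "10mb/s" above as MB/s) is close to line rate, while eth1 maxed plus about 70 Mbit/s on eth0 gives a combined wire ceiling above the 17.7 MB/s reported.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert a link rate in Mbit/s to MB/s (8 bits per byte, no protocol overhead)."""
    return mbit_per_s / 8

nic = 100  # each uplink is 100 Mbit/s, per ShaunR's description
print(mbit_to_mbyte(nic))       # 12.5  -> ceiling for one saturated NIC
print(mbit_to_mbyte(nic + 70))  # 21.25 -> eth1 maxed plus ~70 Mbit/s on eth0
```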
[22:24] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[22:29] * leseb_ (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[22:29] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[22:29] <mjevans> ShaunR: I think it doesn't so much load balance the /nics/ as it load balances the routes to servers that it sees (that is it indirectly balances which IP destinations it's talking to) from your described behavior
[22:31] <mjevans> Most likely; this is a list of places I can get my blocks from (other server happens to be duplicated by the two 'views' of it), ask for blocks from each free source in round robin.
[22:33] <gregaf> Ceph doesn't do any load balancing of NICs on its own, no
[22:34] <ShaunR> Hmm, i'm wondering why i'm seeing traffic across both like this.
[22:34] <gregaf> is your CRUSH map set up properly so that replication is one copy on each host, or could some of it be local replication?
[22:35] <ShaunR> I haven't touched the crush map
[22:35] <gregaf> and then your client is sending out a bunch of traffic on the "public" network to the OSDs on the other host, and only about half of the replication needs to go over the "cluster" network
[22:35] <gregaf> depends on which version of Ceph you're using then, and I don't remember when the various switchovers happened
[22:35] <ShaunR> 56.2
[22:36] <gregaf> you could check by seeing what "ceph osd tree" shows in terms of the CRUSH hierarchy, and…hrm, I don't remember how to dump out the CRUSH rules that are used for placement
[22:36] <gregaf> other than dumping the map and examining that
[22:36] <gregaf> you could spot check on ceph pg dump, I suppose
[22:37] <ShaunR> http://pastebin.ca/2313584
[22:37] <ShaunR> http://pastebin.ca/2313583
[22:37] <ShaunR> thats my config and the output of ceph osd tree
[22:37] <noob2> ShaunR: lemme fpaste the fio file for ya
[22:38] <ShaunR> this is just a test cluster so...
[22:38] <noob2> http://fpaste.org/Cw4t/
[22:39] <gregaf> ShaunR: you'll need to look at the CRUSH rules as well, to see if they're segregating by host or by OSD
[22:40] <gregaf> http://ceph.com/docs/master/rados/operations/crush-map/
[22:40] <ShaunR> whatever they are supposed to do by default is what they are doing... i haven't touched any of that
[22:41] <gregaf> yeah, it changed and I don't remember when, so you'll have to dig it out and see :)
[22:47] <noob2> ShaunR: specs on the ceph cluster are 6x HP DL180 G6's with 1GB flash cache and 12x 3TB drives
[22:47] <noob2> raid 0 on each drive
[22:47] <noob2> the netapp.. i'm not so sure
[22:47] <noob2> it's an older model
[22:50] <paravoid> any news on 0.56.3?
[22:51] * loicd (~loic@bdv75-4-82-226-115-123.fbx.proxad.net) Quit (Quit: Leaving.)
[22:51] <ShaunR> how many disks in the netapp
[22:52] <scuttlemonkey> hawt: http://goo.gl/q3Qqt
[22:53] <noob2> good question. prob 100 disks in each cluster. each netapp has 2 heads
[22:53] <noob2> they're small though
[22:54] * scalability-junk (~stp@188-193-201-35-dynip.superkabel.de) Quit (Quit: Leaving)
[22:56] <scuttlemonkey> hot off the presses: http://ceph.com/docs/master/faq/
[22:56] <noob2> :)
[22:57] <scuttlemonkey> (also that link earlier was ceph deployed via juju w/ juju-gui running...nothing illicit)
[22:59] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[23:01] * dilemma (~dilemma@2607:fad0:32:a02:1e6f:65ff:feac:7f2a) has joined #ceph
[23:01] * noob2 (~noob2@ext.cscinfo.com) has left #ceph
[23:03] <dilemma> so I have an issue where I have an incomplete placement group, and the two OSDs where that PG lives are continuously crashing (0.48.2)
[23:06] * loicd (~loic@magenta.dachary.org) has joined #ceph
[23:06] <dilemma> anyone here who can help me out? OSD crashes are putting my cluster at risk of losing data
[23:10] <ShaunR> gregaf: light bulb just turned on, let me know if this thinking is correct... I'm running that rados bench, it's acting as a client, so the traffic would go out the public network right? Then that data is being replicated on the cluster network? This would explain why i'm seeing traffic on both sides
[23:10] <gregaf> slang1 or scuttlemonkey, got some time?
[23:10] <gregaf> ShaunR: yeah
[23:10] * slang1 nods
[23:10] <ShaunR> ok, that makes perfect sense.
[23:11] <gregaf> what's going on dilemma?
[23:12] * Guest1662 (0c9d41c2@ircip1.mibbit.com) has left #ceph
[23:12] <dilemma> well, a couple hours ago, two of my OSDs crashed within minutes of each other
[23:12] <dilemma> restarting them worked for a while, but each one of them eventually crashes
[23:13] <slang1> is there a stack trace in the log of the osds that are crashing?
[23:13] <dilemma> so now I have some PGs down / incomplete
[23:13] <dilemma> yeah, let me get it into a pastebin
[23:14] <dilemma> there's a lot there: http://pastebin.com/rDdrzAPQ
[23:18] <slang1> dilemma: are you using btrfs for your osd devices?
[23:19] <dilemma> I've been switching them to xfs
[23:19] * mjevans starts paying more attention
[23:19] <dilemma> there is currently 1 btrfs OSD left
[23:19] <mjevans> slang1: why do you ask that? (says the guy about to do that...)
[23:19] <dilemma> and it's not giving me any trouble at the moment
[23:19] <slang1> dilemma: ok so the two that are crashing are on xfs?
[23:20] <slang1> mjevans: just trying to narrow it down
[23:20] <dilemma> slang1: yes
[23:21] <slang1> http://tracker.ceph.com/issues/2462,
[23:21] <mjevans> Ok, I was just seeing if there was a known issue. AFAIK, btrfs in jbod, 0, 1, and 10 style modes is considered 'production ready' (at least as far as Ceph is also considered such) now that it has filesystem checking tools.
[23:21] <slang1> dilemma: what version of ceph are you running?
[23:22] <mjevans> Ouch that's in the log
[23:22] <mjevans> It's rather out of date
[23:22] <dilemma> 0.48.2
[23:22] <dilemma> yeah
[23:22] <mjevans> dec: that isn't what the log... er N/M that's from slang1
[23:22] <dilemma> so I have the feeling I'm going to end up attempting something crazy like manually copying object files between OSDs here
[23:23] <dilemma> any chance that's the correct approach?
[23:23] <mjevans> Though I do see you're still one point release behind; have you looked at the upgrade notes/path?
[23:23] <dilemma> we're looking at a bobtail upgrade
[23:23] <dilemma> but we wanted to finish our btrfs -> xfs conversions first
[23:23] <mjevans> Is there a reason you're leaving btrfs?
[23:24] <dilemma> terrible performance
[23:24] <slang1> dilemma: yeah I think the bobtail upgrade will resolve that crash
[23:24] <dilemma> performance slowly degraded after we started actually using the btrfs OSDs
[23:24] <mjevans> Have you tried tracking down why you're getting that?
[23:24] <dilemma> mjevans: known fragmentation issue
[23:25] <dilemma> slang1: implying that an immediate bobtail upgrade will allow the OSD to start?
[23:25] <mjevans> dilemma: how old of a kernel are you running?
[23:25] <dilemma> old, one sec
[23:26] <dilemma> 3.2.0-32-generic on Ubuntu 12.04.1 LTS
[23:26] <dilemma> which is why our btrfs performance has degraded so quickly
[23:26] <slang1> dilemma: I think so, but why did you say you were about to lose data? What happens if you just leave the two osds down and mark them out?
[23:26] <dilemma> I'll have 2 down+peering and 3 incomplete PGs
[23:27] <dilemma> which is currently the case
[23:27] <dilemma> they're down and out, because I can't keep them from crashing
[23:27] <slang1> dilemma: how many osds do you have in total?
[23:27] <mjevans> Newish enough to support -o autodefrag but for btrfs you generally want something OTHER than LTS kernels (even if the rest of your stack (aside from btrfs tools) is LTS)
[23:27] <dilemma> mjevans: we're beyond that - we're using XFS now
[23:27] <dilemma> and we'll consider switching back some time in the future
[23:28] <dilemma> 168 OSDs, slang1
[23:28] <mjevans> Yeah. If it's just fragmentation I'll take the built in checksums/etc from btrfs and apply nightly btrfs balance operations as necessary.
[23:28] <dilemma> mjevans: agreed, but not helpful at the moment
[23:30] <dilemma> I want to end up on btrfs, but switching to XFS was the option we took rather than building newer kernels and sorting out the fragmentation issue
[23:30] <slang1> dilemma: any unfound pgs?
[23:30] <dilemma> nope
[23:30] <slang1> dilemma: you might be able to resolve this without restarting those osds
[23:31] <dilemma> I'd love to do that - what do you have in mind?
[23:31] <slang1> dilemma: this will help:
[23:31] <slang1> http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#placement-group-down-peering-failure
[23:32] <mjevans> dilemma: you could use pinning and source from the quantal repo for linux-image and other such packages on a whitelist basis: http://packages.ubuntu.com/quantal/linux-image
[23:33] <dilemma> mjevans: thanks for the pointer. I'm familiar with how to deal with custom kernels and newer Ubuntu packages, however. It's just not how we chose to address the issue at the moment. My current problem is not btrfs related, however.
[23:34] <dilemma> slang1: are you suggesting that I mark the OSDs as lost?
[23:34] <slang1> dilemma: what does ceph health detail show you?
[23:35] <dilemma> there's a lot there, considering I have a bunch of ongoing backfilling from the two recently lost OSDs
[23:35] <mjevans> unrelated: looking at the changes that could show up in 3.8 makes me want to hold off on deploying this until that version hits arch...
[23:35] <dilemma> 1600 lines in the output
[23:35] <slang1> dilemma: ok
[23:36] <slang1> dilemma: if you see the % degraded number decreasing, you can just let the osds remain down
[23:36] <slang1> dilemma: until you have a chance to upgrade to bobtail
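slang1's rule of thumb (leave the OSDs down as long as the degraded percentage keeps falling) can be checked with a small script. This is a minimal sketch, not a Ceph tool. It assumes that `ceph health` prints a fragment like `recovery 21/42 degraded (50.000%)`. That wording is an assumption and may differ between releases. The helper names `degraded_percent`, `is_recovering` and `sample_health` are hypothetical.

```python
import re
import subprocess

# Assumption: `ceph health` output contains a fragment such as
# "recovery 21/42 degraded (50.000%)"; the wording may vary by release.
DEGRADED_RE = re.compile(r"degraded \((\d+(?:\.\d+)?)%\)")


def degraded_percent(health_text):
    """Return the degraded percentage found in `ceph health` text, or None."""
    match = DEGRADED_RE.search(health_text)
    return float(match.group(1)) if match else None


def is_recovering(earlier_text, later_text):
    """True if the degraded percentage dropped between two health samples."""
    before = degraded_percent(earlier_text)
    after = degraded_percent(later_text)
    if before is None or after is None:
        return False
    return after < before


def sample_health():
    """Run `ceph health` (hypothetical helper; assumes default config/keyring)."""
    return subprocess.run(["ceph", "health"], capture_output=True,
                          text=True, check=True).stdout
```

Take one sample with `sample_health()`, wait a few minutes, then take another and pass both to `is_recovering`. Comparing two samples a few minutes apart is less sensitive to brief stalls than requiring a drop on every poll.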
[23:36] <dilemma> the degraded number is decreasing, but my incomplete and down PGs will remain incomplete and down
[23:37] <dilemma> and objects in those PGs will be inaccessible
[23:37] * leseb_ (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[23:37] <slang1> dilemma: yeah
[23:38] <slang1> dilemma: I think your best bet is to upgrade to bobtail
[23:38] <dilemma> Is there a known method of manually "backfilling" the PGs in question?
[23:39] <mjevans> dilemma: how much spare HW do you have? Maybe you can duplicate the offline nodes, prune out the PGs that are unrelated and bring them back up; though that sounds like a lot of work and a real long shot compared to upgrading...
[23:40] * cdblack (86868b46@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[23:42] <loicd> rturk: Hi :-) joao suggested that I ask you to upgrade my privileges in http://tracker.ceph.com/projects so that I can edit my own ticket descriptions (http://tracker.ceph.com/users/789). Would you agree to it?
[23:42] <slang1> dilemma: can you post 'ceph pg dump' to pastebin or email to me slang@inktank.com
[23:43] <rturk> loicd: happy to, one moment
[23:46] <dilemma> slang1: 72174 lines in the output
[23:46] <dilemma> (72000 PGs in my cluster)
[23:46] <dilemma> still want it?
[23:46] <slang1> dilemma: yep!
[23:46] <slang1> dilemma: you can compress it if it's big
[23:47] <rturk> loicd: give it a try now - you may have to log out and back in again
[23:48] <dilemma> slang1: sent
[23:50] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[23:50] * lollo (lollo@l4m3r5.org) has left #ceph
[23:51] <wer> ceph is making me sad.
[23:51] <nhm_> wer: still having performance problems?
[23:51] * loicd trying
[23:52] <wer> nhm_: yeah I am. I am in the process of pulling out a node... to see if performance decreases....
[23:53] <loicd> rturk: works :-D thanks a lot !
[23:53] <rturk> my pleasure :)
[23:54] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[23:55] <nhm_> wer: have you looked at the osd admin socket at all?
[23:55] <wer> nhm_: I doubt it?
[23:55] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[23:57] <slang1> dilemma: those 3 incomplete pgs are all 0 bytes
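With roughly 72,000 PGs in the dump, checking which incomplete or down PGs hold data is easier to do with a script than by hand. The sketch below filters a JSON PG dump. Several parts are assumptions. It assumes your release supports `ceph pg dump --format=json`. It also assumes the JSON lists PGs under `pg_stats`, which may be at the top level or nested under `pg_map`, and that each entry has `pgid`, `state` and `stat_sum.num_bytes`. Verify these field names against your own output. `problem_pgs` is a hypothetical helper name.

```python
import json


def problem_pgs(dump_json, bad_states=("incomplete", "down")):
    """Yield (pgid, state, num_bytes) for PGs carrying any of bad_states.

    Assumes the JSON schema described above; adjust keys if your
    release lays out `ceph pg dump --format=json` differently.
    """
    data = json.loads(dump_json)
    # Older layouts keep pg_stats at the top level, newer ones under pg_map.
    stats = data.get("pg_stats") or data.get("pg_map", {}).get("pg_stats", [])
    for pg in stats:
        # A state like "down+peering" is a '+'-joined set of flags.
        flags = pg.get("state", "").split("+")
        if any(flag in flags for flag in bad_states):
            yield pg["pgid"], pg["state"], pg.get("stat_sum", {}).get("num_bytes")
```

An entry that reports `num_bytes` of 0 would match slang1's observation. Whether that means the PG is really empty, or only that the surviving replicas have no stats for it, is what dilemma questions next.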
[23:57] <nhm_> wer: something like: ceph --admin-daemon /var/run/ceph/ceph-osd.$i.asok dump_ops_in_flight | grep num_ops
[23:58] <wer> nhm_: I was not aware of such a beast. ah :) you just answered my question. Ok I will take a look.
[23:58] <nhm_> wer: I do that in a loop for each OSD on each node.
[23:58] <dilemma> why would they be zero bytes, slang1?
[23:58] <nhm_> wer: that can help pinpoint on which OSD(s) ops are backing up.
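nhm_'s per-OSD loop can be sketched in Python as follows. This is an illustration, not a Ceph utility. It runs the same `ceph --admin-daemon <socket> dump_ops_in_flight` command nhm_ quotes, once for each socket matching `/var/run/ceph/ceph-osd.*.asok`. It then takes the first number following `num_ops` in the output, much like his `grep num_ops`. The socket path pattern comes from his example. The helpers `num_ops` and `ops_in_flight` are hypothetical names, and so is the injectable `run` parameter, which is there only so the loop can be tested without a cluster.

```python
import glob
import re
import subprocess

# Matches `"num_ops": 3` (JSON) or a bare `num_ops: 3`, like the grep above.
NUM_OPS_RE = re.compile(r'"?num_ops"?\s*:\s*(\d+)')


def num_ops(text):
    """Return the in-flight op count from dump_ops_in_flight output, or None."""
    match = NUM_OPS_RE.search(text)
    return int(match.group(1)) if match else None


def ops_in_flight(socket_glob="/var/run/ceph/ceph-osd.*.asok",
                  run=subprocess.run):
    """Map each OSD admin socket on this node to its in-flight op count."""
    counts = {}
    for sock in sorted(glob.glob(socket_glob)):
        result = run(["ceph", "--admin-daemon", sock, "dump_ops_in_flight"],
                     capture_output=True, text=True, check=False)
        counts[sock] = num_ops(result.stdout)
    return counts
```

Run it on each node. Sorting the result with `sorted(counts.items(), key=lambda kv: kv[1] or 0, reverse=True)` lists the OSDs where ops are backing up first.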
[23:58] <dilemma> I can guarantee that before the OSDs went down the first time, there was usage
[23:59] <slang1> dilemma: on those pgs?
[23:59] <dilemma> on all PGs
[23:59] <dilemma> we have no empty PGs in this cluster
[23:59] * vata (~vata@2607:fad8:4:6:345e:43e0:3ced:a3d2) Quit (Quit: Leaving.)
[23:59] <slang1> dilemma: hmmm

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.