#ceph IRC Log


IRC Log for 2013-01-24

Timestamps are in GMT/BST.

[0:02] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[0:03] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[0:10] * vata (~vata@2607:fad8:4:6:eccc:34b8:fdba:6c99) Quit (Quit: Leaving.)
[0:19] * The_Bishop (~bishop@2001:470:50b6:0:5471:c473:d0f7:1e28) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[0:37] * Cube1 (~Cube@ Quit (Ping timeout: 480 seconds)
[0:44] * cmello (cmello@ has joined #ceph
[0:49] * cmello (cmello@ Quit ()
[0:49] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Bye!)
[0:49] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[0:52] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[0:53] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[0:56] * PerlStalker (~PerlStalk@ Quit (Quit: ...)
[0:56] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit ()
[1:02] * leseb_ (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[1:02] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[1:10] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[1:16] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[1:16] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[1:17] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[1:19] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[1:20] * jlogan1 (~Thunderbi@2600:c00:3010:1:3903:fb9:e591:774d) Quit (Ping timeout: 480 seconds)
[1:20] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Operation timed out)
[1:58] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has left #ceph
[2:00] * dmick (~dmick@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[2:00] * ChanServ sets mode +o dmick
[2:11] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[2:11] * loicd (~loic@magenta.dachary.org) has joined #ceph
[2:14] * alram (~alram@ Quit (Quit: leaving)
[2:40] * mattbenjamin (~matt@aa2.linuxbox.com) Quit (Quit: Leaving.)
[2:41] * ScOut3R (~scout3r@dsl51B61EED.pool.t-online.hu) Quit (Quit: Lost terminal)
[2:43] * LeaChim (~LeaChim@b0faf18a.bb.sky.com) Quit (Ping timeout: 480 seconds)
[2:43] <paravoid> sagelap: so, cluster has stabilized, but I have 1 pg incomplete and 23 stale
[2:43] <paravoid> manual says to report a bug whenever I see "incomplete" :-)
[2:46] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[2:46] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[2:55] * Tamil1 (~tamil@ Quit (Quit: Leaving.)
[2:58] * amichel (~amichel@ has joined #ceph
[3:01] <amichel> I found a nice section in the documentation about adding monitors to a running cluster, but nothing about adding mds servers. Is there a page on it that I'm just not seeing?
[3:02] <sjustlaptop> paravoid: have to step out shortly, but how many OSDs are not up?
[3:04] <paravoid> hey
[3:04] <paravoid> all of them are up
[3:05] <paravoid> sjustlaptop: there is one down, but that happened many days ago
[3:06] <paravoid> sjustlaptop: I can come back tomorrow
[3:06] <paravoid> not really urgent
[3:06] <sjustlaptop> paravoid: that is very odd
[3:07] <paravoid> odd things are happening to me all the time :)
[3:07] <sjustlaptop> if you want to make a bug, you can attach the output of ceph osd dump, ceph pg dump, ceph osd getmap -o <tmpfile>, ceph osd getcrushmap -o <tmpfile>
[3:07] <paravoid> not sure if you saw the backlog
[3:07] <sjustlaptop> I should really condense this into a script
[3:07] <sjustlaptop> I didn't
[3:07] <sjustlaptop> oh, and ceph pg <pgid> query for each incomplete pg
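(The checklist sjustlaptop gives above is easy to wrap in the script he mentions; a rough sketch, with all file and script names illustrative, assuming a working `ceph` CLI on an admin node and incomplete/stale pg IDs passed as arguments:)

```shell
# Write the collection script to a file, then syntax-check it.
cat > collect-pg-debug.sh <<'EOF'
#!/bin/sh
# Gather the state sjustlaptop asks for in a PG bug report.
out=pg-debug-$(date +%Y%m%d-%H%M%S)
mkdir -p "$out"
ceph osd dump             > "$out/osd-dump.txt"
ceph pg dump              > "$out/pg-dump.txt"
ceph osd getmap      -o "$out/osdmap.bin"
ceph osd getcrushmap -o "$out/crushmap.bin"
# Query each problem pg; time out so a hanging query is
# recorded in the notes rather than blocking the whole run.
for pgid in "$@"; do
    timeout 60 ceph pg "$pgid" query > "$out/pg-$pgid-query.json" \
        || echo "query for $pgid hung or failed" >> "$out/notes.txt"
done
EOF
chmod +x collect-pg-debug.sh
bash -n collect-pg-debug.sh && echo "script parses OK"
```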
[3:07] <paravoid> I have quite a few OSDs added, plus most of the old OSDs marked out
[3:07] <paravoid> this happened on Monday
[3:08] <paravoid> has been recovering since
[3:08] <sjustlaptop> but some pgs are incomplete?
[3:08] <paravoid> then today at some point several osds across the cluster started being marked as down
[3:08] <paravoid> some of them actually died it seems, I filed a bug about one of those
[3:09] <sjustlaptop> k
[3:09] <paravoid> after a few hours, increasing op thread timeout from 7200 to 28800 and a few mon restarts by the oom killer
[3:09] <paravoid> it has stabilized again
[3:09] <paravoid> but now I'm seeing incomplete & stale
[3:09] <sjustlaptop> yeah, file a bug with the stuff I mentioned above
[3:09] <sjustlaptop> if ceph pg <pgid> query hangs, note that
[3:10] <paravoid> thanks
[3:10] <sjustlaptop> I'll be back tomorrow
[3:10] <paravoid> me too
[3:10] <paravoid> rest well!
[3:10] <paravoid> thanks :-)
[3:13] <paravoid> uh oh
[3:13] <paravoid> for all but one of them pg query says "pgid currently maps to no osd"
[3:14] <paravoid> for all the stale ones
[3:19] * amichel (~amichel@ Quit ()
[3:39] * sagelap1 (~sage@mobile-166-137-215-104.mycingular.net) has joined #ceph
[3:39] * rturk is now known as rturk-away
[3:44] * sagelap (~sage@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:44] * dmick (~dmick@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[3:46] * glowell (~glowell@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[3:49] * chutzpah (~chutz@ Quit (Quit: Leaving)
[3:51] * joshd1 (~jdurgin@2602:306:c5db:310:5cae:d1ca:70f2:af66) Quit (Quit: Leaving.)
[4:05] * sagelap1 (~sage@mobile-166-137-215-104.mycingular.net) Quit (Ping timeout: 480 seconds)
[4:07] * sagelap (~sage@mobile-166-137-215-104.mycingular.net) has joined #ceph
[4:16] * cmello (~cesar@ has joined #ceph
[4:17] <cmello> hello there
[4:19] <sjustlaptop> paravoid: that's good, explains what's going on
[4:20] <via> i had a power outage and now the nodes have all come back up, ceph -s is nonresponsive and the monitor logs all suggest they are refusing connections because they aren't in quorum (out of 3 monitors)
[4:20] <cmello> anyone interested in talking about radosgw performance?
[4:21] <via> all the nodes are correct time with ntp, and iptables is turned off
[4:23] <via> is there a procedure for what to do if all nodes go off?
[4:23] <sjustlaptop> via: not really, the mons should reestablish quorum
[4:24] <via> is there a way to make this happen?
[4:24] <via> it doesn't seem to be
[4:24] <sjustlaptop> try restarting the mons again?
[4:24] <via> i have
[4:25] <via> is there a specific ordering i should try?
[4:25] <sjustlaptop> are you certain that you can ping each mon's connection from each other mon's connection?
[4:25] <via> yes
[4:26] <via> i'm trying to just start two of them first
[4:26] <via> i think that might have worked
[4:27] <via> i also think another server is constantly trying to mount an rbd device and maybe its just bogging them down
[4:28] <via> nope. http://pastebin.com/qLaknAAR
[4:31] <cmello> well in some way I just wanted to say hi to the ceph engineers. congratulations for the great work! I met the project a month ago and I admire it a lot, and I hope ceph will be my tool for serious storage apps. Best regards and greetings from Brazil!
[4:31] <elder> Thank you cmello!
[4:34] <cmello> curious if joao is from Brazil also.
[4:35] <via> i don't understand how each node looks at first like it joins quorum then gets a bunch of messages saying its not in quorum
[4:36] <cmello> oh Portugal
[4:37] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[4:37] <cmello> good night elder! best regards to all.
[4:37] * loicd (~loic@magenta.dachary.org) has joined #ceph
[4:38] * cmello (~cesar@ Quit (Quit: Leaving)
[4:39] <via> with debug mon = 20, it just keeps repeating this: http://pastebin.com/djg46i7u
[4:39] * sagelap (~sage@mobile-166-137-215-104.mycingular.net) Quit (Ping timeout: 480 seconds)
[4:43] <via> i have auth disabled on each node, makes me wonder why all the messages it keeps refusing are auth messages
[4:44] * yoshi (~yoshi@p6124-ipngn1401marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[4:49] <via> perhaps there's a way to force an election?
[5:09] * sagelap (~sage@ has joined #ceph
[5:23] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[5:23] * loicd (~loic@magenta.dachary.org) has joined #ceph
[5:48] * sagelap (~sage@ Quit (Read error: Connection reset by peer)
[6:13] * sagelap (~sage@mobile-166-137-177-115.mycingular.net) has joined #ceph
[6:16] <via> i'm sorry guys, apparently my switch's settings didn't survive the power outage and it stopped supporting jumbo frames
[6:16] <via> so random things were getting blocked
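(via's root cause is a classic MTU mismatch; one way to verify jumbo-frame support end to end is a don't-fragment ping sized to the expected MTU. A sketch, host name illustrative: a 9000-byte MTU leaves 9000 - 20 bytes of IP header - 8 bytes of ICMP header = 8972 bytes of payload, and sending that with fragmentation disabled fails fast if any switch in the path dropped jumbo-frame support.)

```shell
MTU=9000
PAYLOAD=$((MTU - 28))   # 20-byte IP header + 8-byte ICMP header
# -M do sets the don't-fragment bit on Linux ping
echo "ping -M do -c 3 -s $PAYLOAD osd-host.example.com"
```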
[6:25] * maxiz (~pfliu@ has joined #ceph
[6:26] * maxiz (~pfliu@ Quit ()
[6:28] * Hau_MI (~HauM1@login.univie.ac.at) has joined #ceph
[6:30] * HauM1 (~HauM1@login.univie.ac.at) Quit (Ping timeout: 480 seconds)
[6:41] * sagelap (~sage@mobile-166-137-177-115.mycingular.net) Quit (Ping timeout: 480 seconds)
[7:09] * gaveen (~gaveen@ has joined #ceph
[7:18] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[7:49] <darkfaded> via: what switch model was that? i heard such stuff about allied telesys switches
[7:51] * tnt (~tnt@ has joined #ceph
[7:55] * dosaboy (~gizmo@ Quit (Quit: Leaving.)
[8:04] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:08] <xiaoxi> is it possible to have mkcephfs run in parallel?
[8:15] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:37] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[8:45] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[8:53] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[8:54] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[9:01] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[9:03] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: Download IceChat at www.icechat.net)
[9:05] * NaioN (stefan@andor.naion.nl) Quit (Remote host closed the connection)
[9:08] * NaioN (stefan@andor.naion.nl) has joined #ceph
[9:08] * sleinen1 (~Adium@ Quit (Quit: Leaving.)
[9:08] * sleinen (~Adium@2001:620:0:2d:c813:1467:f842:4513) has joined #ceph
[9:09] * sleinen (~Adium@2001:620:0:2d:c813:1467:f842:4513) Quit ()
[9:13] * BManojlovic (~steki@ has joined #ceph
[9:14] * tziOm (~bjornar@ has joined #ceph
[9:16] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[9:25] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[9:29] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:38] * leseb (~leseb@mx00.stone-it.com) has joined #ceph
[9:41] * sleinen (~Adium@ has joined #ceph
[9:41] <absynth_47215> morning
[9:42] * sleinen1 (~Adium@2001:620:0:26:89ef:7da2:30ac:7d6f) has joined #ceph
[9:48] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) has joined #ceph
[9:49] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[9:52] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[9:55] * low (~low@ has joined #ceph
[10:01] * Zethrok (~martin@ has joined #ceph
[10:03] * nz_monkey (~nz_monkey@ Quit (Read error: Operation timed out)
[10:03] * nz_monkey (~nz_monkey@ has joined #ceph
[10:05] * xiaoxi (~xiaoxiche@ Quit (Ping timeout: 480 seconds)
[10:06] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[10:07] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[10:10] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:13] * nz_monkey_ (~nz_monkey@ has joined #ceph
[10:14] * xdeller (~xdeller@broadband-77-37-224-84.nationalcablenetworks.ru) has joined #ceph
[10:16] * nz_monkey (~nz_monkey@ Quit (Ping timeout: 480 seconds)
[10:18] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) Quit (Quit: Leaving.)
[10:19] <absynth_47215> does ceph do any network-intensive tasks in the background without showing up in ceph -w?
[10:25] * nz_monkey (~nz_monkey@ has joined #ceph
[10:25] * nz_monkey_ (~nz_monkey@ Quit (Remote host closed the connection)
[10:32] * sleinen1 (~Adium@2001:620:0:26:89ef:7da2:30ac:7d6f) Quit (Quit: Leaving.)
[10:34] * sleinen (~Adium@ has joined #ceph
[10:34] * LeaChim (~LeaChim@b0faf18a.bb.sky.com) has joined #ceph
[10:37] * sleinen1 (~Adium@2001:620:0:25:540f:7eec:aa9a:d319) has joined #ceph
[10:42] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[10:46] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) has joined #ceph
[10:58] * yoshi (~yoshi@p6124-ipngn1401marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:02] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) Quit (Quit: Leaving.)
[11:07] * fghaas (~florian@ has joined #ceph
[11:21] * Hau_MI is now known as HauM1
[11:26] * nz_monkey (~nz_monkey@ Quit (Ping timeout: 480 seconds)
[11:29] * nz_monkey (~nz_monkey@ has joined #ceph
[11:37] * lurpy (~test003@soho-94-143-249-78.sohonet.co.uk) has joined #ceph
[11:58] * xiaoxi (~xiaoxiche@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[12:08] <xdeller> jksm: how many pgs and how much data does your pool have? i`m trying to lower the potential test suite amount for myself for the suicide bug
[12:09] * scalability-junk (~stp@188-193-201-35-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[12:10] * sleinen1 (~Adium@2001:620:0:25:540f:7eec:aa9a:d319) Quit (Quit: Leaving.)
[12:10] * sleinen1 (~Adium@ has joined #ceph
[12:11] * sleinen1 (~Adium@ Quit ()
[12:11] * sleinen1 (~Adium@ has joined #ceph
[12:16] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) has joined #ceph
[12:17] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) Quit ()
[12:18] * fghaas (~florian@ has left #ceph
[12:18] * BillK (~BillK@58-7-185-153.dyn.iinet.net.au) has joined #ceph
[12:19] * sleinen1 (~Adium@ Quit (Ping timeout: 480 seconds)
[12:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[12:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[12:22] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) has joined #ceph
[12:33] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) Quit (Quit: Leaving.)
[13:04] * match (~mrichar1@pcw3047.see.ed.ac.uk) has joined #ceph
[13:07] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[13:12] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[13:14] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) has joined #ceph
[13:15] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) Quit ()
[13:26] * leseb_ (~leseb@mx00.stone-it.com) has joined #ceph
[13:26] * leseb (~leseb@mx00.stone-it.com) Quit (Read error: Connection reset by peer)
[13:31] * aliguori (~anthony@cpe-70-112-157-151.austin.res.rr.com) Quit (Remote host closed the connection)
[13:51] <tnt> Has anyone managed to use the 'radosgw-admin log show' command?
[13:54] * aliguori (~anthony@cpe-70-112-157-151.austin.res.rr.com) has joined #ceph
[14:05] * gaveen (~gaveen@ Quit (Ping timeout: 480 seconds)
[14:06] <nhm> morning #ceph
[14:06] <absynth_47215> morning mr. performance
[14:13] * ScOut3R (~ScOut3R@catv-89-133-43-117.catv.broadband.hu) has joined #ceph
[14:14] * gaveen (~gaveen@ has joined #ceph
[14:21] * mattbenjamin (~matt@ has joined #ceph
[14:39] <via> darkfaded: its a dlink dgs something
[14:40] <liiwi> my condolences
[14:48] * sleinen (~Adium@ has joined #ceph
[14:48] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) has joined #ceph
[14:49] * sleinen1 (~Adium@2001:620:0:26:5515:48c:fade:3a1b) has joined #ceph
[14:55] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[14:56] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[14:56] <via> darkfaded: dlink dgs 1210-24
[15:00] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) Quit (Quit: Leaving.)
[15:00] <liiwi> heh, the it dept here has couple of those.. waiting for trip to shooting range :P
[15:06] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:09] * ninkotech (~duplo@ Quit (Remote host closed the connection)
[15:09] * ninkotech_ (~duplo@ Quit (Remote host closed the connection)
[15:15] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) has joined #ceph
[15:16] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Quit: leaving)
[15:18] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) Quit ()
[15:18] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[15:20] * ScOut3R (~ScOut3R@catv-89-133-43-117.catv.broadband.hu) Quit (Remote host closed the connection)
[15:20] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[15:46] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) has joined #ceph
[15:47] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) Quit ()
[15:52] * nmartin (~nmartin@adsl-98-90-208-215.mob.bellsouth.net) has joined #ceph
[15:54] * dosaboy (~gizmo@ has joined #ceph
[15:56] <nmartin> I swear I've googled until my hands are numb: can I run ceph as a VM? I want to cephalize all the local storage in my xenserver cluster
[15:56] <nmartin> currently I run 1U servers, with tiny local stores, and FAT sans, but want to shift to 2U HVs and distributed storage
[15:56] <nmartin> but running ceph in a VM is required to accomplish that
[15:57] <Cybje> no problem, I have tested Ceph on a few KVM virtual machines
[15:57] <scuttlemonkey> nmartin: the one thing you'll want to remember is that MONs don't like it when their IPs change
[15:58] <Cybje> and timing might be an issue in some virtualization environments ... Ceph likes to have a quite accurate time
[15:58] <scuttlemonkey> beyond that VMs can run Ceph fine modulo uber-performance-requirements
[15:58] * PerlStalker (~PerlStalk@ has joined #ceph
[15:58] <nmartin> yes, IPs will be static
[15:59] <nmartin> we have a very nice VM cluster, so dual 10GBe storage lan, etc etc
[15:59] <tnt> I'm running ceph under xen VM ... service RBD for other vms
[15:59] <nmartin> BUT we are a bit constrained with our storage scale
[15:59] * aliguori (~anthony@cpe-70-112-157-151.austin.res.rr.com) Quit (Quit: Ex-Chat)
[16:00] <nmartin> tnt, and yeah, I need block not object storage
[16:00] <tnt> yup, RBD is block device :)
[16:01] <nmartin> interesting, I'll be labbing this up today! I'll also document it - this will be ceph RBD in a cloudstack cluster
[16:01] <tnt> Even live migration works which is nice.
[16:01] <nmartin> tnt: what? what is this voodoo?!?
[16:01] <nmartin> how does a migrated vm stil reach the storage on the previous HV?
[16:02] <joao> <scuttlemonkey> nmartin: the one thing you'll want to remember is that MONs don't like it when their IPs change <-- we should put this in the docs
[16:02] <match> nmartin: I'm doing the other side of this - using ceph as a storage pool for vm disks via rbd: http://www.woodwose.net/thatremindsme/2012/10/ha-virtualisation-with-pacemaker-and-ceph/
[16:02] <joao> accompanied by "and neither will you"
[16:03] <scuttlemonkey> hehe
[16:03] <nmartin> match: heh - thats what I do today, and find it limiting
[16:03] <tnt> nmartin: well if the VM storage is entirely on ceph, as long as the new dom0 can access the ceph cluster, it can attach the RBD as well ...
[16:03] <nmartin> so i want to go in the other direction
[16:03] <match> nmartin: whats the limit?
[16:03] <scuttlemonkey> joao: good call, I'll pass that on to jwilkins
[16:04] <joao> scuttlemonkey, we already state something of the sorts iirc, but it's only on the 'changing a mon's ip' section
[16:04] <joao> and I meant the whole quote :p
[16:05] <scuttlemonkey> haha, definitely
[16:05] <joao> might be worth another look just to make sure it is visible enough so people don't assume it's okay
[16:05] <scuttlemonkey> we should have a bit of a "things to consider before starting this mess"
[16:06] <joao> "1. want to be the coolest kid in the neighborhood"
[16:06] <nmartin> I see - the architecture i built in my head this morning on the way to work is: 1. build cloudstack, with local storage turned on 2. add 1 ceph/rdb vm per hypervisor, and allocate ALL local storage to them, build block storage devices, and add iSCSI targets to them in the same vm (if possible). 3.add these targets to cloudstack. 4. disable local storage in user VMs. 5. party like a rock star
[16:07] * scalability-junk (~stp@188-193-201-35-dynip.superkabel.de) has joined #ceph
[16:07] <nmartin> match: well, we do active active SANs for high availability, so for every 20 TB i have to build a 2 node cluster and add it to cloudstack as an iSCSI target
[16:08] <nmartin> but my argument is that that job could be done on the hypervisors, so storage scales with the hypervisor cluster
[16:09] <nmartin> is that just crazy? am i missing something?
[16:10] <match> nmartin: I'm slightly not following... why are you building 2-node clusters for each 20TB, rather than one big pool and subdividing the rbd pool if needed?
[16:10] <match> I guess I'm having a problem mapping your description onto OSDs and MONs
[16:17] * mattbenjamin (~matt@ Quit (Quit: Leaving.)
[16:18] * mattbenjamin (~matt@adsl-75-45-228-196.dsl.sfldmi.sbcglobal.net) has joined #ceph
[16:22] <nmartin> match: i build 2-node sans today, non-ceph
[16:22] <nmartin> so these are traditional 2node active-active sans, using drbd and pacemaker
[16:23] <match> nmartin: Ok - I know that setup well :)
[16:25] <nmartin> i want to push that storage onto the hypervisors with the same or higher performance/reliability
[16:26] <match> nmartin: So replace 2-node drbd with n-node ceph and export via rbd to vms?
[16:26] * mattbenjamin (~matt@adsl-75-45-228-196.dsl.sfldmi.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[16:29] <nmartin> match: right: so my hypervisors will now become 2U 20 TB nodes, with 2gb for dom0 and the rest of that storage will be placed on a vdisk for a ceph VM that I provision in cloudstack. I then create an iSCSI target (I'm unclear where this will happen yet) that I then connect to in cloudstack
[16:31] <match> does cloudstack have support for rbd (the nbd protocol)? Would make it a lot simpler than running iscsi when ceph has its own equivalent built-in
[16:31] <absynth_47215> yeah, i was about to ask
[16:31] <absynth_47215> why use iscsi?
[16:34] <match> nmartin: Something like this might be relevant: http://ceph.com/docs/master/rbd/rbd-cloudstack/
[16:34] * aliguori (~anthony@ has joined #ceph
[16:34] * nyeates (~nyeates@pool-173-59-239-231.bltmmd.fios.verizon.net) has joined #ceph
[16:37] * mattbenjamin (~matt@aa2.linuxbox.com) has joined #ceph
[16:41] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) has joined #ceph
[16:47] <nmartin> it does not currently - the storage is based on the storage interfaces of the underlying hypervisor
[16:48] <nmartin> which on xen is iscsi, nfs, or a few hbas
[16:50] <nmartin> that was what threw gluster off the map for me - their nfs support is full of pain, heartache, and fail
[16:50] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[16:50] <nmartin> i'm certainly planning on 4.0 cloudstack, but it is miles away from production - in fact, i am shocked apache released it
[16:51] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:51] <nmartin> i think it was to just get a version bump, because it was a very low quality release which is odd for apache
[16:52] * low (~low@ Quit (Quit: bbl)
[16:52] <nmartin> match: but thanks for the info - i'll be watching that closely
[16:52] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) has joined #ceph
[16:53] <match> nmartin: Is xen (rather than kvm) important? I'm using kvm under libvirt with rbd and it works like a charm
[16:54] <absynth_47215> ditto, with qemu
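(For reference, the libvirt/KVM arrangement match and absynth_47215 describe attaches an RBD image directly as a network disk in the domain XML; a sketch with placeholder pool, image, and monitor names, and cephx auth omitted since it varies per deployment:)

```xml
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <!-- pool/image and monitor address below are illustrative -->
  <source protocol='rbd' name='rbd/vm-root'>
    <host name='mon1.example.com' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```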
[16:55] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) Quit (Quit: Leaving.)
[16:56] * noob212 (~noob21@ has joined #ceph
[17:00] <match> nmartin: (the blog post I mentioned before is an ok howto)
[17:00] <slang1> :q
[17:01] * slang1 grrs
[17:01] * aliguori (~anthony@ Quit (Read error: Operation timed out)
[17:05] <match> slang1: Did you mean C-x C-c? ;-)
[17:07] <slang1> match: :-)
[17:08] * ScOut3R (~ScOut3R@dsl51B61EED.pool.t-online.hu) has joined #ceph
[17:08] <slang1> match: the fact that you have to hold down a key while you press multiple other keys proves how emacs is so inferior
[17:09] * BillK (~BillK@58-7-185-153.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[17:10] <absynth_47215> the gods graced us with 10 fingers for a reason
[17:14] * jlogan (~Thunderbi@2600:c00:3010:1:3903:fb9:e591:774d) has joined #ceph
[17:15] * xiaoxi (~xiaoxiche@jfdmzpr05-ext.jf.intel.com) Quit (Ping timeout: 480 seconds)
[17:16] <slang1> absynth_47215: they couldn't have envisioned wearing out the control key unnecessarily
[17:17] <slang1> I actually managed to start a good old emacs-vi argument
[17:17] * slang1 pats himself on the back
[17:17] * BillK (~BillK@124-149-72-249.dyn.iinet.net.au) has joined #ceph
[17:18] <elder> I thought emacs required foot pedals to really work right.
[17:18] <absynth_47215> or a cephalopod
[17:18] <ircolle> slang - maybe for your next trick you can start an xfs vs ext4 vs btrfs fight
[17:18] <noob212> vim for the win :-)
[17:22] <nmartin> the weak part of my plan is the iSCSI targets; if a hypervisor has to be rebooted or dies, my iSCSI target VM has to migrate, which will not be 0 hit unless it's a live migration
[17:23] <nmartin> match: no, kvm is possible, but our initial cloudstack cluster is xenserver
[17:23] <nmartin> we (possibly) need to be able to support xenserver, vmware, and kvm since cloudstack does
[17:23] <nmartin> and we're a service provider, so the customer is always right ;)
[17:24] <nmartin> so far no one cares what dom0 is as long as it works
[17:25] <nmartin> vim protip: i moved .vimrc to dropbox, and ln -s it to ~, and kerPOW, one vimrc across all my servers!
[17:25] * ebo^ (~ebo@icg1104.icg.kfa-juelich.de) has joined #ceph
[17:26] <nmartin> every tweak is always available to all my other machines
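(nmartin's dotfile trick, sketched with a scratch directory standing in for Dropbox so nothing real is touched; on a real machine the symlink would point from ~/.vimrc into the synced folder:)

```shell
demo=$(mktemp -d)                             # stand-in for $HOME
mkdir -p "$demo/Dropbox"
echo 'set number' > "$demo/Dropbox/.vimrc"    # the one shared copy
ln -sf "$demo/Dropbox/.vimrc" "$demo/.vimrc"  # per-machine symlink
cat "$demo/.vimrc"
```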
[17:26] <ebo^> does anyone know what the loadavg parameter in the osd perf dump means exactly?
[17:33] <slang1> ebo^: its pulled from the libc getloadavg() function I think
[17:35] <slang1> ebo^: looks like its the first one in the array: loadavg[0], so its the avg over the last minute
[17:35] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[17:35] <ebo^> oh ok .. thx ... i just wondered why its at 400+ :-)
[17:37] * Matt (matt@matt.netop.oftc.net) has joined #ceph
[17:38] <Matt> ah, excellent :)
[17:38] <Matt> love it when there's a channel for a project :)
[17:38] * musca (musca@tyrael.eu) has joined #ceph
[17:40] * aliguori (~anthony@ has joined #ceph
[17:42] * ebo^ (~ebo@icg1104.icg.kfa-juelich.de) Quit (Quit: Verlassend)
[17:43] * vata (~vata@2607:fad8:4:6:dda8:cf80:46fe:873c) has joined #ceph
[17:47] * noob212 (~noob21@ Quit (Ping timeout: 480 seconds)
[17:50] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:52] * dosaboy (~gizmo@ Quit (Remote host closed the connection)
[17:53] * dosaboy (~gizmo@ has joined #ceph
[17:53] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:00] * sagelap (~sage@mobile-166-137-179-169.mycingular.net) has joined #ceph
[18:00] * terje (~joey@71-218-6-247.hlrn.qwest.net) has joined #ceph
[18:02] * terje__ (~joey@97-118-121-72.hlrn.qwest.net) Quit (Ping timeout: 480 seconds)
[18:03] * tziOm (~bjornar@ Quit (Remote host closed the connection)
[18:05] * leseb_ (~leseb@mx00.stone-it.com) Quit (Remote host closed the connection)
[18:11] * sander (~chatzilla@c-174-62-162-253.hsd1.ct.comcast.net) has joined #ceph
[18:12] * BillK (~BillK@124-149-72-249.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:13] * sleinen (~Adium@2001:620:0:46:55bd:83a4:cb2e:ee1b) has joined #ceph
[18:13] * tnt (~tnt@ has joined #ceph
[18:16] * match (~mrichar1@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[18:18] * sleinen1 (~Adium@2001:620:0:26:5515:48c:fade:3a1b) Quit (Ping timeout: 480 seconds)
[18:23] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[18:23] * Kdecherf (~kdecherf@shaolan.kdecherf.com) has joined #ceph
[18:23] <Kdecherf> Hi world
[18:26] <noob21> hello
[18:27] * sagelap1 (~sage@2607:f298:a:607:ccbf:6c78:41bd:da97) has joined #ceph
[18:29] * sagelap (~sage@mobile-166-137-179-169.mycingular.net) Quit (Ping timeout: 480 seconds)
[18:31] <paravoid> sjust: I'm around if you want to do realtime debugging
[18:34] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:35] <Kdecherf> has anybody observed data corruption on cephfs (4 mds, 4 mon, 27 osd on ext4, 27TB raw capacity) 0.48.3 argonaut on ubuntu 12.04?
[18:35] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:36] <absynth_47215> depends. what kind of corruption are you seeing
[18:36] <Kdecherf> client is ceph 0.48.3 compiled for exherbo using boost 1.50.0 (with a fix for build error)
[18:36] <Kdecherf> absynth_47215: NULL bytes in some files
[18:36] <Kdecherf> without any other content
[18:36] <absynth_47215> did you have a kernel panic or a host power failure?
[18:36] <Kdecherf> nope
[18:36] <absynth_47215> (on an OSD host)
[18:37] * alram (~alram@ has joined #ceph
[18:37] <absynth_47215> we have XFS...
[18:37] <Kdecherf> daemons were sequentially restarted to apply some configuration changes during the use
[18:37] * doubleg (~doubleg@ has joined #ceph
[18:39] <absynth_47215> Kdecherf: we have seen stuff like this before on argonaut. however, not with .48.3 and not on ext4. i don't remember the exact reason for the zero'ed out blocks, though
[18:39] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[18:39] <absynth_47215> did you have major recovery and/or slow requests?
[18:39] <absynth_47215> anything unusual _at all_?
[18:40] <absynth_47215> gonna relocate to home, bbs
[18:42] * calebamiles1 (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[18:42] <Kdecherf> absynth_47215: I launched a sync of my files for migration (only 6GB of data) using lsyncd (rsync/cp/rm) and observed a hang in IO sometimes
[18:43] <Kdecherf> absynth_47215: we have planned to update all our nodes to bobtail tomorrow
[18:45] <Kdecherf> absynth_47215: I also observed a 'big' memory consumption on the active mds (1.7GB for <200k files) during the first import
[18:46] * jtangwk1 (~Adium@2001:770:10:500:75db:82f1:4ba1:c114) has joined #ceph
[18:46] * Cube (~Cube@ has joined #ceph
[18:48] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[18:48] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:49] <Kdecherf> absynth_47215: and flapping osds (2 of 27)
[18:53] * jtangwk (~Adium@2001:770:10:500:8d22:d635:d460:e02) Quit (Ping timeout: 480 seconds)
[18:57] * glowell (~glowell@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[18:58] * ScOut3R (~ScOut3R@dsl51B61EED.pool.t-online.hu) Quit (Remote host closed the connection)
[19:02] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[19:05] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[19:08] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[19:09] <absynth_47215> Kdecherf: these sound like familiar issues, all of them fixed in bobtail
[19:09] <absynth_47215> out of curiosity: do you see memleaks in the osd processes, too?
[19:09] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[19:10] * Tamil (~tamil@ has joined #ceph
[19:11] * gaveen (~gaveen@ has joined #ceph
[19:15] <Kdecherf> absynth_47215: at this time, no
[19:15] <sstan> _test_
[19:18] * BillK (~BillK@ has joined #ceph
[19:18] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Read error: Connection reset by peer)
[19:18] <jmlowe> any teuthology people here?
[19:19] <slang1> jmlowe: I can try to help, what's up?
[19:21] <jmlowe> I have something I would like to try out to see if it caused 3810, I ran btrfs scrubs while my osd's were under load and got inconsistent pg's, how hard would it be to try to reproduce copying data to rbd devices while running a btrfs scrub then do a ceph deep scrub?
[19:21] <absynth_47215> Kdecherf: from what you wrote, an update to bobtail is probably a good idea for you...
[19:21] <absynth_47215> jmlowe: those inconsistent pgs... is that bobtail?
[19:22] <jmlowe> 0.56.1
[19:23] <slang1> Kdecherf: there have been some fixes (from Zheng Yan) for users seeing null bytes in files, but I don't think they were backported to argonaut
[19:23] <absynth_47215> i rarely do this emoticon, but
[19:23] <absynth_47215> oO
[19:23] <slang1> Kdecherf: let me check
[19:25] <slang1> jmlowe: shouldn't be hard, we would need to add a task (in teuthology) that does the btrfs scrub during some other workload gen
[19:25] <slang1> jmlowe: what were you using to generate the load?
[19:27] <absynth_47215> err
[19:27] <absynth_47215> Kdecherf: rbd_cache active?
[19:27] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[19:27] <jmlowe> slang1: I was running a vm with the qemu rbd driver, 2 rbd devices one for os one for data, rsync'ing 10's of gigs of files of mixed sizes from another machine to the data rbd with ext3 on it
[19:28] * jlogan (~Thunderbi@2600:c00:3010:1:3903:fb9:e591:774d) Quit (Ping timeout: 480 seconds)
[19:29] <slang1> jmlowe: this sounds like a real bug. Let's create a ticket for it
[19:29] <jmlowe> slang1: I've had inconsistent pg's with a ceph cluster that started as 0.48, then again with a fresh cluster
[19:30] <jmlowe> slang1: it's already filed and being worked on by sjust issue 3810
[19:31] <slang1> jmlowe: ah ok
[19:31] <absynth_47215> *crosses fingers this is a btrfs issue*
[19:31] <jmlowe> slang1: one commonality is that I ran a btrfs scrub while the osd's were under load, I don't know if it's related but I don't really have a good test harness to rule it in or out
[19:31] <slang1> jmlowe: can you get the xattrs of that file he requested?
[19:33] * amichel (~amichel@salty.uits.arizona.edu) has joined #ceph
[19:33] <jmlowe> I think that object was in the previous incarnation of the cluster, the one that started life as 0.48, there is some overlap in the logs because I wanted to make sure I recorded all the operations for the full life cycle of all objects, the object he requested doesn't exist
[19:43] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[19:45] <absynth_47215> but you are sure the corruption is new, right?
[19:45] <absynth_47215> as in, the newly rsynced data is corrupted
[19:47] * jlogan (~Thunderbi@ has joined #ceph
[19:53] <jmlowe> absynth_47215: as in the objects on the primary and secondary are different sizes on disk, interestingly enough the objects on the secondary are often 4MB like you would expect but on the primary they are smaller
[19:54] <absynth_47215> jmlowe: but you are sure these issues haven't been carried over from 0.48?
[19:54] * Ryan_Lane (~Adium@ has joined #ceph
[19:54] <jmlowe> absynth_47215: blew away the cluster and recreated it with 0.56.1 to be sure it wasn't something carried over
[19:54] <absynth_47215> ok
[19:55] * Ryan_Lane (~Adium@ Quit ()
[19:57] * Ryan_Lane (~Adium@ has joined #ceph
[20:00] * chutzpah (~chutz@ has joined #ceph
[20:00] * nwat (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[20:05] <jmlowe> absynth_47215: I don't know, I kind of hope it's a ceph issue instead of a btrfs issue, it might get fixed *cough* btrfsck *cough*
[20:05] * dosaboy (~gizmo@ Quit (Read error: Connection reset by peer)
[20:05] * dosaboy (~gizmo@ has joined #ceph
[20:06] * dosaboy (~gizmo@ Quit ()
[20:06] * BillK (~BillK@ Quit (Ping timeout: 480 seconds)
[20:07] <absynth_47215> well... i think bugs like this one have a pretty high priority on the list, so if it is reproducible somehow, it will get squashed
[20:09] <paravoid> sjustlaptop: hey
[20:09] <jmlowe> as I understand it, I'm prodigious in making this happen and it's not happening for other people
[20:10] <sjustlaptop> paravoid: hi
[20:10] <paravoid> hi!
[20:10] <janos> is there any need/advantage to large frames when networking?
[20:11] <Matt> is btrfs actually a point where it's stable enough for production use?
[20:11] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[20:11] <janos> advantage with ceph, not a general question ;)
[20:11] <absynth_47215> jmlowe: we saw stuff like that in argonaut before...
[20:11] <paravoid> sjustlaptop: so... #3905...
[20:11] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[20:11] <absynth_47215> Matt: last time i checked, no
[20:11] <absynth_47215> at least not for our usecase
[20:11] <absynth_47215> but it's been a while
[20:11] <sjustlaptop> paravoid: haven't gotten to it yet, it's on my list
[20:11] <paravoid> okay
[20:12] <paravoid> I'm around
[20:12] <paravoid> if you need anything
[20:12] <sjustlaptop> oh, right, it's a mapping problem, I'll look at it shortly
[20:13] <Matt> absynth_47215: yeah, that was my take on it
[20:14] <Matt> but I've not been back and looked at it for a few months
[20:15] <absynth_47215> did you read the recent performance comparison blog article?
[20:15] <Matt> nope
[20:15] <Matt> actually
[20:15] <sjustlaptop> paravoid: ok, can you attach the output of 'ceph osd tree' ?
[20:16] <Matt> I might have
[20:16] <sjustlaptop> I think you are hitting a retry limit problem
[20:16] <Matt> my brain is having issues today :)
[20:17] <paravoid> sjustlaptop: I have 48 osds' weight set to 0 on purpose (days ago), 12 osds down/0 and the rest are weight 1
[20:17] <paravoid> set to 0 = ceph osd out
[20:17] <sjustlaptop> yeah, can you attach 'ceph osd tree'?
[20:17] <paravoid> sure
[20:18] <paravoid> done
[20:19] <sjustlaptop> paravoid: what version are you using?
[20:19] <paravoid> 0.56.12
[20:19] <paravoid> er
[20:19] <paravoid> 0.56.1 :)
[20:19] <sjustlaptop> ok, I think you need to set the crush tunables
[20:20] <paravoid> ok
[20:20] <sjustlaptop> one sec
[20:21] <sjustlaptop> http://ceph.com/docs/master/rados/operations/crush-map/
[20:21] <sjustlaptop> at the bottom, the tunables section
[20:21] <paravoid> oh I've seen that
[20:21] <paravoid> tried to run it once even but aborted
[20:21] <sjustlaptop> you'll need the "THE HARD WAY" section for your version
[20:22] <sjustlaptop> the easy way isn't in a 56 point release yet
[20:22] <paravoid> because it said that "only do it if the developers tell you so" :)
[20:22] <sjustlaptop> actually, one sec
[20:22] <sjustlaptop> are you using kernel clients?
[20:22] <paravoid> no
[20:22] <paravoid> just radosgw
[20:22] <sjustlaptop> and it's up to date?
[20:22] <paravoid> yes
[20:22] <sjustlaptop> then it should be fine -- sagewk: right?
[20:23] <sagewk> catching up, one sec
[20:23] * epa_ (~epa@84-253-205-45.bb.dnainternet.fi) Quit (Quit: leaving)
[20:23] <paravoid> btw, this was "working", as in no incomplete/stale
[20:23] <paravoid> then OSDs flapped, then this happened
[20:23] <sagewk> you should change all but the descend_once one.. that one isn't upstream yet.
[20:23] <sagewk> the others should be good from v3.5 and later, iirc.. it's there in that doc.
[20:24] <paravoid> I don't use kernel clients
[20:24] <sagewk> oh nm, misread. right, then you're all good.
[20:24] <sagewk> in that case, update that one too :)
[20:24] <sagewk> hmm, actually, i thought that was in the bobtail branch, let me check
[20:24] <sjustlaptop> sagewk: wip_peering_wq for the heartbeat thing, doing a quick test now
[20:25] <sagewk> paravoid: yeah, it is.. if you run the latest bobtail branch, you can do 'ceph osd crush tunables optimal' and all will be swell. it's not in v0.56.1 tho.
[20:25] <paravoid> set the tunables
[20:26] <paravoid> osds flapping again
[20:26] <paravoid> lost 8
[20:26] <paravoid> 12
[20:27] <paravoid> 17
[20:28] <paravoid> why does this keep happening?
[20:28] <paravoid> osds getting marked down all over the cluster?
[20:29] <paravoid> 2013-01-24 19:27:47.993262 osd.47 [WRN] map e99492 wrongly marked me down
[20:29] <paravoid> 2013-01-24 19:27:48.696216 osd.41 [WRN] map e99492 wrongly marked me down
[20:29] <paravoid> lots of those
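[editor's note: the "wrongly marked me down" warnings above can be tallied per OSD to see which daemons are flapping; a minimal sketch, assuming the cluster-log format shown in the two lines paravoid pasted — the helper name is ours, not a Ceph API:]

```python
import re
from collections import Counter

# Matches cluster-log lines like:
# 2013-01-24 19:27:47.993262 osd.47 [WRN] map e99492 wrongly marked me down
WRONGLY_DOWN = re.compile(r"(osd\.\d+) \[WRN\] map e\d+ wrongly marked me down")

def flapping_osds(log_lines):
    """Count 'wrongly marked me down' warnings per OSD."""
    counts = Counter()
    for line in log_lines:
        m = WRONGLY_DOWN.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

log = [
    "2013-01-24 19:27:47.993262 osd.47 [WRN] map e99492 wrongly marked me down",
    "2013-01-24 19:27:48.696216 osd.41 [WRN] map e99492 wrongly marked me down",
]
print(flapping_osds(log))  # Counter({'osd.47': 1, 'osd.41': 1})
```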
[20:30] <phantomcircuit> "reader got old message"
[20:30] <paravoid> 110M/s in peering traffic
[20:30] <phantomcircuit> message in question gets dropped resulting in data loss right
[20:32] <paravoid> sagewk: looks like #3904
[20:32] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[20:32] <sagewk> the tunables triggered a bunch of data movement.. there is no other client activity?
[20:33] <paravoid> nope
[20:33] <sagewk> reader got old message means that tcp sockets disconnected
[20:33] <sagewk> is the network stable?
[20:33] <Kdecherf> absynth_47215: I don't use rbd at all for now
[20:33] <paravoid> that wasn't me
[20:33] <sagewk> oh sorry :)
[20:33] <paravoid> I'm seeing backtraces like the #3904 ones
[20:34] <paravoid> ok, ceph-mon consumed all the memory of the system
[20:34] <paravoid> oom killer is about to kill it
[20:34] <paravoid> (that's #3906)
[20:34] <phantomcircuit> sage, it should be
[20:35] <phantomcircuit> sagewk,
[20:35] <phantomcircuit> ^
[20:36] <sagewk> how many pgs are we talking about?
[20:36] <paravoid> me?
[20:36] <sagewk> yeah
[20:36] <paravoid> 16952 pgs
[20:37] <sagewk> if you can reproduce 3904 (want_acting assert) with osd logs up, that would be super awesome.
[20:37] <sagewk> logs will also hopefully tell us why the osds are getting marked down. are the osd hosts swapping or anything?
[20:38] <paravoid> no
[20:38] <paravoid> not even close
[20:38] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[20:38] <paravoid> they're getting marked down because they die :)
[20:38] <paravoid> because of the assert
[20:39] <sagewk> oh, that's good, there's only one problem then (er, ignoring the mon memory for now)
[20:39] <paravoid> well, probably
[20:39] <sagewk> for that, logs would make me super happy :)
[20:39] <paravoid> i haven't checked all of them
[20:40] <paravoid> it's not easily reproducible, it happens at random
[20:40] <paravoid> and debug on all of the OSDs is erm, challenging :)
[20:40] * Ryan_Lane (~Adium@ has joined #ceph
[20:41] <paravoid> osdmap e99524: 144 osds: 119 up, 84 in
[20:41] <sagewk> how about " ceph osd tell \* injectargs '--debug-osd 0/20 --debug-ms 0/1' "
[20:41] <sagewk> that will log in memory, and only write to the file if we crash
[20:41] <sagewk> do that, then restart all osds, and then hopefully at least one will crash...
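[editor's note: the `0/20` in sagewk's injectargs line is a pair of levels — the first is written to the log file, the second is gathered in an in-memory buffer and dumped only on a crash, which is why it stays cheap. A tiny parser just to make the syntax concrete; the function name is ours:]

```python
def parse_debug_level(spec):
    """Split a Ceph debug spec like '0/20' into (log_level, memory_level).

    A bare number like '5' sets both levels to the same value.
    """
    parts = spec.split("/")
    if len(parts) == 1:
        level = int(parts[0])
        return level, level
    return int(parts[0]), int(parts[1])

print(parse_debug_level("0/20"))  # (0, 20)
print(parse_debug_level("5"))     # (5, 5)
```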
[20:42] <paravoid> sec, monitors are unresponsive
[20:42] <paravoid> not very happy about all that I guess
[20:42] <paravoid> oops, 16G of RAM all consumed again
[20:44] <paravoid> so all of the osds are up again
[20:44] <paravoid> not crashing anymore
[20:44] <sagewk> rebalancing?
[20:44] <sagewk> let's undo the crush change, and then redo it, and see if we can trigger the crash
[20:44] <paravoid> well, it was 18% degraded before
[20:44] <sagewk> (once you've injected those logging options)
[20:45] <paravoid> now it's 21.9%, but still peering
[20:45] <sagewk> did you do the 'ceph osd crush tunables X' method to adjust the tunables?
[20:45] <paravoid> no
[20:45] <sagewk> manually?
[20:45] <paravoid> getcrushmap, set tunables there, setcrushmap
[20:45] <paravoid> and I kept the old one
[20:46] <sagewk> perfect, just reinject that one and it'll go back to before
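[editor's note: the "hard way" paravoid describes (getcrushmap, set tunables in the decompiled map, setcrushmap) boils down to editing `tunable` lines in the decompiled text. A sketch of the edit step, with the surrounding ceph/crushtool commands in comments; the sample map and the value 50 are illustrative, not taken from this cluster:]

```python
import re

# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt
decompiled = """\
# begin crush map
tunable choose_local_tries 2
tunable choose_total_tries 19
"""

def set_tunable(crush_text, name, value):
    """Rewrite (or prepend) a 'tunable <name> <value>' line."""
    pattern = re.compile(rf"^tunable {name} .*$", re.M)
    if pattern.search(crush_text):
        return pattern.sub(f"tunable {name} {value}", crush_text)
    return f"tunable {name} {value}\n" + crush_text

# raise the retry limit sjustlaptop suspects is being hit
edited = set_tunable(decompiled, "choose_total_tries", 50)
print("tunable choose_total_tries 50" in edited)  # True
# crushtool -c crushmap.txt -o crushmap.new
# ceph osd setcrushmap -i crushmap.new
```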
[20:46] <paravoid> don't you want to wait and see if those incomplete & stales one will be found?
[20:46] <paravoid> 2013-01-24 19:46:29.995965 mon.0 [INF] pgmap v1817442: 16952 pgs: 4 inactive, 29 active, 5227 active+clean, 8337 active+remapped+wait_backfill, 275 active+degraded+wait_backfill, 9 stale, 7 active+recovery_wait, 118 stale+active+clean, 35 stale+active+remapped+wait_backfill, 451 peering, 13 remapped, 29 down+peering, 379 active+remapped+backfilling, 153 active+degraded, 86 active+degraded+backfilling, 44 stale+peering, 315 active+degraded+remapped+wait_bac
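[editor's note: pgmap summaries like the (truncated) one above are just `count state` pairs and are easy to split into a dict for watching recovery progress; a sketch over a shortened sample line, skipping any trailing truncated entry — the function name is ours:]

```python
import re

def parse_pg_states(pgmap_line):
    """Turn 'N state, N state, ...' from a pgmap summary into a dict."""
    _, _, states = pgmap_line.partition(" pgs: ")
    counts = {}
    for entry in states.split(","):
        m = re.match(r"(\d+) ([a-z+_]+)$", entry.strip())
        if m:  # ignore a trailing cut-off entry
            counts[m.group(2)] = int(m.group(1))
    return counts

line = ("pgmap v1817442: 16952 pgs: 4 inactive, 29 active, "
        "5227 active+clean, 451 peering, 29 down+peering")
counts = parse_pg_states(line)
print(counts["active+clean"], counts["peering"])  # 5227 451
```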
[20:46] <sagewk> sure.
[20:46] <paravoid> heh...
[20:47] <sagewk> but then: let's beat it with a stick until we see that assert again :)
[20:47] <paravoid> sure
[20:47] <paravoid> I'd love that too
[20:47] <paravoid> this is nasty
[20:47] <paravoid> peering is awfully slow btw
[20:48] <sagewk> the logging is slowing it down
[20:48] <paravoid> I don't think this took effect yet
[20:48] <slang1> jmlowe: the logs posted to the ceph.com are for the 0.48 case, right? do you have new logs for the 0.56.1 case?
[20:49] <paravoid> I ran it but mons were unresponsive and I interrupted it
[20:49] <paravoid> I'll do it again when we're about to test
[20:49] <paravoid> 373 peering now (out of 451 above)
[20:51] * dosaboy (~gizmo@ has joined #ceph
[20:53] <sagewk> paravoid: going to grab some food, back shortly!
[20:54] <paravoid> thanks :)
[20:54] <paravoid> 348 peering fwiw
[20:54] <paravoid> still waiting
[20:54] <paravoid> when it's done I'll try to collect logs
[21:03] <lxo> sagewk, was there any work towards making the deadlocks go away or be less likely when an osd or mds host ceph.ko-mounts the ceph filesystem and writes to it as if there was no tomorrow?
[21:04] <lxo> I've just realized I've been doing just that for a couple of days now, and there haven't been any deadlocks; they used to be very noticeable when I started playing with ceph (like, it seemed like there was something horribly wrong with ceph.ko or btrfs that kept on hanging back then ;-)
[21:04] <jksM> xdeller: 1018 pgs, 2641 GB data, 6037 GB used, 10290 GB / 17115 GB avail
[21:09] * dosaboy (~gizmo@ Quit (Ping timeout: 480 seconds)
[21:13] * dosaboy (~gizmo@ has joined #ceph
[21:21] * noob2 (~noob2@pool-71-244-111-36.phlapa.fios.verizon.net) has joined #ceph
[21:26] <lxo> what's with this temporary jump from GBs to PBs in my ceph logs? http://pastebin.com/Qv0geJH2
[21:27] * noob21 (~noob2@pool-71-244-111-36.phlapa.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[21:29] <sagewk> lxo: not really. it's a fundamental issue with kernel vs user space
[21:30] <sagewk> and not specific to ceph. you can make it hard to trigger by tuning the kernel vm
[21:31] <sagewk> paravoid: back
[21:31] <paravoid> hey
[21:31] <lxo> I wonder how loopback manages to get around what I gather is a similar issue, and whether something similar could be used for ceph
[21:31] <paravoid> so peering was stuck at 300something, then 3 osds died out of the blue
[21:31] <sagewk> jksm: how is your heartbeat situation?
[21:31] <paravoid> peering got down again (but still incomplete/stale)
[21:32] <paravoid> I just restored the previous crush map
[21:32] <sagewk> do you have logs for those crashed osds?
[21:32] <paravoid> no crashes that I can see
[21:32] <paravoid> I'm afraid not
[21:32] <sagewk> hmm ok
[21:33] <paravoid> swapping crush maps again
[21:35] <lxo> my list of favorite ceph.ko bugs: touch -h on soft links (and named pipes?) doesn't make it to the metadata server, and zero-sized files often disappear. come to think of it, it could be a single bug having to do with file metadata not being marked as dirty and flushed to the mds before expiration or unmounting
[21:35] * nyeates (~nyeates@pool-173-59-239-231.bltmmd.fios.verizon.net) Quit (Quit: Zzzzzz)
[21:35] * Tamil (~tamil@ Quit (Quit: Leaving.)
[21:36] <sagewk> lxo: if you can open tickets with a procedure for reproducing, it'll make these easy to squash later
[21:36] * Tamil (~tamil@ has joined #ceph
[21:36] <paravoid> sagewk: dammit, still stable
[21:36] <paravoid> (never thought I'd say those words)
[21:37] <lxo> I'm pretty sure I did long ago; gotta double check
[21:37] <paravoid> that mon leak is 100% reproducible though :)
[21:37] <lxo> paravoid, heh
[21:38] <sagewk> triggered by the data movement? or just the background recovery activity of your cluster?
[21:38] <paravoid> peering
[21:38] <paravoid> when I swap crush, it happens almost immediately
[21:40] <sagewk> how many mons?
[21:40] * dosaboy (~gizmo@ Quit (Quit: Leaving.)
[21:40] <paravoid> 3
[21:40] <paravoid> only the active one leaks though.
[21:40] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[21:41] <sagewk> k
[21:41] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[21:42] * leseb_ (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[21:43] <jmlowe> slang1: just noticed your question, those logs start the day I blew away the old cluster and created a 0.56.1 cluster so they should be what you need
[21:44] <sjustlaptop> sagewk: pgtemp?
[21:45] <slang1> jmlowe: but the objects aren't there?
[21:45] <sagewk> paravoid: ceph osd dump | grep -c pg_temp
[21:46] <sagewk> i wouldn't expect it to get *that* big... :/
[21:46] <paravoid> 10858
[21:46] <sagewk> ceph osd getmap -o /tmp/om ; ls -al /tmp/om
[21:46] <paravoid> got osdmap epoch 100113
[21:46] <paravoid> -rw-r--r-- 1 root root 434487 Jan 24 20:46 /tmp/om
[21:46] <sagewk> yeah that shouldn't mean gigabytes
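[editor's note: the numbers above back sagewk's point with simple arithmetic — even a generous stack of full 434,487-byte osdmaps held in memory is nowhere near gigabytes; the 500-epoch figure below is an assumption for illustration, not a measured mon setting:]

```python
map_bytes = 434487           # size of /tmp/om reported above
cached_epochs = 500          # assumed number of full maps kept in memory
total_gb = map_bytes * cached_epochs / 1024**3
print(f"{total_gb:.2f} GB")  # ~0.20 GB, far from the ~8 GB RSS seen later
```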
[21:47] <paravoid> ?
[21:47] <sagewk> the congress maps are that big and those mons only have 4gb ram
[21:47] <sagewk> how many osds?
[21:47] <paravoid> 2013-01-24 20:47:11.371885 mon.0 [INF] osdmap e100118: 144 osds: 128 up, 81 in
[21:48] <jmlowe> slang1: apparently they are missing or they belong to the old cluster, if it happened before around 11:00 am in the oldest log you are looking at the old cluster with inconsistent pg's, you probably want something after 13:00 in the oldest log
[21:49] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[21:51] * dosaboy (~gizmo@ has joined #ceph
[21:53] <sagewk> paravoid: hmm. trying to reproduce the mon leak.. i have 96 osds and 19000 pgs on this cluster.
[21:54] <paravoid> kill 10 of them? :)
[21:54] <sagewk> no dice. maybe my objects are bigger or something
[21:54] * jlogan (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[21:55] <paravoid> can't reproduce the crash :((
[21:55] <paravoid> I've swapped crush like a dozen times now
[21:57] <sagewk> hrm. maybe set debug_osd to 0, debug_ms to 1, and try again?
[21:58] <paravoid> how come?
[21:58] <sagewk> ceph-mon holding steady at 87m
[21:58] <sagewk> i think the logging is slowing things down and preventing hte bug from triggering
[21:58] <sagewk> but the messages alone will give us some clue.
[21:59] <sagewk> and also get us back to the point where we can trigger the bug
[21:59] <paravoid> ceph osd tell \* injectargs '--debug-osd 0/0 --debug-ms 0/1' ?
[21:59] <sagewk> yeah
[22:00] <paravoid> done
[22:00] <paravoid> set old crush
[22:00] <paravoid> 5G used: root 17478 15.3 28.8 5065496 4738048 ? Ssl 20:42 2:49 /usr/bin/ceph-mon --cluster=ceph -i ms-fe1001 -f
[22:01] <paravoid> root 17478 16.8 49.8 8406804 8186004 ? Ssl 20:42 3:17 /usr/bin/ceph-mon --cluster=ceph -i ms-fe1001 -f
[22:01] <paravoid> lucky me
[22:03] <sagewk> wth... can you turn up logging on the mon for a bit, maybe there will be some clues there
[22:03] <sagewk> debug mon = 10, debug ms = 1
[22:03] <paravoid> now it says not in quorum
[22:03] <sagewk> over some period where it grows significantly
[22:03] <paravoid> lost monitors
[22:03] <paravoid> ah, the other one is OOMing
[22:05] <paravoid> and yet no crashes, amazing
[22:06] * denken (~denken@dione.pixelchaos.net) Quit (Remote host closed the connection)
[22:06] <paravoid> ok, running mon with debug mon 10, debug ms 1
[22:06] <paravoid> set new crush map
[22:07] <paravoid> leaking
[22:07] <paravoid> sagewk: log is 600MB, anything in particular that you'd like me to grep for?
[22:08] <sagewk> that's probably enough.. can you bzip and post somewhere?
[22:09] <via> liiwi: whats wrong with those switches?
[22:16] <janos> via: if you mean the ones from hours ago - i think he was suggesting they don't have enough lead in them
[22:16] <janos> but he was aiming to fix that at the range!
[22:17] <via> the dlink dgs 1210 24?
[22:17] <janos> i think so, yeah
[22:18] <lxo> sagewk, http://tracker.newdream.net/issues/1878 is the bug
[22:19] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[22:23] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[22:27] * nyeates (~nyeates@pool-173-59-239-231.bltmmd.fios.verizon.net) has joined #ceph
[22:34] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[22:38] * nz_monkey (~nz_monkey@ Quit (Remote host closed the connection)
[22:39] * nz_monkey (~nz_monkey@ has joined #ceph
[22:40] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[22:41] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:43] * The_Bishop (~bishop@e179007201.adsl.alicedsl.de) has joined #ceph
[22:43] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[22:47] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[22:53] * houkouonchi-work (~linux@ Quit (Ping timeout: 480 seconds)
[22:55] * vata (~vata@2607:fad8:4:6:dda8:cf80:46fe:873c) Quit (Quit: Leaving.)
[23:05] * houkouonchi-work (~linux@ has joined #ceph
[23:16] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[23:16] * jlogan (~Thunderbi@2600:c00:3010:1:ed7c:64e2:3954:4f7e) has joined #ceph
[23:18] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:18] * loicd (~loic@magenta.dachary.org) has joined #ceph
[23:27] * danieagle (~Daniel@ has joined #ceph
[23:31] * xiaoxi (~xiaoxiche@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[23:34] * BillK (~BillK@124-169-244-193.dyn.iinet.net.au) has joined #ceph
[23:40] * xiaoxi (~xiaoxiche@jfdmzpr05-ext.jf.intel.com) Quit (Ping timeout: 480 seconds)
[23:50] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[23:50] <xdeller> http://xdel.ru/downloads/ceph-log/allocation-failure/ - may be i`m wrong, but seems ceph revives old mysql numa-aware bug
[23:52] * nmartin (~nmartin@adsl-98-90-208-215.mob.bellsouth.net) Quit (Ping timeout: 480 seconds)
[23:56] * noob2 (~noob2@pool-71-244-111-36.phlapa.fios.verizon.net) Quit (Quit: Leaving.)
[23:57] * denken (~denken@dione.pixelchaos.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.