#ceph IRC Log


IRC Log for 2013-02-05

Timestamps are in GMT/BST.

[0:01] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[0:08] * noob2 (~noob2@pool-71-244-111-36.phlapa.fios.verizon.net) has joined #ceph
[0:12] <slang1> elder: http://pastebin.com/9W3j7vuD
[0:12] <slang1> elder: seeing that in the rbd tasks
[0:12] <elder> Is it causing errors?
[0:13] <slang1> elder: sage wanted me to ask about it to see if it's something you recognize
[0:13] <slang1> elder: those jobs are failing because of the EBUSY error
[0:13] <elder> Do you know about the ceph tests that are hanging?
[0:13] <elder> In nightly runs?
[0:13] <slang1> the rest of the job (in this case kernel untar) actually succeeded
[0:13] <slang1> elder: well actually - yeah these are hanging (not failing)
[0:13] <elder> I have seen messages like that, and they don't cause the test to fail.
[0:14] <elder> Oh.
[0:14] <elder> I don't know, I'm pretty scattered right now. What I've been trying to do all afternoon is reproduce the ceph failures to see if I can learn something about it.
[0:15] <elder> But in doing so I've been faced with changing my environment a bit, and am working with ceph and other things I'm not normally doing.
[0:15] <elder> So I'm making no headway and am feeling as lost as when I began, unfortunately.
[0:15] <slang1> elder: I've been able to reproduce the failure you're seeing with the key on mount
[0:15] <slang1> elder: I'm going to try with an older version of teuthology that doesn't have the changes I've made
[0:15] <elder> That would be great.
[0:15] <elder> I just don't know what things are problems of my own making.
[0:16] * slang1 nods
[0:16] <elder> I never started on solid ground, so everything seems squishy right now.
[0:32] <slang1> elder: I get that same error with an older version of teuthology, fwiw
[0:32] <elder> OK.
[0:33] <elder> Tell me this, is there any reason I can't run "ceph -s" on the client? (gregaf, I am trying to get "ceph -s" to work because you suggested it)
[0:33] <gregaf> elder; I think you ought to be able to do so, yes, though I'm not sure what caps it has available
[0:34] <elder> I'll try it.
[0:34] <dmick> if ceph -s won't work, it seems to bode ill for lots of other things
[0:35] <elder> ubuntu@plana18:/tmp/cephtest$ ceph -s
[0:35] <elder> health HEALTH_OK
[0:35] <elder> monmap e1: 1 mons at {0=}, election epoch 2, quorum 0 0
[0:35] <elder> osdmap e4: 2 osds: 2 up, 2 in
[0:35] <elder> pgmap v7: 24 pgs: 24 active+clean; 0 bytes data, 282 GB used, 572 GB / 900 GB avail
[0:35] <elder> mdsmap e2: 0/0/0 up
[0:36] <elder> What does that tell me?
[0:37] <dmick> looks good to me, as long as you didn't want any MDSes
[0:37] <elder> OK, well I think this is a test that's trying to mount a file system, isn't it gregaf?
[0:37] <elder> So maybe we'd want at least one mds?
[0:38] <gregaf> yeah, that'd be a problem
[0:39] <elder> Does that tell anybody anything about the source of the problem? I'm just trying to identify whether any changes I made to the osd client could be contributing.
[0:39] <gregaf> it seems unlikely that the kernel is at fault here
[0:39] <elder> This--at the moment--seems to say "no," but I'm not going to rest on that.
[0:39] <elder> Yay!!!
[0:39] <dmick> er, if you want to run kclient, you need at least one mds, right?
[0:39] <gregaf> :)
[0:39] <dmick> osm
[0:40] <elder> Wait a minute
[0:40] <dmick> oops I mean: isn't kclient the ceph client?
[0:40] <elder> gregaf
[0:40] <elder> The thing you gave me had no mds defined
[0:40] <gregaf> ...*sigh*
[0:40] <gregaf> *kick*
[0:41] <elder> So... Is that reproducing the nightly problems at all, or is that something different completely?
[0:41] <dmick> better protect myself
[0:41] <gregaf> just for the record, what I said today in standup was that it wasn't working and I needed to look into why, and then Sage said none of them were working in the nightlies
[0:41] * ChanServ sets mode +o dmick
[0:41] <gregaf> so I should wait until those got working to deal with it myself
[0:41] <elder> My aim is to try to help understanding what's going on in the nightlies.
[0:41] <gregaf> yeah
[0:41] <slang1> elder: the nightlies are the EBUSY error I mentioned
[0:41] <slang1> elder: completely separate
[0:41] <elder> That's what they're all doing?
[0:42] <gregaf> so, add an MDS to that config and then go try again
[0:42] <slang1> elder: looks that way
[0:42] <elder> I thought the file system mounts were having trouble too.
[0:42] <elder> So I was looking for something else. I can focus in on rbd much more easily.
[0:43] <elder> Is there a bug open on this?
[0:43] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[0:43] <slang1> not that I know of
[0:44] <elder> ircolle are you around?
[0:45] <ircolle> hmmm, why do you ask? ;-)
[0:46] <elder> Because you're the one who keeps saying everything we do should have a tracker issue associated with it.
[0:46] <elder> Sage pointed me at this: http://pastebin.com/kjgLZAkm
[0:46] <elder> It indicates a file system problem. I've been trying to do something to either rule out my changes or find out they are at fault and fix them.
[0:47] <elder> But I've been getting nowhere.
[0:47] <elder> slang1 believes the problems are just rbd, which is different.
[0:47] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) has joined #ceph
[0:47] <elder> I'll be happy to just quit working on this for a while, it has not been a very fun afternoon...
[0:47] <elder> And there is evidently no tracker issue.
[0:50] <elder> So I guess ircolle, I'm wondering whether you have a recommendation on how I should proceed.
[0:51] <ircolle> this is kinda a big deal
[0:52] <ircolle> so, I could open a teuthology ticket if we need it, but don't let the process drive this - more important to get this working
[0:52] * scuttlemonkey (~scuttlemo@ has joined #ceph
[0:52] * ChanServ sets mode +o scuttlemonkey
[0:52] <ircolle> This is blocking a bunch of people
[0:52] <elder> I'm not sure I am the right person to be addressing this.
[0:52] <elder> At least not alone.
[0:53] <elder> I've been getting scattered help all afternoon, and it's all very much appreciated, but I don't think it's been effective.
[0:53] <elder> Someone else who knows more about how to troubleshoot ceph should be driving the bus.
[0:53] <slang1> elder: I'm looking into it, sage just wanted to see if the EBUSY error triggered anything in your recent memory
[0:53] <elder> That's fine, but that's not the message I got from him earlier.
[0:54] <elder> I want to participate in getting this fixed though, because we just dropped a huge pile of new code.
[0:54] <elder> So I was anticipating problems, just not in the file system.
[0:55] <elder> And the EBUSY thing could very well be new, I'd have to investigate it, but again, the whole I/O path in rbd is new.
[0:56] <slang1> I'll create a ticket so it can be tracked/assigned/whatever...
[0:56] <elder> So I asked the question about whether there was a bug open.
[0:56] <elder> Yes.
[0:56] <elder> Because that's a mechanism that allows a little more precision in who is responsible and who is doing what, and what we know and what has been tried.
[0:57] * PerlStalker (~PerlStalk@ Quit (Quit: ...)
[0:57] <elder> I didn't realize you were looking into this stuff too slang1 .
[0:58] <slang1> elder: I'm just trying to get teuthology working again
[0:58] <elder> Is that something different from the nightly hangs?
[0:58] <slang1> elder: it's not clear the rbd thing isn't caused by the changes to teuthology - once I've eliminated that as an option, I'll probably hand it off
[0:58] <slang1> elder: as I'm not the right person to debug it
[0:59] <elder> OK, so should I wait until you've validated your own changes somehow?
[0:59] <elder> Are you sure there's actually a problem?
[0:59] <slang1> elder: it's a little complicated to answer the 'nightly hangs' question, Sage and I have been killing scheduled runs and restarting with every fix that has gone into teuthology
[1:00] <slang1> elder: many of the killed runs probably showed up (for folks like greg, don't know if there are others) as hangs
[1:00] <slang1> elder: but the rbd EBUSY error is also a hang of specific tasks from the latest scheduled run
[1:01] <elder> Sounds chaotic. Are you able to validate teuthology independent of whether rbd, or ceph, or whatever, has problems?
[1:02] <slang1> elder: I think I can verify that the rbd problems occur on an older version of teuthology
[1:02] <slang1> elder: if so, then we can eliminate teuthology as the cause of those errors
[1:02] <elder> Would you like me to dive into that rbd EBUSY thing then?
[1:02] <slang1> elder: if it seems like it's rbd related, I think that would be good
[1:03] <elder> OK. I just want to be constructive, and so far I'm not sure I have been...
[1:03] <slang1> elder: does it seem rbd related?
[1:03] <elder> So that's what I'll do. It does seem rbd related, and as I said, I just replaced the guts of rbd so was looking for what might happen.
[1:03] <elder> I'll go look, but do you have handy the identity of one of these?
[1:04] <slang1> bug #4003
[1:04] <elder> OK.
[1:04] <elder> Thanks.
[1:04] <elder> Well there it is, it just showed up in my mail box!
[1:04] * KindTwo (~KindOne@ has joined #ceph
[1:04] <slang1> yeah just made it
[1:05] <elder> Actually that does seem familiar, but I'll look at it a little more closely.
[1:05] <elder> I am going to cease trying to figure out anything else (e.g., related to hangs of ceph file system tests)
[1:05] * KindOne (KindOne@h166.214.89.75.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[1:05] * KindTwo is now known as KindOne
[1:06] * jlogan (~Thunderbi@ Quit (Read error: Connection reset by peer)
[1:06] * jlogan (~Thunderbi@ has joined #ceph
[1:07] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[1:07] * scuttlemonkey (~scuttlemo@ Quit (Quit: This computer has gone to sleep)
[1:13] <elder> Thanks slang1 for your help, you cleared things up a lot.
[1:13] <slang1> np!
[1:20] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[1:21] * xiaoxi (~xiaoxiche@ has joined #ceph
[1:21] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[1:27] * aliguori (~anthony@cpe-70-112-157-151.austin.res.rr.com) Quit (Quit: Ex-Chat)
[1:39] * ScOut3R (~scout3r@540079A1.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[1:56] <gregaf> elder: fyi adding an MDS to that yaml config makes it work out just fine :)
[1:57] <gregaf> (in case that was still tickling your brain at all)
[1:58] <elder> That's a relief.
[1:58] <elder> :)
[1:58] <elder> I guess I diagnosed a problem then...
[1:59] <elder> Not the one I was after, but...
[2:03] * LeaChim (~LeaChim@027ee384.bb.sky.com) Quit (Ping timeout: 480 seconds)
[2:06] <yasu`> dmick ?
[2:09] <dmick> hello yasu`
[2:09] <dmick> send_pg_creates help at all?
[2:09] <yasu`> it didn't help...
[2:09] <yasu`> how to confirm that the send message gets delivered ?
[2:10] <yasu`> ceph -w didn't show it
[2:10] <dmick> you have debug mon = 20, so there should be things in the monitor log
[2:10] <dmick> look for send_pg_creates
[2:15] <yasu`> hmm, log file='/ceph/var/log/...' but no such directory existed, and /var/log/ceph.log is growing ...
[2:15] <yasu`> Let me restart the mon after creating '/ceph/var/log/'.
[2:19] <yasu`> mon.0@0(leader).pg v86190 send_pg_creates 2.0 -> no osds in epoch 295, skipping
[2:21] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[2:21] * loicd (~loic@magenta.dachary.org) has joined #ceph
[2:22] <xiaoxi> rados bench 300 --pool test seq
[2:22] <xiaoxi> is this the right command for rados read test?
[2:23] * wschulze1 (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[2:23] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Read error: Connection reset by peer)
[2:23] <dmick> yasu`: ok, I see that in the code
[2:24] <dmick> xiaoxi: you have to have data to read; I believe that means running rados bench write first with --no-cleanup
[2:25] <dmick> we could have better docs there
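dmick's point about needing data in the pool before a read test can be sketched roughly as below. This is only an illustration under stated assumptions: the pool name `test` and the 300-second duration come from xiaoxi's command, the helper name `rados_bench_read_plan` is hypothetical, and the exact placement of `--pool` on the command line may differ between versions. The code only builds the two argument lists in order; it does not run anything.

```python
def rados_bench_read_plan(pool, seconds):
    """Return the write and read invocations, in order, as argv lists.

    Hypothetical helper for illustration: it builds commands, it does not run them.
    """
    # Write phase: --no-cleanup leaves the benchmark objects in the pool
    # so that a later read test has something to read.
    write_cmd = ["rados", "--pool", pool, "bench", str(seconds), "write", "--no-cleanup"]
    # Read phase: sequential reads of the objects left behind by the write phase.
    seq_cmd = ["rados", "--pool", pool, "bench", str(seconds), "seq"]
    return [write_cmd, seq_cmd]


if __name__ == "__main__":
    for cmd in rados_bench_read_plan("test", 300):
        print(" ".join(cmd))
```

Keeping the plan as plain lists makes the ordering (write first, then seq) explicit and easy to check before anything touches a cluster.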
[2:25] * alram (~alram@ Quit (Quit: leaving)
[2:26] <xiaoxi> dmick:thx
[2:30] <dmick> yasu`: I'm guessing ./ceph pg map 2.0 says up [] acting [] then?
[2:31] <dmick> sorry, drop the ./, force of habit
[2:34] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[2:36] <yasu`> yes
[2:37] <yasu`> osdmap e299 pg 2.0 (2.0) -> up [] acting []
[2:37] <dmick> hm. so something must still be wrong with the crushmap, but I don't know what
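The `ceph pg map` output that yasu` pasted can be checked mechanically. The sketch below is an assumption-laden illustration: the regular expression is written against the single line shown in this log, and the function name `parse_pg_map` is hypothetical. It treats empty `up` and `acting` sets as the sign that the PG is not mapped to any OSD.

```python
import re

# Pattern written against the one output line quoted in the log;
# other Ceph versions may format this line differently.
PG_MAP_RE = re.compile(
    r"osdmap e(?P<epoch>\d+) pg (?P<pgid>\S+) \(\S+\) -> "
    r"up \[(?P<up>[^\]]*)\] acting \[(?P<acting>[^\]]*)\]"
)


def parse_pg_map(line):
    """Parse one 'ceph pg map' output line into a dict (hypothetical helper)."""
    m = PG_MAP_RE.search(line)
    if m is None:
        raise ValueError("unrecognized pg map line: %r" % line)

    def ids(text):
        # "" -> [], "1,0" -> [1, 0]
        return [int(x) for x in text.split(",") if x.strip()]

    up = ids(m.group("up"))
    acting = ids(m.group("acting"))
    return {
        "epoch": int(m.group("epoch")),
        "pgid": m.group("pgid"),
        "up": up,
        "acting": acting,
        # A PG with no up and no acting OSDs has nowhere to be created.
        "unmapped": not up and not acting,
    }


if __name__ == "__main__":
    print(parse_pg_map("osdmap e299 pg 2.0 (2.0) -> up [] acting []"))
```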
[2:37] <yasu`> seems right, isn't it ?
[2:37] <yasu`> the crushmap
[2:38] <dmick> are you sure it's in the cluster correctly? How did you push it in?
[2:39] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[2:39] * loicd (~loic@magenta.dachary.org) has joined #ceph
[2:39] <yasu`> the one I pastebinned was retrieved by ceph osd getcrushmap
[2:40] <yasu`> does it mean it is in the cluster ?
[2:41] <yasu`> the push was done by ceph osd setcrushmap -i
[2:41] * wschulze1 (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:48] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[2:50] <elder> Is there any way to determine who has a watch on an object?
[2:52] <joshd> elder: unfortunately not without manually inspecting osd internals
[2:52] <elder> OK.
[2:53] <elder> I'll try to trace what the kernel client is doing to be sure, but it looks to me like it's dropping its watch request so I don't know why there's this extra watch hanging around.
[2:53] <elder> I have to quit for today though.
[2:54] <dmick> elder: you can trace watch ops at least and verify that someone took one and didn't release it (right joshd)?
[2:56] <joshd> yeah, you could certainly do that on the osd side, logging from the kernel may be more difficult to get
[2:59] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[3:02] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[3:02] * loicd (~loic@magenta.dachary.org) has joined #ceph
[3:12] * jlogan (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[3:18] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[3:26] <dmick> yasu`: sage's mail makes me wonder what ceph osd dump says
[3:26] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[3:26] <dmick> specifically about 'rep size' on the pools
[3:27] <dmick> (or you could do ceph osd pool get rbd size too)
[3:28] <yasu`> http://pastebin.com/VQKzJHLk
[3:28] <yasu`> sage's command looks like it is fixing the problem
[3:29] <yasu`> the cluster is doing deep scrub now
[3:29] <dmick> yeah, it was set to size 2
[3:29] <dmick> doh. nice call sage :)
[3:29] <yasu`> but the ceph -s became HEALTH_OK
[3:29] <yasu`> Indeed :)
[3:29] <xiaoxi> hi, is there any body also suffering from read performance?
[3:30] <yasu`> and I'm sorry, I should have pastebinned the ceph osd dump also.
[3:30] <nhm> xiaoxi: same problem as before?
[3:30] <yasu`> so, the rep size should be consistent for all pools ?
[3:30] * Matt (matt@matt.netop.oftc.net) has joined #ceph
[3:31] <dmick> yasu`: well I should have thought to ask about it sooner
[3:31] <dmick> no, but
[3:31] <xiaoxi> nhm:hah, you are online. I have done rados test just before, still 128K req-size in OSD's disk
[3:31] <dmick> it can't fall outside the bounds of min_size/max_size
[3:31] <nhm> xiaoxi: just put the kids to bed.
[3:31] <nhm> xiaoxi: did you try changing the readahead value?
[3:33] <dmick> think of those as bounds, not constraints
[3:33] <yasu`> dmick: I use the ceph only in CephFS. so the pool 2 (rbd) is not used, right ?
[3:33] <xiaoxi> nhm:not yet, what value would you like? increase to 512K? or set it to 0?
[3:33] <dmick> RIGHT
[3:33] <dmick> yikes sorry. Right.
[3:33] <nhm> xiaoxi: try 512
[3:34] <yasu`> got it. so I should fix min/max to 1/10 for all pool.
[3:34] <dmick> something like that
[3:34] <yasu`> Thank you very much dmick !
[3:34] <dmick> heh, thank sage :) but yw
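dmick's description of min_size/max_size as bounds on the pool's rep size can be illustrated with a tiny check. This is a sketch, not Ceph code: the function name `rule_matches_pool` is hypothetical, the pool size 2 and the 1/10 bounds come from the conversation, and the bounds 3 and 10 in the second call are made up purely for contrast, since the log does not say what the rule's original bounds were.

```python
def rule_matches_pool(pool_size, rule_min_size, rule_max_size):
    """Return True if the pool's replica count falls within the rule's bounds.

    Hypothetical helper illustrating the bounds check described in the chat.
    """
    return rule_min_size <= pool_size <= rule_max_size


if __name__ == "__main__":
    # Pool size 2 with the 1/10 bounds yasu` plans to set.
    print(rule_matches_pool(2, 1, 10))
    # Same pool size against made-up bounds that exclude it.
    print(rule_matches_pool(2, 3, 10))
```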
[3:38] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[3:48] <xiaoxi> nhm: changing it to 512 really did set avgrq-sz to 1024
[3:48] <xiaoxi> but it doesn't benefit performance
[3:51] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[3:51] * loicd (~loic@magenta.dachary.org) has joined #ceph
[3:51] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[3:55] * chutzpah (~chutz@ Quit (Quit: Leaving)
[4:03] * Cube1 (~Cube@ Quit (Quit: Leaving.)
[4:10] * yasu` (~yasu`@dhcp-59-166.cse.ucsc.edu) Quit (Remote host closed the connection)
[4:10] <nhm> xiaoxi: Yeah, I figured that was probably a red herring.
[4:13] <xiaoxi> nhm: I have tried to pagecache everything so the reads are served from pagecache; it performs quite well, so it's hard to blame the code efficiency.
[4:14] <xiaoxi> and a big read_ahead seems to at least benefit sequential read requests, not sure why
[4:19] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[4:19] * loicd (~loic@magenta.dachary.org) has joined #ceph
[4:22] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[4:22] <nhm> xiaoxi: I've seen some indication that increasing read_ahead_kb can improve performance.
[4:23] <xiaoxi> nhm: Not sure yet. but I bet increasing this is a tradeoff,it will drop IOPS
[4:24] <nhm> xiaoxi: Certainly possible.
[4:33] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:36] * xiaoxi (~xiaoxiche@ Quit (Ping timeout: 480 seconds)
[4:44] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[4:46] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:46] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[4:50] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[4:50] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[4:52] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[5:27] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[5:28] * tsygrl (~tsygrl314@c-75-68-140-25.hsd1.vt.comcast.net) Quit (Remote host closed the connection)
[5:49] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[5:50] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[6:03] * yoshi (~yoshi@p6124-ipngn1401marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[6:07] * yoshi (~yoshi@p6124-ipngn1401marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[6:13] * yoshi (~yoshi@p6124-ipngn1401marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[6:21] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[6:34] * The_Bishop (~bishop@e177091119.adsl.alicedsl.de) has joined #ceph
[6:39] * xiaoxi (~xiaoxiche@jfdmzpr01-ext.jf.intel.com) has joined #ceph
[6:40] <xiaoxi> hi, I have seen 4 of my pgs stay in peering state for quite a long time; restarting the corresponding osd daemons solves the issue, but the osds are in "up and in" status.
[6:43] <sage> can you 'ceph pg <pgid> query' on a hung pg?
[6:44] <paravoid> hey sage
[6:44] <sage> hey paravoid
[6:44] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[6:44] <paravoid> isn't it a bit late for you?
[6:44] * loicd (~loic@magenta.dachary.org) has joined #ceph
[6:45] <sage> 9:45pm
[6:45] <sage> last chance to get something done today :)
[6:45] <paravoid> heh
[6:45] <paravoid> okay, I won't ramble on how my cluster is totally fucked again then :)
[6:46] <sage> aie.. what now?
[6:46] <paravoid> heh
[6:46] <paravoid> 100% active+clean, syncing for a few days, then started flapping pgs a lot, for several hours
[6:47] <paravoid> I was away at FOSDEM, so I just left it like that
[6:47] <paravoid> it ended up having 6 pgs active, the rest active+clean
[6:47] <paravoid> 4 of them shared an osd, so I "out" that one, then OSDs started flapping all over again
[6:47] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[6:47] <sage> you didn't by chance have logging on?
[6:47] <paravoid> nope
[6:48] <paravoid> that's the executive summary
[6:48] <paravoid> it's still a bit like that
[6:48] <paravoid> we can debug it tomorrow if it stays that way :)
[6:48] <sage> 'ceph pg <pgid> query' is the thing to do to see what the state of a weird pg is
[6:48] <sage> still flapping?
[6:49] <paravoid> I did osd out now and a bunch of them flapped, yes
[6:53] <sage> still flapping, or did it stabilize?
[6:54] <sage> actually, what do you mean by flap? the pg states should go through a bunch of states with osd in or out as data is moved around
[6:54] <xiaoxi> sage: sorry, I can't, one of my teammates just rebooted the daemon... but in general, what can cause a pg to stall in peering state? osd A goes down and the recovery process is still under way, then OSD B goes down, OSD A comes back, and before it has recovered, OSD B comes up again?
[6:54] <paravoid> I mean "down"
[6:55] <paravoid> like 40 out of 144 suddenly down
[6:55] <sage> xiaoxi: nothing should cause it; there is probably a bug. if it happens again, doing the pg query can give us some clues. or in a perfect world, you would have logs leading up to it and we can see how it happened.
[6:55] <paravoid> 2013-02-01 07:58:47.554272 7fb8cf886700 0 log [INF] : osdmap e147544: 144 osds: 143 up, 144 in
[6:55] <paravoid> 2013-02-01 07:58:54.204854 7fb8cf886700 0 log [INF] : osdmap e147549: 144 osds: 139 up, 144 in
[6:55] <paravoid> 2013-02-01 07:58:59.048367 7fb8cf886700 0 log [INF] : osdmap e147552: 144 osds: 114 up, 144 in
[6:55] <paravoid> 2013-02-01 07:59:09.353609 7fb8cf886700 0 log [INF] : osdmap e147559: 144 osds: 109 up, 144 in
[6:55] <paravoid> etc
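The osdmap lines paravoid pasted show the "up" count collapsing within seconds. A rough way to pull that out of a monitor log is sketched below. It assumes the `osdmap eN: X osds: Y up, Z in` wording seen in these four lines; the function names are hypothetical, and this is an illustration rather than a supported tool.

```python
import re

# Matches the osdmap summary wording seen in the pasted log lines.
OSDMAP_RE = re.compile(r"osdmap e(\d+): (\d+) osds: (\d+) up, (\d+) in")


def parse_osdmap_lines(lines):
    """Extract (epoch, total, up, in) samples from log lines, in order."""
    samples = []
    for line in lines:
        m = OSDMAP_RE.search(line)
        if m:
            epoch, total, up, in_count = (int(g) for g in m.groups())
            samples.append({"epoch": epoch, "total": total, "up": up, "in": in_count})
    return samples


def largest_drop(samples):
    """Return (epoch_before, epoch_after, drop) for the biggest fall in 'up', or None."""
    best = None
    for prev, cur in zip(samples, samples[1:]):
        drop = prev["up"] - cur["up"]
        if drop > 0 and (best is None or drop > best[2]):
            best = (prev["epoch"], cur["epoch"], drop)
    return best


if __name__ == "__main__":
    log = [
        "2013-02-01 07:58:47.554272 7fb8cf886700 0 log [INF] : osdmap e147544: 144 osds: 143 up, 144 in",
        "2013-02-01 07:58:54.204854 7fb8cf886700 0 log [INF] : osdmap e147549: 144 osds: 139 up, 144 in",
        "2013-02-01 07:58:59.048367 7fb8cf886700 0 log [INF] : osdmap e147552: 144 osds: 114 up, 144 in",
        "2013-02-01 07:59:09.353609 7fb8cf886700 0 log [INF] : osdmap e147559: 144 osds: 109 up, 144 in",
    ]
    print(largest_drop(parse_osdmap_lines(log)))
```

On the four lines above, the biggest single step is the fall from 139 to 114 OSDs up between epochs 147549 and 147552.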
[6:56] <sage> ah.
[6:56] <paravoid> and this resulted in a complete DoS of course
[6:57] <sage> did you see any 'marked down after 3 reports' type messages?
[6:57] <paravoid> I don't remember that, but I do remember a lot of "wrongly marked me down"
[6:57] <sage> did they actually stop/crash, or come back up on their own?
[6:57] <sage> gotcha
[6:57] <paravoid> I don't think they crashed
[6:57] <sage> yeah
[6:58] <sage> that means they weren't responding to the internal heartbeats for long enough to get marked down.. which is 10s of seconds.
[6:58] <paravoid> I'm still seeing a lot of "wrong node"
[6:58] <sage> oh, interesting!
[6:59] <paravoid> peering still takes ages
[6:59] <sage> what version are you running?
[6:59] <paravoid> I upgraded to 0.56.2 before I added the new OSDs
[6:59] <sage> k. i'm going to prepare a branch with the msgr changes so we can see if the wrong node messages go away.
[7:00] <paravoid> okay
[7:00] <sage> tho i'm worried that if peering is slow something else is still going on.
[7:06] <xiaoxi> paravoid: Seems I have suffered the same things several days before.
[7:06] <paravoid> which one?
[7:06] <sage> xiaoxi: the 'wrong node' msgs or peering?
[7:06] <sage> paravoid: pushed wip-bobtail-osd-msgr
[7:06] <xiaoxi> wrong node and "wrongly mark me down"
[7:08] <xiaoxi> but after fixing all the HW issues (slow disk bay), it seems OK now. This may happen when the load is too high, or as you said before, xfs hanging the osd thread
[7:09] <sage> did you see the 'wrong node' msg once or twice, or was it a repeated stream of messages in the log?
[7:10] <xiaoxi> it is repeated
[7:10] <paravoid> ceph-osd.121.log.3.gz:2013-02-01 06:34:49.027499 7f69e690b700 0 log [WRN] : map e145855 wrongly marked me down
[7:10] <paravoid> ceph-osd.121.log.3.gz:2013-02-01 06:35:53.835164 7f69e690b700 0 log [WRN] : map e145874 wrongly marked me down
[7:10] <paravoid> ceph-osd.121.log.3.gz:2013-02-01 06:38:42.739386 7f69e690b700 0 log [WRN] : map e145938 wrongly marked me down
[7:10] <paravoid> ceph-osd.121.log.3.gz:2013-02-01 09:20:11.106433 7f69e690b700 0 log [WRN] : map e149582 wrongly marked me down
[7:11] <paravoid> ceph-osd.121.log.3.gz:2013-02-01 14:30:43.281155 7f69e690b700 0 log [WRN] : map e155686 wrongly marked me down
[7:11] <paravoid> etc.
[7:11] <paravoid> in other OSDs I mean
[7:11] <paravoid> multiple occurrences of that
[7:11] * Psi-jack (~psi-jack@psi-jack.user.oftc.net) Quit (Ping timeout: 480 seconds)
[7:13] <sage> paravoid: i mean the "... - wrong node!" message in /var/log/ceph/ceph-osd.NNN.log...
[7:13] <sage> tho if it is what i think it is, it may be contributing to the wrongly marked down issue
[7:13] <sage> i'm hoping the wip-bobtail-osd-msgr branch fixes it...
[7:13] <sage> xiaoxi: are you running bobtail or master?
[7:14] <paravoid> oh sorry
[7:14] <xiaoxi> sage: bobtail 0.56.2, but actually I saw the log on master
[7:15] <sage> also repeating there?
[7:15] <paravoid> wrong node repeating a lot
[7:16] <paravoid> and when I say a lot...
[7:16] <paravoid> # grep -c 'wrong node' *.log | xargs
[7:16] <paravoid> ceph-osd.120.log:11449 ceph-osd.121.log:51789 ceph-osd.122.log:0 ceph-osd.123.log:355210 ceph-osd.124.log:126078 ceph-osd.125.log:0 ceph-osd.126.log:154661 ceph-osd.127.log:5725 ceph-osd.128.log:17315 ceph-osd.129.log:108771 ceph-osd.130.log:74558 ceph-osd.131.log:11359
[7:17] <paravoid> these are logrotated and the system runs on UTC, so almost 24h
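The one-line `grep -c ... | xargs` output above is easier to read once ranked. The sketch below parses the `file:count` pairs exactly as pasted and sorts them; the function names are hypothetical and nothing here is part of Ceph.

```python
def parse_grep_counts(text):
    """Turn whitespace-separated 'file:count' tokens into a dict."""
    counts = {}
    for token in text.split():
        # Split on the last colon so file names containing ':' still work.
        name, _, count = token.rpartition(":")
        counts[name] = int(count)
    return counts


def rank_counts(counts):
    """Return (file, count) pairs sorted from most to fewest matches."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    pasted = (
        "ceph-osd.120.log:11449 ceph-osd.121.log:51789 ceph-osd.122.log:0 "
        "ceph-osd.123.log:355210 ceph-osd.124.log:126078 ceph-osd.125.log:0 "
        "ceph-osd.126.log:154661 ceph-osd.127.log:5725 ceph-osd.128.log:17315 "
        "ceph-osd.129.log:108771 ceph-osd.130.log:74558 ceph-osd.131.log:11359"
    )
    for name, count in rank_counts(parse_grep_counts(pasted)):
        print(name, count)
```

Ranked this way, ceph-osd.123.log stands out with 355210 matches, while ceph-osd.122.log and ceph-osd.125.log have none.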
[7:17] <sage> xiaoxi: sorry, just to confirm, it repeated over and over on the master branch too?
[7:17] <sage> by master, i mean a version that included b98da75a621ee91be1e41fca0325e8e5f6ae8741 ?
[7:19] <xiaoxi> sage: It repeated on the master branch (that was 10 days ago). But after fixing the HW issue and moving to 0.56.2, I have never seen this again
[7:19] <sage> yeah. osd only, right? or have you seen it on the mon?
[7:19] <sage> do you know if by chance it was before or after that commit? :)
[7:19] <sage> it was about 10 days ago... trying to figure out if it might have fixed it.
[7:20] <xiaoxi> sage: I am not sure of the version, but it's the Ceph 1/24 daily build
[7:20] <sage> xiaoxi: ok cool, the (hopefully!) fix was merged 1/25.
[7:20] <sage> so with some luck, the backport of those patches will resolve it on bobtail. hard to tell, because it is probably difficult to reliably reproduce...
[7:21] <sage> paravoid: ... or is it? :-) if you can reliably make it happen after restarting an osd, we can then test whether it is fixed by wip-bobtail-osd-msgr
[7:22] <paravoid> ceph out/in has worked so far :)
[7:23] <sage> opened to track this http://tracker.ceph.com/issues/4006
[7:24] <paravoid> assuming the "wrong node" is related to "wrongly marked me down"
[7:25] <paravoid> I see wrong node a lot, but marking down doesn't happen *that* often
[7:25] <sage> hopefully, but not necessarily.
[7:25] <paravoid> but when it does, it happens all over the place
[7:25] <paravoid> like 1/3 of the cluster
[7:25] <sage> the wrong node is the more immediate concern, i think.
[7:25] <sage> k
[7:26] <paravoid> well, the marking down thing produces a multiple hour outage for me and I've had this happen 3-4 times
[7:26] <paravoid> so it's a concern as well :)
[7:29] <sage> yeah definitely
[7:31] <sage> thanks guys, turning in.. ttyl
[7:31] <xiaoxi> sage: any solution for "slow request" followed by " heartbeat_map is_healthy 'OSD::op_tp thread 0x7fa63bc4e700' had timed out after 15"
[7:31] * scuttlemonkey (~scuttlemo@ has joined #ceph
[7:31] * ChanServ sets mode +o scuttlemonkey
[7:34] <xiaoxi> the heartbeat is a separate thread, why would slow requests result in a heartbeat issue?
[7:37] * nz_monkey (~nz_monkey@ Quit (Remote host closed the connection)
[7:38] * nz_monkey (~nz_monkey@ has joined #ceph
[7:40] * scuttlemonkey (~scuttlemo@ Quit (Quit: This computer has gone to sleep)
[8:13] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:41] * KindTwo (~KindOne@ has joined #ceph
[8:41] * sleinen (~Adium@user-23-11.vpn.switch.ch) has joined #ceph
[8:43] * KindOne (~KindOne@ Quit (Ping timeout: 480 seconds)
[8:43] * KindTwo is now known as KindOne
[8:51] * sleinen (~Adium@user-23-11.vpn.switch.ch) Quit (Quit: Leaving.)
[8:55] * KindOne (~KindOne@ Quit (Ping timeout: 480 seconds)
[8:56] * KindOne (KindOne@h108.33.28.71.dynamic.ip.windstream.net) has joined #ceph
[9:09] * hybrid512 (~w.moghrab@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:12] * loicd (~loic@lvs-gateway1.teclib.net) has joined #ceph
[9:12] * sleinen (~Adium@2001:620:0:26:c61:f386:6c68:c94c) has joined #ceph
[9:20] * BManojlovic (~steki@ has joined #ceph
[9:27] * leseb (~leseb@mx00.stone-it.com) has joined #ceph
[9:39] * ShaunR (~ShaunR@staff.ndchost.com) Quit (Read error: Connection reset by peer)
[9:39] * ShaunR (~ShaunR@staff.ndchost.com) has joined #ceph
[9:40] * ScOut3R (~ScOut3R@ has joined #ceph
[9:46] * espeer (~espeer@105-236-207-22.access.mtnbusiness.co.za) has joined #ceph
[9:46] <espeer> hello, is there anyone here who can answer some questions about cephfs mount points?
[9:47] <espeer> like, what defines the size of the distributed file system?
[10:06] <madkiss> how do I alter the capabilities bestowed to a certain user in the keyring?
[10:09] <madkiss> according to the docs, it's "ceph-authtool -n client.admin …"
[10:09] <madkiss> which doesn't work, because it keeps asking for a keyfile
[10:16] <madkiss> http://ceph.com/docs/master/rados/operations/auth-intro/
[10:16] <madkiss> ceph-authtool -n client.foo --cap osd 'allow rwx' pool=customer-pool
[10:16] <madkiss> does. not. work. in. bobtail.
[10:17] <madkiss> It's "ceph-authtool /etc/ceph/keyring.cinder -n client.cinder --cap osd 'allow rwx pool=images'" I guess
[10:19] * BillK (~BillK@124-148-101-34.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[10:28] * BillK (~BillK@ has joined #ceph
[10:29] * ScOut3R_ (~ScOut3R@ has joined #ceph
[10:30] * yoshi (~yoshi@p6124-ipngn1401marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:31] * LeaChim (~LeaChim@027ee384.bb.sky.com) has joined #ceph
[10:31] * ScOut3R__ (~ScOut3R@dslC3E4E249.fixip.t-online.hu) has joined #ceph
[10:36] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[10:38] * ScOut3R_ (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[10:39] * yoshi (~yoshi@p6124-ipngn1401marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[10:41] * espeer (~espeer@105-236-207-22.access.mtnbusiness.co.za) Quit (Remote host closed the connection)
[10:48] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[10:55] * ninkotech (~duplo@ip-89-102-24-167.net.upcbroadband.cz) has joined #ceph
[10:58] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[11:06] * Disconnected.
[14:43] <- *madkiss* jup.

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.