#ceph IRC Log

IRC Log for 2011-09-12

Timestamps are in GMT/BST.

[0:07] * verwilst (~verwilst@d51A5B19B.access.telenet.be) Quit (Quit: Ex-Chat)
[3:36] * yoshi (~yoshi@p15222-ipngn101marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[10:08] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[10:20] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Read error: Operation timed out)
[10:26] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[11:37] * yoshi (~yoshi@p15222-ipngn101marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[13:52] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[14:10] * drnexus (~drnexus@branch.inria.fr) Quit (Quit: Lost terminal)
[14:26] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[14:50] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[16:15] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Remote host closed the connection)
[16:25] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[17:23] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[17:52] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:03] <slang> I'm still having problems getting all of my pgs out of bad states (crashed+peering, peering, etc.)
[18:06] * gregaf (~Adium@aon.hq.newdream.net) Quit (Remote host closed the connection)
[18:07] * gregaf (~Adium@aon.hq.newdream.net) has joined #ceph
[18:07] * gregaf (~Adium@aon.hq.newdream.net) Quit ()
[18:12] * gregaf (~Adium@aon.hq.newdream.net) has joined #ceph
[18:12] <slang> One example pg is 0.1p17. The four osds it's on are: [17,24,34,12]. I tried marking osd 17 down with debugging enabled; this is the log: http://dl.dropbox.com/u/18702194/osd.17.log
[18:13] <slang> err...sorry pg = 1.0p17
[18:17] <slang> it looks like that pg does not have the master log (!hml)
[18:19] <slang> but it's also empty (I think?)
[18:20] <slang> could it be that the primary for this pg got taken over by osd17 when I lost those 5 osds, but (because it was empty) the master log didn't get generated?
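
(A rough way to see which pgs are stuck in states like these is to filter the output of ceph pg dump. A minimal sketch in Python, assuming the plain-text dump format where each pg line includes its state string and that the ceph tool is on the PATH; the state list is illustrative:)

    #!/usr/bin/env python
    # Sketch: list pgs whose state looks stuck (crashed, peering, etc.).
    # Assumes `ceph pg dump` prints one line per pg containing its state
    # string; adjust BAD_STATES for whatever you consider "bad".
    import subprocess

    BAD_STATES = ("crashed", "peering", "down", "incomplete")

    dump = subprocess.check_output(["ceph", "pg", "dump"]).decode("utf-8", "replace")
    for line in dump.splitlines():
        if any(state in line for state in BAD_STATES):
            print(line)
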
[18:21] <gregaf> slang: you're going to have to wait a bit -- sage is out of the office this week and sjust isn't in right now (sorry)
[18:22] <slang> gregaf: ok thanks for letting me know
[18:27] <gregaf> slang: np. It looks to me like either there's a transition missing or maybe a problem with how it's handling the lost node; I'll discuss it with sjust as soon as he has time
[18:27] <gregaf> if it's one of the things I think it is, shouldn't take too long, but I don't want to go mucking around in there; it's changed a lot since I worked on it ;)
[18:33] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:41] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:44] * jojy (~jojyvargh@70-35-37-146.static.wiline.com) has joined #ceph
[18:55] <xns> weird, just had a case where 2 clients suddenly stopped being able to access the cluster, let's call them client1 and client2. client2 had an --inplace rsync writing to it, client1 was idle. I went for coffee, came back and all IO had stopped, no errors on the mds's or osds. killed client1's cfuse to restart it with debugging to see what was going on.. suddenly IO to client2 resumes, and there's no issues with the newly mounted client1.
[18:56] <Tv> xns: i see you used the plural of mds.. that might be the reason -- single active mds is still the most tested setup
[18:56] <Tv> xns: but i think gregaf here would like to learn how to reproduce your issue ;)
[18:56] <gregaf> heh
[18:56] <xns> yeah greg mentioned that
[18:57] <gregaf> it sounds like a client request got "lost" in the MDS
[18:57] <xns> ultimately I want to end up using a multimds config so I don't mind spending the time to try and break it / report issues ;p
[18:57] <gregaf> and when you killed the client all its requests got dropped, along with its capabilities on files
[18:57] <gregaf> if you have good logs from the MDS we can take a look at that and probably work out what happened and how to fix it
[18:59] <gregaf> otherwise we'll run into it again at some point (and actually we've been picking up a few such bugs in our QA runs)
[19:00] <xns> Yeah I'm about to rejig my test environment so I can turn up debugging on the mds's, at log level 10 I produce about a gig of logs a minute while trying to reproduce this heh.
[19:00] <xns> with luck I can get you good logs today
[19:01] <gregaf> yeah, our debug output isn't very differentiated right now :/
[19:04] <xns> that's alright, it's been useful for learning how things work under the hood
[19:06] <bchrisman> I was wondering about debugging too.. it looks like the debug mechanism in the code is designed to look a bit like the kernel dout mechanism... was wondering if there are plans/code to allow things like by-file or by-function debug logging specification?
[19:07] <bchrisman> i'd love to restrict debug to certain files, as right now I fill up log partitions extremely quickly.
[19:07] <gregaf> bchrisman: most files define a dout_prefix macro
[19:07] <gregaf> you can adjust that
[19:07] <gregaf> the problem is that the files tend to be very large
[19:08] <gregaf> I suppose you could probably redefine it for specific functions?
[19:08] <bchrisman> hmm.. how about an admin-defined log filter? like debug_mds_pipe_cmd=grep -i lock? :)
[19:08] <gregaf> but there's not a pretty way to adjust it once the program is compiled
[19:08] * adjohn (~adjohn@50.0.103.34) has joined #ceph
[19:09] <gregaf> yeah, it's not nearly that advanced
[19:09] * cmccabe (~cmccabe@69.170.166.146) has joined #ceph
[19:09] <bchrisman> yeah.. my problem is that for very large data sets, collect-and-then-debug can be problematic.
[19:09] <bchrisman> as a result, I end up trying to use injectargs 'just at the right time'... which is not a great solution.
[19:13] <Tv> bchrisman: you could get your grep -i lock effect from logging to stderr and filtering there.. i've loved runit's log handling for that: http://smarden.org/runit/
[19:14] <Tv> that can also do things like "debug logs never take more than 100MB"
[19:14] <bchrisman> ahhh interesting...
[19:15] <Tv> the difference is, you burn the cpu in generating+parsing the logs, as opposed to filtering as early as possible
[19:15] <Tv> but it's a lot more flexible
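
(The "filter the stream" idea Tv describes doesn't need runit specifically; a minimal sketch of a pipe filter in Python, assuming the daemon's debug output is piped through it on stdin -- the pattern and the 100MB cap are illustrative:)

    #!/usr/bin/env python
    # Sketch of the "grep -i lock, but cap the size" idea: keep only lines
    # matching a pattern, stop writing once a byte budget is exceeded.
    import re
    import sys

    PATTERN = re.compile(r"lock", re.IGNORECASE)
    MAX_BYTES = 100 * 1024 * 1024  # roughly "debug logs never take more than 100MB"

    written = 0
    for line in sys.stdin:
        if not PATTERN.search(line):
            continue
        if written + len(line) > MAX_BYTES:
            break
        sys.stdout.write(line)
        written += len(line)

(Usage would be something like piping the daemon's stderr through it into a file; runit's svlogd adds rotation and size limits on top of the same idea.)
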
[19:15] <bchrisman> though replacing init sounds like... not for the faint of heart? :)
[19:15] <Tv> you can run it under init
[19:15] <Tv> that's what i always do
[19:15] <bchrisman> ahh
[19:16] <Tv> bchrisman: fyi daily delayed ~5-10 min
[19:16] <Tv> waiting for previous call to end
[19:17] <bchrisman> cool
[19:25] * sagelap (~sage@212.179.87.6) has joined #ceph
[19:28] <slang> sagelap: hey
[19:29] <sagelap> hey
[19:30] <slang> sagelap: sounds like you're out of the office, but if you get a chance, I've got pgs stuck in crashed+peering that I'm trying to figure out
[19:30] <sagelap> all osds up?
[19:30] <gregaf> I think sjust is looking at it now
[19:30] <slang> not all, but the pgs that are crashed+peering are not on the down osds
[19:30] <slang> gregaf: ok
[19:31] <gregaf> looked to me like a problem with lost handling, and he agreed
[19:31] <sagelap> slang: that can happen if the osds think one of the down osds may have written to the pg when it was up
[19:31] <slang> I tried the "mark an osd down with debugging enabled" thing. The log is: http://dl.dropbox.com/u/18702194/osd.17.log, the pg in question on osd 17 is 1.0p17
[19:32] <gregaf> sagelap: should we try and call you for standup tomorrow?
[19:33] <slang> I'm trying to parse the log, but it's not clear to me which parts of the pg state are making it crashed+peering
[19:33] <sagelap> gregaf: yeah. i almost made it today but tmobile tech support was taking forever
[19:34] <gregaf> okay -- wasn't even sure you guys would be awake, I'll check Skype for you tomorrow
[19:34] <slang> sagelap: you're not at a beach somewhere with your laptop are you?
[19:35] <slang> (that would be a travesty)
[19:35] <sagelap> slang: sitting in a hotel room in israel. i can see the beach out my window...
[19:36] <slang> sagelap: I would tell you to close your laptop but I want help with this problem I'm having :-)
[19:36] <slang> <- selfish
[19:36] <xns> heh sounds like me, except the thought of my wife killing me tends to make me think twice
[19:37] <Tv> gregaf: http://valgrind.org/downloads/guis.html
[19:37] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Remote host closed the connection)
[19:38] <Tv> http://alleyoop.sourceforge.net/images/alleyoop-main.png
[19:40] <Tv> gregaf: http://pypi.python.org/pypi/msparser/ ?
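
(For the Valgrind massif output that msparser link is about, usage is roughly as below -- a sketch only: the output filename is hypothetical, and the snapshot structure is whatever the msparser documentation describes.)

    # Sketch of reading massif output with the msparser package linked above.
    # Assumes msparser.parse_file() returns a dict with a 'snapshots' list,
    # per its PyPI documentation; the filename is hypothetical.
    import msparser

    data = msparser.parse_file("massif.out.12345")
    for snapshot in data["snapshots"]:
        print(snapshot)
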
[19:41] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[19:42] <slang> it looks like all these problematic pgs are empty
[19:42] <slang> in which case maybe a quick fix would be to just remove them?
[19:42] <sjust> slang: I'm looking now, seems to be a flaw in the marking osds lost step
[19:42] <sjust> or rather in the osd handling of lost osds
[19:54] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) has joined #ceph
[20:02] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) Quit (Quit: cp)
[20:07] * greglap (~Adium@aon.hq.newdream.net) has joined #ceph
[20:40] * cp (~cp@206.15.24.21) has joined #ceph
[21:30] * lxo (~aoliva@9YYAABFLQ.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[21:32] * lxo (~aoliva@09GAAAF7K.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:40] * darkfader (~floh@188.40.175.2) Quit (Remote host closed the connection)
[21:40] * darkfader (~floh@188.40.175.2) has joined #ceph
[22:10] * adjohn (~adjohn@50.0.103.34) Quit (Quit: adjohn)
[22:14] <slang> sjust: it looks as though the state machines for those pgs are stuck in WaitUpThru
[22:15] <greglap> slang: yeah, they seem to be looping into it -- he's working on it but there are some things here we need to get working too
[22:16] <slang> I'm sure
[22:17] <slang> greglap: it may seem like I'm being pesky, but really I'm trying to provide as much help as I can by debugging the issue here myself
[22:17] <greglap> slang: no, that's great! :)
[22:18] <slang> I'm really looking for pointers on where I might look
[22:18] <greglap> at this point we've diagnosed the issue -- it's just not handling the lost stuff properly
[22:18] <greglap> that's not likely to be something you can fix, although sjust can correct me?
[22:18] <sjust> greglap: sorry I've been a bit busy, I'll get back to working on it in a few minutes :)
[22:19] <greglap> but we're also working on a dead cluster here that came out of some recent changes in the on-disk format, and that's blocking people on other teams
[22:19] <slang> it looks like adjust_need_up_thru needs to get called, but that won't happen till the osdmap gets an AdvMap event
[22:21] <sjust> adjust_need_up_thru doesn't seem to consider lost osds
[22:38] * johnl_ upset ceph again
[22:47] <greglap> johnl_: thanks for the report!
[22:47] <greglap> we've seen one or two other issues with exporting (the first backtrace there), the one about auth_pins is new though
[22:47] <greglap> but multi-mds system bugs aren't a priority right now so I dunno when we'll get to it -- is one MDS not enough right now?
[22:51] <sagelap> johnl_: if you include full logs for all mds's i'll get to it at some point.
[22:56] <sjust> sagelap: sorry, should have mentioned that before I left Friday: I noticed that the backlog generation on the corrupt pg log branch actually doesn't work since the thread isn't running, I'm pushing a replacement now
[22:58] <sagelap> sjust: k
[23:03] <johnl_> greglap: am keen on high availability :)
[23:03] <johnl_> sagelap: ok. I'll up the debug and reproduce and attach full logs.
[23:04] <greglap> johnl_: just run one in standby then? that gets you all the availability you can get without the stability implications
[23:05] <johnl_> greglap: didn't know I could do that! just leave the max_mds option as 1? does that do it?
[23:09] <greglap> johnl_: yeah
[23:09] <greglap> that'll be much more stable given how they work anyway
[23:10] <johnl_> k cool.
[23:11] <johnl_> i think that tacit knowledge needs to be in the wiki somewhere :)
[23:11] <johnl_> maybe it is, but I haven't come across it yet
[23:14] <Tv> johnl_: let me know if http://ceph.newdream.net/docs/latest/architecture/#ceph-filesystem covers what you wanted, or not
[23:19] * adjohn (~adjohn@50.0.103.34) has joined #ceph
[23:20] <johnl_> Tv: hey cool, nice docs.
[23:20] <johnl_> can't find a link to those docs from the ceph page or the wiki though
[23:22] <xns> ok that took a while but I think everything just stopped dead again.. 20gb of logs later heh
[23:23] <johnl_> Tv: tbh, it's possibly not harsh enough either. be nice to see "It's not currently recommended."
[23:24] <johnl_> :)
[23:37] <Tv> johnl_: the docs are still very much a work in progress.. some of the wiki pages are starting to point to the new docs instead, but i think the whole website is going to go through a redesign and at that point the docs will be more prominent
[23:38] <johnl_> k cool
[23:38] <johnl_> those docs in git somewhere? can we contribute?
[23:38] <xns> greglap: got a good sample of logs for both mds's and one of the clients for this deadlock'ish problem, gotta head out here in a minute but I'll create an issue and upload them when I get home.
[23:39] <greglap> xns: cool, we'll take a look at some point -- can't guarantee how soon right now though :(
[23:41] <xns> greglap: no biggy
[23:47] <Tv> johnl_: /doc of ceph.git master
[23:50] <johnl_> Tv: ta!
[23:57] <greglap> Tv: the gitbuilders seem to be broken -- they're complaining about testlibrbd build stuff, but it's going fine on my machine and I don't see any references to it existing in the codebase any more
[23:57] <Tv> greglap: looking

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.