#ceph IRC Log

IRC Log for 2011-05-14

Timestamps are in GMT/BST.

[0:20] * jjchen (~jjchen@lo4.cfw-a-gci.greatamerica.corp.yahoo.com) Quit (Quit: Leaving.)
[0:55] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Read error: Operation timed out)
[0:59] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[1:01] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[1:07] * sagelap (~sage@static-66-14-234-139.bdsl.verizon.net) has joined #ceph
[1:19] <sagelap> joshd, sjust: any luck reproducing that msgr race?
[1:19] * greglap (~Adium@198.228.211.27) has joined #ceph
[1:20] <sjust> sagelap: we think so, but we are retrying with debugging
[1:20] <sagelap> :(
[1:21] <sagelap> that's the only outstanding issue now, though?
[1:21] <sjust> sagelap: I also added some debugging to find the bug causing some pgs to get stuck in active+degraded
[1:22] <sjust> sagelap: testing with only 4 osds, however, I have been thrashing osds with rados bench running for the last 2-3 hours and haven't seen any failures
[1:22] <joshd> sagelap: there's another failure that we thought was related - an assert(is_up) failing when sending queries
[1:22] <sagelap> that's good to hear
[1:22] <sagelap> hrm
[1:23] <joshd> sagelap: I got it again today after your msgr fixes though
[1:23] <joshd> I'll make a bug
[1:23] <sagelap> ok cool
[1:26] <sagelap> joshd: do you have logs from that run?
[1:26] <sagelap> oh not with the msgr fixes.
[1:27] <joshd> sagelap: I have logs where it happened before and after the msgr fixes
[1:27] * alexxy (~alexxy@79.173.81.171) Quit (Ping timeout: 480 seconds)
[1:27] <sagelap> it looks like newest_update_osd requesting master log is the only query that isn't explicitly guarded by is_up()
[1:32] * verwilst (~verwilst@dD576948A.access.telenet.be) has joined #ceph
[1:37] <sjust> sagelap: I think we don't progress past GetInfo if any are down
[1:40] <sagelap> joshd, sjust: i think i see the problem
[1:40] * greglap (~Adium@198.228.211.27) Quit (Read error: Connection reset by peer)
[1:40] <sagelap> we get info from osd0 while it is up, while it's stray
[1:40] <sagelap> later it goes down, but we still have the info in peer_info, and end up choosing it as the source to pull our logs from
[1:41] <sagelap> we should probably clear out peer_info, peer_missing, etc., when nodes go down?
[1:43] <sagelap> or, probably simpler, check for is_up in choose_log_location
[1:43] <sagelap> and make sure everyone else understands that peer_info may include down osds
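[A minimal sketch of option (2) being proposed here, i.e. guarding the log-source choice with an up check. This is illustrative only, not the actual Ceph choose_log_location code: PeerInfo, OsdMapView, and choose_log_source are hypothetical stand-ins, OSD ids are plain ints, and last_update is simplified to an unsigned counter instead of eversion_t.]

    #include <map>

    struct PeerInfo {              // stand-in for the cached per-peer info (peer_info)
      unsigned last_update = 0;    // stand-in for eversion_t last_update
    };

    struct OsdMapView {            // stand-in for the osdmap's up/down view
      std::map<int, bool> up;
      bool is_up(int osd) const {
        auto it = up.find(osd);
        return it != up.end() && it->second;
      }
    };

    // Option (2): pick the lowest-numbered *up* OSD with the newest
    // last_update; return -1 if no live peer has usable info.
    int choose_log_source(const std::map<int, PeerInfo>& peer_info,
                          const OsdMapView& osdmap) {
      int best = -1;
      unsigned best_update = 0;
      for (const auto& p : peer_info) {
        if (!osdmap.is_up(p.first))
          continue;                              // skip stale info cached from down peers
        if (best == -1 || p.second.last_update > best_update) {
          best = p.first;
          best_update = p.second.last_update;
        }
      }
      return best;
    }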
[1:44] <sjust> hmm, peer_info is cleared in clear_primary_state
[1:44] <sjust> while in peering, we should only be pulling info from prior_set
[1:45] <sjust> and changes to prior_set should be resetting peering
[1:45] <sjust> it seems like we are missing a prior_set_affected check?
[1:47] <joshd> we check that in all of peering if we get an advance map
[1:47] <sjust> joshd: yeah
[1:47] <sagelap> in this case osd0 isn't in the prior set at all
[1:48] <sjust> osd0 isn't in the prior set but had the most recent info?
[1:48] <sagelap> strangely, yeah
[1:48] <sagelap> the pg content probably didn't change recently.
[1:48] * verwilst (~verwilst@dD576948A.access.telenet.be) Quit (Quit: Ex-Chat)
[1:48] <sjust> sagelap: that should not be possible, right?
[1:49] <sagelap> it is, because last_clean only cleans out strays that are currently up before setting the last_clean in PG::Info::History
[1:49] <sagelap> and in general, it should always be possible for some node to show up with data... say due to herculean data recovery efforts from an admin or something
[1:49] <sjust> but the primary that sent out the last_clean would have had to have at least as recent a last_update
[1:50] <sjust> *primary that did the cleaning
[1:50] <sagelap> yeah.. but say osd0 is down for a while, everyone cleans up and the history is trimmed.. then osd0 comes back and has data but isn't part of the prior set.
[1:50] <sjust> *primary that declared the pg clean
[1:51] <sjust> right, but it seems that if osd1 was primary when the pg went active+clean, then osd1 would be in the prior_set and we should prefer osd1
[1:51] <sagelap> we could do that. currently it goes through peer_info and picks the first (lowest numbered) one with the latest last_update
[1:51] <sjust> ah..., we should probably prefer the one in the prior_set
[1:51] <sagelap> but even if we do it'll just make this problem more rare. there will still be a peer_info item for an osd that is down.
[1:52] <sjust> true
[1:52] <sagelap> we can either: (1) trim peer_info on each map update of down nodes, or (2) check in choose_log_location that the node in question is up
[1:53] <sagelap> i suspect 1 is cleaner, but i'm not sure if that would screw us up if we're in GetLog or GetMissing but suddenly assumptions no longer hold?
[1:53] <joshd> 1 sounds more risky to me
[1:53] <sagelap> i guess mainly it'd affect GetMissing, because we may try to pull logs from a stray peer, it goes down, and we need to choose_log_location again to try another one
[1:53] <sjust> sagelap: yeah, we need to reset peering if the OSD we have selected goes down before we get the log
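[A sketch of option (1), pruning cached peer state when the map advances, reusing the PeerInfo/OsdMapView stand-ins above. Again illustrative only; peer_missing here is a simplified stand-in for the real per-peer missing sets, and as the discussion notes a state like GetLog/GetMissing would still need to notice that its chosen source vanished.]

    // Option (1): on each map advance, drop cached state for peers the new
    // map reports as down, so later decisions never see info from a dead OSD.
    void prune_down_peers(std::map<int, PeerInfo>& peer_info,
                          std::map<int, PeerInfo>& peer_missing,
                          const OsdMapView& newmap) {
      for (auto it = peer_info.begin(); it != peer_info.end(); ) {
        if (!newmap.is_up(it->first)) {
          peer_missing.erase(it->first);   // forget any missing set pulled from it too
          it = peer_info.erase(it);
        } else {
          ++it;
        }
      }
    }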
[1:54] * macana (~ml.macana@159.226.41.129) Quit (Read error: Connection reset by peer)
[1:54] <sagelap> for both the 1 or 2 case i guess
[1:54] * macana (~ml.macana@159.226.41.129) has joined #ceph
[1:54] <sjust> sagelap: we should probably a) always pull from the prior_set b) remove from peer_info when nodes go down
[1:54] <sjust> I mean, a) and b)
[1:55] <sagelap> and c) make sure GetMissing picks a new source if the chosen one goes down
[1:55] <sagelap> which means its own AdvMap handler or whatever i'm guessing?
[1:55] <joshd> sagelap: that will be handled by the peering state's AdvMap handler
[1:55] <sjust> a) takes care of that since prior_set_affected will return true
[1:56] <sagelap> er, GetLog
[1:56] <sagelap> not if the one we chose isn't in the prior set (as in josh's crash log)
[1:56] <sagelap> so GetLog should restart if newest_update_osd (which isn't necessarily in prior_set) goes down
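[A sketch of the extra restart condition being described: a GetLog-style map-advance check that covers both a changed prior set and the case where the chosen log source is not a prior_set member. Illustrative only; the up/down comparison is a rough stand-in for prior_set_affected(), which in the real code considers more than up/down state.]

    #include <set>

    // Restart GetLog if the prior set was affected, or if the chosen log
    // source (newest_update_osd, which need not be in prior_set) went down.
    bool getlog_needs_restart(const std::set<int>& prior_set,
                              int newest_update_osd,
                              const OsdMapView& oldmap,
                              const OsdMapView& newmap) {
      for (int osd : prior_set)
        if (oldmap.is_up(osd) != newmap.is_up(osd))   // rough stand-in for prior_set_affected()
          return true;
      return oldmap.is_up(newest_update_osd) && !newmap.is_up(newest_update_osd);
    }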
[1:57] <sjust> sagelap: I think that if a stray has a last_update of x there must be a member of the prior set with a last_update of at least x
[1:57] <sagelap> sjust: hmm. yeah you may be right
[1:58] <sjust> sagelap: only thing I worry about is a case where no osds survive an interval and the admin forces the pg to go active anyway
[1:58] <sagelap> and if all of the prior_set guys are down we don't know what the latest is and don't get past GetInfo
[1:58] <sjust> sagelap: if any of the prior_set guys are down, we don't go past GetInfo, I think
[1:58] <sagelap> yeah
[1:58] <sagelap> well, we need one survivor from any interval that maybe_went_rw. or something along those lines.
[1:59] <sjust> sagelap: ah, yes
[1:59] <sagelap> ok, so a) and b) should handle all but the intervening admin case.
[1:59] <sjust> sagelap: as long as that condition holds, we are guaranteed not to have to look outside of prior_set for anything but clearing old strays
[2:00] <sagelap> or missing objects
[2:00] <sjust> sagelap: any stray not in the prior_set must have been a member prior to last time the pg went clean, right?
[2:00] <sagelap> eventually we will want to look at the disaster case, though :/
[2:00] <sagelap> right
[2:00] <sjust> sagelap: so the prior_set must contain at least one member with a more complete collection than that stray
[2:01] <sagelap> but it could still have perfectly valid data (e.g. if the pg content didn't change)
[2:01] <sjust> sagelap: ah, yes, it is useable for recovery, but not necessary?
[2:01] <sagelap> normally right.
[2:01] <sjust> ok
[2:02] <sagelap> but all sorts of things could muck with that.. fs errors on the prior_set nodes, whatever. but the recovery part is more forgiving
[2:02] <sjust> sagelap: yeah
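[A sketch of the invariant argued in the exchange above, again reusing the stand-in types: any stray outside prior_set should not have a newer last_update than the best last_update among prior_set members, barring disaster scenarios such as an admin forcing a PG active. Illustrative only.]

    // Check the invariant: the prior set contains a peer at least as
    // up to date as the given stray (so the stray is usable for recovery,
    // but never required as the log source).
    bool stray_invariant_holds(const PeerInfo& stray,
                               const std::set<int>& prior_set,
                               const std::map<int, PeerInfo>& peer_info) {
      unsigned best = 0;
      for (int osd : prior_set) {
        auto it = peer_info.find(osd);
        if (it != peer_info.end() && it->second.last_update > best)
          best = it->second.last_update;
      }
      return stray.last_update <= best;
    }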
[2:07] <sagelap> sjust, joshd: can you guys let me know before you take off wherever you end up?
[2:07] <joshd> sagelap: sure
[2:08] <sagelap> thanks. have a good weekend! i'm out
[2:09] <joshd> you too
[2:09] * sagelap (~sage@static-66-14-234-139.bdsl.verizon.net) Quit (Quit: Leaving.)
[2:12] * votz (~votz@dhcp0020.grt.resnet.group.UPENN.EDU) Quit (Quit: Leaving)
[2:34] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[2:44] * cmccabe (~cmccabe@208.80.64.174) has left #ceph
[2:45] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[3:16] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:16] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[4:00] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[4:46] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[7:51] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[11:29] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[12:13] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[12:23] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[12:54] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[13:06] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[13:29] * yehuda_hm (~yehuda@bzq-79-178-112-50.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[14:27] * macana (~ml.macana@159.226.41.129) Quit (Ping timeout: 480 seconds)
[14:54] * allsystemsarego (~allsystem@188.27.166.92) has joined #ceph
[16:01] * yehuda_hm (~yehuda@bzq-79-178-112-50.red.bezeqint.net) has joined #ceph
[17:14] * alexxy (~alexxy@79.173.81.171) Quit (Ping timeout: 480 seconds)
[17:25] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[17:35] * Meduka_Meguca (~Yulya@ip-95-220-177-191.bb.netbynet.ru) Quit (Quit: leaving)
[17:36] * Yulya (~Yu1ya@ip-95-220-177-191.bb.netbynet.ru) has joined #ceph
[17:48] * Yulya (~Yu1ya@ip-95-220-177-191.bb.netbynet.ru) Quit (Quit: leaving)
[17:48] * Yulya_ (~Yu1ya_@ip-95-220-177-191.bb.netbynet.ru) has joined #ceph
[17:53] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[18:02] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[18:17] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[18:27] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[20:31] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[20:46] * yehuda_hm (~yehuda@bzq-79-178-112-50.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[20:47] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[20:52] * yehuda_hm (~yehuda@bzq-79-178-112-50.red.bezeqint.net) has joined #ceph
[21:49] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[22:05] * allsystemsarego (~allsystem@188.27.166.92) Quit (Quit: Leaving)
[22:07] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[23:05] * hachiya (~hachiya@encyclical.net) has joined #ceph
[23:30] * verwilst (~verwilst@dD5767194.access.telenet.be) has joined #ceph
[23:49] * verwilst (~verwilst@dD5767194.access.telenet.be) Quit (Quit: Ex-Chat)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.