#ceph IRC Log

IRC Log for 2011-12-19

Timestamps are in GMT/BST.

[0:46] * fronlius (~fronlius@f054184179.adsl.alicedsl.de) Quit (Quit: fronlius)
[1:45] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[1:55] * aa (~aa@r186-52-203-153.dialup.adsl.anteldata.net.uy) Quit (Quit: Konversation terminated!)
[1:55] * aa (~aa@r186-52-203-153.dialup.adsl.anteldata.net.uy) has joined #ceph
[2:43] * andresambrois (~aa@r186-52-243-254.dialup.adsl.anteldata.net.uy) has joined #ceph
[2:43] * aa (~aa@r186-52-203-153.dialup.adsl.anteldata.net.uy) Quit (Read error: Connection reset by peer)
[2:44] * eryc (~eric@internetjanitor.com) Quit (Quit: leaving)
[3:58] * andresambrois (~aa@r186-52-243-254.dialup.adsl.anteldata.net.uy) Quit (Quit: Konversation terminated!)
[3:58] * andresambrois (~aa@r186-52-243-254.dialup.adsl.anteldata.net.uy) has joined #ceph
[6:07] * MarkDude (~MT@wsip-70-164-189-192.lv.lv.cox.net) has joined #ceph
[6:08] * MarkDude (~MT@wsip-70-164-189-192.lv.lv.cox.net) Quit ()
[6:42] * kobkero (~chatzilla@180.183.114.44) has joined #ceph
[6:42] <kobkero> hello
[6:43] <kobkero> i have a problem with mds
[6:43] <kobkero> i have 5 mds
[6:43] <kobkero> and set max mds to 3
[6:44] <kobkero> but now mds e49: 1/1/3 up {0=e=up:replay}, 4 up:standby
[6:44] <kobkero> none of them becomes active
[6:44] <ajm> mds log?
[6:44] <ajm> also max mds > 1 isn't as robust
[6:46] <kobkero> with max mds = 1 it's the same
[6:46] <kobkero> mds e50: 1/1/1 up {0=e=up:replay}, 4 up:standby
[6:46] <kobkero> not active
[6:48] <kobkero> mon.1 192.168.8.62:6789/0 360 ==== mdsbeacon(6111/e up:replay seq 328 v50) v2 ==== 103+0+0 (1247169504 0 0) 0x1531780 con 0x14eddc0
[6:50] <kobkero> if i send command # ceph mds stop mds.e
[6:50] <kobkero> mon.4 -> 'mds.0 not active (up:replay)' (-17)
[6:50] <kobkero> how do i get an mds to go active?
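For reference, the max-MDS count kobkero is adjusting is normally changed at runtime with the ceph tool; a minimal sketch, with syntax as used in the 0.3x-era releases (verify against your version):

    # ask the cluster to run 3 active MDS ranks; remaining daemons stay standby
    ceph mds set_max_mds 3
    # check which ranks are active/replay and how many daemons are standby
    ceph mds stat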
[8:19] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[8:23] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[9:13] * kobkero (~chatzilla@180.183.114.44) Quit (Quit: ChatZilla 0.9.87 [Firefox 8.0/20111104165243])
[9:19] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:20] * fghaas (~florian@85-127-155-32.dynamic.xdsl-line.inode.at) has joined #ceph
[9:23] * fghaas1 (~florian@85-127-155-32.dynamic.xdsl-line.inode.at) has joined #ceph
[9:23] * fghaas (~florian@85-127-155-32.dynamic.xdsl-line.inode.at) Quit (Read error: Connection reset by peer)
[9:26] * fghaas1 (~florian@85-127-155-32.dynamic.xdsl-line.inode.at) Quit (Remote host closed the connection)
[9:27] * fghaas (~florian@85-127-155-32.dynamic.xdsl-line.inode.at) has joined #ceph
[9:52] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[10:27] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[10:33] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[10:40] * fronlius_ (~fronlius@testing78.jimdo-server.com) has joined #ceph
[10:40] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[10:40] * fronlius_ is now known as fronlius
[11:11] * fronlius_ (~fronlius@testing78.jimdo-server.com) has joined #ceph
[11:11] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[11:11] * fronlius_ is now known as fronlius
[11:16] * fronlius_ (~fronlius@testing78.jimdo-server.com) has joined #ceph
[11:16] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[11:16] * fronlius_ is now known as fronlius
[11:24] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Ping timeout: 480 seconds)
[11:27] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[11:33] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[12:14] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[12:14] * andresambrois (~aa@r186-52-243-254.dialup.adsl.anteldata.net.uy) Quit (Ping timeout: 480 seconds)
[13:42] * ghaskins_ (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[13:43] * gregaf1 (~Adium@aon.hq.newdream.net) has joined #ceph
[13:46] * iggy_ (~iggy@theiggy.com) has joined #ceph
[13:46] * rosco_ (~r.nap@188.205.52.204) has joined #ceph
[13:47] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (reticulum.oftc.net larich.oftc.net)
[13:47] * gregaf (~Adium@aon.hq.newdream.net) Quit (reticulum.oftc.net larich.oftc.net)
[13:47] * mfoemmel (~mfoemmel@chml01.drwholdings.com) Quit (reticulum.oftc.net larich.oftc.net)
[13:47] * yehudasa_ (~yehudasa@aon.hq.newdream.net) Quit (reticulum.oftc.net larich.oftc.net)
[13:47] * acaos_ (~zac@209-99-103-42.fwd.datafoundry.com) Quit (reticulum.oftc.net larich.oftc.net)
[13:47] * rosco (~r.nap@188.205.52.204) Quit (reticulum.oftc.net larich.oftc.net)
[13:47] * iggy (~iggy@theiggy.com) Quit (reticulum.oftc.net larich.oftc.net)
[13:49] * mfoemmel (~mfoemmel@chml01.drwholdings.com) has joined #ceph
[13:49] * yehudasa_ (~yehudasa@aon.hq.newdream.net) has joined #ceph
[13:49] * acaos_ (~zac@209-99-103-42.fwd.datafoundry.com) has joined #ceph
[14:00] * andreask1 (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[14:00] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Read error: Connection reset by peer)
[14:30] * yiH (~rh@83.217.113.221) has joined #ceph
[14:30] <yiH> hi guys,
[14:30] <yiH> I'm still trying to figure out how to achieve HA with ceph
[14:31] <yiH> I killed the MDS node which was active (also acts/acted as OSD and MON), ceph -w shows this, which looks healthy to me, but still can't access the FS:
[14:31] <yiH> 2011-12-19 13:29:23.956787 mds e137: 1/1/1 up {0=beta=up:active}
[14:31] <yiH> 2011-12-19 13:29:23.956830 osd e136: 3 osds: 2 up, 2 in
[14:31] <yiH> 2011-12-19 13:29:23.956933 log 2011-12-19 13:29:16.160454 mds.0 xxx.xxx.xxx.33:6800/2767 3 : [INF] closing stale session client.5103 xxx.xxx.xxx.31:0/3216998045 after 302.352802
[14:31] <yiH> 2011-12-19 13:29:23.957015 mon e1: 3 mons at {0=xxx.xxx.xxx.31:6789/0,1=xxx.xxx.xxx.33:6789/0,2=xxx.xxx.xxx.35:6789/0}
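On the client side of the HA test yiH describes, the kernel client should be given every monitor address at mount time so it can fail over when one node dies; a minimal sketch with hypothetical addresses (the real ones are redacted above):

    # list all three monitors so the mount does not depend on a single node
    mount -t ceph 192.0.2.31:6789,192.0.2.33:6789,192.0.2.35:6789:/ /mnt/ceph \
        -o name=admin,secretfile=/etc/ceph/admin.secret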
[14:39] * andreask1 (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[14:42] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[14:52] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[14:52] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[15:01] <yiH> now I did another test, this time the FS works on one of the nodes, and is jammed on the other...
[15:15] <yiH> I see this for the jammed node: 2011-12-19 14:12:38.563533 7f6e6a026700 log [INF] : denied reconnect attempt (mds is up:active) from client.7504 xxx.xxx.xxx:0/2600235342 after 904.197541 (allowed interval 45)
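The "allowed interval 45" in that message is the MDS reconnect window: clients of a failed MDS must reconnect to its replacement within that many seconds or their sessions are dropped. A hedged ceph.conf sketch of the option involved (name and default as in releases of that era):

    [mds]
        # seconds a client has to reconnect to a recovering MDS
        # before its stale session is refused, as in the log line above
        mds reconnect timeout = 45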
[15:27] <yiH> mmm this is new
[15:27] <yiH> ceph version 0.39 (commit:321ecdaba2ceeddb0789d8f4b7180a8ea5785d83)
[15:27] <yiH> 2011-12-19 14:22:54.349159 1: (SafeTimer::timer_thread()+0x2c1) [0x5ca341]
[15:27] <yiH> 2011-12-19 14:22:54.349169 2: (SafeTimerThread::entry()+0xd) [0x5caf7d]
[15:27] <yiH> 2011-12-19 14:22:54.349180 3: (()+0x7efc) [0x7fc482597efc]
[15:27] <yiH> 2011-12-19 14:22:54.349186 4: (clone()+0x6d) [0x7fc480bc889d]
[15:27] <yiH> 2011-12-19 14:22:54.349192 os/FileStore.cc: In function 'virtual void SyncEntryTimeout::finish(int)', in thread '7fc4770ba700'
[15:27] <yiH> os/FileStore.cc: 2937: FAILED assert(0)
[15:27] <yiH> ceph version 0.39 (commit:321ecdaba2ceeddb0789d8f4b7180a8ea5785d83)
[15:27] <yiH> 1: (SyncEntryTimeout::finish(int)+0xfe) [0x6c216e]
[15:27] <yiH> 2: (SafeTimer::timer_thread()+0x2c1) [0x5ca341]
[15:27] <yiH> 3: (SafeTimerThread::entry()+0xd) [0x5caf7d]
[15:27] <yiH> 4: (()+0x7efc) [0x7fc482597efc]
[15:27] <yiH> 5: (clone()+0x6d) [0x7fc480bc889d]
[15:40] * andresambrois (~aa@r190-135-146-9.dialup.adsl.anteldata.net.uy) has joined #ceph
[15:47] * andresambrois (~aa@r190-135-146-9.dialup.adsl.anteldata.net.uy) Quit (Remote host closed the connection)
[15:58] * elder (~elder@c-71-193-71-178.hsd1.mn.comcast.net) has joined #ceph
[17:02] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) has joined #ceph
[17:48] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:02] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[18:11] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[18:11] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[18:39] <chaos_> hi, i saw that there is a new performance counter "recovery ops", what is this exactly? at one osd this is 0 but at the other one it's constantly 7
[18:42] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:49] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[19:06] <gregaf1> yiH: the SafeTimer assert you're seeing is a result of your OSD disk not syncing quickly enough
[19:07] <yiH> gregaf1: that's a bit odd. there's really nothing to update... there weren't any write operations.
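For context on the trace pasted earlier: SyncEntryTimeout fires when a filestore sync/commit does not finish within the commit timeout, which is what gregaf1 means by the disk not syncing quickly enough. A sketch of the relevant knob; the option name and default are taken from releases of that era, so treat them as an assumption:

    [osd]
        # maximum seconds a filestore commit may take before the OSD
        # gives up and asserts (the FileStore.cc SyncEntryTimeout assert)
        filestore commit timeout = 600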
[19:08] <chaos_> sjust, ping ;-)
[19:08] <gregaf1> I'm not sure exactly what's going on with your MDS issues, but it looks like maybe they're failing to switch over to your standbys properly, and your clients aren't connecting to the replacements correctly either — can you paste your cluster config somewhere and explain what steps you went through very clearly?
[19:09] <yiH> sure
[19:09] <yiH> what do you need? ceph.conf, cluster creation, ceph -w, .. anything else?
[19:10] <gregaf1> yiH: that sounds about right, but also exactly how you were testing it — what clients were connected, were they disconnected, etc
[19:10] <yiH> I unplugged the power cord from the active mds node (which also acts as mon and osd)
[19:10] <yiH> let me write all those down
[19:11] <gregaf1> chaos_: the recovery_ops is a counter for how many recovery processes are going on — replicating degraded objects, looking for missing objects, etc
[19:13] <gregaf1> bbiab, daily standups now
[19:14] <chaos_> gregaf1, but it's constantly 7 at one of my osds
[19:14] <chaos_> it doesn't look good then
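For anyone wanting to watch the counter chaos_ is describing, per-daemon counters can be read through the OSD admin socket; a sketch in which both the socket path and the command name are assumptions, since they have varied between releases ("perfcounters_dump" in older builds, "perf dump" later):

    # dump an OSD's performance counters, including the recovery ops value
    ceph --admin-daemon /var/run/ceph/osd.0.asok perfcounters_dump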
[19:23] * iggy_ is now known as iggy
[19:26] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[19:27] <yiH> okay, here is the first part: http://pastebin.com/NTaGk7Ea
[19:27] <yiH> if you are satisfied with it, I will reproduce the problem
[19:29] <yiH> oops, a line is missing, but it's in the config :) for [osd.0] btrfs devs = /dev/VolGroup/shared
[19:34] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Ping timeout: 480 seconds)
[19:38] <yiH> gregaf1: is that ok?
[19:46] <gregaf1> chaos_: hmm, looks like whoever implemented the counter didn't finish — it never decrements!
[19:47] <gregaf1> yiH: if you can reproduce that I'll take a look at your logs and things
[19:47] <gregaf1> what I'd expect to see is that after ~30 seconds your standby node goes into replay, and then active very shortly after that
[19:47] <yiH> sure, did it a couple of times... ceph -w, that is?
[19:47] <gregaf1> that'll tell you, yes
[19:48] <yiH> sometimes I did see the switch yet the mounted FS was frozen
[19:48] <gregaf1> I'm just a bit worried because at some point you had a client fail because it didn't reconnect until some 900 seconds after it was allowed to, and I don't know why that would be happening
[19:51] <yiH> ok, beta in replay
[19:53] <yiH> (note: this takes quite a lot of time.. is there any way to speed this up?)
[19:54] <gregaf1> hmm, it shouldn't take very long at all
[19:54] <gregaf1> I've never seen it take more than ~60 seconds; usually more like 6
[19:54] <yiH> well, not here.. and I have hardly any data
[19:54] <gregaf1> the entire process is usually dominated by timeouts
[19:54] <gregaf1> do you have any MDS logging on?
[19:55] <yiH> 2011-12-19 18:50:24.494331 7f041a6d7700 mds.0.20 handle_mds_map i am now mds.0.20
[19:55] <yiH> 2011-12-19 18:50:24.494361 7f041a6d7700 mds.0.20 handle_mds_map state change up:standby --> up:replay
[19:55] <yiH> 2011-12-19 18:50:24.494373 7f041a6d7700 mds.0.20 replay_start
[19:55] <yiH> 2011-12-19 18:50:24.495039 7f041a6d7700 mds.0.20 recovery set is
[19:55] <yiH> 2011-12-19 18:50:24.495057 7f041a6d7700 mds.0.20 need osdmap epoch 169, have 159
[19:55] <yiH> 2011-12-19 18:50:24.495067 7f041a6d7700 mds.0.20 waiting for osdmap 169 (which blacklists prior instance)
[19:55] <yiH> 2011-12-19 18:50:24.495141 7f041a6d7700 mds.0.cache handle_mds_failure mds.0 : recovery peers are
[19:55] <yiH> 2011-12-19 18:50:24.497962 7f041a6d7700 mds.0.20 ms_handle_connect on 194.107.16.33:6801/3299
[19:55] <yiH> 2011-12-19 18:50:24.498584 7f041a6d7700 mds.0.20 ms_handle_connect on 194.107.16.35:6800/2532
[19:55] <yiH> 2011-12-19 18:50:36.939123 7f041a6d7700 mds.0.cache creating system inode with ino:100
[19:55] <yiH> 2011-12-19 18:50:36.942582 7f041a6d7700 mds.0.cache creating system inode with ino:1
[19:55] <yiH> no new entries in the last 5 minutes
[19:56] <chaos_> gregaf1, should i file a bug report? :p
[19:57] <gregaf1> chaos_: I was going to try and just fix it now, but if you like
[19:57] <chaos_> i just want to have this fixed someday ;-)
[19:57] <gregaf1> yiH: okay, I should have thought of this before — can you do it again after adding "debug mds = 20" and "debug ms = 1" to your mds config?
[19:58] <gregaf1> that will generate a lot more logging output which should let me identify the problem
[19:58] <yiH> both of those lines?
[19:58] <gregaf1> yes
[19:58] <gregaf1> that's turning on debugging in the MDS subsystem and the messenger subsystem :)
[20:00] <gregaf1> chaos_: actually, yeah, can you file a bug report? :P
[20:00] <chaos_> gregaf1, on my way ;)
[20:00] <gregaf1> this isn't as trivial as I'd thought and I don't have time to track down everywhere right now
[20:00] <yiH> ok will do that (btw it's still in replay)
[20:00] <chaos_> gregaf1, chill ;-) i'm happy that it isn't another osd/mds fail
[20:01] <gregaf1> oh, I'm totally chill :D
[20:01] <gregaf1> that's why you're the one writing the bug :p
[20:01] <chaos_> ;>
[20:02] <chaos_> i'm glad i can help, not just use and whine ;)
[20:03] <yiH> :D
[20:03] <yiH> I checked the source, and it looks more or less ok
[20:04] <yiH> but it's quite messy without knowing what's going on. is the pdf thesis relevant for understanding it, or is it out-of-date?
[20:05] <gregaf1> the thesis is a good way to understand system architecture but it doesn't really talk about the code details at all
[20:05] <yiH> any internal documentation out there?
[20:06] <gregaf1> not really; we're starting to generate some now but it's mostly been transmitted verbally and via code dives :(
[20:07] <yiH> mm that's quite natural when there's a lot of code change
[20:07] <yiH> documenting is just a hindrance :>
[20:16] <chaos_> gregaf1, http://tracker.newdream.net/issues/1845
[20:19] <yiH> it's still doing 2011-12-19 19:19:06.620430 log 2011-12-19 19:18:58.636736 osd.2 xxx.xxx.xxx.35:6800/10117 55 : [INF] 0.7d scrub ok
[20:23] <gregaf1> yiH: that's a normal thing during low load; the OSDs are doing a limited comparison of their contents to make sure everything's okay
[20:24] * fronlius (~fronlius@e182092112.adsl.alicedsl.de) has joined #ceph
[20:42] <yiH> I'm just watching the replay in the mds log..
[20:43] <yiH> 2011-12-19 19:42:53.978000 7f87b8401700 mds.0.22 beacon_send up:replay seq 505 (currently up:replay)
[20:43] <yiH> 2011-12-19 19:42:53.978566 7f87b9d05700 mds.0.22 handle_mds_beacon up:replay seq 505 rtt 0.000538
[20:43] <yiH> 2011-12-19 19:42:57.936273 7f87b8401700 mds.0.bal get_load no root, no load
[20:43] <yiH> 2011-12-19 19:42:57.936378 7f87b8401700 mds.0.bal get_load mdsload<[0,0 0]/[0,0 0], req 0, hr 0, qlen 0, cpu 1.93>
[20:45] <yiH> this is mds log: http://pastebin.com/BfjzWBij
[20:46] * MarkDude (~MT@64.134.236.99) has joined #ceph
[20:49] <gregaf1> yiH: did you add the "debug ms = 20" line? I don't see any message passing going on
[20:49] <yiH> yeah i did
[20:50] <yiH> [mds] section?
[20:50] <gregaf1> yeah
[20:50] <yiH> it's there
[20:50] <yiH> and restarted deamons
[20:50] <yiH> daemons
[20:51] <yiH> [mds]
[20:51] <yiH> keyring = /data/keyring.$name
[20:51] <yiH> debug mds = 20
[20:51] <yiH> debug mds = 1
[20:51] <gregaf1> oh, no
[20:51] <gregaf1> "debug ms = 20" — note the lack of a 'd' there
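Put together, the [mds] section gregaf1 is asking for would look roughly like this ("debug ms" controls the messenger subsystem and is a separate option from "debug mds"):

    [mds]
        keyring = /data/keyring.$name
        debug mds = 20    # verbose MDS subsystem logging
        debug ms = 1      # messenger logging; note "ms", not "mds"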
[21:02] * MarkDude (~MT@64.134.236.99) Quit (Quit: Leaving)
[21:04] <yiH> oh lol
[21:13] <yiH> 2011-12-19 20:12:57.862420 7fe5a2eca700 mds.-1.0 beacon_kill last_acked_stamp 2011-12-19 20:12:42.862272, we are laggy!
[21:14] <yiH> should I force switch with "ceph mds fail 0" ?
[21:16] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[21:18] <yiH> 2011-12-19 20:13:21.255950 mds e179: 1/1/1 up {0=beta=up:active(laggy or crashed)}
[21:18] <yiH> 2011-12-19 20:13:22.129134 pg v9821: 594 pgs: 393 active+clean, 201 active+clean+degraded; 500 MB data, 2150 MB used, 24187 MB / 29076 MB avail; 5215/43371 degraded (12.024%)
[21:18] <yiH> 2011-12-19 20:13:41.260591 mds e180: 1/1/1 up {0=beta=up:active(laggy or crashed)}
[21:18] <yiH> 2011-12-19 20:14:01.263469 mds e181: 1/1/1 up {0=beta=up:active(laggy or crashed)}
[21:18] <yiH> 2011-12-19 20:14:21.266301 mds e182: 1/1/1 up {0=beta=up:active(laggy or crashed)}
[21:18] <yiH> 2011-12-19 20:14:41.269351 mds e183: 1/1/1 up {0=beta=up:active(laggy or crashed)}
[21:18] <yiH> 2011-12-19 20:15:01.272598 mds e184: 1/1/1 up {0=beta=up:active(laggy or crashed)}
[21:18] <yiH> 2011-12-19 20:15:21.276191 mds e185: 1/1/1 up {0=beta=up:active(laggy or crashed)}
[21:18] <yiH> 2011-12-19 20:15:22.137540 pg v9822: 594 pgs: 393 active+clean, 201 active+clean+degraded; 500 MB data, 2145 MB used, 24187 MB / 29076 MB avail; 5215/43371 degraded (12.024%)
[21:18] <yiH> 2011-12-19 20:15:41.279539 mds e186: 1/1/1 up {0=beta=up:active(laggy or crashed)}
[21:18] <yiH> 2011-12-19 20:16:01.282469 mds e187: 1/1/1 up {0=beta=up:active(laggy or crashed)}
[21:18] <yiH> 2011-12-19 20:16:21.285231 mds e188: 1/1/1 up {0=beta=up:active(laggy or crashed)}
[21:18] <yiH> 2011-12-19 20:16:41.288274 mds e189: 1/1/1 up {0=beta=up:active(laggy or crashed)}
[21:18] <yiH> 2011-12-19 20:17:01.291896 mds e190: 1/1/1 up {0=beta=up:active(laggy or crashed)}
[21:18] <yiH> and no new entries on the mds log
[21:19] <yiH> It's kinda crazy, each time I test it, it has different problems
[21:19] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:20] <yiH> 1) jammed in this laggy/crashed state 2) doing failover for mds, but then jammed in the replay state 3) finishing replay and the new mds works, but then the ceph FS is still frozen...
[21:20] <yiH> I saw all those scenarios today
[21:21] <yiH> anyways, I will do a "ceph mds fail 0" now
[21:22] <yiH> 2011-12-19 20:21:39.013149 mds e204: 0/1/1 up, 1 failed
[21:22] <yiH> that's, again, weird
[21:22] <yiH> === mds.alpha ===
[21:22] <yiH> Starting Ceph mds.alpha on alpha...already running
[21:24] <yiH> it's running, yet it doesn't show up in ceph -w
[21:38] * votz (~votz@pool-108-52-122-3.phlapa.fios.verizon.net) has joined #ceph
[21:38] <yiH> 2011-12-19 20:37:45.795086 mds e209: 1/1/1 up {0=alpha=up:active}
[21:38] <yiH> 2011-12-19 20:37:45.795116 osd e220: 3 osds: 2 up, 2 in
[21:38] <yiH> now the mds is up, but the ceph FS is frozen
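One setting often used to shorten the replay window yiH keeps waiting through is a hot standby that follows the active MDS's journal; a sketch, assuming a daemon named alpha and option names as they appeared in releases of that era:

    [mds.alpha]
        # continuously replay rank 0's journal so a takeover can skip
        # most of the up:replay phase
        mds standby replay = true
        mds standby for rank = 0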
[22:18] * bugoff (bram@november.openminds.be) Quit (Server closed connection)
[22:18] * bugoff (bram@november.openminds.be) has joined #ceph
[22:19] <yiH> CU tomorrow
[22:22] * slang (~slang@chml01.drwholdings.com) Quit (Remote host closed the connection)
[22:25] * slang (~slang@chml01.drwholdings.com) has joined #ceph
[22:25] * sagelap (~sage@c-76-24-21-36.hsd1.ma.comcast.net) has joined #ceph
[22:25] <sagelap> gregaf1: did the wip-osd-maybe-created look right to address that bug you saw?
[22:26] <sagelap> sjust: is wip_oloc ready to merge?
[22:30] * stass (stas@ssh.deglitch.com) Quit (Ping timeout: 480 seconds)
[22:32] <sjust> sagelap: sorry, I'll take a look, but I think that became wip_pgls which got merged last week
[22:33] <gregaf1> sagelap: yeah, I've been planning to write some tests for it but that keeps getting pushed back; I'll merge it to master and do the tests later I guess...
[22:34] <sjust> sagelap: yeah, it got merged with the pgls changes
[22:35] <sagelap> sjust: ah, i see. killing that old branch
[22:35] <sjust> ok
[22:38] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[22:38] * stass (stas@ssh.deglitch.com) has joined #ceph
[22:39] * f4m8_ (~f4m8@lug-owl.de) Quit (Server closed connection)
[22:39] * f4m8_ (~f4m8@lug-owl.de) has joined #ceph
[22:39] <sagelap> gregaf1: the stats corruption is hard to test for anyway without waiting for scrub. the obs.exists being wrong might be worth an explicit check
[22:40] <gregaf1> I was just going to write something in teuthology that goes through the tedium of restarting OSDs *shrug*
[22:44] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[22:47] * wonko_be (bernard@november.openminds.be) Quit (Server closed connection)
[22:47] * wonko_be (bernard@november.openminds.be) has joined #ceph
[23:29] * fronlius (~fronlius@e182092112.adsl.alicedsl.de) Quit (Quit: fronlius)
[23:30] * stass (stas@ssh.deglitch.com) Quit (Ping timeout: 480 seconds)
[23:35] * stass (stas@ssh.deglitch.com) has joined #ceph
[23:58] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Server closed connection)
[23:59] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.