#ceph IRC Log


IRC Log for 2011-01-04

Timestamps are in GMT/BST.

[0:01] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[0:35] * sentinel_e86 (~sentinel_@ Quit (Quit: sh** happened)
[0:36] * johnl_ (~johnl@ has joined #ceph
[0:36] * johnl (~johnl@ Quit (Remote host closed the connection)
[0:36] * sentinel_e86 (~sentinel_@ has joined #ceph
[1:37] * yehuda_hm (~yehuda@ppp-69-232-181-98.dsl.irvnca.pacbell.net) has joined #ceph
[1:52] * joshd (~jdurgin@rrcs-74-62-34-205.west.biz.rr.com) Quit (Quit: Leaving.)
[2:26] * bchrisman1 (~Adium@c-24-130-226-22.hsd1.ca.comcast.net) has joined #ceph
[2:26] * bchrisman (~Adium@c-24-130-226-22.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[2:27] <cmccabe> wido: hey, I implemented that max_open_files thing
[2:27] <cmccabe> wido: since I already spent some time thinking about it
[2:42] * sjust (~sam@ip-66-33-206-8.dreamhost.com) Quit (Read error: Operation timed out)
[3:31] * joshd (~jdurgin@adsl-75-28-69-238.dsl.irvnca.sbcglobal.net) has joined #ceph
[5:00] * raso (~raso@debian-multimedia.org) Quit (Ping timeout: 480 seconds)
[5:09] * joshd (~jdurgin@adsl-75-28-69-238.dsl.irvnca.sbcglobal.net) Quit (Quit: Leaving.)
[6:26] * ijuz_ (~ijuz@p4FFF7176.dip.t-dialin.net) has joined #ceph
[6:33] * ijuz__ (~ijuz@p4FFF7423.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[6:58] * yehuda_hm (~yehuda@ppp-69-232-181-98.dsl.irvnca.pacbell.net) Quit (Ping timeout: 480 seconds)
[7:12] * f4m8_ is now known as f4m8
[7:39] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:54] * allsystemsarego (~allsystem@ has joined #ceph
[9:08] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:48] * bchrisman1 (~Adium@c-24-130-226-22.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[9:59] * verwilst (~verwilst@router.begen1.office.netnoc.eu) has joined #ceph
[10:16] * Yoric (~David@ has joined #ceph
[10:32] * hijacker (~hijacker@ Quit (Read error: Connection reset by peer)
[11:32] * hijacker (~hijacker@ has joined #ceph
[11:52] * hijacker (~hijacker@ Quit (Remote host closed the connection)
[11:57] * hijacker (~hijacker@ has joined #ceph
[13:29] * Yoric_ (~David@ has joined #ceph
[13:29] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[13:29] * Yoric_ is now known as Yoric
[14:07] * Yoric (~David@ Quit (Quit: Yoric)
[14:07] * Yoric (~David@ has joined #ceph
[15:47] * f4m8 is now known as f4m8_
[16:38] <stingray> when osds are resyncing, client is crawling
[16:40] * julienhuang (~julienhua@pasteur.dedibox.netavenir.com) has joined #ceph
[16:49] * greglap (~Adium@ has joined #ceph
[16:58] <greglap> stingray: how much of a performance drop-off?
[17:00] <greglap> things certainly slow down during OSD recovery — we haven't implemented serious QOS or anything — but it generally shouldn't stop unless you've lost data
[17:33] * greglap (~Adium@ Quit (Quit: Leaving.)
[17:48] <stingray> gregaf: client just stalls and [ 1341.264064] ceph: tid 104 timed out on osd3, will reset osd
[17:49] <stingray> osd3 is a new osd, I just added 4 of those, pushed new crushmap and increased replication to 3 for both data and metadata
[18:18] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:35] * cmccabe1 (~cmccabe@adsl-76-200-188-5.dsl.pltn13.sbcglobal.net) has joined #ceph
[18:37] * Yoric (~David@ Quit (Ping timeout: 480 seconds)
[18:39] * verwilst (~verwilst@router.begen1.office.netnoc.eu) Quit (Quit: Ex-Chat)
[19:08] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[19:12] * sjust (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:34] * Tv (~Tv@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:37] <gregaf> stingray: if I understand you correctly, I think this is a present unfortunate consequence of adding too many OSDs at once
[19:46] * Tv (~Tv@ip-66-33-206-8.dreamhost.com) Quit (Quit: Tv)
[19:47] * Tv (~Tv@ip-66-33-206-8.dreamhost.com) has joined #ceph
[20:00] * Tv (~Tv@ip-66-33-206-8.dreamhost.com) has left #ceph
[20:28] <stingray> gregaf: I fail :(
[20:28] <stingray> let's wait until it converges
[20:28] <stingray> only 6% left
[20:29] <stingray> what's funny is that writes work
[20:29] <stingray> only reads get stuck, and those are reads from a particular osd
[20:29] <gregaf> hmm
[20:29] <stingray> is there any way to figure this out except tcpdump
[20:29] <gregaf> we do have some reports that writes starve reads
[20:30] <gregaf> and the rate-limiting in recovery isn't well tested, so it's possible that the OSD just has too much of a write workload so reads aren't going through/are taking forever
[20:30] <stingray> they don't usually do that, they seemed to work before I did the reshuffle
[20:30] <stingray> the kernel client doesn't have much of the useful instrumentation - at least the old one I have
[20:30] <stingray> maybe a client bug, after all
[20:31] <stingray> I'll do more tests
[20:31] <gregaf> just to make sure I understand, you just added 4 OSDs to your cluster and increased replication from 2 to 3?
[20:31] <stingray> is ods thread-per-request?
[20:31] <stingray> osd
[20:31] <stingray> gregaf: yeah
[20:31] <gregaf> how big was your cluster prior to that?
[20:31] <stingray> it then told me it's about 50% degraded and started doing stuff
[20:32] <stingray> 3 osds, 3.6T each, 1.5T of data (3T with replication)
[20:32] <gregaf> okay
[20:32] <stingray> I added 4 osds about 3.4T each
[20:32] <gregaf> the OSDs have a lot of threads, but it's not one per request
[20:33] <stingray> and it doesn't prioritize peering versus clients
[20:33] <gregaf> they have a setup based on thread pools and workqueues that the requests get routed through, in order of arrival but modified by what's available in the system
[20:33] <gregaf> there are some limited attempts to do it with some throttling
[20:34] <gregaf> and rudimentary logic to try and keep PGs active if they're moved to a new OSD with no data
[20:34] <stingray> it does a lot of "journal throttle" here, not sure what it means now
[20:34] <gregaf> but I think that by simultaneously doubling the cluster and increasing replication you just gave it more than our metrics can handle
[20:35] <stingray> heh
[20:35] <gregaf> it's an area that needs further development
[20:36] <gregaf> what's the exact journal throttle message you're getting?
[20:37] <stingray> both
[20:37] <stingray> waited for bytes
[20:37] <stingray> and for ops
[20:37] <stingray> 2011-01-04 21:47:45.497892 7f56c0ff9700 journal throttle: waited for ops
[20:37] <stingray> 2011-01-04 21:47:45.584344 7f56abfff700 journal throttle: waited for ops
[20:37] <stingray> 2011-01-04 22:14:49.906795 7f56aaffd700 journal throttle: waited for ops
[20:37] <stingray> 2011-01-04 22:14:49.974128 7f56aaffd700 journal throttle: waited for ops
[20:37] <stingray> 2011-01-04 22:14:50.056533 7f56aaffd700 journal throttle: waited for ops
[20:37] <stingray> 2011-01-04 22:14:50.124184 7f56aaffd700 journal throttle: waited for ops
[20:38] <stingray> they usually come in 2+ screens
[20:38] <gregaf> ah
[20:39] <gregaf> it isn't the sole cause of any trouble, but that means your OSD journal can't keep up with the rate at which it's receiving data
[20:41] <gregaf> not anything broken but if you can somehow get a faster journal it will help performance, or at least minimize issues during repeering, etc
[20:41] <stingray> ok
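For readers who want to gauge how often the journal is throttling, the pasted messages can be tallied with a short script. This is only a sketch: the six sample lines are the ones quoted in the log, the function name `summarize_throttle` is made up for illustration, and reading the third whitespace-separated field as a thread id is an assumption about the OSD log format rather than something the log confirms.

```python
from collections import Counter

# The six "journal throttle" lines pasted by stingray at [20:37].
SAMPLE_LINES = [
    "2011-01-04 21:47:45.497892 7f56c0ff9700 journal throttle: waited for ops",
    "2011-01-04 21:47:45.584344 7f56abfff700 journal throttle: waited for ops",
    "2011-01-04 22:14:49.906795 7f56aaffd700 journal throttle: waited for ops",
    "2011-01-04 22:14:49.974128 7f56aaffd700 journal throttle: waited for ops",
    "2011-01-04 22:14:50.056533 7f56aaffd700 journal throttle: waited for ops",
    "2011-01-04 22:14:50.124184 7f56aaffd700 journal throttle: waited for ops",
]

MARKER = "journal throttle: waited for "


def summarize_throttle(lines):
    """Count throttle messages by reason ("ops" or "bytes") and by the
    third field, which is assumed here to be a thread id."""
    by_reason = Counter()
    by_thread = Counter()
    for line in lines:
        if MARKER not in line:
            continue  # skip unrelated log lines
        head, reason = line.split(MARKER, 1)
        fields = head.split()  # [date, time, assumed thread id]
        by_reason[reason.strip()] += 1
        by_thread[fields[2]] += 1
    return by_reason, by_thread


by_reason, by_thread = summarize_throttle(SAMPLE_LINES)
print(dict(by_reason))
print(dict(by_thread))
```

Run against a full OSD log instead of the sample, a large "bytes" or "ops" count over a short interval would match gregaf's description of a journal that cannot keep up with incoming data.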
[21:58] <stingray> 2011-01-04 23:58:12.300505 pg v108271: 804 pgs: 7 active, 797 active+clean; 1491 GB data, 4459 GB used, 19195 GB / 24921 GB avail; 31623/1151964 degraded (2.745%)
[21:58] <stingray> doesn't go below this
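The status line above can be checked by hand or with a small parser. The following is a rough sketch, not part of any Ceph tool: `parse_pg_status` is a hypothetical helper, and the regular expressions assume the line keeps exactly the layout shown in the log for version .24.

```python
import re

# Status line copied from the log at [21:58].
STATUS = (
    "2011-01-04 23:58:12.300505 pg v108271: 804 pgs: 7 active, "
    "797 active+clean; 1491 GB data, 4459 GB used, "
    "19195 GB / 24921 GB avail; 31623/1151964 degraded (2.745%)"
)


def parse_pg_status(text):
    """Pull PG state counts and the degraded ratio out of one status line."""
    total_pgs = int(re.search(r"(\d+) pgs:", text).group(1))

    # The per-state counts sit between "pgs:" and the first semicolon.
    states_part = text.split("pgs:", 1)[1].split(";", 1)[0]
    states = {}
    for item in states_part.split(","):
        count, state = item.strip().split(" ", 1)
        states[state] = int(count)

    degraded, total = map(
        int, re.search(r"(\d+)/(\d+) degraded", text).groups()
    )
    return {
        "total_pgs": total_pgs,
        "states": states,
        "degraded": degraded,
        "total": total,
        "degraded_pct": 100.0 * degraded / total,
    }


info = parse_pg_status(STATUS)
print(info["states"])
print(round(info["degraded_pct"], 3))
```

The per-state counts add up to the 804 PGs reported (7 + 797), and 31623 / 1151964 works out to roughly 2.745%, which agrees with the percentage in the line. The 7 PGs that are active but not clean are the ones that have not finished recovering.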
[22:02] <gregaf> stingray: hmmm, what version are you running and how long has it been stuck there?
[22:05] <stingray> .24
[22:05] <stingray> doesn't seem to make any progress
[22:06] <stingray> and, the chunks are still unreadable
[22:06] <stingray> I'm trying to debug osd
[22:07] <gregaf> sjust: you dealt with symptoms like this recently, didn't you?
[22:10] <sjust> gregaf: Yeah, I don't remember specifically what was causing it though. I would need to look at the logs.
[22:12] * julienhuang (~julienhua@pasteur.dedibox.netavenir.com) Quit (Quit: julienhuang)
[22:14] <stingray> sjust: something fixable?
[22:16] <sjust> stingray: the problem before was with scrubbing, but I now remember fixing that bug
[22:16] <sjust> could you get me the logs?
[22:16] <stingray> if only I knew which osd
[22:17] <stingray> sjust: I'll try to gather more info for you, worst case tomorrow
[22:17] <stingray> now I've got to go - need to meet someone
[22:17] <stingray> cu
[22:17] <sjust> ok
[22:24] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[22:45] * sentinel_e86 (~sentinel_@ Quit (Quit: sh** happened)
[22:47] * sentinel_e86 (~sentinel_@ has joined #ceph
[22:56] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Remote host closed the connection)
[23:51] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.