#ceph IRC Log

IRC Log for 2011-12-22

Timestamps are in GMT/BST.

[0:35] * aa (~aa@r190-135-24-129.dialup.adsl.anteldata.net.uy) has joined #ceph
[0:45] * aa (~aa@r190-135-24-129.dialup.adsl.anteldata.net.uy) Quit (Ping timeout: 480 seconds)
[0:51] * verwilst (~verwilst@d51A5B022.access.telenet.be) Quit (Quit: Ex-Chat)
[1:00] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[1:13] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[1:27] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[2:22] * The_Bishop (~bishop@p5B3D514A.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[2:26] * The_Bishop (~bishop@p5B3D514A.dip.t-dialin.net) has joined #ceph
[2:40] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[2:41] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[3:30] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:31] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Read error: Operation timed out)
[3:31] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:46] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[3:48] * aa (~aa@r186-52-141-50.dialup.adsl.anteldata.net.uy) has joined #ceph
[3:49] * Tv__ (~Tv__@cpe-76-168-227-45.socal.res.rr.com) has joined #ceph
[4:09] * sagelap (~sage@c-76-24-18-36.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[4:14] * aa (~aa@r186-52-141-50.dialup.adsl.anteldata.net.uy) Quit (Ping timeout: 480 seconds)
[4:49] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[4:49] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) has joined #ceph
[5:02] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[5:18] * The_Bishop (~bishop@p5B3D514A.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[5:39] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[6:11] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[6:12] * jfunk (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Read error: Operation timed out)
[6:12] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Read error: Operation timed out)
[6:22] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[6:50] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[6:51] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) Quit ()
[6:53] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[7:03] * aa (~aa@r186-52-141-50.dialup.adsl.anteldata.net.uy) has joined #ceph
[7:13] * jfunk__ (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[7:15] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Ping timeout: 480 seconds)
[7:39] * sagelap (~sage@c-76-24-18-36.hsd1.ma.comcast.net) has joined #ceph
[7:42] * Tv__ (~Tv__@cpe-76-168-227-45.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[7:44] * jfunk (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[7:45] * jfunk__ (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Ping timeout: 480 seconds)
[8:28] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[8:28] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[8:45] * fghaas (~florian@85-127-155-32.dynamic.xdsl-line.inode.at) has joined #ceph
[8:55] * verwilst (~verwilst@dD576F7D1.access.telenet.be) has joined #ceph
[9:31] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[10:12] * jfunk (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Read error: Connection reset by peer)
[10:12] * jfunk (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[10:51] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[11:56] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[12:00] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[12:25] * al_ (d@niel.cx) Quit (Remote host closed the connection)
[12:27] * al_ (d@niel.cx) has joined #ceph
[12:33] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[12:39] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:16] * andresambrois (~aa@r190-64-70-159.dialup.adsl.anteldata.net.uy) has joined #ceph
[13:20] * aa (~aa@r186-52-141-50.dialup.adsl.anteldata.net.uy) Quit (Ping timeout: 480 seconds)
[13:55] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[13:56] * jfunk (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Ping timeout: 480 seconds)
[15:06] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:37] <fghaas> anyone remember whether the issue discussed in http://www.spinics.net/lists/ceph-devel/msg03020.html was ever resolved? my gceph built from 0.39 is suffering that breakage on opensuse 12.1. installing librsvg-2-2 didn't help, and no cairo-lcd package seems to be available for that distro
[16:03] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[16:08] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Read error: Connection reset by peer)
[16:08] * gohko_ (~gohko@natter.interq.or.jp) Quit (Ping timeout: 480 seconds)
[16:08] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[16:13] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Read error: Connection reset by peer)
[16:13] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[16:19] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Read error: Operation timed out)
[16:25] * yiH (~rh@83.217.113.221) Quit (Read error: Connection reset by peer)
[16:31] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[16:46] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Ping timeout: 480 seconds)
[16:47] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[16:54] * andresambrois (~aa@r190-64-70-159.dialup.adsl.anteldata.net.uy) Quit (Remote host closed the connection)
[16:56] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Ping timeout: 480 seconds)
[17:05] * aa (~aa@r190-64-70-159.dialup.adsl.anteldata.net.uy) has joined #ceph
[17:19] * aa (~aa@r190-64-70-159.dialup.adsl.anteldata.net.uy) Quit (Remote host closed the connection)
[17:19] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[17:48] * BManojlovic (~steki@93-87-148-183.dynamic.isp.telekom.rs) has joined #ceph
[17:49] <BManojlovic> good evening
[18:10] * fghaas (~florian@85-127-155-32.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[18:11] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[18:17] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[18:18] * jfunk (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[18:23] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[18:24] * jfunk (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Read error: Operation timed out)
[18:28] * BManojlovic (~steki@93-87-148-183.dynamic.isp.telekom.rs) Quit (Remote host closed the connection)
[18:29] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Quit: fronlius)
[18:33] <gregaf> fghaas: I don't see anything besides the recommendations for librsvg and cairo-lcd; we never heard back if that fixed it or not
[18:34] <gregaf> oh, and you're not actually in the channel any more
[18:55] * eternaleye___ (~eternaley@195.215.30.181) has joined #ceph
[18:55] * eternaleye__ (~eternaley@195.215.30.181) Quit (Remote host closed the connection)
[19:15] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[19:22] * fronlius (~fronlius@f054111151.adsl.alicedsl.de) has joined #ceph
[20:07] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[20:16] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[20:26] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[20:42] * jfunk__ (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[20:46] * jfunk_ (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Ping timeout: 480 seconds)
[20:55] * sagelap (~sage@c-76-24-18-36.hsd1.ma.comcast.net) has left #ceph
[20:58] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[21:03] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[21:16] * grape_ is now known as grape
[21:22] * sagelap (~sage@c-76-24-18-36.hsd1.ma.comcast.net) has joined #ceph
[21:22] <sagelap> elder: ping?
[21:23] <sagelap> elder: forgot durgin is out today. scheduled nightly '-c' run on the for-linus branch.
[21:23] <sagelap> watch for the result email ceph-commit!
[21:27] <elder> sagelap, I'm here
[21:28] <elder> My e-mail is seriously acting up lately (in case you tried to reach me that way)
[21:28] <sagelap> elder: that third patch makes the dcache trickery optional (and off by default), since there are some lingering issues. we should look at that when you're here in jan.
[21:29] <sagelap> the other one i'm unsure about is 4a37f04af0c758149d7ad647f3d07e8907803d2f, which fixes a d_lock/s_cap_lock ordering bug. it's ugly, though.. i suspect switching to an atomic for those values would work better
[21:32] <sagelap> elder, anyway, let me know if that mount option patch seems reasonable
[21:32] <elder> Why does the third patch make the dcache stuff optional?
[21:32] <elder> Oh wait
[21:32] <elder> I have to look at the ceph-devel mail.
[21:32] <sagelap> elder: we've seen sporadic failures for things like rm -rf on sepia that are due to races in that code
[21:36] <gregaf> sagelap: is there any documentation of cluster snapshots?
[21:37] <sagelap> gregaf: nope :)
[21:37] <gregaf> can you give me a summary? somebody's asking about them
[21:37] <sagelap> they're not a complete solution, since they only snapshot the osds, and no monitor state.
[21:38] <sagelap> an osdmap is published with cluster_snapshot = "foo" and snapshot_epoch = this epoch. when an osd processes that map epoch, it makes a separate snapshot of current/. that's about it...
[21:38] <sagelap> i can't remember if i coded the rollback-to-snapshot part
[21:38] <gregaf> does it work on non-btrfs?
[21:38] <sagelap> nope
[21:38] <gregaf> okay
[21:38] <sjust> sagelap: there is some code in filestore mount for mounting a cluster snap
[21:39] <elder> sagelap, OK, I follow what you did. My patch 1 == your patch 1; my patch 2 & 3 == your patch 2; your patch 3 makes the behavior controlled by a new mount option.
[21:39] <sagelap> gregaf, sjust: ah, --osd-rollback-to-cluster-snap <foo>
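The cluster snapshot mechanism sagelap describes above can be sketched roughly as follows. This is a toy model, not Ceph code: the class names, the in-memory `current` dict standing in for the `current/` directory, and the `rollback_to` method are all assumptions made for illustration. Only the `cluster_snapshot` and `snapshot_epoch` field names and the idea of snapshotting when the OSD processes that map epoch come from the discussion.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ToyOSDMap:
    """Hypothetical stand-in for an osdmap; only the two fields named in the log."""
    epoch: int
    cluster_snapshot: Optional[str] = None   # snapshot name, e.g. "foo"
    snapshot_epoch: Optional[int] = None     # epoch at which to snapshot


@dataclass
class ToyOSD:
    """Hypothetical OSD whose 'current/' directory is modelled as a dict."""
    current: Dict[str, bytes] = field(default_factory=dict)
    snapshots: Dict[str, Tuple[int, Dict[str, bytes]]] = field(default_factory=dict)

    def handle_map(self, osdmap: ToyOSDMap) -> None:
        # Take a separate copy of current/ only when processing the epoch
        # that the map itself marks as the snapshot epoch.
        if osdmap.cluster_snapshot and osdmap.snapshot_epoch == osdmap.epoch:
            self.snapshots[osdmap.cluster_snapshot] = (osdmap.epoch, dict(self.current))

    def rollback_to(self, name: str) -> int:
        # Loosely analogous to starting with --osd-rollback-to-cluster-snap <name>.
        epoch, state = self.snapshots[name]
        self.current = dict(state)
        return epoch
```

As noted in the conversation, this only covers OSD state; monitor state is not captured, so it is not a complete solution.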
[21:40] <elder> Now I look for a message from teuthworker, I take it?
[21:41] <sagelap> elder: i think your 1 -> merged into my 3 (which introduced the bug), the other two were untouched.
[21:41] <sagelap> elder: yeah.. sometime this afternoon, hopefully
[21:49] <elder> Hmm. OK, well I find this confusing but I'm not going to interfere...
[21:51] <elder> (I mean the part about merging in a fix and re-committing something that was already published.)
[22:00] <yehudasa_> sagelap: are you aware off the top of your head of any config change that happened on benjamin and alexandria in early november? there seems to be a significant performance degradation then
[22:09] * jfunk__ (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Read error: Operation timed out)
[22:14] * jfunk__ (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[22:52] * jfunk__ (~jfunk@S0106602ad0819190.cq.shawcable.net) Quit (Ping timeout: 480 seconds)
[22:57] <sagelap> yehudasa_: adding the bucket directory tmap?
[22:58] <yehudasa_> sagelap: I think that happened earlier
[22:59] <sagelap> i'd look at which new code was pushed to the cluster around that time (check the deploy branches) and what changed...
[23:00] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Remote host closed the connection)
[23:00] <yehudasa_> sagelap: yeah, working on that.. so far only ended up with the freebsd port..
[23:01] <sagelap> yehudasa_: der, hopefully it's not something stupid in there...
[23:01] <yehudasa_> sagelap: I went through the changes, it didn't look like it
[23:02] <sagelap> what is slow? the osds?
[23:02] <yehudasa_> sagelap: what we see now is btrfs being very laggy
[23:02] * jfunk__ (~jfunk@S0106602ad0819190.cq.shawcable.net) has joined #ceph
[23:02] <yehudasa_> sagelap: did strace on a small object write and it took more than 2 seconds just to return from a write() with < 256 bytes
[23:03] <iggy> btrfs is required for rbd snaps right?
[23:03] <sagelap> that can happen if it immediately follows a snapshot
[23:03] <yehudasa_> iggy: not really
[23:03] <sagelap> the btrfs commit_transaction unblocks new transactions, but the first guy to touch a part of the btrfs inode address_space that is under writeback blocks until that io completes
[23:04] <yehudasa_> any change around that area introduced then?
[23:04] <sagelap> no.. nothing new there..
[23:06] <yehudasa_> sagelap: how frequent are the snapshots?
[23:08] <todin> yehudasa_: do you have a problem with decreasing btrfs performance over time? older kernel had that problem.
[23:08] <yehudasa_> todin: we're not completely sure what's going on
[23:09] <yehudasa_> todin: in any case we're pretty bleeding edge w/ btfs
[23:09] <yehudasa_> btrfs
[23:09] <sagelap> the benjamin kernel may be lagging, though.. is it the same on alexandria?
[23:09] <yehudasa_> sagelap: yes
[23:09] <sagelap> could do a reboot on the osds just to verify
[23:10] <todin> yehudasa_: the problem was still in 3.1. it was fixed somewhere in 3.2.rcx
[23:12] <yehudasa_> not sure, but we might have that cherry-picked, gregaf?
[23:13] <yehudasa_> in any case, looking at the current trends I don't think reboot would be the solution, we have rebooted and replaced kernel since early november and it's not at the same order of magnitude it was before
[23:14] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[23:14] <sagelap> yeah
[23:19] <sagelap> yehudasa_: looking at the rados bench numbers it's mostly fast (~3ms) for small writes, altho i do see some outliers with big latencies
[23:19] * verwilst (~verwilst@dD576F7D1.access.telenet.be) Quit (Quit: Ex-Chat)
[23:21] <yehudasa_> sagelap: a small loop PUTting the same 1k object is very sluggish, runs fast for a few seconds, then hits a 3-5 second wait
[23:21] <yehudasa_> sagelap: was looking at it, and it seemed that the osd was just hanging there
[23:22] <yehudasa_> sagelap: as said earlier, I straced, and looked like slow writes
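The kind of measurement described above (many small writes, mostly fast, with occasional multi-second stalls) can be reproduced against a plain file with a sketch like the one below. This is not the rgw PUT loop or the strace session from the log; the function names, the 1 KiB payload, the fsync after each write, and the 1-second outlier threshold are all assumptions chosen for illustration.

```python
import os
import tempfile
import time


def time_small_writes(path, count=200, size=1024, sync=True):
    """Rewrite the same small region `count` times; return per-write latencies in seconds."""
    payload = b"x" * size
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.lseek(fd, 0, os.SEEK_SET)   # always overwrite the same bytes
            os.write(fd, payload)
            if sync:
                os.fsync(fd)               # force the write down to the device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


def outliers(latencies, threshold_s=1.0):
    """Return (index, latency) pairs for writes at or above the threshold."""
    return [(i, t) for i, t in enumerate(latencies) if t >= threshold_s]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lat = time_small_writes(os.path.join(d, "probe"))
        print("max latency: %.4fs" % max(lat))
        print("slow writes:", outliers(lat))
```

Run against a file on the suspect filesystem or journal device, a healthy disk should report no outliers; clusters of slow writes would point at the storage layer rather than the caller.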
[23:22] <sagelap> seeing if it looks like the btrfs thing.
[23:22] <sagelap> the journal should be masking any btrfs latency, unless something else funny is going on
[23:23] <sagelap> oh.. maybe there is more read-modify-write stuff going on.
[23:24] <yehudasa_> sagelap: where would that have come from? the new object store pg layout?
[23:26] <sagelap> my first guess would be the rgw stuff that is doing guards on write ops.. those require a read and then write, and the read means it'll block until any previous writes are applied.
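A toy model of the effect sagelap is guessing at here: plain writes are acknowledged once journaled, but a guarded write must read first, and the read has to wait until every earlier journaled write has been applied. Everything in this sketch (the class, the `pending` queue, the `stalls` counter) is hypothetical and only illustrates the ordering argument; it is not how the OSD or rgw is implemented.

```python
class ToyObjectStore:
    """Hypothetical store: writes are journaled first and applied lazily."""

    def __init__(self):
        self.applied = {}    # state visible to reads
        self.pending = []    # journaled but not yet applied writes
        self.stalls = 0      # reads that had to wait for pending applies

    def write(self, key, value):
        # Acknowledged as soon as it is journaled; the apply is deferred.
        self.pending.append((key, value))

    def read(self, key):
        # A read must see all earlier writes, so it waits for them to apply.
        if self.pending:
            self.stalls += 1
            self._apply_all()
        return self.applied.get(key)

    def _apply_all(self):
        for k, v in self.pending:
            self.applied[k] = v
        self.pending.clear()

    def guarded_write(self, key, expected, value):
        # Read-then-write: the guard check forces a wait on prior writes.
        if self.read(key) != expected:
            return False
        self.write(key, value)
        return True
```

In this model a stream of plain writes never stalls, while each guarded write that follows another write pays the full apply latency, which is why the journal would stop masking slowness in the backing filesystem.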
[23:27] <yehudasa_> sagelap: I think that came much earlier
[23:28] <yehudasa_> sagelap: atomic get was implemented around early august
[23:29] <sagelap> tracing through a slow write in the osd log to see what it's doing
[23:30] <yehudasa_> sagelap: also, the version that was on 11/1 (pre-slowness) already had all the cls_rgw in
[23:31] <sagelap> k
[23:32] <sjust> sagelap: I just pushed more changes to wip-backfill
[23:32] <sjust> I reworked recover_backfill a bit. Also, we now update last_backfill once the push is complete.
[23:36] <sagelap> i see 1.8 second latency on 12k write to journal device
[23:37] <yehudasa_> sagelap: there were a few filestore commits on 11/10, notably "filestore: sync after non-idempotent operations"
[23:37] <sagelap> looks to me like something is broken with the array?
[23:37] <sagelap> yeah, that shouldn't affect us with btrfs, tho. i'll verify
[23:40] <sagelap> yeah, not that.
[23:45] <sagelap> yeah, i see a lot of high latency writes to the journal that are blocking things.
[23:46] <yehudasa_> sagelap: anything else going on while that happens?
[23:47] <sagelap> hmm, probably activity on the other disk partition.
[23:48] <sagelap> it's a separate disk partition with nvram write-back cache. probably would be better off with an ssd or something.
[23:49] <yehudasa_> sagelap: there was a syncfs() change on 11/9
[23:50] <yehudasa_> might have been removed since
[23:51] <yehudasa_> in any case, not relevant for btrfs
[23:51] <sagelap> didn't they change the raid controller settings recently?
[23:51] <yehudasa_> hmm.. I remember something vaguely
[23:52] <sagelap> right after greg triggered the xfs issues they turned off the drive caches, i think.. that was pretty recent though
[23:52] <yehudasa_> well.. Nov is not that far off

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.