#ceph IRC Log


IRC Log for 2012-03-05

Timestamps are in GMT/BST.

[11:24] * joao (~JL@ has joined #ceph
[16:02] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) has joined #ceph
[16:02] <Hugh> all
[16:10] <jmlowe> Does anybody have some more details on the planned caching in librbd/librados?
[16:37] <sage> jmlowe: we're going to reuse the caching code from the ceph userland client (ceph-fuse, libcephfs), expose it via librados, and use it in librbd.
[16:38] <sage> it's a pretty simple cache.. tunable size, writeback delay, etc. any disk "flush" command will flush the cache, so it won't affect correctness as long as the fs on top issues proper barriers. (very old kernels don't)
[16:43] <jmlowe> ok, that's what I was looking for, how does this interact with the writeback window?
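
The cache sage describes at 16:38 (tunable size, writeback delay, and a "flush" that writes out everything dirty) can be illustrated with a minimal sketch. This is not the ceph-fuse or librbd code: the class name, the parameters `max_dirty_bytes` and `max_dirty_age`, and the dict-based backend are all hypothetical, chosen only to show the behaviour discussed.

```python
import time


class WritebackCache:
    """Toy writeback cache; hypothetical, not the actual Ceph implementation."""

    def __init__(self, backend, max_dirty_bytes=1 << 20, max_dirty_age=1.0,
                 clock=time.monotonic):
        self.backend = backend            # dict-like store: offset -> bytes
        self.max_dirty_bytes = max_dirty_bytes
        self.max_dirty_age = max_dirty_age
        self.clock = clock
        self.dirty = {}                   # offset -> (data, time of write)

    def _writeback(self, offset):
        # Move one dirty extent from the cache to the backing store.
        data, _ = self.dirty.pop(offset)
        self.backend[offset] = data

    def write(self, offset, data):
        # Writes land in the cache first and reach the backend later.
        self.dirty[offset] = (data, self.clock())
        dirty_bytes = sum(len(d) for d, _ in self.dirty.values())
        if dirty_bytes > self.max_dirty_bytes:
            self.flush()

    def read(self, offset):
        # Dirty data takes priority over whatever the backend holds.
        if offset in self.dirty:
            return self.dirty[offset][0]
        return self.backend.get(offset)

    def tick(self):
        # Write back extents that have been dirty longer than the delay.
        now = self.clock()
        expired = [off for off, (_, t) in self.dirty.items()
                   if now - t >= self.max_dirty_age]
        for off in expired:
            self._writeback(off)

    def flush(self):
        # A disk "flush" (barrier) writes out everything that is dirty.
        for off in list(self.dirty):
            self._writeback(off)
```

Until `flush()` or `tick()` runs, a write exists only in memory, which is why correctness depends on the filesystem on top issuing barriers.
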
[19:44] * sagewk (~sage@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[19:46] <joao> lol
[19:47] <joao> did someone turn off their switch?
[19:51] <elder> Networking trouble?
[19:51] <joao> not sure
[19:51] <joao> I can't connect to the vpn, and they just mass timed out
[19:52] <elder> OK. So it's not just me.
[19:52] <nhm> yeah, all my connections are down too.
[19:53] <elder> Tommi said they were going to set up a new VPN for ceph storage, separate from Newdream. Made me a little nervous. Maybe that's underway?
[19:53] <joao> I was just recalling that
[19:53] <nhm> doesn't seem like you'd need to take the old one down though.
[19:54] <elder> I think it's part of reconfiguring the network more broadly, so maybe it does mean that.
[19:54] <nhm> could be I suppose.
[19:54] <elder> I.e., partitioning "our stuff" from "their stuff" and configuring routing between them.
[19:54] <elder> OK, well I'm going to take this opportunity to go run an errand.
[19:55] <joao> well, in my case they couldn't have found a better time :p
[19:55] <nhm> yeah, I'm gonna go have lunch
[19:55] * yehudasa (~yehudasa@aon.hq.newdream.net) has joined #ceph
[19:56] <joao> yehudasa, what's happening over there? are there networking problems?
[19:56] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[19:58] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[19:58] <yehudasa> joao: yes
[19:58] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[20:03] * jluis (~JL@ has joined #ceph
[20:06] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[20:09] * joao (~JL@89-181-150-46.net.novis.pt) Quit (Ping timeout: 480 seconds)
[20:11] <nhm> wow, just read about the github exploit.
[20:12] * jluis is now known as joao
[20:12] <joao> what github exploit?
[20:12] * joao googles
[20:13] <nhm> http://www.extremetech.com/computing/120981-github-hacked-millions-of-projects-at-risk-of-being-modified-or-deleted
[20:14] * dmick (~dmick@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[20:17] <joao> oh wow
[20:17] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[20:17] <joao> that vuln is disturbing
[20:19] <nhm> yeah
[20:21] <joao> now I'm finding it mildly amusing since, iirc, kernel.org moved into github after it was initially hacked
[20:22] <joao> just thinking how bad it could have gotten if this hack had happened in a timely manner
[20:22] <nhm> Yeah, I think sometimes we put too much faith in underlying tools.
[20:23] <joao> this case looks like blatant disregard for a known vuln though
[20:23] * gregaf (~Adium@aon.hq.newdream.net) has joined #ceph
[20:24] <nhm> I don't know enough about rails to have a strong opinion, other than people seem to think it could be fixed upstream.
[20:25] <joao> I'm just inferring from what I read on the link you provided
[20:26] <joao> well, looks like dinner is ready
[20:26] <joao> bbl
[20:28] * gregaf (~Adium@aon.hq.newdream.net) Quit (Read error: Connection reset by peer)
[20:28] * gregaf (~Adium@aon.hq.newdream.net) has joined #ceph
[20:47] <elder> yehudasa, any info on the network status? I now have access to autobuilder but my teuthology run seems to be stuck.
[20:53] <dmick> it's spotty at the moment
[20:53] <dmick> some things seem to work, some don't
[20:53] <sagewk> github is down too, that will break any of the workunit teuth tasks
[20:54] <elder> My trouble is in checking lock status.
[20:55] <elder> Just to clarify, does this have anything to do with the separation of networks that Tv talked about?
[20:55] <dmick> no
[20:55] <dmick> or at least I doubt it; it's possible you're having a different problem than everything else
[20:55] <dmick> afaik old teuthology access should remain as it was
[20:56] <dmick> I assume you're not trying to run on the new cluster?
[20:57] <elder> Correct.
[21:08] <nhm> dmick: I can't get to much of anything either (teuthology, ssh to old sepia nodes, etc)
[21:08] <joao> same here
[21:09] <joao> also, hi dmick
[21:10] <elder> OK, I'm going to go do something else for a bit...
[21:11] <nhm> yeah, I'm going to study python some more.
[21:14] * nhm reads idiomatic python
[22:08] <imjustmatthew> Has anyone had issues with CephFS corruption of files when the OSDs are severely IO bound?
[22:11] <nhm> imjustmatthew: I haven't, but what are you seeing?
[22:12] <imjustmatthew> nhm: a number of files are filled with /00 (unicode 0?) I haven't been able to rule out the client application yet though
[22:13] <nhm> strange. What were you seeing on the IO side?
[22:15] <imjustmatthew> nhm: The OSDs were hitting IO wait and lagging, causing the client to become intermittently unresponsive. A file copy operation was going on at the same time on the OSDs using scp, which was saturating the disks with reads
[22:16] <nhm> imjustmatthew: were the files with the corruption related to those operations taking place?
[22:18] <imjustmatthew> nhm: possibly, but there should only have been reads on the client to the files that are corrupted and the scp'ed files were completely unrelated to ceph
[22:20] <nhm> imjustmatthew: this may be a totally stupid idea, but maybe you could try to trace back which chunks on the underlying filesystem correspond to the files in ceph and look at their modification times to see if they were written to after the last time you wrote to them.
[22:20] <imjustmatthew> nhm: if it doesn't sound familiar I might just write it off; I haven't been able to reproduce it
[22:20] <imjustmatthew> nhm: ah
[22:20] <imjustmatthew> nhm: that's a good call
[22:20] <wido> sagewk: you there?
[22:21] <imjustmatthew> nhm: they're all fairly small, they could easily reside on one or two chunks
[22:22] <wido> I tried compiling wip-2116 and master today, that failed. I initialized the submodule(s) and that went fine. I get: http://pastebin.com/H0RNqPWX
[22:23] <nhm> wido: there have been some network problems at DreamHost today, so some of the guys may be intermittently available.
[22:24] <wido> nhm: Ah, okay :) That happens
[22:24] <wido> I just wanted to inform sagewk that I tried wip-2116 for his feedback
[22:24] <nhm> wido: might want to email just in case
[22:24] <wido> nhm: I'll do, I'll update the issue then
[22:24] <wido> tnx
[22:41] * joao (~JL@ Quit (Remote host closed the connection)
[22:41] * joao (~JL@89-181-150-46.net.novis.pt) has joined #ceph
[22:43] <imjustmatthew> nhm: Would a chunk number look like "10000016bef" ?
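
nhm's suggestion at 22:20 (find the objects behind a file and check their modification times) could be sketched as below. CephFS names a file's data objects as the inode number in hex, a dot, and an eight-digit hex object index, and "10000016bef" looks like such an inode prefix. The sketch assumes the OSD stores each object as a regular file whose name starts with the object name; that on-disk layout, the function names, and the directory argument are assumptions for illustration, not something stated in the log.

```python
import os


def object_name(ino, index):
    """Name of the index-th data object of a CephFS file with inode `ino`."""
    return f"{ino:x}.{index:08x}"


def find_object_files(osd_data_dir, ino):
    """Return sorted (path, mtime) pairs for files that look like objects of `ino`.

    Assumes (hypothetically) that on-disk file names begin with the object name.
    """
    prefix = f"{ino:x}."
    hits = []
    for dirpath, _dirs, files in os.walk(osd_data_dir):
        for name in files:
            if name.startswith(prefix):
                path = os.path.join(dirpath, name)
                hits.append((path, os.stat(path).st_mtime))
    return sorted(hits)
```

The inode number would typically come from `os.stat(path).st_ino` on the mounted CephFS file; comparing the returned mtimes with the time of the last known good write shows whether an object was touched afterwards.
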
[22:56] <sagewk> wido: strange, our deb gitbuilders aren't having that problem
[22:58] <wido> sagewk: Yeah, the deb won't build. I just manually built the binaries
[22:58] <wido> ./configure and make worked
[22:58] <wido> scp'ed them into place and running the wip-2116 code now
[23:00] <wido> I'll leave the code running for the night with some I/O load on it, I'll check in the morning
[23:00] <wido> and see how stuff is running
[23:02] <sagewk> wido: perfect, thanks
[23:02] <sagewk> wido: that's with debug ms cranked up right?
[23:14] * joao (~JL@ace.ops.newdream.net) has joined #ceph
[23:36] <sagewk> yehudasa: pushed
[23:37] <sagewk> yehudasa: we should update all the encode/decode stuff too to use the new macros... opened a bug for it
[23:37] <yehudasa> sagewk: thanks
[23:37] <yehudasa> yep
[23:38] <yehudasa> sjust, sagewk: is the src/leveldb being empty intentional?
[23:42] <sagewk> yehudasa: './do_autogen.sh -d 3' cranks up warnings and other debuggy stuff
[23:54] * joao (~JL@ace.ops.newdream.net) Quit (Ping timeout: 480 seconds)

