#ceph IRC Log


IRC Log for 2011-01-24

Timestamps are in GMT/BST.

[1:38] <DeHackEd> a vacation usually involves time off irc, no?
[3:20] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) Quit (Quit: Leaving)
[4:47] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[4:55] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[5:10] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[5:28] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[5:39] * BORN-TO-HACK (~TScript3@ has joined #ceph
[5:39] * BORN-TO-HACK (~TScript3@ Quit (autokilled: This host may be infected. Mail support@oftc.net with questions. BOPM (2011-01-24 04:39:55))
[5:45] * HACKERMIND (~TScript3@dhcp18136.myzipnet.com) Quit (Ping timeout: 480 seconds)
[6:49] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[7:19] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[7:35] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[7:36] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[7:41] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) has joined #ceph
[7:47] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) Quit (Remote host closed the connection)
[7:49] * votz (~votz@dhcp0020.grt.resnet.group.UPENN.EDU) has joined #ceph
[8:00] * alexxy[home] (~alexxy@ has joined #ceph
[8:00] * alexxy (~alexxy@ Quit (Read error: Connection reset by peer)
[8:03] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:07] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[8:54] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[9:08] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:16] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:41] * allsystemsarego (~allsystem@ has joined #ceph
[10:23] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[10:28] * Yoric (~David@ has joined #ceph
[11:15] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[11:15] * gregaf_ (~gregaf@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[11:21] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[12:12] * verwilst (~verwilst@router.begen1.office.netnoc.eu) has joined #ceph
[13:12] * gregaf_ (~gregaf@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: gregaf_)
[14:52] * sedulous (~sed@2001:41d0:2:46c1::1) has joined #ceph
[14:53] <sedulous> Should I bother testing Ceph with WebDAV (davfs) file systems as object storage?
[14:53] <sedulous> ... or is it futile
[14:54] <stingray> slooooow
[14:54] <gregaf> you mean using the Ceph POSIX layer with WebDAV as an object store?
[14:55] <sedulous> yes
[14:55] <gregaf> hmmm
[14:55] <gregaf> it's tied pretty well to RADOS
[14:56] <gregaf> if you wanted to rewrite the disk classes in the metadata server you could probably make it work
[14:57] <gregaf> but I doubt it would be worth it
[14:57] <sedulous> it's not likely to work out of the box, is it?
[14:57] <sedulous> no, certainly not
[14:57] <sedulous> i just have several WebDAV accounts with a few hundred gigabytes of storage behind each and would like to combine all of them into one file system
[14:57] <sedulous> it's not worth a lot of trouble
[14:57] <gregaf> oh, yeah, it wouldn't be a trivial undertaking for anything like that at all
[14:59] <sedulous> can you think of a way to somehow do it?
[14:59] <sedulous> isn't there some kind of meta file system that creates a block device based on an existing POSIX FS?
[15:00] <sedulous> which I could then use to run Ceph :)
[15:00] <gregaf> a block device on top of a posix fs?
[15:00] <gregaf> not off-hand, although I'm sure some virtualization package can handle it
[15:00] <sedulous> Yes. It could, for example, read/write "blocks" as files
[15:01] <sedulous> and present itself as 1 block device
[15:01] <gregaf> but if you've already got a posix fs, there's not much point in making another layer and then running Ceph on that layer?
[15:02] <sedulous> gregaf: there are two problems: 1) I need to unite several file systems, 2) WebDAV does not support many features like locking
[15:02] <sedulous> also, each of them is not very fast but the combined speed is decent (~50-100 megabit/s)
[15:04] <gregaf> oh, right, I forgot webDAV had fake posix workings
[15:04] <gregaf> unfortunately, any solution I can think of for uniting them into a single striped filesystem is going to assume low-latency high-speed access to each connection, and will be unpleasant
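The idea sedulous floats above (emulate one block device by storing fixed-size "blocks" as individual files, e.g. on a davfs mount) can be sketched very roughly in Python. This is purely illustrative; the class name, on-disk naming scheme, and block size are made up for the sketch and are not any existing tool:

```python
import os

class FileBackedBlockStore:
    """Toy sketch: emulate block-device semantics by keeping each
    fixed-size block as a separate file in a directory (which could
    be a davfs mountpoint). Illustrative only, not a real driver."""

    def __init__(self, directory, block_size=4096):
        self.directory = directory
        self.block_size = block_size
        os.makedirs(directory, exist_ok=True)

    def _path(self, block_no):
        # one file per block, named by block number
        return os.path.join(self.directory, "block_%08d" % block_no)

    def write_block(self, block_no, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        with open(self._path(block_no), "wb") as f:
            f.write(data)

    def read_block(self, block_no):
        try:
            with open(self._path(block_no), "rb") as f:
                return f.read(self.block_size)
        except FileNotFoundError:
            # unwritten blocks read back as zeros, like a sparse device
            return b"\x00" * self.block_size
```

As gregaf notes, the real obstacle is not this mapping but that every block access becomes a high-latency WebDAV round trip.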
[16:12] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[16:20] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[16:25] * yx (~yx@28IAAB6KH.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[16:33] * yx (~yx@anonymizer.blutmagie.de) has joined #ceph
[16:44] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: Leaving)
[16:48] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[17:16] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[17:21] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[17:48] <wido> hi
[17:48] <wido> sagewk: I encountered a btrfs bug, compiled the latest for-linus branch against 2.6.37 and got: http://pastebin.com/UsAVV9ZF
[17:49] <wido> Should I report that at #btrfs? Or is this Ceph related? (Code you submitted)
[17:51] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:56] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:15] <wido> yehudasa: are you there?
[18:26] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[18:31] <stingray> baaa
[18:31] <stingray> I killed my server
[18:31] <stingray> damn it
[18:34] <wido> stingray: If you don't touch it, you can't kill it ;)
[18:35] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[18:39] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:51] * joshd (~jdurgin@adsl-75-28-69-238.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:54] * Yoric (~David@ Quit (Quit: Yoric)
[18:58] <stingray> there's no ipmi and it's 3600 km from me
[18:58] <stingray> :|
[19:08] <jantje> auw
[19:12] <wido> stingray: auch :) Call / Mail the DC then
[19:18] * cmccabe (~cmccabe@ has joined #ceph
[19:19] <stingray> already did :)
[19:37] * ajnelson (~Adium@dhcp-63-189.cse.ucsc.edu) has joined #ceph
[20:07] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[20:55] <wido> gregaf: Is yehudasa in today?
[21:18] * joshd (~jdurgin@adsl-75-28-69-238.dsl.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[21:25] <gregaf> wido: he's moving
[21:25] <gregaf> houses, I mean
[21:25] <gregaf> so not today
[21:49] <wido> gregaf: ok, np
[21:49] <wido> I'm seeing the cephx problem again. My OSD's seem to go into a "split" situation
[21:49] <wido> where osd0 and osd2 want to talk, but not with osd1 and osd3 (and vice versa)
[21:50] <wido> rados -p rbd ls spits out errors like "failed verifying authorize reply"
[21:51] <wido> I'm going afk, ttyl!
[22:00] * ajnelson1 (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[22:02] * joshd (~jdurgin@adsl-75-28-69-238.dsl.irvnca.sbcglobal.net) has joined #ceph
[22:03] * ajnelson (~Adium@dhcp-63-189.cse.ucsc.edu) Quit (Ping timeout: 480 seconds)
[22:33] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[22:34] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[22:34] * ajnelson1 (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[22:40] * ajnelson (~Adium@dhcp-63-189.cse.ucsc.edu) has joined #ceph
[22:55] <bchrisman> is write I/O for a single file/object striped by default, or is there a crushmap or config setting that will do this?
[22:57] <gregaf> for files IO is striped across objects
[22:57] <gregaf> objects are not striped, they live on one OSD
[23:01] <bchrisman> ahh ok
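gregaf's answer above can be illustrated with the simple default layout: a file's byte stream is cut into fixed-size objects, each of which CRUSH then places whole on one OSD. The 4 MB object size below matches Ceph's common default, but treat the numbers as an assumption for the sketch:

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed default object size (4 MB)

def object_for_offset(file_offset, object_size=OBJECT_SIZE):
    """With a simple layout (stripe_count = 1), byte N of a file lands
    in object N // object_size, at offset N % object_size inside that
    object. Each object then lives on a single OSD."""
    return file_offset // object_size, file_offset % object_size

# a sequential 10 MB write touches objects 0, 1 and 2
assert object_for_offset(0) == (0, 0)
assert object_for_offset(5 * 1024 * 1024) == (1, 1024 * 1024)
```

So striping happens at the file-to-object layer, not inside an object.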
[23:03] <bchrisman> another test.. pulled plug on a node in cluster… looks like there are some sort of problems there as well… wanted to check if this has been tested, and if so… by whom. I'm more than happy to do the testing… just want to check what other people are seeing.
[23:12] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[23:20] * greglap (~Adium@ has joined #ceph
[23:21] <greglap> bchrisman: what sort of problems with pulling the plug?
[23:21] <greglap> IIRC you have multiple OSDs/node, did you make sure to design your crush map so that your replicas aren't on the same node?
[23:33] <bchrisman> yeah.. my crushmap separates them...
[23:35] <bchrisman> so.. problem is some metadata inaccessible… ls hangs (standard redhat options), /bin/ls works (no color-based file type lookup)… I/O works..
[23:36] <bchrisman> seen these types of symptoms on other distributed filesystems when the lookup requires going out to the file to check what type it is for the ls color response.
[23:38] <greglap> hmmm
[23:38] <greglap> if you look at ceph -s, what's the PG status look like?
[23:38] <bchrisman> well.. this *also* has kernel client on osd systems.
[23:39] <greglap> if IO is working then that's not causing any problems, at least any that I can think of
[23:39] <bchrisman> but there's still plenty of unused swap
[23:40] <greglap> I wonder if maybe part of the metadata pool is inaccessible, and it's just serving standard /bin/ls out of memory
[23:40] <bchrisman> yeah… that sounds possible.
[23:40] <bchrisman> This was my first stab at testing node failure..
[23:40] <greglap> you should be able to tell based on the PGMap status
[23:40] <bchrisman> one sec on that.
[23:40] <greglap> ./ceph -s will show you a summary
[23:41] <bchrisman> http://pastebin.com/huaXTVX4
[23:42] <bchrisman> originally 12 osds… but four taken out when I pulled the plug on a node.
[23:42] <greglap> yeah
[23:42] <bchrisman> had three mds
[23:43] <greglap> oh
[23:43] <greglap> and you turned it down to 1?
[23:43] <bchrisman> not intentionally :)
[23:43] <bchrisman> let me put up my conf and crushmap
[23:43] <greglap> oh, wait, you had an MDS on each node and killed one node?
[23:43] <bchrisman> yeah
[23:44] <greglap> not intentionally which?
[23:44] <bchrisman> well, I didn't 'turn it down to 1' intentionally
[23:44] <greglap> oh
[23:44] <greglap> did you ever turn it up to 3?
[23:45] <bchrisman> config: http://pastebin.com/p7kiHLRB
[23:45] <bchrisman> that's what I was running and didn't make any changes during runtime
[23:46] <bchrisman> and the crushmap I applied: http://pastebin.com/btbBJ84f
[23:53] <greglap> bchrisman: did anybody check your crush map?
[23:53] <bchrisman> greglap: no.. just me..
[23:53] <greglap> I've never learned to read them properly but it looks to me like you're skipping the host distinction completely?
[23:54] <bchrisman> looked like it worked when I did the drive pull test.
[23:54] <bchrisman> I think the 'rack' stanza distinguishes hosts… or at least… that was my intention there.
[23:55] <bchrisman> I was interpreting it: data -> step take rack -> item host0 host1 host2 -> item device0-device12 in the different host stanzas
[23:56] <greglap> argh, I don
[23:56] * cmccabe (~cmccabe@ has left #ceph
[23:56] <greglap> *don't remember how this works
[23:56] <bchrisman> not sure what casdata is… but the other 'top level' stanzas (as I'm calling them: data, casdata, rbd, metadata) seem to indicate what the different parts of ceph are looking for.
[23:56] <greglap> I've never actually set one up myself :|
[23:57] <bchrisman> heh… cool.. we're in the same boat.. :)
[23:57] <bchrisman> I just piled on the assumptions regarding how it works from my minuscule knowledge of ceph thus far.
[23:58] <bchrisman> figured I'd put up a wiki page on it if I got some confidence...
[23:59] <bchrisman> so the crushmap is setting up the mds data (as metadata) I think.. which means it should be distributed in the same fashion as the regular data.
[23:59] <bchrisman> (again, my interp)
[23:59] <greglap> casdata isn't actually used, I think it's a reserved pool for
[23:59] <greglap> yeah, that's definitely what it's doing
[23:59] <greglap> looking at the very limited wiki page, I think what your rule is doing though is selecting the rack (you have one, so it takes that one), and then choosing leaves from the rack... which means devices, while not worrying about what hosts they're on
[23:59] <greglap> (wiki page: http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH)
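For reference, the distinction greglap is pointing at shows up in the rule's choose step. A rule that descends via hosts (so replicas land on different machines, then one device is picked within each host) looks roughly like this in decompiled crushmap syntax; the bucket name `root` and the numeric fields are placeholders, not taken from bchrisman's map:

```
rule data {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take root
    # pick N distinct hosts first, then one device inside each,
    # instead of choosing devices directly under the rack
    step chooseleaf firstn 0 type host
    step emit
}
```

A rule that instead does `step choose firstn 0 type device` under a single rack can place two replicas on devices of the same host, which is the failure mode being discussed.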

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.