#ceph IRC Log


IRC Log for 2011-10-31

Timestamps are in GMT/BST.

[0:12] * fronlius1 (~Adium@e176054024.adsl.alicedsl.de) Quit (Quit: Leaving.)
[1:12] * verwilst (~verwilst@dD57670C5.access.telenet.be) Quit (Quit: Ex-Chat)
[2:07] * elder (~elder@cfcafwp.sgi.com) has joined #ceph
[6:46] * tserong (~tserong@58-6-130-191.dyn.iinet.net.au) has joined #ceph
[9:09] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[9:11] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[10:24] * fronlius (~Adium@testing78.jimdo-server.com) has joined #ceph
[11:05] * fronlius (~Adium@testing78.jimdo-server.com) Quit (Quit: Leaving.)
[11:15] * gregorg_taf (~Greg@ has joined #ceph
[11:15] * gregorg (~Greg@ Quit (Read error: Connection reset by peer)
[11:18] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) has joined #ceph
[11:19] * fronlius (~Adium@testing78.jimdo-server.com) has joined #ceph
[12:31] * verwilst (~verwilst@dD57670C5.access.telenet.be) has joined #ceph
[12:40] * verwilst (~verwilst@dD57670C5.access.telenet.be) Quit (Quit: Ex-Chat)
[13:16] * fronlius (~Adium@testing78.jimdo-server.com) Quit (Quit: Leaving.)
[13:29] * fronlius (~Adium@testing78.jimdo-server.com) has joined #ceph
[13:43] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Remote host closed the connection)
[13:44] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[13:55] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[13:59] * n0de (~ilyanabut@c-24-127-204-190.hsd1.fl.comcast.net) has joined #ceph
[15:44] * n0de (~ilyanabut@c-24-127-204-190.hsd1.fl.comcast.net) Quit (Quit: This computer has gone to sleep)
[15:44] * sandeen_ (~sandeen@sandeen.net) has joined #ceph
[15:45] <sandeen_> sagewk, ted accepted that fix for the xattr corruption problem. we both agree that it is a bit of a hack but without some big rethinking, it works for now
[15:56] * aneesh (~aneesh@ Quit (Ping timeout: 480 seconds)
[15:56] * aneesh (~aneesh@ has joined #ceph
[16:00] * adjohn (~adjohn@70-36-139-78.dsl.dynamic.sonic.net) has joined #ceph
[16:28] * adjohn (~adjohn@70-36-139-78.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[16:33] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:00] * tserong (~tserong@58-6-130-191.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[17:09] * tserong (~tserong@58-6-99-126.dyn.iinet.net.au) has joined #ceph
[17:15] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[17:18] * grape (~grape@ has joined #ceph
[17:23] * fronlius (~Adium@testing78.jimdo-server.com) Quit (Quit: Leaving.)
[17:24] * fronlius (~Adium@testing78.jimdo-server.com) has joined #ceph
[17:34] * bchrisman (~Adium@ has joined #ceph
[17:35] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[17:36] <sagewk> sandeen_ sweet, thanks!
[17:39] * nwatkins (~nwatkins@kyoto.soe.ucsc.edu) has joined #ceph
[17:43] <grape> Can anyone tell me about how much space I should reserve for metadata for ~8TB of storage?
[17:44] <grape> oh that doesn't make sense
[17:44] <grape> :-)
[17:48] <grape> is the ceph wiki the most up-to-date source of documentation?
[17:50] <grape> and is it worth taking the time to update or add to it?
[18:07] * adjohn (~adjohn@70-36-139-78.dsl.dynamic.sonic.net) has joined #ceph
[18:07] * fronlius (~Adium@testing78.jimdo-server.com) Quit (Quit: Leaving.)
[18:07] * adjohn (~adjohn@70-36-139-78.dsl.dynamic.sonic.net) has left #ceph
[18:10] * FoxMURDER (~fox@ip-89-176-11-254.net.upcbroadband.cz) Quit (Ping timeout: 480 seconds)
[18:11] * fronlius (~Adium@testing78.jimdo-server.com) has joined #ceph
[18:26] * cp (~cp@ has joined #ceph
[18:46] <joshd> grape: we're slowly making up-to-date docs at http://ceph.newdream.net/docs/latest/ - these are in ceph.git, so you can send patches if you like
[18:46] <joshd> the wiki still covers more topics though
[18:47] <joshd> so it's still useful to update
[18:49] * verwilst (~verwilst@dD57670C5.access.telenet.be) has joined #ceph
[18:50] <Tv> the wiki broke again :(
[19:12] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[19:13] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:18] <grape> joshd: great. I'll check it out and contribute what I can. I am doing a complete install on Ubuntu 11.10, so documenting that will be easy and might help folks out.
[19:20] <gregaf> nwatkins: you said your hadoop patch (for null instead of FileNotFound) fixed issue #1661
[19:21] <gregaf> which is for expected system directories not existing — that's what you meant?
[19:22] <grape> if you guys want some tests run on a clean install let me know. I'll ask again when I get to that point, but feel free to ask questions.
[19:28] <Tv> redmine gives internal errors on file uploads
[19:29] <Tv> Errno::EACCES (Permission denied - /home/cephtracker/tracker.newdream.net/files/111031112756_libvirt-kludge.diff):
[19:29] <Tv> hmmm
[19:30] <Tv> drwxr-xr-x 2 1000 1000 20480 2011-10-06 13:32 /home/cephtracker/tracker.newdream.net/files/
[19:30] <Tv> there is no uid 1000 on that box
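The check Tv describes (a directory owned by a numeric uid with no matching account on the box) can be sketched roughly as below. This is a minimal illustration using Python's standard `os` and `pwd` modules on a Unix host; the helper names `owner_name` and `describe_owner` are hypothetical and not part of redmine or any tool mentioned in the log.

```python
import os
import pwd


def owner_name(uid):
    """Return the account name for uid, or None if no passwd entry exists."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # pwd.getpwuid raises KeyError when the uid is unknown on this host.
        return None


def describe_owner(path):
    """Summarize who owns path and whether that uid resolves to an account."""
    uid = os.stat(path).st_uid
    name = owner_name(uid)
    if name is None:
        return f"{path}: uid {uid} has no passwd entry"
    return f"{path}: owned by {name} (uid {uid})"
```

Calling `describe_owner` on the upload directory would report the orphaned uid, which is consistent with the web application (running as some other user) getting EACCES on a directory with mode drwxr-xr-x.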
[19:34] <nwatkins> gregaf: yes, that's what i meant
[19:34] <gregaf> k, awesome
[19:35] <gregaf> just wanted to make sure since i wasn't expecting the bug to be an incorrectly-implemented api :)
[19:39] <nwatkins> gregaf: yeh, it took some poking around. getFileStatus returns FileNotFound, but listStatus returns null
[19:40] <lxo> rats, the loss of directory layout info appears to be a murphy/heisenbug. I only run into it when I'm not looking; if I try to trigger it, I can't
[19:44] <lxo> I guess I'm gonna have to patch the sources to detect the problem and abort or something
[20:21] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Remote host closed the connection)
[20:26] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[20:36] * MKFG (~MK_FG@ has joined #ceph
[20:37] * MK_FG (~MK_FG@ Quit (Ping timeout: 480 seconds)
[20:37] * MKFG is now known as MK_FG
[20:37] <gregaf> nwatkins: I forgot to ask earlier — is that MDS crash you saw repeatable, a blocker, or just a one-time thing?
[20:39] <nwatkins> gregaf: so far it's just a one-time thing
[20:40] <gregaf> okay, cool — just trying to prioritize different stuff
[20:44] <nwatkins> gregaf: performance/stability wise, is it fine to go with btrfs or ext4? i'm moving this hadoop stuff onto the cluster now and it seems like there's been a lot of chatter on the mailing list about the underlying file system.
[20:47] <gregaf> nwatkins: there are a few issues with ext4 replay right now, but I think it's only if you're using cloning, which Hadoop won't, so you should be safe with that
[20:48] <gregaf> btrfs is our favorite, of course, but how pleasant that experience will be depends on what kernel you're running (and isn't a linear progression) :(
[20:49] <nwatkins> Ok, makes sense. I think for now I am willing to sacrifice performance for stability to work high-level bugs out before moving onto any sort of performance tuning.
[21:07] * fronlius (~Adium@testing78.jimdo-server.com) Quit (Quit: Leaving.)
[21:07] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[21:07] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:08] <Tv> yehudasa: ok so the thing confusing me was the acl vs policy; you're completely correct on the acl part, and anyone who wants "make all objects in this bucket world-readable" needs to set a policy (which is what I've seen people do, and hence argued that that feature exists)
[21:09] <Tv> e.g. http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?HostingWebsiteOnS3Setup.html
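As a rough sketch of the distinction Tv draws, a bucket policy that makes every object in a bucket world-readable on Amazon S3 looks something like the document built below. The function name `public_read_policy`, the `Sid` value, and the bucket name are placeholders chosen for illustration; whether a given S3-compatible gateway accepts bucket policies is not something the log establishes.

```python
import json


def public_read_policy(bucket):
    """Build an S3 bucket policy allowing anonymous GET on every object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",  # free-form label, placeholder
                "Effect": "Allow",
                "Principal": "*",              # anyone, including anonymous
                "Action": "s3:GetObject",
                # The trailing /* applies the statement to all objects,
                # not to the bucket resource itself.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


policy_json = json.dumps(public_read_policy("example-bucket"), indent=2)
```

An ACL, by contrast, is attached per object (or to the bucket itself), so it has no equivalent of the `/*` wildcard covering objects uploaded later.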
[21:43] * n0de (~ilyanabut@c-24-127-204-190.hsd1.fl.comcast.net) has joined #ceph
[21:49] * fronlius (~Adium@e176052048.adsl.alicedsl.de) has joined #ceph
[22:04] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:11] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:12] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:21] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:26] * n0de (~ilyanabut@c-24-127-204-190.hsd1.fl.comcast.net) Quit (Quit: This computer has gone to sleep)
[22:40] * MarkN (~nathan@ has joined #ceph
[22:42] * MarkN (~nathan@ Quit (Remote host closed the connection)
[22:43] * MarkN (~nathan@ has joined #ceph
[22:43] * MarkN (~nathan@ has left #ceph
[23:10] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[23:10] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:11] <grape> What size partition is the journal going to need, overall or per TB of storage?
[23:15] * bchrisman1 (~Adium@ has joined #ceph
[23:15] * bchrisman (~Adium@ Quit (Read error: Connection reset by peer)
[23:15] <grape> hey the wiki works :-)
[23:22] <joshd> grape: journal size is more related to workload than overall storage size
[23:22] * sage (~sage@ has joined #ceph
[23:36] <nwatkins> gregaf: the client debug is spewing something about BADAUTHORIZER when trying to connect. The repeated excerpt from the log is here http://pastebin.com/Nt8Cc4Be
[23:40] <joshd> nwatkins: that sounds like you have cephx enabled on the servers, but the client isn't using cephx - unless you have 649921132b6bca1ff104c3fffe9a09bb26aba3fc, which will give you a different error
[23:41] * fronlius (~Adium@e176052048.adsl.alicedsl.de) Quit (Quit: Leaving.)
[23:42] <nwatkins> joshd: that sounds right. so I guess I need to provide the keyring through the libcephfs initialize function?
[23:44] <joshd> yeah, that can be from your conf file, or set via ceph_conf_set before mounting
[23:50] <nwatkins> joshd: I'm unable to use ceph_conf_set (this is going through Java), so I tried to pass "-k /path/to/keyring" and this didn't work...
[23:52] <joshd> nwatkins: could be the wrong client name as well - try adding "-n client.name" too
[23:52] <joshd> and --auth-supported=cephx just to be sure
[23:53] <nwatkins> joshd: i'm not familiar with client name. Do I leave "client.name" as is, or is there something specific I should be replacing there?
[23:53] <joshd> it's the cephx username, e.g. client.admin
[23:53] <nwatkins> hrmm
[23:53] <joshd> if you cat your keyring you'll see it as section headers
[23:54] <nwatkins> got it
[23:56] <nwatkins> joshd: works now. thx!
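The exchange above boils down to two steps: read the cephx user name from the keyring's section headers, then pass the keyring path, the name, and the auth setting as arguments. A minimal sketch of that follows, assuming the keyring is a plain-text file with `[client.something]` section headers as joshd describes. The helper names and the sample keyring text (including the obviously fake key) are hypothetical; only the three options `-k`, `-n`, and `--auth-supported=cephx` come from the conversation.

```python
import re

# Matches a section header line such as "[client.admin]".
SECTION = re.compile(r"^\s*\[([^\]]+)\]\s*$")


def keyring_names(text):
    """Return the cephx user names found as section headers in keyring text."""
    names = []
    for line in text.splitlines():
        match = SECTION.match(line)
        if match:
            names.append(match.group(1).strip())
    return names


def client_args(keyring_path, name):
    """Assemble the argument list suggested in the log for a cephx client."""
    return ["-k", keyring_path, "-n", name, "--auth-supported=cephx"]


# Illustrative keyring contents; the key value is a placeholder, not a real key.
sample = "[client.admin]\n\tkey = PLACEHOLDER-NOT-A-REAL-KEY\n"
names = keyring_names(sample)
args = client_args("/path/to/keyring", names[0])
```

How the resulting list reaches libcephfs from Java depends on the binding in use, which the log does not show.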
[23:56] <grape> joshd: so the more IO, the bigger the journal needs to be. That makes sense. The docs suggest a small SSD. Are we talking 8GB, 16GB, 128GB?
[23:58] * tserong (~tserong@58-6-99-126.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[23:58] <grape> joshd: providing that workload==IO ;-)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.