#ceph IRC Log


IRC Log for 2011-11-01

Timestamps are in GMT/BST.

[0:04] <Tv> grape: don't stress too much about the SSD-ness of it
[0:04] <Tv> grape: I think we're not at the level where that would make that huge of a difference
[0:05] <Tv> ok so if crowbar can't install a deb, it'll keep going as if it had worked.. lovely
[0:07] * tserong (~tserong@124-168-228-191.dyn.iinet.net.au) has joined #ceph
[0:09] * bchrisman1 (~Adium@ Quit (Quit: Leaving.)
[0:09] <grape> Tv: I'm for sure not stressing, but trying to figure out what I am going to need for the future.
[0:10] <gregaf> grape: the default journal size is 100MB and that's probably reasonable. Basically what you want is something large enough to absorb a few seconds of IO but not something so large that if/when it runs ahead of the main store you have to spend several seconds idling while a sync completes
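
Editor's note: gregaf's rule of thumb (a journal big enough to absorb a few seconds of IO, but no bigger) can be turned into rough arithmetic. The sketch below is only an illustration; the function name and the example throughput figure are assumptions, not values from the channel or from Ceph itself.

```python
def journal_size_mb(write_throughput_mb_s: float, seconds_to_absorb: float) -> float:
    """Estimate a journal size in MB.

    Hypothetical helper: multiplies the sustained write rate you expect
    (MB/s) by the number of seconds of IO the journal should absorb.
    """
    if write_throughput_mb_s < 0 or seconds_to_absorb < 0:
        raise ValueError("throughput and seconds must be non-negative")
    return write_throughput_mb_s * seconds_to_absorb


# Assumed example: a disk sustaining about 50 MB/s, absorbing 2 seconds of IO.
print(journal_size_mb(50.0, 2.0))
```

With those assumed inputs the estimate comes out at the 100MB default mentioned above; a faster device or a longer window would push it higher.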
[0:10] <Tv> grape: well let me put it this way: we pretty much don't have benchmarks, so we don't have benchmarks that'd show ssd makes it this-and-this-much faster
[0:11] <grape> gregaf: thanks, that helps.
[0:11] <grape> Tv: So nobody can prove my servers are slow. I like it :-)
[0:15] <grape> Tv: More seriously, I did notice a lack of benchmarks, and figured it was simply not done during heavy development for obvious reasons.
[0:15] <Tv> grape: yeah we're (still) dealing more with robust operation under load
[0:15] <Tv> grape: and easing the initial installation and ongoing maintenance, etc
[0:15] <Tv> grape: benchmarks are a natural next step, but not quite yet
[0:16] <grape> agreed. simple order of priorities.
[0:16] <Tv> grape: so all the load we put on ceph tends to be of the "just hit it hard, i don't care how hard" variety
[0:16] <Tv> grape: and success is defined as "all the operations completed successfully, the right data is there, nothing crashed"
[0:17] <grape> yeah crashed filesystems don't benchmark well.
[0:17] <gregaf> if it makes you feel better, the few benchmarks we've seen/done tend to indicate you'll hit wire speed on file transfers (although metadata-heavy ops can be a lot slower, but that's just a fact of life with clustered systems)
[0:18] <Tv> frankly, given the number of spindles we can put to the task, i'm not worried about actual data IO ;)
[0:19] <grape> I saw that with Gluster my biggest limitation was going to be network, and I don't think that is going to change much with any distributed FS.
[0:21] <grape> That is until I get rich and cough up a fortune for 10GbE and SSD, but I don't see that happening any time soon.
[0:24] * sandeen_ (~sandeen@sandeen.net) Quit (Quit: This computer has gone to sleep)
[0:34] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[0:38] * verwilst (~verwilst@dD57670C5.access.telenet.be) Quit (Quit: Ex-Chat)
[1:21] * cp (~cp@ Quit (Quit: cp)
[1:45] <nwatkins> gregaf: well, except for a few hacks to keep things flowing things are looking good...just ran a nice big multi-GB terasort run on 12 nodes with success
[1:51] * nwatkins (~nwatkins@kyoto.soe.ucsc.edu) has left #ceph
[2:38] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[2:48] * tserong (~tserong@124-168-228-191.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[2:57] * tserong (~tserong@58-6-102-149.dyn.iinet.net.au) has joined #ceph
[3:50] * gregorg_taf (~Greg@ Quit (Ping timeout: 480 seconds)
[3:58] * gohko_ (~gohko@natter.interq.or.jp) has joined #ceph
[3:58] * gohko (~gohko@natter.interq.or.jp) Quit (Read error: Connection reset by peer)
[3:59] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[3:59] * gohko_ (~gohko@natter.interq.or.jp) Quit (Read error: Connection reset by peer)
[4:31] * tsuzuki (~tsuzuki@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[4:36] <tsuzuki> Hi all, I need your help. I want to export a 20G rbd image ("rbd export volume-0000002b test.img"), but the command has no effect. How do I export an rbd image?
[4:58] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[5:04] * sage (~sage@ Quit (Ping timeout: 480 seconds)
[5:14] * sage (~sage@ has joined #ceph
[9:57] * verwilst (~verwilst@d51A5B07F.access.telenet.be) has joined #ceph
[10:00] * fronlius (~Adium@testing78.jimdo-server.com) has joined #ceph
[10:54] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:57] * tserong (~tserong@58-6-102-149.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[11:43] <mrjack> is 0.38 ready?
[11:55] * tsuzuki (~tsuzuki@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:01] * tserong (~tserong@58-6-102-149.dyn.iinet.net.au) has joined #ceph
[12:04] * failbaitr (~innerheig@ Quit (Quit: leaving)
[12:04] * failbaitr (~innerheig@ has joined #ceph
[14:58] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[15:10] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[15:26] * fronlius1 (~Adium@testing78.jimdo-server.com) has joined #ceph
[15:26] * fronlius (~Adium@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[15:29] * verwilst (~verwilst@d51A5B07F.access.telenet.be) Quit (Quit: Ex-Chat)
[16:05] * adjohn (~adjohn@70-36-139-78.dsl.dynamic.sonic.net) has joined #ceph
[16:21] * adjohn (~adjohn@70-36-139-78.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[16:33] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:11] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[17:30] <gregaf> mrjack: .38 will probably get packaged up at the end of this week or the beginning of next week — any specific reason you're looking for it?
[17:39] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:00] * bchrisman (~Adium@ has joined #ceph
[18:11] * fronlius1 (~Adium@testing78.jimdo-server.com) Quit (Read error: No route to host)
[18:11] * fronlius (~Adium@testing78.jimdo-server.com) has joined #ceph
[18:22] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[18:22] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[18:23] * _are__ (~quassel@vs01.lug-s.org) has joined #ceph
[18:24] * _are_ (~quassel@vs01.lug-s.org) Quit (Remote host closed the connection)
[18:26] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[18:26] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[18:26] * jrosser (jrosser@dog.thdo.woaf.net) Quit (Remote host closed the connection)
[18:26] * jrosser (jrosser@dog.thdo.woaf.net) has joined #ceph
[18:28] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[18:28] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:37] <grape> Allrighty... Back to Ceph. Time to get this installed.
[18:52] * cp (~cp@ has joined #ceph
[19:02] <damoxc> gregaf: does mds bal frag require multiple active mds?
[19:02] * sagelap (~sage@ has joined #ceph
[19:02] <gregaf> damoxc: nope
[19:02] <damoxc> how stable/unstable is it?
[19:02] <gregaf> mds bal frag enables directory fragmentation, which breaks apart large directories into smaller groups
[19:02] <gregaf> on very large directories that can be advantageous for a single MDS
[19:03] <gregaf> sagelap can give you a better idea of stability than I can
[19:03] <damoxc> sagelap?
[19:03] <sagelap> damoxc: when i last ran it through the thrashing it behaved, but i dropped the ball and didn't turn it on by default due to lack of testing w/ clustered mds. so i'm not certain
[19:04] <grape> When I install ceph, at what point in the process am I formatting drives to btrfs? Should this happen before or after configuring ceph.conf?
[19:05] <gregaf> grape: if you're giving the OSDs a raw partition to use they'll format it as btrfs themselves
[19:05] <damoxc> sagelap: okay cool, well i'll turn it on on one of my clusters and see what happens. is it just as simple as adding mds bal frag = true to the [mds] section and restarting the mds servers?
[19:05] <sagelap> yep
[19:05] <gregaf> grape: if you're just giving them a directory you need to create it before they start up
[19:05] <sagelap> damoxc: just make sure it's disposable data
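
Editor's note: the change damoxc describes (adding "mds bal frag = true" to the [mds] section) could be scripted roughly as below. This is a sketch using Python's standard configparser, under the assumption that the ceph.conf in question is plain INI that configparser can parse; the function name is hypothetical. Rewriting the file this way drops comments, and the MDS daemons still need a restart afterwards, as discussed above.

```python
import configparser
import io


def enable_dir_fragmentation(conf_text: str) -> str:
    """Return conf_text with 'mds bal frag = true' set under [mds].

    Hypothetical helper; assumes the input is INI-style text.
    """
    # interpolation=None so values containing '%' are left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("mds"):
        parser.add_section("mds")
    parser.set("mds", "mds bal frag", "true")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Assumed minimal input, for illustration only.
print(enable_dir_fragmentation("[mds]\n"))
```

As sagelap says, only try this on disposable data.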
[19:06] <damoxc> sagelap: is it only the metadata stuff that will be trashed, or the whole cluster if it goes wrong?
[19:07] <grape> gregaf: so I should simply create the partition and leave it raw. Does this auto-formatting include the creation of journal space, or am I creating that separately?
[19:08] <gregaf> grape: you need to give it a separate journal — again, you can either give it a raw block device which it will use on its own, or you can give it a file somewhere to use
[19:11] <grape> great, thanks
[19:14] <grape> gregaf: Should those partitions be set to mount at boot via fstab?
[19:17] <damoxc> grape: can be, or the ceph init scripts will mount them
[19:17] <grape> that might be safer
[19:26] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[19:29] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[19:33] * DanielFriesen (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[19:41] <grape> Does OSD stand for Object Storage Daemon?
[19:45] <joshd> Object Storage Device - http://en.wikipedia.org/wiki/Object_storage_device#History
[19:45] <gregaf> it's muddied a bit with Object Storage Device, but for our purposes Object Storage Daemon is a better choice
[19:46] <grape> joshd: thanks
[19:47] <grape> gregaf: This is what I appear to have locked on to. I am working on making some sense out of the docs, re-writing some of it to suit my literal mind :-)
[19:48] <grape> I'm running with the "document first, install second" theory today.
[19:50] <grape> It seems like OSD makes sense to use for the device and COSD for the daemon.
[19:51] <grape> but the waters have been muddied ;-)
[19:53] <joshd> even more temporarily, since cosd was renamed ceph-osd in 0.36
[19:54] <grape> oh my!
[19:56] <grape> now the docs have three names for one thing, one of which means something else as well :-D
[19:57] <grape> I love docs :-)
[20:04] <grape> Would it be accurate to say that a Ceph cluster has four primary components: OSD, Ceph-OSD/COSD, CMDS, and CMON?
[20:05] * fronlius (~Adium@testing78.jimdo-server.com) Quit (Quit: Leaving.)
[20:06] <yehudasa> grape: yes, if you count the osd twice
[20:06] <yehudasa> grape: the fourth component would be the client
[20:07] <yehudasa> grape: but the mds is only required if you use the filesystem
[20:07] <grape> yehudasa: ah thanks
[20:08] <joshd> grape: everything that started with c was renamed to ceph- in 0.36, not just cosd (http://ceph.newdream.net/2011/09/v0-36-released/)
[20:09] <grape> joshd: that's good to know. I was wondering about that
[20:10] <grape> yehudasa: is block storage a case where the filesystem is not used, and MDS is not required?
[20:10] <yehudasa> grape: yes
[20:10] <grape> yehudasa: thanks
[20:16] <grape> Is the "max-mds" parameter limited to ceph.conf?
[20:23] * verwilst (~verwilst@d51A5B07F.access.telenet.be) has joined #ceph
[20:34] <sagelap> grape: i believe so.. probably only used during mkcephfs (i.e., mostly useless)
[20:43] * greglap (~Adium@aon.hq.newdream.net) has joined #ceph
[20:43] <greglap> you can set the max mds after startup with the set_max_mds command:
[20:46] <damoxc> sagelap: will mds fragmentation only destroy the metadata, would rados be left intact?
[20:46] <damoxc> sagelap: or is there a risk the whole lot will die?
[20:47] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[20:47] <damoxc> :-(
[20:48] * nwatkins (~nwatkins@kyoto.soe.ucsc.edu) has joined #ceph
[20:48] <greglap> damoxc: it's not going to break the data objects, but without metadata you're going to have a hell of a time retrieving it...
[20:49] <greglap> (the filesystem data, that is. rados itself won't care a bit)
[20:49] <damoxc> greglap: i'm not fussed about losing the filesystem data, however I have a rbd on the cluster I'm going to test on I would rather not have die on me
[20:49] <greglap> damoxc: yeah, that won't get lost
[20:49] <greglap> as far as the OSDs are concerned the MDS is just another client
[20:49] <damoxc> greglap: cool :-)
[20:49] * sagelap (~sage@ has joined #ceph
[20:50] <nwatkins> gregaf: have any opinions about passing owner/group from hadoop down to ceph? hadoop deals in strings for user/group, each of which could be resolved to the local uid/gid and passed down through libcephfs.
[20:52] <greglap> nwatkins: I haven't thought about it since August 2009, when I evidently punted on it...
[20:53] <greglap> I'm not sure if local uid resolution is any good or not, or if we need to do something annoying like maintaining a lookup
[20:53] <nwatkins> what route does fuse take?
[20:53] <greglap> ceph itself deals with local uids and gids
[20:54] <nwatkins> Yeh, but ceph fuse will fill in gid/uid with the local users' information?
[20:54] <greglap> my assumption though is that hadoop uses uid/gid strings because they're supposed to be stable across large clusters where the local machines aren't necessarily synced?
[20:54] <greglap> nwatkins: yes
[20:55] <nwatkins> Each file system can implement setOwner(path, string:user, string:group) -- I think it's up to the file system to deal with this information.
[20:56] <nwatkins> Potentially we could has the strings to numeric values if Hadoop
[20:56] <nwatkins> ^has = hash
[20:57] <nwatkins> or stuff the Hadoop user/group strings in xattrs
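
Editor's note: the hashing idea nwatkins floats here could look something like the sketch below. It is not the approach anyone in the channel committed to; the function name, the choice of SHA-256, and the id range are all assumptions made for illustration. Distinct names can collide, which is one reason this is only a stop-gap.

```python
import hashlib


def name_to_id(name: str, low: int = 100000, high: int = 2**31 - 1) -> int:
    """Map a user or group name string to a stable numeric id.

    Hypothetical helper: hashes the name and folds the result into
    the half-open range [low, high). The same name always maps to the
    same id on every node, with no local account lookup.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")
    return low + value % (high - low)


# The mapping is deterministic, so every node computes the same id.
print(name_to_id("hadoop") == name_to_id("hadoop"))
```

The lower bound is an assumed way to stay clear of ids typically used by local accounts; the xattr alternative mentioned above would keep the original strings instead of deriving numbers from them.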
[20:59] <greglap> nwatkins: yeah, I'm just not familiar enough with the Hadoop stuff to know what a good way to go is
[21:03] <nwatkins> greglap: using local gid/uid is nice because it could mean that when Ceph is mounted users have a unified view of permissions across Hadoop and the VFS. But you are also right that Hadoop doesn't necessarily assume that local users exist on the nodes. One of the techniques I mentioned (e.g. hashing) provides a stop-gap that eliminates the need for the patch to Hadoop, so I guess my question was more about if you had any strong preferences one way or another at this point.
[21:04] * The_Bishop (~bishop@port-92-206-76-12.dynamic.qsc.de) has joined #ceph
[21:06] <greglap> nwatkins: I don't at this point; it's basically a user interaction problem that you can probably guess at better than I can :)
[21:12] <nwatkins> greglap: ok thanks--i'll come up with something sane :)
[21:13] <greglap> cool, let me know if you run into other issues
[21:14] * fronlius (~Adium@e182089016.adsl.alicedsl.de) has joined #ceph
[21:23] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[21:41] <grape> I'm starting to think that these docs have beer as a non-negotiable dependency
[21:47] * fronlius (~Adium@e182089016.adsl.alicedsl.de) Quit (Quit: Leaving.)
[22:04] <grape> ah there are some gems tucked away in the new docs on github
[22:13] <Tv> grape: not just on github: http://ceph.newdream.net/docs
[22:14] <grape> Oy!
[22:15] <grape> ok, maybe not Oy. Looks like that's the stuff from github.
[22:15] <Tv> same content, more sugar coating
[22:18] * greglap (~Adium@aon.hq.newdream.net) Quit (Quit: Leaving.)
[22:32] * sagelap (~sage@ has joined #ceph
[22:42] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[22:44] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:48] * votz (~votz@pool-108-52-121-103.phlapa.fios.verizon.net) has joined #ceph
[23:09] * verwilst (~verwilst@d51A5B07F.access.telenet.be) Quit (Quit: Ex-Chat)
[23:16] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[23:23] * grape (~grape@ Quit (Read error: Connection reset by peer)
[23:23] * grape (~grape@ has joined #ceph
[23:37] * cp (~cp@ Quit (Quit: cp)
[23:41] * cp (~cp@ has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.