#ceph IRC Log


IRC Log for 2010-10-02

Timestamps are in GMT/BST.

[0:00] <gregaf> actually that ability is shortly going to become much more useful when we merge in the directory default file layouts
[0:01] <wido> idletask: for example, you mirror some data, a webserver should only need RO
[0:01] <gregaf> you could, eg have a machine configured with your base set of programs in a directory that's located on a pool that all your machines can read from, but not write to
[0:02] <gregaf> and then the home directories are located on a pool with read-write access
[0:02] <idletask> Well, I maintain, maybe the rwx isn't quite the right model in this case
[0:02] <wido> gregaf: do you mean we can have multiple data pools in the future?
[0:02] <wido> instead of the one big growing pool?
[0:02] <gregaf> the file_layouts branches that I've been working in let you set a default layout on a directory
[0:03] <wido> Oh, that is really great!
[0:03] <gregaf> and then any files created in the directory tree rooted on that dir will inherit that layout
[0:03] <gregaf> and layouts specify pools
[0:03] <cmccabe> idletask: clients need to talk directly with OSDs in order to cut down on latency
[0:03] <wido> That is something I really missed gregaf
[0:03] <cmccabe> at least that's my understanding from everything I've read
[0:03] <gregaf> It won't automatically move the tree to a new layout (and pool), but if you set them before any data's in there it'll work out
[0:04] <wido> great!
[0:04] <gregaf> Sage should be looking at it in the next few days, at which point I expect we'll merge it into unstable
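The inheritance rule gregaf describes (a default layout set on a directory applies to files created afterwards anywhere in that subtree, and existing files are not moved) can be illustrated with a toy model. This is only a sketch, not the file_layouts code or the cephfs tool: the `Layout` and `ToyTree` names, the paths, and the `home_rw` pool name are all hypothetical.

```python
import posixpath
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Layout:
    """Hypothetical stand-in for a file layout; only the fields named in the chat."""
    pool: str
    preferred_osd: Optional[int] = None  # normally left unset


class ToyTree:
    """Toy filesystem tree: directories may carry a default layout."""

    def __init__(self, root_layout: Layout) -> None:
        self.dir_layouts: Dict[str, Layout] = {"/": root_layout}
        self.files: Dict[str, Layout] = {}

    def set_dir_layout(self, path: str, layout: Layout) -> None:
        # Only affects files created later; existing files keep their layout.
        self.dir_layouts[posixpath.normpath(path)] = layout

    def effective_layout(self, dir_path: str) -> Layout:
        # Walk up towards the root until a directory with a default layout is found.
        current = posixpath.normpath(dir_path)
        while current not in self.dir_layouts:
            current = posixpath.dirname(current)
        return self.dir_layouts[current]

    def create_file(self, path: str) -> Layout:
        path = posixpath.normpath(path)
        layout = self.effective_layout(posixpath.dirname(path))
        self.files[path] = layout  # the layout is fixed at creation time
        return layout


tree = ToyTree(Layout(pool="data"))
tree.create_file("/home/alice/old.txt")            # created before the layout is set
tree.set_dir_layout("/home", Layout(pool="home_rw"))
tree.create_file("/home/alice/new.txt")            # inherits the /home layout
```

In this model `old.txt` stays in `data` while `new.txt` lands in `home_rw`, which is why the layout should be set before any data is written under the directory.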
[0:04] <idletask> cmccabe: that's my "guts" understanding as well, but this doesn't necessarily translate to "have 'r' permission on OSDs"
[0:05] <wido> file_layouts branch, right? How do I set a layout? With an xattr?
[0:05] <idletask> This just sounds too UNIX-y and "restrictive"
[0:05] <gregaf> wido: there's an ioctl and the userspace tree has a new tool cephfs that lets you execute most of the ioctls
[0:06] <gregaf> it's not packaged up nicely yet
[0:06] <wido> ok, I might give it a try
[0:06] <wido> the next two weeks we're moving datacenters, so I might be short on time
[0:06] <gregaf> but basically you'll want to set up your filesystem tree and then, before you start creating files, run the tool on each dir you want rooted in a different pool
[0:07] <wido> yes. But that's nice, for example your maildir can be placed on a pool which is on SSD
[0:07] <gregaf> you won't want to install it on your main cluster until we merge it, though, it changes the on-disk encoding a little
[0:07] <wido> where all my movies are on SATA disk
[0:07] <wido> no, I have some VMs where I do the small tests
[0:07] <gregaf> idletask: well strictly speaking you get permissions on OSD "pools"
[0:08] <gregaf> and the permissions are read the pool, write to the pool, and execute on the pool
[0:08] <gregaf> in a default install the only pools you have are "data", which clients have read/write on, metadata, which the MDSes have read/write on, and then if you use RBD or the S3 gateway those'll add pools too
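The per-pool permission model gregaf outlines can be pictured as a lookup from (entity, pool) to a permission string. The sketch below is a toy model under stated assumptions, not Ceph's capability syntax or API: the `allowed` helper, the entity labels, and the read-only `base` pool are hypothetical; only the `data` and `metadata` defaults come from the chat.

```python
from typing import Dict, Tuple

Caps = Dict[Tuple[str, str], str]

# Defaults as described in the chat: clients get rw on "data", MDSes get rw on "metadata".
DEFAULT_CAPS: Caps = {
    ("client", "data"): "rw",
    ("mds", "metadata"): "rw",
}


def allowed(caps: Caps, entity: str, pool: str, op: str) -> bool:
    """Return True if `entity` holds permission `op` ('r', 'w' or 'x') on `pool`."""
    if op not in ("r", "w", "x"):
        raise ValueError(f"unknown operation: {op!r}")
    # No entry for the (entity, pool) pair means no access at all.
    return op in caps.get((entity, pool), "")


# Hypothetical extra pool holding a shared base set of programs, readable but not writable.
caps = dict(DEFAULT_CAPS)
caps[("client", "base")] = "r"
```

With these entries a client can read from `base` but not write to it, while `data` stays read-write, matching the read-only base directory plus read-write home directories example earlier in the log.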
[0:09] <cmccabe> gregaf: file_layouts sounds cool. Being able to migrate stuff around easily could be a big feature
[0:09] <gregaf> yeah, it doesn't implement that at all though
[0:09] <cmccabe> so it's just for creating new PGs?
[0:09] <gregaf> easy migration will require a lot more code to do synchronization and stuff
[0:09] <cmccabe> or I mean what does the ioctl look like
[0:10] <cmccabe> based on what my friends in the storage industry say, system administrators spend a lot of time moving data between filesystems
[0:10] <cmccabe> until recently Netapp's filesystem (WAFL) had a 16 TB limit
[0:10] <gregaf> it lets you set a layout, which specifies things like (for wido, most usefully) the pool, and any preferred OSD, and the striping strategy on the pool
[0:10] <cmccabe> of course Ceph is kind of a different thing
[0:11] <wido> gregaf: why a preferred OSD? Isn't that done by the crushmap and crushrule for that pool?
[0:12] <gregaf> generally speaking the preferred OSD isn't something you want to set
[0:13] <gregaf> but as I understand, it lets you sort of force-feed a location to CRUSH
[0:14] <gregaf> if for instance you're making temporary data and happen to know that a particular OSD sits next to you in the rack
[0:14] <gregaf> it'll still distribute it according to the pool rules but it lets you set a nearby primary
[0:14] <gregaf> I don't think we have anything that uses it, though; it turned out not to help Hadoop at all in our small-scale tests, which was the one area we thought it might be helpful
[0:15] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[0:15] <wido> ah, ok
[0:15] <wido> but a write isn't completed until all replicas received it, isn't it?
[0:16] <gregaf> correct
[0:16] <gregaf> but if you have a temp pool with 1x replication....
[0:16] <wido> ah, didn't think of that.
[0:17] <wido> Well, I might have a solution. For phprados I'm writing a session handler, just proof of concept
[0:17] <gregaf> and you get an unsafe ack once it's journaled on the primary, which you can access via rados
[0:17] <wido> sessions are temp data in PHP, you want them to be written fast
[0:17] <gregaf> and there might be a switch to make the Ceph clients consider a request done (for reporting purposes) once you get an unsafe ack, not sure though
[0:18] <gregaf> but mostly preferred_osd (really you should think of it as preferred_primary) I think was something that seemed like it might be a good idea to have, so we do
[0:18] <gregaf> even though we haven't come up with any good use cases for it yet :)
[0:18] <wido> No, but performance wise you could want it
[0:18] <wido> and the unsafe ack in conjunction
[0:19] <wido> think about replication to a second DC, where you want the primary to be near you
[0:19] <gregaf> haha, I try not to think of that :p
[0:19] <gregaf> that'll be a big project, for version 2 or 3 or something waaaay down the road
[0:20] <wido> yes :) But I'm thinking about RADOS too, not Ceph only
[0:20] <wido> RADOS works pretty well over higher-latency (30ms) links
[0:27] <wido> well, I'm going afk, getting late here
[0:27] <cmccabe> later
[0:28] <wido> sagewk: I saw you committed some fixes for the MDS? Thanks! I'll try them out asap
[0:28] <wido> ttyl
[0:29] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[1:40] * cmccabe (~cmccabe@dsl081-243-128.sfo1.dsl.speakeasy.net) Quit (Remote host closed the connection)
[1:41] * cmccabe (~cmccabe@dsl081-243-128.sfo1.dsl.speakeasy.net) has joined #ceph
[2:34] * greglap (~Adium@ has joined #ceph
[3:02] * cmccabe (~cmccabe@dsl081-243-128.sfo1.dsl.speakeasy.net) has left #ceph
[3:20] * greglap (~Adium@ Quit (Read error: Connection reset by peer)
[3:23] <idletask> I must be off, see you, have fun!
[3:23] * idletask (~fg@AOrleans-553-1-29-88.w92-152.abo.wanadoo.fr) Quit (Quit: .)
[3:45] * josef (~seven@nat-pool-rdu.redhat.com) Quit (Quit: leaving)
[4:18] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) has joined #ceph
[4:39] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) Quit (Ping timeout: 480 seconds)
[4:44] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[5:31] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[6:13] * yehudasa_hm (~yehuda@adsl-69-225-137-176.dsl.irvnca.pacbell.net) Quit (Ping timeout: 480 seconds)
[7:29] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[7:30] * tjikkun (~tjikkun@195-240-122-237.ip.telfort.nl) has joined #ceph
[7:40] * tjikkun (~tjikkun@195-240-122-237.ip.telfort.nl) Quit (Ping timeout: 480 seconds)
[7:41] * tjikkun (~tjikkun@195-240-122-237.ip.telfort.nl) has joined #ceph
[9:00] * LW (~jkreger@rrcs-98-101-117-50.midsouth.biz.rr.com) Quit (Ping timeout: 480 seconds)
[9:07] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) has joined #ceph
[9:10] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[9:14] * allsystemsarego (~allsystem@ has joined #ceph
[10:29] * sentinel_e86 (~sentinel_@ Quit (Remote host closed the connection)
[11:04] * sentinel_e86 (~sentinel_@ has joined #ceph
[11:52] * tjikkun (~tjikkun@195-240-122-237.ip.telfort.nl) Quit (Ping timeout: 480 seconds)
[14:24] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[14:46] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:47] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) Quit (Remote host closed the connection)
[17:40] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[18:42] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[19:14] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[19:29] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[22:10] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[22:31] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.