#ceph IRC Log


IRC Log for 2011-03-22

Timestamps are in GMT/BST.

[0:01] <cmccabe> interesting idea for listen()
[0:01] <gregaf> mmm Babylon 5 seasons 2-4
[0:01] <sjust> indeed
[0:03] <cmccabe> that would make sockets slightly less special, which would probably be good.
[0:03] <Tv> http://cr.yp.to/tcpip/twofd.html
[0:03] <Tv> cmccabe: exactly
[0:04] <cmccabe> tv: kind of interesting that netcat is such a recent invention
[0:04] <cmccabe> tv: you would think something like that would be old, old, old, but it's not really
[0:04] <Tv> it was well established >10 years ago
[0:05] <Tv> hah 1996
[0:05] <cmccabe> tv: that's nothing on the unix timescale though
[0:05] <gregaf> dude, I was alive then
[0:05] <cmccabe> haha
[0:05] <gregaf> it's younger than bloody Python is
[0:09] <Meths> and Ruby
[0:11] <gregaf> meh, Ruby's a web language :p
[0:11] <gregaf> they're all new
[0:12] <cmccabe> ruby was around for a bunch of years in japan before it became a phenomenon outside the country
[0:12] <cmccabe> but yeah, still not very old in comparison to... say, sockets :)
[0:12] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[0:30] <Tv> sagewk: i want to do terrible, violent things to ceph's in-kernel base64 logic..
[0:30] <sagewk> armor.c?
[0:30] <Tv> yeah
[0:30] <sagewk> be my guest :)
[0:30] <Tv> sagewk: so i'm thinking, new-style keys will be binary in kernel, once we can drop the secret= option we can drop armor.c
[0:31] <Tv> just need to fiddle with the interfaces
[0:32] <sagewk> that works for me
[0:32] <Tv> if i don't make that switch now, it'll be harder to do later
[0:32] <Tv> or i need to embed keys in a struct that says what type they are etc
[0:32] <Tv> so in the end i think it's just easier to do this switch at this point
[0:33] <Tv> and when i say embed keys in a struct i mean the userspace-visible things
[0:33] <Tv> i'd rather have those be just blobs of entropy
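The "armor" being discussed is base64 encoding of binary secrets for text transports like the `secret=` mount option. A minimal Python sketch of the round trip, assuming plain standard base64 (the kernel's armor.c may differ in detail):

```python
import base64
import os

def armor(raw: bytes) -> str:
    """Base64-encode a binary secret for text transports
    (illustrative stand-in for the kernel's armor.c)."""
    return base64.b64encode(raw).decode("ascii")

def dearmor(text: str) -> bytes:
    """Decode an armored secret back to the raw binary blob."""
    return base64.b64decode(text.encode("ascii"))

secret = os.urandom(16)   # "just a blob of entropy"
assert dearmor(armor(secret)) == secret
```

Tv's point is that if the kernel accepts keys as raw binary blobs from the start, this encode/decode layer can be dropped entirely once `secret=` goes away.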
[0:57] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[1:02] * samsung (~samsung@ has joined #ceph
[1:27] * greglap (~Adium@ has joined #ceph
[2:02] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[2:11] * greglap (~Adium@ Quit (Quit: Leaving.)
[2:49] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[2:49] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:52] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[2:52] * rajeshr (~Adium@ Quit (Quit: Leaving.)
[4:31] * samsung (~samsung@ Quit (Quit: Leaving)
[4:47] * rajeshr (~Adium@99-7-122-114.lightspeed.brbnca.sbcglobal.net) has joined #ceph
[5:38] * lidongyang_ (~lidongyan@ Quit (Remote host closed the connection)
[5:55] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: neurodrone)
[5:56] * samsung (~samsung@ has joined #ceph
[5:58] * Yuki (~Yuki@ has joined #ceph
[5:59] <Yuki> samsung,
[6:26] * Yuki (~Yuki@ Quit (Ping timeout: 480 seconds)
[6:45] * Yuki (~Yuki@ has joined #ceph
[7:11] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has left #ceph
[8:24] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[9:03] * andret (~andre@pcandre.nine.ch) Quit (Remote host closed the connection)
[9:06] * andret (~andre@pcandre.nine.ch) has joined #ceph
[9:14] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:17] * allsystemsarego (~allsystem@ has joined #ceph
[9:30] * allsystemsarego_ (~allsystem@ has joined #ceph
[9:33] * allsystemsarego_ (~allsystem@ Quit ()
[9:34] * allsystemsarego (~allsystem@ Quit (Ping timeout: 480 seconds)
[10:22] * Yoric (~David@ has joined #ceph
[10:49] * atgeek (~atg@please.dont.hacktheinter.net) has joined #ceph
[10:49] * yehuda_wk (~quassel@ip-66-33-206-8.dreamhost.com) has joined #ceph
[10:51] * iggy_ (~iggy@theiggy.com) has joined #ceph
[10:51] * [ack]_ (ANONYMOUS@ has joined #ceph
[10:51] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (synthon.oftc.net osmotic.oftc.net)
[10:51] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) Quit (synthon.oftc.net osmotic.oftc.net)
[10:51] * votz (~votz@dhcp0020.grt.resnet.group.UPENN.EDU) Quit (synthon.oftc.net osmotic.oftc.net)
[10:51] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) Quit (synthon.oftc.net osmotic.oftc.net)
[10:51] * yehudasa (~quassel@ip-66-33-206-8.dreamhost.com) Quit (synthon.oftc.net osmotic.oftc.net)
[10:51] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (synthon.oftc.net osmotic.oftc.net)
[10:51] * darkfaded (~floh@ Quit (synthon.oftc.net osmotic.oftc.net)
[10:51] * iggy (~iggy@theiggy.com) Quit (synthon.oftc.net osmotic.oftc.net)
[10:51] * [ack] (ANONYMOUS@ Quit (synthon.oftc.net osmotic.oftc.net)
[10:51] * atg (~atg@please.dont.hacktheinter.net) Quit (synthon.oftc.net osmotic.oftc.net)
[10:54] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[10:56] * darkfader (~floh@ has joined #ceph
[10:56] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[10:56] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) has joined #ceph
[10:57] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[10:57] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[11:20] * Yuki (~Yuki@ Quit (Quit: Leaving)
[11:31] * Administrator_ (~samsung@ has joined #ceph
[11:37] * samsung (~samsung@ Quit (Ping timeout: 480 seconds)
[12:16] * Yuki (~Yuki@ has joined #ceph
[13:20] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[13:24] <Administrator_> how to test ceph in a big commercial environment?
[13:24] * Administrator_ is now known as samhung
[13:25] <samhung> sometimes it's unstable
[13:25] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[14:19] * Yuki (~Yuki@ Quit (Quit: Leaving)
[15:06] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: neurodrone)
[15:50] * neurodrone (~neurodron@dhcp213-123.wireless.buffalo.edu) has joined #ceph
[16:07] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[16:13] * neurodrone (~neurodron@dhcp213-123.wireless.buffalo.edu) Quit (Quit: neurodrone)
[16:17] * maswan (maswan@kennedy.acc.umu.se) has joined #ceph
[16:37] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[16:45] * andret (~andre@pcandre.nine.ch) Quit (Remote host closed the connection)
[16:48] * andret (~andre@pcandre.nine.ch) has joined #ceph
[16:51] * greglap (~Adium@ has joined #ceph
[17:14] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[17:21] <Tv> sjust: so how do i recognize the jammed-heartbeat bug from the logs?
[17:22] <greglap> Tv: which jammed-heartbeat bug?
[17:23] <Tv> greglap: sjust found something funky earlier
[17:24] <Tv> autotest just reproduced it (/something similar) for us
[17:25] <greglap> ah, by "jammed-heartbeat" I thought you meant the heartbeats were stuck while everything else kept going
[17:28] <Tv> greglap: oh sorry, the exact opposite
[17:28] <Tv> jammed-with-heartbeat
[17:28] <Tv> which is worse
[17:28] <greglap> yeah, I got it now — been talking to him about this offline occasionally :)
[17:29] <Tv> i see 8 threads in poll(), rest in pthread_cond_wait
[17:29] <greglap> the ones in poll() are presumably messenger pipes
[17:36] <Tv> the ones i checked were
[17:38] <sjust> greglap: fun fact, it's in journal_writeahead mode
[17:39] * neurodrone (~neurodron@dhcp214-144.wireless.buffalo.edu) has joined #ceph
[17:39] <greglap> sorry, not sure what that means here...
[17:39] <greglap> at the station, be in soon
[17:39] * greglap (~Adium@ Quit (Quit: Leaving.)
[17:39] <sjust> just that we more often test parallel
[17:41] <Tv> sjust: job 263 looks similar but different
[17:41] <Tv> the logs have way more stuff
[17:53] * cmccabe (~cmccabe@ has joined #ceph
[18:00] <gregaf> sjust: we may start testing parallel more often, but I think I'm the only one who ever runs parallel on their dev machine...
[18:00] <cmccabe> gregaf: parallel?
[18:02] * rajeshr (~Adium@99-7-122-114.lightspeed.brbnca.sbcglobal.net) Quit (Quit: Leaving.)
[18:02] <gregaf> journaling mode
[18:02] <gregaf> as opposed to write-ahead
[18:02] <gregaf> or write-behind, which is useless and I don't think anybody uses
[18:03] <gregaf> because it is useless
[18:09] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:14] * neurodrone (~neurodron@dhcp214-144.wireless.buffalo.edu) Quit (Quit: neurodrone)
[18:34] * rajeshr (~Adium@ has joined #ceph
[19:14] * neurodrone (~neurodron@dhcp205-162.wireless.buffalo.edu) has joined #ceph
[19:21] <cmccabe> ok, I'm renaming osync to objsync
[19:21] <cmccabe> to avoid the O_SYNC confusion
[19:22] <cmccabe> and since it's an object synchronizer, objsync seems like the best choice.
[19:22] * Tv dislikes the "js" transition in pronouncing that
[19:24] <cmccabe> at least everyone knows how to pronounce it... it's not like fsck or something
[19:25] <cmccabe> obsync is too similar to obscene / absurd
[19:25] <cmccabe> I guess I could go the cute route and call it obsurd
[19:27] * josef (~seven@nat-pool-rdu.redhat.com) has joined #ceph
[19:28] <josef> sagewk: finally sat down and learned the new fedora system so i could update ceph
[19:28] * Yoric (~David@ Quit (Quit: Yoric)
[19:28] <lxo> synko
[19:28] <josef> provided i dont throw this computer out the window ceph should be updated today
[19:29] <cmccabe> josef: great! we fixed a few rpm issues recently so hopefully it will be easier
[19:29] <sagewk> cmccabe: prefer obsync myself, even if it is obscene
[19:29] <josef> yeah i took a diff of the ceph spec and put it into fedora
[19:29] <Tv> sepia88 ain't rebooting :-(
[19:29] <cmccabe> josef: specifically, it's no longer complaining about unpackaged files
[19:29] <josef> great :), thats the only big difference that existed before
[19:29] <Tv> sjust: do you have any magic tricks to make sepia nodes behave?
[19:29] <josef> i'm building the result in koji to make sure it passes
[19:29] <sagewk> josef: awesome. was the .spec fine or did it need any adjusting?
[19:30] <josef> one thing that needed adjusting was the debug trick
[19:30] <sagewk> tv: jabber someone in the noc
[19:30] <cmccabe> josef: we'd like to split the GTK program (gceph) into a separate package (but probably keep in the same spec file). If you know how to do that it would be great
[19:30] <josef> it points to a static path, just had to change it to use %{_localstatedir}
[19:30] <cmccabe> josef: because it's not really ideal to ask most users to install all of gtk
[19:30] <josef> cmccabe: thats already done
[19:30] <cmccabe> josef: great
[19:31] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) Quit (Quit: Ex-Chat)
[19:31] <josef> gcephtool
[19:31] <josef> thats the way its setup in your .spec file
[19:31] <lxo> hey, I've got 0.25.1 rpms built on blag140k (=~ Free Fedora 14)
[19:32] <cmccabe> josef: oh yeah, I can see %package gcephtool there
[19:32] <cmccabe> josef: I had problems on centos 5.5 because the version of gtk was too old, but it tried to build with gtk by default
[19:32] <cmccabe> josef: I guess I should have turned that off via a spec file flag?
[19:33] <cmccabe> josef: I have to plead ignorance here.... how do things like _with_gtk2 get set from rpmbuild?
[19:34] <josef> so you do something like
[19:34] <josef> %define with_gtk %{?_without_gtk: 0} %{?!_without_gtk: 1}
[19:34] <josef> and then you can rpmbuild --without gtk
[19:35] <josef> and then you have to have the appropriate wrappers around the subpackages to make sure they conditionally build
[19:35] <cmccabe> ic
[19:37] <josef> sagewk: ill try not to let things get this out of date again :)
[19:37] <sagewk> josef: no worries. wanna send a patch so we can have the up to date .spec in ceph.git?
[19:38] <josef> sagewk: yeah as soon as i'm sure it builds right
[19:38] <sagewk> great. that's testing just a fedora build, or also rhel?
[19:38] <josef> just fedora
[19:39] <josef> sagewk: speaking of rhel
[19:39] <josef> there was a request for google-perftools for rhel5, which isn't going to happen, but you could get it into EPEL easily
[19:41] <cmccabe> josef: we can't get into RHEL if we depend on a package that isn't in RHEL, though, can we?
[19:41] <josef> cmccabe: right but you aren't getting into RHEL5 ever :)
[19:41] <sagewk> i think it'll be a while before ceph is in any rhel for that matter :)
[19:42] * rajeshr (~Adium@ Quit (Quit: Leaving.)
[19:42] <josef> yeah i'd like to think we'd have somebody who's looked at ceph before it ends up in rhel :)
[19:42] <cmccabe> I guess rhel 5 is mostly done
[19:42] <sagewk> josef: as long as rhel folk can build the package with minimal pain. what does it take to get tcmalloc in epel?
[19:42] <cmccabe> and that's reasonable
[19:43] <josef> sagewk: not sure, afaik getting things into epel is like getting them into fedora
[19:43] <josef> so it would just be a matter of getting somebody to package it for epel
[19:43] <josef> since its already a fedora package that should be trivial
[19:46] <sagewk> cmccabe: http://fedoraproject.org/wiki/Getting_a_Fedora_package_in_EPEL i guess :)
[19:48] <johnl_> hi all
[19:49] <cmccabe> johnl_: hi
[19:49] <johnl_> do the metadata daemons speak to the monitor cluster? is the monitor cluster responsible for marking metadata daemons down?
[19:50] <cmccabe> I don't know exactly how metadata daemons are marked down. I think it has to do with the mdsmap
[19:50] <cmccabe> I think similar to the osds, we rely on gossip from other mdses to realize when something is down
[19:50] <johnl_> the paper says "Each active MDS sends regular beacon messages to a central monitor
[19:51] <johnl_> but it's not clear if that's the monitor cluster, or a designated mds daemon
[19:51] <Tv> johnl_: it's the monitor afaik
[19:51] <johnl_> k, ta
[19:51] <Tv> the gossip is used to distribute the new state of who's up
[19:51] <Tv> and any peer who sees X down can hint the mons to mark it down
[19:52] <Tv> that's my understanding of that
[19:54] <josef> hrm, failed to build because it couldnt find libcls_rbd.o
[19:54] <josef> err .so
[19:55] <cmccabe> josef: are you building master?
[19:55] <josef> no i'm building 0.25.1
[19:56] <josef> heh looks like the trick to skip the debuginfo stripping didnt work right
[19:57] <josef> hmm no thats not it
[19:58] <johnl_> paper says that clients randomly connect to md servers when they don't know where a subtree is located.
[19:58] <bchrisman> heh… I ended up blowing away default macros to get around stripping --macros=/dev/null though there is certainly a better way...
[19:58] <johnl_> and that the subtrees can be replicated to other nodes kind of on-demand when there is a sudden flash mob access pattern
[19:59] <johnl_> what is responsible for noticing the sudden load?
[19:59] <johnl_> paper says that initial overloading doesn't happen because they connect randomly
[20:00] <johnl_> but if they're connecting randomly, how does the current authoritative node know about the upcoming flash?
[20:01] <cmccabe> johnl_: speaking in generalities , because I didn't write this code
[20:02] <cmccabe> johnl_: when you notice load, you start replicating the subtrees and doing other stuff to distribute the load
[20:03] <johnl_> yeah, I get that. paper just says that (basically) the initial random connection pattern buys the cluster some time to replicate the data
[20:04] <cmccabe> johnl_: I think part of the idea is to avoid inflexibility... like some systems don't allow you to put only part of a directory on an mds, but ceph does
[20:04] <cmccabe> johnl_: there was some stuff in there about how having to decide to replicate the whole directory or nothing was a poor choice to have to make
[20:05] <johnl_> yeah, sure. it's a great design. I'm just unsure of that one concept. that because clients aren't flooding the current authoritative server, it has time to replicate and distribute load.
[20:05] <johnl_> but if they're not flooding the authoritative server, how does it know there is a flood!
[20:05] <cmccabe> johnl_: I'm afraid you'll have to ask greg or sage
[20:05] <johnl_> aah, I suppose it gets a random partition of the flood
[20:06] <johnl_> which would be a big clue
[20:06] <johnl_> ok, ta.
[20:07] <cmccabe> johnl_: they're at lunch now (noon here)
[20:08] <johnl_> nearly dinner time here.
[20:08] <johnl_> curry time!
[20:17] <johnl_> does Ceph currently support parity type replication? like equivalent of RAID5? paper mentions that CRUSH can do it but I've not seen it mentioned anywhere else that I recall
[20:20] <cmccabe> johnl_: what do you mean parity type?
[20:20] <cmccabe> johnl_: if you mean doing the XOR thing, we don't support that as far as I know
[20:21] <johnl_> yeah something like that
[20:21] <cmccabe> johnl_: in general replication in objects stores (not just ceph) replicates the whole object
[20:22] <johnl_> right, didn't think so.
[20:22] <cmccabe> johnl_: the reason why raid5 uses parity is because they wanted to avoid the overhead of storing a full replica (RAID1)
[20:22] <johnl_> sounds a bit like CRUSH could support it though (as in, it be described in the crush map)
[20:22] <sagewk> johnl_: the basic idea is this: / and /home are popular and replicated on all nodes. /home/foo is seen by nobody. all nodes suddenly try to hit /home/foo. they direct their requests based on /home (popular, replicated).
[20:23] <cmccabe> johnl_: in a clustered filesystem, I don't think parity-based storage makes sense. It would mean that different OSDs could no longer operate independently.
[20:23] <sagewk> johnl_: i'm not sure what extent to which the current code does exactly that. that load scenario isn't something i've looked at recently. and it may be that in the general case, we don't want that because it's slower.
[20:24] <cmccabe> johnl_: with raid, everything is integrated into this tight bundle on a single machine, so it makes more sense. And sometimes the XOR stuff is hardware accelerated
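The "XOR thing" cmccabe mentions is the core of RAID-4/5 parity: the parity block is the byte-wise XOR of the data blocks, so any single lost block can be rebuilt from the survivors. A self-contained sketch (toy illustration, not anything in Ceph):

```python
from functools import reduce

def xor_parity(blocks):
    """RAID-4/5-style parity: byte-wise XOR across equal-sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"aaaa", b"bbbb", b"cccc"]
parity = xor_parity(data)

# Lose one data block; XOR of the survivors plus parity reconstructs it,
# since x ^ x = 0 cancels every block except the missing one.
recovered = xor_parity([data[0], data[2], parity])
assert recovered == data[1]
```

This also shows why parity is awkward for independent OSDs: every reconstruction needs coordinated reads across all the surviving blocks.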
[20:24] <cmccabe> sagewk: so basically, we notice the heavy load, then take steps to alleviate it?
[20:24] <johnl_> ah ic. your paper seems to suggest something else, but I might have misread. Seemed to mention exploiting the fact that clients would be connecting randomly initially
[20:25] <cmccabe> I still don't understand the mechanics of how /home/foo goes from not being replicated, to being replicated everywhere
[20:25] <sagewk> johnl_: yeah that's right. if they have the item cached, they generally have a good idea how popular it is and can act accordingly. if they don't, they direct requests based on parents (which essentially means random)
[20:26] <sagewk> the actual replication is done by the mds's monitoring their own load and responding. the idea is that solving that problem isn't quite enough if the clients all hammer the same auth node and then have to be redirected elsewhere.
[20:26] <johnl_> ic
[20:27] <cmccabe> sagewk: so essentially the clients are directed to a random MDS for unpopular things, and that MDS puts a copy of that thing into the cache. And that's replication...
[20:27] <johnl_> ah ha!
[20:27] <sagewk> cmccabe: that's one case, yeah (if you s/unpopular/unknown/)
[20:27] <johnl_> I thought the random mds actually just hinted to the client where to go next
[20:28] <cmccabe> well, hopefully they don't all pull it into cache. So which ones pull? that's the interesting question
[20:28] <cmccabe> I guess it has to do with that two-phase commit-based protocol we talked about yesterday
[20:29] <sagewk> johnl_: right. unless it's hot, in which case it already has a replica. and if it's just getting hot, the mds will push replicas to other mds nodes
[20:29] <johnl_> paper seemed to suggest the authoritative server does the replicating to the others.
[20:29] <johnl_> ah indeed
[20:30] <sagewk> cmccabe: yeah in general it only replicates if it's actually hot
[20:30] <johnl_> so the initial random connecting just alleviates the initial load on the authoritative server - it sees enough random connections to tell that it's getting hot, so starts to replicate
[20:31] <johnl_> and because the initial connections are random, more and more of them start hitting servers that now have the replicated metadata
[20:32] <johnl_> rather than them all hitting the original authoritative one (which they'd have if there was some static hash for the initial lookup)
[20:34] <johnl_> think I've got it
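The flash-crowd behaviour sagewk describes can be modelled in a few lines: clients that have never seen /home/foo direct requests based on the popular, widely replicated parent (/home), which amounts to picking a random MDS among its replica holders, so the initial burst is spread rather than hammering one authoritative node. A toy simulation (not the actual MDS balancer logic):

```python
import random
from collections import Counter

def route_flash_crowd(n_clients, replica_holders, rng):
    """Toy model: each client without a cached location for the hot item
    picks a random MDS from the parent directory's replica holders."""
    return Counter(rng.choice(replica_holders) for _ in range(n_clients))

rng = random.Random(42)
hits = route_flash_crowd(10_000, replica_holders=[0, 1, 2, 3], rng=rng)
assert set(hits) == {0, 1, 2, 3}          # every MDS sees a share
assert max(hits.values()) < 10_000 // 2   # no one node takes the flood
```

Each MDS then sees roughly an equal slice of the burst, which is both the breathing room and the load signal that triggers replication of the newly hot subtree.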
[20:35] * todin (tuxadero@kudu.in-berlin.de) Quit (Remote host closed the connection)
[20:45] * neurodrone (~neurodron@dhcp205-162.wireless.buffalo.edu) Quit (Quit: neurodrone)
[20:54] <johnl_> this paper does mention RAID4 in relation to CRUSH
[20:54] <johnl_> RAID-4 does do parity
[20:55] <johnl_> suspect it's just not implemented in ceph then?
[20:56] <Tv> johnl_: yeah, it's a future possibility
[20:57] <Tv> erasure codes are sexy but slow & require lots of coordination
[20:57] <cmccabe> raid is less sexy than it used to be because the time required to rebuild a raid volume has been growing
[20:58] <cmccabe> and the likelihood of a double fault has grown exponentially along with hard disk sizes
[20:58] <johnl_> cmccabe: tell me about it, heh
[20:58] <cmccabe> since bit errors are (supposedly) a function of num bits
[20:58] <cmccabe> as in, a linear function
[20:58] <cmccabe> so N bits means kN errors on average
[20:59] <cmccabe> next year hard disks have 2N bits and 2kN errors... etc
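cmccabe's scaling argument in one line of arithmetic (k here is an arbitrary illustrative error rate, not a real drive spec):

```python
def expected_bit_errors(n_bits, k=1e-15):
    """Expected errors if error count is linear in bits stored: kN."""
    return k * n_bits

# Double the drive capacity, double the expected errors:
assert expected_bit_errors(2 * 10**13) == 2 * expected_bit_errors(10**13)
```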
[21:00] <johnl_> crush gives much more flexibility of layout than raid though :)
[21:00] <cmccabe> RAID just *might* make a comeback if SSDs take off
[21:00] <Tv> the new-fangled erasure codes are neat though.. stuff like "you can recover the bits if you can find 7 of the 10 blocks"
[21:00] <cmccabe> because SSDs actually can supply more bandwidth to the disk, whereas hard drives have stagnated at 15k RPM for like forever
[21:00] <johnl_> that reed-solomon matrix stuff?
[21:01] <johnl_> yeah, neato.
[21:01] <Tv> cmccabe: actually, the measurements i've seen of many SSDs say a good fast hard drive has more bandwidth
[21:01] <Tv> cmccabe: SSD just have ~0 seek time
[21:01] <cmccabe> tv: in the long term, SSD *can* have more bandwidth
[21:02] <cmccabe> tv: now a lot of SSDs out there right now probably don't stack up. But that's another issue.
[21:02] <cmccabe> tv: it's a physics thing.
[21:02] <johnl_> physics makes us all its bitches.
[21:02] <Tv> cmccabe: no my point is, a VW bus full of tapes is a hell of a lot of bandwidth..
[21:02] <cmccabe> the whole reason for SATA 6G to exist is SSDs
[21:03] <Tv> don't underestimate what a bunch of spindles with tightly-packed bits can spit out
[21:03] <cmccabe> tv: my point is, RAID sucks now on hard drives, because you have a teeny little interface to a big old disk
[21:03] <cmccabe> tv: people are even talking about the end of fsck
[21:03] <Tv> the interface will from here on always be teeny, compared to the amount of data behind it
[21:04] <cmccabe> tv: that's a rather bold statement
[21:04] <Tv> interface to RAM is teeny already
[21:04] <Tv> there's lots of precedent
[21:04] <Tv> the only way around that is to decentralize the logic more
[21:04] <Tv> which doesn't change the underlying statement..
[21:05] <cmccabe> tv: if the time required to read an entire drive keeps growing, eventually fsck will be of academic interest only.
[21:05] <cmccabe> tv: at least offline fsck.
[21:05] <johnl_> could someone explain this line from the example crush map for me? "step chooseleaf firstn 0 type rack"
[21:05] <Tv> or drives will change to *have* fsck
[21:05] <johnl_> I understand what the result is, but don't understand each part of the syntax there
[21:05] <cmccabe> tv: no system administrator will take his system offline for a month to run fsck on his hyper petabyte drive with the teeny sata interface
[21:06] <cmccabe> tv: drives can't have fsck, that's a filesystem thing. However, online fsck may be possible.
[21:07] <sjust> johnl_: chooseleaf causes crush to choose a leaf, firstn 0 specifies one leaf, type rack specifies to choose one leaf from each rack, I think
[21:07] <Tv> cmccabe: bleh, my C-64 had fsck in the drive.. you lack imagination
[21:08] <johnl_> in this example, that'd result in a chosen host bucket, not a device
[21:08] <johnl_> http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH#Example_crush_map
[21:08] <cmccabe> bbl, lunch
[21:08] <johnl_> oop, doorbell
[21:09] <sjust> johnl_: chooseleaf forces crush to go to a leaf node from the rack, so a device
[21:13] <johnl_> ah, leaf node! a node with no other children. the terminology didn't click!
[21:13] <johnl_> duh. thanks sjust
[21:13] <sjust> johnl_: yep! no problem
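The net effect sjust describes for "step chooseleaf firstn 0 type rack" can be pictured with a toy hierarchy: pick racks, then descend from each rack all the way to a leaf (a device), so replicas land on devices in distinct racks. This is only a sketch of the result, not CRUSH's actual straw/hash placement, and "firstn 0" really means "as many as the pool's replica count":

```python
import random

# Hypothetical two-rack hierarchy for illustration.
crush_map = {
    "rack1": {"host1": ["osd0", "osd1"], "host2": ["osd2"]},
    "rack2": {"host3": ["osd3", "osd4"]},
}

def chooseleaf_per_rack(crush_map, rng):
    """One leaf device per rack: descend rack -> host -> device."""
    devices = []
    for rack in sorted(crush_map):
        host = rng.choice(sorted(crush_map[rack]))
        devices.append(rng.choice(crush_map[rack][host]))
    return devices

picks = chooseleaf_per_rack(crush_map, random.Random(1))
assert len(picks) == len(crush_map)   # one device chosen per rack
```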
[21:35] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[22:24] * samhung (~samsung@ Quit (Read error: Operation timed out)
[23:38] <cmccabe> I am having quite a bit of trouble finding out what the actual rules are for S3 bucket names
[23:38] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[23:38] <cmccabe> I thought they were utf-8 like S3 keys, but that seems not to be true
[23:38] <Tv> cmccabe: way more restricted
[23:39] <Tv> cmccabe: they have to be valid dns names
[23:39] <cmccabe> didn't dns get unicode-ized recently?
[23:40] <Tv> http://docs.amazonwebservices.com/AmazonS3/2006-03-01/dev/index.html?UsingBucket.html
[23:40] <Tv> cmccabe: no, that's a mangling that encodes things in subset of ascii
[23:40] <cmccabe> "similar to domain names" is not a spec
[23:41] <Tv> first part is mandatory, second part is recommended
[23:41] <Tv> read
[23:58] <cmccabe> based on whoever wrote libs3, bucket names suitable for 'virtual host style' use must be from 3-63 characters long, and be alphanumeric
[23:59] <cmccabe> I think in RGW the best thing to do is be super strict for now and just accept those bucket names
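The "super strict" rule cmccabe proposes can be sketched as a DNS-label-style check: 3-63 characters, lowercase alphanumeric plus interior hyphens. This is a hedged approximation of the constraints discussed above, not Amazon's full specification (which varies between path-style and virtual-host-style access):

```python
import re

# 3-63 chars, lowercase alphanumeric and hyphens, must start and end
# with an alphanumeric character (DNS-label-ish).
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")

def valid_bucket_name(name: str) -> bool:
    """Strict bucket-name check in the spirit of the discussion above."""
    return bool(_BUCKET_RE.match(name))

assert valid_bucket_name("my-bucket")
assert not valid_bucket_name("ab")        # too short
assert not valid_bucket_name("X" * 64)    # too long, uppercase
```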

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.