#ceph IRC Log


IRC Log for 2011-08-15

Timestamps are in GMT/BST.

[0:10] * MarkN (~nathan@ has joined #ceph
[0:11] * MarkN (~nathan@ has left #ceph
[0:12] * verwilst (~verwilst@dD5769762.access.telenet.be) Quit (Quit: Ex-Chat)
[1:07] * greglap (~Adium@ has joined #ceph
[1:16] * greglap1 (~Adium@ has joined #ceph
[1:20] * greglap (~Adium@ Quit (Ping timeout: 480 seconds)
[1:59] * huangjun (~root@ has joined #ceph
[1:59] <huangjun> ls
[3:54] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) Quit (Quit: ChatZilla 0.9.87 [Firefox 4.0.1/20110609040224])
[4:12] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) has joined #ceph
[9:00] * pruby (~tim@leibniz.catalyst.net.nz) Quit (Ping timeout: 480 seconds)
[9:20] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[10:55] * greglap1 (~Adium@ Quit (Quit: Leaving.)
[12:08] * al (d@niel.cx) has joined #ceph
[13:14] * huangjun (~root@ Quit (Quit: Lost terminal)
[15:33] * greglap (~Adium@ has joined #ceph
[16:16] * greglap (~Adium@ Quit (Ping timeout: 480 seconds)
[16:33] * Meths_ (rift@ has joined #ceph
[16:41] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[16:47] * greglap (~Adium@ has joined #ceph
[17:41] * greglap (~Adium@ Quit (Ping timeout: 480 seconds)
[17:42] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:56] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[17:58] * greglap (~Adium@aon.hq.newdream.net) has joined #ceph
[17:59] * greglap (~Adium@aon.hq.newdream.net) Quit ()
[18:04] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[18:10] * sagewk (~sage@aon.hq.newdream.net) Quit (Remote host closed the connection)
[18:12] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[18:16] <gregaf> hmmm, it looks like there are only 3 or 5 machines available in sepia :(
[18:16] <gregaf> are you actually using those 22, sagewk?
[18:20] <sagewk> gregaf: unlocked 7
[18:21] <gregaf> coolio
[18:31] * jojy (~jojyvargh@70-35-37-146.static.wiline.com) has joined #ceph
[18:33] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:51] * The_Bishop (~bishop@port-92-206-21-65.dynamic.qsc.de) Quit (Quit: Who the hell is this Peer? If I ever catch him I'll reset his connection!)
[18:57] * yehudasa (~yehudasa@aon.hq.newdream.net) has joined #ceph
[19:08] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[20:02] * cp (~cp@ has joined #ceph
[20:05] * The_Bishop (~bishop@sama32.de) has joined #ceph
[20:08] * amichel (~amichel@ip68-230-56-203.ph.ph.cox.net) has joined #ceph
[20:09] * jclendenan (~jclendena@ has joined #ceph
[20:10] <cp> Question: I have ceph set up with multiple nodes, and mounted using "mount -t ceph ...". On one node I do "tail -f test" in the shared directory. On the other node I do 'echo "hello world" >> test' a few times. This does write to the file, but nothing is updated for "tail -f". Any thoughts?
[20:14] <cp> Hmm... seems to work doing something similar. I'll retract the question until I can reproduce it reliably.
[20:20] <cp> New question: When I try mounting with the fuse client "cfuse -m ..." I can at first read the directory, but when I try writing to the "test" file in it I get the following message: -bash: bar: Transport endpoint is not connected
[20:20] <cp> "ls" gives the same message now as well. Any ideas?
[20:21] <gregaf> that's a FUSE message which means cfuse has crashed
[20:22] <cp> Ah. OK. Is there any way to get debug/traces out of it? And/or does this happen to other people?
[20:22] <gregaf> it dumps a core according to your environment rules, and you can turn on logging and rerun it if you like, that should say what happened
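The advice above (get a core dump, turn on logging, rerun) can be sketched roughly as the following shell session. The mount point, monitor address, debug option names, and log path are placeholders/assumptions, not taken from the chat; adjust them to your build and environment.

```shell
# Allow core dumps in this shell so a cfuse crash leaves a core file behind
ulimit -c unlimited

# Re-run cfuse with client and messenger debugging turned up.
# The exact option names depend on the ceph version; these follow the
# common ceph debug-option style of the period.
cfuse -m monhost:6789 /mnt/ceph \
    --debug_client 20 --debug_ms 1 --log-file /tmp/cfuse.log

# After the crash, the log should show what the client was doing, and the
# core file can be opened to get a backtrace:
gdb $(which cfuse) core
```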
[20:33] <jclendenan> Hi all, I'm looking at building a small ceph cluster, and was wondering if anyone had any hardware sizing examples they could share. I've looked at the wiki, and that makes sense, but wondered what people were seeing in terms of iops for any given hw/sw combo?
[20:33] <jclendenan> Any tips on the ratios cpu/backplane/drives ?
[20:34] * Meths_ is now known as Meths
[20:37] * amichel (~amichel@ip68-230-56-203.ph.ph.cox.net) Quit (Quit: Bad news, everyone!)
[20:50] <gregaf> jclendenan: sorry, most of us are in meetings and it's lunchtime
[20:51] <gregaf> wido has a cluster running all on Atoms, and right now that seems to not be enough for handling recovery
[20:51] <jclendenan> gregaf, :) no worries. meetings before IRC :) and lunchtime doubly so
[20:52] <jclendenan> I was looking at the backblaze pod 2.0, and wondering if something like that had been attempted
[20:52] <jclendenan> - http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/
[20:53] <gregaf> obviously the faster the drives and backplane the faster your storage will be; in terms of CPU and memory the monitors really don't take any (although putting them on their own drive can be good since it reduces overlapping syncs); the MDS will basically eat all you can give it (though in default config it restricts itself to a few hundred MB); the OSDs like page cache for read workloads and use very little CPU normally but spike up under recovery circumstances
[20:53] <jclendenan> but thinking the # of OSDs per drive and per CPU could be an interesting challenge, as would the NIC side of things
[20:54] <gregaf> yeah, I don't think those have enough bandwidth to supply all their drives
[20:54] <gregaf> just in terms of the SATA splitting
[20:54] <jclendenan> gregaf, ya, I did a rough calc, and as long as it's nearline drives, it's almost got enough power.
[20:54] <jclendenan> just don't think about doing many ssd's on it
[20:55] <gregaf> but with that config you wouldn't have enough CPU or memory to drive many cosd processes, so you'd basically have to run a big RAID array (totally doable of course, but it's definitely restricting you to that option)
[20:56] <gregaf> but my food's here now, so I can get into it more with you in a bit :)
[20:56] <jclendenan> my thoughts as well. sounds good. I think lunch is calling my name as well
[21:06] * The_Bishop (~bishop@sama32.de) Quit (Ping timeout: 480 seconds)
[21:41] * The_Bishop (~bishop@p4FCDEC9C.dip.t-dialin.net) has joined #ceph
[21:46] * ag4ve (~ag4ve@ has joined #ceph
[21:47] * ag4ve (~ag4ve@ Quit ()
[22:05] <Tv> jclendenan: the backblaze hw design seems very much oriented toward archival, not active use
[22:07] <Tv> jclendenan: the "optimal" ratio of cpu : io bandwidth, io latency : ram : number of osds is still sort of unknown; right now we're at the stage where we are just starting to have better and better benchmarks, so we actually *can* explore alternatives and have good numbers to compare
[22:07] <Tv> jclendenan: but it comes down to things like.. a read-mostly load is very different from a write-heavy load which is very different from recovering from failures
[22:08] <Tv> jclendenan: so best advice right now is benchmark your own :-/
[22:08] <Tv> with *your* usage mix
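"Benchmark your own" can be sketched with the tools ceph itself ships. The pool name, durations, and concurrency below are placeholder values, and the `rados bench` option spelling is assumed from the rados CLI of this era; check `rados --help` on your version.

```shell
# Raw RADOS write benchmark against a pool (here the default "data" pool):
# write objects for 60 seconds with 16 concurrent ops
rados -p data bench 60 write -t 16

# Then read the same objects back sequentially to gauge read throughput
rados -p data bench 60 seq -t 16

# For a filesystem-level view, run your own workload mix against a mount,
# e.g. an iozone sweep on a kernel-client mount point:
iozone -a -g 4G -f /mnt/ceph/iozone.tmp
```

The point Tv makes stands either way: a streaming write benchmark, a small-file metadata workload, and a recovery event will each stress completely different parts of the same hardware.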
[22:51] <jclendenan> Tv, thanks. Is anyone collecting benchmarks at the moment that I can take a peek at? I might be able to provide some as I get mine up and running
[22:51] <jclendenan> Especially if there are a common set of tests / workloads
[22:52] <gregaf> unfortunately not; it's something we've talked about in passing but we don't have the systems to handle something like that yet
[23:00] <jclendenan> ok. if you start even a wiki page with iozone results and hardware configs, it might be a start
[23:02] <gregaf> unfortunately something like that is so simplistic as to be just about useless :(
[23:03] <gregaf> jclendenan: once your hard drives are either fast or numerous enough you will always get good iozone performance
[23:03] <jclendenan> gregaf, right
[23:03] <gregaf> but that won't capture stuff like "if two OSDs die simultaneously most of the OSDs time out" because the hardware is underpowered for that scenario
[23:04] <jclendenan> too bad SPECsfs isn't free to test
[23:04] <jclendenan> makes sense. any idea what the dreamhost sandbox is running on?
[23:05] <cp> Running this "qemu-img create -f rbd rbd:data/foo 10G" I get the following response: qemu-img: Unknown file format 'rbd'. The man page doesn't mention rbd as an option
[23:05] <gregaf> I think it's 4 boxes with a bunch of disks in RAID5 or something
[23:06] <gregaf> jclendenan: our new (very large) cluster that we're running rgw on is running an OSD per disk, though
[23:07] <gregaf> assuming sufficient processing power then streaming workload bandwidth seems to approximately match theoretical performance based on disk and network throughput
[23:07] <jclendenan> ok, cool
[23:07] <gregaf> and sufficient processing power is very low in the case where all OSDs are working
[23:08] <jclendenan> it's failure cases that cause cpu and backplane io to spike I would imagine
[23:08] <gregaf> yeah
[23:09] <gregaf> we haven't gone through and profiled what all's taking up resources there; we expect there's a good bit of performance to be wrung out when we've got the time
[23:09] <gregaf> just remember when calculating your theoretical performance to include the effect of journaling, if you're putting the journal on the same physical disk as the data partition
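gregaf's journaling caveat amounts to a simple back-of-the-envelope rule: with the journal on the same spindle as the data partition, every client write hits the disk twice (once to the journal, once to the object store), so sustained write throughput is at best about half the raw disk rate. A minimal sketch of that calculation, with made-up illustrative numbers:

```python
# Effect of co-locating the OSD journal with the data partition: each
# write is committed twice on the same spindle, so the crude upper bound
# on sustained write bandwidth is half the raw disk rate. All numbers
# here are illustrative placeholders, not measurements from the chat.

def effective_write_bw(disk_bw_mb, journal_on_same_disk):
    """Crude upper bound on sustained OSD write throughput in MB/s."""
    return disk_bw_mb / 2 if journal_on_same_disk else disk_bw_mb

raw = 110.0  # MB/s, ballpark for a 7200rpm nearline SATA drive
print(effective_write_bw(raw, journal_on_same_disk=True))   # 55.0
print(effective_write_bw(raw, journal_on_same_disk=False))  # 110.0
```

(Real numbers will be somewhat worse than the halved figure because of the extra seeking between the two partitions, which is why a separate journal device helps more than the factor of two suggests.)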
[23:10] <gregaf> cp: hmm, yehudasa probably knows about that
[23:11] <yehudasa> cp: did you configure qemu --with-rbd?
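cp's "Unknown file format 'rbd'" error means this qemu-img binary was built without rbd support, which is what yehudasa's question is probing. A rough way to check and fix this, sketched below; the configure switch spelling varies by qemu version (later releases use `--enable-rbd`, the chat refers to it as `--with-rbd`), so treat the flag as an assumption to verify against your source tree.

```shell
# qemu-img lists its supported formats in its help text; if "rbd" is
# absent there, the binary was built without rbd support
qemu-img --help | grep -i rbd || echo "no rbd support in this build"

# If missing, rebuild qemu from source with rbd enabled (flag name
# depends on the qemu version -- verify with ./configure --help):
#   ./configure --enable-rbd && make && sudo make install

# With rbd compiled in, the original command should then work:
#   qemu-img create -f rbd rbd:data/foo 10G
```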
[23:18] * DanielFriesen (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[23:33] * Dantman (~dantman@199-7-158-34.eng.wind.ca) has joined #ceph
[23:41] <yehudasa> Tv: s3-tests missing isodate package in virtualenv
[23:42] <Tv> yeah it seems Stephon's commit ea3f73ef904d899ed28449bdc5d2bd5d7ff489f8 uses it without adding it to dependencies
[23:42] <Tv> fixing
[23:46] <Tv> yehudasa: pull cc9a026 and re-run bootstrap
[23:46] <yehudasa> Tv: cool, thanks

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.