#ceph IRC Log


IRC Log for 2010-08-19

Timestamps are in GMT/BST.

[0:02] * MarkN (~nathan@ has joined #ceph
[0:13] * conner (~conner@leo.tuc.noao.edu) Quit (Read error: Operation timed out)
[0:27] * conner (~conner@leo.tuc.noao.edu) has joined #ceph
[0:56] * deksai (~chris@dsl093-003-018.det1.dsl.speakeasy.net) Quit (Ping timeout: 480 seconds)
[3:53] * asllkj (~lbz@c-24-8-7-136.hsd1.co.comcast.net) has joined #ceph
[3:53] <asllkj> error: too few arguments to function 'blkdev_issue_flush' anyone know what causes this in 2.6.35 ? .34 was ok
[4:13] * neale_ (~neale@pool-173-71-192-200.clppva.fios.verizon.net) Quit (Quit: neale_)
[4:20] * asllkj (~lbz@c-24-8-7-136.hsd1.co.comcast.net) Quit (Quit: Leaving)
[4:25] * deksai (~chris@71-13-57-82.dhcp.bycy.mi.charter.com) has joined #ceph
[6:30] * deksai (~chris@71-13-57-82.dhcp.bycy.mi.charter.com) Quit (Read error: Operation timed out)
[6:50] * f4m8_ is now known as f4m8
[7:24] * mtg (~mtg@vollkornmail.dbk-nb.de) has joined #ceph
[9:04] * allsystemsarego (~allsystem@ has joined #ceph
[10:16] * Yoric (~David@ has joined #ceph
[11:31] * kblin (~kai@h1467546.stratoserver.net) has joined #ceph
[14:09] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:48] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[15:30] * deksai (~chris@71-13-57-82.dhcp.bycy.mi.charter.com) has joined #ceph
[15:50] * f4m8 is now known as f4m8_
[15:56] * mtg (~mtg@vollkornmail.dbk-nb.de) Quit (Quit: Verlassend)
[17:26] * neale_ (~neale@pool-173-71-192-200.clppva.fios.verizon.net) has joined #ceph
[17:50] * gregphone (~gregphone@ has joined #ceph
[18:00] * gregphone (~gregphone@ Quit (Quit: Rooms • iPhone IRC Client • http://www.roomsapp.mobi)
[18:00] * gregphone (~gregphone@ has joined #ceph
[18:06] * gregphone (~gregphone@ Quit (Remote host closed the connection)
[18:09] * gregphone (~gregphone@ has joined #ceph
[18:47] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) has joined #ceph
[18:51] * gregphone (~gregphone@ Quit (Ping timeout: 480 seconds)
[18:55] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) Quit (Remote host closed the connection)
[18:55] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:58] * Yoric (~David@ Quit (Quit: Yoric)
[19:00] <wido> i've tried qemu-kvm last night, stressed it for about 9 hours with two VM's, worked like a charm
[19:00] <sagewk> yay!
[19:00] <wido> lot of I/O tests too, no problems at all
[19:00] <wido> postmark, stress and bonnie++
[19:01] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) Quit (Remote host closed the connection)
[19:02] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:03] <gregaf> wido: what kind of bandwidth were you getting?
[19:06] <wido> about 500Mbit read in a VM
[19:06] <wido> And about 700Mbit write
[19:07] <gregaf> was that with one or both running?
[19:08] <wido> only one
[19:08] <wido> http://zooi.widodh.nl/ceph/bonnie.pdf
[19:08] <wido> some bonnie++ stats, where alpha is running with cache on writeback and aio on native, beta with wb and aio on threads
[19:08] <wido> both vm's are running a default Ubuntu 10.04 install with a "virtual" kernel and ext4
[19:09] <wido> i'll upload the .ods, reads much easier :)
[19:09] <wido> http://zooi.widodh.nl/ceph/bonnie.ods
[19:32] <wido> gregaf: did you try qemu-kvm in some real tests?
[19:32] <gregaf> oh, I haven't run the KVM stuff at all myself
[19:32] <gregaf> why?
[19:33] <wido> i found that small writes are not so fast, like i said yesterday, a Ubuntu install is pretty slow
[19:33] <wido> i'm wondering if this is due to my journal on regular S-ATA disks
[19:33] <wido> i think so, but some form of confirmation would be nice :)
[19:34] <sagewk> wido: you could put the journal in a tmpfs just to see how much that contributes
[19:34] <gregaf> that plus replication probably
[19:34] <wido> sagewk: that's new :-) I'll give it a try, but i lack RAM ;)
[19:34] <wido> for now it seems that a VM has the performance of a 5400RPM S-ATA disk
[19:34] <sagewk> for small writes it can be pretty small... 50-100M or something
[19:35] <gregaf> doing a network block device on rados means sticking something like 4 disk writes under the block layer
[19:35] <gregaf> when it expects to get 1
[19:35] <gregaf> so the latency goes way up, and for synchronous writes that'll slow you way down
[19:35] <wido> yes, that's why i'm trying aio, but it didn't seem to make much difference
[19:36] <sagewk> cache=writeback, you mean.
[19:36] <wido> no, aio=native and aio=threads
[19:36] <wido> so the rados_aio* calls are used
[19:36] <wido> "-drive file=rbd:rbd/charlie,if=virtio,index=0,boot=on,format=rbd,cache=writeback,aio=native"
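
The -drive string quoted above can be assembled programmatically when comparing the two aio modes wido mentions. The following is a minimal sketch, not part of the original conversation: the helper name build_rbd_drive_arg and its parameters are hypothetical, and the option set is copied verbatim from the 2010 log line, so it may not match what a current qemu build accepts.

```python
def build_rbd_drive_arg(pool, image, cache="writeback", aio="native"):
    """Return a qemu -drive argument shaped like the one quoted in the log.

    Hypothetical helper: the option list (if=virtio, index=0, boot=on,
    format=rbd) is taken as-is from the log line and is an assumption
    about the qemu-kvm build in use at the time.
    """
    # The log only discusses these two aio modes, so reject anything else.
    if aio not in ("native", "threads"):
        raise ValueError("aio must be 'native' or 'threads'")
    return (
        f"file=rbd:{pool}/{image},if=virtio,index=0,boot=on,"
        f"format=rbd,cache={cache},aio={aio}"
    )


if __name__ == "__main__":
    # Print one -drive argument per aio mode for the image named in the log.
    for mode in ("native", "threads"):
        print("-drive", build_rbd_drive_arg("rbd", "charlie", aio=mode))
```

Generating both variants from one function keeps every other option identical, so any difference measured between runs can be attributed to the aio setting alone.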
[19:37] <kblin> say, what's the minimum ram requirement of a MDS?
[19:37] <kblin> the wiki just says 'lots and lots and lots'
[19:37] <sagewk> the aio settings probably won't change too much here... from the fs's perspective the writes still take longer to hop the network etc.
[19:38] <gregaf> kblin: I think it usually runs at about 800MB?
[19:38] <sagewk> kblin: i think the default settings use around 1G. you can adjust though with the 'mds cache size' option
[19:38] <gregaf> that can go up though
[19:38] <kblin> ah, ok. durn
[19:38] <kblin> so much for my plan of doing an embedded storage cluster
[19:39] <sagewk> kblin: you can adjust the cache size settings however you want
[19:40] <sagewk> there is also some optimization to be done on the mds as far as memory usage goes. just hasn't been a priority yet
[19:40] <gregaf> how little RAM were you hoping for?
[19:41] <kblin> well, so far the plan was running the MDS on a sheevaplug that packs 512 megs
[19:42] <kblin> the storage nodes would go on boards that pack 128 megs
[19:43] <kblin> I know that'll hurt the caching performance, but I doubt that's going to be the bottleneck of the setup
[19:43] <gregaf> oof
[19:44] <gregaf> yeah, I imagine the CPU will be the limiting factor there
[19:45] <kblin> the idea behind this is to prototype a setup where you can connect a bunch of NAS boxes together and have them look like a single file server
[19:46] <kblin> clearly if performance is important, you need a beefy machine
[19:47] <gregaf> as Sage puts it, in principle it will function
[19:47] <gregaf> I'm just not sure what the point would be
[19:48] <kblin> gregaf: I was buzzword hunting for a conference :)
[19:49] <kblin> gregaf: and now that they've accepted my talk, I need to follow through
[19:49] <kblin> also it's a good excuse to finally play with ceph
[19:53] <kblin> hm, I guess I can cheat with the demo and run the monitor on my laptop..
[19:53] <kblin> that has a bit more ram
[19:54] <gregaf> the monitor actually doesn't care
[19:54] <gregaf> I think it uses like 30MB or something
[19:54] <kblin> er, metadata server
[19:54] <gregaf> yeah
[19:54] <gregaf> you certainly ought to be able to make it run on 512MB, it'll just be slow
[19:58] <kblin> ok, I guess I'll first get it to work on pc-based hardware, and then check how much performance will suffer if I go embedded
[20:00] <wido> http://www.pastebin.org/614563 << i guess i've got a tcmalloc crash?
[20:01] <wido> i made a mistake with my crushmap, 5 of the 12 OSD's crashed, all with different backtraces
[20:01] <sagewk> aie. that's something we should fix. what was wrong with the crushmap?
[20:02] <gregaf> I got a crash yesterday that had tcmalloc -- our best guess is that we're asking it to free memory that was allocated by the standard malloc in a library
[20:03] <wido> i tried to put a pool on just one OSD, the one with the SSD as journal
[20:03] <wido> i'll post the crushmap
[20:03] <wido> http://www.pastebin.org/614590
[20:03] <wido> the crushmap went fine, creating a pool with rule 4 too, but then trying to create a rbd image failed (rbd just hung)
[20:04] <wido> and then the OSD's started to go down
[20:07] * deksai (~chris@71-13-57-82.dhcp.bycy.mi.charter.com) Quit (Ping timeout: 480 seconds)
[20:18] <wido> it really seems to be due to the pool i created with a faulty crush rule. Tried removing the pool, but restarting the OSD's will crash them again
[20:25] <wido> i'm going afk, i'm not creating an issue yet, since i have no idea what went wrong. The cluster is in a pretty degraded state now, so feel free to check it out
[20:26] <sagewk> oh, i think i know what the problem is.
[20:29] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[20:31] * deksai (~chris@71-13-57-82.dhcp.bycy.mi.charter.com) has joined #ceph
[20:31] <sagewk> wido: which pool was it?
[21:14] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[22:02] <josef> sagewk: i swear every time i go to update ceph the package updating steps have changed
[22:03] <sagewk> josef: what's changed?
[22:05] <josef> they moved everything over to git instead of cfs
[22:05] <josef> err cvs
[22:05] <sagewk> progress!
[22:05] <josef> take two weeks off and all of a sudden everything has changed :)
[22:05] <josef> yeah it's a great step no doubt, just means i had to change a bunch of stuff and won't be able to do the update until tomorrow
[22:06] <darkfader> josef: hehe admit it your holiday was a bit longer
[22:06] <darkfader> i updated the wiki a bit because it was my first time using git
[22:06] <josef> well yes i tend to ignore fedora for the most part :)
[22:06] <darkfader> so the instructions should actually work
[22:42] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[23:35] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) Quit (Remote host closed the connection)
[23:35] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.