#ceph IRC Log


IRC Log for 2011-02-21

Timestamps are in GMT/BST.

[8:40] * alexxy[home] (~alexxy@ Quit (Ping timeout: 480 seconds)
[9:01] * Juul (~Juul@70-36-128-225.dsl.dynamic.sonic.net) has joined #ceph
[9:07] * alexxy (~alexxy@ has joined #ceph
[9:23] * Juul (~Juul@70-36-128-225.dsl.dynamic.sonic.net) Quit (Quit: Leaving)
[9:51] * FoxMURDER (~fox@ip-89-176-11-254.net.upcbroadband.cz) Quit (Read error: Connection reset by peer)
[10:14] * verwilst (~verwilst@router.begen1.office.netnoc.eu) has joined #ceph
[10:47] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[11:39] * frankl_ is now known as frank_
[16:46] * bwlang (~langhorst@static-71-245-237-2.bstnma.ftas.verizon.net) has joined #ceph
[17:06] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (Remote host closed the connection)
[18:04] * squig (~bendeluca@soho-94-143-249-50.sohonet.co.uk) has joined #ceph
[18:15] * verwilst (~verwilst@router.begen1.office.netnoc.eu) Quit (Quit: Ex-Chat)
[18:26] * Dantman (~dantman@S0106687f740dba3e.vc.shawcable.net) has joined #ceph
[19:11] * bwlang (~langhorst@static-71-245-237-2.bstnma.ftas.verizon.net) Quit (Quit: bwlang)
[19:14] * bwlang (~langhorst@static-71-245-237-2.bstnma.ftas.verizon.net) has joined #ceph
[19:23] * bwlang (~langhorst@static-71-245-237-2.bstnma.ftas.verizon.net) has left #ceph
[20:09] <wido> hi
[20:09] <wido> I'm running the latast master and I noticed a HUGE increase in performance. This was to be expected?
[20:09] <wido> latest*
[20:14] <prometheanfire> wido: on .38?
[20:14] <prometheanfire> just curious
[20:16] <wido> prometheanfire: I'm not using the filesystem, only running with RBD
[20:17] <wido> Host is running 2.6.38 with the latest btrfs code
[20:17] <prometheanfire> ah :D
[20:17] <prometheanfire> ya, hearing this about rbd is even better
[20:19] <wido> I'm rsyncing about 1TB to my VM, two weeks ago this caused a HUGE load on the host (4 OSD's, 1 MON) and made the VM respond very slowly
[20:19] <wido> But right now it all runs smooth, VM stays responsive and rsync runs fine
[20:20] <wido> sagewk: The ceph-client doesn't compile against the latest tree: http://pastebin.com/ajY5M5FS
[20:21] <wido> Like you asked me to test
[20:32] <DJLee> is it normal for 2x performance to be about 1/2 of the 1x performance..?
[20:32] <DJLee> with all things considered default
[20:33] <prometheanfire> 2x * 0.5 * 1 == 1
[20:33] <prometheanfire> hmm
[20:34] <DJLee> well, then for 3x replication, it would be 1/3..hmm expected?
[20:34] <wido> DJLee: You mean in Write performance I assume?
[20:34] <DJLee> right, 'write' sorry, i wasnt clear
[20:34] <wido> Your read performance should increase, since there are more OSD's that can provide the data to the client
[20:34] <prometheanfire> ah, you mean performance when 2x redundancy or 3x redundancy is enabled?
[20:35] <wido> DJLee: I don't think the performance will be 1/3
[20:35] <DJLee> yeah, heh;
[20:35] * prometheanfire wonders if data is spread like BT or if it is streamed only from one to many
[20:35] <wido> the writes go in parallel to your OSD's, so your network bandwidth might be a limit
[20:35] <DJLee> for read, remember to use multi-threads!
[20:36] <prometheanfire> s/BT/bittorrent
[20:36] <wido> but assuming the OSD's are all the same HW, your performance doesn't have to be 1/3
[20:36] <prometheanfire> threaded IO you mean?
[20:36] <wido> Yes, the higher the replication, the lower the performance will be
[20:36] <DJLee> yeah threaded IO, must for 'read', dont use a single thread to test the read..
[20:38] <wido> But when reading with multiple clients they can randomly load-balance over the OSD's, so you could gain READ performance
[20:39] <DJLee> right, but for a single thread, (e.g., single connection?) i think it only reads from primary copy..? and so we have to bombard by creating multiple connections (separate IO streams)
[20:39] <DJLee> to gain the performance..
[20:40] <prometheanfire> or just compile stuff threaded
[20:40] <DJLee> but for write, it was not necessary, the write just scales well with just a single thread, too, as it all goes into a journal first and is handled later
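The replication and read-parallelism points above can be illustrated with an idealized back-of-the-envelope model. This is a hypothetical sketch, not a description of how Ceph actually schedules IO: it assumes every client byte is written once per replica, that the combined OSD disk bandwidth is the only write bottleneck, and (following DJLee's understanding in the log, not a confirmed fact) that a single read stream is served only by the primary OSD. The function names and the per-OSD bandwidth figure are illustrative assumptions.

```python
"""Idealized throughput bounds for a replicated object store (hypothetical model)."""


def aggregate_write_bound(n_osds: int, osd_mb_s: float, replicas: int) -> float:
    """Upper bound on total client write throughput in MB/s.

    Assumes each client byte is written to `replicas` OSDs and that the
    combined disk bandwidth of all OSDs is the only bottleneck.
    """
    if replicas < 1 or replicas > n_osds:
        raise ValueError("replicas must be between 1 and n_osds")
    return n_osds * osd_mb_s / replicas


def read_bound(streams: int, n_osds: int, osd_mb_s: float) -> float:
    """Upper bound on read throughput in MB/s for `streams` independent readers.

    Assumes each stream is served by a single (primary) OSD and that streams
    spread perfectly, so at most n_osds OSDs are busy at once.
    """
    return min(streams, n_osds) * osd_mb_s


if __name__ == "__main__":
    # 4 OSDs at an assumed 100 MB/s each
    for r in (1, 2, 3):
        print("replicas", r, "write bound MB/s", aggregate_write_bound(4, 100.0, r))
    print("1 stream read bound MB/s", read_bound(1, 4, 100.0))
    print("8 stream read bound MB/s", read_bound(8, 4, 100.0))
```

Under this model the aggregate write bound halves from 400 to 200 MB/s when going from 1x to 2x replication, which matches DJLee's observation only in the fully disk-bound case; as wido notes, replica writes proceed in parallel, so a single client's throughput need not drop by exactly that factor.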
[20:42] <wido> DJLee: I've ordered some new OSD stuff today, I'll start testing scaling in a few weeks
[20:42] <DJLee> prom, just trying to say multiple IO streams, like generating multiple dd commands, heh;
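DJLee's suggestion of running several `dd` commands at once can be sketched as a small multi-stream read test. This is a hypothetical helper, not a Ceph tool: each file is read by its own thread, roughly like one `dd if=FILE of=/dev/null` per file, and total throughput is reported. Results will be inflated by the local page cache unless caches are dropped between runs.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor


def read_file(path: str, block_size: int = 4 * 1024 * 1024) -> int:
    """Sequentially read one file to the end; return the number of bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def parallel_read(paths: list[str], streams: int) -> tuple[int, float]:
    """Read `paths` with `streams` concurrent threads (one IO stream per file).

    Returns (total_bytes, throughput in MB/s, decimal megabytes).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=streams) as pool:
        total = sum(pool.map(read_file, paths))
    elapsed = time.perf_counter() - start
    return total, total / 1e6 / elapsed if elapsed > 0 else float("inf")
```

For example, `parallel_read([f"/mnt/ceph/test{i}" for i in range(8)], streams=8)` would read eight files concurrently; the mount point and file names there are hypothetical. Comparing `streams=1` against `streams=8` on the same files shows whether reads scale with parallel streams.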
[20:43] <DJLee> you've 'ordered' what you mean? heh
[20:43] <wido> New hardware :)
[20:43] <DJLee> ohhh.. which one?
[20:44] <wido> I went for the SuperMicro X7SPA-HF-D525 mainboard with 4 2TB disks and a SSD
[20:44] <DJLee> we've been using some low 5.4krpm 2TB
[20:44] <wido> Atom CPU
[20:44] <DJLee> oh
[20:44] <DJLee> we have that
[20:44] <DJLee> d525, heh;
[20:44] <wido> WD Greenpower disks, 5400RPM too
[20:44] <DJLee> nice
[20:44] <wido> they work fine for an OSD, you get performance by the numbers
[20:44] <DJLee> the ram?
[20:44] <wido> 4GB
[20:45] <prometheanfire> 2 mds servers is not recommended right?
[20:45] <wido> I'll just add a lot of SWAP on the SSD
[20:45] <wido> prometheanfire: Well, clustered MDS should start to work better and better
[20:45] <wido> 2 MON's is not recommended
[20:45] <DJLee> ok, can you try to e.g., use 6 osd on a single node, at 2x replication wido..? please let me know what happens..
[20:45] <DJLee> make sure to use like 16GB files...
[20:45] <wido> DJLee: I'll go for 4 OSD's per node
[20:46] <prometheanfire> ah, that's what it was :D
[20:46] <DJLee> cuz it will start to swap hard, and erm,, i think there's likely to be some crash or slowness by the cosd..
[20:46] <wido> and start scaling up with OSD's, until I'm at my 36 running OSD's
[20:46] <wido> DJLee: I've already got an Atom machine running, works fine
[20:47] <wido> really a fast CPU, fast enough for a OSD for what I can see now
[20:47] <DJLee> right
[20:47] <wido> and the 4GB Ram seems enough too, not too much cache, but that's not a problem when you have a large number of OSD's
[20:47] <wido> really depends on the situation where you use Ceph in
[20:47] <wido> I want a LOT of small nodes
[20:48] <prometheanfire> network will be your limiting factor I think
[20:48] <wido> In a recovery situation that might be indeed the problem
[20:48] <wido> I'll uplink all OSD's with 2x1G
[20:48] <DJLee> right, but are you going to consider ssds..?
[20:48] <prometheanfire> scary
[20:49] <wido> DJLee: Yes, SSD's for journaling
[20:50] <DJLee> right, so for writes you will get the boost
[20:50] <wido> indeed
[20:50] <wido> I'll have 9 of those systems (36 OSD's) running in about 3 weeks
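The planned cluster can be sized with simple arithmetic. This is a minimal sketch using the figures from the log (9 nodes, 4 × 2TB disks per node, one OSD per disk, 2x1G uplinks per node) plus an assumed replication factor of 2, which the log does not state for this cluster. It also assumes decimal units (1 Gbit/s = 125 MB/s), that both 1G links can be used at once, and it ignores protocol and filesystem overhead.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    disks: int          # one OSD per disk, as in the log
    disk_tb: float      # capacity per disk in TB (decimal)
    uplink_gbit: float  # total NIC bandwidth per node in Gbit/s


def cluster_summary(nodes: int, spec: NodeSpec, replicas: int) -> dict:
    """Rough OSD count, raw/usable capacity and aggregate network bandwidth."""
    osds = nodes * spec.disks
    raw_tb = osds * spec.disk_tb
    usable_tb = raw_tb / replicas                  # every object stored `replicas` times
    net_mb_s = nodes * spec.uplink_gbit * 125.0    # 1 Gbit/s = 125 MB/s, no overhead
    return {"osds": osds, "raw_tb": raw_tb, "usable_tb": usable_tb, "net_mb_s": net_mb_s}


if __name__ == "__main__":
    # 9 Atom nodes, 4 x 2TB disks each, 2x1G per node (replication factor assumed)
    print(cluster_summary(9, NodeSpec(disks=4, disk_tb=2.0, uplink_gbit=2.0), replicas=2))
```

With these assumptions the cluster has 36 OSDs, 72 TB raw and 36 TB usable, and at most about 2250 MB/s of combined node uplink bandwidth, which recovery traffic and replica writes both have to share.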
[20:50] <DJLee> awesome
[20:51] <DJLee> make sure you test 2x as well..?
[20:51] <prometheanfire> bonding?
[20:51] <DJLee> 2x as in data replication
[20:51] <prometheanfire> 1G files?
[20:52] <DJLee> im testing whole lot, from 1g up to 16g each with thread
[20:53] <wido> DJLee: Please note, I'm NOT using the filesystem
[20:53] <wido> My focus is RBD and librados
[20:53] <prometheanfire> ah, 1G test, not a 1G filesystem per node
[20:53] <DJLee> wido, i see, er, any advances? is it quicker..?
[20:54] <DJLee> i rly havent seen many rbd examples;
[20:54] <DJLee> and so no idea how to make use of it..
[20:55] <DJLee> how do you use and test them? sorry!
[20:55] <wido> DJLee: I use it for my virtual machines, block device storage
[20:56] <DJLee> so rbd is different to osd concept?
[20:57] <DJLee> http://ceph.newdream.net/wiki/Rbd for start?
[20:58] <wido> DJLee: No, both Ceph (the filesystem) and RBD are built on top of librados
[20:59] <wido> I've got to go afk!
[20:59] <wido> ttyl
[20:59] <DJLee> oki!, cheers
[22:42] * yx (~yx@1RDAAADQP.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[22:42] * yx (~yx@raidz.torservers.net) has joined #ceph
[23:04] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[23:35] <johnl> hi
[23:36] <johnl> anyone around that can tell me which paxos algorithm ceph uses?

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.