#ceph IRC Log


IRC Log for 2010-09-06

Timestamps are in GMT/BST.

[2:10] * gregphone (~gregphone@pool-108-1-48-154.chi01.dsl-w.verizon.net) has joined #ceph
[2:11] <gregphone> wido: trunc truncates the object at whatever offset you provide in size
[2:11] * gregphone (~gregphone@pool-108-1-48-154.chi01.dsl-w.verizon.net) has left #ceph
[2:22] * MarkN1 (~nathan@ has left #ceph
[3:42] * lidongyang_ (~lidongyan@ has joined #ceph
[3:42] * lidongyang (~lidongyan@ Quit (Read error: Connection reset by peer)
[3:43] * lidongyang_ is now known as lidongyang
[5:27] * MK_FG (~fraggod@wall.mplik.ru) has joined #ceph
[7:02] * MarkN (~nathan@ has joined #ceph
[8:11] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:44] * allsystemsarego (~allsystem@ has joined #ceph
[9:12] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[10:19] * Yoric (~David@ has joined #ceph
[10:19] * eternaleye (~eternaley@ Quit (Quit: eternaleye)
[10:21] * eternaleye (~eternaley@ has joined #ceph
[12:34] * sentinel_x73 (~sentinel_@ Quit (Remote host closed the connection)
[12:35] * sentinel_x73 (~sentinel_@ has joined #ceph
[13:16] * MK_FG (~fraggod@wall.mplik.ru) Quit (Remote host closed the connection)
[13:44] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Quit: WeeChat 0.2.6)
[14:20] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[15:53] * f4m8 is now known as f4m8_
[17:20] * Yoric (~David@ Quit (Quit: Yoric)
[18:03] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Quit: WeeChat 0.2.6)
[18:04] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[19:26] * Osso (osso@AMontsouris-755-1-19-69.w90-46.abo.wanadoo.fr) has joined #ceph
[19:27] * Osso_ (osso@AMontsouris-755-1-19-69.w90-46.abo.wanadoo.fr) has joined #ceph
[19:27] * Osso (osso@AMontsouris-755-1-19-69.w90-46.abo.wanadoo.fr) Quit (Read error: Connection reset by peer)
[19:27] * Osso_ is now known as Osso
[20:06] <wido> hi
[20:06] <wido> is there some form of documentation about the self managed snapshots of librados?
[20:19] * klooie (~kloo@a82-92-246-211.adsl.xs4all.nl) has joined #ceph
[20:19] <klooie> hi.
[20:19] <wido> hi
[20:20] <klooie> hi wido!
[20:20] <klooie> do you know if it's possible to vote on bugs or feature requests?
[20:20] <wido> no way to vote for them
[20:20] <wido> which one is bothering you?
[20:20] <klooie> i think that the lack of feature #303 killed my ceph setup.
[20:21] <klooie> reading from the filesystems that underlie my osds is faster than writing.
[20:21] <klooie> i added a few disks, so ceph started shuffling data around.
[20:22] <klooie> the memory usage of the osds for the new disks shot up until memory ran out.
[20:22] <klooie> apparently there's an unbounded work queue involved.
[20:22] <wido> yes, i had the same. You could limit the recovery ops, by default it's 5, i have it set to 1
[20:23] <wido> but #303 is set for 0.22, which is due in 25 days
[20:23] <klooie> ah, thanks for that.
[20:23] <wido> http://tracker.newdream.net/versions/show/10
[20:23] <klooie> do you agree that #303 is the real solution, where lowering the number of recovery ops lessens the problem?
[20:24] <wido> I'm not 100% sure that 303 is the real problem, but from what I heard, there still are some memory issues
[20:24] <wido> OOM should not occur that easily
[20:24] <klooie> i'm discovering ceph, so i hacked around and got rid of the queue by disabling some threading and turning off the journal.
[20:24] <wido> Did you try placing your journal on a separate disk? That speeds things up a lot
[20:24] <klooie> osd memory usage was then constant but i corrupted my ceph fs. :-)
[20:24] <klooie> i did, yes..
[20:24] <wido> hehe, my fs is down for almost a week now, MDS bug :-) I'm only playing with RBD/RADOS right now
[20:25] <wido> But the recovery procedure is still a bit heavy, brings a cluster to its knees
[20:25] <klooie> this is a big problem for real use though, if an osd fails i need to be able to replace it and not have the data reshuffle cause further problems.
[20:26] <wido> "osd recovery max active = 1"
[20:26] <klooie> (i note all the warnings saying it's not yet ready for production use, of course.)
[20:26] <kblin> klooie: it also doesn't sound very ready... :)
[20:27] <wido> For now, that keeps my cluster (12 OSD's, 6.3TB) alive when recovering
[20:27] <wido> btw, how many GB of RAM? And how large are your disks? And how much data is on them?
[20:27] <klooie> do i put that in the main osd section, wido?
[20:27] <wido> No, the osd section
[20:27] <wido> My ceph config: osd recovery max active = 1
[20:27] <wido> uhh: http://zooi.widodh.nl/ceph/ceph.conf
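The setting wido describes can be sketched as a small Python snippet that renders the config fragment. This is only an illustration, assuming the option lives in an `[osd]` section as stated in the conversation; the helper name `render_osd_recovery_conf` is hypothetical, and the full config at the linked URL is not reproduced here.

```python
import configparser
import io


def render_osd_recovery_conf(max_active=1):
    """Render an [osd] section that limits concurrent recovery ops.

    The option name and section come from the conversation above;
    the function itself is a hypothetical helper for illustration.
    """
    cfg = configparser.ConfigParser()
    cfg["osd"] = {"osd recovery max active": str(max_active)}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Prints the fragment to merge into an existing ceph.conf by hand.
    print(render_osd_recovery_conf(1))
```

The rendered text is meant to be merged by hand into an existing ceph.conf rather than used as a complete configuration.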
[20:28] <klooie> 2GB of RAM, 4 osds of 2TB each.
[20:28] <klooie> oh that's the 'after' though.
[20:28] <klooie> i had 2 x 2TB, only 100GB of data on it then added 2 x 2TB more.
[20:28] <klooie> that killed it.
[20:28] <iggy> is there any info on optimal journal size?
[20:29] <wido> iggy: Not really, but it should be large enough to hold the last few seconds of writes
[20:29] <wido> so, 4GB should be sufficient
[20:29] <wido> even 2GB I assume
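The rule of thumb above (a journal large enough to hold the last few seconds of writes) can be turned into rough arithmetic. The sketch below is an assumption-laden estimate, not an official sizing formula: the write rate of 100 MB/s and the 10 second window are hypothetical figures chosen only for illustration.

```python
def journal_size_mb(write_mb_per_s, seconds):
    """Space needed to buffer `seconds` of writes at `write_mb_per_s`."""
    return write_mb_per_s * seconds


if __name__ == "__main__":
    # Hypothetical workload: 100 MB/s sustained writes, 10 s buffered.
    needed = journal_size_mb(100, 10)
    print(f"journal needs roughly {needed} MB")
```

With those assumed numbers the estimate comes to about 1 GB, which fits within the 2GB to 4GB range mentioned in the conversation.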
[20:29] <wido> klooie: Some of my OSD's have to do it with 1GB of RAM
[20:30] <wido> You have tcmalloc installed?
[20:30] <iggy> well, I've got a 12 bay case with 12x2T drives, I was thinking about getting a 64G ssd and putting all the journals on it (or 2x32G ssd)
[20:30] <klooie> wido, no, i'll have a look at it.
[20:30] <wido> iggy: 12 OSD's or one OSD on that machine?
[20:30] <wido> And I think you will need a LOT of RAM
[20:31] <iggy> I'm leaning more toward 12 OSDs
[20:31] <iggy> 12G of ram enough you think?
[20:32] * sagelap (~sage@c-24-128-48-100.hsd1.ma.comcast.net) has joined #ceph
[20:32] <wido> That's 1GB per OSD, with a large recovery that might be a bit low
[20:33] <klooie> wido, is there a rough calculation one may do?
[20:33] <iggy> okay, so here's the scenario, I'm shipping this to a DC with 6 drives to get up and running quickly, when I add the other 6 drives, I could add another 12G ram for 24G total
[20:34] <iggy> I'll also be adding more identical boxes as time goes on (which is why I'm leaning toward 12 OSDs vs 1 OSD)
[20:34] <wido> klooie: Not really, but a dev might shed some light on this. Imho, you would want about 3GB per OSD
[20:35] <wido> iggy: I wouldn't run so many OSDs on one box, you would need a pretty powerful CPU. I would go for the pizza-boxes with 4 disks
[20:35] <iggy> negative
[20:35] <wido> If such a box goes down, your cluster misses 24TB! of data, that's a pretty heavy recovery
[20:35] <iggy> too many parts per disk in that scenario
[20:36] <wido> Depends on the situation I assume
[20:36] <iggy> the box is going to have dual quad cores
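The sizing figures traded in this exchange can be checked with a few lines of arithmetic. This is a sketch using only numbers from the conversation (12 OSDs, 2TB drives, 12G or 24G of RAM, and wido's rough opinion of about 3GB per OSD); the function names are hypothetical and the 3GB figure is a personal estimate from the log, not a documented requirement.

```python
def ram_per_osd_gb(total_ram_gb, num_osds):
    """RAM available to each OSD if memory is split evenly."""
    return total_ram_gb / num_osds


def ram_wanted_gb(num_osds, gb_per_osd=3):
    """Total RAM at the rough per-OSD figure suggested in the log."""
    return num_osds * gb_per_osd


def capacity_tb(num_disks, tb_per_disk):
    """Raw capacity that goes offline if the whole box fails."""
    return num_disks * tb_per_disk


if __name__ == "__main__":
    print(ram_per_osd_gb(12, 12))   # 12G box, 12 OSDs
    print(ram_per_osd_gb(24, 12))   # after the planned upgrade to 24G
    print(ram_wanted_gb(12))        # at about 3GB per OSD
    print(capacity_tb(12, 2))       # 12 x 2TB drives
```

By this arithmetic the planned 24G works out to 2GB per OSD, still short of the 36GB that the 3GB-per-OSD estimate would imply for 12 OSDs.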
[20:37] <wido> sage: I just asked (but you were offline), is there some form of documentation about the selfmanaged snaps of librados
[20:44] <klooie> wido, i see there's also an osd_recovery_max_chunk option?
[20:45] <wido> oh, yes. Never tried that one
[20:46] <klooie> also a recover_op is for a PG?
[20:47] <klooie> more PGs means smaller PGs meaning a recover_op (which can be limited to 1 as per your config) is cheaper?
[20:47] <klooie> i would really like to bring the memory requirements down, i'm trying to do this on the cheap at home..
[20:48] <klooie> also i need more free time to try these things. :)
[20:49] * sagelap (~sage@c-24-128-48-100.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[20:51] <wido> klooie: I'm not sure at all, I was just having the same issue. Bringing down that value "fixed" it for me
[20:54] <klooie> yes, i'll recreate the situation but with that change to see if it survives..
[21:01] * klooie has to go.
[21:01] <klooie> thanks wido!
[21:02] * klooie (~kloo@a82-92-246-211.adsl.xs4all.nl) Quit (Quit: Disconnecting)
[22:14] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.