#ceph IRC Log


IRC Log for 2010-08-09

Timestamps are in GMT/BST.

[0:12] <jantje> no, i want to understand its features
[0:12] <jantje> so I can test some scenarios
[0:12] <darkfade1> ah ok - out of my knowledge then :>
[0:12] <jantje> (probably mine too!)
[0:15] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[0:22] * tjikkun (~tjikkun@195-240-122-237.ip.telfort.nl) Quit (Ping timeout: 480 seconds)
[0:25] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Quit: bye)
[0:27] * bbigras (quasselcor@bas11-montreal02-1128531598.dsl.bell.ca) has joined #ceph
[0:27] * Guest1043 (quasselcor@bas11-montreal02-1128531598.dsl.bell.ca) Quit (Read error: Connection reset by peer)
[0:28] * bbigras is now known as Guest1299
[2:16] * atg (~atg@please.dont.hacktheinter.net) Quit (Remote host closed the connection)
[2:17] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[2:28] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[5:45] * Osso (osso@AMontsouris-755-1-10-232.w90-46.abo.wanadoo.fr) Quit (Quit: Osso)
[7:00] * Jiaju (~jjzhang@ Quit (Remote host closed the connection)
[7:04] * mtg (~mtg@port-87-193-189-26.static.qsc.de) has joined #ceph
[7:26] * f4m8_ is now known as f4m8
[11:26] * allsystemsarego (~allsystem@ has joined #ceph
[11:37] * Jiaju (~jjzhang@ has joined #ceph
[12:31] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[12:39] * tjikkun (~tjikkun@195-240-122-237.ip.telfort.nl) has joined #ceph
[13:33] <jantje> btrfs backtraces: http://jan.sin.khk.be/dmesg
[14:29] <jantje> I'm confused
[14:29] <jantje> I ran osd bench for each osd, and slept for 20 secs between each bench
[14:29] <jantje> 10.08.09_14:27:24.619041 log 10.08.09_14:27:23.444105 osd0 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 10.082009 sec at 101 MB/sec
[14:30] <jantje> 10.08.09_14:27:56.638309 log 10.08.09_14:27:55.261593 osd1 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 21.894425 sec at 47892 KB/sec
[14:30] <jantje> 10.08.09_14:28:03.432756 log 10.08.09_14:28:01.529741 osd2 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 8.111137 sec at 126 MB/sec
[14:30] <jantje> 10.08.09_14:28:23.458160 log 10.08.09_14:28:22.295708 osd3 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 8.872562 sec at 115 MB/sec
[14:30] <jantje> 10.08.09_14:28:44.829695 log 10.08.09_14:28:43.694492 osd4 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 10.284167 sec at 101960 KB/sec
[14:30] <jantje> 10.08.09_14:29:02.913161 log 10.08.09_14:29:01.754040 osd5 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 8.339151 sec at 122 MB/sec
[14:30] <jantje> osd0+1, osd2+3, osd4+5 are on the same machine
[14:30] <jantje> I'm using a journal in memory
[14:32] <jantje> (the bench for osd1 is 'slow')
[14:32] <jantje> and they are all identical disks, identical hardware, etc
[14:34] <jantje> oh well, probably nothing to worry about
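The six bench lines above report throughput in mixed units (MB/sec for some OSDs, KB/sec for others), which makes the slow osd1 harder to spot at a glance. A minimal Python sketch for normalizing them, assuming the log line layout pasted above and 1 MB = 1024 KB; the names `BENCH_RE` and `parse_bench` are illustrative and not part of Ceph:

```python
import re

# Matches the "[INF] bench:" lines pasted above. The unit of the rate
# field varies between lines (KB/sec or MB/sec).
BENCH_RE = re.compile(
    r"(osd\d+) \d+ : \[INF\] bench: wrote (\d+) MB in blocks of \d+ KB "
    r"in ([\d.]+) sec at (\d+) (KB|MB)/sec"
)

def parse_bench(line):
    """Return (osd, reported_mb_per_sec, computed_mb_per_sec), or None if no match."""
    m = BENCH_RE.search(line)
    if m is None:
        return None
    osd, mb, secs, rate, unit = m.groups()
    # Assumption: 1 MB = 1024 KB in the reported rate.
    reported = int(rate) / 1024 if unit == "KB" else float(rate)
    # Recompute the rate from the amount written and the elapsed time.
    computed = int(mb) / float(secs)
    return osd, reported, computed
```

For the osd1 line this gives roughly 46.8 MB/sec, both as reported and as recomputed from 1024 MB / 21.894425 sec, which is under half of what each of the other five OSDs wrote.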
[15:06] <todinini> hi, has anyone tested ceph with blogbench?
[15:07] <todinini> one blogbench run pushes the mds to nearly 100% CPU and 75% mem on an amd 1210 dual core with 1GB
[15:49] * f4m8 is now known as f4m8_
[15:50] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) has joined #ceph
[16:07] <jantje> 10.08.09_15:17:16.262423 7f473715f710 mon1 notify fsid fb9e2c83-99cc-7b77-b4c6-4314bc76396c != b33bab3b-4295-e5a0-863d-472dd447c1eb
[16:08] <todinini> jantje: your filesystem id differs
[16:14] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[16:33] * Osso (osso@AMontsouris-755-1-10-232.w90-46.abo.wanadoo.fr) has joined #ceph
[17:36] * gregphone (~gregphone@ has joined #ceph
[17:46] <gregphone> todinini: throttler is only for client->osd, yes
[17:51] <gregphone> jantje: if you've got one osd consistently slower than the others you might have a bad drive; you should run those benches again and then maybe some diagnostics
[17:53] <gregphone> todinini: that's not surprising with blogbench; it's probably a lot of metadata ops and if you've got logging on that'll take a lot of power to run through
[17:56] * gregphone (~gregphone@ has left #ceph
[18:41] * Osso_ (osso@AMontsouris-755-1-2-32.w86-212.abo.wanadoo.fr) has joined #ceph
[18:44] * Osso (osso@AMontsouris-755-1-10-232.w90-46.abo.wanadoo.fr) Quit (Remote host closed the connection)
[18:44] * Osso_ is now known as Osso
[18:51] <jantje> gregaf: it's consistent, so maybe a bad drive, but they're all brand new
[18:52] <jantje> [17160.448035] INFO: task sync:12937 blocked for more than 120 seconds.
[18:52] <jantje> [17160.453670] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18:52] <jantje> [17160.459351] sync D 00000001003fb40f 0 12937 12755 0x00000000
[18:52] <jantje> [17160.459355] ffff880107e7df60 0000000000000086 ffff88012e24aa00 ffff880000000000
[18:52] <jantje> [17160.459358] ffff88012f653680 0000000000015480 0000000000015480 0000000000015480
[18:52] <jantje> [17160.459361] ffff88007dbb7fd8 0000000000015480 ffff880107e7df60 ffff88007dbb7fd8
[18:52] <jantje> [17160.459364] Call Trace:
[18:52] <jantje> [17160.459371] [<ffffffff812ff7e7>] ? schedule_timeout+0x2d/0xd7
[18:52] <jantje> [17160.459374] [<ffffffff812ff7e7>] ? schedule_timeout+0x2d/0xd7
[18:52] <jantje> [17160.459376] [<ffffffff812ff65c>] ? wait_for_common+0xd1/0x14e
[18:52] <jantje> [17160.459381] [<ffffffff8103f64b>] ? default_wake_function+0x0/0xf
[18:52] <jantje> [17160.459383] [<ffffffff8103f64b>] ? default_wake_function+0x0/0xf
[18:52] <jantje> [17160.459394] [<ffffffffa04c8e6c>] ? ceph_mdsc_sync+0xec/0x1e7 [ceph]
[18:53] <jantje> [17160.459397] [<ffffffff812ffd81>] ? mutex_lock+0xd/0x33
[18:53] <jantje> [17160.459405] [<ffffffffa04cdf3d>] ? ceph_osdc_sync+0x20/0xc1 [ceph]
[18:53] <jantje> [17160.459411] [<ffffffff8110667e>] ? sync_one_sb+0x0/0x20
[18:53] <jantje> [17160.459429] [<ffffffffa04b6778>] ? ceph_syncfs+0x2a/0x2e [ceph]
[18:53] <jantje> [17160.459444] [<ffffffff81106666>] ? __sync_filesystem+0x62/0x7a
[18:53] <jantje> [17160.459459] [<ffffffff810ea932>] ? iterate_supers+0x61/0xa0
[18:53] <jantje> [17160.459473] [<ffffffff811066c6>] ? sys_sync+0x28/0x54
[18:53] <jantje> [17160.459487] [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b
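To see at a glance which part of the hung `sync` goes through the ceph module, the frames tagged `[ceph]` can be pulled out of the pasted trace. A rough Python sketch, assuming the frame layout shown above; `FRAME_RE` and `module_frames` are illustrative names, and frames marked `?` in a kernel trace are not guaranteed to be exact:

```python
import re

# One stack frame as printed in the pasted trace, e.g.
#   [<ffffffffa04c8e6c>] ? ceph_mdsc_sync+0xec/0x1e7 [ceph]
# The trailing "[module]" tag is only present for code in a loadable module.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] \?? ?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)

def module_frames(trace_text, module):
    """Return function names, in trace order, for frames tagged with `module`."""
    names = []
    for m in FRAME_RE.finditer(trace_text):
        func, mod = m.groups()
        # Skip frames from other modules and repeated entries.
        if mod == module and func not in names:
            names.append(func)
    return names
```

On the trace above, `module_frames(trace, "ceph")` yields `ceph_mdsc_sync`, `ceph_osdc_sync` and `ceph_syncfs`, all reached from `sys_sync`.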
[19:01] * mtg (~mtg@port-87-193-189-26.static.qsc.de) Quit (Quit: Verlassend)
[19:02] <yehudasa> jantje: what does /sys/kernel/debug/ceph/*/mdsc and /sys/kernel/debug/ceph/*/osdc show?
[19:03] * kblin (~kai@mikropc7.biotech.uni-tuebingen.de) Quit (Remote host closed the connection)
[19:03] * kblin (~kai@mikropc7.biotech.uni-tuebingen.de) has joined #ceph
[20:23] <jantje> I don't have anything in debug/
[20:23] <jantje> I probably need linux-image-2.6.35-trunk-amd64-dbg
[20:25] <yehudasa> jante: you need to mount debugfs, e.g., 'mount -t debugfs none /sys/kernel/debug'
[20:25] <yehudasa> that's jantje:
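Before looking for the `mdsc` and `osdc` files, it can help to confirm that debugfs is actually mounted. A minimal Python sketch that scans text in the `/proc/mounts` layout (device, mount point, filesystem type, options); the function name is illustrative:

```python
def debugfs_mountpoints(mounts_text):
    """Return the mount points of every debugfs entry in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 2 is the mount point, field 3 the filesystem type.
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points
```

Passing it the contents of `/proc/mounts`, for example `debugfs_mountpoints(open("/proc/mounts").read())`, should list `/sys/kernel/debug` once the mount command above has been run; an empty list would explain an empty `debug/` directory.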
[20:33] * Osso_ (osso@AMontsouris-755-1-2-32.w86-212.abo.wanadoo.fr) has joined #ceph
[20:33] * Osso (osso@AMontsouris-755-1-2-32.w86-212.abo.wanadoo.fr) Quit (Remote host closed the connection)
[20:33] * Osso_ is now known as Osso
[21:27] <wido> hi
[21:27] <wido> this is what i meant with iSCSI: http://ceph.newdream.net/wiki/ISCSI
[21:30] <yehudasa> wido: thanks
[21:32] <wido> np
[21:32] <wido> any idea when there will be a tcmalloc() branch?
[21:33] <wido> my MDS and OSDs keep running OOM when i try to sync kernel.org. right now, with 616GB of data and a lot of small files, my MDS is using 3.86GB of memory; rsyncing kernel.org makes it swap and the MDS goes stale
[21:33] <wido> the OSDs will also start to swap and become unresponsive as a result
[21:33] <gregaf> wido: right now :p
[21:34] <gregaf> it's not set up properly with package dependencies and whatnot, but if you have libgoogleperf-dev installed it'll use tcmalloc on the cmds and cosd
[21:34] <wido> ok, then i'll install that :)
[21:35] <gregaf> I think it's still up-to-date with unstable but if it's not the only commit it actually adds is the most recent one, so you can just merge or cherry-pick off of unstable
[21:35] <wido> i've got a new toy, an AMD octo-core with 32GB RAM :-) Will start to test the MDS and qemu-kvm on that machine
[21:35] <wido> ok, gregaf i'll do that
[21:46] <gregaf> wido: doesn't doing that iSCSI thing just trash performance?
[21:47] <wido> gregaf: it probably will, but in some legacy situations you will need iSCSI
[21:48] <wido> and the performance isn't that bad, but it won't be super either
[21:48] <wido> you could drop the multipath setup and go for one target, then you can switch to fileio and benefit from the target's memory
[21:54] <wido> gregaf: you mean libgoogle-perftools-dev right?
[21:54] <gregaf> uh, yeah
[21:54] <gregaf> sorry
[21:54] <wido> package is broken under Ubuntu. Are you building on i386 or AMD64?
[21:55] <wido> https://bugs.launchpad.net/ubuntu/+source/google-perftools/+bug/359736
[21:55] <gregaf> amd64 under debian lenny
[21:55] <wido> ok, then it's really Ubuntu which is broken
[21:56] <gregaf> yeah, the last comment there says it was fixed upstream in February
[21:56] <wido> well, it's not, it's not even fixed in Ubuntu 10.10 yet
[21:57] <wido> on i386 it is, but not AMD64
[21:57] <gregaf> :(
[21:57] <gregaf> you could just grab the source and install it yourself, maybe?
[21:57] <wido> yes, that's what i'll do then
[22:01] <wido> or just grab the Debian packages, which also seem to work
[22:07] <wido> gregaf: switching to the tcmalloc branch is enough? then just build?
[22:08] <wido> any way to check i'm really building with tcmalloc?
[22:08] <gregaf> you'll have to rerun autogen and configure
[22:08] <gregaf> when you build you'll see -tcmalloc occurring in the output
[22:08] <gregaf> err, -ltcmalloc
[22:09] <gregaf> and then your memory use ought to be a good bit lower
[22:09] <gregaf> :)
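One way to act on the `-ltcmalloc` hint is to capture the build output as text and search it for that linker flag. A small Python sketch, assuming the output has been saved somewhere (for instance with `make 2>&1 | tee build.log`, where `build.log` is a hypothetical file name); the function name is illustrative:

```python
def tcmalloc_link_lines(build_output):
    """Return the build-output lines that pass -ltcmalloc as a separate argument."""
    return [
        line
        for line in build_output.splitlines()
        # Exact token match, so variants with a different suffix are not counted.
        if "-ltcmalloc" in line.split()
    ]
```

An empty result for the `cmds` and `cosd` link steps would suggest that configure did not pick up the library and that autogen and configure need to be rerun.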
[22:09] <wido> great, i'm going afk now, but i'll give it a try tomorrow morning
[22:09] <wido> time difference is always funny :-)
[22:13] * Anticimex (anticimex@netforce.csbnet.se) Quit (Ping timeout: 480 seconds)
[22:17] <wido> yehudasa: http://www.mail-archive.com/mod-fcgid-users@lists.sourceforge.net/msg00035.html
[22:19] <yehudasa> wido: yeah, saw that
[22:20] <wido> ok
[22:20] <yehudasa> it might be that fastcgi with '-flush' flag would work
[22:21] <yehudasa> another option would be to maintain our own apache module
[22:23] <wido> could be an option too, but then you are limited to Apache
[22:23] <yehudasa> right
[22:23] <yehudasa> well.. if both don't work anyway ..
[22:23] <wido> where there is so much more these days than Apache
[22:23] <wido> about that you are right
[22:24] <yehudasa> basically we need to be able to run cgi
[22:24] <yehudasa> no need for buffering
[22:27] <wido> you mean basic CGI? That could work too, but with small requests it will have a lot of start-up time
[22:28] <yehudasa> yeah, that's why we chose fcgi
[22:34] <wido> but if there is no solution for the fcgid buffering, an Apache module would be an option, since uploading large files is pretty normal these days
[22:37] <yehudasa> yeah
[22:37] <yehudasa> I'm getting closer to having fastcgi setup working
[22:38] <yehudasa> might be that the -flush option would work, but I'm afraid that it'll impact it in some other form
[22:38] <yehudasa> e.g., the 100-continue will not work anymore, or something like that
[22:41] <wido> that is chunked upload, isn't it?
[22:41] <yehudasa> yeah
[22:41] <yehudasa> it's not really working anyway, btw
[22:42] <wido> oh, ok. Btw, RADOS seems pretty stable. The RGW is working fine here and also my app which uses RADOS works fine too
[22:42] <yehudasa> great, happy to hear
[22:42] <yehudasa> I think fastcgi fixes the problem
[22:43] <yehudasa> upload seems slower naturally, but it doesn't buffer the entire request
[22:43] <wido> when you apply the -flush flag?
[22:43] <yehudasa> I didn't even do that
[22:44] <yehudasa> I'll put a reference configuration in the wiki later
[22:44] <yehudasa> pretty similar to the fcgi one
[22:44] <yehudasa> but I had to compile my own apache module
[22:44] <wido> ah, ok :-) Btw, could you also post your lighttpd config somewhere? might be useful in some situations
[22:44] <wido> then i'm really afk now
[22:44] <yehudasa> yeah
[22:45] <wido> ttyl
[22:59] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[23:03] <yehudasa> wido: I updated the wiki with the configs
[23:18] <jantje> is it possible to get a ceph client running on centos4/5
[23:19] <jantje> (would that be the fuse client thingie?)
[23:20] <gregaf> probably — you just need FUSE to run that one
[23:25] <jantje> it's probably very hard to get it compiled for centos/rhel/4/5
[23:29] <yehudasa> jantje: we're kinda debian oriented here, but I guess you should be able to compile the ceph fuse client on centos 5 pretty easily
[23:31] <jantje> debian is OK by me, but I'm sure I will get requests for centos clients :-)
[23:35] <yehudasa> well.. centos 5 uses quite an old kernel.. it's been a while since we last compiled on 2.6.18, so probably fuse is the only way
[23:36] <jantje> it would be good if I can get it working with centos5
[23:36] <jantje> or else I'll have to use nfs gateways or stuff like that
[23:36] <jantje> anyway
[23:36] <jantje> when is 1.0 coming? :-)
[23:37] <gregaf> heh
[23:37] * jantje runs
[23:37] <gregaf> When It's Done
[23:37] <gregaf> ;)
[23:48] <jantje> any suggestions on what could really stress my setup?
[23:48] <jantje> bonnie, iozone ?
[23:49] <gregaf> we run both of those in our standard (though limited) qa suite
[23:49] <gregaf> they definitely can stress it though
[23:50] <gregaf> apparently blogbench is pretty intense on the MDS
[23:51] <jantje> I'm thinking of scripting some failover scenarios
[23:51] * alexxy (~alexxy@ has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.