#ceph IRC Log


IRC Log for 2011-11-07

Timestamps are in GMT/BST.

[0:04] * mrjack_ (mrjack@office.smart-weblications.net) has joined #ceph
[0:04] <mrjack_> hi
[0:04] <mrjack_> i think i found a bug
[0:05] <mrjack_> node02:/etc/ceph# ls
[0:05] <mrjack_> ceph.conf keyring.mds.1 keyring.osd.1
[0:05] <mrjack_> node02:/etc/ceph# ceph
[0:05] <mrjack_> *** Caught signal (Segmentation fault) **
[0:05] <mrjack_> in thread 7fb817589720
[0:05] <mrjack_> ceph version 0.37-307-gae41f32 (commit:ae41f3232a39dbf33487ab02cbac292f58debea8)
[0:05] <mrjack_> 1: ceph() [0x470789]
[0:05] <mrjack_> 2: (()+0xef60) [0x7fb817170f60]
[0:05] <mrjack_> 3: (MonClient::_reopen_session()+0x5e4) [0x4845e4]
[0:05] <mrjack_> 4: (MonClient::authenticate(double)+0x1a1) [0x485281]
[0:05] <mrjack_> 5: (ceph_tool_common_init(ceph_tool_mode_t, bool)+0x55d) [0x452d2d]
[0:05] <mrjack_> 6: (main()+0x4fe) [0x44fa6e]
[0:05] <mrjack_> 7: (__libc_start_main()+0xfd) [0x7fb815b84c4d]
[0:05] <mrjack_> 8: ceph() [0x44e519]
[0:05] <mrjack_> Segmentation fault
[0:06] <mrjack_> node02:/etc/ceph# scp root@node01:/etc/ceph/keyring .
[0:06] <mrjack_> keyring 100% 92 0.1KB/s 00:00
[0:06] <mrjack_> node02:/etc/ceph# ceph
[0:06] <mrjack_> ceph> quit
[0:06] <mrjack_> so ceph seems to segfault when keyring file is missing...
[0:06] <mrjack_> a warning would be better :)
[0:21] * fronlius (~Adium@d212223.adsl.hansenet.de) Quit (Quit: Leaving.)
[1:23] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[3:15] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[3:37] * votz (~votz@pool-108-52-121-103.phlapa.fios.verizon.net) has joined #ceph
[3:46] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[3:51] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[5:04] * mrjack (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[5:04] * mrjack_ (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[5:43] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[5:47] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[8:10] * mrjack (mrjack@office.smart-weblications.net) has joined #ceph
[9:00] * dweazle (~dweazle@dev.tilaa.nl) Quit (Read error: Connection reset by peer)
[9:49] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[9:53] * mrjack_ (mrjack@office.smart-weblications.net) has joined #ceph
[10:17] * fronlius (~Adium@testing78.jimdo-server.com) has joined #ceph
[11:28] * fronlius_ (~fronlius@testing78.jimdo-server.com) has joined #ceph
[11:29] * fronlius (~Adium@testing78.jimdo-server.com) Quit (Quit: Leaving.)
[11:29] * fronlius_ is now known as fronlius
[12:01] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:35] * morse_ (~morse@supercomputing.univpm.it) has joined #ceph
[12:35] * alexxy[home] (~alexxy@ has joined #ceph
[12:35] * _are_ (~quassel@vs01.lug-s.org) has joined #ceph
[12:37] * NaioN_ (~stefan@andor.naion.nl) has joined #ceph
[12:37] * df_ (davidf@dog.thdo.woaf.net) has joined #ceph
[12:37] * _Tass4dar (~tassadar@tassadar.xs4all.nl) has joined #ceph
[12:37] * peritus_ (~andreas@h-150-131.a163.priv.bahnhof.se) has joined #ceph
[12:37] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (reticulum.oftc.net solenoid.oftc.net)
[12:37] * morse (~morse@supercomputing.univpm.it) Quit (reticulum.oftc.net solenoid.oftc.net)
[12:37] * alexxy (~alexxy@ Quit (reticulum.oftc.net solenoid.oftc.net)
[12:37] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) Quit (reticulum.oftc.net solenoid.oftc.net)
[12:37] * _are__ (~quassel@vs01.lug-s.org) Quit (reticulum.oftc.net solenoid.oftc.net)
[12:37] * _Tassadar (~tassadar@tassadar.xs4all.nl) Quit (reticulum.oftc.net solenoid.oftc.net)
[12:37] * df__ (davidf@dog.thdo.woaf.net) Quit (reticulum.oftc.net solenoid.oftc.net)
[12:37] * peritus (~andreas@h-150-131.a163.priv.bahnhof.se) Quit (reticulum.oftc.net solenoid.oftc.net)
[12:37] * Ormod (~valtha@ohmu.fi) Quit (reticulum.oftc.net solenoid.oftc.net)
[12:37] * NaioN (~stefan@andor.naion.nl) Quit (reticulum.oftc.net solenoid.oftc.net)
[12:37] * RupS (~rups@panoramix.m0z.net) Quit (reticulum.oftc.net solenoid.oftc.net)
[12:37] * Ormod (~valtha@ohmu.fi) has joined #ceph
[12:39] * RupS (~rups@panoramix.m0z.net) has joined #ceph
[12:39] * maswan (~maswan@kennedy.acc.umu.se) Quit (Ping timeout: 480 seconds)
[12:48] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[12:48] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) has joined #ceph
[12:56] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[13:02] * tools_ (~tom@ipx20310.ipxserver.de) has joined #ceph
[13:03] <tools_> hi all.. I'm planning our infrastructure for 2012 atm and wonder if I can plan with ceph on two servers as the base for our kvm development-guests
[13:03] * tools_ is now known as tools
[13:03] <tools> any input on that?
[13:04] <tools> I read that ceph is in the kernel already but seems to be tightly coupled with btrfs
[13:04] <tools> I am not too sure how stable btrfs is
[13:04] <tools> or if I can easily use ceph with e.g. ext[34]
[13:05] <tools> the wiki says that ceph is not ready for production use yet. But the last edit was in march I think
[13:05] <tools> things may have changed
[13:05] <nms> tools: http://tracker.newdream.net/projects/ceph/roadmap
[13:25] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:30] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[13:32] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[13:56] * fronlius_ (~fronlius@testing78.jimdo-server.com) has joined #ceph
[13:56] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[13:56] * fronlius_ is now known as fronlius
[14:37] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:10] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[15:10] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[15:11] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[15:13] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[15:14] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (Read error: Connection reset by peer)
[15:15] * alexxy[home] (~alexxy@ Quit (Read error: Connection reset by peer)
[15:15] * alexxy (~alexxy@ has joined #ceph
[15:17] * sage (~sage@ Quit (Ping timeout: 480 seconds)
[15:17] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[15:19] * sjust (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[15:19] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[15:21] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:22] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[15:25] * NaioN_ (~stefan@andor.naion.nl) Quit (Remote host closed the connection)
[15:25] * NaioN (~stefan@andor.naion.nl) has joined #ceph
[15:25] * sage (~sage@ has joined #ceph
[16:27] * jclendenan (~jclendena@ Quit (Ping timeout: 480 seconds)
[16:41] * jclendenan (~jclendena@ has joined #ceph
[16:56] * adjohn (~adjohn@70-36-139-211.dsl.dynamic.sonic.net) has joined #ceph
[17:08] * nwatkins` (~user@kyoto.soe.ucsc.edu) has joined #ceph
[17:11] * nwatkins` (~user@kyoto.soe.ucsc.edu) Quit (Remote host closed the connection)
[17:15] * nwatkins` (~user@kyoto.soe.ucsc.edu) has joined #ceph
[17:30] * grape (~grape@c-76-17-80-143.hsd1.ga.comcast.net) has joined #ceph
[17:41] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[17:42] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:44] * gohko (~gohko@natter.interq.or.jp) Quit (Read error: Connection reset by peer)
[17:45] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[17:49] <josef> sagewk: has somebody sent you a logrotate patch recently?
[18:02] <gregaf1> lxo: OSDs not starting properly is either a regression or a bug that you've never hit before
[18:03] <gregaf1> psomas: lpg_num/lpgp_num are the same as pg_num/pgp_num, except for the "local" PGs, which are PGs that are force-fed to have a certain OSD as their primary
[18:04] <gregaf1> the default number of PGs is assigned using a bad algorithm based on the number of OSDs with mkcephfs et al, but once the system's running it doesn't go through the effort; don't remember why — but really the defaults aren't that smart anyway
[18:17] * adjohn (~adjohn@70-36-139-211.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[18:42] * bchrisman (~Adium@ has joined #ceph
[19:24] * adjohn (~adjohn@ has joined #ceph
[19:33] <lxo> gregaf1, thanks, filed bug 1690
[19:47] * mgalkiewicz (~maciej.ga@ has joined #ceph
[19:48] <mgalkiewicz> hello guys I was told creating many pools does not scale well
[19:50] <mgalkiewicz> the question is how many pools can I create before having problems with them?
[19:50] <mgalkiewicz> tens? hundreds?
[19:50] <joshd> yeah, the main problem is increased memory usage on the osds (especially during recovery)
[19:51] <joshd> I think gregaf was tested the limits a little while ago
[19:52] <joshd> er, testing
[19:52] <sjust> mgalkiewicz: We ran into trouble with around 60000 pgs (so around 7500 pools) per osd iirc
[19:52] <gregaf1> well, the relationship between the number of pools and the number of PGs is a bit more pliable than that
[19:53] <sjust> gregaf1: yeah, but it defaults to 8 pgs per pool
[19:53] <gregaf1> I thought it was 16?
[19:53] <sjust> hmm, might be
[19:53] <sjust> mgalkiewicz: anyway, the problem is that each pool needs a (configurable) number of pgs
[19:53] <mgalkiewicz> hmm I would like to have tens of pools, maybe hundred at most
[19:54] <sjust> mgalkiewicz: that should be just fine, I would think
[19:54] <gregaf1> yeah, no problem there
[19:54] <mgalkiewicz> ok do you plan to optimize this somehow?
[19:54] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Quit: fronlius)
[19:54] <mgalkiewicz> I have to create many pools because of authentication
[19:55] <mgalkiewicz> each client has its own pool and has access only to rbd images from its specific pool
[19:55] <gregaf1> it will get better in the future in various ways, but there's always going to be a limit somehow
[19:56] <gregaf1> but the biggest problem is the number of PGs on each OSD, so as long as you can bound the numbers based on amount of storage you shouldn't run into any issues
[19:56] <mgalkiewicz> yeah, nothing is infinite
[19:56] <gregaf1> the only times we've seen issues were with the old RGW, which created a pool for every bucket
[19:59] <mgalkiewicz> ok thx for help
[19:59] <mgalkiewicz> cu
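
The arithmetic behind the pool and PG figures quoted above can be sketched as follows. This is a rough illustration, not a sizing rule: it takes the roughly 60000 PGs-per-OSD trouble point and the 8 or 16 PGs-per-pool defaults from the discussion at face value, and, like the estimate in the discussion, it treats every PG of every pool as landing on the same OSD. The function names are hypothetical.

```python
def pools_within_pg_budget(pg_budget: int, pgs_per_pool: int) -> int:
    """Return how many pools fit inside a PG budget at a fixed PG count per pool."""
    if pgs_per_pool <= 0:
        raise ValueError("pgs_per_pool must be positive")
    return pg_budget // pgs_per_pool


def pgs_for_pools(pool_count: int, pgs_per_pool: int) -> int:
    """Return the total number of PGs created by pool_count pools."""
    return pool_count * pgs_per_pool


PG_TROUBLE_POINT = 60000  # figure recalled by sjust ("iirc"), not a hard limit

for pgs_per_pool in (8, 16):  # the two defaults mentioned in the discussion
    limit = pools_within_pg_budget(PG_TROUBLE_POINT, pgs_per_pool)
    planned = pgs_for_pools(100, pgs_per_pool)  # mgalkiewicz's "hundred at most"
    print(f"{pgs_per_pool} PGs/pool: about {limit} pools, 100 pools = {planned} PGs")
```

With either default, a hundred pools stay far below the quoted trouble point, which matches the "should be just fine" answer.
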
[20:27] * mgalkiewicz (~maciej.ga@ Quit (Quit: Ex-Chat)
[20:30] <psomas> gregaf1: about the defaults, using the rados bench it seems that the ~256 (maybe 264 or sth) pgs for the rbd pool perform better than the default 8 pgs per new pool
[20:33] * fronlius (~fronlius@f054105049.adsl.alicedsl.de) has joined #ceph
[20:48] <NaioN> somebody using mdraid on the osds?
[20:49] <NaioN> I'm having trouble with it, the osds commit suicide after a while under heavy load
[20:58] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[21:01] * adjohn (~adjohn@ Quit (Quit: adjohn)
[21:04] * adjohn (~adjohn@ has joined #ceph
[21:06] <wido> joshd: have you been able to track that bug further down? #1529
[21:06] <wido> I just went from 2 up osds to 1
[21:06] <wido> another died
[21:06] <wido> 1 out of 6 is up now ;)
[21:24] <joshd> wido: no, I haven't reproduced it with debugging yet
[21:33] * nwatkins` (~user@kyoto.soe.ucsc.edu) Quit (Read error: Connection reset by peer)
[21:39] <wido> joshd: ok, thanks :-) If there is anything I can do, let me know
[21:44] <gregaf1> wido: joshd: that isn't the replay bug with clones on non-btrfs?
[21:45] <wido> gregaf1: I'm running btrfs
[21:46] <gregaf1> wido: I figured; the bug just reminds me of the symptoms I'd expect and I don't think the teuthology stuff Josh was referring to is on btrfs
[21:49] <NaioN> what does this mean in the log: 2011-11-07 21:34:31.273754 7fda4cd15700 heartbeat_map is_healthy 'OSD::op_tp thread 0x7fda3fbfa700' had timed out after 30
[21:54] <gregaf1> NaioN: the OSD has a thread that was working on IO ops that hung (probably on a write)
[21:55] <NaioN> I get a lot of those before the osd kills itself
[21:55] <NaioN> together with these: 2011-11-07 21:34:41.274176 7fda4cd15700 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fda45d07700' had timed out after 60
[21:55] <gregaf1> yeah, the OSD is committing suicide because its IO seems to be dying
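
To see which thread pools are timing out before an OSD gives up, the heartbeat_map lines can be tallied with a small script. This is a minimal sketch: the regular expression is an assumption derived only from the two log lines pasted above, and the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the two sample lines in the discussion; other
# Ceph versions may format these messages differently.
HEARTBEAT_RE = re.compile(
    r"heartbeat_map is_healthy '(?P<pool>\S+) thread (?P<thread>0x[0-9a-f]+)' "
    r"had timed out after (?P<seconds>\d+)"
)


def parse_heartbeat_timeout(line):
    """Return (thread pool, thread id, timeout in seconds), or None if no match."""
    match = HEARTBEAT_RE.search(line)
    if match is None:
        return None
    return match.group("pool"), match.group("thread"), int(match.group("seconds"))


def count_timeouts(lines):
    """Count timed-out heartbeat checks per thread pool name."""
    counts = Counter()
    for line in lines:
        parsed = parse_heartbeat_timeout(line)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts


SAMPLES = [
    "2011-11-07 21:34:31.273754 7fda4cd15700 heartbeat_map is_healthy "
    "'OSD::op_tp thread 0x7fda3fbfa700' had timed out after 30",
    "2011-11-07 21:34:41.274176 7fda4cd15700 heartbeat_map is_healthy "
    "'FileStore::op_tp thread 0x7fda45d07700' had timed out after 60",
]

for sample in SAMPLES:
    print(parse_heartbeat_timeout(sample))
```
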
[21:59] <ajm> that issue seems to be way worse in 0.37 :/
[22:00] <NaioN> Well I noticed I get it if I'm running btrfs on an mdraid
[22:00] <NaioN> If I make an osd per disk I don't get that problem
[22:01] <NaioN> I think I trigger some bug between btrfs and mdraid
[22:02] <NaioN> because if I do a dd on the mounted btrfs it also stalls (status D with ps)
[22:05] <NaioN> the only thing I see is these messages in dmesg: [23131.416746] WARNING: at fs/btrfs/inode.c:2198 btrfs_orphan_commit_root+0xa8/0xc0 [btrfs]()
[22:05] <NaioN> and a trace (http://pastebin.com/GNpMagnD)
[22:11] <NaioN> hmm strange I could initiate a dd a couple of minutes ago, but now I can't do an ls anymore on the volume
[22:11] <NaioN> now it's completely dead
[22:17] <todin> how could I debug a hanging rbd rm? It counts to 8% and then just hangs. other images in the pool are fine
[22:27] <Tv> todin: sounds like an object 8% into the image is having trouble, e.g. the osds for it crashed
[22:27] <Tv> todin: is the cluster healthy?
[22:27] <Tv> todin: what does "ceph -s" say, is there something interesting in the osd logs, etc
[22:27] <todin> Tv: I think so, there is nothing in the logs, and the other 19 images are fine.
[22:29] <todin> 2011-11-07 22:28:30.909188 pg v139457: 792 pgs: 791 active+clean, 1 active+clean+scrubbing; 67249 MB data, 104 GB used, 1635 GB / 1833 GB avail
[22:29] <todin> 2011-11-07 22:28:30.914444 mds e1: 0/0/1 up
[22:29] <todin> 2011-11-07 22:28:30.914522 osd e37: 4 osds: 4 up, 4 in
[22:29] <todin> 2011-11-07 22:28:30.914672 log 2011-11-07 22:21:31.454945 mon.0 7 : [INF] osd.1 boot
[22:29] <todin> 2011-11-07 22:28:30.914883 mon e1: 1 mons at {0=}
[22:36] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Remote host closed the connection)
[22:39] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[22:45] <NaioN> hmmmm on another server with a BTRFS filesystem I see these messages in dmesg:
[22:45] <NaioN> [143280.760430] INFO: task rsync:5509 blocked for more than 120 seconds.
[22:45] <NaioN> [143280.760520] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[22:46] <NaioN> so it looks like BTRFS sometimes stalls...
[22:52] <ajm> NaioN: i've seen this as well, i've never found a real solution though
[22:56] <todin> does anyone know if ceph supports trim?
[23:05] <Tv> todin: RADOS has that operation; right now I can't pinpoint any code in rbd or cephfs that would use it.
[23:09] <todin> Tv: would be nice to have it in rbd too
[23:09] * aliguori_ (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[23:09] <Tv> todin: rbd is just a thin layer on top of rados, so it should be easy, but i don't see it there currently
[23:10] <Tv> i'm adding tickets for both cases
[23:12] <Tv> http://tracker.newdream.net/issues/1692 http://tracker.newdream.net/issues/1693
[23:12] <Tv> sagewk: fyi ^
[23:14] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[23:17] <todin> Tv: thanks, that would be a great improvement
[23:25] <Tv> todin: it's been in the plans already, just perhaps a bit too silently
[23:25] <Tv> bundled into a bigger "rbd" etc milestone
[23:34] <todin> Tv: ok, for me it is quite an important feature, I could then do a thin-provisioned kvm rbd installation
[23:35] <Tv> todin: well rbd is already thinly provisioned without anything extra
[23:36] <Tv> todin: trim would just add hole punching something that got un-thinned
[23:36] <Tv> *for something
[23:39] <todin> Tv: what do you mean by hole punching? If I create an rbd image the space is not allocated on the osd, then when I use the space in the rbd image the space gets allocated on the osd. but when I then free the space in the rbd image, what happens to the space on the osd? will it be freed as well?
[23:40] <Tv> todin: the only way to "free up space on the rbd image" would be via TRIM; so once that feature is added, then yes
[23:41] <Tv> todin: that feature tends to be called TRIM by the SATA folk and "hole punching" by more software-oriented people (hole punching being the abstract operation, e.g. to be done on a file, which may or may not result in a SATA-level TRIM operation, or something like that)
[23:43] <todin> Tv: ok, I just know it as TRIM
[23:43] <Tv> todin: there are several interconnected issues.. the conversation in http://lwn.net/Articles/415889/ is a decent overview
[23:44] <darkfader> or thin reclaim ;)
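
The difference between thin provisioning and TRIM / hole punching discussed above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how rbd is implemented: the `ThinImage` class is hypothetical, the 4 MiB object size is an assumed default, and trimming is simplified to release only objects that are fully covered by the trimmed range.

```python
MIB = 1024 * 1024


class ThinImage:
    """Toy model of a thinly provisioned image striped over fixed-size objects."""

    def __init__(self, size, object_size=4 * MIB):  # object size is an assumption
        self.size = size
        self.object_size = object_size
        self.objects = set()  # indices of objects that currently hold data

    def _check(self, offset, length):
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("range outside image")

    def write(self, offset, length):
        """Allocate every object touched by the written byte range."""
        self._check(offset, length)
        if length == 0:
            return
        first = offset // self.object_size
        last = (offset + length - 1) // self.object_size
        self.objects.update(range(first, last + 1))

    def trim(self, offset, length):
        """Release objects fully covered by the range (simplified hole punch)."""
        self._check(offset, length)
        first = -(-offset // self.object_size)       # round start up to an object
        end = (offset + length) // self.object_size  # exclusive, rounded down
        self.objects.difference_update(range(first, end))

    def allocated(self):
        """Bytes backed by objects, as opposed to the advertised image size."""
        return len(self.objects) * self.object_size


image = ThinImage(1024 * MIB)
print(image.allocated() // MIB)  # nothing allocated at creation
image.write(0, 10 * MIB)
print(image.allocated() // MIB)  # space appears only once data is written
image.trim(0, 10 * MIB)
print(image.allocated() // MIB)  # without trim this number could never shrink
```

In this model, creating the image costs nothing and writes allocate space, which is the "already thinly provisioned" part; only the trim call can hand space back.
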
[23:58] * aliguori_ (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Read error: Operation timed out)
[23:58] * aliguori_ (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.