[0:07] <sagewk> sjust, joshd: i looked at one of the heartbeat_map errors yesterday and it looks like a deadlock somewhere, but i didn't have the binaries to go with the coredump
[0:08] <joshd> sagewk: #1635 happened today, and is probably the same bug
[0:08] <sagewk> yeah
[0:08] <sagewk> is there a full log to go with it?
[0:09] <joshd> not with debugging
[0:15] <df__> ohdear
[0:15] <sagewk> uhoh
[0:15] <df__> 7 client machines have died in the last couple of hours
[0:15] <df__> will restart with for-linus and see
[0:19] <df__> http://pastebin.com/zum8Dd6k
[0:22] <Tv> grep -qE '^[^ ]+ (\([^)]\))? start/'
[0:22] <Tv> i hate regexps
[0:24] <Tv> even more so because that was buggy
[0:27] <sagewk> df__ meh pastebin is overloaded
[0:29] <df__> 1sec
[0:33] <df__> ftp://ftp.kw.bbc.co.uk/davidf/priv/ceph/eikailei
[1:02] <df__> gregaf, remember my "mismatch between child accounted_rstats and my rstats!" issue from the other day? is there any method to run an fsck to clean it up?
[1:02] <df__> (a) i'm getting a lot of errors being reported in the log
[1:02] <gregaf> df__: haven't really gotten a chance to dive into it yet, but there's not an fsck yet :/
[1:03] <gregaf> unless sagewk has done something without telling anybody or has a clever idea
[1:07] <df__> it might also be the cause of some 16ExbiByte directories (or more accurately (uint64_t)(-N), where N is some number < 1000)
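A minimal sketch of the arithmetic df__ describes, assuming the directory size is kept in an unsigned 64-bit field and that a small negative count was stored into it. The helper name `as_uint64` is made up for illustration and is not part of Ceph.

```python
# Sketch: how a small negative count looks once it lands in an unsigned 64-bit field.
MASK64 = (1 << 64) - 1   # all 64 bits set
EXBIBYTE = 1 << 60       # 1 EiB in bytes


def as_uint64(value: int) -> int:
    """Reinterpret an integer the way (uint64_t)(value) would in C (hypothetical helper)."""
    return value & MASK64


for n in (1, 500, 999):
    wrapped = as_uint64(-n)
    # Every N below 1000 lands just under 2**64 bytes, which displays as roughly 16 EiB.
    print(f"(uint64_t)(-{n}) = {wrapped} bytes, about {wrapped / EXBIBYTE:.1f} EiB")
```

This only illustrates why an underflowed counter would show up as a directory of about 16 EiB; it says nothing about where the rstat accounting actually goes wrong.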
[2:28] <ajm> does anyone know if I can go from 0.34 -> 0.37 or if I need to go to 0.35 first to do the PG format changes.
[2:34] <joshd> it should upgrade just fine
[11:18] * mgalkiewicz (~maciej.ga@staticline58722.toya.net.pl) has joined #ceph
[11:18] <mgalkiewicz> hello
[11:19] <mgalkiewicz> I have a problem with OSD. After start it crashes with error http://pastie.org/2728756
[12:02] <NaioN> mgalkiewicz: I've disabled snap support for BTRFS and it looks like it's a lot more stable
[12:03] <NaioN> mgalkiewicz: filestore btrfs snap = 0 under your osd section of the ceph.conf
[12:04] <NaioN> do you have a journal device?
[12:05] <mgalkiewicz> hmm not sure how I can check this
[12:07] <mgalkiewicz> hmm this line in config file helped
[12:08] <mgalkiewicz> however ceph -s still reports that there are no osds running
[12:11] <NaioN> 2011-10-20 11:13:27.629208 7f91c8c9a720 filestore(/srv/ceph/osd.0) mount: enabling PARALLEL journal mode: btrfs, SNAP_CREATE_V2
[12:11] <NaioN> from your paste
[12:11] <NaioN> if you disable snap you will see there WRITEAHEAD
[12:12] <NaioN> 2011-10-19 16:28:22.147470 7f302b1c0720 filestore(/data/osd0) mount: enabling WRITEAHEAD journal mode: 'filestore btrfs snap' mode is not enabled
[12:12] <NaioN> besides there is a bug in the linux kernel in BTRFS
[12:13] <NaioN> [62222.860368] WARNING: at fs/btrfs/inode.c:2194 btrfs_orphan_commit_root+0xa8/0xc0 [btrfs]()
[12:13] <NaioN> from the dmesg
[12:13] <NaioN> and that one we hope is solved in 3.1-rc9 upwards
[13:02] <mgalkiewicz> do you have any idea how to work around this without updating the kernel to 3.1?
[13:04] <mgalkiewicz> I understand that replacing btrfs with other fs is one possibility?
[13:28] <NaioN> mgalkiewicz: hmmm think it's something else
[13:30] <mgalkiewicz> the last listing was from machine with kernel 3.0
[13:31] <mgalkiewicz> when I tried the same on kernel 2.6.32 I got a similar issue even though I disabled snapshots
[13:32] <mgalkiewicz> I guess that 2.6.32 is not enough?
[13:33] <NaioN> the thing with the kernel is for another thing: WARNING: at fs/btrfs/inode.c:2194 btrfs_orphan_commit_root+0xa8/0xc0
[13:33] <NaioN> i hope
[15:43] <chaos__> what are the ways to monitor a ceph cluster (i need something for munin)? there is ceph -w on the server side, but it would be nice to have more (for example connected clients). what can i do on the client side? is there any way to detect that the cluster went down or a client lost its connection (besides a hanging ls <mount_point>)?
[17:02] * adrianoc (~adrianoc@189-112-095-060.static.ctbctelecom.com.br) has joined #ceph
[17:32] <ajm> chaos__: "ceph health" will give you an overall "OK/NOK" result
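A rough sketch of how the "ceph health" suggestion could feed a monitoring check. It assumes the `ceph` CLI is on the PATH and that its output starts with a status word such as HEALTH_OK; that output format is an assumption here and should be checked against the installed version. The function names are illustrative only.

```python
import subprocess
from typing import Tuple


def parse_health(output: str) -> Tuple[str, str]:
    """Split health output into (status word, remaining detail text)."""
    status, _, detail = output.strip().partition(" ")
    return status, detail


def cluster_is_healthy(timeout_s: float = 10.0) -> bool:
    """Run `ceph health` and report True only for an OK status.

    Assumption: an unreachable cluster makes the command fail or hang,
    so a timeout or a missing binary is treated as "not healthy".
    """
    try:
        result = subprocess.run(
            ["ceph", "health"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    status, _ = parse_health(result.stdout)
    # Assumed status word; adjust if your version prints something else.
    return result.returncode == 0 and status == "HEALTH_OK"
```

A munin plugin could call `cluster_is_healthy()` and print a 1 or 0 value for it. The timeout matters because a check that hangs like `ls <mount_point>` is exactly what chaos__ wants to avoid.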
[18:06] <ajm> http://pastebin.com/kWck49q7 anyone seen anything like this doing the upgrades for the disk format changes?
[18:43] <pmjdebruijn> mgalkiewicz: 2.6.32 is very old, especially regarding btrfs
[18:43] <pmjdebruijn> NaioN: good evening :D
[18:55] <joshd> chaos__: we have a collectd plugin (https://github.com/NewDreamNetwork/collectd) you can use as a starting point for munin
[18:56] * jojy (~jojyvargh@ has joined #ceph
[18:58] <sjust> ajm: what is currently in the __temp directory on that osd?
[19:13] <ajm> sjust: /data/osd.5/current/_temp/10000ae7aa4.0000008f_head
[19:22] <NaioN> i'm still seeing these messages in dmesg: http://pastebin.com/3XeM0q4m
[19:22] <NaioN> warnings about fs/btrfs/inode.c
[19:22] <NaioN> i hoped they would be fixed in the latest rc
[19:23] <NaioN> anybody know what they are about?
[19:28] <Tv> NaioN: looks like a btrfs bug again
[19:28] <NaioN> yes looks like it
[19:29] <NaioN> but i hoped they fixed that one in the latest rc
[19:29] <NaioN> we are running rc10 at the moment
[19:31] <NaioN> http://marc.info/?l=linux-btrfs&m=131547325515336&w=2
[19:32] <NaioN> well we could apply that patch ourselves....
[19:32] <NaioN> pmjdebruijn: work to do in the morning ;)
[19:32] <sjust> ajm: does that object exist in 0.3bd_head also?
[19:35] <pmjdebruijn> NaioN: ah patch included... I like that
[19:36] <NaioN> pmjdebruijn: first check if it's already included in rc10
[19:36] <pmjdebruijn> NaioN: 'patch' automatically warns if a patch already has been applied
[19:36] <NaioN> k
[19:36] <pmjdebruijn> assuming there have been no other changes in the same code area
[19:36] <NaioN> do you have the tree on builder01?
[19:37] <pmjdebruijn> it can't do magic :D
[19:37] <pmjdebruijn> NaioN: yep
[20:01] <ajm> sjust: does not
[20:02] <sjust> that's weird. You can get the conversion back on track by copying that object from __temp to 0.3bd_head, removing __temp, and restarting the OSD
[20:02] <ajm> ok
[20:03] <sjust> if you could kick the filestore debug level up to 25, we should see what caused the error if it happens again
[20:03] <ajm> ok
[20:03] <sjust> wait
[20:03] <sjust> actually
[20:03] <sjust> ugh
[20:03] <ajm> the file format in 0.3bd_head looks a bit different though, not sure where to put that file
[20:03] <sjust> yeah, I just remembered that
[20:03] <sjust> hang on
[20:07] <sjust> try find . -name '*10000ae7aa4.0000008f*' in that directory
[20:08] <ajm> 0
[20:09] <ajm> the only 10000ae7aa4.0000008f on that whole osd is in the _temp
[20:13] <sjust> the actual filename would be a bit different: <name>_<locator>_<snap>_<hash> where name is 10000ae7aa4.0000008f, snap is head, locator is empty string, and hash is the hash value
[20:14] <ajm> hrm, how can i find out the hash value?
[20:14] <sjust> ajm: one sec
[20:15] <ajm> ok and dirs location I see is the reverse hash 1 dir per char
[20:15] <sjust> right, so we also need the hash to find the correct subdir
[20:17] <sjust> can you give me a hex dump of the user.ceph._ attribute?
[20:20] <sjust> ajm: actually, I can get it from the name, one sec
[20:45] <sjust> ajm: ok, the hash should be 748F2FBD
[20:46] <sjust> the complete filename should be 10000ae7aa4.0000008f__head_748F2FBD
[20:47] <sjust> I
[20:47] <sjust> I'll be back in a few minutes
[20:47] <sjust> you should find other objects with similar hash values in that collection
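The naming scheme sjust spells out can be sketched as below, using only what is said in the log: the `<name>_<locator>_<snap>_<hash>` layout and ajm's observation that subdirectories follow the reversed hash, one character per level. Both function names are hypothetical, and any escaping of special characters or the exact directory names used on disk are not covered.

```python
from typing import List


def object_filename(name: str, snap: str, hash_hex: str, locator: str = "") -> str:
    """Join the four fields with underscores: <name>_<locator>_<snap>_<hash>."""
    return f"{name}_{locator}_{snap}_{hash_hex}"


def reversed_hash_chars(hash_hex: str) -> List[str]:
    """Hash characters in reverse order, one per directory level (per ajm's observation)."""
    return list(reversed(hash_hex))


# Values taken from the conversation: empty locator, snap "head", hash 748F2FBD.
filename = object_filename("10000ae7aa4.0000008f", "head", "748F2FBD")
print(filename)
print(reversed_hash_chars("748F2FBD"))
```

With an empty locator the two underscores end up adjacent, which is why the complete filename contains `__head`.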
[20:54] <ajm> sjust: starting up again
[20:55] <ajm> sjust: yay progress
[20:55] <ajm> that's really weird that that happened
[21:32] <wido> hi
[21:33] <wido> I'm playing a bit with librados (C), but I'm a bit confused
[21:33] <wido> isn't rados_append the same as rados_write with an offset?
[21:33] <wido> just an easier way to append to an object instead of doing a rados_stat first for finding the object size?
[21:34] <Tv> wido: i think the _stat way implies a race condition
[21:35] <wido> It depends doesn't it? But you could accomplish the same with a rados_write with an offset, correct?
[21:35] <Tv> wido: do you know the difference between open(2) flag O_APPEND vs no flag?
[21:36] <Tv> wido: i don't know but that sounds like a likely explanation for the difference
[21:37] <wido> Uh Tv not really thát deep into low level stuff
[21:37] <Tv> wido: imagine two processes doing that _stat & _write dance
[21:38] <Tv> wido: A: _stat -> 123 bytes, B: _stat -> 123 bytes, A: write at offset 123, B: write at offset 123 == FAIL
[21:38] <wido> yeah, get your point :)
[21:39] <wido> fair enough, well, that explains
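The interleaving Tv describes can be replayed with a small in-memory stand-in. This is not librados: `FakeObject` and its methods are invented for illustration, and the sketch assumes an append is applied atomically, as Tv's O_APPEND analogy suggests.

```python
class FakeObject:
    """In-memory stand-in for a stored object (hypothetical, not the librados API)."""

    def __init__(self) -> None:
        self.data = bytearray()

    def stat(self) -> int:
        """Return the current object size in bytes."""
        return len(self.data)

    def write(self, buf: bytes, offset: int) -> None:
        """Write buf at offset, zero-filling any gap, overwriting what is there."""
        end = offset + len(buf)
        if len(self.data) < end:
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = buf

    def append(self, buf: bytes) -> None:
        """Add buf at the current end in a single step."""
        self.data.extend(buf)


def racy_stat_then_write() -> bytes:
    obj = FakeObject()
    obj.append(b"base:")
    size_a = obj.stat()          # client A sees the current size
    size_b = obj.stat()          # client B sees the same size
    obj.write(b"AAAA", size_a)   # A writes at that offset
    obj.write(b"BBBB", size_b)   # B writes at the same offset and clobbers A
    return bytes(obj.data)


def atomic_appends() -> bytes:
    obj = FakeObject()
    obj.append(b"base:")
    obj.append(b"AAAA")          # each append lands after whatever is already there
    obj.append(b"BBBB")
    return bytes(obj.data)


print(racy_stat_then_write())
print(atomic_appends())
```

In the stat-then-write version client A's data is lost, while both appends survive. A single client that knows nobody else writes to the object could still use a write at a known offset, which is the case wido has in mind.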
[21:39] <wido> btw, the new rbd_writeback really improves the performance, VM's are much more responsive
[22:16] * bencherian (~bencheria@aon.hq.newdream.net) Quit (Read error: Connection reset by peer)
[22:55] * mgalkiewicz (~maciej.ga@staticline58722.toya.net.pl) Quit (Ping timeout: 480 seconds)
[23:04] <Tv> heh, crowbar outputs "Compressing indicies".. that's, like, an order of magnitude more plural
[23:16] <yehudasa> bencherian: anything we can help you with?
[23:16] <gregaf> not right now — he's at the white boards :p
[23:16] <yehudasa> oh..
[23:29] <sagewk> anyone care to look at https://github.com/NewDreamNetwork/ceph/blob/wip-prior/src/osd/PG.cc#L4828 and see if it makes more sense than on monday?
[23:31] <joshd> sagewk: what about not passing in the PG?
