#ceph IRC Log


IRC Log for 2010-11-08

Timestamps are in GMT/BST.

[1:38] <p-static> hey, is it just me, or is the sample config file broken?
[1:38] <p-static> it specifies the osd journal as a file, and doesn't specify "osd journal size", so nothing works
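
The problem p-static describes (an "osd journal" entry with no matching "osd journal size") could be checked for with a small script. This is only a rough sketch: the parser, the function names, and the idea that [global] and [osd] settings are inherited by other sections are simplifying assumptions for illustration, not Ceph's own config handling.

```python
def parse_sections(text):
    """Very small ceph.conf-style parser (hypothetical helper, not Ceph's own)."""
    sections = {}
    current = None
    for raw in text.splitlines():
        # Drop ';' and '#' comments, then surrounding whitespace.
        line = raw.split(";", 1)[0].split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            # Treat "osd_journal" and "osd  journal" as the same key.
            key = " ".join(key.replace("_", " ").split())
            sections[current][key] = value.strip()
    return sections


def sections_missing_journal_size(text):
    """Return sections that set 'osd journal' without any 'osd journal size'.

    Assumption: a size set in [global] or [osd] counts for every section.
    """
    sections = parse_sections(text)
    inherited = {}
    for name in ("global", "osd"):
        inherited.update(sections.get(name, {}))
    missing = []
    for name, opts in sections.items():
        merged = {**inherited, **opts}
        if "osd journal" in opts and "osd journal size" not in merged:
            missing.append(name)
    return missing


# Illustrative config text only; paths and values are made up.
SAMPLE = """
[global]
    debug ms = 0
[osd]
    osd data = /data/osd$id
    osd journal = /data/osd$id/journal
"""

if __name__ == "__main__":
    print(sections_missing_journal_size(SAMPLE))
```

Appending an "osd journal size" line to the [osd] section of SAMPLE should make the returned list empty.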
[14:30] <jantje_> Hi
[14:31] <jantje_> I have another MON core dump
[14:31] * jantje_ is going to look for a kick-start-gdb guide :-)
[15:52] <jantje_> I have 1 mon, 1 mds and 2 osd's on each server
[15:52] <jantje_> should I be able to mount the ceph cluster using the second or third mds ?
[15:53] <jantje_> I can only mount using the first one, or are the other MDSs refusing connections until they detect the 'master' MDS is down?
[15:59] <sagewk> jantje_: you always mount against the monitor ips. the load balancing and handoff between mds's is handled internally
[16:01] <jantje_> oh i see, but I also have mon's on the other servers
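
sagewk's point (mount against the monitor addresses, not an MDS) can be sketched as a helper that builds the mount source string. This assumes the kernel client's "mon1,mon2,...:/path" form; the function names, the example addresses, and the mountpoint are placeholders, not taken from jantje_'s setup.

```python
def ceph_mount_source(mon_addrs, path="/"):
    """Join monitor addresses into a 'mon1,mon2:/path' mount source string."""
    if not mon_addrs:
        raise ValueError("at least one monitor address is required")
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    return ",".join(mon_addrs) + ":" + path


def mount_command(mon_addrs, mountpoint, path="/"):
    """Build (but do not run) an argv list for mounting with the ceph fs type."""
    return ["mount", "-t", "ceph", ceph_mount_source(mon_addrs, path), mountpoint]


if __name__ == "__main__":
    # Placeholder monitor addresses, one per server running a mon.
    mons = ["192.0.2.1:6789", "192.0.2.2:6789", "192.0.2.3:6789"]
    print(" ".join(mount_command(mons, "/mnt/ceph")))
```

Listing every monitor means the mount does not depend on any single one of them being reachable; no MDS address appears anywhere.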
[16:02] <sagewk> jantje_ is the mon crash reproducible? would be nice to have the full mon logs, 'debug mon = 20'
[16:05] <jantje_> somehow it just started after trying some more
[16:06] <jantje_> but I was doing a cvs checkout from our source tree, and I think all my osd's just died, i'll have to take a closer look
[16:07] <sagewk> btw you might want to switch to the 'rc' branch (soon to be v0.23). unstable just got a bunch of untested code merged in.
[16:10] <jantje_> #0 0x0000000000575de6 in ceph::buffer::ptr::release (this=0x0, bl=...) at ./include/buffer.h:428
[16:10] <jantje_> #1 ~ptr (this=0x0, bl=...) at ./include/buffer.h:387
[16:10] <jantje_> #2 FileJournal::do_write (this=0x0, bl=...) at os/FileJournal.cc:620
[16:10] <jantje_> (something from last week...)
[16:10] <jantje_> not so relevant I guess
[16:17] <sagewk> was it during osd shutdown or something? i wouldn't expect this=NULL unless it's a thread teardown problem
[16:18] <jantje_> can't tell, I even might have used the wrong binary with gdb
[16:33] <jantje_> you don't know by any chance how to send the sysrq-b key combo to a serial console connected on a terminal server?
[16:33] <jantje_> :)
[16:33] <sagewk> it's the break character
[16:33] <sagewk> not sure what the ascii code for it is. on our terminal servers it's control-z b
[18:53] <jantje_> sagewk: I reproduced it by .. euhm .. doing nothing :-)
[18:53] <jantje_> I got the logs, uploading them right now
[18:54] <jantje_> http://jan.sin.khk.be/bug.tar.gz
[18:55] <sagewk> is this the mon or filestore crash?
[19:14] <wido> I'm seeing a btrfs bug with 2.6.37, when I run cosd -i 8 -c /etc/ceph/ceph.conf --mkfs --mkjournal --monmap monmap, the machine gets a kernel panic
[19:14] <wido> got remote syslog set-up, but nothing gets logged
[19:14] <wido> anyone seen this before?
[19:17] <wido> i've got something else too, when running "ceph -w" I suddenly get old loglines being printed on my screen, I guess a few thousand
[19:18] <cmccabe> loglines that you've seen before you mean?
[19:18] <wido> yes, old logs from this afternoon
[19:19] <wido> but they get a recent timestamp
[19:19] <yehudasa> wido: 2.6.37? you mean rc1?
[19:19] <wido> yes, rc1 indeed
[19:20] <wido> hmm, I can simply invoke it by doing: touch foo; sync; rm foo
[19:20] <yehudasa> there were a few btrfs issues that we've seen but the fixes were supposed to get pushed to -rc1
[19:20] <wido> tried a fresh mkfs, even overwrote the partition with zeros, didn't help either
[19:20] <wido> pretty weird, same kernel is running fine on other machines
[19:44] <jantje_> sagewk: MON
[23:45] <jantje_> sagewk: I hope the logs could be of any help
[23:50] <jantje_> I have to run some benchmarks tomorrow
[23:50] <jantje_> probably best not to use the unstable branch
[23:50] <jantje_> so rc branch, right?
[23:51] <jantje_> unless there are some cool things in the unstable branch that can speed up things significantly?
