#ceph IRC Log


IRC Log for 2010-10-05

Timestamps are in GMT/BST.

[0:07] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[1:01] * cmccabe1 (~cmccabe@dsl081-243-128.sfo1.dsl.speakeasy.net) Quit (Remote host closed the connection)
[1:13] * cmccabe1 (~cmccabe@dsl081-243-128.sfo1.dsl.speakeasy.net) has joined #ceph
[2:26] * fzylogic (~fzylogic@dsl081-243-128.sfo1.dsl.speakeasy.net) Quit (Quit: DreamHost Web Hosting http://www.dreamhost.com)
[2:42] * cmccabe1 (~cmccabe@dsl081-243-128.sfo1.dsl.speakeasy.net) has left #ceph
[5:06] * deksai (~deksai@96-35-100-192.dhcp.bycy.mi.charter.com) Quit (Ping timeout: 480 seconds)
[5:29] * oldirty84 (~quassel@adsl-66-231-6.asm.bellsouth.net) Quit (Remote host closed the connection)
[5:30] * oldirty84 (~quassel@adsl-66-231-6.asm.bellsouth.net) has joined #ceph
[5:33] * oldirty84 is now known as oldirty|a
[6:03] * sentinel_e86 (~sentinel_@ Quit (Remote host closed the connection)
[6:06] * sentinel_e86 (~sentinel_@ has joined #ceph
[7:16] * leander_yu (dad320c2@ircip1.mibbit.com) has joined #ceph
[7:18] <leander_yu> Hi all, I have an osd process running but the map marks it as down.
[7:19] <leander_yu> I checked the network and it seems ok, since I can ssh to the machine and I saw the cosd process running
[7:20] <leander_yu> but the log shows a lot of pipe faults like >> pipe(0x7f7b680e2620 sd=-1 pgs=437 cs=1 l=0).fault with nothing to send, going to standby
[7:20] <leander_yu> any idea?
[8:34] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[9:06] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:40] * leander_yu (dad320c2@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[9:52] * f4m8_ is now known as f4m8
[10:06] * Yoric (~David@ has joined #ceph
[10:22] * allsystemsarego (~allsystem@ has joined #ceph
[10:36] * darkfader (~floh@host-82-135-62-109.customer.m-online.net) has joined #ceph
[12:50] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[12:52] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[13:06] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[13:08] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[14:12] * yuravk (~yura@ext.vps.lviv.ua) has joined #ceph
[14:12] <yuravk> Hi, is anybody here ?
[14:14] * Yoric_ (~David@ has joined #ceph
[14:14] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[14:14] * Yoric_ is now known as Yoric
[15:50] * Yoric_ (~David@ has joined #ceph
[15:51] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[15:51] * Yoric_ is now known as Yoric
[15:58] * deksai (~deksai@ has joined #ceph
[16:08] * f4m8 is now known as f4m8_
[17:07] * deksai (~deksai@ Quit (Ping timeout: 480 seconds)
[17:36] <sage> gregaf, yehudasa: working from home today
[17:53] * deksai (~deksai@dsl093-003-018.det1.dsl.speakeasy.net) has joined #ceph
[18:13] * darktim (~andre@pcandre.nine.ch) Quit (Quit: Verlassend)
[18:14] * andret (~andre@pcandre.nine.ch) has joined #ceph
[18:50] * Yoric (~David@ Quit (Quit: Yoric)
[18:58] <sage> wido: in the future, can you also copy /usr/lib/debug/usr/bin/cosd (or whatever) when collecting the cores and logs?
[18:58] <sage> also it looks like node07 is down?
[18:58] <gregaf> hi yuravk, we're here now :)
[18:58] <sage> -a
[19:32] <sage> gregaf: i see cephfs.cc, but it's not in the Makefile.am yet?
[19:32] <gregaf> oh, duh
[19:32] <sage> just checking :)
[19:32] <gregaf> I was just building it manually
[19:34] <gregaf> you can fix it or I'll get it later — discovered a problem with the uclient handling dir layouts
[19:35] <sage> k
[19:37] <yehudasa> sage: maybe we should have a script that collects all the relevant info, and tars it
[19:37] <yehudasa> so that when we have someone complaining we can just tell him to run this script and send us the result
[19:40] <sage> yeah
[19:40] <sage> it could use cconf to find the log dir too. the hard part is finding the core file...
[19:40] <yehudasa> right
[19:41] <yehudasa> can search in a number of places
[19:41] <sage> we should probably add something to init-ceph to (optionally) ulimit -c unlimited before startup
[19:41] <yehudasa> it can depend on some cconf parameter
[19:41] <yehudasa> can put it under [init] section
[19:42] <sage> or under the usual sections, so you can (say) only capture core dumps for osd but not mds
[19:42] <yehudasa> yeah
[19:45] <yehudasa> hmm.. there's this relatively new core mechanism in linux where you can specify a program that any core dump gets piped into
[19:47] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[19:47] <yehudasa> so theoretically we can pipe the core into some program that creates that tarball for us with the core, and puts it in some dump directory, specified by ceph.conf
[19:50] <sage> fancy!
[19:51] <sage> we should start with just finding the regular core files though, since most people won't have that functionality for a while
[19:52] <sage> it should maybe check ./core* and /core* (unless there's a tunable specifying core location?) and see if any of the files matches a ceph binary
[19:53] <yehudasa> I agree.. though it's there since 2.6.19, so I'm not sure about most people
[19:53] <sage> hmm. is it something you can specify on a per-process basis, or is it a kernel global (like core_pattern or whatever)?
[19:53] <yehudasa> global
[19:54] <yehudasa> however, the ulimit is not global, so usually most processes will not hit it
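The collection idea discussed above (raise the core limit at startup, the way `ulimit -c unlimited` would, then search the conventional locations for core files) can be sketched in Python. This is only an illustration of the approach, not Ceph's actual tooling; `raise_core_limit` and `find_cores` are hypothetical helper names, and the search paths come straight from sage's `./core*` and `/core*` suggestion.

```python
import glob
import os
import resource


def raise_core_limit():
    """Rough equivalent of `ulimit -c unlimited` before daemon startup:
    raise the soft RLIMIT_CORE up to whatever the hard limit allows."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def find_cores(search_dirs=(".", "/")):
    """Best-effort search for core files in the usual places.

    As noted in the discussion, the location (and naming) can be
    changed globally via /proc/sys/kernel/core_pattern, so this
    can only ever be a heuristic.
    """
    found = []
    for d in search_dirs:
        found.extend(glob.glob(os.path.join(d, "core*")))
    return found
```

A real collection script would then match each candidate against the ceph binaries and tar it up together with the logs (found via cconf), as yehudasa describes.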
[20:04] * idletask (~fg@AOrleans-553-1-62-30.w92-152.abo.wanadoo.fr) has joined #ceph
[20:04] <idletask> Good evening
[20:06] <gregaf> hi idletask
[20:07] <idletask> Hello gregaf
[20:09] <sage> yehudasa: can you look at http://tracker.newdream.net/issues/446 ? that will also save a lot of debugging time
[20:18] <yehudasa> yeah
[20:20] <yehudasa> sage: aren't we doing this already?
[20:20] <sage> not for core. only assert failures show a stack trace.
[20:21] <sage> and actually, the assert code currently induces a core by deliberately segfaulting.. that will probably need to be done a bit differently
[20:54] * cmccabe1 (~cmccabe@dsl081-243-128.sfo1.dsl.speakeasy.net) has joined #ceph
[20:54] <yehudasa> sage: the assertion code currently raises an implicit SIGABRT by throwing an exception, not SIGSEGV
[20:56] <yehudasa> so either we catch both signals, and disable the backtrace dumping in the assertion code, or just catch the sigsegv, and keep the current assert behavior
[20:57] <yehudasa> we can also just dump the backtrace twice, once in the assertion, and once in the signal handler
[20:57] <yehudasa> so that if the signal handler doesn't behave we'll still have the backtrace
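The belt-and-suspenders idea above (dump the backtrace in the assert path itself, and again in the fatal-signal handler, so a misbehaving handler still leaves one trace behind) can be sketched in Python. The real code under discussion is Ceph's C++ assert/signal machinery; `ceph_assert`, `dump_backtrace`, and `handle_fatal` here are hypothetical names used only to illustrate the shape.

```python
import os
import signal
import sys
import traceback


def dump_backtrace(reason):
    """Print the current stack; called from both paths so that a
    misbehaving signal handler still leaves one trace behind."""
    sys.stderr.write("*** backtrace (%s) ***\n" % reason)
    traceback.print_stack(file=sys.stderr)


def ceph_assert(cond, msg=""):
    # Assert path: dump the trace ourselves first, then abort.
    # The signal handler below may print a second copy.
    if not cond:
        dump_backtrace("assert failed: " + msg)
        os.kill(os.getpid(), signal.SIGABRT)


def handle_fatal(signum, frame):
    # Signal path: dump the trace, then restore the default
    # disposition and re-raise so a core is still produced.
    dump_backtrace("fatal signal %d" % signum)
    signal.signal(signum, signal.SIG_DFL)
    os.kill(os.getpid(), signum)


for sig in (signal.SIGSEGV, signal.SIGABRT):
    signal.signal(sig, handle_fatal)
```

Dumping twice is slightly noisy but cheap insurance: if the handler itself faults, the assert-path trace has already hit the log.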
[20:57] <cmccabe1> hi yehuda
[20:58] <cmccabe1> and all
[20:58] <yehudasa> hey!
[20:58] <cmccabe1> it would be nice to see a message like "died on signal FOO" when that occurs, I assume that's what you're talking about?
[20:59] <yehudasa> issue #446
[20:59] <yehudasa> we'd like to see a full backtrace when we die
[21:00] <cmccabe1> ic
[21:07] <sage> i would tend toward printing the trace in the assert code just in case something goes wrong with the signal handler
[21:09] <yehudasa> yeah, I think so too
[21:09] <yehudasa> so I'm closing this issue
[21:09] <cmccabe1> well
[21:09] <cmccabe1> I actually would like very much to implement something simple that just tells you what signal you died on... and maybe process/thread ID
[21:10] <cmccabe1> I've written similar handlers in the past
[21:10] <cmccabe1> although usually I wrote to syslog
[21:11] <cmccabe1> sysadmins tend to watch syslog
[21:12] <yehudasa> cmccabe1: we're already catching the signal now and dumping a backtrace
[21:12] <cmccabe1> ok, cool
[21:15] <cmccabe1> one thing you should do though
[21:15] <cmccabe1> disable the signal handler for SIGSEGV inside sigsegv_handler()!
[21:16] <cmccabe1> infinite recursion can really make things confusing... and prevent you from dumping core, if I remember correctly.
[21:18] <cmccabe1> actually, looking at it more closely
[21:18] <cmccabe1> once the signal handler is invoked, the signal action gets set back to SIG_DFL, so never mind
[21:18] <yehudasa> yeah.. linux uses the bsd semantics for that
[21:19] <yehudasa> oh, or is it the other way around?
[21:20] <cmccabe1> from man signal:
[21:20] <cmccabe1> > BSD improved on this situation by changing the semantics of signal
[21:20] <cmccabe1> > handling (but, unfortunately, silently changed the semantics when
[21:20] <cmccabe1> > establishing a handler with signal()). On BSD, when a signal handler is
[21:20] <cmccabe1> > invoked, the signal disposition is not reset, and further instances of
[21:20] <cmccabe1> > the signal are blocked from being delivered while the handler is
[21:20] <cmccabe1> > executing.
[21:20] <cmccabe1> so BSD people would probably want us to use sigaction instead since it's more portable. But it doesn't matter *that* much to us I guess.
[21:21] <yehudasa> no, not really
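The disposition-reset behavior being quoted from signal(7) can be demonstrated from Python, whose `signal.signal()` is implemented on top of `sigaction()` and therefore shows the BSD-style semantics: the handler is *not* reset to SIG_DFL after delivery, so it fires for every occurrence. This is a standalone demonstration, nothing Ceph-specific:

```python
import os
import signal

calls = []


def handler(signum, frame):
    calls.append(signum)


# sigaction()-based registration: the disposition survives delivery,
# so the second SIGUSR1 also reaches our handler instead of SIG_DFL.
signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)
os.kill(os.getpid(), signal.SIGUSR1)
```

With the historical System V `signal()` semantics, the second delivery would have hit the default action (terminating the process for SIGUSR1), which is exactly why sigaction is the portable choice.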
[21:22] <darkfader> bsd would mean I could put ceph on zfs and that would mean i could use the l2arc volumes as cache ;)
[21:22] <cmccabe1> does FUSE exist on solaris?
[21:22] <darkfader> but that would be so high a pile of software that i'd rather run
[21:23] <darkfader> cmccabe1: seems it's working but dead: http://hub.opensolaris.org/bin/view/Project+fuse/WebHome
[21:42] <sage> cmccabe1, yehudasa: i'm seeing that bdi warning on the unstable branch: http://tracker.newdream.net/issues/464
[21:42] <cmccabe1> yeah, you will see it until Jan's patch gets applied
[21:43] <cmccabe1> the behavior in head-of-line is incorrect... it uses the default backing device for ceph
[21:43] <sage> which one is that?
[21:43] <cmccabe1> 1sec... let me find it
[21:43] <idletask> Grr, mkcephfs doesn't complete :(
[21:43] <cmccabe1> we might want to cherry-pick Jan's fix
[21:43] <cmccabe1> for the client repo
[21:44] <sage> if it's on its way upstream it can probably wait.
[21:44] <sage> nm then :)
[21:44] <sage> idletask: ssh problems?
[21:45] <idletask> sage: no, I can ssh in with no problems...
[21:45] <cmccabe1> Message-ID: <20100929081936.GA23322@lst.de>
[21:46] <cmccabe1> aka http://www.spinics.net/lists/linux-btrfs/msg06303.html
[21:46] <cmccabe1> so basically Jan wrote a patch, and then Christoph wrote a patch, and it looks like Christoph's is the one going in.
[21:46] <cmccabe1> but either one will fix the problem for ceph.ko
[21:47] <sage> sounds good
[21:49] <idletask> On the "remote" host, I see three sessions opened in a row, but mkbtrfs isn't run on it at all
[21:49] <idletask> I don't use cephx
[21:49] <sage> you can add -v to mkcephfs to see what it's doing
[21:51] <idletask> Can it be due to the fact that /var/log/ceph doesn't exist? It complains loudly about it on the initiating machine
[21:52] <idletask> I also installed with prefix=/opt/ceph fwiw
[21:54] <idletask> Oh
[21:54] <sage> that should make a lot of noise but not actually cause anything to fail
[21:55] <idletask> "max osd in /opt/ceph/etc/ceph/ceph.conf is 1, num osd is 2"
[21:55] <idletask> It still connects via ssh to the second machine though...
[21:55] <idletask> Strange
[21:55] <sage> idletask: that's normal. osd0 and osd1 => max (osd id #) is 1, num is 2
[21:56] <idletask> OK, so the problem is elsewhere
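sage's point is that the "max osd" in the message is the highest osd *id* (ids start at 0), while "num osd" is the *count*, so the two numbers differing by one is expected. A trivial illustration of the counting (the function name is made up for this sketch):

```python
def osd_stats(osd_ids):
    """max osd = highest id in the config; num osd = how many there are.
    With zero-based ids these always differ by one for a contiguous set."""
    return max(osd_ids), len(osd_ids)


# osd0 and osd1, as in the ceph.conf being discussed:
max_id, num = osd_stats([0, 1])
```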
[21:59] <idletask> Does the remote machine try to connect at all to the monitor on the first machine by any chance?
[21:59] <sage> not during mkcephfs
[21:59] <idletask> OK, so that does not explain the hang
[21:59] <sage> which command hangs?
[22:00] <idletask> mkcephfs... It hangs when trying to do anything to the second host
[22:00] <idletask> http://paste.pocoo.org/show/271477/ <-- that's my configuration file
[22:01] <sage> which command hangs? the cosd --mkfs one?
[22:01] <sage> btrfs devs = /dev/data/ceph
[22:01] <sage> looks suspect.. that should be a block device
[22:01] <idletask> Yes it is, I use LVM on erwin
[22:01] <idletask> data is my VG name
[22:02] <sage> ah
[22:02] <idletask> The command hanging is /opt/ceph/sbin/mkcephfs --mkbtrfs --allhosts --conf /opt/ceph/etc/ceph/ceph.conf --clobber
[22:07] <idletask> I see three SSH connections to erwin from carmen, followed by three immediate disconnections
[22:07] <idletask> mkbtrfs is just not issued at all
[22:13] <yehudasa> sage: shouldn't we reset the config options on cconf according to the defaults from config.cc?
[22:18] <idletask> OK, in fact, the thing is, mkcephfs runs cosd on the localhost and stays in the foreground
[22:19] <idletask> But why didn't it do anything on the second host then? :(
[22:34] <sage> yehudasa: probably.. does it do that now?
[22:34] <yehudasa> nope
[22:34] <yehudasa> fixing it
[22:34] <sage> ok. how did you notice?
[22:34] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[22:34] <yehudasa> was trying to get default log dir via cconf
[22:35] <sage> oh right
[22:35] <sage> yeah
[22:47] <idletask> So, mkcephfs makes the filesystem on the first node, doesn't make it on the second, and ends up launching cosd on the first node, in the foreground
[22:47] <idletask> Is this expected?
[22:47] <sage> mkcephfs runs cosd --mkfs to initialize the local object store. it should do it on every node, though, if you specify -a (--allhosts). when you run -v, does it try to launch on the other nodes?
[22:47] <sage> can you post your mkcephfs -v output?
[22:48] <idletask> Yes, here --> http://paste.pocoo.org/show/271501/
[22:48] <idletask> I'm studying it as well
[22:50] <idletask> I see no osd.1 related section...
[22:50] <sage> yeah.. weird
[22:51] <idletask> And as I said, I do see three SSH connections in very fast succession on erwin from carmen
[22:51] <sage> oh, is that last bit where it runs cosd returning an error code?
[22:51] <sage> it stops if it encounters an error
[22:51] <idletask> No, it just runs it and stays there
[22:52] <sage> oh, that cosd hangs?
[22:52] <idletask> Well, no, I've just seen - it eats 100% CPU
[22:53] <idletask> That's user CPU
[22:53] <sage> you can try running just that command, and add '-D --debug-osd 10 --debug-filestore 10' to see what it's doing
[22:56] <idletask> OK, that's the output --> http://paste.pocoo.org/show/271506/
[22:56] <idletask> And it stays there, same symptoms - 100% user CPU eaten
[22:59] <idletask> Uhm
[22:59] <idletask> Can the fact that the journal file is on the data store be the problem?
[23:00] <sage> i doubt it. hmm, can you attach to it with gdb and get a backtrace?
[23:02] <idletask> Argh, I need to emerge gdb first
[23:08] <idletask> OK, emerged
[23:11] <idletask> I need to recompile, it seems, I have a lot of "value optimized out" in arguments
[23:11] <idletask> I'll do this tomorrow, right now I need sleep
[23:12] <idletask> See you!
[23:12] <sage> just the function trace might be helpful
[23:12] * idletask (~fg@AOrleans-553-1-62-30.w92-152.abo.wanadoo.fr) Quit (Quit: .)
[23:30] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[23:34] <cmccabe1> hi all
[23:34] <cmccabe1> I'm trying to set up ceph on the sepia cluster
[23:37] <wido> sage: I forgot, the binaries get stripped....
[23:37] <wido> node07 seems down indeed, i'll check it out tomorrow
[23:38] <cmccabe1> brb

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.