#ceph IRC Log

IRC Log for 2010-10-12

Timestamps are in GMT/BST.

[1:37] <darkfader> boooooooh. boooh.
[1:37] <darkfader> 10.10.11 23:35:06.651018 b643fb70 -- 192.168.19.26:0/2362 learned my addr 192.168.19.26:0/2362
[1:37] <darkfader> it hangs
[1:37] <darkfader> just after i unbroke most of it
[1:37] <darkfader> i need to switch to gentoo or something
[1:38] <darkfader> anything that doesnt try to think for its user.
[1:38] <sagewk> you mean restart on dpkg upgrades?
[1:38] <darkfader> sagewk: yeah thats very easy to reproduce
[1:39] <darkfader> it doesn't properly shutdown and then tries to start twice
[1:39] <darkfader> and i'm missing the kclient package
[1:39] <darkfader> but i think i just had a very unlucky day
[1:43] <darkfader> gregaf: are you using a stock debian kernel for testing?
[1:43] <darkfader> (squeeze)
[1:44] <sagewk> no, we're building mostly off mainline kernels.
[1:44] <sagewk> we haven't been testing the backports much internally
[1:44] <darkfader> i'll have to do something to that end
[1:45] <darkfader> right now i dont have a ceph.ko in my /lib/modules...
[1:45] <darkfader> it's creepy.
[2:08] * greglap (~Adium@166.205.139.111) has joined #ceph
[2:10] * MarkN (~nathan@59.167.240.178) Quit (Ping timeout: 480 seconds)
[2:20] * MarkN (~nathan@59.167.240.178) has joined #ceph
[2:56] * greglap (~Adium@166.205.139.111) Quit (Read error: Connection reset by peer)
[3:07] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) has joined #ceph
[5:50] <sage> greglap: there?
[6:42] <greglap> sage: am now!
[6:43] <sage> too slow, now i forget what i was going to ask you :)
[6:43] <greglap> saw your commit, I think that's all the assert failures I've seen
[6:43] <sage> fixed a few more that haven't been pushed yet.
[6:43] <greglap> will have to re-run and see if I can dredge anything else up, and if there are still hanging client requests, which I've seen a few times
[6:44] <sage> also saw some of the pjd errors again from that other bug
[6:44] <greglap> yeah, I saw one or two of them once or twice before asserting, maybe I can start actually tracing those now
[6:45] <greglap> hmm, shouldn't that new assert be
[6:45] <greglap> assert (state == LOCK_XLOCK || (is_locallock() && state == LOCK_LOCK))
[6:45] <greglap> ?
[6:47] <sage> nope, it can be any type of lock...
[6:47] <sage> i think
[6:47] <greglap> k, just doesn't quite match my reading of your commit message
[6:47] <sage> this is from dispatch_slave_request() OP_XLOCK
[6:49] <sage> Server.cc:1247, if the slave says "yes, i xlocked it", we get_xlock() locally. but we're a replica, and in the LOCK state.
[6:50] <greglap> oh, right, I'm just confused, n/m
[6:51] <greglap> was mentally inserting "is_replica()" or something instead of "is_locallock()"
[6:52] <sage> or local vs replica
[6:58] <greglap> sage: can you push those other fixes tonight sometime? I'd like to look at pjd tomorrow before we meet
[6:58] <greglap> going to bed now!
[7:40] <sage> pushed!
[7:53] * f4m8_ is now known as f4m8
[9:04] * allsystemsarego (~allsystem@188.27.167.124) has joined #ceph
[10:06] * Yoric (~David@213.144.210.93) has joined #ceph
[11:02] * sentinel_e86_ (~sentinel_@188.226.51.71) Quit (Quit: sh** happened)
[11:03] * sentinel_e86 (~sentinel_@188.226.51.71) has joined #ceph
[13:39] * Yoric_ (~David@213.144.210.93) has joined #ceph
[13:39] * Yoric (~David@213.144.210.93) Quit (Read error: Connection reset by peer)
[13:39] * Yoric_ is now known as Yoric
[13:53] * Yoric (~David@213.144.210.93) Quit (Read error: Connection reset by peer)
[13:53] * Yoric (~David@213.144.210.93) has joined #ceph
[14:49] <alexxy> hi all!
[14:49] <alexxy> i get mds crashed every time on start
[14:50] <alexxy> http://paste.pocoo.org/show/274450/
[15:34] * Yoric_ (~David@213.144.210.93) has joined #ceph
[15:34] * Yoric (~David@213.144.210.93) Quit (Read error: Connection reset by peer)
[15:34] * Yoric_ is now known as Yoric
[15:36] * Yoric (~David@213.144.210.93) Quit (Read error: Connection reset by peer)
[15:36] * Yoric (~David@213.144.210.93) has joined #ceph
[15:41] * f4m8 is now known as f4m8_
[15:54] * hijacker (~hijacker@213.91.163.5) Quit (Remote host closed the connection)
[16:00] * Yoric_ (~David@213.144.210.93) has joined #ceph
[16:00] * Yoric (~David@213.144.210.93) Quit (Read error: Connection reset by peer)
[16:00] * Yoric_ is now known as Yoric
[16:03] * hijacker (~hijacker@213.91.163.5) has joined #ceph
[16:06] * Yoric_ (~David@213.144.210.93) has joined #ceph
[16:06] * Yoric (~David@213.144.210.93) Quit (Read error: Connection reset by peer)
[16:06] * Yoric_ is now known as Yoric
[16:17] * Yoric_ (~David@213.144.210.93) has joined #ceph
[16:17] * Yoric (~David@213.144.210.93) Quit (Read error: Connection reset by peer)
[16:17] * Yoric_ is now known as Yoric
[16:35] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) Quit (Quit: Leaving.)
[16:48] * greglap (~Adium@166.205.136.144) has joined #ceph
[17:31] <wido> alexxy: are you running from GIT or a release?
[17:32] <wido> it is actually an older bug, see: http://tracker.newdream.net/issues/385
[17:32] <wido> It has been fixed, but i'm not sure if it got into the latest release
[17:33] <wido> seeing the fix date (07-09-2010) it should be fixed in 0.21.3, that is from 18-09-2010
[17:37] * greglap (~Adium@166.205.136.144) Quit (Read error: Connection reset by peer)
[17:39] * tonyb486 (~tonyb@debian.ams.sunysb.edu) has joined #ceph
[18:08] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:18] * Yoric_ (~David@213.144.210.93) has joined #ceph
[18:18] * Yoric (~David@213.144.210.93) Quit (Read error: Connection reset by peer)
[18:18] * Yoric_ is now known as Yoric
[18:29] <gregaf> wido: there?
[18:29] <gregaf> looked at your ceph -s issue a bit, and Sage thinks it's just because your clocks are off
[18:29] <gregaf> by 2 minutes
[18:35] <alexxy> wido: i'm running release
[18:39] * alexxy[home] (~alexxy@79.173.82.178) has joined #ceph
[18:39] * alexxy (~alexxy@79.173.82.178) Quit (Read error: Connection reset by peer)
[18:42] * Yoric (~David@213.144.210.93) Quit (Quit: Yoric)
[18:47] <wido> gregaf: hmm, ok
[18:47] <wido> i'll check the clocks
[18:47] <wido> alexxy[home]: try running from git (rc branch)
[18:47] <alexxy[home]> wido: ok i'll try that
[19:24] <wido> gregaf: yes, it seemed to be the clocks. Didn't notice that, but I never got a warning about it :)
[19:28] <yehudasa> wido: about bug #473, I see this on the kernel output:
[19:28] <yehudasa> Oct 9 13:26:01 client01 kernel: [366850.520220] Pid: 4, comm: kworker/0:0 Tainted: G D 2.6.36-rc5-rbd-20014-g53f0521 #3
[19:28] <yehudasa> note the rc5-rbd tag line
[19:28] <yehudasa> could it be by any chance that it was some other branch, and not the master branch?
[19:33] <wido> yehudasa: i'm not sure, but almost 100% it was the master branch
[19:33] <wido> I added -rbd myself with make-kpkg --append-version="-rbd"
[19:33] <yehudasa> oh, ok
[19:33] <wido> But I had a separate ceph, libceph and rbd branch
[19:33] <wido> branch, uh, module
[19:34] <wido> is that split already done in the master branch?
[19:34] <yehudasa> yes, it's there
[19:35] <wido> ok, weird, I'm at rc7 right now with ceph-client, but there is no libceph anymore, just the "ceph" module
[19:35] <yehudasa> oh, I take it back
[19:36] <yehudasa> yeah, so there's no way it was the master branch
[19:36] <yehudasa> because you have libceph loaded there
[19:37] <wido> then I might be confused, had been playing around a lot with the kernel
[19:37] <wido> I wanted to test RBD, but then gave the FS a try too
[19:38] <wido> You could leave it for now? Might be a search which ends up with nothing
[19:42] <wido> stick with the master branch?
[19:47] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[19:50] <yehudasa> yeah, just use the master branch at this point
[19:55] <wido> I've got something else since this morning: http://www.pastebin.org/155369
[19:55] <wido> As you can see, my cluster stays degraded, while all the OSD's are up (cluster had a fresh mkcephfs a few days ago)
[19:56] <wido> And mounting keeps failing, no messages in the MON nor MDS logs
[20:00] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[21:25] * alexxy[home] is now known as alexxy
[21:40] * allsystemsarego (~allsystem@188.27.167.124) Quit (Quit: Leaving)
[21:44] <gregaf> wido: you actually were getting clock drift warnings if you look at the log
[21:44] <gregaf> but they weren't showing up from ceph -s due to an extra endl being printed out
[21:44] <gregaf> fixed in the rc branch now
[21:55] <wido> gregaf: ah, ok :) Never check the logs, only ceph -s/-w
[22:27] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) has joined #ceph
[22:50] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[23:45] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.