#ceph IRC Log

IRC Log for 2010-08-27

Timestamps are in GMT/BST.

[0:10] * MarkN (~nathan@59.167.240.178) Quit (Remote host closed the connection)
[0:12] * MarkN (~nathan@59.167.240.178) has joined #ceph
[0:33] <sagewk> markn: http://tracker.newdream.net/issues/384
[0:33] <sagewk> markn: can you upload your ceph.ko and .config?
[0:40] <MarkN> sure, my kernel .config?
[0:40] <sagewk> markn: were you doing anything funny with multiple mounts of the same fs, or deep mounts (mount -t ceph server:/some/sub/dir /mnt/foo) ?
[0:40] <sagewk> yeah
[0:40] <MarkN> yes wrt the deep mounts
[0:41] <sagewk> aha, ok. do you have the sequence of operations that led to the crash?
[0:42] <MarkN> it seems as though, if i add another client, then do a stat system call, it oopses
[0:42] <MarkN> so for example cd /ceph/mount
[0:42] * allsystemsarego (~allsystem@188.26.33.211) Quit (Quit: Leaving)
[0:43] <MarkN> then do 'ls start_of_file', then tab complete, and it will fail
[0:43] <sagewk> (where /ceph is a mount of server:/some/subdir ?)
[0:43] <sagewk> it's crashing during readdir (triggered by tab completion it sounds like)
[0:43] <MarkN> yes
[0:43] <sagewk> how did you mount in that scenario? server:/some/subdir on /ceph?
[0:44] <MarkN> "mount -t ceph 172.17.8.2:/data/public /data/server/public"
[0:45] <sagewk> and it's crashing on something like 'ls /data/server/public' ?
[0:45] <sagewk> (i.e. root of mounted dir?)
[0:46] <MarkN> yes
[0:46] <sagewk> ok thanks, i'll see if i can reproduce that
[0:47] <MarkN> thanks. One quick question: now, after rebooting all machines in the cluster / clients, i am getting a "can't read super block" error when trying to mount. what is the best way to diagnose these issues?
[0:51] <sagewk> dmesg|tail.. if there's nothing useful there, you can crank up debugging (echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control) and repeat
[0:52] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[1:03] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[1:19] <sagewk> markn: pushed a fix to ceph-client.git master branch, commit ce4d6eab
[1:19] <sagewk> (at least, it fixed my method of hitting the bug.) can you let me know if it fixes your problem?
[1:20] <MarkN> sure - i will get it sorted this morning
[1:20] <sagewk> thanks
[1:39] * rklahn (~rklahn@38.104.128.78) Quit (Quit: rklahn)
[1:54] * gregphone (~gregphone@166.205.141.35) has joined #ceph
[1:59] <MarkN> so sage I am trying to remount the filesystem and keep getting the "can't read superblock" issue. dmesg shows nothing, syslog shows nothing, only:
[1:59] <MarkN> Aug 27 09:42:21 devgold051 kernel: ceph: client4734 fsid 081d1a85-8b2a-39ed-47e3-a4f423017857
[1:59] <MarkN> Aug 27 09:42:21 devgold051 kernel: ceph: mon0 172.17.8.2:6789 session established
[2:00] <MarkN> so no errors. any other ideas on mounting the fs?
[2:08] <gregphone> MarkN: you tried rebooting?
[2:08] <gregphone> if that doesn't fix it you're going to need to email the list
[2:08] <MarkN> yeah, all clients and cluster nodes, all nodes are up with the correct processes running on them
[2:08] <MarkN> no worries RE list email
[2:09] <gregphone> it's just because sage and yehudasa are both out of the office until after Labor Day now
[2:10] <MarkN> what is labour day date in the US?
[2:11] <gregphone> September 6
[2:11] <gregphone> week and a half from now
[2:13] <MarkN> ah OK no problems - i will do some more digging around anyway and send it off to the list
[2:20] <MarkN> hmm, after trying for an hour, it decided to mount OK once i went to get a tea and biscuits :)
[2:26] <gregphone> hmm
[2:27] <gregphone> my WAG was that one of the server addresses or the raid wasn't updating properly, guess it finally flushed out due to a timeout or something
[2:27] <gregphone> *raid -> fsid
[2:29] <MarkN> any way to check this in the future if it happens again?
[2:30] <gregphone> probably if you enable debug output it'll tell you what's going wrong when the mount fails
[2:31] <gregphone> not sure how thorough that coverage is, though
[3:00] * gregphone (~gregphone@166.205.141.35) Quit (Quit: Rooms • iPhone IRC Client • http://www.roomsapp.mobi)
[4:45] * Osso (osso@AMontsouris-755-1-5-251.w86-212.abo.wanadoo.fr) Quit (Quit: Osso)
[6:19] <wido> sagewk: still there? any idea how long class loading should take?
[6:55] * f4m8_ is now known as f4m8
[7:43] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[9:08] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:18] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[9:31] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[9:58] * Yoric (~David@213.144.210.93) has joined #ceph
[10:30] * Yoric (~David@213.144.210.93) Quit (Quit: Yoric)
[10:55] * allsystemsarego (~allsystem@188.26.33.211) has joined #ceph
[11:48] * Yoric (~David@213.144.210.93) has joined #ceph
[12:23] * Osso (osso@AMontsouris-755-1-5-251.w86-212.abo.wanadoo.fr) has joined #ceph
[12:56] * Osso_ (osso@AMontsouris-755-1-5-251.w86-212.abo.wanadoo.fr) has joined #ceph
[12:56] * Osso (osso@AMontsouris-755-1-5-251.w86-212.abo.wanadoo.fr) Quit (Remote host closed the connection)
[12:56] * Osso_ is now known as Osso
[14:09] <todinini> wido: the rbd-support.patch does not work, applies cleanly but the function error_report is not defined
[15:07] <wido> todinini: oh, might be an old patch then
[15:07] <wido> let me check
[15:09] <wido> oh, yes, old patch, adding a new one right now
[15:09] <wido> should use printf instead of error_report
[15:11] <wido> todinini: http://tracker.newdream.net/issues/341
[15:11] <todinini> wido: ok, I will try again
[15:24] <wido> todinini: http://tracker.newdream.net/issues/381
[15:24] <wido> are you still hitting that too?
[15:30] <todinini> wido: at one point you call the volume delta and further down alpha, maybe that is the problem?
[15:33] <wido> no, alpha - charlie exist, so i try to ls the snapshots
[15:33] <wido> and i try to create the "delta" volume
[15:40] <todinini> wido: that's weird because it works for me
[15:41] * f4m8 is now known as f4m8_
[15:42] <wido> todinini: yeah, i'm still trying to figure it out...
[15:47] <todinini> hmm I can't compile the ubuntu libvirt-0.7.5 package, even the original source .deb is failing
[16:54] * gregphone (~gregphone@166.205.136.82) has joined #ceph
[17:36] * gregphone (~gregphone@166.205.136.82) Quit (Quit: Rooms • iPhone IRC Client • http://www.roomsapp.mobi)
[17:37] * gregphone (~gregphone@166.205.136.82) has joined #ceph
[17:45] <wido> todinini: why not apt-get my packages?
[17:59] <wido> yehudasa: you there? I just tried to load RBD again, but right now it won't even load
[18:00] <wido> restarted my whole cluster, cclass -a, waited for some time, but RBD never shows, only sync 1.0
[18:51] * gregphone (~gregphone@166.205.136.82) Quit (Ping timeout: 480 seconds)
[19:09] * Yoric (~David@213.144.210.93) Quit (Quit: Yoric)
[19:47] <gregaf> wido: your recent MDS crash is actually a different issue from #312, involving the distributed lock manager
[19:48] <gregaf> are your MDSes just refusing to come up now, or is your cluster working again?
[19:50] <gregaf> and what version of the code were you running when it crashed the first time?
[19:58] <wido> gregaf: my MDSes will start, but crash after some time
[19:58] <gregaf> with that same backtrace?
[19:58] <wido> and i upgraded from yesterday's unstable to the one from this morning
[19:59] <wido> yes, those backtraces are from the crashes i saw today (i preserved the timestamps)
[20:04] <gregaf> all right, I'll look at it a bit more and see if I can work out what's going on or if it's safe to just nix the assert, but if you could make a new issue it'd be good since a final resolution will probably have to wait on Sage
[20:05] <wido> gregaf: any suggestions for an issue subject?
[20:06] <gregaf> failed assertion in Locker::scatter_nudge
[20:07] <wido> ok, i'll do that in a minute
[20:07] <gregaf> I can see it in the code easily enough but it looks like it's just catching an issue in/with the distributed lock manager that occurred earlier
[20:09] * Yoric (~David@88.189.211.192) has joined #ceph
[20:38] <wido> gregaf: http://tracker.newdream.net/issues/385
[22:39] <kblin> evening folks
[22:40] <gregaf> hey
[23:16] * Yoric (~David@88.189.211.192) Quit (Quit: Yoric)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.