#ceph IRC Log


IRC Log for 2010-09-07

Timestamps are in GMT/BST.

[0:03] * Osso (osso@AMontsouris-755-1-19-69.w90-46.abo.wanadoo.fr) Quit (Quit: Osso)
[1:33] * jan (~jan@paranoid.nl) has joined #ceph
[1:33] * jan is now known as Guest834
[1:34] * Guest834 is now known as jantje
[2:07] * MK_FG (~fraggod@ has joined #ceph
[3:49] * MK_FG (~fraggod@ Quit (Remote host closed the connection)
[5:31] * MK_FG (~fraggod@wall.mplik.ru) has joined #ceph
[6:49] * f4m8_ is now known as f4m8
[8:03] * Guest649 (quasselcor@bas11-montreal02-1128531099.dsl.bell.ca) Quit (Remote host closed the connection)
[8:05] * bbigras (quasselcor@bas11-montreal02-1128531099.dsl.bell.ca) has joined #ceph
[8:06] * bbigras is now known as Guest864
[8:31] <MarkN> anyone seen this before? http://pastebin.com/gDd7EfHb
[8:32] <MarkN> all mds + osd + mons are up however all logfiles on all nodes are filling up with the above
[8:33] <MarkN> and i can not mount the filesystem
[8:34] <MarkN> devgold050:~# mount -t ceph /mnt/ceph/
[8:34] <MarkN> mount: can't read superblock
[8:37] * allsystemsarego (~allsystem@ has joined #ceph
[8:49] <MK_FG> And you've ruled out network problem, right?
[8:52] <MarkN> yeah it is seemingly random after a reboot/restart ceph of all machines
[8:53] <MarkN> ok off home now, will check back in tomorrow
[9:39] * hijacker (~hijacker@ Quit (reticulum.oftc.net kilo.oftc.net)
[9:39] * hijacker (~hijacker@ has joined #ceph
[11:48] * MK_FG (~fraggod@wall.mplik.ru) Quit (Remote host closed the connection)
[12:29] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[12:49] * hijacker (~hijacker@ Quit (Ping timeout: 480 seconds)
[12:49] * hijacker (~hijacker@ has joined #ceph
[13:30] * MK_FG (~fraggod@ has joined #ceph
[13:34] * Osso (osso@AMontsouris-755-1-19-69.w90-46.abo.wanadoo.fr) has joined #ceph
[13:35] * Osso_ (osso@AMontsouris-755-1-19-69.w90-46.abo.wanadoo.fr) has joined #ceph
[13:35] * Osso (osso@AMontsouris-755-1-19-69.w90-46.abo.wanadoo.fr) Quit (Remote host closed the connection)
[13:35] * Osso_ is now known as Osso
[14:22] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[14:48] * MarkN (~nathan@ Quit (Ping timeout: 480 seconds)
[14:59] * MarkN (~nathan@ has joined #ceph
[18:22] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[18:23] * tjikkun (~tjikkun@195-240-122-237.ip.telfort.nl) has joined #ceph
[19:00] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:15] * tjikkun (~tjikkun@195-240-122-237.ip.telfort.nl) Quit (Ping timeout: 480 seconds)
[19:24] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[19:42] * kblin_ (~kai@h1467546.stratoserver.net) has joined #ceph
[19:42] * kblin (~kai@h1467546.stratoserver.net) Quit (Read error: Connection reset by peer)
[19:43] * DLange is now known as Guest915
[19:43] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[19:45] * Guest915 (~DLange@dlange.user.oftc.net) Quit (Read error: Connection reset by peer)
[20:08] <wido> I've run selfmanaged_snap_create on a pool, but now i'm not able to snapshot it anymore
[20:09] <wido> is there some timeout on this?
[20:09] <sagewk> wido: yeah, currently once you go one way there's no going back.
[20:09] <wido> ok, and what's the snap after that?
[20:09] <wido> exec?
[20:10] <sagewk> wido: we can make a monitor command to force the pool into the other mode...
[20:10] <wido> I'm not sure what the snapid is, so removing is not possible
[20:10] <wido> btw, I really have no idea what the selfmanaged snaps are
[20:11] <sagewk> it's a way to hook directly into the low-level snapid stuff so that you can do snapshots on a per object (or per-whatever) basis. the caller is responsible for building snap contexts for each io.
[20:11] <sagewk> it's what the mds uses for the file/directory snapshots, and what rbd uses for per-image snapshots.
[20:11] <sagewk> they just don't mix with the per-pool snapshots.
[20:11] <wido> I thought so.
[20:12] <wido> Running exec would fix this? Or remove the selfmanaged snapshot? If the last one, how to get the ID? There is no list method
[20:12] <sagewk> btw, i just pushed a fix for your mds issue.
[20:12] <wido> ah, great, i'll give it a try
[20:12] <sagewk> well there's currently no fix... we'd need to add a monitor command to mean something like "forget all my selfmanaged snaps and switch to per-pool snapshots"
[20:13] <sagewk> not sure if it's worth it?
[20:13] <sagewk> there's no central list of the selfmanaged snapshots (the user is responsible for tracking them). so it's not obvious what (if anything) to remove.
[20:16] <gregaf> we could do something gross like taking a self-managed snapshot of everything in the pool, maybe?
[20:16] <gregaf> and then set that as a pool snapshot
[20:18] <wido> So the workaround for now is just removing the pool? There is no precious data in it
[20:18] <gregaf> but it'd be slow to run and I don't think it's necessarily appropriate
[20:18] <sagewk> yeah, just kill the pool
[20:18] <gregaf> yeah, self-managed snaps are a one-way street, partly by necessity and partly by design
[20:19] <sagewk> i think the "fix it up" would be for the monitor to switch to pool snap mode, continue allocating snapids where it left off, and add all prior snapids to the removed list. something along those lines. anything using the existing selfmanaged snaps would be out of luck.
[20:20] <sagewk> but for now, I think it's fine to leave this in the "don't do that, it doesn't work" category. :)
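The snapshot exchange above (one-way switch into self-managed mode, caller-tracked snapids, and sagewk's proposed "fix it up" monitor command) can be illustrated with a toy Python model. This is a sketch under stated assumptions, not Ceph code: the names `Pool`, `SnapModeError`, `selfmanaged_snap_create`, `pool_snap_create`, `force_pool_snap_mode`, and `build_snap_context` are hypothetical, and the real logic lives in the Ceph monitor and OSDs. The snap context is modeled as a `(seq, snaps)` pair with the newest snapid first.

```python
"""Toy model of a pool's snapshot mode (hypothetical names; not the Ceph/librados API)."""


class SnapModeError(Exception):
    """Raised when pool snapshots and self-managed snapshots are mixed."""


class Pool:
    def __init__(self, name):
        self.name = name
        self.snap_seq = 0            # last snapid handed out for this pool
        self.mode = None             # None, "pool", or "selfmanaged"
        self.pool_snaps = {}         # snap name -> snapid (pool mode only)
        self.removed_snaps = set()   # snapids that are no longer valid

    def _next_snapid(self):
        self.snap_seq += 1
        return self.snap_seq

    def _enter_mode(self, mode):
        # The first snapshot fixes the mode; the other kind is refused afterwards.
        if self.mode is None:
            self.mode = mode
        elif self.mode != mode:
            raise SnapModeError(f"pool {self.name!r} is in {self.mode} snap mode")

    def pool_snap_create(self, snap_name):
        self._enter_mode("pool")
        snapid = self._next_snapid()
        self.pool_snaps[snap_name] = snapid
        return snapid

    def selfmanaged_snap_create(self):
        # The pool only allocates the id and keeps no list of it; the caller
        # must remember it and put it in the snap context of each write.
        self._enter_mode("selfmanaged")
        return self._next_snapid()

    def force_pool_snap_mode(self):
        # Hypothetical "fix it up": mark every snapid allocated so far as
        # removed, switch to pool mode, and keep allocating after snap_seq.
        # Self-managed snapshots taken before the switch become unusable.
        if self.mode == "selfmanaged":
            self.removed_snaps.update(range(1, self.snap_seq + 1))
        self.mode = "pool"


def build_snap_context(seq, my_snapids):
    """Caller-side snap context for a write: (seq, snapids newest first)."""
    return (seq, sorted(my_snapids, reverse=True))


if __name__ == "__main__":
    pool = Pool("data")
    s1 = pool.selfmanaged_snap_create()                   # 1
    s2 = pool.selfmanaged_snap_create()                   # 2
    print(build_snap_context(pool.snap_seq, [s1, s2]))    # (2, [2, 1])
    try:
        pool.pool_snap_create("before-test")
    except SnapModeError as err:
        print("refused:", err)
    pool.force_pool_snap_mode()
    print(pool.pool_snap_create("before-test"))           # 3
    print(sorted(pool.removed_snaps))                     # [1, 2]
```

The model only shows why mixing fails: once the pool hands out self-managed snapids it has no record of which ones are in use. The forced switch therefore has to invalidate every earlier id rather than pick specific ones to remove.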
[20:21] <sagewk> wido: btw, i know you've been down for a week from that mds thing, but before that have you been doing korg rsync tests? there's still an outstanding bug that we haven't been able to reproduce that you saw a few times... #329
[20:22] <sagewk> basically: rsync korg, restart mds, crash during replay, iirc. need the full mds logs (from rsync and mds restart) to track down, but we haven't been able to reproduce it here..
[20:23] <wido> sagewk: i'll give it a try. But i'm going on vacation tomorrow morning, so won't have much time
[20:23] <wido> it's already evening here
[20:23] <sagewk> ok no worries, when you get a chance!
[20:23] <wido> but you
[20:23] <wido> but you're free to log on
[20:28] <sagewk> oh right :) ok i'll try it when you leave
[20:28] <sagewk> how long are you gone?
[20:29] <wido> I'll be gone until sunday, early monday (my time of course)
[20:29] <sagewk> k
[20:29] <wido> I'm building your commit in the mds_replay_lock_states branch right now, will install them on the MDS
[20:36] * klooie (~kloo@a82-92-246-211.adsl.xs4all.nl) has joined #ceph
[20:36] <klooie> hi.
[20:36] <gregaf> hi klooie
[21:07] <wido> sagewk: the MDS'es start again :-) And the FS mounts too
[21:09] <sagewk> yay, thanks
[21:09] <wido> i'm afk, tty next week!
[21:09] <sagewk> have fun!
[21:09] <wido> tnx
[22:52] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[22:55] * klooie (~kloo@a82-92-246-211.adsl.xs4all.nl) Quit (Quit: sleep.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.