#ceph IRC Log


IRC Log for 2011-11-28

Timestamps are in GMT/BST.

[0:24] * Nightdog (~karl@190.84-48-62.nextgentel.com) Quit (Ping timeout: 480 seconds)
[0:27] * Nightdog (~karl@190.84-48-62.nextgentel.com) has joined #ceph
[1:39] * Nightdog (~karl@190.84-48-62.nextgentel.com) Quit (Remote host closed the connection)
[6:14] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[6:18] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[6:34] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[8:51] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[9:00] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[9:12] * hijacker (~hijacker@213.91.163.5) Quit (Remote host closed the connection)
[9:35] * hijacker (~hijacker@213.91.163.5) has joined #ceph
[9:44] * Anticimex (anticimex@netforce.csbnet.se) has joined #ceph
[9:44] * gregorg (~Greg@78.155.152.6) Quit (Read error: Connection reset by peer)
[9:56] * gregorg (~Greg@78.155.152.6) has joined #ceph
[12:31] * hijacker (~hijacker@213.91.163.5) Quit (Remote host closed the connection)
[12:53] * hijacker (~hijacker@213.91.163.5) has joined #ceph
[14:07] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[14:23] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (Remote host closed the connection)
[14:30] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[15:08] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:28] <chaos_> gregaf, sagewk ping;-)
[16:03] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has left #ceph
[16:14] * damien (~damien@andromeda.digitalnetworks.co.uk) has joined #ceph
[16:28] * damien is now known as damoxc
[16:35] <damoxc> I seem to keep getting 18 pgs stuck in creating when making a new filesystem with mkcephfs
[16:36] <damoxc> does anyone have any suggestions as to why this is happening?
[16:37] <damoxc> they don't seem to have been assigned to any of the osds so I'm not really sure where to look
[16:37] <damoxc> http://pastie.org/2933733
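
For context, the quickest way to see which placement groups are stuck (as in the pastie above) is to query the cluster directly; this is a minimal sketch, assuming a client with a working keyring:

    # Overall cluster status, including a summary of PG states
    ceph -s
    # Full PG listing; filter for the ones still stuck in "creating"
    ceph pg dump | grep creating
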
[16:57] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) has joined #ceph
[17:01] * Nightdog (~karl@190.84-48-62.nextgentel.com) has joined #ceph
[17:15] * adjohn is now known as Guest18583
[17:15] * Guest18583 (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) Quit (Read error: Connection reset by peer)
[17:15] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) has joined #ceph
[17:35] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[17:37] * Nightdog_ (~karl@190.84-48-62.nextgentel.com) has joined #ceph
[17:43] * Nightdog (~karl@190.84-48-62.nextgentel.com) Quit (Ping timeout: 480 seconds)
[17:46] * Nightdog_ (~karl@190.84-48-62.nextgentel.com) Quit (Ping timeout: 480 seconds)
[17:51] <gregaf> damoxc: the problem is that they're not assigned to any OSDs; something's wrong in the monitors or the CRUSH placement code :/
[17:51] <gregaf> do you have any monitor logs for the time period around their creation?
[17:52] <gregaf> lxo: guess there's a problem in how we handle "snaprealms" — keeping track of directories inside a snapshot isn't easy or trivial; there's a lot of logic involved
[17:52] <damoxc> gregaf: doesn't look like it; what log levels should I turn up? this is completely reproducible
[17:52] <gregaf> damoxc: you mean you keep recreating the pool/cluster/whatever and those PGs are always stuck?
[17:53] <damoxc> gregaf: each time I run mkcephfs they end up stuck
[17:53] <damoxc> gregaf: I do have one 32-bit machine in the cluster, but I'm guessing that's not the problem?
[17:54] <gregaf> no, not in this case
[17:56] <damoxc> gregaf: will debug mon = 20 be sufficient?
[17:56] <gregaf> I think so
[17:57] <gregaf> I was looking to see if we had any extra crush debugging, but we'll have to make do with the monitor
[17:57] <gregaf> and a copy of your ceph.conf
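
The monitor debugging requested above would normally be turned on with a ceph.conf entry along these lines on the monitor node, followed by a monitor restart (a sketch; the surrounding sections depend on the local configuration):

    [mon]
        # verbose monitor logging, as discussed above
        debug mon = 20
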
[17:57] <chaos_> gregaf, could you read my problem description from saturday?
[17:57] <gregaf> chaos_: yeah, looking at it
[17:57] <chaos_> it's quite disturbing.. the same thing happened at both MDSes
[17:58] <chaos_> and my ceph cluster is down;(
[17:58] <gregaf> yeah, the MDS log has gotten into a wonky state, looks like
[17:58] <chaos_> any way to fix this?
[17:58] <gregaf> or else (hopefully) our logic is just wrong and that's not a valid assert here
[17:58] <gregaf> I'll need to talk to Sage about it
[17:58] <chaos_> it would be great!
[17:58] <chaos_> i can provide any additional data
[17:59] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[18:01] <gregaf> chaos_: it will probably be helpful if you can get a replay log with full MDS debugging
[18:01] <chaos_> how can I do that?
[18:03] <gregaf> start up an MDS with --debug_mds 20
[18:03] <chaos_> can i just add debug mds = 20 to ceph.conf?
[18:04] <gregaf> sure, as long as it's on your MDS node :)
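
Concretely, the MDS debugging discussed here can go into ceph.conf on the MDS node as chaos_ suggests, with the daemon restarted afterwards so the setting takes effect (a sketch; the command-line equivalent is the --debug_mds 20 flag gregaf mentions):

    [mds]
        # full MDS debug output for the replay log
        debug mds = 20
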
[18:04] <chaos_> ;>
[18:04] <chaos_> in a moment
[18:07] <chaos_> ok uploading
[18:07] <chaos_> from start to crash in 246MB of log file :/
[18:12] * elder (~elder@aon.hq.newdream.net) has joined #ceph
[18:13] <gregaf> yep, it's dense
[18:13] <chaos_> well it's debug data;)
[18:13] <chaos_> it will be ready in 6 minutes
[18:13] <damoxc> gregaf: how much of the monitor log would you like?
[18:15] <sagewk> chaos_: great, can you open a bug and attach/link to the log from there?
[18:15] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[18:17] <chaos_> sagewk, sure
[18:17] <chaos_> on my way
[18:17] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[18:19] <damoxc> gregaf: http://damoxc.net/mon.dev1.log.gz, from start of monitor until all the pgs had been created, aside from those 18
[18:20] <gregaf> damoxc: thanks
[18:20] <chaos_> erm.. I need some extra permissions to create a bug?
[18:20] <gregaf> it might be a while; we've got several things going on today, but I'll make sure somebody looks at it before the end of the day :)
[18:21] <sagewk> chaos_: I think you just need to sign in? What is your username?
[18:21] <chaos_> "szymon"
[18:23] <chaos_> ok found it
[18:23] <chaos_> do you need my ceph.conf?
[18:23] <sagewk> naw
[18:28] <chaos_> sagewk, gregaf http://tracker.newdream.net/issues/1756
[18:29] <sagewk> chaos_: thanks
[18:29] <chaos_> I'll say thanks if you do something useful with this ;)
[18:29] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[18:29] <grape> if a client can't create a keyring, is that a problem with regular file permissions or with ceph keyring access permissions (referring to the log entry "monclient(hunting): MonClient::init(): Failed to create keyring"), or am I just completely off base?
[18:30] <damoxc> gregaf: awesome thanks :-)
[18:40] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[18:41] <Tv> grape: the word "create" is misleading there, i guess.. it's basically saying "i don't have a keyring"
[18:41] <Tv> grape: could be the path is not there, maybe even is unreadable, ...
[18:42] <Tv> grape: is there a message like "failed to load" before it, perhaps?
[18:45] <damoxc> sagewk: did you get my osd logs?
[18:47] <grape> Tv: Let me wipe it and run the setup again with crypt.
[18:49] * guido (~guido@mx1.hannover.ccc.de) has joined #ceph
[18:51] <grape> Tv: after setting everything up ceph health gives me the following:
[18:51] <grape> monclient(hunting): build_initial_monmap
[18:51] <grape> -- :/0 messenger.start
[18:51] <grape> monclient(hunting): init
[18:51] <grape> monclient(hunting): MonClient::init(): Failed to create keyring
[18:51] <grape> ceph_tool_common_init failed.
[18:52] <grape> Tv: everything runs without error on mkcephfs
[18:56] <Tv> grape: the logic is in src/auth/KeyRing.cc, line 35 onwards ("from_ceph_context"); something is making that return NULL
[18:56] <Tv> grape: many of those should cause extra log entries though
[18:56] <Tv> grape: we're about to all head into a meeting so i might not be able to help you
[18:58] <grape> Tv: I did see that when I ran mkcephfs without crypt it was telling me that the connection was refused.
[18:58] <grape> Tv: Thanks for the clues; I'll dig into it and see what I can come up with
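
As Tv explains, the "Failed to create keyring" message usually just means the client could not find or read its keyring. A hedged way to check, assuming the common /etc/ceph/keyring location (the real path is whatever the "keyring" option in ceph.conf points at):

    # Does the keyring file exist, and is it readable by this user?
    ls -l /etc/ceph/keyring
    # Point the ceph tool at the keyring explicitly to rule out path problems
    ceph --keyring /etc/ceph/keyring health
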
[19:00] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:07] <sjust> damoxc: could you post the current osdmap?
[19:08] <sjust> ceph osd getmap
[19:08] <damoxc> sjust: for which cluster?
[19:08] <sjust> the one with the pgs stuck in creating
[19:08] <damoxc> sjust: yup, let me just start it up again
[19:08] <sjust> cool
[19:12] <damoxc> sjust: http://damoxc.net/osdmap
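
For reference, the osdmap sjust asks for can be dumped to a file and inspected in human-readable form like this (a sketch; "osdmap.bin" is just an arbitrary output filename):

    # Save the current osdmap to a file...
    ceph osd getmap -o osdmap.bin
    # ...and print its contents
    osdmaptool --print osdmap.bin
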
[19:15] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[19:22] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (Remote host closed the connection)
[19:34] <sjust> damoxc: yup, some of the preferred placement pgs aren't getting mapped, looking into it now
[19:35] <damoxc> sjust: okay thanks
[19:46] <damoxc> sagewk: is there anything I can do to help test the rbd image layering?
[19:47] <sagewk> not yet, we still don't have the basic implementation. hopefully this week
[19:48] * nwatkins (~nwatkins@kyoto.soe.ucsc.edu) has joined #ceph
[19:57] <chaos_> sagewk, is there any chance that someone will look into my bug?
[19:59] <sagewk> chaos_: yeah i will try to get to it today.. have a full schedule but should be able to
[19:59] * elder (~elder@aon.hq.newdream.net) Quit (Quit: Leaving)
[19:59] <chaos_> appreciate it
[20:12] * cp (~cp@74.85.19.35) has joined #ceph
[20:14] * sjustlaptop (~sam@aon.hq.newdream.net) has joined #ceph
[20:25] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[20:39] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[20:39] <- *MarkDude* help
[20:39] <- *MarkDude* .help
[20:39] <- *MarkDude* commands
[20:44] * Tv_ (~tv@aon.hq.newdream.net) has joined #ceph
[21:05] * adjohn is now known as Guest18612
[21:05] * Guest18612 (~adjohn@208.90.214.43) Quit (Read error: Connection reset by peer)
[21:05] * _adjohn (~adjohn@208.90.214.43) has joined #ceph
[21:05] * _adjohn is now known as adjohn
[21:05] * sjustlaptop (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[21:14] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[21:16] * sjustlaptop (~sam@aon.hq.newdream.net) has joined #ceph
[21:26] * sjustlaptop (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[21:30] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:38] * sjustlaptop (~sam@aon.hq.newdream.net) has joined #ceph
[22:01] * sjustlaptop (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:01] * Tv_ (~tv@aon.hq.newdream.net) Quit (Read error: Connection reset by peer)
[22:01] * elder (~elder@aon.hq.newdream.net) has joined #ceph
[22:04] * verwilst (~verwilst@d51A5B509.access.telenet.be) has joined #ceph
[22:07] * sjustlaptop (~sam@aon.hq.newdream.net) has joined #ceph
[22:22] * sjustlaptop (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:32] * greglap (~Adium@aon.hq.newdream.net) has joined #ceph
[22:33] * sjustlaptop (~sam@aon.hq.newdream.net) has joined #ceph
[22:37] <grape> ceph.newdream.net web site & wiki appear to be down
[22:50] <MarkDude> Wiki fail is intermittent
[22:50] <MarkDude> At least earlier in the day
[22:52] * greglap (~Adium@aon.hq.newdream.net) Quit (Read error: Connection reset by peer)
[22:56] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: off to Berkeley)
[22:56] <grape> MarkDude: I think that is its normal state. I really should accept the fact that I need to push my stuff to github.
[22:57] * Tv_ (~tv@aon.hq.newdream.net) has joined #ceph
[23:05] * Tv_ (~tv@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[23:34] * sjustlaptop (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[23:40] * verwilst (~verwilst@d51A5B509.access.telenet.be) Quit (Quit: Ex-Chat)
[23:47] * sjustlaptop (~sam@aon.hq.newdream.net) has joined #ceph
[23:53] * sjust (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[23:55] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) has joined #ceph
[23:55] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) Quit (Remote host closed the connection)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.