#ceph IRC Log


IRC Log for 2011-12-08

Timestamps are in GMT/BST.

[0:02] * nwatkins (~nwatkins@kyoto.soe.ucsc.edu) Quit (Quit: WeeChat 0.3.2)
[0:39] * MarkDude (~MT@ Quit (Quit: Leaving)
[0:42] <cp> Question: is it possible to start Ceph up with no osds running?
[0:43] <Tv> cp: sure, e.g. run the ceph-mon processes only
[0:43] <Tv> cp: whether that's useful is a different question ;)
[0:43] <Tv> cp: but that is definitely how the up-and-coming chef recipes will deploy ceph
[0:44] <Tv> it'll start off with no osds
[0:45] <cp> TV: oh, I couldn't do that last time I tried
[0:45] <Tv> cp: you might not be able to construct an initial osdmap without an osd in it
[0:45] <Tv> cp: that's a separate issue from not starting any actual daemons
[0:45] <Tv> (worst case, leave osd.0 in there, never use it, always have it down & out ;)
[0:46] <Tv> i haven't fully explored that area yet
[0:47] <cp> Ah - would things like "rbd list" work in that setup? I guess there won't be any pgs to look at so it would hang.
[0:47] <Tv> cp: there would be no way to fetch the list
[0:48] <cp> Hmmm.. I guess I wonder whether ceph -s would work or not
[0:48] <cp> I can try
[0:48] <Tv> it will
[0:48] <Tv> it'll just say 0 osds are up
[0:49] <Tv> ceph health will scream bloody murder
[0:56] * aa (~aa@r186-53-132-219.dialup.adsl.anteldata.net.uy) has joined #ceph
[1:00] * andresambrois (~aa@r186-53-132-219.dialup.adsl.anteldata.net.uy) has joined #ceph
[1:00] * aa (~aa@r186-53-132-219.dialup.adsl.anteldata.net.uy) Quit (Read error: Connection reset by peer)
[1:04] <grape> Thanks to all of you for your help, the install docs I have been cobbling together seem to be pretty workable and might be helpful to someone. They live here: https://github.com/nugoat/ceph/tree/master/doc/ops/install/ceph-ubuntu-howto -- The block device and qemu docs are in hot pursuit, as you will soon see ;-)
[1:07] * andresambrois (~aa@r186-53-132-219.dialup.adsl.anteldata.net.uy) Quit (Quit: Konversation terminated!)
[1:07] * andresambrois (~aa@r186-53-132-219.dialup.adsl.anteldata.net.uy) has joined #ceph
[1:14] <grape> Now that is out of the way, I need to figure out what comes after what when the goal is to connect via qemu. Am I correct in thinking that I need to set up a user, pool and secret, then test out a block device, and then move on to configuring qemu?
[1:22] <joshd> not sure what you mean by 'test out a block device' - all you need to do is create one and then pass it as a command line option to qemu
[1:24] <grape> by testing out I was simply trying to prove that it actually existed once created
[1:24] * andresambrois (~aa@r186-53-132-219.dialup.adsl.anteldata.net.uy) Quit (Ping timeout: 480 seconds)
[1:24] <grape> nothing special
[1:25] * andresambrois (~aa@r186-53-132-219.dialup.adsl.anteldata.net.uy) has joined #ceph
[1:26] <joshd> ah, makes sense
[1:29] * root___ (~root@ has joined #ceph
[1:29] * root___ is now known as huangjun
[1:35] * andresambrois (~aa@r186-53-132-219.dialup.adsl.anteldata.net.uy) Quit (Quit: Konversation terminated!)
[1:51] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[1:56] * adjohn (~adjohn@ Quit (Quit: adjohn)
[2:02] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[2:10] * adjohn (~adjohn@ has joined #ceph
[2:11] * adjohn is now known as Guest19732
[2:11] * _adjohn (~adjohn@ has joined #ceph
[2:11] * _adjohn is now known as adjohn
[2:14] * cp (~cp@adsl-75-6-253-220.dsl.pltn13.sbcglobal.net) Quit (Quit: cp)
[2:18] * Guest19732 (~adjohn@ Quit (Ping timeout: 480 seconds)
[2:27] <grape> joshd: regarding my previous question about the order of things, is it required that I create a user and grant permissions before creating a block device?
[2:27] <joshd> well, you can create them with the admin user if you want
[2:28] <grape> probably sub-optimal
[2:28] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) has joined #ceph
[2:28] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) Quit ()
[2:29] <grape> must the secret be created prior to the block device?
[2:29] <joshd> rbd itself doesn't know anything about users - it's only enforced by rados
[2:30] <joshd> so you'd want a separate pool for the user, and create the rbd device there
[2:30] <joshd> like the end of this section: http://glance.openstack.org/configuring.html#configuring-the-rbd-storage-backend
[2:32] <grape> that's a great help!
[2:34] <ajm> sagewk/sjust: did either of you have a chance to look at the two osd issues from yesterday?
[2:35] <gregaf> ajm: I think sjust got one of them, but they're both out for the day...
[2:35] <ajm> as-in fixed?
[2:36] <gregaf> I believe so, yes
[2:36] <ajm> hrm, let me test :)
[2:36] <gregaf> joshd might know
[2:38] <joshd> he fixed something with sub_op_push, not sure if that was one of your issues
[2:39] <ajm> no unfortunately :/
[2:53] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[2:58] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:05] <ajm> hrm, anyone around to speculate what would happen if I just move aside that journal with the bad entry
[3:09] <gohko> was ceph web site cracked? my anti virus tools said infected web site.
[3:11] <gohko> Trend Micro's virus buster is suck!
[3:14] <grape> gohko: lol I don't think there is anything naughty going on with the site. Sometimes it slows down, but sooner or later it works again if you reload.
[3:15] <ajm> "mediawiki can't connect to database"
[3:15] <ajm> :P
[3:16] <nhm> ajm: no idea with ceph, I've had to do that kind of thing with ext3/4 and lustre.
[3:17] * MarkN (~nathan@ has joined #ceph
[3:20] * adjohn (~adjohn@ Quit (Quit: adjohn)
[3:25] <grape> ajm: if you are looking for docs, have a look into http://ceph.newdream.net/docs/latest/
[3:27] <ajm> grape: this is the stuff that Tv was also working on? (I think)
[3:27] <ajm> looks very good
[3:28] * adjohn (~adjohn@ has joined #ceph
[3:37] * MarkN (~nathan@ has left #ceph
[3:38] * adjohn (~adjohn@ Quit (Ping timeout: 480 seconds)
[3:49] <grape> ajm: I believe it is the same as the docs on github.
[4:20] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[4:22] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[5:49] * MarkN (~nathan@ has joined #ceph
[5:54] * andresambrois (~aa@r186-53-132-219.dialup.adsl.anteldata.net.uy) has joined #ceph
[5:57] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) has joined #ceph
[6:13] * andresambrois (~aa@r186-53-132-219.dialup.adsl.anteldata.net.uy) Quit (Remote host closed the connection)
[6:33] * MarkN (~nathan@ has left #ceph
[6:35] * monrad (~mmk@domitian.tdx.dk) Quit (Ping timeout: 480 seconds)
[6:36] * DLange (~DLange@dlange.user.oftc.net) Quit (Read error: Connection reset by peer)
[6:36] * al (d@niel.cx) Quit (Ping timeout: 480 seconds)
[6:37] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[6:38] * darkfaded (~floh@ Quit (Ping timeout: 480 seconds)
[6:38] * darkfader (~floh@ has joined #ceph
[6:42] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[7:07] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[7:35] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[8:06] * The_Bishop (~bishop@port-92-206-76-165.dynamic.qsc.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[9:10] * stxShadow (~jens@p4FD07F96.dip.t-dialin.net) has joined #ceph
[9:10] <stxShadow> Hi
[9:11] <stxShadow> is someone in this channel who kindly may answer some questions about ceph ?
[9:44] * stxShadow (~jens@p4FD07F96.dip.t-dialin.net) Quit (Quit: Ex-Chat)
[9:44] * stxShadow (~jens@p4FD07F96.dip.t-dialin.net) has joined #ceph
[9:52] <stxShadow> 2011-12-08 09:48:33.234434 pg v39350: 1150 pgs: 18 creating, 1132 active+clean; 205 GB data, 370 GB used, 14362 GB / 14901 GB avail
[9:52] <stxShadow> can someone please tell me what "18 creating" is ?
[9:53] <stxShadow> we have 2 testsystems
[9:53] <stxShadow> on one we have "18 creating" for a couple of days now
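Editor's note: the status line pasted above can be split into per-state PG counts, which makes it easier to see which PGs are not yet active+clean. The following is a minimal sketch, assuming the line keeps the "N pgs: count state, count state; ..." shape shown in the log; the function names are hypothetical and not part of any Ceph tool.

```python
import re


def parse_pg_summary(line):
    """Return (total_pgs, {state: count}) from a 'pg v...' status line."""
    # Capture the total and the comma-separated "count state" list that
    # follows it, stopping at the first ';' (or at end of line if none).
    match = re.search(r"(\d+) pgs: ([^;]+)", line)
    if match is None:
        raise ValueError("no pg summary found in line")
    total = int(match.group(1))
    states = {}
    for part in match.group(2).split(","):
        count, state = part.strip().split(" ", 1)
        states[state] = int(count)
    return total, states


def pgs_not_clean(states):
    """Keep only the states other than 'active+clean'."""
    return {state: n for state, n in states.items() if state != "active+clean"}
```

Feeding it the line from [9:52] gives a total of 1150 with 18 PGs in "creating"; it does not explain why they are stuck, it only isolates the count to watch.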
[10:00] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[10:17] * nms_ is now known as nms
[10:47] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[10:50] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) has joined #ceph
[10:50] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) Quit ()
[11:15] * huangjun (~root@ Quit (Quit: leaving)
[13:33] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[14:32] * NaioN (~stefan@andor.naion.nl) Quit (Ping timeout: 480 seconds)
[14:35] * NaioN (~stefan@andor.naion.nl) has joined #ceph
[15:03] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:07] * Sharky-- (~Sharky-@D57D69B2.static.ziggozakelijk.nl) has joined #ceph
[15:09] <Sharky--> hi, a few questions: should I regard ceph as being production ready or is it really still 'unstable' ? second, anyone have experience with running mysql/innodb instances on ceph (does it handle aio well?) ?
[15:20] <stxShadow> in my opinion -> it's not production ready. far too many bugs and crashes even with 0.40
[15:21] <Sharky--> mkay
[15:22] <Sharky--> seems that there are already a lot of features, I was expecting the core to be quite stable
[15:23] <Sharky--> (from reading the website, that is)
[15:24] <stxShadow> we have two ceph clusters .... one with 14 TB Space (3 OSD, 2 MDS, 3 MON)
[15:24] <stxShadow> and i think this one is dying once a day under heavy load
[15:25] <stxShadow> but -> we never tried mysql
[15:25] <stxShadow> one qemu and rados
[15:25] <stxShadow> only
[15:26] <stxShadow> maybe simple access as client is stable ....
[15:40] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[15:46] <Sharky--> stxShadow: you losing data as well then? or you have to reboot stuff?
[15:48] <stxShadow> data loss was not a problem so far
[15:48] <stxShadow> but node recovery
[15:49] <Sharky--> you need to do recovery a lot? I'd think that you wouldn't have daily failures in 14TB of data?
[15:50] <stxShadow> sometimes the rados part freezes
[15:51] <stxShadow> and only reboot helps
[15:51] <Sharky--> ah
[16:25] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[16:29] <guido> stxShadow: Are rolling upgrades a problem? I.e. upgrading one node after the other to a new version without taking the whole cluster offline
[16:31] <stxShadow> i've upgraded from version .39 to .40 without problems
[16:32] <stxShadow> on upgrade from .36 to .37 i got trouble with one osd
[16:32] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:32] <stxShadow> -> deleted it from the crushmap
[16:32] <stxShadow> reinserted
[16:32] <stxShadow> working
[16:33] <guido> I suppose doing something like that involves some heavy IO for resynchronisation?
[16:33] <stxShadow> sure
[16:34] <stxShadow> we have 10GE between the nodes
[16:35] <guido> What distro are you using to run ceph on?
[16:35] <stxShadow> Debian
[16:35] <stxShadow> the same happens if you adjust the replication level
[16:35] <stxShadow> -> whole data is transmitted to other nodes
[16:36] <guido> Does that affect the availability or performance of the cluster much?
[16:37] <guido> I've mostly been using centos and fedora, and I've been having some problems there
[16:37] <stxShadow> no .... not at all .... works great ....
[16:38] <stxShadow> maybe someone has an answer to my question:
[16:38] <stxShadow> 2011-12-08 16:26:57.410449 pg v48378: 1166 pgs: 18 creating, 1148 active+clean
[16:38] <stxShadow> -> what is the meaning of "creating" ?
[16:38] <stxShadow> stays there for days now
[16:38] <stxShadow> dump -o shows no pgs in create mode
[16:40] <stxShadow> noone ? :S
[16:42] <guido> Maybe ask on the mailing list? I don't the developers will usually spend countless hours a day just watching the IRC channel
[16:42] <guido> +think
[16:45] <stxShadow> will do so :) .... i've already posted some bugs to bugtracker ..... and .40 has some cute ones ;)
[16:45] <guido> stxShadow: btw, how did you upgrade to 0.40 anyway? It's not even out yet
[16:46] <stxShadow> yes .... lets say --> latest git version of .40
[16:46] <nhm> stxShadow/guido: what kind of things are you planning on using Ceph for?
[16:47] <stxShadow> in my case -> cloud services
[16:47] <nhm> ah, openstack?
[16:47] <stxShadow> ceph -> rbd -> qemu
[16:47] <stxShadow> and rados for s3 integration
[16:48] <stxShadow> we've developed our own middleware
[16:48] <guido> pretty much the same thing here, although we're not going to call it "cloud services", just flexible virtual servers or something like that. With Qemu/KVM
[16:49] <stxShadow> we call it "dynamich services" ;)
[16:53] <stxShadow> question to all: what underlying filesystem do you use ? btrfs ? ext4 ? pros / cons ?
[16:59] <ajm> stxShadow: only use btrfs if you're a massive masochist
[17:00] <stxShadow> yes :) i've destroyed an osd twice by powering it off
[17:00] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) has joined #ceph
[17:06] <guido> Hm, I am using btrfs on the osds. I was under the impression that it has become quite stable in the newest kernels. otoh, I haven't done any real testing yet, I'm still struggling with even getting the ceph libraries compiled in the client machines...
[17:13] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:20] * jantje (~jan@paranoid.nl) Quit (Remote host closed the connection)
[17:20] * jantje (~jan@paranoid.nl) has joined #ceph
[17:21] <stxShadow> hm ? is there a problem with rpmbuild ?
[17:22] <stxShadow> debian packages work fine -> directly from mirror
[17:29] <ajm> http://pastebin.com/BfSLKarb
[17:30] <ajm> any idea how to debug that further?
[17:30] <ajm> it just does abort
[17:39] <guido> Well. I cannot build the software at all on fedora 14, and I have no idea where to go from there - see here: http://tracker.newdream.net/issues/1797
[17:47] * adjohn is now known as Guest19807
[17:47] * Guest19807 (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) Quit (Read error: Connection reset by peer)
[17:47] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) has joined #ceph
[17:53] * Sharky-- (~Sharky-@D57D69B2.static.ziggozakelijk.nl) Quit (Ping timeout: 480 seconds)
[17:58] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[18:07] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[18:23] * aliguori (~anthony@ has joined #ceph
[18:23] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[18:31] * bchrisman (~Adium@ has joined #ceph
[18:39] * nolan (~nolan@phong.sigbus.net) Quit (Read error: Operation timed out)
[18:51] * stxShadow (~jens@p4FD07F96.dip.t-dialin.net) Quit (Quit: Ex-Chat)
[18:53] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:01] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[19:02] * pombreda (~Administr@adsl-71-142-77-127.dsl.pltn13.pacbell.net) has joined #ceph
[19:05] <todin> guido: do you have the pthread devel pack installed?
[19:05] <sjust> ajm: looking
[19:06] * pombreda (~Administr@adsl-71-142-77-127.dsl.pltn13.pacbell.net) Quit ()
[19:09] <sjust> ajm: sorry, I've forgotten, what underlying filesystem are you using?
[19:10] <guido> todin: Does the configure not check for that?
[19:10] <guido> todin: I don't have access to the machine in question right now, but I'll check for that as soon as possible
[19:11] * NightDog__ (~karl@52.84-48-58.nextgentel.com) Quit (Read error: Connection reset by peer)
[19:11] <ajm> sjust: xfs
[19:11] <sjust> thanks
[19:11] * NightDog__ (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[19:12] <todin> guido: im not sure if configure checks that
[19:14] <todin> ajm: your pastebin entry is still unclear?
[19:18] <ajm> todin: not sure what you mean? i'm not clear on why it's not starting up if that's what you're asking.
[19:19] <todin> ajm: as far as I can see, in the pastebin, the osd received a signal, and then stopped
[19:20] <ajm> todin: you're saying you think something else on the system sent ceph-osd a sigabrt ?
[19:22] <todin> ajm: I am not sure about that.
[19:41] <sjust> ajm: it's missing the pginfo attr on the 1.182 pg
[19:41] <sjust> ajm: could you verify that that collection currently contains no objects?
[19:43] <ajm> correct
[19:43] <sjust> was this the osd on which you killed the filestore journal
[19:43] <sjust> ?
[19:43] <ajm> no
[19:43] <ajm> i was just thinking about doing that, that didn't work :)
[19:43] <ajm> thats osd.5, this is osd.9
[19:44] <sjust> heh, worth a short
[19:44] <sjust> *shot
[19:44] <ajm> can I just move aside 1.182
[19:44] <ajm> its replicated elsewhere
[19:44] <sjust> yeah, that will fix the problem
[19:44] <ajm> it just seems like it should tolerate that scenario better
[19:44] <sjust> somehow the collection got created without getting its initial info attribute written
[19:44] <sjust> this could happen if you had removed the journal
[19:44] <sjust> other than that, it should not be possible
[19:45] <sjust> but removing the directory will be fine
[19:45] <ajm> ok
[19:45] <sjust> nothing in dmesg?
[19:45] <ajm> nope, no xfs issues like that
[19:45] <ajm> i have >1 of those too
[19:46] <sjust> how did the osd go down?
[19:46] <ajm> hrm, this one broke a while ago actually
[19:46] <ajm> i'm not 100% what the original issue was
[19:46] <ajm> i was having issues with 0.37 and locking up
[19:46] <sjust> I mean, did the machine die or did the osd process die?
[19:47] <ajm> osd iirc
[19:47] <sjust> ok
[19:48] <ajm> for some things though like the 2. pool
[19:48] <ajm> i have 0 objects inside the dir there
[19:48] <ajm> but i also have no data in that pool
[19:48] <ajm> so thats expected I assume?
[19:48] <sjust> yeah
[19:49] <sjust> I only asked because the most likely possibility was that the initial creation of the pg (on that osd) was interrupted leaving the collection but no info attribute
[19:49] <sjust> if there were objects in the collection, it would have to have been caused by something else
[19:49] <ajm> 2232305703 4 -rw-r--r-- 1 root root 8 Nov 9 03:44 meta/DIR_E/DIR_2/DIR_F/DIR_8/pginfo\\u1.182__0_A0668F2E
[19:50] <ajm> this is the actual file that shows the issue i'm guessing, since its only 8 bytes?
[19:50] <ajm> i have a number of pgs that are like that somehow
[19:53] <sjust> ajm: actually, the info is stored in two pieces, past_intervals and snap_collections go in that file
[19:53] <sjust> the rest go in the 'info' attribute on the pg collection
[19:53] <sjust> it's interesting that that info file exists...
[19:54] <sjust> and it's not a problem for it to be 8 bytes, necessarily
[19:54] <ajm> i'm basically trying to determine which pgs are broken so I can work around them
[19:54] <ajm> like I moved aside 1.182 and it still won't start
[19:54] <sjust> oh, what was the message this time?
[19:54] <ajm> 2011-12-08 13:53:39.018926 7ff172ecb780 filestore(/data/osd.10) collection_getattr /data/osd.10/current/1.173_head 'info'
[19:54] <ajm> 2011-12-08 13:53:39.018936 7ff172ecb780 filestore(/data/osd.10) collection_getattr /data/osd.10/current/1.173_head 'info' = -61
[19:54] <ajm> *** Caught signal (Aborted) **
[19:54] <sjust> wow
[19:54] <sjust> ok
[19:55] <ajm> that one is also empty
[19:56] <ajm> http://pastebin.com/47Dah7Xt
[19:56] <ajm> excluding all the other pools
[19:57] <sjust> like I said, empty is no problem, could you instead look for ones with missing 'ceph.info' attributes?
[19:57] <ajm> oh i misunderstood
[19:58] <sjust> ajm: no worries
[19:58] <ajm> hrm yeah its a lot of things
[19:59] <ajm> i think it might not be your fault actually
[19:59] <sjust> ajm: doesn't really matter, if it's xfs, we need to give them a reproducible test case
[20:00] <ajm> i mean not even an xfs bug exactly
[20:00] <sjust> were there any non-empty pgs with missing 'ceph.info' attributes?
[20:00] <sjust> ajm: any ideas?
[20:01] <ajm> data got moved around here (due to a hardware failure) and I think some data was copied without the xattr stuff being copied
[20:01] <sjust> ajm: ah... that would do it
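Editor's note: the search sjust asks for (PG collections whose 'info' attribute is missing) can be sketched as below. This is only an illustration under stated assumptions: the on-disk xattr name is passed in by the caller because the log only gives it as 'info' / 'ceph.info', the "_head" directory suffix is taken from the paths pasted at [19:54], and both function names are hypothetical. The "= -61" in that paste corresponds to ENODATA on Linux, which is the error this sketch treats as "attribute missing".

```python
import errno
import os


def dir_has_xattr(path, attr_name):
    """True if `path` carries extended attribute `attr_name` (Linux only)."""
    try:
        os.getxattr(path, attr_name)
        return True
    except OSError as exc:
        # ENODATA means the attribute does not exist on this path;
        # any other error (permissions, unsupported fs) is re-raised.
        if exc.errno == errno.ENODATA:
            return False
        raise


def find_pg_dirs_missing_attr(current_dir, attr_name, has_attr=dir_has_xattr):
    """List '*_head' directories under `current_dir` lacking `attr_name`."""
    missing = []
    for name in sorted(os.listdir(current_dir)):
        path = os.path.join(current_dir, name)
        if not (os.path.isdir(path) and name.endswith("_head")):
            continue
        if not has_attr(path, attr_name):
            missing.append(name)
    return missing
```

A call might look like find_pg_dirs_missing_attr("/data/osd.10/current", "user.ceph.info"), where "user.ceph.info" is a guess at the full attribute name and should be checked against what the filesystem actually reports. Since the data here was copied without xattrs, the result could be a long list.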
[20:23] * sageph (~yaaic@ has joined #ceph
[20:24] <sageph> im off! can someone take care of pankaj?
[20:24] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[20:24] * sageph (~yaaic@ Quit ()
[20:26] <gregaf> I think that's Tv this week… ;)
[20:26] <Tv> nothing is slipping by unanswered, but answers may be slow...
[20:27] <Tv> i keep track of conversations and toggle "Ceph support" labels on threads based on whether they need a response or not.
[20:27] <gregaf> ah, that's clever
[20:27] <Tv> but i also need to write this doc asap
[21:18] * NightDog__ (~karl@52.84-48-58.nextgentel.com) Quit (Read error: Connection reset by peer)
[21:19] * NightDog__ (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[21:24] * sjustlaptop (~sam@aon.hq.newdream.net) has joined #ceph
[21:28] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[21:32] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[21:32] * NightDog__ (~karl@52.84-48-58.nextgentel.com) Quit (Read error: Connection reset by peer)
[21:32] * NightDog__ (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[21:36] * aliguori (~anthony@ has joined #ceph
[21:47] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[22:49] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Remote host closed the connection)
[22:51] * Meths_ (rift@ has joined #ceph
[22:58] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[22:58] <ajm> hrm, for some reason the MDS is stuck in up:replay on this cluster now even with everything healthy
[22:58] <ajm> http://pastebin.com/TEfnVLGw
[22:58] * Meths_ is now known as Meths
[22:59] <ajm> thats mds.15
[22:59] <ajm> mds e413: 1/1/1 up {0=15=up:replay}, 2 up:standby
[22:59] <ajm> pg v3876883: 3406 pgs: 3002 active+clean, 404 down+peering; 22746 GB data, 44566 GB used, 39248 GB / 83815 GB avail
[23:03] * MikeP (~Talan@ has joined #ceph
[23:03] <MikeP> Hello all...
[23:04] <MikeP> I'm just getting introduced to Ceph. Any suggestions on reading material/resources?
[23:04] <gregaf> ajm: checking
[23:04] <ajm> gregaf: i only have 2/3 mds up (not multi-mds), i'm going to bring up the 3rd in a moment
[23:05] <gregaf> yeah, that MDS isn't trying to enter replay at any point
[23:06] <ajm> any idea why? bringing up the 3rd mds might fix?
[23:07] <gregaf> looks like it's starting up and it's not getting told to do anything else
[23:07] <gregaf> what's the log for the other one?
[23:08] <ajm> let me get that
[23:14] <ajm> gregaf: http://adam.gs/mds.14.log
[23:15] <gregaf> hmm, same with that one
[23:16] <ajm> what triggers that?
[23:17] <gregaf> well, when a logical MDS isn't being backed by a daemon, the monitors should assign any appropriate (standby or standby-replay) daemon to the logical
[23:17] <gregaf> and it looks like they think they did, since they say the one labeled "15" is in replay
[23:18] <gregaf> and there are 2 that remain in standby
[23:18] <gregaf> which is probably these two
[23:18] <gregaf> the logs you've shown me, I mean
[23:18] <gregaf> MikeP: what kind of reading materials are you after?
[23:18] <gregaf> unfortunately the documentation is...lacking
[23:18] <ajm> hrm so mon thinks it told 15 to be replay, but mds thinks its supposed to stay in standby....
[23:19] <gregaf> MikeP: but the best user docs are probably ceph.newdream.net/docs/latest
[23:19] <gregaf> if you want architecture stuff, check out the academic papers
[23:19] <gregaf> http://ceph.newdream.net/publications/
[23:20] <gregaf> and there's always our wiki, for what it's worth: http://ceph.newdream.net/wiki/
[23:20] <gregaf> ajm: you sure you got the right descriptors on the right MDS daemons?
[23:20] <gregaf> check out the log on the third one
[23:21] <ajm> looks the same
[23:21] <ajm> let me try restarting all mds/mon
[23:23] <ajm> 2011-12-08 17:22:49.389592 log 2011-12-08 17:22:48.422605 mon.0 9 : [INF] mds.? up:boot
[23:23] <ajm> so its telling .35 to go up:boot ?
[23:23] <gregaf> that's a log message
[23:23] <gregaf> it got an up:boot message from an MDS daemon coming online
[23:23] <gregaf> now the monitor has to decide what to do with it
[23:24] <MikeP> gregaf: Primarily introductory material on Ceph. What it is, how to utilize it, how to get started with implementation and development.
[23:24] <ajm> weird, it doesn't seem to be deciding to do anything, all 3 are in standby from the looks of the log
[23:24] <gregaf> I'll leave it for others to tell you which are the best sources, then; I've been immersed in it for 3 years :)
[23:24] <gregaf> ajm: that's pretty odd
[23:25] <gregaf> what's ceph -s say now?
[23:25] <ajm> mds e422: 1/1/1 up {0=15=up:replay}, 3 up:standby
[23:25] <ajm> about the same
[23:26] <gregaf> but you've only got 3 daemons running, right?
[23:26] <gregaf> so it looks like it thinks there's another daemon still alive
[23:27] <gregaf> have you changed your timeouts?
[23:27] <gregaf> and are you sure there isn't a rogue one somewhere?
[23:27] <ajm> nope, config is rather vanilla
[23:27] <ajm> interesting
[23:27] <ajm> there was a rogue ceph-mds process running
[23:27] <gregaf> so that's probably it
[23:28] <gregaf> do you have logs from it?
[23:28] <ajm> i think i overwrote getting clean logs from these :/
[23:28] <ajm> it was on the same box
[23:28] <gregaf> bummer
[23:28] <gregaf> brb
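Editor's note: the reasoning gregaf uses above (one active rank plus three standbys is more than the three daemons ajm started, so a stray ceph-mds must be running) can be written down as a small check. This is a sketch only, assuming the "mds e..." line keeps the shape pasted in this log and that multiple active ranks, if present, would be comma-separated inside the braces; the function names are hypothetical.

```python
import re


def count_mds_daemons(line):
    """Return (active, standby) daemon counts from an 'mds e...' status line."""
    # Entries inside {...} look like "0=15=up:replay"; count one per rank.
    active = 0
    active_match = re.search(r"\{([^}]*)\}", line)
    if active_match and active_match.group(1):
        active = len(active_match.group(1).split(","))
    # Match "N up:standby" but not "N up:standby-replay".
    standby_match = re.search(r"(\d+) up:standby(?!-)", line)
    standby = int(standby_match.group(1)) if standby_match else 0
    return active, standby


def unexpected_daemons(line, daemons_started):
    """How many more daemons the monitors report than were knowingly started."""
    active, standby = count_mds_daemons(line)
    return max(0, active + standby - daemons_started)
```

With the line from [23:25] and three daemons started, the surplus is one, which matches the rogue ceph-mds process found at [23:27].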
[23:31] <ajm> mds e427: 1/1/1 up {0=14=up:replay}, 2 up:standby
[23:31] <ajm> so after that with no rogue mds processes, it switched to .14 to replay
[23:31] <ajm> but it still doesn't boot
[23:34] <gregaf> all right, get me that log
[23:34] <ajm> let me put all 3 together, give me a minute
[23:35] <ajm> gregaf: you want mon logs too?
[23:36] <gregaf> maybe, but it's probably a problem on the mds
[23:37] <ajm> i'll just do all at once
[23:37] <ajm> let me let it run for a minute or two
[23:38] <ajm> mds e431: 1/1/1 up {0=13=up:replay}, 2 up:standby
[23:38] <ajm> got a different one in replay again :)
[23:40] <ajm> gregaf: http://adam.gs/ceph-logs.tar.bz2
[23:45] <gregaf> ajm: can you add --debug_journaler 20 to your mds startup and get new logs, and post it to a bug?
[23:49] <ajm> debug journaler = 20
[23:49] <ajm> in ceph.conf
[23:50] <todin> did the xml format in libvirt for an rbd image change?
[23:50] <gregaf> ajm: yeah, it's not there already, is it?
[23:52] <ajm> gregaf: no, wasn't there, just saying for myself
[23:52] <joshd> todin: the basic format hasn't, although authentication was added a little while ago
[23:53] <todin> joshd: ok, maybe you know what is wrong here virSecurityDACRestoreSecurityFileLabel:143 : cannot resolve symlink rbd/vm9:rb d_writeback_window=8000000: No such file or directory
[23:54] <todin> there is no space in rbd, copy and paste error
[23:54] <joshd> that would be libvirt trying to do an selinux/apparmor check on the image name
[23:55] <joshd> which it shouldn't do since rbd images aren't files
[23:55] <todin> joshd: ok, but only one node does it, two others are fine
[23:56] <joshd> did that happen when trying to start a vm?
[23:57] <todin> joshd: yes
[23:57] <todin> it won't start
[23:57] <ajm> joshd: "mds doesn't enter replay" ?
[23:57] <ajm> or you have a better title
[23:59] <joshd> todin: I wonder if all your machines have selinux/apparmor enabled in the kernel?
[23:59] * aliguori (~anthony@ Quit (Remote host closed the connection)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.