#ceph IRC Log


IRC Log for 2011-08-22

Timestamps are in GMT/BST.

[1:35] * eternaleye_ (~eternaley@ has joined #ceph
[1:35] * eternaleye (~eternaley@ Quit (Remote host closed the connection)
[1:39] * eternaleye_ (~eternaley@ Quit (Remote host closed the connection)
[1:40] * eternaleye_ (~eternaley@ has joined #ceph
[1:51] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[1:56] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) Quit (Ping timeout: 480 seconds)
[2:12] * yoshi (~yoshi@p10166-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:24] * jim (~chatzilla@c-71-202-13-33.hsd1.ca.comcast.net) has joined #ceph
[5:53] * lxo (~aoliva@09GAAF891.tor-irc.dnsbl.oftc.net) Quit (Quit: later)
[5:54] * lxo (~aoliva@9KCAAAG4U.tor-irc.dnsbl.oftc.net) has joined #ceph
[9:42] * huangjun (~root@ has joined #ceph
[11:21] * yoshi (~yoshi@p10166-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[13:02] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) Quit (Quit: o//)
[13:03] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) has joined #ceph
[14:23] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) Quit (Quit: o//)
[14:38] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) has joined #ceph
[14:48] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:38] * huangjun (~root@ Quit (Remote host closed the connection)
[15:51] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[15:51] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit ()
[15:51] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[16:24] <ajm> if i have a single pg thats missing, any way to just remove that?
[17:11] <sagewk> ajm: yeah, 'ceph pg force_create_pg <pgid>' will recreate it. this only works if all existing copies of the pg are gone.
[17:15] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Remote host closed the connection)
[17:17] <ajm> sagewk: thx, that seemed to work
[17:20] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[17:44] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[17:46] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:51] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[17:56] * gigigi (~gigigi@212-198-248-35.rev.numericable.fr) has joined #ceph
[17:58] * gigigi (~gigigi@212-198-248-35.rev.numericable.fr) Quit ()
[18:00] <ajm> sagewk: interesting, after I do force_create_pg, the missing block goes away for a while then comes back
[18:01] <sagewk> missing block?
[18:02] <ajm> missing pg
[18:05] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[18:15] <sagewk> what do you mean by missing? what are you seeing?
[18:20] * kikougirl (~kikougirl@212-198-248-35.rev.numericable.fr) has joined #ceph
[18:22] <ajm> sorry, unfound "1/10989477 unfound (0.000%)"
[18:22] * kikougirl (~kikougirl@212-198-248-35.rev.numericable.fr) Quit ()
[18:32] * jojy (~jojyvargh@70-35-37-146.static.wiline.com) has joined #ceph
[18:33] <sagewk> oh, that's talking about objects that the system knows exist but it can't find copies.
[18:34] <ajm> yeah any way to just tell it to forget about it?
[18:36] <sagewk> you can mark the objects 'lost', which means you get EIO if you try to read them.
[18:37] <sagewk> that's done with 'ceph osd tell \* mark_unfound_lost'
[18:38] <sagewk> ajm: the lost object logic isn't very robust yet, though, so you may see some weirdness... (we're most focused on avoiding losing objects in the first place :)
[18:40] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[18:41] <ajm> yeah, i'd just like to get this one back to being healthy again :)
[18:41] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:48] <slang> sagewk: are you around?
[18:48] <sagewk> slang: yeah
[18:50] * pinklady (~pinklady@212-198-248-35.rev.numericable.fr) has joined #ceph
[18:52] * pinklady (~pinklady@212-198-248-35.rev.numericable.fr) Quit ()
[18:55] * greglap (~Adium@aon.hq.newdream.net) has joined #ceph
[18:55] * greglap (~Adium@aon.hq.newdream.net) Quit ()
[18:59] * cmccabe (~cmccabe@ has joined #ceph
[19:10] <ajm> interesting, if i turn off one of the osd's the unfound pg goes away...
[19:10] <Tv> hehe.. favorite line of this morning's browsing of gitk: + case LOCK_LOCK_XLOCK: return "lock->xlock";
[19:11] <sagewk> :P
[19:12] <cmccabe> and a lock lock here, a lock lock there
[19:12] <cmccabe> here a lock, there a lock, everywhere a lock lock
[19:12] <cmccabe> old mcdonald had a farm...
[19:13] <sagewk> it's LOCK_state or LOCK_oldstate_newstate
[19:13] <sagewk> and sadly the LOCK and SYNC states are poorly named...
[19:13] <Tv> and it's actually more readable than the one-line snippet makes you think.. i just found myself trying to read that out loud ;)
[19:13] <cmccabe> I'm sure it is... just funny taken out of context
[19:13] <cmccabe> looks like someone on the mailing list doesn't have pipe2, or O_CLOEXEC
[19:14] <Tv> cmccabe: yeah that looked weird
[19:14] <cmccabe> I guess O_CLOEXEC is newer than it seems
[19:15] <cmccabe> 2.6.23
[19:15] <gregaf> Tv: you have no idea how much harder that made debugging a problem last week...gah
[19:15] <Tv> CentOS 5.5 is from 2010-05
[19:16] <sagewk> cmccabe: oh, i disabled lenny build starting w/ 0.32 for that reason, no CLOEXEC
[19:16] <cmccabe> I can just add an ifdef I think
[19:16] <cmccabe> CLOEXEC is just nice to have, not essential
[19:16] <sagewk> cool
[19:17] <sagewk> cmccabe: oh, your rados api tests found a real bug this weekend! :)
[19:17] <cmccabe> we don't exec that many programs from daemons hopefully, and those we do are hopefully trusted
[19:17] <cmccabe> ah, glad to hear it
[19:17] <cmccabe> glad you resolved the build problem too. I didn't realize there was no gtest.so on the test machines
[19:17] <Tv> O_CLOEXEC is needed to avoid races if arbitrary threads can fork()
[19:18] <sagewk> not the case for us i think
[19:21] <cmccabe> tv: the race that I'm aware of is the window of time between open and fcntl(fd, F_SETFD, FD_CLOEXEC)
[19:21] <Tv> exactly
[19:21] <cmccabe> tv: so in the worst case, the program that gets exec'ed has access to our seemingly private fd
[19:22] <Tv> also, the fd doesn't close when you think it closes, etc
[19:22] <cmccabe> yeah, it's annoying. luckily I don't think we do too much with exec
[19:23] <cmccabe> I see us running rm, mount, umount, mkdir, and touch
[19:23] <cmccabe> we probably could/should get rid of some of those run_cmd calls, but I doubt there's a real-world security hole there
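Editor's sketch of the race discussed above and of the "ifdef" fallback cmccabe mentions, written in Python rather than C for brevity. The helper names open_cloexec and is_cloexec are hypothetical and not from the Ceph tree. Python 3.4 and later already marks new descriptors non-inheritable, so the explicit flag handling here only illustrates the C pattern.

```python
import fcntl
import os


def open_cloexec(path, flags, mode=0o600):
    """Open path with close-on-exec set, atomically where the platform allows."""
    cloexec = getattr(os, "O_CLOEXEC", 0)  # 0 when the platform lacks the flag
    fd = os.open(path, flags | cloexec, mode)
    if not cloexec:
        # Fallback: two separate steps. A fork()+exec() in another thread
        # between os.open() and this fcntl() would leak fd to the child.
        current = fcntl.fcntl(fd, fcntl.F_GETFD)
        fcntl.fcntl(fd, fcntl.F_SETFD, current | fcntl.FD_CLOEXEC)
    return fd


def is_cloexec(fd):
    """Report whether the close-on-exec flag is set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


fd = open_cloexec(os.devnull, os.O_RDONLY)
cloexec_set = is_cloexec(fd)
os.close(fd)
```

With the flag passed to open itself there is no window between creating the descriptor and protecting it, which is the property Tv is pointing at.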
[19:23] <Tv> it's not just security
[19:24] <Tv> it's also if you intentionally e.g. pipe to something, and child process also holds the fd, eof on the pipe doesn't happen when you expect it to
[19:24] <cmccabe> yeah, it can be other things like setting nonblocking on the file descriptor
[19:25] <cmccabe> I don't think my code relies on EOF from the pipe
[19:25] <cmccabe> it just relies on receiving a single byte
[19:26] <cmccabe> but that is a good thing to keep in mind.
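Editor's sketch of Tv's point about EOF on a pipe, kept to a single process. The os.dup() copy is an assumption standing in for the descriptor a forked child would inherit, and read_state is a hypothetical helper. The reader sees EOF only once every copy of the write end is closed.

```python
import os


def read_state(fd):
    """Return 'eof', 'no-data', or the byte read, without blocking."""
    try:
        data = os.read(fd, 1)
    except BlockingIOError:
        return "no-data"  # no data yet, but some write end is still open
    return "eof" if data == b"" else data


r, w = os.pipe()
os.set_blocking(r, False)

leaked = os.dup(w)  # stands in for the copy a forked child would hold
os.close(w)         # the "parent" closes its write end

before = read_state(r)  # the leaked copy keeps the pipe open
os.close(leaked)
after = read_state(r)   # now every write end is closed

os.close(r)
```

Code that waits for a single byte, as cmccabe describes, is not affected by this, but code that waits for EOF is.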
[19:30] * kikougirl (~kikougirl@212-198-248-35.rev.numericable.fr) has joined #ceph
[19:32] * kikougirl (~kikougirl@212-198-248-35.rev.numericable.fr) Quit ()
[19:42] * df_ (~df@ has joined #ceph
[19:42] * df_ (~df@ Quit ()
[19:48] * The_Bishop (~bishop@port-92-206-21-65.dynamic.qsc.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[21:03] <lxo> sage, aren't layout policies supposed to survive mds restarts?
[21:04] <lxo> with 0.33 + patch, after restarting mds, show_layout on directories most often show "not specified", so I have to set it again to avoid getting files placed in the wrong pools
[21:23] * _are_ (~quassel@vs01.lug-s.org) has joined #ceph
[21:24] <_are_> Hi
[21:24] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[21:24] * Juul (~Juul@3408ds2-vbr.4.fullrate.dk) has joined #ceph
[21:34] * Meths (rift@ has joined #ceph
[22:05] <wido> Is there any particular reason why the collectd plugin is for version 4.1 instead of 5.0?
[22:05] <wido> Debian version of collectd?
[22:06] <cmccabe> wido: it was to make it easier for us to build a debian package for our systems
[22:06] <wido> Ah, makes sense
[22:06] <sagewk> lxo: which version are you running?
[22:06] <cmccabe> wido: the collectd changes will be upstreamed into the actual collectd repo once we've tested them a bit locally
[22:07] <wido> cmccabe: Cool :-)
[22:07] <wido> I have a bit for a working plugin where collectd stores JSON objects in RADOS instead of RRD or CSV
[22:07] <wido> works nice in distributed RBD VM's
[22:08] <wido> no more local storage on the VM hosts needed, just RADOS access
[22:09] <cmccabe> wido: an output plugin then?
[22:09] <cmccabe> wido: sounds interesting
[22:09] <cmccabe> wido: mine are all input plugins obviously
[22:09] <wido> cmccabe: Yes, an output plugin.
[22:10] <cmccabe> collectd is very lightweight and I really like that about it
[22:10] <wido> Seemed logical to use RADOS for that.
[22:10] <wido> I also tried to see if I could have librrd store its output in RADOS, that is a bit harder than I hoped
[22:10] <cmccabe> I never figured out how to aggregate the collectd data though, because we already have some other tool/plugin doing that
[22:11] <wido> cmccabe: I just wanted to store the libvirt data from every VM host, like disk I/O per VM
[22:12] <cmccabe> wido: perhaps rrd_dump_cb_r could be used to plug into librados
[22:12] <wido> But right now I'm still waiting for my RMA.. Those WD Green disks keep dying. I'm sending them back and getting seagate :)
[22:12] <cmccabe> wido: I got a 3TB drive recently from fujitsu that I've been pretty happy with
[22:12] <cmccabe> wido: it's purely for backup use though... only 5400 RPM
[22:12] <wido> cmccabe: Yeah, but I'll probably need some kind of temp dir to store the RRD files in, but it's something fun to find out :)
[22:13] <wido> I'm getting 2TB still, much better price per GB
[22:13] <cmccabe> wido: 3TB also forces you to use GPT
[22:13] <cmccabe> brb, lunch
[22:13] <wido> I'll be afk in a bit, ttyl!
[22:13] <sagewk> wido: have a minute?
[22:14] <gregaf> hi _are_
[22:15] <gregaf> lxo: hmm, I noticed some other issues with the layout based on what you were telling us last week but haven't had a chance to check them out yet
[22:15] <sagewk> lxo: it should work.. if you can send an email or open a bug showing the behavior that'd be helpful! (e.g. set layout, show layout, umount, mount, show layout)
[22:15] <gregaf> I wouldn't have expected total loss though
[22:32] <gregaf> jojy: bchrisman: did you guys have an assert in MDCache::path_traverse?
[22:32] <jojy> yes
[22:32] <bchrisman> gregaf: went away when we updated code.
[22:33] <gregaf> ah, cool
[22:33] <gregaf> I found one here and wanted to consolidate info, but I'll retest if you lost one :)
[22:33] <wido> sagewk: Yes!
[22:34] <sagewk> wido: i wanted to ask you about what you're doing with libvirt+rbd currently...
[22:34] <sagewk> you're using libvirt i assume?
[22:35] <wido> Yes, using libvirt indeed. Still using the old version though with the virtual disk hack in it
[22:35] <wido> I'm still on Ubuntu 10.04
[22:36] <sagewk> i was looking at libvirt over the weekend and it looked like there were problems with how we did the virtual disk
[22:36] <wido> You mean the hack or the new implementation?
[22:36] <sagewk> the upstream implementation
[22:37] <wido> I haven't tested that one yet, it is on the wishlist, but first I need to get everything up and running again
[22:37] <wido> But what were you seeing?
[22:37] <sagewk> wido: it isn't documented, so that's a good thing i guess :)
[22:37] <sagewk> hold on, cloning the repo now
[22:38] <sagewk> basically the virtual disk schema lets you set servers, but not much else, which doesn't map onto how librados/rbd is actually initialized
[22:38] <sagewk> we want a client id (optional, default is admin), a config file (optional), and key/value pairs (for setting whatever config options are appropriate).
[22:39] <wido> iirc you can only set a monitor + ip, but not with extra parameters
[22:39] <sagewk> for qemu, it's just an option string, like foo=this:bar=that:bla=cow:conf=/etc/ceph/ceph.conf:id=myname
[22:39] <wido> and since the new librados doesn't open ceph.conf by default anymore, you get stuck
[22:39] <sagewk> with a few options pulled out w/ special meaning (id is passed to the librados init method, conf is given to rados_conf_read_file, the rest go to rados_conf_set)
[22:39] <wido> otherwise you could specify keyring under [client]
[22:39] <sagewk> yeah
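Editor's sketch of the option-string handling sagewk describes: id and conf are pulled out, everything else is kept as a key/value pair. The function name parse_rbd_options is hypothetical, this is not the qemu code, and it assumes values never contain ':' (no escaping is handled). The "admin" default follows sagewk's remark above.

```python
def parse_rbd_options(option_string):
    """Split 'k=v:k=v' into (client_id, conf_path, other_settings)."""
    client_id = "admin"  # default client id, per the discussion above
    conf_path = None     # config file is optional
    settings = {}        # remaining pairs, destined for rados_conf_set

    for item in option_string.split(":"):
        if not item:
            continue  # tolerate empty segments such as a trailing ':'
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % item)
        if key == "id":
            client_id = value
        elif key == "conf":
            conf_path = value
        else:
            settings[key] = value
    return client_id, conf_path, settings


example = "foo=this:bar=that:bla=cow:conf=/etc/ceph/ceph.conf:id=myname"
client_id, conf_path, settings = parse_rbd_options(example)
```

In the real flow the three results would go to the librados init call, rados_conf_read_file, and rados_conf_set respectively, as sagewk says; the sketch stops at the parsing step.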
[22:40] <wido> good point, I didn't notice it when the new scheme was proposed, but by then librados was behaving different
[22:40] <wido> but libvirt wanted to keep the XML format some sort of standard between Ceph and sheepdog
[22:41] <sagewk> If the <code>protocol</code> attribute
[22:41] <sagewk> is "rbd" or "sheepdog", an additional
[22:41] <sagewk> attribute <code>name</code> is mandatory to specify which
[22:41] <sagewk> image to be used. When the disk <code>type</code> is
[22:41] <sagewk> "network", the <code>source</code> may have zero or
[22:41] <sagewk> more <code>host</code> sub-elements used to specify the hosts
[22:41] <sagewk> to connect.
[22:41] <sagewk> yeah
[22:41] <gregaf> just shove it into the "name" section?
[22:41] <gregaf> if they give any trouble about it, i mean
[22:42] <sagewk> in theory we can abuse any of the fields, but presumably they want a sane schema, so its a matter of coming up with something everyone likes.
[22:43] <wido> I wouldn't start putting custom strings in a "host" field, you never know what kind of filters libvirt might build in
[22:44] <wido> a check if a hostname is valid or something
[22:44] <sagewk> yeah
[22:45] <sagewk> that's not a battle you're interested in by any chance? :)
[22:45] <Tv> btw does that mean sheepdog does no authentication?
[22:45] <wido> sagewk: What do you mean exactly? Before I get you wrong?
[22:45] <wido> Tv: they use corosync
[22:46] <wido> Uh, that is for the "sheeps" to communicate
[22:46] <wido> the "collie" (client) uses no auth at all, but as far as I know, you can only connect to localhost at the moment
[22:46] <Tv> wido: yeah i'm just saying if just listing a "server" lets you do operations, either it reads config files on the side, or doesn't authenticate
[22:46] * jim (~chatzilla@c-71-202-13-33.hsd1.ca.comcast.net) Quit (Quit: ChatZilla 0.9.87 [Firefox 4.0.1/20110609040224])
[22:47] <sagewk> wido: someone needs to find old discussion in archive, and propose a schema change that hopefully addresses the previous concerns and is also sane wrt librados/rbd
[22:47] * jim (~chatzilla@c-71-202-13-33.hsd1.ca.comcast.net) has joined #ceph
[22:47] <wido> sagewk: Ah, get it :-) Yeah, no problem
[22:48] <sagewk> wido: this was all added in 036ad5052b43fe9f0d197e89fd16715950408e1d
[22:48] <wido> When looking at the current implementation, gregaf's proposition to put it in the "name" attribute isn't that bad (to keep a general format), but should be discussed with the libvirt guys
[22:49] <sagewk> i suspect name shows up in various places and shouldn't include config gobbledygook...
[22:49] <sagewk> the xml is all parsed into a struct so the schema should be as simple as possible (i.e. not arbitrary xml mapping to rados config key/value pairs)
[22:50] <sagewk> http://tracker.newdream.net/issues/1432
[22:51] <wido> sagewk: Tnx. It's still vacation here and I'm needed on some big other projects, so my time is limited lately
[22:51] <wido> But I'll set up the new libvirt and see what could be done
[22:52] <sagewk> wido: the version you're using just feeds a config string to qemu, right?
[22:53] <wido> sagewk: Correct
[22:53] <wido> works pretty nice though :)
[22:53] <sagewk> yeah, that's probably closer to what we really want, actually. at least the ability to add arbitrary config options to that string.
[22:53] <wido> <source path='rbd:rbd/beta:conf=/etc/ceph/ceph.conf'/>
[22:54] <sagewk> wido: also need id=foo or else you're stuck with being client.admin. and ideally you could feed options directly w/o a config file. but yeah
[22:55] <wido> sagewk: Indeed, but in my current env I'm using it that way. Using client.admin everywhere...
[22:56] <sagewk> everyone else apparently is too.. i just noticed the id= part wasn't implemented in qemu this weekend. whoops!
[22:56] <sagewk> pushing that upstream now.
[22:58] <wido> sagewk: Added some personal notes to the issue
[22:58] <sagewk> actually, anyone care to review first? http://ceph.newdream.net/git/?p=qemu-kvm.git;a=commitdiff;h=605314063dd8b2df0ed7c00a5295dfcd69f53d8e
[23:07] <wido> I'm going afk, ttyl!
[23:07] <sagewk> wido: ok ttyl!
[23:09] * verwilst (~verwilst@dD576F4D4.access.telenet.be) has joined #ceph
[23:20] * verwilst (~verwilst@dD576F4D4.access.telenet.be) Quit (Quit: Ex-Chat)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.