#ceph IRC Log


IRC Log for 2012-06-11

Timestamps are in GMT/BST.

[10:21] <renzhi> librados crashes so easily, it's unbelievable
[10:21] <renzhi> I'm having trouble believing anyone uses it to do anything serious
[10:22] <renzhi> does anyone have experience using it in a busy system, i.e. lots of user connections?
[10:23] <renzhi> having every client with its own cluster handle and io context does not work, sharing one cluster handle among multiple io contexts still does not work
[10:23] <renzhi> it still crashes
[10:24] <renzhi> ticket #2524
[10:37] <renzhi> the question: is it safe to have multiple threads share one io context to read and write?
[10:37] <renzhi> I mean, read from and write to different objects
[11:14] * Hexasoft (~hexasoft@ccgecko.in2p3.fr) has joined #ceph
[11:14] <Hexasoft> hello
[11:16] <Hexasoft> where is the best place to report a bug in ceph-kclient code?
[11:55] <joao> Hexasoft, the tracker: http://tracker.newdream.net/projects/ceph-kclient
[11:57] * loicd (~loic@ Quit (Ping timeout: 480 seconds)
[12:07] <Hexasoft> joao: thanks
[12:08] <joao> yw
[12:16] * mtk (trP9T6oSW2@panix2.panix.com) has joined #ceph
[12:23] * renzhi is now known as renzhi_away
[12:33] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:49] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[15:53] * MapspaM is now known as SpamapS
[17:45] <jmlowe> what does this status mean? active+clean+inconsistent
[17:49] <joao> I believe that's an inconsistent pg
[17:49] <jmlowe> fixable
[17:49] <jmlowe> ?
[17:49] <joao> but I'm basing my belief on a grep
[17:50] <joao> I'm sure someone else will be able to answer you more accurately than I am
[17:52] <joao> well, from the looks of the PG code, there is code to perform a repair during scrub
[17:53] <joao> so I'm assuming that it is fixable, but someone else should know better to be honest
[17:53] <joao> :\
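A PG status such as `active+clean+inconsistent` is a set of state flags joined by `+`: the PG is active and clean, but a scrub has found an inconsistency between the PG's copies. The helper below is a minimal sketch for splitting such a status string and spotting PGs that scrub has flagged; the function names are hypothetical and not part of Ceph.

```python
from __future__ import annotations


def parse_pg_state(state: str) -> set[str]:
    """Split a PG status such as 'active+clean+inconsistent' into its flags (hypothetical helper)."""
    return {flag for flag in state.strip().split("+") if flag}


def flagged_by_scrub(state: str) -> bool:
    """True if the status carries the 'inconsistent' flag that scrub sets."""
    return "inconsistent" in parse_pg_state(state)


if __name__ == "__main__":
    for status in ("active+clean", "active+clean+inconsistent"):
        print(status, "->", sorted(parse_pg_state(status)), flagged_by_scrub(status))
```

Treating the status as a set rather than a fixed string means extra flags added alongside `inconsistent` do not break the check.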
[17:57] <joao> nhm_, elder, are you guys around?
[17:58] <elder> I'm here.
[17:58] <joao> do you know the url to our gitbuilder branch compilation queue?
[17:59] <elder> http://ceph.com/gitbuilder.cgi
[17:59] <elder> That's all of them.
[18:00] <joao> that's it :)
[18:00] <joao> thanks
[18:04] <joao> great... my build crapped out because I forgot to remove an include
[18:16] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[18:32] <sagewk> elder: i pushed a couple fixes on top of testing-next
[18:32] <elder> I'll take a look.
[18:32] <sagewk> you may want to squash one or more of them into the relevant commits
[18:35] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[18:35] <elder> Both look great sagewk but I'll look at them closely and will merge it into the relevant commits.
[18:41] <joao> has anyone had problems installing stuff on the planas with apt-get since the upgrade to precise?
[18:44] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[18:51] * yehudasa (~yehudasa@aon.hq.newdream.net) has joined #ceph
[18:54] <elder> sagewk, does the socket state transition one make the reported warnings go away?
[18:55] <sagewk> yeah
[18:55] <sagewk> i was connecting to the daemon on the same host, so the tcp ESTABLISHED came quickly
[18:55] <elder> And the other one, did you observe a problem it fixes, or did you just notice it by inspection?
[18:55] <sagewk> i assume that's why i saw it every time
[18:55] <sagewk> i bisected it down to the embed con in mon_client commit
[18:55] <sagewk> and observed the fix fixed it
[18:56] <elder> So you've verified both fixes?
[18:56] <sagewk> yep!
[18:56] <sagewk> well, i didn't test on non-uml, but i assume that'll happen shortly.
[18:56] <elder> Sweet. I'll commit it shortly with my reviewed-by. They look good.
[18:56] <elder> I'll test it once before I commit to testing.
[18:57] <sagewk> i was trying to confirm that #2478 is fixed.. but uml doesn't issue the scheduling while atomic warnings. need to test that on a real box.
[18:59] * yanzheng (~zhyan@ has joined #ceph
[19:00] <elder> sagewk, are you ready to switch over to the -rc1 version of the testing branch? I have it mostly ready to go.
[19:02] <sagewk> yeah, let's do it
[19:02] <sagewk> testing-next you mean?
[19:05] <elder> Well I have a different version where I merged it rather than rebasing it, but same content.
[19:05] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:23] * chutzpah (~chutz@ has joined #ceph
[19:31] <Tv_> "Disk /dev/vdb: 197kB" -- well, i see my problem..
[19:40] <Tv_> oopsies i attached a qcow2 disk as raw
[19:40] <Tv_> that explains a lot
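Attaching a qcow2 image as raw hands the guest the container file itself, so the disk presumably shows up with the size of the (still mostly empty) image file rather than its virtual size. qcow images begin with the 4-byte magic `QFI\xfb`, followed by a big-endian 32-bit version number (2 or 3 for qcow2), so a quick check before attaching is cheap; `qemu-img info` reports the format as well. A minimal sketch, with hypothetical helper names:

```python
from __future__ import annotations

import struct

QCOW_MAGIC = b"QFI\xfb"  # first four bytes of every qcow/qcow2 image


def qcow_version_from_header(header: bytes) -> int | None:
    """Return the qcow version if `header` starts like a qcow image, else None (hypothetical helper)."""
    if len(header) < 8 or header[:4] != QCOW_MAGIC:
        return None
    # Bytes 4..7 hold the format version as a big-endian unsigned 32-bit integer.
    (version,) = struct.unpack(">I", header[4:8])
    return version


def qcow_version(path: str) -> int | None:
    """Read the first 8 bytes of an image file and check them for the qcow magic."""
    with open(path, "rb") as f:
        return qcow_version_from_header(f.read(8))
```

If `qcow_version(path)` returns a number, the disk should be attached with a qcow2 format setting rather than as raw.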
[19:44] <gregaf> how does bash think that "git fetc" might be an attempt to type "git bisect"?
[19:44] <elder> It's git, not bash
[19:45] <gregaf> right, but w/e ;)
[19:45] <elder> I don't know how it does that though, must have to do with the presence of three characters (ect) from "bisect"
[19:45] <gregaf> it did suggest "fetch" first
[19:45] <jmlowe> *sigh* must have new macbook pro
[19:45] <gregaf> yeah, just a very odd distance check
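git, not bash, makes the suggestion, and it ranks candidates with a weighted edit distance that also allows swapping adjacent characters. Below is a minimal sketch of such a weighted optimal-string-alignment (restricted Damerau-Levenshtein) distance. The weights (swap 0, substitution 2, insertion 1, deletion 3) are our understanding of what git's help.c used at the time and should be treated as an assumption. With a free swap, "fetc" reaches "bisect" by substituting f with b, inserting "is", and swapping the trailing "tc" into "ct", so elder's hunch about the shared "ect" is close; "fetch" remains far cheaper, consistent with it being suggested first.

```python
def weighted_osa_distance(src: str, dst: str, swap: int = 0, sub: int = 2,
                          ins: int = 1, dele: int = 3) -> int:
    """Weighted optimal-string-alignment distance from `src` (typed) to `dst` (candidate).

    The default weights are an assumption modeled on git's suggestion code.
    """
    n, m = len(src), len(dst)
    # d[i][j] = cheapest way to turn src[:i] into dst[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * dele
    for j in range(1, m + 1):
        d[0][j] = j * ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == dst[j - 1] else sub
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match or substitute
                          d[i - 1][j] + dele,       # delete a typed character
                          d[i][j - 1] + ins)        # insert a missing character
            # Adjacent transposition, e.g. "tc" -> "ct".
            if i > 1 and j > 1 and src[i - 1] == dst[j - 2] and src[i - 2] == dst[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + swap)
    return d[n][m]


for candidate in ("fetch", "bisect"):
    print(candidate, weighted_osa_distance("fetc", candidate))
```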
[19:45] <elder> Is the retrospective in 15 minutes?
[19:45] <gregaf> argh, I forgot those announcements were happening right now... damn you, jmlowe!
[19:45] <elder> jmlowe, is that what Apple announced today?
[19:46] <elder> Is that the one more thing?
[19:47] <jmlowe> spec bump on all macbooks, new thin macbook with retina display no optical drive or ethernet
[19:47] <jmlowe> not quite an air not quite a macbook pro
[19:47] <jmlowe> app store, macbook update, new macbook
[19:48] <jmlowe> no ios 6 yet, but it was on the banners
[19:48] <elder> Wow 2880x1800 display
[19:48] <elder> 15.4"
[19:48] <jmlowe> I've done 4 years of 17", might be willing to go down to 15" for this
[19:48] <elder> You could fit a lot of text on that.
[19:49] * BManojlovic (~steki@ has joined #ceph
[19:51] <jmlowe> on to 8 new features in mountain lion
[19:51] <elder> The software features are usually fluff in my opinion.
[19:52] <jmlowe> imessages are kind of nice, no more trying to reply on a touch screen keyboard, if they would only release an sdk so I could tie my zenoss in and still be the cheap bastard I am, not paying for a text message plan
[19:53] <jmlowe> no more growl, notification center instead, dictation a la ipad
[19:54] <joao> no ethernet?
[19:54] <joao> that's a turn off for me
[19:54] <jmlowe> thunderbolt to ethernet dongle
[19:55] <joao> sold separately, and yet another dongle you'll have to carry...
[19:55] <jmlowe> or leave on your network cable
[19:55] <elder> And WiFi anyway. Apple has been the leader in discontinuing certain technologies.
[19:55] <sjust> I just got all excited for the 1080p zenbook...
[19:55] <gregaf> my old 12" Powerbook came with all the dongles you needed :(
[19:55] <sjust> :(
[19:55] <joao> it's not as if ethernet was going away any time soon... :\
[19:55] <jmlowe> 802.11ac
[19:56] <gregaf> err, it doesn't have that, does it?
[19:56] <Tv_> but does it have more leather themes on apps?
[19:56] <Tv_> the macbook air usb-to-ethernet dongle is like $29
[19:57] <Tv_> if you're seriously worried about $29 on top of the laptop price, don't buy apple
[19:57] <Tv_> sagewk: so where are we doing the retrospective?
[19:57] <joao> Tv_, I'm not worried about the price, just about the principle of the whole thing
[19:57] <jmlowe> nope, but when it does and everybody upgrades their access points then ethernet is dead, so by the time you buy the next gen next gen macbook pro you won't need the ethernet port
[19:57] <elder> Tv_, the retrospective is now right?
[19:57] <joao> as in, it's ethernet ffs
[19:57] <Tv_> elder: 3 minutes
[19:58] <elder> OK.
[19:58] <Tv_> elder: snafu
[19:58] <sagewk> fortress?
[19:58] <sagewk> or we can sit around in here
[19:59] <joao> either way it's cool with me :p
[20:00] <elder> I'm all alone on the conference call.
[21:12] <elder> joshd, 'configuration must contain a dictionary of clients'
[21:13] <elder> So now I have to change all my yaml scripts to explicitly list clients?
[21:14] <joshd> elder: for any workunits task, put the existing config in a clients: dictionary, i.e.
[21:14] <joshd> workunits:
[21:14] <joshd>   clients:
[21:14] <joshd>     all: [misc/snaps.sh]
[21:15] <elder> So I need to do that instead of:
[21:15] <elder> tasks:
[21:15] <elder> - workunit:
[21:15] <elder>     all:
[21:15] <elder>       - misc/trivial_sync.sh
[21:15] <elder> ?
[21:16] <joshd> yeah, just add clients: after the '- workunit:' line and indent the rest by two more spaces
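The new layout joshd describes nests the old per-client mapping one level deeper, under a `clients:` key. A minimal sketch on plain Python dicts (what a YAML loader would produce for the snippets above); both function names are hypothetical and this is not teuthology's code:

```python
def migrate_workunit_config(old: dict) -> dict:
    """Wrap an old-style workunit config ({role: [scripts]}) under 'clients' (hypothetical helper).

    Configs that already have a 'clients' dictionary are returned unchanged.
    """
    if isinstance(old.get("clients"), dict):
        return old
    return {"clients": dict(old)}


def check_workunit_config(config: dict) -> None:
    """Hypothetical check that mirrors the error elder hit when 'clients' is missing."""
    if not isinstance(config.get("clients"), dict):
        raise ValueError("configuration must contain a dictionary of clients")


old_style = {"all": ["misc/trivial_sync.sh"]}
new_style = migrate_workunit_config(old_style)
check_workunit_config(new_style)
print(new_style)  # {'clients': {'all': ['misc/trivial_sync.sh']}}
```

Keeping `clients` as its own key leaves room for sibling options next to it, such as the environment variables joshd mentions.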
[21:16] <elder> Is this the way I was supposed to learn this, or am I not watching my e-mail closely enough?
[21:17] <joshd> I didn't send out an email announcing it, sorry
[21:17] <elder> Why the change?
[21:18] <joshd> to allow more configuration of workunits, namely to let you set environment variables for them, so you can e.g. use the same one with new and old rbd formats
[21:18] <elder> Couldn't you make "clients" be an alias for "client.*" or something like that?
[21:20] <joshd> yeah, you could special case any new options, but this was cleaner
[21:21] <elder> Please make sure interface changes that might affect people (me in particular!) get announced before they get implemented.
[21:21] <elder> I just switched over to using the ceph/stable branch to avoid this. Although now I see this particular yaml file doesn't do that...
[21:24] <joshd> will do, sorry for the surprise
[21:25] <joshd> you can revert the last teuthology commit locally if you want to avoid the new syntax for now
[21:25] <elder> It's OK, I just hope it happens a lot less often as time goes on. When testing changes of my own I hate to have unrelated stuff fail, I always assume it's my own fault and it costs a lot of time.
[21:25] <elder> No, I just switched it over.
[21:25] <dmick> whew, Josh's. I was worried there for a minute ;)
[21:26] <elder> OK, that wasn't quite right.
[21:26] <elder> KeyError: 'all'
[21:27] <elder> tasks:
[21:27] <elder> - ceph:
[21:27] <elder>     branch: stable
[21:27] <elder> - interactive:
[21:27] <elder> - rbd:
[21:27] <elder>     all:
[21:27] <elder> - workunit:
[21:27] <elder>     clients:
[21:27] <elder>       all:
[21:27] <elder>         - misc/trivial_sync.sh
[21:27] <elder> What's wrong with that, joshd?
[21:29] <joshd> it looks right, maybe I messed up the all bit
[21:30] <elder> I'm going to back off. Can you take the above and see if you can reproduce my problem, and if you agree it's a problem fix it?
[21:32] <joshd> fixed if you pull
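A `KeyError: 'all'` is what you would expect if `all` were looked up as though it were a client role name. One way to handle the special key is to expand it into every client role before looking anything up. The sketch below uses hypothetical names and role strings; it is not taken from teuthology:

```python
from __future__ import annotations


def expand_clients(clients: dict, client_roles: list[str]) -> dict:
    """Expand {'all': scripts} into {role: scripts} for each client role (hypothetical helper).

    Mixing 'all' with explicit roles is rejected to keep the meaning unambiguous.
    """
    if "all" in clients:
        if len(clients) != 1:
            raise ValueError("'all' cannot be combined with explicit client roles")
        return {role: clients["all"] for role in client_roles}
    return dict(clients)


print(expand_clients({"all": ["misc/trivial_sync.sh"]}, ["client.0", "client.1"]))
```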
[21:32] <yehudasa> dmick: wip-2516-2
[21:33] <dmick> ty yehudasa
[21:37] <elder> Looks better joshd, thanks a lot.
[21:40] <joshd> great, glad I could fix the trouble I caused :)
[21:43] * jasoor (~jasoor@1RDAACKHE.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:53] <yehudasa> dmick: I just forced updated that branch, omitting one unrelated commit that I pushed to master
[21:55] <dmick> ok
[22:29] <joao> is it just me, or is the connection to the planas painfully slow?
[22:31] <elder> You don't seem painfully slow to me.
[22:32] <rturk> lol
[22:36] <joao> lol
[23:03] <dmick> dunno
[23:10] <Tv_> sagewk: "ceph ... osd crush set 1 osd.1 1 host=inst01" results in "(22) Invalid argument" with ceph-mon logging "error: didn't find anywhere to add item 1 in {host=inst01}" -- what am I doing wrong with crushmap?
[23:10] <Tv_> yes inst01 is not defined in any way in the crushmap -- should it be? that makes this way more complicated..
[23:11] <sagewk> host probably isn't in there yet.. you need to specify enough ancestors so that it knows where to put it
[23:11] <sagewk> e.g. 'osd crush set 1 osd.1 1 host=foo rack=bar pool=default'
[23:13] <Tv_> sagewk: does the order matter?
[23:14] <sagewk> nope
[23:14] <sagewk> crush knows the 'ranking' of the types
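Because CRUSH knows the rank of each bucket type, the `type=name` location arguments can be given in any order. A minimal sketch, assuming the type ranking implied by Tv_'s command (host below rack, row, room, datacenter and pool); the helper is hypothetical and is not Ceph's argument parser:

```python
from __future__ import annotations

# Bucket types from lowest to highest; this ranking is an assumption taken
# from the order of the arguments in the commands in this log.
TYPE_RANK = {"host": 1, "rack": 2, "row": 3, "room": 4, "datacenter": 5, "pool": 6}


def parse_location(args: list[str]) -> list[tuple[str, str]]:
    """Parse 'type=name' arguments and return them ordered from host upward."""
    location = {}
    for arg in args:
        bucket_type, sep, name = arg.partition("=")
        if not sep or not name or bucket_type not in TYPE_RANK:
            raise ValueError(f"bad location argument: {arg!r}")
        location[bucket_type] = name
    return sorted(location.items(), key=lambda item: TYPE_RANK[item[0]])


a = parse_location("host=foo rack=bar pool=default".split())
b = parse_location("pool=default host=foo rack=bar".split())
print(a == b, a)  # True [('host', 'foo'), ('rack', 'bar'), ('pool', 'default')]
```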
[23:14] <Tv_> oh man this gets confusing
[23:14] <Tv_> an earlier command failed, now it works
[23:15] <Tv_> because apparently i just "taught" it something about the existing options
[23:15] <Tv_> ?
[23:15] <sagewk> a different command created that host, and now it can add it?
[23:15] <Tv_> ubuntu@inst01:~$ sudo ceph --cluster=ceph --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush set 1 osd.1 1 pool=p datacenter=a room=b row=c rack=d host=inst01
[23:15] <Tv_> ...
[23:15] <Tv_> (22) Invalid argument
[23:15] <Tv_> after using what you said
[23:15] <dmick> "That's the signpost up ahead - your next stop, the Twilight Zone"
[23:15] <Tv_> ubuntu@inst01:~$ sudo ceph --cluster=ceph --name=osd.1 --keyring=/var/lib/ceph/osd/ceph-1/keyring osd crush set 0 osd.0 1 pool=p datacenter=a room=b row=c rack=d host=inst01
[23:15] <Tv_> (is ok)
[23:16] <Tv_> whereas it earlier failed on osd.1; once i used your succeeding command on osd.1, the earlier command now works on osd.0
[23:16] <sagewk> can you pastebin 'ceph osd tree'?
[23:17] <Tv_> http://pastebin.com/tkDQkung
[23:17] <Tv_> i'll happily re-run this from scratch to gather more evidence
[23:18] <sagewk> i take it the set 1 one now succeeds?
[23:18] <Tv_> yeah, now they all work
[23:18] <sagewk> yeah, from scratch with 'ceph osd tree' between stages please.. something taught it where inst01 goes..
[23:19] <Tv_> yeah
[23:19] <sagewk> (in rack=bar, pool=default)
[23:31] <Tv_> sagewk: http://pastebin.com/m4XPK7hc
[23:31] <Tv_> sagewk: it seems it was triggered by the pool=p
[23:31] <Tv_> sagewk: and *once* it sees it with pool=default, from there on it just ignores pool=p
[23:33] <Tv_> so i'll just leave pool= out from my example, i guess
[23:33] <Tv_> haven't reached enlightenment on this yet though ;)
[23:41] <Tv_> ok it seems pool=default is really important there
[23:42] <Tv_> or it'll basically ignore anything new it sees
[23:47] <Tv_> this thing is highly deceptive
[23:47] <Tv_> what it says just isn't what it does
[23:49] <Tv_> oh
[23:49] <Tv_> so it refuses to move a host, even if it is given new evidence about the location of the host
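The behaviour Tv_ and sagewk converge on can be modeled as follows: walk the location from the lowest type upward, stop at the first bucket that already exists, create any missing buckets below it, and ignore everything above it. If none of the named buckets exist, there is nowhere to attach the item, which matches the `(22) Invalid argument` / "didn't find anywhere to add item" failure. This is a toy model written to explain the observations in this log, not Ceph's implementation; the class and method names are hypothetical, and the type ordering is assumed from the commands above.

```python
from __future__ import annotations

# Lowest to highest; an assumption based on the commands in this log.
TYPE_ORDER = ["host", "rack", "row", "room", "datacenter", "pool"]


class ToyCrushMap:
    """Toy model of the placement rule observed above (hypothetical, not Ceph code)."""

    def __init__(self) -> None:
        # bucket name -> parent bucket name (None for a top-level bucket)
        self.parent: dict[str, str | None] = {"default": None}
        self.items: dict[str, str] = {}  # item name -> bucket it lives in

    def set_item(self, item: str, location: dict[str, str]) -> str:
        """Place `item` using the first already-existing bucket named in `location`."""
        chain = [location[t] for t in TYPE_ORDER if t in location]
        for depth, bucket in enumerate(chain):
            if bucket in self.parent:
                # Create the missing lower buckets beneath the existing one;
                # anything higher up in `location` is ignored.
                for child, par in zip(chain[:depth], chain[1:depth + 1]):
                    self.parent[child] = par
                self.items[item] = chain[0]
                return chain[0]
        raise ValueError(f"(22) Invalid argument: nowhere to add {item} in {location}")


crush = ToyCrushMap()
full = {"pool": "p", "datacenter": "a", "room": "b", "row": "c", "rack": "d", "host": "inst01"}
try:
    crush.set_item("osd.1", full)  # fails: nothing named in `full` exists yet
except ValueError as err:
    print(err)
crush.set_item("osd.1", {"host": "inst01", "rack": "bar", "pool": "default"})
crush.set_item("osd.0", full)  # now succeeds, because host inst01 exists
print(crush.parent["inst01"])  # bar: the host is not moved
```

In this model, leaving out an existing top-level bucket such as `pool=default` on the first command is exactly what makes placement fail, and later "evidence" about a host that already exists is silently ignored.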
[23:59] <sagewk> tv_: exactly

