#ceph IRC Log


IRC Log for 2012-04-13

Timestamps are in GMT/BST.

[0:01] <gregaf> perplexed: are you running both commands from the same host?
[0:01] <perplexed> yes
[0:01] <gregaf> how long is the name it's claiming is incorrect?
[0:02] <gregaf> *name of the object
[0:02] <perplexed> phx7b01c-87e7_24268_object260 for example.. there are many w different Id's
[0:03] <perplexed> hostname was shortened to avoid the rados bench issue.
[0:04] <gregaf> hrm, that's 29 characters? and I think it's got a 30-byte buffer, of which the last is going to be used as a \0, probably?
[0:04] <gregaf> so we're still running into that
[0:04] <perplexed> I'll shorten further for the test... will keep you posted
[0:07] <gregaf> sorry :( this is fixed in 0.45, right joshd?
[0:09] <joshd> yeah, fixed in 0.45
[0:10] <perplexed> I'm on 0.43 currently, so shortening hostname to 3 chars for the moment should work around it in the short term.
[0:11] <dmick> "object" seems like a great string to compress away
[0:13] <dmick> but I guess that comes out of the tool. never mind.
[0:17] <gregaf> yeah, the issue just comes out of rados bench's growth from cheap hack to something that mostly works
[0:18] <gregaf> used to just be object[number], which fit into 30 chars just fine, but then it got extended to include hostname so you could run more than one without them clashing, but the name buffer didn't get a corresponding increase in length
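
A minimal sketch of the arithmetic gregaf describes, written in Python rather than taken from the tool itself. The 30-byte buffer and the byte reserved for the terminating NUL come from the discussion above; the helper name `fits_in_buffer` is hypothetical, and the longer name is made up by adding one digit to perplexed's example.

```python
# Illustrative only: this is not rados bench's actual code.
BUFFER_SIZE = 30  # buffer size quoted in the discussion above


def fits_in_buffer(name, buffer_size=BUFFER_SIZE):
    """Return True if name plus one trailing NUL byte fits in buffer_size bytes."""
    return len(name.encode("utf-8")) + 1 <= buffer_size


example = "phx7b01c-87e7_24268_object260"   # name quoted by perplexed
longer = "phx7b01c-87e7_24268_object1000"   # hypothetical: one more digit

print(len(example), fits_in_buffer(example))
print(len(longer), fits_in_buffer(longer))
```

The exact boundary in the real tool may differ from this sketch; the point is only that a hostname plus a growing object number leaves very little headroom in a buffer sized for `object[number]` alone.
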
[0:18] <dmick> yeah, I was just momentarily confused about options. cheap hacks happen
[0:18] <nhm> gregaf: speaking of which, has anyone actually tested what rados bench is capable of on the client side writing to dummy output?
[0:18] <gregaf> no, but it's reasonably efficient
[0:19] <perplexed> gregaf: Worked like a charm... thx
[0:19] <gregaf> perplexed: glad to hear it!
[0:20] <gregaf> nhm: it's reusing the old buffers for new objects; the most expensive thing it does is the locking and signaling around requests being finished
[0:21] <nhm> gregaf: how are concurrent ops handled?
[0:21] <gregaf> not sure what you mean?
[0:22] <nhm> gregaf: ie, is data generation and IO handled in separate threads?
[0:22] <nhm> gregaf: actually, what kind of data is being generated?
[0:23] <gregaf> no, but like I said it's reusing the old buffers; the extent of data generation consists of printing "I'm object [name]!" into a buffer the rest of which is zeros
[0:23] <nhm> ah, ok
[0:23] <gregaf> and then the next IO in that "slot" overwrites with its own name
[0:23] <gregaf> oh, sorry it's "I'm the %dth object!"
[0:23] <nhm> gregaf: I'll try to record a mental note to keep that in mind when looking at whether our SSDs support compression.
[0:25] <gregaf> yeah, the zero thing could be a problem eventually, but it won't be much of one since it should still be network-limited and nothing's compressing *that* :)
[0:25] <nhm> gregaf: yeah, I was thinking for the journal tests on SSD it may make the SSD look a lot better than it is for real workloads...
[0:25] <nhm> now that we are going to be testing on 10G
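
To illustrate nhm's concern, here is a rough Python sketch comparing how well a mostly-zero payload compresses against random data. The header string follows the format gregaf quotes; the 1 MiB object size, the use of `zlib`, and the helper names are assumptions for illustration and say nothing about what any particular SSD controller does.

```python
import os
import zlib

OBJECT_SIZE = 1 << 20  # 1 MiB, chosen only to keep the demo quick (assumption)


def bench_like_payload(index, size=OBJECT_SIZE):
    """Short text header followed by zeros, as described in the discussion."""
    header = "I'm the {}th object!".format(index).encode("ascii")
    return header + bytes(size - len(header))


def compressed_fraction(data):
    """Compressed size divided by original size (smaller = more compressible)."""
    return len(zlib.compress(data)) / len(data)


zero_heavy = bench_like_payload(7)
random_data = os.urandom(OBJECT_SIZE)

print("bench-like payload: {:.4f}".format(compressed_fraction(zero_heavy)))
print("random payload:     {:.4f}".format(compressed_fraction(random_data)))
```

A device or layer that compresses transparently would see the first kind of payload as almost free, which is why benchmark numbers from zero-filled writes may flatter it.
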
[0:26] <gregaf> heh
[0:26] <gregaf> well I think it's just sandforce controllers, which I doubt we have
[0:26] <gregaf> and one shouldn't be using rados bench without a good understanding of its limitations ;)
[0:34] <nhm> gregaf: I'd check on the ssds, but it looks like the networking is down again?
[0:35] <gregaf> nhm: I don't think the status has changed much; there's flapping going on due to some router thing which is making our ISP routers unhappy; they're working on it but haven't resolved anything
[0:35] <dmick> nhm: ought we add megacli to the chef recipe?
[0:35] <dmick> i.e. installing the pkg
[0:35] <nhm> dmick: I'm for it.
[0:36] <nhm> dmick: also, if you look on burnupi01, I've got a /etc/hosts with all of the short names for the test nodes, and /etc/ssh/ssh_known_hosts for (almost) all of the hosts. We may want to make sure ssh keys survive reinstalls too...
[0:37] <nhm> dmick: I'm doing that manually for the test clusters right now, but we could make it part of the ceph install.
[0:37] <nhm> sorry, chef
[0:39] <dmick> short names raise the confusion of "did you mean 1G or 10G"
[0:39] <dmick> although I understand
[0:40] <Tv_> and /etc/hosts just plain old sucks ;)
[0:40] <Tv_> (you can get the same effect from a single search line in /etc/resolv.conf, if that's what you really want)
[0:41] * gregorg (~Greg@ Quit (Ping timeout: 480 seconds)
[0:47] * gregorg (~Greg@ has joined #ceph
[0:50] <nhm> dmick: according to the docs, it's required for mkcephfs.
[0:50] <nhm> dmick: http://ceph.newdream.net/docs/master/ops/install/mkcephfs/
[0:51] <dmick> well, if you want to use the short hostnames in ceph.conf
[0:51] <dmick> I don't think there's any other dependency
[0:51] <dmick> and I bet resolv.conf would fix that too
[0:58] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[1:00] <Tv_> nhm: what on that page says "/etc/hosts"?
[1:04] <nhm> Tv_: short names
[1:07] <Tv_> nhm: that's.. not.. how.. name resolution.. works..
[1:07] <nhm> Tv_: you aren't making any sense.
[1:07] <Tv_> nhm: you don't need /etc/hosts for anything but "things needed before dns comes up"; e.g. current hostname
[1:08] <Tv_> ceph.conf host var is used for two things:
[1:08] <Tv_> 1. what daemons to start on this node; string comparison against /bin/hostname, nothing to do with hosts or dns
[1:09] <Tv_> 2. mkcephfs destinations to ssh to; ~/.ssh/config or just a dns search domain work fine
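
Tv_'s first point, that the `host` value is matched by plain string comparison against the machine's hostname, can be sketched in Python as follows. This is an illustration, not Ceph's init script: the sample configuration, the section and host names, and the function `daemons_for_host` are all hypothetical.

```python
import configparser

# Hypothetical ceph.conf fragment; section and host names are made up.
SAMPLE_CONF = """
[mon.a]
host = node1

[osd.0]
host = node1

[osd.1]
host = node2
"""


def daemons_for_host(conf_text, hostname):
    """Return the sections whose 'host' value equals hostname exactly."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return [
        section
        for section in parser.sections()
        # Plain string equality: no DNS or /etc/hosts lookup is involved.
        if parser.get(section, "host", fallback=None) == hostname
    ]


print(daemons_for_host(SAMPLE_CONF, "node1"))
print(daemons_for_host(SAMPLE_CONF, "node1.example.com"))
```

Because the match is a string comparison, a fully qualified name does not match a short `host` entry in this sketch; resolving the short name for ssh is a separate concern, which is the distinction Tv_ is drawing.
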
[1:11] <nhm> Tv_: I never said we need /etc/hosts, I said the documentation says you need to be able to SSH to all of the nodes using short host names. Given that I may use any given test node as an admin node, it makes sense to have that already setup on any of them that I might use.
[1:13] <dmick> and of course /etc/hosts makes that resilient to DNS failures (as would .ssh/config, more verbosely). So, tradeoffs.
[1:14] * yehudasa (~yehudasa@aon.hq.newdream.net) has joined #ceph
[1:16] <Tv_> dns is not one of those services i'm really afraid of failing, it's so easy to host reliably
[1:25] * Tv_ (~tv@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[1:32] <buck> question about 0.45. We're still on ubuntu 10.10 (we're going to update to 12.04 once it's out)
[1:33] <buck> 10.10 doesn't have python
[1:33] <buck> it has 2.6.6
[1:33] <buck> does anyone know "how" hard of a dependency this is?
[1:33] <buck> if I massage the build files, am I asking for trouble?
[1:34] <sagewk> no idea.. i got that from http://wiki.debian.org/Python/TransitionToDHPython2
[2:01] * lofejndif (~lsqavnbok@1RDAAAS77.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[2:09] * yehudasa (~yehudasa@aon.hq.newdream.net) Quit (Quit: Ex-Chat)
[2:17] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[2:41] <nhm> gregaf: looks like the Dell SSDs we have are rebranded samsung SM825s.
[2:41] * buck (~buck@bender.soe.ucsc.edu) Quit (Quit: Leaving)
[2:47] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:49] <gregaf> nhm: I assume that means Samsung actually made them, since they make their own chipsets
[2:49] <gregaf> back later tonight
[2:49] <nhm> gregaf: yeah, everything internal should be samsung
[2:51] <nhm> gregaf: no compression issues there. Looks like the 200GB model gets about 200MB/s. The 100GB model may be somewhat slower.
[2:51] <joao> sagewk, still around?
[2:51] <nhm> joao: and you accused me of not sleeping. ;)
[2:52] <joao> eheh
[2:52] <joao> I've been "heading to bed" for the last 40 minutes
[2:53] <sage> joao: here for a minute
[2:53] <joao> sage, did you see what I told you on gtalk?
[2:53] <sage> nope
[2:53] <joao> copy pasting it then
[2:54] <joao> now I'm definitely heading to bed :p
[2:54] <joao> see you guys tomorrow
[2:55] * joao (~JL@ Quit (Quit: Leaving)
[2:58] * danieagle (~Daniel@ has joined #ceph
[3:10] * perplexed (~ncampbell@ has left #ceph
[3:13] * perplexed_ (~ncampbell@ has joined #ceph
[3:19] * perplexed_ (~ncampbell@ Quit (Remote host closed the connection)
[3:29] <elder> joshd are you still around?
[3:30] <joshd> yeah
[3:39] <elder> Nevermind, I think I figured it out myself.
[3:39] <elder> I'll be back to you soon if not...
[3:40] <joshd> ok, I'm around for a little longer
[4:15] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[4:42] * chutzpah (~chutz@ Quit (Quit: Leaving)
[5:07] * adjohn (~adjohn@ Quit (Quit: adjohn)
[5:54] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[6:17] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[6:25] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Ping timeout: 480 seconds)
[6:40] * f4m8_ is now known as f4m8
[6:44] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:54] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[6:54] * cattelan_away is now known as cattelan_away_away
[6:58] * tjikkun (~tjikkun@82-169-255-84.ip.telfort.nl) Quit (Remote host closed the connection)
[7:05] * tjikkun (~tjikkun@82-169-255-84.ip.telfort.nl) has joined #ceph
[7:08] * __nolife (~Lirezh@83-64-53-66.kocheck.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[7:18] * loicd (~loic@magenta.dachary.org) has joined #ceph
[8:15] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[9:11] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[9:21] * loicd (~loic@ has joined #ceph
[9:23] <wonko_be> is there a way to make the "ceph -s" or similar command time out when there is no monitor available (or the quorum hasn't been reached)?
[9:25] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[11:08] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[11:08] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[11:20] * __jt__ (~james@jamestaylor.org) Quit (Remote host closed the connection)
[11:33] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:53] * joao (~JL@ has joined #ceph
[12:54] <joao> hi all
[13:38] * lofejndif (~lsqavnbok@83TAAEYGG.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:08] * stxShadow (~Jens@ip-78-94-239-132.unitymediagroup.de) has joined #ceph
[14:10] * stxShadow (~Jens@ip-78-94-239-132.unitymediagroup.de) has left #ceph
[14:14] <wonko_be> any idea why I sometimes get these: 2012-04-13 14:13:26.325222 osd.4 2 : [WRN] map e36 wrongly marked me down or wrong addr
[14:15] <ceph-test> may be incorrect ceph.conf
[14:52] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[14:59] <Dieter_b1> hey wonko_be , how's your ceph setup doing? i find it interesting how you're using such a bleeding edge thing as ceph for your backups
[15:00] <Dieter_b1> but then again, i guess there aren't many decent distributed network block devices in existence
[15:02] * lofejndif (~lsqavnbok@83TAAEYGG.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[15:14] * loicd (~loic@ Quit (Quit: Leaving.)
[15:14] * loicd (~loic@ has joined #ceph
[15:20] <wonko_be> Dieter_b1: let me say the ride has been ... full of adventure
[15:21] <Dieter_b1> :)
[15:27] <Dieter_b1> wonko_be: maybe we should meet up for lunch/twunch/drinks/whatever sometime.
[15:27] * ss7pro (~ss7pro@static.nk-net.pl) has joined #ceph
[15:27] <ss7pro> Hi
[15:27] <ss7pro> is there anybody here ?
[15:27] <Dieter_b1> no
[15:28] <Dieter_b1> the 89 people your client tells you about are not here
[15:28] <Dieter_b1> anyway, what's up?
[15:28] <nhm> Dieter_b1: to be fair, most of them probably aren't actually there...
[15:28] <ss7pro> I'm wondering about these log messages from here: http://paste.openstack.org/show/12783/
[15:29] <Dieter_b1> true nhm
[15:29] <ss7pro> could anybody tell me what's the problem ?
[15:41] * f4m8 is now known as f4m8_
[15:41] <wonko_be> Dieter_b1: you're using ceph somewhere already?
[15:43] * lofejndif (~lsqavnbok@04ZAACMEZ.tor-irc.dnsbl.oftc.net) has joined #ceph
[15:51] <Dieter_b1> no, but we are considering it for a new project
[15:52] <Dieter_b1> in fact, we had a confcall with them yesterday
[16:01] <Dieter_b1> btw wonko_be do you have chef ceph recipes?
[16:02] <Dieter_b1> aha https://github.com/wonko/ceph-cookbook
[16:06] * lofejndif (~lsqavnbok@04ZAACMEZ.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[16:10] * oliver1 (~oliver@p4FECFB3D.dip.t-dialin.net) has joined #ceph
[16:10] <nhm> ss7pro: I'm guessing we'll probably need more info than that to debug the problem...
[16:32] <ss7pro> nhm: what more do you need?
[16:33] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[16:36] <nhm> ss7pro: well, to start out with, what are you trying to do, and what isn't working right?
[16:39] <ss7pro> It seems that everything is working
[16:39] <ss7pro> but I'm wondering why these messages are appearing in the log stream
[16:45] <nhm> ss7pro: maybe related to this? http://tracker.newdream.net/issues/1602
[16:46] <nhm> ss7pro: what version of ceph?
[16:58] <ss7pro> nhm: ubuntu@tytan-1:~/osrbd$ ceph -v ceph version 0.44.1 (commit:c89b7f22c8599eb974e75a2f7a5f855358199dee) ubuntu@tytan-1:~/osrbd$
[17:01] <nhm> ss7pro: anything interesting in the mon or mds logs?
[17:01] <ss7pro> And my message comes from osd
[17:01] <ss7pro> not mon
[17:02] <ss7pro> in mon
[17:02] <ss7pro> only sth like this
[17:02] <ss7pro> 2012-04-13 15:36:43.666154 7fd5281f4700 mon.0@0(leader) e1 handle_command mon_command(mon stat v 0) v1 2012-04-13 16:57:03.437856 7fd5281f4700 mon.0@0(leader) e1 handle_command mon_command(-V v 0) v1
[17:02] <ss7pro> but this is not related to previous message
[17:03] <nhm> ss7pro: ok, might be best to wait for one of the other guys to get here then. Folks should be around in an hour or two.
[17:04] <ss7pro> thanks :-)
[17:04] <nhm> ss7pro: np, sorry I couldn't help more. cranking debugging up might reveal more if you want to play with it.
[17:05] <ss7pro> That's good idea
[17:19] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[17:27] * cattelan_away_away is now known as cattelan_away
[17:28] * Tv_ (~tv@aon.hq.newdream.net) has joined #ceph
[17:42] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:44] <ss7pro> I have caught some more logs (with debug turned on)
[17:44] <ss7pro> http://paste.openstack.org/show/12786/ http://paste.openstack.org/show/12787/
[17:44] <ss7pro> For me it doesn't explain anything ;(
[17:46] * oliver1 (~oliver@p4FECFB3D.dip.t-dialin.net) has left #ceph
[17:47] <joao> sagewk, it just failed on a coll_move :p
[17:47] * joao digs in
[17:54] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Read error: Connection reset by peer)
[17:56] * cattelan (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[18:11] * adjohn (~adjohn@ has joined #ceph
[18:11] * adjohn (~adjohn@ Quit ()
[18:17] <wonko_be> Dieter_b1: I've sent a mail to the list, but somehow it got lost...
[18:17] <wonko_be> strange, I'll retry later today
[18:18] <wonko_be> (concerning the chef cookbooks)
[18:18] <Dieter_b1> wonko_be: okay cool thanks.
[18:18] <wonko_be> feel free to use them
[18:18] <Dieter_b1> that's the point right :)
[18:19] <Dieter_b1> actually one of the ceph engineers told me yesterday he's also working on chef cookbooks or recipes
[18:19] <wonko_be> yeah, Tv_ is the guy you need
[18:19] <wonko_be> but the cookbooks on the ceph repo are 7 months old
[18:19] <Tv_> yeah i'm getting back on that track
[18:20] <Tv_> got interrupted by needing to set up the new test lab
[18:20] <wonko_be> Tv_: feel free to check mine, you know, and take anything you need from it
[18:20] <Dieter_b1> Tv_: were you one of the guys on the call yesterday?
[18:20] <Tv_> wonko_be: the problem with the 3rd party cookbooks i've seen is that they tend to just codify existing bad practices, instead of going out & making things better
[18:21] <Tv_> wonko_be: i'm actually working with the premise of getting code changes in ceph.git when needed to make things easier, etc
[18:21] <Tv_> Dieter_b1: yes
[18:21] <Dieter_b1> okay, hi again then :)
[18:21] <wonko_be> Tv_: the big problem I'm facing is that I have to work with the thing I have, and that I can't wait to make the cookbooks till "all is done"
[18:21] <Tv_> Hi!
[18:21] <wonko_be> I needed them now, to streamline my tests
[18:22] <wonko_be> the step to publish them was small, and it is the only thing I can do to contribute back
[18:22] <Tv_> wonko_be: yeah, i know; that's the situation with every 3rd party cookbook i've looked at, and that's why i'm not merging any of them
[18:23] <wonko_be> I can't make cookbooks with technology that doesn't exist yet
[18:23] <Tv_> hence, me making that technology ;)
[18:23] <wonko_be> I've incorporated most of the stuff that was discussed on the mailing list
[18:24] <wonko_be> Tv_: as long as it stays inaccessible to others, they can't help you :)
[18:24] <Tv_> wonko_be: yeah please do think of the cookbooks i'm working on as the "next gen"
[18:24] <Tv_> that's really what they are
[18:24] <wonko_be> I'm very curious, and I know that I'm doing redundant work
[18:25] <wonko_be> do know that all the "moving disks should be possible" and "growing clusters", are all in mine, with what exists
[18:25] <wonko_be> not noob-ready, but clever-sysadmin-ready
[18:26] <Tv_> wonko_be: i'm looking for the "tape monkey on roller skates" level, where disks can be grabbed from a pile of spares and just plugged in
[18:26] <Tv_> wonko_be: because that's fundamentally what even we internally need
[18:27] <Tv_> as in, i've been to our data center exactly once, mostly to look around
[18:28] <elder> Tv_, I have a Python question. Trying to create the path to a device I use: test_dev = '/dev/rbd/rbd/{image}'.format(image=test_image),
[18:28] <elder> but the result I get is: ('/dev/rbd/rbd/test_image_rbd',)
[18:28] <Tv_> and i really believe it's not that difficult either; i'm more distracted by other things than having difficulty programming it
[18:28] <Tv_> elder: comma at the end makes it a tuple
[18:28] <Tv_> elder: foo, == (foo,)
[18:28] <elder> Ahh.
[18:28] <Tv_> elder: foo, bar == (foo, bar)
[18:29] <Tv_> yeah i wish that was a syntax error
[18:29] <Tv_> without the parens
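
A short Python sketch of the behaviour Tv_ describes. The value of `test_image` is an assumption inferred from the output elder pasted; everything else is standard Python.

```python
test_image = "test_image_rbd"  # assumed value, inferred from elder's output

# Trailing comma: the right-hand side becomes a one-element tuple.
with_comma = '/dev/rbd/rbd/{image}'.format(image=test_image),

# No trailing comma: a plain string, which is what was intended.
without_comma = '/dev/rbd/rbd/{image}'.format(image=test_image)

print(type(with_comma).__name__, with_comma)
print(type(without_comma).__name__, without_comma)
```

If a tuple really is wanted, writing the parentheses explicitly, as in `(value,)`, makes the intent visible to the next reader.
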
[18:29] <wonko_be> Tv_: the cookbook I created might help early adopters then, willing to give Ceph a spin...
[18:30] <wonko_be> (should I do crazy stuff to post a new thread to the mailing list?)
[18:30] <Tv_> wonko_be: also, would you like a job writing Ceph cookbooks etc install/admin stuff?-)
[18:30] <elder> I see it now.
[18:30] <wonko_be> Tv_: I'm running a company over here, but I'm willing to participate (and put some personnel of mine on it)
[18:30] <elder> Maybe a warning at least, since it's conceivable that's what you wanted.
[18:31] <Tv_> elder: you can always use parens, if you wanted it
[18:31] <Tv_> elder: it's one of the remaining warts of python
[18:31] <elder> I'm getting the hang of it.
[18:31] <elder> Now I know one more thing to look fo.
[18:31] <elder> Yo.
[18:33] <wonko_be> Tv_: If you would give some insight (just publish them) in your cookbooks, I would be more than happy to join efforts and work on one set
[18:39] * bchrisman (~Adium@ has joined #ceph
[18:41] <Tv_> wonko_be: i'm sorry the future design is not well written down; partially because it's pending investigation into underlying tools
[18:41] <Tv_> wonko_be: me & Sage started the "braindump" mailing list threads to get some of it out
[18:42] <Tv_> wonko_be: basically, i'm looking for two big missing chunks: automatic osd creation/startup based on disks that happen to be there, and reliable automatic bringup of new mon clusters
[18:43] <Tv_> wonko_be: all of this with minimal configuration and no pre-dictated IP addresses etc
[18:43] <Tv_> as in, automatic, not configured
[18:46] <darkfaded> guys, i
[18:46] <darkfaded> argh.
[18:46] <darkfaded> (german keymap is not for me)
[18:47] <darkfaded> i'm just updating our storage class description to include ceph officially now. does anyone have a problem if i point out that this is the first and so far only training on ceph?
[18:48] <Tv_> darkfaded: can you email Dona and work with her on the wording?
[18:48] <darkfaded> sure
[18:49] * perplexed (~ncampbell@ has joined #ceph
[18:59] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[19:02] <wonko_be> Tv_: the automatic creation is there already
[19:02] <wonko_be> and my mon cluster starts as soon as more than 2 nodes are online, and have the mon recipe applied
[19:03] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[19:04] <Tv_> wonko_be: we went through the design and ended up thinking it can't be done in a safe way without using new features in the monitors themselves; i haven't looked at your cookbook but i hope you realize the kind of differences I'm looking at
[19:05] <buck> sagewk: I'm struggling to get a core file with symbols in it. Guessing I'm missing something obvious .Is there a generic "how to get a good core dump" wiki you could point me at?
[19:05] <wonko_be> Tv_: I think you want to make cookbooks using features that aren't there yet
[19:05] <wonko_be> which is okay for long-term planning, but some people need them now too
[19:06] <Tv_> wonko_be: yes, i fully intend to improve the installability/manageability of Ceph as a whole, not limiting myself to any part of it
[19:06] <wonko_be> we are not that far apart on opinions, but I'm outside the ceph project, just a regular user, and you're inside, knowing what will come
[19:07] <Tv_> wonko_be: we're trying to be very open, and i would really like to have more people participate
[19:07] <gregaf> ss7pro: those debug messages are just standard messenger timeouts (because there's no traffic between them, so they don't need to maintain them) and reconnects, unless they're spamming out more frequently than in your first snippet :)
[19:07] <Tv_> wonko_be: i'm just trying to make you understand why i don't "just merge" cookbooks
[19:08] <wonko_be> Tv_: i'm not at all asking you to merge/use/... my cookbooks, I'll happily maintain them till Ceph comes with a solution
[19:08] <wonko_be> I need them anyway, so ...
[19:08] <Tv_> wonko_be: monitor cluster bringup using just Chef features is either racy, not automatic, or horribly slow (not even sure if the last one exists as an option)
[19:08] <gregaf> wonko_be: going back to your questions, making ceph -s time out is I guess possible, but I'm not sure why it's interesting compared to ^c
[19:09] <wonko_be> gregaf: chef can't ctrl-c
[19:09] <gregaf> and the "wrongly marked me down" is the OSD protesting that it thinks it was up when it got marked down, but it can happen if it went unresponsive for a while or something
[19:09] <gregaf> wonko_be: ah, of course
[19:09] <gregaf> well, file a bug! :)
[19:10] <wonko_be> gregaf: I was going to use it to see if I can extract a monmap, add a osd, or manipulate the crushmap using the ceph command, ... but only if my monitors are somehow in an okay shape
[19:10] <Tv_> wonko_be: /usr/bin/timeout might be your friend
[19:10] <wonko_be> Tv_: true, but having it in ceph might be cleaner
[19:10] <Tv_> wonko_be: oh that's racy..
[19:11] <Tv_> wonko_be: just suggesting an immediate fix
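
As a stopgap along the lines Tv_ suggests, a caller can also impose the time limit itself. The Python sketch below uses only the standard library; the helper name `run_with_timeout` is hypothetical, and the demo uses a sleeping Python process as a stand-in for a command that hangs, so it does not require a Ceph installation.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd; return (returncode, stdout), or None if it exceeds `seconds`."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        return None
    return result.returncode, result.stdout


# Stand-in for a command that never answers (e.g. no monitor quorum).
hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(hanging, 0.5))
```

For the case discussed here the call would look like `run_with_timeout(["ceph", "-s"], 10)`, with the 10-second limit picked arbitrarily. As noted in the conversation, checking monitor health first and acting afterwards is still racy; this only keeps an automated run from blocking forever.
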
[19:11] <Tv_> wonko_be: we added a "osd create" subcommand to the ceph tool, to add osds atomically
[19:11] <wonko_be> Tv_: currently, I don't care :) It is in the readme you should first apply the mon recipes till that part is okay
[19:11] <wonko_be> Tv_: i'm using that already
[19:11] <wonko_be> grab the ID back from the output
[19:11] <buck> I have what looks to be a reproducible mds crash and I'm struggling to create a core file with symbols in it. I've installed the *-dbg* packages.
[19:12] <buck> Any other ideas? I'm sure I'm missing something simple
[19:12] <Tv_> wonko_be: you can't have multiple osds' chef scripts doing read-edit-write cycles of the crush map without races
[19:12] <wonko_be> Tv_: indeed, I know
[19:12] <Tv_> wonko_be: unless you use atomic commands like "osd create"
[19:12] <elder> Tv_, it seems like the cleanup (finally) portion of dev_create in teuthology/task/rbd.py does not handle well multiple devices being created.
[19:13] <wonko_be> and i only write the map
[19:13] <Tv_> elder: show me the code and output from a run etc all in one fell swoop and i'll help after the daily
[19:13] <wonko_be> I don't care about the existing map
[19:13] <wonko_be> I just add a new osd
[19:13] <wonko_be> with a rack and a host
[19:13] <elder> Will do. Thanks Tv_
[19:14] <wonko_be> (yes, there is much room for improvement :))
[19:18] <elder> nhm, are you on?
[19:18] <nhm> elder: yep
[19:19] <elder> The call?
[19:19] <nhm> yep
[19:19] <elder> Why are we not?
[19:19] <nhm> who is we?
[19:19] <joao> oh, not again...
[19:19] <nhm> hehe
[19:19] <elder> Joao, Ken, and I.
[19:19] <joao> are you guys in Danger Room?
[19:19] <nhm> awesome
[19:19] <nhm> yep
[19:19] <joao> oh well...
[19:19] <elder> I don't know. Where are we, guys?
[19:21] * grape (~grape@ Quit (Ping timeout: 480 seconds)
[19:21] <elder> OK, well we three signed off, I tried again, and I'm not connecting. Please inform the others.
[19:21] <nhm> elder: yeah, will do.
[19:23] <elder> Tv_, let me know when you're back on again.
[19:23] <joao> yeah, nhm, I am unable to find you guys
[19:23] <joao> I keep having a room all to myself
[19:23] <nhm> joao: I'll let them know what's going on in a bit
[19:23] <elder> Well, please report your status to yourself then, joao.
[19:27] * grape (~grape@ has joined #ceph
[19:27] <elder> Tv_, grabbing some lunch. Back in 15 minutes or so.
[19:28] * loicd (~loic@ Quit (Quit: Leaving.)
[19:28] * buck (~buck@bender.soe.ucsc.edu) Quit (Quit: Leaving)
[19:29] * perplexed (~ncampbell@ has left #ceph
[19:33] <sagewk> for future reference: make sure you join the room for the danger room, not the 'call' point to point function. i think mark called us and we didn't realize we weren't in the room.
[19:34] <joao> sagewk, I just opened the url and joined as a guest
[19:34] <sagewk> just librbd/librados made it into ubuntu precise, which is probably the best outcome.
[19:34] <joao> not sure if that can be the problem though
[19:34] <sagewk> you were fine, we goofed our end.
[19:39] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[19:41] <nhm> sagewk: huh, I thought I clicked "join room", but I'll keep it in mind.
[19:43] <sagewk> qa passed last night! :)
[19:43] <nhm> woo
[19:43] <dmick> perhaps someone with one of those nice tablet/netbook/smallportableusable computers should take one into the room for IRC coping.. :) And yay QA
[19:48] <Tv_> elder: i'm back but in/out of meetings a lot
[19:48] <Tv_> sagewk, dmick: time for a 5-10 min gitbuilder status braindump
[19:48] <Tv_> ?
[19:48] <Tv_> "do you have", too
[19:49] <sagewk> yeah!
[19:51] * chutzpah (~chutz@ has joined #ceph
[20:10] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) has joined #ceph
[20:29] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[20:37] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[20:40] <buck> is there a trick to getting symbols in a core file (for MDS in my case) beyond installing the *-dbg* packages?
[20:49] * jlogan (~chatzilla@2600:c00:3010:1:7c6d:275:da2e:d8f5) has joined #ceph
[20:53] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:56] * danieagle (~Daniel@ has joined #ceph
[21:15] * ss7pro (~ss7pro@static.nk-net.pl) Quit (Remote host closed the connection)
[21:25] * yehudasa (~yehudasa@aon.hq.newdream.net) has joined #ceph
[21:37] <dmick> buck: that's supposed to be enough as I understand it
[21:38] <dmick> I looked into the mechanism a bit last night
[21:38] <dmick> they have to be the same versions, of course
[21:38] <buck> dmick: Hmm...ok. Thanks
[21:39] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) has joined #ceph
[21:40] * lofejndif (~lsqavnbok@9YYAAFB4R.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:49] * Oliver1 (~oliver1@p54839C9B.dip.t-dialin.net) has joined #ceph
[21:56] <dmick> more sensible SSD numbers on these disks with larger blocksize
[21:56] <Tv_> dmick: is that writes or reads?
[21:56] <Tv_> for writes, it makes sense
[21:56] <Tv_> and probably the point where it plateaus is the erase block size..
[21:57] * lofejndif (~lsqavnbok@9YYAAFB4R.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[21:57] <nhm> Tv_: looks like we have the older ss805 drives which should actually be faster for writes than the sm825s.
[21:57] <dmick> for both
[21:57] <dmick> was using 4k r/w, now 32; makes a big difference to both
[21:58] <dmick> libaio, 4 in-flight
[21:58] * lofejndif (~lsqavnbok@1RDAAAUKL.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:59] <Tv_> yeah erase block size >4k is very likely
[21:59] <Tv_> i'd expect even 512k
[22:10] <yehudasa> gregaf: do you know if client/testceph.cc is supposed to pass?
[22:18] <gregaf> yehudasa: well, it did once?
[22:19] <gregaf> I don't remember it very well but it doesn't look like it's doing anything very complicated, although some of them are regression tests that used to fail and shouldn't now...
[22:26] * BManojlovic (~steki@ has joined #ceph
[23:11] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[23:24] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[23:30] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) has joined #ceph
[23:34] * Oliver1 (~oliver1@p54839C9B.dip.t-dialin.net) Quit (Quit: Leaving.)
[23:42] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:42] * loicd (~loic@magenta.dachary.org) has joined #ceph
[23:52] * buck (~buck@bender.soe.ucsc.edu) Quit (Quit: Leaving)
[23:54] * jlogan (~chatzilla@2600:c00:3010:1:7c6d:275:da2e:d8f5) Quit (Remote host closed the connection)
[23:56] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.