#ceph IRC Log


IRC Log for 2012-04-09

Timestamps are in GMT/BST.

[0:52] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[1:04] * lofejndif (~lsqavnbok@1RDAAANYC.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[1:39] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[1:42] * danieagle (~Daniel@ has joined #ceph
[2:05] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[2:39] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[4:25] * joao (~JL@ Quit (Ping timeout: 480 seconds)
[4:51] <iggy> danieagle: you're probably better off using postgres' built-in replication abilities... it's likely to be better tested, documented, etc.
[5:03] * brambles_ (brambles@ Quit (Remote host closed the connection)
[6:43] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:51] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[8:59] * loicd (~loic@magenta.dachary.org) has joined #ceph
[9:02] * LarsFronius (~LarsFroni@g224049109.adsl.alicedsl.de) has joined #ceph
[9:57] * exel (~pi@ has joined #ceph
[9:58] <exel> morning
[10:09] <exel> I'm having issues subscribing to the mailinglist. subscribe requests go unanswered.
[10:11] * Rocky (~r.nap@ Quit (Quit: leaving)
[10:11] * rosco (~r.nap@ has joined #ceph
[10:15] <exel> ah, there it is.
[10:15] <exel> it's just really slow.
[11:33] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:55] * loicd (~loic@magenta.dachary.org) Quit (Ping timeout: 480 seconds)
[12:16] * LarsFronius (~LarsFroni@g224049109.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[12:21] * joao (~JL@ has joined #ceph
[12:24] <danieagle> iggy, thanks.
[12:24] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[12:39] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[13:11] * loicd (~loic@magenta.dachary.org) has joined #ceph
[14:31] * LarsFronius (~LarsFroni@f054106219.adsl.alicedsl.de) has joined #ceph
[14:47] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[14:50] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[15:16] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[16:33] * loicd (~loic@magenta.dachary.org) has joined #ceph
[16:40] <nhm> anyone else having trouble accessing metropolis or the burnupi/plana nodes? I can't get to them either from my machine or from flak.
[16:41] <joao> let me check
[16:42] <joao> can't ssh into them either
[16:42] <nhm> yeah, no ping response. Seems there are some networking issues.
[16:43] <joao> ssh'ing into flak also took way more than usual
[16:43] <joao> s/more/longer
[16:43] <nhm> flak was normal for me at least...
[16:45] <joao> I'll blame the transatlantic connection then ;)
[17:12] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[17:21] * bchrisman1 (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:23] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[17:27] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit ()
[17:27] <nhm> joao: looks like there are network problems with the irvine datacenter.
[17:28] <joao> :\
[17:28] <joao> these network problems can't be good for dreamhost
[17:28] <joao> I suppose we're not the only ones with server problems...
[17:32] <nhm> joao: It does appear to be affecting dreamhost customers: http://www.dreamhoststatus.com/
[17:39] <elder> So frustrating.
[17:41] * Tv_ (~tv@aon.hq.newdream.net) has joined #ceph
[17:41] <elder> Tv_, do you have access to the plana nodes?
[17:41] <elder> Just verifying it's not us out here in the hinterlands.
[17:42] <nhm> elder: I can't get to them from flak either.
[17:42] <elder> OK.
[18:11] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:19] <elder> Still no access to plana despite: Update Apr 9, 9:00am PDT: Most services are restored, although there still could be one network that is affected that we are looking into.
[18:23] <nhm> yeah, same here
[18:38] * bchrisman (~Adium@ has joined #ceph
[18:47] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[18:48] * jluis (~JL@89-181-153-140.net.novis.pt) has joined #ceph
[18:51] * joao is now known as Guest1310
[18:51] * jluis is now known as joao
[18:51] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[18:52] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit ()
[18:53] <joao> oh joy
[18:53] <joao> power failed and my git repo just got corrupted
[18:53] <nhm> joao: yay
[18:54] * Guest1310 (~JL@ Quit (Ping timeout: 480 seconds)
[18:55] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[18:58] <joao> brb, gonna shout at whoever overloaded the electrical circuit
[18:58] <nrheckman> Hey guys, I'm using radosgw and I got it all set up and running... but the AWS java api seems to be sending it url encoded filenames... which doesn't seem to work? Is this expected or am I barking up the wrong tree?
[19:01] <yehudasa__> nrheckman: can you give an example of a request?
[19:02] <nrheckman> yehudasa: sure, here is the request from the apache logs: " - - [09/Apr/2012:09:58:55 -0700] "PUT /dbcontent%2F0%2F1%2F0%2F2 HTTP/1.1" 403 78 "-" "aws-sdk-java/1.3.1 Linux/2.6.32-220.7.1.el6.centos.plus.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/20.5-b03"
[19:03] <nrheckman> If I use libs3, everything works just fine...
[19:04] <nrheckman> but requests don't appear to be URL encoded, that's what got me thinking that might be the issue?
[19:04] <yehudasa__> hmm.. generally it should work, however, it looks like the slash that separates the bucket name from the object name is url encoded
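A minimal Python sketch of why the logged path looks the way it does. The key `dbcontent/0/1/0/2` comes from the apache log line above; the bucket `mybucket` and endpoint `rgw.example.com` are made-up placeholders. As nrheckman works out a few lines later, the Java client was sending virtual-host style requests, so the bucket travelled in the Host header and the whole key, slashes included, was percent-encoded into the path.

```python
from urllib.parse import quote, unquote

# Object key taken from the apache log line; the bucket is not in the path.
key = "dbcontent/0/1/0/2"

# Encoding every character, including "/", reproduces the logged path.
fully_encoded = quote(key, safe="")
print(fully_encoded)  # dbcontent%2F0%2F1%2F0%2F2

# quote's default safe set keeps "/" literal, leaving the key readable.
slash_preserved = quote(key)
print(slash_preserved)  # dbcontent/0/1/0/2

# Both forms decode back to the same key.
assert unquote(fully_encoded) == unquote(slash_preserved) == key

# Hypothetical bucket and endpoint, for illustration only.
bucket, endpoint = "mybucket", "rgw.example.com"

# Path style: the bucket is the first path segment.
path_style = f"http://{endpoint}/{bucket}/{fully_encoded}"
# Virtual-host style: the bucket is part of the host name.
vhost_style = f"http://{bucket}.{endpoint}/{fully_encoded}"
print(path_style)
print(vhost_style)
```

A server that only understands path style will read the first segment of the vhost-style path as the bucket name, which is consistent with the 403 in the log.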
[19:05] <nrheckman> Yeah. something is weird in the java client I'm guessing
[19:06] <yehudasa__> are you trying to access an object named 0/1/2?
[19:06] <yehudasa__> or something like that
[19:06] <yehudasa__> ?
[19:06] <yehudasa__> 0/1/0/2
[19:08] * chutzpah (~chutz@ has joined #ceph
[19:10] <nrheckman> yeah
[19:10] <yehudasa__> nrheckman: do you have the rgw log?
[19:11] <nrheckman> I do, but for some reason that request doesn't generate any log output
[19:13] <yehudasa__> if you're getting output for any other requests, then it could be your apache conf
[19:14] <yehudasa__> anything interesting in the apache log?
[19:14] <nrheckman> I think I see what's going on and it wasn't completely obvious. I scanned right past it earlier. It looks like the PutObjectRequest for the AWS java client isn't appending the bucket name to the put request. I'm guessing Amazon is doing it differently than they used to... (dbcontent isn't my bucket name, but the first folder in path)
[19:14] <yehudasa__> hmm.. it's probably using virtual host name
[19:15] <elder> Still no luck, despite: Update Apr 9, 9:21am PDT: We are happy to report that all services were restored around 8:40am
[19:15] <nrheckman> yeah. looks to be so. I'm grabbing the java docs so I can see what's going on and how it's different. could be I just need to use a different constructor for the request.
[19:32] <yehudasa__> nrheckman: maybe your apache is not serving requests that don't have the expected HTTP_HOST?
[19:33] <Tv_> nrheckman: perhaps you're trying to use the vhost mechanism without configuring it properly? (rgw dns name etc?)
[19:33] <Tv_> elder: that's most likely unrelated
[19:33] * lofejndif (~lsqavnbok@82VAACXR2.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:33] <elder> Tv_, OK.
[19:34] <yehudasa__> nrheckman: what Tv_ just said.. are you sure you're not getting any rgw logs?
[19:40] <joao> oh yeah, just managed to fix my git repo with minor losses :D
[19:50] * LarsFronius (~LarsFroni@f054106219.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[19:56] <Tv_> sepia is back
[19:56] <Tv_> sjust, elder: ^
[19:56] <dmick> beat me to it
[19:56] <Tv_> (hoping it's not just temporary)
[19:56] <Tv_> dmick: your thunder. all of it.
[19:57] <dmick> KHAAAAAANNNN!
[19:57] * adjohn (~adjohn@ has joined #ceph
[19:57] <nhm> dmick: still having trouble getting to plana both from my machine and from flak
[19:58] <Tv_> nhm: try restarting your vpn, its timeout might take a moment
[19:58] <Tv_> mine works already
[19:58] <Tv_> s/timeout/retry schedule/
[20:00] <nhm> Tv_: ok, looks like after restarting I can get to plana01.
[20:00] <nhm> Tv_: metropolis still seems to be down?
[20:00] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[20:02] * sagewk (~sage@aon.hq.newdream.net) has left #ceph
[20:02] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[20:05] * lofejndif (~lsqavnbok@82VAACXR2.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[20:05] <Tv_> nhm: i don't think i've ever even logged into that box
[20:10] * cattelan (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[20:10] <nhm> Tv_: can you get the burnupi nodes?
[20:11] <Tv_> nhm: nope
[20:11] <Tv_> plana neither, now
[20:11] <Tv_> oh wait
[20:11] <Tv_> ping -n works
[20:11] <Tv_> burnupi too
[20:11] <Tv_> so dns is messed up
[20:12] <Tv_> lovely
[20:12] <Tv_> ok so now i can ping -n burnupi01.ipmi but not .front
[20:12] <Tv_> somebody's messing with the 4948 switch config
[20:12] <Tv_> dmick: fyi ^
[20:13] <Tv_> i'm gonna say we should expect random loss of service for rest of today
[20:13] <sjust> :(
[20:13] <Tv_> but keep Dan updated if you notice it disappearing
[20:15] <Tv_> connect(4, {sa_family=AF_FILE, path="/var/run/avahi-daemon/socket"}, 110) = 0
[20:15] <Tv_> write(4, "RESOLVE-ADDRESS\n", 28) = 28
[20:15] <Tv_> wtf ubuntu, why are you screwing up my d
[20:15] <Tv_> ns
[20:15] <Tv_> that's why my ping without -n is miserable
[20:15] <Tv_> but why is it giving reverse lookups to mdns :(
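Tv_'s strace shows the reverse lookup being handed to avahi-daemon's socket, which is why `ping -n` (no reverse lookups) behaved while plain `ping` was miserable. A Python sketch of the two paths using only the standard `socket` module: `numeric_name` formats an address without consulting any resolver, while `timed_reverse_lookup` goes through the system resolver (and so through whatever /etc/nsswitch.conf routes it to) and reports how long it took. Both helper names are hypothetical.

```python
import socket
import time


def numeric_name(addr: str) -> str:
    """Format an address without any name lookup, similar in spirit to `ping -n`."""
    host, _port = socket.getnameinfo(
        (addr, 0), socket.NI_NUMERICHOST | socket.NI_NUMERICSERV
    )
    return host


def timed_reverse_lookup(addr: str):
    """Do a reverse (PTR) lookup via the system resolver and time it.

    The resolver follows /etc/nsswitch.conf, so on a host configured like
    the one in the log the query may be sent to avahi/mDNS and stall.
    Returns (name or None, elapsed seconds).
    """
    start = time.monotonic()
    try:
        name = socket.gethostbyaddr(addr)[0]
    except (socket.herror, socket.gaierror):
        name = None
    return name, time.monotonic() - start


print(numeric_name("127.0.0.1"))
```

Running `timed_reverse_lookup` against a lab address from an affected machine would be a quick way to confirm the stall, assuming the lookup is slow rather than failing outright.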
[20:17] <dmick> ping plana from gw works, not burnupi
[20:17] <Tv_> dmick: plana.front vs burnupi.front? that probably means links to some 2960s are broken
[20:17] <dmick> different vlans?
[20:18] <Tv_> same vlan, different top-of-rack switches
[20:28] <elder> Tv_, dmick I have contact with plana nodes. I know nothing of this burnupi of which you speek.
[20:28] <elder> Or, in Eenglish, speak.
[20:28] <Tv_> elder: consider yourself lucky and don't rely on that contact working in the near future
[20:29] <elder> That's too bad, I just jumped in with both feet and started testing.
[20:29] <Tv_> it might work
[20:30] <Tv_> but NOC & neteng are both on the loose
[20:33] * LarsFronius (~LarsFroni@f054106219.adsl.alicedsl.de) has joined #ceph
[20:46] * sagelap (~sage@aon.hq.newdream.net) has joined #ceph
[21:13] * joao (~JL@89-181-153-140.net.novis.pt) Quit (Remote host closed the connection)
[21:53] * stxShadow (~Jens@ip-78-94-239-132.unitymediagroup.de) has joined #ceph
[22:03] * lofejndif (~lsqavnbok@09GAAERD5.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:27] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:35] * sagelap (~sage@aon.hq.newdream.net) Quit (Quit: Leaving.)
[22:51] * stxShadow (~Jens@ip-78-94-239-132.unitymediagroup.de) Quit (Read error: Connection reset by peer)
[22:55] <elder> How should I go about ensuring a few additional packages are available on a target machine so that I can build xfstests via a teuthology script?
[22:55] <elder> nhm, Tv_ , dmick ?
[22:55] <Tv_> elder: you shouldn't need to compile it
[22:55] <elder> xfstests is not packaged.
[22:56] <Tv_> oh huh i thought we already had a workunit for it
[22:56] <Tv_> can't see it in the source tree, must have been hallucinating
[22:56] <sagewk> elder: one of the ceph-qa-chef lists packages to install
[22:56] <sagewk> yeah, no workunit
[22:56] <elder> If so I haven't looked at it.
[22:56] <sagewk> we've just talked about having one :)
[22:56] <Tv_> compiling on every run is kinda miserable though
[22:56] <elder> This is what I'd like to have happen:
[22:56] <elder> apt-get install libtool automake gettext uuid-dev
[22:56] <elder> apt-get install libncurses5-dev libattr1-dev libacl1-dev
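The channel goes on to settle on declaring these build dependencies in ceph-qa-chef as one `package '<name>'` line each. A small Python sketch that renders elder's apt-get list in that form; the helper name is hypothetical, and the output is meant to be pasted by hand into cookbooks/ceph-qa/recipes/default.rb, not produced by any ceph-qa-chef tooling.

```python
# Packages from the two apt-get commands above.
XFSTESTS_BUILD_DEPS = [
    "libtool", "automake", "gettext", "uuid-dev",
    "libncurses5-dev", "libattr1-dev", "libacl1-dev",
]


def chef_package_lines(packages):
    """Render one Chef `package` resource line per Debian package name."""
    return [f"package '{name}'" for name in packages]


print("\n".join(chef_package_lines(XFSTESTS_BUILD_DEPS)))
```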
[22:57] <Tv_> a lot of the old school testing did that, and it was actually frustratingly slow
[22:57] <elder> Well, xfstests runs a few hours.
[22:57] <elder> So building is not a big part of it.
[22:57] <Tv_> heh
[22:57] <Tv_> you have a fine point there
[22:57] <elder> Maybe longer on ceph, I don't have much experience with it yet.
[22:58] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:59] <elder> Where do I find the ceph-qa-chef file(s)?
[22:59] <Tv_> so somebody should add package lines to ceph-qa-chef.git cookbooks/ceph-qa/recipes/default.rb, with a nice commit message and all
[22:59] <elder> Ahh, I don't have that git repository. Will grab it now.
[22:59] <Tv_> it's a fine day to learn some new stuff, *nudge nudge*
[23:00] <elder> I don't mind doing it. Just needed to know where to get what I was to work on.
[23:01] <elder> So just adding a line "package 'libtool'" to that is all that's needed?
[23:02] <Tv_> well, rerunning the deploy
[23:02] <elder> There's a hell of a "sentence"
[23:02] <elder> I'll look a little harder...
[23:03] <Tv_> which entails 1) ensuring nobody else is doing it at the same time, because there's no locking and it'll be confusing
[23:03] <dmick> burnupi are down because they don't have DHCP leases
[23:03] <Tv_> 2) ensuring no test runs on the node at the same time (or taking the risk of confusion)
[23:03] <dmick> working on a way to fix that now
[23:03] <Tv_> 3) running the wget line from solo/solo-from-scratch
[23:03] <Tv_> now, wanting that across all the nodes.. that'll suck a little bit more ;)
[23:05] <elder> Neat!
[23:05] <Tv_> elder: so if you have nodes statically locked for yourself, just copy-paste the wget line on each one
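Step 1 of Tv_'s list exists only because the deploy has no locking. As a sketch of what a per-node guard could look like (an assumption, not something ceph-qa-chef provides), a non-blocking `flock` on a hypothetical lock file lets a second deploy on the same node back off instead of colliding. It does nothing for step 2, keeping test runs off the node.

```python
import fcntl
import os

# Hypothetical lock path; the deploy described above has no lock at all.
LOCK_PATH = "/tmp/ceph-qa-chef-deploy.lock"


def try_acquire(path=LOCK_PATH):
    """Return an fd holding an exclusive lock, or None if another holder exists."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB: fail immediately instead of waiting for the other deploy.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release(fd):
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


fd = try_acquire()
if fd is None:
    print("another deploy is running on this node; try later")
else:
    try:
        pass  # run the wget line from solo/solo-from-scratch here
    finally:
        release(fd)
```

On Linux, `flock` locks belong to the open file description, so even two opens within one process contend, which is what makes the non-blocking check meaningful.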
[23:06] <elder> Reading up on chef right now.
[23:07] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) has joined #ceph
[23:09] <dmick> rebooting all burnupis now to bring them back online
[23:12] <dmick> all back
[23:13] <dmick> thank Cthulu for serial over LAN access
[23:14] <elder> All hail FSM
[23:15] <dmick> finite state machines? (yes, I know what you mean)
[23:15] <sagewk> they brought metropolis over to aon today, racking it now
[23:16] <Tv_> sagewk: we were supposed to have a 1-day warning on that one too
[23:17] <sagewk> yep, my bad
[23:17] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:17] <sagewk> it was down this morning and it wasn't coming up on powercycle, so i had them just move it now and avoid remote troubleshooting
[23:17] <Tv_> ah ok
[23:17] <Tv_> that's different if it's our initiative
[23:21] * LarsFronius (~LarsFroni@f054106219.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[23:28] <nrheckman> yehudasa, Tv: I managed to get closer to having the AWS java api working with some RewriteRule entries and a hosts entry for <bucketname>.hostname. Though it's now complaining of "com.amazonaws.http.AmazonHttpClient - Unable to unmarshall error response (White spaces are required between publicId and systemId.)"
[23:29] <nrheckman> FYI, I couldn't find a way to switch the java client to use path-style requests.
[23:30] <nrheckman> Still works with libs3, and I can add the -h option and have it work as well.
[23:30] <yehudasa__> nrheckman: rgw supports virtual hostname style
[23:30] <nrheckman> Oh? hah! how do I enable that?
[23:31] <nrheckman> Though, I hacked it into the path with some success anyway.
[23:31] <yehudasa__> you need to set 'rgw dns name' to the host name you're using
[23:32] <yehudasa__> e.g., for <bucket>.nrheckman.com you'd set it to nrheckman.com
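A simplified Python model of what setting 'rgw dns name' enables, following yehudasa's example; it is not radosgw's code, and the function and bucket names are hypothetical. With the DNS name set to `nrheckman.com`, a Host of `<bucket>.nrheckman.com` names the bucket, while a bare `nrheckman.com` means the bucket is in the path. The client still has to be able to resolve `<bucket>.nrheckman.com`, which is what nrheckman's hosts entry was for.

```python
def bucket_from_host(host, dns_name):
    """Return the bucket named by a virtual-host style Host header, or None.

    Strips any port and trailing dot, and compares case-insensitively
    because host names are case-insensitive.
    """
    hostname = host.split(":", 1)[0].rstrip(".").lower()
    suffix = "." + dns_name.lower()
    if hostname.endswith(suffix) and len(hostname) > len(suffix):
        return hostname[: -len(suffix)]
    return None  # no bucket in the host: treat the request as path style


print(bucket_from_host("photos.nrheckman.com", "nrheckman.com"))  # photos
print(bucket_from_host("nrheckman.com:80", "nrheckman.com"))      # None
```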
[23:33] <nrheckman> Thanks, will give that a shot instead. Though I'm guessing I may still have the above issue with the java client. Do you know what it's talking about with the white space requirements?
[23:33] <yehudasa__> in any case, I don't think a rewrite rule is a great solution, since it'll probably mess up the url encoding
[23:34] <yehudasa__> hmm.. not really
[23:34] <nrheckman> FYI, adding that setting and removing my rewrites worked fine for libs3. Java is still complaining about whitespace.
[23:35] <yehudasa__> probably the returned error code couldn't be parsed
[23:35] <yehudasa__> I'll look at the api
[23:36] <nrheckman> Any easy way to see the xml that is being returned w/o a packet capture?
[23:37] <yehudasa__> hmm.. not sure. You can try setting 'debug rgw = 20', but it might be that we don't dump it there
[23:37] <nrheckman> already set there.
[23:37] * sjust (~sam@aon.hq.newdream.net) Quit (Read error: Connection reset by peer)
[23:39] <yehudasa__> nrheckman: do you know what's the returned error code?
[23:39] * bchrisman (~Adium@ Quit (Ping timeout: 480 seconds)
[23:41] <nrheckman> looks like a 200 from the logs
[23:41] <nrheckman> 2012-04-09 14:34:21.544683 7fd133fff700 --> Status: 200
[23:41] <yehudasa__> on what kind of request?
[23:41] <nrheckman> put
[23:43] <nrheckman> hrmm. hang on. that might be the wrong request.
[23:44] <yehudasa__> yeah.. doesn't make sense
[23:44] <nrheckman> I got some additional debugging. Check this out from the java client's logs
[23:45] <nrheckman> hrmm this could be a lot to paste.
[23:45] <yehudasa__> pastebin
[23:48] <nrheckman> http://pastebin.com/uYXbJR0t
[23:49] <nrheckman> looks like it's not handling the 302?
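One plausible reading of the Java client's "White spaces are required between publicId and systemId" message, offered as an assumption since the pastebin contents are not reproduced here: the client tried to parse the body of the 302 as an S3 XML error document, and Apache's stock redirect page opens with `<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">`, a public identifier with no system identifier, which XML does not allow. A Python sketch with the standard `xml.etree.ElementTree` parser shows the same body failing where a well-formed error document parses; the page body below is an assumed shape, not taken from the log.

```python
import xml.etree.ElementTree as ET

# Assumed shape of Apache's default "302 Found" page (not from the log).
APACHE_302_BODY = b"""<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
</body></html>
"""

# A well-formed S3-style error document, for comparison.
S3_ERROR_BODY = b"""<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code></Error>
"""


def error_code(body: bytes):
    """Return the <Code> of an S3-style error body, or None if it is not XML."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # The DOCTYPE above has a public id but no system id: not well-formed XML.
        return None
    return root.findtext("Code")


print(error_code(S3_ERROR_BODY))    # AccessDenied
print(error_code(APACHE_302_BODY))  # None
```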
[23:50] <yehudasa__> hmm.. I'm not sure where you're getting that 302 from.. do you have some proxy/load balancer set up?
[23:50] <nrheckman> Nope. nothing like that. I'm just making sure my rewrite rules aren't causing it
[23:51] <yehudasa__> hmm.. it probably is the rewrite rule
[23:51] <nrheckman> Hah, wow... I'm really sorry. I removed my rewrites from the httpd.conf but didn't restart apache! :(
[23:52] <yehudasa__> you still need to have the original rewrite rule
[23:52] <nrheckman> Yeah. It seems to be working now.
[23:52] <yehudasa__> cool
[23:52] <nrheckman> (My additional rewrites before I enabled vhost mode)
[23:53] <dmick> there ought to be an HTTP response code for "config file newer than config" :)
[23:53] <yehudasa__> heh, yeah
[23:53] <nrheckman> dmick: Brilliant!
[23:55] <sagewk> metropolis.ops.newdream.net
[23:55] <sagewk> dns will probably take a while
[23:56] * adjohn is now known as Guest1336
[23:56] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[23:56] * bchrisman (~Adium@ has joined #ceph
[23:57] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: No route to host)

