#ceph IRC Log

IRC Log for 2011-11-22

Timestamps are in GMT/BST.

[0:01] <joshd> cpu usage is not too much except during recovery
[0:02] <joshd> if you're using the kernel rbd or ceph module, you can get a deadlock on the same machine as an osd, but if you're just using qemu/kvm or fuse it's fine to have osds and clients on the same host
[0:05] <yehudasa_> damoxc: does anything actually work for you? can you push everything to a tree like you did last time?
[0:05] <stickman123> joshd: thanks, so using rbd and qemu/kvm wouldn't cause problems?
[0:05] <stickman123> isn't rbd tied to the qemu/kvm component?
[0:06] * grape (~grape@c-76-17-80-143.hsd1.ga.comcast.net) has joined #ceph
[0:07] <joshd> stickman123: there's an rbd kernel module as well, which you could use if you didn't want to use qemu
[0:08] <joshd> there's also a library, librbd, if you wanted to do more specialized things (qemu and the command line rbd tool use this)
[0:10] <stickman123> oh ok, are there any performance issues to take into consideration?
[0:11] <joshd> if you compile a newer qemu, you can get better write performance with the rbd_writeback_window option
[0:11] <stickman123> using librbd?
[0:12] <joshd> yeah
[0:12] <joshd> we haven't implemented that on the kernel side yet
[0:12] <stickman123> awesome. i'll clearly need to look into that more. but as you said if i'm running the OSDs on the same machine as the client I shouldn't use the kernel module, right?
[0:13] <joshd> right
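
For context, rbd_writeback_window is an ordinary ceph client option read by librbd, so enabling it for qemu/kvm guests looks roughly like this (a sketch; the 8 MB value and the rbd/myimage pool/image name are illustrative placeholders, and it assumes a qemu built with librbd support):

    # ceph.conf on the client host
    [client]
        rbd writeback window = 8388608

    # qemu then opens the image through librbd
    qemu-system-x86_64 -drive format=rbd,file=rbd:rbd/myimage
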
[0:14] * chaos_ (~chaos@hybris.inf.ug.edu.pl) Quit (Ping timeout: 480 seconds)
[0:15] <stickman123> thanks. more or less I have one less question for now... the FAQ for ceph says it's not production ready, but it seems like enough people are running it right now; is that just an old answer that hasn't been updated, or should it really not be used in a production environment right now?
[0:17] <damoxc> yehudasa_: yeah everything else is working fine, i'm able to make-bcache -B /dev/rbd0, echo /dev/rbd0 > /sys/fs/bcache/register, mkfs.ext4 /dev/bcache0, mount /dev/bcache0 /mnt, then do whatever I want, run a postmark, dd if=/dev/zero'd some stuff, it's only when I attach the cache device that it starts erroring
[0:22] <joshd> stickman123: rbd is more stable than the full-fledged filesystem, but you will probably run into bugs somewhere - possibly even the filesystem under the OSD
[0:22] <damoxc> yehudasa_: github.com/damoxc/linux branch ceph-testing-new
[0:23] * chaos_ (~chaos@hybris.inf.ug.edu.pl) has joined #ceph
[0:23] <stickman123> joshd: thanks. i really appreciate your help with this
[0:24] <joshd> stickman123: you're welcome :)
[0:51] * stickman123 (~mmeuleman@38.98.143.2) has left #ceph
[0:52] <gregaf> elder: should xfs be able to corrupt itself if you reboot without syncing? (ie reboot -fn)
[0:52] <elder> No.
[0:52] <elder> But what kind of corruption are you talking about?
[0:52] <elder> You can easily lose cached data, but the filesystem should be able to preserve its integrity.
[0:53] <elder> Log replay should ensure things are consistent, but again, you might have some data loss.
[0:53] <gregaf> I sent an email to the xfs list this morning describing it; attempting to mount is failing with an internal error
[0:54] <gregaf> at the time I thought it had been a clean shutdown and everything but it turns out the admin used reboot -fn so I wanted to make sure that shouldn't have been able to break it
[0:54] <elder> I'm sorry I am on my way out the door for dinner. I'll check back when I return though.
[0:54] <gregaf> np, just wanted to check before I wasted more list time :)
[1:03] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:03] <Tv__> alright i am sitting on a new chair, time for 4pm "lunch"
[1:13] <sagewk> tv__: subnet stuff looks ok. now to make ceph-mon do something sane with it..
[1:19] <damoxc> sagewk: did you see what I said earlier about osds not talking to one another?
[1:20] <sagewk> damoxc: looking
[1:21] <sagewk> are there multiple interfaces or anything funky like that?
[1:22] <damoxc> sagewk: i'm using public/cluster addr on multiple interfaces
[1:22] <damoxc> sagewk: they start off absolutely fine, but then after a few hours or so some of them stop being able to talk
[1:22] <sagewk> do you see connect errors in the osd log on the cluster addrs?
[1:22] <sagewk> ah
[1:23] <damoxc> yeah, it's a bit odd
[1:23] <sagewk> it would be interesting to see exactly which error codes are coming out of which syscalls... if you can reproduce it with 'debug ms = 20' and post a log fragment that will tell us something
[1:24] <damoxc> sagewk: sure thing, i've got debugging symbols installed as well, in case you need tracebacks..
[1:25] <sagewk> damoxc: that'll be helpful if we make it crash.. it's just marking peer osds down tho right?
[1:27] <sagewk> tv__: so i think the last trick to make the mon mkfs work is to decide which of the addrs listed in mon_host matches the local machine, and name it appropriately in the seed mkfs monmap
[1:28] <damoxc> sagewk: yeah, http://dpaste.com/659985/
[1:28] <damoxc> sagewk: http://dpaste.com/659986/ might be helpful also?
[1:32] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving)
[1:32] <sagewk> damoxc: k, so any log that shows an error will tell us something. ideally having the pair tho (e.g. osd.1 marks osd.2 down, so osd.1 and osd.2 logs)
[1:33] <damoxc> sagewk: will injectargs work?
[1:34] <sagewk> should
[1:34] <sagewk> ceph osd tell \* injectargs '--debug-ms 20'
[1:34] <sagewk> except that won't tell the osds that are marked down
[1:34] <damoxc> okay
[1:34] <sagewk> need to restart those, or connect with gdb and 'p g_conf->debug_ms = 20'
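
Putting those two suggestions together, a rough sketch (the pid is a placeholder; the gdb route is only for daemons that injectargs can no longer reach):

    # bump messenger debugging on every osd the monitors can still talk to
    ceph osd tell \* injectargs '--debug-ms 20'

    # for an osd already marked down, poke the running process directly
    gdb -p <pid-of-ceph-osd> -batch -ex 'p g_conf->debug_ms = 20'
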
[1:39] <damoxc> okies, ran the ceph command and attached to the down ones with gdb
[1:40] <damoxc> some connection refuseds by the looks of things
[1:42] <sagewk> what happens if you telnet to those ports directly?
[1:42] <sagewk> and are they the same ip:port you see in the ceph osd dump ?
[1:42] <damoxc> 192.168.55.25:0/25090, 111: Connection refuse
[1:42] <damoxc> that's one of them
[1:43] <damoxc> so no
[1:43] <sagewk> look at the refusing process's /proc/$pid/fd and see if it's leaking sockets or something?
[1:44] <sagewk> not sure offhand how to verify the process is still bound to the socket.. netstat -an i guess should list it?
[1:45] <damoxc> 25 has 9 sockets in fd
[1:45] <damoxc> would the problem be it thinks the port is 0?
[1:46] <damoxc> it's definitely listening
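
A sketch of the checks being described here, run on the host of the refusing osd (pid, ip and port are placeholders):

    # how many sockets does the osd process hold open?
    ls -l /proc/<osd-pid>/fd | grep -c socket

    # is anything still listening on the address shown by ceph osd dump?
    netstat -an | grep <osd-port>

    # quick connectivity test from the peer that saw the connection refused
    telnet <osd-ip> <osd-port>
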
[1:46] <sagewk> oh... wait, can you paste that whole section of log?
[1:46] <damoxc> http://dpaste.com/659991/
[1:46] <damoxc> there's a whole chunk
[1:46] <Tv__> sagewk: oh right so the monmap has id->ip:port mapping...
[1:47] <sagewk> tv__: yeah
[1:47] <Tv__> sagewk: but it won't know ids of others anyway..
[1:47] <Tv__> sagewk: until it talks to them
[1:47] <sagewk> that's fine
[1:48] <sagewk> tv__: ceph_mon.cc:128 needs to name itself in the monmap it generates
[1:48] <sagewk> in fact, i think currently that path is basically useless because the default map names monitors a,b,c, so unless that happens to match what was passed to -i, it'll be wrong
[1:49] <sagewk> but i think with that it'll work (with mon host = a,b,c and no mon addr in the config)
[1:50] <sagewk> oh right, it's fine if the mon addrs are in the conf, but currently (without that piece) it'll behave weirdly with just mon host.
[1:50] <Tv__> sagewk: i still don't understand the intricacies.. thinking about dynamo-style systems, why can't it just use the "mon host" list as seeds, and say "hey guys please add me, i'm mon.$id"
[1:51] <sagewk> it can, if the existing nodes are up with an existing cluster
[1:51] <Tv__> sagewk: grab whatever monmap the existing cluster has, add itself (it knows $id, by that time it knows it's ip too)
[1:51] <Tv__> sagewk: ah so this is the very bootstrap case..
[1:51] <sagewk> it's the initial bootstrap where you need to know which one you are in the map
[1:52] <Tv__> sagewk: and by saying "which one" you mean the 0/1/2 numbering, not the $id
[1:52] <Tv__> ?
[1:52] <sagewk> altho yeah, need to do the add to cluster case too..
[1:52] <sagewk> it needs to find its $id listed in the seed monmap
[1:52] <sagewk> so that it binds to that address and contacts the nodes that aren't itself.
[1:52] <Tv__> sagewk: that requirement sounds artificial, for some reason
[1:53] <Tv__> as in.. what if all the nodes pick #0 as their id, and then form a quorum -- those initial monmaps were quite useless
[1:53] <sagewk> it's just an artifact of the implementation. the actual requirement is that one item in the mon host list is configured on the current node during the initial bootstrap
[1:53] <sagewk> (on enough monitors to form the initial quorum)
[1:54] <Tv__> sagewk: why is even that true, btw?
[1:54] <sagewk> tv__: good point. they won't if the mon host is identical on all of them, but if the order varies there's a problem
[1:54] <Tv__> just trying to understand
[1:54] <Tv__> dynamo-style bootstrap explicitly uses seeds, as in <= num_of_actual_nodes
[1:55] <sagewk> tv__: i had patches that rewrote quorum in terms of names instead of ids, but didn't merge them because i didn't actually need it and it was a big protocol change. it would fix this problem tho
[1:55] <Tv__> as long as the mon is able to pick cluster&public ip, and reach at least one of the mon_host, what else is really needed?
[1:56] <sagewk> nothing, to join an existing cluster.
[1:56] <sagewk> to form an initial cluster, everyone needs to agree on the initial size of the quorum
[1:57] <sagewk> conceivably they could use each other to learn about each other (and only start out knowing 1 peer), but that's harder; for now the initial set needs to know all the peers, and if >N/2 are up they succeed.
[1:58] <sagewk> gotta run, let's talk about it in the morning
[1:58] <sagewk> damoxc: yeah, they're trying to connect to a peer at port 0.
[1:59] <sagewk> would be great to have this level of logging leading up to the first failure
[1:59] <sagewk> (maybe restart all the daemons with debug ms = 20 and leave it running overnight?)
[2:01] <Tv__> sagewk: yes but wouldn't that go with 1. list parties interested in setting up cluster 2. sort & identify 3. vote
[2:01] <Tv__> the strongly connected subset can be the initial quorum, that's fine, it's not too big of a requirement
[2:01] <damoxc> sagewk: sure thing, I'll put the logs up for you tomorrow
[2:02] <Tv__> but that's not quite the same as "your mon cluster_addr must be in mon_hosts"
[2:03] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[2:08] <damoxc> sagewk: okay restarted, i'll upload the logs somewhere tomorrow after they've crashed
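
In ceph.conf terms, that overnight run amounts to something like the following on each osd host, followed by restarting the ceph-osd daemons (a sketch; expect the logs to grow quickly at this level):

    [osd]
        debug ms = 20
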
[2:27] * adjohn (~adjohn@208.90.214.43) Quit (Quit: adjohn)
[2:37] * mosu001 (~michael.o@wireless-nat-1.auckland.ac.nz) has joined #ceph
[2:37] * mosu001 (~michael.o@wireless-nat-1.auckland.ac.nz) has left #ceph
[2:40] * cp (~cp@74.85.19.35) Quit (Quit: cp)
[2:48] * The_Bishop (~bishop@port-92-206-183-175.dynamic.qsc.de) has joined #ceph
[3:10] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) has joined #ceph
[3:16] * aa (~aa@r190-135-149-181.dialup.adsl.anteldata.net.uy) has joined #ceph
[3:20] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:35] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[4:41] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[5:22] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[6:08] * elder (~elder@c-71-193-71-178.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[6:10] * aa (~aa@r190-135-149-181.dialup.adsl.anteldata.net.uy) Quit (Ping timeout: 480 seconds)
[6:22] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[7:07] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) has joined #ceph
[9:23] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:41] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[9:59] * Tv__ (~Tv__@cpe-76-168-227-45.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[11:18] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:53] * raso (~raso@debian-multimedia.org) Quit (Quit: WeeChat 0.3.5)
[12:23] <psomas> Is there a way to see what objects are mapped to each pg?
[12:44] <todin> psomas: you mean on the osd side?
[12:53] <psomas> well, i found that from the osd logs
[12:53] <psomas> but I couldn't find it from the logs when using the rados or rbd command with logging enabled
[12:55] <psomas> recalc_op_target tid 1 pgid 2.ab445e08 acting [0,2]
[12:55] <psomas> osd_op(client.21187.0:1 rb.0.4e.00000000003f [create 0~0] 2.ab445e08) v1
[12:55] <psomas> i thought the pgid was the pg
[12:56] <psomas> in the osd logs, i found that it was mapped to 2.8
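
There are also ways to ask for the object-to-pg mapping directly instead of grepping logs; a sketch, assuming a ceph version where these subcommands are available (pool and object names are placeholders):

    # ask the cluster where an object maps
    ceph osd map <poolname> <objectname>

    # or compute it offline from a saved osdmap
    ceph osd getmap -o /tmp/osdmap
    osdmaptool /tmp/osdmap --test-map-object <objectname> --pool <pool-id>
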
[12:56] <psomas> sagewk: about the bug i mentioned yesterday, it seems now that one pg is stuck as active+clean+scrubbing, and every op referring to that pg seems to hang
[13:19] <psomas> and from the osd logs, i see something like this
[13:19] <psomas> 2011-11-21 15:17:56.791423 7fde93da8700 osd.0 704 pg[2.8( v 704'2046 (704'2041,704'2046] n=18 ec=1 les/c 696/704 695/695/690) [0,2] r=0 luod=704'2047 lcod 704'2045 mlcod 704'2045 !hml active+clean+scrubbing] enqueue_op 0x3170240 osd_op(client.20944.0:8 rb.0.40.000000000006 [delete] 2.3a8bb108) v3
[13:20] <psomas> 2011-11-21 15:17:56.791510 7fde90ca1700 osd.0 704 dequeue_op osd_op(client.20944.0:8 rb.0.40.000000000006 [delete] 2.3a8bb108) v3 pg pg[2.8( v 704'2046 (704'2041,704'2046] n=18 ec=1 les/c 696/704 695/695/690) [0,2] r=0 luod=704'2047 lcod 704'2045 mlcod 704'2045 !hml active+clean+scrubbing], 0 more pending
[13:20] <psomas> 2011-11-21 15:17:56.791591 7fde90ca1700 osd.0 704 pg[2.8( v 704'2046 (704'2041,704'2046] n=18 ec=1 les/c 696/704 695/695/690) [0,2] r=0 luod=704'2047 lcod 704'2045 mlcod 704'2045 !hml active+clean+scrubbing] do_op osd_op(client.20944.0:8 rb.0.40.000000000006 [delete] 2.3a8bb108) v3 may_write
[13:20] <psomas> 2011-11-21 15:17:56.791619 7fde90ca1700 osd.0 704 pg[2.8( v 704'2046 (704'2041,704'2046] n=18 ec=1 les/c 696/704 695/695/690) [0,2] r=0 luod=704'2047 lcod 704'2045 mlcod 704'2045 !hml active+clean+scrubbing] do_op: waiting for scrub
[13:21] * Nightdog (~karl@190.84-48-62.nextgentel.com) has joined #ceph
[13:29] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[14:43] * elder (~elder@c-71-193-71-178.hsd1.mn.comcast.net) has joined #ceph
[16:18] * dgandhi (~dwarren@209.166.129.51) has joined #ceph
[17:07] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) has joined #ceph
[17:22] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:29] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[17:39] <elder> sync
[17:39] <elder> Yep, wrong window.
[17:47] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) has joined #ceph
[17:55] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Quit: fronlius)
[18:07] * grape (~grape@c-76-17-80-143.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[18:26] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[18:30] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[18:40] * fronlius (~fronlius@f054104167.adsl.alicedsl.de) has joined #ceph
[18:57] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:58] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[19:00] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[19:03] * colon_D (~colon_D@173-165-224-105-minnesota.hfc.comcastbusiness.net) has joined #ceph
[19:04] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[19:07] <colon_D> Can Ceph work with a shared-disk i.e. SAN?
[19:10] <joshd> colon_D: Ceph assumes it's in charge of replication between OSDs
[19:14] <sagewk> colon_d: you could run it on a san if you assign a separate lun to each osd (i.e. don't actually share anything via the san)
[19:14] <psomas> about commit 45cf89c (Revert "osd: simplify finalizing scrub on replica")
[19:15] <psomas> is it possible that if i run v0.38 without this commit/revert, a scrub could get stuck in finalizing_scrub?
[19:15] <colon_D> OK. So it'd be attempting to replicate regardless of shared storage.
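
A sketch of what the separate-LUN-per-osd layout looks like in practice (device paths and mount points are placeholders; nothing is shared between osds):

    # on the host running osd.0
    mkfs.xfs /dev/mapper/san-lun-0
    mkdir -p /srv/osd.0
    mount /dev/mapper/san-lun-0 /srv/osd.0

    # ceph.conf then points osd.0 at its own mount
    [osd.0]
        osd data = /srv/osd.0
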
[19:17] <psomas> and btw, how often does 'automatic scrubbing' run? is it configurable? (and the loadavg check too)
[19:33] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:34] <joshd> psomas: it runs at most each osd_scrub_max_interval seconds when load is less than osd_scrub_load_threshold
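
Those two knobs live in the [osd] section of ceph.conf; a sketch with illustrative values (the numbers here are assumptions, not from this discussion):

    [osd]
        osd scrub max interval = 604800   ; seconds between automatic scrubs of a pg
        osd scrub load threshold = 0.5    ; only scrub when loadavg is below this
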
[19:35] * dgandhi (~dwarren@209.166.129.51) Quit (Quit: Leaving.)
[19:40] <psomas> joshd: y, i found that after some grepping :) but i also saw a scrub_finalize_timeout in the ceph options
[19:46] <joshd> psomas: that's used as a timeout for the ScrubFinalizeWQ to make sure it's making progress
[19:47] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) Quit (Quit: cp)
[19:48] <psomas> y, but with v0.38, one specific pg on an osd seems stuck after a scrub i think
[19:48] <psomas> any operation on this pg never returns
[19:48] <joshd> psomas: if that workqueue doesn't touch the heartbeatmap it'll suicide after 10*scrub_finalize_timeout
[19:48] <psomas> 2011-11-21 15:17:56.791619 7fde90ca1700 osd.0 704 pg[2.8( v 704'2046 (704'2041,704'2046] n=18 ec=1 les/c 696/704 695/695/690) [0,2] r=0 luod=704'2047 lcod 704'2045 mlcod 704'2045 !hml active+clean+scrubbing] do_op: waiting for scrub
[19:48] <psomas> the pg is marked as scrubbing and never changes back to active+clean
[19:49] <psomas> and i'm getting some scrub errors on all osds, when 'automatic scrubbing' is supposed to happen
[19:49] <psomas> either "cannot reserve locally", or "peer denied" errors
[19:50] <psomas> and i'm guessing that the scrub on that particular pg is stuck, and for some reason doesn't suicide
[19:50] <joshd> hmm, the regular ScrubWQ doesn't have a suicide timeout
[19:51] <joshd> sjust: could that cause the hang?
[19:52] <sjust> joshd, psomas: hmm, one sec
[19:53] <psomas> problem is i don't know what causes the bug, and i tried to get as much debug output/logs as i could
[19:53] <sjust> psomas: you were running vanilla v0.38?
[19:53] <psomas> i could try adding a timeout for ScrubWQ too
[19:53] <psomas> y
[19:54] <sjust> it should never get stuck in that state in the first place
[19:54] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[19:55] <psomas> and commit 45cf89c seems somewhat related, but i'm not sure about that
[19:55] <sjust> psomas: it might be related, it was not intended to fix a hang, but I did make a few changes to the scrubbing code in the last week or so
[19:57] <psomas> btw, what's the diff between timeout and suicide interval in the wq?
[19:58] <joshd> after the timeout, the heartbeatmap will report that it's not healthy. after the suicide timeout, it'll assert(0)
[20:07] <psomas> ok, if i try to reproduce this, is it ok if i add a suicide_timeout in ScrubWQ in OSD.h?
[20:08] <psomas> just to make sure, and after that i can try building ceph with commit 45cf89c (and a couple of others i think are related to this one)
[20:09] <sjust> psomas: sounds ok
[20:22] <Tv> /dev/vdb 193G 184G 0 100% /var/lib/teuthworker/archive
[20:23] <Tv> ruh-roh
[20:23] <Tv> joshd, sagewk: ^ i know why some nightlies might have failed ;)
[20:25] <Tv> are some of the old nightlies' logs safe to remove?
[20:25] <joshd> ow, the coverage html is taking 45G
[20:26] <Tv> don't open it in a browser ;)
[20:26] <joshd> maybe we should only generate it for the total suite and not each test
[20:27] <Tv> i just freed ~4GB, hopefully that'll ease things a little
[20:30] <Tv> joshd: on a tangent, that vm has cloud-utils installed, that keeps spamming root from cron.. i see no purpose for that package
[20:32] <joshd> Tv: it was created from the base image, which seems to have installed it with cloud-init
[20:32] <Tv> sjust: i have a crashed osd with assert(0) in PG::build_inc_scrub_map, interested?
[20:32] <sjust> ...yes :(
[20:32] <sjust> when?
[20:32] <Tv> joshd: oh then i guess that spam is happening everywhere
[20:32] <Tv> sjust: yesterday
[20:32] <sjust> when yesterday?
[20:32] <Tv> sjust: ubuntu@teuthology:/var/lib/teuthworker/archive/tv-subnet$ less 2864/remote/ubuntu@sepia77.ceph.dreamhost.com/log/osd.0.log.gz
[20:33] <Tv> sjust: 14:56, though i don't understand why
[20:33] <sjust> ah, cool
[20:33] <sjust> before I pushed my patch :)
[20:33] <Tv> hehe
[20:34] <Tv> sjust: i don't need that whole tv-subnet subtree so rm -rf all of it when you're done
[20:34] <sjust> ok
[20:35] <sjust> done
[20:36] <Tv> [ 5.453782] [ INFO: possible circular locking dependency detected ]
[20:36] <Tv> man
[20:36] <Tv> that's a verbose printk
[20:36] <Tv> doesn't seem to implicate ceph though ;)
[21:03] * ircleuser (~ivsipi@216-239-45-4.google.com) has joined #ceph
[21:10] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Read error: Connection reset by peer)
[21:25] * Tv (~Tv|work@aon.hq.newdream.net) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * joshd (~joshd@aon.hq.newdream.net) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * kirkland (~kirkland@74.126.19.140.static.a2webhosting.com) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * nolan (~nolan@phong.sigbus.net) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * colomonkey (~r.nap@188.205.52.204) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * xns (~xns@evul.net) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * slang (~slang@chml01.drwholdings.com) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * gregaf (~Adium@aon.hq.newdream.net) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * sjust (~sam@aon.hq.newdream.net) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * todin (tuxadero@kudu.in-berlin.de) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * iggy (~iggy@theiggy.com) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * jclendenan (~jclendena@204.244.194.20) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * gohko (~gohko@natter.interq.or.jp) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * svenx (92744@diamant.ifi.uio.no) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * __jt__ (~james@jamestaylor.org) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * elder (~elder@c-71-193-71-178.hsd1.mn.comcast.net) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * eternaleye (~eternaley@195.215.30.181) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * Meths (rift@2.25.193.0) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * Bircoph (~Bircoph@nat0.campus.mephi.ru) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * MK_FG (~MK_FG@188.226.51.71) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * sage (~sage@76.89.180.250) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * failbaitr (~innerheig@85.17.0.131) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * diegows (~diegows@50-57-106-86.static.cloud-ips.com) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * u3q (~ben@uranus.tspigot.net) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * ircleuser (~ivsipi@216-239-45-4.google.com) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * colon_D (~colon_D@173-165-224-105-minnesota.hfc.comcastbusiness.net) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * bchrisman (~Adium@108.60.121.114) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * darkfaded (~floh@188.40.175.2) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * sagewk (~sage@aon.hq.newdream.net) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * mfoemmel (~mfoemmel@chml01.drwholdings.com) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * acaos (~zac@209-99-103-42.fwd.datafoundry.com) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * yehudasa_ (~yehudasa@aon.hq.newdream.net) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * ajm (adam@adam.gs) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * jjchen (~jjchen@lo4.cfw-a-gci.greatamerica.corp.yahoo.com) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * aneesh (~aneesh@122.248.163.4) Quit (reticulum.oftc.net resistance.oftc.net)
[21:25] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[21:25] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[21:25] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[21:25] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[21:25] * colomonkey (~r.nap@188.205.52.204) has joined #ceph
[21:25] * jclendenan (~jclendena@204.244.194.20) has joined #ceph
[21:25] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) has joined #ceph
[21:25] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[21:25] * kirkland (~kirkland@74.126.19.140.static.a2webhosting.com) has joined #ceph
[21:25] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[21:25] * xns (~xns@evul.net) has joined #ceph
[21:25] * slang (~slang@chml01.drwholdings.com) has joined #ceph
[21:25] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[21:25] * gregaf (~Adium@aon.hq.newdream.net) has joined #ceph
[21:25] * svenx (92744@diamant.ifi.uio.no) has joined #ceph
[21:25] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[21:25] * iggy (~iggy@theiggy.com) has joined #ceph
[21:25] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[21:25] * __jt__ (~james@jamestaylor.org) has joined #ceph
[21:25] * ircleuser (~ivsipi@216-239-45-4.google.com) has joined #ceph
[21:25] * colon_D (~colon_D@173-165-224-105-minnesota.hfc.comcastbusiness.net) has joined #ceph
[21:25] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[21:25] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[21:25] * elder (~elder@c-71-193-71-178.hsd1.mn.comcast.net) has joined #ceph
[21:25] * eternaleye (~eternaley@195.215.30.181) has joined #ceph
[21:25] * Meths (rift@2.25.193.0) has joined #ceph
[21:25] * Bircoph (~Bircoph@nat0.campus.mephi.ru) has joined #ceph
[21:25] * darkfaded (~floh@188.40.175.2) has joined #ceph
[21:25] * MK_FG (~MK_FG@188.226.51.71) has joined #ceph
[21:25] * yehudasa_ (~yehudasa@aon.hq.newdream.net) has joined #ceph
[21:25] * diegows (~diegows@50-57-106-86.static.cloud-ips.com) has joined #ceph
[21:25] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[21:25] * mfoemmel (~mfoemmel@chml01.drwholdings.com) has joined #ceph
[21:25] * ajm (adam@adam.gs) has joined #ceph
[21:25] * sage (~sage@76.89.180.250) has joined #ceph
[21:25] * jjchen (~jjchen@lo4.cfw-a-gci.greatamerica.corp.yahoo.com) has joined #ceph
[21:25] * aneesh (~aneesh@122.248.163.4) has joined #ceph
[21:25] * failbaitr (~innerheig@85.17.0.131) has joined #ceph
[21:25] * u3q (~ben@uranus.tspigot.net) has joined #ceph
[21:25] * acaos (~zac@209-99-103-42.fwd.datafoundry.com) has joined #ceph
[21:48] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:05] * ircleuser (~ivsipi@216-239-45-4.google.com) Quit (Remote host closed the connection)
[22:37] <yehudasa_> Tv: I don't have permissions to push to s3-tests.git
[22:43] <Tv> yehudasa_: hold on..
[22:45] <Tv> damn suggestions box, i added "yehu" now
[22:45] <Tv> yehudasa_: try now
[22:45] <yehudasa_> Tv: cool, thanks
[23:07] * fronlius (~fronlius@f054104167.adsl.alicedsl.de) Quit (Quit: fronlius)
[23:07] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[23:24] <yehudasa_> Tv: I need your help to figure out something, basically I'm trying to reset the connection in s3-tests
[23:24] <yehudasa_> Tv: and I'm getting the following: http://pastebin.com/3N0Vj6Mn
[23:24] <yehudasa_> Tv: after calling s3.main.close()
[23:25] <Tv> reading
[23:26] <Tv> yehudasa_: connection is a "property", as in it has getters and setters
[23:26] <Tv> yehudasa_: this one has no setter, you're not supposed to do that
[23:26] <yehudasa_> Tv: so what's the right way?
[23:26] <Tv> yehudasa_: what are you trying to do?
[23:26] <Tv> yehudasa_: make it not speak proper AWS?
[23:27] <yehudasa_> Tv: I got this test that leaves the connection in some crappy state, which causes the next test to fail
[23:28] <Tv> yehudasa_: what's the name of that one?
[23:28] <yehudasa_> test_bucket_create_bad_expect_unreadable
[23:28] <Tv> yehudasa_: what commit are you on?
[23:28] <yehudasa_> a030d88e5861b1b977795670166ba4122abad699
[23:28] <yehudasa_> + http://pastebin.com/qY8ADNvm
[23:29] <Tv> oh i see
[23:29] <Tv> yehudasa_: it sounds like they forgot .close even exists ;)
[23:29] <Tv> and refactored the connection handling
[23:30] <Tv> when they added connection pools
[23:30] <yehudasa_> great
[23:30] <Tv> now .connection is just a free one from the pool of open connections
[23:30] <Tv> let's see if there's something else
[23:31] <yehudasa_> Tv: maybe I can just generate a new connection for this test
[23:31] <Tv> i don't see how that test poisons the pool though
[23:31] <yehudasa_> Tv: the server assumes it's a chunked PUT and waits for more data
[23:31] <Tv> # this is a really long test..
[23:31] <Tv> @nose.with_setup(teardown=_clear_custom_headers)
[23:31] <Tv> @attr('fails_on_rgw')
[23:31] <Tv> def test_object_create_bad_expect_unreadable():
[23:31] <Tv> key = _setup_bad_object({'Expect': '\x07'})
[23:32] <Tv> key.set_contents_from_string('bar')
[23:32] <Tv> how's that a chunked put?
[23:32] <yehudasa_> not this one
[23:32] <yehudasa_> I sent you to the wrong one I guess
[23:32] <yehudasa_> test_bucket_create_bad_contentlength_empty
[23:33] <Tv> oh right
[23:33] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving)
[23:33] <Tv> yeah that test just shouldn't use boto i fear
[23:33] <Tv> Stephon did some funky things in trying to use boto as much as possible
[23:34] <yehudasa_> Tv: how about generating a new connection just for this one
[23:34] <yehudasa_> ?
[23:34] <Tv> to get the signing parts for free
[23:34] <Tv> i'm not sure how useful just a connection is
[23:34] <Tv> if that works, then yes
[23:35] <yehudasa_> _create_connection_bad_auth does that
[23:35] <Tv> ohh you can call .get_all_buckets etc on just a connection
[23:35] <Tv> yeah then you're fine
[23:35] <Tv> i thought that functionality wouldn't be in a single connection object
[23:37] <Tv> you might be able to just call .new_http_connection
[23:37] <Tv> the _bad_auth is explicitly making a mess ;)
[23:39] <yehudasa_> Tv: well, making a mess is what it's all about
[23:39] <Tv> yehudasa_: yeah but you might want a valid signature etc
[23:39] <Tv> depends
[23:41] <yehudasa_> Tv: testing .new_http_connection
[23:42] <yehudasa_> Tv: nope
[23:50] <yehudasa_> Tv: ok, I got a workaround, not very happy with it but it works
[23:51] <Tv> yehudasa_: let me walk over & see if it can be cleaned up?
[23:51] <yehudasa_> Tv: just working around the problem with create_new_bucket() if it fails
[23:51] <yehudasa_> Tv: that resets the server state
[23:51] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.