#ceph IRC Log

IRC Log for 2012-04-17

Timestamps are in GMT/BST.

[0:03] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[0:04] <The_Bishop> 2012-04-14 17:39:09.335904 7f4827c82700 1 -- 192.168.32.185:6800/6000 <== osd.0 192.168.32.177:6801/1505 16 ==== osd_op_reply(45 200.00000078 [delete] ondisk = -6 (No such device or address)) v4 ==== 111+0+0 (615103026 0 0) 0x7f480d2efd90 con 0x22aaba0
[0:05] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: adjohn)
[0:06] <The_Bishop> why does the osd answer with ENODEV instead of ENOENT
[0:06] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[0:08] <gregaf> uhhh... it doesn't
[0:08] <gregaf> The_Bishop: can you give me the output of ceph -v?
[0:09] <gregaf> I want to check error codes against your version
[0:09] <The_Bishop> ceph version 0.45-181-g1bc0128 (commit:1bc01289993d6a451cba92189838299f1be3a7e2)
[0:09] <The_Bishop> it's next to bleeding edge
[0:10] <The_Bishop> it is from my problem with the MDSs
[0:10] * brambles (brambles@79.133.200.49) has joined #ceph
[0:10] <gregaf> yeah, the only place in the codebase that generates an ENODEV is when the OSD is first mounting... if you're seeing that, it's coming out of the OS on that node
[0:11] <The_Bishop> but the OSD was running, this happened while the MDS was starting up
[0:13] <The_Bishop> 2012-04-14 17:39:09.334901 7f4827c82700 1 -- 192.168.32.185:6800/6000 <== osd.3 192.168.32.185:6801/2219 10 ==== osd_op_reply(47 200.0000007a [delete] ondisk = -2 (No such file or directory)) v4 ==== 111+0+0 (2253285485 0 0) 0x4ff2950 con 0x22a6e50
[0:13] <The_Bishop> 2012-04-14 17:39:09.335904 7f4827c82700 1 -- 192.168.32.185:6800/6000 <== osd.0 192.168.32.177:6801/1505 16 ==== osd_op_reply(45 200.00000078 [delete] ondisk = -6 (No such device or address)) v4 ==== 111+0+0 (615103026 0 0) 0x7f480d2efd90 con 0x22aaba0
[0:13] <The_Bishop> 2012-04-14 17:39:09.335916 7f4827c82700 -1 mds.0.journaler(rw) _prezeroed got (6) No such device or address
[0:13] <The_Bishop> 2012-04-14 17:39:09.335922 7f4827c82700 -1 mds.0.journaler(rw) handle_write_error (6) No such device or address
[0:13] <gregaf> I don't know what to tell you; the OSD is returning that and the OSD doesn't generate ENODEV on its own; it's got to be coming out of the operating system
[0:14] <gregaf> go check the health of the underlying FS
[0:14] <gregaf> and make sure both its journal and its filestore are still accessible
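
A quick sketch of what "check the underlying FS" might look like in practice; the OSD data and journal paths below are placeholders, not taken from The_Bishop's setup:

    dmesg | tail -n 50                 # look for block-device or filesystem errors
    mount | grep /data/osd.0           # confirm the filestore is still mounted
    ls -l /data/osd.0/journal          # confirm the journal file or device is still reachable
    touch /data/osd.0/.probe && rm /data/osd.0/.probe   # verify the FS still accepts writes
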
[0:14] <The_Bishop> oh, this could be the case
[0:14] <gregaf> if you've got OSD logs we can look at those to pin it down a little more
[0:15] <The_Bishop> no, i removed them when i reformatted the osd in question
[0:15] <Tv_> nice 2-month delay on that mail from elder...
[0:15] <gregaf> yeah, somebody hadn't done moderation duty in a while
[0:17] <gregaf> (or else both he and Simon fixed their outbox issues within 3 minutes of each other ;))
[0:17] * sagelap (~sage@12.199.7.82) has joined #ceph
[0:30] <dmick> cephalopods for the win: http://i.imgur.com/l97du.jpg
[0:32] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: Leaving.)
[0:32] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: adjohn)
[0:34] <gregaf> that's just scary, dmick
[0:34] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[0:34] <dmick> do not mess with the octopus
[0:37] <Tv_> you see results of a battle, i see evolution!
[0:38] <Tv_> to the skies, dear invertebrates!
[0:38] <dmick> yikes, now *that's* scary
[0:38] <Tv_> helpful hint: squid can *already* fly
[0:39] <Tv_> they're just not good at gliding, and run out of propellant
[0:39] <Tv_> http://ferrisjabr.wordpress.com/2010/09/19/when-squid-fly-new-photographic-evidence/
[0:40] <dmick> http://images.wikia.com/lovecraft/images/1/1b/Kraken-cthulhu.jpg
[0:41] * lofejndif (~lsqavnbok@1RDAAAYAD.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[0:41] * BManojlovic (~steki@212.200.243.246) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:48] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: adjohn)
[0:51] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[0:53] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit ()
[0:57] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[0:59] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit ()
[0:59] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[1:02] * buck (~buck@bender.soe.ucsc.edu) Quit (Quit: Leaving)
[1:11] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[1:25] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: adjohn)
[1:28] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Ping timeout: 480 seconds)
[1:31] * sagelap (~sage@12.199.7.82) Quit (Quit: Leaving.)
[1:32] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[1:32] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit ()
[1:50] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[1:54] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit ()
[2:00] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[2:12] * Tv_ (~tv@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[2:23] * sagelap (~sage@12.199.7.82) has joined #ceph
[2:27] * sagelap (~sage@12.199.7.82) Quit ()
[2:28] * sagelap (~sage@12.199.7.82) has joined #ceph
[2:31] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Remote host closed the connection)
[2:36] * sagelap (~sage@12.199.7.82) Quit (Ping timeout: 480 seconds)
[2:39] * joao (~JL@89-181-153-140.net.novis.pt) Quit (Ping timeout: 480 seconds)
[2:47] * sagelap (~sage@ace.ops.newdream.net) has joined #ceph
[3:00] * loicd (~loic@99-7-168-244.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[3:17] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[3:28] <The_Bishop> i have a question about crush maps
[3:29] <The_Bishop> this is my active setting: http://pastebin.com/EyRbQU9B
[3:31] <The_Bishop> why is osd.0 (the smallest disc in the pool) getting full at first, and why can't i stop this behaviour with "ceph osd pause 0" or "ceph osd reweight 0 0.00001"?
[3:33] <The_Bishop> in the crush map i set the weights in terabytes corresponding to the filesystems
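
For reference, a sketch of the standard crush map edit cycle, since that is where the per-OSD terabyte weights The_Bishop mentions live; the file names are arbitrary:

    ceph osd getcrushmap -o crushmap.bin         # dump the compiled map from the cluster
    crushtool -d crushmap.bin -o crushmap.txt    # decompile to editable text
    # edit the "item osd.N weight W" lines, then recompile and inject:
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new
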
[3:52] * al (quassel@niel.cx) has joined #ceph
[4:29] * brambles (brambles@79.133.200.49) Quit (Remote host closed the connection)
[4:30] * brambles (brambles@79.133.200.49) has joined #ceph
[4:31] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[6:15] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[6:28] * chutzpah (~chutz@216.174.109.254) Quit (Quit: Leaving)
[6:36] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[6:45] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:50] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[6:55] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[7:38] * cattelan is now known as cattelan_away
[8:37] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) has joined #ceph
[8:39] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:04] <wido> The_Bishop: still there?
[9:07] <The_Bishop> yes
[9:13] <The_Bishop> my problem is that the smallest OSD is osd.0 and it gets backfilled first
[9:16] <The_Bishop> and the OSD daemon fills the FS up until it errors - even if i adjust all weights
[9:20] * votz (~votz@c-67-188-115-159.hsd1.ca.comcast.net) has joined #ceph
[9:22] <The_Bishop> gotta go, will read later...
[9:53] <wido> The_Bishop: have you tried giving all the OSDs an equal weight and seeing how the distribution goes?
[9:53] <wido> And, are you using the Posix FS, native RADOS or RBD?
[9:54] <wido> You could also try setting relative weight
[9:55] <wido> uh, skip that last sentence
[10:03] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[10:07] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) has joined #ceph
[10:07] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) has left #ceph
[11:34] <pmjdebruijn> I just noticed that ceph-rbdnamer.8 is missing
[11:37] <pmjdebruijn> at least the debian packaging tries to install it, but it ain't there
[11:43] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:29] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[12:39] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[12:45] * joao (~JL@89-181-153-140.net.novis.pt) has joined #ceph
[13:25] * votz (~votz@c-67-188-115-159.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[13:36] <The_Bishop> wido: i can not set the disks to equal weight, because there is already too much data in the cluster. and the disk sizes differ too much.
[13:36] <The_Bishop> ah, and i only use posix layer so far
[13:40] <pmjdebruijn> obsync.8 is missing too
[14:32] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[14:35] * gregorg (~Greg@78.155.152.6) has joined #ceph
[16:13] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[16:18] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[17:00] * hylick (~hylick@32.97.110.63) has joined #ceph
[17:02] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:02] <hylick> My ceph-fuse userspace mount is crashing unexpectedly for some reason, and I need to see if there are any logs specifically for the FUSE mount. I have the OSD logs. I am trying to find if there are any ceph-fuse logs that are captured by default, and if so, where do they live?
[17:13] <gregaf> The_Bishop: I believe that weights need to be positive integers; try that instead :)
[17:14] <gregaf> hylick: you can set userspace clients to log with the "--log-file=path/to/file" option, and set debugging options like "--debug-client 20"
[17:15] <gregaf> (or you can do it in the config file with "log file = path/to/file" if you prefer that to command line options)
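
For example (the mount point and log path are placeholders; the flags are the ones gregaf names):

    ceph-fuse /mnt/ceph --log-file=/var/log/ceph/ceph-fuse.log --debug-client 20

    # or the equivalent in ceph.conf:
    [client]
        log file = /var/log/ceph/ceph-fuse.log
        debug client = 20
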
[17:15] <hylick> sweet. thanks @gregaf
[17:16] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[17:17] * sagelap1 (~sage@12.199.7.82) has joined #ceph
[17:21] * sagelap (~sage@ace.ops.newdream.net) Quit (Ping timeout: 480 seconds)
[17:26] * sagelap1 (~sage@12.199.7.82) Quit (Read error: Operation timed out)
[17:27] * oliver1 (~oliver@p4FFFE3E4.dip.t-dialin.net) has joined #ceph
[17:32] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[17:44] * sagelap (~sage@12.199.7.82) has joined #ceph
[17:50] * loicd (~loic@99-7-168-244.lightspeed.sntcca.sbcglobal.net) Quit (Quit: Leaving.)
[17:55] * sagelap1 (~sage@mobile-166-205-138-108.mycingular.net) has joined #ceph
[17:55] * lofejndif (~lsqavnbok@09GAAE037.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:55] * aliguori (~anthony@32.97.110.59) has joined #ceph
[17:59] * sagelap (~sage@12.199.7.82) Quit (Ping timeout: 480 seconds)
[17:59] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:05] * sagelap1 (~sage@mobile-166-205-138-108.mycingular.net) Quit (Ping timeout: 480 seconds)
[18:08] * rturk (rturk@ds2390.dreamservers.com) has joined #ceph
[18:10] * lofejndif (~lsqavnbok@09GAAE037.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[18:11] <oliver1> @Sage: if you read this, test number 25 is running... without known problems. One hiccup, though: as I remove all images before every run, rbd once reported one remaining header in the pool. Strange. But other than that: Good ;)
[18:14] <oliver1> Being back l8r...
[18:14] * oliver1 (~oliver@p4FFFE3E4.dip.t-dialin.net) has left #ceph
[18:16] * sagelap (~sage@ace.ops.newdream.net) has joined #ceph
[18:24] * perplexed (~ncampbell@216.113.168.141) has joined #ceph
[18:25] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[18:29] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit ()
[18:36] * Tv_ (~tv@aon.hq.newdream.net) has joined #ceph
[18:47] * sagelap (~sage@ace.ops.newdream.net) Quit (Quit: Leaving.)
[18:48] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[18:48] * Oliver1 (~oliver1@ip-176-198-98-169.unitymediagroup.de) has joined #ceph
[18:54] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[18:57] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit ()
[19:02] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[19:06] <joao> so, are we using Danger Room today? :)
[19:09] * chutzpah (~chutz@216.174.109.254) has joined #ceph
[19:11] <gregaf> joao: I haven't seen anybody else in there, and I think Mark is doing something to prevent that kind of snafu in the future
[19:15] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[19:15] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Ping timeout: 480 seconds)
[19:17] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[19:21] <The_Bishop> gregaf: yes, i did not use negative weights; don't want to create antidata ;)
[19:24] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) has joined #ceph
[19:24] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[19:25] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit ()
[19:27] <gregaf> The_Bishop: you were using floats though, and I believe they need to be whole numbers
[19:28] <jmlowe> I saw that librbd caching is hitting 0.46, very excited about that
[19:30] <nhm> jmlowe: yeah, it sounds like it's a pretty big improvement for small operations.
[19:30] <jmlowe> I'm wondering how the libvirt xml syntax changes
[19:30] <wonko_be> sweet, i'm looking forward to that
[19:30] <wonko_be> i had to control myself to not compile the wip- myself
[19:31] <nhm> hehe
[19:31] <jmlowe> used to be something like <source protocol='rbd' name='rbd/vmimage:rbd_writeback_window=64000000'/>
[19:32] <jmlowe> that would get you a writeback cache
[19:33] <jmlowe> I'm assuming there is some default size, but then you could replace rbd_writeback_window argument with a cache size argument?
[19:34] <gregaf> I believe right now it's taking the same arguments as the writeback window did, so that we don't need any libvirt changes
[19:36] <The_Bishop> gregaf: no, in all weight places i see float numbers
[19:36] * sagelap1 (~sage@mobile-166-205-138-108.mycingular.net) has joined #ceph
[19:36] <gregaf> hmm, I may be misremembering then
[19:36] <jmlowe> so the same xml definitions used with the new code would yield vm's that use the new caching?
[19:36] <gregaf> I don't do that stuff much myself
[19:37] <sagelap1> jmlowe: it's a new option, and the old one went away
[19:38] <gregaf> The_Bishop: can you describe your symptoms more exactly?
[19:38] <jmlowe> sagelap1: ok that's what I assumed
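
For reference, the old per-image syntax jmlowe quotes above, next to a guess at the new-style cache option Sage refers to (the exact 0.46 option name is an assumption here, not confirmed in this conversation):

    <source protocol='rbd' name='rbd/vmimage:rbd_writeback_window=64000000'/>  <!-- old, removed -->
    <source protocol='rbd' name='rbd/vmimage:rbd_cache=true'/>                 <!-- assumed replacement -->
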
[19:38] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[19:38] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has left #ceph
[19:38] <gregaf> what happens when you try to reweight osd.0 down to a very small size?
[19:38] <sagelap1> no josh today?
[19:38] * hylick (~hylick@32.97.110.63) has left #ceph
[19:38] <gregaf> he's at OpenStack...
[19:38] <sagelap1> oh right
[19:38] <gregaf> all week, I believe :(
[19:39] <Tv_> sagelap1: oh hey sage.. i have gitbuilder questions
[19:39] <gregaf> so am I misremembering that the libvirt xml stuff will use the new caching without changes?
[19:40] <The_Bishop> gregaf: after the degradation, ceph started recovery and backfill
[19:41] <The_Bishop> osd.0 was next to full at that point, but ceph started to fill it first, even when i weighted osd.0 down with "ceph osd reweight 0 0.0001"
[19:42] <The_Bishop> so i stopped osd.0 and waited until ceph filled up osd.1 and osd.2
[19:43] <The_Bishop> after re-enabling osd.0 things went better
[19:43] <gregaf> things are working now, then?
[19:44] * perplexed (~ncampbell@216.113.168.141) has left #ceph
[19:44] <gregaf> it wouldn't surprise me if there were some wonky (lack of) interactions between an in-progress backfill and reweighting; I'm not sure if they're worth worrying about but if you write a bug/feature request in the tracker that'll make sure it doesn't get lost :)
[19:45] <The_Bishop> well, just had another issue; mon.0 froze and kept burning CPU as a zombie after "killall -9 ceph-osd"
[19:46] <The_Bishop> (it was not the first time i needed to reboot the box because of the mon)
[19:46] <The_Bishop> sry "killall -9 ceph-mon" i did
[19:48] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[19:48] * sagelap1 (~sage@mobile-166-205-138-108.mycingular.net) Quit (Ping timeout: 480 seconds)
[19:48] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit ()
[19:50] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[19:51] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit ()
[19:52] <The_Bishop> i wonder how ceph-mon can be stuck in a way that kill -9 does not help :(
[19:53] <gregaf> sorry, got distracted by some other things
[19:53] <gregaf> unless something else is very broken, that probably means it's stuck on disk access
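
If the process really is wedged on disk I/O it will typically sit in uninterruptible sleep (state "D"), which SIGKILL cannot interrupt until the I/O completes; a quick way to check (a sketch, run as root):

    ps -o pid,stat,wchan:30,cmd -C ceph-mon     # "D" in the STAT column means uninterruptible sleep
    cat /proc/$(pidof ceph-mon)/stack           # kernel stack of the stuck task, if available
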
[19:53] <gregaf> what made you conclude the ceph-mon daemon had frozen so that you wanted to kill -9 it?
[19:54] <The_Bishop> ceph-mon was not responding and used 100% CPU
[19:54] <The_Bishop> i have logs :)
[19:54] <gregaf> okay, can you post them somewhere?
[19:55] <The_Bishop> yes, collecting and preparing right now
[19:55] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[19:55] <The_Bishop> as said, it was not the first time, but this time i have logs
[19:56] <gregaf> it doesn't sound familiar, so logs are good :)
[19:56] <The_Bishop> some similar case was mentioned here some days ago
[20:00] * sagelap (~sage@ace.ops.newdream.net) has joined #ceph
[20:08] * sagelap (~sage@ace.ops.newdream.net) Quit (Ping timeout: 480 seconds)
[20:25] * sagelap1 (~sage@mobile-166-205-138-108.mycingular.net) has joined #ceph
[20:25] <sagelap1> does someone have a minute to take a look at wip-2031b?
[20:32] <gregaf> sagelap1: the mechanics look correct
[20:32] <gregaf> (assuming you meant wip-2301b :))
[20:32] <sagelap1> :) yeah
[20:38] <sagelap1> thanks
[20:51] * danieagle (~Daniel@177.43.213.15) has joined #ceph
[20:53] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: Leaving.)
[21:02] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[21:11] * BManojlovic (~steki@212.200.243.246) has joined #ceph
[21:22] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: adjohn)
[21:25] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[21:28] * sagelap1 (~sage@mobile-166-205-138-108.mycingular.net) Quit (Quit: Leaving.)
[21:34] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[21:47] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:00] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[22:02] * sagelap (~sage@12.199.7.82) has joined #ceph
[22:04] * lofejndif (~lsqavnbok@28IAAD0PB.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:19] <sjust> sagelap1: pushed updated wip_journal
[22:19] <nhm> sjust: any major changes?
[22:19] <sjust> nhm: nope, just responding to sage's comments
[22:20] <nhm> sjust: ok, I'm running some tests against it now.
[22:20] <sjust> ok
[22:20] <nhm> sjust: 1 OSD per node tests actually came out a little lower than tests from master. Not sure if it's significant or not.
[22:20] * sagelap (~sage@12.199.7.82) Quit (Quit: Leaving.)
[22:20] <sjust> hmm
[22:20] <sjust> how much?
[22:21] <nhm> sjust: haven't looked closely yet, but fastest I saw for 4 clients against 8 OSDs was about 205MB/s.
[22:21] <nhm> sjust: that was with the default pg_bits
[22:21] <sjust> what was the fastest with master?
[22:21] <Oliver1> Sage... u there?
[22:23] <nhm> sjust: 234MB/s
[22:23] <sjust> hmm
[22:23] <sjust> how long were the runs?
[22:23] <nhm> sjust: 60s, same as before. Probably need to redo both sets to get more conclusive results.
[22:23] <sjust> I hadn't tried any tests really with default pg_bits
[22:24] <nhm> sjust: you are using 8?
[22:24] <sjust> yesh
[22:24] <sjust> *yeah
[22:25] <nhm> sjust: I'll try it with 6 at some point. I'm doing 2 OSDs per node now.
[22:25] <sjust> I was getting closer to 240-280, I think with 4 clients and 8 osds
[22:25] <sjust> over 10 min
[22:26] <nhm> sjust: I'll try some more targeted longer running tests once this next set finishes up.
[22:26] <sjust> nhm: ok
[22:26] <sjust> what confounds me is that the speed was considerably higher for the first 60s in my tests
[22:26] <nhm> sjust: maybe with a couple of different pg_bits values and against master and wip_journal
[22:26] <sjust> ok
[22:27] <nhm> hrm, master or stable?
[22:27] <nhm> or both?
[22:27] <sjust> wip_journal
[22:27] <nhm> yes, but what to compare it to?
[22:27] <sjust> sorry, it was higher than my 10min runs
[22:27] <nhm> sometimes I'm afraid master might change underneath me.
[22:27] <sjust> so your 60s run should have been fast
[22:27] <sjust> nhm: actually, it does, you probably want to pick a particular commit and test that
[22:28] <sjust> also, I was probably doing 128 threads per client
[22:28] <sjust> **in_flight_ops
[22:28] <nhm> sjust: yeah, I've started testing against stable instead, but probably better to test against a specific release.
[22:29] <nhm> ah, I'm doing max of 64 clients per thread. That does seem to have an effect.
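
For context, the knob being compared here corresponds to the concurrent-ops flag if these are rados bench runs (an assumption; the pool name and durations are illustrative):

    rados -p data bench 60 write -t 64     # 60-second write test with 64 ops in flight
    rados -p data bench 600 write -t 128   # longer run with the deeper queue sjust describes
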
[22:29] <sjust> 0.45 wouldn't hurt, I can backport wip_journal for you to get an apples-to-mostly-apples comparison
[22:29] <nhm> sjust: how much debugging did you have on?
[22:29] <sjust> lots
[22:29] <nhm> ok, same
[22:30] <sjust> hmm, actually, I can't backport wip_journal
[22:31] <nhm> well, at least we can get an idea of what the overall effect of wip_journal + master changes are.
[22:31] <sjust> yeah
[22:31] <sjust> ah, yes I can
[22:31] <sjust> I'll push upstream
[22:32] <sjust> sorry, didn't think the OpRequest stuff was in 0.45
[22:32] <sjust> nhm: I'll check it and push it in a bit
[22:33] * Oliver1 (~oliver1@ip-176-198-98-169.unitymediagroup.de) Quit (Quit: Leaving.)
[22:33] <nhm> sjust: ok, cool. I'll do the next set of targeted tests against that and wip_journal.
[22:41] <sjust> ok, it's wip_journal_045_backport
[22:41] <sjust> so you might test against that and 0.45
[22:41] <nhm> oops, that's what I meant to say earlier
[22:41] <nhm> yeah, sounds good
[22:45] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Read error: Connection reset by peer)
[22:45] * loicd1 (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[22:53] * lofejndif (~lsqavnbok@28IAAD0PB.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[22:53] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: adjohn)
[22:54] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) Quit (Ping timeout: 480 seconds)
[23:03] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[23:05] * loicd1 (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Ping timeout: 480 seconds)
[23:14] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: adjohn)
[23:17] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[23:23] * MK_FG (~MK_FG@188.226.51.71) Quit (Ping timeout: 480 seconds)
[23:27] * MK_FG (~MK_FG@188.226.51.71) has joined #ceph
[23:41] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[23:43] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit ()
[23:44] <yehudasa> sjust: 11.a37f 0 0 0 0 0 0 0 down+peering 2012-04-14 00:05:48.346509 0'0 8439'336 [1221,1231] [1221,1231] 0'0 0.000000
[23:45] <yehudasa> 11.8551 0 0 0 0 0 0 0 active+degraded 2012-04-14 00:27:39.443152 0'0 8493'319 [1221,1231] [1221,1231] 0'0 2012-04-14 00:20:08.773564
[23:45] <nhm> oof
[23:46] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: Leaving.)
[23:47] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[23:50] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[23:57] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: adjohn)
[23:57] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[23:58] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit ()
[23:59] * aliguori (~anthony@32.97.110.59) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.