#ceph IRC Log


IRC Log for 2012-03-12

Timestamps are in GMT/BST.

[0:14] * lofejndif (~lsqavnbok@28IAADAK3.tor-irc.dnsbl.oftc.net) Quit (Quit: Leaving)
[0:36] * BManojlovic (~steki@ Quit (Remote host closed the connection)
[0:48] * tnt_ (~tnt@55.189-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:00] * yoshi (~yoshi@p8031-ipngn2701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:09] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[1:24] * timg (~tim@leibniz.catalyst.net.nz) Quit (Remote host closed the connection)
[1:30] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[1:32] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[1:50] * joao (~JL@ Quit (Ping timeout: 480 seconds)
[2:22] * mastermin (~stuff@S01060023bee96928.vs.shawcable.net) has joined #ceph
[2:23] * mastermin (~stuff@S01060023bee96928.vs.shawcable.net) Quit (autokilled: This host violated network policy. Mail support@oftc.net if you think this in error. (2012-03-12 01:23:56))
[2:32] * pruby (~tim@leibniz.catalyst.net.nz) Quit (Remote host closed the connection)
[2:36] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[3:34] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:35] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[3:43] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[6:00] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[6:35] * MarkN (~nathan@ Quit (Remote host closed the connection)
[6:43] * MarkN (~nathan@ has joined #ceph
[6:44] * MarkN (~nathan@ has left #ceph
[7:43] * pruby (~tim@leibniz.catalyst.net.nz) Quit (Remote host closed the connection)
[8:08] * tnt_ (~tnt@55.189-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:37] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[8:47] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) has joined #ceph
[9:18] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:25] <wonko_be> oh, librbd caching has been pushed back to .45 release
[9:32] * tnt_ (~tnt@55.189-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:45] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[9:48] * tnt_ (~tnt@ has joined #ceph
[10:01] <NaioN> argggg
[10:02] <NaioN> I was really looking forward to that one
[10:49] * joao (~JL@ has joined #ceph
[11:25] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[11:37] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[11:45] * yoshi (~yoshi@p8031-ipngn2701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:49] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[12:09] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[12:10] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[12:15] * stxShadow (~jens@p4FFFED58.dip.t-dialin.net) has joined #ceph
[12:19] * joao (~JL@ Quit (Ping timeout: 480 seconds)
[12:29] * joao (~JL@ace.ops.newdream.net) has joined #ceph
[13:10] * joao (~JL@ace.ops.newdream.net) Quit (Remote host closed the connection)
[13:13] * joao (~JL@ has joined #ceph
[13:34] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[13:44] * joao (~JL@ Quit (Ping timeout: 480 seconds)
[13:47] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[13:53] * joao (~JL@ace.ops.newdream.net) has joined #ceph
[14:23] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[14:52] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[15:06] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[15:09] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[15:09] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[15:20] <nhm> Naion: yeah, me too. :)
[15:24] <wonko_be> same here - tests are stable but a bit slow, so I was wondering what caching might add to the performance
[15:29] <stxShadow> as i see the caching feature has moved from v0.44 to v0.45
[15:29] <nhm> wonko_be: we'll see. Sage sounds hopeful that it should help with some of extent fragmentation we are seeding.
[15:29] <nhm> s/seeding/seeing
[15:30] <stxShadow> nhm, that would be great
[15:31] <nhm> Yeah, we've been having some hardware problems the last week that has limited testing, but I think we're past that now so hopefully we can go full steam ahead now.
[15:34] <wonko_be> do you use the rbd directly, or through qemu/kvm ?
[15:34] <nhm> wonko_be: so far I've only tested ceph and radosgw with an S3 client.
[15:35] <nhm> wonko_be: some of the other guys have done qemu/kvm testing though.
[15:35] <wonko_be> i'm looking to use ceph as a backend to providing block devices (for export through iscsi)
[15:36] <nhm> wonko_be: it's a good idea. It's what I was planning on doing with the openstack deployment I was working on before I joined up with the Ceph folks.
[15:36] <wonko_be> s3 gateway is nice, but i need block devices for my virtual machines (xen based)
[15:37] <wonko_be> ceph + simple iscsi layer could provide me with a redundant iscsi storage solution
[15:38] <nhm> wonko_be: Yeah, I think that will be a popular usecase, along with ceph itself for traditional file storage.
[15:39] <nhm> One step at a time though. Gotta get the basic object store stable and validated first.
[15:42] <wonko_be> rbd looks stable, couldn't get it to break in my latest tests
[15:43] <stxShadow> rbd itself is stable .... but sometimes i got corrupted data inside the rbd images
[15:43] <wonko_be> also with the latest releases?
[15:43] <wonko_be> i didn't really test consistency that much, more the iops i could push towards it
[15:43] <stxShadow> for example if i shut down one of the osds suddenly
[15:44] <stxShadow> -> power off , shutdown network etc
[15:44] <wonko_be> interesting, I'll add some consistency tests to my suite
[15:44] <wonko_be> what fs are you running on it?
[15:45] <stxShadow> xfs
[15:45] <wonko_be> ah, got some "interesting" results with xfs in the past also
[15:45] <nhm> that reminds me, I wonder if there are any good cheap sata controllers out there that support batteries.
[15:46] <stxShadow> nhm, i use LSI and 3Ware -> both with BBU Units .... so that shouldn't be a problem
[15:46] <wonko_be> nhm: we use Areca controllers or LSI-rebranded-to-intel - both do battery backup, although most nowadays are moving towards flash
[15:47] <wonko_be> stxShadow: we were not happy with 3ware, got corrupted data a couple of times
[15:47] <nhm> stxShadow: raid controllers or actual sata controllers?
[15:47] <nhm> wonko_be: I've had a ton of problems with the Areca 1680s we had.
[15:47] <stxShadow> sorry ... raid controllers ....
[15:48] <stxShadow> why use battery backup on a non raid controller ?
[15:48] <nhm> stxShadow: yeah, having said that, the LSI cards with the non-raid firmware are solid from what I've heard.
[15:48] <wonko_be> nhm: interesting - what kind of problems? corruption?
[15:48] <nhm> stxShadow: disable barriers
[15:48] <stxShadow> i see
[15:49] <stxShadow> we use: 3ware Inc 9750 SAS2/SATA-II
[15:49] <nhm> wonko_be: cards flaking out leading to data corruption. Half because Lustre chokes if there are any hardware problems.
[15:49] <stxShadow> never had problems with data corruption
[15:50] <wonko_be> nhm: strange, areca has been very reliable to us
[15:50] <nhm> wonko_be: there were some similar sounding comments from people in a big thread on hardforum: http://hardforum.com/showthread.php?t=1483771
[15:51] <nhm> wonko_be: I did notice that some of the replacement cards we got were bumped a rivision or two. Maybe the early ones were bad or something.
[15:51] <nhm> ugh, revision.
[15:52] <wonko_be> we have all kinds of models (12xx, 16xx, 18xx series), a couple of 100 in total
[15:52] <wonko_be> never had any problem
[15:52] <nhm> wonko_be: that's good to hear. We had 24 of the 1680 models. Probably replaced about 5-6 of them in 18 months.
[15:53] <stxShadow> hmmm .... how fast should ceph recover data access if i shut down a osd ?
[15:53] <nhm> wonko_be: granted, we HAMMERED those cards too.
[15:54] <nhm> they were used for the scratch filesystem of a ~9000 core cluster.
[15:54] <stxShadow> INFO : task flush blocked for more than 120 seconds
[15:55] <stxShadow> why the hell ..... osd is out after 30 seconds
[15:55] <wonko_be> nhm: our workload isn't that massive, but some are used in iscsi backend nodes
[15:55] <wonko_be> getting enough random io to trigger any problems might they exist
[15:56] <nhm> stxShadow: Don't know for sure. Sorry. :/
[15:56] <stxShadow> this is our test system -> only 2 osd
[15:56] <stxShadow> and 1 mon / mds
[15:56] <stxShadow> v0.43
[15:56] <stxShadow> one osd down ..... and the vms are stuck
[15:57] <wonko_be> stxShadow: nothing in the logs?
[15:58] <nhm> with 2 OSDs and 1 down, maybe it's unhappy that there is no place to replicate.
[15:58] <nhm> I seem to remember someone around here saying there were still some problems with only 1 OSD available.
[15:58] <stxShadow> nhm, but .... should that be the case ? all data is available on the remaining node
[15:59] <stxShadow> i will add another osd .....
[15:59] <stxShadow> and retry :)
[16:00] <nhm> stxShadow: probably shouldn't be the case. I just happened to remember it though since I was going to do some single OSD testing and it didn't work.
[16:01] <nhm> trying to type while sitting(bouncing) on an exercise ball is a challenge.
[16:03] * nhm grooves out to they might be giants
[16:09] * stxShadow -> on a meeting now
[17:12] <sagewk> elder: there?
[17:13] <sagewk> stxshadow: what does ceph pg stat say? you shouldn't have problems with only 1 osd down unless there were strange combinations of multiple failures leading up to it
[17:14] <sagewk> stxshadow: if you see task blocked it sounds like a kernel issue (btrfs?)... whats the stack trace look like?
[17:26] * oliver1 (~oliver@p4FFFED58.dip.t-dialin.net) has joined #ceph
[17:28] <oliver1> G'day... my name is Oliver, working in the same company Jens ( stxshadow) is in ;-)
[17:29] <oliver1> @sage: as Jens is in a meeting right now, here is some output from the "ceph pg stat" command, while 1/2 OSD is out for dinner:
[17:29] <oliver1> 2012-03-12 17:28:00.946150 mon.0 -> 'v18309: 476 pgs: 264 active+clean, 212 stale+active+clean; 48663 MB data, 48309 MB used, 132 GB / 179 GB avail' (0)
[17:33] <sagewk> stale means the osds for those pgs are all down.. are you sure there's only 1 ceph-osd down?
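The "ceph pg stat" summary pasted above can be checked for stale pgs mechanically. What follows is only a minimal sketch in Python, assuming the summary keeps the "N pgs: count state, count state; ..." shape seen in that one line; the helper names parse_pg_states and stale_count are hypothetical and not part of any Ceph tool.

```python
import re

def parse_pg_states(summary):
    """Return (total_pgs, {state: count}) from a pg stat summary line.

    Assumes the 'N pgs: <count> <state>, ...;' layout seen in the log above.
    """
    match = re.search(r"(\d+) pgs: ([^;]+);", summary)
    if match is None:
        raise ValueError("no pg summary found in line")
    total = int(match.group(1))
    states = {}
    for part in match.group(2).split(","):
        count, state = part.strip().split(" ", 1)
        states[state] = int(count)
    return total, states

def stale_count(states):
    """Sum the pgs whose combined state includes the 'stale' flag."""
    return sum(n for state, n in states.items() if "stale" in state.split("+"))

line = ("2012-03-12 17:28:00.946150 mon.0 -> 'v18309: 476 pgs: "
        "264 active+clean, 212 stale+active+clean; 48663 MB data, "
        "48309 MB used, 132 GB / 179 GB avail' (0)")
total, states = parse_pg_states(line)
print(total, states, stale_count(states))
```

A non-zero stale count is the condition sagewk describes: every osd holding those pgs is down.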
[17:34] <oliver1> on the host, where the vm is running, it says: 1 up, 1 in. So far, so good...
[17:39] <oliver1> it _seems_ to be 1 up, now the ceph-osd on the second node is not kill'able.
[17:41] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[17:43] <oliver1> mhm... hard killed via -9, started again. Running again.
[17:50] * tnt_ (~tnt@ Quit (Ping timeout: 480 seconds)
[17:53] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[17:59] * tnt_ (~tnt@55.189-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:00] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[18:01] <elder> sagewk, I'm here. Sorry I missed your ping.
[18:01] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[18:06] <sagewk> elder: it reset the testing branch over the weekend so that the new patches would hit qa. seem to be fine so far.
[18:06] <elder> You reset it?
[18:06] <elder> (not it)
[18:06] <sagewk> i set it to the wip-testing, after cleaning up the btrfs merge a bit.
[18:06] * chutzpah (~chutz@ has joined #ceph
[18:06] <sagewk> or whatever your new testing branch was
[18:07] <elder> That was it.
[18:07] <elder> The btrfs merge was the result of a cherry-pick of a merge. And I later cherry-picked *that*, so somehow it ended up being a patch rather than a merge.
[18:07] <elder> Or something.
[18:07] <sagewk> anyway, i think we can safely move master to wip-master now.
[18:08] <sagewk> yeah.. i just replaced it with a new merge of the latest btrfs for-linus, which had new stuff anyway
[18:08] <elder> Anyway, thank you for doing that. I suppose I shouldn't be so pedantic about getting my own stuff tested before pushing it to the testing branch.
[18:08] <elder> I will push master.
[18:08] <sagewk> well, it should be easier to test now too :)
[18:08] <elder> (using my wip-master, which is what wip-testing was based on)
[18:10] <elder> Can I test with the new hardware? I haven't figured out what I can do yet.
[18:10] <elder> Or how to go about it.
[18:10] <nhm> elder: I just was able to lock some plana nodes and log into them.
[18:11] <sagewk> elder: yeah, use teuthology-lock just like before, it pulls from the new pool of plana nodes
[18:11] <nhm> elder: haven't submitted anything via teuthology yet because the task I'm writing is still broken.
[18:11] <elder> Schweet.
[18:11] <elder> It seems to have worked for me too.
[18:11] <elder> I'll submit a quickie.
[18:12] <sagewk> elder: any luck with the core dumps?
[18:13] <elder> I'm working now on it. I have a second machine set up, but unfortunately (or not) it's a Fedora 15 system. It's good because doing it there is well-documented. But everything else I use is Ubuntu.
[18:14] <elder> I have "crash" installed and am poking on my live kernel, just need to dump a core and try it out.
[18:15] <elder> Wow, sagewk, the testing branch you took is my latest and greatest, including stuff that hasn't been reviewed yet...
[18:15] <elder> Glad it's working.
[18:16] <nhm> elder: I'm always scared when that happens. ;)
[18:16] <elder> Well I have some confidence in these changes... I just hadn't formally tested them yet.
[18:16] <elder> I tend to make lots of baby steps, which means that each individual change can be reviewed and it's not hard to convince yourself the change is correct.
[18:18] <oliver1> I will try to nail down things 2morrow, right now it's not deterministic. It's up to some ( work-)load issue... I tried to fake some load via "cpulimit" and "netperfmeter" for both host-cpu-load and network-throughput... achieved similar results like we saw in the production environment, but as I said, not determ. enough...
[18:19] * stxShadow (~jens@p4FFFED58.dip.t-dialin.net) Quit (Remote host closed the connection)
[18:20] * oliver1 (~oliver@p4FFFED58.dip.t-dialin.net) has left #ceph
[18:23] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[18:28] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:28] * bchrisman (~Adium@ has joined #ceph
[18:47] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[18:58] * LarsFronius (~LarsFroni@f054104003.adsl.alicedsl.de) has joined #ceph
[18:59] <elder> 'failed to install new kernel version within timeout'
[19:05] <elder> So I guess attempting to install a new kernel may not be working. Two of the machines are running the testing branch (not what I asked for) and one is running something else.
[19:05] <elder> Oh well, I'll keep working on crash dumps.
[19:09] <sagewk> elder: probably it was just slow...
[19:10] <sagewk> check those machines after a minute and see if they came up with the new kernel
[19:14] <sagewk> joshd: does the cron job need to be reenabled for the nightlies?
[19:14] <sagewk> haven't seen any try to run
[19:18] <joshd> sagewk: the vm can't connect to the gitbuilder to download the tarball
[19:18] <joshd> sagewk: we could not save tarballs for now to avoid that
[19:19] <joshd> although it still needs to get the latest sha1 from somewhere...
[19:20] <dmick> I guess the problem with getting to ipmi is that I don't have a route to the ipmi network
[19:20] <dmick> must be an OpenVPN server-side config
[19:20] <elder> sagewk, I am sitting on all three of them and I do see that my kernel is sitting in /boot. I've logged out so I don't get in the way...
[19:20] <joshd> sagewk: I'll try changing the url to not use the subdomain
[19:22] <sagewk> joshd: really? maybe it's talking to the wrong one?
[19:22] * blufor (~blufor@mongo-rs2-1.candycloud.eu) has joined #ceph
[19:23] <blufor> ehlo ;]
[19:23] <joao> so, no standup today?
[19:23] <sagewk> elder: uname -a has the old kernel still?
[19:23] <elder> Just a sec.
[19:23] <sagewk> joao: missed it, it was right before we called you back..
[19:23] <dmick> joao: the US went on Standard time today, you probably didn't?
[19:23] <joao> oh
[19:23] <dmick> (actually yesterday, but...)
[19:24] <joao> ups
[19:24] <sagewk> oh yeah :)
[19:24] <elder> sage, yes, still sitting on the "old" kernel.
[19:24] <elder> Looks like it may be ready to boot the new one, but that didn't occur.
[19:24] <sagewk> elder: weird.. just try it again?
[19:24] <joao> looks like I'll have to adjust my clock :p
[19:24] <elder> OK, happily.
[19:24] <blufor> hey guys, i'd like to ask a quick Q, does anyone successfully run ceph over a cluster of intel atoms ? (i plan for 32x D525 servers with 4GB RAM each, 1Gbit network, SSD drives)
[19:25] <sagewk> blufor: wido has
[19:25] <blufor> i wonder to what success :]
[19:26] <blufor> btw the presentations from the openstack meetup were awesome
[19:26] <joao> now you got me wondering if it is possible to run a ceph cluster on raspberry pis
[19:26] <sagewk> i think he's had pretty good luck.. you should ask him tho when he's around (probably idle now)
[19:26] <blufor> sagewk: i will
[19:26] <elder> Or run it on phones.
[19:27] <joao> a ceph cluster on android
[19:27] <joao> that would be interesting
[19:27] <blufor> elder: 32 atoms eat less than 16 xeons ;]
[19:27] <elder> Yes. Maybe save a MB on each phone, scale up to the number of phones out there, pretty soon we're talking big data.
[19:27] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:28] <elder> And a hella lotta redundancy available.
[19:28] <elder> (And necessary)
[19:28] <joao> elder, although it would be hard to use btrfs
[19:28] <nhm> elder: add some GPUs and we could keynote the next SC conference. :P
[19:28] <joao> not sure if anyone has been able to get btrfs on android
[19:28] <blufor> http://www.supermicro.com/products/nfo/2UTwin3.cfm
[19:29] <blufor> kinda-blade solution ;]
[19:29] <nhm> green data center + big data + GPUs = high score!
[19:29] <elder> At the rate XFS is shrinking maybe that will happen soon.
[19:29] <elder> http://sandeen.net/wordpress/wp-content/uploads/2011/06/fs-loc.png
[19:30] <elder> Sorry, linked-to from this: http://sandeen.net/wordpress/computers/linux-filesystems-loc/
[19:30] <nhm> elder: clearly from a project management perspective, negative work is getting done.
[19:32] * BManojlovic (~steki@ has joined #ceph
[19:33] <blufor> btw i've heard someone saying at the conference that it's not wise to span the cluster across datacenters, is there any other way (other than the usual ones) to transfer blocks to another datacenter ?
[19:33] <sagewk> elder: the xfs fix isn't in the new testing, the lockdep warning is turning up in qa
[19:33] <blufor> i mean, not-spanning the cluster across more DCs makes sense
[19:35] <joshd> sagewk: the noon run should work
[19:35] <sagewk> joshd: great
[19:35] <joshd> blufor: no cross-dc replication yet
[19:35] <elder> Oopsps.
[19:35] <joshd> unless you have no latency between dcs :)
[19:36] <elder> I need a new keyboard. Or maybe it's just batteries.
[19:36] <elder> I'll update both master and testing.
[19:36] <sagewk> joshd: wip-2160 look right?
[19:36] <elder> I'll also send out the patches now in testing for review.
[19:36] <blufor> joshd: well, give me 10Gbit between prague and SF with 1ms and i'm happy ;]
[19:37] <sagewk> elder: er, just testing for the xfs branch, right? or are you talking about something else?
[19:37] <blufor> joshd: + i'd call you the magician of the year ;]
[19:37] <elder> I mean, I will add that one XFS commit to the end of the ceph-client/testing branch.
[19:38] <elder> To avoid the warnings.
[19:38] <elder> The master branch will contain only the commits in that branch that have been reviewed, and which belong there.
[19:38] <sagewk> elder: :) just checking
[19:38] <elder> (I.e., no "btrfs crap" or XFS commits)
[19:39] <sagewk> no such thing as "xfs crap"? :)
[19:40] <elder> I was referring to your commit message. I think you called one of them "more btrfs crap"
[19:41] <elder> Sorry, "more btrfs debug crap"
[19:41] <sagewk> :)
[19:41] <elder> I'll add "crap" to my next XFS-related commit message for you?
[19:42] <sagewk> hehe
[19:42] <elder> I mean :)
[19:44] <elder> sagewk, same result re-trying my same test request on the plana nodes.
[19:44] <elder> 'failed to install new kernel version within timeout'
[19:44] <sagewk> which nodes?
[19:44] <elder> All three systems involved have the new kernel available, but are still running the old one (never rebooted)
[19:44] <sagewk> elder: i think i know what the problem is
[19:45] <elder> plana{33,34,93}
[19:45] <sagewk> trying to boot kernel 00164?
[19:47] <elder> No. 00063
[19:47] <elder> commit id ca69d07
[19:48] <sagewk> can you paste me the output from when grub runs?
[19:48] <elder> You mean from the teuthology output?
[19:48] <sagewk> yeah
[19:49] <sagewk> joshd: do you remember what tv did to make it always use the latest (not highest-sorting) kernel?
[19:49] <joshd> sagewk: $ cat /etc/grub.d/01_ceph_kernel
[19:49] <joshd> cat <<EOF
[19:49] <joshd> set default="Ubuntu, with Linux 3.2.0-ceph-00063-gca69d07"
[19:49] <joshd> EOF
[19:50] * The_Bishop (~bishop@178-17-163-220.static-host.net) Quit (Ping timeout: 480 seconds)
[19:50] <sagewk> that's there
[19:50] <elder> sagewk, it doesn't look very interesting to me. I'll paste it in a private window.
[19:51] <sagewk> tnx
[19:52] * The_Bishop (~bishop@178-17-163-220.static-host.net) has joined #ceph
[20:10] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[20:18] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Quit: adjohn)
[20:20] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[20:24] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[20:33] <elder> sagewk, It's a mess, I tried putting together for-linus + current master extras + everything in testing, but it doesn't work cleanly. I think I'll tack the "current master extras" (12 commits) on to the end of what's now in "testing". Any objection?
[20:34] <elder> Oh, and I'll keep sliding those 6 "btrfs and xfs crap" commits to the end.
[20:34] <sagewk> what re current master extras?
[20:34] <sagewk> are
[20:36] <elder> There are a dozen commits in ceph-client/master based on Linux 3.2. ceph-client/for-linus has 6 commits based on Linux 3.2. And ceph-client/testing has a jillion commits based on ceph-client/for-linus.
[20:36] <elder> The 12 commits in master are not present in testing.
[20:36] <elder> So I'd like to append them to the end rather than rebase testing (which is not clean)
[20:37] <elder> The end result will have all commits, but those old ones (from master) will follow the new ones (from testing) in the commit stream.
[20:56] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[20:56] <sagewk> k
[20:57] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[21:00] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[21:36] * LarsFronius (~LarsFroni@f054104003.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[21:38] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[21:51] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[21:52] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[22:31] <NaioN> how can you see the actual size of a rbd object?
[22:32] <NaioN> rbd info shows the provisioned space not the allocated space
[22:33] <sagewk> naion: right now you can't (easily)
[22:33] <sagewk> naion: want to enter a feature request?
[22:33] <NaioN> sure :)
[22:33] <sagewk> it'll be an o(n) operation, in any case, unless the per-pool stats are good enough for you
[22:33] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[22:34] <NaioN> what do you mean with per-pool?
[22:36] <NaioN> because per-pool would be good enough I think
[22:37] <NaioN> at the moment I've all the rbds in the rbd pool
[22:37] <joshd> NaioN: I think sage means the stats you can get from 'ceph pg dump' that show overall usage for each pool
[22:38] <sagewk> right. as opposed to per-image space utilization
[22:40] <NaioN> well the problem is that I don't know how much overcommitment I have
[22:40] <sagewk> ah
[22:40] <NaioN> well i could count it if there are only rbds in a pool
[22:41] <NaioN> and then you could count all the provisioned space and look at the used space in the pool
[22:41] <NaioN> but of course that would be for all the rbds together
[22:43] <NaioN> and the dump is a little bit raw :)
[22:43] <NaioN> I first have to figure out which pgs belong to which pool
[22:43] <joshd> NaioN: --format=json
[22:44] <joshd> and it tells you the totals for the pools too, not just each pg
[22:44] <NaioN> aha i see the first number is the pool number
[22:46] <joshd> yeah
[22:47] <NaioN> joshd: thanks the json output gives the totals
[22:48] <NaioN> { "poolid": 2,
[22:48] <NaioN> "stat_sum": { "num_bytes": 12681763093153,
[22:48] <NaioN> "num_objects": 3149270,
[22:48] <NaioN> well that's useful information for now, with this it's easy to count the overcommitment
[22:49] <joshd> just keep in mind those aren't instantly updated - the osds periodically report their info to the monitors
[22:49] <NaioN> but it would be really nice to get the allocated size of a rbd with rbd info
[22:50] <joshd> yeah, it'll just be linear in the number of objects for now
[22:50] <NaioN> joshd: no problem it isn't that it has to be live information, but at a point it's important to know the overcommitment ratio to increase the size of the cluster before it gets full
[22:51] <NaioN> and if the ratio is big it's more a risk to get out of size
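The overcommitment count NaioN describes (sum of provisioned rbd sizes against used space in the pool) could be sketched roughly like this in Python. It assumes a pool entry shaped like the JSON fragment pasted above ("poolid" plus "stat_sum" with "num_bytes"); the function name overcommit_ratio and the two 10 TiB image sizes are hypothetical, chosen only for illustration. As joshd notes, the stats lag behind, so the result is approximate.

```python
def overcommit_ratio(pool_entry, provisioned_sizes):
    """Ratio of total provisioned image bytes to bytes used in the pool.

    pool_entry: dict shaped like the JSON fragment above,
                e.g. {"poolid": 2, "stat_sum": {"num_bytes": ...}}.
    provisioned_sizes: provisioned size in bytes of each rbd image in
                the pool (gathered separately, e.g. from rbd info).
    """
    used = pool_entry["stat_sum"]["num_bytes"]
    if used == 0:
        raise ValueError("pool reports no used bytes yet")
    return sum(provisioned_sizes) / used

# Pool stats copied from the fragment above; image sizes are made up.
pool = {"poolid": 2,
        "stat_sum": {"num_bytes": 12681763093153, "num_objects": 3149270}}
images = [10 * 2**40, 10 * 2**40]  # two hypothetical 10 TiB images

print(round(overcommit_ratio(pool, images), 3))
```

This only works as a per-pool figure when the pool holds nothing but rbd images, which is the caveat raised in the conversation.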
[22:52] <NaioN> joshd: it depends if all the rbd use the same object size, does different sizes (besides 4mb) work? I saw it's a option
[22:52] <joshd> yeah, different sizes work
[22:53] <joshd> it's always a constant size for a given image until layering is implementing (then parent/child images might have differing object sizes)
[22:55] <joshd> er, implemented
[22:56] <NaioN> sagewk: how can I make the feature request? :)
[22:57] <joshd> NaioN: make a new issue at tracker.newdream.net and change "bug" to "feature"
[23:01] <NaioN> joshd: euhmmm where exactly?
[23:02] <NaioN> found it :)
[23:02] <joshd> NaioN: ah, I think you have to create an account to create an issue
[23:02] <NaioN> yeah did it
[23:03] <sagewk> joshd: can you look at those python errors in the qa run? not sure why it's getting permission denied there
[23:03] <joshd> sure
[23:18] <joshd> sagewk: there are two ceph tasks in the config file
[23:19] <joshd> sagewk: permissions for that script are changed to rx, hence the error rewriting it
[23:19] <sagewk> ah
[23:19] <sagewk> why two ceph tasks?
[23:20] <joshd> due to the suite changes I'd guess
[23:20] <joshd> the top of the teuthology log shows them both
[23:20] <sagewk> oh i see the problem
[23:21] <sagewk> it's the new jobs i added
[23:22] <sagewk> fixed
[23:23] * nhm pokes at gitbuilder
[23:23] <sagewk> which one?
[23:25] <nhm> sagewk: oneiric I think. Need to pick up some new rgw changes Yehuda put in to remove a libfcgi.so.0 dependency.
[23:26] <nhm> sagewk: radosgw-admin was dying when invoked through teuthology.
[23:26] <sagewk> and libfcgi isn't installed on plana yet.. the chef task needs to run first
[23:26] <sagewk> put - chef: as the first task and rerun teuth
[23:27] <nhm> sagewk: ok. sounded kind of like Yehuda maybe wanted to get rid of the dependency instead of installing it?
[23:28] <sagewk> well radosgw needs it, so that part will fail too
[23:28] <sagewk> but yeah, both are good.
[23:28] <nhm> ok, I'll add the chef task
[23:38] <nhm> sage: ok, looks like that took care of the library problem, but we need python-pip, python-virtualenv, python-dev, and libevent-dev installed. Gotta run and eat dinner. I'll look at it later.
[23:39] <sagewk> k, i'll add those
[23:40] <sagewk> nhm: hmm, those are already in the chef radosgw cookbook. they're not installed by chef? can you tell from the output if it's running the radosgw.rb cookbook?
[23:48] <nhm> hrm, I found this... "radosgw not supported on: ubuntu 11.10"
[23:48] <nhm> ok, going to eat for real now

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.