#ceph IRC Log


IRC Log for 2012-04-12

Timestamps are in GMT/BST.

[0:00] <gregaf> perplexed: oh, that's... higher than I've seen before -- are any of the reads local?
[0:00] <gregaf> sagewk: did you see above?
[0:00] <perplexed> it's possible, it's running from one of the 4 test servers... that could be it
[0:00] <sagewk> gregaf: thanks
[0:01] <dmick> perplexed: I've found them reasonably accurate
[0:01] <dmick> they top out at right about 115/116MB/s across a 1G link
[0:01] <gregaf> perplexed: I believe it should be accurate, and although that extra over linespeed is a little more than I'd expect it's not impossible it's due to loopback
[0:01] <perplexed> thx. Makes sense if some of the reads are local
[0:02] <perplexed> loopback
[0:02] <gregaf> I guess you'd get 1/4 of your reads going local and 3/4 going remote, so 115MB/s / (3/4) = 153MB/s
[0:03] <gregaf> soooo... yeah, due to locality :)
[0:03] * jks2 (jks@193.189.93.254) has joined #ceph
[0:04] <dmick> Math FTW
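
A quick back-of-the-envelope check of gregaf's estimate above (just a sketch: it assumes dmick's ~115 MB/s single-GbE ceiling and that exactly one of the four test servers serves its reads locally):

    # If 3/4 of reads must cross the 1GbE link at ~115 MB/s and the other
    # 1/4 are served from the local OSD, the aggregate read rate reported
    # by the benchmark works out to roughly:
    link_limit_mb_s = 115.0        # practical ceiling of a 1 Gb/s link
    remote_fraction = 3.0 / 4.0    # 3 of the 4 read sources are remote
    print(link_limit_mb_s / remote_fraction)   # ~153 MB/s, matching the log
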
[0:04] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[0:11] * jksM (jks@193.189.93.254) Quit (Ping timeout: 480 seconds)
[0:24] <sagewk> gregaf: wip-pools?
[0:28] <gregaf> sagewk: looks good, but we should probably add a "--yes-i-really-mean-it" guard for disable_lpgs
[0:28] <gregaf> in case somebody actually did use them
[0:28] <sagewk> yeah
[0:42] <Kioob> does rbd support trim/discard?
[0:43] <Qten> hi cephers, what's the difference between ceph-rbd and qemu-rbd, if anything?
[0:45] * gregorg (~Greg@78.155.152.6) Quit (Ping timeout: 480 seconds)
[0:46] <joshd> Kioob: it's being added
[0:46] <Qten> nhm: thanks for the reply earlier :)
[0:46] <joshd> Qten: not sure what you mean by ceph-rbd
[0:46] * gregorg (~Greg@78.155.152.6) has joined #ceph
[0:46] <Kioob> ok, thanks joshd
[0:46] <nhm> Qten: np, though I forgot the question now. :)
[0:47] <Qten> sorry i thought the qemu-rados block device was different to ceph's rados block device or are they one and the same :)
[0:47] <Qten> because earlier someone was talking about ceph's rbd having issues and qemu-rbd wasn't an option /confused
[0:48] <joshd> it's the same - there's a userspace library (librbd) that qemu uses to access it, and a kernel module for exposing block devices directly
[0:49] <joshd> the person earlier was interested in the kernel module, which lets you have /dev/rbdN block devices
[0:50] <Qten> ah i see, so the userspace version doesn't give you the /dev/rbdN, just a flat file?
[0:52] <Qten> or i'll ask: do either or both rbd devices require btrfs for snapshotting?
[0:52] <joshd> the userspace version needs an application to use it, i.e. qemu
[0:53] <joshd> the user/kernel split is just about accessing them, the underlying format is the same, and you can switch from one to the other
[0:53] <joshd> btrfs isn't required for snapshotting, although it is more efficient
[0:54] <Qten> efficient in the way of space usage?
[0:54] <joshd> and time (not too much, but a little)
[0:55] <Qten> another part i'm trying to understand is why not just use a flatfile ie raw file vs qemu-rbd
[0:55] <Qten> rbd i imagine splits it up into smaller files?
[0:56] <joshd> you mean objects? yeah, rbd stripes the block device over uniformly sized objects (4MiB by default)
[0:56] <joshd> it's also sparse, in that objects are only created when you write to them
[0:56] <Qten> so if i have a rbd which is say 10g, then i snapshot it, and i write another 2g to it, is the snapshot 12g or 2g?
[0:57] <joshd> the extra space used is 2g since each object is treated separately
[0:58] <Qten> so the snapshots work like vmware snapshots i gather?
[0:58] <joshd> i.e. data for a snapshot is not actually copied until the object is written to
[0:58] <Qten> nice
[0:59] <joshd> I'm not too familiar with vmware, but rbd snapshots don't form a chain or anything like that - you can delete old snapshots, rollback to any of them, and there's no coalescing (you can just delete snapshots to save space)
[1:00] <Qten> wow
[1:00] <Qten> ok thats not like vmware :)
[1:00] <joshd> there's nothing like backing files etc - it's all just one rbd image
[1:01] <joshd> the one feature that we don't have yet (but will soon) is copy-on-write cloning to create new images from old ones (or old snapshots)
[1:01] <Qten> how does that work if you have say 10g (snapshot) -> 2g (snapshot) -> 3g (snapshot) -> 4g (snapshot), we're at 20gig in total - how could i delete the 3g snapshot without losing data?
[1:02] <joshd> it's all part of how RADOS deals with 'self-managed' (i.e. application managed, per-object) snapshots
[1:03] <Qten> ah so it's not the rbd part doing that, it's the server on a chunk level
[1:04] <joshd> yeah, exactly
[1:04] <Qten> ahh makes sense
[1:04] <Qten> very interesting
[1:05] <joshd> once you have rados, rbd is a pretty simple wrapper
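
A small illustration of the layout and snapshot accounting joshd describes above (a sketch only: 4 MiB is the default object size he mentions, and the numbers just replay Qten's 10g-plus-2g example; nothing here is real rbd code):

    OBJECT_SIZE = 4 * 1024 * 1024            # default rbd object size (4 MiB)

    def object_index(offset):
        """Which of the image's objects a given byte offset is striped into."""
        return offset // OBJECT_SIZE

    image_size = 10 * 1024**3                # a 10 GiB image...
    print(image_size // OBJECT_SIZE)         # ...maps onto 2560 possible objects

    # The image is sparse: an object only exists once something is written to
    # it. Snapshots are copy-on-write per object, so after snapshotting the
    # 10 GiB image and writing 2 GiB of new data, only the overwritten objects
    # get copied and the extra space used is ~2 GiB, not another 10 GiB.
    written_after_snapshot = 2 * 1024**3
    print(written_after_snapshot // 1024**3) # 2 (GiB of additional space used)
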
[1:05] <Qten> here's a general question: why have a block device at all, why not just treat block devices as files? ie make a gateway, effectively, which just interfaces a folder as a disk?
[1:06] <Qten> for an OS install
[1:06] <Qten> and i imagine you only need a small file to act as a MBR/metadata section
[1:06] <Qten> or to try and explain it better emulate a hard disk from a folder
[1:06] <Qten> of files
[1:07] * mgalkiewicz (~mgalkiewi@85.89.186.247) Quit (Quit: Leaving)
[1:08] <joshd> you could think of rbd as doing exactly that, since each rbd image uses a unique object prefix (like a folder) and stores objects in that namespace
[1:09] <joshd> the block interface itself is useful for vms and other applications that use block devices directly
[1:09] <Qten> hmm
[1:10] <Qten> so i was thinking more along the lines of a gateway to object storage: instead of storing the files inside a 4mb chunk, you store the actual file
[1:10] <Qten> probably a crazy idea just a thought
[1:12] <Qten> for example in our ubuntu install we have 3 files file1 file2 file3, currently they are stored inside a block device why not just store them directly on the object storage and have a driver which acts like a block device but each file is easily accessible
[1:12] <Qten> not hidden inside a 4mb chunk
[1:12] <perplexed> Monitoring - Are there any recommendations on how best to approach monitoring the cluster? Is this just a case of polling with "ceph -s" and looking for problems, or is there a better approach?
[1:12] <Qten> i would have thought that would have lower overhead than trying to use chunks
[1:13] <perplexed> http://ceph.newdream.net/docs/master/ops/monitor/ is TBD currently
[1:13] <gregaf> Qten: then you need a gateway server everybody talks to -- doesn't scale, and is a single point of failure
[1:14] <gregaf> if you want to share files simply, mount RBD volumes and then have them mount a Ceph FS to look at the files :p
[1:14] <gregaf> perplexed: if you run "ceph health" it'll give you an easily-parseable view of whether the cluster is happy or not
[1:15] <Qten> ok so you've kind of done what i'm trying to explain then just kept the folder as chunks
[1:15] <gregaf> I think you can get it in JSON too
[1:17] * loicd1 is now known as loic
[1:17] * loic is now known as loicd
[1:19] <joshd> perplexed: there's a collectd plugin too http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/4959
[1:24] <joshd> perplexed: oh and a nagios plugin for monitoring ceph health: https://github.com/dreamhost/ceph-nagios-plugin
[1:25] <perplexed> Thx for the links
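
For anyone who wants to script the polling perplexed asks about, a minimal sketch along the lines gregaf suggests (an assumption-heavy example: it just shells out to `ceph health` and string-matches the HEALTH_OK prefix rather than using the collectd or nagios plugins linked above):

    import subprocess

    def cluster_health():
        """Return (healthy, detail) based on the output of 'ceph health'."""
        out = subprocess.check_output(['ceph', 'health']).decode().strip()
        # The summary line starts with HEALTH_OK, HEALTH_WARN or HEALTH_ERR,
        # optionally followed by detail text.
        return out.startswith('HEALTH_OK'), out

    healthy, detail = cluster_health()
    if not healthy:
        print('ceph reports a problem: %s' % detail)
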
[1:28] <Qten> so does anyone know if rbd works with openstack swift?
[1:32] <gregaf> joshd: sagewk: want to check out wip-oc-perf?
[1:33] <sagewk> k
[1:34] <joshd> Qten: what do you mean by 'works with'?
[1:35] <joshd> Qten: they're pretty unrelated - swift can't replace rados under rbd, if that's what you're asking
[1:36] <joshd> Qten: eventual consistency and only whole-object writes wouldn't be very good for rbd's correctness and performance, respectively
[1:37] <Qten> fair enough
[1:37] * lofejndif (~lsqavnbok@9YYAAE9DL.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[1:37] <Qten> i was a little confused by the openstack info on the ceph page
[1:37] <Qten> it read to me like rbd was going to work with swift
[1:37] <Qten> :)
[1:38] <joshd> radosgw implements the s3 and swift apis, so you could use it in place of them, but rbd is separate
[1:39] <sagewk> qten: which url are you looking at?
[1:39] <sagewk> gregaf: looks sane. should test it under joshd's modified fsx
[1:40] <Qten> http://ceph.newdream.net/openstack/
[1:40] <gregaf> okay
[1:40] <Qten> The OpenStack™ project has been an important part of Ceph's development effort. The OpenStack community has expressed widespread interest in using Ceph's RADOS Block Device (RBD) to fill the block storage void in the cloud software stack. They have also expressed interest in Ceph's distributed object store as a potential alternative to Swift.
[1:41] <gregaf> sagewk: I forgot about the averages, do you think it's worth changing?
[1:41] <sagewk> i think we may as well be consistent.
[1:41] <gregaf> I'm not sure what consistent is, here :p
[1:41] <Qten> kind of reads to me that rbd is working with openstack not that ceph is replacing swift :)
[1:41] <sagewk> otoh, i don't think there is an integer version of the avg stuff, only float
[1:42] <sagewk> i mean, if the intent is to communicate an average to the user (i think that's what i'd expect for write latency) then we should use a consistent api for that
[1:42] <gregaf> well I wouldn't want to change the bytes blocked to an average, anyway, I don't think
[1:43] <gregaf> and it's not explicitly counting unblocked writes; I was thinking about it more as the kind of thing you'd look at when setting cache defaults, not an average you want to check on an ongoing basis
[1:50] <sagewk> yeah
[1:52] <sagewk> gregaf: ok, we need to come up with a real fix for the boot thing
[2:01] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[2:13] <dmick> six planas dead: 02 03 05 06 15 79
[2:13] <dmick> anyone know anything?
[2:13] <joao> nope
[2:14] <elder> I think it was Colonel Mustard
[2:14] <dmick> 79 apparently has firmware issues
[2:14] <dmick> the rest don't even answer on serial console
[2:15] <dmick> or, well, I say firmware issues, but all I know is its Ethernet is down
[2:15] <joao> dmick, is it possible they were shut down?
[2:15] <dmick> well anything's possible; if you mean "just rebooting", they've been that way for a while
[2:16] <joao> no, I mean completely halted
[2:16] <joao> although I have no idea if there's anything that could lead to a full shutdown running on the planas
[2:17] <dmick> well, I'd kinda expect people not to do that with shared test machines, which is why I asked
[2:18] <dmick> 79: ifup worked. weird.
[2:18] <joao> dmick, btw, even if they were rebooting they should answer on the serial console, no?
[2:18] <joao> much like old sepia?
[2:18] <dmick> I think *most* of plana have been set that way, not sure about all
[2:19] <dmick> because of things like this, where there are always a few that don't answer :)
[2:20] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:22] <elder> Anybody know whether I can reuse tasks in the definition of other tasks in a teuthology python script?
[2:23] <elder> I want to add something to rbd.py, but would like my new task to be able to do, for example, an rbd.modprobe as part of what I'm adding (rather than requiring it)
[2:27] <joshd> elder: you can import the functions and call them the same way the rbd task does, with a contextutil.nested
[2:28] <elder> Ahh, so this: @contextlib.contextmanager
[2:28] <elder> def task(ctx, config):
[2:28] <elder> defines the task, and later
[2:29] <elder> with contextutil.nested(
[2:29] <elder> lambda: create_image(ctx=ctx, config=norm_config),
[2:29] <elder> lambda: modprobe(ctx=ctx, config=norm_config),
[2:29] <elder> allows those sub-"functions" to be called?
[2:29] <elder> Something like that?
[2:29] <joshd> yeah
[2:36] <elder> So how would I do a sequence of calls like that?
[2:42] <elder> Also, joshd, how is it that the rbd task enforces that it only gets run on clients?
[2:42] <elder> I see this in the mount() definition: PREFIX = 'client.'
[2:42] <elder> assert role.startswith(PREFIX)
[2:44] <joshd> yeah, that does enforce it, although it doesn't try very hard (could be done at the beginning)
[2:45] * perplexed (~ncampbell@216.113.168.141) has left #ceph
[2:45] <dmick> well, all of 02 03 05 06 15 are marked as not up and unlocked, so I guess they're fair game to reboot
[2:45] <elder> Thanks joshd
[2:46] <dmick> not sure why whoever marked them down didn't do that, but I guess we can't find out who that was to ask
[2:46] <joshd> the contextutil.nested runs each task you pass it in sequence
[2:46] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:47] <elder> OK, great.
[2:47] <elder> Back in a little while.
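
Roughly the shape joshd and elder are converging on, pieced together from the fragments above (a sketch only: the import paths are assumed, and the surrounding task body is a made-up placeholder rather than code from the teuthology repo):

    import contextlib

    from teuthology import contextutil
    from teuthology.task.rbd import create_image, modprobe   # assumed module path

    @contextlib.contextmanager
    def task(ctx, config):
        if config is None:
            config = {'all': None}
        # contextutil.nested enters each of these context managers in order
        # (create the rbd image, then load the kernel module, ...) and tears
        # them down in reverse order when the task exits.
        with contextutil.nested(
            lambda: create_image(ctx=ctx, config=config),
            lambda: modprobe(ctx=ctx, config=config),
            ):
            yield
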
[3:09] * gohko (~gohko@natter.interq.or.jp) Quit (Quit: Leaving...)
[3:18] <joao> bye o/
[3:19] * joao (~JL@89.181.153.140) Quit (Quit: Leaving)
[3:37] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[3:42] * gohko (~gohko@natter.interq.or.jp) Quit (Quit: Leaving...)
[3:46] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[4:11] * loicd (~loic@83.167.43.235) Quit (Quit: Leaving.)
[4:33] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[6:07] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[6:38] * f4m8_ is now known as f4m8
[6:51] <sage> dmick: hmm, those are the ones i used for the rcb demo. i didn't mark them down, tho.. or didn't mean to. they were reimaged so i nuked for good measure and unlocked.
[7:36] * cattelan (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[7:36] * cattelan (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[7:59] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[8:41] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[8:51] * cattelan is now known as cattelan_away
[9:15] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[9:31] * loicd (~loic@83.167.43.235) has joined #ceph
[10:45] * joao (~JL@89-181-153-140.net.novis.pt) has joined #ceph
[10:45] <joao> hello
[10:53] <filoo_absynth> morning joao
[10:53] <Dieter_b1> hi
[11:39] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:43] <joao> well, be back in a couple of hours
[11:43] <joao> o/
[11:49] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[12:17] * joao (~JL@89-181-153-140.net.novis.pt) Quit (Ping timeout: 480 seconds)
[12:56] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Remote host closed the connection)
[12:58] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[14:21] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[14:33] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[14:56] * aliguori (~anthony@nat-pool-3-rdu.redhat.com) has joined #ceph
[15:08] * lxo (~aoliva@1RDAAAR1Y.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[15:14] * joao (~JL@89.181.153.140) has joined #ceph
[15:47] * f4m8 is now known as f4m8_
[15:51] <nhm> good morning #ceph
[15:51] <filoo_absynth> good morning nhm
[15:51] <joao> hi :)
[15:55] <todin> filoo_absynth: hi, are you from the filoo company in germany?
[15:55] <filoo_absynth> yah
[15:57] <todin> nice, how is your ceph cluster?
[15:57] <nhm> filoo: Do you guys work with Bull much out there?
[15:57] <filoo_absynth> bull?
[15:57] <filoo_absynth> uh, no.
[15:58] <filoo_absynth> why would we? bull is french, afaik
[15:58] <filoo_absynth> as far as i know, their small-range x86 clusters are just the same supermicro barebones that everyone else uses, though
[15:59] <filoo_absynth> todin: it's currently not in production ;)
[15:59] <nhm> filoo_absynth: yeah, I was just curious how prolific they are in Europe.
[15:59] <todin> filoo_absynth: that's a shame, I wrote a few emails with oliver before the CeBIT
[15:59] <filoo_absynth> nhm: they have a footprint in science, i think. in the industry... i dunno. not much
[16:01] <nhm> filoo_absynth: I came from an academic/hpc environment before I joined up with the ceph crew.
[16:01] <filoo_absynth> yeah, me too
[16:01] <filoo_absynth> Grid stuff, mostly
[16:01] <filoo_absynth> we actually had a Bull cluster running
[16:01] <nhm> filoo_absynth: globus? ;)
[16:02] <filoo_absynth> globus, glite, UNICORE
[16:02] <filoo_absynth> german national grid infrastructure was tri-middleware (which is insanity, if you ask me)
[16:02] <filoo_absynth> my Phd thesis was about globus, tho
[16:02] <joao> oh, poor you
[16:03] <filoo_absynth> i did my fair share of gLite too, though
[16:03] <joao> no one should have to cross paths with globus :(
[16:03] <nhm> filoo_absynth: I was a project manager for a couple of years to develop bioinformatics software on top of globus/cagrid.
[16:04] <filoo_absynth> ah
[16:04] <filoo_absynth> i went on a couple OGFs and IEEE Grids back in the day
[16:05] <nhm> filoo_absynth: We mostly focused on cagrid, open science grid, and teragrid
[16:06] <filoo_absynth> well, i'm glad that's behind me tbh
[16:06] <nhm> filoo_absynth: hah, I feel the same way. ;)
[16:06] <filoo_absynth> the globus guys have *no* concept of consistency in their design
[16:06] <filoo_absynth> globus 2: binary protocol, fuck web services.
[16:06] <filoo_absynth> globus 3: pff. lets skip this release.
[16:06] <filoo_absynth> globus 4: let's do web services only. fuck binary protocols
[16:07] <nhm> filoo_absynth: amazingly globus was actually much better than what cagrid built on top of it.
[16:07] <filoo_absynth> globus 5: uh, web services are slow and suck. let's do binary protocols.
[16:08] <filoo_absynth> well, for d-grid there was so much to build on that nobody could decide which middleware to actually use
[16:08] <filoo_absynth> and all for political reasons
[16:09] <nhm> filoo_absynth: cagrid was like "let's take slow web services, create huge XML ontologies defining every possible input/output, and embed base64-encoded attributes in XML regardless of whether it makes sense."
[16:10] <filoo_absynth> oh yeah, ontologies. workflows! oh, and let's skip the security features for the time being, they break our services because we don't understand how a PKI works.
[16:11] <nhm> so they ended up trying to transfer multi-gigabyte microarray data as tiny little XML base-64 encoded chunks with all kinds of (useless) metadata over axis1 soap. It was madness.
[16:11] <nhm> filoo_absynth: exactly!
[16:14] <filoo_absynth> but well, it earned me the two letters, that's what counds.
[16:14] <filoo_absynth> -d+t
[16:14] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[16:14] <filoo_absynth> and i developed a healthy mistrust for overdesigned and overabstracted systems
[16:16] <nhm> filoo_absynth: good for you. :)
[16:16] <filoo_absynth> now i need a stable Ceph release.
[16:16] <filoo_absynth> anyone?
[16:17] <nhm> filoo_absynth: we're working on it. ;)
[16:17] <filoo_absynth> i know. we are, too :)
[16:17] <nhm> filoo_absynth: if you could choose, what would you want stable first?
[16:17] <filoo_absynth> the filesystem so i can run VMs
[16:18] <filoo_absynth> i know that's not your primary focus
[16:18] <joao> wouldn't the rbd be more useful in that case?
[16:18] <filoo_absynth> we plan on using radosgw sometime in the future, but right now all the inconsistencies in the qemu-rbd component hurt most, i guess
[16:21] <nhm> filoo_absynth: Assuming we can fix the stability issues, what kind of performance do you need?
[16:22] <filoo_absynth> that's a good question. let me answer it with another one: what kind of metric are you asking for?
[16:22] <filoo_absynth> mb/sec?
[16:22] <filoo_absynth> $transaction-type / sec?
[16:23] <nhm> filoo_absynth: let me answer with yet another question, what is most important to you?
[16:24] <filoo_absynth> i would have to take a long, hard look at oliver for this, but i think it's filesystem latency rather than raw thruput
[16:25] <filoo_absynth> wait a sec, let me take a hard look at oliver
[16:27] * oliver1 (~oliver@p4FD07077.dip.t-dialin.net) has joined #ceph
[16:27] <filoo_absynth> yeah. we both think latency is the interesting metric
[16:27] <filoo_absynth> as to actual *values*, i would have to take yet another hard look at oliver1
[16:28] <filoo_absynth> todin: are you the guy with the large Ceph setup?
[16:29] <oliver1> Hi... *yawn* took my afternoon-nap ;)
[16:29] <oliver1> Anything in the news for me?
[16:30] <nhm> filoo_absynth, oliver1: ok, good to know. any kind of details you can provide would be nice to know. (ie typical transfer sizes, concurrent transfers in flight, etc)
[16:30] * filoo_absynth points at oliver1
[16:30] <oliver1> *ducks*
[16:31] <nhm> :)
[16:32] <oliver1> Well... all you can imagine that could be done within a linux-VM. There are all possible applications running, web-shops with high hit-rates, galleries, upload-streaming... you name it...
[16:34] <nhm> oliver1: Ok. Beyond the stability issues you guys hit, has performance been a problem?
[16:35] <oliver1> nhm: And, there are per node s/t in the range of 150 VM's running... so latency should be as small as possible. And should not degrade having as many pools, btw ;)
[16:36] <filoo_absynth> oh yeah, the pool thing
[16:36] <filoo_absynth> we would really like to have one pool per customer
[16:37] <oliver1> nhm: no, performance has never been an issue... Keeping journal on the SSD's. Being fast, we have been compared to other providers and got good judges.
[16:37] <nhm> filoo_absynth: interesting, so 150 pools has been a performance problem?
[16:38] <filoo_absynth> no, not yet. but we were advised that maybe having 2000 would be an issue
[16:38] <oliver1> nhm: there was a performance-issue predicted by... Sage(?) if too many pools per OSD...
[16:38] <nhm> oliver1: ok, that's good to know. my Role is primarily to look for performance issues, so the more I know about what customers are seeing the better. :)
[16:41] <filoo_absynth> of course there's a massive impact during rebuild after an OSD crash (at least if you don't have 10gbps links between all nodes)
[16:41] <oliver1> nhm: best "helpers" for latency+performance of course have been: "rbd_writeback_window" and using linux-aio switched on in qemu-kvm.
[16:42] <filoo_absynth> but i guess that's trivial-ish
[16:43] <nhm> oliver1: ok. Most of my testing has been direct rados tests so far.
[16:43] <nhm> oliver1: I haven't really looked at rbd at all yet.
[16:44] <oliver1> nhm: no prob. We have ;)
[16:54] <elder> I need help from someone with some expertise with python and teuthology tasks.
[16:55] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:57] <todin> filoo_absynth: I think so
[17:05] <todin> oliver1: for performance+latency you should try the librbdcache branch, it's really working well, iops improved from 5k to beyond 30k
[17:05] <filoo_absynth> big hosting company with an S in its name?
[17:05] <todin> filoo_absynth: correct
[17:07] <wido> nhm: With a lot of pools you'll get high memory usage on the OSDs due to the high amount of pgs
[17:07] <wido> Each pg eats some memory on the OSD. The more pools, the more pgs, thus more memory usage
[17:08] * lofejndif (~lsqavnbok@9YYAAFADE.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:08] <oliver1> wido: yeah, so we should be fine with ~150 VMs/customers per node?!
[17:08] <filoo_absynth> todin: i'd be very interested in hearing what you are doing over there, but i presume it's not for public discussion with competitors ;)
[17:09] <nhm> wido: ah yes, I think I remember hearing that. Do you remember how much per PG?
[17:09] <oliver1> todin: well... uhm, waiting for some fixes in current rbd-code, though ;) => #2178
[17:10] <nhm> elder: I'm not python expert, but if you want I can try to help.
[17:10] <wido> oliver1: It's not about the number of VM's
[17:10] <wido> but it's about the number of RADOS pools you have
[17:10] <wido> nhm: I'm not sure how much one PG consumes, that is something I haven't tested yet
[17:11] <wido> I currently have 3 pools (data, metadata, rbd) with a total of 7920 PGs and 40 OSDs
[17:12] <wido> each OSD consumes about 150MB of RAM
[17:12] <wido> but they are idle at the moment
[17:12] <wido> have to go afk, brb
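
As a rough illustration of why the pool-per-customer idea worries people, here is wido's own cluster plugged into a toy calculation (the replica count and the 128-PGs-per-pool sizing are invented for the example, not anything wido stated):

    # wido's idle cluster: 3 pools, 7920 PGs total, 40 OSDs, ~150 MB RSS per OSD.
    pgs_total = 7920
    osds = 40
    replicas = 2                         # assumed; every PG copy costs memory
    print(pgs_total * replicas // osds)  # ~396 PG copies resident per OSD

    # If each of 2000 customer pools got even a modest 128 PGs, the cluster-wide
    # PG count (and with it the per-OSD memory footprint) grows by roughly 30x.
    pools = 2000
    pgs_per_pool = 128                   # hypothetical sizing, for illustration
    print(pools * pgs_per_pool)          # 256000 PGs vs. 7920 today
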
[17:14] <todin> filoo_absynth: most of what I do is not a secret
[17:15] * joao-phone (~JL@28.135.189.46.rev.vodafone.pt) has joined #ceph
[17:16] * joao-phone (~JL@28.135.189.46.rev.vodafone.pt) Quit ()
[17:16] <todin> oliver1: my ceph cluster has been running very stable for the last month
[17:17] <nhm> todin: how big is your cluster? We are setting up some small 48 OSD clusters internally with xfs, btrfs, and btrfs+patches now for long term testing.
[17:21] <wonko_be> todin: what FS do you use, with what patches applied?
[17:21] <oliver1> todin: well, if everything was stable, no OSD's crashing, we had no prob at all. things got worse after the first inconsistencies came up.
[17:23] <todin> nhm: 12 osd servers with 4 osd daemons each, so 48 osds, and 12 nodes where the vms are running; that's a test cluster here
[17:24] <todin> wonko_be: xfs
[17:24] <todin> oliver1: that's bad, I use ceph differently, all vms are in the rbd pool
[17:26] * Tv_ (~tv@aon.hq.newdream.net) has joined #ceph
[17:27] <oliver1> todin: not clear, uhm, so we have "ceph -> OSD -> /<customer-pool>/<disk-image>.rbd, image used by qemu-kvm via virtio... rbd:...
[17:28] <todin> oliver1: I do the same except for the customer pool
[17:28] <filoo_absynth> maybe we should re-evaluate that design decision, independently of performance issues...
[17:30] <elder> nhm, I can send you my bigass new task definition so you can see what I'm bumping into. Basically it's a syntax error using "with"
[17:31] <nhm> elder: sure
[17:32] <todin> filoo_absynth: what is the point of putting each customer into its own pool?
[17:33] <elder> nhm, check your e-mail
[17:35] <oliver1> todin: if I have a pg per image, the number of VM's should correlate with the pg's, or am I mistaken there?
[17:35] <oliver1> todin: it's a convenience of accounting, having a dedicated pool showing all used space.
[17:36] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:36] <todin> oliver1: ok, we do not account for used space, the customer should pay for the allowed space
[17:36] <elder> joshd, if you show up (or Tv_) I could use your help.
[17:37] <todin> oliver1: As far as I know the number of pg's should correlate with the number of osd's
[17:37] <oliver1> todin: true... the setup was from first design...
[17:38] <Tv_> elder: i'm here but going into a meeting in 15min
[17:40] <elder> Can I send you this thing and have you do a quick consult on why I'm getting a syntax error?
[17:40] <Tv_> sure
[17:41] <filoo_absynth> .oO( "this thing of ours" ...)
[17:42] <elder> Tv_, check your e-mail. I feel like I'm close, but I don't get the with...lambda stuff and so I'm not sure how to fix that.
[17:42] <Tv_> elder: pastebin the full error please
[17:42] <elder> OK.
[17:43] <Tv_> if role in config.iterkeys():
[17:43] <Tv_> assert role.startswith('client.'), \
[17:43] <Tv_> ...
[17:43] <Tv_> if config is None:
[17:43] <Tv_> config = { 'all': None }
[17:43] <Tv_> ^ that's gonna hurt at some point, reorder
[17:43] <elder> http://pastebin.com/kkFAF9ZW
[17:43] <Tv_> or perhaps you didn't mean config.iterkeys
[17:43] <elder> Let me look at that.
[17:43] <Tv_> what version of python are you running?
[17:44] <elder> I was trying to just verify we're running only on clients. On my local machine it's 2.7.2+ I don't know on plana
[17:44] <Tv_> yeah you not plana
[17:44] <elder> Both are 2.7.2+
[17:45] <Tv_> oh there we go
[17:45] <Tv_> log.info(' scratch image
[17:45] <Tv_> {scratch_image}'.format(scratch_image=scratch_image)
[17:45] <Tv_> count the parens
[17:46] <elder> Sweet.
[17:46] <elder> I'll try again. Rest looks somewhat reasonable at first glance though?
[17:46] <Tv_> didn't look in detail, running out of time soon...
[17:47] <elder> I understand. I mean you didn't puke upon glancing upon this masterpiece of mine.
[17:47] <Tv_> hehe
[17:48] <oliver1> todin: if Sage comes with the next wip-branch, then I'm very keen on doing the next "most-threatening-evil-wallbreaking"(tm) tests :-D
[17:48] <Tv_> it's merely at an equivalent level of hideousness to the other stuff in there
[17:48] <Tv_> (teuthology needs cleanup)
[17:48] <elder> My first bit of Python and I feel like I got a bit ambitious.
[17:49] <nhm> elder: might as well jump in head first.
[17:50] <elder> Well that got me past the syntax error. Thanks Tv_ I'll keep chipping away at it.
[17:51] <todin> oliver1: atm I am testing wip-librbd-caching which works very well
[17:55] <oliver1> todin: waiting on wip-osd-reorder for some reason ;)
[17:55] <todin> oliver1: what will it bring?
[17:58] <oliver1> todin: "world piece", uhm... solution for corrupted/mangeld rbd-blocks whilst STRESS-testing
[18:05] * perplexed (~ncampbell@216.113.168.141) has joined #ceph
[18:06] <oliver1> leaving... have a good one... ;)
[18:07] * oliver1 (~oliver@p4FD07077.dip.t-dialin.net) has left #ceph
[18:09] <elder> OK, now I can't get to the plana nodes. Did I do something?
[18:10] <nhm> elder: nope, network is dead.
[18:10] <elder> CRAP
[18:10] <nhm> elder: burnupi too.
[18:10] <elder> Just when I was getting things working
[18:10] <elder> OK, taking a break for a few minutes then...
[18:20] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[18:24] <elder> Anyone running a Mac should run software update, btw. http://flashbackcheck.com/
[18:25] <joao> oh boy
[18:25] <joao> let me check that out
[18:32] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[18:33] <sagewk> i'm able to reach plana...
[18:33] <elder> Irvine data center issues again.
[18:34] <elder> I still am not.
[18:34] <elder> ...able to reach plana
[18:34] <elder> http://www.dreamhoststatus.com/
[18:34] <sagewk> aie
[18:35] <nhm> sagewk: where can you get to plana from?
[18:36] <sagewk> metropolis
[18:37] <elder> Nope.
[18:44] <sjust> so, my previously live plana sessions are fine, but I can't ping google from plana20
[18:45] <sjust> if that information is useful to anyone
[18:46] <elder> Now we know one place *you* can't get to from plana20, so that's useful I guess.
[18:46] <elder> :)
[18:47] <elder> I trust they'll get it sorted out soon. Not much anyone here is going to do to fix things I imagine.
[18:47] <sjust> yup
[18:55] * lofejndif (~lsqavnbok@9YYAAFADE.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[18:57] <wido> Yay! Ubuntu just built their Qemu-KVM packages for 12.04 with RBD support
[18:57] <wido> A 12.04 machine will be able to run Qemu-RBD without any external packaging :)
[18:58] * aliguori (~anthony@nat-pool-3-rdu.redhat.com) Quit (Quit: Ex-Chat)
[19:03] * lofejndif (~lsqavnbok@04ZAACLWW.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:06] * Oliver (~oliver1@ip-88-153-226-5.unitymediagroup.de) has joined #ceph
[19:07] * loicd (~loic@83.167.43.235) Quit (Quit: Leaving.)
[19:07] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[19:14] * Oliver (~oliver1@ip-88-153-226-5.unitymediagroup.de) Quit (Quit: Leaving.)
[19:19] <NaioN> wido: yeah that's really nice
[19:20] <NaioN> we're building a setup with ceph + kvm-rbd + spice
[19:25] * lofejndif (~lsqavnbok@04ZAACLWW.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[19:40] <elder> Damnit it was back for a bit there. Now it's gone again.
[19:44] <dmick> elder: weird. outbound is good here.
[19:46] <nhm> yeah, lost my connections too.
[19:47] <nhm> though my connection through metropolis is still working for now.
[19:49] <Tv_> elder, dmick, nhm: office to IRV works, IRV<->Internet is failing
[19:49] <Tv_> so you can do internet<->metropolis<->IRV
[19:49] <Tv_> but not direct
[19:50] <Tv_> as far as i know
[19:50] <Tv_> lovely
[19:56] <NaioN> I want to use the cluster network and public network options
[19:56] <NaioN> but can I just add a cluster network = x to the [osd] and NO cluster addr = y to the [osd.0]?
[19:57] <NaioN> so the osd figures out it has to use the interface facing the cluster network?
[19:58] <Tv_> NaioN: yes
[19:58] <NaioN> and how do the cluster network/public network options work? Do I need to add a subnet? Something like 172.16.0.0/24?
[19:58] <NaioN> Tv_: thx that's nice
[19:58] <NaioN> it makes the config really clean
[19:58] <Tv_> that's why i pushed to have it ;)
[19:58] <NaioN> hehe :)
[19:58] <Tv_> they take an ip/mask
[19:59] <NaioN> well I have a lot of osds on the same node
[19:59] <Tv_> they'll look at the ips configured on the box, find the first match for one of the entries (the "* network" can be a list)
[19:59] <NaioN> this way I only need to define [osd.1] host = node1 etc
[19:59] <NaioN> and nothing more in the osd.x part
[20:00] <Tv_> and with the newer-new style, you won't need [osd.1] at all.. we'll get there ;)
[20:01] <NaioN> nice!
[20:01] <NaioN> my config gets cleaner and cleaner
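
For reference, the kind of minimal ceph.conf NaioN is converging on might look roughly like this (a hedged sketch: the public subnet is invented, the cluster subnet is the one NaioN mentions, and exact spellings should be checked against the chef template Tv_ linked):

    [global]
        # clients and monitors talk on the public network; OSD replication
        # traffic goes over the cluster network
        public network  = 192.168.0.0/24
        cluster network = 172.16.0.0/24

    [osd]
        # no per-osd "cluster addr" needed: each ceph-osd picks whichever
        # local interface has an address inside "cluster network"

    [osd.1]
        host = node1
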
[20:04] * chutzpah (~chutz@216.174.109.254) has joined #ceph
[20:13] <wonko_be> Tv_: how do you plan to get rid of the osd.x entries?
[20:15] <NaioN> maybe a node entry?
[20:16] <elder> Cool, my pings are responding every second, not every 4-5 like I've seen in the past.
[20:16] <elder> nhm, you know what I'm talking about?
[20:17] <elder> Wait, nevermind, I was pinging gitbuilder.ceph.com, not plana nodes.
[20:19] <elder> In fact...... it's gone again.
[20:20] <elder> Tv_, once I have verified this works, I have a shell script that my changes to rbd.py will call. Where should I put the script?
[20:22] <NaioN> Tv_: i've something weird, the mons are ignoring the mon addr option...
[20:23] <NaioN> I have two interfaces eth0 and ib0 and they are listening on the eth0 interface while in the config I used the address of the ib0 interface
[20:32] <joshd> NaioN: does using 'public addr' instead of 'mon addr' work?
[20:33] <joshd> NaioN: sounds like a bug though
[20:34] <NaioN> I'm now testing with just the cluster network for the osds
[20:34] <NaioN> I will try that next
[20:35] * lofejndif (~lsqavnbok@1RDAAAS4K.tor-irc.dnsbl.oftc.net) has joined #ceph
[20:35] <NaioN> joshd: public addr in the mon.x clause?
[20:36] <joshd> NaioN: yeah - mons don't have a separate cluster addr, and it looks like some special logic is applied to public addr but not mon addr
[20:36] <NaioN> oh ok I'll try it!
[20:36] <NaioN> do I still need the mon addr option?
[20:37] <NaioN> or swap it?
[20:37] <joshd> swap it
[20:38] <joshd> oh, actually if you're using mkcephfs it still needs mon addr
[20:40] <NaioN> nope didn't work
[20:40] <NaioN> euhmmm I don't do a mkcephfs
[20:40] <NaioN> just changing the config
[20:40] <NaioN> and restarting the daemons
[20:41] <NaioN> are those addresses put into more files?
[20:41] <joshd> they're in the monmap
[20:41] <NaioN> hmmm so I need to update the monmap
[20:42] <joshd> you probably need to remove and re-add the mon like http://ceph.newdream.net/docs/wip-doc/ops/manage/grow/mon/
[20:42] <NaioN> hmmm it's easier to recreate the cluster :)
[20:43] <NaioN> hasn't any data in it
[20:43] <Tv_> wonko_be: a server will run osds for all the osd data disks it has
[20:44] <Tv_> elder: what's the shell script do? (also, going for lunch very soon, catch you after that)
[20:44] <wonko_be> yeah, but to get the osd started, you still need a correct config file
[20:44] <Tv_> wonko_be: [global] and [osd] are plenty
[20:45] <Tv_> wonko_be: https://github.com/ceph/ceph-cookbooks/blob/master/ceph/templates/default/ceph.conf.erb is what the config currently looks like for the chef-based deployments
[20:45] <Tv_> wonko_be: still working on it
[20:46] <NaioN> joshd: I'm recreating the cluster
[20:46] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:47] <elder> The shell script sets up for, builds, and runs xfstests, then cleans up after itself completely.
[20:47] <elder> (Whenever you get back from lunch is fine. I'm waiting again for the bouncing network to come back again.)
[20:48] <wonko_be> Tv_: okay, got it, works indeed
[20:48] <wonko_be> must have missed something earlier when I tried this before
[20:52] <NaioN> joshd: thx it works
[20:53] <NaioN> but with the mon addr, I didn't know the ips were also added to the monmap
[20:54] <joshd> NaioN: it'd be good to know if mon addr works too
[20:54] <NaioN> why are they added in the monmap? they are also defined in the config? This way it's a lot of work to change the ips of the mons
[20:55] <joshd> monitor ips are the only addresses that need to be constant, since they're how everything else connects to the cluster
[20:55] <joshd> they're not meant to be changed, more like new ones added and old ones removed
[20:57] <joshd> osds, mds, and clients all use the monitor addresses to connect initially to be able to access the rest of the cluster
[20:59] <joshd> and you don't need to change the config everywhere, it should be possible to do all this with just command line options, but there may be some bugs to be worked out there still
[21:03] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[21:06] * adjohn (~adjohn@50.56.129.169) has joined #ceph
[21:20] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[21:22] <Tv_> elder: would that perhaps be a "workunit"?
[21:23] <Tv_> elder: those are defined as runnable things in ceph.git qa/workunits/ that assume chdir is inside a filesystem to test
[21:28] * yehudasa (~yehudasa@aon.hq.newdream.net) Quit (Quit: Ex-Chat)
[21:34] <elder> I get that, and maybe it is. But I'm providing parameters using the other teuthology-like stuff.
[21:34] <elder> To be honest, this is pretty new stuff to me so I may well be going off in a strange direction.
[21:38] <elder> Furthermore, there are two filesystems that need to be under test, and I've arranged for them to be specified by naming the rbd image (and its size) that will back each.
[21:38] <elder> And one can specify what type of filesystem is to be formatted on to those images (xfs, btrfs, etc.)
[21:38] <sagewk> elder: this is definitely not a workunit.. the args it takes are raw block devices and cwd is (possibly) irrelevant
[21:38] <elder> Right.
[21:39] <elder> I wrote it as a shell script because that's what I know how to do.
[21:39] <elder> I now know quite a bit more about Python than I did yesterday too, but still not enough to do what I did here effectively using that language.
[21:40] * mkampe (~markk@aon.hq.newdream.net) Quit (Remote host closed the connection)
[21:41] <joshd> elder: could you pastebin the shell script?
[21:42] <elder> Sure.
[21:43] <elder> http://pastebin.com/N8CA3JS3
[21:44] <elder> Syntax highlighted even (though I don't like the colors)
[21:46] <elder> Just found a bug in there...
[21:46] <elder> Good thing to get things reviewed I guess.
[21:47] <joshd> hehe
[21:51] <elder> I also think it's not necessary for me to do the section setting up CEPH_ARGS and PATH and LD_LIBRARY_PATH when I'm actually running within teuthology, right?
[21:52] <sagewk> right
[21:52] <elder> OK, I'll pull that before it gets committed.
[21:52] * mkampe (~markk@aon.hq.newdream.net) has joined #ceph
[21:54] <joshd> elder: it seems like the mkfs and mount could be done with the existing tasks
[21:55] <joshd> also it'd be nice if it had an error exit status, unless your teuthology task is parsing the output or something
[21:57] <elder> I agree, but there isn't really a meaningful exit status provided by the existing xfstests script.
[21:58] <elder> The mkfs and mount are done to give a clean start. One of the filesystems gets routinely re-made during testing (scratch).
[21:58] <elder> The test filesystem also gets unmounted and re-mounted as part of testing as well (I think).
[21:59] <elder> In fact, the test (non-scratch) filesystem is supposed to be one that ages--not re-made--but I wasn't sure of a good way to handle that so I just start clean each time.
[22:00] <elder> So the mkfs and possibly the mount *could* get done by existing tasks, but doing it here is really part of making the shell script self-contained in that way, and *not* dependent on the teuthology stuff doing things for it.
[22:03] <joshd> that makes sense for running outside of teuthology
[22:04] <elder> Apparently teuthology runs as ubuntu, huh? So for root I have to do things via sudo?
[22:05] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[22:05] <joshd> yeah, although I think sjust did something with block devices and groups to make some of that unnecessary
[22:06] <sagewk> teuth can sudo your script too, that's probably easier.
[22:06] <sjust> I just added the ubuntu user to the group which allows raw block device access
[22:07] <Tv_> meh
[22:07] <Tv_> sjust: i don't like that
[22:07] <Tv_> the whole point of the sudo barrier is to make it harder to hurt yourself
[22:08] <elder> That's what I just did (sagewk--added a sudo to the command built for the run command)
[22:08] <elder> xfstests assumes it's running in a highly trusted environment and hence needs to be run as root.
[22:11] * lofejndif (~lsqavnbok@1RDAAAS4K.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[22:13] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:14] * lofejndif (~lsqavnbok@1RDAAAS77.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:21] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[22:23] <elder> I appear to be extremely close... My (very short) test run ran to completion, but I got complaints about debconf, since I'm doing a little package setup and teardown using apt-get within my script.
[22:25] <elder> http://pastebin.com/5r02eTwV
[22:26] <sagewk> elder: we can move those package dependencies into ceph-qa-chef
[22:26] <Tv_> elder: what?
[22:26] <Tv_> no apt-get in tests please
[22:27] <elder> I know.
[22:27] <elder> One step at a time.
[22:29] <elder> But I'll put them in repo ceph-qa-chef, path cookbooks/ceph-qa/recipes/default.rb (right?)
[22:29] <sagewk> yeah
[22:35] <elder> And it looks like *maybe* I could just add a section that just lists
[22:35] <elder> the set of 8 packages, one line each, using "package". I.e:
[22:35] <elder> # used by rbd.xfstests
[22:35] <elder> package 'libtool'
[22:35] <elder> package 'automake'
[22:35] <elder> package 'gettext'
[22:35] <elder> package 'uuid-dev'
[22:35] <elder> package 'libacl1-dev'
[22:35] <elder> package 'xfsdump'
[22:35] <elder> package 'dmapi'
[22:35] <elder> package 'xfslibs-dev'
[22:35] <Tv_> yup
[22:36] <Tv_> elder: "set -e" in the shell script please
[22:36] <Tv_> elder: -norc is not needed, it's not an interactive shell
[22:36] <elder> Done.
[22:36] <elder> (Old habits)
[22:39] <elder> Tv_, I pass test_list as the last argument to remote.run(). If its value is an empty list [] will that appear as an empty argument ("") in the shell command, or will it be no argument?
[22:40] <elder> (I tried None but that led to failure)
[22:40] <Tv_> elder: run takes no positional arguments, "last" is a very poor description.. what arg?
[22:40] <Tv_> elder: you mean args?
[22:40] <elder> Yes
[22:40] <elder> args[-1] is test_list
[22:41] <elder> (I think I just found that it ends up as no argument)
[22:41] <Tv_> elder: remote.run(args=['foo', 'bar', something_that_is_a_list]) will never work
[22:41] <Tv_> elder: it takes strings
[22:41] <elder> D'oh
[22:41] <elder> That sounds right.
[22:41] <elder> How do I make that string end up being an optional argument--not provided if it has no value?
[22:42] <elder> But provided if one is given in the yaml file?
[22:42] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[22:42] <Tv_> args = ['foo', 'bar']
[22:42] <Tv_> if something_that_is_a_list:
[22:42] <Tv_> args.append(','.join(something_that_is_a_list))
[22:42] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:42] <Tv_> remote.run(args=args)
[22:42] <elder> OK.
[22:42] <elder> I'll do that. Thanks.
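
Putting Tv_'s suggestion back into context, the task code ends up looking something like this (a sketch: the script path and the test names are placeholders for whatever elder's rbd.py change actually uses, not real teuthology code):

    def run_xfstests(remote, test_list=None):
        # Build the command line, appending the joined test list only when one
        # was actually given in the yaml config; otherwise the script runs its
        # default set and sees no extra argument at all.
        args = [
            'sudo',
            '/tmp/cephtest/run_xfstests.sh',   # hypothetical script location
            ]
        if test_list:
            # e.g. ['001', '002', '005'] -> '001,002,005'
            args.append(','.join(test_list))
        remote.run(args=args)
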
[22:43] <Tv_> uhhhh
[22:43] <elder> OK, gotta go. BAck online in a few hours.
[22:43] <elder> Sorry, you were groaning something?
[22:43] <Tv_> sjust: so i was looking for that strace-to-replay thing i remember mentioning.. apparently one such project is called ioreplay.. but don't google that..
[22:44] <elder> OK. Later.
[22:44] <Tv_> sjust: http://code.google.com/p/ioapps/ etc
[22:45] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:45] <Tv_> sjust: please confirm you saw that..
[22:48] <nhm> Tv_: I'm planning on either using that, or modifying Jeff layton's tools to support multiple threads...
[22:50] * adjohn (~adjohn@50.56.129.169) Quit (Quit: adjohn)
[22:50] <nhm> Tv_: back when I first started I actually sort of got Jeff's strace analyzer working with a strace from an OSD, but all of the threads were jumbled together. Sadly detaching strace from the process killed it.
[22:53] <sjust> TV_: sorry, I have now seen that
[22:56] <nhm> The python version of strace analyzer is here: http://clusterbuffer.wetpaint.com/page/Strace+Analyzer+-+Python
[22:56] <nhm> huh, nevermind. Guess it's email only.
[23:08] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[23:09] <buck> I'm seeing an odd mds behavior and figured I'd ask about it
[23:09] <buck> I'm running hadoop over ceph
[23:09] <buck> and when I execute a query, the code ends up opening the same file a few times
[23:10] <buck> and it looks like after doing this a few times, the MDS seg faults
[23:10] <buck> I have the /var/log/ceph/mds log (it's about 16 lines)
[23:10] <buck> is that too long for irc?
[23:11] <buck> er poor form i mean?
[23:11] <sagewk> pastebin is better!
[23:12] <buck> http://pastebin.com/wbzVKrmF
[23:13] <buck> if it matters, I was working from a dpkg I built, using master
[23:17] <sagewk> do you have a core file you can gdb to get a backtrace?
[23:17] <sagewk> (hi btw :)
[23:18] <buck> I do not believe I have a core file. Let me go look at what I need to do to make a core file (i'm on ubuntu)
[23:18] <buck> and howdy
[23:18] <sagewk> should just be ulimit -c unlimited before starting ceph-mds
[23:21] <buck> turns out I do ahve a core file. working on getting that backtrace
[23:26] <buck> doesn't look like a lot of info (I may be doing something wrong)
[23:26] <buck> http://pastebin.com/gvnSxJKd
[23:26] <buck> a lot of threads though
[23:27] <sagewk> oh, no symbols. install ceph-dbg and then try
[23:27] <buck> that makes sense
[23:48] <perplexed> Is there a trick to rados bench with respect to seq mode? I seem to be running into an issue where the objects being read are reported as not being correct... looks like the objects being looked for are not there, although I've primed the pool with a write bench prior to the seq.
[23:49] * adjohn (~adjohn@50.56.129.169) has joined #ceph
[23:49] <buck> I installed all the -dbg packages and the core file still doesn
[23:50] <buck> t have symbols. I'm going to rebuild the cluster and see if that sorts things out
[23:50] <perplexed> e.g. rados -p testpool7 bench 60 write -b 102400 -t 16 followed by rados -p testpool7 bench 60 seq -b 102400 -t 16 complains "<hostname>_24268_object281 is not correct!" etc...
[23:52] <perplexed> is it necessary to prime the pool with writes over a long period to ensure the object ID is likely to be there?
[23:54] <joshd> they should all be there once the write benchmark completes
[23:54] <sagewk> gregaf may know
[23:55] <joshd> does 'rados --pool testpool7 get <object_name>' say the object doesn't exist?
[23:57] <gregaf> give me a minute to catch up, guys

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.