#ceph IRC Log

Index

IRC Log for 2013-02-18

Timestamps are in GMT/BST.

[18:11] -solenoid.oftc.net- *** Looking up your hostname...
[18:11] -solenoid.oftc.net- *** Checking Ident
[18:11] -solenoid.oftc.net- *** No Ident response
[18:11] -solenoid.oftc.net- *** Found your hostname
[18:11] * CephLogBot (~PircBot@rockbox.widodh.nl) has joined #ceph
[18:11] * Topic is 'v0.56.3 has been released -- http://goo.gl/f3k3U || argonaut v0.48.3 released -- http://goo.gl/80aGP || tell us about your Ceph use at http://ceph.com/census'
[18:11] * Set by ChanServ!services@services.oftc.net on Sun Feb 17 13:50:42 CET 2013
[18:11] * wido__ is now known as wido
[18:12] <wido> mikedawson: IRC seems to have a lot of netsplits
[18:12] <wido> causing the Logbot to be kicked out
[18:12] <mikedawson> wido: thanks - it's useful to me!
[18:13] <wido> mikedawson: Great! That's what they are for :)
[18:13] <infernix> ok so i've created a user and a key for it and gave it capabilities on a pool
[18:14] <infernix> now how can I get the python librados bindings to use that user?
[18:14] <infernix> is there a default user i can define in a config file somehow? the documentation is really sparse on this
[18:17] * vata (~vata@2607:fad8:4:6:11f4:cedb:50b7:4c14) has joined #ceph
[18:17] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[18:23] * gregaf1 (~Adium@2607:f298:a:607:4863:282d:5bfd:a93c) has joined #ceph
[18:23] <infernix> meh. so i figured out how to authenticate with another user, but the same problem remains in subprocesses
[18:23] <infernix> cephx server client.testuser: unexpected key: req.key=0 expected_key=3ddb84bcc2b993d2
[18:24] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:24] * leseb_ (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:24] * ksperis (~quassel@cse35-1-82-236-141-76.fbx.proxad.net) Quit (Remote host closed the connection)
[18:25] * andret (~andre@2a02:2528:ff65:0:129a:ddff:feae:7fe5) has joined #ceph
[18:25] * andret (~andre@2a02:2528:ff65:0:129a:ddff:feae:7fe5) Quit ()
[18:26] <infernix> it's as if there can only be one concurrent connection, which makes no sense at all
[18:26] * The_Bishop__ (~bishop@e177089147.adsl.alicedsl.de) has joined #ceph
[18:27] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[18:29] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:29] * gregaf (~Adium@2607:f298:a:607:7939:b194:6ed2:5b12) Quit (Ping timeout: 480 seconds)
[18:30] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[18:30] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:30] * stxShadow (~jens@jump.filoo.de) Quit (Remote host closed the connection)
[18:31] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[18:32] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[18:33] * The_Bishop_ (~bishop@e179004093.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[18:35] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[18:42] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[18:43] <infernix> gahh
[18:44] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[18:45] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[18:46] <fghaas> infernix:
[18:47] <fghaas> import rados
[18:47] <fghaas> c = rados.Rados(conffile='/etc/ceph/ceph.conf', rados_id='foo')
[18:47] <fghaas> c.connect()
[18:47] <fghaas> ctx = c.open_ioctx('test')
[18:47] <fghaas> that's how you would connect as client.foo (using whatever keyring is specified for that user in ceph.conf), and open an ioctx on pool "test"
[18:48] <fghaas> now I don't know how you use subprocess there, but that's the general standard access method (ttbomk)
[18:51] <fghaas> (that example is taken from the bof that rturk-away and I did at oscon last year, just in case you're wondering)
[18:53] <infernix> fghaas: yes so that works beautifully in single processing mode
[18:53] * l0nk (~alex@83.167.43.235) Quit (Quit: Leaving.)
[18:53] <infernix> but if I use multiprocessing, any subprocesses just fail to connect with the above-mentioned errors
[18:53] <infernix> the primary process connects fine.
[18:54] <infernix> it's almost as if the rados python library breaks when using multiprocessing
[18:54] <infernix> i even tried multiple users for the child processes, but even then it fails
[18:56] <infernix> it's a fairly trivial multiprocessing rbd benchmark, let me pastebin it
[18:57] <infernix> http://pastebin.ca/2315177
[18:58] <infernix> the single threaded version of this works perfectly fine: http://pastebin.ca/2315178
[18:58] <infernix> it's a simple benchmark on rbd
[18:59] <infernix> if you have an rbd pool it will just create a randomly named device, do some writes and reads, and then delete it
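[Editor's note: the pastebin links above have since expired. A minimal single-process sketch of the benchmark flow infernix describes (create a randomly named image in an 'rbd' pool, write some data, read it back, delete it) might look like the following. The pool name, conffile path, and sizes are assumptions; the `rados`/`rbd` imports are the Ceph Python bindings, so running this needs a live cluster.]

```python
import random
import string
import time

def throughput_mb_s(nbytes, seconds):
    """Throughput in MB/s, as reported below."""
    return nbytes / float(seconds) / (1024 * 1024)

def run_benchmark(size_mb=1024, chunk_mb=4, pool='rbd'):
    # rados/rbd are the Ceph Python bindings; pool name and conffile
    # path are assumptions -- this needs a running cluster to execute.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx(pool)
    name = 'bench-' + ''.join(random.choice(string.ascii_lowercase)
                              for _ in range(8))
    size = size_mb * 1024 * 1024
    chunk = b'\0' * (chunk_mb * 1024 * 1024)
    rbd.RBD().create(ioctx, name, size)
    try:
        image = rbd.Image(ioctx, name)
        start = time.time()
        for off in range(0, size, len(chunk)):   # sequential writes
            image.write(chunk, off)
        wrote = time.time()
        for off in range(0, size, len(chunk)):   # sequential reads
            image.read(off, len(chunk))
        done = time.time()
        image.close()
        print('write: %.1f MB/s' % throughput_mb_s(size, wrote - start))
        print('read:  %.1f MB/s' % throughput_mb_s(size, done - wrote))
    finally:
        rbd.RBD().remove(ioctx, name)
        ioctx.close()
        cluster.shutdown()
```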
[18:59] * ScOut3R (~scout3r@5400CAE0.dsl.pool.telekom.hu) has joined #ceph
[19:00] <infernix> i am trying to make it go faster by using multiprocessing (i have very fast SANs) but no matter how hard I try I can't get the child processes to connect to ceph :|
[19:29] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[19:37] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[19:39] * eschnou (~eschnou@146.108-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[19:39] * joshd1 (~jdurgin@2602:306:c5db:310:40ad:80cd:848d:32ef) has joined #ceph
[19:39] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[19:39] * wschulze1 (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[19:40] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[19:42] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Quit: Leaving.)
[19:42] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[19:49] * ScOut3R (~scout3r@5400CAE0.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[19:50] * wschulze1 (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[19:52] * The_Bishop__ (~bishop@e177089147.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[19:53] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:57] * The_Bishop__ (~bishop@e177089147.adsl.alicedsl.de) has joined #ceph
[20:09] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[20:16] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[20:16] * junglebells (~bloat@CPE-72-135-215-158.wi.res.rr.com) has joined #ceph
[20:17] <junglebells> Anyone know if it's ok for me to use dd to copy from one RBD (format 1) to another RBD (also format 1)?
[20:18] <junglebells> I'm copying on the file layer right now and not getting the speed I'm hoping for. iostat indicates that both of my rbd's are at 100% usage and I'm getting high w_await's
[20:21] * junglebells (~bloat@0001b1b9.user.oftc.net) Quit (Quit: leaving)
[20:22] * junglebells (~bloat@CPE-72-135-215-158.wi.res.rr.com) has joined #ceph
[20:26] <infernix> junglebells: yes, you might want to play with bs=
[20:26] <infernix> try 1M or 4M
[20:27] <junglebells> infernix: Awesome. I assumed so, but I didn't quite want to 'assume' after I'm 3h into my copy job *sigh*
[20:27] <junglebells> I think 4M would be appropriate as that's what my block sizes are all the way down to my RAID level on my OSD's
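[Editor's note: what `dd bs=4M` does here — reading and writing in large aligned chunks that match the default 4M RBD object size — can be sketched in Python with the standard library alone. The paths and chunk size are illustrative assumptions.]

```python
import os

CHUNK = 4 * 1024 * 1024  # match dd bs=4M and the 4M RBD object size

def copy_device(src_path, dst_path, chunk=CHUNK):
    """Copy src to dst in large chunks, like dd bs=4M.

    Small block sizes mean many round trips per RBD object; a 4M
    chunk fetches a whole object per read.
    """
    total = 0
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY)
    try:
        while True:
            buf = os.read(src, chunk)
            if not buf:
                break
            os.write(dst, buf)
            total += len(buf)
    finally:
        os.close(src)
        os.close(dst)
    return total
```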
[20:28] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[20:31] * junglebells (~bloat@0001b1b9.user.oftc.net) Quit (Quit: leaving)
[20:31] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit ()
[20:31] * junglebells (~bloat@0001b1b9.user.oftc.net) has joined #ceph
[20:40] <infernix> junglebells: however i found single threaded performance to be lacking
[20:40] * The_Bishop__ (~bishop@e177089147.adsl.alicedsl.de) Quit (Read error: Connection reset by peer)
[20:40] <infernix> which is why i'm working on a python multiprocessing based version of dd
[20:40] <junglebells> infernix: You got it up on github or anything? I'd be happy to take it for a test drive sometime in the future
[20:41] <infernix> once i get it working
[20:41] <infernix> so far any child process fails to connect
[20:41] * The_Bishop (~bishop@e177089147.adsl.alicedsl.de) has joined #ceph
[20:41] <infernix> very puzzling
[20:42] <junglebells> psh ouch yea. I was able to get ~200MB/s read out of my RBD's but doing dd I'm only getting a copy rate of about 15.2MB/s even with some heavy caching all over
[20:43] <infernix> i can do 1.5-2gb reads/s
[20:43] <infernix> writes hover around 500-800mb/sec
[20:43] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[20:44] <junglebells> eugh I got something not right then...
[20:45] <junglebells> I have 3xOSDs each are 4x2TB 7.2k SAS in RAID10
[20:45] <infernix> i have 59 OSDs
[20:45] <junglebells> I have the two RBD's mapped on one of those nodes and that's what I'm conducting this copy with.
[20:46] <absynth> why do you put your OSDs in a raid instead of doing JBOD / multiple OSDs per host?
[20:46] <absynth> any specific reason for that?
[20:46] <junglebells> Yes
[20:46] <junglebells> I did experiment going that way but I got much more performance going the way of RAID
[20:46] <junglebells> We're going to be putting two DB nodes onto ceph so performance is trumping all else at this point.
[20:48] <todin> wido: the eu.ceph.com mirror doesn't have 0.56.3 on it
[20:50] <infernix> unexpected key: req.key=0 expected_key=89fb8213449ae3f0 - this means that the client isn't sending any key value, right?
[20:55] <junglebells> Anyone have a suggestion on speeding up a dd from rbd0 --> rbd1 (bs=4M)? Seems like the syncing to disk may be slowing it down
[20:57] <junglebells> It's like the read portion of the dd goes fast, and then it just remembers to flush to disk, and then that finally is conducted.
[21:11] * gaveen (~gaveen@112.135.159.39) Quit (Remote host closed the connection)
[21:27] <iggy> by default I don't think dd does any sync'ing
[21:27] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[21:28] <iggy> wouldn't it make more sense to just snapshot rbd0 and run off that? (I don't know much about performance of rbd snapshots)
[21:33] <Gugge-47527> would only work if read-only is okay for his usecase
[21:37] <junglebells> iggy: I can't do a snapshot because we've already created the initial rbd as format 1
[21:37] <junglebells> I'm basically looking to bring up a mysql slave and my old slave is dead; my only non-primary copy of the database is the 850GB in my rbd0
[21:39] <junglebells> It's not really feasible to read lock my primary in order to make a second slave off of it.
[21:41] <Gugge-47527> your primary db does not run on some snapshit capable storage?
[21:41] <Gugge-47527> snapshot :)
[21:41] <loicd> Gugge-47527: nice typo
[21:41] <Gugge-47527> :)
[21:42] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[21:43] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[21:55] * liiwi (liiwi@idle.fi) Quit (Remote host closed the connection)
[21:55] * liiwi (liiwi@idle.fi) has joined #ceph
[22:00] * The_Bishop (~bishop@e177089147.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[22:03] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[22:03] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Quit: Leaving.)
[22:09] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[22:15] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:15] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Read error: Connection reset by peer)
[22:18] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[22:19] <junglebells> The whole reason I needed to move to something like ceph & RBD to support this is we have a single table that's 820GB and cannot be nicely sharded.
[22:20] <junglebells> Gugge-47527: I have it with LVM but I am almost out of space and not sure if I can make a snapshot safely
[22:20] <janos> junglebell: i haven't read scrollback - what database type?
[22:20] <junglebells> My predecessors have done a pretty good job pigeonholing me
[22:20] <junglebells> mysql
[22:20] <janos> ah nm
[22:20] <junglebells> f/oss version
[22:20] <janos> i haven't kept up with their partitioning
[22:21] <janos> if postgresql was going to suggest partitioning
[22:21] <junglebells> The idea of the ceph cluster is that short term it can allow my developer retards to continue growing this db at 1.5GB/day until they can pull their heads out of the sand and move to a better db
[22:21] <janos> they may have that and/or tablespaces, but i just don't know
[22:21] <junglebells> This particular table, shouldn't even be in mysql in the first place.
[22:21] <janos> OUCH
[22:21] <janos> that's some growth!
[22:22] <junglebells> Yea no kidding right? heh that's why I find myself in such a predicament
[22:22] <janos> dang
[22:23] <junglebells> After weeks of political bs, I finally managed to get 3xDell R900's with 4x2TB disks in each w/ dual port 10gbe cards to run this. :) I scaled it up a bit from exactly what was needed so I can use excess space to show how useful a san is for hosting VMs. Right now we do 0 virtualization and have tons of lightly used bare metal
[22:24] * wer_gone (~wer@wer.youfarted.net) Quit (Remote host closed the connection)
[22:24] * wer (~wer@wer.youfarted.net) has joined #ceph
[22:24] <janos> i love virtualization for that
[22:25] * BManojlovic (~steki@85.222.184.185) has joined #ceph
[22:25] <janos> getting much better utilization
[22:25] <lurbs> Until you hit four figures of VMs, and wake up crying.
[22:26] <janos> hahaha
[22:26] <janos> i have no known horizon where i will hit that thankfully
[22:26] <janos> i'm crying thinking about that
[22:28] <junglebells> so... bringing me back to my inquiry again, anyone have any idea how to speed up my reads on my rbd for my dd? It seems to only be using 1 of my hosts to read data and then turning around and writing to three. Is there a way to distribute those reads amongst the others? (3mons, 3osd/mds, very basic .conf, and crushmap setup to have full copy on each osd)
[22:28] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[22:28] * janos does not know
[22:30] <iggy> junglebells: nope, that's expected (reads come from primary)
[22:30] <lurbs> Just for the kernel client, or librbd too?
[22:30] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:30] <iggy> ceph in general
[22:31] <joshd1> junglebells: best way would be to have many reads in flight at once. you could try doing multiple dds for different portions of the disk
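[Editor's note: joshd1's suggestion — split the device into ranges and copy them in parallel so multiple reads are in flight — can be sketched with the standard library. The seeks stand in for dd's skip=/seek= options; paths and job count are illustrative assumptions.]

```python
import multiprocessing
import os

def copy_range(src_path, dst_path, offset, length, chunk=4 * 1024 * 1024):
    """Copy [offset, offset+length) from src to dst, like dd with skip=/seek=."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY)
    try:
        os.lseek(src, offset, os.SEEK_SET)
        os.lseek(dst, offset, os.SEEK_SET)
        remaining = length
        while remaining > 0:
            buf = os.read(src, min(chunk, remaining))
            if not buf:
                break
            os.write(dst, buf)
            remaining -= len(buf)
    finally:
        os.close(src)
        os.close(dst)

def parallel_copy(src_path, dst_path, total_size, jobs=3):
    """One worker per range keeps several reads in flight at once."""
    per = (total_size + jobs - 1) // jobs
    procs = []
    for i in range(jobs):
        off = i * per
        length = min(per, total_size - off)
        if length <= 0:
            break
        p = multiprocessing.Process(
            target=copy_range, args=(src_path, dst_path, off, length))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
```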
[22:31] <wer> Back the osd's with raid?
[22:32] * dxd828 (~dxd828@213.205.241.196) has joined #ceph
[22:33] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Remote host closed the connection)
[22:34] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:35] <infernix> aha
[22:35] <infernix> so i'm getting somewhere i think
[22:35] <infernix> when i have a python child process connect to ceph, debugging shows {b'conffile': b'/etc/ceph/ceph.conf', b'rados_id': b'testuser', b'self': <rados.Rados object at 0x7f59600484d0>, b'conf': None}
[22:35] <infernix> 'conf': None - could it be that this is causing it to skip parsing the config file altogether?
[22:36] <infernix> i don't declare anything for conf anywhere, and if I look at the same connect() in the main thread, there's no conf:None
[22:36] * dxd828 (~dxd828@213.205.241.196) Quit ()
[22:38] <joshd1> infernix: no, it should still load the conffile. conf is just an optional dict of extra settings to apply on top of that
[22:39] <junglebells> joshd1: So my initial impression was that it read from all three, and thus why I setup my crushmap to maintain three copies across the three nodes. I should probably just do two then for the sake of saving disk space since it's otherwise kind of a waste. I still want at least 2 so I can take one host down for maintenance at a time
[22:40] <infernix> joshd1: k, stepping through rados.py with rpdb2
[22:40] <infernix> something is going wrong here where the main thread can connect just fine to the cluster but subprocesses fail
[22:40] <joshd1> junglebells: it will read from all 3 if you have more than one read going at once - the problem is that dd is single-threaded
[22:42] <junglebells> Ahhh roger my bad
[22:43] * leseb_ (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[22:47] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[22:47] <infernix> cephx server client.testuser: unexpected key: req.key=0 expected_key=a21339f8f79c7302
[22:47] <infernix> argh
[22:47] * infernix kicks python in the nuts
[22:51] <infernix> does the python code just pass the ceph.conf to librados?
[22:51] <infernix> is there some limitation on connections in librados based on process ID?
[22:52] <infernix> ret = self.librados.rados_connect(self.cluster)
[22:52] <infernix> this is returning -1
[22:53] <infernix> i can't dig any deeper in python
[22:53] <joshd1> infernix: I'd suggest stracing, maybe it's not reading the keyring file for some reason in the subprocesses?
[22:56] <infernix> joshd1: grepping through the strace shows 3 instances of the key
[22:59] <infernix> it would almost seem to me that the rados python lib breaks when using multiprocessing
[23:00] <joshd1> I wonder if something strange is going on with that mainrados object existing for all of them
[23:00] <infernix> if I pull the code that gets executed as a subprocess and put it in a single file, it works
[23:00] <infernix> hell i'm almost sure that if I would just call an os.exec on them it'll work
[23:00] <infernix> but i want to keep one script
[23:01] <infernix> i've also tried threadrados = __import__(rados) but that didn't help much, and no idea if that localizes it any better
[23:02] <infernix> all it does is connect to rbd, create a 1gb disk, write some zeroes, read some zeroes, delete the disk and exit
[23:03] <infernix> let me just write up a trivial example that only connects
[23:04] * ScOut3R (~scout3r@5400CAE0.dsl.pool.telekom.hu) has joined #ceph
[23:06] <lurbs> junglebells: Have you tried sgp_dd against the mapped RBD device?
[23:07] <junglebells> lurbs: I have not, haven't even heard of it (going to look it up now). I was about to calculate the block count and distribute that over the three nodes and calc the offsets and such to at least break it into three jobs.
[23:08] <lurbs> Looks to be optimised for running against devices that support the SCSI command set, but works against raw devices too, and supports multiple threads.
[23:08] <junglebells> Now my immediate concern is, do my rbds support SCSI commands?
[23:10] * eschnou (~eschnou@146.108-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[23:11] <infernix> ah ha!
[23:12] <infernix> if i do not connect in the main thread, child threads work
[23:12] <lurbs> I doubt it, but it should work against just the raw RBD devices regardless. I'd test it first, though.
[23:12] <infernix> but that's not good
[23:12] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:14] <infernix> joshd1: http://pastebin.ca/2315339 - see anything obviously wrong with this?
[23:15] <infernix> it looks like that if I connect to ceph in the main process, any child processes fail to connect
[23:15] <jmlowe> is there a knob I can turn to make backfill go faster?
[23:15] <infernix> i can probably work around it by having an additional child process do some initial setup work first, but I don't see any reason why it would fail
[23:19] <joshd1> infernix: no, that's strange, since multi-threaded (or process) c works fine. there must be something odd about multiprocessing and the python binding/loading shared libraries
[23:22] <infernix> joshd1: right. so disabling the cluster.connect() in the main process fixes it
[23:22] * JohansGlock_ (~quassel@kantoor.transip.nl) has joined #ceph
[23:22] <infernix> so i'll do initial setup and final teardown in separate processes too, and be done with it for now
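[Editor's note: the workaround infernix lands on — the parent process never calls connect(); every process that needs the cluster, including setup and teardown, opens its own handle after fork — can be sketched like this. The `client.testuser` id and 'rbd' pool come from the discussion above; the helper names are hypothetical.]

```python
import multiprocessing

def with_cluster(fn, *args):
    """Open a fresh Rados handle inside this process, run fn against it, clean up.

    Importing and connecting only after fork avoids inheriting librados
    state from the parent, which is what broke the child connections above.
    """
    import rados  # imported here so the parent never touches librados before fork
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf', rados_id='testuser')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')
        try:
            return fn(ioctx, *args)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

def run_in_subprocess(target, *args):
    """Run target(*args) in a child process; returns the child's exit code."""
    p = multiprocessing.Process(target=target, args=args)
    p.start()
    p.join()
    return p.exitcode

# Setup, the parallel workers, and teardown would each run as e.g.:
#   run_in_subprocess(with_cluster, create_image, 'testimg')
```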
[23:22] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[23:22] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[23:23] * dosaboy (~user1@host86-164-229-186.range86-164.btcentralplus.com) Quit (Quit: Leaving.)
[23:29] * calebamiles1 (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[23:29] * JohansGlock (~quassel@kantoor.transip.nl) Quit (Ping timeout: 480 seconds)
[23:32] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[23:41] <junglebells> lurbs: Yea it's still just reading from just the single disk so it's not helping me any. Same performance as before
[23:52] * vata (~vata@2607:fad8:4:6:11f4:cedb:50b7:4c14) Quit (Quit: Leaving.)
[23:53] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:57] <ShaunR> is there a way to see IO/Speed stats from a rbd client connected to the cluster? For example, i'd like to be able to see the IOPS & bandwidth a VM is using thats attached to the cluster.
[23:57] <junglebells> ShaunR: So something like bwm-ng on that client?
[23:58] <ShaunR> no
[23:58] <ShaunR> i'd like to see the IOPS and IO bandwidth for that client

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.