#ceph IRC Log

IRC Log for 2012-11-03

Timestamps are in GMT/BST.

[0:04] * synapsr (~Adium@2620:149:f01:201:d5b0:295e:7a20:72) has joined #ceph
[0:05] * nwatkins (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[0:07] * synapsr (~Adium@2620:149:f01:201:d5b0:295e:7a20:72) Quit ()
[0:08] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[0:29] <benpol> so will "format 2" become the default with bobtail? I see it still isn't as of 0.53
[0:29] <benpol> (rbd format that is)
[0:29] * jlogan1 (~Thunderbi@2600:c00:3010:1:9b2:ed42:a1f6:a6ec) Quit (Ping timeout: 480 seconds)
[0:30] <joshd> benpol: it probably won't be the default until kernel layering support is upstream for a while
[0:31] <benpol> joshd: I guess that's understandable
[0:32] <benpol> I upgraded my test cluster from argonaut to 0.53 today (in order to play around with clones)
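(The clone workflow benpol refers to looks roughly like this; a sketch only, with made-up image, snapshot and clone names, and format 2 is what enables layering:)

    rbd create --format 2 --size 10240 rbd/base-image   # format 2 image, 10 GB
    rbd snap create rbd/base-image@gold                  # snapshot the base image
    rbd snap protect rbd/base-image@gold                 # clones require a protected snapshot
    rbd clone rbd/base-image@gold rbd/scratch-clone      # copy-on-write clone of the snapshot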
[0:34] <benpol> a libvirt/kvm server has since had trouble starting up a vm that's hosted on the cluster.
[0:35] <joshd> did you upgrade librbd on the vm host?
[0:36] <benpol> yes (did that one first actually)
[0:39] * Cube1 (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[0:39] <benpol> wonder if I need to rebuild qemu/kvm, seems like I shouldn't have to as it's dynamically linked to librbd.
[0:39] <joshd> no, you shouldn't have to
[0:40] <joshd> what kind of trouble is it? does qemu start and hang, or does it error out?
[0:40] <iggy> did librbd change?
[0:41] <benpol> "libvirtError: Unable to read from monitor: Connection reset by peer"
[0:41] <joshd> there shouldn't be any backwards incompatible ABI changes, just additions of new functions
[0:41] <iggy> I might still rebuild qemu just to be safe
[0:41] <benpol> joshd: ok, I'll keep fiddling with it
[0:41] <iggy> and there should be more error than that
[0:41] <iggy> otherwise, try to reproduce outside libvirt
[0:43] <benpol> iggy: looking further at the libvirt logs, seeing issues with secret stuff: loadSecrets:505 : Error reading secret: internal error <uuid> does not match secret file name 'foo.xml'
[0:43] <benpol> I'll just recreate those.
[0:44] <benpol> secrets (and associated files) that is.
[0:57] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[1:16] <iggy> that's a libvirt thing... qemu/kvm doesn't know about secrets
[1:16] <iggy> so it's doubtful that's making it fail
[1:16] <iggy> based on your other message
[1:18] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[1:19] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit ()
[1:23] <benpol> iggy: yeah I need to look into it more closely
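(A rough sketch of what recreating those libvirt secrets involves, assuming cephx is in use; the UUID and key below are placeholders:)

    cat > ceph-secret.xml <<'EOF'
    <secret ephemeral='no' private='no'>
      <usage type='ceph'>
        <name>client.admin secret</name>
      </usage>
    </secret>
    EOF
    virsh secret-define --file ceph-secret.xml      # prints the new secret's UUID
    virsh secret-set-value --secret <uuid> --base64 "$(ceph auth get-key client.admin)"
    # that UUID then goes in the <auth>/<secret> element of the domain's rbd <disk> definition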
[1:28] <benpol> have a fine weekend all...
[1:28] * benpol (~benp@garage.reed.edu) has left #ceph
[1:34] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[1:36] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[1:39] * brandi (~brandi@9YYAAKEJL.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[1:42] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[1:51] * ajm (~ajm@adam.gs) has left #ceph
[2:06] * jtang1 (~jtang@79.97.135.214) Quit (Quit: Leaving.)
[2:19] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[2:21] <dec> anyone here versed in the libvirt rbd integration? I've created an rbd pool and a volume within that pool, using virsh. I can't work out how to attach that disk to a domain.
[2:22] <dec> virt-install doesn't support rbd, so I've built a domain and am trying to use 'virsh attach-device' with an rbd <disk> definition XML file
[2:23] <dec> but I just get an 'operation failed: open disk image file failed' error when trying to attach it
[2:23] <joshd> dec: what's your <disk> xml, and are you using cephx?
[2:25] <joshd> dec: also which version of libvirt and qemu do you have, on which platform?
[2:27] <dec> I have auth disabled
[2:27] <dec> here's my disk: http://privatepaste.com/8fac987617
[2:27] <dec> qemu-kvm-1.2.0 and libvirt 0.10.2
[2:27] <dec> on EL6 x86_64
[2:29] <joshd> is libvirt using the right qemu/kvm binary that has rbd support? (kvm -drive format=? should list rbd) those versions are certainly new enough
[2:30] <dmick> ...and is that the one in <emulator> in the domain .xml
[2:32] <joshd> also, do you know if that error is coming from libvirt or qemu? libvirtd's log will show more info. if it's from libvirt, it may be trying to treat rbd as a file again... I've only had to fix that bug 2 or 3 times
[2:32] <dmick> kvm-spice will definitely do that :(
[2:37] <dec> libvirt's log has: qemuMonitorTextAddDrive:2716 : operation failed: open disk image file failed
[2:37] <dec> and that's all
[2:37] <dec> I assume that's coming from qemu then
[2:38] <dec> I can't get qemu-kvm to give me a list of drive formats...
[2:38] <dec> oh nevermind, it's -drive format=?
[2:38] <dec> I was missing the = -- yes it has rbd support
[2:39] <dmick> yes, that's it
[2:39] <dmick> and is qemu-kvm what's specified in <emulator>?
[2:39] <dec> yes
[2:41] <dmick> (and this is stupid, but, you have an image named rbdtest inside a pool named ceph, as can be verified with rbd -p ceph ls?)
[2:41] <joshd> if you enable libvirt debug level logging and restart libvirtd, you can see exactly what it's passing to the qemu monitor in the libvirtd log
[2:42] <dmick> http://tracker.newdream.net/issues/3432, btw
[2:42] <dec> ah, the pool named 'ceph' is a libvirt pool (virsh pool-create...)
[2:43] <dec> the disk lives in the standard 'rbd' rbd pool (rbd -p rbd ls)
[2:43] <dmick> ah ha
[2:43] <dec> so, I'm confused here - does the 'pool' in libvirt actually do *anything* ?
[2:43] <dmick> so you wanna change that to name="rbd/rbdtest"
[2:43] <dmick> libvirt pools are confusing
[2:44] <joshd> you don't need to use libvirt storage pools to use rbd - storage pools are independent of disks for vms in libvirt
[2:45] <dec> oh ok
[2:45] <joshd> it's easier to just use the rbd command to manage them in most cases
[2:46] <dec> I'm still getting "could not open disk image rbd:rbd/rbdtest:auth_supported=none:mon_host=...."
[2:48] <joshd> try using qemu-img info -f rbd rbd:rbd/rbdtest:auth_supported=none:mon_host=.... and add :debug_ms=1:log_file=rbd.log to the end of that
[2:49] <dec> open("rbd:rbd/rbdtest:auth_supported=none", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[2:49] <dec> it's trying to open it as a file
[2:49] <dmick> is that from qemu-img, or the vm startup?
[2:49] <dec> the vm startup
[2:50] <dec> I can query it fine with qemu-img
[2:50] <dmick> that sure looks like a qemu that doesn't support rbd. could this be some evil path confusion somehow?
[2:51] <dmick> if you have strace, can you see the exec and prove which qemu binary it's executing?
[2:53] <dec> yes, and it's the correct one
[2:54] <dec> does this look correct: -drive file=rbd:rbd/rbdtest:auth_supported=none,if=none,id=drive-virtio-disk0,format=raw
[2:54] <dec> (the mons are configured in ceph.conf)
[2:59] <dec> I have to run, I'll try this again later
[3:02] <joshd> yeah, that looks right, although you'll need to specify the monitor ip:ports too if qemu can't read your ceph.conf to find them
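(To pull the checks and the attach step above together as a sketch: the domain name, image name, monitor address, qemu-kvm path and target device are placeholders, auth is assumed disabled, and the monitors are assumed to be listed in ceph.conf:)

    # confirm the emulator libvirt points at actually has rbd support
    /usr/libexec/qemu-kvm -drive format=? | grep rbd
    # sanity-check that librbd can open the image outside of libvirt
    qemu-img info -f rbd rbd:rbd/rbdtest:auth_supported=none
    # minimal network <disk> definition and attach
    cat > rbd-disk.xml <<'EOF'
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw'/>
      <source protocol='rbd' name='rbd/rbdtest'>
        <host name='192.0.2.10' port='6789'/>
      </source>
      <target dev='vdb' bus='virtio'/>
    </disk>
    EOF
    virsh attach-device mydomain rbd-disk.xml --persistent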
[3:13] <Qten> joshd: i watched your talk on the openstack website, good talk by the way, bit quiet tho :), umm is it correct to say that openstack horizon can't launch an instance with an image onto a new volume yet? i recall something along these lines from your talk
[3:16] * chutzpah (~chutz@199.21.234.7) Quit (Quit: Leaving)
[3:17] <joshd> Qten: thanks, it's louder than I usually am :). horizon can't create a volume from an image, but it can boot from a volume that's got data on it.
[3:18] <joshd> so you can create a volume from an image via the api or cinder cli tool, and then boot from it using horizon, but it's not a one-step process yet
[3:19] <Qten> bugger, i wonder how "hard" it would be to add that functionality
[3:20] <Qten> its just the one minor thing that stops me using it :)
[3:20] <joshd> not too hard probably, but I haven't worked on horizon myself
[3:21] <joshd> the cinder client is also a python library, so you don't need to even create the api call yourself
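(A rough sketch of the non-Horizon path joshd describes; UUIDs, names and sizes are placeholders, and exact flag names vary between OpenStack releases of this era:)

    # create a 10 GB volume pre-populated from a Glance image
    cinder create --image-id <glance-image-uuid> --display-name boot-vol 10
    # then boot from the resulting volume, from Horizon or e.g. the nova CLI:
    nova boot --flavor m1.small --block-device-mapping vda=<cinder-volume-uuid>:::0 my-instance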
[3:21] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[3:22] <Qten> also i think that needs to be further documented on the ceph website? maybe it is there but i missed it, that has been driving me up the wall for awhile ;)
[3:23] <joshd> you mean the horizon bit?
[3:23] <Qten> yeah
[3:23] <joshd> yeah, I guess http://ceph.com/docs/wip-rbd-openstack-doc/rbd/rbd-openstack/ could mention that explicitly
[3:24] <joshd> err, http://ceph.com/docs/master/rbd/rbd-openstack/
[3:24] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[3:24] <Qten> could save a few people's sanity
[3:24] <Qten> maybe =)
[3:27] <joshd> it's right here if you have more you'd like to add: https://github.com/ceph/ceph/blob/master/doc/rbd/rbd-openstack.rst - pull requests/patches welcome :)
[3:31] <Qten> thanks
[3:32] <Qten> so i guess the question is has newdream got it working :)
[3:33] <joshd> not that I know of
[3:33] <Qten> i only ask because i saw something on the website regarding cloud compute offering
[3:34] <Qten> fairynuff
[3:36] <joshd> you can see a talk about that too if you're curious: http://www.youtube.com/watch?v=l_8Y988fO44
[3:36] <Qten> ah nice
[3:36] <Qten> thanks mate
[3:37] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:39] <joshd> I'm off - have a good weekend
[3:45] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[4:13] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[4:18] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[4:18] * jtang1 (~jtang@79.97.135.214) Quit ()
[4:20] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[4:24] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[4:26] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[4:41] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[5:00] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) has joined #ceph
[5:12] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[5:19] <dmick> dec: I started having the same problem as you
[5:20] <dmick> a workaround, although I don't understand it yet: I added "auth_supported=none" to the arguments
[5:20] <dmick> i.e. name='rbd/image:auth_supported=none'
[5:20] <dmick> because I finally figured out apparmor enough to let kvm log something
[5:20] <dmick> and it was complaining about auth
[5:21] <dmick> I had auth supported = none in my cluster config, in [global], but that wasn't enough, somehow
[5:22] <dmick> I need to understand the newer settings better
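(In other words, the workaround is to put the auth option into the image name itself rather than relying on the cluster config; with a placeholder image name, the libvirt <disk> source element and the equivalent raw qemu option look something like:)

    <source protocol='rbd' name='rbd/myimage:auth_supported=none'/>
    -drive file=rbd:rbd/myimage:auth_supported=none,format=raw,if=virtio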
[5:48] * scalability-junk (~stp@dslb-084-056-048-012.pools.arcor-ip.net) has joined #ceph
[5:49] * lolvet (~guest-zjX@173-126-102-15.pools.spcsdns.net) has joined #ceph
[5:50] * lolvet (~guest-zjX@173-126-102-15.pools.spcsdns.net) Quit ()
[6:06] * ajm (~ajm@adam.gs) has joined #ceph
[6:10] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[6:11] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) has joined #ceph
[6:40] * yanzheng (~zhyan@jfdmzpr04-ext.jf.intel.com) has joined #ceph
[7:00] * dmick (~dmick@2607:f298:a:607:c08e:825e:dcf3:e507) Quit (Quit: Leaving.)
[7:03] * Ryan_Lane (~Adium@207.239.114.206) has joined #ceph
[7:06] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[7:07] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[7:16] * yanzheng (~zhyan@jfdmzpr04-ext.jf.intel.com) Quit (Remote host closed the connection)
[7:39] * Ryan_Lane (~Adium@207.239.114.206) Quit (Quit: Leaving.)
[8:03] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has left #ceph
[8:28] * stxShadow (~Jens@ip-178-203-169-190.unitymediagroup.de) has joined #ceph
[9:02] * stxShadow (~Jens@ip-178-203-169-190.unitymediagroup.de) has left #ceph
[9:45] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[9:51] * loicd (~loic@magenta.dachary.org) has joined #ceph
[10:06] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[10:14] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[10:14] * loicd (~loic@magenta.dachary.org) has joined #ceph
[10:24] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[10:24] * loicd (~loic@magenta.dachary.org) has joined #ceph
[11:38] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[11:52] * tryggvil (~tryggvil@16-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[12:01] * ramsay_za (~ramsay_za@41.215.234.234) Quit (Quit: Leaving)
[12:01] * Oliver2 (~oliver1@ip-178-201-146-106.unitymediagroup.de) has joined #ceph
[12:02] * stxShadow (~Jens@ip-178-203-169-190.unitymediagroup.de) has joined #ceph
[12:03] * loicd1 (~loic@2a01:e35:2eba:db10:120b:a9ff:feb7:cce0) has joined #ceph
[12:07] * loicd (~loic@magenta.dachary.org) Quit (Ping timeout: 480 seconds)
[12:34] * loicd1 (~loic@2a01:e35:2eba:db10:120b:a9ff:feb7:cce0) Quit (Quit: Leaving.)
[12:34] * loicd (~loic@magenta.dachary.org) has joined #ceph
[12:37] <Oliver2> hi… anybody out there who can help with an idea for reducing the bandwidth used when incorporating 2 new nodes into a cluster? I know of "osd recovery max active = 1", but with 4 OSDs per node that's not enough for a 1Gb network ;)
[12:41] <darkfaded> Oliver2: you do have an extra network for the osds, right?
[12:41] <darkfaded> but hmm, still no good idea how to limit it
[12:41] <Oliver2> Yessir ;)
[12:41] <darkfaded> wait till nhm wakes ;>
[12:42] <Oliver2> Hehe...
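(For reference, the setting Oliver2 mentions lives in the [osd] section of ceph.conf on each storage node, with the OSDs restarted to pick it up; a minimal sketch with just the value from the discussion:)

    [osd]
        osd recovery max active = 1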
[12:46] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[12:51] * stxShadow (~Jens@ip-178-203-169-190.unitymediagroup.de) Quit (Read error: Connection reset by peer)
[12:52] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[13:04] * sagelap (~sage@62.50.250.252) has joined #ceph
[13:13] <jtang> so them storage pods aren't such a hot idea :P
[13:26] <dec> storage pods?
[13:39] * sagelap (~sage@62.50.250.252) Quit (Ping timeout: 480 seconds)
[13:59] * CristianDM (~CristianD@host173.186-124-185.telecom.net.ar) has joined #ceph
[13:59] <CristianDM> Hi
[14:03] * Steki (~steki@212.200.240.4) has joined #ceph
[14:06] * CristianDM (~CristianD@host173.186-124-185.telecom.net.ar) Quit ()
[14:07] <jtang> yea
[14:08] * Oliver2 (~oliver1@ip-178-201-146-106.unitymediagroup.de) Quit (Quit: Leaving.)
[14:13] * Oliver2 (~oliver1@ip-178-201-146-106.unitymediagroup.de) has joined #ceph
[14:43] * Oliver2 (~oliver1@ip-178-201-146-106.unitymediagroup.de) Quit (Quit: Leaving.)
[14:46] * CristianDM (~CristianD@host173.186-124-185.telecom.net.ar) has joined #ceph
[14:46] <CristianDM> Hi.
[14:46] <CristianDM> I'm trying to mount cephfs via ceph-fuse
[14:46] <CristianDM> When I try to mount, it shows ceph-fuse[39040]: starting ceph client
[14:46] <CristianDM> But it never returns to the console
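(A common way to see why ceph-fuse appears to hang is to run it in the foreground with debugging enabled; the monitor address and mountpoint below are placeholders:)

    ceph-fuse -d -m 192.0.2.10:6789 /mnt/ceph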
[14:47] * loicd1 (~loic@magenta.dachary.org) has joined #ceph
[14:47] * loicd (~loic@magenta.dachary.org) Quit (Read error: No route to host)
[14:49] * CristianDM (~CristianD@host173.186-124-185.telecom.net.ar) Quit ()
[14:55] * Oliver2 (~oliver1@ip-178-201-146-106.unitymediagroup.de) has joined #ceph
[15:21] * Yann (~Yann@did75-15-88-160-187-237.fbx.proxad.net) has joined #ceph
[15:22] * Yann is now known as Guest4341
[15:23] * kYann (~Yann@did75-15-88-160-187-237.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[15:34] <denken> with 0.48.2, is it possible to have gaps in the OSD numbering?
[15:35] <denken> i see strange behavior when i try to skip numbers.... 0,1,3,4,5,7,8,9 for example
[15:36] * sagelap (~sage@62.50.252.2) has joined #ceph
[15:37] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[15:38] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[15:43] * joao (~JL@77.56.108.93.rev.vodafone.pt) has joined #ceph
[15:44] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[15:46] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[16:02] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Read error: Connection reset by peer)
[16:03] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[16:26] * Oliver2 (~oliver1@ip-178-201-146-106.unitymediagroup.de) Quit (Quit: Leaving.)
[16:27] * Oliver2 (~oliver1@ip-178-201-146-106.unitymediagroup.de) has joined #ceph
[16:42] * loicd1 (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[16:42] * loicd (~loic@magenta.dachary.org) has joined #ceph
[16:45] * Anticimex (anticimex@netforce.csbnet.se) Quit (Ping timeout: 480 seconds)
[16:48] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[16:58] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[17:01] * Anticimex (anticimex@netforce.csbnet.se) has joined #ceph
[17:02] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) has joined #ceph
[17:09] * joao (~JL@77.56.108.93.rev.vodafone.pt) Quit (Ping timeout: 480 seconds)
[17:10] * joao (~JL@62-50-227-66.client.stsn.net) has joined #ceph
[17:11] * synapsr1 (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) has joined #ceph
[17:12] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[17:25] * Anticimex (anticimex@netforce.csbnet.se) Quit (Remote host closed the connection)
[17:26] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[17:35] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:53] * synapsr1 (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:02] <NaioN> denken: no it's not possible to have gaps in the osd numbering
[18:09] <denken> ty NaioN
[18:18] <denken> as you can guess, all kinds of strange things happen when you try :P
[18:23] * synapsr (~Adium@2602:306:c434:5230:8d96:53fe:f4f1:546b) has joined #ceph
[18:26] * synapsr (~Adium@2602:306:c434:5230:8d96:53fe:f4f1:546b) Quit ()
[18:36] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[18:54] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[18:57] <ctrl> hi all!
[18:57] <ctrl> i'm getting bad performance on my ceph cluster
[18:58] <ctrl> i have 2 servers connected via 40Gb InfiniBand, each with 6x 2TB SAS disks
[18:59] <ctrl> journal on tmpfs
[19:00] <ctrl> Is anyone here?
[19:03] <jmlowe> can you be a little more specific about bad performance?
[19:03] <jmlowe> ceph -s say everything is healthy?
[19:04] <ctrl> yeah, health is ok
[19:04] <jmlowe> everything is active+clean?
[19:05] <jmlowe> what does iotop say, is there as much activity as you expect?
[19:05] <ctrl> yeah
[19:05] <jmlowe> how about the osd bench command?
[19:06] <jmlowe> <- not a dev, just a user
[19:07] <ctrl> osd bench? what is that?
[19:07] <ctrl> iotop shows about 500-1000 KB/s
[19:07] <jmlowe> http://wiki.ceph.com/wiki/Troubleshooting
[19:08] <jmlowe> under osd performance
[19:08] <jmlowe> basically it tells the osd to write a bunch of random data, throw it away, and report on the speed
[19:09] * synapsr (~Adium@108-67-69-35.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[19:09] <jmlowe> it will put the result in the logs or you can see it with ceph -w
[19:09] <ctrl> hmm
[19:09] <jmlowe> it takes a few seconds to run
[19:10] <jmlowe> I tend to see the results in my logs after about 30 seconds
[19:10] <jmlowe> anyway you can see if it's just one osd or if it is across the board
[19:11] <jefferai> jtang: you mean the backblaze one?
[19:12] <ctrl> i didn't see anything in ceph -w
[19:14] <jefferai> jtang: fwiw I looked into those, but the iops are actually not great...the hardware is mostly designed for write once, not random access so it's underpowered for the general case
[19:15] <jefferai> and people in here convinced me I'm much better off scaling out than cramming large numbers of disks into one box, where you can saturate your controller if it's not up to snuff
[19:19] <ctrl> how do I find the performance bottleneck?
[19:20] * synapsr (~Adium@108-67-69-35.lightspeed.sntcca.sbcglobal.net) Quit (Quit: Leaving.)
[19:21] <jefferai> ctrl: if you use iperf in tcp mode, can you verify that you are actually getting the throughput you expect between the two boxes?
[19:23] <ctrl> yeah, 1 min plz )
[19:28] <ctrl> ------------------------------------------------------------
[19:28] <ctrl> Client connecting to 172.16.0.10, TCP port 5001
[19:28] <ctrl> TCP window size: 64.0 KByte (default)
[19:28] <ctrl> ------------------------------------------------------------
[19:28] <ctrl> [ 3] local 172.16.0.11 port 51492 connected with 172.16.0.10 port 5001
[19:28] <ctrl> [ ID] Interval Transfer Bandwidth
[19:28] <ctrl> [ 3] 0.0-10.6 sec 1.78 GBytes 1.44 Gbits/sec
[19:31] <ctrl> when i do dd if=/dev/zero of=/mnt/ceph/testfile.1 bs=1024k count=1000, I get about 20MB/s
[19:31] <darkfaded> 1.44Gbit on infiniband isnt really rushing it either
[19:32] <ctrl> yeah, but enough for ceph
[19:33] <ctrl> 1.44Gbit because the TCP window size is very small for this type of connection
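(Following ctrl's point about the 64 KB default window, re-running iperf with a bigger window and parallel streams usually shows much more of the IPoIB link's capacity; the addresses below are the ones from the paste above:)

    iperf -s -w 512k                          # on 172.16.0.10
    iperf -c 172.16.0.10 -w 512k -P 4 -t 30   # on 172.16.0.11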
[19:39] <ctrl> so, any ideas?
[19:50] <mikeryan> ctrl: can you describe your setup in more detail?
[19:50] <mikeryan> is this cephfs running on top of the OSDs?
[20:00] <ctrl> yeah
[20:01] <ctrl> what kind of information u need?
[20:05] <ctrl> this is my ceph.conf > http://pastebin.com/Pzxafnsm
[20:14] * BManojlovic (~steki@212.200.241.76) has joined #ceph
[20:15] <mikeryan> ctrl: is this a production setup or just a testing setup?
[20:16] <mikeryan> if you don't mind creating a new pool we can test rados performance directly
[20:18] <ctrl> i'm trying to use ceph for cloud storage and this is my first test setup
[20:20] * Steki (~steki@212.200.240.4) Quit (Ping timeout: 480 seconds)
[20:23] <ctrl> yes, create an rbd pool?
[20:32] <mikeryan> create a bench pool
[20:32] <mikeryan> (we'll delete it later)
[20:32] <ctrl> just a second
[20:33] <ctrl> how can i create a new pool?
[20:34] <ctrl> can i use the default rbd pool?
[20:34] * joao (~JL@62-50-227-66.client.stsn.net) Quit (Quit: Leaving)
[20:34] <mikeryan> probably.. but i don't want to run the risk of damaging your setup
[20:34] <mikeryan> ceph osd pool create bench
[20:35] <ctrl> ok
[20:35] <ctrl> rados df
[20:35] <ctrl> pool name     category          KB  objects  clones  degraded  unfound  rd  rd KB  wr  wr KB
[20:35] <ctrl> bench         -                  0        0       0         0        0   0      0   0      0
[20:35] <ctrl> data          -                  0        0       0         0        0   0      0   0      0
[20:35] <ctrl> metadata      -                  9       21       0         0        0  11      0  49     26
[20:35] <ctrl> rbd           -                  1        2       0         0        0   1      0   2      2
[20:35] <ctrl> total used         446344        23
[20:35] <ctrl> total avail   23430268952
[20:35] <ctrl> total space   23430715296
[20:36] <ctrl> well done
[20:36] <mikeryan> cool, now let's bench
[20:36] <mikeryan> rados -p bench bench 30 write
[20:36] <mikeryan> and see what kind of throughput numbers you get
[20:36] <ctrl> ok
[20:36] <mikeryan> (that'll bench for 30 sec with 4 MB IO)
[20:37] <ctrl> so
[20:37] <ctrl> Total time run: 43.258228
[20:37] <ctrl> Total writes made: 151
[20:37] <ctrl> Write size: 4194304
[20:37] <ctrl> Bandwidth (MB/sec): 13.963
[20:37] <ctrl> Stddev Bandwidth: 26.307
[20:37] <ctrl> Max bandwidth (MB/sec): 128
[20:37] <ctrl> Min bandwidth (MB/sec): 0
[20:37] <ctrl> Average Latency: 4.48605
[20:37] <ctrl> Stddev Latency: 8.17709
[20:37] <ctrl> Max latency: 29.7957
[20:38] <ctrl> Min latency: 0.039435
[20:38] <mikeryan> looks like you're getting appx 14 MB/sec from RADOS
[20:38] <ctrl> yep
[20:38] <mikeryan> which is pretty close to what you were getting from cephfs
[20:38] <mikeryan> so it appears the problem is not cephfs
[20:38] * sagelap (~sage@62.50.252.2) Quit (Ping timeout: 480 seconds)
[20:39] <mikeryan> https://gist.github.com/
[20:39] <mikeryan> can you paste me the output of ceph pg dump?
[20:39] <ctrl> yeah, 1 sec
[20:42] <ctrl> ready https://gist.github.com/4008450
[20:43] <mikeryan> wow that's a lot of PGs
[20:43] <mikeryan> how many PGs per OSD?
[20:44] <mikeryan> also what kind of disks are these?
[20:46] <ctrl> i don't know )
[20:46] <ctrl> by default
[20:46] <ctrl> disks 6x2Tb per node
[20:47] <mikeryan> perhaps it's a write bottleneck, let's see how good your read speed is
[20:47] <mikeryan> first we'll create some objects in the bench pool
[20:47] <mikeryan> rados -p bench bench 30 write --no-cleanup
[20:47] <mikeryan> give that 30 sec or so to finish
[20:47] <mikeryan> (that will prevent the files from being deleted after they're created)
[20:49] <ctrl> second )
[20:49] <ctrl> ready
[20:49] <mikeryan> rados -p bench bench 30 read
[20:50] <ctrl> mmm... it won't start with this command
[20:52] <mikeryan> what's the error?
[20:53] <ctrl> no error, i think the syntax of the command is wrong
[20:53] <ctrl> it just outputs the --help text
[20:53] <mikeryan> ah, my mistake!
[20:53] <mikeryan> rados -p bench bench 30 seq
[20:54] <mikeryan> (seq == sequential read)
[20:54] <ctrl> ok
[20:54] <ctrl> rados -p bench bench 30 seq
[20:54] <ctrl> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
[20:54] <ctrl> 0 0 0 0 0 0 - 0
[20:54] <ctrl> 1 16 57 41 163.964 164 0.158929 0.0545615
[20:54] <ctrl> 2 16 57 41 81.9834 0 - 0.0545615
[20:54] <ctrl> 3 16 57 41 54.6567 0 - 0.0545615
[20:54] <ctrl> 4 16 58 42 41.9927 1.33333 3.42984 0.134925
[20:54] <ctrl> 5 16 58 42 33.5944 0 - 0.134925
[20:54] <ctrl> 6 16 58 42 27.9954 0 - 0.134925
[20:55] <ctrl> 7 16 118 102 58.2763 80 0.040561 1.021
[20:55] <ctrl> 8 16 120 104 51.9917 8 0.013213 1.01783
[20:55] <ctrl> 9 16 149 133 59.1017 116 0.012308 0.982095
[20:55] <ctrl> 10 16 151 135 53.9915 8 0.012814 0.973198
[20:55] <ctrl> 11 16 163 147 53.4462 48 2.15395 1.04815
[20:55] <ctrl> 12 16 163 147 48.9924 0 - 1.04815
[20:55] <ctrl> 13 16 174 158 48.608 22 0.012598 1.08866
[20:55] <ctrl> 14 16 188 172 49.1353 56 0.012093 1.14464
[20:55] <ctrl> 15 16 188 172 45.8597 0 - 1.14464
[20:55] <ctrl> 16 16 195 179 44.7432 14 0.013243 1.18592
[20:55] <ctrl> 17 16 212 196 46.1107 68 4.27039 1.24453
[20:55] <ctrl> 18 16 224 208 46.2153 48 4.29309 1.31808
[20:55] <ctrl> 19 16 230 214 45.0458 24 0.012148 1.29526
[20:55] <ctrl> 2012-11-03 23:54:33.721598min lat: 0.011647 max lat: 7.0215 avg lat: 1.21097
[20:55] <ctrl> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
[20:55] <ctrl> 20 16 275 259 51.7922 180 0.069084 1.21097
[20:55] <ctrl> Total time run: 20.626935
[20:55] <ctrl> Total reads made: 275
[20:55] <ctrl> Read size: 4194304
[20:55] <ctrl> Bandwidth (MB/sec): 53.328
[20:55] <ctrl> Average Latency: 1.19754
[20:55] <ctrl> Max latency: 7.0215
[20:55] <ctrl> Min latency: 0.011647
[20:55] <mikeryan> avg read speed of about 50 MB/sec
[20:56] <mikeryan> tentatively this points to the write speed of the disks being the bottleneck
[20:57] <ctrl> strange
[20:57] <mikeryan> have you done local throughput testing on the disks?
[20:59] <ctrl> sec, i'm testing via dd if=/dev/zero of=/mnt/hdd2/testfile bs=1024k count=20000
[21:02] <ctrl> 158 MB/sec
[21:02] <mikeryan> hm, so that's sequential write speed
[21:02] <mikeryan> most OSD writes are non-sequential though
[21:03] <ctrl> random write?
[21:07] <ctrl> Yes, but I thought the speed would be something like 6x50 MB/s?
[21:09] <mikeryan> ceph writes are not perfectly parallelized, but yes the performance should be closer to that
[21:10] <mikeryan> unfortunately i'm not really sure how to track down the performance issue any further
[21:11] <mikeryan> it's the weekend, so the folks who are able to help won't be around until monday most likely
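(One way the non-sequential write behaviour mikeryan mentions could be checked directly against one of the OSD data disks, as a sketch: the path, sizes and runtime are placeholders, and fio is assumed to be installed on the node:)

    fio --name=randwrite --filename=/mnt/hdd2/fio.test --rw=randwrite \
        --bs=4k --size=1G --ioengine=libaio --direct=1 --iodepth=16 \
        --runtime=60 --time_based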
[21:11] <ctrl> yeah (
[21:11] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Ping timeout: 480 seconds)
[21:11] <mikeryan> i can tell them about your problem, or you can come back during monday business hours (9 AM - 5 PM GMT-0800), or you can email ceph-devel@vger.kernel.org
[21:12] <ctrl> yeah, i'm thinking about the mailing list right now :)
[21:13] * lofejndif (~lsqavnbok@04ZAABDXG.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:13] <ctrl> i'll try to come back on monday
[21:13] <ctrl> thank you!
[21:13] <mikeryan> sorry i couldn't be of more help!
[21:14] <ctrl> everything is ok! :)
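(The --no-cleanup run above leaves benchmark objects behind; those and the temporary pool can be removed afterwards. Exact syntax differs between releases; newer ones have a rados cleanup subcommand and require the pool name twice plus a confirmation flag when deleting:)

    rados -p bench cleanup          # remove leftover benchmark objects (newer releases)
    ceph osd pool delete bench      # drop the temporary bench pool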
[21:39] * lofejndif (~lsqavnbok@04ZAABDXG.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[21:42] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[21:43] * Leseb_ (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[21:43] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[21:43] * Leseb_ is now known as Leseb
[21:48] * sjustlaptop (~sam@mf40436d0.tmodns.net) has joined #ceph
[22:04] * nwatkins (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[22:40] * sjustlaptop (~sam@mf40436d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[23:06] * CristianDM (~CristianD@host173.186-124-185.telecom.net.ar) has joined #ceph
[23:07] <CristianDM> Hi
[23:07] <CristianDM> does ceph-fuse support only one mountpoint?
[23:19] * sagelap (~sage@62-50-199-254.client.stsn.net) has joined #ceph
[23:29] * nwatkins (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[23:30] * loicd (~loic@magenta.dachary.org) Quit (Read error: Connection reset by peer)
[23:30] * loicd1 (~loic@2a01:e35:2eba:db10:120b:a9ff:feb7:cce0) has joined #ceph
[23:32] * jlogan1 (~Thunderbi@2600:c00:3010:1:9b2:ed42:a1f6:a6ec) has joined #ceph
[23:44] * loicd1 (~loic@2a01:e35:2eba:db10:120b:a9ff:feb7:cce0) Quit (Quit: Leaving.)
[23:45] * loicd (~loic@magenta.dachary.org) has joined #ceph
[23:51] * CristianDM (~CristianD@host173.186-124-185.telecom.net.ar) Quit ()
[23:57] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) has joined #ceph
[23:58] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) Quit ()

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.