#ceph IRC Log

IRC Log for 2015-06-24

Timestamps are in GMT/BST.

[0:00] <qhartman> What is the best method to confirm that? I would think that hitting the admin socket would be the best source of truth.
[0:00] * oro (~oro@84-72-180-62.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[0:05] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[0:07] * primechuck (~primechuc@host-95-2-129.infobunker.com) Quit (Remote host closed the connection)
[0:07] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)
[0:08] * fridim_ (~fridim@56-198-190-109.dsl.ovh.fr) Quit (Ping timeout: 480 seconds)
[0:08] * rlrevell (~leer@vbo1.inmotionhosting.com) Quit (Quit: Leaving.)
[0:09] * cholcombe (~chris@c-73-180-29-35.hsd1.or.comcast.net) Quit (Ping timeout: 480 seconds)
[0:12] * qybl_ (~foo@kamino.krzbff.de) Quit (Quit: WeeChat 1.1.1)
[0:12] * qybl (~foo@kamino.krzbff.de) has joined #ceph
[0:18] * root (~root@190.19.112.50) has joined #ceph
[0:18] <root> hi all
[0:18] * cholcombe (~chris@c-73-180-29-35.hsd1.or.comcast.net) has joined #ceph
[0:19] <root> I want to know if there's a way to see the perf of all volumes of a pool?
[0:19] <root> the pool is defined in cinder, and has many volumes, and we would like to know what volumes are causing high iowait to the cluster
[0:19] <cholcombe> ceph perf stat isn't showing you what you need?
[0:20] <cholcombe> oh it probably doesn't break it out by pool does it?
[0:21] <lurbs> 'ceph osd pool stats' does, but not by volume.
[0:21] <cholcombe> right
[0:21] <cholcombe> that's what i was thinking of
[0:22] <root> cholcombe: thanks but yes, i need to get to the volume-specific stats
[0:22] <cholcombe> when you say volume what do you mean?
[0:23] <cholcombe> oh i see it's a block device
[0:23] <cholcombe> root: this is an area that could use a lot of improvement in ceph I think.
[0:23] <root> yeah, the pool got many rbds
[0:23] <cholcombe> this is a problem with gluster also
[0:23] <cholcombe> i have a hard time figuring out who is stomping the cluster
[0:24] * wschulze (~wschulze@cpe-69-206-240-164.nyc.res.rr.com) Quit (Quit: Leaving.)
[0:25] <cholcombe> it's probably just a matter of getting this info to the CLI. the cluster knows it i think
[0:26] <root> cholcombe: yeah i think that the info is there in the cluster but we need a way to get that info to us :) in a useful way
[0:26] <lurbs> I haven't even pulled stats out directly from Ceph, but you can track it down by looking at the network, and then working back to which VM process opened those TCP connections.
[0:27] <root> lurbs: yeah but if you have 200 instances that are going directly to the ceph pool...
[0:27] <cholcombe> exactly it becomes muddy
[0:27] <lurbs> In my case it was ~3 Gb/s of constant writes, so pretty easy to track back. :)
[0:28] <cholcombe> root: maybe your high io wait is from a client doing a lot of small transactions?
[0:29] <root> cholcombe: could be possible, but that's why i need to get some more perf stats from the volumes, and see what volume/s is/are causing high iowait times
[0:29] <cholcombe> root: https://wiki.ceph.com/Planning/Blueprints/Infernalis/A_standard_framework_for_Ceph_performance_profiling_with_latency_breakdown ?
[0:30] <cholcombe> i think that's slightly different than what you want
[0:30] <root> also i'm getting many requests that are blocked for many secs :(
[0:30] <cholcombe> do you know any C++?
[0:32] <root> cholcombe: yeah
[0:32] <cholcombe> ah interesting. maybe we can dig into this
[0:32] <cholcombe> i'm cruising the github repo right now
[0:33] <cholcombe> in the mean time you're going to have to cobble together some linux tools to get an approx guess
[0:33] <root> yeah sure
[0:34] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[0:34] <cholcombe> https://github.com/ceph/ceph/blob/77e0084028b85fecedcb8c59ba6144cba5559217/src/test/bench/detailed_stat_collector.cc
[0:34] <doppelgrau> root cholcombe: why don't you use the stats of qemu on the VM-Side to monitor the utilization (and set IO-limits)
[0:34] <cholcombe> interesting idea
[0:35] <root> doppelgrau: that's a nice idea, let me check it out that
[0:35] <cholcombe> you'd still need to roll it up across the clients but that puts you closer
[0:35] <doppelgrau> https://github.com/qemu/qemu/blob/master/qmp-commands.hx <- can be used for monitoring and setting IO-Limits
[0:36] <doppelgrau> and with IO-Limits the possible impact of a client can be limited
[0:37] <root> doppelgrau: sounds great, now i need to get this thing to the devops guys since i don't have access to those servers
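A minimal sketch of the qemu-side monitoring and throttling doppelgrau is pointing at, using libvirt's wrappers around QMP (the domain name "vm01", guest device "vda" and drive id "drive-virtio-disk0" are placeholders, and the limit values are arbitrary examples):

    # per-disk I/O statistics for a guest, as accounted by qemu
    virsh domblkstat vm01 vda

    # cap that drive at 500 total IOPS via HMP passthrough to block_set_io_throttle
    # (argument order: bps bps_rd bps_wr iops iops_rd iops_wr)
    virsh qemu-monitor-command vm01 --hmp 'block_set_io_throttle drive-virtio-disk0 0 0 0 500 0 0'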
[0:39] <lurbs> You can use Cinder's QoS functionality to impose per volume IOPS limits BTW:
[0:39] <lurbs> http://ceph.com/planet/openstack-ceph-rbd-and-qos/
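A hedged example of the Cinder QoS approach from that post (spec name and limits are illustrative; the volume type must already exist, and only volumes created with that type pick up the limits):

    cinder qos-create limited-iops consumer=front-end read_iops_sec=200 write_iops_sec=200
    cinder qos-associate <qos-spec-id> <volume-type-id>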
[0:44] * xarses (~xarses@12.164.168.117) Quit (Ping timeout: 480 seconds)
[0:47] <lurbs> The caveat there is that I've seen some pretty daft IO patterns from various applications, doing tiny reads/writes that hit even a pretty generous IOP limit really easily.
[0:47] <lurbs> s/IOP/IOPS/
[0:47] <root> also don't know if it's possible to add a collector to ceph in order to get more info
[0:56] <root> mm ok, now the cluster is in the following state:
[0:56] * snakamoto (~Adium@192.16.26.2) has joined #ceph
[0:56] <root> http://pastebin.com/raw.php?i=q2eJV2sK
[0:57] * arbrandes (~arbrandes@189.78.72.151) Quit (Remote host closed the connection)
[0:59] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) has joined #ceph
[0:59] * geli (~geli@geli-2015.its.utas.edu.au) Quit (Remote host closed the connection)
[1:06] <root> any ideas? http://pastebin.com/raw.php?i=12iCTjuw
[1:06] <lurbs> That's what it's supposed to be doing.
[1:07] <lurbs> So long as the number of PGs waiting backfill is slowly dropping, and the number active+clean is rising, it's an acceptable state.
[1:07] <root> That's exactly what i want to know, since there's no disk error, or osd down :S
[1:07] <lurbs> If you find that the backfill is hurting client IO you can throttle it back, BTW.
[1:07] <doppelgrau> root: but you have many blocked IO, is size = min_size in some of your pools?
[1:08] <lurbs> Live change: ceph tell osd.* injectargs '--osd-max-backfills 1'
[1:08] <lurbs> Or set osd_max_backfills in ceph.conf to make it permanent.
[1:08] <root> size 2, min_size 1
[1:08] * nsoffer (~nsoffer@bzq-79-180-80-9.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[1:09] <root> thanks lurbs let me check that
[1:10] <lurbs> http://ceph.com/docs/master/rados/configuration/osd-config-ref/#backfilling
[1:10] <lurbs> There's a similar option for 'recovery', rather than 'backfills'.
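For reference, the persistent form of the throttles lurbs mentions would look roughly like this in ceph.conf on the OSD hosts (the values are the conservative ones being discussed here, not the defaults):

    [osd]
    osd max backfills = 1
    osd recovery max active = 1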
[1:11] <qhartman> I'm wrestling with some IO issues on my cluster as well. I think it's just a result of having too much going on, but I'm also looking at ways I can reduce IO load.
[1:11] <qhartman> I've got 36OSDs/disks and I'm running 70 VMs against them, none of them particularly IO heavy
[1:12] <qhartman> and I'm seeing significant IOwait on the OSD disks
[1:12] <root> yeah it's really hard to debug high io on ceph
[1:12] <doppelgrau> qhartman: rbdcache, Journal on SSD???
[1:12] <qhartman> journal is on SSD
[1:12] <root> already had journal on ssd, also i tried to put the journal on fusion-io cards
[1:12] <doppelgrau> qhartman: what hurts with VMs is synchronous IO-operations, independent of the size
[1:13] <qhartman> not running RBD cache, that looked like more trouble than it was worth
[1:13] * fdmanana__ (~fdmanana@bl13-155-180.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[1:13] <doppelgrau> qhartman: rbd-cache helps with sequential IO, making the requests larger => less IO/s
[1:13] <qhartman> true
[1:13] <qhartman> I'm looking at reducing my OSD IO in my ceph cluster if possible, and I'm looking at increasing filestore_max_sync_interval to see if that will allow for less frequent, but larger (sequential?) IO on the OSD disk
[1:14] <qhartman> apologies for the copypasta from before, assuming it was missed
[1:14] <doppelgrau> try it, I think that depends on the IO-patterns of your client
[1:14] <qhartman> but I'd like to confirm that I have enough space in my journals to accommodate a larger interval
[1:15] <qhartman> any way to investigate journal usage?
[1:15] <qhartman> I haven't been able to find any yet
[1:15] <doppelgrau> but I think there was a second parameter that states how many requests are allowed in the journal, with small IOs like a VM workload that may be reached even before
[1:15] <qhartman> I think you're right, I've been digging through the docs today and I recall something similar
[1:16] <doppelgrau> qhartman: IIRC the journal should be minimum twice the possible IO in a filestore_max_sync_interval
[1:17] <qhartman> journal_queue_max_ops I think is the one
[1:19] <doppelgrau> qhartman: some information can be found http://www.sebastien-han.fr/blog/2012/08/14/ceph-admin-socket/
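The admin socket from that post can be queried along these lines on an OSD host (the socket path follows the default /var/run/ceph naming and osd.0 is just an example):

    # dump the perf counters, including the journal/filestore queue counters
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump

    # show the config values currently in effect, e.g. the journal and filestore sync settings
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'journal_queue|filestore_.*sync'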
[1:19] <root> qhartman: so what would be a 'nice' value for those params? taking into account that you have 4 osd servers and a total of 36 osd daemons
[1:21] * wesbell__ (~wesbell@vpngac.ccur.com) has joined #ceph
[1:21] * angdraug (~angdraug@12.164.168.117) Quit (Quit: Leaving)
[1:21] <qhartman> I actually have 12 servers (3 OSDs each, 1 drive per OSD)
[1:21] <doppelgrau> qhartman: journal_queue_max_bytes vs journal_bytes might be a good hint
[1:22] <qhartman> and, I'm not sure what a nice value would be, I'm currently running defaults
[1:22] <qhartman> and they work if I keep my VM density lower
[1:22] <root> as far as i understand the journal acts like a ring buffer in front of the osds, right?
[1:22] <qhartman> but right now I've got each VM with effectively 1/6 of a HDD spindle for IO
[1:22] <qhartman> and that's just not enough it seems, so I'm looking at other ways to reduce the IO overhead
[1:22] <qhartman> root: that is my understanding as well
[1:23] <root> so, what would happen if i set the filestore [min/max] sync interval to, let's say, 999999?
[1:23] <qhartman> I would imagine it would hit its dirty data threshold long before, and flush anyway
[1:23] <root> will the journal sync start at 100% full or at N %? and also what's N by default?? :S
[1:25] <root> ok so the filestore max sync interval will define the max time period between syncs and the min sync interval will define the min time period between syncs
[1:25] <doppelgrau> I would utilize rbd cache, even with something like 1-2MB size, to get some IOs joined at client-side (and track how many IO/s each client makes; even with journal on SSD a platter makes about 150-200 IO/s, and for writes that's divided by the replication size)
[1:27] * wesbell_ (~wesbell@vpngac.ccur.com) Quit (Ping timeout: 480 seconds)
[1:27] <qhartman> doppelgrau, oh, I may have misunderstood. I do have my qemu instances using the cache-writeback setting, and they know they are living on an rbd store
[1:27] <qhartman> so I believe (? haven't proven) that there is client-side caching happening
[1:28] <qhartman> I thought you were referring to the SSD caching layer between ceph and clients that some people are using
[1:29] <doppelgrau> qhartman: there are the parameters documented: http://ceph.com/docs/master/rbd/rbd-config-ref/
[1:30] <qhartman> yes, I am using: rbd cache = true
[1:30] <qhartman> rbd cache writethrough until flush = true
[1:30] <qhartman> defaults for other sizes
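For context, the client-side options qhartman lists normally live in the [client] section of ceph.conf on the hypervisors; a sketch with the documented defaults he is relying on spelled out (values shown only for illustration):

    [client]
    rbd cache = true
    rbd cache writethrough until flush = true
    rbd cache size = 33554432            # 32 MB default
    rbd cache max dirty = 25165824       # 24 MB default
    rbd cache target dirty = 16777216    # 16 MB default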
[1:35] <qhartman> I do have some RAM overhead available, I may play with increasing these cache sizes
[1:36] <doppelgrau> qhartman: with 32MB default size, I think in most cases a flush is more likely than a full cache
[1:36] * moore_ (~moore@64.202.160.88) Quit (Remote host closed the connection)
[1:37] <doppelgrau> qhartman: I'd use the admin socket and monitor if the journal is utilized or if journal_queue_max_ops hits first
[1:38] * MentalRay (~MRay@MTRLPQ42-1176054809.sdsl.bell.ca) Quit (Ping timeout: 480 seconds)
[1:38] <doppelgrau> journal_queue_max_ops is by default 500, and with small IO that is likely for VMs that means less than 20MB
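If one wanted to experiment with the knobs discussed above, they go in the [osd] section of ceph.conf; a hedged sketch (the numbers are examples to show which options pair together, not recommendations):

    [osd]
    filestore min sync interval = 0.01
    filestore max sync interval = 15        # default is 5 seconds
    journal queue max ops = 3000
    journal queue max bytes = 104857600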
[1:39] * alram (~alram@cpe-172-250-2-46.socal.res.rr.com) Quit (Quit: leaving)
[1:45] * badone_ (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) has joined #ceph
[1:45] <qhartman> so you're suggesting that bumping that as well would be needed
[1:46] <doppelgrau> I would start with that
[1:46] * root (~root@190.19.112.50) Quit (Quit: WeeChat 0.4.2)
[1:48] * oms101 (~oms101@p20030057EA06BB00C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:51] * badone (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) Quit (Ping timeout: 480 seconds)
[1:53] <qhartman> Interesting, I'm running "iotop -aPo" on my osd hosts, and as expected the osd processes have far and away the most write traffic, but kworker processes have the most IO
[1:54] <qhartman> xfsaild is also up there, which I would expect
[1:54] <Nats_> my 2cents, i'd concentrate on your disk layer before getting into ceph settings very much
[1:54] <qhartman> yeah, I've looked at my disk layer pretty closely, and while it's definitely "busy", it doesn't seem nearly busy enough to account for the iowait and degraded performance I'm seeing
[1:55] <Nats_> i saw significant improvements in my setup just from changing /sys/block/sd?/queue/scheduler to noop and reducing /sys/block/sd?/queue/nr_requests
[1:55] <qhartman> interesting
[1:55] <qhartman> I was looking at filesystem level stuff mostly
[1:55] <qhartman> (and not finding much)
[1:55] <Nats_> the only ceph settings i've touched were reducing the rbd_cache client size, and setting max backfills + max recovery to 1
[1:56] * oms101 (~oms101@p20030057EA061400C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[1:56] * puffy (~puffy@c-50-131-179-74.hsd1.ca.comcast.net) has joined #ceph
[1:58] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) Quit (Quit: doppelgrau)
[2:02] <qhartman> NAts, thanks for the suggestions, I'll look into those
[2:04] <Nats_> i agree though, it's very hard to know where to look
[2:05] <qhartman> yeah, lots of moving parts
[2:08] <qhartman> Nats_, I'm curious, what did you reduce the nr_requests setting to? Everything I'm reading indicates that reducing the nr_requests value would have a detrimental effect on performance
[2:08] <qhartman> http://www.monperrus.net/martin/scheduler+queue+size+and+resilience+to+heavy+IO
[2:08] <Nats_> qhartman, for me i went from 128 to 32
[2:11] <Nats_> in my case i was tuning a production system, optimizing for latency ("await" in terms of iostat output)
[2:11] <Nats_> for whatever reason, my disks seem to prefer not having too many requests thrown at them
[2:11] <qhartman> that makes sense for SATA
[2:12] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[2:12] <qhartman> interesting stuff to tackle, I would like a technical description of the differences between the noop scheduler and deadline, haven't found that yet
[2:12] <Nats_> who knows, in your env you might want to go the opposite direction, or maybe it will do nothing at all. but its an easy change and pretty easy to measure
[2:12] <qhartman> I'm pretty familiar with that in a cpu scheduler context, but not IO
[2:12] <Nats_> to my knowledge, noop is the 'dumb' scheduler and just sends requests in order
[2:12] <qhartman> that would make sense
[2:13] <qhartman> and also what I assumed
[2:13] <Nats_> and the others just reorder requests to try and achieve various things
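What Nats_ describes comes down to two sysfs writes per data disk (the device name is an example; the settings do not survive a reboot, so they usually end up in a udev rule or an init script):

    echo noop > /sys/block/sda/queue/scheduler
    echo 32   > /sys/block/sda/queue/nr_requests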
[2:18] * vbellur (~vijay@50-206-204-8-static.hfc.comcastbusiness.net) has joined #ceph
[2:22] <florz> qhartman: Documentation/block/ in the kernel source
[2:22] * rlrevell (~leer@184.52.129.221) has joined #ceph
[2:22] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) Quit (Quit: jdillaman)
[2:23] * elder (~elder@104.135.1.105) Quit (Ping timeout: 480 seconds)
[2:23] <qhartman> florz, thanks
[2:31] * rlrevell (~leer@184.52.129.221) Quit (Quit: Leaving.)
[2:31] * rlrevell (~leer@184.52.129.221) has joined #ceph
[2:36] * rlrevell (~leer@184.52.129.221) Quit (Read error: Connection reset by peer)
[2:38] * elder (~elder@104.135.1.105) has joined #ceph
[2:50] * rlrevell (~leer@vbo1.inmotionhosting.com) has joined #ceph
[2:52] * fxmulder (~fxmulder@cpe-24-55-6-128.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:54] * snakamoto (~Adium@192.16.26.2) Quit (Quit: Leaving.)
[2:57] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[2:59] * cholcombe (~chris@c-73-180-29-35.hsd1.or.comcast.net) Quit (Ping timeout: 480 seconds)
[3:00] * Rickus (~Rickus@office.protected.ca) Quit (Read error: Connection reset by peer)
[3:02] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:06] * shyu (~Shanzhi@119.254.120.66) has joined #ceph
[3:10] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) has joined #ceph
[3:13] * debian112 (~bcolbert@24.126.201.64) Quit (Quit: Leaving.)
[3:14] * Debesis (~0x@143.252.117.89.static.mezon.lt) Quit (Ping timeout: 480 seconds)
[3:17] * snakamoto (~Adium@192.16.26.2) has joined #ceph
[3:18] * yanzheng (~zhyan@182.139.20.134) has joined #ceph
[3:20] * kefu (~kefu@114.92.100.239) has joined #ceph
[3:32] * snakamoto (~Adium@192.16.26.2) Quit (Quit: Leaving.)
[3:33] * shang (~ShangWu@175.41.48.77) has joined #ceph
[3:36] * MentalRay (~MRay@107.171.161.165) has joined #ceph
[3:38] * joshd (~jdurgin@63.138.96.7) has joined #ceph
[3:38] * OutOfNoWhere (~rpb@199.68.195.101) has joined #ceph
[3:44] * elder (~elder@104.135.1.105) Quit (Ping timeout: 480 seconds)
[3:44] * elder (~elder@104.135.1.105) has joined #ceph
[3:46] * elder (~elder@104.135.1.105) Quit (Read error: Connection reset by peer)
[3:46] * kefu (~kefu@114.92.100.239) Quit (Max SendQ exceeded)
[3:47] * kefu (~kefu@li336-244.members.linode.com) has joined #ceph
[3:51] * sjm (~sjm@114.79.155.115) has joined #ceph
[3:56] * rlrevell1 (~leer@184.52.129.221) has joined #ceph
[4:03] * rlrevell (~leer@vbo1.inmotionhosting.com) Quit (Ping timeout: 480 seconds)
[4:07] * ira (~ira@c-71-233-225-22.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[4:17] * wenjunhuang (~wenjunhua@111.161.63.110) has joined #ceph
[4:18] * caban (~unknown@office.siplabs.ru) has left #ceph
[4:19] * wschulze (~wschulze@cpe-69-206-240-164.nyc.res.rr.com) has joined #ceph
[4:24] * Concubidated (~Adium@192.170.161.35) Quit (Quit: Leaving.)
[4:24] * flisky (~Thunderbi@106.39.60.34) has joined #ceph
[4:25] * ircolle (~ircolle@50-207-27-138-static.hfc.comcastbusiness.net) has joined #ceph
[4:30] * yguang11 (~yguang11@nat-dip30-wl-d.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[4:36] * nhm (~nhm@50-201-39-35-static.hfc.comcastbusiness.net) has joined #ceph
[4:36] * ChanServ sets mode +o nhm
[4:44] * ircolle (~ircolle@50-207-27-138-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[4:47] * sjm (~sjm@114.79.155.115) Quit (Ping timeout: 480 seconds)
[4:51] * fxmulder (~fxmulder@cpe-24-55-6-128.austin.res.rr.com) has joined #ceph
[4:52] * joshd (~jdurgin@63.138.96.7) Quit (Quit: Leaving.)
[4:54] * sjm (~sjm@114.79.155.115) has joined #ceph
[5:03] * sankarshan (~sankarsha@183.87.39.242) Quit (Ping timeout: 480 seconds)
[5:03] * rlrevell1 (~leer@184.52.129.221) Quit (Read error: Connection reset by peer)
[5:06] * ira (~ira@c-71-233-225-22.hsd1.ma.comcast.net) has joined #ceph
[5:11] * bkopilov (~bkopilov@bzq-109-65-167-66.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[5:18] * kefu_ (~kefu@114.92.100.239) has joined #ceph
[5:19] * bkopilov (~bkopilov@bzq-109-64-160-158.red.bezeqint.net) has joined #ceph
[5:19] * kefu (~kefu@li336-244.members.linode.com) Quit (Read error: Connection reset by peer)
[5:24] * ira (~ira@c-71-233-225-22.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[5:31] * MentalRay (~MRay@107.171.161.165) Quit (Quit: This computer has gone to sleep)
[5:32] * overclk (~overclk@121.244.87.117) has joined #ceph
[5:32] * blinky43 (~Uh@c-76-124-208-67.hsd1.pa.comcast.net) has joined #ceph
[5:33] * kefu_ (~kefu@114.92.100.239) Quit (Max SendQ exceeded)
[5:33] * kefu (~kefu@li336-244.members.linode.com) has joined #ceph
[5:34] * Sysadmin88_ (~IceChat77@2.124.164.69) has joined #ceph
[5:37] * Vacuum__ (~Vacuum@i59F79096.versanet.de) has joined #ceph
[5:37] * Sysadmin88 (~IceChat77@2.124.164.69) Quit (Ping timeout: 480 seconds)
[5:41] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Read error: Connection reset by peer)
[5:41] * georgem (~Adium@69-165-162-64.dsl.teksavvy.com) has joined #ceph
[5:44] * Vacuum_ (~Vacuum@i59F7910C.versanet.de) Quit (Ping timeout: 480 seconds)
[5:51] * bkopilov (~bkopilov@bzq-109-64-160-158.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[5:51] * nhm (~nhm@50-201-39-35-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[6:00] * sjm (~sjm@114.79.155.115) Quit (Ping timeout: 480 seconds)
[6:01] * georgem (~Adium@69-165-162-64.dsl.teksavvy.com) Quit (Quit: Leaving.)
[6:03] * bkopilov (~bkopilov@bzq-109-64-160-158.red.bezeqint.net) has joined #ceph
[6:16] * gregsfortytwo (~gregsfort@transit-86-181-132-209.redhat.com) Quit (Read error: Connection reset by peer)
[6:16] * off_rhoden (~off_rhode@209.132.181.86) Quit (Read error: Connection reset by peer)
[6:20] * off_rhoden (~off_rhode@transit-86-181-132-209.redhat.com) has joined #ceph
[6:20] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[6:21] * gregsfortytwo (~gregsfort@transit-86-181-132-209.redhat.com) has joined #ceph
[6:22] * dyasny (~dyasny@104.158.35.152) has joined #ceph
[6:23] * off_rhoden (~off_rhode@transit-86-181-132-209.redhat.com) Quit (Read error: Connection reset by peer)
[6:23] * gregsfortytwo (~gregsfort@transit-86-181-132-209.redhat.com) Quit (Read error: Connection reset by peer)
[6:23] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[6:28] * kefu (~kefu@li336-244.members.linode.com) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[6:31] * off_rhoden (~off_rhode@transit-86-181-132-209.redhat.com) has joined #ceph
[6:32] * gregsfortytwo (~gregsfort@transit-86-181-132-209.redhat.com) has joined #ceph
[6:33] * kefu (~kefu@114.92.100.239) has joined #ceph
[6:41] * shylesh__ (~shylesh@121.244.87.124) has joined #ceph
[6:45] * kefu (~kefu@114.92.100.239) Quit (Max SendQ exceeded)
[6:45] * kefu (~kefu@114.92.100.239) has joined #ceph
[6:49] * yguang11 (~yguang11@2001:4998:effd:7801::1103) has joined #ceph
[6:54] * codice (~toodles@97-94-175-73.static.mtpk.ca.charter.com) has joined #ceph
[6:55] * codice_ (~toodles@97-94-175-73.static.mtpk.ca.charter.com) Quit (Ping timeout: 480 seconds)
[6:57] * wschulze (~wschulze@cpe-69-206-240-164.nyc.res.rr.com) Quit (Quit: Leaving.)
[6:59] * mgolub (~Mikolaj@91.225.202.121) has joined #ceph
[7:02] * DV (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[7:04] * puffy (~puffy@c-50-131-179-74.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[7:06] * sjm (~sjm@49.32.0.181) has joined #ceph
[7:06] * wschulze (~wschulze@cpe-69-206-240-164.nyc.res.rr.com) has joined #ceph
[7:10] * nils_ (~nils@doomstreet.collins.kg) has joined #ceph
[7:10] * rotbeard (~redbeard@2a02:908:df10:d300:76f0:6dff:fe3b:994d) Quit (Quit: Verlassend)
[7:11] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[7:11] * dyasny (~dyasny@104.158.35.152) Quit (Ping timeout: 480 seconds)
[7:12] * codice_ (~toodles@97-94-175-73.static.mtpk.ca.charter.com) has joined #ceph
[7:13] * nsoffer (~nsoffer@bzq-79-180-80-9.red.bezeqint.net) has joined #ceph
[7:14] * codice (~toodles@97-94-175-73.static.mtpk.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:24] * kefu (~kefu@114.92.100.239) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[7:24] * yguang11_ (~yguang11@12.31.82.125) has joined #ceph
[7:26] * yguang1__ (~yguang11@2001:4998:effd:7804::120b) has joined #ceph
[7:31] * yguang11 (~yguang11@2001:4998:effd:7801::1103) Quit (Ping timeout: 480 seconds)
[7:31] * rlrevell (~leer@184.52.129.221) has joined #ceph
[7:32] * wschulze (~wschulze@cpe-69-206-240-164.nyc.res.rr.com) Quit (Quit: Leaving.)
[7:32] * yguang11_ (~yguang11@12.31.82.125) Quit (Ping timeout: 480 seconds)
[7:42] * rlrevell (~leer@184.52.129.221) Quit (Quit: Leaving.)
[7:43] * rlrevell (~leer@184.52.129.221) has joined #ceph
[7:46] * OutOfNoWhere (~rpb@199.68.195.101) Quit (Ping timeout: 480 seconds)
[7:46] * rlrevell (~leer@184.52.129.221) Quit (Read error: Connection reset by peer)
[7:46] <snerd> is there a neat way to expose a radosgw bucket as a standard apache directory listing?
[7:48] <gleam> https://github.com/rgrp/s3-bucket-listing
[7:50] * mgolub (~Mikolaj@91.225.202.121) Quit (Quit: away)
[7:51] * shohn (~shohn@dslb-088-072-078-092.088.072.pools.vodafone-ip.de) has joined #ceph
[7:55] * topro__ (~prousa@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[8:01] * dopesong (~dopesong@78-60-74-130.static.zebra.lt) has joined #ceph
[8:04] * rdas (~rdas@121.244.87.116) has joined #ceph
[8:08] * dopesong_ (~dopesong@lb1.mailer.data.lt) has joined #ceph
[8:08] * nsoffer (~nsoffer@bzq-79-180-80-9.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[8:09] * dopesong (~dopesong@78-60-74-130.static.zebra.lt) Quit (Ping timeout: 480 seconds)
[8:12] * zaitcev (~zaitcev@2001:558:6001:10:61d7:f51f:def8:4b0f) Quit (Quit: Bye)
[8:14] * flisky (~Thunderbi@106.39.60.34) Quit (Quit: flisky)
[8:23] * topro (~prousa@host-62-245-142-50.customer.m-online.net) has joined #ceph
[8:24] * yguang1__ (~yguang11@2001:4998:effd:7804::120b) Quit (Ping timeout: 480 seconds)
[8:25] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[8:26] * rdas (~rdas@121.244.87.116) has joined #ceph
[8:31] * tganguly (~tganguly@121.244.87.117) has joined #ceph
[8:34] * yguang11 (~yguang11@2001:4998:effd:7804::120b) has joined #ceph
[8:34] * codice_ is now known as codice
[8:38] * Nacer (~Nacer@203-206-190-109.dsl.ovh.fr) Quit (Remote host closed the connection)
[8:38] * wicope (~wicope@0001fd8a.user.oftc.net) has joined #ceph
[8:40] * flisky (~Thunderbi@106.39.60.34) has joined #ceph
[8:40] * sankarshan (~sankarsha@183.87.39.242) has joined #ceph
[8:43] * Sysadmin88_ (~IceChat77@2.124.164.69) Quit (Quit: Clap on! , Clap off! Clap@#&$NO CARRIER)
[8:43] * kefu (~kefu@114.92.100.239) has joined #ceph
[8:43] * Be-El (~quassel@fb08-bcf-pc01.computational.bio.uni-giessen.de) has joined #ceph
[8:44] * topro (~prousa@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[8:46] <Be-El> hi
[8:56] * nsoffer (~nsoffer@bzq-84-111-112-230.red.bezeqint.net) has joined #ceph
[8:58] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) has joined #ceph
[9:00] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[9:02] * off_rhoden (~off_rhode@transit-86-181-132-209.redhat.com) Quit (Read error: Connection reset by peer)
[9:02] * gregsfortytwo (~gregsfort@transit-86-181-132-209.redhat.com) Quit (Read error: Connection reset by peer)
[9:03] * bobrik (~bobrik@83.243.64.45) has joined #ceph
[9:07] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[9:09] * off_rhoden (~off_rhode@transit-86-181-132-209.redhat.com) has joined #ceph
[9:10] * gregsfortytwo (~gregsfort@transit-86-181-132-209.redhat.com) has joined #ceph
[9:11] * karnan (~karnan@121.244.87.117) has joined #ceph
[9:11] * topro (~prousa@host-62-245-142-50.customer.m-online.net) has joined #ceph
[9:14] * yguang11 (~yguang11@2001:4998:effd:7804::120b) Quit (Ping timeout: 480 seconds)
[9:17] * dgurtner (~dgurtner@178.197.231.10) has joined #ceph
[9:17] * segutier (~segutier@c-24-6-218-139.hsd1.ca.comcast.net) Quit (Quit: segutier)
[9:17] * karnan_ (~karnan@121.244.87.117) has joined #ceph
[9:18] * sleinen (~Adium@macsl.switch.ch) has joined #ceph
[9:19] * RomeroJnr (~h0m3r@hosd.leaseweb.net) Quit ()
[9:22] * eternaleye (~eternaley@50.245.141.73) Quit (Ping timeout: 480 seconds)
[9:24] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[9:25] * Hemanth (~Hemanth@121.244.87.117) has joined #ceph
[9:32] * fridim_ (~fridim@56-198-190-109.dsl.ovh.fr) has joined #ceph
[9:37] * thomnico (~thomnico@cro38-2-88-180-16-18.fbx.proxad.net) has joined #ceph
[9:42] * topro (~prousa@host-62-245-142-50.customer.m-online.net) Quit (Ping timeout: 480 seconds)
[9:43] * jordanP (~jordan@scality-jouf-2-194.fib.nerim.net) has joined #ceph
[9:43] * sep (~sep@2a04:2740:1:0:52e5:49ff:feeb:32) Quit (Quit: Leaving)
[9:43] * sep (~sep@2a04:2740:1:0:52e5:49ff:feeb:32) has joined #ceph
[9:44] * flisky (~Thunderbi@106.39.60.34) Quit (Quit: flisky)
[9:46] * topro (~prousa@host-62-245-142-50.customer.m-online.net) has joined #ceph
[9:55] * treenerd (~treenerd@85.193.140.98) has joined #ceph
[9:56] * treenerd (~treenerd@85.193.140.98) Quit ()
[9:59] * xzpeter (~oftc-webi@li707-128.members.linode.com) has joined #ceph
[10:00] <kefu> xzpeter, ec pool does not support unaligned write due to its inherent limitation. so one can use it as a rgw backend. but if it comes to rbd, yes, then we need to put a replica
[10:00] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) has joined #ceph
[10:00] <kefu> pool in front of it to align the write access.
[10:01] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[10:01] <kefu> > Is there any way that I could use erasure coding pool directly (replica is really too space consuming for me)?
[10:04] <kefu> xzpeter, if you are using the rbd. i am afraid not. but in this case the replica pool will act as a cache pool, so its size does not need to be on par with its backend pool (the ec pool), and it improves the overall io performance, i believe.
[10:07] * nsoffer (~nsoffer@bzq-84-111-112-230.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[10:07] <topro> jcsp1: got another log file for my MDS issue we were talking about yesterday
[10:08] <xzpeter> kefu: I see.
[10:08] <topro> all of a sudden MDS starts writing "mds.0.cache open_remote_dentry_finish bad remote dentry [dentry #1/home/........." log messages at a rate of some thousand messages per second
[10:08] <topro> always refering to the same file inode
[10:09] <Be-El> kefu: you can also use the cache tier in forward mode, so no writes happen to the replication pool at all
[10:10] <kefu> Be-El, thanks! xzpeter ^
[10:10] <xzpeter> kefu: Actually I am supposed to use cephfs.
[10:10] <topro> it manages to do so for about one minute, then in the middle of one such logfile line, huge blobs of binary content start to appear in the logfile
[10:10] <Be-El> xzpeter: the same applies to cephfs, too. you cannot use ec pool directly
[10:11] <kefu> xzpeter: in that case, the same
[10:11] <xzpeter> Got it. Thanks, Be-El, kefu.
[10:11] <topro> issue might somewhat be related to http://tracker.ceph.com/issues/8255 but still seems to be a bit different
[10:11] <Be-El> xzpeter: you have to add the actual cache tier pool to cephfs as data pool to use it
[10:12] * TMM (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[10:13] <xzpeter> Be-El: OK.
[10:14] * ingslovak (~peto@office.websupport.sk) has joined #ceph
[10:15] * fdmanana__ (~fdmanana@bl13-155-180.dsl.telepac.pt) has joined #ceph
[10:15] * branto (~branto@nat-pool-brq-t.redhat.com) has joined #ceph
[10:22] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[10:26] <nils_> so what's the status of rbd striping for the kernel module?
[10:26] * Concubidated (~Adium@23.91.33.7) has joined #ceph
[10:30] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[10:31] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) has joined #ceph
[10:31] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) Quit ()
[10:46] * eternaleye (~eternaley@50.245.141.73) has joined #ceph
[10:49] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Ping timeout: 480 seconds)
[10:50] * rlrevell (~leer@184.52.129.221) has joined #ceph
[10:52] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[10:55] * nsoffer (~nsoffer@nat-pool-tlv-t.redhat.com) has joined #ceph
[10:56] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[10:59] * rotbeard (~redbeard@tmo-100-129.customers.d1-online.com) has joined #ceph
[11:02] * zack_dolby (~textual@e0109-114-22-11-74.uqwimax.jp) has joined #ceph
[11:08] * rlrevell (~leer@184.52.129.221) Quit (Read error: Connection reset by peer)
[11:11] * rlrevell (~leer@184.52.129.221) has joined #ceph
[11:13] * rotbeard (~redbeard@tmo-100-129.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[11:13] * mattch (~mattch@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[11:14] * mattch (~mattch@pcw3047.see.ed.ac.uk) has joined #ceph
[11:14] * zack_dolby (~textual@e0109-114-22-11-74.uqwimax.jp) Quit (Ping timeout: 480 seconds)
[11:15] * bobrik_ (~bobrik@83.243.64.45) has joined #ceph
[11:15] * bobrik (~bobrik@83.243.64.45) Quit (Read error: Connection reset by peer)
[11:15] * oro (~oro@2001:620:20:16:34be:e539:ca57:48b5) has joined #ceph
[11:16] * flisky (~Thunderbi@106.39.60.34) has joined #ceph
[11:16] * zack_dolby (~textual@e0109-114-22-11-74.uqwimax.jp) has joined #ceph
[11:17] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[11:18] * dlan_ (~dennis@116.228.88.131) has joined #ceph
[11:18] * thomnico (~thomnico@cro38-2-88-180-16-18.fbx.proxad.net) Quit (Quit: Ex-Chat)
[11:18] * dlan (~dennis@116.228.88.131) Quit (Read error: Connection reset by peer)
[11:27] * thomnico (~thomnico@cro38-2-88-180-16-18.fbx.proxad.net) has joined #ceph
[11:27] * sankarshan (~sankarsha@183.87.39.242) Quit (Read error: Connection reset by peer)
[11:28] * Kioob`Taff (~plug-oliv@2a01:e35:2e8a:1e0::42:10) has joined #ceph
[11:29] * sankarshan (~sankarsha@59.88.96.138) has joined #ceph
[11:30] * rlrevell (~leer@184.52.129.221) Quit (Read error: Connection reset by peer)
[11:31] * sjm (~sjm@49.32.0.181) Quit (Quit: Leaving.)
[11:31] * sjm (~sjm@49.32.0.181) has joined #ceph
[11:39] * peem (~piotr@office.forlinux.co.uk) has joined #ceph
[11:40] <peem> Hi, Having problems with starting radosgw on CentOS 7, anybody feeling like they may help ?
[11:41] * rotbeard (~redbeard@tmo-100-129.customers.d1-online.com) has joined #ceph
[11:43] <nils_> just describe your problem and stick around for a bit, usually more effective than asking for volunteers ;)
[11:45] <peem> nils_: that may be lengthy, both describing and waiting, but ...
[11:45] <nils_> just a suggestion
[11:47] <peem> I'm trying to get radosgw working on CentOS 7. The init script seems to be defunct, I can't get it working properly and there are no logs. Starting radosgw from the command line seems to get me a bit further, but I'm getting "Method Not Allowed" on all operations, which suggests some sort of auth issue. Again, I can't seem to find any logs for this to debug...
[11:48] * sankarshan (~sankarsha@59.88.96.138) Quit (Quit: Are you sure you want to quit this channel (Cancel/Ok) ?)
[11:54] * Debesis (Debesis@143.252.117.89.static.mezon.lt) has joined #ceph
[11:58] * rotbeard (~redbeard@tmo-100-129.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[12:02] * Debesis_ (0x@238.160.140.82.mobile.mezon.lt) has joined #ceph
[12:06] * kefu (~kefu@114.92.100.239) Quit (Max SendQ exceeded)
[12:06] * kefu (~kefu@li1188-9.members.linode.com) has joined #ceph
[12:08] * Debesis (Debesis@143.252.117.89.static.mezon.lt) Quit (Ping timeout: 480 seconds)
[12:10] * dlan_ (~dennis@116.228.88.131) Quit (Ping timeout: 480 seconds)
[12:13] * dlan (~dennis@116.228.88.131) has joined #ceph
[12:14] * rotbeard (~redbeard@tmo-100-129.customers.d1-online.com) has joined #ceph
[12:15] * kefu is now known as kefu|afk
[12:16] * jclm (~jclm@50-206-204-8-static.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[12:19] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[12:21] * kefu|afk (~kefu@li1188-9.members.linode.com) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[12:23] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[12:23] * karnan_ (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[12:23] * rotbeard (~redbeard@tmo-100-129.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[12:25] * shyu (~Shanzhi@119.254.120.66) Quit (Remote host closed the connection)
[12:27] * rwheeler (~rwheeler@pool-173-48-214-9.bstnma.fios.verizon.net) Quit (Quit: Leaving)
[12:33] * rlrevell (~leer@184.52.129.221) has joined #ceph
[12:33] * RomeroJnr (~h0m3r@hosd.leaseweb.net) has joined #ceph
[12:33] * linjan (~linjan@195.110.41.9) has joined #ceph
[12:37] * rlrevell (~leer@184.52.129.221) Quit (Read error: Connection reset by peer)
[12:38] * sankarshan (~sankarsha@183.87.39.242) has joined #ceph
[12:40] <RomeroJnr> Hi everyone, I'm setting up a test environment using ceph as the main storage solution for my qemu-kvm virtualization solution, and everything works fine, except for the following: When I simulate a failure (by powering off the switches) on one of the racks (within the crushmap) my vms get into a read-only state, the illustration might help you to fully understand what is going on: http://i.imgur.com/clBApzK.jpg
[12:41] * nhm (~nhm@50-201-39-35-static.hfc.comcastbusiness.net) has joined #ceph
[12:41] * ChanServ sets mode +o nhm
[12:42] <RomeroJnr> My first assumption is that the VM keeps trying to write on the primary OSDs (which are marked as down), the secondary osds might not be taking the primary role. The pool is replicated, with enough placement groups for the set up.
[12:42] <RomeroJnr> When I power the "failed" rack back on, the virtual machines go back to their normal state, as if nothing had ever happened.
[12:47] * nardial (~ls@dslb-178-011-179-229.178.011.pools.vodafone-ip.de) has joined #ceph
[12:47] * rlrevell (~leer@184.52.129.221) has joined #ceph
[12:48] * nardial (~ls@dslb-178-011-179-229.178.011.pools.vodafone-ip.de) Quit (Remote host closed the connection)
[12:52] * dneary (~dneary@pool-96-252-45-212.bstnma.fios.verizon.net) has joined #ceph
[12:55] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[12:56] <smerz> RomeroJnr, do i get this right. a non ceph related filesystem goes readonly ?
[12:56] <smerz> /dev/vda is local to the hypervisor right ?
[12:56] <smerz> or is that also a ceph rbd ?
[12:58] <RomeroJnr> smerz, /dev/vda is the VM disk, when the rack dies the VM gets: "[159360.440136] INFO: task jbd2/vda1-8:135 blocked for more than 120 seconds." I supposed that due to the fact that each object is replicated on a different rack, the outage shouldn't cause any service disruption
[13:00] * dgurtner (~dgurtner@178.197.231.10) Quit (Ping timeout: 480 seconds)
[13:00] <smerz> ah so you mount the disk on the hypervisor. i understand. I think best practice is to have the kvm/qemu process talk directly to ceph
[13:00] <smerz> at least in our setup, even if ceph dies for like 1-2 hours, the vm's resume service once ceph is back up
[13:00] <RomeroJnr> smerz, i know, that's the case. qemu talks directly to ceph
[13:00] <smerz> perhaps it's a setting or something
[13:00] * rlrevell (~leer@184.52.129.221) Quit (Ping timeout: 480 seconds)
[13:00] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[13:01] <smerz> at least with iscsi you can get the same issue. so perhaps it's a tuneable in the guest or something. not sure
[13:02] * dneary (~dneary@pool-96-252-45-212.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[13:03] <RomeroJnr> smerz, i see... as I said, as soon as i power the rack on-line, the VM goes back to its normal state, until then.. it's completely unresponsive
[13:04] <RomeroJnr> i suppose that should not happen, since the objects are replicated on the other remaining racks
[13:04] <smerz> is your ceph cluster actually working with the rack down? because you're right. it should not happen
[13:05] <Be-El> RomeroJnr: what's your setting for min_size on the affected pool?
[13:05] <RomeroJnr> Be-El, 1
[13:06] <Be-El> RomeroJnr: and what's the output of ceph -s during the downtime?
[13:07] <RomeroJnr> Be-El, it reports the down osds... i will reproduce the failure again, just a second
[13:07] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[13:08] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[13:08] <Be-El> ceph might take some time to recognize an osd as down (afaik 30 seconds for heartbeat)
[13:13] * dgurtner (~dgurtner@178.197.231.10) has joined #ceph
[13:13] <RomeroJnr> Be-El, http://pastebin.com/Y7WEfNAW
[13:15] <Be-El> RomeroJnr: so 11 pgs are down....the blocked vm may have accessed data in one of those pgs
[13:15] <Be-El> RomeroJnr: as long as the pgs are shown as active they should be accessible (and writable)
[13:15] <RomeroJnr> Be-El, you mean: "11 undersized+degraded+peered"?
[13:15] <Be-El> RomeroJnr: yes
[13:16] <Be-El> RomeroJnr: how many racks do you have?
[13:16] <RomeroJnr> Be-El, 3 racks
[13:17] <Be-El> RomeroJnr: and do you distribute the pgs based on racks or do you use the default crush rules (which use hosts)
[13:17] <RomeroJnr> Be-El, solely based on the racks
[13:18] <Be-El> RomeroJnr: does the state of the 11 pgs change?
[13:18] * zack_dolby (~textual@e0109-114-22-11-74.uqwimax.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[13:18] <RomeroJnr> Be-El, not during the failure simulation.. when i turn the rack back on-line they became active+clean
[13:19] <Be-El> RomeroJnr: can you dump the pg information of one of the affected pgs?
[13:22] * shylesh__ (~shylesh@121.244.87.124) Quit (Remote host closed the connection)
[13:22] * xzpeter (~oftc-webi@li707-128.members.linode.com) Quit (Remote host closed the connection)
[13:24] <RomeroJnr> Be-El, how can I locate the affected pgs?
[13:25] <Be-El> RomeroJnr: 'ceph pg dump | grep peered'
[13:25] <Be-El> RomeroJnr: the first column is the pg id
[13:25] * nardial (~ls@dslb-178-011-179-229.178.011.pools.vodafone-ip.de) has joined #ceph
[13:26] <Be-El> RomeroJnr: and afterwards 'ceph pg <pg id> query'
[13:27] <RomeroJnr> Be-El, http://pastebin.com/xgzjrqs1
[13:28] <Be-El> RomeroJnr: ok, the pg is only located on osd 187
[13:28] <Be-El> RomeroJnr: does that osd belong to a host in the affected rack?
[13:29] * ganders (~root@190.2.42.21) has joined #ceph
[13:30] <RomeroJnr> Be-El, no, it belongs to another rack
[13:30] * Kvisle_ is now known as Kvisle
[13:30] <Be-El> RomeroJnr: and 156 and 285?
[13:31] <Be-El> ah, the pgs are empty.....there are no objects stored in them
[13:31] <RomeroJnr> Be-El, both to different racks
[13:32] <Be-El> and pool 0 is the pool for the rbd image?
[13:32] <RomeroJnr> Be-El, no, pool id 11
[13:32] * tganguly (~tganguly@121.244.87.117) Quit (Ping timeout: 480 seconds)
[13:33] <Be-El> RomeroJnr: ok, so these pgs are not related to your problem
[13:34] <Be-El> RomeroJnr: and i just ran out of ideas....
[13:34] <RomeroJnr> Be-El, haha
[13:34] <RomeroJnr> Be-El, where did you see the association between the PG and the pool?
[13:34] <T1w> what about a monitor that becomes unresponsive?
[13:35] * lucas (~Thunderbi@218.76.52.64) Quit (Quit: lucas)
[13:36] <Be-El> T1w: good point
[13:36] <RomeroJnr> Be-El, 1 mons down, quorum 1,2 mon002,mon003
[13:36] <Be-El> RomeroJnr: is the mon host with the lowest ip in the affected rack?
[13:36] <RomeroJnr> Be-El, yes
[13:36] <T1w> are the rbds mounted with all monitors referred?
[13:36] <T1w> (or whatever its called in the mount string)
[13:37] <Be-El> RomeroJnr: let's test t1w's hypothesis....power up the switch again, wait for the cluster to recover, and then power down another rack's switch
[13:37] <T1w> o_O
[13:37] <T1w> . o O (I'm helping!)
[13:37] <RomeroJnr> T1w, http://pastebin.com/m8ihYy5f
[13:38] <Be-El> T1w: does librbd behave differently than other ceph applications? I had the impression that the given mon is only used to acquire a mon map
[13:38] <RomeroJnr> Be-El, i will try exploding the second rack, just a sec
[13:38] <T1w> RomeroJnr: sorry, I've had 0 experience with qemu.. :)
[13:39] <Be-El> RomeroJnr: but fire up the first rack first
[13:39] <T1w> Be-El: afaik no, but then again - everything else seems fine
[13:39] <Be-El> T1w: that's why i ran out of ideas
[13:40] <T1w> and when using fuse for mounting an rbd, all monitors (or at least more than 1) should be referred to - I'd suspect that something either times out or fails to time out, and another monitor is never queried for a new crush map
[13:41] * overclk (~overclk@121.244.87.117) Quit (Quit: Leaving)
[13:41] * overclk (~overclk@121.244.87.117) has joined #ceph
[13:42] * Hemanth (~Hemanth@121.244.87.117) Quit (Ping timeout: 480 seconds)
[13:45] * arbrandes (~arbrandes@189.78.72.151) has joined #ceph
[13:47] <RomeroJnr> Be-El, ok, second rack is down.. this time i kept all monitors up
[13:48] <RomeroJnr> Be-El, http://pastebin.com/c49RgdZC
[13:48] <Be-El> RomeroJnr: does the vm still has access to its disk?
[13:48] * overclk (~overclk@121.244.87.117) Quit (Quit: Leaving)
[13:49] <RomeroJnr> Be-El, apparently it has the same issue
[13:50] * overclk (~overclk@121.244.87.117) has joined #ceph
[13:52] * ira (~ira@c-71-233-225-22.hsd1.ma.comcast.net) has joined #ceph
[13:53] <Be-El> RomeroJnr: so it's probably not related to the mon
[13:54] <RomeroJnr> Be-El, it's weird.. it's not a real read-only state, i can "touch" files and even login.. however, a simple 'dd' (count=10 bs=1MB conv=fdatasync) hangs forever
[13:55] <Be-El> RomeroJnr: i might have an explanation for it
[13:55] <Be-El> RomeroJnr: can you overwrite existing files?
[13:56] <Be-El> small files
[13:56] * rotbeard (~redbeard@tmo-100-129.customers.d1-online.com) has joined #ceph
[13:57] * xdeller (~xdeller@h195-91-128-218.ln.rinet.ru) has joined #ceph
[13:59] * thomnico (~thomnico@cro38-2-88-180-16-18.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[14:02] * t0rn (~ssullivan@2607:fad0:32:a02:56ee:75ff:fe48:3bd3) has joined #ceph
[14:02] * i_m (~ivan.miro@deibp9eh1--blueice4n1.emea.ibm.com) has joined #ceph
[14:03] * t0rn (~ssullivan@2607:fad0:32:a02:56ee:75ff:fe48:3bd3) has left #ceph
[14:04] <RomeroJnr> Be-El, yes i can
[14:04] <RomeroJnr> Be-El, small existing files
[14:06] <RomeroJnr> Be-El, if I start downloading an .iso (3 GB) the download usually crashes after the first few hundred mb
[14:06] <RomeroJnr> Be-El, by "crashes" i meant: hangs forever
[14:08] <Be-El> RomeroJnr: ok
[14:08] <RomeroJnr> Be-El, when i turn the rack on-line again, the download resumes from where it hanged
[14:09] <Be-El> RomeroJnr: i think the problem is the fact that you write to parts of the rbd that have not been accessed before
[14:09] * lucas (~Thunderbi@218.76.52.64) has joined #ceph
[14:09] <Be-El> RomeroJnr: ceph does thin provisioning; each rbd is striped into chunks of 4 mb. each stripe is put into one pg
[14:10] <Be-El> RomeroJnr: if you access formerly unaccessed parts of the rbd, a new stripe is created. and this probably fails if one of the racks is down
[14:10] <Be-El> RomeroJnr: but that's just a theory...maybe some developer can comment on this later
[14:12] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[14:12] <RomeroJnr> Be-El, hmpf, i see
[14:13] <RomeroJnr> Be-El, well, so I guess the solution is: don't let the rack break
[14:13] <SpaceDump> :D
[14:14] <Be-El> RomeroJnr: or post to the mailing list whether this behaviour is expected. i would have expected another behaviour
[14:15] <RomeroJnr> Be-El, i will do it
[14:17] * linjan (~linjan@195.110.41.9) Quit (Ping timeout: 480 seconds)
[14:18] * rotbeard (~redbeard@tmo-100-129.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[14:18] <smerz> that would indeed be surprising :O
[14:21] <Be-El> smerz: creating an object in a pg might be different than writing to an object
[14:21] <Be-El> smerz: with one rack down ceph cannot satisfy the pg requirements in RomeroJnr's case
[14:22] <smerz> i can only agree with you. that i would expect other behaviour hehe
[14:22] <smerz> we'll have a 5 node setup running production soon. and yet we could have the same issue right ?
[14:22] * thomnico (~thomnico@bob13-2-88-172-8-244.fbx.proxad.net) has joined #ceph
[14:23] * nardial (~ls@dslb-178-011-179-229.178.011.pools.vodafone-ip.de) Quit (Quit: Leaving)
[14:23] <smerz> but we'll have 3 copies. so if a node goes down we'll have 2 osd's left
[14:23] <smerz> maybe there's a reason behind the 3 copy default suggestion. i dunno
[14:24] <Be-El> the reason is a correct solution for split-brain situations
[14:24] <smerz> yeah
[14:24] <smerz> hm he said min_copy was set to 1. not sure how many copies he started out with
[14:25] <smerz> 2 copies i would guess
[14:25] <Be-El> three copies, distributed accross three racks
[14:25] <smerz> ah okay
[14:25] <smerz> nvm then
[14:26] <sep> how well does ceph perform as a storage for c
[14:27] <Be-El> c?
[14:27] <sep> vm's
[14:27] <sep> <.butterfingers
[14:27] <smerz> ceph is the most deployed storage system for openstack
[14:27] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[14:27] <smerz> "well" is so subjective :)
[14:27] <Be-El> i would also say it's the reference implementation for rbds
[14:28] <RomeroJnr> sep, depends on a lot of factors
[14:28] <sep> of course it depends on the solution, but if you did all the best practices of 10g network, ssd journals, 15k osd disks, the works, would it be comparable to a san like netapp or emc or eternus on 8gb FC ?
[14:29] <smerz> we don't have these. so i cannot comment
[14:30] * daviddcc (~dcasier@80.215.158.43) has joined #ceph
[14:30] <sep> we have quite a lot of snowflake servers. so the individual server's iops are quite important for us still. i know that ceph would scale out way better than the other solutions. but i am worried about the early beginning, how it performs on the single important server
[14:30] * rubenk (~ruben@0001f23d.user.oftc.net) has joined #ceph
[14:31] <smerz> it's hard to say something generic about these kind of things
[14:31] <smerz> soo many nifty details
[14:31] <peem> when I do ceph-deploy rgw hostname, what does it do? I'm under the impression that I need to create a radosgw user after that, but should that user be client.radosgw.hostname or client.rgw.hostname ?
[14:34] * sjm (~sjm@49.32.0.181) Quit (Ping timeout: 480 seconds)
[14:36] * lucas (~Thunderbi@218.76.52.64) Quit (Quit: lucas)
[14:39] * madkiss (~madkiss@chello080108036100.31.11.vie.surfer.at) has joined #ceph
[14:39] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[14:41] * sankarshan (~sankarsha@183.87.39.242) Quit (Ping timeout: 480 seconds)
[14:41] * daviddcc (~dcasier@80.215.158.43) Quit (Ping timeout: 480 seconds)
[14:42] <rkeene> sep, Ceph is really really slow for a single OSD.
[14:45] * championofcyrodi (~championo@50-205-35-98-static.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[14:45] * linjan (~linjan@46.210.203.44) has joined #ceph
[14:46] * shang (~ShangWu@175.41.48.77) Quit (Quit: Ex-Chat)
[14:47] <sep> rkeene, but a vm image would never use just a single osd normally, right? it would spread out, so the trick is to get enough osd's to make the performance acceptable
[14:49] <rkeene> sep, Right -- but if you have a single server you might also have only one OSD (hard to tell from your question, since you only mentioned servers and not OSDs)
[14:50] * championofcyrodi (~championo@50-205-35-98-static.hfc.comcastbusiness.net) has joined #ceph
[14:52] * badone_ (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) Quit (Ping timeout: 480 seconds)
[14:54] * tupper (~tcole@2001:420:2280:1272:8900:f9b8:3b49:567e) has joined #ceph
[15:00] * wschulze (~wschulze@cpe-69-206-240-164.nyc.res.rr.com) has joined #ceph
[15:01] * flisky (~Thunderbi@106.39.60.34) Quit (Quit: flisky)
[15:03] <boolman> if I want to try the journal on ramdisk, How do I go about doing this? ceph-osd --mkjournal doesnt work on tmpfs
[15:04] * rotbeard (~redbeard@tmo-100-129.customers.d1-online.com) has joined #ceph
[15:04] <boolman> I also tried creating a file on the tmpfs, mkfs.xfs it and mount it, but ceph-osd kept coredumping
[15:09] <smerz> boolman, we did this. but we created a file on tmpfs, and made a filesystem of that file
[15:09] <smerz> and mounted that fs, and put the journal in there via symlink
[15:10] <boolman> smerz: I tried that, well except for the symlink, that seems unnecessary
[15:11] <boolman> mount /my/file.img /mnt ; then specified journal file = /mnt/journal-$id
[15:11] <boolman> but after I started the osd's some of them kept coredumping
[15:11] <boolman> i used xfs
[15:11] <smerz> . /my/ == tmpfs. /my/file.img == storage for xfs filesystem. where is the xfs mounted ?
[15:11] <smerz> you cannot put the journal directly on tmpfs
[15:12] <boolman> mount /my/xfs.partition where /my is my tmpfs
[15:12] <boolman> mounted to /mnt
[15:12] <smerz> well then it should work
[15:12] <smerz> we just moved the existing journal
[15:12] <smerz> but you can create a new one. that worked fine for us too
[15:13] <boolman> I did a --flush-journal and then --mkjournal
[15:13] <smerz> (a new journal that is)
[15:13] <smerz> yeah
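A rough sketch of the tmpfs-backed journal procedure boolman and smerz are describing (osd id 0 and the paths are placeholders; keeping the journal in RAM is only safe for throwaway test clusters, since a reboot loses it):

    # stop the osd, then flush the existing journal to the filestore
    ceph-osd -i 0 --flush-journal

    # the journal cannot sit on tmpfs directly, so back it with an xfs image on tmpfs
    mount -t tmpfs -o size=6G tmpfs /my
    truncate -s 5G /my/file.img
    mkfs.xfs /my/file.img
    mount -o loop /my/file.img /mnt

    # point the osd at the new journal (ceph.conf: osd journal = /mnt/journal-$id) and recreate it
    ceph-osd -i 0 --mkjournal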
[15:22] * yanzheng (~zhyan@182.139.20.134) Quit (Quit: This computer has gone to sleep)
[15:22] * primechuck (~primechuc@207-118-227-76.dyn.centurytel.net) has joined #ceph
[15:23] * yanzheng (~zhyan@182.139.20.134) has joined #ceph
[15:24] <zenpac> I was able to get my OSD's to the (down, out) state, but I still have the same authentication problems.
[15:24] <zenpac> They won't bootstrap
[15:26] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[15:28] * timi (~timi@ip4-89-238-218-34.euroweb.ro) has joined #ceph
[15:28] <timi> hi
[15:28] <timi> I created a cluster but looks like the pgs are stuck
[15:28] <timi> why is that
[15:29] * rotbeard (~redbeard@tmo-100-129.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[15:29] * dopesong_ (~dopesong@lb1.mailer.data.lt) Quit (Read error: Connection reset by peer)
[15:29] * squizzi (~squizzi@2602:306:bc59:85f0:3ea9:f4ff:fe5a:6064) Quit (Quit: to the moon!)
[15:29] <timi> http://pastebin.com/KHEcaPmP
[15:29] * vbellur (~vijay@50-206-204-8-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[15:30] <timi> I have only one pool
[15:30] * squizzi (~squizzi@2602:306:bc59:85f0:3ea9:f4ff:fe5a:6064) has joined #ceph
[15:33] * rlrevell (~leer@vbo1.inmotionhosting.com) has joined #ceph
[15:34] * rlrevell (~leer@vbo1.inmotionhosting.com) Quit (Remote host closed the connection)
[15:35] * linjan (~linjan@46.210.203.44) Quit (Ping timeout: 480 seconds)
[15:36] * ghartz_ (~ghartz@AStrasbourg-651-1-176-226.w90-40.abo.wanadoo.fr) has joined #ceph
[15:38] * DV__ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[15:40] <ganders> what's the best size/brand for ssd osd's?
[15:41] * rlrevell (~leer@vbo1.inmotionhosting.com) has joined #ceph
[15:42] * JFQ (~ghartz@AStrasbourg-651-1-152-164.w90-40.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[15:43] * DV_ (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[15:45] <smerz> intel
[15:46] <gleam> dc-s3700 or similarly high endurance newer model
[15:47] * linjan (~linjan@195.110.41.9) has joined #ceph
[15:48] * SamYaple (~SamYaple@162.209.126.134) Quit (Quit: leaving)
[15:48] * SamYaple (~SamYaple@162.209.126.134) has joined #ceph
[15:51] * overclk (~overclk@121.244.87.117) Quit (Quit: Leaving)
[15:51] * elder (~elder@104.135.1.105) has joined #ceph
[15:51] * jtw (~jwilkins@2601:644:4100:bfef:ea2a:eaff:fe08:3f1d) Quit (Quit: Leaving)
[15:55] <boolman> smerz: i tried again, and the osd's don't coredump this time ( no change in my procedure ). But my iops drop to about 50 ( testing with FIO )
[15:56] <zenpac> Can I use Hammer with the Quick-Start docs?
[15:57] <zenpac> What is most recent stable?
[15:58] * primechuck (~primechuc@207-118-227-76.dyn.centurytel.net) Quit (Remote host closed the connection)
[16:00] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[16:00] <zenpac> Giant?
[16:00] * dyasny (~dyasny@104.158.34.71) has joined #ceph
[16:00] * thomnico (~thomnico@bob13-2-88-172-8-244.fbx.proxad.net) Quit (Read error: No route to host)
[16:00] * kefu (~kefu@114.92.100.239) has joined #ceph
[16:01] <gleam> 0.94.2 i believe
[16:01] <gleam> hammer
[16:01] * kefu (~kefu@114.92.100.239) Quit ()
[16:02] * thomnico (~thomnico@cro38-2-88-180-16-18.fbx.proxad.net) has joined #ceph
[16:03] * jwilkins (~jwilkins@2601:644:4100:bfef:ea2a:eaff:fe08:3f1d) has joined #ceph
[16:04] * kefu (~kefu@114.92.100.239) has joined #ceph
[16:08] <Be-El> timi: what's the pool configuration (size/min_size for replicated, k/m for ec pools)?
[16:10] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[16:11] * logan (~a@63.143.49.103) Quit (Ping timeout: 480 seconds)
[16:13] * anorak (~anorak@62.27.88.230) has joined #ceph
[16:13] * yanzheng (~zhyan@182.139.20.134) Quit (Quit: This computer has gone to sleep)
[16:13] <zenpac> deb http://ceph.com/debian-hammer/ jessie main doesnt seem to work.
[16:14] <zenpac> jessie not ready yet
[16:16] <ganders> ok the best are the intel dc s3700..but..i need to have osd's of 1.2TB ssd and hope to find them not so cheap.. but also not so high price.. :S
[16:16] <zenpac> Can I just add ceph-deploy to jessie ?
[16:16] * DV__ (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[16:16] <zenpac> It's got 80.7-2 by default
[16:18] <timi> Be-El: pool 6 'data01' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 240 pgp_num 240 last_change 43 flags hashpspool stripe_width 0
[16:18] <timi> max_osd 4
[16:19] <smerz> boolman, when we tried it we got way more iops. also be aware that the flush from journal to filestore is blocking for all write iops. so if such a flush takes long then journal on ramdisk doesn't help
[16:19] <anorak> Hi all. With "ceph -w", I can see how many ops/sec are happening in the cluster... is it possible to check which block device these operations are carried out on?
[16:19] <Be-El> timi: how many hosts do you have?
[16:20] <timi> 4 host 4 osd
[16:21] <boolman> smerz: my testnode crashed when I tried removing a file. rbd_object_request_destroy
[16:22] * wushudoin (~wushudoin@38.140.108.2) has joined #ceph
[16:22] * JoeJulian (~JoeJulian@shared.gaealink.net) Quit (Ping timeout: 480 seconds)
[16:27] <peem> what do I need to configure to be able to use s3cmd with ceph radosgw ?
[16:27] <peem> I'm getting "MethodNotAllowed" on any operation using s3cmd, but can create bucket using s3test.py script.
[16:30] <ganders> has anyone tested the Micron M600 SSD 1TB SATA 6Gb/s for osd/journal?
[16:30] <zenpac> Anyone know how to get the apt-repos for ceph on Debian8? I'm missing ceph-deploy
[16:30] <alfredodeza> zenpac: what is the codename for debian 8?
[16:30] <alfredodeza> I think we don't have builds for that
[16:30] <alfredodeza> if you want/need ceph-deploy you can always install with Python install tools as it is published in the Python Package Index
[16:31] * logan (~a@63.143.49.103) has joined #ceph
[16:31] <zenpac> jessie
[16:31] <zenpac> ok..
[16:32] <Be-El> timi: do you use the default crush ruleset (distribution over hosts)?
[16:33] * kefu (~kefu@114.92.100.239) Quit (Max SendQ exceeded)
[16:34] * DV_ (~veillard@2001:41d0:1:d478::1) has joined #ceph
[16:36] <ganders> is there any bottleneck on kern 3.18?
[16:36] * kefu (~kefu@114.92.100.239) has joined #ceph
[16:37] <MACscr1> hows the cache pool support these days?
[16:37] * MACscr1 is now known as MACscr
[16:37] <timi> Be-El, yes the default crush ruleset
[16:38] <MACscr> earlier this year i was told that the tiering wasn't that great
[16:38] <Be-El> timi: with 4 hosts and a size of 3 and the default ruleset everything should be ok
[16:38] <timi> I know
[16:39] * kefu (~kefu@114.92.100.239) Quit (Max SendQ exceeded)
[16:39] <timi> I am going to reinstall the cluster
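A few commands that would narrow down a stuck-PG case like timi's before resorting to a reinstall (pool name data01 taken from the paste above; output details vary by release):

    ceph -s                            # health summary and PG state counts
    ceph osd tree                      # confirm all 4 OSDs are up/in and on separate hosts
    ceph pg dump_stuck unclean         # list stuck PGs and the OSDs they map to
    ceph osd pool get data01 size      # double-check the effective replication size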
[16:39] <Be-El> MACscr: there are some reports on the mailing list concerning the performance of cache tiers
[16:40] <timi> how much space is required for journaling?
[16:40] * kefu (~kefu@114.92.100.239) has joined #ceph
[16:40] <MACscr> Be-El: still pretty negative?
[16:41] <Be-El> MACscr: i haven't read all the details
[16:41] <Be-El> timi: default size is 5 GB afaik
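The journal size is a per-OSD setting in ceph.conf, in megabytes; a minimal sketch matching the default Be-El mentions:

    [osd]
    osd journal size = 5120    # in MB; ~5 GB is the default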
[16:41] * kefu (~kefu@114.92.100.239) Quit ()
[16:43] * wenjunhuang (~wenjunhua@111.161.63.110) Quit (Ping timeout: 480 seconds)
[16:45] <peem> To bump this up: I'm getting "MethodNotAllowed" via the civetweb internal server.
[16:46] * DV__ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[16:46] * madkiss (~madkiss@chello080108036100.31.11.vie.surfer.at) Quit (Quit: Leaving.)
[16:46] * DV__ (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[16:47] * kefu (~kefu@114.92.100.239) has joined #ceph
[16:48] * nardial (~ls@dslb-178-011-179-229.178.011.pools.vodafone-ip.de) has joined #ceph
[16:48] * oro (~oro@2001:620:20:16:34be:e539:ca57:48b5) Quit (Ping timeout: 480 seconds)
[16:50] * DV__ (~veillard@2001:41d0:1:d478::1) has joined #ceph
[16:53] * DV_ (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[16:55] * moore (~moore@64.202.160.88) has joined #ceph
[16:56] <peem> Really, nobody here is able to help me get on the right track with radosgw?
[16:58] * segutier (~segutier@c-24-6-218-139.hsd1.ca.comcast.net) has joined #ceph
[17:08] * logan (~a@63.143.49.103) Quit (Ping timeout: 480 seconds)
[17:10] * jameskassemi (~james@128.177.113.102) has joined #ceph
[17:14] <rlrevell> are you sure your civetweb is seeing the traffic? that was the error i got when i had it accidentally pointed at an unrelated apache server
[17:15] * dneary (~dneary@64.251.112.55) has joined #ceph
[17:16] <peem> rlrevell: apache was down and it used different port.
[17:16] <peem> I think I may be closer now, it seems that setting rgw dns name helped so far, just created bucket fine.
[17:18] * logan (~a@63.143.49.103) has joined #ceph
[17:20] * ChrisNBlum (~ChrisNBlu@dhcp-ip-217.dorf.rwth-aachen.de) Quit (Quit: ZNC - http://znc.in)
[17:23] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Quit: Ex-Chat)
[17:24] <peem> Yeah, seems that hostname -s does not work well with civetweb... Still, I have issues with inkscope not talking to RGW, any ideas ?
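What seems to have fixed the MethodNotAllowed errors is giving radosgw a proper DNS name and pointing s3cmd at it; a hedged sketch (hostname, section name, and port are examples only):

    # ceph.conf on the gateway host
    [client.radosgw.gateway]
    rgw dns name = rgw.example.com

    # ~/.s3cfg for s3cmd
    host_base = rgw.example.com:7480
    host_bucket = %(bucket)s.rgw.example.com:7480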
[17:24] * dneary (~dneary@64.251.112.55) Quit (Ping timeout: 480 seconds)
[17:24] <timi> can I use one single journal partition for all OSDs?
[17:26] * cholcombe (~chris@c-73-180-29-35.hsd1.or.comcast.net) has joined #ceph
[17:27] * jameskassemi (~james@128.177.113.102) Quit (Quit: jameskassemi)
[17:29] * overclk (~overclk@61.3.109.105) has joined #ceph
[17:31] * lcurtis (~lcurtis@47.19.105.250) has joined #ceph
[17:32] * jameskassemi (~james@128.177.113.102) has joined #ceph
[17:33] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Ping timeout: 480 seconds)
[17:34] * reed (~reed@75-101-54-131.dsl.static.fusionbroadband.com) has joined #ceph
[17:40] * logan (~a@63.143.49.103) Quit (Ping timeout: 480 seconds)
[17:46] * zaitcev (~zaitcev@2001:558:6001:10:61d7:f51f:def8:4b0f) has joined #ceph
[17:48] * squizzi (~squizzi@2602:306:bc59:85f0:3ea9:f4ff:fe5a:6064) Quit (Ping timeout: 480 seconds)
[17:51] * wicope (~wicope@0001fd8a.user.oftc.net) Quit (Read error: Connection reset by peer)
[17:53] * nsoffer (~nsoffer@nat-pool-tlv-t.redhat.com) Quit (Ping timeout: 480 seconds)
[17:54] * thomnico (~thomnico@cro38-2-88-180-16-18.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[17:56] * dgurtner (~dgurtner@178.197.231.10) Quit (Ping timeout: 480 seconds)
[17:57] * gregmark (~Adium@68.87.42.115) has joined #ceph
[17:58] * overclk (~overclk@61.3.109.105) Quit (Ping timeout: 480 seconds)
[17:59] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) has joined #ceph
[18:02] * yguang11 (~yguang11@nat-dip30-wl-d.cfw-a-gci.corp.yahoo.com) has joined #ceph
[18:02] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:05] * nhm (~nhm@50-201-39-35-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[18:05] * segutier (~segutier@c-24-6-218-139.hsd1.ca.comcast.net) Quit (Quit: segutier)
[18:05] * bbby (~bibby@2601:204:c600:8a2e:e4d5:76ac:9b97:4190) has joined #ceph
[18:06] <ndru> Is there a way to configure a single node ceph cluster to have a healthy status?
[18:08] <peem> ndru: there is... "osd crush chooseleaf type = 0" I think.
[18:09] <ndru> I set that, still degraded/inactive/unclean/undersized/stuck
[18:09] <peem> ndru: pretty sure it worked for me a while back.
[18:09] <ndru> I think, even with that, I have to have more than 1 OSD.
[18:10] * daviddcc (~dcasier@62.212.108.94) has joined #ceph
[18:10] <ndru> I tried setting osd pool default size = 1 as well
[18:10] <ndru> Not sure if the changes took effect
[18:11] <m0zes> ndru: you'll have to change the size on pools existing before you changed the default size.
[18:11] <peem> ndru: I'm sure it worked with 1 osd, can't bring up that server now though to check. see : http://devcenter.megam.io/2015/03/27/ceph-in-a-single-node/
[18:13] <ndru> m0zes: Can you point me in the direction of how I can change existing pool size?
[18:14] <m0zes> ndru: ceph osd pool set <name> min_size 1; ceph osd pool set <name> size 1
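For the single-node case, the settings peem and m0zes mention usually go together; a hedged ceph.conf sketch (it only affects pools created afterwards, so existing pools still need the two commands above):

    [global]
    osd crush chooseleaf type = 0     # allow replicas to land on the same host
    osd pool default size = 1
    osd pool default min_size = 1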
[18:15] * joel (~joel@81.4.101.217) has joined #ceph
[18:15] * xarses (~xarses@12.164.168.117) has joined #ceph
[18:15] <joel> any word on Debian Jessie packages
[18:15] * branto (~branto@nat-pool-brq-t.redhat.com) has left #ceph
[18:18] * debian112 (~bcolbert@24.126.201.64) has joined #ceph
[18:23] * jordanP (~jordan@scality-jouf-2-194.fib.nerim.net) Quit (Quit: Leaving)
[18:24] * squizzi (~squizzi@nat-pool-rdu-t.redhat.com) has joined #ceph
[18:26] * dopesong (~dopesong@78-60-74-130.static.zebra.lt) has joined #ceph
[18:27] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) Quit (Quit: Leaving.)
[18:27] * Be-El (~quassel@fb08-bcf-pc01.computational.bio.uni-giessen.de) Quit (Ping timeout: 480 seconds)
[18:29] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[18:30] * visbits (~textual@8.29.138.28) has joined #ceph
[18:30] <visbits> anyone managing a +1PB cluster? I have 400 osds at around 900T now and it seems like i have a disk failure every few days... my cluster doesn't stay in clean mode very long lol
[18:30] * Larsen (~andreas@2a01:2b0:2000:11::cafe) Quit (Quit: No Ping reply in 180 seconds.)
[18:30] * kefu is now known as kefu|afk
[18:33] * segutier (~segutier@c-24-6-218-139.hsd1.ca.comcast.net) has joined #ceph
[18:33] * dopesong_ (~dopesong@lb1.mailer.data.lt) has joined #ceph
[18:33] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[18:34] <ndru> m0zes: Thank you! That worked.
[18:34] * segutier (~segutier@c-24-6-218-139.hsd1.ca.comcast.net) Quit ()
[18:35] * Larsen (~andreas@larsen.pl) has joined #ceph
[18:35] <ndru> It seems I'm getting "ERROR: Request Entity Too Large (HTTP 413)" when creating volumes on ceph via cinder. Are there any limits I need to investigate? (4TB disk with the OSD on it and trying to create a 100gb volume)
[18:35] * jameskassemi (~james@128.177.113.102) Quit (Quit: jameskassemi)
[18:36] * dopesong (~dopesong@78-60-74-130.static.zebra.lt) Quit (Ping timeout: 480 seconds)
[18:36] * jameskassemi (~james@128.177.113.102) has joined #ceph
[18:38] * jameskassemi (~james@128.177.113.102) Quit ()
[18:38] * rotbeard (~redbeard@2a02:908:df10:d300:76f0:6dff:fe3b:994d) has joined #ceph
[18:38] * jameskassemi (~james@128.177.113.102) has joined #ceph
[18:40] * alram (~alram@206.169.83.146) has joined #ceph
[18:40] * jameskassemi (~james@128.177.113.102) Quit ()
[18:40] * jameskassemi (~james@128.177.113.102) has joined #ceph
[18:41] * TheSov (~TheSov@cip-248.trustwave.com) has joined #ceph
[18:41] <TheSov> when upgrading ceph on debian, do you just do a normal apt-get update/upgrade?
[18:45] * jameskassemi (~james@128.177.113.102) Quit ()
[18:46] * jameskassemi (~james@128.177.113.102) has joined #ceph
[18:47] * marrusl (~mark@cpe-67-247-9-253.nyc.res.rr.com) Quit (Quit: bye!)
[18:48] * marrusl (~mark@cpe-67-247-9-253.nyc.res.rr.com) has joined #ceph
[18:49] <kefu|afk> TheSov: and read http://ceph.com/docs/master/install/upgrading-ceph/ ?
[18:49] <kefu|afk> also the release notes for sure.
[18:53] * mykola (~Mikolaj@91.225.202.121) has joined #ceph
[18:53] * thomnico (~thomnico@bob13-2-88-172-8-244.fbx.proxad.net) has joined #ceph
[18:53] * dyasny (~dyasny@104.158.34.71) Quit (Ping timeout: 480 seconds)
[18:53] <TheSov> holy moly, it's more complex than i thought
[18:53] <TheSov> kefu|afk, thanks
[18:57] * tganguly (~tganguly@122.172.53.41) has joined #ceph
[18:57] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Read error: Connection reset by peer)
[18:58] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[18:58] * georgem1 (~Adium@fwnat.oicr.on.ca) has joined #ceph
[18:58] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Read error: Connection reset by peer)
[19:03] * treenerd (~treenerd@cpe90-146-100-181.liwest.at) has joined #ceph
[19:04] * treenerd (~treenerd@cpe90-146-100-181.liwest.at) Quit ()
[19:06] * thomnico (~thomnico@bob13-2-88-172-8-244.fbx.proxad.net) Quit (Quit: Ex-Chat)
[19:08] <sean_> is there anywhere where I can find out what "pipe(0x1b38d180 sd=113 :54207 s=2 pgs=777 cs=23 l=0 c=0x1a0e7b80)." of the osd log means without trudging through source code?
[19:11] * segutier (~segutier@c-24-6-218-139.hsd1.ca.comcast.net) has joined #ceph
[19:12] <TheSov> sd probably means storage daemon
[19:12] <TheSov> s is size
[19:12] <TheSov> so you have at least 113 osds
[19:12] <TheSov> and your size is 2
[19:13] <TheSov> cs thats a new one on me
[19:13] <TheSov> and l aswell
[19:13] * vbellur (~vijay@64.251.112.55) has joined #ceph
[19:14] * nardial (~ls@dslb-178-011-179-229.178.011.pools.vodafone-ip.de) Quit (Quit: Leaving)
[19:14] <visbits> TheSov it's an easy upgrade, that just details all the versions
[19:14] * Concubidated (~Adium@23.91.33.7) Quit (Quit: Leaving.)
[19:16] <TheSov> visbits, thanks man i appreciate
[19:16] <sean_> TheSov: so this OSD is talking to at least 113 other osds?
[19:17] <TheSov> sean_, it looks like, code, storage daemon, block number, size, pg, ??, ??, ??
[19:18] * jclm (~jclm@50-206-204-8-static.hfc.comcastbusiness.net) has joined #ceph
[19:18] * oro (~oro@84-72-180-62.dclient.hispeed.ch) has joined #ceph
[19:18] <sean_> also the upgrade is really easy. I just (generally) do pip install --upgrade ceph-deploy; for host in hosts; do ceph-deploy install ${host} & done; then reboot them in the correct order (monitors, osds, mds, then gateways)
[19:20] <sean_> and by reboot i mean restart the services
[19:20] <sean_> I also throw an apt-get update and reboot the hardware nodes too to grab the new kernel etc
[19:20] <sean_> or yum update.. or whatever your pkg mgr is.
[19:21] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Ping timeout: 480 seconds)
[19:21] <sean_> only done the upgrade twice though. both times were fine but I did wait and check the mailing list + change log both times
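A rough outline of the procedure sean_ describes (hostnames and release name are placeholders; read the upgrade docs and release notes first, as kefu|afk suggests above):

    pip install --upgrade ceph-deploy
    for host in mon1 mon2 mon3 osd1 osd2; do
        ceph-deploy install --release hammer "$host" &
    done
    wait
    # then restart daemons in order -- monitors, OSDs, MDS, gateways --
    # e.g. "sudo /etc/init.d/ceph restart mon" on sysvinit hosts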
[19:21] <kefu|afk> sean_: sd is the socket descriptor. s is the socket state, etc, etc.
[19:22] * kawa2014 (~kawa@89.184.114.246) Quit (Quit: Leaving)
[19:24] <kefu|afk> sean_, and you should read the source if you really want to dig some info from the the stuff in "pipe(...)" , i think.
[19:26] <sean_> ah thanks kafu!!!
[19:26] <kefu|afk> sean_: yw =)
[19:30] <sean_> I was looking in pipe right now and saw sd being defined as an int but was hoping to see some include or something to help me figure out what it was :p
[19:31] <kefu|afk> seans_: it's a file descriptor, see sys/socket.h
[19:31] <kefu|afk> sean_ ^
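For what it's worth, reading the SimpleMessenger Pipe source suggests the fields in that log line break down roughly as follows (treat this as an assumption drawn from the code, not a definitive reference):

    pipe(0x1b38d180      -> address of the Pipe object
         sd=113          -> socket file descriptor
         :54207          -> peer port
         s=2             -> pipe state
         pgs=777         -> peer global seq
         cs=23           -> connect seq
         l=0             -> lossy policy flag
         c=0x1a0e7b80)   -> address of the associated Connection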
[19:32] * kefu|afk is now known as kefu
[19:33] <kefu> sean_ are you looking at the log just for fun, or for diagnosing some issue in ur cluster?
[19:34] <kefu> sean_: just out of curiosity. coz you might want to get yourself familiar with socket/network programming before diving into pipe.
[19:36] * kefu about to call it a night.
[19:39] <sean_> just for fun kefu
[19:39] <sean_> well.. diagnosing but building a logstash filter for osd
[19:39] <sean_> so just for fun right now. We have some spikey throughput and I want to know why
[19:39] <sean_> as in when we try to transfer a file from or to ceph the throughput is spikey
[19:40] <kefu> sean_: i see.
[19:40] * marrusl (~mark@cpe-67-247-9-253.nyc.res.rr.com) Quit (Quit: bye!)
[19:40] <kefu> to profile a distributed system is challenging
[19:40] * jameskassemi (~james@128.177.113.102) Quit (Quit: jameskassemi)
[19:41] * marrusl (~mark@cpe-67-247-9-253.nyc.res.rr.com) has joined #ceph
[19:41] <qhartman> so based on a suggestion I got here when discussing reducing IO wait on ceph OSD nodes, I played some with IO schedulers and their settings over the last day
[19:42] * kefu is now known as kefu|afk
[19:42] <qhartman> I found that the noop io scheduler seemed to make the CPU usage and wait both "spikier" than the deadline scheduler I was using
[19:43] * zack_dol_ (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) has joined #ceph
[19:43] <qhartman> the average wait over a large time period might be marginally lower than when using deadline, but the spikes are much higher. All in all I think deadline is delivering more consistent performance.
[19:43] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) Quit (Read error: Connection reset by peer)
[19:43] * jameskassemi (~james@128.177.113.102) has joined #ceph
[19:43] <qhartman> I also played with nr_requests
[19:43] * sleinen (~Adium@macsl.switch.ch) Quit (Ping timeout: 480 seconds)
[19:44] <qhartman> increasing from the default of 128 to 1024 seems to have dropped iowait by about 10% on each of my OSD nodes
[19:44] <qhartman> I haven't tried going beyond that
[19:45] <qhartman> I also haven't tried lowering the value, but I don't see how that might improve things.
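The settings qhartman is experimenting with live in sysfs; a minimal sketch (sdb is a placeholder device, and these changes do not persist across reboots unless scripted):

    cat /sys/block/sdb/queue/scheduler              # shows e.g. "noop [deadline] cfq"
    echo deadline > /sys/block/sdb/queue/scheduler
    echo 1024 > /sys/block/sdb/queue/nr_requests    # default is 128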
[19:45] * sage (~quassel@2607:f298:6050:709d:f030:67aa:1c6f:a15) Quit (Ping timeout: 480 seconds)
[19:46] * t0rn (~ssullivan@2607:fad0:32:a02:56ee:75ff:fe48:3bd3) has joined #ceph
[19:46] * t0rn (~ssullivan@2607:fad0:32:a02:56ee:75ff:fe48:3bd3) has left #ceph
[19:47] <cholcombe> are you running a raid controller in jbod mode ?
[19:48] <qhartman> jbod
[19:48] * overclk (~overclk@61.3.109.105) has joined #ceph
[19:48] <qhartman> 36 OSDs spread across 12 nodes
[19:48] * overclk (~overclk@61.3.109.105) Quit ()
[19:49] <cholcombe> ssds?
[19:49] <cholcombe> oh sorry i read that wrong
[19:49] <qhartman> HDDs, SSDs for journals and OS
[19:49] * daviddcc (~dcasier@62.212.108.94) Quit (Ping timeout: 480 seconds)
[19:49] * bitserker (~toni@77.231.154.146) has joined #ceph
[19:49] <cholcombe> ok
[19:50] <cholcombe> how bad is your i/o wait?
[19:50] * sage (~quassel@2607:f298:6050:709d:c4ef:e7a1:7751:6071) has joined #ceph
[19:50] * ChanServ sets mode +o sage
[19:50] <qhartman> about 10% on average
[19:50] <qhartman> with an overall CPU usage of about 20%
[19:50] <cholcombe> try bumping your nr_requests to 100,000
[19:51] <qhartman> yeah, I'm tempted to go that high, but I'm a little concerned about the memory usage implications, and my queues aren't getting really deep, so I'm not sure how it would help
[19:51] <cholcombe> i'm reading yoshinori's blog post about nr_requests
[19:52] <qhartman> yeah, I read that too
[19:52] <cholcombe> :)
[19:52] <cholcombe> i used to work with him
[19:52] <qhartman> seems like a smart guy
[19:52] <cholcombe> yeah he was quiet but really got stuff done
[19:53] <qhartman> It's always the quiet ones
[19:53] * MentalRay (~MRay@69.156.131.177) has joined #ceph
[19:53] <cholcombe> how much ram do these osd boxes have?
[19:53] <qhartman> I'm actually looking at messing with some CPU affinity to see if it might make the OSD processes more efficient overall and result in lower wait in the VMs, even if it doesn't affect wait here
[19:54] <qhartman> They have 72GB, but they are also co-hosting VMs
[19:54] <qhartman> I've shaved off about 3.5GB for OSDs and system overhead
[19:54] <cholcombe> i see
[19:55] <cholcombe> cpu affinity might help by reducing the copying of the thread stack between cpu cores
[19:55] <qhartman> The OSDs are only using about 550MB each
[19:55] <qhartman> so it seems they have plenty
[19:55] <qhartman> right
[19:55] <cholcombe> yeah that's steady state though
[19:55] <cholcombe> in a recovery state they'll use a ton more
[19:55] <qhartman> yeah, not a lot, interestingly
[19:55] <qhartman> they spike up to about 900MB, but I haven't seen one go past that
[19:55] * Concubidated (~Adium@192.170.161.35) has joined #ceph
[19:55] <cholcombe> i've seen them go into several GB in recovery
[19:56] * jclm1 (~jclm@172.56.35.158) has joined #ceph
[19:56] <cholcombe> how's your performance overall with the ssd journals?
[19:56] <qhartman> yeah, I keep hearing that, but I haven't seen it happen myself
[19:56] <qhartman> pretty good
[19:56] <qhartman> I think I've just got too much contention for the number of spindles
[19:56] <qhartman> there are 70 VMs running off this cluster
[19:57] <qhartman> so with 3 replicas of data
[19:57] <qhartman> that's not a lot of spindle per vm
[19:57] <cholcombe> right
[19:57] <cholcombe> that is pretty tight
[19:57] * kefu|afk (~kefu@114.92.100.239) Quit (Max SendQ exceeded)
[19:57] <qhartman> I'm looking at setting things up so that "disposable" vms are on a pool where they only get two replicas
[19:58] <qhartman> but I haven't fully digested what that will take
[19:58] <qhartman> more immediately I have 6 more OSDs I can bring on, which will also help, but I need to free up some power budget before I can do that
[19:58] * kefu (~kefu@114.92.100.239) has joined #ceph
[19:59] <cholcombe> right
[19:59] <cholcombe> that'll add some iops and maybe lower your wait times
[19:59] <qhartman> in my benchmarking before I put a bunch of VMs on here I was able to pretty well saturate all of my OSDs with a single VM
[19:59] <cholcombe> wow
[20:00] <qhartman> yeah, the upgrade to 10Gb networking really paid off
[20:00] <cholcombe> yup i'm sure
[20:00] <qhartman> moved the bottleneck to the SATA bus
[20:00] <cholcombe> right
[20:00] <qhartman> with as cheap as SSDs are getting I'm tempted to suggest the next iteration of this cluster should be straight SSD
[20:00] * logan (~a@63.143.49.103) has joined #ceph
[20:00] <cholcombe> there's talk of ssd's hitting price parity next year
[20:01] * tganguly (~tganguly@122.172.53.41) Quit (Ping timeout: 480 seconds)
[20:01] <qhartman> yeah, even if they hit $.50/GB it would make sense for us
[20:01] <qhartman> right now they seems to be hovering around $.80
[20:02] <qhartman> It's interesting to be doing tuning on such a heavily loaded cluster, it makes positive changes really visible.
[20:02] * jclm (~jclm@50-206-204-8-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[20:02] <qhartman> (also negative ones, of course)
[20:02] <cholcombe> sandisk might release their full ssd tuning parameters at some point and we'll see what they did
[20:02] <cholcombe> they claim a huge amount of iops from their all flash cluster
[20:03] * zacbri (~zacbri@glo44-5-88-164-16-77.fbx.proxad.net) has joined #ceph
[20:03] <qhartman> I can imagine
[20:03] <qhartman> Even without tuning I bet it's ridiculous
[20:03] * kefu (~kefu@114.92.100.239) Quit (Max SendQ exceeded)
[20:04] <nils_> all that firmware magic kinda makes me nervous
[20:04] <cholcombe> yeah they're claiming 500K iops on a 100% read workload with 3 osd hosts
[20:04] <nils_> especially since many SSD don't even guarantee durability
[20:05] * kefu (~kefu@114.92.100.239) has joined #ceph
[20:05] <qhartman> sure, but the real-world durability seems to be much higher than what the manufacturers are claiming
[20:05] * JoeJulian (~JoeJulian@shared.gaealink.net) has joined #ceph
[20:05] <qhartman> did you see that big endurance test that Tech Report did?
[20:05] <cholcombe> no
[20:05] * kefu (~kefu@114.92.100.239) Quit ()
[20:05] <qhartman> They did it over a couple years, let me dig up the final article....
[20:06] <qhartman> https://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead
[20:06] * cok (~chk@test.roskilde-festival.dk) has joined #ceph
[20:06] <cholcombe> i think sandisk was using a 40Gb backend though for ceph
[20:06] * linjan (~linjan@195.110.41.9) Quit (Ping timeout: 480 seconds)
[20:06] <qhartman> yeah, that would make sense
[20:07] <qhartman> If they are going for a land speed record, no point in limiting yourself there
[20:07] * oro (~oro@84-72-180-62.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[20:09] <cholcombe> yup
[20:10] * vbellur (~vijay@64.251.112.55) Quit (Ping timeout: 480 seconds)
[20:10] <nils_> might be very interesting to do something along those lines, Infiniband FDR, NVRAM journals and SSD as OSD
[20:10] * ChrisNBlum (~ChrisNBlu@dhcp-ip-217.dorf.rwth-aachen.de) has joined #ceph
[20:10] * nils_ (~nils@doomstreet.collins.kg) Quit (Quit: This computer has gone to sleep)
[20:11] <qhartman> wow, it's early to say for certain, but setting CPU affinity on the OSD may have made a big difference
[20:14] <qhartman> nope, didn't make that much of a difference overall, but it did reduce spikiness more
[20:14] <qhartman> maybe another 5-10% better though
[20:15] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[20:15] * georgem1 (~Adium@fwnat.oicr.on.ca) Quit (Read error: Connection reset by peer)
[20:16] * georgem1 (~Adium@fwnat.oicr.on.ca) has joined #ceph
[20:16] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Read error: Connection reset by peer)
[20:17] <cholcombe> nice!
[20:17] * overclk (~vshankar@61.3.109.105) has joined #ceph
[20:17] <qhartman> the initial graph just happened to have a valley in it that made it look like it reduced overall cpu load by like 50%
[20:17] <qhartman> stupid insufficient data, getting my hopes up
[20:17] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[20:18] * cok (~chk@test.roskilde-festival.dk) has left #ceph
[20:18] * ircolle (~ircolle@64.251.112.55) has joined #ceph
[20:18] <ira> qhartman: dual socket machines?
[20:19] <qhartman> ira, yes, two sockets, four cores per socket
[20:19] <ira> Memory affinity for the RDMA card on the right CPUs? :)
[20:20] <qhartman> I haven't messed with the memory affinity at all
[20:20] * vbellur (~vijay@64.251.112.55) has joined #ceph
[20:21] <qhartman> since I can only do that with numactl, and that seems to not work on running processes
[20:21] <ira> Not sure... but I've heard the impact is quite real.
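For CPU (rather than memory) affinity, taskset does work on an already-running process; a hedged sketch, with the core list and OSD id purely illustrative:

    pgrep -f 'ceph-osd -i 0'       # find the pid of osd.0
    taskset -a -pc 0-3 <pid>       # pin all of its threads to cores 0-3 (<pid> from above)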
[20:21] * primechuck (~primechuc@72.21.225.66) has joined #ceph
[20:21] <qhartman> I would imagine
[20:22] <qhartman> I'm going to let this change ride for a while and see if it actually bears fruit over a longer time period
[20:22] <ira> Ok. :)
[20:22] <ira> (Actually, I know it makes a difference, though I don't know if you are pushing hard enough to hit it. Microsoft had to get that right in their SMB3 RDMA stack.)
[20:23] <qhartman> based on early numbers though, all my tweaking seems to have reduced my iowait by about 15% total
[20:23] <qhartman> yeah, I doubt that I am
[20:23] <ira> Sounds like a good thing :)
[20:23] <qhartman> I really think I just need to throw more / faster spindles at this problem
[20:23] <ira> (the reduction)
[20:24] <qhartman> but I'm taking advantage of having a real problem to see what tweaks are actually effective
[20:24] <joel> Is the advice for Debian Jessie still to build debs from source?
[20:25] * danieagle (~Daniel@187.35.206.135) has joined #ceph
[20:26] <ira> w/o enough spindles... life is sad. ;)
[20:28] <qhartman> yeah
[20:28] * overclk (~vshankar@61.3.109.105) Quit (Quit: BitchX: it does a body good)
[20:28] * vbellur (~vijay@64.251.112.55) Quit (Ping timeout: 480 seconds)
[20:30] * n8 (18fb5914@107.161.19.53) has joined #ceph
[20:31] * n8 is now known as Guest2769
[20:32] <Guest2769> hi all. im trying to figure out how large of an SSD for a journal is necessary. If the speed of the SSD is 480MB/s, 20 seconds of writes would be about 10GB of data max, suggesting that a 120GB SSD should be more than enough for a journal disk. Is this sound reasoning?
[20:32] <qhartman> the speed of your OSD drives is what you need to pay attention to for journal sizing
[20:33] <qhartman> IIRC
[20:33] <Guest2769> say I map 4 OSDs to a journal. that's 150MB/sec per spinning disk
[20:33] <qhartman> fwiw, I'm using 10GB journals for my OSDs and am not running into a trouble that I'm aware of
[20:33] <Guest2769> how big are your OSDs?
[20:33] <qhartman> 1TB
[20:34] <qhartman> each has a 10GB partition on a single SSD for journals
[20:34] <qhartman> 3 OSDs per machine
[20:34] <ira> 259239
[20:34] <qhartman> (using 1u 4-bay servers)
[20:35] <Guest2769> gotcha. so a 1TB SSD for journals wouldn't be advised. ;)
[20:35] <qhartman> heh
[20:35] <ira> Guest2769: Depends... it'll help with extra flash for wear...
[20:35] <ira> ;)
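Guest2769's arithmetic lines up with the usual sizing rule of thumb, journal size ≈ 2 × (device throughput × filestore max sync interval); a rough worked example using the numbers from this conversation (5 s is the default sync interval, I believe):

    480 MB/s * 20 s        ≈ 9.6 GB  -> a ~10 GB journal partition per SSD is comfortably enough
    2 * 150 MB/s * 5 s     ≈ 1.5 GB  per spinning OSD with default settings
    4 OSDs on one SSD      ≈ 6 GB    of journal space total, so a 120 GB SSD leaves lots of spare flash for wear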
[20:38] * jclm (~jclm@172.56.35.158) has joined #ceph
[20:39] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Quit: Ex-Chat)
[20:42] * dgurtner (~dgurtner@178.197.233.252) has joined #ceph
[20:43] <qhartman> yeah, I'm frequently getting disk queues over 400, I think more spindles is the only answer, other than reducing replicas so I can cut down on the total amount of IO pretty significantly
[20:43] <qhartman> can I reduce the replica size of a pool on the fly?
[20:43] <qhartman> I know I can increase it
[20:44] * ngoswami (~ngoswami@121.244.87.116) Quit (Quit: Leaving)
[20:44] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[20:44] <m0zes> yes. just adjust min_size and size for the pool.
[20:45] <qhartman> cool
[20:45] * jclm1 (~jclm@172.56.35.158) Quit (Ping timeout: 480 seconds)
[20:46] <qhartman> It will be easier to move the few "important" VMs to a 3-replica pool (once I get all the openstack shenanigans figured out) and then reduce the replicas in this pool
[20:46] <qhartman> rather than move the 60+ somewhat disposable VMs to a 2-replica pool
[20:46] * daviddcc (~dcasier@tou77-h01-128-78-105-81.dsl.sta.abo.bbox.fr) has joined #ceph
[20:47] <qhartman> oh, what about pg's and pgp's? If I reduce the replica on this pool it would need way fewer pg's as well....
[20:49] <m0zes> pg/pgp cannot be reduced without recreating the pools.
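A minimal sketch of the replica change being discussed, with the pool name purely illustrative:

    ceph osd pool set volumes size 2
    ceph osd pool set volumes min_size 1
    ceph osd pool get volumes pg_num    # stays as-is: pg_num can be raised later, never lowered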
[20:49] * logan (~a@63.143.49.103) Quit (Ping timeout: 480 seconds)
[20:49] <qhartman> ok, that must have been what I was remembering
[20:49] <qhartman> thanks for the confirmation
[20:50] * dgurtner (~dgurtner@178.197.233.252) Quit (Ping timeout: 480 seconds)
[20:56] <Guest2769> it looks like 1GB of ram is recommended for each Monitor daemon. With three monitor hosts, how many daemons would be running, only 3? or does it spawn additional daemons as needed?
[21:00] * jameskassemi (~james@128.177.113.102) Quit (Quit: jameskassemi)
[21:00] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) has joined #ceph
[21:01] <ganders> hi cephers, quick question... what would be the 'best' HW config available out there for a OSD server?
[21:04] <doppelgrau> ganders: depends on your use-case
[21:07] * ircolle (~ircolle@64.251.112.55) Quit (Ping timeout: 480 seconds)
[21:07] * dupont-y (~dupont-y@familledupont.org) has joined #ceph
[21:10] <ganders> doppelgrau: it's going to be used with very high IO
[21:10] <doppelgrau> ganders: meaning high bandwidth or high number of IO/second
[21:11] <doppelgrau> ganders: and more read, more write or mixed
[21:11] <ganders> with non-relational databases, cassandra, mongodb
[21:11] <ganders> and some hadoop clusters
[21:11] <ganders> both
[21:13] <ganders> doppelgrau: and its going to be part of an IB FDR network
[21:13] <ganders> i need 8 osd servers, and im trying to find the best HW conf available out there
[21:13] <m0zes> then the question becomes "how much money do you have?"
[21:13] * primechuck (~primechuc@72.21.225.66) Quit (Ping timeout: 480 seconds)
[21:14] <ganders> m0zes: in this case, money is not an issue
[21:14] <ganders> m0zes: is not my money of course heheh lol
[21:15] <doppelgrau> ganders: since HDDs/platters make only about 150 IO/s (with SSD for journal) you'll want an all-SSD cluster
[21:16] <ganders> doppelgrau: got you, so i was thinking of 10x Intel DC S3500 800GB disks per OSD server
[21:16] <ganders> doppelgrau: so journal and osd resides on the same disk
[21:17] <doppelgrau> ganders: and fast CPUs; someone posted a benchmark a few days ago, 3.x GHz with fewer cores performed better than more cores at lower speed in an all-SSD cluster
[21:17] * alram_ (~alram@206.169.83.146) has joined #ceph
[21:18] * ircolle (~ircolle@mobile-107-107-59-213.mycingular.net) has joined #ceph
[21:19] * snakamoto (~Adium@192.16.26.2) has joined #ceph
[21:19] <doppelgrau> ganders: and you'll need a very fast network connection, but I guess that's no news :)
[21:20] <ganders> doppelgrau: so fewer cores but much more speed, something like an E5-4627 v3 @ 2.6GHz (10C)
[21:20] <ganders> doppelgrau: yes we had an infiniband fdr 56Gb/s network
[21:21] * ircolle (~ircolle@mobile-107-107-59-213.mycingular.net) has left #ceph
[21:22] <ganders> doppelgrau: i was thinking of something like this: http://pastebin.com/raw.php?i=kyn8vqzK
[21:24] * alram (~alram@206.169.83.146) Quit (Ping timeout: 480 seconds)
[21:24] * Nacer (~Nacer@203-206-190-109.dsl.ovh.fr) has joined #ceph
[21:26] <ganders> doppelgrau: if i get all SSD, does it make any sense to also add a cache tier?
[21:27] <TheSov> ganders, you can get an nvme tier
[21:28] * ganders (~root@190.2.42.21) Quit (Quit: WeeChat 0.4.2)
[21:28] * ganders (~root@190.2.42.21) has joined #ceph
[21:28] <ganders> do you have some info on that?
[21:29] <doppelgrau> ganders: a cache tier only makes sense in my option if you have an SSD cache in front of a disk pool
[21:29] <ganders> doppelgrau: exactly, so there's no sense in the cache tier, but how about the other option?
[21:29] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) has joined #ceph
[21:31] * ingslovak (~peto@office.websupport.sk) Quit (Quit: Leaving.)
[21:32] <doppelgrau> ganders: typing error, opinion :)
[21:32] * sleinen1 (~Adium@2001:620:0:82::104) has joined #ceph
[21:36] * snakamoto (~Adium@192.16.26.2) Quit (Quit: Leaving.)
[21:36] <doppelgrau> ganders: http://www.spinics.net/lists/ceph-users/msg19305.html
[21:37] <doppelgrau> ganders: so for your setup I would buy faster CPUs and save some RAM (64GB should still be more than enough)
[21:38] <ganders> doppelgrau: thanks for that
[21:39] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[21:43] * i_m (~ivan.miro@deibp9eh1--blueice4n1.emea.ibm.com) Quit (Ping timeout: 480 seconds)
[21:44] * jclm (~jclm@172.56.35.158) Quit (Quit: Leaving.)
[21:44] * snakamoto (~Adium@192.16.26.2) has joined #ceph
[21:44] * jclm (~jclm@172.56.35.158) has joined #ceph
[21:44] * dopesong_ (~dopesong@lb1.mailer.data.lt) Quit (Remote host closed the connection)
[21:48] <MACscr> right, but a cache tier doesn't even seem to be recommended these days, right? i haven't seen anything lately, but i keep hearing that performance isn't what they expected
[21:52] * jclm (~jclm@172.56.35.158) Quit (Ping timeout: 480 seconds)
[21:57] * linjan (~linjan@213.8.240.146) has joined #ceph
[21:59] * bitserker (~toni@77.231.154.146) Quit (Quit: Leaving.)
[22:06] * snakamoto (~Adium@192.16.26.2) Quit (Quit: Leaving.)
[22:07] * fridim_ (~fridim@56-198-190-109.dsl.ovh.fr) Quit (Ping timeout: 480 seconds)
[22:10] * ganders (~root@190.2.42.21) Quit (Quit: WeeChat 0.4.2)
[22:11] * oro (~oro@84-72-180-62.dclient.hispeed.ch) has joined #ceph
[22:18] * MentalRay (~MRay@69.156.131.177) Quit (Quit: This computer has gone to sleep)
[22:19] * MentalRay (~MRay@69.156.131.177) has joined #ceph
[22:19] * jwilkins (~jwilkins@2601:644:4100:bfef:ea2a:eaff:fe08:3f1d) Quit (Quit: Leaving)
[22:19] * oro (~oro@84-72-180-62.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[22:21] <zenpac> in ceph-deploy: (hammer) http://paste.debian.net/260498/
[22:22] <zenpac> I don't know how I missed generating the key.. any way around this?
[22:23] * jwilkins (~jwilkins@c-67-180-123-48.hsd1.ca.comcast.net) has joined #ceph
[22:25] * jtw (~jwilkins@2600:1010:b000:3840:ea2a:eaff:fe08:3f1d) has joined #ceph
[22:25] * MentalRay (~MRay@69.156.131.177) Quit (Quit: This computer has gone to sleep)
[22:27] * ngoswami (~ngoswami@14.97.235.203) has joined #ceph
[22:28] * snakamoto (~Adium@192.16.26.2) has joined #ceph
[22:30] <off_rhoden> zenpac: was the whole cluster created using ceph-deploy?
[22:31] <off_rhoden> zenpac: the keys are created when the monitor starts up for the first time, by a utility called ceph-create-keys
[22:31] <off_rhoden> usually the monitor won't start up unless ceph-create-keys executes successfully first
[22:31] <zenpac> yes. all using the tool
[22:32] <zenpac> Can I fix it?
[22:32] * jwilkins (~jwilkins@c-67-180-123-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[22:32] <off_rhoden> zenpac: probably. :) what version of ceph-deploy, and which OS?
[22:32] <off_rhoden> ceph-deploy --version
[22:32] <zenpac> hammer, debian 7
[22:33] <zenpac> 1.5.25 for ceph-dep
[22:33] * ingslovak (~peto@188-167-237-207.dynamic.chello.sk) has joined #ceph
[22:33] <off_rhoden> thanks.
[22:33] <zenpac> I installed the apt-repository for it.. I'm ok with going to Giant if that helps.
[22:34] <off_rhoden> if you go onto the monitor node (ceph-01) what do you see running, ceph-wise? e.g. "ps -ef | grep ceph"
[22:34] <zenpac> ceph-mon is running..
[22:34] <zenpac> I restarted ceph mon, and it created an empty key in /etc/ceph
[22:35] <off_rhoden> /etc/ceph/ceph.client.admin.keyring is empty?
[22:36] <zenpac> ceph-create-keys --cluster ceph -i ceph-01 .. looks like it's stuck
[22:36] <zenpac> yes, its non-existent.
[22:36] <zenpac> I have a: ceph.client.admin.keyring.8425.tmp in /etc/ceph
[22:36] * linjan (~linjan@213.8.240.146) Quit (Ping timeout: 480 seconds)
[22:36] <off_rhoden> yeah, that being stuck is indicative of *something*. not sure what just yet. :)
[22:37] <off_rhoden> weird
[22:37] <zenpac> off_rhoden: I'm using the same admin node as a mon.. is that a problem?
[22:37] <off_rhoden> it should move that to the right place
[22:37] * rubenk (~ruben@0001f23d.user.oftc.net) Quit (Quit: rubenk)
[22:37] <zenpac> my first admin node is same as first mon node.
[22:37] <off_rhoden> zenpac: shouldn't be. I do that all the time in my testing
[22:38] <zenpac> I'm sure I have something really broken. But I followed quick start closely.
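A few hedged checks for a hung ceph-create-keys like the one above, assuming the mon id is ceph-01; the usual cause is a monitor that is up but not yet in quorum:

    ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-01.asok mon_status   # check "state" and "quorum"
    ceph-deploy mon create-initial    # re-runs key creation once quorum forms
    ceph-deploy gatherkeys ceph-01    # pull the keyrings back to the admin node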
[22:39] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[22:39] <off_rhoden> zenpac: can you fpaste the contents of your ceph.conf, and perhaps the output of "ip a" on the mon node?
[22:39] <zenpac> sure
[22:40] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[22:42] <zenpac> That's the ceph.conf from the config folder, not /etc/ceph
[22:45] * DV__ (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[22:47] * TheSov (~TheSov@cip-248.trustwave.com) Quit (Ping timeout: 480 seconds)
[22:47] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[22:49] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[22:49] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[22:49] * mykola (~Mikolaj@91.225.202.121) Quit (Quit: away)
[22:49] * wk (~wk@a85-138-73-251.cpe.netcabo.pt) has joined #ceph
[22:53] * daviddcc (~dcasier@tou77-h01-128-78-105-81.dsl.sta.abo.bbox.fr) Quit (Ping timeout: 480 seconds)
[22:54] * badone_ (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) has joined #ceph
[22:55] * freman (~freman@griffin.seedboxes.cc) Quit (Quit: Lost terminal)
[22:55] * alfredodeza (~alfredode@198.206.133.89) has left #ceph
[22:56] * jtw (~jwilkins@2600:1010:b000:3840:ea2a:eaff:fe08:3f1d) Quit (Quit: Leaving)
[23:00] * wk (~wk@a85-138-73-251.cpe.netcabo.pt) Quit (Quit: Leaving)
[23:01] * rlrevell (~leer@vbo1.inmotionhosting.com) Quit (Ping timeout: 480 seconds)
[23:02] * srk (~srk@32.97.110.57) has joined #ceph
[23:03] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[23:03] * georgem1 (~Adium@fwnat.oicr.on.ca) Quit (Read error: Connection reset by peer)
[23:05] * alram_ (~alram@206.169.83.146) Quit (Quit: leaving)
[23:08] * nsoffer (~nsoffer@bzq-79-180-80-9.red.bezeqint.net) has joined #ceph
[23:08] <srk> hi, anyone come across problems with "cinder upload-to-image .." with rbd as backend?
[23:10] <srk> I'm trying to convert an 80G cinder volume into an image. The process starts to create an entry in ceph images pool, but cinder errors out after transferring 65520M
[23:11] * tupper (~tcole@2001:420:2280:1272:8900:f9b8:3b49:567e) Quit (Ping timeout: 480 seconds)
[23:16] * badone_ is now known as badone
[23:18] * dneary (~dneary@64.251.112.55) has joined #ceph
[23:28] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)
[23:29] * wschulze (~wschulze@cpe-69-206-240-164.nyc.res.rr.com) Quit (Quit: Leaving.)
[23:29] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Quit: Leaving.)
[23:32] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) has joined #ceph
[23:33] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) Quit (Remote host closed the connection)
[23:34] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) has joined #ceph
[23:36] * TheSov (~TheSov@cip-248.trustwave.com) has joined #ceph
[23:37] <TheSov> do any of you guys, when a disk goes bad, just delete the osd from the cluster and recreate it some time far in the future when you come around to it?
[23:37] <TheSov> i did that the other day, i dunno, i can see this turning into a bad habit
[23:45] <monsted> some might argue that it isn't worth dealing with a host until it's lost a few drives, depending on how easy they are to service :)
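When the failed disk does get dealt with, the usual removal sequence looks roughly like this (osd id 12 is just an example):

    ceph osd out 12
    # wait for rebalancing, then stop the daemon on its host:
    sudo /etc/init.d/ceph stop osd.12     # or "stop ceph-osd id=12" on Upstart systems
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12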
[23:45] * snakamoto (~Adium@192.16.26.2) Quit (Quit: Leaving.)
[23:47] * nsoffer (~nsoffer@bzq-79-180-80-9.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[23:52] * alram (~alram@cpe-172-250-2-46.socal.res.rr.com) has joined #ceph
[23:53] <doppelgrau> monsted: I think it also depends on the size of the cluster. if 1 drive out of 1k drives fails you can easily wait, if 1 drive out of 10 fails ...
[23:53] <monsted> sure
[23:59] * dupont-y (~dupont-y@familledupont.org) Quit (Quit: Ex-Chat)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.