#ceph IRC Log

IRC Log for 2014-08-12

Timestamps are in GMT/BST.

[0:01] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[0:06] * RandomUser (~oftc-webi@70-91-207-249-BusName-SFBA.hfc.comcastbusiness.net) Quit (Quit: Page closed)
[0:10] * Tamil1 (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[0:10] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[0:11] * sjustlaptop (~sam@24-205-54-233.dhcp.gldl.ca.charter.com) has joined #ceph
[0:11] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) Quit (Quit: sync && halt)
[0:16] * wojwang (~wojwang@135.245.48.14) has joined #ceph
[0:17] * jharley (~jharley@192-171-36-233.cpe.pppoe.ca) Quit (Quit: jharley)
[0:19] * ikrstic (~ikrstic@93-87-118-93.dynamic.isp.telekom.rs) Quit (Quit: Konversation terminated!)
[0:19] * wojwang_ (~oftc-webi@proxy.lucent.com) has joined #ceph
[0:20] <wojwang_> Hi, I am getting "[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: rpm -Uvh --replacepkgs epel-release-6*.rpm" while installing ceph on a node
[0:20] * wido (~wido@2a00:f10:121:100:4a5:76ff:fe00:199) Quit (Ping timeout: 480 seconds)
[0:20] * wojwang (~wojwang@135.245.48.14) Quit ()
[0:21] * wojwang_ (~oftc-webi@proxy.lucent.com) Quit ()
[0:22] * wido (~wido@2a00:f10:121:100:4a5:76ff:fe00:199) has joined #ceph
[0:24] * The_Bishop (~bishop@2001:470:50b6:0:c1ba:4d17:cefb:fa45) Quit (Ping timeout: 480 seconds)
[0:36] * yuriw1 is now known as yuriw
[0:39] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) has joined #ceph
[0:40] * rendar_ (~I@host75-182-dynamic.37-79-r.retail.telecomitalia.it) Quit ()
[0:43] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) Quit (Quit: ...)
[0:44] * Concubidated (~Adium@66.87.130.194) has joined #ceph
[0:46] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[0:47] * zerick (~eocrospom@190.187.21.53) has joined #ceph
[0:58] * danieagle (~Daniel@179.184.165.184.static.gvt.net.br) Quit (Quit: Thanks for Everything! :-) See you later! :-))
[0:59] * rturk is now known as rturk|afk
[1:01] <swat30> Hi all, we're now seeing this when trying to start our OSDs
[1:01] <swat30> http://pastebin.com/ns0McteE
[1:01] <swat30> Seems to be something time related?
[1:05] * baylight (~tbayly@74-220-196-40.unifiedlayer.com) has left #ceph
[1:14] * The_Bishop (~bishop@e180174176.adsl.alicedsl.de) has joined #ceph
[1:19] * rturk|afk is now known as rturk
[1:22] * joef (~Adium@2620:79:0:131:a9a1:90ce:eff6:b2a6) Quit (Quit: Leaving.)
[1:24] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) Quit (Remote host closed the connection)
[1:26] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[1:27] <xarses> swat30: what is the result of 'ceph -s'
[1:30] <swat30> xarses, http://pastebin.com/DjgYQ4yi
[1:31] <stupidnic> 0 OSD up
[1:31] <swat30> it was going through recovery, went a couple percentage pts and then failed
[1:31] * alram_ (~alram@38.122.20.226) has joined #ceph
[1:31] <swat30> yea
[1:31] <swat30> they won't start
[1:31] <stupidnic> where is the other OSD?
[1:32] <xarses> swat30: time is good, it would complain here if the time skew was larger than the configured maximum, which defaults to 50ms
[1:32] <swat30> xarses, ok cool, tks. stupidnic we have another that we're trying to add in, but haven't been able to yet
[1:32] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) has joined #ceph
[1:32] <lurbs> xarses: For the monitors, yeah. Not sure it complains about the OSD hosts.
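
For reference, a quick way to double-check clock sync from a monitor host, assuming ntpd is in use (a general sketch, not commands from this conversation):

    ceph health detail | grep -i skew   # monitors report "clock skew detected" warnings here
    ntpq -p                             # confirm each host is actually syncing against a time source
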
[1:33] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[1:33] <swat30> it's really odd that all of the OSDs just died
[1:33] <swat30> they were working fine, rbd was responding
[1:33] <swat30> then poof
[1:33] * Hell_Fire (~hellfire@123-243-155-184.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[1:33] <swat30> got to 2.985% degraded (recovering) and stopped
[1:34] <stupidnic> I don't know, but 2515 pgs seems really high for the number of OSDs
[1:34] * Hell_Fire (~hellfire@123-243-155-184.static.tpgi.com.au) has joined #ceph
[1:35] <swat30> stupidnic, what would you expect?
[1:35] <xarses> backfill_toofull?
[1:35] <swat30> yea, we have a drive that is at 93%, it was slowly rebalancing
[1:35] <swat30> and also the reason that we want to add another OSD
[1:36] <swat30> I'm not sure if the log output (http://pastebin.com/ns0McteE) is helpful or not. OSDs run for a couple secs then spew that out
[1:37] <swat30> also, here's an OSD dump http://pastebin.com/mHFSwWr3
[1:37] * rturk is now known as rturk|afk
[1:37] * alram (~alram@38.122.20.226) Quit (Ping timeout: 480 seconds)
[1:38] * baylight (~tbayly@204.15.85.169) has joined #ceph
[1:38] * shane_ (~shane@69.43.177.46) has joined #ceph
[1:39] <shane_> ive set cephx permissions for a cephfs user on a specific pool for read only, however the client still attempts to write the files to disk and does not say permission denied or anything like that? is this normal behavior?
[1:39] <stupidnic> swat30: I am sadly out of my depth here I can't offer any more assistance
[1:40] <swat30> stupidnic, np, thanks for your input. hoping that someone will be able to help out
[1:40] <swat30> we've had a rocky ride with this cluster today
[1:40] <stupidnic> I hope it isn't in production
[1:41] <swat30> sure is
[1:41] <stupidnic> ouch
[1:41] <swat30> we're working on a phase out plan, apparently didn't happen soon enough
[1:41] <Gnomethrower> :/
[1:41] <Gnomethrower> (<- looking at migrating towards Ceph...)
[1:42] <swat30> Gnomethrower, it's good. we set it up in a staging environment that someone threw into production
[1:42] * sjustlaptop (~sam@24-205-54-233.dhcp.gldl.ca.charter.com) Quit (Quit: Leaving.)
[1:42] <swat30> one of those things
[1:42] <swat30> xarses, you able to provide any insight?
[1:43] <xarses> swat30: not off the top of my head
[1:43] <xarses> maybe check that the osd's still have network to the monitor
[1:43] <xarses> and that the monitor is healthy
[1:43] <xarses> maybe restart the mon
[1:43] * alram_ (~alram@38.122.20.226) Quit (Quit: leaving)
[1:43] <xarses> otherwise I'm at a loss
[1:44] * alram (~alram@38.122.20.226) has joined #ceph
[1:44] <swat30> yea, monitor looks good, can hit it from the OSDs
[1:44] * thurloat (~oftc-webi@104.37.192.5) has joined #ceph
[1:45] <stupidnic> What happens if you set one of the OSDs up?
[1:45] <swat30> I've tried setting osd up then starting the service
[1:46] <swat30> doesn't work unfortunately
[1:46] <stupidnic> So one of your OSDs ran out of space?
[1:47] <stupidnic> have you checked the actual drive space with df -h?
[1:47] <swat30> nope, it's ~93%
[1:47] <swat30> yea
[1:47] <xarses> df -i?
[1:47] <swat30> it was actually starting to free up some space on that drive during the recovery process before it crashed
[1:48] <swat30> 1% in use xarses
[1:48] <xarses> k
[1:48] <swat30> on all three
[1:48] * rturk|afk is now known as rturk
[1:48] * rturk is now known as rturk|afk
[1:49] <stupidnic> swat30: I think at this time... pony up for a support contract from inktank
[1:50] * rweeks (~rweeks@pat.hitachigst.com) Quit (Quit: Leaving)
[1:50] <stupidnic> i tried reading the dump but it's all greek to me
[1:50] * ircolle is now known as ircolle-afk
[1:50] <stupidnic> I see osd.19 a lot though
[1:51] * kevinc (~kevinc__@client65-44.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[1:51] <swat30> yea, from the original log file I sent along?
[1:51] <stupidnic> yeah
[1:51] <swat30> it's probably osd.19's log file
[1:51] <stupidnic> ah okay
[1:51] <swat30> :)
[1:52] <swat30> don't know if this is interesting, from the monitor http://pastebin.com/0g2DCDue
[1:52] <swat30> I assume that's normal
[1:54] * vmx (~vmx@dslb-084-056-050-159.084.056.pools.vodafone-ip.de) Quit (Quit: Leaving)
[1:57] <shane_> should i get an error message when attempting to write to a mounted cephfs pool that i only have read access to?
[1:59] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:02] * aknapp_ (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) has joined #ceph
[2:03] * oms101 (~oms101@p20030057EA0C7D00EEF4BBFFFE0F7062.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[2:05] * cookednoodles (~eoin@eoin.clanslots.com) Quit (Quit: Ex-Chat)
[2:10] * aknapp (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) Quit (Ping timeout: 480 seconds)
[2:10] * aknapp_ (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) Quit (Ping timeout: 480 seconds)
[2:12] * oms101 (~oms101@p20030057EA0B4700EEF4BBFFFE0F7062.dip0.t-ipconnect.de) has joined #ceph
[2:12] * xarses (~andreww@12.164.168.117) Quit (Ping timeout: 480 seconds)
[2:15] * alram (~alram@38.122.20.226) Quit (Ping timeout: 480 seconds)
[2:15] * bandrus (~Adium@4.31.55.106) Quit (Quit: Leaving.)
[2:24] <swat30> sage, I don't suppose that you would have a few minutes to help me out ?
[2:25] <sage> swat30: that is normal ..if the whole rack is down
[2:25] <swat30> what's the best way to get it back online?
[2:25] <swat30> the OSDs just refuse to start
[2:26] <swat30> MON is up and running just fine apparently
[2:27] * sputnik13 (~sputnik13@207.8.121.241) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[2:27] * angdraug (~angdraug@12.164.168.117) Quit (Quit: Leaving)
[2:29] <swat30> even a pointer in the right direction is appreciated, we're at a standstill :/
[2:33] <sage> there is probably some infrastructural reason why the whole rack is down.. broken networking or something?
[2:34] <swat30> sage, all seem to be talking to themselves fine
[2:35] <swat30> from a network standpoint
[2:37] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:39] <sage> what does "refusing to start" mean
[2:40] <sage> oh i see it now
[2:40] <sage> aie, this is cuttlefish
[2:40] <swat30> yea :/
[2:41] <swat30> worth noting, it was bobtail
[2:42] <sage> probably need to comment out that assertion to get things up
[2:43] <swat30> ok, so we'll need to compile?
[2:43] <sage> or rather replace the interval_set op in the caller to do a more forgiving union
[2:43] <sage> yeah
[2:43] * squisher (~david@2601:0:580:8be:c0de:ec8d:6416:d733) has joined #ceph
[2:43] <swat30> alright, we're currently running from APT
[2:43] <swat30> so that should be fun :)
[2:45] * sjm (~sjm@108.53.250.33) has left #ceph
[2:48] <sage> swat30: pushed wip-swat30.. i think that will get you past that particular assert. unclear what will happen later, though.
[2:49] <sage> i would avoid adding or deleting any snaps until you have upgraded and scrubbed and all that
[2:49] <swat30> ok
[2:49] <swat30> thanks! will try right now
[2:49] <sage> it'll show up on gitbuilder.ceph.com in 20-30 min... good luck!
[2:54] * midekra (~dennis@ariel.xs4all.nl) Quit (Ping timeout: 480 seconds)
[2:59] * Tamil1 (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[2:59] * JC1 (~JC@AMontpellier-651-1-445-156.w81-251.abo.wanadoo.fr) has joined #ceph
[2:59] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[3:01] * midekra (~dennis@ariel.xs4all.nl) has joined #ceph
[3:04] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) has joined #ceph
[3:06] * JC (~JC@AMontpellier-651-1-445-156.w81-251.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[3:07] * bandrus (~Adium@66-87-130-194.pools.spcsdns.net) has joined #ceph
[3:09] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[3:11] <swat30> sage, thanks a million! OSDs are back up and running, trolling through recovery
[3:13] * lucas1 (~Thunderbi@222.247.57.50) has joined #ceph
[3:14] <stupidnic> swat30: let us know how it goes I am curious to know
[3:14] <swat30> stupidnic, will do for sure. going to take a while to get this guy recovered
[3:17] <Sysadmin88> are there packages for calamari yet?
[3:19] * zack_dolby (~textual@p8505b4.tokynt01.ap.so-net.ne.jp) has joined #ceph
[3:21] <stupidnic> Sysadmin88: no, but you can clone the git repo, spin up a vagrant vm and build them easily
[3:22] * Sysadmin88_ (~IceChat77@94.4.14.211) has joined #ceph
[3:22] * LeaChim (~LeaChim@host86-159-115-162.range86-159.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[3:22] * baylight (~tbayly@204.15.85.169) has left #ceph
[3:22] <Sysadmin88_> dced... did i miss an answer?
[3:23] <stupidnic> Sysadmin88: no, but you can clone the git repo, spin up a vagrant vm and build them easily
[3:23] * bandrus (~Adium@66-87-130-194.pools.spcsdns.net) Quit (Quit: Leaving.)
[3:24] <Sysadmin88_> :) all good things come in time
[3:25] * Sysadmin88 (~IceChat77@05452df5.skybroadband.com) Quit (Ping timeout: 480 seconds)
[3:26] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:31] <Sysadmin88_> has there been thought about putting Ceph into appliances like FreeNAS? could be an easy way to allow people to deploy nodes without much work
[3:32] * zerick (~eocrospom@190.187.21.53) Quit (Ping timeout: 480 seconds)
[3:32] * squisher (~david@2601:0:580:8be:c0de:ec8d:6416:d733) Quit (Quit: Leaving)
[3:37] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Quit: Leaving.)
[3:38] * The_Bishop_ (~bishop@f055077200.adsl.alicedsl.de) has joined #ceph
[3:42] * lupu (~lupu@86.107.101.214) has joined #ceph
[3:45] * The_Bishop (~bishop@e180174176.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[3:50] * haomaiwang (~haomaiwan@223.223.183.114) has joined #ceph
[3:50] * haomaiwang (~haomaiwan@223.223.183.114) Quit (Remote host closed the connection)
[3:51] * haomaiwang (~haomaiwan@124.248.208.2) has joined #ceph
[3:58] * haomaiwa_ (~haomaiwan@223.223.183.114) has joined #ceph
[4:00] * vz (~vz@122.167.100.141) has joined #ceph
[4:05] * KevinPerks (~Adium@2606:a000:80a1:1b00:a571:f795:1391:78a8) Quit (Quit: Leaving.)
[4:06] * haomaiwang (~haomaiwan@124.248.208.2) Quit (Ping timeout: 480 seconds)
[4:08] * zack_dolby (~textual@p8505b4.tokynt01.ap.so-net.ne.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[4:12] * lupu (~lupu@86.107.101.214) Quit (Ping timeout: 480 seconds)
[4:12] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[4:15] * EchoMike (~oftc-webi@63-234-143-34.dia.static.qwest.net) has joined #ceph
[4:20] * zhaochao (~zhaochao@111.204.252.9) has joined #ceph
[4:20] * vbellur1 (~vijay@122.178.198.64) has joined #ceph
[4:23] * KevinPerks (~Adium@2606:a000:80a1:1b00:580:404f:1109:e3d6) has joined #ceph
[4:23] <EchoMike> Hey, quick newbie question: if I'm on computer "node1", does "ceph-deploy osd create node2:sdb:/dev/sda3" create the journal on node1 or node2?
[4:24] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Quit: Leaving.)
[4:24] <stupidnic> node2
[4:24] <stupidnic> it's specific to the node you specify
[4:24] <EchoMike> ok cool. it wouldn't really make sense otherwise
[4:24] <stupidnic> and besides you want journals local anyways
[4:24] <EchoMike> my thoughts exactly
[4:25] <EchoMike> can 2 or more OSD's share the same journal partition/drive?
[4:25] <stupidnic> They can but it isn't recommended unless the drive has good throughput (read SSD)
[4:25] * vbellur (~vijay@122.172.200.139) Quit (Ping timeout: 480 seconds)
[4:26] <stupidnic> since journaling is what limits ack response
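
For reference, the ceph-deploy form under discussion looks roughly like the following; node2 and /dev/sda3 are taken from the question above, and the extra partition is a hypothetical example of two OSDs sharing one local SSD for journals:

    # general form: ceph-deploy osd create {node}:{data-disk}[:{journal-device}]
    # both the data disk and the journal path are interpreted on the target node
    ceph-deploy osd create node2:sdb:/dev/sda3
    # a second OSD on the same host can point its journal at another partition of the same SSD
    ceph-deploy osd create node2:sdc:/dev/sda4
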
[4:26] <EchoMike> ahh, k. I'm deploying a relatively small cluster (5 computers, 1 osd each (for now) with 1 5GB journal). Eventually I'll add more HDD/OSDs per computer
[4:26] <stupidnic> A data point from a similar setup
[4:27] <stupidnic> make sure that you don't put too many OSDs on one server
[4:27] <stupidnic> make sure you have enough cores to handle them
[4:27] <EchoMike> Yeah, I would only have 2-3 each computer. These are 2 CPU / 8 core machines
[4:27] <stupidnic> You should be fine then
[4:27] <stupidnic> RAM is always encouraged though
[4:27] <EchoMike> Though.... I plan on running Open Nebula on top as well. Probably not ideal... :S
[4:27] <EchoMike> 32GB each
[4:27] <stupidnic> not familiar with it
[4:28] <EchoMike> it's like openstack; a cloud/vm app
[4:28] <stupidnic> ah okay
[4:28] <stupidnic> we are running openstack for ours
[4:28] <EchoMike> on the same machines as ceph?
[4:29] <stupidnic> Oh god no :)
[4:30] <EchoMike> hahaha, yeaaaah, I realize I'm over-using my machines. It's temporary until I can get some more. Do you think it'd just be better to run 2-3 ceph computers and then 2-3 openstack/nebula separately?
[4:30] <stupidnic> You only need three monitors in ceph
[4:30] <stupidnic> osds can be anywhere
[4:31] <stupidnic> Our cloud cluster is 7 servers
[4:31] <stupidnic> our controller is also a monitor
[4:31] <stupidnic> and then the two ceph nodes are monitors and osds
[4:32] <stupidnic> the remaining servers are compute nodes
[4:32] * zack_dolby (~textual@p8505b4.tokynt01.ap.so-net.ne.jp) has joined #ceph
[4:33] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[4:33] <EchoMike> how ram/cpu intensive is ceph?
[4:34] <stupidnic> depends on what it is doing
[4:34] <EchoMike> I guess in this case, VM images
[4:34] <stupidnic> let me find the recommendations
[4:34] <stupidnic> I mean specific tasks related to ceph actually
[4:34] <stupidnic> workloads are workloads
[4:34] <EchoMike> oh cool
[4:35] <stupidnic> OSDs can use up to 1GB of RAM while replicating
[4:35] <stupidnic> sorry that should be monitors
[4:35] <stupidnic> http://ceph.com/docs/master/start/hardware-recommendations/
[4:36] <stupidnic> Read the RAM section to get an idea
[4:38] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[4:38] <stupidnic> I have a quad core xeon E3-1230 v2 as the CPU in each ceph node and load can get up around 5 or 6 depending on what is going on.
[4:38] <stupidnic> mainly heavy writes
[4:38] <EchoMike> ah, that helps a lot. It even mentions running openstack on the same hardware as ceph.
[4:38] * haomaiwa_ (~haomaiwan@223.223.183.114) Quit (Remote host closed the connection)
[4:38] <stupidnic> reads... it hardly notices
[4:38] * haomaiwang (~haomaiwan@124.248.208.2) has joined #ceph
[4:39] <EchoMike> are those on a gig or 10gig network?
[4:39] <EchoMike> (or some other SAN fiber?)
[4:39] <stupidnic> sadly gig, but 10GbE tomorrow
[4:39] <stupidnic> Arista is being installed tomorrow (I hope)
[4:39] <stupidnic> bonded 10Gb
[4:39] <EchoMike> yeah, I can't get 10Gig yet. :( I do have enough ports to bond 1Gig links though
[4:40] <stupidnic> We have bonded 4 x 1Gb
[4:40] <EchoMike> Wow. I bet being 1 gig now is your real bottleneck
[4:40] <EchoMike> Oooh ok
[4:40] <stupidnic> we did initial tests with 1Gb
[4:40] * zack_dolby (~textual@p8505b4.tokynt01.ap.so-net.ne.jp) Quit (Ping timeout: 480 seconds)
[4:40] <stupidnic> and it saturated the line with two nodes and 10 OSDs
[4:40] <stupidnic> Let me dig the numbers out
[4:41] <EchoMike> Yeah, I can see that happening with 5 OSDs per node
[4:41] <stupidnic> 113MB/s on a bonnie++ test
[4:41] * zack_dolby (~textual@p8505b4.tokynt01.ap.so-net.ne.jp) has joined #ceph
[4:41] <stupidnic> so basically line rate (and that is no jumbo frames either - they weren't enabled at the time)
[4:42] <EchoMike> Are your monitors on bare metal? I got a crazy idea to put 1 or 2 of mine on VMs
[4:42] <stupidnic> mine are bare metal
[4:42] <stupidnic> I really would like to start using CoreOS and putting the various parts into Docker containers
[4:42] <stupidnic> that would be ideal
[4:42] <stupidnic> not there yet
[4:43] * fmanana (~fdmanana@bl4-183-124.dsl.telepac.pt) has joined #ceph
[4:43] <stupidnic> EchoMike: I think they would be okay... but if your cloud is down...
[4:43] <stupidnic> ugh
[4:43] <EchoMike> Yeaaah
[4:43] <stupidnic> cyclical dependency
[4:44] <EchoMike> I've never heard of coreos; it looks pretty cool. it's still early enough for me that I could run that direction
[4:44] <stupidnic> I think it dovetails nicely with Ceph nodes in particular
[4:44] <stupidnic> since you don't really care about the OS, and updating it is just a reboot away
[4:45] <stupidnic> I would be cautious though... since they are still working on things in the kernel
[4:45] <stupidnic> and so a future update at the kernel level could break something
[4:45] <stupidnic> It's one of those "Hey I should really..." type of ideas
[4:48] <EchoMike> Hmm.. I wish I had another month to play around with different techs before really getting started. I could get a simple ceph cluster setup now and always experiment with a CoreOs system later; integrating its ceph instance(s) later.
[4:48] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[4:48] <stupidnic> I think there are some issues with getting Docker the required access to the filesystems
[4:48] <stupidnic> but I think that CoreOS and etcd are a pretty good fit for Ceph
[4:49] <stupidnic> You can have etcd notifications handling adding and removing stuff from a cluster
[4:49] <EchoMike> Yeah, so that's something to look at later. It sounds pretty cool
[4:50] <EchoMike> What's CoreOs's relationship with VMs?
[4:50] * fdmanana (~fdmanana@bl5-173-96.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[4:50] <EchoMike> it seems like a replacement, in some regard, but there's still the VM capability/feature of just reloading an image
[4:51] <stupidnic> EchoMike: well it is tightly coupled with Docker
[4:52] <stupidnic> So the concept is to abstract as much of the underlying OS as possible from the applications running inside docker containers
[4:52] * todayman (~quassel@magellan.acm.jhu.edu) has joined #ceph
[4:52] <stupidnic> It would be theoretically possible to use CoreOS on compute nodes since you can shuffle things around... and then just reboot the nodes as needed for updates
[4:53] <EchoMike> Ah, gotcha. Looks like I have more homework to do. :P
[4:53] <stupidnic> I don't see as good a fit with CoreOS and compute nodes, as I do with Ceph nodes
[4:53] * diegows (~diegows@190.190.5.238) Quit (Ping timeout: 480 seconds)
[4:53] <stupidnic> I mean all we are interested in on a Ceph node is Ceph... so if that was in a Docker container and CoreOS was underneath that, we wouldn't really care
[4:54] <stupidnic> so you can setup a basic OSD container in docker, and then spin that up, rinse and repeat
[4:55] <stupidnic> Anyways, like I said... I have given it some thought, but not put pen to paper (so to speak) yet
[4:55] <stupidnic> arguably you could probably do the same thing with Puppet or Chef
[4:55] <stupidnic> so might be a moot point
[4:56] <EchoMike> Yeah. Thanks for the info! I was going to build a 5-node (1 OSD each) cluster with open nebula on top, but I think now it'd make more sense to just make a 2-node / 4 or 5 OSD cluster with open nebula on separate machines
[4:57] <stupidnic> I think that will be a bit more flexible
[4:57] <stupidnic> and you can still expand Ceph as needed
[4:58] <EchoMike> Exactly
[4:59] * haomaiwa_ (~haomaiwan@223.223.183.114) has joined #ceph
[5:01] <EchoMike> Docker's features sound a little bit like what NodeJS + NPM can do (though a bit more featured)
[5:02] <EchoMike> Is it free though? From the layout of the website, I'm thinking it's not
[5:02] <stupidnic> It is
[5:02] <stupidnic> it's basically LXC with git
[5:02] <EchoMike> Ah, just hit their FAQ.
[5:04] * haomaiwang (~haomaiwan@124.248.208.2) Quit (Read error: Connection reset by peer)
[5:04] <EchoMike> Thanks a ton stupidnic! Sometimes IRC is so much faster than googling everything.
[5:04] <stupidnic> I know the feeling
[5:04] <stupidnic> good luck with your build
[5:05] <EchoMike> Thanks! I really like Docker too. It seems to mesh nicely with what I'm building
[5:17] * Vacum_ (~vovo@88.130.194.18) has joined #ceph
[5:17] * yuriw (~Adium@c-76-126-35-111.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[5:17] * yuriw (~Adium@c-76-126-35-111.hsd1.ca.comcast.net) has joined #ceph
[5:19] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) has joined #ceph
[5:24] * haomaiwa_ (~haomaiwan@223.223.183.114) Quit (Remote host closed the connection)
[5:24] * longguang (~chatzilla@123.126.33.253) Quit (Ping timeout: 480 seconds)
[5:24] * Vacum__ (~vovo@88.130.202.135) Quit (Ping timeout: 480 seconds)
[5:24] * haomaiwang (~haomaiwan@124.248.208.2) has joined #ceph
[5:24] * haomaiwa_ (~haomaiwan@223.223.183.114) has joined #ceph
[5:32] * haomaiwang (~haomaiwan@124.248.208.2) Quit (Ping timeout: 480 seconds)
[5:35] * Qten (~Qu310@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (Ping timeout: 480 seconds)
[5:38] <thurloat> If I run `ceph osd lost` on an OSD that I just added and was not successful in bringing up, that shouldn't cause any data loss, right? I'm just afraid of all the warnings
[5:39] <thurloat> I have a bunch of pgs in down+peering waiting on that OSD to come up
[5:39] * adamcrume (~quassel@c-71-204-162-10.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[5:40] * Qu310 (~Qu310@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[5:40] <thurloat> or will removing it per these instructions: http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-the-osd do the same job
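
For reference, the removal sequence in that linked doc is roughly the following, with {id} as the OSD number; marking it lost instead would be "ceph osd lost {id} --yes-i-really-mean-it":

    ceph osd out {id}
    # stop the daemon on its host, e.g. "service ceph stop osd.{id}"
    ceph osd crush remove osd.{id}
    ceph auth del osd.{id}
    ceph osd rm {id}
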
[5:56] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) Quit (Quit: Leaving.)
[5:58] <shane_> ive set cephx permissions for a cephfs user on a specific pool for read only, however the client still attempts to write the files to disk and does not say permission denied or anything like that? is this normal behavior?
[6:22] * shang_ (~ShangWu@220-135-203-169.HINET-IP.hinet.net) has joined #ceph
[6:24] * longguang (~chatzilla@123.126.33.253) has joined #ceph
[6:24] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) Quit (Ping timeout: 480 seconds)
[6:31] * shang_ (~ShangWu@220-135-203-169.HINET-IP.hinet.net) Quit (Ping timeout: 480 seconds)
[6:34] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) Quit (Quit: Leaving.)
[6:43] * rdas (~rdas@121.244.87.115) has joined #ceph
[6:45] * vbellur1 (~vijay@122.178.198.64) Quit (Ping timeout: 480 seconds)
[6:46] * vz (~vz@122.167.100.141) Quit (Quit: Leaving...)
[6:46] * Concubidated (~Adium@66.87.130.194) Quit (Read error: Connection reset by peer)
[7:04] * Concubidated (~Adium@66-87-130-132.pools.spcsdns.net) has joined #ceph
[7:07] * vbellur (~vijay@121.244.87.117) has joined #ceph
[7:13] * paul (~quassel@S0106362610c83979.ok.shawcable.net) has joined #ceph
[7:13] * paul is now known as paul_fusion
[7:14] * saurabh (~saurabh@121.244.87.117) has joined #ceph
[7:19] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[7:20] <Tamil> EchoMike: it would create the journal on node2 as that's what you have specified
[7:21] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit ()
[7:22] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[7:27] * KevinPerks (~Adium@2606:a000:80a1:1b00:580:404f:1109:e3d6) Quit (Quit: Leaving.)
[7:28] * lalatenduM (~lalatendu@121.244.87.117) has joined #ceph
[7:29] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) has joined #ceph
[7:44] * sjusthm (~sam@24-205-54-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:46] * dlan (~dennis@116.228.88.131) Quit (Ping timeout: 480 seconds)
[7:47] * jianingy (~jianingy@211.151.112.5) Quit (Ping timeout: 480 seconds)
[7:47] * jianingy (~jianingy@211.151.112.5) has joined #ceph
[7:51] * dlan (~dennis@116.228.88.131) has joined #ceph
[7:53] * v2 (~vshankar@121.244.87.117) has joined #ceph
[8:01] * reed (~reed@75-101-54-131.dsl.static.sonic.net) Quit (Quit: Ex-Chat)
[8:02] * v2 (~vshankar@121.244.87.117) Quit (Ping timeout: 480 seconds)
[8:06] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Quit: Leaving.)
[8:11] * b0e (~aledermue@juniper1.netways.de) has joined #ceph
[8:22] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[8:24] * lczerner (~lczerner@ip56-4.tvtrinec.cz) has joined #ceph
[8:27] * allig8r (~allig8r@128.135.219.116) Quit (Read error: Connection reset by peer)
[8:28] * Concubidated1 (~Adium@66.87.130.132) has joined #ceph
[8:28] * Concubidated (~Adium@66-87-130-132.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[8:29] * Concubidated1 (~Adium@66.87.130.132) Quit ()
[8:30] * thb (~me@0001bd58.user.oftc.net) has joined #ceph
[8:31] * Concubidated (~Adium@66.87.130.132) has joined #ceph
[8:34] * keksior (~oftc-webi@109.232.242.2) has joined #ceph
[8:34] * overclk (~vshankar@121.244.87.117) has joined #ceph
[8:35] <keksior> Hello. I've got a problem with high load average on one of my servers running ceph osds (7 osd, 24 core, 128 GB RAM). The load average is about 4-6 at night, when my traffic is nearly zero. Can someone help me track down the high load average problem? :(
[8:35] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Quit: Leaving.)
[8:35] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[8:37] <keksior> in the iotop tool i can see 4-8 kworker processes that use 50-100% of my io. Anyone had a similar problem?
[8:38] <BManojlovic> type free
[8:38] * ismell (~ismell@host-24-52-35-110.beyondbb.com) Quit (Read error: Operation timed out)
[8:40] * cok (~chk@2a02:2350:18:1012:e513:3357:740c:741e) has joined #ceph
[8:40] * The_Bishop_ (~bishop@f055077200.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[8:42] * Concubidated (~Adium@66.87.130.132) Quit (Quit: Leaving.)
[8:45] <keksior> free -m http://pastebin.com/5zrdynKY
[8:48] <keksior> BManojlovic: i've got a second similar server where everything is normal - load average about 0.8 to 1.2.
[8:51] * shang_ (~ShangWu@218.187.103.40) has joined #ceph
[8:51] * shang_ (~ShangWu@218.187.103.40) Quit (Read error: Connection reset by peer)
[8:57] * marvin0815_home (~oliver.bo@217.66.56.203) has joined #ceph
[9:00] <marvin0815_home> Hi, is there any way to find out which pg belongs to which pool or even which file/object?
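
For what it is worth, a pg id is prefixed with the numeric id of its pool, and an object can be mapped to its pg from the CLI; a rough sketch with placeholder pool/object names:

    ceph osd lspools                        # numeric pool ids; e.g. pg 3.1f belongs to pool 3
    ceph osd map <poolname> <objectname>    # prints the pg and the OSDs the object maps to
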
[9:19] * ismell (~ismell@host-24-52-35-110.beyondbb.com) has joined #ceph
[9:19] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) has joined #ceph
[9:19] * analbeard (~shw@support.memset.com) has joined #ceph
[9:21] <BManojlovic> hm not what i thought is an issue
[9:23] <BManojlovic> so i'm clueless someone else will need to say something smarter :)
[9:26] <keksior> i'll try updating my ceph on this host to firefly from emperor
[9:26] <keksior> maybe this will solve my problem
[9:28] * fsimonce (~simon@host135-17-dynamic.8-79-r.retail.telecomitalia.it) has joined #ceph
[9:41] * mgarcesMZ (~mgarces@5.206.228.5) has joined #ceph
[9:41] <mgarcesMZ> good morning
[9:43] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[9:44] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) has joined #ceph
[9:45] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) Quit (Remote host closed the connection)
[9:46] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) has joined #ceph
[9:46] * ChanServ sets mode +v andreask
[9:47] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) has joined #ceph
[9:47] * rendar (~I@host175-179-dynamic.10-87-r.retail.telecomitalia.it) has joined #ceph
[9:48] * marvin0815_home (~oliver.bo@217.66.56.203) Quit (Quit: leaving)
[9:49] * marvin0815_home (~oliver@ip18860e40.dynamic.kabel-deutschland.de) has joined #ceph
[9:49] * garphy`aw is now known as garphy
[9:53] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) Quit (Ping timeout: 480 seconds)
[9:54] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) has joined #ceph
[9:55] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) Quit (Quit: Leaving.)
[9:56] * zhangdongmao (~zhangdong@203.192.156.9) Quit (Remote host closed the connection)
[9:56] * saturnine (~saturnine@ashvm.saturne.in) Quit (Read error: Operation timed out)
[9:57] * capri_on (~capri@212.218.127.222) Quit (Read error: Connection reset by peer)
[9:57] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:57] * saturnine (~saturnine@ashvm.saturne.in) has joined #ceph
[10:00] * zhangdongmao (~zhangdong@203.192.156.9) has joined #ceph
[10:04] * dosaboy_ (~dosaboy@host86-156-254-95.range86-156.btcentralplus.com) has joined #ceph
[10:05] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:06] * The_Bishop_ (~bishop@f055077200.adsl.alicedsl.de) has joined #ceph
[10:14] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[10:15] * shang (~ShangWu@42-71-3-130.EMOME-IP.hinet.net) has joined #ceph
[10:16] * swizgard_ (~swizgard@port-87-193-133-18.static.qsc.de) has joined #ceph
[10:16] * saturnine (~saturnine@ashvm.saturne.in) Quit (Read error: Connection reset by peer)
[10:16] * saturnine (~saturnine@ashvm.saturne.in) has joined #ceph
[10:17] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) Quit (Ping timeout: 480 seconds)
[10:17] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) has joined #ceph
[10:17] * shang (~ShangWu@42-71-3-130.EMOME-IP.hinet.net) Quit ()
[10:18] * swizgard (~swizgard@port-87-193-133-18.static.qsc.de) Quit (Ping timeout: 480 seconds)
[10:21] * sm1ly (~sm1ly@broadband-77-37-240-109.nationalcablenetworks.ru) has joined #ceph
[10:21] * sm1ly (~sm1ly@broadband-77-37-240-109.nationalcablenetworks.ru) Quit ()
[10:25] * joerocklin (~joe@cpe-65-185-149-56.woh.res.rr.com) Quit (Remote host closed the connection)
[10:30] * joerocklin (~joe@cpe-65-185-149-56.woh.res.rr.com) has joined #ceph
[10:32] <mgarcesMZ> I am following the document for setup of radosgw... there is a part: "FastCgiExternalServer /var/www/s3gw.fcgi -socket /tmp/radosgw.sock" but in the new apache plugin, mod_fcgi, this option does not exist. Does anyone have the instructions on how to do this with the newer cgi module?
[10:33] * saturnine (~saturnine@ashvm.saturne.in) Quit (Ping timeout: 480 seconds)
[10:34] * capri (~capri@212.218.127.222) has joined #ceph
[10:35] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) Quit (Ping timeout: 480 seconds)
[10:35] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) has joined #ceph
[10:36] * LeaChim (~LeaChim@host86-159-115-162.range86-159.btcentralplus.com) has joined #ceph
[10:36] * dosaboy_ (~dosaboy@host86-156-254-95.range86-156.btcentralplus.com) Quit (Quit: leaving)
[10:36] * keksior (~oftc-webi@109.232.242.2) Quit (Remote host closed the connection)
[10:39] * keksior (~oftc-webi@109.232.242.2) has joined #ceph
[10:39] * hflai (~hflai@alumni.cs.nctu.edu.tw) Quit (Read error: Connection reset by peer)
[10:40] * drankis (~drankis__@89.111.13.198) Quit (Ping timeout: 480 seconds)
[10:43] * hflai (~hflai@alumni.cs.nctu.edu.tw) has joined #ceph
[10:46] * dosaboy_ (~dosaboy@host86-156-254-95.range86-156.btcentralplus.com) has joined #ceph
[10:46] <Isotopp> darkfader: at this juncture it seems established that putting the xfs logs alone onto a fusion-io does accelerate a pure xfs file system a lot, but does not improve ceph performance in a significant way
[10:46] <Isotopp> darkfader: after the current round of benchmarks finishes i will try with osd logs on fusion io as well
[10:47] * dosaboy_ (~dosaboy@host86-156-254-95.range86-156.btcentralplus.com) Quit ()
[10:50] * lupu (~lupu@86.107.101.214) has joined #ceph
[10:50] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) Quit (Ping timeout: 480 seconds)
[10:51] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) has joined #ceph
[10:53] * cookednoodles (~eoin@eoin.clanslots.com) has joined #ceph
[10:54] * mattch (~mattch@pcw3047.see.ed.ac.uk) has joined #ceph
[10:55] * oro (~oro@2001:620:20:222:7539:3765:ca0b:40df) has joined #ceph
[11:00] * jtang_ (~jtang@80.111.83.231) has joined #ceph
[11:00] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[11:01] * andreask (~andreask@zid-vpnn021.uibk.ac.at) has joined #ceph
[11:01] * ChanServ sets mode +v andreask
[11:03] <Isotopp> darkfader: I think that is also because the benchmark running in a vm is not capable of remotely saturating the ceph disks (i get around 10% util per spindle)
[11:10] <mgarcesMZ> I wish I had SSD...
[11:24] * dosaboy (~dosaboy@65.93.189.91.lcy-01.canonistack.canonical.com) Quit (Ping timeout: 480 seconds)
[11:27] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[11:29] * dosaboy (~dosaboy@65.93.189.91.lcy-01.canonistack.canonical.com) has joined #ceph
[11:30] * fmanana is now known as fdmanana
[11:32] * rer (~re@86.108.90.117) has joined #ceph
[11:36] * lucas1 (~Thunderbi@222.247.57.50) Quit (Quit: lucas1)
[11:37] <rer> Anyone wanna earn daily profits of 150$ to 500$ now with Copy system in Forex? real, sure, safe, fast, easy, start with very small capital invest 50$ or more, free guide, tips, training, msg me for any details or add skype : benjordan201
[11:38] * rer (~re@86.108.90.117) Quit (autokilled: This host violated network policy. Contact support@oftc.net for further information and assistance. (2014-08-12 09:38:19))
[11:45] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) Quit (Ping timeout: 480 seconds)
[11:46] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) has joined #ceph
[11:57] * ikrstic (~ikrstic@93-87-118-93.dynamic.isp.telekom.rs) has joined #ceph
[11:59] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) Quit (Remote host closed the connection)
[12:00] * yguang11 (~yguang11@2406:2000:ef96:e:1d35:59ec:5c9:dc78) has joined #ceph
[12:03] * nljmo_ (~nljmo@84.246.29.1) has joined #ceph
[12:08] * yguang11 (~yguang11@2406:2000:ef96:e:1d35:59ec:5c9:dc78) Quit (Ping timeout: 480 seconds)
[12:09] * ikrstic (~ikrstic@93-87-118-93.dynamic.isp.telekom.rs) Quit (Quit: Konversation terminated!)
[12:11] * andreask (~andreask@zid-vpnn021.uibk.ac.at) Quit (Read error: Connection reset by peer)
[12:13] * cookednoodles (~eoin@eoin.clanslots.com) Quit (Remote host closed the connection)
[12:13] * TMM (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[12:14] * cookednoodles (~eoin@eoin.clanslots.com) has joined #ceph
[12:15] * saturnine (~saturnine@66.219.20.211) has joined #ceph
[12:16] * ikrstic (~ikrstic@93-87-118-93.dynamic.isp.telekom.rs) has joined #ceph
[12:20] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[12:26] <theanalyst> mgarcesMZ: if you are on ubuntu I think you need to enable multiverse/universe repos dont remember which
[12:30] * mgarcesMZ (~mgarces@5.206.228.5) Quit (Ping timeout: 480 seconds)
[12:39] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[12:42] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) Quit (Ping timeout: 480 seconds)
[12:43] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[12:47] <longguang> hi
[12:47] * linuxkidd_ (~linuxkidd@rtp-isp-nat-pool1-1.cisco.com) Quit (Remote host closed the connection)
[12:51] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[12:54] <pressureman> anyone know if "rados bench write" is sequential, or random writes?
[12:54] <pressureman> docs are quite clear on that
[13:04] <pressureman> oops, i mean the docs *aren't* quite clear
[13:09] * cok (~chk@2a02:2350:18:1012:e513:3357:740c:741e) Quit (Quit: Leaving.)
[13:09] * danieljh_ (~daniel@HSI-KBW-046-005-197-128.hsi8.kabel-badenwuerttemberg.de) Quit (Quit: leaving)
[13:13] * ikrstic (~ikrstic@93-87-118-93.dynamic.isp.telekom.rs) Quit (Quit: Konversation terminated!)
[13:16] * jcsp (~Adium@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[13:17] * i_m (~ivan.miro@gbibp9ph1--blueice4n1.emea.ibm.com) has joined #ceph
[13:18] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) has joined #ceph
[13:18] * ChanServ sets mode +v andreask
[13:19] * jcsp (~Adium@0001bf3a.user.oftc.net) has joined #ceph
[13:24] * drankis (~drankis__@89.111.13.198) has joined #ceph
[13:30] * i_m (~ivan.miro@gbibp9ph1--blueice4n1.emea.ibm.com) Quit (Quit: Leaving.)
[13:31] * i_m (~ivan.miro@gbibp9ph1--blueice2n1.emea.ibm.com) has joined #ceph
[13:31] * fdmanana (~fdmanana@bl4-183-124.dsl.telepac.pt) Quit (Quit: Leaving)
[13:35] * zhaochao (~zhaochao@111.204.252.9) has left #ceph
[13:39] * BManojlovic (~steki@91.195.39.5) Quit (Remote host closed the connection)
[13:40] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[13:40] * joerocklin (~joe@cpe-65-185-149-56.woh.res.rr.com) Quit (Remote host closed the connection)
[13:40] * BManojlovic (~steki@91.195.39.5) Quit (Remote host closed the connection)
[13:41] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[13:42] * ade (~abradshaw@193.202.255.218) has joined #ceph
[13:50] * joerocklin (~joe@cpe-65-185-149-56.woh.res.rr.com) has joined #ceph
[13:58] * diegows (~diegows@190.190.5.238) has joined #ceph
[14:02] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) has joined #ceph
[14:06] * rdas (~rdas@121.244.87.115) Quit (Quit: Leaving)
[14:13] * fdmanana (~fdmanana@bl4-183-124.dsl.telepac.pt) has joined #ceph
[14:14] * kr0m (~kr0m@62.82.228.34.static.user.ono.com) has joined #ceph
[14:14] <kr0m> Hello everybody
[14:14] <kr0m> I have installed my first ceph cluster with one monitor and two osds
[14:14] <kr0m> i have tested rbd access and it works great
[14:14] * mgarcesMZ (~mgarces@5.206.228.5) has joined #ceph
[14:15] <mgarcesMZ> hi again
[14:15] <kr0m> now i am interested in API access from python code
[14:15] <kr0m> i have done some tests reading http://ceph.com/docs/master/rados/api/python/
[14:15] <kr0m> but now i need to upload image files to cluster storage using python
[14:15] <kr0m> i don't know how i can do it
[14:16] <kr0m> any idea?
[14:16] <kr0m> in the doc it seems that you only write text files, but don't upload content
[14:16] <kr0m> or do i have to read the image raw and write it to an object?
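
A minimal sketch of that latter approach with the python rados binding from the docs linked above: open the image in binary mode and write the bytes to an object (pool name, object name and file path are placeholders):

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('mypool')          # pool must already exist
    with open('/tmp/image.jpg', 'rb') as f:       # read the image as raw bytes
        data = f.read()
    ioctx.write_full('image.jpg', data)           # store the whole file as a single object
    ioctx.close()
    cluster.shutdown()
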
[14:17] <mgarcesMZ> guys can you help me? I'm trying to set up radosgw, but I keep getting: "End of script output before headers: s3gw.fcgi"
[14:26] * KevinPerks (~Adium@2606:a000:80a1:1b00:602e:927c:da72:b599) has joined #ceph
[14:27] * mgarcesMZ (~mgarces@5.206.228.5) Quit (Ping timeout: 480 seconds)
[14:30] * vbellur (~vijay@121.244.87.117) Quit (Ping timeout: 480 seconds)
[14:30] * ganders (~root@200-127-158-54.net.prima.net.ar) has joined #ceph
[14:33] * mrjack (mrjack@office.smart-weblications.net) Quit ()
[14:35] * boichev (~boichev@213.169.56.130) has joined #ceph
[14:43] * gregsfortytwo1 (~Adium@2607:f298:a:607:d0f3:bb61:8cd0:b373) has joined #ceph
[14:43] * gregsfortytwo (~Adium@38.122.20.226) Quit (Read error: Connection reset by peer)
[14:45] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) has joined #ceph
[14:48] * mgarcesMZ (~mgarces@5.206.228.5) has joined #ceph
[14:48] <mgarcesMZ> back
[14:48] <mgarcesMZ> my connection is very bad today
[14:48] <mgarcesMZ> :(
[14:48] <mgarcesMZ> my previous question: "guys can you help me? I'm trying to set up radosgw, but I keep getting: 'End of script output before headers: s3gw.fcgi'"
[14:53] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) has joined #ceph
[14:54] <mgarcesMZ> using mod_fcgid, instead of mod_fastcgi (deprecated I think)
[14:54] * The_Bishop_ (~bishop@f055077200.adsl.alicedsl.de) Quit (Quit: Who the hell is this Peer? If I catch him I'm going to reset his connection!)
[14:56] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) Quit (Quit: Leaving)
[14:57] * vbellur (~vijay@122.178.198.64) has joined #ceph
[14:58] * eternaleye (~eternaley@50.245.141.73) Quit (Remote host closed the connection)
[14:59] * lupu (~lupu@86.107.101.214) Quit (Quit: Leaving.)
[14:59] * lupu (~lupu@86.107.101.214) has joined #ceph
[15:02] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) Quit (Quit: Leaving.)
[15:04] * KevinPerks (~Adium@2606:a000:80a1:1b00:602e:927c:da72:b599) Quit (Quit: Leaving.)
[15:05] * KevinPerks (~Adium@2606:a000:80a1:1b00:602e:927c:da72:b599) has joined #ceph
[15:05] * keksior (~oftc-webi@109.232.242.2) Quit (Quit: Page closed)
[15:05] * KevinPerks (~Adium@2606:a000:80a1:1b00:602e:927c:da72:b599) Quit ()
[15:06] * KevinPerks (~Adium@2606:a000:80a1:1b00:602e:927c:da72:b599) has joined #ceph
[15:08] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) has joined #ceph
[15:10] * CephFan1 (~textual@68-233-224-175.static.hvvc.us) has joined #ceph
[15:12] * linuxkidd_ (~linuxkidd@rtp-isp-nat1.cisco.com) has joined #ceph
[15:12] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) Quit ()
[15:16] * branto (~branto@nat-pool-brq-t.redhat.com) has joined #ceph
[15:17] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) has joined #ceph
[15:23] <mgarcesMZ> anyone?
[15:26] * brad_mssw (~brad@shop.monetra.com) has joined #ceph
[15:27] * aknapp (~aknapp@ip68-99-237-112.ph.ph.cox.net) has joined #ceph
[15:30] * cok (~chk@2a02:2350:18:1012:a5b0:3d17:aade:26) has joined #ceph
[15:32] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) has left #ceph
[15:33] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) has joined #ceph
[15:33] * ChanServ sets mode +v andreask
[15:37] * overclk (~vshankar@121.244.87.117) Quit (Quit: Leaving)
[15:39] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[15:40] * arcimboldo (~antonio@130.60.144.205) has joined #ceph
[15:41] * aknapp (~aknapp@ip68-99-237-112.ph.ph.cox.net) Quit (Remote host closed the connection)
[15:41] * aknapp (~aknapp@ip68-99-237-112.ph.ph.cox.net) has joined #ceph
[15:43] * ikrstic (~ikrstic@93-87-118-93.dynamic.isp.telekom.rs) has joined #ceph
[15:44] <sage> mgarcesMZ: use the mod_fastcgi available on ceph.com
[15:44] <sage> or turn off 100 continue.. i think it is 'rgw print continue = false'
[15:44] <mgarcesMZ> sage: ok, let me try the second option first
[15:45] <mgarcesMZ> I put "rgw print continue = false" on the [client.radosgw.gateway] part of ceph.conf, right?
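
A minimal ceph.conf sketch of that placement, assuming the gateway instance is named client.radosgw.gateway as above (the socket path is the one quoted from the apache config earlier):

    [client.radosgw.gateway]
        rgw socket path = /tmp/radosgw.sock
        rgw print continue = false
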
[15:45] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) has left #ceph
[15:48] * zerick (~eocrospom@190.187.21.53) has joined #ceph
[15:48] <mgarcesMZ> sage: ?
[15:49] * aknapp (~aknapp@ip68-99-237-112.ph.ph.cox.net) Quit (Ping timeout: 480 seconds)
[15:51] * allig8r (~allig8r@128.135.219.116) has joined #ceph
[15:54] <Isotopp> i have finished my benchmarking and the results are contradictory, to say the least
[15:56] <ganders> Isotopp: what are the results of the bench?
[15:56] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) Quit (Remote host closed the connection)
[15:57] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) has joined #ceph
[15:57] <mgarcesMZ> ok, I know why I didn't use ceph's mod_fastcgi... no repos for RHEL7...
[15:59] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[15:59] <Isotopp> ganders: i have tested xfs with internal log (that's xfs on a hp 900g disk behind a p830 controller with 4g bbu memory), xfs with external log, 1g on fusion-io
[15:59] <Isotopp> ganders: and i have tested ceph with 18 of these on 3 nodes, 6 per node
[15:59] <Isotopp> ganders: internal xfs log, and internal osd log,
[15:59] <mgarcesMZ> I think I'm going back, and testing with rhel6 or fedora20
[15:59] <Isotopp> ganders: xfs log on fusion-io, osd log on disk
[15:59] <mgarcesMZ> :(
[16:00] <Isotopp> ganders: and finally xfs log on fusion io and osd log on fusion io
[16:00] <ganders> Isotopp: which tool did you use for the tests?
[16:01] <Isotopp> ganders: i have been using
[16:01] <Isotopp> fio --filename=$(pwd)/keks --sync=1 -rw=randwrite --bs=16k --size=4G --numjobs=32 --runtime=60s --group_reporting --name=file1
[16:01] <Isotopp> and i have been using iozone -f iozonefile -o -O -i0 -i2 -s4g -r16k
[16:01] <Isotopp> and i have been using
[16:01] <Isotopp> ~/iozone -f iozonefile -o -O -i0 -i2 -s4g -r16k
[16:01] <Isotopp> all of these test random-write 16k blocks on a 4g file
[16:02] <Isotopp> that is a pretty good approximation of mysql innodb workload and also the only 'interesting' case, as linear writes and any kind of read load are typically not critical
[16:02] <Isotopp> in my environment
[16:02] <sage> mgarcesMZ: it should work with other fastcgi modules or servers with 'rgw print continue = false'
[16:02] <Isotopp> the machines have a bonded 2x 10gbit interface on juniper contrail (openstack icehouse with hardware networking based on mpls)
[16:03] <iggy> mgarcesMZ: sadly, ceph is moving so fast right now, even the latest "enterprise" distribution is probably woefully behind... if not it will be in 3 months. So yeah, I'd stick with something that turns over a little faster like ceph (ubuntu or fedora)
[16:03] <sage> but yeah, i think no package for rhel7 at the moment
[16:03] <Isotopp> ganders: when i work on a hardware node, i.e. my first ceph node,
[16:03] <Isotopp> go into /var/lib/ceph/osd/ceph-1
[16:03] <Isotopp> and mkdir kris
[16:03] <sage> tho we should have one soon, i think
[16:03] <Isotopp> and in that dir benchmark fio, i get
[16:03] * eternaleye (~eternaley@50.245.141.73) has joined #ceph
[16:03] <Isotopp> write: io=8749.7MB, bw=149325KB/s, iops=9332, runt= 60001msec
[16:04] <Isotopp> with everything on disk (p830 controller has 4g bbu)
[16:04] <ganders> running that command fio --filename=/dev/... --sync=1 --rw=randwrite --bs=16k --size=4g --numjobs=32 --runtime=60 --group_reporting --name=file1 is giving me 120MB/s
[16:04] <mgarcesMZ> sage: thank you! I will test everything with fedora20... I think centos/rhel 7 are based on fedora20
[16:04] <Isotopp> when i mount this with logdev=/dev/fio/xfslog-1
[16:04] <Isotopp> i get
[16:04] <Isotopp> write: io=34111MB, bw=582089KB/s, iops=36380, runt= 60008msec
[16:04] <Isotopp> that is, from 150mb/s to 582 mb/s and from 9300 iops to 36000 iops
[16:04] <Isotopp> that's pretty impressive.
[16:05] <iggy> sounds about right
[16:05] <ganders> yeah, and how do you do that mounting?
[16:05] <iggy> I bet the controller (even with the bbu) is obeying flush()es
[16:05] <Isotopp> mount -t xfs -o noatime,logdev=/dev/fio/xfslog-1 /dev/sdb1 /var/lib/ceph/osd/ceph-1
[16:06] <Isotopp> i actually had to put that into the fstab, because ceph-disk cannot handle per-disk mount options at all
[16:06] <Isotopp> due to a dependency loop
[16:06] <Isotopp> i tried with
[16:06] <Isotopp> [osd.1]
[16:06] <Isotopp> osd_mount_options_xfs = logdev=...
[16:06] <Isotopp> but that is unresolveable
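
For reference, the fstab entry implied by the mount command above would look roughly like this (same devices as in that command):

    # /etc/fstab - per-OSD xfs data disk with its log on the fusion-io volume
    /dev/sdb1  /var/lib/ceph/osd/ceph-1  xfs  noatime,logdev=/dev/fio/xfslog-1  0  0
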
[16:06] <ganders> could you share your ceph.conf file (pastebin)?
[16:06] <ganders> I'm using btrfs and tmpfs for journals
[16:07] <sage> mgarcesMZ: not certain there is a mod_fastcgi package for fedora either
[16:07] <ganders> with rados bench performance goes to the sky, almost 1GB/s
[16:07] <Isotopp> http://pastebin.com/iN7SaFBN
[16:07] <sage> so i would try the continue = false first
[16:07] <ganders> but when going down to the clients, even going directly to the rbd, performance goes below 400MB/s
[16:08] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) has joined #ceph
[16:08] <mgarcesMZ> sage: let me search for it...
[16:08] <mgarcesMZ> ubuntu/debian is out of the question
[16:10] <mgarcesMZ> sage: it exists only for fedora19 and centos6
[16:11] <mgarcesMZ> I can test really quick with centos6, but the kernel is very outdated... I think I'm just going to download fedora19 for testing
[16:12] <ganders> Isotopp: my ceph.conf http://pastebin.com/raw.php?i=cu6qP8Hr
[16:13] * b0e (~aledermue@juniper1.netways.de) Quit (Quit: Leaving.)
[16:13] * rwheeler (~rwheeler@173.48.207.57) Quit (Quit: Leaving)
[16:14] * tupper (~chatzilla@216.1.187.164) has joined #ceph
[16:15] <Isotopp> yes
[16:15] * gregmark (~Adium@68.87.42.115) has joined #ceph
[16:15] <Isotopp> on mount, ceph-conf is being called with a section for osd, not osd.1
[16:16] <Isotopp> it is then apparently tmp mounted
[16:16] <Isotopp> and that does not work with xfs logdev external
[16:16] <Isotopp> because even for the tmp mount you need to spec the xfs logdev
[16:16] <Isotopp> you handle osd journal here, i don't even have to do that,
[16:16] <Isotopp> because of the journal symlink
[16:17] <Isotopp> in the ceph-? directory
[16:18] * stewiem2000 (~stewiem20@195.10.250.233) Quit (Ping timeout: 480 seconds)
[16:18] * madkiss (~madkiss@chello080108052132.20.11.vie.surfer.at) Quit (Quit: Leaving.)
[16:19] <pressureman> sage, nice of you to drop by!
[16:20] <ganders> Isotopp: some of the perf results: http://pastebin.com/raw.php?i=QtA370V1
[16:20] <pressureman> sage, can you clarify whether a "rados bench write" is random or sequential writes? the docs are not quite clear on this
[16:20] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) Quit (Ping timeout: 480 seconds)
[16:20] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) has joined #ceph
[16:20] * saurabh (~saurabh@121.244.87.117) Quit (Quit: Leaving)
[16:22] * joerocklin (~joe@cpe-65-185-149-56.woh.res.rr.com) Quit (Remote host closed the connection)
[16:23] <Sysadmin88_> pressureman - if you had a sequential write it would be split into 4MB chunks across the cluster...
[16:23] <Sysadmin88_> so it becomes random
[16:23] <Vacum_> Isotopp: if the tmp mount is only needed read-only (which I don't know), could the xfs fs then be mounted without the log ("recovery")?
[16:24] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[16:24] <pressureman> Sysadmin88_, hmm, good point. never thought of it like that... but why then are there separate options for sequential read / random read
[16:24] * rotbeard (~redbeard@2a02:908:df19:4b80:76f0:6dff:fe3b:994d) has joined #ceph
[16:25] <Sysadmin88_> maybe the random would be more random in the cluster... loading lots of bits at once instead of 4mb at a time from hosts?
[16:25] <Sysadmin88_> try it and see what the results are
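
For reference, the modes in question can be compared directly with rados bench; a rough sketch with a placeholder pool name (the rand mode and the --no-cleanup flag exist on recent releases, older ones may only offer write and seq):

    rados bench -p testpool 60 write --no-cleanup   # leave the benchmark objects in place
    rados bench -p testpool 60 seq                  # sequential reads of those objects
    rados bench -p testpool 60 rand                 # random reads of those objects
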
[16:25] <Isotopp> Vacum_: afaik no
[16:26] <Isotopp> Vacum_: b/c it does potentially a log recovery and it cannot know if it has to, without the log
[16:26] <Isotopp> Vacum_: so without the log it does not know, even for read, if it is a consistent image
[16:26] <Vacum_> Isotopp: mh. too bad.
[16:27] <Isotopp> ganders: i have no mb/s numbers, they are not interesting for my use case. i can always provide the required mb/s some way or the other. the scarce resource is commit/s in a database on a network drive, or fsync/s, which is about the same resource
[16:27] <Isotopp> Vacum_: i think ultimately the problem is in xfs ballpark
[16:27] <ganders> Isotopp: got it
[16:27] <Isotopp> Vacum_: i think xfs should note the uuid of the logdev in the disk superblock
[16:27] <Isotopp> Vacum_: and the uuid of the data disk in a header of the logdev
[16:28] <Isotopp> Vacum_: instead of requiring that number to be present in each and every mount call.
[16:28] <Isotopp> Vacum_: i bet there is a limitation in the linux kernel that makes it a requirement to handle it the way it is being handled.
[16:29] * branto is now known as brantoAway
[16:29] * brantoAway is now known as branto
[16:29] <Vacum_> Isotopp: I see. Alternatively the pairing needs to be stored on the system drive, which would make it a single point of failure for all xfs volumes :/ otoh, this is already the case if one uses ceph with dmcrypt. the keys have to be put in a backup too
[16:30] <mgarcesMZ> pressureman: sage does not always drop by.. but when he does, he helps out a lot! :)
[16:30] * joerocklin (~joe@cpe-65-185-149-56.woh.res.rr.com) has joined #ceph
[16:30] <Isotopp> Vacum_: i solved the problem by simply creating a fstab
[16:31] <Isotopp> Vacum_: the existing scripting does a lot of work, all of which is not done when the filesystems are already mounted.
[16:31] <Isotopp> Vacum_: so in my case there is a lot of useless code not being called :-)
[16:31] <Vacum_> Isotopp: that is a neat solution :)
[16:31] <Vacum_> so you disabled the udev rules too?
[16:31] <Isotopp> no
[16:32] <Isotopp> the udev just drops everything it tries to do when it finds the stuff already being present
[16:32] * garphy is now known as garphy`aw
[16:32] <Vacum_> Isotopp: so the udev rules fire *after* fstab?
[16:32] <Isotopp> fstab is about the first thing that the system processes on boot
[16:32] <Isotopp> yes, after fstab
[16:33] <Vacum_> ok. thanks for the insight :)
[16:33] <Vacum_> (we currently have the issue that udev fires so early on that we do not have enough randomness available to open all dmcrypt'ed osds...)
[16:33] <Isotopp> anyway, apparently none of the performance advantages of a local fusion io can be realized by the cluster, using a /dev/vdb on rbd
[16:34] <Isotopp> the single instance is nowhere close to saturating anything, and the IOPS are always the same
[16:34] <Isotopp> xfs journal disk, osd journal disk = 230 IOPS in iozone single threaded
[16:34] <Isotopp> xfs journal fio, osd journal disk = 230 iops in iozone single threaded
[16:34] <Isotopp> xfs journal fio, osd journal fio = 230 iops in iozone single threaded
[16:35] <Isotopp> and spindle load < 1.5% on all spindles at all times
[16:35] <Isotopp> so latency appears to be the dominating and limiting factor here
[16:35] <Isotopp> and that is unchanged
[16:35] <Isotopp> 16k random-write in a 4g file.
[16:35] <Vacum_> how many osds on how many nodes did you use for this?
[16:37] * marvin0815_home (~oliver@ip18860e40.dynamic.kabel-deutschland.de) Quit (Ping timeout: 480 seconds)
[16:38] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) has joined #ceph
[16:39] <Isotopp> i have at the moment 3 nodes running ceph, dl380g8 with 256g of memory and six 2.5" 900g disks as osds per node.
[16:39] <Isotopp> also, in each node, a 160g fusion-io card
[16:39] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[16:40] <Isotopp> the disks are connected to a p830 controller as single-disk raid-0 devices (the smartarray family in hp cannot handle jbod)
[16:40] <Isotopp> and the p830 is equipped with 4g of bbu memory
[16:40] * markbby (~Adium@168.94.245.2) has joined #ceph
[16:40] <Isotopp> my hp380g8 has 40 cores.
[16:40] <Isotopp> (well, 20 really, with ht enabled)
[16:40] * lczerner (~lczerner@ip56-4.tvtrinec.cz) Quit (Ping timeout: 480 seconds)
[16:41] <Isotopp> and i actually have 10 of them, but that is a lab cluster and the rest is used for other stuff such as running contrail, testing ironic and so on
[16:41] <Isotopp> 2014-08-12 16:41:31.577458 mon.0 [INF] pgmap v53675: 2560 pgs: 2560 active+clean; 32900 MB data, 107 GB used, 14964 GB / 15071 GB avail
[16:42] <Isotopp> osdmap e2124: 18 osds: 18 up, 18 in
[16:43] <Isotopp> my 32-way multithread performance is
[16:43] <Isotopp> Children see throughput for 32 random writers = 3693.91 ops/sec
[16:43] * Andreas-IPO (~andreas@2a01:2b0:2000:11::cafe) Quit (Read error: Connection reset by peer)
[16:43] * stxShadow (~jens@ip-84-119-161-124.unity-media.net) has joined #ceph
[16:43] <Isotopp> or, with fio
[16:43] <Isotopp> write: io=4193.5MB, bw=71527KB/s, iops=4470, runt=60034msec
[16:43] <Isotopp> (also 32-way 16k random write)
[16:43] * stxShadow (~jens@ip-84-119-161-124.unity-media.net) has left #ceph
[16:44] * stxShadow (~jens@ip-84-119-161-124.unity-media.net) has joined #ceph
[16:44] <Vacum_> with a replica size of 3 this means your iops are effectively satisfied by 6 disks.
[16:44] * Andreas-IPO (~andreas@2a01:2b0:2000:11::cafe) has joined #ceph
[16:44] <Isotopp> yes
[16:47] <Vacum_> so it's 600 ops/sec per disk, although I'm not sure how/if rbd caching comes into play.
[16:48] <Vacum_> But I agree. generally your cluster won't be able to compare to a local fusion-io card. the network latency alone...
[16:48] <Isotopp> yes, i need to look into other storage solutions as well,
[16:48] <Isotopp> ploop, openvstorage, stuff like that
[16:48] <Isotopp> that takes local writes locally and also streams them off-device.
[16:49] <Isotopp> using log-merge strategies
[16:49] <Vacum_> sounds "eventually consistent" :)
[16:49] * odyssey4me_ (~odyssey4m@41-135-190-139.dsl.mweb.co.za) has joined #ceph
[16:49] * hasues (~hasues@kwfw01.scrippsnetworksinteractive.com) has joined #ceph
[16:49] <Isotopp> in the case of a loss of local storage, it may or may not be, depending on the wait-strategy (when do you ack a write)
[16:50] <Vacum_> which value does min_size have for your rbd pool?
[16:50] <Isotopp> on the other hand people that are requiring that kind of performance are also known to run mysql with innodb_flush_log_at_trx_commit = 0
[16:51] <Isotopp> what command do you want me to run?
[16:51] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[16:51] <Vacum_> ceph osd dump |grep pool
[16:51] <Isotopp> root@cloud12:~# ceph osd pool get rbd min_size
[16:51] <Isotopp> min_size: 1
[16:51] <Isotopp> root@cloud12:~# ceph osd pool get rbdcinder min_size
[16:51] <Isotopp> min_size: 2
[16:51] <Isotopp> i think the rbdcinder pool is relevant here
[16:52] <Isotopp> root@cloud12:~# ceph osd pool get rbdcinder size
[16:52] <Isotopp> size: 3
[16:52] * lalatenduM (~lalatendu@121.244.87.117) Quit (Quit: Leaving)
[16:52] * odyssey4me__ (~odyssey4m@165.233.71.2) has joined #ceph
[16:53] <Vacum_> min_size defines when the blocking write to the primary pg comes back. so with min_size 2 the write returns after 2 osds have the write in their journal. (the primary plus one replica)
[16:53] <Vacum_> with min_size 1 it's enough if the primary pg has it in its journal
[16:54] <Vacum_> min_size 1 is a tad risky - if that osd dies before it was able to send the write to at least one additional replica, the write is lost
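The values Vacum_ and Isotopp are comparing can be read and changed per pool with the ceph CLI; a small sketch using the rbdcinder pool named above (adjust values with care on a live cluster):

    ceph osd dump | grep pool                  # size/min_size for every pool at a glance
    ceph osd pool get rbdcinder min_size       # minimum replicas required for I/O
    ceph osd pool set rbdcinder min_size 2     # raise it back if it was lowered for testing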
[16:55] * stxShadow (~jens@ip-84-119-161-124.unity-media.net) Quit (Read error: Connection reset by peer)
[16:55] * dmsimard_away is now known as dmsimard
[16:57] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[16:58] * ircolle-afk is now known as ircolle
[16:58] * odyssey4me__ (~odyssey4m@165.233.71.2) Quit (Quit: Leaving)
[16:58] * cok (~chk@2a02:2350:18:1012:a5b0:3d17:aade:26) Quit (Quit: Leaving.)
[16:59] * odyssey4me_ (~odyssey4m@41-135-190-139.dsl.mweb.co.za) Quit (Ping timeout: 480 seconds)
[16:59] * angdraug (~angdraug@c-67-169-181-128.hsd1.ca.comcast.net) has joined #ceph
[17:00] * branto (~branto@nat-pool-brq-t.redhat.com) has left #ceph
[17:00] * branto (~branto@nat-pool-brq-t.redhat.com) has joined #ceph
[17:00] * branto is now known as branto_out
[17:01] * branto_out is now known as branto
[17:01] * kevinc (~kevinc__@client65-44.sdsc.edu) has joined #ceph
[17:02] * thurloat (~oftc-webi@104.37.192.5) Quit (Quit: Page closed)
[17:05] * EchoMike (~oftc-webi@63-234-143-34.dia.static.qwest.net) Quit (Remote host closed the connection)
[17:06] * garphy`aw is now known as garphy
[17:06] * analbeard (~shw@support.memset.com) has left #ceph
[17:11] * lupu (~lupu@86.107.101.214) Quit (Ping timeout: 480 seconds)
[17:14] * i_m (~ivan.miro@gbibp9ph1--blueice2n1.emea.ibm.com) Quit (Quit: Leaving.)
[17:15] * linjan (~linjan@82.102.126.145) has joined #ceph
[17:17] <linjan> hello! is there any way to install ceph via ceph-deploy on centos 7? thx
[17:17] <alfredodeza> linjan: there is a known bug that just got fixed in master
[17:17] <alfredodeza> the workaround is to pass in the repo url and gpg url directly
[17:18] <alfredodeza> ceph-deploy install --repo-url http://... --gpg-url http://.... {NODES}
[17:18] <linjan> alfredodeza: but... i have not found ceph repo for centos 7
[17:18] <alfredodeza> linjan: el7
[17:18] <alfredodeza> is what you want
[17:19] <linjan> alfredodeza: thank you!
[17:19] * markbby (~Adium@168.94.245.2) has joined #ceph
[17:19] <linjan> another question: can i mix centos 6 and centos 7 nodes?
[17:19] <alfredodeza> you can from ceph-deploy's perspective, not sure if ceph will not like that
[17:20] <linjan> alfredodeza: sorry, what means "not sure if ceph will not like that"?
[17:21] <alfredodeza> linjan: I am not sure if there is something that ceph will complain about if you install it on different distro versions
[17:21] <alfredodeza> as in: it may work but I am not sure
[17:21] <alfredodeza> you can install on different distros even with ceph-deploy
[17:21] <alfredodeza> so no problem there
[17:22] <linjan> alfredodeza: thank you again!
[17:22] <alfredodeza> no problem
[17:24] <linjan> alfredodeza: sorry, i visited http://ceph.com/rpm-firefly/ and didnt find el7 :(
[17:24] <mgarcesMZ> linjan: I am using centos7
[17:25] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[17:25] <mgarcesMZ> word of advice
[17:25] <mgarcesMZ> go with fedora19 or centos6
[17:25] <alfredodeza> linjan: we have not packaged el7 for firefly yet, sorry :/
[17:25] <linjan> alfredodeza: thx
[17:25] <mgarcesMZ> linjan: there are still missing packages for el7
[17:25] <mgarcesMZ> took me a few days to find this
[17:25] <mgarcesMZ> :(
[17:26] <mgarcesMZ> linjan: you can't find el7 but you have rhel7
[17:26] <mgarcesMZ> http://ceph.com/rpm-firefly/rhel7/x86_64/
[17:26] <mgarcesMZ> but go with fedora19, or if you cant do that, go with centos6
[17:27] <mgarcesMZ> rhel7 is almost the same as f19
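Putting alfredodeza's workaround together with the rhel7 repository URL mgarcesMZ pasted, the install call would look roughly like the sketch below; the node name is a placeholder and the gpg key URL is an assumption (check the current release key location), neither is confirmed in this log:

    ceph-deploy install \
        --repo-url http://ceph.com/rpm-firefly/rhel7/ \
        --gpg-url 'http://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' \
        node1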
[17:27] * jrankin (~jrankin@d47-69-66-231.try.wideopenwest.com) has joined #ceph
[17:28] * branto (~branto@nat-pool-brq-t.redhat.com) has left #ceph
[17:30] * erice (~erice@50.245.231.209) has joined #ceph
[17:31] * EchoMike (~oftc-webi@63-234-143-34.dia.static.qwest.net) has joined #ceph
[17:31] * tupper (~chatzilla@216.1.187.164) Quit (Remote host closed the connection)
[17:33] * mnaser (~textual@MTRLPQ5401W-LP130-02-1178024983.dsl.bell.ca) has joined #ceph
[17:34] * adamcrume (~quassel@50.247.81.99) has joined #ceph
[17:37] * sputnik13 (~sputnik13@207.8.121.241) has joined #ceph
[17:37] <mnaser> i've been doing a lot of reading about ceph and looking at deployment scenarios... it looks like for my use case, erasure coded + cache tiering is the best solution for me to have hot data stored on fast ssds and cold data on large drives... erasure coded cold pool will help increase the capacity as well (at the cost of slower performance, but my hot pool should handle that anyways)
[17:37] <mnaser> does that seem like a sane idea... should I look into other options?
[17:42] * linuxkidd (~linuxkidd@cpe-066-057-017-151.nc.res.rr.com) Quit (Quit: Leaving)
[17:42] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) has joined #ceph
[17:43] * ade (~abradshaw@193.202.255.218) Quit (Quit: Too sexy for his shirt)
[17:44] <gleam> as long as you can deal with the latency hit and added cpu/network requirements then sounds totally sane
[17:45] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[17:46] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) Quit (Read error: Operation timed out)
[17:46] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) has joined #ceph
[17:49] * aknapp (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) has joined #ceph
[17:49] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[17:50] <mnaser> gleam: if i understand correctly, added cpu/network and latency hit is only an issue when hitting cold storage pool?
[17:51] <mnaser> so ideally adding a bigger hot storage as i go to minimize my cold storage hits would be best?
[17:52] <ifur_> mnaser: do you want to use bcache?
[17:52] <gleam> the network hit is going to come during rebuild
[17:53] <gleam> (although i may be misremembering)
[17:53] <gleam> as is most of the hit of EC
[17:53] * linuxkidd (~linuxkidd@cpe-066-057-017-151.nc.res.rr.com) has joined #ceph
[17:54] * reed (~reed@75-101-54-131.dsl.static.sonic.net) has joined #ceph
[17:56] * linjan (~linjan@82.102.126.145) Quit (Ping timeout: 480 seconds)
[17:56] * EchoMike (~oftc-webi@63-234-143-34.dia.static.qwest.net) Quit (Remote host closed the connection)
[17:59] <stupidnic> gleam: EC?
[18:00] <mnaser> gleam: hmm.. i think for the increased performance its worth it overall
[18:00] <mnaser> ifur_: no, the cache pool / tiering feature
[18:00] <mnaser> stupidnic: erasure coded pools
[18:00] <stupidnic> ah okay
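A sketch of the layout mnaser describes, i.e. an erasure coded cold pool fronted by a replicated cache tier, using firefly-era commands; pool names, PG counts and k/m values are illustrative assumptions, and placing the hot pool on the SSDs additionally needs a CRUSH rule that selects the SSD OSDs (not shown):

    # erasure coded base pool for cold data
    ceph osd erasure-code-profile set ecprofile k=4 m=2 ruleset-failure-domain=host
    ceph osd pool create cold 1024 1024 erasure ecprofile
    # replicated hot pool acting as a writeback cache tier in front of it
    ceph osd pool create hot 128 128
    ceph osd tier add cold hot
    ceph osd tier cache-mode hot writeback
    ceph osd tier set-overlay cold hot
    ceph osd pool set hot hit_set_type bloom    # hit sets let the tier track hot objects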
[18:00] * rotbeard (~redbeard@2a02:908:df19:4b80:76f0:6dff:fe3b:994d) Quit (Quit: Verlassend)
[18:00] * Pedras1 (~Adium@50.185.218.255) has joined #ceph
[18:01] * oro (~oro@2001:620:20:222:7539:3765:ca0b:40df) Quit (Ping timeout: 480 seconds)
[18:01] * alram (~alram@38.122.20.226) has joined #ceph
[18:02] * wedge_ (lordsilenc@bigfoot.xh.se) has joined #ceph
[18:02] * wedge (lordsilenc@bigfoot.xh.se) Quit (Read error: Connection reset by peer)
[18:03] * tupper (~chatzilla@38.106.55.34) has joined #ceph
[18:03] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) Quit (Ping timeout: 480 seconds)
[18:03] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) has joined #ceph
[18:04] * capri (~capri@212.218.127.222) Quit (Read error: Connection reset by peer)
[18:04] * capri (~capri@212.218.127.222) has joined #ceph
[18:05] * nljmo_ (~nljmo@84.246.29.1) Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[18:08] * danieljh (~daniel@0001b4e9.user.oftc.net) has joined #ceph
[18:09] * tupper (~chatzilla@38.106.55.34) Quit (Remote host closed the connection)
[18:09] * tupper (~chatzilla@38.106.55.34) has joined #ceph
[18:13] * Pedras1 (~Adium@50.185.218.255) Quit (Quit: Leaving.)
[18:14] * mgarcesMZ (~mgarces@5.206.228.5) Quit (Quit: mgarcesMZ)
[18:14] * Pedras1 (~Adium@50.185.218.255) has joined #ceph
[18:16] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) Quit (Remote host closed the connection)
[18:17] * mnaser (~textual@MTRLPQ5401W-LP130-02-1178024983.dsl.bell.ca) Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[18:17] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: leaving)
[18:18] * tupper (~chatzilla@38.106.55.34) Quit (Ping timeout: 480 seconds)
[18:18] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) has joined #ceph
[18:18] * tupper (~chatzilla@38.122.20.226) has joined #ceph
[18:21] * EchoMike (~oftc-webi@63-234-143-34.dia.static.qwest.net) has joined #ceph
[18:21] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) Quit (Ping timeout: 480 seconds)
[18:22] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) has joined #ceph
[18:26] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[18:26] * b0e (~aledermue@x2f24848.dyn.telefonica.de) has joined #ceph
[18:27] * kevinc (~kevinc__@client65-44.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[18:29] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Quit: Ex-Chat)
[18:29] * Pedras1 (~Adium@50.185.218.255) Quit (Quit: Leaving.)
[18:30] * kevinc (~kevinc__@client65-44.sdsc.edu) has joined #ceph
[18:30] * Pedras (~Adium@50.185.218.255) has joined #ceph
[18:31] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[18:32] * Concubidated (~Adium@2607:f298:a:607:61fb:7793:6e08:65f0) has joined #ceph
[18:33] * garphy is now known as garphy`aw
[18:34] * markbby (~Adium@168.94.245.2) has joined #ceph
[18:34] * b0e (~aledermue@x2f24848.dyn.telefonica.de) Quit (Quit: Leaving.)
[18:35] * garphy`aw is now known as garphy
[18:36] * danieljh (~daniel@0001b4e9.user.oftc.net) Quit (Quit: Lost terminal)
[18:40] * xdeller (~xdeller@h195-91-128-218.ln.rinet.ru) Quit (Ping timeout: 480 seconds)
[18:40] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) Quit (Ping timeout: 480 seconds)
[18:40] * xdeller (~xdeller@109.188.124.35) has joined #ceph
[18:40] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) has joined #ceph
[18:42] * Pedras (~Adium@50.185.218.255) Quit (Quit: Leaving.)
[18:43] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[18:45] * angdraug (~angdraug@c-67-169-181-128.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[18:48] * lcavassa (~lcavassa@2-229-47-79.ip195.fastwebnet.it) Quit (Quit: Leaving)
[18:51] * Pedras (~Adium@50.185.218.255) has joined #ceph
[18:55] * shane__ (~shane@69.43.177.46) has joined #ceph
[18:58] * rturk|afk is now known as rturk
[19:02] * xarses (~andreww@12.164.168.117) has joined #ceph
[19:02] * shane_ (~shane@69.43.177.46) Quit (Ping timeout: 480 seconds)
[19:05] * xdeller (~xdeller@109.188.124.35) Quit (Ping timeout: 480 seconds)
[19:05] * xdeller (~xdeller@h195-91-128-218.ln.rinet.ru) has joined #ceph
[19:06] * adamcrume (~quassel@50.247.81.99) Quit (Remote host closed the connection)
[19:09] * sjustwork (~sam@2607:f298:a:607:7459:7c5f:d76d:77c5) has joined #ceph
[19:11] * rweeks (~rweeks@pat.hitachigst.com) has joined #ceph
[19:14] * BManojlovic (~steki@91.195.39.5) Quit (Ping timeout: 480 seconds)
[19:25] * rmoe (~quassel@173-228-89-134.dsl.static.sonic.net) Quit (Ping timeout: 480 seconds)
[19:30] * baylight (~tbayly@74-220-196-40.unifiedlayer.com) has joined #ceph
[19:31] * danieljh (~daniel@0001b4e9.user.oftc.net) has joined #ceph
[19:35] * lalatenduM (~lalatendu@122.172.210.49) has joined #ceph
[19:36] * adamcrume (~quassel@c-71-204-162-10.hsd1.ca.comcast.net) has joined #ceph
[19:37] * arcimboldo (~antonio@130.60.144.205) Quit (Ping timeout: 480 seconds)
[19:41] * angdraug (~angdraug@12.164.168.117) has joined #ceph
[19:53] * dmsimard is now known as dmsimard_away
[19:58] * neurodrone (~neurodron@static-108-30-171-7.nycmny.fios.verizon.net) has joined #ceph
[20:09] * DV__ (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[20:09] <badger32d>
[20:09] * badger32d (~badger@71-209-36-209.bois.qwest.net) Quit (Quit: leaving)
[20:10] * maxxware (~maxx@149.210.133.105) has joined #ceph
[20:14] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Quit: Leaving.)
[20:15] * alram_ (~alram@38.122.20.226) has joined #ceph
[20:16] * dmsimard_away is now known as dmsimard
[20:17] * BManojlovic (~steki@95.180.4.243) has joined #ceph
[20:22] * alram (~alram@38.122.20.226) Quit (Ping timeout: 480 seconds)
[20:22] * alram_ (~alram@38.122.20.226) Quit (Quit: leaving)
[20:22] * alram (~alram@38.122.20.226) has joined #ceph
[20:23] * rweeks (~rweeks@pat.hitachigst.com) Quit (Quit: Leaving)
[20:23] * rmoe (~quassel@12.164.168.117) has joined #ceph
[20:25] * gregsfortytwo (~Adium@38.122.20.226) has joined #ceph
[20:26] * gregsfortytwo1 (~Adium@2607:f298:a:607:d0f3:bb61:8cd0:b373) Quit (Read error: Connection reset by peer)
[20:28] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[20:29] * lalatenduM (~lalatendu@122.172.210.49) Quit (Quit: Leaving)
[20:29] * rturk is now known as rturk|afk
[20:32] * kevinc (~kevinc__@client65-44.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[20:33] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[20:35] <swat30> sage, thanks again for your help! we were able to get the cluster to nearly 100%. there are a couple of old pgs that have been hanging around for a long time due to some old nodes that have since passed
[20:36] * lupu (~lupu@86.107.101.214) has joined #ceph
[20:36] <swat30> we are running into an issue with some rbd images still, seems that the client is looking for them where they aren't
[20:36] * mnaser (~textual@MTRLPQ5401W-LP130-02-1178024983.dsl.bell.ca) has joined #ceph
[20:37] <swat30> http://pastebin.com/D2tdV6iK
[20:39] * EchoMike (~oftc-webi@63-234-143-34.dia.static.qwest.net) Quit (Remote host closed the connection)
[20:40] * kevinc (~kevinc__@client65-44.sdsc.edu) has joined #ceph
[20:44] * rturk|afk is now known as rturk
[20:47] * jakes (~oftc-webi@128-107-239-233.cisco.com) has joined #ceph
[20:47] <jakes> is on-disk encryption supported by ceph in latest release?
[20:48] * BManojlovic (~steki@95.180.4.243) Quit (Ping timeout: 480 seconds)
[20:52] <ganders> someone know what could be the issue here? http://pastebin.com/raw.php?i=ysXDUPrg
[20:52] <ganders> the cluster is HEALTH_OK
[20:55] * aknapp_ (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) has joined #ceph
[20:56] * ganders (~root@200-127-158-54.net.prima.net.ar) Quit (Quit: WeeChat 0.4.1)
[20:59] * jakes (~oftc-webi@128-107-239-233.cisco.com) Quit (Remote host closed the connection)
[21:02] * aknapp (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) Quit (Ping timeout: 480 seconds)
[21:02] * kevinc (~kevinc__@client65-44.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[21:03] * rturk is now known as rturk|afk
[21:07] * reed (~reed@75-101-54-131.dsl.static.sonic.net) Quit (Ping timeout: 480 seconds)
[21:08] * danieagle (~Daniel@179.184.165.184.static.gvt.net.br) has joined #ceph
[21:09] * DV_ (~veillard@2001:41d0:1:d478::1) has joined #ceph
[21:13] * jjgalvez (~JuanJose@ip72-193-50-198.lv.lv.cox.net) has joined #ceph
[21:14] * rendar (~I@host175-179-dynamic.10-87-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[21:15] * JC (~JC@AMontpellier-651-1-445-156.w81-251.abo.wanadoo.fr) has joined #ceph
[21:16] * BManojlovic (~steki@212.200.65.143) has joined #ceph
[21:16] * rendar (~I@host175-179-dynamic.10-87-r.retail.telecomitalia.it) has joined #ceph
[21:17] * bandrus (~Adium@38.122.20.226) has joined #ceph
[21:17] * JC2 (~JC@AMontpellier-651-1-445-156.w81-251.abo.wanadoo.fr) has joined #ceph
[21:21] * JC1 (~JC@AMontpellier-651-1-445-156.w81-251.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[21:23] * JC (~JC@AMontpellier-651-1-445-156.w81-251.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[21:23] * rturk|afk is now known as rturk
[21:31] * kevinc (~kevinc__@client65-44.sdsc.edu) has joined #ceph
[21:31] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Quit: Leaving.)
[21:32] <swat30> has anyone run into misdirected client issues that prevent rbd from loading images?
[21:34] <steveeJ> swat30: what is a misdirected client issue?
[21:34] <swat30> steveeJ, http://pastebin.com/D2tdV6iK
[21:34] <swat30> found this http://tracker.ceph.com/issues/8226, not sure if it's relevant though
[21:37] <mnaser> looking at erasure coded pools in ceph.. looks like the default is k=2, m=1 .. isnt that a bit unsafe?
[21:37] <mnaser> because of the scale that ceph operates at, it seems to be pretty risky
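The default mnaser is questioning can be inspected, and a more durable profile defined before creating the pool, with the erasure-code-profile commands; the k/m values below are just example assumptions:

    ceph osd erasure-code-profile get default                            # shows k=2 m=1
    ceph osd erasure-code-profile set safer k=8 m=3 ruleset-failure-domain=host
    ceph osd pool create ecpool 1024 1024 erasure safer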
[21:38] * keksior (~oftc-webi@35.120.225.195.static.bait.pl) has joined #ceph
[21:39] <keksior> hello. has anyone had a problem with a 4-5 load average on a clean os with only 7 osds? (24 cores + 124 GB RAM)
[21:39] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[21:39] <keksior> iotop shows lots of kworker using 70-100% io
[21:40] <steveeJ> swat30: reading through that bug, have you also played with the OSD weight before that happened?
[21:41] <swat30> steveeJ, we had one OSD set itself to 0.4 b/c it was getting more full than the others
[21:41] <swat30> it's back at 1 now
[21:42] <swat30> PS we're running cuttlefish
[21:42] <steveeJ> keksior: keep in mind that iotop shows relative I/O. also a load of 4-5 is not automatically a problem when the system has 24 cores
[21:43] <keksior> steveeJ: but it's weird that the second of my servers, which is identical to this one, shows 0.8 to 1.2
[21:43] <keksior> that's the reason i'm trying to track this down
[21:44] <steveeJ> identical in hardware only or also in software setup?
[21:45] * aknapp_ (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) Quit (Remote host closed the connection)
[21:46] * aknapp (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) has joined #ceph
[21:46] <keksior> steveeJ: identical in hardware and software; it was installed last week and the osds were cleanly added to my cluster
[21:48] <keksior> steveeJ: http://postimg.org/image/ofs2z9c3d/ here's a screenshot of kworker using lots of io. On the second server there aren't so many kworker processes using much io
[21:50] * kevinc (~kevinc__@client65-44.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[21:54] * aknapp (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) Quit (Ping timeout: 480 seconds)
[21:54] <steveeJ> keksior: looks like it's writing a lot yes, but that data actually has to come from somewhere
[21:55] <keksior> but shouldn't the ceph osds be doing those writes, if it's in the cluster? why is kworker doing the writing?
[21:55] * shane__ (~shane@69.43.177.46) Quit (Quit: Leaving)
[21:58] * mnaser (~textual@MTRLPQ5401W-LP130-02-1178024983.dsl.bell.ca) Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[21:59] * kevinc (~kevinc__@client65-44.sdsc.edu) has joined #ceph
[21:59] <keksior> steveeJ: on the rest of my nodes the kworker processes don't use as much io as on this one.
[21:59] <steveeJ> are you sure that ceph is causing the load at all? you should find the origin of that workload. maybe this helps: http://unix.stackexchange.com/questions/22851/why-is-kworker-consuming-so-many-resources-on-linux-3-0-0-12-server/65270#65270
[21:59] * JC (~JC@AMontpellier-651-1-445-156.w81-251.abo.wanadoo.fr) has joined #ceph
[22:00] * JC2 (~JC@AMontpellier-651-1-445-156.w81-251.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[22:01] * Lyfe (~lyfe@hactar.gameowls.com) Quit (Quit: .)
[22:03] * aknapp (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) has joined #ceph
[22:03] <keksior> steveeJ: when i stopped all osds the load average dropped to 0.1-0.3. i'll try to track down why kworker uses so much io with clues from the link you sent me :), tomorrow i'll try this out. Thanks for the tip
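The approach in the answer steveeJ linked boils down to enabling the kernel's workqueue tracepoints so the work items queued by the kworker threads can be seen by name; a rough sketch, assuming debugfs is mounted and the commands are run as root:

    cd /sys/kernel/debug/tracing
    echo workqueue:workqueue_queue_work > set_event   # log every work item as it is queued
    cat trace_pipe | head -n 100                      # the function field names the code submitting the work
    echo > set_event                                  # disable the tracepoint again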
[22:04] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving)
[22:07] * keksior (~oftc-webi@35.120.225.195.static.bait.pl) Quit (Quit: Page closed)
[22:09] * tupper (~chatzilla@38.122.20.226) Quit (Remote host closed the connection)
[22:10] * tupper (~chatzilla@2607:f298:a:607:2677:3ff:fe64:c3f4) has joined #ceph
[22:11] * rturk is now known as rturk|afk
[22:15] <gleam> would someone using ceph+glance and the rbd-ephemeral-clone branch be willing to share the image location for one of their images? select value from image_locations where image_id='uuid'
[22:18] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) has joined #ceph
[22:19] <swat30> steveeJ, sorry, had to step away for a few minutes
[22:19] <swat30> able to provide any insight?
[22:25] * Nacer (~Nacer@pai34-4-82-240-124-12.fbx.proxad.net) Quit (Read error: Operation timed out)
[22:25] <steveeJ> swat30: not really, but i think that issue you found is related. besides that, have you tried lowering the min_size just for testing?
[22:26] <swat30> steveeJ, I haven't. would that affect existing data?
[22:28] <steveeJ> no, since it only changes the needed replicas to acknowledge a write-request to the client
[22:29] * bandrus1 (~Adium@38.122.20.226) has joined #ceph
[22:30] * thb (~me@0001bd58.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:31] <swat30> ok cool
[22:31] * tupper (~chatzilla@2607:f298:a:607:2677:3ff:fe64:c3f4) Quit (Ping timeout: 480 seconds)
[22:31] <steveeJ> have you recently changed your tunables?
[22:32] * Concubidated1 (~Adium@2607:f298:a:607:801f:8c8d:4873:6920) has joined #ceph
[22:32] <steveeJ> the issue you linked says "chooseleaf_vary_r" in the patch title
[22:32] <steveeJ> but that tunable is not available in your running version of ceph
[22:34] <steveeJ> have you seen anything like http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client/ in the ceph logs?
[22:34] * bandrus (~Adium@38.122.20.226) Quit (Ping timeout: 480 seconds)
[22:34] * Concubidated (~Adium@2607:f298:a:607:61fb:7793:6e08:65f0) Quit (Ping timeout: 480 seconds)
[22:35] <swat30> steveeJ, haven't changed anything. just checked min_size, it's 1
[22:35] * alram (~alram@38.122.20.226) Quit (Ping timeout: 480 seconds)
[22:35] <steveeJ> swat30: what's the replica size of the pool?
[22:35] <swat30> 2
[22:36] * rweeks (~rweeks@pat.hitachigst.com) has joined #ceph
[22:39] <swat30> steveeJ, I'm seeing this misdirected client.1117485.0:9 pg 6.e4d30207 to osd.20 in e18721, client e18721 pg 6.7 features 17179869183
[22:39] <swat30> similar, but not exact
[22:43] <steveeJ> i can't find very much information (http://ceph.com/docs/master/dev/osd_internals/map_message_handling/) without actually looking through the ceph source code
[22:43] * rotbeard (~redbeard@2a02:908:df19:4b80:76f0:6dff:fe3b:994d) has joined #ceph
[22:45] <rweeks> hey guys, remind me if Mark Nelson hangs out in here?
[22:45] <rweeks> I am lame and forgot his nick.
[22:45] <rweeks> <.<
[22:45] * aknapp_ (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) has joined #ceph
[22:46] * aknapp__ (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) has joined #ceph
[22:46] * aknapp_ (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) Quit (Read error: Connection reset by peer)
[22:48] * tupper (~chatzilla@2607:f298:a:607:2677:3ff:fe64:c3f4) has joined #ceph
[22:50] * alram (~alram@38.122.20.226) has joined #ceph
[22:52] * aknapp (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) Quit (Ping timeout: 480 seconds)
[22:55] * brad_mssw (~brad@shop.monetra.com) Quit (Quit: Leaving)
[22:56] * angdraug (~angdraug@12.164.168.117) Quit (Quit: Leaving)
[23:00] * sputnik13 (~sputnik13@207.8.121.241) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[23:01] * anth1y (~ant@12.219.154.180) has joined #ceph
[23:04] * jpierre03 (~jpierre03@5275675.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[23:04] * jpierre03_ (~jpierre03@voyage.prunetwork.fr) Quit (Read error: Connection reset by peer)
[23:05] * jpierre03 (~jpierre03@voyage.prunetwork.fr) has joined #ceph
[23:06] * jpierre03_ (~jpierre03@voyage.prunetwork.fr) has joined #ceph
[23:08] * Concubidated1 (~Adium@2607:f298:a:607:801f:8c8d:4873:6920) Quit (Ping timeout: 480 seconds)
[23:08] * aknapp__ (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) Quit (Read error: Connection reset by peer)
[23:08] * aknapp (~aknapp@fw125-01-outside-active.ent.mgmt.glbt1.secureserver.net) has joined #ceph
[23:09] * sputnik13 (~sputnik13@207.8.121.241) has joined #ceph
[23:10] * joef (~Adium@2620:79:0:2420::4) has joined #ceph
[23:11] <anth1y> hello, we have just deployed ceph and we are using the radosgw. How do we go about doing backups?
[23:11] * CephFan1 (~textual@68-233-224-175.static.hvvc.us) Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[23:11] <anth1y> thanks in advance
[23:11] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[23:12] * jrankin (~jrankin@d47-69-66-231.try.wideopenwest.com) Quit (Quit: Leaving)
[23:12] * dneary (~dneary@172.56.23.250) has joined #ceph
[23:13] <Sysadmin88_> the question might be, how do you usually handle backups :)
[23:13] <anth1y> rsync tar etc
[23:14] <anth1y> so rsync the buckets?
[23:14] <Sysadmin88_> some of the datasets ceph deals with would be too large to back up. everything depends on your environment
[23:15] <anth1y> right
[23:16] <anth1y> so I was just wondering if ceph had native tools to backup rgw
[23:16] * JC1 (~JC@AMontpellier-651-1-445-156.w81-251.abo.wanadoo.fr) has joined #ceph
[23:16] <Sysadmin88_> i think they are still working on the geographical replication...
[23:17] <rweeks> geo replication went in the D release
[23:17] <anth1y> is there documentation i can look at?
[23:17] <rweeks> hm
[23:18] <gleam> look at radosgw-agent
[23:18] <rweeks> http://ceph.com/docs/master/radosgw/federated-config/
[23:19] * markbby (~Adium@168.94.245.2) has joined #ceph
[23:19] <anth1y> https://github.com/ceph/radosgw-agent
[23:19] <anth1y> ?
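For the geo-replication route gleam and rweeks point at, radosgw-agent is driven by a small config file naming the source and destination zone endpoints and their system user keys, and then run against that file; everything below is a placeholder sketch, the real parameter names and values are documented in the federated-config page linked above:

    # write a placeholder sync config (values are assumptions, not from this log)
    cat > sync.conf <<'EOF'
    src_access_key: <source-zone-system-user-access-key>
    src_secret_key: <source-zone-system-user-secret-key>
    destination: https://secondary.example.com:80
    dest_access_key: <destination-zone-system-user-access-key>
    dest_secret_key: <destination-zone-system-user-secret-key>
    log_file: /var/log/radosgw/radosgw-sync.log
    EOF
    radosgw-agent -c sync.conf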
[23:19] <gleam> if anyone happens to be using firefly + icehouse + rbd-ephemeral-clone-stable-icehouse, could you take a look at this? I'm able to clone an image with cinder and boot from it, but not boot directly from the image: https://gist.github.com/gleamicus/296af0501de6ed6bf506
[23:22] * BManojlovic (~steki@212.200.65.143) Quit (Ping timeout: 480 seconds)
[23:24] * rturk|afk is now known as rturk
[23:24] * JC (~JC@AMontpellier-651-1-445-156.w81-251.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[23:26] <anth1y> thanks @gleam
[23:26] * joef (~Adium@2620:79:0:2420::4) has left #ceph
[23:26] <anth1y> & @rweeks
[23:28] <anth1y> thank you all I'll keep doing some research and testing
[23:29] <yuriw> loicd: ping
[23:30] <loicd> yuriw: pong
[23:30] <yuriw> wow !!!
[23:30] <loicd> ;-)
[23:30] <yuriw> i did not expect you at all !!
[23:30] <yuriw> take a look pls - https://github.com/ceph/ceph-qa-suite/pull/81
[23:31] <loicd> nice
[23:31] * Concubidated (~Adium@2607:f298:a:607:891c:3185:1314:2788) has joined #ceph
[23:32] <joshd> gleam: is python-ceph installed on the nova-compute node?
[23:32] <joshd> gleam: I'm curious how you ran into that too, since I think I saw someone else report the same thing
[23:32] <yuriw> loicd: we discussed it a bit today and for now it's not elegant but may work; if you'd like you can add your timeouts here
[23:33] <gleam> yes, python-ceph is on there
[23:33] <gleam> 0.80.5-1trusty
[23:33] * Kupo1 (~tyler.wil@wsip-68-14-231-140.ph.ph.cox.net) has joined #ceph
[23:33] <Kupo1> Hey All, what release has this fix in it? http://tracker.ceph.com/issues/6257
[23:33] <Kupo1> Doesnt say on the report
[23:34] <yuriw> loicd: what do you think?
[23:34] <loicd> How would that work for idle_timeout: 1200 ? http://pastealacon.com/35207 ?
[23:35] <joshd> Kupo1: all firefly versions and 0.67.10
[23:36] <Kupo1> joshd: thx
[23:38] <yuriw> loicd: syntax-wise ? I am not sure, but you could add it and have somebody review
[23:38] <yuriw> a?
[23:38] <loicd> yuriw: how is this going to be taken into account by teuthology? I mean, what is the .py that handles this file https://github.com/ceph/ceph-qa-suite/commit/3f18b02cb5953b86148d269d3e9776f1218d4f5a
[23:39] <Kupo1> is it possible to snapshot an RBD and store the copy-on-write snapshot to a different pool?
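On Kupo1's question: a snapshot itself stays in the same pool as its image, but a copy-on-write clone of a protected snapshot can be placed in a different pool; a short sketch with illustrative pool and image names:

    rbd snap create rbd/vm-disk@backup1                      # snapshot lives in pool "rbd"
    rbd snap protect rbd/vm-disk@backup1                     # required before cloning
    rbd clone rbd/vm-disk@backup1 otherpool/vm-disk-clone    # COW clone lands in another pool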
[23:39] <loicd> yuriw: yes, I'm unsure about the exact syntax of overrides. In particular client.0 and client.1 ...
[23:40] <yuriw> loicd: that's what I meant by not elegant; it will be a manual override at the moment, e.g. on the command line of teuthology-suite we will use (as we actually do now) vps.yaml
[23:41] <yuriw> here is the line from cron - 21 12 * * * teuthology-suite -v -c firefly -m vps -s upgrade/firefly ~/vps.yaml --suite-branch firefly
[23:41] <loicd> yuriw: I see, good workaround. Maybe it deserves a note in the README ? Or in the --help of teuthology-suite to suggest to use it when --machine vps ?
[23:41] * anth1y (~ant@12.219.154.180) has left #ceph
[23:41] * anth1y (~ant@12.219.154.180) has joined #ceph
[23:42] <yuriw> sure, would you like to add your override ?
[23:42] <gleam> joshd: fwiw, I was getting #8912 a few days ago, but it looks like angdraug's branch was updated. the old version had rbd_utils.py and the new one has rbd.py
[23:42] <gleam> once packages with 8912 fixed come out i might try going back to that older version of the branch and see what's what
[23:43] <jiffe> well all the vmware hosts are up and the mdsc file listed a bunch of entries so tomorrow morning I will turn up debugging on the mds again in hopes of catching this thing in the act
[23:43] * jpierre03 (~jpierre03@voyage.prunetwork.fr) Quit (Ping timeout: 480 seconds)
[23:43] * jpierre03_ (~jpierre03@voyage.prunetwork.fr) Quit (Ping timeout: 480 seconds)
[23:44] <yuriw> loicd: I think it will have to be integrated into the system sooner or later though, but it's not clear how
[23:45] <yuriw> maybe a flag to turn the override on/off per machine type? I am mumbling to myself…
[23:45] <joshd> gleam: yeah, I'm a bit suspicious that may be the problem, though it doesn't look like it should be from a quick glance. here's the bug though: https://bugs.launchpad.net/nova/+bug/1352595
[23:46] <gleam> i guess i'll just try renaming it back to rbd_utils and see if that does it
[23:47] <loicd> yuriw: I don't know enough about teuthology to comment usefully, I'm afraid ;-)
[23:47] <joshd> gleam: yeah, just need to rename it in imagebackend.py and maybe driver.py
[23:47] <gleam> yeah. this was abandoned but does basically that: https://review.openstack.org/#/c/111892/
[23:47] <yuriw> loicd: me either :)
[23:48] <joshd> gleam: if you can test it out, and comment on the bug and review if that is the problem, that'd be great
[23:52] <yuriw> loicd: any updates on #8736, #8737, #8740? are they still valid?
[23:54] <gleam> that does indeed get me back to the state of #8912 segfaulting
[23:54] <gleam> so, progress!
[23:55] <loicd> yuriw: no updates from me. I guess they need the corresponding logs to be analyzed to find clues.
[23:58] <loicd> yuriw: I did spend a fair amount of time trying to figure http://tracker.ceph.com/issues/8988 out but then I got lazy ;-)
[23:58] * TiCPU (~jeromepou@190-130.cgocable.ca) has joined #ceph
[23:59] * yguang11_ (~yguang11@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[23:59] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) Quit (Read error: Connection reset by peer)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.