#ceph IRC Log

IRC Log for 2015-06-15

Timestamps are in GMT/BST.

[0:07] * thebevans (~bevans@94.5.237.252) Quit (Quit: thebevans)
[0:08] * Kaervan (~Xylios@178.175.128.50) has joined #ceph
[0:08] * scuttle|afk is now known as scuttlemonkey
[0:09] * daviddcc (~dcasier@ADijon-653-1-114-184.w90-33.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[0:13] * puffy (~puffy@c-50-131-179-74.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[0:20] * badone (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) has joined #ceph
[0:21] * derjohn_mob (~aj@x590e088f.dyn.telefonica.de) has joined #ceph
[0:22] * ChrisNBl_ (~textual@dhcp-ip-230.dorf.rwth-aachen.de) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[0:23] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[0:26] * oro (~oro@188.143.75.215) has joined #ceph
[0:26] <destrudo> I had been experiementing with cache tiering, but then my backplane blew out and took all of my poor SSD samples with it
[0:27] * circ-user-oIJGb (~circuser-@50.46.225.207) has joined #ceph
[0:31] <darkfaded> :/
[0:37] * Kaervan (~Xylios@8Q4AABI42.tor-irc.dnsbl.oftc.net) Quit ()
[0:38] * circ-user-oIJGb (~circuser-@50.46.225.207) Quit (Read error: Connection reset by peer)
[0:43] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:47] * Bored (~bret@5.61.34.63) has joined #ceph
[0:47] * oro (~oro@188.143.75.215) Quit (Ping timeout: 480 seconds)
[0:48] * Debesis_ (~0x@140.217.38.86.mobile.mezon.lt) has joined #ceph
[0:53] * Debesis (~0x@140.217.38.86.mobile.mezon.lt) Quit (Ping timeout: 480 seconds)
[0:56] * dgbaley27 (~matt@c-67-176-93-83.hsd1.co.comcast.net) has joined #ceph
[0:57] * Debesis_ (~0x@140.217.38.86.mobile.mezon.lt) Quit (Quit: Leaving)
[0:57] * Debesis (~0x@140.217.38.86.mobile.mezon.lt) has joined #ceph
[1:06] * ivotron (uid25461@id-25461.brockwell.irccloud.com) Quit (Quit: Connection closed for inactivity)
[1:11] <destrudo> that was old data
[1:11] <destrudo> I should have told you to disregard it
[1:12] <destrudo> I press a button and all of my pts's go up and enter
[1:16] * Bored (~bret@9S0AAA4JM.tor-irc.dnsbl.oftc.net) Quit ()
[1:17] * Ian2128 (~raindog@exit-01d.noisetor.net) has joined #ceph
[1:29] * tupper (~tcole@108-83-203-37.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[1:44] * fdmanana__ (~fdmanana@bl13-129-165.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[1:46] * Ian2128 (~raindog@7R2AABPMD.tor-irc.dnsbl.oftc.net) Quit ()
[1:51] * tupper (~tcole@108-83-203-37.lightspeed.rlghnc.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[1:56] * ulterior (~lobstar@h-213.61.149.100.host.de.colt.net) has joined #ceph
[1:57] * oms101 (~oms101@p20030057EA48F700C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[2:01] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) Quit (Quit: doppelgrau)
[2:05] * oms101 (~oms101@p20030057EA098700C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[2:09] * sankarshan (~sankarsha@183.87.39.242) Quit (Quit: Are you sure you want to quit this channel (Cancel/Ok) ?)
[2:10] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:11] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:13] * haomaiwang (~haomaiwan@183.206.168.223) Quit (Remote host closed the connection)
[2:23] * mschiff (~mschiff@mx10.schiffbauer.net) Quit (Remote host closed the connection)
[2:25] * ulterior (~lobstar@7R2AABPMZ.tor-irc.dnsbl.oftc.net) Quit ()
[2:26] * elt (~tokie@nx-01.tor-exit.network) has joined #ceph
[2:28] * mschiff (~mschiff@mx10.schiffbauer.net) has joined #ceph
[2:39] * mattronix (~quassel@mail.mattronix.nl) has joined #ceph
[2:45] <jidar> what's osd.4 DNE
[2:45] <jidar> mean?
[2:46] * mattronix_ (~quassel@mail.mattronix.nl) Quit (Ping timeout: 480 seconds)
[2:46] <lurbs> DNE is Does Not Exist.
[2:47] <jidar> how can I clean it up?
[2:47] <jidar> it's not in a crushmap and osd rm 4 doesn't remove it
[2:48] * lucas1 (~Thunderbi@218.76.52.64) has joined #ceph
[2:48] <jidar> woops, it was in a crushmap
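(A minimal sketch of clearing a DNE entry, assuming the id is osd.4 as above and nothing useful is left on that disk; all three pieces have to go:)
    ceph osd crush remove osd.4   # drop it from the CRUSH map
    ceph auth del osd.4           # delete its cephx key
    ceph osd rm 4                 # remove the id itself; the DNE entry should then be gone from 'ceph osd tree'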
[2:55] * elt (~tokie@7R2AABPND.tor-irc.dnsbl.oftc.net) Quit ()
[3:00] * TomyLobo (~Pulec@politkovskaja.torservers.net) has joined #ceph
[3:02] <jidar> anybody feel like helping me with my OSD issues? Basically I've got 6 OSD's up, and 1 mon up but for whatever reason they won't come off of being stale+incomplete, peering, etc. This is a fresh deployment so no data https://gist.github.com/f1af5919f4af31f2300c
[3:06] * derjohn_mobi (~aj@x590e2cb6.dyn.telefonica.de) has joined #ceph
[3:13] * derjohn_mob (~aj@x590e088f.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[3:25] * shyu (~Shanzhi@119.254.120.66) has joined #ceph
[3:30] * TomyLobo (~Pulec@5NZAADT7T.tor-irc.dnsbl.oftc.net) Quit ()
[3:32] * yanzheng (~zhyan@125.71.107.110) has joined #ceph
[3:35] * Mika_c (~Mk@122.146.93.152) has joined #ceph
[3:36] * sankarshan (~sankarsha@183.87.39.242) has joined #ceph
[3:38] * Debesis (~0x@140.217.38.86.mobile.mezon.lt) Quit (Ping timeout: 480 seconds)
[3:44] * burley (~khemicals@cpe-98-28-239-78.cinci.res.rr.com) has joined #ceph
[3:55] * zhaochao (~zhaochao@125.39.8.226) has joined #ceph
[3:56] * OutOfNoWhere (~rpb@199.68.195.101) has joined #ceph
[4:02] * BillyBobJohn (~maku@tor-exit1.arbitrary.ch) has joined #ceph
[4:07] * flisky (~Thunderbi@106.39.60.34) has joined #ceph
[4:10] <TheSov> what is the easiest way to replace a failed osd, cuz i just googled it and it seems theres an unnecessary amount of work
[4:11] <snerd> I just discovered I had civetweb in firefly all this time and the ceph-ansible stuff is massively out of date
[4:11] * vbellur1 (~vijay@122.171.91.105) Quit (Ping timeout: 480 seconds)
[4:11] <snerd> srsly documentation is just awful
[4:26] * BranchPredictor (branch@predictor.org.pl) Quit (Ping timeout: 480 seconds)
[4:31] * BillyBobJohn (~maku@9S0AAA4QN.tor-irc.dnsbl.oftc.net) Quit ()
[4:38] * yguang11 (~yguang11@12.31.82.125) has joined #ceph
[4:43] * bkopilov (~bkopilov@bzq-79-183-58-206.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[4:48] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[4:50] <flaf> TheSov: the easiest way is: completely remove the OSD and recreate a new one.
[4:51] <TheSov> yeah thats basically the same thing
[4:51] <TheSov> but my problem is, i cant have my intern do that
[4:53] <flaf> Too bad, it's really the easiest way.
[4:54] * yguang11 (~yguang11@12.31.82.125) Quit (Remote host closed the connection)
[4:55] <flaf> and maybe the only one... (not sure)
[5:00] <TheSov> thats terrible... are you telling me in order to replace a failed osd i have to remove it from operation, remove it from the crush map, add a new disk, set up the osd, and add it to the crush map every time a disk fails?
[5:00] <TheSov> seriously if you have thousands of osd's you will be doing that every day
[5:01] <lurbs> Adding an OSD (using ceph-{deploy,disk}) adds it to the CRUSH map.
[5:01] <TheSov> ok 1 less step thats better
[5:01] * BranchPredictor (branch@predictor.org.pl) has joined #ceph
[5:01] <TheSov> is there a faster way with ceph deploy?
[5:02] <TheSov> there should be an osd refill command
[5:02] <TheSov> that way u could restart the downed osd, and "backfill" it
[5:03] <lurbs> 'ceph osd rm $OSD; ceph auth del osd.$OSD; ceph osd crush remove $OSD' isn't particularly onerous, is it?
[5:04] <TheSov> lurbs, imagine you are working with storage raids. you remove the bad disk, put in a new disk. no further interaction is required
[5:04] <lurbs> You could have fun with udev triggers.
[5:04] <flaf> TheSov: sorry I'm not a ceph expert (and I don't have thousands of OSDs), but when an OSD has failed, I see just 2 options: repair the OSD, or remove it (for the disk, of course it depends on whether the problem is software or hardware).
[5:04] <lurbs> But that could get rather complicated.
[5:05] <TheSov> well, this escalates a bit when you lose a journal right?, because you have 4 or 5 osds down then
[5:05] <lurbs> Which is an argument for not putting journals on separate SSDs if you have a sufficiently large cluster, yeah.
[5:06] <TheSov> i dont just intend to put a journal on any ssd, im going to be putting journals on NVMe disks
[5:06] <TheSov> if possible i want sub 2ms writes
[5:06] * ivotron (uid25461@id-25461.brockwell.irccloud.com) has joined #ceph
[5:07] <TheSov> so when you have a sufficiently large cluster you put journals on the same disk?
[5:07] <lurbs> I believe that's what, for example, CERN do.
[5:08] <TheSov> but cern was having issues scaling much past 7200 osds
[5:08] <TheSov> they didnt follow hardware recommendations either
[5:11] <TheSov> any idea how to appropriately size a journal?
[5:13] <lurbs> http://ceph.com/docs/master/rados/configuration/osd-config-ref/#journal-settings <-- Has notes by 'osd journal size'.
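(For reference, the rule of thumb on that page is: osd journal size >= 2 * expected throughput * filestore max sync interval. A worked example, assuming a ~100 MB/s disk and the default 5 s sync interval; the numbers are illustrative only:)
    # 2 * 100 MB/s * 5 s = 1000 MB, so ~1 GB is the computed floor
    [osd]
    osd journal size = 10240    # in MB; rounding well above the floor is common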
[5:15] * Vacuum__ (~Vacuum@88.130.212.48) has joined #ceph
[5:21] * Vacuum_ (~Vacuum@88.130.203.36) Quit (Ping timeout: 480 seconds)
[5:22] * overclk (~overclk@117.242.4.209) has joined #ceph
[5:25] * Rickus (~Rickus@office.protected.ca) Quit (Read error: Connection reset by peer)
[5:25] * Rickus (~Rickus@office.protected.ca) has joined #ceph
[5:28] * squ (~Thunderbi@46.109.36.167) has joined #ceph
[5:29] * Spessu (~Maza@tor-exit1.arbitrary.ch) has joined #ceph
[5:37] * logan2 (~a@63.143.49.103) Quit (Ping timeout: 480 seconds)
[5:39] * flisky (~Thunderbi@106.39.60.34) Quit (Remote host closed the connection)
[5:39] * flisky (~Thunderbi@106.39.60.34) has joined #ceph
[5:40] * calvinx (~calvin@101.100.172.246) has joined #ceph
[5:44] * shylesh (~shylesh@121.244.87.118) has joined #ceph
[5:55] <TheSov> thanks!
[5:58] * Spessu (~Maza@5NZAADUGQ.tor-irc.dnsbl.oftc.net) Quit ()
[5:59] * Nephyrin (~chrisinaj@assk.torservers.net) has joined #ceph
[6:02] * bkopilov (~bkopilov@nat-pool-tlv-t.redhat.com) has joined #ceph
[6:03] * puffy (~puffy@c-50-131-179-74.hsd1.ca.comcast.net) has joined #ceph
[6:06] * puffy (~puffy@c-50-131-179-74.hsd1.ca.comcast.net) Quit ()
[6:06] * puffy (~puffy@c-50-131-179-74.hsd1.ca.comcast.net) has joined #ceph
[6:07] * puffy (~puffy@c-50-131-179-74.hsd1.ca.comcast.net) Quit ()
[6:13] * jclm (~jclm@ip-64-134-187-212.public.wayport.net) has joined #ceph
[6:18] * shaunm (~shaunm@74.215.76.114) has joined #ceph
[6:23] * flisky (~Thunderbi@106.39.60.34) Quit (Remote host closed the connection)
[6:23] * flisky (~Thunderbi@106.39.60.34) has joined #ceph
[6:26] * npw (~npw@125.253.50.48) has joined #ceph
[6:27] * npw (~npw@125.253.50.48) Quit ()
[6:27] * npw (~npw@125.253.50.48) has joined #ceph
[6:29] * Nephyrin (~chrisinaj@5NZAADUHZ.tor-irc.dnsbl.oftc.net) Quit ()
[6:29] * sardonyx (~sese_@176.10.99.200) has joined #ceph
[6:33] <npw> I'm working up a design for a new home NAS that will be used to store media, and provide iSCSI / NFS services to my VM lab. I'm curious about the prospect of using ceph rather than a more traditional RAID configuration. The limitation in this instance is that I still only want to use 1 host at this stage. My understanding is that if configured correctly, ceph on a single host will distribute blocks between the different OSDs in the OSD host, so therefore will be
[6:33] <npw> able to tolerate the loss of a disk (the number of which would be based on the configured policy) etc. Is this something that is worth pursuing? or am I just introducing a headache in trying to go down this path?
[6:34] <gleam> i'd say it's a headache, but if you want to use it as a learning experience go for it
[6:36] <npw> well that is part of the plan (the learning), but if it's going to be painful and introduce an element of risk I'm happy to go down a different path. with ever increasing drive sizes i see SDS solutions as the "future"
[6:37] <npw> I guess what I'm asking is, aside from the loss of node redundancy (uptime in this case isn't a requirement), is there anything "wrong" with the single node approach that I have suggested
[6:46] * derjohn_mobi (~aj@x590e2cb6.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[6:47] <gleam> not really.. the mons and osds have different i/o patterns but that would really just mean minor performance hits
[6:47] <gleam> otherwise i think it'd be fine
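(One thing the defaults will fight on a single host: the stock CRUSH rule spreads replicas across hosts, so with only one host the PGs never go clean. A minimal sketch of the usual workaround, set before the OSDs are created; the option name is as documented, the rest is assumption:)
    [global]
    osd crush chooseleaf type = 0    # 0 = osd; replicate across OSDs on the same host instead of across hosts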
[6:52] <npw> thanks for your feedback. I have not had much luck in finding anything online where people have been wanting to do this. I hope at some stage that nas4free will include ceph support, as that would lower the barrier to entry, and the associated risk for "non experts" etc.
[6:54] <gleam> part of the problem is that to make it as cost effective as normal raid 5/6/z/z2/z3/other parity raid systems you really need to use erasure codes
[6:54] <gleam> which are relatively recent additions
[6:54] <gleam> as in about a year
[6:55] <gleam> and i don't know that they work with rbd or cephfs yet
[6:56] <gleam> oh apparently you can overcome that with a replicated cache tier
[6:57] * amote (~amote@121.244.87.116) has joined #ceph
[6:57] * sleinen1 (~Adium@2001:620:0:82::104) Quit (Ping timeout: 480 seconds)
[6:59] * sardonyx (~sese_@9S0AAA4UN.tor-irc.dnsbl.oftc.net) Quit ()
[6:59] <npw> I wasn't expecting the same kind of storage efficiency as R5/6, but considered this a fair trade off for being able to expand the storage as required without doing parity rebuilds etc.
[6:59] <npw> I'll read up on the erasure codes to see how they work / if they are applicable etc :)
[6:59] <gleam> how many drives are you planning on? i'm sorry if you already said
[7:00] <gleam> erasure coding is a parity system like raid 5/6, eg you say "i want 8 data blocks and 2 parity blocks" and it's the equivalent of raid 6 with a 10 disk array
[7:00] <gleam> (generally speaking)
[7:00] <gleam> same storage efficiency in any case
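(A rough sketch of the erasure-coded pool plus replicated cache tier mentioned above, using hammer-era syntax; pool names, PG counts and k/m are placeholders:)
    ceph osd erasure-code-profile set ec82 k=8 m=2 ruleset-failure-domain=osd
    ceph osd pool create ecpool 128 128 erasure ec82
    ceph osd pool create cachepool 128
    ceph osd tier add ecpool cachepool
    ceph osd tier cache-mode cachepool writeback
    ceph osd tier set-overlay ecpool cachepool   # RBD/CephFS clients then use ecpool through the cache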
[7:01] <npw> ah okay.. was looking to start with 4 x 4TB drives, with SSD for journal, or tiered cache
[7:01] <npw> I don't have enterprise level requirements for IOPS, uptime, or throughput
[7:02] <gleam> if you don't have serious throughput requirements you could ditch the ssd
[7:03] <npw> I was hoping that the SSD would provide a boost for the vmlab VMs in terms of latency and IOPS, that being said I can use vflash on the hypervisor to provide SSD caching for reads
[7:03] * yguang11 (~yguang11@12.31.82.125) has joined #ceph
[7:03] <gleam> ahh ok
[7:03] <gleam> go for it then
[7:05] <npw> I still have to decide if this is worth the effort, but you have helped clarify some aspects of this.. Thank you for your time and thoughts
[7:05] <gleam> you could always do the whole thing in a vm
[7:05] <gleam> and see how that goes
[7:05] <gleam> and then if you decide to do it on real hardware you'll have already done it once
[7:05] * overclk (~overclk@117.242.4.209) Quit (Quit: Leaving)
[7:15] * npw (~npw@125.253.50.48) Quit (Quit: npw)
[7:16] * vata (~vata@cable-21.246.173-197.electronicbox.net) Quit (Quit: Leaving.)
[7:29] * jclm1 (~jclm@ip-64-134-187-212.public.wayport.net) has joined #ceph
[7:30] * kefu (~kefu@114.92.121.253) has joined #ceph
[7:33] * Diablothein (~sese_@nx-01.tor-exit.network) has joined #ceph
[7:36] * jclm (~jclm@ip-64-134-187-212.public.wayport.net) Quit (Ping timeout: 480 seconds)
[7:42] * overclk (~overclk@121.244.87.117) has joined #ceph
[7:44] * vbellur (~vijay@121.244.87.117) has joined #ceph
[7:48] * derjohn_mob (~aj@tmo-108-201.customers.d1-online.com) has joined #ceph
[7:48] * flisky (~Thunderbi@106.39.60.34) Quit (Remote host closed the connection)
[7:49] * flisky (~Thunderbi@106.39.60.34) has joined #ceph
[7:51] * chasmo77 (~chas77@158.183-62-69.ftth.swbr.surewest.net) has joined #ceph
[8:00] * BranchPredictor (branch@predictor.org.pl) Quit (Remote host closed the connection)
[8:03] * Diablothein (~sese_@7R2AABPTQ.tor-irc.dnsbl.oftc.net) Quit ()
[8:03] * spate (~jwandborg@kbtr2ce.tor-relay.me) has joined #ceph
[8:05] * nsoffer (~nsoffer@bzq-84-111-112-230.cablep.bezeqint.net) has joined #ceph
[8:07] * hostranger (~rulrich@2a02:41a:3999::85) has joined #ceph
[8:08] * hostranger (~rulrich@2a02:41a:3999::85) has left #ceph
[8:09] * treenerd (~treenerd@85.193.140.98) has joined #ceph
[8:16] * b0e (~aledermue@213.95.25.82) has joined #ceph
[8:18] * wicope (~wicope@0001fd8a.user.oftc.net) has joined #ceph
[8:19] * Nacer (~Nacer@203-206-190-109.dsl.ovh.fr) has joined #ceph
[8:23] * rdas (~rdas@121.244.87.116) has joined #ceph
[8:25] * Sysadmin88 (~IceChat77@2.124.164.69) Quit (Quit: Make it idiot proof and someone will make a better idiot.)
[8:28] * sleinen (~Adium@macsl.switch.ch) has joined #ceph
[8:33] * nardial (~ls@dslb-178-009-182-197.178.009.pools.vodafone-ip.de) has joined #ceph
[8:33] * spate (~jwandborg@5NZAADUMR.tor-irc.dnsbl.oftc.net) Quit ()
[8:35] * derjohn_mob (~aj@tmo-108-201.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[8:36] * Be-El (~quassel@fb08-bcf-pc01.computational.bio.uni-giessen.de) has joined #ceph
[8:37] <Be-El> hi
[8:39] * linjan (~linjan@176.12.132.172) has joined #ceph
[8:39] * Nacer (~Nacer@203-206-190-109.dsl.ovh.fr) Quit (Remote host closed the connection)
[8:41] * npw (~npw@125.253.50.48) has joined #ceph
[8:41] * linjan (~linjan@176.12.132.172) Quit (Read error: Connection reset by peer)
[8:43] * kefu (~kefu@114.92.121.253) Quit (Max SendQ exceeded)
[8:43] * kefu (~kefu@114.92.121.253) has joined #ceph
[8:47] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Ping timeout: 480 seconds)
[8:49] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[8:50] * Kioob`Taff (~plug-oliv@2a01:e35:2e8a:1e0::42:10) has joined #ceph
[8:51] * tganguly (~tganguly@121.244.87.117) has joined #ceph
[8:51] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) has joined #ceph
[8:51] * npw (~npw@125.253.50.48) Quit (Quit: npw)
[8:54] * thomnico (~thomnico@2a01:e35:8b41:120:f872:c679:8f19:a864) has joined #ceph
[8:56] * bobrik (~bobrik@83.243.64.45) Quit (Quit: (null))
[8:57] * derjohn_mob (~aj@tmo-108-201.customers.d1-online.com) has joined #ceph
[8:59] * bkopilov (~bkopilov@nat-pool-tlv-t.redhat.com) Quit (Ping timeout: 480 seconds)
[9:00] * karnan (~karnan@121.244.87.117) has joined #ceph
[9:01] * Hemanth (~Hemanth@121.244.87.117) has joined #ceph
[9:02] * cok (~chk@2a02:2350:18:1010:5d0c:b14b:7495:29af) has joined #ceph
[9:03] * _br_ (~superdug@195-154-79-94.rev.poneytelecom.eu) has joined #ceph
[9:06] * nsoffer (~nsoffer@bzq-84-111-112-230.cablep.bezeqint.net) Quit (Ping timeout: 480 seconds)
[9:06] * derjohn_mob (~aj@tmo-108-201.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[9:10] * thomnico (~thomnico@2a01:e35:8b41:120:f872:c679:8f19:a864) Quit (Ping timeout: 480 seconds)
[9:11] * fridim_ (~fridim@56-198-190-109.dsl.ovh.fr) has joined #ceph
[9:13] * fdmanana__ (~fdmanana@bl13-129-165.dsl.telepac.pt) has joined #ceph
[9:15] * analbeard (~shw@support.memset.com) has joined #ceph
[9:16] * analbeard (~shw@support.memset.com) Quit ()
[9:16] * analbeard (~shw@support.memset.com) has joined #ceph
[9:18] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[9:20] * analbeard (~shw@support.memset.com) Quit (Remote host closed the connection)
[9:22] * sjm (~sjm@49.32.0.145) has joined #ceph
[9:24] * thomnico (~thomnico@cro38-2-88-180-16-18.fbx.proxad.net) has joined #ceph
[9:25] * sjm (~sjm@49.32.0.145) Quit ()
[9:26] * sjm (~sjm@49.32.0.145) has joined #ceph
[9:27] * kawa2014 (~kawa@212.110.41.244) has joined #ceph
[9:27] * analbeard (~shw@support.memset.com) has joined #ceph
[9:32] * dgurtner (~dgurtner@178.197.231.53) has joined #ceph
[9:33] * _br_ (~superdug@8Q4AABJG8.tor-irc.dnsbl.oftc.net) Quit ()
[9:34] * fsimonce (~simon@host253-71-dynamic.3-87-r.retail.telecomitalia.it) has joined #ceph
[9:34] * sjm (~sjm@49.32.0.145) Quit (Ping timeout: 480 seconds)
[9:36] * emik0 (~emik0@91.241.13.28) has joined #ceph
[9:38] * VampiricPadraig (~jacoo@9S0AAA42O.tor-irc.dnsbl.oftc.net) has joined #ceph
[9:41] * bobrik (~bobrik@109.167.249.178) has joined #ceph
[9:41] * shyu (~Shanzhi@119.254.120.66) Quit (Ping timeout: 480 seconds)
[9:41] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[9:42] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[9:47] * freman (~freman@griffin.seedboxes.cc) has joined #ceph
[9:48] <freman> hi! anyone able to provide some tips on a performance issue on a newly installed all-flash ceph cluster? When we do write tests we get 900MB/s write, but read tests are only 200MB/s. all servers are on 10GBit connections.
[9:50] * TMM (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[9:50] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[9:53] * bobrik_ (~bobrik@109.167.249.178) has joined #ceph
[9:53] * linjan (~linjan@195.110.41.9) has joined #ceph
[9:53] * shyu (~Shanzhi@119.254.120.66) has joined #ceph
[9:54] * daviddcc (~dcasier@ADijon-653-1-114-184.w90-33.abo.wanadoo.fr) has joined #ceph
[9:55] * bobrik (~bobrik@109.167.249.178) Quit (Ping timeout: 480 seconds)
[9:58] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) has joined #ceph
[9:58] * thomnico (~thomnico@cro38-2-88-180-16-18.fbx.proxad.net) Quit (Read error: No route to host)
[10:04] * kefu is now known as kefu|afk
[10:06] * thomnico (~thomnico@cro38-2-88-180-16-18.fbx.proxad.net) has joined #ceph
[10:06] * bkopilov (~bkopilov@bzq-79-183-58-206.red.bezeqint.net) has joined #ceph
[10:07] * thebevans (~bevans@94.5.237.252) has joined #ceph
[10:08] * VampiricPadraig (~jacoo@9S0AAA42O.tor-irc.dnsbl.oftc.net) Quit ()
[10:08] * Hejt (~K3NT1S_aw@8Q4AABJJA.tor-irc.dnsbl.oftc.net) has joined #ceph
[10:09] * mattch (~mattch@pcw3047.see.ed.ac.uk) has joined #ceph
[10:11] * sjm (~sjm@49.32.0.145) has joined #ceph
[10:12] * ksperis (~ksperis@46.218.42.103) has joined #ceph
[10:12] * flisky (~Thunderbi@106.39.60.34) Quit (Read error: Connection reset by peer)
[10:12] * Nats_ (~natscogs@114.31.195.238) Quit (Read error: Connection reset by peer)
[10:12] <freman> hi! anyone able to provide some tips on a performance issue on a newly installed all-flash ceph cluster? When we do write tests we get 900MB/s write, but read tests are only 200MB/s. all servers are on 10GBit connections.
[10:12] * flisky (~Thunderbi@106.39.60.34) has joined #ceph
[10:13] * Nats_ (~natscogs@114.31.195.238) has joined #ceph
[10:16] * BranchPredictor (branch@predictor.org.pl) has joined #ceph
[10:18] * topro (~prousa@host-62-245-142-50.customer.m-online.net) Quit (Read error: Connection reset by peer)
[10:18] * topro (~prousa@host-62-245-142-50.customer.m-online.net) has joined #ceph
[10:23] * [arx] (~arx@sniff-the.kittypla.net) Quit (Ping timeout: 480 seconds)
[10:26] * oro (~oro@84-236-94-205.pool.digikabel.hu) has joined #ceph
[10:27] * kefu|afk is now known as kefu
[10:27] * Concubidated (~Adium@71.21.5.251) Quit (Quit: Leaving.)
[10:31] * thebevans (~bevans@94.5.237.252) Quit (Quit: thebevans)
[10:35] * shylesh__ (~shylesh@121.244.87.124) has joined #ceph
[10:37] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Ping timeout: 480 seconds)
[10:37] * bitserker (~toni@88.87.194.130) has joined #ceph
[10:38] * Hejt (~K3NT1S_aw@8Q4AABJJA.tor-irc.dnsbl.oftc.net) Quit ()
[10:38] * shylesh (~shylesh@121.244.87.118) Quit (Ping timeout: 480 seconds)
[10:38] * Hemanth (~Hemanth@121.244.87.117) Quit (Ping timeout: 480 seconds)
[10:39] * owasserm (~owasserm@52D9864F.cm-11-1c.dynamic.ziggo.nl) has joined #ceph
[10:40] * linjan (~linjan@195.110.41.9) Quit (Ping timeout: 480 seconds)
[10:42] * dgbaley27 (~matt@c-67-176-93-83.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[10:42] * tganguly (~tganguly@121.244.87.117) Quit (Ping timeout: 480 seconds)
[10:43] * vbellur (~vijay@121.244.87.117) Quit (Ping timeout: 480 seconds)
[10:47] * nsoffer (~nsoffer@nat-pool-tlv-t.redhat.com) has joined #ceph
[10:48] * Hemanth (~Hemanth@121.244.87.117) has joined #ceph
[10:49] * kawa2014 (~kawa@212.110.41.244) Quit (Ping timeout: 480 seconds)
[10:49] * branto (~branto@178-253-136-248.3pp.slovanet.sk) has joined #ceph
[10:53] * tganguly (~tganguly@121.244.87.117) has joined #ceph
[10:53] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[10:54] * linjan (~linjan@195.110.41.9) has joined #ceph
[11:03] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[11:04] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[11:04] * sjm (~sjm@49.32.0.145) has left #ceph
[11:06] <SamYaple> freman: perhaps your writes are written to the journal first but on read they are coming from the much slower spinning media?
[11:06] <SamYaple> assuming an external ssd journal
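(One way to narrow that down is to benchmark the pool directly and compare write, sequential-read and random-read numbers; the pool name and duration are placeholders:)
    rados bench -p testpool 60 write --no-cleanup   # leaves the objects in place for the read tests
    rados bench -p testpool 60 seq
    rados bench -p testpool 60 rand
    rados -p testpool cleanup                       # remove the benchmark objects afterwards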
[11:09] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) Quit (Quit: doppelgrau)
[11:10] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[11:12] * Lunk2 (~Sun7zu@politkovskaja.torservers.net) has joined #ceph
[11:19] * gaveen (~gaveen@175.157.146.224) has joined #ceph
[11:21] * sleinen (~Adium@macsl.switch.ch) Quit (Ping timeout: 480 seconds)
[11:22] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[11:24] * sankarshan (~sankarsha@183.87.39.242) Quit (Quit: Are you sure you want to quit this channel (Cancel/Ok) ?)
[11:30] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[11:38] * haomaiwang (~haomaiwan@183.206.171.154) has joined #ceph
[11:41] * peeejayz (~peeejayz@isis57186.sci.rl.ac.uk) has joined #ceph
[11:41] * DV (~veillard@2001:41d0:1:d478::1) has joined #ceph
[11:42] <peeejayz> Morning all, I'm getting 'requests are blocked' messages. I've looked all over my cluster to see what's causing this but I can't find anything wrong. Can anyone shed some light on it? currently it's causing cephfs to fall over.
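(One starting point for blocked/slow requests is the admin socket on whichever OSDs 'ceph health detail' names; osd.N and the socket path below are placeholders:)
    ceph health detail                                                      # lists the OSDs with blocked requests
    ceph --admin-daemon /var/run/ceph/ceph-osd.N.asok dump_ops_in_flight    # run on that OSD's host
    ceph --admin-daemon /var/run/ceph/ceph-osd.N.asok dump_historic_ops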
[11:42] * Lunk2 (~Sun7zu@5NZAADUVQ.tor-irc.dnsbl.oftc.net) Quit ()
[11:43] * jordanP (~jordan@scality-jouf-2-194.fib.nerim.net) has joined #ceph
[11:46] * tganguly (~tganguly@121.244.87.117) Quit (Ping timeout: 480 seconds)
[11:46] * shylesh__ (~shylesh@121.244.87.124) Quit (Ping timeout: 480 seconds)
[11:47] * thebevans (~bevans@109.69.234.234) has joined #ceph
[11:50] * Debesis (~0x@140.217.38.86.mobile.mezon.lt) has joined #ceph
[11:53] * tganguly (~tganguly@121.244.87.117) has joined #ceph
[11:55] * oro (~oro@84-236-94-205.pool.digikabel.hu) Quit (Remote host closed the connection)
[11:56] * ZombieL (~Aal@176.10.99.209) has joined #ceph
[12:01] * flisky (~Thunderbi@106.39.60.34) Quit (Remote host closed the connection)
[12:01] * flisky (~Thunderbi@106.39.60.34) has joined #ceph
[12:01] * treenerd_ (~treenerd@85.193.140.98) has joined #ceph
[12:01] * Administrator_ (~Administr@172.245.26.218) Quit (Read error: Connection reset by peer)
[12:02] * Administrator_ (~Administr@172.245.26.218) has joined #ceph
[12:03] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[12:05] * sleinen (~Adium@130.59.94.231) has joined #ceph
[12:06] * sleinen1 (~Adium@2001:620:0:82::106) has joined #ceph
[12:07] * sankarshan (~sankarsha@183.87.39.242) has joined #ceph
[12:13] <treenerd_> HI; FYI I tried to install ceph-common (0.94.2-1trusty) on Ubuntu 14.04; Seems there is a bug in the ceph-common version because I got "ImportError: No module named ceph_argparse" if I try to execute ceph. 0.94.1 worked just fine.
[12:13] * sleinen (~Adium@130.59.94.231) Quit (Ping timeout: 480 seconds)
[12:14] * lifeboy (~roland@196.45.29.61) has joined #ceph
[12:14] <treenerd_> deb http://ceph.com/debian-hammer/ trusty main was the repo that I used
[12:15] <treenerd_> shouldn't that repo be a stable branch, or did I use the wrong repository?
[12:15] <SamYaple> treenerd_: its working fine for me
[12:15] <lifeboy> After some time (a few weeks) a perfectly working small ceph cluster unmounts ceph-fs storage. In syslog I have "/mnt/cephfs: File exists at /usr/share/perl5/PVE/Storage/DirPlugin.pm line 52" repeatedly.
[12:16] <lifeboy> I can't find this error reported elsewhere... :-(
[12:16] <lifeboy> How do I figure out what's going wrong?
[12:16] <SamYaple> treenerd_: last time i did have that issue it was because of a mismatch from where i was getting my packages
[12:17] <SamYaple> some were from the ubuntu repo, others from the ceph
[12:17] <treenerd_> SamYaple: okay thanks for your reply
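(A quick way to check for the package mix-and-match SamYaple describes, assuming Ubuntu trusty; every ceph package should show the same version from the same repo:)
    apt-cache policy ceph ceph-common python-ceph
    dpkg -l | grep -i ceph
    python -c 'import ceph_argparse'    # the import the ceph CLI is failing on; succeeds once matching packages are installed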
[12:17] * sleinen (~Adium@130.59.94.231) has joined #ceph
[12:18] * shyu (~Shanzhi@119.254.120.66) Quit (Ping timeout: 480 seconds)
[12:19] <lifeboy> In /var/log/ceph/ceph-client-admin.log I see "fuse_parse_cmdline failed" when I try to remount ceph-fs
[12:20] <lifeboy> mount -a gives me "fuse: bad mount point `/mnt/cephfs': Transport endpoint is not connected"
[12:22] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[12:22] <lifeboy> If I umount /mnt/cephfs and then mount is again, it comes back apparently fine, but of course I don't want this to happen!
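('Transport endpoint is not connected' generally means the ceph-fuse process behind the mountpoint died while the mount stayed registered. A minimal recovery sketch, with the monitor address as a placeholder; it does not explain why the client dropped, for which the client log and 'ceph -s' are the places to look:)
    fusermount -u /mnt/cephfs                  # clear the stale FUSE endpoint
    ceph-fuse -m mon-host:6789 /mnt/cephfs     # remount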
[12:24] * sleinen1 (~Adium@2001:620:0:82::106) Quit (Ping timeout: 480 seconds)
[12:25] * Mika_c (~Mk@122.146.93.152) Quit (Quit: Konversation terminated!)
[12:26] * ZombieL (~Aal@8Q4AABJMN.tor-irc.dnsbl.oftc.net) Quit ()
[12:26] * ivotron (uid25461@id-25461.brockwell.irccloud.com) Quit (Quit: Connection closed for inactivity)
[12:36] * shyu (~Shanzhi@119.254.120.66) has joined #ceph
[12:36] * kefu is now known as kefu|afk
[12:38] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Remote host closed the connection)
[12:39] * TMM (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[12:39] * dgurtner (~dgurtner@178.197.231.53) Quit (Ping timeout: 480 seconds)
[12:44] * ngoswami (~ngoswami@121.244.87.116) Quit (Ping timeout: 480 seconds)
[12:47] * lucas1 (~Thunderbi@218.76.52.64) Quit (Quit: lucas1)
[12:47] * kefu|afk (~kefu@114.92.121.253) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[12:47] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[12:48] * nils_ (~nils@doomstreet.collins.kg) has joined #ceph
[12:48] * Pommesgabel (~maku@46.183.220.132) has joined #ceph
[12:56] * i_m (~ivan.miro@deibp9eh1--blueice4n1.emea.ibm.com) has joined #ceph
[12:57] * kefu (~kefu@114.92.121.253) has joined #ceph
[13:04] * thomnico (~thomnico@cro38-2-88-180-16-18.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[13:04] * sjm (~sjm@49.32.0.149) has joined #ceph
[13:06] * treenerd__ (~treenerd@85.193.140.98) has joined #ceph
[13:09] * sleinen1 (~Adium@2001:620:0:82::10b) has joined #ceph
[13:09] * lifeboy (~roland@196.45.29.61) Quit (Read error: Connection reset by peer)
[13:09] * nardial (~ls@dslb-178-009-182-197.178.009.pools.vodafone-ip.de) Quit (Quit: Leaving)
[13:10] * redf_ (~red@chello084112110034.11.11.vie.surfer.at) has joined #ceph
[13:10] * treenerd (~treenerd@85.193.140.98) Quit (Ping timeout: 480 seconds)
[13:11] * treenerd_ (~treenerd@85.193.140.98) Quit (Ping timeout: 480 seconds)
[13:12] * dgurtner (~dgurtner@178.197.231.53) has joined #ceph
[13:12] * brunoleon (~quassel@ARennes-658-1-109-198.w83-199.abo.wanadoo.fr) has joined #ceph
[13:13] * sleinen2 (~Adium@2001:620:0:82::107) has joined #ceph
[13:15] * flisky (~Thunderbi@106.39.60.34) Quit (Quit: flisky)
[13:15] * sleinen (~Adium@130.59.94.231) Quit (Ping timeout: 480 seconds)
[13:16] * red (~red@chello084112110034.11.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[13:17] * Pommesgabel (~maku@5NZAADU0A.tor-irc.dnsbl.oftc.net) Quit ()
[13:17] * Altitudes (~Maza@0.tor.exit.babylon.network) has joined #ceph
[13:18] <loicd> vikhyat: https://gist.github.com/vumrao/fc43ae201f8121614807 I can't tell, I don't have first hand experience with that
[13:19] <loicd> I mean osd deep scrub stride and osd scrub chunk max
[13:19] <vikhyat> hmm
[13:19] * sleinen1 (~Adium@2001:620:0:82::10b) Quit (Ping timeout: 480 seconds)
[13:19] <vikhyat> may be I will ask Sam
[13:19] * rwheeler (~rwheeler@pool-173-48-214-9.bstnma.fios.verizon.net) Quit (Quit: Leaving)
[13:21] * thomnico (~thomnico@cro38-2-88-180-16-18.fbx.proxad.net) has joined #ceph
[13:22] * shyu (~Shanzhi@119.254.120.66) Quit (Remote host closed the connection)
[13:25] * sjm (~sjm@49.32.0.149) has left #ceph
[13:34] * lucas1 (~Thunderbi@218.76.52.64) has joined #ceph
[13:35] * kefu (~kefu@114.92.121.253) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[13:36] * b0e (~aledermue@213.95.25.82) Quit (Ping timeout: 480 seconds)
[13:36] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Read error: Connection reset by peer)
[13:36] * lucas1 (~Thunderbi@218.76.52.64) Quit ()
[13:37] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[13:39] * sjm (~sjm@49.32.0.149) has joined #ceph
[13:41] * Hannibal (~Hahaha@p4FFD8E2C.dip0.t-ipconnect.de) has joined #ceph
[13:41] * zhaochao (~zhaochao@125.39.8.226) Quit (Quit: ChatZilla 0.9.91.1 [Iceweasel 38.0.1/20150526223604])
[13:42] * bobrik__ (~bobrik@109.167.249.178) has joined #ceph
[13:42] <Hannibal> hello
[13:43] * treenerd__ (~treenerd@85.193.140.98) Quit (Ping timeout: 480 seconds)
[13:44] * sjm (~sjm@49.32.0.149) has left #ceph
[13:45] * bobrik_ (~bobrik@109.167.249.178) Quit (Read error: Connection reset by peer)
[13:47] * bobrik___ (~bobrik@109.167.249.178) has joined #ceph
[13:47] * Altitudes (~Maza@9S0AAA5AY.tor-irc.dnsbl.oftc.net) Quit ()
[13:47] * Bwana (~Grimmer@176.10.99.201) has joined #ceph
[13:48] * jclm (~jclm@ip-64-134-187-212.public.wayport.net) has joined #ceph
[13:51] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[13:51] * treenerd__ (~treenerd@85.193.140.98) has joined #ceph
[13:52] * Hannibal (~Hahaha@p4FFD8E2C.dip0.t-ipconnect.de) Quit (Quit: Verlassend)
[13:52] * jclm1 (~jclm@ip-64-134-187-212.public.wayport.net) Quit (Ping timeout: 480 seconds)
[13:53] * bobrik__ (~bobrik@109.167.249.178) Quit (Ping timeout: 480 seconds)
[13:53] * badone (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) Quit (Ping timeout: 480 seconds)
[13:55] * jclm (~jclm@ip-64-134-187-212.public.wayport.net) Quit (Quit: Leaving.)
[13:56] * jclm (~jclm@ip-64-134-187-212.public.wayport.net) has joined #ceph
[13:56] * oro (~oro@82.141.178.242) has joined #ceph
[13:59] * linjan (~linjan@195.110.41.9) Quit (Ping timeout: 480 seconds)
[13:59] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[13:59] * b0e (~aledermue@213.95.25.82) has joined #ceph
[14:00] * i_m1 (~ivan.miro@deibp9eh1--blueice4n1.emea.ibm.com) has joined #ceph
[14:00] * i_m (~ivan.miro@deibp9eh1--blueice4n1.emea.ibm.com) Quit (Quit: Leaving.)
[14:05] * jclm (~jclm@ip-64-134-187-212.public.wayport.net) Quit (Quit: Leaving.)
[14:06] * xarses (~xarses@12.10.113.130) Quit (Ping timeout: 480 seconds)
[14:08] * i_m1 (~ivan.miro@deibp9eh1--blueice4n1.emea.ibm.com) Quit (Ping timeout: 480 seconds)
[14:13] * brunoleon (~quassel@ARennes-658-1-109-198.w83-199.abo.wanadoo.fr) Quit (Remote host closed the connection)
[14:16] * ibravo (~ibravo@72.83.69.64) has joined #ceph
[14:17] * Bwana (~Grimmer@7R2AABP1F.tor-irc.dnsbl.oftc.net) Quit ()
[14:18] * liiwi (liiwi@idle.fi) Quit (Ping timeout: 480 seconds)
[14:20] * liiwi (liiwi@idle.fi) has joined #ceph
[14:20] * linjan (~linjan@195.110.41.9) has joined #ceph
[14:28] * kawa2014 (~kawa@89.184.114.246) Quit (Ping timeout: 480 seconds)
[14:29] * wschulze (~wschulze@cpe-69-206-240-164.nyc.res.rr.com) has joined #ceph
[14:29] * treenerd__ (~treenerd@85.193.140.98) Quit (Ping timeout: 480 seconds)
[14:34] * KevinPerks (~Adium@2606:a000:80ad:1300:597a:9e58:f677:e520) has joined #ceph
[14:34] * scuttlemonkey is now known as scuttle|afk
[14:36] * shylesh (~shylesh@121.244.87.118) has joined #ceph
[14:37] * murmur1 (~Chrissi_@7R2AABP3A.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:38] * jrankin (~jrankin@d53-64-170-236.nap.wideopenwest.com) has joined #ceph
[14:39] * kawa2014 (~kawa@212.110.41.244) has joined #ceph
[14:41] * treenerd__ (~treenerd@85.193.140.98) has joined #ceph
[14:43] * kefu (~kefu@114.92.121.253) has joined #ceph
[14:49] * shylesh__ (~shylesh@121.244.87.118) has joined #ceph
[14:50] <emik0> Hello! Can anyone help me out with RGW on debian? Getting [ceph_deploy.rgw][ERROR ] Failed to execute command: service ceph-radosgw start
[14:52] <emik0> if i just mv file to new name, output is "/usr/bin/radosgw is not running."
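(Consistent with the rename experiment above: the Debian radosgw package ships its init script as 'radosgw' rather than the 'ceph-radosgw' name ceph-deploy is invoking, so it is worth verifying what actually got installed; service/section names below are assumptions:)
    ls /etc/init.d/ | grep -i radosgw
    service radosgw start                        # assumes a matching [client.radosgw.<name>] section in ceph.conf
    tail /var/log/ceph/client.radosgw.*.log      # if a log file is configured in that section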
[14:52] * squ (~Thunderbi@46.109.36.167) Quit (Quit: squ)
[14:53] * xarses (~xarses@166.175.185.252) has joined #ceph
[14:53] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[14:55] * shylesh (~shylesh@121.244.87.118) Quit (Ping timeout: 480 seconds)
[15:00] * nsoffer (~nsoffer@nat-pool-tlv-t.redhat.com) Quit (Ping timeout: 480 seconds)
[15:02] <shylesh__> loicd: radosgw-syncagent stripping off the '[' & ']' in an ipv6 URL .. any idea
[15:03] <loicd> shylesh__: even when urlescaped ?
[15:03] <shylesh__> loicd: destination: http://[2620:52:0:880:225:90ff:fefc:25aa] becomes UG:boto:url = 'http://2620:52:0:880:225:90ff:fefc:25aa/admin/config'
[15:03] <shylesh__> yep
[15:04] <shylesh__> i tried http://\[2620:52:0:880:225:90ff:fefc:25aa\] --> this becomes -->'http://[2620:52:0:880:225:90ff:fefc:25aa/admin/config
[15:04] <loicd> shylesh__: I have no expertise there. but ... does this work on other websites ?
[15:04] <shylesh__> loicd: because of this sync agent fails to contact destination
[15:04] <shylesh__> loicd: yep .. itworked fine with ipv4
[15:05] <loicd> I mean can you address a web site with [XXX] as an IPv6 ? I'm so ignorant about that ;-)
[15:05] <loicd> owasserm: does that ring a bell ?
[15:05] <shylesh__> loicd: I could access the service from a browser with the IPv6 url http://[2620:52:0:880:225:90ff:fefc:25aa]
[15:05] * ibravo (~ibravo@72.83.69.64) Quit (Quit: Leaving)
[15:06] * ibravo (~ibravo@72.83.69.64) has joined #ceph
[15:06] * murmur1 (~Chrissi_@7R2AABP3A.tor-irc.dnsbl.oftc.net) Quit ()
[15:06] * thebevans (~bevans@109.69.234.234) Quit (Quit: thebevans)
[15:07] <loicd> I'm unable to connect but that just means the provider of my coworking place is not IPv6 aware ;-)
[15:08] <loicd> shylesh__: all I find is http://tracker.ceph.com/issues/10965
[15:08] <owasserm> loicd, try alfredodeza
[15:09] <owasserm> shylesh__, ^
[15:09] <loicd> owasserm: oh, right, alfredodeza is working on the agent
[15:10] <loicd> shylesh__: you can file an issue at http://tracker.ceph.com/projects/rgw/issues/new
[15:10] <owasserm> loicd, well he was ...
[15:10] <alfredodeza> :)
[15:10] <shylesh__> loicd: its enabled and service is running
[15:10] * i_m (~ivan.miro@deibp9eh1--blueice4n1.emea.ibm.com) has joined #ceph
[15:10] <alfredodeza> there is no support for IPV6 in the agent
[15:10] <shylesh__> loicd: tcp6 0 0 :::80 :::* LISTEN 10626/radosgw
[15:10] <shylesh__> alfredodeza: its running here
[15:11] <shylesh__> alfredodeza: oh k .. for sync agent u meant ??
[15:11] <alfredodeza> yes
[15:11] <alfredodeza> *the agent* does not support IPV6
[15:11] <shylesh__> alfredodeza: then we can't have federated gateway for IPV6
[15:11] * overclk (~overclk@121.244.87.117) Quit (Quit: Leaving)
[15:11] <alfredodeza> shylesh__: what is the error you are seeing?
[15:12] <alfredodeza> I mean, I don't recall exactly how is it that we don't support it, but I surely don't recall anything in the agent to work with IPV6 :)
[15:12] <shylesh__> alfredodeza: I am trying to setup federated gateway on IPV6, agent can't fetch region map from dest
[15:12] * Shesh (~Architect@tor-exit1.arbitrary.ch) has joined #ceph
[15:12] <alfredodeza> right, what is the output
[15:12] <alfredodeza> pastebin the logs ?
[15:14] <shylesh__> alfredodeza: http://paste2.org/Z6LBXspv
[15:16] * kefu (~kefu@114.92.121.253) Quit (Max SendQ exceeded)
[15:16] * kefu (~kefu@114.92.121.253) has joined #ceph
[15:17] * daviddcc (~dcasier@ADijon-653-1-114-184.w90-33.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[15:17] <alfredodeza> shylesh__: can you use a newer version of the agent?
[15:17] <alfredodeza> and pass it the verbose flag and try again?
[15:18] * marrusl (~mark@rrcs-70-60-101-195.midsouth.biz.rr.com) has joined #ceph
[15:18] * i_m (~ivan.miro@deibp9eh1--blueice4n1.emea.ibm.com) Quit (Quit: Leaving.)
[15:18] * i_m (~ivan.miro@deibp9eh1--blueice4n1.emea.ibm.com) has joined #ceph
[15:18] <shylesh__> alfredodeza: this is a downstream package.. I only have rpms .. I can paste the verbose output
[15:18] <alfredodeza> what would the issue be with getting an RPM from upstream?
[15:19] * tganguly (~tganguly@121.244.87.117) Quit (Ping timeout: 480 seconds)
[15:20] <alfredodeza> shylesh__: the output on the newer versions of the agent is x100 more readable
[15:21] <shylesh__> alfredodeza: ya but I can't disturb the setup with upstream package.. :(
[15:21] <alfredodeza> shylesh__: what is the value for the destination? is it literally [2620:52:0:880:225:90ff:fefc:25aa] ? or "[2620:52:0:880:225:90ff:fefc:25aa]" ?
[15:21] <alfredodeza> well there is no setup so far because it doesn't work :)
[15:22] <shylesh__> alfredodeza: for http to be used with IPv6 it needs "[" "]" I think
[15:22] <alfredodeza> it just makes it more difficult to understand what is going on without the newer version
[15:22] * rlrevell (~leer@vbo1.inmotionhosting.com) has joined #ceph
[15:22] <shylesh__> alfredodeza: even without "[" "]" it doesn't work
[15:22] <alfredodeza> shylesh__: I am trying to understand what you have for a value for the destination
[15:22] * tupper (~tcole@2001:420:2280:1272:8900:f9b8:3b49:567e) has joined #ceph
[15:22] <alfredodeza> what do you have in your configuration file
[15:23] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[15:23] <shylesh__> alfredodeza: pm
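(For background on why the brackets matter: per RFC 3986 an IPv6 literal in a URL must be wrapped in [] so its colons are not read as a port separator, but URL parsers hand back the bare address, so any code that rebuilds the URL has to re-add them. A small Python 2 illustration, reusing the address quoted above:)
    from urlparse import urlparse
    u = urlparse('http://[2620:52:0:880:225:90ff:fefc:25aa]/admin/config')
    print u.hostname                                    # '2620:52:0:880:225:90ff:fefc:25aa' -- brackets already stripped
    rebuilt = 'http://[%s]%s' % (u.hostname, u.path)    # brackets must be restored by hand for IPv6 literals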
[15:27] * kefu (~kefu@114.92.121.253) Quit (Max SendQ exceeded)
[15:27] * yanzheng (~zhyan@125.71.107.110) Quit (Quit: This computer has gone to sleep)
[15:28] * yanzheng (~zhyan@125.71.107.110) has joined #ceph
[15:28] * kefu (~kefu@114.92.121.253) has joined #ceph
[15:28] * scuttle|afk is now known as scuttlemonkey
[15:28] * treenerd__ (~treenerd@85.193.140.98) Quit (Ping timeout: 480 seconds)
[15:28] * bobrik____ (~bobrik@109.167.249.178) has joined #ceph
[15:29] * marrusl (~mark@rrcs-70-60-101-195.midsouth.biz.rr.com) Quit (Remote host closed the connection)
[15:31] * bobrik___ (~bobrik@109.167.249.178) Quit (Ping timeout: 480 seconds)
[15:31] * kawa2014 (~kawa@212.110.41.244) Quit (Ping timeout: 480 seconds)
[15:31] * bobrik____ (~bobrik@109.167.249.178) Quit (Read error: Connection reset by peer)
[15:31] * bobrik____ (~bobrik@109.167.249.178) has joined #ceph
[15:32] * rwheeler (~rwheeler@nat-pool-bos-u.redhat.com) has joined #ceph
[15:37] * bobrik______ (~bobrik@109.167.249.178) has joined #ceph
[15:38] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[15:38] * kefu (~kefu@114.92.121.253) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[15:40] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) Quit (Quit: Leaving.)
[15:41] * bobrik____ (~bobrik@109.167.249.178) Quit (Ping timeout: 480 seconds)
[15:41] * Shesh (~Architect@9S0AAA5F5.tor-irc.dnsbl.oftc.net) Quit ()
[15:42] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[15:44] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Ping timeout: 480 seconds)
[15:46] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[15:49] * oro (~oro@82.141.178.242) Quit (Quit: Leaving)
[15:52] * marrusl (~mark@rrcs-70-60-101-195.midsouth.biz.rr.com) has joined #ceph
[15:58] * yanzheng (~zhyan@125.71.107.110) Quit (Quit: This computer has gone to sleep)
[16:00] * yanzheng (~zhyan@125.71.107.110) has joined #ceph
[16:00] * linuxkidd (~linuxkidd@209.163.164.50) has joined #ceph
[16:04] * ngoswami (~ngoswami@121.244.87.116) Quit (Remote host closed the connection)
[16:05] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[16:06] * bobrik_______ (~bobrik@109.167.249.178) has joined #ceph
[16:06] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[16:08] * kraken (~kraken@gw.sepia.ceph.com) Quit (Ping timeout: 480 seconds)
[16:08] * thomnico (~thomnico@cro38-2-88-180-16-18.fbx.proxad.net) Quit (Quit: Ex-Chat)
[16:11] * kraken (~kraken@gw.sepia.ceph.com) has joined #ceph
[16:12] * bobrik______ (~bobrik@109.167.249.178) Quit (Ping timeout: 480 seconds)
[16:14] * kefu (~kefu@114.92.121.253) has joined #ceph
[16:16] * Silentspy (~Izanagi@nx-01.tor-exit.network) has joined #ceph
[16:19] * yanzheng (~zhyan@125.71.107.110) Quit (Quit: This computer has gone to sleep)
[16:19] * Hemanth (~Hemanth@121.244.87.117) Quit (Ping timeout: 480 seconds)
[16:22] * yanzheng (~zhyan@125.71.107.110) has joined #ceph
[16:23] * wushudoin (~wushudoin@38.140.108.2) has joined #ceph
[16:25] * wushudoin (~wushudoin@38.140.108.2) Quit ()
[16:32] * dyasny (~dyasny@173.231.115.59) has joined #ceph
[16:34] * cok (~chk@2a02:2350:18:1010:5d0c:b14b:7495:29af) Quit (Quit: Leaving.)
[16:35] * wushudoin (~wushudoin@38.140.108.2) has joined #ceph
[16:37] * Hau_MI is now known as HauM1
[16:38] * kefu (~kefu@114.92.121.253) Quit (Max SendQ exceeded)
[16:39] * kefu (~kefu@114.92.121.253) has joined #ceph
[16:39] * bobrik_______ (~bobrik@109.167.249.178) Quit (Read error: Connection reset by peer)
[16:39] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[16:39] * bobrik_______ (~bobrik@109.167.249.178) has joined #ceph
[16:40] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[16:44] * stevenm_ (~stevenm@212.57.232.254) has joined #ceph
[16:45] * bobrik_______ (~bobrik@109.167.249.178) Quit (Read error: Connection reset by peer)
[16:45] * bobrik_______ (~bobrik@109.167.249.178) has joined #ceph
[16:46] * stevenm_ (~stevenm@212.57.232.254) has left #ceph
[16:46] * Silentspy (~Izanagi@7R2AABP6K.tor-irc.dnsbl.oftc.net) Quit ()
[16:46] * SaneSmith (~drdanick@89.105.194.85) has joined #ceph
[16:46] * ade (~abradshaw@tmo-111-156.customers.d1-online.com) has joined #ceph
[16:46] * yguang11 (~yguang11@12.31.82.125) Quit (Remote host closed the connection)
[16:47] * sankarshan (~sankarsha@183.87.39.242) Quit (Quit: Are you sure you want to quit this channel (Cancel/Ok) ?)
[16:47] * yguang11 (~yguang11@12.31.82.125) has joined #ceph
[16:47] * bobrik_______ (~bobrik@109.167.249.178) Quit (Read error: Connection reset by peer)
[16:48] * bobrik_______ (~bobrik@109.167.249.178) has joined #ceph
[16:51] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[16:55] * yguang11 (~yguang11@12.31.82.125) Quit (Ping timeout: 480 seconds)
[16:59] * branto (~branto@178-253-136-248.3pp.slovanet.sk) Quit (Ping timeout: 480 seconds)
[17:00] * rwheeler (~rwheeler@nat-pool-bos-u.redhat.com) Quit (Quit: Leaving)
[17:02] * nsoffer (~nsoffer@nat-pool-tlv-t.redhat.com) has joined #ceph
[17:03] * adeel (~adeel@fw1.ridgeway.scc-zip.net) has joined #ceph
[17:05] * haomaiwa_ (~haomaiwan@60-250-10-249.HINET-IP.hinet.net) has joined #ceph
[17:05] * kefu_ (~kefu@114.92.97.251) has joined #ceph
[17:07] * haomaiwang (~haomaiwan@183.206.171.154) Quit (Ping timeout: 480 seconds)
[17:09] * kefu (~kefu@114.92.121.253) Quit (Ping timeout: 480 seconds)
[17:11] * shylesh__ (~shylesh@121.244.87.118) Quit (Remote host closed the connection)
[17:12] * haomaiwang (~haomaiwan@183.206.171.154) has joined #ceph
[17:16] * SaneSmith (~drdanick@8Q4AABJX8.tor-irc.dnsbl.oftc.net) Quit ()
[17:16] * richardus1 (~Jyron@7R2AABP8L.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:16] * haomaiwa_ (~haomaiwan@60-250-10-249.HINET-IP.hinet.net) Quit (Ping timeout: 480 seconds)
[17:16] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Quit: Ex-Chat)
[17:17] * cholcombe (~chris@c-73-180-29-35.hsd1.or.comcast.net) has joined #ceph
[17:23] * wicope (~wicope@0001fd8a.user.oftc.net) Quit (Read error: Connection reset by peer)
[17:24] * bobrik_______ (~bobrik@109.167.249.178) Quit (Quit: (null))
[17:24] * bobrik_______ (~bobrik@109.167.249.178) has joined #ceph
[17:25] * bobrik_______ (~bobrik@109.167.249.178) Quit ()
[17:30] * joshd1 (~jdurgin@68-119-140-18.dhcp.ahvl.nc.charter.com) has joined #ceph
[17:31] * yanzheng (~zhyan@125.71.107.110) Quit (Quit: This computer has gone to sleep)
[17:33] * OutOfNoWhere (~rpb@199.68.195.101) Quit (Ping timeout: 480 seconds)
[17:34] * t4nk262 (~oftc-webi@67-43-142-107.border7-dynamic.dsl.sentex.ca) has joined #ceph
[17:34] * vbellur (~vijay@122.167.141.188) has joined #ceph
[17:37] * jordanP (~jordan@scality-jouf-2-194.fib.nerim.net) Quit (Quit: Leaving)
[17:39] * joshd1 (~jdurgin@68-119-140-18.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[17:40] * xarses_ (~xarses@166.175.190.188) has joined #ceph
[17:41] * xarses_ (~xarses@166.175.190.188) Quit (Read error: Connection reset by peer)
[17:41] * xarses_ (~xarses@166.175.190.188) has joined #ceph
[17:44] * calvinx (~calvin@101.100.172.246) Quit (Quit: calvinx)
[17:44] <t4nk262> Hi All, we're going to repurpose 4 SuperMicro Servers - Supermicro (2U) 8 SATA Bay with 6025B-URB X7DBU 2x Intel Xeon Quad Core and 32GB RAM each. I'll be adding a dual Infiniband QDR MHQH29B-XTR card so the 4 servers can talk to each other and to our two Proxmox Hosts. Ceph will be used as virtual storage to run our VM's. I'd like to add 1 SSD drive to each of these servers but I don't want to lose a SATA bay. I'm thinking
[17:44] <t4nk262> are steep for us. Are there any tested, cheaper ($500 range) PCIe SSD cards this community thinks would work for us?
[17:45] <t4nk262> Or should I just put our journaling on a SATA disk and when we are ready purchase an SSD and put it in a SATA bay? Any advice you can provide for me would be greatly appreciated.
[17:45] * treenerd__ (~treenerd@cpe90-146-100-181.liwest.at) has joined #ceph
[17:46] * richardus1 (~Jyron@7R2AABP8L.tor-irc.dnsbl.oftc.net) Quit ()
[17:46] <jidar> anybody feel like helping me with my OSD issues? Basically I've got 6 OSD's up, and 1 mon up but for whatever reason they won't come off of being stale+incomplete, peering, etc. This is a fresh deployment so no data https://gist.github.com/f1af5919f4af31f2300c
[17:46] * SaneSmith (~capitalth@hessel2.torservers.net) has joined #ceph
[17:47] * xarses (~xarses@166.175.185.252) Quit (Ping timeout: 480 seconds)
[17:48] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Ping timeout: 480 seconds)
[17:49] * mgolub (~Mikolaj@91.225.202.92) has joined #ceph
[17:50] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[17:52] <gregsfortytwo> jidar: at a guess you've probably actually had all your OSDs stop running for some reason, and with none of them running to report each other dead the monitor is going to wait ~15 minutes before doing so
[17:52] <jidar> gregsfortytwo: I see the ceph-osd process running on each osd host
[17:52] <jidar> with netstat reporting the two ports open per osd
[17:52] <jidar> ie:
[17:52] <jidar> tcp 0 0 192.168.0.11:6807 0.0.0.0:* LISTEN 25345/ceph-osd off (0.00/0/0)
[17:52] <gregsfortytwo> that's not nearly enough
[17:52] <jidar> tcp 0 0 192.168.0.11:6808 0.0.0.0:* LISTEN 25345/ceph-osd off (0.00/0/0)
[17:52] <gregsfortytwo> ports
[17:52] * kraken (~kraken@gw.sepia.ceph.com) Quit (Remote host closed the connection)
[17:53] * kraken (~kraken@gw.sepia.ceph.com) has joined #ceph
[17:53] <jidar> is there something else?
[17:53] <gregsfortytwo> have you found and followed the troubleshooting docs yet? I think they should give you some pointers to get going
[17:53] * treenerd__ (~treenerd@cpe90-146-100-181.liwest.at) Quit (Remote host closed the connection)
[17:53] <jidar> I followed the OSD faq/troubleshooting guide
[17:54] <jidar> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/ iirc
[17:54] <gregsfortytwo> so what did the admin socket tell you about the OSD's state?
[17:54] * treenerd (~treenerd@cpe90-146-100-181.liwest.at) has joined #ceph
[17:55] * yguang11 (~yguang11@nat-dip30-wl-d.cfw-a-gci.corp.yahoo.com) has joined #ceph
[17:56] * i_m (~ivan.miro@deibp9eh1--blueice4n1.emea.ibm.com) Quit (Ping timeout: 480 seconds)
[17:56] <jidar> you mean this one? ceph --admin-daemon /var/run/ceph/ceph-mon.mac0025b500001b.asok config show
[17:56] <jidar> or quorum_status, etc?
[17:57] <jidar> because when I prune all of the logs, I don't really see anything wrong
[17:58] <jidar> I used to have a bunch of errors on my OSDs starting up
[17:58] <jidar> and I feel like there is some bad stuff left over from the last time I tried to build this cluster
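(For reference, the admin socket query gregsfortytwo is after looks roughly like this, run on the OSD host itself; the osd id in the socket name is a placeholder:)
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok status              # whoami, state (e.g. booting vs active), osdmap epochs
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight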
[17:58] * jyoti-ranjan (~ranjanj@idp01webcache5-z.apj.hpecore.net) has joined #ceph
[17:59] <jyoti-ranjan> I need help to setup ceph rados gateway which i have been trying for a while. Any help will be greatly appreciated
[17:59] <jidar> jyoti-ranjan: don't ask to ask, just ask
[18:00] * moore (~moore@63-232-3-122.dia.static.qwest.net) has joined #ceph
[18:01] <jyoti-ranjan> I deployed ceph cluster on single node, which is functioning fine. Also deployed radosgw on same node
[18:01] * moore_ (~moore@64.202.160.88) has joined #ceph
[18:01] <jyoti-ranjan> radosgw-admin user info --uid=testuser also worked fine
[18:01] <jyoti-ranjan> but when i tried to create a bucket, it fails
[18:02] <jyoti-ranjan> getting following error:
[18:02] <jyoti-ranjan> Traceback (most recent call last):
[18:02] <jyoti-ranjan> File "s3test.py", line 13, in <module>
[18:02] <jyoti-ranjan> bucket = conn.create_bucket('my-new-bucket')
[18:02] <jyoti-ranjan> File "/usr/lib/python2.7/dist-packages/boto/s3/connection.py", line 504, in create_bucket
[18:02] <jyoti-ranjan> response.status, response.reason, body)
[18:02] <jyoti-ranjan> boto.exception.S3ResponseError: S3ResponseError: 404 Not Found
[18:02] <jyoti-ranjan> None
[18:02] <jyoti-ranjan> Not sure how to proceed further ...
[18:03] <jyoti-ranjan> I followed ceph documentation as per http://docs.ceph.com/docs/master/radosgw/config/
[18:03] <jyoti-ranjan> any pointer?
[18:04] <jyoti-ranjan> might be making a basic mistake ... but being a newbie ... not able to decipher
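(One common cause of a 404 from boto against a fresh radosgw is the default subdomain calling format, which needs 'rgw dns name' plus wildcard DNS to resolve bucket.host. A path-style sketch along the lines of the docs page mentioned above; host, port and keys are placeholders:)
    import boto
    import boto.s3.connection
    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
        host='gateway-host', port=80, is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat())
    bucket = conn.create_bucket('my-new-bucket')   # a 404 here then points at the gateway/web-server wiring rather than DNS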
[18:06] * yanzheng (~zhyan@125.71.107.110) has joined #ceph
[18:06] * jwilkins (~jwilkins@2601:9:4580:f4c:ea2a:eaff:fe08:3f1d) has joined #ceph
[18:08] * moore (~moore@63-232-3-122.dia.static.qwest.net) Quit (Ping timeout: 480 seconds)
[18:09] * xarses_ (~xarses@166.175.190.188) Quit (Read error: No route to host)
[18:09] * xarses (~xarses@166.175.190.188) has joined #ceph
[18:10] * xarses (~xarses@166.175.190.188) Quit (Read error: Connection reset by peer)
[18:10] * mattch (~mattch@pcw3047.see.ed.ac.uk) Quit (Ping timeout: 480 seconds)
[18:10] * kefu_ (~kefu@114.92.97.251) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[18:10] * xarses (~xarses@166.175.190.188) has joined #ceph
[18:10] * analbeard (~shw@5.80.205.222) has joined #ceph
[18:11] * xarses (~xarses@166.175.190.188) Quit (Remote host closed the connection)
[18:12] * xarses (~xarses@166.175.190.188) has joined #ceph
[18:13] <analbeard> hi guys, after a bit of advice. i'm running a small cluster comprised of a few nodes - ~20tb total at the moment. I've got another node to add to the cluster (3 osds) and i've also got another disk in an existing node which is very unhealthy. to minimise downtime I was thinking of setting noout, stopping the dead osd, adding the new node/disks and then unsetting noout. is this a good plan, or is there a better way?
[18:16] * SaneSmith (~capitalth@9S0AAA5M2.tor-irc.dnsbl.oftc.net) Quit ()
[18:18] * kefu (~kefu@114.92.97.251) has joined #ceph
[18:19] * Kioob`Taff (~plug-oliv@2a01:e35:2e8a:1e0::42:10) Quit (Quit: Leaving.)
[18:22] * dgurtner (~dgurtner@178.197.231.53) Quit (Ping timeout: 480 seconds)
[18:24] * calvinx (~calvin@101.100.172.246) has joined #ceph
[18:25] * CorneliousJD|AtWork (~dusti@1.tor.exit.babylon.network) has joined #ceph
[18:25] * sage (~quassel@2607:f298:6050:709d:a846:fa7:4714:887f) Quit (Remote host closed the connection)
[18:27] * sage (~quassel@2607:f298:6050:709d:4004:c720:f8bb:dac8) has joined #ceph
[18:27] * ChanServ sets mode +o sage
[18:27] * jyoti-ranjan (~ranjanj@idp01webcache5-z.apj.hpecore.net) Quit (Ping timeout: 480 seconds)
[18:27] * xarses (~xarses@166.175.190.188) Quit (Read error: No route to host)
[18:27] * xarses (~xarses@166.175.190.188) has joined #ceph
[18:27] * gaveen (~gaveen@175.157.146.224) Quit (Remote host closed the connection)
[18:28] * xarses (~xarses@166.175.190.188) Quit (Remote host closed the connection)
[18:28] * xarses (~xarses@166.175.190.188) has joined #ceph
[18:29] * kefu (~kefu@114.92.97.251) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[18:30] * yanzheng (~zhyan@125.71.107.110) Quit (Quit: This computer has gone to sleep)
[18:30] * kefu (~kefu@114.92.97.251) has joined #ceph
[18:35] * ade (~abradshaw@tmo-111-156.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[18:35] * linjan (~linjan@195.110.41.9) Quit (Ping timeout: 480 seconds)
[18:38] * nsoffer (~nsoffer@nat-pool-tlv-t.redhat.com) Quit (Ping timeout: 480 seconds)
[18:43] <aarontc> analbeard: if it were me, I'd mark the unhealthy OSD as out right away and wait for the cluster to fully recover before adding disks
[18:44] <aarontc> in my experience, making too many changes at once opens up more risk of issues
[18:46] * sleinen2 (~Adium@2001:620:0:82::107) Quit (Ping timeout: 480 seconds)
[18:47] <analbeard> aarontc: thanks, i'll do it that way then. i've dropped max backfills etc so it shouldn't cause too much interruption
[18:47] <analbeard> hopefully!
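(The commands behind that plan, for the record; the osd id is a placeholder and the throttle value is just an example:)
    ceph osd out 12                                          # mark the unhealthy OSD out now and let recovery finish
    ceph tell 'osd.*' injectargs '--osd-max-backfills 1'     # the backfill throttle analbeard mentions
    ceph osd set noout                                       # only around short, planned work
    ceph osd unset noout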
[18:49] * wicope (~wicope@0001fd8a.user.oftc.net) has joined #ceph
[18:49] * branto (~branto@178-253-136-248.3pp.slovanet.sk) has joined #ceph
[18:49] * branto (~branto@178-253-136-248.3pp.slovanet.sk) has left #ceph
[18:52] * Sysadmin88 (~IceChat77@2.124.164.69) has joined #ceph
[18:53] * linjan (~linjan@195.110.41.9) has joined #ceph
[18:53] * rlrevell (~leer@vbo1.inmotionhosting.com) Quit (Ping timeout: 480 seconds)
[18:55] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[18:55] * CorneliousJD|AtWork (~dusti@9S0AAA5O6.tor-irc.dnsbl.oftc.net) Quit ()
[18:55] * bene-at-car-repair (~ben@nat-pool-bos-t.redhat.com) has joined #ceph
[18:55] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Quit: Leaving.)
[18:57] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) has joined #ceph
[18:58] * bitserker (~toni@88.87.194.130) Quit (Ping timeout: 480 seconds)
[19:02] * ivotron (uid25461@id-25461.brockwell.irccloud.com) has joined #ceph
[19:04] * haomaiwa_ (~haomaiwan@183.206.171.154) has joined #ceph
[19:05] * haomaiwang (~haomaiwan@183.206.171.154) Quit (Ping timeout: 480 seconds)
[19:06] * rlrevell (~leer@vbo1.inmotionhosting.com) has joined #ceph
[19:07] * xarses_ (~xarses@mdf5036d0.tmodns.net) has joined #ceph
[19:08] * tuhnis (~jacoo@0.tor.exit.babylon.network) has joined #ceph
[19:08] * lcurtis (~lcurtis@47.19.105.250) has joined #ceph
[19:09] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[19:13] * georgem (~Adium@fwnat.oicr.on.ca) Quit ()
[19:13] * xarses (~xarses@166.175.190.188) Quit (Ping timeout: 480 seconds)
[19:13] * kefu (~kefu@114.92.97.251) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[19:17] * Be-El (~quassel@fb08-bcf-pc01.computational.bio.uni-giessen.de) Quit (Remote host closed the connection)
[19:18] * nsoffer (~nsoffer@nat-pool-tlv-t.redhat.com) has joined #ceph
[19:19] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[19:21] * ade (~abradshaw@82.100.236.138) has joined #ceph
[19:21] * kawa2014 (~kawa@89.184.114.246) Quit (Quit: Leaving)
[19:24] * vbellur (~vijay@122.167.141.188) Quit (Ping timeout: 480 seconds)
[19:26] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[19:26] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[19:30] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) has joined #ceph
[19:33] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[19:38] * tuhnis (~jacoo@9S0AAA5RH.tor-irc.dnsbl.oftc.net) Quit ()
[19:38] * Gibri (~notmyname@9S0AAA5SW.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:38] * danieagle (~Daniel@187.35.206.151) has joined #ceph
[19:39] * midnightrunner (~midnightr@c-67-174-241-112.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[19:41] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving)
[19:44] * calvinx (~calvin@101.100.172.246) Quit (Quit: calvinx)
[19:45] * arx (~arx@sniff-the.kittypla.net) has joined #ceph
[19:47] * LeaChim (~LeaChim@host86-175-32-176.range86-175.btcentralplus.com) has joined #ceph
[19:47] * bjornar_ (~bjornar@ti0099a430-1131.bb.online.no) has joined #ceph
[19:49] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) Quit (Quit: jdillaman)
[19:50] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[19:51] * arx (~arx@sniff-the.kittypla.net) Quit (Quit: Your ideas are intriguing to me, and I wish to subscribe to your newsletter.)
[19:51] * linjan (~linjan@195.110.41.9) Quit (Ping timeout: 480 seconds)
[19:53] * puffy (~puffy@161.170.193.99) has joined #ceph
[19:55] * rlrevell1 (~leer@vbo1.inmotionhosting.com) has joined #ceph
[19:56] * cloud_vision (~cloud_vis@bzq-79-180-29-82.red.bezeqint.net) has joined #ceph
[19:58] <cloud_vision> is it possible to stop clients from requesting blocks from specific OSDs? we ran into an issue where, after adding a new server, the rest of the servers are OK (mostly reads) but the new OSDs on the new server are under very high load. For some reason it is affecting the entire storage
[19:59] <cloud_vision> would be great if there were some way to stop clients from requesting from those OSDs until they finish backfilling
[20:00] * rlrevell (~leer@vbo1.inmotionhosting.com) Quit (Ping timeout: 480 seconds)
[20:01] <cloud_vision> this issue is making the storage really slow while most of the servers are not busy at all
[20:01] * diq (~diq@c-50-161-114-166.hsd1.ca.comcast.net) has joined #ceph
[20:01] <cloud_vision> thanks for any suggestions!
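There is no switch to exclude an up+in OSD from client I/O entirely, but a hedged sketch of the closest equivalents follows (OSD IDs are hypothetical, and the primary-affinity command requires "mon osd allow primary affinity = true" on the monitors):
    # shift primary (read-serving) duty away from the overloaded OSDs
    ceph osd primary-affinity osd.25 0
    ceph osd primary-affinity osd.26 0
    # restore once backfilling finishes
    ceph osd primary-affinity osd.25 1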
[20:01] * analbeard (~shw@5.80.205.222) Quit (Quit: Leaving.)
[20:02] * lpabon (~quassel@24-151-54-34.dhcp.nwtn.ct.charter.com) has joined #ceph
[20:06] * MACscr (~Adium@2601:d:c800:de3:3517:760f:d3e5:3807) Quit (Quit: Leaving.)
[20:07] * wschulze (~wschulze@cpe-69-206-240-164.nyc.res.rr.com) Quit (Quit: Leaving.)
[20:07] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[20:08] * midnightrunner (~midnightr@216.113.160.71) has joined #ceph
[20:08] * Gibri (~notmyname@9S0AAA5SW.tor-irc.dnsbl.oftc.net) Quit ()
[20:08] * Hidendra1 (~dusti@9S0AAA5UG.tor-irc.dnsbl.oftc.net) has joined #ceph
[20:09] * MACscr (~Adium@2601:d:c800:de3:3ce6:91bf:607a:52a1) has joined #ceph
[20:09] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) has joined #ceph
[20:13] * wschulze (~wschulze@cpe-69-206-240-164.nyc.res.rr.com) has joined #ceph
[20:15] * shaunm (~shaunm@74.215.76.114) Quit (Ping timeout: 480 seconds)
[20:15] * vbellur (~vijay@122.167.141.188) has joined #ceph
[20:15] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) has joined #ceph
[20:18] * ngoswami (~ngoswami@121.244.87.116) Quit (Quit: Leaving)
[20:23] * nsoffer (~nsoffer@nat-pool-tlv-t.redhat.com) Quit (Ping timeout: 480 seconds)
[20:25] * [arx] (~arx@sniff-the.kittypla.net) has joined #ceph
[20:25] * andreww (~xarses@166.175.190.188) has joined #ceph
[20:26] * sleinen (~Adium@vpn-ho-d-137.switch.ch) has joined #ceph
[20:28] * TheSov2 (~TheSov@cip-248.trustwave.com) has joined #ceph
[20:32] * xarses_ (~xarses@mdf5036d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[20:33] * shylesh (~shylesh@123.136.222.88) has joined #ceph
[20:35] * Hidendra1 (~dusti@9S0AAA5UG.tor-irc.dnsbl.oftc.net) Quit (autokilled: This host maybe infected. - Contact support@oftc.net for help. (2015-06-15 18:35:23))
[20:35] <TheSov2> is it recommended to have a mirrored /var on the monitors?
[20:35] <mongo> you can set the initial weight to 0 in the config file, and change the rate of backfill.
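A rough ceph.conf sketch of what mongo is suggesting (option names as in Hammer; verify against your version), so new OSDs come in with zero weight and can be ramped up gradually:
    [osd]
    osd crush initial weight = 0
    osd max backfills = 1
    osd recovery max active = 1
    # later, raise the weight in small steps; osd id and weight below are placeholders:
    #   ceph osd crush reweight osd.30 0.5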
[20:36] <mongo> TheSov2: you can do it with MD, but planning for failure will make your life easier than trying to prevent it.
[20:36] <TheSov2> I was told that monitor's should have /var on a SSD
[20:37] <TheSov2> oh? so you mean it's just best to have a warm monitor server standing by?
[20:37] * Lyncos (~lyncos@208.71.184.41) has joined #ceph
[20:37] <mongo> You should have at least 3, preferably 5.
[20:37] <TheSov2> i know that :P
[20:37] <TheSov2> i mean in case 1 goes down
[20:37] <mongo> that is why you have 5, if 1 goes down it is not a firedrill.
[20:37] <TheSov2> 5 makes the small scale at which i am starting cost-prohibitive
[20:38] <mongo> but mirroring the disks will fix most of the risk of a disk failing.
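For reference, a minimal mdadm sketch of the MD mirroring mongo mentions, assuming two spare devices /dev/sdb and /dev/sdc (hypothetical) dedicated to the monitor's store:
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
    mkfs.xfs /dev/md0
    mount /dev/md0 /var/lib/ceph/mon   # or wherever the mon data dir lives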
[20:38] * Nacer (~Nacer@2001:41d0:fe82:7200:acf5:d887:522e:95fd) has joined #ceph
[20:38] <Lyncos> Hi Everyone.. I'm having a performance issue I cannot figure out.... running ceph version 0.94.2 .. with 20Gbit bonds... I'm not able to saturate either my Micron SSDs or the network.... I did individual tests.. and I'm able to saturate the Microns with 'dd' and the network with iperf.... I was able to saturate the 20Gbit/s network before (before what.. I don't know).
[20:39] <Lyncos> I think the upgrade from 0.91 to 0.94 created this .. I'm not sure where to look
[20:39] <TheSov2> im building a cluster to hold our backups. i have to prove this out before going into production. so backups is my only egress point
[20:39] <mongo> Note that, after the disks themselves, RAID cards are the largest identifiable source of downtime in datacenters.
[20:40] * CorneliousJD|AtWork (~Revo84@104.167.107.142) has joined #ceph
[20:40] <mongo> Lyncos: depending on what network cards you have, a private cluster network will probably perform better, but check that jumbo frames are working and that GRO and LRO are off on Intel cards, and GRO off on Broadcom (in my experience)
[20:40] <Lyncos> mongo Ok I'll look at GRO/LRO
[20:40] <Lyncos> jumbo is enabled
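To check and toggle the offloads mongo refers to, something along these lines (eth0 is a placeholder interface name; results vary by NIC and driver):
    ethtool -k eth0 | egrep 'generic-receive-offload|large-receive-offload'
    ethtool -K eth0 gro off lro off
    ip link show eth0   # confirm the MTU if jumbo frames are expected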
[20:41] <TheSov2> there are people who built ceph clusters on raspberry pis, that's awesome IMO
[20:41] <mongo> most linux bonding configurations remove the modern features of 10GbE cards which allow them to scale, e.g. irq balancing across cpus etc..
[20:41] <Lyncos> mongo .. I did not change the bonding setup
[20:41] <Lyncos> it was performing before... I'm talking about 4Gbit/s now vs 18 or 19Gbit/s before
[20:41] * shylesh (~shylesh@123.136.222.88) Quit (Remote host closed the connection)
[20:42] <mongo> Lyncos: what base OS?
[20:42] <Lyncos> Debian wheezy
[20:42] <Lyncos> Kernel 3.16
[20:42] <Lyncos> I got some Sysctl settings
[20:42] <TheSov2> i know inktank is owned by redhat now but realistically, which distro is better for ceph? redhat or deb?
[20:42] <Lyncos> I'll pastebin them
[20:43] <Lyncos> http://pastebin.com/ESZuVPmP
[20:43] <mongo> TheSov2: as a customer, I prefer debian due to a much more modern kernel and kernel backports to the LTS version.
[20:44] <Lyncos> It's possible that these sysctls were added around the same time I saw the drop in performance... it's not a production cluster yet.. I was testing it before putting it in prod.... and figured out I'm not even able to saturate anything
[20:44] <Lyncos> when I was able before
[20:44] <mongo> but then again I also want to run btrfs, which is risky on the version that rhel/centos7 has.
[20:45] <mongo> Lyncos: yep, buffer bloat will reduce performance.
[20:45] <Lyncos> Am I better off removing all this and trying again?
[20:46] * sleinen1 (~Adium@84-72-160-233.dclient.hispeed.ch) has joined #ceph
[20:48] * sleinen2 (~Adium@2001:620:0:82::102) has joined #ceph
[20:50] * vbellur (~vijay@122.167.141.188) Quit (Ping timeout: 480 seconds)
[20:51] <Lyncos> I'll try to remove these sysctl settings.. this is the only change I see on the cluster
[20:51] * sleinen (~Adium@vpn-ho-d-137.switch.ch) Quit (Ping timeout: 480 seconds)
[20:53] * wushudoin_ (~wushudoin@transit-86-181-132-209.redhat.com) has joined #ceph
[20:53] * puffy1 (~puffy@161.170.193.99) has joined #ceph
[20:53] * puffy (~puffy@161.170.193.99) Quit (Read error: Connection reset by peer)
[20:54] * sleinen1 (~Adium@84-72-160-233.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[20:59] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) Quit (Quit: jdillaman)
[21:00] * championofcyrodi (~championo@50-205-35-98-static.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[21:00] <cloud_vision> hi guys, any suggestions on how to stop client requests to an OSD that is rebuilding? it's dragging down the whole cluster. :)
[21:00] * wushudoin (~wushudoin@38.140.108.2) Quit (Ping timeout: 480 seconds)
[21:01] <monsted> you could slowly increase its weight instead of having it go balls-to-the-wall from the start
[21:01] <cloud_vision> 6 nodes total of 30 osds
[21:01] * wushudoin_ (~wushudoin@transit-86-181-132-209.redhat.com) Quit (Ping timeout: 480 seconds)
[21:01] <todin> does anyone know this sandisk box, the if500? does it really use ceph internally? How do they get this huge number of IOPS?
[21:02] <cloud_vision> if 1 server out of the 6 is rebuilding, the whole cluster is slow; it doesn't look quite right. maybe it's the lack of SSDs for journals but I'm not sure
[21:02] * wushudoin_ (~wushudoin@38.140.108.2) has joined #ceph
[21:02] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[21:03] <doppelgrau> cloud_vision: reduce number of PGs per OSD that are parallel worked on
[21:03] <cloud_vision> you mean reduce PGs for the OSDs on the server that is rebuilding (the new one)?
[21:04] <cloud_vision> or for the pool
[21:04] <monsted> todin: give it enough CPU, RAM and SSDs?
[21:05] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[21:05] <todin> monsted: we tried it, but we couldn't get more than 5k IOPS per OSD
[21:05] <TheSov2> 1 ssd per 4 rust is a good ratio for backups?
[21:05] <cloud_vision> it is a little lacking on that (the others are stronger)
[21:05] * lpabon (~quassel@24-151-54-34.dhcp.nwtn.ct.charter.com) Quit (Ping timeout: 480 seconds)
[21:05] <doppelgrau> cloud_vision: osd recovery max active and osd max backfills
[21:05] * puffy1 (~puffy@161.170.193.99) Quit (Quit: Leaving.)
[21:06] <cloud_vision> it's set to 1
[21:06] <cloud_vision> 1 thread
[21:06] <cloud_vision> or disks in parallel, not sure
[21:06] <doppelgrau> cloud_vision: try osd disk thread ioprio idle?
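The throttles doppelgrau mentions can be injected at runtime, roughly like this (Hammer option names; replace the wildcard with a specific OSD ID to target only the rebuilding OSDs):
    ceph tell osd.* injectargs '--osd-recovery-max-active 1 --osd-max-backfills 1'
    ceph tell osd.* injectargs '--osd-disk-thread-ioprio-class idle --osd-disk-thread-ioprio-priority 7'
    # note: the ioprio options only take effect when the disks use the CFQ I/O scheduler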
[21:07] <cloud_vision> didn't try that yet, i will now. anything that will stop the cluster from getting slow would be nice
[21:07] <monsted> topro: then you use a couple hundred OSDs and you're golden? :)
[21:08] <cloud_vision> :) sure, all ssds
[21:08] <cloud_vision> :)
[21:09] <cloud_vision> that was a joke, we are not that rich
[21:09] * CorneliousJD|AtWork (~Revo84@5NZAADVRF.tor-irc.dnsbl.oftc.net) Quit ()
[21:10] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) Quit (Read error: Connection reset by peer)
[21:10] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) has joined #ceph
[21:11] <TheSov2> i heard your pg number should be high enough that there are about 100 pgs per osd, is that correct?
[21:11] * andreww (~xarses@166.175.190.188) Quit (Quit: Leaving)
[21:12] <todin> but it is a bit of a waste to put one osd on a dc 3700 ssd. the ssd can handle a lot more than the 5k IOPS
[21:12] * xarses (~xarses@166.175.190.188) has joined #ceph
[21:12] <doppelgrau> TheSov: http://ceph.com/pgcalc/
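The rule of thumb behind that calculator is roughly: total PGs = (number of OSDs * 100) / replica count, rounded up to the next power of two. For example, 30 OSDs with size 3 gives 30 * 100 / 3 = 1000, so about 1024 PGs spread across the pools. A pool could then be created along these lines (the pool name is a placeholder):
    ceph osd pool create mypool 1024 1024   # pg_num and pgp_num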
[21:14] <mongo> note that dd isn't the best test to see if your ssd is fighting GC; fio or some other tool against a raw device will be much better.
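A hedged fio sketch of the kind of raw-device test mongo means (this writes directly to the device and destroys its data; /dev/sdX is a placeholder):
    fio --name=ssd-gc-test --filename=/dev/sdX --direct=1 --sync=1 \
        --ioengine=libaio --rw=randwrite --bs=4k --iodepth=32 \
        --runtime=60 --time_based --numjobs=1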
[21:14] <cloud_vision> correct me if I'm wrong, isn't client op priority supposed to handle that situation?
[21:14] <jidar> am I supposed to be able to `ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show`
[21:18] <jidar> just passed the wrong options in
[21:18] <jidar> had to define the cluster
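For reference, the admin-socket invocation jidar settled on looks roughly like this (the socket path follows the $cluster-osd.$id.asok convention; adjust to the actual cluster name and OSD id):
    ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show
    ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok help   # list supported socket commands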
[21:19] <monsted> todin: might be worth playing around with FSs. a friend tried xfs on intel 750 series NVMe and got horrible performance, while zfs was amazingly fast.
[21:20] <todin> monsted: we tried xfs, ext4 and btrfs the are more or less the same. which implementation of zfs did he use?
[21:20] <monsted> todin: dunno, latest zfsonlinux thing, i'd imagine.
[21:20] <TheSov2> can anyone recommend a nice cheapish chassis for roughly 8 osds
[21:21] <doppelgrau> cloud_vision: can also help, but it still attributes a proportional amount of IO to recovery if the OSD is nearly fully loaded ...
[21:21] <monsted> todin: mind you, he wasn't doing ceph
[21:23] <cloud_vision> it's already fully loaded as it is (1 thread)
[21:23] <cloud_vision> thread_ioprio_class idle wasn't helping either
[21:24] <cloud_vision> if the journals are not on SSD then you should expect a slow cluster every time a new server is added to the cluster
[21:24] <doppelgrau> cloud_vision: are you sure the parameters are active? (via ceph injectargs, or did you restart the osd?)
[21:25] <todin> monsted: without ceph, if i just test the fs on the ssd, I get 50k IOPS
[21:25] <cloud_vision> i mean a new server without SSD journals
[21:26] <TheSov2> recommendations were 4GB and 1GHz per osd, correct?
[21:26] <cloud_vision> seems that no kernel or ceph tweak will help
[21:26] <TheSov2> 4gb of ram*
[21:27] <cloud_vision> when 1 server is slow the whole cluster is slow
[21:27] * Nacer (~Nacer@2001:41d0:fe82:7200:acf5:d887:522e:95fd) Quit (Remote host closed the connection)
[21:32] * visbits (~textual@8.29.138.28) has joined #ceph
[21:33] <zenpac> Will Ceph .80 be happy with Debian 8?
[21:41] <visbits> http://pastebin.com/XXN9W1iB
[21:41] <visbits> err
[21:42] * slashd (~slashd@modemcable041.67-82-70.mc.videotron.ca) has joined #ceph
[21:42] * slashd (~slashd@modemcable041.67-82-70.mc.videotron.ca) has left #ceph
[21:52] <TheSov2> when you build a crushmap, do you have to specify every osd?
[21:54] <jidar> anybody feel like helping me with my OSD issues? Basically I've got 6 OSD's up, and 1 mon up but for whatever reason they won't come off of being stale+incomplete, peering, etc. This is a fresh deployment so no data https://gist.github.com/f1af5919f4af31f2300c I've also looked at some of the ceph-osd troubleshooting and looked through the ceph --admin-daemon options without much showing me why I'm
[21:54] <jidar> seeing what I'm seeing, any ideas at all?
[21:54] * a1-away (~jelle@62.27.85.48) Quit (Ping timeout: 480 seconds)
[21:55] <doppelgrau> jidar: with default crush rules you have a problem (size=3 with only two hosts)
[21:55] <TheSov2> u have 1 monitor?
[21:56] <jidar> yea, 1 monitor and 2 osd hosts
[21:56] <doppelgrau> jidar: try setting size=2 in the pools or change the crush-map
[21:56] <jidar> let me throw another host into this
[21:56] <jidar> or that
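A sketch of the two options doppelgrau gives, assuming the pool is named rbd (adjust the name) on a two-host cluster:
    ceph osd pool set rbd size 2
    ceph osd pool set rbd min_size 1
    # or add a third host's OSDs so the default size=3 rule can be satisfied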
[21:56] <TheSov2> is your monitor incredibly slow?
[21:56] <jidar> not to my knowledge, it's a physical host
[21:56] * mgolub (~Mikolaj@91.225.202.92) Quit (Quit: away)
[21:56] <jidar> 128gigs of ram
[21:56] <jidar> etc
[21:57] <TheSov2> well your monitor thinks you are degraded for some reason
[21:57] <TheSov2> it may clear if you delete your pools and create them
[21:59] <TheSov2> incidentally, according to the people in this room, the minimum number of monitors to have is an odd number greater than 1
[21:59] <doppelgrau> TheSov: only for fault-tolerance
[22:00] <doppelgrau> TheSov: 1 monitor is fine, as long as you can live with the fact that a failing monitor means data loss
[22:00] <TheSov2> doppelgrau, I understand that, but there is also the factor of loss of communication
[22:00] * nsoffer (~nsoffer@bzq-79-180-80-9.red.bezeqint.net) has joined #ceph
[22:00] <TheSov2> which seems to have happened in jidar's case
[22:02] * jcalcote_ (~oftc-webi@63-248-159-172.static.layl0102.digis.net) has joined #ceph
[22:03] <jcalcote_> Hey everyone - I've just gone through the quick-start on a four-node centos 6 configuration. Everything is working great, but when I try to run ceph-rest-api (stand-alone), I get a traceback ending with "rados.ObjectNotFound: error connecting to the cluster"
[22:05] <TheSov2> you can drain an osd by setting its weight to 0 correct?
[22:05] <jcalcote_> I've tried every combination of arguments I can think of - specify the config file directly with --conf=/etc/ceph/ceph.conf, --cluster=ceph, etc
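A hedged guess at the usual cause of that traceback: ceph-rest-api authenticates as client.restapi by default and needs a key plus a readable keyring. Something like the following matches the documented setup, but the exact flags should be checked against the installed version:
    ceph auth get-or-create client.restapi mon 'allow *' osd 'allow *' mds 'allow' \
        -o /etc/ceph/ceph.client.restapi.keyring
    ceph-rest-api -n client.restapi --cluster ceph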
[22:05] * a1-away (~jelle@62.27.85.48) has joined #ceph
[22:06] * Nacer (~Nacer@203-206-190-109.dsl.ovh.fr) has joined #ceph
[22:06] * championofcyrodi (~championo@50-205-35-98-static.hfc.comcastbusiness.net) has joined #ceph
[22:07] * KeeperOfTheSoul (~nupanick@79.98.107.90) has joined #ceph
[22:09] <cloud_vision> @doppelgaro its injectargs not osd restart
[22:09] <cephalobot`> cloud_vision: Error: "doppelgaro" is not a valid command.
[22:10] * Nacer (~Nacer@203-206-190-109.dsl.ovh.fr) Quit (Remote host closed the connection)
[22:11] <TheSov2> does anyone have a crushmap i can look at as reference the ones online are very simple im looking for something complex
[22:13] <zenpac> Is meta-data server only used for Ceph-FS?
[22:13] <doppelgrau> zenpac: yes
[22:14] <TheSov2> how does ceph-fs work. i know with rbd you map it. same with ceph-fs?
[22:15] <zenpac> doppelgrau: thanks.. Is Ceph-fs the way to go in the future?
[22:15] * jcalcote_ (~oftc-webi@63-248-159-172.static.layl0102.digis.net) has left #ceph
[22:15] <doppelgrau> cloud_vision: seems strange that even with io-class idle it creates lag, but I'm not sure if switching the IO class is possible at runtime
[22:16] <doppelgrau> zenpac: which usecase?
[22:16] * treenerd (~treenerd@cpe90-146-100-181.liwest.at) Quit (Quit: Verlassend)
[22:16] <zenpac> doppelgrau: For OpenStack for example. Juno...
[22:18] * DV (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[22:18] <TheSov2> can someone explain to me what openstack achieves? is it just an alternative to vmware/proxmox?
[22:18] <monsted> TheSov2: cheap and scalable
[22:18] <zenpac> doppelgrau: Its for testing production use of OpenStack + Ceph ..
[22:18] <TheSov2> monsted, how do you mean?
[22:19] <monsted> TheSov2: openstack and the management system is meant to scale to hundreds of hosts with thousands of VMs
[22:19] <monsted> TheSov2: while vmware is stuck at... 16 hosts?
[22:20] * linjan (~linjan@213.8.240.146) has joined #ceph
[22:20] <TheSov2> eh? no way its way more than that
[22:20] <monsted> our clusters are all under 16 :)
[22:20] <TheSov2> you have numa enabled then
[22:20] <TheSov2> turn it off and you can have more
[22:20] <TheSov2> u can get 32 i think
[22:21] * dgurtner (~dgurtner@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[22:21] * ade (~abradshaw@82.100.236.138) Quit (Ping timeout: 480 seconds)
[22:21] <TheSov2> yes 32 is the number for 5.5
[22:21] <monsted> yeah, it's 32 now
[22:22] <monsted> we've run esx farms since the limit was 16 :)
[22:22] <monsted> (at least if i remember right)
[22:24] <monsted> i just build them (physically) and leave the config to someone else :)
[22:24] <doppelgrau> zenpac: I'm not an openstack expert, but as far as I understand openstack utilizes "only" rbd and RGW
[22:25] * ibravo (~ibravo@72.83.69.64) Quit (Read error: Connection reset by peer)
[22:26] * ibravo (~ibravo@72.83.69.64) has joined #ceph
[22:26] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Read error: Connection reset by peer)
[22:27] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[22:27] <doppelgrau> zenpac: rbd, RGW and CephFS are different "access methods" for the underlying distributed storage, so you choose the best one for the application
[22:27] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[22:28] <doppelgrau> zenpac: If you need a distributed POSIX filesystem, CephFS is the way to go (in the near future; some devs say it isn't fully production ready yet).
[22:28] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Quit: Leaving.)
[22:28] <monsted> "production ready" is for sissies
[22:28] <doppelgrau> zenpac: for virtual disks rbd is the best choice
[22:28] <doppelgrau> monsted: I'll use it, but only for some config files (and I have backups ^^)
[22:32] <jidar> TheSov2: so I tried to delete all pools
[22:32] <jidar> and recreate one
[22:32] <jidar> seems to have the same issue
[22:32] <jidar> well not same
[22:32] <doppelgrau> jidar: size=3 or size=2 now?
[22:32] <jidar> let me re-do that now that I've created a new pool
[22:34] <jidar> health HEALTH_WARN 64 pgs stuck unclean; too few pgs per osd (10 < min 20)
[22:34] <jidar> this I think I can fix
[22:35] * tomc (~tomc@192.41.52.12) has joined #ceph
[22:35] <tomc> So we're seeing an issue with a degraded cluster that we don't know how to get out of.
[22:36] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Read error: Connection reset by peer)
[22:36] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[22:36] <tomc> We updated our crush profile which caused, expectedly, a large amount of data movement. However, we are now in a situation where the mons are becoming unresponsive due to large mon dbs (35GB and growing) and it appears compaction is not a thing unless the cluster is in a healthy state
[22:36] * KeeperOfTheSoul (~nupanick@5NZAADVWD.tor-irc.dnsbl.oftc.net) Quit ()
[22:37] * Silentspy (~Enikma@hessel0.torservers.net) has joined #ceph
[22:37] * puffy (~puffy@161.170.193.99) has joined #ceph
[22:38] <tomc> the migration is about 80% complete, but we're pretty sure the mon quorum is not going to last long enough to complete it... we've had a few instances today where the mons lose quorum for a couple of minutes... which we've seen before; once it starts happening it's only a matter of time before they lose quorum permanently, in our experience.
[22:40] <doppelgrau> tomc: not seen that problem, but online compaction didn't work if I understand you correctly? ceph tell mon.{id} compact
[22:42] <tomc> correct, we attempted that on 1 mon, and it is now temporarily out of the quorum; compaction reduced the size of the DB by 0 bytes, and now it is attempting to catch back up to the time it lost
[22:42] * sleinen2 (~Adium@2001:620:0:82::102) Quit (Ping timeout: 480 seconds)
[22:42] * vbellur (~vijay@49.248.227.250) has joined #ceph
[22:42] <tomc> but it can't because it's too slow to catch up to the mon map changes? I assume
[22:42] <jidar> doppelgrau: ok, so size = 2, health HEALTH_WARN 128 pgs stuck unclean, deleted all pools
[22:43] <jidar> recreated, reset size = 2, changed the pgs from 64 to 128 when I recreated it
[22:43] <jidar> getting that
[22:43] <tomc> I also then set compact on start, and restarted that same mon
[22:43] <tomc> it now appears to be downloading all 35GB of mon level db from another mon
[22:44] * dyasny (~dyasny@173.231.115.59) Quit (Ping timeout: 480 seconds)
[22:44] <tomc> we've pretty reliably seen issues where mons become unresponsive as soon as the db gets larger than 7-10GB... Does compaction not run if the cluster is not healthy?
[22:45] <doppelgrau> jidar: now clean or still errors?
[22:46] <jidar> still those warn stuck unclean error
[22:46] <TheSov2> jidar, did you create osds and then delete them without removing them from crush?
[22:46] <doppelgrau> tomc: sorry, can't answer that question (even though it is a good one and I'm curious too)
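For reference, the compaction knobs discussed above, as they exist in Hammer (whether compaction runs while the cluster is unhealthy is exactly the open question here):
    ceph tell mon.a compact          # on-demand compaction; "a" is a placeholder mon id
    # ceph.conf, [mon] section: compact the store every time the mon starts
    mon compact on start = true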
[22:46] <jidar> TheSov2: I did, but I've since repaired that
[22:46] <jidar> so my osd tree looks clean
[22:46] <jidar> and my auth list looks clean
[22:46] <TheSov2> so if you export your crush map, it shows only existing osd's?
[22:46] <jidar> I haven't tried that
[22:47] <TheSov2> try that
[22:48] <jidar> ceph osd crushmap seems to spit out garbage
[22:48] <jidar> er, getcrushmap
[22:49] <jidar> maybe just osd crush dump?
[22:50] <doppelgrau> jidar: you'll have to decompile it to get plain text
[22:50] <jidar> oh dear!
[22:50] <jidar> googling
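The usual round-trip for inspecting and editing the CRUSH map in plain text looks like this (file names are arbitrary):
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt    # decompile to editable text
    # edit crushmap.txt, then recompile and inject it
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new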
[22:51] <jidar> https://gist.github.com/1c364c61638195d97f7e
[22:52] <jidar> 6 osd's
[22:52] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[22:53] <doppelgrau> jidar: full ceph -s?
[22:54] <doppelgrau> jidar: the crush-map looks okay, default crush-rule and the rest as seen in the osd-tree
[22:54] <jidar> yea, not totally worried about the tuning stuff just yet
[22:54] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving)
[22:54] * badone (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) has joined #ceph
[22:54] <jidar> gisting
[22:54] <jidar> https://gist.github.com/848258db7bfee9953f31
[22:56] <doppelgrau> jidar: the size of the pool is two, right?
[22:56] <jidar> ceph osd pool get foo size
[22:56] <jidar> size: 2
[22:56] * rlrevell1 (~leer@vbo1.inmotionhosting.com) Quit (Quit: Leaving.)
[22:57] <jidar> and that's the only pool ins lspools
[22:58] <doppelgrau> jidar: strange, have you configured public and private networks or just one? and are the hosts reachable on each network?
[22:58] <jidar> https://gist.github.com/0de135fcccd051f95e0c
[22:58] <jidar> I've got priv/pub
[22:58] <jidar> basically, 192.168.{0,2}.0/23
[22:59] * beardo_ (~sma310@207-172-244-241.c3-0.atw-ubr5.atw.pa.cable.rcn.com) has joined #ceph
[22:59] <jidar> and I've telnet'd from the mon->osd, and osd->osd
[22:59] <jidar> across the open ports
[23:00] <jidar> tcp 0 0 192.168.0.10:6803 0.0.0.0:* LISTEN 25311/ceph-osd off (0.00/0/0)
[23:00] <jidar> stuff like that
[23:00] <jidar> are all working
[23:00] <doppelgrau> jidar: tested whether large packets "survive"?
[23:00] <jidar> I haven't tried a ping -s 1492 or so
[23:00] <jidar> but we can now!
[23:00] <jidar> it's all the same l2 network though
[23:01] <doppelgrau> jidar: ok, if you use the normal MTU; I thought you were using MTU 9000 or something like that, and that can sometimes cause problems
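The ping test jidar mentions, with fragmentation disallowed so the MTU is actually exercised (the target address is a placeholder; 1472 = 1500 minus 28 bytes of IP+ICMP headers, 8972 for a 9000-byte MTU):
    ping -M do -s 1472 192.168.0.11
    ping -M do -s 8972 192.168.0.11   # only if jumbo frames are enabled end to end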
[23:01] * tupper (~tcole@2001:420:2280:1272:8900:f9b8:3b49:567e) Quit (Ping timeout: 480 seconds)
[23:01] <jidar> looks like I've got jumbo frames available as well
[23:01] <jidar> MTU's on those interfaces are 1500
[23:02] <bilco105> jidar: mtu 1500 isn't jumbo frames :)
[23:02] <jidar> I said available
[23:02] <jidar> not in use on interface
[23:02] <bilco105> ah
[23:02] <bilco105> I'd highly recommend it
[23:03] <jidar> I'd be glad to! once I get a working cluster!
[23:03] <doppelgrau> jidar: just as a test, set size and min_size to one
[23:03] <bilco105> I'd say you're more likely to get it working with Jumbos than without it :)
[23:04] <jidar> HEALTH_OK
[23:04] <doppelgrau> jidar: if that works, I'd guess it's some sort of "crush black magic"
[23:04] <jidar> :(
[23:04] * puffy (~puffy@161.170.193.99) Quit (Quit: Leaving.)
[23:04] <doppelgrau> jidar: ok, then try setting the weight of the osds to the same value
[23:05] <jidar> weeeiiirirrrdddd
[23:05] <jidar> ok
[23:06] * ska (~skatinolo@cpe-173-174-111-177.austin.res.rr.com) has joined #ceph
[23:06] <doppelgrau> jidar: I had a case a few weeks ago where crush "couldn't really choose": in one rack a host was down and crush couldn't really cope with that; someone suggested changing the weight (in that case to 0, since it was down) and it worked
[23:06] <jidar> working on that
[23:06] * Silentspy (~Enikma@5NZAADVXY.tor-irc.dnsbl.oftc.net) Quit ()
[23:07] * BillyBobJohn (~Kyso@tor-amici-exit.tritn.com) has joined #ceph
[23:07] <doppelgrau> jidar: I think it has something to do with the approach of trying to find fitting "groups" of OSDs; in some cases too many attempts fail and crush gives up, or something like that
[23:07] <doppelgrau> jidar: as far as I understand only a problem in very small clusters
[23:08] <jidar> hmm, I've got a 3rd host to throw at this
[23:08] <jidar> with 12 or so OSD
[23:08] <jidar> but I was trying to figure out wtf, so I reduced the number of mon's from 3->1 and OSD hosts from 3->2
[23:09] <doppelgrau> jidar: you can test the theory: if it works with three hosts and size=2 but has problems with three hosts and size=3 ...
[23:09] * tomc (~tomc@192.41.52.12) Quit (Quit: tomc)
[23:09] <doppelgrau> or even simpler, change the crush-ap for testing
[23:09] <jidar> crush-ap?
[23:09] <doppelgrau> jidar: change step chooseleaf firstn 0 type host to step chooseleaf firstn 0 type osd
[23:10] <doppelgrau> map
[23:10] <doppelgrau> (only for testing, since ceph could choose osds on the same host then)
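For testing, the change doppelgrau describes is a one-word edit in the decompiled CRUSH rule, roughly like this (based on the default replicated ruleset; not suitable for production, since replicas may then land on the same host):
    rule replicated_ruleset {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type osd    # normally "type host"
            step emit
    }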
[23:11] * towen (~towen@c-98-230-203-84.hsd1.nm.comcast.net) has joined #ceph
[23:13] * ChrisNBlum (~textual@164-229.eduroam.rwth-aachen.de) has joined #ceph
[23:16] * vbellur (~vijay@49.248.227.250) Quit (Ping timeout: 480 seconds)
[23:16] <TheSov2> raw numbers here you can do 1 PB for a little over 80k
[23:16] <TheSov2> holy crap is this cheap
[23:16] <TheSov2> 1 pb of san is like near the million mark
[23:18] <jidar> doppelgrau: changing it back to size = 2 seems fine
[23:18] <doppelgrau> jidar: with or without the change in the crush-map?
[23:18] <jidar> I haven't touched the crushmap yet
[23:19] <jidar> ah, my min_size is 1
[23:19] * jskinner (~jskinner@host-95-2-129.infobunker.com) has joined #ceph
[23:20] <jidar> I feel like redhat has somehow done this to me
[23:21] <jidar> it works fine with min_size = 2 and size = 2 now
[23:21] * Nacer (~Nacer@2001:41d0:fe82:7200:44ab:210b:5810:a626) has joined #ceph
[23:24] <jidar> ok, so when it gets created
[23:25] * ChrisNBlum (~textual@164-229.eduroam.rwth-aachen.de) Quit (Ping timeout: 480 seconds)
[23:27] * midnightrunner (~midnightr@216.113.160.71) Quit (Remote host closed the connection)
[23:27] * jrankin (~jrankin@d53-64-170-236.nap.wideopenwest.com) Quit (Quit: Leaving)
[23:28] * wicope (~wicope@0001fd8a.user.oftc.net) Quit (Remote host closed the connection)
[23:32] * midnightrunner (~midnightr@216.113.160.71) has joined #ceph
[23:36] * bobrik_______ (~bobrik@83.243.64.45) has joined #ceph
[23:36] * BillyBobJohn (~Kyso@8Q4AABKAM.tor-irc.dnsbl.oftc.net) Quit ()
[23:37] * Schaap (~Vidi@TOR-EXIT.CYLAB.CMU.EDU) has joined #ceph
[23:37] * midnightrunner (~midnightr@216.113.160.71) Quit (Read error: Network is unreachable)
[23:37] * analbeard (~shw@5.80.205.222) has joined #ceph
[23:38] <TheSov2> not a bad pricetag for me, 1440TB for 82k
[23:40] * linjan (~linjan@213.8.240.146) Quit (Ping timeout: 480 seconds)
[23:41] * jclm (~jclm@ip-64-134-187-212.public.wayport.net) has joined #ceph
[23:43] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[23:44] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Quit: Leaving.)
[23:49] * midnightrunner (~midnightr@216.113.160.71) has joined #ceph
[23:55] * bjornar_ (~bjornar@ti0099a430-1131.bb.online.no) Quit (Ping timeout: 480 seconds)
[23:55] * jskinner (~jskinner@host-95-2-129.infobunker.com) Quit (Remote host closed the connection)
[23:56] * jskinner (~jskinner@host-95-2-129.infobunker.com) has joined #ceph
[23:57] * dgurtner (~dgurtner@217-162-119-191.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[23:59] * oro (~oro@79.120.135.209) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.