#ceph IRC Log

IRC Log for 2016-01-05

Timestamps are in GMT/BST.

[0:00] * InIMoeK (~InIMoeK@029-226-128-083.dynamic.caiway.nl) has joined #ceph
[0:00] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[0:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[0:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[0:01] * olid19810 (~olid1982@aftr-185-17-204-109.dynamic.mnet-online.de) Quit (Ping timeout: 480 seconds)
[0:02] * Etki (~smuxi@89-156-86-143.rev.numericable.fr) Quit (Read error: Connection reset by peer)
[0:04] * bene_in_mtg (~bene@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[0:07] * shawniverson (~shawniver@199.66.65.7) has joined #ceph
[0:10] * georgem (~Adium@184.151.179.232) has joined #ceph
[0:12] * Moriarty (~Fapiko@76GAAA0IT.tor-irc.dnsbl.oftc.net) Quit ()
[0:13] * mtb` (~mtb`@157.130.171.46) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[0:16] * johnavp1989 (~jpetrini@pool-100-14-5-21.phlapa.fios.verizon.net) has joined #ceph
[0:21] * Rachana (~Rachana@2601:87:3:3601::7554) Quit (Quit: Leaving)
[0:24] * mattbenjamin (~mbenjamin@aa2.linuxbox.com) Quit (Quit: Leaving.)
[0:26] <portante> Hi folks, best way to install the master branch of ceph via ceph-deploy?
[0:26] <portante> can I just use, "ceph-deploy install --dev master $CEPH_HOST"?
[0:27] <portante> I tried that on F23, and saw warnings like: "Failed to synchronize cache for repo 'ceph-noarch' from http://gitbuilder.ceph.com/... Cannot download repomd.xml ... disabling"
[0:27] * georgem (~Adium@184.151.179.232) Quit (Quit: Leaving.)
[0:28] <portante> I am new to ceph, so I am probably way off on how to do this
[0:28] <portante> I was following, http://docs.ceph.com/docs/master/rados/deployment/ceph-deploy-install/#install
[0:28] <portante> thinking --dev master would work
[0:29] * angdraug (~angdraug@64.124.158.100) Quit (Quit: Leaving)
[0:34] <joshd> portante: generally it will, but for f23 there aren't upstream packages yet, just for f22 and other distros
[0:35] <portante> okay, so I'll install f22 and work from that, thanks!
[0:40] * fsimonce (~simon@host229-72-dynamic.54-79-r.retail.telecomitalia.it) Quit (Quit: Coyote finally caught me)
[0:46] <TheSov2> portante, I would suggest you not build ceph, and instead use the existing binaries
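
For context, a minimal sketch of the install flow being discussed above, assuming an admin box with ceph-deploy and a distro for which upstream publishes dev packages (f22 at the time); $CEPH_HOST is a placeholder for the target node:

    # grab a recent ceph-deploy on the admin box (one common route)
    pip install ceph-deploy
    # pull packages built from the master branch onto the target host
    ceph-deploy install --dev master $CEPH_HOST
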
[0:52] * InIMoeK (~InIMoeK@029-226-128-083.dynamic.caiway.nl) Quit (Ping timeout: 480 seconds)
[0:59] * yguang11 (~yguang11@2001:4998:effd:600:99e1:b2f:e2cf:48c0) Quit (Remote host closed the connection)
[0:59] * yguang11 (~yguang11@nat-dip30-wl-d.cfw-a-gci.corp.yahoo.com) has joined #ceph
[1:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[1:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[1:03] * moore (~moore@64.202.160.88) Quit (Remote host closed the connection)
[1:04] * Icey (~Icey@0001bbad.user.oftc.net) Quit (Quit: Bye!)
[1:04] * Icey (~Icey@pool-74-109-7-163.phlapa.fios.verizon.net) has joined #ceph
[1:08] * garphy is now known as garphy`aw
[1:09] * Tumm (~geegeegee@nl2x.mullvad.net) has joined #ceph
[1:10] * fghaas (~florian@91-119-120-46.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[1:17] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Quit: Konversation terminated!)
[1:19] * oms101 (~oms101@p20030057EA11E700C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:23] * dyasny (~dyasny@dsl.198.58.158.134.ebox.ca) Quit (Ping timeout: 480 seconds)
[1:25] * Tumm (~geegeegee@nl2x.mullvad.net) Quit (Ping timeout: 480 seconds)
[1:27] * oms101 (~oms101@p20030057EA766900C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[1:33] * thansen (~thansen@17.253.sfcn.org) Quit (Quit: Ex-Chat)
[1:37] * pdrakeweb (~pdrakeweb@cpe-65-185-74-239.neo.res.rr.com) has joined #ceph
[1:37] * cloudm2 (uid37542@id-37542.tooting.irccloud.com) Quit (Quit: Connection closed for inactivity)
[1:38] * xar (~xar@retard.io) has joined #ceph
[1:40] * alram (~alram@cpe-172-250-2-46.socal.res.rr.com) Quit (Quit: leaving)
[1:49] * dyasny (~dyasny@dsl.198.58.158.134.ebox.ca) has joined #ceph
[1:55] * rendar_ (~I@87.19.183.44) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[1:59] * wushudoin (~wushudoin@2601:646:8201:7769:2ab2:bdff:fe0b:a6ee) Quit (Ping timeout: 480 seconds)
[2:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[2:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[2:05] * efirs (~firs@c-73-231-190-117.hsd1.ca.comcast.net) has joined #ceph
[2:09] * blynch (~blynch@vm-nat.msi.umn.edu) Quit (Remote host closed the connection)
[2:09] * blynch (~blynch@vm-nat.msi.umn.edu) has joined #ceph
[2:14] <portante> TheSov2: thanks, but I am not familiar enough with ceph to know the difference. How can I use ceph-deploy to install the existing binaries for master which at least include the syslog priority fix?
[2:15] <TheSov2> what os are you using
[2:17] <portante> fedora
[2:17] <portante> I can also use centos or rhel
[2:18] <portante> TheSov2: I was following Daniel Berrange's blog post on a single node ceph deployment, https://www.berrange.com/posts/2015/12/21/ceph-single-node-deployment-on-fedora-23/
[2:18] <TheSov2> well i got a step by step video online but thats using ubuntu server
[2:19] <portante> I thought using ceph-deploy install --dev master would be sufficient, but that requires an f23 build
[2:19] <TheSov2> and its specific to ubuntu server
[2:19] <portante> oh
[2:19] <portante> but I am told that since there are f22 builds for ceph, ceph-deploy install --dev master will work on an f22 box
[2:21] * sudocat (~dibarra@2602:306:8bc7:4c50::46) Quit (Ping timeout: 480 seconds)
[2:23] * linjan_ (~linjan@176.195.50.55) Quit (Ping timeout: 480 seconds)
[2:24] * squizzi (~squizzi@107.13.31.195) Quit (Ping timeout: 480 seconds)
[2:27] * yguang11 (~yguang11@nat-dip30-wl-d.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[2:28] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[2:31] * kefu (~kefu@114.92.107.250) has joined #ceph
[2:32] * LeaChim (~LeaChim@host86-132-236-140.range86-132.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[2:33] * linjan_ (~linjan@176.195.50.55) has joined #ceph
[3:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[3:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[3:02] * KrimZon (~csharp@162.216.46.184) has joined #ceph
[3:04] <portante> joshd: you still around?
[3:04] <portante> on F22, ceph-deploy fails, because it is trying to run this yum command, "yum -y -q install ceph radosgw", but it should be "yum -y -q install ceph ceph-radosgw"
[3:07] <joshd> portante: the upstream packages is just radosgw
[3:07] <joshd> *package
[3:07] <portante> ah, well, ceph-deploy is not finding it in the master git repo for f22
[3:07] <portante> :(
[3:07] <portante> foiled again
[3:08] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) Quit (Quit: jdillaman)
[3:08] <portante> where is pinky and the brain when you need em'!
[3:08] * valeech (~valeech@pool-108-44-162-111.clppva.fios.verizon.net) has joined #ceph
[3:08] <motk> speaking of ceph and rgw - anyone know if infernalis still supports the old uri/bucket naming convention, instead of bucket.uri?
[3:09] <portante> ceph-deploy on f22 generates the following yum command, "yum -y -q install ceph radosgw"
[3:09] <valeech> hello
[3:09] <portante> but the only radosgw package available is named, "ceph-radosgw"
[3:10] <motk> yep, in the fedora repo that's true
[3:10] <portante> see http://gitbuilder.ceph.com/ceph-rpm-fc22-x86_64-basic/ref/master/$basearch
[3:10] * jwilkins (~jowilkin@2601:644:4000:97c0:ea2a:eaff:fe08:3f1d) Quit (Ping timeout: 480 seconds)
[3:10] <portante> so can I hack ceph-deploy here?
[3:11] * sankarshan (~sankarsha@106.206.150.183) has joined #ceph
[3:11] * portante goes and tries to ...
[3:11] <motk> Name : ceph-radosgw
[3:11] <motk> Arch : x86_64
[3:11] <motk> Epoch : 1
[3:11] <motk> Version : 0.94.5
[3:11] <joshd> portante: it looks like the latest ceph-deploy has a hack for that
[3:12] <joshd> https://github.com/ceph/ceph-deploy/blob/master/ceph_deploy/install.py#L55
[3:12] <portante> k thanks
[3:12] <valeech> I have a ceph hardware architecture question. I have two storage servers with 8 2TB 7200RPM drives each. The storage servers have (2) 10G NICs and (2) 1G NICs. I also have a smaller server with 4 drives but it only has (2) 1G NICs. Would it be possible to cross-connect the 2 storage servers via 10G for private ceph replication and use the 1G interfaces on all three machines for monitors?
[3:18] * zhaochao (~zhaochao@111.161.77.227) has joined #ceph
[3:18] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[3:19] * jwilkins (~jowilkin@2601:644:4000:97c0:56ee:75ff:fe10:724e) has joined #ceph
[3:21] * georgem (~Adium@107-179-142-56.cpe.teksavvy.com) has joined #ceph
[3:23] <flaf> valeech: I'm not an expert but to my mind the smaller server needs to have an interface in the private cluster network.
[3:24] <valeech> flaf: That's what I was wondering. Do monitors need access to the private net or just the public net?
[3:26] <flaf> For me, the monitor needs access to the public net only (and generally the monitor is _in_ the public net).
[3:26] <lurbs> The only things that need to be in the private net are the OSDs.
[3:27] <valeech> That would be my scenario too. I am trying to get this done without buying a 10G switch :)
[3:28] <lurbs> Our proof of concept was three storage servers, each with two 10 Gb NICs and no switch. We connected them in a ring and put bridges across the interfaces.
[3:31] <portante> joshd, well, it looks like the fedora install method ignores the given components, and instead has a hardcoded install for "ceph radosgw"
[3:31] <portante> :(
[3:31] <portante> joshd: that is on the version I have.
[3:31] <portante> so I hacked it for now
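
The mismatch here is that ceph-deploy asks yum for a package named "radosgw" while the Fedora/gitbuilder build ships it as "ceph-radosgw"; roughly, the manual fallback (a sketch, run on the target node once ceph-deploy has set up the repo) looks like this:

    # install the gateway under the name the repo actually uses
    yum -y install ceph ceph-radosgw
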
[3:32] * vata (~vata@cable-21.246.173-197.electronicbox.net) Quit (Quit: Leaving.)
[3:32] <valeech> lurbs: that's a great idea
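
A hypothetical ceph.conf fragment for the split valeech describes - monitors and clients on the 1G public network, OSD replication on the 10G links; the subnets below are placeholders, not taken from the log:

    [global]
    # 1G side: monitors, clients, OSD-to-mon traffic
    public network  = 192.168.1.0/24
    # 10G cross-connect between the storage servers: OSD replication and recovery
    cluster network = 10.10.10.0/24
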
[3:32] <motk> does seb from ceph-ansible fame hang out here?
[3:32] * KrimZon (~csharp@162.216.46.184) Quit ()
[3:32] * Crisco (~offender@watchme.tor-exit.network) has joined #ceph
[3:35] * naoto (~naotok@2401:bd00:b001:8920:27:131:11:254) has joined #ceph
[3:35] * yanzheng (~zhyan@182.139.23.32) has joined #ceph
[3:35] * linjan_ (~linjan@176.195.50.55) Quit (Ping timeout: 480 seconds)
[3:36] * kefu (~kefu@114.92.107.250) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[3:39] * kefu (~kefu@114.92.107.250) has joined #ceph
[3:43] * valeech (~valeech@pool-108-44-162-111.clppva.fios.verizon.net) Quit (Quit: valeech)
[3:47] * doppelgrau_ (~doppelgra@p5DC06AB4.dip0.t-ipconnect.de) has joined #ceph
[3:53] * doppelgrau (~doppelgra@p5DC07B91.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[3:53] * doppelgrau_ is now known as doppelgrau
[3:58] <portante> joshd: so getting things to install and run on F22 has been quite a process
[3:58] <portante> as you suggested, I needed to use the latest ceph-deploy script, from "pip install ceph-deploy"
[3:59] <portante> then while the install worked, the ceph-mon service would not start because /var/lib/ceph/mon/ceph-gprfc059/store.db/ was owned by root and not the ceph user
[3:59] <portante> a simple chown -R ceph:ceph /var/lib/ceph/mon/ceph-gprfc059/store.db/ got me past that error, so now the mon starts
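
Spelled out, the fix portante describes looks roughly like this; the host name gprfc059 comes from the log, and the systemd unit name assumes an infernalis-style setup where the daemons run as the ceph user:

    # mon data must be owned by the ceph user, not root
    chown -R ceph:ceph /var/lib/ceph/mon/ceph-gprfc059/store.db/
    # (chown -R ceph:ceph /var/lib/ceph is the broader variant if other directories are affected)
    systemctl restart ceph-mon@gprfc059
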
[4:00] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[4:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[4:01] * Kupo1 (~tyler.wil@23.111.254.159) Quit (Read error: Connection reset by peer)
[4:02] <portante> the upshot is that all those errors and issues were NOT logged at "emerg" level, so the fix I am interested in is working
[4:02] * Crisco (~offender@84ZAAA3BK.tor-irc.dnsbl.oftc.net) Quit ()
[4:05] * naoto_ (~naotok@27.131.11.254) has joined #ceph
[4:11] * naoto (~naotok@2401:bd00:b001:8920:27:131:11:254) Quit (Ping timeout: 480 seconds)
[4:16] * fauxhawk (~Bromine@65-183-154-104-dhcp.burlingtontelecom.net) has joined #ceph
[4:18] * Kingrat (~shiny@2605:a000:161a:c0f6:ddd2:47d:385f:59bc) Quit (Ping timeout: 480 seconds)
[4:18] * ira (~ira@24.34.255.34) Quit (Ping timeout: 480 seconds)
[4:27] * Kingrat (~shiny@2605:a000:161a:c0f6:2066:c650:bb50:ce87) has joined #ceph
[4:27] * dyasny (~dyasny@dsl.198.58.158.134.ebox.ca) Quit (Ping timeout: 480 seconds)
[4:28] * overclk (~vshankar@59.93.68.10) has joined #ceph
[4:43] * duderonomy (~duderonom@c-24-7-50-110.hsd1.ca.comcast.net) has joined #ceph
[4:44] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[4:46] * fauxhawk (~Bromine@84ZAAA3CS.tor-irc.dnsbl.oftc.net) Quit ()
[5:00] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[5:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[5:04] * kefu (~kefu@114.92.107.250) Quit (Max SendQ exceeded)
[5:05] * kefu (~kefu@114.92.107.250) has joined #ceph
[5:12] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[5:13] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[5:18] * rakeshgm (~rakesh@106.51.30.177) has joined #ceph
[5:22] * amote (~amote@121.244.87.116) has joined #ceph
[5:23] * kefu (~kefu@114.92.107.250) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[5:23] * yanzheng (~zhyan@182.139.23.32) Quit (Quit: This computer has gone to sleep)
[5:27] * daviddcc (~dcasier@80.12.55.237) Quit (Ping timeout: 480 seconds)
[5:30] * kefu (~kefu@114.92.107.250) has joined #ceph
[5:35] * PappI (~superdug@178.162.216.42) has joined #ceph
[5:37] * daviddcc (~dcasier@80.12.55.77) has joined #ceph
[5:40] * georgem (~Adium@107-179-142-56.cpe.teksavvy.com) has left #ceph
[5:44] * tsg (~tgohad@134.134.137.73) Quit (Remote host closed the connection)
[5:51] * Vacuum__ (~Vacuum@88.130.207.60) has joined #ceph
[5:53] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[5:58] * Vacuum_ (~Vacuum@88.130.202.226) Quit (Ping timeout: 480 seconds)
[6:00] * swami1 (~swami@49.44.57.238) has joined #ceph
[6:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[6:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[6:05] * PappI (~superdug@76GAAA0TL.tor-irc.dnsbl.oftc.net) Quit ()
[6:07] * swami2 (~swami@49.32.0.194) has joined #ceph
[6:09] * overclk (~vshankar@59.93.68.10) Quit (Ping timeout: 480 seconds)
[6:12] * swami1 (~swami@49.44.57.238) Quit (Ping timeout: 480 seconds)
[6:15] <jith> hi all, I got an "ops are blocked" error msg when I try to upload data.. Is this because my cluster health is in HEALTH_WARN state?? I am a newbie...
[6:16] * johnavp1989 (~jpetrini@pool-100-14-5-21.phlapa.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[6:23] <motk> more info needed
[6:30] * rakeshgm (~rakesh@106.51.30.177) Quit (Ping timeout: 480 seconds)
[6:31] * kefu (~kefu@114.92.107.250) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[6:32] * efirs (~firs@c-73-231-190-117.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[6:34] * kefu (~kefu@114.92.107.250) has joined #ceph
[6:41] * overclk (~vshankar@59.93.69.70) has joined #ceph
[6:42] * derjohn_mob (~aj@tmo-112-174.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[6:46] * rdas (~rdas@121.244.87.116) has joined #ceph
[6:46] * kefu (~kefu@114.92.107.250) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[6:59] * yguang11 (~yguang11@c-50-131-146-113.hsd1.ca.comcast.net) has joined #ceph
[7:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[7:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[7:06] * kefu (~kefu@114.92.107.250) has joined #ceph
[7:11] * lcurtis (~lcurtis@47.19.105.250) Quit (Remote host closed the connection)
[7:12] * daviddcc (~dcasier@80.12.55.77) Quit (Ping timeout: 480 seconds)
[7:16] * delcake (~TomyLobo@atlantic797.serverprofi24.eu) has joined #ceph
[7:18] * shylesh (~shylesh@121.244.87.124) has joined #ceph
[7:24] * karnan (~karnan@121.244.87.117) has joined #ceph
[7:42] * yanzheng (~zhyan@125.71.108.99) has joined #ceph
[7:45] * derjohn_mob (~aj@tmo-112-244.customers.d1-online.com) has joined #ceph
[7:46] * delcake (~TomyLobo@4MJAAA0LS.tor-irc.dnsbl.oftc.net) Quit ()
[7:53] * reed (~reed@75-101-54-18.dsl.static.fusionbroadband.com) Quit (Quit: Ex-Chat)
[7:56] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[7:56] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[7:56] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[7:58] * derjohn_mob (~aj@tmo-112-244.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[7:58] * rwheeler (~rwheeler@5.29.243.114) Quit (Quit: Leaving)
[7:59] * sankarshan (~sankarsha@106.206.150.183) Quit (Quit: Leaving...)
[8:00] * derjohn_mob (~aj@tmo-112-244.customers.d1-online.com) has joined #ceph
[8:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[8:01] * overclk (~vshankar@59.93.69.70) Quit (Ping timeout: 480 seconds)
[8:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[8:06] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[8:07] * tries (~tries__@2a01:2a8:2000:ffff:1260:4bff:fe6f:af91) has joined #ceph
[8:08] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) has joined #ceph
[8:13] * jwilkins (~jowilkin@2601:644:4000:97c0:56ee:75ff:fe10:724e) Quit (Quit: Leaving)
[8:20] * krypto (~krypto@G68-121-13-178.sbcis.sbc.com) has joined #ceph
[8:20] * kefu (~kefu@114.92.107.250) Quit (Max SendQ exceeded)
[8:22] * Be-El (~quassel@fb08-bcf-pc01.computational.bio.uni-giessen.de) has joined #ceph
[8:22] <Be-El> hi
[8:25] * kefu (~kefu@114.92.107.250) has joined #ceph
[8:29] * dgbaley27 (~matt@75.148.118.217) has joined #ceph
[8:30] * trociny (~mgolub@93.183.239.2) Quit (Remote host closed the connection)
[8:32] * rakeshgm (~rakesh@121.244.87.117) has joined #ceph
[8:36] * trociny (~mgolub@93.183.239.2) has joined #ceph
[8:40] * derjohn_mob (~aj@tmo-112-244.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[8:42] * zaitcev (~zaitcev@c-50-130-189-82.hsd1.nm.comcast.net) Quit (Quit: Bye)
[8:43] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[8:44] * rwheeler (~rwheeler@bzq-82-81-161-51.red.bezeqint.net) has joined #ceph
[8:49] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) has joined #ceph
[8:50] * enax (~enax@hq.ezit.hu) has joined #ceph
[8:50] * enax (~enax@hq.ezit.hu) has left #ceph
[8:53] * Eduardo_ (~Eduardo@bl4-180-165.dsl.telepac.pt) has joined #ceph
[8:54] * nardial (~ls@dslb-088-072-094-077.088.072.pools.vodafone-ip.de) has joined #ceph
[8:55] * yguang11 (~yguang11@c-50-131-146-113.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[8:57] * treenerd (~treenerd@cpe90-146-148-47.liwest.at) has joined #ceph
[8:58] * dgurtner (~dgurtner@178.197.231.6) has joined #ceph
[8:59] * Eduardo__ (~Eduardo@bl4-180-165.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[9:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[9:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[9:01] * rogierm (~rogierm@a82-94-41-183.adsl.xs4all.nl) has joined #ceph
[9:06] * fghaas (~florian@91-119-120-46.dynamic.xdsl-line.inode.at) has joined #ceph
[9:07] * narthollis (~tritonx@37.48.81.27) has joined #ceph
[9:07] * RMar04 (~RMar04@support.memset.com) has joined #ceph
[9:08] * rmart04 (~rmart04@support.memset.com) has joined #ceph
[9:09] * kefu (~kefu@114.92.107.250) Quit (Max SendQ exceeded)
[9:09] * rogierm (~rogierm@a82-94-41-183.adsl.xs4all.nl) Quit (Ping timeout: 480 seconds)
[9:13] * steveeJ (~junky@HSI-KBW-149-172-252-139.hsi13.kabel-badenwuerttemberg.de) Quit (Ping timeout: 480 seconds)
[9:15] * trociny (~mgolub@93.183.239.2) Quit (Remote host closed the connection)
[9:17] * kefu (~kefu@114.92.107.250) has joined #ceph
[9:18] * pabluk_ is now known as pabluk
[9:18] * analbeard (~shw@support.memset.com) has joined #ceph
[9:20] * tdb_ (~tdb@myrtle.kent.ac.uk) Quit (Ping timeout: 480 seconds)
[9:20] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[9:21] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) has joined #ceph
[9:22] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[9:23] * tdb (~tdb@myrtle.kent.ac.uk) has joined #ceph
[9:24] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[9:25] * trociny (~mgolub@93.183.239.2) has joined #ceph
[9:25] * karnan (~karnan@121.244.87.117) has joined #ceph
[9:36] * overclk (~vshankar@59.93.65.32) has joined #ceph
[9:37] * narthollis (~tritonx@76GAAA0Y9.tor-irc.dnsbl.oftc.net) Quit ()
[9:44] * fsimonce (~simon@host229-72-dynamic.54-79-r.retail.telecomitalia.it) has joined #ceph
[9:45] * garphy`aw is now known as garphy
[9:46] * DanFoster (~Daniel@office.34sp.com) has joined #ceph
[9:53] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[9:53] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[9:56] * zhaochao_ (~zhaochao@61.135.194.175) has joined #ceph
[9:58] * thomnico (~thomnico@2a01:e35:8b41:120:48c4:8169:429f:b1ce) has joined #ceph
[9:59] * zhaochao (~zhaochao@111.161.77.227) Quit (Ping timeout: 480 seconds)
[9:59] * zhaochao_ is now known as zhaochao
[9:59] * Grimmer (~Epi@37.49.226.236) has joined #ceph
[9:59] * rogierm (~rogierm@095-097-129-154.static.chello.nl) has joined #ceph
[10:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[10:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[10:02] * branto1 (~branto@ip-78-102-208-28.net.upcbroadband.cz) has joined #ceph
[10:03] <swami2> I am using a ceph cluster with 8 nodes...I want to move a few nodes to another rack without downtime...is there a way to do this?
[10:04] <swami2> (same Q posted on ceph-devel)
[10:05] <RMar04> you should be able to set no-out and move one node at a time, re-add it back into the cluster, then move on to the next node, no? Updating your node locations in your crush map afterwards might cause some load though.
[10:05] <Be-El> without further information about your setup: ensure that the min_size setting for all pools allows traffic with one host missing, and then move the hosts one by one
[10:05] * jordanP (~jordan@204.13-14-84.ripe.coltfrance.com) has joined #ceph
[10:06] <Be-El> RMar04: I'm not sure whether no-out is a good idea. If it is set, traffic to the osds in the about-to-be-moved hosts will be blocked, resulting in a stall of affected virtual machines etc.
[10:07] * rogierm (~rogierm@095-097-129-154.static.chello.nl) Quit (Remote host closed the connection)
[10:07] * rogierm (~rogierm@095-097-129-154.static.chello.nl) has joined #ceph
[10:07] <swami2> RMar04: If we set no-out, then cluster writes will stop...won't they?
[10:07] <RMar04> Yes, you make a good point.
[10:08] <RMar04> Apologies!
[10:09] <T1w> but without no-out, a major rebalance will happen if the move cannot be completed and the OSD brought back within 300 sec
[10:10] <T1w> you could possibly raise that timeout, so an OSD marked as down doesn't get out'ed after 300 sec, but after 20 mins or so
[10:10] <T1w> then you have 20 mins to shut down the host, move it to a new rack, power it up and have the OSD come back
[10:10] <T1w> (or OSDs)
[10:11] <swami2> Be-El: so I need to set min_size to 1 (currently min_size is already set to 1), remove the node and add it back to the cluster (from the other rack)...in that case, a few object copies will be down and rebalancing may start...which may not be needed b/c the same OSDs will be added back in the next 30 min...
[10:11] * TMM (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[10:11] <RMar04> Unless you have faith in your DC guys and want to ride the edge of your seat!
[10:11] <Be-El> swami2: if your crush rules do not take racks into account, that procedure should be ok
[10:12] <Be-El> swami2: and as T1w mentioned, you have to be fast or reduce/disable backfilling
[10:12] <Be-El> 'ceph osd set nobackfill' and 'ceph osd set norecover' might help, too
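
A sketch of that flag juggling, assuming the goal is only to suppress data movement while a host is briefly offline:

    # before shutting the node down
    ceph osd set nobackfill
    ceph osd set norecover
    # ... shut down, move the host, bring its OSDs back up ...
    ceph osd unset norecover
    ceph osd unset nobackfill
    # watch any remaining recovery drain
    ceph -s
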
[10:12] <swami2> T1w: That's correct...if we can raise the timer to 30 min or so and add the same node back to the cluster
[10:13] <swami2> T1w: then all should work...please let me know if I am missing something here
[10:15] <swami2> Be-El: Currently the CRUSH rules do not take the rack into account (it's a node-based failure zone only)..I will update the CRUSH rules after shifting all nodes to the other 3 racks, so that they use the rack as the failure zone
[10:17] <swami2> Be-El: That's good...without no-out, I will use nobackfill and norecover. This will stop the rebalancing, with no downtime...
[10:17] <T1w> as long as the new rules are added after the move of all nodes is complete, it should only result in 1 major data-shuffle
[10:17] <T1w> or rebalance
[10:18] <T1w> mon osd down out interval
[10:18] <T1w> Description: The number of seconds Ceph waits before marking a Ceph OSD Daemon down and out if it doesn't respond.
[10:18] <T1w> Type: 32-bit Integer
[10:18] <T1w> Default: 300
[10:18] <T1w> that's the value to change
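
One way to raise it, as a sketch (1800 s = 30 minutes; pick whatever window the physical move needs):

    # persistent: ceph.conf on the monitor hosts
    [mon]
    mon osd down out interval = 1800

    # or at runtime, lasting until the mons restart:
    ceph tell mon.* injectargs '--mon-osd-down-out-interval 1800'
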
[10:18] <swami2> T1w: Ok, you say - after the CRUSH rules are updated to use rack as the failure zone, there will be a big data-shuffle?? that means huge rebalancing activity starts..
[10:19] <Be-El> swami2: according to my gut that's a good solution. on the other hand there's currently a caffeine lack in my digestive system, so your mileage may vary
[10:19] <T1w> stopping an OSD makes it mark itself down, but the MONs keep expecting it to come back within the 300 secs before outing it and reshuffling data
[10:20] <T1w> swami2: yes, that is the only way for ceph to take the new failure domain (the new rack) into consideration and make allowances for it
[10:20] <T1w> you also need to take another thing into consideration
[10:21] <T1w> if you move 1/3 of your nodes to a new rack you have 2/3 capacity in the old rack and 1/3 in the new - if the entire old rack goes down, you suddenly have a _VERY_ full cluster left - one that needs to handle 3 times as much data
[10:21] <swami2> Be-El: Thanks...the solution is - raise the OSD out time, move the node to the other rack, and update the CRUSH rule..
[10:22] <T1w> the same thing is an issue if/when new OSDs are added to a new rack with different capacity OSD disks from the older OSDs in another node
[10:23] <T1w> suddenly the new rack has much much more capacity and holds much more data - and in case of a failure you could end up with a cluster where no I/O is permitted or possible due to lack of capacity
[10:23] <swami2> T1w: I have 8 nodes in the current rack ATM. I need to distribute the 8 nodes across 4 racks, which means 6 nodes have to move to 3 racks, 2 each
[10:24] <T1w> that sounds very reasonable
[10:24] <swami2> T1w: That's not an issue, b/c I am using the same capacity nodes (i.e. the same nodes only)
[10:24] <RMar04> Out of curiosity, what kind of spec nodes are you using in your 8 node cluster? If you don't mind me asking?
[10:25] * geli (~geli@geli-2015.its.utas.edu.au) Quit (Ping timeout: 480 seconds)
[10:25] <T1w> good good.. it's just a thing to keep in mind over longer periods of time where new OSDs have more capacity compared to older OSDs
[10:25] * Benoit (~Benoit@2a01cb0405708e00d8188d12d0a68e25.ipv6.abo.wanadoo.fr) has joined #ceph
[10:28] * yankcrime (~yankcrime@185.43.216.241) has joined #ceph
[10:28] <swami2> T1w: sure....but the only worry is - after updating CRUSH to use rack as the failure zone, a big data-shuffle (i.e. rebalance activity) is expected, and this may slow the n/w
[10:28] <T1w> swami2: yes, but you could try and limit it by only moving a single node at a time
[10:29] <T1w> you can easily add the new racks to CRUSH
[10:29] <T1w> it's only when you move a node below a new rack that the shuffle begins
[10:29] * Grimmer (~Epi@84ZAAA3M4.tor-irc.dnsbl.oftc.net) Quit ()
[10:30] <T1w> and if you do that one node at a time only that node (and nodes where copies are placed) should be affected
[10:30] <T1w> a rebalance is unavoidable
[10:30] * Benoit (~Benoit@2a01cb0405708e00d8188d12d0a68e25.ipv6.abo.wanadoo.fr) Quit (Quit: Quitte)
[10:31] <T1w> I can't remember if there is any way of limiting how many resources the rebalance will use
[10:32] * blo (~bloriot@2a01cb0405708e00d8188d12d0a68e25.ipv6.abo.wanadoo.fr) has joined #ceph
[10:32] <swami2> T1w: Yep, updating CRUSH is the simple part...and I plan to update CRUSH after all node setup is done (not after each node is moved), I guess updating CRUSH after all node movement will start the rebalance, but it should be fine after a couple of hours..
[10:32] * steveeJ (~junky@141.37.31.187) has joined #ceph
[10:32] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) Quit (Read error: No route to host)
[10:34] * b0e (~aledermue@213.95.25.82) has joined #ceph
[10:35] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[10:37] <swami2> T1w: Yes, the discussed solution is the best way to do the node movement...
[10:37] <swami2> T1w: Thank you very much
[10:37] <swami2> Be-El: Thank you very much..
[10:38] <Be-El> swami2: you're welcome
[10:38] <swami2> RMar04: Thank you
[10:38] <RMar04> GL! Hope it goes smoothly!
[10:39] <T1w> np np!
[10:39] <treenerd> swami2: you can limit the backfill and recovery operations
[10:39] <swami2> T1w: Be-El: I plan to update CRUSH after all nodes are moved to the other racks, but this may trigger big rebalancing...but there is no other way
[10:39] * dgbaley27 (~matt@75.148.118.217) Quit (Quit: Leaving.)
[10:40] <treenerd> swami2: http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/
[10:40] <T1w> swami2: as long as the load on the cluster network is bearable and your clients can handle/accept it, that would (in theory at least) be the best way for crush to rebalance things in best possible way
[10:40] * krypto (~krypto@G68-121-13-178.sbcis.sbc.com) Quit (Remote host closed the connection)
[10:41] * geli (~geli@geli-2015.its.utas.edu.au) has joined #ceph
[10:41] * krypto (~krypto@103.252.27.149) has joined #ceph
[10:42] <T1w> only moving a few nodes at a time could probably have caused some extra (and unnecessary) rebalancing compared to changing everything at once
[10:42] <treenerd> swami2: also be careful with the nearfull ratio if your cluster is half full and something goes wrong, because then you can also reach a limit.
[10:43] * kanagaraj (~kanagaraj@121.244.87.117) Quit (Quit: Leaving)
[10:43] * nicatronTg (~Neon@37.49.226.236) has joined #ceph
[10:43] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[10:43] <treenerd> swami2: with injectargs you can also change those settings at runtime. e.g. ceph tell osd.* injectargs --osd_max_backfills=2
[10:44] <treenerd> swami2: ceph tell osd.* injectargs --osd_recovery_max_active=2; be careful with the *
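
For reference, the quoted form of those calls is what usually keeps the CLI from trying to parse the option itself; the values here are only examples:

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    # raise them back to your normal values once the data movement is done
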
[10:45] <swami2> T1w: if I update CRUSH after 2 nodes are moved to one rack, then rebalancing starts, which may be a problem
[10:46] * T1 (~the_one@87.104.212.66) has joined #ceph
[10:46] <T1w> swami2: if only the 2 moved nodes are moved elsewhere in the CRUSH map, there should be no problems
[10:47] <swami2> treenerd: Good point...the risk is there...but I am planning to reduce it and do a smooth setup
[10:49] <treenerd> swami2: also be careful with the rack interlink. it is very easy to saturate the rack interconnect.
[10:50] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) has joined #ceph
[10:51] * olid19810 (~olid1982@aftr-185-17-204-13.dynamic.mnet-online.de) has joined #ceph
[10:52] * yanzheng (~zhyan@125.71.108.99) Quit (Quit: This computer has gone to sleep)
[10:52] * The1_ (~the_one@87.104.212.66) Quit (Ping timeout: 480 seconds)
[10:53] * thomnico (~thomnico@2a01:e35:8b41:120:48c4:8169:429f:b1ce) Quit (Quit: Ex-Chat)
[10:53] * thomnico (~thomnico@2a01:e35:8b41:120:48c4:8169:429f:b1ce) has joined #ceph
[11:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[11:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[11:03] * tim_s007 (~tim_s007@2001:67c:12a0::bc1c:f72e) has joined #ceph
[11:03] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[11:04] <swami2> treenerd: what is the rack interlink? all data will go through the TOR...
[11:04] <swami2> treenerd: let me know if I miss something here
[11:05] <swami2> T1w: update CRUSH, after each move...correct?
[11:06] <T1w> swami2: if you do not wish to have a major single reshuffle, yes
[11:07] <T1w> s/reshuffle/rebalance/
[11:07] <T1w> and allow the rebalance to complete before going to the next rack
[11:08] <T1w> if you only have 2 copies of data, you could potentially lose access to some data if both nodes with both copies of the data are moved at the same time
[11:08] * dugravot61 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[11:08] * LeaChim (~LeaChim@host86-132-236-140.range86-132.btcentralplus.com) has joined #ceph
[11:08] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) Quit (Ping timeout: 480 seconds)
[11:08] <T1w> I'd do it
[11:09] <T1w> move a node to a new rack
[11:09] <T1w> update CRUSH
[11:09] <T1w> allow rebalance to complete
[11:09] <T1w> move yet another node to a new rack
[11:09] <treenerd_> swami: I just thought you have 2 TOR switches, each in one rack, so for example each of the 20 ceph osd nodes is connected with 10G interfaces on the TOR switches. the problem is if the bandwidth between the TOR switches is also only 10G, it could happen that there is a bottleneck in some situations. That's what I tried to say.
[11:09] <swami2> T1w: That may cause a risk, b/c I have 2 racks and set the rack as the failure zone, but the replica count is 3 - where will the 3rd copy go?
[11:09] <T1w> update CRUSH
[11:09] <T1w> allow rebalance to complete
[11:09] <T1w> .. repeat until everything is as it should be
[11:10] <T1w> lastly you have 2 nodes that have not been moved physically
[11:11] <T1w> but they should be moved to a rack in CRUSH
[11:11] <T1w> afk
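
Between moves, a quick check (a sketch) that the previous step has settled before touching the next node:

    ceph -s           # wait until PGs are back to active+clean
    ceph osd tree     # confirm the moved host now sits under the expected rack bucket
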
[11:12] <swami2> T1w: yes, I noted that ....all nodes should be updated in CRUSH...
[11:12] <swami2> T1w: Thanks
[11:13] * nicatronTg (~Neon@76GAAA016.tor-irc.dnsbl.oftc.net) Quit ()
[11:13] <swami2> treenerd: Thanks for the explanation---I guess I won't see that case ...
[11:14] <swami2> treenerd: But it's a very valid point to note
[11:15] <treenerd_> swami: Okay; You can also handle some of the crushmap stuff with the CLI. https://access.redhat.com/documentation/en/red-hat-ceph-storage/version-1.2.3/red-hat-ceph-storage-123-installation-guide-for-rhel-x86-64/#add_osd_hosts_chassis_to_the_crush_hierarchy sometimes that's nice to know.
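
A sketch of the CLI calls that page covers, with made-up rack and host names:

    # create the rack buckets once and hang them under the root
    ceph osd crush add-bucket rack1 rack
    ceph osd crush move rack1 root=default
    # after physically moving a host, place it under its rack in the CRUSH map
    ceph osd crush move node3 rack=rack1
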
[11:15] <swami2> T1w: Thank you...let me go with nobackfill, norecover, and raise the OSD down-out timer... Move the nodes and update CRUSH, and finally wait for the rebalance to complete
[11:15] <TMM> how large can the pgmap version number get? is this something I should be at all worried about?
[11:18] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[11:20] * HauM1 (~HauM1@login.univie.ac.at) Quit (Ping timeout: 480 seconds)
[11:21] * i_m (~ivan.miro@88.206.113.199) has joined #ceph
[11:42] * Disconnected.
[12:01] -coulomb.oftc.net- *** Looking up your hostname...
[12:01] -coulomb.oftc.net- *** Checking Ident
[12:01] -coulomb.oftc.net- *** Couldn't look up your hostname
[12:01] -coulomb.oftc.net- *** No Ident response

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.