#ceph IRC Log


IRC Log for 2015-06-08

Timestamps are in GMT/BST.

[0:11] * pepzi (~allenmelo@3DDAAAUJH.tor-irc.dnsbl.oftc.net) Quit ()
[0:11] * CoMa (~rapedex@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[0:26] * KevinPerks (~Adium@cpe-75-177-32-14.triad.res.rr.com) has joined #ceph
[0:37] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Ping timeout: 480 seconds)
[0:41] * CoMa (~rapedex@9S0AAATEZ.tor-irc.dnsbl.oftc.net) Quit ()
[0:42] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[0:46] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[0:47] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[0:49] * stiopa (~stiopa@cpc73828-dals21-2-0-cust630.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[0:50] * circ-user-8p13Y (~circuser-@ has joined #ceph
[0:52] * circ-user-8p13Y (~circuser-@ Quit (Remote host closed the connection)
[0:53] * KevinPerks (~Adium@cpe-75-177-32-14.triad.res.rr.com) has left #ceph
[1:00] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[1:11] * Harryhy (~LorenXo@ has joined #ceph
[1:22] * fmanana (~fdmanana@bl13-135-31.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[1:33] * kefu (~kefu@ has joined #ceph
[1:35] * kefu (~kefu@ Quit ()
[1:41] * Harryhy (~LorenXo@5NZAADFA5.tor-irc.dnsbl.oftc.net) Quit ()
[1:41] * Zombiekiller (~AotC@chulak.enn.lu) has joined #ceph
[1:51] * bdx (~jbeedy@ Quit (Quit: WeeChat 1.1.1)
[1:56] * badone (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) has joined #ceph
[2:11] * Zombiekiller (~AotC@3DDAAAUOU.tor-irc.dnsbl.oftc.net) Quit ()
[2:11] * Malcovent (~allenmelo@ has joined #ceph
[2:34] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Ping timeout: 480 seconds)
[2:38] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[2:41] * Malcovent (~allenmelo@9S0AAATJN.tor-irc.dnsbl.oftc.net) Quit ()
[2:41] * Redshift (~Thononain@lumumba.torservers.net) has joined #ceph
[2:43] * vpol (~vpol@000131a0.user.oftc.net) Quit (Quit: vpol)
[2:44] * yanzheng (~zhyan@ has joined #ceph
[2:52] * lucas1 (~Thunderbi@ has joined #ceph
[2:53] * kefu (~kefu@ has joined #ceph
[2:56] * badone (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) Quit (Ping timeout: 480 seconds)
[2:57] * kefu (~kefu@ Quit ()
[2:58] * kefu (~kefu@ has joined #ceph
[2:58] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:06] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:11] * Redshift (~Thononain@9S0AAATKC.tor-irc.dnsbl.oftc.net) Quit ()
[3:11] * tallest_red (~starcoder@ has joined #ceph
[3:18] * pcsquared (sid11336@id-11336.ealing.irccloud.com) Quit ()
[3:19] * kefu (~kefu@ Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[3:33] * jclm1 (~jclm@ Quit (Quit: Leaving.)
[3:41] * tallest_red (~starcoder@9S0AAATLB.tor-irc.dnsbl.oftc.net) Quit ()
[3:41] * pakman__ (~Neon@nx-74205.tor-exit.network) has joined #ceph
[3:53] * zhaochao (~zhaochao@ has joined #ceph
[3:56] * flisky (~Thunderbi@ has joined #ceph
[3:58] * fam_away is now known as fam
[4:06] * shang (~ShangWu@ has joined #ceph
[4:11] * pakman__ (~Neon@3DDAAAUSX.tor-irc.dnsbl.oftc.net) Quit ()
[4:11] * Averad (~Rens2Sea@tor-exit0-readme.dfri.se) has joined #ceph
[4:11] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[4:16] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:18] * yguang11 (~yguang11@ has joined #ceph
[4:37] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Ping timeout: 480 seconds)
[4:38] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[4:39] * kefu (~kefu@ has joined #ceph
[4:39] * kefu (~kefu@ Quit ()
[4:40] * fred`` (fred@earthli.ng) Quit (Ping timeout: 480 seconds)
[4:41] * Averad (~Rens2Sea@0SGAAA9AU.tor-irc.dnsbl.oftc.net) Quit ()
[4:45] * bildramer (~jwandborg@spftor1e1.privacyfoundation.ch) has joined #ceph
[4:50] * vpol (~vpol@000131a0.user.oftc.net) has joined #ceph
[4:58] * jclm (~jclm@ has joined #ceph
[5:05] * vpol (~vpol@000131a0.user.oftc.net) Quit (Quit: vpol)
[5:06] * kefu (~kefu@ has joined #ceph
[5:11] * kefu (~kefu@ Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[5:15] * bildramer (~jwandborg@5NZAADFJL.tor-irc.dnsbl.oftc.net) Quit ()
[5:15] * zapu (~Uniju@ has joined #ceph
[5:22] * Vacuum (~Vacuum@i59F79F2F.versanet.de) has joined #ceph
[5:29] * Vacuum_ (~Vacuum@i59F79B5A.versanet.de) Quit (Ping timeout: 480 seconds)
[5:30] * OutOfNoWhere (~rpb@ Quit (Ping timeout: 480 seconds)
[5:31] * jclm1 (~jclm@ has joined #ceph
[5:34] * jclm (~jclm@ Quit (Ping timeout: 480 seconds)
[5:41] * vpol (~vpol@000131a0.user.oftc.net) has joined #ceph
[5:42] * squ (~Thunderbi@ has joined #ceph
[5:42] * yguang11_ (~yguang11@nat-dip15.fw.corp.yahoo.com) has joined #ceph
[5:45] * zapu (~Uniju@9S0AAATPQ.tor-irc.dnsbl.oftc.net) Quit ()
[5:45] * tallest_red (~Neon@tor-exit2-readme.puckey.org) has joined #ceph
[5:49] * yguang11 (~yguang11@ Quit (Ping timeout: 480 seconds)
[6:03] * kefu (~kefu@ has joined #ceph
[6:08] * kefu (~kefu@ Quit ()
[6:15] * tallest_red (~Neon@9S0AAATQ1.tor-irc.dnsbl.oftc.net) Quit ()
[6:15] * Unforgiven (~Izanagi@tor-exit2-readme.puckey.org) has joined #ceph
[6:26] * kefu (~kefu@ has joined #ceph
[6:32] * vpol (~vpol@000131a0.user.oftc.net) Quit (Quit: vpol)
[6:36] * shylesh (~shylesh@ has joined #ceph
[6:36] * squ (~Thunderbi@ Quit (Remote host closed the connection)
[6:36] * squ (~Thunderbi@ has joined #ceph
[6:45] * kanagaraj (~kanagaraj@ has joined #ceph
[6:45] * Unforgiven (~Izanagi@5NZAADFNO.tor-irc.dnsbl.oftc.net) Quit ()
[6:48] * rdas (~rdas@ has joined #ceph
[6:51] * amote (~amote@ has joined #ceph
[6:53] * shnarch (~shnarch@bzq-109-66-143-242.red.bezeqint.net) has joined #ceph
[6:54] * vikhyat (~vumrao@ has joined #ceph
[6:59] * fam is now known as fam_away
[7:00] * vpol (~vpol@000131a0.user.oftc.net) has joined #ceph
[7:00] * kefu (~kefu@ Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[7:01] * branto (~borix@ip-213-220-214-203.net.upcbroadband.cz) has joined #ceph
[7:02] * MACscr (~Adium@2601:d:c800:de3:b014:7779:ae40:9507) has joined #ceph
[7:04] * Concubidated (~Adium@ has joined #ceph
[7:08] * yguang11_ (~yguang11@nat-dip15.fw.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[7:14] * fam_away is now known as fam
[7:26] * raw (~raw@ has joined #ceph
[7:28] * kefu (~kefu@ has joined #ceph
[7:34] * vbellur (~vijay@ has joined #ceph
[7:40] * treenerd (~treenerd@83-64-142-10.zwischennetz.xdsl-line.inode.at) has joined #ceph
[7:47] * derjohn_mobi (~aj@ has joined #ceph
[7:59] * treenerd (~treenerd@83-64-142-10.zwischennetz.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[8:00] * Nacer (~Nacer@2001:41d0:fe82:7200:8908:6cf1:40e0:aaf6) Quit (Read error: Connection reset by peer)
[8:05] * overclk (~overclk@ has joined #ceph
[8:08] * MACscr (~Adium@2601:d:c800:de3:b014:7779:ae40:9507) Quit (Quit: Leaving.)
[8:11] * MACscr (~Adium@2601:d:c800:de3:5d87:9067:1d52:2d76) has joined #ceph
[8:11] * chutz (~chutz@rygel.linuxfreak.ca) Quit (Ping timeout: 480 seconds)
[8:12] * rotbeard (~redbeard@2a02:908:df10:d300:76f0:6dff:fe3b:994d) Quit (Quit: Verlassend)
[8:12] * karnan (~karnan@ has joined #ceph
[8:14] * Nacer (~Nacer@2001:41d0:fe82:7200:f029:c01a:e3cc:57fd) has joined #ceph
[8:15] * kefu (~kefu@ Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[8:15] * FNugget (~KUSmurf@9S0AAATXD.tor-irc.dnsbl.oftc.net) has joined #ceph
[8:17] * Hemanth (~Hemanth@ has joined #ceph
[8:22] * shnarch (~shnarch@bzq-109-66-143-242.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[8:23] * Hemanth (~Hemanth@ Quit (Quit: Leaving)
[8:24] * shohn (~shohn@dslb-178-008-199-047.178.008.pools.vodafone-ip.de) has joined #ceph
[8:24] * Hemanth (~Hemanth@ has joined #ceph
[8:26] * cooldharma06 (~chatzilla@ has joined #ceph
[8:28] * cooldharma06 (~chatzilla@ Quit ()
[8:28] * wicope (~wicope@0001fd8a.user.oftc.net) has joined #ceph
[8:28] * chutz (~chutz@rygel.linuxfreak.ca) has joined #ceph
[8:30] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[8:32] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[8:33] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Quit: Ex-Chat)
[8:36] * derjohn_mobi (~aj@ Quit (Ping timeout: 480 seconds)
[8:39] * cooldharma06 (~chatzilla@ has joined #ceph
[8:40] * Nacer (~Nacer@2001:41d0:fe82:7200:f029:c01a:e3cc:57fd) Quit (Remote host closed the connection)
[8:40] * stiopa (~stiopa@cpc73828-dals21-2-0-cust630.20-2.cable.virginm.net) has joined #ceph
[8:41] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) has joined #ceph
[8:42] * cok (~chk@2a02:2350:18:1010:d97d:3e3e:ccc9:d95) has joined #ceph
[8:42] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) has joined #ceph
[8:42] * treenerd (~treenerd@ has joined #ceph
[8:43] * treenerd (~treenerd@ Quit ()
[8:45] * FNugget (~KUSmurf@9S0AAATXD.tor-irc.dnsbl.oftc.net) Quit ()
[8:45] * Revo84 (~Mattress@ has joined #ceph
[8:45] * chutz (~chutz@rygel.linuxfreak.ca) Quit (Ping timeout: 480 seconds)
[8:48] * Be-El (~quassel@fb08-bcf-pc01.computational.bio.uni-giessen.de) has joined #ceph
[8:49] <Be-El> hi
[8:51] * MACscr (~Adium@2601:d:c800:de3:5d87:9067:1d52:2d76) Quit (Quit: Leaving.)
[8:55] <raw> hi
[9:06] * treenerd (~treenerd@ has joined #ceph
[9:07] * nc_ch (~nc@flinux01.tu-graz.ac.at) Quit (Quit: Leaving)
[9:15] * Revo84 (~Mattress@9S0AAATYR.tor-irc.dnsbl.oftc.net) Quit ()
[9:15] * zviratko1 (~Malcovent@tor-exit2-readme.puckey.org) has joined #ceph
[9:16] * calvinx (~calvin@ has joined #ceph
[9:19] * sleinen1 (~Adium@2001:620:0:82::107) has joined #ceph
[9:20] * kawa2014 (~kawa@ has joined #ceph
[9:20] * dgurtner (~dgurtner@ has joined #ceph
[9:21] * TMM (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[9:24] * thomnico (~thomnico@2a01:e35:8b41:120:18d:8034:c3fc:5072) has joined #ceph
[9:24] * naga1 (~oftc-webi@idp01webcache5-z.apj.hpecore.net) has joined #ceph
[9:25] <naga1> i am trying to install rbd packages using apt-get install python-rbd, but it is showing E: Unable to locate package python-rbd
[9:25] * fam is now known as fam_away
[9:26] * derjohn_mobi (~aj@fw.gkh-setu.de) has joined #ceph
[9:27] * analbeard (~shw@support.memset.com) has joined #ceph
[9:29] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[9:29] * calvinx (~calvin@ Quit (Quit: calvinx)
[9:29] * fam_away is now known as fam
[9:29] <flaf> naga1: did you modify your sources.list?
[9:30] <naga1> yeah
[9:30] <flaf> Because this package exists in the repository "deb http://ceph.com/debian-hammer/ trusty main"
[9:31] <flaf> naga1: which OS? which line in your sources.list?
[9:31] <naga1> ubuntu
[9:32] <flaf> ubuntu Trusty?
[9:32] * ksperis (~ksperis@ has joined #ceph
[9:32] <naga1> yeah
[9:33] <flaf> So if you add the line in your sources.list, it should work (this is the case for me with exactly the same OS).
[9:33] <Be-El> naga1: did you run apt-get update after adding the line to the sources list?
[9:35] <naga1> yes, i ran apt-get update
[9:35] <naga1> Failed to fetch http://ceph.com/packages/ceph-extras/debian/dists/trusty/main/binary-amd64/Packages 404 Not Found
[9:37] * shnarch (~shnarch@ has joined #ceph
[9:39] <Be-El> naga1: looks like the trusty package information is missing on the ceph server
[9:41] * xarses (~andreww@ has joined #ceph
[9:42] * sleinen1 (~Adium@2001:620:0:82::107) Quit (Ping timeout: 480 seconds)
[9:44] * stiopa (~stiopa@cpc73828-dals21-2-0-cust630.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[9:44] * dgurtner (~dgurtner@ Quit (Ping timeout: 480 seconds)
[9:45] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[9:45] <flaf> No no, no problem with the repo.
[9:45] * zviratko1 (~Malcovent@8Q4AABC3E.tor-irc.dnsbl.oftc.net) Quit ()
[9:46] <flaf> Currently I have a trusty with the line "deb http://ceph.com/debian-hammer/ trusty main" in my sources.list. No problem. The package python-rbd exists. We can see it in this page: http://ceph.com/debian-hammer/dists/trusty/main/binary-amd64/Packages
[9:47] <flaf> naga1: your sources.list is not correct. That's all.
[9:47] * dgurtner (~dgurtner@ has joined #ceph
[9:49] <flaf> Fix your sources.list with the correct line and you will be able to install the python-rbd package (after an `apt-get update` of course).
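
A quick way to sanity-check a sources.list entry like the one flaf quotes is to derive the Packages index URL from it and compare that with the URL apt reports as failing. The sketch below is only an illustration: the helper name `packages_urls` is made up here, it assumes a plain `deb` line without bracketed options, and it assumes the amd64 architecture seen in the error message above.

```python
def packages_urls(deb_line: str, arch: str = "amd64") -> list[str]:
    """Return the Packages index URLs implied by one plain 'deb' line.

    Hypothetical helper for illustration; it does not handle option
    blocks such as [arch=amd64] or 'deb-src' lines.
    """
    fields = deb_line.split()
    if len(fields) < 4 or fields[0] != "deb":
        raise ValueError(f"not a plain binary 'deb' line: {deb_line!r}")
    base, suite, components = fields[1].rstrip("/"), fields[2], fields[3:]
    # One index file per component: <base>/dists/<suite>/<component>/binary-<arch>/Packages
    return [f"{base}/dists/{suite}/{comp}/binary-{arch}/Packages" for comp in components]


line = "deb http://ceph.com/debian-hammer/ trusty main"
for url in packages_urls(line):
    print(url)
```

For the line flaf recommends, this yields the same Packages URL he links to, whereas the 404 reported by naga1 points at a different base path (ceph-extras), which suggests a different entry in the sources.list.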
[9:50] * jashank42 (~jashan42@ has joined #ceph
[9:58] * ChrisNBlum (~ChrisNBlu@dhcp-ip-190.dorf.rwth-aachen.de) has joined #ceph
[10:00] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[10:00] * jks (~jks@ Quit (Quit: jks)
[10:02] * bitserker (~toni@63.pool85-52-240.static.orange.es) has joined #ceph
[10:02] * DV (~veillard@2001:41d0:1:d478::1) has joined #ceph
[10:03] * bitserker (~toni@63.pool85-52-240.static.orange.es) Quit (Read error: No route to host)
[10:04] * Concubidated (~Adium@ Quit (Quit: Leaving.)
[10:04] * branto (~borix@ip-213-220-214-203.net.upcbroadband.cz) has left #ceph
[10:04] * branto (~branto@ip-213-220-214-203.net.upcbroadband.cz) has joined #ceph
[10:06] * mattch (~mattch@pcw3047.see.ed.ac.uk) has joined #ceph
[10:08] * oms101 (~oms101@p20030057EA0B3000C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[10:12] * PaulCuzner (~paul@222-153-122-160.jetstream.xtra.co.nz) has joined #ceph
[10:13] * topro (~prousa@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[10:13] * PaulCuzner (~paul@222-153-122-160.jetstream.xtra.co.nz) has left #ceph
[10:15] * Kayla (~MonkeyJam@9S0AAAT2M.tor-irc.dnsbl.oftc.net) has joined #ceph
[10:25] * DV (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[10:34] * jashank42 (~jashan42@ Quit (Remote host closed the connection)
[10:34] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[10:34] * jks (~jks@ has joined #ceph
[10:37] * ngoswami (~ngoswami@ has joined #ceph
[10:45] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[10:45] * Kayla (~MonkeyJam@9S0AAAT2M.tor-irc.dnsbl.oftc.net) Quit ()
[10:45] * HoboPickle (~csharp@chomsky.torservers.net) has joined #ceph
[10:49] * fmanana (~fdmanana@bl13-135-31.dsl.telepac.pt) has joined #ceph
[10:50] * b0e (~aledermue@ has joined #ceph
[10:54] * RomeroJnr (~h0m3r@hosd.leaseweb.net) has joined #ceph
[10:54] * linjan (~linjan@ has joined #ceph
[10:54] * linjan (~linjan@ Quit (Read error: Connection reset by peer)
[10:56] <RomeroJnr> Well, well... I have a cluster with 100 nodes (each with 1.5 TB osd), all of them using a 2 GB (LACP) connection on a dedicated cluster network. There's no I/O operation going on except for a couple of dd's that I am running to test the platform throughput, however I noticed I'm getting a lot of slow requests warnings
[10:57] <RomeroJnr> all OSDs are up, everything is new.. however, the slow requests are quite random, sometimes it breaks everything, sometimes it doesn't appear at all
[10:57] <RomeroJnr> any advice on where I should start troubleshooting?
[11:03] * kefu (~kefu@ has joined #ceph
[11:05] * kefu (~kefu@ Quit ()
[11:07] * jordanP (~jordan@scality-jouf-2-194.fib.nerim.net) has joined #ceph
[11:08] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) Quit (Quit: doppelgrau)
[11:12] * zack_dolby (~textual@e0109-114-22-11-74.uqwimax.jp) has joined #ceph
[11:12] * RomeroJnr (~h0m3r@hosd.leaseweb.net) Quit (Read error: Connection reset by peer)
[11:15] * HoboPickle (~csharp@9S0AAAT3P.tor-irc.dnsbl.oftc.net) Quit ()
[11:15] * geegeegee (~Pirate@spftor1e1.privacyfoundation.ch) has joined #ceph
[11:18] * RomeroJnr (~h0m3r@hosd.leaseweb.net) has joined #ceph
[11:22] <jeroen_> 2 GB LACP is not really recommended for production usage
[11:22] <jeroen_> that will kill your cluster as soon as there is an ongoing recovery process
[11:23] <jeroen_> what kind of disks are you using? do you have ssd caching?
[11:24] <RomeroJnr> yes, i have ssd caching.. the OSD disks are Seagate Barracuda (7200 rpm)
[11:25] <T1w> ig the bandwidth not dependent on the number of nodes?
[11:25] <T1w> is even
[11:26] <T1w> I mean, 1GBit for a 3 or 5 node cluster should be enough .. or?
[11:26] * shnarch (~shnarch@ Quit (Ping timeout: 480 seconds)
[11:31] <RomeroJnr> jeroen_, ok.. however, during these tests i'm not using even 10% of this LACP interfaces... and i'm still getting slow requests warnings
[11:32] <T1w> RomeroJnr: beware.. a 2x 1GBit LACP does not equal 2GBit simultaneous bandwidth
[11:32] * gaveen (~gaveen@ has joined #ceph
[11:32] * thomnico (~thomnico@2a01:e35:8b41:120:18d:8034:c3fc:5072) Quit (Quit: Ex-Chat)
[11:33] * thomnico (~thomnico@2a01:e35:8b41:120:18d:8034:c3fc:5072) has joined #ceph
[11:33] <flaf> Hi, has someone already tried cephfs and file-layouts? I would like to put a cephfs directory in a specific pool for a specific account. Currently, my attempts don't work.
[11:34] <flaf> (with hammer in Ubuntu Trusty 14.04)
[11:35] * wenjunhuang (~wenjunhua@ has joined #ceph
[11:35] <RomeroJnr> T1w, i know, but ceph cluster traffic flows in multiple streams among all nodes (check netstat), therefore i can make use of 2 GB with the correct LACP configuration
[11:36] <jeroen_> T1w, not really
[11:36] <T1w> RomeroJnr: good good.. what kind of hashing algorithm do you use?
[11:36] <jeroen_> because when you only have 3-5 nodes, the recovery does lots of reads on the 4 remaining nodes to recover the data
[11:36] <jeroen_> when having lots of nodes, on a per node basis that is less reads
[11:37] <RomeroJnr> T1w, l3+l2 provided me the best results
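
The point T1w and RomeroJnr are circling can be sketched in a few lines: a bonded link picks one physical port per flow, so a single stream never exceeds one port's speed, while many peers can spread over both ports. The hash below is a simplified stand-in written for this note, not the actual kernel or switch formula; it uses only the IP addresses and leaves out the MAC addresses that a real layer2+3 policy also mixes in. The function name and the addresses are made up.

```python
import ipaddress


def pick_link(src_ip: str, dst_ip: str, n_links: int = 2) -> int:
    """Map one flow to a link index (simplified, illustrative hash only)."""
    h = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    h ^= h >> 16  # fold the high bits down
    h ^= h >> 8
    return h % n_links


# One node talking to eight peers (example addresses).
flows = [("10.0.0.1", f"10.0.0.{i}") for i in range(2, 10)]
for src, dst in flows:
    print(src, "->", dst, "uses link", pick_link(src, dst))
```

Every call with the same address pair returns the same link, which is why one dd stream between two hosts stays on a single 1 Gbit port, while traffic to many OSD peers can use both.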
[11:37] <jeroen_> same applies for writes
[11:37] <T1w> hmmm
[11:37] <jeroen_> but you can also add a new node in the cluster, and then you have the opposite issue, because 100 nodes will do LOTS of reads going to a single node
[11:37] <T1w> so it's 10gbit net or trouble ahead?
[11:38] <jeroen_> so the bandwidth of that single node will be the bottleneck
[11:38] <jeroen_> yeah
[11:38] <T1w> alas.. :/
[11:38] <T1w> good to know
[11:38] <jeroen_> I have 2x 10 Gbps to each ceph node (LACP)
[11:38] <jeroen_> and 2x 40 Gbps between racks
[11:38] <T1w> (I've got a go/nogo meeting later today where we have to choose if ceph is to be our new storage backend
[11:39] <T1w> hm hm..
[11:39] <jeroen_> it obviously depends abit on the use case
[11:39] <jeroen_> we need high performance
[11:40] <jeroen_> and you can cap the recovery rates abit
[11:40] <T1w> but if a recovery might kill all access..
[11:40] <T1w> hm.. lunch
[11:40] <jeroen_> and 10 gbps has lower latency obviously
[11:40] <T1w> jeroen_: got a few minutes in half an hour or so? I'd like to pick your brains a bit.. ;)
[11:41] <T1w> afk
[11:41] <jeroen_> yeah probably around all day
[11:45] <Be-El> flaf: i'm using two different storage pool in cephfs. what exactly does not work in your case?
[11:45] * geegeegee (~Pirate@5NZAADF2H.tor-irc.dnsbl.oftc.net) Quit ()
[11:45] * elt1 (~colde@7R2AABJY6.tor-irc.dnsbl.oftc.net) has joined #ceph
[11:46] <flaf> Be-El: I have this `ceph fs ls` => name: cephfs, metadata pool: metadata, data pools: [data data2 ]
[11:46] <flaf> and I have mounted my cephfs with admin account to create a data/ and data2/ directories.
[11:47] * Inflatablewoman (~Inflatabl@host-93-104-248-34.customer.m-online.net) has joined #ceph
[11:47] <Inflatablewoman> is it possible to run the gateway without cephx authentication?
[11:48] <flaf> Be-El: I have set pool=data for the data/ dir and pool=data2 for the data2/ dir
[11:48] <flaf> Be-El: I'm trying to mount the cephfs with a ceph account that can only read/write in the data pool.
[11:49] <Be-El> flaf: do you mount the subdirectory or the root directory of cephfs?
[11:50] <flaf> But when I try to mount with the ceph account I have "mount error 1 = Operation not permitted"
[11:50] <Be-El> flaf: and what pool does the root directory belong to?
[11:50] <flaf> Be-El: I try to mount the subdirectory data/
[11:52] <flaf> If I run `getfattr -n ceph.dir.layout /mnt/` (/mnt/ is my mount directory), the root seems to be in "data" pool.
[11:52] <flaf> Be-El: I'm going to paste what seems important to me...
[11:52] <jcsp> flaf: fuse or kernel client? what version?
[11:52] <Be-El> flaf: i'm not sure whether permissions for the subdirectory pool are enough.
[11:53] <flaf> kernel client, 3.16 for the kernel.
[11:53] * martineg (~martin@shell01.copyleft.no) has joined #ceph
[11:53] <jcsp> flaf: and does your client's key have "caps: [mds] allow"?
[11:54] <jcsp> (I'm guessing you created a custom one)
[11:54] <Be-El> lunchtime, bbl
[11:56] <flaf> jcsp: no "caps: [mds] allow" in my ceph account (http://pastealacon.com/37655). I don't put that usually in the capabilities of my ceph account to mount a cephfs.
[11:58] <flaf> I have mounted cephfs several times with this kind of capabilities http://pastealacon.com/37655. are "caps [mds] allow" necessary to mount a cephfs?
[12:00] * ChrisNBlum (~ChrisNBlu@dhcp-ip-190.dorf.rwth-aachen.de) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[12:01] <jcsp> flaf: yes, you need that to talk to an MDS. It's possible that it wasn't enforced at some stage in the past though.
[12:03] <flaf> Of course, my final goal is simple: I want to have a data/ directory in my cephfs in the "data" pool which can be only used for cephfs-user1 ceph account and a data2/ directory in the "data2" pool which can be only used for cephfs-user2 ceph account.
[12:04] <flaf> jcsp: so, if I understand well, to be able to use cephfs, I must use "caps: [mds] allow" in the capabilities of my account here http://pastealacon.com/37655. Correct?
[12:04] <jcsp> right
[12:05] <flaf> Ok. I'm going to try...
[12:11] <tuxcraft1r> is it possible to use ceph deploy to send a custom command to all nodes
[12:11] <tuxcraft1r> like poweroff
[12:11] <tuxcraft1r> or should i just setup ansible
[12:13] <RomeroJnr> jeroen_, after restarting all OSD daemons the performance went back to normal
[12:15] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[12:15] * elt1 (~colde@7R2AABJY6.tor-irc.dnsbl.oftc.net) Quit ()
[12:15] * w0lfeh (~theghost9@tor-daiquiri.piraten-nds.de) has joined #ceph
[12:20] <T1w> re
[12:23] <T1w> I wonder what I can get dual 10gbit nics for
[12:23] <T1w> .. and what a couple of 10g switches would go for
[12:25] * fam is now known as fam_away
[12:25] * fam_away is now known as fam
[12:26] * LeaChim (~LeaChim@host86-163-124-72.range86-163.btcentralplus.com) has joined #ceph
[12:26] <smerz> T1w, switches around 5K+ (euro) per switch. not sure how much the PCIe NICs go for. also don't forget the optics (they cost like 50-100 eur or so per piece)
[12:27] <smerz> something like that. can depend on your specific hardware choice
[12:27] <T1w> yeah..
[12:27] <T1w> but I was hoping of using cat6 instead of fiber
[12:27] <T1w> - of
[12:27] <smerz> except the range last i checked was horrible
[12:27] <T1w> distances are not that long inside a rack
[12:27] <smerz> max cable length
[12:27] <smerz> true
[12:28] <T1w> of course cross-rack will have to be fiber
[12:28] <smerz> indeed
[12:28] <smerz> for the rack connectivity you may want more than 10gbit links too
[12:29] <T1w> we're thinking of using ceph as an object storage for 5-10 clients that write perhaps 20.000 new object per client per day
[12:29] <T1w> .. and each object if well below 4 or 5MB each
[12:29] <T1w> is even
[12:29] <T1w> basicly worm
[12:30] <T1w> if a change is needed, a new object will be written
[12:31] <T1w> I'm hoping that is possible with a 1gbit network between 3 or 5 physical machines, each with 2 OSDs with 4TB of storage each
[12:36] <jeroen_> be aware that 5 nodes is like the minimum amount of ceph nodes
[12:36] <T1w> I know
[12:36] <T1w> we were thinking of staring out with 3 for a start and then add 2 more "ssonish"
[12:36] <T1w> soonish even
[12:36] <jeroen_> how many copies are you aiming for?
[12:37] <T1w> 3 and no immediate erasure encoded pools
[12:37] <T1w> perhaps in future
[12:38] <jeroen_> ok, so make sure you stay below 65% usage
[12:39] <T1w> yeah, or a failure of a single physical machine will cause me to exceed max storage
[12:39] <jeroen_> yeah and ceph does not like that
[12:39] <T1w> (thats part of the reason to add 2 more..)
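
The 65% figure can be related to a back-of-the-envelope calculation: if one node fails, everything stored must still fit on the remaining nodes. The sketch below assumes equally sized nodes and ignores Ceph's own nearfull/full thresholds and CRUSH placement rules, so it gives an upper bound rather than a recommendation; the function name is made up for this note.

```python
def max_safe_fill(nodes: int, failures: int = 1) -> float:
    """Fraction of raw capacity that still fits on the surviving nodes.

    Rough estimate only: equal-sized nodes, no full-ratio headroom,
    no placement constraints.
    """
    if not 0 <= failures < nodes:
        raise ValueError("failures must be >= 0 and smaller than the node count")
    return (nodes - failures) / nodes


for n in (3, 5):
    print(f"{n} nodes: data must stay below {max_safe_fill(n):.0%} of raw capacity")
```

With 3 nodes the bound is two thirds, just above the 65% jeroen_ mentions; with 5 nodes it rises to four fifths, which matches T1w's reasoning for adding two more machines.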
[12:40] * treenerd (~treenerd@ Quit (Ping timeout: 480 seconds)
[12:41] <jeroen_> but is your application read or write intensive?
[12:41] <T1w> close to a 50/50 split
[12:42] <T1w> perhaps a bit more reads than writes, but that's only due to possible multiple reads of the same objects to more than 1 end-user
[12:42] <T1w> an object is most commonly a single PDF
[12:42] <jeroen_> 10 gbps will improve your iops alot
[12:43] <jeroen_> because with 3 copies traffic has to pass the network 6 times
[12:43] <T1w> yeah
[12:43] * Inflatablewoman (~Inflatabl@host-93-104-248-34.customer.m-online.net) Quit (Remote host closed the connection)
[12:43] <jeroen_> on reads its "less" important, as that is always coming from 1 osd
[12:44] <tuxcraft1r> [ceph01][WARNIN] 2015-06-08 12:43:48.387390 7f7f387487c0 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected d16f2967-64ba-4772-a253-a54020e1d10a, invalid (someone else's?) journal
[12:44] <tuxcraft1r> [ceph01][WARNIN] 2015-06-08 12:43:48.591585 7f7f387487c0 -1 filestore(/var/lib/ceph/tmp/mnt.oTvSq8) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[12:44] <tuxcraft1r> i keep hitting issues
[12:44] <tuxcraft1r> i think i may have to give up on debian jessie and ceph....
[12:45] <jeroen_> we tried jessie as well, but moved to ubuntu after like one day testing ;)
[12:45] <T1w> oh well..
[12:45] <T1w> on the other hand
[12:45] <T1w> 1gbit is a place to start
[12:45] <jeroen_> true, but upgrading will be a nightmare
[12:45] <T1w> we can (or should be able to) upgrade to 10gbit in the storage network
[12:45] * w0lfeh (~theghost9@9S0AAAT6Y.tor-irc.dnsbl.oftc.net) Quit ()
[12:45] * osuka_ (~sixofour@aurora.enn.lu) has joined #ceph
[12:46] <T1w> if I schedule an outage for everything and the number of nodes is pretty low (still < 10) it should be doable in a night
[12:46] <flaf> jcsp: Be-El: Here is my test http://pastealacon.com/37656 , I have unexpected behaviour. Have I missed something?
[12:47] <T1w> and it's hard to get acceptance for 10gbit for starters when it more than doubles all costs
[12:48] <jeroen_> yeah well, the switch (A-brand) will be like 4k-5k euros, the NICs around 100 euros each and 5 optics x 60 euros
[12:48] <jcsp> flaf: couple of things…
[12:48] <flaf> Ah?
[12:48] <jcsp> 1. don't give your clients access to the metadata pool
[12:48] <jeroen_> but you probably need multiple nights to upgrade the nodes one-by-one
[12:48] <jcsp> 2. restricting pool auth caps doesn't prevent clients creating files (metadata operation), only writing data to them
[12:49] <jcsp> 3. pool auth caps don't have any effect at all on what path you can mount
[12:49] <jeroen_> or if you schedule maintenance you might be able to pull it off in one, as long as you don't have too much changing data
[12:49] <jcsp> 4. even when you can't write data, sometimes a write will appear to succeed, but it won't be stopped until the kernel tries to flush the data to ceph
[12:49] <jcsp> recently we added something to the fuse client to refuse writes up-front so that it's more obvious
[12:50] <T1w> jeroen_: if I take down all clients nothing will change - and then I've got several hours to upgrade all nodes, so it should be doable.. again - I've only got 3 or 5 nodes or 7 or 8 at most
[12:51] <T1w> even with 20-30 mins per node it should be doable during a 6 hour window
[12:51] <T1w> it's "just" a matter of keeping ceph happy
[12:51] <jcsp> flaf: what you will find is that once cephfsuser has written to the data pool, cephfsuser2 won't be able to read those files. That's really the only level of safety you're getting here.
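
jcsp's points can be collected into one set of client capabilities: an mds cap so the client can talk to the MDS, and an osd cap limited to the data pool, with no access to the metadata pool. The sketch below only builds the `ceph auth get-or-create` command line rather than running it. The exact cap strings are an assumption for illustration and are not taken from flaf's pastes; the user and pool names are the ones mentioned in the channel.

```python
import shlex


def auth_command(user: str, data_pool: str) -> list[str]:
    """Build a 'ceph auth get-or-create' argv for a CephFS client (sketch)."""
    return [
        "ceph", "auth", "get-or-create", f"client.{user}",
        "mon", "allow r",                       # read the cluster maps
        "mds", "allow",                         # needed to talk to the MDS at all
        "osd", f"allow rw pool={data_pool}",    # data pool only, not the metadata pool
    ]


print(shlex.join(auth_command("cephfsuser", "data")))
print(shlex.join(auth_command("cephfsuser2", "data2")))
```

As jcsp notes, these caps only control which pool's object data a client can read and write; they do not restrict which path can be mounted or which files can be created or unlinked through the MDS.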
[12:51] <jeroen_> I was thinking of doing it without downtime ;)
[12:52] <T1w> jeroen_: thanks for your input
[12:52] * treenerd (~treenerd@ has joined #ceph
[12:52] <T1w> 8 mins to judgementday.. ;)
[12:52] <jeroen_> in our case 15 mins of downtime is already unacceptable
[12:52] <jeroen_> :P
[12:52] <jeroen_> good luck
[12:52] <jcsp> flaf: see the "Cephfs: one ceph account per directory" thread on ceph-users; proper per-path auth caps are being worked on, but not done yet.
[12:53] * Xiol (~Xiol@shrike.daneelwell.eu) Quit (Quit: Bye)
[12:53] <T1w> ah.. well.. as long as I announce it "in good time" (more than a few days, and nothing major is planned) I can get away with downtime from 22 or 23 to 06 or 08 (depending on the day of week)
[12:53] <flaf> jcsp: thx for your help.
[12:53] <T1w> .. but then again - during daytime I cannot accept any downtime
[12:53] * RC (~Adium@ has joined #ceph
[12:53] * RC is now known as Guest916
[12:54] <T1w> even just a few minutes are unacceptable
[12:54] <Guest916> hi, am trying to setup nginx to access ceph buckets
[12:54] <Guest916> https://github.com/anomalizer/ngx_aws_auth
[12:54] <T1w> everything is either cluster-based or on hot standby and ready to take over in just a few seconds
[12:54] <flaf> jcsp: "Cephfs: one ceph account per directory" is "my" thread and I just wanted to test the file-layout feature.
[12:54] <Be-El> T1w: if you want to have a high available cluster, you might want to think about cron jobs that disable scrubbing and deep scrubbing during work hours
[12:55] <T1w> Be-El: good point, thanks
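
Be-El's suggestion could look roughly like the following when driven from cron: set the `noscrub` and `nodeep-scrub` flags at the start of the working day and unset them in the evening. The working hours and the function name are assumptions made for this sketch, and the function only returns the commands so that it can be checked without a cluster.

```python
from datetime import time

# Assumed working hours; adjust to the site's real schedule.
WORK_START, WORK_END = time(8, 0), time(18, 0)


def scrub_commands(now: time) -> list[list[str]]:
    """Return the 'ceph osd set/unset' commands appropriate for the given time."""
    action = "set" if WORK_START <= now < WORK_END else "unset"
    return [["ceph", "osd", action, flag] for flag in ("noscrub", "nodeep-scrub")]


for cmd in scrub_commands(time(12, 0)):
    print(" ".join(cmd))
```

A cron job at each boundary could pass the current time to this function and run each returned command with `subprocess.run(cmd, check=True)`.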
[12:55] <T1w> well.. afk - ceph decision time
[12:57] * zhaochao (~zhaochao@ Quit (Quit: ChatZilla [Iceweasel 38.0.1/20150526223604])
[12:58] <flaf> jcsp: In fact, I don't care about the path mount. The only thing I want is that cephfsuser can't delete the content of data2/ and cephfsuser2 can't delete the content of data/. Can I reach this goal with the file-layout feature?
[12:59] <Guest916> http://pastebin.com/Lhyhk7xk
[12:59] <Guest916> can someone please help with this
[13:01] * Guest916 (~Adium@ Quit (Quit: Leaving.)
[13:01] * RC1 (~Adium@ has joined #ceph
[13:01] <RC1> hi, am new to ceph and trying to setup nginx + ceph
[13:01] <RC1> http://pastebin.com/Lhyhk7xk - can someone please help
[13:02] <jcsp> flaf: no, not with osd caps. deletion is a metadata operation. However, there are other layers in the stack you can use here. Simple POSIX permissions work fine if you trust the root user on these hosts.
[13:03] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[13:06] <flaf> jcsp: ok but in fact I'm in this case: in a node, I have mounted cephfs with cephfsuser2 (for instance) to write in /mnt/data2/ and I imagine that the node is completely compromised, ie a hacker can be root in the node. In this case, the hacker will be able to delete the content of /mnt/data/, correct? The file-layout feature can't protect me against this catastrophic scenario. Is it correct?
[13:08] <tuxcraft1r> should i use hammer or firefly for debian jessie?
[13:08] <jcsp> yes, that's correct. The attacker can't delete anything directly from the data pool himself, but he can send an unlink operation to the MDS to remove the files, and the MDS will purge them. However, the attacker can never see the contents of the files he is deleting.
[13:09] * ade (~abradshaw@tmo-102-221.customers.d1-online.com) has joined #ceph
[13:10] <flaf> jcsp: ok, many thx for your help. I will copy all your remarks in my personal notes. ;)
[13:10] <Be-El> does anyone of you use ganglia for ceph monitoring and can point me to some resources for configuration?
[13:15] * osuka_ (~sixofour@8Q4AABC8J.tor-irc.dnsbl.oftc.net) Quit ()
[13:15] * tZ (~Eric@9S0AAAT88.tor-irc.dnsbl.oftc.net) has joined #ceph
[13:17] <flaf> Be-El: sorry, I'm not even monitoring my cluster right now. ;)
[13:19] * dneary (~dneary@pool-96-252-45-212.bstnma.fios.verizon.net) has joined #ceph
[13:24] * i_m (~ivan.miro@deibp9eh1--blueice4n2.emea.ibm.com) has joined #ceph
[13:26] * treenerd (~treenerd@ Quit (Ping timeout: 480 seconds)
[13:27] * fam is now known as fam_away
[13:29] * danielitit (~danieliti@ has joined #ceph
[13:29] <danielitit> Hi
[13:29] <danielitit> i want to mount ceph fs to my samba4 server
[13:30] <danielitit> and set acl
[13:30] <danielitit> is this possible?
[13:31] * RC2 (~Adium@ has joined #ceph
[13:31] * RC1 (~Adium@ Quit (Read error: Connection reset by peer)
[13:31] * topro (~prousa@host-62-245-142-50.customer.m-online.net) has joined #ceph
[13:33] * vbellur (~vijay@ Quit (Ping timeout: 480 seconds)
[13:33] * b0e (~aledermue@ Quit (Ping timeout: 480 seconds)
[13:34] * ganders (~root@ has joined #ceph
[13:34] * treenerd (~treenerd@ has joined #ceph
[13:39] * RC2 (~Adium@ Quit (Ping timeout: 480 seconds)
[13:41] * ChrisNBlum (~ChrisNBlu@dhcp-ip-190.dorf.rwth-aachen.de) has joined #ceph
[13:42] * zack_dolby (~textual@e0109-114-22-11-74.uqwimax.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[13:42] * zack_dolby (~textual@e0109-114-22-11-74.uqwimax.jp) has joined #ceph
[13:42] * zack_dolby (~textual@e0109-114-22-11-74.uqwimax.jp) Quit ()
[13:45] * tZ (~Eric@9S0AAAT88.tor-irc.dnsbl.oftc.net) Quit ()
[13:47] * shylesh (~shylesh@ Quit (Remote host closed the connection)
[13:49] * topro (~prousa@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[13:49] * karnan (~karnan@ Quit (Remote host closed the connection)
[13:49] * b0e (~aledermue@ has joined #ceph
[13:50] * overclk (~overclk@ Quit (Quit: Leaving)
[13:52] <RomeroJnr> is there any well known reason for 'ceph df' to report incorrect pool storage usage?
[13:57] <tuxcraft1r> raw: jeroen_ i finally managed to get a basic ceph cluster create the osds
[13:57] <tuxcraft1r> with the packages from jessie-backports
[13:58] <tuxcraft1r> the ones in stable are useless
[13:58] * jdillaman_ (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[13:59] <raw> tuxcraft1r, my /etc/apt/sources.list.d/ceph.list reads "deb http://ceph.com/debian-hammer/ wheezy main"
[13:59] <raw> i can just apt-get ceph and it installs 0.94.1
[13:59] <Be-El> RomeroJnr: why do you think it's incorrect?
[14:02] * shang (~ShangWu@ Quit (Remote host closed the connection)
[14:02] <tuxcraft1r> raw: that may be because you are not running jessie
[14:02] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[14:02] <tuxcraft1r> i got python issues because my python version was too new
[14:02] * jdillaman_ is now known as jdillaman
[14:02] <RomeroJnr> Be-EL: after creating a rbd with 10 GB (for testing), i mounted this rbd on a node, uploaded 5gb of files into it, removed these files afterwards, umounted/unmapped this rbd from the node, but the 'ceph df detail' still shows the usage as 5gb
[14:04] <Be-El> RomeroJnr: that's the correct behaviour
[14:04] * b0e (~aledermue@ Quit (Ping timeout: 480 seconds)
[14:04] <Be-El> RomeroJnr: ceph has no idea of what is stored in a rbd. it's just a chunk of data
[14:05] * flisky (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[14:05] <Be-El> RomeroJnr: your harddisk also do not know how much space is available on them. the _filesystem_ on the other hand....
[14:06] <RomeroJnr> Be-El: hm, cool. but where the 'used' value from the 'ceph df' come from?
[14:06] <Be-El> RomeroJnr: if you want to release the space that is freed on the filesystem on a mapped rbd, you need to use something like TRIM
[14:06] <Be-El> RomeroJnr: but i'm not sure whether TRIM/DISCARD is supported by rbds
[14:06] <raw> RomeroJnr, Be-El i think ceph supports the discard option known from ssd. with discard/trim, the filesystem can let ceph/the ssd know which blocks are actually not used
[14:07] <raw> see http://ceph.com/docs/master/rbd/qemu-rbd/
[14:07] <Be-El> RomeroJnr: ceph splits each object/rbd into chunks. the used value is the sum of the chunks.
[14:07] <Be-El> RomeroJnr: upon first (write) access to a chunk it is allocated by ceph
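The allocation behaviour Be-El and raw describe can be pictured with a toy model. This is only a sketch, not Ceph code: the `ThinImage` class is hypothetical, and the 4 MiB object size is assumed as the usual RBD default. It shows why deleting files inside the filesystem leaves the reported usage unchanged until a discard reaches the image.

```python
"""Toy model of thin-provisioned RBD space accounting (not Ceph code)."""

OBJECT_SIZE = 4 * 1024 * 1024  # assumption: 4 MiB objects, the usual RBD default
GIB = 1024 ** 3


class ThinImage:
    """Hypothetical stand-in for an RBD image split into fixed-size objects."""

    def __init__(self, size_bytes):
        self.size_bytes = size_bytes
        self.allocated = set()  # indices of objects that have been written

    def _objects(self, offset, length):
        first = offset // OBJECT_SIZE
        last = (offset + length - 1) // OBJECT_SIZE
        return range(first, last + 1)

    def write(self, offset, length):
        # The first write that touches an object allocates it.
        self.allocated.update(self._objects(offset, length))

    def discard(self, offset, length):
        # In this model only objects fully covered by the range are released.
        for idx in self._objects(offset, length):
            start = idx * OBJECT_SIZE
            if offset <= start and start + OBJECT_SIZE <= offset + length:
                self.allocated.discard(idx)

    def used_bytes(self):
        return len(self.allocated) * OBJECT_SIZE


image = ThinImage(10 * GIB)
image.write(0, 5 * GIB)                 # upload 5 GiB of files
used_after_write = image.used_bytes()
# Deleting the files only changes filesystem metadata; the image sees nothing.
used_after_delete = image.used_bytes()
image.discard(0, 5 * GIB)               # fstrim / -o discard passes free ranges down
used_after_trim = image.used_bytes()
```

In this model the "used" figure is simply the number of touched objects times the object size, which is why it stays at 5 GiB after the delete and only drops once the discard is issued.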
[14:07] * analbeard (~shw@support.memset.com) Quit (Read error: Connection reset by peer)
[14:08] <Be-El> raw: ah, that's good to know
[14:09] * jclm1 (~jclm@ Quit (Quit: Leaving.)
[14:12] * analbeard (~shw@support.memset.com) has joined #ceph
[14:13] <raw> hm, does a good pg number on a pool depend on its usage? i.e. should i create a pool that stores 10Gb and one that stores 1000GB with the same amount of PGs?
[14:14] <bilco105_> raw: http://ceph.com/pgcalc/
[14:15] <bilco105_> There's an explanation at the bottom as to how pg number should be calculated
[14:15] <flaf> raw: yes normally it depends on the %data that the pool will contains and indeed it can be annoying.
[14:16] * RaidSoft (~Jyron@ has joined #ceph
[14:17] <flaf> (if this %data changes during the life of the pool)
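A rough sketch of the rule of thumb behind the calculator bilco105_ links to, to show why the expected share of data matters. The formula (target PGs per OSD x OSD count x data share / replica size, rounded to a power of two) is the commonly cited one; the exact rounding details, the default of 100 PGs per OSD, and the 30-OSD example cluster are assumptions, so check real values against the calculator itself.

```python
def suggested_pg_count(osd_count, replica_size, data_fraction, target_pgs_per_osd=100):
    """Rule-of-thumb PG count for one pool (sketch, not the official tool)."""
    raw = target_pgs_per_osd * osd_count * data_fraction / replica_size
    # Assumed floor: never go below OSD count / replica size, or below 1.
    raw = max(raw, osd_count / replica_size, 1)

    # Find the powers of two on either side of the raw value.
    lower = 1
    while lower * 2 <= raw:
        lower *= 2
    upper = lower * 2

    nearest = lower if (raw - lower) < (upper - raw) else upper
    # Assumed rule: if rounding down loses more than 25%, round up instead.
    if nearest < raw * 0.75:
        nearest = upper
    return nearest


# Hypothetical cluster: 30 OSDs, 3 replicas, two pools sharing the space
# in roughly the 10 GB : 1000 GB proportion raw asks about.
small_pool = suggested_pg_count(osd_count=30, replica_size=3, data_fraction=0.01)
large_pool = suggested_pg_count(osd_count=30, replica_size=3, data_fraction=0.99)
```

Under these assumptions the two pools end up with very different PG counts, which matches flaf's point that the number depends on the share of data each pool is expected to hold.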
[14:18] <raw> good, thank you.
[14:18] * tries (ident@easytux.ch) Quit (Ping timeout: 480 seconds)
[14:18] * Kingrat (~shiny@2605:a000:161a:c022:41c7:7bda:46d4:67ed) Quit (Ping timeout: 480 seconds)
[14:19] * b0e (~aledermue@ has joined #ceph
[14:20] * CAPSLOCK2000 (~oftc@2001:984:3be3:1::8) Quit (Ping timeout: 480 seconds)
[14:26] * ChrisNBlum (~ChrisNBlu@dhcp-ip-190.dorf.rwth-aachen.de) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[14:27] * joelm (~joel@ has joined #ceph
[14:27] * Kingrat (~shiny@2605:a000:161a:c022:fc88:d666:b178:b993) has joined #ceph
[14:28] <joelm> Hey, are there any maintenance procedure flowcharts - I'm documenting up various runbooks for our support team. Can't see anything obvious bar the usual troubleshooting procedure, would one be useful?
[14:31] * dneary (~dneary@pool-96-252-45-212.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[14:38] * topro (~prousa@host-62-245-142-50.customer.m-online.net) has joined #ceph
[14:45] * RaidSoft (~Jyron@8Q4AABDAP.tor-irc.dnsbl.oftc.net) Quit ()
[14:45] * Arcturus (~rf`@7R2AABJ24.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:49] * adrian15b (~kvirc@80.Red-79-151-228.dynamicIP.rima-tde.net) Quit (Ping timeout: 480 seconds)
[14:51] * KevinPerks (~Adium@cpe-75-177-32-14.triad.res.rr.com) has joined #ceph
[14:52] * squ (~Thunderbi@ Quit (Quit: squ)
[14:55] * treenerd (~treenerd@ Quit (Remote host closed the connection)
[14:57] * vpol (~vpol@000131a0.user.oftc.net) Quit (Quit: vpol)
[14:59] <tuxcraft1r> i should be able to use ceph rbd as a way to connect a disk to another system with rbd, right?
[14:59] * sjm (~sjm@pool-173-70-76-86.nwrknj.fios.verizon.net) has joined #ceph
[14:59] * linuxkidd (~linuxkidd@ has joined #ceph
[15:00] <tuxcraft1r> but an HA disk with multiple monitors and paralell disk access to all the ceph nodes
[15:02] <Be-El> tuxcraft1r: you cannot access one rbd from several clients without taking precautions
[15:02] * jskinner (~jskinner@host-95-2-129.infobunker.com) has joined #ceph
[15:03] <tuxcraft1r> okay so not like iscsi
[15:03] * tupper_ (~tcole@2001:420:2280:1272:8900:f9b8:3b49:567e) has joined #ceph
[15:03] <tuxcraft1r> i want to replace iscsi with rbd
[15:03] <Be-El> you cannot do it with iscsi either
[15:04] <tuxcraft1r> oh we been using iscsi as block devices to other servers with lvm over it for more then eight years
[15:04] <Be-El> and you have probably used stuff like pacemaker or other failover solutions
[15:04] <raw> tuxcraft1r, rbd works like a shared disk. problem is that normal filesystems (ext4, xfs...) do not know about this shared state. so you need to be 100% sure that you are not writing to the same fs at the same time from different machines
[15:05] <tuxcraft1r> ah yes we do to make the iscsi ha
[15:05] * primechuck (~primechuc@host-95-2-129.infobunker.com) has joined #ceph
[15:05] <tuxcraft1r> raw: yes that part is covered, we do not write to the same fs from two machines
[15:06] <Be-El> tuxcraft1r: that's the part i meant by "precautions"
[15:06] <Be-El> tuxcraft1r: even mapping a rbd multiple times may result in problems
[15:06] <raw> what happens if you unplug the cable from one machine so that another machine takes over and the replug the cable? does the returned machine get killed to make sure that it writes nothing?
[15:07] * debian112 (~bcolbert@ has joined #ceph
[15:08] <tuxcraft1r> im looking at different strategies to get the ceph cluster connected as storage disk for a kvm (libvirt) guest vm
[15:08] <tuxcraft1r> looking at libvirt.org storage pools
[15:08] <tuxcraft1r> ceph documentation etc
[15:08] * tw0fish (~tw0fish@UNIX1.ANDREW.CMU.EDU) has joined #ceph
[15:09] <tuxcraft1r> i need the parallel io to the ceph nodes for performance
[15:09] <tuxcraft1r> together with the ha
[15:09] * tw0fish (~tw0fish@UNIX1.ANDREW.CMU.EDU) Quit ()
[15:09] <Be-El> tuxcraft1r: if you already have a HA infrastructure based on iscsi, you'll probably also have some setup of OCFS2 or similar shared filesystems
[15:09] <tuxcraft1r> was thinking of just connecting an rbd disk to the host, put lvm over it and give the lvs to kvm guests
[15:09] <Be-El> tuxcraft1r: such a setup can be used with rbd
[15:10] <tuxcraft1r> i do not need shared filesystems
[15:10] <raw> tuxcraft1r, kvm/qemu can use rbd images directly. io-wise this is recommended
[15:10] <Be-El> tuxcraft1r: you can tell libvirt to use rbds directly
[15:10] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) Quit (Quit: jdillaman)
[15:11] <raw> the problem im talking about is called "fencing", its a common problem with ha/failover clusters to make sure that a machine that is declared dead and got replaced really stays dead.
[15:11] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[15:12] <tuxcraft1r> raw: i am assuming that ceph takes care of the fencing and all as storage clusters
[15:12] <tuxcraft1r> with 3 mon nodes and such
[15:12] <raw> ceph does not take care of how many users are using a single rbd image
[15:13] <raw> you can easily map the same rbd image from two machines, mount it and write stuff in there, causing havoc to the filesystem
[15:13] <Be-El> raw: that depends on the setup. afaik there's a kind of locking mechanism that grant exclusive access to rbds
[15:14] <raw> Be-El, there is a locking mechanism in ceph for rbd images, but afaik it does not prevent anyone from map/mount a rbd image. you need to ask for a exclusive lock and you will get it granted if no one else has one
[15:14] * t0rn (~ssullivan@2607:fad0:32:a02:56ee:75ff:fe48:3bd3) has joined #ceph
[15:15] <raw> also i dont think that revoking this lock does deny any access from the client that got the lock in the first place
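A toy model of the advisory locking raw describes. It is a sketch of the behaviour as stated in the chat ("afaik"), not of the real RBD implementation, and the `AdvisoryLockedImage` class and client names are hypothetical. The point is that a lock of this kind only protects clients that ask for it before writing.

```python
class AdvisoryLockedImage:
    """Hypothetical image with an advisory (cooperative) exclusive lock."""

    def __init__(self):
        self.lock_holder = None
        self.writes = []

    def lock(self, client):
        # Granted only if nobody else holds the lock.
        if self.lock_holder not in (None, client):
            return False
        self.lock_holder = client
        return True

    def break_lock(self):
        # Removing the lock does not stop the old holder from writing later.
        self.lock_holder = None

    def write(self, client, data):
        # Nothing here checks the lock: that is what "advisory" means.
        self.writes.append((client, data))


image = AdvisoryLockedImage()
first_granted = image.lock("host-a")    # True: no one held the lock
second_granted = image.lock("host-b")   # False: host-a holds it
image.write("host-b", b"oops")          # still succeeds in this model
image.break_lock()
image.write("host-a", b"stale write")   # the old holder can still write
```

This is why the conversation turns to fencing: something outside the lock has to make sure the old holder is really gone before another machine takes over.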
[15:15] <tuxcraft1r> raw you mean the fs that ceph uses internally or the fs on the rbd image
[15:15] * Arcturus (~rf`@7R2AABJ24.tor-irc.dnsbl.oftc.net) Quit ()
[15:15] <Be-El> raw: i think these exclusive locks are requested by the libvirt rbd driver
[15:16] * jwandborg (~Keiya@ has joined #ceph
[15:16] <Be-El> raw: so as long as a vm is running the rbd is locked
[15:16] <Be-El> raw: if this is the case, a second vm should not be able to wreak a rbd of a dead/disconnected vm
[15:17] <raw> Be-El, would be good, haven't found any docs about this
[15:17] * t0rn (~ssullivan@2607:fad0:32:a02:56ee:75ff:fe48:3bd3) has left #ceph
[15:17] <Be-El> raw: http://www.wogri.at/linux/ceph-libvirt-locking/
[15:18] <Be-El> raw: disclaimer: i haven't used rbd locking yet
[15:18] <Be-El> raw: and you'll have manual effort if a vm is really dead without releasing the lock
[15:19] * brad_mssw (~brad@ has joined #ceph
[15:19] <raw> Be-El, yes, its complicated and requires manual scripting
[15:19] <kapil_> folks: after deleting all the contents from my mapped rbd drive, the ceph df still shows the same space usage for this pool. could this be because the content of the mapped rbd drive gets replicated ?
[15:20] <raw> im not doing auto-failover, so i can manually power-reset "dead" machines first before booting new ones, but for auto-failover deployments, there needs to be a solution.
[15:20] <Be-El> kapil_: no, you ceph has no idea about the content of the rbd
[15:20] <Be-El> kapil_: you have to use something like trim and/or discard
[15:20] <raw> tuxcraft1r, im talking about the fs the vm uses on top of the rbd
[15:20] <kapil_> trim ? is it a ceph command ?
[15:21] <kapil_> <Be-El> as in ceph trim ... ?
[15:21] * rlrevell (~leer@vbo1.inmotionhosting.com) has joined #ceph
[15:21] <Be-El> kapil_: as in SSD trim
[15:21] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[15:21] * harold (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) has joined #ceph
[15:22] <raw> tuxcraft1r, when planning a multi-host vm deployment with ha, failover, fencing and stuff i can recommend opennebula as platform. it solves some of the problems and helps to script the rest.
[15:22] * wschulze (~wschulze@cpe-69-206-240-164.nyc.res.rr.com) has joined #ceph
[15:23] * mjeanson (~mjeanson@00012705.user.oftc.net) Quit (Remote host closed the connection)
[15:24] <raw> kapil_, mount the fs with -o discard and do a manual fstrim -v <path> and see if it helps to reclaim the disk space
[15:24] * rlrevell (~leer@vbo1.inmotionhosting.com) Quit (Read error: Connection reset by peer)
[15:24] * jclm (~jclm@ has joined #ceph
[15:24] <kapil_> raw: okay lemme try that
[15:24] <raw> kapil_, also see http://ceph.com/docs/master/rbd/qemu-rbd/ section "Enabling Discard/TRIM"
[15:24] * harold (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) Quit ()
[15:24] <raw> when using qemu
[15:25] * vpol (~vpol@000131a0.user.oftc.net) has joined #ceph
[15:26] <kapil_> raw: thanks
[15:26] <Be-El> raw: too bad it is not supported for virtio
[15:28] <kapil_> raw: in my setup I am not using qemu. Just created an rbd image, mapped it, mounted it and then filled the mounted dir with huge files. my ceph usage went from 0 to 40 GB. Now I deleted all the files, unmounted the drive, unmapped rbd image (did not delete rbd image) and the ceph disk usage is still 40GB
[15:28] <kapil_> I will try -o discard as you suggested
[15:31] <raw> kapil_, and also fstrim -v <mountpoint> once
[15:34] * kefu (~kefu@ has joined #ceph
[15:35] * rlrevell (~leer@vbo1.inmotionhosting.com) has joined #ceph
[15:35] * derjohn_mobi (~aj@fw.gkh-setu.de) Quit (Remote host closed the connection)
[15:37] * bene (~ben@nat-pool-bos-t.redhat.com) has joined #ceph
[15:37] * fam_away is now known as fam
[15:40] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[15:44] * yanzheng (~zhyan@ Quit (Quit: This computer has gone to sleep)
[15:45] * jwandborg (~Keiya@9S0AAAUDU.tor-irc.dnsbl.oftc.net) Quit ()
[15:46] * KeeperOfTheSoul (~tunaaja@ has joined #ceph
[15:49] * fam is now known as fam_away
[15:50] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[15:53] * cok (~chk@2a02:2350:18:1010:d97d:3e3e:ccc9:d95) has left #ceph
[15:53] * ChrisNBlum (~ChrisNBlu@dhcp-ip-230.dorf.rwth-aachen.de) has joined #ceph
[15:53] * garphy`aw is now known as garphy
[15:55] * marrusl (~mark@cpe-24-90-46-248.nyc.res.rr.com) has joined #ceph
[15:56] <tuxcraft1r> http://paste.debian.net/212453/ < can somebody give me some hints on how to get an ceph rbd pool in libvirt
[15:57] <tuxcraft1r> Formatting 'rbd:libvirt-pool/new-libvirt-image', fmt=rbd size=8589934592
[15:57] <tuxcraft1r> no monitors specified to connect to.
[15:57] <tuxcraft1r> 2015-06-08 13:45:09.984+0000: 27444: error : virStorageBackendRBDOpenRADOSConn:218 : failed to connect to the RADOS monitor on: ceph01.powercraft.nl:6789,ceph02.powercraft.nl:6789,ceph03.powercraft.nl:6789,: Operation not permitted
[15:58] * thomnico (~thomnico@2a01:e35:8b41:120:18d:8034:c3fc:5072) Quit (Quit: Ex-Chat)
[15:59] <raw> where can i report bugs? i have found the ceph bugtracker but i cant find the option to create a new bug
[15:59] * Concubidated (~Adium@ has joined #ceph
[15:59] <vikhyat> raw: you need to register first
[16:00] * bene (~ben@nat-pool-bos-t.redhat.com) Quit (Quit: Konversation terminated!)
[16:00] <raw> vikhyat, ah, found the button, thanks
[16:00] <vikhyat> raw: np :)
[16:01] <tuxcraft1r> i got it
[16:01] <tuxcraft1r> username is libvirt not client.libvirt
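The fix tuxcraft1r found can be illustrated by generating the kind of libvirt disk definition the Ceph documentation describes for RBD-backed guests. This is a sketch: the `rbd_disk_xml` helper, the all-zero secret UUID, and the `vda`/virtio target are placeholders chosen for illustration, while the pool, image, and monitor names are the ones from the paste above. The helper strips a leading `client.` because the `username` attribute expects the bare cephx name.

```python
import xml.etree.ElementTree as ET


def rbd_disk_xml(pool, image, monitors, cephx_user, secret_uuid):
    """Build a libvirt <disk> element for an RBD image (illustrative helper)."""
    # libvirt wants "libvirt", not "client.libvirt".
    if cephx_user.startswith("client."):
        cephx_user = cephx_user[len("client."):]

    disk = ET.Element("disk", type="network", device="disk")
    ET.SubElement(disk, "driver", name="qemu", type="raw")
    auth = ET.SubElement(disk, "auth", username=cephx_user)
    ET.SubElement(auth, "secret", type="ceph", uuid=secret_uuid)
    source = ET.SubElement(disk, "source", protocol="rbd", name=f"{pool}/{image}")
    for host in monitors:
        ET.SubElement(source, "host", name=host, port="6789")
    ET.SubElement(disk, "target", dev="vda", bus="virtio")
    return ET.tostring(disk, encoding="unicode")


disk_xml = rbd_disk_xml(
    pool="libvirt-pool",
    image="new-libvirt-image",
    monitors=["ceph01.powercraft.nl", "ceph02.powercraft.nl", "ceph03.powercraft.nl"],
    cephx_user="client.libvirt",
    secret_uuid="00000000-0000-0000-0000-000000000000",  # placeholder UUID
)
print(disk_xml)
```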
[16:01] * vikhyat (~vumrao@ Quit (Quit: Leaving)
[16:02] * flisky (~Thunderbi@ has joined #ceph
[16:03] * bene (~ben@nat-pool-bos-t.redhat.com) has joined #ceph
[16:03] * ircolle (~Adium@2601:1:a580:1735:805a:f446:f631:31c1) has joined #ceph
[16:05] * ChrisNBlum (~ChrisNBlu@dhcp-ip-230.dorf.rwth-aachen.de) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[16:06] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[16:06] * thomnico (~thomnico@2a01:e35:8b41:120:18d:8034:c3fc:5072) has joined #ceph
[16:08] * garphy is now known as garphy`aw
[16:08] * b0e (~aledermue@ Quit (Quit: Leaving.)
[16:08] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) has joined #ceph
[16:09] * cooldharma06 (~chatzilla@ Quit (Quit: ChatZilla [Iceweasel 21.0/20130515140136])
[16:10] * flisky (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[16:11] * Concubidated (~Adium@ Quit (Quit: Leaving.)
[16:13] * wushudoin (~wushudoin@ has joined #ceph
[16:14] * kanagaraj (~kanagaraj@ Quit (Quit: Leaving)
[16:15] * KeeperOfTheSoul (~tunaaja@5NZAADGD6.tor-irc.dnsbl.oftc.net) Quit ()
[16:15] * Grimmer (~Jase@ has joined #ceph
[16:16] * kefu is now known as kefu|afk
[16:19] * lpabon (~quassel@nat-pool-bos-t.redhat.com) has joined #ceph
[16:25] * markl (~mark@knm.org) has joined #ceph
[16:25] * ira (~ira@c-71-233-225-22.hsd1.ma.comcast.net) has joined #ceph
[16:25] * adrian15b (~kvirc@31.Red-79-159-37.dynamicIP.rima-tde.net) has joined #ceph
[16:26] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) has joined #ceph
[16:27] * amote (~amote@ Quit (Quit: Leaving)
[16:27] * lion (~oftc-webi@teeri.csc.fi) has joined #ceph
[16:28] * lion is now known as Guest944
[16:29] <Guest944> a very small question how should i identify if my OSD journal is created on dedicated partition or on a File , output here : http://paste.ubuntu.com/11649906/
[16:29] * thomnico (~thomnico@2a01:e35:8b41:120:18d:8034:c3fc:5072) Quit (Ping timeout: 480 seconds)
[16:30] <Guest944> as per output it looks like journal is created on dedicated partition , but i would appreciate if any expert can confirm this
[16:32] <smerz> Guest944, journal seems to be a blockdevice
[16:32] <smerz> you should find the corresponding block device under
[16:32] * DV_ (~veillard@2001:41d0:1:d478::1) has joined #ceph
[16:32] <smerz> /dev/disk/by-id/ (or /dev/disk/by-uuid)
[16:34] <Guest944> smerz: thank you for your reply , so is it correct that by default ceph-deploy osd create ceph-node1:sdf will create journal on /dev/sdf2 and data partition on /dev/sdf1 ??
[16:34] <Guest944> because if just ran this command and it created journal on block device ? so i want to understand if ceph-deploy creates it on block device
[16:37] * rdas (~rdas@ Quit (Quit: Leaving)
[16:37] * Guest944 (~oftc-webi@teeri.csc.fi) Quit (Remote host closed the connection)
[16:38] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[16:39] <smerz> not sure about ceph-deploy. if this is what ceph-deploy gave to you then i guess you're right ;-)
[16:44] * wicope (~wicope@0001fd8a.user.oftc.net) Quit (Read error: Connection reset by peer)
[16:45] * Grimmer (~Jase@9S0AAAUGK.tor-irc.dnsbl.oftc.net) Quit ()
[16:46] * hgjhgjh (~PierreW@ has joined #ceph
[16:46] * kefu|afk is now known as kefu
[16:49] * DV_ (~veillard@2001:41d0:1:d478::1) Quit (Remote host closed the connection)
[16:51] * yanzheng (~zhyan@ has joined #ceph
[16:51] <tuxcraft1r> awesome i got a virtual machine running on a ceph rbd disk
[16:51] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[16:52] <tuxcraft1r> with root@debian:~# dd if=/dev/zero of=/dev/vda oflag=direct giving me 2.9 MB/s
[16:52] <tuxcraft1r> ultra slow ;|
[16:52] <tuxcraft1r> lets start the debugging and performance monitoring
[16:52] * bitserker (~toni@ has joined #ceph
[16:53] * i_m (~ivan.miro@deibp9eh1--blueice4n2.emea.ibm.com) Quit (Ping timeout: 480 seconds)
[16:55] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[16:56] <doppelgrau> tuxcraft1r: thats about 740 Requests/s. Thats okay. Try larger Blocksizes
[16:58] * yanzheng (~zhyan@ Quit (Quit: This computer has gone to sleep)
[16:59] <tuxcraft1r> doppelgrau: nice getting some io now
[17:01] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) has joined #ceph
[17:02] * xarses (~andreww@ Quit (Ping timeout: 480 seconds)
[17:07] <smerz> what's the default blocksize for dd ?
[17:07] * yanzheng (~zhyan@ has joined #ceph
[17:09] * treenerd (~treenerd@ has joined #ceph
[17:10] * jluis (~joao@ Quit (Ping timeout: 480 seconds)
[17:11] <jeroen_> 512 I guess?
[17:12] <m0zes> man dd specifies 512.
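The arithmetic behind "requests per second" is throughput divided by block size. The sketch below works through tuxcraft1r's 2.9 MB/s figure for two block sizes. It assumes dd reports decimal megabytes; the 4 KiB case is included only because it comes out close to the 740 requests/s doppelgrau quotes, which suggests that estimate assumed 4 KiB requests rather than dd's 512-byte default.

```python
def requests_per_second(throughput_bytes_per_s, block_size_bytes):
    """Requests needed per second to sustain a throughput at a block size."""
    return throughput_bytes_per_s / block_size_bytes


MB = 1000 ** 2    # decimal megabyte (assumed unit of dd's report)
MIB = 1024 ** 2   # binary mebibyte

# dd's default 512-byte blocks, throughput read as decimal MB/s.
at_512_bytes = requests_per_second(2.9 * MB, 512)
# 4 KiB blocks, throughput read as MiB/s.
at_4_kib = requests_per_second(2.9 * MIB, 4096)

print(f"512 B blocks: {at_512_bytes:.0f} requests/s")
print(f"4 KiB blocks: {at_4_kib:.0f} requests/s")
```

Either way the test is bound by per-request latency rather than bandwidth, which is why a larger `bs=` raises the MB/s figure so quickly.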
[17:13] <rkeene> tuxcraft1r, Step one is probably to enable librbd caching (assuming you're using librbd and not krbd)
[17:13] * yanzheng (~zhyan@ Quit (Quit: This computer has gone to sleep)
[17:13] * vbellur (~vijay@ has joined #ceph
[17:13] <smerz> does librbd caching affect O_DIRECT io ?
[17:14] <rkeene> It might, depending on how FLUSH or FUA is sent
[17:14] <rkeene> Let's find out
[17:15] <m0zes> it shouldn't. unless you set cache=unsafe (iirc)
[17:15] * hgjhgjh (~PierreW@ Quit ()
[17:15] * Frymaster (~Oddtwang@ has joined #ceph
[17:16] * naga1 (~oftc-webi@idp01webcache5-z.apj.hpecore.net) Quit (Remote host closed the connection)
[17:18] * vata1 (~vata@ has joined #ceph
[17:23] * thomnico (~thomnico@2a01:e35:8b41:120:a996:3c09:b684:4b7a) has joined #ceph
[17:25] * Hemanth (~Hemanth@ Quit (Ping timeout: 480 seconds)
[17:26] * danielitit (~danieliti@ Quit (Quit: Leaving)
[17:28] * ChrisNBlum (~ChrisNBlu@dhcp-ip-230.dorf.rwth-aachen.de) has joined #ceph
[17:29] * joshd1 (~jdurgin@68-119-140-18.dhcp.ahvl.nc.charter.com) has joined #ceph
[17:30] * branto (~branto@ip-213-220-214-203.net.upcbroadband.cz) has left #ceph
[17:31] * dyasny (~dyasny@ has joined #ceph
[17:31] * treenerd (~treenerd@ Quit (Quit: Verlassend)
[17:31] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Quit: Ex-Chat)
[17:34] * ade (~abradshaw@tmo-102-221.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[17:35] * Concubidated (~Adium@ has joined #ceph
[17:45] * Frymaster (~Oddtwang@3DDAAAVV7.tor-irc.dnsbl.oftc.net) Quit ()
[17:45] * Atomizer (~Mraedis@7R2AABJ7L.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:48] * bandrus (~brian@131.sub-70-211-67.myvzw.com) has joined #ceph
[18:01] * Concubidated1 (~Adium@ has joined #ceph
[18:04] * wenjunhuang_ (~wenjunhua@ has joined #ceph
[18:04] * smerz (~ircircirc@ Quit (Ping timeout: 480 seconds)
[18:05] * moore (~moore@ has joined #ceph
[18:06] * moore_ (~moore@ has joined #ceph
[18:06] * moore (~moore@ Quit (Read error: Connection reset by peer)
[18:07] * Concubidated (~Adium@ Quit (Ping timeout: 480 seconds)
[18:08] * yanzheng (~zhyan@ has joined #ceph
[18:09] * TheSov2 (~TheSov@cip-248.trustwave.com) has joined #ceph
[18:10] <TheSov2> is this correct? once i get a ceph cluster built and deployed, in order to get it to "function" i deploy ceph to a client machine give it the keys and then it can see the RBD images, i mount them like normal disks and it can use them.
[18:10] * yanzheng (~zhyan@ Quit ()
[18:11] * wenjunhuang (~wenjunhua@ Quit (Ping timeout: 480 seconds)
[18:12] * yanzheng (~zhyan@ has joined #ceph
[18:12] <doppelgrau> TheSov: you can use ceph this way, or use cephfs or the S3-compatible gateway or use the rbd-Images for virtual servers using qemu (can use it natively)
[18:13] <TheSov2> doppelgrau, I'm trying to replace my san
[18:13] * yanzheng (~zhyan@ Quit ()
[18:13] <TheSov2> so I want to learn ceph intimately before i do that
[18:14] <TheSov2> i have done like 40 deployments, so i know how to deploy a ceph cluster, now its down to key management and usage
[18:14] <TheSov2> next I think i need to learn about the crushmaps and groups
[18:14] * lcurtis (~lcurtis@ has joined #ceph
[18:15] * Atomizer (~Mraedis@7R2AABJ7L.tor-irc.dnsbl.oftc.net) Quit ()
[18:16] * aldiyen (~andrew_m@tor-exit0-readme.dfri.se) has joined #ceph
[18:16] * kawa2014 (~kawa@ Quit (Ping timeout: 480 seconds)
[18:16] <doppelgrau> TheSov2: In that case I see basically two Options: 1. as you described, use rbd "native" on the clients; 2. use some "gateways" (potentially with HA-Failover) that export the volumes using iscsi, SMB, NFS ...
[18:16] * kefu (~kefu@ Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[18:16] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[18:16] <TheSov2> doppelgrau, is it possible to setup an active/active with iscsi with ceph now?
[18:17] * yguang11 (~yguang11@2001:4998:effd:600:901e:438f:212f:6224) has joined #ceph
[18:17] <doppelgrau> TheSov2: In the native case, you can create keys that limit read/write to specific pools
[18:18] <doppelgrau> TheSov2: I'm not using iscsi (only lots of virtual server on qemu with librados), but I think active/standby is "state of the art" ATM
[18:18] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) Quit (Remote host closed the connection)
[18:18] <TheSov2> crap
[18:19] * cholcombe (~chris@c-73-180-29-35.hsd1.or.comcast.net) has joined #ceph
[18:19] <Be-El> well, setting up an active/active cephfs is possible. but it is not recommended
[18:19] <doppelgrau> TheSov2: but I might be wrong there since it was never a topic for me
[18:20] <TheSov2> if ceph ever implements active/active iscsi I know it will be a san killer all around
[18:20] <TheSov2> the last article from the ceph devs on this subject was last year
[18:21] <Be-El> TheSov2: what's the difference between iscsi and rbd?
[18:21] <jdillaman> TheSov2: Mike Cristie is working on active/active iscsi over rbd right now
[18:21] <TheSov2> well with iscsi, i dont need to have the ceph installed
[18:21] <TheSov2> for instance is there a ceph esx client?
[18:22] <Be-El> ok, that's a good point
[18:22] <TheSov2> jdillaman, sweet!
[18:22] <TheSov2> can i send him money?!
[18:22] <jdillaman> haha
[18:23] * sleinen1 (~Adium@2001:620:0:82::101) has joined #ceph
[18:23] <ircolle> TheSov2 - maybe bitcoin ;-)
[18:23] <TheSov2> is anyone working on a client for windows aswell?
[18:25] * jwilkins (~jwilkins@2601:9:4580:f4c:ea2a:eaff:fe08:3f1d) has joined #ceph
[18:26] * alram (~alram@ has joined #ceph
[18:27] * kawa2014 (~kawa@ has joined #ceph
[18:27] <jdillaman> TheSov2: i've seen some windows ports for low-level Ceph OSD interactions a la librados, but not a full rbd disk driver for windows
[18:27] * kawa2014 (~kawa@ Quit ()
[18:27] * kawa2014 (~kawa@ has joined #ceph
[18:28] * kawa2014 (~kawa@ Quit ()
[18:28] <TheSov2> jdillaman, I am very hopeful for the future, like i said I know ceph can kill sans, I have very tired of the chokehold dell,emc,hp and ibm have on storage
[18:31] <TheSov2> rbd's are production ready correct?
[18:31] * rlrevell1 (~leer@vbo1.inmotionhosting.com) has joined #ceph
[18:31] * beardo_ (~sma310@207-172-244-241.c3-0.atw-ubr5.atw.pa.cable.rcn.com) Quit (Read error: Connection reset by peer)
[18:32] * beardo_ (~sma310@207-172-244-241.c3-0.atw-ubr5.atw.pa.cable.rcn.com) has joined #ceph
[18:34] * rlrevell (~leer@vbo1.inmotionhosting.com) Quit (Ping timeout: 480 seconds)
[18:34] <doppelgrau> TheSov2: I would say so, and in contrast to cephfs I do not see any warnings from the developers (there are still some cephfs-features that are not well tested and a good fsck is missing)
[18:35] <TheSov2> doppelgrau, for cephfs you mean right?
[18:35] <TheSov2> rbd's are just block devices
[18:36] * jordanP (~jordan@scality-jouf-2-194.fib.nerim.net) Quit (Quit: Leaving)
[18:37] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Remote host closed the connection)
[18:37] <doppelgrau> TheSov2: I mean I do not know any warnings for rbds and the rados gw, only for cephfs
[18:38] * jordanP (~jordan@scality-jouf-2-194.fib.nerim.net) has joined #ceph
[18:38] * jordanP (~jordan@scality-jouf-2-194.fib.nerim.net) Quit ()
[18:40] * yanzheng (~zhyan@ has joined #ceph
[18:40] <TheSov2> does anyone know of a good howto for exporting keys to a system that needs access to ceph but is not part of the pool?
[18:41] <TheSov2> do i use ceph-deploy for that?
[18:42] * yanzheng (~zhyan@ Quit ()
[18:42] * circ-user-iRddx (~circuser-@ has joined #ceph
[18:42] <doppelgrau> TheSov2: I use ansible
[18:43] <doppelgrau> TheSov2: kreate a key on a monitor and copy the result
[18:43] <doppelgrau> create
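A sketch of what "create a key on a monitor and copy the result" can look like, tied to the earlier remark that keys can be limited to specific pools. The script only builds and prints the `ceph auth get-or-create` command line rather than running it; the client name `testclient` and the pool name `rbd` are hypothetical, and the capability strings are one plausible choice for an RBD client, not the only one.

```python
import shlex


def auth_get_or_create_argv(client_name, pool):
    """Command line that creates (or fetches) a key restricted to one pool."""
    return [
        "ceph", "auth", "get-or-create", f"client.{client_name}",
        "mon", "allow r",                      # read-only access to the monitors
        "osd", f"allow rwx pool={pool}",       # read/write limited to this pool
    ]


argv = auth_get_or_create_argv("testclient", "rbd")  # hypothetical names
print(" ".join(shlex.quote(part) for part in argv))
```

Run on a node with admin credentials, the command prints a keyring section for the new client; that output is what gets copied to the client machine, by hand or with a tool such as ansible as doppelgrau mentions.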
[18:43] * owasserm (~owasserm@ has joined #ceph
[18:45] * arbrandes (~arbrandes@ has joined #ceph
[18:45] * aldiyen (~andrew_m@7R2AABJ8D.tor-irc.dnsbl.oftc.net) Quit ()
[18:47] * CAPSLOCK2000 (~oftc@2001:984:3be3:1::8) has joined #ceph
[18:50] * karanmc (~oftc-webi@pat.hitachigst.com) has joined #ceph
[18:52] * Sysadmin88 (~IceChat77@054527d3.skybroadband.com) has joined #ceph
[18:54] * ngoswami (~ngoswami@ Quit (Quit: Leaving)
[18:55] * Hemanth (~Hemanth@ has joined #ceph
[18:57] * vpol (~vpol@000131a0.user.oftc.net) Quit (Quit: vpol)
[18:59] <TheSov2> whats an ideal number of monitors to have?
[18:59] <Sysadmin88> odd number :)
[18:59] <gleam> 3/5/7
[18:59] <TheSov2> sorry if im asking too many questions theres not really a "quick facts" guide to ceph
[18:59] <Sysadmin88> some places say 3 unless your really large... did see one place that gave a ratio, but i dont recall where
[18:59] <TheSov2> ok good deal
[19:00] <gleam> i'd go with 3 unless you see serious issues
[19:00] <TheSov2> standard quorum rules
[19:00] <TheSov2> even in a ceph cluster of thousands of nodes?
[19:00] <janos_> i've moved mine up to 5 when i needed to move them. "move" them. more like add two somewhere, decommission two
[19:00] <janos_> but i have a small cluster
[19:00] <doppelgrau> TheSov2: depends on your faith in the hardware and the time you'll need for repairs
[19:01] <TheSov2> for my test setup, i have a deployment server, 3 ceph systems and 1 monitor on each
[19:01] <TheSov2> im building the test client now
[19:01] <doppelgrau> TheSov2: if less than the half of the monitors can be reached, the cluster comes to a halt
[19:01] * Be-El (~quassel@fb08-bcf-pc01.computational.bio.uni-giessen.de) Quit (Remote host closed the connection)
[19:01] * fmanana (~fdmanana@bl13-135-31.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[19:02] <gleam> i mean you probably want 5 at that point just so you can survive two mon failures
[19:02] <doppelgrau> TheSov2: so with 3 monitors 1 down is tolerated, two down ...
[19:02] <TheSov2> wow i was actually thinking like 9
[19:03] <doppelgrau> TheSov2: with 9 monitors you'll survive failures of 4 monitors at the same time, either you are very unlucky if you will ever need that or you have other failures that would take down more than one monitor at the same time
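The monitor counts discussed here follow from the majority rule: the monitors need a strict majority of the configured set to form a quorum. A small sketch of that arithmetic (the function names are made up for illustration) shows why odd numbers are recommended, since going from 3 to 4 monitors adds no extra failure tolerance.

```python
def quorum_size(monitor_count):
    """Smallest strict majority of the configured monitors."""
    return monitor_count // 2 + 1


def tolerated_failures(monitor_count):
    """How many monitors can be down while a quorum is still possible."""
    return monitor_count - quorum_size(monitor_count)


tolerance = {n: tolerated_failures(n) for n in (3, 4, 5, 7, 9)}
for n, failures in tolerance.items():
    print(f"{n} monitors: quorum of {quorum_size(n)}, tolerates {failures} down")
```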
[19:03] <Sysadmin88> are you using Raid under ceph or are you using one OSD per disk>
[19:04] <TheSov2> well the faq says 1 osd per disk, but also do not run many osd's on one system
[19:04] <TheSov2> i find that to be conflicting advice
[19:04] <Sysadmin88> the answer is usually... 'it depends'
[19:04] <Sysadmin88> what are your systems like?
[19:05] * thomnico (~thomnico@2a01:e35:8b41:120:a996:3c09:b684:4b7a) Quit (Quit: Ex-Chat)
[19:05] <ircolle> TheSov2 - you definitely don't want to run 9 monitors #justsayin
[19:05] <doppelgrau> Sysadmin88: I use 1 OSD/Disk, I find it easier this way with size=3 min_size=2 and failure domain=rack in the crush-map
[19:05] <TheSov2> well i was going to build a R6 of spinning disk, and R1 ssd for each R6
[19:05] <TheSov2> to use for journals
[19:05] <Sysadmin88> R6?
[19:05] <TheSov2> and 1 osd per R6/R1
[19:05] <TheSov2> raid 6
[19:06] <Sysadmin88> how many disks per host, and the host CPU/RAM?
[19:06] <ircolle> TheSov2 - JBOD
[19:06] * Kupo1 (~tyler.wil@ has joined #ceph
[19:06] <TheSov2> 4gig systems 64GB ram
[19:06] <TheSov2> 12 disks per host
[19:07] * Kupo1 (~tyler.wil@ has left #ceph
[19:07] <TheSov2> standard 4tb commodity disks for the rust
[19:07] <Sysadmin88> so 12 disks and 64GB RAM... should be reasonable with individual OSDs per disk... what CPUs?
[19:07] <TheSov2> 4 gigz intel haswell
[19:07] <Sysadmin88> which one?
[19:07] <janos_> with the possible write-speed impacts of raid1 on the journals personally i'd consider 2 journals of non-raided ssd
[19:08] <doppelgrau> Sysadmin88: I run only 4 OSD on each server, and have about 2GB RAM reserved per OSD, but enough CPU-Power (the rest is allocated to virtual servers, I manage a "mixed" environment)
[19:08] <TheSov2> the 4890s i believe
[19:08] <Sysadmin88> specify the FULL CPU name :)
[19:08] * jks (~jks@ Quit (Ping timeout: 480 seconds)
[19:08] <Sysadmin88> 4890 would be an E7 range... very high end.
[19:09] <TheSov2> E7-4890 v2
[19:09] * dgurtner (~dgurtner@ Quit (Ping timeout: 480 seconds)
[19:09] <Sysadmin88> doesnt quite go with your commodity disks...
[19:09] <TheSov2> i said 4gig sorry i mean 3
[19:10] <TheSov2> oops wrong processor
[19:10] <TheSov2> 1 sec
[19:10] <Sysadmin88> thats a 15 core CPU...
[19:10] <TheSov2> no thats whats written in the quote
[19:10] <TheSov2> its obviously wrong
[19:10] <TheSov2> i would never have picked this processor
[19:11] <Sysadmin88> how much is the quote?
[19:11] <Sysadmin88> thats a $6600 CPU lol
[19:11] <janos_> yikes
[19:12] * chutz (~chutz@rygel.linuxfreak.ca) has joined #ceph
[19:12] <Sysadmin88> iirc there was a guide saying 1Ghz per OSD... so you need 12 GHz which you could get from most quad or hex core E5 CPUs and still have some room for spikes of usage
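
The 1 GHz per OSD rule of thumb quoted above can be checked with a few lines. This is only a sketch: the 4-core 3.5 GHz figure below is a hypothetical example rather than the spec of any CPU named in the channel, and summing core clocks is a crude approximation.

```python
def cpu_headroom_ghz(osds: int, cores: int, clock_ghz: float,
                     ghz_per_osd: float = 1.0) -> float:
    """Aggregate clock left over after reserving ghz_per_osd for each OSD."""
    return cores * clock_ghz - osds * ghz_per_osd


# Hypothetical quad-core at 3.5 GHz serving 12 OSDs.
print(cpu_headroom_ghz(osds=12, cores=4, clock_ghz=3.5))
```

A negative result would mean the rule of thumb is not met; as noted later in the channel, real usage with spinning disks is often well below it.
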
[19:12] * Hemanth (~Hemanth@ Quit (Ping timeout: 480 seconds)
[19:12] <TheSov2> 250
[19:13] * bandrus (~brian@131.sub-70-211-67.myvzw.com) Quit (Ping timeout: 480 seconds)
[19:13] <Sysadmin88> if theyre giving that away... take it and sell it... then buy a whole server with the money :)
[19:13] <doppelgrau> TheSov2: In my experience the amount of CPU needed is often overestimated when using conventional platters, I currently have the numbers of a server with 3 quite busy OSDs (without SSD-Journal) and even though some CPU-Power is used for managing the virtual servers it uses less than 2/3 of a CPU-Core (@2,x GHz)
[19:14] <TheSov2> found the original quote, xeon e3-1241 v3
[19:14] <TheSov2> odd
[19:15] <Sysadmin88> that e3 is nice :) i'll stick to my E5 with DDR4
[19:15] <TheSov2> im trying to figure out how cdw switched the processor on me
[19:15] <doppelgrau> TheSov2: so even with 12 OSDs and doubling the amount if you use SSD for journal you should be fine with the CPU
[19:15] <TheSov2> its obviously a mistake
[19:15] * Swompie` (~Kyso@torsrvs.snydernet.net) has joined #ceph
[19:16] <TheSov2> so JBOD then?
[19:16] <Sysadmin88> TheSov2, got 10GbE networking?
[19:17] <TheSov2> so when i do, ceph-deploy clustername osd prepare ceph-server:sdc,sde, when i activate that i append 1 to the disks correct?
[19:17] <TheSov2> 1 gig
[19:17] <TheSov2> this is a low budget test
[19:17] <doppelgrau> TheSov2: except if you have a raid-controller with battery backed cache, in that case it can be faster using some raid
[19:18] <Sysadmin88> so you can only do 100MB/s anyway. probably wont get much from journal SSD
[19:18] <doppelgrau> but else I would go with jbod, size=2, min_size=2
[19:18] <TheSov2> Sysadmin88, doesnt it spread the load to multiple systems?
[19:18] <doppelgrau> size=3
[19:18] <Sysadmin88> but your host only needs to do 100MB/s
[19:19] <Sysadmin88> thats not that fast
[19:19] <TheSov2> each 1 host yes
[19:19] <TheSov2> but i should be able to read from multiple yes?
[19:19] <Sysadmin88> 12 disks. could be ugly if you have to rebalance
[19:19] <TheSov2> lets say the client has 3 1 gig 802.3ad
[19:19] * bandrus (~brian@ has joined #ceph
[19:19] <Sysadmin88> your servers supermicro?
[19:19] <TheSov2> yes
[19:20] <Sysadmin88> some of their 10GbE models are not that much more expensive
[19:20] <doppelgrau> TheSov2: the client always reads from the primary copy, so in the default configuration each 4 MB segment of an rbd is read from the same OSD/Server
[19:20] * karanmc (~oftc-webi@pat.hitachigst.com) Quit (Quit: Page closed)
[19:20] <Sysadmin88> only ?90 more for the dual 10GbE one i got over dual 1GbE
[19:20] <TheSov2> doppelgrau, isnt the point to be able to scale this?
[19:21] <TheSov2> my use case was to have dual links on all storage, and a quad link on the clients
[19:21] <TheSov2> i got the impression that ceph split the data out into buckets
[19:21] * Hemanth (~Hemanth@ has joined #ceph
[19:21] <Sysadmin88> TheSov2, what superserver your place quoting you? or is it a mix of chassis and board?
[19:22] * alfredodeza (~alfredode@ has left #ceph
[19:22] <doppelgrau> TheSov2: write speed (if the block size is large enough that you get enough IO/s to hit the limit) is even worse, since the primary has to write the data to the other OSDs (in my case with size=3 to two others) limiting sequential IO to 50MB/s (after discovering that, I switched to 2 Ports LACP)
[19:22] <doppelgrau> TheSov2: sequential IO is limited, parallel IO scales well
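
The 50MB/s figure can be reproduced with a rough model, assuming client and replication traffic share one NIC on the primary and that a 1 Gbit link carries about 100 MB/s as stated earlier. The function is a hypothetical sketch, not a Ceph tool:

```python
def sequential_write_limit(link_mb_s: float, size: int) -> float:
    """Upper bound on a single sequential write stream through one primary OSD.

    The primary receives the data once and must send (size - 1) copies out
    over the same link, so the outbound direction is the bottleneck.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    if size == 1:
        return link_mb_s
    return link_mb_s / (size - 1)


print(sequential_write_limit(100, 3))  # one 1 Gbit link, three replicas
```

Parallel writes hit different primaries, which is why aggregate throughput still scales even though a single stream does not.
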
[19:22] <TheSov2> brb got a meeting
[19:33] * jcalcote_ (~oftc-webi@63-248-159-172.static.layl0102.digis.net) has joined #ceph
[19:33] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has left #ceph
[19:34] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[19:34] * shylesh (~shylesh@ has joined #ceph
[19:35] <jcalcote_> Hey all - I'm a noob. I have a small quick-start (doc) change that I'd like to submit. What's the best way?
[19:35] <jcalcote_> "best" == "easiest"
[19:36] * smerz (~ircircirc@ has joined #ceph
[19:36] * sage (~quassel@2607:f298:6050:709d:5cef:354b:2871:d107) Quit (Remote host closed the connection)
[19:37] * sage (~quassel@2607:f298:6050:709d:a846:fa7:4714:887f) has joined #ceph
[19:37] * ChanServ sets mode +o sage
[19:37] <ircolle> jcalcote_ - PR on github
[19:38] <jcalcote_> I'm kinda new to github too - as I read the docs, their seemed to be two formats for PR - fork&pull and shared repo - which one does ceph use/like?
[19:38] <joshd> jcalcote_: for small doc changes you can edit it on the web even - see the pencil icon in the upper-right corner when you open one of the files https://github.com/ceph/ceph/tree/master/doc/start
[19:39] <jcalcote_> oh - cool - thanks joshd
[19:39] <joshd> it'll let you create a fork and pull request based on your change from there
[19:42] * raw (~raw@ Quit (Remote host closed the connection)
[19:44] * rlrevell1 (~leer@vbo1.inmotionhosting.com) Quit (Quit: Leaving.)
[19:45] * jashank42 (~jashan42@ has joined #ceph
[19:45] * Swompie` (~Kyso@5NZAADGQD.tor-irc.dnsbl.oftc.net) Quit ()
[19:46] * stiopa (~stiopa@cpc73828-dals21-2-0-cust630.20-2.cable.virginm.net) has joined #ceph
[19:46] <jcalcote_> joshd: Wow - that was so easy it almost hurt.
[19:49] <joshd> jcalcote_: awesome! we should publicize the web-editing method more in the docs
[19:50] * haomaiwang (~haomaiwan@ Quit (Read error: Connection reset by peer)
[19:50] * primechuck (~primechuc@host-95-2-129.infobunker.com) Quit (Quit: Leaving...)
[19:51] * haomaiwang (~haomaiwan@ has joined #ceph
[19:51] * rlrevell (~leer@vbo1.inmotionhosting.com) has joined #ceph
[19:52] * rlrevell1 (~leer@vbo1.inmotionhosting.com) has joined #ceph
[19:52] * rlrevell (~leer@vbo1.inmotionhosting.com) Quit (Read error: Connection reset by peer)
[19:52] * vpol (~vpol@000131a0.user.oftc.net) has joined #ceph
[19:55] * joao (~joao@ has joined #ceph
[19:55] * ChanServ sets mode +o joao
[19:57] * fmanana (~fdmanana@bl13-135-31.dsl.telepac.pt) has joined #ceph
[19:58] * xarses (~andreww@ has joined #ceph
[19:59] * treenerd (~treenerd@cpe90-146-100-181.liwest.at) has joined #ceph
[19:59] * treenerd (~treenerd@cpe90-146-100-181.liwest.at) Quit ()
[20:00] * wenjunhuang_ (~wenjunhua@ Quit (Remote host closed the connection)
[20:00] * wenjunhuang_ (~wenjunhua@ has joined #ceph
[20:03] * vpol (~vpol@000131a0.user.oftc.net) Quit (Read error: No route to host)
[20:06] * midnightrunner (~midnightr@c-67-174-241-112.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[20:06] * mykola (~Mikolaj@ has joined #ceph
[20:06] * fred`` (fred@earthli.ng) has joined #ceph
[20:15] * Oddtwang (~xENO_@spftor1e1.privacyfoundation.ch) has joined #ceph
[20:20] * shylesh (~shylesh@ Quit (Remote host closed the connection)
[20:23] <TheSov2> ok so i created an rbd image, how would i assign that to a specific machine?
[20:24] * fmanana (~fdmanana@bl13-135-31.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[20:25] <doppelgrau> rbd [--id <used key>] map poolname/rbdname
[20:26] <TheSov2> im confused do i not need to generate a key firsT?
[20:27] * bene2 (~ben@nat-pool-bos-t.redhat.com) has joined #ceph
[20:27] <TheSov2> i dont understand what it means by id and key
[20:27] <TheSov2> is it like an ssh key?
[20:27] <doppelgrau> TheSov2: only if you don't have a fitting key
[20:27] <doppelgrau> sort of
[20:28] <doppelgrau> TheSov2: and you only need keys if you have cephx enabled
[20:28] <TheSov2> right now i have a working ceph cluster, an rbd image and a client machine with only ceph installed. what im trying to figure out is how to connect the 2
[20:28] <TheSov2> :)
[20:28] <doppelgrau> http://ceph.com/docs/master/rados/configuration/auth-config-ref/
[20:28] <TheSov2> thats what i needed thanks!
[20:29] <TheSov2> cephx is enabled i believe
[20:30] <doppelgrau> TheSov2: ceph auth get-or-create client.test osd 'allow rwx' mon 'allow r' -o /etc/ceph/ceph.client.test.keyring
[20:30] <TheSov2> so assuming this was a prod environment, i would generate a unique key for each machine and then map it
[20:30] <doppelgrau> TheSov2: and copy that key to the client
[20:31] <doppelgrau> TheSov2: (if you take a closer look, you can limit the client to a specific pool)
[20:31] * bene (~ben@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[20:33] <TheSov2> so scp the file /etc/ceph/ceph.client.test.keyring to the client system correct?
[20:33] <rkeene> Does anyone do swap over RBD ? Any deadlocks ?
[20:34] <ghartz_> hi
[20:34] <doppelgrau> TheSov2: yes
[20:35] <doppelgrau> TheSov2: the command would result in rbd --id test …
[20:35] <ghartz_> loicd, do you have any news about ec pool directly accessible through cephfs ?
[20:37] * vpol (~vpol@000131a0.user.oftc.net) has joined #ceph
[20:39] <TheSov2> ok i copied that keyring to the /etc/ceph on the client is there anyway to test it?
[20:43] <doppelgrau> TheSov2: ceph --id test -s
[20:43] <TheSov2> Error initializing cluster client: Error('error calling conf_read_file: errno EINVAL',)
[20:43] <TheSov2> im doing something wrong here
[20:43] <TheSov2> thanks doppelgrau i appreciate the help
[20:43] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[20:43] <doppelgrau> TheSov2: you need a ceph.conf so the client knows the monitors
[20:43] <TheSov2> is it the same ceph.conf?
[20:43] <TheSov2> do i copy it from the storage devices?
[20:43] * MK_FG (~MK_FG@00018720.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:44] * MK_FG (~MK_FG@00018720.user.oftc.net) has joined #ceph
[20:45] * Hemanth (~Hemanth@ Quit (Quit: Leaving)
[20:45] * Hemanth (~Hemanth@ has joined #ceph
[20:45] <doppelgrau> TheSov2: that's the easiest solution
[20:45] * Oddtwang (~xENO_@3DDAAAV90.tor-irc.dnsbl.oftc.net) Quit ()
[20:46] <TheSov2> ok i did that and using ceph --id test
[20:46] * notarima (~Grum@ has joined #ceph
[20:46] <TheSov2> i got the ceph prompt
[20:46] * andrew (~oftc-webi@ has joined #ceph
[20:46] <loicd> ghartz_: it's not going to happen soon
[20:47] <TheSov2> osd lspools shows me "0 rbd," which means it can see it right?
[20:47] <ghartz_> loicd, too difficult or because the cache tiering looks enough ?
[20:47] <andrew> hi, im trying to set up a single node cluster (1 mon/1 osd node), does anyone know what i need to change so that it only replicates on n osd in 1 node?
[20:47] <ghartz_> looks like jewel will be the release of cephfs production
[20:48] * vbellur (~vijay@ Quit (Ping timeout: 480 seconds)
[20:48] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[20:49] <andrew> im currently seeing: health HEALTH_WARN 256 pgs degraded; 256 pgs stuck unclean
[20:49] <doppelgrau> andrew: size=1, min_size=1 in the pools used
[20:49] * midnightrunner (~midnightr@ has joined #ceph
[20:51] <TheSov2> ok doppelgrau i did a sudo rbd map rbd/testpool --id test , it responded with /dev/rbd0 is that correct?
[20:51] <doppelgrau> yes, that means the rbd-Image is now mapped to /dev/rbd0
[20:52] <andrew> ok now it says: health HEALTH_WARN 256 pgs stuck unclean (after setting the pool size/min_size to 1) do i have to change the crush map now?
[20:52] <TheSov2> i dont see that device on the client, do i have to login to it?
[20:53] <doppelgrau> TheSov2: the device is mapped on the computer where you execute the rbd map command
[20:53] <TheSov2> ok my bad LOL
[20:54] <doppelgrau> andrew: can you paste a full ceph -s output somewhere?
[20:55] * xdeller (~xdeller@h195-91-128-218.ln.rinet.ru) Quit (Ping timeout: 480 seconds)
[20:56] * MACscr (~Adium@2601:d:c800:de3:439:e533:783b:f33e) has joined #ceph
[20:56] <TheSov2> holy sheis it worked
[20:57] <andrew> doppelgrau, http://pastebin.com/428FyheK
[20:57] <doppelgrau> TheSov2: "undo" works with rbd unmap /dev/rbd<number>
[20:59] <TheSov2> yep, i just setup the rbdmap init script
[20:59] <TheSov2> gonna see if that works
[20:59] <TheSov2> then im going to test the "redundancy"
[21:00] <TheSov2> doppelgrau, you are the f***ing man!
[21:00] * djs1 (a89fd5d4@ has joined #ceph
[21:00] <doppelgrau> andrew: might need to take a look if the crush-rule allows 1 replica
[21:01] <TheSov2> doppelgrau, are you anywhere near chicago, because i gotta buy you a beer!
[21:02] <andrew> here doppelgrau, is the crushmap: http://pastebin.com/WrUDywMk
[21:04] <doppelgrau> andrew: is the OSD active? (ceph osd tree)?
[21:05] <andrew> yes
[21:05] <andrew> osd.0 up 1
[21:05] * midnightrunner (~midnightr@ Quit (Remote host closed the connection)
[21:05] * midnightrunner (~midnightr@ has joined #ceph
[21:06] * rotbeard (~redbeard@2a02:908:df10:d300:76f0:6dff:fe3b:994d) has joined #ceph
[21:06] * adeel (~adeel@2602:ffc1:1:face:2c68:d561:c794:86c5) Quit (Quit: Leaving...)
[21:07] <doppelgrau> andrew: both pools have size=1, min_size=1 set?
[21:07] <andrew> yes
[21:07] <andrew> i tried setting the min_size to 0
[21:07] <andrew> pool 5 'images' replicated size 1 min_size 0
[21:09] * ChrisNBlum (~ChrisNBlu@dhcp-ip-230.dorf.rwth-aachen.de) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[21:10] <andrew> ill try adding another OSD to my cluster
[21:12] * blynch (~blynch@vm-nat.msi.umn.edu) Quit (Remote host closed the connection)
[21:12] <doppelgrau> andrew: strange, should be okay with your configuration
[21:12] * shnarch (~shnarch@bzq-109-67-128-59.red.bezeqint.net) has joined #ceph
[21:13] * midnight_ (~midnightr@ has joined #ceph
[21:13] <andrew> adding another osd result in: http://pastebin.com/24sYpdj5 (some active+clean)
[21:14] <rotbeard> how do you guys plan your networks? imaging 10 rooms with 10 racks each. talking about ~10k OSDs (sata hdd) for example. a non-blocking 40G network would be pretty nice, but is that really needed?
[21:15] * notarima (~Grum@5NZAADGVR.tor-irc.dnsbl.oftc.net) Quit ()
[21:15] * KapiteinKoffie (~Frostshif@us.tor-exit.neelc.org) has joined #ceph
[21:16] <doppelgrau> andrew: do the stuck pgs belong to the same pool or to different pools?
[21:16] <andrew> how do i check?
[21:16] <TheSov2> without getting into crushmaps and all that how do i change the amount of replication on an rbd?
[21:17] <doppelgrau> andrew: http://ceph.com/docs/master/rados/operations/placement-groups/ ceph pg dump
[21:18] <doppelgrau> TheSov2: you set in on a per pool basis
[21:19] <doppelgrau> TheSov2: http://ceph.com/docs/master/rados/operations/pools/ ceph osd pool set poolname size/min_size value
[21:19] <TheSov2> so because i left it on the default rbd pool, that means its replicated 3 times?
[21:19] <doppelgrau> TheSov2: ceph osd pool get poolname size
[21:20] * midnightrunner (~midnightr@ Quit (Ping timeout: 480 seconds)
[21:20] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Ping timeout: 480 seconds)
[21:20] <doppelgrau> rotbeard: depends on your IO pattern
[21:21] <TheSov2> size 3
[21:22] <rotbeard> doppelgrau, the frontend traffic won't be the problem I guess. but with 10k+ OSDs I guess more _trouble_ will come to the backend side (even if talking about a room failure)
[21:23] * midnight_ (~midnightr@ Quit (Ping timeout: 480 seconds)
[21:23] <TheSov2> ok can someone explain the deal with metadata servers?
[21:25] <TheSov2> do you guys recommend monitor servers be seperate from storage servers?
[21:25] <TheSov2> I have a billion questions heh
[21:28] <doppelgrau> rotbeard: did you need fast recovery in the case of a room-failure?
[21:29] <doppelgrau> rotbeard: else I do not see why you'll need too much headroom
[21:31] <andrew> doppelgrau, hmm health is OK once i set the pg of the pools to 64 each
[21:31] <rotbeard> doppelgrau, well. that's exactly the thing... I personally would say a recovery could take days or weeks, if I use a replica count of 4+ and if the room is my failure domain. but managers like to have small recovery times and stuff...
[21:33] <doppelgrau> rotbeard: with size=4, min_size=2 a lot can fail without problems
[21:33] <loicd> ghartz_: because cache tiering should be enough, yes
[21:34] <rotbeard> yep, I think so. but one thought: if I use the _room_ as failure domain means, that a written object in room1 is replicated to room2, 3 and 4 (for example). so just using a 100G link between two rooms couldn't be enough while having ~1k OSD per room, right?
[21:35] <doppelgrau> rotbeard: depends on the amount (and traffic pattern) of IO you'll have
[21:36] * vpol (~vpol@000131a0.user.oftc.net) Quit (Quit: vpol)
[21:36] <doppelgrau> rotbeard: in normal operation read is simple: from a (random) primary OSD to the client (client in the same rooms or somewhere else?)
[21:36] <rotbeard> we are planing a public cloud, but I don't know, what pattern we really have...
[21:37] <rotbeard> *will have
[21:37] <doppelgrau> rotbeard: writing is a bit uglier: from the client to the primary OSD and from there (size-1) times to the other replicas
[21:38] <rotbeard> doppelgrau, I think I can't guarantee that a client will always write and read to an OSD in his room. (we are planing to use openstack + ceph across all 10 rooms)
[21:39] <doppelgrau> so in normal operations about size/10th of the write traffic goes into each room and 1/10th of the read traffic
[21:39] <rotbeard> is it reallt _that_ simple? :p
[21:39] <rotbeard> *really
[21:40] <doppelgrau> rotbeard: on average, yes. random distribution :)
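
The averages described above can be put into numbers. A minimal sketch, assuming data and primaries are spread uniformly over the rooms; the traffic figures are invented for illustration:

```python
def per_room_traffic(write_mb_s: float, read_mb_s: float,
                     size: int, rooms: int) -> dict[str, float]:
    """Average traffic each room handles under uniform random placement."""
    return {
        # Every write ends up as `size` copies spread over all rooms.
        "write_in": write_mb_s * size / rooms,
        # Every read is served by one primary, in any room with equal odds.
        "read_out": read_mb_s / rooms,
    }


# Hypothetical cluster-wide load: 1000 MB/s writes, 2000 MB/s reads.
print(per_room_traffic(write_mb_s=1000, read_mb_s=2000, size=4, rooms=10))
```
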
[21:41] * mhack (~mhack@nat-pool-bos-t.redhat.com) has joined #ceph
[21:41] <doppelgrau> rotbeard: (and 1/10th of the write traffic leaves a room and 1/10th enters the room)
[21:41] <doppelgrau> rotbeard: and with virtual servers, you'll usually get very small IOs
[21:42] <rotbeard> so it is - in normal operation - more about iops than bandwidth?
[21:42] * linjan (~linjan@ has joined #ceph
[21:42] <doppelgrau> rotbeard: so in my experience the IO/s become a problem really fast, not the bandwidth
[21:45] * KapiteinKoffie (~Frostshif@3DDAAAWED.tor-irc.dnsbl.oftc.net) Quit ()
[21:46] <rotbeard> ok. so I have to think about the recovery time while rebalancing the data of a complete room more than sizing everything to max bandwidth for a normal operating mode
[21:46] * Chrissi_ (~cheese^@ has joined #ceph
[21:46] <doppelgrau> rotbeard: wouldn't it be easier to repair a "room failure" than rebalance?
[21:46] <doppelgrau> at least usually?
[21:47] <rotbeard> i am not pretty deep into ceph (right now), but that will result in setting the cluster to _noout_, won't it?
[21:48] <doppelgrau> rotbeard: you can set the maximum entity that is marked out automatically
[21:48] <doppelgrau> rotbeard: so you can say everything smaller than that is automatically rebalanced after x minutes
[21:48] <rotbeard> doppelgrau, to just let some broken disks rebalance but not a complete room failure?
[21:48] <doppelgrau> rotbeard: exactly
[21:49] <rotbeard> nice point of view.
[21:49] <doppelgrau> rotbeard: I set the size to server since I have only a few racks
[21:49] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving)
[21:51] <doppelgrau> so if a full rack fails everything still works (size=3, min_size=2, failure-domain set to rack) since that should be repaired asap. If another OSD fails I have stuck IO till it is rebalanced, but with size=4, min_size=2 that won't happen either
[21:51] <rotbeard> thinking about a short "repair time" of the room everything will be fine. but if maybe that whole room burns down and all disks in that room are destroyed... I have to trigger a rebalancing manually right? (after building a new room or maybe in parallel)
[21:52] * midnightrunner (~midnightr@ has joined #ceph
[21:52] * garphy`aw is now known as garphy
[21:52] <doppelgrau> rotbeard: mon osd down out subtree limit is the parameter
[21:53] * bitserker (~toni@ Quit (Ping timeout: 480 seconds)
[21:53] <doppelgrau> rotbeard: yes, in that case you can simply mark the osds out and rebalancing starts
[21:53] <doppelgrau> rotbeard: but ceph won't do that in that size automatically
[21:54] * garphy is now known as garphy`aw
[21:54] * chasmo77 (~chas77@158.183-62-69.ftth.swbr.surewest.net) has joined #ceph
[21:54] <rotbeard> should be ok, when talking about size 4 and 10k+ OSDs
[21:55] <doppelgrau> rotbeard: yes, with size=4 I won't see any problems, if you set min_size=2 then even after a whole room has failed another failure won't lead to blocking IO
[21:56] <doppelgrau> rotbeard: and if you know the failure won't recover fast, then you can mark it out and let the rebuild happen
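
The size/min_size reasoning above boils down to one comparison: a placement group keeps serving IO while the surviving replicas are at least min_size. A small sketch with a made-up helper name, ignoring recovery and rebalancing:

```python
def pg_accepts_io(size: int, min_size: int, failed_replicas: int) -> bool:
    """True while enough replicas survive to satisfy min_size."""
    return size - failed_replicas >= min_size


# size=3, min_size=2: one failed rack is fine, a second failure blocks IO.
print(pg_accepts_io(3, 2, 1), pg_accepts_io(3, 2, 2))
# size=4, min_size=2: a room failure plus one more failure still works.
print(pg_accepts_io(4, 2, 2))
```
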
[21:57] <doppelgrau> TheSov2: mds are for cephfs
[21:57] <rotbeard> for me the recovery time isn't that necessary, but u know the CTOs and CIOs out there ;)
[21:58] <rotbeard> doppelgrau, thanks so far. sounds like a pretty good idea for me/us
[21:58] * xdeller (~xdeller@h195-91-128-218.ln.rinet.ru) has joined #ceph
[21:58] <doppelgrau> TheSov2: separate monitors are easier to manage during upgrades, but not necessary
[21:59] <doppelgrau> rotbeard: if you want really fast recovery, you also need enough spare IO capacity …
[22:00] <doppelgrau> rotbeard: If you "only" use size=3 I would understand the C?Os, since at that size a second failure after a room failure won't be unlikely and that would lead to some blocking IO => bad experience for the users/SLA violation
[22:00] <rotbeard> just curious: what OSD nodes are you using? I plan to use 2 6 core Xeons + 64G RAM + 12 OSD sata disks with 4 journal SSDs and 4x 40G infiniband (2 times frontend, 2 times backend)
[22:05] <mtanski> how fast are your sata disks?
[22:07] <TheSov2> do monitors need to be beefy machines? or can a rasperry pi with x86 emulation do that
[22:07] <rotbeard> mtanski, we will use WD Red drives, so: not _that_ fast ;)
[22:08] <TheSov2> can u use the same disk for multiple osd journals?
[22:08] <TheSov2> like 1 ssd for like 4 rust
[22:10] <rotbeard> TheSov2, depending on the SSDs 1:4 ratio is a good choice imo. we use 1:3 + use intel dc s3700 SSDs as journals, so our journal disks won't be the bottleneck
[22:11] <TheSov2> how does that work exactly do you do ceph-deploy osd prepare ceph1:sdc:sdb, then ceph-deploy osd prepare ceph1:sde:sdb, assuming sdb is the ssd
[22:11] <mtanski> 2x40G will be more than enough to cover 12 sata disk @ maximum sequential speed by like 4x
[22:11] <mtanski> unless you're doing the links for redundancy
[22:12] <mtanski> 12 * 200Mb = 2400
[22:13] <rotbeard> TheSov2, exactly
[22:13] <mtanski> Sorry: 12 OSD * 200MB = 2400MB vs 80gigabit = 10000MB
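
The comparison above, written out as a sketch. It uses decimal units (1 Gbit/s = 125 MB/s) and ignores protocol overhead; the function name is hypothetical:

```python
def network_headroom(disks: int, disk_mb_s: float,
                     links: int, link_gbit: float) -> tuple[float, float, float]:
    """Return (disk MB/s, link MB/s, ratio of link to disk throughput)."""
    disk_total = disks * disk_mb_s
    link_total = links * link_gbit * 1000 / 8  # Gbit/s to MB/s
    return disk_total, link_total, link_total / disk_total


disk, link, ratio = network_headroom(disks=12, disk_mb_s=200, links=2, link_gbit=40)
print(f"{disk:.0f} MB/s of disk vs {link:.0f} MB/s of network, about {ratio:.1f}x")
```
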
[22:13] <TheSov2> nice ok
[22:13] <TheSov2> but the activates have a 1 after the disk
[22:13] <TheSov2> would there be 2 on the ssd?
[22:13] <TheSov2> partitions i mean
[22:14] <mtanski> I'm using 200MB which is the Seagate enterprise nas drives, I think the spec sheet for wd red says a bit less
[22:14] * sleinen1 (~Adium@2001:620:0:82::101) Quit (Ping timeout: 480 seconds)
[22:14] <rotbeard> TheSov2, if you manually activate your OSDs with ceph-deploy: yep. it would be [...] ceph1:sdc:sdb1 ceph1:sdd:sdb2 and so on
[22:14] <TheSov2> like ceph-deploy osd activate ceph1:sdc1:sdb1 , then ceph-deploy osd activate ceph1:sde1:sdb2?
[22:15] <TheSov2> ok
[22:15] <TheSov2> nice
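
The host:disk:journal pattern discussed above, with one journal partition per data disk on a shared SSD, can be generated rather than typed. This sketch only builds the argument strings in the form rotbeard gave; it assumes the journal partitions are numbered from 1 in the order the data disks are listed, and the helper name is made up:

```python
def osd_specs(host: str, data_disks: list[str], journal_disk: str) -> list[str]:
    """Pair each data disk with its own numbered partition on the journal SSD."""
    return [
        f"{host}:{disk}:{journal_disk}{index}"
        for index, disk in enumerate(data_disks, start=1)
    ]


# Two spinning disks sharing the SSD sdb, as in the example above.
print(" ".join(osd_specs("ceph1", ["sdc", "sdd"], "sdb")))
```
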
[22:15] <rotbeard> mtanski, we saw about 180MB with some fio benchmarks
[22:15] <TheSov2> should i not do it manually?
[22:15] <rotbeard> we just want to use those 40G boys to prepare ourself for SSD only pools (later)
[22:15] <mtanski> just saying you have 4x coverage of max speed of their drives
[22:15] * Chrissi_ (~cheese^@5NZAADGYA.tor-irc.dnsbl.oftc.net) Quit ()
[22:15] <TheSov2> i get a warning when i do that, that the system would not be hot swappable if i move the journal to another disk
[22:16] * Vale (~vegas3@ncc-1701-d.tor-exit.network) has joined #ceph
[22:16] <rotbeard> TheSov2, really don't know, for me (ceph hammer + newest release of ceph-deploy) it activates the OSDs automagically while preparing them
[22:16] * linuxkidd (~linuxkidd@ Quit (Quit: Leaving)
[22:17] <TheSov2> u guys better stop answering my questions or ill just keep going :D
[22:18] * Nacer (~Nacer@2001:41d0:fe82:7200:25ea:e339:3796:2987) has joined #ceph
[22:20] * lpabon (~quassel@nat-pool-bos-t.redhat.com) Quit (Remote host closed the connection)
[22:20] <TheSov2> i want to see an aliens vs borg movie
[22:21] * dephcon (~oftc-webi@c73-110.rim.net) Quit (Quit: Page closed)
[22:27] * nsoffer (~nsoffer@bzq-79-180-80-86.red.bezeqint.net) has joined #ceph
[22:35] * marvin999 (~marvin999@ppp-204-129.29-151.libero.it) has joined #ceph
[22:35] <TheSov2> how does one safely shutdown a ceph cluster?
[22:35] * jashank42 (~jashan42@ Quit (Ping timeout: 480 seconds)
[22:36] <TheSov2> does anyone know how beefy the monitors need to be?
[22:37] <doppelgrau> TheSov2: poweroff :)
[22:38] * marvin999 (~marvin999@ppp-204-129.29-151.libero.it) Quit ()
[22:38] <TheSov2> all at once or can i shutdown monitors and it will go read only then power it off?
[22:38] * marvin999 (~marvin999@ppp-204-129.29-151.libero.it) has joined #ceph
[22:38] <doppelgrau> TheSov2: it depends ;) But I think I've seen posts on the ML that someone had success with the new atoms (the one with ecc ram)
[22:38] <TheSov2> ML?
[22:39] <doppelgrau> TheSov2: mailinglist
[22:39] <TheSov2> oooh i need to get on that
[22:39] * marrusl (~mark@cpe-24-90-46-248.nyc.res.rr.com) Quit (Quit: bye!)
[22:41] <TheSov2> so basically to get started properly, I need at least 3 semi decent machines to run as monitors. 3 storage systems with 64gb of ram and 1 gigahertz per osd, in JBOD. 2 gigabits for the backend connectivity within the ceph cluster. 1 gig nic on each machine for client connectivity - does that sound right?
[22:42] * marvin999 (~marvin999@ppp-204-129.29-151.libero.it) Quit ()
[22:42] <doppelgrau> TheSov2: mons can run on the OSD-Machines
[22:43] <doppelgrau> TheSov2: Rest sounds okay for a start
[22:43] <TheSov2> right but for proper separation and like was said earlier, ease of upgrading id like to make sure they are different
[22:43] <doppelgrau> yes, easier that way
[22:43] * ganders (~root@ Quit (Quit: WeeChat 0.4.2)
[22:44] <TheSov2> for the backend connectivity do they need to be bonded via lacp or does ALB work ok?
[22:44] <rkeene> ALB should be fine, but I haven't tested it much
[22:44] <TheSov2> if i dont have to get the network team involved, that would be best
[22:45] <TheSov2> i got a few more questions if you dont mind :)
[22:45] <TheSov2> how do I add an osd node to an existing cluster?
[22:45] <TheSov2> i know to make a new cluster i do ceph-deploy new ceph1 ceph2 ceph3 etc
[22:45] * andrew (~oftc-webi@ Quit (Remote host closed the connection)
[22:45] * Vale (~vegas3@7R2AABKF1.tor-irc.dnsbl.oftc.net) Quit ()
[22:45] <TheSov2> but after its made, how do you add more?
[22:46] * pakman__ (~TheDoudou@ has joined #ceph
[22:46] <rkeene> You'd configure the node's ceph.conf file and then add the OSDs that node will export
[22:46] <TheSov2> do i also copy the admin keyring?
[22:46] <rkeene> There's probably some wrapper script (like ceph-deploy, if it's not ceph-deploy) that does this for you
[22:47] <TheSov2> well the key here is that i can train others to do it easily
[22:47] <rkeene> You'll probably want the admin keyring over there too
[22:47] <TheSov2> my boss wants to make sure im replaceable...
[22:48] * jashank42 (~jashan42@ has joined #ceph
[22:49] <adrian15b> I have problems with an slow 'hammer' ceph system. Any idea on where to start to debug why everything is collapsing ? Somehow network is not used for reading the hard disk data. Thank you.
[22:49] <TheSov2> when upgrading, how does it work, do you do monitors first and then OSD nodes?
[22:50] <TheSov2> or vice versa?
[22:53] <TheSov2> well ill see you guys tomorrow for more jeopardy questions!
[22:53] <TheSov2> :D
[22:54] * TheSov2 (~TheSov@cip-248.trustwave.com) Quit (Quit: Leaving)
[22:56] <adrian15b> Can anyone explain to me why "ceph osd unset norecover" works for me better than "ceph osd set norecover" ? Is there any point where 'hammer' ceph refuses to work if he cannot recover itself ? What's the catch ? Thank you.
[22:56] * capri_on (~capri@ has joined #ceph
[22:57] <adrian15b> Where "works for me" means having 20% or 40%wa instead of the awful 70% or 80% in the virtual machines that uses the ceph contents. Thank you.
[22:59] * mykola (~Mikolaj@ Quit (Remote host closed the connection)
[22:59] * bjornar_ (~bjornar@ has joined #ceph
[22:59] <bjornar_> Is it possible to have a "passive" monitor? not involved in quorum
[23:00] <bjornar_> beeing able to "enable" it in need..
[23:01] * BManojlovic (~steki@cable-89-216-223-146.dynamic.sbb.rs) has joined #ceph
[23:03] * capri (~capri@ Quit (Ping timeout: 480 seconds)
[23:03] * rlrevell1 (~leer@vbo1.inmotionhosting.com) Quit (Ping timeout: 480 seconds)
[23:05] * wushudoin_ (~wushudoin@transit-86-181-132-209.redhat.com) has joined #ceph
[23:05] * joshd (~jdurgin@ Quit (Ping timeout: 480 seconds)
[23:07] <doppelgrau> bjornar: you can simply add a monitor if you need it and remove another
[23:07] <doppelgrau> as long as the majority is active
[23:07] * badone (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) has joined #ceph
[23:08] <adrian15b> Having run "ceph osd unset norecover" has solved my previous problem of every I/O being stuck. That's why I ask. It does not make any sense (probably because I don't know much more ceph than you do guys). Thank you.
[23:08] * tupper_ (~tcole@2001:420:2280:1272:8900:f9b8:3b49:567e) Quit (Ping timeout: 480 seconds)
[23:11] * bene2 (~ben@nat-pool-bos-t.redhat.com) Quit (Quit: Konversation terminated!)
[23:13] * wushudoin (~wushudoin@ Quit (Ping timeout: 480 seconds)
[23:15] * pakman__ (~TheDoudou@8Q4AABDO6.tor-irc.dnsbl.oftc.net) Quit ()
[23:15] * KUSmurf (~legion@ has joined #ceph
[23:16] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Quit: Leaving.)
[23:19] * Hemanth (~Hemanth@ Quit (Ping timeout: 480 seconds)
[23:21] * bitserker (~toni@ has joined #ceph
[23:21] * wushudoin_ (~wushudoin@transit-86-181-132-209.redhat.com) Quit (Ping timeout: 480 seconds)
[23:21] * wushudoin_ (~wushudoin@transit-86-181-132-209.redhat.com) has joined #ceph
[23:22] * joshd (~jdurgin@66-194-8-225.static.twtelecom.net) has joined #ceph
[23:26] * SpamapS (~clint@xencbyrum2.srihosting.com) Quit (Ping timeout: 480 seconds)
[23:27] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[23:30] * wushudoin_ (~wushudoin@transit-86-181-132-209.redhat.com) Quit (Ping timeout: 480 seconds)
[23:30] * wushudoin_ (~wushudoin@ has joined #ceph
[23:31] * joshd (~jdurgin@66-194-8-225.static.twtelecom.net) Quit (Ping timeout: 480 seconds)
[23:35] <doppelgrau> adrian15b: were your pgs "active+clean" before or did you have some pgs that were some kind of stuck?
[23:37] <adrian15b> doppelgrau: Probably. Let me show part of the logs. http://paste.debian.net/213526/ . Do you see anything weird on them ?
[23:38] <adrian15b> doppelgrau: And these are the logs from 5 minutes ago or so: http://paste.debian.net/213527/
[23:41] <doppelgrau> adrian15b: since the pool was not clean and you had "norecover" set, ceph couldn't recover => IO was stuck
[23:42] <adrian15b> doppelgrau: But...
[23:42] <doppelgrau> adrian15b: norecover, noout, nobackfill shouldn't be permanently active
[23:43] * LeaChim (~LeaChim@host86-163-124-72.range86-163.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[23:43] <adrian15b> doppelgrau: Say I have three nodes, the data is on two of them, and I want to pause recovery on the one that does not have it. Do you mean that I also somehow pause recovery on the two which are ok?
[23:43] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[23:44] * joshd (~jdurgin@ has joined #ceph
[23:44] <doppelgrau> what are the size and min_size of your pools and do you use the default crush map?
[23:45] * KUSmurf (~legion@5NZAADG2D.tor-irc.dnsbl.oftc.net) Quit ()
[23:46] * QuantumBeep (~pakman__@9S0AAAU4U.tor-irc.dnsbl.oftc.net) has joined #ceph
[23:46] * alfredodeza (~alfredode@ has joined #ceph
[23:46] <bjornar_> doppelgrau, yeah.. but I have a 2-datacenter problem..
[23:47] <bjornar_> doppelgrau, and it would be nice being able to have a "passive" monitor I think.. that I can enable manually if I want.
[23:47] * circ-user-iRddx (~circuser-@ Quit (Ping timeout: 480 seconds)
[23:48] <doppelgrau> bjornar_: I don't think that's supported, since that monitor could have stale data => potential data loss
[23:48] <adrian15b> doppelgrau: The min_size is 2. The size is 3. And I guess I'm using the default crush map but I'm not totally sure.
[23:48] <adrian15b> doppelgrau: Can you tell me how to dump the current crush map into text? The proxmox crush copy is without line breaks and that's why I ask you.
[23:49] <doppelgrau> adrian15b: then one server down out of three should be tolerated, if the cluster was healthy before
[23:49] <bjornar_> doppelgrau, sure, but that's eventually something I would need to handle..
[23:49] <doppelgrau> http://ceph.com/docs/master/rados/operations/crush-map/ ceph osd getcrushmap -o {compiled-crushmap-filename}; crushtool -d {compiled-crushmap-filename} -o {decompiled-crushmap-filename}
[23:50] <adrian15b> doppelgrau: Ceph devs should write a wrapper for it.
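A minimal sketch of the kind of wrapper adrian15b asks for, assuming only the two commands doppelgrau quoted (`ceph osd getcrushmap -o` and `crushtool -d ... -o`). The function names and the default file names `crushmap.bin` and `crushmap.txt` are hypothetical, chosen for illustration; this is not an official Ceph tool.

```python
import subprocess
from typing import List


def crush_dump_commands(compiled: str, decompiled: str) -> List[List[str]]:
    """Build the two commands quoted in the channel, in order."""
    return [
        # Step 1: fetch the compiled (binary) CRUSH map from the cluster.
        ["ceph", "osd", "getcrushmap", "-o", compiled],
        # Step 2: decompile it into an editable text file.
        ["crushtool", "-d", compiled, "-o", decompiled],
    ]


def dump_crush_map(compiled: str = "crushmap.bin",
                   decompiled: str = "crushmap.txt") -> str:
    """Run both steps and return the decompiled map as text.

    Hypothetical helper: assumes `ceph` and `crushtool` are on PATH
    and that the caller is allowed to read the cluster's CRUSH map.
    """
    for cmd in crush_dump_commands(compiled, decompiled):
        # check=True stops at the first failing step.
        subprocess.run(cmd, check=True)
    with open(decompiled) as fh:
        return fh.read()
```

On a node with admin access, `print(dump_crush_map())` would be expected to print the map with its line breaks intact; the command list is built separately so it can be inspected without touching a cluster.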
[23:50] * marrusl (~mark@cpe-24-90-46-248.nyc.res.rr.com) has joined #ceph
[23:51] <doppelgrau> bjornar_: if the datacenters have a link that supports the "osd-traffic", the traffic of a monitor should be negligible
[23:51] <adrian15b> doppelgrau: So you mean that the cluster might not have been healthy just before doing some operations and that's why recover was still needed.
[23:51] * OutOfNoWhere (~rpb@ has joined #ceph
[23:52] <doppelgrau> adrian15b: yes, if an unhealthy cluster of 3 nodes loses 1 node, some PGs might fall below 2 (current) copies => blocked IO
[23:52] <doppelgrau> and norecover had prevented ceph from fixing this
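The reasoning in the last few messages can be illustrated with a toy model, assuming the pool values adrian15b reported (size 3, min_size 2). `ToyPG` and its methods are hypothetical names; the model only counts up-to-date replicas and is not how Ceph actually tracks placement group state.

```python
from dataclasses import dataclass


@dataclass
class ToyPG:
    """Toy placement group that only counts up-to-date replicas."""
    size: int = 3      # pool "size" reported in the channel
    min_size: int = 2  # pool "min_size" reported in the channel
    copies: int = 3    # replicas that are currently up to date

    def lose_replica(self) -> None:
        # A node holding one replica goes down.
        self.copies = max(0, self.copies - 1)

    def recover(self, norecover: bool, reachable: int) -> None:
        # While the flag is set, this model never restores replicas.
        # Otherwise it restores as many as there are reachable nodes.
        if not norecover:
            self.copies = min(self.size, reachable)

    def io_blocked(self) -> bool:
        # IO stalls once fewer than min_size replicas are current.
        return self.copies < self.min_size


# Healthy PG: losing one of three replicas still leaves min_size copies.
healthy = ToyPG(copies=3)
healthy.lose_replica()

# Already degraded PG: losing one more replica drops it below min_size.
degraded = ToyPG(copies=2)
degraded.lose_replica()
```

In this model `healthy.io_blocked()` is false while `degraded.io_blocked()` is true, and the degraded PG stays blocked through `recover(norecover=True, reachable=2)` until recovery is allowed to run with the flag unset.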
[23:53] <adrian15b> doppelgrau: Unhealthy means not showing "Health OK". I mean... Warn does not mean "Healthy" for you, does it ?
[23:53] <doppelgrau> adrian15b: warn would mean unhealthy in this case, yes
[23:54] <adrian15b> In order to avoid too much I/O I have reduced the recovery threads from default 15 to 1.
[23:55] <adrian15b> doppelgrau: If I'm not mistaken you might advise me to reset it to the default 15 threads ?
[23:55] <doppelgrau> adrian15b: that's ok, just takes a bit longer
[23:55] <doppelgrau> no, just wait now
[23:55] <adrian15b> doppelgrau: Or it does not matter ... Ok. It does not matter.
[23:55] <doppelgrau> adrian15b: I guess the problem was that you had set norecover some time in the past
[23:55] <adrian15b> doppelgrau: It's nice finding someone that seems to understand what your problem was.
[23:56] <adrian15b> doppelgrau: Well, the idea was to just set recover by night and set norecover by day.
[23:56] * SpamapS (~clint@xencbyrum2.srihosting.com) has joined #ceph
[23:58] * jbautista- (~wushudoin@transit-86-181-132-209.redhat.com) has joined #ceph
[23:59] <adrian15b> doppelgrau: Would you mind taking a look at my crush map ( http://paste.debian.net/213532/ ) and tell me if you see something not standard. It's probably non standard... but I think it's a normal crushmap because I don't have, e.g., ssd and sata pools.

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.