#ceph IRC Log

IRC Log for 2015-12-28

Timestamps are in GMT/BST.

[0:02] * Discovery (~Discovery@ip-176-199-137-160.hsi06.unitymediagroup.de) Quit (Ping timeout: 480 seconds)
[0:04] * Discovery (~Discovery@178.239.49.68) has joined #ceph
[0:06] * Peltzi- (peltzi@peltzi.fi) has joined #ceph
[0:06] * liiwi (liiwi@idle.fi) Quit (Remote host closed the connection)
[0:06] * Peltzi (peltzi@peltzi.fi) Quit (Remote host closed the connection)
[0:08] * mui (mui@eutanasia.mui.fi) Quit (Ping timeout: 480 seconds)
[0:12] * Cybertinus (~Cybertinu@cybertinus.customer.cloud.nl) Quit (Remote host closed the connection)
[0:14] * daviddcc (~dcasier@80.12.39.155) Quit (Ping timeout: 480 seconds)
[0:15] * Cybertinus (~Cybertinu@cybertinus.customer.cloud.nl) has joined #ceph
[0:22] * liiwi (liiwi@idle.fi) has joined #ceph
[0:22] * mui (mui@eutanasia.mui.fi) has joined #ceph
[1:22] * dyasny (~dyasny@198.251.54.204) has joined #ceph
[1:27] * EinstCrazy (~EinstCraz@218.69.72.130) Quit (Remote host closed the connection)
[1:28] * oms101 (~oms101@p20030057EA4ECE00C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:36] * oms101 (~oms101@p20030057EA347000C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[2:05] * Neon (~rapedex@109.69.67.17) has joined #ceph
[2:12] * EinstCrazy (~EinstCraz@111.30.21.47) has joined #ceph
[2:16] * EinstCrazy (~EinstCraz@111.30.21.47) Quit (Read error: Connection reset by peer)
[2:16] * EinstCrazy (~EinstCraz@111.30.21.47) has joined #ceph
[2:26] * Kingrat (~shiny@2605:a000:161a:c0f6:1c5:ba5d:6401:e971) has joined #ceph
[2:29] * rendar (~I@host176-178-dynamic.246-95-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[2:35] * Neon (~rapedex@84ZAAARQO.tor-irc.dnsbl.oftc.net) Quit ()
[2:44] * Wizeon (~Nanobot@104.238.176.106) has joined #ceph
[2:47] * dyasny (~dyasny@198.251.54.204) Quit (Ping timeout: 480 seconds)
[3:09] * dyasny (~dyasny@198.251.54.204) has joined #ceph
[3:14] * Wizeon (~Nanobot@76GAAAQIF.tor-irc.dnsbl.oftc.net) Quit ()
[3:14] * Dysgalt (~lobstar@195-154-69-88.rev.poneytelecom.eu) has joined #ceph
[3:21] * m8x` (~user@182.150.27.112) Quit (Remote host closed the connection)
[3:22] * zhaochao (~zhaochao@111.161.77.238) has joined #ceph
[3:29] * dyasny (~dyasny@198.251.54.204) Quit (Ping timeout: 480 seconds)
[3:29] * Discovery (~Discovery@178.239.49.68) Quit (Read error: Connection reset by peer)
[3:35] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:36] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:44] * Dysgalt (~lobstar@84ZAAART2.tor-irc.dnsbl.oftc.net) Quit ()
[3:44] * AG_Clinton (~Bored@76GAAAQKQ.tor-irc.dnsbl.oftc.net) has joined #ceph
[3:57] * doppelgrau_ (~doppelgra@p5DC06520.dip0.t-ipconnect.de) has joined #ceph
[3:57] * naoto (~naotok@27.131.11.254) has joined #ceph
[3:58] * dyasny (~dyasny@198.251.54.204) has joined #ceph
[4:02] * doppelgrau (~doppelgra@p5DC075CE.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[4:02] * doppelgrau_ is now known as doppelgrau
[4:08] * dyasny (~dyasny@198.251.54.204) Quit (Ping timeout: 480 seconds)
[4:13] * mika_c (~quassel@122.146.93.152) has joined #ceph
[4:14] * AG_Clinton (~Bored@76GAAAQKQ.tor-irc.dnsbl.oftc.net) Quit ()
[4:15] * mika_c (~quassel@122.146.93.152) Quit (Read error: Connection reset by peer)
[4:24] * haomaiwang (~haomaiwan@103.15.217.218) has joined #ceph
[4:33] * haomaiwang (~haomaiwan@103.15.217.218) Quit (Quit: Leaving...)
[4:35] <rkeene> win 11
[4:40] <TheSov2> rkeene, what about it?
[4:42] <rkeene> It's on FIRE !
[5:07] * Vacuum_ (~Vacuum@88.130.197.2) has joined #ceph
[5:14] * Vacuum__ (~Vacuum@i59F79EB2.versanet.de) Quit (Ping timeout: 480 seconds)
[5:21] * nhm (~nhm@c-50-171-139-246.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[5:25] * i_m (~ivan.miro@88.206.113.199) has joined #ceph
[5:34] * overclk (~vshankar@121.244.87.117) has joined #ceph
[5:41] * amote (~amote@121.244.87.116) has joined #ceph
[5:45] * ram_ (~oftc-webi@static-202-65-140-146.pol.net.in) has joined #ceph
[5:47] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[5:55] * kefu (~kefu@114.92.107.250) has joined #ceph
[6:11] * rdas (~rdas@121.244.87.116) has joined #ceph
[6:14] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[6:25] * karnan (~karnan@121.244.87.117) has joined #ceph
[6:26] * dgbaley27 (~matt@75.148.118.217) has joined #ceph
[6:30] * ram_ (~oftc-webi@static-202-65-140-146.pol.net.in) Quit (Ping timeout: 480 seconds)
[6:48] * swami2 (~swami@49.32.0.185) has joined #ceph
[6:49] * rakeshgm (~rakesh@121.244.87.117) has joined #ceph
[6:55] * karnan_ (~karnan@121.244.87.117) has joined #ceph
[6:55] * karnan_ (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[6:55] * karnan_ (~karnan@121.244.87.117) has joined #ceph
[6:56] * karnan_ (~karnan@121.244.87.117) Quit ()
[6:56] * karnan (~karnan@121.244.87.117) Quit (Ping timeout: 480 seconds)
[6:56] * karnan_ (~karnan@121.244.87.117) has joined #ceph
[6:57] * Wielebny (~Icedove@cl-927.waw-01.pl.sixxs.net) has joined #ceph
[6:59] * karnan_ (~karnan@121.244.87.117) Quit ()
[6:59] * karnan (~karnan@121.244.87.117) has joined #ceph
[6:59] * amote (~amote@121.244.87.116) Quit (Quit: Leaving)
[7:00] * amote (~amote@121.244.87.116) has joined #ceph
[7:08] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[7:26] * kefu (~kefu@114.92.107.250) Quit (Max SendQ exceeded)
[7:27] * kefu (~kefu@114.92.107.250) has joined #ceph
[7:28] * Wielebny (~Icedove@cl-927.waw-01.pl.sixxs.net) Quit (Quit: Wielebny)
[7:29] * Wielebny (~Icedove@cl-927.waw-01.pl.sixxs.net) has joined #ceph
[7:55] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[7:58] * Wielebny (~Icedove@cl-927.waw-01.pl.sixxs.net) Quit (Remote host closed the connection)
[7:58] * Wielebny (~Icedove@cl-927.waw-01.pl.sixxs.net) has joined #ceph
[8:01] * MACscr (~Adium@2601:247:4101:a0be:7cd8:bc3:a0b4:bdb) has joined #ceph
[8:05] * trociny (~mgolub@93.183.239.2) has joined #ceph
[8:05] * Wielebny (~Icedove@cl-927.waw-01.pl.sixxs.net) Quit (Remote host closed the connection)
[8:06] * Wielebny (~Icedove@cl-927.waw-01.pl.sixxs.net) has joined #ceph
[8:07] * Epi (~Behedwin@93.190.142.139) has joined #ceph
[8:09] * vbellur (~vijay@2601:647:4f00:4960:5e51:4fff:fee8:6a5c) has joined #ceph
[8:11] * MannerMan (~oscar@user170.217-10-117.netatonce.net) has joined #ceph
[8:35] * xiangxinyong__ (~xiangxiny@182.138.104.21) Quit (Read error: Connection reset by peer)
[8:35] * xiangxinyong_ (~xiangxiny@182.138.104.21) Quit (Read error: Connection reset by peer)
[8:37] * Epi (~Behedwin@4MJAAATW2.tor-irc.dnsbl.oftc.net) Quit ()
[8:37] * swami2 (~swami@49.32.0.185) Quit (Read error: Connection reset by peer)
[8:38] * swami1 (~swami@49.32.0.185) has joined #ceph
[8:43] * evilrob (~evilrob@2600:3c00::f03c:91ff:fedf:1d3d) Quit (Ping timeout: 480 seconds)
[8:44] * jgornick (~jgornick@2600:3c00::f03c:91ff:fedf:72b4) Quit (Ping timeout: 480 seconds)
[8:46] * ktdreyer (~kdreyer@polyp.adiemus.org) Quit (Ping timeout: 480 seconds)
[8:46] * skorgu (skorgu@pylon.skorgu.net) Quit (Ping timeout: 480 seconds)
[8:46] * trey (~trey@trey.user.oftc.net) Quit (Ping timeout: 480 seconds)
[8:46] * motk_ (~motk@2600:3c00::f03c:91ff:fe98:51ee) Quit (Ping timeout: 480 seconds)
[8:47] * nigwil_ (~Oz@li1101-124.members.linode.com) Quit (Ping timeout: 480 seconds)
[8:47] * qman (~rohroh@2600:3c00::f03c:91ff:fe69:92af) Quit (Ping timeout: 480 seconds)
[8:48] * b0e (~aledermue@213.95.25.82) has joined #ceph
[8:48] * fattaneh (~fattaneh@31.59.132.152) has joined #ceph
[8:50] * smf68 (~ghostnote@46.166.188.241) has joined #ceph
[8:55] * chasmo77 (~chas77@158.183-62-69.ftth.swbr.surewest.net) has joined #ceph
[9:01] * jgornick (~jgornick@2600:3c00::f03c:91ff:fedf:72b4) has joined #ceph
[9:01] * evilrob (~evilrob@2600:3c00::f03c:91ff:fedf:1d3d) has joined #ceph
[9:01] * qman (~rohroh@2600:3c00::f03c:91ff:fe69:92af) has joined #ceph
[9:02] * pabluk_ is now known as pabluk
[9:03] * trey (~trey@trey.user.oftc.net) has joined #ceph
[9:05] * ktdreyer (~kdreyer@polyp.adiemus.org) has joined #ceph
[9:05] * skorgu (skorgu@pylon.skorgu.net) has joined #ceph
[9:05] * egonzalez (~egonzalez@41.Red-88-15-119.dynamicIP.rima-tde.net) has joined #ceph
[9:05] * motk (~motk@2600:3c00::f03c:91ff:fe98:51ee) has joined #ceph
[9:05] * nigwil (~Oz@li1101-124.members.linode.com) has joined #ceph
[9:20] * smf68 (~ghostnote@46.166.188.241) Quit ()
[9:29] * utugi______ (~ghostnote@59-234-47-212.rev.cloud.scaleway.com) has joined #ceph
[9:45] * swami1 (~swami@49.32.0.185) Quit (Ping timeout: 480 seconds)
[9:45] * fsimonce (~simon@host229-72-dynamic.54-79-r.retail.telecomitalia.it) has joined #ceph
[9:46] * bvi (~bastiaan@185.56.32.1) has joined #ceph
[9:46] * vbellur (~vijay@2601:647:4f00:4960:5e51:4fff:fee8:6a5c) Quit (Ping timeout: 480 seconds)
[9:58] * kefu (~kefu@114.92.107.250) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[9:59] * utugi______ (~ghostnote@84ZAAAR6G.tor-irc.dnsbl.oftc.net) Quit ()
[10:07] * hyperbaba (~hyperbaba@private.neobee.net) has joined #ceph
[10:09] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[10:21] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[10:32] * LDA (~DM@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[10:35] * LDA (~DM@host217-114-156-249.pppoe.mark-itt.net) Quit ()
[10:36] * rakeshgm (~rakesh@121.244.87.117) Quit (Remote host closed the connection)
[10:38] * bliu (~liub@203.192.156.9) Quit (Ping timeout: 480 seconds)
[10:44] * Bwana (~Thayli@46.166.190.187) has joined #ceph
[10:45] * linjan_ (~linjan@176.195.223.167) has joined #ceph
[10:53] * bliu (~liub@203.192.156.9) has joined #ceph
[10:57] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[11:03] * Mika_c (~Mika@122.146.93.152) has joined #ceph
[11:10] * TMM (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[11:10] * Jowtf (~JoHo@mail.dkv.lu) has joined #ceph
[11:11] * rendar (~I@87.19.183.81) has joined #ceph
[11:14] * Bwana (~Thayli@46.166.190.187) Quit ()
[11:19] * click (~Nijikokun@65.19.167.131) has joined #ceph
[11:23] * DV__ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[11:30] * fattaneh (~fattaneh@31.59.132.152) has left #ceph
[11:30] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[11:33] * swami1 (~swami@49.32.0.241) has joined #ceph
[11:34] * fdmanana (~fdmanana@2001:8a0:6dfd:6d01:7911:4464:7501:b333) has joined #ceph
[11:49] * click (~Nijikokun@76GAAAQXF.tor-irc.dnsbl.oftc.net) Quit ()
[11:56] * naoto (~naotok@27.131.11.254) Quit (Remote host closed the connection)
[11:59] * Mika_c (~Mika@122.146.93.152) Quit (Read error: Connection reset by peer)
[12:04] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[12:07] * EinstCrazy (~EinstCraz@111.30.21.47) Quit (Remote host closed the connection)
[12:12] * smerz (~ircircirc@37.74.194.90) Quit (Remote host closed the connection)
[12:25] * Vacuum__ (~Vacuum@88.130.206.152) has joined #ceph
[12:31] * zhaochao (~zhaochao@111.161.77.238) Quit (Quit: ChatZilla 0.9.92 [Iceweasel 38.5.0/20151216011944])
[12:32] * Vacuum_ (~Vacuum@88.130.197.2) Quit (Ping timeout: 480 seconds)
[12:34] * EinstCrazy (~EinstCraz@218.69.72.130) has joined #ceph
[12:37] * haomaiwang (~haomaiwan@103.15.217.218) has joined #ceph
[12:47] * Discovery (~Discovery@178.239.49.68) has joined #ceph
[12:59] * karnan (~karnan@121.244.87.117) has joined #ceph
[13:01] * haomaiwang (~haomaiwan@103.15.217.218) Quit (Remote host closed the connection)
[13:01] * haomaiwang (~haomaiwan@103.15.217.218) has joined #ceph
[13:06] <TMM> I have someone from our hardware vendor coming over tomorrow morning to service a bunch of nodes in my ceph cluster. I have way more capacity than I actually use at the moment, and I wondered if there are any big negatives to just switching off the OSDs one by one to reduce the cluster size, so I can leave them off until tomorrow
[13:06] <TMM> I have 30 nodes in the cluster with 8 osds each
[13:07] <TMM> I'd be turning off 6 physical hosts
[13:07] <TMM> (not at once ;))
[13:14] * blinky_ghost (~psousa@195.245.147.94) has joined #ceph
[13:14] * kanagaraj (~kanagaraj@121.244.87.117) Quit (Quit: Leaving)
[13:16] <blinky_ghost> Hi all, can anybody point me some hw recommended specifications for osd nodes for a small cloud production system with HA? Thanks
[13:17] * naoto (~naotok@240f:78:1575:1:ecb4:2f45:3655:836c) has joined #ceph
[13:25] * swami2 (~swami@49.32.0.229) has joined #ceph
[13:25] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[13:25] * naoto (~naotok@240f:78:1575:1:ecb4:2f45:3655:836c) Quit (Ping timeout: 480 seconds)
[13:28] * swami2 (~swami@49.32.0.229) Quit ()
[13:28] * Snowcat4 (~rcfighter@185.60.144.31) has joined #ceph
[13:29] * swami1 (~swami@49.32.0.241) Quit (Read error: Connection reset by peer)
[13:32] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[13:32] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[13:36] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[13:42] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[13:46] * steveeJ (~junky@HSI-KBW-149-172-252-139.hsi13.kabel-badenwuerttemberg.de) has joined #ceph
[13:48] * hgichon (~hgichon@112.220.91.130) Quit (Ping timeout: 480 seconds)
[13:49] * fattaneh (~fattaneh@31.59.132.152) has joined #ceph
[13:51] * daviddcc (~dcasier@LAubervilliers-656-1-16-160.w217-128.abo.wanadoo.fr) has joined #ceph
[13:58] * Snowcat4 (~rcfighter@76GAAAQ1M.tor-irc.dnsbl.oftc.net) Quit ()
[14:01] * haomaiwang (~haomaiwan@103.15.217.218) Quit (Remote host closed the connection)
[14:01] * haomaiwang (~haomaiwan@103.15.217.218) has joined #ceph
[14:02] * alexj (~alexj@220.106.broadband6.iol.cz) has joined #ceph
[14:03] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[14:03] <alexj> hi! I am testing out ceph in a lab.. trying to get the basics.. so I have some beginner questions that I didn't find in docs
[14:03] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[14:04] <alexj> I configured some OSDs on a couple of hosts.. I want to configure some redundant storage.. something that looks like a raid5, for example
[14:05] <alexj> the default pool is "replicated size 3" does that do the trick already?
[14:06] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[14:08] * naoto (~naotok@KD124213121038.ppp-bb.dion.ne.jp) has joined #ceph
[14:21] * haomaiwang (~haomaiwan@103.15.217.218) Quit (Remote host closed the connection)
[14:24] <flaf> alexj: if you have "size 3" and only 2 hosts, I won't work. With 2 osd hosts, you can set size == 2. In this case, each object will be written in a osd in host 1 and in a osd in host 2.
[14:25] * LDA (~DM@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[14:26] <flaf> *it won't work...
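For reference, flaf's point is that a pool's replica count cannot usefully exceed the number of failure domains (hosts, under the stock CRUSH rule). A minimal sketch of checking and lowering the replica count with the standard CLI, assuming the default pool name rbd:

    # show the current replica count for the pool
    ceph osd pool get rbd size
    # with only two OSD hosts, drop to two replicas
    ceph osd pool set rbd size 2
    # optionally keep serving I/O with a single surviving replica (weaker safety)
    ceph osd pool set rbd min_size 1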
[14:44] <TheSov> morning!
[14:46] * Solvius (~Quatrokin@84ZAAASGY.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:49] <alexj> I have 5 hosts with a total of 7 OSDs... does creating a pool automatically know to use all of them?
[14:49] <alexj> can I see what OSDs are used by a pool?
[14:49] <TheSov> yes
[14:49] <TheSov> osds are part of the cluster
[14:49] <TheSov> pools are logical cluster sections
[14:50] <alexj> so do I automatically have replication ?
[14:50] <TheSov> you can use crush to ensure data is on a specific set of osd's but there is no way i know of to make sure a certain pool is on specific osds
[14:51] <TheSov> if your crush is setup correctly and your "size" is > 1 yes
[14:51] <alexj> cool
[14:51] <TheSov> you can always test it by outing an osd
[14:51] <alexj> if I have two OSDs on one single host, does it know NOT to put both replicas on the same host?
[14:51] <TheSov> in the default crush algorithm yes
[14:51] <TheSov> if you have modified it who knows
[14:51] <alexj> awesome
[14:52] <TheSov> do not use the default crush algorithm in production
[14:52] <TheSov> its very basic and stupid
[14:52] <TheSov> it just makes sure no copies sit on the same host
[14:52] <TheSov> ok well, you could use it just fine if you had a small cluster
[14:52] <TheSov> less than 50 osds
[14:53] <TheSov> depends on how much redundancy you need
[14:53] <alexj> got any good documentation in mind that would hint towards better algorithms (in a for dummies format)
[14:54] * mhack (~mhack@66-168-117-78.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:54] <TheSov> all i can tell you is to search google. crush is not simple but its not crazy either
[14:54] <TheSov> atm there is no crush for dummies book
[14:54] <alexj> ok, sounds reasonable
[14:54] <flaf> TheSov: can you explain me the problem with the default crush map in ceph? I'm interested.
[14:54] <TheSov> you can get an idea of how crush works by decompiling the default one
[14:54] <alexj> can I modify a pool on the fly (modifing the number of replicas, for ex)?
[14:54] <TheSov> flaf, it only takes into account osd hosts
[14:55] <TheSov> it does not include racks or power or any other failure domain
[14:55] <TheSov> alexj, yes
[14:55] <alexj> ok.. thank you for the assistance
[14:55] <TheSov> not a problem
[14:55] <alexj> ceph looks great so far
[14:55] <TheSov> its pretty amazing i cannot lie
[14:56] <TheSov> i made a nice install video for hammer
[14:56] <TheSov> step by step from scratch type deal
[14:56] <TheSov> just after os install to working ceph cluster
[14:57] <alexj> do share, please
[14:57] <flaf> TheSov: ah ok, but if in my cluster (a little cluster) the only thing I want is to have each replica on different hosts, the default crush map is perfect for me, correct?
[14:57] <TheSov> yes
[14:57] <flaf> Ok, thx TheSov
[14:58] <TheSov> https://www.youtube.com/watch?v=po7SbQHdtV8
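TheSov's suggestion to decompile the default CRUSH map is easy to follow; a sketch of the round trip (file names are arbitrary):

    # dump and decompile the live CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # the stock replicated rule usually contains
    #   step chooseleaf firstn 0 type host
    # i.e. replicas are only spread across hosts, not racks or rooms
    less crushmap.txt
    # after editing (e.g. adding rack buckets), recompile and load it back
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new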
[15:08] * haomaiwang (~haomaiwan@103.15.217.218) has joined #ceph
[15:12] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[15:13] <TMM> If I set a whole bunch of osds to 'out' can I just set them back to 'in' later?
[15:14] <TMM> or is it better to reweight them to 0 until after the maintenance?
[15:15] * haomaiwang (~haomaiwan@103.15.217.218) Quit (Quit: Leaving...)
[15:15] <TheSov> yes
[15:15] <TheSov> out means the cluster will rebalance
[15:16] <TMM> ok, so if I just set a bunch of them to 'out', turn off the nodes after the balance, then turn them back on tomorrow evening and set them to 'in' I should be fine?
[15:16] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[15:16] * Solvius (~Quatrokin@84ZAAASGY.tor-irc.dnsbl.oftc.net) Quit ()
[15:16] <TheSov> i have never seen the purpose of reweighting to 0; to be gentle you have to do it gradually, and since ceph has no option for gradual reweight you may as well out it
[15:16] <TheSov> now there can be significant impact if you out an osd being backfilled
[15:16] <TheSov> so it's safer to reweight
[15:17] <TMM> right now my cluster is just healthy
[15:17] <TheSov> right
[15:17] <TMM> 7680 pgs
[15:17] <TheSov> so lets say you out an osd
[15:17] <TheSov> you have to wait until its back to healthy before doing another
[15:17] <TMM> ah yeah, of course
[15:17] <TMM> that was what I was planning
[15:17] <TheSov> whereas by reweighting you can do that all at once
[15:17] * overclk (~vshankar@121.244.87.117) Quit (Quit: Zzzz...)
[15:17] <TMM> it should be safe to do all of them on a single host at once though right?
[15:18] <TheSov> if your crush algorithm is correct yes
[15:18] <TMM> should be, yeah
[15:18] <TheSov> no data should have a second copy on the same host
[15:18] <TMM> I'm pretty sure I didn't fuck that up
[15:18] <TMM> and if I did, well, I should probably find that out now ;)
[15:19] <TheSov> TheSov makes no guarantee his advice is any good and does not reflect ceph.com or redhat.com's opinions
[15:19] <TMM> I understand ;)
[15:20] <TMM> nice stress test for the network too
[15:20] <TMM> recover i/o 40000MB/s :P
[15:20] <TheSov> lol
[15:20] <TheSov> yep
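The maintenance workflow TMM and TheSov are discussing, sketched with standard commands; the OSD ids below are placeholders for the eight OSDs on one host, and the noout variant is only an alternative for short windows, not what TMM ends up doing:

    # drain one host: mark its OSDs out and let the cluster rebalance
    for id in 12 13 14 15 16 17 18 19; do ceph osd out $id; done
    # watch recovery and wait for HEALTH_OK before touching the next host
    ceph -w
    # after maintenance, bring the same OSDs back in
    for id in 12 13 14 15 16 17 18 19; do ceph osd in $id; done

    # alternative for very short windows: leave the OSDs "in", stop the daemons,
    # and set noout so the cluster does not start backfilling while they are down
    ceph osd set noout
    ceph osd unset noout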
[15:24] * naoto (~naotok@KD124213121038.ppp-bb.dion.ne.jp) Quit (Quit: Leaving...)
[15:24] <TheSov> you adding new disks?
[15:25] <TMM> no, replacing memory
[15:25] <TMM> we got a dud batch of samsung ram
[15:25] <TheSov> ouch
[15:25] <TMM> and now have to replace all the ram in all the nodes
[15:25] <TheSov> nice thing about linux is you can add memory holes for bad ram
[15:25] <TMM> If only it was that easy
[15:25] <TMM> the servers just sometimes fail to boot after a cold powerup
[15:25] <TheSov> oh that sounds like bad connections
[15:25] <TMM> sometimes a reset helps, some times it doesn't
[15:26] <TheSov> do you have a reflow oven?
[15:26] <TMM> I don't care, it's supermicro's and samsung's problem
[15:26] <TheSov> LOL
[15:26] <TheSov> tru
[15:26] <TheSov> i love me some supermicro though
[15:26] <TMM> oh yeah
[15:26] <TMM> they've been great about all of this
[15:26] <TheSov> they put other server dealers to shame
[15:26] <TMM> but the fact remains that the fucking dimms are in my nodes now :P
[15:27] <TheSov> whenever we buy dell now, we basically send them what we want as a newegg/supermicro quote and they have to match it or we buy the supermicro
[15:27] <TheSov> im sure dell hates that
[15:27] <TheSov> naturally we have a lot of supermicro
[15:27] <TMM> We're kind of heading in the same direction, except that we tend to prefer sm now anyway
[15:27] <TMM> mostly because dell has screwed us for the last 5 years
[15:28] <TMM> only after we got a batch of supermicros were they willing to even consider selling us the hardware we've been asking for
[15:28] <TMM> fucking disgraceful
[15:29] * stj (~stj@0001c20c.user.oftc.net) Quit (Quit: rebuut)
[15:29] <TMM> they refused to sell us 'normal' SSDs, claiming they would only support certain enterprise grade SSDs that were like 5x the price of consumer drives. SM had a model that was 2x the price of consumer drives, which was about what I expected. Consumer drives + some fat caps for journaling info
[15:29] <TMM> which is exactly what sm had
[15:29] <TMM> then dell started to complain that we couldn't compare their products because they sold 'enterprise grade' ssds
[15:30] <TMM> which was funny, because we explicitly asked them to sell us the drives that sm delivered without issue
[15:30] <TMM> NOW they suddenly *do* carry reasonable ssds
[15:30] <TMM> fuckers
[15:30] * kefu (~kefu@114.92.107.250) has joined #ceph
[15:31] <TheSov> hardware lock-in is for noobs
[15:31] <TheSov> compaq did that and paid the price
[15:31] <TMM> yeah, you'd think dell would've learned something
[15:31] <TMM> but they are only more and more going into the direction of vertically integrated bullshit
[15:32] * stj (~stj@0001c20c.user.oftc.net) has joined #ceph
[15:32] * egonzalez (~egonzalez@41.Red-88-15-119.dynamicIP.rima-tde.net) Quit (Quit: Saliendo)
[15:32] <TMM> I'll say that idrac is a little nicer than the older sm bmc, but it's not that much nicer that it's worth having to deal with dell for
[15:32] <TMM> also, ipmi works just fine everywhere ;)
[15:33] <TheSov> yep
[15:38] <TMM> when I only have some pgs backfilling is it safe to take out the next batch of osds?
[15:39] <TMM> or should I wait until my cluster is entirely healthy again?
[15:39] <TheSov> no wait!
[15:39] <TMM> only have active+remapped+backfilling now
[15:39] <TMM> ok, I'll wait :)
[15:39] <TheSov> dude
[15:39] <TheSov> wait wait wait, how important is the data on this cluster?
[15:40] <TMM> It'd be embarrassing to lose
[15:40] <TMM> but nothing bad
[15:40] <TMM> embarrassing for me personally I mean ;)
[15:40] <TheSov> hmmm judgement call on your part
[15:40] <TheSov> id still say wait
[15:40] <TMM> I'll wait
[15:40] <TheSov> i mean its going 40G per second it wont take that long!
[15:40] <TMM> It's only 11pgs now anyway
[15:40] <TMM> yeah, it slowed down quite a bit
[15:41] <IcePic> "Good judgement comes from experience, experience comes from bad judgement" ;)
[15:41] <TMM> yeah
[15:41] <TMM> You don't learn anything from perfectly running systems
[15:42] <TMM> If I increase the replication count of one of the pools I should be able to take out two at a time without any ill effects
[15:42] <TheSov> LOL
[15:42] <TheSov> no
[15:42] <TheSov> then you have to wait for that to finish
[15:42] <TMM> yeah
[15:43] <TMM> I didn't think that was magical :P
[15:43] <TheSov> how many disks per host?
[15:43] <TMM> I have 2 availability zones, most of the pools have their crush map set such that their data stays within one AZ
[15:43] * kefu is now known as kefu|afk
[15:43] <TMM> but there's one pool that's across both azs
[15:43] <TheSov> ouch
[15:44] <TheSov> who called that one
[15:44] <TheSov> thats like a country wide raid 0
[15:44] <TheSov> LOL
[15:44] <TMM> it has a replication count of 3 now, If I set that to 4 I should be able to take down one entire host per az without issue, right?
[15:44] <TheSov> no clue
[15:44] <TheSov> you could just shutdown the host
[15:44] <TheSov> and see what happens
[15:44] <TheSov> if it shows pg's stuck turn it back on
[15:44] <TMM> also, I called that. because of the way that openstack glance works you don't have much choice if you want people to be able to use openstack largely as documented on the rest of the internet
[15:45] <TheSov> you have 10 whole minutes
[15:45] <TheSov> 10 minutes is enough to change ram
[15:46] <TMM> there's no real 'country-wide raid0' going on though, the per-az volumes and vms are understood by the users to all go down together so.
[15:46] <TMM> there's no cross-az backup or anything, a vm exists only on a pool that only exists on placement groups on servers that are in that az
[15:46] <TMM> except for the glance images
[15:46] <TMM> which have to be global if you want to be able to use the horizon ui
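The per-AZ layout TMM describes is typically expressed as one CRUSH rule per availability zone, each rooted in its own bucket; a hypothetical sketch (bucket, rule and pool names are placeholders), assuming the decompiled map already defines an az1 root:

    rule az1_replicated {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take az1
        step chooseleaf firstn 0 type host
        step emit
    }
    # then point a pool at it:
    #   ceph osd pool set volumes-az1 crush_ruleset 1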
[15:49] <TheSov> i need to get into openstack
[15:49] <TheSov> i think not knowing it is holding me back
[15:49] <TheSov> that wasnt meant to rhyme
[15:50] <TMM> ah well, you have 3 types of storage in openstack
[15:50] <IcePic> would it be hard to ask people to upload glance images to both (all?) zones for which they want such machines to spin up?
[15:50] <TMM> IcePic, the problem is that horizon has no ui for it
[15:50] <IcePic> ok, wondered if it was a tech or social/knowledge issue
[15:51] <TMM> not really, I mean, I understand it's possible but even tools that use the openstack api assume that images are global in a region
[15:51] <TMM> things like packer and such
[15:52] <TMM> It's just too much of a pain to fix all those tools and their assumptions, let alone that the users of this platform never read any documentation we write anyway ;)
[15:52] <TMM> so that'll just be annoying for everyone involved not the least of which is me
[15:55] <TMM> health HEALTH_OK \o/
[15:55] <TMM> neeext :P
[15:57] * Kurt (~Adium@2001:628:1:5:2c20:2e:8888:8497) has joined #ceph
[15:57] <TheSov> awesome!
[15:57] * treenerd (~treenerd@85.193.140.98) has joined #ceph
[15:58] <TMM> that's interesting, this set of osds apparently held way less data
[15:58] <TMM> the previous took almost 30 minutes to get from out to healthy
[15:58] <TMM> this one's already almost done now
[15:58] <TMM> Oh, when I put them back in, is there any reason to stagger that at all?
[15:58] <TMM> health HEALTH_OK
[15:58] <TMM> well, that didn't hurt much
[16:00] * andreww (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:03] <TheSov> no you can in them all at the same time
[16:03] <TheSov> thats ok
[16:05] <TMM> poor cluster, down to 179TB
[16:05] <TheSov> free?
[16:05] <TMM> yeah
[16:05] <TheSov> poor you...
[16:05] <TMM> ;)
[16:06] <TMM> '200' is just such a nice round number
[16:08] * kefu|afk is now known as kefu
[16:08] * herrsergio (~herrsergi@200.77.224.239) has joined #ceph
[16:09] * herrsergio is now known as Guest2907
[16:11] * steveeJ (~junky@HSI-KBW-149-172-252-139.hsi13.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[16:11] * rotbeard (~redbeard@2a02:908:df11:4280:6267:20ff:feb7:c20) has joined #ceph
[16:13] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[16:19] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[16:28] * wushudoin (~wushudoin@38.140.108.2) has joined #ceph
[16:32] * swami1 (~swami@27.7.167.2) has joined #ceph
[16:32] <treenerd> Does anyone know what's going on with calamari? What should I use: romana, calamari...? I lost focus on that because normally I don't need a web interface for ceph.
[16:33] * linjan_ (~linjan@176.195.223.167) Quit (Ping timeout: 480 seconds)
[16:36] * danieagle (~Daniel@187.34.2.79) has joined #ceph
[16:38] * kefu is now known as kefu|afk
[16:41] * andreww (~xarses@64.124.158.100) has joined #ceph
[16:42] <TMM> thesov, once I get close to done removing an OSD the process slows down a lot, with only a handful of objects a second recovering at a time. I had a look over my cluster and none of the nodes are under any real load. Any ideas what may be causing that? Is that just normal behavior?
[16:43] <TheSov> normal
[16:43] <TheSov> theres a sysctl that helps a little
[16:44] <TheSov> but i forgot which it is
[16:44] <TheSov> let me see if i can find
[16:44] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[16:45] * EinstCrazy (~EinstCraz@218.69.72.130) Quit (Remote host closed the connection)
[16:45] * i_m (~ivan.miro@88.206.113.199) Quit (Ping timeout: 480 seconds)
[16:46] <TMM> it still has to do like 50K objects, at the 20 obj/s I'm seeing now that's going to take a while :-/
[16:46] <TheSov> ulimit and fs.file-max
[16:46] <TheSov> jack those up
[16:46] <TMM> open files is currently 131072 for the ceph processes
[16:46] <TheSov> TMM, probably because those are large objects
[16:47] <TheSov> all the smaller objects are likely already moved
[16:47] <TMM> fs.file-max = 52492373
[16:47] <TheSov> well, you're set
[16:47] <TheSov> like i said its probably the larger 4MB objects
[16:47] * haomaiwang (~haomaiwan@103.15.217.218) has joined #ceph
[16:47] <TMM> even then, that's not very fast, going at 20 obj/s
[16:48] <TheSov> 20 * 4 = 80MB per second
[16:48] <TheSov> its rust or ssd?
[16:48] <TMM> yeah, but it's full SSD and everything's connected over 2x 10Gbit
[16:48] <TMM> 80mb/s is kind of sad
[16:48] <TheSov> actually yeah that seems odd for SSD
[16:48] <TheSov> I can try to look into it for you
[16:49] <TMM> also, rados bechmarks show really nice performance out of this cluster
[16:50] <TheSov> ok
[16:50] <TMM> Anything I can show you?
[16:50] <TheSov> here we go
[16:50] <TheSov> osd max backfills = ?
[16:50] <TheSov> for you?
[16:51] <TMM> unset, so hammer default
[16:51] <TheSov> i believe thats 10
[16:51] <TheSov> you are gonna wanna up that
[16:52] <TheSov> and osd recovery max active
[16:52] <TMM> how high do I want to set that?
[16:52] <TheSov> i dont know what your cluster can do
[16:52] <TheSov> so its a play with it kinda thing
[16:52] <TheSov> but those 2 control backfill rates
[16:53] <TheSov> also there is osd_num_concurrent_backfills; it controls the number of backfills to a single host
[16:53] <TMM> well, I can easily write 2gigabyte a second to the cluster with rados bench
[16:54] <TheSov> basically set the numbers to like 20 and 15 and see what happens
[16:54] <TheSov> i imagine it should get a lot faster
[16:54] <TheSov> you seem to have a very capable cluster
[16:54] <TheSov> i can find no example where someone set max backfills higher than 20
[16:55] <TheSov> so i imagine 20 for max backfills and 15 for max active would do the trick
[16:56] * treenerd (~treenerd@85.193.140.98) Quit (Ping timeout: 480 seconds)
[16:56] <TheSov> im also going to include a boulder of salt heh
[16:56] <TMM> 15 seems to be the default for max active
[16:56] <TheSov> maybe set that to 20 aswell?
[16:56] <TheSov> 20/20?
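For reference, the two throttles being discussed correspond to these ceph.conf options; a sketch only, using the numbers floated above rather than recommended values:

    # ceph.conf, [osd] section -- persists across restarts
    [osd]
    osd max backfills = 20
    osd recovery max active = 15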
[16:58] * swami2 (~swami@106.216.191.199) has joined #ceph
[16:58] <TheSov> let me know if that works, we dont have a High perf ceph cluster, but i would like one in the future
[16:58] * swami1 (~swami@27.7.167.2) Quit (Ping timeout: 480 seconds)
[16:58] <TMM> I'm trying it now
[16:59] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[16:59] <TMM> do I need to restart the OSDs for them to pick up a change in /etc/ceph/ceph.conf?
[16:59] <TheSov> ceph does parameter injection
[16:59] * ngoswami (~ngoswami@121.244.87.116) Quit (Quit: Leaving)
[17:00] <TheSov> ceph tell osd.X injectargs '--blah = x'
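A concrete instance of that syntax, using the two backfill throttles from earlier (the OSD id and the values are placeholders; quoting osd.* just keeps the shell from globbing it):

    # change one OSD at runtime (run where an admin keyring is available)
    ceph tell osd.0 injectargs '--osd-max-backfills 20 --osd-recovery-max-active 15'
    # or all OSDs at once
    ceph tell 'osd.*' injectargs '--osd-max-backfills 20 --osd-recovery-max-active 15'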
[17:00] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Ping timeout: 480 seconds)
[17:01] * haomaiwang (~haomaiwan@103.15.217.218) Quit (Remote host closed the connection)
[17:01] <flaf> TheSov: this command above should be launched in the node where osd.X is hosted, correct?
[17:01] * angdraug (~angdraug@c-69-181-140-42.hsd1.ca.comcast.net) has joined #ceph
[17:01] <TheSov> depends on your osd config
[17:01] * haomaiwang (~haomaiwan@103.15.217.218) has joined #ceph
[17:01] <flaf> Ah?
[17:01] <TheSov> do you have the admin key on all osd servers?
[17:01] <flaf> Yes.
[17:02] <TheSov> then yes :D
[17:02] <TMM> I don't :-/
[17:02] <TheSov> then you gotta goto the monitors!
[17:03] <TMM> Is there a way to tell it to all osds?
[17:03] <TMM> ceph tell osd ...
[17:03] <TheSov> osd.*
[17:03] <TMM> neat
[17:04] <TheSov> i would test that first...
[17:04] <TheSov> im sure its fine
[17:04] <TheSov> but still
[17:04] <TMM> I'll just do one at a time
[17:05] <TheSov> naw
[17:05] <TheSov> do it
[17:05] <TMM> I mean, one parameter
[17:05] <TheSov> do - it - do - it
[17:05] <TMM> see if nothing explodes
[17:05] <TMM> hmm, max backfills set to 20 and now I see between 500 and 1000 obj/s
[17:05] <TMM> that's nice
[17:06] * shylesh__ (~shylesh@59.95.70.193) has joined #ceph
[17:06] <TheSov> holy shit i must have been drinking, DONT DO IT! no wait, do it doucemo, slowly
[17:06] <TheSov> there you go :D
[17:06] <TheSov> check your iowaits
[17:06] <TMM> na'h, it's going fine
[17:07] <TMM> .3% iowait
[17:07] <TheSov> pfft
[17:07] <TMM> seems good
[17:07] <TheSov> 40 sounds like a good number :D
[17:07] <TMM> I don't want to totally cripple my cluster on recovery
[17:07] <TMM> I think this is a decent balance
[17:07] <TheSov> true
[17:07] <TheSov> im a bit mad sometimes
[17:07] <TheSov> forgive please
[17:08] <TMM> Thanks for helping
[17:08] <TheSov> my pleasure
[17:08] <TMM> I should've read the whole bit about parameter injection
[17:08] <TMM> when I first started using ceph
[17:08] <TMM> this seems like stuff I should've known (r)
[17:09] <TheSov> truth be told, once you have a ceph cluster setup, theres really rarely any reason to touch it in my experience
[17:09] <TheSov> you have a special case with the ram
[17:09] <TheSov> if that system had failed, you would just replace the motherboard and turned it back on
[17:09] <TheSov> no big thang
[17:09] <TMM> yeah
[17:09] <TMM> but now it's all of them
[17:10] <TMM> and we will have a sm dude on site
[17:10] <TMM> I can't tell him to wait up to 30min between servers :-/
[17:10] <TheSov> true
[17:10] <TMM> I'm taking out all the 'bad' servers out of the cluster
[17:10] * TiCPU__ is now known as TiCPU|Work
[17:10] <TMM> almost removing a third of the osds now
[17:11] <TMM> I guess that's not a normal operation ;)
[17:11] <TheSov> nope
[17:11] <TheSov> brb nature calls
[17:12] <flaf> Is it possible to get the value of a parameter with the "ceph tell osd.X ..." command?
[17:12] <flaf> (just get not set)
[17:15] <devicenull> I seem to find myself with blocked operations pretty frequently
[17:15] <devicenull> is there any way to configure those to just time out?
[17:15] <devicenull> it's getting *really* annoying to have to keep restarting OSDs
[17:15] <TMM> flaf, you can do 'ceph daemon osd.0 config show'
[17:17] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[17:17] <flaf> TMM: ok thx, it's the command I use in fact. But in this case, I have to launch the command on the node where the osd is hosted. Is there an equivalent command I can use from any OSD server? That's the point of my question, since it's possible to _set_ a value that way.
[17:17] * swami2 (~swami@106.216.191.199) Quit (Read error: Connection reset by peer)
[17:17] * theghost99 (~drupal@192.42.115.101) has joined #ceph
[17:21] <flaf> With "ceph tell osd.X injectargs ..." I can set a value of an OSD and I can launch the command in any OSD server, but it seems to me curious that there is no equivalent command to just _get_ a value (except with "ceph daemon osd.X config show", but this command is no completely equivalent because I have to launch it in the node where osd.X is hosted).
[17:22] <flaf> I hope I'm clear in my question. ;)
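As far as the Hammer-era tooling goes, reading a single value does seem to go through the OSD's local admin socket, so it has to run on the hosting node; a sketch, assuming the default socket path and osd.3 as a placeholder:

    # read one option instead of dumping the whole config
    ceph daemon osd.3 config get osd_max_backfills
    # same thing via the socket path directly
    ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok config get osd_max_backfills
    # fallback if 'config get' is not available on your build
    ceph daemon osd.3 config show | grep backfills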
[17:24] * jaank (~quassel@c-73-211-74-134.hsd1.il.comcast.net) Quit (Ping timeout: 480 seconds)
[17:27] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Quit: Ex-Chat)
[17:28] * sudocat (~dibarra@2602:306:8bc7:4c50::46) Quit (Ping timeout: 480 seconds)
[17:35] * moore_ (~moore@71-211-73-118.phnx.qwest.net) has joined #ceph
[17:37] * moore_ (~moore@71-211-73-118.phnx.qwest.net) Quit (Remote host closed the connection)
[17:38] * moore_ (~moore@64.202.160.233) has joined #ceph
[17:38] * rotbeard (~redbeard@2a02:908:df11:4280:6267:20ff:feb7:c20) Quit (Quit: Leaving)
[17:39] * fattaneh (~fattaneh@31.59.132.152) Quit (Quit: Leaving.)
[17:45] * EinstCrazy (~EinstCraz@218.69.72.130) has joined #ceph
[17:45] <TheSov> back
[17:47] * theghost99 (~drupal@84ZAAASN7.tor-irc.dnsbl.oftc.net) Quit ()
[17:47] * shylesh__ (~shylesh@59.95.70.193) Quit (Remote host closed the connection)
[17:51] * treenerd (~treenerd@91.141.2.128.wireless.dyn.drei.com) has joined #ceph
[17:54] * dgbaley27 (~matt@75.148.118.217) Quit (Quit: Leaving.)
[17:54] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[17:54] * moore_ (~moore@64.202.160.233) Quit (Remote host closed the connection)
[17:57] * EinstCrazy (~EinstCraz@218.69.72.130) Quit (Ping timeout: 480 seconds)
[17:57] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[17:58] * reed (~reed@75-101-54-18.dsl.static.fusionbroadband.com) has joined #ceph
[18:00] * treenerd (~treenerd@91.141.2.128.wireless.dyn.drei.com) Quit (Ping timeout: 480 seconds)
[18:01] * haomaiwang (~haomaiwan@103.15.217.218) Quit (Remote host closed the connection)
[18:01] * haomaiwang (~haomaiwan@103.15.217.218) has joined #ceph
[18:03] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[18:06] * lcurtis_ (~lcurtis@47.19.105.250) has joined #ceph
[18:07] * cathode (~cathode@50-198-166-81-static.hfc.comcastbusiness.net) has joined #ceph
[18:08] * treenerd (~treenerd@91.141.3.188.wireless.dyn.drei.com) has joined #ceph
[18:16] * linjan_ (~linjan@176.77.61.113) has joined #ceph
[18:18] * pabluk is now known as pabluk_
[18:19] * bvi (~bastiaan@185.56.32.1) Quit (Quit: Ex-Chat)
[18:19] * SquallSeeD31 (~FNugget@tor2r.ins.tor.net.eu.org) has joined #ceph
[18:20] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[18:25] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[18:29] * treenerd (~treenerd@91.141.3.188.wireless.dyn.drei.com) Quit (Ping timeout: 480 seconds)
[18:29] * haomaiwa_ (~haomaiwan@103.15.217.218) has joined #ceph
[18:30] * haomaiwang (~haomaiwan@103.15.217.218) Quit (Read error: Connection reset by peer)
[18:35] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[18:36] * Kupo1 (~tyler.wil@23.111.254.159) has joined #ceph
[18:39] <TMM> thesov, it's going fine so far, wrote a little script that will just take out the next host when the cluster goes back to healthy
[18:41] <TheSov> ah nice
[18:42] * mykola (~Mikolaj@91.225.202.139) has joined #ceph
[18:42] <TheSov> shoot it to me will ya?
[18:42] <TheSov> that could be useful!
[18:43] * vbellur (~vijay@2601:647:4f00:4960:5e51:4fff:fee8:6a5c) has joined #ceph
[18:49] * SquallSeeD31 (~FNugget@4MJAAAUBH.tor-irc.dnsbl.oftc.net) Quit ()
[18:50] <TMM> thesov, sure let me remove the hardcoded crap from it :P
[18:50] <TheSov> if you dont mind that is
[18:52] <TMM> no, you've been very helpful
[18:52] <TMM> allow me to return the favor
[18:52] <TheSov> i do appreciate as scripting is not my forte
[18:57] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) has joined #ceph
[18:58] * neobenedict (~Nanobot@tor2r.ins.tor.net.eu.org) has joined #ceph
[18:59] <TiCPU|Work> I have a PG that is stuck active+clean+inconsistent. I posted to the mailing list with details that scrub/deep-scrub/repair does not want to start, but have had no answer. Anyone in here have any idea why the OSD won't accept any command? Ref.: http://article.gmane.org/gmane.comp.file-systems.ceph.user/26122
[19:00] <TMM> thesov, this should do it: http://pastebin.com/LEqS5X3v
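The pastebin link has long since expired; what follows is only a hypothetical re-creation of the idea TMM describes (out one host's OSDs, wait for HEALTH_OK, move on), not his actual script. Hostnames and OSD ids are placeholders:

    #!/usr/bin/env bash
    # Drain hosts one at a time, waiting for the cluster to return to HEALTH_OK
    # between hosts. Entries are "hostname:space-separated OSD ids".
    set -euo pipefail

    hosts=(
      "node01:0 1 2 3 4 5 6 7"
      "node02:8 9 10 11 12 13 14 15"
    )

    wait_healthy() {
      until ceph health | grep -q HEALTH_OK; do
        sleep 30
      done
    }

    for entry in "${hosts[@]}"; do
      host=${entry%%:*}
      ids=${entry#*:}
      echo "Marking out OSDs on ${host}: ${ids}"
      for id in ${ids}; do
        ceph osd out "${id}"
      done
      wait_healthy
      echo "${host} drained; safe to power it down"
    done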
[19:00] <TheSov> TiCPU|Work, does that osd have a seperate journal disk?
[19:01] * haomaiwa_ (~haomaiwan@103.15.217.218) Quit (Remote host closed the connection)
[19:01] <TiCPU|Work> yes, it is a different LV which is on a different PV
[19:01] * haomaiwang (~haomaiwan@103.15.217.218) has joined #ceph
[19:01] * linjan_ (~linjan@176.77.61.113) Quit (Quit: ?????????? ?? ???? ??????)
[19:01] <TheSov> ok is that readable?
[19:02] <TheSov> also you did yourself a disservice by having an osd on LVM
[19:02] <TMM> thesov, this does have one problem: It won't work for the last host in the list. I can fix that too if you want. I didn't need it so I didn't bother
[19:02] <TheSov> naw i can fix that i think heh
[19:04] <TiCPU|Work> Yes, the LV is readable and the cluster works just fine; it's just that one disk was going bad and was replaced, but the OSD does not care. In my case, LVM has saved my life too many times not to use it (snapshots, pvmove, live lvextend in case of a full disk, the list goes on)
[19:04] <TheSov> TiCPU|Work, wait you replaced the disk and kept the osd
[19:04] <TheSov> ?
[19:05] <TheSov> i dont think you are fundementally understanding how ceph works
[19:06] <TheSov> when a disk goes bad
[19:06] <TheSov> thats ok
[19:06] <TiCPU|Work> it was ddrescue'ed to a new disk and replaced
[19:06] <TheSov> you delete the osd, replace the disk and then add a new disk and a NEW osd
[19:06] <TheSov> TiCPU|Work, your actions broke that disk
[19:06] <TheSov> err osd
[19:06] <TheSov> now delete the osd, and re add it the correct way
[19:07] <TiCPU|Work> right, I just wanted to save myself some backfill since the disk had 1 bad sector
[19:07] <TiCPU|Work> this cluster is very small, 12 disks/OSDs
[19:08] <TheSov> it wont work cuz the log elsewhere tells a different story
[19:08] * treenerd (~treenerd@cpe90-146-148-47.liwest.at) has joined #ceph
[19:09] <TiCPU|Work> deep-scrubs commands are not working because of the log? the OSD was stopped, cloned and put back in
[19:09] * daviddcc (~dcasier@LAubervilliers-656-1-16-160.w217-128.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[19:10] <TheSov> I dont know enough about it to tell you exactly what happens but i do know that when an osd acts stupid. deleting it, zapping the disk and adding it back fixes it. my assumption is that its an inconsistent log.
[19:10] <TheSov> also theres a guid for the partition
[19:10] <TheSov> that would also not match
[19:11] <TiCPU|Work> well, let's forget about this OSD, deep-scrub on any OSD is not processed
[19:11] <TheSov> out that osd and try again
[19:14] <TiCPU|Work> and scheduled deep-scrubs are working. If I have to out this OSD, I'll do that after the holidays so as not to impact performance while we're on reduced staff, just in case.
[19:14] <TheSov> 12 disks is very small, how many osd hosts do you have
[19:14] <TiCPU|Work> 6 hosts
[19:15] <TiCPU|Work> soon to be expanded to 10 hosts and 28 OSDs
[19:16] <TiCPU|Work> we are preparing our second cluster first
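For completeness, the replace-a-failed-OSD sequence TheSov is advocating (remove the OSD, zap the disk, create a fresh one) looks roughly like this with Hammer-era tooling; osd.7, the device and the host name are placeholders, and the re-creation step depends on how the OSDs were deployed:

    ceph osd out 7
    # stop the ceph-osd daemon for osd.7 (exact command depends on the init system)
    ceph osd crush remove osd.7
    ceph auth del osd.7
    ceph osd rm 7
    # wipe the replacement disk and build a new OSD on it
    ceph-disk zap /dev/sdd
    ceph-deploy osd create node01:/dev/sdd   # or ceph-disk prepare/activate by hand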
[19:28] * neobenedict (~Nanobot@76GAAARDH.tor-irc.dnsbl.oftc.net) Quit ()
[19:30] * vbellur (~vijay@2601:647:4f00:4960:5e51:4fff:fee8:6a5c) Quit (Ping timeout: 480 seconds)
[19:56] * Georgyo (~georgyo@shamm.as) has joined #ceph
[19:59] * kefu|afk (~kefu@114.92.107.250) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[19:59] * kefu (~kefu@114.92.107.250) has joined #ceph
[19:59] * kefu (~kefu@114.92.107.250) Quit ()
[20:01] * haomaiwang (~haomaiwan@103.15.217.218) Quit (Remote host closed the connection)
[20:01] * haomaiwang (~haomaiwan@103.15.217.218) has joined #ceph
[20:02] * cathode|work (~cathode@50.232.215.114) has joined #ceph
[20:05] * cathode (~cathode@50-198-166-81-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[20:19] * cathode|work (~cathode@50.232.215.114) Quit (Quit: Leaving)
[20:20] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)
[20:45] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) has joined #ceph
[20:48] * Bonzaii (~Popz@6e.17.01a8.ip4.static.sl-reverse.com) has joined #ceph
[21:01] * haomaiwang (~haomaiwan@103.15.217.218) Quit (Remote host closed the connection)
[21:01] * haomaiwang (~haomaiwan@103.15.217.218) has joined #ceph
[21:09] * nhm (~nhm@c-50-171-139-246.hsd1.mn.comcast.net) has joined #ceph
[21:09] * ChanServ sets mode +o nhm
[21:18] * Bonzaii (~Popz@6e.17.01a8.ip4.static.sl-reverse.com) Quit ()
[21:22] * JamesHarrison (~click@31-168-172-143.telavivwifi.com) has joined #ceph
[21:22] * diegows (~diegows@main.woitasen.com.ar) has joined #ceph
[21:24] * diegows (~diegows@main.woitasen.com.ar) Quit ()
[21:24] * diegows (~diegows@main.woitasen.com.ar) has joined #ceph
[21:26] * herrsergio (~herrsergi@104.194.0.106) has joined #ceph
[21:27] * herrsergio is now known as Guest2927
[21:28] * Guest2907 (~herrsergi@200.77.224.239) Quit (Ping timeout: 480 seconds)
[21:36] * herrserg1o (~herrsergi@200.77.224.239) has joined #ceph
[21:37] * angdraug (~angdraug@c-69-181-140-42.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[21:37] * Guest2927 (~herrsergi@104.194.0.106) Quit (Ping timeout: 480 seconds)
[21:41] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[21:51] * JamesHarrison (~click@31-168-172-143.telavivwifi.com) Quit (Ping timeout: 480 seconds)
[22:01] * haomaiwang (~haomaiwan@103.15.217.218) Quit (Remote host closed the connection)
[22:01] * haomaiwang (~haomaiwan@103.15.217.218) has joined #ceph
[22:03] * fdmanana (~fdmanana@2001:8a0:6dfd:6d01:7911:4464:7501:b333) Quit (Ping timeout: 480 seconds)
[22:04] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[22:06] * LeaChim (~LeaChim@host86-185-146-193.range86-185.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[22:08] * LDA (~DM@host217-114-156-249.pppoe.mark-itt.net) Quit (Quit: Nettalk6 - www.ntalk.de)
[22:15] * LeaChim (~LeaChim@host86-185-146-193.range86-185.btcentralplus.com) has joined #ceph
[22:19] * alexj (~alexj@220.106.broadband6.iol.cz) Quit (Remote host closed the connection)
[22:22] * danschwabe (~danschwab@157.130.171.46) has joined #ceph
[22:24] * danieagle (~Daniel@187.34.2.79) Quit (Quit: Obrigado por Tudo! :-) inte+ :-))
[22:26] * EinstCrazy (~EinstCraz@218.69.72.130) has joined #ceph
[22:26] <danschwabe> First time here, random question: We are seeing an issue when hitting a certain number of objects - around 39 million with the filestore settings merge threshold: 40 split multiple: 8, with 648 OSDs. At 39 million objects we see heavy utilization and a lot of blocked requests (> 32 sec). Our current theory is that this is due to overhead of subdir splitting.
[22:28] <danschwabe> Would setting the expected number of objects for the pool to be the largest object count we expected to see be a valid mitigation for this issue?
[22:28] <danschwabe> And could it have any negative consequences?
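The mitigation danschwabe is asking about is the expected-objects argument to pool creation, which makes filestore pre-split its collection directories up front instead of splitting them under client load; if memory serves, the docs note that creation-time pre-splitting only takes effect when the filestore merge threshold is negative. A sketch with placeholder names and numbers:

    # ceph.conf, [osd] section -- split/merge settings; a negative merge
    # threshold disables merging and allows pre-splitting at pool creation
    [osd]
    filestore merge threshold = -40
    filestore split multiple = 8

    # create the pool with an expected object count
    # (pool name, PG counts and the estimate are placeholders)
    ceph osd pool create mypool 4096 4096 replicated replicated_ruleset 2000000000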
[22:28] * blinky_ghost (~psousa@195.245.147.94) Quit (Quit: Ex-Chat)
[22:28] * Rosenbluth (~rushworld@178.162.199.89) has joined #ceph
[22:30] * mykola (~Mikolaj@91.225.202.139) Quit (Quit: away)
[22:31] * markl (~mark@knm.org) Quit (Remote host closed the connection)
[22:32] * markl (~mark@knm.org) has joined #ceph
[22:35] * EinstCrazy (~EinstCraz@218.69.72.130) Quit (Ping timeout: 480 seconds)
[22:50] * thansen (~thansen@17.253.sfcn.org) has joined #ceph
[22:58] * Rosenbluth (~rushworld@178.162.199.89) Quit ()
[22:59] * neurodrone (~neurodron@108.60.145.130) has joined #ceph
[23:01] * haomaiwang (~haomaiwan@103.15.217.218) Quit (Remote host closed the connection)
[23:06] * haomaiwang (~haomaiwan@103.15.217.218) has joined #ceph
[23:06] * rendar (~I@87.19.183.81) Quit (Ping timeout: 480 seconds)
[23:07] * DV__ (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[23:07] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[23:08] * mhack is now known as mhack|afk
[23:09] * saltsa_ (~joonas@dsl-hkibrasgw1-58c018-65.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[23:09] * rendar (~I@87.19.183.81) has joined #ceph
[23:14] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[23:21] * fdmanana (~fdmanana@2001:8a0:6dfd:6d01:a875:9e56:806e:784f) has joined #ceph
[23:24] * herrserg1o (~herrsergi@200.77.224.239) Quit (Remote host closed the connection)
[23:34] * linjan (~linjan@176.195.223.167) has joined #ceph
[23:40] * treenerd (~treenerd@cpe90-146-148-47.liwest.at) Quit (Quit: Verlassend)
[23:40] * linjan (~linjan@176.195.223.167) Quit (Remote host closed the connection)
[23:52] * cathode (~cathode@50-198-166-81-static.hfc.comcastbusiness.net) has joined #ceph
[23:52] * johnavp1989 (~jpetrini@8.39.115.8) Quit (Ping timeout: 480 seconds)
[23:54] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.