#ceph IRC Log

Index

IRC Log for 2016-08-05

Timestamps are in GMT/BST.

[0:18] * [0x4A6F]_ (~ident@p508CD26E.dip0.t-ipconnect.de) has joined #ceph
[0:20] * [0x4A6F] (~ident@0x4a6f.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:20] * [0x4A6F]_ is now known as [0x4A6F]
[0:28] * stefan0 (~stefano@168.205.191.253) Quit (Read error: Connection reset by peer)
[0:28] * Rens2Sea (~Tarazed@61TAAA3JJ.tor-irc.dnsbl.oftc.net) Quit ()
[0:29] * theTrav (~theTrav@CPE-124-188-218-238.sfcz1.cht.bigpond.net.au) Quit (Remote host closed the connection)
[0:34] * IvanJobs (~ivanjobs@103.50.11.146) has joined #ceph
[0:38] * Tenk (~Jase@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[0:41] * bb0x (~bb0x@78.97.194.150) Quit (Quit: This computer has gone to sleep)
[0:42] * IvanJobs (~ivanjobs@103.50.11.146) Quit (Ping timeout: 480 seconds)
[0:48] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[0:50] * EthanL (~lamberet@cce02cs4036-fa12-z.ams.hpecore.net) has joined #ceph
[0:50] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[1:00] * EthanL (~lamberet@cce02cs4036-fa12-z.ams.hpecore.net) Quit (Quit: ZNC - 1.6.0 - http://znc.in)
[1:04] * cathode (~cathode@50.232.215.114) Quit (Quit: Leaving)
[1:07] * johnavp1989 (~jpetrini@8.39.115.8) Quit (Ping timeout: 480 seconds)
[1:07] * mattbenjamin (~mbenjamin@12.118.3.106) Quit (Quit: Leaving.)
[1:07] * srk_ (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[1:07] * Tenk (~Jase@5AEAAAR1H.tor-irc.dnsbl.oftc.net) Quit ()
[1:09] * EthanL (~lamberet@cce02cs4037-fa12-z.ams.hpecore.net) has joined #ceph
[1:13] <blizzow> I made a preliminary quick survey to start getting benchmarks from other ceph users. It's at: https://goo.gl/forms/PROLlZdwv0o7EjcI3
[1:13] <blizzow> Does anyone have suggestions for questions that should be on there, or a better basic benchmark command? (I just put rbd bench-write -i imagename --size 100G.)
[1:14] <blizzow> As soon as I figure out where Google put the "let anyone view the spreadsheet" link, I'll put that out there too.
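A fuller variant of that benchmark command, as a sketch only (option names as on a Jewel-era rbd client; "imagename" is the placeholder from the message above and the values are illustrative):

    rbd bench-write rbd/imagename --io-size 4096 --io-threads 16 --io-total 100G --io-pattern rand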
[1:18] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[1:18] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[1:18] * inevity (~androirc@107.170.0.159) has joined #ceph
[1:22] * squizzi (~squizzi@107.13.31.195) Quit (Quit: bye)
[1:23] * dgurtner (~dgurtner@178.197.225.145) Quit (Ping timeout: 480 seconds)
[1:23] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[1:29] * blizzow (~jburns@50.243.148.102) Quit (Ping timeout: 480 seconds)
[1:39] * Chrissi_ (~spidu_@93.115.92.169) has joined #ceph
[1:44] * xarses_ (~xarses@64.124.158.192) Quit (Ping timeout: 480 seconds)
[1:45] * EinstCrazy (~EinstCraz@58.39.77.21) has joined #ceph
[1:45] * reed (~reed@216.38.134.18) has joined #ceph
[1:48] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[1:51] * oms101 (~oms101@p20030057EA031900C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:52] * haplo37 (~haplo37@107.190.44.23) has joined #ceph
[1:53] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[1:53] * EinstCrazy (~EinstCraz@58.39.77.21) Quit (Remote host closed the connection)
[1:57] * jdillaman_ (~jdillaman@pool-108-18-97-95.washdc.fios.verizon.net) has joined #ceph
[1:58] * cephski (~oftc-webi@177.207.44.68.dynamic.adsl.gvt.net.br) has joined #ceph
[1:59] * jdillaman_ (~jdillaman@pool-108-18-97-95.washdc.fios.verizon.net) Quit ()
[1:59] <cephski> hi there, something quite frustrating: almost at the end of the ceph tutorial and trying to map a volume, but getting "rbd: map failed: (13) Permission denied". Can anyone please guide me in the right direction?
[2:00] * oms101 (~oms101@p20030057EA025A00C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[2:00] <cephski> nothing in dmesg, and I'm running this on lxd containers, if that could be an issue
[2:01] <cephski> cluster health is also ok
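When "rbd: map failed: (13) Permission denied" comes from the kernel client, the usual first checks are the cephx caps of the key being used and the kernel log on the host doing the map. A minimal sketch, assuming a client named client.rbduser and an image "foo" in the rbd pool (both placeholders):

    ceph auth get client.rbduser       # inspect the mon/osd caps attached to the key
    sudo rbd map rbd/foo --id rbduser --keyring /etc/ceph/ceph.client.rbduser.keyring
    dmesg | tail                       # libceph/rbd errors from the kernel client land here

(Inside an unprivileged container the kernel rbd driver may simply not be usable, which matches the LXD outcome reported further down.)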
[2:05] * reed (~reed@216.38.134.18) Quit (Quit: Ex-Chat)
[2:06] * haplo37 (~haplo37@107.190.44.23) Quit (Ping timeout: 480 seconds)
[2:06] * johnavp1989 (~jpetrini@8.39.115.8) Quit (Ping timeout: 480 seconds)
[2:06] * wushudoin (~wushudoin@38.99.12.237) Quit (Ping timeout: 480 seconds)
[2:07] * srk_ (~Siva@2605:6000:ed04:ce00:59d1:c1f5:964b:9f1d) has joined #ceph
[2:09] * Chrissi_ (~spidu_@93.115.92.169) Quit ()
[2:12] * IvanJobs (~ivanjobs@103.50.11.146) has joined #ceph
[2:21] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[2:21] * srk_ (~Siva@2605:6000:ed04:ce00:59d1:c1f5:964b:9f1d) Quit (Ping timeout: 480 seconds)
[2:26] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[2:41] * inevity (~androirc@107.170.0.159) Quit (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com ))
[2:41] * inevity (~androirc@107.170.0.159) has joined #ceph
[2:42] * johnavp1989 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) has joined #ceph
[2:42] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[2:46] * AndroUser2 (~androirc@107.170.0.159) has joined #ceph
[2:46] <cephski> nevermind the problem seems to be with lxd, tried to map on the host machine and it was ok
[2:46] * cephski (~oftc-webi@177.207.44.68.dynamic.adsl.gvt.net.br) Quit (Quit: Page closed)
[2:46] * davidzlap (~Adium@2605:e000:1313:8003:5c17:daa8:30cf:6a22) Quit (Quit: Leaving.)
[2:46] * davidzlap (~Adium@2605:e000:1313:8003:5c17:daa8:30cf:6a22) has joined #ceph
[2:48] * xarses_ (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) has joined #ceph
[2:49] * inevity (~androirc@107.170.0.159) Quit (Ping timeout: 480 seconds)
[2:49] * AndroUser2 (~androirc@107.170.0.159) Quit ()
[2:50] * inevity (~androirc@107.170.0.159) has joined #ceph
[2:53] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[2:55] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[3:01] * davidzlap (~Adium@2605:e000:1313:8003:5c17:daa8:30cf:6a22) Quit (Quit: Leaving.)
[3:01] * davidzlap (~Adium@2605:e000:1313:8003:5c17:daa8:30cf:6a22) has joined #ceph
[3:02] * davidzlap (~Adium@2605:e000:1313:8003:5c17:daa8:30cf:6a22) Quit ()
[3:11] * dcwangmit01 (~dcwangmit@162-245.23-239.PUBLIC.monkeybrains.net) Quit (Ping timeout: 480 seconds)
[3:17] * Arcturus (~Zyn@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[3:18] * Nicho1as (~nicho1as@00022427.user.oftc.net) has joined #ceph
[3:19] * srk_ (~Siva@2605:6000:ed04:ce00:9985:42a4:3453:2e79) has joined #ceph
[3:25] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[3:28] * yanzheng (~zhyan@125.70.20.176) has joined #ceph
[3:30] * aNupoisc (~adnavare@192.55.54.40) Quit (Remote host closed the connection)
[3:32] * scg (~zscg@181.122.37.47) Quit (Quit: Ex-Chat)
[3:32] * inevity (~androirc@107.170.0.159) Quit (Remote host closed the connection)
[3:34] * aj__ (~aj@x4db1ce8d.dyn.telefonica.de) has joined #ceph
[3:34] * derjohn_mobi (~aj@x590db2b0.dyn.telefonica.de) Quit (Read error: Connection reset by peer)
[3:39] * vbellur (~vijay@71.234.224.255) has joined #ceph
[3:39] * haomaiwang (~oftc-webi@114.249.239.114) Quit (Ping timeout: 480 seconds)
[3:46] * Arcturus (~Zyn@5AEAAAR4T.tor-irc.dnsbl.oftc.net) Quit ()
[3:48] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[3:51] * tsg (~tgohad@134.134.139.82) Quit (Ping timeout: 480 seconds)
[3:53] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[3:55] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[3:57] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[3:59] * dcwangmit01 (~dcwangmit@162-245.23-239.PUBLIC.monkeybrains.net) has joined #ceph
[4:00] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[4:03] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[4:03] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[4:04] * haomaiwang (~oftc-webi@61.149.85.206) has joined #ceph
[4:05] * srk_ (~Siva@2605:6000:ed04:ce00:9985:42a4:3453:2e79) Quit (Ping timeout: 480 seconds)
[4:07] * t4nk540 (~oftc-webi@117.247.186.15) has joined #ceph
[4:09] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[4:10] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[4:10] * krypto (~krypto@106.51.31.35) has joined #ceph
[4:11] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[4:15] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) has joined #ceph
[4:16] * kefu (~kefu@114.92.96.253) has joined #ceph
[4:18] * brians__ (~brian@80.111.114.175) Quit (Read error: Connection reset by peer)
[4:19] * brians (~brian@80.111.114.175) has joined #ceph
[4:19] * baojg (~baojg@61.135.155.34) Quit (Remote host closed the connection)
[4:25] * baojg (~baojg@61.135.155.34) has joined #ceph
[4:26] * krypto (~krypto@106.51.31.35) Quit (Read error: Connection reset by peer)
[4:26] * krypto (~krypto@106.51.31.35) has joined #ceph
[4:30] * fsimonce (~simon@host203-44-dynamic.183-80-r.retail.telecomitalia.it) Quit (Remote host closed the connection)
[4:31] * krypto (~krypto@106.51.31.35) Quit ()
[4:32] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) Quit (Read error: Connection reset by peer)
[4:32] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) has joined #ceph
[4:37] * jfaj (~jan@p20030084AF61CD005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[4:40] * yanzheng1 (~zhyan@125.70.20.176) has joined #ceph
[4:41] * yanzheng (~zhyan@125.70.20.176) Quit (Ping timeout: 480 seconds)
[4:43] * dougf (~dougf@96-38-99-179.dhcp.jcsn.tn.charter.com) Quit (Ping timeout: 480 seconds)
[4:44] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) has joined #ceph
[4:46] * jfaj (~jan@p20030084AF1806005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) has joined #ceph
[4:48] * Kingrat_ (~shiny@cpe-74-129-33-192.kya.res.rr.com) has joined #ceph
[4:49] * dougf (~dougf@96-38-99-179.dhcp.jcsn.tn.charter.com) has joined #ceph
[4:52] * Kingrat (~shiny@cpe-74-129-33-192.kya.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:56] * t4nk540 (~oftc-webi@117.247.186.15) Quit (Ping timeout: 480 seconds)
[4:57] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[4:58] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[4:59] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[5:05] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[5:09] * kefu (~kefu@114.92.96.253) Quit (Read error: Connection reset by peer)
[5:10] * kefu (~kefu@114.92.96.253) has joined #ceph
[5:11] * wjw-freebsd2 (~wjw@smtp.digiware.nl) has joined #ceph
[5:13] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[5:18] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[5:19] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[5:20] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[5:23] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[5:30] * srk_ (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[5:32] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[5:36] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[5:36] * vimal (~vikumar@114.143.165.7) has joined #ceph
[5:42] * Pulec1 (~Quackie@46.166.188.230) has joined #ceph
[5:44] * Vacuum__ (~Vacuum@88.130.210.152) has joined #ceph
[5:51] * Vacuum_ (~Vacuum@88.130.192.154) Quit (Ping timeout: 480 seconds)
[5:53] * kefu (~kefu@114.92.96.253) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[6:01] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[6:01] * walcubi_ (~walcubi@p5795B9DE.dip0.t-ipconnect.de) has joined #ceph
[6:08] * walbuci (~walcubi@p5795BD0E.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[6:09] * srk_ (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[6:11] * inevity (~androirc@107.170.0.159) has joined #ceph
[6:12] * vimal (~vikumar@114.143.165.7) Quit (Quit: Leaving)
[6:12] * Pulec1 (~Quackie@61TAAA3SX.tor-irc.dnsbl.oftc.net) Quit ()
[6:21] * Kizzi (~Lunk2@178-175-128-50.static.host) has joined #ceph
[6:24] * ira (~ira@121.244.87.117) has joined #ceph
[6:24] * ircolle (~Adium@2601:285:201:633a:d106:43bd:1d7b:f4b0) Quit (Quit: Leaving.)
[6:26] * aNupoisc (~adnavare@134.134.139.82) has joined #ceph
[6:28] * ircolle (~Adium@2601:285:201:633a:6dba:b252:32d3:abf3) has joined #ceph
[6:30] * Hemanth (~hkumar_@103.228.221.143) has joined #ceph
[6:32] * vimal (~vikumar@121.244.87.116) has joined #ceph
[6:32] * ircolle (~Adium@2601:285:201:633a:6dba:b252:32d3:abf3) Quit ()
[6:34] * aNupoisc (~adnavare@134.134.139.82) Quit (Remote host closed the connection)
[6:42] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[6:42] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[6:46] <toastydeath> anyone doing multi-datacenter DR using ceph?
[6:46] <toastydeath> i've got a few use cases and i'm seeing a bunch of different ways people are approaching it
[6:47] * inevity (~androirc@107.170.0.159) Quit (Ping timeout: 480 seconds)
[6:50] * parveenks (~oftc-webi@14.141.113.5.static-Delhi.vsnl.net.in) has joined #ceph
[6:51] <parveenks> I have a cluster and I want a radosGW user to have access to a bucket's objects only (like <BUCKET_NAME>/*), but the user should not be able to create new buckets or remove this one
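A Jewel-era radosgw has no per-bucket capability on the admin side, so this is usually approximated with S3 ACLs plus a user quota; the sketch below is assumption-heavy (s3cmd syntax, BUCKET_NAME, USER_ID and the canonical user ID are all placeholders, and the behaviour of --max-buckets=0 varies by release):

    # keep the user from creating buckets of their own (semantics of 0 differ between releases)
    radosgw-admin user modify --uid=USER_ID --max-buckets=0
    # as the bucket owner, grant the user read access to the existing objects only
    s3cmd setacl s3://BUCKET_NAME --acl-grant=read:USER_CANONICAL_ID --recursive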
[6:51] * Kizzi (~Lunk2@9YSAAA4R7.tor-irc.dnsbl.oftc.net) Quit ()
[7:01] * kefu (~kefu@114.92.96.253) has joined #ceph
[7:07] * swami1 (~swami@49.44.57.239) has joined #ceph
[7:07] * t4nk230 (~oftc-webi@117.247.186.15) has joined #ceph
[7:09] * swami2 (~swami@49.38.0.249) has joined #ceph
[7:15] * swami1 (~swami@49.44.57.239) Quit (Ping timeout: 480 seconds)
[7:18] * scuttlemonkey is now known as scuttle|afk
[7:19] * TomasCZ (~TomasCZ@yes.tenlab.net) Quit (Ping timeout: 480 seconds)
[7:23] <ronrib> hello ceph friends, could anyone using erasure coded pools please post their crushmap? I'm having trouble finding clear examples of the rule section. crushmap dump commands are: ceph osd getcrushmap -o crushmap.dat && crushtool -d crushmap.dat -o crushmap.txt
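For reference, an erasure-coded rule in a decompiled crushmap typically looks like the sketch below (the rule name and the tries/size numbers are illustrative; rules generated when creating an EC pool from a profile have this same shape):

    rule ecpool {
            ruleset 1
            type erasure
            min_size 3
            max_size 20
            step set_chooseleaf_tries 5
            step set_choose_tries 100
            step take default
            step chooseleaf indep 0 type host
            step emit
    }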
[7:24] * toastydeath (~toast@pool-71-255-253-39.washdc.fios.verizon.net) Quit (Quit: Leaving)
[7:24] * toastyde1th is now known as toastydeath
[7:25] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Read error: Connection reset by peer)
[7:27] * swami2 (~swami@49.38.0.249) Quit (Read error: Connection timed out)
[7:27] * davidzlap1 (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[7:27] * davidzlap1 (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit ()
[7:28] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Read error: Connection reset by peer)
[7:29] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[7:29] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[7:29] * swami1 (~swami@49.38.0.249) has joined #ceph
[7:36] * bvi (~Bastiaan@185.56.32.1) has joined #ceph
[7:36] * _28_ria (~kvirc@opfr028.ru) Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[7:46] * swami1 (~swami@49.38.0.249) Quit (Read error: Connection timed out)
[7:48] * swami1 (~swami@49.38.0.249) has joined #ceph
[7:51] * tom_nz (~oftc-webi@202.14.217.2) Quit (Ping timeout: 480 seconds)
[7:55] * bb0x (~bb0x@78.97.194.150) Quit (Quit: This computer has gone to sleep)
[8:05] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[8:13] * Miouge (~Miouge@109.128.94.173) has joined #ceph
[8:13] * t4nk230 (~oftc-webi@117.247.186.15) Quit (Ping timeout: 480 seconds)
[8:20] * karnan (~karnan@121.244.87.117) has joined #ceph
[8:21] * Izanagi (~Kristophe@tor-exit.squirrel.theremailer.net) has joined #ceph
[8:22] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[8:29] * post-factum (~post-fact@vulcan.natalenko.name) Quit (Killed (NickServ (Too many failed password attempts.)))
[8:29] * post-factum (~post-fact@vulcan.natalenko.name) has joined #ceph
[8:47] * rakeshgm (~rakesh@106.51.225.17) has joined #ceph
[8:47] * bb0x (~bb0x@78.97.194.150) Quit (Quit: Leaving)
[8:47] * rakeshgm (~rakesh@106.51.225.17) Quit ()
[8:51] * Izanagi (~Kristophe@5AEAAASAK.tor-irc.dnsbl.oftc.net) Quit ()
[8:56] * bitserker (~toni@88.87.194.130) has joined #ceph
[8:58] * evelu (~erwan@46.231.131.178) has joined #ceph
[8:59] * b0e (~aledermue@213.95.25.82) has joined #ceph
[9:00] * analbeard (~shw@support.memset.com) has joined #ceph
[9:02] * mehmetpg0223 (~top@84.254.67.15) has joined #ceph
[9:02] <mehmetpg0223> Hello guys
[9:05] * dgurtner (~dgurtner@178.197.235.87) has joined #ceph
[9:07] <mehmetpg0223> yesterday I fired up "ceph osd deep-scrub osd.9" on my ceph cluster. Background is a pg (0.223) which resides on this osd as primary.
[9:09] <mehmetpg0223> Always when this pg 0.223 gets a deep-scrub, nearly my whole cluster is blocked because of the blocked requests it produces.
[9:09] * bitserker (~toni@88.87.194.130) Quit (Ping timeout: 480 seconds)
[9:10] <mehmetpg0223> all other deep-scrubs on this osd and the pgs residing on it don't have this effect! Only pg 0.223. Anyone know why, or can direct me where I should have a look?
[9:11] <mehmetpg0223> there are no errors in the logfiles for the osd/osds.
[9:14] * fsimonce (~simon@host203-44-dynamic.183-80-r.retail.telecomitalia.it) has joined #ceph
[9:20] * bitserker (~toni@88.87.194.130) has joined #ceph
[9:21] <mehmetpg0223> does anyone know what these lines are telling me?
[9:21] <mehmetpg0223> slow request 30.286466 seconds old, received at 2016-08-04 17:40:52.348411: osd_sub_op(unknown.0.0:0 0.223 MIN [scrub-map] v 0'0 snapset=0=[]:[]) currently queued_for_pg
[9:24] * rdas (~rdas@121.244.87.116) has joined #ceph
[9:28] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[9:28] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[9:32] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[9:39] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[9:39] * flesh (~oftc-webi@static.ip-171-033-130-093.signet.nl) has joined #ceph
[9:44] * art_yo (~kvirc@149.126.169.197) has joined #ceph
[9:44] <art_yo> Hi guys!
[9:45] <art_yo> could you give me some advice on what I'm supposed to check?
[9:45] <art_yo> I increased pg_num of my pool
[9:45] <art_yo> ceph osd pool set rbd pg_num 256
[9:46] <art_yo> And now ceph -s shows this:
[9:46] <art_yo> [root@ceph-admin ~]# ceph -s
[9:46] <art_yo> cluster f8aa3ef3-e5c9-4bd1-9ee8-2b141ff2f485
[9:46] <art_yo> health HEALTH_WARN
[9:46] <art_yo> pool rbd pg_num 256 > pgp_num 8
[9:46] <art_yo> monmap e1: 1 mons at {ceph-admin=192.168.127.12:6789/0}
[9:46] <art_yo> election epoch 1, quorum 0 ceph-admin
[9:46] <art_yo> osdmap e15710: 12 osds: 12 up, 12 in
[9:46] <art_yo> pgmap v435868: 256 pgs, 1 pools, 3226 GB data, 818 kobjects
[9:46] <art_yo> 6514 GB used, 12558 GB / 20094 GB avail
[9:46] <boolman> art_yo: set pgp_num 256 as well
[9:46] <art_yo> 256 active+clean
[9:46] <boolman> ceph osd pool set rbd pgp_num 256
[9:46] * ffilzwin3 (~ffilz@c-76-115-190-27.hsd1.or.comcast.net) has joined #ceph
[9:47] <art_yo> boolman: what?
[9:47] <boolman> what what?
[9:47] <boolman> set pgp_num to 256
[9:47] <art_yo> but I've already done it
[9:48] <boolman> no, you set pg_num to 256
[9:48] <art_yo> ow
[9:48] <art_yo> exactly!
[9:49] <art_yo> sorry, I didn't notice. I should read more about ceph (I'd never heard of pgp_num)
[9:50] <art_yo> Thank you!
[9:50] <art_yo> have a good Friday:)
[9:50] <boolman> thx, u2
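The two settings travel together: pg_num creates the placement groups, while pgp_num is what CRUSH actually uses for placement, so after an increase both should end up equal. A quick check-and-set for the pool above:

    ceph osd pool get rbd pg_num
    ceph osd pool get rbd pgp_num
    ceph osd pool set rbd pgp_num 256    # clears the "pg_num 256 > pgp_num 8" warning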
[9:51] * rendar (~I@host118-177-dynamic.27-79-r.retail.telecomitalia.it) has joined #ceph
[9:53] * ffilzwin2 (~ffilz@c-76-115-190-27.hsd1.or.comcast.net) Quit (Ping timeout: 480 seconds)
[9:55] * branto (~branto@ip-78-102-208-181.net.upcbroadband.cz) has joined #ceph
[9:56] * rakeshgm (~rakesh@106.51.225.17) has joined #ceph
[9:58] * DanFoster (~Daniel@2a00:1ee0:3:1337:f118:2435:4917:4c8b) has joined #ceph
[9:58] * dgurtner (~dgurtner@178.197.235.87) Quit (Read error: Connection reset by peer)
[9:59] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[10:00] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[10:01] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Quit: Ex-Chat)
[10:08] * theTrav (~theTrav@CPE-124-188-218-238.sfcz1.cht.bigpond.net.au) has joined #ceph
[10:18] * aj__ (~aj@x4db1ce8d.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[10:34] * aNupoisc (~adnavare@fmdmzpr03-ext.fm.intel.com) has joined #ceph
[10:39] * VampiricPadraig (~bildramer@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[10:39] * TMM (~hp@185.5.121.201) has joined #ceph
[10:41] * t4nk111 (~oftc-webi@abts-tn-dynamic-199.215.174.122.airtelbroadband.in) has joined #ceph
[10:42] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Read error: Connection reset by peer)
[10:43] * aNupoisc (~adnavare@fmdmzpr03-ext.fm.intel.com) Quit (Ping timeout: 480 seconds)
[10:44] * t4nk111 (~oftc-webi@abts-tn-dynamic-199.215.174.122.airtelbroadband.in) Quit ()
[10:47] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[10:48] * kefu (~kefu@114.92.96.253) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[10:53] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[10:54] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:7c62:b891:9b8a:4ede) has joined #ceph
[10:55] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[10:56] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[10:58] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[10:59] * Nicho1as (~nicho1as@00022427.user.oftc.net) Quit (Quit: A man from the Far East; using WeeChat 1.5)
[11:02] * evelu (~erwan@46.231.131.178) Quit (Ping timeout: 480 seconds)
[11:07] * theTrav (~theTrav@CPE-124-188-218-238.sfcz1.cht.bigpond.net.au) Quit (Remote host closed the connection)
[11:08] * dgurtner (~dgurtner@178.197.235.87) has joined #ceph
[11:09] * VampiricPadraig (~bildramer@9YSAAA4W5.tor-irc.dnsbl.oftc.net) Quit ()
[11:10] * kefu (~kefu@114.92.96.253) has joined #ceph
[11:12] * parveenks (~oftc-webi@14.141.113.5.static-Delhi.vsnl.net.in) Quit (Ping timeout: 480 seconds)
[11:13] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[11:13] * thomnico (~thomnico@2a01:e35:8b41:120:1dcb:a4e9:24ee:41d4) has joined #ceph
[11:14] * kefu (~kefu@114.92.96.253) has joined #ceph
[11:14] <mehmetpg0223> Q: I have one PG (0.223) which stops nearly the whole cluster when a deep-scrub is in progress!
[11:15] <mehmetpg0223> It does not matter which osd is involved in this PG! It always produces blocked requests when a deep-scrub is in progress!
[11:17] <mehmetpg0223> When I do "ceph osd deep-scrub osd.XX" to start a deep-scrub on all PGs which reside on the OSD, all PGs - except this one (!) 0.223 - run without any issues.
[11:18] <mehmetpg0223> They finish within a few seconds or instantly. But this mentioned PG 0.223 takes ~13-15 minutes to finish.
[11:18] <mehmetpg0223> Ceph does not find any issues when it does the deep-scrub on this PG.
[11:19] <mehmetpg0223> What can be the cause that only this ****** PG produces so many blocked requests that all my VMs stop working?
[11:20] * flisky (~Thunderbi@106.38.61.183) has joined #ceph
[11:20] * evelu (~erwan@46.231.131.179) has joined #ceph
[11:27] <mehmetpg0223> I have changed the underlying disks/OSDs, set the tunables to default/jewel, replaced the switch, done an xfs check on the disks .... all without any success! Any other hints?
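A few read-only commands that help narrow down what is special about a single PG while it deep-scrubs (IDs as in the messages above; the last one needs admin-socket access on the node hosting the primary, osd.9 here):

    ceph pg map 0.223                        # up/acting OSD set for the PG
    ceph pg 0.223 query                      # per-PG state, scrub stamps, peering info
    ceph osd perf                            # commit/apply latency per OSD during the scrub
    ceph daemon osd.9 dump_ops_in_flight     # what the slow requests are actually waiting on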
[11:27] * basicxman (~kalleeen@46.166.137.248) has joined #ceph
[11:29] * EinstCra_ (~EinstCraz@58.247.119.250) has joined #ceph
[11:32] * aj__ (~aj@2001:6f8:1337:0:a184:787a:24ee:8088) has joined #ceph
[11:34] * dgurtner (~dgurtner@178.197.235.87) Quit (Read error: Connection reset by peer)
[11:35] * EinstCra_ (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[11:36] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Ping timeout: 480 seconds)
[11:37] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[11:38] * penguinRaider_ (~KiKo@146.185.31.226) Quit (Ping timeout: 480 seconds)
[11:46] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[11:46] * kefu (~kefu@114.92.96.253) has joined #ceph
[11:48] * smithfarm (~smithfarm@80.188.202.66) has joined #ceph
[11:48] <smithfarm> kefu: hi, I was wondering how to tweak the "osd scrub" and "osd deep scrub" config settings to ensure that deep scrubs do not generate high I/O load?
[11:49] <mehmetpg0223> I believe this is a bug. It cannot be that a deep-scrub on one pg blocks all requests on one osd.
[11:49] <kefu> smithfarm, deep scrubs generate high i/o load inherently.
[11:49] <smithfarm> I understand that "osd max scrubs" ensures that one OSD only handles one scrub operation at a time
[11:49] <smithfarm> i.e. default setting is 1
[11:50] <kefu> it reads the whole object, xattrs, and omap entries
[11:50] <smithfarm> and "osd scrub begin/end" ensures that scrubs are randomized over the entire 24 hours
[11:50] <smithfarm> mehmetpg0223: I am not sure that deep-scrub is blocking all requests on the OSD
[11:50] * wjw-freebsd2 (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[11:51] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[11:51] <smithfarm> the biggest problem seems to be that the DEADLINE scheduler gives much better performance than CFQ, but CFQ is required for the scrub-throttling feature to work
[11:51] * kefu (~kefu@114.92.96.253) has joined #ceph
[11:52] <smithfarm> kefu: the biggest problem seems to be that the DEADLINE scheduler gives much better performance than CFQ, but CFQ is required for the scrub-throttling feature to work - can you confirm this suspicion?
[11:52] <kefu> mehmetpg0223, when performing a deep scrub, the osd blocks accesses to objects in batches. so there is a chance that all the requests at that moment go to a certain range of objects? and hence are blocked?
[11:53] <kefu> > CFQ is required for the scrub-throttling feature to work
[11:53] <kefu> i am not sure about this.
[11:54] <kefu> did we switch to CFQ for scrub-throttling ? probably we did...
[11:55] * bitserker (~toni@88.87.194.130) Quit (Ping timeout: 480 seconds)
[11:56] * ira (~ira@121.244.87.117) Quit (Remote host closed the connection)
[11:56] * penguinRaider_ (~KiKo@146.185.31.226) has joined #ceph
[11:57] * zack_dolby (~textual@p845d32.tokynt01.ap.so-net.ne.jp) has joined #ceph
[11:57] <smithfarm> that's what I heard
[11:57] * basicxman (~kalleeen@46.166.137.248) Quit ()
[11:57] <smithfarm> kefu: is this statement true? deep scrubs take place once every "osd deep scrub interval" regardless of load
[11:57] * utugi______ (~aleksag@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[11:57] * zack_dolby (~textual@p845d32.tokynt01.ap.so-net.ne.jp) Quit ()
[11:58] * dgurtner (~dgurtner@178.197.235.87) has joined #ceph
[11:59] <kefu> smithfarm scrub workqueue shares the same thread pool (disk tp) with other work queues. tuning osd_disk_thread_ioprio_* will surely impact how scrub performs.
[12:00] <mehmetpg0223> @smithfarm: before a deep-scrub on the mentioned pg 0.223 starts, other pgs (34 in count) are processed. But when 0.223 starts I see (always!) first slow and then, a little later, blocked requests...
[12:00] <mehmetpg0223> 2016-08-05 11:44:46.521839 osd.9 172.16.0.11:6834/115670 3371 : cluster [INF] 0.223 deep-scrub starts
[12:00] <mehmetpg0223> 2016-08-05 11:45:24.801150 osd.17 172.16.0.12:6808/7952 1750 : cluster [WRN] slow request 30.833447 seconds old, received at 2016-08-05 11:44:53.967628: replica scrub(pg: 0.223,from:0'0,to:13122'516994,epoch:13122,start:0:c44bc6a3::::0,end:0:c44ee07a::::0,chunky:1,deep:1,seed:4294967295,version:6) currently reached_pg
[12:00] <mehmetpg0223> 2016-08-05 11:45:24.237934 osd.9 172.16.0.11:6834/115670 3372 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.052354 secs
[12:00] <kefu> smithfarm, no.
[12:00] <kefu> it takes load into consideration
[12:02] <kefu> smithfarm you might want to read http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing
[12:02] <kefu> we have "osd scrub load threshold"
[12:03] <mehmetpg0223> @kefu: I am doing my deep-scrub tests on this pg many times over the day, and the chance is not high that it is always only the same objects in the queue.
[12:05] <smithfarm> kefu: the documentation on "osd_disk_thread_ioprio_*" says Note: Only works with the Linux Kernel CFQ scheduler.
[12:05] <smithfarm> so if scheduler is DEADLINE, setting these parameters will have no effect
[12:05] <kefu> smithfarm, yes.
[12:05] <kefu> right.
[12:05] <kefu> but i just cannot connect it with scrub
[12:06] <kefu> but i agree tuning it can impact the performance of scrub and other disk ops.
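The knobs being discussed live under [osd] in ceph.conf; a hedged example of the usual combination (values are illustrative, and as noted the ioprio settings only take effect with the CFQ scheduler):

    [osd]
    osd max scrubs = 1
    osd scrub begin hour = 1
    osd scrub end hour = 6
    osd scrub load threshold = 0.5
    osd scrub sleep = 0.1
    osd disk thread ioprio class = idle
    osd disk thread ioprio priority = 7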
[12:07] * haomaiwang (~oftc-webi@61.149.85.206) Quit (Ping timeout: 480 seconds)
[12:07] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[12:08] * bitserker (~toni@88.87.194.130) has joined #ceph
[12:08] * kefu is now known as kefu|afk
[12:09] * evelu (~erwan@46.231.131.179) Quit (Ping timeout: 480 seconds)
[12:11] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[12:12] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) has joined #ceph
[12:14] * kefu|afk (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[12:14] <mehmetpg0223> i will try " ceph osd deep-scrub osd.17 " - 17 is one of the replica from pg 0.223 and see whats happen
[12:14] <mehmetpg0223> ceph pg map 0.223
[12:14] <mehmetpg0223> osdmap e13122 pg 0.223 (0.223) -> up [9,17,23] acting [9,17,23]
[12:15] * kefu (~kefu@114.92.96.253) has joined #ceph
[12:17] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[12:17] * kefu (~kefu@114.92.96.253) has joined #ceph
[12:18] * smithfarm (~smithfarm@80.188.202.66) Quit (Ping timeout: 480 seconds)
[12:19] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[12:21] * evelu (~erwan@46.231.131.178) has joined #ceph
[12:22] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[12:23] * kefu (~kefu@114.92.96.253) has joined #ceph
[12:23] * dan__ (~Daniel@office.34sp.com) has joined #ceph
[12:23] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[12:27] * utugi______ (~aleksag@9YSAAA4YK.tor-irc.dnsbl.oftc.net) Quit ()
[12:28] * saintpablo (~saintpabl@gw01.mhitp.dk) has joined #ceph
[12:30] * DanFoster (~Daniel@2a00:1ee0:3:1337:f118:2435:4917:4c8b) Quit (Ping timeout: 480 seconds)
[12:31] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[12:32] * kefu (~kefu@114.92.96.253) has joined #ceph
[12:34] * arcimboldo (~antonio@dhcp-y11-zi-s3it-130-60-34-055.uzh.ch) has joined #ceph
[12:42] * flisky (~Thunderbi@106.38.61.183) Quit (Quit: flisky)
[12:45] * Hemanth (~hkumar_@103.228.221.143) Quit (Ping timeout: 480 seconds)
[12:48] <WildyLion> Hi there. Any advice about setting up RGW with many objects?
[12:48] <WildyLion> For some reason we're seeing weird index pool behaviour once we get above ~2M objects per bucket
[12:49] <WildyLion> currently we have sharded our index with 64 shards and we're seeing object write latency spikes all the time, I think it can be related to poor index optimization?
[12:50] <WildyLion> b/c some OSDs we see as almost unutilized and at the same time a bunch of other osds spikes up to 100%, I guess that's b/c we have only 128 PGs per index pool?
[12:50] <WildyLion> haven't tried moving index pool to SSDs though, I know it's recommended
[12:51] <WildyLion> and why we're seeing IOs on our index pool without any kb/s? like this: client io 101 kB/s rd, 0 B/s wr, 101 op/s rd, 101 op/s wr
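Two things commonly tried for this, sketched with assumed names (the rgw section name, pool name and ruleset number all depend on the deployment): raise the default shard count for newly created buckets, and give the index pool more PGs on SSD-backed OSDs via its own crush ruleset:

    [client.radosgw.gateway]
    rgw override bucket index max shards = 64    # only affects buckets created after the change

    ceph osd pool set .rgw.buckets.index pg_num 256
    ceph osd pool set .rgw.buckets.index pgp_num 256
    ceph osd pool set .rgw.buckets.index crush_ruleset 2    # 2 = an SSD-only ruleset, assumed to exist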
[12:51] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) Quit (Quit: Leaving)
[12:52] * lri_ (kebac@limbo.freebsd.fi) has joined #ceph
[12:57] <mehmetpg0223> so now "ceph osd deep-scrub osd.17" - this is one of the disks which holds a replica of pg 0.223. And all finished fine without any slow or blocked requests.
[12:58] * dgurtner (~dgurtner@178.197.235.87) Quit (Read error: Connection reset by peer)
[12:58] * lri (kebac@limbo.freebsd.fi) Quit (Ping timeout: 480 seconds)
[13:00] * gregmark (~Adium@68.87.42.115) has joined #ceph
[13:00] * neurodrone_ (~neurodron@162.243.191.67) has joined #ceph
[13:04] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[13:04] * kefu (~kefu@114.92.96.253) has joined #ceph
[13:06] <mehmetpg0223> @kefu "osd blocks accesses to objects in batch." is it possible to set a size/count for the objects in batch?
[13:08] * inevity (~androirc@107.170.0.159) has joined #ceph
[13:08] <mehmetpg0223> is it possible to see when the journal is flushing to disks?
[13:09] <kefu> mehmetpg0223 "osd recovery max chunk" and "osd recovery min chunk"
[13:09] <kefu> see http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing
[13:10] * bitserker1 (~toni@88.87.194.130) has joined #ceph
[13:11] * johnavp1989 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) Quit (Quit: Leaving.)
[13:11] <mehmetpg0223> Ill have a look on this.
[13:12] * bitserker (~toni@88.87.194.130) Quit (Ping timeout: 480 seconds)
[13:12] <mehmetpg0223> but I cannot imagine that this should be optimised for a newly installed ceph cluster where only a few vms reside (rbd).
[13:12] <mehmetpg0223> should/have to
[13:12] * kefu is now known as kefu|afk
[13:13] * bniver (~bniver@pool-173-48-58-27.bstnma.fios.verizon.net) Quit (Remote host closed the connection)
[13:20] <mehmetpg0223> hmm... this seems to be related only to "recovery", kefu. It shouldn't have an effect on scrubbing.
[13:21] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[13:24] * inevity (~androirc@107.170.0.159) Quit (Remote host closed the connection)
[13:36] * penguinRaider_ (~KiKo@146.185.31.226) Quit (Ping timeout: 480 seconds)
[13:40] * racpatel__ (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[13:40] * Racpatel (~Racpatel@2601:87:0:24af::53d5) Quit (Read error: Connection reset by peer)
[13:44] * rraja (~rraja@121.244.87.117) has joined #ceph
[13:46] * penguinRaider_ (~KiKo@14.139.82.6) has joined #ceph
[13:48] * georgem (~Adium@24.114.73.143) has joined #ceph
[13:49] * dgurtner (~dgurtner@178.197.235.87) has joined #ceph
[13:57] * arcimboldo (~antonio@dhcp-y11-zi-s3it-130-60-34-055.uzh.ch) Quit (Quit: Ex-Chat)
[13:58] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[13:59] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) has joined #ceph
[14:05] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) has joined #ceph
[14:14] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Ping timeout: 480 seconds)
[14:17] * vbellur (~vijay@71.234.224.255) Quit (Ping timeout: 480 seconds)
[14:22] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) Quit (Ping timeout: 480 seconds)
[14:24] * W|ldCraze (~Scrin@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[14:25] * rraja (~rraja@121.244.87.117) Quit (Quit: Leaving)
[14:25] * rraja (~rraja@121.244.87.117) has joined #ceph
[14:26] * scuttle|afk is now known as scuttlemonkey
[14:26] * zack_dolby (~textual@p845d32.tokynt01.ap.so-net.ne.jp) has joined #ceph
[14:31] * RayTracer (~RayTracer@153.19.7.39) has joined #ceph
[14:31] * srk_ (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[14:32] * vbellur (~vijay@nat-pool-bos-u.redhat.com) has joined #ceph
[14:33] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) has joined #ceph
[14:35] * georgem (~Adium@24.114.73.143) Quit (Quit: Leaving.)
[14:36] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[14:47] * kefu|afk (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[14:47] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[14:48] * kefu (~kefu@114.92.96.253) has joined #ceph
[14:49] * racpatel__ (~Racpatel@2601:87:0:24af::53d5) Quit (Quit: Leaving)
[14:50] * dougf (~dougf@96-38-99-179.dhcp.jcsn.tn.charter.com) Quit (Quit: bye)
[14:50] * dougf (~dougf@96-38-99-179.dhcp.jcsn.tn.charter.com) has joined #ceph
[14:52] * RayTracer (~RayTracer@153.19.7.39) Quit (Remote host closed the connection)
[14:52] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[14:54] * W|ldCraze (~Scrin@5AEAAASGY.tor-irc.dnsbl.oftc.net) Quit ()
[14:55] * georgem (~Adium@206.108.127.16) has joined #ceph
[15:05] * Racpatel (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[15:05] * Racpatel (~Racpatel@2601:87:0:24af::53d5) Quit ()
[15:06] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) has joined #ceph
[15:06] * Racpatel (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[15:07] * gregmark (~Adium@68.87.42.115) has joined #ceph
[15:16] * praveen (~praveen@171.61.115.158) Quit (Remote host closed the connection)
[15:19] * theTrav (~theTrav@CPE-124-188-218-238.sfcz1.cht.bigpond.net.au) has joined #ceph
[15:20] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[15:24] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[15:24] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[15:25] * kefu (~kefu@114.92.96.253) has joined #ceph
[15:26] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[15:27] * theTrav (~theTrav@CPE-124-188-218-238.sfcz1.cht.bigpond.net.au) Quit (Remote host closed the connection)
[15:28] * mattbenjamin (~mbenjamin@12.118.3.106) has joined #ceph
[15:32] * EinstCrazy (~EinstCraz@58.39.77.21) has joined #ceph
[15:37] * theTrav (~theTrav@CPE-124-188-218-238.sfcz1.cht.bigpond.net.au) has joined #ceph
[15:37] * dgurtner (~dgurtner@178.197.235.87) Quit (Read error: Connection reset by peer)
[15:37] * vbellur (~vijay@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[15:42] * wewe0901 (uid146646@id-146646.tooting.irccloud.com) Quit (Quit: Connection closed for inactivity)
[15:46] * inevity (~androirc@107.170.0.159) has joined #ceph
[15:50] * srk_ (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[15:50] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[15:58] * theTrav (~theTrav@CPE-124-188-218-238.sfcz1.cht.bigpond.net.au) Quit (Remote host closed the connection)
[15:58] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[16:04] * saintpablo (~saintpabl@gw01.mhitp.dk) Quit (Ping timeout: 480 seconds)
[16:05] * zhen (~Thunderbi@43.255.178.224) has joined #ceph
[16:06] * srk_ (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[16:08] * art_yo (~kvirc@149.126.169.197) Quit (Ping timeout: 480 seconds)
[16:11] * dgurtner (~dgurtner@178.197.235.87) has joined #ceph
[16:11] * salwasser (~Adium@72.246.3.14) Quit (Quit: Leaving.)
[16:13] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:13] * dgurtner (~dgurtner@178.197.235.87) Quit (Read error: Connection reset by peer)
[16:14] * cronburg (~cronburg@wr-130-64-194-145.medford.tufts.edu) has joined #ceph
[16:14] * swami1 (~swami@49.38.0.249) Quit (Quit: Leaving.)
[16:14] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) Quit (Quit: billwebb)
[16:15] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) has joined #ceph
[16:17] * zack_dolby (~textual@p845d32.tokynt01.ap.so-net.ne.jp) Quit (Quit: Textual IRC Client: www.textualapp.com)
[16:18] * sebastian-w (~quassel@212.218.8.138) Quit (Remote host closed the connection)
[16:18] * sebastian-w (~quassel@212.218.8.138) has joined #ceph
[16:19] * bvi (~Bastiaan@185.56.32.1) Quit (Ping timeout: 480 seconds)
[16:21] * joshd1 (~jdurgin@2602:30a:c089:2b0:8528:33f3:a544:94a5) has joined #ceph
[16:21] * vimal (~vikumar@121.244.87.116) Quit (Quit: Leaving)
[16:22] * kefu (~kefu@114.92.96.253) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[16:29] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[16:29] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[16:29] * shaunm (~shaunm@cpe-192-180-17-174.kya.res.rr.com) has joined #ceph
[16:31] * fabioFVZ (~fabiofvz@213.187.10.8) has joined #ceph
[16:31] * fabioFVZ (~fabiofvz@213.187.10.8) Quit ()
[16:33] * bene2 (~bene@2601:193:4101:f410:ea2a:eaff:fe08:3c7a) has joined #ceph
[16:39] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[16:43] * Nicho1as (~nicho1as@00022427.user.oftc.net) has joined #ceph
[16:46] * kefu (~kefu@114.92.96.253) has joined #ceph
[16:47] * david_ (~david@207.107.71.71) has joined #ceph
[16:47] * doppelgrau (~doppelgra@132.252.235.172) Quit (Quit: Leaving.)
[16:47] * zhen (~Thunderbi@43.255.178.224) Quit (Ping timeout: 480 seconds)
[16:50] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[16:50] * kefu (~kefu@114.92.96.253) has joined #ceph
[16:52] * zhen (~Thunderbi@43.255.178.224) has joined #ceph
[16:54] * Silentkillzr (~Uniju@185.65.134.77) has joined #ceph
[16:54] <ceph-ircslackbot> <vdb> Does Ceph need journaling for erasure-coding pools?
[16:55] <ceph-ircslackbot> <vdb> Does it use for stashing EC log there?
[16:57] * doppelgrau (~doppelgra@132.252.235.172) has joined #ceph
[16:57] <SamYaple> @vdb ceph needs journaling for everything, it's about how the files are written to the disk and in what order. it won't need journaling when bluestore is used though
[16:58] <ceph-ircslackbot> <vdb> Makes sense. The placement and reconstruction mechanism can be disjoint.
[16:58] <ceph-ircslackbot> <vdb> Thanks @SamYaple.
[16:59] <ceph-ircslackbot> <vdb> Are there any recommendations on the hardware for gateway instances?
[16:59] <ceph-ircslackbot> <vdb> I can barely find any discussion regarding this on the mailing list.
[16:59] <ceph-ircslackbot> <vdb> I would assume they will need a bunch of RAM and that should be it probably?
[17:01] * haomaiwang (~oftc-webi@114.249.239.114) has joined #ceph
[17:01] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[17:01] * hoonetorg (~hoonetorg@77.119.226.254.static.drei.at) Quit (Ping timeout: 480 seconds)
[17:01] * cronburg (~cronburg@wr-130-64-194-145.medford.tufts.edu) Quit (Ping timeout: 480 seconds)
[17:02] * dgurtner (~dgurtner@178.197.235.87) has joined #ceph
[17:05] * cronburg (~cronburg@wr-130-64-194-145.medford.tufts.edu) has joined #ceph
[17:06] * mykola (~Mikolaj@193.93.217.35) has joined #ceph
[17:10] * hoonetorg (~hoonetorg@77.119.226.254.static.drei.at) has joined #ceph
[17:10] * praveen (~praveen@171.61.115.158) has joined #ceph
[17:10] * dgurtner (~dgurtner@178.197.235.87) Quit (Ping timeout: 480 seconds)
[17:11] * wushudoin (~wushudoin@2601:646:8281:cfd:2ab2:bdff:fe0b:a6ee) has joined #ceph
[17:12] * xarses_ (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[17:12] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) Quit (Quit: billwebb)
[17:13] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) has joined #ceph
[17:16] * yanzheng1 (~zhyan@125.70.20.176) Quit (Quit: This computer has gone to sleep)
[17:17] * yanzheng1 (~zhyan@125.70.20.176) has joined #ceph
[17:18] <billwebb> running Hammer, is there concern about inconsistent pg's slowing down the cluster?
[17:18] <billwebb> this is with an EC pool. we're seeing a couple of slow requests per minute
[17:23] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[17:23] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[17:24] * kefu (~kefu@114.92.96.253) has joined #ceph
[17:24] * Silentkillzr (~Uniju@26XAAAUS7.tor-irc.dnsbl.oftc.net) Quit ()
[17:25] * yanzheng1 (~zhyan@125.70.20.176) Quit (Quit: This computer has gone to sleep)
[17:26] * Racpatel (~Racpatel@2601:87:0:24af::53d5) Quit (Ping timeout: 480 seconds)
[17:26] * tsg (~tgohad@jfdmzpr03-ext.jf.intel.com) has joined #ceph
[17:28] * tsg_ (~tgohad@192.55.54.43) has joined #ceph
[17:30] * dnunez (~dnunez@130.64.25.58) has joined #ceph
[17:34] * tsg (~tgohad@jfdmzpr03-ext.jf.intel.com) Quit (Remote host closed the connection)
[17:34] * programo (~programo@2601:240:c600:b150:65d3:748:b645:aad5) has joined #ceph
[17:34] * garphy is now known as garphy`aw
[17:36] * Racpatel (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[17:38] * cronburg (~cronburg@wr-130-64-194-145.medford.tufts.edu) Quit (Quit: Leaving)
[17:38] * Racpatel (~Racpatel@2601:87:0:24af::53d5) Quit (Read error: Connection reset by peer)
[17:39] * Racpatel (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[17:40] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[17:40] * kefu (~kefu@ec2-54-64-13-168.ap-northeast-1.compute.amazonaws.com) has joined #ceph
[17:41] <programo> Hi I want to learn about distributed systems, Is Ceph is a good place to start contributing ?
[17:44] * zhen (~Thunderbi@43.255.178.224) Quit (Ping timeout: 480 seconds)
[17:44] * dgurtner (~dgurtner@178.197.232.170) has joined #ceph
[17:46] * aj__ (~aj@2001:6f8:1337:0:a184:787a:24ee:8088) Quit (Ping timeout: 480 seconds)
[17:47] * cathode (~cathode@50.232.215.114) has joined #ceph
[17:49] * xarses_ (~xarses@64.124.158.192) has joined #ceph
[17:49] * kefu (~kefu@ec2-54-64-13-168.ap-northeast-1.compute.amazonaws.com) Quit (Remote host closed the connection)
[17:50] * kefu (~kefu@114.92.96.253) has joined #ceph
[17:51] * bitserker1 (~toni@88.87.194.130) Quit (Ping timeout: 480 seconds)
[17:52] * EinstCrazy (~EinstCraz@58.39.77.21) Quit (Remote host closed the connection)
[17:53] * karnan (~karnan@106.51.143.72) has joined #ceph
[17:53] * dnunez (~dnunez@130.64.25.58) Quit (Ping timeout: 480 seconds)
[17:58] * mehmetpg0223 (~top@84.254.67.15) Quit (Quit: bye)
[17:59] * blizzow (~jburns@50.243.148.102) has joined #ceph
[18:05] * evelu (~erwan@46.231.131.178) Quit (Remote host closed the connection)
[18:06] * aNupoisc (~adnavare@134.134.139.74) has joined #ceph
[18:08] * dgurtner (~dgurtner@178.197.232.170) Quit (Read error: Connection reset by peer)
[18:09] * programo (~programo@2601:240:c600:b150:65d3:748:b645:aad5) Quit (Ping timeout: 480 seconds)
[18:14] * eXeler0n (~basicxman@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[18:14] * valeech (~valeech@pool-108-44-162-111.clppva.fios.verizon.net) has joined #ceph
[18:16] <IcePic> its a distributed file system, might not be exactly what you aimed for.
[18:17] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[18:18] * kefu (~kefu@114.92.96.253) has joined #ceph
[18:21] * racpatel__ (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[18:21] * Racpatel (~Racpatel@2601:87:0:24af::53d5) Quit (Read error: Connection reset by peer)
[18:30] * kefu is now known as kefu|afk
[18:41] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[18:44] * eXeler0n (~basicxman@61TAAA39P.tor-irc.dnsbl.oftc.net) Quit ()
[18:44] * Miouge (~Miouge@109.128.94.173) Quit (Quit: Miouge)
[18:46] * branto (~branto@ip-78-102-208-181.net.upcbroadband.cz) Quit (Quit: Leaving.)
[18:52] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[18:53] * Sliker (~nastidon@108.61.99.238) has joined #ceph
[18:53] * TMM (~hp@185.5.121.201) Quit (Quit: Ex-Chat)
[18:53] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) has joined #ceph
[18:53] * kefu|afk is now known as kefu
[18:58] * flesh (~oftc-webi@static.ip-171-033-130-093.signet.nl) Quit (Ping timeout: 480 seconds)
[19:00] * programo (~programo@2601:240:c600:b150:65d3:748:b645:aad5) has joined #ceph
[19:01] * art_yo (~kvirc@149.126.169.197) has joined #ceph
[19:02] * rraja (~rraja@121.244.87.117) Quit (Quit: Leaving)
[19:05] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[19:09] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[19:10] * EinstCrazy (~EinstCraz@58.39.77.21) has joined #ceph
[19:11] * [0x7c1] (~1985@terminator.vision) has joined #ceph
[19:11] * dnovosel (87f53055@107.161.19.109) has joined #ceph
[19:12] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) has joined #ceph
[19:14] * dan__ (~Daniel@office.34sp.com) Quit (Quit: Leaving)
[19:14] <dnovosel> I have a ceph environment we have slowly been growing, and we now have 4 nodes with 56 OSDs total. Around 25TB raw storage, with 5 pools configured. One thing we have noticed is that we often have to manually reweight OSDs, as drive utilization can vary from 55% up to 75-80%, which seems quite high. I was wondering if this can be considered
[19:14] <dnovosel> normal, and from a production perspective what steps are appropriate to try and keep things better balanced. Will more OSDs / nodes help? Should we schedule a daily reweight-by-utilization operation? What are others doing in this regard?
[19:15] <SamYaple> dnovosel: are all of the disks the same size?
[19:15] <SamYaple> if not, are they weighted accordingly?
[19:16] <dnovosel> Two sizes [300GB & 1.2TB] but they are appropriate weighted as 0.3 and 1.2 [ish].
[19:17] <SamYaple> dnovosel: how many pgs exist for each pool?
[19:17] <SamYaple> this sounds like a pg problem if you have the size/weight correct (which you do)
[19:18] <SamYaple> so list your five pools, what the pg_num and pgp_num are for each (should be the same) and the amount of data in each pool please
[19:19] <dnovosel> All 5 pools are configured for 768 PGs
[19:21] * dnunez (~dnunez@130.64.25.58) has joined #ceph
[19:21] <SamYaple> dnovosel: and the amount of data used in each pool?
[19:21] <dnovosel> rbd 394k 0
[19:21] <dnovosel> volumes 85183M 0.33
[19:21] <dnovosel> images 4373G 17.21
[19:21] <dnovosel> backups 0 0
[19:21] <dnovosel> vms 1485G 5.85
[19:21] <dnovosel> shared-storage 1796G 7.07
[19:22] <SamYaple> you have six pools, is backups also configued with 768 pgs?
[19:23] * Sliker (~nastidon@26XAAAUWI.tor-irc.dnsbl.oftc.net) Quit ()
[19:23] * CoZmicShReddeR (~sardonyx@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[19:23] <SamYaple> i can go ahead and tell you part of the issue. you have 56 osds and only ~4000 placement groups. thats less than 100 pgs per osd
[19:23] <dnovosel> Yes, backups is also 768.. and sorry, I forgot about the default rbd as we aren't using it.
[19:23] <SamYaple> thats why youre so lopsided
[19:23] <dnovosel> Although we haven't started to use backups either.
[19:23] <dnovosel> What number should we be aiming for PGs?
[19:24] <SamYaple> you should shoot for ~200 pgs per osd, and these should be allocated based on pool usage as well
[19:24] <SamYaple> dnovosel: do you expect the same data distribution moving forward?
[19:25] <dnovosel> Images / shared-storage will taper off as we go forward, and vms will be the major pool that will be increasing over time.
[19:25] <SamYaple> dnovosel: do you expect to expand the cluster in teh immediate future?
[19:26] <dnovosel> We are going to be adding 2 nodes / 24 more OSDs in about a month.
[19:26] <SamYaple> ok let me do some math, moment
[19:26] <dnovosel> SamYaple: Thanks!
[19:28] <SamYaple> I would suggest using 2^14 placement groups in total, that is 16384. that gives ~200pgs per osd, so still room to grow. you never want to drop below 100pgs / osd (you are currently at ~65pgs/osd)
[19:28] <SamYaple> now lets talk about how to distribute those
[19:28] <SamYaple> if you dont need the rbd pool, remove it
[19:28] <dnovosel> Okay, I'll pull it out. We don't need it.
[19:28] <SamYaple> if you dont want to remove it, please confirm its pgs count (default is 128)
[19:28] <SamYaple> ok
[19:29] <SamYaple> with openstack, how heavily are you planning on using cinder?
[19:29] <SamYaple> im trying to get a feel for total utilization of the cluster
[19:29] * srk_ (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[19:30] <dnovosel> Not super heavily. Mostly COW images booting into ephemeral volumes that are destroyed on terminate.
[19:30] <dnovosel> Images will slowly scale up, but I expect as we increase production load vms will be the major growth segment.
[19:30] <SamYaple> are your expected percentages close to this? -- 20% cinder, 10% glance, 40% nova, 10% cinder-backup, 20% shared-storage?
[19:31] <SamYaple> basically youre just trying to get close, so figure out those percentages up to 100%
[19:32] <dnovosel> 15 cinder, 15 glance, 50 nova, 10 cinder backup, 10 shared-storage would be my estimate.
[19:32] <SamYaple> cool. so the goal now is to just take that number, 2^14 == 16384 and figure out those pgs based on that
[19:32] <SamYaple> pgs can always be adjusted up, never down
[19:32] <dnovosel> Okay, awesome. I can definitely do that.
[19:33] <SamYaple> the ideal goal is to have a power of 2 for the total pgs, it helps some internal maths
[19:33] <dnovosel> So basically over time, if we need more [to achieve ~ 200 / OSD] we can just increase those values?
[19:33] <SamYaple> technically the perfect ideal is ~100, 200pgs is for growth without needing to rebalance
[19:33] <SamYaple> adjusting the pgs is one of, if not the most, intensive things you can do
[19:33] <SamYaple> you dont want to do it often
[19:33] <dnovosel> Ah, okay. So 200 will cover off my future expanded growth next month.
[19:34] <dnovosel> Or at least a good portion of it.
[19:34] <SamYaple> actually the 200 is post growth (80 osds)
[19:34] <SamYaple> so you have future _future_ growth
[19:34] <dnovosel> Ah, okay. So right now it would be even higher?
[19:34] * karnan (~karnan@106.51.143.72) Quit (Ping timeout: 480 seconds)
[19:34] <SamYaple> it will be ~300, which is really the highest it ever should be
[19:35] <SamYaple> if you are worried, you can do 2^13 == 8192 pgs in total
[19:35] <SamYaple> that gets you to ~100pgs after growth
[19:35] <SamYaple> i would recommend 2^14 though
[19:35] <dnovosel> Well I don't mind doing 2^14 for now.
[19:36] <SamYaple> the pgs are a bit magic, and I wont claim to fully understand the inner workings of ceph, but this is absolutely the cause of the issue you are seeing, so once you resolve it youll be fine
[19:36] * thomnico (~thomnico@2a01:e35:8b41:120:1dcb:a4e9:24ee:41d4) Quit (Quit: Ex-Chat)
[19:36] <dnovosel> SamYaple: Well thanks for the all the help, it's definitely appreciated.
[19:36] <SamYaple> again, to warn you, you are going to be moving a TON of data, it will take a while to recover
[19:36] <SamYaple> keep that in mind
[19:36] <dnovosel> I have 20G LAG on a separate data network.
[19:37] <dnovosel> Just for ceph backend stuff.
[19:37] <SamYaple> fair enough, just keep it in mind
[19:37] <dnovosel> So I'm not that worried about the load.
[19:37] * bitserker (~toni@13.38.15.37.dynamic.jazztel.es) has joined #ceph
[19:38] * srk_ (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[19:38] <dnovosel> I'll try making those changes, and see where things go. Thanks again for providing some assistance, and I'll definitely do some more reading up on the science [art] of pgs.
[19:39] <SamYaple> dnovosel: best things to keep in mind: TOTAL pgs should be a power of 2. pgs should be allocated for usage (best guess). shoot for between 100-200pgs per osd (300 if you expect to grow immediately).
[19:39] <SamYaple> most important dnovosel, pgs grow, they do not shrink. making a mistake means recreating the pool
[19:40] <dnovosel> Yeah, I'll definitely keep that in mind.
[19:40] <dnovosel> It's good advice, thanks!
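A worked version of that split, taking the 2^14 = 16384 total and the 15/15/50/10/10 estimate and rounding each pool to a power of two (the numbers are illustrative; pgp_num has to be raised to match each pg_num, and very large jumps may need to be done in several increments):

    # 8192 + 4*2048 = 16384 total (~50% vms, ~12.5% each for the rest)
    ceph osd pool set vms pg_num 8192
    ceph osd pool set volumes pg_num 2048
    ceph osd pool set images pg_num 2048
    ceph osd pool set backups pg_num 2048
    ceph osd pool set shared-storage pg_num 2048
    # then repeat with: ceph osd pool set <pool> pgp_num <same value>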
[19:41] <dnovosel> Oh, does replica count make a difference to the calculations?
[19:43] * dgurtner (~dgurtner@178.197.225.50) has joined #ceph
[19:46] <SamYaple> dnovosel: good question... i dont actually know. i would assume not since thats just more objects
[19:46] <SamYaple> dnovosel: im pretty sure it doesnt, but its worth researching
[19:47] * swami1 (~swami@27.7.170.126) has joined #ceph
[19:47] <dnovosel> Right, I guess it would only matter if we had pools set to different values [which we don't right now, but may end up doing next year].
[19:48] <SamYaple> dnovosel: also remember the ceph docs have wrong permissions listed for the cephx keys if you want to do a nova->glance cow snapshot
[19:48] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[19:48] <SamYaple> youll have to give the nova key rwx to glance pool for it to work
[19:50] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[19:50] <dnovosel> Yeah, we ran into that a while ago.
[19:50] <dnovosel> We've had a semi-legit test workload running now for the last couple of months.
[19:50] <SamYaple> you may want to be proactive and add cinder->glance cow snapshots as well. that will land in ocata (not going to make it for newton i fear)
[19:51] <dnovosel> Ah, good to know.
[19:51] <dnovosel> Thanks!
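What that cap change usually looks like with cephx, as a sketch (client.nova and the pool names vms/images/volumes are the conventional OpenStack ones and may differ per deployment):

    ceph auth caps client.nova \
        mon 'allow r' \
        osd 'allow class-read object_prefix rbd_children, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=volumes'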
[19:51] * Nicho1as (~nicho1as@00022427.user.oftc.net) Quit (Quit: A man from the Far East; using WeeChat 1.5)
[19:53] * CoZmicShReddeR (~sardonyx@61TAAA4A9.tor-irc.dnsbl.oftc.net) Quit ()
[19:54] <dnovosel> Well thanks again for helping. It's definitely appreciated. I'm a networking / virtualization guy who has been trying to learn storage for the last few months, and it's been quite the adventure so far :)
[19:55] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) has joined #ceph
[20:01] * Discovery (~Discovery@109.235.52.3) has joined #ceph
[20:13] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[20:14] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[20:16] <SamYaple> dnovosel: thats my background as well, but i love me some ceph
[20:16] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[20:19] * kefu is now known as kefu|afk
[20:20] * adept256 (~qable@192.73.244.121) has joined #ceph
[20:22] * EinstCrazy (~EinstCraz@58.39.77.21) Quit (Remote host closed the connection)
[20:28] <srk_> SamYaple: hi
[20:28] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) Quit (Quit: billwebb)
[20:29] <SamYaple> hello srk_
[20:30] * TomasCZ (~TomasCZ@yes.tenlab.net) has joined #ceph
[20:31] <srk_> I've been doing fio runs in addition to the rados benchmarks. The IOPS of the random write runs are not consistent.
[20:32] <srk_> sudo fio -filename=/dev/vdb -direct=1 -ioengine=libaio -rw=randwrite -bs=4k -name=srtest -iodepth=5 -runtime=300 -time_based
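The same invocation, annotated (the values are srk_'s; only the comments are added):

    # srk_'s fio run, flag by flag:
    #   -filename=/dev/vdb         raw attached volume inside the VM
    #   -direct=1                  bypass the guest page cache
    #   -ioengine=libaio           Linux async I/O engine
    #   -rw=randwrite -bs=4k       100% random 4 KiB writes
    #   -iodepth=5                 outstanding I/Os per job
    #   -runtime=300 -time_based   run for a fixed 300 seconds
    sudo fio -filename=/dev/vdb -direct=1 -ioengine=libaio -rw=randwrite \
         -bs=4k -name=srtest -iodepth=5 -runtime=300 -time_based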
[20:32] <SamYaple> srk_: thats inside the vm?
[20:33] * bitserker (~toni@13.38.15.37.dynamic.jazztel.es) Quit (Quit: Leaving.)
[20:33] <srk_> with a 10GB pre-allocated cinder volume and 35 VM concurrency (total of 60 VMs, 2 per compute host)
[20:33] <srk_> Yes, its inside the vm
[20:35] <SamYaple> srk_: if its a cinder volume, ill need to know what version of openstack
[20:35] * penguinRaider_ (~KiKo@14.139.82.6) Quit (Ping timeout: 480 seconds)
[20:35] <SamYaple> older cinder+nova didn't use the correct drivers
[20:35] <srk_> the 4k randwrite run total iops go like this: 7338, 10945, 9147, 2098, 2988, 2250
[20:35] <SamYaple> srk_: getting ready to run
[20:35] <srk_> latest
[20:35] <srk_> we tried Mitaka and Kilo before that
[20:37] * swami2 (~swami@27.7.170.126) has joined #ceph
[20:37] <srk_> The "ceph osd perf" show latency of osds on 1 ceph node is much longer than the other 2
[20:38] <srk_> The dirtydata of bcache does not reach the size for "writeback_percent" (default to 10)
[20:39] * swami1 (~swami@27.7.170.126) Quit (Ping timeout: 480 seconds)
[20:39] <srk_> We tried setting "writeback_percent" to 0 on all the ceph nodes to flush the dirty data, rebooted the OS, reran the test, and got the same result
[20:39] <SamYaple> srk_: you are probably using writeback on the compute node and that's going to attempt to coalesce the writes, which may then be larger than the sequential_cutoff size
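For anyone following along, the bcache knobs mentioned here are plain sysfs files; a minimal sketch, assuming a device named bcache0:

    # How much dirty data the cache currently holds
    cat /sys/block/bcache0/bcache/dirty_data
    # Writeback threshold (percent of the cache allowed to stay dirty; default 10)
    cat /sys/block/bcache0/bcache/writeback_percent
    # The test above set this to 0 to flush dirty data before re-running
    echo 0 > /sys/block/bcache0/bcache/writeback_percent
    # Sequential requests larger than this bypass the cache entirely
    cat /sys/block/bcache0/bcache/sequential_cutoff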
[20:42] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[20:42] <srk_> yes, in nova, we use disk_cachemode as "network=writeback"
[20:43] <srk_> iirc, without that the numbers were even worse
[20:43] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[20:43] * penguinRaider_ (~KiKo@146.185.31.226) has joined #ceph
[20:44] <SamYaple> yes you want writeback
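That setting lives in nova.conf on the compute nodes; a minimal sketch of the relevant excerpt (verify the option against your OpenStack release's libvirt driver documentation):

    # excerpt from /etc/nova/nova.conf on the compute nodes
    [libvirt]
    # rbd-backed disks count as "network" disks; cache them with writeback
    disk_cachemodes = "network=writeback"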
[20:47] <srk_> there are no iperf issues. Tested on multiple hardware setups. Kind of lost on what else to look for.
[20:47] * ntpttr_ (~ntpttr@192.55.54.43) has joined #ceph
[20:47] <SamYaple> still setting up srk_
[20:47] * kefu|afk (~kefu@114.92.96.253) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[20:47] <SamYaple> about to run fio on a 10gb cinder volume prealloc
[20:48] <srk_> ok
[20:50] * adept256 (~qable@192.73.244.121) Quit ()
[20:50] * swami2 (~swami@27.7.170.126) Quit (Quit: Leaving.)
[20:53] * efirs (~firs@31.173.241.114) has joined #ceph
[20:53] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:59] * rakeshgm (~rakesh@106.51.225.17) Quit (Quit: Leaving)
[21:00] * joshd1 (~jdurgin@2602:30a:c089:2b0:8528:33f3:a544:94a5) Quit (Quit: Leaving.)
[21:01] * efirs (~firs@31.173.241.114) Quit (Ping timeout: 480 seconds)
[21:01] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[21:02] * LegalResale (~LegalResa@66.165.126.130) has joined #ceph
[21:06] * dgurtner (~dgurtner@178.197.225.50) Quit (Ping timeout: 480 seconds)
[21:07] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[21:08] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[21:10] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[21:12] * shaunm (~shaunm@cpe-192-180-17-174.kya.res.rr.com) Quit (Ping timeout: 480 seconds)
[21:13] <SamYaple> srk_: im getting ~2k iops
[21:13] <SamYaple> srk_: so you are using the virtio driver, I would suggest using the virtio-scsi driver
[21:13] <SamYaple> it's much, much better in every way
[21:13] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[21:14] <srk_> can give that a try. Any good documentation on that?
[21:15] <srk_> I'll look that up.. Do you use virtio-scsi?
[21:15] <SamYaple> yes
[21:15] <SamYaple> glance image-update --property hw_scsi_model=virtio-scsi --property hw_disk_bus=scsi <image-id>
[21:16] <SamYaple> youll have to launch a new instance
[21:16] <srk_> ok.
[21:17] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[21:17] <srk_> for the fio run, did you run it on single VM or more?
[21:17] <SamYaple> single
[21:17] <SamYaple> with more vms i get more performance
[21:17] <SamYaple> there is a virtio bottleneck
[21:19] <srk_> got it. Mine is just a 3 node cluster with 18 osds and 512 pgs.
[21:19] <srk_> Is 512 too low?
[21:19] * mykola (~Mikolaj@193.93.217.35) Quit (Quit: away)
[21:20] <srk_> Hammer used to throw a warning when I tried increasing the pg count to 1024.
[21:20] <SamYaple> srk_: 512 pgs across all pools with 18 osds?
[21:20] <srk_> just one pool
[21:20] <srk_> my glance is on swift
[21:20] <srk_> single pool for rbd
[21:20] <SamYaple> wow. yea thats too low. thats only 28pgs per osd
[21:20] <SamYaple> is that all thats on your ceph cluster?
[21:21] <srk_> yes
[21:21] <srk_> our base clusters are usually 3 node, 3 copy with 18 osds
[21:21] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[21:22] <srk_> I was using the formula: no.of osds x 100 / replica count to arrive at the pg number
[21:22] <SamYaple> yea you'll want to bump that to at least 2048, but 4096 isn't unreasonable (especially if you plan on growing it)
[21:22] <srk_> even for small clusters, its ok to go with 100-200 pgs per osd, right?
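Plugging srk_'s own formula into a quick calculation and rounding up to a power of two, as discussed above (shell arithmetic only; note that the 200-per-OSD variant lands on SamYaple's 2048 suggestion):

    OSDS=18; REPLICAS=3; TARGET_PER_OSD=100
    RAW=$(( OSDS * TARGET_PER_OSD / REPLICAS ))      # 18 * 100 / 3 = 600
    P=1; while [ "$P" -lt "$RAW" ]; do P=$(( P * 2 )); done
    echo "$RAW -> $P"                                # 600 -> 1024
    # With TARGET_PER_OSD=200 the same math gives 1200 -> 2048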
[21:24] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[21:25] * dneary (~dneary@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[21:25] <aNupoisc> Hi! I have prepared and activated OSDs; should I expect an entry for them in ceph.conf, or do we need to add one manually?
[21:26] <[arx]> no, you can if you want.
[21:26] <srk_> there will be no entry in ceph.conf. It can be added, but it's not a must
[21:28] <aNupoisc> Oh okay so no harm if it doesn't exist [arx]: srk_:
[21:28] <aNupoisc> great
[21:31] * stefan0 (~stefano@168.205.191.253) has joined #ceph
[21:32] * natarej (~natarej@101.188.54.14) has joined #ceph
[21:33] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[21:33] * tsg__ (~tgohad@jfdmzpr03-ext.jf.intel.com) has joined #ceph
[21:36] * tsg__ (~tgohad@jfdmzpr03-ext.jf.intel.com) Quit (Remote host closed the connection)
[21:36] * tsg__ (~tgohad@192.55.54.40) has joined #ceph
[21:37] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[21:38] * tsg_ (~tgohad@192.55.54.43) Quit (Remote host closed the connection)
[21:47] * doppelgrau_ (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) has joined #ceph
[21:49] <dnovosel> SamYaple: I tried playing with the math you mentioned, and when I tried to assign 5939 as my PG number for one of my pools, ceph complained it only supports creating a maximum of 32 new PGs per OSD at a time, which works out to around 1792. So I'm not sure those numbers we discussed are going to work [unless there is a flag to override those values].
[21:49] <doppelgrau_> Hi
[21:49] <dnovosel> Also, I am currently on infernalis, if that somehow makes a difference [as we didn't discuss that earlier].
[21:50] * blizzow (~jburns@50.243.148.102) Quit (Ping timeout: 480 seconds)
[21:51] <doppelgrau_> I have a strange problem: after updating two of three mons from hammer to jewel everything worked for two days, then the third (hammer) mon died with: Shutting down because I do not support required monitor features: { compat={},rocompat={},incompat={7=support shec erasure code} }
[21:52] <doppelgrau_> tried to restart, but now I only get these messages (and mon is still down): e17 handle_probe missing features, have 55169095435288575, required 18416819765248, missing 0
[21:52] <doppelgrau_> any ideas?
[21:52] * bvi (~Bastiaan@102-117-145-85.ftth.glasoperator.nl) has joined #ceph
[21:52] * bvi (~Bastiaan@102-117-145-85.ftth.glasoperator.nl) Quit ()
[21:56] * rendar (~I@host118-177-dynamic.27-79-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[21:57] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[21:59] <SamYaple> dnovosel: those numbers are correct, however i believe at some point in the past they put a limit on the amount of pgs you can jump up at once
[22:00] * ntpttr_ (~ntpttr@192.55.54.43) Quit (Remote host closed the connection)
[22:00] <dnovosel> Ah.. so maybe do it in a few smaller steps?
[22:00] <SamYaple> dnovosel: is this cluster live in production right now? (can we abuse it and have it be unresponsive for a few minutes?)
[22:00] <dnovosel> It's somewhat live, yes. There is real work on it.
[22:00] <SamYaple> then youll want to rebalance in slower steps
[22:00] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Quit: Leaving.)
[22:01] <dnovosel> Okay.. I'm okay with doing that.
[22:01] <SamYaple> that means pg_num, then pgp_num, then wait for HEALTH_OK
[22:01] <SamYaple> then proceed until you reach your desired number
[22:01] <SamYaple> yup, found it dnovosel http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-July/041399.html
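A rough sketch of that stepped approach (the pool name and intermediate sizes below are placeholders, not values from this conversation):

    POOL=rbd                        # placeholder pool name
    for PGS in 1024 2048 4096; do   # placeholder steps toward the final target
        ceph osd pool set "$POOL" pg_num  "$PGS"
        # pgp_num may refuse to change until the new PGs finish creating
        ceph osd pool set "$POOL" pgp_num "$PGS"
        # let recovery settle before the next step
        while ! ceph health | grep -q HEALTH_OK; do sleep 60; done
    done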
[22:01] <dnovosel> Okay, I'll give that a try. Thanks again! I was finding a similar consensus on some google searching, but the information is spread across so many releases I wasn't sure if things changed at some point.
[22:02] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[22:02] <SamYaple> that mailing list says it was a safety feature added to prevent locking up the cluster while increasing pgs
[22:03] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[22:05] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) has joined #ceph
[22:07] * tsg_ (~tgohad@134.134.139.76) has joined #ceph
[22:08] * art_yo (~kvirc@149.126.169.197) Quit (Ping timeout: 480 seconds)
[22:10] <doppelgrau_> and expect quite a lot of IO while increasing the PGs; do it when you have a maintenance window and/or quite a lot of reserve IO
[22:13] * tsg__ (~tgohad@192.55.54.40) Quit (Remote host closed the connection)
[22:18] * scuttlemonkey is now known as scuttle|afk
[22:20] * georgem (~Adium@206.108.127.16) Quit (Quit: Leaving.)
[22:21] <SamYaple> doppelgrau_: wow. that seems weird
[22:21] <SamYaple> doppelgrau_: to understand a bit more, did you finish the hammer -> jewel upgrade or are you in an inconsistent state?
[22:23] * rendar (~I@host118-177-dynamic.27-79-r.retail.telecomitalia.it) has joined #ceph
[22:23] <doppelgrau_> SamYaple: inconsistent state; want to proceed, but somehow the update broke qemu/xen access to rbd devices, and the VMs running on top are now migrated to other hosts, but not enough to free a third host atm
[22:23] * doppelgrau_ is not really happy with that situation
[22:24] <SamYaple> doppelgrau_: can you run the following two commands and pastebin the results `ceph tell osd.* version` `ceph tell mon.* version`
[22:25] * ntpttr_ (~ntpttr@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[22:27] <doppelgrau_> SamYaple: the latter command hangs since one mon is down; I'll paste you the mons directly
[22:28] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[22:29] <doppelgrau_> http://pastebin.com/0SRaQk0U
[22:29] <stefan0> guys, actually I'm running firefly
[22:30] <stefan0> tomorrow I'll try to upgrade up to infernalis
[22:30] <stefan0> any hint/fasttrack?
[22:30] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[22:30] <doppelgrau_> does someone have a redhat subscription? found a "teaser" that looks like it could help: https://access.redhat.com/solutions/2115161
[22:30] <stefan0> SamYaple, read about the bcache+osds setup you mentioned yesterday; that'll be our next improvement, looks really good.
[22:31] <SamYaple> stefan0: make sure to test it first. its not a silver bullet
[22:31] <SamYaple> doppelgrau_: ok. so one, man your versions are all over the place!
[22:32] <stefan0> reading the release notes, is upgrading from firefly >> infernalis just going up one version at a time, just upgrading the mon+osd binaries? something like that?
[22:33] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:7c62:b891:9b8a:4ede) Quit (Ping timeout: 480 seconds)
[22:33] <SamYaple> stefan0: i prefer to stick to LTS, as most people do, so you would want to go firefly >> jewel in that case. but you should read the upgrade release notes. there is an order and some upgrades need different things
[22:33] <srk_> stefan0: did you read this: http://ceph.com/releases/v9-2-0-infernalis-released ?
[22:33] <doppelgrau_> SamYaple: yes, should be fixed - but the mons are only "latest" hammer and jewel, so wondering what went wrong there, even after running for days without problems
[22:34] <srk_> "Upgrading directly from Firefly v0.80.z is not recommended"
[22:34] <SamYaple> srk_: it would be the same issue for firefly >> infernalis
[22:35] * bene2 (~bene@2601:193:4101:f410:ea2a:eaff:fe08:3c7a) Quit (Quit: Konversation terminated!)
[22:35] <srk_> right.even for Jewel.
[22:35] <SamYaple> but stefan0 srk_ is correct, its best to move to latest hammer and let the internal fencing upgrade all finish before moving to jewel
[22:35] <srk_> UPGRADING FROM FIREFLY
[22:35] <srk_> Upgrading directly from Firefly v0.80.z is not recommended. It is possible to do a direct upgrade, but not without downtime, as all OSDs must be stopped, upgraded, and then restarted. We recommend that clusters be first upgraded to Hammer v0.94.6 or a later v0.94.z release; only then is it possible to upgrade to Jewel 10.2.z for an online upgrade (see below).
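A very rough outline of what those notes imply, assuming Debian/Ubuntu-style packaging and upstart-era service names (a sketch, not the official procedure; follow the release notes for your versions):

    # 1) firefly -> latest hammer (0.94.x), one node at a time, mons first
    apt-get update && apt-get install -y ceph   # with the hammer repo enabled
    restart ceph-mon-all                        # on mon nodes (upstart name; adjust for your init)
    restart ceph-osd-all                        # on osd nodes, waiting for HEALTH_OK between nodes
    # 2) confirm the whole cluster reports hammer before starting hammer -> jewel
    ceph tell mon.* version
    ceph tell osd.* version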
[22:35] <rkeene> I made a picture: http://www.rkeene.org/viewer/tmp/ceph-and-clients.png.htm
[22:35] <rkeene> Trying to figure out how to express that better
[22:35] <SamYaple> doppelgrau_: that third mon node says its still latest hammer version
[22:36] <SamYaple> doppelgrau_: i would triple check the upgrade actually placed the correct jewel binary
[22:36] <rkeene> That's why I like my upgrade system -- upgrades are atomic :-D
[22:36] * Jeffrey4l (~Jeffrey@119.251.163.12) has joined #ceph
[22:36] <doppelgrau_> SamYaple: yes, I know - can't update them to jewel ATM since a qemu bug prevents me from moving the VMs to the already updated hosts
[22:36] <stefan0> srk_, not yet, thanks for the hint
[22:37] <SamYaple> doppelgrau_: well you can't roll back from jewel
[22:37] <stefan0> last night I read that going up to jewel would require more caution.. I'll get back to the reading
[22:37] <SamYaple> doppelgrau_: you've already started the mons upgrade; my understanding is you won't be able to get back to hammer
[22:37] <doppelgrau_> but why did it run for two days with the mixed hammer/jewel mon setup?
[22:37] <SamYaple> doppelgrau_: but you are trying to launch the hammer binaries
[22:38] <stefan0> SamYaple, do you recommend upgrading first or setting up bcache+osds first? Maybe going up to jewel would make the whole bcache stuff less vital.
[22:38] <SamYaple> doppelgrau_: i have no clue. but you arent really supposed to run that long in the inconsistent state. sounds like a bug, but thats unlikely to help you
[22:38] <srk_> doppelgrau_: Can you please share the qemu bug that you were hitting?
[22:38] * Jeffrey4l_ (~Jeffrey@110.252.65.47) Quit (Ping timeout: 480 seconds)
[22:39] * ntpttr_ (~ntpttr@jfdmzpr06-ext.jf.intel.com) Quit (Quit: Leaving)
[22:39] <SamYaple> stefan0: i would say get to jewel asap, fix the tunables to the highest your environment and recovery can handle, and then evaluate bcache
[22:39] <rkeene> Yeah, I didn't have a problem using Ceph 0.94.7 librbd linked QEMU talking to Ceph 10.2.2 OSDs/MONs (my upgrade process upgrades the Mons first, then the OSDs then the compute nodes)
[22:39] <rkeene> (For each Ceph server it waits for Ceph to become HEALTH_OK before proceeding with the next host)
[22:39] <doppelgrau_> SamYaple: ok, I hope the guys in the datacenter can give me one or two spare servers tomorrow, so I can update the mon and get it back to a safe state
[22:40] <stefan0> yes.. our tech guys are discussing SSD journals over xfs and all.. with jewel we'll make that a thing of the past..
[22:42] <rkeene> SamYaple, I'm trying to get my company to open source my Linux distribution
[22:42] <doppelgrau_> srk_: rkeene: not yet fully debugged, but with ceph 10.2.2 and qemu 2.1.2 (debian jessie) as the blockdevice backend for xen, the blockdevices are not provided and pvgrub drops to the shell with no devices available
[22:42] <SamYaple> rkeene: would love to dissect it. im still going to do my own for learning purposes
[22:43] <rkeene> My coworker is flying up there next week to talk to them about it
[22:43] <rkeene> It would be a big win for us, I think, because then people could use the software and still pay us for support or for doing Capacity as a Service (which is all we're doing with it so far)
[22:44] <doppelgrau_> srk_: rkeene: Logfiles state something like xen be core: xen be: watching backend path (backend/qdisk/2) failed (which is not very helpful)
[22:44] <SamYaple> rkeene: awesome to hear
[22:44] <SamYaple> rkeene: im still debating a musl/clang build. but i really dont want to patch everything
[22:45] <SamYaple> i did a musl/clang toolchain and it was a PITA
[22:46] <rkeene> I started out using MUSL but QEMU crashed so I gave up -- I can still switch to it, but there's no particular point
[22:46] <SamYaple> yea its a pain
[22:46] <srk_> doppelgrau_: Thanks. rkeene: Are you using openstack Cinder and qemu/kvm ?
[22:46] <rkeene> Since static linking would be hugely wasteful, I'd be using dynamic linking
[22:47] <rkeene> srk_, I'm using OpenNebula and QEMU/KVM
[22:47] <SamYaple> srk_: dont mention openstackst**k to rkeene
[22:48] * dnovosel (87f53055@107.161.19.109) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[22:48] <srk_> ah ok. very bad experience with openstack ?
[22:49] <SamYaple> heh yea. he likes to rant about it :P
[22:49] <SamYaple> he isnt wrong either :)
[22:49] <srk_> I'm trying to evaluate Hammer to Jewel upgrades
[22:49] <SamYaple> what about them?
[22:53] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) Quit (Ping timeout: 480 seconds)
[22:53] <srk_> Sometimes the VMs with cinder volumes aren't accessible after the upgrade.
[22:54] <srk_> osd and mon versions look fine and all cinder and nova services are restarted.
[22:55] <srk_> need to put more time on that.
[22:58] <rkeene> I'm just annoyed that I wasted so much time with OpenStack and also I want to try to keep people from making the same mistake I made :-D
[22:59] <doppelgrau_> srk_: fixed, installing not only the qemu and qemu-utils from backports but also qemu-block-extra helped
[22:59] <srk_> SamYaple: regarding the change to virtio-scsi, usually our VMs boot from image and the cinder volume is attached later.
[23:00] <rkeene> Also annoyed because we are a Cisco gold cloud partner and they keep wanting to know why we're not using OpenStack and telling us that OpenNebula went out of business (when they are thinking of Nebula, which used OpenStack :-D)
[23:01] * programo (~programo@2601:240:c600:b150:65d3:748:b645:aad5) Quit (Quit: Leaving)
[23:01] <srk_> doppelgrau_: what are the versions? qemu-utils 2.0.0+dfsg-2ubuntu1.24
[23:01] <SamYaple> rkeene: ive heard that one too
[23:01] <SamYaple> srk_: yea but that sets up the storage _controller_ which affects cinder as well
[23:02] <rkeene> SamYaple, Yeah, it's real dumb
[23:03] <doppelgrau_> srk_: qemu-img version 2.5.0 (Debian 1:2.5+dfsg-4~bpo8+1), Copyright (c) 2004-2008 Fabrice Bellard <- works, the 2.1.2 didnt
[23:03] <srk_> SamYaple: So, changing that on the image is going to make vda, vdb, vdc all go to virtio-scsi?
[23:03] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[23:03] <SamYaple> srk_: right, itll be /dev/sda /dev/sdb
[23:04] <rkeene> So I try to counter Cisco by encouraging people not to use OpenStack :-D
[23:04] <srk_> doppelgrau_: We run Ubuntu, I'll have to find the equivalent.
[23:05] <srk_> SamYaple: Thanks. I'm definitely going to try that. However, increasing the pgs after the pool got created is not going well
[23:05] <doppelgrau_> srk: http://packages.ubuntu.com/search?keywords=qemu <- I've seen "backports" mentioned sometimes, might help
[23:06] <srk_> # ceph osd pool set default pg_num 2048
[23:06] <srk_> Error E2BIG: specified pg_num 2048 is too large (creating 1024 new PGs on ~18 OSDs exceeds per-OSD max of 32)
[23:06] <SamYaple> srk_: didnt we just talk about this?
[23:06] * johnavp1989 (~jpetrini@8.39.115.8) Quit (Ping timeout: 480 seconds)
[23:06] <SamYaple> srk_: oh no i was talking to dnovosel
[23:06] <srk_> right. that ML post
[23:07] <srk_> If I had 512 pgs to begin with, it allows going to 1024 but not 2048 or above
[23:07] <SamYaple> srk_: it will allow that, you have to step it up slowly
[23:07] <srk_> However, if I start with 1024 it does not allow going to 2048
[23:08] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) Quit (Remote host closed the connection)
[23:08] <srk_> by *slowly* you mean, instead of doubling directly, increase by 64,128 etc?
[23:09] <SamYaple> srk_: whatever it allows, yea
[23:09] <SamYaple> i believe there is also an override for that, but i can't remember the option
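If memory serves, the cap behind that E2BIG error is the monitor option mon_osd_max_split_count (default 32); the name is a best guess here, so confirm it in your release's documentation before relying on this sketch:

    # Option name is a best guess from memory; verify before using in anger.
    ceph daemon mon.$(hostname -s) config get mon_osd_max_split_count   # run on a mon host
    ceph tell mon.* injectargs '--mon-osd-max-split-count 64'           # raise the per-step cap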
[23:09] <The1_> doppelgrau_: that redhat solution mentions this:
[23:09] <The1_> Issue
[23:09] <The1_> Monitor node is showing down with the below logs
[23:10] <The1_> 2016-01-07 13:26:54.941323 xxxxxxx -1 mon.nodename01@0(probing) e3 handle_probe missing features, have 52776558133247, required 824633720832, missing 0
[23:10] <The1_> 2016-01-07 13:26:56.941139 xxxxxxx -1 mon.nodename01@0(probing) e3 handle_probe missing features, have 52776558133247, required 824633720832, missing 0
[23:10] <The1_> ..
[23:10] <The1_> Resolution
[23:10] <The1_> Make sure there are no firewall rules that blocks the monitor node communication with the other nodes.
[23:10] <The1_> The e3 handle_probe missing features message indicates a version mismatch between cluster nodes. Make sure all the cluster nodes are in the same ceph version.
[23:10] <The1_> .. and sorry for pasting in the channel
[23:10] <SamYaple> The1_: yea but he can't upgrade yet for other reasons. that's definitely the solution like we mentioned before
[23:11] <doppelgrau_> The1_: thanks, so upgrade as heard before. Since I think I found the qemu bug, I can move VMs and then restart & upgrade the third mon
[23:11] * drdanick1 (~tallest_r@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[23:11] <The1_> SamYaple: ah, sorry - I didn't read up on everything said.. :)
[23:11] <doppelgrau_> (ATM testing it with a few non-mission-critical VMs)
[23:12] <SamYaple> The1_: no problem :)
[23:15] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[23:23] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[23:23] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:25] * davidzlap (~Adium@2605:e000:1313:8003:1415:ba10:4eb7:833b) has joined #ceph
[23:32] <doppelgrau_> thanks for the help, my boss wants to test the solution for two days (I would have preferred bringing up the third mon asap), but it looks good.
[23:32] * bitserker (~toni@81.184.9.72.dyn.user.ono.com) has joined #ceph
[23:34] <The1_> there seems to be a grace period where mons can be without certain features - and you probably need to upgrade all mons in that window - afterwards something is assumed to be wrong
[23:38] * mattbenjamin (~mbenjamin@12.118.3.106) Quit (Ping timeout: 480 seconds)
[23:41] * drdanick1 (~tallest_r@26XAAAU23.tor-irc.dnsbl.oftc.net) Quit ()
[23:41] <SamYaple> The1_: must be. but thats strange
[23:42] <SamYaple> its best to upgrade them all anyway so i would actually bet its a bug personally
[23:42] <SamYaple> might be a safety precaution
[23:42] <The1_> probably
[23:42] <The1_> could be everything
[23:42] <The1_> but since the normal upgrade procedure is that you take all mons one by one
[23:42] <The1_> there must be some form of overlap allowed
[23:43] <SamYaple> The1_: there is fencing involved, yes. but I would actually be surprised if there was a time delay on the fencing
[23:43] <SamYaple> The1_: basically they all operate on the older version until they are all on the newer version and then they upgrade
[23:44] <The1_> it could be fencing with quorum so if enough mons agree they can evict mons that are not up to speed
[23:44] * bitserker (~toni@81.184.9.72.dyn.user.ono.com) Quit (Quit: Leaving.)
[23:45] <The1_> or upgrade to the new features if quorum allows
[23:45] <hoonetorg> is it possible to mount ceph over wan
[23:45] <hoonetorg> ??
[23:45] <SamYaple> might be. thats curious
[23:45] <SamYaple> hoonetorg: absolutely
[23:45] <SamYaple> hoonetorg: its not recommended
[23:45] <hoonetorg> i know
[23:45] <hoonetorg> but what ip-addresses must be reachable
[23:46] <hoonetorg> i must do some kind of nat
[23:46] <doppelgrau_> The1_: since it took two days, I think a timer, a bug or some network glitch and the last hammer mon dropped out for one epoch => all jewel?
[23:46] <SamYaple> hoonetorg: your public network must be public (which is a bad idea) or you use a vpn
[23:46] <hoonetorg> mhhh
[23:46] <hoonetorg> so mds and mon addresses are enough?
[23:46] <The1_> doppelgrau_: yeah - but there are so many variables it's hard for us to know..
[23:46] <doppelgrau_> hoonetorg: a client over wan would be slow; OSDs distributed over WAN would be a pita
[23:46] <SamYaple> hoonetorg: no, you also contact the osds
[23:46] * aj__ (~aj@x4db1ce8d.dyn.telefonica.de) has joined #ceph
[23:47] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[23:47] <hoonetorg> so public and private addresses
[23:47] <SamYaple> its painfully slow with cephfs, even moreso than normal
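For the record, a kernel CephFS mount only names the monitors, but the client still talks to the MDS and every OSD directly, which is why the whole public network has to be reachable (e.g. over a VPN); a minimal sketch with placeholder addresses:

    # Monitors are listed on the command line, but data I/O goes straight to the
    # OSDs, so their public addresses must be routable too. 192.0.2.x are placeholders.
    sudo mount -t ceph 192.0.2.11:6789,192.0.2.12:6789,192.0.2.13:6789:/ /mnt/cephfs \
         -o name=admin,secretfile=/etc/ceph/admin.secret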
[23:47] <The1_> perhaps a developer could find a cause, but.. probably like something with a needle and a haystack.. ;)
[23:47] <hoonetorg> i was only thinking if this theoretically works
[23:47] <hoonetorg> but seems only over vpn
[23:47] <hoonetorg> thx
[23:48] <The1_> in theory yes.. in practice, no.. don't ever think of doing it
[23:48] <The1_> you are in for a world of pain
[23:48] <hoonetorg> The1_: i must try this, only to see :)
[23:48] <SamYaple> The1_: ++
[23:48] <hoonetorg> using my qa cluster
[23:48] <The1_> hoonetorg: you must be part of the BDSM scene then.. ;)
[23:49] <hoonetorg> no i want to smile a bit when i see 1KB/s followed by a segfault/kernel panic
[23:50] <The1_> haha
[23:50] <hoonetorg> thx guys - idling again
[23:50] <The1_> or uninterruptible waits and lost data.. ;)
[23:50] <hoonetorg> :)
[23:54] <SamYaple> just hit up devel, sounds like a bug
[23:59] * jrowe (~jrowe@204.14.236.152) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.