#ceph IRC Log

IRC Log for 2016-09-15

Timestamps are in GMT/BST.

[0:01] * wes_dillingham (~wes_dilli@209-6-222-74.c3-0.hdp-ubr1.sbo-hdp.ma.cable.rcn.com) has joined #ceph
[0:02] * togdon (~togdon@74.121.28.6) Quit (Quit: Sleeping...)
[0:02] * togdon (~togdon@74.121.28.6) has joined #ceph
[0:03] * srk (~Siva@32.97.110.53) Quit (Ping timeout: 481 seconds)
[0:03] * tsg (~tgohad@192.55.55.41) Quit (Remote host closed the connection)
[0:07] * tsg (~tgohad@192.55.54.40) has joined #ceph
[0:10] * kristen (~kristen@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[0:11] * rdias (~rdias@2001:8a0:749a:d01:45c0:31f1:4419:16bf) Quit (Ping timeout: 480 seconds)
[0:15] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) has joined #ceph
[0:28] * rdias (~rdias@2001:8a0:749a:d01:448e:e074:416:7bac) has joined #ceph
[0:28] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Quit: Konversation terminated!)
[0:48] * thansen (~thansen@17.253.sfcn.org) Quit (Ping timeout: 480 seconds)
[0:51] * vata (~vata@207.96.182.162) Quit (Quit: Leaving.)
[0:52] * Brochacho (~alberto@97.93.161.13) Quit (Quit: Brochacho)
[0:55] * xarses_ (~xarses@64.124.158.3) Quit (Ping timeout: 480 seconds)
[0:57] * georgem (~Adium@69-165-135-139.dsl.teksavvy.com) has joined #ceph
[0:57] * thansen (~thansen@17.253.sfcn.org) has joined #ceph
[0:58] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[1:02] * Uniju (~utugi____@tor-exit.squirrel.theremailer.net) has joined #ceph
[1:04] * zeroto140 (~zeroto140@205-207-109-139.jfbc.org) has joined #ceph
[1:05] * xarses_ (~xarses@73.93.152.135) has joined #ceph
[1:06] * andreww (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) has joined #ceph
[1:11] * salwasser (~Adium@c-76-118-229-231.hsd1.ma.comcast.net) has joined #ceph
[1:12] * sudocat2 (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[1:12] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[1:13] * xarses_ (~xarses@73.93.152.135) Quit (Ping timeout: 480 seconds)
[1:16] * tsg (~tgohad@192.55.54.40) Quit (Remote host closed the connection)
[1:22] * diver (~diver@95.85.8.93) has joined #ceph
[1:22] * fsimonce (~simon@host98-71-dynamic.1-87-r.retail.telecomitalia.it) Quit (Remote host closed the connection)
[1:24] * thansen (~thansen@17.253.sfcn.org) Quit (Quit: Ex-Chat)
[1:25] * togdon (~togdon@74.121.28.6) Quit (Quit: Bye-Bye.)
[1:25] * Brochacho (~alberto@97.93.161.13) has joined #ceph
[1:25] * [0x4A6F]_ (~ident@p4FC27FC1.dip0.t-ipconnect.de) has joined #ceph
[1:26] * diver (~diver@95.85.8.93) Quit ()
[1:26] * [0x4A6F] (~ident@0x4a6f.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:26] * [0x4A6F]_ is now known as [0x4A6F]
[1:27] * georgem (~Adium@69-165-135-139.dsl.teksavvy.com) Quit (Quit: Leaving.)
[1:28] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[1:32] * Uniju (~utugi____@2RTAAAC8S.tor-irc.dnsbl.oftc.net) Quit ()
[1:32] * thansen (~thansen@17.253.sfcn.org) has joined #ceph
[1:38] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[1:39] * zeroto140 (~zeroto140@205-207-109-139.jfbc.org) Quit (Quit: Leaving...)
[1:48] * oms101 (~oms101@p20030057EA024000C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:49] * vata (~vata@96.127.202.136) has joined #ceph
[1:50] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[1:50] * nilez (~nilez@96.44.145.186) Quit (Ping timeout: 480 seconds)
[1:50] * nilez (~nilez@96.44.188.106) has joined #ceph
[1:54] * masber (~masber@129.94.15.152) has joined #ceph
[1:56] * oms101 (~oms101@p20030057EA025D00C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[1:58] * Brochacho (~alberto@97.93.161.13) Quit (Quit: Brochacho)
[2:00] * borei (~dan@216.13.217.230) Quit (Ping timeout: 480 seconds)
[2:04] * vata (~vata@96.127.202.136) Quit (Quit: Leaving.)
[2:05] * thansen (~thansen@17.253.sfcn.org) Quit (Quit: Ex-Chat)
[2:07] * vata (~vata@96.127.202.136) has joined #ceph
[2:09] * Skaag (~lunix@65.200.54.234) Quit (Quit: Leaving.)
[2:12] * rdias (~rdias@2001:8a0:749a:d01:448e:e074:416:7bac) Quit (Ping timeout: 480 seconds)
[2:14] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[2:16] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[2:19] * cholcombe (~chris@97.93.161.13) Quit (Ping timeout: 480 seconds)
[2:19] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[2:20] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[2:27] * LegalResale (~LegalResa@66.165.126.130) Quit (Quit: Leaving)
[2:34] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[2:34] * kristen (~kristen@jfdmzpr06-ext.jf.intel.com) Quit (Quit: Leaving)
[2:37] * salwasser (~Adium@c-76-118-229-231.hsd1.ma.comcast.net) Quit (Quit: Leaving.)
[2:39] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) Quit (Remote host closed the connection)
[2:42] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[2:50] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[3:15] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[3:15] * jfaj__ (~jan@p4FC2523D.dip0.t-ipconnect.de) has joined #ceph
[3:16] * LiamMon (~liam.monc@2.123.181.40) has joined #ceph
[3:22] * wes_dillingham (~wes_dilli@209-6-222-74.c3-0.hdp-ubr1.sbo-hdp.ma.cable.rcn.com) Quit (Quit: wes_dillingham)
[3:22] * LiamMon__ (~liam.monc@94.14.202.192) Quit (Ping timeout: 480 seconds)
[3:22] * jfaj_ (~jan@p4FC25CD8.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[3:25] * derjohn_mobi (~aj@x590cfca8.dyn.telefonica.de) has joined #ceph
[3:25] * derjohn_mob (~aj@x4db2ad83.dyn.telefonica.de) Quit (Read error: Connection reset by peer)
[3:25] * georgem (~Adium@69-165-135-139.dsl.teksavvy.com) has joined #ceph
[3:27] * EinstCrazy (~EinstCraz@61.165.253.98) has joined #ceph
[3:29] <ronrib> how do I set the mds max file size? in the ceph.conf under [mds] ?
[3:33] <ronrib> ah under [global]
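    For reference, a minimal ceph.conf sketch of the option being discussed ("mds max file size"); the 16 TiB value is only an example, and the limit is applied to the filesystem itself, if memory serves:

        [global]
        mds max file size = 17592186044416    # bytes (16 TiB); the default is 1 TiB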
[3:33] * EinstCrazy (~EinstCraz@61.165.253.98) Quit (Remote host closed the connection)
[3:41] * sebastian-w_ (~quassel@212.218.8.138) has joined #ceph
[3:44] * sebastian-w (~quassel@212.218.8.138) Quit (Ping timeout: 480 seconds)
[3:56] * Guest359 (~oftc-webi@88.215.192.251) Quit (Ping timeout: 480 seconds)
[3:57] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:00] * wak-work (~wak-work@2620:15c:2c5:3:7c9e:3261:bdc9:bdc9) Quit (Remote host closed the connection)
[4:00] * wak-work (~wak-work@2620:15c:2c5:3:8cc1:c036:72d9:db09) has joined #ceph
[4:01] * rdias (~rdias@bl7-92-98.dsl.telepac.pt) has joined #ceph
[4:01] * ronrib (~boswortr@45.32.242.135) Quit (Remote host closed the connection)
[4:09] * rdias (~rdias@bl7-92-98.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[4:20] * wes_dillingham (~wes_dilli@209-6-222-74.c3-0.hdp-ubr1.sbo-hdp.ma.cable.rcn.com) has joined #ceph
[4:22] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[4:27] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[4:35] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[4:36] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[4:39] * wes_dillingham (~wes_dilli@209-6-222-74.c3-0.hdp-ubr1.sbo-hdp.ma.cable.rcn.com) Quit (Quit: wes_dillingham)
[4:40] * davidz (~davidz@2605:e000:1313:8003:4417:3d54:ddb5:fa77) Quit (Quit: Leaving.)
[4:49] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[4:50] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[4:53] * Skaag (~lunix@cpe-172-91-77-84.socal.res.rr.com) has joined #ceph
[4:53] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[4:57] * wes_dillingham (~wes_dilli@209-6-222-74.c3-0.hdp-ubr1.sbo-hdp.ma.cable.rcn.com) has joined #ceph
[5:06] * kefu (~kefu@114.92.125.128) has joined #ceph
[5:19] * Vacuum_ (~Vacuum@88.130.198.181) has joined #ceph
[5:21] * georgem (~Adium@69-165-135-139.dsl.teksavvy.com) Quit (Quit: Leaving.)
[5:21] * rdias (~rdias@2001:8a0:749a:d01:448e:e074:416:7bac) has joined #ceph
[5:21] * ronrib (~boswortr@45.32.242.135) has joined #ceph
[5:22] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[5:25] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[5:26] * Vacuum__ (~Vacuum@88.130.197.23) Quit (Ping timeout: 480 seconds)
[5:26] * sto_ (~sto@121.red-2-139-229.staticip.rima-tde.net) has joined #ceph
[5:27] * sto (~sto@121.red-2-139-229.staticip.rima-tde.net) Quit (Read error: Connection reset by peer)
[5:46] * rdias (~rdias@2001:8a0:749a:d01:448e:e074:416:7bac) Quit (Ping timeout: 480 seconds)
[5:49] * joshd (~jdurgin@125.16.34.66) has joined #ceph
[5:49] * bniver (~bniver@125.16.34.66) has joined #ceph
[5:51] * tsg (~tgohad@192.55.54.43) has joined #ceph
[6:01] * rdias (~rdias@bl7-92-98.dsl.telepac.pt) has joined #ceph
[6:04] * walcubi_ (~walcubi@p5797A1F7.dip0.t-ipconnect.de) has joined #ceph
[6:07] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[6:11] * walcubi (~walcubi@p5795B386.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[6:13] * bara (~bara@125.16.34.66) has joined #ceph
[6:14] * praveen (~praveen@122.171.114.51) Quit (Ping timeout: 480 seconds)
[6:17] * jcsp (~jspray@125.16.34.66) has joined #ceph
[6:17] * wes_dillingham (~wes_dilli@209-6-222-74.c3-0.hdp-ubr1.sbo-hdp.ma.cable.rcn.com) Quit (Quit: wes_dillingham)
[6:17] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[6:18] * rdias (~rdias@bl7-92-98.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[6:20] * borei (~dan@node-1w7jr9qle4x5ix2kjybp8d4fv.ipv6.telus.net) has joined #ceph
[6:21] <borei> hi all
[6:21] <borei> i have ubuntu 16, 2 OSDs but they started differently
[6:22] <borei> i configured both of them via systemd
[6:22] <borei> one is running as
[6:22] <borei> /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
[6:22] <borei> second one
[6:22] <borei> /bin/bash -c ulimit -n 32768; TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph --setuser ceph --setgroup ceph -f
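    The two command lines above look like one OSD started by the systemd ceph-osd@ unit and the other by the older sysvinit/upstart wrapper (the "/bin/bash -c ulimit ..." form). A hedged sketch of checking and standardizing on the systemd units, assuming the stock Jewel packages on Ubuntu 16.04:

        systemctl status ceph-osd@0 ceph-osd@1     # see which units are active for each OSD
        systemctl enable ceph-osd@1                # make osd.1 start via systemd on boot
        systemctl restart ceph-osd@1               # after stopping it in whatever started it originally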
[6:40] * rdas (~rdas@121.244.87.116) has joined #ceph
[6:43] * tsg (~tgohad@192.55.54.43) Quit (Remote host closed the connection)
[6:43] * tsg (~tgohad@192.55.54.43) has joined #ceph
[6:52] * rdias (~rdias@bl7-92-98.dsl.telepac.pt) has joined #ceph
[6:57] * praveen_ (~praveen@122.171.224.233) has joined #ceph
[6:58] * TomasCZ (~TomasCZ@yes.tenlab.net) Quit (Quit: Leaving)
[7:00] * Jeffrey4l_ (~Jeffrey@110.244.238.95) has joined #ceph
[7:05] * praveen__ (~praveen@171.61.115.115) has joined #ceph
[7:07] * praveen_ (~praveen@122.171.224.233) Quit (Ping timeout: 480 seconds)
[7:07] * Jeffrey4l__ (~Jeffrey@221.195.210.23) Quit (Ping timeout: 480 seconds)
[7:08] * rdias (~rdias@bl7-92-98.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[7:24] * praveen__ (~praveen@171.61.115.115) Quit (Ping timeout: 480 seconds)
[7:28] * kefu is now known as kefu|afk
[7:33] * kefu|afk (~kefu@114.92.125.128) Quit (Quit: Textual IRC Client: www.textualapp.com)
[7:34] * sleinen (~Adium@194.230.155.234) has joined #ceph
[7:41] * sleinen1 (~Adium@2001:620:0:82::104) has joined #ceph
[7:46] * sleinen (~Adium@194.230.155.234) Quit (Ping timeout: 480 seconds)
[7:46] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) has joined #ceph
[7:47] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[7:47] * vikhyat (~vumrao@49.248.206.76) has joined #ceph
[7:51] * rdias (~rdias@bl7-92-98.dsl.telepac.pt) has joined #ceph
[7:51] * praveen_ (~praveen@122.172.122.231) has joined #ceph
[7:52] * tsg (~tgohad@192.55.54.43) Quit (Remote host closed the connection)
[7:59] * praveen_ (~praveen@122.172.122.231) Quit (Ping timeout: 480 seconds)
[8:03] * yanzheng (~zhyan@125.70.21.187) has joined #ceph
[8:03] * al (d@niel.cx) Quit (Ping timeout: 480 seconds)
[8:04] * al (d@2001:41d0:2:1cab::1) has joined #ceph
[8:05] * ShaunR (~ShaunR@staff.ndchost.com) Quit (Read error: Connection reset by peer)
[8:12] * bniver_ (~bniver@121.244.87.118) has joined #ceph
[8:12] * jcsp_ (~jspray@121.244.87.118) has joined #ceph
[8:12] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[8:14] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[8:15] * aiicore (~aiicore@s30.linuxpl.com) Quit (Remote host closed the connection)
[8:16] * karnan (~karnan@121.244.87.117) has joined #ceph
[8:17] * joshd (~jdurgin@125.16.34.66) Quit (Ping timeout: 480 seconds)
[8:17] * jcsp (~jspray@125.16.34.66) Quit (Ping timeout: 480 seconds)
[8:17] * bara (~bara@125.16.34.66) Quit (Ping timeout: 480 seconds)
[8:18] * bniver (~bniver@125.16.34.66) Quit (Ping timeout: 480 seconds)
[8:18] * abhishekvrshny (uid185733@id-185733.charlton.irccloud.com) has joined #ceph
[8:20] * joshd (~jdurgin@125.16.34.66) has joined #ceph
[8:20] * bara (~bara@125.16.34.66) has joined #ceph
[8:21] <ronrib> has anyone played around with cpu tuning? "cpupower frequency-set --governor performance" seems to reduce the latency during benchmarks
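    A quick sketch of the commands involved (cpupower ships with the kernel tools package; the sysfs path is the generic fallback):

        cpupower frequency-info                          # show the current governor and frequency limits
        cpupower frequency-set --governor performance    # pin all CPUs to the performance governor
        # equivalent via sysfs:
        echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor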
[8:22] * jclm (~jclm@77.95.96.78) has joined #ceph
[8:22] * jclm (~jclm@77.95.96.78) Quit (Remote host closed the connection)
[8:24] * praveen_ (~praveen@121.244.155.8) has joined #ceph
[8:27] * jcsp_ (~jspray@121.244.87.118) Quit (Ping timeout: 480 seconds)
[8:27] * aiicore (~aiicore@s30.linuxpl.com) has joined #ceph
[8:27] * bniver_ (~bniver@121.244.87.118) Quit (Ping timeout: 480 seconds)
[8:27] * jcsp_ (~jspray@125.16.34.66) has joined #ceph
[8:28] * bniver_ (~bniver@125.16.34.66) has joined #ceph
[8:33] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) has joined #ceph
[8:40] * ledgr (~ledgr@84.15.177.252) has joined #ceph
[8:45] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Ping timeout: 480 seconds)
[8:46] * yanzheng (~zhyan@125.70.21.187) Quit (Quit: ??????)
[8:46] * bara (~bara@125.16.34.66) Quit (Read error: Connection reset by peer)
[8:46] * jcsp_ (~jspray@125.16.34.66) Quit (Read error: Connection reset by peer)
[8:46] * joshd (~jdurgin@125.16.34.66) Quit (Read error: Connection reset by peer)
[8:47] * bara (~bara@125.16.34.66) has joined #ceph
[8:47] <singler_> ah, I was about to ping Zheng..
[8:47] * bniver (~bniver@125.16.34.66) has joined #ceph
[8:47] * jcsp (~jspray@125.16.34.66) has joined #ceph
[8:47] * joshd (~jdurgin@125.16.34.66) has joined #ceph
[8:49] * ledgr (~ledgr@84.15.177.252) Quit (Remote host closed the connection)
[8:50] * ledgr (~ledgr@84.15.177.252) has joined #ceph
[8:51] * ashah (~ashah@121.244.87.117) has joined #ceph
[8:52] * Hemanth (~hkumar_@121.244.87.117) has joined #ceph
[8:53] * bniver_ (~bniver@125.16.34.66) Quit (Ping timeout: 480 seconds)
[8:58] * ledgr (~ledgr@84.15.177.252) Quit (Ping timeout: 480 seconds)
[9:00] * joshd1 (~jdurgin@121.244.87.118) has joined #ceph
[9:01] * jcsp_ (~jspray@121.244.87.118) has joined #ceph
[9:01] * bniver_ (~bniver@121.244.87.118) has joined #ceph
[9:02] <singler_> is 10.2.3 planned to be released soon?
[9:03] * jcsp (~jspray@125.16.34.66) Quit (Read error: Connection reset by peer)
[9:04] * bniver (~bniver@125.16.34.66) Quit (Read error: Connection reset by peer)
[9:04] * joshd (~jdurgin@125.16.34.66) Quit (Ping timeout: 480 seconds)
[9:04] * bara (~bara@125.16.34.66) Quit (Read error: Connection reset by peer)
[9:08] * joshd1 (~jdurgin@121.244.87.118) Quit (Ping timeout: 480 seconds)
[9:09] * joshd (~jdurgin@125.16.34.66) has joined #ceph
[9:09] * bara (~bara@125.16.34.66) has joined #ceph
[9:10] * jcsp_ (~jspray@121.244.87.118) Quit (Ping timeout: 480 seconds)
[9:10] * bniver_ (~bniver@121.244.87.118) Quit (Ping timeout: 480 seconds)
[9:11] * jcsp_ (~jspray@125.16.34.66) has joined #ceph
[9:11] * bniver_ (~bniver@125.16.34.66) has joined #ceph
[9:14] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[9:17] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) Quit (Quit: Leaving.)
[9:18] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) has joined #ceph
[9:22] * sleinen1 (~Adium@2001:620:0:82::104) Quit (Ping timeout: 480 seconds)
[9:22] * dugravot61 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[9:23] * dugravot61 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) Quit ()
[9:25] * Bj_o_rn (~Yopi@tsn109-201-154-202.dyn.nltelcom.net) has joined #ceph
[9:25] * Bj_o_rn (~Yopi@tsn109-201-154-202.dyn.nltelcom.net) Quit (autokilled: This host may be infected. Mail support@oftc.net with questions. BOPM (2016-09-15 07:25:17))
[9:26] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) Quit (Ping timeout: 480 seconds)
[9:26] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[9:29] * sleinen (~Adium@2001:620:0:82::107) has joined #ceph
[9:32] * Racpatel (~Racpatel@2601:87:3:31e3:4e34:88ff:fe87:9abf) Quit (Ping timeout: 480 seconds)
[9:33] * fsimonce (~simon@host98-71-dynamic.1-87-r.retail.telecomitalia.it) has joined #ceph
[9:35] * toastydeath (~toast@pool-71-255-253-39.washdc.fios.verizon.net) Quit (Read error: Connection reset by peer)
[9:36] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) Quit (Quit: Leaving.)
[9:38] * jcsp_ (~jspray@125.16.34.66) Quit (Quit: Ex-Chat)
[9:38] * jcsp (~jspray@125.16.34.66) has joined #ceph
[9:46] * Hemanth (~hkumar_@121.244.87.117) Quit (Ping timeout: 480 seconds)
[9:46] * thomnico (~thomnico@2a01:e35:8b41:120:5c0b:4c86:f4c1:52ad) has joined #ceph
[9:47] * Dw_Sn (~Dw_Sn@00020a72.user.oftc.net) has joined #ceph
[9:49] * sleinen (~Adium@2001:620:0:82::107) Quit (Ping timeout: 480 seconds)
[9:50] * thomnico (~thomnico@2a01:e35:8b41:120:5c0b:4c86:f4c1:52ad) Quit ()
[9:51] * DanFoster (~Daniel@office.34sp.com) has joined #ceph
[9:51] * lmb (~Lars@nat.nue.novell.com) has joined #ceph
[9:56] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[10:01] * ade (~abradshaw@194.169.251.11) has joined #ceph
[10:01] * Nijikokun1 (~ggg@46.166.190.214) has joined #ceph
[10:03] * bniver_ (~bniver@125.16.34.66) Quit (Quit: Leaving)
[10:03] * bniver (~bniver@125.16.34.66) has joined #ceph
[10:08] * derjohn_mobi (~aj@x590cfca8.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[10:09] * LiamMon (~liam.monc@2.123.181.40) Quit (Quit: leaving)
[10:09] * LiamMon (~liam.monc@classical.moncur.eu) has joined #ceph
[10:10] * TMM (~hp@dhcp-077-248-009-229.chello.nl) Quit (Ping timeout: 480 seconds)
[10:19] * thomnico (~thomnico@2a01:e35:8b41:120:5c0b:4c86:f4c1:52ad) has joined #ceph
[10:21] * sleinen (~Adium@eduroam-2-104.epfl.ch) has joined #ceph
[10:23] * elder_ (sid70526@id-70526.charlton.irccloud.com) Quit (Read error: Connection reset by peer)
[10:23] * scubacuda (sid109325@0001fbab.user.oftc.net) Quit (Read error: Connection reset by peer)
[10:23] * elder_ (sid70526@id-70526.charlton.irccloud.com) has joined #ceph
[10:23] * scalability-junk (sid6422@id-6422.ealing.irccloud.com) Quit (Read error: Connection reset by peer)
[10:23] * scubacuda (sid109325@0001fbab.user.oftc.net) has joined #ceph
[10:23] * scalability-junk (sid6422@id-6422.ealing.irccloud.com) has joined #ceph
[10:25] * sleinen1 (~Adium@2001:620:0:82::108) has joined #ceph
[10:31] * sleinen (~Adium@eduroam-2-104.epfl.ch) Quit (Ping timeout: 480 seconds)
[10:31] * Nijikokun1 (~ggg@46.166.190.214) Quit ()
[10:34] * branto (~branto@178.253.167.12) has joined #ceph
[10:34] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) has joined #ceph
[10:36] * karnan (~karnan@121.244.87.117) Quit (Ping timeout: 480 seconds)
[10:42] * briner (~briner@2001:620:600:1000:70cc:216f:578c:e9ba) Quit (Read error: Connection reset by peer)
[10:42] * doppelgrau (~doppelgra@132.252.235.172) has joined #ceph
[10:48] * karnan (~karnan@125.16.34.66) has joined #ceph
[10:50] * TMM (~hp@185.5.121.201) has joined #ceph
[10:51] * briner (~briner@2001:620:600:1000:11c:c342:6be9:dab7) has joined #ceph
[10:51] * ledgr (~ledgr@84.15.177.252) has joined #ceph
[10:53] * CypressXt (~clement@205.37.194.178.dynamic.wline.res.cust.swisscom.ch) has joined #ceph
[10:53] * joshd (~jdurgin@125.16.34.66) Quit (Quit: Leaving.)
[10:53] <CypressXt> Hi
[10:54] <singler_> hi
[10:54] <Kvisle_> hi
[10:54] <CypressXt> may I be in the ceph irc chan ?
[10:55] * ledgr (~ledgr@84.15.177.252) Quit (Remote host closed the connection)
[10:55] <singler_> you may be :)
[10:56] <CypressXt> ok thx, I'm new here ^^
[10:56] * ledgr (~ledgr@84.15.177.252) has joined #ceph
[10:57] <CypressXt> Is it possible to ask any question about ceph there ?
[10:59] * joshd (~jdurgin@125.16.34.66) has joined #ceph
[10:59] <singler_> there is no need to ask permissions to be in channel or ask questions, just be and ask
[11:03] <CypressXt> ok, so I'm having some trouble with a freshly installed ceph cluster. (ceph version 10.2.2) (4 osd's) (1 mon)
[11:04] * ledgr (~ledgr@84.15.177.252) Quit (Ping timeout: 480 seconds)
[11:04] <CypressXt> the whole cluster freezes when an osd loses network access while I'm writing some data to the rbd image
[11:06] <CypressXt> I googled the issue without finding any proper answer :S, does anyone have an explanation for this ?
[11:06] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[11:06] <singler_> you have 4 osds or 4 osd servers?
[11:07] <CypressXt> 4 osd on 4 servers (1 osd per srv)
[11:07] * saintpablo (~saintpabl@gw01.mhitp.dk) has joined #ceph
[11:08] <singler_> what does "ceph -s" show during that freeze? Does it stay frozen while osd is down or does it recover after a short while. What about server load average, disk utilization?
[11:12] * derjohn_mobi (~aj@b2b-94-79-172-98.unitymedia.biz) has joined #ceph
[11:14] <CypressXt> ceph -s says:
[11:14] <CypressXt> health HEALTH_WARN
[11:14] <CypressXt> 31 requests are blocked > 32 sec
[11:14] <CypressXt> monmap e1: 1 mons at {n1=10.20.20.11:6789/0}
[11:14] <CypressXt> election epoch 5, quorum 0 n1
[11:14] <CypressXt> osdmap e105: 4 osds: 4 up, 4 in
[11:14] <CypressXt> flags sortbitwise
[11:14] <CypressXt> pgmap v984: 128 pgs, 1 pools, 68360 MB data, 17125 objects
[11:14] <CypressXt> 153 GB used, 590 GB / 744 GB avail
[11:14] <CypressXt> 128 active+clean
[11:14] <CypressXt> client io 6593 B/s wr, 0 op/s rd, 0 op/s wr
[11:15] <singler_> you should use pastebin.com (or alternative) for more than few lines
[11:15] <CypressXt> ok will do ;)
[11:16] <singler_> I suspect that your cluster is receiving too much load to handle it easily during failure
[11:16] <CypressXt> and the cluster stays frozen until the osd is marked down
[11:16] <singler_> what does "ceph health detail" show?
[11:17] <singler_> I guess it is small cluster problem. Mon waits for enough reports to kick OSD out
[11:18] <singler_> *down
[11:18] <singler_> you could decrease thresholds
[11:19] <CypressXt> HEALTH_WARN 33 requests are blocked > 32 sec; 3 osds have slow requests
[11:19] <CypressXt> 6 ops are blocked > 262.144 sec on osd.2
[11:19] <CypressXt> 18 ops are blocked > 262.144 sec on osd.1
[11:19] <CypressXt> 9 ops are blocked > 262.144 sec on osd.0
[11:19] <CypressXt> 3 osds have slow requests
[11:19] <Be-El> your cluster did not recognize that the osd is down, so requests are still sent to it
[11:19] <CypressXt> yep exactly
[11:20] <Be-El> there are some configuration settings for the monitors that need to be adjusted for small clusters with a low number of osds
[11:20] <Be-El> otherwise the monitors ignore the reports about missing osds
[11:21] <CypressXt> ok 0.0, where may I find these settings ?
[11:25] <Be-El> http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/
[11:25] <Be-El> not sure which of them needs to be adjusted
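    The settings from that page most often lowered for very small clusters are sketched below; the values are examples only (defaults differ per release), and the option names are taken from the Jewel docs linked above:

        [global]
        mon osd min down reporters = 1     # accept a single OSD's report that a peer is down
        mon osd report timeout = 300       # mark OSDs down sooner if they stop reporting (default 900)
        mon osd down out interval = 120    # how long a down OSD stays in before being marked out
        osd heartbeat grace = 20           # how long peers wait before reporting an OSD as failed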
[11:29] <CypressXt> ok, I'll try this
[11:29] <CypressXt> thx for help guys !
[11:30] * joshd (~jdurgin@125.16.34.66) Quit (Quit: Leaving.)
[11:32] * karnan (~karnan@125.16.34.66) Quit (Ping timeout: 480 seconds)
[11:37] * Gibri (~K3NT1S_aw@exit0.liskov.tor-relays.net) has joined #ceph
[11:37] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) has joined #ceph
[11:41] * karnan (~karnan@121.244.87.117) has joined #ceph
[11:41] * DanFoster (~Daniel@office.34sp.com) Quit (Read error: Connection reset by peer)
[11:41] * DanFoster (~Daniel@2a00:1ee0:3:1337:a4aa:d3cd:9afa:a548) has joined #ceph
[11:55] <CypressXt> does anyone know exactly what I need to change? I tried many changes in the ceph.conf file without success
[11:56] <CypressXt> here is my ceph.conf file: http://pastebin.com/gxGbvLmB
[12:03] <Be-El> CypressXt: did you also restart the monitor after changing the configuration file?
[12:03] <Be-El> CypressXt: you are also able to modify the configuration of a running instance by injecting new configuration settings
[12:05] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) Quit (Ping timeout: 480 seconds)
[12:05] <CypressXt> Yes, I restarted after changing the conf file
[12:06] <CypressXt> I changed the conf file on every node and restarted them all after this
[12:07] * Gibri (~K3NT1S_aw@26XAABX81.tor-irc.dnsbl.oftc.net) Quit ()
[12:08] <CypressXt> back soon, going to eat something.
[12:08] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[12:13] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[12:17] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) has joined #ceph
[12:18] * grauzikas (grauzikas@78-56-222-78.static.zebra.lt) Quit (Read error: No route to host)
[12:18] * grauzikas (grauzikas@78-56-222-78.static.zebra.lt) has joined #ceph
[12:18] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) Quit ()
[12:18] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) has joined #ceph
[12:20] * ZombieL (~WedTM@108.61.99.238) has joined #ceph
[12:24] * ashah (~ashah@121.244.87.117) Quit (Quit: Leaving)
[12:28] * ashah (~ashah@121.244.87.117) has joined #ceph
[12:35] * swami1 (~swami@49.38.2.110) has joined #ceph
[12:39] * bara (~bara@125.16.34.66) Quit (Remote host closed the connection)
[12:39] * karnan (~karnan@121.244.87.117) Quit (Ping timeout: 480 seconds)
[12:40] * bara (~bara@125.16.34.66) has joined #ceph
[12:45] * praveen__ (~praveen@121.244.155.8) has joined #ceph
[12:45] * praveen_ (~praveen@121.244.155.8) Quit (Read error: Connection reset by peer)
[12:49] * rendar (~I@host25-87-dynamic.22-79-r.retail.telecomitalia.it) has joined #ceph
[12:50] * ZombieL (~WedTM@108.61.99.238) Quit ()
[12:51] * karnan (~karnan@125.16.34.66) has joined #ceph
[12:53] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[12:55] * ErifKard (~ErifKard@MTRLPQ42-1176054809.sdsl.bell.ca) Quit (Remote host closed the connection)
[12:56] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) has joined #ceph
[12:58] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) Quit ()
[13:02] * jcsp (~jspray@125.16.34.66) Quit (Ping timeout: 480 seconds)
[13:02] <CypressXt> hi again
[13:06] * saintpablo (~saintpabl@gw01.mhitp.dk) Quit (Ping timeout: 480 seconds)
[13:11] * praveen__ (~praveen@121.244.155.8) Quit (Ping timeout: 480 seconds)
[13:13] <CypressXt> has anyone already run a small ceph cluster and has a working config ?
[13:14] * saintpablo (~saintpabl@gw01.mhitp.dk) has joined #ceph
[13:26] * karnan (~karnan@125.16.34.66) Quit (Ping timeout: 480 seconds)
[13:28] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[13:30] * saintpablo (~saintpabl@gw01.mhitp.dk) Quit (Ping timeout: 480 seconds)
[13:30] * sleinen1 (~Adium@2001:620:0:82::108) Quit (Ping timeout: 480 seconds)
[13:32] * praveen_ (~praveen@121.244.155.8) has joined #ceph
[13:35] * karnan (~karnan@121.244.87.117) has joined #ceph
[13:39] * bniver (~bniver@125.16.34.66) Quit (Remote host closed the connection)
[13:40] * wes_dillingham (~wes_dilli@140.247.242.44) has joined #ceph
[13:41] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) Quit (Ping timeout: 480 seconds)
[13:45] * thomnico (~thomnico@2a01:e35:8b41:120:5c0b:4c86:f4c1:52ad) Quit (Quit: Ex-Chat)
[13:45] <CypressXt> you can see here the issue: https://www.youtube.com/watch?v=WvJO9V3IMS0
[13:49] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[13:51] * vbellur (~vijay@71.234.224.255) Quit (Ping timeout: 480 seconds)
[13:53] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) has joined #ceph
[13:53] * Racpatel (~Racpatel@2601:87:3:31e3::77ec) has joined #ceph
[13:56] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[14:01] * vbellur (~vijay@nat-pool-bos-u.redhat.com) has joined #ceph
[14:03] * Lokta (~Lokta@193.164.231.98) has joined #ceph
[14:05] * srk (~Siva@2605:6000:ed04:ce00:c0c6:c24d:e75d:5fb0) has joined #ceph
[14:07] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:405b:e9e5:f08e:901e) has joined #ceph
[14:09] * ashah (~ashah@121.244.87.117) Quit (Quit: Leaving)
[14:14] * srk (~Siva@2605:6000:ed04:ce00:c0c6:c24d:e75d:5fb0) Quit (Ping timeout: 480 seconds)
[14:22] * Racpatel (~Racpatel@2601:87:3:31e3::77ec) Quit (Quit: Leaving)
[14:22] * Racpatel (~Racpatel@2601:87:3:31e3::77ec) has joined #ceph
[14:24] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) Quit (Quit: Leaving.)
[14:25] * neurodrone_ (~neurodron@162.243.191.67) has joined #ceph
[14:27] <nikbor> is there anything special about the first 63 megabytes of a device? doing experiments with trace-cmd and dd it seems that if i read the first 63 megabytes of a device nothing happens, e.g. i don't see the block_bio_queue event fire, if i read 64 megabytes i do see block activity ?
[14:27] * georgem (~Adium@24.114.76.35) has joined #ceph
[14:31] * diver (~diver@95.85.8.93) has joined #ceph
[14:32] * dan__ (~Daniel@office.34sp.com) has joined #ceph
[14:35] <singler_> wild guess - it gets cached due to readahead and partition reading?
[14:39] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[14:39] * georgem (~Adium@24.114.76.35) Quit (Ping timeout: 480 seconds)
[14:40] * DanFoster (~Daniel@2a00:1ee0:3:1337:a4aa:d3cd:9afa:a548) Quit (Ping timeout: 480 seconds)
[14:40] <nikbor> could be, but i use iflag=direct
[14:40] <nikbor> so bypassing page-cache and this only happens on the first 63 megs apparently
[14:41] * ledgr (~ledgr@84.15.177.252) has joined #ceph
[14:43] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) has joined #ceph
[14:44] * bara (~bara@125.16.34.66) Quit (Ping timeout: 480 seconds)
[14:45] <IcePic> something reading the bootblock/partition table, doing readahead and keeping 63M cached sounds reasonable
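    A sketch of the kind of experiment being described, assuming /dev/sdX is the device under test and trace-cmd has the block tracepoints available:

        # read the first 64 MiB, bypassing the page cache
        dd if=/dev/sdX of=/dev/null bs=1M count=64 iflag=direct
        # record block-layer queue events while the read runs
        trace-cmd record -e block:block_bio_queue dd if=/dev/sdX of=/dev/null bs=1M count=64 iflag=direct
        trace-cmd report | grep block_bio_queue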
[14:50] * vbellur (~vijay@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[14:51] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[14:52] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[14:53] * georgem (~Adium@206.108.127.16) has joined #ceph
[14:57] * ledgr (~ledgr@84.15.177.252) Quit (Remote host closed the connection)
[14:58] * ledgr (~ledgr@84.15.177.252) has joined #ceph
[14:59] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[15:01] <CypressXt> is there a minimal number of osds required by ceph to run properly ?
[15:03] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[15:06] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) has joined #ceph
[15:06] * ledgr (~ledgr@84.15.177.252) Quit (Ping timeout: 480 seconds)
[15:06] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[15:07] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[15:09] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[15:09] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[15:11] * dneary (~dneary@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[15:12] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[15:12] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[15:14] <doppelgrau> CypressXt: the minimum is size failure domains (default failure domain: host), better size+1
[15:16] <CypressXt> doppelgrau: thx for your reply, what do you mean exactly ?
[15:20] * swami1 (~swami@49.38.2.110) Quit (Read error: Connection reset by peer)
[15:22] <diver> I ran a test ceph setup on a single server
[15:22] <diver> with one osd
[15:22] <diver> so it works fine with one osd and a single replica, if it's only a test case of course
[15:25] <CypressXt> diver: ok well :), did you make any special configuration to run it smoothly ? Did you use ceph-deploy ?
[15:26] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[15:26] <diver> yes, ceph-deploy used
[15:26] <diver> set
[15:26] <diver> osd_pool_default_size = 1
[15:26] <diver> osd_pool_default_min_size = 1
[15:26] <diver> osd created via loop device
[15:27] <diver> dd if=/dev/zero of=/drives/osd1 bs=1M count=15360
[15:27] <diver> losetup /dev/loop0 /drives/osd1
[15:27] <diver> ceph-disk zap /dev/loop0
[15:27] <diver> ceph-deploy osd create ceph-test:/dev/loop0
[15:27] <diver> had to re-weight it
[15:27] <diver> ceph osd crush reweight osd.0 1
[15:27] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) Quit (Quit: Leaving.)
[15:27] <diver> after the installation first of all just make sure that you really have size=1
[15:27] <diver> ceph osd dump | grep pool
[15:28] <diver> on digitalocean with the new block storage feature
[15:28] <diver> that loop work-around isn't even required
[15:28] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) has joined #ceph
[15:29] <CypressXt> ok thx, I'm trying to use a 4 OSD setup without success. The cluster is healthy, but when an osd loses the network, the whole cluster is frozen.
[15:30] * salwasser (~Adium@72.246.3.14) has joined #ceph
[15:31] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[15:32] * squizzi (~squizzi@107.13.237.240) Quit (Quit: bye)
[15:32] * vata (~vata@96.127.202.136) Quit (Quit: Leaving.)
[15:34] * Pinkbyte (~oftc-webi@work.pinkbyte.ru) has joined #ceph
[15:34] <CypressXt> here is my ceph.conf file: http://pastebin.com/M0SYXDvy
[15:35] <CypressXt> and here the real time stuff: https://www.youtube.com/watch?v=WvJO9V3IMS0
[15:35] <Pinkbyte> Hi. I have a ceph cluster, version 0.80.11. During rbd flatten, the procedure got stuck at 99% and i see '4 requests are blocked > 32 sec' when i call ceph -s. Disk activity is minimal, so i suppose no data is moved. I have waited around 30 minutes, nothing changed. Any ideas?
[15:35] * tsg (~tgohad@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[15:36] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[15:37] * rraja (~rraja@121.244.87.117) has joined #ceph
[15:37] * srk (~Siva@2605:6000:ed04:ce00:a9b6:934d:fbb:5436) has joined #ceph
[15:41] <CypressXt> Hi Pinkbyte, what does "ceph health detail" show ?
[15:42] <diver> CypressXt: I guess you should closely look at http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/ and lower all the timeouts and heartbeats. even with the osd mon heartbeat interval your unplugged OSD stayed up (ceph -s)
[15:43] <diver> so then you see slow requests and so on
[15:43] <diver> the faster ceph detects that the osd is down, the sooner it will redirect requests to the next available OSDs
[15:44] <singler_> mon osd down out interval
[15:44] <singler_> mon osd min down reports
[15:44] <diver> true
[15:44] <diver> it's not set in that ceph.conf
[15:44] <diver> so it's the default 300 seconds
[15:44] <singler_> I guess these params would help
[15:44] <diver> singler_: I agree with you
[15:44] <CypressXt> ok thx guys, I'll try this :)
[15:45] <CypressXt> right now
[15:45] <Pinkbyte> CypressXt: HEALTH_ERR 1 pgs inconsistent; 4 requests are blocked > 32 sec; 1 scrub errors
[15:45] <Pinkbyte> CypressXt: 4 ops are blocked > 4194.3 sec
[15:45] <singler_> are your disks healthy?
[15:46] <singler_> you can try bouncing osds where requests are blocked
[15:46] <Pinkbyte> smartctl do not show any problems
[15:46] <Pinkbyte> singler_: you mean send it offline ?
[15:46] <singler_> restart osd daemon
[15:47] <singler_> inconsistent pgs mean that data on primary and replica differs
[15:47] <Pinkbyte> singler_: okay, i did not try that
[15:47] <singler_> if it is test cluster just do pg repair, if it is prod, you need to figure out which one has bad data
[15:48] <singler_> (I do not think that inconsistent pgs block writes)
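    A hedged sketch of the repair path being described (the pg id 3.f is hypothetical, taken from whatever ceph health detail reports):

        ceph health detail      # lists e.g. "pg 3.f is active+clean+inconsistent, acting [0,2]"
        ceph pg repair 3.f      # schedule a repair of that pg

    On Jewel and later, rados list-inconsistent-obj <pgid> can also show which copy differs before repairing.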
[15:48] * ledgr (~ledgr@88-222-11-185.meganet.lt) has joined #ceph
[15:48] * EinstCrazy (~EinstCraz@61.165.253.98) has joined #ceph
[15:48] * walcubi_ is now known as walcubi
[15:49] <Pinkbyte> singler_: hm, how can i detect which osd holds the blocked requests? ceph health detail did not say either the pg id or the osd id, it just says the count of operations and how long they have been blocked
[15:49] <Pinkbyte> are you suggesting to sequentially restart all osd daemons in the cluster (with some delay, of course)?
[15:51] * srk (~Siva@2605:6000:ed04:ce00:a9b6:934d:fbb:5436) Quit (Ping timeout: 480 seconds)
[15:55] <Pinkbyte> singler_: it helps, thanks. Will try to flatten the image one more time. By the way, is the 0.80.* branch still supported? I can not update the cluster to the latest version available in my distro (which is 9.2.4) due to some hardware restrictions, but if i use an unsupported version, probably it is time to dig into some deep shit and try to update it somehow
[15:56] <singler_> did you resolve the blocked ops? ceph health detail shows which osds have them (you can also check ceph.log)
[15:56] * ledgr (~ledgr@88-222-11-185.meganet.lt) Quit (Ping timeout: 480 seconds)
[15:56] <singler_> "23 ops are blocked > 32.768 sec on osd.71"
[15:56] <Pinkbyte> singler_: yeah, after restart they are gone
[15:57] <singler_> I have such lines on loaded cluster
[15:57] <singler_> if you cannot go for 9 or 10, you can try 0.94
[15:57] <singler_> and I think 0.80 is not supported anymore
[15:57] <singler_> 0.94 is almost done too
[15:58] * karnan (~karnan@121.244.87.117) Quit (Ping timeout: 480 seconds)
[15:58] <singler_> http://ceph.com/releases/v0-80-11-firefly-released/
[15:58] <singler_> This is a bugfix release for Firefly. As the Firefly 0.80.x series is nearing its planned end of life in January 2016 it may also be the last.
[15:59] * rraja (~rraja@121.244.87.117) Quit (Quit: Leaving)
[16:00] <Pinkbyte> singler_: rbd flatten completes successfully without any stuck requests. Will count this as some weird bug in my config
[16:00] <Pinkbyte> yeah, really need to update it
[16:02] * mhack (~mhack@nat-pool-bos-t.redhat.com) has joined #ceph
[16:03] * haplo37 (~haplo37@199.91.185.156) has joined #ceph
[16:13] * ade (~abradshaw@194.169.251.11) Quit (Ping timeout: 480 seconds)
[16:14] * andreww (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:14] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[16:14] <CypressXt> singler_ & diver: ok, I changed the ceph.conf file as you said, sadly without any change. Same issue when unplugging the network cable. Any other suggestion ?
[16:15] <CypressXt> here is the new config file: http://pastebin.com/U3NmPmfF
[16:18] <CypressXt> as shown here, the osds loaded the new config file: http://pastebin.com/RdRNqVhx
[16:21] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[16:22] * squizzi (~squizzi@2001:420:2240:1268:ad85:b28:ee1c:890) has joined #ceph
[16:23] <diver> CypressXt: on the first test - did you wait for a while? does it get back on track after a while or does it stay stuck completely for more than 10-15 mins?
[16:23] <Be-El> CypressXt: osds are not responsible for marking osds as down. you need to update the monitors
[16:24] <CypressXt> diver: yes, with 900 sec (the value of "mon osd report timeout"), but the cluster is stuck for those 900 sec without any possible IO
[16:25] <CypressXt> Be-El: ok, how can I do that ?
[16:26] <diver> but not forever. that's the point. 900 secs.. check the mon daemons' configs to make sure the changes were applied
[16:26] * vikhyat (~vumrao@49.248.206.76) Quit (Quit: Leaving)
[16:28] <diver> decrease mon osd report timeout
[16:28] <diver> I dont see it in your updated ceph.conf
[16:28] <diver> http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/
[16:29] <Be-El> CypressXt: ceph tell mon.* injectargs "--whatever=something"
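    A concrete usage sketch of that injectargs form (the option and value are only an example; as noted further down, some options such as mon_osd_report_timeout only take effect from ceph.conf at monitor start):

        ceph tell mon.* injectargs '--mon-osd-down-out-interval 120'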
[16:30] * andreww (~xarses@64.124.158.3) has joined #ceph
[16:30] <CypressXt> diver: not sure I understand what you mean; is it normal to have a freeze time? If I set the "mon osd report timeout" to 5 sec, will the cluster freeze for 5 sec ?
[16:30] <CypressXt> Be-El: thx :)
[16:30] <diver> yes
[16:30] <diver> # ceph daemon /var/run/ceph/ceph-mon.*.asok config show |grep "mon_osd_report_timeout"
[16:30] <diver> "mon_osd_report_timeout": "900",
[16:31] <diver> mon_osd_report_timeout can't be changed on the fly
[16:31] <diver> you should put the value in the ceph.conf
[16:31] <diver> and restart mons
[16:31] <diver> after the restart make sure that it have changed
[16:32] <diver> and repeat the test
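    The workflow diver describes, as a rough sketch (the value and restart command are examples; the restart command depends on the init system and mon id):

        # in ceph.conf on each mon host:
        [global]
        mon osd report timeout = 60

        systemctl restart ceph-mon@<mon-id>      # or the distro's service equivalent
        ceph daemon /var/run/ceph/ceph-mon.*.asok config show | grep mon_osd_report_timeout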
[16:32] <CypressXt> diver: ok, I'm writing it, testing asap :). Thanks
[16:32] <CypressXt> diver: ok so the freeze time can't be avoided ?
[16:33] <diver> completely? with a small number of OSDs - I'm afraid not
[16:33] <diver> but lowering timeouts will help
[16:33] <diver> it will not give an IO error, just a small freeze
[16:34] <diver> on bigger clusters with many many osds the problem doesn't hurt that much, as load is spread over all the OSDs
[16:34] <diver> so fewer requests get stuck during the failure
[16:34] <diver> and there's less effect on the clients
[16:35] <CypressXt> diver: whoa, didn't notice this anywhere in the docs 0.0. Did I miss it ?
[16:36] <diver> I haven't seen that in the docs either. that's what I see in my cluster
[16:36] * vbellur (~vijay@nat-pool-bos-u.redhat.com) has joined #ceph
[16:37] <diver> until OSD is 'up' - ceph will forward requests to it
[16:37] <diver> and requests will stuck
[16:37] <rkeene> I think you mean "down"
[16:37] <diver> until the OSD gets marked down, or replies to/processes the request
[16:39] <shubjero> Has anyone ever seen extremely high commit and apply latency for a select few osds? I'm talking hours of latency instead of milliseconds/seconds. The disks seem to have good throughput and there is no perceivable slowness on the particular osds as far as I can tell. Running Hammer..
[16:39] <singler_> also on larger cluster osds can report other osds as down
[16:40] * Inverness (~TGF@46.166.138.160) has joined #ceph
[16:40] <singler_> CypressXt: try running few tests in parallel. If there will be more ops, maybe other osds will notice that someone is down and report to mon. But your scale is very small, so not sure
[16:42] <Be-El> there's nothing like 'ceph'....you have clients, mons and osds. clients are talking to osds, and keep on sending requests to unavailable osds until either the tcp connection is terminated, or a mon reports the osd as down
[16:42] <shubjero> Here's some snipped output from ceph osd perf: http://pastebin.com/NYyX1mQ5
[16:42] * tsg (~tgohad@jfdmzpr05-ext.jf.intel.com) Quit (Remote host closed the connection)
[16:43] <Be-El> shubjero: commit latency is writing to journal afaik. do you use a ssd which is dying?
[16:43] * srk (~Siva@32.97.110.55) has joined #ceph
[16:44] * EinstCrazy (~EinstCraz@61.165.253.98) Quit (Remote host closed the connection)
[16:45] * kristen (~kristen@134.134.139.82) has joined #ceph
[16:51] <shubjero> Be-El: We're using a small partition on each osd for journalling, not a separate ssd
[16:54] <Be-El> shubjero: hdds as osd or ssds?
[16:55] <shubjero> hdd's
[16:57] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[16:58] * cholcombe (~chris@c-73-180-29-35.hsd1.or.comcast.net) has joined #ceph
[16:58] <shubjero> 4TB
[16:58] <Be-El> shubjero: any message in kernel log or some warnings in smartctl?
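    A sketch of the checks being discussed, assuming /dev/sdX is the disk behind the slow OSD:

        ceph osd perf                       # per-OSD commit/apply latency (ms)
        iostat -x 1 /dev/sdX                # watch await and %util on the backing disk
        smartctl -a /dev/sdX                # look for reallocated/pending sectors
        dmesg | grep -i 'sdX\|i/o error'    # kernel-level errors, if any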
[16:59] <CypressXt> This topic raises over 9000 questions per line for me XD. Correct me if I'm wrong but, if an osd is down at the network level, all the client requests to it are stuck for the duration of the "mon osd report timeout" value ? But in a huge cluster, other osds report the down one before the 900 sec "mon osd report timeout". Is that right ?
[17:00] <CypressXt> sorry for my english by the way ^^
[17:01] <Be-El> CypressXt: yes, as far as i understand it. the mons are able to report osd as down by themselves (with a much larger timeout), or osds are reporting other osds as down to the mons
[17:02] * Skaag (~lunix@cpe-172-91-77-84.socal.res.rr.com) Quit (Quit: Leaving.)
[17:02] * lmb (~Lars@nat.nue.novell.com) Quit (Ping timeout: 480 seconds)
[17:04] <shubjero> Be-El: both seem pretty clean
[17:04] <shubjero> nothing sticks out
[17:04] <doppelgrau> CypressXt: http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/
[17:04] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[17:04] <Be-El> CypressXt: the main difference is the network. osds use heartbeats on the internal/cluster network, while the mons use the client network
[17:04] * salwasser (~Adium@72.246.3.14) Quit (Quit: Leaving.)
[17:05] * salwasser (~Adium@72.246.3.14) has joined #ceph
[17:05] * vata (~vata@207.96.182.162) has joined #ceph
[17:10] * schegi_ (~schegi@81.169.147.212) Quit (Read error: No route to host)
[17:10] * Inverness (~TGF@46.166.138.160) Quit ()
[17:11] <CypressXt> ok this seems to make sense, but then why don't my mon or the 3 other osds declare the down osd as down ? aren't 3 live osds and 1 mon enough ?
[17:15] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[17:19] * theghost99 (~airsoftgl@37.203.209.10) has joined #ceph
[17:20] * branto (~branto@178.253.167.12) Quit (Ping timeout: 480 seconds)
[17:25] * tsg (~tgohad@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[17:25] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[17:26] * branto (~branto@178.253.167.12) has joined #ceph
[17:27] * Nicho1as (~nicho1as@00022427.user.oftc.net) has joined #ceph
[17:27] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[17:31] * TMM (~hp@185.5.121.201) Quit (Quit: Ex-Chat)
[17:40] * branto (~branto@178.253.167.12) Quit (Quit: Leaving.)
[17:41] * Pinkbyte (~oftc-webi@work.pinkbyte.ru) Quit (Quit: Page closed)
[17:45] * Skaag (~lunix@65.200.54.234) has joined #ceph
[17:45] * Skaag (~lunix@65.200.54.234) Quit ()
[17:45] * Skaag (~lunix@65.200.54.234) has joined #ceph
[17:48] * toastydeath (~toast@pool-71-255-253-39.washdc.fios.verizon.net) has joined #ceph
[17:49] * theghost99 (~airsoftgl@37.203.209.10) Quit ()
[17:56] <Be-El> which latency from an osd perf dump reflects the client latency? commitcycle? apply? osd write/read?
[17:57] * tsg (~tgohad@jfdmzpr05-ext.jf.intel.com) Quit (Remote host closed the connection)
[18:00] * tsg (~tgohad@134.134.137.75) has joined #ceph
[18:03] <cholcombe> my mds is crashing with: wanted state up:boot. Anyone know what cephx key params will give it that? I gave it rwx on mon,osd and mds haha
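    For comparison, the caps an MDS key normally carries on Jewel look roughly like the following (the mds name "a" and keyring path are examples):

        ceph auth get-or-create mds.a \
            mon 'allow profile mds' mds 'allow *' osd 'allow rwx' \
            -o /var/lib/ceph/mds/ceph-a/keyring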
[18:03] * mattch (~mattch@w5430.see.ed.ac.uk) Quit (Remote host closed the connection)
[18:05] * ntpttr_ (~ntpttr@192.55.54.43) has joined #ceph
[18:06] * doppelgrau (~doppelgra@132.252.235.172) Quit (Quit: Leaving.)
[18:10] * Dw_Sn (~Dw_Sn@00020a72.user.oftc.net) Quit (Quit: leaving)
[18:11] * TMM (~hp@dhcp-077-248-009-229.chello.nl) has joined #ceph
[18:14] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[18:17] * borei1 (~dan@216.13.217.230) has joined #ceph
[18:20] * mykola (~Mikolaj@91.245.74.66) has joined #ceph
[18:22] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[18:25] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[18:29] * ntpttr_ (~ntpttr@192.55.54.43) Quit (Remote host closed the connection)
[18:31] * vbellur (~vijay@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[18:37] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[18:38] * rakeshgm (~rakesh@106.51.28.220) has joined #ceph
[18:42] * dneary (~dneary@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[18:43] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[18:45] * sleinen (~Adium@194.230.159.148) has joined #ceph
[18:48] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) has joined #ceph
[18:50] * rakeshgm (~rakesh@106.51.28.220) Quit (Quit: Leaving)
[18:50] * rakeshgm (~rakesh@106.51.28.220) has joined #ceph
[18:53] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[18:54] <cetex> so, got new R730-XD node today
[18:54] <cetex> installed and set it up with one osd per hdd
[18:54] <cetex> so 16 osd's.
[18:55] * dan__ (~Daniel@office.34sp.com) Quit (Quit: Leaving)
[18:56] <cetex> it seems quite a bit faster when checking metrics and stuff, we're gonna let the cluster rebalance a bit and then start rebuilding the other nodes to split them up (they're running 16 disks in raid-6 currently, optimized for gluster)
[18:59] * starcoder (~Maza@108.61.123.86) has joined #ceph
[18:59] * sleinen (~Adium@194.230.159.148) Quit (Ping timeout: 480 seconds)
[19:00] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) Quit (Quit: Leaving.)
[19:03] * tsg_ (~tgohad@192.55.54.42) has joined #ceph
[19:04] * thomnico (~thomnico@2a01:e35:8b41:120:d16f:9087:ec5b:969f) has joined #ceph
[19:05] * TomasCZ (~TomasCZ@yes.tenlab.net) has joined #ceph
[19:07] * thansen (~thansen@17.253.sfcn.org) has joined #ceph
[19:07] <TMM> Well, now I know what happens when an osd reaches 95%
[19:07] * davidzlap (~Adium@2605:e000:1313:8003:c9ce:633a:915e:1395) has joined #ceph
[19:08] * sudocat1 (~dibarra@192.185.1.20) has joined #ceph
[19:08] * tsg (~tgohad@134.134.137.75) Quit (Remote host closed the connection)
[19:09] * sudocat (~dibarra@192.185.1.20) Quit (Read error: Connection reset by peer)
[19:10] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[19:17] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[19:18] * thansen (~thansen@17.253.sfcn.org) Quit (Quit: Ex-Chat)
[19:19] * thansen (~thansen@17.253.sfcn.org) has joined #ceph
[19:29] * Nicho1as (~nicho1as@00022427.user.oftc.net) Quit (Quit: A man from the Far East; using WeeChat 1.5)
[19:29] * starcoder (~Maza@26XAABYK7.tor-irc.dnsbl.oftc.net) Quit ()
[19:35] <lincolnb> are multiple pools in cephfs via file layouts stable/supported in Hammer and beyond? I have some data I'd like to move from an EC+Cache Tier pool to just a replicated pool
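    A hedged sketch of pointing new data at a different pool via file layouts (the pool and path names are examples; existing files keep their old layout and have to be copied to pick up the new pool):

        ceph osd pool create cephfs_replicated 128
        ceph mds add_data_pool cephfs_replicated      # Jewel also accepts: ceph fs add_data_pool <fs> <pool>
        setfattr -n ceph.dir.layout.pool -v cephfs_replicated /mnt/cephfs/some_dir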
[19:37] * sleinen (~Adium@194.230.159.148) has joined #ceph
[19:37] * praveen_ (~praveen@121.244.155.8) Quit (Remote host closed the connection)
[19:40] * sleinen1 (~Adium@2001:620:0:82::101) has joined #ceph
[19:45] * sleinen (~Adium@194.230.159.148) Quit (Ping timeout: 480 seconds)
[19:51] <diver> cetex: can you share the 730xd spec?
[19:51] <diver> I was planning to use them as well
[19:51] <diver> cpu\memory\ssd(?)\hdds?
[19:52] * salwasser (~Adium@72.246.3.14) Quit (Quit: Leaving.)
[19:55] * rakeshgm (~rakesh@106.51.28.220) Quit (Quit: Peace :))
[19:58] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[20:00] * thomnico (~thomnico@2a01:e35:8b41:120:d16f:9087:ec5b:969f) Quit (Ping timeout: 480 seconds)
[20:00] <lincolnb> diver: FWIW, we're using R730XDs as well. We've got 14x8TB disks (HGST He8) in JBOD via the H730 controller, 2x 200GB SSDs, 96GB RAM, 10Gbps (x520-da2), and 2x E5-2650v3's per machine
[20:00] * sudocat (~dibarra@192.185.1.20) Quit ()
[20:01] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[20:01] * sudocat1 (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[20:01] <lincolnb> also 2x1TB disks in a RAID-1 in the flex bay for the OS
[20:02] <diver> lincolnb: thanks for sharing! almost the same spec as I had in mind. does the H730 support JBOD? the Dell systems I have now support only RAID0
[20:03] <lincolnb> yep. you can just do a 'omconfig storage controller action=convertRAIDtoNonRAID pdisk=...' and it should JBOD them
[20:03] <diver> great. which OS are you using?
[20:03] <lincolnb> the CPU/RAM are a bit overkill but we're also using these nodes as part of our batch system so we overprovisioned a bit
[20:04] <lincolnb> right now Scientific Linux 6 (i.e., RHEL 6), but we want to move to 7 for Jewel
[20:04] <cetex> diver: no ssd's in this case, 1xE5-2603, 16GB ram, PERC H730 mini currently in the single-disk raid0
[20:04] <cetex> 2xintel X520-da2
[20:05] * dougf (~dougf@75-131-32-223.static.kgpt.tn.charter.com) has joined #ceph
[20:05] <diver> thanks. what are you using to provision the OS?
[20:06] <cetex> it's a bit on the lower end, but it seems to work quite well. the new node with 16 osd's would probably like another 8GB ram or so to have some headroom
[20:06] <diver> and what is the mon server spec? is it different?
[20:06] <lincolnb> we just use cobbler to manage DHCP/PXE for nodes, and puppet to configure
[20:06] <cetex> We actually run mon on the same nodes as we have osd's.
[20:07] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[20:07] <cetex> since we only have 6 nodes here and it was quite rushed to get stuff running we just installed ubuntu trusty on all of them manually
[20:07] <cetex> but we have the pxe setup for all other nodes i'm thinking about running on them
[20:07] <cetex> although, that requires more ram, so would probably have to upgrade to 32GB by then
[20:07] <lincolnb> we run our mons on VMs. 4 CPU / 16GB RAM is pretty good (we have about 450 OSDs)
[20:08] * dougf (~dougf@75-131-32-223.static.kgpt.tn.charter.com) Quit ()
[20:08] <cetex> In that case we'd probably run ceph-osd inside docker, and just mount the osd-directory into the container.
[20:08] * dougf (~dougf@75-131-32-223.static.kgpt.tn.charter.com) has joined #ceph
[20:08] <cetex> (with --net=host and stuff, just using container as a chroot basically)
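    A rough sketch of the container-as-chroot approach cetex describes, assuming an image that already has the ceph packages installed (the image name is hypothetical):

        docker run -d --net=host --pid=host --privileged \
            -v /etc/ceph:/etc/ceph \
            -v /var/lib/ceph/osd/ceph-0:/var/lib/ceph/osd/ceph-0 \
            -v /dev:/dev \
            some-ceph-image \
            /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph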
[20:09] * tsg_ (~tgohad@192.55.54.42) Quit (Remote host closed the connection)
[20:10] <diver> ok.. thanks for sharing this.
[20:10] * ledgr (~ledgr@client-87-247-92-99.inturbo.lt) has joined #ceph
[20:10] <diver> on the test cluster I used razor to set up the OS and ansible-ceph to set up the software
[20:10] <diver> works well huh
[20:11] <diver> jewel\centos7
[20:11] * rendar (~I@host25-87-dynamic.22-79-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[20:12] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[20:13] <lincolnb> if you're planning to use CephFS, i'd get lots of RAM for the MDS as well for inode caching
[20:13] * tsg_ (~tgohad@134.134.137.75) has joined #ceph
[20:14] <diver> yeah.. I'm doing tests with enhanceIO right now to get fewer seeks
[20:15] <diver> with 99.99% write rate - enhanceIO shows 3-4% read hit rate (ro mode)
[20:15] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[20:16] <diver> has anyone tested bluestore already?
[20:16] <diver> I tested it recently and ran into OSD crashes when the cluster reached a few million objects
[20:16] <diver> and the osd fsck took ages before it started back up
[20:18] * ledgr (~ledgr@client-87-247-92-99.inturbo.lt) Quit (Remote host closed the connection)
[20:18] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[20:21] * vbellur (~vijay@nat-pool-bos-u.redhat.com) has joined #ceph
[20:23] * lcurtis_ (~lcurtis@47.19.105.250) has joined #ceph
[20:33] * derjohn_mobi (~aj@b2b-94-79-172-98.unitymedia.biz) Quit (Ping timeout: 480 seconds)
[20:37] * rendar (~I@host25-87-dynamic.22-79-r.retail.telecomitalia.it) has joined #ceph
[20:39] * praveen (~praveen@122.172.122.231) has joined #ceph
[20:39] * praveen (~praveen@122.172.122.231) Quit (Remote host closed the connection)
[20:39] * praveen (~praveen@122.172.122.231) has joined #ceph
[20:44] * Sirrush (~Jebula@tor-exit.squirrel.theremailer.net) has joined #ceph
[20:46] * andreww (~xarses@64.124.158.3) Quit (Ping timeout: 480 seconds)
[20:49] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Remote host closed the connection)
[20:50] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[20:58] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Ping timeout: 480 seconds)
[21:00] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[21:02] * derjohn_mobi (~aj@x590cfca8.dyn.telefonica.de) has joined #ceph
[21:03] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[21:14] * Sirrush (~Jebula@635AAAMJ4.tor-irc.dnsbl.oftc.net) Quit ()
[21:19] * zack_dolby (~textual@p845d32.tokynt01.ap.so-net.ne.jp) has joined #ceph
[21:20] * tsg_ (~tgohad@134.134.137.75) Quit (Remote host closed the connection)
[21:20] * KindOne (kindone@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[21:21] * KindOne (~KindOne@h83.224.28.71.dynamic.ip.windstream.net) has joined #ceph
[21:23] * Cue (~thundercl@tsn109-201-154-152.dyn.nltelcom.net) has joined #ceph
[21:28] * DV_ is now known as DV
[21:32] * squizzi (~squizzi@2001:420:2240:1268:ad85:b28:ee1c:890) Quit (Ping timeout: 480 seconds)
[21:32] <rkeene> Is there a way to make ceph status report HEALTH_OK with nobackfill and norecover set ?
[21:33] * tsg_ (~tgohad@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[21:34] * debian112 (~bcolbert@c-73-184-103-26.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[21:37] * Lokta (~Lokta@193.164.231.98) Quit (Ping timeout: 480 seconds)
[21:40] <TMM> rkeene, no, any flags give a HEALTH_WARN
[21:40] * wes_dillingham (~wes_dilli@140.247.242.44) Quit (Ping timeout: 484 seconds)
[21:40] * ntpttr_ (~ntpttr@134.134.139.83) has joined #ceph
[21:40] <rkeene> TMM, That's not true -- the "sortbitwise" flag does not.
[21:43] * vbellur (~vijay@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[21:44] * doppelgrau1 (~doppelgra@132.252.235.172) has joined #ceph
[21:44] * dneary (~dneary@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[21:45] * debian112 (~bcolbert@c-73-184-103-26.hsd1.ga.comcast.net) has joined #ceph
[21:49] * Skaag (~lunix@65.200.54.234) Quit (Quit: Leaving.)
[21:52] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[21:53] <SamYaple> rkeene: no, those flags need to be on to be considered healthy. sortbitwise is needed for backward compat i believe. and i think there is another one that can be set (or unset) that won't throw HEALTH_WARN
[21:53] * Cue (~thundercl@635AAAMKZ.tor-irc.dnsbl.oftc.net) Quit ()
[21:54] * sudocat1 (~dibarra@192.185.1.20) has joined #ceph
[21:55] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[21:56] * Skaag (~lunix@65.200.54.234) has joined #ceph
[21:58] * xarses (~xarses@nat4.460b.weebly.net) has joined #ceph
[21:59] * srk (~Siva@32.97.110.55) Quit (Ping timeout: 480 seconds)
[21:59] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[22:01] * srk (~Siva@32.97.110.55) has joined #ceph
[22:05] * marco208 (~root@159.253.7.204) Quit (Remote host closed the connection)
[22:05] * marco208 (~root@159.253.7.204) has joined #ceph
[22:11] <rkeene> SamYaple, I want to set norebalance during a reboot procedure (here, for testing) but my reboot procedure won't reboot nodes until the cluster gets back to HEALTH_OK. If I change it to proceed on HEALTH_WARN I have to do a lot of work to figure out whether that HEALTH_WARN is an acceptable one
[22:12] * Racpatel (~Racpatel@2601:87:3:31e3::77ec) Quit (Quit: Leaving)
[22:19] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Remote host closed the connection)
[22:19] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[22:21] <SamYaple> rkeene: i mean if you think about it, with noout/norecover/nobackfill set the cluster _isn't_ healthy
[22:21] <SamYaple> rkeene: it won't recover
[22:21] <SamYaple> rkeene: i think HEALTH_WARN is appropriate
[22:21] * georgem (~Adium@206.108.127.16) Quit (Quit: Leaving.)
[22:21] * xarses (~xarses@nat4.460b.weebly.net) Quit (Remote host closed the connection)
[22:21] <SamYaple> if you want to ignore it, you should be the one to work that logic in, in my opinion
[22:22] <rkeene> In general it isn't healthy, in this case it is.
[22:22] <rkeene> Because I set it, do a thing that's fine to do with it (maybe -- I want to test it), then unset it
[22:22] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving.)
[22:22] <rkeene> Me working out the logic is going to be much harder
[22:23] <SamYaple> so accept the state as healthy, then do your things and ignore the state, then unset the options you set and start monitoring the state again? that's what it sounds like you should do anyway
[22:23] <rkeene> Well, I want to know if it is healthy aside from the things that I set
[22:24] <SamYaple> i hear you, but you are relying on HEALTH_OK for something it isn't meant for
[22:24] <rkeene> In this case, I want to reboot a node containing OSDs, without causing a rebalance during the time it takes to reboot (which could be hours)
[22:24] <SamYaple> you should be parsing _why_ the HEALTH_WARN exists and determining what to do with that info
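A rough sketch of what SamYaple is suggesting: rather than waiting for HEALTH_OK, parse the health summary and treat only the warnings produced by flags you set yourself (noout, norebalance, nobackfill, norecover) as acceptable. The JSON layout and the "flag(s) set" wording below follow the Jewel-era output and are assumptions to check against your release; this is an illustration, not a tested production gate.

    # Return True if the cluster is HEALTH_OK, or is HEALTH_WARN only because
    # of flags we set ourselves before the reboot procedure.
    import json
    import subprocess

    EXPECTED_FLAGS = {"noout", "norebalance", "nobackfill", "norecover"}

    def healthy_ignoring_our_flags():
        out = subprocess.check_output(["ceph", "status", "--format", "json"])
        health = json.loads(out)["health"]
        if health.get("overall_status") == "HEALTH_OK":
            return True
        for item in health.get("summary", []):
            msg = item.get("summary", "")
            # e.g. "noout,norebalance flag(s) set" -- ours, so ignore it
            if msg.endswith("flag(s) set") and \
                    set(msg.split()[0].split(",")) <= EXPECTED_FLAGS:
                continue
            return False            # any other warning: not safe to proceed
        return True

    if healthy_ignoring_our_flags():
        print("cluster healthy (ignoring our own flags); reboot next node")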
[22:27] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Ping timeout: 480 seconds)
[22:28] * doppelgrau_ (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) has joined #ceph
[22:28] * xarses (~xarses@nat4.460b.weebly.net) has joined #ceph
[22:30] * diver_ (~diver@216.85.162.34) has joined #ceph
[22:31] * diver__ (~diver@216.85.162.34) has joined #ceph
[22:34] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Ping timeout: 480 seconds)
[22:34] * doppelgrau_ is now known as doppelgrau
[22:37] * tdb_ (~tdb@myrtle.kent.ac.uk) has joined #ceph
[22:37] * tdb (~tdb@myrtle.kent.ac.uk) Quit (Remote host closed the connection)
[22:37] * diver (~diver@95.85.8.93) Quit (Ping timeout: 480 seconds)
[22:38] * diver_ (~diver@216.85.162.34) Quit (Ping timeout: 480 seconds)
[22:39] * diver__ (~diver@216.85.162.34) Quit (Ping timeout: 480 seconds)
[22:41] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[22:49] * sickology (~mio@vpn.bcs.hr) Quit (Remote host closed the connection)
[22:50] * mykola (~Mikolaj@91.245.74.66) Quit (Quit: away)
[22:56] * thansen (~thansen@17.253.sfcn.org) Quit (Quit: Ex-Chat)
[22:57] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:58] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[22:58] * xarses (~xarses@nat4.460b.weebly.net) Quit (Ping timeout: 480 seconds)
[23:00] <rkeene> :-/
[23:01] <rkeene> 1 ops are blocked > 1048.58 sec on osd.0 (fresh cluster, almost no data -- 1 RBD with 1 file)
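When an "ops are blocked" warning like that appears, one common next step is to ask the affected OSD for its in-flight operations over the admin socket. A small sketch, assuming the default admin socket setup and the osd.0 named in the message above:

    # Print the operations currently stuck on an OSD, using the standard
    # "ceph daemon osd.N dump_ops_in_flight" admin socket command.
    import json
    import subprocess

    def dump_blocked_ops(osd_id=0):
        out = subprocess.check_output(
            ["ceph", "daemon", "osd." + str(osd_id), "dump_ops_in_flight"])
        for op in json.loads(out).get("ops", []):
            print(op.get("age"), "sec:", op.get("description"))

    dump_blocked_ops(0)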
[23:04] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[23:08] * sudocat1 (~dibarra@192.185.1.20) Quit (Read error: Connection reset by peer)
[23:08] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[23:08] * mhack (~mhack@nat-pool-bos-t.redhat.com) Quit (Remote host closed the connection)
[23:12] * georgem (~Adium@24.114.56.26) has joined #ceph
[23:12] * ntpttr_ (~ntpttr@134.134.139.83) Quit (Remote host closed the connection)
[23:13] * ntpttr_ (~ntpttr@192.55.54.45) has joined #ceph
[23:32] * tsg__ (~tgohad@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[23:34] * tsg_ (~tgohad@jfdmzpr06-ext.jf.intel.com) Quit (Remote host closed the connection)
[23:34] * georgem (~Adium@24.114.56.26) Quit (Quit: Leaving.)
[23:36] * ntpttr__ (~ntpttr@192.55.54.45) has joined #ceph
[23:36] * ntpttr_ (~ntpttr@192.55.54.45) Quit (Remote host closed the connection)
[23:39] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:405b:e9e5:f08e:901e) Quit (Ping timeout: 480 seconds)
[23:45] * rendar (~I@host25-87-dynamic.22-79-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[23:46] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)
[23:50] <cetex> so, if i understand it right, setting norebalance still allows recovery operations where the point is to bring a pg back up to the configured number of replicas?
[23:50] <cetex> but moving a pg from node X to node Y won't happen
[23:50] <cetex> right?
[23:51] <cetex> I basically want re-replication of pgs back up to the right number of replicas to be given as high a priority as possible
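One way to check that reading of norebalance in practice: with the flag set, objects that are degraded (fewer copies than the pool size) should keep recovering, while objects that are merely misplaced (right copy count, wrong OSDs) wait. The sketch below polls those two counters; the pgmap key names follow the Jewel-era JSON output and may differ on other releases.

    # Poll degraded vs. misplaced object counts while norebalance is set.
    import json
    import subprocess
    import time

    def pg_counts():
        out = subprocess.check_output(["ceph", "status", "--format", "json"])
        pgmap = json.loads(out)["pgmap"]
        return pgmap.get("degraded_objects", 0), pgmap.get("misplaced_objects", 0)

    for _ in range(10):
        degraded, misplaced = pg_counts()
        print("degraded=%d misplaced=%d" % (degraded, misplaced))
        time.sleep(30)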
[23:53] * ntpttr__ (~ntpttr@192.55.54.45) Quit (Remote host closed the connection)
[23:53] * vbellur (~vijay@71.234.224.255) has joined #ceph
[23:56] * KindOne (kindone@h83.224.28.71.dynamic.ip.windstream.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.