#ceph IRC Log

Index

IRC Log for 2016-10-12

Timestamps are in GMT/BST.

[0:02] * vbellur (~vijay@71.234.224.255) has joined #ceph
[0:03] <blizzow> 10GbE
[0:14] * ronrib (~oftc-webi@2400:8c80::505:45cd:67ce:9431:20e1) has joined #ceph
[0:19] <ben1> weird, doesn't sound like you should have issues then
[0:21] * terminalecho (~terminale@vanquish.cacr.caltech.edu) has joined #ceph
[0:23] <terminalecho> ceph.com dead?
[0:24] <doppelgrau> terminalecho: http://www.dreamhoststatus.com/2016/10/11/dreamcompute-us-east-1-cluster-service-disruption/
[0:24] <terminalecho> ah, thx
[0:29] * northrup (~northrup@75-146-11-137-Nashville.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[0:33] <diq> you have separate front and back end networks?
[0:35] <[arx]> what were the advantages of listing every osd in ceph.conf again?
[0:35] <blizzow> diq, yes.
[0:37] * fsimonce (~simon@95.239.69.67) Quit (Quit: Coyote finally caught me)
[0:59] * cmart (~cmart@128.196.38.33) has joined #ceph
[0:59] <cmart> is your web site down? https://ceph.com
[1:01] * davidzlap (~Adium@2605:e000:1313:8003:b91f:8dd4:bf57:96a6) Quit (Quit: Leaving.)
[1:04] * davidzlap (~Adium@2605:e000:1313:8003:b91f:8dd4:bf57:96a6) has joined #ceph
[1:10] * ledgr (~ledgr@88-222-11-185.meganet.lt) has joined #ceph
[1:13] * KindOne (kindone@0001a7db.user.oftc.net) has joined #ceph
[1:14] * xarses_ (~xarses@64.124.158.3) Quit (Ping timeout: 480 seconds)
[1:17] * oms101 (~oms101@p20030057EA4F0700C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:18] * davidzlap (~Adium@2605:e000:1313:8003:b91f:8dd4:bf57:96a6) Quit (Quit: Leaving.)
[1:18] * davidzlap (~Adium@2605:e000:1313:8003:b91f:8dd4:bf57:96a6) has joined #ceph
[1:18] * ledgr (~ledgr@88-222-11-185.meganet.lt) Quit (Ping timeout: 480 seconds)
[1:19] * Kaervan (~Lunk2@46.166.138.135) has joined #ceph
[1:22] <cmart> I think your web site may be offline. Is there a better place to report this?
[1:22] <Jeeves_> cmart: They already know
[1:26] * oms101 (~oms101@p20030057EA491300C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[1:34] * xarses_ (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) has joined #ceph
[1:36] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[1:39] <diq> someone with ops needs to change the topic
[1:40] <Jeeves_> joao nhm or leseb ? :)
[1:42] * vata (~vata@207.96.182.162) Quit (Quit: Leaving.)
[1:45] * wushudoin (~wushudoin@2601:646:8200:c9f0:2ab2:bdff:fe0b:a6ee) Quit (Ping timeout: 480 seconds)
[1:45] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[1:47] * Concubidated (~cube@68.140.239.164) Quit (Quit: Leaving.)
[1:47] * Concubid- (~concubida@2607:f298:5:101d:f816:3eff:fe8e:5412) has joined #ceph
[1:49] * Kaervan (~Lunk2@46.166.138.135) Quit ()
[1:52] <ronrib> what kind of speed difference should I expect with 2:3 erasure vs 2x replication while using cephfs? My tests are looking pretty grim, a single rsync of a large file can manage replication reads of 250 MB/s while erasure is more like 30 MB/s
[1:52] <ronrib> is this usual?
[1:56] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[1:58] * Concubid- (~concubida@2607:f298:5:101d:f816:3eff:fe8e:5412) Quit (Quit: ZNC 1.6.3+deb1+xenial0 - http://znc.in)
[1:59] * Cube (~Cube@2607:f298:5:101d:f816:3eff:fe8e:5412) has joined #ceph
[2:05] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[2:05] * mattbenjamin (~mbenjamin@174-23-175-206.slkc.qwest.net) has joined #ceph
[2:07] * Jaska (~xolotl@tsn109-201-152-15.dyn.nltelcom.net) has joined #ceph
[2:08] * KindOne (kindone@0001a7db.user.oftc.net) Quit (Read error: Connection reset by peer)
[2:09] * salwasser (~Adium@2601:197:101:5cc1:7d00:d2fc:72a4:201e) has joined #ceph
[2:09] * johnavp1989 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) has joined #ceph
[2:09] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[2:11] * Racpatel (~Racpatel@2601:87:3:31e3::34db) Quit (Ping timeout: 480 seconds)
[2:12] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[2:13] * bene2 (~bene@2601:193:4101:f410:ea2a:eaff:fe08:3c7a) Quit (Quit: Konversation terminated!)
[2:15] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[2:16] * kristen (~kristen@134.134.139.74) Quit (Quit: Leaving)
[2:19] * johnavp1989 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[2:24] * ceph-ircslackbot (~ceph-ircs@ds9536.dreamservers.com) Quit (Remote host closed the connection)
[2:28] * KindOne (kindone@0001a7db.user.oftc.net) has joined #ceph
[2:29] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[2:29] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[2:37] * vata (~vata@96.127.202.136) has joined #ceph
[2:37] * Jaska (~xolotl@tsn109-201-152-15.dyn.nltelcom.net) Quit ()
[2:46] * rinek (~o@62.109.134.112) Quit (Quit: ~)
[2:46] * cmart (~cmart@128.196.38.33) Quit (Ping timeout: 480 seconds)
[2:50] * niknakpa1dywak (~xander.ni@outbound.lax.demandmedia.com) Quit (Quit: Lost terminal)
[2:54] * efirs (~firs@98.207.153.155) has joined #ceph
[3:00] * Unai (~Adium@50-115-70-150.static-ip.telepacific.net) Quit (Quit: Leaving.)
[3:06] * rinek (~o@62.109.134.112) has joined #ceph
[3:09] * rinek (~o@62.109.134.112) Quit ()
[3:13] * rinek (~o@62.109.134.112) has joined #ceph
[3:20] * davidzlap (~Adium@2605:e000:1313:8003:b91f:8dd4:bf57:96a6) Quit (Quit: Leaving.)
[3:26] * davidzlap (~Adium@2605:e000:1313:8003:b91f:8dd4:bf57:96a6) has joined #ceph
[3:44] * davidzlap (~Adium@2605:e000:1313:8003:b91f:8dd4:bf57:96a6) Quit (Quit: Leaving.)
[3:44] * davidzlap (~Adium@2605:e000:1313:8003:b91f:8dd4:bf57:96a6) has joined #ceph
[3:46] * northrup (~northrup@173.14.101.193) has joined #ceph
[3:53] * yanzheng (~zhyan@118.116.115.174) has joined #ceph
[4:00] * jfaj (~jan@p20030084AD2DDF006AF728FFFE6777FF.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[4:09] * jfaj (~jan@p20030084AD2EBE006AF728FFFE6777FF.dip0.t-ipconnect.de) has joined #ceph
[4:10] * georgem (~Adium@69-165-135-139.dsl.teksavvy.com) has joined #ceph
[4:12] * davidzlap (~Adium@2605:e000:1313:8003:b91f:8dd4:bf57:96a6) Quit (Quit: Leaving.)
[4:14] * jowilkin (~jowilkin@184-23-213-254.fiber.dynamic.sonic.net) Quit (Quit: Leaving)
[4:16] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) Quit (Quit: Leaving)
[4:16] * georgem (~Adium@69-165-135-139.dsl.teksavvy.com) Quit (Quit: Leaving.)
[4:16] * georgem (~Adium@206.108.127.16) has joined #ceph
[4:30] * kefu (~kefu@114.92.125.128) has joined #ceph
[4:31] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:33] * ndevos (~ndevos@nat-pool-ams2-5.redhat.com) Quit (Remote host closed the connection)
[4:34] * salwasser (~Adium@2601:197:101:5cc1:7d00:d2fc:72a4:201e) Quit (Quit: Leaving.)
[4:35] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) has joined #ceph
[4:35] * minnesotags (~herbgarci@c-50-137-242-97.hsd1.mn.comcast.net) has joined #ceph
[4:36] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:38] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[4:42] * shaunm (~shaunm@ms-208-102-105-216.gsm.cbwireless.com) Quit (Ping timeout: 480 seconds)
[4:48] * kefu (~kefu@114.92.125.128) Quit (Read error: No route to host)
[4:48] * kefu (~kefu@114.92.125.128) has joined #ceph
[4:50] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[4:50] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:51] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit ()
[4:55] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[5:00] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Ping timeout: 480 seconds)
[5:06] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[5:11] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[5:11] * ledgr (~ledgr@88-222-11-185.meganet.lt) has joined #ceph
[5:11] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Ping timeout: 480 seconds)
[5:12] * dillaman (~jdillaman@pool-108-18-97-95.washdc.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[5:12] * kefu (~kefu@li1456-173.members.linode.com) has joined #ceph
[5:16] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[5:20] * ledgr (~ledgr@88-222-11-185.meganet.lt) Quit (Ping timeout: 480 seconds)
[5:21] * dillaman (~jdillaman@pool-108-18-97-95.washdc.fios.verizon.net) has joined #ceph
[5:24] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Ping timeout: 480 seconds)
[5:25] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[5:29] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[5:33] * rotbeard (~redbeard@aftr-109-90-233-215.unity-media.net) has joined #ceph
[5:34] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[5:40] * Vacuum__ (~Vacuum@88.130.206.40) has joined #ceph
[5:40] * Vacuum_ (~Vacuum@88.130.211.134) Quit (Read error: Connection reset by peer)
[5:45] * ccourtaut (~ccourtaut@157.173.31.93.rev.sfr.net) Quit (Quit: I'll be back!)
[5:53] * _ndevos (~ndevos@nat-pool-ams2-5.redhat.com) has joined #ceph
[5:53] * _ndevos is now known as ndevos
[5:55] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[6:04] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[6:09] * Jeffrey4l_ (~Jeffrey@120.11.30.55) Quit (Ping timeout: 480 seconds)
[6:11] * walcubi (~walcubi@p5795B011.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[6:11] * walcubi (~walcubi@p5795AB17.dip0.t-ipconnect.de) has joined #ceph
[6:17] * TomasBeHere (~TomasCZ@yes.tenlab.net) has joined #ceph
[6:17] * TomasCZ (~TomasCZ@yes.tenlab.net) Quit (Read error: Connection reset by peer)
[6:25] * Jeffrey4l (~Jeffrey@120.11.28.7) has joined #ceph
[6:30] * rinek (~o@62.109.134.112) Quit (Ping timeout: 480 seconds)
[6:31] * Skaag (~lunix@cpe-172-91-77-84.socal.res.rr.com) has joined #ceph
[6:33] * kefu (~kefu@li1456-173.members.linode.com) Quit (Max SendQ exceeded)
[6:34] * kefu (~kefu@li1456-173.members.linode.com) has joined #ceph
[6:39] * rinek (~o@62.109.134.112) has joined #ceph
[6:48] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Ping timeout: 480 seconds)
[6:49] * Aramande_ (~Aethis@46.166.148.142) has joined #ceph
[6:55] * Nicho1as (~nicho1as@00022427.user.oftc.net) has joined #ceph
[6:56] * kefu (~kefu@li1456-173.members.linode.com) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[6:56] * rraja (~rraja@125.16.34.66) has joined #ceph
[6:59] * evelu (~erwan@37.162.197.190) has joined #ceph
[7:04] * rinek (~o@62.109.134.112) Quit (Ping timeout: 480 seconds)
[7:04] * kefu (~kefu@114.92.125.128) has joined #ceph
[7:08] * vata (~vata@96.127.202.136) Quit (Quit: Leaving.)
[7:09] * TomasBeHere (~TomasCZ@yes.tenlab.net) Quit (Quit: Leaving)
[7:11] * Jeffrey4l (~Jeffrey@120.11.28.7) Quit (Read error: Connection reset by peer)
[7:12] * Jeffrey4l (~Jeffrey@110.252.73.209) has joined #ceph
[7:14] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[7:18] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[7:18] * kefu (~kefu@114.92.125.128) has joined #ceph
[7:19] * Aramande_ (~Aethis@46.166.148.142) Quit ()
[7:20] * karnan (~karnan@125.16.34.66) has joined #ceph
[7:23] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[7:23] * georgem (~Adium@206.108.127.16) Quit (Quit: Leaving.)
[7:23] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Quit: cyphase.com)
[7:23] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[7:27] * kefu (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[7:36] * Nats_ (~natscogs@114.31.195.238) has joined #ceph
[7:43] * Nats (~natscogs@114.31.195.238) Quit (Ping timeout: 480 seconds)
[7:45] * Jeffrey4l (~Jeffrey@110.252.73.209) Quit (Ping timeout: 480 seconds)
[7:48] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[7:51] * Jeffrey4l (~Jeffrey@119.251.221.78) has joined #ceph
[7:57] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[7:58] * vimal (~vikumar@121.244.87.116) has joined #ceph
[7:58] * Jeffrey4l_ (~Jeffrey@110.244.243.23) has joined #ceph
[8:00] * Jeffrey4l (~Jeffrey@119.251.221.78) Quit (Ping timeout: 480 seconds)
[8:06] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) has joined #ceph
[8:07] * johnavp1989 (~jpetrini@8.39.115.8) Quit (Ping timeout: 480 seconds)
[8:07] * Ivan1 (~ipencak@213.151.95.130) has joined #ceph
[8:08] * Steve_pi (~Steve@124-170-32-230.dyn.iinet.net.au) has joined #ceph
[8:09] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Read error: Connection reset by peer)
[8:10] * Jeffrey4l_ (~Jeffrey@110.244.243.23) Quit (Ping timeout: 480 seconds)
[8:13] * Steve_pi (~Steve@124-170-32-230.dyn.iinet.net.au) has left #ceph
[8:16] * rinek (~o@62.109.134.112) has joined #ceph
[8:16] * Jeffrey4l_ (~Jeffrey@110.252.52.64) has joined #ceph
[8:20] * johnavp1989 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) has joined #ceph
[8:20] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[8:21] * kefu (~kefu@114.92.125.128) has joined #ceph
[8:22] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) has joined #ceph
[8:28] * maybebuggy (~maybebugg@2a01:4f8:191:2350::2) has joined #ceph
[8:31] * johnavp1989 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[8:39] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[8:39] * kefu (~kefu@li1456-173.members.linode.com) has joined #ceph
[8:41] * cristicalin (~cristical@109.166.154.147) has joined #ceph
[8:43] <cristicalin> is ceph.com down ?
[8:43] <cristicalin> http://www.isitdownrightnow.com/ceph.com.html
[8:47] <badone> yes, it is
[8:48] <badone> down for more than a week is incorrect...
[8:49] * kefu (~kefu@li1456-173.members.linode.com) Quit (Max SendQ exceeded)
[8:49] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[8:50] * kefu (~kefu@li1456-173.members.linode.com) has joined #ceph
[8:51] * Ramakrishnan (~ramakrish@125.16.34.66) has joined #ceph
[8:51] <cristicalin> i just noticed it because of a failed build this morning which was fetching stuff from download.ceph.com
[8:52] * rotbeard (~redbeard@aftr-109-90-233-215.unity-media.net) Quit (Quit: Leaving)
[8:53] * kefu (~kefu@li1456-173.members.linode.com) Quit (Max SendQ exceeded)
[8:53] * kefu (~kefu@li1456-173.members.linode.com) has joined #ceph
[8:56] <badone> cristicalin: orly? what do you fetch?
[8:56] <badone> I understand the hosting provider is working hard on rectifying the situation...
[8:57] * badone imagines they are having a bad day
[8:57] <IcePic> "When your shoes are still warm when you put them on to go back to work, you know it's going to be one of _those_ days again"
[9:00] <badone> heh
[9:01] <cristicalin> badone, jewel packages for debian
[9:01] <cristicalin> part of a docker image build
[9:01] <badone> ah, nice...
[9:09] * branto (~branto@transit-86-181-132-209.redhat.com) has joined #ceph
[9:11] * cristicalin (~cristical@109.166.154.147) Quit (Quit: Leaving...)
[9:11] * ade (~abradshaw@pool-22.254.176.62.dynamic.wobline-ip.de) has joined #ceph
[9:12] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[9:12] * ivve (~zed@c83-254-15-40.bredband.comhem.se) has joined #ceph
[9:16] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[9:17] * ashah (~ashah@125.16.34.66) has joined #ceph
[9:18] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[9:20] * analbeard (~shw@support.memset.com) has joined #ceph
[9:22] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[9:22] * fsimonce (~simon@95.239.69.67) has joined #ceph
[9:22] * tiger (~textual@58.213.102.114) has joined #ceph
[9:23] * tiger (~textual@58.213.102.114) Quit ()
[9:25] * b0e (~aledermue@213.95.25.82) has joined #ceph
[9:29] * Hemanth (~hkumar_@125.16.34.66) has joined #ceph
[9:31] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[9:34] * efirs (~firs@98.207.153.155) Quit (Quit: Leaving.)
[9:36] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[9:39] * hbogert (~Adium@a83-163-87-95.adsl.xs4all.nl) has joined #ceph
[9:41] * analbeard (~shw@support.memset.com) has left #ceph
[9:41] * analbeard (~shw@support.memset.com) has joined #ceph
[9:43] * hbogert1 (~Adium@a83-163-87-95.adsl.xs4all.nl) has joined #ceph
[9:43] * rmart04 (~rmart04@support.memset.com) has joined #ceph
[9:43] <etienneme> You can use http://eu.ceph.com/ repo
[9:49] * dgurtner (~dgurtner@176.35.230.73) has joined #ceph
[9:50] * hbogert (~Adium@a83-163-87-95.adsl.xs4all.nl) Quit (Ping timeout: 480 seconds)
[9:54] <peetaur2> why don't they change the dns to point to eu
[9:55] <peetaur2> that's what I'd do.....bandage solutions until it's really fixed
[9:55] <peetaur2> I'd also put up a static docs site page
[9:55] <peetaur2> no clue what I'd do about the tracker
[9:55] * trociny (~mgolub@93.183.239.2) Quit (Remote host closed the connection)
[9:59] * Skaag (~lunix@cpe-172-91-77-84.socal.res.rr.com) Quit (Quit: Leaving.)
[10:00] * jagardaniel_ (~oftc-webi@2001:9b0:109:103::b6) has joined #ceph
[10:02] * derjohn_mob (~aj@46.189.28.79) has joined #ceph
[10:03] * doppelgrau (~doppelgra@132.252.235.172) has joined #ceph
[10:07] * trociny (~mgolub@93.183.239.2) has joined #ceph
[10:09] * TMM (~hp@dhcp-077-248-009-229.chello.nl) Quit (Ping timeout: 480 seconds)
[10:10] * lmb (~Lars@2a02:8109:8100:1d2c:2ad2:44ff:fedf:3318) has joined #ceph
[10:12] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:a095:7e95:6611:bc7) has joined #ceph
[10:14] * briner (~briner@129.194.16.54) has joined #ceph
[10:15] <etienneme> You can find the doc here https://github.com/ceph/ceph/tree/master/doc
[10:15] <etienneme> just open the .rst file and GitHub will display it
[10:18] <peetaur2> uh that's pretty terrible; I'd rather build the docs and look at it in html
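For anyone who wants local HTML while the site is down, a rough sketch of building the docs from a ceph checkout (assuming the tree's admin/build-doc script and the dependencies it lists; output paths may differ between releases):

    git clone https://github.com/ceph/ceph.git
    cd ceph
    ./admin/build-doc                   # renders doc/ to HTML with sphinx
    # open build-doc/output/html/index.html in a browser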
[10:21] * garphy is now known as garphy`aw
[10:23] * scuttlemonkey (~scuttle@nat-pool-rdu-t.redhat.com) Quit (Ping timeout: 480 seconds)
[10:37] * moegyver (~oftc-webi@2001:9b0:14a:10:36e6:d7ff:fe0e:81b0) Quit (Remote host closed the connection)
[10:39] * donatas (~donatas@88-119-196-104.static.zebra.lt) has joined #ceph
[10:39] <donatas> why ceph.com is not working?
[10:42] * Nicho1as (~nicho1as@00022427.user.oftc.net) Quit (Quit: A man from the Far East; using WeeChat 1.5)
[10:43] <etienneme> http://www.dreamhoststatus.com/2016/10/11/dreamcompute-us-east-1-cluster-service-disruption/
[10:44] <donatas> got it
[10:45] * garphy`aw is now known as garphy
[10:45] * donatas (~donatas@88-119-196-104.static.zebra.lt) Quit (Quit: leaving)
[10:49] * hbogert (~Adium@a83-163-87-95.adsl.xs4all.nl) has joined #ceph
[10:49] * hbogert2 (~Adium@a83-163-87-95.adsl.xs4all.nl) has joined #ceph
[10:49] * hbogert1 (~Adium@a83-163-87-95.adsl.xs4all.nl) Quit (Read error: Connection reset by peer)
[10:57] * hbogert (~Adium@a83-163-87-95.adsl.xs4all.nl) Quit (Ping timeout: 480 seconds)
[11:00] * dgurtner (~dgurtner@176.35.230.73) Quit (Ping timeout: 480 seconds)
[11:04] * pam (~pam@193.106.183.1) has joined #ceph
[11:04] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) Quit (Read error: No route to host)
[11:04] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) has joined #ceph
[11:06] <pam> Hello, does someone know from where calamari get's the metrics of "ceph.cluster.<id>.df" etc... from the monitors?
[11:06] * garphy is now known as garphy`aw
[11:09] <peetaur2> pam: dunno, but these things ought to contain the same https://bpaste.net/show/af5aff641efd
[11:11] <pam> The problem is since we upgraded to jewel, calamari does not report anymore the total usage of the cluster and the IOPS
[11:11] * kwork_ is now known as kwork
[11:11] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[11:11] * TMM (~hp@185.5.121.201) has joined #ceph
[11:11] <pam> I looked into the graphite data and these values are all empty
[11:11] <walcubi> Is download.ceph.com *still* down?
[11:12] <walcubi> Someone should slap the sysadmin with a wet fish.
[11:12] <frickler> so now dreamhost cannot fix their ceph cluster because the docs how to do that are not accessible? ;-)
[11:12] <T1w> haha
[11:12] <T1w> could be.. ;)
[11:13] <T1w> http://www.dreamhoststatus.com/2016/10/11/dreamcompute-us-east-1-cluster-service-disruption/
[11:15] <kefu> donatas our host is experiencing some technical difficulties.
[11:15] * dgurtner (~dgurtner@94.126.212.170) has joined #ceph
[11:16] <IcePic> *zing*
[11:20] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[11:20] * ivve (~zed@c83-254-15-40.bredband.comhem.se) Quit (Ping timeout: 480 seconds)
[11:20] * rendar (~I@host153-176-dynamic.52-79-r.retail.telecomitalia.it) has joined #ceph
[11:21] <peetaur2> frickler: nice one
[11:21] <darkfader> walcubi: I LOVE suggestions of slapping the sysadmins once the sw bugs took stuff down
[11:21] <darkfader> we need more like that :)
[11:22] <darkfader> and yes, not recovering nicely from a net outage is a sw issue
[11:25] <walcubi> darkfader, From one sysadmin to another, naturally. :)
[11:31] <peetaur2> but isn't it the sysadmin's job to complain to upstream to fix the sw? ;)
[11:31] * ivve (~zed@m37-2-60-55.cust.tele2.se) has joined #ceph
[11:32] <darkfader> walcubi: :)
[11:33] * nardial (~ls@p5DC06978.dip0.t-ipconnect.de) has joined #ceph
[11:33] <darkfader> peetaur2: i thought so, till i had a non-IT customer again and got reminded the job is to make users happy
[11:33] * evelu (~erwan@37.162.197.190) Quit (Ping timeout: 480 seconds)
[11:33] <darkfader> i need more sleep sorry for the grumping now
[11:33] * etamponi (~etamponi@net-93-71-251-206.cust.vodafonedsl.it) has joined #ceph
[11:34] <etamponi> ceph.com is still down, is there any status update?
[11:34] <peetaur2> someone should put that url in the channel topic...
[11:34] <T1w> etamponi: apart from http://www.dreamhoststatus.com/2016/10/11/dreamcompute-us-east-1-cluster-service-disruption/ no..
[11:34] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[11:34] <etamponi> Thank you
[11:35] <etamponi> I guess I'll switch my scripts to use eu.ceph.com, hoping that apt will not complain about the lack of https...
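The switch etamponi describes is a one-line apt source change; a minimal sketch, assuming Debian jessie, the jewel release, and that eu.ceph.com mirrors the same debian-jewel layout as download.ceph.com:

    # /etc/apt/sources.list.d/ceph.list
    deb http://eu.ceph.com/debian-jewel/ jessie main

    apt-get update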
[11:42] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[11:51] <walcubi> darkfader, I know the feeling. Especially when a route goes down across two continents.
[11:55] * walcubi (~walcubi@p5795AB17.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[11:56] * ivve (~zed@m37-2-60-55.cust.tele2.se) Quit (Ping timeout: 480 seconds)
[11:57] * JayJay (~oftc-webi@dsl-213-233-211-236.solcon.nl) has joined #ceph
[11:57] * JayJay (~oftc-webi@dsl-213-233-211-236.solcon.nl) Quit ()
[11:58] * t4nk263Edu (~oftc-webi@dsl-213-233-211-236.solcon.nl) has joined #ceph
[11:59] * hbogert2 (~Adium@a83-163-87-95.adsl.xs4all.nl) Quit (Ping timeout: 480 seconds)
[12:04] * walcubi (~walcubi@p5795AB17.dip0.t-ipconnect.de) has joined #ceph
[12:04] <t4nk263Edu> Hi guys, is ceph.com down?
[12:05] * hbogert (~Adium@a83-163-87-95.adsl.xs4all.nl) has joined #ceph
[12:05] * hbogert1 (~Adium@a83-163-87-95.adsl.xs4all.nl) has joined #ceph
[12:05] <liiwi> yes, outage at dreamhost
[12:05] <t4nk263Edu> oops
[12:06] <liiwi> see dreamhoststatus.com
[12:06] <t4nk263Edu> thanks
[12:07] * ivve (~zed@c83-254-15-40.bredband.comhem.se) has joined #ceph
[12:08] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[12:09] <t4nk263Edu> ceph issues?
[12:10] <ivve> anyone knows if there is any way to throttle by pool or rbd image to client?
[12:11] <doppelgrau> ivve: if you use qemu, you can limit io there
[12:11] <ivve> how about librbd?
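The qemu-side limiting doppelgrau refers to is per-drive IO throttling applied to the rbd-backed drive, not anything inside librbd itself; a rough sketch (the guest name, image name, and exact option spelling are illustrative and vary by qemu/libvirt version):

    # qemu command line: throttle the rbd drive to ~500 IOPS / 50 MB/s
    qemu-system-x86_64 ... \
        -drive file=rbd:rbd/myimage,format=raw,cache=writeback,iops=500,bps=52428800

    # or, for a libvirt-managed guest, at runtime
    virsh blkdeviotune myguest vda --total-iops-sec 500 --total-bytes-sec 52428800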
[12:13] * hbogert (~Adium@a83-163-87-95.adsl.xs4all.nl) Quit (Ping timeout: 480 seconds)
[12:16] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[12:19] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:20] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[12:22] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) has joined #ceph
[12:25] * nardial (~ls@p5DC06978.dip0.t-ipconnect.de) Quit (Quit: Leaving)
[12:33] <brians_> thats some outage - I think after 17 hours that the cluster is dead
[12:33] <peetaur2> or restoring a 500TB backup :D
[12:33] <brians_> yuuuueckckckc
[12:34] <peetaur2> I wish they gave details on it... I wanna know so it's like 2nd hand experience :)
[12:34] <brians_> from floppies
[12:34] <IcePic> "please insert volume BACKUP3402 in any drive"
[12:34] <brians_> jesus
[12:34] <brians_> "please insert more whiskey into engineer"
[12:34] <doppelgrau> brians: or "only" seriously fucked up
[12:36] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[12:37] <doppelgrau> after a badly planned datacenter move (while HDD-only pools were slowly being replaced by HDD+journal => about 600 PGs/OSD) and some trouble when switching everything back on, nearly every PG became unclean => nearly 4 times the memory needed => started swapping ...
[12:39] <doppelgrau> in that case we were lucky that the servers also run vms => with a reboot we could assign some memory from the vms to the dom0 where ceph runs => memory problems resolved, but after that every pg was unclean => took half a day before we even started to power up some VMs
[12:40] <doppelgrau> with more care and more memory, that problem would have been way smaller, but that was a lesson the management had to learn the hard way
[12:41] * kefu (~kefu@li1456-173.members.linode.com) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[12:41] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Ping timeout: 480 seconds)
[12:41] <doppelgrau> and after some serious network problems without the chance to adjust the memory ...
[12:42] * [0x4A6F]_ (~ident@p4FC2612D.dip0.t-ipconnect.de) has joined #ceph
[12:43] <peetaur2> how much ram, pgs, osds was that?
[12:43] * [0x4A6F] (~ident@0x4a6f.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:43] * [0x4A6F]_ is now known as [0x4A6F]
[12:44] <doppelgrau> peetaur2: each osd growing from less than 1GB to about 4GB
[12:47] <peetaur2> you needed 4GB per osd?
[12:47] <doppelgrau> during recovery, yes
[12:47] <peetaur2> I bought some 12 disk machines with 64GB... so I guess that means I have more than that, right?
[12:49] <doppelgrau> looks safe for me, but can not guarantee, that you can't make it even worse :D
[12:49] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[12:51] * kefu (~kefu@114.92.125.128) has joined #ceph
[12:51] * Ivan1 (~ipencak@213.151.95.130) Quit (Quit: Leaving.)
[12:52] * Wizeon (~Teddybare@tor-exit.squirrel.theremailer.net) has joined #ceph
[12:55] <doppelgrau> but assuming servers with 12 OSDs and only 32GB of memory installed, the recovery could be really hard in such a case
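Rough arithmetic on doppelgrau's worst case (one observation, not a guarantee): 12 OSDs x ~4 GB each is about 48 GB during heavy recovery, which overruns a 32 GB box but still leaves headroom on peetaur2's 64 GB machines.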
[12:57] <Be-El> you cannot have too much memory...
[13:00] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[13:03] <peetaur2> what if you put so much RAM in one chassis that you make a black hole?
[13:06] <doppelgrau> you'd save rackspace
[13:07] <Be-El> that's /dev/null as hardware device...
[13:07] <Be-El> and off for meeting
[13:09] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[13:10] <walcubi> peetaur2, linux loves to use disk cache.
[13:12] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[13:13] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit ()
[13:15] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) Quit (Quit: Leaving)
[13:16] <ivve> how fast would i notice if i were to... say zap a ssd that holds journals? :)
[13:19] <Gugge-47527> if you are asleep, and have no alarm system, not before you wake up :)
[13:19] <ivve> :)
[13:22] * Wizeon (~Teddybare@tor-exit.squirrel.theremailer.net) Quit ()
[13:31] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[13:32] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) has joined #ceph
[13:33] * t4nk263Edu (~oftc-webi@dsl-213-233-211-236.solcon.nl) Quit (Remote host closed the connection)
[13:38] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Read error: Connection reset by peer)
[13:38] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[13:39] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[13:39] <ivve> hate to move journals from ssd back to sata
[13:39] <ivve> takes so long
[13:40] * toastydeath (~toast@pool-71-255-253-39.washdc.fios.verizon.net) Quit (Read error: Connection reset by peer)
[13:40] <ivve> 320TB to go! :(
[13:40] <ivve> or actually 2x
[13:44] * realitysandwich (~perry@2001:920:1846:1dc0:baae:edff:fe73:f413) has joined #ceph
[13:44] <realitysandwich> are there currently any issues with ssl on download.ceph.com?
[13:45] <jagardaniel_> realitysandwich: it is down
[13:45] <Gugge-47527> not ssl specifically
[13:45] <boolman> I just changed to eu.ceph.com on all my repos
[13:45] <jagardaniel_> this link was posted before: http://www.dreamhoststatus.com/2016/10/11/dreamcompute-us-east-1-cluster-service-disruption/
[13:45] <realitysandwich> ahhh, ok well this makes me feel less crazy
[13:47] <jagardaniel_> hehe :) are you using ansible? because the module seems to complain about SSL when the page cannot be reached
[13:47] <realitysandwich> no, was just trying to setup a new node with ceph-deploy
[13:47] * rakeshgm (~rakesh@125.16.34.66) has joined #ceph
[13:48] <jagardaniel_> ah, probably apt then
[13:50] <realitysandwich> you got it :)
[13:57] * pdhange (~pdhange@210.185.111.235) Quit (Quit: Leaving)
[13:57] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) has joined #ceph
[13:59] * pam (~pam@193.106.183.1) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[14:01] * pam (~pam@193.106.183.1) has joined #ceph
[14:01] * Racpatel (~Racpatel@2601:87:3:31e3::34db) has joined #ceph
[14:05] * Racpatel (~Racpatel@2601:87:3:31e3::34db) Quit ()
[14:07] * hbogert (~Adium@a83-163-87-95.adsl.xs4all.nl) has joined #ceph
[14:08] * hbogert1 (~Adium@a83-163-87-95.adsl.xs4all.nl) Quit (Ping timeout: 480 seconds)
[14:14] * Racpatel (~Racpatel@2601:87:3:31e3::34db) has joined #ceph
[14:18] * valeech (~valeech@pool-96-247-203-33.clppva.fios.verizon.net) Quit (Quit: valeech)
[14:22] * arcimboldo (~antonio@dhcp-y11-zi-s3it-130-60-34-055.uzh.ch) has joined #ceph
[14:22] * karnan (~karnan@125.16.34.66) Quit (Remote host closed the connection)
[14:30] <masterpe> http://www.isitdownrightnow.com/docs.ceph.com.html -> docs.ceph.com in maintenance?
[14:30] <masterpe> sorry, didn't read all the chat lines.
[14:34] * pam (~pam@193.106.183.1) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[14:40] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Ping timeout: 480 seconds)
[14:42] <thoht> anybody tries to use lvm cache with SSD pv for speeding up OSD SATA ?
[14:47] * shaunm (~shaunm@ms-208-102-105-216.gsm.cbwireless.com) has joined #ceph
[14:49] * moegyver (~oftc-webi@2001:9b0:14a:10:36e6:d7ff:fe0e:81b0) has joined #ceph
[14:49] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[14:53] * evelu (~erwan@aut78-1-78-236-183-64.fbx.proxad.net) has joined #ceph
[14:57] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[15:06] * richardus1 (~Mraedis@46.166.148.154) has joined #ceph
[15:13] * hbogert (~Adium@a83-163-87-95.adsl.xs4all.nl) Quit (Quit: Leaving.)
[15:13] <Be-El> thoht: there was a thread about supporting bcache for osd on either the ceph-user or ceph-devel mailing list
[15:14] <thoht> Be-El: i tried bcache but i'm sick about the load average bug
[15:14] <thoht> it constantly shows a load average of 4 due to a bug in the process state
[15:15] <thoht> (on a kernel 4.4)
[15:15] <Be-El> the last time i've tried bcache it destroyed our mysql server since it came up without the cache tier after a reboot (writeback mode....)
[15:16] <thoht> i did many reboots and bcache remained persistent
[15:16] <Be-El> at that time creating a bcache setup did not introduce a new device. the layer was completely transparent
[15:16] <thoht> oh it was some time ago i see
[15:16] <hoonetorg> hi
[15:17] <hoonetorg> is download.ceph.com broken
[15:17] <hoonetorg> again?
[15:17] <thoht> hoonetorg: use eu.ceph.com
[15:17] <hoonetorg> ah
[15:17] <hoonetorg> k
[15:17] <hoonetorg> what happened?
[15:17] <thoht> ceph infra is down
[15:17] <hoonetorg> serious problems?
[15:18] <lri_> is it running a ceph filesystem
[15:23] * Nicho1as (~nicho1as@00022427.user.oftc.net) has joined #ceph
[15:23] * wes_dillingham (~wes_dilli@140.247.242.44) has joined #ceph
[15:23] <thoht> 458222/979657 objects misplaced (46.774%)
[15:23] <thoht> fuck me
[15:24] * squizzi (~squizzi@107.13.237.240) has joined #ceph
[15:24] <peetaur2> misplaced isn't that bad... it does that to change from degraded to that, so you have replication
[15:24] <peetaur2> but 46%?
[15:24] <thoht> i just put 2 OSD at weight 0.0
[15:25] <thoht> ceph osd crush reweight osd.<ID> 0.0
[15:25] <thoht> i need to reinstall the server where there are these 2 OSD
[15:25] <thoht> recovery io 204 MB/s, 51 objects/s
[15:26] <thoht> on a 1Gb link it is not so fast
[15:26] <peetaur2> you could set up a replacement server with the old disks
[15:26] * kefu (~kefu@114.92.125.128) Quit (Read error: Connection reset by peer)
[15:26] <Be-El> thoht: changing the weight influences the weight of the host, which influences the overall pg distribution etc.
[15:27] <peetaur2> or prepare the new install on new disks, and then replace them faster, without using reweight
[15:27] <peetaur2> I mean like install while the old is running
[15:27] <Be-El> thoht: so you end up with a lot of data shuffling between hosts that are not affected by the reinstallation
[15:28] <peetaur2> oh I figured it was because he stopped half his osds...
[15:28] * diver (~diver@95.85.8.93) has joined #ceph
[15:28] <peetaur2> so reweight is worse than out for this reason?
[15:28] <thoht> i ve for now 4 servers with replica 3
[15:28] <thoht> 2 OSD by server
[15:28] <thoht> i need to reinstall 1 server
[15:28] * ashah (~ashah@125.16.34.66) Quit (Ping timeout: 480 seconds)
[15:28] <Be-El> peetaur2: it is worse with respect to overall data movement, yes
[15:29] <peetaur2> I always wondered the difference, and nothing answered properly like this
[15:29] <thoht> so ceph osd crush reweight osd.<ID> 0.0 <== it was not the proper way ?
[15:29] * jarrpa (~jarrpa@63.225.131.166) has joined #ceph
[15:29] <Be-El> peetaur2: but it's recommended with respect to data integrity, since your replication requirements are always fulfilled
[15:29] <peetaur2> people just keep randomly suggesting either one or the other method (and docs say try out first, then reweight if that fails)
[15:29] <thoht> i'm following https://www.sebastien-han.fr/blog/2015/12/11/ceph-properly-remove-an-osd/
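That post's procedure boils down to roughly the following (a sketch for hammer/jewel-era clusters; <ID> stands for the OSD number, and the stop command depends on the init system):

    ceph osd crush reweight osd.<ID> 0.0   # drain data off the OSD first
    # wait until backfilling finishes and the cluster is healthy again
    ceph osd out <ID>
    sudo service ceph stop osd.<ID>        # or: systemctl stop ceph-osd@<ID>
    ceph osd crush remove osd.<ID>         # drop it from the CRUSH map
    ceph auth del osd.<ID>                 # remove its key
    ceph osd rm <ID>                       # finally remove the OSD entry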
[15:29] <peetaur2> doesn't ceph osd out fulfill the same?
[15:30] <Be-El> thoht: that removes an osd, but you want to replace it
[15:30] <thoht> yes
[15:30] <Be-El> peetaur2: partly. changing weight to 0.0 allows ceph to use the pg on those osd until they are moved to another osd
[15:31] <Be-El> that's why you get remapped pg, not degraded ones
[15:31] <thoht> 744/971926 objects degraded (0.077%)
[15:31] <thoht> 441649/971926 objects misplaced (45.441%)
[15:31] <thoht> there are a bit degraded
[15:32] <Be-El> thoht: those are objects that are modified during the backfilling
[15:32] <thoht> ok so all is normal
[15:32] <Be-El> not sure whether ceph puts them on the new target pgs, or on the former pg, or whatever ;-)
[15:32] <Be-El> thoht: pretty standard operation, yes
[15:33] <thoht> i don't think i ll put the server back to cluster after
[15:33] <thoht> my cluster is 3 servers, and i m replacing each server with a new one with SATA+SSD (instead of only SATA)
[15:33] <Be-El> thoht: in that case the data has to be moved anyways, and you are doing it in the only sane way
[15:33] <thoht> i ve already put the new one, reason why i was at 4
[15:34] <thoht> ok good
[15:34] <Be-El> if you are replacing the host with another host with the same capacity, and you are willing to temporarily break the replication requirements, there's another way
[15:35] <Be-El> restore the former crush weight, wait for everything to settle, set nobackfill/norecover, shutdown osd, shutdown host, add new host, add new osd, unset nobackfill/norecover
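As a loose sketch of that sequence (the flags are standard ceph commands; PGs stay degraded while the host is down, which is the risk being traded for less data movement):

    ceph osd crush reweight osd.<ID> <former-weight>   # restore the weight, wait for the cluster to settle
    ceph osd set nobackfill
    ceph osd set norecover
    # stop the OSDs on the old host, power it off, bring the new host up
    # with the same disks/OSD IDs under the same CRUSH host bucket
    ceph osd unset nobackfill
    ceph osd unset norecover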
[15:35] * salwasser (~Adium@72.246.3.14) has joined #ceph
[15:36] * bene3 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[15:36] * richardus1 (~Mraedis@46.166.148.154) Quit ()
[15:36] * bene3 (~bene@nat-pool-bos-t.redhat.com) Quit ()
[15:36] <Be-El> if you edit the crush map (e.g. changing the host name) you can get away without a lot of data shuffling
[15:37] <Be-El> but it's more risk, that's why most people do not do it
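The crush map edit itself is usually done by decompiling, editing, and re-injecting the map; a sketch with the stock tools:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit crushmap.txt, e.g. rename the old host bucket to the new hostname
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new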
[15:37] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[15:37] * ashah (~ashah@125.16.34.66) has joined #ceph
[15:37] <thoht> editing crush map yeah
[15:37] <thoht> when you never did it; it is tricky
[15:38] * allen13 (~allen13@2607:f0d0:1100:8466:31dd:9ec:2adf:6349) has joined #ceph
[15:38] <peetaur2> Be-El: would you trust that if you had size=3?
[15:38] <peetaur2> or is the risk more to do with breaking the crush map?
[15:38] <Be-El> the point about editing the crush is moving the new server to the slot of the former one
[15:39] <Be-El> otherwise the complete cluster is turned upside down and all pgs are reshuffled
[15:39] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[15:39] <Be-El> peetaur2: for a given number of hosts it might be the better way. but with 3 or 4 hosts it is too risky imho
[15:40] <allen13> Does anyone have a status on the ceph.com site being restored along with the repos? Perhaps some backup mirrors? Thanks!
[15:41] <lmb> allen13: http://www.dreamhoststatus.com/2016/10/11/dreamcompute-us-east-1-cluster-service-disruption/
[15:41] <Be-El> allen13: there's eu.ceph.com
[15:41] <allen13> Thanks!
[15:44] * mattbenjamin (~mbenjamin@174-23-175-206.slkc.qwest.net) Quit (Ping timeout: 480 seconds)
[15:48] * vata (~vata@207.96.182.162) has joined #ceph
[15:48] * Ramakrishnan (~ramakrish@125.16.34.66) Quit (Ping timeout: 480 seconds)
[15:49] * allen13 (~allen13@2607:f0d0:1100:8466:31dd:9ec:2adf:6349) Quit ()
[15:51] <diver> why dev's can't just point download A record to eu one
[15:52] <diver> to avoid all those questions\inconvenience
[15:52] * georgem (~Adium@69-165-135-139.dsl.teksavvy.com) has joined #ceph
[15:52] * peetaur2 would
[15:55] * rakeshgm (~rakesh@125.16.34.66) Quit (Remote host closed the connection)
[16:00] <Gugge-47527> diver: maybe ... just maybe ... they are busy with the broken cluster at dreamhost :)
[16:00] * ron-slc (~Ron@173-165-129-118-utah.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[16:00] <diver> Gugge-47527: but they have live mirror. it makes no sense for me
[16:01] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[16:03] * The_Ball (~pi@20.92-221-43.customer.lyse.net) Quit (Quit: Leaving)
[16:03] * The_Ball (~pi@20.92-221-43.customer.lyse.net) has joined #ceph
[16:04] * derjohn_mob (~aj@46.189.28.79) Quit (Ping timeout: 480 seconds)
[16:08] * mattbenjamin (~mbenjamin@97-117-49-192.slkc.qwest.net) has joined #ceph
[16:09] * bene3 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[16:11] * masteroman (~ivan@93-142-195-123.adsl.net.t-com.hr) has joined #ceph
[16:11] * ron-slc (~Ron@173-165-129-118-utah.hfc.comcastbusiness.net) has joined #ceph
[16:13] * georgem (~Adium@69-165-135-139.dsl.teksavvy.com) Quit (Quit: Leaving.)
[16:13] * georgem (~Adium@206.108.127.16) has joined #ceph
[16:15] * northrup (~northrup@173.14.101.193) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[16:26] <IcePic> I'm with diver here I think, should be easy to "fix" and would help a lot of robot scripts that currently fail, for no good reason, then admins scratch their heads and investigate and some get here or other similar places to get a simple workaround.
[16:27] * wedge (~wedge@modemcable104.203-131-66.mc.videotron.ca) has joined #ceph
[16:28] <wedge> ceph.com is down :( any news on when it will be back up?
[16:28] <jagardaniel_> wedge: http://www.dreamhoststatus.com/2016/10/11/dreamcompute-us-east-1-cluster-service-disruption/
[16:29] * wedge (~wedge@modemcable104.203-131-66.mc.videotron.ca) Quit ()
[16:31] * gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) Quit (Quit: Textual IRC Client: www.textualapp.com)
[16:31] * scuttle|afk (~scuttle@nat-pool-rdu-t.redhat.com) has joined #ceph
[16:31] * scuttle|afk is now known as scuttlemonkey
[16:31] * ron-slc (~Ron@173-165-129-118-utah.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[16:31] * gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) has joined #ceph
[16:32] * kristen (~kristen@134.134.139.72) has joined #ceph
[16:36] <doppelgrau> http://www.dreamhoststatus.com/2016/10/11/dreamcompute-us-east-1-cluster-service-disruption/ " We are also working on some hardware upgrades (RAM) to speed the process along as well. There aren't any more details at this time, but should we have more news we will update all customers."
[16:38] <doppelgrau> looks like they have the issues I'd experienced some time ago, but not the possibility to adjust the memory without "remote hands" and spare parts
[16:38] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[16:39] * Tralin|Sleep (~eXeler0n@37.203.209.26) has joined #ceph
[16:39] * kefu (~kefu@114.92.125.128) has joined #ceph
[16:39] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) Quit (Ping timeout: 480 seconds)
[16:39] <peetaur2> why doesn't the memory hogging process just wait instead of crashing things?
[16:40] <peetaur2> just not implemented, or somehow technically impossible?
[16:40] <btaylor> is it possible to hotplug an rbd to a qemu process if it has dropped privileges and/or runs in a chroot? (using the -chroot and/or -runas options?)
[16:40] * derjohn_mob (~aj@80.187.107.61) has joined #ceph
[16:40] <btaylor> wondering because every time i try i get a crypto error
[16:40] <doppelgrau> peetaur2: if the osd gets swapped out it's marked down ...
[16:41] <rkeene> peetaur2, Wait for what ?
[16:41] <peetaur2> btaylor: probably not possible... maybe you should use run as root and use apparmor/selinux to confine it instead
[16:41] <doppelgrau> (too long response times) resulting in more pgs degraded => more memory consumption
[16:41] <peetaur2> btaylor: but you should ask that in #qemu
[16:41] <peetaur2> maybe
[16:41] <IcePic> peetaur2: stuff like fsck really cant wait, in general
[16:41] <IcePic> you can make sure you dont run over several fses at the same time of course.
[16:42] <peetaur2> rkeene: if some remapping process wants RAM and there isn't any, it can just wait like it did before the osd was marked down or whatever, rather than eat more RAM than exists and fail
[16:42] <peetaur2> delaying repair is bad of course, but crashing instead of repair is worse
[16:42] <doppelgrau> (it can be reduced a bit with noout, nodown, but that's awfully slow)
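Those flags are cluster-wide toggles; a minimal sketch for a planned maintenance window (remember to unset them afterwards):

    ceph osd set noout      # don't mark down OSDs out, so no rebalancing starts
    ceph osd set nodown     # ignore down reports; use with care
    # ... do the maintenance ...
    ceph osd unset nodown
    ceph osd unset noout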
[16:42] <IcePic> but if you have added and added to an fs and only at crash+reboot find out that it needs huge amounts of ram and you dont have that amount, things are going to be tough.
[16:44] <IcePic> dont know exactly what part they got stuck on though.
[16:44] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[16:44] <btaylor> peetaur2: its looking like that may be the only way. is it possible to mount ceph things as a regular user forgetting about the qemu part?
[16:44] <IcePic> I just know from old days with AFS that fixing fileservers meant both underlying fsck and then on-top-of-that afs salvage to run over all data.
[16:44] <doppelgrau> my guess: with so many PGs degraded the osds need way more memory than usual => not enough memory installed
[16:45] * masteroman (~ivan@93-142-195-123.adsl.net.t-com.hr) Quit (Quit: WeeChat 1.5)
[16:45] <doppelgrau> I experienced that each OSD can grow from less than 1 GB to nearly 4 GB …
[16:45] * kefu (~kefu@114.92.125.128) has joined #ceph
[16:46] * kefu (~kefu@114.92.125.128) Quit (Remote host closed the connection)
[16:46] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[16:49] * kefu (~kefu@114.92.125.128) has joined #ceph
[16:50] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) has joined #ceph
[16:52] * ashah (~ashah@125.16.34.66) Quit (Quit: Leaving)
[16:53] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[16:54] * kefu (~kefu@114.92.125.128) has joined #ceph
[16:54] * vimal (~vikumar@121.244.87.116) Quit (Quit: Leaving)
[16:56] * kristen (~kristen@134.134.139.72) Quit (Remote host closed the connection)
[16:56] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[16:57] * kefu (~kefu@114.92.125.128) has joined #ceph
[16:58] <dillaman> btaylor: should run just fine outside of root
[16:59] * ntpttr_ (~ntpttr@192.55.54.38) has joined #ceph
[16:59] * yanzheng (~zhyan@118.116.115.174) Quit (Quit: This computer has gone to sleep)
[17:00] * doppelgrau (~doppelgra@132.252.235.172) Quit (Quit: Leaving.)
[17:00] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Quit: Ex-Chat)
[17:00] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[17:04] * jagardaniel_ (~oftc-webi@2001:9b0:109:103::b6) Quit (Quit: Page closed)
[17:07] * ade (~abradshaw@pool-22.254.176.62.dynamic.wobline-ip.de) Quit (Ping timeout: 480 seconds)
[17:09] * Tralin|Sleep (~eXeler0n@37.203.209.26) Quit ()
[17:09] * kristen (~kristen@jfdmzpr03-ext.jf.intel.com) has joined #ceph
[17:11] * salwasser (~Adium@72.246.3.14) Quit (Remote host closed the connection)
[17:15] * cmart (~cmart@128.196.38.77) has joined #ceph
[17:17] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[17:18] * kefu (~kefu@114.92.125.128) has joined #ceph
[17:19] * wushudoin (~wushudoin@50.235.109.50) has joined #ceph
[17:21] * TMM (~hp@185.5.121.201) Quit (Quit: Ex-Chat)
[17:24] * Amto_res (~amto_res@ks312256.kimsufi.com) has joined #ceph
[17:24] <Amto_res> Hello all, i have a problem with download.ceph.com ?
[17:25] <Amto_res> Is there a current problem on the platform?
[17:26] * Hemanth (~hkumar_@125.16.34.66) Quit (Ping timeout: 480 seconds)
[17:26] <wiebalck> yes, there is
[17:26] <Amto_res> Okay ;)
[17:27] <Amto_res> Good luck to you ;)
[17:27] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[17:28] * rmart04 (~rmart04@support.memset.com) Quit (Ping timeout: 480 seconds)
[17:28] * kefu (~kefu@114.92.125.128) has joined #ceph
[17:28] * derjohn_mob (~aj@80.187.107.61) Quit (Ping timeout: 480 seconds)
[17:29] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[17:29] <wiebalck> Amto_res: thx, you can follow things here I think: http://www.dreamhoststatus.com/2016/10/11/dreamcompute-us-east-1-cluster-service-disruption/
[17:31] <topro> "Our beta DreamCompute cluster named US-East 1 is currently having issues with its storage system" <-- it's not a ceph storage cluster, is it? ;)
[17:32] <Amto_res> topro: Yes ! ;) It leads our production test right? ;) ... Good luck .. :/
[17:33] <topro> oh well, reading the whole announcement I would have seen it actually IS a ceph cluster. was just kidding. sorry...
[17:33] * valeech (~valeech@pool-96-247-203-33.clppva.fios.verizon.net) has joined #ceph
[17:33] * derjohn_mob (~aj@tmo-107-61.customers.d1-online.com) has joined #ceph
[17:33] * rraja (~rraja@125.16.34.66) Quit (Ping timeout: 480 seconds)
[17:33] * davidzlap (~Adium@2605:e000:1313:8003:6156:a079:6823:cfb6) has joined #ceph
[17:34] * arcimboldo (~antonio@dhcp-y11-zi-s3it-130-60-34-055.uzh.ch) Quit (Remote host closed the connection)
[17:34] * magicrobot (~oftc-webi@162.246.193.10) has joined #ceph
[17:35] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[17:39] <IcePic> topro: felt the same way when reading today
[17:40] * evelu (~erwan@aut78-1-78-236-183-64.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[17:40] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[17:40] * SweetGirl (~clusterfu@108.61.122.156) has joined #ceph
[17:41] <Nicho1as> ceph-deploy --repo-url http://eu.ceph.com --gpg-url http://eu.ceph.com/keys/release.asc --release jewel install
[17:42] <Nicho1as> "[WARNIN] W: Failed to fetch http://eu.ceph.com/dists/jessie/main/binary-amd64/Packages 404 Not Found [IP: 185.27.175.43 80]"
[17:42] <Nicho1as> :(
[17:43] <Amto_res> Nicho1as: http://www.dreamhoststatus.com/2016/10/11/dreamcompute-us-east-1-cluster-service-disruption/
[17:43] <Nicho1as> Amto_res: I used eu.ceph.com though
[17:43] <darkfader> yeah, that's why it's just a 404 :/
[17:43] <darkfader> seems their mirror script blew away from the empty source
[17:44] <Amto_res> Nicho1as: http://eu.ceph.com/debian/dists/jessie/
[17:44] <Amto_res> ;)
[17:45] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) has joined #ceph
[17:48] <darkfader> Amto_res: no packages hehe
[17:49] * salwasser (~Adium@a72-246-0-10.deploy.akamaitechnologies.com) has joined #ceph
[17:49] <Amto_res> darkfader: haha :/
[17:49] * Skaag (~lunix@65.200.54.234) has joined #ceph
[17:50] <topro> oh, thats going to be an expensive mirror resync then...
[17:51] <topro> expensive in terms of bandwidth needed
[17:51] <Nicho1as> darkfader: there is ..? http://eu.ceph.com/debian/dists/jessie/main/binary-amd64/Packages
[17:51] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[17:51] <Nicho1as> but I don't know how to make ceph-deploy use that
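The 404 looks like the repo-url pointing at the mirror root instead of a debian tree; a guess at a corrected invocation (untested, <node> is a placeholder, and per the next messages the mirror may itself be missing jessie packages while it resyncs):

    ceph-deploy install --release jewel \
        --repo-url http://eu.ceph.com/debian-jewel \
        --gpg-url http://eu.ceph.com/keys/release.asc \
        <node>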
[17:51] <darkfader> Nicho1as: and can you find a dpkg for jessie?
[17:52] * derjohn_mob (~aj@tmo-107-61.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[17:54] * kefu (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[17:55] * kefu (~kefu@114.92.125.128) has joined #ceph
[18:00] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[18:02] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[18:02] * cholcombe (~chris@nc-67-233-225-36.dhcp.embarqhsd.net) has joined #ceph
[18:03] <Nicho1as> darkfader: I can't
[18:04] * KindOne_ (kindone@h222.130.30.71.dynamic.ip.windstream.net) has joined #ceph
[18:05] * cmart (~cmart@128.196.38.77) Quit (Ping timeout: 480 seconds)
[18:07] * KindOne- (kindone@h198.226.28.71.dynamic.ip.windstream.net) has joined #ceph
[18:08] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) Quit (Quit: Leaving.)
[18:08] * batrick (~batrick@2600:3c00::f03c:91ff:fe96:477b) Quit (Quit: WeeChat 1.5)
[18:10] * SweetGirl (~clusterfu@108.61.122.156) Quit ()
[18:10] * batrick (~batrick@2600:3c00::f03c:91ff:fe96:477b) has joined #ceph
[18:11] * KindOne (kindone@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:12] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[18:12] * KindOne- is now known as KindOne
[18:12] * kefu (~kefu@114.92.125.128) has joined #ceph
[18:13] * KindOne_ (kindone@h222.130.30.71.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[18:14] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[18:15] * kefu (~kefu@114.92.125.128) has joined #ceph
[18:16] * wes_dillingham (~wes_dilli@140.247.242.44) Quit (Quit: wes_dillingham)
[18:17] * cmart (~cmart@150.135.128.35) has joined #ceph
[18:19] * ivve (~zed@c83-254-15-40.bredband.comhem.se) Quit (Ping timeout: 480 seconds)
[18:20] * dlan (~dennis@116.228.88.131) Quit (Remote host closed the connection)
[18:21] * KindOne_ (kindone@h198.226.28.71.dynamic.ip.windstream.net) has joined #ceph
[18:24] * branto (~branto@transit-86-181-132-209.redhat.com) Quit (Quit: ZNC 1.6.3 - http://znc.in)
[18:24] * cholcombe (~chris@nc-67-233-225-36.dhcp.embarqhsd.net) Quit (Ping timeout: 480 seconds)
[18:24] * Mousey (~Random@108.61.166.135) has joined #ceph
[18:24] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[18:25] * Ramakrishnan (~ramakrish@106.51.26.9) has joined #ceph
[18:25] * dlan (~dennis@116.228.88.131) has joined #ceph
[18:26] * KindOne- (kindone@h53.224.28.71.dynamic.ip.windstream.net) has joined #ceph
[18:27] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[18:28] * KindOne (kindone@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:28] * KindOne- is now known as KindOne
[18:28] * Ramakrishnan (~ramakrish@106.51.26.9) Quit ()
[18:31] * xarses_ (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:31] * KindOne- (kindone@h37.162.186.173.dynamic.ip.windstream.net) has joined #ceph
[18:33] * KindOne_ (kindone@h198.226.28.71.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[18:33] * Unai (~Adium@50-115-70-150.static-ip.telepacific.net) has joined #ceph
[18:34] * sage (~quassel@64.111.99.127) Quit (Ping timeout: 480 seconds)
[18:35] <topro> seems we're back online :D
[18:35] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[18:36] <topro> at least http://docs.ceph.com
[18:36] * dgurtner (~dgurtner@94.126.212.170) Quit (Ping timeout: 480 seconds)
[18:37] * KindOne (kindone@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:42] * KindOne- (kindone@h37.162.186.173.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[18:44] * xarses_ (~xarses@64.124.158.3) has joined #ceph
[18:45] * rwheeler (~rwheeler@174-23-105-60.slkc.qwest.net) has joined #ceph
[18:48] <ledgr> which one is preferred to deploy ceph? ceph-ansible or deep-sea (salt) ?
[18:48] * sage (~quassel@64.111.99.127) has joined #ceph
[18:49] <doppelgrau> ledgr: depends what you want to use for the rest :)
[18:49] <doppelgrau> ledgr: I'm happy with ansible
[18:50] <[arx]> i use puppet
[18:50] <ledgr> burn! :D
[18:50] <ledgr> thanks ;)
[18:51] * mykola (~Mikolaj@91.245.76.13) has joined #ceph
[18:51] * efirs (~firs@98.207.153.155) has joined #ceph
[18:53] * xinli (~charleyst@32.97.110.53) has joined #ceph
[18:54] * Mousey (~Random@108.61.166.135) Quit ()
[18:54] * ntpttr_ (~ntpttr@192.55.54.38) Quit (Remote host closed the connection)
[18:57] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[18:58] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Remote host closed the connection)
[18:58] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[18:58] * kefu (~kefu@114.92.125.128) has joined #ceph
[19:00] <valeech> I have some performance questions. Running hammer on 3 nodes. Each node has 6 WD 5TB Red spinners and 2 Samsung NVMe drives with dual 10G nics. The NVMe are partitioned to provide journals for 3 OSDs each. Ceph public network is on one 10G nic and cluster network is on the other nic. Monitors also run on the 3 nodes. My question is, does the output from these tests make sense? Why are my writes not faster? Or do I just not get
[19:00] <valeech> benchmarking? http://pastebin.com/kfyes1iv
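For context, write/read numbers like the ones in that paste typically come from rados bench; a minimal sketch against a throwaway pool named bench (the pool name is an assumption here):

    rados bench -p bench 60 write --no-cleanup   # 4 MB objects by default, 16 in flight
    rados bench -p bench 60 seq                  # sequential reads of the objects just written
    rados -p bench cleanup                       # remove the benchmark objects afterwards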
[19:00] <blizzow> Oh the irony download.ceph.com is down because their ceph cluster went down.
[19:00] <valeech> blizzow: uh oh…
[19:00] * jowilkin (~jowilkin@184-23-213-254.fiber.dynamic.sonic.net) has joined #ceph
[19:00] <diver> blizzow: is it true?
[19:01] * gardar (~oftc-webi@89-160-187-133.du.xdsl.is) has joined #ceph
[19:01] <diver> download.ceph.com still doesn't work from my side
[19:01] * gardar (~oftc-webi@89-160-187-133.du.xdsl.is) Quit ()
[19:01] <blizzow> "Update October 12, 2016 9:00 AM:?? After some memory upgrades, the Ceph cluster is making very good progress. "
[19:01] <diver> once it get online I'll probably mirror it in my DC
[19:01] <diver> it's completely unacceptable
[19:01] <blizzow> "Update October 11, 2016 11:45 AM:?? Work continues on the Ceph cluster for the DreamCompute US-East 1 cluster."
[19:02] <diver> ow. that's irony, true
[19:02] <blizzow> Maybe they should send some of their team into #ceph for support. ;)
[19:02] * efirs (~firs@98.207.153.155) Quit (Quit: Leaving.)
[19:03] <diver> for me it makes no sense why 1) they don't have live mirror 2) why it doesn't point there
[19:03] <diver> 5$ VM with 1TB traffic and nginx will handle the load easily
[19:04] <diver> production? never heard
[19:04] * KindOne (kindone@0001a7db.user.oftc.net) has joined #ceph
[19:05] <blizzow> It makes me question my choice of ceph as a storage backend for my infrastructure.
[19:05] <doppelgrau> valeech: those numbers seem odd, just to be sure: reads are (way) faster?
[19:05] * Nicho1as (~nicho1as@00022427.user.oftc.net) Quit (Quit: A man from the Far East; using WeeChat 1.5)
[19:06] * davidzlap (~Adium@2605:e000:1313:8003:6156:a079:6823:cfb6) Quit (Quit: Leaving.)
[19:06] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Ping timeout: 480 seconds)
[19:07] * davidzlap (~Adium@2605:e000:1313:8003:6156:a079:6823:cfb6) has joined #ceph
[19:07] <valeech> doppelgrau: That's what it looks like… I started troubleshooting this because I thought I was having slow read performance backing up data out of ceph. But the benchmarks can't lie, right? Right?
[19:07] * Rickus_ (~Rickus@office.protected.ca) Quit (Quit: Leaving)
[19:07] * wes_dillingham (~wes_dilli@140.247.242.44) has joined #ceph
[19:08] <doppelgrau> blizzow: if you seriously fuck the cluster up (and their network problems did that), nearly all PGs become degraded => OSDs need way more memory (I once observed about 4GB instead of the normal less than 1GB) => if you don't have enough memory in the node it starts swapping and OSDs get marked down…
[19:09] <Unai> Just FYI here ( maybe this will help someone one day ): If you ever think you lost an OSD's drive, and even marked the drive lost, you can get the OSD back if you revive the drive and recreate the OSD with the same ID. So, good on ceph for recognising an OSD that we explicitly ( and wrongly ) told it to completely and absolutely forget.
[19:09] <doppelgrau> blizzow: but with enough memory it recovered fine, even though some disks failed too
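A rough way to watch for the situation doppelgrau describes (degraded PGs driving OSD memory up) on a given node; osd.0 is a stand-in for whichever OSD you care about, and the heap stats line assumes a tcmalloc build.

    ceph -s                          # overall health, count of degraded/misplaced PGs
    ceph pg stat                     # one-line PG state summary
    ps -C ceph-osd -o pid,rss,cmd    # resident memory of each OSD daemon on this node
    ceph tell osd.0 heap stats       # heap statistics for a single OSD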
[19:09] <blizzow> doppelgrau: you working for dreamhost or something?
[19:10] * [arx] is now known as llua
[19:10] <doppelgrau> valeech: of course, benchmarks can lie, but usually they show some truth. What I find interesting is the degrading bandwidth, does it stay the same when running longer benchmarks?
[19:11] * derjohn_mob (~aj@x590c583f.dyn.telefonica.de) has joined #ceph
[19:11] <doppelgrau> blizzow: no, that was freelance work for a small German ISP. If I'd been working for dreamhost, I'd make sure that there was enough memory for each OSD, even during really bad situations
[19:12] * dgurtner (~dgurtner@176.35.230.73) has joined #ceph
[19:14] <valeech> doppelgrau: it kinda bounces around the average of 177MB/s http://pastebin.com/dntqHHCS
[19:15] <doppelgrau> valeech: the "size" is three, default crush map?
[19:16] <diver> >>you working for dreamhost
[19:16] <diver> haha, nice
[19:16] <valeech> doppelgrau: yes, default crushmap. pool is 3/1 with 512 PGs…
[19:18] <doppelgrau> valeech: so 30MB/s is written to each platter, these aren't some SMR drives?
[19:20] <valeech> doppelgrau: No, I don't think so. I think they are PMR
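For reference, the pool settings valeech quotes (size 3, min_size 1, 512 PGs, default crush rule) can be read back roughly like this; "mypool" is just a placeholder pool name.

    ceph osd pool get mypool size        # replica count (3 here)
    ceph osd pool get mypool min_size    # minimum replicas required to serve IO (1 here)
    ceph osd pool get mypool pg_num      # placement groups (512 here)
    ceph osd crush rule dump             # confirm which rule the pool is using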
[19:23] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Quit: cyphase.com)
[19:23] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[19:23] <blizzow> Is there a way to tell how long an OSD has been part of a cluster?
[19:24] <doppelgrau> valeech: with SSD journals, I'd expect double or triple the bandwidth. Did you run the benchmark with smaller objects? -o 32k e.g. (to identify if the IO/s or the bandwidth is the limit somewhere)
[19:24] * xinli (~charleyst@32.97.110.53) Quit (Remote host closed the connection)
[19:25] * xinli (~charleyst@32.97.110.57) has joined #ceph
[19:25] <valeech> doppelgrau: I have not. The NVMe drives I am using scream if I benchmark them individually, so I was expecting much faster writes. At least as fast as one drive :)
[19:28] * BrianA (~BrianA@fw-rw.shutterfly.com) has joined #ceph
[19:31] <doppelgrau> valeech: can you run a short (20 seconds or so) bench with small objects?
[19:31] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:31] <doppelgrau> valeech: just to be sure it's not the IO/s
[19:32] <valeech> doppelgrau: Interesting: http://pastebin.com/2jj47tTa
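The small-object run doppelgrau asks for maps to something like the following; the -o 32k suggested above sets the object size on rados bench builds that support it, while -b keeps the write size small either way. Pool name and runtime are placeholders again.

    # 20 seconds of 32k writes, to see whether IOPS rather than raw bandwidth is the ceiling
    rados bench -p bench 20 write -b 32768 --no-cleanup
    rados -p bench cleanup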
[19:33] <blizzow> Any time I want to do OSD maintenance or even tuning in my cluster, I end up crippling all my VMs. Pulling nodes from other clustering software like elasticsearch or hdfs does not cause me suffering like ceph does. How can I remove a node or OSD without kicking off a bunch of "requests are blocked > 32 sec"?
[19:33] * sage (~quassel@64.111.99.127) Quit (Ping timeout: 480 seconds)
[19:33] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[19:33] * morourke (~Mike@158.116.144.39) has joined #ceph
[19:39] <morourke> Anybody know of a place where people can upload/share ceph performance information? Maybe something like serverbear (now defunct) but for storage/cluster?
[19:40] * sage (~quassel@64.111.99.127) has joined #ceph
[19:41] <btaylor> i'm trying to unmap an rbd, but it has /sys/block/rbd0/holders/dm-2. any ideas how i can remove that?
[19:42] <btaylor> nm got it
[19:43] <btaylor> dmsetup.
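btaylor's fix, spelled out as a rough sequence (device and mapping names are placeholders): the device-mapper target holding rbd0 has to be torn down before the unmap will succeed.

    ls /sys/block/rbd0/holders/      # shows which dm device holds the rbd (dm-2 in this case)
    dmsetup ls                       # find the mapping name behind dm-2
    dmsetup remove <mapping-name>    # tear down the device-mapper target
    rbd unmap /dev/rbd0              # now the unmap goes through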
[19:43] <blizzow> morourke: I set up a google doc survey once, but nobody really responded. https://drive.google.com/open?id=1vi_9uF0w9-OvTHTrxeMt6JPOk7mdAvnAAyy5NpJuK9Q
[19:43] <blizzow> I'd love some feedback on the questions or what to ask.
[19:43] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Quit: cyphase.com)
[19:43] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[19:44] <morourke> blizzow, I've opened the survey. I'll have to look it over before I can say. :)
[19:44] <blizzow> of course, thanks for even looking!
[19:45] * peetaur (~peter@p200300E10BC02C00667002FFFE2E10FC.dip0.t-ipconnect.de) has joined #ceph
[19:45] <morourke> Do you think there'd be any interest in having a benchmarking/ranking site? Maybe users could fill in information survey-style, and some kind of trusted benchmarking/testing
[19:45] <blizzow> morourke: I would be interested in it for sure.
[19:46] <blizzow> That's why I set up that google doc survey ;)
[19:49] <doppelgrau> valeech: interesting, the jumps in bandwidth (guess some journal settings might cause them), but with 400-1000 IO/s that should not be the main problem
[19:50] <doppelgrau> blizzow: how do you remove an osd for maintenance? I usually only get some slow requests on rejoin when the peering happens
[19:52] * kefu (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[19:52] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[19:52] <blizzow> doppelgrau: it depends. But typically I do the osd out and remove it from the crush map dance.
[19:54] <doppelgrau> blizzow: eeks, so you start the full dance with degraded and misplaced PGs at once => much peering…
[19:55] * davidzlap (~Adium@2605:e000:1313:8003:6156:a079:6823:cfb6) Quit (Quit: Leaving.)
[19:55] <doppelgrau> if it isn't a failure, in my experience slowly reducing the weight to zero is way more friendly
[19:56] <doppelgrau> I usually do it in 3 or 5 steps with about a 30 second break between them (so peering finishes each time)
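doppelgrau's gradual drain, sketched as a loop; osd.12 and the starting weight are placeholders, and the step count and pause mirror what's described above rather than anything prescriptive.

    # walk the crush weight of osd.12 down to zero in steps, letting peering settle in between
    for w in 0.8 0.6 0.4 0.2 0.0; do
        ceph osd crush reweight osd.12 $w
        sleep 30    # give peering time to finish before the next step
    done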
[19:57] <morourke> blizzow, I think that's a solid set of questions in your survey. The only "obvious" gap is a non-issue. No question on "if" someone is using journaling, but if someone is trying to talk-up their performance then they'll be using journaling anyway
[19:58] <blizzow> doppelgrau: how full are your OSDs?
[19:59] <doppelgrau> about 40% ATM
[19:59] <blizzow> how much data rather? I'm storing about 300GB/OSD.
[20:00] <doppelgrau> should be around 800GB/OSD
[20:01] <blizzow> And it only takes you 5 minutes to remove an OSD from a cluster?
[20:01] <blizzow> morourke: I'll add it.
[20:02] <doppelgrau> blizzow: no, data movement takes longer, but reducing the weight usually takes only one or two minutes
[20:03] <doppelgrau> blizzow: but with the peering distributed, I see only very few slow requests (in the range 10-15 seconds usually)
[20:03] <doppelgrau> blizzow: and after the data has been migrated, the osd can be removed without any impact
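Once the drained OSD holds no data, the usual removal sequence looks roughly like this; osd.12 is again a placeholder and these are hammer-era commands, so service management differs by distro.

    ceph osd out 12                 # stop new data being mapped to it
    service ceph stop osd.12        # or systemctl stop ceph-osd@12, depending on the distro
    ceph osd crush remove osd.12    # drop it from the crush map
    ceph auth del osd.12            # remove its cephx key
    ceph osd rm 12                  # finally remove the OSD id itself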
[20:11] * lcurtis (~lcurtis@47.19.105.250) has joined #ceph
[20:11] * haplo37 (~haplo37@199.91.185.156) has joined #ceph
[20:16] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[20:18] * mnc (~mnc@c-50-137-214-131.hsd1.mn.comcast.net) has joined #ceph
[20:20] * davidzlap (~Adium@2605:e000:1313:8003:6156:a079:6823:cfb6) has joined #ceph
[20:22] * Hemanth (~hkumar_@103.228.221.140) has joined #ceph
[20:23] * dgurtner (~dgurtner@176.35.230.73) Quit (Ping timeout: 480 seconds)
[20:24] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[20:28] * rendar (~I@host153-176-dynamic.52-79-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[20:38] * niknakpa1dywak (~xander.ni@outbound.lax.demandmedia.com) has joined #ceph
[20:44] * bjozet (~bjozet@82.183.17.144) has joined #ceph
[20:48] <valeech> doppelgrau: Yeah, I am not sure what the problem is…
[20:49] * salwasser (~Adium@a72-246-0-10.deploy.akamaitechnologies.com) Quit (Ping timeout: 480 seconds)
[20:49] * font (~ifont@67.159.147.92) has joined #ceph
[20:51] <font> ceph.com down?
[20:52] <font> server responds to echo request but the web server appears to be down
[20:53] <blizzow> font, ceph.com is down because their ceph cluster went down.
[20:53] <blizzow> http://www.dreamhoststatus.com/2016/10/11/dreamcompute-us-east-1-cluster-service-disruption/
[20:54] <blizzow> joao or nhm, could you PLEASE change the channel topic to reflect this?
[20:54] <font> yes that would be helpful as that's the first place I looked
[20:54] <blizzow> joao: or nhm: could you also put a link in the channel topic for where the cephlogbot logs to?
[20:54] * rendar (~I@host153-176-dynamic.52-79-r.retail.telecomitalia.it) has joined #ceph
[20:55] <blizzow> could leseb do either of those things?
[21:05] <font> thanks blizzow
[21:13] * pakman__ (~Tumm@108.61.122.156) has joined #ceph
[21:16] * georgem1 (~Adium@45.72.236.114) has joined #ceph
[21:17] * davidzlap (~Adium@2605:e000:1313:8003:6156:a079:6823:cfb6) Quit (Quit: Leaving.)
[21:17] <leseb> blizzow nop I don't have access to that
[21:17] * davidzlap (~Adium@2605:e000:1313:8003:6156:a079:6823:cfb6) has joined #ceph
[21:18] * maybebuggy (~maybebugg@2a01:4f8:191:2350::2) Quit (Ping timeout: 480 seconds)
[21:19] * TomasCZ (~TomasCZ@yes.tenlab.net) has joined #ceph
[21:23] <blizzow> damn :(
[21:24] * georgem (~Adium@206.108.127.16) Quit (Ping timeout: 480 seconds)
[21:24] * georgem1 (~Adium@45.72.236.114) Quit (Ping timeout: 480 seconds)
[21:27] * lmb (~Lars@2a02:8109:8100:1d2c:2ad2:44ff:fedf:3318) Quit (Ping timeout: 480 seconds)
[21:30] * cmart (~cmart@150.135.128.35) Quit (Ping timeout: 480 seconds)
[21:31] <herrsergio> nhm: you have privileges to do it
[21:33] * haplo37 (~haplo37@199.91.185.156) Quit (Ping timeout: 480 seconds)
[21:35] * keeperandy (~textual@50-245-231-209-static.hfc.comcastbusiness.net) Quit (Quit: Textual IRC Client: www.textualapp.com)
[21:36] * lmb (~Lars@ip5b404bab.dynamic.kabel-deutschland.de) has joined #ceph
[21:38] * cholcombe (~chris@nc-67-233-225-36.dhcp.embarqhsd.net) has joined #ceph
[21:40] <thoht> what is the best way to remove a mon definitively?
[21:41] * wes_dillingham (~wes_dilli@140.247.242.44) Quit (Ping timeout: 480 seconds)
[21:42] * haplo37 (~haplo37@199.91.185.156) has joined #ceph
[21:42] * TMM (~hp@dhcp-077-248-009-229.chello.nl) has joined #ceph
[21:43] * pakman__ (~Tumm@108.61.122.156) Quit ()
[21:44] <blizzow> herrsergio: nhm and joao seem to repeatedly ignore requests to change the channel topic.
[21:45] * Hemanth (~hkumar_@103.228.221.140) Quit (Ping timeout: 480 seconds)
[21:49] * Discovery (~Discovery@109.235.52.4) has joined #ceph
[21:53] * davidzlap (~Adium@2605:e000:1313:8003:6156:a079:6823:cfb6) Quit (Quit: Leaving.)
[22:00] <brians_> the whole ceph organisation hasn't exactly been great at communicating why their infrastructure is down - it takes 2 minutes to post something on twitter to let people know what's up
[22:00] <brians_> but perhaps they're all tied up trying to get dreamhost cloud compute back up in fairness
[22:00] <lurbs> Even a note on their website would help. ;)
[22:00] <brians_> lol
[22:05] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[22:13] * kjetijor (kjetijor@hildring.pvv.ntnu.no) has joined #ceph
[22:14] * mykola (~Mikolaj@91.245.76.13) Quit (Quit: away)
[22:14] <blizzow> Is the sage that's in here the same sage that runs ceph?
[22:15] * rendar (~I@host153-176-dynamic.52-79-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[22:24] * diver_ (~diver@216.85.162.34) has joined #ceph
[22:29] * haplo37 (~haplo37@199.91.185.156) Quit (Ping timeout: 480 seconds)
[22:29] * morourke (~Mike@158.116.144.39) Quit (Quit: Leaving)
[22:31] * diver (~diver@95.85.8.93) Quit (Ping timeout: 480 seconds)
[22:32] * cmart (~cmart@150.135.210.203) has joined #ceph
[22:32] * diver_ (~diver@216.85.162.34) Quit (Ping timeout: 480 seconds)
[22:32] * davidzlap (~Adium@2605:e000:1313:8003:6156:a079:6823:cfb6) has joined #ceph
[22:40] * cmart (~cmart@150.135.210.203) Quit (Ping timeout: 480 seconds)
[22:43] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[22:47] * Nats__ (~natscogs@114.31.195.238) has joined #ceph
[22:47] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[22:47] * Alexey_Abashkin (~AlexeyAba@91.207.132.76) has joined #ceph
[22:47] * JarekO_ (~jowsie@hC35A6AF2.cli.nitronet.pl) has joined #ceph
[22:47] * Tetard_ (~regnauld@x1.x0.dk) has joined #ceph
[22:48] * sepa (~sepa@aperture.GLaDOS.info) has joined #ceph
[22:49] * destrudo (~destrudo@tomba.sonic.net) Quit (Remote host closed the connection)
[22:49] * darkfaded (~floh@88.79.251.60) has joined #ceph
[22:49] * destrudo (~destrudo@tomba.sonic.net) has joined #ceph
[22:50] * peetaur (~peter@p200300E10BC02C00667002FFFE2E10FC.dip0.t-ipconnect.de) Quit (Quit: Konversation terminated!)
[22:50] * kiranos_ (~quassel@109.74.11.233) has joined #ceph
[22:50] * o0c_ (~o0c@chris.any.mx) has joined #ceph
[22:51] * pfactum (~post-fact@vulcan.natalenko.name) has joined #ceph
[22:51] * olc (~olecam@93.184.35.82) has joined #ceph
[22:51] * WildyLio1 (~simba@45.32.185.17) has joined #ceph
[22:51] * aiicore_ (~aiicore@s30.linuxpl.com) has joined #ceph
[22:51] * kiranos (~quassel@109.74.11.233) Quit (Remote host closed the connection)
[22:51] * darkfader (~floh@88.79.251.60) Quit (Remote host closed the connection)
[22:51] * o0c (~o0c@chris.any.mx) Quit (Remote host closed the connection)
[22:51] * seosepa (~sepa@aperture.GLaDOS.info) Quit (Read error: Connection reset by peer)
[22:51] * Tetard (~regnauld@x1.x0.dk) Quit (Read error: Connection reset by peer)
[22:51] * WildyLion (~simba@45.32.185.17) Quit (Remote host closed the connection)
[22:51] * aiicore (~aiicore@s30.linuxpl.com) Quit (Remote host closed the connection)
[22:51] * olc-__ (~olecam@93.184.35.82) Quit (Remote host closed the connection)
[22:51] * libracious (~libraciou@catchpenny.cf) Quit (Ping timeout: 480 seconds)
[22:51] * jprins (~jprins@bbnat.betterbe.com) Quit (Remote host closed the connection)
[22:51] * Hannes (~Hannes@hygeia.opentp.be) Quit (Remote host closed the connection)
[22:51] * jprins (~jprins@bbnat.betterbe.com) has joined #ceph
[22:51] * Hannes (~Hannes@hygeia.opentp.be) has joined #ceph
[22:52] * post-factum (~post-fact@vulcan.natalenko.name) Quit (Read error: Connection reset by peer)
[22:52] * cmart (~cmart@150.135.128.35) has joined #ceph
[22:52] * JarekO (~jowsie@hC35A6AF2.cli.nitronet.pl) Quit (Ping timeout: 480 seconds)
[22:53] * AlexeyAbashkin (~AlexeyAba@91.207.132.76) Quit (Ping timeout: 480 seconds)
[22:53] * Nats_ (~natscogs@114.31.195.238) Quit (Ping timeout: 480 seconds)
[22:54] * Dominik_H (b034c4a4@107.161.19.109) has joined #ceph
[22:54] * destrudo (~destrudo@tomba.sonic.net) Quit (Remote host closed the connection)
[22:55] <Dominik_H> docs.ceph.com down ?
[22:57] <blizzow> Oh fuck not again.
[22:58] <blizzow> Dominik_H: http://www.dreamhoststatus.com/2016/10/11/dreamcompute-us-east-1-cluster-service-disruption/
[22:58] <Dominik_H> yes, i think it's been down for a few hours
[22:58] <Dominik_H> ah ok, i see, thanks
[22:58] <blizzow> Dominik_H: It's been down since yesterday.
[23:00] * georgem (~Adium@45.72.224.60) has joined #ceph
[23:00] * destrudo (~destrudo@tomba.sonic.net) has joined #ceph
[23:01] <Dominik_H> hm, ok i hope it will be back up soon ;)
[23:05] <btaylor> i'll be very interested in reading a post mortem so i can avoid any pitfalls that led to their ceph cluster going down, and so i know how to fix it when it does
[23:06] * wes_dillingham (~wes_dilli@209-6-222-74.c3-0.hdp-ubr1.sbo-hdp.ma.cable.rcn.com) has joined #ceph
[23:07] <T1> I'm guessing ymmv - their problems are probably not something you will encounter
[23:08] * georgem (~Adium@45.72.224.60) Quit (Ping timeout: 480 seconds)
[23:09] * destrudo (~destrudo@tomba.sonic.net) Quit (Remote host closed the connection)
[23:10] * destrudo (~destrudo@tomba.sonic.net) has joined #ceph
[23:13] * Discovery (~Discovery@109.235.52.4) Quit (Read error: Connection reset by peer)
[23:14] <doppelgrau> btaylor: the core message is in the status updates: put enough memory in the OSD nodes
[23:14] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) Quit (Remote host closed the connection)
[23:14] * haplo37 (~haplo37@199.91.185.156) has joined #ceph
[23:14] <doppelgrau> (after doing that, the updates got better)
[23:15] <herrsergio> what is the relationship between Ceph and Dreamhost?
[23:15] <Dominik_H> how much would you recommend per osd ?
[23:16] * blizzow (~jburns@50-243-148-102-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:16] <herrsergio> I mean, does the ceph team work at dreamhost ?
[23:16] * blizzow1 (~jburns@50-243-148-102-static.hfc.comcastbusiness.net) has joined #ceph
[23:17] <doppelgrau> IIRC inktank (the core ceph team) was sold to Red Hat, but I guess when dreamhost sponsors their infrastructure there might be good connections
[23:19] <doppelgrau> Dominik_H: with nearly all PGs unhealthy (degraded, undersized, remapped …) I've seen 4 times the normal memory amount
[23:19] <herrsergio> Anyway, the motto "self-managing platform with no single point of failure" is a lie :(
[23:20] <doppelgrau> Dominik_H: normal use is a bit less than 1GB, in that state about 4GB
[23:20] <Dominik_H> @doppelgrau hm, good to know, i also thought that about 1GB/OSD is enough
[23:21] <T1> the old rule of thumb of 1GB ram per 1TB of storage is low
[23:21] <T1> there has been talk of going up to 2GB/1TB of data several times on the mailing list
[23:21] <doppelgrau> herrsergio: depends on the definition, not enough memory might be seen as a common operator fault
[23:21] <T1> especially if using EC
[23:22] * destrudo (~destrudo@tomba.sonic.net) Quit (Remote host closed the connection)
[23:22] <blizzow1> ooh, looks like it might be back?
[23:22] <blizzow1> Let's all hit the repos at once!
[23:24] <doppelgrau> herrsergio: and what gave me good confidence in ceph was that it got stable quite fast after getting enough resources, after some hours with different failures including failed disks
[23:24] <Cube> T1 - 1GB per 1TB of storage doesn't actually make any sense… We typically recommend 2-3GB per OSD + 16GB for the OS
[23:25] <T1> Cube: I actually meant 1GB per 1TB per OSD
[23:25] <doppelgrau> but the assumption "normal operation uses only 1GB, so 1.5GB per OSD should be enough" was a huge mistake when for different reasons the cluster got into a very unhealthy state
[23:25] <T1> so a single OSD holding up to 4TB of data should have access to at least 4GB of memory
[23:26] * blizzow1 is now known as blizzow
[23:27] <doppelgrau> in my experience the memory consumption primarily depends on the number of PGs and the state of the PGs, not so much on the amount of data stored
[23:27] <T1> probably right
[23:27] * destrudo (~destrudo@tomba.sonic.net) has joined #ceph
[23:27] <doppelgrau> but that is not based on hard statistics, at least not yet :)
[23:28] <T1> it "aligns" with the recommendation of not having "too many" PGs or you will have an increased amount of ram
[23:29] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Remote host closed the connection)
[23:29] <T1> but going for a 2GB/1TB/OSD ratio when designing new nodes is IMO not a bad idea
[23:29] <Cube> at least 2GB
[23:29] * ron-slc (~Ron@173-165-129-118-utah.hfc.comcastbusiness.net) has joined #ceph
[23:30] <T1> so a node with 8 OSDs having 4TB each would require 64GB + OS
[23:30] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)
[23:30] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Ping timeout: 480 seconds)
[23:30] * doppelgrau calculates with 4GB/OSD (mainly 2TB disks, some smaller SSDs) - or 4 times the memory used in healthy state
[23:30] <bstillwell> I would say 32GB is good for 8 OSDs
[23:30] <blizzow> with the recommendation of 16GB for OS, that's 80GB RAM.
[23:30] <T1> and just for the sake of it then double that amount - memory is cheap anyway
[23:30] * markl (~mark@knm.org) has joined #ceph
[23:30] <blizzow> That seems f00king high.
[23:31] <Cube> If your application is read heavy then it makes sense to have lots of RAM.
[23:31] <T1> having some available for cache in the os is also not bad
[23:31] <bstillwell> Cube has a good point
[23:32] <bstillwell> T1 are these going to be SSDs or HDDs?
[23:32] <doppelgrau> bstillwell: 32GB for 8 OSDs is probably ok; after my experience I'd go for 48GB to have some headroom for the OS
[23:32] <T1> bstillwell: rotating rust..
[23:32] <doppelgrau> but 80GB would be nice, if oversized (unless you have lots of "local" reads that profit from that)
[23:33] <bstillwell> doppelgrau: I was thinking 2GB/OSD would only be 16GB. So 32GB in the system would give you 16GB for OS and cache.
[23:33] <T1> I'm still pondering the final setup, but 4TB 2.5" disks should be doable - 10 2.5" bays in 1U where 2 are for OS + SSD based journals and then 8 OSDs
[23:33] <bstillwell> T1: What is your SSD:HDD journal:OSD ratio?
[23:33] <Cube> What sort of SSDs?
[23:33] <blizzow> T1: have you pondered gluster or something else?
[23:34] <doppelgrau> bstillwell: since I've observed OSD daemons (ok, at that time having lots of PGs/OSD) going up to 4GB/OSD with nearly all PGs not healthy...
[23:34] * Jeffrey4l__ (~Jeffrey@119.251.244.147) has joined #ceph
[23:34] <T1> bstillwell: undecided
[23:34] <doppelgrau> bstillwell: so I'd now calculate 4GB per OSD + at least 2GB for the OS (or more if it has other duties)
[23:34] <T1> blizzow: no, at the moment we're just adding capacity
[23:34] <bstillwell> T1: So you're thinking of sharing the OS with the journals?
[23:35] <T1> bstillwell: yes - OS/journals on the same SSDs in a software raid1
[23:35] <T1> we're running that already on smaller nodes
[23:35] <bstillwell> doppelgrau: How many PGs did you have per OSD?
[23:35] <doppelgrau> bstillwell: I think it was about 400 (it was during a transition between some pools)
[23:35] <bstillwell> T1: Yuck, but I guess it would work.
[23:35] <T1> tests when setting it up back then showed no penalty from running the journals on software raid1 compared to directly on a single ssd
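T1's layout, roughly: two SSDs software-mirrored, with the OS and the journals sharing the array. A minimal sketch under the assumption that both SSDs are already partitioned identically; every device and volume name below is a placeholder.

    # mirror the journal partition across both SSDs (the OS partitions get their own md device)
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
    # carve per-OSD journals out of the mirror, e.g. with LVM
    pvcreate /dev/md1
    vgcreate journals /dev/md1
    lvcreate -L 10G -n journal-osd0 journals
    # then hand /dev/journals/journal-osd0 to the OSD as its journal device when preparing it
    # (with hammer-era tooling that would typically be done via ceph-disk prepare)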
[23:36] <bstillwell> doppelgrau: That makes sense then as to why you would need so much memory.
[23:36] <bstillwell> If you're at 100-200 PGs/OSD it shouldn't be that high.
[23:36] <doppelgrau> bstillwell: or to compare, in a healthy state it was about 900MB up to 1.1GB
[23:36] <T1> .. and I'd like to think that a set of mirrored SSDs would prevent a single SSD failure from pulling down either the entire host or all OSDs
[23:37] <bstillwell> T1: Did you RAID1 the journals?
[23:37] <T1> bstillwell: yes - I did not see any reason not to
[23:37] <T1> we've been running such a setup for the better part of a year now
[23:37] <blizzow> T1: because RAID1 will slow down writing to the journals?
[23:37] <T1> never seen a single issue with it
[23:38] <bstillwell> T1: That means you had a 10:1 ratio... Your HDDs could easily overload the SSDs in that situation.
[23:38] <T1> blizzow: no, I did not see any slowdown at all
[23:38] * Dominik_H (b034c4a4@107.161.19.109) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[23:38] <bstillwell> Unless your OSDs aren't very busy.
[23:38] <bstillwell> But backfilling could crush them.
[23:38] <Cube> Also what type SSDs?
[23:38] <doppelgrau> bstillwell: I hope I never have such an unhealthy cluster again, but better safe than sorry, and during normal operation it's a good read cache
[23:39] * Jeffrey4l_ (~Jeffrey@110.252.52.64) Quit (Ping timeout: 480 seconds)
[23:39] * magicrobot (~oftc-webi@162.246.193.10) Quit (Remote host closed the connection)
[23:39] * fsimonce (~simon@95.239.69.67) Quit (Quit: Coyote finally caught me)
[23:39] <doppelgrau> T1: why not raid1 just for the OS and split the journals, doubling the lifetime of the SSDs?
[23:39] <bstillwell> doppelgrau: I can understand that.
[23:39] <T1> bstillwell: given a single SATA disk would be able to generate/require ~200 IOPS, x8 that means at least 2000 IOPS from the mirrored device - tests on those 1 year old nodes gave me 40,000 IOPS with no tweaking.. I'm pretty sure it won't be a problem
[23:40] <T1> Cube: Intel S3710s
[23:40] <bstillwell> Although if you have a lot of reads, maybe you should look into using bcache.
[23:40] <T1> doppelgrau: because the SSDs allow for 10 DWPD
[23:40] <Cube> Okay, those are great SSDs, good job :)
[23:40] <bstillwell> T1: Is your workload a lot of 4k reads/writes?
[23:41] <T1> Cube: indeed.. :)
[23:41] <Cube> Basically the ONLY SSDs we can recommend
[23:41] <T1> I know
[23:41] <T1> originally we thought of going for some random consumer-level stuff
[23:41] <T1> then I read up on older entries on the mailing list..
[23:42] <Cube> yeah, nightmare stories…
[23:42] <Cube> been there many times.
[23:42] <bstillwell> Because usually you have to worry about throughput of the SSDs versus the IOPs.
[23:42] <T1> I raised a flag with my bosses and within a few minutes we reconsidered and went with better SSDs
[23:43] <T1> bstillwell: no, most IO is for larger amounts of data
[23:43] <bstillwell> Basically the journals have to handle the combined throughput of all the HDDs during normal and recovery operations.
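bstillwell's point in rough numbers (generic per-drive figures, not measurements from this cluster): 8 HDDs at roughly 150MB/s of sequential writes each is on the order of 1.2GB/s of combined journal traffic during backfill, while a single SATA SSD sustains somewhere around 400-500MB/s, so an 8:1 or 10:1 HDD-to-journal ratio can leave the journal device, not the spinners, as the bottleneck during recovery.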
[23:43] * evelu (~erwan@aut78-1-78-236-183-64.fbx.proxad.net) has joined #ceph
[23:44] <T1> mmm I know
[23:45] <T1> but then I can just tweak the backfill rates
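The knobs T1 is referring to are the recovery/backfill throttles; a rough example of turning them down at runtime (the values are illustrative, and the same settings can live in ceph.conf instead).

    # slow backfill/recovery down while doing maintenance
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'
    # and raise them again (toward the hammer-era defaults) once the cluster is healthy
    ceph tell osd.* injectargs '--osd-max-backfills 10 --osd-recovery-max-active 15'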
[23:47] <bstillwell> T1: Well if you start having slow requests during failures or expansions, look at reducing that ratio of HDDs per journal SSD.
[23:47] <doppelgrau> I know a cluster running with SM863s; so far no bad news (one SSD for 4 HDDs - mostly small IO due to rbd clients, currently less than 10MB/s written to the SSD)
[23:47] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[23:49] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) Quit (Ping timeout: 480 seconds)
[23:50] <T1> bstillwell: quite hard to do with all bays filled, but I get your drift.. :)
[23:51] * bjozet (~bjozet@82.183.17.144) Quit (Remote host closed the connection)
[23:54] <bstillwell> T1: You could always add an NVMe card for the journals. A single NVMe card could handle 12-18 OSDs.
[23:54] <bstillwell> Or you could wait for BlueStore to become stable in Kraken and not have to deal with journals anymore.
[23:54] <T1> yeah, but that is not feasible in the cabinets I'm looking at
[23:55] <T1> .. which is also quite possible..
[23:55] <bstillwell> They don't have any free PCIe slots?
[23:55] <T1> probably not an x8 or x16, though
[23:56] <T1> (I can't remember the specs, but one slot is taken for a X710 10Gbit NIC)
[23:58] * w0lfeh (~ZombieL@tsn109-201-152-26.dyn.nltelcom.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.