#ceph IRC Log

IRC Log for 2016-08-24

Timestamps are in GMT/BST.

[0:07] * rendar (~I@host158-39-dynamic.57-82-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[0:07] * xarses (~xarses@64.124.158.32) Quit (Ping timeout: 480 seconds)
[0:10] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[0:10] * Pulp (~Pulp@63-221-50-195.dyn.estpak.ee) Quit ()
[0:11] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[0:14] * kiasyn (~hgjhgjh@185.65.134.80) has joined #ceph
[0:20] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[0:20] * squizzi (~squizzi@107.13.237.240) Quit (Quit: bye)
[0:21] <jiffe> how does one set ignore_history_les=true ?
[0:22] * mattbenjamin (~mbenjamin@12.118.3.106) Quit (Ping timeout: 480 seconds)
[0:29] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) has joined #ceph
[0:29] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[0:30] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[0:37] * jermudgeon (~jhaustin@tab.mdu.whitestone.link) has joined #ceph
[0:38] * DrewBeer (~DrewBeer@216.152.240.203) Quit (Server closed connection)
[0:38] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[0:38] * DrewBeer (~DrewBeer@216.152.240.203) has joined #ceph
[0:40] * SpamapS (~SpamapS@xencbyrum2.srihosting.com) Quit (Server closed connection)
[0:40] * SpamapS (~SpamapS@xencbyrum2.srihosting.com) has joined #ceph
[0:41] * vbellur (~vijay@71.234.224.255) has joined #ceph
[0:44] * kiasyn (~hgjhgjh@26XAABA16.tor-irc.dnsbl.oftc.net) Quit ()
[0:44] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[0:44] <jiffe> so I've updated the two osds attached to that PG with osd_find_best_info_ignore_history_les = true, now that PG keeps flipping between peering and remapped+peering and the query shows http://nsab.us/public/ceph
[0:45] <jiffe> I no longer see any reference to OSD 29, which is the one I pulled out, so I'm not sure what the holdup is
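(For reference, osd_find_best_info_ignore_history_les is read by the OSD during peering; the usual route is ceph.conf plus a daemon restart. A minimal sketch; the osd IDs are placeholders:)

    # /etc/ceph/ceph.conf on the relevant OSD hosts
    [osd]
        osd find best info ignore history les = true

    # then restart the affected daemons, e.g. under systemd:
    systemctl restart ceph-osd@30 ceph-osd@31

Remember to remove the option again once the PG has peered.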
[0:59] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[1:00] * jermudgeon (~jhaustin@tab.mdu.whitestone.link) Quit (Quit: jermudgeon)
[1:02] * KindOne (kindone@0001a7db.user.oftc.net) Quit (Read error: Connection reset by peer)
[1:05] * fsimonce (~simon@host203-44-dynamic.183-80-r.retail.telecomitalia.it) Quit (Quit: Coyote finally caught me)
[1:05] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[1:09] * cathode (~cathode@50.232.215.114) Quit (Quit: Leaving)
[1:13] * oms101 (~oms101@p20030057EA69E600C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:13] * T1w (~jens@node3.survey-it.dk) Quit (Remote host closed the connection)
[1:14] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) has joined #ceph
[1:22] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[1:23] * oms101 (~oms101@p20030057EA6F3B00C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[1:24] * KindOne (kindone@h253.163.186.173.dynamic.ip.windstream.net) has joined #ceph
[1:28] * masber (~masber@129.94.15.152) has joined #ceph
[1:28] <masber> Hi
[1:28] * salwasser (~Adium@2601:197:101:5cc1:7c3f:e556:a23f:af1a) has joined #ceph
[1:29] * storage (~poller@108.61.123.68) has joined #ceph
[1:33] * TiCPU (~owrt@c216.218.54-96.clta.globetrotter.net) Quit (Server closed connection)
[1:33] * TiCPU (~owrt@2001:470:1c:40::2) has joined #ceph
[1:34] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) has joined #ceph
[1:34] * raphaelsc (~raphaelsc@177.42.73.142) has joined #ceph
[1:35] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[1:42] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[1:48] * herrsergio (~herrsergi@00021432.user.oftc.net) Quit (Server closed connection)
[1:48] * herrsergio (~herrsergi@ec2-107-21-210-136.compute-1.amazonaws.com) has joined #ceph
[1:49] * herrsergio is now known as Guest798
[1:52] * evilrob (~evilrob@2600:3c00::f03c:91ff:fedf:1d3d) Quit (Server closed connection)
[1:52] * dneary (~dneary@207.236.147.202) has joined #ceph
[1:54] * evilrob (~evilrob@2600:3c00::f03c:91ff:fedf:1d3d) has joined #ceph
[1:54] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[1:54] * salwasser (~Adium@2601:197:101:5cc1:7c3f:e556:a23f:af1a) Quit (Quit: Leaving.)
[1:59] * storage (~poller@108.61.123.68) Quit ()
[2:01] * Kruge (~Anus@198.211.99.93) Quit (Server closed connection)
[2:01] * Kruge (~Anus@198.211.99.93) has joined #ceph
[2:01] * Nats (~natscogs@114.31.195.238) has joined #ceph
[2:04] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[2:05] * rektide (~rektide@eldergods.com) Quit (Server closed connection)
[2:09] * shaon (~shaon@shaon.me) Quit (Server closed connection)
[2:09] * shaon (~shaon@shaon.me) has joined #ceph
[2:12] * Jeffrey4l_ (~Jeffrey@119.251.140.28) has joined #ceph
[2:13] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[2:22] * dneary (~dneary@207.236.147.202) Quit (Ping timeout: 480 seconds)
[2:24] * KindOne_ (kindone@h44.149.29.71.dynamic.ip.windstream.net) has joined #ceph
[2:30] * KindOne (kindone@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:30] * KindOne_ is now known as KindOne
[2:32] * wushudoin (~wushudoin@2601:646:8281:cfd:2ab2:bdff:fe0b:a6ee) Quit (Ping timeout: 480 seconds)
[2:40] * chunmei (~chunmei@134.134.137.75) Quit (Remote host closed the connection)
[2:52] * dnunez (~dnunez@nat-pool-bos-u.redhat.com) Quit (Remote host closed the connection)
[2:57] * dmonschein (~dmonschei@00020eb4.user.oftc.net) Quit (Server closed connection)
[2:57] * wkennington (~wkenningt@c-71-204-170-241.hsd1.ca.comcast.net) has joined #ceph
[2:57] * dmonschein (~dmonschei@00020eb4.user.oftc.net) has joined #ceph
[3:03] * sudocat (~dibarra@45-17-188-191.lightspeed.hstntx.sbcglobal.net) has joined #ceph
[3:04] * Kingrat_ (~shiny@cpe-74-129-33-192.kya.res.rr.com) Quit (Quit: Leaving)
[3:06] <jiffe> now that pg is just marked inactive
[3:10] * wjw-freebsd3 (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[3:11] * sudocat (~dibarra@45-17-188-191.lightspeed.hstntx.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[3:14] * rkeene (1011@oc9.org) Quit (Server closed connection)
[3:14] * rkeene (1011@oc9.org) has joined #ceph
[3:17] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[3:21] * linuxkidd (~linuxkidd@ip70-189-207-54.lv.lv.cox.net) Quit (Quit: Leaving)
[3:23] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[3:28] * jproulx (~jon@kvas.csail.mit.edu) Quit (Server closed connection)
[3:29] * jproulx (~jon@kvas.csail.mit.edu) has joined #ceph
[3:31] * Kingrat (~shiny@cpe-74-129-33-192.kya.res.rr.com) has joined #ceph
[3:32] * scalability-junk (sid6422@id-6422.ealing.irccloud.com) Quit (Server closed connection)
[3:32] * scalability-junk (sid6422@id-6422.ealing.irccloud.com) has joined #ceph
[3:39] * jfaj (~jan@p5798303C.dip0.t-ipconnect.de) has joined #ceph
[3:41] * haplo37 (~haplo37@107.190.37.90) Quit (Read error: Connection reset by peer)
[3:42] * sebastian-w (~quassel@212.218.8.138) Quit (Remote host closed the connection)
[3:43] * sebastian-w (~quassel@212.218.8.139) has joined #ceph
[3:43] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[3:46] * jfaj__ (~jan@p4FC24EAA.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[3:49] * braderhart (sid124863@braderhart.user.oftc.net) Quit (Server closed connection)
[3:49] * braderhart (sid124863@braderhart.user.oftc.net) has joined #ceph
[3:50] * bassam (sid154933@id-154933.brockwell.irccloud.com) Quit (Server closed connection)
[3:50] * bassam (sid154933@id-154933.brockwell.irccloud.com) has joined #ceph
[3:52] * wgao (~wgao@106.120.101.38) has joined #ceph
[4:03] * EinstCra_ (~EinstCraz@58.247.119.250) has joined #ceph
[4:10] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Ping timeout: 480 seconds)
[4:11] * bene2_afk (~bene@2601:193:4101:f410:ea2a:eaff:fe08:3c7a) Quit (Quit: Konversation terminated!)
[4:12] * dis (~dis@00018d20.user.oftc.net) Quit (Server closed connection)
[4:12] * dis (~dis@00018d20.user.oftc.net) has joined #ceph
[4:12] * yanzheng (~zhyan@125.70.21.51) has joined #ceph
[4:15] * dlan (~dennis@116.228.88.131) Quit (Quit: leaving)
[4:16] * gregsfortytwo (~gregsfort@transit-86-181-132-209.redhat.com) Quit (Server closed connection)
[4:16] * gregsfortytwo (~gregsfort@transit-86-181-132-209.redhat.com) has joined #ceph
[4:16] * Unai (~Adium@208.80.71.24) has joined #ceph
[4:17] * dlan (~dennis@116.228.88.131) has joined #ceph
[4:17] <Unai> Hey ceph gurus… I am in need of help here
[4:17] <Unai> my monitors seem not to be having a good time
[4:17] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[4:18] <Unai> I can't even do a ceph -w
[4:28] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[4:35] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:38] * smithfarm (~smithfarm@217.30.64.210) has joined #ceph
[4:48] * bearkitten (~bearkitte@cpe-76-172-86-115.socal.res.rr.com) Quit (Quit: WeeChat 1.5)
[4:49] * bearkitten (~bearkitte@cpe-76-172-86-115.socal.res.rr.com) has joined #ceph
[5:04] * davidz (~davidz@2605:e000:1313:8003:35bc:2156:758e:fbad) Quit (Quit: Leaving.)
[5:07] * Jeffrey4l_ (~Jeffrey@119.251.140.28) Quit (Ping timeout: 480 seconds)
[5:08] * Jeffrey4l_ (~Jeffrey@119.251.140.28) has joined #ceph
[5:19] * react (~react@2001:4800:7815:103:f0d7:c55:ff05:60e8) Quit (Server closed connection)
[5:19] * react (~react@retard.io) has joined #ceph
[5:25] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[5:31] * EinstCra_ (~EinstCraz@58.247.119.250) Quit (Ping timeout: 480 seconds)
[5:40] * smithfarm (~smithfarm@217.30.64.210) Quit (Ping timeout: 480 seconds)
[5:40] * batrick (~batrick@2600:3c00::f03c:91ff:fe96:477b) Quit (Server closed connection)
[5:40] * batrick (~batrick@2600:3c00::f03c:91ff:fe96:477b) has joined #ceph
[5:40] * md_ (~john@205.233.53.42) Quit (Server closed connection)
[5:41] * md_ (~john@205.233.53.42) has joined #ceph
[5:42] * vimal (~vikumar@114.143.165.227) has joined #ceph
[5:43] * ccourtaut (~ccourtaut@157.173.31.93.rev.sfr.net) Quit (Quit: I'll be back!)
[5:44] * Vacuum__ (~Vacuum@88.130.214.18) has joined #ceph
[5:51] * Vacuum_ (~Vacuum@88.130.198.71) Quit (Ping timeout: 480 seconds)
[5:52] * WildyLion (~simba@45.32.185.17) Quit (Server closed connection)
[5:52] * WildyLion (~simba@45.32.185.17) has joined #ceph
[5:55] * cholcombe (~chris@2001:67c:1562:8007::aac:40f1) Quit (Server closed connection)
[5:56] * cholcombe (~chris@2001:67c:1562:8007::aac:40f1) has joined #ceph
[6:08] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[6:08] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[6:09] * vimal (~vikumar@114.143.165.227) Quit (Quit: Leaving)
[6:10] * [0x4A6F]_ (~ident@p508CD249.dip0.t-ipconnect.de) has joined #ceph
[6:11] * dcwangmit01 (~dcwangmit@162-245.23-239.PUBLIC.monkeybrains.net) Quit (Server closed connection)
[6:11] * [0x4A6F] (~ident@0x4a6f.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:11] * [0x4A6F]_ is now known as [0x4A6F]
[6:11] * dcwangmit01 (~dcwangmit@162-245.23-239.PUBLIC.monkeybrains.net) has joined #ceph
[6:24] * kefu_ (~kefu@114.92.101.38) has joined #ceph
[6:33] * vimal (~vikumar@121.244.87.116) has joined #ceph
[6:33] * sankarshan (~sankarsha@45.124.141.154) has joined #ceph
[6:39] * carter (~carter@li98-136.members.linode.com) Quit (Server closed connection)
[6:39] <iggy> check the logs? run in the foreground with higher verbosity?
[6:39] * carter (~carter@li98-136.members.linode.com) has joined #ceph
[6:40] * swami1 (~swami@49.38.2.251) has joined #ceph
[6:43] * kuku (~kuku@119.93.91.136) has joined #ceph
[6:43] * ndevos (~ndevos@nat-pool-ams2-5.redhat.com) Quit (Server closed connection)
[6:43] * _ndevos (~ndevos@nat-pool-ams2-5.redhat.com) has joined #ceph
[6:43] * _ndevos is now known as ndevos
[6:47] * cyphase (~cyphase@c-50-148-131-137.hsd1.ca.comcast.net) has joined #ceph
[6:49] * i_m (~ivan.miro@31.173.100.14) has joined #ceph
[6:52] * scuttlemonkey (~scuttle@nat-pool-rdu-t.redhat.com) Quit (Server closed connection)
[6:52] * i_m (~ivan.miro@31.173.100.14) Quit (Read error: Connection reset by peer)
[6:52] * scuttle (~scuttle@nat-pool-rdu-t.redhat.com) has joined #ceph
[6:56] * swami1 (~swami@49.38.2.251) Quit (Read error: Connection timed out)
[6:57] * beardo_ (~sma310@207-172-244-241.c3-0.atw-ubr5.atw.pa.cable.rcn.com) Quit (Server closed connection)
[6:58] * swami1 (~swami@49.38.2.251) has joined #ceph
[6:58] * beardo_ (~sma310@207-172-244-241.c3-0.atw-ubr5.atw.pa.cable.rcn.com) has joined #ceph
[7:03] * zeestrat (uid176159@id-176159.brockwell.irccloud.com) Quit (Server closed connection)
[7:03] * zeestrat (uid176159@id-176159.brockwell.irccloud.com) has joined #ceph
[7:05] * TomasCZ (~TomasCZ@yes.tenlab.net) Quit (Quit: Leaving)
[7:08] * diq (~diq@2620:11c:f:2:c23f:d5ff:fe62:112c) Quit (Server closed connection)
[7:09] * diq (~diq@2620:11c:f:2:c23f:d5ff:fe62:112c) has joined #ceph
[7:10] * raphaelsc (~raphaelsc@177.42.73.142) Quit (Ping timeout: 480 seconds)
[7:14] * swami1 (~swami@49.38.2.251) Quit (Read error: Connection timed out)
[7:16] * swami1 (~swami@49.38.2.251) has joined #ceph
[7:22] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[7:24] * i_m (~ivan.miro@31.173.101.121) has joined #ceph
[7:33] * Mikko (~Mikko@109-108-30-118.bb.dnainternet.fi) Quit (Quit: This computer has gone to sleep)
[7:37] * Hemanth (~hkumar_@103.228.221.141) has joined #ceph
[7:40] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[7:44] * sankarshan (~sankarsha@45.124.141.154) Quit (Ping timeout: 480 seconds)
[7:46] * jclm (~jclm@ip68-96-196-245.lv.lv.cox.net) Quit (Quit: Leaving.)
[7:50] * adun153 (~adun153@130.105.147.50) has joined #ceph
[7:52] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Quit: Leaving...)
[7:54] * ronrib (~boswortr@45.32.242.135) Quit (Server closed connection)
[7:54] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[7:55] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) has joined #ceph
[7:56] * jmn (~jmn@nat-pool-bos-t.redhat.com) Quit (Server closed connection)
[7:56] * jmn (~jmn@nat-pool-bos-t.redhat.com) has joined #ceph
[7:57] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[7:57] * EinstCrazy (~EinstCraz@101.78.195.62) has joined #ceph
[7:58] * Hemanth (~hkumar_@103.228.221.141) Quit (Ping timeout: 480 seconds)
[7:58] * i_m (~ivan.miro@31.173.101.121) Quit (Read error: Connection reset by peer)
[7:59] * ggarg (~Gaurav@x2f2275a.dyn.telefonica.de) has joined #ceph
[8:02] * EinstCra_ (~EinstCraz@58.247.119.250) has joined #ceph
[8:04] * ronrib (~boswortr@45.32.242.135) has joined #ceph
[8:04] * percevalbot (~supybot@pct-empresas-82.uc3m.es) Quit (Ping timeout: 480 seconds)
[8:05] * wgao (~wgao@106.120.101.38) Quit (Ping timeout: 480 seconds)
[8:05] * owlbot (~supybot@pct-empresas-50.uc3m.es) Quit (Ping timeout: 480 seconds)
[8:07] * percevalbot (~supybot@pct-empresas-82.uc3m.es) has joined #ceph
[8:08] * owlbot (~supybot@pct-empresas-50.uc3m.es) has joined #ceph
[8:09] * EinstCrazy (~EinstCraz@101.78.195.62) Quit (Ping timeout: 480 seconds)
[8:10] * karnan (~karnan@121.244.87.117) has joined #ceph
[8:11] * i_m (~ivan.miro@31.173.101.126) has joined #ceph
[8:14] * wgao (~wgao@106.120.101.38) has joined #ceph
[8:24] * mykola (~Mikolaj@91.245.79.118) has joined #ceph
[8:25] * erice (~eric@c-76-120-53-165.hsd1.co.comcast.net) Quit (Server closed connection)
[8:26] * erice (~eric@c-76-120-53-165.hsd1.co.comcast.net) has joined #ceph
[8:28] * kefu_ (~kefu@114.92.101.38) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[8:29] * kefu (~kefu@114.92.101.38) has joined #ceph
[8:34] * ade (~abradshaw@p4FF7BDB3.dip0.t-ipconnect.de) has joined #ceph
[8:36] * epicguy (~epicguy@41.164.8.42) Quit (Quit: Leaving)
[8:38] * rraja (~rraja@121.244.87.117) has joined #ceph
[8:40] * karnan (~karnan@121.244.87.117) Quit (Ping timeout: 480 seconds)
[8:47] * ronrib (~boswortr@45.32.242.135) Quit (Remote host closed the connection)
[8:48] * SamYaple (~SamYaple@162.209.126.134) Quit (Server closed connection)
[8:48] * SamYaple (~SamYaple@162.209.126.134) has joined #ceph
[8:51] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) has joined #ceph
[8:52] * karnan (~karnan@121.244.87.117) has joined #ceph
[8:52] * bviktor (~bviktor@213.16.80.50) has joined #ceph
[8:55] * dougf (~dougf@75-131-32-223.static.kgpt.tn.charter.com) Quit (Server closed connection)
[8:55] * dougf (~dougf@75-131-32-223.static.kgpt.tn.charter.com) has joined #ceph
[9:07] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[9:09] * karnan (~karnan@121.244.87.117) Quit (Ping timeout: 480 seconds)
[9:10] * boichev (~boichev@213.169.56.130) has joined #ceph
[9:12] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[9:14] * reset11 (~reset11@pathway.boku.ac.at) Quit (Remote host closed the connection)
[9:19] * kutija (~kutija@89.216.27.139) has joined #ceph
[9:19] * goretoxo (~psilva@84.124.11.230.static.user.ono.com) has joined #ceph
[9:20] * goretoxo (~psilva@84.124.11.230.static.user.ono.com) Quit ()
[9:21] * goretoxo (~psilva@pedrosilva.org) has joined #ceph
[9:21] * karnan (~karnan@121.244.87.117) has joined #ceph
[9:31] * analbeard (~shw@support.memset.com) has joined #ceph
[9:34] * r0lland (~r0lland@121.244.155.12) has joined #ceph
[9:36] <r0lland> ?
[9:40] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[9:43] * fsimonce (~simon@host203-44-dynamic.183-80-r.retail.telecomitalia.it) has joined #ceph
[9:43] * smithfarm (~smithfarm@80.188.202.66) has joined #ceph
[9:48] * wjw-freebsd3 (~wjw@smtp.digiware.nl) has joined #ceph
[9:49] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[9:51] * doppelgrau (~doppelgra@132.252.235.172) has joined #ceph
[9:56] * Hemanth (~hkumar_@121.244.87.117) has joined #ceph
[10:02] * kuku (~kuku@119.93.91.136) Quit (Remote host closed the connection)
[10:04] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Quit: Ex-Chat)
[10:05] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[10:11] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) has joined #ceph
[10:12] * vZerberus (dog@00021993.user.oftc.net) Quit (Quit: Coyote finally caught me)
[10:27] * r0lland (~r0lland@121.244.155.12) Quit (Remote host closed the connection)
[10:29] * goretoxo (~psilva@pedrosilva.org) Quit (Ping timeout: 480 seconds)
[10:29] <Amto_res> Hello. Since updating Ceph from Hammer to Jewel, it is impossible to create buckets on the RGW. The bug was fixed in 10.2.3; do you know when that release will be available?
[10:33] * madkiss (~madkiss@91.141.2.83.wireless.dyn.drei.com) has joined #ceph
[10:36] <etienneme> Within a few days, I would say. They are doing some tests before releasing it.
[10:36] <etienneme> There is a topic on the ceph ML about releasing it.
[10:37] * destrudo (~destrudo@tomba.sonic.net) Quit (Server closed connection)
[10:37] * destrudo (~destrudo@tomba.sonic.net) has joined #ceph
[10:38] * goretoxo (~psilva@84.124.11.230.static.user.ono.com) has joined #ceph
[10:38] <Amto_res> etienneme: Thanks, on the ceph-users MF ?
[10:38] <Amto_res> ML*
[10:38] <etienneme> ceph-devel
[10:38] * mattch (~mattch@w5430.see.ed.ac.uk) has joined #ceph
[10:38] <etienneme> "rbd and the next jewel release v10.2.3" from Loic
[10:39] <Amto_res> Ok, Thanks
[10:41] * goretoxo (~psilva@84.124.11.230.static.user.ono.com) Quit ()
[10:46] * TMM (~hp@185.5.121.201) has joined #ceph
[10:47] * Cybert1nus is now known as Cybertinus
[10:47] * natarej (~natarej@101.188.54.14) Quit (Server closed connection)
[10:48] * natarej (~natarej@101.188.54.14) has joined #ceph
[10:48] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:fdc8:c4d5:24bd:956f) has joined #ceph
[10:50] * vZerberus (~dogtail@00021993.user.oftc.net) has joined #ceph
[10:51] * ccourtaut (~ccourtaut@157.173.31.93.rev.sfr.net) has joined #ceph
[10:51] * ccourtaut (~ccourtaut@157.173.31.93.rev.sfr.net) Quit ()
[10:53] * ccourtaut (~ccourtaut@157.173.31.93.rev.sfr.net) has joined #ceph
[10:57] <Be-El> does rados bench have an option to define the object size?
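(For reference, jewel's rados bench sets the write size with -b; a minimal sketch, assuming a pool named testpool:)

    # 60-second write benchmark using 4 KiB objects and 16 concurrent ops
    rados bench -p testpool 60 write -b 4096 -t 16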
[11:01] * expaddy (~david@92-61-200-142.cable.cablecomm.ie) has joined #ceph
[11:02] <expaddy> Can anybody tell me what this message from a ceph client means:
[11:03] <expaddy> 2016-08-24 09:55:31.380250 7f6a8c5a5700 0 -- :/2854841852 >> 192.168.122.10:6789/0 pipe(0x7f6a8805c610 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6a8805d8d0).fault
[11:03] <expaddy> ceph cluster is up and ok
[11:03] <expaddy> It's a virtual environment
[11:07] * md_ (~john@205.233.53.42) Quit (Ping timeout: 480 seconds)
[11:07] * md_ (~john@205.233.53.42) has joined #ceph
[11:08] * briner (~briner@129.194.16.54) Quit (Quit: briner)
[11:10] <etienneme> It's probably because it was unable to reach the monitor
[11:12] <etienneme> If it's running well then you should not worry about this log
[11:12] <expaddy> ok but I can't use any ceph commands from the client
[11:12] <expaddy> and all seems ok
[11:13] <expaddy> I'm trying to create block storage from the client
[11:23] <etienneme> check your network connectivity: try 'nc your.mon.ip 6789'; if you don't get a ceph vxxx answer then you have an issue
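(Concretely, using the monitor address from expaddy's fault message above; a reachable mon greets the connection with a banner beginning "ceph v":)

    nc -v 192.168.122.10 6789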
[11:29] * madkiss (~madkiss@91.141.2.83.wireless.dyn.drei.com) Quit (Quit: Leaving.)
[11:34] * flisky (~Thunderbi@106.38.61.190) has joined #ceph
[11:37] * DanFoster (~Daniel@2a00:1ee0:3:1337:11e:6c9a:ab18:dbd4) has joined #ceph
[11:47] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[11:55] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[11:56] * owasserm (~owasserm@a212-238-239-152.adsl.xs4all.nl) Quit (Server closed connection)
[11:56] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[12:00] * EinstCra_ (~EinstCraz@58.247.119.250) Quit (Ping timeout: 480 seconds)
[12:01] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) has joined #ceph
[12:05] * adamcrume (~quassel@2601:647:cb01:f890:a288:69ff:fe70:6caa) Quit (Server closed connection)
[12:05] * adamcrume (~quassel@2601:647:cb01:f890:a288:69ff:fe70:6caa) has joined #ceph
[12:09] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[12:10] * rendar (~I@95.235.182.241) has joined #ceph
[12:10] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[12:11] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[12:12] * i_m (~ivan.miro@31.173.101.126) Quit (Ping timeout: 480 seconds)
[12:14] * LiamMon (~liam.monc@disco.moncur.eu) has joined #ceph
[12:16] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[12:16] * smithfarm (~smithfarm@80.188.202.66) Quit (Ping timeout: 480 seconds)
[12:16] * i_m (~ivan.miro@31.173.100.200) has joined #ceph
[12:17] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[12:18] * LiamMon__ (~liam.monc@srxo-074250.sorexo.com) has joined #ceph
[12:20] * LiamMon_ (~liam.monc@94.0.110.69) Quit (Ping timeout: 480 seconds)
[12:21] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[12:24] * bniver (~bniver@pool-98-110-180-234.bstnma.fios.verizon.net) Quit (Remote host closed the connection)
[12:28] * Larsen (~andreas@2001:67c:578:2::15) Quit (Quit: Larsen)
[12:29] * smithfarm (~smithfarm@80.188.202.66) has joined #ceph
[12:31] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[12:32] * walcubi (~walcubi@p5795A6A7.dip0.t-ipconnect.de) has joined #ceph
[12:32] * boichev (~boichev@213.169.56.130) Quit (Quit: Nettalk6 - www.ntalk.de)
[12:32] <walcubi> Hi, I've just noticed that two of the monitors setup by ceph-deploy are broken.
[12:33] <walcubi> They were added using ceph-deploy mon add
[12:34] <walcubi> And not starting because: -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-b': (22) Invalid argument
[12:35] <walcubi> Noticed that ownership of the store.db was wrong, but changing it had no effect.
[12:37] * kuku (~kuku@112.203.30.2) has joined #ceph
[12:38] * sebastian-w_ (~quassel@212.218.8.139) has joined #ceph
[12:38] * sebastian-w (~quassel@212.218.8.139) Quit (Ping timeout: 480 seconds)
[12:38] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[12:39] * flisky (~Thunderbi@106.38.61.190) Quit (Quit: flisky)
[12:43] * adun153 (~adun153@130.105.147.50) Quit (Quit: Ex-Chat)
[12:44] * i_m (~ivan.miro@31.173.100.200) Quit (Ping timeout: 480 seconds)
[12:51] * Racpatel (~Racpatel@2601:87:0:24af::313b) Quit (Ping timeout: 480 seconds)
[12:51] * kuku (~kuku@112.203.30.2) Quit (Read error: Connection reset by peer)
[12:52] * kuku (~kuku@112.203.30.2) has joined #ceph
[13:03] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Ping timeout: 480 seconds)
[13:04] * i_m (~ivan.miro@31.173.100.200) has joined #ceph
[13:06] * Pulp (~Pulp@63-221-50-195.dyn.estpak.ee) has joined #ceph
[13:14] * LiamMon (~liam.monc@disco.moncur.eu) Quit (Quit: leaving)
[13:14] * LiamMon (~liam.monc@94.0.110.69) has joined #ceph
[13:16] * LiamMon__ (~liam.monc@srxo-074250.sorexo.com) Quit (Ping timeout: 480 seconds)
[13:19] * epicguy (~epicguy@41.164.8.42) has joined #ceph
[13:21] * wkennington (~wkenningt@c-71-204-170-241.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[13:27] * dec (~dec@45.96.198.104.bc.googleusercontent.com) Quit (Max SendQ exceeded)
[13:28] * expaddy (~david@92-61-200-142.cable.cablecomm.ie) has left #ceph
[13:30] * bniver (~bniver@nat-pool-bos-u.redhat.com) has joined #ceph
[13:34] * salwasser (~Adium@2601:197:101:5cc1:b8a0:fb71:6760:e458) has joined #ceph
[13:35] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) has joined #ceph
[13:46] * rakeshgm (~rakesh@121.244.87.117) has joined #ceph
[13:48] * sankarshan (~sankarsha@106.206.154.86) has joined #ceph
[13:53] * smithfarm (~smithfarm@80.188.202.66) Quit (Ping timeout: 480 seconds)
[13:53] <Kvisle> what sdks are people using for testing s3 aws signature v4 on jewel?
[13:57] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Remote host closed the connection)
[14:04] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[14:07] * kuku (~kuku@112.203.30.2) Quit (Remote host closed the connection)
[14:08] * kuku (~kuku@112.203.30.2) has joined #ceph
[14:09] * ashah (~ashah@121.244.87.117) has joined #ceph
[14:10] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[14:10] <jiffe> anyone see why this PG is stuck inactive? http://nsab.us/public/ceph
[14:10] <jiffe> I have 37 osds, all 37 are up
[14:10] * kuku (~kuku@112.203.30.2) Quit (Remote host closed the connection)
[14:10] <jiffe> and in
[14:11] * F|1nt (~F|1nt@host37-211.lan-isdn.imaginet.fr) has joined #ceph
[14:11] * Mikko (~Mikko@dfs61tyfgyhgv19pnq07t-3.rev.dnainternet.fi) has joined #ceph
[14:16] * thomnico (~thomnico@2a01:e35:8b41:120:fd1e:2e4c:64a1:e61e) has joined #ceph
[14:21] * salwasser (~Adium@2601:197:101:5cc1:b8a0:fb71:6760:e458) Quit (Quit: Leaving.)
[14:21] * F|1nt (~F|1nt@host37-211.lan-isdn.imaginet.fr) Quit (Quit: Oups, just gone away...)
[14:23] * salwasser (~Adium@c-76-118-229-231.hsd1.ma.comcast.net) has joined #ceph
[14:24] * salwasser (~Adium@c-76-118-229-231.hsd1.ma.comcast.net) Quit ()
[14:25] * dneary (~dneary@207.236.147.202) has joined #ceph
[14:31] * rakeshgm (~rakesh@121.244.87.117) Quit (Remote host closed the connection)
[14:33] * KUSmurf (~jwandborg@exit-01a.noisetor.net) has joined #ceph
[14:38] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[14:39] * vimal (~vikumar@121.244.87.116) Quit (Quit: Leaving)
[14:44] * srk (~Siva@2605:6000:ed04:ce00:7089:f4e7:d277:f787) has joined #ceph
[14:44] * kefu is now known as kefu|afk
[14:48] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[15:03] * kefu|afk (~kefu@114.92.101.38) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[15:03] * KUSmurf (~jwandborg@5AEAAA7G3.tor-irc.dnsbl.oftc.net) Quit ()
[15:04] * kefu (~kefu@114.92.101.38) has joined #ceph
[15:05] * spgriffinjr (~spgriffin@66-46-246-206.dedicated.allstream.net) has joined #ceph
[15:07] * Nicho1as (~nicho1as@14.52.121.20) has joined #ceph
[15:09] * raphaelsc (~raphaelsc@177.19.29.72) has joined #ceph
[15:10] * Hemanth (~hkumar_@121.244.87.117) Quit (Ping timeout: 480 seconds)
[15:12] * kefu (~kefu@114.92.101.38) Quit (Ping timeout: 480 seconds)
[15:12] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[15:24] * srk_ (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[15:25] * squizzi (~squizzi@107.13.237.240) has joined #ceph
[15:26] * scg (~zscg@181.122.4.166) Quit (Quit: Ex-Chat)
[15:26] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) Quit (Ping timeout: 480 seconds)
[15:29] * wes_dillingham (~wes_dilli@140.247.242.44) has joined #ceph
[15:31] * srk (~Siva@2605:6000:ed04:ce00:7089:f4e7:d277:f787) Quit (Ping timeout: 480 seconds)
[15:34] <mistur> Hello
[15:35] <mistur> I have an osd nearly full
[15:35] <mistur> a mistake in the crush map put an SSD (400G) in the main root composed of 6TB drives
[15:35] <mistur> I set this osd out and reweighted it to 0
[15:36] <mistur> but ceph still adds data to this osd
[15:37] * rraja (~rraja@121.244.87.117) Quit (Ping timeout: 480 seconds)
[15:40] * epicguy (~epicguy@41.164.8.42) Quit (Quit: Leaving)
[15:43] <sep> mistur, tried restarting it? that forces all the peerings to restart
[15:44] * rraja (~rraja@121.244.87.117) has joined #ceph
[15:44] * rraja (~rraja@121.244.87.117) Quit ()
[15:45] * rraja (~rraja@121.244.87.117) has joined #ceph
[15:45] <mistur> sep: I just tried this but data still comes in
[15:48] <sep> and the osd weight is set to 0 when you check it in ceph osd tree ?
[15:48] <mistur> yes
[15:48] <sep> very strange, i have never seen that so i can't explain it
[15:48] * dneary (~dneary@207.236.147.202) Quit (Ping timeout: 480 seconds)
[15:48] * bviktor (~bviktor@213.16.80.50) Quit (Ping timeout: 480 seconds)
[15:49] * dnunez (~dnunez@nat-pool-bos-u.redhat.com) has joined #ceph
[15:49] <Hatsjoe> It should not be possible for ceph to write data to a 0 weight OSD right?
[15:50] * smithfarm (~smithfarm@95.205.broadband5.iol.cz) has joined #ceph
[15:50] <mistur> sep: ok, I found it
[15:51] <mistur> I swapped 2 osds when I moved the ssd to the other root in the crush map
[15:51] <mistur> I moved osd 111 instead of 110
[15:51] <mistur> makes sense
[15:54] * Racpatel (~Racpatel@2601:87:0:24af::313b) has joined #ceph
[15:56] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[15:57] * salwasser (~Adium@a23-79-238-10.deploy.static.akamaitechnologies.com) has joined #ceph
[15:58] * kuku (~kuku@112.203.30.2) has joined #ceph
[15:59] * srk_ (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[16:00] * b0e (~aledermue@213.95.25.82) has joined #ceph
[16:01] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[16:01] * kuku (~kuku@112.203.30.2) Quit (Read error: Connection reset by peer)
[16:06] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[16:06] * mattbenjamin (~mbenjamin@76-206-42-50.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[16:06] * cronburg__ (~cronburg@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[16:08] <wes_dillingham> Just out of curiosity, does everyone think this is a normal rebalance time? I added two OSD drives, one each to 2 of my 10 OSD hosts (12 drives per OSD host = 120 3TB drives). My cluster is ~50% full, so each new disk is having ~1.5TB pushed onto it. I am using a replica count of 4 and my OSDs are connected by 2x bonded 10G. The journals for all OSDs are colocated on the data disks as journal partitions. My cluster was relatively inactive, maybe 10MB/s read and
[16:08] <wes_dillingham> write client IO. Anyways, I calculated the raw throughput of the drive at 160MB/s, so the minimum theoretical time to fill the drive with 1.5TB would be about 3 hours. However, it took ceph 8 hours to complete the backfill. Just wondering if this is normal or whether ceph intentionally throttles the backfill.
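(Ceph does throttle backfill by default; a sketch of the knobs usually checked, with placeholder values:)

    # inspect the current throttles on one OSD via its admin socket
    ceph daemon osd.0 config get osd_max_backfills
    ceph daemon osd.0 config get osd_recovery_max_active
    # temporarily raise them cluster-wide; revert once the backfill completes
    ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'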
[16:09] * salwasser (~Adium@a23-79-238-10.deploy.static.akamaitechnologies.com) Quit (Ping timeout: 480 seconds)
[16:09] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[16:10] <Be-El> wes_dillingham: colocating the journal will introduce extra seek operations, slowing down the transfer rate
[16:10] * danieagle (~Daniel@177.138.169.68) has joined #ceph
[16:12] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[16:14] <wes_dillingham> Be-El: I only have spinners to play with. Would I see a performance increase if I put the data partition on disk A and the journal partition on disk B?
[16:15] <Be-El> probably not if both disks are used as OSD
[16:15] <wes_dillingham> Yea, they would be.
[16:15] <Be-El> and using a single spinner for all journals would be even worse
[16:15] * salwasser (~Adium@a23-79-238-10.deploy.static.akamaitechnologies.com) has joined #ceph
[16:16] * cronburg__ (~cronburg@nat-pool-bos-t.redhat.com) has joined #ceph
[16:16] <wes_dillingham> sure, but given my actual setup, does that seem in line with what you might expect?
[16:17] <wes_dillingham> i presume each backfill operation is journaled and so there is some penalty on the journaling
[16:17] <wes_dillingham> but it's not a 2x slowdown, I would think
[16:21] <Be-El> 160MB/s is imho too high. under normal operations the osd disk has to seek a lot
[16:21] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[16:23] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[16:23] <jprins> When I push thousands of files into a single bucket, at some point it gets very slow and every file that is pushed gets a 3 second penalty because something seems to fail with ACLs. An error along the lines of: retrying transfer for failed ......?acl.
[16:24] <jprins> I don't have the real message available at the moment, but I will repeat my test later today and copy the error.
[16:24] <wes_dillingham> Ok thanks for your thought Be-El
[16:24] <jprins> The weird thing is that the rados server only returns 200 OK to the client.
[16:25] <jprins> For every request. So I don't see an error in the request logs of the rados server, but the S3 client (s3cmd) retries something with a ?acl after the filename.
[16:28] * ade (~abradshaw@p4FF7BDB3.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[16:29] * ade (~abradshaw@p4FF7BDB3.dip0.t-ipconnect.de) has joined #ceph
[16:30] * dneary (~dneary@173.243.39.74) has joined #ceph
[16:32] * haplo37 (~haplo37@199.91.185.156) has joined #ceph
[16:36] * kutija (~kutija@89.216.27.139) Quit (Quit: Textual IRC Client: www.textualapp.com)
[16:38] * jdillaman (~jdillaman@pool-108-18-97-95.washdc.fios.verizon.net) has joined #ceph
[16:38] * swami1 (~swami@49.38.2.251) Quit (Quit: Leaving.)
[16:40] * bara (~bara@nat-pool-brq-t.redhat.com) Quit (Quit: Bye guys!)
[16:44] * xarses (~xarses@64.124.158.32) has joined #ceph
[16:44] * rraja (~rraja@121.244.87.117) Quit (Quit: Leaving)
[16:51] * yanzheng (~zhyan@125.70.21.51) Quit (Quit: This computer has gone to sleep)
[16:56] * rraja (~rraja@125.16.34.66) has joined #ceph
[16:57] * rraja (~rraja@125.16.34.66) Quit ()
[16:59] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[16:59] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[17:01] * wjw-freebsd3 (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[17:01] * ntpttr_ (~ntpttr@134.134.139.82) has joined #ceph
[17:03] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[17:04] * sankarshan (~sankarsha@106.206.154.86) Quit (Ping timeout: 480 seconds)
[17:06] * vbellur (~vijay@71.234.224.255) Quit (Ping timeout: 480 seconds)
[17:08] * wushudoin (~wushudoin@38.140.108.2) has joined #ceph
[17:09] * wushudoin (~wushudoin@38.140.108.2) Quit ()
[17:09] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[17:10] * smithfarm (~smithfarm@95.205.broadband5.iol.cz) Quit (Ping timeout: 480 seconds)
[17:14] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[17:14] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[17:14] * wushudoin (~wushudoin@38.140.108.2) has joined #ceph
[17:15] * kefu (~kefu@114.92.101.38) has joined #ceph
[17:15] * mattbenjamin (~mbenjamin@76-206-42-50.lightspeed.livnmi.sbcglobal.net) Quit (Quit: Leaving.)
[17:15] * kefu is now known as kefu|afk
[17:16] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[17:16] * sankarshan (~sankarsha@45.124.141.154) has joined #ceph
[17:16] * ChanServ changes topic to 'http://ceph.com/get || dev channel #ceph-devel || test lab channel #sepia'
[17:16] * ChanServ sets mode +v joao
[17:18] * kefu|afk is now known as kefu
[17:19] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) has joined #ceph
[17:20] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) Quit ()
[17:20] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) has joined #ceph
[17:21] * oliveiradan (~doliveira@137.65.133.10) Quit (Remote host closed the connection)
[17:23] * ntpttr_ (~ntpttr@134.134.139.82) Quit (Remote host closed the connection)
[17:26] * ade (~abradshaw@p4FF7BDB3.dip0.t-ipconnect.de) Quit (Quit: Too sexy for his shirt)
[17:27] * ashah (~ashah@121.244.87.117) Quit (Quit: Leaving)
[17:28] * oliveiradan2 (~doliveira@67.214.238.80) has joined #ceph
[17:28] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[17:28] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[17:32] * sankarshan (~sankarsha@45.124.141.154) Quit (Quit: Are you sure you want to quit this channel (Cancel/Ok) ?)
[17:32] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Ping timeout: 480 seconds)
[17:33] * ntpttr_ (~ntpttr@134.134.139.72) has joined #ceph
[17:38] * mattbenjamin (~mbenjamin@12.118.3.106) has joined #ceph
[17:39] * blizzow (~jburns@50-243-148-102-static.hfc.comcastbusiness.net) has joined #ceph
[17:41] <blizzow> So I have 8 nodes. 1 node has 8x4TB OSDs. 2 nodes have 4x4TB OSDs. 5 nodes have 4x1TB OSDs. I want to shut down / run some upgrades on the large node with 8 OSDs. Is there a way to tell if shutting this node down willl disable my cluster before actually doing so?
[17:41] * Racpatel (~Racpatel@2601:87:0:24af::313b) Quit (Ping timeout: 480 seconds)
[17:45] <doppelgrau> blizzow: with failure domain = host and size > min_size, no problems to be expected
[17:45] * derjohn_mob (~aj@88.128.80.36) has joined #ceph
[17:45] * cathode (~cathode@50.232.215.114) has joined #ceph
[17:46] <doppelgrau> (assuming the cluster is healthy before the operation)
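(A quick pre-shutdown check along those lines; "rbd" is a placeholder pool name:)

    ceph health
    ceph osd tree                  # confirm the failure domain is per-host
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size
    ceph osd set noout             # optional: avoid rebalancing while the node is down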
[17:49] * bene2 (~bene@2601:193:4101:f410:ea2a:eaff:fe08:3c7a) has joined #ceph
[17:51] * doppelgrau (~doppelgra@132.252.235.172) Quit (Quit: Leaving.)
[17:51] * Racpatel (~Racpatel@2601:87:0:24af::313b) has joined #ceph
[17:59] * mhackett (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) has joined #ceph
[17:59] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[18:00] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[18:00] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) has joined #ceph
[18:01] * linuxkidd (~linuxkidd@ip70-189-207-54.lv.lv.cox.net) has joined #ceph
[18:01] * kefu (~kefu@114.92.101.38) Quit (Quit: Textual IRC Client: www.textualapp.com)
[18:03] * TMM (~hp@185.5.121.201) Quit (Quit: Ex-Chat)
[18:03] * chunmei (~chunmei@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[18:04] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) Quit (Ping timeout: 480 seconds)
[18:06] * squizzi_ (~squizzi@173.38.117.69) has joined #ceph
[18:08] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[18:10] * Racpatel (~Racpatel@2601:87:0:24af::313b) Quit (Quit: Leaving)
[18:10] * Racpatel (~Racpatel@2601:87:0:24af::313b) has joined #ceph
[18:14] * bara (~bara@nat-pool-brq-t.redhat.com) Quit (Quit: Bye guys!)
[18:18] <blizzow> So I'm using the writeback cache for the RBD-based images in my libvirt-based VMs; the drives are attached as virtio. Running system updates has slowed to a crawl... dpkg and apt-get are almost unusable, 15+ minutes for a small set of system updates. Is there a better cache mode or a better way to attach disks?
[18:20] <SamYaple> blizzow: writeback isn't going to help with reads (unless of course they are cached)
[18:20] <SamYaple> blizzow: i would figure out why the read iops are taking so long
[18:24] <blizzow> SamYaple: A) How do I tell if it's the read iops taking forever? B) is there a better cache mode in general to use for VMs?
[18:24] <SamYaple> writeback is what you want. i believe you are describing read ops taking forever
[18:25] <SamYaple> so testing it should be trying to read from the disk and measuring the results against what you expect them to be
[18:27] * i_m (~ivan.miro@31.173.100.200) Quit (Read error: Connection reset by peer)
[18:28] * i_m (~ivan.miro@31.173.101.125) has joined #ceph
[18:29] * Hemanth (~hkumar_@103.228.221.141) has joined #ceph
[18:29] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[18:29] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) has joined #ceph
[18:30] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[18:32] * Unai (~Adium@208.80.71.24) Quit (Quit: Leaving.)
[18:34] * dneary (~dneary@173.243.39.74) Quit (Ping timeout: 480 seconds)
[18:34] * haplo37 (~haplo37@199.91.185.156) Quit (Read error: Connection reset by peer)
[18:37] * smithfarm (~smithfarm@ip-37-188-134-20.eurotel.cz) has joined #ceph
[18:38] * lcurtis_ (~lcurtis@47.19.105.250) has joined #ceph
[18:39] * salwasser (~Adium@a23-79-238-10.deploy.static.akamaitechnologies.com) Quit (Ping timeout: 480 seconds)
[18:40] * derjohn_mob (~aj@88.128.80.36) Quit (Ping timeout: 480 seconds)
[18:42] * Unai (~Adium@50-115-70-150.static-ip.telepacific.net) has joined #ceph
[18:44] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[18:49] * DanFoster (~Daniel@2a00:1ee0:3:1337:11e:6c9a:ab18:dbd4) Quit (Quit: Leaving)
[18:50] <lcurtis_> hello all...has anyone seen extensive differences in performance with erasure coded pools vs replicas in Ceph?
[18:55] * ntpttr__ (~ntpttr@134.134.139.72) has joined #ceph
[18:55] * i_m (~ivan.miro@31.173.101.125) Quit (Read error: Connection reset by peer)
[18:55] * ntpttr_ (~ntpttr@134.134.139.72) Quit (Remote host closed the connection)
[18:55] * squizzi_ (~squizzi@173.38.117.69) Quit (Quit: bye)
[18:57] * i_m (~ivan.miro@31.173.120.152) has joined #ceph
[18:59] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[19:00] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[19:01] * Nicho1as (~nicho1as@00022427.user.oftc.net) Quit (Quit: A man from the Far East; using WeeChat 1.5)
[19:02] * ntpttr__ (~ntpttr@134.134.139.72) Quit (Remote host closed the connection)
[19:04] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) Quit (Quit: Leaving.)
[19:04] * sudocat1 (~dibarra@192.185.1.20) has joined #ceph
[19:10] * BrianA (~BrianA@fw-rw.shutterfly.com) has joined #ceph
[19:13] * squizzi_ (~squizzi@2001:420:2240:1340:414d:2677:1773:dc56) has joined #ceph
[19:15] * i_m (~ivan.miro@31.173.120.152) Quit (Ping timeout: 480 seconds)
[19:22] * squizzi_ (~squizzi@2001:420:2240:1340:414d:2677:1773:dc56) Quit (Quit: bye)
[19:26] * i_m (~ivan.miro@31.173.120.152) has joined #ceph
[19:27] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[19:28] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[19:29] * BrianA1 (~BrianA@fw-rw.shutterfly.com) has joined #ceph
[19:30] * ntpttr_ (~ntpttr@134.134.139.70) has joined #ceph
[19:30] * i_m (~ivan.miro@31.173.120.152) Quit (Read error: Connection reset by peer)
[19:30] * BrianA (~BrianA@fw-rw.shutterfly.com) Quit (Ping timeout: 480 seconds)
[19:32] <walcubi> A thought just occurred to me. As object placement is determined from its hash, which itself is a 32-bit hex number. Does that mean that there can't be any more than UINT_MAX objects in a given pool?
[19:32] * ntpttr_ (~ntpttr@134.134.139.70) Quit ()
[19:32] * squizzi_ (~squizzi@2001:420:2240:1340:414d:2677:1773:dc56) has joined #ceph
[19:33] <rkeene> Hmm
[19:33] <rkeene> Ceph is weird sometimes -- if my cwd is a directory that doesn't exist "ceph" doesn't work
[19:34] * dneary (~dneary@207.236.147.202) has joined #ceph
[19:36] <[arx]> the ceph(8) command?
[19:37] <[arx]> it appears so
[19:37] * swami1 (~swami@27.7.172.152) has joined #ceph
[19:38] * thomnico (~thomnico@2a01:e35:8b41:120:fd1e:2e4c:64a1:e61e) Quit (Quit: Ex-Chat)
[19:38] * overclk_ is now known as overclk
[19:39] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[19:44] <jdillaman> blizzow: what version of librbd are you using?
[19:44] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:fdc8:c4d5:24bd:956f) Quit (Ping timeout: 480 seconds)
[19:45] * ChanServ changes topic to 'http://ceph.com/get || dev channel #ceph-devel || test lab channel #sepia'
[19:46] <blizzow> jdillaman: Version: 10.2.2-1xenial
[19:46] <jdillaman> blizzow: there is a bug in the 10.x series where the writeback cache never enables
[19:47] <jdillaman> blizzow: it will be fixed in 10.2.3 (forthcoming), but in the meantime, you would need to set the following config option in your ceph.conf:
[19:47] <jdillaman> rbd_cache_writethrough_until_flush = false
[19:48] <jdillaman> blizzow: might help your situation if lots of small IO ops aren't getting coalesced
[19:48] <blizzow> and I need to put that on all of my mons/osds/clients/rgw/mds?
[19:48] <blizzow> or just the clients?
[19:48] <jdillaman> blizzow: only on hosts that run your VMs under the "[client]" section
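(i.e. something like the following in /etc/ceph/ceph.conf on each hypervisor; rbd cache already defaults to true and is shown only for clarity:)

    [client]
        rbd cache = true
        rbd cache writethrough until flush = false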
[19:49] * bjozet (~bjozet@82-183-17-144.customers.ownit.se) Quit (Quit: leaving)
[19:50] * i_m (~ivan.miro@109.188.126.5) has joined #ceph
[19:50] <blizzow> jdillaman: how does the change propagate once I've added a [client] section with that line to the ceph.conf on my hypervisors?
[19:51] <jdillaman> blizzow: you would need to restart QEMU for the change to take effect (unfortunately)
[19:52] <blizzow> well shit. I guess it's better than the news that LVM/mdadm is b0rked with major performance issues for the current ubuntu LTS kernels.
[19:53] <jdillaman> blizzow: i'd try it on a sample host to see if that is the issue you are experiencing
[19:56] * i_m1 (~ivan.miro@109.188.126.8) has joined #ceph
[19:58] * i_m (~ivan.miro@109.188.126.5) Quit (Ping timeout: 480 seconds)
[20:01] * i_m1 (~ivan.miro@109.188.126.8) Quit (Read error: Connection reset by peer)
[20:02] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[20:03] * BrianA (~BrianA@fw-rw.shutterfly.com) has joined #ceph
[20:04] * swami1 (~swami@27.7.172.152) Quit (Ping timeout: 480 seconds)
[20:05] * i_m (~ivan.miro@94.25.168.9) has joined #ceph
[20:08] * BrianA1 (~BrianA@fw-rw.shutterfly.com) Quit (Ping timeout: 480 seconds)
[20:08] * squizzi (~squizzi@107.13.237.240) Quit (Quit: bye)
[20:12] * derjohn_mob (~aj@x4db0eec0.dyn.telefonica.de) has joined #ceph
[20:12] * baojg (~baojg@61.135.155.34) Quit (Ping timeout: 480 seconds)
[20:14] * i_m1 (~ivan.miro@94.25.168.167) has joined #ceph
[20:15] * vbellur (~vijay@nat-pool-bos-u.redhat.com) has joined #ceph
[20:18] * i_m2 (~ivan.miro@94.25.168.87) has joined #ceph
[20:19] * i_m (~ivan.miro@94.25.168.9) Quit (Ping timeout: 480 seconds)
[20:21] * i_m1 (~ivan.miro@94.25.168.167) Quit (Read error: Connection reset by peer)
[20:21] * wjw-freebsd3 (~wjw@smtp.digiware.nl) has joined #ceph
[20:22] * i_m (~ivan.miro@31.173.120.19) has joined #ceph
[20:26] * i_m2 (~ivan.miro@94.25.168.87) Quit (Ping timeout: 480 seconds)
[20:26] * bjozet (~bjozet@82-183-17-144.customers.ownit.se) has joined #ceph
[20:28] * i_m1 (~ivan.miro@109.188.125.11) has joined #ceph
[20:31] * debian112 (~bcolbert@c-73-184-103-26.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[20:33] * i_m (~ivan.miro@31.173.120.19) Quit (Ping timeout: 480 seconds)
[20:35] * nilez (~nilez@155.94.244.74) Quit (Ping timeout: 480 seconds)
[20:36] * nilez (~nilez@96.44.144.90) has joined #ceph
[20:38] * \ask (~ask@oz.develooper.com) has joined #ceph
[20:40] * Unai1 (~Adium@50-115-70-150.static-ip.telepacific.net) has joined #ceph
[20:42] * debian112 (~bcolbert@c-73-184-103-26.hsd1.ga.comcast.net) has joined #ceph
[20:44] * Kurimus1 (~kalleeen@tor-exit.squirrel.theremailer.net) has joined #ceph
[20:45] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[20:46] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[20:47] * Unai (~Adium@50-115-70-150.static-ip.telepacific.net) Quit (Ping timeout: 480 seconds)
[20:48] * haplo37 (~haplo37@199.91.185.156) has joined #ceph
[20:50] * mykola (~Mikolaj@91.245.79.118) Quit (Quit: away)
[20:52] <wes_dillingham> does rbd mirroring work with rbd children in 10.2.2 ?
[20:57] * i_m1 (~ivan.miro@109.188.125.11) Quit (Ping timeout: 480 seconds)
[20:57] * smithfarm (~smithfarm@ip-37-188-134-20.eurotel.cz) Quit (Read error: Connection reset by peer)
[20:58] * Unai1 (~Adium@50-115-70-150.static-ip.telepacific.net) Quit (Quit: Leaving.)
[21:00] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[21:00] * i_m (~ivan.miro@83.149.37.190) has joined #ceph
[21:02] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[21:03] * Mikko (~Mikko@dfs61tyfgyhgv19pnq07t-3.rev.dnainternet.fi) Quit (Quit: This computer has gone to sleep)
[21:08] * i_m (~ivan.miro@83.149.37.190) Quit (Ping timeout: 480 seconds)
[21:11] * Mikko (~Mikko@109-108-30-118.bb.dnainternet.fi) has joined #ceph
[21:14] * Kurimus1 (~kalleeen@61TAABJ0I.tor-irc.dnsbl.oftc.net) Quit ()
[21:18] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[21:21] * nilez (~nilez@96.44.144.90) Quit (Ping timeout: 480 seconds)
[21:21] * nilez (~nilez@ec2-52-37-170-77.us-west-2.compute.amazonaws.com) has joined #ceph
[21:24] * BrianA1 (~BrianA@fw-rw.shutterfly.com) has joined #ceph
[21:26] * Unai (~Adium@50-115-70-150.static-ip.telepacific.net) has joined #ceph
[21:27] * karnan (~karnan@106.51.131.100) has joined #ceph
[21:29] * BrianA (~BrianA@fw-rw.shutterfly.com) Quit (Ping timeout: 480 seconds)
[21:32] * bearkitten (~bearkitte@cpe-76-172-86-115.socal.res.rr.com) Quit (Remote host closed the connection)
[21:36] <blizzow> Would migrating a VM off my hypervisor, adding the client writethrough line to /etc/ceph/ceph.conf, and migrating the VM back suffice to get the caching running? Or is there a way to tell whether or not the caching is being utilized for a VM?
[21:37] * bniver (~bniver@nat-pool-bos-u.redhat.com) Quit (Remote host closed the connection)
[21:38] <rkeene> blizzow, Migrating it off and back should work.
[21:39] <blizzow> rkeene: would I need to restart my hypervisor between the migrations?
[21:39] <blizzow> (I mean after I add the client writethrough line to ceph.conf.
[21:40] <rkeene> virsh qemu-monitor-command --domain <id> --hmp info block -n -v might be helpful (if using libvirt)
[21:40] <rkeene> blizzow, No -- that's read by the RBD client (librbd)
[21:41] <rkeene> QEMU is linked to librbd, so on initialization it reads ceph.conf
[21:42] <rkeene> The saved state doesn't include the ceph configuration, so it shouldn't know this change took place
[21:42] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[21:42] * cronburg__ (~cronburg@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[21:44] * chunmei (~chunmei@jfdmzpr06-ext.jf.intel.com) Quit (Remote host closed the connection)
[21:44] * penguinRaider (~KiKo@14.139.82.6) Quit (Remote host closed the connection)
[21:44] <jdillaman> wes_dillingham: yes, I believe it does
[21:45] <jdillaman> blizzow: rkeene: correct, a live-migration to a host with an updated ceph.conf would work
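(A sketch of that round trip with libvirt; the domain and host names are hypothetical:)

    # from hypervisor1: move the guest to a host whose ceph.conf already has the [client] change
    virsh migrate --live guest1 qemu+ssh://hypervisor2/system
    # update ceph.conf on hypervisor1, then (from hypervisor2) migrate back
    virsh migrate --live guest1 qemu+ssh://hypervisor1/system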
[21:45] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[21:49] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[21:53] * cronburg__ (~cronburg@nat-pool-bos-t.redhat.com) has joined #ceph
[21:54] * efirs (~firs@98.207.153.155) Quit (Ping timeout: 480 seconds)
[22:03] * chunmei (~chunmei@134.134.137.75) has joined #ceph
[22:05] * dneary (~dneary@207.236.147.202) Quit (Ping timeout: 480 seconds)
[22:08] * Esvandiary (~Shnaw@46.166.188.219) has joined #ceph
[22:08] * squizzi_ (~squizzi@2001:420:2240:1340:414d:2677:1773:dc56) Quit (Ping timeout: 480 seconds)
[22:08] * karnan (~karnan@106.51.131.100) Quit (Quit: Leaving)
[22:18] * vbellur (~vijay@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[22:22] <blizzow> weee, migration party for me!
[22:24] * wes_dillingham (~wes_dilli@140.247.242.44) Quit (Ping timeout: 480 seconds)
[22:29] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[22:29] * chunmei (~chunmei@134.134.137.75) Quit (Remote host closed the connection)
[22:30] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[22:31] * wes_dillingham (~wes_dilli@65.112.11.66) has joined #ceph
[22:32] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[22:33] <blizzow> So, kraken is super deprecated, calamari is phased out, and ceph-dash doesn't seem very active any more. Is there a good current equivalent to calamari or ceph-dash out there?
[22:35] * Jeffrey4l__ (~Jeffrey@110.252.55.17) has joined #ceph
[22:38] * Esvandiary (~Shnaw@46.166.188.219) Quit ()
[22:38] * Jeffrey4l_ (~Jeffrey@119.251.140.28) Quit (Ping timeout: 480 seconds)
[22:40] <jdillaman> blizzow: see https://github.com/skyrings/skyring
[22:40] * chunmei (~chunmei@134.134.137.75) has joined #ceph
[22:40] * rendar (~I@95.235.182.241) Quit (Ping timeout: 480 seconds)
[22:42] * sudocat2 (~dibarra@192.185.1.20) has joined #ceph
[22:45] * ``rawr (uid23285@id-23285.tooting.irccloud.com) has joined #ceph
[22:46] <``rawr> attempting to fix a production ceph cluster -- we had a large deletion (deleting an old PG that had roughly 40TB of data) and what appeared to be a disk failure during this same time period
[22:47] <``rawr> now `ceph -s` looks like: http://pastebin.com/j1ntBUrE
[22:47] * sudocat1 (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[22:47] <``rawr> Have restarted all the nodes in the cluster
[22:48] <``rawr> Not sure if it's relevant, but we have 10 SSDs as a cache tier and 100 10k spindle drives
[22:50] <``rawr> Would anyone be able to help me, or at least point me in the right direction?
[22:54] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[22:55] * wes_dillingham (~wes_dilli@65.112.11.66) Quit (Quit: wes_dillingham)
[22:58] <s3an2> 71/110 in osds are down << why do you have so many OSDs down?
[22:59] <``rawr> no matter what I do, they keep flapping
[22:59] <``rawr> I can turn `nodown` on
[22:59] <``rawr> and bring them all up
[22:59] <``rawr> but as soon as turn off `nodown` they immediately go down
[22:59] <``rawr> due to heartbeats
[23:00] <``rawr> (from my understanding)
[23:00] <``rawr> even though they all appear to be up
[23:00] <T1> has some network setting or firewall changed?
[23:00] <``rawr> nope, already checked that -- everything seems to be able to hit everything else
[23:00] * sudocat2 (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[23:01] <s3an2> what if you set nobackfill and norecover, can you get them to stay up at all?
[23:01] <``rawr> it happened at a very particular point in time, when we started deleting one of our old PGs
[23:01] <``rawr> it was deleting at like 20GB/s (from what I can tell)
[23:01] <``rawr> then it appeared an OSD actually failed
[23:01] <``rawr> and then they've been flapping ever since
[23:01] <``rawr> gonna try setting nobackfill and norecovery now
[23:02] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Ping timeout: 480 seconds)
[23:04] <``rawr> doesn't appear to be helping
[23:04] <``rawr> also, when I restart an OSD, I'll get: 2016-08-24 21:03:50.116778 7f209acf2700 -1 osd.39 141286 heartbeat_check: no reply from osd.5 ever on either front or back, first ping sent 2016-08-24 21:03:30.033742 (cutoff 2016-08-24 21:03:30.116776)
[23:04] <``rawr> is that normal?
[23:05] <T1> seem like osd.5 is too busy to reply
[23:05] <``rawr> there is no CPU or disk usage on any of these nodes
[23:05] <s3an2> check you don't have any MTU problems
[23:06] <``rawr> MTU set to 9k on all of the nodes
[23:06] <T1> and you are sure you can telnet from one node to another on a port one of the OSDs has opened?
[23:06] * rendar (~I@95.235.182.241) has joined #ceph
[23:06] * huats_ (~quassel@stuart.objectif-libre.com) Quit (Ping timeout: 480 seconds)
[23:07] <``rawr> I haven't tried that yet -- let me confirm
[23:08] <T1> .. just to be absolutely sure it's not related to network stuff..
[23:08] <``rawr> yea, network seems quite reasonable here
[23:08] <``rawr> how can I see the port an OSD is running on?
[23:09] <s3an2> lsof -p $PID
[23:09] <T1> or netstat -antp as root
[23:09] <T1> it lists the PID that has a specific port opened
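Concretely, either approach surfaces the listening ports; a minimal sketch, run as root on the OSD host (placeholders in angle brackets, netstat/ss from net-tools/iproute2 assumed present):

    netstat -antp | grep ceph-osd    # listening sockets with owning PID
    ss -ltnp | grep ceph-osd         # iproute2 equivalent
    lsof -a -p <osd-pid> -i -P       # sockets held by one OSD process
    telnet <osd-host> <port>         # from another node, to test reachability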
[23:10] <``rawr> https://www.irccloud.com/pastebin/lGf0QPNk/
[23:10] <``rawr> there's actually a few of them
[23:11] <m0zes> how many boxes? you didn't happen to run out of pids, did you?
[23:11] <m0zes> 32k is the default cap...
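The current cap is easy to check; 32768 is the usual kernel default:

    cat /proc/sys/kernel/pid_max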
[23:12] <``rawr> I have 5 boxes; each box has 2 small SSD disks for the root directory; 20 10k SAS disks; and 2 SSDs
[23:12] <``rawr> each SAS disk and SSD disk has its own OSD
[23:12] <T1> and the node osd.5 runs on - can you identify it, see if one of those ceph-osd processes is osd.5, and telnet to its IP and port from the node where osd.39 runs?
[23:12] <m0zes> okay, pid cap shouldn't have brought down *that* many osds, then.
[23:13] <T1> .. the number of ports in use would also be ok..
[23:13] <``rawr> https://www.irccloud.com/pastebin/DRV7ze9K/
[23:13] <haplo37> Hi guys, is it possible to create a radosgw user with a specific access_key and secret_key?
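haplo37's question goes unanswered in-channel, but radosgw-admin does accept explicit keys at creation time; a minimal sketch with placeholder values:

    radosgw-admin user create --uid=exampleuser --display-name="Example User" \
        --access-key=EXAMPLEACCESSKEY --secret=examplesecretkey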
[23:13] * Unai1 (~Adium@50-115-70-150.static-ip.telepacific.net) has joined #ceph
[23:13] <T1> ok, good
[23:13] <T1> hm hm..
[23:14] <``rawr> https://www.irccloud.com/pastebin/luCJL49Q/
[23:14] <T1> what version of ceph are you running?
[23:14] <s3an2> maybe confirm that mtu 'ping -M do -s 8972 [destinationIP]'
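The 8972 figure is the 9000-byte MTU minus the 20-byte IP header and 8-byte ICMP header (9000 - 20 - 8 = 8972); -M do sets the don't-fragment bit so an undersized path fails loudly instead of silently fragmenting. The equivalent test for a standard 1500 MTU path:

    ping -M do -s 1472 <destinationIP>    # 1500 - 20 - 8 = 1472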
[23:14] <``rawr> this one doesn't seem to send it
[23:14] <T1> no.. that's not a particularly good sign
[23:14] <``rawr> https://www.irccloud.com/pastebin/TuH04LAY/
[23:14] <``rawr> OH GOD
[23:15] <m0zes> welp, there's your problem.
[23:15] <T1> oh dear..
[23:15] <s3an2> looks like an MTU issue
[23:15] <m0zes> or, at least, one of your problems ;)
[23:15] <s3an2> set them all the same and you should be in a better place
[23:15] <T1> switch restart?
[23:15] <``rawr> does the MTU need to be set on bonded interfaces?
[23:15] <m0zes> I believe so.
[23:15] <``rawr> as well as the raw interface?
[23:15] <``rawr> k
[23:16] <T1> I've only set it on bonded interfaces
[23:16] * Unai (~Adium@50-115-70-150.static-ip.telepacific.net) Quit (Ping timeout: 480 seconds)
[23:16] <``rawr> https://www.irccloud.com/pastebin/WSXrMRjl/
[23:16] <T1> .. but I've also had an incident where a switch was restarted and the support for large MTUs was never written to its config
[23:17] <``rawr> I don't think our switches got restarted
[23:17] <m0zes> ``rawr: I think you need to set it on the raw bond, not the vlan in the bond.
[23:17] <``rawr> would it just be safer to set everything to use 1500 MTU?
[23:17] <T1> so it didn't support it after a restart and it was hell to hunt down since only a few things broke
[23:17] <``rawr> https://www.irccloud.com/pastebin/bIuJpRHJ/
[23:17] <s3an2> You may need to set the MTU on the raw interfaces first
[23:17] <``rawr> https://www.irccloud.com/pastebin/JqX7Ze9w/
[23:18] <``rawr> those seem correct (am I wrong?)
[23:18] <T1> no I've got the same on a ceph node here too
[23:19] <T1> but my bond device is without vlan
[23:19] <T1> just checked - no mention of MTU in the slave configs
[23:19] <T1> only in the bond
[23:19] <s3an2> You may need to increase the MTU on the interfaces under the bond to allow the VLAN tag
[23:19] <T1> yes..
[23:20] <m0zes> p2p1 and em50 need 9018, bond0 needs 9000, then you should have 9000 through the system... i believe.
[23:21] <s3an2> Yea, I set the interfaces to be 9216 here, then the tagged bond to be 9000
[23:21] <m0zes> ah, 9216. that makes more sense.
[23:21] <``rawr> https://www.irccloud.com/pastebin/ntig0PIk/
[23:21] <T1> I understand the 9018 for vlan tagging, but why 9216?
[23:22] <s3an2> 9216 is the limit cisco sets for jumbo frames
[23:22] <m0zes> I misread. anyway, 9216 is often the cap in switches. it might make sense just to keep it out of the way ;)
[23:23] <T1> ``rawr: em50 and p1p2 AND bond0 .. and then bond0.64
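Applied to the interface names from the discussion, that layering looks roughly like this (a sketch using ip link; the names and values come from the conversation above, and the matching distro config files would need the same values to persist across reboots):

    ip link set dev em50 mtu 9216       # raw slave interfaces: headroom for the vlan tag
    ip link set dev p1p2 mtu 9216
    ip link set dev bond0 mtu 9000      # the bond on top of the slaves
    ip link set dev bond0.64 mtu 9000   # the tagged vlan on top of the bond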
[23:23] <``rawr> :+1
[23:23] <``rawr> ????
[23:24] <m0zes> oh noes, unicode in irc. that sounds like a terrible thing.
[23:24] <m0zes> s/unicode/emoji/
[23:24] <``rawr> sorry; sorry; Slack user
[23:25] <m0zes> I honestly don't care. I run my terminal client with unicode (thus emoji) support.
[23:25] <m0zes> I just like to poke fun.
[23:25] <``rawr> k -- gonna set this on all 5 boxes and get back
[23:25] <T1> and I'm still using a client that makes utf-8 chars look funny.. ;)
[23:26] <T1> oh well..
[23:26] <m0zes> networking gets complicated when you add any combination of bonds, bridges, and vlans to the mix.
[23:26] <T1> yeah
[23:26] <T1> the most elaborate I've got set up is LACP
[23:27] <s3an2> m0zes, sounds like openstack neutron networking ;)
[23:27] * m0zes 's switch distro uses linux and debian-style interface configs.
[23:27] <T1> we've no use for vlans - just got 6 physical interfaces in 3 bonds to 3 different networks in most of our servers
[23:27] <T1> + mgmt
[23:28] <s3an2> So it seems the MTU was lost when the OSD nodes got rebooted. You may still have the original issue after the MTU problem is fixed :/
[23:29] <``rawr> yea, it doesn't seem likely that this would be the only issue
[23:29] <``rawr> they have it set to 9k now
[23:29] <T1> hm..
[23:29] <T1> time for bed
[23:30] <T1> gotta get up early tomorrow
[23:30] <``rawr> :(
[23:30] <mlovell> i have a centos 7 server currently running hammer and am trying to upgrade to jewel manually. i changed my yum repo to have the rpm-jewel repo instead of the rpm-hammer repo. when i try to do yum upgrade, it doesn't think there are newer packages. if i try to yum install ceph-mon it complains saying that it depends on the 10.2.2 versions but that 0.94.7 versions are installed and doesn't want to install newer ones. any thoughts as to what i mi
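mlovell's question also goes unanswered here, but stale repository metadata is a common cause of yum not seeing newer packages after a repo swap; a minimal first check, assuming the .repo file's baseurl really does point at the rpm-jewel tree:

    yum clean all
    yum makecache
    yum --showduplicates list ceph-mon    # should now list 10.2.x candidates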
[23:31] <T1> (2 day company outing - I've packed a case of booze for when the bar closes and the party continues and I know there's been a few cases of wine and beer arranged too)
[23:31] <T1> so it's not all that bad.. ;)
[23:31] <s3an2> sounds like a good day 'at work'
[23:31] <``rawr> https://www.irccloud.com/pastebin/6CVkJoPY/
[23:32] * lobstar (~raindog@108.61.123.75) has joined #ceph
[23:32] <``rawr> doesn't seem like this helped the heartbeat issue :(
[23:32] <T1> yeah I'm just able to cope..
[23:32] <s3an2> those ping tests working now?
[23:32] <s3an2> including to the mons
[23:33] <``rawr> didn't test the mons
[23:34] * bearkitten (~bearkitte@cpe-76-172-86-115.socal.res.rr.com) has joined #ceph
[23:36] <``rawr> https://www.irccloud.com/pastebin/Y1K1prd3/
[23:37] <``rawr> yea, went through all the osds and mons
[23:37] <``rawr> is there anything else I should be careful about?
[23:37] * bearkitten (~bearkitte@cpe-76-172-86-115.socal.res.rr.com) Quit (Remote host closed the connection)
[23:38] <s3an2> what's the cluster look like now?
[23:38] <s3an2> still 71 OSDs down?
[23:38] <``rawr> 69/110 in osds are down
[23:38] <``rawr> it oscillates a bit
[23:39] <``rawr> but I haven't seen it go below roughly 50/110 osds are down
[23:39] <T1> restart them - even with the network fixed, it takes time for timeouts and retransmissions to settle
[23:40] <``rawr> restart the OSDs?
[23:40] <``rawr> k
[23:40] <T1> just the ones that are down
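How the restart is issued depends on the host's init system; hammer/jewel-era nodes are typically one of these (the OSD id is a placeholder):

    systemctl restart ceph-osd@5      # systemd hosts
    /etc/init.d/ceph restart osd.5    # sysvinit hosts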
[23:40] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[23:40] <``rawr> k
[23:40] <s3an2> Yea, I would restart the down ones, and I would keep nobackfill and norecover set for a little while to try to get the OSDs up
[23:41] <``rawr> k
[23:41] <``rawr> going through them now
[23:41] <s3an2> If you need to increase pid_max: 'sysctl -w kernel.pid_max=4194303'
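Note that sysctl -w only changes the running kernel; to survive a reboot the value also needs a config entry (the file name here is illustrative):

    echo 'kernel.pid_max = 4194303' > /etc/sysctl.d/99-pid-max.conf
    sysctl --system    # reload all sysctl config files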
[23:43] * haplo37 (~haplo37@199.91.185.156) Quit (Remote host closed the connection)
[23:43] <T1> I'm off
[23:43] <T1> cya
[23:43] <T1> (and good luck with that cluster..)
[23:47] * chunmei (~chunmei@134.134.137.75) Quit (Remote host closed the connection)
[23:48] <``rawr> 30/110 in osds are down
[23:48] <``rawr> but the number keeps going up
[23:48] <``rawr> k -- increasing pid_max
[23:50] <``rawr> increased the pid_max and unfortunately the number of osds that are down has increased as well
[23:50] <``rawr> 40/110 in osds are down
[23:51] <s3an2> Check the log files of a down osd
[23:55] <``rawr> https://www.irccloud.com/pastebin/Gy4vAGMG/
[23:55] <``rawr> seems suspicious
[23:55] <``rawr> https://www.irccloud.com/pastebin/v1H7Ijtv/
[23:55] <``rawr> this as well, but I don't know what it means
[23:56] <``rawr> https://www.irccloud.com/pastebin/W5OBMatR/
[23:57] <``rawr> seems like some people recommend changing osd_scrub_thread_suicide_timeout (https://access.redhat.com/solutions/2127471)
[23:57] <``rawr> probably going to try that first
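The Red Hat article's workaround is raising that timeout in ceph.conf on the OSD hosts; a sketch, with the value illustrative rather than a recommendation:

    [osd]
    osd_scrub_thread_suicide_timeout = 300    # raise from the default; requires an OSD restart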
[23:58] <s3an2> just disable scrubs for now
[23:58] <s3an2> ceph osd set noscrub
[23:58] <s3an2> ceph osd set nodeep-scrub
[23:59] <``rawr> set -- gonna try restarting the guys that went down

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.