#ceph IRC Log


IRC Log for 2016-10-10

Timestamps are in GMT/BST.

[0:05] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[0:07] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[0:13] * rendar (~I@host220-173-dynamic.116-80-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[0:17] * tallest_red (~storage@108.61.122.72) Quit ()
[0:41] * kuku (~kuku@203.177.235.23) has joined #ceph
[0:43] * kuku_ (~kuku@203.177.235.23) has joined #ceph
[0:43] * kuku (~kuku@203.177.235.23) Quit (Read error: Connection reset by peer)
[0:45] * kuku (~kuku@119.93.91.136) has joined #ceph
[0:52] * kuku_ (~kuku@203.177.235.23) Quit (Ping timeout: 480 seconds)
[0:54] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[0:56] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[1:04] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[1:06] * sleinen1 (~Adium@2001:620:0:82::102) Quit (Ping timeout: 480 seconds)
[1:11] * jermudgeon_ (~jermudgeo@tab.biz.whitestone.link) has joined #ceph
[1:19] * oms101 (~oms101@p20030057EA096800C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:20] * Skaag (~lunix@cpe-172-91-77-84.socal.res.rr.com) has joined #ceph
[1:20] * Concubidated1 (~cube@104.220.228.114) Quit (Quit: Leaving.)
[1:28] * oms101 (~oms101@p20030057EA118F00C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[1:31] * jermudgeon_ (~jermudgeo@tab.biz.whitestone.link) Quit (Quit: jermudgeon_)
[1:36] * jermudgeon_ (~jermudgeo@tab.biz.whitestone.link) has joined #ceph
[1:36] * Skaag (~lunix@cpe-172-91-77-84.socal.res.rr.com) Quit (Quit: Leaving.)
[1:37] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[1:46] * kuku (~kuku@119.93.91.136) Quit (Remote host closed the connection)
[1:47] * efirs (~firs@98.207.153.155) has joined #ceph
[1:47] * wkennington (~wak@0001bde8.user.oftc.net) has joined #ceph
[1:47] * wkennington (~wak@0001bde8.user.oftc.net) Quit ()
[1:51] * wkennington (~wak@0001bde8.user.oftc.net) has joined #ceph
[1:54] * jermudgeon_ (~jermudgeo@tab.biz.whitestone.link) Quit (Quit: jermudgeon_)
[1:54] * jermudgeon (~jermudgeo@tab.mdu.whitestone.link) Quit (Quit: jermudgeon)
[1:54] * jermudgeon (~jermudgeo@tab.biz.whitestone.link) has joined #ceph
[2:03] * jermudgeon_ (~jermudgeo@tab.biz.whitestone.link) has joined #ceph
[2:09] * ghostnote (~VampiricP@108.61.122.88) has joined #ceph
[2:09] * minnesotags (~herbgarci@c-50-137-242-97.hsd1.mn.comcast.net) has joined #ceph
[2:10] * jermudgeon_ (~jermudgeo@tab.biz.whitestone.link) Quit (Remote host closed the connection)
[2:10] * kuku (~kuku@203.177.235.23) has joined #ceph
[2:13] * kuku_ (~kuku@203.177.235.23) has joined #ceph
[2:13] * kuku (~kuku@203.177.235.23) Quit (Read error: Connection reset by peer)
[2:16] * jermudgeon_ (~jermudgeo@tab.biz.whitestone.link) has joined #ceph
[2:16] * jermudgeon_ (~jermudgeo@tab.biz.whitestone.link) Quit ()
[2:25] * kuku_ (~kuku@203.177.235.23) Quit (Quit: computer sleep)
[2:29] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[2:35] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Quit: leaving)
[2:38] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[2:39] * ghostnote (~VampiricP@108.61.122.88) Quit ()
[2:45] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) Quit (Quit: leaving)
[2:51] * jermudgeon_ (~jermudgeo@tab.biz.whitestone.link) has joined #ceph
[2:53] * jermudgeon_ (~jermudgeo@tab.biz.whitestone.link) Quit (Remote host closed the connection)
[2:53] * MatthewH12 (~cmrn@185.100.86.100) has joined #ceph
[2:54] * jermudgeon (~jermudgeo@tab.biz.whitestone.link) Quit (Quit: jermudgeon)
[2:56] * jermudgeon (~jermudgeo@tab.biz.whitestone.link) has joined #ceph
[3:10] * Jeffrey4l_ (~Jeffrey@110.252.73.52) Quit (Ping timeout: 480 seconds)
[3:11] * Jeffrey4l_ (~Jeffrey@110.252.73.52) has joined #ceph
[3:20] * Jeffrey4l_ (~Jeffrey@110.252.73.52) Quit (Ping timeout: 480 seconds)
[3:21] * mtanski (~mtanski@ool-182dce0f.dyn.optonline.net) has joined #ceph
[3:23] * MatthewH12 (~cmrn@185.100.86.100) Quit ()
[3:29] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[3:29] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[3:30] * Jeffrey4l_ (~Jeffrey@110.252.73.52) has joined #ceph
[3:31] * dalegaar1-39554 (~dalegaard@vps.devrandom.dk) Quit (Remote host closed the connection)
[3:31] * dalegaard-39554 (~dalegaard@vps.devrandom.dk) has joined #ceph
[3:33] * dis (~dis@00018d20.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:34] * georgem (~Adium@69-165-135-139.dsl.teksavvy.com) has joined #ceph
[3:34] * jermudgeon_ (~jermudgeo@southend.mdu.whitestone.link) has joined #ceph
[3:34] * jermudgeon (~jermudgeo@tab.biz.whitestone.link) Quit (Quit: quittin)
[3:34] * jermudgeon_ is now known as jermudgeon
[3:36] * dis (~dis@00018d20.user.oftc.net) has joined #ceph
[3:50] * Concubidated (~cube@c-73-12-218-131.hsd1.ca.comcast.net) has joined #ceph
[3:51] * Jeffrey4l_ (~Jeffrey@110.252.73.52) Quit (Ping timeout: 480 seconds)
[3:55] * derjohn_mobi (~aj@x590cc2af.dyn.telefonica.de) has joined #ceph
[4:00] * georgem (~Adium@69-165-135-139.dsl.teksavvy.com) Quit (Quit: Leaving.)
[4:02] * jfaj (~jan@p20030084AD264C006AF728FFFE6777FF.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[4:03] * derjohn_mob (~aj@x590c6c73.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[4:12] * jfaj (~jan@p20030084AD2E91006AF728FFFE6777FF.dip0.t-ipconnect.de) has joined #ceph
[4:21] * blahdodo (~blahdodo@69.172.164.248) Quit (Remote host closed the connection)
[4:25] * blahdodo (~blahdodo@69.172.164.248) has joined #ceph
[4:29] * minnesotags (~herbgarci@c-50-137-242-97.hsd1.mn.comcast.net) Quit (Read error: No route to host)
[4:39] * sep (~sep@2a04:2740:1ab:1::2) Quit (Ping timeout: 480 seconds)
[4:39] * minnesotags (~herbgarci@c-50-137-242-97.hsd1.mn.comcast.net) has joined #ceph
[4:47] * mtanski (~mtanski@ool-182dce0f.dyn.optonline.net) Quit (Quit: mtanski)
[4:56] * sep (~sep@95.62-50-191.enivest.net) has joined #ceph
[5:03] * kefu (~kefu@114.92.125.128) has joined #ceph
[5:08] * m8x (~user@182.150.27.112) has joined #ceph
[5:27] * Jeffrey4l_ (~Jeffrey@110.252.73.52) has joined #ceph
[5:30] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[5:34] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[5:35] * m8x (~user@182.150.27.112) has left #ceph
[5:44] * Vacuum__ (~Vacuum@88.130.215.214) has joined #ceph
[5:50] * Vacuum_ (~Vacuum@i59F79058.versanet.de) Quit (Ping timeout: 480 seconds)
[6:08] * jermudgeon (~jermudgeo@southend.mdu.whitestone.link) Quit (Quit: jermudgeon)
[6:08] * jermudgeon (~jermudgeo@southend.mdu.whitestone.link) has joined #ceph
[6:09] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[6:11] * walcubi (~walcubi@p5797AECB.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[6:12] * walcubi (~walcubi@p5795B634.dip0.t-ipconnect.de) has joined #ceph
[6:13] * jermudgeon (~jermudgeo@southend.mdu.whitestone.link) Quit (Quit: jermudgeon)
[6:16] * jermudgeon (~jermudgeo@southend.mdu.whitestone.link) has joined #ceph
[6:19] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) Quit (Quit: leaving)
[6:20] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[6:24] * wiebalck_ (~wiebalck@AAnnecy-653-1-50-224.w90-41.abo.wanadoo.fr) has joined #ceph
[6:37] * kefu is now known as kefu|afk
[6:41] * derjohn_mobi (~aj@x590cc2af.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[6:42] * rdas (~rdas@121.244.87.116) has joined #ceph
[7:05] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[7:05] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[7:08] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[7:09] * rdas (~rdas@121.244.87.116) has joined #ceph
[7:12] * morse_ (~morse@supercomputing.univpm.it) has joined #ceph
[7:12] * morse (~morse@supercomputing.univpm.it) Quit (Read error: Connection reset by peer)
[7:14] * TomasCZ (~TomasCZ@yes.tenlab.net) Quit (Quit: Leaving)
[7:19] * wiebalck_ (~wiebalck@AAnnecy-653-1-50-224.w90-41.abo.wanadoo.fr) Quit (Quit: wiebalck_)
[7:24] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[7:27] * Jourei (~Izanagi@46.166.190.195) has joined #ceph
[7:33] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[7:35] * lincolnb (~lincoln@c-71-57-68-189.hsd1.il.comcast.net) has left #ceph
[7:39] * Skaag (~lunix@cpe-172-91-77-84.socal.res.rr.com) has joined #ceph
[7:47] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) has joined #ceph
[7:48] * derjohn_mobi (~aj@46.183.103.8) has joined #ceph
[7:52] * Concubidated (~cube@c-73-12-218-131.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[7:55] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[7:56] * vimal (~vikumar@121.244.87.116) has joined #ceph
[7:57] * Jourei (~Izanagi@46.166.190.195) Quit ()
[8:08] * Ivan1 (~ipencak@213.151.95.130) has joined #ceph
[8:16] * lmb (~Lars@ip5b404bab.dynamic.kabel-deutschland.de) Quit (Quit: Leaving)
[8:27] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) has joined #ceph
[8:28] * TMM (~hp@dhcp-077-248-009-229.chello.nl) Quit (Quit: Ex-Chat)
[8:37] * Kurt (~Adium@2001:628:1:5:5489:32f9:27e5:3f42) has joined #ceph
[8:39] * derjohn_mobi (~aj@46.183.103.8) Quit (Ping timeout: 480 seconds)
[8:47] * sleinen (~Adium@macsl.switch.ch) has joined #ceph
[8:48] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[8:48] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[8:49] * sleinen1 (~Adium@2001:620:0:69::101) has joined #ceph
[8:55] * sleinen (~Adium@macsl.switch.ch) Quit (Ping timeout: 480 seconds)
[8:57] * TMM (~hp@dhcp-077-248-009-229.chello.nl) has joined #ceph
[8:58] * derjohn_mob (~aj@46.189.28.79) has joined #ceph
[8:59] * kefu|afk (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[9:03] * kefu (~kefu@114.92.125.128) has joined #ceph
[9:06] * ggarg (~ggarg@host-82-135-29-34.customer.m-online.net) has joined #ceph
[9:06] * TMM (~hp@dhcp-077-248-009-229.chello.nl) Quit (Quit: Ex-Chat)
[9:07] * analbeard (~shw@support.memset.com) has joined #ceph
[9:09] * moegyver (~oftc-webi@2001:9b0:14a:10:36e6:d7ff:fe0e:81b0) has joined #ceph
[9:13] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[9:19] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[9:22] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[9:23] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Quit: cyphase.com)
[9:23] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[9:25] * bitserker (~toni@2.152.12.64.dyn.user.ono.com) has joined #ceph
[9:25] * TMM (~hp@dhcp-077-248-009-229.chello.nl) has joined #ceph
[9:28] * gucore (~fridim@56-198-190-109.dsl.ovh.fr) has joined #ceph
[9:30] * TMM (~hp@dhcp-077-248-009-229.chello.nl) Quit (Remote host closed the connection)
[9:31] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) has joined #ceph
[9:31] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:31] * TMM (~hp@dhcp-077-248-009-229.chello.nl) has joined #ceph
[9:33] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[9:42] * rotbeard (~redbeard@aftr-109-90-233-215.unity-media.net) has joined #ceph
[9:44] * ade (~abradshaw@tmo-098-166.customers.d1-online.com) has joined #ceph
[9:47] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Quit: cyphase.com)
[9:47] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[9:47] * fsimonce (~simon@95.239.69.67) has joined #ceph
[9:49] * Jeffrey4l_ (~Jeffrey@110.252.73.52) Quit (Remote host closed the connection)
[9:52] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[9:52] * Jeffrey4l (~Jeffrey@110.252.73.52) has joined #ceph
[9:54] * ccourtaut (~ccourtaut@157.173.31.93.rev.sfr.net) Quit (Quit: I'll be back!)
[9:56] * TMM (~hp@dhcp-077-248-009-229.chello.nl) Quit (Quit: Ex-Chat)
[9:58] * JANorman (~JANorman@81.137.246.31) has joined #ceph
[9:58] * garphy`aw is now known as garphy
[9:58] * JANorman_ (~JANorman@81.137.246.31) has joined #ceph
[9:58] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[9:58] * ccourtaut (~ccourtaut@157.173.31.93.rev.sfr.net) has joined #ceph
[9:59] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Ping timeout: 480 seconds)
[10:01] * madkiss2 (~madkiss@2a02:8109:8680:2000:c0e7:d8f1:4e38:522f) has joined #ceph
[10:03] * cyphase (~cyphase@2601:640:c401:969a:78cc:a6b9:21d7:4f0e) has joined #ceph
[10:05] * madkiss1 (~madkiss@2a02:8109:8680:2000:3dd2:b156:dc57:2d23) Quit (Ping timeout: 480 seconds)
[10:06] * JANorman (~JANorman@81.137.246.31) Quit (Ping timeout: 480 seconds)
[10:12] * Concubidated (~cube@h4.246.129.40.static.ip.windstream.net) has joined #ceph
[10:16] * ade (~abradshaw@tmo-098-166.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[10:17] * DanFoster (~Daniel@2a00:1ee0:3:1337:418d:47f1:e6a7:15f6) has joined #ceph
[10:22] * erwan_taf (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[10:22] * jermudgeon (~jermudgeo@southend.mdu.whitestone.link) Quit (Quit: jermudgeon)
[10:25] * ade (~abradshaw@pool-22.254.176.62.dynamic.wobline-ip.de) has joined #ceph
[10:27] * branto (~branto@transit-86-181-132-209.redhat.com) has joined #ceph
[10:28] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Quit: cyphase.com)
[10:28] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[10:28] * mattch (~mattch@w5430.see.ed.ac.uk) has joined #ceph
[10:29] * JANorman_ (~JANorman@81.137.246.31) Quit (Remote host closed the connection)
[10:29] * JANorman (~JANorman@81.137.246.31) has joined #ceph
[10:33] * doppelgrau (~doppelgra@132.252.235.172) has joined #ceph
[10:33] * erwan_taf (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Remote host closed the connection)
[10:34] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[10:34] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Remote host closed the connection)
[10:35] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[10:35] * JANorman (~JANorman@81.137.246.31) Quit (Quit: Leaving...)
[10:42] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[10:48] <FidoNet> morning
[10:48] <FidoNet> so I've woken up to this today … any ideas? cephfs appears to be completely offline
[10:48] <FidoNet> 2016-10-10 09:48:23.329705 7f9989a7d700 1 mds.beacon.mds03 _send skipping beacon, heartbeat map not healthy
[10:48] <FidoNet> 2016-10-10 09:48:24.128000 7f998eb88700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
[10:49] <Be-El> FidoNet: the important part is missing
[10:49] <FidoNet> wassat?
[10:50] <Be-El> these messages only indicate that the mds is busy and does not respond to heartbeats
[10:50] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[10:50] <Be-El> there should be additional messages indicating e.g. a restart or crash of the mds
[10:50] <FidoNet> ok .. it's been doing the same for several hours and all accesses are hanging / writes stalled
[10:51] <FidoNet> ceph -s at https://pastebin.com/d2uCW088
[10:51] <FidoNet> some more errors - https://pastebin.com/VNJrVDWM
[10:52] <Be-El> the 'fsmap' line indicates that the mds is restarting (e.g. due to a crash). the rejoin step scans the data and metadata pool to locate all files etc. depending on the speed of these pools it might take some time
[10:53] <FidoNet> ok some more meaningful errors - https://pastebin.com/3SN96KmJ
[10:53] <FidoNet> 3h so far
[10:54] <Be-El> you're probably having a lot of iop/s on the metadata pool and data pools at the moment
[10:54] <Be-El> 2016-10-10 06:40:33.062034 7fa6e5434700 -1 mds.mds03 handle_mds_map i (172.16.50.23:6805/11274) dne in the mdsmap, respawning myself
[10:55] <FidoNet> probably .. an ubuntu mirror was running last night
[10:55] <Be-El> that's the current problem. the mds takes too long to scan the pools, does not react to heartbeats and is thus considered dead by the mons
[10:56] * wjw-freebsd (~wjw@31.138.125.136) has joined #ceph
[10:56] <Be-El> you can use a longer timeout for the mds heartbeats
[10:56] <Be-El> it should allow the mds to finish the rejoin phase and become operational again
[10:57] <Be-El> and _after_ that you definitely want to check the complete log to find the reason for the first crash
[10:57] <Be-El> [mds]
[10:57] <Be-El> mds_beacon_grace = 60
[10:57] <Be-El> mds_beacon_interval = 4
[10:57] <Be-El> mds_session_timeout = 120
[10:58] <Be-El> that's our setup for the mds server, using a longer grace period etc.
[10:58] <Be-El> just put it in ceph.conf and wait for the next restart of the mds (or restart it manually)
[10:58] <Be-El> you may need to increase the values depending on the speed of your pools
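A minimal sketch of applying Be-El's grace settings to a running cluster without waiting for an MDS restart, assuming the MDS is named mds03 (as in FidoNet's log excerpt) and an admin keyring is available; the mons also read mds_beacon_grace, so the second injection may be needed as well:

    ceph tell mds.mds03 injectargs '--mds_beacon_grace 60'    # push the longer grace period to the running MDS
    ceph tell mon.* injectargs '--mds_beacon_grace 60'        # mons use the same option to decide when an MDS is laggy
    ceph daemon mds.mds03 config get mds_beacon_grace         # run on the MDS host to verify the active value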
[11:00] * efirs (~firs@98.207.153.155) Quit (Quit: Leaving.)
[11:00] <FidoNet> ok thanks
[11:01] <Be-El> -> off for meeting, good luck with the mds
[11:01] <FidoNet> thanks!!
[11:02] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[11:08] * wjw-freebsd2 (~wjw@31.137.49.81) has joined #ceph
[11:09] * TMM (~hp@185.5.121.201) has joined #ceph
[11:10] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[11:11] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:2038:c77a:4100:5349) has joined #ceph
[11:12] * wjw-freebsd (~wjw@31.138.125.136) Quit (Ping timeout: 480 seconds)
[11:13] <thoht> after reboot, my ssd disk journal is not owned by ceph and osd don't start
[11:13] <thoht> i ve to add chown ceph /dev/sda3 on rc.local ...
[11:15] <FidoNet> it is one of the most glaring omissions from the docs and (imho) the biggest setup bug …. you need to manually set the partition id otherwise the permissions break
[11:15] <FidoNet> sgdisk -t 1:45B0969E-9B03-4F30-B4C6-B4B80CEFF106 /dev/sda (assuming sda1)
[11:15] <FidoNet> sgdisk -t 2:45B0969E-9B03-4F30-B4C6-B4B80CEFF106 /dev/sda assuming sda2 and so on
[11:15] <FidoNet> took me a week of pulling my hair out to figure that little gem out
[11:16] <thoht> FidoNet: blkid to fetch the UUID ?
[11:16] <FidoNet> no sgdisk to set the id as above
[11:16] * dgurtner (~dgurtner@94.126.212.170) has joined #ceph
[11:16] <FidoNet> id/partition type
[11:17] <FidoNet> the part type needs to be cephfs journal
[11:17] <thoht> but where does 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 come from ?
[11:17] <FidoNet> it's the partition type string
[11:17] <FidoNet> means cephfs journal
[11:18] <thoht> where does i find it ?
[11:18] <thoht> ceph journal is /dev/sda3
[11:18] <FidoNet> https://en.wikipedia.org/wiki/GUID_Partition_Table
[11:18] <FidoNet> so if your ceph journal is on partition 3 of /dev/sda … you would do -t 3:blah /dev/sda
[11:19] <thoht> i ve many other partition on /dev/sda
[11:19] <thoht> oh 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 is the uuid of ceph journal
[11:19] <thoht> nice
[11:20] <thoht> thks for the URL
[11:20] <thoht> sgdisk -t 3:45B0969E-9B03-4F30-B4C6-B4B80CEFF106 /dev/sda <== should be for my case, isn't it ?
[11:21] <FidoNet> yup
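A condensed sketch of the sgdisk workflow FidoNet describes, assuming the journal is partition 3 of /dev/sda (thoht's layout) and the disk carries a GPT label; 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 is the standard Ceph journal partition type GUID, which lets the Ceph udev rules chown the device to ceph:ceph instead of relying on rc.local:

    sgdisk -t 3:45B0969E-9B03-4F30-B4C6-B4B80CEFF106 /dev/sda   # tag partition 3 as a Ceph journal
    sgdisk -i 3 /dev/sda                    # verify: "Partition GUID code" should now show the value above
    partprobe /dev/sda                      # re-read the partition table
    udevadm trigger --sysname-match=sda3    # let the udev rules fix the ownership without a reboot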
[11:27] * rotbeard (~redbeard@aftr-109-90-233-215.unity-media.net) Quit (Quit: Leaving)
[11:31] * masteroman (~ivan@93-142-35-27.adsl.net.t-com.hr) has joined #ceph
[11:32] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[11:35] * rraja (~rraja@125.16.34.66) has joined #ceph
[11:43] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[11:50] <thoht> FidoNet: i got this error: https://gist.github.com/nlienard/acaef35cdc6cba5368b5c67a6c1f1820
[11:50] <FidoNet> is sda your boot drive by any chance?
[11:51] <thoht> yes it is the boot drive
[11:51] <FidoNet> that's a bad idea .. you shouldn't put your journals on your OS / boot disk
[11:51] <thoht> FidoNet: i don't have choice
[11:51] <FidoNet> one possible way around this, although I haven't tested it yet, is if your hardware will boot in EFI mode
[11:51] <thoht> it is separated partition
[11:52] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[11:52] <FidoNet> if so then you might be able to set your boot disk to GPT …. but as I say I haven't tested it
[11:52] <thoht> FidoNet: https://gist.github.com/nlienard/acaef35cdc6cba5368b5c67a6c1f1820 <== i added partion list
[11:52] <thoht> partition
[11:52] <thoht> partition table is msdos
[11:52] <thoht> i won't be able to change it
[11:52] <FidoNet> ceph journals need to be on GPT as far as I can tell
[11:53] <thoht> for now it works
[11:54] <thoht> my osd are in zfs they weren't starting properly
[11:54] <FidoNet> ok .. be careful anyway … journals should be on a stand alone SSD
[11:54] <thoht> FidoNet: i don't have this chance to have standalone
[11:55] <thoht> i had to add osd max object name len = 256 and osd max object namespace len = 64 :/
[11:55] <FidoNet> ok
[11:55] <thoht> nobody uses zfs as osd ?
[11:55] <FidoNet> I have the os boot on a small SAS DOM module
[11:56] <FidoNet> no .. xfs here … the other thing is ideally 1 osd per disk .. no more
[11:56] <FidoNet> outside of the lab anyway
[11:57] <thoht> it is my case
[11:57] * floppyraid (~holoirc@124-170-33-218.dyn.iinet.net.au) has joined #ceph
[11:57] <thoht> 1 osd per disk
[11:57] <thoht> 1 osd zfs SATA + journal SSD
[12:00] <FidoNet> ok .. I'm still relatively new to this and haven't used zfs with ceph at all …. still trying to get the proof of concept working in the lab (15 osds across 3 nodes currently) … coming along nicely though
[12:02] * floppyraid (~holoirc@124-170-33-218.dyn.iinet.net.au) Quit ()
[12:11] <peetaur2> FidoNet: do you have a reference to the claim that you can't put the journal on the os disk? I have read everywhere it's just for performance; is this all you mean?
[12:12] <FidoNet> well a) performance and b) the partition needs to be GPT to set the partition type … as far as I can tell …
[12:12] <thoht> it works without GPT
[12:12] <FidoNet> how?
[12:12] <thoht> what how ?
[12:13] <FidoNet> how can you set the GPT partition type on a non GPT partition ?
[12:13] <peetaur2> is the gpt thing all about ceph-deploy?
[12:13] <thoht> in ceph.conf, i just put :osd journal = /dev/sdb3
[12:13] <peetaur2> I have done all sorts of osd types and journal types without worrying about gpt
[12:13] <thoht> for each OSD section
[12:13] <thoht> i didn't use ceph-deploy
[12:13] <FidoNet> ceph jewel needs the partition type to be set as far as I can tell, otherwise it is owned by root and not ceph
[12:14] <peetaur2> I recommend not using letters... sdb. use partlabel, partuuid, etc. instead, or /dev/disk/by-id/...
[12:14] <thoht> i m running jewel as well
[12:14] * KindOne (kindone@0001a7db.user.oftc.net) Quit (Remote host closed the connection)
[12:14] <peetaur2> if you accidentally start an osd using another osd's journal, bad things happen... (I tested it)
[12:14] <thoht> peetaur2: no reason i ll change the conf :)
[12:14] <thoht> now it is in prod; i won't touch anything
[12:15] <thoht> and if i lose the journal; then i ll rebuild the osd
[12:15] <peetaur2> if you reboot one day and sdb is now sdf and sdr is sdb, then 2 osds go boom
[12:15] <thoht> it is not so dramatic
[12:15] <thoht> if i reboot; letter don't change
[12:15] <peetaur2> never assume that...ever
[12:15] <thoht> it is the case for all
[12:16] <peetaur2> maybe somebody sticks a usb stick in there one day, and then your sda,b,c becomes sda,c,d ... or firmware changes, kernel changes, etc.
[12:16] <thoht> it is a server in a Datacenter; nobody will come with usb key :)
[12:16] <FidoNet> the letters can (and do) change … if you use a UUID then it is safer … but we're not talking UUID we're talking partition type label
[12:16] <peetaur2> just never ever use sdb, use only unique identifiers...this is just a general rule; assuming it'll stay the same just because it always has is very wrong
[12:16] <thoht> sure
[12:16] <FidoNet> never say never … we've had engineers (Cisco/etc) plugging phones in to usb ports to recharge them … with disastrous consequences
[12:16] <peetaur2> :)
[12:17] <thoht> i don't have access to datacenter so should be fine
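A hedged sketch of the stable-naming approach peetaur2 recommends instead of /dev/sdX letters; the PARTUUID below is a placeholder, not a value from this cluster:

    blkid /dev/sdb3                    # note the PARTUUID of the journal partition
    ls -l /dev/disk/by-partuuid/       # the same partitions, listed by persistent identifier

    # ceph.conf, per-OSD section (placeholder value):
    [osd.0]
    osd journal = /dev/disk/by-partuuid/<journal-partuuid>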
[12:19] * kefu (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[12:20] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) Quit (Quit: Leaving.)
[12:24] * bauruine (~bauruine@2a01:4f8:130:8285:fefe::36) Quit (Quit: ZNC - http://znc.in)
[12:25] * bauruine (~bauruine@2a01:4f8:130:8285:fefe::36) has joined #ceph
[12:36] * overclk (~quassel@2400:6180:100:d0::54:1) Quit (Remote host closed the connection)
[12:37] * overclk (~quassel@2400:6180:100:d0::54:1) has joined #ceph
[12:38] * kefu (~kefu@114.92.125.128) has joined #ceph
[12:39] * rendar (~I@host192-183-dynamic.49-79-r.retail.telecomitalia.it) has joined #ceph
[12:39] * [0x4A6F]_ (~ident@p4FC271D2.dip0.t-ipconnect.de) has joined #ceph
[12:39] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[12:41] * [0x4A6F] (~ident@0x4a6f.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:41] * [0x4A6F]_ is now known as [0x4A6F]
[12:42] * analbeard (~shw@support.memset.com) Quit (Ping timeout: 480 seconds)
[12:43] * egi (~egi@83.220.236.101) has joined #ceph
[12:44] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[12:46] * wkennington (~wak@0001bde8.user.oftc.net) Quit (Quit: Leaving)
[12:51] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[12:53] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[12:53] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[12:54] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Ping timeout: 480 seconds)
[12:59] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[13:01] * KindOne (kindone@0001a7db.user.oftc.net) has joined #ceph
[13:01] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[13:05] * i_m (~i_m@nat-5-carp.hcn-strela.ru) has joined #ceph
[13:06] * egi (~egi@83.220.236.101) Quit (Ping timeout: 480 seconds)
[13:06] * i_m (~i_m@nat-5-carp.hcn-strela.ru) Quit ()
[13:06] * i_m (~i_m@nat-5-carp.hcn-strela.ru) has joined #ceph
[13:11] * analbeard (~shw@support.memset.com) has joined #ceph
[13:14] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[13:15] * kefu (~kefu@114.92.125.128) has joined #ceph
[13:22] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[13:22] * wjw-freebsd (~wjw@31.137.151.137) has joined #ceph
[13:25] * wjw-freebsd2 (~wjw@31.137.49.81) Quit (Ping timeout: 480 seconds)
[13:28] * bene3 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[13:29] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Read error: Connection reset by peer)
[13:30] * wjw-freebsd (~wjw@31.137.151.137) Quit (Ping timeout: 480 seconds)
[13:38] * wjw-freebsd (~wjw@109.32.10.189) has joined #ceph
[13:42] <Be-El> FidoNet: any luck fixing the mds startup problem?
[13:43] <FidoNet> nope … I increased the timeouts .. but it's still not happy
[13:43] <FidoNet> 2016-10-10 12:42:55.767285 7f4af8b09700 1 heartbeat_map is_healthy 'MDSRank' had timed out after 360
[13:43] <FidoNet> https://pastebin.com/441FwUaP
[13:46] * natarej (~natarej@101.188.54.14) Quit (Read error: Connection reset by peer)
[13:52] <Be-El> i'm not sure which daemon is involved in marking the mds as unavailable. it might be the mds itself (e.g. a watchdog thread) or the mons
[13:54] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[13:54] <Be-El> mds_beacon_grace is already set to 360?
[14:00] * wiebalck_ (~wiebalck@pb-d-128-141-7-126.cern.ch) has joined #ceph
[14:01] <FidoNet> Yes I increased it from 60 to 360
[14:01] <FidoNet> as it was timing out at 60
[14:02] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[14:05] * vimal (~vikumar@121.244.87.116) Quit (Ping timeout: 480 seconds)
[14:05] * madkiss2 (~madkiss@2a02:8109:8680:2000:c0e7:d8f1:4e38:522f) Quit (Quit: Leaving.)
[14:06] * sleinen1 (~Adium@2001:620:0:69::101) Quit (Ping timeout: 480 seconds)
[14:10] * salwasser (~Adium@2601:197:101:5cc1:6d44:f771:b878:3371) has joined #ceph
[14:11] * vimal (~vikumar@121.244.87.116) has joined #ceph
[14:26] * wjw-freebsd (~wjw@109.32.10.189) Quit (Ping timeout: 480 seconds)
[14:33] * wiebalck_ (~wiebalck@pb-d-128-141-7-126.cern.ch) Quit (Quit: wiebalck_)
[14:40] * salwasser (~Adium@2601:197:101:5cc1:6d44:f771:b878:3371) Quit (Quit: Leaving.)
[14:40] * salwasser (~Adium@2601:197:101:5cc1:6d44:f771:b878:3371) has joined #ceph
[14:40] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[14:40] * salwasser (~Adium@2601:197:101:5cc1:6d44:f771:b878:3371) Quit ()
[14:40] * salwasser (~Adium@2601:197:101:5cc1:6d44:f771:b878:3371) has joined #ceph
[14:44] * KindOne (kindone@0001a7db.user.oftc.net) Quit (Remote host closed the connection)
[14:44] * salwasser (~Adium@2601:197:101:5cc1:6d44:f771:b878:3371) Quit ()
[14:44] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) has joined #ceph
[14:48] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[14:51] * sleinen (~Adium@macsl.switch.ch) has joined #ceph
[14:55] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[15:01] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) has joined #ceph
[15:02] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) Quit (Remote host closed the connection)
[15:04] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[15:12] <ahadi> Hi guys, another question regarding our Ceph setup. Do you think it's a very bad idea to have OS (very small partition) and SSD on one disk? I mean the OS is loaded in RAM at some point and won't take much IO, or am I wrong here?
[15:14] * dugravot6 (~dugravot6@nat1-eduroam-plg.wifi.univ-lorraine.fr) has joined #ceph
[15:16] <T1w> ahadi: if you by "SSD" mean 1 or more OSD journals on a SSD drive, then it depends on how many OSDs that particular SSD is journal for
[15:17] <T1w> by rule of thumb, a single SSD should not be journal for more than 8-10 OSDs - but it varies quite a lot on the workload
[15:17] * KindOne (kindone@0001a7db.user.oftc.net) has joined #ceph
[15:17] * vimal (~vikumar@121.244.87.116) Quit (Remote host closed the connection)
[15:17] <valeech> Good morning - I have a ceph cluster that has 3 nodes. 2 of the nodes contain 4 OSDs each for a cluster total of 8 OSDs. The third node simply participates for quorum. Whenever a single OSD fails, the entire cluster appears to become unavailable. If I restart the failed OSD, after rebuild everything works great. Is there a config I am missing regarding the crushmap in this type of setup? Why would a single OSD cause the entire cluster
[15:17] <valeech> to stop working?
[15:18] * dugravot61 (~dugravot6@nat1-eduroam-plg.wifi.univ-lorraine.fr) has joined #ceph
[15:18] * dugravot6 (~dugravot6@nat1-eduroam-plg.wifi.univ-lorraine.fr) Quit ()
[15:19] <ahadi> T1w: One OSD drive is 4TB. We planned to have 4 OSDs per Server with actually two 480 GB SSDs for journal (and OS on small partition).
[15:19] <T1w> what make and model SSDs?
[15:20] * vimal (~vikumar@121.244.87.116) has joined #ceph
[15:21] <peetaur2> ahadi: better to waste 0.2 IOPS on your SSD all day than to waste a whole disk bay on an idle disk
[15:21] * xinli (~charleyst@32.97.110.57) has joined #ceph
[15:21] <ahadi> T1w: SAMSUNG MZ7LM480
[15:22] <ahadi> peetaur2: okay great, thank you
[15:22] <T1w> ahadi: is that from their datacenter line?
[15:22] <ahadi> Yes, so our hoster tells us at least
[15:22] <T1w> or is it a desktop class drive?
[15:22] * T1w looks for himself
[15:24] <T1w> thats the older PM863 model
[15:24] <T1w> afair (and from looking at the numbers on https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ for that model) that seems fine
[15:25] <T1w> OS and 4 OSDs should not be problematic at all for that drive
[15:25] <ahadi> T1w: great. But I remember talking here yesterday evening about Journal and RAID1. I would really like to see my OS in a RAID. Don't you agree?
[15:25] <T1w> 4 OSD journals even
[15:26] <T1w> ahadi: that is the setup I'm running, yes
[15:26] <ahadi> I mean even if Ceph is build for that, I don't want to install the node just because the SSD fails..
[15:26] <ahadi> okay great
[15:26] <T1w> I've got a pair of Intel S3710's for OS and OSD journals
[15:26] <Be-El> valeech: what's the value of the min_size setting for your pools?
[15:26] <peetaur2> ahadi: and 2 SSDs for only 4 OSDs sounds overkill... check util% in iostat -xm and see that it never hits heavy load
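A quick way to run the check peetaur2 suggests on the journal SSD (sdX is a placeholder for the SSD device node):

    iostat -xm 5 /dev/sdX    # watch %util and wMB/s under normal write load; sustained ~100% util means the SSD is the bottleneck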
[15:26] <T1w> setup with software raid
[15:27] <T1w> peetaur2: perhaps the size is a bit overkill, but without OSD journal performance would be abysmal
[15:27] <ahadi> Because we should deactivate the Hardware RAID. HBA mode
[15:27] <ahadi> I remember
[15:27] <T1w> yes - avoid hardware raid where possible
[15:27] <peetaur2> I was planning for 3 osds per ssd... and bought 2 so far, but these ssds are faster than I expected... I was planning on buying some PCIe NVMe things later, but maybe I won't have to (12 3.5" disk chassis + 2 2.5")
[15:28] <valeech> Be-El: osd pool default min size = 1
[15:28] <peetaur2> T1w: yes of course use a separate journal with filestore osds (maybe won't be necessary with bluestore in the future), but just 2 osds per ssd is probably too few
[15:28] <T1w> if you use rotating rust (aka SATA disks) for OSD data disks you should use SSD based journals - if the OSD data drive is a SSD then you can remove any journals
[15:28] <valeech> Be-El: I only have one pool
[15:28] <peetaur2> but somoene said 8-10 here earlier, which I find to be large from what I read (but didn't test on any big workloads, and varies much by hardware)
[15:29] <T1w> peetaur2: I've got some nodes with just 2 OSDs for 2 S3710s in a mirror.. :)
[15:29] <T1w> yes it's overkill
[15:29] * shaunm (~shaunm@cpe-192-180-17-174.kya.res.rr.com) has joined #ceph
[15:29] <peetaur2> speaking of journals... what do you guys think about size?
[15:29] <T1w> but IMO its better than to loose an entire node if the SSD dies
[15:29] <peetaur2> docs just say start 1GB and raise from there...but based on what testing?
[15:30] <T1w> as with most ceph-related settings it depends.. :)
[15:30] <T1w> 5GB is the default size in hammer
[15:30] <peetaur2> and also they say take some interval (default 5s) and multiply by bandwidth to get size (which on 10Gbps would be 6.25GB I guess)
[15:30] <T1w> it depends on the workload and how often the journal is flushed to the data disk
[15:31] <T1w> yes
[15:31] <T1w> exactly
[15:31] <peetaur2> hmm I wonder if that would be double if you have 10Gbps public and cluster networks
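For reference, the sizing rule of thumb in the Ceph docs is journal size = 2 x (expected throughput x filestore max sync interval). A rough worked example for the 10 Gbit/s case peetaur2 mentions, assuming the default 5 s sync interval:

    # 10 Gbit/s ~= 1250 MB/s; filestore max sync interval defaults to 5 s
    echo $(( 2 * 1250 * 5 ))    # -> 12500 MB, i.e. ~12.5 GB; the 6.25 GB figure above is the same number without the 2x headroom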
[15:32] <T1w> there are timers that control how often the journal is flushed to the backend data disks
[15:32] <peetaur2> 2 nodes with public load of 10 each plus replicating to the other means each node needs 20 Gbps
[15:32] <T1w> if your journal fills and you cannot flush fast enough you've got a problem
[15:32] * dugravot6 (~dugravot6@nat1-eduroam-plg.wifi.univ-lorraine.fr) has joined #ceph
[15:32] * dugravot61 (~dugravot6@nat1-eduroam-plg.wifi.univ-lorraine.fr) Quit (Quit: Leaving.)
[15:32] <peetaur2> not a problem... a bottleneck
[15:32] <T1w> best solution for that is more OSDs, so the load is spread across more spindles
[15:33] <T1w> no, it's a problem
[15:33] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[15:33] <peetaur2> if you had a problem, what do you do? buy more osd disks for the ssd...then the ssd is too slow, so buy more, then your osd disks are too slow again, etc. until network is the bottleneck
[15:33] * wiebalck_ (~wiebalck@pb-d-128-141-7-126.cern.ch) has joined #ceph
[15:33] <peetaur2> then buy more NICs too? ;)
[15:33] <T1w> if the journals fills your OSD won't accept new data, so the cluster grinds to a halt as the data is written to other OSDs that get a higher load etc etc..
[15:34] <peetaur2> uh..why would it go to other OSDs? shouldn't it stay in that same placement group?
[15:34] <Be-El> valeech: what was the ceph status at the time the cluster was not available?
[15:34] <peetaur2> so how do you test if your journal is filling up?
[15:34] <T1w> given a large enough cluster, then more interfaces and link aggregation using LACP with a correct hashing algorithm would be a solution yes
[15:35] <peetaur2> how does one set up the right hash algo? I played with that a while back and it seemed crazy and broken
[15:35] <T1w> peetaur2: it's esentially the same as the OSD not beeing available
[15:35] * derjohn_mob (~aj@46.189.28.79) Quit (Remote host closed the connection)
[15:35] <valeech> Be-El: http://pastebin.com/Y1KsARWM
[15:35] <peetaur2> like in theory the default hash algo should use both links full speed if you have many connections from many hosts (but could use just 1 from one host, or one connection, or 2 hosts with coinicidental hash collision, etc.) ...but it never went full
[15:35] <Be-El> you can monitor the amount of data in the journal using collectd. one of the performance counters contains the number of bytes in the journal
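If collectd is not wired up yet, the same counters can be spot-checked by hand on the OSD host (counter names may differ slightly between releases):

    ceph daemon osd.0 perf dump | grep -i journal    # e.g. journal_queue_bytes shows data currently queued for the journal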
[15:35] <T1w> given a large enough cluster you can has based on IP and port - that should distribute load across all bonded interfaces
[15:36] <T1w> you can hash even
[15:36] <T1w> but that assumes that load is evenly spread
[15:36] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[15:36] <Be-El> in case of bonding use layer3+4 hashing, otherwise you might get into problems if data has to be sent via a gateways
[15:36] <T1w> and from multiple clients
[15:36] <Be-El> -s
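A sketch of the layer3+4 bonding Be-El recommends, in Debian/Ubuntu /etc/network/interfaces style; interface names, address, and netmask are placeholders:

    auto bond0
    iface bond0 inet static
        bond-slaves eth0 eth1
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        bond-miimon 100
        address <cluster-or-public-network-address>
        netmask <netmask>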
[15:37] <valeech> Be-El: FYI - the cluster and public network are 10G networks.
[15:37] <T1w> afk..
[15:37] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[15:38] * mattbenjamin (~mbenjamin@12.118.3.106) has joined #ceph
[15:39] * dugravot6 (~dugravot6@nat1-eduroam-plg.wifi.univ-lorraine.fr) Quit (Quit: Leaving.)
[15:39] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[15:40] <Be-El> valeech: was the cluster stuck in that state over several minutes?
[15:41] <valeech> Be-El: Yes, it was. It didn't seem to be doing any type of recovery.
[15:42] <Be-El> valeech: and the cluster recovered after you restarted the osd?
[15:43] * derjohn_mob (~aj@46.189.28.54) has joined #ceph
[15:43] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[15:43] <valeech> Be-El: I only watched it for a few minutes because I needed to restore services. Yes, everything came back once the OSD was restarted.
[15:44] <Be-El> valeech: the 'peering' state indicates that the osds tried to sync each other to prepare the backfilling of pending pgs
[15:44] <Be-El> valeech: but for some unknown reason the peering did not finish
[15:45] <Be-El> valeech: can you upload the output of 'ceph osd tree' to some pastebin?
[15:46] * Drezil1 (~rhonabwy@213.61.149.100) has joined #ceph
[15:46] <valeech> Be-El: http://pastebin.com/qBbUcrWh
[15:47] * b0e (~aledermue@213.95.25.82) has joined #ceph
[15:48] <Be-El> valeech: which ceph release do you use?
[15:48] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) has joined #ceph
[15:48] <valeech> Be-El: hammer
[15:48] * dmanchado (~dmanchad@nat-pool-bos-t.redhat.com) has joined #ceph
[15:48] * dmanchado (~dmanchad@nat-pool-bos-t.redhat.com) Quit (Remote host closed the connection)
[15:48] <valeech> Be-El: ceph version 0.94.7
[15:48] <Be-El> valeech: and you use the default crush ruleset (distribution accross hosts) with size=2 and min_size=1 ?
[15:50] <valeech> Be-El: Yes, all defaults. The only thing changed is that I set the cluster and public networks.
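A quick way to confirm the replication settings and rule being discussed here (pool name is a placeholder; crush_ruleset is the hammer-era key):

    ceph osd pool get <pool> size
    ceph osd pool get <pool> min_size
    ceph osd pool get <pool> crush_ruleset
    ceph osd crush rule dump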
[15:50] * mykola (~Mikolaj@91.245.74.58) has joined #ceph
[15:54] * xinli (~charleyst@32.97.110.57) Quit (Remote host closed the connection)
[15:54] <Be-El> valeech: i've seen this problem in the past with similar setup (small number of hosts). the crush algorithm tries to select two different hosts for the two replicates. if this selection fails (== the same host in all attempts), the mapping of the pg fails. the 'choose_total_tries' crush setting might need to be increased to allow more attempts
[15:55] * xinli (~charleyst@32.97.110.57) has joined #ceph
[15:55] <Be-El> valeech: you can validate this by trying to reproduce the problem and dumping the pg state of one of the affected pgs
[15:56] <valeech> Be-El: Good to know! I will lab this up and see what I can see.
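A sketch of inspecting and raising the tunable Be-El mentions, plus the PG query he suggests for the next occurrence; the value 100 is only an example:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    grep choose_total_tries crushmap.txt     # stock maps ship with "tunable choose_total_tries 50"
    # edit crushmap.txt (e.g. raise 50 -> 100), then recompile and inject:
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new
    ceph pg <pgid> query                     # dump a stuck PG's state to confirm the failed mapping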
[15:57] * jermudgeon (~jermudgeo@southend.mdu.whitestone.link) has joined #ceph
[15:59] * jarrpa (~jarrpa@63.225.131.166) has joined #ceph
[16:00] * mtanski (~mtanski@ool-182dce0f.dyn.optonline.net) has joined #ceph
[16:00] * mtanski (~mtanski@ool-182dce0f.dyn.optonline.net) Quit ()
[16:00] * xarses_ (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:05] <valeech> Be-El: A little old but explains the phenomenon you describe. Especially if you consider the math in Greg's comment at the bottom. http://www.anchor.com.au/blog/2013/02/pulling-apart-cephs-crush-algorithm/
[16:06] * sleinen (~Adium@macsl.switch.ch) Quit (Quit: Leaving.)
[16:07] <valeech> Be-El: In your experience, is modifying the total retries an after hours change or just pull the ripcord? :)
[16:08] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) has joined #ceph
[16:09] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) Quit ()
[16:09] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) has joined #ceph
[16:11] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[16:12] <Be-El> valeech: good question, but the last time i've run into this problem is way too long ago
[16:12] <valeech> Be-El: haha, fair enough. Thanks for the help!!
[16:14] * i_m (~i_m@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[16:15] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Ping timeout: 480 seconds)
[16:16] * Drezil1 (~rhonabwy@213.61.149.100) Quit ()
[16:17] * derjohn_mob (~aj@46.189.28.54) Quit (Remote host closed the connection)
[16:19] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) Quit (Ping timeout: 480 seconds)
[16:20] * derjohn_mob (~aj@46.189.28.79) has joined #ceph
[16:20] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[16:29] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[16:29] * nardial (~ls@p5DC07ED6.dip0.t-ipconnect.de) has joined #ceph
[16:29] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) has joined #ceph
[16:29] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[16:31] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[16:31] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Remote host closed the connection)
[16:31] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Quit: Leaving.)
[16:32] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[16:34] * xarses_ (~xarses@64.124.158.3) has joined #ceph
[16:35] * wiebalck_ (~wiebalck@pb-d-128-141-7-126.cern.ch) Quit (Quit: wiebalck_)
[16:36] * markl (~mark@knm.org) has joined #ceph
[16:37] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[16:38] * dneary (~dneary@nat-pool-bos-u.redhat.com) Quit (Quit: Ex-Chat)
[16:39] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[16:40] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[16:40] * ade (~abradshaw@pool-22.254.176.62.dynamic.wobline-ip.de) Quit (Ping timeout: 480 seconds)
[16:41] * kristen (~kristen@134.134.139.76) has joined #ceph
[16:43] * salwasser (~Adium@73.114.27.94) has joined #ceph
[16:44] * salwasser1 (~Adium@2601:197:101:5cc1:20ff:a89:8d79:a14e) has joined #ceph
[16:44] * kefu (~kefu@114.92.125.128) Quit (Remote host closed the connection)
[16:44] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[16:46] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) Quit (Quit: Leaving.)
[16:47] * kefu (~kefu@114.92.125.128) has joined #ceph
[16:48] * salwasser1 (~Adium@2601:197:101:5cc1:20ff:a89:8d79:a14e) Quit ()
[16:50] * salwasser1 (~Adium@c-73-219-86-22.hsd1.ma.comcast.net) has joined #ceph
[16:51] * salwasser (~Adium@73.114.27.94) Quit (Ping timeout: 480 seconds)
[16:53] <bene3> has anyone run into problems caused by not disabling transparent hugepages? if so, what?
[16:54] <ledgr> How many of you are using deadline IO scheduler for Ceph cluster?
[16:55] <ledgr> Do you find it useful performance-wise? Because it does perform better for me.
[16:57] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[17:00] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Quit: Leaving.)
[17:00] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[17:01] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit ()
[17:02] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Ping timeout: 480 seconds)
[17:02] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Remote host closed the connection)
[17:03] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[17:04] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[17:05] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[17:07] * jowilkin (~jowilkin@184-23-213-254.fiber.dynamic.sonic.net) has joined #ceph
[17:09] * rraja (~rraja@125.16.34.66) Quit (Quit: Leaving)
[17:11] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Ping timeout: 480 seconds)
[17:12] * Ivan1 (~ipencak@213.151.95.130) Quit (Quit: Leaving.)
[17:12] * mgolub (~Mikolaj@91.245.79.77) has joined #ceph
[17:14] * yanzheng1 (~zhyan@125.70.23.12) Quit (Quit: This computer has gone to sleep)
[17:15] * Concubidated (~cube@h4.246.129.40.static.ip.windstream.net) Quit (Quit: Leaving.)
[17:17] <btaylor> if i'm not using cephfs on my cluster, i don't need to install ceph-mds right?
[17:17] <btaylor> and if i'm intending on attaching qemu processes to ceph block storage, i would just need ceph-common on the client/compute host?
[17:17] * mykola (~Mikolaj@91.245.74.58) Quit (Ping timeout: 480 seconds)
[17:19] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[17:20] <boolman> btaylor: yes mds is only for cephfs
[17:21] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[17:22] <boolman> and yes, ceph-common should do the trick
[17:28] * efirs (~firs@98.207.153.155) has joined #ceph
[17:29] <btaylor> wondering why i'm getting "common/ceph_crypto.cc: 77: FAILED assert(crypto_context != __null)" when i try to attach a RBD using HMP in qemu
[17:29] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[17:30] <btaylor> tacking the rbd:thing/foo right to the qemu command line at start seems to work fine
[17:31] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[17:31] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[17:37] * vbellur (~vijay@71.234.224.255) Quit (Ping timeout: 480 seconds)
[17:37] * vimal (~vikumar@121.244.87.116) Quit (Quit: Leaving)
[17:43] * doppelgrau (~doppelgra@132.252.235.172) Quit (Quit: Leaving.)
[17:44] <TMM> btaylor, the hmp format is different than the command line format
[17:44] <TMM> btaylor, you need to specify your cephx key
[17:44] <btaylor> i'm just using admin...
[17:45] <btaylor> thought it would default to that
[17:45] <TMM> it will, if you use rbd:foo/foo from the cli
[17:45] <btaylor> ok
[17:45] <btaylor> i'll look at the options again
[17:45] <TMM> but through hmp you need to actually do the full path with keys and mons
[17:46] <btaylor> hmm
[17:47] <TMM> btaylor, something like this: rbd:rbd/volume:id=admin:key=RG8gSSBsb29rIHN0dXBpZCB0byB5b3U/Cg==:auth_supported=cephx\;none:mon_host=10.70.2.2\:6789\;10.70.2.3\:6789\;10.70.2.4\:6789
[17:47] <btaylor> ah ok
[17:47] <btaylor> thought it would pull from the /etc/ceph/ceph.conf
[17:48] <TMM> it will, when using the qemu cli :)
[17:48] * nardial (~ls@p5DC07ED6.dip0.t-ipconnect.de) Quit (Quit: Leaving)
[17:48] <btaylor> ok
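For comparison, a hedged sketch of the CLI form TMM refers to, which reads the cluster config directly; pool and image names are placeholders, and the trailing options are only one common choice:

    qemu-system-x86_64 ... \
        -drive file=rbd:rbd/volume:id=admin:conf=/etc/ceph/ceph.conf,format=raw,if=virtio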
[17:49] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[17:56] * TMM (~hp@185.5.121.201) Quit (Quit: Ex-Chat)
[17:56] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) has joined #ceph
[17:57] * Concubidated (~cube@68.140.239.164) has joined #ceph
[17:58] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[18:00] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[18:01] * Jeffrey4l (~Jeffrey@110.252.73.52) Quit (Ping timeout: 480 seconds)
[18:04] * dmanchado (~dmanchad@nat-pool-bos-t.redhat.com) has joined #ceph
[18:05] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[18:13] * dugravot61 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) has joined #ceph
[18:13] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) Quit (Read error: Connection reset by peer)
[18:14] * sage (~quassel@2607:f298:5:101d:f816:3eff:fe21:1966) Quit (Quit: No Ping reply in 180 seconds.)
[18:14] * sage (~quassel@2607:f298:5:101d:f816:3eff:fe21:1966) has joined #ceph
[18:14] * dmanchado (~dmanchad@nat-pool-bos-t.redhat.com) has left #ceph
[18:14] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) Quit (Quit: Leaving.)
[18:18] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[18:19] * nardial (~ls@p5DC07ED6.dip0.t-ipconnect.de) has joined #ceph
[18:23] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[18:26] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[18:27] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) has joined #ceph
[18:28] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[18:31] * bene3 (~bene@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[18:32] * dgurtner (~dgurtner@94.126.212.170) Quit (Ping timeout: 480 seconds)
[18:34] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[18:36] * Unai (~Adium@50-115-70-150.static-ip.telepacific.net) has joined #ceph
[18:42] * davidzlap (~Adium@2605:e000:1313:8003:d512:b4b9:9e04:e792) has joined #ceph
[18:44] * dgtlcmo (~dgtlcmo@cpe-74-73-129-35.nyc.res.rr.com) Quit (Quit: leaving)
[18:49] * DanFoster (~Daniel@2a00:1ee0:3:1337:418d:47f1:e6a7:15f6) Quit (Quit: Leaving)
[19:04] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[19:07] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[19:08] * kefu (~kefu@li401-71.members.linode.com) has joined #ceph
[19:09] <btaylor> TMM: still getting the same errors :(
[19:09] * ircolle (~Adium@2601:285:201:633a:20a5:7715:7f11:49bb) has joined #ceph
[19:10] <btaylor> TMM: oh, removed -chroot.. from the command, now it's working…
[19:14] * jowilkin (~jowilkin@184-23-213-254.fiber.dynamic.sonic.net) Quit (Remote host closed the connection)
[19:15] * jowilkin (~jowilkin@184-23-213-254.fiber.dynamic.sonic.net) has joined #ceph
[19:16] * ledgr (~ledgr@88-222-11-185.meganet.lt) has joined #ceph
[19:19] * kefu_ (~kefu@114.92.125.128) has joined #ceph
[19:24] * kefu_ (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[19:24] * kefu (~kefu@li401-71.members.linode.com) Quit (Ping timeout: 480 seconds)
[19:25] * kefu (~kefu@114.92.125.128) has joined #ceph
[19:40] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[19:50] * kefu (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[19:55] * dscastro (~dscastro@181.166.94.84) has joined #ceph
[19:56] * jowilkin (~jowilkin@184-23-213-254.fiber.dynamic.sonic.net) Quit (Remote host closed the connection)
[19:57] * jowilkin (~jowilkin@184-23-213-254.fiber.dynamic.sonic.net) has joined #ceph
[19:58] * rwheeler (~rwheeler@pool-108-7-196-31.bstnma.fios.verizon.net) has joined #ceph
[20:00] * tessier (~treed@wsip-98-171-210-130.sd.sd.cox.net) has joined #ceph
[20:06] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[20:07] * tessier_ (~treed@wsip-98-171-210-130.sd.sd.cox.net) Quit (Ping timeout: 480 seconds)
[20:09] * bitserker (~toni@2.152.12.64.dyn.user.ono.com) Quit (Quit: Leaving.)
[20:12] * nardial (~ls@p5DC07ED6.dip0.t-ipconnect.de) Quit (Quit: Leaving)
[20:13] * derjohn_mob (~aj@46.189.28.79) Quit (Ping timeout: 480 seconds)
[20:15] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[20:16] * haplo37 (~haplo37@107-190-42-94.cpe.teksavvy.com) has joined #ceph
[20:18] * wiebalck_ (~wiebalck@AAnnecy-653-1-50-224.w90-41.abo.wanadoo.fr) has joined #ceph
[20:21] * ledgr (~ledgr@88-222-11-185.meganet.lt) Quit (Remote host closed the connection)
[20:23] * xinli (~charleyst@32.97.110.57) Quit (Ping timeout: 480 seconds)
[20:28] * ledgr (~ledgr@88-222-11-185.meganet.lt) has joined #ceph
[20:28] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[20:29] * wak-work (~wak-work@2620:15c:2c5:3:f0a4:c5dc:c888:f520) Quit (Ping timeout: 480 seconds)
[20:32] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[20:35] * rendar (~I@host192-183-dynamic.49-79-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[20:37] * wak-work (~wak-work@2620:15c:2c5:3:d5e:6789:2141:5d46) has joined #ceph
[20:40] * shaunm (~shaunm@cpe-192-180-17-174.kya.res.rr.com) Quit (Ping timeout: 480 seconds)
[20:45] * Discovery (~Discovery@109.235.52.4) has joined #ceph
[20:52] * TomasCZ (~TomasCZ@yes.tenlab.net) has joined #ceph
[20:53] * branto (~branto@transit-86-181-132-209.redhat.com) Quit (Quit: ZNC 1.6.3 - http://znc.in)
[20:55] * masterom1 (~ivan@93-142-237-67.adsl.net.t-com.hr) has joined #ceph
[21:01] * rendar (~I@host192-183-dynamic.49-79-r.retail.telecomitalia.it) has joined #ceph
[21:01] * masteroman (~ivan@93-142-35-27.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[21:05] * ledgr (~ledgr@88-222-11-185.meganet.lt) Quit (Remote host closed the connection)
[21:06] * bjozet (~bjozet@82-183-17-144.customers.ownit.se) has joined #ceph
[21:08] * masterom1 (~ivan@93-142-237-67.adsl.net.t-com.hr) Quit (Quit: WeeChat 1.5)
[21:10] * salwasser1 (~Adium@c-73-219-86-22.hsd1.ma.comcast.net) Quit (Quit: Leaving.)
[21:11] * wiebalck_ (~wiebalck@AAnnecy-653-1-50-224.w90-41.abo.wanadoo.fr) Quit (Quit: wiebalck_)
[21:11] * salwasser (~Adium@c-73-219-86-22.hsd1.ma.comcast.net) has joined #ceph
[21:14] * haplo37 (~haplo37@107-190-42-94.cpe.teksavvy.com) Quit (Ping timeout: 480 seconds)
[21:17] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[21:20] * salwasser (~Adium@c-73-219-86-22.hsd1.ma.comcast.net) Quit (Quit: Leaving.)
[21:25] * wiebalck_ (~wiebalck@AAnnecy-653-1-50-224.w90-41.abo.wanadoo.fr) has joined #ceph
[21:28] * linuxkidd (~linuxkidd@ip70-189-214-97.lv.lv.cox.net) Quit (Remote host closed the connection)
[21:37] * xinli (~charleyst@32.97.110.54) has joined #ceph
[21:38] * linuxkidd (~linuxkidd@ip70-189-214-97.lv.lv.cox.net) has joined #ceph
[21:51] * linuxkidd (~linuxkidd@ip70-189-214-97.lv.lv.cox.net) Quit (Ping timeout: 480 seconds)
[21:52] * linuxkidd (~linuxkidd@ip70-189-214-97.lv.lv.cox.net) has joined #ceph
[21:52] * mgolub (~Mikolaj@91.245.79.77) Quit (Quit: away)
[21:54] * wiebalck_ (~wiebalck@AAnnecy-653-1-50-224.w90-41.abo.wanadoo.fr) Quit (Quit: wiebalck_)
[21:54] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[21:55] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[22:02] * vbellur (~vijay@71.234.224.255) has joined #ceph
[22:04] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[22:06] * hbogert (~Adium@ip54541f88.adsl-surfen.hetnet.nl) has joined #ceph
[22:08] <hbogert> I'm trying to deactivate an osd, by issuing:
[22:08] <hbogert> sudo ceph-disk deactivate /dev/sdc1
[22:08] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[22:08] <hbogert> but it fails with: ValueError: No JSON object could be decoded
[22:09] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) Quit (Ping timeout: 480 seconds)
[22:10] <hbogert> full stack trace is in: http://pastebin.com/uwAmfRXV
[22:13] * davidzlap (~Adium@2605:e000:1313:8003:d512:b4b9:9e04:e792) Quit (Quit: Leaving.)
[22:23] <hbogert> ok nvm, node didn't have proper keyrings. The error reporting could've been better though
[22:24] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) has joined #ceph
[22:26] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Ping timeout: 480 seconds)
[22:31] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[22:36] * xinli (~charleyst@32.97.110.54) Quit ()
[22:37] * davidzlap (~Adium@2605:e000:1313:8003:d512:b4b9:9e04:e792) has joined #ceph
[22:42] * wjw-freebsd2 (~wjw@smtp.digiware.nl) has joined #ceph
[22:47] * rendar (~I@host192-183-dynamic.49-79-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[22:48] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[22:50] * xinli (~charleyst@32.97.110.52) has joined #ceph
[22:55] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[22:57] * kristen (~kristen@134.134.139.76) Quit (Remote host closed the connection)
[22:59] * atod (~atod@cpe-74-73-129-35.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[22:59] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[23:06] * bjozet (~bjozet@82-183-17-144.customers.ownit.se) Quit (Ping timeout: 480 seconds)
[23:08] * TMM (~hp@dhcp-077-248-009-229.chello.nl) has joined #ceph
[23:08] * kristen (~kristen@134.134.139.82) has joined #ceph
[23:09] * Discovery (~Discovery@109.235.52.4) Quit (Read error: Connection reset by peer)
[23:11] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) has joined #ceph
[23:12] * newdave_ (~newdave@36-209-181-180.cpe.skymesh.net.au) has joined #ceph
[23:12] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) Quit (Read error: Connection reset by peer)
[23:18] * hoonetorg (~hoonetorg@77.119.226.254) Quit (Ping timeout: 480 seconds)
[23:23] <s3an2> During an upgrade from 10.2.2 to 10.2.3 the active MDS server was restarted and it got stuck in the state 'reconnect_clients' the only way I could recover from this was to hard reboot clients(kernel 4.7), anyone seen anything like this before? anything I should look at other than mds logs?
[23:24] * s3an2 (~root@korn.s3an.me.uk) Quit (Remote host closed the connection)
[23:24] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Ping timeout: 480 seconds)
[23:25] * s3an2 (~root@korn.s3an.me.uk) has joined #ceph
[23:27] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)
[23:27] * hoonetorg (~hoonetorg@fh.fh-joanneum.at) has joined #ceph
[23:28] * xinli (~charleyst@32.97.110.52) Quit (Ping timeout: 480 seconds)
[23:29] <s3an2> During an upgrade from 10.2.2. to 10.2.3 the active MDS was restarted and got stuck in the state 'reconnect_clients' - I managed to clear this only by hard rebooting the clients, anyone had anything like this before?
[23:30] * xinli (~charleyst@32.97.110.55) has joined #ceph
[23:31] * mattbenjamin (~mbenjamin@12.118.3.106) Quit (Quit: Leaving.)
[23:32] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) Quit (Remote host closed the connection)
[23:36] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[23:40] * hoonetorg (~hoonetorg@fh.fh-joanneum.at) Quit (Ping timeout: 480 seconds)
[23:45] * ntpttr_ (~ntpttr@134.134.139.70) has joined #ceph
[23:48] * gucore (~fridim@56-198-190-109.dsl.ovh.fr) Quit (Ping timeout: 480 seconds)
[23:51] * hoonetorg (~hoonetorg@fh.fh-joanneum.at) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.