#ceph IRC Log


IRC Log for 2016-08-03

Timestamps are in GMT/BST.

[0:00] <doppelgrau> since the disk did not get mounted, errors with the udev-rules?
[0:07] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[0:08] * vbellur (~vijay@c-67-189-170-219.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[0:14] * pdrakeweb (~pdrakeweb@cpe-71-74-153-111.neo.res.rr.com) has joined #ceph
[0:14] * [0x4A6F]_ (~ident@p4FC2762D.dip0.t-ipconnect.de) has joined #ceph
[0:17] * [0x4A6F] (~ident@0x4a6f.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:17] * [0x4A6F]_ is now known as [0x4A6F]
[0:18] * aNupoisc (~adnavare@192.55.55.41) Quit (Remote host closed the connection)
[0:19] * aNupoisc (~adnavare@192.55.55.41) has joined #ceph
[0:21] <doppelgrau> found it http://tracker.ceph.com/issues/16351
[0:22] * pdrakeweb (~pdrakeweb@cpe-71-74-153-111.neo.res.rr.com) Quit (Ping timeout: 480 seconds)
[0:22] <doppelgrau> but now I have two undersized PGs although all OSDs are up & in?
[0:26] * danieagle (~Daniel@201-69-181-193.dial-up.telesp.net.br) Quit (Quit: Obrigado por Tudo! :-) inte+ :-))
[0:27] * squizzi_ (~squizzi@nat-pool-rdu-t.redhat.com) Quit (Quit: bye)
[0:28] * dgurtner (~dgurtner@46.189.28.90) Quit (Ping timeout: 480 seconds)
[0:28] * Jebula (~Tonux@45.32.233.86) has joined #ceph
[0:29] * rendar (~I@host35-23-dynamic.247-95-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[0:29] * lcurtis_ (~lcurtis@47.19.105.250) Quit (Remote host closed the connection)
[0:32] * fdmanana (~fdmanana@bl12-226-64.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[0:38] * vata (~vata@ARennes-652-1-70-186.w2-11.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[0:38] * T1w (~jens@node3.survey-it.dk) Quit (Remote host closed the connection)
[0:39] * babilen_ (~babilen@185.22.222.210) has joined #ceph
[0:39] * fsimonce (~simon@host203-44-dynamic.183-80-r.retail.telecomitalia.it) Quit (Quit: Coyote finally caught me)
[0:40] * babilen (~babilen@babilen.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:42] * EthanL (~lamberet@cce02cs4045-fa12-z.ams.hpecore.net) has joined #ceph
[0:44] * badone (~badone@66.187.239.16) has joined #ceph
[0:48] * kuku (~kuku@119.93.91.136) has joined #ceph
[0:50] * EthanL (~lamberet@cce02cs4045-fa12-z.ams.hpecore.net) Quit (Ping timeout: 480 seconds)
[0:54] * kuku (~kuku@119.93.91.136) Quit (Quit: computer sleep)
[0:55] * kuku (~kuku@119.93.91.136) has joined #ceph
[0:56] * jclm (~jclm@ip68-96-196-245.lv.lv.cox.net) has joined #ceph
[0:56] * jclm (~jclm@ip68-96-196-245.lv.lv.cox.net) Quit ()
[0:57] * cathode (~cathode@50.232.215.114) Quit (Quit: Leaving)
[0:58] * Jebula (~Tonux@9YSAAA2OF.tor-irc.dnsbl.oftc.net) Quit ()
[1:05] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[1:12] * xarses (~xarses@64.124.158.192) Quit (Ping timeout: 480 seconds)
[1:17] * Szernex (~Random@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[1:18] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[1:21] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[1:21] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[1:25] * theTrav (~theTrav@ipc032.ipc.telstra.net) has joined #ceph
[1:26] * theTrav (~theTrav@ipc032.ipc.telstra.net) Quit (Remote host closed the connection)
[1:27] * theTrav (~theTrav@203.35.9.142) has joined #ceph
[1:28] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[1:31] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[1:31] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[1:32] * johnavp1989 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) has joined #ceph
[1:32] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[1:32] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:40] * cyphase (~cyphase@c-50-148-131-137.hsd1.ca.comcast.net) has joined #ceph
[1:46] * Szernex (~Random@5AEAAAQDY.tor-irc.dnsbl.oftc.net) Quit ()
[1:53] * oms101 (~oms101@p20030057EA02F200C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[2:00] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[2:02] * oms101 (~oms101@p20030057EA02A100C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[2:05] * bb0x (~bb0x@78.97.194.150) Quit (Quit: This computer has gone to sleep)
[2:07] * blizzow (~jburns@50.243.148.102) Quit (Ping timeout: 480 seconds)
[2:09] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[2:32] * Helleshin (~nartholli@tor2r.ins.tor.net.eu.org) has joined #ceph
[2:32] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) has joined #ceph
[2:38] * EthanL (~lamberet@cce02cs4034-fa12-z.ams.hpecore.net) has joined #ceph
[2:44] * aNupoisc (~adnavare@192.55.55.41) Quit (Remote host closed the connection)
[2:46] * EthanL (~lamberet@cce02cs4034-fa12-z.ams.hpecore.net) Quit (Read error: Connection reset by peer)
[2:56] * wushudoin (~wushudoin@2601:646:8281:cfd:2ab2:bdff:fe0b:a6ee) Quit (Ping timeout: 480 seconds)
[2:57] * yanzheng (~zhyan@125.70.20.95) has joined #ceph
[3:01] * Helleshin (~nartholli@9YSAAA2QW.tor-irc.dnsbl.oftc.net) Quit ()
[3:03] * MentalRay (~MentalRay@107.171.161.165) has joined #ceph
[3:29] * Pulp (~Pulp@63-221-50-195.dyn.estpak.ee) Quit (Read error: Connection reset by peer)
[3:44] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[3:45] * reed (~reed@216.38.134.18) Quit (Ping timeout: 480 seconds)
[3:50] * vbellur (~vijay@71.234.224.255) has joined #ceph
[3:50] * dnunez (~dnunez@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[4:04] * dnunez (~dnunez@nat-pool-bos-u.redhat.com) has joined #ceph
[4:05] * b0e (~aledermue@213.95.25.82) has joined #ceph
[4:09] * jrowe_ (~jrowe@204.14.236.152) has joined #ceph
[4:09] * jrowe (~jrowe@204.14.236.152) Quit (Read error: Connection reset by peer)
[4:11] * georgem (~Adium@76-10-180-154.dsl.teksavvy.com) has joined #ceph
[4:12] * MentalRay (~MentalRay@107.171.161.165) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[4:12] * MentalRay (~MentalRay@107.171.161.165) has joined #ceph
[4:18] * tsg (~tgohad@fmdmzpr02-ext.fm.intel.com) has joined #ceph
[4:18] * tsg_ (~tgohad@134.134.137.73) Quit (Remote host closed the connection)
[4:20] * efirs (~firs@5.128.174.86) has joined #ceph
[4:22] * MentalRay (~MentalRay@107.171.161.165) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[4:24] * rdias (~rdias@2001:8a0:749a:d01:938:13ff:6e72:610b) has joined #ceph
[4:28] * cronburg (~cronburg@209-6-121-249.c3-0.arl-ubr1.sbo-arl.ma.cable.rcn.com) has joined #ceph
[4:29] * flisky (~Thunderbi@106.38.61.190) has joined #ceph
[4:33] * kefu (~kefu@183.193.165.164) has joined #ceph
[4:35] * baojg (~baojg@61.135.155.34) Quit (Ping timeout: 480 seconds)
[4:45] * JWilbur (~tallest_r@5.157.16.42) has joined #ceph
[4:48] * baojg (~baojg@61.135.155.34) has joined #ceph
[5:01] * truan-wang (~truanwang@220.248.17.34) has joined #ceph
[5:04] * squizzi (~squizzi@107.13.31.195) Quit (Ping timeout: 480 seconds)
[5:06] * jfaj (~jan@p20030084AF3AA2005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[5:08] * theTrav_ (~theTrav@ipc032.ipc.telstra.net) has joined #ceph
[5:14] * theTrav (~theTrav@203.35.9.142) Quit (Read error: Connection timed out)
[5:15] * JWilbur (~tallest_r@9YSAAA2T0.tor-irc.dnsbl.oftc.net) Quit ()
[5:15] * jfaj (~jan@p20030084AF3665005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) has joined #ceph
[5:18] * georgem (~Adium@76-10-180-154.dsl.teksavvy.com) Quit (Quit: Leaving.)
[5:20] * kefu_ (~kefu@114.92.96.253) has joined #ceph
[5:23] * Racpatel (~Racpatel@2601:87:0:24af::53d5) Quit (Ping timeout: 480 seconds)
[5:24] * kefu (~kefu@183.193.165.164) Quit (Ping timeout: 480 seconds)
[5:25] * swsgc (swsgc@c-71-237-134-29.hsd1.or.comcast.net) Quit ()
[5:27] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[5:28] * Nicho1as (~nicho1as@218.147.181.208) has joined #ceph
[5:32] * theTrav_ (~theTrav@ipc032.ipc.telstra.net) Quit (Remote host closed the connection)
[5:33] * theTrav (~theTrav@203.35.9.142) has joined #ceph
[5:44] * vimal (~vikumar@114.143.165.7) has joined #ceph
[5:45] * baojg (~baojg@61.135.155.34) Quit (Ping timeout: 480 seconds)
[5:46] * Vacuum__ (~Vacuum@88.130.199.60) has joined #ceph
[5:53] * Vacuum_ (~Vacuum@88.130.196.47) Quit (Ping timeout: 480 seconds)
[5:54] * baojg (~baojg@61.135.155.34) has joined #ceph
[5:56] * davidzlap (~Adium@cpe-172-91-154-245.socal.res.rr.com) Quit (Quit: Leaving.)
[5:59] * flisky1 (~Thunderbi@223.104.254.95) has joined #ceph
[5:59] * flisky (~Thunderbi@106.38.61.190) Quit (Remote host closed the connection)
[6:02] * flisky (~Thunderbi@210.12.157.89) has joined #ceph
[6:02] * truan-wang (~truanwang@220.248.17.34) Quit (Remote host closed the connection)
[6:02] * kefu_ (~kefu@114.92.96.253) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[6:07] * flisky1 (~Thunderbi@223.104.254.95) Quit (Ping timeout: 480 seconds)
[6:08] * cronburg (~cronburg@209-6-121-249.c3-0.arl-ubr1.sbo-arl.ma.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[6:11] * overclk (~quassel@2400:6180:100:d0::54:1) Quit (Quit: No Ping reply in 180 seconds.)
[6:12] * overclk (~quassel@2400:6180:100:d0::54:1) has joined #ceph
[6:15] * penguinRaider (~KiKo@146.185.31.226) Quit (Ping timeout: 480 seconds)
[6:16] * kefu (~kefu@114.92.96.253) has joined #ceph
[6:21] * jdillaman (~jdillaman@pool-108-18-97-95.washdc.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[6:22] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[6:28] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[6:29] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[6:29] * flisky (~Thunderbi@210.12.157.89) Quit (Read error: Connection reset by peer)
[6:30] * flisky (~Thunderbi@106.38.61.189) has joined #ceph
[6:30] * penguinRaider (~KiKo@14.139.82.6) has joined #ceph
[6:38] * truan-wang (~truanwang@58.247.8.186) has joined #ceph
[6:39] * _28_ria (~kvirc@opfr028.ru) has joined #ceph
[6:39] * toastydeath (~toast@pool-71-255-253-39.washdc.fios.verizon.net) has joined #ceph
[6:41] * vimal (~vikumar@114.143.165.7) Quit (Quit: Leaving)
[6:41] * rdas (~rdas@121.244.87.116) has joined #ceph
[6:41] <ivve> hello, i'm having a pretty full cluster giving me some trouble with stuck pgs due to uneven balance (MIN/MAX VAR: 0.86/1.10 on 4TB disks), anyone got any good advice on getting a cluster more evenly balanced? i haven't tried osd crush reweight yet, but osd reweight (as i have read it) just takes load off a specific osd, and that is what i would like to do, however it doesn't seem to help with my
[6:41] <ivve> issue. i have ordered more storage but don't want to sit with 1 copy of a pg until it arrives... http://pastebin.com/a5c5ryL6 & http://pastebin.com/W4Zgp1fa
[6:44] <ivve> i have fiddled with osd_backfill_full_ratio to allow backfilling to the already quite full osds, which just makes my imbalance problem even worse since crush chooses to use those disks over the "more empty" ones
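A minimal sketch of the rebalancing approach being discussed, using standard ceph CLI commands; the OSD id and weight values are placeholders, and test-reweight-by-utilization may not be available on older releases:

    ceph osd df tree                           # inspect per-OSD utilisation and weights
    ceph osd test-reweight-by-utilization 110  # dry run: show what would be reweighted
    ceph osd reweight-by-utilization 110       # lower the temporary reweight of OSDs >10% over the average
    ceph osd crush reweight osd.12 3.0         # or permanently lower one full OSD's CRUSH weight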
[6:46] * efirs1 (~firs@c-50-185-70-125.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[6:52] * _are__ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[6:52] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Read error: Connection reset by peer)
[6:56] * _28_ria (~kvirc@opfr028.ru) Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[6:57] * kuku_ (~kuku@203.177.235.23) has joined #ceph
[7:00] * ira (~ira@1.186.34.66) has joined #ceph
[7:01] * vimal (~vikumar@121.244.87.116) has joined #ceph
[7:02] * truan-wang (~truanwang@58.247.8.186) Quit (Remote host closed the connection)
[7:04] * kuku (~kuku@119.93.91.136) Quit (Read error: Connection reset by peer)
[7:04] * kuku (~kuku@119.93.91.136) has joined #ceph
[7:06] * vbellur (~vijay@71.234.224.255) Quit (Ping timeout: 480 seconds)
[7:10] * kuku_ (~kuku@203.177.235.23) Quit (Ping timeout: 480 seconds)
[7:13] * kefu (~kefu@114.92.96.253) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[7:18] * truan-wang (~truanwang@58.247.8.186) has joined #ceph
[7:22] * davidzlap1 (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[7:22] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Read error: Connection reset by peer)
[7:23] * penguinRaider (~KiKo@14.139.82.6) Quit (Ping timeout: 480 seconds)
[7:28] * swami1 (~swami@49.44.57.239) has joined #ceph
[7:28] * bvi (~Bastiaan@185.56.32.1) has joined #ceph
[7:32] * penguinRaider (~KiKo@146.185.31.226) has joined #ceph
[7:35] * swami2 (~swami@49.38.2.241) has joined #ceph
[7:39] * swami1 (~swami@49.44.57.239) Quit (Ping timeout: 480 seconds)
[7:44] * jdillaman (~jdillaman@pool-108-18-97-95.washdc.fios.verizon.net) has joined #ceph
[7:53] * tsg (~tgohad@fmdmzpr02-ext.fm.intel.com) Quit (Ping timeout: 480 seconds)
[7:55] * vbellur (~vijay@2601:18f:700:55b0:5e51:4fff:fee8:6a5c) has joined #ceph
[8:00] * ira (~ira@1.186.34.66) Quit (Ping timeout: 480 seconds)
[8:00] * karnan (~karnan@121.244.87.117) has joined #ceph
[8:03] * brians__ (~brian@80.111.114.175) has joined #ceph
[8:03] * brians (~brian@80.111.114.175) Quit (Read error: Connection reset by peer)
[8:04] * karnan_ (~karnan@121.244.87.117) has joined #ceph
[8:08] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[8:14] * karnan (~karnan@121.244.87.117) Quit (Ping timeout: 480 seconds)
[8:14] * karnan_ (~karnan@121.244.87.117) Quit (Ping timeout: 480 seconds)
[8:16] * tom_nz (~oftc-webi@202.14.217.2) has joined #ceph
[8:17] * aj__ (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[8:20] * efirs (~firs@5.128.174.86) Quit (Ping timeout: 480 seconds)
[8:23] <tom_nz> Hi guys, I've got 4 servers with 4 OSDs with 1T each so 16 OSDs, my question is that if I lose one server due to e.g. a faulty network cable, I'd run into trouble unless I set the replication to 5? Is this correct? Would that mean though that I only have 16T / 5 available for storing data?
[8:23] <tom_nz> sorry if I'm missing something
[8:24] * karnan (~karnan@121.244.87.117) has joined #ceph
[8:28] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) has joined #ceph
[8:28] * Miouge (~Miouge@109.128.94.173) has joined #ceph
[8:47] * ade (~abradshaw@p4FF79653.dip0.t-ipconnect.de) has joined #ceph
[8:48] * penguinRaider (~KiKo@146.185.31.226) Quit (Ping timeout: 480 seconds)
[8:53] * kuku_ (~kuku@203.177.235.23) has joined #ceph
[8:55] * kuku (~kuku@119.93.91.136) Quit (Read error: Connection reset by peer)
[8:55] * kuku (~kuku@119.93.91.136) has joined #ceph
[8:56] * ira (~ira@121.244.87.117) has joined #ceph
[8:59] * dgurtner (~dgurtner@178.197.232.208) has joined #ceph
[9:02] * kuku_ (~kuku@203.177.235.23) Quit (Ping timeout: 480 seconds)
[9:04] * penguinRaider (~KiKo@146.185.31.226) has joined #ceph
[9:05] * analbeard (~shw@support.memset.com) has joined #ceph
[9:07] * bb0x (~bb0x@78.97.194.150) Quit (Quit: This computer has gone to sleep)
[9:10] <doppelgrau> tom_nz: no, replication size 3, min size=2 and you can tolerate one loss without problem
[9:12] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[9:12] <doppelgrau> tom_nz: you'd run into blocked IO if size=min_size (until rebalance finishes), or indefinitely if size=min_size=4, since that can not be fulfilled with a faulty node
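For reference, a sketch of how the size/min_size combination doppelgrau describes is applied per pool (assuming a pool named rbd):

    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2
    ceph osd pool get rbd min_size   # verify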
[9:15] * t4nk173 (~oftc-webi@117.247.186.15) has joined #ceph
[9:17] <ivve> anyone got any good tips for balancing a cluster.. i've got 70x4TB osds over 2 nodes and balance looks like: MIN/MAX VAR: 0.86/1.16
[9:18] <t4nk173> I had full OSDs and I added a new osd, but I'm stuck with "4 ops are blocked > 67108.9 sec on osd.6" for some hours. There is no firewall and no network issues, and i tried restarting ceph on all nodes. Has anyone come across this? If so could you please help me?
[9:19] * dgurtner (~dgurtner@178.197.232.208) Quit (Ping timeout: 480 seconds)
[9:20] * rendar (~I@host167-157-dynamic.44-79-r.retail.telecomitalia.it) has joined #ceph
[9:21] * penguinRaider (~KiKo@146.185.31.226) Quit (Ping timeout: 480 seconds)
[9:22] * dgurtner (~dgurtner@178.197.232.208) has joined #ceph
[9:23] * theTrav (~theTrav@203.35.9.142) Quit (Read error: Connection timed out)
[9:25] <ivve> t4nk173: probably similar to my issue, solved that although my cluster is very "unbalanced"
[9:25] <ivve> t4nk173: ceph pg dump_stuck
[9:26] <ivve> t4nk173: ceph pg <pgid> query -> find out what they want to do.. in my case they wanted to do backfills to an osd whose backfill full ratio had already been reached
[9:27] <t4nk173> this started while the cluster was rebalancing
[9:27] <ivve> so i increased that a few %.. although i have osds @ 90% and some as low as 66%
[9:27] <ivve> so strange on 4TB osds
[9:28] <ivve> t4nk173: what does ceph pg dump_stuck look like?
[9:28] <ivve> oh they are blocked, not stuck
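A hedged sketch of the adjustment ivve describes above (raising osd_backfill_full_ratio a few percent); 0.90 is only an illustrative value:

    # at runtime, on all OSDs
    ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'
    # or persistently in ceph.conf
    [osd]
    osd backfill full ratio = 0.90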
[9:29] * fsimonce (~simon@host203-44-dynamic.183-80-r.retail.telecomitalia.it) has joined #ceph
[9:29] * penguinRaider (~KiKo@204.152.207.173) has joined #ceph
[9:30] * stein (~stein@185.56.185.82) Quit (Remote host closed the connection)
[9:31] <t4nk173> http://pastebin.com/W1Lk0JX1
[9:31] * aj__ (~aj@p578b6aa1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[9:31] <t4nk173> http://pastebin.com/W1Lk0JX1 --> ceph pg dump
[9:32] <t4nk173> http://pastebin.com/W1Lk0JX1 --> ceph pg dump_stuck
[9:32] <t4nk173> they were stuck. but now i changed replica size to 2
[9:32] <t4nk173> now its rebalcing
[9:33] <t4nk173> now its rebalancing
[9:33] <ivve> yea was about to ask
[9:33] * bb0x (~bb0x@78.97.194.150) Quit (Quit: This computer has gone to sleep)
[9:34] <ivve> also a "ceph osd df tree" would be useful as well as "ceph osd pool ls detail"
[9:34] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[9:35] * stein (~stein@185.56.185.82) has joined #ceph
[9:36] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[9:36] <t4nk173> i have enough free space on osds now
[9:36] <ivve> what is your min size set to?
[9:36] <t4nk173> the problem is the requests were stuck. I hope things will look better after this recovery
[9:36] <t4nk173> now i set it to 2
[9:36] <t4nk173> was 3
[9:37] <ivve> size = 2 & min size = 2 ?
[9:37] <t4nk173> yes
[9:37] <ivve> you can set minsize to 1
[9:37] <ivve> it will allow for 1 extra copy if degraded
[9:37] <ivve> instead of 2
[9:38] <ivve> if you are having space issues it could be good to lower
[9:39] <t4nk173> i have changed it to 1
[9:40] <ivve> now if im right scrubs should take care of the extra copies and free up space unless it does that without scrubbing
[9:40] * boolman (boolman@79.138.78.238) has joined #ceph
[9:41] * bvi (~Bastiaan@185.56.32.1) Quit (Quit: Leaving)
[9:42] <t4nk173> i guess so. Thanks. Also, do you know if ceph jewel supports multi-tenant namespaces for containers?
[9:43] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Quit: Ex-Chat)
[9:43] * CampGareth (~Max@149.18.114.222) has joined #ceph
[9:44] * theTrav (~theTrav@1.136.96.134) has joined #ceph
[9:46] <CampGareth> Heya all, this an okay place to ask for help with a ceph-osd that's crashing?
[9:46] * bb0x (~bb0x@78.97.194.150) Quit (Quit: This computer has gone to sleep)
[9:48] <schegi> Hi, i am running a little test ceph cluster of 4 nodes (3 mon nodes with one osd each, and one pure osd node with two osds running). And i am observing a strange behavior when killing osd processes (via systemctl or kill). Ceph seems
[9:48] <schegi> to recognize that osds are going down, except for the last osd in the cluster. So if i am shutting down all osds, ceph health always shows 1 osd remaining up, and it is always the last one shut down, as osd tree indicates. When i then restart osds other than this last one, ceph suddenly recognizes that it is down.
[9:50] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[9:52] * theTrav (~theTrav@1.136.96.134) Quit (Read error: Connection timed out)
[9:52] * flesh (~oftc-webi@static.ip-171-033-130-093.signet.nl) has joined #ceph
[9:53] <doppelgrau> schegi: osd min down reporters
[9:53] * allaok (~allaok@machine107.orange-labs.com) has joined #ceph
[9:54] <doppelgrau> schegi: report through other osd = fast, timeout via mon veeery slow
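For a small lab cluster like schegi's, the reporter threshold doppelgrau mentions is a monitor-side setting; a ceph.conf sketch with illustrative values:

    [mon]
    mon osd min down reporters = 1   # default 2; with very few OSDs there may be no peer left to report
    mon osd report timeout = 900     # mon-side fallback, in seconds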
[9:54] * zokko (zok@neurosis.pl) Quit (Ping timeout: 480 seconds)
[9:55] <schegi> what does very slow mean? i left it over the weekend with all osds off, on monday the last one was still marked as up
[9:55] * zokko (zok@neurosis.pl) has joined #ceph
[9:56] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[9:56] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[9:56] <doppelgrau> schegi: I thought that should be enough, but outside the lab the scenario isn't relevant,
[9:57] <schegi> is there a way to force check osds by the mons? or to configure the timeout via the mons?
[9:57] * aj__ (~aj@fw.gkh-setu.de) has joined #ceph
[9:58] <schegi> 3 days off and still no timeout seems a bit long to me.
[10:00] * kuku (~kuku@119.93.91.136) Quit (Remote host closed the connection)
[10:01] * DanFoster (~Daniel@2a00:1ee0:3:1337:61a8:c3cb:9ad9:8377) has joined #ceph
[10:02] <boolman> schegi: any logs from the monitors?
[10:03] * Pulec1 (~ricin@213.61.149.100) has joined #ceph
[10:05] * bb0x (~bb0x@78.97.194.150) Quit (Ping timeout: 480 seconds)
[10:06] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[10:07] <schegi> w8, will produce some. I understand now what is happening; what i am interested in now is whether it is possible to somehow reliably get the state of the last osd (whether it is up or down) from the ceph cli. Because in the current case it seems to me that you cannot decide, based on the cli output, what the state of the last osd is. There will always be at least 'mon osd min down reporters' osds reported as up.
[10:07] <schegi> Am i correct?
[10:14] * truan-wang (~truanwang@58.247.8.186) Quit (Ping timeout: 480 seconds)
[10:16] * allaok (~allaok@machine107.orange-labs.com) Quit (Remote host closed the connection)
[10:16] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[10:16] * bb0x (~bb0x@5.2.199.244) has joined #ceph
[10:17] * bb0x (~bb0x@5.2.199.244) Quit ()
[10:18] * dgurtner (~dgurtner@178.197.232.208) Quit (Ping timeout: 480 seconds)
[10:19] * bb0x (~bb0x@5.2.199.244) has joined #ceph
[10:21] <CampGareth> alright, my cluster's just a three node one for home NAS purposes, 2 x86 nodes fielding 7 OSDs and one ARM fielding 1 (allwinner A20, 1GB RAM). Both the x86 nodes are fine but for some reason ceph-osd seg faults on the ARM node
[10:22] <CampGareth> The logs list connections to other OSDs, data flowing back and forth, but during this it seg faults in a thread named ms_pipe_read
[10:22] <boolman> schegi: hm i think not, if all the osds are down, they should get downed by the monitor 'mon osd report timeout = 900'
[10:22] <boolman> http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/#osds-report-their-status
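A small sketch of how that mon-side timeout could be checked and temporarily lowered while testing (mon.a is a placeholder for the monitor id):

    ceph daemon mon.a config get mon_osd_report_timeout
    ceph tell mon.* injectargs '--mon-osd-report-timeout 300'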
[10:23] <CampGareth> I don't think it's running out of RAM since it's not getting OOMed and the cluster's only got 128PGs to handle atm, so what's up?
[10:23] * t4nk173 slaps ivve around a bit with a large fishbot
[10:26] <schegi> boolman: yes that's true, i read this, but to me it seems like the timeout is not taken into account. trying to play around with this timeout, i'll be back if i have some results.
[10:27] <boolman> schegi: i think I might have the same issue (where osds are not getting marked as down), gonna try it later today or tomorrow
[10:29] * theTrav (~theTrav@CPE-124-188-218-238.sfcz1.cht.bigpond.net.au) has joined #ceph
[10:31] * dgurtner (~dgurtner@178.197.232.208) has joined #ceph
[10:33] * Pulec1 (~ricin@5AEAAAQMR.tor-irc.dnsbl.oftc.net) Quit ()
[10:35] * branto (~branto@ip-78-102-208-181.net.upcbroadband.cz) has joined #ceph
[10:36] * TMM (~hp@185.5.121.201) has joined #ceph
[10:36] * penguinRaider (~KiKo@204.152.207.173) Quit (Quit: Leaving)
[10:37] * bb0x (~bb0x@5.2.199.244) Quit (Quit: This computer has gone to sleep)
[10:37] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[10:38] <schegi> boolman: thx, if i have some news i'll be back here.
[10:44] * theTrav (~theTrav@CPE-124-188-218-238.sfcz1.cht.bigpond.net.au) Quit (Remote host closed the connection)
[10:46] * bb0x (~bb0x@5.2.199.244) has joined #ceph
[10:48] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[10:53] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[10:54] * fdmanana (~fdmanana@bl12-226-64.dsl.telepac.pt) has joined #ceph
[10:54] * Cybert1nus (~Cybertinu@cybertinus.customer.cloud.nl) Quit (Remote host closed the connection)
[10:54] * alexxy (~alexxy@biod.pnpi.spb.ru) Quit (Remote host closed the connection)
[10:55] * DoDzy (~Quackie@185.65.134.79) has joined #ceph
[10:56] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[11:02] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[11:03] * rmart04 (~rmart04@support.memset.com) has joined #ceph
[11:03] * dgurtner (~dgurtner@178.197.232.208) Quit (Read error: Connection reset by peer)
[11:03] <rmart04> Hi, could anyone tell me if Ceph's swift API implementation supports object versioning?
[11:10] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[11:10] * Nicho1as (~nicho1as@00022427.user.oftc.net) Quit (Quit: A man from the Far East; using WeeChat 1.5)
[11:12] <schegi> boolman: mon log of the leader http://pastebin.com/AMWJiX9i
[11:12] * babilen_ is now known as babilen
[11:15] * ira (~ira@121.244.87.117) Quit (Quit: Leaving)
[11:23] * theTrav (~theTrav@CPE-124-188-218-238.sfcz1.cht.bigpond.net.au) has joined #ceph
[11:25] * arbrandes1 (~arbrandes@ec2-54-172-54-135.compute-1.amazonaws.com) has joined #ceph
[11:25] * DoDzy (~Quackie@9YSAAA20M.tor-irc.dnsbl.oftc.net) Quit ()
[11:26] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) has joined #ceph
[11:27] * MrBy (~MrBy@85.115.23.2) has joined #ceph
[11:31] * arbrandes (~arbrandes@ec2-54-172-54-135.compute-1.amazonaws.com) Quit (Ping timeout: 480 seconds)
[11:38] * theTrav (~theTrav@CPE-124-188-218-238.sfcz1.cht.bigpond.net.au) Quit (Quit: Leaving...)
[11:38] * alexxy (~alexxy@biod.pnpi.spb.ru) has joined #ceph
[11:48] * chengpeng_ (~chengpeng@180.168.197.82) has joined #ceph
[11:49] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[11:52] * Cybertinus (~Cybertinu@cybertinus.customer.cloud.nl) has joined #ceph
[11:54] * chengpeng (~chengpeng@180.168.197.82) Quit (Ping timeout: 480 seconds)
[11:56] * Cybert1nus (~Cybertinu@cybertinus.customer.cloud.nl) has joined #ceph
[12:00] * dgurtner (~dgurtner@178.197.232.208) has joined #ceph
[12:02] * Cybertinus (~Cybertinu@cybertinus.customer.cloud.nl) Quit (Ping timeout: 480 seconds)
[12:08] * aj__ (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[12:08] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[12:17] * murmur_ (~murmur@zeeb.org) Quit (Read error: Connection reset by peer)
[12:17] * aj__ (~aj@fw.gkh-setu.de) has joined #ceph
[12:17] * Nixx_ (~quassel@bulbasaur.sjorsgielen.nl) Quit (Read error: Connection reset by peer)
[12:18] * mschiff_ (~mschiff@mx10.schiffbauer.net) Quit (Remote host closed the connection)
[12:19] * Nixx (~quassel@bulbasaur.sjorsgielen.nl) has joined #ceph
[12:19] * murmur (~murmur@zeeb.org) has joined #ceph
[12:20] * mschiff (~mschiff@mx10.schiffbauer.net) has joined #ceph
[12:35] * Racpatel (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[12:37] * Defaultti1 (~AG_Scott@185.65.134.77) has joined #ceph
[12:39] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[12:39] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[12:39] * chengpeng__ (~chengpeng@222.73.33.154) has joined #ceph
[12:40] * adamcrume (~quassel@2601:647:cb01:f890:50c0:7b1b:e4aa:8c73) Quit (Quit: No Ping reply in 180 seconds.)
[12:41] * adamcrume (~quassel@2601:647:cb01:f890:9841:dddc:f1b6:58ce) has joined #ceph
[12:41] * dgurtner (~dgurtner@178.197.232.208) Quit (Ping timeout: 480 seconds)
[12:43] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[12:46] * doppelgrau1 (~Admin@132.252.235.172) has joined #ceph
[12:46] * chengpeng_ (~chengpeng@180.168.197.82) Quit (Ping timeout: 480 seconds)
[12:47] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Ping timeout: 480 seconds)
[12:47] * chengpeng__ (~chengpeng@222.73.33.154) Quit (Ping timeout: 480 seconds)
[12:50] * ira (~ira@121.244.87.117) has joined #ceph
[12:51] * Racpatel (~Racpatel@2601:87:0:24af::53d5) Quit (Quit: Leaving)
[12:52] * Racpatel (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[12:53] * haomaiwang (~oftc-webi@61.149.85.206) has joined #ceph
[12:56] * bb0x (~bb0x@5.2.199.244) Quit (Quit: This computer has gone to sleep)
[12:57] * bb0x (~bb0x@5.2.199.244) has joined #ceph
[13:00] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[13:00] * gregmark (~Adium@68.87.42.115) has joined #ceph
[13:04] * aj__ (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[13:07] * Defaultti1 (~AG_Scott@5AEAAAQPF.tor-irc.dnsbl.oftc.net) Quit ()
[13:13] * aj__ (~aj@fw.gkh-setu.de) has joined #ceph
[13:13] * t4nk173 (~oftc-webi@117.247.186.15) Quit (Ping timeout: 480 seconds)
[13:15] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) Quit (Quit: Leaving)
[13:21] * liyang (~liyang@223.68.205.133) has joined #ceph
[13:22] <liyang> hi
[13:23] * liyang (~liyang@223.68.205.133) Quit ()
[13:23] <boolman> is there any data on whats good or bad osd apply/commit latency?
[13:28] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[13:29] * dgurtner (~dgurtner@178.197.232.208) has joined #ceph
[13:34] * MACscr1 (~Adium@c-73-9-230-5.hsd1.il.comcast.net) has joined #ceph
[13:36] <MACscr1> im using a ceph rbd for each kvm guest on a proxmox cluster. Suggestions for incremental/differential backups? Most VMs are about 200GB each, so I need to be efficient. Right now my backup server just has NFS running, but im open to other ideas. Unfortunately i can't spin up another ceph cluster though.
[13:38] * johnavp19891 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) has joined #ceph
[13:38] <- *johnavp19891* To prove that you are human, please enter the result of 8+3
[13:38] <doppelgrau1> MACscr1: do you want to protect against errors in the guest (snapshots) or against a total failure of the ceph cluster?
[13:39] <MACscr1> just looking to backup individual ones. It would be cool if they could somehow be synced to a image file so that worse case, i could spin them up that way.
[13:39] <MACscr1> that was a great question though
[13:40] * MACscr1 is now known as MACscr
[13:41] <doppelgrau1> at the VM level you can use snapshots, export the whole image, or run any conventional backup solution inside the VM (backuppc, bacula...)
[13:42] <MACscr> a snapshot is just a diff though isnt it?
[13:43] <MACscr> plus that snapshot resides on the ceph cluster, so isnt really a backup
[13:43] <MACscr> so my apologies, i do need it to reside off the cluster
[13:43] <doppelgrau1> yes, so it protects against failures inside the VM (e.g. locky), but not against a failed cluster
[13:44] <doppelgrau1> MACscr: If you had a second cluster, you could also sync both with snapshots and copy the diff periodically to the other cluster
[13:45] <doppelgrau1> MACscr: but with only one cluster a simple IMAGE-dump or a solution inside the VM is in my eyes the best solution
[13:45] * johnavp1989 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[13:48] * jarrpa (~jarrpa@2602:3f:e183:a600:eab1:fcff:fe47:f680) Quit (Remote host closed the connection)
[13:50] * ira (~ira@121.244.87.117) Quit (Remote host closed the connection)
[13:52] * jarrpa (~jarrpa@2602:3f:e183:a600:eab1:fcff:fe47:f680) has joined #ceph
[13:53] * tunaaja (~xanax`@213.61.149.100) has joined #ceph
[13:53] <Miouge> rmart04: "Object Versioning Not Supported" http://docs.ceph.com/docs/jewel/radosgw/swift/#api
[13:54] * haomaiwang (~oftc-webi@61.149.85.206) Quit (Ping timeout: 480 seconds)
[13:54] * tsg (~tgohad@192.55.55.41) has joined #ceph
[13:55] * bb0x (~bb0x@5.2.199.244) Quit (Quit: This computer has gone to sleep)
[13:55] * bb0x (~bb0x@5.2.199.244) has joined #ceph
[13:56] * EthanL (~lamberet@cce02cs4044-fa12-z.ams.hpecore.net) has joined #ceph
[13:56] * aj__ (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[13:58] <Miouge> MACscr, doppelgrau1 is right: take a snapshot of the VM, then copy the base+snapshot over to your NFS server
[13:58] <Miouge> You can use the snapshot feature as a way to do incremental backup basically
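A sketch of the snapshot-based incremental backup Miouge and doppelgrau1 describe, written out to the NFS mount; pool, image, snapshot names and paths are placeholders:

    # one-time baseline: snapshot, then export the snapshot
    rbd snap create rbd/vm-100-disk-1@base
    rbd export rbd/vm-100-disk-1@base /mnt/backup/vm-100-disk-1.img
    # later: ship only the changes since the previous snapshot
    rbd snap create rbd/vm-100-disk-1@2016-08-03
    rbd export-diff --from-snap base rbd/vm-100-disk-1@2016-08-03 /mnt/backup/vm-100-disk-1.base-to-2016-08-03.diff
    # diffs can be replayed onto a copy elsewhere with rbd import-diff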
[13:58] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[13:59] * sebastian-w_ (~quassel@212.218.8.138) Quit (Remote host closed the connection)
[13:59] * sebastian-w (~quassel@212.218.8.138) has joined #ceph
[14:00] <schegi> boolman: if the question was meant for me: there is no data at all on the osds, it is a fresh cluster, not even a pool added except the default rbd pool.
[14:00] * EthanL (~lamberet@cce02cs4044-fa12-z.ams.hpecore.net) Quit (Read error: Connection reset by peer)
[14:01] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[14:02] * flisky (~Thunderbi@106.38.61.189) Quit (Ping timeout: 480 seconds)
[14:02] <boolman> schegi: no it was a general question for everyone :)
[14:02] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[14:02] * ira (~ira@121.244.87.117) has joined #ceph
[14:04] * willi (~willi@p200300774E21C8FC50DB1BB208D139BF.dip0.t-ipconnect.de) has joined #ceph
[14:05] <willi> hey guys: is rbd striping available in ceph infernalis on ubuntu 14.04 with kernel 3.13?
[14:05] <willi> stripe count= n ???
[14:05] <willi> image format 2....
[14:05] * aj__ (~aj@fw.gkh-setu.de) has joined #ceph
[14:05] <willi> the manpage says:
[14:05] <willi> format 2 - Use the second rbd format, which is supported by
[14:05] <willi> librbd and kernel since version 3.11 (except for striping).
[14:05] <willi> This adds support for cloning and is more easily extensible to
[14:05] <willi> allow more features in the future.
[14:05] <willi> http://manpages.ubuntu.com/manpages/wily/man8/rbd.8.html
[14:06] * Mahoff (~oftc-webi@ae0-19.frankfurt-1.celox.net) has joined #ceph
[14:06] <Mahoff> hi
[14:07] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) Quit (Quit: Ex-Chat)
[14:08] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[14:10] * pdrakeweb (~pdrakeweb@oh-76-5-108-60.dhcp.embarqhsd.net) has joined #ceph
[14:10] <rmart04> @Miouge - Thanks!
[14:11] * bniver (~bniver@nat-pool-bos-u.redhat.com) has joined #ceph
[14:12] * kuku (~kuku@112.203.10.252) has joined #ceph
[14:13] <Mahoff> Does anyone know where i can find a list of known bugs in ceph jewel?
[14:14] <ivve> tracker.ceph.com
[14:17] <MACscr> maybe this seems dumb, but couldnt i setup a single node ceph pool and sync to it from the primary cluster?
[14:21] * georgem (~Adium@24.114.72.209) has joined #ceph
[14:21] * georgem (~Adium@24.114.72.209) Quit ()
[14:21] * georgem (~Adium@206.108.127.16) has joined #ceph
[14:22] <Mahoff> @ivve thanks. I already searched up and down but could not find a solution. Our ceph jewel cluster works fine until we shut down 1 node (mon + osd) - quorum is still ok and ceph health shows 1 mon down. However it takes over 1 hour until ceph osd tree reflects this change. It cannot be normal to take that long? Meanwhile all rbd commands just hang and try to contact this one node that has been shut down.
[14:22] * dgurtner (~dgurtner@178.197.232.208) Quit (Read error: Connection reset by peer)
[14:22] * tunaaja (~xanax`@9YSAAA23F.tor-irc.dnsbl.oftc.net) Quit ()
[14:23] <Mahoff> So it works as expected but takes 1 hour+ to come into a working degraded state :-( Should be something between 30s - 2mins from my understanding?
[14:24] <doppelgrau1> MACscr: a possibility, not tested but I do not see any reason why that should not work
[14:28] * dgurtner (~dgurtner@178.197.232.208) has joined #ceph
[14:30] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[14:31] * vimal (~vikumar@121.244.87.116) Quit (Quit: Leaving)
[14:33] * georgem (~Adium@206.108.127.16) Quit (Quit: Leaving.)
[14:35] * rraja (~rraja@121.244.87.117) has joined #ceph
[14:35] * Racpatel (~Racpatel@2601:87:0:24af::53d5) Quit (Ping timeout: 480 seconds)
[14:36] * ade (~abradshaw@p4FF79653.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[14:37] <willi> anyone here who tested rbd striping?
[14:38] <willi> does kernel 3.13 (ubuntu 14.04) support the striping?
[14:39] * dgurtner_ (~dgurtner@178.197.232.208) has joined #ceph
[14:39] <doppelgrau1> willi: not tested, but I'd bet the kernel does not support it
[14:39] <willi> which kernel supports it?
[14:39] <doppelgrau1> willi: only userspace tools
[14:39] * Racpatel (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[14:40] * chengpeng (~chengpeng@180.168.170.2) has joined #ceph
[14:41] * kuku (~kuku@112.203.10.252) Quit (Remote host closed the connection)
[14:42] * johnavp19891 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[14:43] * dgurtner (~dgurtner@178.197.232.208) Quit (Ping timeout: 480 seconds)
[14:46] <willi> client told me
[14:46] <willi> rbd info lun000
[14:46] <willi> rbd image 'lun000':
[14:46] <willi> size 2048 GB in 524310 objects
[14:46] <willi> order 22 (4096 kB objects)
[14:46] <willi> block_name_prefix: rbd_data.126b4b52e795
[14:46] <willi> format: 2
[14:46] <willi> features: striping, exclusive-lock
[14:46] <willi> flags:
[14:46] <willi> stripe unit: 65536 bytes
[14:46] <willi> stripe count: 30
[14:46] <willi> after creating
[14:46] * kuku (~kuku@112.203.10.252) has joined #ceph
[14:46] <willi> is it now supported?
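For context, an image like the one willi pasted is normally created with the userspace rbd tool along these lines (size here is in MB; names and values are placeholders). Whether the 3.13 kernel client can actually map an image with non-default striping is a separate question, and doppelgrau1's earlier caution still applies:

    rbd create lun000 --pool rbd --size 2097152 --image-format 2 --stripe-unit 65536 --stripe-count 30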
[14:51] * georgem (~Adium@206.108.127.16) has joined #ceph
[14:54] <willi> hallo???
[14:54] <willi> no one knows it?
[14:55] * ira (~ira@121.244.87.117) Quit (Quit: Leaving)
[15:00] <Hatsjoe> willi please remain calm when waiting for an answer. A big portion of the users here have normal jobs; if someone knows, I'm sure he/she will answer your question whenever that person can.
[15:00] * pdrakeweb (~pdrakeweb@oh-76-5-108-60.dhcp.embarqhsd.net) Quit (Read error: Connection reset by peer)
[15:00] * xarses_ (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) has joined #ceph
[15:01] * Nicho1as (~nicho1as@00022427.user.oftc.net) has joined #ceph
[15:02] * Racpatel (~Racpatel@2601:87:0:24af::53d5) Quit (Quit: Leaving)
[15:02] * Racpatel (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[15:02] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) Quit (Read error: No route to host)
[15:02] <georgem> willi: and you could ask the question again for users who just joined…
[15:03] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) has joined #ceph
[15:06] * rakeshgm (~rakesh@121.244.87.117) has joined #ceph
[15:07] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[15:09] * pdrakeweb (~pdrakeweb@cpe-71-74-153-111.neo.res.rr.com) has joined #ceph
[15:10] * tZ (~totalworm@46.45.177.98) has joined #ceph
[15:10] * kuku (~kuku@112.203.10.252) Quit (Read error: Connection reset by peer)
[15:11] * kuku (~kuku@112.203.10.252) has joined #ceph
[15:12] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[15:12] <Mahoff> Could anyone please tell me how long it usually takes for 'ceph osd tree' to reflect change after shutting down 1 cluster node?
[15:14] * rakeshgm (~rakesh@121.244.87.117) Quit (Remote host closed the connection)
[15:15] <CampGareth> Really not long, though as discussed somewhere up there ^ if it's the last OSD it takes longer
[15:16] <CampGareth> *if that one node houses the last OSD
[15:19] * haomaiwang (~oftc-webi@114.249.239.114) has joined #ceph
[15:21] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[15:24] <Mahoff> @CampGareth Strange. I have shut down 1 node (not the last one) and it takes 1 hour until 'ceph osd tree' marks corresponding OSD's as down.
[15:24] * rraja (~rraja@121.244.87.117) Quit (Quit: Leaving)
[15:25] <Mahoff> Meanwhile 'rbd --pool MyPool list' just sits there and tries to contact this shutdown node:
[15:25] <Mahoff> 2016-08-03 12:05:54.612169 7f4be9580700 0 -- 192.168.250.64:0/2950800349 >> 192.168.250.60:6804/2448 pipe(0x55fa03908610 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x55fa039098d0).fault
[15:26] <Mahoff> 192.168.250.60 is IP of node shutdown. After 1 hour or so it clears up and works again.
[15:26] <doppelgrau1> Mahoff: do you have fewer than 'osd min down reporters' osds left?
[15:26] * kuku (~kuku@112.203.10.252) Quit (Read error: Connection reset by peer)
[15:27] <Mahoff> Any idea what might be causing this? Shouldn't rbd command "know" which mons are up and which are down?
[15:27] * kuku (~kuku@112.203.10.252) has joined #ceph
[15:28] * squizzi (~squizzi@107.13.31.195) has joined #ceph
[15:28] * mattbenjamin (~mbenjamin@76-206-42-50.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[15:30] * technil (~technil@host.cctv.org) has joined #ceph
[15:31] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[15:32] * evelu (~erwan@aut78-1-78-236-183-64.fbx.proxad.net) has joined #ceph
[15:32] * pdrakeweb (~pdrakeweb@cpe-71-74-153-111.neo.res.rr.com) Quit (Read error: Connection reset by peer)
[15:33] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[15:33] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[15:37] <Mahoff> @doppelgrau1 There is one node left with 7 OSDs up (and in). "mon_osd_min_down_reporters": "2" - so from my understanding it should report
[15:37] <technil> does anyone know if you can set min_size 0 briefly on a pool to clear out incomplete pgs with a min_size of 1?
[15:37] <technil> my assumption is no.
[15:39] * pdrakeweb (~pdrakeweb@cpe-71-74-153-111.neo.res.rr.com) has joined #ceph
[15:40] <doppelgrau1> Mahoff: I would watch ceph -w during shutdown and until the problem resolves, it might show additional hints (e.g. when, and based on which information, the monitors decide that the osd is down)
[15:40] * fdmanana (~fdmanana@bl12-226-64.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[15:40] * tZ (~totalworm@46.45.177.98) Quit ()
[15:45] <Mahoff> @doppelgrau1 Good idea i'll give it a try
[15:46] * vbellur (~vijay@2601:18f:700:55b0:5e51:4fff:fee8:6a5c) Quit (Ping timeout: 480 seconds)
[15:46] * bb0x (~bb0x@5.2.199.244) Quit (Read error: Connection reset by peer)
[15:47] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[15:47] * Racpatel (~Racpatel@2601:87:0:24af::53d5) Quit (Quit: Leaving)
[15:49] * Racpatel (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[15:50] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:6071:62a3:b364:3e85) has joined #ceph
[15:50] * Racpatel (~Racpatel@2601:87:0:24af::53d5) Quit ()
[15:51] * dgurtner_ (~dgurtner@178.197.232.208) Quit (Read error: Connection reset by peer)
[15:51] * Racpatel (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[15:53] * bb0x (~bb0x@5.2.199.244) has joined #ceph
[15:54] * Racpatel (~Racpatel@2601:87:0:24af::53d5) Quit ()
[15:54] * Racpatel (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[15:55] * Racpatel (~Racpatel@2601:87:0:24af::53d5) Quit (Remote host closed the connection)
[15:56] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) has joined #ceph
[15:56] * bb0x (~bb0x@5.2.199.244) Quit (Read error: No route to host)
[15:57] * Racpatel (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[15:58] * bb0x (~bb0x@5.2.199.244) has joined #ceph
[16:02] * gregmark (~Adium@68.87.42.115) has joined #ceph
[16:02] * dgurtner (~dgurtner@178.197.232.208) has joined #ceph
[16:03] * dnunez (~dnunez@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[16:04] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[16:04] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[16:04] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[16:04] * xarses_ (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:05] * dgurtner (~dgurtner@178.197.232.208) Quit (Read error: No route to host)
[16:06] * jordan_c (~jconway@cable-192.222.246.54.electronicbox.net) has joined #ceph
[16:06] * jpierre03 (~jpierre03@voyage.prunetwork.fr) has joined #ceph
[16:08] * jpierre03 (~jpierre03@voyage.prunetwork.fr) Quit (Read error: Connection reset by peer)
[16:08] * jpierre03 (~jpierre03@voyage.prunetwork.fr) has joined #ceph
[16:12] * Chrissi_ (~loft@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[16:14] * dnunez (~dnunez@nat-pool-bos-t.redhat.com) has joined #ceph
[16:14] * Jeffrey4l__ (~Jeffrey@110.252.57.249) Quit (Ping timeout: 480 seconds)
[16:14] * Jeffrey4l__ (~Jeffrey@110.252.57.249) has joined #ceph
[16:19] * pdrakeweb (~pdrakeweb@cpe-71-74-153-111.neo.res.rr.com) Quit (Read error: Connection reset by peer)
[16:20] * dgurtner (~dgurtner@178.197.232.208) has joined #ceph
[16:28] * pdrakeweb (~pdrakeweb@cpe-71-74-153-111.neo.res.rr.com) has joined #ceph
[16:28] * dgurtner (~dgurtner@178.197.232.208) Quit (Ping timeout: 480 seconds)
[16:32] * tsg (~tgohad@192.55.55.41) Quit (Remote host closed the connection)
[16:32] <Mahoff> After shutting down 1 node it takes 30 mins until this message appears for all OSDs of this node: osd.5 marked down after no pg stats for 900.056937 seconds
[16:32] <Mahoff> Why does it take this long??
[16:33] * kefu (~kefu@183.193.165.164) has joined #ceph
[16:33] <Mahoff> Seems to be mon osd report timeout = 900 - but why does it wait 1800 seconds?
[16:34] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) has joined #ceph
[16:36] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Quit: Konversation terminated!)
[16:38] * xarses_ (~xarses@64.124.158.192) has joined #ceph
[16:41] * cathode (~cathode@50.232.215.114) has joined #ceph
[16:41] * flisky (~Thunderbi@223.104.254.105) has joined #ceph
[16:42] * Chrissi_ (~loft@5AEAAAQUS.tor-irc.dnsbl.oftc.net) Quit ()
[16:44] <haomaiwang> kefu: https://github.com/ceph/ceph/pull/10264 qa regression tests passed
[16:45] <kefu> haomaiwang, i just feel uncomfortable with the before/after daemonize hooks.
[16:45] <kefu> haomaiwang will take a look at that part later.
[16:47] * zhen (~Thunderbi@43.255.178.224) has joined #ceph
[16:49] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) Quit (Quit: treenerd_)
[16:50] <haomaiwang> kefu: yes, before I did this, I also considered this hook. Another solution is to bind the port after fork; the drawback is we can't directly know whether we can bind the port when executing the cmdline
[16:50] <kefu> can we move to #ceph-devel
[16:50] <kefu> ?
[16:50] <kefu> haomaiwang ^
[16:51] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) has joined #ceph
[16:53] <boolman> in dump_historic_ops, duration, is that in seconds?
[16:54] <boolman> on the osd daemon socket
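For reference, a typical way to pull those ops from an OSD's admin socket (osd.3 and the socket path are placeholders); the duration fields in the output are in seconds:

    ceph daemon osd.3 dump_historic_ops
    # or directly against the admin socket
    ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok dump_historic_ops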
[16:56] * W|ldCraze (~isaxi@26XAAASYI.tor-irc.dnsbl.oftc.net) has joined #ceph
[16:58] * mattbenjamin (~mbenjamin@76-206-42-50.lightspeed.livnmi.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[16:58] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[17:00] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) has joined #ceph
[17:01] * bene2 (~bene@nat-pool-rdu-u.redhat.com) has joined #ceph
[17:01] * CampGareth (~Max@149.18.114.222) Quit (Read error: Connection reset by peer)
[17:02] <technil> does anyone know how to bring a disk with colocated journal and osd back into the cluster safely without destroying data?
[17:02] <technil> the original osd has been removed from the cluster
[17:03] <technil> ceph-deploy disk list sees it and thinks it is active
[17:04] <technil> but it is not in the crush map
[17:04] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[17:04] * kefu (~kefu@183.193.165.164) Quit (Max SendQ exceeded)
[17:04] <technil> I have created auth for it
[17:05] * Mahoff (~oftc-webi@ae0-19.frankfurt-1.celox.net) has left #ceph
[17:05] <technil> and it won't let me add it to the crushmap without "creating" it
[17:06] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) Quit (Quit: Textual IRC Client: www.textualapp.com)
[17:06] * ade (~abradshaw@p4FF79653.dip0.t-ipconnect.de) has joined #ceph
[17:09] * mattbenjamin (~mbenjamin@12.118.3.106) has joined #ceph
[17:09] * mykola (~Mikolaj@193.93.217.35) has joined #ceph
[17:11] * ade (~abradshaw@p4FF79653.dip0.t-ipconnect.de) Quit ()
[17:11] * ade (~abradshaw@p4FF79653.dip0.t-ipconnect.de) has joined #ceph
[17:12] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[17:14] <jordan_c> does osd heartbeat checking happen on the public network or cluster network?
[17:14] <Hatsjoe> cluster
[17:14] * bb0x (~bb0x@5.2.199.244) Quit (Ping timeout: 480 seconds)
[17:16] * swami2 (~swami@49.38.2.241) Quit (Quit: Leaving.)
[17:17] * flisky1 (~Thunderbi@124.207.50.254) has joined #ceph
[17:18] * flisky1 (~Thunderbi@124.207.50.254) Quit ()
[17:19] * pdrakeweb (~pdrakeweb@cpe-71-74-153-111.neo.res.rr.com) Quit (Read error: Connection reset by peer)
[17:20] * kefu (~kefu@183.193.165.164) has joined #ceph
[17:21] * flisky (~Thunderbi@223.104.254.105) Quit (Ping timeout: 480 seconds)
[17:21] <jordan_c> thanks Hatsjoe
[17:22] * debian112 (~bcolbert@c-73-184-103-26.hsd1.ga.comcast.net) has joined #ceph
[17:24] * ade (~abradshaw@p4FF79653.dip0.t-ipconnect.de) Quit (Quit: Too sexy for his shirt)
[17:25] * pdrakeweb (~pdrakeweb@cpe-71-74-153-111.neo.res.rr.com) has joined #ceph
[17:25] * W|ldCraze (~isaxi@26XAAASYI.tor-irc.dnsbl.oftc.net) Quit ()
[17:27] * shaunm (~shaunm@74.83.215.100) Quit (Ping timeout: 480 seconds)
[17:27] * vbellur (~vijay@nat-pool-bos-u.redhat.com) has joined #ceph
[17:30] <ceph-ircslackbot> <vdb> http://hastebin.com/ahonaganes.vhdl is this a known error with 10.2.2?
[17:30] <ceph-ircslackbot> <vdb> I have no idea why it's defaulting to bluestore there.
[17:30] * tsg (~tgohad@192.55.55.41) has joined #ceph
[17:30] <ceph-ircslackbot> <vdb> Is there a way to disable it?
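If the tooling really is picking bluestore by default, one way to pin the objectstore back to filestore is in ceph.conf before preparing the OSD; a hedged sketch, since the hastebin contents aren't reproduced here:

    [osd]
    osd objectstore = filestore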
[17:31] * TMM (~hp@185.5.121.201) Quit (Quit: Ex-Chat)
[17:37] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) has joined #ceph
[17:37] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) Quit ()
[17:38] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) has joined #ceph
[17:39] * post-factum (~post-fact@vulcan.natalenko.name) Quit (Killed (NickServ (Too many failed password attempts.)))
[17:39] * post-factum (~post-fact@vulcan.natalenko.name) has joined #ceph
[17:41] * reed (~reed@184-23-0-196.dsl.static.fusionbroadband.com) has joined #ceph
[17:41] * boolman (boolman@79.138.78.238) Quit (Ping timeout: 480 seconds)
[17:44] * kefu_ (~kefu@114.92.96.253) has joined #ceph
[17:46] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[17:49] * kefu (~kefu@183.193.165.164) Quit (Read error: No route to host)
[17:49] * kuku (~kuku@112.203.10.252) Quit (Read error: Connection reset by peer)
[17:50] * kefu (~kefu@183.193.165.164) has joined #ceph
[17:50] * kuku (~kuku@112.203.10.252) has joined #ceph
[17:52] * rmart04 (~rmart04@support.memset.com) Quit (Quit: rmart04)
[17:53] * kuku (~kuku@112.203.10.252) Quit (Remote host closed the connection)
[17:54] * dyasny_ (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[17:54] * kefu_ (~kefu@114.92.96.253) Quit (Read error: Connection reset by peer)
[17:55] * kefu_ (~kefu@114.92.96.253) has joined #ceph
[17:56] * dyasny_ (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[17:56] * Nicho1as (~nicho1as@00022427.user.oftc.net) Quit (Quit: A man from the Far East; using WeeChat 1.5)
[17:57] * mhackett (~mhack@nat-pool-bos-t.redhat.com) has joined #ceph
[17:57] * mhackett (~mhack@nat-pool-bos-t.redhat.com) Quit ()
[17:57] * kefu (~kefu@183.193.165.164) Quit (Read error: No route to host)
[17:58] * kefu (~kefu@183.193.165.164) has joined #ceph
[18:00] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[18:00] * kefu_ (~kefu@114.92.96.253) Quit (Read error: Connection reset by peer)
[18:02] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[18:04] * davidzlap1 (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Quit: Leaving.)
[18:05] * kefu_ (~kefu@114.92.96.253) has joined #ceph
[18:06] * kefu (~kefu@183.193.165.164) Quit (Ping timeout: 480 seconds)
[18:06] <willi> anyone here who can help me with rbd striping ?
[18:07] * aj__ (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[18:09] * blizzow (~jburns@50.243.148.102) has joined #ceph
[18:09] * yanzheng (~zhyan@125.70.20.95) Quit (Ping timeout: 480 seconds)
[18:10] * bene2 (~bene@nat-pool-rdu-u.redhat.com) Quit (Ping timeout: 480 seconds)
[18:11] <technil> have a full osd issuing a "genericfilestorebackend(/var/lib/ceph/osd/ceph-xx) detect_features: failed to write to /var/lib/ceph/osd/ceph-xx/fiemap_test: (1) Operation not permitted" error
[18:11] <technil> which results in ** ERROR: error converting store /var/lib/ceph/osd/ceph-xx: (1) Operation not permitted
[18:12] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[18:12] * kefu_ is now known as kefu|afk
[18:12] * ffilz (~ffilz@c-76-115-190-27.hsd1.or.comcast.net) Quit (Remote host closed the connection)
[18:13] <technil> this recovered osd has pg data needed for incomplete pgs
[18:13] * dyasny_ (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[18:14] <technil> the error appears to be due to a failure of writing fiemap_test (as there is ~20KB left on osd part)
[18:15] * yanzheng (~zhyan@118.116.113.221) has joined #ceph
[18:15] <technil> the osd is reweighted to 0.8 to push data elsewhere, so it shouldn't get any writes
[18:15] <technil> can you bring up an osd read-only ?
[18:16] <scuttlemonkey> The @Ceph Developer Monthly planning meeting is starting online in approx 15m http://wiki.ceph.com/Planning
[18:19] * wushudoin (~wushudoin@2601:646:8281:cfd:2ab2:bdff:fe0b:a6ee) has joined #ceph
[18:19] * wushudoin (~wushudoin@2601:646:8281:cfd:2ab2:bdff:fe0b:a6ee) Quit (Remote host closed the connection)
[18:19] * wushudoin (~wushudoin@2601:646:8281:cfd:2ab2:bdff:fe0b:a6ee) has joined #ceph
[18:20] <technil> only thing I can find about recovery is http://ceph.com/community/incomplete-pgs-oh-my/
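The article linked above recovers incomplete PGs by exporting them from the stopped OSD with ceph-objectstore-tool and importing them elsewhere; a rough sketch of that shape (osd ids, pgid and paths are placeholders, and the OSD must not be running):

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-xx \
        --journal-path /var/lib/ceph/osd/ceph-xx/journal \
        --pgid 1.28 --op export --file /root/pg.1.28.export
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-yy \
        --journal-path /var/lib/ceph/osd/ceph-yy/journal \
        --op import --file /root/pg.1.28.export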
[18:20] * IvanJobs (~ivanjobs@103.50.11.146) Quit (Ping timeout: 480 seconds)
[18:20] * ffilz (~ffilz@c-76-115-190-27.hsd1.or.comcast.net) has joined #ceph
[18:21] * joshd1 (~jdurgin@2602:30a:c089:2b0:94c2:8d44:7f81:59a2) has joined #ceph
[18:21] * dyasny_ (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[18:25] * IvanJobs (~ivanjobs@103.50.11.146) has joined #ceph
[18:26] * Pulp (~Pulp@63-221-50-195.dyn.estpak.ee) has joined #ceph
[18:26] * madkiss (~madkiss@2001:6f8:12c3:f00f:e085:133b:bc22:f42) Quit (Ping timeout: 480 seconds)
[18:28] * aNupoisc (~adnavare@192.55.55.41) has joined #ceph
[18:29] * davidzlap (~Adium@cpe-172-91-154-245.socal.res.rr.com) has joined #ceph
[18:30] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[18:31] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[18:31] * vbellur (~vijay@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[18:32] <SamYaple> technil: reweighting it to .8 will not ensure it doesnt get any writes
[18:32] <SamYaple> thats not quite how it works
[18:32] <SamYaple> it looks like a file system permission issue
[18:32] <SamYaple> jewel uses the 'ceph' user where previous versions didnt, so that file might be owned by root?
[18:32] <SamYaple> how sure are you it is because the FS is full?
[18:33] * Miouge (~Miouge@109.128.94.173) Quit (Quit: Miouge)
[18:33] * aj__ (~aj@88.128.80.62) has joined #ceph
[18:34] <SamYaple> i take that back, thats probably not a perm issue, but check anyway. what is the backend of the OSDs? xfs? btrfs?
[18:35] * yanzheng (~zhyan@118.116.113.221) Quit (Ping timeout: 480 seconds)
[18:38] <technil> @SamYaple thanks, i figured that out, and set it to 0
[18:39] * zhen (~Thunderbi@43.255.178.224) Quit (Remote host closed the connection)
[18:42] * yanzheng (~zhyan@125.70.20.176) has joined #ceph
[18:42] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[18:42] * vimal (~vikumar@114.143.165.7) has joined #ceph
[18:42] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[18:43] * Pulp (~Pulp@63-221-50-195.dyn.estpak.ee) Quit (Read error: Connection reset by peer)
[18:45] * madkiss (~madkiss@ip5b4029be.dynamic.kabel-deutschland.de) has joined #ceph
[18:48] * DanFoster (~Daniel@2a00:1ee0:3:1337:61a8:c3cb:9ad9:8377) Quit (Quit: Leaving)
[18:51] * vbellur1 (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[18:51] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Read error: Connection reset by peer)
[18:51] <SamYaple> technil: i dont think you need to set it to zero either
[18:51] <SamYaple> thats going to _remove_ all data from the osd
[18:51] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[18:51] <SamYaple> not to mention change the data distribution on the rest of the cluster
[18:52] <SamYaple> i think .8 is a good choice to balance out the data, though the best choice is to add more osds
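For reference, the two kinds of reweight differ here (a minimal sketch; the osd id and weights are placeholders): "ceph osd reweight" is the temporary override discussed above, while the CRUSH weight is what you would change for a lasting rebalance.
    ceph osd reweight 12 0.8             # override weight: shifts some PGs away, 0 would drain the osd entirely
    ceph osd crush reweight osd.12 0.8   # permanent CRUSH weight change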
[18:53] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[18:55] * kefu|afk (~kefu@114.92.96.253) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[18:56] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) has joined #ceph
[18:56] * ntpttr (~ntpttr@192.55.55.41) has joined #ceph
[18:57] * madkiss (~madkiss@ip5b4029be.dynamic.kabel-deutschland.de) Quit (Ping timeout: 480 seconds)
[19:07] <jordan_c> I'm having some issues with my test network, osds keep being shut down and I can't figure out why - I'm assuming it may be a network issue, but am not clear on where to start troubleshooting.
[19:07] <jordan_c> I'm seeing things like "2016-08-03 17:07:16.455339 7f7a457d3700 0 -- 192.168.100.12:0/2329 >> 10.68.10.16:6803/2643 pipe(0x7f7a78e4a000 sd=41 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7a78de0580).fault" in an osd log
[19:08] <jordan_c> 192.168.100.1/24 is my public network and 10.68.10.1/24 is my cluster network
[19:08] <jordan_c> should the cluster network be routable from the public network?
[19:08] * madkiss (~madkiss@ip5b417648.dynamic.kabel-deutschland.de) has joined #ceph
[19:09] <jordan_c> I've got 3 monitors with interfaces only on the public network and 5 osd nodes with interfaces on both public and cluster network (2 full disk osds per node with a third shared journal disk)
[19:11] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[19:16] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Read error: Connection reset by peer)
[19:19] * BrianA (~BrianA@fw-rw.shutterfly.com) has joined #ceph
[19:22] * bb0x (~bb0x@78.97.194.150) Quit (Quit: This computer has gone to sleep)
[19:24] * dgurtner (~dgurtner@178.197.235.207) has joined #ceph
[19:25] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[19:26] * vimal (~vikumar@114.143.165.7) Quit (Quit: Leaving)
[19:26] <blizzow> What's the story with Calamari and Jewel? The only thing I see in the ceph repos is a version for ubuntu Trusty and older. Is there a newer version or replacement?
[19:27] * Miouge (~Miouge@109.128.94.173) has joined #ceph
[19:27] <scuttlemonkey> blizzow: are you speaking of Calamari the API...or the old GUI management tool?
[19:30] <blizzow> The old GUI management tool.
[19:30] <scuttlemonkey> ahh
[19:31] <scuttlemonkey> well, there have been some intermediary things downstream...but I'm working on getting the eventual realization of that work upstreamed a bit more cleanly
[19:31] <scuttlemonkey> the staging project was called "skyrings" https://github.com/skyrings
[19:31] <blizzow> Is there some sane reason someone would give two major features of one project the same name?
[19:31] <scuttlemonkey> that is going through a bit of metamorphosis and will eventually be a stand-alone project called "Tendrl"
[19:32] <scuttlemonkey> https://github.com/tendrl
[19:32] <scuttlemonkey> blizzow: no, it caused confusion and delay...so now the management tool will be Tendrl, the API will be Calamari (and Calamari will live mostly within the new ceph-manage daemon that is coming....as I understand it)
[19:33] <scuttlemonkey> hopefully this will help things be marginally clearer than mud :)
[19:34] <blizzow> okay, so there is no ceph provided gui/dashboard for a Jewel cluster.
[19:34] <scuttlemonkey> not embedded or distributed with Ceph by default, no
[19:35] <scuttlemonkey> we're trying to keep people's options open and not force anything
[19:35] <scuttlemonkey> then you can use ceph-dash, openattic, tendrl, etc, etc without needing to care
[19:35] * doppelgrau1 (~Admin@132.252.235.172) Quit (Quit: Leaving.)
[19:35] * flesh (~oftc-webi@static.ip-171-033-130-093.signet.nl) Quit (Quit: Page closed)
[19:38] * madkiss1 (~madkiss@ip5b414b9b.dynamic.kabel-deutschland.de) has joined #ceph
[19:39] * branto (~branto@ip-78-102-208-181.net.upcbroadband.cz) Quit (Quit: Leaving.)
[19:39] <blizzow> I'm not saying that a dash/gui should be installed by default, but I am a little miffed that there used to be one (calamari) and now it's gone with no replacement in a major (Jewel) release.
[19:40] <SamYaple> jordan_c: the cluster network does not need to be routable or reachable from the public network
[19:40] <SamYaple> jordan_c: your monitors should be listening on the public network
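For jordan_c's layout, the relevant ceph.conf settings would look roughly like this (a sketch using the subnets mentioned above): clients and mons speak only on the public network, while OSDs additionally use the cluster network among themselves for replication and recovery traffic.
    [global]
        public network  = 192.168.100.0/24
        cluster network = 10.68.10.0/24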
[19:40] * doppelgrau1 (~doppelgra@132.252.235.172) has joined #ceph
[19:41] * vimal (~vikumar@114.143.165.7) has joined #ceph
[19:41] * ntpttr (~ntpttr@192.55.55.41) Quit (Remote host closed the connection)
[19:42] <jordan_c> SamYaple: ok, that's what I've got, thanks
[19:44] <scuttlemonkey> blizzow: well, the direct successor to "calamari-the-gui" was Romana, which is available at https://github.com/ceph/romana for now
[19:44] <scuttlemonkey> not sure how it was aging
[19:44] <scuttlemonkey> the vast majority of feedback was that people wanted either a) downstream GUI/manager or b) external project
[19:45] * madkiss (~madkiss@ip5b417648.dynamic.kabel-deutschland.de) Quit (Ping timeout: 480 seconds)
[19:45] <scuttlemonkey> so for now I know we have "a" and "b" is coming
[19:45] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:6071:62a3:b364:3e85) Quit (Ping timeout: 480 seconds)
[19:45] <scuttlemonkey> but yes, I definitely hear the complaint
[19:47] * dgurtner (~dgurtner@178.197.235.207) Quit (Ping timeout: 480 seconds)
[19:50] * aj__ (~aj@88.128.80.62) Quit (Ping timeout: 480 seconds)
[19:51] * vimal (~vikumar@114.143.165.7) Quit (Quit: Leaving)
[19:52] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[19:55] * tsg_ (~tgohad@134.134.139.77) has joined #ceph
[19:56] * dyasny_ (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[19:59] * haplo37 (~haplo37@107.190.44.23) has joined #ceph
[20:00] * tsg (~tgohad@192.55.55.41) Quit (Remote host closed the connection)
[20:00] * elsonrodriguez (~oftc-webi@192.55.55.41) has joined #ceph
[20:05] * evelu (~erwan@aut78-1-78-236-183-64.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[20:08] * elsonrodriguez (~oftc-webi@192.55.55.41) Quit (Ping timeout: 480 seconds)
[20:12] * david_ (~david@207.107.71.71) has joined #ceph
[20:17] * ntpttr (~ntpttr@192.55.54.36) has joined #ceph
[20:17] * ntpttr (~ntpttr@192.55.54.36) Quit ()
[20:19] * willi (~willi@p200300774E21C8FC50DB1BB208D139BF.dip0.t-ipconnect.de) Quit ()
[20:25] * compass (~oftc-webi@137.82.12.89) has joined #ceph
[20:26] <compass> hi all, quick question, what's the best filesystem to be used on RBD for production? Does it matter? in terms of performance. Thanks.
[20:29] * walcubi (~walcubi@p5795AB75.dip0.t-ipconnect.de) has joined #ceph
[20:32] <SamYaple> compass: it depends on the workload. weve had good success with btrfs for mostly writes
[20:33] * vbellur1 (~vijay@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[20:36] <compass> SamYaple: Thanks! I'm planning to use it for mariadb storage. Also the docs here: http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/ say btrfs is not recommended, but I assume that is for the filesystem used on the ceph cluster hosts, is it?
[20:37] <walcubi> Doing trial run #3 - and I'm noting a trend here. Latency starts going through the roof the more objects that get written into a pool.
[20:39] <walcubi> Starting with an empty pool, I can get 1500 op/s per OSD. At 18 million objects, it's now only able to write at 60 op/s per OSD.
[20:39] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[20:40] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[20:41] * kcwong (~kcwong@mail.verseon.com) has joined #ceph
[20:41] * madkiss (~madkiss@ip5b414b9b.dynamic.kabel-deutschland.de) has joined #ceph
[20:41] <walcubi> I only have a small setup, 3 storage servers with 6 OSDs. But is it expected to slow down this much as the number of files grows? Should large numbers of tiny files be spread out across more servers?
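For comparison, this kind of small-object write rate can be reproduced with rados bench while watching per-OSD latency (a rough sketch; the pool name, object size and thread count are placeholders). On filestore-backed OSDs, slowdowns as object counts grow are often attributed to PG directory splitting, though that is only a guess here.
    rados bench -p bench 60 write -b 4096 -t 16 --no-cleanup
    ceph osd perf        # per-OSD commit/apply latency while the benchmark runs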
[20:42] <kcwong> Hello. I need some help dealing with a persistent watcher on an RBD that refuses to go away.
[20:43] <walcubi> I'm only consuming 10% of total cluster storage. :-\
[20:44] <kcwong> I'm running infernalis
[20:44] <kcwong> rbd status tells me which host has the watcher but that's about it
[20:45] <kcwong> I don't see anything mounting the device. I've even rebooted that host, but the watcher persists...
[20:45] <kcwong> What can I do?
[20:46] <kcwong> any pointer is welcome
[20:46] * madkiss1 (~madkiss@ip5b414b9b.dynamic.kabel-deutschland.de) Quit (Ping timeout: 480 seconds)
[20:46] * vbellur (~vijay@nat-pool-bos-u.redhat.com) has joined #ceph
[20:46] * Esge (~Uniju@46.166.138.131) has joined #ceph
[20:47] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[20:48] <aNupoisc> Hi guys, I am setting up radosgw with the OpenStack Swift interface and confused about what should go inside the rgw.conf file instead of "FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock"
[20:48] <aNupoisc> The above line is meant for Amazon S3
[20:48] * joshd1 (~jdurgin@2602:30a:c089:2b0:94c2:8d44:7f81:59a2) Quit (Quit: Leaving.)
[20:50] <aNupoisc> Is there something specific to Swift interface?
[20:50] * madkiss1 (~madkiss@ip5b40b803.dynamic.kabel-deutschland.de) has joined #ceph
[20:50] <aNupoisc> I know /var/www/s3gw.fcgi is a file but should something specific go inside it for the Swift interface of OpenStack?
[20:50] <aNupoisc> All inputs are appreciated
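On Jewel the usual route is the civetweb frontend, which drops FastCGI and s3gw.fcgi entirely; Swift access is then granted via a subuser rather than anything in the fcgi file. A sketch with placeholder names:
    # ceph.conf
    [client.rgw.gateway]
        rgw frontends = civetweb port=7480
    # Swift credentials
    radosgw-admin user create --uid=testuser --display-name="Test User"
    radosgw-admin subuser create --uid=testuser --subuser=testuser:swift --access=full
    radosgw-admin key create --subuser=testuser:swift --key-type=swift --gen-secret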
[20:53] * madkiss (~madkiss@ip5b414b9b.dynamic.kabel-deutschland.de) Quit (Ping timeout: 480 seconds)
[20:57] * haplo37 (~haplo37@107.190.44.23) Quit (Ping timeout: 480 seconds)
[20:59] * kcwong (~kcwong@mail.verseon.com) Quit (Remote host closed the connection)
[20:59] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[21:02] * bniver (~bniver@nat-pool-bos-u.redhat.com) Quit (Remote host closed the connection)
[21:04] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[21:05] * pdrakeweb (~pdrakeweb@cpe-71-74-153-111.neo.res.rr.com) Quit (Remote host closed the connection)
[21:05] * pdrakeweb (~pdrakeweb@cpe-71-74-153-111.neo.res.rr.com) has joined #ceph
[21:06] <rkeene> Is it too early to say I hate OpenStack ?
[21:06] * haplo37 (~haplo37@107-190-44-23.cpe.teksavvy.com) has joined #ceph
[21:07] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[21:10] <walcubi> Hmm, maybe I should rebuild using ext4 instead tomorrow. Too late to do anything at this hour.
[21:11] * Miouge (~Miouge@109.128.94.173) Quit (Quit: Miouge)
[21:14] * madkiss (~madkiss@ip5b40b803.dynamic.kabel-deutschland.de) has joined #ceph
[21:14] * madkiss1 (~madkiss@ip5b40b803.dynamic.kabel-deutschland.de) Quit (Read error: Connection reset by peer)
[21:16] * Esge (~Uniju@5AEAAAQ1Q.tor-irc.dnsbl.oftc.net) Quit ()
[21:17] * madkiss1 (~madkiss@ip5b40b803.dynamic.kabel-deutschland.de) has joined #ceph
[21:17] * madkiss (~madkiss@ip5b40b803.dynamic.kabel-deutschland.de) Quit (Read error: No route to host)
[21:18] * joshd1 (~jdurgin@2602:30a:c089:2b0:94c2:8d44:7f81:59a2) has joined #ceph
[21:18] * neurodrone_ (~neurodron@162.243.191.67) has joined #ceph
[21:19] * EthanL (~lamberet@cce02cs4034-fa12-z.ams.hpecore.net) has joined #ceph
[21:19] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[21:20] <SamYaple> rkeene: 2014 called. they want your comments back
[21:20] <SamYaple> walcubi: you should never use ext4 for osds anymore
[21:20] <rkeene> SamYaple, Has OpenStack improved in some way ?
[21:21] <blizzow> rkeene: I went through the exercise of trying to deploy openstack a couple different ways. It was so painful and resource heavy.
[21:21] <blizzow> And never f***ing worked.
[21:22] <rkeene> blizzow, I switched to OpenNebula because modifying OpenStack to do non-dumb things was tedious -- it's so large (~200MB for a base install !) and with terrible error reporting (200 line stack trace telling you "Connection refused", but omitting what it was trying to connect to)
[21:23] <blizzow> I haven't used opennebula, ruby and mysql? ick.
[21:24] <blizzow> But I guess using any frontend will need some kind of grossness.
[21:26] * rendar (~I@host167-157-dynamic.44-79-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[21:26] * EthanL (~lamberet@cce02cs4034-fa12-z.ams.hpecore.net) Quit (Read error: Connection reset by peer)
[21:27] <SamYaple> rkeene: all things improve with time.
[21:27] <SamYaple> rkeene: i still rage quit openstack every once in a while
[21:27] <rkeene> SamYaple, I went through 3 versions of OpenStack and it did not improve
[21:27] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[21:28] <rkeene> I started a re-implementation of OpenStack called StraightStack... but it had so many dumb APIs I gave up
[21:28] <rkeene> I'll probably re-implement OpenNebula, though :-D
[21:29] * madkiss1 (~madkiss@ip5b40b803.dynamic.kabel-deutschland.de) Quit (Ping timeout: 480 seconds)
[21:29] <SamYaple> rkeene: they both have their ups and downs
[21:29] <rkeene> http://chiselapp.com/user/rkeene/repository/reopenstack/index (It started out being named "reopenstack")
[21:37] * pfactum (~post-fact@vulcan.natalenko.name) has joined #ceph
[21:37] * post-factum (~post-fact@vulcan.natalenko.name) Quit (Read error: Connection reset by peer)
[21:39] * vbellur (~vijay@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[21:41] * aj__ (~aj@x590d8d33.dyn.telefonica.de) has joined #ceph
[21:42] * shubjero (~shubjero@107.155.107.246) has left #ceph
[21:44] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) Quit (Quit: billwebb)
[21:45] * vbellur (~vijay@nat-pool-bos-u.redhat.com) has joined #ceph
[21:47] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[21:50] <Aeso> rereopenstack :D
[21:50] <srk> hello, After upgrading a cluster from Hammer to Jewel, cluster status changes to HEALTH_WARN with "crush map has legacy tunables(requires bobtail, min firefly)" message.
[21:50] <SamYaple> srk: you should bump up your tunables to take advantage of the new crush features
[21:50] <SamYaple> or ignore the message
[21:51] <SamYaple> (you can also disable the message in ceph.conf)
[21:51] <srk> If I set it to optimal or Jewel it causes a lot of recovery IO
[21:51] <srk> Which is not an ideal option for production
[21:52] * rendar (~I@host167-157-dynamic.44-79-r.retail.telecomitalia.it) has joined #ceph
[21:52] <srk> SamYaple: Did you see any performance benefits by switching to the new tunables (i.e. Optimal or Jewel)?
[21:53] <SamYaple> there are several areas where improvements are made and noticeable, yes
[21:53] <SamYaple> optimal is jewel in this case btw
[21:53] <SamYaple> default would be hammer I believe
[21:54] <srk> got it thanks. So, ignoring is not an option :)
[21:54] <SamYaple> it is always an option
[21:56] * shaunm (~shaunm@74.83.215.100) has joined #ceph
[21:56] <srk> if it is improving performance, it has to be considered, right?
[21:57] <SamYaple> i would recommend reading up on the different tunables
[21:57] <SamYaple> if you are going from legacy to optimal expect 100% of data to move. but you can reduce that greatly while still getting most of the benefit
[21:57] <SamYaple> if you do it right
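To make that concrete (a sketch; whether to jump straight to the optimal profile depends on how much rebalancing the cluster can absorb): either switch profiles in one step, or decompile the crush map and raise chooseleaf_vary_r gradually (e.g. to 4 or 5 first, which moves less data, then toward the optimal value of 1 later).
    ceph osd crush tunables optimal      # one step: expect a large rebalance coming from legacy
    # incremental route:
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # edit "tunable chooseleaf_vary_r ..." in crush.txt, then:
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new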
[21:59] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[22:00] * pfactum is now known as post-factum
[22:00] <srk> for a Hammer-based cluster, the tunables are left as-is from a default installation.
[22:00] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[22:00] <srk> # ceph osd crush show-tunables
[22:00] <srk> {
[22:00] <srk> "choose_local_tries": 0,
[22:00] <srk> "choose_local_fallback_tries": 0,
[22:00] <srk> "choose_total_tries": 50,
[22:00] <srk> "chooseleaf_descend_once": 1,
[22:00] <srk> "chooseleaf_vary_r": 0,
[22:00] <srk> "straw_calc_version": 1,
[22:00] <srk> "allowed_bucket_algs": 22,
[22:01] <srk> "profile": "unknown",
[22:01] <srk> "optimal_tunables": 0,
[22:01] <srk> "legacy_tunables": 0,
[22:01] <srk> "require_feature_tunables": 1,
[22:01] <srk> "require_feature_tunables2": 1,
[22:01] <srk> "require_feature_tunables3": 0,
[22:01] <srk> "has_v2_rules": 0,
[22:01] <srk> "has_v3_rules": 0,
[22:01] <srk> "has_v4_buckets": 0
[22:01] <srk> }
[22:01] <srk> profile is "unknown"
[22:02] * tsg__ (~tgohad@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[22:02] <srk> and they are considered as "legacy"
[22:02] * kcwong (~kcwong@mail.verseon.com) has joined #ceph
[22:04] * tsg_ (~tgohad@134.134.139.77) Quit (Remote host closed the connection)
[22:05] * shubjero (~shubjero@107.155.107.246) has joined #ceph
[22:05] * shubjero (~shubjero@107.155.107.246) has left #ceph
[22:08] * shubjero (~shubjero@107.155.107.246) has joined #ceph
[22:10] <kcwong> what is the host for the irc logs?
[22:11] <technil> I have an osd with some data as part of a pg recovery that seems to hang on "done with init, starting boot process" - any ideas?
[22:14] * mykola (~Mikolaj@193.93.217.35) Quit (Quit: away)
[22:14] <kcwong> so, do we have a geek on duty?
[22:16] * blizzow (~jburns@50.243.148.102) Quit (Ping timeout: 480 seconds)
[22:17] * cathode (~cathode@50.232.215.114) Quit (Quit: Leaving)
[22:21] * georgem (~Adium@206.108.127.16) Quit (Quit: Leaving.)
[22:22] * dgurtner (~dgurtner@46.189.28.53) has joined #ceph
[22:23] <kcwong> crickets
[22:23] * technil sees more logging info now, looks like I just have to be patient
[22:25] * gregmark (~Adium@68.87.42.115) has joined #ceph
[22:26] <kcwong> any ideas how to deal w/ phantom watchers?
[22:27] * `Jin (~Qiasfah@tor.yrk.urgs.uk0.bigv.io) has joined #ceph
[22:29] <SamYaple> kcwong: when trying to do something that is blocked by watchers, they time out after 30 seconds
[22:29] <SamYaple> kcwong: unless they are still active...
[22:30] <SamYaple> you can check the watchers and deal with them accordingly
[22:30] <kcwong> thanks, @samyaple
[22:30] <kcwong> rbd status tells me the host, but what installs watchers?
[22:30] <kcwong> how do i find them?
[22:31] <kcwong> i've rebooted the host identified in rbd status, but the watcher persists; rbd showmapped lists nothing, so i'm at an impasse
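One way to dig further (a rough sketch; pool, image name and address are placeholders, assuming a format 2 image): list the watchers directly on the image's header object, and if the client really is gone, blacklist its address so the watch expires.
    rbd info rbd/myimage | grep block_name_prefix     # e.g. rbd_data.1234abcd -> header object is rbd_header.1234abcd
    rados -p rbd listwatchers rbd_header.1234abcd
    ceph osd blacklist add 192.168.0.42:0/3000000001  # use the exact addr reported by listwatchers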
[22:33] <scuttlemonkey> kcwong: typically irc logs are at irclogs.ceph.widodh.nl but the frontend appears to be down for now
[22:34] <scuttlemonkey> sent Wido an email, but he is likely offline until tomorrow
[22:36] * Jeffrey4l_ (~Jeffrey@110.252.45.235) has joined #ceph
[22:36] <kcwong> thanks, scuttlemonkey.
[22:37] <kcwong> that host is actually off DNS
[22:37] <scuttlemonkey> kcwong: yeah, I'm guessing he moved it. I asked him if he wants to put it on ceph.com somewhere
[22:38] <scuttlemonkey> probably should anyway. Anonymized stats are on metrics.ceph.com
[22:38] <scuttlemonkey> when the new kibana front end to metrics.ceph lands in the next month or two we should have full message history as well
[22:38] <scuttlemonkey> then it'll be easy to find
[22:38] * Jeffrey4l__ (~Jeffrey@110.252.57.249) Quit (Ping timeout: 480 seconds)
[22:39] <kcwong> cool!
[22:45] <kcwong> back to watchers... any suggestions/pointers
[22:45] <kcwong> i can't remove the darn rbd.
[22:46] <jordan_c> any clue what's happening or why I would have a HEALTH_OK cluster with 0 pools in it, 10/10 osds up and in, and the moment I create a pool osds drop like flies =\
[22:48] <SamYaple> jordan_c: verify all nodes can talk to each other
[22:48] <SamYaple> jordan_c: is this a special setup in any way (are you using docker or other things that would make this a non-standard deploy?)
[22:49] <jordan_c> SamYaple: all kvm vms
[22:49] <jordan_c> test lab
[22:50] <jordan_c> https://paste.fedoraproject.org/400928/70257436/
[22:56] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[22:56] <SamYaple> how many osd hosts do you have?
[22:56] * vbellur (~vijay@nat-pool-bos-u.redhat.com) Quit (Remote host closed the connection)
[22:57] * `Jin (~Qiasfah@9YSAAA3I0.tor-irc.dnsbl.oftc.net) Quit ()
[22:58] * scg (~zscg@181.122.37.47) has joined #ceph
[22:58] <jordan_c> SamYaple: 3 monitors 5 osd hosts
[23:02] * toast-work (~arabassa@pool-71-255-253-39.washdc.fios.verizon.net) has joined #ceph
[23:03] <toast-work> hey all, I have an architectural question from someone who is just learning ceph. I'm looking at installing ceph clusters for storage at several sites, and I'm a little curious about how that works with the ceph object gateway
[23:03] <toast-work> if i want to provide DR between sites over slower links, etc
[23:04] <toast-work> some of the sites have fast links between each other that can support synchronous replication, others have slow links where something like a nightly snapshot of volumes is all that it can handle
[23:04] <toast-work> what I don't understand is twofold: what options are there for that type of replication, and once I set it up, what are realistic DR scenarios?
[23:05] <toast-work> (if i'm approaching this wrong I have no problem with that, just trying to get an idea of what I should be working towards as I set this up and experiment)
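For background, the Jewel-era building blocks here are all asynchronous (a sketch; pool, image and peer names are placeholders): rbd-mirror for journal-based RBD replication over decent links, periodic snapshots plus rbd export-diff for the slow links, and RGW multisite for object storage.
    rbd feature enable mypool/myimage journaling        # journaling (requires exclusive-lock) enables mirroring
    rbd mirror pool enable mypool pool
    rbd mirror pool peer add mypool client.remote@backup-cluster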
[23:07] * tsg_ (~tgohad@134.134.139.77) has joined #ceph
[23:12] * tsg__ (~tgohad@jfdmzpr06-ext.jf.intel.com) Quit (Remote host closed the connection)
[23:13] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) Quit (Ping timeout: 480 seconds)
[23:14] <SamYaple> jordan_c: and all osds can talk to all the other osd hosts on the public (and cluster) network ips?
[23:15] <jordan_c> SamYaple: what's the best way to test that, which ports?
[23:16] <SamYaple> i think the default port range is 6800-7300 for osds jordan_c, but im mainly talking about normal network connectivity
[23:17] <SamYaple> can they all ping each other? use iperf to get an estimated bandwidth between them
[23:17] <SamYaple> its possible you are running out of bandwidth for the health checks, causing the flapping
[23:17] <SamYaple> but if the osds cant talk to each other (but the mons can talk to the osds) youll see lots of flapping
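A quick way to check exactly that (a sketch using the addresses from the log lines above): from each OSD host, test reachability and rough bandwidth to every other OSD host on both networks, and confirm the OSD daemons are listening in the expected port range.
    ping -c 3 10.68.10.16
    iperf -s                     # on the remote host
    iperf -c 10.68.10.16         # rough bandwidth estimate
    ss -tlnp | grep ceph-osd     # OSDs normally bind in 6800-7300
    nc -zv 10.68.10.16 6800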
[23:20] * Aal (~Altitudes@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[23:25] <jordan_c> ah, I may have some issues with my cluster network routing
[23:27] * danieagle (~Daniel@177.9.73.107) has joined #ceph
[23:27] * doppelgrau_ (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) has joined #ceph
[23:29] * johnavp1989 (~jpetrini@8.39.115.8) Quit (Ping timeout: 480 seconds)
[23:32] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Ping timeout: 480 seconds)
[23:32] * doppelgrau_ is now known as doppelgrau
[23:32] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[23:34] * ircolle (~Adium@2601:285:201:633a:59c8:f7d9:15e0:9e37) has joined #ceph
[23:36] * chengpeng__ (~chengpeng@180.168.126.179) has joined #ceph
[23:41] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[23:42] * chengpeng (~chengpeng@180.168.170.2) Quit (Ping timeout: 480 seconds)
[23:42] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[23:45] * bb0x (~bb0x@78.97.194.150) Quit (Quit: This computer has gone to sleep)
[23:49] <kcwong> i guess i'll fire off an email to the mailing list then.
[23:49] <technil> just so people know if they are following my issues at a later date, I had a uuid mismatch.
[23:49] * Aal (~Altitudes@61TAAA2KA.tor-irc.dnsbl.oftc.net) Quit ()
[23:50] <technil> get the correct uuid with: sudo ceph-osd --get-osd-fsid -i {id}
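To see what that should match (a sketch; the osd id is a placeholder), compare it against the uuid recorded on the data partition and in the osdmap:
    sudo cat /var/lib/ceph/osd/ceph-2/fsid      # uuid on the data partition
    ceph osd dump | grep '^osd.2 '              # uuid registered in the osdmap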
[23:50] * dnunez (~dnunez@nat-pool-bos-t.redhat.com) Quit (Remote host closed the connection)
[23:55] * mattbenjamin (~mbenjamin@12.118.3.106) Quit (Quit: Leaving.)
[23:58] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.