#ceph IRC Log

IRC Log for 2016-08-04

Timestamps are in GMT/BST.

[0:00] * aNupoisc (~adnavare@192.55.55.41) Quit (Remote host closed the connection)
[0:01] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[0:04] * fsimonce (~simon@host203-44-dynamic.183-80-r.retail.telecomitalia.it) Quit (Quit: Coyote finally caught me)
[0:05] <SamYaple> jordan_c: yep. that sounds like the issue
[0:06] <SamYaple> kcwong: i would check the host identified as a watcher to make sure it doesnt have an open connection even after reboot
[0:06] <SamYaple> kcwong: check the active connections and track that back to a process
[0:07] <kcwong> as in netstat?
[0:07] <SamYaple> kcwong: yea netstat/ss
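One way to run the check being suggested, as a minimal sketch; the port ranges below are the Ceph defaults (6789 for monitors, 6800 and up for OSDs) and may differ on a tuned cluster:

    # on the suspect watcher host: established TCP connections towards mon/OSD ports,
    # with the owning pid/program in the last column so it can be tracked to a process
    ss -tnp state established | grep -E ':(6789|68[0-9]{2}|7[0-2][0-9]{2})'
    # netstat equivalent
    netstat -tnp | grep -E ':(6789|68[0-9]{2}|7[0-2][0-9]{2})'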
[0:11] * danieagle (~Daniel@177.9.73.107) Quit (Quit: Obrigado por Tudo! :-) inte+ :-))
[0:11] * rendar (~I@host167-157-dynamic.44-79-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[0:15] <kcwong> samyaple: do you expect to see a connection to some rbd-client asok? I'm not seeing anything suspicious with netstat -np
[0:15] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[0:17] * [0x4A6F]_ (~ident@p508CDE78.dip0.t-ipconnect.de) has joined #ceph
[0:18] <SamYaple> kcwong: if its a client node there shouldnt be _any_ connections
[0:18] * [0x4A6F] (~ident@0x4a6f.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:18] * [0x4A6F]_ is now known as [0x4A6F]
[0:28] * vbellur (~vijay@2601:18f:700:55b0:5e51:4fff:fee8:6a5c) has joined #ceph
[0:32] <kcwong> this happens to be a client node. there is no asok under /run/ceph/rbd-clients. i don't see any established connection to ceph-mon.
[0:35] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[0:36] * aNupoisc (~adnavare@192.55.55.37) has joined #ceph
[0:37] * kuku (~kuku@119.93.91.136) has joined #ceph
[0:38] * kuku (~kuku@119.93.91.136) Quit ()
[0:40] * kuku (~kuku@119.93.91.136) has joined #ceph
[0:41] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Quit: Konversation terminated!)
[0:44] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[0:55] <kcwong> in the output of rbd status, do any of those numbers after the IP mean anything?
[0:56] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[0:56] <kcwong> I think the client.<number> seems to map to the host
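For reference, rbd status lists an image's watchers; in the line below (image name and numbers are illustrative, not from this cluster) the part after the IP is the client's nonce, client.<number> is the global session ID the monitors assigned to that librbd client, and cookie identifies the watch itself - the IP is usually what ties the watcher back to a host:

    $ rbd status rbd/myimage
    Watchers:
            watcher=192.168.0.10:0/1234567890 client.34567 cookie=139876543210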
[0:59] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[1:00] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[1:03] * dgurtner (~dgurtner@46.189.28.53) Quit (Ping timeout: 480 seconds)
[1:09] * ircolle (~Adium@2601:285:201:633a:59c8:f7d9:15e0:9e37) Quit (Quit: Leaving.)
[1:09] * kcwong (~kcwong@mail.verseon.com) Quit (Quit: Leaving...)
[1:28] * kuku (~kuku@119.93.91.136) Quit (Quit: computer sleep)
[1:39] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[1:39] * hoonetorg (~hoonetorg@77.119.226.254.static.drei.at) Quit (Quit: Leaving)
[1:43] * hoonetorg (~hoonetorg@77.119.226.254.static.drei.at) has joined #ceph
[1:46] * technil (~technil@host.cctv.org) Quit (Quit: Ex-Chat)
[1:52] * oms101 (~oms101@p20030057EA02A100C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:57] * theTrav (~theTrav@ipc032.ipc.telstra.net) has joined #ceph
[2:01] * oms101 (~oms101@p20030057EA031900C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[2:09] * tsg_ (~tgohad@134.134.139.77) Quit (Remote host closed the connection)
[2:28] * BrianA (~BrianA@fw-rw.shutterfly.com) Quit (Read error: Connection reset by peer)
[2:30] * Maariu5_ (~Silentspy@213.61.149.100) has joined #ceph
[2:39] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[2:42] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[2:43] * xarses_ (~xarses@64.124.158.192) Quit (Ping timeout: 480 seconds)
[2:44] * haplo37 (~haplo37@107-190-44-23.cpe.teksavvy.com) Quit (Ping timeout: 480 seconds)
[2:46] * cgxu (~cgxu@220.249.249.31) has joined #ceph
[2:46] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:51] * yanzheng1 (~zhyan@125.70.20.176) has joined #ceph
[2:52] * compass (~oftc-webi@137.82.12.89) Quit (Ping timeout: 480 seconds)
[2:56] * yanzheng (~zhyan@125.70.20.176) Quit (Ping timeout: 480 seconds)
[2:59] * xarses_ (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) has joined #ceph
[3:00] * Maariu5_ (~Silentspy@61TAAA2N1.tor-irc.dnsbl.oftc.net) Quit ()
[3:04] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[3:10] * aNupoisc (~adnavare@192.55.55.37) Quit (Remote host closed the connection)
[3:11] * cronburg (~cronburg@209-6-121-249.c3-0.arl-ubr1.sbo-arl.ma.cable.rcn.com) has joined #ceph
[3:16] * andrew (~andrew@38.123.99.230) has joined #ceph
[3:20] * theTrav_ (~theTrav@203.35.9.142) has joined #ceph
[3:23] * davidzlap (~Adium@cpe-172-91-154-245.socal.res.rr.com) Quit (Quit: Leaving.)
[3:26] * theTrav (~theTrav@ipc032.ipc.telstra.net) Quit (Read error: Connection timed out)
[3:27] * georgem (~Adium@76-10-180-154.dsl.teksavvy.com) has joined #ceph
[3:27] * Inverness (~puvo@tor2r.ins.tor.net.eu.org) has joined #ceph
[3:28] * georgem (~Adium@76-10-180-154.dsl.teksavvy.com) Quit ()
[3:35] * derjohn_mobi (~aj@x590db2b0.dyn.telefonica.de) has joined #ceph
[3:41] * Racpatel (~Racpatel@2601:87:0:24af::53d5) Quit (Read error: No route to host)
[3:42] * aj__ (~aj@x590d8d33.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[3:43] * sebastian-w_ (~quassel@212.218.8.139) has joined #ceph
[3:44] * sebastian-w (~quassel@212.218.8.138) Quit (Ping timeout: 480 seconds)
[3:45] * scg (~zscg@181.122.37.47) Quit (Ping timeout: 480 seconds)
[3:46] * haomaiwang (~oftc-webi@114.249.239.114) Quit (Ping timeout: 480 seconds)
[3:52] * cronburg (~cronburg@209-6-121-249.c3-0.arl-ubr1.sbo-arl.ma.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[3:52] * newdave (~newdave@14-202-180-170.tpgi.com.au) has joined #ceph
[3:55] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[3:57] * kefu (~kefu@183.193.165.164) has joined #ceph
[3:57] * Inverness (~puvo@5AEAAAQ92.tor-irc.dnsbl.oftc.net) Quit ()
[3:59] * jmn (~jmn@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[4:00] * mhack (~mhack@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[4:01] * mhack (~mhack@nat-pool-bos-t.redhat.com) has joined #ceph
[4:03] * jmn (~jmn@nat-pool-bos-t.redhat.com) has joined #ceph
[4:08] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[4:09] * theTrav_ (~theTrav@203.35.9.142) Quit (Remote host closed the connection)
[4:10] * blip2 (~Wizeon@108.61.123.84) has joined #ceph
[4:17] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) has joined #ceph
[4:17] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[4:18] * jbiao (~jbiao@66.187.239.16) has joined #ceph
[4:26] * theTrav (~theTrav@1.136.97.83) has joined #ceph
[4:28] <andrew> Hi all, I followed the guide to build the Ceph code: http://docs.ceph.com/docs/master/install/build-ceph/
[4:28] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[4:28] <andrew> the output of the binary ceph-osd is about 165M
[4:30] * kefu (~kefu@183.193.165.164) Quit (Max SendQ exceeded)
[4:31] * kefu (~kefu@183.193.165.164) has joined #ceph
[4:39] <andrew> while the official build is only 15M+
[4:40] * blip2 (~Wizeon@108.61.123.84) Quit ()
[4:45] * andrew (~andrew@38.123.99.230) has left #ceph
[4:51] * theTrav (~theTrav@1.136.97.83) Quit (Read error: No route to host)
[4:58] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[5:03] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[5:05] * jfaj (~jan@p20030084AF3665005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[5:06] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[5:08] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[5:11] * theTrav (~theTrav@1.136.97.83) has joined #ceph
[5:14] * theTrav (~theTrav@1.136.97.83) Quit (Remote host closed the connection)
[5:14] * jfaj (~jan@p20030084AF3993005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) has joined #ceph
[5:25] * CobraKhan007 (~Swompie`@nl3x.mullvad.net) has joined #ceph
[5:35] * theTrav (~theTrav@CPE-124-188-218-238.sfcz1.cht.bigpond.net.au) has joined #ceph
[5:37] * vimal (~vikumar@114.143.165.7) has joined #ceph
[5:44] * Vacuum_ (~Vacuum@88.130.192.154) has joined #ceph
[5:46] <SamYaple> andrew: you probably need to strip it, that would be my guess
[5:49] <Kingrat> that seems like a bit much for that though, maybe they did a statically linked build?
[5:51] * Vacuum__ (~Vacuum@88.130.199.60) Quit (Ping timeout: 480 seconds)
[5:53] <SamYaple> Kingrat: oh jeez. i thought that was 16M
[5:53] <SamYaple> Kingrat: static build seems more likely
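A rough way to check both theories against a freshly built binary; the path is an assumption from a typical in-tree build:

    file src/ceph-osd          # look for "not stripped" and "statically linked"
    du -h src/ceph-osd
    # keep the debug info aside, strip the binary, and compare sizes
    objcopy --only-keep-debug src/ceph-osd ceph-osd.debug
    strip -s src/ceph-osd
    du -h src/ceph-osd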
[5:55] * CobraKhan007 (~Swompie`@9YSAAA3SR.tor-irc.dnsbl.oftc.net) Quit ()
[6:00] * ZombieTree (~KUSmurf@212.7.196.82) has joined #ceph
[6:00] * kefu is now known as kefu|afk
[6:01] * walcubi_ (~walcubi@p5795BD0E.dip0.t-ipconnect.de) has joined #ceph
[6:05] * neurodrone_ (~neurodron@162.243.191.67) Quit (Quit: neurodrone_)
[6:08] * haomaiwang (~oftc-webi@61.149.85.206) has joined #ceph
[6:08] * walcubi (~walcubi@p5795AB75.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[6:10] * kefu|afk (~kefu@183.193.165.164) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[6:10] * ira (~ira@121.244.87.117) has joined #ceph
[6:13] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[6:20] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[6:24] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[6:26] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[6:28] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[6:30] * ZombieTree (~KUSmurf@26XAAATIE.tor-irc.dnsbl.oftc.net) Quit ()
[6:31] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[6:31] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[6:35] * vimal (~vikumar@114.143.165.7) Quit (Quit: Leaving)
[6:38] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[6:38] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[6:39] * demonspork (~hassifa@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[6:39] * kefu (~kefu@114.92.96.253) has joined #ceph
[6:40] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[6:41] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[6:46] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[6:47] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[6:48] * kefu (~kefu@114.92.96.253) has joined #ceph
[6:49] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[6:50] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[6:52] * cmorandin_ (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) has joined #ceph
[6:52] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[6:54] * vimal (~vikumar@121.244.87.116) has joined #ceph
[6:57] * icey (~Chris@0001bbad.user.oftc.net) Quit (Remote host closed the connection)
[6:57] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[6:58] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[6:58] * nass5 (~fred@l-p-dn-in-12a.lionnois.site.univ-lorraine.fr) Quit (Read error: Connection reset by peer)
[6:58] * cmorandin_ (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) Quit (Remote host closed the connection)
[6:59] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[7:01] * icey (~Chris@pool-74-103-175-25.phlapa.fios.verizon.net) has joined #ceph
[7:02] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:03] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[7:04] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[7:06] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:06] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[7:08] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[7:09] * demonspork (~hassifa@5AEAAARDW.tor-irc.dnsbl.oftc.net) Quit ()
[7:12] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:12] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[7:14] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:14] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[7:21] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:22] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[7:22] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:22] * W|ldCraze (~MKoR@ip95.ip-94-23-150.eu) has joined #ceph
[7:23] * swami1 (~swami@49.44.57.239) has joined #ceph
[7:23] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[7:24] * reed (~reed@184-23-0-196.dsl.static.fusionbroadband.com) Quit (Quit: Ex-Chat)
[7:26] * swami2 (~swami@49.38.0.169) has joined #ceph
[7:26] * davidzlap1 (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[7:26] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Quit: Leaving.)
[7:31] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:31] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[7:31] * swami1 (~swami@49.44.57.239) Quit (Ping timeout: 480 seconds)
[7:33] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:33] * davidzlap1 (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Quit: Leaving.)
[7:34] * haplo37 (~haplo37@107.190.44.23) has joined #ceph
[7:34] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[7:41] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:42] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[7:42] * swami2 (~swami@49.38.0.169) Quit (Read error: Connection timed out)
[7:43] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:43] * swami1 (~swami@49.38.0.169) has joined #ceph
[7:44] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[7:45] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[7:45] * karnan (~karnan@121.244.87.117) has joined #ceph
[7:46] * haplo37 (~haplo37@107.190.44.23) Quit (Ping timeout: 480 seconds)
[7:47] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[7:47] * rdas (~rdas@121.244.87.116) has joined #ceph
[7:49] * jbiao (~jbiao@66.187.239.16) Quit (Quit: Leaving)
[7:50] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[7:50] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:51] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[7:52] * W|ldCraze (~MKoR@61TAAA2UX.tor-irc.dnsbl.oftc.net) Quit ()
[7:52] * Lite (~mrapple@proxy3.6khz.com) has joined #ceph
[7:53] * baojg (~baojg@61.135.155.34) Quit (Quit: Leaving)
[7:53] * baojg (~baojg@61.135.155.34) has joined #ceph
[7:55] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:56] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[8:00] * bb0x (~bb0x@78.97.194.150) Quit (Quit: This computer has gone to sleep)
[8:00] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[8:01] * swami1 (~swami@49.38.0.169) Quit (Read error: Connection timed out)
[8:01] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[8:02] * swami1 (~swami@49.38.0.169) has joined #ceph
[8:02] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[8:07] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[8:07] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[8:07] * EthanL (~lamberet@cce02cs4035-fa12-z.ams.hpecore.net) has joined #ceph
[8:08] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[8:10] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[8:11] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[8:16] * rwheeler (~rwheeler@121.244.87.118) has joined #ceph
[8:16] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[8:16] * EthanL (~lamberet@cce02cs4035-fa12-z.ams.hpecore.net) Quit (Ping timeout: 480 seconds)
[8:16] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[8:17] * Miouge (~Miouge@109.128.94.173) has joined #ceph
[8:18] * swami1 (~swami@49.38.0.169) Quit (Read error: Connection timed out)
[8:19] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[8:20] * swami1 (~swami@49.38.0.169) has joined #ceph
[8:22] * Lite (~mrapple@9YSAAA3V2.tor-irc.dnsbl.oftc.net) Quit ()
[8:23] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[8:25] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[8:26] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[8:27] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[8:28] <haomaiwang> kefu: still has concern on this https://github.com/ceph/ceph/pull/10264?
[8:29] <kefu> haomaiwang, sorry, i am on an customer issue atm.
[8:30] <haomaiwang> kefu: ok
[8:31] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[8:32] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[8:33] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[8:35] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[8:35] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[8:36] * EthanL (~lamberet@cce02cs4035-fa12-z.ams.hpecore.net) has joined #ceph
[8:36] * Kidlvr (~Mousey@tsn109-201-154-156.dyn.nltelcom.net) has joined #ceph
[8:42] * jrowe_ (~jrowe@204.14.236.152) Quit (Read error: Connection reset by peer)
[8:42] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[8:42] * jrowe (~jrowe@204.14.236.152) has joined #ceph
[8:44] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[8:46] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) has joined #ceph
[8:48] * rendar (~I@host208-116-dynamic.51-82-r.retail.telecomitalia.it) has joined #ceph
[8:49] * dgurtner (~dgurtner@178.197.225.90) has joined #ceph
[8:54] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[8:54] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[8:56] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) has joined #ceph
[8:56] * EthanL (~lamberet@cce02cs4035-fa12-z.ams.hpecore.net) Quit (Read error: Connection reset by peer)
[9:01] * rendar (~I@host208-116-dynamic.51-82-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[9:02] * branto (~branto@ip-78-102-208-181.net.upcbroadband.cz) has joined #ceph
[9:03] * rendar (~I@host208-116-dynamic.51-82-r.retail.telecomitalia.it) has joined #ceph
[9:05] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[9:06] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[9:06] * Kidlvr (~Mousey@tsn109-201-154-156.dyn.nltelcom.net) Quit ()
[9:14] * andrew (~andrew@ec2-54-65-196-108.ap-northeast-1.compute.amazonaws.com) has joined #ceph
[9:14] * andrew (~andrew@ec2-54-65-196-108.ap-northeast-1.compute.amazonaws.com) has left #ceph
[9:15] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[9:16] * analbeard (~shw@support.memset.com) has joined #ceph
[9:20] * wushudoin (~wushudoin@2601:646:8281:cfd:2ab2:bdff:fe0b:a6ee) Quit (Ping timeout: 480 seconds)
[9:21] * chengpeng_ (~chengpeng@180.168.126.179) has joined #ceph
[9:21] * bb0x (~bb0x@78.97.194.150) Quit (Quit: This computer has gone to sleep)
[9:23] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[9:23] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[9:28] * chengpeng__ (~chengpeng@180.168.126.179) Quit (Ping timeout: 480 seconds)
[9:28] * mollstam (~galaxyAbs@tsn109-201-154-156.dyn.nltelcom.net) has joined #ceph
[9:29] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[9:31] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[9:40] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[9:40] * walcubi_ is now known as walbuci
[9:40] <walbuci> <SamYaple> walcubi: you should never use ext4 for osds anymore
[9:40] <walbuci> SamYaple, I'm using btrfs currently.
[9:40] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Quit: Nettalk6 - www.ntalk.de)
[9:41] * karnan (~karnan@121.244.87.117) Quit (Ping timeout: 480 seconds)
[9:42] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[9:47] <walbuci> I'm now at 18.6 million objects in the pool
[9:47] * boolman (boolman@79.138.78.238) has joined #ceph
[9:47] <walbuci> It's only grown by 600K overnight!
[9:48] <walbuci> https://s31.postimg.org/4p4obmluj/ceph_average_ops.png
[9:48] <walbuci> All I'm doing is pushing variously sized images into a pool using rados_aio_write()
[9:49] <walbuci> Each has a unique key.
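A shell-level approximation of that many-small-objects workload, useful for comparing the cluster against the librados client; the pool name, object size and thread count here are assumptions:

    # write lots of unique 4 KiB objects into a scratch pool and keep them
    rados bench -p testpool 60 write -b 4096 -t 16 --no-cleanup
    # count what actually landed
    rados -p testpool ls | wc -l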
[9:49] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:6071:62a3:b364:3e85) has joined #ceph
[9:50] * derjohn_mobi (~aj@x590db2b0.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[9:51] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[9:52] <walbuci> In total, there will be some 1.7 billion objects held in ceph should this go live.
[9:53] * fsimonce (~simon@host203-44-dynamic.183-80-r.retail.telecomitalia.it) has joined #ceph
[9:53] <walbuci> And it's struggling with *huge* latency problems at 18 million. :-\
[9:56] * bb0x (~bb0x@78.97.194.150) Quit (Quit: This computer has gone to sleep)
[9:58] * mollstam (~galaxyAbs@tsn109-201-154-156.dyn.nltelcom.net) Quit ()
[10:00] * dgurtner (~dgurtner@178.197.225.90) Quit (Ping timeout: 480 seconds)
[10:00] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Quit: Ex-Chat)
[10:02] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[10:03] * DanFoster (~Daniel@office.34sp.com) has joined #ceph
[10:08] <walbuci> Argh... it could be one of the disks is going in the bin.
[10:08] <walbuci> One of the OSDs committed heartbeat suicide at 9am this morning
[10:09] <walbuci> common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")
[10:09] <walbuci> Morbidly cool.
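A suicide-timeout abort is often, though not always, a sign of a dying or badly stalling disk; a minimal sketch of the usual checks, with the device name and OSD id as placeholders:

    dmesg | grep -iE 'sd[a-z]+|i/o error|blocked'        # kernel-level disk errors
    smartctl -a /dev/sdX | grep -iE 'result|realloc|pending|uncorrect'
    grep -B 20 'hit suicide timeout' /var/log/ceph/ceph-osd.NN.log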
[10:10] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[10:11] * b0e (~aledermue@213.95.25.82) has joined #ceph
[10:16] * derjohn_mobi (~aj@2001:6f8:1337:0:a5b7:91e8:ccb2:1eff) has joined #ceph
[10:17] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[10:17] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[10:18] * bb0x (~bb0x@5.2.199.244) has joined #ceph
[10:19] * doppelgrau1 is now known as doppelgrau
[10:19] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[10:21] * Spikey (~SquallSee@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[10:27] * karnan (~karnan@121.244.87.117) has joined #ceph
[10:28] * bb0x (~bb0x@5.2.199.244) Quit (Quit: This computer has gone to sleep)
[10:30] * bitserker (~toni@81.184.9.72.dyn.user.ono.com) has joined #ceph
[10:33] * dgurtner (~dgurtner@178.197.225.90) has joined #ceph
[10:34] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) Quit (Quit: treenerd_)
[10:38] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[10:40] * vicente_ (~~vicente@125-227-238-55.HINET-IP.hinet.net) has joined #ceph
[10:40] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) Quit (Read error: Connection reset by peer)
[10:41] * swami2 (~swami@49.44.57.239) has joined #ceph
[10:42] * bb0x (~bb0x@5.2.199.244) has joined #ceph
[10:44] * vicente_ (~~vicente@125-227-238-55.HINET-IP.hinet.net) has left #ceph
[10:45] * karnan (~karnan@121.244.87.117) Quit (Ping timeout: 480 seconds)
[10:45] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) has joined #ceph
[10:45] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[10:46] * kefu (~kefu@114.92.96.253) has joined #ceph
[10:46] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[10:47] * swami1 (~swami@49.38.0.169) Quit (Ping timeout: 480 seconds)
[10:51] * haomaiwang (~oftc-webi@61.149.85.206) Quit (Quit: Page closed)
[10:51] * theTrav (~theTrav@CPE-124-188-218-238.sfcz1.cht.bigpond.net.au) Quit (Remote host closed the connection)
[10:51] * Spikey (~SquallSee@5AEAAARIT.tor-irc.dnsbl.oftc.net) Quit ()
[10:55] * theTrav (~theTrav@CPE-124-188-218-238.sfcz1.cht.bigpond.net.au) has joined #ceph
[11:04] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[11:06] * mehmet (~top@84.254.67.15) has joined #ceph
[11:08] * haomaiwang (~oftc-webi@61.149.85.206) has joined #ceph
[11:10] <mehmet> Hello Guys,
[11:10] <mehmet> hope you can help
[11:11] * TMM (~hp@185.5.121.201) has joined #ceph
[11:12] <mehmet> i have one PG (really only one!) where, when a deep scrub on it is running, i get slow/blocked requests and my vms stop responding for a while
[11:13] <mehmet> i have checked the disks, the filesystem, set tunables from jewel back to default, and removed the initial disks in the acting set, but the issue still exists with different disks.
[11:15] <mehmet> any ideas where the issue could be? I am using ceph Jewel 10.2.2
[11:15] * newdave (~newdave@14-202-180-170.tpgi.com.au) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[11:17] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Read error: Connection reset by peer)
[11:20] * cyphase (~cyphase@c-50-148-131-137.hsd1.ca.comcast.net) has joined #ceph
[11:20] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[11:24] * dgurtner (~dgurtner@178.197.225.90) Quit (Read error: No route to host)
[11:35] * dgurtner (~dgurtner@178.197.225.90) has joined #ceph
[11:38] * swami1 (~swami@49.38.0.169) has joined #ceph
[11:38] <mehmet> 3 OSD nodes with 10 OSDs each (3 OSDs now removed - 1 from each OSD node to be sure that the issue is not from the installed disks)
[11:39] <mehmet> 3 dedicated Mon Nodes
[11:39] <mehmet> The OSD servers use one NVMe Intel P3700 as the journal for 10 disks
[11:39] * jfaj (~jan@p20030084AF3993005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[11:41] <mehmet> My problem is only when deep-scrub on PG 0.223 starts. All other PGs do not lead to blocked requests.
[11:42] <mehmet> 1 Pool; 1024 PGs
[11:44] * swami2 (~swami@49.44.57.239) Quit (Ping timeout: 480 seconds)
[11:44] * mehmet is now known as mehmetpg0223
[11:46] * baojg (~baojg@61.135.155.34) Quit (Read error: Connection reset by peer)
[11:50] * jfaj (~jan@p20030084AF61CD005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) has joined #ceph
[11:54] * swami1 (~swami@49.38.0.169) Quit (Read error: Connection timed out)
[11:55] * cgxu (~cgxu@220.249.249.31) Quit (Ping timeout: 480 seconds)
[11:56] * swami1 (~swami@49.38.0.169) has joined #ceph
[11:57] * baojg (~baojg@61.135.155.34) has joined #ceph
[11:59] * karnan (~karnan@106.51.128.115) has joined #ceph
[12:00] <IvanJobs> yep
[12:04] <mehmetpg0223> yep2me?
[12:08] * thomnico (~thomnico@2a01:e35:8b41:120:1dcb:a4e9:24ee:41d4) has joined #ceph
[12:18] * cgxu (~cgxu@220.249.249.31) has joined #ceph
[12:20] * wjw-freebsd2 (~wjw@smtp.digiware.nl) has joined #ceph
[12:22] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[12:22] <ivve> mehmet: you have no other issues in the cluster?
[12:23] * kefu (~kefu@114.92.96.253) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[12:24] <mehmetpg0223> hi ivve, that's right, no other issues. i would post the ceph -s output now, but there is recovery in progress after 'ceph osd crush tunables jewel', so it would not prove anything right now.
[12:25] <mehmetpg0223> It is really this ONE PG! *argh* i do not know what is going on there...
[12:25] <mehmetpg0223> i have done many checks to find the cause but no luck till now
[12:26] <ivve> have you tried to figure out what is on the PG?
[12:26] <mehmetpg0223> i posted this to the mailing list a few days ago but have not gotten any response, so i want to try via irc: http://www.spinics.net/lists/ceph-users/msg30016.html
[12:28] <mehmetpg0223> sure! And ceph does not find any error on it while the deep scrub is running. It always takes ~15 min.
[12:29] <ivve> if you haven't already, i'd try to check where the pg is active and check the osd for logs
[12:29] <ivve> see what's up
[12:30] <ivve> ceph pg 0.223 query
[12:30] * blip2 (~TehZomB@nl3x.mullvad.net) has joined #ceph
[12:30] <ivve> i'd look through that query as well for clues
[12:31] <mehmetpg0223> what i see there with debug_osd=5/5 is a lot of trimming; the output of the query:
[12:31] <ivve> are you running proxmox or smth?
[12:32] * kefu (~kefu@183.193.165.164) has joined #ceph
[12:33] <mehmetpg0223> there my last query: http://slexy.org/view/s21d6qUqnV
[12:35] <mehmetpg0223> i am using proxmox but not the ceph provided with it. I've set up a clean ceph-only cluster with jewel (no proxmox influence)
[12:35] <ivve> okay
[12:37] <ivve> try tailing /var/log/ceph/ceph-osd.9.log and start a deep scrub if it doesn't show anything useful try turning on debugging for that osd
[12:38] <ivve> deep scrub on the pg that is
[12:39] <ivve> you could grep for [ERR] and the pg number
[12:39] <ivve> or just the pg
[12:40] <mehmetpg0223> i have done that many times; what i always see with debug-osd 5/5 is trimming, and a few seconds after i start the deep scrub the slow/blocked requests follow. I have provided the log from the process in the mailing list. i can give you the link to the last one:
[12:41] <ivve> cool
[12:42] <doppelgrau> Does anyone use ceph with xen/qemu? I upgraded a few hosts on debian jessie from hammer to jewel and that somehow broke the rbd backend. Any ideas?
[12:42] <doppelgrau> My plan ATM is to go back / reinstall these nodes with hammer, but that isn't a long-term solution
[12:43] * wjw-freebsd2 (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[12:43] <mehmetpg0223> @ ivve: http://slexy.org/view/s21z9JmwSu the deep scrub was started 2016-08-02 10:02:44 and ended 2016-08-02 10:17:22
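Pulling the suggested steps into one sketch, with the OSD and PG ids taken from this thread and the debug level already mentioned above:

    ceph tell osd.9 injectargs '--debug-osd 5/5'
    ceph pg map 0.223                        # confirm the acting set
    ceph pg deep-scrub 0.223                 # kick off the deep scrub
    tail -f /var/log/ceph/ceph-osd.9.log | grep -E '0\.223|\[ERR\]'
    ceph pg 0.223 query > pg-0.223-query.json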
[12:44] * kefu_ (~kefu@114.92.96.253) has joined #ceph
[12:44] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[12:45] <The_Ball> Is there any way of printing a "scrubbing summary"?
[12:46] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[12:46] * kefu_ (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[12:46] * dgurtner (~dgurtner@178.197.225.90) Quit (Ping timeout: 480 seconds)
[12:47] <mehmetpg0223> @ The_Ball: i have a command with which you can get how many scrubs ran on which day and in which hour
[12:47] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) Quit (Quit: Leaving)
[12:47] * kefu_ (~kefu@114.92.96.253) has joined #ceph
[12:48] <mehmetpg0223> for date in `ceph pg dump | grep active | awk '{print $20}'`; do date +%A -d $date; done | sort | uniq -c
[12:48] <mehmetpg0223> for date in `ceph pg dump | grep active | awk '{print $21}'`; do date +%H -d $date; done | sort | uniq -c
[12:48] * kefu (~kefu@183.193.165.164) Quit (Ping timeout: 480 seconds)
[12:48] <mehmetpg0223> @ The_Ball: found this last week
[12:49] * praveen_ (~praveen@121.244.155.12) has joined #ceph
[12:51] <ivve> looking thru it :)
[12:51] <mehmetpg0223> Thank you VERY (!) much!
[12:52] <The_Ball> mehmetpg0223, thanks, I'll have a look
[12:52] * Miouge (~Miouge@109.128.94.173) Quit (Quit: Miouge)
[12:52] <mehmetpg0223> Your help is really appreciated @ ivve
[12:52] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[12:52] * tom_nz (~oftc-webi@202.14.217.2) Quit (Ping timeout: 480 seconds)
[12:52] <mehmetpg0223> you are welcome @ the_ball HTH
[12:54] * Miouge (~Miouge@109.128.94.173) has joined #ceph
[12:54] <The_Ball> mehmetpg0223, that's pretty cool, neat idea
[12:55] <mehmetpg0223> @the_ball thats not my credit ^^ - found it last week in the web
[12:55] <mehmetpg0223> but thanks ^^
[12:56] <ivve> mehmetpg0223: so im guessing the output is identical on all 3 osds where the replicas reside?
[12:57] <mehmetpg0223> yes, but nothing noticeable in it
[12:57] <ivve> what do you mean?
[12:57] <mehmetpg0223> the output is identical
[12:59] <mehmetpg0223> i have this issue with different osds as the primary for pg 0.223
[13:00] * blip2 (~TehZomB@26XAAATOO.tor-irc.dnsbl.oftc.net) Quit ()
[13:00] <mehmetpg0223> osdmap e7047 pg 0.223 (0.223) -> up [4,16,28] acting [4,16,28]
[13:00] <mehmetpg0223> osdmap e9363 pg 0.223 (0.223) -> up [9,17,23] acting [9,17,23]
[13:00] <mehmetpg0223> osdmap e11230 pg 0.223 (0.223) -> up [9,11,20] acting [9,11,20]
[13:00] <mehmetpg0223> always the same :*(
[13:02] * bniver (~bniver@pool-173-48-58-27.bstnma.fios.verizon.net) has joined #ceph
[13:04] * analbeard (~shw@support.memset.com) Quit (Remote host closed the connection)
[13:06] <ivve> weird
[13:07] <ivve> have you had a look in the headers
[13:07] <The_Ball> mehmetpg0223, I wanted the date as well as the hour: for date in `ceph pg dump | grep active | awk '{print $20"_"$21}'`; do echo $date | sed 's/_/ /' | date +'Date %m-%d Hour %H' -f - ; done | sort | uniq -c
[13:07] <ivve> whats in that PG
[13:08] * rwheeler (~rwheeler@121.244.87.118) Quit (Quit: Leaving)
[13:09] <mehmetpg0223> absolutely! And i do not know where to look further. Yes, i have manually checked some of the objects - did not find any issues... like ceph, which also does not seem to find any issue
[13:09] <ivve> something i would try is running a deep scrub on the osd itself, start with the non-primaries and see if the issue occurs
[13:09] * analbeard (~shw@support.memset.com) has joined #ceph
[13:10] <ivve> instead of the pg
[13:11] <ivve> does the same thing happen on a regular scrub?
[13:12] * kefu_ (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[13:12] <mehmetpg0223> no, ive tested manual deep scrubs with some other pgs without problems. normal scrub on pg 0.223 is also fine
[13:13] * kefu (~kefu@114.92.96.253) has joined #ceph
[13:13] <mehmetpg0223> thanks @ the_ball :) will give it a try
[13:15] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[13:16] * Hemanth (~hkumar_@121.244.87.117) has joined #ceph
[13:18] * dux0r (~ricin@ip95.ip-94-23-150.eu) has joined #ceph
[13:26] <mehmetpg0223> if the disks were the problem, i guess i would not only have slow/blocked requests when this pg is doing its deep-scrub...
[13:26] <mehmetpg0223> how do i do that on the osd itself? what do you mean?
[13:27] <mehmetpg0223> do you mean md5sum on the object and compare that with the replica?
[13:27] <mehmetpg0223> if so, yes ive done that
[13:27] <ivve> ceph osd deep-scrub osd.X
[13:28] <mehmetpg0223> hmm... thats new... i ll give it a try when the recovery process is finished
[13:28] <ivve> its a longshot i guess
[13:28] <ivve> but it sure is a weird problem
[13:30] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[13:36] * ira (~ira@121.244.87.117) Quit (Ping timeout: 480 seconds)
[13:36] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[13:37] <mehmetpg0223> really weird.
[13:38] * Nicho1as (~nicho1as@00022427.user.oftc.net) has joined #ceph
[13:41] * jpierre03 (~jpierre03@voyage.prunetwork.fr) Quit (Ping timeout: 480 seconds)
[13:44] * derjohn_mobi (~aj@2001:6f8:1337:0:a5b7:91e8:ccb2:1eff) Quit (Ping timeout: 480 seconds)
[13:48] * dux0r (~ricin@9YSAAA31H.tor-irc.dnsbl.oftc.net) Quit ()
[13:52] * dgurtner (~dgurtner@178.197.225.90) has joined #ceph
[13:52] * luigiman (~starcoder@tsn109-201-152-14.dyn.nltelcom.net) has joined #ceph
[13:57] * kefu (~kefu@114.92.96.253) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[14:06] * cgxu (~cgxu@220.249.249.31) Quit (Quit: WeeChat 1.0.1)
[14:06] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[14:07] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[14:16] * ira (~ira@121.244.87.118) has joined #ceph
[14:16] * garphy`aw is now known as garphy
[14:22] * luigiman (~starcoder@tsn109-201-152-14.dyn.nltelcom.net) Quit ()
[14:23] * georgem (~Adium@24.114.73.251) has joined #ceph
[14:23] * karnan (~karnan@106.51.128.115) Quit (Ping timeout: 480 seconds)
[14:25] * IvanJobs (~ivanjobs@103.50.11.146) Quit (Remote host closed the connection)
[14:27] * dux0r (~BlS@ip95.ip-94-23-150.eu) has joined #ceph
[14:28] * Racpatel (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[14:33] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) has joined #ceph
[14:40] * ira (~ira@121.244.87.118) Quit (Ping timeout: 480 seconds)
[14:43] * georgem (~Adium@24.114.73.251) Quit (Quit: Leaving.)
[14:44] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[14:46] * jordan_c (~jconway@cable-192.222.246.54.electronicbox.net) Quit (Ping timeout: 480 seconds)
[14:49] * rraja (~rraja@121.244.87.117) has joined #ceph
[14:50] * vimal (~vikumar@121.244.87.116) Quit (Quit: Leaving)
[14:51] * karnan (~karnan@106.216.145.92) has joined #ceph
[14:52] * karnan (~karnan@106.216.145.92) Quit ()
[14:52] * karnan (~karnan@106.216.145.92) has joined #ceph
[14:57] * dux0r (~BlS@9YSAAA326.tor-irc.dnsbl.oftc.net) Quit ()
[15:00] * karnan (~karnan@106.216.145.92) Quit (Ping timeout: 480 seconds)
[15:02] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[15:02] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[15:03] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit ()
[15:05] * georgem (~Adium@206.108.127.16) has joined #ceph
[15:05] * nhm (~nhm@c-50-171-139-246.hsd1.mn.comcast.net) Quit (Read error: Connection reset by peer)
[15:06] * dgurtner_ (~dgurtner@178.197.225.90) has joined #ceph
[15:06] <walbuci> Would there be any known reason why ceph's throughput starts to deteriorate when the cluster size reaches 50GB and then starts picking up again when it reaches 70GB?
[15:07] * dgurtner (~dgurtner@178.197.225.90) Quit (Read error: Connection reset by peer)
[15:07] <walbuci> I've got graphs that show this happens when using 6x osd on 3 servers. First using btrfs, then using xfs.
[15:10] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[15:10] * Tumm (~loft@185.65.134.78) has joined #ceph
[15:13] * haomaiwang (~oftc-webi@61.149.85.206) Quit (Ping timeout: 480 seconds)
[15:13] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[15:13] * mattbenjamin (~mbenjamin@12.118.3.106) has joined #ceph
[15:21] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[15:21] * derjohn_mobi (~aj@p4FFD2F9A.dip0.t-ipconnect.de) has joined #ceph
[15:21] * MACscr (~Adium@c-73-9-230-5.hsd1.il.comcast.net) Quit (Quit: Leaving.)
[15:21] * nhm (~nhm@c-50-171-139-246.hsd1.mn.comcast.net) has joined #ceph
[15:21] * ChanServ sets mode +o nhm
[15:22] * vimal (~vikumar@114.143.165.7) has joined #ceph
[15:24] * IvanJobs (~ivanjobs@103.50.11.146) has joined #ceph
[15:29] <essjayhch> Bit of a quandary
[15:30] <essjayhch> in the process of upgrading from hammer, had no issues with the monitors, or the first osd node. However I'm seeing something a little strange on the second.
[15:31] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[15:32] * netmare (~skrasnopi@188.93.16.54) has joined #ceph
[15:33] * IvanJobs (~ivanjobs@103.50.11.146) Quit (Ping timeout: 480 seconds)
[15:33] <essjayhch> Because the Disk density is somewhat high, I'm trying to avoid running restart ceph-all, because it basically sends the latency through the roof if it's trying to re-peer all the PGs on a box post upgrade (basically made my cluster unhappy for a period of time earlier but it fixed itself). As a result I'm trying to restart the ceph osd instances
[15:33] <essjayhch> individually. Ergo I want to run `restart ceph-osd id=$id cluster=$cluster` sequentially. Now when i did that on the first disk in this node, it won't restart. There doesn't appear to be anything wrong with the disk itself (I can manually invoke the osd daemon from the cli as upstart would do so) however i'm seeing the following in the upstart logs:
[15:33] <essjayhch> `/proc/self/fd/9: 8: /proc/self/fd/9: /usr/libexec/ceph/ceph-osd-prestart.sh: not found`.
[15:34] <essjayhch> Which is highly weird because the ceph-osd.conf file in /etc/init says /usr/lib/ceph/ceph-osd-prestart.sh.
[15:34] <essjayhch> Any thoughts?
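A sketch of that per-OSD rolling restart on an Upstart node; setting noout and waiting between daemons are conventional precautions rather than anything from this log, and the cluster name is assumed to be the default:

    ceph osd set noout
    for id in $(ls /var/lib/ceph/osd | sed 's/^ceph-//'); do
        restart ceph-osd id=$id cluster=ceph
        sleep 10
        # let peering settle before touching the next OSD
        while ceph health | grep -qE 'peering|activating|down'; do sleep 5; done
    done
    ceph osd unset noout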
[15:34] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) has joined #ceph
[15:38] * Jeffrey4l_ (~Jeffrey@110.252.45.235) Quit (Ping timeout: 480 seconds)
[15:38] * dgurtner_ (~dgurtner@178.197.225.90) Quit (Read error: Connection reset by peer)
[15:39] * dnunez (~dnunez@nat-pool-bos-u.redhat.com) has joined #ceph
[15:40] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[15:40] * Tumm (~loft@26XAAATRU.tor-irc.dnsbl.oftc.net) Quit ()
[15:42] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[15:43] * dgurtner (~dgurtner@178.197.225.90) has joined #ceph
[15:45] * JWilbur (~JohnO@tor2r.ins.tor.net.eu.org) has joined #ceph
[15:48] * dnunez (~dnunez@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[15:48] * arcimboldo (~antonio@dhcp-y11-zi-s3it-130-60-34-055.uzh.ch) has joined #ceph
[15:49] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) has joined #ceph
[15:49] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[15:52] <netmare> Hi all!
[15:52] <netmare> Yesterday I had a problem after updating to Ceph version 10.2.2 (Jewel). Initially, the cluster consisted of four nodes with Ceph Firefly installed. First, I updated from Firefly to Hammer - everything OK, without any problems. After that I performed the upgrade of the cluster to Jewel and restarted the ceph-mons on all nodes. As soon as I restarted all OSDs on one of the nodes, they went into a down state.
[15:52] <netmare> The ceph daemon osd.X status command indicates that the OSD is in the preboot state:
[15:52] <netmare> {
[15:52] <netmare>     "cluster_fsid": "a8e457e7-ed20-4f6f-9bfd-7755bf4fedbb",
[15:52] <netmare>     "osd_fsid": "4b72e531-4f2a-4322-a26c-0da64cb4525e",
[15:52] <netmare>     "whoami": 3,
[15:52] <netmare>     "state": "preboot",
[15:52] <netmare>     "oldest_map": 1,
[15:52] <netmare>     "newest_map": 431,
[15:52] <netmare>     "num_pgs": 195
[15:52] <netmare> }
[15:52] <netmare> In the log I see a lot of messages of this form - http://paste.ubuntu.com/22178548/
[15:52] <netmare> My problem is very similar to the problem described here http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2016-May/009584.html
[15:52] <netmare> Unfortunately, nothing helps. I tried restarting the OSDs and mons, and I rebooted the node - the OSDs still end up down.
[15:52] <netmare> Does anyone have any ideas what could be the problem?
[15:53] <netmare> My osd tree - http://paste.ubuntu.com/22179338/
[15:54] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[15:57] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[16:00] * vbellur (~vijay@2601:18f:700:55b0:5e51:4fff:fee8:6a5c) Quit (Ping timeout: 480 seconds)
[16:00] * haomaiwang (~oftc-webi@114.249.239.114) has joined #ceph
[16:00] * xarses_ (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:02] * dnunez (~dnunez@nat-pool-bos-t.redhat.com) has joined #ceph
[16:03] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Remote host closed the connection)
[16:03] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[16:09] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[16:14] * derjohn_mobi (~aj@p4FFD2F9A.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[16:14] <essjayhch> turns out my thing was to do with the fact that upstart won't reinitialize its config for a task that is currently in use: i.e. unless all the upstart processes have been stopped, it will never update, ergo it's using the currently running configuration.
[16:14] <essjayhch> which is a pain.
[16:15] * JWilbur (~JohnO@9YSAAA35O.tor-irc.dnsbl.oftc.net) Quit ()
[16:15] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[16:18] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[16:18] * _are__ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Read error: Connection reset by peer)
[16:19] * sebastian-w (~quassel@212.218.8.138) has joined #ceph
[16:19] * sebastian-w_ (~quassel@212.218.8.139) Quit (Ping timeout: 480 seconds)
[16:19] * cryptk (~Pieman@205.185.112.54) has joined #ceph
[16:20] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[16:20] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[16:20] * netmare (~skrasnopi@188.93.16.54) Quit (Ping timeout: 480 seconds)
[16:21] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[16:22] * mykola (~Mikolaj@193.93.217.35) has joined #ceph
[16:22] * mgolub (~Mikolaj@193.93.217.35) has joined #ceph
[16:23] * EinstCrazy (~EinstCraz@58.39.76.245) has joined #ceph
[16:24] * wjw-freebsd (~wjw@vpn.ecoracks.nl) has joined #ceph
[16:28] * swami1 (~swami@49.38.0.169) Quit (Quit: Leaving.)
[16:29] <toast-work> hey guys, how are you all handling DR between sites with ceph over slow links?
[16:29] <toast-work> are you using something to replicate volumes, stretching the cluster and using crush?
[16:30] * wjw-freebsd2 (~wjw@vpn.ecoracks.nl) has joined #ceph
[16:31] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) has joined #ceph
[16:31] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) Quit (Quit: billwebb)
[16:32] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) has joined #ceph
[16:32] * netmare (~skrasnopi@188.93.16.2) has joined #ceph
[16:33] * sankars (uid176696@id-176696.brockwell.irccloud.com) Quit (Quit: Connection closed for inactivity)
[16:34] * wjw-freebsd (~wjw@vpn.ecoracks.nl) Quit (Ping timeout: 480 seconds)
[16:38] * xarses_ (~xarses@64.124.158.192) has joined #ceph
[16:42] * wjw-freebsd2 (~wjw@vpn.ecoracks.nl) Quit (Ping timeout: 480 seconds)
[16:44] * kefu (~kefu@183.193.165.164) has joined #ceph
[16:44] * kefu_ (~kefu@114.92.96.253) has joined #ceph
[16:45] * arcimboldo (~antonio@dhcp-y11-zi-s3it-130-60-34-055.uzh.ch) Quit (Ping timeout: 480 seconds)
[16:45] * Jeffrey4l (~Jeffrey@110.252.45.235) has joined #ceph
[16:45] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[16:48] * dgurtner (~dgurtner@178.197.225.90) Quit (Ping timeout: 480 seconds)
[16:49] * cryptk (~Pieman@61TAAA26I.tor-irc.dnsbl.oftc.net) Quit ()
[16:50] * _are__ (~quassel@strato.lihas.de) has joined #ceph
[16:51] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[16:52] * kefu (~kefu@183.193.165.164) Quit (Ping timeout: 480 seconds)
[16:53] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) has joined #ceph
[16:54] * derjohn_mobi (~aj@2001:6f8:1337:0:5184:6b65:7a40:1bb) has joined #ceph
[16:55] * scg (~zscg@181.122.37.47) has joined #ceph
[16:57] <mehmetpg0223> @ivve: My recovery process after changing the tunables to jewel is finished. Now i first did a deep-scrub on 0.223 again to see if something happened since i rolled back to default tunables and then to jewel again. No luck! Again blocked requests.
[16:57] <mehmetpg0223> i am waiting for this to become health_ok again and then i'll try the deep-scrub on the osd
[16:59] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) Quit (Quit: billwebb)
[17:00] * vbellur (~vijay@nat-pool-bos-u.redhat.com) has joined #ceph
[17:00] * F|1nt (~F|1nt@host37-212.lan-isdn.imaginet.fr) has joined #ceph
[17:01] * dgurtner (~dgurtner@178.197.225.90) has joined #ceph
[17:03] <mehmetpg0223> it seems that whenever the deep scrub on the mentioned pg 0.223 is running, the primary osd of the pg - in this example osd.9 - blocks requests from other pgs too...
[17:05] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[17:06] * srk_ (~Siva@cpe-70-113-23-93.austin.res.rr.com) has joined #ceph
[17:07] <mehmetpg0223> for your information: the deep-scrub took 13 minutes to finish
[17:09] <SamYaple> netmare: did you update the permissions?
[17:10] <SamYaple> netmare: hammer -> jewel by default switches from the root to the ceph user. so the files need to be owned by the ceph user
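The Jewel release notes give two ways to satisfy that ownership change; a sketch of both, using the stock paths:

    # option 1: hand the data and logs over to the ceph user (can take a long time on big OSDs)
    chown -R ceph:ceph /var/lib/ceph /var/log/ceph
    # option 2: keep running the daemons as root by adding this to ceph.conf:
    #   setuser match path = /var/lib/ceph/$type/$cluster-$id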
[17:10] <mehmetpg0223> @ivve: ceph osd deep-scrub osd.9 is running...
[17:10] * srk (~Siva@cpe-70-113-23-93.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[17:10] <netmare> Yes, i changed permissions for /var/lib/ceph and /var/log/ceph to the ceph user
[17:12] <SamYaple> netmare: are you using bluestore?
[17:12] <SamYaple> that can't be
[17:13] <netmare> SamYaple: No, we are using the old backend - not bluestore
[17:14] <SamYaple> netmare: thats strange then as to why its probing for bluestore
[17:17] * TomasCZ (~TomasCZ@yes.tenlab.net) has joined #ceph
[17:17] <SamYaple> netmare: are those the only osd logs at all?
[17:17] <SamYaple> netmare: I would attempt to start the osd directly and get some more information
[17:17] * wushudoin (~wushudoin@38.99.12.237) has joined #ceph
[17:18] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[17:19] * EinstCrazy (~EinstCraz@58.39.76.245) Quit (Remote host closed the connection)
[17:22] * EinstCrazy (~EinstCraz@58.39.76.245) has joined #ceph
[17:22] <netmare> SamYaple: I have just restarted the OSD and this is the latest log http://paste.ubuntu.com/22186069/
[17:23] <SamYaple> netmare: that seems pretty clear what the issue is
[17:23] <SamYaple> what version of hammer are you on?
[17:23] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[17:24] <netmare> SamYaple: version of Hammer was: "version": "ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)"
[17:24] <SamYaple> netmare: please paste the output of "ceph tell osd.* version"
[17:25] <SamYaple> in pastebin, not here!
[17:25] * cathode (~cathode@50.232.215.114) has joined #ceph
[17:26] * Catsceo (~Redshift@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[17:26] <netmare> SamYaple: http://paste.ubuntu.com/22186436/ - output of ceph tell osd.* version
[17:27] * IvanJobs (~ivanjobs@103.50.11.146) has joined #ceph
[17:27] <SamYaple> netmare: osd.55 and osd.58, how long have they been down?
[17:27] <SamYaple> since before the hammer upgrade?
[17:28] <SamYaple> the issue appears to be you have old osds (even if they are currently UP and IN) in the cluster
[17:28] <netmare> SamYaple: No, they went down after the upgrade to jewel
[17:29] <SamYaple> so you have two options, remove/upgrade those old osds or perform a full stop of the osds in the cluster and then start them again
[17:29] * ntpttr (~ntpttr@192.55.55.37) has joined #ceph
[17:30] <netmare> SamYaple: Those OSDs were upgraded to Jewel, but I have not restarted them
[17:31] * krypto (~krypto@106.51.31.35) has joined #ceph
[17:31] * EinstCrazy (~EinstCraz@58.39.76.245) Quit (Remote host closed the connection)
[17:31] * kefu_ is now known as kefu
[17:32] <SamYaple> netmare: what do you mean?
[17:33] <netmare> SamYaple: OK, I'm trying to take the cluster out of production at the moment and restart all OSDs after that
[17:33] * F|1nt (~F|1nt@host37-212.lan-isdn.imaginet.fr) Quit (Quit: Be back later ...)
[17:34] <mehmetpg0223> ARGH... again when the deep-scrub is running on 0.223 => blocked requests. All other PGs on osd.9 are fine... :/ any ideas?
[17:35] <SamYaple> netmare: you need to stop _all_ osds, then start them back
[17:35] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) has joined #ceph
[17:35] <SamYaple> netmare: read the notes about upgrading from firefly -> jewel http://ceph.com/releases/v10-2-0-jewel-released/
[17:35] <SamYaple> netmare: i think those apply to you here
[17:36] * IvanJobs (~ivanjobs@103.50.11.146) Quit (Ping timeout: 480 seconds)
[17:36] <SamYaple> netmare: i dont know how quickly you moved from firefly > hammer, but perhaps the internal upgrade process hadnt finished yet. i dont know how long that actually takes
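A sketch of the full stop/start being suggested, assuming the stock ceph-osd-all Upstart jobs; noout is an extra precaution so the cluster does not start rebalancing while everything is down:

    ceph osd set noout
    # on every OSD node:
    stop ceph-osd-all
    # once 'ceph osd tree' shows all OSDs down, again on every OSD node:
    start ceph-osd-all
    ceph osd unset noout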
[17:38] * dgurtner (~dgurtner@178.197.225.90) Quit (Ping timeout: 480 seconds)
[17:42] * EinstCrazy (~EinstCraz@58.39.76.245) has joined #ceph
[17:42] * doppelgrau (~doppelgra@132.252.235.172) Quit (Quit: Leaving.)
[17:42] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[17:43] <mehmetpg0223> i am off in a few minutes... will come again online tomorrow
[17:44] * Miouge (~Miouge@109.128.94.173) Quit (Quit: Miouge)
[17:44] * bitserker (~toni@81.184.9.72.dyn.user.ono.com) Quit (Quit: Leaving.)
[17:45] * danieagle (~Daniel@177.68.230.88) has joined #ceph
[17:47] * ntpttr (~ntpttr@192.55.55.37) Quit (Ping timeout: 480 seconds)
[17:47] * EinstCrazy (~EinstCraz@58.39.76.245) Quit (Remote host closed the connection)
[17:48] * EinstCrazy (~EinstCraz@58.39.76.245) has joined #ceph
[17:52] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[17:53] <mehmetpg0223> could this be a pointer to my issue?
[17:53] <mehmetpg0223> 2016-08-04 17:41:22.634953 osd.9 172.16.0.11:6834/115670 3293 : cluster [WRN] slow request 30.286466 seconds old, received at 2016-08-04 17:40:52.348411: osd_sub_op(unknown.0.0:0 0.223 MIN [scrub-map] v 0'0 snapset=0=[]:[]) currently queued_for_pg
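With requests sitting in queued_for_pg like that, the primary's admin socket usually shows what they are waiting behind; a minimal sketch against the OSD named in the warning (run on the host carrying osd.9):

    ceph daemon osd.9 dump_ops_in_flight
    ceph daemon osd.9 dump_historic_ops
    # scrub throttling currently in effect on that OSD
    ceph daemon osd.9 config show | grep -E 'osd_scrub_(sleep|chunk|priority)'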
[17:55] * Catsceo (~Redshift@61TAAA28W.tor-irc.dnsbl.oftc.net) Quit ()
[17:56] * Hemanth (~hkumar_@121.244.87.117) Quit (Ping timeout: 480 seconds)
[17:59] * malevolent (~quassel@124.red-88-11-251.dynamicip.rima-tde.net) Quit (Ping timeout: 480 seconds)
[18:00] * mehmetpg0223 (~top@84.254.67.15) Quit (Quit: bye)
[18:01] * ircolle (~Adium@2601:285:201:633a:d106:43bd:1d7b:f4b0) has joined #ceph
[18:04] * swami1 (~swami@27.7.171.143) has joined #ceph
[18:06] * aNupoisc (~adnavare@192.55.55.39) has joined #ceph
[18:06] * EthanL (~lamberet@cce02cs4034-fa12-z.ams.hpecore.net) has joined #ceph
[18:07] * jowilkin (~jowilkin@2601:644:4000:b0bf:56ee:75ff:fe10:724e) Quit (Quit: Leaving)
[18:07] * jowilkin (~jowilkin@2601:644:4000:b0bf:56ee:75ff:fe10:724e) has joined #ceph
[18:07] * dgurtner (~dgurtner@178.197.225.167) has joined #ceph
[18:11] * blizzow (~jburns@50.243.148.102) has joined #ceph
[18:13] * malevolent (~quassel@124.red-88-11-251.dynamicip.rima-tde.net) has joined #ceph
[18:14] * mhackett (~mhack@nat-pool-bos-u.redhat.com) has joined #ceph
[18:15] * rdas (~rdas@121.244.87.116) Quit (Ping timeout: 480 seconds)
[18:16] * malevolent_ (~quassel@195.red-193-152-185.dynamicip.rima-tde.net) has joined #ceph
[18:17] * EthanL (~lamberet@cce02cs4034-fa12-z.ams.hpecore.net) Quit (Ping timeout: 480 seconds)
[18:18] * branto (~branto@ip-78-102-208-181.net.upcbroadband.cz) Quit (Quit: Leaving.)
[18:19] * rraja (~rraja@121.244.87.117) Quit (Quit: Leaving)
[18:20] * mhack (~mhack@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[18:20] * aNupoisc (~adnavare@192.55.55.39) Quit (Remote host closed the connection)
[18:21] <netmare> SamYaple: Thank you! After restarting the whole cluster, the OSDs on the failed nodes are up
[18:24] * malevolent (~quassel@124.red-88-11-251.dynamicip.rima-tde.net) Quit (Ping timeout: 480 seconds)
[18:24] * malevolent_ (~quassel@195.red-193-152-185.dynamicip.rima-tde.net) Quit (Ping timeout: 480 seconds)
[18:25] <blizzow> What kind of write performance are people here getting from ceph clients writing into their own rbd image? I can't seem to exceed 50-60MB per second.
[18:25] * tsg (~tgohad@134.134.139.82) has joined #ceph
[18:27] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[18:28] * yanzheng1 (~zhyan@125.70.20.176) Quit (Quit: This computer has gone to sleep)
[18:29] <etienneme> It mostly depends on your hardware configuration ;)
[18:30] * Miouge (~Miouge@109.128.94.173) has joined #ceph
[18:31] <blizzow> etienneme: I know that. I'm curious what kind of performance other people are getting out of their ceph clusters irrespective of their hardware choices.
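
For numbers to be comparable across clusters it helps to state how they were measured; a hedged sketch of two common single-client measurements (the pool name rbd and image name testimg are assumptions, sizes are given in plain bytes):

    # object-level writes from one client, 4 MB objects, keep objects for later read tests
    rados bench -p rbd 60 write --no-cleanup

    # per-image write benchmark against an existing RBD image (4 MB writes, 10 GB total)
    rbd bench-write testimg --io-size 4194304 --io-threads 16 --io-total 10737418240
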
[18:32] * anadrom (~curtis864@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[18:33] * ntpttr (~ntpttr@192.55.54.42) has joined #ceph
[18:33] * doppelgrau (~doppelgra@132.252.235.172) has joined #ceph
[18:34] * ntpttr (~ntpttr@192.55.54.42) Quit ()
[18:34] <SamYaple> netmare: no problem. glad its up for you!
[18:44] * mhack (~mhack@nat-pool-bos-t.redhat.com) has joined #ceph
[18:46] <blizzow> I can only find one benchmark of ceph write performance, and it claims 200MB/sec for one client, dropping to about 100MB/sec for four clients. That's with 40Gb ethernet for the ceph cluster.
[18:46] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[18:51] * mhackett (~mhack@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[18:51] * inevity (~androirc@107.170.0.159) has joined #ceph
[18:51] * TMM (~hp@185.5.121.201) Quit (Quit: Ex-Chat)
[18:52] <inevity> hi
[18:54] * bb0x (~bb0x@5.2.199.244) Quit (Quit: This computer has gone to sleep)
[18:55] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[18:56] * poningru (~evarghese@gw-sfo.plos.org) Quit (Quit: Ex-Chat)
[18:57] * vimal (~vikumar@114.143.165.7) Quit (Quit: Leaving)
[19:00] * vbellur (~vijay@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[19:01] <SamYaple> hello
[19:01] * anadrom (~curtis864@61TAAA3AQ.tor-irc.dnsbl.oftc.net) Quit ()
[19:03] * Miouge (~Miouge@109.128.94.173) Quit (Quit: Miouge)
[19:05] * krypto (~krypto@106.51.31.35) Quit (Ping timeout: 480 seconds)
[19:06] * Nicho1as (~nicho1as@00022427.user.oftc.net) Quit (Quit: A man from the Far East; using WeeChat 1.5)
[19:07] <inevity> when running the github ceph/s3-tests, it cannot parse the yaml conf file. Has anyone run into this?
[19:09] <inevity> the error on the first line says there is no section header
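
The "no section header" message usually comes from Python's ConfigParser, which suggests s3-tests expects an INI-style file rather than YAML. A rough sketch of the shape, based loosely on the repo's s3tests.conf.SAMPLE (section and key names should be checked against that sample file; values here are placeholders):

    # pointed to via the S3TEST_CONF environment variable
    [DEFAULT]
    host = rgw.example.com
    port = 7480
    is_secure = no

    [s3 main]
    user_id = ...
    access_key = ...
    secret_key = ...
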
[19:10] * inevity (~androirc@107.170.0.159) Quit (Remote host closed the connection)
[19:10] * DanFoster (~Daniel@office.34sp.com) Quit (Ping timeout: 480 seconds)
[19:10] * shaunm (~shaunm@74.83.215.100) Quit (Ping timeout: 480 seconds)
[19:10] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[19:12] * ntpttr_ (~ntpttr@fmdmzpr03-ext.fm.intel.com) Quit (Remote host closed the connection)
[19:13] * ntpttr (~ntpttr@134.134.139.77) has joined #ceph
[19:13] * ntpttr_ (~ntpttr@134.134.139.77) has joined #ceph
[19:14] * garphy is now known as garphy`aw
[19:14] * joshd1 (~jdurgin@2602:30a:c089:2b0:94c2:8d44:7f81:59a2) Quit (Quit: Leaving.)
[19:14] * thomnico (~thomnico@2a01:e35:8b41:120:1dcb:a4e9:24ee:41d4) Quit (Ping timeout: 480 seconds)
[19:17] * kefu (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[19:18] * kefu (~kefu@114.92.96.253) has joined #ceph
[19:19] * aNupoisc (~adnavare@134.134.139.82) has joined #ceph
[19:21] * Hemanth (~hkumar_@103.228.221.143) has joined #ceph
[19:23] * kefu (~kefu@114.92.96.253) Quit ()
[19:24] * KeeperOfTheSoul (~Misacorp@205.185.112.54) has joined #ceph
[19:27] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) Quit (Quit: billwebb)
[19:28] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) has joined #ceph
[19:32] * Miouge (~Miouge@109.128.94.173) has joined #ceph
[19:34] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[19:36] * fred`` (fred@earthli.ng) Quit (Quit: +++ATH0)
[19:38] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[19:39] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[19:39] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[19:40] * bb0x (~bb0x@78.97.194.150) Quit ()
[19:44] * Miouge (~Miouge@109.128.94.173) Quit (Quit: Miouge)
[19:44] * fred`` (fred@earthli.ng) has joined #ceph
[19:45] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:6071:62a3:b364:3e85) Quit (Ping timeout: 480 seconds)
[19:49] * madkiss (~madkiss@ip5b4029be.dynamic.kabel-deutschland.de) has joined #ceph
[19:50] * fred`` (fred@earthli.ng) Quit (Quit: +++ATH0)
[19:52] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[19:54] * KeeperOfTheSoul (~Misacorp@9YSAAA4CG.tor-irc.dnsbl.oftc.net) Quit ()
[20:02] * Hemanth (~hkumar_@103.228.221.143) Quit (Quit: Leaving)
[20:02] * Hemanth (~hkumar_@103.228.221.143) has joined #ceph
[20:05] * tsg_ (~tgohad@192.55.54.40) has joined #ceph
[20:07] * blizzow (~jburns@50.243.148.102) Quit (Ping timeout: 480 seconds)
[20:09] * fred`` (fred@earthli.ng) has joined #ceph
[20:10] * bb0x (~bb0x@78.97.194.150) Quit (Read error: Connection reset by peer)
[20:10] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[20:10] * davidzlap (~Adium@2605:e000:1313:8003:5c17:daa8:30cf:6a22) has joined #ceph
[20:10] * tsg (~tgohad@134.134.139.82) Quit (Remote host closed the connection)
[20:11] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[20:12] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:7c62:b891:9b8a:4ede) has joined #ceph
[20:14] * gregmark (~Adium@68.87.42.115) has joined #ceph
[20:15] * smiley99 (~oftc-webi@pool-108-45-41-147.washdc.fios.verizon.net) has joined #ceph
[20:16] <smiley99> Hello everyone...is anyone here using rbd-mirror in production yet?
[20:19] * swami1 (~swami@27.7.171.143) Quit (Quit: Leaving.)
[20:21] * blizzow (~jburns@50.243.148.102) has joined #ceph
[20:22] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:7c62:b891:9b8a:4ede) Quit (Ping timeout: 480 seconds)
[20:23] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[20:23] * TMM (~hp@84.243.211.106) has joined #ceph
[20:25] * vbellur (~vijay@nat-pool-bos-u.redhat.com) has joined #ceph
[20:27] * dgurtner (~dgurtner@178.197.225.167) Quit (Ping timeout: 480 seconds)
[20:30] * bb0x (~bb0x@78.97.194.150) Quit (Quit: This computer has gone to sleep)
[20:31] * IvanJobs (~ivanjobs@103.50.11.146) has joined #ceph
[20:32] * derjohn_mobi (~aj@2001:6f8:1337:0:5184:6b65:7a40:1bb) Quit (Ping timeout: 480 seconds)
[20:38] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[20:38] * bb0x (~bb0x@78.97.194.150) has joined #ceph
[20:38] * jdillaman (~jdillaman@pool-108-18-97-95.washdc.fios.verizon.net) Quit (Remote host closed the connection)
[20:39] * IvanJobs (~ivanjobs@103.50.11.146) Quit (Ping timeout: 480 seconds)
[20:39] * jdillaman (~jdillaman@pool-108-18-97-95.washdc.fios.verizon.net) has joined #ceph
[20:41] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[20:42] * stefan0 (~stefano@168.205.191.253) has joined #ceph
[20:42] <stefan0> Hi guys! Hope everyone is doing great..
[20:43] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) has joined #ceph
[20:43] <stefan0> I have a ceph cluster with 4 Dell R730XD nodes
[20:44] <stefan0> each node has 2 Intel S3710 SSDs for journals (each journal is a symlink to the drive's xfs mount point in Linux)
[20:44] <stefan0> 2 Intel S3610 SSDs (acting as a hot-storage pool, caching)
[20:45] <stefan0> and 8 SATA HDDs as the regular tier
[20:45] <stefan0> as for performance, I think things should be running better than they are
[20:46] <stefan0> in each node each journal SSD serves 4 HDDs + 1 S3610 SSD
[20:47] <stefan0> I'm reading about affinity and wondering if it would be better to reset this layout
[20:47] <stefan0> destroy the cache tier
[20:47] <stefan0> and set the SSDs as members of the regular pool
[20:47] <stefan0> and set them with affinity
[20:48] <stefan0> as it is a design question I know there may be several points of view
[20:48] <stefan0> but what do you, more expert ceph people, think?
[20:49] <blizzow> stefan0: when you say performance measurement, what stats are you getting and what are you expecting?
[20:50] * vbellur (~vijay@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[20:50] <stefan0> almost forgot, we use a 10 Gb LAN with LACP (2 ports per server)
[20:51] <stefan0> blizzow, we're getting sequential reads of about 400 MBps
[20:52] <stefan0> and writes around half of that
[20:53] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) Quit (Quit: billwebb)
[20:54] <stefan0> for comparison, using 8 10k SAS HDDs in RAID-10 on a RAID controller we get 600 MBps of reads (the limit of the 6 Gbps controller)
[20:55] <stefan0> since we're just starting with Ceph we don't have much of a reference for how things should be performing
[20:55] <stefan0> (we started working with it 3-4 months ago)
[20:56] * smiley99 slaps aakso around a bit with a large fishbot
[21:00] * jdillaman (~jdillaman@pool-108-18-97-95.washdc.fios.verizon.net) Quit (Read error: Connection reset by peer)
[21:01] * derjohn_mobi (~aj@x590db2b0.dyn.telefonica.de) has joined #ceph
[21:01] * EinstCrazy (~EinstCraz@58.39.76.245) Quit (Read error: Connection reset by peer)
[21:02] * EinstCrazy (~EinstCraz@58.39.76.245) has joined #ceph
[21:03] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[21:03] * penguinRaider (~KiKo@204.152.207.173) has joined #ceph
[21:03] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) has joined #ceph
[21:04] * mykola (~Mikolaj@193.93.217.35) Quit (Quit: away)
[21:04] * mgolub (~Mikolaj@193.93.217.35) Quit (Quit: away)
[21:05] * jdillaman (~jdillaman@pool-108-18-97-95.washdc.fios.verizon.net) has joined #ceph
[21:06] * squizzi (~squizzi@107.13.31.195) Quit (Ping timeout: 480 seconds)
[21:09] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[21:10] * rendar (~I@host208-116-dynamic.51-82-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[21:13] * EinstCrazy (~EinstCraz@58.39.76.245) Quit (Remote host closed the connection)
[21:14] * malevolent (~quassel@192.146.172.118) has joined #ceph
[21:20] * haplo37 (~haplo37@107.190.44.23) has joined #ceph
[21:22] <blizzow> stefan0: How many clients when you're getting 200MBps write?
[21:24] <stefan0> blizzow, none
[21:24] <stefan0> just the test environment
[21:24] <blizzow> How are you benchmarking then?
[21:24] * TMM (~hp@84.243.211.106) Quit (Quit: Ex-Chat)
[21:25] <blizzow> You have to have at least one client (the benchmark process) to run a benchmark, no?
[21:25] <stefan0> yes, but I mean no concurrency at all
[21:26] <stefan0> we're using HD Tune Pro (our server environment is primarily Windows)
[21:27] * jlayton (~jlayton@cpe-2606-A000-1125-405B-14D9-DFF4-8FF1-7DD8.dyn6.twc.com) has left #ceph
[21:28] <SamYaple> stefan0: how are you accessing ceph? rbd? qemu? librados directly?
[21:28] <stefan0> SamYaple, rbd, through Mirantis Fuel deployment
[21:28] * Kidlvr (~Guest1390@46.166.190.176) has joined #ceph
[21:29] <SamYaple> so you're using openstack. have you set up the appropriate driver for the glance image?
[21:30] <SamYaple> using virtio you'll have a bottleneck that's noticeable (200MBps sounds about right)
[21:30] <SamYaple> virtio-scsi however allows me to saturate 10Gb/s
[21:31] <blizzow> Okay, yeah, I see even lower performance, ~40-60MB/sec :( I can write to disk on each OSD node at about ~1000MB/sec and have a dedicated 10GbE network, but get crushed when I put ceph on it.
[21:31] <stefan0> SamYaple, according to our design (S3710 for journal, S3610 for hot-storage and the rest SATA for regular storage), do you think we should re-think that?
[21:32] <SamYaple> stefan0: personally, I am a huge advocate of bcache+osds. it allows for ssd write speeds without creating more failure domains
[21:33] <SamYaple> stefan0: I would suggest at least investigating it
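
For reference, a minimal sketch of how a bcache-backed OSD device is typically assembled with bcache-tools (device names are hypothetical; the resulting /dev/bcache0 is what would then be handed to ceph-disk or used as the OSD data device):

    # backing device on the spinner, cache device on an SSD partition
    make-bcache -B /dev/sdb
    make-bcache -C /dev/nvme0n1p1

    # registration is usually handled by udev; attach the backing device to the cache set
    echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach

    # writeback mode is what gives the SSD-speed writes discussed above
    echo writeback > /sys/block/bcache0/bcache/cache_mode
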
[21:33] * tsg_ (~tgohad@192.55.54.40) Quit (Remote host closed the connection)
[21:33] * tsg_ (~tgohad@134.134.139.78) has joined #ceph
[21:33] <SamYaple> stefan0: but what you describe seems fine, though I would stay away from a cache tier
[21:35] <stefan0> SamYaple, I didn't fall in love with cache tiering, indeed
[21:35] * rendar (~I@host208-116-dynamic.51-82-r.retail.telecomitalia.it) has joined #ceph
[21:36] <stefan0> blizzow, googled bcache+osd and didn't find a lot of things
[21:36] <stefan0> do you recommend any material?
[21:37] <stefan0> when you mentioned the driver for the glance image, would that be the same driver for cinder and swift?
[21:38] <SamYaple> stefan0: frankly, i am doubtful about how much work is going to be put into cache tiering in the future. with bluestore fast approaching, it doesn't make a whole lot of sense to me
[21:38] <SamYaple> plus the code is not exactly getting lots of love, understandably
[21:38] <stefan0> hmm.. thanks for the info
[21:39] <SamYaple> stefan0: `glance image-update --property hw_scsi_model=virtio-scsi --property hw_disk_bus=scsi <imageid>`
[21:39] * analbeard (~shw@host86-142-132-208.range86-142.btcentralplus.com) has joined #ceph
[21:39] <SamYaple> you'll have to relaunch the instance from the image for that to take effect
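
If it helps, the properties can be checked on the image afterwards; a small sketch, assuming the glance CLI of that era and a hypothetical <imageid>:

    glance image-show <imageid> | grep -E 'hw_scsi_model|hw_disk_bus'
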
[21:40] <SamYaple> i'm not sure how many people use bcache for osds stefan0. i manage a ~1.5PB cluster and we use bcache for it all
[21:40] <SamYaple> we have a really bad workload too (lots of 4k writes)
[21:41] <stefan0> SamYaple, what do you think about turning off the cache tier and set the primary affinity?
[21:41] <stefan0> (throwing the SSDs S3610 drives to the same SATA pool)
[21:42] <stefan0> (and setting them affinity 1 and the SATA platters to zero)
[21:42] <SamYaple> stefan0: if your workload is a lot of reads, then go for it
[21:42] <SamYaple> if it's mostly writes, it won't help at all
[21:42] * davidzlap (~Adium@2605:e000:1313:8003:5c17:daa8:30cf:6a22) Quit (Ping timeout: 480 seconds)
[21:42] <SamYaple> so the idea with bcache is that you use the ssds you would have for the journals for the journal+osd combo
[21:43] <SamYaple> you could still do the affinity with that setup and _then_ it would be very fast
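
A sketch of the primary-affinity side of that idea (OSD ids are hypothetical; on pre-Luminous releases the mon option has to be enabled before affinity values are honoured, and as noted above it only helps reads):

    # allow primary affinity to take effect
    ceph tell mon.* injectargs '--mon_osd_allow_primary_affinity=true'

    # prefer the SSD-backed OSDs as primaries, de-prefer the spinners
    ceph osd primary-affinity osd.0 1.0    # SSD-backed OSD (example id)
    ceph osd primary-affinity osd.8 0.0    # HDD-backed OSD (example id)
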
[21:43] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[21:43] <stefan0> the writes will be handicapped at the journal, right?
[21:43] * haplo37 (~haplo37@107.190.44.23) Quit (Ping timeout: 480 seconds)
[21:43] <SamYaple> not with the hardware you've posted
[21:43] <SamYaple> but remember it has to hit the journal and osds
[21:43] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[21:45] <SamYaple> so for us stefan0, what we found was with lots of small writes it would hit the journal and everyone would be happy.... until 30 seconds later when the journal flushed to the disk and it would all lock up
[21:46] <SamYaple> once we switched to bcache everything was better
[21:46] * davidzlap (~Adium@2605:e000:1313:8003:5c17:daa8:30cf:6a22) has joined #ceph
[21:46] <SamYaple> bcache with read affinity sounds ideal, though i've not set up read affinity in production
[21:46] <SamYaple> we don't see many reads, and the reads we do have are from writes that just happened so they still exist in the ssd cache anyway
[21:47] * penguinRaider_ (~KiKo@204.152.207.173) has joined #ceph
[21:49] <stefan0> SamYaple, great.. so, first, setting hw_scsi_model can possibly help (if we're actually using virtio)
[21:50] <SamYaple> stefan0: i believe with the setup you've described you are likely hitting a software bottleneck in virtio
[21:50] <SamYaple> virtio-scsi still has bottlenecks. it can only do ~5k iops. i've never found the throughput bottleneck though
[21:50] <SamYaple> i _only_ have 10gb networking :)
[21:51] <stefan0> SamYaple, great, thanks for your help..
[21:52] * jlayton (~jlayton@cpe-2606-A000-1125-405B-14D9-DFF4-8FF1-7DD8.dyn6.twc.com) has joined #ceph
[21:52] * penguinRaider (~KiKo@204.152.207.173) Quit (Ping timeout: 480 seconds)
[21:58] * praveen_ (~praveen@121.244.155.12) Quit (Remote host closed the connection)
[21:58] * Kidlvr (~Guest1390@61TAAA3FM.tor-irc.dnsbl.oftc.net) Quit ()
[21:59] * aNupoisc (~adnavare@134.134.139.82) Quit (Remote host closed the connection)
[22:01] * Hemanth (~hkumar_@103.228.221.143) Quit (Ping timeout: 480 seconds)
[22:01] * shubjero (~shubjero@107.155.107.246) Quit (Quit: Bye!)
[22:02] * shubjero (~shubjero@107.155.107.246) has joined #ceph
[22:07] * vata (~vata@ARennes-652-1-70-186.w2-11.abo.wanadoo.fr) has joined #ceph
[22:12] * analbeard (~shw@host86-142-132-208.range86-142.btcentralplus.com) Quit (Quit: Leaving.)
[22:14] * tsg__ (~tgohad@134.134.139.78) has joined #ceph
[22:14] * tsg_ (~tgohad@134.134.139.78) Quit (Remote host closed the connection)
[22:14] * penguinRaider_ (~KiKo@204.152.207.173) Quit (Ping timeout: 480 seconds)
[22:14] * tsg__ (~tgohad@134.134.139.78) Quit (Remote host closed the connection)
[22:14] * tsg__ (~tgohad@192.55.54.40) has joined #ceph
[22:24] * penguinRaider_ (~KiKo@146.185.31.226) has joined #ceph
[22:24] * squizzi (~squizzi@107.13.31.195) has joined #ceph
[22:26] * georgem (~Adium@206.108.127.16) Quit (Ping timeout: 480 seconds)
[22:27] * tom_nz (~oftc-webi@202.14.217.2) has joined #ceph
[22:33] * aNupoisc (~adnavare@192.55.54.40) has joined #ceph
[22:36] * Jeffrey4l_ (~Jeffrey@110.252.65.47) has joined #ceph
[22:39] * Jeffrey4l (~Jeffrey@110.252.45.235) Quit (Ping timeout: 480 seconds)
[22:40] * garphy`aw is now known as garphy
[22:40] <srk_> SamYaple: Did you run any benchmarks on a bcache-based cluster?
[22:41] <srk_> I'm trying to run a rados bench write with the default 4M and hitting slow requests right away
[22:41] * dgurtner (~dgurtner@178.197.225.145) has joined #ceph
[22:41] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[22:43] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[22:45] * smiley99 (~oftc-webi@pool-108-45-41-147.washdc.fios.verizon.net) Quit (Quit: Page closed)
[22:45] <SamYaple> srk_: i did not perform rados bench on the larger cluster if that's what you are asking. but I can perform it on our test lab if you want
[22:45] <SamYaple> what are you using/how are you benchmarking and i'll run those tests
[22:46] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[22:46] <srk_> This is a 3 node cluster with 6x4TB disks per node.
[22:46] <srk_> 10gb bonded
[22:47] <srk_> Cluster is Jewel 10.2.2 based with Jemalloc enabled.
[22:47] <SamYaple> I have a comparable cluster, 3x3x4TB with 10gb (active/passive) and intel 750s for journal+bcache
[22:47] <SamYaple> it is not running erasure though, since that's foolish on such a small cluster
[22:48] <srk_> yea. no erasure
[22:48] <srk_> my SSD is 1.2TB INtel S3710
[22:49] <srk_> 6x10GB journals and rest for bcache cache
[22:49] <srk_> 4TBs are Western Digital
[22:50] <SamYaple> hmm. mine's a bit different. i add /dev/sd* to the bcache set and then create the journals on /dev/bcache0p1 /dev/bcache0p2 as colocated journals
[22:50] <SamYaple> for performance i suppose that wouldn't matter one way or the other... you'll just lose some efficiency
[22:52] <srk_> rados bench 300 write -p rbd
[22:52] <srk_> I try rados bench from Monitor nodes or openstack controller nodes
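
For context, a sketch of the rados bench variants that come up later in this conversation (pool name rbd as above; 300-second runs; --no-cleanup keeps the objects so read phases can be run afterwards):

    # 4 MB object writes
    rados bench -p rbd 300 write --no-cleanup

    # 4k object writes, as in the numbers quoted below
    rados bench -p rbd 300 write -b 4096 --no-cleanup

    # sequential and random reads against the objects left behind
    rados bench -p rbd 300 seq
    rados bench -p rbd 300 rand

    # remove the benchmark objects afterwards
    rados -p rbd cleanup
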
[22:54] <SamYaple> ok
[22:58] * tsg__ (~tgohad@192.55.54.40) Quit (Remote host closed the connection)
[22:58] * praveen (~praveen@171.61.115.158) has joined #ceph
[22:59] <SamYaple> srk_: what results are you expecting?
[23:00] <srk_> Total writes made: 21282
[23:00] <srk_> Write size: 4194304
[23:00] <srk_> Bandwidth (MB/sec): 283.105
[23:01] <srk_> and No slow requests
[23:01] * tsg (~tgohad@134.134.139.82) has joined #ceph
[23:01] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)
[23:02] <srk_> I had Hammer on this same setup and didn't see any Slow requests. With Jewel, they are seen right away
[23:03] * mhack (~mhack@nat-pool-bos-t.redhat.com) Quit (Remote host closed the connection)
[23:03] <srk_> ceph osd perf, doesn't show any latency
[23:03] * _28_ria (~kvirc@opfr028.ru) has joined #ceph
[23:03] <srk_> They were all 0 in fact
[23:04] <SamYaple> Total writes made: 9368
[23:04] <SamYaple> Bandwidth (MB/sec): 124.517
[23:04] <SamYaple> so a bit slower than what you've got listed
[23:04] <SamYaple> though I have tuned bcache differently from the default
[23:04] <SamYaple> no slow requests
[23:04] <SamYaple> this is a lab too, there are active vms running on this, but they are not too io intensive
[23:05] <srk_> is it Jewel too?
[23:05] <SamYaple> yes
[23:06] * EthanL (~lamberet@cce02cs4042-fa12-z.ams.hpecore.net) has joined #ceph
[23:06] <srk_> 4k write tests are going well though
[23:06] <srk_> i'll share the number once its done
[23:06] <SamYaple> srk_: i think the difference here is your journals are on the ssd directly, while mine are going to and _through_ the ssd
[23:07] <SamYaple> to confirm this, lower your osd sync time
[23:07] <SamYaple> your slow requests should disappear/lessen
[23:07] * bitserker (~toni@81.184.9.72.dyn.user.ono.com) has joined #ceph
[23:08] <srk_> you mean `filestore max sync interval` ?
[23:08] <SamYaple> thats the one
[23:08] <SamYaple> lower it to 1 second, or even .5 seconds
[23:09] <SamYaple> if you do an iostat on your ssd you'll likely see it at 100% util while the osds are not. you want to keep them both close to 100% util for best performance
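
A sketch of how that tunable might be lowered, either persistently or on the fly (the 1-second value is the one suggested above; the filestore default is 5 seconds):

    # ceph.conf on the OSD nodes
    [osd]
    filestore max sync interval = 1

    # or inject at runtime without restarting the OSDs
    ceph tell osd.* injectargs '--filestore_max_sync_interval 1'
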
[23:10] <srk_> these are the 4k write numbers: Total writes made: 3120296
[23:10] <srk_> Write size: 4096
[23:10] <srk_> Object size: 4096
[23:10] <srk_> Bandwidth (MB/sec): 40.6141
[23:10] <srk_> Stddev Bandwidth: 14.7357
[23:10] <srk_> Max bandwidth (MB/sec): 55.3516
[23:11] <SamYaple> ok i'm testing some bcache tuning and then i'll do the 4k test
[23:12] * ntpttr_ (~ntpttr@134.134.139.77) Quit (Quit: Leaving)
[23:12] <srk_> Thank you. no problem. I was about to ask about bcache tuning :)
[23:12] * ntpttr (~ntpttr@134.134.139.77) Quit (Quit: Leaving)
[23:12] * ntpttr (~ntpttr@134.134.139.77) has joined #ceph
[23:13] <SamYaple> so it depends on what you need and how utilized it is, right? in my case I put the sequential write cutoff at 0, which means even sequential writes get sent to the bcache cache device
[23:13] <aNupoisc> Hi guys, i am trying to add the apache2 repo to my system's apt list with a deb line for http://gitbuilder.ceph.com/apache2-deb-trusty-x86_64-basic but I can't see this under gitbuilder.ceph.com/. Can anyone please let me know what the correct equivalent of this should be
[23:13] <SamYaple> that's not ideal for a lot of cases, like writing out a 1TB file
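
The knob being described lives in sysfs; a minimal sketch (bcache0 and <cset-uuid> are placeholders for the actual device and cache set):

    # 0 disables the sequential cutoff, so even large sequential writes go through the cache
    echo 0 > /sys/block/bcache0/bcache/sequential_cutoff

    # related knobs: stop bypassing the cache when it looks congested
    echo 0 > /sys/fs/bcache/<cset-uuid>/congested_write_threshold_us
    echo 0 > /sys/fs/bcache/<cset-uuid>/congested_read_threshold_us
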
[23:14] <aNupoisc> has it changed to libapache?
[23:14] * bitserker (~toni@81.184.9.72.dyn.user.ono.com) Quit (Quit: Leaving.)
[23:15] <aNupoisc> i can see "libapache-mod-fastcgi-deb-trusty-x86_64-basic/"
[23:16] <aNupoisc> any idea?
[23:18] * georgem (~Adium@24.114.58.54) has joined #ceph
[23:19] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[23:20] * EthanL (~lamberet@cce02cs4042-fa12-z.ams.hpecore.net) Quit (Ping timeout: 480 seconds)
[23:20] * dnunez (~dnunez@nat-pool-bos-t.redhat.com) Quit (Remote host closed the connection)
[23:22] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[23:23] * david_ (~david@207.107.71.71) Quit (Remote host closed the connection)
[23:23] <SamYaple> ok srk_ running 4k tests now
[23:24] * vegas3 (~delcake@tor-exit.squirrel.theremailer.net) has joined #ceph
[23:25] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Quit: Konversation terminated!)
[23:27] <srk_> ok
[23:27] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[23:27] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) Quit (Ping timeout: 480 seconds)
[23:30] <SamYaple> i'm not getting slow requests but i'm getting awfully slow speeds. ~4MB/s
[23:30] <SamYaple> let me reset my tunings
[23:31] <srk_> sure.
[23:32] <srk_> SamYaple: Do you know of any published fio benchmark data that can be used as a baseline?
[23:33] <SamYaple> not recent
[23:33] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[23:34] <blizzow> How suicidal is it to put the osd journal on a ramdisk?
[23:34] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[23:35] * georgem (~Adium@24.114.58.54) Quit (Quit: Leaving.)
[23:36] <SamYaple> blizzow: for production? pretty suicidal i think
[23:36] <SamYaple> blizzow: pretty much as soon as you need to reboot and don't flush out that journal, the osd is trashed
[23:36] <SamYaple> on reboot you'd need to recreate the journal each time
[23:37] <blizzow> But assuming lots of nodes and OSDs, that danger becomes lower.
[23:38] <blizzow> right?
[23:38] <blizzow> Or rather the consequence of losing an OSD becomes less as more nodes and OSDs are created.
[23:38] <SamYaple> blizzow: i mean any reboot will require a journal flush and every restart will require a journal creation, beyond that it should be fine
[23:39] <blizzow> But performance should be a lot better with a RAMdisk journal than even an SSD, right?
[23:40] <SamYaple> blizzow: no. it wont be much, if any, better
[23:40] <SamYaple> it still has to flush to the backing spinning disk
[23:40] <SamYaple> so unless you are bottlenecking on your ssd right now, it's not going to improve anything
[23:40] <The1_> and if the right combination of nodes dies at the same time (e.g. via a power outage), you lose the data not flushed to the OSD data disks
[23:41] <The1_> .. and recovery becomes hard since multiple journals are gone
[23:41] <The1_> normally you lose the OSD when you lose its journal
[23:41] <SamYaple> The1_: always (unless the journal was flushed via command with the osd shutdown)
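
The flush/recreate cycle being referred to looks roughly like this (osd.3 is a hypothetical id; the OSD must be stopped before the journal is flushed):

    # cleanly flush the journal before it is lost, e.g. before a planned reboot
    systemctl stop ceph-osd@3
    ceph-osd -i 3 --flush-journal

    # once the (empty) journal device is back, recreate the journal and restart
    ceph-osd -i 3 --mkjournal
    systemctl start ceph-osd@3
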
[23:42] <The1_> there are ways, but its not foolproof
[23:42] <SamYaple> well it's also not guaranteed
[23:42] <The1_> indeed
[23:43] <SamYaple> if you are trying to recover an object after you lost the other 2 of 3 copies you can _technically_ get it off of the old osd, but you can't guarantee it is valid/most recent
[23:43] <The1_> so just for the sake of it - a lost journal == a lost OSD
[23:43] <SamYaple> right
[23:47] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[23:47] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) Quit (Quit: billwebb)
[23:48] * Sirenia (~sirenia@454028b1.test.dnsbl.oftc.net) has joined #ceph
[23:52] * vata (~vata@ARennes-652-1-70-186.w2-11.abo.wanadoo.fr) Quit (Quit: Leaving.)
[23:54] * vegas3 (~delcake@26XAAAT53.tor-irc.dnsbl.oftc.net) Quit ()
[23:56] * rendar (~I@host208-116-dynamic.51-82-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[23:59] * Rens2Sea (~Tarazed@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.