#ceph IRC Log

Index

IRC Log for 2016-04-27

Timestamps are in GMT/BST.

[0:00] * Roy (~Blueraven@06SAABQN2.tor-irc.dnsbl.oftc.net) Quit ()
[0:02] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[0:03] <TMM> I'd still like to fix these slow request when osds are peering
[0:03] * haomaiwang (~haomaiwan@11.sub-70-196-15.myvzw.com) Quit (Ping timeout: 480 seconds)
[0:10] * redf_ (~red@80-108-89-163.cable.dynamic.surfer.at) has joined #ceph
[0:11] * HappyLoaf (~HappyLoaf@cpc93928-bolt16-2-0-cust133.10-3.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[0:12] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Quit: Konversation terminated!)
[0:14] * Racpatel (~Racpatel@2601:87:3:3601::675d) Quit (Ping timeout: 480 seconds)
[0:17] * redf (~red@80-108-89-163.cable.dynamic.surfer.at) Quit (Ping timeout: 480 seconds)
[0:17] * haomaiwang (~haomaiwan@11.sub-70-196-15.myvzw.com) has joined #ceph
[0:19] * Discovery (~Discovery@178.239.49.68) Quit (Read error: Connection reset by peer)
[0:20] * fsimonce (~simon@host201-70-dynamic.26-79-r.retail.telecomitalia.it) Quit (Quit: Coyote finally caught me)
[0:21] * badone (~badone@66.187.239.16) has joined #ceph
[0:22] * Racpatel (~Racpatel@2601:87:3:3601::675d) has joined #ceph
[0:23] * lcurtis_ (~lcurtis@47.19.105.250) has joined #ceph
[0:25] * xarses (~xarses@209.94.245.222) has joined #ceph
[0:26] * haomaiwang (~haomaiwan@11.sub-70-196-15.myvzw.com) Quit (Ping timeout: 480 seconds)
[0:26] * OODavo (~Rehevkor@4MJAAEG8R.tor-irc.dnsbl.oftc.net) Quit ()
[0:30] * Freddy (~nartholli@06SAABQRH.tor-irc.dnsbl.oftc.net) has joined #ceph
[0:44] * Racpatel (~Racpatel@2601:87:3:3601::675d) Quit (Quit: Leaving)
[0:53] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:8cb1:7443:8517:e025) has joined #ceph
[0:53] * beaver6675 (~beaver667@101.127.60.122) has joined #ceph
[0:54] * hardwire (~hardwire@00012b17.user.oftc.net) Quit (Remote host closed the connection)
[0:55] * beaver6675 (~beaver667@101.127.60.122) Quit ()
[1:00] * Freddy (~nartholli@06SAABQRH.tor-irc.dnsbl.oftc.net) Quit ()
[1:08] * kevinc (~kevinc__@ip174-65-71-172.sd.sd.cox.net) has joined #ceph
[1:11] * mimizone (~mimizone@mx.boitachou.com) has joined #ceph
[1:11] * mimizone (~mimizone@mx.boitachou.com) has left #ceph
[1:13] * vata1 (~vata@207.96.182.162) Quit (Quit: Leaving.)
[1:14] * _{Tite}_ (~oftc-webi@209-6-251-36.c3-0.wrx-ubr1.sbo-wrx.ma.cable.rcn.com) has joined #ceph
[1:14] <_{Tite}_> good afternoon folks
[1:15] <_{Tite}_> anybody with a few minutes to lend me a hand ? I am trying to add a osd device with no luck .. no output on the logs
[1:17] * xarses_ (~xarses@209.94.245.222) Quit (Ping timeout: 480 seconds)
[1:17] * xarses (~xarses@209.94.245.222) Quit (Ping timeout: 480 seconds)
[1:18] * lcurtis_ (~lcurtis@47.19.105.250) Quit (Remote host closed the connection)
[1:22] <_{Tite}_> there is an error actually: FileStore::mount: lock_fsid failed
[1:26] <_{Tite}_> son of a gun... stopped the service, killed a stale ceph-osd , and now it did come up
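
For reference, the sequence that usually clears a "lock_fsid failed" error like that is exactly what _{Tite}_ did: find and kill the stale ceph-osd holding the lock before restarting. A minimal sketch, with osd.3 and the PID as placeholders:

    # find a leftover ceph-osd still holding the lock on the OSD data directory
    ps aux | grep '[c]eph-osd'

    # stop the service cleanly, then kill any stale daemon that survived
    sudo systemctl stop ceph-osd@3      # pre-systemd hosts: sudo service ceph stop osd.3
    sudo kill <stale-pid>               # placeholder PID taken from the ps output

    # bring the OSD back up
    sudo systemctl start ceph-osd@3
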
[1:30] * oms101 (~oms101@p20030057EA008900C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:32] * bniver (~bniver@rrcs-24-173-18-66.sw.biz.rr.com) Quit (Ping timeout: 480 seconds)
[1:34] * shyu_ (~shyu@119.254.120.71) has joined #ceph
[1:38] * Lea (~LeaChim@host86-147-224-166.range86-147.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:38] * oms101 (~oms101@p20030057EA002900C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[1:47] * xarses_ (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) has joined #ceph
[1:47] * xarses (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) has joined #ceph
[1:55] * haplo37 (~haplo37@199.91.185.156) Quit (Ping timeout: 480 seconds)
[1:55] * yanzheng (~zhyan@118.116.115.159) has joined #ceph
[1:59] * ibravo (~ibravo@12.14.132.5) has joined #ceph
[2:03] * Skaag (~lunix@65.200.54.234) Quit (Quit: Leaving.)
[2:04] * ItsCriminalAFK (~dontron@anonymous.sec.nl) has joined #ceph
[2:11] * bearkitten (~bearkitte@cpe-76-172-86-115.socal.res.rr.com) Quit (Quit: WeeChat 1.4)
[2:14] * ibravo (~ibravo@12.14.132.5) Quit (Quit: This computer has gone to sleep)
[2:20] * kevinc (~kevinc__@ip174-65-71-172.sd.sd.cox.net) Quit (Quit: Leaving)
[2:21] * xarses_ (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) Quit (Ping timeout: 480 seconds)
[2:21] * xarses (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) Quit (Ping timeout: 480 seconds)
[2:26] * ricin (~Atomizer@kunstler.tor-exit.calyxinstitute.org) has joined #ceph
[2:34] * ItsCriminalAFK (~dontron@06SAABQV0.tor-irc.dnsbl.oftc.net) Quit ()
[2:34] * Freddy (~Moriarty@192.42.115.101) has joined #ceph
[2:36] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) has joined #ceph
[2:42] * haplo37 (~haplo37@107-190-32-70.cpe.teksavvy.com) has joined #ceph
[2:48] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:8cb1:7443:8517:e025) Quit (Ping timeout: 480 seconds)
[2:56] * ricin (~Atomizer@06SAABQWZ.tor-irc.dnsbl.oftc.net) Quit ()
[2:56] * Revo84 (~PeterRabb@94-245-57-237.customer.t3.se) has joined #ceph
[2:56] * icey (~Chris@0001bbad.user.oftc.net) Quit (Remote host closed the connection)
[2:57] * doppelgrau (~doppelgra@ipservice-092-209-179-229.092.209.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[3:04] * Freddy (~Moriarty@06SAABQXC.tor-irc.dnsbl.oftc.net) Quit ()
[3:05] * Frymaster (~Kizzi@hessel2.torservers.net) has joined #ceph
[3:06] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) Quit (Remote host closed the connection)
[3:07] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) has joined #ceph
[3:08] * efirs (~firs@c-50-185-70-125.hsd1.ca.comcast.net) has joined #ceph
[3:14] * yanzheng (~zhyan@118.116.115.159) Quit (Quit: This computer has gone to sleep)
[3:15] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) Quit (Ping timeout: 480 seconds)
[3:18] * winston-d__ (uid98317@id-98317.richmond.irccloud.com) has joined #ceph
[3:20] * winston-d_ (uid98317@id-98317.richmond.irccloud.com) Quit (Ping timeout: 480 seconds)
[3:20] * winston-d__ is now known as winston-d_
[3:24] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[3:25] * winston-d_ (uid98317@id-98317.richmond.irccloud.com) Quit (Quit: Connection closed for inactivity)
[3:26] * Revo84 (~PeterRabb@06SAABQYI.tor-irc.dnsbl.oftc.net) Quit ()
[3:26] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[3:31] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[3:32] * Gandle (~boob@00021b85.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:34] * boredatwork (~overonthe@199.68.193.62) has joined #ceph
[3:34] * Frymaster (~Kizzi@06SAABQYS.tor-irc.dnsbl.oftc.net) Quit ()
[3:35] * Jourei (~Sun7zu@176.10.99.208) has joined #ceph
[3:36] * Mika_c (~quassel@122.146.93.152) has joined #ceph
[3:37] * zhaochao (~zhaochao@125.39.9.159) has joined #ceph
[3:39] * shyu_ (~shyu@119.254.120.71) Quit (Remote host closed the connection)
[3:41] * dynamicudpate (~overonthe@199.68.193.62) Quit (Ping timeout: 480 seconds)
[3:45] * _{Tite}_ (~oftc-webi@209-6-251-36.c3-0.wrx-ubr1.sbo-wrx.ma.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[3:51] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) has joined #ceph
[3:52] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[3:55] * shyu (~Shanzhi@119.254.120.66) Quit (Ping timeout: 480 seconds)
[3:56] * isaxi (~Aal@tor-exit.squirrel.theremailer.net) has joined #ceph
[4:00] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[4:00] * IvanJobs (~hardes@103.50.11.146) has joined #ceph
[4:02] * EinstCra_ (~EinstCraz@58.247.119.250) has joined #ceph
[4:04] * cholcombe (~chris@206.193.217.205) has joined #ceph
[4:04] * Jourei (~Sun7zu@76GAAEVIR.tor-irc.dnsbl.oftc.net) Quit ()
[4:04] * Altitudes (~vend3r@76GAAEVJM.tor-irc.dnsbl.oftc.net) has joined #ceph
[4:06] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[4:06] * shyu (~Shanzhi@119.254.120.67) has joined #ceph
[4:07] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[4:08] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[4:09] * flisky (~Thunderbi@36.110.40.24) has joined #ceph
[4:09] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Ping timeout: 480 seconds)
[4:11] * shyu (~Shanzhi@119.254.120.67) Quit (Quit: Leaving)
[4:14] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[4:17] * [1]lj (~liujun@111.202.176.44) has joined #ceph
[4:18] * EinstCra_ (~EinstCraz@58.247.119.250) Quit (Ping timeout: 480 seconds)
[4:19] * cholcombe (~chris@206.193.217.205) Quit (Ping timeout: 480 seconds)
[4:26] * isaxi (~Aal@7V7AAD7XP.tor-irc.dnsbl.oftc.net) Quit ()
[4:26] * _s1gma (~luigiman@tor-exit5-readme.dfri.se) has joined #ceph
[4:34] * Altitudes (~vend3r@76GAAEVJM.tor-irc.dnsbl.oftc.net) Quit ()
[4:35] * rapedex (~Scymex@5.61.34.63) has joined #ceph
[4:44] * EinstCra_ (~EinstCraz@58.247.119.250) has joined #ceph
[4:48] * lj (~oftc-webi@111.202.176.44) Quit (Quit: Page closed)
[4:48] * [1]lj is now known as lj
[4:51] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Ping timeout: 480 seconds)
[4:56] * _s1gma (~luigiman@06SAABQ2K.tor-irc.dnsbl.oftc.net) Quit ()
[4:56] * galaxyAbstractor (~Da_Pineap@104.200.154.56) has joined #ceph
[5:04] * rapedex (~Scymex@6AGAABEED.tor-irc.dnsbl.oftc.net) Quit ()
[5:04] * Sami345 (~Epi@192.42.115.101) has joined #ceph
[5:08] * haomaiwang (~haomaiwan@rrcs-67-79-205-19.sw.biz.rr.com) has joined #ceph
[5:09] * lifeboy (~roland@196.32.233.91) Quit (Remote host closed the connection)
[5:11] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[5:11] * Nacer (~Nacer@101.53.18.178) has joined #ceph
[5:11] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[5:24] * kefu (~kefu@183.193.162.205) has joined #ceph
[5:25] * Vacuum__ (~Vacuum@i59F7978B.versanet.de) has joined #ceph
[5:26] * galaxyAbstractor (~Da_Pineap@104.200.154.56) Quit ()
[5:26] * elt (~pico@62.102.148.67) has joined #ceph
[5:30] * shyu (~Shanzhi@119.254.120.66) has joined #ceph
[5:32] * Vacuum_ (~Vacuum@i59F79E89.versanet.de) Quit (Ping timeout: 480 seconds)
[5:33] * Mons (~manens@relay.manens.org) Quit (Remote host closed the connection)
[5:34] * Sami345 (~Epi@4MJAAEHDL.tor-irc.dnsbl.oftc.net) Quit ()
[5:37] * yanzheng (~zhyan@118.116.115.159) has joined #ceph
[5:40] * overclk (~quassel@121.244.87.117) has joined #ceph
[5:48] * thansen (~thansen@162.219.43.108) has joined #ceph
[5:49] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[5:52] * Hemanth (~hkumar_@103.228.221.189) has joined #ceph
[5:52] * cholcombe (~chris@206.193.217.205) has joined #ceph
[5:53] * cholcombe (~chris@206.193.217.205) Quit ()
[5:56] * elt (~pico@06SAABQ4V.tor-irc.dnsbl.oftc.net) Quit ()
[5:56] * Azerothian______ (~Pulec@static-ip-85-25-103-119.inaddr.ip-pool.com) has joined #ceph
[5:57] <IvanJobs> PoRNo-MoRoZ, maybe choose some idle moment, remove this osd, and treat it as a new one to add to the other host; that's the easiest way to do this. Ok, I know that it will cause two backfills, just FYI.
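
For reference, the usual way to do what IvanJobs describes (drain the OSD, then re-add the disk on the other host) looks roughly like this; a sketch only, with osd.7 as a placeholder id:

    # drain the OSD and let backfill move its PGs elsewhere (watch ceph -s)
    ceph osd out osd.7

    # once the cluster is back to active+clean, stop and remove it
    systemctl stop ceph-osd@7           # or: service ceph stop osd.7
    ceph osd crush remove osd.7
    ceph auth del osd.7
    ceph osd rm osd.7

    # then prepare the disk on the new host, e.g. with ceph-disk or ceph-deploy,
    # which triggers the second backfill IvanJobs mentions
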
[6:09] * xarses_ (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) has joined #ceph
[6:09] * xarses (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) has joined #ceph
[6:12] * ibravo (~ibravo@12.14.132.5) has joined #ceph
[6:16] * ibravo2 (~ibravo@72.198.142.104) has joined #ceph
[6:22] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[6:22] * Nacer (~Nacer@101.53.18.178) Quit (Remote host closed the connection)
[6:23] * ibravo (~ibravo@12.14.132.5) Quit (Ping timeout: 480 seconds)
[6:23] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[6:25] * ibravo2 (~ibravo@72.198.142.104) Quit (Quit: Leaving)
[6:26] * Azerothian______ (~Pulec@7V7AAD7YH.tor-irc.dnsbl.oftc.net) Quit ()
[6:26] * Azerothian______ (~Pirate@hessel3.torservers.net) has joined #ceph
[6:28] <skullone> https://www.youtube.com/watch?v=zlfzvjUgqr4
[6:29] <skullone> water tested my old lenovo
[6:33] * Hemanth (~hkumar_@103.228.221.189) Quit (Ping timeout: 480 seconds)
[6:35] * xarses_ (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) Quit (Read error: Connection reset by peer)
[6:35] * xarses (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) Quit (Read error: Connection reset by peer)
[6:35] * xarses_ (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) has joined #ceph
[6:35] * andreww (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) has joined #ceph
[6:36] * branto (~branto@ip-78-102-208-181.net.upcbroadband.cz) has joined #ceph
[6:39] * Peaced (~Epi@tor-exit1-readme.dfri.se) has joined #ceph
[6:56] * Azerothian______ (~Pirate@06SAABQ7A.tor-irc.dnsbl.oftc.net) Quit ()
[6:56] * Kidlvr (~measter@rfidlab2.redi.uniroma1.it) has joined #ceph
[6:58] * bjarne is now known as dvahlin
[7:00] * dneary (~dneary@12.237.105.253) has joined #ceph
[7:01] * xarses_ (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) Quit (Read error: Connection reset by peer)
[7:01] * xarses_ (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) has joined #ceph
[7:06] * kawa2014 (~kawa@83.111.58.108) has joined #ceph
[7:09] * Peaced (~Epi@4MJAAEHEZ.tor-irc.dnsbl.oftc.net) Quit ()
[7:10] * vata (~vata@cable-21.246.173-197.electronicbox.net) Quit (Quit: Leaving.)
[7:10] * dneary (~dneary@12.237.105.253) Quit (Ping timeout: 480 seconds)
[7:13] * straterra (~KUSmurf@46.105.61.138) has joined #ceph
[7:18] * IvanJobs (~hardes@103.50.11.146) Quit (Read error: Connection reset by peer)
[7:26] * Kidlvr (~measter@6AGAABEJH.tor-irc.dnsbl.oftc.net) Quit ()
[7:26] * Uniju1 (~Jebula@4MJAAEHF2.tor-irc.dnsbl.oftc.net) has joined #ceph
[7:28] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) Quit (Remote host closed the connection)
[7:39] * Gandle (~boob@00021b85.user.oftc.net) has joined #ceph
[7:40] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[7:43] * straterra (~KUSmurf@06SAABQ87.tor-irc.dnsbl.oftc.net) Quit ()
[7:44] * Jaska (~Joppe4899@Relay-J.tor-exit.network) has joined #ceph
[7:44] <skullone> my new one is a t460s, without a waterproof keyboard...
[7:44] <skullone> having second thoughts
[7:53] * karnan (~karnan@121.244.87.117) has joined #ceph
[7:54] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) has joined #ceph
[7:56] * Uniju1 (~Jebula@4MJAAEHF2.tor-irc.dnsbl.oftc.net) Quit ()
[7:56] * kefu is now known as kefu|afk
[8:00] * kefu|afk is now known as kefu
[8:06] * rraja (~rraja@121.244.87.117) has joined #ceph
[8:08] * kefu_ (~kefu@114.92.122.74) has joined #ceph
[8:12] * kefu (~kefu@183.193.162.205) Quit (Ping timeout: 480 seconds)
[8:13] * lmb (~Lars@p578a9261.dip0.t-ipconnect.de) has joined #ceph
[8:13] * Jaska (~Joppe4899@4MJAAEHF6.tor-irc.dnsbl.oftc.net) Quit ()
[8:14] * Freddy (~Inverness@176.10.99.208) has joined #ceph
[8:18] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[8:19] * rdas (~rdas@121.244.87.116) has joined #ceph
[8:25] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[8:26] * cooey (~JohnO@relay1.cavefelem.com) has joined #ceph
[8:30] * lifeboy (~roland@196.32.233.188) has joined #ceph
[8:32] * linjan__ (~linjan@176.195.142.37) Quit (Ping timeout: 480 seconds)
[8:39] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[8:41] * Mika_c (~quassel@122.146.93.152) Quit (Remote host closed the connection)
[8:41] * nils_ (~nils_@doomstreet.collins.kg) has joined #ceph
[8:43] * scuttlemonkey is now known as scuttle|afk
[8:43] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[8:43] * ade (~abradshaw@dslb-088-072-191-127.088.072.pools.vodafone-ip.de) has joined #ceph
[8:43] * Freddy (~Inverness@6AGAABEL1.tor-irc.dnsbl.oftc.net) Quit ()
[8:43] * Grum (~Malcovent@exit1.ipredator.se) has joined #ceph
[8:45] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[8:48] * kasimon (~user@2a02:2450:dd1f::2450) has joined #ceph
[8:55] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[8:56] * cooey (~JohnO@6AGAABEMI.tor-irc.dnsbl.oftc.net) Quit ()
[8:56] * hyst (~biGGer@176.10.99.202) has joined #ceph
[8:59] * chasmo77 (~chas77@158.183-62-69.ftth.swbr.surewest.net) has joined #ceph
[9:03] <kasimon> Hi! After update to jewel, my evaluation cluster (debian jessie) doesn't start osd's anymore after boot. ceph-disk list shows them as "prepared" and ceph-disk activate-all starts them without problem.
[9:04] * kefu_ (~kefu@114.92.122.74) Quit (Max SendQ exceeded)
[9:05] * kefu (~kefu@114.92.122.74) has joined #ceph
[9:05] <kasimon> systemd tries to start the osds, but they are not mounted. My guess is it's an udev problem, but I couldn't find anything in the logs.
[9:06] <kasimon> Any ideas?
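
A few checks that may help show where activation stops on a jewel/systemd host; a sketch only, with osd id 0 as a placeholder:

    # what ceph-disk thinks each partition is (should go from "prepared" to "active")
    ceph-disk list

    # the manual workaround already mentioned above
    ceph-disk activate-all

    # replay udev add events for block devices to see whether the ceph udev rules fire
    udevadm trigger --subsystem-match=block --action=add

    # state of the per-OSD unit once the data directory is mounted
    systemctl status ceph-osd@0
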
[9:11] * allaok (~allaok@machine107.orange-labs.com) has joined #ceph
[9:11] * flisky (~Thunderbi@36.110.40.24) Quit (Remote host closed the connection)
[9:11] * allaok (~allaok@machine107.orange-labs.com) Quit ()
[9:11] * allaok (~allaok@machine107.orange-labs.com) has joined #ceph
[9:13] * Grum (~Malcovent@6AGAABEMT.tor-irc.dnsbl.oftc.net) Quit ()
[9:13] * Behedwin (~Schaap@06SAABREA.tor-irc.dnsbl.oftc.net) has joined #ceph
[9:14] * rdas (~rdas@121.244.87.116) Quit (Remote host closed the connection)
[9:14] * rendar (~I@host160-181-dynamic.3-87-r.retail.telecomitalia.it) has joined #ceph
[9:18] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[9:21] * lifeboy (~roland@196.32.233.188) Quit (Quit: Ex-Chat)
[9:24] * rdas (~rdas@121.244.87.116) has joined #ceph
[9:24] * dvanders (~dvanders@dvanders-pro.cern.ch) Quit (Ping timeout: 480 seconds)
[9:25] * mohmultihouse (~mohmultih@gw01.mhitp.dk) has joined #ceph
[9:26] * hyst (~biGGer@6AGAABEM9.tor-irc.dnsbl.oftc.net) Quit ()
[9:26] * w2k (~luckz@76GAAEVPW.tor-irc.dnsbl.oftc.net) has joined #ceph
[9:26] * dvanders (~dvanders@2001:1458:202:200::102:124a) has joined #ceph
[9:27] * haplo37 (~haplo37@107-190-32-70.cpe.teksavvy.com) Quit (Ping timeout: 480 seconds)
[9:27] * kefu (~kefu@114.92.122.74) Quit (Max SendQ exceeded)
[9:28] * kefu (~kefu@114.92.122.74) has joined #ceph
[9:29] * jordanP (~jordan@bdv75-2-81-57-250-57.fbx.proxad.net) has joined #ceph
[9:29] * zenpac (~zenpac3@66.55.33.66) Quit (Ping timeout: 480 seconds)
[9:36] * ska (~skatinolo@cpe-173-174-111-177.austin.res.rr.com) Quit (Remote host closed the connection)
[9:36] * vikhyat is now known as vikhyat|afk
[9:37] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[9:38] * fsimonce (~simon@host201-70-dynamic.26-79-r.retail.telecomitalia.it) has joined #ceph
[9:38] * efirs (~firs@c-50-185-70-125.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:41] * linjan__ (~linjan@86.62.112.22) has joined #ceph
[9:41] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[9:43] * Behedwin (~Schaap@06SAABREA.tor-irc.dnsbl.oftc.net) Quit ()
[9:43] * basicxman (~anadrom@06SAABRFE.tor-irc.dnsbl.oftc.net) has joined #ceph
[9:48] * huangjun (~kvirc@113.57.168.154) has joined #ceph
[9:51] * The_Ball (~pi@20.92-221-43.customer.lyse.net) Quit (Ping timeout: 480 seconds)
[9:54] * The_Ball (~pi@20.92-221-43.customer.lyse.net) has joined #ceph
[9:54] * i_m (~ivan.miro@88.206.104.168) has joined #ceph
[9:56] * w2k (~luckz@76GAAEVPW.tor-irc.dnsbl.oftc.net) Quit ()
[9:56] * MKoR (~VampiricP@50.7.151.127) has joined #ceph
[9:57] * Concubidated (~cube@71.177.40.123) Quit (Quit: Leaving.)
[10:00] * dvanders (~dvanders@2001:1458:202:200::102:124a) Quit (Ping timeout: 480 seconds)
[10:00] * xarses_ (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) Quit (Read error: Connection reset by peer)
[10:00] * andreww (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) Quit (Read error: Connection reset by peer)
[10:01] * xarses (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) has joined #ceph
[10:01] * xarses_ (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) has joined #ceph
[10:01] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[10:02] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[10:07] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[10:11] * DanFoster (~Daniel@office.34sp.com) has joined #ceph
[10:12] * huangjun|2 (~kvirc@113.57.168.154) has joined #ceph
[10:13] * basicxman (~anadrom@06SAABRFE.tor-irc.dnsbl.oftc.net) Quit ()
[10:15] * huangjun (~kvirc@113.57.168.154) Quit (Read error: Connection reset by peer)
[10:16] * LeaChim (~LeaChim@host86-147-224-166.range86-147.btcentralplus.com) has joined #ceph
[10:18] * Rosenbluth (~Sun7zu@06SAABRG1.tor-irc.dnsbl.oftc.net) has joined #ceph
[10:18] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[10:23] * dvanders (~dvanders@dvanders-pro.cern.ch) has joined #ceph
[10:26] * MKoR (~VampiricP@76GAAEVQH.tor-irc.dnsbl.oftc.net) Quit ()
[10:27] <sep> when you empty a disk with ceph osd out osd.9 ; is that disk doing a backfill or recovery ? i wanted to increase it's max backfill to speed up the process of migrating away pg's
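
A sketch of the knob sep is asking about; the values are examples only, and settings injected this way do not survive an OSD restart:

    # raise the backfill limit at runtime on every OSD
    ceph tell osd.* injectargs '--osd-max-backfills 4'

    # or only on the OSD being drained
    ceph tell osd.9 injectargs '--osd-max-backfills 8'
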
[10:28] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[10:28] * i_m (~ivan.miro@88.206.104.168) Quit (Quit: Leaving.)
[10:29] * i_m (~ivan.miro@88.206.104.168) has joined #ceph
[10:29] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[10:30] * Bj_o_rn (~mog_@destiny.enn.lu) has joined #ceph
[10:35] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[10:37] * vikhyat|afk is now known as vikhyat
[10:42] * huangjun|2 is now known as huangjun
[10:42] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[10:45] * derjohn_mob (~aj@46.189.28.82) has joined #ceph
[10:48] * Rosenbluth (~Sun7zu@06SAABRG1.tor-irc.dnsbl.oftc.net) Quit ()
[10:51] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[10:51] * Mika_c (~quassel@122.146.93.152) has joined #ceph
[10:53] * mattbenjamin1 (~mbenjamin@76-206-42-50.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[10:56] * Hemanth (~hkumar_@121.244.87.117) has joined #ceph
[10:57] * Shadow386 (~delcake@37.203.209.26) has joined #ceph
[11:00] * Bj_o_rn (~mog_@4MJAAEHIT.tor-irc.dnsbl.oftc.net) Quit ()
[11:00] * ylmson (~toast@194.187.249.135) has joined #ceph
[11:01] * mattbenjamin1 (~mbenjamin@76-206-42-50.lightspeed.livnmi.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[11:02] * zhaochao_ (~zhaochao@125.39.9.151) has joined #ceph
[11:07] * Shadow386 (~delcake@37.203.209.26) Quit (Remote host closed the connection)
[11:07] * Teddybareman (~W|ldCraze@anonymous6.sec.nl) has joined #ceph
[11:08] * zhaochao (~zhaochao@125.39.9.159) Quit (Ping timeout: 480 seconds)
[11:08] * zhaochao_ is now known as zhaochao
[11:08] <flaf> sep, I'm not a ceph expert, but as I understand it, after that, other OSDs will do the backfill.
[11:09] <vikhyat> flaf: you are right if we mark an osd out
[11:09] <vikhyat> its peers will do the backfill to some other OSD in the cluster
[11:10] <vikhyat> to reach to the replica count
[11:10] <flaf> thx for the confirmation vikhyat ;)
[11:11] <vikhyat> :)
[11:12] * thomnico (~thomnico@2a01:e35:8b41:120:1504:6b12:91f9:b92f) has joined #ceph
[11:14] <sep> flaf, what puzzles me is that when i started the process. it was going fairly fast. about 1000MBps and 3-400 objects/second. and it had this speed for a few hours. but it was gradually ramping off. so now the recovery io is more something like 10-100MB/sec with 2-20 objects a second. and it's been going at this slower speed all night.
[11:14] <sep> i wondered if there was some way to maintain the higher earlier speed for the duration of the recovery/backfill
[11:17] * brians (~brian@80.111.114.175) Quit (Quit: Textual IRC Client: www.textualapp.com)
[11:17] * brians (~brian@80.111.114.175) has joined #ceph
[11:19] * t4nek (~oftc-webi@178.237.98.13) has joined #ceph
[11:19] <t4nek> Hello,
[11:20] <t4nek> using java S3 API for CEPH, I can use all functions except for downloading an object, which results in the following error :
[11:20] <t4nek> Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon S3; Status Code: 400; Error Code: InvalidArgument; Request ID: tx00000000000000000009c-005720838a-1053-default), S3 Extended Request ID: null
[11:21] <t4nek> I did what is told in the doc example, is this example correct?
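
One way to narrow down whether the 400 InvalidArgument comes from radosgw or from the Java SDK is to fetch the same object with a different client; a hedged sketch using awscli, where the endpoint, bucket and key are placeholders and credentials are assumed to be configured:

    # fetch the object straight from radosgw, bypassing the Java SDK
    aws --endpoint-url http://rgw.example.com s3 cp s3://mybucket/mykey ./mykey
    # if this works, the gateway side is fine and the SDK request itself (often the
    # signature version or endpoint/region settings) is the thing to look at
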
[11:25] <flaf> sep: not sure but it seems to me there are 2 different things here. When a osd (re)start it's recovery (the osd is already in the crushmap, there is no change concerning the crushmap). When the crushmap is changed (for instance after a ceph osd out ...etc.) it's backfill.
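
Whichever of the two it is, the live rates and the recovery-side throttle can be checked while the drain runs; a sketch with an example value only (injected values do not persist across restarts):

    # watch recovery/backfill throughput and objects/sec as they change
    ceph -w                 # or poll: ceph -s

    # recovery-side throttle, in case the slowdown is recovery rather than backfill
    ceph tell osd.* injectargs '--osd-recovery-max-active 5'
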
[11:30] * ylmson (~toast@6AGAABERR.tor-irc.dnsbl.oftc.net) Quit ()
[11:30] * bildramer1 (~utugi____@46.166.188.202) has joined #ceph
[11:35] <Heebie> Is anyone successfully using TOC routers in a "ring" and having success with it? (as opposed to spine routers sitting above TOC routers)
[11:35] * derjohn_mob (~aj@46.189.28.82) Quit (Ping timeout: 480 seconds)
[11:36] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:8cb1:7443:8517:e025) has joined #ceph
[11:37] * Teddybareman (~W|ldCraze@6AGAABERT.tor-irc.dnsbl.oftc.net) Quit ()
[11:37] * hassifa1 (~blank@politkovskaja.torservers.net) has joined #ceph
[11:37] <kasimon> Anyone having an idea what /lib/systemd/system/ceph-disk@.service is for?
[11:38] * pabluk__ is now known as pabluk_
[11:40] <sep> kasimon, it's the systemd service or unit file responsible for stopping/starting restarting status etc services on a systemd system. all except trusty are systemd now ref : http://docs.ceph.com/docs/master/release-notes/#v9.2.0-infernalis
[11:41] * huangjun (~kvirc@113.57.168.154) Quit (Ping timeout: 480 seconds)
[11:42] <kasimon> sep: yes, I know, I'm currently fighting with getting a jewel cluster to work with systemd.
[11:42] * allaok (~allaok@machine107.orange-labs.com) Quit (Quit: Leaving.)
[11:43] <Heebie> sadly, figting with systemd seems to be "the new black"
[11:43] <kasimon> But it's unclear to me what this specific unit file is ment for.
[11:43] <sep> i am not fluent in systemd i am afraid
[11:44] <sep> still on hammer.
[11:44] <kasimon> I'm actually making progress, but it's a bit difficult because what is delivered in the current official packages is incomplete and partially incoherent.
[11:45] <kasimon> Thus me asking for the purpose of that file.
[11:45] * derjohn_mob (~aj@46.189.28.82) has joined #ceph
[11:48] <sep> kasimon, thanks for paving the way at least. i do not have a working infernalis or jewel lab atm.
[11:48] * Kupo1 (~t.wilson@23.111.255.162) Quit (Ping timeout: 480 seconds)
[11:48] <sep> so i have not looked to hard at the systemd parts
[11:49] * Kupo1 (~t.wilson@23.111.255.162) has joined #ceph
[11:53] <kasimon> No problem, I'm really interested into learning systemd. But I wonder what would be the best way to give feedback to the developers to improve the current situation.
[12:00] * bildramer1 (~utugi____@46.166.188.202) Quit ()
[12:00] * Cue (~tunaaja@politkovskaja.torservers.net) has joined #ceph
[12:02] * IvanJobs (~hardes@103.50.11.146) has joined #ceph
[12:07] * hassifa1 (~blank@7V7AAD70K.tor-irc.dnsbl.oftc.net) Quit ()
[12:12] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[12:23] * shylesh__ (~shylesh@121.244.87.118) has joined #ceph
[12:30] * Cue (~tunaaja@06SAABRLS.tor-irc.dnsbl.oftc.net) Quit ()
[12:30] * dux0r (~sardonyx@06SAABRM9.tor-irc.dnsbl.oftc.net) has joined #ceph
[12:31] * EinstCra_ (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[12:31] * Mika_c (~quassel@122.146.93.152) Quit (Remote host closed the connection)
[12:39] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[13:00] * dux0r (~sardonyx@06SAABRM9.tor-irc.dnsbl.oftc.net) Quit ()
[13:00] * HoboPickle (~homosaur@snowfall.relay.coldhak.com) has joined #ceph
[13:03] <PoRNo-MoRoZ> is it bad idea to store OS on usbflash ?
[13:05] <AvengerMoJo> PoRNo-MoRoZ, you mean running a system in a usb ?
[13:05] <PoRNo-MoRoZ> yep
[13:05] <PoRNo-MoRoZ> currently my nodes on fast usb sticks
[13:05] <AvengerMoJo> PoRNo-MoRoZ, it is fine
[13:06] <AvengerMoJo> PoRNo-MoRoZ, as long as you think the speed is acceptable :)
[13:06] <PoRNo-MoRoZ> sometimes one of my node leaves quorum
[13:06] <AvengerMoJo> oh
[13:06] <AvengerMoJo> PoRNo-MoRoZ, you mean running OSD or MON?
[13:06] <PoRNo-MoRoZ> looks like caused by write latency on lags
[13:06] <PoRNo-MoRoZ> MON
[13:06] <PoRNo-MoRoZ> *lags=logs
[13:06] <PoRNo-MoRoZ> should i move logs folder to tmpfs ?
[13:06] <AvengerMoJo> MON you may want to put the log into the tmpfs
[13:06] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) Quit (Quit: Leaving)
[13:06] <PoRNo-MoRoZ> :D
[13:06] <AvengerMoJo> but then you can't reboot
[13:06] <AvengerMoJo> hehe
[13:07] <AvengerMoJo> so it is not good
[13:07] <PoRNo-MoRoZ> it probably should fix that quorum-leaving stuff ><
[13:07] <PoRNo-MoRoZ> but i can't reboot that node atm
[13:07] <AvengerMoJo> right
[13:07] <PoRNo-MoRoZ> anyway, thanks :)
[13:08] <AvengerMoJo> no problem
[13:08] <PoRNo-MoRoZ> that also caused 'spikes' in my monitoring ><
[13:08] <PoRNo-MoRoZ> it really pissing me off ><
[13:09] <kasimon> PoRNo-MoRoZ or you set up a remote log server and disable local logging
[13:09] <PoRNo-MoRoZ> that's an idea also
[13:09] <kasimon> Or log locally to tmpfs and have a persistent log on the logserver.
[13:09] <Be-El> PoRNo-MoRoZ: mons have the same requirement for their local state database as osd journals
[13:10] <Be-El> PoRNo-MoRoZ: and I'm sure standard usb sticks do not cope well with the O_DSYNC flag
[13:10] <PoRNo-MoRoZ> osds looks fine, atleast in logs there is no sudden leaves
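
A sketch of the logging change kasimon suggests, assuming a remote syslog target is already configured on the node and that it is the mon log hitting the USB stick (the mon id and values are examples only):

    # runtime change on one mon (mon id "a" is a placeholder)
    ceph tell mon.a injectargs '--log_to_syslog true --err_to_syslog true'

    # to make it persistent, add the same options to the [mon] section of ceph.conf
    # and restart the mon:
    #   log to syslog = true
    #   err to syslog = true
    #   log file = /dev/null
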
[13:12] * Blueraven (~JamesHarr@politkovskaja.torservers.net) has joined #ceph
[13:12] * lmb (~Lars@p578a9261.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[13:12] * garphy is now known as garphy`aw
[13:19] * jordanP (~jordan@bdv75-2-81-57-250-57.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[13:19] * shyu (~Shanzhi@119.254.120.66) Quit (Remote host closed the connection)
[13:23] <IvanJobs> Hi cephers, after I upload a new object on to ceph cluster, how can I locate its final destination? FYI, I know how to locate OSDs storing this object, but just don't know how to find it on my xfs filesystem.
[13:25] <Heebie> The object would be stored as a bunch of chunks across all the XFS filesystems in the pool that's available to it. (depending on the CRUSH rules and how big the object is.)
[13:25] * zhaochao (~zhaochao@125.39.9.151) Quit (Quit: ChatZilla 0.9.92 [Firefox 45.0.2/20160413010457])
[13:27] <IvanJobs> ok, so an object becomes multiple chunks, and these chunks are replicated onto multiple OSDs. So the chunks forming the original object don't necessarily end up on a single OSD?
[13:29] <IvanJobs> Let's say an object is striped into 3 chunks. Because all 3 chunks follow the same CRUSH policy, they must be placed onto the same OSDs, am I right?
[13:29] <Heebie> If the "size" of your pool is "3" and you only have 3 disks, then you'll have all chunks of an object on each disk.
[13:30] * HoboPickle (~homosaur@7V7AAD71C.tor-irc.dnsbl.oftc.net) Quit ()
[13:30] * ghostnote1 (~Dragonsha@tor.krosnov.net) has joined #ceph
[13:30] <Heebie> No... each chunk will be put in a "placement group", then that chunk will be placed on each OSD that participates in that placement group.. .but different chunks get different placement groups.
[13:31] <IvanJobs> thx, Heebie, clear me out of this.
[13:33] <IvanJobs> So how can I find a single chunk of this object in my OSD filesystem? Can I see it in plain form, or does ceph do some binary encoding of this chunk before writing it to disk?
[13:33] * garphy`aw is now known as garphy
[13:34] <Heebie> I think you can check what placement groups the file belongs to, and you can get a list of what OSD's are in what placement groups, and cross-reference?
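
The cross-reference Heebie describes can be done directly from the command line; a sketch, where the pool and object names are placeholders and the on-disk path assumes a FileStore OSD:

    # which PG the object maps to and which OSDs hold it (first "up" OSD is the primary)
    ceph osd map mypool myobject

    # on that OSD's host the object sits in the PG's directory, stored as a plain file
    ls /var/lib/ceph/osd/ceph-2/current/<pgid>_head/
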
[13:36] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[13:36] * yanzheng (~zhyan@118.116.115.159) Quit (Ping timeout: 480 seconds)
[13:39] * yanzheng (~zhyan@118.116.115.159) has joined #ceph
[13:41] * Blueraven (~JamesHarr@06SAABROY.tor-irc.dnsbl.oftc.net) Quit ()
[13:41] * nupanick (~sixofour@65.19.167.132) has joined #ceph
[13:55] * haplo37 (~haplo37@107-190-32-70.cpe.teksavvy.com) has joined #ceph
[13:56] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[13:58] <IvanJobs> Heebie, thx, I made it.
[14:00] * ghostnote1 (~Dragonsha@6AGAABEVX.tor-irc.dnsbl.oftc.net) Quit ()
[14:04] <t4nek> is there example of a working powershell API for CEPH radosgw?
[14:05] * gravetech (~roughsqua@mail.steelway.com) has joined #ceph
[14:07] * Racpatel (~Racpatel@2601:87:3:3601::675d) has joined #ceph
[14:10] * The1w (~jens@node3.survey-it.dk) has joined #ceph
[14:11] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[14:11] * nupanick (~sixofour@06SAABRQJ.tor-irc.dnsbl.oftc.net) Quit ()
[14:11] * matx (~Spikey@93.115.95.216) has joined #ceph
[14:14] <PoRNo-MoRoZ> how ceph determines primary osd in acting set ?
[14:14] <PoRNo-MoRoZ> does it favor latency ?
[14:14] <PoRNo-MoRoZ> or it pure random ?
[14:15] <Be-El> afaik it's the first osd generated by a crush rule
[14:15] <PoRNo-MoRoZ> based on my experiments i think same :D
[14:15] <PoRNo-MoRoZ> would be if ceph would favor latency
[14:16] <PoRNo-MoRoZ> would be nice if ceph would favor latency
[14:16] <Be-El> but there are also primary affinity values in the crush maps, that also influence which osd is selected as primary
[14:16] <Be-El> how should ceph measure latency?
[14:16] <PoRNo-MoRoZ> dunno, i got zabbix monitoring
[14:16] * jordanP (~jordan@bdv75-2-81-57-250-57.fbx.proxad.net) has joined #ceph
[14:16] <PoRNo-MoRoZ> i just see some osds have more latency than others
[14:17] <Be-El> that's measuring between the zabbix host and the monitored systems... ceph has connections between a client and all osds + mons, even across networks or datacenters
[14:18] <PoRNo-MoRoZ> thanks for pointing me to affinity
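
For reference, primary affinity can be inspected and adjusted per OSD; a sketch with example values:

    # make osd.12 less likely to be chosen as primary (1.0 = default, 0 = never primary)
    ceph osd primary-affinity osd.12 0.5
    # note: some releases only accept this after setting
    # 'mon osd allow primary affinity = true'

    # check the resulting up/acting sets and primaries per PG
    ceph pg dump pgs_brief
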
[14:20] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[14:20] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[14:24] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[14:26] * gravey (~roughsqua@LONDON14-3096783633.sdsl.bell.ca) has joined #ceph
[14:29] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[14:30] * loft (~luckz@06SAABRS1.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:31] * gravetech (~roughsqua@mail.steelway.com) Quit (Ping timeout: 480 seconds)
[14:32] * zenpac (~zenpac3@66.55.33.66) has joined #ceph
[14:32] * gravetech (~roughsqua@LONDON14-3096783633.sdsl.bell.ca) has joined #ceph
[14:33] <Be-El> does the cephfs kernel driver provide a method to enforce updating the mdsmap?
[14:34] * gravey (~roughsqua@LONDON14-3096783633.sdsl.bell.ca) Quit (Ping timeout: 480 seconds)
[14:36] * gravey (~roughsqua@mail.steelway.com) has joined #ceph
[14:38] * oms101 (~oms101@p20030057EA002900C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[14:38] * dugravot61 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[14:38] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit (Read error: Connection reset by peer)
[14:40] * gravetech (~roughsqua@LONDON14-3096783633.sdsl.bell.ca) Quit (Ping timeout: 480 seconds)
[14:41] <boolman> leseb: are you the maintainer of https://github.com/ceph/ceph-docker ? I was wondering if you got any plans for adding jewel
[14:41] * matx (~Spikey@6AGAABEW8.tor-irc.dnsbl.oftc.net) Quit ()
[14:42] * Jourei (~Nephyrin@Relay-J.tor-exit.network) has joined #ceph
[14:43] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[14:44] * oms101 (~oms101@p20030057EA002900C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[14:45] * IvanJobs (~hardes@103.50.11.146) Quit (Quit: Leaving)
[14:45] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[14:47] * icey (~Chris@pool-74-109-7-163.phlapa.fios.verizon.net) has joined #ceph
[14:51] * haplo37 (~haplo37@107-190-32-70.cpe.teksavvy.com) Quit (Ping timeout: 480 seconds)
[14:55] * jordanP (~jordan@bdv75-2-81-57-250-57.fbx.proxad.net) Quit (Quit: Leaving)
[14:55] * jordanP (~jordan@bdv75-2-81-57-250-57.fbx.proxad.net) has joined #ceph
[14:57] * vanham (~vanham@191.185.29.65) has joined #ceph
[14:57] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[14:58] <vanham> Morning everyone
[14:58] <vanham> Does anyone knows of a RadosGW install doc that doesn't use ceph-deploy?
[15:00] * EinstCra_ (~EinstCraz@58.247.119.250) has joined #ceph
[15:00] * loft (~luckz@06SAABRS1.tor-irc.dnsbl.oftc.net) Quit ()
[15:03] * gravetech (~roughsqua@mail.steelway.com) has joined #ceph
[15:04] * gravetech (~roughsqua@mail.steelway.com) Quit ()
[15:05] * xarses (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) Quit (Ping timeout: 480 seconds)
[15:05] * xarses_ (~xarses@rrcs-24-173-18-66.sw.biz.rr.com) Quit (Ping timeout: 480 seconds)
[15:08] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Ping timeout: 480 seconds)
[15:08] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[15:09] * gravey (~roughsqua@mail.steelway.com) Quit (Ping timeout: 480 seconds)
[15:09] * rhonabwy (~elt@tor-exit1-readme.dfri.se) has joined #ceph
[15:11] * Jourei (~Nephyrin@06SAABRTR.tor-irc.dnsbl.oftc.net) Quit ()
[15:11] * kiasyn1 (~Kayla@192.87.28.82) has joined #ceph
[15:13] * vicente (~vicente@111-241-42-133.dynamic.hinet.net) has joined #ceph
[15:14] * bene3 (~bene@nat-pool-rdu-u.redhat.com) has joined #ceph
[15:16] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[15:17] * EinstCra_ (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[15:17] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[15:22] * csoukup (~csoukup@2605:a601:9c8:6b00:e487:bea:d796:8846) Quit (Ping timeout: 480 seconds)
[15:28] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) has joined #ceph
[15:28] * huangjun (~kvirc@117.151.55.44) has joined #ceph
[15:35] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[15:37] * huangjun (~kvirc@117.151.55.44) Quit (Ping timeout: 480 seconds)
[15:39] * rhonabwy (~elt@06SAABRU3.tor-irc.dnsbl.oftc.net) Quit ()
[15:39] * ylmson1 (~Nephyrin@128.153.145.125) has joined #ceph
[15:41] * kiasyn1 (~Kayla@06SAABRU6.tor-irc.dnsbl.oftc.net) Quit ()
[15:42] * HoboPickle (~Ralth@85.159.237.210) has joined #ceph
[15:43] * Aeso (~aesospade@aesospadez.com) Quit (Quit: Leaving)
[15:43] * Aeso (~aesospade@aesospadez.com) has joined #ceph
[15:45] * wwdillingham (~LobsterRo@140.247.242.44) has joined #ceph
[15:46] * kasimon (~user@2a02:2450:dd1f::2450) has left #ceph
[15:48] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[15:55] * Drankis (~martin@89.111.13.198) has joined #ceph
[15:56] * mtanski_ (~mtanski@65.244.82.98) has joined #ceph
[15:57] * diegows_ (~diegows@main.woitasen.com.ar) has joined #ceph
[15:58] * mohmultihhouse (~mohmultih@2.106.149.41) has joined #ceph
[15:59] * mtanski (~mtanski@65.244.82.98) Quit (Ping timeout: 480 seconds)
[15:59] * mtanski_ is now known as mtanski
[15:59] * diegows (~diegows@main.woitasen.com.ar) Quit (Ping timeout: 480 seconds)
[16:01] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) Quit (Quit: billwebb)
[16:01] * huangjun (~kvirc@117.151.55.44) has joined #ceph
[16:01] * yanzheng (~zhyan@118.116.115.159) Quit (Quit: This computer has gone to sleep)
[16:03] <leseb> boolman: yeah, will do that soon, i'm currently at the openstack summit so I don't have much time at the moment :), will try that next week
[16:03] <rkeene> What you need is more OpenNebula :-D
[16:04] <boolman> leseb: no worries man, just wondering! :) have fun over there
[16:04] <leseb> boolman: sure, thanks ;)
[16:05] * mohmultihouse (~mohmultih@gw01.mhitp.dk) Quit (Ping timeout: 480 seconds)
[16:07] * The1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[16:07] * mohmultihhouse (~mohmultih@2.106.149.41) Quit (Ping timeout: 480 seconds)
[16:09] * lmb (~Lars@p578a9261.dip0.t-ipconnect.de) has joined #ceph
[16:09] * ylmson1 (~Nephyrin@76GAAEVX8.tor-irc.dnsbl.oftc.net) Quit ()
[16:09] * dsl (~dsl@204.155.27.222) has joined #ceph
[16:09] * jordanP (~jordan@bdv75-2-81-57-250-57.fbx.proxad.net) Quit (Quit: Leaving)
[16:09] * bene3 (~bene@nat-pool-rdu-u.redhat.com) Quit (Ping timeout: 480 seconds)
[16:11] * HoboPickle (~Ralth@06SAABRW4.tor-irc.dnsbl.oftc.net) Quit ()
[16:12] * Kottizen (~Coestar@remailer.cpunk.us) has joined #ceph
[16:12] * jordanP (~jordan@bdv75-2-81-57-250-57.fbx.proxad.net) has joined #ceph
[16:12] * dsl (~dsl@204.155.27.222) Quit (Remote host closed the connection)
[16:12] * kawa2014 (~kawa@83.111.58.108) Quit (Ping timeout: 480 seconds)
[16:12] * dsl (~dsl@204.155.27.222) has joined #ceph
[16:16] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) has joined #ceph
[16:17] * Aeso (~aesospade@aesospadez.com) Quit (Quit: Leaving)
[16:18] * haplo37 (~haplo37@199.91.185.156) has joined #ceph
[16:18] * xarses (~xarses@209.94.245.220) has joined #ceph
[16:18] * xarses_ (~xarses@209.94.245.220) has joined #ceph
[16:19] * danieagle (~Daniel@201-1-132-74.dsl.telesp.net.br) has joined #ceph
[16:21] * Drankis (~martin@89.111.13.198) Quit (Quit: Leaving)
[16:21] * Aeso (~aesospade@aesospadez.com) has joined #ceph
[16:21] * kawa2014 (~kawa@46.166.190.189) has joined #ceph
[16:23] * csoukup (~csoukup@159.140.254.104) has joined #ceph
[16:25] <azizulhakim> which one is the gsoc channel for ceph?
[16:30] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) Quit (Quit: billwebb)
[16:31] * mohmultihouse (~mohmultih@185.85.5.78) has joined #ceph
[16:31] * mattbenjamin1 (~mbenjamin@aa2.linuxbox.com) has joined #ceph
[16:33] * alexxy (~alexxy@biod.pnpi.spb.ru) Quit (Remote host closed the connection)
[16:34] * alexxy (~alexxy@biod.pnpi.spb.ru) has joined #ceph
[16:37] * nardial (~ls@dslb-178-001-225-172.178.001.pools.vodafone-ip.de) has joined #ceph
[16:38] * nardial (~ls@dslb-178-001-225-172.178.001.pools.vodafone-ip.de) Quit ()
[16:39] * overclk (~quassel@121.244.87.117) Quit (Ping timeout: 480 seconds)
[16:39] * Tarazed (~AluAlu@turing.tor-exit.calyxinstitute.org) has joined #ceph
[16:41] * mohmultihouse (~mohmultih@185.85.5.78) Quit (Ping timeout: 480 seconds)
[16:41] * Kottizen (~Coestar@4MJAAEHPS.tor-irc.dnsbl.oftc.net) Quit ()
[16:41] * Keiya (~tuhnis@176.116.104.49) has joined #ceph
[16:42] * ibravo (~ibravo@2605:b400:104:11:ade6:a4d9:e276:e83b) has joined #ceph
[16:46] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) has joined #ceph
[16:46] * haplo37 (~haplo37@199.91.185.156) Quit (Ping timeout: 480 seconds)
[16:49] * dalgaaf (uid15138@id-15138.ealing.irccloud.com) has joined #ceph
[16:51] * nardial (~ls@dslb-178-001-225-172.178.001.pools.vodafone-ip.de) has joined #ceph
[16:52] * nardial (~ls@dslb-178-001-225-172.178.001.pools.vodafone-ip.de) Quit ()
[16:53] * dgurtner (~dgurtner@178.197.236.141) has joined #ceph
[16:55] * haplo37 (~haplo37@199.91.185.156) has joined #ceph
[16:55] * shylesh__ (~shylesh@121.244.87.118) Quit (Read error: Connection reset by peer)
[17:01] * vbellur (~vijay@71.234.224.255) Quit (Ping timeout: 480 seconds)
[17:03] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[17:05] * kefu_ (~kefu@183.193.162.205) has joined #ceph
[17:07] * derjohn_mob (~aj@46.189.28.82) Quit (Ping timeout: 480 seconds)
[17:07] * dgurtner (~dgurtner@178.197.236.141) Quit (Read error: Connection reset by peer)
[17:09] * Tarazed (~AluAlu@4MJAAEHQ3.tor-irc.dnsbl.oftc.net) Quit ()
[17:11] * Hemanth (~hkumar_@121.244.87.117) Quit (Ping timeout: 480 seconds)
[17:11] * Keiya (~tuhnis@6AGAABE3C.tor-irc.dnsbl.oftc.net) Quit ()
[17:11] * SquallSeeD31 (~brianjjo@4MJAAEHR5.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:12] <wwdillingham> Post infernalis -> jewel upgrade I seem to have encountered server -> client breakdown, I can still create rbd images from my clients, but cannot map them, similarly, I am unable to mount my cephfs filesystem. I am running a 3.10 kernel, i have increased verbosity on my logs and dont really see any indication of client connection attempts in the mds log. the servers seem happy (health okay) but my clients are totally borked.
[17:12] * kefu_ (~kefu@183.193.162.205) Quit (Quit: My Mac has gone to sleep. ZZZzzz...)
[17:12] * kefu (~kefu@114.92.122.74) Quit (Ping timeout: 480 seconds)
[17:14] * kefu (~kefu@183.193.162.205) has joined #ceph
[17:15] * dc-tx (~dc-tx@rrcs-97-79-186-163.sw.biz.rr.com) has joined #ceph
[17:16] * ibravo (~ibravo@2605:b400:104:11:ade6:a4d9:e276:e83b) Quit (Quit: This computer has gone to sleep)
[17:16] * ibravo (~ibravo@72.198.142.104) has joined #ceph
[17:17] * dc-tx (~dc-tx@rrcs-97-79-186-163.sw.biz.rr.com) has left #ceph
[17:17] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[17:18] <m0zes> wwdillingham: jewel turns on a bunch of rbd format 2 features that aren't supported by 3.10. I think you need to disable them.
[17:18] * kefu_ (~kefu@114.92.122.74) has joined #ceph
[17:18] * garphy is now known as garphy`aw
[17:20] * dc-tx (~dc-tx@rrcs-97-79-186-163.sw.biz.rr.com) has joined #ceph
[17:21] * dc-tx (~dc-tx@rrcs-97-79-186-163.sw.biz.rr.com) Quit ()
[17:21] * BrianA1 (~BrianA@c-50-168-46-112.hsd1.ca.comcast.net) has joined #ceph
[17:22] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[17:22] * kefu (~kefu@183.193.162.205) Quit (Ping timeout: 480 seconds)
[17:27] <wwdillingham> m0zes: so I should set rbd_default_features = 1 cluster-wide ?
[17:29] * t4nek (~oftc-webi@178.237.98.13) Quit (Quit: Page closed)
[17:29] <m0zes> wwdillingham: at least for the hosts creating rbd images.
[17:30] <m0zes> give it a shot at least ;)
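
A sketch of both halves of that: the default feature set for newly created images, and stripping the krbd-unsupported features from an image that already exists (the pool/image name is a placeholder):

    # persistent default for hosts that create images; jewel's default is 61,
    # 1 means layering only, which old kernels can map
    #   [client]
    #   rbd default features = 1

    # for an existing image, drop the features a 3.10 krbd cannot handle
    rbd feature disable mypool/myimage deep-flatten fast-diff object-map exclusive-lock
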
[17:30] <evilrob> is there a discussion somewhere that I can't find on "how many OSDs is too many per node"
[17:31] * winston-d_ (uid98317@id-98317.richmond.irccloud.com) has joined #ceph
[17:31] <Heebie> evilrob: It depends.... you don't want to oversaturate your networking, you want to have enough RAM for the OSD's, and you want to have enough CPU for the OSD's. Those are all limiting factors.
[17:33] * kefu_ (~kefu@114.92.122.74) Quit (Read error: Connection reset by peer)
[17:33] <Be-El> evilrob: and do not forget to include the overhead for backfilling/recovery operations
[17:34] * haomaiwang (~haomaiwan@rrcs-67-79-205-19.sw.biz.rr.com) Quit (Remote host closed the connection)
[17:34] <Heebie> Yeah, what Be-El said. :)
[17:34] * kefu (~kefu@114.92.122.74) has joined #ceph
[17:35] * neurodrone (~neurodron@158.106.193.162) has joined #ceph
[17:36] <evilrob> I think we're speccing 256GB RAM and 24 cores for the nodes
[17:36] <evilrob> with 40Gb network
[17:37] <Heebie> 24 notional, or 24 physical?
[17:37] <evilrob> 2 12core processors
[17:37] * ibravo (~ibravo@72.198.142.104) Quit (Quit: Leaving)
[17:38] <evilrob> for 42 OSDs
[17:39] <Be-El> evilrob: we just purchased similar boxes (128 GB instead of 256) for 14 OSD (12 HDD + 2 SSD). they are completely overpowered in our setup
[17:39] <Heebie> Sounds like you might get 48 OSD's to work on that. How many of them are you shooting for?
[17:39] <Be-El> during backfilling we currently have peak load spikes of 10
[17:40] <Heebie> and.. don't forget to consider what you want your largest failure domain to be.
[17:40] <bloatyfloat> evilrob: We used hex cores in 30OSD compute units but with SATA storage, get acceptable performance
[17:40] * Concubidated (~cube@71.177.40.123) has joined #ceph
[17:40] * derjohn_mob (~aj@88.128.80.20) has joined #ceph
[17:40] <evilrob> ok. we're looking at cisco UCS C3160's 3 rows of platters 1 row of SSDs
[17:40] <Be-El> evilrob: with 40GB ethernet and 48 drives you also have to think about PCI express layout, e.g. distributing IO load over processors etc.
[17:41] <Be-El> and include the journals in your calculation
[17:41] * SquallSeeD31 (~brianjjo@4MJAAEHR5.tor-irc.dnsbl.oftc.net) Quit ()
[17:42] * Teddybareman (~JWilbur@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[17:42] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[17:42] * dugravot61 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit (Quit: Leaving.)
[17:42] <evilrob> ok. I'm thinking 42 OSDs with SSDs available for journals. I might get the 512GB boxes though. (still 24 cores)
[17:42] <bloatyfloat> evilrob: Would recommend using smaller units over high disk density as that can cause pain during outages/recovery
[17:42] * pabluk_ is now known as pabluk__
[17:43] <bloatyfloat> evilrob: We are looking to drop from 30osds in a compute unit to ~20
[17:43] <rkeene> Right, with Ceph you're essentially doing RAID over the machines
[17:43] <evilrob> yeah... going back and forth about using 8TB or 4TB drives.
[17:43] * s3an2 (~root@korn.s3an.me.uk) has joined #ceph
[17:44] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[17:45] <Be-El> evilrob: maybe 512 GB is a good idea....if you want to use 5 GB journals (default size), each OSD might keep 5 GB in memory (write operations are written to the journal, but also stored in memory. journal is only read during recovery at osd startup). 48 x 5 GB = 240 GB
[17:45] <Heebie> 42 OSD's is an immense amount of storage to lose a a single failure domain. eek! =O
[17:45] <Be-El> evilrob: we use 6 TB drives. and ensure that you do not purchase SMR drives....
[17:46] <Heebie> SMR drives? I think I need to google that.
[17:47] <evilrob> The other option is C240's with 12 disks each
[17:50] * Skaag (~lunix@65.200.54.234) has joined #ceph
[17:50] <Heebie> Oh... SMR drives look really icky! =O Maybe use that type of design if you're storing stuff in the style of Amazon Glacier.
[17:50] <Be-El> Heebie: or in mostly-read scenarios
[17:51] * haomaiwang (~haomaiwan@2605:b400:104:11:318c:9708:3b8c:93fc) has joined #ceph
[17:53] * m0zes is going to attempt to use them in an archival setup.
[17:54] <Heebie> sounds like something to use in a WORM format if at all possible.
[17:54] <m0zes> an "archive" cephfs pool. erasure encoded. big and slow. potentially putting as many as 96 osds per host.
[17:55] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[17:55] <Heebie> m0zes: sounds like the performance will be glacial indeed.
[17:55] * rraja (~rraja@121.244.87.117) Quit (Remote host closed the connection)
[17:57] * haomaiwa_ (~haomaiwan@2605:b400:104:11:41ec:54f2:3b88:2a65) has joined #ceph
[17:57] * haomaiwa_ (~haomaiwan@2605:b400:104:11:41ec:54f2:3b88:2a65) Quit (Remote host closed the connection)
[17:58] * haomaiwa_ (~haomaiwan@2605:b400:104:11:41ec:54f2:3b88:2a65) has joined #ceph
[17:58] <m0zes> yep. but cheap. we'll be charging the researchers ~$500 for 8TB of space over 3 years. and researchers can buy in in 8TB chunks. overkill for most, but I've got a few that will need 4-5 chunks (right *now*)
[17:59] <m0zes> at the moment I've got users storing 40-50TB of data in their homedirs on fast storage. If I can shrink that down, I can make homedirs even faster.
[17:59] * haomaiwang (~haomaiwan@2605:b400:104:11:318c:9708:3b8c:93fc) Quit (Ping timeout: 480 seconds)
[17:59] <Heebie> That sure is cheap! What are you charging for the "fast" tiers?
[18:00] * reed (~reed@rrcs-97-79-186-162.sw.biz.rr.com) has joined #ceph
[18:01] <m0zes> we don't charge for the fast tiers right now. we're looking at putting a quota on the homedirs soon, though. something in the 1-2TB range, so most won't hit it. scratch (which is fast) won't have a quota, but it is patrolled and cleaned out automatically.
[18:02] <Heebie> What kind of workloads are you running there? Deeply-data-dependent scientific research or something? ;)
[18:02] * thansen (~thansen@162.219.43.108) Quit (Quit: Ex-Chat)
[18:02] <evilrob> ok... I'll probably go with the C240's I can put 12 8TB drives and a pair of SSDs in them. so 12 OSDs, boot off half the SSD, use the other half for journal, 256GB memory, 24 cores (2x12 like above)
[18:03] * haomaiwa_ (~haomaiwan@2605:b400:104:11:41ec:54f2:3b88:2a65) Quit (Quit: Leaving...)
[18:03] * vata (~vata@207.96.182.162) has joined #ceph
[18:03] <m0zes> bioinformatics, molecular dynamics, physics, and statistics. the first two are the biggest storage users.
[18:04] <m0zes> and I'm clumping genomics in with bioinformatics.
[18:04] <Heebie> OK, so that sounds data intensive. :) How fast is that storage pool?
[18:05] <m0zes> we've got a research project now that wants to run 1000 genomes, each generating 1TB of data. due to data retention policies, that results have to be stored for 3 years after the grant ends.
[18:05] <m0zes> scratch can read/write at about 1GB/s and we're adding more nodes.
[18:06] <Heebie> So, a petabyte that needs to be retained for 3 years.
[18:06] <m0zes> yep, and that is just one project ;)
[18:06] <Heebie> How many nodes with how many disks, perchance?
[18:06] <m0zes> 24 nodes, 16 spinners and 2 pcie ssds each.
[18:07] * m0zes has to run and teach a class now. I'll be back
[18:07] <Heebie> (I'm putting together a proposal with 21 x servers with 10 rusties, 2 OSD's for journals, and 1 of the OSD's is SSD.
[18:08] <Heebie> trying to determine what kind of "real world" throughput might see is pretty crazy.
[18:08] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit (Quit: Leaving.)
[18:08] <Walex> I think that drives larger than 1Tb have terrifyingly bad IOPS-per-TB. But lots of people know better :-).
[18:09] * huangjun (~kvirc@117.151.55.44) Quit (Ping timeout: 480 seconds)
[18:09] * SaneSmith (~K3NT1S_aw@91.109.29.120) has joined #ceph
[18:11] * Teddybareman (~JWilbur@76GAAEV1C.tor-irc.dnsbl.oftc.net) Quit ()
[18:12] <Heebie> good luck convincing anyone in management to get 3-8 times as many nodes so you can stick with 1TB drives! =O
[18:12] <Walex> Heebie: Lots of PHBs know better :-)
[18:13] <Heebie> Uhm... I believe you Dilbert... or are you Wally?
[18:14] <wwdillingham> m0zes: setting rbd_default_features = 1 cluster wide did not change the issue (newly created rbd images only have layering) but still cannot map them, unfortunately. The only other images on the cluster were created while on infernalis (and only have layering as well)
[18:14] <wwdillingham> I am going to upgrade the kernel, nevertheless
[18:15] <Heebie> wwdillingham: I had problems previously where I couldn't mount things because I had to map out things in the feature-map to make them backwards-compatible with the older kernel versions on client machines.
[18:15] * lifeboy (~roland@196.32.233.188) has joined #ceph
[18:16] <wwdillingham> what do you mean map things in the feature map Heebie ?
[18:17] <wwdillingham> do you mean enable object map feature?
[18:19] <Walex> Heebie: in a previous job some PHBs went for a huge data acquisition system on slow large drives in RAID6. Amazing results happened.... :-)
[18:19] <Heebie> It was a long time ago...I don't recall fully. There's a "feature map" that's just a series of ones that turn features on and off cluster-wide, if I'm recalling correctly. I had to turn off the highest 2-3 bits in that mask before I could use CentOS 6 machines as CEPH clients or some such thing. (I had to use FUSE to get CentOS 5 clients to work... but.. these were both because I couldn't put in a custom kernel.)
[18:19] <Heebie> 0.00032745 IOPS across 10 PB of storage? >_<
[18:19] <Heebie> or was it negative numbers?
[18:20] <Walex> Heebie: the lab users found that doing data acquisition to USB2 sticks was faster and timed out less often :-)
[18:20] <Heebie> EEK!!!!! =O
[18:21] <Heebie> We're using custom-built ZFS setups, but they're not scalable enough for either performance nor space.
[18:22] <evilrob> so it'll cost me slightly more, but if I go with the C240's with 10 8TB drives and 2 SSDs dedicated for journaling, I'll get 10 OSDs per 256GB RAM and 24 cores and 10Gb of network.
[18:22] <Heebie> We've looked at about 20 commercial solutions that were all WAYYYYYY too expensive. (one was over ???4,000/GB... yes GB)
[18:23] * SDub (~SDub@chippewa-nat.cray.com) has joined #ceph
[18:23] <Heebie> evilrob: sounds like it'll scream if you have a bunch of nodes. :)
[18:23] <SDub> is there anything one needs to know about ceph if it's being used as storage for glance and cinder, and those openstack services are behind a HAProxy server running SSL?
[18:23] <evilrob> probably start with 12
[18:23] <SDub> does that effect the configuration of ceph or how it operates?
[18:23] * bearkitten (~bearkitte@cpe-76-172-86-115.socal.res.rr.com) has joined #ceph
[18:24] * garphy`aw is now known as garphy
[18:24] <Heebie> SDub: There are probably performance considerations you should read up on. (I don't know what they are) There is a CEPH PG calculator which might be of some use to you in that regard, as well.
[18:26] <evilrob> hmmm need to start with a raw PB and 12 of those boxes only gives me 900T. So 16
[18:26] <SDub> that is probably a concern I should have, Heebie. However, right now my concern is that the images I'm storing in ceph are corrupt. When glance tries to transfer them to cinder the checksum changes and the transfer errors.
[18:28] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) Quit (Quit: Leaving.)
[18:28] * dgurtner (~dgurtner@207.91.176.136) has joined #ceph
[18:31] * Rickus__ (~Rickus@office.protected.ca) Quit (Read error: Connection reset by peer)
[18:32] <Walex> Heebie: BTW, that situation was first Lustre, then they redid it as GPFS. Curiously, same result. :-)
[18:32] <Heebie> corrupt? That sounds very bad. Are you seeing errors in CEPH? I don't know what GLANCE is, but I think Cinder is OpenStack block storage, right?
[18:32] * shaunm (~shaunm@74.83.215.100) has joined #ceph
[18:33] <evilrob> It's decided. 16 of the above C240's with 10Gb network each. not including cost of ports, this is coming out to about $0.11/GB. God I love working for the hardware manufacturer :)
[18:35] <Heebie> evilrob: Must be nice. :( I have to BUY from them, and they want to gouge me all the time! One major SAN vendor heard that I liked CEPH and proposed building a CEPH system where the OSD's were FC LUNs on a SAN.... which kinda misses the point.
[18:35] <Walex> evilrob: backing up and re-balancing and checking a backend made of 8TB drives is going to make you happy! :-)
[18:36] * jordanP (~jordan@bdv75-2-81-57-250-57.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[18:36] <monsted> Heebie: i'm not even surprised they'd suggest that
[18:36] <evilrob> we got handed a bunch of hardware when we got bought. Currently running blades in multiple chassis connected to SAN with LUNs as OSDs. This is the "we told you we'd make you money now let us spec our own hardware" buy.
[18:37] <Heebie> monsted: I can't say I'm surprised.. they want to stay relevant.
[18:38] * kawa2014 (~kawa@46.166.190.189) Quit (Quit: Leaving)
[18:38] <SDub> Heebie, correct. Glance is what manages Openstack's images. I'm not seeing much of anything in ceph logs, just errors in cinder-volume logs.
[18:38] <monsted> then they should stop making crap products and start actually giving HDS some competition.
[18:38] <Heebie> HDS?
[18:38] <evilrob> hitachi
[18:39] <monsted> hitachi data systems, yeah
[18:39] <evilrob> the REALLY fast stuff
[18:39] <Heebie> Oh. OK.
[18:39] * SaneSmith (~K3NT1S_aw@06SAABR5N.tor-irc.dnsbl.oftc.net) Quit ()
[18:39] <Heebie> SDub: So, glance is having problems with storage managed by cinder that's back-ended by CEPH? Virtual world problems! )
[18:39] <Heebie> ;)
[18:40] <SDub> :)
[18:40] * nathani (~nathani@2607:f2f8:ac88::) Quit (Ping timeout: 480 seconds)
[18:40] <evilrob> SDub: I backend openstack clusters with ceph. I don't front the openstack services with HAProxy though.
[18:41] <SDub> Yeah, we just started implementing HAproxy and just going through all the bugs.
[18:41] <evilrob> but knowing how they work against ceph, no. The HAProxy just balances between the different services. Each service daemon contacts ceph separately.
[18:41] <Heebie> Do you have stickiness set on the HAproxy servers for the backend? (Not using Varnish for some stuff that's HAproxied?)
[18:43] * derjohn_mob (~aj@88.128.80.20) Quit (Ping timeout: 480 seconds)
[18:44] * ngoswami (~ngoswami@121.244.87.116) Quit (Quit: Leaving)
[18:45] <SDub> Heebie, I don't believe we do
[18:45] * overclk (~quassel@117.202.99.55) has joined #ceph
[18:47] <Heebie> SDub: OK, scenario to check. If you're using round-robin, or least-connections, you should see sessions going to different back-end servers often, so if there's any persistent session authentication, that might be what's not working, as a session might authenticate against backend X, then send a data request to Y. (If sessioning is handled by cookies, or simply not used and every request requires a full authentication handshake, or keypai
[18:47] * nathani (~nathani@2607:f2f8:ac88::) has joined #ceph
[18:48] <Heebie> That got cut off, didn't it? If so... where? =O
[18:49] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[18:49] <SDub> at the word "keypair"
[18:50] <diq> Hi! If I'm reading the BoF live notes correctly, the upgrade path from FileStore to BlueStore is as simple as stopping the old OSD and starting up a bluestore OSD?
[18:50] <Heebie> or keypair.. then this is an invalid theory)
[18:50] <Heebie> I almost got it into one message. ;)
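If anyone wants to test that theory, the two usual ways of pinning a client to one backend in HAProxy look roughly like this; the backend name, addresses and the 9292 glance-api port are assumptions, not taken from SDub's setup:

    backend glance_api
        balance source                          # stick clients to a backend by source IP
        server glance1 10.0.0.11:9292 check
        server glance2 10.0.0.12:9292 check
    # or cookie-based stickiness instead of source hashing:
    #     balance roundrobin
    #     cookie SERVERID insert indirect nocache
    #     server glance1 10.0.0.11:9292 check cookie glance1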
[18:51] <diq> PG's can be placed on both FileStore and BlueStore for the same pool?
[18:51] * dgurtner (~dgurtner@207.91.176.136) Quit (Ping timeout: 480 seconds)
[18:52] * Heebie wonders what FileStore and BlueStore are.
[18:53] * pabluk__ is now known as pabluk_
[18:53] * reed (~reed@rrcs-97-79-186-162.sw.biz.rr.com) Quit (Quit: Ex-Chat)
[18:56] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:8cb1:7443:8517:e025) Quit (Ping timeout: 480 seconds)
[18:57] <diq> FileStore = existing OSD storage format
[18:57] <diq> BlueStore = future OSD storage format
[18:57] <diq> FileStore is POSIX based and uses underlying file systems
[18:57] <diq> BlueStore is key/value based on RocksDB
[18:58] * penguinRaider (~KiKo@14.139.82.6) Quit (Read error: Connection reset by peer)
[18:59] * penguinRaider (~KiKo@14.139.82.6) has joined #ceph
[19:01] * branto (~branto@ip-78-102-208-181.net.upcbroadband.cz) Quit (Quit: Leaving.)
[19:03] <Heebie> Oh. Is BlueStore significantly improved performance wise?
[19:04] * linjan__ (~linjan@86.62.112.22) Quit (Ping timeout: 480 seconds)
[19:09] * measter1 (~spidu_@67.ip-92-222-38.eu) has joined #ceph
[19:10] * garphy is now known as garphy`aw
[19:11] * JamesHarrison (~Yopi@marcuse-1.nos-oignons.net) has joined #ceph
[19:12] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:8cb1:7443:8517:e025) has joined #ceph
[19:15] * wolsen (~quassel@152.34.213.162.lcy-01.canonistack.canonical.com) Quit (Remote host closed the connection)
[19:19] * mykola (~Mikolaj@91.245.77.36) has joined #ceph
[19:22] * jmunhoz (~jmunhoz@51.pool85-61-139.dynamic.orange.es) has joined #ceph
[19:32] <via> the ceph upgrade to jewel runs restorecon on /var/lib/ceph -_- that could take so long
[19:38] * shylesh__ (~shylesh@45.124.226.179) has joined #ceph
[19:38] * vicente (~vicente@111-241-42-133.dynamic.hinet.net) Quit (Ping timeout: 480 seconds)
[19:39] * measter1 (~spidu_@6AGAABE9Q.tor-irc.dnsbl.oftc.net) Quit ()
[19:41] * JamesHarrison (~Yopi@76GAAEV4K.tor-irc.dnsbl.oftc.net) Quit ()
[19:45] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[19:46] * smf681 (~Kizzi@tor-exit-node.seas.upenn.edu) has joined #ceph
[19:47] * overclk (~quassel@117.202.99.55) Quit (Remote host closed the connection)
[19:47] * xarses (~xarses@209.94.245.220) Quit (Ping timeout: 480 seconds)
[19:47] * xarses_ (~xarses@209.94.245.220) Quit (Ping timeout: 480 seconds)
[19:50] <Kupo1> Can any ceph guys confirm whether or not rbd diff is affected by replica counts?
[19:55] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[19:56] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[19:57] <lifeboy> I've been hacking away at a broken cluster for some days now and still don't have joy. Of my cluster of 4 hosts I broke 2. There were 19 OSDs, and only 2 of them in total were on the two crashed hosts.
[19:57] <lifeboy> So I've been trying to get one of the hosts back into the cluster.
[19:59] <lifeboy> I can't clear all the nodes and start again, there's "stuff" on the cluster that I can't lose, so I have recorded my steps in adding a "new" host to the cluster here: http://pastebin.com/ftwjjdCe
[19:59] * wwdillingham (~LobsterRo@140.247.242.44) has left #ceph
[19:59] * wwdillingham (~LobsterRo@140.247.242.44) has joined #ceph
[20:00] * Skaag (~lunix@65.200.54.234) Quit (Quit: Leaving.)
[20:01] * haomaiwang (~haomaiwan@2605:b400:104:11:494b:9d54:390:f74b) has joined #ceph
[20:01] <lifeboy> When I use "ceph-deploy new newhost2" from an existing cluster, will I add the host to that cluster, or will I create a new cluster?
[20:02] <lifeboy> I can't afford to break my cluster by issuing this command and then lose my cluster...
[20:02] * Skaag (~lunix@65.200.54.234) has joined #ceph
[20:02] * Nacer (~Nacer@14.186.140.221) has joined #ceph
[20:04] <lifeboy> The documentation only ever speaks of creating a new cluster with "ceph-deploy new". Do I use it to add a new node to a cluster as well?
[20:09] * Inuyasha (~Atomizer@anonymous.sec.nl) has joined #ceph
[20:11] * bara (~bara@nat-pool-brq-t.redhat.com) Quit (Remote host closed the connection)
[20:13] * vanham (~vanham@191.185.29.65) Quit (Ping timeout: 480 seconds)
[20:16] * smf681 (~Kizzi@6AGAABFBC.tor-irc.dnsbl.oftc.net) Quit ()
[20:16] * Jaska (~Gibri@46.182.106.190) has joined #ceph
[20:17] <lifeboy> Aha, I found "ceph-deploy mon add" that seems to be what I'm looking for
[20:21] * vanham (~vanham@191.185.29.65) has joined #ceph
[20:29] * kefu (~kefu@114.92.122.74) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[20:31] <wwdillingham> lifeboy: ceph-deploy new generates a new ceph.conf in the pwd on the deploy host, ceph-deploy mon add would be likely what you want, you can also add a new mon without ceph-deploy using this method http://dachary.org/loic/ceph-doc/dev/mon-bootstrap/#initially-peerless-expansion
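In other words, run from the admin working directory that already holds the existing cluster's ceph.conf and keyrings (hostname taken from lifeboy's question):

    ceph-deploy mon add newhost2    # adds newhost2 as a monitor of the existing cluster
    # by contrast:
    # ceph-deploy new newhost2      # would write a fresh ceph.conf/fsid, i.e. define a brand-new cluster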
[20:34] * haomaiwang (~haomaiwan@2605:b400:104:11:494b:9d54:390:f74b) Quit (Remote host closed the connection)
[20:34] * Hemanth (~hkumar_@103.228.221.189) has joined #ceph
[20:36] * derjohn_mob (~aj@x590e23be.dyn.telefonica.de) has joined #ceph
[20:36] * vanham (~vanham@191.185.29.65) Quit (Ping timeout: 480 seconds)
[20:39] * Inuyasha (~Atomizer@06SAABSBW.tor-irc.dnsbl.oftc.net) Quit ()
[20:39] * geegeegee (~lmg@176.10.99.207) has joined #ceph
[20:40] * xarses (~xarses@209.94.245.220) has joined #ceph
[20:40] * xarses_ (~xarses@209.94.245.220) has joined #ceph
[20:40] * xarses (~xarses@209.94.245.220) Quit (Remote host closed the connection)
[20:40] * xarses (~xarses@209.94.245.220) has joined #ceph
[20:42] * vicente (~vicente@111-241-42-133.dynamic.hinet.net) has joined #ceph
[20:45] * vanham (~vanham@191.185.29.65) has joined #ceph
[20:46] * Jaska (~Gibri@06SAABSB6.tor-irc.dnsbl.oftc.net) Quit ()
[20:46] * totalwormage (~nupanick@argenla.tor-exit.network) has joined #ceph
[20:50] * vicente (~vicente@111-241-42-133.dynamic.hinet.net) Quit (Ping timeout: 480 seconds)
[20:51] <lifeboy> wwdillingham: ceph-deploy mon add doesn't quite do what I thought it would though. I get an error timeout from the cluster
[20:51] <lifeboy> 2016-04-27 20:43:25.059896 7fdce46f7700 0 -- :/2951884774 >> 192.168.121.32:6789/0 pipe(0x7fdcdc000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fdcdc004ef0).fault
[20:53] <wwdillingham> is your existing monitor quorum in a good state?
[20:55] <lifeboy> No, I don't think so. Pls check my description of the problem at 19:57 above
[20:55] <wwdillingham> not part of my irc history, sorry
[20:55] <lifeboy> UTC+2
[20:55] <lifeboy> I've been hacking away at a broken cluster for some days now and still don't have joy. Of my cluster of 4 hosts I broke 2. There were 19 OSDs, and only 2 of them in total were on the two crashed hosts.
[20:55] <lifeboy> So I've been trying to get one of the hosts back into the cluster.
[20:55] <lifeboy> I can't clear all the nodes and start again, there's "stuff" on the cluster that I can't lose, so I have recorded my steps in adding a "new" host to the cluster here: http://pastebin.com/ftwjjdCe
[20:56] <wwdillingham> I dont believe you will be able to add a new monitor host when you have a non-existent quorum.
[20:56] <lifeboy> The pastebin is missing the "ceph-deploy mon add" part, so it creates a new cluster.
[20:57] <lifeboy> So how can I possibly recover the cluster or force quorum?
[20:58] <lifeboy> Is there a way to manually add a new host and then mon, etc.?
[20:58] <wwdillingham> you want to work with the pre-existing monitors only (keep in mind ive never faced this situation so not speaking from experience)
[20:59] <wwdillingham> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#recovering-a-monitor-s-broken-monmap
[21:00] <wwdillingham> I *think* what you want to do is get your latest working monmap and inject it into your preexisting monitors
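The procedure on that troubleshooting page boils down to roughly the following; the monitor IDs follow lifeboy's h1/s1 naming, and each monitor must be stopped while its store is touched:

    # on a monitor that still has a good map (say s1), with its ceph-mon stopped:
    ceph-mon -i s1 --extract-monmap /tmp/monmap
    monmaptool --print /tmp/monmap            # sanity-check the extracted map
    # copy /tmp/monmap to the damaged monitor host, then, with its ceph-mon stopped:
    ceph-mon -i h1 --inject-monmap /tmp/monmap
    # restart the monitors and re-check quorum with "ceph -s"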
[21:00] <lifeboy> I have a small hope of recovering one of the hosts, since the only thing "wrong" with it is that it hangs when it boots up... So I suppose that can possibly be fixed by disabling some things from a boot dvd
[21:01] <wwdillingham> how many mons did you have before you failed?
[21:01] * haomaiwang (~haomaiwan@rrcs-67-79-205-19.sw.biz.rr.com) has joined #ceph
[21:01] <wwdillingham> 4?
[21:02] <lifeboy> I had 3. If I get three hosts up again, I will be able to achieve quorum again, right?
[21:02] <wwdillingham> you could have quorum with only 2/3
[21:02] <wwdillingham> you need a strict majority
[21:03] <lifeboy> ceph -s reports: monmap e2: 2 mons at {h1=192.168.121.30:6789/0,s1=192.168.121.33:6789/0}, election epoch 520, quorum 0,1 h1,s1
[21:03] <lifeboy> Does that mean I have quorum?
[21:04] <wwdillingham> yes, i believe you do
[21:05] <wwdillingham> are you health warn or health err?
[21:06] <lifeboy> health warn
[21:06] * Jokerz (c05e6605@107.161.19.53) has joined #ceph
[21:07] <wwdillingham> what does ceph status show exactly
[21:07] <wwdillingham> pastebin it
[21:08] <Jokerz> silly question... just built a demo ceph cluster 3 mon 3 osd. How do i properly shut it down? I am currently only setting "ceph osd set noout" then shutting all the boxes off.
[21:09] * haomaiwang (~haomaiwan@rrcs-67-79-205-19.sw.biz.rr.com) Quit (Ping timeout: 480 seconds)
[21:09] * geegeegee (~lmg@4MJAAEHXX.tor-irc.dnsbl.oftc.net) Quit ()
[21:09] * hifi1 (~delcake@6AGAABFEJ.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:11] * garphy`aw is now known as garphy
[21:11] <BranchPredictor> Jokerz: on how many nodes?
[21:12] * thomnico (~thomnico@2a01:e35:8b41:120:1504:6b12:91f9:b92f) Quit (Quit: Ex-Chat)
[21:12] <xcezzz> Jokerz: take all osds down… then mon nodes… leaving `ceph osd set noout` will stop it backfilling when osd's disappear
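Put together, that shutdown sequence looks roughly like this; it is a sketch of the advice above, not an official procedure:

    ceph osd set noout     # stop the cluster marking OSDs out and backfilling while it is down
    # stop the OSD daemons on every OSD host, then the monitor daemons, then power off
    # ...
    # on the way back up: start the monitors first, then the OSDs, wait for peering, then
    ceph osd unset noout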
[21:16] * Nacer (~Nacer@14.186.140.221) Quit (Remote host closed the connection)
[21:16] * Rickus (~Rickus@office.protected.ca) has joined #ceph
[21:16] * totalwormage (~nupanick@06SAABSDF.tor-irc.dnsbl.oftc.net) Quit ()
[21:16] * Blueraven (~Uniju@lumumba.torservers.net) has joined #ceph
[21:17] * jmunhoz (~jmunhoz@51.pool85-61-139.dynamic.orange.es) Quit (Quit: Ex-Chat)
[21:18] <lifeboy> wwdillingham: ceph status http://pastebin.com/Yfx2gXd0
[21:20] * rendar (~I@host160-181-dynamic.3-87-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[21:21] * haomaiwa_ (~haomaiwan@166.184.9.178) has joined #ceph
[21:21] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving)
[21:23] <wwdillingham> well first you need to resolve your clock skew
[21:23] * rendar (~I@host160-181-dynamic.3-87-r.retail.telecomitalia.it) has joined #ceph
[21:23] <wwdillingham> is ntp running?
[21:23] <wwdillingham> does ntpstat show unsynchronized anywhere?
[21:23] <lifeboy> both s1 and h1 are syncing time from the same server.
[21:26] <wwdillingham> what does ntpstat on all mons say? and on which host is the osd down? check with 'ceph osd tree' what does ntpstat say there?
[21:28] * mykola (~Mikolaj@91.245.77.36) Quit (Quit: away)
[21:31] * valeech (~valeech@209-145-79-82.unassigned.ntelos.net) has joined #ceph
[21:35] <wwdillingham> Those who are successfully running jewel on rhel7, what kernel version do you run, and what is the suggested kernel for 10.2?
[21:39] * hifi1 (~delcake@6AGAABFEJ.tor-irc.dnsbl.oftc.net) Quit ()
[21:39] * Kyso_1 (~mollstam@tor2r.ins.tor.net.eu.org) has joined #ceph
[21:39] * dalgaaf (uid15138@id-15138.ealing.irccloud.com) Quit (Quit: Connection closed for inactivity)
[21:40] * xarses (~xarses@209.94.245.220) Quit (Remote host closed the connection)
[21:41] * xarses (~xarses@209.94.245.220) has joined #ceph
[21:41] * xarses_ (~xarses@209.94.245.220) Quit (Remote host closed the connection)
[21:41] * xarses_ (~xarses@209.94.245.220) has joined #ceph
[21:43] <lifeboy> wwdillingham: H1 runs Ubuntu, the others Debian... http://pastebin.com/PqNp6aj3
[21:43] * valeech (~valeech@209-145-79-82.unassigned.ntelos.net) Quit (Read error: Connection reset by peer)
[21:45] <lifeboy> I have removed the "down" host S2 completely also from the crushmap, so it doesn't show in ceph osd tree anymore
[21:45] * valeech (~valeech@209-145-79-82.unassigned.ntelos.net) has joined #ceph
[21:46] * Blueraven (~Uniju@7V7AAD798.tor-irc.dnsbl.oftc.net) Quit ()
[21:46] * bret1 (~Jase@relay1.cavefelem.com) has joined #ceph
[21:46] * shakamunyi (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[21:48] <wwdillingham> well your ntp problem is on s1 according to ceph status, so it needs to be resolved there, is ntp running on it?
[21:48] * dsl (~dsl@204.155.27.222) Quit (Remote host closed the connection)
[21:48] * haomaiwa_ (~haomaiwan@166.184.9.178) Quit (Remote host closed the connection)
[21:49] * dsl (~dsl@204.155.27.222) has joined #ceph
[21:49] <lifeboy> Yes, running and syncing... just checking how to get the status
[21:51] <vanham> lifeboy, wwdillingham, sometimes it takes a while for my mons to detect a recently resynced time (ntpdate/ntpd). Maybe, after making sure that time is synced, you should restart that mon
[21:52] <vanham> That's with my production cluster (hammer) at least
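A quick check-and-restart along those lines might look like this; ntp.example.org is a placeholder, and the restart command depends on the init system of the monitor host:

    ntpstat                           # should report "synchronised to NTP server ..."
    ntpdate -q ntp.example.org        # query-only: shows the current offset without stepping the clock
    # then bounce just the skewed monitor, e.g.
    restart ceph-mon id=s1            # Ubuntu/upstart packaging of this era
    service ceph restart mon.s1       # sysvinit-style packaging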
[21:52] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) has joined #ceph
[21:52] * valeech (~valeech@209-145-79-82.unassigned.ntelos.net) Quit (Read error: Connection reset by peer)
[21:53] <lifeboy> vanham: wwdillingham: just did this root@s1:~# ntpdate ntp.is.co.za
[21:53] <lifeboy> 27 Apr 21:52:39 ntpdate[375180]: adjust time server 196.4.160.4 offset -0.000994 sec
[21:53] <lifeboy> It seems fine.
[21:53] <lifeboy> Would I have to restart the server or can I just restart ceph?
[21:54] <wwdillingham> just the mon daemon
[21:54] <wwdillingham> so you only have 2 osd hosts now?
[21:55] <lifeboy> Ah, restarting mon.s1 cleared the clock skew
[21:55] <lifeboy> Yes, 2 osd hosts
[21:55] <wwdillingham> your errors might be related to having a replication size of 3+ but only having 2 osd hosts
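One way to check that, assuming the stock rbd pool name (substitute the real pool names); lowering size is only sensible if two copies are genuinely acceptable:

    ceph osd pool get rbd size        # replica count for the pool
    ceph osd pool get rbd min_size
    # with only two OSD hosts and a host-level CRUSH rule, size=3 can never be satisfied;
    # if two copies are acceptable:
    ceph osd pool set rbd size 2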
[21:55] * valeech (~valeech@209-145-79-82.unassigned.ntelos.net) has joined #ceph
[21:55] <wwdillingham> what is the current state of ceph status?
[21:57] * dsl (~dsl@204.155.27.222) Quit (Ping timeout: 480 seconds)
[21:57] <lifeboy> At this stage the most important to me is to be able to extract the virtual machine images from the cluster if I can't get it fully operational again. Here's ceph status http://pastebin.com/ZZ47Hefq
[21:58] <lifeboy> But I suppose if I can get it up, then I can get to the data, if not, then I can't, right?
[21:58] <wwdillingham> im not convinced you are even down
[21:58] <wwdillingham> you are just in a warn state
[21:58] <vanham> wwdillingham, please pastebin the output for "ceph pg dump" and "ceph osd tree" please
[21:59] <wwdillingham> can you place an object into a pool?
[22:00] <vanham> sorry
[22:00] <vanham> lifeboy, please pastebin the output for "ceph pg dump" and "ceph osd tree"
[22:00] <vanham> lifeboy, add "ceph osd dump" to that
[22:00] <vanham> plz
[22:01] <vanham> You have two problems there: those 7 pgs down and the stuck requests
[22:02] <vanham> I'm trying to understand the 7 pgs down first
[22:04] <lifeboy> http://pastebin.com/0fE91BMY
[22:06] <wwdillingham> lifeboy: how many osds have been lost since your problem began?
[22:06] <lifeboy> The 7 pgs down started showing after the two hosts went down... I upgraded to hammer from firefly. Did H1, restarted, all ok, then S1, all still ok. After doing S2, it came back up again, but I didn't check the status properly. I then did S3 and it didn't boot back up, so I checked and saw S2 was down too.
[22:07] <lifeboy> I only lost an osd on each host. osd.13 and osd.2
[22:07] <wwdillingham> ok, and osd.13 has been removed from crush ?
[22:08] <vanham> You have two replicas per object
[22:08] <vanham> losing two osds should mean losing data
[22:09] * Kyso_1 (~mollstam@6AGAABFFI.tor-irc.dnsbl.oftc.net) Quit ()
[22:10] <vanham> You should be looking for pgs 2.73 1.6b 1.5e 0.38 1.2b 2.2b 0.8
[22:10] <vanham> Like, osd.10 and osd.0 should have 2.73
[22:11] <vanham> So, /var/lib/ceph/osd/ceph-0/current/2.73* on S1 and /var/lib/ceph/osd/ceph-10/current/2.73* on H1 should exist
[22:12] <vanham> (if you are using the default dirs)
[22:12] <vanham> Are they there?
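Before poking at directories, it can help to ask ceph itself where it thinks the PG lives; the pg id 2.73 and osd.0 come from the discussion above:

    ceph pg map 2.73      # up/acting OSD set for the PG
    ceph pg 2.73 query    # detailed peering state (may stall while the PG is down)
    ls -d /var/lib/ceph/osd/ceph-0/current/2.73_head    # on the host carrying osd.0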
[22:14] * ulterior (~Altitudes@06SAABSHI.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:15] <lifeboy> Here's what I can see for 2.73 http://pastebin.com/ubSmZD0V
[22:16] * bret1 (~Jase@4MJAAEHZW.tor-irc.dnsbl.oftc.net) Quit ()
[22:16] <lifeboy> wwdillingham: Yes, osd.13 is removed
[22:16] * Jourei (~Kizzi@4MJAAEH0Q.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:16] * valeech (~valeech@209-145-79-82.unassigned.ntelos.net) Quit (Read error: Connection reset by peer)
[22:16] <lifeboy> vanham: should they be the same pgs on S1 and H1?
[22:16] * Jokerz (c05e6605@107.161.19.53) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[22:18] <vanham> lifeboy, I was expecting to see more stuff there
[22:18] <vanham> Like many objects
[22:18] <lifeboy> osd.2 is down, but I haven't touched it as far as the data is concerned, so if I get that up again,
[22:18] <lifeboy> I should be able to recover
[22:18] <vanham> Yes
[22:18] <vanham> You could check to see if osd.2 has that dir there
[22:19] <vanham> at /var/lib/ceph/osd/ceph-2/current/2.73*
[22:19] <vanham> at /var/lib/ceph/osd/ceph-2/current/2.73_head
[22:19] <vanham> If it's there, then that's a great sign
[22:20] <vanham> Usually /var/log/ceph/ceph-osd.2.log will be a great friend trying to get osd.2 up again
[22:20] * shylesh__ (~shylesh@45.124.226.179) Quit (Remote host closed the connection)
[22:20] * Hemanth (~hkumar_@103.228.221.189) Quit (Ping timeout: 480 seconds)
[22:20] <lifeboy> It will take me some time, since I have to access the server via its ILOM interface and boot from a virtual CD ... I'll see how quickly I can do that and report back.
[22:23] <vanham> lifeboy, since I have to go pick up my kid @ kindergarten soon: usually the stuck requests could mean a disk failing or at 100% usage. You should check your ceph-osd.??.log overall after that
[22:24] <lifeboy> vanham: ok, I will do that, thanks for the help!
[22:30] * xarses_ (~xarses@209.94.245.220) Quit (Ping timeout: 480 seconds)
[22:30] * xarses (~xarses@209.94.245.220) Quit (Ping timeout: 480 seconds)
[22:31] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving.)
[22:35] * xarses_ (~xarses@209.94.245.220) has joined #ceph
[22:35] * xarses (~xarses@209.94.245.220) has joined #ceph
[22:38] * valeech (~valeech@209-145-79-82.unassigned.ntelos.net) has joined #ceph
[22:41] * vanham (~vanham@191.185.29.65) Quit (Quit: Ex-Chat)
[22:42] * ibravo (~ibravo@2605:b400:104:11:7469:9e7:4d76:4b1) has joined #ceph
[22:44] * ulterior (~Altitudes@06SAABSHI.tor-irc.dnsbl.oftc.net) Quit ()
[22:44] * Mattress (~Jones@marylou.nos-oignons.net) has joined #ceph
[22:46] * Jourei (~Kizzi@4MJAAEH0Q.tor-irc.dnsbl.oftc.net) Quit ()
[22:46] * Hazmat (~Esvandiar@politkovskaja.torservers.net) has joined #ceph
[22:46] * ibravo (~ibravo@2605:b400:104:11:7469:9e7:4d76:4b1) Quit ()
[22:47] * ibravo (~ibravo@2605:b400:104:11:7469:9e7:4d76:4b1) has joined #ceph
[22:47] * ozzzo|2 (~kvirc@216.223.13.166) Quit (Quit: cya)
[22:50] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) Quit (Ping timeout: 480 seconds)
[22:51] * valeech (~valeech@209-145-79-82.unassigned.ntelos.net) Quit (Quit: valeech)
[22:54] <Anticimex> what could typically be causing a ~20 second complete & blocking freeze when doing random direct writes from fio to a kvm guest's rbd image on an ssd pool?
[22:54] <Anticimex> and, apart from the freeze, occasional and very periodic spikes in latency
[22:54] <Anticimex> i plotted some tests on: http://martin.millnert.se/ceph/ssdcluster/ , eg http://martin.millnert.se/ceph/ssdcluster/testrun2_clat.png
[22:55] <Anticimex> drilling into host side stats now
[22:56] * ibravo (~ibravo@2605:b400:104:11:7469:9e7:4d76:4b1) Quit (Read error: Connection reset by peer)
[22:56] <Anticimex> submission latency shows the small intermittent freezes better, http://martin.millnert.se/ceph/ssdcluster/testrun2_slat.png
[22:56] * skuggan68 (~oftc-webi@c83-252-195-130.bredband.comhem.se) has joined #ceph
[22:56] <Anticimex> single worker, qd=1, so serial io basically (in order to catch these things and baseline the system)
[22:57] <Anticimex> infernalis, s3600 ssds, and basically no tuning at all
[22:59] * nils_ (~nils_@doomstreet.collins.kg) Quit (Quit: This computer has gone to sleep)
[23:01] * LeaChim (~LeaChim@host86-147-224-166.range86-147.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[23:02] * thansen (~thansen@17.253.sfcn.org) has joined #ceph
[23:02] * wwdillingham (~LobsterRo@140.247.242.44) Quit (Ping timeout: 480 seconds)
[23:05] * Racpatel (~Racpatel@2601:87:3:3601::675d) Quit (Ping timeout: 480 seconds)
[23:06] <Anticimex> when drilling down on the io stats, i found basically frozen io towards one particular drive on one host matching the large blocking
[23:09] * Racpatel (~Racpatel@2601:87:3:3601::675d) has joined #ceph
[23:10] * LeaChim (~LeaChim@host86-147-119-244.range86-147.btcentralplus.com) has joined #ceph
[23:10] <Anticimex> perhaps unsurprisingly, errors from the bus: http://martin.millnert.se/ceph/ssdcluster/testrun2_error.txt
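For anyone chasing the same symptom, the drill-down described here can be reproduced with standard tools; /dev/sdX stands in for the suspect SSD:

    ceph osd perf             # per-OSD commit/apply latency; an outlier points at the slow OSD
    iostat -x 1 sdX           # on that OSD's host: await and %util for the device
    dmesg | tail -n 100       # bus resets, ATA/link errors
    smartctl -a /dev/sdX      # drive-reported error counters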
[23:12] * beat8 (~beat@46.250.135.1) Quit (Ping timeout: 480 seconds)
[23:13] * lmb (~Lars@p578a9261.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[23:14] * Mattress (~Jones@06SAABSIU.tor-irc.dnsbl.oftc.net) Quit ()
[23:14] * Azru (~Teddybare@tor-exit1-readme.dfri.se) has joined #ceph
[23:16] * billwebb (~billwebb@66.56.15.14) has joined #ceph
[23:16] * Hazmat (~Esvandiar@6AGAABFIB.tor-irc.dnsbl.oftc.net) Quit ()
[23:23] * skuggan68 (~oftc-webi@c83-252-195-130.bredband.comhem.se) Quit (Ping timeout: 480 seconds)
[23:25] * _28_ria (~kvirc@opfr028.ru) Quit (Read error: Connection reset by peer)
[23:27] * pabluk_ is now known as pabluk__
[23:28] * billwebb (~billwebb@66.56.15.14) Quit (Quit: billwebb)
[23:31] * Racpatel (~Racpatel@2601:87:3:3601::675d) Quit (Quit: Leaving)
[23:31] * Racpatel (~Racpatel@2601:87:3:3601::675d) has joined #ceph
[23:31] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) has joined #ceph
[23:31] * haplo37 (~haplo37@199.91.185.156) Quit (Remote host closed the connection)
[23:34] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) Quit (Remote host closed the connection)
[23:37] * SDub (~SDub@chippewa-nat.cray.com) Quit (Quit: Leaving)
[23:38] * _28_ria (~kvirc@opfr028.ru) has joined #ceph
[23:44] * Azru (~Teddybare@06SAABSJ5.tor-irc.dnsbl.oftc.net) Quit ()
[23:46] * AGaW (~redbeast1@atlantic480.us.unmetered.com) has joined #ceph
[23:48] <lifeboy> vanham, wwdillingham: Still here? The offline osd.2 contains current/2.73_head so does this mean that I have a chance to rescue the cluster?
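If osd.2 will not start but its data directory is intact, one last-resort option (ceph-objectstore-tool ships with hammer and later) is to export the missing PGs from the stopped OSD and import them into an OSD that should own them; the paths below are the defaults, and the whole exercise is worth rehearsing on a copy of the data first:

    # with the osd.2 daemon stopped:
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
        --journal-path /var/lib/ceph/osd/ceph-2/journal \
        --pgid 2.73 --op export --file /root/pg-2.73.export
    # and later: --op import on the target OSD (also stopped) for the same --pgid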
[23:51] * vbellur (~vijay@71.234.224.255) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.