#ceph IRC Log

IRC Log for 2016-09-30

Timestamps are in GMT/BST.

[0:00] * jermudgeon (~jhaustin@199.200.6.173) Quit (Quit: jermudgeon)
[0:06] * jstrassburg (~oftc-webi@66.195.131.19) Quit (Remote host closed the connection)
[0:10] * jclm (~jclm@ip68-96-196-245.lv.lv.cox.net) has joined #ceph
[0:11] * jclm (~jclm@ip68-96-196-245.lv.lv.cox.net) Quit ()
[0:14] * Concubidated (~cube@68.140.239.164) Quit (Quit: Leaving.)
[0:18] * Chaos_Llama (~K3NT1S_aw@5.8.88.149) Quit ()
[0:25] * vata (~vata@207.96.182.162) Quit (Quit: Leaving.)
[0:30] * Concubidated (~cube@12.207.21.2) has joined #ceph
[0:30] * KindOne (kindone@0001a7db.user.oftc.net) Quit (Remote host closed the connection)
[0:34] * KindOne (kindone@h252.172.16.98.dynamic.ip.windstream.net) has joined #ceph
[0:39] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[0:45] * davidzlap1 (~Adium@cpe-172-91-154-245.socal.res.rr.com) has joined #ceph
[0:45] * davidzlap2 (~Adium@cpe-172-91-154-245.socal.res.rr.com) has joined #ceph
[0:45] * davidzlap1 (~Adium@cpe-172-91-154-245.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[0:48] * davidzlap (~Adium@cpe-172-91-154-245.socal.res.rr.com) has joined #ceph
[0:48] * davidzlap2 (~Adium@cpe-172-91-154-245.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[0:48] * KindOne (kindone@0001a7db.user.oftc.net) Quit (Read error: Connection reset by peer)
[0:49] * KindOne (kindone@h252.172.16.98.dynamic.ip.windstream.net) has joined #ceph
[1:02] * xarses (~xarses@64.124.158.3) Quit (Ping timeout: 480 seconds)
[1:03] * Tusker (~tusker@CPE-124-190-175-165.snzm1.lon.bigpond.net.au) has joined #ceph
[1:03] * Bwana (~AotC@exit0.liskov.tor-relays.net) has joined #ceph
[1:10] * axion_joey_ (~oftc-webi@108.47.170.18) Quit (Quit: Page closed)
[1:10] * stefan0 (~stefano@168.205.191.253) Quit (Read error: Connection reset by peer)
[1:22] * vata (~vata@96.127.202.136) has joined #ceph
[1:25] <Tusker> heya guys
[1:25] <Tusker> i'm having an issue trying to get a cluster to recover... it recovers some objects when I restart the mon and osd, and then it gets to a state where it doesn't recover
[1:26] <Tusker> was working with be-el earlier, who identified that there was an issue with the network configuration, which is now resolved
[1:26] <Tusker> but, ceph is still behaving exactly the same, very slow to recover (2 pgs recovering) continuously
[1:30] * oms101 (~oms101@p20030057EA008100C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:33] * Bwana (~AotC@exit0.liskov.tor-relays.net) Quit ()
[1:34] * [0x4A6F] (~ident@0x4a6f.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:34] * [0x4A6F] (~ident@p4FC2717F.dip0.t-ipconnect.de) has joined #ceph
[1:40] * oms101 (~oms101@p20030057EA007900C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[1:48] * peetaur is now known as Guest573
[1:48] * peetaur (~peter@p200300E10BC05800667002FFFE2E10FC.dip0.t-ipconnect.de) has joined #ceph
[1:50] * jarrpa (~jarrpa@167.220.102.11) Quit (Ping timeout: 480 seconds)
[1:52] * Guest573 (~peter@p200300E10BC04E00667002FFFE2E10FC.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:53] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[1:55] * salwasser (~Adium@2601:197:101:5cc1:756a:2a6a:71dc:d80) has joined #ceph
[1:56] * wushudoin (~wushudoin@2601:646:8200:c9f0:2ab2:bdff:fe0b:a6ee) Quit (Ping timeout: 480 seconds)
[1:56] <Tusker> any ideas on how to find the missing objects ?
[1:57] * scuttlemonkey is now known as scuttle|afk
[2:01] * rakeshgm (~rakesh@38.140.108.5) Quit (Quit: Peace :))
[2:01] * davidzlap (~Adium@cpe-172-91-154-245.socal.res.rr.com) Quit (Quit: Leaving.)
[2:10] * zviratko (~Esvandiar@exit0.liskov.tor-relays.net) has joined #ceph
[2:14] * sudocat (~dibarra@2602:306:8bc7:4c50:f479:1bad:a78f:3bb9) has joined #ceph
[2:16] * davidzlap (~Adium@2605:e000:1313:8003:c816:a5e6:9822:58a3) has joined #ceph
[2:25] * sudocat (~dibarra@2602:306:8bc7:4c50:f479:1bad:a78f:3bb9) Quit (Ping timeout: 480 seconds)
[2:27] * wgao (~wgao@106.120.101.38) has joined #ceph
[2:31] <Tusker> also, when I try ceph tell osd.*, it just hangs
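
For context, the usual way to chase down missing/unfound objects like Tusker describes is per placement group rather than via ceph tell; a minimal sketch of the relevant commands (the pg id 2.5 and osd.3 are placeholders):

    ceph health detail                    # lists degraded/recovering pgs and any unfound objects
    ceph pg dump_stuck unclean            # pgs stuck in recovery/backfill
    ceph pg 2.5 query                     # per-pg peering/recovery state and which osds block it
    ceph pg 2.5 list_missing              # objects the pg knows about but cannot locate
    ceph tell osd.3 injectargs '--osd-recovery-max-active 5'   # address one osd; osd.* has to reach every osd in turn

As a last resort, ceph pg 2.5 mark_unfound_lost revert|delete gives up on unfound objects and loses data, so it should only follow the checks above.
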
[2:36] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) has joined #ceph
[2:39] * blizzow (~jburns@50-243-148-102-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[2:39] * zviratko (~Esvandiar@exit0.liskov.tor-relays.net) Quit ()
[2:40] * davidzlap (~Adium@2605:e000:1313:8003:c816:a5e6:9822:58a3) Quit (Quit: Leaving.)
[2:44] * davidzlap (~Adium@2605:e000:1313:8003:c816:a5e6:9822:58a3) has joined #ceph
[2:44] * kristen (~kristen@134.134.139.82) Quit (Quit: Leaving)
[2:48] * haplo37 (~haplo37@199.91.185.156) Quit (Remote host closed the connection)
[2:49] * vicente (~vicente@1-161-184-59.dynamic.hinet.net) Quit (Ping timeout: 480 seconds)
[2:53] * CephFan1 (~textual@173-171-133-163.res.bhn.net) has joined #ceph
[2:54] * davidzlap13 (~Adium@2605:e000:1313:8003:b52f:ed52:7d1:537d) has joined #ceph
[2:59] * CephFan1 (~textual@173-171-133-163.res.bhn.net) Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[3:07] * davidzlap1 (~Adium@cpe-172-91-154-245.socal.res.rr.com) has joined #ceph
[3:08] * derjohn_mobi (~aj@x4db2abdd.dyn.telefonica.de) has joined #ceph
[3:09] * davidzlap2 (~Adium@cpe-172-91-154-245.socal.res.rr.com) has joined #ceph
[3:09] * davidzlap1 (~Adium@cpe-172-91-154-245.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[3:11] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:12] * davidzlap (~Adium@2605:e000:1313:8003:c816:a5e6:9822:58a3) Quit (Ping timeout: 480 seconds)
[3:13] * davidzlap13 (~Adium@2605:e000:1313:8003:b52f:ed52:7d1:537d) Quit (Quit: Leaving.)
[3:15] * jfaj_ (~jan@p20030084AD19E6005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[3:16] * derjohn_mob (~aj@x590cda80.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[3:19] * davidzlap (~Adium@cpe-172-91-154-245.socal.res.rr.com) has joined #ceph
[3:20] * davidzlap2 (~Adium@cpe-172-91-154-245.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[3:23] * adamcrume__ (~quassel@2601:647:cb01:f890:f513:187d:92d0:5870) Quit (Quit: No Ping reply in 180 seconds.)
[3:24] * jfaj_ (~jan@p20030084AD1CE6005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) has joined #ceph
[3:25] * adamcrume_ (~quassel@2601:647:cb01:f890:2834:18f0:ba81:2681) has joined #ceph
[3:26] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[3:38] * Concubidated (~cube@12.207.21.2) Quit (Quit: Leaving.)
[3:41] * vicente (~vicente@1-161-184-59.dynamic.hinet.net) has joined #ceph
[3:43] <Tusker> anyone alive ?
[3:43] * kefu (~kefu@114.92.125.128) has joined #ceph
[3:50] * vicente (~vicente@1-161-184-59.dynamic.hinet.net) Quit (Ping timeout: 480 seconds)
[3:51] * Concubidated (~cube@12.207.21.2) has joined #ceph
[3:54] * yanzheng (~zhyan@125.70.23.147) has joined #ceph
[4:06] * davidzlap (~Adium@cpe-172-91-154-245.socal.res.rr.com) Quit (Quit: Leaving.)
[4:11] * salwasser (~Adium@2601:197:101:5cc1:756a:2a6a:71dc:d80) Quit (Quit: Leaving.)
[4:13] * KindOne_ (kindone@h99.228.28.71.dynamic.ip.windstream.net) has joined #ceph
[4:14] * Concubidated (~cube@12.207.21.2) Quit (Quit: Leaving.)
[4:18] <Tusker> heya guys ?
[4:20] * KindOne (kindone@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:20] * KindOne_ is now known as KindOne
[4:22] * Green (~Green@27.11.112.248) has joined #ceph
[4:40] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[4:46] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[4:53] <Tusker> anyone around ?
[5:02] * Vacuum_ (~Vacuum@i59F79D55.versanet.de) has joined #ceph
[5:02] * Vacuum__ (~Vacuum@88.130.193.212) Quit (Read error: Connection reset by peer)
[5:05] * baotiao (~baotiao@218.30.116.4) has joined #ceph
[5:17] * Helleshin (~Zombiekil@108.61.122.214) has joined #ceph
[5:33] * vimal (~vikumar@114.143.161.188) has joined #ceph
[5:37] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) has joined #ceph
[5:39] * adamcrume_ (~quassel@2601:647:cb01:f890:2834:18f0:ba81:2681) Quit (Quit: No Ping reply in 180 seconds.)
[5:39] * adamcrume (~quassel@2601:647:cb01:f890:e926:30a4:8c8b:b6b0) Quit (Quit: No Ping reply in 210 seconds.)
[5:40] * adamcrume (~quassel@2601:647:cb01:f890:c07:bc1:9ffe:aecb) has joined #ceph
[5:40] * adamcrume_ (~quassel@2601:647:cb01:f890:c07:bc1:9ffe:aecb) has joined #ceph
[5:47] * Helleshin (~Zombiekil@108.61.122.214) Quit ()
[5:50] * t4nk419 (~oftc-webi@v61.tyo01.gmo.ntt.us.jp.ssccnode.net) has joined #ceph
[5:50] <t4nk419> ?
[5:50] * t4nk419 (~oftc-webi@v61.tyo01.gmo.ntt.us.jp.ssccnode.net) Quit ()
[5:51] * Hemanth (~hkumar_@103.228.221.140) has joined #ceph
[5:51] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[5:58] * kefu (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[6:00] * Vacuum__ (~Vacuum@88.130.209.236) has joined #ceph
[6:02] * kefu (~kefu@114.92.125.128) has joined #ceph
[6:02] * sudocat (~dibarra@2602:306:8bc7:4c50:f479:1bad:a78f:3bb9) has joined #ceph
[6:05] * Kakeru (~Kayla@185.133.32.19) has joined #ceph
[6:07] * Vacuum_ (~Vacuum@i59F79D55.versanet.de) Quit (Ping timeout: 480 seconds)
[6:07] * tessier_ (~treed@wsip-98-171-210-130.sd.sd.cox.net) has joined #ceph
[6:11] * walcubi (~walcubi@p5795BD11.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[6:11] * walcubi (~walcubi@p5795B96B.dip0.t-ipconnect.de) has joined #ceph
[6:14] * tessier (~treed@wsip-98-171-210-130.sd.sd.cox.net) Quit (Ping timeout: 480 seconds)
[6:15] * vimal (~vikumar@114.143.161.188) Quit (Quit: Leaving)
[6:25] * EinstCrazy (~EinstCraz@27.156.19.56) has joined #ceph
[6:26] * KindOne_ (kindone@h229.169.16.98.dynamic.ip.windstream.net) has joined #ceph
[6:27] * joshd1 (~jdurgin@2602:30a:c089:2b0:d811:6c84:cc46:adea) Quit (Quit: Leaving.)
[6:29] * KindOne (kindone@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:29] * KindOne_ is now known as KindOne
[6:32] * peetaur (~peter@p200300E10BC05800667002FFFE2E10FC.dip0.t-ipconnect.de) Quit (Quit: Konversation terminated!)
[6:35] * Kakeru (~Kayla@185.133.32.19) Quit ()
[6:40] * vimal (~vikumar@121.244.87.116) has joined #ceph
[6:45] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[6:47] * EinstCrazy (~EinstCraz@27.156.19.56) Quit (Remote host closed the connection)
[6:50] * Kurt (~Adium@kb.vpn.aco.net) has joined #ceph
[6:52] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[7:02] * Kurimus (~Coe|work@tor-exit.squirrel.theremailer.net) has joined #ceph
[7:09] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Read error: Connection reset by peer)
[7:16] * sudocat (~dibarra@2602:306:8bc7:4c50:f479:1bad:a78f:3bb9) Quit (Ping timeout: 480 seconds)
[7:17] * Hemanth (~hkumar_@103.228.221.140) Quit (Ping timeout: 480 seconds)
[7:29] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[7:32] * Kurimus (~Coe|work@tor-exit.squirrel.theremailer.net) Quit ()
[7:34] * LiftedKilt (~LiftedKil@is.in.the.madhacker.biz) Quit (Quit: Kilted Southern)
[7:38] * Goodi (~Hannu@194.251.119.207) has joined #ceph
[7:39] * LiftedKilt (~LiftedKil@is.in.the.madhacker.biz) has joined #ceph
[7:40] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[7:46] * Jeffrey4l_ (~Jeffrey@110.252.46.17) Quit (Ping timeout: 480 seconds)
[7:48] * karnan (~karnan@125.16.34.66) has joined #ceph
[7:52] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[7:55] * Jeffrey4l_ (~Jeffrey@110.252.46.17) has joined #ceph
[7:57] * \ask (~ask@oz.develooper.com) Quit (Quit: Bye)
[8:04] * kefu (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[8:06] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) has joined #ceph
[8:08] * sickology (~root@vpn.bcs.hr) has joined #ceph
[8:10] * kefu_ (~kefu@114.92.125.128) has joined #ceph
[8:14] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[8:23] * Hemanth (~hkumar_@103.228.221.183) has joined #ceph
[8:25] * niknakpa1dywak (~xander.ni@outbound.lax.demandmedia.com) Quit (Ping timeout: 480 seconds)
[8:28] * niknakpaddywak (~xander.ni@outbound.lax.demandmedia.com) has joined #ceph
[8:29] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Quit: Leaving.)
[8:33] * jowilkin (~jowilkin@184-23-213-254.fiber.dynamic.sonic.net) Quit (Quit: Leaving)
[8:35] * TMM (~hp@dhcp-077-248-009-229.chello.nl) Quit (Quit: Ex-Chat)
[8:36] * flisky (~Thunderbi@106.38.61.181) has joined #ceph
[8:37] * branto (~branto@178.253.162.116) has joined #ceph
[8:39] * flisky (~Thunderbi@106.38.61.181) Quit ()
[8:46] * vbellur (~vijay@71.234.224.255) Quit (Ping timeout: 480 seconds)
[8:50] * \ask (~ask@oz.develooper.com) has joined #ceph
[8:58] * branto (~branto@178.253.162.116) Quit (Ping timeout: 480 seconds)
[9:03] * ade (~abradshaw@2a02:810d:a4c0:5cd:9001:2bba:886a:5200) has joined #ceph
[9:04] * bara (~bara@ip4-83-240-10-82.cust.nbox.cz) has joined #ceph
[9:11] * karnan (~karnan@125.16.34.66) Quit (Ping timeout: 480 seconds)
[9:15] * branto (~branto@178.253.162.116) has joined #ceph
[9:15] * vbellur (~vijay@2601:18f:780:74a0:5e51:4fff:fee8:6a5c) has joined #ceph
[9:19] * TMM (~hp@185.5.121.201) has joined #ceph
[9:20] * analbeard (~shw@support.memset.com) has joined #ceph
[9:22] * karnan (~karnan@125.16.34.66) has joined #ceph
[9:23] * vbellur (~vijay@2601:18f:780:74a0:5e51:4fff:fee8:6a5c) Quit (Ping timeout: 480 seconds)
[9:29] * branto (~branto@178.253.162.116) Quit (Ping timeout: 480 seconds)
[9:33] * Animazing (~Wut@94.242.217.235) has left #ceph
[9:34] * vbellur (~vijay@2601:18f:700:55b0:5e51:4fff:fee8:6a5c) has joined #ceph
[9:36] * branto1 (~branto@178.253.162.116) has joined #ceph
[9:38] * vicente (~~vicente@125.227.238.55) has joined #ceph
[9:43] * EinstCrazy (~EinstCraz@27.156.19.56) has joined #ceph
[9:52] * EinstCrazy (~EinstCraz@27.156.19.56) Quit (Remote host closed the connection)
[10:01] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) has joined #ceph
[10:09] * Jeffrey4l__ (~Jeffrey@110.252.60.167) has joined #ceph
[10:10] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) has joined #ceph
[10:13] * Jeffrey4l_ (~Jeffrey@110.252.46.17) Quit (Ping timeout: 480 seconds)
[10:17] * fridim (~fridim@56-198-190-109.dsl.ovh.fr) has joined #ceph
[10:17] * sickology (~root@vpn.bcs.hr) Quit (Ping timeout: 480 seconds)
[10:21] * DanFoster (~Daniel@office.34sp.com) has joined #ceph
[10:21] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[10:23] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[10:24] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) has joined #ceph
[10:26] * niknakpa1dywak (~xander.ni@outbound.lax.demandmedia.com) has joined #ceph
[10:27] * lkoranda (~lkoranda@nat-pool-brq-t.redhat.com) Quit (Quit: Splunk> Be an IT superhero. Go home early.)
[10:27] * niknakpaddywak (~xander.ni@outbound.lax.demandmedia.com) Quit (Ping timeout: 480 seconds)
[10:28] * lkoranda (~lkoranda@nat-pool-brq-t.redhat.com) has joined #ceph
[10:29] * ivve (~zed@m176-68-40-86.cust.tele2.se) has joined #ceph
[10:29] * karnan (~karnan@125.16.34.66) Quit (Ping timeout: 480 seconds)
[10:31] * sickology (~root@vpn.bcs.hr) has joined #ceph
[10:37] * karnan (~karnan@125.16.34.66) has joined #ceph
[10:40] * kefu_ (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[10:41] * kefu_ (~kefu@114.92.125.128) has joined #ceph
[10:42] * sebes (~oftc-webi@81.0.80.161) has joined #ceph
[10:43] <sebes> Hi
[10:43] <sebes> We have a big problem with our ceph-based storage system, can anyone help me?
[10:44] * lkoranda (~lkoranda@nat-pool-brq-t.redhat.com) Quit (Quit: Splunk> Be an IT superhero. Go home early.)
[10:45] * lkoranda (~lkoranda@nat-pool-brq-t.redhat.com) has joined #ceph
[10:45] <sebes> We have a big problem with our ceph-based storage system, can anyone help me?
[10:46] <peetaur2> sebes: with a description of the problem, perhaps someone can help you.
[10:46] <sebes> cool thx I try to describe it:
[10:48] <TMM> does anyone know if there's a way for OSDs or placement groups to have some incorrect metadata in them?
[10:48] <TMM> after a fuckup of one of my colleagues we now regularly get half-finished objects in the EC pools that crash osds
[10:49] <TMM> what happened was what I think amounted to a brief split-brain between two monitors
[10:49] <TMM> I've had trouble on my cluster ever since
[10:49] <TMM> I don't really know what I need to check
[10:49] <peetaur2> TMM: how many monitors in total?
[10:49] <TMM> peetaur2, 3
[10:50] <peetaur2> so then I don't see why split brain can happen on its own
[10:50] <TMM> it didn't happen on its own
[10:50] <TMM> from what I can tell, the system load on what was at the time the primary monitor went through the roof
[10:50] <peetaur2> so you forced it to run with 1/3?
[10:50] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[10:50] <TMM> and it appeared that osds were connecting between the two
[10:50] <TMM> err, my 1st and 2nd monitors
[10:51] <TMM> at least, that's what I thought my logfiles meant
[10:51] * derjohn_mobi (~aj@x4db2abdd.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[10:52] <TMM> The primary problem appeared to be that the primary monitor was *almost* unreachable but not entirely
[10:52] <TMM> whatever exactly happened ever *since* then I'm having all kinds of very strange problems
[10:53] <sebes> We added a new OSD into ceph, and the rebalance has started. We have a disk which is almost full (99%). Is there any possibility to accelerate the rebalance speed?
[10:54] * lkoranda (~lkoranda@nat-pool-brq-t.redhat.com) Quit (Quit: Splunk> Be an IT superhero. Go home early.)
[10:54] <sebes> the log: ceph-osd: 2016-09-30 10:15:24.278955 7fbd3554e700 -1 log_channel(cluster) log [ERR] : OSD full dropping all updates 98% full
[10:54] * lkoranda (~lkoranda@nat-pool-brq-t.redhat.com) has joined #ceph
[10:54] <sebes> So again:
[10:54] <sebes> We added a new OSD into ceph, and the rebalance has started. We have a disk which is almost full (99%). Is there any possibility to accelerate the rebalance speed? ceph-osd: 2016-09-30 10:15:24.278955 7fbd3554e700 -1 log_channel(cluster) log [ERR] : OSD full dropping all updates 98% full
[10:55] <TMM> peetaur2, any ideas on what else I could check?
[10:58] <peetaur2> sebes: don't know... maybe it's in http://docs.ceph.com/docs/jewel/rados/configuration/osd-config-ref/#backfilling
[10:58] <peetaur2> TMM: not sure, but I think it shouldn't split brain unless you force it to run with a minority of mons
[10:59] * branto (~branto@178.253.162.116) has joined #ceph
[11:00] <TMM> peetaur2, well, something happened during that event that caused these issues, we never had any corrupted objects before that happened
[11:02] * pfactum is now known as post-factum
[11:03] <sebes> peetaur2: thx, the problem is that this only lets us fine-tune the backfilling. But until now 0 objects have been removed from the full disk.
[11:03] * thundercloud (~csharp@46.166.138.149) has joined #ceph
[11:05] * branto1 (~branto@178.253.162.116) Quit (Ping timeout: 480 seconds)
[11:07] <TMM> sebes, I think you can unset the 'full' flag which should allow backfilling to continue
[11:10] <sebes> ok thx, we check it
[11:12] * baotiao (~baotiao@218.30.116.4) Quit (Quit: baotiao)
[11:15] <sebes> TMM: how can we unset the full flag?
[11:16] <peetaur2> I don't know enough to be sure, but I think setting it to non-full will be counterproductive, just filling it more, not emptying it.
[11:17] * kefu_ (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[11:17] * kefu_ (~kefu@114.92.125.128) has joined #ceph
[11:21] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[11:22] * baotiao (~baotiao@218.30.116.7) has joined #ceph
[11:27] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) Quit (Quit: Leaving.)
[11:31] <sebes> TMM: we set noout on the disk and for a while that solves the problem
[11:31] <TMM> sebes, if you have enough copies you can just remove the osd entirely, resync the cluster and then add it back too
[11:31] <TMM> or reduce its crush weight
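
For reference, the knobs usually reached for in sebes' situation are the full ratios, per-osd reweighting, and the backfill throttles; a hedged sketch for a jewel-era cluster (osd.12 and all values are placeholders, and raising the full ratio is only a temporary stop-gap so data can move off the disk):

    ceph pg set_nearfull_ratio 0.90
    ceph pg set_full_ratio 0.97              # let the 98%-full osd accept the moves/deletes again
    ceph osd reweight 12 0.85                # override weight 0.0-1.0; pushes data off osd.12
    ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'

ceph osd reweight-by-utilization does the same reweighting automatically across the most-filled osds.
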
[11:33] * thundercloud (~csharp@46.166.138.149) Quit ()
[11:34] * branto (~branto@178.253.162.116) Quit (Ping timeout: 480 seconds)
[11:36] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:4418:f044:c01f:37d4) has joined #ceph
[11:38] * branto (~branto@178.253.162.116) has joined #ceph
[11:40] * derjohn_mobi (~aj@b2b-94-79-172-98.unitymedia.biz) has joined #ceph
[11:49] * lkoranda (~lkoranda@nat-pool-brq-t.redhat.com) Quit (Quit: Splunk> Be an IT superhero. Go home early.)
[11:49] * lkoranda (~lkoranda@nat-pool-brq-t.redhat.com) has joined #ceph
[11:52] <sebes> THX for the help!
[11:52] <sebes> have a nice weekend!
[11:52] * sebes (~oftc-webi@81.0.80.161) Quit (Quit: Page closed)
[11:54] * arbrandes1 (~arbrandes@ec2-54-172-54-135.compute-1.amazonaws.com) has joined #ceph
[11:54] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[11:59] * ivve (~zed@m176-68-40-86.cust.tele2.se) Quit (Ping timeout: 480 seconds)
[12:01] * arbrandes (~arbrandes@ec2-54-172-54-135.compute-1.amazonaws.com) Quit (Ping timeout: 480 seconds)
[12:01] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) Quit (Ping timeout: 480 seconds)
[12:07] * branto (~branto@178.253.162.116) Quit (Ping timeout: 480 seconds)
[12:14] * dugravot6 (~dugravot6@nat-persul-montet.wifi.univ-lorraine.fr) has joined #ceph
[12:18] * branto (~branto@178.253.162.116) has joined #ceph
[12:19] * dugravot6 (~dugravot6@nat-persul-montet.wifi.univ-lorraine.fr) Quit ()
[12:20] * rraja (~rraja@125.16.34.66) has joined #ceph
[12:21] * salwasser (~Adium@2601:197:101:5cc1:9cd9:46d1:11d3:1500) has joined #ceph
[12:27] * kefu_ (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[12:30] * salwasser (~Adium@2601:197:101:5cc1:9cd9:46d1:11d3:1500) Quit (Quit: Leaving.)
[12:32] * doppelgrau (~doppelgra@132.252.235.172) has joined #ceph
[12:51] * kefu (~kefu@114.92.125.128) has joined #ceph
[12:55] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[13:01] * Guest269 (~steve@ip68-98-63-137.ph.ph.cox.net) Quit (Remote host closed the connection)
[13:01] * steve (~steve@ip68-98-63-137.ph.ph.cox.net) has joined #ceph
[13:02] * steve is now known as Guest607
[13:05] * vicente (~~vicente@125.227.238.55) Quit (Quit: Leaving)
[13:06] * nardial (~ls@p5DC07C6E.dip0.t-ipconnect.de) has joined #ceph
[13:11] * trociny (~mgolub@93.183.239.2) Quit (Quit: ??????????????)
[13:12] * eXeler0n (~Lite@exit1.radia.tor-relays.net) has joined #ceph
[13:13] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[13:20] * derjohn_mobi (~aj@b2b-94-79-172-98.unitymedia.biz) Quit (Ping timeout: 480 seconds)
[13:27] * vata (~vata@96.127.202.136) Quit (Ping timeout: 480 seconds)
[13:33] * karnan (~karnan@125.16.34.66) Quit (Ping timeout: 480 seconds)
[13:34] * Tusker (~tusker@CPE-124-190-175-165.snzm1.lon.bigpond.net.au) Quit (Quit: Time wasted on IRC: 1 day 3 hours 49 minutes 21 seconds)
[13:36] * DanFoster (~Daniel@office.34sp.com) Quit (Quit: Leaving)
[13:42] * eXeler0n (~Lite@exit1.radia.tor-relays.net) Quit ()
[13:43] * trociny (~mgolub@93.183.239.2) has joined #ceph
[13:43] <ivve> hmmm
[13:44] <ivve> did resizing change somehow in 10.2.3? i can't xfs_growfs images anymore
[13:44] <ivve> at least live
[13:44] <ivve> rbd accepts the resize but not the fs, no errors or anything
[13:46] <peetaur2> sounds like an xfs/guest os issue
[13:46] <ivve> tried on two different machines, same "error"
[13:46] <peetaur2> try stopping the vm/client (or making a new one), and doing it from a new kernel and xfs version
[13:47] <ivve> super weird
[13:47] <ivve> booting the testvm, although i don't like booting as a solution
[13:48] * Guest607 (~steve@ip68-98-63-137.ph.ph.cox.net) Quit (Remote host closed the connection)
[13:49] * steve_ (~steve@ip68-98-63-137.ph.ph.cox.net) has joined #ceph
[13:49] <ivve> seems that clients might need a restart after upgrading hammer->jewel
[13:49] <ivve> buhu
[13:50] * portante (~portante@nat-pool-bos-t.redhat.com) Quit (Quit: ZNC 1.6.2 - http://znc.in)
[13:50] <peetaur2> if you can live migrate qemu-kvm, I think that might do it :)
[13:50] <ivve> its rbd images mounted as xfs FS
[13:50] <ivve> vm is in vmware
[13:50] <ivve> so thats probably a no-go
[13:51] <peetaur2> does it do live migration though? (costs a fortune for the version that does, but it does)
[13:52] <peetaur2> (a fortune to a small company...maybe nothing to a big company)
[13:52] <ivve> i can try
[13:52] <peetaur2> in qemu, the vm just opens the disk like normal....virtio or ide or whatever drivers
[13:53] <peetaur2> so qemu does the rbd stuff; and when you migrate, the new qemu instance is new, not cloned or something... so you can change qemu versions and everything; so it should also reopen the rdb
[13:53] <peetaur2> rbd
[13:53] * rwheeler (~rwheeler@pool-108-7-196-31.bstnma.fios.verizon.net) Quit (Quit: Leaving)
[13:54] <ivve> after reboot the other vm started responding well on the resize/grow
[13:54] <ivve> on new grows
[13:54] <ivve> migrated the vm, my hopes are not high :P
[13:54] <ivve> nope nothing
[13:54] <ivve> that sucks
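
For reference, the grow sequence ivve is attempting normally looks like the sketch below with a kernel-mapped image; whether the running kernel notices the new size without a remap depends on the kernel/krbd version, which matches the behaviour seen here (pool, image and mountpoint are placeholders):

    rbd resize --size 204800 rbd/myimage     # grow the image (size in MB) on the ceph side
    blockdev --getsize64 /dev/rbd0           # does the kernel client see the new size?
    xfs_growfs /mnt/myimage                  # grow the filesystem up to the visible device size

If blockdev still reports the old size, the krbd client has not refreshed the image header, and unmapping/remapping (or rebooting, as ivve ended up doing) is what forces it.
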
[13:55] * rraja (~rraja@125.16.34.66) Quit (Quit: Leaving)
[13:59] * portante (~portante@nat-pool-bos-t.redhat.com) has joined #ceph
[14:01] * briner (~briner@129.194.16.54) Quit (Remote host closed the connection)
[14:03] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) has joined #ceph
[14:03] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) Quit (Remote host closed the connection)
[14:04] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) has joined #ceph
[14:05] * dugravot6 (~dugravot6@l-p-dn-in-4a.lionnois.site.univ-lorraine.fr) has joined #ceph
[14:05] * krypto (~krypto@G68-121-13-68.sbcis.sbc.com) has joined #ceph
[14:06] <doppelgrau> ivve: how do you export the rbd-image to esx?
[14:07] * portante (~portante@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[14:07] <ivve> no to the vm
[14:07] * portante (~portante@nat-pool-bos-t.redhat.com) has joined #ceph
[14:07] <ivve> just a centos
[14:08] <doppelgrau> ivve: so the centos-vm is the client, kernel or qemu-nd?
[14:08] <ivve> kernel
[14:09] <ivve> aah fucking systemd
[14:11] <peetaur2> oh if the guest directly imports the rbd, and vmware knows nothing of it, the live migration thing won't do anything
[14:11] <ivve> is it even possible to export a rbd to esx?
[14:12] * ade (~abradshaw@2a02:810d:a4c0:5cd:9001:2bba:886a:5200) Quit (Ping timeout: 480 seconds)
[14:12] <doppelgrau> ivve: iscsi, never tried but should work even with multipath when disabling the rbd-cache
[14:12] <ivve> directly from ceph?
[14:12] <ivve> thought that wasn't ready yet
[14:13] <peetaur2> for the insane amount of money you pay for vmware, I simply assumed yes :)
[14:13] <peetaur2> rbd is all open source stuff...no reason they can't have it instantly with their budget
[14:13] <doppelgrau> ivve: some older slides (in german) from a company doing linux support: https://www.heinlein-support.de/sites/default/files/ceph-iscsi-host-failover-multipath.pdf
[14:14] <doppelgrau> ivve: and on the mailing list there was a message a few weeks ago that someone even got a vmware certification for a special setup => you can even open support cases without being told the setup is unsupported
[14:16] <ivve> well thats using two or more hosts supplying iscsi
[14:16] <ivve> which are supplied with the same rbd image
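
As a sketch of the gateway setup doppelgrau is describing (two or more Linux hosts map the same image and re-export it over iSCSI, with the rbd cache disabled so multipath stays coherent); the image name is a placeholder and the iSCSI target side is only outlined:

    # if the target exports via librbd (e.g. tgt's rbd backend), disable the cache in ceph.conf:
    #   [client]
    #   rbd cache = false
    # with a kernel mapping there is no librbd cache to worry about:
    rbd map vmware/esx-datastore             # shows up as e.g. /dev/rbd0
    # export /dev/rbd0 through the gateway's iSCSI target stack (LIO/targetcli or tgt)
    # and give ESX both gateways as paths for multipathing
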
[14:16] <hroussea> radosgw question: is it still possible in jewel to have within a single zone multiple placement_targets (eg: multiple pools) ? I setup everything correctly (I guess at least) and the data always gets written in the default group
[14:16] * b0e (~aledermue@213.95.25.82) has joined #ceph
[14:18] * analbeard (~shw@support.memset.com) Quit (Ping timeout: 480 seconds)
[14:21] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[14:23] * DougalJacobs (~tokie@108.61.122.88) has joined #ceph
[14:23] * valeech (~valeech@pool-96-247-203-33.clppva.fios.verizon.net) has joined #ceph
[14:29] * mattbenjamin (~mbenjamin@76-206-42-50.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[14:33] * vincepii (~textual@77.245.22.67) has joined #ceph
[14:33] * analbeard (~shw@5.153.255.226) has joined #ceph
[14:33] * vincepii (~textual@77.245.22.67) Quit ()
[14:34] * vincepii (~textual@77.245.22.67) has joined #ceph
[14:37] * Kurt (~Adium@kb.vpn.aco.net) Quit (Read error: Connection reset by peer)
[14:38] * kefu (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[14:39] * rdas (~rdas@121.244.87.116) has joined #ceph
[14:39] * Kurt (~Adium@188-23-105-53.adsl.highway.telekom.at) has joined #ceph
[14:41] * bene2 (~bene@2601:193:4101:f410:ea2a:eaff:fe08:3c7a) has joined #ceph
[14:43] * mattbenjamin (~mbenjamin@76-206-42-50.lightspeed.livnmi.sbcglobal.net) Quit (Quit: Leaving.)
[14:43] * mattbenjamin (~mbenjamin@76-206-42-50.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[14:43] * Goodi (~Hannu@194.251.119.207) Quit (Quit: This computer has gone to sleep)
[14:44] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[14:45] * vikhyat (~vumrao@49.248.204.2) has joined #ceph
[14:46] <mistur> hello
[14:46] <mistur> hroussea: I think so
[14:47] <mistur> hroussea: I have one zonegroup and one zone and 2 pools, one replicated pool and one EC pool
[14:47] <mistur> and you have to specify the target pool
[14:48] <mistur> but with rclone or s3cmd, I had issues forcing the target
[14:48] <mistur> so I created a dedicated user for the erasure pool and changed the default placement for that user only
[14:49] * Jeffrey4l_ (~Jeffrey@221.195.84.71) has joined #ceph
[14:49] <mistur> then when I have to create a bucket on erasure, I use this user to create the bucket, and then I can push data from other users to the right pool
[14:50] * Hemanth (~hkumar_@103.228.221.183) Quit (Ping timeout: 480 seconds)
[14:50] <mistur> but it should be possible to set the placement_target directly to s3cmd or rclone
[14:50] * Jeffrey4l__ (~Jeffrey@110.252.60.167) Quit (Ping timeout: 480 seconds)
[14:53] * DougalJacobs (~tokie@108.61.122.88) Quit ()
[14:56] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) has joined #ceph
[14:56] <hroussea> mistur: s3cmd seems to send the right string and radosgw says that "create bucket location constraint: indexless-placement"
[14:56] * vincepii (~textual@77.245.22.67) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[14:57] * branto (~branto@178.253.162.116) Quit (Quit: Leaving.)
[14:57] <mistur> hroussea: ok I will test that thanks
[14:58] * vincepii (~textual@77.245.22.67) has joined #ceph
[15:02] * Oddtwang (~eXeler0n@5.157.38.2) has joined #ceph
[15:03] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) has joined #ceph
[15:04] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[15:06] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[15:06] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) has joined #ceph
[15:07] * branto (~branto@178.253.162.116) has joined #ceph
[15:08] * imcsk8_ (~ichavero@189.231.109.122) has joined #ceph
[15:10] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[15:10] * imcsk8 (~ichavero@189.155.163.170) Quit (Ping timeout: 480 seconds)
[15:18] * branto (~branto@178.253.162.116) Quit (Quit: Leaving.)
[15:21] * branto (~branto@178.253.162.116) has joined #ceph
[15:24] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[15:28] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[15:30] * danieagle (~Daniel@187.35.190.157) has joined #ceph
[15:31] * danieagle (~Daniel@187.35.190.157) Quit ()
[15:32] * Oddtwang (~eXeler0n@5.157.38.2) Quit ()
[15:34] * steve_ (~steve@ip68-98-63-137.ph.ph.cox.net) Quit (Remote host closed the connection)
[15:34] * steve_ (~steve@ip68-98-63-137.ph.ph.cox.net) has joined #ceph
[15:37] * baotiao (~baotiao@218.30.116.7) Quit (Quit: baotiao)
[15:41] * yanzheng (~zhyan@125.70.23.147) Quit (Quit: This computer has gone to sleep)
[15:44] * rdas (~rdas@121.244.87.116) has joined #ceph
[15:47] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[15:48] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[15:48] * bniver (~bniver@nat-pool-bos-u.redhat.com) has joined #ceph
[15:50] * kefu (~kefu@114.92.125.128) has joined #ceph
[15:55] * salwasser (~Adium@72.246.3.14) has joined #ceph
[15:57] * bniver (~bniver@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[16:03] * vincepii (~textual@77.245.22.67) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[16:05] <hroussea> (but I have a strange behaviour in the sense that the rgw doesn't honor the placement_rule and falls back to default for an obscure reason)
[16:07] * vincepii (~textual@77.245.22.67) has joined #ceph
[16:07] * mattbenjamin (~mbenjamin@76-206-42-50.lightspeed.livnmi.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[16:08] * CephFan1 (~textual@96-31-67-206.static.hvvc.us) has joined #ceph
[16:08] * bniver (~bniver@nat-pool-bos-t.redhat.com) has joined #ceph
[16:09] <mistur> hroussea: that's exactly what I have observed...
[16:10] * kefu (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[16:10] * squizzi (~squizzi@107.13.237.240) has joined #ceph
[16:11] <mistur> and that's why I've created the specific user and changed the default_placement pool, so buckets are created on the right pool
[16:11] <hroussea> oh ok :)
[16:12] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[16:16] <hroussea> mistur: it seems like placement_rule is not defined in https://github.com/ceph/ceph/blob/v10.2.3/src/rgw/rgw_op.cc#L1920
[16:17] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Read error: No route to host)
[16:18] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[16:22] <mistur> hroussea: indeed
[16:23] <mistur> https://github.com/ceph/ceph/blob/v10.2.3/src/rgw/rgw_op.cc#L2038
[16:23] <hroussea> mistur: it turns out that it's a RT(non-existent)FM, the format of the --bucket-location is region:placement_target
[16:23] <hroussea> and it works now :)
[16:24] * ron-slc (~Ron@173-165-129-118-utah.hfc.comcastbusiness.net) has joined #ceph
[16:24] <mistur> ok
[16:25] * nils_ (~nils_@doomstreet.collins.kg) has joined #ceph
[16:26] <mistur> hroussea: yes, it works for me too
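
Pulling the two working approaches from this thread together as a sketch (the zonegroup/placement names, the uid and the bucket are placeholders):

    # hroussea's route: name the placement target at bucket creation, format region:placement_target
    s3cmd mb --bucket-location=default:indexless-placement s3://mybucket

    # mistur's route: give a dedicated user a different default_placement, then create buckets as that user
    radosgw-admin metadata get user:ecuser > user.json
    #   edit "default_placement" in user.json to the desired placement target
    radosgw-admin metadata put user:ecuser < user.json
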
[16:28] * CephFan1 (~textual@96-31-67-206.static.hvvc.us) Quit (Ping timeout: 480 seconds)
[16:29] * kefu (~kefu@114.92.125.128) has joined #ceph
[16:34] * kristen (~kristen@jfdmzpr03-ext.jf.intel.com) has joined #ceph
[16:38] <TMM> Does anyone know what the relevance of 'direct' is in the cache mode output for the qemu rbd backend here? http://paste.debian.net/848125/
[16:39] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[16:39] <TMM> it seems that data on volume with the 'direct' flag gets its disk caches mangled and leads to data loss for clients writing with o_direct from the guest, whereas that doesn't happen with the volume above without the direct flag
[16:39] <TMM> I'm trying to figure out where this flag is even coming from
[16:39] <TMM> but I also can't find what it MEANS
[16:40] <peetaur2> TMM: sounds dreadful... use ps -ef to see the full qemu syntax and see if there's something about it there
[16:40] <peetaur2> and if you find out why it's there, plz do tell me
[16:40] <TMM> peetaur2, it gets added later through the monitor
[16:41] <peetaur2> so it's not there on startup, but it is after a minute?
[16:41] <TMM> it gets added by the 'cinder' volume driver for nova
[16:41] <peetaur2> maybe you could try tcpdump on the QMP port if that's what it uses
[16:42] <peetaur2> ok so if you think it comes from cinder, then ask the openstack channels
[16:42] <TMM> ah, no
[16:42] <TMM> my mistake
[16:42] <TMM> it does get added from the command line
[16:42] <TMM> with cache=none
[16:42] <TMM> whereas the other one gets added with 'cache=writeback'
[16:42] <TMM> so the ,direct comes because of cache=...none?
[16:42] <TMM> wtf
[16:43] <TMM> grmbl
[16:44] <peetaur2> that is what this seems to say https://www.suse.com/documentation/sles11/book_kvm/data/sect1_1_chapter_book_kvm.html
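
For reference, the 'direct' flag TMM is chasing is simply how qemu spells O_DIRECT on the host side; the cache= shorthand expands roughly as follows (summarising the qemu documentation from memory, so treat the exact flag names as a sketch):

    cache=writeback     ->  cache.writeback=on,  cache.direct=off   (host page cache / librbd cache usable)
    cache=none          ->  cache.writeback=on,  cache.direct=on    (bypass host page cache, O_DIRECT)
    cache=writethrough  ->  cache.writeback=off, cache.direct=off
    cache=directsync    ->  cache.writeback=off, cache.direct=on
    cache=unsafe        ->  cache=writeback with flushes ignored
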
[16:45] * TomasCZ (~TomasCZ@yes.tenlab.net) Quit (Ping timeout: 480 seconds)
[16:45] <TMM> peetaur2, you are correct, I had misread that explanation earlier, particularly about the host adapter being put in writeback mode
[16:45] <TMM> the weird thing is that this is supposed to be safer
[16:46] <TMM> but it seems that using librbd it is not
[16:46] <peetaur2> well I think you hit some kind of bug
[16:46] * SamYaple (~SamYaple@162.209.126.134) Quit (Quit: leaving)
[16:46] * SamYaple (~SamYaple@162.209.126.134) has joined #ceph
[16:46] <peetaur2> it should only affect safety when crashing or hard reset/power off
[16:46] <TMM> yeah, this obviously shouldn't happen with live migration
[16:46] <peetaur2> (and supposedly with live migration, O_DIRECT is always used regardless of setting)
[16:46] * ntpttr_ (~ntpttr@192.55.54.38) has joined #ceph
[16:47] <TMM> I was suspecting that either libvirt or qemu didn't properly flush rbd client caches when it thinks there are no caches
[16:47] <TMM> but that would have to be libvirt then, qemu *is* aware of the writeback cache
[16:47] <TMM> now I'm suspecting that I'm just completely off base
[16:48] <peetaur2> ok but I don't think libvirt does anything with disks or cache... it just tells qemu what to configure
[16:48] <TMM> libvirt *is* involved in live migrations though
[16:48] <peetaur2> which qemu version is it?
[16:49] <TMM> 2.3.0
[16:49] <SamYaple> TMM: not always
[16:49] <SamYaple> TMM only if its tunneled
[16:49] <TMM> SamYaple, it is, in my case
[16:49] <SamYaple> i dont do tunneled migrations so it is qemu to qemu direct
[16:49] <SamYaple> TMM: ok :)
[16:50] <TMM> I'm using tunneled migrations
[16:50] <TMM> maybe I should try untunnled ones
[16:50] <SamYaple> TMM: im sorry, i got bounced and missed scroll back
[16:51] <TMM> peetaur2, qemu-kvm-ev-2.3.0-31.el7_2.10.1 to be exact
[16:53] <peetaur2> sounds ancient, like you'd expect from RH...but hard to know what that means...what have they backported and what did they miss
[16:54] <peetaur2> so I guess you could optionally test a vanilla 2.3 or 2.7 and then bug report it to your distro because they created this backport version
[16:54] <SamYaple> TMM: jsut read scrollback. is the issue here you want cinder to respect the writeback mode?
[16:54] <peetaur2> I think the issue is that using cache=none is dangerous because there seems to be a bug where O_DIRECT from qemu breaks things but O_DIRECT from guests doesn't
[16:55] <peetaur2> (and also live migration uses O_DIRECT, even if you have cache set to something else)
[16:55] * nardial (~ls@p5DC07C6E.dip0.t-ipconnect.de) Quit (Quit: Leaving)
[16:55] <TMM> SamYaple, my first stop is to get cinder to respect the disk_cachemodes setting so I know that at least that works
[16:55] <TMM> SamYaple, once that's done I'd like to use peetaur2's suggestion to figure out what's actually wrong
[16:56] <TMM> peetaur2, did you actually find this bugreport/
[16:56] <peetaur2> no, but no cache mode should cause corruption unless you do hard poweroffs or crashes
[16:56] <peetaur2> so anything doing that is a bug
[16:57] <TMM> peetaur2, agreed
[16:57] <SamYaple> TMM: thats going to be dependant on your version of openstack
[16:57] <TMM> SamYaple, liberty atm
[16:57] <SamYaple> im actually not entirely sure it landed even in Newton
[16:58] <TMM> if there's a patch for it, I'll happily backport it
[16:58] <TMM> I've done it before
[16:58] <SamYaple> yea im looking
[16:58] <SamYaple> i have a vague recollection of this
[16:58] <TMM> I have some other local fixes here that are either bullshit elsewhere or just in bad shape :P
[16:58] <SamYaple> its the same issue that happened with discard support
[16:58] <SamYaple> cinder said "well we don't know what backend is there"
[16:58] <TMM> yeah, discard also seems to not work for volumes
[17:00] <TMM> peetaur2, I'm thinking actually that O_DIRECT from guests only gets messed up during live migration: mysql databases have innodb corruption after a live migration but no filesystem corruption. Some InnoDB pages appear as zeros to mysql after a live migration; if the database is shut down and the volume is mounted and remounted, the data is fine, unless an update to a page was done and a page of partial zeros gets written to disk.
[17:00] <TMM> peetaur2, it's a very peculiar problem
[17:00] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Read error: Connection reset by peer)
[17:01] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[17:02] <SamYaple> TMM: it does respect writeback in Mitaka
[17:02] <SamYaple> TMM: this isn't a single patch you can apply though
[17:02] <SamYaple> or more accurately it respects cachemode
[17:02] <TMM> SamYaple, if I can just figure out where the actual attachment is done in the code I'll happily hardcode it to cache=writeback for now for all volumes
[17:02] <SamYaple> that always in novas libvirt generator
[17:03] <TMM> so far I haven't been able to actually figure out where that implementation lives
[17:03] <SamYaple> there is a template generator
[17:03] <TMM> I'll have a look there then
[17:03] <TMM> I was hoping I could do it only for all cinder attachments
[17:04] <TMM> but I guess I can just add the discard and cache settings everywhere
[17:04] <TMM> it doesn't really matter
[17:04] <SamYaple> https://github.com/openstack/nova/blob/stable/liberty/nova/virt/libvirt/driver.py
[17:04] * wushudoin (~wushudoin@2601:646:8200:c9f0:2ab2:bdff:fe0b:a6ee) has joined #ceph
[17:05] * wushudoin (~wushudoin@2601:646:8200:c9f0:2ab2:bdff:fe0b:a6ee) Quit ()
[17:05] * wushudoin (~wushudoin@2601:646:8200:c9f0:2ab2:bdff:fe0b:a6ee) has joined #ceph
[17:07] <TMM> SamYaple, thank you, I'll just hardcode it in the generator
[17:07] <TMM> SamYaple, then when moving to mitaka I can drop the patch again
[17:07] <TMM> thank you
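
For reference, the setting being discussed is nova's disk_cachemodes; on releases where it is honoured for rbd-backed volumes (hence the talk of hardcoding it in the libvirt XML generator on liberty), the nova.conf side looks roughly like this sketch:

    [libvirt]
    images_type = rbd
    disk_cachemodes = "network=writeback"    # rbd counts as a 'network' disk type
    hw_disk_discard = unmap                  # optional: pass discard/trim through to rbd

which ends up in the generated libvirt XML as <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>.
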
[17:10] * jfaj_ (~jan@p20030084AD1CE6005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) Quit (Quit: WeeChat 1.5)
[17:11] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving)
[17:12] * analbeard (~shw@5.153.255.226) Quit (Ping timeout: 480 seconds)
[17:15] * bara (~bara@ip4-83-240-10-82.cust.nbox.cz) Quit (Quit: Bye guys! (??????????????????? ?????????)
[17:16] * malevolent (~quassel@192.146.172.118) Quit (Quit: No Ping reply in 180 seconds.)
[17:17] * scuttle|afk is now known as scuttlemonkey
[17:17] * malevolent (~quassel@192.146.172.118) has joined #ceph
[17:18] * krypto (~krypto@G68-121-13-68.sbcis.sbc.com) Quit (Ping timeout: 480 seconds)
[17:19] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[17:20] * TMM (~hp@185.5.121.201) Quit (Quit: Ex-Chat)
[17:20] * squizzi (~squizzi@107.13.237.240) Quit (Ping timeout: 480 seconds)
[17:21] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[17:22] * kefu (~kefu@114.92.125.128) has joined #ceph
[17:23] * vimal (~vikumar@121.244.87.116) Quit (Quit: Leaving)
[17:24] * wes_dillingham (~wes_dilli@140.247.242.44) has joined #ceph
[17:27] * m0zes__ (~mozes@n117m02.cs.ksu.edu) has joined #ceph
[17:28] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) has joined #ceph
[17:28] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Read error: Connection reset by peer)
[17:28] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[17:30] * branto (~branto@178.253.162.116) Quit (Quit: Leaving.)
[17:30] <m0zes__> okay. just had 3 of my 5 monitors blow up unexpectedly. cannot restart them. was at 10.2.1, used ceph-deploy to install another mds node. it unexpectedly started itself running 10.2.3. the monitors exploded, so I updated them (quickly) to 10.2.3. still won't fire up. getting logs like the following out of the monitors. http://paste.ie/view/29c6d71b
[17:31] <m0zes__> I *really* need to get these up soon
[17:35] * vikhyat (~vumrao@49.248.204.2) Quit (Quit: Leaving)
[17:36] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[17:36] <m0zes__> I think it incremented the mds map, and the mons couldn't parse part of the message
[17:36] <m0zes__> -3> 2016-09-30 10:33:17.800585 7f313bffd700 10 mon.hobbit01@0(leader).mds e903363 create_pending e903364
[17:36] <m0zes__> -2> 2016-09-30 10:33:17.800592 7f313bffd700 10 mon.hobbit01@0(leader).mds e903363 e903363: 1/1/1 up {0=hobbit13=up:active}, 1 up:standby
[17:36] <m0zes__> -1> 2016-09-30 10:33:17.800618 7f313bffd700 20 mon.hobbit01@0(leader).mds e903363 gid 1055992652 is standby and following nobod
[17:40] * nils_ (~nils_@doomstreet.collins.kg) Quit (Quit: This computer has gone to sleep)
[17:47] * Nicho1as (~nicho1as@00022427.user.oftc.net) has joined #ceph
[17:47] * nils_ (~nils_@doomstreet.collins.kg) has joined #ceph
[17:48] <doppelgrau> m0zes: tried (after making a backup of the whole mon-dir) injecting a monmap from one of the surviving mons?
[17:49] * sudocat (~dibarra@2602:306:8bc7:4c50:f479:1bad:a78f:3bb9) has joined #ceph
[17:52] <m0zes__> that didn't fix it. the monmap doesn't contain the mdsmap, so probably won't help. especially considering it looks to be a pending change that broke it.
[17:53] * blizzow (~jburns@50-243-148-102-static.hfc.comcastbusiness.net) has joined #ceph
[17:53] <m0zes__> doppelgrau: thanks for the suggestion, though.
[17:55] <m0zes__> I could, potentially, make backups of the two good ones, extract their monmaps, edit them to remove the other mons and re-insert. unfortunately, considering that they're missing the pending mdsmap update, they may not be in complete sync. I would expect them to be, but who knows...
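
The surviving-monitor procedure m0zes__ outlines is roughly the standard one (stop the mon first and keep the backups he mentions; the mon names and paths here are placeholders):

    ceph-mon -i mon-good --extract-monmap /tmp/monmap    # from a healthy, stopped mon
    monmaptool --print /tmp/monmap
    monmaptool /tmp/monmap --rm mon-bad1 --rm mon-bad2 --rm mon-bad3
    ceph-mon -i mon-good --inject-monmap /tmp/monmap     # repeat for each remaining mon, then start them
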
[17:57] * ntpttr_ (~ntpttr@192.55.54.38) Quit (Remote host closed the connection)
[18:01] * Miho (~osuka_@tor-exit.squirrel.theremailer.net) has joined #ceph
[18:02] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:02] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Quit: Leaving.)
[18:03] * bjozet_ (~bjozet@82-183-17-144.customers.ownit.se) Quit (Remote host closed the connection)
[18:04] * jowilkin (~jowilkin@184-23-213-254.fiber.dynamic.sonic.net) has joined #ceph
[18:04] * vincepii (~textual@77.245.22.67) Quit (Quit: Textual IRC Client: www.textualapp.com)
[18:10] <blizzow> How do I list all snapshots for objects in a pool? I have lots of objects in there and don't want to do: rbd snap ls mypool/eachimageorobject.img
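
One way to do what blizzow asks, assuming the pool contains rbd images (a simple loop, pool name is a placeholder):

    for img in $(rbd ls mypool); do
        echo "== $img =="
        rbd snap ls "mypool/$img"
    done
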
[18:10] * baotiao (~baotiao@43.255.178.184) has joined #ceph
[18:13] * scg (~zscg@181.122.0.60) Quit (Ping timeout: 480 seconds)
[18:14] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[18:14] <doppelgrau> m0zes: if they aren't completely in sync, they are only one epoch behind, and if the rest of the cluster has been stable, that shouldn't make a difference (worst case the new mds is missing), or am I forgetting something?
[18:23] * squizzi (~squizzi@2001:420:2240:1268:ad85:b28:ee1c:890) has joined #ceph
[18:29] * ntpttr_ (~ntpttr@192.55.54.44) has joined #ceph
[18:30] * Miho (~osuka_@tor-exit.squirrel.theremailer.net) Quit ()
[18:32] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Quit: Leaving.)
[18:34] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[18:35] * nils_ (~nils_@doomstreet.collins.kg) Quit (Quit: Leaving)
[18:38] * scg (~zscg@181.122.0.60) has joined #ceph
[18:38] <walcubi> The number of unchangeable config options is just disappointing.
[18:38] <m0zes__> doppelgrau: I attempted to do just that (again with backups). they???re in sync (and broken)
[18:40] <doppelgrau> walcubi: 10.2? IIRC there is a bug that many option reported wrong as read only
[18:40] <doppelgrau> m0zes: ugly
[18:40] <m0zes__> yeah. this is really bad.
[18:41] * Nicho1as (~nicho1as@00022427.user.oftc.net) Quit (Quit: A man from the Far East; using WeeChat 1.5)
[18:41] <m0zes__> especially considering everything in the monitor is leveldb. I can't even edit it by hand.
[18:41] * kefu (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[18:42] * rwheeler (~rwheeler@pool-108-7-196-31.bstnma.fios.verizon.net) has joined #ceph
[18:44] * doppelgrau (~doppelgra@132.252.235.172) Quit (Quit: Leaving.)
[18:44] <walcubi> doppelgrau, even more disappointing is that there have been three point releases and it's never been picked up. ;-)
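
For what it's worth, whether an option really is changeable at runtime can be checked against a live daemon; a sketch with placeholder names and values:

    ceph daemon osd.0 config get osd_scrub_load_threshold     # current value via the admin socket
    ceph daemon osd.0 config set osd_scrub_load_threshold 5   # change it on that daemon only
    ceph tell osd.* injectargs '--osd_scrub_load_threshold 5' # cluster-wide; per the bug doppelgrau mentions,
                                                              # this can report an option as unchangeable even when it is not
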
[18:45] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Quit: Leaving.)
[18:46] * ade (~abradshaw@tmo-098-246.customers.d1-online.com) has joined #ceph
[18:57] * ahmeni (~JamesHarr@108.61.122.218) has joined #ceph
[18:58] * vata (~vata@96.127.202.136) has joined #ceph
[19:00] * ade (~abradshaw@tmo-098-246.customers.d1-online.com) Quit (Quit: Too sexy for his shirt)
[19:01] <walcubi> The best way to evenly spread out scrubs is run each manually?
[19:01] * baotiao (~baotiao@43.255.178.184) Quit (Quit: baotiao)
[19:02] <walcubi> Or should they be evenly spread after the first run has finished.
[19:05] * ntpttr_ (~ntpttr@192.55.54.44) Quit (Remote host closed the connection)
[19:05] * jdillaman_ (~jdillaman@pool-108-18-97-95.washdc.fios.verizon.net) has joined #ceph
[19:06] * jdillaman_ (~jdillaman@pool-108-18-97-95.washdc.fios.verizon.net) Quit ()
[19:15] * krypto (~krypto@49.207.56.43) has joined #ceph
[19:15] * krypto (~krypto@49.207.56.43) Quit ()
[19:16] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[19:17] * bniver (~bniver@nat-pool-bos-t.redhat.com) Quit (Remote host closed the connection)
[19:21] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) has joined #ceph
[19:27] * ahmeni (~JamesHarr@108.61.122.218) Quit ()
[19:28] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[19:29] * rakeshgm (~rakesh@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[19:30] * Kurt (~Adium@188-23-105-53.adsl.highway.telekom.at) Quit (Quit: Leaving.)
[19:32] * mattbenjamin (~mbenjamin@12.118.3.106) has joined #ceph
[19:32] * mattbenjamin (~mbenjamin@12.118.3.106) Quit (Remote host closed the connection)
[19:32] * mattbenjamin (~mbenjamin@12.118.3.106) has joined #ceph
[19:42] <blizzow> I have 1 OSD node with 8x4TB drives, 2 OSD nodes with 4x4TB drives, and 8 OSD nodes with 4x1TB drives. I have a pool with ~8-12TB of data in it, how many PGs should I have in the pool?
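
For reference, the usual starting point is the pgcalc rule of thumb: (number of OSDs x 100) / replica count, rounded up to a power of two. With the 48 OSDs described (8 + 2x4 + 8x4) and 3 replicas that gives a sketch like:

    # 48 * 100 / 3 = 1600  ->  next power of two = 2048 (budgeted across all pools on those OSDs)
    ceph osd pool set mypool pg_num 2048
    ceph osd pool set mypool pgp_num 2048

pg_num can only ever be increased, and with OSDs mixing 4TB and 1TB drives the CRUSH weights will matter at least as much as the PG count for how full each disk gets.
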
[19:52] * realitysandwich (~perry@2001:920:1846:1dc0:baae:edff:fe73:f413) Quit (Read error: Connection reset by peer)
[19:52] * fridim (~fridim@56-198-190-109.dsl.ovh.fr) Quit (Ping timeout: 480 seconds)
[19:58] <walcubi> by the way, would the cluster go into a warning state if scrubs can't be done in a timely manner?
[20:02] * nilez (~nilez@209.95.50.118) Quit ()
[20:03] * _rsK (~rene@x4e37516d.dyn.telefonica.de) has joined #ceph
[20:03] <walcubi> ie: 4096 pgs, 38 scrubbing in parallel, 0 have finished, 10 hours since start.
[20:04] * _rsK (~rene@x4e37516d.dyn.telefonica.de) Quit ()
[20:04] <walcubi> Assuming it finishes in 12 hours, that's 54 days to scrub all PGs.
[20:05] * linuxkidd (~linuxkidd@ip70-189-214-97.lv.lv.cox.net) Quit (Ping timeout: 480 seconds)
[20:05] * linuxkidd (~linuxkidd@ip70-189-214-97.lv.lv.cox.net) has joined #ceph
[20:07] * scg (~zscg@181.122.0.60) Quit (Ping timeout: 480 seconds)
[20:08] <walcubi> By comparison, the clients that access the stored data ensure that the entire pool has been stat()'d and its integrity verified about once every 2 days.
[20:08] <walcubi> Unless I'm missing something, something is horribly slow with scrubbing
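
On walcubi's questions: kicking scrubs by hand is indeed one way to spread them out, and otherwise the scheduler is driven by a handful of osd options; a sketch (ids and values are examples only):

    ceph pg scrub 3.1f                       # queue a scrub of one pg now
    ceph osd deep-scrub 7                    # deep-scrub everything on osd.7
    ceph tell osd.* injectargs '--osd-max-scrubs 2 --osd-scrub-load-threshold 5'
    # longer term, osd_scrub_min_interval / osd_scrub_max_interval / osd_deep_scrub_interval
    # in ceph.conf control how often each pg becomes eligible

As for the warning state: if memory serves, a jewel cluster shows scrubbing pgs in ceph status but does not go HEALTH_WARN by default merely because scrubs are behind schedule.
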
[20:12] * Concubidated (~cube@c-73-12-218-131.hsd1.ca.comcast.net) has joined #ceph
[20:13] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) has joined #ceph
[20:15] <blizzow> How much bother will adding PGs to a pool put my cluster in?
[20:20] * fridim (~fridim@56-198-190-109.dsl.ovh.fr) has joined #ceph
[20:30] * wes_dillingham_ (~wes_dilli@65.112.8.205) has joined #ceph
[20:31] * wes_dillingham_ (~wes_dilli@65.112.8.205) Quit ()
[20:31] * wes_dillingham_ (~wes_dilli@65.112.8.205) has joined #ceph
[20:35] * nilez (~nilez@104.129.28.50) has joined #ceph
[20:35] * wes_dillingham (~wes_dilli@140.247.242.44) Quit (Ping timeout: 480 seconds)
[20:35] * wes_dillingham_ is now known as wes_dillingham
[20:35] * BrianA (~BrianA@c-73-189-153-151.hsd1.ca.comcast.net) has joined #ceph
[20:35] * scg (~zscg@181.122.0.60) has joined #ceph
[20:36] * fridim (~fridim@56-198-190-109.dsl.ovh.fr) Quit (Ping timeout: 480 seconds)
[20:38] * mykola (~Mikolaj@91.245.74.212) has joined #ceph
[20:41] <m0zes__> anyone around that knows anything about repairing damaged monitors? http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013390.html
[20:42] * imcsk8_ (~ichavero@189.231.109.122) Quit (Ping timeout: 480 seconds)
[20:47] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:4418:f044:c01f:37d4) Quit (Ping timeout: 480 seconds)
[20:56] * salwasser (~Adium@72.246.3.14) Quit (Quit: Leaving.)
[20:56] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Read error: Connection reset by peer)
[20:57] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[21:13] * raphaelsc (~raphaelsc@177.157.175.32) has joined #ceph
[21:15] <shubjero> Hey all, I'm SSH'ing to a jump server and using ipmiconsole to connect to a servers serial-over-lan and trying to enter the raid bios (ctrl-r) but ctrl-r seems to 'redraw/refresh' the ipmiconsole screen. Any idea on how I can pass the 'CTRL-R' keystroke to the server?
[21:25] <blizzow> shubjero: I think you may want a different forum?
[21:25] * wak-work (~wak-work@2620:15c:2c5:3:ed41:8500:8d0:fe8a) Quit (Remote host closed the connection)
[21:25] * wak-work (~wak-work@2620:15c:2c5:3:2497:7a21:8815:f0e7) has joined #ceph
[21:26] * BrianA (~BrianA@c-73-189-153-151.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[21:26] * BrianA (~BrianA@c-73-189-153-151.hsd1.ca.comcast.net) has joined #ceph
[21:28] <blizzow> shubjero: But if I remember correctly, the sol interface tells you what to use for your keymapping on boot.
[21:28] <blizzow> What kind of server/BMC?
[21:28] <shubjero> like vt100+ and etc? It's a supermicro server
[21:28] <shubjero> blizzow: I've kind of sprayed this question to all the tech channels I frequent :).. someones got to know :)
[21:29] <blizzow> Just sent you a private message.
[21:30] * limebyte (~limebyte@2607:5300:60:349a::1111) has joined #ceph
[21:30] <limebyte> Hey guys, is there any concern, when running CEPH over the Internet, except DoS?
[21:31] <limebyte> the daemon uses secure connections?
[21:34] * BrianA1 (~BrianA@c-73-189-153-151.hsd1.ca.comcast.net) has joined #ceph
[21:36] <m0zes__> the daemon uses authentication, but not encryption iirc.
[21:40] * blizzow1 (~jburns@50-243-148-102-static.hfc.comcastbusiness.net) has joined #ceph
[21:40] * blizzow (~jburns@50-243-148-102-static.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[21:40] * BrianA (~BrianA@c-73-189-153-151.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[21:40] * blizzow1 is now known as blizzow
[21:42] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[21:45] <limebyte> okay
[21:45] <limebyte> so files are transfered in clear?
[21:45] <limebyte> or just the commands?
[21:45] <limebyte> m0zes__,
[21:45] <limebyte> Any ways to let it tunnel throught SSH?
[21:53] * nolan (~nolan@phong.sigbus.net) Quit (Quit: ZNC - http://znc.sourceforge.net)
[21:53] <doppelgrau> limebyte: I'd put everything in a vpn; second, the latency will result in bad performance; and third, it can lead to many "funny" failure modes due to network splits
[21:54] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[21:54] <limebyte> hmmm
[21:54] <limebyte> well as long as the network is stable, there should not be an issue, right?
[21:54] <limebyte> or any corrupt files?
[21:54] * nolan (~nolan@2001:470:1:41:a800:ff:fe3e:ad08) has joined #ceph
[21:55] <limebyte> the Nodes autojoin if disconnected or not?
[21:56] * mr_flea (~JWilbur@mtest2.im-in-the-tor-network.xyz) has joined #ceph
[21:56] <doppelgrau> except the low performance, no
[21:57] <limebyte> well
[21:57] <limebyte> mainly use it as archive
[21:57] * niknakpa1dywak (~xander.ni@outbound.lax.demandmedia.com) Quit (Ping timeout: 480 seconds)
[21:57] <doppelgrau> ceph keeps data integrity very high, so usually network problems only lead to blocked access
[21:58] <limebyte> premium
[21:58] <limebyte> since every node has a monitor running in my case
[21:58] <limebyte> they notice that and take action then i guess
[21:58] <limebyte> if one node drops out
[21:58] <darkfader> limebyte: you could try to go with some meshed vpn to ease the pain
[21:58] <limebyte> okay
[21:58] <doppelgrau> limebyte: the online/offline detection is usually done by the osds
[21:59] <doppelgrau> the monitors are only a "last resort" with a very long timeout
[21:59] <limebyte> ah okay
[22:00] <limebyte> well, there is no way to run it over an SSH tunnel or such? so the easy thing would be a VPN? but that needs to be HA also
[22:00] <limebyte> otherwise if something dies there
[22:00] <limebyte> CEPH will follow
[22:03] <darkfader> limebyte: ic
[22:03] <darkfader> i know dmvpn can do that but i only had that on alpine and there's no working ceph on alpine, so that might not work as well as you need it
[22:04] <darkfader> how seriously are you going to use this? it's not like that is a "exactly as the guy intended" use case :)
[22:04] <limebyte> well
[22:04] <limebyte> I have basically 4 servers with 1TB or 2TB
[22:05] <limebyte> and want to connect them via CEPH
[22:05] <darkfader> and how far are they from each other?
[22:05] <darkfader> ignoring that it's via the internet, how far in latency?
[22:05] <limebyte> depends
[22:05] <limebyte> 100ms peak
[22:06] <doppelgrau> bandwidth?
[22:06] <limebyte> 100Mbit or Gbit
[22:06] <limebyte> no DSL or such
[22:06] <darkfader> it'll perform like an old 5GB hard drive, that's the sad part
[22:07] <darkfader> but i don't think it'll constantly fall apart if you tune ntp really well
[22:07] <darkfader> you can firewall the mon ports so only your other nodes can see / reach them
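A rough sketch of the firewalling darkfader suggests, assuming iptables; the peer addresses are placeholders, 6789 is the default mon port and 6800-7300 the default OSD port range:

    # allow Ceph traffic only from the other cluster nodes (203.0.113.11-13 are placeholder peer IPs)
    iptables -A INPUT -p tcp -m multiport --dports 6789,6800:7300 -s 203.0.113.11 -j ACCEPT
    iptables -A INPUT -p tcp -m multiport --dports 6789,6800:7300 -s 203.0.113.12 -j ACCEPT
    iptables -A INPUT -p tcp -m multiport --dports 6789,6800:7300 -s 203.0.113.13 -j ACCEPT
    iptables -A INPUT -p tcp -m multiport --dports 6789,6800:7300 -j DROP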
[22:07] <limebyte> sure
[22:07] <limebyte> but the connection is not encrypted
[22:07] <darkfader> yeah
[22:07] <limebyte> and the Data
[22:08] <blizzow> I just created a snapshot of the disk image in my ceph pool for a VM that's pretty active. The whole VM crashed saying there was an error writing to the swap file. :( Do I have to do something to my vm to snapshot a live image?
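blizzow's question is not answered later in this log; a commonly suggested precaution when snapshotting an RBD image backing a live VM is to quiesce the guest filesystem first, e.g. via qemu-guest-agent. A sketch, with placeholder domain and image names:

    # freeze the guest filesystem, take the RBD snapshot, then thaw
    # (requires qemu-guest-agent in the guest; "myvm" and "rbd/myvm-disk" are placeholders)
    virsh domfsfreeze myvm
    rbd snap create rbd/myvm-disk@before-change
    virsh domfsthaw myvm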
[22:08] <darkfader> i would look at some other fs that focus on replication, not scaleout
[22:08] <darkfader> also with 4 nodes you can set up tunnels just between those nodes
[22:08] <darkfader> manually
[22:08] <darkfader> that is no biggie
[22:09] <darkfader> you'll need to make sure you use some crypto that works with hardware offloading (aesni)
[22:09] <darkfader> and then the latency hit from crypto is not too high
[22:09] <darkfader> if you had 20 nodes you'd need something like dmvpn, but for 4 just set it up manually
[22:09] <doppelgrau> with 100ms network latency, the crypto doesn't matter IMHO
[22:09] <limebyte> I guess then I do pick SSH tunnels
[22:10] <darkfader> doppelgrau: it'd stack up i think
[22:10] <limebyte> but somehow i need to configure a port for CEPH, haven't found an option yet to pass a port
[22:11] <darkfader> limebyte: ssh can make a different interface/ip you can bind ceph to
[22:11] <darkfader> not a port forward, it can also make a tunnel interface
[22:11] <darkfader> don't ask me how
[22:11] <doppelgrau> limebyte: you configure the network, and the interface is chosen depending on your routing table
[22:11] <limebyte> kay
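A rough sketch of the tun-device approach darkfader and doppelgrau describe, assuming OpenSSH with tunnel support enabled ("PermitTunnel yes" on the server, root on both ends); the hostnames and addresses are placeholders:

    # create a layer-3 tunnel interface (tun0) between two nodes over SSH
    ssh -f -N -w 0:0 root@peer.example.com
    ip addr add 10.99.0.1/24 dev tun0
    ip link set tun0 up
    # on the peer: ip addr add 10.99.0.2/24 dev tun0; ip link set tun0 up

    # then let Ceph bind to the tunnel network via routing, e.g. in ceph.conf:
    # [global]
    # public network = 10.99.0.0/24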
[22:11] * j3roen (~j3roen@93.188.248.149) Quit (Ping timeout: 480 seconds)
[22:11] * wes_dillingham (~wes_dilli@65.112.8.205) Quit (Quit: wes_dillingham)
[22:12] <blizzow> If I increase the PGs in a pool, will my cluster start acting weird (EG: blocked processes)?
[22:13] <T1> an increase of PGs will result in rebalancing
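If the rebalance T1 mentions is a concern, a commonly used mitigation (not raised in the log itself) is to throttle backfill and recovery so client I/O is not starved; a sketch with illustrative values:

    # slow down rebalancing at the cost of a longer recovery window
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'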
[22:16] <T1> ssh tunnels for ceph-over-internet are a bad idea - you need to forward multiple ports, and depending on the number of OSDs and how often they are restarted the port numbers can change - by default ceph uses quite a large port range that it always assumes is available
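For reference on the port range T1 describes, a sketch of the relevant ceph.conf settings (the values shown are the stock defaults):

    [global]
    # each OSD/MDS daemon picks ports from this range at start-up,
    # and the ports in use can change when a daemon restarts
    ms bind port min = 6800
    ms bind port max = 7300
    # monitors listen on a fixed port (6789 by default), set per monitor via "mon addr"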
[22:16] <limebyte> dam
[22:17] <limebyte> Well I could forward the port range
[22:17] <T1> besides that what others told you about a less than ideal use case still applies..
[22:17] <limebyte> but still need an IP
[22:17] <T1> it's really not a good idea
[22:17] <limebyte> T1, what do you think then, is a good idea instead of CEPH?
[22:17] <T1> unless you are really careful you could just end up losing everything
[22:18] <T1> rsync, bt sync or similar more simple stuff
[22:19] <limebyte> eh
[22:20] <limebyte> they don't provide the features
[22:21] <limebyte> that's the problem, and CEPH seems to be reliable
[22:21] <T1> reliable when components are placed close together without routing etc etc, yes
[22:21] <T1> otherwise you are in for a rough ride
[22:22] * j3roen (~j3roen@93.188.248.149) has joined #ceph
[22:24] * BrianA1 (~BrianA@c-73-189-153-151.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[22:26] * mr_flea (~JWilbur@mtest2.im-in-the-tor-network.xyz) Quit ()
[22:28] * xarses (~xarses@64.124.158.3) has joined #ceph
[22:29] * xarses (~xarses@64.124.158.3) Quit (Remote host closed the connection)
[22:29] * xarses (~xarses@64.124.158.3) has joined #ceph
[22:39] * Jeffrey4l_ (~Jeffrey@221.195.84.71) Quit (Ping timeout: 480 seconds)
[22:41] * bene2 (~bene@2601:193:4101:f410:ea2a:eaff:fe08:3c7a) Quit (Quit: Konversation terminated!)
[22:44] * squizzi (~squizzi@2001:420:2240:1268:ad85:b28:ee1c:890) Quit (Ping timeout: 480 seconds)
[22:45] <limebyte> eh
[22:45] * kristen (~kristen@jfdmzpr03-ext.jf.intel.com) Quit (Quit: Leaving)
[22:50] * mgolub (~Mikolaj@91.245.72.48) has joined #ceph
[22:53] * mykola (~Mikolaj@91.245.74.212) Quit (Ping timeout: 480 seconds)
[22:57] * rakeshgm (~rakesh@c-73-231-60-138.hsd1.ca.comcast.net) Quit (Quit: Peace :))
[23:15] * niknakpaddywak (~xander.ni@outbound.lax.demandmedia.com) has joined #ceph
[23:21] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) Quit (Quit: Leaving)
[23:24] * fridim (~fridim@56-198-190-109.dsl.ovh.fr) has joined #ceph
[23:24] * mgolub (~Mikolaj@91.245.72.48) Quit (Quit: away)
[23:27] <blizzow> I have 1 OSD node with 8x4TB drives, 2 OSD nodes with 4x4TB drives, and 8 OSD nodes with 4x1TB drives. I have a pool with ~8-12TB of data in it and it currently has 512 PGs, how many PGs should I have in the pool?
[23:33] * TomasCZ (~TomasCZ@yes.tenlab.net) has joined #ceph
[23:40] * scuttlemonkey is now known as scuttle|afk
[23:42] * Nanobot (~mason@46.166.138.136) has joined #ceph
[23:45] * scuttle|afk is now known as scuttlemonkey
[23:47] <T1> use the pg calculator
[23:47] <T1> http://ceph.com/pgcalc/
[23:48] <blizzow> T1: I guess my question is more: does the size of the drives matter?
[23:49] <T1> short answer: yes
[23:50] <doppelgrau> a bit longer: the weight matters (usually depending on the size)
[23:50] <T1> .. and how your selection logic is made matters
[23:51] <T1> you have vastly different sized nodes
[23:51] <T1> so some nodes will hold multiple times more data than others
[23:52] <blizzow> They already do that with a smaller number of PGs.
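To see the effect doppelgrau and T1 describe (CRUSH weights following drive size, so the larger nodes hold several times more data), a quick sketch of the usual inspection commands; the OSD id and weight below are placeholders:

    # show per-node and per-OSD weight, utilisation and PG count
    ceph osd df tree
    # CRUSH weights normally track raw capacity; they can be adjusted per OSD if needed, e.g.
    ceph osd crush reweight osd.12 3.64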
[23:54] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)
[23:56] * fridim (~fridim@56-198-190-109.dsl.ovh.fr) Quit (Ping timeout: 480 seconds)
[23:56] <blizzow> Will increasing the PGs on my pool to 2048 possibly increase performance?
[23:57] <T1> it really depends on the number of OSDs
[23:57] <T1> and then a few other factors
[23:57] <T1> use the PG calculator as I suggested and see what it suggests
[23:58] <T1> .. and read up on the recommendations about PG sizing

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.