#ceph IRC Log

IRC Log for 2014-07-07

Timestamps are in GMT/BST.

[0:07] * japuzzo (~japuzzo@ool-4570886e.dyn.optonline.net) has joined #ceph
[0:08] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) has joined #ceph
[0:09] * rendar (~I@host61-177-dynamic.251-95-r.retail.telecomitalia.it) Quit ()
[0:09] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[0:16] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[0:16] * scuttlemonkey is now known as scuttle|afk
[0:19] * danm (~danm@dan.edgemode.net) Quit (Quit: Nettalk6 - www.ntalk.de)
[0:20] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[0:32] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[0:38] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[0:46] * danieagle (~Daniel@179.184.165.184.static.gvt.net.br) has joined #ceph
[0:49] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[0:52] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[0:53] * antiatlasdev (~antiatlas@105.128.79.239) has joined #ceph
[0:53] <antiatlasdev> where am i?
[0:54] * antiatlasdev (~antiatlas@105.128.79.239) Quit ()
[0:55] <iggy> stuck in a well?
[0:57] * japuzzo (~japuzzo@ool-4570886e.dyn.optonline.net) Quit (Quit: Leaving)
[1:04] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[1:04] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[1:08] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[1:09] * zack_dolby (~textual@pdf8519e7.tokynt01.ap.so-net.ne.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[1:11] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[1:21] * Cube (~Cube@66-87-131-125.pools.spcsdns.net) Quit (Quit: Leaving.)
[1:32] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[1:40] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[1:47] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[1:58] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[1:58] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[2:04] * zack_dolby (~textual@e0109-114-22-8-147.uqwimax.jp) has joined #ceph
[2:25] * danieagle (~Daniel@179.184.165.184.static.gvt.net.br) Quit (Quit: Thanks for everything! :-) see you! :-))
[2:25] * LeaChim (~LeaChim@host109-146-189-84.range109-146.btcentralplus.com) Quit (Read error: Operation timed out)
[2:28] * flaxy (~afx@78.130.171.69) Quit (Quit: WeeChat 0.4.2)
[2:34] * flaxy (~afx@78.130.171.69) has joined #ceph
[2:36] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) Quit (Remote host closed the connection)
[2:36] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) has joined #ceph
[2:38] * BManojlovic (~steki@cable-94-189-160-74.dynamic.sbb.rs) has joined #ceph
[2:38] * Steki (~steki@198.199.65.141) Quit (Read error: Operation timed out)
[2:56] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[3:03] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[3:07] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit ()
[3:08] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[3:13] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[3:17] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit ()
[3:18] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[3:30] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[3:39] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[3:43] * diegows (~diegows@190.190.5.238) Quit (Ping timeout: 480 seconds)
[3:55] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[3:58] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:59] * zhaochao (~zhaochao@106.38.204.77) has joined #ceph
[4:04] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[4:05] * DV (~veillard@veillard.com) Quit (Ping timeout: 480 seconds)
[4:05] * DV (~veillard@veillard.com) has joined #ceph
[4:17] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[4:27] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[4:39] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[4:42] * bkopilov (~bkopilov@213.57.17.96) Quit (Read error: Operation timed out)
[4:44] * julian (~julian@125.70.135.159) has joined #ceph
[4:49] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[4:53] * vbellur (~vijay@122.167.104.19) Quit (Read error: Operation timed out)
[4:56] * shang (~ShangWu@175.41.48.77) has joined #ceph
[5:03] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[5:08] * vbellur (~vijay@122.167.227.36) has joined #ceph
[5:12] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[5:20] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[5:24] <Jakey> i'm having trouble deciding what to put in public_addr
[5:25] <Jakey> since i'm deploying ceph locally, should i just put in the local ip address?
[5:26] <Jakey> what ip address should i put in? the ip address of the front end of the admin node, or the monitoring node's ip address? i am so confused
[5:32] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[5:34] * scuttle|afk is now known as scuttlemonkey
[5:39] * lucas1 (~Thunderbi@218.76.25.66) has joined #ceph
[5:46] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[5:56] * Vacum_ (~vovo@88.130.207.127) has joined #ceph
[5:56] <Jakey> hellooooo????
[5:57] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[5:59] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) Quit (Remote host closed the connection)
[6:00] * bkopilov (~bkopilov@nat-pool-tlv-t.redhat.com) has joined #ceph
[6:03] * Vacum (~vovo@88.130.208.234) Quit (Ping timeout: 480 seconds)
[6:05] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[6:08] * MACscr (~Adium@c-50-158-183-38.hsd1.il.comcast.net) Quit (Quit: Leaving.)
[6:09] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[6:24] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[6:33] <Qten> Jakey: public_addr isn't exactly "public", but it's better explained here: http://ceph.com/docs/master/rados/configuration/network-config-ref/
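A minimal ceph.conf sketch of the settings that page describes; the subnets and address below are placeholders, not values from this conversation:

    [global]
    # the network clients, mons and osds all talk on
    public network = 192.168.1.0/24
    # optional second network reserved for osd replication/heartbeat traffic
    cluster network = 10.0.1.0/24

    [mon.a]
    # per-daemon override, normally derived from "public network"
    public addr = 192.168.1.10:6789

For a purely local test deployment, pointing "public network" at the subnet the machine's own IP sits in is usually all that's needed.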
[6:38] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[6:39] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[6:39] * rdas (~rdas@121.244.87.115) has joined #ceph
[6:41] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit ()
[6:44] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[6:48] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[6:50] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[6:50] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[7:00] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[7:01] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[7:04] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[7:04] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) Quit (Quit: Leaving.)
[7:07] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit ()
[7:07] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[7:16] * vbellur (~vijay@122.167.227.36) Quit (Ping timeout: 480 seconds)
[7:18] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[7:20] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[7:22] * michalefty (~micha@p20030071CF466300F080E53F6FCB23AF.dip0.t-ipconnect.de) has joined #ceph
[7:23] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit ()
[7:35] * lalatenduM (~lalatendu@121.244.87.117) has joined #ceph
[7:35] * vbellur (~vijay@121.244.87.124) has joined #ceph
[7:53] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[7:57] * theanalyst (~abhi@49.32.0.17) has joined #ceph
[8:01] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[8:03] * MACscr (~Adium@c-50-158-183-38.hsd1.il.comcast.net) has joined #ceph
[8:03] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[8:04] * dignus (~jkooijman@t-x.dignus.nl) has joined #ceph
[8:06] * huangjun (~kvirc@58.49.151.105) has joined #ceph
[8:06] * michalefty (~micha@p20030071CF466300F080E53F6FCB23AF.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[8:07] * michalefty (~micha@p20030071CF466300F43F2596297764FE.dip0.t-ipconnect.de) has joined #ceph
[8:08] * michalefty (~micha@p20030071CF466300F43F2596297764FE.dip0.t-ipconnect.de) has left #ceph
[8:08] * rotbeard (~redbeard@2a02:908:df19:9900:76f0:6dff:fe3b:994d) Quit (Quit: Leaving)
[8:11] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[8:12] * askanis (~askanis@HSI-KBW-078-043-004-079.hsi4.kabel-badenwuerttemberg.de) has joined #ceph
[8:13] * jordanP (~jordan@78.193.36.209) has joined #ceph
[8:13] * drankis (~drankis__@89.111.13.198) has joined #ceph
[8:23] * ikrstic (~ikrstic@178-222-2-138.dynamic.isp.telekom.rs) has joined #ceph
[8:25] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[8:26] * jordanP (~jordan@78.193.36.209) Quit (Ping timeout: 480 seconds)
[8:27] * Nacer_ (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) Quit (Remote host closed the connection)
[8:28] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) has joined #ceph
[8:34] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[8:35] * jordanP (~jordan@78.193.36.209) has joined #ceph
[8:35] * b0e (~aledermue@juniper1.netways.de) has joined #ceph
[8:36] * ikrstic (~ikrstic@178-222-2-138.dynamic.isp.telekom.rs) Quit (Quit: Konversation terminated!)
[8:36] * saurabh (~saurabh@121.244.87.117) has joined #ceph
[8:38] * vbellur (~vijay@121.244.87.124) Quit (Ping timeout: 480 seconds)
[8:39] * Sysadmin88 (~IceChat77@94.4.20.0) Quit (Quit: He who laughs last, thinks slowest)
[8:42] * cookednoodles (~eoin@eoin.clanslots.com) has joined #ceph
[8:46] * rendar (~I@host24-115-dynamic.57-82-r.retail.telecomitalia.it) has joined #ceph
[8:49] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[8:50] * vbellur (~vijay@121.244.87.117) has joined #ceph
[8:51] * cookednoodles (~eoin@eoin.clanslots.com) Quit (Remote host closed the connection)
[8:52] * cookednoodles (~eoin@eoin.clanslots.com) has joined #ceph
[8:59] * ScOut3R (~ScOut3R@2E6B463D.dsl.pool.telekom.hu) has joined #ceph
[9:01] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) Quit (Quit: Leaving.)
[9:02] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[9:03] * lucas1 (~Thunderbi@218.76.25.66) Quit (Ping timeout: 480 seconds)
[9:05] * haomaiwang (~haomaiwan@114.54.30.94) has joined #ceph
[9:09] * ScOut3R (~ScOut3R@2E6B463D.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[9:11] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[9:16] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[9:16] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[9:18] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[9:30] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) has joined #ceph
[9:37] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[9:37] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[9:37] * newbie|2 (~kvirc@59.173.202.135) has joined #ceph
[9:37] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[9:44] * huangjun (~kvirc@58.49.151.105) Quit (Ping timeout: 480 seconds)
[9:45] <Jakey> https://www.irccloud.com/pastebin/Ne8aRDMP
[9:45] <Jakey> what is remote hostname ^^ ??
[9:45] <Jakey> the logs is paste up there
[9:45] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) has joined #ceph
[9:45] * ChanServ sets mode +v andreask
[9:50] * ScOut3R (~ScOut3R@catv-80-99-64-8.catv.broadband.hu) has joined #ceph
[9:50] * fsimonce (~simon@host27-60-dynamic.26-79-r.retail.telecomitalia.it) has joined #ceph
[9:52] <newbie|2> you write "ipA hostA" in /etc/hosts, but on ipA the real hostname is hostB
[9:53] <newbie|2> the remote hostname is the real hostname on that machine
[9:53] <newbie|2> node8ibmlocal is just an alias on your deploy host
[9:54] <newbie|2> the real hostname is "ibm"
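A sketch of the mismatch being described; the IP is a placeholder, only the names "node8ibmlocal" and "ibm" come from the conversation. ceph-deploy appears to compare the name you give it with what the remote box itself reports:

    # on the deploy host, /etc/hosts maps an alias to the node's IP
    10.0.0.8    node8ibmlocal

    # but the node itself reports a different short hostname
    $ ssh node8ibmlocal hostname -s
    ibm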
[9:56] * bandrus (~Adium@h-199-142.a137.corp.bahnhof.se) has joined #ceph
[9:56] * newbie|2 is now known as huangjun
[10:06] * ikrstic (~ikrstic@c82-214-88-26.loc.akton.net) has joined #ceph
[10:08] * TMM (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[10:18] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Quit: Leaving)
[10:31] * zidarsk8 (~zidar@2001:1470:fffd:101c:ea11:32ff:fe9a:870) has joined #ceph
[10:35] * lcavassa (~lcavassa@78.25.240.221) has joined #ceph
[10:37] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[10:37] * jtang_ (~jtang@80.111.83.231) has joined #ceph
[10:37] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[10:43] * rdas (~rdas@121.244.87.115) Quit (Quit: Leaving)
[10:44] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[10:44] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[10:46] * rdas (~rdas@121.244.87.115) has joined #ceph
[10:49] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[11:03] * cookednoodles (~eoin@eoin.clanslots.com) Quit (Quit: Ex-Chat)
[11:04] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) Quit (Ping timeout: 480 seconds)
[11:07] * leseb (~leseb@81-64-215-19.rev.numericable.fr) has joined #ceph
[11:07] * leseb (~leseb@81-64-215-19.rev.numericable.fr) has left #ceph
[11:08] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[11:08] * leseb (~leseb@81-64-215-19.rev.numericable.fr) has joined #ceph
[11:09] * leseb (~leseb@81-64-215-19.rev.numericable.fr) has left #ceph
[11:09] * leseb (~leseb@81-64-215-19.rev.numericable.fr) has joined #ceph
[11:10] * leseb (~leseb@81-64-215-19.rev.numericable.fr) has left #ceph
[11:10] * leseb (~leseb@81-64-215-19.rev.numericable.fr) has joined #ceph
[11:11] * _are_ (~quassel@2a01:238:4325:ca00:f065:c93c:f967:9285) has joined #ceph
[11:11] * drankis (~drankis__@89.111.13.198) Quit (Ping timeout: 480 seconds)
[11:12] * analbeard (~shw@185.28.167.198) has joined #ceph
[11:15] * zack_dolby (~textual@e0109-114-22-8-147.uqwimax.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[11:17] * analbeard1 (~shw@support.memset.com) has joined #ceph
[11:20] * analbeard (~shw@185.28.167.198) Quit (Ping timeout: 480 seconds)
[11:29] <tnt_> Mmm, the docs aren't very clear. To get rid of the warning, do you need the 'optimal' tunables (which AFAICT would include CRUSH_TUNABLES3 and be incompatible with anything < 3.15 kernel), or the 'bobtail' ones (which only use CRUSH_TUNABLES2 and work with anything >= 3.9 kernel)?
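A sketch of the commands involved, assuming (not confirmed in this log) that the 'bobtail' profile is the minimum that clears the warning for pre-3.15 kernel clients:

    # show which tunables the cluster currently uses
    ceph osd crush show-tunables
    # apply a named profile; expect data movement afterwards
    ceph osd crush tunables bobtail
    # or, once every client kernel is new enough:
    ceph osd crush tunables optimal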
[11:31] * dlan_ (~dennis@116.228.88.131) has joined #ceph
[11:33] * dlan (~dennis@116.228.88.131) Quit (Ping timeout: 480 seconds)
[11:38] * askanis1 (~askanis@HSI-KBW-078-043-004-079.hsi4.kabel-badenwuerttemberg.de) has joined #ceph
[11:39] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[11:40] * askanis (~askanis@HSI-KBW-078-043-004-079.hsi4.kabel-badenwuerttemberg.de) Quit (Ping timeout: 480 seconds)
[11:58] * marrusl (~mark@faun.canonical.com) has joined #ceph
[11:59] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[12:02] <johnfoo> ok so
[12:02] <johnfoo> i still have packet 'loss' problems after testing without the cluster network
[12:03] <johnfoo> on 0.80.2
[12:03] <johnfoo> it worked ok in 0.79 but i upgraded the kernel to 3.15 when upgrading to 0.80.2
[12:04] <johnfoo> i initially thought about fucked up tcp tuning, but the stack is now properly tuned and it didn't change anything
[12:04] <johnfoo> i also thought it could be the network cards the cluster network was on
[12:04] <johnfoo> because they're broadcom and broadcom drivers are... well broadcom drivers
[12:05] <johnfoo> i'm kinda at a loss for ideas right now
[12:05] <liiwi> how do you see the packetloss? in which counters?
[12:06] * vbellur (~vijay@121.244.87.117) Quit (Read error: Operation timed out)
[12:06] <johnfoo> liiwi: OSDs miss heartbeats for short periods
[12:06] <johnfoo> and data traffic
[12:06] <johnfoo> the mon will mark them down, they get back up a second later
[12:06] <johnfoo> complaining they were wrongly marked down
[12:07] <liiwi> is that visible in any OS or nic counters?
[12:07] <johnfoo> no
[12:07] <johnfoo> there is no real packet loss
[12:07] <johnfoo> switches show no loss, interfaces show no losses, and pcap dumps show the tcp traffic
[12:07] <johnfoo> no RST, no FIN, no nothing
[12:08] <topro> johnfoo: cannot help with your problem but AFAIK there is no 0.80.2 yet, as I'm desperately waiting for such a stable service release
[12:08] <johnfoo> topro: http://ceph.com/docs/master/release-notes/#v0-82
[12:08] <johnfoo> the problem was identical on 0.80.1 though
[12:09] * masterpe_ is now known as masterpe
[12:09] <johnfoo> which is the one linked on the website
[12:09] <topro> so are you talking about 0.80.2 or 0.82?
[12:09] <johnfoo> 0.80.2, i fucked up the version number :v:
[12:09] * i_m (~ivan.miro@gbibp9ph1--blueice2n1.emea.ibm.com) has joined #ceph
[12:09] <johnfoo> s1r1u02 ceph # ceph --admin-daemon /var/run/ceph/ceph-mon.mon00.asok version
[12:09] <johnfoo> {"version":"0.82"}
[12:09] <topro> then again, there is no 0.80.2, the release-notes are for 0.82
[12:09] <johnfoo> wait a minute
[12:10] <johnfoo> no no
[12:10] <johnfoo> it is 0.82
[12:10] <johnfoo> my bad
[12:10] <topro> nvm. just was curious to see if there is some 0.80.2 imminent
[12:10] * drankis (~drankis__@91.188.43.210) has joined #ceph
[12:21] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) has joined #ceph
[12:23] * rwheeler (~rwheeler@nat-pool-tlv-u.redhat.com) has joined #ceph
[12:23] * vbellur (~vijay@121.244.87.124) has joined #ceph
[12:38] * haomaiwang (~haomaiwan@114.54.30.94) Quit (Remote host closed the connection)
[12:38] * haomaiwang (~haomaiwan@203.69.59.199) has joined #ceph
[12:41] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[12:49] * ikrstic (~ikrstic@c82-214-88-26.loc.akton.net) Quit (Quit: Konversation terminated!)
[12:53] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[12:54] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:55] * huangjun (~kvirc@59.173.202.135) Quit (Read error: Operation timed out)
[12:55] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[12:55] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[12:57] * haomaiwa_ (~haomaiwan@114.54.30.94) has joined #ceph
[12:57] * drankis (~drankis__@91.188.43.210) Quit (Ping timeout: 480 seconds)
[12:59] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[13:00] * haomaiwang (~haomaiwan@203.69.59.199) Quit (Read error: Operation timed out)
[13:06] * rdas (~rdas@121.244.87.115) Quit (Ping timeout: 480 seconds)
[13:08] * zhaochao (~zhaochao@106.38.204.77) has left #ceph
[13:10] * vbellur (~vijay@121.244.87.124) Quit (Ping timeout: 480 seconds)
[13:10] * shang (~ShangWu@175.41.48.77) Quit (Ping timeout: 480 seconds)
[13:13] * kiwnix (~kiwnix@00011f91.user.oftc.net) has joined #ceph
[13:13] <kiwnix> x
[13:16] * lkoranda (~lkoranda@nat-pool-brq-t.redhat.com) Quit (Quit: Splunk> Be an IT superhero. Go home early.)
[13:17] * boichev (~boichev@213.169.56.130) has joined #ceph
[13:18] * lkoranda (~lkoranda@nat-pool-brq-t.redhat.com) has joined #ceph
[13:19] * vbellur (~vijay@121.244.87.117) has joined #ceph
[13:26] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) has joined #ceph
[13:26] * ChanServ sets mode +v andreask
[13:34] * dvanders (~dvanders@2001:1458:202:180::101:f6c7) has joined #ceph
[13:34] * b0e1 (~aledermue@juniper1.netways.de) has joined #ceph
[13:36] * b0e (~aledermue@juniper1.netways.de) Quit (Ping timeout: 480 seconds)
[13:38] * marrusl (~mark@faun.canonical.com) Quit (Ping timeout: 480 seconds)
[13:39] * vmx (~vmx@p508A5789.dip0.t-ipconnect.de) has joined #ceph
[13:40] * zidarsk8 (~zidar@2001:1470:fffd:101c:ea11:32ff:fe9a:870) has left #ceph
[13:43] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) Quit (Quit: Leaving.)
[13:47] * saurabh (~saurabh@121.244.87.117) Quit (Quit: Leaving)
[13:48] * julian (~julian@125.70.135.159) Quit (Quit: Leaving)
[13:48] * marrusl (~mark@faun.canonical.com) has joined #ceph
[13:49] * huangjun (~kvirc@117.151.51.188) has joined #ceph
[13:53] <SpComb> how does ceph behave if the mons lose quorum? Like if I have two mon's (a bad idea, I know) and I stop one of them
[13:53] <SpComb> it seems like e.g. rbd still works, but `ceph status` hangs on some retry... is that expected?
[13:53] <SpComb> I'd assume `ceph status` would still work even without quorum, it would give me some read-only status
[13:55] <askanis1> Is someone successfully deploying ceph using https://github.com/stackforge/puppet-ceph ?
[13:55] <askanis1> Seems like I'm too stupid, can sb provide me with an example for a successful ceph::conf call ?
[13:55] <askanis1> TIA !
[13:56] <tnt_> SpComb: that's expected.
[13:57] <tnt_> Without quorum ceph will just refuse to answer anything. Not even a status since it can't be sure of anything.
[13:57] <SpComb> I noticed that all the OSDs switched their connections over to the working mon, and some trivial rbd-querying via `virsh pool-info san` worked
[13:59] <tnt_> That's just because those operations most likely didn't need the mon at all ...
[14:00] <tnt_> the osds will continue to accept stuff if they can (if they still have a connection to the other OSDs involved in the PG you're trying to access) and if you have a valid authentication token.
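A side note on inspecting this state: the admin socket on each monitor host still answers locally even while the cluster-wide `ceph status` hangs, so something like the following (mon id "a" is a placeholder) shows what that one daemon believes:

    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok mon_status
    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok quorum_status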
[14:05] <SpComb> is running with two mons any worse than running with one?
[14:05] <SpComb> seems to behave pretty much the same with 0/2 mons up as with 1/2 mons up
[14:05] <tnt_> yes.
[14:05] <SpComb> i.e. existing rbd volumes are still useable
[14:06] <tnt_> They won't stay that way though. And you won't be able to connect to new ones.
[14:06] * vbellur (~vijay@121.244.87.117) Quit (Ping timeout: 480 seconds)
[14:08] <SpComb> I had somehow assumed that a minority set of mons would degrade into some kind of readonly state, but I suppose that's not really feasible
[14:08] <tnt_> well no ... because if a mon is alone, it doesn't know if somewhere else there isn't a quorum of mons that can take decision and make stuff evolve.
[14:09] <SpComb> yeah, perhaps if the majority also went readonly, but meh :)
[14:09] <joao> SpComb, clients that still have a connection with the osds will keep that connection for 5 minutes or something
[14:09] * ikrstic (~ikrstic@fo-f-130.180.230.121.targo.rs) has joined #ceph
[14:09] <joao> there's this timeout that will be triggered when clients and osds notice that the mons are down
[14:10] <joao> and then they will simply stop
[14:10] <joao> (accepting requests and making requests)
[14:10] <tnt_> Anyone know if changing the tunables will impact currently running RBDs ? (obviously the kernel and client do support the tunables).
[14:11] <joao> also, the read-only question although interesting poses a significant issue: what if the only mon that is still alive doesn't know about the latest cluster state?
[14:12] <joao> e.g., say that said monitor was synchronizing from the other monitor (for instance, just been added) and this other monitor went down
[14:12] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[14:12] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[14:12] <joao> this new monitor only knows about a portion of the state, not the latest
[14:12] <SpComb> for maintenance purposes it might be useful to be able to bring down a mon gracefully
[14:13] <SpComb> but whatever, not that important, just need to keep in mind that three mons is only N+1 redundancy - as opposed to say three OSD replicas being N+2
[14:13] <joao> just kill the monitor and it will come down gracefully
[14:13] * b0e1 (~aledermue@juniper1.netways.de) Quit (Ping timeout: 480 seconds)
[14:13] <joao> now if you mean "while maintaining quorum", just have more than 2
[14:15] <johnfoo> joao: do you have an idea of what could make osds miss heartbeats and requests while there is no packet loss anywhere on the network ?
[14:15] <SpComb> by graceful stop I mean that the monitor that is about to go offline could inform the others that their state is the newest
[14:15] <SpComb> for a two-node cluster you could take one down for maintenance, but not be able to cold-start without a manual override
[14:16] <SpComb> like, say, ganeti does
[14:16] <joao> oh, still on the read-only thing: the biggest problem of them all would be to figure out if the monitor that is still alive is really the only one alive or if there's other monitors alive but this one monitor is unable to reach them
[14:17] <joao> johnfoo, anything that comes to mind is osds not being able to reach each other
[14:17] <johnfoo> that's the thing
[14:17] <joao> but I'm not an expert
[14:17] <johnfoo> they are
[14:17] <johnfoo> they randomly can't communicate
[14:17] * primechuck (~primechuc@173-17-128-36.client.mchsi.com) has joined #ceph
[14:17] <johnfoo> then get back up
[14:17] <johnfoo> and complain they were wrongly marked down
[14:17] <joao> johnfoo, what version?
[14:18] <johnfoo> 0.82. same behavior under 0.80.1
[14:18] * tnt_ wishes there was a rsync for ceph pools ...
[14:18] <johnfoo> 0.79 was fine but i was using a 3.13 kernel
[14:18] <johnfoo> and upgraded to 3.15 at the same time i upgraded the cluster to 0.80.1
[14:18] * ikrstic (~ikrstic@fo-f-130.180.230.121.targo.rs) Quit (Quit: Konversation terminated!)
[14:19] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[14:20] * Anticimex (anticimex@95.80.32.80) Quit (Ping timeout: 480 seconds)
[14:20] <joao> johnfoo, have you tried looking in the tracker for a similar issue? if you don't find one, I'd recommend opening a new issue
[14:20] <joao> or the mailing list
[14:20] <johnfoo> yeah that's probably what i'm going to do
[14:20] <johnfoo> i didn't find anything resembling that
[14:21] <johnfoo> i just wanted to be damn sure it wasn't my setup being retarded
[14:21] <joao> I could try to figure out this one but I'm expecting it to be time consuming for someone (like me) who isn't really familiar with the code paths involved
[14:21] <joao> johnfoo, it might still be :p
[14:21] <joao> I'm just saying I have no idea
[14:21] <johnfoo> yeah i know
[14:21] <johnfoo> yeah i don't have any idea either
[14:22] <johnfoo> i thought maybe ntp drifted since ceph is so anal about timings
[14:22] <johnfoo> but no
[14:22] * b0e (~aledermue@juniper1.netways.de) has joined #ceph
[14:22] <joao> SpComb, that "feature" would only be useful on a 2 mon cluster
[14:22] <SpComb> yeah, my main reference point is operating a two-node ganeti cluster
[14:23] <joao> and it's completely useless as long as you have another monitor
[14:23] <joao> I see
[14:23] <tnt_> johnfoo: the one time I had hearbeat misses, it was due to leveldb blocking stuff for too long, but that was on the mon, and a while ago ...
[14:23] <joao> if that's happening on the mon THEN I may be of help
[14:24] <joao> johnfoo, open a ticket, pour all and any information you can gather
[14:24] <SpComb> it doesn't seem like ganeti is using paxos as such, I wonder what the differences are specifically
[14:24] <joao> if you happen to be able to reproduce this, add debugging to the mons (debug mon = 10, debug ms = 1), and some osd debugging as well (I imagine 'debug ms = 1' would be enough?)
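A sketch of turning that debugging on at runtime, using the levels joao gives; everything else is ordinary injectargs usage and the mon id is a placeholder:

    # per monitor
    ceph tell mon.a injectargs '--debug-mon 10 --debug-ms 1'
    # all osds at once; messenger debugging shows the heartbeat traffic
    ceph tell osd.* injectargs '--debug-ms 1'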
[14:24] <tnt_> joao: btw, I'm happy to see the mon is more and more well behaved wrt to IO : http://i.imgur.com/ESkTc3H.png :)
[14:25] <joao> oh goodie
[14:25] <joao> I can't recall what we did there
[14:25] <joao> is that the emperor release on october?
[14:26] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[14:26] <joao> or is it maybe a leveldb upgrade?
[14:27] * primechuck (~primechuc@173-17-128-36.client.mchsi.com) Quit (Remote host closed the connection)
[14:28] <tnt_> joao: I think it was emperor.
[14:29] <joao> tnt_, no changes in cluster workload?
[14:29] <tnt_> joao: and the last drop is firefly.
[14:29] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Ping timeout: 480 seconds)
[14:29] <tnt_> joao: not really. Or if anything, it increased.
[14:29] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Ping timeout: 480 seconds)
[14:29] <joao> nice
[14:29] * Anticimex (anticimex@95.80.32.80) has joined #ceph
[14:29] <joao> I wish everybody had nice little graphs like this one
[14:30] <joao> to show us what was going around
[14:30] <joao> usually I only get to see these when there's spikes that bring the cluster to the brink of destruction
[14:30] <tnt_> not so happy with the OSDs though. http://i.imgur.com/klfEpeT.png memory per process over the same period.
[14:30] * fmanana (~fdmanana@bl4-179-170.dsl.telepac.pt) Quit (Quit: Leaving)
[14:31] <tnt_> joao: well, I added instrumentation when there were the big issues with the mon like 1 year ago with it filling memory / disk /...
[14:31] <joao> tnt_, what leveldb version are you using?
[14:31] <joao> also, do you keep cpu stats for the mons?
[14:32] <tnt_> joao: 1.14.0-3
[14:32] <tnt_> joao: wrt CPU, I think I do. Let me check. I don't have presets for it, so I have to dig in the raw data :p
[14:33] * primechuck (~primechuc@173-17-128-36.client.mchsi.com) has joined #ceph
[14:33] <joao> if you manage to find them I'd be grateful
[14:33] * primechuck (~primechuc@173-17-128-36.client.mchsi.com) Quit (Remote host closed the connection)
[14:35] <tnt_> joao: http://i.imgur.com/al8uyU4.png
[14:35] <tnt_> blue is the leader.
[14:36] <tnt_> looks like firefly is way more cpu hungry. Though it's still pretty low.
[14:38] <joao> this is interesting data
[14:38] <joao> ty very much
[14:39] <joao> tnt_, that drop back in january, any idea what changed in the clusters?
[14:39] <joao> s/clusters/cluster
[14:39] <kraken> joao meant to say: tnt_, that drop back in january, any idea what changed in the cluster?
[14:39] * ninkotech (~duplo@cst-prg-45-6.cust.vodafone.cz) has joined #ceph
[14:39] * ninkotech_ (~duplo@cst-prg-45-6.cust.vodafone.cz) has joined #ceph
[14:39] <tnt_> joao: http://i.imgur.com/6nnBhj2.png Oh ... I had never noticed this. The peons seems to be leaking.
[14:40] <tnt_> joao: Let me check if I can find the various interventions we did.
[14:41] <joao> huh
[14:41] <joao> interesting data indeed
[14:43] * nljmo (~nljmo@5ED6C263.cm-7-7d.dynamic.ziggo.nl) Quit (Quit: Textual IRC Client: www.textualapp.com)
[14:44] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has joined #ceph
[14:44] * cookednoodles (~eoin@eoin.clanslots.com) has joined #ceph
[14:47] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[14:47] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[14:48] <tnt_> joao: ok so here's the dates we did stuff: 2014-07-02=Update to firefly 2014-01-24=Update to Emperor 0.72.2+LevelDB 1.14 2014-01-18=Update to Emperor 0.72.2 (but a version linked to old static leveldb) 2013-10-19=Update to dumpling 2013-07-01=Update to cuttlefish
[14:49] * nljmo_ (~nljmo@5ED6C263.cm-7-7d.dynamic.ziggo.nl) Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz...)
[14:50] * LeaChim (~LeaChim@host109-146-189-84.range109-146.btcentralplus.com) has joined #ceph
[14:50] <joao> tnt_, ty very much!
[14:51] <SpComb> joao: but I suppose ganeti must be doing some kind of tie-breaker in the case of an even majority, i.e. a 2/2 majority remains a 1/2 majority if the second node drops out - since either node will refuse to cold-start on its own
[14:52] <SpComb> but I presume that the ceph mon has more complex state than a ganeti master, which is pretty much just a control plane with no active involvement
[14:52] <tnt_> SpComb: doesn't help. Connectivity between the mons could be broken and they could both still be running.
[14:53] <SpComb> okay, true, ganeti has an active master and entirely passive backup masters
[14:53] <SpComb> perhaps that's the difference
[14:53] * bandrus (~Adium@h-199-142.a137.corp.bahnhof.se) Quit (Read error: Connection reset by peer)
[14:53] * bandrus (~Adium@h-199-142.a137.corp.bahnhof.se) has joined #ceph
[14:54] <johnfoo> i think i have found the answer to my mysterious packet loss
[14:54] <johnfoo> s1r1u02 ~ # systemd-coredumpctl list ceph-osd | wc -l
[14:54] <johnfoo> 198
[14:55] <tnt_> not so mysterious after all ...
[14:55] <johnfoo> yup
[14:55] <joao> SpComb, I've been through this a few times over the years with Sage, and we end up always reaching the same conclusion: the overhead of adding something like this tramples the need for it (given a third monitor will certainly fix the vast majority of use-cases)
[14:56] * ninkotech__ (~duplo@cst-prg-45-6.cust.vodafone.cz) has joined #ceph
[14:56] * ninkotech_ (~duplo@cst-prg-45-6.cust.vodafone.cz) Quit (Ping timeout: 480 seconds)
[14:56] * ninkotech (~duplo@cst-prg-45-6.cust.vodafone.cz) Quit (Ping timeout: 480 seconds)
[14:56] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[14:56] <SpComb> joao: sure, I'm not seriously proposing anything, mainly trying to understand how things work
[14:56] <joao> the hardest part is making sure the model would hold even on a 3+ cluster
[14:56] * bkopilov (~bkopilov@nat-pool-tlv-t.redhat.com) Quit (Ping timeout: 480 seconds)
[14:56] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[14:57] <SpComb> I will indeed have a third monitor
[14:57] <tnt_> and the fact that stuff keeps running fine even if there is no quorum for a few instants during maintenance ...
[14:57] <joao> for instance, say that we do have 3 monitors and this in place; considering that clients can contact any monitor (and they will, randomly on start), and that the only thing preventing split-brain is having the monitors "ignoring" clients if they're not in quorum
[14:58] <joao> if we were to have this in place and we were to have the so called "master" being unable to reach the other monitors but still able to receive client connections
[14:58] * sroy (~sroy@2607:fad8:4:6:6e88:14ff:feff:5374) has joined #ceph
[14:58] <joao> and the other two monitors were able to form a quorum and still have clients of their own
[14:58] <joao> we could very well incur in a divergent state
[14:59] <tnt_> One thing that annoys me is that the timeout seems overly long ... like you do ceph -s on a cluster with 2 out of 3 mons and if the random() ends up on the down one, it takes like 5 sec to retry another.
[15:00] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) has joined #ceph
[15:00] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) has left #ceph
[15:00] <SpComb> a monitor that becomes a majority should indeed kill itself, as the remaining monitors may form a majority; I'd assume (but haven't tested) that a ganeti cluster with three nodes would have the master suicide if both backup masters go away
[15:00] <SpComb> err, *becomes a minority
[15:03] <joao> if a monitor becomes a minority and kills himself, what will happen when the network is restored and the second monitor is able to come back to life? or even if the user finishes his intervention and restarts the monitor?
[15:04] <joao> do we need to restart the other monitor manually as well?
[15:04] * theanalyst (~abhi@49.32.0.17) Quit (Read error: Connection reset by peer)
[15:05] <joao> that seems hardly a fair behavior to impose on the monitor, assuming it will "be dead" from the client's perspective anyway
[15:05] * diegows (~diegows@190.190.5.238) has joined #ceph
[15:05] <joao> it's just up to make sure that if another monitor comes back to life it will form quorum
[15:06] <joao> for instance, say you add a new monitor to a 1-node cluster; the existing monitor will drop off quorum, and while the other monitor is in the process of being manually started, it will kill itself
[15:06] <tnt_> whack-a-mole with the mons !
[15:06] <joao> the user will have to start the new monitor and then start the previously-existing monitor
[15:07] <joao> when there would be absolutely no need for that
[15:07] <SpComb> by suicide I presumably meant something along the lines of going passive
[15:07] <joao> sure, this could be solved with timeouts: given timeout goes off, we've been out of quorum for a few minutes, let's die
[15:07] <joao> ah
[15:07] <joao> sorry, wasn't obvious you meant that :)
[15:08] <joao> SpComb, so if you are alone you go passive; what if all the remaining cluster is dead?
[15:08] <joao> and you're the absolute last one?
[15:08] <joao> how do you figure that one out?
[15:08] <SpComb> but I don't know what I'm talking about, I should take a better look some day at how ganeti really behaves
[15:09] <joao> the thing is, without user intervention stating "this one is now the leader for all intents and purposes", we don't feel like that's the greatest idea ever
[15:09] <joao> poses too many questions we may not be able to figure out answers for
[15:10] <SpComb> ganeti offers a `gnt-masterd --no-voting` option to force a node as master; you need it if you cold-start a two-node cluster and the other node isn't online
[15:11] <joao> tnt_, do you have any idea what leveldb version were you guys running prior to the upgrade to 1.14?
[15:12] * vbellur (~vijay@122.167.227.36) has joined #ceph
[15:12] <joao> SpComb, let me add ganeti to my "to-read" pile
[15:12] <tnt_> joao: I can find out. gimme a sec.
[15:13] <joao> I'm pretty sure I went through it a while back but can't remember any of it, so I'll brush up and check out whether there's anything we could put to good use :)
[15:13] <tnt_> joao: 0+20120125.git3c8be10-1 from the default precise repo. Only static linked.
[15:14] <SpComb> I'm not sure its really an environment where ceph would be interesting, but ganeti is perfectly viable for really small setups with just two nodes, in that having two nodes is far better than one
[15:14] <SpComb> whereas with ceph having two nodes is far worse than having just one :P
[15:14] <joao> SpComb, not true
[15:15] <joao> although most people will smack me for saying this, two nodes are great if you're only concerned about replication of mon state
[15:15] <joao> better than running with just one, much like this one poor soul on the mailing list that had his leveldb corrupted on a single-mon cluster
[15:16] <joao> I'm pretty sure he'd think that having a two-mon cluster would be waaaay better than running on a 1-mon cluster
[15:16] <joao> tnt_, ty
[15:17] <darkfader> joao: ++
[15:18] * primechuck (~primechuc@host-95-2-129.infobunker.com) has joined #ceph
[15:18] * bandrus (~Adium@h-199-142.a137.corp.bahnhof.se) Quit (Quit: Leaving.)
[15:19] <joao> tnt_, ty
[15:19] <joao> oh, already had done this
[15:19] <tnt_> lol :)
[15:19] <joao> I'm just in awe with this data
[15:20] * ikrstic (~ikrstic@fo-f-130.180.230.121.targo.rs) has joined #ceph
[15:21] <joao> tnt_, did you guys upgrade to the latest stable firefly?
[15:22] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) Quit (Remote host closed the connection)
[15:22] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[15:28] * tws_ (~traviss@rrcs-24-123-86-154.central.biz.rr.com) has joined #ceph
[15:29] * mnash (~chatzilla@vpn.expressionanalysis.com) has joined #ceph
[15:31] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[15:34] * kitz (~kitz@admin163-7.hampshire.edu) Quit (Quit: kitz)
[15:39] * Hell_Fire__ (~HellFire@123-243-155-184.static.tpgi.com.au) has joined #ceph
[15:40] * Hell_Fire_ (~HellFire@123-243-155-184.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[15:40] * ikrstic (~ikrstic@fo-f-130.180.230.121.targo.rs) Quit (Read error: Operation timed out)
[15:48] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[15:48] * tdasilva (~quassel@nat-pool-bos-t.redhat.com) has joined #ceph
[15:53] * markbby (~Adium@168.94.245.2) has joined #ceph
[15:54] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[15:56] * ismell (~ismell@host-64-17-89-79.beyondbb.com) has joined #ceph
[15:59] * i_m (~ivan.miro@gbibp9ph1--blueice2n1.emea.ibm.com) Quit (Quit: Leaving.)
[16:00] * ninkotech__ (~duplo@cst-prg-45-6.cust.vodafone.cz) Quit (Ping timeout: 480 seconds)
[16:01] <johnfoo> joao: so it turns out the ceph-osd daemons are crashing on an ENOENT error
[16:02] <johnfoo> #13 0x00000000008b1cba in JournalingObjectStore::journal_replay (this=this@entry=0x338a000, fs_op_seq=<optimized out>) at os/JournalingObjectStore.cc:86
[16:02] <johnfoo> 86 os/JournalingObjectStore.cc: No such file or directory.
[16:02] <johnfoo> the actual assert is on frame 10
[16:03] * fdmanana (~fdmanana@bl4-179-170.dsl.telepac.pt) has joined #ceph
[16:04] <johnfoo> ah yes it is trying to mount a journal to replay and failing
[16:04] <johnfoo> hence the crash
[16:06] * lpabon (~lpabon@nat-pool-bos-t.redhat.com) has joined #ceph
[16:06] <tnt_> joao: I currently use d43e7113dd501aea1db33fdae30d56e96e9c3897 which is the latest commit of origin/firefly when I built the package last week. I added a few other commits above that, mostly for rgw bug fixing.
[16:07] <wonko_be> hi all, is there any way to prioritize client traffic over backfilling? If I remove an OSD in my cluster, performance drops to a tenth while the backfill is happening
[16:09] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[16:10] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[16:12] * yguang11 (~yguang11@vpn-nat.corp.tw1.yahoo.com) has joined #ceph
[16:12] <Vacum_> wonko_be: check out "osd recovery op priority", "osd client op priority", "osd max backfills", "osd recovery max active" on https://ceph.com/docs/master/rados/configuration/osd-config-ref/
[16:12] <Vacum_> wonko_be: is your network saturated by the backfilling?
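A sketch of dialing those options down at runtime; the values are conservative examples, not recommendations from this channel, and the same keys can be persisted under [osd] in ceph.conf:

    # fewer concurrent backfill/recovery operations per osd
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    # lower recovery priority relative to client IO
    ceph tell osd.* injectargs '--osd-recovery-op-priority 1 --osd-client-op-priority 63'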
[16:14] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[16:14] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[16:17] <wonko_be> no, it isn't
[16:17] * bkopilov (~bkopilov@149.78.111.204) has joined #ceph
[16:18] * rwheeler (~rwheeler@nat-pool-tlv-u.redhat.com) Quit (Quit: Leaving)
[16:18] <wonko_be> i tried it first with a "larger" cluster (6 nodes, 3 disks/osds per node), and now even on smaller setups, the performance drop is huge
[16:20] <wonko_be> my cluster network is split from my client network, the cluster network is bonded over 3 x 1Gbps per node
[16:20] <wonko_be> the client network is 1 Gbps per node, the client is a simple fio test, 70/30% rw split 8k blocks
[16:25] <tnt_> I don't think the network speed has anything to do with it. It's probably just killing IO with seeks ?
[16:25] <tnt_> I mean Disk IO.
[16:26] <wonko_be> iowait is pretty high, but that is to be expected I assume. But shouldn't client access get preference?
[16:27] <wonko_be> I expect to take a certain hit in performance, but it is really dropping massively, I only get 40-50 iops, while I have +400 iops when there is no backfilling going
[16:28] <wonko_be> and i see this with either setup, 3 or 6 nodes, doesn't really matter how many OSDs...
[16:30] * boichev2 (~boichev@213.169.56.130) has joined #ceph
[16:30] <johnfoo> how hard would it be to migrate BTRFS OSDs to XFS ?
[16:30] <johnfoo> tear them down one by one and let ceph rebuild ?
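That is roughly the usual approach; a heavily hedged sketch of replacing one OSD at a time (osd id 5 and the device are placeholders, and waiting for recovery to finish between steps is the important part):

    ceph osd out 5                    # let data drain off the osd
    # ... wait until "ceph -s" reports the cluster healthy again ...
    # stop the ceph-osd daemon for id 5 (init-system specific)
    ceph osd crush remove osd.5
    ceph auth del osd.5
    ceph osd rm 5
    # recreate it on xfs; ceph-disk (and ceph-deploy osd) accept --fs-type
    ceph-disk prepare --fs-type xfs /dev/sdX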
[16:31] <tnt_> johnfoo: having issue with btrfs ?
[16:31] <johnfoo> probably
[16:31] <johnfoo> the stack traces shows the OSD crashed on an unexpected error
[16:31] <johnfoo> while trying to replay a journal
[16:31] <johnfoo> filestore(/var/lib/ceph/osd/ceph-5) error (17) File exists not handled on operation 22 (41209638.0.3, or op 3, counting from 0)
[16:31] <johnfoo> etc...
[16:31] <johnfoo> i can give you the coredump if you want
[16:32] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[16:33] * boichev (~boichev@213.169.56.130) Quit (Ping timeout: 480 seconds)
[16:34] <johnfoo> it's throwing the assert error at FileStore.cc:2566, which is basically the last resort assert
[16:34] <tnt_> Sorry, I never used btrfs ... I was just curious because I was wondering about creating new OSD on btrfs ...
[16:35] <johnfoo> it worked fine with a 0.79, 3.13 combo
[16:35] <johnfoo> there was a lot of commits in 3.15 for btrfs
[16:35] <johnfoo> so it may not be ceph's fault
[16:35] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[16:47] * gregmark (~Adium@cet-nat-254.ndceast.pa.bo.comcast.net) has joined #ceph
[16:48] * xarses_ (~andreww@c-24-23-183-44.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[16:54] * kevinc (~kevinc__@client65-40.sdsc.edu) has joined #ceph
[17:01] * b0e (~aledermue@juniper1.netways.de) Quit (Quit: Leaving.)
[17:02] <Anticimex> wonko_be: there's a ticket on that IO prioritization
[17:02] <Anticimex> (or lack there of)
[17:02] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) Quit (Read error: Operation timed out)
[17:03] <Anticimex> wonko_be: do you think http://tracker.ceph.com/issues/8580 applies?
[17:05] * japuzzo (~japuzzo@ool-4570886e.dyn.optonline.net) has joined #ceph
[17:06] <Anticimex> 8580 would be nice to see fix for :)
[17:07] <wonko_be> let me check, Anticimex
[17:09] <wonko_be> reading the solution, it would apply to my case, I guess.
[17:09] <wonko_be> i just thought I was doing something wrong, as the impact from a single failed/removed/out'd osd is choking the complete cluster
[17:10] <Anticimex> have you seen and configured the priority and max backfills options Vacum mentioned above?
[17:10] <wonko_be> it just isn't usable any more during backfilling, which kind of makes it useless
[17:10] <wonko_be> yes
[17:11] <wonko_be> osd max backfills is set to 3 now, recovery max active too, and the client priority is set to 63, and the recovery priority is set to 1
[17:12] <wonko_be> didn't really change a lot
[17:16] <wonko_be> feel free to check: https://www.dropbox.com/s/cikcosmisiw80o1/Screenshot%202014-07-07%2017.15.49.png
[17:16] <wonko_be> the first drop is when I marked an OSD out, the small rise around 17:04 is when the backfill is complete, then I mark it back in
[17:17] <wonko_be> the graph shows iops on the client where I run the fio
[17:18] <wonko_be> but, as there is a ticket for this, I assume this is then expected behavior? Or shouldn't the impact be this huge?
[17:19] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) has joined #ceph
[17:19] * Zethrok_ (~martin@95.154.26.34) Quit (Read error: Connection reset by peer)
[17:28] * JC (~JC@AMontpellier-651-1-32-204.w90-57.abo.wanadoo.fr) has joined #ceph
[17:30] * narb (~Jeff@38.99.52.10) has joined #ceph
[17:31] <tnt_> Mmm, simulation of the data movements when enabling the HASHPSPOOL flag shows ... pretty much everything moving :p
[17:34] * kevinc (~kevinc__@client65-40.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[17:39] * kevinc (~kevinc__@client65-40.sdsc.edu) has joined #ceph
[17:40] * KaZeR (~kazer@c-67-161-64-186.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[17:41] * askanis1 (~askanis@HSI-KBW-078-043-004-079.hsi4.kabel-badenwuerttemberg.de) Quit (Quit: Leaving.)
[17:42] * Snow_Flakey (~snow@a82-161-133-247.adsl.xs4all.nl) has joined #ceph
[17:42] * sjm (~sjm@pool-72-76-115-220.nwrknj.fios.verizon.net) has joined #ceph
[17:43] <Snow_Flakey> Hello everybody.
[17:44] <Snow_Flakey> I am using ceph-deploy v1.5.6 / 1.5.7 but the "fix" for it downloading el6 packages on a RHEL7 machine still seems unresolved
[17:44] <alfredodeza> Snow_Flakey: do you have the issue number handy?
[17:45] <Snow_Flakey> No, I looked in the release notes
[17:45] <alfredodeza> Snow_Flakey: there is always a workaround :)
[17:45] <Snow_Flakey> "Fix RHEL7 installation issue that was pulling el6 packages (David Vossel)"
[17:45] <alfredodeza> wait
[17:45] * alfredodeza checks
[17:46] <alfredodeza> right so that was 1.5.6 and you say you are still hitting that problem?
[17:46] <alfredodeza> Snow_Flakey: can you share some output?
[17:46] <Snow_Flakey> 1.5.7 has the same issue
[17:47] <Snow_Flakey> yes, but how many lines counts as spamming?
[17:47] <alfredodeza> there seems to be something else that is on its way to get fixed, but need a paste
[17:47] * lalatenduM (~lalatendu@121.244.87.117) Quit (Quit: Leaving)
[17:47] <alfredodeza> Snow_Flakey: ah not here
[17:47] <alfredodeza> try fpaste.org
[17:47] <alfredodeza> and paste as much as you can there and share the link back :)
[17:47] <johnfoo> 0> 2014-07-07 17:45:53.886887 7fe171c257c0 -1 os/FileStore.cc: In function 'virtual int FileStore::mount()' thread 7fe171c257c0 time 2014-07-07 17:45:53.885737
[17:47] <johnfoo> os/FileStore.cc: 1234: FAILED assert(c > prev)
[17:47] <johnfoo> it gets better :v:
[17:47] <johnfoo> at least it's a known error
[17:48] * xarses (~andreww@c-24-23-183-44.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[17:48] * yguang11 (~yguang11@vpn-nat.corp.tw1.yahoo.com) Quit (Ping timeout: 480 seconds)
[17:48] * lcavassa (~lcavassa@78.25.240.221) Quit (Remote host closed the connection)
[17:48] <Snow_Flakey> http://fpaste.org/116046/
[17:49] <alfredodeza> ah
[17:49] <alfredodeza> yes
[17:49] <alfredodeza> I know what this is
[17:49] <alfredodeza> Snow_Flakey: this *just* got merged in today
[17:49] <alfredodeza> I will have a new release any day now
[17:50] <alfredodeza> however, there is a way around this
[17:50] <alfredodeza> one sec
[17:52] * analbeard1 (~shw@support.memset.com) Quit (Quit: Leaving.)
[17:54] * KaZeR (~kazer@64.201.252.132) has joined #ceph
[17:57] * yuriw (~Adium@c-76-126-35-111.hsd1.ca.comcast.net) has joined #ceph
[17:58] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) Quit (Quit: Leaving)
[18:01] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) has joined #ceph
[18:02] * alfredodeza still working on it
[18:02] <Snow_Flakey> take your time
[18:05] * hasues (~hasues@kwfw01.scrippsnetworksinteractive.com) has joined #ceph
[18:08] * kevinc (~kevinc__@client65-40.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[18:08] <alfredodeza> Snow_Flakey: can you try: ceph-deploy install --repo-url http://ceph.com/rpm-firefly/rhel7/ ceph-mon1 ceph-ods1 ceph-ods2
[18:08] <alfredodeza> that should work, if it doesn't paste me the output
[18:09] <Snow_Flakey> Will try
[18:09] * flaxy (~afx@78.130.171.69) Quit (Quit: WeeChat 0.4.2)
[18:10] <joao> tnt_, still around?
[18:12] <Snow_Flakey> http://fpaste.org/116057/40474954/
[18:14] <Snow_Flakey> I guess I am missing some generic python stuff now
[18:14] <alfredodeza> aha
[18:14] <alfredodeza> it looks like it
[18:14] <alfredodeza> but
[18:14] <alfredodeza> that should be available no?
[18:14] <alfredodeza> try and do: ceph-deploy pkg --install python-flask ceph-mon1
[18:16] <Snow_Flakey> not available, do I need epel?
[18:16] <alfredodeza> yes
[18:16] <Snow_Flakey> will add epel and try again
[18:16] <alfredodeza> !norris Snow_Flakey
[18:16] <kraken> There used to be a street named after Snow_Flakey, but it was changed because nobody crosses Snow_Flakey and lives.
[18:16] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Quit: Leaving)
[18:18] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Quit: Ex-Chat)
[18:19] * fdmanana (~fdmanana@bl4-179-170.dsl.telepac.pt) Quit (Quit: Leaving)
[18:20] * ScOut3R (~ScOut3R@catv-80-99-64-8.catv.broadband.hu) Quit (Read error: Operation timed out)
[18:22] * xarses (~andreww@12.164.168.117) has joined #ceph
[18:25] <johnfoo> hey
[18:25] <johnfoo> having multiple monitor processes on a single box
[18:25] <johnfoo> yes/no/terrible idea ?
[18:26] * angdraug (~angdraug@12.164.168.117) has joined #ceph
[18:27] <iggy> johnfoo: defeats the purpose
[18:27] <johnfoo> yeah i know
[18:27] <johnfoo> but would it work ?
[18:28] <iggy> if configured appropriately (but still terrible idea)
[18:28] * rweeks (~rweeks@pat.hitachigst.com) has joined #ceph
[18:31] * aldavud (~aldavud@213.55.176.228) has joined #ceph
[18:32] * fdmanana (~fdmanana@bl4-179-170.dsl.telepac.pt) has joined #ceph
[18:33] * hybrid512 (~walid@195.200.167.70) Quit (Quit: Leaving.)
[18:35] * hybrid512 (~walid@195.200.167.70) has joined #ceph
[18:38] * reed (~reed@75-101-54-131.dsl.static.sonic.net) has joined #ceph
[18:38] * hybrid512 (~walid@195.200.167.70) Quit ()
[18:40] * kevinc (~kevinc__@client65-40.sdsc.edu) has joined #ceph
[18:40] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[18:42] <tnt_> joao: yup
[18:44] <joao> tnt_, the next time you notice mem usage growing, mind running 'ceph heap stats && ceph heap release && ceph heap stats' and see if memory usage drops? :)
[18:45] <joao> on the peons I mean, so it would probably be a 'ceph tell mon.foo heap ...'
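Spelled out, the per-monitor form of that one-liner would presumably be (keeping joao's placeholder mon id "foo"):

    ceph tell mon.foo heap stats
    ceph tell mon.foo heap release
    ceph tell mon.foo heap stats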
[18:45] * vmx (~vmx@p508A5789.dip0.t-ipconnect.de) Quit (Quit: Leaving)
[18:49] * flaxy (~afx@78.130.171.69) has joined #ceph
[18:50] <Snow_Flakey> http://fpaste.org/116071/ any workaround for python-jinja2? EPEL for RHEL7 is still in beta
[18:51] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) Quit (Quit: Ex-Chat)
[18:52] * askanis (~askanis@2001:4dd0:ff00:84de:99c6:5cbd:465a:e456) has joined #ceph
[18:53] * Nacer_ (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[18:53] * Nacer_ (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[18:53] * Nacer_ (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[18:55] * fdmanana (~fdmanana@bl4-179-170.dsl.telepac.pt) Quit (Quit: Leaving)
[18:55] <tnt_> joao: Ok, I'll try to remember when I see it next. But since the log was over a matter of months, it might take a while :p
[18:56] * qhartman (~qhartman@64.207.33.50) has joined #ceph
[18:56] <joao> tnt_, thanks
[18:57] <joao> we'll take anything we can get :)
[18:58] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[18:59] * fdmanana (~fdmanana@bl4-179-170.dsl.telepac.pt) has joined #ceph
[19:00] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Ping timeout: 480 seconds)
[19:01] * gregsfortytwo (~Adium@38.122.20.226) Quit (Quit: Leaving.)
[19:01] * gregsfortytwo (~Adium@2607:f298:a:607:50c1:d42:1595:3af3) has joined #ceph
[19:01] * zerick (~eocrospom@190.187.21.53) has joined #ceph
[19:01] * Nacer_ (~Nacer@252-87-190-213.intermediasud.com) Quit (Ping timeout: 480 seconds)
[19:03] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) has joined #ceph
[19:04] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) Quit ()
[19:04] * alexbligh1 (~alexbligh@89-16-176-215.no-reverse-dns-set.bytemark.co.uk) Quit (Quit: Terminated with extreme prejudice - dircproxy 1.0.5)
[19:04] * alexbligh1 (~alexbligh@89-16-176-215.no-reverse-dns-set.bytemark.co.uk) has joined #ceph
[19:06] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[19:07] * lcavassa (~lcavassa@faun.canonical.com) has joined #ceph
[19:08] <Snow_Flakey> Okay, that was enough fun for one day. I'll try to resolve python-jinja2 another day
[19:08] * Snow_Flakey (~snow@a82-161-133-247.adsl.xs4all.nl) Quit (Quit: Snow_Flakey)
[19:11] * askanis (~askanis@2001:4dd0:ff00:84de:99c6:5cbd:465a:e456) Quit (Quit: Leaving.)
[19:12] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[19:12] * Cube (~Cube@12.248.40.138) has joined #ceph
[19:14] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Ping timeout: 480 seconds)
[19:16] * blSnoopy (~snoopy@miram.persei.mw.lg.virgo.supercluster.net) Quit (Remote host closed the connection)
[19:16] <topro> btw. is there a 0.80.2 stable service release scheduled (i.e. does anyone care about service releases)?
[19:18] * sigsegv (~sigsegv@188.25.121.7) has joined #ceph
[19:18] * michalefty (~micha@188-195-129-145-dynip.superkabel.de) has joined #ceph
[19:20] * marrusl (~mark@faun.canonical.com) Quit (Quit: sync && halt)
[19:23] * KB_ (~oftc-webi@cpe-74-137-224-213.swo.res.rr.com) has joined #ceph
[19:23] * michalefty (~micha@188-195-129-145-dynip.superkabel.de) Quit (Quit: Leaving.)
[19:24] * blSnoopy (~snoopy@miram.persei.mw.lg.virgo.supercluster.net) has joined #ceph
[19:24] * blSnoopy is now known as Snoopy
[19:24] * Snoopy is now known as blSnoopy
[19:27] * houkouonchi-home (~linux@pool-71-189-160-82.lsanca.fios.verizon.net) has joined #ceph
[19:29] * aldavud (~aldavud@213.55.176.228) Quit (Ping timeout: 480 seconds)
[19:30] <joao> tnt_, still around? :p
[19:32] <joao> tnt_, would you mind injecting '--leveldb_compression=false' on your monitors and see if cpu usage drops to emperor levels again?
[19:32] <KB_> Hi all - hoping I can get a hand with multi-region/multi-zone concepts... I've got 2 radosgw regions that each have 2 zones. The zone replication and region replication all seems to work just fine, but when I try to write to the 2nd region's gateway for the master zone, I'm getting an HTTP 400, with an error in the radosgw log of: "location constraint () doesn't match region (us-east-2)"
[19:32] <joao> I'm pretty sure that's what caused the cpu spike in firefly
[19:33] * askanis (~askanis@2001:4dd0:ff00:84de:40e4:819:394e:3797) has joined #ceph
[19:38] * askanis (~askanis@2001:4dd0:ff00:84de:40e4:819:394e:3797) Quit ()
[19:40] * lcavassa (~lcavassa@faun.canonical.com) Quit (Ping timeout: 480 seconds)
[19:41] * askanis (~askanis@2001:4dd0:ff00:84de:1900:5b4a:6cf3:5425) has joined #ceph
[19:42] <KB_> I had assumed that when creating a bucket, it would end up in the region/master zone of the gateway I'm attached to... is that not accurate?
[19:43] <tnt_> joao: sure, just gimme 10 min.
[19:45] <joshd> KB_: iirc you need to write to master zone directly
[19:46] <KB_> yep, I am... region us-east-2, zone 1
[19:47] <KB_> when I write to us-east-1, zone 1, no issues - everything works fine
[19:47] <KB_> data replicates to zone 2, and metadata/bucket index replicates to us-east-2 region
[19:47] <KB_> but when I write to us-east-2, just get that location constraint error
[19:47] <KB_> us-east-1 is the master region
[19:51] * imriz (~imriz@212.199.155.208.static.012.net.il) has joined #ceph
[19:52] <joshd> KB_: yeah, writes need to go to the master zone of the master region
[19:52] * kevinc (~kevinc__@client65-40.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[19:53] <KB_> ok... so how does data get to the secondary region?
[19:54] <joshd> data isn't synchronized between regions - just between zones within a region
[19:54] <KB_> right, that was my understanding as well...
[19:54] <joshd> users and the bucket namespace are synced between regions so you can have a single namespace
[19:54] <KB_> yep, and users appear to be synced
[19:55] * vmx (~vmx@dslb-084-056-033-110.pools.arcor-ip.net) has joined #ceph
[19:55] * Pedras (~Adium@216.207.42.132) has joined #ceph
[19:56] <KB_> so the big question is "how do I write to the secondary region"? For example, if my master region/master zone is in New York (for example), and my secondary region/master zone is in California, I don't want writes from inside my CA datacenter to be sent to NY
[19:56] <KB_> that would negate the benefit of regions... unless I'm conceptually way off base!
[19:57] <joshd> so iirc it's just bucket and user create/delete that need to go to the master region
[19:57] <KB_> yes
[19:57] <KB_> ah -
[19:57] <KB_> user AND bucket
[19:57] <KB_> interesting
[19:57] <tnt_> joao: huh ... should ceph tell mon.* injectargs '--leveldb_compression=false' work ?
[19:57] * dmick1 (~dmick@2607:f298:a:607:cd20:130:9c34:bf4) has joined #ceph
[19:58] <joshd> so writing objects once the buckets exist should happen within each dc
[19:58] <KB_> that might explain my issue. I expected that connecting to the secondary region and creating a bucket would end up there, but the unified namespace requires buckets to exist everywhere as well...
[19:58] * sage___ (~quassel@gw.sepia.ceph.com) has joined #ceph
[19:59] * sjust (~sjust@gw.sepia.ceph.com) has joined #ceph
[19:59] <joshd> I thought the secondary region would forward the bucket create to the master region, but perhaps that hasn't been implemented yet
[19:59] * sarob (~sarob@nat-dip27-wl-a.cfw-a-gci.corp.yahoo.com) has joined #ceph
[19:59] * sjust (~sjust@gw.sepia.ceph.com) Quit ()
[20:02] <KB_> so as a follow up - how do I set the location constraint during bucket creation in the master region/master zone to tell the bucket that it "belongs" to the secondary region?
[20:03] * imriz (~imriz@212.199.155.208.static.012.net.il) Quit (Ping timeout: 480 seconds)
[20:06] * sroy (~sroy@2607:fad8:4:6:6e88:14ff:feff:5374) Quit (Quit: Quitte)
[20:06] <joshd> KB_: similar to the s3 api: http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUT.html http://tracker.ceph.com/issues/2169#note-3
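Per the S3 API doc joshd links, the bucket-create request carries a CreateBucketConfiguration body whose LocationConstraint names the target region; a rough sketch (the bucket name is a placeholder, and using s3cmd is an assumption about the client):

    <CreateBucketConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
      <LocationConstraint>us-east-2</LocationConstraint>
    </CreateBucketConfiguration>

    # or, with s3cmd pointed at the gateway:
    s3cmd mb --bucket-location=us-east-2 s3://example-bucket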
[20:07] <hedin> i'm trying to create a cluster, but I keep getting an error when running ceph-deploy mon create-initial and I have a hard time figuring out why... https://dpaste.de/0XMW
[20:08] * sputnik13 (~sputnik13@207.8.121.241) has joined #ceph
[20:09] <alfredodeza> hedin: is it possible that you have more than one mon running there in ceph1 ?
[20:10] <alfredodeza> I am thinking that maybe you have tried this one too many times and now you have stale monitors there
[20:10] <alfredodeza> can you check that nothing ceph-related is running in that box hedin?
[20:11] <hedin> At some point I had installed ceph both from the ubuntu repos and via ceph-deploy, but I have apt-get remove --purge'd all ceph packages and run ceph-deploy purge and purgedata
[20:12] * sarob (~sarob@nat-dip27-wl-a.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[20:12] <hedin> ps aux|grep -i ceph does not return anything interesting
[20:12] * sarob (~sarob@2001:4998:effd:600:dc17:195c:60fa:49e2) has joined #ceph
[20:14] * bandrus (~Adium@m90-141-181-147.cust.tele2.se) has joined #ceph
[20:15] <alfredodeza> oh
[20:15] <alfredodeza> hedin: you are using ceph-deploy 1.4!
[20:15] <alfredodeza> do upgrade :)
[20:16] <hedin> I have tried to add the ceph.com repos, but it seems to be missing the ceph-deploy package for ubuntu-14.04, so I installed it from the ubuntu universe repository and it contains the 1.4 release
[20:16] <alfredodeza> weird
[20:16] <hedin> I agree
[20:17] <alfredodeza> hedin: we do have a trusty repo http://ceph.com/debian-firefly/dists/trusty/
[20:18] <alfredodeza> how odd, trusty has not been updated
[20:19] * ikrstic (~ikrstic@109-93-126-224.dynamic.isp.telekom.rs) has joined #ceph
[20:20] * askanis (~askanis@2001:4dd0:ff00:84de:1900:5b4a:6cf3:5425) Quit (Quit: Leaving.)
[20:20] * sarob (~sarob@2001:4998:effd:600:dc17:195c:60fa:49e2) Quit (Ping timeout: 480 seconds)
[20:21] <KB_> joshd: Thanks so much for your explanation and help!
[20:22] <hedin> $ cat /etc/apt/sources.list.d/ceph.list
[20:22] <hedin> deb https://ceph.com/debian-firefly trusty main
[20:22] <alfredodeza> ok sorry, yes, that should work
[20:22] <hedin> then apt-get update && apt-get upgrade and ceph-deploy is not up for an upgrade from 1.4
[20:22] <alfredodeza> but you're telling me that doesn't work?
[20:22] <alfredodeza> no way
[20:22] <joshd> KB_: you're welcome!
[20:23] <hedin> to me it looks like ceph-deploy is not present in that repo..
[20:24] * sputnik13 (~sputnik13@207.8.121.241) Quit (Quit: My MacBook has gone to sleep. ZZZzzz???)
[20:24] <alfredodeza> for trusty
[20:24] <alfredodeza> yeah
[20:24] <alfredodeza> wow
[20:24] * alfredodeza fixes
[20:26] * sarob (~sarob@nat-dip27-wl-a.cfw-a-gci.corp.yahoo.com) has joined #ceph
[20:29] <primechuck> Is there any good way to kick up RBD debugging in libvirt? Having an issue where some pools go to launch with an RBD volume but they just spin at 100% CPU for hours... slowly reading the disk.
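A common way to get librbd debug output out of qemu/libvirt (a sketch, not the only option) is to raise client-side logging in ceph.conf on the hypervisor; the log path below is a placeholder and must be writable by the qemu user:

    [client]
        debug rbd = 20
        debug ms = 1
        log file = /var/log/ceph/qemu-rbd.$pid.log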
[20:29] * davidzlap (~Adium@ip68-4-173-198.oc.oc.cox.net) has joined #ceph
[20:32] * askanis (~askanis@2001:4dd0:ff00:84de:e0c2:86ab:e4ea:ebc3) has joined #ceph
[20:32] <tnt_> joao: Somehow I can't figure out the proper command to inject it ... they all say EINVAL ...
[20:34] <alfredodeza> hedin: yeah, we have not had a trusty ceph-deploy release it seems
[20:34] <alfredodeza> :(
[20:35] <hedin> okay... then I guess I'll have to reinstall with 12.04?
[20:35] <alfredodeza> I don't think that should be an issue still
[20:35] <alfredodeza> something is wrong there
[20:36] <alfredodeza> can you paste me the logs of your output trying from the beginning ?
[20:36] <alfredodeza> e.g. new, install, mon create-initial
[20:36] <hedin> yes
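The from-scratch sequence alfredodeza is referring to would be roughly the following (using the ceph1 hostname from hedin's setup):

    ceph-deploy new ceph1
    ceph-deploy install ceph1
    ceph-deploy mon create-initial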
[20:38] * valeech (~valeech@pool-71-171-123-210.clppva.fios.verizon.net) has joined #ceph
[20:43] * rendar (~I@host24-115-dynamic.57-82-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[20:44] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) has joined #ceph
[20:46] * rendar (~I@host24-115-dynamic.57-82-r.retail.telecomitalia.it) has joined #ceph
[20:47] <primechuck> Is there an issue with running a monitor+OSD+qemu/kvm on the same box?
[20:47] * toMeloos (~tom@5ED28C85.cm-7-3c.dynamic.ziggo.nl) has joined #ceph
[20:47] * askanis (~askanis@2001:4dd0:ff00:84de:e0c2:86ab:e4ea:ebc3) Quit (Quit: Leaving.)
[20:48] * askanis (~askanis@2001:4dd0:ff00:84de:75a5:719f:cb55:2a81) has joined #ceph
[20:52] * askanis (~askanis@2001:4dd0:ff00:84de:75a5:719f:cb55:2a81) Quit ()
[20:52] * bandrus (~Adium@m90-141-181-147.cust.tele2.se) Quit (Quit: Leaving.)
[20:52] * askanis (~askanis@2001:4dd0:ff00:84de:317e:274a:a040:c4d3) has joined #ceph
[20:53] * bandrus (~Adium@m90-141-181-147.cust.tele2.se) has joined #ceph
[20:54] * bandrus (~Adium@m90-141-181-147.cust.tele2.se) Quit ()
[20:55] <hedin> alfredodeza: https://dpaste.de/zQK7
[20:57] <alfredodeza> ok
[20:57] <alfredodeza> hedin: do you have multiple networks on that box?
[20:57] <hedin> no
[20:57] <alfredodeza> argh
[20:57] * alfredodeza is running out of ideas
[20:57] <hedin> :s
[20:58] <joao> tnt_, oh, I just realized that for it to take effect the monitor must be restarted
[20:58] <alfredodeza> hedin: can you try to raise the log levels for the mon and try to start them and check the output?
[20:58] <hedin> installed packages: https://dpaste.de/6tNk
[20:58] * bandrus (~Adium@m90-141-181-147.cust.tele2.se) has joined #ceph
[20:58] <joao> tnt_, you'd have to adjust your ceph.conf with 'leveldb compression = false' and restart; just one monitor would be awesome
[20:59] <joao> my guess is that we'd be able to see that one monitor's cpu usage dropping compared to the others
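A sketch of the change joao describes, scoped to a single monitor (the mon id "foo" and the upstart restart command are assumptions about the setup):

    # in ceph.conf on the chosen monitor host:
    [mon.foo]
        leveldb compression = false

    # then restart just that monitor, e.g. on Ubuntu with upstart:
    sudo restart ceph-mon id=foo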
[20:59] <alfredodeza> hedin: http://fpaste.org/116111/14047595/
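Typical settings for raising monitor log levels look something like this (an assumption; not necessarily what the paste contains):

    [mon]
        debug mon = 20
        debug ms = 1
        debug paxos = 20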
[21:03] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[21:03] <tnt_> joao: Ok, I'll try that tomorrow. I just started migrating some pools to hashpspool=true so the monitors are under load ...
[21:03] <joao> tnt_, sure, thanks
[21:04] * markbby (~Adium@168.94.245.2) has joined #ceph
[21:05] * askanis (~askanis@2001:4dd0:ff00:84de:317e:274a:a040:c4d3) Quit (Quit: Leaving.)
[21:05] * kevinc (~kevinc__@client65-40.sdsc.edu) has joined #ceph
[21:10] * analbeard (~shw@support.memset.com) has joined #ceph
[21:13] * sommarnatt (~sommarnat@c83-251-199-89.bredband.comhem.se) has joined #ceph
[21:15] * bandrus (~Adium@m90-141-181-147.cust.tele2.se) Quit (Quit: Leaving.)
[21:16] * bandrus (~Adium@m90-141-181-147.cust.tele2.se) has joined #ceph
[21:19] <hedin> alfredodeza: https://dpaste.de/3dLB
[21:20] <alfredodeza> hedin: you need to try to start those mons manually now and tail the logs at /var/log/ceph/
[21:20] <alfredodeza> manually like running this: sudo ceph-mon --cluster ceph --mkfs -i ceph1 --keyring /var/lib/ceph/tmp/ceph-ceph1.mon.keyring
[21:20] <alfredodeza> on your ceph1 server
[21:22] * kevinc (~kevinc__@client65-40.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[21:25] * ikrstic (~ikrstic@109-93-126-224.dynamic.isp.telekom.rs) Quit (Quit: Konversation terminated!)
[21:26] <tnt_> I'm wondering: I see a lot of people setting up 10 Gbit links for ceph, sometimes dual 10G for client and backend networks ... but in my setup the OSDs don't get even _close_ to single 1G link speed (like maybe 10% of it, and that's when doing recovery ...).
[21:27] <tnt_> So has anyone doing 10G actually measured the benefits of doing so?
[21:27] <cookednoodles> add more nodes, stripe
[21:28] * bandrus (~Adium@m90-141-181-147.cust.tele2.se) Quit (Quit: Leaving.)
[21:28] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Quit: Leaving.)
[21:29] <tnt_> cookednoodles: do you have a 10G ceph cluster ? And do you really measure 10G speed usage on those links ?
[21:29] <cookednoodles> I don't, but I've seen it on the mailing lists
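A quick way to separate raw link capacity from what Ceph actually pushes during recovery (a sketch; tool choice and hostnames are assumptions):

    # raw throughput between two OSD hosts:
    iperf -s              # on osd-host-a
    iperf -c osd-host-a   # on osd-host-b

    # then compare per-interface traffic with Ceph's reported recovery rate:
    sar -n DEV 1          # or iftop / nload
    ceph -w               # recovery throughput shows up in the status lines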
[21:30] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[21:31] <hedin> alfredodeza: https://dpaste.de/p9hj and nothing gets written to /var/log/ceph/
[21:31] <alfredodeza> hedin: is that thing returning a non-zero exit status?
[21:32] <alfredodeza> what does $? say
[21:34] <hedin> it returns 127
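An exit status of 127 from the shell normally means the command itself wasn't found, so a reasonable next check (a sketch) is whether the ceph-mon binary is actually installed and on the PATH:

    which ceph-mon || echo "ceph-mon not in PATH"
    dpkg -l | grep -i ceph      # list installed ceph packages on Ubuntu/Debian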
[21:38] * askanis (~askanis@2001:4dd0:ff00:84de:f803:ef11:faa6:b914) has joined #ceph
[21:39] * askanis (~askanis@2001:4dd0:ff00:84de:f803:ef11:faa6:b914) Quit ()
[21:40] * sarob (~sarob@nat-dip27-wl-a.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[21:40] * sarob (~sarob@2001:4998:effd:600:c9d2:8de7:56ad:87ec) has joined #ceph
[21:40] * ircolle (~Adium@76.195.221.149) has joined #ceph
[21:43] * bandrus (~Adium@m90-141-181-147.cust.tele2.se) has joined #ceph
[21:44] <sommarnatt> tnt_: We're getting 1GB/s+ during recovery
[21:44] <sommarnatt> on 2x10Gbit/s
[21:45] <sommarnatt> That's with enterprise SSD as journals for each osd
[21:47] * Sysadmin88 (~IceChat77@94.4.20.0) has joined #ceph
[21:48] <tnt_> sommarnatt: how many OSD do you have per node ?
[21:48] <sommarnatt> 5 nodes, 10 osds on each
[21:48] * sarob (~sarob@2001:4998:effd:600:c9d2:8de7:56ad:87ec) Quit (Ping timeout: 480 seconds)
[21:48] <sommarnatt> and 2 enterprise ssd on each, 5 osds per ssd for journal
[21:49] <tnt_> sommarnatt: do you know which kind of objects you have? Here I have mostly RGW, which yields quite a lot of small objects (rather than a "few" large objects). Seems my pool that does RBD actually recovers much faster and uses the links a lot more because of the 4M object size.
[21:50] * dneary (~dneary@87-231-145-225.rev.numericable.fr) Quit (Read error: Operation timed out)
[21:50] <sommarnatt> Only RBD here, we're running qemu/kvm on it
[21:50] <sommarnatt> So yeah we've got 4M objects
[21:51] <tnt_> Ok, thanks for the details.
[21:52] <sommarnatt> np haven't tried RGW yet, so that might be quite different
[21:52] * bandrus (~Adium@m90-141-181-147.cust.tele2.se) Quit (Quit: Leaving.)
[21:53] <tnt_> well, it depends a lot on what you put on it :) We have a pool with video files and that works well too. OTOH we have a pool with, like, emails and that's zillions of small files and takes much longer when doing recovery.
[21:54] * sarob (~sarob@nat-dip27-wl-a.cfw-a-gci.corp.yahoo.com) has joined #ceph
[21:54] <sommarnatt> Oh yeah ;)
[21:56] <sommarnatt> I'm in the process of rsyncing around 600GB of data, mostly email files..
[21:56] <sommarnatt> that takes some time.
[21:56] <sommarnatt> especially since the old storage is rather crappy nowadays.
[21:57] <tnt_> hehe :)
[21:59] * BManojlovic (~steki@cable-94-189-160-74.dynamic.sbb.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[22:00] * kevinc (~kevinc__@client65-40.sdsc.edu) has joined #ceph
[22:00] <alfredodeza> hedin: I am out of ideas :/ you could try the mailing list
[22:00] * japuzzo (~japuzzo@ool-4570886e.dyn.optonline.net) Quit (Quit: Leaving)
[22:02] * sarob (~sarob@nat-dip27-wl-a.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[22:04] * BManojlovic (~steki@cable-94-189-160-74.dynamic.sbb.rs) has joined #ceph
[22:05] <hedin> alfredodeza: i'm going to throw 12.04 on the server and see if it works
[22:06] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[22:06] * sarob (~sarob@nat-dip27-wl-a.cfw-a-gci.corp.yahoo.com) has joined #ceph
[22:09] * BManojlovic (~steki@cable-94-189-160-74.dynamic.sbb.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[22:10] * BManojlovic (~steki@cable-94-189-160-74.dynamic.sbb.rs) has joined #ceph
[22:11] * tdasilva (~quassel@nat-pool-bos-t.redhat.com) Quit (Remote host closed the connection)
[22:12] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) Quit (Remote host closed the connection)
[22:18] * gillesMo (~atomic@151.172.24.109.rev.sfr.net) has joined #ceph
[22:19] * gillesMo (~atomic@151.172.24.109.rev.sfr.net) Quit ()
[22:20] * gillesMo (~atomic@151.172.24.109.rev.sfr.net) has joined #ceph
[22:20] <toMeloos> Hi everyone, I found the "osd set-overlay" and "osd remove-overlay" commands for writeback cache tiering but is there any way I can see the existing overlays?
[22:22] * gillesMo (~atomic@151.172.24.109.rev.sfr.net) Quit ()
[22:22] * dmick1 is now known as dmick
[22:22] <toMeloos> osd dump doesn't seem to provide any overlay info either :(
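In firefly the overlay relationship should still be visible in the pool entries of the osd dump (the exact field names, e.g. read_tier/write_tier, are from memory, so treat this as a sketch):

    ceph osd dump | grep pool            # pool lines carry tier/read_tier/write_tier/cache_mode
    ceph osd dump --format=json-pretty | grep -E 'tier|cache_mode'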
[22:23] * gillesMo (~atomic@151.172.24.109.rev.sfr.net) has joined #ceph
[22:23] * hyperbaba (~hyperbaba@80.74.175.250) has joined #ceph
[22:23] <hyperbaba> hi there
[22:24] * stj (~stj@tully.csail.mit.edu) Quit (Ping timeout: 480 seconds)
[22:26] <hyperbaba> I have a problem with a small ceph cluster. Upon adding a ceph-osd using ceph-deploy the new osd is marked down from the start.
[22:27] * gillesMo (~atomic@151.172.24.109.rev.sfr.net) Quit ()
[22:28] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) Quit (Quit: Leaving.)
[22:36] * ircolle (~Adium@76.195.221.149) Quit (Quit: Leaving.)
[22:36] * gillesMo (~gillesMo@00012912.user.oftc.net) has joined #ceph
[22:38] * kiwnix (~kiwnix@00011f91.user.oftc.net) Quit (Remote host closed the connection)
[22:38] * gillesMo (~gillesMo@00012912.user.oftc.net) Quit ()
[22:40] * stj (~stj@2001:470:8b2d:bb8:21d:9ff:fe29:8a6a) has joined #ceph
[22:43] * gillesMo (~atomic@151.172.24.109.rev.sfr.net) has joined #ceph
[22:47] * gillesMo (~atomic@00012912.user.oftc.net) Quit ()
[22:47] <xarses> hyperbaba: just describe your issue in a little more detail, paste some logs you have and maybe someone will pipe up
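For when hyperbaba returns, the kind of detail that usually helps with a freshly added OSD stuck down (a sketch, with placeholder ids/paths):

    ceph osd tree                                        # is the new osd in the crush map, and under which id?
    sudo tail -n 100 /var/log/ceph/ceph-osd.<id>.log     # on the new OSD's host
    ceph health detail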
[22:47] * gillesMo (~atomic@00012912.user.oftc.net) has joined #ceph
[22:48] * gillesMo (~atomic@00012912.user.oftc.net) Quit ()
[22:56] * JC (~JC@AMontpellier-651-1-32-204.w90-57.abo.wanadoo.fr) Quit (Quit: Leaving.)
[22:57] * kevinc (~kevinc__@client65-40.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[22:57] <hyperbaba> xarses: it is too late for me to engage in this. Going to sleep. Will try my luck tomorrow.
[22:58] * toMeloos (~tom@5ED28C85.cm-7-3c.dynamic.ziggo.nl) has left #ceph
[22:58] * fdmanana (~fdmanana@bl4-179-170.dsl.telepac.pt) Quit (Quit: Leaving)
[23:02] * CAPSLOCK2000 (~oftc@541856CC.cm-5-1b.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[23:08] * kevinc (~kevinc__@client65-40.sdsc.edu) has joined #ceph
[23:10] * hyperbaba (~hyperbaba@80.74.175.250) Quit (Remote host closed the connection)
[23:12] * sarob (~sarob@nat-dip27-wl-a.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[23:13] * fridudad (~oftc-webi@p5DD4E55B.dip0.t-ipconnect.de) has joined #ceph
[23:14] * rweeks (~rweeks@pat.hitachigst.com) Quit (Read error: Connection reset by peer)
[23:15] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) Quit (Remote host closed the connection)
[23:15] * rweeks (~rweeks@pat.hitachigst.com) has joined #ceph
[23:15] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) has joined #ceph
[23:17] * JC (~JC@AMontpellier-651-1-32-204.w90-57.abo.wanadoo.fr) has joined #ceph
[23:19] * xarses (~andreww@12.164.168.117) Quit (Read error: Operation timed out)
[23:19] * rendar (~I@host24-115-dynamic.57-82-r.retail.telecomitalia.it) Quit ()
[23:21] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) Quit (Read error: Operation timed out)
[23:22] * JC1 (~JC@AMontpellier-651-1-32-204.w90-57.abo.wanadoo.fr) has joined #ceph
[23:25] * bandrus (~Adium@m90-141-181-147.cust.tele2.se) has joined #ceph
[23:29] * JC (~JC@AMontpellier-651-1-32-204.w90-57.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[23:30] * rturk|afk is now known as rturk
[23:33] * rotbeard (~redbeard@2a02:908:df19:9900:76f0:6dff:fe3b:994d) has joined #ceph
[23:34] * sarob (~sarob@nat-dip27-wl-a.cfw-a-gci.corp.yahoo.com) has joined #ceph
[23:35] * xarses (~andreww@12.164.168.117) has joined #ceph
[23:37] * Hell_Fire (~HellFire@123-243-155-184.static.tpgi.com.au) has joined #ceph
[23:38] * lcavassa (~lcavassa@62.253.225.18) has joined #ceph
[23:38] * Hell_Fire__ (~HellFire@123-243-155-184.static.tpgi.com.au) Quit (Read error: Network is unreachable)
[23:42] * Karcaw (~evan@71-95-122-38.dhcp.mdfd.or.charter.com) Quit (Ping timeout: 480 seconds)
[23:43] * sarob (~sarob@nat-dip27-wl-a.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[23:43] * fridudad (~oftc-webi@p5DD4E55B.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[23:43] * yguang11 (~yguang11@vpn-nat.corp.tw1.yahoo.com) has joined #ceph
[23:44] * yguang11 (~yguang11@vpn-nat.corp.tw1.yahoo.com) Quit ()
[23:44] * yguang11 (~yguang11@vpn-nat.corp.tw1.yahoo.com) has joined #ceph
[23:46] * Cube (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[23:51] * Karcaw (~evan@71-95-122-38.dhcp.mdfd.or.charter.com) has joined #ceph
[23:52] * scuttlemonkey is now known as scuttle|afk
[23:53] * sjm (~sjm@pool-72-76-115-220.nwrknj.fios.verizon.net) has left #ceph
[23:54] * vmx (~vmx@dslb-084-056-033-110.pools.arcor-ip.net) Quit (Quit: Leaving)
[23:59] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.