#ceph IRC Log

IRC Log for 2016-05-02

Timestamps are in GMT/BST.

[0:00] <TMM> mlc, tlc, and interfaces are just red herrings
[0:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[0:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[0:21] * wwdillingham (~LobsterRo@209-6-222-74.c3-0.hdp-ubr1.sbo-hdp.ma.cable.rcn.com) Quit (Quit: wwdillingham)
[0:25] * BillyBobJohn (~BillyBobJ@95.211.169.35) has joined #ceph
[0:28] * darks (~DougalJac@06SAABYL1.tor-irc.dnsbl.oftc.net) has joined #ceph
[0:47] * Brochacho (~alberto@c-73-45-127-198.hsd1.il.comcast.net) has joined #ceph
[0:55] * BillyBobJohn (~BillyBobJ@06SAABYLW.tor-irc.dnsbl.oftc.net) Quit ()
[0:55] * Eman2 (~Swompie`@06SAABYMY.tor-irc.dnsbl.oftc.net) has joined #ceph
[0:57] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[0:58] * darks (~DougalJac@06SAABYL1.tor-irc.dnsbl.oftc.net) Quit ()
[1:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[1:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[1:03] * Arfed1 (~cheese^@06SAABYM6.tor-irc.dnsbl.oftc.net) has joined #ceph
[1:04] * madkiss (~madkiss@31.154.44.218) Quit (Quit: Leaving.)
[1:05] * rendar (~I@host112-137-dynamic.59-82-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[1:17] * billwebb (~billwebb@66.56.15.14) has joined #ceph
[1:19] * vata (~vata@cable-21.246.173-197.electronicbox.net) Quit (Quit: Leaving.)
[1:25] * Eman2 (~Swompie`@06SAABYMY.tor-irc.dnsbl.oftc.net) Quit ()
[1:25] * Frostshifter (~PappI@static-ip-85-25-103-119.inaddr.ip-pool.com) has joined #ceph
[1:32] * LeaChim (~LeaChim@host86-147-119-244.range86-147.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:33] * Arfed1 (~cheese^@06SAABYM6.tor-irc.dnsbl.oftc.net) Quit ()
[1:37] * LeaChim (~LeaChim@host86-147-119-244.range86-147.btcentralplus.com) has joined #ceph
[1:48] * LeaChim (~LeaChim@host86-147-119-244.range86-147.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:48] * billwebb (~billwebb@66.56.15.14) Quit (Quit: billwebb)
[1:55] * Frostshifter (~PappI@4MJAAELLV.tor-irc.dnsbl.oftc.net) Quit ()
[1:55] * Rehevkor (~Wijk@edwardsnowden1.torservers.net) has joined #ceph
[1:56] * wernerru (~oftc-webi@c-98-209-40-232.hsd1.mi.comcast.net) Quit (Quit: Page closed)
[1:56] * wernerru (~oftc-webi@c-98-209-40-232.hsd1.mi.comcast.net) has joined #ceph
[1:59] <via> do the mons need to have the bluestore experimental features enabled to support a bluestore osd?
[2:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[2:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[2:02] * oms101 (~oms101@p20030057EA1E7900C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[2:03] * Frostshifter (~Malcovent@destiny.enn.lu) has joined #ceph
[2:10] * oms101 (~oms101@p20030057EA06F800C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[2:13] <via> that doesn't appear to have made a difference
[2:25] * Rehevkor (~Wijk@76GAAEZMY.tor-irc.dnsbl.oftc.net) Quit ()
[2:26] * Xa (~dontron@tor-relay.zwiebeltoralf.de) has joined #ceph
[2:33] * Frostshifter (~Malcovent@4MJAAELMM.tor-irc.dnsbl.oftc.net) Quit ()
[2:33] * offender (~bildramer@hessel3.torservers.net) has joined #ceph
[2:55] * Xa (~dontron@76GAAEZNK.tor-irc.dnsbl.oftc.net) Quit ()
[2:55] * cooey1 (~Altitudes@exit1.ipredator.se) has joined #ceph
[3:00] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) has joined #ceph
[3:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[3:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[3:03] * offender (~bildramer@06SAABYPN.tor-irc.dnsbl.oftc.net) Quit ()
[3:03] * Coe|work (~CoZmicShR@4.tor.exit.babylon.network) has joined #ceph
[3:25] * cooey1 (~Altitudes@7V7AAD9UK.tor-irc.dnsbl.oftc.net) Quit ()
[3:26] * phyphor (~Kayla@81-7-17-171.blue.kundencontroller.de) has joined #ceph
[3:33] * Coe|work (~CoZmicShR@6AGAABJ8R.tor-irc.dnsbl.oftc.net) Quit ()
[3:37] * stein (~stein@185.56.185.82) Quit (Ping timeout: 480 seconds)
[3:42] * derjohn_mobi (~aj@x590d06ba.dyn.telefonica.de) has joined #ceph
[3:50] * aj__ (~aj@x590e5d6a.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[3:55] * phyphor (~Kayla@06SAABYRH.tor-irc.dnsbl.oftc.net) Quit ()
[3:56] * pico1 (~TheDoudou@remailer.cpunk.us) has joined #ceph
[3:56] * stein (~stein@185.56.185.82) has joined #ceph
[4:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[4:01] * Meths (~meths@95.151.244.244) Quit (Read error: Connection reset by peer)
[4:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[4:01] * Meths (~meths@95.151.244.244) has joined #ceph
[4:07] * dug (~AluAlu@128.153.145.125) has joined #ceph
[4:21] * chasmo77 (~chas77@158.183-62-69.ftth.swbr.surewest.net) has joined #ceph
[4:23] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[4:25] * pico1 (~TheDoudou@6AGAABKAA.tor-irc.dnsbl.oftc.net) Quit ()
[4:25] * Diablodoct0r (~biGGer@cloud.tor.ninja) has joined #ceph
[4:30] * karnan (~karnan@121.244.87.117) has joined #ceph
[4:30] * jclm (~jclm@marriott-hotel-ottawa-yowmc.sites.intello.com) has joined #ceph
[4:37] * dug (~AluAlu@76GAAEZPQ.tor-irc.dnsbl.oftc.net) Quit ()
[4:37] * narthollis (~Hejt@192.42.116.16) has joined #ceph
[4:39] * huangjun (~kvirc@117.152.69.60) has joined #ceph
[4:41] * huangjun|2 (~kvirc@117.152.69.60) has joined #ceph
[4:41] * shohn (~shohn@dslb-188-102-024-152.188.102.pools.vodafone-ip.de) has joined #ceph
[4:45] * shohn1 (~shohn@dslb-178-012-178-117.178.012.pools.vodafone-ip.de) Quit (Ping timeout: 480 seconds)
[4:48] * huangjun (~kvirc@117.152.69.60) Quit (Ping timeout: 480 seconds)
[4:55] * Diablodoct0r (~biGGer@6AGAABKA6.tor-irc.dnsbl.oftc.net) Quit ()
[4:56] * Tenk (~ain@85.159.237.210) has joined #ceph
[4:58] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[5:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[5:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[5:05] * efirs (~firs@c-50-185-70-125.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:07] * narthollis (~Hejt@4MJAAELO3.tor-irc.dnsbl.oftc.net) Quit ()
[5:10] * tobiash (~quassel@212.118.206.70) Quit (Read error: Connection reset by peer)
[5:10] * goberle (~goberle@mid.ygg.tf) Quit (Read error: Connection reset by peer)
[5:11] * tobiash (~quassel@212.118.206.70) has joined #ceph
[5:11] * goberle (~goberle@mid.ygg.tf) has joined #ceph
[5:11] * al (d@niel.cx) Quit (Ping timeout: 480 seconds)
[5:11] * al (quassel@niel.cx) has joined #ceph
[5:13] * epheo (~smuxi@ns326280.ip-91-121-67.eu) Quit (Remote host closed the connection)
[5:13] * epheo (~smuxi@ns326280.ip-91-121-67.eu) has joined #ceph
[5:25] * Tenk (~ain@6AGAABKBX.tor-irc.dnsbl.oftc.net) Quit ()
[5:25] * Nijikokun (~Tumm@93.115.95.216) has joined #ceph
[5:30] * Vacuum__ (~Vacuum@88.130.208.17) has joined #ceph
[5:34] * dgurtner (~dgurtner@c-75-74-127-185.hsd1.fl.comcast.net) has joined #ceph
[5:37] * Vacuum_ (~Vacuum@i59F7916D.versanet.de) Quit (Ping timeout: 480 seconds)
[5:37] * Dysgalt (~OODavo@tor-exit4-readme.dfri.se) has joined #ceph
[5:38] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[5:41] * kefu (~kefu@183.193.162.205) has joined #ceph
[5:55] * Nijikokun (~Tumm@06SAABYVE.tor-irc.dnsbl.oftc.net) Quit ()
[5:55] * Ian2128 (~nupanick@chomsky.torservers.net) has joined #ceph
[5:59] * TiCPU (~owrt@2001:470:1c:40::2) Quit (Read error: Connection reset by peer)
[6:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[6:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[6:01] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[6:03] * alexxy (~alexxy@biod.pnpi.spb.ru) has joined #ceph
[6:06] * alexxy[home] (~alexxy@biod.pnpi.spb.ru) Quit (Ping timeout: 480 seconds)
[6:07] * Dysgalt (~OODavo@6AGAABKCW.tor-irc.dnsbl.oftc.net) Quit ()
[6:09] * overclk (~quassel@121.244.87.117) has joined #ceph
[6:12] * dgurtner (~dgurtner@c-75-74-127-185.hsd1.fl.comcast.net) Quit (Ping timeout: 480 seconds)
[6:17] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[6:18] <m0zes> any thoughts on how to fix a scrub error in an ec pool? 34.260s0 deep-scrub stat mismatch, got 52904/52905 objects, 0/0 clones, 52904/52905 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 143977902783/143977902783 bytes, 0/0 hit_set_archive bytes.
[6:18] <m0zes> I've tried a repair and it doesn't seem to do anything.
[6:19] <m0zes> If I'm reading this right, all the bytes are there. is that a missing object? or is that just a statistic tracking bug?
[6:19] <m0zes> is it a 0 byte object missing?
[6:21] <m0zes> my logs indicate that the primary osd was told to initiate the repair, but I'm not seeing anything in the logs indicating that it is actually doing anything. 2016-05-01 23:15:43.711778 mon.0 10.5.38.1:6789/0 2218911 : audit [INF] from='client.? 10.5.38.1:0/3477524802' entity='client.admin' cmd=[{"prefix": "pg repair", "pgid": "34.260"}]: dispatch
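To make it easier to see which counters in the "stat mismatch" line above actually disagree, here is a minimal Python sketch that splits the line into its "a/b name" pairs and keeps only the pairs that differ. The helper name `mismatched_counters` and the regular expression are illustrative assumptions, not part of any Ceph tool, and the sketch does not interpret which side of each pair is the scrubbed value and which is the recorded one.

```python
import re

# Line copied from the log above (pg 34.260s0, deep-scrub).
SCRUB_LINE = (
    "34.260s0 deep-scrub stat mismatch, got 52904/52905 objects, 0/0 clones, "
    "52904/52905 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, "
    "143977902783/143977902783 bytes, 0/0 hit_set_archive bytes."
)

# Matches "<a>/<b> <counter name>"; the name may be followed by " bytes".
PAIR = re.compile(r"(\d+)/(\d+) ([a-z_]+(?: bytes)?)")


def mismatched_counters(line):
    """Return {counter: (first, second)} for every pair whose values differ.

    Hypothetical helper for illustration only.
    """
    result = {}
    for first, second, name in PAIR.findall(line):
        if first != second:
            result[name] = (int(first), int(second))
    return result


if __name__ == "__main__":
    print(mismatched_counters(SCRUB_LINE))
```

On this line only the object and dirty counts differ (by one each) while the byte counts agree, which is what m0zes observes in the messages above.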
[6:21] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[6:25] * Ian2128 (~nupanick@7V7AAD9U9.tor-irc.dnsbl.oftc.net) Quit ()
[6:25] * rcfighter (~maku@static-83-41-68-212.sadecehosting.net) has joined #ceph
[6:26] * yanzheng (~zhyan@118.116.113.70) has joined #ceph
[6:27] <flaf> m0zes: maybe you can compare “sha1sum /var/lib/ceph/osd/ceph-$id/current/${pgid}_head/* | sed "s|/ceph-$id/|/ceph-id/|" | sha1sum” on each osd?
[6:28] <flaf> (on each pg)
[6:29] <m0zes> this is an ec pool, so the data should be different ;)
[6:29] <flaf> Ah... correct, sorry.
[6:31] <m0zes> it's just weird. being an ec pool, I assumed scrub errors should be *mostly* impossible due to the checksums and parity blocks. or at least trivially repairable.
[6:32] * cmorandin (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) has joined #ceph
[6:32] <flaf> Yes indeed, I'm afraid I'm not qualified to help you...
[6:32] <m0zes> the "stat mismatch" is the sum total of the logs provided by the osd, about this pg.
[6:33] * yanzheng (~zhyan@118.116.113.70) Quit (Quit: This computer has gone to sleep)
[6:34] <m0zes> ahh well. I'll ask again in the morning. and I might pop over to #ceph-devel, too
[6:35] <flaf> m0zes: http://tracker.ceph.com/issues/8752
[6:35] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) has joined #ceph
[6:35] <flaf> Is your problem similar to this ticket?
[6:36] * deepthi (~deepthi@115.118.63.76) has joined #ceph
[6:38] <m0zes> similar, except there it was only happening on caching tiers. this is an underlying ec pool.
[6:39] * yanzheng (~zhyan@118.116.113.70) has joined #ceph
[6:40] * yanzheng (~zhyan@118.116.113.70) Quit ()
[6:41] * Brochacho (~alberto@c-73-45-127-198.hsd1.il.comcast.net) Quit (Quit: Brochacho)
[6:41] <m0zes> interestingly enough, someone on the mailing list asked about a similar looking issue yesterday.
[6:42] * derjohn_mobi (~aj@x590d06ba.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[6:43] <flaf> but in the ML, the OP has got 284/282 objects…
[6:44] <flaf> but in the ML, the OP has “got 284/282 objects”…
[6:44] <flaf> m0zes: which version of ceph for you?
[6:44] <m0zes> infernalis (9.2.1).
[6:45] * Aim_ (~aim@batroun.rokkanet.org) has joined #ceph
[6:51] * TiCPU (~owrt@2001:470:1c:40::2) has joined #ceph
[6:53] * yanzheng (~zhyan@118.116.113.70) has joined #ceph
[6:53] * yanzheng (~zhyan@118.116.113.70) Quit ()
[6:55] * rcfighter (~maku@76GAAEZR6.tor-irc.dnsbl.oftc.net) Quit ()
[6:56] * hgjhgjh (~Silentkil@178-17-170-99.static.host) has joined #ceph
[6:57] * rdas (~rdas@121.244.87.116) has joined #ceph
[7:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[7:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[7:01] * cmorandin (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[7:02] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[7:06] * chasmo77 (~chas77@158.183-62-69.ftth.swbr.surewest.net) Quit (Quit: It's just that easy)
[7:07] * W|ldCraze (~dug@tor-exit-node-nibbana.dson.org) has joined #ceph
[7:21] * i_m (~ivan.miro@31.173.100.109) has joined #ceph
[7:25] * hgjhgjh (~Silentkil@4MJAAELRZ.tor-irc.dnsbl.oftc.net) Quit ()
[7:26] * Scaevolus (~Dysgalt@hessel3.torservers.net) has joined #ceph
[7:30] * efirs (~firs@c-50-185-70-125.hsd1.ca.comcast.net) has joined #ceph
[7:34] * chasmo77 (~chas77@158.183-62-69.ftth.swbr.surewest.net) has joined #ceph
[7:37] * W|ldCraze (~dug@06SAABYYO.tor-irc.dnsbl.oftc.net) Quit ()
[7:37] * Bored (~poller@46.183.218.199) has joined #ceph
[7:40] * i_m1 (~ivan.miro@31.173.100.234) has joined #ceph
[7:40] * i_m (~ivan.miro@31.173.100.109) Quit (Read error: Connection reset by peer)
[7:43] * derjohn_mobi (~aj@88.128.81.196) has joined #ceph
[7:45] * branto (~branto@nat-pool-brq-t.redhat.com) has joined #ceph
[7:55] * Scaevolus (~Dysgalt@76GAAEZS9.tor-irc.dnsbl.oftc.net) Quit ()
[8:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[8:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[8:01] * krypto (~krypto@103.252.27.27) has joined #ceph
[8:03] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) Quit (Remote host closed the connection)
[8:03] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) has joined #ceph
[8:07] * Bored (~poller@76GAAEZTJ.tor-irc.dnsbl.oftc.net) Quit ()
[8:11] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) Quit (Ping timeout: 480 seconds)
[8:12] * TomasCZ (~TomasCZ@yes.tenlab.net) Quit (Quit: Leaving)
[8:13] * olqs (~olqs@cpe90-146-85-69.liwest.at) has joined #ceph
[8:15] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) has joined #ceph
[8:15] <olqs> Hi, I have two bluestore osds which fail to start with different error messages. http://pastebin.com/gbqwMhjx Is there a possibility to repair these osds, or is the only way to zap the disk and resync?
[8:20] * mykola (~Mikolaj@91.245.78.193) has joined #ceph
[8:20] * mgolub (~Mikolaj@91.245.78.193) has joined #ceph
[8:25] * rcfighter (~GuntherDW@nl7x.mullvad.net) has joined #ceph
[8:30] * Hemanth (~hkumar_@103.228.221.149) has joined #ceph
[8:34] * kefu (~kefu@183.193.162.205) Quit (Ping timeout: 480 seconds)
[8:36] * BlackDex_ is now known as BlackDex
[8:36] * kefu (~kefu@114.92.122.74) has joined #ceph
[8:37] * derjohn_mobi (~aj@88.128.81.196) Quit (Ping timeout: 480 seconds)
[8:42] * ronrib (~boswortr@45.32.242.135) has joined #ceph
[8:49] * codice (~toodles@75-128-34-237.static.mtpk.ca.charter.com) Quit (Remote host closed the connection)
[8:51] * dvanders (~dvanders@dvanders-pro.cern.ch) has joined #ceph
[8:54] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) has joined #ceph
[8:55] * krypto (~krypto@103.252.27.27) Quit (Ping timeout: 480 seconds)
[8:55] * rcfighter (~GuntherDW@76GAAEZUL.tor-irc.dnsbl.oftc.net) Quit ()
[8:56] * shylesh__ (~shylesh@121.244.87.118) has joined #ceph
[8:56] * itamarl (~itamarl@194.90.7.244) has joined #ceph
[9:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[9:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[9:02] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[9:03] * garphy`aw is now known as garphy
[9:04] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) has joined #ceph
[9:06] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[9:07] * AGaW (~OODavo@exit1.ipredator.se) has joined #ceph
[9:12] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[9:13] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) Quit (Ping timeout: 480 seconds)
[9:22] * Concubidated (~cube@c-50-173-245-118.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:22] * derjohn_mobi (~aj@fw.gkh-setu.de) has joined #ceph
[9:24] * Concubidated (~cube@c-50-173-245-118.hsd1.ca.comcast.net) has joined #ceph
[9:25] * brannmar (~andrew_m@46.166.138.145) has joined #ceph
[9:29] * fsimonce (~simon@87.13.130.124) has joined #ceph
[9:33] * lmb__ (~Lars@2a02:8109:8100:1d2c:7026:1b65:dba7:61f4) Quit (Remote host closed the connection)
[9:37] * rakeshgm (~rakesh@121.244.87.117) has joined #ceph
[9:37] * AGaW (~OODavo@6AGAABKHH.tor-irc.dnsbl.oftc.net) Quit ()
[9:40] * pabluk__ is now known as pabluk_
[9:43] * allaok (~allaok@machine107.orange-labs.com) has joined #ceph
[9:43] * rendar (~I@host38-182-dynamic.12-79-r.retail.telecomitalia.it) has joined #ceph
[9:45] * penguinRaider (~KiKo@14.139.82.6) Quit (Read error: Connection reset by peer)
[9:48] * lmb (~Lars@2a02:8109:8100:1d2c:f021:41c1:6dbf:f45d) has joined #ceph
[9:52] * krypto (~krypto@103.252.26.176) has joined #ceph
[9:54] * rotbeard (~redbeard@aftr-109-90-232-106.unity-media.net) has joined #ceph
[9:55] * brannmar (~andrew_m@76GAAEZVT.tor-irc.dnsbl.oftc.net) Quit ()
[9:57] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit (Quit: Leaving.)
[9:59] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[10:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[10:01] * Meths (~meths@95.151.244.244) Quit (Read error: Connection reset by peer)
[10:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[10:02] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Quit: Ex-Chat)
[10:02] * Meths (~meths@95.151.244.244) has joined #ceph
[10:03] * penguinRaider (~KiKo@14.139.82.6) has joined #ceph
[10:06] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[10:08] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) has joined #ceph
[10:09] * codice (~toodles@75-128-34-237.static.mtpk.ca.charter.com) has joined #ceph
[10:12] * Kurt (~Adium@2001:628:1:5:a116:4b84:8ed0:cd45) has joined #ceph
[10:14] * shohn1 (~shohn@dslb-188-102-024-152.188.102.pools.vodafone-ip.de) has joined #ceph
[10:15] * olqs (~olqs@cpe90-146-85-69.liwest.at) Quit (Quit: leaving)
[10:16] * shohn (~shohn@dslb-188-102-024-152.188.102.pools.vodafone-ip.de) Quit (Quit: Leaving.)
[10:16] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[10:17] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) Quit (Ping timeout: 480 seconds)
[10:17] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[10:17] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[10:20] * adun153 (~ljtirazon@121.58.192.6) has joined #ceph
[10:20] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[10:20] <adun153> Hi, why is my OSD in a "down" state?
[10:20] <adun153> osd.15.
[10:20] <adun153> 2016-05-02 09:44:04.901055 7f6b2f9f8700 0 -- 192.168.2.1:0/29280 >> 192.168.2.2:6834/2018437 pipe(0x5642670c8000 sd=41 :0 s=1 pgs=0 cs=0 l=1 c=0x56425dfa6520).fault
[10:21] * bvi (~bastiaan@185.56.32.1) has joined #ceph
[10:24] * i_m (~ivan.miro@31.173.121.32) has joined #ceph
[10:24] * i_m1 (~ivan.miro@31.173.100.234) Quit (Read error: Connection reset by peer)
[10:26] * Mraedis (~Sun7zu@06SAABY4U.tor-irc.dnsbl.oftc.net) has joined #ceph
[10:27] * b0e (~aledermue@213.95.25.82) has joined #ceph
[10:28] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[10:30] * evelu (~erwan@37.161.157.40) has joined #ceph
[10:30] * thomnico (~thomnico@2a01:e35:8b41:120:5145:b80f:6a00:6a05) has joined #ceph
[10:33] * jordanP (~jordan@204.13-14-84.ripe.coltfrance.com) has joined #ceph
[10:37] * rhonabwy (~jwandborg@anonymous.sec.nl) has joined #ceph
[10:39] * jordanP (~jordan@204.13-14-84.ripe.coltfrance.com) Quit (Remote host closed the connection)
[10:43] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[10:46] * efirs (~firs@c-50-185-70-125.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[10:46] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit (Ping timeout: 480 seconds)
[10:46] <sep> adun153, is the process running ? check with ps aux | grep osd.15 on the node in question, also check the log of the osd in question on the node
[10:46] <Be-El> using krbd on a host with osds is not recommended due to possible memory deadlocks; does the same restriction also apply to kernel based cephfs?
[10:47] * i_m (~ivan.miro@31.173.121.32) Quit (Ping timeout: 480 seconds)
[10:48] * b0e1 (~aledermue@213.95.25.82) has joined #ceph
[10:48] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[10:52] * b0e (~aledermue@213.95.25.82) Quit (Ping timeout: 480 seconds)
[10:55] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) has joined #ceph
[10:55] * Mraedis (~Sun7zu@06SAABY4U.tor-irc.dnsbl.oftc.net) Quit ()
[11:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[11:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[11:03] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[11:03] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[11:07] * jordanP (~jordan@204.13-14-84.ripe.coltfrance.com) has joined #ceph
[11:07] * rhonabwy (~jwandborg@7V7AAD9W6.tor-irc.dnsbl.oftc.net) Quit ()
[11:08] * MKoR (~mps@185.100.87.73) has joined #ceph
[11:09] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[11:10] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:7875:fe8c:7399:b0c8) has joined #ceph
[11:10] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) has joined #ceph
[11:11] * post-factum (~post-fact@vulcan.natalenko.name) Quit (Killed (NickServ (Too many failed password attempts.)))
[11:12] * post-factum (~post-fact@vulcan.natalenko.name) has joined #ceph
[11:18] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) Quit (Ping timeout: 480 seconds)
[11:20] * b0e1 (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[11:20] * b0e (~aledermue@213.95.25.82) has joined #ceph
[11:21] * pabluk_ is now known as pabluk__
[11:23] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[11:25] * overclk (~quassel@121.244.87.117) Quit (Remote host closed the connection)
[11:29] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) Quit (Quit: Leaving.)
[11:30] * rakeshgm (~rakesh@121.244.87.117) Quit (Ping timeout: 480 seconds)
[11:37] * Kioob1 (~Kioob@ALyon-652-1-70-193.w109-213.abo.wanadoo.fr) has joined #ceph
[11:37] * MKoR (~mps@4MJAAELW6.tor-irc.dnsbl.oftc.net) Quit ()
[11:37] * BillyBobJohn (~curtis864@93.115.95.205) has joined #ceph
[11:51] * ade (~abradshaw@85.158.226.30) has joined #ceph
[12:00] * Kottizen (~DougalJac@hessel2.torservers.net) has joined #ceph
[12:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[12:01] * haomaiwang (~haomaiwan@106.187.51.170) has joined #ceph
[12:06] * krypto (~krypto@103.252.26.176) Quit (Quit: Leaving)
[12:07] * wernerru (~oftc-webi@c-98-209-40-232.hsd1.mi.comcast.net) Quit (Quit: Page closed)
[12:07] * BillyBobJohn (~curtis864@4MJAAELXL.tor-irc.dnsbl.oftc.net) Quit ()
[12:07] * brianjjo (~aleksag@orion.enn.lu) has joined #ceph
[12:08] * krypto (~krypto@103.252.26.176) has joined #ceph
[12:09] * penguinRaider_ (~KiKo@14.139.82.6) has joined #ceph
[12:11] * penguinRaider (~KiKo@14.139.82.6) Quit (Ping timeout: 480 seconds)
[12:12] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) has joined #ceph
[12:14] * karnan (~karnan@121.244.87.117) has joined #ceph
[12:20] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) Quit (Ping timeout: 480 seconds)
[12:24] * Gandle (~boob@00021b85.user.oftc.net) has joined #ceph
[12:27] * rdias (~rdias@2001:8a0:749a:d01:90df:fd92:5a73:6034) Quit (Ping timeout: 480 seconds)
[12:27] <flaf> Be-El: I know the problem for krbd/OSDs but I have never read anything equivalent for kcephfs/OSDs. But that's not proof of course. ;)
[12:27] <flaf> However, personally now I prefer to use ceph-fuse.
[12:28] <Be-El> ceph-fuse would be fine for the intended task (moving files between cephfs data pools), but the kernel client has a number of advantages with respect to speed
[12:29] <Be-El> especially meta data speed
[12:29] <Be-El> (and working page cache + support for supplementary unix groups)
[12:30] * TMM (~hp@185.5.122.2) has joined #ceph
[12:30] * Kottizen (~DougalJac@06SAABY7V.tor-irc.dnsbl.oftc.net) Quit ()
[12:31] * rdias (~rdias@2001:8a0:749a:d01:dd23:dbaa:3b64:d12) has joined #ceph
[12:31] <flaf> Ah ok, I see.
[12:31] * TMM (~hp@185.5.122.2) Quit ()
[12:32] * TMM (~hp@185.5.122.2) has joined #ceph
[12:35] * Salamander_ (~sixofour@hessel2.torservers.net) has joined #ceph
[12:36] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[12:37] * shohn1 (~shohn@dslb-188-102-024-152.188.102.pools.vodafone-ip.de) Quit (Quit: Leaving.)
[12:37] * brianjjo (~aleksag@06SAABY73.tor-irc.dnsbl.oftc.net) Quit ()
[12:43] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[12:44] * kefu is now known as kefu|afk
[12:47] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[12:56] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) has joined #ceph
[12:56] * shylesh__ (~shylesh@121.244.87.118) Quit (Remote host closed the connection)
[12:56] <adun153> I'm seeing this in my cluster.
[12:56] <adun153> 83 pgs stuck degraded
[12:56] <adun153> 3633 pgs stuck inactive
[12:56] <adun153> Should I be worried?
[12:57] * raarts (~Adium@82-171-243-109.ip.telfort.nl) Quit (Remote host closed the connection)
[12:57] * Hemanth (~hkumar_@103.228.221.149) Quit (Ping timeout: 480 seconds)
[13:01] * haomaiwang (~haomaiwan@106.187.51.170) Quit (Remote host closed the connection)
[13:01] * itamarl (~itamarl@194.90.7.244) Quit (Quit: itamarl)
[13:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[13:04] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[13:04] * Salamander_ (~sixofour@76GAAEZYJ.tor-irc.dnsbl.oftc.net) Quit ()
[13:07] * jlayton (~jlayton@2606:a000:1125:4074:c5:7ff:fe41:3227) Quit (Quit: ZNC 1.6.2 - http://znc.in)
[13:07] * jlayton (~jlayton@2606:a000:1125:4074:c5:7ff:fe41:3227) has joined #ceph
[13:07] * Atomizer (~KrimZon@06SAABY97.tor-irc.dnsbl.oftc.net) has joined #ceph
[13:08] * kefu|afk is now known as kefu
[13:13] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) has joined #ceph
[13:17] * The1_ (~the_one@87.104.212.66) has joined #ceph
[13:18] <sep> adun153, well inactive means they will not process read/write requests. so the storage is in effect unavailable
[13:19] <adun153> sep: what are the most common reasons why they are that way?
[13:19] <sep> if that is a problem for your cluster then yes you should worry.
[13:20] <flaf> adun153: are all osds in? (=> ceph status)
[13:20] <sep> adun153, perhaps the number of replicas is too low? what is your replication count and your min
[13:20] <sep> ceph osd tree to check all osds
[13:20] * jclm (~jclm@marriott-hotel-ottawa-yowmc.sites.intello.com) Quit (Quit: Leaving.)
[13:21] <adun153> osdmap e12292: 66 osds: 53 up, 55 in; 1669 remapped pgs
[13:21] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) Quit (Ping timeout: 480 seconds)
[13:22] * T1 (~the_one@87.104.212.66) Quit (Ping timeout: 480 seconds)
[13:22] <adun153> sep: where do I check replicas?
[13:22] <flaf> Ah so there are some OSDs "not in".
[13:22] <sep> so you have lots of osds that are down. why is that ? a broken server ?
[13:23] <flaf> adun153: indeed you have to repair the OSDs which are down.
[13:23] <adun153> I'm a noob. So, 66 osds minus 53 up, does that mean that 13 osds are down?
[13:23] <sep> adun153, yes
[13:23] <sep> when troubleshooting have a terminal with "watch ceph -s" in it, and another terminal with "ceph -w" in it
[13:24] <flaf> => as sep has said => ceph osd tree
[13:24] <sep> ceph osd tree will show you what osd's are down/up/in. if all osds that are down are on one server, then you need to look at that one.
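The arithmetic adun153 asks about (66 osds minus 53 up) can be read straight off the osdmap summary line pasted at 13:21. The following is a small Python sketch under the assumption that the summary keeps the "N osds: N up, N in" shape shown in this log; `osd_counts` is a hypothetical helper, not a Ceph API.

```python
import re

# Summary line as pasted by adun153 in the log above.
OSDMAP_LINE = "osdmap e12292: 66 osds: 53 up, 55 in; 1669 remapped pgs"


def osd_counts(line):
    """Derive down/out counts from an osdmap summary line (illustrative helper)."""
    m = re.search(r"(\d+) osds: (\d+) up, (\d+) in", line)
    if m is None:
        raise ValueError("not an osdmap summary line")
    total, up, in_ = (int(g) for g in m.groups())
    return {
        "total": total,
        "up": up,
        "down": total - up,   # daemons the cluster cannot reach
        "in": in_,
        "out": total - in_,   # OSDs no longer assigned data
    }


if __name__ == "__main__":
    print(osd_counts(OSDMAP_LINE))
```

For this line that gives 13 OSDs down and 11 out. The summary only gives counts; "ceph osd tree", as sep says, is what shows which OSDs on which host are affected.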
[13:24] * Hemanth (~hkumar_@103.228.221.149) has joined #ceph
[13:25] <adun153> I have three storage nodes
[13:25] <flaf> The first thing to check is whether the daemon of a down OSD is running or not.
[13:25] <adun153> storage1 has some OSDs down
[13:25] <adun153> storage2 and 3 have all OSDs up.
[13:26] <adun153> When I first noticed storage1 was down, all OSDs were down
[13:26] <adun153> I rebooted it.
[13:26] <adun153> Was that a REALLY BAD move?
[13:26] <sep> i like to find out what the problem is before rebooting,
[13:26] <adun153> Now 9 out of 22 are up.
[13:27] * jlayton (~jlayton@2606:a000:1125:4074:c5:7ff:fe41:3227) Quit (Quit: ZNC 1.6.2 - http://znc.in)
[13:27] <sep> also i like to set "noout" before rebooting, to avoid the cluster trying to remap everything
[13:27] <sep> adun153, is the server just up ? it sounds like it is in the process of starting osd's
[13:27] <flaf> adun153: ssh to "storage1" and check the status of the daemons.
[13:27] <adun153> sep: I rebooted it about 2 hours ago.
[13:27] <sep> the ceph -w window should tell you if osd's are connecting/peering
[13:28] <adun153> flaf: what's the command to check the daemon statuses?
[13:28] * LeaChim (~LeaChim@host86-150-161-6.range86-150.btcentralplus.com) has joined #ceph
[13:28] * jlayton (~jlayton@2606:a000:1125:4074:c5:7ff:fe41:3227) has joined #ceph
[13:28] <flaf> adun153: it depends on your OS.
[13:28] * waydoo (~waydoo@112.198.78.155) has joined #ceph
[13:28] * Miouge (~Miouge@188.188.83.166) has joined #ceph
[13:29] <adun153> flaf: ubuntu?
[13:29] <adun153> oh, you mean using "ps"?
[13:29] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) has joined #ceph
[13:29] <adun153> All 22 OSD daemons are running for all 22 OSDs in Storage1.
[13:29] <Miouge> Where can I find info on the downsides of erasure coding (except for CPU usage)?
[13:30] <flaf> adun153: => status ceph-osd id=<osd-id>
[13:30] <flaf> (for ubuntu 14.04)
[13:30] <sep> also see if you can see messages like "wrongly marked me down" in the ceph -w window ? could be it's thrashing because of the recovery traffic. do you have a separate public and cluster network ?
[13:31] <adun153> sep: yes, I have a separate network for replication between the nodes.
[13:32] <flaf> If a daemon is running (according to "status ceph-osd id=$id"), check the logs of the OSD. In short, the classic method. ;)
[13:33] * yanzheng (~zhyan@118.116.113.70) has joined #ceph
[13:34] <sep> adun153, without reading the logs it's a bit hard to know what is going on. but if osd's are running but ceph sees them as down and then up irregularly, i think i would try setting the noout flag, to see if it stabilizes with osd's up... since you have inactive pg's you won't make things much worse.
[13:34] <adun153> flaf: here's from a downed osd.
[13:34] <adun153> 2016-05-02 19:34:08.385490 7fb027e43700 0 -- 192.168.0.15:0/5877 >> 192.168.2.2:6824/3018437 pipe(0x559c25355000 sd=82 :0 s=1 pgs=0 cs=0 l=1 c=0x559c2801f180).fault
[13:34] <sep> i also always set noout flag before rebooting a node.
[13:34] <adun153> sep: what does setting noout do?
[13:35] * pabluk__ is now known as pabluk_
[13:35] <sep> osds are not marked out of the cluster if they are down. iow the cluster does not try to remap the data to other osd's
[13:36] <sep> since you take down osd's when rebooting, you normally do not want the cluster to start copying all the data to other disks. since with a reboot you expect the osd to return shortly.
[13:36] <flaf> adun153: try to restart the daemon (via the restart command) and, _simultaneously_ in another terminal, “tail -f /var/log/ceph/ceph-osd.$id.log” to see what happens.
[13:36] <sep> then once the node is rebooted and osd's are up again, you unset the noout flag
[13:36] <sep> think of it as "maintenance mode"
[13:36] <adun153> ah.
[13:37] <adun153> flaf: will do
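The "maintenance mode" pattern sep describes (set noout, do the reboot, unset noout once the OSDs are back) could be wrapped as a Python context manager. This is only a sketch: it assumes the `ceph` CLI is on the PATH with admin credentials, uses the `ceph osd set noout` / `ceph osd unset noout` commands, and the names `run_ceph` and `noout` are made up for illustration. Clearing the flag even when the maintenance step fails is a design assumption here; an operator may prefer to leave it set and investigate first.

```python
import subprocess
from contextlib import contextmanager


def run_ceph(args):
    """Run `ceph <args>` and raise if it fails (assumes the ceph CLI is installed)."""
    subprocess.run(["ceph", *args], check=True)


@contextmanager
def noout(run=run_ceph):
    """Hold the cluster-wide noout flag for the duration of the with-block.

    `run` is injectable so the sequence can be exercised without a cluster.
    """
    run(["osd", "set", "noout"])
    try:
        yield
    finally:
        # Assumption: always clear the flag, even if the maintenance step raised.
        run(["osd", "unset", "noout"])


if __name__ == "__main__":
    # Dry run: record the commands instead of executing them.
    issued = []
    with noout(run=issued.append):
        issued.append(["<reboot node, wait for OSDs to come back up>"])
    for cmd in issued:
        print(" ".join(cmd))
```

The dry run prints the set command, the placeholder maintenance step, and the unset command, in that order.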
[13:37] * Atomizer (~KrimZon@06SAABY97.tor-irc.dnsbl.oftc.net) Quit ()
[13:37] * nastidon (~clarjon1@06SAABZBK.tor-irc.dnsbl.oftc.net) has joined #ceph
[13:39] * yanzheng (~zhyan@118.116.113.70) Quit (Quit: This computer has gone to sleep)
[13:40] * rotbeard (~redbeard@aftr-109-90-232-106.unity-media.net) Quit (Ping timeout: 480 seconds)
[13:40] <adun153> flaf: http://pastebin.com/DqFVFVKr
[13:40] <sep> adun153, the default replica is 3, check it with "ceph osd pool ls detail" ; if you have replica of 3, and 3 servers you basically have no redundancy, since you can not lose any disks or nodes
[13:41] <adun153> flaf: this is right after starting osd 26.
[13:43] <adun153> sep: http://pastebin.com/Dw5emaRj
[13:43] <adun153> Looks like default is 2.
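Reading the replica count out of "ceph osd pool ls detail" can be sketched as below. The sample line is hypothetical (the real output is only in the pastebin) and assumes the usual "pool <id> '<name>' replicated size <n> min_size <m> ..." layout; the function name `pool_sizes` is likewise made up for illustration.

```python
import re

# Assumed layout of one pool line in "ceph osd pool ls detail" output.
POOL_LINE = re.compile(
    r"^pool \d+ '(?P<name>[^']+)' replicated size (?P<size>\d+) min_size (?P<min_size>\d+)"
)


def pool_sizes(text):
    """Map pool name -> (size, min_size) for every replicated pool line found."""
    sizes = {}
    for line in text.splitlines():
        match = POOL_LINE.match(line.strip())
        if match:
            sizes[match.group("name")] = (
                int(match.group("size")),
                int(match.group("min_size")),
            )
    return sizes


# Hypothetical sample line, not copied from the cluster in this log.
SAMPLE = (
    "pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 "
    "object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 "
    "flags hashpspool stripe_width 0\n"
)

print(pool_sizes(SAMPLE))
```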
[13:44] <flaf> if it's correct after a restart, try to restart the other osd which are down.
[13:44] * wjw-freebsd (~wjw@176.74.240.1) has joined #ceph
[13:46] <adun153> flaf: The logs I showed you, the last few lines with the "fault" at the end, is that cause for concern?
[13:47] * overclk (~quassel@117.202.104.175) has joined #ceph
[13:47] * thomnico (~thomnico@2a01:e35:8b41:120:5145:b80f:6a00:6a05) Quit (Quit: Ex-Chat)
[13:48] <flaf> I'm not sure but I don't think so. I think this line is not important (not sure).
[13:48] <flaf> after the restart, your osd is UP, correct?
[13:48] * rotbeard (~redbeard@aftr-109-90-232-106.unity-media.net) has joined #ceph
[13:49] <adun153> flaf: It is up, correct.
[13:49] <adun153> I am restarting the other downed OSDs one by one as well.
[13:49] <flaf> ok, do the same manipulation with the remaining down osds
[13:49] <adun153> flaf: aye aye. :)
[13:52] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[13:55] <adun153> flaf: all OSDs for all nodes are up now.
[13:55] <adun153> health HEALTH_WARN
[13:55] <adun153> 170 pgs down
[13:55] <adun153> 5407 pgs peering
[13:55] <adun153> 5064 pgs stuck inactive
[13:55] <adun153> 5353 pgs stuck unclean
[13:55] <adun153> 615 requests are blocked > 32 sec
[13:55] * Hemanth (~hkumar_@103.228.221.149) Quit (Ping timeout: 480 seconds)
[13:55] <adun153> Does that look good, or is still there cause for concern?
[13:57] <adun153> cephdeploy@services1:~/os-cluster$ ceph status
[13:57] <adun153> cluster 3175dc2e-bd5b-4cd7-91ce-1ab9454b4142
[13:57] <adun153> health HEALTH_WARN
[13:57] <adun153> 170 pgs down
[13:57] <adun153> 5407 pgs peering
[13:57] <adun153> 5278 pgs stuck inactive
[13:57] <adun153> 5355 pgs stuck unclean
[13:57] <adun153> 638 requests are blocked > 32 sec
[13:57] <adun153> monmap e1: 3 mons at {storage1=192.168.0.15:6789/0,storage2=192.168.0.16:6789/0,storage3=192.168.0.17:6789/0}
[13:57] <adun153> election epoch 224, quorum 0,1,2 storage1,storage2,storage3
[13:57] <adun153> osdmap e12611: 66 osds: 66 up, 66 in; 1299 remapped pgs
[13:57] <adun153> flags sortbitwise
[13:57] <adun153> pgmap v6004665: 8080 pgs, 17 pools, 1153 GB data, 188 kobjects
[13:57] <adun153> 3051 GB used, 58068 GB / 61119 GB avail
[13:57] <adun153> 3938 peering
[13:57] <adun153> 2673 active+clean
[13:57] <adun153> 1299 remapped+peering
[13:57] <flaf> adun153: don't paste here.
[13:57] <adun153> 170 down+peering
[13:57] <adun153> flaf: sorry.
[13:58] * bjozet_ (~bjozet@82-183-17-144.customers.ownit.se) has joined #ceph
[13:58] * waydoo (~waydoo@112.198.78.155) Quit (Ping timeout: 480 seconds)
[13:58] <adun153> flaf: http://pastebin.com/mcSPfrU7
[13:59] <flaf> np, it's just bad practice
[13:59] * thomnico (~thomnico@2a01:e35:8b41:120:5145:b80f:6a00:6a05) has joined #ceph
[14:00] <adun153> flaf: does the "ceph status" output look good?
[14:00] * bjozet (~bjozet@82-183-17-144.customers.ownit.se) Quit (Ping timeout: 480 seconds)
[14:00] <flaf> Need to wait. If the number of active+clean increases, it's good.
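Watching whether active+clean grows can be done by tallying the per-state PG counts from "ceph status". The sketch below uses the four state lines adun153 pasted at 13:57; the helper names are hypothetical, and it assumes each state line is just "<count> <state>".

```python
def pg_state_counts(lines):
    """Parse lines of the form '<count> <state>' into a dict of state -> count."""
    counts = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            counts[parts[1]] = int(parts[0])
    return counts


def clean_fraction(counts):
    """Fraction of all PGs that are active+clean (0.0 if there are none)."""
    total = sum(counts.values())
    return counts.get("active+clean", 0) / total if total else 0.0


# State lines as pasted in the channel at 13:57.
pasted = [
    "3938 peering",
    "2673 active+clean",
    "1299 remapped+peering",
    "170 down+peering",
]

counts = pg_state_counts(pasted)
print(sum(counts.values()), clean_fraction(counts))
```

The four counts add up to the 8080 PGs reported in the pgmap line, so roughly a third of the PGs were active+clean at that point. Re-running the tally on later output shows whether that share is rising.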
[14:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[14:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[14:01] <flaf> but the message "638 requests are blocked > 32 sec" is not hopeful...
[14:01] <adun153> flaf: 2 osds in storage1 changed to "down" status.
[14:02] <adun153> now 3 osds are down.
[14:02] <flaf> see the log of this osd. You have a problem. No idea what it is.
[14:02] <flaf> *these osds.
[14:02] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[14:03] * infernix (nix@000120cb.user.oftc.net) Quit (Ping timeout: 480 seconds)
[14:03] * infernix (nix@2001:41f0::2) has joined #ceph
[14:03] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit ()
[14:03] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[14:04] <adun153> flaf: this is from osd.26 logs. http://pastebin.com/tTDxh2mH
[14:04] * ade (~abradshaw@85.158.226.30) Quit (Ping timeout: 480 seconds)
[14:05] * Diablothein (~W|ldCraze@76GAAEZ0H.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:05] <adun153> does this mean that storage1 osds can't communicate with the other osds on the other storage nodes?
[14:05] <s3an2> adun153, do you have any network problems? or high load problems
[14:06] <adun153> s3an2: The servers were moved to another switch about a week or so ago.
[14:06] <adun153> s3an2: could that be the cause, and if yes, what can I do to check/fix it?
[14:07] <s3an2> network ping and telnet tests from the node that has osd.26 to a node that has one of the OSD's reporting the errors in the log http://pastebin.com/tTDxh2mH
[14:07] <s3an2> over the front and back network
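The telnet-style test s3an2 suggests amounts to trying a TCP connection to each peer on both networks. A minimal sketch using only the Python standard library follows; the function names are hypothetical, and which addresses and ports to probe has to come from the cluster itself (for example the mon addresses in the monmap or the peer address in an OSD's ".fault" log line).

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_peers(peers, timeout=3.0):
    """Probe a list of (host, port) pairs and return a dict of pair -> reachable."""
    return {(host, port): tcp_reachable(host, port, timeout) for host, port in peers}
```

Run from storage1, something like check_peers([("192.168.0.16", 6789), ("192.168.2.2", 6824)]) would probe one address on each network; those two pairs are taken from the monmap and the OSD log line quoted earlier and are only an example of what to test.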
[14:07] * waydoo (~waydoo@112.198.98.197) has joined #ceph
[14:07] * nastidon (~clarjon1@06SAABZBK.tor-irc.dnsbl.oftc.net) Quit ()
[14:10] <adun153> s3an2: it looks like storage1 can't connect to storage2 and 3 over the cluster network
[14:10] * ade (~abradshaw@85.158.226.30) has joined #ceph
[14:10] <adun153> pingable through the public network, though.
[14:10] <flaf> the atop command can help to see if the network is too loader, or cpu or a disk etc...
[14:11] * i_m (~ivan.miro@88.206.104.168) has joined #ceph
[14:11] <flaf> *too loaded
[14:11] <sep> adun153, what is the networks ? 10 gig or gig only ?
[14:13] * debian112 (~bcolbert@24.126.201.64) Quit (Ping timeout: 480 seconds)
[14:13] <adun153> sep: I found out the issue.
[14:13] <adun153> Storage1 cannot connect to storage2 and storage3
[14:13] <adun153> how do I "tell" the cluster to disregard storage1?
[14:14] <sep> adun153, why? is it not more sensible to fix the network problem ?
[14:14] <adun153> sep: there is no one at the DC right now.
[14:15] <adun153> We just need to have the cluster up and running ASAP, even with the risk of data loss, if another node will go down.
[14:15] * deepthi (~deepthi@115.118.63.76) Quit (Ping timeout: 480 seconds)
[14:16] <s3an2> You can find all the OSD's that are in Storage1 (maybe from ceph osd tree output) and then 'for i in `cat Storage1` ; do ceph osd out $i ; done'
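One way to get the list of OSDs on a host, instead of a hand-written file, is the JSON form of the tree ("ceph osd tree -f json"). The sketch below assumes that output has a "nodes" list whose entries carry "id", "name", "type" and, for buckets, "children"; the sample tree and the helper names are hypothetical.

```python
import json


def osds_under_host(tree, host):
    """Return the sorted OSD ids that are direct children of the named host bucket."""
    by_id = {node["id"]: node for node in tree["nodes"]}
    for node in tree["nodes"]:
        if node.get("type") == "host" and node.get("name") == host:
            return sorted(
                child
                for child in node.get("children", [])
                if by_id.get(child, {}).get("type") == "osd"
            )
    return []


def out_commands(osd_ids):
    """Build one 'ceph osd out <id>' command per OSD id."""
    return [f"ceph osd out {osd_id}" for osd_id in osd_ids]


# Hypothetical, heavily trimmed tree; a real one has many more fields and nodes.
SAMPLE_TREE = json.loads("""
{"nodes": [
  {"id": -1, "name": "default",  "type": "root", "children": [-2, -3]},
  {"id": -2, "name": "storage1", "type": "host", "children": [1, 0]},
  {"id": -3, "name": "storage2", "type": "host", "children": [2]},
  {"id": 0,  "name": "osd.0",    "type": "osd"},
  {"id": 1,  "name": "osd.1",    "type": "osd"},
  {"id": 2,  "name": "osd.2",    "type": "osd"}
]}
""")

for command in out_commands(osds_under_host(SAMPLE_TREE, "storage1")):
    print(command)
```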
[14:16] <sep> well a bit outside my comfort zone, but i would have stopped the osd processes on storage1.
[14:16] <adun153> sep: ok, will stop osd processes on storage1.
[14:16] <s3an2> stopping the osd's will also work, assuming the down out subtree limit has not been modified from the defaults
[14:17] <adun153> s3an2: let's go with the safer route: "ceph osd out <n>" is done from the admin node, right?
[14:18] * debian112 (~bcolbert@24.126.201.64) has joined #ceph
[14:18] <sep> marking the osd's out, but not down. wont that make the other osd's try to use that osd as a source when recovering out objects ? won't that be a problem when the osd on storage 1 is not reachable. ?
[14:19] <sep> am by no means an expert tho. just a slightly new user as well
[14:19] <sep> but making them out is a good start. since you can always stop them later if just marking them out is not enough :)
[14:20] <adun153> sep, s3an2: so, proceed with the "out"ing?
[14:20] <sep> adun153, indeed
[14:21] * etienneme (~arch@88.ip-167-114-240.eu) has joined #ceph
[14:22] <adun153> sep: I've marked them as out. Now what?
[14:23] <s3an2> I see no problem with doing both: stop the osd process on the nodes in storage1 (prevents other OSD's trying to use them in any way) and update the crush to out the OSD's, as you are not going to be able to fix your network problem quickly.
[14:23] <sep> Miouge, biggest downside is higher latency, so you will need a caching tier to use it for rbd. also more complex so it does require more cpu than simple replication. also less throughput
[14:24] * deepthi (~deepthi@115.118.49.127) has joined #ceph
[14:24] <sep> s3an2, now pay attention to "watch ceph -s" and "ceph -w"
[14:25] * shohn (~shohn@dslb-188-102-024-152.188.102.pools.vodafone-ip.de) has joined #ceph
[14:25] <adun153> s3an2: "out"ing an OSD means that it is not considered part of the cluster now?
[14:25] <adun153> meaning, they will not try to replicate to those OSDs?
[14:25] <Miouge> sep: So SSD cache tier for RBD but RadosGW does not need the cache tier? Also less throughput is for read & write?
[14:25] <sep> adun153 yes
[14:25] <adun153> sep: active+clean are going way up now.
[14:25] <sep> http://docs.ceph.com/docs/master/rados/operations/monitoring-osd-pg/
[14:26] <adun153> What's the value of terminating the OSD process, again?
[14:27] <sep> on my slightly old jessie hammer i do /etc/init.d/ceph --verbose stop osd ; do not know what the correct way to do it on your os / version is
[14:30] <s3an2> adun153, if you stop the OSD process in storage1 it will just mean the OSD's won't be used at all even for recovery/cleanup, as you have network problems in storage1 I would say it's a good thing to do.
[14:30] <adun153> s3an2 will do
[14:33] <adun153> All OSDs on storage1 now down.
[14:33] <s3an2> adun153, what does 'ceph -s' look like now
[14:33] <sep> Miouge, i do not know about the rgw, yes for the io. also this link is interesting http://events.linuxfoundation.org/sites/events/files/slides/2015-03-13-vault_0.pdf
[14:34] <adun153> s3an2: http://pastebin.com/a6kzJ09T
[14:34] * Diablothein (~W|ldCraze@76GAAEZ0H.tor-irc.dnsbl.oftc.net) Quit ()
[14:35] * Chaos_Llama (~KeeperOfT@06SAABZDX.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:37] * shaunm (~shaunm@74.83.215.100) has joined #ceph
[14:37] * nix_ (~nix@b2b-130-180-60-34.unitymedia.biz) has joined #ceph
[14:37] <s3an2> is the number of down+peering changing?
[14:38] <adun153> s3an2: looks like no. But active+clean is going up a little bit.
[14:38] <adun153> bit by bit
[14:39] <s3an2> can you share the crushmap (http://docs.ceph.com/docs/master/rados/operations/crush-map/#get-a-crush-map)
[14:39] <nix_> Hi. I have a problem with incomplete PGs. Having followed https://ceph.com/community/incomplete-pgs-oh-my/ 4 of 7 PGs resolved, but 3 stay incomplete. Issuing 'ceph pg 3.14 query' blocks indefinitely. What can I do now?
[14:39] <sep> adun153, also even without going to the DC you can do much network troubleshooting from ssh; check you have stable connectivity on both networks to all nodes (clients, mons and osd on public) (osd's on cluster network), and that you have full MTU on both.
[14:40] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[14:42] * Moriarty (~Dysgalt@217.23.13.129) has joined #ceph
[14:42] <adun153> s3an2: http://pastebin.com/Sb5P8pxV
[14:44] <s3an2> adun153, crushmap looks good - what do you see if you 'ceph pg <n> query' one of those down PGs?
[14:45] <adun153> s3an2, uh, how do I tell which PG is down?
[14:46] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[14:47] <s3an2> There are a number of ways but off the top of my head 'ceph health detail | grep -i down' should do it
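The grep above can be tightened into a small parser that returns just the PG ids. This is a sketch under an assumption: that "ceph health detail" prints per-PG lines starting with "pg <pool>.<hex> is ..." and mentioning the state. The sample lines and the name `down_pgs` are hypothetical.

```python
import re

# Assumed shape of a per-PG line in "ceph health detail" output.
PG_LINE = re.compile(r"^pg (?P<pgid>\d+\.[0-9a-f]+) is (?P<rest>.*)$")


def down_pgs(text):
    """Return the ids of PGs whose health line mentions 'down', without duplicates."""
    found = []
    for line in text.splitlines():
        match = PG_LINE.match(line.strip())
        if match and "down" in match.group("rest") and match.group("pgid") not in found:
            found.append(match.group("pgid"))
    return found


# Hypothetical sample lines, not taken from the pastebin in this log.
SAMPLE = """\
pg 5.1f is stuck inactive for 1234.5, current state down+peering, last acting [30,41]
pg 5.1f is down+peering, acting [30,41]
pg 5.2a is stuck unclean for 1234.5, current state remapped+peering, last acting [33,47]
"""

print(down_pgs(SAMPLE))
```

Each id returned can then be passed to 'ceph pg <id> query' as s3an2 suggested.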
[14:47] * rotbeard (~redbeard@aftr-109-90-232-106.unity-media.net) Quit (Quit: Leaving)
[14:50] <adun153> s3an2: http://pastebin.com/JxjjXP96
[14:51] * dyasny_ (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[14:57] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) has joined #ceph
[14:57] * rraja (~rraja@121.244.87.117) has joined #ceph
[15:00] * mhack (~mhack@66-168-117-78.dhcp.oxfr.ma.charter.com) has joined #ceph
[15:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[15:01] <s3an2> The recovery of that down PG seems to be blocked by osd.16 (http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#placement-group-down-peering-failure) has more details on understanding the data in the pg query output
[15:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[15:01] <adun153> s3an2, osd.16 is on storage1.
[15:04] * neurodrone (~neurodron@158.106.193.162) Quit (Quit: neurodrone)
[15:04] * neurodrone (~neurodron@158.106.193.162) has joined #ceph
[15:04] * Chaos_Llama (~KeeperOfT@06SAABZDX.tor-irc.dnsbl.oftc.net) Quit ()
[15:05] * spate (~Corti^car@6AGAABKRM.tor-irc.dnsbl.oftc.net) has joined #ceph
[15:05] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[15:06] <s3an2> It looks like the cluster wants to probe that OSD to ensure the recovery of that PG is consistent, I can only guess that data may have been written to storage1 when in a degraded state before storage1 was marked as down and out. I see your only option now as to get that osd back online and mark it as 'in' or mark the osd as lost but that could risk data loss as the cluster cannot guarantee that the other copies of the data are consistent and up to date.
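Which OSDs are holding up peering can be pulled out of the "ceph pg <pgid> query" JSON. The sketch assumes the layout shown in the troubleshooting page linked above, where an entry under "recovery_state" may carry a "peering_blocked_by" list of objects with an "osd" field; the sample document is hypothetical and trimmed, with osd 16 filled in to mirror this conversation.

```python
import json


def peering_blockers(query):
    """Return the sorted ids of OSDs listed under peering_blocked_by in a pg query."""
    blockers = set()
    for state in query.get("recovery_state", []):
        for entry in state.get("peering_blocked_by", []):
            blockers.add(entry["osd"])
    return sorted(blockers)


# Hypothetical, trimmed query output; real output contains many more fields.
SAMPLE_QUERY = json.loads("""
{"state": "down+peering",
 "recovery_state": [
   {"name": "Started/Primary/Peering/GetInfo",
    "requested_info_from": []},
   {"name": "Started/Primary/Peering",
    "blocked": "peering is blocked due to down osds",
    "down_osds_we_would_probe": [16],
    "peering_blocked_by": [
      {"osd": 16,
       "current_lost_at": 0,
       "comment": "starting or marking this osd lost may let us proceed"}
    ]}
 ]}
""")

print(peering_blockers(SAMPLE_QUERY))
```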
[15:07] <adun153> s3an2: I see. So that means no choice but to wait until the network issue is fixed?
[15:08] * waydoo (~waydoo@112.198.98.197) Quit (Remote host closed the connection)
[15:10] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[15:10] <s3an2> I think your safest option is to fix the network problem and get the osd's online and 'in' - You can move an OSD from one host to another (assuming the journal is moved with it or the journal is on the same disk) - or you can risk marking OSDs in storage1 as lost.
[15:12] * Moriarty (~Dysgalt@06SAABZD9.tor-irc.dnsbl.oftc.net) Quit ()
[15:12] * Spikey (~SEBI@strasbourg-tornode.eddai.su) has joined #ceph
[15:18] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[15:19] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[15:22] * Miouge (~Miouge@188.188.83.166) Quit (Ping timeout: 480 seconds)
[15:22] <mnaser> bonded public+private (2x10g) or dedicated public and private (10g each?)
[15:23] <adun153> mnaser: were you talking to me?
[15:23] <sep> mnaser, bonded 2x10g public + bonded 2x10g private
[15:23] <mnaser> no, just an arch question adun153 :)
[15:23] <adun153> ah.
[15:23] <mnaser> sep: if only we could fit 4 ports in these boxes, but it's not really possible
[15:23] * rdias (~rdias@2001:8a0:749a:d01:dd23:dbaa:3b64:d12) Quit (Ping timeout: 480 seconds)
[15:23] <mnaser> + only 5 OSDs per node (but they're SSDs)
[15:23] <mnaser> bit overkill
[15:24] <sep> mnaser, you really do want a separate cluster network.
[15:24] <mnaser> gotcha
[15:24] <nix_> I have a problem with incomplete PGs. Having followed https://ceph.com/community/incomplete-pgs-oh-my/ 4 of 7 PGs resolved, but 3 stay incomplete. Issuing 'ceph pg 3.14 query' blocks indefinitely. What can I do now?
[15:24] <sep> mnaser, i run 10g public + 10g cluster. but it does give me a problem with HA when you need to reboot a switch
[15:25] * csoukup (~csoukup@159.140.254.106) has joined #ceph
[15:25] <mnaser> sep: how so?
[15:25] * rdias (~rdias@bl7-92-98.dsl.telepac.pt) has joined #ceph
[15:26] <sep> mnaser, if i reboot the public switch the clients can not reach the cluster = storage unavailable
[15:26] <mnaser> oh i see, yeah that's problematic
[15:26] <sep> i have been considering 10g + 1g active passive since i have the same problem with only 2 10g ports in the servers
[15:27] <sep> but i do not know what is the most correct answer
[15:27] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) has joined #ceph
[15:27] <mnaser> probably splitting your infra over 2 10g switches might be the easiest
[15:31] * shylesh (~shylesh@45.124.226.164) has joined #ceph
[15:33] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) has left #ceph
[15:34] * spate (~Corti^car@6AGAABKRM.tor-irc.dnsbl.oftc.net) Quit ()
[15:35] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[15:35] * aarontc (~aarontc@2001:470:e893::1:1) Quit (Ping timeout: 480 seconds)
[15:36] * deepthi (~deepthi@115.118.49.127) Quit (Quit: Leaving)
[15:38] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[15:38] * rmart04 (~rmart04@75-148-217-209-Houston.hfc.comcastbusiness.net) has joined #ceph
[15:41] * wwdillingham (~LobsterRo@140.247.242.44) has joined #ceph
[15:41] * aarontc (~aarontc@2001:470:e893::1:1) has joined #ceph
[15:41] * nhm (~nhm@c-50-171-139-246.hsd1.mn.comcast.net) has joined #ceph
[15:41] * ChanServ sets mode +o nhm
[15:42] * Spikey (~SEBI@7V7AAD90S.tor-irc.dnsbl.oftc.net) Quit ()
[15:46] * EinstCrazy (~EinstCraz@180.152.117.239) has joined #ceph
[15:50] * sage__ (~quassel@pool-173-76-103-210.bstnma.fios.verizon.net) has joined #ceph
[16:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[16:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[16:01] * adun153 (~ljtirazon@121.58.192.6) Quit (Quit: Leaving)
[16:02] * bene3 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[16:04] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[16:05] * datagutt (~Blueraven@tor-exit4-readme.dfri.se) has joined #ceph
[16:05] * Racpatel (~Racpatel@2601:87:3:3601::675d) has joined #ceph
[16:06] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[16:08] * dsl (~dsl@mobile-166-176-120-123.mycingular.net) has joined #ceph
[16:08] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) has joined #ceph
[16:10] * EinstCrazy (~EinstCraz@180.152.117.239) Quit (Remote host closed the connection)
[16:10] * EinstCrazy (~EinstCraz@180.152.117.239) has joined #ceph
[16:12] * MrBy (~MrBy@85.115.23.2) Quit (Quit: Leaving)
[16:12] * Blueraven (~jwandborg@151.100.179.50) has joined #ceph
[16:12] * rmart04 (~rmart04@75-148-217-209-Houston.hfc.comcastbusiness.net) Quit (Quit: rmart04)
[16:12] * rmart04 (~rmart04@75-148-217-209-Houston.hfc.comcastbusiness.net) has joined #ceph
[16:13] * rmart04 (~rmart04@75-148-217-209-Houston.hfc.comcastbusiness.net) Quit ()
[16:14] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:16] * dsl (~dsl@mobile-166-176-120-123.mycingular.net) Quit (Remote host closed the connection)
[16:16] * i_m (~ivan.miro@88.206.104.168) Quit (Ping timeout: 480 seconds)
[16:19] * EinstCrazy (~EinstCraz@180.152.117.239) Quit (Ping timeout: 480 seconds)
[16:20] * nix_ (~nix@b2b-130-180-60-34.unitymedia.biz) has left #ceph
[16:21] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[16:24] * ade (~abradshaw@85.158.226.30) Quit (Ping timeout: 480 seconds)
[16:27] * alkaid (~alkaid@128.199.95.148) Quit (Ping timeout: 480 seconds)
[16:29] * danieagle (~Daniel@189.0.86.76) has joined #ceph
[16:32] * EinstCrazy (~EinstCraz@180.152.117.239) has joined #ceph
[16:34] * datagutt (~Blueraven@06SAABZIO.tor-irc.dnsbl.oftc.net) Quit ()
[16:35] * Miho (~Kyso_@192.42.115.101) has joined #ceph
[16:36] * Kioob1 (~Kioob@ALyon-652-1-70-193.w109-213.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[16:37] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[16:39] * allaok (~allaok@machine107.orange-labs.com) has left #ceph
[16:40] * EinstCrazy (~EinstCraz@180.152.117.239) Quit (Remote host closed the connection)
[16:42] * Blueraven (~jwandborg@4MJAAEL3L.tor-irc.dnsbl.oftc.net) Quit ()
[16:43] * Hemanth (~hkumar_@103.228.221.149) has joined #ceph
[16:44] * xarses (~xarses@64.124.158.100) has joined #ceph
[16:44] * danieagle (~Daniel@189.0.86.76) Quit (Quit: Obrigado por Tudo! :-) inte+ :-))
[16:46] * vata (~vata@207.96.182.162) has joined #ceph
[16:48] * Drankis (~martin@mikrotik.hostnet.lv) has joined #ceph
[16:51] * mykola (~Mikolaj@91.245.78.193) Quit (Ping timeout: 480 seconds)
[16:52] <wwdillingham> What sort of performance hit (if any) can i expect on my rbd devices by enabling the journaling feature
[16:52] * mgolub (~Mikolaj@91.245.78.193) Quit (Ping timeout: 480 seconds)
[16:52] * KindOne (kindone@h183.41.30.71.dynamic.ip.windstream.net) Quit (Quit: Hiring PHP developers does not contribute to the quota of employees with disabilities.)
[16:52] * mykola (~Mikolaj@91.245.73.44) has joined #ceph
[16:52] * mgolub (~Mikolaj@91.245.73.44) has joined #ceph
[16:53] * joshd (~jdurgin@71-92-201-212.dhcp.gldl.ca.charter.com) has joined #ceph
[16:53] * reed (~reed@75-101-54-18.dsl.static.fusionbroadband.com) has joined #ceph
[16:54] * Drankis (~martin@mikrotik.hostnet.lv) Quit (Quit: Leaving)
[16:55] * mgolub (~Mikolaj@91.245.73.44) Quit (Read error: No route to host)
[16:55] * mykola (~Mikolaj@91.245.73.44) Quit (Read error: No route to host)
[16:55] * mgolub (~Mikolaj@91.245.73.44) has joined #ceph
[16:55] * mykola (~Mikolaj@91.245.73.44) has joined #ceph
[16:56] * jdohms_ (~jdohms@flyingmonkey.concordia.ab.ca) Quit (Quit: leaving)
[16:56] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) has joined #ceph
[16:57] * alkaid (~alkaid@128.199.95.148) Quit (Quit: Leaving)
[16:57] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) has joined #ceph
[17:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[17:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[17:03] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[17:04] * Miho (~Kyso_@6AGAABKUY.tor-irc.dnsbl.oftc.net) Quit ()
[17:05] * PeterRabbit (~Moriarty@exit1.ipredator.se) has joined #ceph
[17:05] * rraja (~rraja@121.244.87.117) Quit (Quit: Leaving)
[17:06] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[17:06] * bene3 (~bene@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[17:09] * kawa2014 (~kawa@89.184.114.246) Quit (Ping timeout: 480 seconds)
[17:11] * wjw-freebsd (~wjw@176.74.240.1) Quit (Ping timeout: 480 seconds)
[17:11] * mattbenjamin (~mbenjamin@173-165-86-195-Illinois.hfc.comcastbusiness.net) has joined #ceph
[17:12] * TehZomB (~Wizeon@192.42.115.101) has joined #ceph
[17:15] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[17:17] * Kioob1 (~Kioob@ALyon-652-1-70-193.w109-213.abo.wanadoo.fr) has joined #ceph
[17:18] * kawa2014 (~kawa@178.162.201.97) has joined #ceph
[17:19] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[17:24] * mattbenjamin (~mbenjamin@173-165-86-195-Illinois.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[17:26] * linuxkidd (~linuxkidd@241.sub-70-210-192.myvzw.com) has joined #ceph
[17:29] * TMM (~hp@185.5.122.2) Quit (Quit: Ex-Chat)
[17:29] * rjdias (~rdias@bl7-92-98.dsl.telepac.pt) has joined #ceph
[17:29] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) has joined #ceph
[17:33] * rdias (~rdias@bl7-92-98.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[17:33] * rjdias is now known as rdias
[17:34] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) Quit (Remote host closed the connection)
[17:34] * PeterRabbit (~Moriarty@4MJAAEL43.tor-irc.dnsbl.oftc.net) Quit ()
[17:35] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[17:36] * wushudoin (~wushudoin@2601:646:8202:5ed0:2ab2:bdff:fe0b:a6ee) has joined #ceph
[17:37] * kawa2014 (~kawa@178.162.201.97) Quit (Ping timeout: 480 seconds)
[17:42] * TehZomB (~Wizeon@76GAAEZ6B.tor-irc.dnsbl.oftc.net) Quit ()
[17:42] * Bromine (~raindog@hessel2.torservers.net) has joined #ceph
[17:44] * bvi (~bastiaan@185.56.32.1) Quit (Quit: Ex-Chat)
[17:45] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[17:45] <ledgr> Hello, I have replaced a failed drive following this tutorial: http://ceph.com/planet/admin-guide-replacing-a-failed-disk-in-a-ceph-cluster/
[17:45] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) has joined #ceph
[17:45] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[17:45] <ledgr> # ceph osd create
[17:45] <ledgr> 17
[17:46] <ledgr> but when i run ceph-deploy --overwrite-conf osd prepare [node]:sdz
[17:46] <ledgr> this new drive is added as osd.24, with status up, but weight 0
[17:47] <ledgr> ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR
[17:47] <ledgr> 17 0 0 0 0 0 0 0
[17:47] <ledgr> 24 0 1.00000 921G 51420k 921G 0.01 0
[17:47] <ledgr> why
[17:47] <ledgr> ?
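The table ledgr pasted can be checked mechanically for OSDs that are in (REWEIGHT above 0) but carry a CRUSH weight of 0, which means CRUSH will not place data on them. The sketch below parses the three pasted lines by whitespace; the function name `zero_weight_osds` is hypothetical, and it assumes the columns stay in the header order shown.

```python
def zero_weight_osds(table):
    """Return ids of OSDs with WEIGHT == 0 and REWEIGHT > 0 from an 'osd df' style table."""
    rows = [line.split() for line in table.strip().splitlines()]
    header, body = rows[0], rows[1:]
    id_col = header.index("ID")
    weight_col = header.index("WEIGHT")
    reweight_col = header.index("REWEIGHT")
    flagged = []
    for row in body:
        if len(row) != len(header):
            continue  # skip summary or malformed lines
        if float(row[weight_col]) == 0 and float(row[reweight_col]) > 0:
            flagged.append(int(row[id_col]))
    return flagged


# The header and two rows as pasted in the channel at 17:47.
PASTED = """\
ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR
17 0 0 0 0 0 0 0
24 0 1.00000 921G 51420k 921G 0.01 0
"""

print(zero_weight_osds(PASTED))
```

In the pasted table this flags osd 24, the new disk; id 17, the entry created by "ceph osd create", has both columns at 0 and is not flagged.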
[17:47] * haplo37 (~haplo37@199.91.185.156) has joined #ceph
[17:50] * ircolle (~Adium@2601:285:201:633a:4cd8:3cde:b655:18ac) has joined #ceph
[17:50] * huangjun|2 (~kvirc@117.152.69.60) Quit (Ping timeout: 480 seconds)
[17:52] * Kioob1 (~Kioob@ALyon-652-1-70-193.w109-213.abo.wanadoo.fr) Quit (Quit: Leaving.)
[17:52] * adun153 (~ljtirazon@49.144.44.240) has joined #ceph
[17:57] * mattbenjamin (~mbenjamin@173-165-86-195-Illinois.hfc.comcastbusiness.net) has joined #ceph
[18:00] * evelu (~erwan@37.161.157.40) Quit (Read error: Connection reset by peer)
[18:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[18:01] * haomaiwang (~haomaiwan@106.187.51.170) has joined #ceph
[18:04] * pabluk_ is now known as pabluk__
[18:05] * Shesh (~Tonux@185.100.87.82) has joined #ceph
[18:07] * evelu (~erwan@37.161.157.40) has joined #ceph
[18:08] * mattbenjamin (~mbenjamin@173-165-86-195-Illinois.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[18:12] * jordanP (~jordan@204.13-14-84.ripe.coltfrance.com) Quit (Quit: Leaving)
[18:12] * Bromine (~raindog@6AGAABKXJ.tor-irc.dnsbl.oftc.net) Quit ()
[18:12] * danielsj (~cryptk@185.61.149.51) has joined #ceph
[18:14] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[18:14] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[18:18] * yanzheng (~zhyan@118.116.113.70) has joined #ceph
[18:20] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Remote host closed the connection)
[18:23] * Hemanth (~hkumar_@103.228.221.149) Quit (Ping timeout: 480 seconds)
[18:24] * overclk (~quassel@117.202.104.175) Quit (Ping timeout: 480 seconds)
[18:26] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit (Quit: Leaving.)
[18:27] * overclk (~quassel@117.202.99.115) has joined #ceph
[18:28] * haplo37 (~haplo37@199.91.185.156) Quit (Ping timeout: 480 seconds)
[18:29] * mattbenjamin (~mbenjamin@173-165-86-195-Illinois.hfc.comcastbusiness.net) has joined #ceph
[18:31] * haplo37 (~haplo37@199.91.185.156) has joined #ceph
[18:33] * evelu (~erwan@37.161.157.40) Quit (Read error: Connection reset by peer)
[18:33] * erwan_taf (~erwan@62.147.161.106) has joined #ceph
[18:35] * Shesh (~Tonux@76GAAEZ76.tor-irc.dnsbl.oftc.net) Quit ()
[18:35] * KeeperOfTheSoul (~drdanick@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[18:39] * mattbenjamin (~mbenjamin@173-165-86-195-Illinois.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[18:40] * kawa2014 (~kawa@89.184.114.246) Quit (Ping timeout: 480 seconds)
[18:40] * Hemanth (~hkumar_@103.228.221.149) has joined #ceph
[18:40] * kawa2014 (~kawa@212.110.41.244) has joined #ceph
[18:41] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[18:42] * danielsj (~cryptk@4MJAAEL7C.tor-irc.dnsbl.oftc.net) Quit ()
[18:44] * kefu (~kefu@114.92.122.74) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[18:44] * yanzheng (~zhyan@118.116.113.70) Quit (Quit: This computer has gone to sleep)
[18:44] * adun153 (~ljtirazon@49.144.44.240) Quit (Remote host closed the connection)
[18:46] * davidzlap (~Adium@2605:e000:1313:8003:90f5:10a4:d675:6c9d) has joined #ceph
[18:48] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) Quit (Quit: Leaving.)
[18:50] * Miouge (~Miouge@188.189.69.6) has joined #ceph
[18:52] * i_m (~ivan.miro@31.207.236.130) has joined #ceph
[18:53] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[18:57] * dvanders_ (~dvanders@46.227.20.178) has joined #ceph
[18:58] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) has joined #ceph
[18:59] * erwan_taf (~erwan@62.147.161.106) Quit (Ping timeout: 480 seconds)
[18:59] * joshd (~jdurgin@71-92-201-212.dhcp.gldl.ca.charter.com) Quit (Quit: Leaving.)
[19:00] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Quit: Leaving...)
[19:01] * haomaiwang (~haomaiwan@106.187.51.170) Quit (Remote host closed the connection)
[19:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[19:02] * LeaChim (~LeaChim@host86-150-161-6.range86-150.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[19:04] * bara (~bara@nat-pool-brq-t.redhat.com) Quit (Quit: Bye guys!)
[19:05] * KeeperOfTheSoul (~drdanick@06SAABZP8.tor-irc.dnsbl.oftc.net) Quit ()
[19:05] * Sigma (~luckz@06SAABZRF.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:07] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[19:07] * Miouge (~Miouge@188.189.69.6) Quit (Ping timeout: 480 seconds)
[19:07] * dvanders_ (~dvanders@46.227.20.178) Quit (Remote host closed the connection)
[19:08] * erwan_taf (~erwan@37.161.157.40) has joined #ceph
[19:10] * LeaChim (~LeaChim@host86-150-161-6.range86-150.btcentralplus.com) has joined #ceph
[19:12] * SquallSeeD31 (~ahmeni@h-130-176.a2.corp.bahnhof.no) has joined #ceph
[19:18] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[19:21] * Guest1823 is now known as herrsergio
[19:24] * jluis (~joao@8.184.114.89.rev.vodafone.pt) has joined #ceph
[19:24] * ChanServ sets mode +o jluis
[19:30] * joao|afk (~joao@8.184.114.89.rev.vodafone.pt) Quit (Ping timeout: 480 seconds)
[19:32] * kawa2014 (~kawa@212.110.41.244) Quit (Quit: Leaving)
[19:33] * derjohn_mobi (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[19:35] * Sigma (~luckz@06SAABZRF.tor-irc.dnsbl.oftc.net) Quit ()
[19:35] * Lunk2 (~dotblank@216.230.148.77) has joined #ceph
[19:35] * overclk (~quassel@117.202.99.115) Quit (Read error: Connection reset by peer)
[19:42] * SquallSeeD31 (~ahmeni@06SAABZR0.tor-irc.dnsbl.oftc.net) Quit ()
[19:42] * jwandborg (~Crisco@tor-exit1-readme.dfri.se) has joined #ceph
[19:49] * csoukup (~csoukup@159.140.254.106) Quit (Ping timeout: 480 seconds)
[19:55] * shylesh (~shylesh@45.124.226.164) Quit (Remote host closed the connection)
[20:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[20:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[20:01] * derjohn_mobi (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[20:02] * krypto (~krypto@103.252.26.176) Quit (Ping timeout: 480 seconds)
[20:02] * krypto (~krypto@sessfw99-sesbfw99-92.ericsson.net) has joined #ceph
[20:03] * ngoswami (~ngoswami@121.244.87.116) Quit (Quit: Leaving)
[20:05] * Lunk2 (~dotblank@7V7AAD96B.tor-irc.dnsbl.oftc.net) Quit ()
[20:10] * wwdillingham (~LobsterRo@140.247.242.44) Quit (Quit: wwdillingham)
[20:11] * wwdillingham (~LobsterRo@140.247.242.44) has joined #ceph
[20:12] * jwandborg (~Crisco@7V7AAD96M.tor-irc.dnsbl.oftc.net) Quit ()
[20:12] * ZombieTree (~spidu_@anonymous.sec.nl) has joined #ceph
[20:14] * infernixx (nix@spirit.infernix.net) has joined #ceph
[20:15] * billwebb (~billwebb@50-203-47-138-static.hfc.comcastbusiness.net) Quit (Quit: billwebb)
[20:15] * infernix (nix@000120cb.user.oftc.net) Quit (Read error: Connection reset by peer)
[20:18] * infernixx is now known as infernix
[20:22] * reed (~reed@75-101-54-18.dsl.static.fusionbroadband.com) Quit (Quit: Ex-Chat)
[20:28] * joshd (~jdurgin@206.169.83.146) has joined #ceph
[20:28] * Hemanth (~hkumar_@103.228.221.149) Quit (Ping timeout: 480 seconds)
[20:29] * infernix (nix@spirit.infernix.net) Quit (Ping timeout: 480 seconds)
[20:30] * erwan_taf (~erwan@37.161.157.40) Quit (Read error: Connection reset by peer)
[20:32] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[20:35] * uhtr5r (~Mattress@snowfall.relay.coldhak.com) has joined #ceph
[20:36] * nils_ (~nils_@doomstreet.collins.kg) has joined #ceph
[20:37] * krypto (~krypto@sessfw99-sesbfw99-92.ericsson.net) Quit (Ping timeout: 480 seconds)
[20:37] * krypto (~krypto@G68-90-105-88.sbcis.sbc.com) has joined #ceph
[20:38] * Miouge (~Miouge@188.189.69.6) has joined #ceph
[20:38] * raarts (~Adium@82-171-243-109.ip.telfort.nl) has joined #ceph
[20:41] * Hemanth (~hkumar_@103.228.221.149) has joined #ceph
[20:42] * ZombieTree (~spidu_@4MJAAEMBB.tor-irc.dnsbl.oftc.net) Quit ()
[20:42] * matx (~vegas3@atlantic480.us.unmetered.com) has joined #ceph
[20:42] <raarts> Hi, I am new to ceph, and trying to learn all about it. So far I've found ceph-deploy, ansible-ceph, ceph-docker, and those work but that actually makes ceph a little opaque. Is there somewhere a detailed walkthrough on how to install ceph by hand with explanation? I want to really understand it before going into production with it. What is the recommendation here?
[20:44] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[20:46] * stupidnic (~foo@office.expresshosting.net) has joined #ceph
[20:46] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:7875:fe8c:7399:b0c8) Quit (Ping timeout: 480 seconds)
[20:47] * erwan_taf (~erwan@37.161.157.40) has joined #ceph
[20:48] * infernix (nix@000120cb.user.oftc.net) has joined #ceph
[20:48] * shaunm (~shaunm@74.83.215.100) Quit (Ping timeout: 480 seconds)
[20:50] * Miouge_ (~Miouge@188.189.74.126) has joined #ceph
[20:52] * Miouge (~Miouge@188.189.69.6) Quit (Ping timeout: 480 seconds)
[20:52] * Miouge_ is now known as Miouge
[20:52] * krypto (~krypto@G68-90-105-88.sbcis.sbc.com) Quit (Remote host closed the connection)
[20:53] * krypto (~krypto@103.252.26.176) has joined #ceph
[20:56] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[20:59] * georgem (~Adium@206.108.127.16) has joined #ceph
[20:59] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) has joined #ceph
[20:59] * shohn (~shohn@dslb-188-102-024-152.188.102.pools.vodafone-ip.de) Quit (Quit: Leaving.)
[21:00] * Miouge_ (~Miouge@188.188.65.204) has joined #ceph
[21:00] * shohn (~shohn@dslb-188-102-024-152.188.102.pools.vodafone-ip.de) has joined #ceph
[21:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[21:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[21:02] * krypto (~krypto@103.252.26.176) Quit (Ping timeout: 480 seconds)
[21:03] * Miouge (~Miouge@188.189.74.126) Quit (Ping timeout: 480 seconds)
[21:03] * Miouge_ is now known as Miouge
[21:05] * uhtr5r (~Mattress@6AGAABK2W.tor-irc.dnsbl.oftc.net) Quit ()
[21:05] * Zyn (~ain@7V7AAD98E.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:07] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[21:12] * matx (~vegas3@7V7AAD97U.tor-irc.dnsbl.oftc.net) Quit ()
[21:12] * Esvandiary (~smf68@06SAABZW7.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:19] <hoonetorg> raarts: write your own formula in saltstack or a chef-ceph-cookbook
[21:20] * bvi (~bastiaan@152-64-132-5.ftth.glasoperator.nl) has joined #ceph
[21:20] <hoonetorg> or read through what f.e. the ansible-ceph does and do it step by step by hand
[21:20] <hoonetorg> or ceph-deploy
[21:20] * alkaid (~alkaid@128.199.95.148) Quit (Quit: Leaving)
[21:21] <hoonetorg> or walk through this http://docs.ceph.com/docs/master/install/manual-deployment/
[21:22] <hoonetorg> (i had difficulties with that :) )
[21:22] <xcezzz> honestly you're not going to gain anything from that...
[21:22] <xcezzz> you WILL gain learning ceph-deploy
[21:23] <xcezzz> in the future you are not going to be installing and setting up nodes manually... that is just silly... and tedious... and likely to cause inconsistency issues
[21:24] <hoonetorg> xcezzz: you're right
[21:25] <hoonetorg> but i learned a lot writing my own salt-ceph-formula (was my 1st formula at all, and needs a lot of refactoring)
[21:25] <raarts> Ok, but I feel I need to know what's going on behind the screens, to be able to troubleshoot more effectively later, and gain a deeper understanding.
[21:25] <xcezzz> hoonetorg: same...
[21:25] <hoonetorg> :)
[21:25] <xcezzz> raarts: you learning how it generates the ceph cluster id, manually formatting/partitioning osds, etc... is not that useful
[21:26] <xcezzz> you can still learn everything with ceph
[21:27] <xcezzz> setup a test cluster
[21:27] <xcezzz> try to screw it up
[21:27] * dyasny_ (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Remote host closed the connection)
[21:27] <xcezzz> ... which is very hard to do ... just always go 3 replica, or erasure code
[21:28] <raarts> xcezzz: hoonetorg: ok, I'll start with ceph-deploy. Thanks.
[21:28] <xcezzz> jack drives, disconnect ports, reboot servers mid workload, etc... you WILL have times you have to resolve issues yourself... but you'll notice the real intricacies you "may" gain from doing the whole thing manually will have no effect on those cases
[21:28] * madkiss (~madkiss@31.154.44.218) has joined #ceph
[21:33] * Hemanth (~hkumar_@103.228.221.149) Quit (Ping timeout: 480 seconds)
[21:35] * Zyn (~ain@7V7AAD98E.tor-irc.dnsbl.oftc.net) Quit ()
[21:35] * SinZ|offline (~Da_Pineap@76GAAE0ET.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:36] * nils_ (~nils_@doomstreet.collins.kg) Quit (Quit: This computer has gone to sleep)
[21:38] * Miouge (~Miouge@188.188.65.204) Quit (Quit: Miouge)
[21:42] * Esvandiary (~smf68@06SAABZW7.tor-irc.dnsbl.oftc.net) Quit ()
[21:42] * brannmar (~SurfMaths@06SAABZYH.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:48] * rendar (~I@host38-182-dynamic.12-79-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[21:52] * rendar (~I@host38-182-dynamic.12-79-r.retail.telecomitalia.it) has joined #ceph
[21:54] * zaitcev (~zaitcev@c-50-130-189-82.hsd1.nm.comcast.net) has joined #ceph
[21:59] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) Quit (Quit: Leaving)
[22:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[22:01] * thomnico (~thomnico@2a01:e35:8b41:120:5145:b80f:6a00:6a05) Quit (Ping timeout: 480 seconds)
[22:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[22:01] * erwan_taf (~erwan@37.161.157.40) Quit (Ping timeout: 480 seconds)
[22:05] * SinZ|offline (~Da_Pineap@76GAAE0ET.tor-irc.dnsbl.oftc.net) Quit ()
[22:05] * smf68 (~Jourei@hessel3.torservers.net) has joined #ceph
[22:10] * erwan_taf (~erwan@37.166.41.150) has joined #ceph
[22:12] * KindOne (kindone@h183.41.30.71.dynamic.ip.windstream.net) has joined #ceph
[22:12] * brannmar (~SurfMaths@06SAABZYH.tor-irc.dnsbl.oftc.net) Quit ()
[22:15] * wwdillingham (~LobsterRo@140.247.242.44) Quit (Quit: wwdillingham)
[22:15] * wwdillingham (~LobsterRo@140.247.242.44) has joined #ceph
[22:17] * Skaag (~lunix@rrcs-67-52-140-5.west.biz.rr.com) has joined #ceph
[22:19] * jowilkin (~jowilkin@c-98-207-136-41.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[22:20] * jowilkin (~jowilkin@c-98-207-136-41.hsd1.ca.comcast.net) has joined #ceph
[22:22] * shohn (~shohn@dslb-188-102-024-152.188.102.pools.vodafone-ip.de) Quit (Ping timeout: 480 seconds)
[22:24] * shohn (~shohn@dslb-188-102-024-152.188.102.pools.vodafone-ip.de) has joined #ceph
[22:24] * shaunm (~shaunm@72.49.2.237) has joined #ceph
[22:26] * hellboy2k8 (~Kiks@146.185.31.226) has joined #ceph
[22:26] * hellboy2k8 (~Kiks@146.185.31.226) Quit ()
[22:26] * georgem (~Adium@206.108.127.16) Quit (Ping timeout: 480 seconds)
[22:27] * penguinRaider (~KiKs@146.185.31.226) has joined #ceph
[22:35] * smf68 (~Jourei@4MJAAEME2.tor-irc.dnsbl.oftc.net) Quit ()
[22:35] * Architect (~Linkshot@62.149.25.15) has joined #ceph
[22:35] * mykola (~Mikolaj@91.245.73.44) Quit (Quit: away)
[22:35] * mgolub (~Mikolaj@91.245.73.44) Quit (Quit: away)
[22:36] * bstillwell (~bryan@bokeoa.com) has joined #ceph
[22:39] * sileht (~sileht@gizmo.sileht.net) Quit (Quit: WeeChat 1.4)
[22:41] * sileht (~sileht@gizmo.sileht.net) has joined #ceph
[22:42] * csoukup (~csoukup@2605:a601:9c8:6b00:8135:1780:6432:2948) has joined #ceph
[22:42] * richardus1 (~Jamana@tor.effi.org) has joined #ceph
[22:50] * csoukup (~csoukup@2605:a601:9c8:6b00:8135:1780:6432:2948) Quit (Ping timeout: 480 seconds)
[22:52] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[22:53] * allaok (~allaok@ARennes-658-1-234-97.w2-13.abo.wanadoo.fr) has joined #ceph
[22:54] * thansen (~thansen@17.253.sfcn.org) has joined #ceph
[22:56] * shohn (~shohn@dslb-188-102-024-152.188.102.pools.vodafone-ip.de) Quit (Read error: Connection reset by peer)
[22:56] * shohn (~shohn@dslb-188-102-024-152.188.102.pools.vodafone-ip.de) has joined #ceph
[22:59] * Skaag (~lunix@rrcs-67-52-140-5.west.biz.rr.com) Quit (Read error: Connection reset by peer)
[23:00] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) has joined #ceph
[23:00] * Skaag (~lunix@rrcs-67-52-140-5.west.biz.rr.com) has joined #ceph
[23:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[23:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[23:05] * Architect (~Linkshot@6AGAABK6K.tor-irc.dnsbl.oftc.net) Quit ()
[23:06] * wwdillingham (~LobsterRo@140.247.242.44) Quit (Ping timeout: 480 seconds)
[23:08] * drnexus (~cmorandin@boc06-4-78-216-15-170.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[23:12] * allaok (~allaok@ARennes-658-1-234-97.w2-13.abo.wanadoo.fr) has left #ceph
[23:12] * richardus1 (~Jamana@76GAAE0GR.tor-irc.dnsbl.oftc.net) Quit ()
[23:12] * Shnaw (~Oddtwang@94.155.49.47) has joined #ceph
[23:13] * scheuk (~scheuk@204.246.67.78) has joined #ceph
[23:17] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) has joined #ceph
[23:19] * haplo37 (~haplo37@199.91.185.156) Quit (Remote host closed the connection)
[23:23] * thansen (~thansen@17.253.sfcn.org) Quit (Quit: Ex-Chat)
[23:25] * shaunm (~shaunm@72.49.2.237) Quit (Ping timeout: 480 seconds)
[23:35] * ivancich (~ivancich@aa2.linuxbox.com) Quit (Read error: Connection reset by peer)
[23:35] * ivancich_ (~ivancich@aa2.linuxbox.com) has joined #ceph
[23:35] * ivancich_ is now known as ivancich
[23:38] * dsl (~dsl@72-48-250-184.dyn.grandenetworks.net) Quit (Remote host closed the connection)
[23:38] * bvi (~bastiaan@152-64-132-5.ftth.glasoperator.nl) Quit (Quit: Ex-Chat)
[23:42] * Shnaw (~Oddtwang@7V7AAEAAK.tor-irc.dnsbl.oftc.net) Quit ()
[23:42] * mog_2 (~PierreW@06SAABZ3A.tor-irc.dnsbl.oftc.net) has joined #ceph
[23:45] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) Quit (Remote host closed the connection)
[23:47] * HappyLoaf (~HappyLoaf@cpc93928-bolt16-2-0-cust133.10-3.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[23:48] * HappyLoaf (~HappyLoaf@cpc93928-bolt16-2-0-cust133.10-3.cable.virginm.net) has joined #ceph
[23:51] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) Quit (Ping timeout: 480 seconds)
[23:55] * kevinc (~kevinc__@client64-174.sdsc.edu) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.