#ceph IRC Log

IRC Log for 2016-04-04

Timestamps are in GMT/BST.

[0:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[0:01] * dgurtner (~dgurtner@217.149.140.193) has joined #ceph
[0:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[0:02] * Hejt (~ggg@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[0:12] * nolan (~nolan@2001:470:1:41:a800:ff:fe3e:ad08) Quit (Quit: ZNC - http://znc.sourceforge.net)
[0:27] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:246c:eb77:7876:1701) has joined #ceph
[0:32] * utugi______ (~CoMa@tor1e1.privacyfoundation.ch) has joined #ceph
[0:32] * Hejt (~ggg@4MJAADWM8.tor-irc.dnsbl.oftc.net) Quit ()
[0:32] * ItsCriminalAFK (~Jourei@tor3.digineo.de) has joined #ceph
[0:40] * davidz1 (~davidz@2605:e000:1313:8003:4429:d01f:bf3f:6f57) Quit (Quit: Leaving.)
[0:40] * danieagle (~Daniel@189-47-91-188.dsl.telesp.net.br) Quit (Quit: Obrigado por Tudo! :-) inte+ :-))
[0:44] * wiml (~wiml@2602:47:d406:8a00:76d4:35ff:febf:194f) has joined #ceph
[0:45] * wiml (~wiml@2602:47:d406:8a00:76d4:35ff:febf:194f) Quit ()
[0:46] * wiml (~wiml@2602:47:d406:8a00:76d4:35ff:febf:194f) has joined #ceph
[0:50] * Anticimex (anticimex@netforce.sth.millnert.se) Quit (Ping timeout: 480 seconds)
[0:50] * Lea (~LeaChim@host86-168-120-216.range86-168.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[1:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[1:01] * utugi______ (~CoMa@06SAAAWFP.tor-irc.dnsbl.oftc.net) Quit ()
[1:02] * osuka_ (~Bj_o_rn@chomsky.torservers.net) has joined #ceph
[1:02] * bjornar_ (~bjornar@ti0099a430-1561.bb.online.no) Quit (Ping timeout: 480 seconds)
[1:02] * ItsCriminalAFK (~Jourei@6AGAAACY6.tor-irc.dnsbl.oftc.net) Quit ()
[1:02] * Epi (~hyst@93.115.95.205) has joined #ceph
[1:11] * derjohn_mob (~aj@x590d10ec.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[1:13] * yuxiaozou (~yuxiaozou@128.135.100.102) Quit (Ping timeout: 480 seconds)
[1:13] * olid19811115 (~olid1982@aftr-185-17-204-248.dynamic.mnet-online.de) Quit (Ping timeout: 480 seconds)
[1:14] * Anticimex (anticimex@netforce.sth.millnert.se) has joined #ceph
[1:19] * ronrib (~boswortr@45.32.242.135) Quit (Remote host closed the connection)
[1:19] * geli (~geli@geli-2015.its.utas.edu.au) Quit (Read error: Connection reset by peer)
[1:20] * ronrib (~boswortr@45.32.242.135) has joined #ceph
[1:23] * gopher_49 (~gopher_49@75.66.43.16) has joined #ceph
[1:26] * dgurtner (~dgurtner@217.149.140.193) Quit (Ping timeout: 480 seconds)
[1:31] * osuka_ (~Bj_o_rn@06SAAAWGJ.tor-irc.dnsbl.oftc.net) Quit ()
[1:32] * Kidlvr (~thundercl@176.61.147.146) has joined #ceph
[1:32] * Epi (~hyst@4MJAADWO8.tor-irc.dnsbl.oftc.net) Quit ()
[1:32] * _s1gma (~Heliwr@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[1:33] * yuxiaozou (~yuxiaozou@128.135.100.102) has joined #ceph
[1:37] * Skaag (~lunix@cpe-172-91-77-84.socal.res.rr.com) has joined #ceph
[1:48] * nolan (~nolan@2001:470:1:41:a800:ff:fe3e:ad08) has joined #ceph
[1:49] * shaunm (~shaunm@74.83.215.100) has joined #ceph
[1:57] * vata (~vata@cable-21.246.173-197.electronicbox.net) Quit (Remote host closed the connection)
[1:57] * oms101 (~oms101@p20030057EA3CB600C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[2:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[2:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[2:01] * Kidlvr (~thundercl@6AGAAAC1D.tor-irc.dnsbl.oftc.net) Quit ()
[2:02] * verbalins (~Mousey@192.87.28.28) has joined #ceph
[2:02] * _s1gma (~Heliwr@06SAAAWHG.tor-irc.dnsbl.oftc.net) Quit ()
[2:02] * luigiman (~Sketchfil@6AGAAAC18.tor-irc.dnsbl.oftc.net) has joined #ceph
[2:03] * badone (~badone@66.187.239.16) has joined #ceph
[2:05] * oms101 (~oms101@p20030057EA04C000C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[2:19] * Skaag (~lunix@cpe-172-91-77-84.socal.res.rr.com) Quit (Quit: Leaving.)
[2:26] * wjw-freebsd4 (~wjw@smtp.digiware.nl) has joined #ceph
[2:31] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Ping timeout: 480 seconds)
[2:31] * verbalins (~Mousey@4MJAADWQV.tor-irc.dnsbl.oftc.net) Quit ()
[2:32] * luigiman (~Sketchfil@6AGAAAC18.tor-irc.dnsbl.oftc.net) Quit ()
[2:35] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[2:36] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[2:37] * RameshN (~rnachimu@101.222.240.85) has joined #ceph
[2:41] * xENO_ (~Knuckx@atlantic480.us.unmetered.com) has joined #ceph
[2:56] * jowilkin (~jowilkin@2601:644:4000:b0bf:ea2a:eaff:fe08:3f1d) Quit (Ping timeout: 480 seconds)
[2:58] * georgem (~Adium@69-165-151-116.dsl.teksavvy.com) has joined #ceph
[2:59] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[3:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[3:01] * georgem (~Adium@69-165-151-116.dsl.teksavvy.com) Quit ()
[3:01] * georgem (~Adium@206.108.127.16) has joined #ceph
[3:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[3:05] * jowilkin (~jowilkin@2601:644:4000:b0bf:ea2a:eaff:fe08:3f1d) has joined #ceph
[3:07] * vata (~vata@cable-21.246.173-197.electronicbox.net) has joined #ceph
[3:10] * xENO_ (~Knuckx@4MJAADWRX.tor-irc.dnsbl.oftc.net) Quit ()
[3:11] * SweetGirl (~Gecko1986@185.36.100.145) has joined #ceph
[3:24] * Skaag1 (~lunix@65.200.54.234) has joined #ceph
[3:25] * bjornar_ (~bjornar@ti0099a430-1561.bb.online.no) has joined #ceph
[3:37] * _br_ (~matx@chomsky.torservers.net) has joined #ceph
[3:40] * SweetGirl (~Gecko1986@06SAAAWKS.tor-irc.dnsbl.oftc.net) Quit ()
[3:41] * smf68 (~VampiricP@ekumen.nos-oignons.net) has joined #ceph
[3:41] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:246c:eb77:7876:1701) Quit (Ping timeout: 480 seconds)
[3:50] * RameshN (~rnachimu@101.222.240.85) Quit (Ping timeout: 480 seconds)
[4:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[4:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[4:04] * reed (~reed@75-101-54-18.dsl.static.fusionbroadband.com) Quit (Ping timeout: 480 seconds)
[4:05] * reed (~reed@75-101-54-18.dsl.static.fusionbroadband.com) has joined #ceph
[4:07] * _br_ (~matx@6AGAAAC5H.tor-irc.dnsbl.oftc.net) Quit ()
[4:10] * smf68 (~VampiricP@4MJAADWTQ.tor-irc.dnsbl.oftc.net) Quit ()
[4:11] * qable (~slowriot@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[4:16] * bjornar_ (~bjornar@ti0099a430-1561.bb.online.no) Quit (Ping timeout: 480 seconds)
[4:21] * wyang (~wyang@114.111.166.42) has joined #ceph
[4:24] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[4:36] * csoukup (~csoukup@2605:a601:9c8:6b00:7907:7972:321:b120) has joined #ceph
[4:37] * Curt` (~djidis__@ns316491.ip-37-187-129.eu) has joined #ceph
[4:38] * guerby (~guerby@2a03:7220:8080:a500::1) Quit (Ping timeout: 480 seconds)
[4:41] * qable (~slowriot@4MJAADWUM.tor-irc.dnsbl.oftc.net) Quit ()
[4:41] * JWilbur (~w2k@host-200-176.junet.se) has joined #ceph
[4:42] * georgem (~Adium@206.108.127.16) Quit (Quit: Leaving.)
[4:43] * c_soukup (~csoukup@136.63.84.142) has joined #ceph
[4:45] * gopher_49 (~gopher_49@75.66.43.16) Quit (Ping timeout: 480 seconds)
[4:47] * csoukup (~csoukup@2605:a601:9c8:6b00:7907:7972:321:b120) Quit (Ping timeout: 480 seconds)
[4:53] * Skaag1 (~lunix@65.200.54.234) Quit (Quit: Leaving.)
[4:54] * c_soukup (~csoukup@136.63.84.142) Quit (Read error: Connection reset by peer)
[5:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[5:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[5:06] * Zabidin (~oftc-webi@124.13.35.225) has joined #ceph
[5:06] <Zabidin> Hello..
[5:06] <Zabidin> Just want to ask, why i get this error when want to add other monitor? > [root@mon01 ceph]# ceph mon getmap -o /tmp/monmap 2016-04-04 11:05:28.711396 7f2b0c6a0700 0 -- :/2519154720 >> 42.0.30.39:6789/0 pipe(0x7f2b00000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f2b00004ef0).fault 2016-04-04 11:05:37.712745 7f2b0c59f700 0 -- 42.0.30.38:0/2519154720 >> 42.0.30.39:6789/0 pipe(0x7f2b000081b0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f2b00005110).fault 2016-04-04 11:05:52.7145
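The repeated 'pipe(...).fault' lines mean the client on 42.0.30.38 cannot reach the monitor at 42.0.30.39:6789 at all. A minimal connectivity check (addresses taken from the pasted error; the firewalld and systemd commands assume CentOS 7 with a systemd-managed monitor, as discussed later in this log):

    ping -c3 42.0.30.39                                    # basic reachability
    timeout 3 bash -c '</dev/tcp/42.0.30.39/6789' && echo "mon port reachable"
    firewall-cmd --list-all                                # on 42.0.30.39: 6789/tcp must be allowed
    systemctl list-units 'ceph-mon@*'                      # on 42.0.30.39: is a ceph-mon instance running?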
[5:07] * Curt` (~djidis__@4MJAADWVC.tor-irc.dnsbl.oftc.net) Quit ()
[5:07] * Jourei (~Kakeru@06SAAAWN0.tor-irc.dnsbl.oftc.net) has joined #ceph
[5:11] * JWilbur (~w2k@06SAAAWNB.tor-irc.dnsbl.oftc.net) Quit ()
[5:11] * airsoftglock (~slowriot@torsrvs.snydernet.net) has joined #ceph
[5:19] * efirs (~firs@c-50-185-70-125.hsd1.ca.comcast.net) has joined #ceph
[5:22] * jamespd (~mucky@mucky.socket7.org) Quit (Read error: Connection reset by peer)
[5:22] * jamespd (~mucky@mucky.socket7.org) has joined #ceph
[5:30] * yanzheng (~zhyan@125.70.23.194) has joined #ceph
[5:30] * Vacuum_ (~Vacuum@i59F7929F.versanet.de) has joined #ceph
[5:32] * yanzheng (~zhyan@125.70.23.194) Quit ()
[5:33] <Zabidin> What is the error > http://pastebin.com/uH7aiANX ?
[5:35] * gopher_49 (~gopher_49@75.66.43.16) has joined #ceph
[5:37] * Vacuum__ (~Vacuum@88.130.202.31) Quit (Ping timeout: 480 seconds)
[5:37] * Jourei (~Kakeru@06SAAAWN0.tor-irc.dnsbl.oftc.net) Quit ()
[5:37] * pakman__ (~Azerothia@185.100.84.82) has joined #ceph
[5:41] * airsoftglock (~slowriot@06SAAAWN3.tor-irc.dnsbl.oftc.net) Quit ()
[5:41] <Zabidin> Found something.. ntp not sync..
[5:41] <Zabidin> Maybe that's what the error is about..
[5:41] * Heliwr (~Hidendra@prettypls.lvpshosting.com) has joined #ceph
[5:43] * Skaag (~lunix@cpe-172-91-77-84.socal.res.rr.com) has joined #ceph
[5:45] * wyang (~wyang@114.111.166.42) Quit (Quit: This computer has gone to sleep)
[5:45] <Zabidin> Every time i reboot the monitor, mon02 will not come up..
[5:45] <Zabidin> Getting error from systemd
[5:46] <Zabidin> Apr 04 11:44:58 mon02.domain.net ceph-mon[2354]: monitor data directory at '/var/lib/ceph/mon/ceph-mon01' does not exist: have you run 'mkfs'?
[5:46] <Zabidin> Any idea why? Mon01 is up properly..
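The systemd message above shows mon02's daemon looking for '/var/lib/ceph/mon/ceph-mon01', i.e. the unit is being started with mon01's id, or the data directory was never created on mon02. A quick check, assuming the usual systemd-managed ceph-mon@<id> units (not confirmed in the log):

    ls /var/lib/ceph/mon/                 # on mon02: expect a directory named ceph-mon02
    systemctl list-units 'ceph-mon@*'     # the instance name after '@' must match that directory
    # If the directory is missing, recreate the monitor on this host (e.g. 'ceph-deploy mon create mon02'
    # or the manual bootstrap) rather than running mkfs by hand.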
[5:52] * Skaag (~lunix@cpe-172-91-77-84.socal.res.rr.com) Quit (Quit: Leaving.)
[5:56] <Zabidin> How to clean > HEALTH_ERR; 64 pgs stuck inactive; 64 pgs stuck unclean?
[5:56] <lurbs> You have no OSDs in the cluster.
[5:56] <lurbs> "osdmap e1: 0 osds: 0 up, 0 in"
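lurbs's point: HEALTH_ERR with 64 stuck PGs is expected on a cluster that has no OSDs yet, because the default pool's placement groups cannot be mapped anywhere. A quick confirmation plus the usual ceph-deploy follow-up (the device name is an example, not from the log):

    ceph osd stat        # "0 osds: 0 up, 0 in" until OSDs are added
    ceph osd tree        # empty CRUSH tree
    # ceph-deploy osd prepare osd01:/dev/sdb      # example first OSD
    # ceph-deploy osd activate osd01:/dev/sdb1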
[5:57] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[6:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[6:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[6:07] * pakman__ (~Azerothia@06SAAAWOR.tor-irc.dnsbl.oftc.net) Quit ()
[6:11] * Heliwr (~Hidendra@6AGAAAC9Z.tor-irc.dnsbl.oftc.net) Quit ()
[6:11] <Zabidin> Getting this error when ceph-deploy osd activate > [osd01][WARNIN] __main__.Error: Error: ceph osd create failed: Command '/usr/bin/ceph' returned non-zero exit status 1: 2016-04-04 12:10:49.332900 7fb9f6497700 0 librados: client.bootstrap-osd authentication error (22) Invalid argument ?
[6:11] <Zabidin> Am i wrong?
[6:13] <Zabidin> Full error activate: http://pastebin.com/d21zSkfQ
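The 'client.bootstrap-osd authentication error (22)' usually means the keyring under /var/lib/ceph/bootstrap-osd/ on the OSD host no longer matches the key the monitors have registered (common after recreating the monitors). A hedged recovery sketch using the host names from this log:

    ceph-deploy gatherkeys mon01                              # refresh the bootstrap keyrings on the admin node
    ceph auth get client.bootstrap-osd                        # what the cluster has registered
    ssh osd01 cat /var/lib/ceph/bootstrap-osd/ceph.keyring    # what the OSD host is using
    # If they differ, copy the gathered ceph.bootstrap-osd.keyring to
    # /var/lib/ceph/bootstrap-osd/ceph.keyring on osd01 and retry 'ceph-deploy osd activate'.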
[6:22] * ivancich_ (~ivancich@aa2.linuxbox.com) Quit (Ping timeout: 480 seconds)
[6:25] * RameshN (~rnachimu@121.244.87.117) has joined #ceph
[6:26] <Zabidin> Error http://pastebin.com/VjXws8gr when issue this command: ceph-deploy admin mon01 mon02 osd01
[6:26] <Zabidin> Please assist me.
[6:37] * DoDzy (~mollstam@ns381528.ip-94-23-247.eu) has joined #ceph
[6:41] * matx (~elt@tor.les.net) has joined #ceph
[6:47] * wyang (~wyang@114.111.166.42) has joined #ceph
[7:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[7:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[7:06] * wjw-freebsd4 (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[7:07] * DoDzy (~mollstam@06SAAAWQH.tor-irc.dnsbl.oftc.net) Quit ()
[7:07] * matx1 (~dicko@c-73-172-157-202.hsd1.md.comcast.net) has joined #ceph
[7:07] * masteroman (~ivan@93-142-231-121.adsl.net.t-com.hr) has joined #ceph
[7:11] * matx (~elt@4MJAADWY6.tor-irc.dnsbl.oftc.net) Quit ()
[7:11] * BlS (~Bonzaii@4MJAADWZ7.tor-irc.dnsbl.oftc.net) has joined #ceph
[7:11] * rdas (~rdas@121.244.87.116) has joined #ceph
[7:14] * masterom1 (~ivan@93-139-222-135.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[7:14] * wyang (~wyang@114.111.166.42) Quit (Quit: This computer has gone to sleep)
[7:20] * kawa2014 (~kawa@94.163.31.242) has joined #ceph
[7:23] * derjohn_mob (~aj@x590d036c.dyn.telefonica.de) has joined #ceph
[7:27] * wyang (~wyang@59.45.74.45) has joined #ceph
[7:29] * derjohn_mob (~aj@x590d036c.dyn.telefonica.de) Quit (Remote host closed the connection)
[7:37] * matx1 (~dicko@6AGAAADCV.tor-irc.dnsbl.oftc.net) Quit ()
[7:37] * sardonyx (~K3NT1S_aw@marylou.nos-oignons.net) has joined #ceph
[7:40] * dyasny (~dyasny@46-117-8-108.bb.netvision.net.il) Quit (Ping timeout: 480 seconds)
[7:41] * BlS (~Bonzaii@4MJAADWZ7.tor-irc.dnsbl.oftc.net) Quit ()
[7:41] * LRWerewolf (~Da_Pineap@91.250.241.241) has joined #ceph
[7:53] * wiml (~wiml@2602:47:d406:8a00:76d4:35ff:febf:194f) has left #ceph
[7:53] * wyang (~wyang@59.45.74.45) Quit (Quit: This computer has gone to sleep)
[7:56] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) has joined #ceph
[8:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[8:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[8:02] * jowilkin (~jowilkin@2601:644:4000:b0bf:ea2a:eaff:fe08:3f1d) Quit (Ping timeout: 480 seconds)
[8:07] * sardonyx (~K3NT1S_aw@4MJAADW00.tor-irc.dnsbl.oftc.net) Quit ()
[8:07] * danielsj (~Chaos_Lla@exit.tor.uwaterloo.ca) has joined #ceph
[8:08] * krobelus_ (~krobelus@80-121-72-207.adsl.highway.telekom.at) has joined #ceph
[8:10] <Zabidin> I just reinstall ceph using ceph-deploy install.. Creating monitor success.. Ceph mon-initial, no error.. Ceph -s getting error, no osd yet.. Prepare osd no error.. But activate osd got error > http://pastebin.com/M4GkTJWy
[8:10] * overclk (~quassel@121.244.87.117) has joined #ceph
[8:11] * LRWerewolf (~Da_Pineap@06SAAAWST.tor-irc.dnsbl.oftc.net) Quit ()
[8:11] * jowilkin (~jowilkin@2601:644:4000:b0bf:ea2a:eaff:fe08:3f1d) has joined #ceph
[8:12] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[8:14] * krobelus (~krobelus@193-80-10-65.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[8:15] <Be-El> Zabidin: wrong permissions for the journal devices
[8:15] * MannerMan (~oscar@user170.217-10-117.netatonce.net) has joined #ceph
[8:15] <Zabidin> Checking now..
[8:17] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[8:18] <Zabidin> Maybe i forgot to add user ceph in sudoers..
[8:19] <Zabidin> Can i run the same command again, Be-El?
[8:20] <Be-El> I don't know, I do not use ceph-deploy
[8:21] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[8:23] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[8:24] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[8:25] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[8:26] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[8:27] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[8:28] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[8:29] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[8:31] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[8:33] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[8:34] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[8:35] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[8:36] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[8:37] * danielsj (~Chaos_Lla@06SAAAWTT.tor-irc.dnsbl.oftc.net) Quit ()
[8:37] <Zabidin> Be-El: Permission denied from who? root or ceph?
[8:37] * Phase (~cmrn@176.61.147.146) has joined #ceph
[8:37] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[8:37] <Be-El> Zabidin: OSDs are run as ceph user in the infernalis and jewel releases (I assume you use one of them)
[8:38] * Phase is now known as Guest10286
[8:38] * rakeshgm (~rakesh@121.244.87.117) has joined #ceph
[8:39] <Zabidin> Be-El: permission > http://pastebin.com/5g5LC5pN
[8:39] <Zabidin> Be-El: own by ceph
[9:39] <Be-El> Zabidin: the problem is the journal devices, not the ceph directories
[8:40] <Be-El> Zabidin: check the target of the journal links in the osd subdirectories
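A minimal way to follow that suggestion on the OSD host; the 'ceph' ownership applies to Infernalis/Jewel, where the daemons run as the ceph user, and sdc1 is only an example journal partition:

    ls -l /var/lib/ceph/osd/ceph-*/journal     # each symlink should point at its journal partition
    ls -l /dev/sdc*                            # check owner/group of the partitions they resolve to
    chown ceph:ceph /dev/sdc1                  # example fix; the GPT type codes written by ceph-disk
                                               # normally let udev set this automatically after reboot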
[8:41] * LRWerewolf (~Malcovent@06SAAAWU0.tor-irc.dnsbl.oftc.net) has joined #ceph
[8:42] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[8:44] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[8:44] * dalegaard-39554 (~dalegaard@vps.devrandom.dk) has joined #ceph
[8:44] * Zabidin2 (~oftc-webi@tech.nocser.net) has joined #ceph
[8:44] * Zabidin2 (~oftc-webi@tech.nocser.net) Quit ()
[8:44] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[8:45] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[8:47] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[8:48] * rakeshgm (~rakesh@121.244.87.117) Quit (Remote host closed the connection)
[8:49] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[8:49] * derjohn_mob (~aj@b2b-94-79-172-98.unitymedia.biz) has joined #ceph
[8:50] * rakeshgm (~rakesh@121.244.87.117) has joined #ceph
[8:51] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[8:52] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[8:53] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[8:54] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[8:56] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[8:57] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[8:58] * gopher_49 (~gopher_49@75.66.43.16) Quit (Ping timeout: 480 seconds)
[8:58] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[9:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[9:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[9:02] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[9:03] * branto (~branto@ip-78-102-208-181.net.upcbroadband.cz) has joined #ceph
[9:03] * rakeshgm (~rakesh@121.244.87.117) Quit (Ping timeout: 480 seconds)
[9:03] <Zabidin> Be-El: chowned to ceph. No permission error now. But another error came out: http://pastebin.com/v1k2p5HD
[9:04] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[9:05] <Zabidin> Be-El: sdc1 not created. Why?
[9:05] <Zabidin> It took one disk only.
[9:06] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[9:07] * Guest10286 (~cmrn@06SAAAWUT.tor-irc.dnsbl.oftc.net) Quit ()
[9:07] * mason1 (~rogst@4MJAADW3V.tor-irc.dnsbl.oftc.net) has joined #ceph
[9:08] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[9:08] * dvanders (~dvanders@dvanders-pro.cern.ch) has joined #ceph
[9:08] <Zabidin> Be-El: How to delete old monitor and create new config file?
[9:09] <Be-El> Zabidin: I do not use ceph-deploy, so I cannot help you with specific problems.
[9:09] <Zabidin> Be-El: Which one you use?
[9:09] <Be-El> I deploy manually
[9:11] * LRWerewolf (~Malcovent@06SAAAWU0.tor-irc.dnsbl.oftc.net) Quit ()
[9:11] * Tralin|Sleep (~Zyn@marylou.nos-oignons.net) has joined #ceph
[9:11] <Zabidin> Be-El: Manually? Not expert yet.
[9:12] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[9:12] * analbeard (~shw@support.memset.com) has joined #ceph
[9:14] * rakeshgm (~rakesh@121.244.87.124) has joined #ceph
[9:14] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[9:15] * yuxiaozou (~yuxiaozou@128.135.100.102) Quit (Ping timeout: 480 seconds)
[9:16] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[9:16] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[9:17] * i_m (~ivan.miro@deibp9eh1--blueice4n6.emea.ibm.com) has joined #ceph
[9:18] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[9:19] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[9:20] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[9:21] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[9:22] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[9:23] * dvanders (~dvanders@dvanders-pro.cern.ch) Quit (Remote host closed the connection)
[9:24] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[9:26] * Skaag (~lunix@cpe-172-91-77-84.socal.res.rr.com) has joined #ceph
[9:27] * derjohn_mob (~aj@b2b-94-79-172-98.unitymedia.biz) Quit (Ping timeout: 480 seconds)
[9:35] * dgurtner (~dgurtner@87.215.61.26) has joined #ceph
[9:37] * fsimonce (~simon@host201-70-dynamic.26-79-r.retail.telecomitalia.it) has joined #ceph
[9:37] * mason1 (~rogst@4MJAADW3V.tor-irc.dnsbl.oftc.net) Quit ()
[9:39] <Zabidin> Be-El: Can a single ssd have two journals from 2 disks on the same box?
[9:40] * rendar (~I@host241-182-dynamic.33-79-r.retail.telecomitalia.it) has joined #ceph
[9:41] * Tralin|Sleep (~Zyn@4MJAADW32.tor-irc.dnsbl.oftc.net) Quit ()
[9:41] <BranchPredictor> Zabidin: yes
[9:41] * xENO_ (~Spikey@tollana.enn.lu) has joined #ceph
[9:42] <Zabidin> BranchPredictor: When i try activate it will take either one hdd..
[9:42] <Zabidin> sdd or sdc..
[9:42] * allaok (~allaok@machine107.orange-labs.com) has joined #ceph
[9:43] <BranchPredictor> Zabidin: repartition the ssd and use one of those partitions as journal
[9:43] <Zabidin> BranchPredictor: I did partition it using parted.. Let me try again..
[9:44] * wjw-freebsd4 (~wjw@smtp.digiware.nl) has joined #ceph
[9:44] <BranchPredictor> Zabidin: well, I didn't follow the discussion so I don't know how exactly you do it, but it is absolutely doable
[9:45] * Moriarty (~datagutt@81-7-15-115.blue.kundencontroller.de) has joined #ceph
[9:46] <Zabidin> BranchPredictor: If i prepare 2 hdd with same sdd but using different partition on ssd.. Giving this error > http://pastebin.com/5iX4VGES
[9:46] <BranchPredictor> I have 22 journals on one ssd, so it's not a problem
[9:46] <Zabidin> BranchPredictor: I'm trying to prepare osd using journal
[9:47] <BranchPredictor> Zabidin: try invoking command that failed ("/usr/sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:f11e6078-a06b-417d-983a-76e0d7cb1942 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdd") to see exact output and possible failure reason
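That sgdisk call most often fails because /dev/sdd already carries partitions or a stale GPT from an earlier attempt. A hedged way to check and, if the disk is disposable, wipe it (this destroys any data on sdd; host name as used earlier in the log):

    sgdisk --print /dev/sdd            # inspect the current partition table
    ceph-deploy disk zap osd01:sdd     # zap the disk, then re-run 'ceph-deploy osd prepare'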
[9:48] * allaok (~allaok@machine107.orange-labs.com) has left #ceph
[9:49] * efirs (~firs@c-50-185-70-125.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:49] * guerby (~guerby@2a03:7220:8080:a500::1) has joined #ceph
[9:52] * DanFoster (~Daniel@office.34sp.com) has joined #ceph
[9:58] * viveksing (~user@114.79.157.147) has joined #ceph
[9:59] * dyasny (~dyasny@bzq-82-81-161-51.red.bezeqint.net) has joined #ceph
[10:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[10:01] * wyang (~wyang@59.45.74.45) has joined #ceph
[10:01] <viveksing> Hello all, I want to configure rados-gw for swift object storage. After the initial setup i am able to create the swift containers and list them, however i am not able to upload objects to them
[10:01] <viveksing> i get the below error
[10:01] <viveksing> Container HEAD failed: http://nokia.storeadmin:7480/swift/v1/nokia 401 Unauthorized
[10:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[10:01] <viveksing> kindly suggest where i am doing it wrong
[10:01] <viveksing> ?
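A 401 on Swift requests against radosgw is commonly a missing or mismatched swift subuser key rather than a container problem. A hedged checklist ('testuser' is a placeholder uid; the auth URL depends on the rgw configuration):

    radosgw-admin user info --uid=testuser          # does a swift subuser with swift_keys exist?
    radosgw-admin subuser create --uid=testuser --subuser=testuser:swift --access=full
    radosgw-admin key create --subuser=testuser:swift --key-type=swift --gen-secret
    # then authenticate as testuser:swift with the generated secret against the gateway's auth endpoint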
[10:04] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Quit: Ex-Chat)
[10:08] * wjw-freebsd4 (~wjw@smtp.digiware.nl) Quit (Quit: Nettalk6 - www.ntalk.de)
[10:11] * xENO_ (~Spikey@4MJAADW48.tor-irc.dnsbl.oftc.net) Quit ()
[10:11] * roaet (~notmyname@06SAAAWYS.tor-irc.dnsbl.oftc.net) has joined #ceph
[10:13] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[10:15] * Moriarty (~datagutt@6AGAAADJS.tor-irc.dnsbl.oftc.net) Quit ()
[10:15] * Xa (~redbeast1@anonymous.sec.nl) has joined #ceph
[10:16] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[10:17] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[10:19] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[10:20] * rakeshgm (~rakesh@121.244.87.124) Quit (Ping timeout: 480 seconds)
[10:21] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[10:22] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[10:23] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[10:24] * Skaag (~lunix@cpe-172-91-77-84.socal.res.rr.com) Quit (Quit: Leaving.)
[10:24] <Zabidin> Keep getting same error > [osd01][ERROR ] RuntimeError: command returned non-zero exit status: 1 [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk -v activate --mark-init systemd --mount /dev/sdc1
[10:24] <Zabidin> anybody knows?
[10:25] <Zabidin> Full error: http://pastebin.com/8pV8kCQt
[10:25] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[10:26] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[10:26] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[10:27] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[10:28] * olid19811115 (~olid1982@aftr-185-17-205-250.dynamic.mnet-online.de) has joined #ceph
[10:29] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[10:29] * evelu (~erwan@46.231.131.178) has joined #ceph
[10:30] * vicente (~vicente@1-172-101-87.dynamic.hinet.net) has joined #ceph
[10:31] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[10:31] <etienneme> [osd01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring auth add osd.0 -i /var/lib/ceph/tmp/mnt.q8J542/keyring osd allow * mon allow profile osd
[10:31] <etienneme> [osd01][WARNIN] Error EINVAL: entity osd.0 exists but key does not match
[10:31] <etienneme> check keyring content?
[10:32] <etienneme> Zabidin: ^
[10:33] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[10:34] <Zabidin> etienneme: is this correct > http://pastebin.com/f7McALUW ? osd tree down..
[10:35] <Zabidin> etienneme: setup hdd and journal use ssd..
[10:35] * allaok (~allaok@machine107.orange-labs.com) has joined #ceph
[10:36] * Lea (~LeaChim@host86-168-120-216.range86-168.btcentralplus.com) has joined #ceph
[10:38] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[10:39] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[10:40] * rakeshgm (~rakesh@121.244.87.124) has joined #ceph
[10:41] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[10:41] * roaet (~notmyname@06SAAAWYS.tor-irc.dnsbl.oftc.net) Quit ()
[10:41] * Wijk (~visored@37.203.209.2) has joined #ceph
[10:42] * vicente (~vicente@1-172-101-87.dynamic.hinet.net) Quit (Ping timeout: 480 seconds)
[10:42] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[10:43] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[10:44] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[10:45] * Xa (~redbeast1@6AGAAADK3.tor-irc.dnsbl.oftc.net) Quit ()
[10:45] * Oddtwang (~Plesioth@orion1626.startdedicated.com) has joined #ceph
[10:46] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[10:48] <etienneme> It seems correct yes (I mean, there are no errors :) )
[10:49] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[10:49] <etienneme> But I guess that /var/lib/ceph/tmp/mnt.q8J542/keyring and /var/lib/ceph/bootstrap-osd/ceph.keyring on your servers are different, and It should not
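When 'entity osd.0 exists but key does not match' is left over from an earlier, half-finished attempt, the usual remedy is to drop the stale registration so activate can recreate it. Only safe while osd.0 holds no data you need; a sketch:

    ceph auth del osd.0
    ceph osd crush remove osd.0
    ceph osd rm 0
    # then retry: ceph-deploy osd activate osd01:/dev/sdc1   (the device from the failing command)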
[10:51] * TMM (~hp@185.5.122.2) has joined #ceph
[10:52] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[10:53] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[10:54] <Kdecherf> hello world
[10:55] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[10:55] <Kdecherf> do we need a raid 1 device for journal if we plan to use SSD [for journal]?
[10:56] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[10:57] * alkaid (~alkaid@128.199.95.148) Quit (Remote host closed the connection)
[10:58] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[11:00] * alkaid (~alkaid@128.199.95.148) Quit ()
[11:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[11:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[11:01] * alkaid (~alkaid@128.199.95.148) has joined #ceph
[11:04] * Hemanth (~hkumar_@121.244.87.117) has joined #ceph
[11:05] * rakeshgm (~rakesh@121.244.87.124) Quit (Ping timeout: 480 seconds)
[11:05] <Zabidin> Why my ceph not showing up?
[11:06] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[11:06] * derjohn_mob (~aj@88.128.80.198) has joined #ceph
[11:08] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[11:09] * kawa2014 (~kawa@94.163.31.242) Quit (Ping timeout: 480 seconds)
[11:11] * Wijk (~visored@4MJAADW7F.tor-irc.dnsbl.oftc.net) Quit ()
[11:14] <smerz> Kdecherf, no, usually one has the redundancy split over nodes so if a journal fails (and that node goes offline) the others takeover. so no raid1 for journal typically.
[11:14] <Kdecherf> smerz: ok thx
[11:15] <smerz> just beware, if you lose a journal you lose all ods on that journal. a typical ratio is about 6 spinners on 1 ssd journal
[11:15] * rdas (~rdas@121.244.87.116) Quit (Read error: Connection reset by peer)
[11:15] <smerz> *osd's
[11:15] * Oddtwang (~Plesioth@4MJAADW7O.tor-irc.dnsbl.oftc.net) Quit ()
[11:15] <Kdecherf> ack
[11:15] * mr_flea1 (~Borf@93.174.90.30) has joined #ceph
[11:15] * wyang (~wyang@59.45.74.45) Quit (Quit: This computer has gone to sleep)
[11:16] * Pulec1 (~vegas3@tor2e1.privacyfoundation.ch) has joined #ceph
[11:16] * bjornar_ (~bjornar@109.247.131.38) has joined #ceph
[11:16] * wyang (~wyang@59.45.74.45) has joined #ceph
[11:16] <Kdecherf> also, is it a good idea to create a small cache pool on the same ssd as the journal?
[11:17] <smerz> haven't used cache pools myself, i'll leave that for someone else :)
[11:17] <smerz> but writes are extremely sensitive to latency, so i wouldn't be surprised if it's not a good idea
[11:20] <smerz> i dunno
[11:20] * jordanP (~jordan@204.13-14-84.ripe.coltfrance.com) has joined #ceph
[11:22] * rdas (~rdas@121.244.87.116) has joined #ceph
[11:26] * karnan (~karnan@121.244.87.117) has joined #ceph
[11:28] <Kdecherf> ok
[11:29] <Kdecherf> what is the typical ratio of ssd osds on one ssd journal?
[11:29] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[11:32] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[11:32] <smerz> Kdecherf, 1:1, you dont have a separate journal. one journals on each ssd then
[11:33] <Kdecherf> ok thx
[11:34] * ngoswami (~ngoswami@121.244.87.116) Quit ()
[11:35] * vicente (~vicente@1-172-101-87.dynamic.hinet.net) has joined #ceph
[11:38] * RameshN (~rnachimu@121.244.87.117) Quit (Ping timeout: 480 seconds)
[11:39] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[11:40] * derjohn_mob (~aj@88.128.80.198) Quit (Ping timeout: 480 seconds)
[11:40] * dgurtner (~dgurtner@87.215.61.26) Quit (Quit: Reconnecting)
[11:40] * dgurtner (~dgurtner@87.215.61.26) has joined #ceph
[11:44] <Zabidin> How to set journal on ssd and data on hard disk? Let say sda and sdb is hard disk while sdc is ssd. Is its like this: ceph-deploy osd prepare osd1:sda:/dev/sdc osd1:sdb:/dev/sdc ?
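For reference, that is the documented ceph-deploy form: each argument is HOST:DATA-DISK[:JOURNAL], and ceph-disk carves a separate journal partition per OSD out of the shared SSD. Using the device names from the question (the partition numbers in the activate line are what ceph-disk typically creates; verify with 'ceph-disk list'):

    ceph-deploy osd prepare osd1:sda:/dev/sdc osd1:sdb:/dev/sdc
    ceph-deploy osd activate osd1:/dev/sda1:/dev/sdc1 osd1:/dev/sdb1:/dev/sdc2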
[11:44] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:246c:eb77:7876:1701) has joined #ceph
[11:44] * vicente (~vicente@1-172-101-87.dynamic.hinet.net) Quit (Ping timeout: 480 seconds)
[11:45] * mr_flea1 (~Borf@06SAAAW09.tor-irc.dnsbl.oftc.net) Quit ()
[11:45] * dusti (~K3NT1S_aw@6AGAAADOY.tor-irc.dnsbl.oftc.net) has joined #ceph
[11:46] * Pulec1 (~vegas3@6AGAAADNY.tor-irc.dnsbl.oftc.net) Quit ()
[11:46] * Catsceo (~TheDoudou@4MJAADW9Z.tor-irc.dnsbl.oftc.net) has joined #ceph
[11:48] * RameshN (~rnachimu@121.244.87.124) has joined #ceph
[11:49] * ngoswami (~ngoswami@121.244.87.116) Quit (Quit: This computer has gone to sleep)
[11:50] * swami1 (~swami@49.32.0.32) has joined #ceph
[11:58] <Zabidin> Anyone use version 9.2.1 on centos 7?
[12:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[12:01] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) has joined #ceph
[12:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[12:02] * rakeshgm (~rakesh@121.244.87.117) has joined #ceph
[12:04] * Zabidin (~oftc-webi@124.13.35.225) Quit (Quit: Page closed)
[12:05] * davidccc_ (~dcasier@84.197.151.77.rev.sfr.net) has joined #ceph
[12:05] * davidccc_ (~dcasier@84.197.151.77.rev.sfr.net) Quit ()
[12:09] * wyang (~wyang@59.45.74.45) Quit (Quit: This computer has gone to sleep)
[12:15] * dusti (~K3NT1S_aw@6AGAAADOY.tor-irc.dnsbl.oftc.net) Quit ()
[12:15] * utugi______ (~osuka_@85.159.237.210) has joined #ceph
[12:16] * Catsceo (~TheDoudou@4MJAADW9Z.tor-irc.dnsbl.oftc.net) Quit ()
[12:16] * Hazmat (~Heliwr@199.87.154.251) has joined #ceph
[12:19] * overclk (~quassel@121.244.87.117) Quit (Remote host closed the connection)
[12:23] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[12:25] * GabrielDias (~GabrielDi@srx.h1host.ru) Quit (Remote host closed the connection)
[12:26] * rakeshgm (~rakesh@121.244.87.117) Quit (Ping timeout: 480 seconds)
[12:29] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[12:30] * bara (~bara@nat-pool-brq-t.redhat.com) Quit (Quit: Bye guys!)
[12:31] * ftuesca (~ftuesca@181.170.106.78) has joined #ceph
[12:31] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[12:34] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[12:35] * rraja (~rraja@117.206.129.6) has joined #ceph
[12:36] * rakeshgm (~rakesh@121.244.87.117) has joined #ceph
[12:38] * ade (~abradshaw@212.77.58.61) has joined #ceph
[12:45] * quinoa (~quinoa@24-148-81-106.c3-0.grn-ubr2.chi-grn.il.cable.rcn.com) Quit (Quit: Leaving...)
[12:45] * utugi______ (~osuka_@06SAAAW29.tor-irc.dnsbl.oftc.net) Quit ()
[12:45] * Pulec1 (~Enikma@93.115.95.216) has joined #ceph
[12:46] * Hazmat (~Heliwr@6AGAAADP9.tor-irc.dnsbl.oftc.net) Quit ()
[12:46] * bret1 (~Yopi@marylou.nos-oignons.net) has joined #ceph
[12:49] * dgurtner (~dgurtner@87.215.61.26) Quit (Quit: Reconnecting)
[12:49] * dgurtner (~dgurtner@87.215.61.26) has joined #ceph
[12:49] * wyang (~wyang@59.45.74.45) has joined #ceph
[12:50] * Bartek (~Bartek@dynamic-78-8-68-29.ssp.dialog.net.pl) has joined #ceph
[12:52] * b0e (~aledermue@213.95.25.82) has joined #ceph
[12:53] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) has joined #ceph
[12:54] * RameshN (~rnachimu@121.244.87.124) Quit (Ping timeout: 480 seconds)
[12:55] * dgurtner_ (~dgurtner@82.199.64.68) has joined #ceph
[12:56] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[12:56] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) Quit (Quit: Ex-Chat)
[12:56] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[12:57] * dgurtner (~dgurtner@87.215.61.26) Quit (Ping timeout: 480 seconds)
[12:58] * ngoswami (~ngoswami@121.244.87.116) Quit ()
[13:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[13:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[13:03] * RameshN (~rnachimu@121.244.87.117) has joined #ceph
[13:05] * overclk (~quassel@117.202.96.224) has joined #ceph
[13:08] * Bartek (~Bartek@dynamic-78-8-68-29.ssp.dialog.net.pl) Quit (Ping timeout: 480 seconds)
[13:13] * Bartek (~Bartek@dynamic-78-8-68-29.ssp.dialog.net.pl) has joined #ceph
[13:15] * wyang (~wyang@59.45.74.45) Quit (Quit: This computer has gone to sleep)
[13:15] * Pulec1 (~Enikma@6AGAAADRD.tor-irc.dnsbl.oftc.net) Quit ()
[13:15] * notmyname2 (~Tumm@65.19.167.131) has joined #ceph
[13:16] * madkiss (~madkiss@2001:6f8:12c3:f00f:2cee:fe78:3765:b8ad) Quit (Quit: Leaving.)
[13:16] * bret1 (~Yopi@4MJAADXCD.tor-irc.dnsbl.oftc.net) Quit ()
[13:16] * SEBI1 (~MonkeyJam@edwardsnowden2.torservers.net) has joined #ceph
[13:17] * daiver (~daiver@95.85.8.93) Quit (Read error: Connection reset by peer)
[13:19] * Bartek (~Bartek@dynamic-78-8-68-29.ssp.dialog.net.pl) Quit (Remote host closed the connection)
[13:23] * gfidente (~gfidente@0001ef4b.user.oftc.net) has joined #ceph
[13:28] * wyang (~wyang@59.45.74.45) has joined #ceph
[13:30] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[13:31] * linuxkidd (~linuxkidd@29.sub-70-193-113.myvzw.com) has joined #ceph
[13:38] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[13:42] * pabluk__ is now known as pabluk_
[13:42] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[13:42] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[13:43] * yanzheng (~zhyan@125.70.23.194) has joined #ceph
[13:44] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[13:45] * notmyname2 (~Tumm@6AGAAADSR.tor-irc.dnsbl.oftc.net) Quit ()
[13:45] * Maariu5_ (~EdGruberm@nooduitgang.schmutzig.org) has joined #ceph
[13:46] * SEBI1 (~MonkeyJam@6AGAAADST.tor-irc.dnsbl.oftc.net) Quit ()
[13:46] * tritonx (~RaidSoft@6AGAAADT4.tor-irc.dnsbl.oftc.net) has joined #ceph
[13:48] * haomaiwa_ (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[13:51] * rraja (~rraja@117.206.129.6) Quit (Remote host closed the connection)
[13:52] * wjw-freebsd (~wjw@vpn.ecoracks.nl) has joined #ceph
[13:53] * bene2 (~bene@2601:18c:8501:25e4:ea2a:eaff:fe08:3c7a) has joined #ceph
[13:54] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Ping timeout: 480 seconds)
[13:58] * dgurtner (~dgurtner@82.199.64.68) has joined #ceph
[13:58] * dgurtner_ (~dgurtner@82.199.64.68) Quit (Ping timeout: 480 seconds)
[14:00] * jordanP (~jordan@204.13-14-84.ripe.coltfrance.com) Quit (Quit: Leaving)
[14:01] * haomaiwa_ (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[14:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[14:01] * dneary (~dneary@pool-96-237-170-97.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[14:05] * ibravo (~ibravo@72.198.142.104) has joined #ceph
[14:08] * gfidente (~gfidente@0001ef4b.user.oftc.net) Quit (Quit: bye)
[14:09] <TMM> What would be the best way to reduce the amount of cpu time used during a rebuild? I've recently increased the size of "osd recovery max active"; I should probably reduce that again, but is that enough?
[14:09] <TMM> And does "osd max backfills" make a large contribution to CPU load under rebuild?
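Both settings can be lowered at runtime without restarting the OSDs; a minimal sketch (the values are illustrative):

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'
    # and put the same values under [osd] in ceph.conf so they survive restarts:
    #   osd max backfills = 1
    #   osd recovery max active = 1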
[14:11] * huangjun (~kvirc@117.152.72.24) has joined #ceph
[14:15] * Maariu5_ (~EdGruberm@6AGAAADT0.tor-irc.dnsbl.oftc.net) Quit ()
[14:15] * Bonzaii (~Spessu@91.109.29.120) has joined #ceph
[14:16] * tritonx (~RaidSoft@6AGAAADT4.tor-irc.dnsbl.oftc.net) Quit ()
[14:17] * yanzheng (~zhyan@125.70.23.194) Quit (Quit: This computer has gone to sleep)
[14:18] * wgao (~wgao@106.120.101.38) Quit (Read error: Connection timed out)
[14:18] * wgao (~wgao@106.120.101.38) has joined #ceph
[14:20] * Rehevkor (~vend3r@216.17.99.183) has joined #ceph
[14:25] * bniver (~bniver@pool-173-48-58-27.bstnma.fios.verizon.net) has joined #ceph
[14:33] * viveksing (~user@114.79.157.147) Quit (Remote host closed the connection)
[14:34] * ngoswami (~ngoswami@121.244.87.116) Quit (Quit: This computer has gone to sleep)
[14:40] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[14:43] <boolman> I have a strange error with cephfs, if cephfs is already mounted and I run 'mount -a' I get 'mount error 16 = Device or resource busy'. there is no user in the path
[14:45] * Bonzaii (~Spessu@4MJAADXGC.tor-irc.dnsbl.oftc.net) Quit ()
[14:45] * Bobby (~xENO_@85.159.237.210) has joined #ceph
[14:45] <etienneme> lsof |grep "/your/mount/point" to check if something is used?
[14:45] * sugoruyo (~georgev@paarthurnax.esc.rl.ac.uk) has joined #ceph
[14:46] <boolman> no there is not, maybe a bit unclear with the "there is no user in the path"
[14:46] <boolman> also, I can umount without problems
[14:48] <boolman> hm I'll start off by upgrading the kernel
[14:50] * Rehevkor (~vend3r@6AGAAADV5.tor-irc.dnsbl.oftc.net) Quit ()
[14:51] * mason1 (~Teddybare@4MJAADXHX.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:51] * mattbenjamin (~mbenjamin@nat-pool-bos-u.redhat.com) has joined #ceph
[14:51] * Cory_ (~cory@CPE-58-174-236-103.sa.bigpond.net.au) has joined #ceph
[14:52] <Cory_> Hi team, i'm looking for some help debugging an OSD that won't come online. Is anyone available to help?
[14:53] <etienneme> Put logs/infos here and maybe someone will have ideas :)
[14:54] <Cory_> Ok, i've got a snippet from the log of the OSD that's giving me grief, shall i just paste that here?
[14:55] <Cory_> https://paste.gnome.org/p1cd8cdvm
[14:56] <Cory_> Here is a bit more info
[14:57] <Cory_> https://paste.gnome.org/pjewdpgd0
[14:58] <Mosibi> Cory_: what's the status of PG 34.9?
[14:58] * wyang (~wyang@59.45.74.45) Quit (Quit: This computer has gone to sleep)
[14:59] <Cory_> root@vm-mon1 ceph]# ceph pg 34.9 query
[14:59] <Cory_> shows
[14:59] <Cory_> https://paste.gnome.org/p0a98nlum
[15:00] <Mosibi> Cory_: can you put the complete json there ?
[15:00] <Cory_> Sorry, just getting the hang of this IRC client
[15:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[15:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[15:02] * wyang (~wyang@59.45.74.45) has joined #ceph
[15:02] * mhack (~mhack@66-168-117-78.dhcp.oxfr.ma.charter.com) has joined #ceph
[15:04] * Cory1 (3aaeec67@107.161.19.53) has joined #ceph
[15:04] <Cory_> https://paste.gnome.org/pkhmf1110
[15:04] <Cory_> Thats the complete output from ceph pg 34.9 query
[15:06] <boolman> nope, still the same error with kernel 4.2
[15:09] <Mosibi> Cory_: strange, osd.35 has nothing to do with pg 34.9
[15:10] <Mosibi> Cory_: is there any data for that PG on the OSD ?
[15:11] <Mosibi> Cory_: for example: /var/lib/ceph/osd/ceph-35/current/34.9_head/
[15:11] <Cory_> That OSD is currently removed, i'll need to re-insert it and check
[15:11] <Cory_> Are you asking because of the comment on line 74?
[15:12] <Cory_> osd.35 pg_epoch: 3585 pg[34.9(unlocked)] enter Initial
[15:12] <Cory_> immediately before the thread closed?
[15:12] <Mosibi> yes
[15:12] <Mosibi> Cory_: it is osd.35 you are trying to start, is it?
[15:12] <Cory_> That is correct
[15:13] <Mosibi> ack :D
[15:13] <Cory_> The log file also mentions pg 34.8, but thats reporting as ok too
[15:13] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[15:13] <Cory_> I'm wondering if i can just mark osd35 as lost?
[15:14] <Mosibi> All your PG's are active+clean?
[15:14] <Cory_> no
[15:14] <Cory1> [root@cephosd1 log]# ceph -s
[15:14] <Cory1> cluster 46ded320-ec09-40bc-a6c4-0a8ad3341035
[15:14] <Cory1> health HEALTH_WARN
[15:14] <Cory1> 1 pgs backfilling
[15:14] <Cory1> 164 pgs degraded
[15:14] <Cory1> 47 pgs down
[15:14] <Cory1> 47 pgs peering
[15:14] <Cory1> 1 pgs recovering
[15:14] <Cory1> 1 pgs recovery_wait
[15:14] <Cory1> 53 pgs stale
[15:14] <Cory1> 164 pgs stuck degraded
[15:14] <Cory1> 47 pgs stuck inactive
[15:14] <Cory1> 53 pgs stuck stale
[15:15] <Cory1> 346 pgs stuck unclean
[15:15] <Cory1> 162 pgs stuck undersized
[15:15] <Cory1> 162 pgs undersized
[15:15] <Cory1> 69 requests are blocked > 32 sec
[15:15] <Cory1> recovery 2402/4513536 objects degraded (0.053%)
[15:15] <Cory1> recovery 84150/4513536 objects misplaced (1.864%)
[15:15] <Cory1> recovery 1/3065044 unfound (0.000%)
[15:15] <Cory1> 1/40 in osds are down
[15:15] <Cory1> noout flag(s) set
[15:15] <Cory1> monmap e1: 3 mons at {vm-mon1=172.16.102.81:6789/0,vm-mon2=172.16.102.82:6789/0,vm-mon3=172.16.102.83:6789/0}
[15:15] <Cory1> election epoch 62, quorum 0,1,2 vm-mon1,vm-mon2,vm-mon3
[15:15] <Cory1> osdmap e4809: 40 osds: 39 up, 40 in; 112 remapped pgs
[15:15] <Cory1> flags noout
[15:15] <Cory1> pgmap v15573881: 5130 pgs, 15 pools, 12755 GB data, 2993 kobjects
[15:15] <Cory1> 22531 GB used, 39648 GB / 62180 GB avail
[15:15] <Cory1> 2402/4513536 objects degraded (0.053%)
[15:15] <Cory1> 84150/4513536 objects misplaced (1.864%)
[15:15] <Cory1> 1/3065044 unfound (0.000%)
[15:15] <Cory1> 4763 active+clean
[15:15] <Cory1> 162 active+undersized+degraded
[15:15] <Cory1> 112 active+remapped
[15:15] <Cory1> 39 down+peering
[15:15] <Cory1> 22 stale+active+remapped
[15:15] <Cory1> 21 stale+active+clean
[15:15] <Cory1> 4 stale+down+remapped+peering
[15:15] <Cory1> 4 stale+down+peering
[15:15] <Cory1> 1 stale+active+remapped+backfilling
[15:15] <Cory1> 1 active+recovering+degraded
[15:15] <Cory1> 1 stale+active+recovery_wait+degraded
[15:15] <Cory1> client io 2283 kB/s rd, 811 kB/s wr, 128 op/s
[15:15] * Bobby (~xENO_@6AGAAADXN.tor-irc.dnsbl.oftc.net) Quit ()
[15:15] * Sami345 (~Gibri@192.42.116.16) has joined #ceph
[15:15] <Mosibi> Cory1: whoah....
[15:15] <neurodrone> Please use pastebin.com or something if you are pasting >5 lines.
[15:15] <Cory_> Sorry!
[15:15] <Mosibi> Cory1: do not mark it as lost yet....
[15:16] <Cory_> Ok, how can we confirm that i have a good copy of all objects on other OSD's?
[15:17] <Cory_> This cluster was configured for a replica set of 2 and was healthy until a motherboard failure, i then moved all of the OSD's to another host, all worked ok i think except osd 35
[15:18] <Mosibi> Cory1: it is recovering at the moment?
[15:19] <Cory1> I think so, there is a lot of activity on one of the osd's still
[15:19] <Mosibi> Cory1: with 'ceph -w' you should see recovery activity'
[15:19] <Cory_> ceph -w shows
[15:19] <Cory_> 2016-04-04 22:49:35.306304 mon.0 [INF] pgmap v15574170: 5130 pgs: 1 stale+active+recovery_wait+degraded, 4763 active+clean, 4 stale+down+peering, 22 stale+active+remapped, 1 active+recovering+degraded, 1 stale+active+remapped+backfilling, 39 down+peering, 112 active+remapped, 4 stale+down+remapped+peering, 162 active+undersized+degraded, 21 stale+active+clean; 12755 GB data, 22531 GB used, 39648 GB / 62180 GB avail; 10722 kB/s rd, 1274 kB/s
[15:19] <Cory_> wr, 318 op/s; 2402/4513538 objects degraded (0.053%); 84150/4513538 objects misplaced (1.864%); 1/3065045 unfound (0.000%)
[15:20] <Cory_> No recovery activity
[15:20] <Mosibi> that 1 unfound is troublesome...
[15:20] * mason1 (~Teddybare@4MJAADXHX.tor-irc.dnsbl.oftc.net) Quit ()
[15:20] <Cory_> Bringing 35 back online should solve that, yes?
[15:21] <Mosibi> I would think so.
[15:21] <Mosibi> can you put the output of ceph health detail somewhere?
[15:21] <Cory1> Sure
[15:22] <Cory_> https://paste.gnome.org/plch1dvwn
[15:25] * georgem (~Adium@69-165-151-116.dsl.teksavvy.com) has joined #ceph
[15:27] * Cory2 (3aaeec67@107.161.19.53) has joined #ceph
[15:27] * Cory1 (3aaeec67@107.161.19.53) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[15:28] <Mosibi> Cory2: is that paste complete?
[15:28] <Cory2> I'm going to plug OSD35 back in so we can try and bring it online?
[15:28] <Mosibi> yes
[15:28] <Cory_> https://paste.gnome.org/plch1dvwn
[15:29] <Cory_> Sorry, i'm using 2 different IRC accounts on different PC's
[15:29] <Cory_> I'm justin using Cory2 now
[15:29] * Cory_ (~cory@CPE-58-174-236-103.sa.bigpond.net.au) Quit (Quit: Cory_)
[15:29] <Cory2> Did you get that paste?
[15:30] * Cory2 (3aaeec67@107.161.19.53) Quit ()
[15:30] * Cory2 (3aaeec67@107.161.19.53) has joined #ceph
[15:30] <Mosibi> Cory2: yes, but i think it was not complete
[15:31] * kawa2014 (~kawa@89.184.114.246) Quit (Ping timeout: 480 seconds)
[15:31] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[15:31] * Cory2 (3aaeec67@107.161.19.53) Quit ()
[15:33] * rdas (~rdas@121.244.87.116) Quit (Ping timeout: 480 seconds)
[15:37] * wyang (~wyang@59.45.74.45) Quit (Quit: This computer has gone to sleep)
[15:40] <Anticimex> hm, is there something one can configure to avoid or drastically reduce client ops getting blocked by peering
[15:41] * gopher_49 (~gopher_49@host2.drexchem.com) has joined #ceph
[15:41] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[15:41] <T1w> separate client and cluster network?
[15:42] <Anticimex> nope, same
[15:42] <Anticimex> i don't think there's network io issue
[15:42] <T1w> .. then you have a startingpoint
[15:42] <Anticimex> the cluster is empty
[15:42] <T1w> it's one of the first and biggest troublepoints
[15:42] <Anticimex> i'll have to get some graphs up
[15:43] <T1w> .. then it's possibly not a peering issue you are seeing
[15:43] <Anticimex> well, when i restart a host's osds and don't set noout,nodown,noup, data starts to move around
[15:43] <T1w> .. as should be expected, yes
[15:44] <Anticimex> and that moving, results in some client ops getting: 2016-04-04 15:36:44.977354 osd.96 [WRN] slow request 30.660092 seconds old, received at 2016-04-04 15:36:14.317190: osd_op(client.87216.0:5728 notify.7 [watch ping cookie 139935092342304 gen 12] 5.84ada7c9 ondisk+write+known_if_redirected e2023) currently waiting for peered
[15:45] * Sami345 (~Gibri@4MJAADXI2.tor-irc.dnsbl.oftc.net) Quit ()
[15:45] * Grimhound (~Redshift@6AGAAAD1F.tor-irc.dnsbl.oftc.net) has joined #ceph
[15:46] <Anticimex> with no data in the cluster, i'm wondering why it blocks
[15:47] <Anticimex> primary or acting primary gets some work to do i can understand, shuffling metadata around and messaging, but why block
[15:47] <Anticimex> well need to understand this better, thanks :)
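For planned restarts, the usual precaution is the noout flag: it keeps the restarted OSDs from being marked out, so no data is re-mapped while the node bounces, which limits the peering churn behind those 'waiting for peered' slow requests. A sketch:

    ceph osd set noout
    # restart the node's OSDs / reboot the host
    ceph osd unset noout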
[15:48] <boolman> hm how come I can't mount my cephfs at boot, I use _netdev option and also tried adding 'ceph' to /etc/modules
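For reference, a typical /etc/fstab line for the kernel cephfs client looks like the one below (monitor address, mount point and secretfile path are placeholders). _netdev only delays the mount until networking is up; the kernel auto-loads the ceph filesystem module when the mount is attempted, so the /etc/modules entry should not be needed:

    192.168.0.1:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0 0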
[15:48] <jcsp> boolman: are you getting an error?
[15:51] * CydeWeys1 (~luigiman@88.79-160-125.customer.lyse.net) has joined #ceph
[15:53] * wyang (~wyang@59.45.74.45) has joined #ceph
[15:54] <T1w> Anticimex: it could be because you block writes if not enough copies can be written if a single node is down..
[15:55] <T1w> afk - home I go..
[15:56] * Cory1 (77fc11a3@107.161.19.53) has joined #ceph
[15:56] <Cory1> Sorry, I dropped out while i went to the server room
[15:56] <Cory1> Did you get my paste?
[15:57] * zenpac (~zenpac3@66.55.33.66) has joined #ceph
[15:57] <Anticimex> T1w: right, don't think it's that. it sorts itself out over some time, so cluster is "nominally healthy" or whatever one can call it
[15:58] * bara (~bara@nat-pool-brq-t.redhat.com) Quit (Quit: Bye guys!)
[15:58] <Anticimex> or more specifically, the crush rule does find acting sets even if one node disappears
[15:59] * allaok (~allaok@machine107.orange-labs.com) has left #ceph
[15:59] <zenpac> What does Calamari monitor (via ceph commands) to get its events?
[16:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[16:01] * allaok (~allaok@machine107.orange-labs.com) has joined #ceph
[16:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[16:01] <brians__> writeback cache on SATA based OSDs a good idea?
[16:01] * zaitcev (~zaitcev@c-50-130-189-82.hsd1.nm.comcast.net) has joined #ceph
[16:02] <brians__> and hello :)
[16:02] * bara (~bara@213.175.37.12) has joined #ceph
[16:03] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[16:05] <IcePic> brians__: probably not. When the OSD asks if the data is on the disks, the answer should hopefully be true.
[16:06] <brians__> thanks IcePic
[16:06] <IcePic> depends on how important your data is of course.
[16:07] <sugoruyo> hey folks, can someone help figure out what's up with some EC PGs not peering? There's a whole host that's down and all its OSDs are marked out but the PGs are not getting remapped to other OSDs and no recovery is happening
[16:07] <brians__> I was wondering would it help lots of very small random IOs IcePic
[16:07] <brians__> bs=1 count=1000 oflag=dsync
[16:07] <brians__> 1000 bytes (1.0 kB) copied, 4.04058 s, 0.2 kB/s
[16:07] <brians__> :s
[16:08] <IcePic> brians__: it will make everything faster, at the price of data getting lost in case of power outage or spontaneous reboots.
[16:08] * RameshN (~rnachimu@121.244.87.117) Quit (Ping timeout: 480 seconds)
[16:08] <brians__> got it IcePic thanks - one final question if you are open to it :)
[16:09] <brians__> I've been thinking about a small cluster 4 nodes with 4 nl sas drives with dc3700 as journal disks
[16:09] <brians__> would I get HUGE performance increase if I changed to 4 decent SSDs instead of the NL sas
[16:09] <brians__> my understanding from reading is that ceph can't really take full advantage of all SSD osds .
[16:09] <IcePic> it depends on the kind of data you are expecting.
[16:09] <brians__> but I could be misunderstanding that
[16:10] <m0zes> another option is to split the storage ssds into multiple "osds"
[16:10] <IcePic> For many applications, just having the journal on SSD will allow the spinning disks to work in a decent fashion and hopefully keep a good data flow up
[16:10] <brians__> kind of data is all virtual machine images using rbd
[16:10] * ircolle (~Adium@c-71-229-136-109.hsd1.co.comcast.net) has joined #ceph
[16:10] <brians__> I'm seeing great throughput on RBD with R=2 of 500MB/s
[16:10] * kawa2014 (~kawa@89.184.114.246) Quit (Ping timeout: 480 seconds)
[16:11] <brians__> but smaller IOs seem to get hindered by the IOPS of the sata drives
[16:11] <brians__> (that's my limited understanding of the situation)
[16:11] <brians__> m0zes interesting idea also.
[16:11] * i_m (~ivan.miro@deibp9eh1--blueice4n6.emea.ibm.com) Quit (Ping timeout: 480 seconds)
[16:11] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[16:11] <brians__> because the SSD should be able to handle the IOPS of 2 OSDS
[16:11] <m0zes> I've seen it done to better utilize the p3700 ssds as OSDs.
[16:12] <m0zes> they did 4 with the p3700s
[16:12] <boolman> jcsp: currently it just hangs
[16:12] * rakeshgm (~rakesh@121.244.87.117) Quit (Quit: Leaving)
[16:12] <brians__> I'm planning single P3700 in each box just for journalling right now
[16:12] <boolman> hm, wonder if it might be a dns issue, will try to use the ips
[16:12] <brians__> and then possibly 4 - 6 lower end intel 1TB SSDs
[16:12] <brians__> as OSDs
[16:13] * Racpatel (~Racpatel@2601:87:3:3601::4edb) has joined #ceph
[16:13] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:15] * Grimhound (~Redshift@6AGAAAD1F.tor-irc.dnsbl.oftc.net) Quit ()
[16:15] * Grum (~Jourei@tor1e1.privacyfoundation.ch) has joined #ceph
[16:17] * sage (~quassel@2607:f298:6050:709d:dc44:e0b0:a2b6:f311) Quit (Remote host closed the connection)
[16:18] * sage (~quassel@2607:f298:6050:709d:dcee:8b17:54b4:7195) has joined #ceph
[16:18] * ChanServ sets mode +o sage
[16:19] * Hemanth (~hkumar_@121.244.87.117) Quit (Ping timeout: 480 seconds)
[16:20] * CydeWeys1 (~luigiman@6AGAAAD1X.tor-irc.dnsbl.oftc.net) Quit ()
[16:21] * neobenedict (~Grimmer@tor-exit-hirsiali.unsecu.re) has joined #ceph
[16:21] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[16:21] <Cory1> Hi all, i'm having a lot of trouble with getting one of my OSD's online. I can't even see an error in this output, can anybody shed some light?
[16:21] <Cory1> https://paste.gnome.org/pjewdpgd0
[16:22] * swami1 (~swami@49.32.0.32) Quit (Quit: Leaving.)
[16:23] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[16:24] * ngoswami (~ngoswami@121.244.87.116) Quit ()
[16:25] * debian112 (~bcolbert@24.126.201.64) Quit (Ping timeout: 480 seconds)
[16:25] <Cory1> Looks like maybe PGLog::read_log(ObjectStore* is the faulting function?
[16:27] * haplo37 (~haplo37@199.91.185.156) has joined #ceph
[16:28] * EinstCrazy (~EinstCraz@180.152.103.247) has joined #ceph
[16:28] * j3roen (~j3roen@77.60.46.13) Quit (Remote host closed the connection)
[16:31] * j3roen (~j3roen@77.60.46.13) has joined #ceph
[16:31] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit (Quit: Leaving.)
[16:32] * bara_ (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[16:33] <m0zes> looks like a corrupt log entry in the pg log for 34.9. not that I know how to fix it ;)
[16:35] <Cory1> Yeah, i agree.
[16:35] <Cory1> Also not sure how to resolve it!
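One hedged option on a Hammer-era cluster (the version is given near the end of this log) is ceph-objectstore-tool, run with osd.35 stopped and ideally after imaging the disk: export the suspect PG first, and only if that confirms the pg log corruption, remove this OSD's copy so the daemon can start again. The removal loses osd.35's copy, so it is only reasonable if another replica of 34.9 exists:

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-35 \
        --journal-path /var/lib/ceph/osd/ceph-35/journal \
        --op export --pgid 34.9 --file /root/pg34.9.export
    # last resort, removes this OSD's copy of the PG:
    # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-35 \
    #     --journal-path /var/lib/ceph/osd/ceph-35/journal --op remove --pgid 34.9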
[16:35] <m0zes> has the rest of your cluster recovered the loss of the osd?
[16:36] <Cory1> The rest of the OSDs are up, but my health state is WARN
[16:36] <Cory1> 2016-04-05 00:06:35.710264 mon.0 [INF] pgmap v15578682: 5130 pgs: 1 stale+active+recovery_wait+degraded, 4763 active+clean, 4 stale+down+peering, 22 stale+active+remapped, 1 active+recovering+degraded, 1 stale+active+remapped+backfilling, 39 down+peering, 112 active+remapped, 4 stale+down+remapped+peering, 162 active+undersized+degraded, 21 stale+a
[16:36] <Cory1> ctive+clean; 12755 GB data, 22532 GB used, 39648 GB / 62180 GB avail; 1103 kB/s rd, 1771 kB/s wr, 495 op/s; 2402/4513592 objects degraded (0.053%); 84150/4513592 objects misplaced (1.864%); 1/3065072 unfound (0.000%)
[16:38] <m0zes> ooh, stale pgs, and an unfound object... I'd stick around, keep asking, and perhaps ask on the mailing list.
[16:38] <Cory1> Thanks :)
[16:38] * bara (~bara@213.175.37.12) Quit (Ping timeout: 480 seconds)
[16:39] <neurodrone> But these aren't stuck stale right?
[16:39] * wwdillingham (~LobsterRo@140.247.242.44) has joined #ceph
[16:39] * debian112 (~bcolbert@24.126.201.64) has joined #ceph
[16:39] <neurodrone> Can you query one of the pgs and see where they are headed?
[16:39] <Cory1> They are not stuck stale, I don't think
[16:40] <wwdillingham> I would like to experiment with cephfs on an existing cluster. would it be unwise to deploy the mds service alongside monitor service on an existing monitor (I have 5 mons) knowing that I will soon remove this mds and buy a dedicated host for being an MDS? My thought is just to blast /var/lib/ceph/mds and remove the cephfs pools when I am done testing and then deploy to the dedicated mds.
[16:40] <neurodrone> Then they should be fine. Querying them will tell you where they are headed.
[16:40] <Cory1> Ok, standby
[16:41] <neurodrone> wwdillingham: Assuming this is not your prod setup I don't see any specific problems with that setup. ;)
[16:41] <m0zes> wwdillingham: that should be fine. remember the mds is stateless (at least as far as local storage is concerned). you shouldn't need to blow away anything ;)
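For reference, a rough sketch of the kind of throwaway CephFS test wwdillingham describes might look like the following; the hostname, filesystem name, pool names and PG counts are placeholders rather than values from the discussion, and the first line assumes ceph-deploy is in use (otherwise the MDS would be created by hand):

    # create an MDS daemon on an existing mon host (hostname is an example)
    ceph-deploy mds create mon-host-1
    # pools for the test filesystem (names and pg counts are illustrative)
    ceph osd pool create cephfs_data 128
    ceph osd pool create cephfs_metadata 128
    ceph fs new testfs cephfs_metadata cephfs_data
    # teardown later, after stopping the MDS daemon:
    ceph fs rm testfs --yes-i-really-mean-it
    ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
    ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it

As m0zes notes, the MDS keeps no local state, so removing the daemon and the test pools is essentially all the cleanup there is.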
[16:41] <wwdillingham> neurodrone: it is my prod setup, though the cephfs component would not be in production until we had a dedicated mds
[16:42] * kefu (~kefu@183.193.163.144) has joined #ceph
[16:42] * rmart04 (~rmart04@support.memset.com) has joined #ceph
[16:43] <Cory1> http://pastebin.com/mTFrywA6
[16:44] <Cory1> neurodrone: that's the output of ceph pg dump_stuck stale
[16:45] * Grum (~Jourei@06SAAAXFN.tor-irc.dnsbl.oftc.net) Quit ()
[16:45] * Jebula (~danielsj@217.23.15.200) has joined #ceph
[16:46] <neurodrone> And do you have an osd.35 running in your cluster ATM?
[16:46] <Cory1> correct, osd35 is showing down because it will not start
[16:46] * kefu (~kefu@183.193.163.144) Quit ()
[16:47] <neurodrone> Is this within a replicated pool btw?
[16:47] <Cory1> A ceph pg 39.c8 query hangs, presumably because 39.c8 is showing as being on OSD35
[16:47] <Cory1> Yes, a replicated pool
[16:47] <neurodrone> Yep, I guessed. Too many pgs on 35 look broken.
[16:47] <neurodrone> How many copies?
[16:47] <Cory1> There were 2
[16:47] <Cory1> But I set it to 1 earlier today
[16:47] <Cory1> (I shouldn't have)
[16:48] <neurodrone> Oh!
[16:48] <neurodrone> Hmm.
[16:48] <Cory1> I think if i can get osd35 online i'll be ok
[16:48] <neurodrone> Yeah, this certainly looks like you will need osd.35 to be up because all these copies that are on that OSD won't be able to make it.
[16:49] <neurodrone> I'd hate for you to mark them as lost. So getting it up is your best resort.
[16:49] <Cory1> That's my thinking too
[16:49] * davidz (~davidz@2605:e000:1313:8003:5180:6d40:4789:d518) has joined #ceph
[16:49] <Cory1> I need to resolve why the OSD crashes when i start it
[16:49] <neurodrone> They segfault?
[16:49] <Cory1> Naturally this has important info on it
[16:49] <neurodrone> Which version of ceph?
[16:50] <Cory1> Here is a brief snippet of the fault from the logs of 35
[16:50] <Cory1> http://pastebin.com/HvcrVynm
[16:50] <Cory1> v0.94.3
[16:50] * neobenedict (~Grimmer@4MJAADXMP.tor-irc.dnsbl.oftc.net) Quit ()
[16:51] * alkaid (~alkaid@128.199.95.148) Quit (Quit: Leaving)
[16:51] * Pettis (~GuntherDW@176.10.99.205) has joined #ceph
[16:51] <neurodrone> Oh they abort, interesting.
[16:51] <Cory1> Yeah, odd
[16:52] <Cory1> And the error is not very clear
[16:53] * xarses (~xarses@64.124.158.100) has joined #ceph
[16:54] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[16:54] <neurodrone> Can you query for pg `34.8` ?
[16:54] <neurodrone> Sorry `34.9` actually.
[16:55] <neurodrone> Something is up with that guy.
[16:55] * davidz1 (~davidz@2605:e000:1313:8003:5180:6d40:4789:d518) has joined #ceph
[16:55] <Cory1> You think it's 34.9?
[16:55] <Cory1> I'll pastebin the results
[16:55] * davidz (~davidz@2605:e000:1313:8003:5180:6d40:4789:d518) Quit (Read error: Connection reset by peer)
[16:55] <neurodrone> That's the last one in the stack trace.
[16:57] * ade (~abradshaw@212.77.58.61) Quit (Ping timeout: 480 seconds)
[16:57] * davidz1 (~davidz@2605:e000:1313:8003:5180:6d40:4789:d518) Quit (Read error: Connection reset by peer)
[16:57] * davidz (~davidz@2605:e000:1313:8003:5180:6d40:4789:d518) has joined #ceph
[16:58] <Cory1> http://pastebin.com/H26Uwqme
[16:58] <Cory1> Thats the result of ceph pg 34.9 query
[16:59] <m0zes> it is active+clean o_O
[16:59] <brians__> I wish I could say the same about my girlfriend
[17:00] <Cory1> Yeah, it looks ok? But it's the last thing that OSD35 mentions before it crashes
[17:00] * Muhlemmer (~kvirc@79.116.126.92) has joined #ceph
[17:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[17:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[17:02] <Cory1> http://pastebin.com/MaeWHH2E
[17:02] <Cory1> This is what i see when i try and start OSD35
[17:03] <neurodrone> Cory1: What are your logging levels?
[17:03] <neurodrone> Oh nvm.
[17:03] <Cory1> Defaults at the moment
[17:04] <neurodrone> Can you put objecter, filestore, journal and osd at 20/20?
[17:04] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[17:04] <neurodrone> And rerun.
[17:05] <Cory1> I'd normally do that with ceph tell
[17:05] <Cory1> But i cant if the OSD is down?
[17:05] <neurodrone> For now you can do it in ceph.conf.
[17:05] <neurodrone> The one that this OSD will use when it starts up.
[17:05] <Cory1> ok, i'll do that now
[17:07] <Cory1> Just confirming how I should do that, i'm using http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/
[17:07] <Cory1> Settings in the boot time heading
[17:07] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[17:08] <neurodrone> Yep, you can just add lines like "debug objecter = 20/20" and such.
[17:08] <neurodrone> In the [global] section.
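A sketch of the ceph.conf fragment being suggested here; the subsystem list follows neurodrone's suggestion, and putting it under [global] (rather than an [osd] or [osd.35] section) is simply the easiest option:

    [global]
    debug osd = 20/20
    debug filestore = 20/20
    debug journal = 20/20
    debug objecter = 20/20

The first number is the file log level and the second the in-memory level; 20/20 is extremely verbose, so it is worth reverting to defaults once the log has been captured.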
[17:08] <Cory1> I wasn't sure if I could put them in [global], ok will do
[17:08] * reed (~reed@75-101-54-18.dsl.static.fusionbroadband.com) Quit (Quit: Ex-Chat)
[17:08] * reed (~reed@75-101-54-18.dsl.static.fusionbroadband.com) has joined #ceph
[17:09] * Muhlemmer (~kvirc@79.116.126.92) Quit (Ping timeout: 480 seconds)
[17:10] * yuxiaozou (~yuxiaozou@128.135.100.101) has joined #ceph
[17:10] * EinstCrazy (~EinstCraz@180.152.103.247) Quit (Remote host closed the connection)
[17:12] <Cory1> It's a big log file, would you like to see the whole thing?
[17:13] <neurodrone> Sure.
[17:13] <neurodrone> This should make it much clearer now.
[17:13] * wyang (~wyang@59.45.74.45) Quit (Quit: This computer has gone to sleep)
[17:14] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[17:14] * wjw-freebsd (~wjw@vpn.ecoracks.nl) Quit (Ping timeout: 480 seconds)
[17:14] <Cory1> https://www.dropbox.com/s/j6g82s0z50qibet/ceph-osd.35.log?dl=0
[17:15] <Cory1> Seems to be faulting while reading the log?
[17:15] * Jebula (~danielsj@06SAAAXGW.tor-irc.dnsbl.oftc.net) Quit ()
[17:15] * JamesHarrison1 (~Teddybare@185.36.100.145) has joined #ceph
[17:15] <neurodrone> Hmm. Why is it issuing delete calls to so many objects.
[17:15] <boolman> jcsp: I found that I had to use IP-addresses in fstab for it to work at boot
[17:16] <Cory1> Probably because it's going from a replica size of 2 down to 1?
[17:17] <neurodrone> Cory1: What do you get when you run "ceph osd map <pool> rb.0.6f57.2506ccfd.00000000cd66"
[17:17] <neurodrone> Replace <pool> with your pool name.
[17:17] * Muhlemmer (~kvirc@86.127.75.37) has joined #ceph
[17:18] <Cory1> Ok,standby
[17:18] <Cory1> osdmap e4816 pool 'volumes_sata' (39) object 'rb.0.6f57.2506ccfd.00000000cd66' -> pg 39.85411809 (39.9) -> up ([38], p38) acting ([38], p38)
[17:18] <neurodrone> Hmm.
[17:19] <neurodrone> Okay.
[17:19] <neurodrone> I cannot find pg 39.85 in your list of stale so that might not be related.
[17:20] <Cory1> 39.85 isn't showing as stale
[17:20] * Pettis (~GuntherDW@4MJAADXN7.tor-irc.dnsbl.oftc.net) Quit ()
[17:21] * sixofour (~redbeast1@193.0.213.42) has joined #ceph
[17:21] <Anticimex> ok i'm getting closer to the perf troubles. completely awful rados bench
[17:21] * TMM (~hp@185.5.122.2) Quit (Quit: Ex-Chat)
[17:21] <neurodrone> Cory1: journaler, objectcacher to 20/20 as well if you can?
[17:22] <Anticimex> https://gist.github.com/Millnert/b26f7a51203282f211c5a077c5838642 - not that pretty
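For context, a typical rados bench run of the sort being discussed looks roughly like this; the pool name, duration and thread count are placeholders, not values taken from the gist:

    rados bench -p testpool 30 write -t 16 --no-cleanup   # 30s write test, 16 concurrent ops
    rados bench -p testpool 30 seq -t 16                  # sequential reads of the objects just written
    rados -p testpool cleanup                             # remove the benchmark objects afterwards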
[17:22] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[17:22] <Cory1> Yep, will do now
[17:27] <Cory1> Not sure if that did anything new, the file size is the same as the last log file. Here it is anyway
[17:27] <Cory1> https://www.dropbox.com/s/ynwkoghg4bcx635/ceph-osd.35-2.log?dl=0
[17:28] <neurodrone> Oh very odd, it logged 6 fewer lines. Ha.
[17:28] <neurodrone> So our last hope is `optracker = 20/20`.
[17:28] <m0zes> Anticimex: how many osds?
[17:28] <neurodrone> Let's see if that does anything useful
[17:28] <Anticimex> m0zes: 36
[17:28] <Anticimex> with nvram journal
[17:29] <m0zes> and are the osds in use for other iops?
[17:29] <Anticimex> no idle
[17:29] <Anticimex> "no, idle"
[17:29] <m0zes> weird.
[17:29] * Muhlemmer (~kvirc@86.127.75.37) Quit (Quit: KVIrc 4.9.1 Aria http://www.kvirc.net/)
[17:29] <Anticimex> turning on some osd debugging to see if i get a clue
[17:29] <Anticimex> then i guess it's fio time
[17:29] <Anticimex> and osd / filestore / journal configuration trimming
[17:30] <Cory1> Ok done
[17:30] <Cory1> https://www.dropbox.com/s/i1x2vqcgp7qkr96/ceph-osd.35-3.log?dl=0
[17:31] * Muhlemmer (~kvirc@86.127.75.37) has joined #ceph
[17:32] <neurodrone> Hmm.
[17:32] <m0zes> Anticimex: have you done any analysis on performance out of 'ceph tell osd.$num bench' ?
[17:33] <neurodrone> Cory1: I am out of ideas. Sorry.
[17:33] <m0zes> you might have a particular set of osds that are slow, or even a single one.
[17:33] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[17:33] <Cory1> Oh no! Your help has been great so far.
[17:33] <m0zes> I've got a *really* simple python test script. http://paste.ie/view/8297dd56
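m0zes's script is linked above; an even simpler shell-only version of the same idea (loop over every OSD and run the built-in bench) could look like the sketch below, with the caveat that osd bench writes real data to each OSD and is best run on an otherwise quiet cluster:

    for id in $(ceph osd ls); do
        echo -n "osd.$id: "
        ceph tell osd.$id bench     # by default writes ~1GB in 4MB chunks and reports the rate
    done

A single OSD reporting a much lower rate than its peers is exactly the kind of outlier m0zes suggests looking for.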
[17:34] <neurodrone> I don't know how comfortable you are with upgrading, but moving to 0.94.6 might be worth it. Unless of course the osd node contains other working OSDs.
[17:34] <Cory1> All 10 OSDs are on that same node so that would be tricky
[17:34] <neurodrone> Yeah, probably not the best way forward then.
[17:35] <Anticimex> m0zes: ah, no
[17:35] <Anticimex> that's a good hint
[17:35] <m0zes> Anticimex: I'd also check the osd hosts to see if there are any network errors reported.
[17:35] <m0zes> drops and/or overruns.
[17:35] * kefu (~kefu@183.193.163.144) has joined #ceph
[17:35] <Anticimex> network appears fine, 10Gbps free flowing, just did iperfs
[17:36] <Cory1> So my understanding is that the OSD is replaying a log file when it starts, it's trying to delete 85411809/rb.0.6f57.2506ccfd.00000000cd66/head//34 but faults?
[17:36] <m0zes> I have had a bad network cable cause performance issues for ceph...
[17:36] * bjornar_ (~bjornar@109.247.131.38) Quit (Ping timeout: 480 seconds)
[17:36] * Muhlemmer (~kvirc@86.127.75.37) Quit (Read error: No route to host)
[17:36] <Anticimex> m0zes: if i do this: http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/#runtime - specifically ceph daemon osd.$id config set debug_osd 0/5 , where should i expect to get logs?
[17:36] * Muhlemmer (~kvirc@86.127.75.37) has joined #ceph
[17:36] <neurodrone> Looks like it. Something with that object.
[17:36] <Anticimex> what's the value i should turn on to get some good osd logging anyway?
[17:36] <Anticimex> 0/5 sounds low
[17:36] <neurodrone> Or the pg that holds that object.
[17:37] <Anticimex> m0zes: yeah, i'm aware, ECMP multi-path hashing etc as well
[17:37] <m0zes> Anticimex: /var/log/ceph/ceph-osd.$id.log iirc. or journalctl. I've not checked under systemd yet, so it might be there.
[17:37] * LDA (~lda@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[17:37] <Anticimex> but something's fishy. will try the bench
[17:37] <Anticimex> yeah, thanks. i checked those
[17:37] <Anticimex> guess 0/5 is just too low
[17:38] <m0zes> 0/5 is very low.
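For the runtime route Anticimex is asking about, something like the following is the usual pattern (osd.$id as in the docs link above); the output then lands in the log file m0zes mentions, /var/log/ceph/ceph-osd.$id.log by default:

    ceph daemon osd.$id config set debug_osd 20/20     # on that OSD's host, via the admin socket
    ceph tell osd.$id injectargs '--debug-osd 20/20'   # or from any node with client.admin access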
[17:38] <Cory1> Yet we decided that the PG holding that object looked ok?
[17:39] <m0zes> that script will output an osdnums.csv file in the current working directory
[17:39] <m0zes> that pg is healthy, so I'd be tempted to try and remove it from the broken osd. unfortunately I don't know how to do that.
[17:39] <m0zes> Cory1: ^^
[17:40] <neurodrone> Yep, the cluster does think it is healthy.
[17:40] <Anticimex> m0zes: 20/20 is very high :)
[17:41] <Cory1> Ok, so given i have some servers offline at the moment, do you have any recommendations?
[17:41] <neurodrone> I'd suggest asking on the mailing list. This is beyond what I have worked with to date.
[17:42] * ngoswami (~ngoswami@121.244.87.116) Quit (Quit: This computer has gone to sleep)
[17:42] * Muhlemmer (~kvirc@86.127.75.37) Quit (Quit: KVIrc 4.9.1 Aria http://www.kvirc.net/)
[17:42] <Cory1> Ok thanks, I've already posted to the mailing list. I might send an update through, given you seem to understand this well. What info should I provide in the update?
[17:44] * TomasCZ (~TomasCZ@yes.tenlab.net) has joined #ceph
[17:44] * Geph (~Geoffrey@41.77.153.99) has joined #ceph
[17:45] * JamesHarrison1 (~Teddybare@6AGAAAD7V.tor-irc.dnsbl.oftc.net) Quit ()
[17:45] * jacoo (~ghostnote@tor-exit.gansta93.com) has joined #ceph
[17:46] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[17:46] * dyasny (~dyasny@bzq-82-81-161-51.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[17:48] * allaok (~allaok@machine107.orange-labs.com) has left #ceph
[17:49] <Anticimex> m0zes: may have located the fault, misconfigured the nvram journal it appears
[17:50] * kefu_ (~kefu@114.92.120.83) has joined #ceph
[17:50] * sixofour (~redbeast1@6AGAAAD74.tor-irc.dnsbl.oftc.net) Quit ()
[17:50] * hgjhgjh (~ain@185.100.85.192) has joined #ceph
[17:54] * oliveiradan3 is now known as doliveira
[17:57] * kefu (~kefu@183.193.163.144) Quit (Ping timeout: 480 seconds)
[17:58] * evelu (~erwan@46.231.131.178) Quit (Remote host closed the connection)
[17:58] <m0zes> Anticimex: how so?
[17:58] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit (Quit: Leaving.)
[17:59] <Anticimex> well, writes to it without any special flags are fine and dandy
[17:59] <Anticimex> it/them
[17:59] <Anticimex> but put on 'sync' or 'direct' (simple dd test), and i get less than 1 iops
[17:59] * bara (~bara@213.175.37.12) has joined #ceph
[18:00] <Anticimex> centos 7.2, nvram/dm-crypt/ext4 (with nobarrier)
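The kind of dd comparison Anticimex describes might look like this; the target path, block size and counts are illustrative, and the sync/direct variants are the ones that approximate journal-style writes:

    # buffered writes (usually fast, says little about journal behaviour)
    dd if=/dev/zero of=/srv/journal-test bs=4k count=10000
    # synchronous and direct writes, much closer to what a ceph journal does
    dd if=/dev/zero of=/srv/journal-test bs=4k count=1000 oflag=dsync
    dd if=/dev/zero of=/srv/journal-test bs=4k count=1000 oflag=direct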
[18:00] * evelu (~erwan@46.231.131.178) has joined #ceph
[18:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[18:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[18:01] * huangjun (~kvirc@117.152.72.24) Quit (Ping timeout: 480 seconds)
[18:01] * Geph (~Geoffrey@41.77.153.99) Quit (Ping timeout: 480 seconds)
[18:03] <m0zes> ouch
[18:03] * bara_ (~bara@nat-pool-brq-t.redhat.com) Quit (Ping timeout: 480 seconds)
[18:04] <Anticimex> hey, at least found the issue
[18:04] <Anticimex> :]
[18:05] <s3an2> Anyone using CephFs Quotas?
[18:06] * cathode (~cathode@50.232.215.114) has joined #ceph
[18:08] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[18:08] * dgurtner (~dgurtner@82.199.64.68) Quit (Ping timeout: 480 seconds)
[18:09] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[18:13] * bara_ (~bara@213.175.37.12) has joined #ceph
[18:15] * jacoo (~ghostnote@06SAAAXJ5.tor-irc.dnsbl.oftc.net) Quit ()
[18:15] * mps (~dontron@176.10.99.205) has joined #ceph
[18:20] * hgjhgjh (~ain@4MJAADXRY.tor-irc.dnsbl.oftc.net) Quit ()
[18:28] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[18:28] * madkiss (~madkiss@2001:6f8:12c3:f00f:1541:eaa8:e84e:1fb0) has joined #ceph
[18:32] * barra204 (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[18:32] * barra204 (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) has joined #ceph
[18:37] * DanFoster (~Daniel@office.34sp.com) Quit (Quit: Leaving)
[18:37] * mykola (~Mikolaj@91.225.200.223) has joined #ceph
[18:38] * sjackson (~circuser-@207.111.246.196) Quit (Remote host closed the connection)
[18:39] * DanFoster (~Daniel@2a00:1ee0:3:1337:6811:27cc:d1f3:b1cd) has joined #ceph
[18:40] * DanFoster (~Daniel@2a00:1ee0:3:1337:6811:27cc:d1f3:b1cd) Quit ()
[18:40] * barra204 (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:42] <Cory1> Anyone able to help with my OSD not coming up when I Issue the start command?
[18:42] <Cory1> https://www.dropbox.com/s/i1x2vqcgp7qkr96/ceph-osd.35-3.log?dl=0
[18:43] <Cory1> It fails when deleting an object by the looks of this log
[18:45] * mps (~dontron@6AGAAAEBA.tor-irc.dnsbl.oftc.net) Quit ()
[18:45] * MatthewH12 (~Pommesgab@chomsky.torservers.net) has joined #ceph
[18:48] * Skaag (~lunix@65.200.54.234) has joined #ceph
[18:51] * Miouge (~Miouge@h-72-233.a163.priv.bahnhof.se) has joined #ceph
[18:51] * RameshN (~rnachimu@101.222.183.43) has joined #ceph
[18:52] * bjornar_ (~bjornar@ti0099a430-1561.bb.online.no) has joined #ceph
[18:56] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) has left #ceph
[18:56] * lcurtis_ (~lcurtis@47.19.105.250) has joined #ceph
[18:59] * dgurtner (~dgurtner@217.149.140.193) has joined #ceph
[19:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[19:01] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[19:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[19:01] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[19:02] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)
[19:03] * rmart04 (~rmart04@support.memset.com) Quit (Quit: rmart04)
[19:08] * bara_ (~bara@213.175.37.12) Quit (Ping timeout: 480 seconds)
[19:09] * bara (~bara@213.175.37.12) Quit (Ping timeout: 480 seconds)
[19:09] * RameshN (~rnachimu@101.222.183.43) Quit (Ping timeout: 480 seconds)
[19:10] * chasmo77 (~chas77@158.183-62-69.ftth.swbr.surewest.net) Quit (Quit: It's just that easy)
[19:11] * Kupo1 (~tyler.wil@23.111.254.159) has joined #ceph
[19:12] * RameshN (~rnachimu@101.222.247.215) has joined #ceph
[19:14] * rakeshgm (~rakesh@106.51.225.4) has joined #ceph
[19:15] * MatthewH12 (~Pommesgab@4MJAADXVF.tor-irc.dnsbl.oftc.net) Quit ()
[19:15] * Tralin|Sleep (~MatthewH1@46.183.222.171) has joined #ceph
[19:15] * ngoswami (~ngoswami@121.244.87.116) Quit (Quit: Leaving)
[19:20] * dgurtner (~dgurtner@217.149.140.193) Quit (Ping timeout: 480 seconds)
[19:20] * Spessu (~storage@atlantic480.us.unmetered.com) has joined #ceph
[19:21] * Kupo1 (~tyler.wil@23.111.254.159) has left #ceph
[19:21] * swami1 (~swami@27.7.172.84) has joined #ceph
[19:25] * evelu (~erwan@46.231.131.178) Quit (Ping timeout: 480 seconds)
[19:28] * branto (~branto@ip-78-102-208-181.net.upcbroadband.cz) Quit (Quit: Leaving.)
[19:29] * ivancich (~ivancich@aa2.linuxbox.com) has joined #ceph
[19:29] * ivancich is now known as ivancich_
[19:33] * overclk (~quassel@117.202.96.224) Quit (Remote host closed the connection)
[19:33] * kefu_ (~kefu@114.92.120.83) Quit (Max SendQ exceeded)
[19:33] * kefu (~kefu@114.92.120.83) has joined #ceph
[19:35] * Cory1 (77fc11a3@107.161.19.53) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[19:39] * pabluk_ is now known as pabluk__
[19:45] * Tralin|Sleep (~MatthewH1@06SAAAXOM.tor-irc.dnsbl.oftc.net) Quit ()
[19:45] * RameshN (~rnachimu@101.222.247.215) Quit (Ping timeout: 480 seconds)
[19:47] * swami1 (~swami@27.7.172.84) Quit (Quit: Leaving.)
[19:49] * dgurtner (~dgurtner@217.149.140.193) has joined #ceph
[19:50] * Spessu (~storage@4MJAADXWX.tor-irc.dnsbl.oftc.net) Quit ()
[19:51] * Esvandiary (~xolotl@Relay-J.tor-exit.network) has joined #ceph
[19:51] * Hemanth (~hkumar_@103.228.221.131) has joined #ceph
[19:55] * kefu (~kefu@114.92.120.83) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[19:58] * angdraug (~angdraug@64.124.158.100) has joined #ceph
[20:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[20:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[20:05] * bdeetz (~bdeetz@72.194.132.130) has joined #ceph
[20:06] <bdeetz> So... I'm thinking about deploying Ceph in my environment. Is anybody interested in telling me I'm crazy?
[20:13] <rkeene> No.
[20:13] <rkeene> (Not without more information -- Ceph is relatively easy to setup)
[20:15] * dgurtner (~dgurtner@217.149.140.193) Quit (Ping timeout: 480 seconds)
[20:19] * shaunm (~shaunm@74.83.215.100) Quit (Ping timeout: 480 seconds)
[20:19] <bdeetz> rkeene: I agree. So. We have roughly 500TB of file storage between Isilon and Oracle ZFS. Isilon is very expensive and my management would like to save some money (like management does). With experience in Lustre and other parallel file systems, CephFS doesn't scare me. I've been toying with it for months and I have to say, I'm very impressed. I do have some concerns though. My understanding is that XFS has some interesting failure modes that can occur at loss
[20:19] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[20:20] * Esvandiary (~xolotl@06SAAAXQR.tor-irc.dnsbl.oftc.net) Quit ()
[20:20] * Sliker (~VampiricP@06SAAAXR7.tor-irc.dnsbl.oftc.net) has joined #ceph
[20:21] <rkeene> You don't have to use XFS -- I'm not a fan of it due to lack of checksumming, my next release of our Ceph-based appliance will use ZFS.
[20:21] * shaunm (~shaunm@74.83.215.100) has joined #ceph
[20:22] <bdeetz> Interesting. So, should I not trust documentation on ceph.com to give me best practices?
[20:23] <rkeene> You should trust yourself
[20:26] <bdeetz> Indeed. Have you heard of anybody deploying CephFS at a scale of roughly 700TB+ usable?
[20:27] <brians__> bdeetz https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/ceph-at-cern-a-year-in-the-life-of-a-petabyte-scale-block-storage-service
[20:27] * darthbacon (~Jason@64.234.158.96) has joined #ceph
[20:28] * dneary (~dneary@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[20:28] <bdeetz> @brians_: yes. I've watched CERN's stuff and am trying to talk with their team about what I'm planning. That said, they're using Ceph for RBD instead of CephFS.
[20:29] <brians__> I missed the cephfs part to your question bdeetz apologies.
[20:29] * angdraug (~angdraug@64.124.158.100) Quit (Quit: Leaving)
[20:29] * chasmo77 (~chas77@158.183-62-69.ftth.swbr.surewest.net) has joined #ceph
[20:30] * shaunm (~shaunm@74.83.215.100) Quit (Ping timeout: 480 seconds)
[20:31] * darthbacon1 (~Jason@129.24.240.130) has joined #ceph
[20:32] <T1> bdeetz: cephfs has only JUST become production ready with jewel..
[20:32] <T1> bdeetz: be careful..
[20:32] <darthbacon1> I understand. It is a very small use case.
[20:33] * darthbacon1 (~Jason@129.24.240.130) Quit ()
[20:33] * darthbacon1 (~Jason@129.24.240.130) has joined #ceph
[20:34] <darthbacon1> sorry I keep dropping connection
[20:35] <bdeetz> T1: Is that your opinion due to the lack of fsck or due to other issues within the filesystem?
[20:36] <T1> a little more than half a year ago we looked at using cephfs, but there were problems with everything from no support for multiple active MDSs for auto hot zoning and load balancing, to 1000+ files in a single directory being (was?) a bit of a no-no if you did not want to suffer maaaajor performance issues, and other small stuff such as no fsck or other way of running consistency checks
[20:36] * untoreh (~untoreh@151.50.215.50) Quit (Read error: No route to host)
[20:36] <darthbacon1> It's just been extremely convenient. I've been able to keep the use case simple enough to dodge most of those issues.
[20:36] <T1> .. that was what I was told and could read from entries on the mailing list and other public resources..
[20:37] * darthbacon (~Jason@64.234.158.96) Quit (Ping timeout: 480 seconds)
[20:37] <T1> I spent the better part of a few months on reading up on ceph stuff even before I purchased the first hardware for it
[20:38] * kawa2014 (~kawa@89.184.114.246) Quit (Quit: Leaving)
[20:38] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[20:38] <T1> be very, very careful
[20:39] <T1> cephfs is really early
[20:39] <darthbacon1> yes, a lot of the other bits are clearly documented. When I start delving into cephFS network traffic, all I get is information about cephX authentication.
[20:39] * Hemanth (~hkumar_@103.228.221.131) Quit (Ping timeout: 480 seconds)
[20:39] <Kvisle> does anyone have any good suggestions for early tuning of ceph rbd? I experience somewhat high latency in rbd block devices in my VMs, even when throughput is low. all journals are located on ssds, and it looks to me like the osds are fine
[20:39] <bdeetz> right. The multiple MDS issue is also a problem on Panasas. I've been toying around with it in my test environment on dedicated hardware for a few months. We're looking at moving forward with it. I'd be putting journals on Intel's NVMe PCIe cards. One of the biggest things I want to flush out of people is whether they've experienced data loss and how that loss occurred.
[20:40] <T1> when it gets mature enough we will probably move away from RBDs with a filesystem inside (XFS was chosen with no major consideration), but that is not anytime this year..
[20:40] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[20:40] <ben3> Kvisle: high network latency?
[20:41] <T1> afk..
[20:41] <T1> food
[20:41] <ben3> is there a filesystem that works better inside a rbd?
[20:41] <Kvisle> shouldn't be ... ping latency is about 0.150ms, and I've done my best to remove bottlenecks
[20:41] <ben3> Kvisle: that's huge
[20:42] <ben3> normal ping should be like 1/10th of that
[20:42] <cathode> should move to 10G over fiber. typical 0.005ms
[20:42] <cathode> sorry too many zeroes
[20:42] <ben3> are you using gbe kvisile?
[20:42] <cathode> 0.03 to 0.05ms
[20:43] <bdeetz> Speaking of networking, is 40G or FDR IB an excessive interconnect between OSD servers?
[20:43] <ben3> cathode: yeh that sounds more realistic at least
[20:43] <Kvisle> ben3: multiple 1gbps interfaces bundled together with lacp. 2gbps on the hypervisors, and 4gbps on the osd servers
[20:44] <ben3> Kvisle: yeah that won't go as well as 10gbe+
[20:44] <ben3> and you will experience higher latency
[20:44] <ben3> at the same time, you can still tune your ethernet cards and cstates
[20:44] <cathode> 10G with passive SFP+ cables (twinax) is about as low-latency as fiber too and way way cheaper
[20:44] <ben3> 10gb single ethernet cards are like $16 USD on ebay
[20:45] <ben3> single sfp+
[20:45] <cathode> yea
[20:45] <ben3> but infiniband isn't that expensive
[20:45] <cathode> i bought 4 of them a couple months ago. the Chelsio T310 cards
[20:45] * davidz1 (~davidz@2605:e000:1313:8003:9c0b:b9a2:d5e0:3618) has joined #ceph
[20:45] <ben3> and will get even faster when rdma stuff is in
[20:45] * basicxman (~DoDzy@6AGAAAEKD.tor-irc.dnsbl.oftc.net) has joined #ceph
[20:45] <bdeetz> RDMA is awesome when you have it
[20:45] * davidz (~davidz@2605:e000:1313:8003:5180:6d40:4789:d518) Quit (Read error: Connection reset by peer)
[20:45] <ben3> i have dual infiniband ddr, dual infiniband qdr, and dual 10gbe cards at home
[20:46] <ben3> i mostly got more dual 10gbe cards because i got a box of cheap sfp+ cables :)
[20:46] <ben3> bdeetz: yeah, but when do you have it?
[20:46] <ben3> i wish more stuff would support rdma :)
[20:46] <bdeetz> ben3: I've used RDMA in an HPC environment with MPI jobs.
[20:47] <zenpac> In order to setup LVM and Ceph on Cinder and Nova, do I need distinct sections for each of (ceph,lvm) in each of /etc/nova/nova.com and /etc/cinder/cinder.conf ? Anyone have example of this?
[20:47] <zenpac> That should have read: /etc/nova/nova.conf
[20:48] <rkeene> icko, OpenStack
[20:48] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[20:50] * Sliker (~VampiricP@06SAAAXR7.tor-irc.dnsbl.oftc.net) Quit ()
[20:50] * basicxman1 (~dicko@6AGAAAEKK.tor-irc.dnsbl.oftc.net) has joined #ceph
[20:51] <ben3> What are generally considered good cstate settings for ceph?
[20:52] <ben3> The lower the cstate limit, the lower the latency, but it only marginally affects throughput
[20:53] <ben3> Most people would disable c6 at least though right?
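One common (though by no means universal) way of capping C-states is via kernel boot parameters; the exact values below are an assumption for illustration, not something agreed on in the channel:

    # append to the existing kernel command line in /etc/default/grub
    # ("..." stands for whatever options are already there), then
    # regenerate the grub config and reboot
    GRUB_CMDLINE_LINUX="... intel_idle.max_cstate=1 processor.max_cstate=1"

This trades idle power consumption for lower and more predictable wakeup latency.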
[20:54] * huangjun (~kvirc@117.152.72.24) has joined #ceph
[20:56] * mykola (~Mikolaj@91.225.200.223) Quit (Quit: away)
[20:58] * angdraug (~angdraug@64.124.158.100) has joined #ceph
[20:58] <johnavp1989> Does anyone have any ideas as to why I would be experiencing poor write performance on an RBD?
[20:59] <ben3> johnavp1989: writes have to be synced to all servers before saying complete.
[20:59] <ben3> so you need low write latency in order to get fast performance. even ssd's can have high write latency.
[20:59] <ben3> but you could check each of them :)
[21:00] <ben3> what is poor for you atm?
[21:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[21:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[21:03] * huangjun (~kvirc@117.152.72.24) Quit (Ping timeout: 480 seconds)
[21:05] * kawa2014 (~kawa@89.184.114.246) Quit (Quit: Leaving)
[21:05] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[21:07] * kawa2014 (~kawa@89.184.114.246) Quit ()
[21:07] * Cory1 (77fc11a3@107.161.19.53) has joined #ceph
[21:09] <johnavp1989> ben3: This is the type of performance I'm talking about
[21:09] <johnavp1989> sudo dd if=/dev/zero of=/mnt/test/smallfile bs=1M count=1000
[21:09] <johnavp1989> 1000+0 records in
[21:09] <johnavp1989> 1000+0 records out
[21:09] <johnavp1989> 1048576000 bytes (1.0 GB) copied, 94.6032 s, 11.1 MB/s
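As an aside, dd without a sync flag mostly measures the page cache; variants like the following (same path as in the paste, with flags added) tend to give lower but more representative numbers for RBD-backed filesystems:

    sudo dd if=/dev/zero of=/mnt/test/smallfile bs=1M count=1000 conv=fdatasync   # flush once at the end
    sudo dd if=/dev/zero of=/mnt/test/smallfile bs=1M count=1000 oflag=direct     # bypass the page cache entirely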
[21:09] <ben3> damn
[21:09] <ben3> that's insanely slow
[21:09] <koollman> is there some kind of bandwidth limitation going on ?
[21:10] <ben3> even gbe should go faster than that
[21:10] <ben3> it's not 100 megabit ethernet on the client is it? :)
[21:10] <koollman> it's very close to 100mbits (12MB/s), yes
[21:10] <johnavp1989> Nope the client has a 1G interface and each OSD has 2 1G interfaces in LACP
[21:10] <ben3> yeh strangely close to 100 megabit :)
[21:11] <ben3> what mtu?
[21:12] <ben3> i strangely found when i tested with a gbe client that 6k mtu was faster than 9k mtu
[21:12] <ben3> not that it sounds like something that mtu should resolve.
[21:12] <ben3> are you using ssd journals?
[21:15] * basicxman (~DoDzy@6AGAAAEKD.tor-irc.dnsbl.oftc.net) Quit ()
[21:15] * Dragonshadow (~nastidon@atlantic480.us.unmetered.com) has joined #ceph
[21:17] <johnavp1989> ben3: Yes I am using SSD journals, granted it's only 1 journal per 13 OSD's (maybe a bit much?)
[21:17] <ben3> nvme ssd?
[21:17] <ben3> that ratio is insane
[21:18] <ben3> you may be better off not using journals if you have any load on the system.
[21:18] <ben3> but you can check i/o wait using iostat
[21:18] <ben3> to see how active the ssd is
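A minimal way to do what ben3 suggests, run on the OSD/journal host while a write test is going:

    iostat -x 5     # extended device stats every 5 seconds; watch await and %util on the journal SSD and the spinners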
[21:19] <ben3> also if that ssd dies you'll lose 13 disks
[21:19] <johnavp1989> Nope not nvme :(
[21:19] <ben3> does it have power loss protection at least?
[21:19] * Cory1 (77fc11a3@107.161.19.53) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[21:19] * Bartek (~Bartek@78.10.243.59) has joined #ceph
[21:20] <johnavp1989> Yes it's in a datacenter with dual PSU's
[21:20] <ben3> no the ssd itself
[21:20] <johnavp1989> ha nope
[21:20] <ben3> makes a huge difference to sync write speed
[21:20] <ben3> lots of ssd's suck with sync write performance
[21:20] <ben3> what kind of ssd is it?
[21:20] * basicxman1 (~dicko@6AGAAAEKK.tor-irc.dnsbl.oftc.net) Quit ()
[21:20] * Maariu5_ (~Bromine@ns316491.ip-37-187-129.eu) has joined #ceph
[21:21] <ben3> s/kind/model/
[21:21] * ftuesca (~ftuesca@181.170.106.78) Quit (Quit: Leaving)
[21:21] <ben3> it's very likely you'd be better off with no ssd journaling rather than 13:1 on a consumer ssd.
[21:22] <ben3> but it may sort of work if you have enough servers and low enough load
[21:22] <ben3> problem is you're going to be writing a lot of data to that ssd.. and killing it rather quickly, but before it does it could give worse performance
[21:22] <ben3> and then when it dies you have to resync 13 disks once you get a new ssd in
[21:23] <ben3> resyncing 13 disks on gbe doesn't sound fun to me
[21:23] <johnavp1989> The SSD is a INTEL SSDSC2BX400G4R
[21:24] <ben3> ahh that does have power loss protection
[21:24] <ben3> so at least that part is covered :)
[21:25] <johnavp1989> haha lovely
[21:25] <ben3> it's rated at 400mb/sec
[21:25] <ben3> for sequential write speed
[21:25] <ben3> so yeh check out i/o wait
[21:25] <ben3> do you have other load on your system?
[21:31] <johnavp1989> ok I'll have a look at the iowait. The MTU is 1500, I haven't set up jumbo frames though I do plan to, but I expected better speeds without it
[21:32] <johnavp1989> So there's not much load but each server is running both a monitor and an OSD in KVM
[21:32] <johnavp1989> Both VM's and the host are running on a RAID1 15K SAS
[21:33] <ben3> Osd in kvm?
[21:33] <johnavp1989> yup. The OSD is in KVM but the OSD disks are directly mounted to the VM
[21:33] <ben3> Why?
[21:34] <johnavp1989> OSD can be taken down without taking down the monitor and vice versa
[21:39] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[21:39] <ben3> osd can be taken down without using kvm
[21:43] <johnavp1989> I really meant the entire host but I suppose it doesn't matter
[21:45] * Dragonshadow (~nastidon@4MJAADX19.tor-irc.dnsbl.oftc.net) Quit ()
[21:45] * legion (~Kizzi@hessel2.torservers.net) has joined #ceph
[21:46] * rendar (~I@host241-182-dynamic.33-79-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[21:47] <johnavp1989> Here's iostat while writing from a single host
[21:47] <johnavp1989> http://pastebin.com/qce18k8y
[21:48] <johnavp1989> eh cut that off a bit http://pastebin.com/cENsDa3G
[21:49] <johnavp1989> sda is the journal
[21:49] <alexbligh1> has anyone built librados/librdb for OS-X?
[21:49] * rendar (~I@host241-182-dynamic.33-79-r.retail.telecomitalia.it) has joined #ceph
[21:50] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[21:50] * Maariu5_ (~Bromine@06SAAAXU1.tor-irc.dnsbl.oftc.net) Quit ()
[21:51] * Joppe4899 (~Knuckx@217.23.13.129) has joined #ceph
[21:51] * shakamunyi (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) has joined #ceph
[22:00] * shakamunyi (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[22:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[22:02] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[22:08] * dneary (~dneary@pool-96-237-170-97.bstnma.fios.verizon.net) has joined #ceph
[22:10] <wateringcan> hi all, if i have 5 OSD drives sharing a single SSD for journal device, is there any reason why I wouldn't want to consume all the space on the SSD?
[22:11] <brians__> wateringcan you may add new OSDs in the future and want to use the SSD for additional journals?
[22:12] <brians__> It's pointless making the journal huge - it will never be utilised
[22:13] <wateringcan> brians__: i suppose so, that is the only thing i could think of too
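For reference, the size each (filestore-era) journal uses is set in ceph.conf rather than by how much of the SSD you hand over; the value below is purely an example:

    [osd]
    osd journal size = 10240    # in MB; applied when the journal is created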
[22:15] <brians__> johnavp1989 far from an expert here, but those iowait numbers on your spinners are very high. have you got some kind of IO issue on whatever hdd bus you are using?
[22:15] * legion (~Kizzi@6AGAAAENM.tor-irc.dnsbl.oftc.net) Quit ()
[22:15] * skrblr (~kalleeen@ns316491.ip-37-187-129.eu) has joined #ceph
[22:17] * Bartek (~Bartek@78.10.243.59) Quit (Ping timeout: 480 seconds)
[22:20] * Joppe4899 (~Knuckx@6AGAAAENS.tor-irc.dnsbl.oftc.net) Quit ()
[22:20] * tritonx (~Deiz@static-ip-85-25-103-119.inaddr.ip-pool.com) has joined #ceph
[22:21] * georgem (~Adium@69-165-151-116.dsl.teksavvy.com) Quit (Quit: Leaving.)
[22:22] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:31] * gopher_49 (~gopher_49@host2.drexchem.com) Quit (Ping timeout: 480 seconds)
[22:32] * Foloex (~foloex@81-67-102-161.rev.numericable.fr) has joined #ceph
[22:32] <Foloex> Hello world
[22:34] <Foloex> I'm unable to mount a CephFS with the kernel driver but I'm able to mount it with ceph-fuse. I have no idea why; dmesg doesn't show any error... What could be wrong?
[22:36] <[arx]> what mount command do you use?
[22:36] * thomnico (~thomnico@132-110.80-90.static-ip.oleane.fr) has joined #ceph
[22:36] <Foloex> sudo mount -t ceph 192.168.0.50:6789:/ ceph/
[22:37] <lurbs> Could be that your kernel version is too old for the tunables your cluster is using.
[22:37] * bene2 (~bene@2601:18c:8501:25e4:ea2a:eaff:fe08:3c7a) Quit (Quit: Konversation terminated!)
[22:37] <lurbs> http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables
[22:37] <lurbs> http://docs.ceph.com/docs/master/rados/operations/crush-map/#which-client-versions-support-crush-tunables
[22:37] <[arx]> you need -o name=admin,secretfile=
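Spelling out [arx]'s suggestion, a kernel mount with cephx enabled usually looks something like the line below; the mountpoint and secret file path are examples (and, as it turns out further down, this particular cluster has auth disabled, so it does not apply here):

    sudo mount -t ceph 192.168.0.50:6789:/ /mnt/ceph -o name=admin,secretfile=/etc/ceph/admin.secret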
[22:37] <lurbs> Although I'd expect that to raise an error.
[22:37] <Foloex> "ceph-fuse -m 192.168.0.50:6789 ceph/" works fine
[22:38] <Foloex> I did not protect the cluster and it used to mount with this command
[22:38] * dgurtner (~dgurtner@217.149.140.193) has joined #ceph
[22:38] <Foloex> I'm running CoreOS which has kernel 4.4.6
[22:38] <[arx]> ah, ok then.
[22:40] <Foloex> I guess the good news is that I'm able to retrieve the files somehow ^^
[22:40] <Foloex> but I'd rather know why
[22:40] * allaok (~allaok@ARennes-658-1-51-139.w2-13.abo.wanadoo.fr) has joined #ceph
[22:45] * skrblr (~kalleeen@6AGAAAEO0.tor-irc.dnsbl.oftc.net) Quit ()
[22:45] * Drezil1 (~hassifa@06SAAAXYU.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:50] * tritonx (~Deiz@06SAAAXXL.tor-irc.dnsbl.oftc.net) Quit ()
[22:51] * Vale (~TehZomB@93.174.93.133) has joined #ceph
[22:53] * shakamunyi (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) has joined #ceph
[22:59] * wwdillingham (~LobsterRo@140.247.242.44) Quit (Quit: wwdillingham)
[23:01] * haomaiwang (~haomaiwan@li745-113.members.linode.com) Quit (Remote host closed the connection)
[23:01] * LDA (~lda@host217-114-156-249.pppoe.mark-itt.net) Quit (Quit: LDA)
[23:01] * shakamunyi (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[23:02] * haomaiwang (~haomaiwan@li745-113.members.linode.com) has joined #ceph
[23:06] * Bartek (~Bartek@78.10.243.59) has joined #ceph
[23:06] * mattbenjamin (~mbenjamin@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[23:08] * chwy (~chwy@2a01:e34:edc1:4270:18dc:d1d5:e3dd:b494) has joined #ceph
[23:10] * davidz (~davidz@2605:e000:1313:8003:9c0b:b9a2:d5e0:3618) has joined #ceph
[23:11] * mhack (~mhack@66-168-117-78.dhcp.oxfr.ma.charter.com) Quit (Remote host closed the connection)
[23:11] <Foloex> kernel mount hangs for a while before giving up on a "mount: 192.168.0.50:6789:/: can't read superblock" while dmesg contains "libceph: client314106 fsid cdcdcffc-beaf-49ef-ba62-e4e58aecaee4" and "libceph: mon0 192.168.0.50:6789 session established"
[23:12] * davidz1 (~davidz@2605:e000:1313:8003:9c0b:b9a2:d5e0:3618) Quit (Read error: Connection reset by peer)
[23:13] <Foloex> monitor, mds and osd logs look fine
[23:13] <Foloex> cluster is healthy
[23:13] * gopher_49 (~gopher_49@mobile-166-173-250-145.mycingular.net) has joined #ceph
[23:15] * thomnico (~thomnico@132-110.80-90.static-ip.oleane.fr) Quit (Ping timeout: 480 seconds)
[23:15] * Drezil1 (~hassifa@06SAAAXYU.tor-irc.dnsbl.oftc.net) Quit ()
[23:15] * Lite (~Averad@192.87.28.28) has joined #ceph
[23:17] <m0zes> enable debug logging.
[23:17] <m0zes> "echo module ceph +p > /sys/kernel/debug/dynamic_debug/control"
[23:17] <m0zes> try to mount
[23:17] <m0zes> disable debug logging "echo module ceph -p > /sys/kernel/debug/dynamic_debug/control"
[23:17] <m0zes> pastebin logs.
[23:18] <Foloex> that's for the host mounting cephfs or for the ceph monitor ?
[23:18] <m0zes> host mounting ceph
[23:18] <Foloex> ok
[23:19] * Miouge (~Miouge@h-72-233.a163.priv.bahnhof.se) Quit (Quit: Miouge)
[23:19] * chwy (~chwy@2a01:e34:edc1:4270:18dc:d1d5:e3dd:b494) Quit (Quit: Leaving)
[23:20] <Foloex> it's trying to mount
[23:20] * Vale (~TehZomB@4MJAADX5G.tor-irc.dnsbl.oftc.net) Quit ()
[23:20] * zc00gii (~clarjon1@strasbourg-tornode.eddai.su) has joined #ceph
[23:22] <Foloex> et voilà: http://pastebin.com/aLq4FpHT
[23:22] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[23:24] * thomnico (~thomnico@37.160.5.5) has joined #ceph
[23:28] <m0zes> it isn't getting mds0 as active. it is trying to use mds-1. strange.
[23:28] <Foloex> I have two mds on two hosts: "mdsmap e104: 1/1/0 up {0=mds_1=up:active}, 1 up:standby"
[23:35] <Foloex> my issue seems to be related to http://docs.ceph.com/docs/firefly/dev/kernel-client-troubleshooting/
[23:41] <Foloex> I don't see any issue there, there are only logs from when I managed to ceph-fuse cephfs and copy the data to a backup...
[23:42] <Foloex> and the mds cannot be broken as ceph-fuse works fine
[23:43] <Foloex> I guess the cluster is in a weird state that only affects kernel mount
[23:45] * Lite (~Averad@06SAAAXZ0.tor-irc.dnsbl.oftc.net) Quit ()
[23:47] <Foloex> kernel mount fails both on two CoreOS beta hosts and on another debian 8 host; they have different kernel versions
[23:49] * gopher_49 (~gopher_49@mobile-166-173-250-145.mycingular.net) Quit (Ping timeout: 480 seconds)
[23:50] <gregsfortytwo1> you probably have the cluster using features not supported by the kernel yet
[23:50] <gregsfortytwo1> most common one is new crush placement algorithms/tunables
[23:50] * zc00gii (~clarjon1@6AGAAAER6.tor-irc.dnsbl.oftc.net) Quit ()
[23:51] * haplo37 (~haplo37@199.91.185.156) Quit (Remote host closed the connection)
[23:52] * thomnico (~thomnico@37.160.5.5) Quit (Ping timeout: 480 seconds)
[23:52] * rendar (~I@host241-182-dynamic.33-79-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[23:52] <Foloex> gregsfortytwo1: there is no backward compatibility in such cases ?
[23:52] <gregsfortytwo1> not for data placement
[23:52] <gregsfortytwo1> you can change it if you really want to
[23:53] <gregsfortytwo1> and if you upgrade from older versions it doesn't switch them automatically
[23:53] * wwdillingham (~LobsterRo@mobile-166-186-168-229.mycingular.net) has joined #ceph
[23:53] <gregsfortytwo1> but it helps a lot with getting a good data distribution
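The tunables gregsfortytwo1 refers to can be inspected and, if really necessary, changed; the profile name below is only an example, and switching profiles triggers data movement across the cluster:

    ceph osd crush show-tunables     # what the cluster currently requires of clients
    ceph osd crush tunables hammer   # example: pin to an older profile for older kernel clients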
[23:54] * shakamunyi (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) has joined #ceph
[23:54] <lurbs> Until you upgrade the cluster past the point where long running QEMU processes linked against old librbd can handle it, and the VMs all die. :)
[23:54] <Foloex> I don't think I updated the cluster but maybe the docker image I'm using did
[23:56] <Foloex> I'm running Ceph 9.2.1
[23:57] * johnavp1989 (~jpetrini@8.39.115.8) Quit (Ping timeout: 480 seconds)
[23:57] * wwdillingham (~LobsterRo@mobile-166-186-168-229.mycingular.net) Quit (Read error: Connection reset by peer)
[23:59] * dgurtner (~dgurtner@217.149.140.193) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.