#ceph IRC Log


IRC Log for 2016-05-23

Timestamps are in GMT/BST.

[0:02] * MentalRay (~MentalRay@107.171.161.165) has joined #ceph
[0:03] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Ping timeout: 480 seconds)
[0:15] * rendar (~I@host121-178-dynamic.36-79-r.retail.telecomitalia.it) has joined #ceph
[0:15] * osuka_ (~SinZ|offl@7V7AAE3FN.tor-irc.dnsbl.oftc.net) Quit ()
[0:15] * Sun7zu (~Ralth@chomsky.torservers.net) has joined #ceph
[0:17] * penguinRaider (~KiKo@mail2.geosansar.com) has joined #ceph
[0:30] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[0:30] * badone (~badone@66.187.239.16) Quit (Quit: k?thxbyebyenow)
[0:39] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Ping timeout: 480 seconds)
[0:45] * Sun7zu (~Ralth@7V7AAE3GW.tor-irc.dnsbl.oftc.net) Quit ()
[0:45] * kalmisto (~Ian2128@95.128.43.164) has joined #ceph
[0:55] * rendar (~I@host121-178-dynamic.36-79-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[0:56] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[0:57] * MentalRay (~MentalRay@107.171.161.165) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[1:02] * bearkitten (~bearkitte@cpe-76-172-86-115.socal.res.rr.com) has joined #ceph
[1:04] * huangjun (~kvirc@117.151.52.46) has joined #ceph
[1:12] * huangjun (~kvirc@117.151.52.46) Quit (Ping timeout: 480 seconds)
[1:15] * kalmisto (~Ian2128@7V7AAE3H3.tor-irc.dnsbl.oftc.net) Quit ()
[1:15] * Cue (~DougalJac@65.19.167.130) has joined #ceph
[1:17] * oms101 (~oms101@p20030057EA604400C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:25] * oms101 (~oms101@p20030057EA481B00C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[1:31] * ircuser-1 (~Johnny@158.183-62-69.ftth.swbr.surewest.net) has joined #ceph
[1:45] * Cue (~DougalJac@06SAACVLW.tor-irc.dnsbl.oftc.net) Quit ()
[1:45] * MonkeyJamboree (~spidu_@178-17-170-99.static.host) has joined #ceph
[1:59] * LeaChim (~LeaChim@host86-176-96-249.range86-176.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:59] * doppelgrau (~doppelgra@dslb-146-060-066-134.146.060.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[2:15] * MonkeyJamboree (~spidu_@7V7AAE3J1.tor-irc.dnsbl.oftc.net) Quit ()
[2:15] * Sketchfile (~Gecko1986@4MJAAFES5.tor-irc.dnsbl.oftc.net) has joined #ceph
[2:17] * wer (~wer@216.197.66.226) Quit (Remote host closed the connection)
[2:25] * hesco (~hesco@2601:cb:c001:faea:8cde:1adc:bfc4:41fb) has joined #ceph
[2:26] <hesco> I seem unable to create a second monitor on the next node. Can anyone here please advise what I may be missing? I've been following the recipe at: https://blog.zhaw.ch/icclab/deploy-ceph-and-start-using-it-end-to-end-tutorial-installation-part-13/
[2:26] <hesco> Raw
[2:26] <hesco> I have posted my output here: https://gist.github.com/hesco/6429e92f8db1acc808bcae069c05861a
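For reference, adding a monitor by hand boils down to a few steps; a minimal sketch for a Hammer-era cluster following the upstream add-or-rm-mons documentation, assuming the new monitor is called mon2 at 192.168.0.12 (both are illustrative, not taken from hesco's gist):

    # on the new node: fetch the existing mon keyring and the current monmap
    ceph auth get mon. -o /tmp/mon-keyring
    ceph mon getmap -o /tmp/monmap
    # build the new monitor's data directory from them
    ceph-mon -i mon2 --mkfs --monmap /tmp/monmap --keyring /tmp/mon-keyring
    # register the monitor with the cluster and start it
    ceph mon add mon2 192.168.0.12:6789
    /etc/init.d/ceph start mon.mon2    # or the equivalent for the host's init system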
[2:45] * Sketchfile (~Gecko1986@4MJAAFES5.tor-irc.dnsbl.oftc.net) Quit ()
[2:45] * n0x1d1 (~Kayla@exit1.ipredator.se) has joined #ceph
[2:52] * huangjun (~kvirc@113.57.168.154) has joined #ceph
[3:02] * aj__ (~aj@x590cd3eb.dyn.telefonica.de) has joined #ceph
[3:02] * derjohn_mobi (~aj@x590d05b4.dyn.telefonica.de) Quit (Read error: Connection reset by peer)
[3:15] * n0x1d1 (~Kayla@06SAACVOJ.tor-irc.dnsbl.oftc.net) Quit ()
[3:15] * matx (~luckz@chulak.enn.lu) has joined #ceph
[3:25] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[3:25] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[3:25] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[3:30] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[3:30] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[3:30] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[3:45] * matx (~luckz@06SAACVPJ.tor-irc.dnsbl.oftc.net) Quit ()
[3:45] * oracular (~Kyso@atlantic850.dedicatedpanel.com) has joined #ceph
[3:50] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[3:53] * EinstCrazy (~EinstCraz@58.247.117.134) has joined #ceph
[3:57] * EinstCrazy (~EinstCraz@58.247.117.134) Quit (Remote host closed the connection)
[3:59] * danieagle (~Daniel@189.97.76.22) Quit (Quit: Thanks for Everything! :-) See you! :-))
[4:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[4:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[4:03] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[4:03] * _khyron (~khyron@fixed-190-159-187-190-159-75.iusacell.net) has joined #ceph
[4:07] * kefu (~kefu@183.193.181.153) has joined #ceph
[4:08] * khyron (~khyron@fixed-190-159-187-190-159-75.iusacell.net) Quit (Ping timeout: 480 seconds)
[4:15] * yanzheng (~zhyan@118.116.115.252) has joined #ceph
[4:15] * oracular (~Kyso@4MJAAFEVY.tor-irc.dnsbl.oftc.net) Quit ()
[4:15] * Nanobot (~Esge@torsrva.snydernet.net) has joined #ceph
[4:17] * penguinRaider (~KiKo@mail2.geosansar.com) Quit (Ping timeout: 480 seconds)
[4:19] * flisky (~Thunderbi@36.110.40.21) has joined #ceph
[4:27] * penguinRaider (~KiKo@mail2.geosansar.com) has joined #ceph
[4:28] * lj (~liujun@111.202.176.44) has joined #ceph
[4:30] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) has joined #ceph
[4:31] * kefu (~kefu@183.193.181.153) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[4:34] * mhuang (~mhuang@119.254.120.72) has joined #ceph
[4:36] * shaunm (~shaunm@67.136.129.196) Quit (Ping timeout: 480 seconds)
[4:43] * kefu (~kefu@183.193.181.153) has joined #ceph
[4:45] * Nanobot (~Esge@7V7AAE3OQ.tor-irc.dnsbl.oftc.net) Quit ()
[4:46] * shyu (~shyu@114.254.40.190) has joined #ceph
[4:50] * brannmar (~Wijk@91.109.29.120) has joined #ceph
[5:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[5:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[5:14] * kefu_ (~kefu@114.92.122.74) has joined #ceph
[5:16] * markl (~mark@knm.org) has joined #ceph
[5:16] * kefu (~kefu@183.193.181.153) Quit (Ping timeout: 480 seconds)
[5:19] * Vacuum_ (~Vacuum@i59F7940B.versanet.de) has joined #ceph
[5:20] * brannmar (~Wijk@4MJAAFEX8.tor-irc.dnsbl.oftc.net) Quit ()
[5:26] * Vacuum__ (~Vacuum@i59F79BE6.versanet.de) Quit (Ping timeout: 480 seconds)
[5:29] * Sami345 (~AotC@146.0.43.126) has joined #ceph
[5:30] <TheSov> so is there a functional way to get bluestore working on debian?
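At the time BlueStore was still experimental in Jewel; a minimal sketch of how it was typically enabled, assuming Jewel packages are already installed on the host (/dev/sdc is a placeholder):

    # ceph.conf on the OSD host
    [global]
    enable experimental unrecoverable data corrupting features = bluestore rocksdb

    # then prepare the disk with the experimental backend
    ceph-disk prepare --bluestore /dev/sdc

The option name is deliberately scary; BlueStore was not considered production-ready in that release.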
[5:35] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[5:35] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[5:38] * zhaochao (~zhaochao@125.39.112.4) has joined #ceph
[5:42] * aarontc (~aarontc@2001:470:e893::1:1) Quit (Quit: Bye!)
[5:46] * yanzheng (~zhyan@118.116.115.252) Quit (Quit: ??????)
[5:59] * Sami345 (~AotC@7V7AAE3QY.tor-irc.dnsbl.oftc.net) Quit ()
[5:59] * roaet (~ItsCrimin@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[6:00] * nils__ (~nils_@port-5958.pppoe.wtnet.de) has joined #ceph
[6:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[6:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[6:01] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[6:02] * aarontc (~aarontc@2001:470:e893::1:1) has joined #ceph
[6:05] * shyu (~shyu@114.254.40.190) Quit (Ping timeout: 480 seconds)
[6:06] * kefu_ is now known as kefu|afk
[6:06] * kefu|afk is now known as kefu_
[6:06] * kefu_ is now known as kefu
[6:07] <TheSov> so after i do ceph-disk prepare /dev/sdc /dev/sdb1, should it automount and start the osd?
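On a ceph-disk/udev setup the expectation is that prepare partitions the devices and udev then triggers activation (mount plus OSD start); when that does not fire, activation can be run by hand. A rough sketch with the same devices TheSov mentions:

    ceph-disk prepare /dev/sdc /dev/sdb1     # data disk, pre-made journal partition
    ceph-disk activate /dev/sdc1             # mounts the data partition and starts the OSD
    ceph-disk list                           # shows how ceph-disk sees the devices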
[6:07] * nils_ (~nils_@port-10062.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[6:14] * kefu (~kefu@114.92.122.74) Quit (Max SendQ exceeded)
[6:15] * kefu (~kefu@114.92.122.74) has joined #ceph
[6:21] * deepthi (~deepthi@115.118.211.182) has joined #ceph
[6:29] * roaet (~ItsCrimin@4MJAAFEZ2.tor-irc.dnsbl.oftc.net) Quit ()
[6:29] * tokie (~jwandborg@5.255.80.27) has joined #ceph
[6:30] * kefu (~kefu@114.92.122.74) Quit (Max SendQ exceeded)
[6:31] * kefu (~kefu@114.92.122.74) has joined #ceph
[6:36] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[6:37] * prallab (~prallab@216.207.42.137) has joined #ceph
[6:47] * aj__ (~aj@x590cd3eb.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[6:47] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[6:59] * tokie (~jwandborg@4MJAAFE0Y.tor-irc.dnsbl.oftc.net) Quit ()
[6:59] * PcJamesy (~Bonzaii@178-175-128-50.static.host) has joined #ceph
[7:03] * prallab (~prallab@216.207.42.137) Quit (Remote host closed the connection)
[7:04] * user_7475 (779f3eb0@107.161.19.53) has joined #ceph
[7:04] <user_7475> Allah is doing
[7:05] <user_7475> sun is not doing Allah is doing
[7:05] <user_7475> moon is not doing Allah is doing
[7:05] <user_7475> stars are not doing Allah is doing
[7:05] <user_7475> planets are not doing Allah is doing
[7:05] <user_7475> galaxies are not doing Allah is doing
[7:06] <user_7475> oceans are not doing Allah is doing
[7:06] <user_7475> mountains are not doing Allah is doing
[7:06] <user_7475> trees are not doing Allah is doing
[7:07] <user_7475> mom is not doing Allah is doing
[7:07] * TomasCZ (~TomasCZ@yes.tenlab.net) Quit (Quit: Leaving)
[7:07] <user_7475> dad is not doing Allah is doing
[7:07] <user_7475> boss is not doing Allah is doing
[7:07] * rdas (~rdas@121.244.87.116) has joined #ceph
[7:07] <user_7475> job is not doing Allah is doing
[7:08] <user_7475> dollar is not doing Allah is doing
[7:08] <user_7475> degree is not doing Allah is doing
[7:08] <user_7475> medicine is not doing Allah is doing
[7:08] <user_7475> customers are not doing Allah is doing
[7:08] * prallab (~prallab@216.207.42.137) has joined #ceph
[7:09] <user_7475> you can not get a job without the permission of allah
[7:09] <user_7475> you can not get married without the permission of allah
[7:09] <user_7475> nobody can get angry at you without the permission of allah
[7:09] <user_7475> light is not doing Allah is doing
[7:10] <user_7475> fan is not doing Allah is doing
[7:10] <user_7475> businessess are not doing Allah is doing
[7:10] <user_7475> america is not doing Allah is doing
[7:10] <user_7475> fire can not burn without the permission of allah
[7:10] <user_7475> knife can not cut without the permission of allah
[7:11] <user_7475> rulers are not doing Allah is doing
[7:11] <user_7475> governments are not doing Allah is doing
[7:11] * kefu (~kefu@114.92.122.74) Quit (Max SendQ exceeded)
[7:11] <user_7475> sleep is not doing Allah is doing
[7:11] <user_7475> hunger is not doing Allah is doing
[7:12] * kefu (~kefu@114.92.122.74) has joined #ceph
[7:12] <user_7475> food does not take away the hunger Allah takes away the hunger
[7:12] <user_7475> water does not take away the thirst Allah takes away the thirst
[7:12] <user_7475> seeing is not doing Allah is doing
[7:12] <user_7475> hearing is not doing Allah is doing
[7:12] <lurbs> Hey, at least use a pastebin.
[7:12] <user_7475> seasons are not doing Allah is doing
[7:12] <user_7475> weather is not doing Allah is doing
[7:13] <user_7475> humans are not doing Allah is doing
[7:13] <user_7475> animals are not doing Allah is doing
[7:13] <user_7475> the best amongst you are those who learn and teach quran
[7:14] <user_7475> one letter read from book of Allah amounts to one good deed and Allah multiplies one good deed ten times
[7:15] <user_7475> hearts get rusted as does iron with water to remove rust from heart recitation of Quran and rememberance of death
[7:15] <user_7475> heart is likened to a mirror
[7:15] <user_7475> when a person commits one sin a black dot sustains the heart
[7:16] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[7:17] * user_7475 (779f3eb0@107.161.19.53) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[7:19] * wgao (~wgao@106.120.101.38) has joined #ceph
[7:24] * linjan_ (~linjan@176.195.66.84) has joined #ceph
[7:29] * PcJamesy (~Bonzaii@7V7AAE3TM.tor-irc.dnsbl.oftc.net) Quit ()
[7:29] * Eric1 (~Jebula@tor-exit-node.seas.upenn.edu) has joined #ceph
[7:30] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[7:32] * shyu (~shyu@114.254.40.190) has joined #ceph
[7:38] * linjan_ (~linjan@176.195.66.84) Quit (Ping timeout: 480 seconds)
[7:38] * hemebond (~james@121.98.140.35) has joined #ceph
[7:39] <hemebond> Can anyone help me with a new OSD that's stuck booting?
[7:39] <hemebond> This has happened twice now following the manual configuration instructions on the website.
[7:43] * dgurtner (~dgurtner@178.197.232.109) has joined #ceph
[7:44] * aj__ (~aj@88.128.80.75) has joined #ceph
[7:50] * efirs (~firs@c-50-185-70-125.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:51] * kmroz (~kilo@00020103.user.oftc.net) Quit (Ping timeout: 480 seconds)
[7:54] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) has joined #ceph
[7:55] * bvi (~Bastiaan@152-64-132-5.ftth.glasoperator.nl) has joined #ceph
[7:59] * Eric1 (~Jebula@7V7AAE3UD.tor-irc.dnsbl.oftc.net) Quit ()
[7:59] * clarjon1 (~poller@207.244.70.35) has joined #ceph
[7:59] * kmroz (~kilo@00020103.user.oftc.net) has joined #ceph
[7:59] * dgurtner (~dgurtner@178.197.232.109) Quit (Ping timeout: 480 seconds)
[8:01] * overclk (~quassel@121.244.87.117) has joined #ceph
[8:02] * dgurtner (~dgurtner@178.197.227.237) has joined #ceph
[8:02] * hesco (~hesco@2601:cb:c001:faea:8cde:1adc:bfc4:41fb) Quit (Ping timeout: 480 seconds)
[8:03] * bvi (~Bastiaan@152-64-132-5.ftth.glasoperator.nl) Quit (Ping timeout: 480 seconds)
[8:08] * bvi (~Bastiaan@185.56.32.1) has joined #ceph
[8:15] * Lokta (~Lokta@carbon.coe.int) has joined #ceph
[8:18] * post-factum (~post-fact@vulcan.natalenko.name) Quit (Killed (NickServ (Too many failed password attempts.)))
[8:18] * post-factum (~post-fact@vulcan.natalenko.name) has joined #ceph
[8:29] * clarjon1 (~poller@4MJAAFE3K.tor-irc.dnsbl.oftc.net) Quit ()
[8:29] * osuka_ (~roaet@freedom.ip-eend.nl) has joined #ceph
[8:31] * garphy`aw is now known as garphy
[8:33] * aj__ (~aj@88.128.80.75) Quit (Ping timeout: 480 seconds)
[8:34] * shylesh__ (~shylesh@121.244.87.118) has joined #ceph
[8:41] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[8:41] * itamarl (~itamarl@194.90.7.244) has joined #ceph
[8:43] * deepthi (~deepthi@115.118.211.182) Quit (Remote host closed the connection)
[8:45] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[8:48] * deepthi (~deepthi@115.118.211.182) has joined #ceph
[8:48] * bvi (~Bastiaan@185.56.32.1) Quit (Ping timeout: 480 seconds)
[8:55] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[8:57] * dvanders (~dvanders@dvanders-pro.cern.ch) has joined #ceph
[8:59] * osuka_ (~roaet@06SAACVZS.tor-irc.dnsbl.oftc.net) Quit ()
[8:59] * Eric1 (~Throlkim@freedom.ip-eend.nl) has joined #ceph
[8:59] * bvi (~Bastiaan@185.56.32.1) has joined #ceph
[9:01] * DanFoster (~Daniel@office.34sp.com) has joined #ceph
[9:03] * user_8464 (779f442c@107.161.19.53) has joined #ceph
[9:03] <user_8464> Allah is doing
[9:03] <user_8464> sun is not doing Allah is doing
[9:03] <user_8464> moon is not doing Allah is doing
[9:04] <user_8464> stars are not doing Allah is doing
[9:04] <user_8464> planets are not doing Allah is doing
[9:04] <user_8464> galaxies are not doing Allah is doing
[9:04] <user_8464> ocenas a
[9:04] <user_8464> oceans are not doing Allah is doing
[9:04] <etienneme> hemebond: You can increase logs verbosity to get more infos :)
[9:05] <user_8464> mountains are not doing Allah is doing
[9:05] <hemebond> etienneme: I get more text, but not more infos :-)
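For what "increase verbosity" usually means in practice, a sketch (the osd id and the levels are placeholders):

    # via the local admin socket on the OSD host (works even while the osd is not marked up)
    ceph daemon osd.1 config set debug_osd 20

    # or persistently, in ceph.conf on the OSD host
    [osd]
    debug osd = 20
    debug ms = 1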
[9:05] <user_8464> trees are not doing Allah is doing
[9:05] <etienneme> You don't have any errors?
[9:05] <user_8464> mom is not doing Allah is doing
[9:05] <hemebond> Not that I can see, no.
[9:05] <user_8464> dad is not doing Allah is doing
[9:05] <user_8464> boss is not doing Allah is doing
[9:06] <user_8464> job is not doing Allah is doing
[9:06] <user_8464> dollar is not doing Allah is doing
[9:06] <user_8464> degree is not doing Allah is doing
[9:06] <hemebond> The log has ticks and do_waiters
[9:06] <etienneme> what does a ceph -s reports? (X osds: X up, X in) count
[9:06] <user_8464> medicine is not doing Allah is doing
[9:06] <user_8464> customers are not doing Allah is doing
[9:06] <hemebond> osdmap e43: 2 osds: 1 up, 1 in
[9:07] <user_8464> you can not get a job without the permission of allah
[9:07] <user_8464> you can not get married without the permission of allah
[9:07] <hemebond> My colleague managed to get it "in" by reweighting it but it fell out again.
[9:07] <user_8464> nobody can get angry at you without the permission of allah
[9:07] <user_8464> light is not doing Allah is doing
[9:07] <user_8464> fan is not doing Allah is doing
[9:08] <user_8464> businessess are not doing Allah is doing
[9:08] <user_8464> amerca is not doing Allah is doing
[9:08] <etienneme> I'm surprised you don't have more infos or errors on logs :/ with an higher verbosity
[9:08] <user_8464> fire can not burn without the permission of allah
[9:08] <hemebond> I have a lot more text but I can't see anything that looks like an error.
[9:08] <user_8464> knife can not cut without the permission of allah
[9:09] <user_8464> rulers are not doing Allah is doing
[9:09] <hemebond> I'm still very new to Ceph so if it's not obvious then I'll miss it.
[9:09] <hemebond> (nothing I've done in Ceph has gone particularly well so far)
[9:09] <user_8464> governments are not doing Allah is doing
[9:09] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[9:09] <user_8464> sleep is not doing Allah is doing
[9:09] <user_8464> hunger is not doing Allah is doing
[9:10] <user_8464> food does not take away the hunger Allah takes away the hunger
[9:10] * linjan_ (~linjan@86.62.112.22) has joined #ceph
[9:10] <hemebond> I see the new osd getting the epoch and such so it seems to be able to connect to everything it needs.
[9:10] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[9:10] <hemebond> The crush map has been updated.
[9:10] <user_8464> water does not take away the thirst Allah takes away the thirst
[9:10] <user_8464> seeing is not doing Allah is doing
[9:11] <user_8464> hearing is not doing Allah is doing
[9:11] * itamarl is now known as Guest1798
[9:11] <user_8464> seasons are not doing Allah is doing
[9:11] * itamarl (~itamarl@194.90.7.244) has joined #ceph
[9:11] * karnan (~karnan@121.244.87.117) has joined #ceph
[9:11] <user_8464> weather is not doing Allah is doing
[9:11] <user_8464> humans are not doing Allah is doing
[9:11] <user_8464> animals are not doing Allah is doing
[9:12] <hemebond> Is there something that prevents an osd booting properly if the cluster is in a warning state?
[9:12] <user_8464> the best amongst you are those who learn and teach quran
[9:12] <hemebond> (I haven't seen it be a problem but I'm reduced to reading bugs)
[9:13] <user_8464> one letter read from book of Allah amounts to one good deed and Allah multiplies one good deed ten times
[9:13] * Guest1798 (~itamarl@194.90.7.244) Quit (Ping timeout: 480 seconds)
[9:13] * ledgr_ (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[9:13] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Read error: Connection reset by peer)
[9:13] * dgurtner (~dgurtner@178.197.227.237) Quit (Read error: No route to host)
[9:14] <user_8464> hearts get rusted as does iron with water to remove rust from heart recitation of Quran and rememberance of death
[9:14] <hemebond> handle_osd_map epochs [43,43], i have 43, src has [1,43]
[9:14] <user_8464> heart is likened to a mirror
[9:14] <hemebond> That look okay?
[9:14] <user_8464> when a person commits one sin a black dot sustains the heart
[9:15] * rakeshgm (~rakesh@121.244.87.117) has joined #ceph
[9:15] <hemebond> heartbeat: osd_stat(1083 MB used, 185 GB avail, 196 GB total, peers []/[] op hist [])
[9:16] * sage__ (~quassel@pool-173-76-103-210.bstnma.fios.verizon.net) Quit (Read error: Connection reset by peer)
[9:16] <etienneme> Having a health_warn is not an issue to boot
[9:17] * sage__ (~quassel@pool-173-76-103-210.bstnma.fios.verizon.net) has joined #ceph
[9:17] <etienneme> you could try to grep error or fatal in logs (i don't know if it will work)
[9:17] <etienneme> But if it can't boot, it should tell why.
[9:18] * analbeard (~shw@support.memset.com) has joined #ceph
[9:18] * user_8464 (779f442c@107.161.19.53) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[9:19] <T1w> thank you for leaving
[9:19] <hemebond> etienneme: With less logging it just stops at something like "finished init, now booting"
[9:19] <hemebond> With more logging it goes into the ticks.
[9:19] <hemebond> Looks like it can't find peers or something.
[9:19] <etienneme> If you check running process, you have the osd?
[9:19] * prallab (~prallab@216.207.42.137) Quit (Remote host closed the connection)
[9:20] <hemebond> yes.
[9:21] <T1w> portante?
[9:21] <etienneme> If it already worked, your network should be ok. But you could check
[9:21] <hemebond> Right now there is only one Ceph host.
[9:21] <hemebond> With one mon and currently one osd.
[9:21] <hemebond> Just trying to add another osd.
[9:22] <hemebond> ooh, one thing...
[9:22] <hemebond> Lemme try
[9:23] <hemebond> Nope, that didn't help.
[9:24] <hemebond> done with init, starting boot process
[9:24] <hemebond> start_boot - have maps 1..43
[9:24] <hemebond> So it seems to progress with booting but doesn't seem to find any peers.
[9:25] * aj__ (~aj@fw.gkh-setu.de) has joined #ceph
[9:26] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[9:26] <etienneme> from the osd that is down, you can do a ceph osd find X (id of the osd that is up), you will get an IP and port
[9:26] <etienneme> You can do a nc ip port to check if it can reach the other osd
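Concretely, that check might look like this (the id, address and port are placeholders):

    ceph osd find 0        # prints JSON including an "ip" field such as "10.0.0.1:6800/12345"
    nc 10.0.0.1 6800       # a successful connection prints a "ceph v027..." banner, as seen below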
[9:29] <hemebond> "ip": ":\/0"
[9:29] <hemebond> That looks pretty bad.
[9:29] * Eric1 (~Throlkim@06SAACV1G.tor-irc.dnsbl.oftc.net) Quit ()
[9:29] <hemebond> ceph v027.?????????????????????
[9:29] <hemebond> ^ from the nc command
[9:30] <hemebond> To the working osd.
[9:32] <hemebond> I see connections from both osd processes to the mon process.
[9:32] * pabluk_ is now known as pabluk
[9:35] <etienneme> no, it's OK :)
[9:36] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[9:39] <hemebond> ?
[9:41] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[9:42] * sudocat (~dibarra@2602:306:8bc7:4c50:7c2d:c463:1362:9051) Quit (Ping timeout: 480 seconds)
[9:44] * madkiss (~madkiss@2001:6f8:12c3:f00f:8d20:b987:d518:c6d1) Quit (Quit: Leaving.)
[9:44] <hemebond> In another environment I see connections between the osds.
[9:45] <hemebond> So there does seem to be something stopping them from talking to each other; or attempting to.
[9:45] <etienneme> Maybe someone will have more ideas :)
[9:46] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[9:46] <hemebond> Fingers crossed :-)
[9:46] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[9:51] * _khyron (~khyron@fixed-190-159-187-190-159-75.iusacell.net) Quit (Ping timeout: 480 seconds)
[9:52] * bvi (~Bastiaan@185.56.32.1) Quit (Ping timeout: 480 seconds)
[9:54] <hemebond> etienneme: Could it be some sort of auth issue?
[9:55] <hemebond> I'm not using authentication. The initial cluster was setup with ceph-deploy. The new osd was created manually.
[9:55] <hemebond> I haven't seen "auth" mentioned in the logs but I also haven't found how to check the auth settings for the various daemons.
[9:55] <etienneme> You can check with "ceph auth list"
[9:56] <etienneme> you should see your osd
[9:56] <hemebond> ooooo! Working osd has "caps: [mon] allow profile osd" failing osd has "caps: [mon] allow rwx".
[9:57] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[9:57] <hemebond> Oh :-( that looks like it's allowing anything. That correct?
[9:59] * Inuyasha (~JohnO@192.87.28.82) has joined #ceph
[9:59] <etienneme> No
[9:59] <etienneme> you can allow everything with *
[9:59] <hemebond> Oh sorry, there is a line with [osd]
[9:59] <hemebond> Both have: "caps: [osd] allow *"
[10:00] <etienneme> the admin client? :)
[10:00] <hemebond> both the osd entries
[10:00] * bvi (~Bastiaan@185.56.32.1) has joined #ceph
[10:00] <hemebond> From "ceph auth list"
[10:01] * allaok (~allaok@machine107.orange-labs.com) has joined #ceph
[10:03] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[10:03] <hemebond> rwx seems to be the same as * (according to the docs)
[10:04] <hemebond> Hmm, why would the docs say to use "rwx" instead of "profile osd" when setting up a new osd?
[10:05] <hemebond> Though that is for the [mon] line.
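If the caps really are the culprit, an existing key can be adjusted in place and the daemon restarted; a sketch using the osd.1 id from this cluster:

    ceph auth caps osd.1 mon 'allow profile osd' osd 'allow *'
    # then restart the osd so it re-authenticates with the new caps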
[10:09] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:12] <hemebond> The osd should be/register as up before being in, correct?
[10:16] <hemebond> Hmm, I also don't see "do_mon_report" in the bad osd log.
[10:16] * prallab (~prallab@216.207.42.137) has joined #ceph
[10:19] * MannerMan (~oscar@user170.217-10-117.netatonce.net) Quit (Remote host closed the connection)
[10:20] <Be-El> hemebond: did you change the osd key permissions and restarted the osd ?
[10:20] * Kioob1 (~Kioob@ALyon-652-1-188-79.w109-213.abo.wanadoo.fr) has joined #ceph
[10:20] <hemebond> Yes. The osd perms (listed in ceph auth list) now match a working osd and the bad osd was restarted.
[10:21] * MannerMan (~oscar@user170.217-10-117.netatonce.net) has joined #ceph
[10:22] <hemebond> I've pasted the start of the osd log in case there's anything in there to indicate the problem: http://paste.debian.net/hidden/a71a65d8/
[10:23] <hemebond> It just continues with the tick and heartbeat entries.
[10:23] * evelu (~erwan@46.231.131.178) has joined #ceph
[10:24] <Be-El> what's the status of the osd according to ceph -s ?
[10:24] <hemebond> Down and out: osdmap e334: 5 osds: 4 up, 4 in
[10:25] <Be-El> are all osds located on the same host?
[10:25] <Be-El> and why do you use an outdated version for this setup?
[10:25] <hemebond> In the env1, no. In env2 yes.
[10:25] <hemebond> Outdated version?
[10:26] <DaveOD> Hey guys, I've got a quick question: I'm running CephFS and I'm doing some benchmarks. I've got 5 OSD servers with 6 OSDs each and all have 10Gbit network. Each OSD has its journal on the same disk. When dd'ing to the OSD drive itself, I get 170MB/s. But when doing the OSD bench directly to an OSD, I get 57.31 MB/s. I only have the default settings configured for Ceph. I would expect an OSD speed of 85MB/s because of the journal being on the same drive. Any
[10:26] <Be-El> 0.94.4...the current hammer release is 0.94.7
[10:27] <Be-El> DaveOD: dd is using sequential reads. if you colocate the journal on the osd drive, the drive has to do a lot more seek operations
[10:27] <hemebond> It was only setup last year. This is the first time I've had to touch this environment since it was setup and I was (rightfully) concerned it would all fall over when I did.
[10:28] <hemebond> Would updating help? Not sure there is a newer version just an apt-get away.
[10:28] <Be-El> hemebond: what's the output of 'ceph osd dump'?
[10:29] * Inuyasha (~JohnO@4MJAAFE70.tor-irc.dnsbl.oftc.net) Quit ()
[10:29] <DaveOD> Be-El: I understand. So the seek time is actually causing that much loss? If I would put the journal on a separate SSD, what would you expect for OSD bench speed? 150MB/s? Or is that too much to expect? :-)
[10:29] <Be-El> DaveOD: that's what i assume
[10:29] <hemebond> The bad osd shows "osd.1 down out weight 0 up_from 0 up_thru 0 down_at 0 last_clean_interval [0,0) :/0 :/0 :/0 :/0 autoout,exists XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
[10:29] * Mattias (~oftc-webi@c-209-44.eduroam.liu.se) Quit (Ping timeout: 480 seconds)
[10:29] <hemebond> Good osd shows IP address and ports.
[10:29] <Be-El> DaveOD: it depends on the ssd drive. not every drive is suitable for ceph journals
[10:30] * prallab (~prallab@216.207.42.137) Quit (Remote host closed the connection)
[10:30] <hemebond> X's are some uuid obviously.
[10:30] <Be-El> hemebond: ok, remove that osd completely and re-add it with ceph-disk
[10:31] <hemebond> Okay.
[10:31] <Be-El> hemebond: mixing several deployment method may add some problems. and you have selected the most difficult method
[10:31] <hemebond> Yeah, this environment/setup has been unpleasant from the start.
[10:31] <Be-El> hemebond: does ceph.conf contain entries for that osd that might conflict with the actual setup?
[10:31] <hemebond> I only used the manual setup because I couldn't be sure how ceph-deploy would behave.
[10:31] * karnan (~karnan@121.244.87.117) Quit (Quit: Leaving)
[10:32] <hemebond> I added stuff for the osd to try and help it.
[10:32] <hemebond> "debug osd" and "host"
[10:33] <Be-El> just remove it. ceph should be able to detect all necessary settings by itself
[10:33] * allaok (~allaok@machine107.orange-labs.com) Quit (Quit: Leaving.)
[10:33] * allaok (~allaok@machine107.orange-labs.com) has joined #ceph
[10:33] <Be-El> the debug_osd entry can stay for debugging purposes
[10:34] * LeaChim (~LeaChim@host86-176-96-249.range86-176.btcentralplus.com) has joined #ceph
[10:34] <hemebond> Okay. Question: osd_pool_default_size is the number of replicas to maintain in the cluster of each file?
[10:35] <Be-El> yes
[10:35] * prallab (~prallab@216.207.42.137) has joined #ceph
[10:36] * lmb (~Lars@ip5b41f0a4.dynamic.kabel-deutschland.de) Quit (Ping timeout: 480 seconds)
[10:36] <hemebond> And will it try to make three replicas even if there's one osd?
[10:37] <Be-El> yes, and it will fail, since the default setup tries to distribute the replicas among hosts
[10:37] <Be-El> with less than three host you won't end up with a healthy cluster
[10:38] <hemebond> I have the CRUSH map configured to balance across osds. So that will still fail? Good.
[10:38] * DanFoster (~Daniel@office.34sp.com) Quit (Ping timeout: 480 seconds)
[10:39] <hemebond> This is one server with one mon and one osd; I'm not sure how it ever became healthy
[10:39] <Be-El> if you use a custom crush ruleset it should be ok
[10:39] <Be-El> to get it healthy with only a single osd you can reduce the number of replicas to one
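Pool replica counts are changed per pool on the running cluster; for the default rbd pool that would look like the following (1/1 only makes sense for a throwaway single-OSD setup):

    ceph osd pool set rbd size 1
    ceph osd pool set rbd min_size 1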
[10:39] <hemebond> Well I just changed it this evening when adding the new osd.
[10:40] <hemebond> Looking at the (healthy) osd config it looks like "3".
[10:40] <hemebond> Is there another daemon that controls it or is it just the osds?
[10:41] <Be-El> the daemon do not control anything about the pool configuration
[10:41] <Be-El> pool configuration is stored in the mon databases; the osd daemon only use the information from the mons to perform their tasks
[10:41] <hemebond> I see. Should I check the mon config before trying to add the new osd and adjust it down to 1?
[10:43] <Be-El> i would propose to get the osd up and running first
[10:43] <Be-El> pools can be changed in the running setup
[10:43] <hemebond> Okay. Going through the "short form" on http://docs.ceph.com/docs/hammer/install/manual-deployment/
[10:44] <Be-El> stop
[10:44] <hemebond> Okay.
[10:45] <Be-El> the mon is up and running?
[10:45] <Be-El> and you just want to add another osd?
[10:45] <hemebond> Yes.
[10:45] <Be-El> skip that documentation and either use ceph-deploy or ceph-disk
[10:45] <hemebond> I have removed the osd according to http://docs.ceph.com/docs/hammer/rados/operations/add-or-rm-osds/#removing-osds-manual
[10:46] <Be-El> for the latter, log into the osd node and use 'ceph-disk prepare ...' and 'ceph-disk activate ...'
[10:46] <Be-El> the help messages for both commands should give you some information about the necessary parameters
[10:47] <hemebond> Okay, I'm doing this in the test environment first which had the same issue.
[10:47] <hemebond> btw, that documentation does use ceph-disk. Is it still not correct?
[10:48] <hemebond> http://docs.ceph.com/docs/hammer/install/manual-deployment/#short-form
[10:48] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[10:48] * ade (~abradshaw@85.158.226.30) has joined #ceph
[10:48] <Be-El> if it uses ceph-disk it should be correct
[10:49] <hemebond> Hmm the auth is still there from the previous install. Docs don't say how to delete it. Will do that first.
[10:49] <Be-El> the long form in the master version has some issues (as you already found out)
[10:49] <Be-El> ceph auth del osd.X
[10:49] <Be-El> ceph osd crush remove osd.X
[10:49] <Be-El> and 'ceph osd rm X'
[10:49] <hemebond> Got it. Thanks.
[10:49] <Be-El> (don't ask me why the same semantic has three different verbs.... )
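Put together with the "out" and daemon-stop steps from the same removal page, the whole teardown for an osd id X reads roughly:

    ceph osd out osd.X
    /etc/init.d/ceph stop osd.X      # or the host's init system equivalent
    ceph osd crush remove osd.X
    ceph auth del osd.X
    ceph osd rm X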
[10:50] * rendar (~I@host30-183-dynamic.1-87-r.retail.telecomitalia.it) has joined #ceph
[10:51] * karnan (~karnan@121.244.87.117) has joined #ceph
[10:53] * shylesh__ (~shylesh@121.244.87.118) Quit (Quit: Leaving)
[10:53] * shylesh__ (~shylesh@121.244.87.118) has joined #ceph
[10:54] <hemebond> Okay, done in test.
[10:54] <hemebond> Now for prod.
[10:55] <DaveOD> Be-El: sorry but another quick question. So I have 30 OSD's (5 OSD servers with each 6 OSD's) with colocated journals. OSD Benchmarks show a max throughput / OSD of 57.31 MB/s. When dd'ing directly on the CephFS mount, I also get max throughput of 50MB/s. And my OSD's %utilized is rather low (certainly not 100% used). Is there any logical explanation for that?
[10:56] <Be-El> DaveOD: writes go to the journal first. if you are dd'ing to a cephfs file, the content is striped into chunks of 4 MB, which are sent to the various osd.
[10:57] <Be-El> so if you use a e.g. 100 MB file, you will have 25 chunks. given 30 osds you will end up with about one chunk per osd
[10:59] <DaveOD> so seek time will also explain why i don't see my OSD drives go to 100% utilization?
[10:59] <DaveOD> when doing the osd bench test, i see my drive go to 100%
[10:59] <DaveOD> but when using ceph fs
[10:59] <DaveOD> i don't see that happen
[10:59] <Be-El> seek time should be part of utilization
[11:00] <Be-El> what size do you use with dd (absolute + bs size)?
[11:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[11:01] * thomnico (~thomnico@2a01:e35:8b41:120:fd94:9eb8:f9e3:5c38) has joined #ceph
[11:01] <DaveOD> I have done multiple
[11:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[11:01] <DaveOD> dd if=/dev/zero of=./test.bin bs=1G count=8 oflag=direct
[11:01] <DaveOD> dd if=/dev/zero of=./test.bin bs=4k count=100 oflag=direct
[11:01] <Be-El> and keep in mind that dd is writing sequentially. I would expect it to write chunk by chunk, so only one osd is written to at once
[11:02] <Be-El> for more parallelization try tools like fio
[11:02] <DaveOD> forgot about that
[11:02] <DaveOD> thanks
[11:02] <DaveOD> by the way
[11:02] <Be-El> and depending on your setup, you might also want to use conv=fdatasync to skip page cache
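As an illustration of both points, a more parallel run against the CephFS mount and a cache-skipping dd (paths and sizes are made up):

    fio --name=cephfs-write --directory=/mnt/cephfs --rw=write --bs=4M --size=2G \
        --numjobs=8 --ioengine=libaio --direct=1 --group_reporting

    dd if=/dev/zero of=./test.bin bs=4M count=256 conv=fdatasync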
[11:03] <DaveOD> using big nodes in ceph is never a good idea. Is that statement still correct?
[11:03] <DaveOD> By theory: if you have 36 drives / OSD server
[11:03] <Be-El> big nodes means big data movement in case of a failing node
[11:03] <DaveOD> let's say 28 OSD's and 6 SSD for each OSD server
[11:03] <Be-El> -> off for meeting
[11:03] <DaveOD> network will always be the limit?
[11:03] <DaveOD> certainly for 10G networks?
[11:03] <hemebond> Be-El: Thank you. That has all worked and the osd is in and up now.
[11:04] <hemebond> I'd really like to know why it was failing when using the manual process though.
[11:05] <Be-El> yeah...meeting is cancelled......the first good news this day \o/
[11:05] <hemebond> LOL
[11:05] <hemebond> Thank you very much, Be-El.
[11:05] <Be-El> if fuel would now start to setup the cloud.....*crossing fingers*...
[11:06] <Be-El> hemebond: i assume there have been some wrong configurations, either in the mon osd database or on the ceph.conf file
[11:06] <Be-El> hemebond: may be adding the osd with the wrong auth permissions was the cause
[11:07] <hemebond> That's what I suspect too.
[11:07] <hemebond> The config is a mess and hasn't been touched since it was thrown together at the beginning.
[11:08] * nardial (~ls@dslb-088-076-178-130.088.076.pools.vodafone-ip.de) has joined #ceph
[11:09] <hemebond> Now the whole reason for this was to add a bigger osd because the existing one is full/filling. I will need to change the replicas to 1 and adjust weights to favour the new osd I think.
[11:11] <Be-El> i hope that cluster is a pure lab/testing setup
[11:11] <hemebond> hahahahaha *sob*
[11:14] <hemebond> It's moved stuff off the old (small) osd already. I wonder if the running config already has 1 replica specified.
[11:16] <Be-El> with replicas=2 and two osd of different size you will be restricted to the size of the smaller osd
[11:16] <Be-El> but your data will survive a disk failure
[11:17] <Be-El> (and given the "fun" i had with seagate 3TB drives in the past, replicas=3 is more than mandatory.... )
[11:17] * zhaochao_ (~zhaochao@124.202.191.132) has joined #ceph
[11:17] <hemebond> Every config I look at has osd_pool_default_size="3" and yet it just moved files off the smaller osd.
[11:19] * TMM (~hp@185.5.122.2) has joined #ceph
[11:19] <hemebond> Ceph has automatically weighted the larger osd at a higher value (thank you, Ceph).
[11:20] <Be-El> default weight = size in TB
[11:21] <hemebond> "osd_pool_default_min_size = 3" is in ceph.conf. I need to figure out how many replicas it's keeping.
[11:21] <hemebond> Ah yes, I remember reading about the size=weight thing yesterday.
[11:21] <hemebond> I thought I'd have to do it manually but Ceph is smarter than me it seems.
[11:22] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[11:22] * zhaochao (~zhaochao@125.39.112.4) Quit (Ping timeout: 480 seconds)
[11:23] * zhaochao_ is now known as zhaochao
[11:23] <hemebond> ceph osd pool get rbd size -> size: 1
[11:23] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[11:23] <hemebond> It looks like the pools are all set to 1
[11:24] <Be-El> they were probably set to that replica size after creation. the values in ceph.conf are only the default ones
[11:25] * pabluk is now known as pabluk_
[11:25] <hemebond> If the server is restarted (or the monitor or something) will it adjust to what's in the config?
[11:25] <DaveOD> Just another quick question about best practices: Network connection => 10 Gbps => 1250 MB/s. SATA drive has avg throughput of 170MB/s. So most recommended is 7 OSDs per 10Gbps connection?
[11:25] <Be-El> nope, the pool configuration is stored in the mon databases
[11:25] <hemebond> Oh thank goodness for that.
[11:26] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[11:26] <Be-El> almost everything is stored in the mon databases....you lose your mon, you lose the cluster
[11:26] <hemebond> Be-El: Been there already :-D
[11:26] <Be-El> (and given the dynamic nature of the mon database, you cannot make any consistent backup....)
[11:26] * lmb (~Lars@p578a6a91.dip0.t-ipconnect.de) has joined #ceph
[11:27] <DaveOD> because on this page: http://ceph.com/planet/zero-to-hero-guide-for-ceph-cluster-planning/ . This is claimed: a minimum of 5 physically separated nodes and minimum of 100 OSD @ 4TB per OSD the cluster capacity is over 130TB
[11:27] <Be-El> DaveOD: you need to add some protocol overhead
[11:27] <hemebond> Yikes. That's something else I've been looking at recently. The consensus seemed to be "Don't backup, just add more"
[11:27] <DaveOD> so in that example they have 20 OSD's / physical node
[11:27] * realitysandwich (~perry@213.61.152.126) has joined #ceph
[11:28] <Be-El> -> off for lunch (in case that's not cancelled, too)
[11:28] <realitysandwich> anyone around that can answer some questions about journals on ssds?
[11:28] <hemebond> Thanks Be-El, you were a life-saver.
[11:28] * pabluk_ is now known as pabluk
[11:33] * Nanobot (~Shnaw@chomsky.torservers.net) has joined #ceph
[11:37] * deepthi (~deepthi@115.118.211.182) Quit (Remote host closed the connection)
[11:38] * HappyLoaf (~HappyLoaf@cpc93928-bolt16-2-0-cust133.10-3.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[11:40] * b0e (~aledermue@213.95.25.82) has joined #ceph
[11:41] * rraja (~rraja@121.244.87.117) has joined #ceph
[11:42] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:6828:63e8:7fe1:76b1) has joined #ceph
[11:43] <s3an2> realitysandwich, please state your question and if someone here can help they will.
[11:44] * ade (~abradshaw@85.158.226.30) Quit (Ping timeout: 480 seconds)
[11:45] * aj__ (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[11:47] <realitysandwich> I am wondering how many 10k disks I can journal on one ssd, and if I should be putting the ssds for journaling into a raid or if the recovery process after a failure isn't something to worry about
[11:48] * ade (~abradshaw@85.158.226.30) has joined #ceph
[11:50] * rakeshgm (~rakesh@121.244.87.117) Quit (Quit: Leaving)
[11:50] * branto (~branto@178-253-128-218.3pp.slovanet.sk) has joined #ceph
[11:50] * rakeshgm (~rakesh@121.244.87.117) has joined #ceph
[11:52] * grauzikas (grauzikas@78-56-222-78.static.zebra.lt) has joined #ceph
[11:54] <s3an2> It depends ;), typically 4:1 or 5:1 spinning disks to SSD journal is what I have been using. You need to ensure the SSD is suitable as a journal - I really like the intel DC3710 but do have a read of sebastien han blog. I personally do not raid the journal; I just have a configuration that can gracefully handle the loss of 4/5 OSDs when the journal fails and just let ceph take care of the recovery.
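With ceph-disk that layout falls out naturally: point several prepares at the same whole SSD and it carves out one journal partition per OSD, sized by osd journal size in ceph.conf (device names below are hypothetical):

    ceph-disk prepare /dev/sdb /dev/sdf
    ceph-disk prepare /dev/sdc /dev/sdf
    ceph-disk prepare /dev/sdd /dev/sdf    # /dev/sdf ends up with three journal partitions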
[11:54] * aj__ (~aj@tmo-111-172.customers.d1-online.com) has joined #ceph
[11:54] * Hemanth (~hkumar_@121.244.87.117) has joined #ceph
[11:55] <realitysandwich> ok, I was hoping I could get away with 6 OSDs on one intel dc3710
[11:55] <realitysandwich> but it sounds like that might be stretching it a bit much
[11:57] * hoonetorg (~hoonetorg@77.119.226.254.static.drei.at) Quit (Ping timeout: 480 seconds)
[11:57] <monsted> depends on your performance needs
[11:58] <realitysandwich> i suppose I could change my config and put the drives into pairs of raid0 so rather than 6 I would have 3 per node
[11:59] <realitysandwich> monsted: this will be used as storage for an opennebula cluster
[12:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[12:01] <realitysandwich> I can't really predict how heavy the load will be right now, but the more robust I can make it the happier I will be to let it loose in the wild
[12:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[12:03] * Nanobot (~Shnaw@4MJAAFFBJ.tor-irc.dnsbl.oftc.net) Quit ()
[12:03] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) Quit (Read error: Connection reset by peer)
[12:03] * phyphor1 (~hgjhgjh@tor-exit.insane.us.to) has joined #ceph
[12:08] * hoonetorg (~hoonetorg@77.119.226.254.static.drei.at) has joined #ceph
[12:11] <Be-El> realitysandwich: if you want to build a raid-1 of ssds, you probably won't gain that much
[12:12] <Be-El> realitysandwich: if you take two drives of the same brand, they will wear out at the same pace
[12:13] <s3an2> realitysandwich, I would avoid the use of RAID with ceph
[12:18] * grauzikas (grauzikas@78-56-222-78.static.zebra.lt) Quit (Ping timeout: 480 seconds)
[12:22] <realitysandwich> ok, thanks for the tips
[12:28] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[12:31] * aj__ (~aj@tmo-111-172.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[12:33] * phyphor1 (~hgjhgjh@7V7AAE34P.tor-irc.dnsbl.oftc.net) Quit ()
[12:33] * Sliker (~FierceFor@tsn109-201-154-203.dyn.nltelcom.net) has joined #ceph
[12:35] * user_8464 (779dad4a@107.161.19.53) has joined #ceph
[12:36] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[12:44] * smatzek (~smatzek@216.38.34.186) has joined #ceph
[12:48] * shyu (~shyu@114.254.40.190) Quit (Ping timeout: 480 seconds)
[12:49] * bene (~bene@2601:18c:8501:41d0:ea2a:eaff:fe08:3c7a) has joined #ceph
[12:51] * lundmar (~lundmar@77.68.251.24) has joined #ceph
[12:53] * smatzek (~smatzek@216.38.34.186) Quit (Read error: Connection reset by peer)
[12:53] * user_8464 (779dad4a@107.161.19.53) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[12:55] * lundmar (~lundmar@77.68.251.24) has left #ceph
[12:55] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[12:56] * ade (~abradshaw@85.158.226.30) Quit (Ping timeout: 480 seconds)
[13:00] * user_8464 (779dad4a@107.161.19.53) has joined #ceph
[13:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[13:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[13:03] * Sliker (~FierceFor@tsn109-201-154-203.dyn.nltelcom.net) Quit ()
[13:04] * tallest_red (~clusterfu@hessel2.torservers.net) has joined #ceph
[13:04] * hoonetorg (~hoonetorg@77.119.226.254.static.drei.at) Quit (Read error: Connection reset by peer)
[13:08] * user_8464 (779dad4a@107.161.19.53) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[13:11] * user_8464 (779dad4a@107.161.19.109) has joined #ceph
[13:12] * ade (~abradshaw@85.158.226.30) has joined #ceph
[13:14] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[13:15] * user_8464 (779dad4a@107.161.19.109) Quit ()
[13:16] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) Quit (Quit: Leaving)
[13:16] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[13:20] * hoonetorg (~hoonetorg@77.119.226.254.static.drei.at) has joined #ceph
[13:21] * shylesh__ (~shylesh@121.244.87.118) Quit (Remote host closed the connection)
[13:21] * nardial (~ls@dslb-088-076-178-130.088.076.pools.vodafone-ip.de) Quit (Remote host closed the connection)
[13:29] * overclk (~quassel@121.244.87.117) Quit (Remote host closed the connection)
[13:30] * jlayton (~jlayton@2606:a000:1125:4074:c5:7ff:fe41:3227) Quit (Quit: ZNC 1.6.2 - http://znc.in)
[13:30] * wjw-freebsd (~wjw@176.74.240.1) has joined #ceph
[13:31] * jlayton (~jlayton@107.13.84.55) has joined #ceph
[13:33] * ogzy (~oguzy@94.55.116.66) has joined #ceph
[13:33] <ogzy> i am getting Setting system user ceph properties..usermod: user ceph is currently used by process 29001
[13:33] <ogzy> error and ceph-deploy is exiting with an error code while installing jewel on ubuntu 14.04, any idea?
[13:33] * tallest_red (~clusterfu@4MJAAFFEQ.tor-irc.dnsbl.oftc.net) Quit ()
[13:33] * rogst (~Guest1390@199.68.196.124) has joined #ceph
[13:33] <ogzy> this is where i got the error: ceph-deploy -v install cephmon-aa cephosd-ab cephosd-aa cephmon-ba cephosd-bb cephosd-ba
[13:34] * prallab (~prallab@216.207.42.137) Quit (Remote host closed the connection)
[13:34] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[13:34] * prallab (~prallab@216.207.42.137) has joined #ceph
[13:34] * prallab (~prallab@216.207.42.137) Quit (Remote host closed the connection)
[13:34] <ogzy> here is the error: http://pastebin.com/ENeL8R8G
[13:34] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:35] <ogzy> anybody got the same error
[13:35] <ogzy> ?
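That usermod message itself just means some process is still running as the ceph user while the package scripts try to modify the account; one way to investigate, using the pid from the error above (the upstart job name assumes the stock Ubuntu 14.04 ceph packages):

    ps -fp 29001          # see what is still running as the ceph user
    stop ceph-all         # stop ceph daemons on that host before re-running ceph-deploy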
[13:38] * dugravot61 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[13:38] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit (Read error: Connection reset by peer)
[13:39] * prallab (~prallab@216.207.42.137) has joined #ceph
[13:39] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[13:39] * dugravot61 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit ()
[13:41] * onyb (~ani07nov@112.133.232.10) has joined #ceph
[13:43] * user_8464 (779dd652@107.161.19.109) has joined #ceph
[13:47] * prallab (~prallab@216.207.42.137) Quit (Ping timeout: 480 seconds)
[13:47] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[13:48] * hommie (~hommie@2a00:ec8:404:1128:6dce:f9e3:4519:18e8) has joined #ceph
[13:50] * ade (~abradshaw@85.158.226.30) Quit (Quit: Too sexy for his shirt)
[13:50] * ade (~abradshaw@85.158.226.30) has joined #ceph
[13:50] * vanham (~vanham@199.59.96.208) has joined #ceph
[13:51] <hommie> guys, have any of you ever faced a situation where you end up with "broken" objects? my S3 client is able to list the contents of the bucket, but I cannot remove anything... even using radosgw-admin tool (object rm, or bucket rm w/ --purge-objects)
[13:55] * user_8464 (779dd652@107.161.19.109) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[13:57] * flisky (~Thunderbi@36.110.40.21) Quit (Ping timeout: 480 seconds)
[13:59] * allaok (~allaok@machine107.orange-labs.com) has left #ceph
[13:59] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit (Quit: Leaving.)
[13:59] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[14:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[14:01] * dugravot61 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[14:01] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit ()
[14:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[14:03] * rogst (~Guest1390@4MJAAFFGC.tor-irc.dnsbl.oftc.net) Quit ()
[14:03] * VampiricPadraig (~sese_@edwardsnowden0.torservers.net) has joined #ceph
[14:04] <TheSov> does anyone know how to create an OSD on jewel 10.2 on ubuntu? cuz other than standard OSD, i cannot specify my own journal or use bluestores, its just not working.
[14:07] * pdrakeweb (~pdrakeweb@cpe-65-185-74-239.neo.res.rr.com) has joined #ceph
[14:07] * shyu (~shyu@114.254.40.190) has joined #ceph
[14:07] * shyu (~shyu@114.254.40.190) Quit ()
[14:08] * ieth0 (~ieth0@user232.77-105-223.netatonce.net) has joined #ceph
[14:09] * dvanders (~dvanders@dvanders-pro.cern.ch) Quit (Remote host closed the connection)
[14:09] * shyu (~shyu@114.254.40.190) has joined #ceph
[14:11] <vanham> TheSov, I'm using Jewel 10.2.1 on Ubuntu 14.04 LTS, Kernel 4.4, with an external SSD journal and didn't do anything different. I haven't tried bluestore yet as it is not graded production ready yet. What's the problem you're having?
[14:12] <TheSov> vanham when i do a ceph-disk prepare /dev/sdc /dev/sdb1 it goes through the motions to create an osd, but then theres no mount point created and i cant mount it manually and get the osd to work
[14:12] <TheSov> something breaks
[14:13] <vanham> Oh, ok, I don't use ceph-disk. It's not that hard to do it the manual way...
[14:13] * rakeshgm (~rakesh@121.244.87.117) Quit (Quit: Leaving)
[14:13] <TheSov> whats the manual way?
[14:13] <vanham> And you will know exactly where the problem is when it happens
[14:13] <vanham> http://docs.ceph.com/docs/jewel/rados/operations/add-or-rm-osds/
[14:14] <vanham> I don't use any of the automation ceph tools: ceph-disk, ceph-deploy (blearg!)
[14:14] <TheSov> ok ill try that
[14:15] <TheSov> im just getting tired of purging and stuff because when i try to delete the nonfunctional osd's it says they dont exist, yet they are in the osd tree!
[14:16] <vanham> there is ceph osd rm and ceph osd crush rm
[14:16] <vanham> They are different things
[14:16] <Be-El> ceph-disk has a verbose (-v) flag...
[14:16] <TheSov> i run all of those vanham
[14:16] <vanham> Hey Be-El, how's is it going?
[14:16] <TheSov> ceph osd out osd.99, ceph osd crush remove osd.99, ceph auth del osd.99 i ran all of these
[14:16] <TheSov> every time it says the osd doesnt exist
[14:16] <TheSov> and it still shows in osd tree
[14:16] <vanham> TheSov, really? the ceph osd tree will show the osd? Damn...
[14:17] * dvanders (~dvanders@dvanders-pro.cern.ch) has joined #ceph
[14:17] <TheSov> thats what im saying
[14:17] <vanham> Be-El, solved my caching problem!
[14:17] <TheSov> its like jewel is fubar'd or something
[14:17] <vanham> Now new writes go only to SSD and later they might go to HD...
[14:17] <vanham> ceph osd pool set rbd_cache min_write_recency_for_promote 0
[14:17] <Be-El> TheSov: did you miss 'ceph osd rm X' ?
[14:19] <TheSov> yes be-el
[14:19] * bene (~bene@2601:18c:8501:41d0:ea2a:eaff:fe08:3c7a) Quit (Quit: Konversation terminated!)
[14:19] <Be-El> TheSov: and did you stop the running osd process?
[14:20] <TheSov> there wasnt one
[14:20] <TheSov> it never started
[14:20] <vanham> Something right is now right here...
[14:20] <vanham> Something right is not right here...
[14:20] <vanham> or there
[14:21] <vanham> TheSov, doesn't make sense
[14:21] <vanham> Does you ceph status from MON quorum?
[14:21] <vanham> *shows
[14:21] <vanham> *show
[14:21] <TheSov> not right now cuz i just purged everything
[14:21] <TheSov> gonna try manually
[14:22] <vanham> Ok
[14:23] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[14:27] * shyu (~shyu@114.254.40.190) Quit (Quit: Leaving)
[14:27] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[14:31] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) Quit (Quit: Ex-Chat)
[14:31] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[14:31] <TheSov> ok im back here monmap e1: 3 mons at {ceph-1=10.25.0.30:6789/0,ceph-2=10.25.0.31:6789/0,ceph-3=10.25.0.32:6789/0}
[14:31] <TheSov> election epoch 6, quorum 0,1,2 ceph-1,ceph-2,ceph-3
[14:32] <TheSov> now to try to create an OSD
[14:32] * fsimonce (~simon@host128-29-dynamic.250-95-r.retail.telecomitalia.it) has joined #ceph
[14:33] * VampiricPadraig (~sese_@06SAACWCZ.tor-irc.dnsbl.oftc.net) Quit ()
[14:34] * allen_gao (~allen_gao@58.213.72.214) has joined #ceph
[14:34] * Lokta (~Lokta@carbon.coe.int) Quit (Read error: Connection reset by peer)
[14:40] * inf_b (~o_O@dot1x-235-086.wlan.uni-giessen.de) has joined #ceph
[14:40] * inf_b (~o_O@dot1x-235-086.wlan.uni-giessen.de) Quit (Remote host closed the connection)
[14:41] * sudocat (~dibarra@2602:306:8bc7:4c50:4939:6c2d:ad83:18c8) has joined #ceph
[14:45] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[14:45] * dugravot61 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit (Read error: Connection reset by peer)
[14:48] * post-factum (~post-fact@vulcan.natalenko.name) Quit (Quit: ZNC 1.6.3 - http://znc.in)
[14:50] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[14:50] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[14:50] * rakeshgm (~rakesh@121.244.87.117) has joined #ceph
[14:50] * post-factum (~post-fact@vulcan.natalenko.name) has joined #ceph
[14:51] * dugravot61 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[14:51] * dugravot6 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit (Write error: connection closed)
[14:52] * dugravot62 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) has joined #ceph
[14:52] * dugravot61 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit (Read error: Connection reset by peer)
[14:53] * zhaochao (~zhaochao@124.202.191.132) Quit (Quit: ChatZilla 0.9.92 [Firefox 45.1.1/20160507231935])
[14:54] * ntpttr_ (~ntpttr@134.134.139.76) has joined #ceph
[14:57] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[14:57] * shyu (~shyu@114.254.40.190) has joined #ceph
[14:57] * sudocat (~dibarra@2602:306:8bc7:4c50:4939:6c2d:ad83:18c8) Quit (Ping timeout: 480 seconds)
[14:58] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) has joined #ceph
[14:58] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Ping timeout: 480 seconds)
[14:59] * rakeshgm (~rakesh@121.244.87.117) Quit (Remote host closed the connection)
[15:00] * shyu_ (~shyu@114.254.40.190) has joined #ceph
[15:00] <hommie> guys, have any of you ever faced a situation where you end up with "broken" objects? my S3 client is able to list the contents of the bucket, but I cannot remove anything... even using radosgw-admin tool (object rm, or bucket rm w/ --purge-objects)
[15:01] * dvanders_ (~dvanders@2001:1458:202:16b::102:124a) has joined #ceph
[15:02] * mhack (~mhack@nat-pool-bos-t.redhat.com) has joined #ceph
[15:03] * QuantumBeep (~nupanick@destiny.enn.lu) has joined #ceph
[15:04] * dvanders (~dvanders@dvanders-pro.cern.ch) Quit (Ping timeout: 480 seconds)
[15:06] * hesco (~hesco@2601:cb:c001:faea:d55c:b613:9d55:6b32) has joined #ceph
[15:08] * ntpttr_ (~ntpttr@134.134.139.76) Quit (Remote host closed the connection)
[15:08] <TheSov> vanham, u there?
[15:09] <vanham> yep yep
[15:09] * Hemanth (~hkumar_@121.244.87.117) Quit (Ping timeout: 480 seconds)
[15:09] * shyu (~shyu@114.254.40.190) has left #ceph
[15:10] * shyu (~shyu@114.254.40.190) has joined #ceph
[15:10] * rdias (~rdias@2001:8a0:749a:d01:1152:8501:c481:ebd) Quit (Ping timeout: 480 seconds)
[15:10] * hommie (~hommie@2a00:ec8:404:1128:6dce:f9e3:4519:18e8) Quit (Remote host closed the connection)
[15:10] <TheSov> getting stuck here ceph osd -i osd.0 --mkfs --mkkey
[15:10] <TheSov> i tried many combo's just 0
[15:11] <TheSov> the actual dir name
[15:11] <TheSov> etc
[15:11] <TheSov> i just keep getting "[Errno 2] No such file or directory: "
[15:11] * thomnico (~thomnico@2a01:e35:8b41:120:fd94:9eb8:f9e3:5c38) Quit (Ping timeout: 480 seconds)
[15:12] <Kvisle> TheSov: 'ceph-osd -i 0 --mkfs --mkkey --osd-uuid $uuid'
[15:12] <Kvisle> ?
[15:12] <Kvisle> it requires it to be mounted in /var/lib/ceph/osd/ceph-0/
[15:12] <TheSov> it is mounted there
[15:12] <TheSov> 1 moment
[15:13] <TheSov> ok that worked
[15:13] <Kvisle> TheSov: you can look at the script further down in this article http://www.redpill-linpro.com/sysadvent/2015/12/18/stateless-osd-servers.html
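Putting Kvisle's answer together with the mount requirement, the manual OSD bring-up being attempted looks roughly like this (a sketch of the standard manual-deployment steps; the id 0 and paths match the discussion above, the uuid handling is an assumption):

    # the data filesystem must already be mounted at /var/lib/ceph/osd/ceph-0
    UUID=$(uuidgen)
    ceph osd create $UUID                 # registers the OSD and returns its id (0 here)
    ceph-osd -i 0 --mkfs --mkkey --osd-uuid $UUID
    ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' \
        -i /var/lib/ceph/osd/ceph-0/keyring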
[15:14] <TheSov> i wonder why
[15:15] * dgurtner (~dgurtner@178.197.235.255) has joined #ceph
[15:15] * ntpttr_ (~ntpttr@134.134.139.77) has joined #ceph
[15:16] * Racpatel (~Racpatel@2601:87:3:3601::4edb) has joined #ceph
[15:17] * ntpttr_ (~ntpttr@134.134.139.77) Quit ()
[15:17] * huangjun (~kvirc@113.57.168.154) Quit (Ping timeout: 480 seconds)
[15:18] * HappyLoaf (~HappyLoaf@cpc93928-bolt16-2-0-cust133.10-3.cable.virginm.net) has joined #ceph
[15:18] * rdias (~rdias@2001:8a0:749a:d01:6533:6fe7:ca4a:71a8) has joined #ceph
[15:18] <Kvisle> we've later modified the code to support multiple clusters .. we abuse the filesystem label to assign cluster name to the osd :P
[15:19] * hesco (~hesco@2601:cb:c001:faea:d55c:b613:9d55:6b32) Quit (Ping timeout: 480 seconds)
[15:20] * prallab (~prallab@103.227.98.190) has joined #ceph
[15:21] * thomnico (~thomnico@2a01:e35:8b41:120:f97a:3e5d:4334:7d82) has joined #ceph
[15:23] * mattbenjamin1 (~mbenjamin@12.118.3.106) has joined #ceph
[15:23] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[15:27] * prallab_ (~prallab@103.227.98.190) has joined #ceph
[15:27] * prallab (~prallab@103.227.98.190) Quit (Read error: Connection reset by peer)
[15:28] * ira (~ira@nat-pool-bos-t.redhat.com) has joined #ceph
[15:32] <realitysandwich> anyone have tips on setting up failover for mounting a cephfs? I was thinking about setting up ucarp between my two monitor nodes so that if one goes down the ip will get picked up by the 2nd, but I haven't seen anything like this in the documentation and was wondering what the general practice is for something like this
[15:32] <Be-El> realitysandwich: the mons already provide failover
[15:32] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[15:32] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[15:32] <Be-El> realitysandwich: but with 2 mons you will never reach quorum if one mon fails
[15:33] <realitysandwich> how are they doing it? cause when you mount the fs it's with the ip of one of the monitors, so if it goes down how does the client know where to send requests?
[15:33] * prallab_ (~prallab@103.227.98.190) Quit (Read error: No route to host)
[15:33] * QuantumBeep (~nupanick@4MJAAFFKZ.tor-irc.dnsbl.oftc.net) Quit ()
[15:33] * Sliker (~Kristophe@tor-exit.squirrel.theremailer.net) has joined #ceph
[15:33] * dgurtner (~dgurtner@178.197.235.255) Quit (Ping timeout: 480 seconds)
[15:34] * prallab (~prallab@103.227.98.190) has joined #ceph
[15:36] * yanzheng (~zhyan@118.116.115.252) has joined #ceph
[15:42] * zenpac (~zenpac3@66.55.33.66) Quit (Quit: Leaving)
[15:43] * dgurtner (~dgurtner@178.197.235.255) has joined #ceph
[15:43] * Lokta (~Lokta@carbon.coe.int) has joined #ceph
[15:48] * prallab (~prallab@103.227.98.190) Quit (Remote host closed the connection)
[15:50] * squizzi (~squizzi@107.13.31.195) has joined #ceph
[15:55] * ade (~abradshaw@85.158.226.30) Quit (Ping timeout: 480 seconds)
[15:55] * bene (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[15:58] * dgurtner (~dgurtner@178.197.235.255) Quit (Ping timeout: 480 seconds)
[15:58] * zaitcev (~zaitcev@c-50-130-189-82.hsd1.nm.comcast.net) has joined #ceph
[16:00] * ade (~abradshaw@85.158.226.30) has joined #ceph
[16:02] * dgurtner (~dgurtner@178.197.235.255) has joined #ceph
[16:02] * lmb (~Lars@p578a6a91.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[16:03] * Sliker (~Kristophe@7V7AAE4EF.tor-irc.dnsbl.oftc.net) Quit ()
[16:03] * rcfighter (~Neon@tor.les.net) has joined #ceph
[16:06] <sep> realitysandwich, you can provide multiple monitor ip's to the mount command http://docs.ceph.com/docs/hammer/man/8/mount.ceph/
[16:06] * yanzheng (~zhyan@118.116.115.252) Quit (Quit: This computer has gone to sleep)
[16:08] * shyu_ (~shyu@114.254.40.190) Quit (Ping timeout: 480 seconds)
[16:08] <Be-El> realitysandwich: the first step every ceph client should perform is getting the current mon map from the configured monitors. afterwards it should only rely on that map (and also listen on updates). for redundancy for this initial step see sep's comment
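A minimal sketch of what sep and Be-El describe: passing several monitor addresses to the kernel client so the initial contact does not depend on a single mon (addresses and secret file are hypothetical):

    # any one reachable mon hands the client the current mon map,
    # so no floating IP / ucarp is needed for cephfs mounts
    mount -t ceph 10.0.0.1:6789,10.0.0.2:6789,10.0.0.3:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret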
[16:09] * prallab (~prallab@103.227.98.190) has joined #ceph
[16:09] * xarses_ (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:09] <realitysandwich> very nice, thanks again
[16:09] * LeaChim (~LeaChim@host86-176-96-249.range86-176.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[16:10] * onyb (~ani07nov@112.133.232.10) Quit (Ping timeout: 480 seconds)
[16:12] * dvanders (~dvanders@dvanders-pro.cern.ch) has joined #ceph
[16:14] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[16:15] * lmb (~Lars@2a02:8109:8100:1d2c:6c7b:166f:548a:e815) has joined #ceph
[16:15] * dvanders_ (~dvanders@2001:1458:202:16b::102:124a) Quit (Ping timeout: 480 seconds)
[16:17] * prallab (~prallab@103.227.98.190) Quit (Ping timeout: 480 seconds)
[16:18] * LeaChim (~LeaChim@host86-176-96-249.range86-176.btcentralplus.com) has joined #ceph
[16:19] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Remote host closed the connection)
[16:24] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[16:24] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Remote host closed the connection)
[16:25] * mhuang (~mhuang@119.254.120.72) Quit (Ping timeout: 480 seconds)
[16:26] * dgurtner (~dgurtner@178.197.235.255) Quit (Ping timeout: 480 seconds)
[16:31] * gregmark (~Adium@68.87.42.115) has joined #ceph
[16:32] * flisky (~Thunderbi@106.37.236.222) has joined #ceph
[16:33] * flisky (~Thunderbi@106.37.236.222) Quit ()
[16:33] * rcfighter (~Neon@06SAACWJO.tor-irc.dnsbl.oftc.net) Quit ()
[16:42] * xarses_ (~xarses@64.124.158.100) has joined #ceph
[16:45] * arcimboldo (~antonio@dhcp-y11-zi-s3it-130-60-34-019.uzh.ch) has joined #ceph
[16:49] * shyu (~shyu@114.254.40.190) Quit (Ping timeout: 480 seconds)
[16:51] * Brochacho (~alberto@c-73-45-127-198.hsd1.il.comcast.net) has joined #ceph
[16:52] * branto (~branto@178-253-128-218.3pp.slovanet.sk) Quit (Quit: Leaving.)
[16:52] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[16:53] * branto (~branto@178-253-128-218.3pp.slovanet.sk) has joined #ceph
[16:56] * rwheeler (~rwheeler@pool-173-48-195-215.bstnma.fios.verizon.net) Quit (Quit: Leaving)
[16:59] * rraja (~rraja@121.244.87.117) Quit (Quit: Leaving)
[16:59] * dgurtner (~dgurtner@178.197.235.255) has joined #ceph
[16:59] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[17:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[17:01] * debian112 (~bcolbert@24.126.201.64) has joined #ceph
[17:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[17:03] * CoMa (~Eric@06SAACWMP.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:03] * itamarl is now known as Guest1832
[17:04] * wushudoin (~wushudoin@2601:646:9580:a5e0:2ab2:bdff:fe0b:a6ee) has joined #ceph
[17:04] * itamarl (~itamarl@194.90.7.244) has joined #ceph
[17:04] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[17:04] * sudocat (~dibarra@104-188-116-197.lightspeed.hstntx.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[17:05] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[17:06] * joshd1 (~jdurgin@71-92-201-212.dhcp.gldl.ca.charter.com) has joined #ceph
[17:06] * prallab (~prallab@103.227.98.190) has joined #ceph
[17:06] * wushudoin (~wushudoin@2601:646:9580:a5e0:2ab2:bdff:fe0b:a6ee) Quit ()
[17:06] * Guest1832 (~itamarl@194.90.7.244) Quit (Ping timeout: 480 seconds)
[17:07] * wushudoin (~wushudoin@2601:646:9580:a5e0:2ab2:bdff:fe0b:a6ee) has joined #ceph
[17:08] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[17:10] * antongribok (~antongrib@216.207.42.140) has joined #ceph
[17:14] * prallab (~prallab@103.227.98.190) Quit (Ping timeout: 480 seconds)
[17:16] * CoMa (~Eric@06SAACWMP.tor-irc.dnsbl.oftc.net) Quit ()
[17:16] * Tetard_ is now known as Tetard
[17:16] * debian112 (~bcolbert@24.126.201.64) Quit (Quit: Leaving.)
[17:18] * debian112 (~bcolbert@24.126.201.64) has joined #ceph
[17:19] * debian112 (~bcolbert@24.126.201.64) has left #ceph
[17:20] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[17:20] * debian112 (~bcolbert@24.126.201.64) has joined #ceph
[17:22] * ade (~abradshaw@85.158.226.30) Quit (Ping timeout: 480 seconds)
[17:23] * Racpatel (~Racpatel@2601:87:3:3601::4edb) Quit (Quit: Leaving)
[17:23] * wjw-freebsd (~wjw@176.74.240.1) Quit (Ping timeout: 480 seconds)
[17:24] * Racpatel (~Racpatel@2601:87:3:3601::4edb) has joined #ceph
[17:25] * itamarl (~itamarl@194.90.7.244) Quit (Quit: itamarl)
[17:26] * Bonzaii1 (~Skyrider@4MJAAFFSA.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:31] * branto (~branto@178-253-128-218.3pp.slovanet.sk) Quit (Quit: Leaving.)
[17:34] * Mika_c (~Mika@36-229-239-114.dynamic-ip.hinet.net) has joined #ceph
[17:34] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[17:36] * thumpba (~thumbpa@45-19-50-56.lightspeed.austtx.sbcglobal.net) has joined #ceph
[17:36] <thumpba> how often should i have to reweight my ceph cluster
[17:36] * prallab (~prallab@103.227.98.190) has joined #ceph
[17:37] <thumpba> how often should i have to reweight my ceph osd's, they seem to hold data unevenly from time to time
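There is no fixed schedule for this; a common approach is to check utilization and only reweight when the spread gets large. A sketch, with a hypothetical osd id; reweight-by-utilization's argument is a percentage of the mean:

    ceph osd df                            # per-OSD utilization and variance
    ceph osd reweight-by-utilization 120   # only touch OSDs above 120% of average use
    ceph osd reweight 7 0.85               # or nudge a single OSD's override weight (0.0-1.0)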
[17:38] * Skaag (~lunix@65.200.54.234) has joined #ceph
[17:40] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[17:41] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[17:42] * ibravo (~ibravo@72.83.69.64) has joined #ceph
[17:43] * MentalRay (~MentalRay@107.171.161.165) has joined #ceph
[17:44] * prallab (~prallab@103.227.98.190) Quit (Ping timeout: 480 seconds)
[17:45] * shaunm (~shaunm@208.71.28.198) has joined #ceph
[17:47] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) Quit (Quit: Leaving.)
[17:48] * gregmark (~Adium@68.87.42.115) has joined #ceph
[17:49] * dgurtner (~dgurtner@178.197.235.255) Quit (Ping timeout: 480 seconds)
[17:51] * gregmark (~Adium@68.87.42.115) Quit ()
[17:52] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[17:53] * Mika_c (~Mika@36-229-239-114.dynamic-ip.hinet.net) Quit (Quit: Leaving)
[17:55] * dgurtner (~dgurtner@178.197.235.255) has joined #ceph
[17:55] * linuxkidd (~linuxkidd@242.sub-70-210-192.myvzw.com) has joined #ceph
[17:56] * Bonzaii1 (~Skyrider@4MJAAFFSA.tor-irc.dnsbl.oftc.net) Quit ()
[17:56] * AG_Clinton (~kalleeen@192.42.115.101) has joined #ceph
[17:58] <SamYaple> If I have two roots, can I know how much space is left in a single root for scaling purposes?
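One way to answer this, assuming each root has at least one pool whose CRUSH rule targets it: the MAX AVAIL column in ceph df is computed against the root that pool maps to, and ceph osd df tree shows usage grouped by the CRUSH hierarchy. A sketch, not the only way:

    ceph df            # per-pool MAX AVAIL reflects the CRUSH root the pool's rule targets
    ceph osd df tree   # per-OSD usage laid out along the CRUSH tree, root by root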
[18:00] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) has joined #ceph
[18:03] * scg (~zscg@valis.gnu.org) has joined #ceph
[18:04] <The_Ball> One of my OSDs is showing as being on the wrong host in "ceph osd tree", how can that happen?
[18:04] * linjan_ (~linjan@86.62.112.22) Quit (Ping timeout: 480 seconds)
[18:05] * PIP3 (~PIP3@ip-37-24-81-116.hsi14.unitymediagroup.de) has joined #ceph
[18:06] <ibravo> does ceph-ansible work with bluestore? I am getting an error when deploying
[18:07] <scg> is it recommended to use ceph with 2 physical nodes with 10 or more OSDs? I want to buy two servers with a bunch of SSDs
[18:08] <scg> This will be used for disk provisioning for VMs.
[18:08] <SamYaple> ibravo: it does have bluestore implemented yes
[18:08] * mykola (~Mikolaj@91.245.76.80) has joined #ceph
[18:08] * [1]lj (~liujun@111.202.176.44) has joined #ceph
[18:09] <SamYaple> scg: typically 3+ nodes are used but if you are only doing two copies of the objects, i dont see the problem
[18:09] <SamYaple> youll want 3 ceph-mons though for HA
[18:09] <SamYaple> if you dont care about HA, one also would work
[18:09] <ibravo> SamYaple: it fails on 'Make sure an osd scenario was chosen'
[18:09] <scg> SamYaple: I read that. I want only 2 object replicas. I will run 3 VMs for monitors.
[18:10] <SamYaple> ibravo: yea i broke that, sorry. or rather a change i made exposed the issue. we're working on a fix. for now you can skip that task, just ensure you have only one scenario set to true and you'll be fine
[18:10] * khyron (~khyron@fixed-190-159-187-190-159-75.iusacell.net) has joined #ceph
[18:11] <antongribok> ibravo: another workaround for that issue is to set one scenario to "true" and all others set explicitly to "false", this worked for me last week
[18:11] <scg> SamYaple: I did a test with 2 nodes, each one with 2 OSDs. When I intentionally crashed one of the nodes, my cluster stopped working. I am just using the default crush map and I don't have placement rules. So I guess my problem must be there.
[18:11] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:6828:63e8:7fe1:76b1) Quit (Ping timeout: 480 seconds)
[18:11] * wes_dillingham (~wes_dilli@140.247.242.44) has joined #ceph
[18:12] <wes_dillingham> I am unable to unprotect a protected snapshot (it had no children); when i attempt to do so, i get a core dump... is there any way to force this?
[18:12] <SamYaple> scg: thats not enough information to tell you what the issue is. logs and command outputs are needed
[18:13] * shaunm (~shaunm@208.71.28.198) Quit (Ping timeout: 480 seconds)
[18:13] <ibravo> antongribok: I took that approach and seems to be installing. Thanks
[18:13] <The_Ball> restarting the OSD fixed the issue btw
[18:13] * linjan_ (~linjan@86.62.112.22) has joined #ceph
[18:13] <antongribok> scg: just curious, did you check that you had changed the defaults, like "replicated size 3 min_size 2"
[18:14] <SamYaple> ibravo: antongribok issue should be fixed in master in a few days
[18:14] <antongribok> you can see that in the output of "ceph osd dump |grep pg_num"
[18:14] * lj (~liujun@111.202.176.44) Quit (Ping timeout: 480 seconds)
[18:14] * [1]lj is now known as lj
[18:14] * shaunm (~shaunm@208.71.28.198) has joined #ceph
[18:17] <scg> SamYaple: I changed that. ceph osd pool get poolname size = 2; ceph osd pool get poolname min_size = 1
[18:18] <scg> SamYaple: Actually, it is also in the clustername.conf file.
[18:19] * Kioob1 (~Kioob@ALyon-652-1-188-79.w109-213.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:19] * dgurtner (~dgurtner@178.197.235.255) Quit (Ping timeout: 480 seconds)
[18:20] <scg> SamYaple: I know that it can be many things. I just want to know if it has to work "out of the box" without tweaking something else.
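For reference, a sketch of checking and setting the values discussed above (the pool name "rbd" is just an example; the ceph.conf lines only affect pools created afterwards):

    ceph osd pool get rbd size
    ceph osd pool get rbd min_size
    ceph osd pool set rbd size 2
    ceph osd pool set rbd min_size 1
    # defaults for new pools, in ceph.conf:
    #   osd pool default size = 2
    #   osd pool default min_size = 1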
[18:24] <TMM> So, I have a pool of SSDs that I'd like to use as journals for my spinners
[18:24] <TMM> I've been trying various permutations of ceph-deploy to get this to work properly but ceph deploy really does not like what I'm trying to do it seems
[18:25] <TMM> I'm trying to get it to use an existing blockdevice as journal, but it always wants to partition it
[18:25] <TMM> which doesn't work so well, as the devices are logical volumes
[18:25] <TMM> so, then I just made one big filesystem and tried to use 'file' journals; that also failed because ceph-disk isn't symlinking to the files it creates
[18:26] <TMM> (sorry, not using ceph-deploy, I meant ceph-disk, I'm using that manually)
[18:26] * AG_Clinton (~kalleeen@7V7AAE4LY.tor-irc.dnsbl.oftc.net) Quit ()
[18:26] * Xa (~Jaska@tor-exit.15-cloud.fr) has joined #ceph
[18:26] <antongribok> TMM: just curious, are you trying to use LVM-Cache?
[18:27] <TMM> antongribok, no, I'm using lvm only to create 3gb chunks on my ssd pool
[18:27] <TMM> Which I'd then like the osds to use as journals
[18:28] <TMM> file journals actually didn't work at all with ceph-disk because of a missing 'return' so I'm suspecting this is not a very well tested feature :)
[18:29] <PIP3> Hey, I'm new to ceph and I'm having trouble getting my crush map to work. I'm getting "Error EINVAL: Failed crushmap test: *** Error in `crushtool': double free or corruption (out): 0x0000555fb901f400 ***" Here is the crush map I want to use: http://pastebin.com/hF6Gkee1
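A typical edit/test cycle for a custom CRUSH map, which usually surfaces errors like the one above before the map is injected (a sketch; the rule number and replica count are placeholders):

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt        # decompile, then edit crush.txt
    crushtool -c crush.txt -o crush.new        # recompile
    crushtool -i crush.new --test --show-statistics --rule 0 --num-rep 2
    ceph osd setcrushmap -i crush.new          # only after the test looks sane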
[18:29] * shylesh (~shylesh@45.124.226.242) has joined #ceph
[18:29] * TomasCZ (~TomasCZ@yes.tenlab.net) has joined #ceph
[18:30] <antongribok> TMM: interesting, I thought that there was a way to do this, but I haven't done this myself... I'm about to start testing lvm-cache, so I'm curious how you end up resolving this
[18:31] * vet-hd_ (~oftc-webi@strix.wnet.ua) has joined #ceph
[18:32] <vet-hd_> Hi everyone
[18:34] * xarses_ (~xarses@64.124.158.100) Quit (Remote host closed the connection)
[18:34] * xarses (~xarses@64.124.158.100) has joined #ceph
[18:35] * mhuang (~mhuang@117.114.129.62) has joined #ceph
[18:35] * dugravot62 (~dugravot6@dn-infra-04.lionnois.site.univ-lorraine.fr) Quit (Ping timeout: 480 seconds)
[18:37] * ledgr_ (~ledgr@88-119-196-104.static.zebra.lt) Quit (Remote host closed the connection)
[18:37] <scg> SamYaple: I am testing my cluster using 2 mon (yes, even). In this lab env, there is one mon in each node (again, I know that is not recommended). When I try to check the mon_status through ceph I can not get it. So I use ceph daemon "socket file of mon-0" mon_status and I can see "state": "probing". So the problem must be there. I understood that ceph should work with 1 mon.
[18:38] <TMM> antongribok, it appears that ceph-disk may be clever enough to create new parititions for each journal itself
[18:39] * branto (~branto@178-253-128-218.3pp.slovanet.sk) has joined #ceph
[18:39] * shaunm (~shaunm@208.71.28.198) Quit (Ping timeout: 480 seconds)
[18:39] <TMM> antongribok, I'm now having ceph-disk just work on the raw /dev/md device, and that appears to be giving the expected results
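For comparison, the two journal layouts TMM is weighing look roughly like this (a sketch; device paths are hypothetical, and the symlink variant is the usual manual workaround rather than something ceph-disk does for you):

    # let ceph-disk carve the journal partition out of the SSD itself:
    ceph-disk prepare /dev/sdb /dev/sdc
    # or point an existing OSD at a pre-made block device (e.g. an LV) by hand:
    ln -sf /dev/vg_ssd/journal-0 /var/lib/ceph/osd/ceph-0/journal
    ceph-osd -i 0 --mkjournal      # with the OSD stopped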
[18:40] <vet-hd_> Please help repair one PG on my ceph storage. The PG has no primary OSD (and no OSD set at all). # ceph pg map 3.1a6 returns: osdmap e9465 pg 3.1a6 (3.1a6) -> up [] acting []
[18:40] * jowilkin (~jowilkin@2601:644:4000:b0bf:ea2a:eaff:fe08:3f1d) has joined #ceph
[18:41] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[18:41] <vet-hd_> I have an old osdmap with no issue with pg 3.1a6. How can I update/insert the osdmap?
[18:42] * kefu (~kefu@114.92.122.74) Quit (Quit: My Mac has gone to sleep. ZZZzzz...)
[18:43] <vet-hd_> How can I set the OSD set for a PG? How can I set the primary OSD for a PG?
[18:44] <vet-hd_> Can anyone help me?
[18:46] * kefu (~kefu@183.193.181.153) has joined #ceph
[18:46] * kefu is now known as kefu|afk
[18:48] * joshd1 (~jdurgin@71-92-201-212.dhcp.gldl.ca.charter.com) Quit (Quit: Leaving.)
[18:49] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[18:51] <scg> can I remove a dead monitor from a multi-monitor cluster? I want to allow a multi-monitor cluster to work with 1 mon in case 2 out of 3 crashes.
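Roughly, yes: with quorum still present it is a single command, and with quorum lost the surviving monitor's map has to be edited offline. A sketch of the documented procedure; mon names are hypothetical:

    # quorum still present:
    ceph mon remove mon-2
    # quorum lost (e.g. 2 of 3 mons dead): edit the monmap on the survivor
    systemctl stop ceph-mon@mon-1
    ceph-mon -i mon-1 --extract-monmap /tmp/monmap
    monmaptool /tmp/monmap --rm mon-2 --rm mon-3
    ceph-mon -i mon-1 --inject-monmap /tmp/monmap
    systemctl start ceph-mon@mon-1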
[18:52] <realitysandwich> anyone know what an "average" write speed would be for VMs running on a ceph datastore?
[18:53] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) Quit (Quit: jdillaman)
[18:54] <wes_dillingham> realitysandwich: (the theoretical write speed of the hardware / replication level), minus a little bit
[18:55] <realitysandwich> hrm, right now I am only getting about 60mb/s
[18:56] <wes_dillingham> It all depends on the hardware, replication settings
[18:56] * Xa (~Jaska@4MJAAFFVR.tor-irc.dnsbl.oftc.net) Quit ()
[18:56] <realitysandwich> only replicating once, but I am running the journals on the same disks as the OSDs right now, but I have some SSDs on order
[18:56] * chopmann (~sirmonkey@2a02:8108:8b40:5600::2) has joined #ceph
[18:57] <wes_dillingham> networking between osds matters too
[18:57] <wes_dillingham> and between client - osd
[18:57] <realitysandwich> i have two 1gb bonded links between all of the machines
[18:58] <realitysandwich> I am getting slow requests though, so I think the network transfer is fine
[18:58] * MentalRay (~MentalRay@107.171.161.165) Quit (Quit: My Mac has gone to sleep. ZZZzzz...)
[18:59] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[18:59] * MentalRay (~MentalRay@107.171.161.165) has joined #ceph
[19:00] * ledgr (~ledgr@88-222-11-185.meganet.lt) has joined #ceph
[19:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[19:01] * haomaiwang (~haomaiwan@li401-170.members.linode.com) has joined #ceph
[19:01] <wes_dillingham> I have an rbd device which is in a weird state: I am unable to remove snapshots from it (core dumps) or create snapshots on it (again, core dump). I have the following features enabled: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling. other devices in the pool have no issues
[19:01] * kawa2014 (~kawa@89.184.114.246) Quit (Quit: Leaving)
[19:02] <dillaman> wes_dillingham: any chance you can run 'rbd' from within gdb, get it to crash, and provide the output from "thread apply all bt"?
[19:02] * linjan_ (~linjan@86.62.112.22) Quit (Ping timeout: 480 seconds)
[19:03] <wes_dillingham> dillaman: ill give it a shot
[19:03] <dillaman> wes_dillingham: you'll need the debug symbols installed, though
[19:03] <dillaman> wes_dillingham: if that's too much, perhaps just run rbd with "--debug-rbd=20" on the CLI and collect the generated logs
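The two capture options dillaman mentions, spelled out (pool, image and snapshot names are hypothetical):

    # with debug symbols installed, get a backtrace from the crash:
    gdb --args rbd snap unprotect mypool/myimage@mysnap
    (gdb) run
    (gdb) thread apply all bt
    # or just collect verbose client-side logs without gdb:
    rbd snap unprotect mypool/myimage@mysnap --debug-rbd=20 --log-file=/tmp/rbd.log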
[19:05] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[19:06] <wes_dillingham> dillaman: get it to you in 1 sec
[19:07] * mhack (~mhack@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[19:08] <s3an2> Anyone know if ganesha supports cephfs client quotas
[19:09] * haomaiwang (~haomaiwan@li401-170.members.linode.com) Quit (Remote host closed the connection)
[19:10] * karnan (~karnan@106.51.131.247) has joined #ceph
[19:13] <ibravo> In ceph-ansible, I'm stuck at collect admin and bootstrap keys. The command 'ceph-create-keys --cluster ceph --id hostname' results in 'ceph-mon is not in quorum: u'probing'
[19:13] * mhuang (~mhuang@117.114.129.62) Quit (Quit: This computer has gone to sleep)
[19:13] * vata (~vata@cable-21.246.173-197.electronicbox.net) Quit (Quit: Leaving.)
[19:14] * bara (~bara@nat-pool-brq-t.redhat.com) Quit (Quit: Bye guys!)
[19:15] * realitysandwich (~perry@213.61.152.126) Quit (Ping timeout: 480 seconds)
[19:15] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[19:16] * pabluk is now known as pabluk_
[19:17] * mhack (~mhack@nat-pool-bos-u.redhat.com) has joined #ceph
[19:22] <ibravo> It seems to be a similar issue than https://github.com/ceph/ceph-ansible/issues/480
[19:26] * Scrin (~Zeis@watchme.tor-exit.network) has joined #ceph
[19:28] * thumpba (~thumbpa@45-19-50-56.lightspeed.austtx.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[19:28] * thumpba (~thumbpa@2602:302:d133:2380:c04:d78a:b7d6:6464) has joined #ceph
[19:29] * rakeshgm (~rakesh@106.51.29.57) has joined #ceph
[19:30] * wer (~wer@216.197.66.226) has joined #ceph
[19:31] * shylesh (~shylesh@45.124.226.242) Quit (Remote host closed the connection)
[19:34] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[19:34] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Remote host closed the connection)
[19:34] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[19:35] * guerby (~guerby@ip165.tetaneutral.net) Quit (Ping timeout: 480 seconds)
[19:36] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[19:37] * overclk (~quassel@117.202.96.4) has joined #ceph
[19:37] * guerby (~guerby@ip165.tetaneutral.net) has joined #ceph
[19:38] * kefu|afk (~kefu@183.193.181.153) Quit (Quit: My Mac has gone to sleep. ZZZzzz...)
[19:40] * rakeshgm (~rakesh@106.51.29.57) Quit (Quit: Leaving)
[19:42] * tnarg (~oftc-webi@vpn.uberatc.com) has joined #ceph
[19:42] * kevinc (~kevinc__@client65-32.sdsc.edu) has joined #ceph
[19:42] * realitysandwich (~perry@b2b-78-94-59-114.unitymedia.biz) has joined #ceph
[19:42] <tnarg> I'm looking for advice on practical upper bounds when scaling ceph
[19:43] <Aeso> tnarg, upper bounds of what metric?
[19:43] <tnarg> # of OSDs in a cluster for example
[19:43] <cathode> i don't think anyone has reached limits like that yet
[19:43] <cathode> yahoo has a 500PiB ceph cluster last i heard
[19:43] <tnarg> all in a single cluster?
[19:43] * ircolle (~Adium@c-71-229-136-109.hsd1.co.comcast.net) has joined #ceph
[19:44] * TMM (~hp@185.5.122.2) Quit (Quit: Ex-Chat)
[19:44] <cathode> afaik, yes
[19:44] <tnarg> I wasn't sure if that meant that clients would need to hold tons of metadata about the cluster
[19:45] * shaunm (~shaunm@208.71.28.198) has joined #ceph
[19:46] <tnarg> do clients get a full copy of the cluster state from the monitors?
[19:46] * rakeshgm (~rakesh@106.51.29.57) has joined #ceph
[19:47] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[19:47] <Aeso> tnarg, clients use the CRUSH map to read and write data (which scales with the number of OSDs, but is still ultimately just a hashed lookup table)
[19:47] <vanham> Folks, about RadosGW, what kind of performance are you getting? I'm writing small files straight to SSDs and I'm only getting about 130-150 PUTS per second.
[19:47] <vanham> files are 20k-30k on average
[19:48] * branto (~branto@178-253-128-218.3pp.slovanet.sk) Quit (Quit: Leaving.)
[19:48] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[19:48] <tnarg> Aeso, okay, so N bytes per OSD where N is < 100?
[19:48] <vanham> When I do a rados-bench (ohhh old!) for 32k files, it will give me the 150 PUTS/s mark.
[19:50] * davidz (~davidz@2605:e000:1313:8003:f165:1d2b:ae5:b924) has joined #ceph
[19:51] <Aeso> tnarg, something in that ballpark. Each object has a 'name', and the CRUSH map is a big list of PGs, where they are, and what range of names they're responsible for afaik
[19:51] <kevinc> we are upgrading our ceph cluster from hammer to jewel. we ran the ceph-deploy command to update ceph on one of the monitors and changed the permissions on /var/lib/ceph, but systemctl start ceph.target doesn't start the monitor. any suggestions?
[19:52] <Aeso> kevinc, do you get any errors or logs when you try to start the monitors?
[19:52] <tnarg> In Sage Weil's AMA on reddit he was asked "What is the largest size of ceph cluster you've seen so far in production today?" and his answer was "The largest I've worked with was ~1300 OSDs"
[19:52] <tnarg> that gave me pause
[19:53] <cathode> keep in mind you can reduce OSD count by making each OSD larger
[19:53] <kevinc> aeso: no errors when i run the command, nothing in /var/log/ceph and nothing in journalctl
[19:53] <tnarg> <cathode> true, although you increase your failure domain
[19:53] <cathode> i've read that some people create really large OSDs, for example from software raid
[19:53] <cathode> or zfs underneath ceph
[19:54] <neurodrone> kevinc: Upping your debug levels for objecter or filestores might help.
[19:54] <neurodrone> You might need to chown ceph:ceph all the way upto the disks.
[19:54] <kevinc> aeso: systemctl status ceph-mon says Loaded: not-found (Reason: no such file or directory)
[19:55] <Aeso> tnarg, I think the largest publicly documented tests I've seen are from CERN, 7200 OSDs. They had some setup issues but once the cluster was up and running it was smooth sailing.
[19:55] <cathode> so if you have four racks of OSD servers, each server might have a bunch of JBOD attached disk shelves but might only present 12 OSDs or something because they'd use zfs to manage the actual disks
[19:55] <neurodrone> Try manually running ceph-mon or ceph-osd in the foreground after upping the log levels.
[19:55] <tnarg> I'm looking at provisioning a cluster with 16800 OSD disks (not including journal disks)
[19:56] * Scrin (~Zeis@4MJAAFFY2.tor-irc.dnsbl.oftc.net) Quit ()
[19:56] <neurodrone> Wow, what's the CPU:OSD ratio you are planning to have?
[19:56] <neurodrone> And by "OSD" I don't mean disks, more like running daemons.
[19:57] <tnarg> 24 disks per chassis, 20 disks for OSD, 4 disks RAID10 for journals, 2x10 core CPUs per chassis, 128GB RAM per chassis
[19:59] <cathode> tnarg - you'd have to try that and see i guess... and maybe try some different config arrangements
[19:59] <neurodrone> If you are planning to run OSDs on individual disks you might want to benchmark performance specifically with that number of OSDs on each box. Ceph works well with a 3:1 or 4:1 CPU core:OSD ratio.
[19:59] <kevinc> running ceph-mon as the ceph user returns the error: monitor data directory at '/var/lib/ceph/mon/ceph-admin' does not exist have you run mkfs? we do have a /var/lib/ceph/mon/ceph-ceph-mon/
[19:59] <neurodrone> And that is without any sort of disk encryption, which will involve further cpu cycles.
[19:59] <kevinc> should we rename that directory?
[20:00] <neurodrone> kevinc: Are your MON nodes admin nodes too? Those should ideally be separate.
[20:00] <johnavp1989> Quick question regarding Ceph behavior. If a cluster is low on space and an OSD fails and there's not enough space to recover that OSD what does Ceph do? Will it try to recover anyway or will it just wait in a degraded state until the OSD is replaced?
[20:00] <neurodrone> tnarg: Also any specific reason not to colocate journals? By disks I assume you mean HDDs?
[20:01] <kevinc> neurodrone: not sure what you are asking, but the monitors only run the mon service; we have separate nodes for management and osds
[20:01] * reed (~reed@75-101-54-18.dsl.static.fusionbroadband.com) has joined #ceph
[20:01] <neurodrone> johnavp1989: Ceph won't recover if it can't. If you have determined the cause to be space, you will need to add capacity or reduce the number of replicas or something.
[20:02] <Aeso> tnarg, if I were you I'd talk to inktank/redhat to see if they have any success stories that aren't well-known to the public. I suspect they've probably had projects on that scale but I don't know of any that are well documented publicly
[20:02] * ledgr (~ledgr@88-222-11-185.meganet.lt) Quit (Remote host closed the connection)
[20:02] <neurodrone> kevinc: `/var/lib/ceph/mon/ceph-admin` confuses me. Your ceph-<mon-id> directory should be enough. Have you run the ceph-mon command manually?
[20:02] * ledgr (~ledgr@88-222-11-185.meganet.lt) has joined #ceph
[20:03] <johnavp1989> neurodrone: Great thank you. Do you know if there's any recommendations for available space? Something we know not to exceed
[20:03] <neurodrone> I think if you have your alerts set up you will be notified in time. We have tuned our ceph settings down to 0.75 nearfull and 0.85 full.
[20:04] <neurodrone> https://github.com/ceph/ceph/blob/jewel/src/common/config_opts.h#L262-L263 attributes I mean.
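Concretely, the ratios being referred to look like this (a sketch; the runtime commands shown are the pre-luminous, jewel-era ones and are stated here as an assumption):

    # ceph.conf, [global] or [mon]:
    #   mon osd nearfull ratio = 0.75
    #   mon osd full ratio = 0.85
    # applied at runtime on a jewel-era cluster (assumption):
    ceph pg set_nearfull_ratio 0.75
    ceph pg set_full_ratio 0.85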
[20:04] * mhackett (~mhack@nat-pool-bos-t.redhat.com) has joined #ceph
[20:04] <johnavp1989> neurodrone: Thank you!
[20:04] <kevinc> neurodrone: when we run ceph-mon the error is: monitor data directory at '/var/lib/ceph/mon/ceph-admin' does not exist: have you run 'mkfs'?
[20:05] <neurodrone> What parameters do you pass to the command?
[20:05] <neurodrone> I assume you just pass the id and public addr?
[20:05] * Vale (~Arfed@exit1.torproxy.org) has joined #ceph
[20:07] * mhack (~mhack@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[20:08] <ibravo> I'm having issues with ceph-ansible deploying a monitor: Check the error at 14:00:27 on https://paste.fedoraproject.org/370076/14640268/
[20:11] <tnarg> neurodrone: 3:1 core:osd?
[20:11] <neurodrone> tnarg: 3 or 4 cores per OSD is a sweet spot.
[20:11] <tnarg> wow, okay
[20:11] <tnarg> more than I expected
[20:12] <neurodrone> Intel did publish this last year did they not? A sec.
[20:12] <neurodrone> Oh, but that was with SSDs I think. I may have said that before getting an answer to my question.
[20:12] <tnarg> I'm assuming that is thread, not necessarily physical core
[20:13] <neurodrone> "Also any specific reason not to colocate journals? By disks I assume you mean HDDs?"
[20:13] <neurodrone> That's physical cores.
[20:13] * ledgr (~ledgr@88-222-11-185.meganet.lt) Quit (Remote host closed the connection)
[20:13] * rakeshgm (~rakesh@106.51.29.57) Quit (Remote host closed the connection)
[20:13] <neurodrone> We have seen similar utilization on our cluster, and we use encryption as well so that adds.
[20:14] <neurodrone> Also if you are not saturating your disk, you might need to look into multiple OSDs per disk.
[20:15] <neurodrone> That's preferable to increasing the threads per shard.
[20:15] <neurodrone> Assuming you are using hammer++.
[20:18] * ben1 (ben@pearl.meh.net.nz) Quit (Ping timeout: 480 seconds)
[20:18] <tnarg> what kind of memory footprint does an OSD need?
[20:20] * arcimboldo (~antonio@dhcp-y11-zi-s3it-130-60-34-019.uzh.ch) Quit (Ping timeout: 480 seconds)
[20:21] <neurodrone> I have seen an OSD range from 2-10G.
[20:26] <sage> tnarg: i'm torturing a 5700 osd cluster right now
[20:26] <kevinc> We upgraded one of our monitors from Hammer to Jewel and now the server thinks its name is mon.admin instead of mon.ceph-mon-3, any ideas?
[20:27] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[20:27] <kevinc> we upgraded using ceph-deploy
[20:31] <kevinc> if we run ceph-mon manually and set the id to the server name, it works
[20:32] * overclk (~quassel@117.202.96.4) Quit (Remote host closed the connection)
[20:35] * rwmjones (~rwmjones@230.83.187.81.in-addr.arpa) Quit (Ping timeout: 480 seconds)
[20:35] * Vale (~Arfed@4MJAAFF0T.tor-irc.dnsbl.oftc.net) Quit ()
[20:35] <tnarg> @sage: any more details you can share about the setup?
[20:35] * rwmjones (~rwmjones@230.83.187.81.in-addr.arpa) has joined #ceph
[20:37] <sage> it's behaving reasonably well so far with the exception of an asyncmsgr bug i'm tracking down. not done with testing tho
[20:39] * thomnico (~thomnico@2a01:e35:8b41:120:f97a:3e5d:4334:7d82) Quit (Quit: Ex-Chat)
[20:40] * onyb (~ani07nov@112.133.232.10) has joined #ceph
[20:44] * MentalRay (~MentalRay@107.171.161.165) Quit (Quit: My Mac has gone to sleep. ZZZzzz...)
[20:44] <vet-hd_> Hello. I have a problem with a pg on my ceph cluster - the PG is not placed on any OSD.
[20:45] <vet-hd_> How can I manually or automatically set OSDs for this PG?
[20:47] <vet-hd_> I found the PG dir on 3 OSDs and backed up its data. How do I insert it into the cluster again?
[20:48] <vet-hd_> Can anyone help?
[20:49] * nils__ (~nils_@port-5958.pppoe.wtnet.de) Quit (Quit: Leaving)
[20:51] <vet-hd_> I think one way is to modify the osdmap file. Is there a tool for editing the osdmap?
[20:51] * linjan_ (~linjan@176.195.66.84) has joined #ceph
[20:51] <kevinc> Any idea why systemctl start ceph.target fails to start the ceph monitor but systemctl start ceph-mon@ceph-mon-3 works?
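One common cause: ceph.target only pulls in unit instances that have been enabled, which after a hammer-to-jewel upgrade often has not happened yet. A sketch of wiring it up, using the mon id from the line above:

    systemctl enable ceph-mon.target
    systemctl enable ceph-mon@ceph-mon-3
    systemctl start ceph.target
    systemctl status ceph-mon@ceph-mon-3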
[20:52] <scg> is it mandatory or safer to configure the placement rule in a cluster inside the same rack where all the disks have the same size?
[20:55] * ledgr (~ledgr@88-222-11-185.meganet.lt) has joined #ceph
[20:55] * MentalRay (~MentalRay@107.171.161.165) has joined #ceph
[20:57] * ledgr (~ledgr@88-222-11-185.meganet.lt) Quit (Remote host closed the connection)
[20:57] * chopmann (~sirmonkey@2a02:8108:8b40:5600::2) Quit (Ping timeout: 480 seconds)
[20:57] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[20:57] * ade (~abradshaw@91.118.86.66) has joined #ceph
[20:58] * arcimboldo (~antonio@84-75-174-248.dclient.hispeed.ch) has joined #ceph
[21:00] * pabluk_ is now known as pabluk
[21:06] * Hemanth (~hkumar_@103.228.221.150) has joined #ceph
[21:07] * PIP3 (~PIP3@ip-37-24-81-116.hsi14.unitymediagroup.de) Quit (Quit: Leaving)
[21:07] * Hemanth (~hkumar_@103.228.221.150) Quit ()
[21:10] * karnan (~karnan@106.51.131.247) Quit (Quit: Leaving)
[21:15] <BranchPredictor> 5700 osds... that's one huge cluster.
[21:16] * linjan_ (~linjan@176.195.66.84) Quit (Ping timeout: 480 seconds)
[21:17] * shaunm (~shaunm@208.71.28.198) Quit (Ping timeout: 480 seconds)
[21:23] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving)
[21:24] * linjan_ (~linjan@176.195.66.84) has joined #ceph
[21:27] * ira (~ira@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving)
[21:28] * kevinc (~kevinc__@client65-32.sdsc.edu) Quit (Quit: Leaving)
[21:29] * richi_ (~richi@213.188.59.132) has joined #ceph
[21:29] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) has joined #ceph
[21:36] * ngoswami (~ngoswami@121.244.87.116) Quit (Quit: Leaving)
[21:38] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Remote host closed the connection)
[21:38] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[21:39] * evelu (~erwan@46.231.131.178) Quit (Ping timeout: 480 seconds)
[21:42] * antongribok (~antongrib@216.207.42.140) Quit (Quit: Leaving...)
[21:42] * davidzlap (~Adium@2605:e000:1313:8003:24e8:f7b6:bedf:1694) Quit (Quit: Leaving.)
[21:44] <ibravo> I can't deploy with ceph-ansible. Error at ceps-create-keys --cluster ceph --id monitor name. Ideas?
[21:44] <ibravo> ceph-create-keys (spell check)
[21:51] * ade (~abradshaw@91.118.86.66) Quit (Quit: Too sexy for his shirt)
[21:52] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Remote host closed the connection)
[21:53] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[21:54] * richi_ (~richi@213.188.59.132) has left #ceph
[21:54] * vbellur (~vijay@2601:18f:700:55b0:5e51:4fff:fee8:6a5c) has joined #ceph
[22:00] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[22:04] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Remote host closed the connection)
[22:05] * rwheeler (~rwheeler@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving)
[22:07] * sickology (~mio@vpn.bcs.hr) Quit (Read error: Connection reset by peer)
[22:07] * sickology (~mio@vpn.bcs.hr) has joined #ceph
[22:10] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[22:11] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[22:12] * ledgr (~ledgr@88-222-11-185.meganet.lt) has joined #ceph
[22:13] * pabluk is now known as pabluk_
[22:14] * Random (~skney@4MJAAFF6F.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:23] <vanham> Guys, of the ceph osd thread types (journal_write, filestore_sync, tp_osd_tp, osd_srv_agent, ms_pipe_write, etc), which ones take care of LevelDB changes on Jewel?
[22:23] * MentalRay (~MentalRay@107.171.161.165) Quit (Quit: My Mac has gone to sleep. ZZZzzz...)
[22:24] <vanham> I'm trying to understand why my RadosGW is so slow right now.
[22:25] <vanham> The SSDs are at almost 100% utilization, while I'm writing very, very little
[22:27] <vanham> I'm writing about 5MB/s of objects while my SSDs are doing much more than that. Also hundreds/thousands of R/W IOPS in iostat while I'm only able to PUT fewer than 40 objects per second right now.
[22:27] <vanham> My main bucket has about 500k objects right now.
[22:27] * mykola (~Mikolaj@91.245.76.80) Quit (Remote host closed the connection)
[22:28] <vanham> But, even when using an empty bucket for testing, since we are continuously writing to the big one, performance is suffering
[22:32] * vata (~vata@cable-21.246.173-197.electronicbox.net) has joined #ceph
[22:38] * wes_dillingham (~wes_dilli@140.247.242.44) Quit (Ping timeout: 480 seconds)
[22:42] * ibravo (~ibravo@72.83.69.64) Quit (Quit: This computer has gone to sleep)
[22:44] * Random (~skney@4MJAAFF6F.tor-irc.dnsbl.oftc.net) Quit ()
[22:44] * Pirate (~offer@tor2r.ins.tor.net.eu.org) has joined #ceph
[22:47] * ibravo (~ibravo@72.83.69.64) has joined #ceph
[22:48] * ledgr (~ledgr@88-222-11-185.meganet.lt) Quit (Remote host closed the connection)
[22:50] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[22:51] * jowilkin (~jowilkin@2601:644:4000:b0bf:ea2a:eaff:fe08:3f1d) Quit (Ping timeout: 480 seconds)
[22:52] <TMM> hi all, I've just installed a ceph jewel cluster on my new test hardware and I'm seeing some strange behavior. When doing a rados bench from 8 clients with 60 operations each, I see a pretty nice throughput of about 1.2GB/s into my cluster. But it pauses for long periods of time while cpu load on the osd machines plummets
[22:52] <TMM> Am I seeing some cache flushes happening perhaps?
[22:52] <TMM> the setup is a bunch of spinners with some ssds as journals
[22:53] <TMM> I see cpu load on the osds plummet from about 10% per osd process to like 0.3 and then all writes to the cluster stop
[22:56] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[22:57] * rendar (~I@host30-183-dynamic.1-87-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[22:57] <TMM> I actually see a bunch of spinners hit 80 - 100% utilization during these stalls
[22:58] <TMM> perhaps my journals are a bit too fast/optimistically sized
[22:58] * scg (~zscg@valis.gnu.org) Quit (Quit: Leaving)
[22:58] * scg (~zscg@valis.gnu.org) has joined #ceph
[22:58] * scg (~zscg@valis.gnu.org) Quit ()
[22:58] * scg (~zscg@valis.gnu.org) has joined #ceph
[22:59] <TMM> hmm, the 90 - 100% utilization is actually during the time when the cluster is performing fine
[22:59] <TMM> it goes way down when there are 0writes
[22:59] * vanham (~vanham@199.59.96.208) Quit (Ping timeout: 480 seconds)
[22:59] * ibravo (~ibravo@72.83.69.64) Quit (Quit: Leaving)
[23:00] <TMM> So, yeah, everything just stops including disk writes, lovely
[23:01] <vet-hd_> Hello. I have a problem with a pg on my ceph cluster - the PG is not placed on any OSD.
[23:01] <vet-hd_> I found the PG dir on 3 OSDs and backed up its data. How do I insert it into the cluster again?
[23:01] <vet-hd_> Can anyone help?
[23:01] <vet-hd_> I think one way is to modify the osdmap file. Is there a tool for editing the osdmap?
[23:04] * scg (~zscg@valis.gnu.org) Quit (Quit: Leaving)
[23:04] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) Quit (Ping timeout: 480 seconds)
[23:06] <TMM> vet-hd_, did you use ceph-objectstore-tool to export the data from the pg?
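For reference, exporting a PG from a stopped OSD and importing it elsewhere with ceph-objectstore-tool looks roughly like this (a sketch; the osd ids and paths are placeholders, and the pgid 3.1a6 is taken from the earlier discussion):

    # on an OSD that still has a copy (the OSD must be stopped):
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --pgid 3.1a6 --op export --file /tmp/pg.3.1a6.export
    # on the OSD that should receive it (also stopped):
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
        --journal-path /var/lib/ceph/osd/ceph-5/journal \
        --op import --file /tmp/pg.3.1a6.export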
[23:11] * MentalRay (~MentalRay@107.171.161.165) has joined #ceph
[23:11] * ircolle (~Adium@c-71-229-136-109.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[23:14] * Pirate (~offer@4MJAAFF7R.tor-irc.dnsbl.oftc.net) Quit ()
[23:14] * zapu (~darks@7V7AAE42J.tor-irc.dnsbl.oftc.net) has joined #ceph
[23:21] * bene (~bene@nat-pool-bos-t.redhat.com) Quit (Quit: Konversation terminated!)
[23:23] * rendar (~I@host30-183-dynamic.1-87-r.retail.telecomitalia.it) has joined #ceph
[23:25] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) Quit (Quit: Ex-Chat)
[23:26] * dgurtner (~dgurtner@178.197.239.195) has joined #ceph
[23:27] * thumpba (~thumbpa@2602:302:d133:2380:c04:d78a:b7d6:6464) Quit (Remote host closed the connection)
[23:29] * thumpba (~thumbpa@45-19-50-56.lightspeed.austtx.sbcglobal.net) has joined #ceph
[23:29] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[23:31] * garphy is now known as garphy`aw
[23:37] * thumpba (~thumbpa@45-19-50-56.lightspeed.austtx.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[23:37] * bvi (~Bastiaan@185.56.32.1) Quit (Ping timeout: 480 seconds)
[23:41] * johnavp1989 (~jpetrini@8.39.115.8) Quit (Ping timeout: 480 seconds)
[23:44] * zapu (~darks@7V7AAE42J.tor-irc.dnsbl.oftc.net) Quit ()
[23:44] * Zombiekiller (~cooey@50.7.151.127) has joined #ceph
[23:46] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:48d6:c3b0:afc:a88e) has joined #ceph
[23:54] * Brochacho (~alberto@c-73-45-127-198.hsd1.il.comcast.net) Quit (Quit: Brochacho)
[23:56] * rwheeler (~rwheeler@pool-173-48-195-215.bstnma.fios.verizon.net) has joined #ceph
[23:57] * MentalRay (~MentalRay@107.171.161.165) Quit (Quit: My Mac has gone to sleep. ZZZzzz...)
[23:59] * Lokta (~Lokta@carbon.coe.int) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.