#ceph IRC Log


IRC Log for 2013-05-15

Timestamps are in GMT/BST.

[0:00] * fridudad_ (~oftc-webi@p4FC2DD1A.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[0:02] * BillK (~BillK@124-169-186-145.dyn.iinet.net.au) has joined #ceph
[0:04] * aliguori (~anthony@32.97.110.51) Quit (Remote host closed the connection)
[0:15] * themgt (~themgt@24-177-232-33.dhcp.gnvl.sc.charter.com) has joined #ceph
[0:18] <dwt> hi forgive my ignorance, why does this have to be at the TOP of the file? "For example, on Ubuntu, add env CEPH_ARGS="--id volumes" to the top of /etc/init/cinder-volume.conf."
[0:19] * vata (~vata@2607:fad8:4:6:6c9f:efe8:22e5:53c4) Quit (Quit: Leaving.)
[0:23] * psiekl (psiekl@wombat.eu.org) Quit (Ping timeout: 480 seconds)
[0:24] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:27] * psiekl (psiekl@wombat.eu.org) has joined #ceph
[0:32] <dmick> dwt: it probably doesn't. Just needs to be present before any ceph commands are called, likely
[0:32] <dwt> that's what I am thinking. I'll test that theory shortly :D
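A minimal sketch of the stanza under discussion, assuming a stock Ubuntu upstart job for cinder-volume (the exec line and paths are illustrative assumptions); upstart reads env stanzas anywhere in the job file, which fits dmick's guess that only "before any ceph commands are called" actually matters:

    # /etc/init/cinder-volume.conf (excerpt; surrounding stanzas are assumptions)
    description "Cinder volume service"
    env CEPH_ARGS="--id volumes"
    exec /usr/bin/cinder-volume --config-file=/etc/cinder/cinder.conf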
[0:38] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[0:39] * drokita1 (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[0:41] * alex__ (~chatzilla@d24-141-198-231.home.cgocable.net) Quit (Ping timeout: 480 seconds)
[0:42] * danieagle (~Daniel@186.214.76.12) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[0:44] * tnt (~tnt@91.176.27.204) Quit (Ping timeout: 480 seconds)
[0:45] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[0:49] <dwt> I'm trying to write the puppet-cinder driver for ceph. Either way it's forcing me to modify stdlib (I'm totally OCD)
[0:49] * portante|ltp (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[0:52] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[0:58] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[1:08] * rustam (~rustam@94.15.91.30) has joined #ceph
[1:25] * kyle__ (~kyle@216.183.64.10) Quit (Quit: Leaving)
[1:29] * themgt (~themgt@24-177-232-33.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[1:30] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[1:33] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[1:39] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:39] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:40] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[1:45] * themgt (~themgt@24-177-232-33.dhcp.gnvl.sc.charter.com) has joined #ceph
[1:48] <sagewk> sjust: patch looks good
[1:48] <sjust> sagewk: I'll test it overnight
[1:48] <sagewk> k
[1:49] <sjust> we might want to add an aio/dio dimension to the test suites
[1:49] <sjust> is there a way to do that without doubling the number of runs?
[1:52] <cjh_> with the ceph rados restful api i get access denied when i try to use it. is there something special i need to setup to access it?
[1:54] <cjh_> maybe some simple curl examples on the wiki would help
[2:24] * berant (~blemmenes@24-236-240-247.dhcp.trcy.mi.charter.com) has joined #ceph
[2:24] <cjh_> should we be using ceph-deploy-release-1.0 or the 0.1-31.{number}
[2:25] * gregaf1 (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit (Quit: Leaving.)
[2:27] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[2:27] * loicd (~loic@magenta.dachary.org) has joined #ceph
[2:29] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[2:30] * gregaf1 (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[2:30] * alram (~alram@cpe-75-83-127-87.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:31] * LeaChim (~LeaChim@176.250.188.136) Quit (Remote host closed the connection)
[2:36] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[2:38] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[2:42] <cjh_> ceph-deploy just says i don't have any modules installed
[2:43] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[2:43] * loicd (~loic@magenta.dachary.org) has joined #ceph
[2:47] * rturk is now known as rturk-away
[2:48] <dmick> cjh_: Authorization for the admin API duplicates the S3 authorization mechanism.
[2:49] <dmick> so presumably you'll want to use something like boto
[2:49] <cjh_> dmick: do i need to make myself a special admin user ? it wasn't clear
[2:49] <dmick> as for ceph-deploy, there shouldn't be a huge delta, but either ought to work
[2:49] * Kioob (~kioob@luuna.daevel.fr) Quit (Read error: No route to host)
[2:49] <cjh_> ok i guess the package for centos6 is broken a little
[2:50] <dmick> admin user: not sure.
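A hedged sketch of creating a dedicated user for the radosgw admin REST API (the uid and the exact cap names are assumptions that depend on the radosgw version; requests then have to be signed S3-style, e.g. with boto, as dmick notes above):

    radosgw-admin user create --uid=apiadmin --display-name="Admin API user"
    radosgw-admin caps add --uid=apiadmin \
        --caps="users=*;buckets=*;usage=read,write;metadata=read,write"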
[2:51] <cjh_> i fixed ceph-deploy
[2:51] <cjh_> lemme shoot you a diff
[2:51] <cjh_> unless you want to do it
[2:51] <cjh_> it was 2 lines haha
[2:53] * dwt (~dwt@rtp-isp-nat1.cisco.com) Quit (Read error: Connection reset by peer)
[2:54] * rustam (~rustam@94.15.91.30) has joined #ceph
[2:56] * alram (~alram@cpe-75-83-127-87.socal.res.rr.com) has joined #ceph
[2:57] * scuttlemonkey_ is now known as scuttlemonkey
[2:59] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[3:00] <dmick> what was the bug cjh_?
[3:01] <cjh_> i submitted a pull request :)
[3:01] <cjh_> i was just missing a path in the ceph-deploy script on centos6
[3:01] <cjh_> i gotta run. lemme know what you think
[3:04] <dmick> tnx
[3:06] * alram (~alram@cpe-75-83-127-87.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[3:12] * gregaf1 (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit (Quit: Leaving.)
[3:29] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Quit: Leaving.)
[3:30] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) Quit (Quit: Leaving.)
[3:32] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[3:37] * berant (~blemmenes@24-236-240-247.dhcp.trcy.mi.charter.com) Quit (Quit: berant)
[3:57] * sh_t (~sht@lu.privatevpn.com) has joined #ceph
[3:59] <sh_t> hi everyone. i'm looking to get a basic 2 node installation going but i'm getting stuck on the preparation and creation of the osds. basically I am getting errors like this upon running prepare/activate/create. everything else is ok and i'm able to communicate with my ceph node (zap disk, etc). http://pastebin.com/KYtxQzvZ any suggestions?
[4:07] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[4:13] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Quit: Leaving.)
[4:17] * Orban (~ruckc@173-167-202-19-ip-static.hfc.comcastbusiness.net) has joined #ceph
[4:17] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[4:18] <Orban> i'm trying to stand up a ceph cluster on rhel6 (with ceph-0.61.2), i've got the monitor services running (finally) but i can't get the "ceph osd" commands to work because i keep getting access denied. i tried "ceph health" but i get access denied. I set up a client.admin key in the keyring in /etc/ceph/ceph.keyring but nothing changes... any suggestions?
[4:21] * themgt (~themgt@24-177-232-33.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[4:26] * codice (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) has joined #ceph
[4:26] <terje-> what does 'ceph auth list' report?
[4:27] <terje-> and if that fails, what does 'ceph auth list -k /etc/ceph/ceph.keyring'
[4:27] <terje-> report?
[4:27] * markbby (~Adium@168.94.245.4) has joined #ceph
[4:30] <Orban> both show access denied
[4:32] <terje-> there is a problem with your keyring
[4:33] <Orban> so recreate a client.admin key and redistribute to the monitor keyrings?
[4:33] <terje-> create a new key and test it, yea.
[4:34] <terje-> if it works, push it out
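A hedged sketch of what "create a new key and test it" could look like with ceph-authtool (paths and caps are assumptions; the monitors must also know the key, e.g. by adding it to their keyrings and restarting them, before the test can succeed):

    # generate a fresh client.admin key with full caps
    ceph-authtool --create-keyring /tmp/ceph.client.admin.keyring \
        --gen-key -n client.admin \
        --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
    # test it explicitly before pushing it out as /etc/ceph/ceph.keyring
    ceph --keyring /tmp/ceph.client.admin.keyring --name client.admin health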
[4:34] * rustam (~rustam@94.15.91.30) has joined #ceph
[4:36] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[4:43] <Orban> so, i just redid the ceph-authtool, distributed the file, and now getting unexpected key errors on the ceph-mon.?.logs
[4:51] <Orban> and i just redid all the mon. key and the client.admin same results...
[4:51] * markbby (~Adium@168.94.245.4) Quit (Remote host closed the connection)
[4:52] <Orban> and i just redid all the mon. key and the client.admin same results...
[4:52] <Orban> err, sorry
[5:02] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[5:06] <Orban> i just tried re-mkfs'ing the mon datastores to no avail
[5:09] <Orban> hmm, did it again and it works now... weird
[5:09] * themgt (~themgt@24-177-231-192.dhcp.gnvl.sc.charter.com) has joined #ceph
[5:09] <Orban> from only one of the 3 nodes...
[5:09] * themgt (~themgt@24-177-231-192.dhcp.gnvl.sc.charter.com) Quit ()
[5:16] * rustam (~rustam@94.15.91.30) has joined #ceph
[5:18] * fredlisboa (~fredlisbo@186.213.250.251) has joined #ceph
[5:18] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[5:22] * Orban (~ruckc@173-167-202-19-ip-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[5:26] <sh_t> anyone able to give some input on my issue above? thanks.
[5:26] <sh_t> the log says that the osd is ready for use but it isn't.
[5:27] * fredlisboa (~fredlisbo@186.213.250.251) Quit (Quit: Leaving)
[5:28] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[5:45] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[5:45] * loicd (~loic@magenta.dachary.org) has joined #ceph
[5:49] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[5:50] * rustam (~rustam@94.15.91.30) has joined #ceph
[5:52] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[6:13] * rustam (~rustam@94.15.91.30) has joined #ceph
[6:15] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[6:23] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:26] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[6:51] * psieklFH (psiekl@wombat.eu.org) has joined #ceph
[6:55] * psiekl (psiekl@wombat.eu.org) Quit (Ping timeout: 480 seconds)
[7:11] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[7:40] * rustam (~rustam@94.15.91.30) has joined #ceph
[7:42] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[7:44] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[8:14] * tnt (~tnt@91.177.224.32) has joined #ceph
[8:19] * rustam (~rustam@94.15.91.30) has joined #ceph
[8:21] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[8:23] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:32] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Few women admit their age. Few men act theirs.)
[8:38] * LeaChim (~LeaChim@176.250.188.136) has joined #ceph
[8:41] * fridudad (~oftc-webi@fw-office.allied-internet.ag) Quit (Quit: Page closed)
[8:42] * rustam (~rustam@94.15.91.30) has joined #ceph
[8:44] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[8:49] * rustam (~rustam@94.15.91.30) has joined #ceph
[8:49] * coyo (~unf@71.21.193.106) has joined #ceph
[8:50] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[8:51] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[8:53] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:53] * loicd (~loic@magenta.dachary.org) has joined #ceph
[8:55] * coyo|2 (~unf@71.21.193.106) has joined #ceph
[8:59] * coyo (~unf@00017955.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:01] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[9:05] * coyo (~unf@71.21.193.106) has joined #ceph
[9:08] * coyo|2 (~unf@71.21.193.106) Quit (Ping timeout: 480 seconds)
[9:08] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[9:10] * rustam (~rustam@94.15.91.30) has joined #ceph
[9:11] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[9:13] * DarkAce-Z (~BillyMays@50.107.54.92) has joined #ceph
[9:16] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:16] * ChanServ sets mode +v andreask
[9:16] * esammy (~esamuels@host-2-102-68-228.as13285.net) has joined #ceph
[9:16] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Ping timeout: 480 seconds)
[9:16] * coyo (~unf@00017955.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:17] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:18] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[9:20] * esammy (~esamuels@host-2-102-68-228.as13285.net) has left #ceph
[9:32] * tnt (~tnt@91.177.224.32) Quit (Ping timeout: 480 seconds)
[9:37] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[9:37] * rustam (~rustam@94.15.91.30) has joined #ceph
[9:38] * loicd trying to remember what's the difference between the next and the master branch
[9:39] <loicd> got it http://ceph.com/docs/master/install/clone-source/#choose-a-branch
[9:40] <loicd> I'm not sure I understand when next starts divering from master. Is it a few days or a few hours before the release ?
[9:41] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[9:41] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:44] <loicd> s/divering/diverging/
[9:45] * loicd heading to the office
[9:45] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[9:47] * leseb (~Adium@83.167.43.235) has joined #ceph
[9:49] * leseb (~Adium@83.167.43.235) Quit ()
[9:49] * leseb (~Adium@83.167.43.235) has joined #ceph
[9:55] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[9:58] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[10:00] * Anticime1 is now known as Anticimex
[10:05] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[10:21] * rustam (~rustam@94.15.91.30) has joined #ceph
[10:28] * JohansGlock (~quassel@kantoor.transip.nl) has joined #ceph
[10:34] * JohansGlock_ (~quassel@kantoor.transip.nl) Quit (Ping timeout: 480 seconds)
[10:34] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[10:35] * frank9999 (~frank@kantoor.transip.nl) Quit (Ping timeout: 480 seconds)
[10:36] <nigwil> ahh, that is the keyword I was struggling to remember
[10:36] <nigwil> oops, not here :-)
[10:40] * frank9999 (~frank@kantoor.transip.nl) has joined #ceph
[10:43] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[10:46] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) has joined #ceph
[10:50] <tnt> mmh, osd memory usage seems more well behaved on cuttlefish than it was on bobtail. nice.
[10:53] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[11:03] <mrjack> my monitoring is going amok since 0.61.2 - is there a way to configure ceph so that a clock skew does not result in a HEALTH_WARN status?
[11:03] <mrjack> or to define another limit for clock skew?
[11:05] <tnt> I think ceph relies on synced clocks and will misbehave if they're not synced ...
[11:05] <tnt> so ... fix your clock :p
[11:06] * dcasier (~dcasier@LVelizy-156-44-40-164.w217-128.abo.wanadoo.fr) has joined #ceph
[11:09] <mrjack> tnt: even when running ntpdate on all nodes, there is a clockskew warning
[11:11] <tnt> how much ?
[11:11] <mrjack> not much, just a few: 0.0545779s > max 0.05s, 0.124411s > max 0.05s
[11:12] <tnt> also ntpdate is deprecated, you should use 'ntp'
[11:13] <tnt> and make sure you specify an explicit ntp server rather than a generic 'pool' (rr dns) so that they all sync to the same ntp server.
[11:16] <mrjack> ?
[11:16] <mrjack> ntp is not installed...
[11:17] <mrjack> it is in package ntpd and that is blocking minutes on boot
[11:17] <mrjack> i use host time.fu-berlin.de has address 130.133.1.10 which is no pool imho?!
[11:18] <mrjack> where is that statement that ntpdate is deprecated? ;)
[11:19] <tnt> it's blocking minutes on boot when ntpdate is installed as well :p
[11:21] <tnt> http://en.wikipedia.org/wiki/Ntpdate
[11:21] <tnt> "ntpdate is a deprecated computer program"
[11:21] * tziOm (~bjornar@ti0099a340-dhcp0870.bb.online.no) has joined #ceph
[11:21] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[11:21] <tnt> it's also in the debian bug that explains the bad interaction on boot between ntpd and ntpdate: basically they don't care, since you shouldn't be installing ntpdate anyway.
[11:22] <mrjack> so i should install ntp and remove ntpdate?
[11:22] <tnt> yes, that's what I did on all my servers now.
[11:23] <tnt> one issue though is that ntpd doesn't block on boot when adjusting the time and so it takes a couple of minutes for the time to be correct after boot
[11:23] <mrjack> will give it a try
[11:23] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit ()
[11:23] <mrjack> tnt: put ntp -d -q <timeserver> in rc.local?
[11:24] * dfanc_wrk (~fancd--@212.147.27.221) has joined #ceph
[11:25] <tnt> mrjack: well, you can't do that while ntpd is running AFAIK and rc.local is after ceph start.
[11:25] <tnt> mrjack: I know I must still hack something to make it work but since it solves itself a couple of min after boot it's not high on the list.
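A hedged sketch of the setup tnt describes, plus the monitor option that controls the warning threshold mrjack asked about earlier (the server name and threshold value are illustrative; raising the threshold only silences the warning, it does not make skewed clocks any safer):

    # /etc/ntp.conf: point every node at the same explicit server, not a pool alias
    server time.fu-berlin.de iburst

    # ceph.conf: the skew warning fires above 'mon clock drift allowed' (default 0.05s)
    [mon]
        mon clock drift allowed = 0.1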
[11:42] <dfanc_wrk> Hi, anyone has experience with radosgw - swift - keystone ?
[11:48] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[11:48] * rustam (~rustam@94.15.91.30) has joined #ceph
[11:51] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[12:13] * BillK (~BillK@124-169-186-145.dyn.iinet.net.au) Quit (Quit: Leaving)
[12:13] * dcasier (~dcasier@LVelizy-156-44-40-164.w217-128.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[12:15] * brill (~brill@2001:67c:64:42:31ff:3e92:cf41:41b4) has joined #ceph
[12:16] * BillK (~BillK@124-169-186-145.dyn.iinet.net.au) has joined #ceph
[12:36] * ifur (~osm@hornbill.csc.warwick.ac.uk) Quit (Quit: Lost terminal)
[12:44] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[12:54] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Ping timeout: 480 seconds)
[13:12] * xiaoxi (~xiaoxi@shzdmzpr01-ext.sh.intel.com) has joined #ceph
[13:15] * eschnou (~eschnou@62-197-93-189.teledisnet.be) has joined #ceph
[13:15] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[13:17] * xiaoxi1 (~xiaoxi@shzdmzpr01-ext.sh.intel.com) has joined #ceph
[13:19] * xiaoxi (~xiaoxi@shzdmzpr01-ext.sh.intel.com) Quit (Remote host closed the connection)
[13:19] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[13:20] * brill (~brill@2001:67c:64:42:31ff:3e92:cf41:41b4) Quit (Ping timeout: 480 seconds)
[13:24] * eschnou (~eschnou@62-197-93-189.teledisnet.be) Quit (Ping timeout: 480 seconds)
[13:28] <tnt> Does anyone know how the pgid is fed to crush ? (i.e. when using crushtool you specify a number to get an osd output, but a pgid is two numbers, pool_id.pg_number)
[13:28] * benner (~benner@193.200.124.63) Quit (Read error: Connection reset by peer)
[13:33] <andreask> tnt: you mean like "ceph pg 3.b53 query"?
[13:34] <tnt> andreask: Well I'm trying to simulate pg placement using crushtool so I can try various things without messing with the real cluster.
[13:35] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[13:35] <tnt> right now, placement is not very even : http://pastebin.com/raw.php?i=ER6CitMQ
[13:39] <tnt> ok, well, I think I should use osdmaptool instead of crushtool :p
[13:40] <andreask> I see, you want to optimize placement?
[13:41] <tnt> yes
[13:41] <tnt> having variance between -18% and +28% (wrt the average) is a bit much :p
[13:41] <andreask> how many pgs?
[13:42] <tnt> Too much :p 12808
[13:43] <tnt> That's mostly because I use several pools ( 10 or so )
[13:43] <andreask> but not a lot of osds
[13:43] <andreask> hmm ... so around 1000 pgs for 10 osds?
[13:43] <andreask> per pool ...
[13:44] <tnt> exactly. I'll be adding 4 shortly but that's still only 16. Back then when I created them, there was no advice about choosing pg_num when you have multiple pools.
[13:44] <tnt> wait, I'll paste the list of pools
[13:45] <tnt> http://pastebin.com/raw.php?i=Gmxseapu
[13:45] <joelio> Memory usage on Cuttlefish install is looking very stable after running for a couple of weeks
[13:46] <joelio> good work
[13:46] * diegows (~diegows@190.190.2.126) has joined #ceph
[13:48] * humbolt (~elias@91-113-100-118.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[13:49] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Remote host closed the connection)
[13:50] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[13:57] <andreask> tnt: more pgs reduce statistical variance in placement
[13:58] <tnt> andreask: well ... here it doesn't seem to be the case.
[13:59] <tnt> 12k PGs is way above what's recommended for 12 OSDs and I have a lot of pgs/osd variance.
[13:59] <andreask> tnt: some of your pools only have 768 pgs
[14:00] * humbolt (~elias@91-113-103-253.adsl.highway.telekom.at) has joined #ceph
[14:01] <tnt> and the formula in spec is OSDs * 100 / replica. The biggest pool is #30 and repl=3 -> 12*100/3 = 400.
[14:02] <tnt> When I look at pgs in that particular pool, the biggest osd has 266 PGs and the lowest has 147. That's a pretty big spread.
[14:02] <andreask> tnt: yeah, that is not very consistent in the docs ... reading the mailing list, its more like ~100 or more pgs per osd and pool
[14:03] <tnt> well, that's still lower than what I have, so it should be more distributed than it is. Also there seems to be another nasty effect of using multiple pools: for each pool it's always the same OSD that has more PGs than the others.
[14:05] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[14:06] <andreask> all osds the have the same weight?
[14:07] * dosaboy (~dosaboy@host86-161-206-107.range86-161.btcentralplus.com) Quit (Quit: leaving)
[14:09] <tnt> andreask: yes
[14:10] <tnt> I'm still using the argonaut tunables so that's why I want to simulate, to see the impact of various things.
[14:13] <andreask> I see
[14:15] <tnt> last recourse is to play with the weight but I'd like to avoid it.
[14:15] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 20.0.1/20130409194949])
[14:21] * rustam (~rustam@94.15.91.30) has joined #ceph
[14:24] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[14:24] * dcasier (~dcasier@LVelizy-156-44-40-164.w217-128.abo.wanadoo.fr) has joined #ceph
[14:27] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[14:28] <andreask> tnt: hmm .... one suggestion is also to use power of 2 values for number of pgs, but I must admit I have never checked if there is a big influence of that
[14:33] * brill (~brill@193.120.41.115) has joined #ceph
[14:34] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:35] <tnt> andreask: great, I did a simulation using 'optimal' tunables and the result is worse ...
[14:36] * markbby (~Adium@168.94.245.2) has joined #ceph
[14:38] * brill (~brill@193.120.41.115) Quit (Quit: Leaving)
[14:42] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:43] * DarkAce-Z is now known as DarkAceZ
[14:47] * bergerx_ (~bekir@78.188.101.175) has joined #ceph
[14:49] <tnt> yikes, there are hard-coded behavior changes depending on the pool id ... "if (crush_ruleset == 0 && auid == 0)"
[14:52] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[14:53] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[14:53] * ChanServ sets mode +v andreask
[14:56] <tnt> I think one of the problem is that my pools don't have the flag FLAG_HASHPSPOOL .
[14:58] <andreask> what's your ceph version btw? cuttlefish?
[14:59] <tnt> it's cuttlefish now yes. Upgraded yesterday. But everything was created back at argonaut. I did argonaut -> bobtail -> cuttlefish basically.
[15:03] * dosaboy (~dosaboy@faun.canonical.com) has joined #ceph
[15:03] <andreask> tnt: how did you simulate that it will get worse?
[15:03] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[15:04] <tnt> andreask: I modified osdmaptool to get the raw pg of all pgs of a pool, then I just count how many PGs end up in each OSD.
[15:04] <tnt> going to optimal tunables made it a bit worse ... (not by much, but no improvement).
[15:05] <tnt> Simulating other things ( like the FLAG_HASHPSPOOL or # of PGs in the pool) is going to be more tricky ...
[15:07] <tnt> I think I'll need to write a custom test app calling crush->do_rule(...) with custom params myself.
[15:12] <andreask> maybe there are already some dev-tools around from the ceph-devs
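For reference, a hedged sketch of the kind of offline placement simulation being discussed, assuming a crushtool build that supports the --test options (flag names may vary between versions):

    # dump the cluster's crush map and replay placements against it offline
    ceph osd getcrushmap -o crush.bin
    crushtool -i crush.bin --test --num-rep 3 --show-utilization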
[15:23] <joao> mrjack, mrjack_, around?
[15:25] <mrjack_> yeah
[15:25] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[15:25] <mrjack_> joao: i'm here ;)
[15:26] <joao> mrjack_, how's your cluster hanging?
[15:26] <mrjack_> joao: hm?
[15:26] <mrjack_> hanging?
[15:26] <joao> going
[15:26] <mrjack_> going well ;)
[15:26] <joao> seen any other asserts on the authmonitor?
[15:27] <mrjack_> i noticed fewer load on my nodes
[15:27] <joao> mrjack_, can you provide me with a copy of one of your running monitor's data dir?
[15:27] <mrjack_> no, but i had trouble taking the mon out and in again... because i first started the mon, and then told ceph that there is another mon... got things a little confused, but managed to establish quorum again and now it's working again
[15:28] <joao> I've seen what the problem was with the other monitor, but not the cause; was hoping to compare it with a healthy one
[15:28] <mrjack_> joao: hm... i have deleted it
[15:28] <mrjack_> joao: i thought you got it?
[15:28] <joao> mrjack_, I mean the store from one of you running monitors
[15:28] <joao> a healthy one
[15:28] <mrjack_> joao: sure
[15:29] <joao> I have the broken one here :)
[15:29] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) has joined #ceph
[15:29] <joao> mrjack, thanks!
[15:29] <mrjack_> joao: i have problems, if i try to tar it up, it sometimes says a file vanished before tar could read it
[15:30] <joao> mrjack, try copying it first to some location
[15:30] <mrjack_> joao: i think it is inconsistent then?
[15:30] <joao> yeah, probably
[15:30] <joao> but I'll see if it works
[15:30] <mrjack_> hm
[15:31] <mrjack_> my store is now 1.2gb
[15:31] <joao> otherwise, the only option would be to shutdown one of the monitors and copy it
[15:31] <joao> hmm
[15:31] <joao> most likely you're suffering from the leveldb growing issue
[15:31] <mrjack_> ah ok
[15:32] <mrjack_> hm i have some GB left, but not much
[15:32] <mrjack_> i'll take one out, copy it, so you can grab it
[15:33] <mrjack_> in a few hours? i'd like to go to training now ;)
[15:33] <joao> mrjack, try shutting down the monitor, copy the store and restart the monitor with 'mon compact on start = true'
[15:33] <joao> that should reduce the store size
[15:33] <joao> mrjack_, sure
[15:33] <mrjack_> ok i'll give it a try
[15:33] <joao> I'll be around all day
[15:33] <joao> well, most of it anyways
[15:35] <mrjack_> i'll get back to you once i have kicked my ass up ;)
[15:35] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:36] <mrjack_> joao: but i see from the leveldb log, that it is compacting
[15:36] <joao> how frequently?
[15:36] <joao> also, could you share that log?
[15:36] <mrjack_> every 10 minutes or so
[15:37] <mrjack_> that log is in the mon directory...
[15:37] <mrjack_> so you'll get that, too
[15:37] <joao> thanks
[15:37] <mrjack_> ok, see you in about 2 hours.
[15:37] <mrjack_> bbl.
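A hedged sketch of the workaround joao outlines (the mon id and paths are assumptions):

    service ceph stop mon.a                             # stop the monitor
    cp -a /var/lib/ceph/mon/ceph-a /root/mon-a.backup   # copy its store while it is down
    # restart it with compaction enabled, e.g. via ceph.conf:
    #   [mon]
    #       mon compact on start = true
    service ceph start mon.a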
[15:48] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[15:49] * alrs (~lars@209.144.63.76) has joined #ceph
[15:55] * ScOut3R_ (~ScOut3R@212.96.47.215) has joined #ceph
[15:55] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[15:58] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[16:01] * leseb (~Adium@83.167.43.235) has joined #ceph
[16:01] * loicd (~loic@magenta.dachary.org) has joined #ceph
[16:01] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[16:02] * drokita (~drokita@199.255.228.128) has joined #ceph
[16:09] <matt_> random question, but does each osd keep a persistent tcp connection to every other osd that it's in a PG with?
[16:11] * portante|ltp (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[16:11] * agh (~oftc-webi@gw-to-666.outscale.net) has joined #ceph
[16:11] <agh> Hello to all
[16:11] <agh> I'm setting up an Openstack / Ceph PoC
[16:12] <agh> But, i've a question
[16:12] <agh> I well understand how Ceph works
[16:12] <agh> but i'm newbie in Openstack
[16:12] <agh> => question : where have I to install Cinder ?
[16:12] <agh> My Ceph cluster is already up & running
[16:13] <agh> if i'm right, computes node must have access to the Ceph network, no ?
[16:19] <matt_> agh, I can't help with the cinder config but yes, your compute nodes need access to the ceph public network
[16:19] <matt_> if you haven't set up separate osd and client networks then it's the same network
[16:19] <agh> mm... ok. Yes I have 2 different networks.
[16:20] <agh> There are a lot of quick howtos about openstack over ceph, but not a real doc... ok i'm gonna try
[16:20] <agh> thanks anyway
[16:21] <matt_> yeh, so in your case the compute nodes need access to the client/public network
[16:21] <matt_> sorry I can't help much with the rest
[16:23] * xiaoxi1 (~xiaoxi@shzdmzpr01-ext.sh.intel.com) Quit (Remote host closed the connection)
[16:34] <leseb> hi
[16:38] <Vjarjadian> anyone here got experience with multi site Ceph over 'normal' WAN... I'm thinking should the WAN fail at one of the sites, if all sites have a MON, Ceph might try and rebalance to its replication ratio at each site and act as individual clusters... is this right and would it easily merge back together again when WAN was restored? The cluster i'm planning is 'only' archive storage and backup storage so it's not doing a lot of writes... just keeping everyt
[16:40] <tnt> Vjarjadian: nope
[16:40] <tnt> Vjarjadian: ceph will only work if at least (# mon/2)+1 mons are up.
[16:41] <tnt> Vjarjadian: so one of the two sites will just go down.
[16:41] <agh> I've a problem that i do not understand... at all
[16:41] <agh> i have a Ceph cluster up & running with several pools :
[16:41] <joao> Vjarjadian, not only what tnt said, but also you should take into account the latencies between sites
[16:41] <agh> ceph osd lspools 0 data,1 metadata,2 rbd,4 vm,6 test,
[16:41] <agh> ok.
[16:41] <joao> ceph doesn't like latency much
[16:42] <agh> with ceph -s : pgmap v3069890: 10336 pgs: 10336 active+clean; 320 GB data, 663 GB used, 83131 GB / 83794 GB avail; 727KB/s wr, 76op/s
[16:42] <agh> well.
[16:42] <agh> now, i want to create a new pool
[16:42] <agh> so i do :
[16:42] <agh> ceph osd pool create os-images 2000
[16:42] <joao> monitors will start getting cranky (and by cranky I mean they will start displaying erroneous behavior)
[16:42] <agh> the command works, the pool is created
[16:42] <agh> BUT
[16:42] <Vjarjadian> joao, latency would be a problem... which is why i wasnt planning on using it as live data... just for mostly static data that can go slow if needed.
[16:42] <agh> HEALTH_WARN with 2 active+degraded PGs
[16:43] <agh> do you have any idea ?
[16:43] <joao> Vjarjadian, there are some strict latency requirements for operations on ceph
[16:43] <tnt> agh: can you do a query on those two PGs ?
[16:44] <Vjarjadian> one of the guides i've seen has 'mon osd down out interval = 60' wouldn't that help at all?
[16:44] <agh> tnt: how ?
[16:44] <joao> oh
[16:44] <joao> looks like we increased default timeout values
[16:44] <tnt> agh: do a ceph pg dump | grep degraded to get their ids
[16:45] <joao> Vjarjadian, the monitor used to have some timeouts that would go off after 50ms
[16:45] <joao> but it appears most of those have been increased to as high as 10s
[16:45] <agh> tnt:
[16:45] <tnt> agh: then ceph pg 1.1 query (replace 1.1 by the pgid you got from the grep)
[16:45] <agh> 13.af 0 0 0 0 0 0 0 active+degraded 2013-05-15 14:45:17.500330 0'0 5156'9 [3] [3] 0'0 0.000000 0'0 0.000000
[16:45] <agh> 13.1a 0 0 0 0 0 0 0 active+degraded 2013-05-15 14:45:18.013525 0'0 5156'9 [3] [3] 0'0 0.000000 0'0 0.000000
[16:46] <Vjarjadian> agh, use a pastebin :)
[16:46] <tnt> agh: ok, so basically looks like CRUSH failed. What's your cluster topology ? ( # host, # osds ...)
[16:46] <Vjarjadian> make it easier to read for one
[16:47] <tnt> agh: what version do you have ? and what client version do you have ?
[16:47] <joao> secondly, it doesn't break on-going conversations
[16:48] <agh> tnt: http://pastebin.com/C5TM7Jww
[16:48] <tnt> agh: http://ceph.com/docs/master/rados/operations/crush-map/
[16:48] <agh> tnt: bobtail
[16:48] <tnt> agh: read the section about "Tunables".
[16:49] <tnt> agh: ah but you have two OSDs down
[16:49] <agh> tnt: yes, down and out
[16:49] <agh> tnt: problem with hard drives.
[16:51] <agh> tnt: ok. So what do i have to do with tunables ?
[16:53] <agh> ceph osd crush tunables bobtail for instance ?
[16:53] <tnt> agh: well, I would say "ceph osd crush tunables bobtail" BUT 1) make sure you read the client compatibility 2) you may want to talk to a dev to make sure because I never used them myself yet and they come with some warning.
[16:54] <tnt> of course is all depends on how important the cluster data is ...
[16:54] <agh> tnt: mmm. ok i will send a mail to the ML so
[16:54] <agh> tnt: it's an "almost production" cluster...
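Consolidating the diagnostic steps from the exchange above into one hedged sketch (the pgid comes from agh's paste; only run the tunables change after checking client compatibility, as tnt warns):

    ceph pg dump | grep degraded        # list the stuck pgs
    ceph pg 13.af query                 # inspect one of them
    ceph osd crush tunables bobtail     # optional; read the tunables docs first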
[17:05] <topro> updating bobtail to cuttlefish: update clients first or daemons first, which is recommended? Couldn't find that in the release notes.
[17:06] <tnt> topro: I updated the rgw, then the mons, then the osds and I haven't updated the client yet.
[17:07] <topro> well actually I'm talking about cephfs clients ;)
[17:07] * vata (~vata@2607:fad8:4:6:a464:4175:e0ad:aef) has joined #ceph
[17:08] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[17:08] <tnt> topro: features are negotiated anyway so I don't think that matters.
[17:08] <topro> ok, that sounds good
[17:09] <topro> regarding your conversation before, whats the impact of using tunables? performance-wise or stability-wise?
[17:10] <tnt> I think it just influences the data-placement, so ... neither would really be affected.
[17:11] <topro> just prevent imbalance of pg vs osds?
[17:11] <tnt> yeah, I think in general it's to make CRUSH more well behaved and more closely match a random distribution while still meeting constraints.
[17:12] <topro> ok, as I would need linux 3.9 on clients to mount cephfs with bobtail-tunables. currently I can only get linux 3.8 on wheezy without building my own kernel
[17:14] <tnt> I've just built my own kernel based on 3.8.13 + all ceph patches from upstream :p
[17:20] * Wolff_John (~jwolff@vpn.monarch-beverage.com) has joined #ceph
[17:23] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:24] <absynth> lol. dona in "my first mailing list subscription" :D
[17:26] <tnt> :)
[17:26] * tkensiski (~tkensiski@209.66.64.134) has joined #ceph
[17:27] * tkensiski (~tkensiski@209.66.64.134) has left #ceph
[17:34] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:41] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[17:43] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[17:45] <topro> when restarting osds after upgrade to cuttlefish they get reweighted to 0.45 by init-script automatically. is that normal behaviour
[17:45] <topro> ^ ?
[17:48] * dfanc_wrk (~fancd--@212.147.27.221) Quit (Quit: Leaving)
[17:48] <tnt> topro: are you sure they were reweighted ? I mean, it does print a message when doing service ceph start, but in fact, it doesn't do it unless the osds are new and not pre-existing.
[17:49] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[17:49] <topro> tnt: well I don't know if it really does it. I thought when on start it says it will, it really does
[17:50] <topro> simple way to see osd weights without crushmap decompilation?
[17:50] <topro> hmm, ceph osd tree still says 1 for all osds
[17:50] <tnt> ceph osd tree
[17:50] <topro> ^^
[17:51] <tnt> yup, so it's fine.
[17:51] <topro> so the start message is simply misleading
[17:52] * dcasier (~dcasier@LVelizy-156-44-40-164.w217-128.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[17:52] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[17:53] * loicd (~loic@magenta.dachary.org) has joined #ceph
[17:53] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[17:55] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[17:56] * ScOut3R_ (~ScOut3R@212.96.47.215) Quit (Remote host closed the connection)
[17:57] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[17:59] * joshd1 (~jdurgin@2602:306:c5db:310:9dc7:5393:e640:6544) has joined #ceph
[18:00] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[18:09] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:10] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[18:13] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[18:15] * markbby (~Adium@168.94.245.2) has joined #ceph
[18:22] * tnt (~tnt@91.177.224.32) has joined #ceph
[18:24] * Gugge-47527 (gugge@kriminel.dk) Quit (Read error: Connection reset by peer)
[18:24] * Gugge-47527 (gugge@kriminel.dk) has joined #ceph
[18:24] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Ping timeout: 480 seconds)
[18:25] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:26] * alram (~alram@38.122.20.226) has joined #ceph
[18:26] * sagewk (~sage@2607:f298:a:607:10e3:380:4dd2:33c) Quit (Remote host closed the connection)
[18:31] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:33] * rustam (~rustam@94.15.91.30) has joined #ceph
[18:36] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[18:39] * markbby1 (~Adium@168.94.245.2) has joined #ceph
[18:39] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[18:40] * sagewk (~sage@2607:f298:a:607:1010:231c:3d5f:f266) has joined #ceph
[18:41] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[18:41] * jskinner (~jskinner@69.170.148.179) Quit (Ping timeout: 480 seconds)
[18:42] * markbby (~Adium@168.94.245.2) has joined #ceph
[18:42] * markbby1 (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[18:43] * dcasier (~dcasier@80.215.40.248) has joined #ceph
[18:45] * sagewk (~sage@2607:f298:a:607:1010:231c:3d5f:f266) Quit (Quit: Leaving.)
[18:48] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[18:48] * Wolff_John (~jwolff@vpn.monarch-beverage.com) Quit (Ping timeout: 480 seconds)
[18:50] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[18:50] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:53] * sagewk (~sage@2607:f298:a:607:1010:231c:3d5f:f266) has joined #ceph
[18:56] * dcasier (~dcasier@80.215.40.248) Quit (Ping timeout: 480 seconds)
[18:56] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[19:00] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Download IceChat at www.icechat.net)
[19:02] <paravoid> so, any tips on how to reduce the latency penalty on recovery?
[19:02] <paravoid> https://graphite.wikimedia.org/render/?title=Top%208%20FileBackend%20Methods%20by%20Max%2090th%20Percentile%20Time%20%28ms%29%20log%282%29%20-1day&from=-1day&width=1024&height=500&until=now&areaMode=none&hideLegend=false&logBase=2&lineWidth=1&target=cactiStyle%28substr%28highestMax%28FileBackendStore.*.tp90,8%29,0,2%29%29
[19:02] <paravoid> this is the result of a single failed disk
[19:03] <paravoid> reducing "osd recovery op priority"?
[19:03] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[19:05] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[19:09] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) Quit (Quit: Leaving.)
[19:10] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) has joined #ceph
[19:14] <sjust> paravoid: that would help
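A hedged sketch of dialing recovery down on a running cluster (the values are illustrative starting points, not recommendations):

    ceph tell osd.\* injectargs '--osd-recovery-op-priority 1 --osd-max-backfills 1 --osd-recovery-max-active 1'
    # or persist the same settings in the [osd] section of ceph.conf:
    #   osd recovery op priority = 1
    #   osd max backfills = 1
    #   osd recovery max active = 1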
[19:14] * gregaf (~Adium@2607:f298:a:607:e538:5598:e131:ae4b) Quit (Quit: Leaving.)
[19:16] * BillK (~BillK@124-169-186-145.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[19:20] * jskinner (~jskinner@69.170.148.179) Quit (Ping timeout: 480 seconds)
[19:22] * gregaf (~Adium@2607:f298:a:607:6c73:5681:87:beda) has joined #ceph
[19:26] * alram (~alram@38.122.20.226) Quit (Quit: Lost terminal)
[19:29] * fridudat (~oftc-webi@p4FC2D750.dip0.t-ipconnect.de) has joined #ceph
[19:30] * alram (~alram@38.122.20.226) has joined #ceph
[19:35] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[19:35] * Cube (~Cube@12.248.40.138) has joined #ceph
[19:36] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) has joined #ceph
[19:39] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[19:39] * alrs (~lars@209.144.63.76) Quit (Ping timeout: 480 seconds)
[19:42] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[19:45] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[19:47] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[19:47] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[19:59] * Wolff_John (~jwolff@ftp.monarch-beverage.com) has joined #ceph
[20:00] * coyo (~unf@pool-71-170-191-140.dllstx.fios.verizon.net) has joined #ceph
[20:01] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[20:01] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[20:02] * glowell (~glowell@2607:f298:a:607:705f:39a9:7120:329a) has joined #ceph
[20:04] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[20:04] * ChanServ sets mode +v andreask
[20:06] * dosaboy (~dosaboy@faun.canonical.com) Quit (Quit: leaving)
[20:13] <mrjack_> joao: i have a question on the leveldb mon store.db ... i have many .sst files in there (each for one monmap epoch?!) - can i delete old ones?
[20:16] <joao> mrjack_, I'd say that's a really bad idea; .sst's are where leveldb keeps its stuff, and it should be left to leveldb to take care of them
[20:16] <joao> deleting them may very well render the store unusable
[20:17] <tnt> mrjack_: if it takes too much space I found that restarting the mon helped ...
[20:20] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) has joined #ceph
[20:21] * rustam (~rustam@94.15.91.30) has joined #ceph
[20:22] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[22:23] <sjust> mrjack_: leveldb compacts the .sst files over time on its own
[20:23] <sjust> you don't want to delete them
[20:23] <masterpe> Hi I did an upgrade from ceph 0.56.4-1precise to 0.61.2-1precise
[20:23] <masterpe> And now I get the message "found errors while attempting to convert the monitor store: (17) File exists"
[20:24] <masterpe> And it looks like I hit the bug http://www.spinics.net/lists/ceph-users/msg01552.html -> http://tracker.ceph.com/issues/4974
[20:24] <tnt> masterpe: when (which version) was the cluster created btw ?
[20:25] <masterpe> The initial version?
[20:25] <tnt> yes
[20:26] <tnt> the upgrade guides says that if the cluster was created with argonaut, you must upgrade to 0.56.6 first and then to 0.61.x
[20:26] <joao> most likely that's a previous attempt at converting that failed and left a 'store.db' on the mon data dir
[20:27] <joao> masterpe, try 'mv <mon-data-dir>/store.db <mon-data-dir>/store.db.old' and then rerun the monitor
[20:28] * diegows (~diegows@200.68.116.185) has joined #ceph
[20:28] <paravoid> masterpe: no idea about the bug, but the fix seems to be included in 0.61.2
[20:29] <masterpe> I know
[20:30] <joao> if 4974 were to be hit, it would probably cause a crash dump
[20:30] <joao> s/to be/being/
[20:30] <joao> my best guess is what I just mentioned above
[20:31] <joao> other than that, only a log file with 'debug mon = 20' would probably tell us more
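A hedged sketch of joao's suggestion (substitute the actual mon data dir and id):

    mv <mon-data-dir>/store.db <mon-data-dir>/store.db.old   # set aside the partial conversion
    /etc/init.d/ceph start mon                               # rerun the monitor so it retries
    # if it still fails, capture a verbose log for the devs, e.g. in ceph.conf:
    #   [mon]
    #       debug mon = 20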
[20:39] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[20:42] * ChanServ sets mode +o scuttlemonkey
[20:49] * anthonyacquanita (~anthonyac@pool-173-50-173-226.pghkny.fios.verizon.net) has joined #ceph
[20:49] * JohansGlock_ (~quassel@kantoor.transip.nl) has joined #ceph
[20:50] <cjh_> anyone know what the ORNL experiment is in sage's pdf slides ?
[20:51] <anthonyacquanita> ? -- I have a single new NFS server with 17 drives. Looking to use ceph instead of nfs. What should I do? JBOD or Raid?
[20:51] <cjh_> jbod
[20:51] <anthonyacquanita> and can I add a second ceph server later
[20:51] * ChanServ sets mode +v joao
[20:52] * rtek (~sjaak@rxj.nl) Quit (Read error: Connection reset by peer)
[20:52] * rtek (~sjaak@rxj.nl) has joined #ceph
[20:54] * ChanServ sets mode +v dmick
[20:54] <anthonyacquanita> So just install everything on this one nfs server and, as I grow, move services off the single server and make it a more dedicated osd server?
[20:55] * ChanServ sets mode -v dmick
[20:55] <cjh_> test it and see what you get out of it ;)
[20:55] * JohansGlock_ (~quassel@kantoor.transip.nl) Quit (Quit: No Ping reply in 180 seconds.)
[20:55] * saaby (~as@mail.saaby.com) Quit (Ping timeout: 480 seconds)
[20:55] * frank9999 (~frank@kantoor.transip.nl) Quit (Remote host closed the connection)
[20:55] * JohansGlock_ (~quassel@kantoor.transip.nl) has joined #ceph
[20:56] * bergerx_ (~bekir@78.188.101.175) Quit (Remote host closed the connection)
[20:56] * JohansGlock (~quassel@kantoor.transip.nl) Quit (Ping timeout: 480 seconds)
[20:57] <anthonyacquanita> will do, thanks
[20:58] * JohansGlock_ (~quassel@kantoor.transip.nl) Quit (Read error: Connection reset by peer)
[20:58] * JohansGlock (~quassel@kantoor.transip.nl) has joined #ceph
[20:58] <nhm_> cjh_: yes, I performed it. :)
[20:58] * saaby (~as@mail.saaby.com) has joined #ceph
[20:59] <cjh_> was the double writing the key to getting to 5.5GB/s ?
[21:00] <nhm_> cjh_: the double writing is (presumably) what stopped us from getting beyond 5.5GB/s
[21:00] <cjh_> i'm curious how you got there because i've been trying for weeks to get there.
[21:00] <nhm_> cjh_: since the DDN SFA10k is capable of about 11-12GB/s.
[21:00] <cjh_> what is DDN SFA10K?
[21:02] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Remote host closed the connection)
[21:03] <nhm_> cjh_: http://www.ddn.com/products/SFA10K-X
[21:04] <cjh_> that looks expensive :)
[21:04] <nhm_> cjh_: yes. :)
[21:04] <nhm_> cjh_: it's what ORNL built spider with.
[21:04] <cjh_> nhm: so i'm seeing about the same thing. my cluster is capable of X and ceph performs at 50% of X or less
[21:05] <nhm_> cjh_: are you counting journal throughput as part of what your cluster is capable of?
[21:05] <nhm_> cjh_: the DDN experiment had journals on DDN exported LUNs.
[21:05] <cjh_> journal throughput hmm.. not sure what you mean
[21:05] <cjh_> i have combined journal/osd disks
[21:06] <cjh_> so i get 1/2 of the write speed right?
[21:06] <cjh_> throughput i mean
[21:06] <nhm_> cjh_: yes
[21:06] <cjh_> ok so my cluster is more like 75% of what is possible then
[21:06] <cjh_> what would happen if i made the journal like 100MB
[21:07] <cjh_> the whole journal thing doesn't work well with the use case of constant sustained high write workloads
[21:07] <nhm_> cjh_: you'd hit the journal limits faster and presumably have short stalls while they drain and bumpier performance
[21:07] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:08] <cjh_> yeah at 5GB, where it's currently at, i see lumpy performance
[21:08] <cjh_> they're constantly filling and draining and it bogs the whole thing down
[21:08] <nhm_> cjh_: for those kinds of scenarios it seems to be best to put a couple of journals on fast SSDs to give you the throughput advantage without losing too much capacity.
[21:08] <cjh_> nhm: yeah i keep wondering about doing that
[21:09] <cjh_> why not eliminate the journal all together?
[21:09] <cjh_> i get that it gives us snapshots and some other cool stuff
[21:09] <scuttlemonkey> if anyone is interested the UDS Ceph session is about to start
[21:09] <scuttlemonkey> http://summit.ubuntu.com/uds-1305/meeting/21786/servercloud-s-ceph/
[21:10] <nhm_> cjh_: it's used to guarantee consistency for writes. Basically we can do direct IO writes to the journal totally sequentially to guarantee that data hits the disk fast, then use buffered io to lazily copy the data over to the OSD.
[21:10] <cjh_> i see
[21:10] <cjh_> i'm not sure what i should do then. grow the journal or shrink it
[21:11] <nhm_> cjh_: ultimately if your journals are on the same disks as the OSDs, you are bound by doing 1 buffered and 1 direct IO write to the disk for every write, and the overhead that entails.
[21:11] <cjh_> terrible
[21:12] <cjh_> yeah i guess that's the limit with the design
[21:12] <cjh_> i keep getting push back that it's slow
[21:13] <nhm_> cjh_: with btrfs, there's a chance we might be able to improve it quite a bit by doing a COW from the journal to the OSD.
[21:13] <cjh_> yeah btrfs is decent but still not great
[21:13] <cjh_> i'm going to try xfs again to see how it performs with cuttlefish
[21:13] <nhm_> cjh_: It may be worthwhile to look at flash-backed NVRAM cards for journals as well.
[21:14] * l0nk (~alex@87-231-111-125.rev.numericable.fr) has joined #ceph
[21:14] <cjh_> yeah they're quite fast. and they can handle prob 2GB/s
[21:15] * l0nk (~alex@87-231-111-125.rev.numericable.fr) Quit ()
[21:15] <cjh_> i guess the downside is if i lose a flash drive i lose 12 3T drives
[21:15] <cjh_> that's a lot of replication haha
[21:15] <tnt> how about battery backed RAM ... a couple of G is pretty cheap.
[21:15] <nhm_> Yeah, that's why the nvram cards are interesting. It's all ram, and if the machine loses power a supercapacitor transfers the data to flash.
[21:16] * dosaboy (~dosaboy@host86-161-206-107.range86-161.btcentralplus.com) has joined #ceph
[21:16] <nhm_> RAM won't have the same write cycle limitations as flash.
[21:16] <cjh_> yeah i should see if i can get my hands on a few flash or ram pci cards and see how that performs
[21:17] <cjh_> as an aside does ceph-deploy have support for specifying a user to use instead of sudo?
[21:17] <nhm_> I suspect that in the future, those kinds of solutions could be very reasonable for high performance use cases.
[21:17] <cjh_> nhm_: i agree
[21:17] <nhm_> both as a journal store and as a caching layer for the OSDs themselves like bcache/flashcache.
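A minimal ceph.conf sketch of the layout nhm_ describes, with journals moved off the data disks onto a fast SSD (device paths and the journal size are assumptions):

    [osd]
        osd journal size = 10240          # MB, i.e. 10 GB journals
    [osd.0]
        osd journal = /dev/sdg1           # partition on a shared SSD
    [osd.1]
        osd journal = /dev/sdg2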
[21:17] * ScOut3R (~ScOut3R@business-80-99-161-73.business.broadband.hu) has joined #ceph
[21:22] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[21:24] * glowell (~glowell@2607:f298:a:607:705f:39a9:7120:329a) Quit (Quit: Leaving.)
[21:25] <jmlowe> sage: You around, I need some advice?
[21:27] <scuttlemonkey> jmlowe: he is in the Ceph UDS session at the moment
[21:27] <jmlowe> oh, right, somebody said that
[21:31] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[21:31] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[21:33] * ScOut3R (~ScOut3R@business-80-99-161-73.business.broadband.hu) Quit (Ping timeout: 480 seconds)
[21:34] <jmlowe> I suppose I should throw this out there, I had an osd crash last week and after I recovered I had an object that was 1.2M on disk and ceph thought it should be 4M, Sage advised me to truncate it up to 4M which I did, I now have this error when I scrub "2.17a deep-scrub stat mismatch, got 1044/1044 objects, 7/7 clones, 4187981824/4184942592 bytes."
[21:35] <jmlowe> So now what?
[21:37] <jmlowe> the difference of 4187981824 vs 4184942592 is exactly the amount I truncated the object by
[21:39] <mrjack> rbd rm <image> makes lots of slow requests, why?
[21:39] <mrjack> and finally marks osd down falsely
[21:39] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[21:45] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[21:48] * jjgalvez (~jjgalvez@12.248.40.138) has joined #ceph
[21:50] * danieagle (~Daniel@186.214.76.12) has joined #ceph
[22:00] * thebishop (~thebishop@66.187.140.174) has joined #ceph
[22:03] <nhm_> jmlowe: let me see if I can find someone
[22:06] <nhm_> mrjack: strange, I don't think I've seen that.
[22:08] <sagewk> jmlowe: 'ceph pg repair <pgid>' will fix that stats mismatch
[22:11] <mrjack> nhm_: yeah...
[22:11] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:11] * ChanServ sets mode +v andreask
[22:12] <jmlowe> sagewk: that did the trick, thanks!
[22:12] * rustam (~rustam@94.15.91.30) has joined #ceph
[22:15] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[22:16] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[22:17] <ccourtaut> Hi there
[22:18] <ccourtaut> I'd like to know the current status of RGW Geo?
[22:18] <ccourtaut> And what is the current branch for this blueprint?
[22:19] <nhm_> ccourtaut: I think work has started but it's still early in the process.
[22:19] <nhm_> ccourtaut: for some value of work.
[22:20] <fridudat> nhm_: do you use flashcache / bcache?
[22:21] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[22:22] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) Quit (Read error: No route to host)
[22:23] * Wolff_John (~jwolff@ftp.monarch-beverage.com) Quit (Ping timeout: 480 seconds)
[22:24] <sagewk> paravoid: around?
[22:26] <ccourtaut> nhm_, i know it's still early, though i found a first stage of implementation in the wip-rgw-geo branch, but still don't know if it's the right wip branch
[22:27] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:28] <yehuda_hm> ccourtaut: the latest geo-replication implementation is in wip-rgw-geo-2
[22:28] <ccourtaut> yehuda_hm, ok thanks, i'll take a look at it
[22:29] * fridudat (~oftc-webi@p4FC2D750.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[22:32] * coyo (~unf@00017955.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:34] * themgt (~themgt@24-177-232-33.dhcp.gnvl.sc.charter.com) has joined #ceph
[22:35] <paravoid> sagewk: yes
[22:41] <sagewk> now that i know you're a DD.. i have a distro question for you :)
[22:43] <paravoid> haha
[22:43] <paravoid> sure :)
[22:43] <sagewk> ugh redmine search is so bad
[22:43] <paravoid> happy to help
[22:43] <sagewk> http://tracker.ceph.com/issues/4865
[22:43] <nhm_> lol
[22:43] * glowell (~glowell@38.122.20.226) has joined #ceph
[22:43] <sagewk> wheezy udev doesn't populate /dev/disk/by-partuuid. we worked around this on el6 with a helper script and a separate udev rule
[22:44] <sagewk> but it would be nicer if udev did that and ceph-disk could work in the normal way
[22:45] <paravoid> that seems to be http://git.kernel.org/cgit/linux/hotplug/udev.git/commit/?id=693b6344e193f5aeca21df5f1c98fd32148006ac
[22:45] <paravoid> right?
[22:45] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[22:46] <ay> Hum. Quick question. I've got ceph up and running for cephfs and block device. Everything (except some strangeness when starting) works as expected. But how do i limit what machines can mount what blockdevice or cephfs?
[22:47] <paravoid> nah, 60-persistent-storage.rules is completely different anyway
[22:47] <paravoid> so, shipping something like that in wheezy's udev is next to impossible now that wheezy is released
[22:47] <tnt> sagewk: you should be familiar with CRUSH :) Is it possible/allowed to set FLAG_HASHPSPOOL on an existing pool ? Also, what kind of variance is expected in pgs/osd distribution ?
[22:48] <joao> well, congrats to all Chelsea fans out there; kinda hating you all now
[22:48] <paravoid> i.e. I don't think it would ever pass the stable release managers' review
[22:49] <sagewk> paravoid: yeah
[22:49] <paravoid> shipping your own rules.d would be the best way to go I think
[22:50] <sagewk> if we do that, will they conflict with udev if/when that gets updated?
[22:51] * jluis (~JL@89.181.145.37) has joined #ceph
[22:51] <paravoid> it might, not sure what udev does when a symlink exists already
[22:51] <sagewk> or if there is a dup rule in 2 files..
[22:52] <paravoid> if you have a test setup
[22:52] <paravoid> you can just cp the rules file into a second one and see what happens :)
[22:52] <sagewk> yeah, cool.
[22:52] <sagewk> thanks!
[22:53] <paravoid> should I open a bug report against (unstable's) udev?
[22:53] <paravoid> or do you want to?
[22:54] <sagewk> would be great if you could :)
[22:54] <sagewk> but in the meantime we'll need to do it ourselves for wheezy right?
[22:55] <paravoid> yeah
[22:55] * ScOut3R (~ScOut3R@business-80-99-161-73.business.broadband.hu) has joined #ceph
[22:55] <dmick> Will the ID_ vars be present to use?
[22:57] <paravoid> hey, wait
[22:57] * joao (~JL@89.181.145.151) Quit (Ping timeout: 480 seconds)
[22:57] <paravoid> # by-partlabel/by-partuuid links (partition metadata)
[22:57] <paravoid> ENV{ID_PART_ENTRY_SCHEME}=="gpt", ENV{ID_PART_ENTRY_UUID}=="?*", \ SYMLINK+="disk/by-partuuid/$env{ID_PART_ENTRY_UUID}"
[22:57] <paravoid> ENV{ID_PART_ENTRY_SCHEME}=="gpt", ENV{ID_PART_ENTRY_NAME}=="?*", \ SYMLINK+="disk/by-partlabel/$env{ID_PART_ENTRY_NAME}"
[22:57] <paravoid> that's from my wheezy install
[22:57] <paravoid> so it is there
[22:58] <sagewk> oh, hrm!
[22:58] <sagewk> what file?
[22:59] <sagewk> hrm, i have that too, but
[22:59] <sagewk> root@burnupi26:/lib/udev# ls -ald /dev/disk/by-*
[22:59] <sagewk> drwxr-xr-x 2 root root 880 May 14 11:17 /dev/disk/by-id
[22:59] <sagewk> drwxr-xr-x 2 root root 460 May 14 11:17 /dev/disk/by-path
[22:59] <sagewk> drwxr-xr-x 2 root root 240 May 14 11:17 /dev/disk/by-uuid
[22:59] <sagewk> sorry, hav ea meeting.. back in a bit!
[22:59] <sagewk> thanks
[22:59] * sagelap (~sage@2607:f298:a:607:598c:d480:4af:b6ce) has joined #ceph
[23:00] <paravoid> http://bugs.debian.org/681809
[23:00] <paravoid> and somehow it was in my browser's history
[23:00] <tnt> ah ! just found a bug in my crush simulation tool.
[23:01] * jluis (~JL@89.181.145.37) Quit (Remote host closed the connection)
[23:02] <paravoid> could be http://git.kernel.org/cgit/linux/hotplug/udev.git/commit/?id=1b9e13e2e2c4755752e1e9fd8ff4399af7329ab8
[23:02] * ScOut3R (~ScOut3R@business-80-99-161-73.business.broadband.hu) Quit (Read error: Operation timed out)
[23:02] <paravoid> aka 178~10
[23:03] * joao (~JL@89-181-145-37.net.novis.pt) has joined #ceph
[23:03] * ChanServ sets mode +o joao
[23:03] * Wolff_John (~jwolff@vpn.monarch-beverage.com) has joined #ceph
[23:07] <paravoid> replied to that bug report
[23:08] * sagelap1 (~sage@38.122.20.226) has joined #ceph
[23:09] <sagelap1> paravoid: looks promising..
[23:09] <sagelap1> will get fix get into wheezy?
[23:09] * sagelap (~sage@2607:f298:a:607:598c:d480:4af:b6ce) Quit (Read error: Connection reset by peer)
[23:10] <paravoid> I wouldn't give it high chances, but it might be able to creep in if there are other serious enough changes that need to happen on wheezy
[23:10] <paravoid> let's see what the maintainer will say
[23:10] <paravoid> however, if I understand that patch well, you could ship your rules files that have the rules without the ID_ part
[23:10] <sagelap1> well we can safely ship a || rule with the wrong ID_ name
[23:10] <sagelap1> :)
[23:10] <sagelap1> yeah
[23:11] <paravoid> this would make them compatible with udev < 178
[23:11] <paravoid> and they'd be just ignored on >= 178
[23:11] <paravoid> since there isn't going to be any PART_ENTRY_{NAME,UUID} there
[23:11] <paravoid> so, win
[23:11] <sagelap1> yay
[23:12] * loicd (~loic@magenta.dachary.org) has joined #ceph
[23:14] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[23:16] <elder> sagelap1, I gave a long update to http://tracker.ceph.com/issues/5043 but I believe I found the bug. I'll prepare a fix but when you get a chance it would be nice to get acknowledgement from you on my analysis.
[23:21] <paravoid> sagelap1: speaking of packaging, did you see my mail on ceph-dev about radosgw 0.61/librados2 0.56?
[23:22] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[23:24] * Wolff_John (~jwolff@vpn.monarch-beverage.com) Quit (Quit: ChatZilla 0.9.90 [Firefox 20.0.1/20130409194949])
[23:24] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[23:31] * sjustlaptop (~sam@38.122.20.226) Quit (Ping timeout: 480 seconds)
[23:33] * thebishop (~thebishop@66.187.140.174) Quit (Quit: Leaving)
[23:38] * vata (~vata@2607:fad8:4:6:a464:4175:e0ad:aef) Quit (Quit: Leaving.)
[23:38] * sagelap1 (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[23:42] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 20.0.1/20130409194949])
[23:42] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[23:43] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[23:43] * rturk-away is now known as rturk
[23:44] * sagelap (~sage@38.122.20.226) has joined #ceph
[23:44] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit ()
[23:44] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[23:45] <masterpe> Hi I did an upgrade from ceph 0.56.4-1precise to 0.61.2-1precise
[23:45] <masterpe> On two mon nodes it is converting the mon data
[23:46] <masterpe> but on the 3th mon node i get root@storage-05:/srv/mon.e# /etc/init.d/ceph start mon
[23:46] <masterpe> === mon.e ===
[23:46] <masterpe> Starting Ceph mon.e on storage-05...
[23:46] <masterpe> [5435]: (33) Numerical argument out of domain
[23:46] <masterpe> failed: 'ulimit -n 8192; /usr/bin/ceph-mon -i e --pid-file /var/run/ceph/mon.e.pid -c /etc/ceph/ceph.conf '
[23:46] <masterpe> What the hell is "Numerical argument out of domain"
[23:51] <sagelap> look in /var/log/ceph/ceph-mon.e.log
[23:53] * drokita (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[23:55] * rturk is now known as rturk-away
[23:56] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[23:56] * rustam (~rustam@94.15.91.30) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.