#ceph IRC Log


IRC Log for 2013-12-04

Timestamps are in GMT/BST.

[17:47] * CephLogBot (~PircBot@ has joined #ceph
[17:47] * Topic is 'Latest stable (v0.72.0 "Emperor") -- http://ceph.com/get || dev channel #ceph-devel '
[17:47] * Set by scuttlemonkey!~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net on Tue Dec 03 16:30:05 CET 2013
[17:54] * Pedras (~Adium@c-67-188-26-20.hsd1.ca.comcast.net) has joined #ceph
[17:54] * sagelap (~sage@2600:1012:b004:410f:c685:8ff:fe59:d486) Quit (Read error: Connection reset by peer)
[17:56] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[18:01] * davidz (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (Quit: Leaving.)
[18:02] * davidz (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[18:03] * mattt_ (~textual@ Quit (Quit: Computer has gone to sleep.)
[18:03] * nwat (~textual@eduroam-244-187.ucsc.edu) has joined #ceph
[18:04] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:05] * philipgian (~philipgia@athedsl-89963.home.otenet.gr) has joined #ceph
[18:07] * sagelap (~sage@2600:1012:b004:410f:b5fc:5dca:6f92:fb96) has joined #ceph
[18:10] * getup (~getup@gw.office.cyso.net) Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz...)
[18:13] * BManojlovic (~steki@fo-d- has joined #ceph
[18:15] * [fred] (fred@earthli.ng) has joined #ceph
[18:20] * gregsfortytwo1 (~Adium@cpe-172-250-69-138.socal.res.rr.com) has joined #ceph
[18:20] * gregsfortytwo1 (~Adium@cpe-172-250-69-138.socal.res.rr.com) Quit ()
[18:20] * xarses (~andreww@c-24-23-183-44.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:21] * hjjg_ (~hg@p3EE30203.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[18:21] * gregsfortytwo1 (~Adium@cpe-172-250-69-138.socal.res.rr.com) has joined #ceph
[18:23] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[18:26] * Pedras (~Adium@c-67-188-26-20.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:28] * jbd_ (~jbd_@2001:41d0:52:a00::77) has left #ceph
[18:36] * sroy (~sroy@2607:fad8:4:6:3e97:eff:feb5:1e2b) has joined #ceph
[18:39] * SvenPHX (~Adium@wsip-174-79-34-244.ph.ph.cox.net) Quit (Quit: Leaving.)
[18:39] * SvenPHX (~Adium@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[18:39] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Quit: Ex-Chat)
[18:41] <L2SHO> can anyone explain the difference between the crush weight and the osd weight?
[18:42] * glowell1 (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[18:42] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit (Read error: No route to host)
[18:42] * xevwork (~xevious@6cb32e01.cst.lightpath.net) Quit (Remote host closed the connection)
[18:48] * fouxm_ (~fouxm@ Quit (Remote host closed the connection)
[18:51] * Sysadmin88 (~IceChat77@ has joined #ceph
[18:52] * houkouonchi-home (~linux@66-215-209-207.dhcp.rvsd.ca.charter.com) has joined #ceph
[18:54] * glowell1 (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:54] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[18:54] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit ()
[18:55] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[18:55] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[18:56] * sagelap (~sage@2600:1012:b004:410f:b5fc:5dca:6f92:fb96) Quit (Read error: Connection reset by peer)
[19:00] * angdraug (~angdraug@64-79-127-122.static.wiline.com) has joined #ceph
[19:01] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[19:02] * xevwork (~xevious@6cb32e01.cst.lightpath.net) has joined #ceph
[19:12] * bandrus (~Adium@ Quit (Quit: Leaving.)
[19:13] * bandrus (~Adium@ has joined #ceph
[19:14] * Sysadmin88 (~IceChat77@ Quit (Quit: Life without danger is a waste of oxygen)
[19:20] * Siva (~sivat@vpnnat.eglbp.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[19:21] * sjusthm (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[19:23] * Pedras (~Adium@ has joined #ceph
[19:23] * Pedras (~Adium@ Quit ()
[19:23] * Pedras (~Adium@ has joined #ceph
[19:26] * mattt_ (~textual@cpc25-rdng20-2-0-cust162.15-3.cable.virginm.net) has joined #ceph
[19:30] * sleinen1 (~Adium@ has joined #ceph
[19:34] * madkiss (~madkiss@ATuileries-152-1-40-29.w82-123.abo.wanadoo.fr) Quit (Quit: Leaving.)
[19:36] * CAPSLOCK2000 (~oftc@2001:610:748:1::8) has joined #ceph
[19:36] * sleinen (~Adium@2001:620:0:25:1571:69cb:4ee2:7678) Quit (Ping timeout: 480 seconds)
[19:37] * sagelap (~sage@37.sub-70-197-65.myvzw.com) has joined #ceph
[19:38] * sleinen1 (~Adium@ Quit (Ping timeout: 480 seconds)
[19:43] * mancdaz is now known as mancdaz_away
[19:44] * ARichards (~textual@ has joined #ceph
[19:47] * Cube (~Cube@66-87-64-254.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[19:48] * Cube (~Cube@66-87-67-110.pools.spcsdns.net) has joined #ceph
[19:52] * dmsimard (~Adium@palpatine.privatedns.com) has joined #ceph
[19:56] * Tamil (~tamil@ has joined #ceph
[19:56] * glzhao_ (~glzhao@ Quit (Quit: leaving)
[19:59] * sagelap (~sage@37.sub-70-197-65.myvzw.com) Quit (Read error: Connection reset by peer)
[20:00] * xdeller (~xdeller@ Quit (Quit: Leaving)
[20:01] * aliguori (~anthony@ has joined #ceph
[20:04] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[20:05] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:12] * Gamekiller77 (~oftc-webi@128-107-239-233.cisco.com) has joined #ceph
[20:15] * warrenu (~Warren@2607:f298:a:607:cd5:7dc5:a006:fbcc) has joined #ceph
[20:15] * dmsimard (~Adium@palpatine.privatedns.com) Quit (Quit: Leaving.)
[20:22] * dmsimard (~Adium@palpatine.privatedns.com) has joined #ceph
[20:25] * Knorrie (knorrie@yoshi.kantoor.mendix.nl) has left #ceph
[20:26] * dmsimard (~Adium@palpatine.privatedns.com) Quit ()
[20:26] * jsfrerot (~jsfrerot@ has joined #ceph
[20:27] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[20:27] <jsfrerot> Hi, I added a mon server to my existing cluster without using ceph-deploy.
[20:27] <jsfrerot> When I start it using the init script it's not
[20:28] <jsfrerot> so I have to start it by hand like this: ceph-mon -i host_name --public-addr x.x.x.x:6789
[20:28] <jsfrerot> what am I missing so it automatically starts at boot and also starts and stops using the init scripts?
[20:28] <jsfrerot> using ubuntu 13.04
[20:29] <jsfrerot> ceph version 0.67.4-1raring
[20:54] * bandrus (~Adium@ Quit (Ping timeout: 480 seconds)
[21:01] * gregsfortytwo2 (~Adium@cpe-172-250-69-138.socal.res.rr.com) has joined #ceph
[21:01] * gregsfortytwo1 (~Adium@cpe-172-250-69-138.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[21:02] * JoeGruher (~JoeGruher@ has joined #ceph
[21:05] * lofejndif (~lsqavnbok@82VAAGW3S.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[21:06] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[21:06] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[21:12] * sarob_ (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[21:14] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[21:14] <aarontc> jsfrerot: did you add the mon to your ceph.conf?
[21:16] * mattt_ (~textual@cpc25-rdng20-2-0-cust162.15-3.cable.virginm.net) Quit (Quit: Computer has gone to sleep.)
[21:17] <jsfrerot> aarontc: I tried to add it like this: [mon.host_name]
[21:17] <jsfrerot> mon addr = x.x.x.x:6789
[21:17] <jsfrerot> and still doesn't start
[21:17] <jsfrerot> but on my other nodes I didn't have to add it in the config file...
[21:18] <jsfrerot> This leaves me wondering how does it know how to start the mons and osds...
[21:18] <aarontc> jsfrerot: you also need a "host = " line in the [mon.<name>] section
[21:18] <aarontc> that's how the init script knows what to start on a given host
[21:18] <jsfrerot> mon_host ?
[21:18] <jsfrerot> in the global section ?
[21:19] * bandrus (~Adium@adsl-63-205-14-9.dsl.scrm01.pacbell.net) has joined #ceph
[21:20] * sarob_ (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[21:21] * rudolfsteiner (~federicon@ has joined #ceph
[21:26] <aarontc> jsfrerot: each mon needs a section like this...
[21:26] <aarontc> [mon.chekov]
[21:26] <aarontc> host = chekov
[21:26] <aarontc> mon addr =
[21:27] * sroy (~sroy@2607:fad8:4:6:3e97:eff:feb5:1e2b) Quit (Remote host closed the connection)
[21:28] <jsfrerot> aarontc: I don't understand how my other nodes are working without it then
[21:28] <aarontc> jsfrerot: me, either :)
[21:28] <aarontc> do you still have a "mon_initial_members" item in your ceph.conf?
[21:28] <aarontc> that might be related
[21:30] * mattt_ (~textual@cpc25-rdng20-2-0-cust162.15-3.cable.virginm.net) has joined #ceph
[21:33] <jsfrerot> yep
[21:35] * Jean-Roger (Jean-Roger@ALille-651-1-30-11.w2-5.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[21:36] * ARichards (~textual@ Quit (Quit: Textual IRC Client: www.textualapp.com)
[21:36] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[21:37] <jsfrerot> aarontc: I just added the host entry and it's working now. Thanks
[21:38] <aarontc> no problem
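
The fix aarontc describes above comes down to a per-monitor section in ceph.conf that carries a "host" line next to "mon addr". A minimal sketch in Python that renders such a section as text; the function name is hypothetical, the address is the same x.x.x.x placeholder jsfrerot used, and the key spellings are taken from the chat rather than checked against a particular Ceph release:

```python
def render_mon_section(name: str, host: str, addr: str) -> str:
    """Render a [mon.<name>] section like the one quoted in the log.

    Hypothetical helper for illustration only; it formats text and
    does not talk to Ceph in any way.
    """
    lines = [
        f"[mon.{name}]",
        f"    host = {host}",      # per the chat, the init script matches this to the local host
        f"    mon addr = {addr}",  # placeholder address, as in the log
    ]
    return "\n".join(lines) + "\n"


# Example using the names from the conversation.
print(render_mon_section("chekov", "chekov", "x.x.x.x:6789"))
```

The "host" value would normally be the short hostname of the node that runs the monitor, which is what lets the init script decide which daemons belong to the machine it is running on.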
[21:38] * Jean-Roger (Jean-Roger@ALille-651-1-8-104.w92-131.abo.wanadoo.fr) has joined #ceph
[21:38] * sarob_ (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[21:38] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Read error: Connection reset by peer)
[21:39] * jsfrerot (~jsfrerot@ Quit (Quit: leaving)
[21:40] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[21:40] * sarob_ (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Read error: Connection reset by peer)
[21:40] * mfisch__ (~mfisch@c-76-25-23-72.hsd1.co.comcast.net) has joined #ceph
[21:45] * sarob_ (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[21:45] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Read error: Connection reset by peer)
[21:46] * linuxkidd (~linuxkidd@cpe-066-057-061-231.nc.res.rr.com) has joined #ceph
[21:46] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) has joined #ceph
[21:47] * sarob_ (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Read error: Connection reset by peer)
[21:47] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[21:51] * mfisch__ is now known as mfisch
[21:52] <mfisch> I'm trying to bring up a cluster and stuck on starting the monitor in the Quickstart guide
[21:52] <mfisch> Any hints on how to debug this? [ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[21:56] <Gamekiller77> what did you use to deploy the monitor ceph-deploy or by hand
[21:59] <mfisch> ceph-deploy
[21:59] <mfisch> ceph-deploy 1.3.3
[22:00] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[22:00] <mfisch> I see this as well above the socket
[22:00] <mfisch> Starting Ceph mon.rhel-ceph-node01 on rhel-ceph-node01...
[22:00] <mfisch> failed: 'ulimit -n 32768; /usr/bin/ceph-mon -i rhel-ceph-node01 --pid-file /var/run/ceph/mon.rhel-ceph-node01.pid -c /etc/ceph/ceph.conf '
[22:01] <mfisch> however when that's run by hand it works
[22:01] * bandrus (~Adium@adsl-63-205-14-9.dsl.scrm01.pacbell.net) Quit (Read error: Connection reset by peer)
[22:02] * MarkN (~nathan@ has joined #ceph
[22:02] * MarkN (~nathan@ has left #ceph
[22:03] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) Quit (Quit: Leaving.)
[22:06] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[22:14] <alfredodeza> mfisch: so that ERROR comes after the failure to call ulimit and ceph-mon?
[22:15] <alfredodeza> full logs on a pastebin would be ideal to try and narrow it down
[22:17] * Sysadmin88 (~IceChat77@ has joined #ceph
[22:23] <mfisch> ok
[22:23] <mfisch> will grab
[22:24] <mfisch> Here's the log from deploying: http://pastebin.com/JwvG3KRq
[22:27] <alfredodeza> mfisch: do you have iptables enabled?
[22:28] <mfisch> I dont think so, let me look
[22:29] <mfisch> yep, nm, they're enabled
[22:29] <alfredodeza> you need to either open the ports necessary so that mons can talk to each other or disable iptables
[22:29] <mfisch> ok
[22:29] <alfredodeza> also, note that the command that fails is *not* the one that has ulimits, it's the previous one
[22:30] <alfredodeza> this one --> sudo /sbin/service ceph -c /etc/ceph/ceph.conf start mon.rhel-ceph-node01
[22:31] <mfisch> ok
[22:31] <mfisch> disabling iptables
[22:34] * markbby (~Adium@ Quit (Quit: Leaving.)
[22:35] <mfisch> also I noticed that there's no repo for emporer, I'm using the dumpling one but it seems to have been updated
[22:35] <Pedras> http://ceph.com/rpm-emperor/
[22:36] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[22:36] * ChanServ sets mode +o elder
[22:36] <cmdrk> hmm
[22:37] <cmdrk> i have a bunch of OSDs in one box, 4 TB disks. I am noticing that they are being written to fairly unevenly
[22:37] <Pedras> cmdrk: out of curiosity, how many?
[22:37] <cmdrk> i have about 400 TB and 6000 pgs between data, metadata, and rbd pools
[22:37] <mfisch> Pedras: gah I cannot spell emperor! thanks
[22:37] <Pedras> misch:no
[22:38] <cmdrk> Pedras: ..56 :/
[22:38] <Pedras> misch:np I mean
[22:38] <Pedras> cmdrk: I have been mucking with 68
[22:38] <cmdrk> :)
[22:39] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[22:39] <Pedras> cmdrk: my df seems pretty even
[22:39] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[22:39] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[22:39] <cmdrk> im seeing a bunch of disks with 1% utilization, most with 10% utilization, and a few with 20% utilization
[22:39] <Pedras> cmdrk: how much RAM?
[22:39] <cmdrk> 128GB
[22:39] <cmdrk> 99GB cached right now
[22:39] <Pedras> cmdrk: did you adjust pg_num and friends in all pools
[22:40] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[22:40] <cmdrk> yeah, there are about 2000 pg/pgp_num per pool
[22:40] <cmdrk> maybe i should increase it?
[22:40] <Pedras> I just followed the doc's formula
[22:41] <Pedras> http://ceph.com/docs/master/rados/operations/placement-groups/
[22:42] <Pedras> when you changed pg_num and pgp_num did the cluster start reporting OSDs being taken out or other problems?
[22:42] <cmdrk> hmm yeah.. i have about (110 OSDs * 100)/ 2
[22:42] <cmdrk> i changed pg_num and pgp_num right after i set up the cluster
[22:42] <cmdrk> dont remember any problems
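
The figure cmdrk works out above, (110 OSDs * 100) / 2, matches the rule of thumb in the placement-groups doc Pedras linked. A rough Python sketch of that arithmetic, assuming the guidance is about 100 PGs per OSD divided by the replica count and then rounded up to a power of two; the function name and the rounding step are illustrative assumptions, not part of any Ceph tool:

```python
def suggested_pg_count(num_osds: int, replicas: int, pgs_per_osd: int = 100) -> int:
    """Estimate a total PG count from the rule of thumb discussed in the log.

    Hypothetical helper: computes (OSDs * pgs_per_osd) / replicas and
    rounds the result up to the next power of two.
    """
    if num_osds <= 0 or replicas <= 0:
        raise ValueError("num_osds and replicas must be positive")
    # Ceiling division so a fractional result is not truncated.
    raw = -(-(num_osds * pgs_per_osd) // replicas)
    power = 1
    while power < raw:
        power *= 2
    return power


# cmdrk's numbers: 110 OSDs, 2 replicas -> raw value 5500.
print(suggested_pg_count(110, 2))
```

Whatever value is chosen, the rest of the conversation suggests checking that pgp_num was raised along with pg_num on every pool.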
[22:42] * AfC (~andrew@203-219-79-122.static.tpgi.com.au) has joined #ceph
[22:43] <Pedras> I had to restart osds.. things got out of whack for a bit
[22:43] <Pedras> what distro are you using? I am on fc19
[22:43] <cmdrk> EL6
[22:44] <cmdrk> oooh, i just noticed pg_num but not pgp_num was updated on my data pool :o
[22:45] <Pedras> maybe that is it
[22:45] <cmdrk> needs correcting regardless!
[22:45] <Pedras> are you using cephfs?
[22:45] <Pedras> or mainly rbd
[22:46] <cmdrk> cephfs with a custom kernel
[22:46] <cmdrk> for the d_prune patch etc
[22:46] <mfisch> alfredodeza: same result w/o iptables...
[22:46] <Pedras> I am not familiar with that
[22:46] <Pedras> cmdrk: what is it for?
[22:46] <cmdrk> basically kernel 3.12 b/c i noticed issues where files were disappearing/reappearing in cephfs
[22:47] <Pedras> ah
[22:47] <cmdrk> basically a file wouldn't be there until you tried to stat it, then it would magically appear again :)
[22:47] <cmdrk> no file loss, just didnt report its existence
[22:47] <dmick> heisenfiles
[22:47] <cmdrk> yes
[22:47] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[22:47] <cmdrk> exactly :)
[22:47] <Pedras> cmdrk: I gave up with el6 userland and 3.x kernel - there is some trouble with "high memory" boxes
[22:47] * JoeGruher (~JoeGruher@ Quit (Remote host closed the connection)
[22:47] <cmdrk> ahh
[22:48] <Pedras> cmdrk: although couldn't ascertain what "high memory" is
[22:48] <cmdrk> interesting, where does this information come from?
[22:48] <cmdrk> i have a lot of boxes in O(100GB) range
[22:48] <Pedras> cmdrk: udev would tie itself in knots with our 256GB nodes
[22:48] <cmdrk> Pedras: good to know
[22:48] <Pedras> cmdrk: I didn't try with less memory
[22:48] * jesus (~jesus@emp048-51.eduroam.uu.se) Quit (Ping timeout: 480 seconds)
[22:49] <cmdrk> question .. is there any situation when pgp_num < pg_num is advantageous ? i guess i dont understand why there are two values to set
[22:49] <Pedras> cmdrk: didn't research the issue more but at the time that led me to this : https://groups.google.com/forum/#!msg/linux.kernel/QgsZHAte5WA/aPg-DLQxRr0J
[22:49] <cmdrk> from a sysadmin point of view anyway :)
[22:50] <cmdrk> interesting. ill file that one away -- thanks for the heads up
[22:50] <Pedras> cmdrk: I was wondering that myself yesterday, since the doc says to set one after the other. Maybe some developer could elaborate further on that
[22:50] <Pedras> blasted rhel on 2.6.x for ages now
[22:51] <mfisch> alfredodeza: I think the issue is with ulimit, I have an idea on how to fix it
[22:51] * JoeGruher (~JoeGruher@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[22:52] <alfredodeza> mfisch: doing that manually fails?
[22:52] <alfredodeza> the ulimit command that is
[22:53] <Pedras> cmdrk: are you stuck with using el6?
[22:53] <Pedras> cmdrk: nothing against it, ubuntu just seems to be the "preferred" dist
[22:54] <cmdrk> eh, I
[22:54] <cmdrk> i'd prefer to keep an rpm distro at least
[22:54] <cmdrk> for sysadmin sanity :)
[22:54] <cmdrk> but yeah, could consider F19.
[22:54] <janos> cmdrk: honestly i have no issues with bare-butt minimal fedora installs
[22:54] <cmdrk> just dont like the idea of having to update the machines to Fedora [n+1] every year
[22:55] <cmdrk> what would be preferable is EL7 :)
[22:55] <janos> yeah i tend to skip versions
[22:55] <cmdrk> well, n+2 then ;)
[22:55] <janos> some things Just Work so i leave them alone
[22:55] <Pedras> cmdrk: yeahh... el7... one day
[22:55] <janos> when it's time to re-eval i wipe and get a newer version
[22:56] <janos> but something in between fedora and centOS release times would be nice
[22:56] <Pedras> janos: indeed
[22:57] <Pedras> more dists is... more
[22:57] <Pedras> :)
[22:57] <janos> yep
[22:57] <Pedras> this el7 deal has been... slow
[22:58] <Pedras> cmdrk: what kind of testing have you done?
[22:59] * malcolm (~malcolm@ Quit (Ping timeout: 480 seconds)
[22:59] * jesus (~jesus@emp048-51.eduroam.uu.se) has joined #ceph
[23:09] <mfisch> alfredodeza: yeah it's more than the hardlimit, trying again in a few mins
[23:11] <cmdrk> Pedras: with regards to?
[23:11] * philipgian (~philipgia@athedsl-89963.home.otenet.gr) Quit (Ping timeout: 480 seconds)
[23:12] * japuzzo (~japuzzo@pok2.bluebird.ibm.com) Quit (Quit: Leaving)
[23:13] <Pedras> cmdrk: creating load on it
[23:14] <cmdrk> ive done some benchmarking with rados and the xrootd + cephfs. some plots here: http://stash.osgconnect.net/+lincolnb/ceph-oct-2013
[23:14] <cmdrk> but that data is probably invalidated now that i realize pgp_num wasn't updated with pg_num
[23:16] <Pedras> haven't used it... great tip
[23:16] <Pedras> xrootd that is
[23:17] <cmdrk> yeah.. its just a file access protocol, used in high energy physics probably exclusively :)
[23:17] <mfisch> alfredodeza: digging more and I found that the socket was still using the old hostname of this node
[23:18] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[23:18] <alfredodeza> wow
[23:18] <mfisch> alfredodeza: I'm sure this is the issue. ceph-mon is running but with an unexpected socket name
[23:18] <mfisch> I saw this warning before and fixed it but it didn't "take" I guess
[23:18] <alfredodeza> I wonder if there is anything ceph-deploy could do to somehow infer that
[23:18] <mfisch> ceph-deploy does warn on this but not until after the config file is created I dont think
[23:19] <alfredodeza> gotcha
[23:20] <mfisch> and ceph-mon now works
[23:20] <mfisch> thanks for the assist alfredodeza
[23:20] <alfredodeza> np
[23:24] * BManojlovic (~steki@fo-d- Quit (Quit: Ja odoh a vi sta 'ocete...)
[23:31] <cmdrk> woo, all of my OSDs crashed after setting pgp_num :/
[23:31] <Pedras> cmdrk: I saw a few going down
[23:31] <cmdrk> yeah
[23:31] <Pedras> cmdrk: and a few more restarts were needed until things settled
[23:31] <cmdrk> yeah it seems that way here
[23:32] <Pedras> how is memory/swap now?
[23:32] <cmdrk> no swapping, 100 GB cached
[23:33] <Pedras> I ask since 128G sounds on the low end, for so many osds
[23:34] <cmdrk> yeah, probably need to add more
[23:34] * mattt_ (~textual@cpc25-rdng20-2-0-cust162.15-3.cable.virginm.net) Quit (Quit: Computer has gone to sleep.)
[23:34] <cmdrk> they're on 4 dell shelves, might just get more hosts and split up the OSDs a bit
[23:34] <Pedras> the docs were recently updated to 1GB per TB
[23:34] <cmdrk> well 8 dell shelves altogether
[23:34] <cmdrk> ahh
[23:34] <Pedras> ~1GB / TB
[23:35] <Pedras> that is one host right now?
[23:35] <cmdrk> two hosts
[23:35] <cmdrk> each with half of the OSDs
[23:35] <Pedras> I forced mine into one :)
[23:35] <cmdrk> ahh :)
[23:35] <Pedras> only have one of these beasts
[23:36] <pmatulis> what is the link to the 1GB/1TB thing?
[23:36] <Pedras> I do not know
[23:36] <Pedras> noticed that modification recently in the docs
[23:37] <pmatulis> ah, http://ceph.com/docs/master/start/hardware-recommendations/
[23:37] * malcolm (~malcolm@ has joined #ceph
[23:37] <Pedras> sorry, I presume you asked the reasoning for it. I don't know
[23:38] <Pedras> but saying 1GB per OSD was very carte blanche
[23:38] <Pedras> with 6TB disks around the corner ehhe
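
The "~1GB / TB" guideline Pedras mentions can be turned into a quick per-host estimate. A small Python sketch, assuming the guideline means roughly 1 GB of RAM per 1 TB of OSD storage on a host; the function name is hypothetical, and the 56 OSDs and 4 TB disks are the figures cmdrk gave earlier in the log:

```python
def ram_estimate_gb(osds_per_host: int, disk_tb: float, gb_per_tb: float = 1.0) -> float:
    """Estimate RAM for one OSD host under a GB-per-TB rule of thumb.

    Hypothetical helper for illustration; the 1 GB per TB default is
    the figure quoted in the chat, not a measured requirement.
    """
    return osds_per_host * disk_tb * gb_per_tb


# cmdrk's box: about 56 OSDs on 4 TB disks, with 128 GB installed.
needed = ram_estimate_gb(56, 4)
print(f"estimated {needed:.0f} GB vs 128 GB installed")
```

By this estimate the host would want noticeably more than its 128 GB, which is in line with Pedras calling that amount "on the low end" for so many OSDs.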
[23:38] * dmsimard (~Adium@palpatine.privatedns.com) has joined #ceph
[23:39] <Pedras> cmdrk: how is the network setup of those 2 boxes?
[23:39] <cmdrk> 10Gb interface for each
[23:40] <cmdrk> going to change them to 20Gb bonds i think
[23:40] <Pedras> I was going for 20G for the backend network only but I am quickly changing my mind :)
[23:46] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:46] * dmsimard (~Adium@palpatine.privatedns.com) Quit (Ping timeout: 480 seconds)
[23:50] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[23:50] * rudolfsteiner (~federicon@ Quit (Quit: rudolfsteiner)
[23:54] * ivotron (~ivotron@adsl-76-254-10-5.dsl.pltn13.sbcglobal.net) Quit (Remote host closed the connection)
[23:54] * dmsimard (~Adium@ has joined #ceph
[23:56] * rendar (~s@host84-179-dynamic.7-87-r.retail.telecomitalia.it) Quit ()
[23:58] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.