#ceph IRC Log


IRC Log for 2012-09-14

Timestamps are in GMT/BST.

[0:02] <damien> joshd: sorry went away for a bit
[0:02] <damien> let me try that
[0:03] <damien> http://dpaste.com/800667/
[0:03] <damien> result!
[0:05] * Ryan_Lane (~Adium@owa.robertfountain.com) Quit (Quit: Leaving.)
[0:06] <joshd> woot!
[0:06] * damien hopes that's some help
[0:07] <joshd> that means the floating point error is 'inexact result', the one I thought was least likely
[0:08] <joshd> can you 'list *0x7ffff74a6675' to make sure the backtrace is correct?
[0:09] <damien> joshd: http://dpaste.com/800668/
[0:09] <joshd> ok, that is where it said
[0:10] <joshd> can you go to frame 1 and print elapsed?
[0:10] <damien> $4 = {tv = {tv_sec = 0, tv_nsec = 61000}}
[0:10] <joshd> I wonder if this is happening because of the guest setting the floating point mode?
[0:11] <joshd> by default inexact_result is ignored, since there are so many easy to hit inexact floating point values
[0:11] <damien> but the guest OS could be changing that to something else?
[0:12] <joshd> yeah, but it shouldn't be impacting librbd, but it's possible that it's leaking through
[0:12] <joshd> I don't think qemu is changing it
[0:13] <damien> is it a runtime thing?
[0:13] <joshd> are you running any kind of weird distro or libc?
[0:13] <joshd> yeah
[0:13] <damien> it's a windows guest
[0:13] <damien> haven't seen this with a linux one
[0:14] <joshd> it's a cpu setting for which floating point exceptions are enabled
[0:15] <damien> ah okay
[0:15] <damien> can the exceptions be handled?
[0:16] <joshd> not from librbd - it'd have to be changed in qemu
[0:17] <Tv_> joshd: it wouldn't surprise me if vm->qemu jump would just let fp modes stay as they are
[0:17] <Tv_> joshd: kernel does that too
[0:17] <Tv_> joshd: significant speed increase
[0:17] <Tv_> with the assumption that you won't do floating point, or when you do, *then* you save/restore flags
[0:17] <joshd> Tv_: so should we try to audit librbd/librados etc and remove floating point?
[0:18] <Tv_> this talks about kernel side: http://www.linuxsmiths.com/blog/?p=253
[0:18] <Tv_> you could wrap all use of fp in similar save/restore
[0:18] <Tv_> sucky
[0:19] <Tv_> really, it would be nice to have ceph use integer arithmetic only, for several reasons
[0:20] <joshd> this particular use is for perfcounters, so it's not too hard to convert to integers
[0:20] <Tv_> where do they even need float?
[0:20] <Tv_> spit out raw numbers, let consumer worry about math
[0:21] <joshd> it's adding time values for latencies, converting them to seconds as a double
[0:21] <Tv_> e.g. instead of computing average, give sum and count
[0:21] <Tv_> eww
[0:21] <Tv_> count nanoseconds or something
[0:21] <joshd> it already does sum and count
[0:21] <joshd> but yeah, it could be nanoseconds instead
[0:21] <Tv_> computer time isn't floating points
[0:22] <gregaf> we could probably replace that without *too* much trouble; it's just for convenience so the value is in seconds
[0:22] <gregaf> and given the constraints it already has on auto-conversion we could probably even make it nice
[0:24] * Ryan_Lane (~Adium@owa.robertfountain.com) has joined #ceph
[0:24] <joshd> there aren't too many other places we use floating point on the client side
[0:25] <joshd> it's definitely doable to convert them to int arithmetic, and it sounds like that's the best way forward
[0:26] <dmick> yeah. it's nice to have "a number" to compare, but
[0:26] <dmick> FP brings along baggage
[0:26] <dmick> and given that you've got classes to hide the unpleasantness...
[0:27] <dmick> can keep interface the same, convert only on output. but it's a potentially-big change (not sure how abstracted it is ATM)
[0:28] <gregaf> there's definitely some stuff that would have to change, but it'll be easy to find where since I think the conversion might even be explicit
[0:29] <joshd> the conversion isn't explicit, but it's easy to make it a compile error to use it
[0:29] <Tv_> when do you output perf counters *from librbd*?
[0:30] <joshd> via admin socket
[0:30] <gregaf> the ObjectCacher keeps track of cache hits/misses, etc
[0:30] <Tv_> in a library?...
[0:30] <joshd> well, it's not available by default
[0:31] <joshd> you have to configure it to have one
[0:31] <Tv_> besides, you'd probably serve the socket from a separate thread anyway
[0:31] <Tv_> can make fp safe there, format for output
[0:31] <Tv_> just keep state in ints
[0:31] <joshd> yeah
[0:33] <joshd> oh, crush may be problematic
[0:33] <joshd> it uses floats all over
[0:35] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Read error: Connection reset by peer)
[0:36] * sjustlaptop (~sam@ has joined #ceph
[0:36] <dmick> if this is a problem, one might think there's some existing discussion of "what not to do in qemu"
[0:57] * amatter (~amatter@ Quit (Ping timeout: 480 seconds)
[1:03] * Tv_ (~tv@2607:f298:a:607:5905:afb4:18b:79c5) Quit (Quit: Tv_)
[1:11] <jlogan> First node with 4 osd came up well. I'm now trying to add a 2nd node but the osd create is not setting up the same files/directories
[1:11] <jlogan> root@ceph02-sef:/etc/ceph# ceph-osd -d -i 4 --mkkey --mkfs
[1:11] <jlogan> root@ceph02-sef:/etc/ceph# ls -l /var/lib/ceph/osd/ceph-4/
[1:11] <jlogan> total 4
[1:11] <jlogan> -rw------- 1 root root 56 Sep 13 15:59 keyring
[1:11] <jlogan> the osd is on its own drive: /dev/sdb1 on /var/lib/ceph/osd/ceph-4 type btrfs (rw,relatime,space_cache)
[1:12] <jlogan> root@ceph02-sef:/etc/ceph# ceph auth add osd.4 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-4/keyring
[1:12] <jlogan> 2012-09-13 16:11:56.172088 7f4108495760 -1 read 56 bytes from /var/lib/ceph/osd/ceph-4/keyring
[1:12] <jlogan> added key for osd.4
[1:12] <jlogan> root@ceph02-sef:/etc/ceph# service ceph -a start osd.4
[1:12] <jlogan> === osd.4 ===
[1:12] <jlogan> Starting Ceph osd.4 on ceph02-sef...
[1:12] <jlogan> 2012-09-13 16:12:10.534107 7f14e5187780 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-4: (2) No such file or directory
[1:12] <jlogan> failed: ' /usr/bin/ceph-osd -i 4 --pid-file /var/run/ceph/osd.4.pid -c /etc/ceph/ceph.conf '
[1:13] <jlogan> I was following the steps here:
[1:13] <jlogan> http://ceph.com/docs/master/cluster-ops/add-or-rm-osds/#adding-an-osd-manual
[1:20] <joshd> jlogan: is there anything in the log for that osd?
[1:20] <joshd> normally in /var/log/ceph/osd.4.log
[1:21] <joshd> err, /var/log/ceph/ceph-osd.4.log
[1:33] * jlogan (~Thunderbi@2600:c00:3010:1:49cf:a720:7a5f:aaa9) Quit (Ping timeout: 480 seconds)
[1:39] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[1:52] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:56] * sjustlaptop1 (~sam@ has joined #ceph
[1:56] * sjustlaptop (~sam@ Quit (Read error: Connection reset by peer)
[2:06] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[2:13] * Ryan_Lane (~Adium@owa.robertfountain.com) Quit (Quit: Leaving.)
[3:05] * sjustlaptop1 (~sam@ Quit (Ping timeout: 480 seconds)
[4:13] * joshd (~joshd@ Quit (Quit: Leaving.)
[4:47] <Tobarja1> can someone look at a `ceph osd dump -o - | grep osd` and tell me if after last_clean_interval, there appears to be a `[XXX,YYY)` where it should be either opening and closing braces or brackets?
[4:57] <dmick> yes, I have such a thing
[4:58] <dmick> See the comment in osd/OSDMap.h about struct osd_info_t for what that means
[5:02] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[5:05] * jlogan (~Thunderbi@2001:470:b:52f:cd7d:82eb:2838:b681) has joined #ceph
[5:08] * chutzpah (~chutz@ Quit (Quit: Leaving)
[5:18] * Cube (~Adium@ Quit (Quit: Leaving.)
[5:22] <dmick> Tobarja1: does that help?
[5:52] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[6:00] * jlogan (~Thunderbi@2001:470:b:52f:cd7d:82eb:2838:b681) Quit (Remote host closed the connection)
[6:01] * jlogan (~Thunderbi@2001:470:b:52f:cd7d:82eb:2838:b681) has joined #ceph
[6:09] * deepsa (~deepsa@ Quit (Remote host closed the connection)
[6:09] * deepsa (~deepsa@ has joined #ceph
[6:27] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[6:57] * hijacker_ (~hijacker@ Quit (Ping timeout: 480 seconds)
[6:57] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[7:02] * hijacker_ (~hijacker@ has joined #ceph
[7:15] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[7:30] * loicd (~loic@ has joined #ceph
[7:38] * dmick (~dmick@2607:f298:a:607:9c04:b691:ad19:b925) has left #ceph
[8:16] * loicd (~loic@ Quit (Quit: Leaving.)
[8:29] * maelfius (~mdrnstm@ Quit (Quit: Leaving.)
[8:41] * jlogan (~Thunderbi@2001:470:b:52f:cd7d:82eb:2838:b681) Quit (Ping timeout: 480 seconds)
[8:58] * BManojlovic (~steki@ has joined #ceph
[9:06] * loicd (~loic@magenta.dachary.org) has joined #ceph
[9:08] * coredumb (~coredumb@ns.coredumb.net) Quit (Read error: Operation timed out)
[9:20] * Leseb (~Leseb@ has joined #ceph
[9:23] * maelfius (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) has joined #ceph
[9:23] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:24] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:00] * Leseb_ (~Leseb@ has joined #ceph
[10:04] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[10:07] * Leseb (~Leseb@ Quit (Ping timeout: 480 seconds)
[10:07] * Leseb_ is now known as Leseb
[10:25] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[10:31] * maelfius (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) Quit (Quit: Leaving.)
[10:50] * JIos (~Jios@ppp118-208-237-131.lns20.hba2.internode.on.net) has joined #ceph
[10:50] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[10:55] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:11] <gregorg> Hi
[11:11] <gregorg> seems ceph init script is buggy in some cases :
[11:11] <gregorg> host for mon.smith-02 is smith-02.local, i am smith-02
[11:11] <gregorg> so smith-02 is skipped
[11:12] <gregorg> service ceph stop -> do nothing cause of skip
[11:12] <gregorg> hostname=`hostname | cut -d . -f 1`
[11:13] <gregorg> perhaps in check_host() function, we should cut $host too ?
[11:17] * andret (~andre@pcandre.nine.ch) Quit (Remote host closed the connection)
[11:17] <gregorg> simply by adding "host=${host%%.*}" in line 44 in ceph_common.sh
[11:17] * andret (~andre@pcandre.nine.ch) has joined #ceph
[11:37] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has joined #ceph
[12:29] <wido> gregorg: No, the hostname in your ceph.conf should match the output of `hostname`
[12:29] <wido> that's the way it's intended to work
[12:40] * loicd (~loic@ has joined #ceph
[12:40] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[12:45] * idnc_sk (~idnc_sk@ has joined #ceph
[12:45] * deepsa (~deepsa@ Quit (Read error: Connection reset by peer)
[12:45] <idnc_sk> hi
[12:45] <idnc_sk> one quick question
[12:46] <idnc_sk> should the osd daemons run under root or a standard user
[12:46] <idnc_sk> ?
[12:46] <idnc_sk> I'm asking because
[12:47] <wido> idnc_sk: root
[12:47] <wido> Maybe the monitor could run under a different user, but the OSD can't
[12:47] <idnc_sk> wido: great, thx
[12:48] <idnc_sk> my mkcephfs: /storage/ceph/osd11) could not find 23c2fcde/osd_superblock/0/
[12:48] <idnc_sk> in index: (2) No such file or directory
[12:48] <idnc_sk> turning on debug, hold on..
[12:48] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[12:48] * deepsa (~deepsa@ has joined #ceph
[12:49] <idnc_sk> and I dont understand why it's trying to remove
[12:49] <idnc_sk> rm: cannot remove `/storage/ceph/mon11': Device or resource busy
[12:49] <idnc_sk> > sry for spam
[12:51] <joao> idnc_sk, is mkfs finishing successfully?
[12:51] <idnc_sk> no, I'm pasting the whole log to pastebin, one sec
[12:51] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[12:53] <idnc_sk> pastebin.com/K3HyPHZM
[12:53] <idnc_sk> config comming..
[12:56] <idnc_sk> pastebin.com/d207nzWM
[12:56] <idnc_sk> needs tuning + not sure about the mds syntax
[12:56] <joao> btw, these
[12:56] <joao> 2012-09-14 10:47:41.452429 7f1c12295780 -1 filestore(/storage/ceph/osd11) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[12:57] <joao> are not "real" error messages
[12:57] <idnc_sk> so - it's just a msg that the file was not yet created?
[12:57] <joao> the keyring one neither, as it is creating the said keyring right after it
[12:57] <joao> yes
[12:58] <idnc_sk> aaa, good to know, so the actual error is with the mon's I guess
[12:59] <joao> looks that way, yes
[12:59] <idnc_sk> hmm, mon11 and mon21 are mounted xfs partitions, ou wait!
[12:59] <idnc_sk> /dev/vgLocal/ceph-mon /storage/ceph/mon21 xfs rw,noexec,nodev,noatime,nodiratime,barrier=0 0 0
[12:59] <idnc_sk> can it be because of the noexec, nodev part?
[13:01] <idnc_sk> ..regardless, still not getting why it tries to remove the mon folders..
[13:02] <joao> I believe that the monitor's store mkfs tries to remove the existing mon directories in order to create a new store
[13:02] <joao> I thought this had already been deprecated; which version are you using?
[13:03] <idnc_sk> ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
[13:05] <joao> yeah, it was removed in the mean time, but 0.48.1 still tried to remove the mon directory during mkfs
[13:05] <idnc_sk> hmm, ok, will try to mount the part. to a parent directory - so mkcephfs can remove the mon dir - one sec
[13:09] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[13:12] <idnc_sk> mha, working
[13:12] <idnc_sk> great
[13:13] <idnc_sk> thanks a lot!
[13:13] <idnc_sk> so, I guess when I take the latest version, this should not be an issue
[13:13] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[13:16] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:16] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:19] <idnc_sk> juuu what a beautiful sight
[13:20] <idnc_sk> is this normal? HEALTH_WARN 4608 pgs stuck inactive; 4608 pgs stuck unclean
[13:20] <idnc_sk> will it resolve itself or should I do some type of fsck
[13:23] <idnc_sk> hoppa, probably a ntp sync issue, one sec
[13:23] <idnc_sk> ok, thx again, later
[13:23] * idnc_sk (~idnc_sk@ Quit (Quit: leaving)
[13:30] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[13:31] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:33] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[13:34] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:38] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) Quit (Ping timeout: 480 seconds)
[13:45] * JIos (~Jios@ppp118-208-237-131.lns20.hba2.internode.on.net) Quit (Remote host closed the connection)
[13:45] * idnc_sk (~idnc_sk@ has joined #ceph
[13:54] <idnc_sk> noticed an interesting issue with rbd volumes
[13:54] <idnc_sk> since rbd is using a kernel module
[13:55] <idnc_sk> if an operation on a mounted rbd stops responding, you are stuck
[13:55] <gregorg> wido: not true as you "cut" hostname command output
[13:56] <gregorg> hostname=`hostname | cut -d . -f 1`
[13:57] <idnc_sk> dd if=/dev/zero of=/mnt/test.img bs=1024M, no count since the test volume was only few megs
[13:58] <idnc_sk> > probably not directly a ceph issue, but still -
[13:59] <idnc_sk> ok, pls ignore the post above(if not already :)
[14:02] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[14:06] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) has joined #ceph
[14:20] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[14:20] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit ()
[14:21] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[14:31] <idnc_sk> btw, I have a test setup on 2 nodes, 3 osd's per node. I just shut down n1, which rendered the whole cluster inaccessible - do I have to define replication via crush first (to get an HA setup), or is this an issue caused by the config (mds fe)
[14:32] <idnc_sk> setup: 2x(3xosd;1xmds;1xmon)
[14:33] <idnc_sk> is ceph aware of osd's sitting on the same physical machine?
[14:35] <joao> idnc_sk, I believe that is on the crush map
[14:36] <joao> being there, ceph will take it into consideration
[14:37] <idnc_sk> is the crush map generated during mkcephfs or do I have to generate it on my own
[14:37] <idnc_sk> reading http://ceph.com/wiki/Custom_data_placement_with_CRUSH, give me a sec :)
[14:38] <idnc_sk> "By default, all OSDs are placed in a single pool, and replicas are placed on N (2, by default) pseudorandom nodes"
[14:38] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[14:40] * deepsa (~deepsa@ has joined #ceph
[14:42] * idnc_sk (~idnc_sk@ Quit (Quit: leaving)
[15:07] * aliguori (~anthony@cpe-70-123-140-180.austin.res.rr.com) has joined #ceph
[15:14] * huangjun (~hjwsm1989@ Quit ()
[15:28] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) Quit (Read error: Operation timed out)
[15:39] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) has joined #ceph
[16:00] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[16:22] * amatter (~amatter@ has joined #ceph
[16:22] * jlogan (~Thunderbi@2001:470:b:52f:f::3) has joined #ceph
[16:23] * jlogan (~Thunderbi@2001:470:b:52f:f::3) Quit ()
[16:24] * jlogan (~Thunderbi@2001:470:b:52f:4c2e:2ecb:2b32:e9d5) has joined #ceph
[16:24] * jlogan (~Thunderbi@2001:470:b:52f:4c2e:2ecb:2b32:e9d5) Quit ()
[16:41] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) Quit (Ping timeout: 480 seconds)
[17:01] * amatter (~amatter@ Quit (Ping timeout: 480 seconds)
[17:05] * jlogan (~Thunderbi@ has joined #ceph
[17:07] * guerby (~guerby@nc10d.tetaneutral.net) has joined #ceph
[17:09] * rumpler (c0373729@ircip2.mibbit.com) has joined #ceph
[17:11] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:14] <rumpler> I just finished configuring and starting a 6-host cluster, running 'service ceph -a status' from mon.0 reports argonaut for all the local services, but unknown version info for the rest of the services on other boxes... Is this correct?
[17:19] <stan_theman> is there a list of some big groups using ceph?
[17:19] * amatter (~amatter@ has joined #ceph
[17:27] <nhmlap> stan_theman: hrm, I don't know what's public and what isn't.
[17:28] <rumpler> This is the 1st time I have used Mibbit for IRC, and my last chat came across as rumpler, is that my username and if so how do I change it?
[17:28] <stan_theman> gluster had a nice "look at the people using gluster" page. i know it's not as important as other things, but it was neat to send to other people
[17:28] <nhmlap> stan_theman: yeah, that site isn't terribly accurate, but it'd be nice to have something similar.
[17:28] <nhmlap> that is accurate rather. ;)
[17:29] <nhmlap> rumpler: try "/nick <nickname>"
[17:29] * rumpler is now known as crispy
[17:29] <crispy> ping
[17:30] <nhmlap> crispy: pong
[17:30] <crispy> Hey it worked, thanks!
[17:30] <nhmlap> no problem
[17:32] <crispy> stan_theman: there's a ton of interest in ceph out there although I'm not sure some of those fortun 500 folks would be willing to reveal their identities
[17:32] <crispy> *fortune
[17:32] * MK_FG (~MK_FG@ Quit (Read error: Operation timed out)
[17:32] <stan_theman> heh. yeah, i dont doubt that, but i remember seeing some pretty interesting guys using gluster (pandora?) and they had some google maps thing
[17:32] <stan_theman> where you could pin your company on the world map and describe your cluster/usage
[17:33] <nhmlap> stan_theman: yeah, I used to work for the Minnesota Supercomputing Institute. We were on that map even though we didn't end up using gluster in the end.
[17:33] <stan_theman> heh
[17:33] <crispy> I wonder if the inktank folks are thinking of adding something like that to the ceph.com page
[17:33] <nhmlap> stan_theman: For all I know it may still be on there.
[17:34] * sagelap (~sage@c-66-31-47-40.hsd1.ma.comcast.net) has joined #ceph
[17:34] <nhmlap> crispy: yeah, we've got a visitor map, but I don't think we have a map of deployments.
[17:34] <nhmlap> http://www.revolvermaps.com/?target=enlarge&i=1lzi710tj7s&color=80D2DC&m=0
[17:42] * nhorman_ (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[17:42] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Read error: Connection refused)
[17:57] <jlogan> Picking up where I left off yesterday... osd start is failing.
[17:57] <jlogan> ceph.osd.4.log is empty.
[17:57] <jlogan> I used strace and did find this:
[17:57] <jlogan> open("/var/lib/ceph/osd/ceph-4/magic", O_RDONLY) = -1 ENOENT (No such file or directory)
[17:57] <jlogan> I believe this command is not setting up the disk correctly
[17:57] <jlogan> ceph-osd -d -i 4 --mkkey --mkfs
[17:58] * liiwi (liiwi@idle.fi) Quit (Remote host closed the connection)
[18:01] * sagelap (~sage@c-66-31-47-40.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[18:02] * mkampe (~markk@2607:f298:a:607:222:19ff:fe31:b5d3) Quit (Quit: Leaving.)
[18:02] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Ping timeout: 480 seconds)
[18:03] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[18:05] * mkampe (~markk@2607:f298:a:607:222:19ff:fe31:b5d3) has joined #ceph
[18:06] * Leseb (~Leseb@ Quit (Quit: Leseb)
[18:07] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[18:07] * Tv_ (~tv@2607:f298:a:607:c52:3be1:39e1:21ca) has joined #ceph
[18:10] * liiwi (liiwi@idle.fi) has joined #ceph
[18:12] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[18:12] * maelfius (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) has joined #ceph
[18:12] * maelfius (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) Quit (Remote host closed the connection)
[18:18] * liiwi (liiwi@idle.fi) Quit (Read error: Operation timed out)
[18:21] * loicd (~loic@ Quit (Ping timeout: 480 seconds)
[18:21] <Tv_> soo.. i need to come up with a command-line way to pick what ceph branch to run.. stable (argonaut/bobtail/c*) or stable-testing or autobuild (master/branchname/tagname)
[18:22] <Tv_> how do i formulate that as --flags or something?
[18:23] <Tv_> ceph-deploy install --stable (defaults to latest stable, right now argonaut soon to be bobtail)
[18:23] <Tv_> ceph-deploy install --stable=argonaut (install argonaut even when bobtail is out)
[18:23] <Tv_> ceph-deploy install --testing
[18:23] <Tv_> ceph-deploy install --dev (install unstable master)
[18:23] <Tv_> ceph-deploy install --dev=BRANCH_OR_TAG
[18:23] <Tv_> ?
[18:24] <Tv_> i'm not thrilled about --stable vs stable=foo both existing
[18:24] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:25] <elder> Is argonaut a release name? --release=argonaut?
[18:25] <Tv_> good point
[18:25] <Tv_> though this may need to drill down deeper into what minor release of the argonaut branch, perhaps
[18:25] <Tv_> but that's still a release
[18:25] <joao> what about a --release=[stable|dev|testing] and a --version/something-else=[branch|tag|release_name] ?
[18:26] <Tv_> so --release=dev would mean just tags from gitbuilder
[18:26] <Tv_> but calling master branch a "release" is weird
[18:28] * lxo (~aoliva@lxo.user.oftc.net) Quit ()
[18:28] <elder> Well if you want to support checkout by branch, why not do --branch=<branchname>
[18:28] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:28] <Tv_> elder: and then a --tag too?
[18:29] <elder> Sure. Why not do the obvious thing?
[18:29] <Tv_> elder: also, i want people to *know* they're asking for bad things to happen, so i like --unstable in the name for that ;)
[18:29] <Tv_> elder: oh target audience is admins not devs
[18:30] <elder> Hmm. Then probably having a clear definition of a release and/or version is the right vocabulary I suspect.
[18:30] <Tv_> yeah that's what i'm trying to clear up
[18:30] <Tv_> we have the "real releases" (= argonaut etc)
[18:30] <Tv_> we have "development releases" like 0.53
[18:31] <Tv_> argonaut gets minor releases inside, so we have argonaut.1 argonaut.2 etc too
[18:31] <elder> Why are admins using development releases? Early access?
[18:31] <Tv_> elder: we often ask people to do that to repro a bug etc
[18:31] <Tv_> elder: also, i think this'll be really good for qa
[18:32] <Tv_> i can install a ceph cluster in ~20 seconds: 10 for vms to come up, 10 for ceph to install ;)
[18:32] <elder> But for QA we can open ourselves up to different requirements--and support --branch or --tag
[18:32] * liiwi (liiwi@idle.fi) has joined #ceph
[18:32] <Tv_> elder: trying to serve qa without leading admins astray
[18:32] <Tv_> no sane admin would ever accidentally ask for --unstable
[18:32] <Tv_> whereas --tag=v0.53 they might
[18:33] <elder> Maybe those require a --unstable in addition to be allowed.
[18:33] <elder> Or --ireallymeanit
[18:33] <Tv_> also, the urls i construct don't differentiate tags from branches, so separate --branch and --tag is nice but meaningless
[18:34] <elder> To you it is, but not to the user.
[18:34] <Tv_> elder: i know but allowing --tag=master is just weird
[18:36] <Tv_> --dev=BRANCH_OR_TAG instead of --unstable=... might flow better
[18:36] <Tv_> so --stable / --stable=CODENAME / --testing / --testing=CODENAME / --dev / --dev=REF
[18:36] <Tv_> and perhaps CODENAME[.MINOR]
[18:36] <elder> Sounds OK to me.
[18:37] <Tv_> a good stable / testing /dev split there
[18:37] <Tv_> makes sense to me
[18:37] <elder> What does joao think?
[18:37] <elder> (Since he expressed an opinion above)
[18:38] <Tv_> lll
[18:38] <Tv_> crap
[18:38] <joao> sorry, focused on the other screen and stopped paying attention; reading last paragraphs :)
[18:38] <Tv_> this box needs a reboot, something got upgraded on the fly and things are not happy
[18:40] * Tv_ (~tv@2607:f298:a:607:c52:3be1:39e1:21ca) Quit (Remote host closed the connection)
[18:40] <joao> well, I'm not very fond of the --branch and --tag idea, the --dev/stable/testing=CODENAME seems reasonable, and I do think that requiring an --ireallymeanit when --unstable is used would be best
[18:40] <joao> but then again, I'd probably be satisfied with a --sha=
[18:41] * amatter (~amatter@ Quit (Ping timeout: 480 seconds)
[18:41] * liiwi (liiwi@idle.fi) Quit (Remote host closed the connection)
[18:46] * BManojlovic (~steki@ has joined #ceph
[18:52] * liiwi (liiwi@idle.fi) has joined #ceph
[18:56] * Tv_ (~tv@2607:f298:a:607:391b:b457:8e5c:c6ea) has joined #ceph
[18:57] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[19:00] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:03] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[19:08] * sjustlaptop (~sam@ has joined #ceph
[19:14] * chutzpah (~chutz@ has joined #ceph
[19:16] * jjgalvez (~jjgalvez@ has joined #ceph
[19:16] * sjustlaptop (~sam@ Quit (Ping timeout: 480 seconds)
[19:25] * amatter (~amatter@ has joined #ceph
[19:32] * crispy (c0373729@ircip2.mibbit.com) has left #ceph
[19:33] * crispy (c0373729@ircip2.mibbit.com) has joined #ceph
[19:37] * amatter (~amatter@ Quit (Ping timeout: 480 seconds)
[19:40] <crispy> I have a new cluster up & running, HEALTH_OK, all 48 osd's and 3 mon's services started... Can anyone point me in the direction of step-by-step instructions for generating a key for a client so I can test mount rbd?
[19:40] * loicd (~loic@magenta.dachary.org) has joined #ceph
[19:41] * loicd (~loic@magenta.dachary.org) Quit ()
[19:51] <Tv_> crispy: mkcephfs should have created a client.admin key for you, rbd commands use that by default
[19:51] <Tv_> just http://ceph.com/docs/master/rbd/rados-rbd-cmds/ should work
[19:52] * amatter (~amatter@ has joined #ceph
[20:02] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) Quit (Read error: Operation timed out)
[20:10] * dmick (~dmick@2607:f298:a:607:c0e5:8c94:9bef:ac54) has joined #ceph
[20:17] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:20] <jlogan> joshd: I needed to reboot my 2nd and 3rd hosts, then ceph-osd worked.
[20:21] <joshd> that's odd... did that trigger a remounting of partitions that had become unmounted or something?
[20:21] <jlogan> I don't think so.
[20:21] <jlogan> I was running 'umount /var/lib/ceph/osd/ceph-8 ; mkfs -t btrfs /dev/sdb1 ; mount /dev/sdb1 /var/lib/ceph/osd/ceph-8' between tries.
[20:24] * dmick-mibbit (267a14e2@ircip1.mibbit.com) has joined #ceph
[20:25] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[20:26] <dmick> crispy: sent you a private message to join our smaller conversation
[20:28] * crispy (c0373729@ircip2.mibbit.com) has left #ceph
[20:37] * sagelap (~sage@c-66-31-47-40.hsd1.ma.comcast.net) has joined #ceph
[20:55] * sjustlaptop (~sam@ has joined #ceph
[20:57] * maelfius (~mdrnstm@ has joined #ceph
[21:04] * lofejndif (~lsqavnbok@82VAAGG26.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:10] <stan_theman> does anyone know if ceph is capable of running separate mds servers?
[21:12] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[21:14] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[21:14] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:15] * maelfius (~mdrnstm@ Quit (Quit: Leaving.)
[21:17] * pentabular (~sean@adsl-70-231-129-231.dsl.snfc21.sbcglobal.net) has joined #ceph
[21:19] <jmlowe> stan_theman: yes
[21:22] <stan_theman> separate from each other? (rather than multiple, which seems to be out of favor atm)
[21:22] <jmlowe> you want active active, or active passive?
[21:24] <stan_theman> i think for this, i'd want two separate metadata servers that are unaware of each other
[21:24] <jmlowe> I believe that would be two separate clusters you are after there
[21:29] <stan_theman> i'm trying to minimize the trouble posed to me by the mds being the more flaky of the daemons. i may want multiple clusters on the same set of hardware to achieve this, but i'm not sure exactly how hairy it's getting at this point
[21:34] <jmlowe> <- not a ceph developer
[21:34] <jmlowe> I think you want to run multiple mds daemons, and you can run them anywhere as they store their data in the osd's
[21:34] <Tv_> stan_theman: you can set up multiple clusters on the same hardware, with --cluster
[21:35] <Tv_> stan_theman: plenty of scripts are still hardcoded to assume $cluster="ceph", but you should be able to do it already
[21:35] <Tv_> stan_theman: now, they'll have completely separate osds etc, not just mds
[21:36] <stan_theman> that's perfect, thanks a ton!
[21:36] <Tv_> sudo start ceph-osd cluster=foo id=42 etc should work just fine
[21:37] <jmlowe> if you are trying to solve problems with your mds going down, I'm not sure multiple clusters on the same hardware is going to get you anywhere
[21:37] <Tv_> yeah i think the hardcoded "ceph" instances are more in the chef cookbook etc, core product should be good
[21:37] <stan_theman> it's just that they fall over, passive comes up beautifully, but i don't want to end up with hundreds of osds burning through a bunch of passive mds
[21:38] <Tv_> yeah not sure if that's the *right* solution
[21:38] <Tv_> stan_theman: huh what?
[21:38] <Tv_> stan_theman: standby mdses do just about nothing
[21:38] <Tv_> what burn?
[21:39] <Tv_> stan_theman: perhaps you should state your problem, not a proposed solution
[21:39] <jmlowe> if I had to guess, I'd put money on your osd's being sick giving the mds's bad data so they fall over
[21:39] * Cube (~Adium@ has joined #ceph
[21:39] <Tv_> jmlowe: that sounds unlikely
[21:40] <Tv_> mds still crashes & hangs, that's why we aren't calling it production ready
[21:40] <Tv_> but rados hasn't corrupted data in a very long time
[21:40] * jjgalvez1 (~jjgalvez@ has joined #ceph
[21:40] <jmlowe> Tv_: clearly you've never experienced the joy of corrupt leveldbs
[21:40] <stan_theman> Tv_: i didn't mean to offend. i can issue a big rm over cephfs and the mds goes away
[21:41] <Tv_> jmlowe: oh yeah i did help debug that.. can't trust any libraries these days :(
[21:41] <Tv_> stan_theman: and a standby takes its place, right? what's this about "osds burning through mds"?
[21:41] <stan_theman> my previously-standby comes in perfectly
[21:42] <stan_theman> and some other guy issues an rm
[21:42] <jmlowe> Tv_: I think fiemap problems were trashing just about everything
[21:43] <Tv_> stan_theman: sounds like you just need to start the first daemon back, and now *it* will be the standby
[21:43] <Tv_> stan_theman: e.g. upstart will do that for you
[21:43] <Tv_> stan_theman: so yes, you've found a way to make mds crash (and we'd appreciate a good bug report!), but i still don't see how running multiple clusters or anything like that would help there
[21:43] <jmlowe> stan_theman: well I'd say either your particular ceph filesystem is broken or there is a bug that your usage patterns and hardware have exposed, either way throwing more daemons at it will just mean you have more to recover
[21:43] <Tv_> stan_theman: and i especially don't see how "osds burn through mds"
[21:44] <jmlowe> I think he means that it is trivial to knock over a mds and he's doing it often
[21:45] <stan_theman> right, i didn't mean to offend anybody
[21:46] <Tv_> not offended, puzzled
[21:46] <Tv_> stan_theman: please explain more
[21:46] <jmlowe> I don't believe it's offensive, they just want to make a good product and if it isn't working correctly they want to squash those bugs asap
[21:47] <stan_theman> i will submit the report soon! just feeling out some of my options for now. we have a couple standby md servers and some juice to kick up dead guys
[21:47] <jmlowe> how about this in cron * * * * * service ceph -s start
[21:47] <Tv_> mds status is like.. it can hang or crash for you; it may come back after restart of daemon or hose the cephfs permanently; it should not hurt mons or osds while doing so
[21:47] * jjgalvez (~jjgalvez@ Quit (Ping timeout: 480 seconds)
[21:48] <Tv_> the value in multiple clusters is to contain the damage
[21:48] <Tv_> but i'm not sure needing to maintain separate piles of osds for the clusters makes that worth it
[21:48] <stan_theman> and it got me onto another path where i can't successfully run multiple active and have bottleneck concerns
[21:48] <Tv_> jmlowe: eww
[21:48] <jmlowe> you shouldn't add that cron entry by the way
[21:49] <Tv_> stan_theman: multiple active mds is *more* likely to fail
[21:49] <stan_theman> Tv_: heh, i am fully aware :P
[21:49] <jmlowe> I once had a broken pbs/moab, only thing I could do until the fix came out was to restart frequently with a cron job
[21:50] <Tv_> jmlowe: dude, upstart/systemd/runit/s6/daemontools/*anything* other than that cron job ;)
[21:50] <stan_theman> we've pretty much got the same thing going here jmlowe. i was kind of surprised that ceph didn't try to kick the dead mds after passive takes over
[21:50] <stan_theman> because that's almost exactly always the fix
[21:51] <Tv_> restarting daemons automatically has been Standard Operating Procedure for a decade now
[21:52] <Tv_> stan_theman: not ceph's job; see upstart et al
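[Editor's note: a hedged sketch of the "see upstart et al" point — restarting a crashed mds is the init system's job. This is an illustrative upstart job fragment, not the actual file shipped in src/upstart; the paths and limits are assumptions.]

```
# Hypothetical /etc/init/ceph-mds.conf fragment:
# upstart restarts the daemon automatically when it dies,
# giving up only if it crashes 5 times within 30 seconds.
respawn
respawn limit 5 30
exec /usr/bin/ceph-mds -f --cluster "$cluster" -i "$id"
```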
[21:52] <stan_theman> i understand. perhaps the better route would be to mention it in the docs, since it sounds like everybody's doing their own homebrew method
[21:53] <jmlowe> Tv_, think rhel 4 circa 2002
[21:53] <Tv_> stan_theman: better route is kicking sysvinit to the curb -- see src/upstart ;)
[21:53] <stan_theman> heh
[21:53] <Tv_> jmlowe: existed way before that; or perhaps your definition of Standard is very RHELly
[21:56] <elder> ORHELY
[21:56] <joao> ORLY?
[21:57] <Tv_> R'hyel (http://en.wikipedia.org/wiki/R%27lyeh )
[21:57] <elder> Oh, so now it's necessary to cite references?
[21:58] <Tv_> wasn't sure if the joke was easy enough
[21:58] <joao> at least two, as wikipedia may be easily contested in some circles
[21:58] <Tv_> but "The nightmare corpse-city of R'lyeh…" is a valid description of RHEL, as far as I'm concerned
[22:00] <elder> ,___,
[22:00] <elder> {o,O}
[22:00] <elder> |)``)
[22:00] <elder> -"-"-
[22:00] <elder> O RLY?
[22:00] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[22:00] <nhmlap> high score
[22:01] * loicd (~loic@magenta.dachary.org) has joined #ceph
[22:01] <joao> rofl
[22:01] <nhmlap> actually, I found a 24 line version but I'll refrain from posting it.
[22:01] <elder> http://www.hjo3.net/orly/gal1/orly_owl.jpg
[22:03] <nhmlap> elder: http://asciiorly.ytmnd.com/
[22:03] <elder> With sound!
[22:23] <dspano> Tv_: Lol!
[22:24] <pentabular> +1
[22:36] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[22:42] * nhorman_ (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:42] * pentabular (~sean@adsl-70-231-129-231.dsl.snfc21.sbcglobal.net) Quit (Remote host closed the connection)
[22:46] * pentabular (~sean@ has joined #ceph
[22:47] * pentabular is now known as Guest7104
[22:47] * Guest7104 is now known as pentabular
[22:54] * Cube (~Adium@ Quit (Ping timeout: 480 seconds)
[22:54] * Cube (~Adium@ has joined #ceph
[22:58] * maelfius (~mdrnstm@ has joined #ceph
[22:59] * sagelap1 (~sage@176.sub-70-192-1.myvzw.com) has joined #ceph
[22:59] <Tv_> 3-node ceph cluster deployment, no optimizations, no parallelism, includes apt-get update && install: 1m20sec; time taken by apt: 1m1sec
[22:59] <Tv_> <3 ceph-deploy
[22:59] * sagelap (~sage@c-66-31-47-40.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[23:00] <joao> Tv_, cool!
[23:00] <Tv_> i expect apt time to decrease by n, if i parallelize it
[23:00] <Tv_> but not interesting right now
[23:00] <Tv_> usability first
[23:01] <Tv_> (and you can parallelize it outside of the tool just fine, already)
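[Editor's note: a sketch of parallelizing the slow apt/install step "outside of the tool", as Tv_ suggests — one backgrounded install per host. The host names are invented; `ceph-deploy install <host>` is the relevant subcommand.]

```shell
# Run the package install on all nodes concurrently instead of
# serially; the ~1 minute apt time then overlaps across hosts.
for host in node1 node2 node3; do
    ceph-deploy install "$host" &
done
wait   # block until every backgrounded install finishes
```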
[23:01] <joao> and I'm sure that even if you don't optimize it, those 19 extra seconds won't be noticeable in the whole picture :)
[23:08] * steki-BLAH (~steki@ has joined #ceph
[23:11] <dmick> the ASCII owl is definitely the highlight of my chat this week
[23:11] * BManojlovic (~steki@ Quit (Read error: Operation timed out)
[23:11] <dmick> next to the Lovecraft, even
[23:11] * lofejndif (~lsqavnbok@82VAAGG26.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[23:28] * sagelap1 (~sage@176.sub-70-192-1.myvzw.com) Quit (Ping timeout: 480 seconds)
[23:35] * pentabular (~sean@ has left #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.