#ceph IRC Log

IRC Log for 2013-03-08

Timestamps are in GMT/BST.

[0:10] * aliguori (~anthony@32.97.110.51) Quit (Quit: Ex-Chat)
[0:16] * The_Bishop (~bishop@93.182.144.2) Quit (Ping timeout: 480 seconds)
[0:20] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[0:25] * scuttlemonkey changes topic to 'v0.56.3 has been released -- http://goo.gl/f3k3U || argonaut v0.48.3 released -- http://goo.gl/80aGP || New Ceph Monitor Changes http://ow.ly/ixgQN'
[0:26] <scuttlemonkey> jmlowe: your quote was used in the new blog entry ^^
[0:27] * Philip_ (~Philip@hnvr-4d07ac83.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[0:27] * The_Bishop (~bishop@2001:470:50b6:0:d08e:b805:c836:825c) has joined #ceph
[0:51] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[0:53] * rinkusk (~Thunderbi@CPE00259c467789-CM00222d6c26a5.cpe.net.cable.rogers.com) Quit (Ping timeout: 480 seconds)
[0:54] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:55] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:56] <infernix> what happens if an object on an OSD gets corrupted and it is being read by a client?
[0:56] <infernix> does the client receive said corrupted data?
[0:56] <Vjarjadian> scuttlemonkey, you work with proxmox at all?
[0:59] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) Quit (Remote host closed the connection)
[1:02] <iggy> interesting read
[1:03] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[1:05] <dmick> infernix: that's what scrub is for
[1:05] * themgt (~themgt@208.69.231.2) has joined #ceph
[1:06] <janos> the blog shows release of .57 and .58, but topic and the fedora repo still have .56.3
[1:07] <janos> is there a reason for the disconnect?
[1:07] <janos> or just busy
[1:07] <janos> like are .57/.58 considered not really ready?
[1:07] <dmick> repos have LTS releases
[1:07] <dmick> development versions are available in a different place
[1:08] <dmick> .56.3 is "bobtail.3"
[1:08] <janos> ah i didn't realize which were LTS and not
[1:08] <janos> ah, gotcha
[1:08] <janos> thanks
[1:14] * tryggvil (~tryggvil@95-91-243-251-dynip.superkabel.de) Quit (Quit: tryggvil)
[1:16] <infernix> dmick: on btrfs i presume?
[1:17] <infernix> last i checked it was still recommended to use xfs in production?
[1:20] <infernix> and does the btrfs scrub play nice with ceph, e.g. does it replicate a second or third copy of the object if the first is found to be corrupted with scrub?
[1:20] <dmick> infernix: no, ceph has scrub and deep-scrub
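(A rough sketch of kicking the scrub and deep-scrub mentioned above off by hand; the osd id and pg id are placeholders, and repair simply asks Ceph to try to fix an inconsistent pg.)

    import subprocess

    subprocess.check_call(['ceph', 'osd', 'scrub', '0'])        # light scrub of osd.0
    subprocess.check_call(['ceph', 'osd', 'deep-scrub', '0'])   # also checksum object data
    subprocess.check_call(['ceph', 'pg', 'repair', '2.1f'])     # attempt repair of a flagged pg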
[1:22] <jmlowe> btrfs scrub will fix errors in btrfs or if you have a failing disk, DO NOT USE BTRFS with kernel versions < 3.8 and ceph, you WILL LOSE DATA
[1:23] <infernix> which is why i'm still on xfs. the btrfs file-size corruption on 3.7 was only fixed a month ago
[1:24] <jmlowe> yeah, I'm the one who found that bug and got it reported with sjust and sage's help
[1:25] <scuttlemonkey> Vjarjadian: me personally? No, I haven't done much with it
[1:25] <infernix> i'll just stick with this for a little longer, i can blow up my entire production deployment without too much trouble as i'm refreshing all data in it daily
[1:26] <jmlowe> that being said having raid 10 for data and metadata on with 4 disks per btrfs filesystem kept my osd's alive for a few weeks with a failing drive
[1:28] * Tiger (~kvirc@80.70.238.91) Quit (Ping timeout: 480 seconds)
[1:40] * noob22 (~yaaic@70-1-141-152.pools.spcsdns.net) has joined #ceph
[1:42] <noob22> nhm: I haven't gotten a chance to fire up ceph on centos 6 yet but I was wondering: if I have a vanilla kernel build, would the rbd portion 'just work'?
[1:42] <noob22> it looked like the rpm packages provided everything I needed but just needed a newer kernel
[1:47] <iggy> noob22: rbd (and cephfs) have been merged in the upstream kernel for some time now
[1:48] * noob23 (~yaaic@108.121.25.144) has joined #ceph
[1:48] <dmick> ...but the versions that are there may be way out of date
[1:48] * noob23 (~yaaic@108.121.25.144) has left #ceph
[1:49] <dmick> how many noobs do we have here, jeez
[1:51] * noob23 (~yaaic@108.121.25.144) has joined #ceph
[1:51] <noob23> piggy:
[1:51] <noob23> iggy: sorry my connection is crap here
[1:52] <noob23> so there's nothing else I need other than a newer kernel. I'm going to give this a shot in a vm tonight :-)
[1:52] <iggy> I was assuming using an up to date vanilla kernel, but yeah, if you mean using like a 2.6.32 vanilla kernel... don't bother
[1:53] <noob23> right. I mean a newer 3.x kernel
[1:53] <iggy> the newer the better generally speaking
[1:53] <dmick> oh. sure, if you're building kernels, have at it
[1:53] * noob22 (~yaaic@70-1-141-152.pools.spcsdns.net) Quit (Ping timeout: 480 seconds)
[1:53] <dmick> the only thing newer than the upstream is our own kernel tree
[1:53] <dmick> github.com/ceph/ceph-client
[1:54] <noob23> good point. I didn't think to use that
[1:54] <iggy> does the kernel rbd client support v2 images yet?
[1:54] <dmick> not yet
[1:54] <iggy> so yeah... that level of bleeding edge probably won't buy you much
[1:55] <noob23> right
[1:56] <dmick> bugs, mostly
[1:57] <noob23> heh
[1:57] <iggy> I wonder what the diff is between 3.8.2 and ceph-client
[1:57] <phantomcircuit> what would be the best way to manipulate rbd images from python ?
[1:57] <phantomcircuit> i'd like to cut libvirtd out of that part of my infrastructure...
[1:58] * jlogan1 (~Thunderbi@2600:c00:3010:1:3500:efc8:eaed:66fd) Quit (Ping timeout: 480 seconds)
[1:58] <dmick> iggy: easy to check.
[1:58] * darkfader (~floh@88.79.251.60) Quit (Remote host closed the connection)
[1:58] <dmick> phantomcircuit: depends on what you mean by manipulate, but there is a librbd Python binding
[1:59] <phantomcircuit> currently it's just create, destroy, expand
[1:59] <iggy> a lot of the basic stuff can be done manipulating stuff in /sys
[1:59] <iggy> I don't know about expand (re: /sys)
[1:59] * darkfader (~floh@88.79.251.60) has joined #ceph
[1:59] <dmick> iggy: if you're using krbd
[2:00] <phantomcircuit> optimally i would like to allow for creating v2 clone volumes from images to save disk space
[2:00] <dmick> phantomcircuit: you can certainly do those things
[2:00] <iggy> then yeah, you have to go the librbd route
[2:00] <phantomcircuit> currently all volumes are v1
[2:00] <phantomcircuit> ok
[2:00] <phantomcircuit> it's not possible to mount an rbd v2 volume yet right?
[2:00] <dmick> http://ceph.com/docs/master/rbd/librbdpy/
[2:00] <dmick> not in the kernel
[2:00] <iggy> or subprocess'ing the rbd command line tool
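(A minimal sketch of the create / resize ("expand") / remove operations through the librbd Python binding linked above; the conf path, pool name and image name are placeholders.)

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')                        # pool name assumed
    try:
        rbd_inst = rbd.RBD()
        rbd_inst.create(ioctx, 'myimage', 10 * 1024 ** 3)    # create a 10 GiB image
        image = rbd.Image(ioctx, 'myimage')
        try:
            image.resize(20 * 1024 ** 3)                     # grow it to 20 GiB
        finally:
            image.close()
        rbd_inst.remove(ioctx, 'myimage')                    # destroy it again
    finally:
        ioctx.close()
        cluster.shutdown()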
[2:01] <phantomcircuit> so i cannot say create a cow volume, mount it, edit config files, unmount
[2:01] <dmick> no; you'd have to do that thru a VM ATM
[2:01] * themgt (~themgt@208.69.231.2) Quit (Quit: themgt)
[2:01] <phantomcircuit> ok
[2:02] <phantomcircuit> it should be easy enough to setup an init script which does configuration beyond dhcp settings
[2:02] <dmick> or you could take your chances with rbd-fuse
[2:02] <dmick> which is there, but a bit raw
[2:02] <phantomcircuit> lol fuse
[2:02] <phantomcircuit> no thank you :)
[2:02] <dmick> <shrug> it works great for me
[2:02] <dmick> particularly for little things like that
[2:02] <phantomcircuit> my best experience with fuse is sshfs
[2:02] <iggy> if your editing was happening in a VM, you could attach it as a regular qemu block device
[2:02] <phantomcircuit> and that behaves fairly poorly if anything goes even slightly wrong :/
[2:03] <phantomcircuit> iggy, yeah
[2:03] <phantomcircuit> so far i've thought of a couple of ways to do it
[2:03] <dmick> I'm just saying. I've mounted ext4 with rbd-fuse and done fio tests.
[2:03] * ScOut3R_ (~scout3r@1F2EAE22.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[2:03] <phantomcircuit> the most straight forward is to mount and edit directly
[2:03] <phantomcircuit> followed by an init script which pulls config that users could disable if they wanted
[2:03] <iggy> use something like virt-edit...
[2:03] * jjgalvez1 (~jjgalvez@12.248.40.138) Quit (Quit: Leaving.)
[2:04] <iggy> at least it's a little automated
[2:04] <phantomcircuit> followed by installing an ssh key and running scripts over ssh
[2:04] * dmick is intrigued
[2:04] <phantomcircuit> which is afaict what ovh does actually
[2:04] <iggy> but that's not exactly taking libvirt out of your setup
[2:05] <phantomcircuit> is libguestfs based on libvirt?
[2:05] <dmick> libguestfs. who knew.
[2:05] * BillK (~BillK@58-7-124-91.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[2:05] <iggy> it uses libvirt, yeah
[2:06] <phantomcircuit> libvirtd is basically broken in the way it deals with rbd
[2:06] <phantomcircuit> pools -> volume right
[2:06] <phantomcircuit> when libvirtd starts it connects to each monitor
[2:06] <ShaunR> man i'm doing some simple dd tests and ceph's loosing badly.. any ideas why? http://pastebin.ca/2329729
[2:07] <phantomcircuit> if any of them are offline it refuses to start rbd internally and in turn refuses to start any vm that uses rbd
[2:07] <ShaunR> i'm testing a local 4 disk raid10 array against 3 ceph storage servers with 3 disks each.
[2:07] <ShaunR> 10 concurrent vm's
[2:07] <iggy> phantomcircuit: sounds like a bug... is this something that's well known and just not fixed yet?
[2:08] <iggy> <--- not a libvirt user, so I don't keep up
[2:08] <phantomcircuit> iggy, im not sure actually it's certainly not documented anywhere
[2:08] <phantomcircuit> ShaunR, for the benchmarks with oflag=direct it's almost certainly caused by the added latency of ceph and the network
[2:09] * noob23 (~yaaic@108.121.25.144) Quit (Quit: Yaaic - Yet another Android IRC client - http://www.yaaic.org)
[2:10] <ShaunR> phantomcircuit: that would only account for two of the tests...
[2:10] <ShaunR> for the read test i couldn't use direct when using /dev/null as the output
[2:10] <iggy> those numbers all look bad (even the lvm ones which shouldn't be that bad)
[2:10] <ShaunR> and ceph mostly ran around 5MB/s whereas the LVM ran 12MB/s
[2:11] <phantomcircuit> ShaunR, run both of them such that the results are returned from cache
[2:11] <phantomcircuit> it will give you an idea what the overhead for network/ceph is compared to lvm
[2:11] <phantomcircuit> and actually iggy has a good point
[2:11] <phantomcircuit> the lvm numbers are terrible
[2:11] <phantomcircuit> bs=1G
[2:12] <phantomcircuit> ShaunR, how much memory on the system
[2:12] <ShaunR> each VM 1G
[2:12] <iggy> that's not good
[2:13] <iggy> under ideal circumstances, you would have ended up going at least partially into ram
[2:13] <ShaunR> why would the vm's need more ram?
[2:13] <iggy> *into swap
[2:14] <iggy> understand what bs is
[2:14] <phantomcircuit> ShaunR, the buffer almost certainly causes swapping
[2:14] <phantomcircuit> try it again with bs=1M
[2:14] * BillK (~BillK@124-169-104-82.dyn.iinet.net.au) has joined #ceph
[2:14] <iggy> that says read 1G of data from 1 place and write it to another
[2:14] <phantomcircuit> after 1M if the overhead is significant relative to size you're screwed anyways since most iops are not 1M
[2:15] <iggy> read 1G of data _all at once_
[2:15] <ShaunR> I see what your saying
[2:15] <ShaunR> I'll run them again, is 1M ideal for this test?
[2:15] * alram (~alram@38.122.20.226) Quit (Quit: leaving)
[2:16] <iggy> it's going to depend
[2:16] <iggy> your lvm will probably do better with 128K (or whatever the stripe size is)
[2:16] <iggy> rbd might do better with 4M
[2:16] <iggy> but 1M is a good starting point
[2:16] <phantomcircuit> iggy, i use it since it's a middle ground
[2:17] <iggy> fair point
[2:17] <phantomcircuit> although for a test like that without the flush op rbd should do fine with 128k since they should end up being merged
[2:17] <phantomcircuit> mostly
[2:18] <iggy> something close to max MTU...
[2:18] * LeaChim (~LeaChim@b0faa0c8.bb.sky.com) Quit (Read error: Operation timed out)
[2:18] <sstan_> 8k ?
[2:19] <iggy> for jumbo frames sure
[2:20] <ShaunR> should i not be using oflag=direct?
[2:22] <iggy> it's not really something that most apps do... so from a testing actual workloads perspective...
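(To make the block-size discussion concrete, a rough sketch — the scratch path is a placeholder — that keeps the total write volume at 1 GiB while sweeping bs, with oflag=direct as in the tests above.)

    import subprocess

    total = 1024 ** 3                                   # 1 GiB per run
    for label, bs in (('128K', 128 * 1024),
                      ('1M', 1024 ** 2),
                      ('4M', 4 * 1024 ** 2)):
        count = total // bs                             # keep total volume constant
        subprocess.check_call(['dd', 'if=/dev/zero', 'of=/mnt/test/ddfile',
                               'bs=%s' % label, 'count=%d' % count,
                               'oflag=direct'])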
[2:25] <dmick> gah. finally figured out why my pastebin.ca access was broken
[2:25] <dmick> they must have an IP-addr-based vhost, and it doesn't have the v6 addr, but DNS does. Doh.
[2:28] <dmick> and I'd tell slepp, but his CAPTCHA is braindead. sigh.
[2:28] <dmick> no good deed goes unpunished
[2:32] <ShaunR> Here's a new set of tests, same tests i didnt change anything but bs
[2:32] <ShaunR> http://pastebin.ca/2329737
[2:33] <ShaunR> i forgot to update the commands at the top of that pastebin
[2:33] <ShaunR> LVM is still kicking its butt
[2:33] <iggy> gregaf: looks fine to me
[2:33] <ShaunR> 4 disks against 9 disks
[2:33] <iggy> errr... -gregaf
[2:34] <iggy> there is overhead in maintaining replicas
[2:35] <ShaunR> cache on the vm's is set to none for lvm and writeback on rbd
[2:35] <ShaunR> (qemu cache)
[2:35] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[2:35] <iggy> gregaf: would be nice to get a little blog post about the samba integration... just read the cephfs blog post and saw it in there... was the first I'd heard of it
[2:36] <gregaf> they aren't upstream right now, but they're available
[2:36] <gregaf> *pokes slang*
[2:36] <iggy> ShaunR: are those dd's all running simultaneously?
[2:36] <gregaf> I'm actually not sure where they live
[2:36] <ShaunR> iggy: i understand that but 4 disks still kicking 9 disks butt?!
[2:36] <ShaunR> iggy: ya, using pssh
[2:36] <iggy> gregaf: https://github.com/ceph/samba/commit/c0e9d806e577b6f1898a899f43d924ec22e2e2ab at least
[2:37] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[2:37] * themgt (~themgt@97-95-235-55.dhcp.sffl.va.charter.com) has joined #ceph
[2:37] <gregaf> yep, that's probably it
[2:37] <nwat> gregaf: I got rid of that CPU hog problem in MDS, and now am facing a new problem. All my clients are getting connection refused attempts in their logs. The mds has 16 GB of virtual memory used, but other than that it is just sitting there idle.
[2:37] <iggy> ShaunR: are the guests all on different hosts?
[2:38] <ShaunR> no, guests are all on the same host
[2:38] * nick5 (~nick@74.222.153.12) Quit (Remote host closed the connection)
[2:38] <ShaunR> are you thinking a network speed limitation?
[2:38] <iggy> so you're effectively limited to 1gbit...
[2:38] * nick5 (~nick@74.222.153.12) has joined #ceph
[2:38] <iggy> that seems like pretty good numbers for 1gbit
[2:39] <ShaunR> ya, this host has a single gbit connection to the storage
[2:39] <ShaunR> the storage hosts have 1 gb public and 1 gb private
[2:39] <gregaf> nwat: more context, please :)
[2:39] <gregaf> are these new clients or are they trying to reconnect?
[2:40] <gregaf> does the MDS think it has active sessions? what state is it in?
[2:40] <nwat> gregaf: new clients connections
[2:40] <nwat> root@issdm-44:~# ceph mds stat
[2:40] <nwat> e463: 1/1/1 up {0=a=up:active}
[2:41] <ShaunR> iggy: i'm going to run these tests again tomorrow but have them write more data (rather than 1M), i'm going to monitor the network on the host and see whats up
[2:42] <nwat> gregaf: where is the active session info located?
[2:42] <ShaunR> with a gbit connection i really shouldnt be able to pull more than 100MB/S
[2:42] <ShaunR> i guess this is going to be tough to test unless i can bond two connections on the host
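(Rough line-rate arithmetic for that single gigabit link; the overhead factor is only a ballpark guess.)

    link_bits = 10 ** 9                     # 1 gbit/s
    theoretical = link_bits / 8 / 10 ** 6   # 125 MB/s before any overhead
    usable = theoretical * 0.9              # rough allowance for TCP/ethernet overhead
    per_guest = usable / 10                 # ten guests sharing one link
    print(theoretical, usable, per_guest)   # 125, ~112, ~11 MB/s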
[2:43] <gregaf> nwat: not sure we expose it anywhere great, actually :/
[2:45] <gregaf> nwat: but in particular I don't see ECONNREFUSED anywhere in our code so I suspect it's a more oblique issue, like max file descriptors open or something
[2:46] <gregaf> I assume you checked that the monitors think it's up
[2:46] <gregaf> try telnet'ing the port and make sure it responds
[2:48] <gregaf> I'm off home, night all!
[3:06] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[3:07] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[3:14] * KevinPerks1 (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[3:14] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Read error: Connection reset by peer)
[3:25] <janos> dang! osd's on own back end network with bonding round-robin mode. found one with questionable cable that was negotiating 100Mb. slowing everything down
[3:26] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[3:27] <janos> so i have a small machine which is basically just running samba and it's connecting to ceph's public network
[3:27] <janos> i've tested the heck out of that network and the cluster network
[3:27] <janos> and getting great speeds
[3:27] <janos> and dones some rados bench tests
[3:27] <janos> -s
[3:28] <janos> but when i use samba against an rbd...
[3:28] <janos> it starts fast and quicking drops to the 30MB/s range
[3:28] <janos> *quickly
[3:28] <janos> any ideas?
[3:28] <janos> .56.3
[3:28] <janos> kernel rbd
[3:28] * themgt (~themgt@97-95-235-55.dhcp.sffl.va.charter.com) Quit (Quit: themgt)
[3:28] <janos> 3.8.1 kernel
[3:31] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[3:32] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:50] * yanzheng (~zhyan@134.134.139.76) has joined #ceph
[3:53] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[4:03] * dpippenger (~riven@216.103.134.250) Quit (Remote host closed the connection)
[4:06] * madkiss1 (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[4:10] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[4:16] * KevinPerks1 (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[4:26] * diegows (~diegows@190.188.190.11) Quit (Ping timeout: 480 seconds)
[4:33] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[4:33] * loicd (~loic@magenta.dachary.org) has joined #ceph
[4:38] * chutzpah (~chutz@199.21.234.7) Quit (Quit: Leaving)
[4:40] * andrew (~andrew@ip68-231-33-29.ph.ph.cox.net) Quit (Quit: andrew)
[4:46] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[4:49] * Gugge-47527 (gugge@kriminel.dk) Quit (Read error: Connection reset by peer)
[4:50] * Gugge-47527 (gugge@kriminel.dk) has joined #ceph
[4:57] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[5:20] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[5:27] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[5:51] * janeUbuntu (~jane@202.29.6.19) Quit (Ping timeout: 480 seconds)
[6:03] * janeUbuntu (~jane@118.175.7.68) has joined #ceph
[6:11] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[6:11] * loicd (~loic@magenta.dachary.org) has joined #ceph
[6:15] * dmick (~dmick@2607:f298:a:607:514b:1518:5845:bb5f) Quit (Quit: Leaving.)
[6:25] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[6:27] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[6:43] * The_Bishop (~bishop@2001:470:50b6:0:d08e:b805:c836:825c) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[6:43] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[7:13] * yehuda_hm (~yehuda@2602:306:330b:1410:d9d2:d915:85bb:6501) Quit (Ping timeout: 480 seconds)
[7:21] * yehuda_hm (~yehuda@2602:306:330b:1410:d18d:8c54:a9e0:2cc4) has joined #ceph
[7:23] * xiaoxi (~xiaoxiche@134.134.139.74) has joined #ceph
[7:44] * yehuda_hm (~yehuda@2602:306:330b:1410:d18d:8c54:a9e0:2cc4) Quit (Ping timeout: 480 seconds)
[8:19] * KindOne (~KindOne@h79.24.131.174.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[8:20] * leseb (~leseb@78.251.34.83) has joined #ceph
[8:22] * sdx32 (~sdx23@with-eyes.net) Quit (Remote host closed the connection)
[8:22] * sdx23 (~sdx23@with-eyes.net) has joined #ceph
[8:36] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[8:39] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[8:46] * KindOne (KindOne@h174.23.131.174.dynamic.ip.windstream.net) has joined #ceph
[8:49] * yehuda_hm (~yehuda@99-48-177-65.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[8:53] * Morg (b2f95a11@ircip2.mibbit.com) has joined #ceph
[9:04] * yehuda_hm (~yehuda@99-48-177-65.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[9:07] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[9:14] * ShaunR (~ShaunR@staff.ndchost.com) Quit (Read error: Connection reset by peer)
[9:14] * ShaunR (~ShaunR@staff.ndchost.com) has joined #ceph
[9:16] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has joined #ceph
[9:17] * tryggvil (~tryggvil@2a02:8108:80c0:1d5:24c4:b957:6006:3c6) has joined #ceph
[9:19] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[9:21] * gerard_dethier (~Thunderbi@85.234.217.115.static.edpnet.net) has joined #ceph
[9:22] * tryggvil (~tryggvil@2a02:8108:80c0:1d5:24c4:b957:6006:3c6) Quit ()
[9:24] * Philip_ (~Philip@hnvr-4d07ac83.pool.mediaWays.net) has joined #ceph
[9:27] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[9:33] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[9:36] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[9:40] * LeaChim (~LeaChim@b0faa0c8.bb.sky.com) has joined #ceph
[9:43] * yehuda_hm (~yehuda@2602:306:330b:1410:b492:72d4:9825:1bd9) has joined #ceph
[9:47] * l0nk (~alex@83.167.43.235) has joined #ceph
[9:52] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[9:52] * leseb_ (~leseb@83.167.43.235) has joined #ceph
[9:55] * esammy (~esamuels@host-2-103-101-90.as13285.net) has joined #ceph
[9:55] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[9:56] * gucki (~smuxi@HSI-KBW-095-208-162-072.hsi5.kabel-badenwuerttemberg.de) has joined #ceph
[9:56] <gucki> hi there
[9:56] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[9:56] <gucki> i'm having a really strange problem with data integrity
[9:57] <gucki> when i import a 10gb file (vm image) using "rbd import .." and then export it again "rbd export ..." the md5sum changed. i didn't start the vm of course ;)
[9:57] <gucki> is there any way to find out why the data gets corrupted?
[9:58] <gucki> the cluster is latest stable bobtail
[9:59] <gucki> the cluster reports it is healty, no errors. i tried the import twice now
[9:59] <absynth> and does the image work?
[9:59] <absynth> as in, can you start it?
[9:59] <absynth> i seem to remember we had similar observations
[10:01] * dosaboy (~user1@host86-164-227-220.range86-164.btcentralplus.com) has joined #ceph
[10:01] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: Man who run behind car get exhausted)
[10:02] * ninkotech (~duplo@ip-89-102-24-167.net.upcbroadband.cz) Quit (Remote host closed the connection)
[10:02] <gucki> absynth: no, it's not working. it crashed with many fs errors.. :(
[10:02] * ninkotech (~duplo@ip-89-102-24-167.net.upcbroadband.cz) has joined #ceph
[10:02] * xiaoxi (~xiaoxiche@134.134.139.74) Quit (Ping timeout: 480 seconds)
[10:03] <absynth> hm
[10:03] <gucki> absynth: was it "only" an import problem for you or was some hardware damaged?
[10:03] <absynth> i don't remember
[10:03] * yanzheng (~zhyan@134.134.139.76) Quit (Remote host closed the connection)
[10:05] <gucki> absynth: is there any way to get an internal md5sum using ceph rbd? (so without exporting it)
[10:05] <absynth> is that for every image you import or only for one specific one?
[10:06] <absynth> no idea, sorry
[10:06] <gucki> absynth: i noticed it with another image, but it worked after i imported it again, so i thought i had made a mistake. but now it seems there's really something wrong and i'm a bit worried.. :(
[10:07] <gucki> absynth: i just removed the image and imported and then exported it again, to see if the md5sum is the same as before
[10:11] * yehuda_hm (~yehuda@2602:306:330b:1410:b492:72d4:9825:1bd9) Quit (Ping timeout: 480 seconds)
[10:11] <gucki> absynth: ok, the md5sum is exactly the same wrong one.....so the import seems to be broken. a hardware failure wouldn't produce the same md5sum again i assume :)
[10:11] <gucki> absynth: anyway, this is really bad :(
[10:11] <absynth> yeah
[10:11] <absynth> and yeah
[10:11] <absynth> it is
[10:12] <absynth> i think there *might* be a legitimate reason for the md5sum to change during import, but the rbd should definitely not be broken
[10:15] <gucki> absynth: i only thought of holes, but in fact that shouldn't matter because holes should be returned as zeros. so the md5sum should be equal
[10:15] <gucki> absynth: in fact using a hex editor i can see there's a difference in the images :(
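(For anyone wanting to reproduce the check described above, a small sketch — file and image names are placeholders — that round-trips a local image through rbd import/export and compares checksums.)

    import hashlib
    import subprocess

    def md5_of(path, chunk=4 * 1024 * 1024):
        h = hashlib.md5()
        with open(path, 'rb') as f:
            for block in iter(lambda: f.read(chunk), b''):
                h.update(block)
        return h.hexdigest()

    src = 'vm-image.raw'                                      # local source image
    subprocess.check_call(['rbd', 'import', src, 'md5-test-img'])
    subprocess.check_call(['rbd', 'export', 'md5-test-img', '/tmp/md5-test.raw'])
    print(md5_of(src))
    print(md5_of('/tmp/md5-test.raw'))   # should match if the round trip is clean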
[10:20] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[10:21] * yehuda_hm (~yehuda@2602:306:330b:1410:b492:72d4:9825:1bd9) has joined #ceph
[10:21] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[10:24] <absynth> joao: around?
[10:24] <absynth> gucki: you're probably best off waiting (as usual) and/or opening a ticket
[10:25] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[10:25] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[10:29] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[10:30] * yehuda_hm (~yehuda@2602:306:330b:1410:b492:72d4:9825:1bd9) Quit (Ping timeout: 480 seconds)
[10:33] <joao> absynth, am now
[10:33] <joao> what's up?
[10:34] <absynth> joao: regarding your blog post about the mon changes
[10:34] <absynth> if we eventually upgrade our production setup to .58, should we do so supervised or do you think it will be painless?
[10:34] <joao> on the monitor it should be painless
[10:36] <joao> the critical part of upgrading is the store conversion, which should happen without a hitch
[10:37] <joao> the monitors may not hold quorum during the upgrade though
[10:39] <absynth> that means i/o interruption, right?
[10:39] <joao> but that should not be a problem if you only upgrade, say, n/2 of your monitors, and then convert one other
[10:39] <absynth> n=3
[10:40] <joao> well, upgrade one monitor; once that's done, upgrade a second one; quorum won't form until that second one has fully converted and joined the quorum
[10:41] <joao> I have no idea how long that can take tbh; I only tested conversion on teuthology, and the workload imposed was far from a production environment
[10:41] <joao> but I'm sure we will get around to test it more before cuttlefish
[10:42] <absynth> hmmm, cuttlefish
[10:42] <absynth> yummy
[10:42] <joao> indeed
[10:42] <joao> can't think of that release without thinking about lunch
[10:42] <absynth> you can go to océanario and get one from the tank ;)
[10:42] <joao> I wish
[10:42] <joao> they won't let me
[10:42] <absynth> you shouldn't have tried to skin the otters.
[10:42] <absynth> are they still alive?
[10:42] <joao> :(
[10:43] <joao> I don't know; they tell me nothing and I can't even get within 50m of the place
[10:43] <absynth> damn those restraining orders
[10:44] * BillK (~BillK@124-169-104-82.dyn.iinet.net.au) Quit (Quit: Leaving)
[10:53] <gucki> joao: hey. do you have any idea why the import could mess up the data? :(
[10:54] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[10:56] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[10:56] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[10:57] <jtang> good morning
[10:57] <gucki> hi :)
[10:58] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:59] * yehuda_hm (~yehuda@99-48-177-65.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[11:00] <joao> gucki, sorry, not really :\
[11:01] <joao> opening a ticket on that might be best, and/or wait around for joshd maybe
[11:02] <gucki> joao: yeah, i'll open a ticket for sure. i just tried to import it to a different pool, but same wrong md5 :(
[11:04] <gucki> joao: i already tried format1 and format2, same issue :(
[11:07] * andrew (~andrew@ip68-231-33-29.ph.ph.cox.net) has joined #ceph
[11:07] * KindTwo (KindOne@h147.26.131.174.dynamic.ip.windstream.net) has joined #ceph
[11:08] * andrew (~andrew@ip68-231-33-29.ph.ph.cox.net) Quit ()
[11:08] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[11:09] * loicd (~loic@magenta.dachary.org) has joined #ceph
[11:09] * Philip__ (~Philip@hnvr-4dbb3e96.pool.mediaWays.net) has joined #ceph
[11:11] * KindOne (KindOne@h174.23.131.174.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[11:11] * KindTwo is now known as KindOne
[11:12] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[11:16] * Philip_ (~Philip@hnvr-4d07ac83.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[11:16] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[11:17] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[11:21] * yehuda_hm (~yehuda@99-48-177-65.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[11:28] <loicd> joao: nice post http://ceph.com/dev-notes/cephs-new-monitor-changes/ :-)
[11:28] <joao> thanks :)
[11:39] <todin> I shut down one of my osds because of a disk failure, but the cluster doesn't start the backfill.
[11:40] <todin> probably I messed up my crushmap
[11:45] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[11:45] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[11:50] <absynth> is noout set?
[11:54] * yanzheng (~zhyan@134.134.139.72) has joined #ceph
[12:03] * BillK (~BillK@124-169-104-82.dyn.iinet.net.au) has joined #ceph
[12:06] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) has joined #ceph
[12:09] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Operation timed out)
[12:18] * leseb_ (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[12:20] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[12:22] <gucki> joao: ok, ticket is here now http://tracker.ceph.com/issues/4388
[12:24] <joao> gucki, cool, will point the guys to it as soon as they wake up
[12:29] <gucki> joao: great, thanks :)
[12:31] <todin> absynth: no it is not set
[12:32] * leseb_ (~leseb@83.167.43.235) has joined #ceph
[12:35] * gregorg_taf (~Greg@78.155.152.6) Quit (Read error: Connection reset by peer)
[12:36] * gregorg_taf (~Greg@78.155.152.6) has joined #ceph
[12:41] * Philip__ (~Philip@hnvr-4dbb3e96.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[12:47] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[12:47] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[12:48] * rtek (~sjaak@empfindlichkeit.nl) Quit (Quit: leaving)
[12:53] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[12:54] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[12:58] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[12:58] * loicd (~loic@magenta.dachary.org) has joined #ceph
[13:04] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[13:12] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[13:12] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[13:14] <barryo> I'm attempting my first test install and can't figure out which RPM (el6) installs the rbd kernel module?
[13:17] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Remote host closed the connection)
[13:19] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[13:32] <barryo> is there a kmod package for rbd that will work on el6?
[13:37] <jtang> barryo: probalby not, i've not seen one myself
[13:38] <jtang> we've been running the more recent kernels from elrepo
[13:38] <jtang> to get rbd/cephfs support
[13:38] <jtang> it works, but its not ideal
[13:41] * LiRul (~lirul@91.82.105.2) has joined #ceph
[13:41] <LiRul> hi there
[13:42] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[13:42] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[13:42] * yanzheng (~zhyan@134.134.139.72) Quit (Remote host closed the connection)
[13:44] <LiRul> do you have plans to implement async replication (for multiple datacenters)?
[13:46] * loicd (~loic@90.84.146.198) has joined #ceph
[13:50] * leseb_ (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[13:52] * leseb_ (~leseb@83.167.43.235) has joined #ceph
[13:53] * loicd (~loic@90.84.146.198) Quit (Read error: Operation timed out)
[13:58] <barryo> I'm not fond of running elrepo kernels in production, is it a requirement for older versions of ceph too?
[14:03] * loicd (~loic@90.84.146.198) has joined #ceph
[14:05] * joelio upgrades
[14:06] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[14:08] * markbby (~Adium@168.94.245.5) has joined #ceph
[14:10] <joelio> /wi4
[14:10] <joelio> balls
[14:18] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[14:19] * loicd (~loic@90.84.146.198) Quit (Ping timeout: 480 seconds)
[14:21] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[14:21] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[14:25] * yehuda_hm (~yehuda@2602:306:330b:1410:b492:72d4:9825:1bd9) has joined #ceph
[14:26] <scuttlemonkey> LiRul: yes, there are multi-staged plans for geo replication
[14:27] <scuttlemonkey> the first stage is underway now and will be the more simple solution using the rados gateway
[14:27] <scuttlemonkey> no specific eta yet, but it is one of the two high-priority tasks under the spotlight
[14:27] <LiRul> scuttlemonkey: ahh great news thanks
[14:30] * loicd (~loic@90.84.146.220) has joined #ceph
[14:30] * esammy (~esamuels@host-2-103-101-90.as13285.net) has left #ceph
[14:33] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[14:33] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[14:33] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit ()
[14:35] * yanzheng (~zhyan@134.134.139.76) has joined #ceph
[14:38] * yehuda_hm (~yehuda@2602:306:330b:1410:b492:72d4:9825:1bd9) Quit (Ping timeout: 480 seconds)
[14:45] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:47] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[14:47] * yehuda_hm (~yehuda@99-48-177-65.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[14:48] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit ()
[14:48] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:52] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) Quit (Remote host closed the connection)
[14:55] * loicd (~loic@90.84.146.220) Quit (Ping timeout: 480 seconds)
[14:55] * Morg (b2f95a11@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[14:56] <joelio> Am I right in thinking that the debian init scripts only read from ceph.conf when shutting down? i.e. if you have made a change to the config (removed an OSD) it will only stop the OSDs that are in ceph.conf, and processes for the orphaned OSDs remain afterwards
[14:56] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:57] <todin> something is wrong with my crushmap; if I shut down an osd, no replication starts http://pastebin.com/jShnjhdN
[14:57] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[14:59] <scuttlemonkey> todin: are you sure that osd had data on it that needed to be moved?
[14:59] <scuttlemonkey> if an osd goes down and you still have the appropriate level of replication it wont need to move anything
[15:00] <todin> scuttlemonkey: ceph -s says it is degraded 16%, I took down a whole node
[15:00] * yanzheng (~zhyan@134.134.139.76) Quit (Remote host closed the connection)
[15:01] <absynth> did you do something weird, like ceph osd pause or whatnot?
[15:01] <todin> absynth: not that I can remember, should I show the current config?
[15:02] * Merv31000 (~Merv.rent@150.101.235.251) has joined #ceph
[15:03] <absynth> sure, paste it
[15:03] <todin> absynth: http://pastebin.com/jShnjhdN
[15:03] <scuttlemonkey> fwiw todin, you are iterating over "racks" based on pool rules for replication
[15:04] <absynth> yeah, taht's your crushmap
[15:04] <todin> I think the placment rule for rbd is not right
[15:04] <scuttlemonkey> step take default
[15:04] <scuttlemonkey> step chooseleaf firstn 0 type rack
[15:04] <scuttlemonkey> that says start at 'default' and put one replica per group of type rack
[15:04] <todin> scuttlemonkey: yes, I want the replicas on different racks
[15:05] <todin> and in the racks choose the nodes, which one I don't mind
[15:06] <scuttlemonkey> can you pastebin 'ceph health detail'
[15:07] * madkiss1 (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Quit: Leaving.)
[15:08] <todin> scuttlemonkey: http://pastebin.com/WAfB4La7
[15:08] * Merv31000 (~Merv.rent@150.101.235.251) has left #ceph
[15:09] <scuttlemonkey> how many mons do you have?
[15:09] <todin> scuttlemonkey: 3 mons, one is down
[15:09] <absynth> you don't have quorum
[15:09] <absynth> or?
[15:10] <absynth> no, scratch that
[15:10] * Loffler (~Loffler@150.101.235.251) has joined #ceph
[15:10] * yehuda_hm (~yehuda@99-48-177-65.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[15:10] <todin> now all there mons are up
[15:11] * yehuda_hm (~yehuda@2602:306:330b:1410:b492:72d4:9825:1bd9) has joined #ceph
[15:11] <scuttlemonkey> ok, lets just snag that first pg
[15:11] * Merv31000 (~Merv.rent@150.101.235.251) has joined #ceph
[15:11] <scuttlemonkey> 'ceph pg 2.63f list_missing [starting offset, in json]'
[15:12] <Merv31000> Hello Scuttlemonkey
[15:12] <scuttlemonkey> hey Merv
[15:12] <absynth> scuttlemonkey: so, just to learn something, what are you looking for now?
[15:13] <scuttlemonkey> absynth: trying to see what objects are unfound
[15:13] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:13] <absynth> how do you see there are any?
[15:13] <Merv31000> I need to get our system going reliably again.
[15:13] <todin> http://pastebin.com/pZeqj4s6
[15:14] <absynth> i can't see any unfound objects in that ceph health dump?
[15:14] <scuttlemonkey> yeah
[15:14] <Merv31000> Should I be doing fsck on all servers before anything?
[15:14] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has left #ceph
[15:15] <scuttlemonkey> Merv31000: you could...from the sounds of your ceph -w it sounds like your network is having issues
[15:15] <scuttlemonkey> and ceph is just one of the things that is suffering as a result
[15:15] <Merv31000> I think so too ...
[15:16] <scuttlemonkey> we're not really the best folks to help with that...but we can help you get ceph put back together once boxes stop exploding
[15:16] <scuttlemonkey> todin: odd...what happens when you 'ceph pg 2.63f query'
[15:16] <absynth> ceph hates packet loss
[15:16] <absynth> and it hates latency
[15:16] <absynth> *much*
[15:16] <Merv31000> I thought that might be the case.
[15:17] <joao> fyi, since a couple of dev releases ago, 'ceph health --format=json' provides estimates on latency
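(A tiny sketch of pulling that JSON for inspection; the exact field layout differs between releases, so check the output before relying on particular keys.)

    import json
    import subprocess

    out = subprocess.check_output(['ceph', 'health', '--format=json'])
    print(json.dumps(json.loads(out), indent=2))   # inspect, including any latency fields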
[15:17] <scuttlemonkey> Merv31000: you have a ping -t going to one of the boxes it's complaining about?
[15:17] <todin> http://pastebin.com/gkdUPLKU
[15:18] <Merv31000> no ... Out of desperation I powered off and unplugged all except one just now as they were repeatedly rebooting
[15:18] <scuttlemonkey> todin: and ceph health detail isn't moving?
[15:18] * markbby (~Adium@168.94.245.5) Quit (Remote host closed the connection)
[15:19] <todin> scuttlemonkey: still 16.489% degraded
[15:19] <Merv31000> I can bring up 3 for a quorum and try that,
[15:21] <scuttlemonkey> todin: if you bring the downed node back up does it recover? or is this a larger issue of how you are placing your data in the first place?
[15:21] <todin> scuttlemonkey: If I bring them up again, the degradation goes to zero
[15:21] <scuttlemonkey> Merv31000: you could, just to see how ceph is handling things...although if I were you I would want to know why the boxes keep going out
[15:24] * sagewk (~sage@2607:f298:a:607:acf1:ce74:be94:3445) Quit (Ping timeout: 480 seconds)
[15:25] <Merv31000> Is it Stonith that is forcing the reboots? Can I disable that to keep things more stable?
[15:25] <absynth> there's a mechanism in ceph that forces reboots?
[15:26] <Merv31000> I am not an expert in ceph and related things .. normally I have my tech solve these problems
[15:27] <absynth> as far as i know, there is no stonith-type automatism in ceph itself, and i don't think it is necessary either
[15:27] <absynth> unless, maybe, for stray monitors after a split-horizon situation
[15:28] <scuttlemonkey> yeah, I have never seen stonith reset stuff set up with ceph
[15:29] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[15:29] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has left #ceph
[15:30] <Merv31000> I think Stonith is used to reboot our host servers if they appear to be not responding?
[15:30] <scuttlemonkey> todin: what version are you running?
[15:31] <todin> scuttlemonkey: 0.57
[15:31] <scuttlemonkey> k
[15:32] <absynth> Merv31000: that would be very silly, as ceph is not dependent on host unanimity
[15:32] <absynth> the monitor hosts (you usually have an odd number >=3) need to form a majority, but that's about it
[15:33] <todin> scuttlemonkey: I had that problem a few years ago, the crushmap was wrong, last time Tv helped me.
[15:33] <Merv31000> Sorry, I only know the bits I have slowly learned as our system was created.
[15:33] <Merv31000> Pretty sure it has stonith. The tech refers to it.
[15:34] <scuttlemonkey> todin: yeah, most of these issues turn out to be crushmap related...just stepping through yours now
[15:34] * sagewk (~sage@2607:f298:a:607:6d1c:81c9:99e8:b20c) has joined #ceph
[15:34] <absynth> Merv31000: so, what hosts does your tech say get shot in the head?
[15:34] <absynth> mons, OSDs?
[15:34] <absynth> what do you even do on the cluster? hosts VMs on RBD, use radosgw, use cephfs?
[15:34] * drokita (~drokita@199.255.228.128) has joined #ceph
[15:35] <Merv31000> We have 5 what I call hosts ... each has an OSD and also is a monitor and runs VMs
[15:35] <todin> scuttlemonkey: I think the crush algorithm tries to find a new place, but cannot find one, because I have only two racks. I think I need a further placement rule for the host
[15:36] * stxShadow (~jens@p4FD06FB0.dip.t-dialin.net) has joined #ceph
[15:36] <Merv31000> Using Ext4fs as RBD was a real problem for us
[15:37] <scuttlemonkey> todin: yeah, but there are other hosts still up on the rack w/ the downed host
[15:37] <absynth> that sounds like a really badly-designed setup
[15:37] <scuttlemonkey> you didn't take the whole rack down, did you?
[15:37] <absynth> co-locating mons and OSDs is usually not a good idea
[15:37] <absynth> so, lets see
[15:37] <todin> scuttlemonkey: no, just one host in the second rack
[15:37] <absynth> what does "ceph -s" on one of the nodes say?
[15:37] <absynth> is it doing some kind of recovery right now?
[15:38] <Merv31000> are you aking me about the rack?
[15:38] <absynth> no, he was asking todin
[15:38] <todin> absynth: no recovery
[15:38] <absynth> _I_ was asking Merv31000 ;)
[15:38] <absynth> so. absynth -> Merv31000, scuttlemonkey -> todin ;)
[15:39] <absynth> Merv31000: can you show "ceph -s"?
[15:40] <Merv31000> No recovery , no ceph right now. only one host at runlevel 1.
[15:40] <janos> sounds dire
[15:40] <absynth> so the whole farm is down?
[15:40] <Merv31000> we could not keep servers running long enough for a recovery to progress ... they kept rebooting.
[15:40] <absynth> what the fuck?
[15:40] <Merv31000> Yep.
[15:40] <absynth> i mean, seriously...?!
[15:41] <janos> something else wrong
[15:41] <Merv31000> Horribly true ..
[15:41] <absynth> who in their right mind would implement some kind of stonith mechanism on ceph OSDs?
[15:41] <scuttlemonkey> hehe
[15:41] <janos> i was wondering same...
[15:41] <absynth> ok, so the first thing should be to find and disable that mechanism
[15:41] <Merv31000> I think there may be some networking issue .. this has been working well for 3months
[15:41] <absynth> even if it was implemented with the mons in mind, it is obviously completely unnecessary
[15:41] <absynth> because if you have 5 nodes, you can reach quorum
[15:42] <absynth> are the OSDs interconnected by a private network?
[15:43] <scuttlemonkey> todin: I'm not understanding why it isn't backfilling from store2 into another host on rack2
[15:43] <Merv31000> Vladimir built it very carefully after lots of research and feedback from the Ceph community
[15:43] <janos> i doubt anyone in here told him to use stonith
[15:44] <Merv31000> we have a quorum of 3
[15:44] * LiRul (~lirul@91.82.105.2) has left #ceph
[15:44] <todin> scuttlemonkey: me neither, that is why I asked
[15:44] <absynth> ceph osd pause?
[15:44] <absynth> i suggested it earlier, but i feel compelled to do so again
[15:44] <absynth> we had that once
[15:44] <todin> absynth: you asking me?
[15:45] <absynth> yes, this time i am
[15:45] <absynth> i should prefix my statements always
[15:45] <absynth> Merv31000: so, whatever he built, it's currently not working. we don't mean to point fingers, we all want to get you running again
[15:45] <Merv31000> I appreciate that!
[15:45] <absynth> Merv31000: and in order to do that, you need to find vladimir and have him disable (or at least explain very, very well) that stonith mechanism he implemented
[15:46] <scuttlemonkey> todin: you could always just send it a 'ceph osd unpause' and see
[15:46] <absynth> because if you don't, you will never get the cluster running again
[15:46] <Merv31000> He is probably uncontactable all long-weekend!
[15:46] <janos> ;(
[15:46] <absynth> Merv31000: uh, that's not so good
[15:46] <Merv31000> No!
[15:47] <absynth> Merv31000: then, look on the remaining node and try to find anything that would reboot it upon whatever
[15:47] <absynth> Merv31000: maybe in /etc/network/ifdown.d or in the inittab, there's the craziest places
[15:47] <scuttlemonkey> todin: we could try a 'ceph pg dump_stuck unclean'
[15:47] <mattch> Merv31000: Not running pacemaker is it?
[15:47] <todin> scuttlemonkey: no change after unpause
[15:48] <Merv31000> I am pretty confident they will all start again and try to rebuild, but will reboot again and again
[15:48] <Merv31000> Yes it runs pacemanker.
[15:48] <absynth> Merv31000: and this flapping makes your cluster impossible to control
[15:48] <mattch> Merv31000: that's almost certainly where the stonith will be implemented then.
[15:48] <Merv31000> yep
[15:48] <scuttlemonkey> todin: what kind of testing have you been doing?
[15:48] <scuttlemonkey> have you taken other hosts down during this test?
[15:49] <mattch> Merv31000: if you're happy to turn off pacemaker, you could put it in maintenance mode (which will make it watch, but not do anything to the cluster): crm configure property maintenance-mode=true
[15:49] <todin> scuttlemonkey: http://pastebin.com/fv0iLaKM
[15:50] * Cube (~Cube@12.248.40.138) has joined #ceph
[15:50] <todin> scuttlemonkey: I tried to do a reduandancy test, on the cluster.
[15:50] <todin> scuttlemonkey: no, just store5
[15:51] <scuttlemonkey> ok
[15:52] * darkfader hands mattch a beer for knowing how to properly use pacemaker, much different from people who wrote most howtos on it :)
[15:54] <Merv31000> My problem is I am not expert enough to solve the probable underlying problem or readily act on many of the suggestions.
[15:54] <absynth> Merv31000: the most interesting question here would probably be, is this a production cluster?
[15:54] <Teduardo> has anyone tried building ceph nodes from the new backblaze 3.0 pod std?
[15:54] <Merv31000> YEP!
[15:54] <Teduardo> im sure thats a bad idea
[15:55] <absynth> Teduardo: IMHO, backblaze is not a good idea for ceph, too many spinners per host
[15:55] <janos> Teduardo: not a good match
[15:55] <absynth> Teduardo: if you have one OSD per spinner, you will have what, 48 OSDs per host
[15:55] <Teduardo> the new std i think is 136
[15:55] <Teduardo> drives
[15:55] <Teduardo> per box
[15:55] <absynth> ahaha
[15:55] <absynth> lol
[15:55] <absynth> yeah, right
[15:55] <Merv31000> 500 websites and 1000 email clients
[15:56] <absynth> Merv31000: oh, that is really unfortunate. i presume you should probably contact inktank for an immediate emergency support ticket
[15:56] <Teduardo> nevermind, only 45 =)
[15:56] <absynth> they usually do that if you throw some money at them, right scuttlemonkey? :)
[15:56] <todin> scuttlemonkey: shouldn't my crushmap have some sort of step choose firstn 0 type rack -> step chooseleaf firstn 0 type host?
[15:57] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[15:57] <janos> Merv31000: really sounds like maybe that was over-engineered
[15:57] <absynth> from my perspective, it makes a lot of sense to use pacemaker in the VMs that host the actual services
[15:57] <absynth> i.e. your web servers and mail servers
[15:57] <Merv31000> Scuttlemonkey was looking for new customer 24/7 support
[15:57] <absynth> but not on the underlying hosts
[15:58] <janos> absynth: right
[15:58] <absynth> Merv31000: what timezone are you in?
[15:58] <scuttlemonkey> todin: I don't think it's required...but I'm far from the expert
[15:58] <janos> ceph allows for a rather simple underlying structure, which is wonderful
[15:58] <Merv31000> I think it is on the VMs
[15:59] <absynth> Merv31000: but how can the pacemaker on the VMs kill the VM hosts?
[15:59] <scuttlemonkey> Merv31000: yeah, we're always idly trolling for support stuffs
[15:59] <Merv31000> Adelaide South Australia
[15:59] <Merv31000> I think it is Stonith that is killing hosts it deems to be unresponsive.
[16:01] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[16:02] <absynth> Merv31000: you must get rid of that, as mattch said
[16:03] <absynth> Merv31000: even if it's only temporarily
[16:03] <absynth> Merv31000: and then you will want to hire someone from inktank
[16:03] * madkiss (~madkiss@089144192073.atnat0001.highway.a1.net) has joined #ceph
[16:03] <Merv31000> Yes ... finding out how to will be the tricky part ...
[16:04] <mattch> Merv31000: maintenance-mode should be a safe operation to enable... and will stop the stonith happening.
[16:04] <mattch> (of course, there's no way to know that whoever enabled the stonith function didn't have a really good reason for doing it to prevent data corruption :( )
[16:04] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[16:05] <mattch> though as absynth says, it shouldn't be used/needed for ceph, so it's only if you have other data sources like drbd on the same pool that I'd hesitate using it
[16:06] <Merv31000> So If I bring each host up and put it into maintenance mode that might give it a chance to rebuild and for us to find out what is triggering the reboots?
[16:06] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[16:07] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[16:09] <absynth> yeah, if you can disable the stonith mechanism, you should at least be able to restart ceph
[16:09] * bmjason (~bmjason@static-108-44-155-130.clppva.fios.verizon.net) has joined #ceph
[16:09] <mattch> Merv31000: If you run that command on any up node, it will set it across the whole pool automatically
[16:09] <mattch> http://www.hastexo.com/resources/hints-and-kinks/maintenance-active-pacemaker-clusters gives a bit more info on using it for doing updates etc in pacemaker
[16:09] <scuttlemonkey> todin / Merv31000 : I actually have to run out for a while here
[16:10] <scuttlemonkey> the main body of inktankers should be arriving in a couple hours though
[16:10] <Merv31000> OK ... I will still be here tearing my hair out ...
[16:10] <joelio> I take it having 4 or 6 or any even number of mons is a bad idea (not just having 2); trying to think how any number over 3 causes issues with voting though
[16:11] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[16:11] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[16:12] <scuttlemonkey> Merv31000: see PM, if you want to go that route you can
[16:12] <scuttlemonkey> afk
[16:13] <mattch> joelio: It all comes down to how they get 'split up' - any split where you end up with 50% or less of the total nodes, and you'll not have quorum.
[16:14] <absynth> you will have quorum in the bigger partition
[16:14] <absynth> the smaller one will stop all i/o
[16:14] <joelio> mattch: I was hoping to have an initial 6 OSDs (3 per rack, 2 racks) and have MONs on all 6 (if that would be possible and not cause issues with voting)
[16:14] <absynth> (which is probably the right thing to do in most, if not all cases)
[16:14] <absynth> joelio: i don't think you want to colocate mons and osds
[16:14] <absynth> and you _definitely_ do not want an even number of mons
[16:15] <joelio> ok, that's fine. I'm not colocating, they're next to each other but split for PDU and Switch redundancy
[16:15] <Merv31000> I will give it a try
[16:15] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[16:16] <mattch> joelio: with 6 nodes, with half on each rack, any failure domain that splits racks will take out both as neither has quorum. With 4 on one and 3 on another, the 4 rack will keep going
[16:16] <absynth> joelio: who is, the mons and osds or the osds?
[16:17] <bmjason> absynth: going down that path.. if you have 3 mons.. one fails (because hardware does).. what do you see is the impact of running two mons until the third is fixed
[16:17] <joelio> mattch: that would make 7 though ;) 2 in one and 3 on the other right
[16:17] <mattch> joelio: Yep - was implying that you'd want to have an odd number :)
[16:18] <joelio> cool, so a split of 3 hosts per rack and 1 less MON per host in 1 rack right?
[16:18] <absynth> bmjason: nothing, because there still is quorum
[16:18] <joelio> 6 Hosts, 72 OSDs (12 per host), 5 mons
[16:18] <absynth> bmjason: since those two remaining mons know that they are the majority
[16:19] <absynth> bmjason: however, if there is another interruption and those two mons start drifting away from each other, you are fscked
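(The quorum arithmetic being discussed, as a one-liner: a partition keeps quorum only with a strict majority of the total monitor count.)

    def has_quorum(partition, total):
        return partition > total // 2

    print(has_quorum(3, 6))   # False: 6 mons split 3/3 leaves neither side with quorum
    print(has_quorum(3, 5))   # True: 5 mons split 3/2, the larger side keeps quorum
    print(has_quorum(2, 3))   # True: 3 mons with one down still form a majority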
[16:24] <sstan> ceph is communicating on the public network even though it's configured with a cluster network
[16:24] * janisg (~troll@85.254.50.23) Quit (Ping timeout: 480 seconds)
[16:25] <sstan> does rbd bench send information to ceph via the public network ??
[16:25] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[16:25] <mattch> joelio: You now also need to make sure your crushmap splits stuff between the 2 racks, so that the 'up' one has all the data in an outage, and also how to make sure that all your clients can speak to both sides so they can always reach the side with data on if the other goes down. :-)
[16:27] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[16:28] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[16:30] * Philip__ (~Philip@hnvr-4dbb3e96.pool.mediaWays.net) has joined #ceph
[16:31] * vata (~vata@2607:fad8:4:6:b4e7:cb12:5d61:573) has joined #ceph
[16:31] <todin> how could I increase the number of osds which do backfilling?
[16:32] <absynth> you cannot increase that count, as it is dependant on your replica count
[16:32] <absynth> you can tell the OSDs to backfill more
[16:32] <joelio> mattch: brilliant thanks. Already aware of the maps
[16:33] <janos> absynth: how do you tell osds to backfill more?
[16:33] <todin> absynth: how could I do that?
[16:33] <absynth> ceph osd tell \* injectargs '--osd-recovery-max-active X'
[16:33] <absynth> x=10 is the default (threads per osd)
[16:33] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[16:33] <janos> cool, thanks
[16:34] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[16:34] <absynth> we usually use 1 or 2, anything more fucks with our guest VMs
[16:34] <absynth> pardon my french
[16:35] * janos takes note, of the vm's not the french
[16:35] <todin> absynth: my cluster is empty
[16:35] * janisg (~troll@85.254.50.23) has joined #ceph
[16:35] * failshell (~failshell@lpr157.lapresse.ca) has joined #ceph
[16:35] <failshell> hello. im looking into using cephfs instead of nfs. is it possible to run it in VMs in an ESX cluster?
[16:36] <elder> sage, is this something you've seen? http://pastebin.com/CsM9QYn2
[16:36] <bmjason> failshell: are you asking if you can run the osd's in vms on esx?
[16:37] <bmjason> or run vms running on esx with cephfs in as the storage tier?
[16:37] <failshell> bmjason: osd's, that's the storage nodes right?
[16:38] <bmjason> failshell: yes that is where the storage sits
[16:38] <failshell> i dont want to run VMs on Ceph, but use ceph within a VM, to deploy a storage cluster
[16:38] <bmjason> failshell: you can do it for testing.. but i/o performance is going to be really really really low
[16:38] <failshell> ah damn
[16:39] <mattch> failshell: cephfs is also still a bit alpha/beta, so don't use it in production just yet (rbd is fine though)
[16:39] * madkiss (~madkiss@089144192073.atnat0001.highway.a1.net) Quit (Ping timeout: 480 seconds)
[16:39] <failshell> ok
[16:39] <failshell> well, ill go to plan B then
[16:42] <jtang> *sigh* i just had a revisit of gpfs stats and output from the mm* tools
[16:42] <jtang> i really appreciate json output from the ceph suite of tools
[16:43] * failshell (~failshell@lpr157.lapresse.ca) Quit (Quit: Leaving...)
[16:44] <absynth> i have the haunting feeling that this guy was not happy with the answers he got
[16:44] <bmjason> hehe you think :)
[16:45] <mattch> I would have been interested to know what plan B was...
[16:45] <absynth> whatever it was, it does not involve #ceph
[16:45] <janos> not that vmware is a bad product, but i feel kinda bad for admins stuck in vmware land. good luck ever getting management to try something else
[16:46] <bmjason> yeah.. agreed
[16:46] <bmjason> if they are using vmware in the first place.. they are used to spending money
[16:46] <absynth> yep :/
[16:46] <bmjason> soo they will probably just throw money at emc or netapp
[16:46] <janos> yep
[16:46] <janos> the standard "safe" route
[16:46] <janos> $$$$
[16:46] <absynth> we have customers who have a 1/4 million bucks EMC and host, like, dunno, 200 vms on it
[16:47] <bmjason> that is only 1k per vm
[16:47] <bmjason> lol
[16:47] <janos> could make a nice ceph cluster 250k
[16:47] <janos> ;)
[16:47] <janos> +with
[16:47] <absynth> yeah, and the joke is: the performance of our ceph-backed nodes is actually much better
[16:47] <bmjason> with 250k you should be able to make a really good openstack cloud backed with ceph all running 10GE
[16:47] <mattch> that's at least an order of magnitude from my planned pool, and I don't expect any problems running a hundred or so vms :)
[16:48] <joelio> Had an HDS at the last $WORK - about £1k per disk :O
[16:48] <janos> :{}
[16:48] <joelio> cost per GB was ludicrous
[16:48] <absynth> isn't that a rapper?
[16:48] <bmjason> yeah.. that is why i am really excited to move our 1000 vms to this
[16:48] * madkiss (~madkiss@089144192012.atnat0001.highway.a1.net) has joined #ceph
[16:48] <janos> i have some iscsi sans that i just can't tell my boss to ditch though i hardly use them
[16:49] <janos> not huge investment, but big enough
[16:49] <bmjason> iscsi osd
[16:49] <bmjason> lol
[16:49] <absynth> for 1K VMs, you will be fine with 12 or 13 nodes
[16:49] <janos> i've thought about it!
[16:49] <absynth> 4 OSDs each
[16:49] <bmjason> exactly what we are doing
[16:50] <janos> working ok?
[16:50] <joelio> absynth: yea, unfortunately it didn't ship with blow and bitches... we can live in hope that one day a vendor will provide that :)
[16:50] <janos> lol
[16:50] * dmner (~tra26@tux64-13.cs.drexel.edu) has joined #ceph
[16:51] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[16:51] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[16:51] <dmner> Has anyone had problems with bonnie on a ceph block device?
[16:52] <absynth> joelio: not feeling treated well by inktank? :D
[16:52] * Tiger (~kvirc@80.70.238.91) has joined #ceph
[16:52] <darkfader> janos: EMC^2 has heli and golf course
[16:52] <dmner> I'm getting a bunch of these messages "task xfsbufd/rbd0:1636 blocked for more than 120 seconds" on the vms I am testing it with (crashed my physical boxes)
[16:52] <joelio> absynth: a reach around wouldn't go amiss, hahah
[16:52] <darkfader> might be a substitute if you get older
[16:52] <absynth> yuck, TMI
[16:52] <janos> joelio:hhahahaha
[16:52] <bmjason> rofl
[16:53] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[16:53] <absynth> dmner: that usually happens if some OSDs crash and you have unfound objects / lots of slow requests, from my experience
[16:54] * leseb_ (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[16:54] <joelio> right, rocking 0.58 and liking it thus far (it's not fallen over under load generation anyway!)
[16:55] <dmner> absynth: is there a good way to find out which OSDs are causing the problem (all osds that are in are up)
[16:56] <absynth> can you see slow requests?
[16:56] <Merv31000> I have been advised to try crm configure property maintenance-mode=true to disable pacemaker but crm is not available ... any ideas?
[16:56] <janos> noob debug question - if i do iostat -x 1 and notice one particular disk constantly at 100% util - is that a good sign that that's the slow disk?
[16:57] * leseb_ (~leseb@83.167.43.235) has joined #ceph
[16:57] <absynth> or the one with the most to do
[16:57] <janos> i've seen on other hosts where it changes disk and seems distributed
[16:57] <janos> but on another host there is one disk that seems to be the target
[16:58] <janos> util doesn't hop around so much
[16:58] * janos will dig more
[16:58] <janos> oh egad. a "green" drive. that means 5400rpm i bet
[16:58] <janos> the only one in the group
[16:58] <absynth> janos: maybe the disk is failing, too?
[16:58] <dmner> would the slow requests show up in the ceph logs of the client or osd
[16:59] <absynth> osd
[16:59] <absynth> in ceph -w for example
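A hedged sketch of how to spot the offenders, assuming a bobtail-era cluster; the exact output formats vary by version:

    ceph -w              # live cluster log; "slow request ... seconds old" lines name the osd
    ceph health detail   # in newer versions this also summarizes slow/blocked requests per osd
    ceph osd tree        # map the osd ids back to hosts so you know which disk to check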
[16:59] <janos> smart tests and mcelog, etc don't show any failure
[16:59] <janos> i think i know why - this disk appears to be a WD green drive. it's gonna come out
[17:00] * madkiss (~madkiss@089144192012.atnat0001.highway.a1.net) Quit (Read error: Connection reset by peer)
[17:00] <joelio> janos: recommend a SATA?
[17:01] * joelio about to aquire WD black - only as it's what other in work use
[17:01] <bmjason> we use WD black drives
[17:01] <janos> that should be fine
[17:01] <bmjason> they work really well
[17:01] <joelio> cool :)
[17:01] <janos> i have those and some seagates
[17:01] <bmjason> seagates work well too.. just generally cost more
[17:01] <janos> i found my 7200 rpm newer single-platter 1TB seagates get great speed
[17:02] <dmner> absynth: I think I found my slow osds, I will set them as out and once it rebalances rerun my tests
[17:02] * gerard_dethier (~Thunderbi@85.234.217.115.static.edpnet.net) Quit (Quit: gerard_dethier)
[17:03] <dmner> thanks
[17:06] <absynth> :)
[17:07] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Remote host closed the connection)
[17:08] * BillK (~BillK@124-169-104-82.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[17:13] <absynth> hm, a guy just posted an interesting paper on the list
[17:13] <absynth> about power fault behavior of SSDs
[17:13] <absynth> but the basic assumption seems to be flawed: "the behavior of these new components during power faults—which happen relatively frequently in data centers—"
[17:13] <janos> lol
[17:13] <absynth> what kind of datacenter "frequently" has power faults?
[17:13] <janos> that's one heck of a premise
[17:14] * jlogan1 (~Thunderbi@2600:c00:3010:1:3500:efc8:eaed:66fd) has joined #ceph
[17:14] <janos> he needs to change that to "house"
[17:14] <absynth> "the behavior of freezing hells - which happens relatively frequently in hells"
[17:14] <janos> lol
[17:15] <absynth> oh, he has 4 references to corroborate his power fault premise
[17:16] <absynth> hosting.com, level3, amazon/netflix, and another AWS outage
[17:16] <janos> btw, tried making low-power low-cost osd nodes for home. be warned - went through 3 amd APU units. cannot even post with both ram slots filled, and with one it only sees just shy of 2GB. let's just say that before i discovered that it made for a real unhappy cluster
[17:17] <janos> uh, those outages were either really edge case weirdness or routing problems, not power
[17:17] <absynth> wait, getting the links
[17:17] <janos> iirc
[17:18] <absynth> http://www.informationweek.com/%20cloud-computing/infrastructure/amazon-web-%20services-hit-by-power-outage/240002170
[17:18] <absynth> http://www.theregister.co.uk/2012/ 07/10/data centre power cut/
[17:18] <absynth> http://www.wired. com/wiredenterprise/2012/07/amazon explains/
[17:19] <absynth> those spaces probably shouldn't be there, i blame the usenix tex stylesheet
[17:19] <absynth> number 4: http://www.datacenterknowledge.com/archives/2012/07/28/human-error-cited-hosting-com-outage/
[17:19] <Kdecherf> Hm, we have a network maintenance next monday which will cause a disconnect of all daemons for 5 minutes. What is best: shut down the daemons beforehand, or leave the cluster up?
[17:20] <absynth> set noout and hope?
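A sketch of that approach for Kdecherf's 5-minute window:

    ceph osd set noout     # unreachable OSDs are not marked out, so no rebalancing starts
    # ... network maintenance ...
    ceph osd unset noout   # once everything is back up and peered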
[17:23] <janos> wow a blown busbar
[17:23] <sstan> was anyone able to use an rbd for LVM ?
[17:23] <janos> the ones at our datacenter... i'd hate to see what that would look like
[17:25] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:27] <Kdecherf> absynth: what happens when all mons lose the link to the other mons?
[17:28] <absynth> I/O on the cluster is halted to prevent corruption
[17:28] <absynth> because you have no quorum between the mons
[17:36] * Tiger (~kvirc@80.70.238.91) Quit (Ping timeout: 480 seconds)
[17:36] <scuttlemonkey> Merv31000: how goes? Just poking my head in before I have to leave again
[17:37] <scuttlemonkey> if you're still having trouble I wanted to make sure someone waking up on the west coast could help you wrangle
[17:37] <absynth> i have seen a blip on the sage radar 20 mins go
[17:37] <absynth> ago, even
[17:38] * hybrid5121 (~w.moghrab@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Read error: Connection reset by peer)
[17:38] <Merv31000> We have had no success. our system does not respond to the crm configure property maintenance-mode=true
[17:38] <Merv31000> says crm not found.
[17:39] <absynth> "which crm" finds no binary?
[17:40] <absynth> or "locate crm"?
[17:40] <Merv31000> We tried restarting everything having pushed and prodded all cables, etc but servers go off rails with page faults, etc
[17:40] <Merv31000> yep, which crm gave no answer ... I am sure that worked before.
[17:41] <darkfader> maybe some software update blew out the crm binaries
[17:41] <darkfader> i rather don't wanna know though :0
[17:42] <Merv31000> hosts were restarted after a long power outage and that caused a new kernel update to be used.
[17:42] <absynth> from a scientific standpoint, i am really curious about what is happening there
[17:42] <absynth> from an operations standpoint, i really feel your pain
[17:42] <Merv31000> and me from a desperate standpoint.
[17:43] <janos> can you boot with older kernel?
[17:43] <darkfader> <= from pillow standpoint
[17:43] <darkfader> turns over and grins evil grin bbl
[17:43] <Merv31000> they should all have the previous kernel.
[17:45] <Merv31000> I am thinking I need to get local linux/networking support to look for a problem that is spooking ceph/etc
[17:45] <scuttlemonkey> Merv31000: that's probably the best bet
[17:45] * Tiger (~kvirc@80.70.238.91) has joined #ceph
[17:45] <scuttlemonkey> once the machines stay up we can help you iron out the ceph part
[17:46] <Merv31000> But I can try rolling back a kernel to see what happens .... 3.16am here and local guys all in bed, where I should be.
[17:47] <Merv31000> Thanks for trying to assist.
[17:47] <janos> wish you the best of luck
[17:48] <Merv31000> I will let you know what develops and may need help with ceph etc after.
[17:48] <absynth> yeah, that would be good
[17:54] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[17:54] <scuttlemonkey> Merv31000: cool, if/when you get ready for ceph help shoot us an email
[17:54] <scuttlemonkey> nthomas@inktank and wolfgang.schulze@inktank
[17:54] <scuttlemonkey> if you hit those two they'll get you ironed out
[17:56] <scuttlemonkey> ok, I'm off again...good luck!
[17:58] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[18:03] * loicd (~loic@rom26-1-88-170-47-165.fbx.proxad.net) has joined #ceph
[18:05] * xdeller (~xdeller@broadband-77-37-224-84.nationalcablenetworks.ru) has joined #ceph
[18:07] * loicd (~loic@rom26-1-88-170-47-165.fbx.proxad.net) Quit ()
[18:07] * jotag (70cd1338@ircip2.mibbit.com) has joined #ceph
[18:08] * ninkotech (~duplo@ip-89-102-24-167.net.upcbroadband.cz) Quit (Quit: Konversation terminated!)
[18:08] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has joined #ceph
[18:11] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[18:12] * jotag (70cd1338@ircip2.mibbit.com) Quit ()
[18:16] * Merv31000 (~Merv.rent@150.101.235.251) Quit (Quit: Leaving)
[18:19] * Merv31000 (~Merv.rent@150.101.235.251) has joined #ceph
[18:21] * alram (~alram@38.122.20.226) has joined #ceph
[18:21] * Merv31000 (~Merv.rent@150.101.235.251) Quit ()
[18:22] * Loffler (~Loffler@150.101.235.251) Quit (Quit: Leaving)
[18:24] * noob21 (~cjh@173.252.71.3) has joined #ceph
[18:29] <Elbandi_> hi ppl, here is my php extension for cephfs:
[18:29] <Elbandi_> https://github.com/Elbandi/phpcephfs
[18:29] * leseb_ (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[18:30] <Elbandi_> feel free to comment or suggest :)
[18:30] <ShaunR> Elbandi_: what would be the purpose of this? just curious.
[18:32] <gregaf> it's php bindings for the libcephfs API
[18:32] <gregaf> nice!
[18:32] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[18:33] <ShaunR> I guess i'll have to look at what API's are offered through libcephfs
[18:33] * cashmont (~cashmont@c-76-18-76-30.hsd1.nm.comcast.net) has joined #ceph
[18:35] <ShaunR> Could be cool to have one for rbd!
[18:35] <ShaunR> Would*
[18:35] * leseb_ (~leseb@83.167.43.235) has joined #ceph
[18:35] * loicd (~loic@rom26-1-88-170-47-165.fbx.proxad.net) has joined #ceph
[18:37] * l0nk (~alex@83.167.43.235) Quit (Quit: Leaving.)
[18:37] <Elbandi_> ShaunR: you can access your cephfs directly from php, no local mount is needed
[18:37] * leseb_ (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[18:38] <ShaunR> Elbandi_: i was just looking through your code and all the functions, pretty cool.
[18:39] <ShaunR> oh hey look at that, there are already php bindings for librados... https://github.com/ceph/phprados
[18:39] * esammy (~esamuels@host-2-103-101-90.as13285.net) has joined #ceph
[18:40] <ShaunR> Elbandi_: did you write the rados extension too? The header is exactly the same in the .c file
[18:48] * markbby (~Adium@168.94.245.1) has joined #ceph
[18:48] <Kdecherf> hm, I have a "connection failed" from a client to the active mds, but no log on the mds
[18:50] * loicd (~loic@rom26-1-88-170-47-165.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[18:53] * ScOut3R (~scout3r@1F2EAE22.dsl.pool.telekom.hu) has joined #ceph
[19:00] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[19:06] * mattch (~mattch@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[19:08] * rturk-away is now known as rturk
[19:09] * markbby (~Adium@168.94.245.1) Quit (Ping timeout: 480 seconds)
[19:13] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[19:16] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[19:18] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:21] <Elbandi_> ShaunR: no, phprados was a sample for me
[19:21] * dosaboy (~user1@host86-164-227-220.range86-164.btcentralplus.com) Quit (Quit: Leaving.)
[19:22] <Elbandi_> "how to write a php ext" :)
[19:22] * loicd (~loic@rom26-1-88-170-47-165.fbx.proxad.net) has joined #ceph
[19:31] * dmick (~dmick@2607:f298:a:607:a40d:8efc:5e0f:154e) has joined #ceph
[19:45] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[19:57] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[19:58] <dmner> corrupt inc osdmap on a client seems like a bad thing
[20:06] <dmner> is there a safe way to recover from corrupt inc osdmaps?
[20:08] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[20:08] * sjustlaptop (~sam@38.122.20.226) Quit ()
[20:08] <gucki> gregaf: yt there?
[20:09] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[20:09] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[20:09] <gucki> sjustlaptop: hey there :)
[20:16] * eschnou (~eschnou@157.68-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:17] * ninkotech (~duplo@ip-89-102-24-167.net.upcbroadband.cz) has joined #ceph
[20:18] * sjustlaptop (~sam@38.122.20.226) Quit (Ping timeout: 480 seconds)
[20:20] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:21] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Ping timeout: 480 seconds)
[20:31] * ntranger_ (~ntranger@proxy2.wolfram.com) Quit ()
[20:31] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:35] * noahmehl (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) has joined #ceph
[20:35] * mjevans (~mje@209.141.34.79) has joined #ceph
[20:38] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[20:42] * eschnou (~eschnou@157.68-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[20:47] * gucki (~smuxi@HSI-KBW-095-208-162-072.hsi5.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[20:57] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[21:03] <sstan> what's the reason that mapped rbd devices don't show in `fdisk -l`
[21:12] <sstan> rbd mapping changes the file /proc/devices
[21:13] <sstan> 251 rbd2
[21:13] <sstan> 252 rbd1
[21:13] * flakrat (~flakrat@eng-bec264la.eng.uab.edu) Quit (Quit: Leaving)
[21:16] * sstan_ (~chatzilla@modemcable016.164-202-24.mc.videotron.ca) Quit (Ping timeout: 480 seconds)
[21:17] <todin> can someone help me with my crushmap? inner-rack replication does not work; if I shut down a node, no replication starts.
[21:17] <todin> that's my crushmap http://pastebin.com/jShnjhdN
[21:17] * stxShadow (~jens@p4FD06FB0.dip.t-dialin.net) Quit (Remote host closed the connection)
[21:19] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[21:20] <noob21> which pool has the problem?
[21:20] <noob21> or is it all of them?
[21:22] <todin> noob21: the rbd pool, I want the replication over two racks. if I then shut down one node, the cluster stays degraded
[21:24] <noob21> what is your replication level?
[21:24] <todin> 2
[21:24] <noob21> i don't think you can have rep=2 and shut down half your replicas. that's why it stays degraded
[21:25] <todin> noob21: I have 3 nodes in each rack, and I only shut down one node.
[21:25] <janos> when you say shutdown a node - do you mean a whole rack? or one host in a rack?
[21:25] <janos> ok
[21:25] <janos> that answers that;)
[21:25] <todin> why didn't the other 2 nodes in the rack get the replicas?
[21:25] <noob21> it looks like only 2 racks are defined in your crush map
[21:25] <todin> noob21: 2 racks with 3 nodes each
[21:25] <noob21> yeah i'm not sure where the problem is
[21:25] <noob21> i see
[21:26] <todin> I would expect no replication if I shut down a whole rack
[21:35] * eschnou (~eschnou@157.68-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:35] <joshd> todin: could be the old crush tunables issue with a deep hierarchy (http://ceph.com/docs/master/rados/operations/crush-map/#impact-of-legacy-values)
[21:35] * barryo (~barry@host86-128-153-10.range86-128.btcentralplus.com) has joined #ceph
[21:37] * dmner (~tra26@tux64-13.cs.drexel.edu) Quit (Quit: io errors yay)
[21:39] <todin> joshd: shouldn't my crushmap have some sort of step choose firstn 0 type rack -> step chooseleaf firstn 0 type host?
[21:41] <joshd> yeah, your current rule doesn't distinguish between hosts, only racks
[21:42] <barryo> I've been doing some reading on what filesystem to use for OSDs and keep finding mentions of btrfs causing issues, is this still the case? do most people use btrfs or xfs?
[21:42] <barryo> forgive the noobish questions, I only properly started experimenting with ceph today :)
[21:43] <todin> joshd: so I should change the crushmap?
[21:44] <joshd> todin: yeah, if you search ceph-devel you should find examples of what you're after
[21:44] <joshd> todin: you should enable the crush tunables too
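A sketch of the rule shape todin describes: with size=2 it picks two racks and then one host (leaf) per rack, so a failed host is replaced by another host in the same rack. The bucket name "default" and the ruleset number are illustrative, since todin's actual map is only in the pastebin:

    rule rbd {
            ruleset 2
            type replicated
            min_size 1
            max_size 10
            step take default
            step choose firstn 0 type rack
            step chooseleaf firstn 1 type host
            step emit
    }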
[21:45] <janos> barryo - i think in recent kernels (3.8.x+) the btrfs truncation issue is fixed. but i would get confirmation from others ont hat first
[21:45] <janos> *on that
[21:45] <todin> joshd: I tried to search for that, I know I had this discussion with tv a few years ago, but I couldn't find it, he explained it quite well
[21:46] <todin> janos: I am running with 3.8.0 and btrfs, and everything is stable
[21:46] <janos> i have half xfs half btrfs
[21:46] <barryo> janos: I'm going to try to stick to using the el6 kernels so I'll stick with xfs for the time being
[21:46] <barryo> it's good to know that the two can be mixed though
[21:46] <janos> yeah
[21:46] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[21:47] <janos> i'll eventually take down my xfs ones and move to btrfs
[21:47] <janos> one by one
[21:47] <janos> the joy of no downtime ;)
[21:48] * livekcats (~stackevil@178.112.97.35.wireless.dyn.drei.com) has joined #ceph
[21:50] <joshd> todin: i think what you mentioned there is right, but I'd suggest testing the new map with the crushtool before using it
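A sketch of that workflow; exact flag names are per crushtool --help on your version, and the rule/replica numbers are illustrative:

    crushtool -c crushmap.txt -o crushmap.new                               # compile the edited map
    crushtool -i crushmap.new --test --rule 2 --num-rep 2 --show-statistics # check the mappings it would produce
    ceph osd setcrushmap -i crushmap.new                                    # inject only once the test output looks sane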
[21:50] <barryo> I look forward to reaching the point where I don't fear downtime, at the moment I have 45 virtual machines spread across 10 hosts. I'm hoping to use ceph as storage for the VMs and look forward to not having to dread a host failing on me.
[21:51] <janos> barryo: yeah. i haven't deployed in production yet
[21:51] <todin> joshd: I already injected the new map a few hours ago, the cluster is quite busy
[21:51] <janos> i've been insanely building a cluster at home to dogfood it a while
[21:51] <janos> and UPS just showed up with more parts!
[21:52] * janos REALLY wishes 10GBe equipment would come down in price. the home market wants it tooo
[21:53] <janos> well, THIS home market does at least
[21:55] * esammy (~esamuels@host-2-103-101-90.as13285.net) has left #ceph
[21:57] <noob21> it looks like when i run the manual mkcephfs cmds i can't get it working
[21:58] <barryo> I'm building a cluster on my development hosts at work, dismantled an old RAID array and put a disk in each of them
[21:58] <barryo> we're still using 1GbE at work sadly
[21:59] <janos> you can LAG some together
[22:00] <janos> at home my backend cluster network is bonded nics with LAGs
[22:00] <janos> just using round-robin mode on the bond
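For reference, a Debian/Ubuntu-style ifenslave sketch of that kind of round-robin bond; interface names and addresses are illustrative, and other distros configure bonding differently:

    # /etc/network/interfaces
    auto bond0
    iface bond0 inet static
        address 10.10.1.11
        netmask 255.255.255.0
        bond-slaves eth2 eth3
        bond-mode balance-rr
        bond-miimon 100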
[22:01] <noob21> mkcephfs --init-local-daemons osd.0 -d /tmp/foo doesn't seem to make anything in the directory
[22:04] <noob21> anyone know how to manually do mkcephfs so that it doesn't ssh into each node?
[22:04] <barryo> I figured I'd need to bond nics, it would be interesting to know what sort of set-up others have who are using ceph to host ~50 virtual machines
[22:05] <barryo> I've got budget to buy the kit; messing up the spec scares me
[22:05] <janos> yeah
[22:05] <todin> barryo: I have 96 disk for 400vms
[22:06] <barryo> todin: over how many hosts?
[22:06] <todin> 6
[22:07] <barryo> what sort of spec do they have?
[22:07] <todin> barryo: they are e5 1650 32G 16 SAS Disk dual 19GbE
[22:08] <dmick> noob21: did it not initialize things in the osd data path (/var/lib/ceph/osd..., by default?)
[22:08] <noob21> no that dir is still blank
[22:08] <scuttlemonkey> todin: glad you got an answer finally...sorry, been a crazy day for me
[22:08] <todin> scuttlemonkey: np
[22:09] <scuttlemonkey> todin: definitely let me know if that works....guess I thought it would handle it from the top down w/o having to go to the host level
[22:09] <dmick> and by "that dir" you mean "the ceph.conf osd dir", not "the -d dir"?
[22:09] <noob21> dmick: i figured init-local-dirs would do that?
[22:09] <noob21> yeah the ceph.conf osd dir
[22:09] <dmick> I've never used mkcephfs in that way, just looking at the code. I'll try an experiment
[22:09] <noob21> ok
[22:10] <todin> scuttlemonkey: I will do, the cluster has to shuffle a few hundred gigabytes
[22:10] <noob21> dmick: i'm using the default paths ceph expects
[22:11] <barryo> todin: that's not as high a spec as I would have expected
[22:12] <todin> barryo: what do you expect?
[22:12] <scuttlemonkey> todin: hehe, sure
[22:13] <noob21> dmick: maybe the problem is i don't have devs and fs path defined in the ceph.conf
[22:13] <noob21> i want to just experiment and have ceph just use the local drive
[22:14] <dmick> yeah, it certainly should support that
[22:14] <dmick> you don't want devs and fs unless you want ceph to be taking over the drive, wiping it, and mkfs'ing it
[22:14] <noob21> right
[22:14] <noob21> ok
[22:15] <noob21> yeah so i just put [osd.] and host = xxx
[22:15] <noob21> osd.0*
[22:15] <todin> joshd: do you know, if in an openstack setup the openstack controller must be able to connect to the ceph cluster, or just the nova compute nodes?
[22:16] <barryo> todin: I'm new to this, I don't know what to expect ;)
[22:16] <dmick> noob21: if I try the instructions at the top of mkcephfs
[22:16] <dmick> --prepare-monmap makes me a monmap in -d
[22:17] <dmick> but the next steps seem to be 1) missing -c, and 2) require the existence of the paths
[22:17] <noob21> ok so should i point that at /var/lib/ceph/mon/mon-a ?
[22:17] <joshd> todin: just compute and cinder-volume (and glance-api, if you're doing rbd behind that too)
[22:17] <dmick> that is, it failed trying to create osd.0's dir, saying ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (2) No such file or directory
[22:17] <noob21> i see
[22:17] <dmick> --init-local-daemons osd failed this way
[22:17] <noob21> yeah i believe mine did also
[22:18] <dmick> so I'm not sure what sequence of steps you were following?
[22:18] <phantomcircuit> lol
[22:18] <noob21> just the steps that were laid out in the man page
[22:18] <noob21> same order
[22:18] <todin> joshd: but glance-api normally runs on the controller node
[22:18] <dmick> manpage? psh. :)
[22:18] <barryo> can you use rbd import without the kernel module?
[22:18] <phantomcircuit> dmick, yesterday i was talking about editing files on an rbd v2 volume as part of a CoW install
[22:18] <noob21> lol
[22:18] <noob21> my /tmp/foo/x exists now
[22:18] <dmick> noob21: I'll look there and compare. phantomcircuit: yes?
[22:18] <noob21> with the monmap osdmap keyring, etc
[22:19] <phantomcircuit> someone today suggested that i create the config files as large blocks of \n
[22:19] <phantomcircuit> and then edit the files at fixed offsets using the block writing capabilities of librbd
[22:19] <phantomcircuit> my response was simply
[22:19] <phantomcircuit> wat
[22:20] <dmick> that....seems....very Survivorman
[22:20] <noob21> dmick: /usr/bin/monmaptool: writing epoch 0 to /tmp/foo/monmap (1 monitors)
[22:20] <iggy> seems doable, but bound to crash and burn at some point
[22:20] <noob21> that might be the issue?
[22:20] <joshd> todin: I'm not too familiar with what the 'controller node' runs, since it's an abstraction for the docs, I just know what the services need
[22:20] <dmick> noob21: the first step creates the monmap and writes it to -d
[22:20] <noob21> ok so i need to point it at the right -d
[22:20] <noob21> it doesn't automagically grab it
[22:20] <dmick> no, the /tmp location is fine
[22:20] <dmick> because you use that in later steps
[22:20] <todin> joshd: oki, have to dig a few tunnels between the networks
[22:21] <dmick> but the osd step is the same, and failed for you, so...that's surely the issue?
[22:21] <dmick> I think maybe you're overthinking this
[22:21] <noob21> lol i might be
[22:21] <noob21> i just want to fire up an osd and monitor on my local machine :)
[22:21] <dmick> between --prepare-monmap and --init-daemon osd, make sure the osd data dir (and journal dir, if different) are present
[22:22] <dmick> when I do that, --init-daemon initializes the filestore in that dir
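Roughly the single-host sequence being walked through here, per the comments at the top of the mkcephfs script; paths are the defaults, the data directories must exist before the osd/mon init steps, and -c /etc/ceph/ceph.conf may need repeating if your conf isn't in the default location:

    mkdir -p /var/lib/ceph/osd/ceph-0 /var/lib/ceph/mon/ceph-a   # data dirs for osd.0 and mon.a
    mkdir /tmp/foo
    mkcephfs -c /etc/ceph/ceph.conf --prepare-monmap -d /tmp/foo
    mkcephfs --init-local-daemons osd -d /tmp/foo                 # run on each osd node
    mkcephfs --prepare-mon -d /tmp/foo
    mkcephfs --init-local-daemons mon -d /tmp/foo                 # run on each mon node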
[22:22] <noob21> yup they're present
[22:22] <noob21> see that's what mine isn't doing
[22:23] <dmick> I don't believe you :)
[22:23] <noob21> lol
[22:23] <noob21> i can pastebin it :D
[22:23] <dmick> please
[22:24] * eschnou (~eschnou@157.68-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[22:24] <noob21> http://pastebin.com/sAYL2arB
[22:24] <dmick> so, when init'ing the mon, why do you expect contents in osd? :)
[22:24] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Remote host closed the connection)
[22:25] <noob21> well the mon doesn't work either
[22:25] <noob21> oh i see what i did
[22:25] <noob21> yes
[22:25] <noob21> when i swap that for osd nothing is in the osd directory
[22:26] <noob21> refresh the paste, i revised it
[22:26] <noob21> this is on centos6 so that might be the issue
[22:26] <noob21> i have a newer 3.2.38 kernel
[22:27] <dmick> http://pastebin.com/B4SUXRs2
[22:27] <dmick> is what I just did with success
[22:27] <noob21> ok lemme try that
[22:27] <dmick> your pastebin still says "init mon" and then "ls osd"
[22:27] <dmick> I am confuse
[22:28] <noob21> yep
[22:28] <noob21> nada
[22:28] <dmick> if there were also no error messages pointing at the problem...
[22:29] <noob21> ok how's this: http://pastebin.com/Pa80pGyY
[22:29] <dmick> I dunno. typescript the session and your ceph.conf and paste it?
[22:29] <noob21> ok lemme paste that
[22:29] <noob21> pretty simple ceph.conf: http://pastebin.com/mj1KnPVr
[22:30] <dmick> and the host you're running this on answers "uname" with "ceph1"?
[22:30] * loicd (~loic@rom26-1-88-170-47-165.fbx.proxad.net) Quit (Quit: Leaving.)
[22:30] <dmick> sorry, "hostname"
[22:30] <noob21> yeah i have to hide the hostname :(
[22:30] <dmick> well
[22:30] <noob21> it's a valid resolvable name
[22:31] <dmick> it answers "hostname" with whatever you have in "host="?
[22:31] <noob21> oh
[22:31] <dmick> literally?
[22:31] <noob21> yes
[22:31] <noob21> i've got the full fqdn in there
[22:31] <dmick> in where?
[22:31] <noob21> in the ceph.conf
[22:31] <dmick> they have to match
[22:31] <dmick> i.e. "the output of the hostname command" is one thing
[22:31] <dmick> "the contents of host =" is another thing
[22:31] <dmick> those things must be identical
[22:31] <noob21> hmm
[22:31] <janos> hostname -s, i thought
[22:32] <noob21> oh..
[22:32] <noob21> hostname -s is something else
[22:32] <dmick> yes, hostname -s
[22:32] <noob21> ok let me change that and try this again
[22:32] <janos> if you have host = blah, hostname -s needs to say blah
[22:32] <noob21> what a bonehead problem haha
[22:33] <dmick> basically mkcephfs looks through the whole conf looking for "OSD sections that are tagged with this host" and the way it does that is with hostname -s vs host=
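The matching pair dmick is describing, concretely (hostnames are illustrative):

    $ hostname -s
    ceph1

    # ceph.conf -- the short hostname, not the FQDN:
    [osd.0]
        host = ceph1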
[22:33] <noob21> haha... that was it
[22:33] <noob21> sorry man :(
[22:33] <dmick> no worries
[22:33] <bmjason> we all do it
[22:33] <bmjason> i did three today
[22:33] <noob21> haha
[22:33] <bmjason> :)
[22:33] <dmick> there's no error, because it's not considered an error
[22:33] <bmjason> bone head moves that is
[22:33] <noob21> i'm at a new place and the naming scheme is completely different
[22:33] <dmick> maybe we should see if one starts with the other and issue a warning if so, since this happens so often
[22:34] <noob21> ok lemme try and start it up
[22:34] * loicd (~loic@rom26-1-88-170-47-165.fbx.proxad.net) has joined #ceph
[22:34] <noob21> there we go :D
[22:34] <noob21> it's running
[22:34] <noob21> on centos6
[22:35] <noob21> no hacking involved just had to update the kernel
[22:38] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:38] <dmick> noob21: good
[22:38] <dmick> I think a warning there is possible and will pursue it
[22:38] <dmick> silly to waste time on this with no computer help
[22:38] <barryo> I'd like to get ceph working without updating the kernel on el6, what are the chances?
[22:39] <dmick> if you don't want kernel rbd or kernel cephfs, I think not bad
[22:39] * leseb (~leseb@78.251.34.83) Quit (Read error: Connection reset by peer)
[22:39] <dmick> and you can definitely use ceph without them
[22:39] * leseb (~leseb@78.251.34.83) has joined #ceph
[22:40] <barryo> a colleague reckons you can deal with rbd well enough using libvirt, does that sound reasonable?
[22:42] <dmick> depends on your use case, but yes, you can definitely run qemu-kvm VMs all day with only userland libraries
[22:43] <noob21> dmick: i agree :)
[22:44] <barryo> can you import images without using the kernel module?
[22:45] <dmick> yep
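For example, a purely userland import and check, no rbd.ko involved (pool and image names are illustrative):

    rbd import /var/tmp/disk.img rbd/myimage
    rbd info rbd/myimage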
[22:45] <noob21> looks like i can create the rbd but not map it haha
[22:45] <noob21> rbd: /sys/bus/rbd/add does not exist!
[22:45] <dmick> noob21: you definitely can't map format 2, you're not trying that, right?
[22:46] <noob21> is that the default for bobtail?
[22:46] <dmick> no
[22:46] <dmick> but that error sounds like rbd.ko couldn't load
[22:47] <noob21> yeah
[22:47] <noob21> i wonder if it's not included in the kernel build i have
[22:47] <noob21> oddly enough it lets me create the rbd
[22:48] <barryo> I had the same issue with mapping using the default el6 kernel
[22:48] <noob21> the default el6 kernel doesn't include rbd i think
[22:48] <dmick> noob21: create, you mean like with userland tools? sure, that's just manipulating teh cluster
[22:48] <noob21> it's like a revision or 2 too old
[22:48] <noob21> oh ok
[22:48] <dmick> krbd is completely separate from creation
[22:48] <noob21> the mapping is all the kernel then
[22:49] <noob21> i gotcha
[22:49] * leseb_ (~leseb@78.251.50.220) has joined #ceph
[22:49] <dmick> "map" means "associate a Real Kernel Block Device, using rbd.ko, with a previously-existing rbd image"
[22:49] <noob21> right
[22:49] <dmick> and is only necessary if you want a /dev/rbd<n> to deal with
[22:49] <dmick> (for you and barryo)
[22:50] <dmick> and rbd has been in the kernel for a long time, but it's kinda buggy in pre, say, 3.4-or-so
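For contrast, the kernel-module path would look roughly like this (image name illustrative; per the chat it needs a new enough kernel and a format-1 image):

    modprobe rbd
    rbd map rbd/myimage                   # typically shows up as /dev/rbd0
    mkfs.xfs /dev/rbd0 && mount /dev/rbd0 /mnt
    umount /mnt && rbd unmap /dev/rbd0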
[22:50] <barryo> that makes me happy, i misunderstood things earlier today and thought rbd would work properly on the el6 default kernel
[22:50] <dmick> barryo: it will, just not the kernel-module flavor
[22:51] <barryo> is there anything that can only be done with the kernel module?
[22:54] * KindOne (KindOne@h147.26.131.174.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[22:55] * leseb (~leseb@78.251.34.83) Quit (Ping timeout: 480 seconds)
[22:55] <dmick> barryo: yes. You must have the kernel module if you want to use rbd images as kernel block devices
[22:56] <dmick> but you do not need to use kernel block devices for qemu-kvm; it has built-in device types that can access the cluster directly
[22:56] <barryo> kernel block devices are only used for mounting locally, is that right?
[22:56] <dmick> http://ceph.com/docs/master/rbd/rbd/ may hep
[22:56] <dmick> *help
[22:57] <dmick> specifically the sections on qemu and libvirt
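A minimal qemu sketch of that userland path (pool/image names are illustrative; the id= and conf= parts of qemu's rbd: spec are optional):

    rbd create vm-disk --size 10240 --pool rbd
    qemu-system-x86_64 -m 1024 \
        -drive file=rbd:rbd/vm-disk:id=admin:conf=/etc/ceph/ceph.conf,if=virtio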
[22:58] <barryo> can you mount cephfs without the kernel module?
[22:58] <dmick> there is a FUSE module, yes
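For example (mountpoint and monitor address are illustrative):

    mkdir -p /mnt/mycephfs
    ceph-fuse -m mon-host:6789 /mnt/mycephfs
    fusermount -u /mnt/mycephfs      # to unmount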
[22:59] <barryo> excellent :)
[23:02] <dmick> and you can use it with Hadoop
[23:02] * KindOne (KindOne@h147.26.131.174.dynamic.ip.windstream.net) has joined #ceph
[23:03] * bmjason (~bmjason@static-108-44-155-130.clppva.fios.verizon.net) Quit (Quit: Leaving.)
[23:03] <barryo> I'm now totally sold on ceph over drbd for our virtualisation system
[23:03] <dmick> there is also a Ganesha plugin (for NFS), which may be harder to find, and a Samba plugin, which may not be findable at the moment, but are both at least in the words
[23:03] <dmick> *works
[23:05] <dmick> barryo: there is a lot of good introductory information on inktank.com as presentations/videos, and a lot of documentation on ceph.com
[23:05] <dmick> http://www.inktank.com/resource/webinar-getting-started-with-ceph/ in particular might be useful
[23:06] <barryo> thanks for the link
[23:06] <barryo> i'll check it out next week
[23:07] <dmick> np
[23:08] <barryo> saying that, the fact that I can come on IRC on a Friday night and get answers to all my questions is what really sells ceph to me
[23:09] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[23:10] <dmick> we're here to help :)
[23:10] * loicd (~loic@rom26-1-88-170-47-165.fbx.proxad.net) Quit (Quit: Leaving.)
[23:12] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[23:14] <barryo> am I right in thinking that putting ceph on a RAID5 array is a bad thing to do?
[23:15] <Gugge-47527> barryo: define bad
[23:15] <rturk> s/bad/confusing
[23:15] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[23:15] <rturk> :)
[23:15] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:16] <gregaf> it is not typically optimal, and the RAID-5 read/modify/write behaviors aren't going to do any favors under Ceph's workload
[23:16] <gregaf> but it'll work and could be right under some setups
[23:17] * sstan (~chatzilla@dmzgw2.cbnco.com) Quit (Remote host closed the connection)
[23:17] <dmick> my first thought is usually "unnecessary and wasteful", but if it's what you've got, all a Ceph OSD needs is "a filesystem", so it'll definitely work.
[23:17] * leseb_ (~leseb@78.251.50.220) Quit (Read error: Connection reset by peer)
[23:17] * leseb (~leseb@78.251.50.220) has joined #ceph
[23:18] <dmick> but Ceph is ready to deal with failures of plain-old-disks (assuming you plan your failure domains accordingly and have "enough" redundancy). It's designed for component failure.
[23:20] * Cube (~Cube@12.248.40.138) Quit (Read error: Operation timed out)
[23:21] <barryo> I thought that RAID5 would add some I/O overhead that probably isn't necessary considering the design of ceph
[23:22] <dmick> yep.
[23:22] <dmick> but if you have few OSDs on few hosts, it could add welcome reliability too
[23:23] * leseb (~leseb@78.251.50.220) Quit (Remote host closed the connection)
[23:25] * alram (~alram@38.122.20.226) Quit (Read error: Operation timed out)
[23:26] <nhm> I tend to think of RAID5 being a reasonable option when you have a crazy number of disks per host (say 60)
[23:26] <barryo> I'll probably have under 8 disks per host
[23:29] * wer (~wer@wer.youfarted.net) has joined #ceph
[23:32] * esammy (~esamuels@host-2-103-101-90.as13285.net) has joined #ceph
[23:39] <barryo> thanks for all the help, i'm off for the night
[23:39] * barryo (~barry@host86-128-153-10.range86-128.btcentralplus.com) has left #ceph
[23:44] * vata (~vata@2607:fad8:4:6:b4e7:cb12:5d61:573) Quit (Quit: Leaving.)
[23:45] * ScOut3R (~scout3r@1F2EAE22.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[23:47] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) Quit (Remote host closed the connection)
[23:54] <noob21> dmick: i found it. RBD isn't being built into the kernel i'm using
[23:54] <noob21> easy to fix :)
[23:55] <dmick> if you don't have a kernel module, it's difficult for it to load, yes
[23:55] <noob21> haha yes
[23:58] * drokita (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[23:59] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.