#ceph IRC Log


IRC Log for 2012-06-08

Timestamps are in GMT/BST.

[0:04] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[0:10] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:20] * lx0 is now known as lxo
[0:20] <lxo> I'm now trying to use $cluster in ceph.conf. it took me a while to find the --cluster option in the source code, for I couldn't find it in the docs
[0:20] <lxo> (I mean --help output, man pages, that sort of stuff)
[0:21] <lxo> I'm a bit disappointed that there doesn't seem to be a way to set what cluster should be expanded to in the conf file itself. this would have been convenient for my intended use
[0:21] <lxo> is there any reason for it to not be available?
[0:21] <lxo> other than "nobody wrote the patch yet", I mean ;-)
[0:21] <Tv_> lxo: it's too late at that stage
[0:22] <Tv_> lxo: the config file path is /etc/ceph/$cluster.conf
[0:22] <lxo> for that use, yes
[0:22] * MarkN (~nathan@ has joined #ceph
[0:22] <lxo> but if I specify say -c /etc/ceph/saved/ceph.conf, I'd like cluster name to be set to ceph/saved, so that all the $cluster occurrences refer to the proper locations
[0:23] <lxo> even if I didn't specify --cluster (because I'm so used to using -c by now ;-)
[0:23] <Tv_> lxo: don't put slashes in cluster name
[0:23] <Tv_> we haven't strictly defined the charset, but it most definitely isn't arbitrary
[0:23] <lxo> Tv_, don't get distracted by such details
[0:23] * MarkN (~nathan@ has left #ceph
[0:24] <Tv_> lxo: yeah you'll need to pass --cluster foo -c path
[0:24] <lxo> I currently have everything set up so that my current cluster is in /media/ceph, while old snapshots might be accessed in /media/ceph/saved
[0:24] <Tv_> lxo: i really don't want to see /etc/ceph/foo.conf saying cluster=bar, that'd be really confusing
[0:24] <Tv_> lxo: you run daemons for the snapshots too?
[0:25] <Tv_> lxo: i guess i just don't understand your use case (yet)
[0:25] <lxo> isn't it even more confusing (and dangerous) if ceph-osd -c /etc/ceph/foo.conf -i 0 --mkfs ends up overwriting the data of your main cluster, rather than complaining the clustername doesn't match, or that --cluster was not specified?
[0:26] <Tv_> lxo: actually that won't overwrite anything ;)
[0:26] <Tv_> lxo: but it may be awfully confusing
[0:26] <Tv_> lxo: -c is a bit dangerous that way
[0:26] <lxo> exactly my point
[0:26] <Tv_> you shouldn't use it much
[0:26] <Tv_> it was dangerous before --cluster was introduced
[0:26] <lxo> how so?
[0:27] <lxo> before, I had to set different pathnames for each cluster so they wouldn't overlap
[0:27] <Tv_> well perhaps "misleading" is a better word than "dangerous"
[0:27] <lxo> now I can use $cluster and they're more likely to be expanded to the same string if I happen to leave --cluster out
[0:28] <Tv_> you're just saying your fingers are trained to type -c /path/to/foo.conf not --cluster foo
[0:28] <lxo> I'd rather have a mismatch complaint than lose data
[0:28] <Tv_> the whole point of --cluster support was to avoid using -c all the time
[0:28] <Tv_> so put the config files in the right place, don't use -c normally, and you're safe
[0:29] <Tv_> but yeah i'll happily agree that it's not ideal
[0:29] <lxo> I know, but I'm already used to using -c
[0:29] <Tv_> the multiple clusters on same hardware feature had pretty big customer demand, that's why it was done
[0:29] <lxo> and, really, enabling people to set cluster from ceph.conf, and checking for mismatches from a manually-specified one, sounds far safer to me than the current arrangement. but what do I know? :-)
[0:30] <Tv_> lxo: the only thing i can think of is letting [global] cluster=foo not *set* the value, but *assert* the value
[0:30] <Tv_> lxo: that'll catch the mistake
[0:30] <Tv_> lxo: /etc/ceph/foo.conf saying cluster=bar would be really, really bad
[0:31] <lxo> asserting is good, yeah. if it could override the built-in default, but not an explicit --cluster setting, that would be even better
[0:31] <Tv_> both /etc/ceph/foo.conf and ceph.conf saying cluster=foo is pretty bad too
[0:31] <lxo> so I could have my ceph.conf set cluster name to e.g. home
[0:32] <Tv_> lxo: we really did not foresee people changing the name if they're running just one instance
[0:32] <lxo> *nod*
[0:32] <Tv_> it's not like anything would output the name, etc
[0:32] <Tv_> the less that feature is used, the more things remain copy-pasteable in conversations etc
[0:33] <Tv_> e.g. the new-style upstart config special-cases cluster=ceph, and lets you not type it in
[0:33] <Tv_> sudo initctl start ceph-osd id=42
[0:33] <Tv_> vs
[0:33] <Tv_> sudo initctl start ceph-osd cluster=home id=42
[0:33] * lxo is on systemd, not upstart :-)
[0:33] <Tv_> i expect similar things will apply there, later
[0:34] * a2 (avati@ has joined #ceph
[0:34] <Tv_> don't know if systemd understands instance jobs
[0:35] <lxo> anyway... I guess I'll leave pathnames in place for my old cluster while I copy data from it to the new cluster and be done with it for now. thanks
[0:35] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[0:35] <lxo> this is just something I use when I find corruption in the cluster (presumably introduced by old btrfs bugs) and decide to re-create it from scratch
[0:36] <lxo> so I move existing mons and osds to ceph/saved, create a new empty cluster, and copy things over. it's a bit of a pain, but I've had to do it a number of times since I started playing with ceph
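The assert-versus-set idea discussed above can be sketched in a few lines. This is only an illustration of lxo's variant (a conf-file value may replace the built-in default, but never an explicit --cluster), not how Ceph behaves; `resolve_cluster`, `ClusterMismatch`, and `default_conf_path` are hypothetical names, and only the default name "ceph" and the /etc/ceph/$cluster.conf path come from the conversation.

```python
DEFAULT_CLUSTER = "ceph"  # built-in default name mentioned in the log


class ClusterMismatch(Exception):
    """Raised when --cluster and the conf file disagree (hypothetical)."""


def resolve_cluster(cli_cluster=None, conf_cluster=None):
    """Pick the cluster name; a conf value asserts, it never overrides the CLI."""
    if cli_cluster is not None:
        # An explicit --cluster wins, but a conflicting conf value is an error.
        if conf_cluster is not None and conf_cluster != cli_cluster:
            raise ClusterMismatch(
                f"--cluster {cli_cluster} but conf says cluster={conf_cluster}"
            )
        return cli_cluster
    if conf_cluster is not None:
        # No --cluster given: the conf value replaces the built-in default.
        return conf_cluster
    return DEFAULT_CLUSTER


def default_conf_path(cluster):
    """Expand /etc/ceph/$cluster.conf for a given cluster name."""
    return f"/etc/ceph/{cluster}.conf"
```

With this rule, a run that passes -c alone would pick up the name from the file, and a run that passes a conflicting --cluster would fail loudly instead of silently expanding $cluster to the wrong string.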
[1:01] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:15] <joao> gregaf, around?
[1:15] <gregaf> yep
[1:21] * a2 (avati@ Quit (Quit: leaving)
[1:36] <Qten> hi all, i know its recommended not to use ceph and kvm on the same physical hardware, is this due to something that is going to be fixed or ?
[1:37] * Tv_ (~tv@2607:f298:a:607:bd15:990e:65cd:46db) Quit (Read error: Operation timed out)
[1:37] <joshd> Qten: there's no recommendation against kvm
[1:38] <Qten> sorry i meant using rbd vols
[1:38] <Qten> (just waking up)
[1:39] <Qten> or have i got my information wrong ? :)
[1:39] <joshd> ah, no, that's an inherent limitation in the kernel, and proposals for using special memory pools or pinning haven't gotten upstream in the past
[1:40] <joshd> if you really want to run them on the same host, having syncfs support makes the problem less likely, but does not eliminate it
[1:41] <Qten> so which component is the part that's causing the issue?
[1:43] <Qten> i guess i'm having issues with paying $8600 for a pure storage node with 12 x 3tb drives
[1:43] <Qten> and 2 ssd's
[1:44] <Qten> but thats a personal problem i guess :)
[1:45] <joshd> it's a deadlock when there's memory pressure - the kernel rbd module is sending requests to the osd on the same host, and can't proceed until they finish, but the osd and underlying fs need more memory to process those requests, which is unavailable
[1:46] <Qten> so its more about using too much physical memory for vms?
[1:46] <joshd> not necessarily for vms directly, but for the kernel, as I understand it
[1:47] <Qten> sorry thats what i meant :)
[1:48] <dmick> it's not so much using too much as "there's deadlock when you run out"
[1:48] <Qten> if we added say 128g to the box and only used 96 for vm's then technically we should always have 32 available for the os as if we only had 32g in the box to begin with or is this more of a scaling issue ie: more memory in use = more memory needed
[1:48] <sjust> Qten: are you planning on mounting the rbd volumes from the host or the guest?
[1:49] <Qten> sjust: the vols would have the os installed on them so the host
[1:51] <joshd> the scaling issue is with I/O from the vms requiring more memory usage by kernel rbd - if guests aren't accessing the disk much, it's not likely to be a problem
[1:51] <Qten> i do understand the issue i think now a bit better if i'm correct in saying its more related to more memory in use the more memory possibly required by the kernel for fs/rbd/etc
[1:54] <Qten> yep
[1:54] <Qten> on a semi unrelated note joshd i believe you are working on openstack/nova doco for rbd vols? :)
[1:55] * Sentinel (~sentinel@174-23-184-173.slkc.qwest.net) has joined #ceph
[1:56] <joshd> yeah, it's in progress
[1:57] <joshd> should be happening in the next couple weeks
[1:58] <Qten> any tid bits you can leak yet?
[2:00] <Qten> i found the part about adding the volume info to the nova.conf file from openstack doco but thats all i've been able to find
[2:06] * lofejndif (~lsqavnbok@82VAAEBX3.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[2:11] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[2:11] * stass (stas@ssh.deglitch.com) Quit (Read error: Connection reset by peer)
[2:18] * joao (~JL@ Quit (Quit: Leaving)
[2:24] * stass (stas@ssh.deglitch.com) has joined #ceph
[2:30] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit (Quit: adjohn)
[2:52] * eightyeight (~atoponce@pthree.org) Quit (Quit: wwjd? jwrtfm.)
[3:04] * aliguori (~anthony@ has joined #ceph
[3:17] <renzhi> joshd: thanks for the info, I'll take a look into that
[3:19] <joshd> renzhi: cool, sounds good - the fixes shouldn't be too hard, but there may be other race conditions I didn't check too closely
[3:19] <renzhi> ok, so right now, I shouldn't share the cluster handle.
[3:20] <renzhi> Was asking this because of the crash I ran into with librados, and was trying to see if there are other ways
[3:20] <joshd> you should be able to share the cluster handle
[3:21] <joshd> but it hasn't been stress tested to detect races when using many ioctxs recently
[3:22] <joshd> basically any shared state in the RadosClient (like the osdmap) needs to be protected by the RadosClient::lock
[3:23] <renzhi> I looked at the code, and thought that I could share it, and since I haven't really looked closely, I'm not too sure.
[3:23] <renzhi> is this issue on someone's radar, and is it actively worked on?
[3:24] <joshd> not immediately, no
[3:24] <joshd> I won't have time to do it any time soon, at least
[3:24] <joshd> if you'd like to work on it, feel free
[3:25] <renzhi> :) I'll take a look, and see if I could submit some patch, and right now, we are busy with stress testing and the crash just hit us.
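The locking rule joshd describes (shared state in the client guarded by a single lock) follows a general pattern that can be sketched as below. This is not librados code: `SharedClient` and its fields are hypothetical stand-ins for RadosClient and its osdmap, written in Python only to show the shape of the pattern.

```python
import threading


class SharedClient:
    """Hypothetical stand-in for a client handle shared by many threads."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of RadosClient::lock
        self._osdmap_epoch = 0         # shared state, only touched under the lock

    def update_osdmap(self, epoch):
        # Writers take the lock so concurrent updates cannot interleave.
        with self._lock:
            if epoch > self._osdmap_epoch:
                self._osdmap_epoch = epoch

    def osdmap_epoch(self):
        # Readers take the same lock so they never see a half-applied update.
        with self._lock:
            return self._osdmap_epoch
```

Any accessor that reads or writes the shared field without taking the lock would reintroduce the kind of race joshd says has not been stress tested.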
[3:27] * Ryan_Lane (~Adium@dslb-178-005-175-188.pools.arcor-ip.net) has joined #ceph
[3:27] * Ryan_Lane1 (~Adium@dslb-178-005-175-188.pools.arcor-ip.net) Quit (Read error: Connection reset by peer)
[3:33] * aliguori (~anthony@ Quit (Read error: Operation timed out)
[3:50] <joshd> Qten: the basic steps are 1. add ceph.conf and keyring file to the compute node 2. install ceph-common 3. add secret key to libvirt 4. add the flags to nova (the ones that the openstack docs mention, plus rbd_user (the rados user, i.e. admin) and rbd_secret_uuid (the uuid that libvirt used to store the secret)) 5. set CEPH_ARGS="--id <user>" for nova-volumes
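Steps 4 and 5 above can be sketched as plain data. The helper names `rbd_flags` and `nova_volume_env` are hypothetical; only rbd_user, rbd_secret_uuid, and CEPH_ARGS="--id <user>" come from joshd's message, and how the flags are actually written into nova's configuration is deliberately not assumed here.

```python
import os


def rbd_flags(rbd_user, rbd_secret_uuid):
    """Collect the two extra flags joshd names, as a plain mapping."""
    return {"rbd_user": rbd_user, "rbd_secret_uuid": rbd_secret_uuid}


def nova_volume_env(rados_user, base_env=None):
    """Return an environment with CEPH_ARGS set as in step 5."""
    env = dict(os.environ if base_env is None else base_env)
    env["CEPH_ARGS"] = f"--id {rados_user}"
    return env
```

A caller could pass the returned environment to whatever launches nova-volumes; the uuid itself has to be the one libvirt reported when the secret was stored.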
[3:54] * chutzpah (~chutz@ Quit (Quit: Leaving)
[4:10] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Quit: Leaving.)
[4:23] <Sentinel> I thought mkcephfs copies the ceph.conf and keyring files to node 2 and on?
[4:24] <Sentinel> I'm still working on getting it to work myself actually. But when I run mkcephfs I see the files copied over and they seem to be fine
[4:24] <dmick> Sentinel: it certainly can, if you set up the cluster with it. There are other ways to set up a cluster
[4:24] <Sentinel> when I start the service on the systems mon is up but osd still not running. Not sure what I'm doing wrong :/
[4:25] <Sentinel> no worries. I've worked with a few other cluster techs but ceph is a little different. still trying to work it out...
[4:25] <Sentinel> ceph health still says no osd.s running ...
[4:25] <dmick> you can also start from the 'master' node with init.d/ceph -a start (or service ceph -a start )
[4:26] <Sentinel> yeah that works. I can even look and see the default pools with lspools
[4:26] <dmick> worth verifying if the procs started on the osd nodes, and if not, there may be info left behind in the logs. ah. ok.
[4:27] <Sentinel> I'll have to check that. been poking through the logs a bit
[4:27] <Sentinel> hoping for some tidbit why I'm not quite there yet :D
[4:28] <Sentinel> setup a couple of test nodes and turned off authentication (for now). figured I'd get the basics working then get more elaborate
[4:28] <dmick> wait, so, are you saying ceph health still reports problems even though lspools works? Or did you just have success during this conversation?
[4:29] <Sentinel> that's the rub. I can do a rados lspools and see the default pools from both nodes (nodes 1 and 2 are running). rados from either will show them. But ceph health says no osds
[4:30] <Sentinel> I'm running ubuntu 12.04 and using their ceph packages
[4:30] <dmick> how about ceph -s?
[4:31] <Sentinel> looks like that works. I see mon.0 with an address and an election completed but no osds still
[4:31] <dmick> can you paste the status about osds?
[4:31] <Sentinel> says osd e1: 0 osds: 0 up, 0 in
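A status line like the one Sentinel quotes is easy to check mechanically. The sketch below parses exactly that format; the regular expression is derived only from the line in this log, so other Ceph versions may print something different, and `parse_osd_status` is a hypothetical helper name.

```python
import re

# Pattern built from the line quoted above: "osd e1: 0 osds: 0 up, 0 in"
OSD_STATUS = re.compile(
    r"osd e(?P<epoch>\d+): (?P<total>\d+) osds: (?P<up>\d+) up, (?P<in_>\d+) in"
)


def parse_osd_status(line):
    """Return osdmap epoch and osd counts, or None if the line does not match."""
    match = OSD_STATUS.search(line)
    if match is None:
        return None
    return {
        "epoch": int(match.group("epoch")),
        "total": int(match.group("total")),
        "up": int(match.group("up")),
        "in": int(match.group("in_")),
    }
```

A total of 0 at epoch 1, as here, suggests the osds never registered with the monitor at all, rather than having been up and then marked down.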
[4:32] <dmick> and you've got osds configured in ceph.conf? :)
[4:33] <dmick> (in fact maybe you could just pastebin the ceph.conf)
[4:33] <Sentinel> maybe I don't :D from the ceph.conf file I have a general [osd] plus a couple of other's [osd.0] and [osd.1] both have a host= node1 and node2 which resolve in the hosts table
[4:34] <Sentinel> lemme see about copying it.. I have it on another machine not part of my network (yet). It's in a couple vms'
[4:35] <dmick> mm. cut'n'paste can be a challenge
[4:35] <Sentinel> I'm adding the vm to my network and will ssh it over to another live system
[4:36] <dmick> brb
[4:36] <Sentinel> k
[4:38] <Sentinel> http://pastebin.com/CjGePNVK
[4:38] <Sentinel> got it :D
[4:38] <Sentinel> btw, node 1 and node 2 resolve in /etc/hosts on both systems. I also have ssh keys setup for root
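A quick way to double-check the layout Sentinel describes (a general [osd] section plus [osd.0] and [osd.1], each with a host) is to read the file with an INI parser. The sample text below is a hypothetical reconstruction from his description, not the contents of the pastebin, and a real ceph.conf may use syntax that Python's configparser does not accept.

```python
import configparser

# Hypothetical sample built from the description in the log.
SAMPLE_CONF = """
[osd]

[osd.0]
host = node1

[osd.1]
host = node2
"""


def osd_hosts(conf_text):
    """Map each [osd.N] section to its host value (None if host is missing)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    hosts = {}
    for section in parser.sections():
        if section.startswith("osd."):
            hosts[section] = parser.get(section, "host", fallback=None)
    return hosts
```

Comparing the returned host names against what each node reports as its own hostname is one way to spot a mismatch, since the init script decides what to start on a node from these host entries.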
[4:51] <Sentinel> I noticed there is a 0.47 version out there. I'm running 0.41 and wonder perhaps there is my issue?
[4:56] <dmick> 0.41 is earlier than the 12.04 version...that's odd
[4:56] <dmick> at least I'm pretty sure it is
[4:57] <sage> iirc 12.04 is 0.41...
[4:57] <Sentinel> yup. Ubuntu 12.04 provided 0.41 of ceph
[4:57] <dmick> time flies. and clearly sage would know
[4:57] <Sentinel> I even ran an apt-get update to make sure
[4:57] <Sentinel> but I'm wondering if the ceph.conf I have is sufficient anyway
[4:57] <Sentinel> http://pastebin.com/CjGePNVK
[4:57] <dmick> yeah, looking
[4:57] <sage> fwiw you'll be better off with the recent debs that we build
[4:58] * cattelan is now known as cattelan_away
[4:58] * Ryan_Lane1 (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) has joined #ceph
[4:58] <Sentinel> that would be cool. didn't realize they had a repo out there. Checking...
[5:00] <dmick> http://ceph.com/docs/master/install/debian/#add-development-packages
[5:00] <Sentinel> awesome thanks!
[5:04] * Ryan_Lane (~Adium@dslb-178-005-175-188.pools.arcor-ip.net) Quit (Ping timeout: 480 seconds)
[5:04] <dmick> I don't see anything immediately wrong with that conf file, but
[5:05] <Sentinel> something missing perhaps?
[5:05] <dmick> i'd check /var/log/ceph/*osd*.log to see if there are any clues
[5:05] <dmick> could be, but I'm not spotting it
[5:05] <dmick> ceph -a start doesn't report any errors, I'm assuming
[5:06] <Sentinel> haven't tried with ceph -a I'll check
[5:06] <dmick> well I mean
[5:06] <dmick> /etc/init.d/ceph or service ceph, whichever you used to start the cluster
[5:08] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[5:08] <Sentinel> I use 'service ceph start' and have the two boxes running fine. ceph -a however says mon.0 -> 'unrecognized subsystem' (-22)
[5:09] <Sentinel> I'm using xfs filesystem for the /xfsdisk thought that would be ok
[5:09] <dmick> so, ceph won't start things on the other nodes without -a; I usually use -a from the 'master'
[5:09] <dmick> are you saying you're running ceph start on each node?
[5:09] <Sentinel> yes
[5:10] <dmick> hmm, do I expect that to work...
[5:10] <Sentinel> looking at the logs...
[5:11] <dmick> yeah, I just tested that on my small cluster, and it works ok
[5:12] <Sentinel> interesting. In the logs there are some bits that look odd. I see '*** caught signal (Aborted) ** then what looks like a stack trace of a sort
[5:12] <dmick> that doesn't sound good
[5:12] <Sentinel> I'll setup the latest and see if that fixes it.
[5:12] <dmick> it might be quickest to...JINX
[5:12] <Sentinel> what version did you run in your test?
[5:13] <dmick> I'm running a development snapshot, but it's based on 0.47.2
[5:13] <Sentinel> ok. I'll do the upgrade and see where it's at. I kept thinking I was going nuts. It didn't look that tough to setup after reading the docs :D
[5:14] <dmick> yeah, osd dumping core may just be an old bug
[5:16] * The_Bishop (~bishop@2a01:198:2ee:0:a4c6:6eb9:d9c5:a283) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[5:20] * The_Bishop (~bishop@2a01:198:2ee:0:5db3:33be:9388:34dd) has joined #ceph
[5:26] <dmick> Sentinel: I have to take off pretty soon
[5:27] <Sentinel> no worries. I'm updating now :D
[5:27] <Sentinel> and I really appreciate the help. at least I know my conf isn't far off :D
[5:29] <dmick> good deal
[5:29] <dmick> feel free to email me at dan.mick@inktank.com if I can help further, or I'll be around tomorrow PDT biz hours
[5:30] <Sentinel> sounds good and thanks a bunch!
[5:30] <dmick> np
[5:30] <elder> You call this biz hours?
[5:30] <Sentinel> rofl!
[5:30] <dmick> heh. no, but tomorrow I won't be here this late
[5:30] <dmick> night
[5:30] * dmick is now known as dmick_away
[5:30] <Sentinel> thanks and have a good night!
[6:20] * Ryan_Lane (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) has joined #ceph
[6:20] * Ryan_Lane1 (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) Quit (Read error: Connection reset by peer)
[6:21] * Ryan_Lane1 (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) has joined #ceph
[6:21] * Ryan_Lane (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) Quit (Read error: Connection reset by peer)
[6:41] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit (Quit: adjohn)
[6:46] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[7:12] * Ryan_Lane (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) has joined #ceph
[7:12] * Ryan_Lane1 (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) Quit (Read error: Connection reset by peer)
[7:20] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[7:32] * aliguori (~anthony@ has joined #ceph
[7:47] * The_Bishop (~bishop@2a01:198:2ee:0:5db3:33be:9388:34dd) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[7:48] * Ryan_Lane1 (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) has joined #ceph
[7:48] * Ryan_Lane (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) Quit (Read error: Connection reset by peer)
[7:55] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[8:05] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[9:03] * renzhi (~renzhi@ Quit (Read error: No route to host)
[9:04] * renzhi (~renzhi@ has joined #ceph
[9:06] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Read error: Connection reset by peer)
[9:08] * Qu310 (~qgrasso@ppp59-167-157-24.static.internode.on.net) Quit (Remote host closed the connection)
[9:11] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[9:17] * Qu310 (~qgrasso@ppp59-167-157-24.static.internode.on.net) has joined #ceph
[9:19] * BManojlovic (~steki@ has joined #ceph
[9:26] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[9:31] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit (Quit: adjohn)
[9:36] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[10:02] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[10:06] * Ryan_Lane (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) has joined #ceph
[10:06] * Ryan_Lane1 (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) Quit (Read error: Connection reset by peer)
[10:06] * Ryan_Lane1 (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) has joined #ceph
[10:06] * Ryan_Lane (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) Quit (Read error: Connection reset by peer)
[10:08] * Ryan_Lane (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) has joined #ceph
[10:08] * Ryan_Lane1 (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) Quit (Read error: Connection reset by peer)
[10:25] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[10:31] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[10:42] * Anticimex (anticimex@netforce.csbnet.se) Quit (Remote host closed the connection)
[10:53] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[10:54] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[10:59] * renzhi (~renzhi@ Quit (Quit: Leaving)
[11:01] * renzhi (~renzhi@ has joined #ceph
[11:03] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Ping timeout: 480 seconds)
[11:10] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[11:18] * Anticimex (anticimex@netforce.csbnet.se) has joined #ceph
[11:20] * sdx24 (~sdx23@with-eyes.net) has joined #ceph
[11:22] * sdx23 (~sdx23@with-eyes.net) Quit (Read error: Connection reset by peer)
[11:44] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[11:48] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[11:48] * joao (~JL@89-181-148-114.net.novis.pt) has joined #ceph
[11:51] * The_Bishop (~bishop@2a01:198:2ee:0:6989:cd09:f772:c37b) has joined #ceph
[12:04] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[12:26] * renzhi is now known as renzhi_away
[12:27] * MK_FG (~MK_FG@ Quit (Ping timeout: 480 seconds)
[12:36] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) has joined #ceph
[12:37] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[12:50] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[12:53] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[12:53] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[13:00] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) Quit (Ping timeout: 480 seconds)
[13:01] * MK_FG (~MK_FG@ has joined #ceph
[13:20] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:36] * lofejndif (~lsqavnbok@9YYAAGX8Z.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:32] * lofejndif (~lsqavnbok@9YYAAGX8Z.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[14:42] * Ryan_Lane1 (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) has joined #ceph
[14:42] * Ryan_Lane (~Adium@dslb-188-106-102-014.pools.arcor-ip.net) Quit (Read error: Connection reset by peer)
[15:32] * aliguori (~anthony@ has joined #ceph
[15:49] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[15:52] * brambles (brambles@ Quit (Remote host closed the connection)
[15:53] * brambles (brambles@ has joined #ceph
[15:55] <nhm> good morning #ceph
[15:55] <sage> good morning :)
[15:55] <nhm> sage: up early today!
[15:56] <sage> :) i have ~20 min before i have to get the kids up
[15:56] <nhm> Hah, Our kids get *us* up (at about 5:30-6:00am)
[15:58] * brambles_ (brambles@ has joined #ceph
[15:59] <sage> speaking of kids, brambles_, every time i see your nick i think of this guy from richard scarry's dictionary http://dpcc.com/lpetix/brain/brambles.jpg
[15:59] <nhm> hehe
[16:13] * usuario (~usuario@220.Red-81-38-221.dynamicIP.rima-tde.net) has joined #ceph
[16:14] * usuario (~usuario@220.Red-81-38-221.dynamicIP.rima-tde.net) has left #ceph
[16:24] * aliguori (~anthony@ Quit (Quit: Ex-Chat)
[16:30] <jmlowe> is there some sort of integrity checker for ceph? I'm looking for something to compare object replicas to make sure they are the same
[16:32] <joao> i'm sure a checksumming tool, going over the objects on the back storage nodes would accomplish that same purpose
[16:33] <jmlowe> I didn't see a way in the api to select an object from a specific osd, is that possible?
[16:33] <joao> in any case, the filestore_idempotent_sequence test has a class that goes around comparing two distinct filestores, if you want to get your hands dirty
[16:33] <joao> you'd have to have access to the osd's back stores though
[16:34] <joao> jmlowe, not into those kinds of details unfortunately
[16:34] <joao> *I'm not
[16:35] * eightyeight (~atoponce@pthree.org) has joined #ceph
[16:40] * stxShadow (~Jens@ip-78-94-238-69.unitymediagroup.de) has joined #ceph
[16:51] * rz (~root@ns1.waib.com) Quit (Ping timeout: 480 seconds)
[16:58] * rz (~root@ns1.waib.com) has joined #ceph
[17:12] * rz (~root@ns1.waib.com) Quit (Ping timeout: 480 seconds)
[17:13] * rz (~root@ns1.waib.com) has joined #ceph
[17:16] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:30] <ninkotech> fsck.ceph :) that would be nice
[17:34] <jmlowe> I dug around a bit and was able to compare some md5 sums in the storage backend
[17:35] * Tv_ (~tv@2607:f298:a:607:bd15:990e:65cd:46db) has joined #ceph
[17:38] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:39] <joao> jmlowe, comparing md5sums should be fairly simple to do cluster-wide with a bash script
[17:40] <joao> I would assume it would be slow if you have a big cluster with lots of osds and objects though :)
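The md5 comparison jmlowe and joao describe can be sketched as a walk over two directory trees. This assumes you have already made two replicas' backing directories readable on one machine (for example via copies or mounts) and that replica files share the same relative paths; neither assumption is confirmed by the log, the function names are hypothetical, and extended attributes are not compared.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large objects do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Map each file's path relative to root to its md5 hex digest."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sums[os.path.relpath(full, root)] = md5_of_file(full)
    return sums


def compare_trees(root_a, root_b):
    """Return sorted relative paths present in both trees whose digests differ."""
    sums_a = checksum_tree(root_a)
    sums_b = checksum_tree(root_b)
    shared = sums_a.keys() & sums_b.keys()
    return sorted(path for path in shared if sums_a[path] != sums_b[path])
```

As joao notes, this reads every byte of every object, so on a large cluster it would be slow, and objects being written during the scan can show up as false mismatches.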
[18:07] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit (Quit: Leaving.)
[18:10] * BManojlovic (~steki@ has joined #ceph
[18:12] <nhm> interesting, Chris Mason is going to work for fusion IO...
[18:12] <elder> Hmmm.
[18:12] <Tv_> well that's one way to solve the seek problem ;)
[18:12] <joao> lol
[18:12] <elder> Oracle losing interest in btrfs?
[18:13] <nhm> elder: "Fusion-io really believes in open source, and I'm excited to help them shape the future of high performance storage."
[18:13] <elder> Chris losing interest in btrfs?
[18:14] <Tv_> "From a Btrfs point of view, very little will change. I'll still
[18:14] <nhm> elder: sounds like he's still going to be the btrfs maintainer.
[18:14] <Tv_> maintain Btrfs and will continue all of my Btrfs development in the
[18:14] <Tv_> open."
[18:14] <Tv_> https://lwn.net/Articles/500738/ etc
[18:14] <nhm> http://article.gmane.org/gmane.linux.file-systems/65090/
[18:14] <Tv_> "Posted Jun 7, 2012 14:37 UTC (Thu) by masoncl (subscriber, #47138) [Link]
[18:14] <Tv_> I'll actually disagree here, but regardless, Btrfs is still my prime focus."
[18:15] <Tv_> sounds like btrfs is definitely not worse off
[18:20] <joao> it just feels like oracle will lose a whole lot of control it kept on btrfs though
[18:20] <Tv_> oh yes
[18:21] <elder> Oracle was not dictating what Chris did with btrfs, so "control" wasn't really an issue. He was pretty much free to develop it as he saw fit.
[18:21] <Tv_> so btrfs might be better off ;)
[18:21] <Tv_> <- not an Oracle fan
[18:21] <joao> although I'm not sure if that's actually a bad thing
[18:21] <joao> lol
[18:21] <nhm> Tv_: that was my thought too.
[18:21] <Tv_> the one thing you might see, is a little big of neglect for spinning rust
[18:21] <elder> I expect so Tv_
[18:21] <joao> elder, I'm not familiar with that sort of thing, but Oracle did have chris as an employee and now they don't
[18:21] <Tv_> just because you know he'll get a fast disk on his laptop ;)
[18:22] <joao> that's all I'm basing my opinion on :)
[18:22] <Tv_> s/big/bit/
[18:22] <elder> He likely already has one. But spinning media is not what Fusion IO is into.
[18:22] <Tv_> elder: he had an SSD on his laptop when i last heard him talk
[18:23] <Tv_> but fusionio tends to outperform run of the mill ssds by quite a lot
[18:24] <nhm> Tv_: I'd like to do some tests on the fusionIO cards vs OCZ stuff.
[18:24] <Tv_> as in, they quote numbers above SATA 3 rates
[18:24] <Tv_> oh and that's *bytes* not *bits* there
[18:24] <Tv_> so >>
[18:35] <gregaf> since when does FusionIO go into laptops?
[18:37] <Tv_> not sure if they have a PCI Express Mini Card product out in the market
[18:37] <Tv_> but it's the same interface, just less size on the board
[18:37] <Tv_> (= less storage)
[18:38] <gregaf> all the pictures I've seen of them are big 16x cards, I thought, so I was thinking more board size than electrical compatibility ;)
[18:38] <Tv_> yeah the price makes more sense in high end servers
[18:39] <Tv_> but it wouldn't be the first hardware company to have way nicer prototypes internally
[18:39] <Tv_> just not commercially viable
[18:39] <Tv_> the board size just says how many cells of storage you can cram in
[18:39] <Tv_> but the big boards are *huge*
[18:40] <Tv_> as in, terabyte range
[18:41] <Tv_> uhh 5TB
[18:41] <Tv_> drool
[18:41] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[18:42] <SpamapS> Tv_: hey, how's CEPH dev going?
[18:42] <Tv_> lol, apt-cache show eatmydata
[18:43] <Tv_> SpamapS: the new chef stuff is getting very nice
[18:43] <SpamapS> Tv_: *awesome*
[18:43] <nhm> Tv_: yeah, if the Fast-Forward proposal goes through, we'll probably be buying something like that.
[18:43] <SpamapS> Tv_: is the "new way" available in 0.47.1 ?
[18:43] <Tv_> SpamapS: oh and i can spin up ubuntu cloud image vms faster than you ;) https://github.com/ceph/downburst
[18:44] <SpamapS> Tv_: *nice*!
[18:44] <Tv_> SpamapS: it really needs one bugfix that went in recently, and 7332e9c717fb627d51efcaa3f31473a2c129e876 makes it secure, too
[18:44] <SpamapS> Tv_: tho I've been using lvm for lxc container creation for a while now. :)
[18:45] <Tv_> SpamapS: 8 seconds to ssh in on a full kvm vm
[18:45] <SpamapS> Tv_: yeah, sounds about the same as the LXC case
[18:45] <SpamapS> which is no surprise, they're basically doing the same thing
[18:45] <Tv_> SpamapS: also, remote and scales as much as i need
[18:45] <SpamapS> tho qcow is going to be more flexible than lvm
[18:46] <Tv_> SpamapS: plus really really sweet integration with cloud-init
[18:46] <nhm> Tv_: I'm so looking forward to chef deployments.
[18:46] <ninkotech> downburst looks nice!
[18:47] <nhm> Tv_: Especially when someone (that isn't me) abstracts megacli away. ;)
[18:47] <ninkotech> but i would like to have debian, not ubuntu
[18:47] <Tv_> ninkotech: the enabler is ubuntu cloud images; a tiny rootfs that has all the right stuff in it
[18:48] <Tv_> ninkotech: that and cloud-init, really
[18:48] <ninkotech> Tv_: i am not a big fan of ubuntu. its not as good and systematic as debian is :)
[18:49] <ninkotech> but i might improve the code to fit my needs maybe... if i will have time
[18:49] <Tv_> frankly, i'll take "works on my laptop" and "does awesome things on the cloud" over debian
[18:49] <Tv_> (and you'll still find my name attached to an old @debian.org address all over the net)
[18:50] <Tv_> on the cloud? in the cloud?
[18:50] <Tv_> when surrounded by water vapor.
[18:50] <ninkotech> Tv_: this 'it works for me, so it must be good enough' is exactly why i do not like ubuntu :)
[18:52] <ninkotech> but i will fight for your right to use it :)
[18:53] * bchrisman (~Adium@ has joined #ceph
[18:57] <SpamapS> ninkotech: you know ubuntu is based on Debian, right?
[18:58] <ninkotech> SpamapS: yes, sir
[18:58] * stxShadow (~Jens@ip-78-94-238-69.unitymediagroup.de) Quit (Read error: Connection reset by peer)
[18:59] <SpamapS> I'm a DD, btw, also owning one of those @debian.org addresses. I think Debian serves an important purpose as a broadly accepting, technically excellent operating system...
[18:59] <SpamapS> But I'm also a Canonical Employee.. so I have to say.. Ubuntu is quite a bit better for certain things. :)
[19:00] <SpamapS> One of those things is "timely security updates" :)
[19:02] <SpamapS> ninkotech: anyway, Ubuntu has come a long way.. we don't just say "works for me" and ship it every 6 months.. https://jenkins.qa.ubuntu.com/
[19:04] * SpamapS files himself in the spam folder
[19:04] <ninkotech> SpamapS: the culture differences are deep and you can't overcome them with a few automated tests, even if they are good to have... :)
[19:04] <ninkotech> heh... we should go to another channel for distro wars :)
[19:04] * goozoo (~hhhaaa@9YYAAGYLI.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:05] <ninkotech> i guess both distros are needed :)
[19:05] <goozoo> i guess both distros are needed :)
[19:08] * chutzpah (~chutz@ has joined #ceph
[19:14] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:18] * rturk (~rturk@2607:f298:a:607:e43f:f6c8:8a95:3a6c) has joined #ceph
[19:18] * rturk (~rturk@2607:f298:a:607:e43f:f6c8:8a95:3a6c) has left #ceph
[19:19] * rturk (~rturk@aon.hq.newdream.net) has joined #ceph
[19:23] * goozoo (~hhhaaa@9YYAAGYLI.tor-irc.dnsbl.oftc.net) Quit (autokilled: This host violated network policy. Mail support@oftc.net if you think this in error. (2012-06-08 17:23:14))
[19:23] * goozoo (~hhhaaa@9YYAAGYM7.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:23] * goozoo (~hhhaaa@9YYAAGYM7.tor-irc.dnsbl.oftc.net) Quit ()
[19:50] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Read error: Operation timed out)
[19:50] * gregaf (~Adium@2607:f298:a:607:495f:f540:7282:4591) Quit (Read error: Operation timed out)
[19:51] * dmick_away (~dmick@2607:f298:a:607:f04f:10a7:7055:7d8a) Quit (Read error: Operation timed out)
[19:51] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) Quit (Read error: Operation timed out)
[19:51] * rturk (~rturk@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[19:51] * Tv_ (~tv@2607:f298:a:607:bd15:990e:65cd:46db) Quit (Read error: Operation timed out)
[19:53] * mkampe (~markk@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[19:53] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[19:54] * sagewk (~sage@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[19:54] * yehudasa (~yehudasa@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[19:58] <Sentinel> So ceph is now working great! Had to upgrade to 0.47.2 from Ubuntu's 0.41 to get around the core dumps we were seeing :D
[19:58] <Sentinel> so now I'm curious about HA and replication. is there some way to tell what your replication is set at?
[19:59] <Sentinel> I'm pretty sure it's replicating files. 4 gigs of files and I'm using about 9.5 gigs of space out of 40 gigs in my /xfsdisk :D
[20:02] * rturk (~rturk@ has joined #ceph
[20:05] * Tv_ (~tv@aon.hq.newdream.net) has joined #ceph
[20:05] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) has joined #ceph
[20:05] * yehudasa (~yehudasa@2607:f298:a:607:107e:54e6:81f1:673f) has joined #ceph
[20:06] * mkampe (~markk@2607:f298:a:607:222:19ff:fe31:b5d3) has joined #ceph
[20:06] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[20:07] * sagewk (~sage@2607:f298:a:607:219:b9ff:fe40:55fe) has joined #ceph
[20:07] * gregaf (~Adium@2607:f298:a:607:d0d5:985a:c3ae:dced) has joined #ceph
[20:08] * dmick (~dmick@2607:f298:a:607:60fa:5af4:363a:faee) has joined #ceph
[20:08] <Sentinel> Hey dan. ceph 0.47.2 is working great now :D
[20:08] * mtk (qgc2S56Jrd@panix2.panix.com) has joined #ceph
[20:09] <Sentinel> mounted the file system and created a few 1 gig files and copied them successfully.
[20:13] * mtk (qgc2S56Jrd@panix2.panix.com) Quit (Remote host closed the connection)
[20:18] <gregaf> Sentinel: if you do "ceph osd dump" it will output all the pools, which will include a "size" parameter
[20:18] <gregaf> that's the number of copies there are supposed to be for everything in that pool
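As a rough illustration of gregaf's suggestion, the sketch below pulls the per-pool "size" value out of saved "ceph osd dump" output. The sample text and its line layout are assumptions made for illustration, not captured output; real output may differ between Ceph versions, and the helper name pool_sizes is hypothetical.

```python
import re

# Hypothetical sample text. The exact layout of `ceph osd dump` pool lines
# is an assumption here; adjust the pattern to match what your version prints.
SAMPLE_DUMP = """\
epoch 12
pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 192
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 192
pool 2 'rbd' rep size 3 crush_ruleset 2 object_hash rjenkins pg_num 192
max_osd 2
"""

# Match lines that start with "pool <id> '<name>'" and capture the first
# "size <n>" token that follows the pool name.
POOL_LINE = re.compile(r"^pool \d+ '([^']+)' .*?\bsize (\d+)\b")


def pool_sizes(dump_text):
    """Return a dict mapping pool name to its replica count ("size")."""
    sizes = {}
    for line in dump_text.splitlines():
        match = POOL_LINE.match(line)
        if match:
            sizes[match.group(1)] = int(match.group(2))
    return sizes


if __name__ == "__main__":
    for name, size in pool_sizes(SAMPLE_DUMP).items():
        print(f"{name}: {size} copies")
```

On a live cluster, the text could come from running "ceph osd dump" and passing its standard output to pool_sizes instead of the sample string.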
[20:18] <Sentinel> sweet. I'll check that out :D
[20:19] <Sentinel> is there a way to increase the replication or does ceph do that dynamically?
[20:19] <Sentinel> I'm thinking when I have a couple dozen systems I may need to increase replication.
[20:19] <Tv_> Sentinel: you set the desired level, ceph strives to meet that
[20:20] <gregaf> http://ceph.com/docs/master/control/
[20:20] <Sentinel> very cool :D
[20:20] <gregaf> you can change the "size" param
[20:20] <gregaf> "ceph osd pool set data size 3"
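A minimal sketch of the command gregaf quotes, plus the back-of-the-envelope arithmetic behind Sentinel's disk usage. The function names set_size_command and expected_raw_gb are hypothetical helpers, and the estimate assumes every pool holding the data uses the same "size"; it ignores journals, metadata and filesystem overhead.

```python
def set_size_command(pool, size):
    """Build the argv for "ceph osd pool set <pool> size <n>" without running it."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return ["ceph", "osd", "pool", "set", pool, "size", str(size)]


def expected_raw_gb(logical_gb, size):
    """Estimate raw space consumed when each object is stored `size` times."""
    return logical_gb * size


if __name__ == "__main__":
    # The exact command from the discussion above: three copies in pool "data".
    print(" ".join(set_size_command("data", 3)))
    # 4 GB of files at size 2 needs roughly 8 GB of raw space.
    print(expected_raw_gb(4, 2))
```

To apply the change, the returned list could be handed to subprocess.run(cmd, check=True) on a host with the ceph CLI and admin credentials. The estimate of 8 GB for 4 GB of files at size 2 is in the same range as the roughly 9.5 GB Sentinel observed; the remainder is plausibly journal and filesystem overhead, though the log does not confirm that.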
[20:22] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[20:22] <Sentinel> I'm currently at home for lunch but I'll check it out in a few minutes when I get back in the office. Very cool stuff. Thanks!
[20:25] <dmick> eatmydata: lovely
[20:27] <dmick> Sentinel: good to hear
[20:28] <Sentinel> Off to work now. Time to play and get crazy. Let you guys know how it goes :D
[20:28] <Sentinel> later!
[20:41] * rturk_ (~rturk@aon.hq.newdream.net) has joined #ceph
[21:22] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[21:26] * rturk_ (~rturk@aon.hq.newdream.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[21:37] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[21:46] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[21:51] * lofejndif (~lsqavnbok@82VAAECTA.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:09] * Oliver1 (~oliver1@ip-88-153-225-211.unitymediagroup.de) has joined #ceph
[22:21] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has left #ceph
[22:22] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:39] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:40] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[22:48] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[23:00] <joao> /usr/bin/ld: /usr/lib/debug/usr/lib/x86_64-linux-gnu/crt1.o(.debug_info): relocation 0 has invalid symbol index 10
[23:00] <joao> lol
[23:00] <joao> this was weird
[23:00] <joao> had never seen anything like it before
[23:06] * Oliver1 (~oliver1@ip-88-153-225-211.unitymediagroup.de) has left #ceph
[23:26] * cattelan_away is now known as cattelan
[23:36] <dmick> sometimes a line of code will just make you laugh out loud:
[23:36] <dmick> op.op.op = CEPH_OSD_OP_OMAPGETVALS;
[23:41] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:41] <sjust> ...yeah
[23:42] <dmick> pronouncing that makes me feel like The Penguin
[23:43] <nhm> ugh... too many dimensions of data to plot...
[23:46] <dmick> 3 is the maximum number of D's I can see
[23:49] <Tv_> dmick: http://www.youtube.com/watch?v=j1KSaUEu_T4
[23:50] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:53] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.