#ceph IRC Log

IRC Log for 2013-08-23

Timestamps are in GMT/BST.

[0:00] <sagewk> np. if you can figure out what the right combination is to reliably reproduce this it will help. thanks!
[0:00] * dmsimard1 (~Adium@108.163.152.2) Quit (Quit: Leaving.)
[0:00] <sagewk> (and btw most interested in finding and fixing the crash in 3.10!)
[0:02] <Kioob> Just one thing: hangs were much more frequent when I moved a lot of data off OSDs (lots of recovery, and some OSDs wrongly marked down)
[0:03] * mschiff (~mschiff@84.46.0.74) has joined #ceph
[0:04] * pbojanic (~Adium@65-112-206-178.dia.static.qwest.net) Quit (Quit: Leaving.)
[0:06] * jeff-YF (~jeffyf@67.23.117.122) Quit (Ping timeout: 480 seconds)
[0:08] * pbojanic (~Adium@65-112-206-178.dia.static.qwest.net) has joined #ceph
[0:12] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:14] * diegows (~diegows@190.190.11.42) has joined #ceph
[0:18] * pbojanic (~Adium@65-112-206-178.dia.static.qwest.net) Quit (Quit: Leaving.)
[0:23] <n1md4> Kioob: http://cinosure.com/foo
[0:26] * diegows (~diegows@190.190.11.42) Quit (Remote host closed the connection)
[0:28] * pbojanic (~Adium@65-112-206-178.dia.static.qwest.net) has joined #ceph
[0:29] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) has joined #ceph
[0:30] <odyssey4me> I'm in the final stretch of a setup for openstack rbd/cinder configuration and need some help getting the last mile done. Can anyone assist?
[0:32] <odyssey4me> This is the error: http://paste.openstack.org/show/44957/
[0:32] <odyssey4me> It happens when cinder-volume starts. I can see it's an auth error, but I've configured the rbd_user and the key is in the /etc/ceph directory.
[0:33] <odyssey4me> I've been following this guide: http://ceph.com/docs/next/rbd/rbd-openstack/
[0:33] <odyssey4me> Glance is working perfectly, it's just cinder that isn't.
[0:34] * Tamil1 (~Adium@cpe-108-184-66-69.socal.res.rr.com) Quit (Quit: Leaving.)
[0:36] * BillK (~BillK-OFT@58-7-52-33.dyn.iinet.net.au) has joined #ceph
[0:40] * devoid (~devoid@130.202.135.184) Quit (Quit: Leaving.)
[0:40] <odyssey4me> tell me something - if one executes 'rados lspools' from a host, what needs to be in place for it to work? that's the command that's failing
[0:43] * pbojanic (~Adium@65-112-206-178.dia.static.qwest.net) Quit (Quit: Leaving.)
[0:44] <odyssey4me> hmm, bug: https://bugs.launchpad.net/cinder/+bug/1083540
[0:45] * dmsimard (~Adium@69.165.206.93) has joined #ceph
[0:47] * dmsimard1 (~Adium@108.163.152.66) has joined #ceph
[0:49] * dmsimard (~Adium@69.165.206.93) Quit (Read error: Operation timed out)
[0:53] * alfredodeza is now known as alfredo|afk
[0:54] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) Quit (Quit: ...)
[0:55] * Tamil1 (~Adium@cpe-108-184-66-69.socal.res.rr.com) has joined #ceph
[0:55] * Tamil1 (~Adium@cpe-108-184-66-69.socal.res.rr.com) Quit ()
[0:56] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) Quit (Quit: Leaving.)
[0:57] <joshd> odyssey4me: looks like the keyring file isn't there or is unreadable by cinder-volume
[0:58] * dmsimard1 (~Adium@108.163.152.66) Quit (Quit: Leaving.)
[0:58] <odyssey4me> joshd - the user I'm using is 'volumes' and the keyring file is /etc/ceph/client.volumes.key with cinder:cinder as the owner:group
[0:59] <odyssey4me> and the keyring file only contains the key, ie
[0:59] <odyssey4me> root@control1:~# cat /etc/ceph/client.volumes.key
[0:59] <odyssey4me> AQAReBZSUF00DxAAXydw4d8bgQzc5/WIssihlA==
[1:01] <odyssey4me> how do I specify the location of the keyring file for 'rados' ?
[1:01] <odyssey4me> or where does it look by default
[1:01] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[1:02] <joshd> odyssey4me: a file with just a key like that is just a key file, which has no default location (but you could add a [client.volumes] section with keyfile = /etc/ceph/client.volumes.key in ceph.conf)
[1:02] <odyssey4me> aha: rados --id volumes --conf /etc/ceph/ceph.conf -k /etc/ceph/client.volumes.key lspools
[1:03] * itatar (~itatar@209.6.175.46) has joined #ceph
[1:04] <joshd> a keyring has one or more [$type.$id]\n key = $SECRET sections, and has a few default locations
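As an illustration of the keyring format joshd describes (the client name and key value are the ones already pasted in this conversation, not suggestions), a keyring file passed with -k/--keyring would look roughly like:

    [client.volumes]
        key = AQAReBZSUF00DxAAXydw4d8bgQzc5/WIssihlA==

whereas a bare key file (used with --keyfile) contains only the base64 key on a single line.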
[1:05] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[1:06] <odyssey4me> yup, I've seen those
[1:06] <odyssey4me> interesting:
[1:06] <odyssey4me> root@control1:~# rados --id volumes --conf /etc/ceph/ceph.conf -k /etc/ceph/client.volumes.key lspools
[1:06] <odyssey4me> 2013-08-23 01:05:50.438507 7fc9116ad780 -1 auth: can't open /etc/ceph/client.volumes.key in ceph.conf: (2) No such file or directory
[1:06] <odyssey4me> so rados can't open it, even though it's 644
[1:07] <itatar> hello, after spending some time looking at the architecture page (http://ceph.com/docs/master/architecture/) I wanted to give it a try. I have two fedora18 boxes with the ceph package installed. Is this an ok OS to give it a try or should I install something else?
[1:07] <joshd> odyssey4me: use --keyfile instead of -k (which is short for --keyring)
[1:08] <odyssey4me> joshd - aha, that works
[1:08] <odyssey4me> now if only your patch was backported to precise :/
[1:08] <odyssey4me> (grizzly, I mean)
[1:09] <joshd> yeah, it makes setup a bit easier. it might cherry-pick cleanly since it was early in the cycle
[1:11] * xarses (~andreww@204.11.231.50.static.etheric.net) has joined #ceph
[1:11] <xarses> hello
[1:11] <odyssey4me> joshd - thanks for the help... it turns out that adding the client.volumes with the keyfile parameter resolves the issue :)
[1:12] <joshd> odyssey4me: you're welcome, glad that fixed it
[1:12] <odyssey4me> this will be easy enough to automate with chef, so it's not too much of a hassle
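A minimal sketch of the ceph.conf addition that resolved this, using the path and user name from the conversation above (adjust to your own layout):

    [client.volumes]
        keyfile = /etc/ceph/client.volumes.key

With that in place, clients authenticating as client.volumes (for example cinder-volume with rbd_user = volumes) can find the key without extra command-line flags.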
[1:13] <odyssey4me> @alram - btw, I've done a ton of setup automation in the chef cookbooks... you're well overdue for a review ;)
[1:13] <cephalobot> odyssey4me: Error: "alram" is not a valid command.
[1:14] <joshd> itatar: fedora should work fine, although ubuntu is the most tested
[1:18] <xarses> hi all, I'm trying to setup glance to use ceph
[1:18] <xarses> it appears that the problem is with the generated keyring file
[1:19] * pbojanic (~Adium@65-112-206-178.dia.static.qwest.net) has joined #ceph
[1:19] <xarses> glance gives me no hint of this, but attempting to run rbd map, it complains about the keyring file
[1:19] * pbojanic (~Adium@65-112-206-178.dia.static.qwest.net) has left #ceph
[1:19] <itatar> thanks joshd
[1:19] <xarses> rbd map foo --pool images --name client.images --keyfile ceph.client.images.keyring
[1:19] <xarses> 2013-08-22 23:18:13.917482 7f39b2c48760 -1 auth: failed to decode key '
[1:19] <xarses> [client.images]
[1:19] <xarses> key = AQDGmBZS+B4rDhAADaqqi78QiJvoHjExyd64SQ==
[1:19] <xarses> '
[1:20] <odyssey4me> xarses - the keyfile needs to only contain the key itself
[1:20] <xarses> ceph auth get-or-create client.images mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images' > ceph.client.images.keyring
[1:20] <odyssey4me> ie: AQDGmBZS+B4rDhAADaqqi78QiJvoHjExyd64SQ==
[1:20] <odyssey4me> what you have there is a keyring... :)
[1:24] <xarses> ok
[1:25] <xarses> so adding [client.images] keyfile = ceph.client.images.keyring makes rbd map happy (minus the --keyfile)
[1:25] <xarses> however glance still doesn't work right
[1:26] <xarses> glance image-create --name cirros --container-format bare --disk-format qcow2 --is-public yes --location https://launchpad.net/cirros/trunk/0.3.0/+download/cirros-0.3.0-x86_64-disk.img
[1:26] <xarses> takes the image
[1:26] <xarses> and pretends to save it correctly
[1:26] <xarses> but no data is stored in ceph
[1:26] <xarses> or /var/lib/glance/images
[1:27] <xarses> and retrieving the image blocks, sending 0 bytes
[1:27] <xarses> i have no clue what to check further
[1:27] <odyssey4me> xarses - I followed this to the letter for glance, and it worked: http://ceph.com/docs/next/rbd/rbd-openstack/
[1:30] <joshd> xarses: there was a bug in grizzly with --location - https://bugs.launchpad.net/glance/+bug/1146830
[1:31] <xarses> ya, i saw that
[1:31] <xarses> never had issues with it while the backend was swift or localfs
[1:31] <xarses> and it says the image is 100%
[1:32] <joshd> the patch in that bug fixes it
[1:32] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[1:32] <xarses> nova image-show 8858ebcf-2d78-453e-af06-a71a308746de
[1:32] <xarses> +----------------------+--------------------------------------+
[1:32] <xarses> | Property | Value |
[1:32] <xarses> +----------------------+--------------------------------------+
[1:32] <xarses> | status | ACTIVE |
[1:32] <xarses> | updated | 2013-08-22T23:29:58Z |
[1:32] <xarses> | name | cirros |
[1:32] <xarses> | created | 2013-08-22T23:29:58Z |
[1:32] <xarses> | minDisk | 0 |
[1:32] <xarses> | progress | 100 |
[1:32] <xarses> | minRam | 0 |
[1:32] <xarses> | OS-EXT-IMG-SIZE:size | 9761280 |
[1:32] <xarses> | id | 8858ebcf-2d78-453e-af06-a71a308746de |
[1:32] <xarses> +----------------------+--------------------------------------+
[1:36] <itatar> the instructions to install ceph-deploy (on http://ceph.com/docs/master/start/quick-start-preflight/) are debian centric. is there something written up for fedora?
[1:38] <odyssey4me> xarses - did you set glance_api_version = 2 ?
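For reference, the glance_api_version setting being asked about belongs in cinder.conf when cinder's rbd driver is used; a hedged sketch following the pool/user names used earlier in this conversation:

    # cinder.conf (rbd-backed cinder, per the rbd-openstack guide)
    glance_api_version = 2
    rbd_pool = volumes
    rbd_user = volumes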
[1:38] <dmick> sagewk: hm, this is after connect()
[1:38] * dmick digs
[1:38] <joshd> itatar: I think you just need epel and one of the ceph.com repos listed here: http://ceph.com/docs/master/install/rpm/
[1:38] <sagewk> sigh.. not sure then. never actually used the _get method
[1:38] <sagewk> there are unit tests, but that is probably the only user
[1:40] <dmick> so the problem is when it's *not* in .conf, rados_conf_get can't seem to get the default
[1:40] <sagewk> huh
[1:41] <dmick> gdb'ing
[1:41] <xarses> odyssey4me: no, im not on trying to get cinder to work yet, just glance
[1:42] <xarses> i applied that patch and it didn't do anything for me
[1:43] <xarses> odyssey4me: do i need to add that somewhere for glance itself?
[1:43] <dmick> really odd. finds it, but it's ""
[1:44] <dmick> maybe there's no default for clients?
[1:44] <joshd> xarses: does uploading a local file work?
[1:44] <dmick> ARGH
[1:45] <dmick> common_preinit
[1:47] * Tamil1 (~Adium@cpe-108-184-66-69.socal.res.rr.com) has joined #ceph
[1:47] <itatar> thank you joshd
[1:49] <xarses> joshd: ok that worked
[1:49] <xarses> now how do i fix --location not working?
[1:50] <joshd> xarses: not sure exactly, but this may help: https://review.openstack.org/#/c/42640/
[1:51] <joshd> xarses: glance-api.log with debug on of an upload with --location may give a clue to the problem
[1:51] * madkiss (~madkiss@64.125.181.92) Quit (Quit: Leaving.)
[1:51] <xarses> i have all of the glance logs going to glance-all.log
[1:51] <xarses> never saw anything that looked like an error
[1:52] <xarses> debug was on the whole time
[1:52] <joshd> but it might show the details of the request at least
[1:52] <xarses> i applied https://review.openstack.org/#/c/27457/
[1:52] <xarses> but that didn't help
[1:54] * torment2 (~torment@pool-72-64-192-26.tampfl.fios.verizon.net) has joined #ceph
[1:55] <xarses> they are both trying to address a similar problem
[1:55] <xarses> but not exactly doing the same thing
[1:55] <joshd> right
[1:56] <joshd> I'm wondering if the --location upload somehow provided a size of 0 to rbd
[1:57] <joshd> which the second patch would fix
[2:02] * AfC (~andrew@2407:7800:200:1011:31b2:f929:558b:657f) has joined #ceph
[2:05] <itatar> sigh, now the OS I have is based on fedora18, called "applianceoperatingsystem", so ceph-deploy doesn't like it:
[2:05] <itatar> -bash-4.2# ceph-deploy install cephadmin
[2:05] <itatar> [ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster ceph hosts cephadmin
[2:05] <itatar> [ceph_deploy.install][DEBUG ] Detecting platform for host cephadmin ...
[2:05] <itatar> [ceph_deploy][ERROR ] UnsupportedPlatform: Platform is not supported: applianceoperatingsystem
[2:05] <itatar> is there a way to override this check?
[2:07] <dmick> hack the source?
[2:09] <dmick> hosts/__init__.py:_get_distro()
[2:15] <xarses> joshd: i had to apply both patches
[2:15] <xarses> as the second is based on the first
[2:15] <xarses> but it didn't resolve the problem with --location
[2:16] <joshd> xarses: could you create a new bug on launchpad and add your glance log to it then?
[2:16] <xarses> neither "chunk" nor "resize" is in my glance log
[2:17] * tnt (~tnt@109.130.102.13) Quit (Ping timeout: 480 seconds)
[2:17] <xarses> also, the line numbers in my rbd.py didn't match up, should i fetch the current master and try that first?
[2:18] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[2:18] * sagelap1 (~sage@2600:1012:b00b:952b:c5a2:f13c:f3e3:162b) has joined #ceph
[2:19] <joshd> it's worth a shot, but I don't think any of the other changes would affect uploading
[2:20] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) Quit (Ping timeout: 480 seconds)
[2:21] * alram (~alram@38.122.20.226) Quit (Quit: leaving)
[2:21] <joshd> xarses: could you see that the rbd image is actually being created on the backend (rbd ls images) and what the size shown there is (rbd info images/uuid)
[2:21] * nwat_ (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[2:22] <alfredo|afk> itatar: that doesn't sound like something we support, hence the message
[2:22] * alfredo|afk is now known as alfredodeza
[2:22] <alfredodeza> the problem is that we need to determine package manager and dependencies according to the OS
[2:24] <xarses> joshd, so i removed the .pyo and pyc and it hasn't recompiled them after restarting glance
[2:25] <xarses> and i added another from the local filesystem just for fun, works fine, but the driver still didn't compile
[2:25] <xarses> in glance/store/rbd.py
[2:26] <joshd> xarses: is rbd set as the default store? it might be lazy-loading and not compile it until it's used as well
[2:27] <xarses> joshd: ls images only shows the two images from local files
[2:27] <xarses> rbd ls images
[2:27] <xarses> 1e6a7b44-1a05-4f4f-9ed9-7f08fca40ce7
[2:27] <xarses> 2abe4fd2-4d6a-42c0-b041-9fb073fc41af
[2:27] <joshd> xarses: ok, that's a good clue - it's not even getting to create the images when --location is used
[2:27] <xarses> grep rbd /etc/glance/glance-api.conf
[2:27] <xarses> default_store = rbd
[2:27] <xarses> # glance.store.rbd.Store,
[2:27] <xarses> rbd_store_ceph_conf = /etc/ceph/ceph.conf
[2:27] <xarses> rbd_store_user = images
[2:27] <xarses> rbd_store_pool = images
[2:27] <xarses> rbd_store_chunk_size = 8
[2:28] <xarses> rbd info images/2abe4fd2-4d6a-42c0-b041-9fb073fc41af
[2:28] <xarses> rbd image '2abe4fd2-4d6a-42c0-b041-9fb073fc41af':
[2:28] <xarses> size 9532 KB in 2 objects
[2:28] <xarses> order 23 (8192 KB objects)
[2:28] <xarses> block_name_prefix: rbd_data.10c15e2be4d4
[2:28] <xarses> format: 2
[2:28] <xarses> features: layering
[2:28] <xarses> same for the other object
[2:28] <xarses> (ish)
[2:30] * Tamil1 (~Adium@cpe-108-184-66-69.socal.res.rr.com) Quit (Quit: Leaving.)
[2:31] * madkiss (~madkiss@184.105.243.235) has joined #ceph
[2:32] <joshd> xarses: could you pastebin the glance-all.log containing one of the --location uploads?
[2:32] * Cube1 (~Cube@88.128.80.12) has joined #ceph
[2:33] * Cube (~Cube@88.128.80.12) Quit (Read error: Connection reset by peer)
[2:34] <xarses> joshd you can have the whole thing if you'd like =)
[2:34] <joshd> sure
[2:46] * AfC (~andrew@2407:7800:200:1011:31b2:f929:558b:657f) Quit (Ping timeout: 480 seconds)
[2:46] * AfC (~andrew@59.167.244.218) has joined #ceph
[2:46] * Cube1 (~Cube@88.128.80.12) Quit (Read error: Connection reset by peer)
[2:46] * nerdtron (~kenneth@202.60.8.252) has joined #ceph
[2:46] * Cube (~Cube@88.128.80.12) has joined #ceph
[2:48] <xarses> http://pastebin.com/rn7LRtj9
[2:49] * Cube (~Cube@88.128.80.12) Quit ()
[2:50] * cofol1986 (~xwrj@110.90.119.113) Quit (Read error: Connection reset by peer)
[2:51] <joshd> looks like debug might not be on for glance-api? just a couple lines from it in there
[2:52] <xarses> grep deb /etc/glance/glance-api.conf
[2:52] <xarses> # Show debugging output in logs (sets DEBUG log level output)
[2:52] <xarses> #debug = False
[2:52] <xarses> debug = True
[2:52] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Bye!)
[2:53] * yy-nm (~Thunderbi@218.74.38.31) has joined #ceph
[2:53] <xarses> unless somewhere else should be set
[2:54] * mschiff_ (~mschiff@port-30155.pppoe.wtnet.de) has joined #ceph
[2:54] * sagelap (~sage@2600:1012:b021:bc09:5c2e:507:8615:626c) has joined #ceph
[2:55] <xarses> i created https://bugs.launchpad.net/glance/+bug/1215682
[2:55] <joshd> that's the right setting, I guess there's just very little logging
[2:59] * mschiff (~mschiff@84.46.0.74) Quit (Ping timeout: 480 seconds)
[2:59] * sagelap1 (~sage@2600:1012:b00b:952b:c5a2:f13c:f3e3:162b) Quit (Ping timeout: 480 seconds)
[2:59] * rturk is now known as rturk-away
[3:00] <xarses> bbl
[3:09] * xarses (~andreww@204.11.231.50.static.etheric.net) Quit (Ping timeout: 480 seconds)
[3:10] * sagelap (~sage@2600:1012:b021:bc09:5c2e:507:8615:626c) Quit (Ping timeout: 480 seconds)
[3:14] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[3:19] <nerdtron> hi all! i'm using ceph on 3 desktop computers with 2 hard drives each
[3:20] <nerdtron> i want to add another disk to each of them, how do i do it? I don't want to lose data
[3:21] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[3:22] * mikedawson_ (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[3:27] * julian (~julian@125.70.135.40) has joined #ceph
[3:29] * sjusthm (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Remote host closed the connection)
[3:29] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[3:29] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[3:30] <dmick> nerdtron: http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
[3:30] <dmick> unless ceph-deploy, in which case
[3:30] * madkiss (~madkiss@184.105.243.235) Quit (Quit: Leaving.)
[3:30] <dmick> http://ceph.com/docs/master/rados/deployment/ceph-deploy-osd/ is more appropriate
[3:31] <nerdtron> ceph deploy is what i'm using, but have you tried? any problems encountered?
[3:32] <dmick> I've added and removed OSDs. It's sorta Ceph's bread-and-butter.
[3:32] <dmick> you may end up having to think about what to do with your crushmap as you change config, is all
[3:33] <dmick> but that's not a "lose data" thing, that's a "it may take a bit before the cluster settles to cleanly replicating everything as you wish" thing
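A minimal sketch of adding a new disk with ceph-deploy, assuming a hypothetical node name (node1) and an unused disk (/dev/sdc); existing data on the other OSDs is untouched, the cluster simply rebalances afterwards:

    ceph-deploy disk list node1           # confirm the disk is seen and unused
    ceph-deploy osd create node1:sdc      # prepare and activate it as a new OSD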
[3:38] * xarses (~andreww@c-50-136-199-72.hsd1.ca.comcast.net) has joined #ceph
[3:43] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[3:44] * mikedawson_ (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[3:45] <xarses> back
[3:47] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[3:51] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[3:52] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Read error: Operation timed out)
[3:52] <dmick> sage: yt?
[3:54] <dmick> https://github.com/ceph/ceph/pull/531 and https://github.com/ceph/ceph/pull/532 are small
[3:58] * dpippenger (~riven@tenant.pas.idealab.com) Quit (Remote host closed the connection)
[4:05] * jluis (~JL@89-181-146-94.net.novis.pt) has joined #ceph
[4:05] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) has joined #ceph
[4:10] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[4:18] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[4:20] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[4:20] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[4:25] * jluis (~JL@89-181-146-94.net.novis.pt) Quit (Ping timeout: 480 seconds)
[4:34] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[4:37] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[4:56] * houkouonchi-work (~linux@12.248.40.138) Quit (Remote host closed the connection)
[5:06] * fireD_ (~fireD@93-142-197-175.adsl.net.t-com.hr) has joined #ceph
[5:07] * fireD (~fireD@93-142-214-29.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:22] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[5:26] <dmick> hmf. that's interesting. I marked osd4 (of 35) out for fun. then I noticed 15 was having disk errors, so marked it out. cluster never got fully clean again; osd25 was implicated, and I noticed it was flapping. Marked it out before checking its disks.....and it's stopped flapping now. ?
[5:31] <dmick> ah, I also marked 4 back in; maybe 4 is the primary for PGs on the bad areas of 25's disk
[5:31] <dmick> wonder if the log has enough to identify pgs that cause EIO
[5:32] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[5:34] <nerdtron> dmick i'm curious on how many mons you have
[5:36] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[5:40] * iii8 (~Miranda@91.207.132.71) has joined #ceph
[5:44] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[5:45] * houkouonchi-work (~linux@12.248.40.138) Quit ()
[5:49] * rongze (~quassel@li565-182.members.linode.com) has joined #ceph
[5:50] <dmick> in this case 5
[5:50] <dmick> but really only because I'm testing bringing mons up and down
[5:55] <mikedawson> dmick: I see you implemented Read Latency perf counters for RBD sometime ago (http://tracker.ceph.com/issues/2408)
[5:56] * rongze_ (~quassel@117.79.232.249) Quit (Ping timeout: 480 seconds)
[5:56] <mikedawson> dmick: I've never successfully made rbd create an admin socket. I think ceph.conf, apparmor, and /var/run/ceph permissions are good to go. Any tips on where to look next?
[5:58] <mikedawson> dmick: I have an apparent rbd read latency issue with periodic spikes of 100x or more latency I'm trying to track down
[5:58] <dmick> oh my yes, a *long* time ago
[5:59] <dmick> sadly long before I understood admin sockets :)
[5:59] <mikedawson> dmick: is there any other way to get those counters?
[5:59] <dmick> but, um...yeah, I have wanted this, and tried several times and failed; josh assures me it's possible and obvious
[5:59] <dmick> no, you want the admin socket for sure. let's see: i know more now, can I puzzle it out
[6:00] <dmick> qemu?
[6:00] <mikedawson> dmick: yes
[6:00] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Lost terminal)
[6:00] <mikedawson> dmick: only place I can see someone with success is http://www.spinics.net/lists/ceph-devel/msg07676.html
[6:01] <mikedawson> dmick: "If you add admin_socket=/var/run/ceph/kvm.asok to the rbd device on the qemu command line"...
[6:01] <dmick> right. so that's just a ceph.conf setting as usual
[6:02] <dmick> presumably the socket never shows up?
[6:02] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[6:02] * houkouonchi-work (~linux@12.248.40.138) Quit ()
[6:02] <mikedawson> dmick: I've been trying to do that via ceph.conf, but I haven't tried via the qemu-system-x86_64 command line
[6:02] <mikedawson> dmick: correct - no socket so far
[6:02] <dmick> oh ok. what section of ceph.conf?
[6:03] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[6:03] <mikedawson> [client]
[6:03] <dmick> I wonder if you can test part of that with, like, ceph -w
[6:04] * dmick experiments
[6:05] <dmick> indeed, that makes it create it. Does ceph -w cause /var/run/ceph/kvm.asok to show up for you?
[6:05] <dmick> (and can you do ceph daemon /var/run/ceph/kvm.asok version if so?)
[6:06] <mikedawson> dmick: ceph.conf -> http://pastebin.com/raw.php?i=rwAXWNXg
[6:06] <dmick> ah, you'd said kvm.asok, but conf says rbd.asok; either way that's what I mean
[6:06] <dmick> oh sorry the quote was kvm.asok. anyway.
[6:07] <dmick> the question is can you run ceph -w, and see /var/run/ceph/rbd.asok, and use it
[6:07] * mikedawson wonders if 'admin_socket' should be 'admin socket', bet that's it
[6:07] <dmick> no, it needs _ in the qemu config
[6:08] <dmick> and '_' and ' ' are equivalent to ceph
[6:08] <mikedawson> dmick: what about ceph.conf?
[6:08] <dmick> oh. hm.
[6:08] <dmick> maybe
[6:08] <dmick> did you try the ceph -w experiment?
[6:08] <dmick> it's really easy. all the cool kids are doing it.
[6:10] <mikedawson> I run ceph -w, nothing seems to happen with rbd sockets
[6:10] * silversurfer (~jeandanie@124x35x46x13.ap124.ftth.ucom.ne.jp) has joined #ceph
[6:10] <dmick> admin_socket works for me.
[6:12] <dmick> try strace -f -e trace=bind ceph -w
[6:12] <dmick> somewhere in the noise will be a bind call
[6:13] <silversurfer> Hi all, Stupid question: Because of the incompatibility of v0.67 <> v0.66 MON deamons it is not possible to revert back to dumpling to cutlefish right?
[6:14] <mikedawson> dmick: http://pastebin.com/raw.php?i=bNNahpaK
[6:14] <dmick> but...
[6:14] <dmick> [pid 23435] bind(6, {sa_family=AF_FILE, path="/var/log/ceph/rbd.asok"}, 110) = 0
[6:14] <mikedawson> dmick: I changed to /var/log/ceph/... before that run to see if apparmor was in the way
[6:14] <dmick> and surely there appeared a /var/log/ceph/rbd.asok while ceph -w was running, right?
[6:15] <nerdtron> hi all! i just edited my ceph.conf file. what should I do to apply it to all nodes? and how do i make sure that it is applied?
[6:15] <dmick> nerdtron: manually or with ceph-deploy config push
[6:16] <dmick> http://ceph.com/docs/master/rados/deployment/ceph-deploy-admin/#deploy-config-file
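A sketch of that push-and-restart cycle, assuming hypothetical node names; the restart command differs between sysvinit and upstart setups, so treat the last line as an example only:

    ceph-deploy --overwrite-conf config push node1 node2 node3
    # then, on each node whose daemons are affected, e.g.:
    sudo service ceph restart osd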
[6:16] <dmick> mikedawson: ?
[6:16] <nerdtron> dmick: i used config push and how do i make sure that it is applied?
[6:16] <dmick> restart the daemons
[6:17] <nerdtron> i'm sorry how?
[6:17] <mikedawson> dmick: yes! it is there. What the hell?
[6:17] <dmick> mikedawson: it's late :)
[6:17] <dmick> nerdtron: you're aware of ceph.com/docs, right?
[6:17] <dmick> not being snotty, just, the more you can help yourself, the less you depend on help
[6:18] <mikedawson> dmick: the only thing I can think of is I changed ceph.conf to 'admin socket', then bounced the vm, then ran ceph -w
[6:18] <nerdtron> oh sorry, i'm not sure if the mons or the osds need to be restarted
[6:18] <dmick> depends on what configuration you changed
[6:18] <mikedawson> dmick: Thanks a bunch!
[6:18] <dmick> mikedawson: the '_' didn't matter
[6:19] <nerdtron> i added a cluster_network and public_network
[6:19] <mikedawson> dmick: then it is magic! I'll take it.
[6:19] <dmick> the ceph -w was just to verify that the config setting was doing what you thought and that your user had creation mechanisms
[6:19] <dmick> are you saying you also got one connected to the vm?
[6:19] <dmick> are you sure?
[6:19] <mikedawson> dmick: not sure yet
[6:19] <dmick> since it's in "client", it'll apply to all clients (and you'll want to move it to client.<whatever-the-kvm-id is>)
[6:19] <dmick> or else every ceph command will clobber it
[6:20] <dmick> I believe "id=" in the qemu config line, and it needs to be unique per vm instance
[6:20] <dmick> and that'll make name be "client.<id>"
[6:20] <mikedawson> dmick: is there a $pid expansion available in ceph.conf?
[6:21] <dmick> yes, although there's also $name and $id
[6:21] <dmick> although I take your point; you could just keep it in [client] but make it meta-named
[6:21] <dmick> http://ceph.com/docs/master/dev/config/?highlight=metavariables#metavariables
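A sketch of the kind of [client] setting being discussed, using metavariables so each client process gets its own socket (the path is only an example):

    [client]
        admin socket = /var/run/ceph/$cluster-$name.$pid.asok

    # once a socket exists it can be queried, e.g.:
    ceph --admin-daemon /var/run/ceph/ceph-client.volumes.12345.asok perf dump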
[6:22] <silversurfer> when reverting to cuttlefish the mon crashes immediately. dump: http://pastebin.com/9MvgVj7y
[6:23] <dmick> silversurfer: yes, in general, downgrade simply doesn't work
[6:23] <dmick> there's no attempt to support it and no testing; if it works, or appears to, it's by luck. in this case because of the protocol change it's just about 100% unlikely
[6:24] <silversurfer> dmick: Thank you. Makes sense
[6:24] <dmick> nerdtron: so that'll really only affect osds (assuming the public network stays the same)
[6:24] <dmick> but you may have hostname work to do
[6:25] <dmick> I don't remember what might need changing when that changes post-installation
[6:25] * rongze_ (~quassel@106.120.176.78) has joined #ceph
[6:25] <nerdtron> the public network stays the same.. i added a backend network
[6:26] <dmick> it might just work; I don't know the details of how address resolution works with interface selection
[6:26] <nerdtron> should the hostnames point at the cluster network now, or should they stay pointing at the public network?
[6:26] <dmick> that's exactly what I mean about not knowing the details :)
[6:26] <nerdtron> hhmmm..ok thanks, i'll try after the disk i added finishes copying
[6:27] <nerdtron> 35MB/s is too slow for rebuilding
[6:27] <nerdtron> not using a separate SSD for the journal
[6:27] <nerdtron> and the cpu spikes up
[6:28] <dmick> I'm doing a similar thing with my test cluster atm; I only have the one 1Gb interface :(
[6:28] <dmick> I smacked up osd_recovery_threads and osd_recovery_max_active to try to speed it up. not sure it mattered; I think I'm net-limited
[6:29] <dmick> mikedawson: sorted?
[6:30] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[6:30] * houkouonchi-work (~linux@12.248.40.138) Quit ()
[6:30] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[6:31] * rongze (~quassel@li565-182.members.linode.com) Quit (Ping timeout: 480 seconds)
[6:31] <mikedawson> dmick: closer. thanks a bunch!
[6:32] <dmick> have you gotten qemu to create a socket yet?
[6:32] <dmick> if so I'm declaring victory and going home :)
[6:33] <mikedawson> dmick: openstack creates my libvirt.xml for guests, so I have to decide how I want to handle it...
[6:33] <dmick> ah. I dunno if or how you can set id
[6:33] <mikedawson> dmick: head home!
[6:33] <dmick> heh. ok. at least I'll go put my cycling shorts on
[6:39] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:41] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[6:44] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[6:44] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit ()
[6:49] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[6:52] * brzm (~medvedchi@node199-194.2gis.com) has joined #ceph
[6:53] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[6:54] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) has joined #ceph
[7:02] * DarkAceZ (~BillyMays@50.107.55.36) Quit (Ping timeout: 480 seconds)
[7:04] * jaydee (~jeandanie@124x35x46x10.ap124.ftth.ucom.ne.jp) has joined #ceph
[7:08] * silversurfer (~jeandanie@124x35x46x13.ap124.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[7:18] * DarkAceZ (~BillyMays@50.107.55.36) has joined #ceph
[7:26] * sagelap (~sage@76.89.177.113) has joined #ceph
[7:37] * silversurfer (~jeandanie@124x35x46x13.ap124.ftth.ucom.ne.jp) has joined #ceph
[7:40] * jaydee (~jeandanie@124x35x46x10.ap124.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[7:46] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:52] * mnash (~chatzilla@vpn.expressionanalysis.com) Quit (Read error: Operation timed out)
[8:02] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[8:04] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) has joined #ceph
[8:05] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[8:08] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Quit: Leaving.)
[8:31] * dlan (~dennis@116.228.88.131) Quit (Read error: Connection reset by peer)
[8:37] * sagelap (~sage@76.89.177.113) Quit (Ping timeout: 480 seconds)
[8:41] * julian (~julian@125.70.135.40) Quit (Quit: Leaving)
[8:50] <nerdtron> dmick hi
[8:50] * rongze (~quassel@li565-182.members.linode.com) has joined #ceph
[8:50] <nerdtron> are there any performance improvements if i put the journal on a separate partition on the same hard drive as the OSD?
[8:52] <yanzheng> I don't think so
[8:57] * ssejour (~sebastien@out-chantepie.fr.clara.net) has joined #ceph
[8:57] * rongze_ (~quassel@106.120.176.78) Quit (Read error: Operation timed out)
[9:01] * julian (~julianwa@125.70.135.40) has joined #ceph
[9:02] * tnt (~tnt@109.130.102.13) has joined #ceph
[9:07] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:08] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:09] * rongze_ (~quassel@106.120.176.78) has joined #ceph
[9:09] * bergerx_ (~bekir@78.188.204.182) has joined #ceph
[9:11] * tnt (~tnt@109.130.102.13) Quit (Ping timeout: 480 seconds)
[9:12] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[9:14] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) has joined #ceph
[9:15] * rongze (~quassel@li565-182.members.linode.com) Quit (Ping timeout: 480 seconds)
[9:21] * tnt (~tnt@ip-188-118-44-117.reverse.destiny.be) has joined #ceph
[9:24] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[9:24] * ChanServ sets mode +v andreask
[9:25] * dlan (~dennis@116.228.88.131) has joined #ceph
[9:26] * jbd_ (~jbd_@2001:41d0:52:a00::77) has joined #ceph
[9:34] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[9:36] * KindTwo (~KindOne@h168.53.186.173.dynamic.ip.windstream.net) has joined #ceph
[9:37] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[9:38] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:38] * KindTwo is now known as KindOne
[9:43] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[9:44] * rongze (~quassel@li565-182.members.linode.com) has joined #ceph
[9:45] * rongze__ (~quassel@117.79.232.249) has joined #ceph
[9:48] * allsystemsarego (~allsystem@5-12-37-127.residential.rdsnet.ro) has joined #ceph
[9:50] * rongze_ (~quassel@106.120.176.78) Quit (Ping timeout: 480 seconds)
[9:53] * rongze (~quassel@li565-182.members.linode.com) Quit (Ping timeout: 480 seconds)
[9:55] * mnash (~chatzilla@vpn.expressionanalysis.com) has joined #ceph
[10:14] * KindTwo (~KindOne@h168.53.186.173.dynamic.ip.windstream.net) has joined #ceph
[10:14] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[10:14] * KindTwo is now known as KindOne
[10:17] * lupine (~lupine@lupine.me.uk) Quit (Ping timeout: 480 seconds)
[10:18] * lupine (~lupine@lupine.me.uk) has joined #ceph
[10:22] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[10:23] * yasu` (~yasu`@99.23.160.231) Quit (Remote host closed the connection)
[10:24] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) has joined #ceph
[10:27] * BillK (~BillK-OFT@58-7-52-33.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[10:30] <ccourtaut> morning
[10:31] * AfC (~andrew@59.167.244.218) Quit (Read error: Operation timed out)
[10:33] * sleinen (~Adium@2001:620:0:46:699a:1e8e:f82a:c697) Quit (Read error: No route to host)
[10:34] * sleinen (~Adium@2001:620:0:46:699a:1e8e:f82a:c697) has joined #ceph
[10:37] * frank9999 (~Frank@kantoor.transip.nl) Quit (Ping timeout: 480 seconds)
[10:49] * pressureman (~pressurem@62.217.45.26) has joined #ceph
[10:50] * deg (~oftc-webi@V10K1.bull.fr) has joined #ceph
[10:52] * LeaChim (~LeaChim@176.24.168.228) has joined #ceph
[10:55] <pressureman> i just apt-get upgraded my test cluster and saw a new dumpling release 0.67.2, but haven't seen any announcement... is it official, and if so, are there release notes?
[10:57] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) Quit (Remote host closed the connection)
[11:10] * Alex-OSSO (~root@vpn.mc.osso.nl) Quit (Quit: Lost terminal)
[11:15] * mtl1 (~Adium@c-67-176-54-246.hsd1.co.comcast.net) Quit (Ping timeout: 480 seconds)
[11:17] * mtl1 (~Adium@c-67-176-54-246.hsd1.co.comcast.net) has joined #ceph
[11:18] * yy-nm (~Thunderbi@218.74.38.31) Quit (Quit: yy-nm)
[11:18] * Frank9999 (~Frank@kantoor.transip.nl) has joined #ceph
[11:24] * rongze (~quassel@211.155.113.206) has joined #ceph
[11:28] * rongze__ (~quassel@117.79.232.249) Quit (Ping timeout: 480 seconds)
[11:33] <loicd> morning
[11:47] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[11:47] * fireD_ (~fireD@93-142-197-175.adsl.net.t-com.hr) Quit (Quit: Lost terminal)
[11:50] * dlan (~dennis@116.228.88.131) Quit (Read error: Connection reset by peer)
[11:54] * silversurfer (~jeandanie@124x35x46x13.ap124.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[11:54] * KindTwo (~KindOne@h97.48.186.173.dynamic.ip.windstream.net) has joined #ceph
[11:56] <loicd> pressureman: 0.67.2 was published yesterday, I guess the release notes / announcement will be published today.
[11:56] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[11:57] * julian (~julianwa@125.70.135.40) Quit (Read error: Connection timed out)
[11:58] * julian (~julianwa@125.70.135.40) has joined #ceph
[11:58] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[11:59] * dlan (~dennis@116.228.88.131) has joined #ceph
[12:03] * KindTwo (~KindOne@h97.48.186.173.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[12:18] * BillK (~BillK-OFT@58-7-52-33.dyn.iinet.net.au) has joined #ceph
[12:19] * bergerx_ (~bekir@78.188.204.182) Quit (Ping timeout: 480 seconds)
[12:31] * bergerx_ (~bekir@78.188.204.182) has joined #ceph
[12:40] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:42] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[12:43] * dlan (~dennis@116.228.88.131) Quit (Read error: Connection reset by peer)
[12:49] * rongze_ (~quassel@notes4.com) has joined #ceph
[12:54] * dlan (~dennis@116.228.88.131) has joined #ceph
[12:57] * rongze (~quassel@211.155.113.206) Quit (Ping timeout: 480 seconds)
[12:57] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[13:00] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) has joined #ceph
[13:00] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:01] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[13:01] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[13:02] * pressureman (~pressurem@62.217.45.26) Quit (Quit: Ex-Chat)
[13:02] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[13:03] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[13:03] * ChanServ sets mode +o scuttlemonkey
[13:10] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) Quit (Quit: Leaving.)
[13:11] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 22.0/20130618035212])
[13:12] * nerdtron (~kenneth@202.60.8.252) Quit (Ping timeout: 480 seconds)
[13:17] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[13:17] * ChanServ sets mode +v andreask
[13:19] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[13:25] * loicd happy that the first test of the jerasure plugin works :-) https://github.com/dachary/ceph/commit/aaa6c6cbec08358c300e5071440e9232f82ca377#L17R33
[13:27] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[13:28] * yanzheng (~zhyan@101.83.59.15) has joined #ceph
[13:29] * julian (~julianwa@125.70.135.40) Quit (Read error: Connection timed out)
[13:30] * julian (~julianwa@125.70.135.40) has joined #ceph
[13:46] <janos> loicd: i've seen talk about erasure code in here over the past month or two. what is it exactly?
[13:47] <janos> i know i can delete objects in ceph. so it can't just be that can it?
[13:47] <loicd> it is :) only it uses less disk to do so. That's the basic idea.
[13:48] <janos> haha
[13:48] <janos> ok
[13:48] <loicd> janos: http://wiki.ceph.com/01Planning/02Blueprints/Emperor/Erasure_coded_storage_backend_%28step_2%29
[13:48] <loicd> there is a drawing that will hopefully give you an idea about how it's done
[13:48] <janos> oh sweet
[13:48] <janos> thanks, will read
[13:48] <janos> i stopped using rss and the like long ago - i was distracting myself wayyy too much
[13:50] <janos> oh neat
[13:50] <janos> didn't realize the parity aspect
[13:54] <janos> ahhhh all the PG references
[13:55] <janos> i'm doing a lot of postgresql stuff right now - they use 'PG', 'pg', etc. as shortened references as well
[13:58] * mschiff (~mschiff@port-30155.pppoe.wtnet.de) has joined #ceph
[14:00] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:00] * mschiff_ (~mschiff@port-30155.pppoe.wtnet.de) Quit (Read error: Operation timed out)
[14:00] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:07] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[14:07] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:24] * zhyan_ (~zhyan@101.83.172.240) has joined #ceph
[14:27] * brzm (~medvedchi@node199-194.2gis.com) Quit (Quit: Leaving.)
[14:28] * yanzheng (~zhyan@101.83.59.15) Quit (Ping timeout: 480 seconds)
[14:29] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:31] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[14:31] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[14:39] <alfredodeza> PSA: ceph-deploy has a new release out, v 1.2.2
[14:39] <alfredodeza> make sure you update :)
[14:39] <alfredodeza> ceph-deploy repos?
[14:39] <kraken> ceph.com/packages/ceph-extras/(debian|rpm) and ceph.com/(debian|rpm)-{release}
[14:40] <alfredodeza> ceph-deploy docs?
[14:40] <kraken> https://github.com/ceph/ceph-deploy#ceph-deploy----deploy-ceph-with-minimal-infrastructure
[14:40] <alfredodeza> \o/
[14:41] * mozg (~andrei@194.72.131.170) has joined #ceph
[14:47] <mozg> hello guys
[14:48] <mozg> does anyone know if Dumpling supports a geo-replication feature?
[14:48] <mozg> i was wondering if you could have async or sync replication to another data centre?
[14:51] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) Quit (Quit: Ex-Chat)
[14:54] <zackc> mozg: unfortunately no, that will be in the next release
[14:55] <joao> mozg, rgw now has multi-region support
[14:55] <joao> http://ceph.com/docs/master/release-notes/#v0-67-dumpling
[15:01] * cfreak200 (~cfreak200@p4FF3E172.dip0.t-ipconnect.de) has joined #ceph
[15:03] * cfreak201 (~cfreak200@p4FF3E75F.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[15:04] <ofu_> how do I fix something like this? 2013-08-23 14:38:56.096881 7f5948a39700 0 log [ERR] : 3.26c5 osd.45: soid f45e26c5/rb.0.3c80.238e1f29.0000000201ef/head//3 candidate had a read error, digest 2087527079 != known digest 432743869
[15:04] <ofu_> all 4 copies seem to have the same md5sum that ceph claims to be unhappy with
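For a scrub inconsistency like the one pasted above, once you have checked which replicas are actually good, the usual next step is to tell Ceph to repair that placement group; the PG id below is taken from ofu_'s log line:

    ceph pg repair 3.26c5

Note that repair generally rewrites replicas from the primary's copy, so verify the primary's object first.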
[15:08] * symmcom (~wahmed@rx0so-shaw-pipe.cg.bigpipeinc.com) has joined #ceph
[15:08] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:11] <symmcom> Hello amazing CEPH community!
[15:11] <mikedawson> hello symmcom
[15:13] <symmcom> today i bring a question which is still unanswered in the tiny brain of mine. I have been trying to understand it for the last 5 months or so. To some it might be sooo simple that i could be taken as an "idiot" :)
[15:13] <symmcom> but i am going to ask anyway with hope that somebody would take some time to explain this to me and propel me further into my experience with amazing CEPH
[15:15] <symmcom> here it goes...... I fully understand and have fully set up RBD Block storage and CEPH FS Storage..... but i do not understand CEPH Object storage - what do i put in it, when to use it, and what is it anyway ....
[15:15] * zhyan_ (~zhyan@101.83.172.240) Quit (Ping timeout: 480 seconds)
[15:17] <mikedawson> symmcom: object storage (rados gateway or rgw) is similar to Amazon S3 or OpenStack Swift. Think of a non-structured way to store files without a filesystem hierarchy
[15:18] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[15:19] <symmcom> regular files can be stored in OBJ storage just like folders/files?
[15:19] * twcai (~twcai@125.119.255.30) has joined #ceph
[15:21] <janos> it's kind of like a giant key/value pair storage
[15:21] <janos> symmcom: yep, regular files
[15:22] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[15:22] <mikedawson> symmcom: imagine you have millions or billions of images, video files, etc. Object Storage is a way to upload them into a flat namespace storage system indexed by key. No folders at all.
[15:24] <symmcom> hmm thats interesting. u said no folders at all.. what if i want to organize those image/video files into different folders? not possible?
[15:24] <janos> symmcom: yes, just like s3. s3 has no folders. they fake it
[15:24] * zhyan_ (~zhyan@101.82.120.66) has joined #ceph
[15:24] <janos> keys with '/' in them look like folders
[15:24] <janos> but they aren't
[15:25] <janos> i back up the family photos/videos to ceph
[15:25] <symmcom> not familiar with S3, that's probably why i'm having difficulties wrapping my mind around it. Although i have heard of Amazon S3 and Swift
[15:25] <janos> it's all there nice and orderly
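To illustrate the pseudo-folder idea, an S3 client such as s3cmd pointed at a radosgw endpoint stores and lists objects whose keys merely look like paths (bucket and key names here are made up):

    s3cmd put beach.jpg s3://photos/family/2013/beach.jpg
    s3cmd ls s3://photos/family/2013/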
[15:27] <symmcom> ok....here is the setup i currently have in a production environment... Proxmox Hypervisor Cluster of 4 nodes, CEPH Storage Cluster of 2 nodes, total of 14 TB space, set up with RBD and CEPH FS. All Virtual Machines are on RBD Shared Storage. All ISO images are on CEPH FS, used by the VMs.
[15:27] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[15:28] <symmcom> how would VM users access the OBJ storage? via Rados GW ? with authentication?
[15:28] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[15:28] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit ()
[15:32] * deg (~oftc-webi@V10K1.bull.fr) Quit (Quit: Page closed)
[15:33] * markbby (~Adium@168.94.245.4) has joined #ceph
[15:34] <zackc> symmcom: that's correct
[15:34] <mikedawson> symmcom: yes, Rados Gateway is the primary interface for object storage. It is basically an apache webserver with ceph/rgw bits
[15:37] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Remote host closed the connection)
[15:38] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Read error: Connection reset by peer)
[15:39] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[15:41] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Read error: Connection reset by peer)
[15:41] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[15:43] <symmcom> mike> i like the sound of "Web Server" :) how are file copies done? through a web interface?
[15:49] * aciancaglini (~quassel@78.134.20.174) has joined #ceph
[15:50] <zackc> symmcom: radosgw is compatible with Amazon S3 and OpenStack Swift
[15:51] <zackc> it doesn't ship with any web ui (that i'm aware of), but it's possible to build on top of it
[15:52] <symmcom> zackc> if i want to set up our own rados GW on our cluster, do i need to sign up with S3 first?
[15:53] <Gugge-47527> it works _like_ s3
[15:53] <Gugge-47527> not _with_ s3
[15:53] <Gugge-47527> So no, you dont need to sign up with anything at amazon to use the radosgw
[15:54] <zackc> sorry, i was unclear
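Credentials for your own gateway come from radosgw itself rather than Amazon; a hedged sketch of creating an S3-style user (uid and display name are arbitrary):

    radosgw-admin user create --uid=symmcom --display-name="symmcom"

The command prints an access key and secret key that S3-compatible clients then use against the gateway's URL.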
[15:54] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Read error: Connection reset by peer)
[15:57] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[16:00] * BillK (~BillK-OFT@58-7-52-33.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[16:02] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) has joined #ceph
[16:03] <symmcom> slowly, but surely i m beginning to understand.. :) it is clear to me "what" i can store on OBJ Store, but still not clear how exactly people access it
[16:03] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[16:04] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:08] <symmcom> i'm reading the CEPH Quick Start for OBJ Storage. Which one is the Server Node? I have 2 Admin Nodes, 9 MON Nodes, 2 MDS Nodes, 2 OSD Nodes
[16:09] * zhyan_ (~zhyan@101.82.120.66) Quit (Ping timeout: 480 seconds)
[16:11] <ofu_> symmcom: you should have osd >> mons and probably several osd processes (one for each disk) per osd node
[16:12] <twcai> Has anyone tried Ceph OSD performance tuning? I use "rados bench" to run a benchmark on a one-OSD cluster. Looks like an OSD instance cannot use more than 4 cores, even if I increase op threads to 32.
[16:13] <ofu_> twcai: thats why I use 24 osds/disks per node => enough processes
[16:14] <loicd> ccourtaut: do you know gtest well ?
[16:15] <mozg> joao, what does the multi-region feature do?
[16:15] * loicd reading http://code.google.com/p/googletest/wiki/V1_6_AdvancedGuide#Type-Parameterized_Tests
[16:17] <symmcom> ofu> I have 2 OSD nodes with 3 OSDs on each a total of 6 OSDs
[16:24] <mikedawson> symmcom: you probably should have 3 ceph-mon, not 9
[16:25] <twcai> ofu_: hi, is that by design? Running multiple OSD instances on one node?
[16:25] <loicd> ccourtaut: that's what I was after : http://code.google.com/p/googletest/wiki/V1_6_AdvancedGuide#Type-Parameterized_Tests
[16:26] * mozg (~andrei@194.72.131.170) Quit (Ping timeout: 480 seconds)
[16:26] * julian (~julianwa@125.70.135.40) Quit (Quit: afk)
[16:27] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[16:27] <symmcom> mike> the reason i went with 9 is because i thought the more the better. :) i should take down 6 MONs?
[16:29] <mikedawson> symmcom: On a busy cluster, there is significant overhead between monitors. 3 or 5 monitors are typical.
[16:29] <absynth> so
[16:29] <absynth> did the weird performance issue in dumpling get fixed?
[16:30] <symmcom> Ok, i will reduce the MONs to 3. The CEPH OBJ Storage Quick Start says to install Apache and FastCGI on the server machine. Which one is my Server Machine? Admin?
[16:30] <absynth> the machine that will be serving data via RadosGW/S3
[16:31] <mikedawson> symmcom: If you just shut 6 of 9 ceph-mon processes down, you will lose quorum (the three remaining monitors will be less than 50% of the total of 9). That won't work. There is a process to remove Monitors. http://ceph.com/docs/master/rados/operations/add-or-rm-mons/
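The procedure in that doc boils down to removing monitors one at a time while quorum is maintained; a sketch using the mon naming that appears later in this conversation:

    # on the monitor's host: stop the ceph-mon daemon, then remove it from the map
    ceph mon remove ceph-mon-09
    # finally, delete that monitor's entry from ceph.conf on all nodes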
[16:32] <symmcom> absynth> Ok. Going to fire up a new pc for the Rados. mike> i will do it one at a time and make sure the quorum keeps up
[16:39] * jerker is now known as rekrej
[16:41] <rekrej> Just bought two HP ProLiant MicroServer T1610G with 10 GB RAM, four HDDs and one SSD each, planning for ZFS but hoping for Ceph on top.
[16:41] <symmcom> mike> i'm using the command #service ceph -a stop mon.ceph-mon-09 to stop the mon but it keeps saying ceph-mon-09 not found in /etc/ceph/ceph.conf. any idea? i did not add anything manually into ceph.conf. Whatever ceph-deploy added i left the way it is
[16:41] <rekrej> Anyone running something similiar?
[16:41] <absynth> ceph on two nodes? hrm, i dunno if this is really an awesome idea
[16:42] <rekrej> I can buy more if I need to or use others for MON.
[16:42] <absynth> i'm not sure if you can effectively achieve *any* meaningful data redundancy with two nodes
[16:42] <symmcom> rekrej> i tried CEPH on top of ZFS, performance... really sucked
[16:42] <rekrej> focusing on the hardware for small OSDs
[16:42] <absynth> yeah, three mons would be the sensible minimum
[16:42] <absynth> 10gb ram sounds sensible for the normal osd load profile
[16:43] <rekrej> symmcom: ouch. So XFS it is still?
[16:43] * sprachgenerator (~sprachgen@130.202.135.179) has joined #ceph
[16:43] <absynth> but the gods help ye if you encounter one of the memleak bugs :)
[16:43] <rekrej> absynth: four drives, 10 GB was my idea
[16:43] <rekrej> But one can add up to 16 in them, just put another stick in.
[16:44] <absynth> nah, for a small OSD that sounds healthy
[16:44] <symmcom> i tried to use both FreeNAS and OmniOS+Napp-It ZFS and put CEPH on top because i liked ZFS redundancy soo much, but it was a waste of time due to the fact that ZFS is memory hungry
[16:44] <rekrej> Actually my focus was on reducing the cost per drive slot. I went through all manufacturers here and tried to find one who could beat supermicro... but i do not want large 24/36-disk monsters from supermicro for ceph.
[16:44] <absynth> would be completely idiotic, anyway
[16:45] <rekrej> symmcom: hmm, yes zfs want memory
[16:45] <absynth> one disk per OSD, that would mean 24 OSDs on such a box
[16:45] <absynth> not sure that is something someone in their right mind would want to do ;)
[16:45] <symmcom> to be honest, CEPH redundancy is far superior to ZFS's and i dont have to spend lotsa money on memory so ZFS does not go hungry
[16:46] <rekrej> absynth: in the supermicro ones, yes. the point is, i do not have money for 2-3 large supermicro boxes for the moment, and it's also a risk if this does not work out. these small boxes I can expand later if it works fine. Or just move the stuff to other services and reuse the MicroServers for other purposes.
[16:46] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[16:47] <rekrej> is the erasure coding stuff still at the blueprint stage, or has there been any testing/running?
[16:48] <rekrej> for us uhm interested users?
[16:49] <rekrej> symmcom: i really like the checksums and redundancy in zfs. A feeling of smug safeness. But not very fast, no.
[16:49] <odyssey4me> xarses Did you find the cause of the problem you had uploading images into Glance? I think I may have the same issue.
[16:51] <rekrej> the interesting thing with the HP ProLiant MicroServer is that they have just enough CPU power for the four HDDs. For example the even cheaper Storage Pods like Backblaze are building do not have enough CPU power. http://blog.backblaze.com/category/storage-pod/
[16:51] <rekrej> cheaper in terms of cost per drive slot
[16:56] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[16:56] * ChanServ sets mode +v andreask
[16:57] * zhyan_ (~zhyan@101.82.120.66) has joined #ceph
[16:57] * bergerx_ (~bekir@78.188.204.182) Quit (Ping timeout: 480 seconds)
[16:57] <symmcom> rekrej> I lived with ZFS for 2+ years and had a 5 node cluster with 27 Virtual machines with ZFS as shared storage for all VMs. living with CEPH now i can tell this much.... i wont even miss ZFS. :) yes ZFS has checksums but they come at a cost. CEPH gives u the ability of not just HDD redundancy but the entire server. ZFS cannot be scaled out seamlessly just by adding another server node with a bunch
[16:57] <symmcom> of HDDs on it.
[16:57] <ccourtaut> does anyone know if there is something to setup to get acces to the /admin/log api on the radosgw?
[16:58] * alram (~alram@38.122.20.226) has joined #ceph
[16:58] * sagelap (~sage@2600:1012:b025:3060:c5a2:f13c:f3e3:162b) has joined #ceph
[17:02] <rekrej> symmcom: indeed. This is the cunning plan. One thing - what about compression?
[17:03] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Read error: Connection reset by peer)
[17:03] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:03] <rekrej> symmcom: for some of my usage I get 3.5x compression with ZFS. XFS still does not have transparent compression as far as I know.. Btrfs does but I do not want to rely on it yet..
[17:06] <symmcom> rekrej> as far as i know CEPH RBD does not have any compression, it is just a block device, thus higher performance than NFS
[17:07] <rekrej> symmcom: mmm, yes, but thinking about it, in order to get compression for the object storage one needs compression either in ceph (not there yet) or in the file system below,
[17:07] <janos> is there a recommended kernel version baseline for Dumpling?
[17:07] <janos> i'm around the 3.8.x area
[17:08] <rekrej> so for the file system below one can choose between Btrfs and ZFS to get compression but neither is as good as Ext4fs or XFS as far as I understand.
[17:08] <rekrej> (I will mostly run CephFS probably)
[17:08] <symmcom> rekrej> i'm not sure of your usage purpose with ZFS, but in my case i was storing my virtual machines on NFS, at first with compression on. But compression and Virtual Machine files (Virtual HDD) dont really go together. big performance issue. I had to disable ZFS compression in order to get somewhat acceptable performance out of all VMs
[17:09] <rekrej> symmcom: ok. My usage is mostly backup for HPC stuff, 100 million files. Works like a charm in ZFS. Will see how Ceph copes with it.
[17:09] <rekrej> CephFS I mean.
[17:10] <symmcom> rekrej> r u talking about CEPH File Storage? i dont believe it is fully ready to take on full-time file sharing. Performance and stability are still an issue
[17:10] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[17:10] <symmcom> i tried to back up all my VMs on CEPH FS; it crashed almost every time
[17:10] <rekrej> symmcom: yes I am aware :) have been running from time to time
[17:10] <rekrej> symmcom: mmm, VMs better run on RBD I guess.
[17:11] * bergerx_ (~bekir@78.188.204.182) has joined #ceph
[17:11] <symmcom> now i use CEPH FS purely to store ISO images, CEPH RBD for all VM store and FreeNAS ZFS for weekly backup
[17:13] <rekrej> symmcom: one really cool thing with CephFS is that, at least from the beginning, it was planned to be able to scale with metadata distributed between the different MDSes. no other planned distributed storage project had that in mind. CephFS has been slightly neglected for a while (it needs a good foundation to stand on first) but is showing steady progress.
[17:14] <rekrej> A harder problem to solve than just object storage and RBD. But there are a lot of old school HPC centers interested in a new distributed parallel file system to replace their proprietary solutions with.
[17:14] <rekrej> I really hope Inktank can get some of that cake.
[17:14] <symmcom> rekrej> Totally agree with you there!! It got good potential, just need some T.L.C :)
[17:14] * sel (~sel@212.62.233.233) Quit (Quit: Leaving)
[17:15] * sagelap1 (~sage@2600:1012:b010:b4ca:dc7a:80fc:75d6:a860) has joined #ceph
[17:15] <rekrej> googling tender loving care ahh
[17:15] <symmcom> :)
[17:16] <rekrej> our kHPC center just bought a couple of racks with another type of proprietary storage. Had CephFS been stable that could have been a great choice. They are even running
[17:16] <symmcom> mikedawson> if u r still here... i have removed 4 MONs and now down to 5 MONs. CEPH Health is OK. Cluster active+clean
[17:16] <rekrej> opennebula which fits just right in.
[17:17] <mikedawson> symmcom: good
[17:18] * madkiss (~madkiss@184.105.243.235) has joined #ceph
[17:19] * mozg (~andrei@194.72.131.170) has joined #ceph
[17:19] <rekrej> symmcom: you made me listen to TLC Waterfalls :) /signing off, going home from work
[17:19] <symmcom> :) ok rekrej
[17:21] * sagelap (~sage@2600:1012:b025:3060:c5a2:f13c:f3e3:162b) Quit (Ping timeout: 480 seconds)
[17:22] * dlan_ (~dennis@116.228.88.131) has joined #ceph
[17:22] * dlan (~dennis@116.228.88.131) Quit (Read error: Connection reset by peer)
[17:25] * yehudasa_ (~yehudasa@2602:306:330b:1410:ea03:9aff:fe98:e8ff) Quit (Ping timeout: 480 seconds)
[17:31] * sagelap1 is now known as sagelap
[17:31] * xarses (~andreww@c-50-136-199-72.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[17:32] <sagelap> joao: ping!
[17:34] <joao> here
[17:34] <sagelap> can you look at wip-6090 ?
[17:35] <joao> sure
[17:36] <sagelap> the expand mon tests have shaken out a surprising number of paxos problems this summer. the corner cases are getting smaller and smaller, though!
[17:36] * sagelap (~sage@2600:1012:b010:b4ca:dc7a:80fc:75d6:a860) Quit (Read error: Connection reset by peer)
[17:40] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[17:40] * ntranger (~ntranger@proxy2.wolfram.com) has joined #ceph
[17:43] <ccourtaut> yehudasa: around?
[17:45] <ccourtaut> i'm getting a 404 with radosgw-agent when it tries to reach the dest cluster on the /admin/replica_log?bounds url
[17:46] <ccourtaut> at first i got another problem reaching the src cluster because i was missing the .log pool
[17:46] <ccourtaut> now i got it on both clusters
[17:46] <ccourtaut> am i missing another pool?
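For anyone hitting the same first error, the missing log pool can be created by hand. A minimal sketch; the pool name .log comes from the conversation above, and the placement-group count of 8 is just an illustrative value:

    # create the missing log pool on the cluster the agent complains about
    ceph osd pool create .log 8
    # verify it is there
    rados lspools | grep '^\.log$'
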
[17:46] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[17:47] <ccourtaut> or might it be something else?
[17:47] * madkiss (~madkiss@184.105.243.235) Quit (Quit: Leaving.)
[17:47] <joao> sagelap, I think it makes sense; are you positive the rationale for discarding the uncommitted_v holds?
[17:47] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[17:47] * yasu` (~yasu`@99.23.160.231) has joined #ceph
[17:48] * xarses (~andreww@204.11.231.50.static.etheric.net) has joined #ceph
[17:49] <ntranger> hey alfredodeza! I'm running the updated ceph-deploy mon create, and it gets to "statring ceph-create-keys on ceph01..." and it just hangs there. Any hints?
[17:52] <alfredodeza> oh god no
[17:52] <alfredodeza> ntranger: what OS
[17:53] <alfredodeza> and is ceph-deploy actually hanging or is it ceph-create-keys ?
[17:54] <ntranger> It's Scientific Linux 6.4. As for whether it's the keys that are hanging or ceph-deploy itself, I'm not sure.
[17:54] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[17:55] <alfredodeza> I mean, you are running ceph-deploy and it is not exiting the command forcing you to Ctrl-C? or it ends nicely but you see ceph-create-keys hanging on the remote host?
[17:55] <ntranger> yeah, I have to control out of it
[17:56] <alfredodeza> can you reproduce from scratch every time or have you just hit this once ?
[17:56] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) has joined #ceph
[17:56] <ntranger> everytime I try to run it, it hangs in the exact same place
[17:56] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[17:57] <alfredodeza> can you share your ceph-deploy output?
[17:58] <ntranger> sure, one sec.
[18:00] * danieagle (~Daniel@177.97.251.212) has joined #ceph
[18:01] <ntranger> alfredodeza: here ya go. http://pastebin.com/qZGGLKCZ
[18:01] <sagewk> zackc: https://github.com/ceph/teuthology/pull/48
[18:01] <alfredodeza> ntranger: that looks like ceph-deploy 1.2.1
[18:01] <alfredodeza> or, 1.2 actually
[18:01] <alfredodeza> ntranger: are you sure this is 1.2.2 ?
[18:02] <ntranger> it may be 1.2.1 actually. I downloaded it a couple weeks ago, and am just now getting back to it
[18:02] * ssejour (~sebastien@out-chantepie.fr.clara.net) Quit (Quit: Leaving.)
[18:03] <alfredodeza> ntranger: we released yesterday :)
[18:03] <alfredodeza> ntranger: can you upgrade and try again?
[18:03] <ntranger> I fail. lol
[18:03] <ntranger> absolutely
[18:03] <alfredodeza> because this is one issue I fixed
[18:03] <alfredodeza> ntranger: no no, no worries :)
[18:03] <alfredodeza> you did give me a partial heart attack though because it was a very very difficult thing to fix
[18:03] <zackc> sagewk: look good; merged
[18:04] <ntranger> do you have the download link by chance?
[18:04] <ntranger> Sorry about that. :)
[18:05] <alfredodeza> ceph-deploy repos?
[18:05] <kraken> ceph.com/packages/ceph-extras/(debian|rpm) and ceph.com/(debian|rpm)-{release}
[18:05] <alfredodeza> ntranger: ^ ^
[18:05] <alfredodeza> thanks kraken
[18:05] <alfredodeza> high five
[18:05] <kraken> ( ‘-’)人(゚_゚ )
[18:05] <ntranger> :)
[18:05] <ntranger> thanks
[18:09] <sagewk> joao: i'm pretty sure, but it does make me slightly nervous. it matches the conditions under which we set it in the first place, though (during begin())
[18:09] <sagewk> so i think it's good. and it passed ~250 iterations without (paxos) breaking
[18:10] <sagewk> (the osd thrasher vs osd leak test race sadly makes a lot of failure noise, but the mons looked good)
[18:11] <zackc> sagewk: https://github.com/ceph/teuthology/pull/49
[18:12] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[18:14] * tnt (~tnt@ip-188-118-44-117.reverse.destiny.be) Quit (Ping timeout: 480 seconds)
[18:18] <joao> sagewk, can you think of any situation during which this could happen? two monitors in the cluster having the same pn for different versions, and having the same version both committed and uncommitted?
[18:19] <joao> considering that (afair) paxos recovery puts the cluster on-hold for new proposals, it seems to me that if this is the case then the problem may go deeper than this :\
[18:21] <joao> hmm... unless the proposal is already on-going when the recovery starts; apparently we don't care what the state is when we're committing proposals
[18:21] <joao> maybe that's how we can get to that state, it being perfectly reasonable
[18:21] <sagewk> joao: i think that happens all the time, it just depends on whether the mons knew the proposal committed before they crashed.
[18:22] <sagewk> usually it doesn't matter because we ignore the uncommitted_value if it's not last_committed+1.. except this pn off-by-one comparison makes us not learn the +1 value
[18:23] * devoid (~devoid@130.202.135.213) has joined #ceph
[18:24] <yehudasa> ccortaut: maybe missing the .log pool?
[18:24] <yehudasa> ccourtaut: ^^^
[18:24] <ccourtaut> yehudasa: nop, got it on both clusters
[18:25] <ccourtaut> was missing it at first, and got a problem with a 404 on src
[18:25] <ccourtaut> now got a 404 on dest when the agent tries to get /admin/replica_log?bounds url
[18:26] <yehudasa> do you see anything on the rgw log?
[18:26] <ccourtaut> yehudasa: i make a pastebin in a moment
[18:27] * yehudasa_ (~yehudasa@2607:f298:a:607:ea03:9aff:fe98:e8ff) has joined #ceph
[18:27] * zhyan_ (~zhyan@101.82.120.66) Quit (Ping timeout: 480 seconds)
[18:27] * sleinen (~Adium@2001:620:0:46:699a:1e8e:f82a:c697) Quit (Ping timeout: 480 seconds)
[18:28] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[18:32] <ccourtaut> yehudasa: http://pastebin.com/wmtndjh1
[18:33] <ccourtaut> yehudasa_: here are the logs of the radosgw of the dest region
[18:33] <joao> sagewk, the pn being off by one bothers me the more I think about it; that should be one thing that should never happen
[18:35] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) has joined #ceph
[18:35] <joao> oh
[18:35] <joao> think I got it
[18:36] * tnt (~tnt@109.130.102.13) has joined #ceph
[18:39] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) Quit (Quit: Leaving.)
[18:39] <yehudasa> ccourtaut: did you first run a full sync?
[18:40] <ccourtaut> yehudasa: nop
[18:40] <ccourtaut> might be the problem indeed
[18:40] <ccourtaut> i tried that right now
[18:43] <sagewk> joao: look good?
[18:45] <ccourtaut> yehudasa: it did seem to work with --sync-scope full on the cli, but i wrote sync_scope : "full" in my yaml config and still got the same issue
[18:46] <yehudasa_> ccourtaut: I think the sync-scope param won't work with the yaml conf, but I'm not too sure
[18:46] <yehudasa_> maybe joshd would know, joshd?
[18:46] <ccourtaut> yehudasa: but even after a first run with --sync-scope full
[18:46] <ccourtaut> if i switch back to incremental
[18:46] <ccourtaut> still got the same issue
[18:46] <yehudasa_> ah, that shouldn't happen
[18:47] <ccourtaut> yehudasa_: ok
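For reference, the command-line form that worked in this exchange looks roughly like the following. Only the --sync-scope flag is confirmed by the conversation; passing the sync configuration with -c, and the config file name itself, are assumptions:

    # one-off full sync between the zones, then back to incremental runs
    radosgw-agent -c region-data-sync.conf --sync-scope=full
    radosgw-agent -c region-data-sync.conf --sync-scope=incremental
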
[18:47] <joao> sagewk, yeah, I think so; still struggling a bit with the whole accepted_pn issue, but considering it is only updated during recovery I think I'm missing the full picture
[18:47] <yehudasa_> ccourtaut, can you set debug ms = 1 on your rgw and try again?
[18:48] <ccourtaut> yehudasa_: still got a 404 as you have seen in the radosgw log
[18:48] <ccourtaut> yehudasa_: ok
[18:49] <mikedawson> sagewk: "osd: disable PGLog::check() via config option (fixes CPU burn)". What is the option, and does it default to burning less CPU? Is there a tradeoff to the lower CPU setting?
[18:49] <sagewk> it turns off debugging and defaults to off
[18:49] <sagewk> forget the name of the option, but only relevant for our testing really
[18:50] <mikedawson> sagewk: thx
[18:50] <ccourtaut> yehudasa_: got this http://pastebin.com/CguDwZeD
[18:52] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[18:53] <ccourtaut> yehudasa_: does it help?
[18:53] * tobru_ (~quassel@217-162-50-53.dynamic.hispeed.ch) has joined #ceph
[18:58] <yehudasa_> ccourtaut: looking now
[18:59] <xarses> ceph-deploy 1.0.0 doesn't block on mon create (for the mons to sync and generate keys) - is this expected?
[19:00] <alfredodeza> xarses: that sounds like a very old version you are using, but correct, no command should block in ceph-deploy
[19:00] <xarses> prepare disks does
[19:01] <xarses> would it be benign to use a newer version to drop cuttlefish?
[19:01] * rturk-away is now known as rturk
[19:02] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[19:02] <xarses> osd prepare blocks, osd activate appears to also, but doesn't run long enough to be sure
[19:03] <xarses> alfredodeza: is there a preferred way to check that the cluster's mons have initialized and that the keys were generated ok?
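A common way to check this by hand is sketched below, assuming the stock key paths: the monitors are ready once they report a quorum, and ceph-create-keys is finished once the bootstrap keyrings have appeared on the mon host.

    # the monitors are up once they report a quorum
    ceph quorum_status --format json-pretty

    # ceph-create-keys is done once the bootstrap keyrings exist on the mon host
    ls /etc/ceph/ceph.client.admin.keyring \
       /var/lib/ceph/bootstrap-osd/ceph.keyring \
       /var/lib/ceph/bootstrap-mds/ceph.keyring
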
[19:04] <ccourtaut> yehudasa_: ok thanks
[19:04] <yehudasa> ccourtaut: not sure that it's an actual problem, what does the agent have to say?
[19:05] * symmcom (~wahmed@rx0so-shaw-pipe.cg.bigpipeinc.com) Quit ()
[19:05] <ccourtaut> it does say something about it, but only in verbose mode
[19:06] <ccourtaut> thought it might be a problem, but if you say it's the normal behaviour
[19:07] <yehudasa> you don't have many buckets and users, do you?
[19:09] <ccourtaut> yehudasa: do not have any buckets yet, indeed
[19:10] <yehudasa> ccourtaut, so there are no bounds set on that shard
[19:10] <ccourtaut> ok
[19:10] * mschiff (~mschiff@port-30155.pppoe.wtnet.de) Quit (Remote host closed the connection)
[19:11] * jluis (~JL@89.181.146.94) has joined #ceph
[19:11] <ccourtaut> yehudasa: thanks a lot for your help, it seems that i have a multi ceph cluster up and running with master and slave and a radosgw-agent on my laptop :)
[19:11] * diegows (~diegows@190.190.11.42) has joined #ceph
[19:12] * nhm (~nhm@184-97-168-219.mpls.qwest.net) Quit (Quit: Lost terminal)
[19:12] <yehudasa> ccourtaut: np, please let us know of any difficulties, it's new to us too
[19:13] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[19:13] * mozg (~andrei@194.72.131.170) Quit (Read error: Operation timed out)
[19:13] <ccourtaut> yehudasa: ok no problem, i was setting up a local cluster master/slave to be able to play with radosgw-agent
[19:13] * bergerx_ (~bekir@78.188.204.182) Quit (Remote host closed the connection)
[19:13] <ccourtaut> to understand what has been done so far, and to be able to contribute then
[19:13] <ccourtaut> yehudasa: many thanks, got to leave workplace before getting kicked out XD
[19:14] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:14] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[19:19] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[19:20] * Steki (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[19:21] * sleinen1 (~Adium@2001:620:0:26:399e:e592:2a41:bc7d) has joined #ceph
[19:22] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[19:25] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Ping timeout: 480 seconds)
[19:27] * yasu` (~yasu`@99.23.160.231) Quit (Remote host closed the connection)
[19:27] * twcai (~twcai@125.119.255.30) Quit (Ping timeout: 480 seconds)
[19:27] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[19:28] * carif (~mcarifio@rrcs-208-105-189-126.nys.biz.rr.com) has joined #ceph
[19:31] <mtl1> Hi. I have a question about snapshots and clones, and if this is a bad idea. I'm going to use rbd to do boot devices for kvm guests. I have a "base" guest that I snapshot, protect, and then clone to make new guest images from. Does it make any difference at all if I have 1 snapshot per guest clone, or if I have several guest clones made from a single weekly or so snapshot?
[19:32] <mtl1> I'm coming from a zfs mentality, and I know that when you reach a certain number of snapshots on a pool with zfs, the performance of the pool starts to decrease. I'd like to make sure that ceph doesn't have the same kind of issue with the number of snapshots.
[19:36] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley)
[19:37] * carif (~mcarifio@rrcs-208-105-189-126.nys.biz.rr.com) Quit (Ping timeout: 480 seconds)
[19:37] <ofu_> this surely depends on the filesystem of your osds
[19:38] <mtl1> I'm using btrfs
[19:41] <sagewk> zackc: were you already updating bobtail or shall i?
[19:42] <zackc> sagewk: ah, i have not
[19:43] * zackc wasn't sure if there were other branches that should get it as well
[19:44] <sagewk> cuttlefish and dumpling too
[19:45] * KindTwo (~KindOne@h184.211.89.75.dynamic.ip.windstream.net) has joined #ceph
[19:47] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:47] * KindTwo is now known as KindOne
[19:55] * yasu` (~yasu`@dhcp-59-179.cse.ucsc.edu) has joined #ceph
[20:02] <ntranger> alfredodeza: Okay, I got an error saying that it needed pushy 5.3, so I just tried installing that, and pushy is failing its install. Do I have to uninstall the old pushy to install the new one?
[20:08] <joao> sagewk, repushed 5909 (pull-request: 526); commits that changed: 7a091d3 and 4ac086c
[20:08] <joao> gotta run; be back later today, maybe
[20:10] <paravoid> sjust: so, I can do more tests with those two OSDs again today
[20:10] <paravoid> I started them up
[20:10] <paravoid> they're out, but still using quite a bit of CPU
[20:11] <zackc> sagewk: my pushes are failing (fatal: The remote end hung up unexpectedly)
[20:11] <paravoid> I also saw a lot of slow requests today while restarting OSDs for 0.67.2 :( no peering, just slow req for e.g. 120s because there were pgs in a recovering state
[20:11] <paravoid> different bug
[20:11] <paravoid> third serious one this week :)
[20:12] <sagewk> zackc: yay github
[20:16] * jluis (~JL@89.181.146.94) Quit (Ping timeout: 480 seconds)
[20:20] * jbd_ (~jbd_@2001:41d0:52:a00::77) has left #ceph
[20:25] * KindTwo (~KindOne@198.14.197.98) has joined #ceph
[20:26] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Remote host closed the connection)
[20:26] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:26] * KindTwo is now known as KindOne
[20:28] <alfredodeza> ntranger: I could swear 1.2.2 solves that :/
[20:31] * diegows (~diegows@190.190.11.42) Quit (Read error: Operation timed out)
[20:36] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) has joined #ceph
[20:39] <joshd> mtl1: the only way many snapshots would degrade performance would be on the original base image for writes, but I haven't seen evidence it would have much of an effect even there
[20:40] <mtl1> Excellent. Thank you. The speed on the base image isn't a concern. The speed of the cloned images is all I'm worried about.
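The workflow mtl1 describes maps onto the rbd command line roughly as follows; a minimal sketch with placeholder pool and image names:

    # prepare the base image once, snapshot it, and protect the snapshot
    rbd snap create rbd/base-image@golden
    rbd snap protect rbd/base-image@golden

    # each new guest gets a copy-on-write clone of that snapshot
    rbd clone rbd/base-image@golden rbd/guest-01-disk
    rbd clone rbd/base-image@golden rbd/guest-02-disk

    # optionally break the dependency on the base image later
    rbd flatten rbd/guest-01-disk
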
[20:49] <sagewk> yehudasa: https://github.com/ceph/ceph/pull/535
[20:49] <sagewk> or anyone else familiar with readdir_r's particular flavor of suck
[20:50] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[20:53] <yehudasa> sagewk: on it
[20:56] <yehudasa> sagewk: basically ok, just not sure about the offsetof portability, do we define it or are we getting it from some external source?
[20:56] <yehudasa> ah, it's in stddef.h
[20:56] <yehudasa> forget it
[20:56] <sagewk> not sure where it comes from. we're already using it elsewhere, though, fwiw
[20:58] <yehudasa> ok, it's going way back to C89, it's safe to use
[21:07] * jochen (~jochen@laevar.de) Quit (Remote host closed the connection)
[21:16] <sprachgenerator> I'm working with the latest build of dumpling and having issues with ceph-deploy and osd prepare - it's specifically hanging at "Preparing host xxx disk"
[21:16] <alfredodeza> sprachgenerator: what version of ceph-deploy
[21:16] <alfredodeza> make sure you are using the latest one, there was a release last night
[21:16] <alfredodeza> v1.2.2 is the latest
[21:17] <sprachgenerator> 1.2.2
[21:18] <alfredodeza> hrmnnn
[21:18] <sprachgenerator> here are some details from the ceph-deploy node and the node that it's deploying to: http://pastie.org/8263542
[21:18] <alfredodeza> I was about to ask :)
[21:18] * sagedroid (~yaaic@2607:f298:a:607:806e:9322:9e5c:89bc) has joined #ceph
[21:20] * sagedroid (~yaaic@2607:f298:a:607:806e:9322:9e5c:89bc) Quit (Read error: Connection reset by peer)
[21:20] * sagedroid2 (~yaaic@2607:f298:a:607:806e:9322:9e5c:89bc) has joined #ceph
[21:21] * sagedroid2 (~yaaic@2607:f298:a:607:806e:9322:9e5c:89bc) Quit ()
[21:22] <sprachgenerator> interestingly enough - it finished /dev/sda on that machine (usually it hangs there) - the ceph-related processes on that node after calling /dev/sdb are: http://pastie.org/8263552
[21:22] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[21:23] <sprachgenerator> there are now two /usr/sbin/ceph-disk activate processes (sda/sdb respectively) - the mount point for sda seems funny as well
[21:25] <alfredodeza> sprachgenerator: have you tried running the failing command on the actual host and see what is going on there?
[21:25] <alfredodeza> for example: /usr/sbin/ceph-disk-prepare --cluster ceph -- /dev/sdb
[21:27] <devoid> yea he tried that
[21:27] <devoid> alfredodeza: ^
[21:27] <alfredodeza> and what was the output?
[21:30] <devoid> it hangs after "The operation has completed successfully."
[21:30] <devoid> so success?
[21:30] <alfredodeza> so if it is hanging on the remote host, that is worrying, but hopefully not related to ceph-deploy
[21:34] <alfredodeza> where does that output come from? I don't see ceph-deploy/ceph returning that
[21:34] <devoid> ok, it completes after a few minutes
[21:34] <alfredodeza> maybe I should add a warning that it can take a bit of time depending on your disk size
[21:36] <devoid> 500Mb?
[21:36] <devoid> s/M/G/
[21:39] <alfredodeza> well I am in the process of getting all the `osd` actions to tell us more about what the hell they are doing :)
[21:39] <alfredodeza> we are (right now) doing like 20 things that we don't report on
[21:43] <devoid> yea, it looks like this is some subprocess echoing to stdout
[21:45] <sagewk> https://github.com/ceph/ceph/pull/530
[21:47] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has left #ceph
[21:48] * allsystemsarego (~allsystem@5-12-37-127.residential.rdsnet.ro) Quit (Quit: Leaving)
[21:56] <ntranger> alfredodeza: here is the error I get when trying to install ceph-deploy-1.2.2-0.noarch.rpm, http://pastebin.com/6JggZ5Nc
[21:57] <alfredodeza> ntranger
[21:57] <alfredodeza> err
[21:57] <alfredodeza> ntranger: that looks like a packaging bug
[21:58] <alfredodeza> what OS is this? CentOS?
[21:58] <ntranger> scientific linux 6.4
[21:58] * KindTwo (~KindOne@h83.54.186.173.dynamic.ip.windstream.net) has joined #ceph
[21:59] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[21:59] <alfredodeza> ah right scientific
[21:59] * KindTwo is now known as KindOne
[21:59] <alfredodeza> this is really odd, it works for me on CentOS 6.3, let me try CentOS 6.4
[21:59] <sprachgenerator> so for the processes that spawn after /usr/sbin/ceph-disk prepare --cluster ceph -- /dev/sda - two remain even after the partitioning/mkfs commands are completed: http://pastie.org/8263614 - running the activate command manually shows that it cannot talk to any of the mon hosts in the config: it would appear as though the ceph-deploy tool, when creating the mon configs, chooses the wrong IP interface rather than the one specified in ceph.conf:
[21:59] <sprachgenerator> http://pastie.org/8263636
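If the monitors really are binding to, or advertising, an unintended interface, the usual knobs are the public network and per-monitor address settings in ceph.conf. A hedged sketch; the subnet, hostname and address are placeholders:

    [global]
        # pin client-facing traffic to the intended subnet
        public network = 10.0.0.0/24

    [mon.ceph01]
        host = ceph01
        mon addr = 10.0.0.11:6789
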
[21:59] <alfredodeza> ntranger: what happens if you do not specify a version?
[22:01] * Meths (rift@2.25.193.59) Quit (Ping timeout: 480 seconds)
[22:06] <sprachgenerator> from the client side you see: http://pastie.org/8263656
[22:07] <ntranger> alfredodeza: well, it's saying 1.2.2 is installed, but pushy errored. When I try to run the "ceph-deploy install" command, I get this http://pastebin.com/CcXxS5J7
[22:09] <alfredodeza> ntranger: that looks like it is adding the EPEL but complaining EPEL is already there
[22:09] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) Quit (Quit: Leaving)
[22:09] <alfredodeza> which seems OK to me
[22:09] <alfredodeza> although I have a ticket open to work on not attempting to re-add EPEL if it is already there
[22:12] <sagewk> dmick: alfredodeza: does that python unwind properly on exception?
[22:12] <sagewk> and still raise?
[22:13] <dmick> which python?
[22:13] <alfredodeza> what do you mean by 'unwind properly'
[22:14] <alfredodeza> the reason for the traceback is because we are doing check_call and the command is giving a non-zero exit status, so CalledProcessError gets raised
[22:14] <alfredodeza> right now we are logging all remote exceptions for every command as ERROR
[22:14] <alfredodeza> so that will show up
[22:15] <dmick> are we talking about ntranger's report?
[22:15] <alfredodeza> dmick: yes
[22:15] <alfredodeza> so the solution here is not to re-add EPEL if it is already there
[22:15] <alfredodeza> otherwise there is too much noise
[22:16] <dmick> there must be a way to make rpm exit successfully, no?
[22:16] <alfredodeza> there is a ticket for this, issue 6102
[22:16] <dmick> is it really that stupid?
[22:16] <kraken> alfredodeza might be talking about: http://tracker.ceph.com/issues/6102 [if EPEL has been added skip adding it again]
[22:16] <sagewk> alfredodeza: dmick: http://fpaste.org/34338/88737137/
[22:17] <sagewk> helps if i paste the link
[22:17] <alfredodeza> lol, and here we were thinking something different
[22:17] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:17] <alfredodeza> sagewk: I think a context manager would be better
[22:18] <alfredodeza> otherwise you would be doing that dance all over
[22:18] <sagewk> just want to hold the lock for that one function
[22:18] <sagewk> actually, it doesn't matter if we release the lock when we crash anyway
[22:18] <alfredodeza> but, to your question, yes, the exception gets raised, and the finally always executes
[22:18] <alfredodeza> sagewk: yep, a context manager would do that correctly for just that function
[22:19] <sagewk> k thanks
[22:20] <alfredodeza> then your call would look like this: http://fpaste.org/34340/77289190/
[22:20] * danieagle (~Daniel@177.97.251.212) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[22:20] <alfredodeza> which is nicer to read and easier to implement elsewhere
[22:21] * yanzheng (~zhyan@101.82.231.225) has joined #ceph
[22:22] <dmick> alfredodeza: add --replacepkgs to that rpm call; no error
[22:22] <dmick> also, the 'vh' is pretty useless when a machine is running it
[22:23] * alfredodeza takes ntoe
[22:23] <dmick> so I'd change -Uvh to -U --replacepkgs
[22:23] <alfredodeza> *note
[22:23] <alfredodeza> excellent
[22:23] <alfredodeza> thank you dmick
[22:23] <dmick> sometimes you have to threaten package managers with violence to meet your goals :)
[22:23] <alfredodeza> lol
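Spelled out, the change dmick suggests for the EPEL step amounts to the following; the package URL is only illustrative:

    # --replacepkgs makes rpm exit 0 even when epel-release is already installed,
    # so re-running the EPEL step stops erroring
    rpm -U --replacepkgs http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
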
[22:26] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) has joined #ceph
[22:31] * WarrenUsui (~Warren@2607:f298:a:607:ccbb:b6b0:2d7c:a034) has joined #ceph
[22:33] <n1md4> hi. i have a problem starting mon, and I think it's related to the IP address changing (my test stack is on dhcp). here is a snippet from running mon in debug mode http://pastebin.com/raw.php?i=cFJPFA1C
[22:33] <n1md4> perhaps the fsid needs recreating?
[22:37] <xarses> looks like the port is already bound
[22:38] * aardvark (~Warren@2607:f298:a:607:641f:2be0:aeb3:2608) Quit (Ping timeout: 480 seconds)
[22:38] <n1md4> what do you mean, sorry?
[22:39] <xarses> either the ip is not the ip of an interface on the host
[22:39] <xarses> or the port 6789 is open by another process
[22:39] <n1md4> fyi, the current ip is x.x.x.211 it was x.x.x.212
[22:39] <xarses> the log shows 212
[22:39] <xarses> so check ceph.conf for the old ip
[22:40] <n1md4> yeah, well, when i built the thing it was, had to move it, and when it came online dhcp server dished out this ip
[22:40] <n1md4> ceph.conf has the new ip
[22:40] <xarses> ok
[22:40] * a2 (~avati@ip-86-181-132-209.redhat.com) has joined #ceph
[22:40] <xarses> check netstat -na | grep g789
[22:40] <xarses> erm 6789
[22:42] <n1md4> nothing
[22:43] <n1md4> i know mon is not running, and it's likely because of this ip change
[22:43] <xarses> hmm, im still drawn to the fact that your log shows 212, but the host/config is 211
[22:44] <sagewk> alfredodeza: can you look at https://github.com/ceph/teuthology/pull/50 ?
[22:44] * alfredodeza looks
[22:45] * xmltok_ (~xmltok@pool101.bizrate.com) has joined #ceph
[22:45] <n1md4> xarses: i can see that .. just not sure where 212 is being stored .. I may just kludge it back to 212 :)
[22:46] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[22:46] <xarses> ya, I would have expected it to be only in the config
[22:47] <n1md4> xarses: it would have been nice, but the headache it's giving me isn't worth the effort to find out - as it's only a temp environment anyway.
[22:47] <xarses> I know the feeling
[22:48] <xarses> i burned mine down and restarted this morning for a random head scratcher
[22:48] <n1md4> hehe
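For the curious, the stale .212 usually lives in the monitor's own monmap rather than in ceph.conf. A hedged sketch of fixing it in place; the mon id 'a' and the new address are placeholders:

    # with the mon stopped: pull the monmap out of its store, rewrite the
    # address, and inject it back
    ceph-mon -i a --extract-monmap /tmp/monmap
    monmaptool --print /tmp/monmap
    monmaptool --rm a /tmp/monmap
    monmaptool --add a 192.0.2.211:6789 /tmp/monmap
    ceph-mon -i a --inject-monmap /tmp/monmap
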
[22:49] * KindTwo (~KindOne@h208.36.28.71.dynamic.ip.windstream.net) has joined #ceph
[22:51] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:51] * KindTwo is now known as KindOne
[22:54] <gregaf> sagewk: do we have separate rules for backporting MDS code to previous releases, or is it just anything minimally invasive and useful?
[22:55] <sagewk> right
[22:55] <gregaf> sweet
[22:55] <gregaf> finally getting around to that locking bug patch from the end of July(!)
[23:05] * KindTwo (~KindOne@h161.33.186.173.dynamic.ip.windstream.net) has joined #ceph
[23:06] * sjustlaptop (~sam@38.122.20.226) Quit (Quit: Leaving.)
[23:07] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[23:07] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:07] * KindTwo is now known as KindOne
[23:07] <n1md4> xarses: okay, so getting there. next hurdle: the osds on one of the nodes are down ... you have a fix for that?
[23:11] <xarses> one of the 'rados osd' commands will probably help
[23:11] <xarses> i haven't gotten that far with my cluster yet
[23:14] <n1md4> ... well, if it makes any difference, here's what I've got .. http://pastebin.com/raw.php?i=ay556CXJ
[23:14] * Steki (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[23:15] <xarses> is there no [osd.3] section in ceph.conf?
[23:15] <xarses> maybe try that from the node that has the osd's
[23:16] <n1md4> cuttlefish doesn't define them, if I'm right?
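The admin commands in question are 'ceph osd ...' rather than 'rados osd ...'. A minimal triage sketch for a down OSD; the id osd.3 is taken from the discussion above, and the init commands assume the sysvinit setup cuttlefish used:

    # see which OSDs are down, and on which host
    ceph osd tree

    # on the node that owns the down OSD, try starting it and watch its log
    service ceph start osd.3          # or: /etc/init.d/ceph start osd.3
    tail -f /var/log/ceph/ceph-osd.3.log

    # once the daemon is running again, make sure it is marked back "in"
    ceph osd in 3
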
[23:18] <sagewk> https://github.com/ceph/ceph/pull/536
[23:19] * ismell (~ismell@host-24-56-171-198.beyondbb.com) Quit (Ping timeout: 480 seconds)
[23:20] <n1md4> ah! it was the same dodgy ip business on the other node's ceph.conf
[23:20] <n1md4> sorted now ... nearly.
[23:25] * sprachgenerator (~sprachgen@130.202.135.179) Quit (Quit: sprachgenerator)
[23:25] * diegows (~diegows@190.190.11.42) has joined #ceph
[23:26] * madkiss (~madkiss@207.239.114.206) has joined #ceph
[23:27] * sjustlaptop (~sam@38.122.20.226) Quit (Quit: Leaving.)
[23:27] * indeed (~indeed@206.124.126.33) has joined #ceph
[23:28] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[23:28] * sjustlaptop (~sam@38.122.20.226) Quit ()
[23:29] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[23:32] * symmcom (~wahmed@S0106001143030ade.cg.shawcable.net) has joined #ceph
[23:33] * devoid (~devoid@130.202.135.213) Quit (Quit: Leaving.)
[23:33] * KindTwo (~KindOne@h120.38.186.173.dynamic.ip.windstream.net) has joined #ceph
[23:35] <symmcom> Hello community, i got a newbie question on setting up CEPH Object storage..... I followed the quick start guide all the way to creating a new user. Now my question is how do i access the obj storage to start putting files on it
[23:36] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:36] * KindTwo is now known as KindOne
[23:39] * Meths (~meths@2.25.213.253) has joined #ceph
[23:44] <n1md4> symmcom: hah! just have my stack working now, and was about to ask the same question.
[23:44] <gregaf> the s3 protocol object storage, you mean?
[23:44] <gregaf> symmcom and n1md4 ^
[23:45] <symmcom> i m so new into OBJ Store i dont even know what "Stack Working Now" means :)
[23:45] * yanzheng (~zhyan@101.82.231.225) Quit (Ping timeout: 480 seconds)
[23:46] <n1md4> gregaf: couldn't tell you. most of my experience has been with lvm/drbd/iscsi, so that presents a block device to a client..
[23:46] <n1md4> my goal is to use with xenserver
[23:47] <n1md4> symmcom: stack working means the output of ceph -s says health ok :)
[23:47] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:47] <symmcom> gregaf> I have had a CEPH cluster with RBD and FS for the last 5 months, just trying to learn CEPH Object Storage. Some people here helped me finally understand what CEPH OBJ Storage really is, now i'm just trying my hand at setting it up
[23:48] * vata (~vata@2607:fad8:4:6:40a1:464d:8581:6fb8) Quit (Quit: Leaving.)
[23:48] <gregaf> symmcom: okay, well if you've actually got it successfully running then you need to create S3 users and then use them to start speaking the S3 protocol at the gateway IP
[23:48] <gregaf> http://ceph.com/docs/master/radosgw/config/#create-a-gateway-user to create it, then…pick a tool and go (we don't provide one ourselves)
[23:49] <symmcom> gregaf> i think u r very close to answering the real question i have. :) I created a user, just dont know how and what i use to log in to see my bucket. U mentioned a tool - which tool, and where can i pick it up?
[23:49] <gregaf> n1md4: create an rbd user (or copy your admin user, if you're okay with that) to the node in question, then start playing with the block device commands
[23:50] <gregaf> look around http://ceph.com/docs/master/rbd/rbd/
[23:50] <n1md4> gregaf: nice, thanks.
[23:50] <gregaf> symmcom: the rados gateway (Ceph Object Storage) speaks the S3 protocol, which is a de facto standard supported by a number of clients
[23:51] * sjustlaptop (~sam@38.122.20.226) Quit (Quit: Leaving.)
[23:51] <gregaf> Cyberduck seems to be pretty common (thanks to being free), or s3cmd on the command line
[23:52] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[23:52] <gregaf> I've gotta run now though, good luck guys :)
[23:52] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[23:52] <symmcom> ur answers have been excellent gregaf!! Thanks!
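Putting gregaf's pointers together, a minimal end-to-end sketch with s3cmd; the user id, display name, and gateway hostname are placeholders:

    # on the gateway/admin node: create an S3 user; note the printed access and secret keys
    radosgw-admin user create --uid=johndoe --display-name="John Doe"

    # on the client: point s3cmd at the gateway instead of Amazon
    s3cmd --configure                  # paste the access and secret keys when prompted
    # then in ~/.s3cfg set, for example:
    #   host_base   = gateway.example.com
    #   host_bucket = %(bucket)s.gateway.example.com

    # create a bucket and upload a file
    s3cmd mb s3://my-first-bucket
    s3cmd put ./hello.txt s3://my-first-bucket/
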
[23:58] * tobru_ (~quassel@217-162-50-53.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.