#ceph IRC Log


IRC Log for 2013-09-18

Timestamps are in GMT/BST.

[0:03] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has left #ceph
[0:08] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Quit: jlogan)
[0:12] * rendar (~s@host154-179-dynamic.12-79-r.retail.telecomitalia.it) Quit ()
[0:12] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[0:12] * sagelap (~sage@2600:100d:b119:bd83:583e:a21c:c5b2:a056) Quit (Ping timeout: 480 seconds)
[0:15] * Steki (~steki@198.199.65.141) Quit (Ping timeout: 480 seconds)
[0:16] * scuttlemonkey (~scuttlemo@38.127.1.5) Quit (Ping timeout: 480 seconds)
[0:18] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[0:19] * dmsimard (~Adium@108.163.152.2) Quit (Quit: Leaving.)
[0:23] * roald (~roaldvanl@87.209.150.214) has joined #ceph
[0:26] * dmsimard (~Adium@108.163.152.2) has joined #ceph
[0:28] * BillK (~BillK-OFT@124-169-207-19.dyn.iinet.net.au) has joined #ceph
[0:31] * roald (~roaldvanl@87.209.150.214) Quit (Ping timeout: 480 seconds)
[0:34] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[0:35] * sleinen1 (~Adium@2001:620:0:25:40bb:5d79:3b9e:1e4e) Quit (Quit: Leaving.)
[0:35] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[0:36] * Svedrin (svedrin@ketos.funzt-halt.net) Quit (Ping timeout: 480 seconds)
[0:37] * Svedrin (svedrin@ketos.funzt-halt.net) has joined #ceph
[0:43] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[0:47] <xarses> alfredodeza: ping
[0:47] <alfredodeza> xarses: depends
[0:47] <angdraug> alfredodeza: xarses just found that the hang bug is back with ceph-deploy built from this commit: 6b41b315
[0:47] <alfredodeza> not for OSD paths
[0:47] * scuttlemonkey (~scuttlemo@38.127.1.5) has joined #ceph
[0:48] * ChanServ sets mode +o scuttlemonkey
[0:48] <angdraug> given the subsequent commits that went into 1.2.4, I suspect it would replicate with that, too
[0:48] <alfredodeza> it hangs where?
[0:48] <alfredodeza> there are timeouts for mon commands now
[0:49] <xarses> [root@node-4 ~]# ps axu | grep ceph
[0:49] <xarses> root 23988 0.1 0.1 279956 1272 ? Ssl 22:37 0:00 python /usr/bin/ceph-deploy mon create node-4:192.168.0.2
[0:49] <xarses> root 24168 0.0 0.3 155344 2864 ? Sl 22:37 0:00 /usr/bin/ceph-mon -i node-4 --pid-file /var/run/ceph/mon.node-4.pid -c /etc/ceph/ceph.conf
[0:49] <xarses> root 26644 0.0 0.1 103240 848 pts/0 S+ 22:43 0:00 grep ceph
[0:49] * gregmark (~Adium@cet-nat-254.ndceast.pa.bo.comcast.net) Quit (Quit: Leaving.)
[0:49] <xarses> grabbing 1.2.4.0 from dumpling
[0:49] <alfredodeza> that may or may not be related to ceph-deploy :)
[0:49] * alfredodeza is inclined to say not-related
[0:50] <xarses> ceph -s returns OK
[0:50] <angdraug> can't be ceph, we're using the same cuttlefish version as before
[0:50] <xarses> well it started with health_err, but that's my fault
[0:51] <xarses> but ceph-deploy should at least kill itself after 7 seconds right?
[0:52] <alfredodeza> xarses: for mon commands, yes
[0:53] <alfredodeza> plus, you need to at least give me a way to replicate this :)
[0:53] <alfredodeza> we test ceph-deploy extensively, and something that hangs would be caught immediately (which we had before)
[0:55] * alfredodeza is juggling kids bath time and ceph-deploy
[0:55] * alfredodeza thinks kids are winning this battle
[0:56] * alfredodeza is now known as alfredo|afk
[0:58] * scuttlemonkey (~scuttlemo@38.127.1.5) Quit (Ping timeout: 480 seconds)
[1:02] <xarses> alfredodeza: http://paste.openstack.org/show/47188/, so using 1.2.4.0 with centos is broken
[1:02] * dmsimard (~Adium@108.163.152.2) Quit (Ping timeout: 480 seconds)
[1:03] <alfredo|afk> xarses: how long before the Ctrl-C ?
[1:04] <xarses> 37 seconds
[1:04] <xarses> it's actually still running now
[1:04] <alfredo|afk> so if you use 1.2.3 it works?
[1:04] * alfredo|afk is baffled
[1:05] <xarses> no
[1:05] <xarses> hmm, probably
[1:05] <xarses> we were using the master branch
[1:05] <xarses> between 1.2.3 and 1.2.4
[1:05] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[1:06] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[1:06] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) Quit ()
[1:07] <angdraug> pretty sure it was 2d12dcf
[1:08] <alfredo|afk> angdraug: what do you mean?
[1:08] <alfredo|afk> that just swapped to the new remote library
[1:08] <alfredo|afk> later we decided to add the timeout
[1:08] <alfredo|afk> *right* now it has the timeout
[1:09] <alfredo|afk> if what you guys are using from master doesn't have that timeout it is bound to hang
[1:09] <xarses> should have been working for us at pull 72
[1:09] <alfredo|afk> it no longer hangs on our end
[1:09] <alfredo|afk> it hangs when `create keys` never ends on the remote end
[1:09] <angdraug> at pull 72 afair it still had hangs
[1:09] <alfredo|afk> if whatever you've built from master does not have the timeouts then it will hang
[1:10] <angdraug> then we took 2d12dcf and hangs were fixed
[1:10] <xarses> nope, create keys wasn't the issue for us
[1:10] <alfredo|afk> sure, if it is not `create keys` it has to be something else on the remote end that never terminates
[1:10] <alfredo|afk> regardless, whatever you guys are building, you need to make sure you have the `timeout` flags there
[1:10] <alfredo|afk> if you don't, it will hang
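The timeout behaviour being described here can be illustrated with a generic sketch: wrap the blocking remote call so it is aborted after a fixed number of seconds rather than hanging forever. This is not ceph-deploy's actual implementation; the `run_remote_mon_command` name and the 7-second budget are assumptions taken from the conversation above.

```python
import signal


class CommandTimeout(Exception):
    """Raised when the wrapped call exceeds its time budget."""


def with_timeout(seconds, func, *args, **kwargs):
    # Generic illustration only: abort a blocking call after `seconds`
    # using SIGALRM (POSIX-only), roughly the behaviour a per-command
    # timeout flag provides.
    def _handler(signum, frame):
        raise CommandTimeout('no result within %d seconds' % seconds)

    old_handler = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)
    try:
        return func(*args, **kwargs)
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)


# Hypothetical usage; run_remote_mon_command stands in for whatever
# actually executes the mon command on the remote host:
#     with_timeout(7, run_remote_mon_command, 'node-4')
```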
[1:11] <xarses> the paste is from 1.2.4.0 from http://ceph.com/rpm-dumpling/el6/noarch/ceph-deploy-1.2.4-0.noarch.rpm
[1:11] <angdraug> 3 most recent ceph-deploy rpms we tested were built from pull 72, from 2d12dcf, and most recent one from 6b41b315
[1:11] <alfredo|afk> and all 3 are broken?
[1:11] * ScOut3R (~scout3r@4E5C2305.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[1:11] <angdraug> 2d12dcf wasn't
[1:12] <alfredo|afk> ok, so that is the one that works but everything after that does not?
[1:13] <angdraug> yes
[1:13] <angdraug> we didn't bisect it, so all we have for you is the range between those two commits
[1:13] <xarses> 6b41b315 and df3f351249 (v1.2.4) are broken
[1:14] <xarses> for centos
[1:14] <alfredo|afk> xarses: that paste that you sent backgrounds the process, can you actually Ctrl-C so we can see if there is any traceback?
[1:14] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[1:14] <xarses> http://paste.openstack.org/show/47192/
[1:15] <alfredo|afk> pushy!
[1:16] <alfredo|afk> that looks like something in the build is not building what I have in master
[1:16] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[1:16] * alfredo|afk is now known as alfredodeza
[1:16] <alfredodeza> you see, (so we are on the same page) we are no longer using pushy for *that* call
[1:16] <alfredodeza> we are using something else
[1:16] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[1:17] <alfredodeza> if it is hanging when it is starting the monitors and it is pushy for you guys, then I am *sure* there is a disconnect between what you guys have and what the most recent release is supposed to be using
[1:18] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[1:18] <alfredodeza> this is the new remote connection thing we are using to start the MON, that has nothing to do with pushy: https://github.com/ceph/ceph-deploy/blob/master/ceph_deploy/hosts/centos/mon/create.py#L14
[1:18] <alfredodeza> your symptoms describe what I saw before swapping that
[1:19] <alfredodeza> what worries me is that you see this from the latest release
[1:19] <alfredodeza> so I am going to grab that and see the contents
[1:19] <xarses> yes, did someone not update the build server?
[1:19] <xarses> because that's usually what I forget
[1:21] <alfredodeza> I am not sure
[1:21] <alfredodeza> centos 6 ?
[1:22] <xarses> yes
[1:22] <alfredodeza> in the meantime can you try and build from what is on master right now and run with that?
[1:22] <alfredodeza> you should not get a hang
[1:23] <alfredodeza> if you do, again, Ctrl-C and paste me the traceback
[1:23] <xarses> ya, angdraug is building it now
[1:24] <joelio> SUCCESS :) Finally got the right invocation for aws-sdk s3
[1:26] <alfredodeza> xarses: I just uncompressed what was built and I see it built correctly :/
[1:26] <alfredodeza> with the new connections
[1:26] * xarses scratches his head
[1:27] * sprachgenerator (~sprachgen@130.202.135.232) Quit (Quit: sprachgenerator)
[1:28] * gregaf1 (~Adium@2607:f298:a:607:a433:7d00:695b:9509) has joined #ceph
[1:28] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[1:28] <alfredodeza> let's make sure there is no disconnect from the build (rpm/deb) and run it directly
[1:29] <alfredodeza> clone the repo, and run bootstrap
[1:29] <alfredodeza> and then run directly the ceph-deploy script
[1:29] <alfredodeza> you should **not** have any hangs
[1:35] * alfredodeza is now known as alfredo|afk
[1:35] <alfredo|afk> xarses: do send me an email or ping me about your findins
[1:35] <alfredo|afk> *findings
[1:35] * alfredo|afk is going offline now
[1:35] <xarses> ok
[1:35] <xarses> thanks for the help
[1:37] * alram (~alram@38.122.20.226) Quit (Quit: leaving)
[1:46] * gregaf1 (~Adium@2607:f298:a:607:a433:7d00:695b:9509) Quit (Quit: Leaving.)
[1:52] * nwat (~nwat@eduroam-237-79.ucsc.edu) has joined #ceph
[1:58] * gregaf1 (~Adium@2607:f298:a:607:2026:a478:8d61:dbdf) has joined #ceph
[1:59] * sarob_ (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[2:06] * bandrus1 (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[2:06] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[2:07] * sarob_ (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Read error: Operation timed out)
[2:08] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[2:10] * LeaChim (~LeaChim@host86-135-252-168.range86-135.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[2:10] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[2:13] * alfredo|_ (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[2:16] * nwat (~nwat@eduroam-237-79.ucsc.edu) Quit (Read error: Operation timed out)
[2:17] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[2:17] * alfredo|afk (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[2:18] <xarses> alfredodeza, alfredo|afk, found the issue
[2:18] <alfredo|_> xarses: what was it
[2:18] * alfredo|_ is now known as alfredodeza
[2:19] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[2:19] <alfredodeza> is it something I need to fix?
[2:19] <xarses> https://github.com/ceph/ceph-deploy/blob/master/ceph_deploy/mon.py#L109
[2:19] <xarses> the distro.mon.create runs fine
[2:20] <xarses> distro.sudo_conn.close() is whacked out, I can assert before it and get the assert, and after it, it just hangs
[2:20] <alfredodeza> xarses: but your output is past that
[2:21] <alfredodeza> xarses: this log line: "[node-4][INFO ] Running command: /sbin/service ceph start mon.node-4"
[2:21] <alfredodeza> is past that distro.sudo_conn.close()
[2:21] <xarses> http://paste.openstack.org/show/47201/
[2:23] <alfredodeza> hrmn
[2:24] <alfredodeza> the problem here is that *all* the things that hang do so when we attempt to close the connection with that lib
[2:24] <alfredodeza> :/
[2:25] <alfredodeza> hrmnnn
[2:25] <alfredodeza> I know who is responsible
[2:25] <alfredodeza> not sure how to go around it
[2:25] * danieagle (~Daniel@177.133.172.16) has joined #ceph
[2:25] <joelio> so, getting closer with this s3.. I'm hitting a 2Gb limit now with multipart, rather than the 1Gb..
[2:25] <xarses> does pushy need to run the lsb detection?
[2:25] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[2:26] * nwat (~nwat@eduroam-237-79.ucsc.edu) has joined #ceph
[2:27] <alfredodeza> I just created issue 6335
[2:27] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[2:27] * gregaf (~Adium@38.122.20.226) Quit (Quit: Leaving.)
[2:27] <alfredodeza> issue 6335
[2:27] <kraken> alfredodeza might be talking about: http://tracker.ceph.com/issues/6335 [ceph-deploy may *still* hang with pushy]
[2:27] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[2:28] <alfredodeza> xarses: not anymore
[2:28] <alfredodeza> actually we don't need to use it at all, it is just that the transition as a whole is going to take a bit
[2:29] <alfredodeza> xarses: can you try something for me?
[2:29] * Guest6922 (~a@209.12.169.218) Quit (Quit: This computer has gone to sleep)
[2:30] <xarses> sure
[2:30] <alfredodeza> in ceph_deploy/hosts/centos/mon/create.py before the #TODO
[2:30] <alfredodeza> add this: distro.sudo_conn.close()
[2:30] * diegows (~diegows@190.190.11.42) Quit (Ping timeout: 480 seconds)
[2:31] <alfredodeza> and remove the `distro.sudo_conn.close()` in ceph_deploy/mon.py
[2:31] <alfredodeza> let me know how that runs
[2:31] <alfredodeza> because I cannot replicate I am in the dark here :(
[2:31] <alfredodeza> the only thing I can think of is of removing pushy as a whole
[2:31] <alfredodeza> but that is going to take quite the effort
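The workaround being tested here boils down to closing the pushy connection inside the CentOS-specific mon create module, instead of afterwards in ceph_deploy/mon.py where the close was observed to hang. The sketch below is a simplified illustration of that idea, not the real ceph-deploy source; the create() signature and elided steps are assumptions.

```python
# ceph_deploy/hosts/centos/mon/create.py -- simplified sketch, not the real file


def create(distro, logger, args, monitor_keyring):
    # ... write the monitor keyring, run ceph-mon --mkfs, and start the
    # service via the new remote-connection library (elided) ...

    # Workaround from this conversation: close the leftover pushy
    # connection here, where close() still returns, rather than later in
    # ceph_deploy/mon.py where it was seen to hang indefinitely.
    distro.sudo_conn.close()


# And in ceph_deploy/mon.py the corresponding call is removed, e.g.:
#
#     distro.mon.create(distro, logger, args, monitor_keyring)
#     # distro.sudo_conn.close()   <- removed; now done in the host module
```

The longer-term option alfredodeza mentions is dropping pushy altogether, which is what issue 6335 tracks.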
[2:32] <xarses> that appears to work
[2:32] <xarses> :)
[2:32] <alfredodeza> no shit
[2:32] <alfredodeza> really?
[2:33] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[2:33] * alfredodeza yells out the window
[2:33] <xarses> http://paste.openstack.org/show/47202/
[2:34] <alfredodeza> why is it saying: [node-4][ERROR ] unrecognized command
[2:34] <alfredodeza> how is this not valid: [node-4][INFO ] Running command: ceph daemon mon.node-4 mon_status
[2:35] <xarses> that appears to be more or less normal
[2:35] <alfredodeza> that should actually tell you the output of the monitor
[2:35] <alfredodeza> the actual status
[2:35] <xarses> that never worked
[2:35] <alfredodeza> which is critical to this release
[2:36] <alfredodeza> nooooooo
[2:36] <alfredodeza> even on the remote host?
[2:36] <alfredodeza> that doesn't even work there?
[2:36] <alfredodeza> `sudo ceph daemon mon.node-4 mon_status`
[2:36] <xarses> [root@node-4 ~]# ceph daemon mon.node-4 mon_status
[2:36] <xarses> unrecognized command
[2:36] <xarses> cuttlefish
[2:37] <alfredodeza> ahhhh
[2:37] <alfredodeza> cuttlefish
[2:37] <alfredodeza> maybe that was added in dumpling
[2:37] <alfredodeza> ok
[2:37] <xarses> that was always my guess
[2:37] <alfredodeza> xarses: does it make more sense to overwrite 1.2.4 with this fix? or do you think it is better to just release 1.2.5 ?
[2:37] <xarses> ceph -s looks reasonable
[2:38] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[2:38] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:38] <xarses> I'll use the hacked version for now
[2:38] <xarses> but that might be quite the blocker for el6/centos6 users
[2:40] <alfredodeza> maybe 1.2.5 makes sense for this so we play nice with upgrades
[2:40] <xarses> also I wonder if we should shortcut the other distros too so that they are less likely to hit this before more of the pushy code is ripped out
[2:40] <xarses> you can change it to 1.2.4.1 unless you don't want to have a 4-digit version
[2:40] <xarses> the rpm builds 1.2.4.0 as it is
[2:42] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[2:43] * yy-nm (~Thunderbi@122.233.44.183) has joined #ceph
[2:43] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[2:44] * angdraug (~angdraug@204.11.231.50.static.etheric.net) Quit (Quit: Leaving)
[2:44] * rturk is now known as rturk-away
[2:44] <alfredodeza> xarses: pull request opened https://github.com/ceph/ceph-deploy/pull/83
[2:45] <alfredodeza> no, 4 digits, no way :(
[2:45] <alfredodeza> 1.2.5 it is
[2:45] <xarses> ok
[2:45] <alfredodeza> glowell: ping
[2:47] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:49] <alfredodeza> norris glowell
[2:49] <kraken> When glowell does a commit, he doesn't push to the remote repository, he makes all clones pull from him.
[2:50] <joelio> is there any other implementation of radosgw I can test? I've just tried the fastcgi with multipart large files and I'm getting strange effects. I can (almost) commit a full 4.7GB file but subsequent runs fail a lot earlier, feels like something is broken in fastcgi
[2:51] <joelio> a restart of apache brings back the larger commit capability
[2:51] <xarses> alfredodeza, tested with centos 6 +1
[2:51] <alfredodeza> thank you
[2:51] * nwat (~nwat@eduroam-237-79.ucsc.edu) Quit (Ping timeout: 480 seconds)
[2:51] <alfredodeza> sorry for all the headaches
[2:51] <alfredodeza> we are trying really hard to get this better
[2:51] <alfredodeza> :(
[2:51] <xarses> i know the feeling
[2:52] <xarses> thanks for the help, ttyl
[2:53] <nhm> joelio: that does sound strange. :/
[2:53] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[2:54] * nwat (~nwat@eduroam-237-79.ucsc.edu) has joined #ceph
[2:54] <joelio> nhm: yea, I've been battling this all day.
[2:55] <nhm> joelio: I'm not the right person to talk to about radosgw sadly. I'm actually doing some profiling work on it now, but I don't know anything about large file support.
[2:57] <joelio> yea, I'm foraging a path on this myself
[2:57] <joelio> slowly :)
[2:57] <yehuda_hm> joelio: what fastcgi module are you using?
[2:57] <nhm> joelio: there was a bit of discussion recently on the mailing list about it I think.
[2:58] <joelio> it'll be a good way to provide a more agnostic way for us to store video.. but I can't see how we'd get 4K stuff in there atm :)
[2:58] <joelio> yehuda_hm: I'm using the debian packaged one
[2:58] <joelio> (from gitbuilder.ceph..)
[2:58] <yehuda_hm> is the mod_fastcgi or mod_fcgi?
[2:58] <nhm> joelio: oh, are you one of the people having performance issues with 4k writes?
[2:58] <joelio> nhm: no, UltraHD
[2:59] <nhm> oh, lol
[2:59] <yehuda_hm> joelio: were you the one sent that message on the mailing list earlier today?
[3:00] <joelio> ii libapache2-mod-fastcgi 2.4.7~0910052141-1-inktank2 amd64 Apache 2 FastCGI module for long-running CGI scripts
[3:00] <joelio> ii libfcgi0ldbl 2.4.0-8.1ubuntu3 amd64 Shared library of FastCGI
[3:00] <joelio> yehuda_hm: np
[3:00] <joelio> no, sorry
[3:00] <joelio> (it's late here!)
[3:00] <nhm> joelio: for video it may make sense to break up the videos into chunks and stick them directly into rados objects via librados. Then you can grab the first chunks before the last ones have even finished uploading.
[3:00] <joelio> nhm: I appreciate that - problem is it becomes very Ceph specific :)
[3:00] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:01] <joelio> I was checking to see performance on s3 first
[3:01] * xarses (~andreww@204.11.231.50.static.etheric.net) Quit (Ping timeout: 480 seconds)
[3:01] <joelio> I have a feeling it's not going to fly too well, but must test at least
[3:01] <yehuda_hm> joelio: do you have issues with uploading smaller files?
[3:01] <nhm> joelio: true, but then you can also use some of the neat data processing stuff on the OSDs. :D
[3:02] <yehuda_hm> oh, btw, what version are you using?
[3:02] <joelio> yehuda_hm: no, that works fine
[3:02] <joelio> I'm using latest dev in deb sources
[3:03] <joelio> ceph version 0.68 (b4cf0f2574be701d9efeb88c45ffd3c2004dce2c)
[3:03] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:03] <yehuda_hm> joelio: can you set the following configurable on your osd:
[3:04] <yehuda_hm> osd max attr size = 1048576
[3:04] <yehuda_hm> see if that changes anything?
[3:04] <yehuda_hm> maybe even start with a bigger number
[3:04] <joelio> I already have osd_max_attr_size = 655360
[3:04] <joelio> I'll try bigger
[3:05] <nhm> yehuda_hm: aha, you are on burnupi39 preventing my unmounting of sde2. ;)
[3:06] <yehuda_hm> try now
[3:06] <nhm> yehuda_hm: one more, pid 7050
[3:06] * nwat (~nwat@eduroam-237-79.ucsc.edu) Quit (Ping timeout: 480 seconds)
[3:07] <nhm> in /srv/osd-device-3-data/current/11.f9e_head/DIR_E/DIR_9/DIR_F/DIR_0
[3:09] <nhm> got tired of waiting and kill -9'd it. ;)
[3:11] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[3:14] <yehuda_hm> nhm, sorry
[3:14] <yehuda_hm> was afk
[3:14] <nhm> no worries
[3:15] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[3:15] <yehuda_hm> hmm.. it might have been some other shell that just hung there when my machine got offline
[3:15] <joelio> yehuda_hm: right, well the multipart completes but there are still failures https://gist.github.com/joelio/ad500b8ed5fef2cb98b8
[3:17] <yehuda_hm> joelio: that's the problem (the big attrs).
[3:17] <yehuda_hm> maybe add another zero there
[3:17] <yehuda_hm> I'd need debug ms = 1 in order to see the actual request size
[3:17] * danieagle (~Daniel@177.133.172.16) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[3:18] * sjm (~sjm@dhcp-108-168-18-236.cable.user.start.ca) has joined #ceph
[3:19] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[3:19] * Vjarjadian (~IceChat77@176.254.6.245) Quit (Ping timeout: 480 seconds)
[3:23] <joelio> hmm, no, same thing
[3:23] <yehuda_hm> joelio: did you restart the osds?
[3:23] <joelio> ah!
[3:24] <joelio> it's 2:30am, give me slack :)
[3:24] <yehuda_hm> joelio: you can injectargs
[3:24] <yehuda_hm> at least try doing it without osd restart
[3:24] <joelio> I generally use config push
[3:24] <joelio> but that doesn't restart osds I guess
[3:25] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[3:26] <yehuda_hm> ceph tell osd.\* injectargs '--osd_max_attr_size 10000000'
[3:26] <yehuda_hm> try that
[3:27] <joelio> nope, still no joy
[3:28] <yehuda_hm> can you set 'debug ms = 1' on your gateway, restart it, then retry
[3:28] <yehuda_hm> I want to see the log for the failing operation
[3:29] * b1tbkt_ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[3:29] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[3:29] * b1tbkt__ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Read error: Connection reset by peer)
[3:30] <nhm> hrm talk like a pirate day just so happens to overlap with an important meeting. I wonder if I should wear an eye patch.
[3:30] <yehuda_hm> nhm: that's talk like a pirate day, not dress up like a pirate day
[3:30] <yehuda_hm> you got that mixed up
[3:31] * peetaur (~peter@CPEbc1401e60493-CMbc1401e60490.cpe.net.cable.rogers.com) has joined #ceph
[3:31] <nhm> yehuda_hm: I fail to see your reasoning.
[3:31] <yehuda_hm> arrr
[3:31] <joelio> yehuda_hm: yea, I'll do it tomorrow, need sleep - for posterity this is my hacky bit of ruby I'm using to test https://gist.github.com/joelio/37f02f0f76bd4f3ba234
[3:32] <nhm> I need a hat.
[3:32] <joelio> I'm using split on the shell to generate the parts (simple, does the job)
[3:33] <nhm> yehuda_hm: btw, I should know soon if setting the filestore merging and splitting parameters moves the graph.
[3:33] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[3:34] <yehuda_hm> how much is it now?
[3:35] <nhm> at 428K objects, ~612puts/s
[3:35] <yehuda_hm> when are we expecting the drop now?
[3:36] <nhm> I think at ~650K
[3:36] <nhm> Seeing some errors now
[3:37] <nhm> well, "failures"
[3:37] <yehuda_hm> it started splitting, btw
[3:37] <nhm> guess it did it a bit sooner than I expected.
[3:38] <nhm> seems that's what's causing these failure messages.
[3:38] <nhm> stuff getting backed up I guess?
[3:38] <nhm> we are down to 565 puts/s
[3:38] <yehuda_hm> where did we start?
[3:38] <nhm> around 650
[3:39] <yehuda_hm> yeah, so it's a big drop already
[3:39] <nhm> 219 failures so far
[3:39] <yehuda_hm> currently < 1% failures
[3:39] <yehuda_hm> I mean < 1% split
[3:40] <yehuda_hm> so there's still way ahead of us
[3:40] <yehuda_hm> what kind of errors are we getting?
[3:40] <nhm> wow
[3:40] <nhm> don't know, not recording stack traces.
[3:40] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[3:40] <nhm> we can record those in subsequent runs.
[3:41] <yehuda_hm> hmm.. strange, I don't see any interesting errors in the apache logs
[3:41] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[3:41] <nhm> down to 521 PUTS/s
[3:41] <yehuda_hm> oh, I know why
[3:41] <yehuda_hm> grep -v 201 wouldn't work
[3:42] <nhm> hrm?
[3:42] <yehuda_hm> trying to filter out the successful requests
[3:42] <yehuda_hm> 201 = success for PUT
[3:42] <nhm> ah
[3:42] <yehuda_hm> however 201 is in every line anyway (2013)
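The filtering problem here is that the year in every access-log timestamp contains "201", so `grep -v 201` drops every line. Looking only at the status-code field avoids that; a small sketch, assuming Apache's common/combined log format where the status code is the ninth whitespace-separated field:

```python
import sys

# Print only requests whose HTTP status is not 201 (Created), reading an
# Apache access log on stdin. Splitting on whitespace puts the status code
# at index 8, so the "2013" in the timestamp no longer matches.
for line in sys.stdin:
    fields = line.split()
    if len(fields) > 8 and fields[8] != '201':
        sys.stdout.write(line)
```

The same idea works with any tool that can filter on the ninth field instead of the whole line.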
[3:43] <nhm> we are already down to around 510 puts/s
[3:43] <nhm> and dropping fast.
[3:43] <yehuda_hm> well, sounds like we have a winner
[3:44] * yasu` (~yasu`@99.23.160.231) Quit (Remote host closed the connection)
[3:44] <nhm> I suppose we don't see this in RBD due to the big block sizes.
[3:45] <yehuda_hm> less objects obviously
[3:45] <nhm> well, that's annoying.
[3:46] <nhm> I wonder if we really gain anything with such aggressive splitting.
[3:48] <yehuda_hm> not sure, it'd be nice discussing it with sjust
[3:48] <yehuda_hm> I think I don't see the error on the apache logs because it doesn't really fail there.. might be a client timing out
[3:49] <nhm> could be. All of them seemed to happen around when splitting started, and now it's slow but no longer erroring out.
[3:50] <nhm> These numbers are the running average, so the actual throughput is lower right now.
[3:50] <nhm> But maybe is steady.
[3:51] <yehuda_hm> ok, I think that at this point I'd like to see the effect of the other gateway params on the GET
[3:52] <yehuda_hm> e.g., num of thread, objecter related configurables
[3:52] * glzhao (~glzhao@117.79.232.211) has joined #ceph
[3:52] <yehuda_hm> don't remember all the params we discussed
[3:52] <nhm> yehuda_hm: probably will have to wait until friday. I have to travel tomorrow and get some slides together.
[3:52] <yehuda_hm> np
[3:53] * sjm (~sjm@dhcp-108-168-18-236.cable.user.start.ca) Quit (Ping timeout: 480 seconds)
[3:53] <nhm> I can get another run going with higher split parameters though, and let it run through all of the puts and gets with a high thread count at least.
[3:53] <nhm> (and debugging off)
[3:53] <yehuda_hm> there are the objecter tunables also
[3:54] <nhm> I can throw them in if you think they'd make a difference.
[3:55] <yehuda_hm> the max ops might make some difference for GETs
[3:55] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[3:56] <nhm> I should really just bake all of this stuff into my test program so we can do parametric sweeps with it.
[3:57] <nhm> yehuda_hm: btw, I have run rados bench tests with higher objecter inflight ops values, but it hasn't really had much of an effect in that case so far.
[3:57] * sarob (~sarob@2601:9:7080:13a:ac66:e503:5d45:7cf7) has joined #ceph
[3:57] <nhm> yehuda_hm: for iops, so far osd op threads and debugging have had the biggest impact.
[3:58] * diegows (~diegows@190.190.11.42) has joined #ceph
[3:58] <nhm> yehuda_hm: now this is interesting, after about 830K objects, performance has started increasing again.
[3:59] <nhm> we dipped down to around 440ops/s, now back up to 456
[4:02] <yehuda_hm> hmm.. interesting
[4:02] <nhm> still climbing, at ~464
[4:02] * sjm (~sjm@dhcp-108-168-18-236.cable.user.start.ca) has joined #ceph
[4:03] <yehuda_hm> did someone press the turbo button?
[4:04] <yehuda_hm> hmm.. most of the pgs are not split there, btw
[4:04] <nhm> oh strange
[4:05] <yehuda_hm> oh, actually they are
[4:05] <yehuda_hm> my check was wrong
[4:06] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[4:07] <yehuda_hm> well, maybe because there are relatively smaller number of entries the filesystem caches help
[4:07] <yehuda_hm> so the splitting slowed things down, but once that's done, caches started warming up
[4:07] <nhm> yeah, could be
[4:07] <nhm> We are up to 475 now
[4:09] * sjustlaptop (~sam@172.56.21.10) has joined #ceph
[4:09] <yehuda_hm> in the osd I'm looking at there are a total of ~50k entries for the data (directories and files), which is not a lot
[4:11] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[4:15] * sarob (~sarob@2601:9:7080:13a:ac66:e503:5d45:7cf7) Quit (Remote host closed the connection)
[4:16] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[4:16] <nhm> yehuda_hm: looks like we peaked and are now slowly heading down a bit.
[4:17] <nhm> ooh, more failures.
[4:17] * mikedawson (~chatzilla@206.246.156.8) has joined #ceph
[4:19] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[4:21] <nhm> would ceph pg dump show me pgs that are splitting?
[4:21] <yehuda_hm> they're not really splitting ...
[4:22] <yehuda_hm> it's just the backing filestore
[4:22] <nhm> yes, not that splitting
[4:22] <nhm> but that's a certain osd or filestore state or something right?
[4:22] <nhm> we should have some way to see it.
[4:22] <yehuda_hm> didn't find any
[4:22] <yehuda_hm> other than looking at the actual filesystem
[4:24] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[4:24] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[4:34] * diegows (~diegows@190.190.11.42) Quit (Ping timeout: 480 seconds)
[4:38] * dpippenger (~riven@tenant.pas.idealab.com) Quit (Quit: Leaving.)
[4:41] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[5:04] * mikedawson (~chatzilla@206.246.156.8) Quit (Ping timeout: 480 seconds)
[5:04] * sjustlaptop (~sam@172.56.21.10) Quit (Quit: Leaving.)
[5:05] * fireD (~fireD@93-139-176-126.adsl.net.t-com.hr) has joined #ceph
[5:07] * a (~a@pool-173-55-143-200.lsanca.fios.verizon.net) has joined #ceph
[5:07] * fireD_ (~fireD@93-142-250-43.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:07] * a is now known as Guest6971
[5:11] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[5:17] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:18] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[5:20] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[5:20] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[5:22] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit ()
[5:26] * sarob (~sarob@2601:9:7080:13a:5513:754:f4f4:8f85) has joined #ceph
[5:27] * evil_ste1e is now known as evil_steve
[5:29] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[5:35] * sarob (~sarob@2601:9:7080:13a:5513:754:f4f4:8f85) Quit (Ping timeout: 480 seconds)
[5:36] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) has joined #ceph
[5:41] * gregaf1 (~Adium@2607:f298:a:607:2026:a478:8d61:dbdf) Quit (Quit: Leaving.)
[5:43] * Cube (~Cube@netblock-72-25-110-171.dslextreme.com) has joined #ceph
[5:44] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[5:50] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:55] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Remote host closed the connection)
[5:59] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[5:59] * yy-nm (~Thunderbi@122.233.44.183) Quit (Quit: yy-nm)
[6:12] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[6:20] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[6:32] * sjm (~sjm@dhcp-108-168-18-236.cable.user.start.ca) Quit (Ping timeout: 480 seconds)
[6:40] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[6:42] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[6:43] * Cube (~Cube@netblock-72-25-110-171.dslextreme.com) Quit (Quit: Leaving.)
[6:44] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:45] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[6:50] * gucki (~smuxi@HSI-KBW-109-192-187-143.hsi6.kabel-badenwuerttemberg.de) has joined #ceph
[6:50] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[7:02] * sagelap (~sage@2600:100d:b120:a27e:2962:2b58:3e5d:ae2f) has joined #ceph
[7:04] * test (~oftc-webi@c-76-103-249-37.hsd1.ca.comcast.net) has joined #ceph
[7:04] * test (~oftc-webi@c-76-103-249-37.hsd1.ca.comcast.net) Quit ()
[7:06] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[7:06] * Andes (~oftc-webi@183.62.249.162) has joined #ceph
[7:06] * AfC (~andrew@2407:7800:200:1011:6e88:14ff:fe33:2a9c) has joined #ceph
[7:15] * Cube (~Cube@netblock-72-25-110-171.dslextreme.com) has joined #ceph
[7:18] * sagelap (~sage@2600:100d:b120:a27e:2962:2b58:3e5d:ae2f) Quit (Ping timeout: 480 seconds)
[7:24] * Guest6971 (~a@pool-173-55-143-200.lsanca.fios.verizon.net) Quit (Quit: This computer has gone to sleep)
[7:26] * sagelap (~sage@2600:100d:b120:a27e:6c20:a595:3297:eb24) has joined #ceph
[7:32] * houkouonchi-home (~linux@pool-71-165-8-99.lsanca.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:33] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[7:34] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[7:39] * sagelap (~sage@2600:100d:b120:a27e:6c20:a595:3297:eb24) Quit (Ping timeout: 480 seconds)
[7:43] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[7:43] * scuttlemonkey (~scuttlemo@38.127.1.5) has joined #ceph
[7:43] * ChanServ sets mode +o scuttlemonkey
[7:46] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[7:49] * sagelap (~sage@2600:100d:b120:a27e:747d:18a0:8ff7:a834) has joined #ceph
[7:51] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[7:53] * yy-nm (~Thunderbi@122.233.44.183) has joined #ceph
[7:56] * AfC (~andrew@2407:7800:200:1011:6e88:14ff:fe33:2a9c) Quit (Ping timeout: 480 seconds)
[7:56] * haomaiwa_ (~haomaiwan@211.155.113.239) has joined #ceph
[8:03] * haomaiwang (~haomaiwan@211.155.113.239) Quit (Ping timeout: 480 seconds)
[8:04] <Andes> hi, I'm trying to use chef to install ceph, but I hit the following issue. It seems to keep looping, waiting for the key.
[8:04] <Andes> * ruby_block[get osd-bootstrap keyring] action run
[8:05] <Andes> waiting all the time
[8:05] * scuttlemonkey (~scuttlemo@38.127.1.5) Quit (Read error: Operation timed out)
[8:07] * sagelap (~sage@2600:100d:b120:a27e:747d:18a0:8ff7:a834) Quit (Ping timeout: 480 seconds)
[8:10] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[8:13] * tziOm (~bjornar@194.19.106.242) has joined #ceph
[8:15] * sleinen (~Adium@2001:620:0:2d:e4ab:c515:639f:c834) has joined #ceph
[8:15] * Andes slaps sagewk around a bit with a large fishbot
[8:16] * sagelap (~sage@2600:100d:b120:a27e:c903:b449:9ef7:1071) has joined #ceph
[8:19] * rendar (~s@host105-180-dynamic.1-87-r.retail.telecomitalia.it) has joined #ceph
[8:24] * sleinen (~Adium@2001:620:0:2d:e4ab:c515:639f:c834) Quit (Ping timeout: 480 seconds)
[8:27] * sleinen (~Adium@2001:620:0:26:f0e8:a79c:27b8:60f) has joined #ceph
[8:34] * Cube (~Cube@netblock-72-25-110-171.dslextreme.com) Quit (Quit: Leaving.)
[8:38] * sagelap (~sage@2600:100d:b120:a27e:c903:b449:9ef7:1071) Quit (Ping timeout: 480 seconds)
[8:40] <Andes> and the mon.log keep showing '2013-09-17 23:39:39.875885 7fddc1b68700 1 mon.ubuntu@0(leader).auth v1 client did not provide supported auth type'
[8:40] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[8:43] * ScOut3R (~scout3r@4E5C2305.dsl.pool.telekom.hu) has joined #ceph
[8:46] * julian (~julianwa@125.70.135.154) has joined #ceph
[8:47] * sagelap (~sage@2600:100d:b120:a27e:8c1d:14c2:b73f:78f1) has joined #ceph
[8:50] <xarses> Andes what does ps axu | grep ceph show?
[8:53] <Andes> mon is already running, i thought
[8:53] <Andes> root 961 0.0 0.9 133444 9136 ? Ssl 23:20 0:00 /usr/bin/ceph-mon --cluster=ceph -i ubuntu -f
[8:54] <Andes> but I see there is no key inside /var/lib/ceph/osd or bootstrap-osd or mon
[8:54] * ScOut3R (~scout3r@4E5C2305.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[8:55] <Andes> chef-client keeps getting stuck, showing ' ruby_block[get osd-bootstrap keyring] action run'
[8:56] <xarses> it looks like ceph-create-keys -i ubuntu didn't run
[8:56] <Andes> always need to do by hand?
[8:56] <xarses> not normally
[8:56] <Andes> I follow the guide
[8:57] <Andes> the ceph-create-key is inside mon.rb??
[8:57] <xarses> you can try running it by hand, it might give us a clue to why it doesn't start
[8:57] <xarses> you can also service ceph -a stop
[8:57] <xarses> and service ceph -a start
[8:57] <xarses> and it should spawn create keys if necessary
[8:58] <Andes> a disk is prepared for osd, but not yet mounted. will that matter?
[8:58] * sagelap (~sage@2600:100d:b120:a27e:8c1d:14c2:b73f:78f1) Quit (Ping timeout: 480 seconds)
[8:59] <xarses> no, the osd commands other than prepare can't run unless there is a monitor quorum and an osd-bootstrap keyring
[8:59] <Andes> ok
[8:59] <Andes> but something weird
[8:59] <Andes> root@ubuntu:~# service ceph -a stop root@ubuntu:~# ps aux|grep ceph root 961 0.0 0.9 133444 9552 ? Ssl 23:20 0:00 /usr/bin/ceph-mon --cluster=ceph -i ubuntu -f root 16226 0.0 0.0 8108 924 pts/1 S+ 23:59 0:00 grep --color=auto ceph
[9:00] <Andes> cluster can't stop
[9:00] <xarses> ya, seems like similar to someone this morning
[9:00] <Andes> maybe I should kill this
[9:00] <xarses> its ok to kill it if you want
[9:01] <Andes> when I kill it, the ceph-create-keys process pops up
[9:02] <Andes> root@ubuntu:~# kill -9 16235
[9:02] <xarses> hehe
[9:02] <Andes> root@ubuntu:~# ps aux|grep ceph
[9:02] <xarses> ya, similar
[9:02] <Andes> root 16346 1.5 0.7 129276 7160 ? Ssl 00:01 0:00 /usr/bin/ceph-mon --cluster=ceph -i ubuntu -f
[9:02] <xarses> admin socket didn't create
[9:02] <Andes> root 16347 2.0 0.7 43732 7440 ? Ss 00:01 0:00 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i ubuntu
[9:02] <xarses> probably
[9:02] <Andes> sorry?
[9:03] <Andes> how do I identify if the socket was not created?
[9:03] <xarses> similar behavior to someone this morning
[9:03] <Andes> oh, I see the irc log, but no solution
[9:04] <xarses> likely caused by `hostname -s` not resolving to an interface on the boc
[9:04] <xarses> box
[9:04] <xarses> not sure why that would affect chef though
[9:05] <Andes> I use a fqdn. like ubuntu.wyl.com
[9:05] <xarses> ah
[9:05] <Andes> you mean that matters?
[9:05] <xarses> it could
[9:05] <xarses> depending on how the chef scripts parse it
[9:06] <Andes> now I should change all the fqdn to common hostname?
[9:06] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[9:06] * nigwil (~chatzilla@2001:44b8:5144:7b00:281b:37d2:55ac:e71) has joined #ceph
[9:07] <Andes> I haven't really studied FQDNs much, but I need to use an s3 client, set up my dns server, and things like this
[9:08] * Meths_ (~meths@2.25.213.255) has joined #ceph
[9:08] * sleinen (~Adium@2001:620:0:26:f0e8:a79c:27b8:60f) Quit (Quit: Leaving.)
[9:08] * sagelap (~sage@2600:100d:b120:a27e:805b:438f:404c:60a) has joined #ceph
[9:08] * sleinen (~Adium@2001:620:0:2d:409c:dae5:cdac:1023) has joined #ceph
[9:11] * JustEra (~JustEra@89.234.148.11) has joined #ceph
[9:12] <xarses> Andes, sorry I don't know the chef scripts well enough; I do know that the monitors won't start correctly with an fqdn. Radosgw (for s3) support is a different story
[9:13] * Meths (~meths@2.25.191.175) Quit (Ping timeout: 480 seconds)
[9:13] <Andes> yep. thanks. I'll first change to a plain hostname. we will see
[9:15] <xarses> good luck, it's bed time for me.
[9:15] <Andes> well. good night :)
[9:15] <Andes> I'm having sunshine
[9:16] * sleinen (~Adium@2001:620:0:2d:409c:dae5:cdac:1023) Quit (Ping timeout: 480 seconds)
[9:21] * mtanski (~mtanski@193.106.79.137) has joined #ceph
[9:21] * mtanski (~mtanski@193.106.79.137) Quit ()
[9:25] * sleinen (~Adium@2001:620:0:26:d42:5d22:f97b:732) has joined #ceph
[9:25] * Andes (~oftc-webi@183.62.249.162) Quit (Remote host closed the connection)
[9:26] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:29] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:37] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[9:37] * ChanServ sets mode +v andreask
[9:38] * LeaChim (~LeaChim@host86-135-252-168.range86-135.btcentralplus.com) has joined #ceph
[9:40] * haomaiwang (~haomaiwan@117.79.232.211) has joined #ceph
[9:40] * haomaiwa_ (~haomaiwan@211.155.113.239) Quit (Read error: Connection reset by peer)
[9:41] * yy-nm (~Thunderbi@122.233.44.183) Quit (Quit: yy-nm)
[9:43] * KindOne (~KindOne@h11.36.28.71.dynamic.ip.windstream.net) has joined #ceph
[9:47] * madkiss (~madkiss@2001:6f8:12c3:f00f:6041:5086:4359:10ab) has joined #ceph
[9:49] * foosinn (~stefan@office.unitedcolo.de) has joined #ceph
[10:03] <ccourtaut> morning!
[10:03] * fretb (~fretb@frederik.pw) Quit (Quit: leaving)
[10:06] * fretb (~fretb@frederik.pw) has joined #ceph
[10:06] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) Quit (Quit: Leaving.)
[10:09] * allsystemsarego (~allsystem@188.25.131.49) has joined #ceph
[10:15] * fretb (~fretb@frederik.pw) Quit (Quit: leaving)
[10:18] * sleinen (~Adium@2001:620:0:26:d42:5d22:f97b:732) Quit (Quit: Leaving.)
[10:18] * sleinen (~Adium@130.59.94.146) has joined #ceph
[10:23] * sleinen1 (~Adium@2001:620:0:26:5cce:a212:73f3:fbc5) has joined #ceph
[10:24] * fretb (~fretb@frederik.pw) has joined #ceph
[10:25] * fretb (~fretb@frederik.pw) Quit ()
[10:28] * sleinen (~Adium@130.59.94.146) Quit (Ping timeout: 480 seconds)
[10:33] <joelio> ccourtaut: bon jour!
[10:33] <ccourtaut> joelio: :)
[10:36] * X3NQ (~X3NQ@195.191.107.205) has joined #ceph
[10:40] <andreask> hi joelio ... you got your big-file upload problem solved?
[10:41] <joelio> andreask: I've got much further!
[10:41] <joelio> not quite there yet, but I've found out how to make aws-sdk play with Ceph
[10:41] <joelio> and got the multipart working
[10:42] * X3NQ (~X3NQ@195.191.107.205) has left #ceph
[10:42] <andreask> what was the key?
[10:42] <joelio> I can send all chunks now, it definitely uploads (can see heavy activity) but I'm getting this back.. AWS::S3::Errors::UnknownError
[10:43] <joelio> andreask: this is what I've got.. https://gist.github.com/joelio/37f02f0f76bd4f3ba234
[10:43] <joelio> (missing require 'aws-sdk')
[10:44] <joelio> I'm using the split binary, allows me to chunk files easier for testing
[10:45] <joelio> outputs the default x??? names, so just using those as the parts
[10:45] <joelio> got further than last night too.. using the mpm-event model is not wise!
[10:45] <joelio> with external fastcgi
[10:47] <joelio> pupet mpm-worker on and it's much more robust - I think I just need to get the incantation of the s3 object in aws-sdk right now, think the Ceph side (should) be sorted
[10:49] * jbd_ (~jbd_@2001:41d0:52:a00::77) has joined #ceph
[10:51] <andreask> joelio: cool, so the radosgw seems to work as expected?
[10:52] * fretb (~fretb@frederik.pw) has joined #ceph
[10:57] <joelio> yea, seems to be much happier now with mpm-worker. Getting the config hash right for the AWS object took longer than expected, so maybe it might be useful to someone?
[11:00] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) Quit (Remote host closed the connection)
[11:04] * nigwil (~chatzilla@2001:44b8:5144:7b00:281b:37d2:55ac:e71) Quit (Remote host closed the connection)
[11:04] * nigwil (~chatzilla@2001:44b8:5144:7b00:281b:37d2:55ac:e71) has joined #ceph
[11:07] * agh (~oftc-webi@gw-to-666.outscale.net) Quit (Quit: Page closed)
[11:18] * julian (~julianwa@125.70.135.154) Quit (Quit: afk)
[11:28] * ScOut3R (~scout3r@4E5C2305.dsl.pool.telekom.hu) has joined #ceph
[11:47] * mattt (~mattt@92.52.76.140) has joined #ceph
[11:57] * schlitzer|work (~schlitzer@109.75.189.45) has joined #ceph
[11:59] <joelio> heh, this is strange.. so the s3 aws-sdk gem is uploading all the parts using PUT. Once the last part is uploaded.. I get a few POSTs, then a DELETE.. the script hangs while the object is deleted and then returns with AWS::S3::Errors::UnknownError
[12:00] <joelio> I'm thinking the aws-sdk gem doesn't like the s3 implementation Ceph uses.. although I can't think why not
[12:00] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[12:01] <joelio> maybe it's trying to check the object was committed successfully, sees something it doesn't like or expect and falls back to DELETE as a safe cleanup
[12:01] <joelio> I think I'll hook my script up to AWS proper to A/B
[12:09] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:13] * thomnico (~thomnico@2a01:e35:8b41:120:4cd5:c71c:2e34:8f17) has joined #ceph
[12:16] * joelio succeeds at multipart into Ceph's S3 :)
[12:16] <joelio> finally!
[12:20] <joelio> Strangely enough, I've disabled continue support and it's working every time now (with the inktank provided apache/fastcgi mod)
[12:20] * elder_ (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[12:26] <joelio> some quick wget tests showing 200MB/s too, so not too shabby
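joelio's test script (the gist above) uses the Ruby aws-sdk gem; for reference, roughly the same multipart flow against radosgw can be sketched with Python's boto instead. The endpoint, credentials, bucket name, and chunk file names below are placeholders rather than values from the conversation, and the boto settings shown (path-style calling format, plain HTTP) are common choices for radosgw setups, not details confirmed in the chat.

```python
import glob

import boto
import boto.s3.connection

# Placeholder credentials/endpoint: replace with a real radosgw user and host.
conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='rgw.example.com',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),  # path-style
)

bucket = conn.create_bucket('video-test')

# Parts produced beforehand with e.g. `split -b 512M bigfile.mp4 x`,
# giving xaa, xab, ... in the current directory.
mp = bucket.initiate_multipart_upload('bigfile.mp4')
for num, part in enumerate(sorted(glob.glob('x??')), start=1):
    with open(part, 'rb') as fp:
        mp.upload_part_from_file(fp, part_num=num)
mp.complete_upload()
```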
[12:32] * glzhao (~glzhao@117.79.232.211) Quit (Quit: leaving)
[12:49] <jamespage> what's the easiest way to add capabilities to an existing cephx key?
[12:54] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Ping timeout: 480 seconds)
[13:02] <fretb> jamespage: ceph auth add client.NAME ARGS
[13:03] <fretb> args being like mon 'allow r'
[13:04] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[13:04] * ChanServ sets mode +v andreask
[13:05] <jamespage> fretb, thanks - I'll give that a spin
[13:07] * diegows (~diegows@190.190.11.42) has joined #ceph
[13:08] <jamespage> fretb, pointed me in the right direction 'ceph auth caps' did the right things
[13:16] * phoenix (~phoenix@vpn1.safedata.ru) has joined #ceph
[13:16] <phoenix> hi ppl
[13:16] <phoenix> i have little truble
[13:19] <phoenix> I use Debian. After I upgraded to version 0.67.3-1, I get this message from the cluster when I run ceph -s:
[13:19] <phoenix> unparseable JSON status
[13:19] <phoenix> how do I fix this problem and get the cluster back to a working state?
[13:19] <phoenix> how to fix this problem? and return the cluster to a working state?
[13:21] <andreask> you upgraded from what version?
[13:23] <phoenix> ceph version 0.61.7
[13:24] * yanzheng (~zhyan@134.134.137.71) has joined #ceph
[13:27] <phoenix> sorry problem was solved by myself.
[13:27] <andreask> forgot to upgrade ceph-common first?
[13:28] <phoenix> yes
[13:28] <phoenix> %)
[13:31] * thomnico (~thomnico@2a01:e35:8b41:120:4cd5:c71c:2e34:8f17) Quit (Ping timeout: 480 seconds)
[13:31] * thomnico (~thomnico@2a01:e35:8b41:120:7c6e:2871:7cf:8198) has joined #ceph
[13:32] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[13:32] * ChanServ sets mode +o elder
[13:39] * foosinn (~stefan@office.unitedcolo.de) Quit (Remote host closed the connection)
[13:39] * foosinn (~stefan@office.unitedcolo.de) has joined #ceph
[13:41] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[13:41] * roald (~roaldvanl@139-63-21-115.nodes.tno.nl) has joined #ceph
[13:41] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[13:43] <phoenix> hm i have next problem:)
[13:45] <phoenix> when I try mounting with: ceph-fuse -o nonempty -m 10.1.1.3:6789 /backup, I see: ceph-fuse[2623]: starting ceph client, and the directory is not mounted.
[13:46] <phoenix> what am I doing wrong? I apologize for boring you.
[13:46] <phoenix> firewall is off.
[13:53] <janos> phoenix: not boring. just most of the people that can help with that are not awake yet
[13:54] <andreask> phoenix: what does "ceph -s" look like?
[13:55] <phoenix> ceph -s
[13:55] <phoenix> cluster 2d2dde58-5053-4c1f-b501-e3a0acf1ddec
[13:55] <phoenix> health HEALTH_ERR 2 full osd(s)
[13:55] <phoenix> monmap e1: 1 mons at {a=10.1.6.142:6789/0}, election epoch 1, quorum 0 a
[13:55] <phoenix> osdmap e77: 2 osds: 2 up, 2 in full
[13:55] <phoenix> pgmap v36712: 576 pgs: 575 active+clean, 1 active+clean+scrubbing+deep; 1923 GB data, 3854 GB used, 201 GB / 4056 GB avail
[13:55] <phoenix> mdsmap e57: 1/1/1 up {0=a=up:active}
[13:55] <joelio> guess the answers in there :)
[13:55] <andreask> :-)
[13:55] <joelio> andreask: got it working!!!!
[13:55] <phoenix> this is a test stand
[13:55] <andreask> joelio: woot ;-)
[13:57] <phoenix> any ideas?
[13:57] <andreask> phoenix: health HEALTH_ERR 2 full osd(s)
[13:57] <joelio> andreask: yea, the errors were caused by using the event worker and it leaving cruft in the bucket.. created a new one after transitioning to worker model and it works great
[13:58] <joelio> phoenix: your OSDs are full I think. I'm not sure why that would stop you from mounting cephfs - perhaps because you need to write to it to mount?
[13:58] * jcfischer (~fischer@user-28-17.vpn.switch.ch) has joined #ceph
[13:59] <joelio> phoenix: can you add another OSD to balance things out perhaps and try to remount?
[14:00] <phoenix> health HEALTH_ERR 2 full osd(s) - that is, if I remove this error, will it mount fine automatically? I'll increase their size, the virtual disks on vmware
[14:01] <jcfischer> we have a weird phenomenon with our mons (I added 2 mons to our setup of 3 mons): I have stopped the 2 new mons, but the remaining (old) 3 are constantly going in and out of service with this message when they are not running:
[14:01] <jcfischer> 2013-09-18 14:01:04.236953 7fc0348d9700 1 mon.h5@2(electing) e14 discarding message auth(proto 0 27 bytes epoch 14) v1 and sending client elsewhere
[14:01] <jcfischer> after a while the mons find each other and start working again
[14:01] <jcfischer> any ideas?
[14:01] <joelio> left over processes, have you rebooted?
[14:03] <andreask> phoenix: yes, remove data or resize the disks
[14:07] <phoenix> thx
[14:08] * yanzheng (~zhyan@134.134.137.71) Quit (Remote host closed the connection)
[14:09] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[14:09] <mattt> anyone had much success changing a mon's ip address ?
[14:14] * markbby (~Adium@168.94.245.2) has joined #ceph
[14:17] <andreask> mattt: you tried following the manual?
[14:17] <mattt> andreask: yep, wasn't working for me, but eventually got it
[14:20] * AfC (~andrew@2001:44b8:31cb:d400:2ad2:44ff:fe08:a4c) has joined #ceph
[14:20] <mattt> andreask: can i ask a quick aside, is it common practice to add each individual osd to ceph.conf ?
[14:24] <joelio> not when using ceph-deploy
[14:25] <mattt> joelio: yeah, it's a bit confusing
[14:26] <joelio> and thinking about it, it makes more sense to me now.. if you're adding osds at scale, you wouldn't want that kind of config churn. Keeping the configurables of ceph away from potentially more volatile data is more sensible I think
[14:26] <joelio> but yea, confusing :)
[14:27] <joelio> it is probably better to let Ceph manage its own OSDMap state, rather than the config file, I guess
[14:28] * shang (~ShangWu@175.41.48.77) Quit (Remote host closed the connection)
[14:28] <mattt> joelio: that was my thought -- if you have a large installation then maintaining ceph.conf would be tricky
[14:28] <mattt> however i saw some reference documentation where the osds were outlined in ceph.conf, so wasn't sure
[14:29] <joelio> maybe when creating more specific crushmaps for tiered storage.. you can be more granular about tuning osds
[14:30] <joelio> for my use case, where everything is homogeneous, it's fine
[14:44] <andreask> yeah, writing every osd to the ceph.conf doesn't scale well
[14:44] <andreask> .. and is fortunately not needed any more
[14:47] * yanzheng (~zhyan@101.82.175.199) has joined #ceph
[14:49] * iii8 (~Miranda@91.207.132.71) has joined #ceph
[14:50] <mattt> thanks joelio / andreask
[14:50] <andreask> yw
[14:51] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:57] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[15:04] * phoenix (~phoenix@vpn1.safedata.ru) has left #ceph
[15:11] * sjm (~sjm@dhcp-108-168-18-236.cable.user.start.ca) has joined #ceph
[15:17] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[15:18] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[15:19] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[15:23] * yanzheng (~zhyan@101.82.175.199) Quit (Ping timeout: 480 seconds)
[15:29] * todin_ (tuxadero@kudu.in-berlin.de) Quit (Read error: Connection reset by peer)
[15:29] * yanzheng (~zhyan@101.82.175.199) has joined #ceph
[15:31] * b1tbkt_ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[15:31] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[15:31] * b1tbkt__ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[15:31] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[15:39] <claenjoy> Hi, I have "health HEALTH_WARN 5 pgs degraded; 5 pgs stale; 5 pgs stuck stale; 5 pgs stuck unclean"; mons ok, and osds all up
[15:39] <claenjoy> how can I fix them ?
[15:40] <claenjoy> with ceph health detail have :
[15:40] <claenjoy> http://paste.ubuntu.com/6123942/
[15:43] * dmsimard (~Adium@108.163.152.2) has joined #ceph
[15:46] <andreask> claenjoy: how many osd nodes?
[15:48] <claenjoy> 4 osds
[15:51] <claenjoy> 4 osds are all up and 3 mons up as well
[15:52] <alfredodeza> hi guys, there is a new release of ceph-deploy (1.2.5)
[15:52] <alfredodeza> make sure you update!
[15:52] <alfredodeza> lots and lots of fixes and improvements
[15:53] <JustEra> does it support ext4 in osd creation now ? :p
[15:53] <alfredodeza> of course not :)
[15:54] <JustEra> :(
[15:54] <alfredodeza> JustEra: I don't foresee us implementing that feature
[15:54] <alfredodeza> sorry, not sure if I discussed this with you before
[15:55] <JustEra> not with me :)
[15:55] <alfredodeza> ah, then I apologize for the snarky reply
[15:55] <joelio> alfredodeza: cool, nicely done
[15:55] <alfredodeza> for ceph-deploy, features are not added unless they make sense for new users who want to get started with ceph
[15:55] <alfredodeza> JustEra: ^ ^
[15:56] <alfredodeza> this is also documented and better explained here: https://github.com/ceph/ceph-deploy#why-is-feature-x-not-implemented
[15:56] <dmsimard> alfredodeza: That would be me, asking about other filesystem types for ceph-deploy. A patch was submitted recently in fact: http://tracker.ceph.com/issues/6154#change-27641
[15:56] <dmsimard> I don't have the time to test it, however
[15:56] <alfredodeza> dmsimard: the docs were fixed btw
[15:57] <dmsimard> You might just have to fix them again if the patch goes through :D
[15:57] <andreask> claenjoy: can you pastebin a "ceph osd dump --format=json-pretty"?
[15:57] <claenjoy> @andreask , do you need more info ?
[15:57] <cephalobot> claenjoy: Error: "andreask" is not a valid command.
[15:57] <alfredodeza> dmsimard: good catch on the `--fs-type` being available for other OSD commands
[15:57] <alfredodeza> *maybe* that feels to me like we should attempt to implement that
[15:58] <alfredodeza> I don't like the idea of allowing a flag for just a few subcommands
[15:58] <alfredodeza> if we have it for one we should be consistent
[15:58] * danieagle (~Daniel@177.133.172.16) has joined #ceph
[15:58] <claenjoy> andreeask : http://paste.ubuntu.com/6124012/
[15:58] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[16:01] <alfredodeza> JustEra: that would mean that the flag might get included soon :)
[16:01] <JustEra> alfredodeza, Nice \o/
[16:05] <andreask> claenjoy: hmm ... can you pastebin the full output of "ceph -s" and "ceph pg dump" please
[16:08] <dmsimard> Oh yeah !
[16:08] <dmsimard> Issue #6154 has been updated by Alfredo Deza.
[16:08] <dmsimard> • Assignee set to Alfredo Deza
[16:08] <kraken> dmsimard might be talking about: http://tracker.ceph.com/issues/6154 [ceph-deploy should be able to use an argument "fs-type" to specify the filesystem type on an OSD.]
[16:08] * sagelap (~sage@2600:100d:b120:a27e:805b:438f:404c:60a) Quit (Ping timeout: 480 seconds)
[16:08] <dmsimard> :)
[16:08] <JustEra> hmmm last ceph-deploy give an error when creating a mon :(
[16:08] <alfredodeza> JustEra: paste
[16:08] <claenjoy> http://paste.ubuntu.com/6124051/ and http://paste.ubuntu.com/6124055/ , what do you think it could be ?
[16:08] * Svedrin (svedrin@ketos.funzt-halt.net) Quit (Ping timeout: 480 seconds)
[16:09] <JustEra> http://pastebin.com/z5UJFE3S
[16:09] * alfredodeza looking
[16:10] <joelio> do you have lsb installed?
[16:10] <alfredodeza> JustEra: are you able to try with another distro?
[16:11] <alfredodeza> joelio: we are now falling back if you don't have lsb installed BTW
[16:11] <JustEra> it as working with the previous version of ceph-deploy just upgrade purge/uninstall and trying to reinstall all
[16:11] <claenjoy> andreask : what do you think it could be ?
[16:11] <JustEra> was*
[16:12] <joelio> alfredodeza: ah, cool, makes sense to have easy fallback
[16:12] <andreask> claenjoy: hmm ... I'm not sure .. the pgs are not in the same pool ...
[16:12] <alfredodeza> joelio: yeah we don't want to be forcing anyone to install something if we are able to have other means to figure out OS details
[16:12] <joelio> /etc/issue :)
[16:13] <JustEra> alfredodeza, btw it does recognize the OS version "ceph_deploy.mon][INFO ] distro info: Debian 7.1 wheezy"
[16:13] * scuttlemonkey (~scuttlemo@38.127.1.5) has joined #ceph
[16:13] * ChanServ sets mode +o scuttlemonkey
[16:13] <alfredodeza> joelio: that isn't entirely consistent
[16:13] <alfredodeza> JustEra: yep
[16:13] * jcfischer_ (~fischer@193.190.130.49) has joined #ceph
[16:14] <andreask> claenjoy: can you show the result of "ceph pg {poolnum}.{pg-id} query for these 5 pgs?
[16:14] <joelio> alfredodeza: isn't there another python dependency needed? sure someone's had that before
[16:14] <joelio> .. on Wheezy
[16:14] <alfredodeza> joelio: what do you mean
[16:14] <alfredodeza> oh that is just a Python error
[16:15] <alfredodeza> related to the library that executes python on the remote end
[16:15] <alfredodeza> trying to figure out why it is complaining
[16:15] * malcolm__ (~malcolm@101.165.48.42) has joined #ceph
[16:15] <alfredodeza> JustEra: can you verify you have ceph-deploy 1.2.5 ?
[16:16] <alfredodeza> the line numbers don't match on what I have
[16:16] <JustEra> root@hamtaro /root/alibaba [38]# ceph-deploy --version
[16:16] <JustEra> 1.2.5
[16:16] <alfredodeza> joelio: thanks
[16:16] * jcfischer (~fischer@user-28-17.vpn.switch.ch) Quit (Ping timeout: 480 seconds)
[16:16] * jcfischer_ is now known as jcfischer
[16:16] <alfredodeza> err
[16:16] <alfredodeza> JustEra I mean
[16:17] * tziOm (~bjornar@194.19.106.242) Quit (Remote host closed the connection)
[16:17] <claenjoy> andreask : if I do " ceph pg 0.f query" I have :
[16:17] <claenjoy> Error ENOENT: i don't have pgid 0.f
[16:18] * malcolm__ (~malcolm@101.165.48.42) Quit (Read error: Connection reset by peer)
[16:18] * malcolm (~malcolm@101.165.48.42) has joined #ceph
[16:18] <alfredodeza> dammit
[16:18] <alfredodeza> I am going to have to release 1.2.6 today
[16:18] * alfredodeza curses
[16:18] <alfredodeza> JustEra: I just found the problem
[16:18] <JustEra> alfredodeza, haha nice
[16:19] * BillK (~BillK-OFT@124-169-207-19.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[16:19] <andreask> claenjoy: oh .. for all of them?
[16:20] <claenjoy> andreask: yes, all 5 of them say "Error ENOENT: i don't have pgid "
[16:22] <alfredodeza> Created issue 6337
[16:22] <kraken> alfredodeza might be talking about: http://tracker.ceph.com/issues/6337 [Closed connection causes import errors for Debian mon create commands]
[16:22] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[16:22] <alfredodeza> JustEra: ^ ^
[16:23] * sagelap (~sage@2600:100d:b120:a27e:6c20:a595:3297:eb24) has joined #ceph
[16:23] <JustEra> alfredodeza, any ETA ? (if I wait for a release or I switch to another project :P)
[16:23] <alfredodeza> today
[16:23] <alfredodeza> the DEB/RPM releases take a bit longer
[16:24] <JustEra> rgr
[16:24] <alfredodeza> if you are familiar with Python install tools you can have the fix in no more than 2 hours
[16:24] <JustEra> yup
[16:24] <alfredodeza> ok excellent
[16:25] <alfredodeza> give me a bit, just a matter of making the changes
[16:25] <JustEra> rgr
[16:26] * jcfischer_ (~fischer@user-23-23.vpn.switch.ch) has joined #ceph
[16:28] * jcfischer (~fischer@193.190.130.49) Quit (Ping timeout: 480 seconds)
[16:28] * jcfischer_ is now known as jcfischer
[16:29] * yanzheng (~zhyan@101.82.175.199) Quit (Ping timeout: 480 seconds)
[16:29] <mattt> rgw enable ops log
[16:29] <mattt> when enabled, where is that actually sent to? and how does one retrieve it?
[16:30] <andreask> claenjoy: these pgs are only on the osd with id 3 ... have these pgs ever been healthy
[16:30] <andreask> ?
[16:31] <joelio> mattt: add log_file = /var/log/ceph/radosgw.log to the [client.radosgw.gateway] section in ceph.conf
[16:31] <joelio> try that
[16:31] <mattt> yeah, i can see that stuff being logged
[16:31] * scuttlemonkey (~scuttlemo@38.127.1.5) Quit (Read error: Operation timed out)
[16:31] * sagelap (~sage@2600:100d:b120:a27e:6c20:a595:3297:eb24) Quit (Quit: Leaving.)
[16:31] * sagelap (~sage@2600:100d:b120:a27e:6c20:a595:3297:eb24) has joined #ceph
[16:31] <mattt> joelio: but look at http://eu.ceph.com/docs/wip-5492/radosgw/config/
[16:32] <mattt> "By default, the RADOS Gateway will log every successful operation in the RADOS backend."
[16:32] <joelio> rgw_enable_ops_log = false
[16:32] <joelio> debug_rgw = 0
[16:32] <joelio> if you don't want it
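Pulling joelio's suggestions together, the relevant piece of ceph.conf on the gateway host would look something like the sketch below; the section name has to match the radosgw instance actually configured there, and the restart command varies by distro:

    # under the radosgw client section in /etc/ceph/ceph.conf, e.g. [client.radosgw.gateway]:
    #     log file = /var/log/ceph/radosgw.log
    #     rgw enable ops log = false     # turn the ops log off if you don't want it
    #     debug rgw = 0
    # then restart the gateway so the settings take effect:
    /etc/init.d/radosgw restart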
[16:33] <alfredodeza> JustEra: can you try my fix and confirm it works?
[16:33] <JustEra> yep
[16:33] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) has joined #ceph
[16:34] <alfredodeza> You would need to clone from here --> https://github.com/alfredodeza/ceph-deploy and then checkout the 6337
[16:34] <alfredodeza> the 6337 branch I mean
[16:34] <alfredodeza> please let me know if that works for you (it should!)
[16:34] <mattt> joelio: yeah, i got it all disabled, was wondering why that'd be sent to rados itself tho
[16:35] <joelio> not sure I follow?
[16:39] <claenjoy> andreask: I'm not sure I get your question; how can I check it?
[16:43] <andreask> claenjoy: have you restarted some of the osds?
[16:44] <claenjoy> andreask yes
[16:44] <JustEra> alfredodeza, confirming that it works ;)
[16:44] * a (~a@pool-173-55-143-200.lsanca.fios.verizon.net) has joined #ceph
[16:44] <alfredodeza> thanks JustEra
[16:44] <alfredodeza> would you be able to comment on that Pull Request?
[16:44] <alfredodeza> JustEra: https://github.com/ceph/ceph-deploy/pull/84
[16:45] * a is now known as Guest7029
[16:45] <andreask> claenjoy: I'd try to restart osd.3
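How you restart a single OSD depends on the init system in use; a sketch of the two common forms at the time:

    service ceph restart osd.3     # sysvinit-style packages (RHEL/CentOS, Debian)
    restart ceph-osd id=3          # Ubuntu upstart packages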
[16:46] <JustEra> alfredodeza, done
[16:46] <alfredodeza> thank you sir
[16:46] <claenjoy> andreask still the same
[16:50] <andreask> claenjoy: you modified your crushmap?
[16:51] * Guest7029 (~a@pool-173-55-143-200.lsanca.fios.verizon.net) Quit (Quit: This computer has gone to sleep)
[16:51] <claenjoy> andreask: maybe; I deleted some of them and then made a new one, because at the beginning it didn't mount the osd partition correctly, so maybe yes
[16:54] <andreask> claenjoy: is this only a test-system?
[16:54] * malcolm (~malcolm@101.165.48.42) Quit (Ping timeout: 480 seconds)
[16:54] <JustEra> hmmm alfredodeza now I've a problem with crep-create-keys on the mon
[16:54] <claenjoy> it's the beginning of my project
[16:55] <claenjoy> andreask , at the moment all osds are empty
[16:55] <alfredodeza> JustEra: make sure your hostnames match
[16:55] <alfredodeza> oh, you are using FQDN
[16:55] <JustEra> http://pastebin.com/0nwDfYwm
[16:55] <JustEra> yep
[16:55] <alfredodeza> that might be a problem. The monitors use the short hostname (`hostname -s`)
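A quick way to check for the mismatch alfredodeza is describing (a sketch): the short hostname on each monitor node should match the mon names ceph-deploy wrote into ceph.conf and the mon data directory.

    hostname -s                                    # the short name the mon will use
    grep -i 'initial.members' /etc/ceph/ceph.conf  # names ceph-deploy recorded
    ls /var/lib/ceph/mon/                          # expect a ceph-<shortname> directory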
[16:55] <claenjoy> andreask: the only thing I did with the crush map was: "ceph osd crush remove ..."
[16:55] <JustEra> was working with previous version
[16:56] <claenjoy> andreask: and then I made another new one
[16:56] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[16:56] <andreask> claenjoy: so to get rid of these pgs ... means recreating them empty --> data loss! ... you can do "ceph pg force_create_pg <pgid>"
[16:56] <claenjoy> aaa ok I will
[16:59] * sprachgenerator (~sprachgen@130.202.135.210) has joined #ceph
[17:05] <JustEra> alfredodeza, hmmm tried without the fqdn hostname and it didn't work :( "INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'" -- hostname -s returns the same host the command used
[17:06] <alfredodeza> JustEra: do you have iptables up?
[17:06] <alfredodeza> try increasing the verbosity of the monitor logging and restart them
[17:06] <JustEra> no fw setup atm (preprod env)
[17:07] <JustEra> in fact it's /usr/sbin/ceph-create-keys that fails
[17:07] <alfredodeza> create keys will fail if the monitors never form quorum
[17:08] <alfredodeza> try and increase the verbosity of the mons in /etc/ceph/ceph.conf by adding: debug mon = 10
[17:08] <alfredodeza> and also add: debug ms = 10
[17:08] <alfredodeza> then restart the mons and tail the output
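Spelled out, the debugging steps alfredodeza describes look roughly like this on each mon host (a sketch; the restart command depends on the init system):

    # add under [mon] (or [global]) in /etc/ceph/ceph.conf:
    #     debug mon = 10
    #     debug ms = 10
    # then restart the monitor and watch its log:
    service ceph restart mon
    tail -f /var/log/ceph/ceph-mon.$(hostname -s).log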
[17:09] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[17:10] * foosinn (~stefan@office.unitedcolo.de) Quit (Quit: Leaving)
[17:10] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit ()
[17:11] * roald (~roaldvanl@139-63-21-115.nodes.tno.nl) Quit (Ping timeout: 480 seconds)
[17:13] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[17:13] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[17:15] <JustEra> hm, I have some IP addresses missing in the mon
[17:16] <JustEra> replaced by 0.0.0.0:0/2 & /2
[17:23] <claenjoy> andreask: now I have health HEALTH_WARN 1 pgs degraded; 1 pgs stale; 5 pgs stuck inactive; 1 pgs stuck stale; 6 pgs stuck unclean
[17:24] <JustEra> alfredodeza, any idea? on the admin host the mon only gets its own ip address, and on the other mons it gets all ips except the admin host's :/
[17:24] <andreask> claenjoy: you have now more stuck pgs?
[17:26] <alfredodeza> JustEra: hrmnn, are those hosts able to ping each other?
[17:27] <JustEra> alfredodeza, yeah
[17:27] <alfredodeza> by the value of `hostname -s` ?
[17:27] <andreask> andreask: have you checked your current crush-map?
[17:27] <JustEra> alfredodeza, yup
[17:27] <andreask> claenjoy: have you checked your current crush-map?
[17:28] <alfredodeza> it is possible that the values from /var/lib/ceph/ are no longer correct and you may need to start again
[17:28] <alfredodeza> have you tried from scratch?
[17:28] <JustEra> alfredodeza, I ran ceph-deploy -v uninstall and ceph-deploy purge & purgedata
[17:28] * todin (tuxadero@kudu.in-berlin.de) Quit (Read error: Connection reset by peer)
[17:31] <JustEra> alfredodeza, maybe delete /var/lib/ceph with rm -rf ?
[17:32] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:32] <JustEra> alfredodeza, nah, with purge/purgedata/uninstall it deletes /var/lib/ceph fine, so :(
[17:33] <andreask> claenjoy: maybe "ceph osd crush dump" gives a hint
[17:34] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[17:34] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[17:35] * andreask need to go ... dinner
[17:36] <claenjoy> andreask: seems so from the output; I just ran "ceph pg force_create_pg 2.d"
[17:37] <claenjoy> andreask: and all the rest
[17:37] <alfredodeza> JustEra: purge and then purgedata should take care of everything to start from scratch. However, be warned, that removes *everything* that is ceph-related
[17:37] <alfredodeza> including any data you may have added
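For reference, the full start-from-scratch sequence with ceph-deploy is roughly the following (destructive, as warned above; hostnames are placeholders):

    ceph-deploy purge node1 node2 node3        # remove the ceph packages
    ceph-deploy purgedata node1 node2 node3    # wipe /var/lib/ceph and /etc/ceph
    ceph-deploy forgetkeys                     # drop the locally cached keyrings
    ceph-deploy new node1 node2 node3          # then rebuild
    ceph-deploy install node1 node2 node3
    ceph-deploy mon create node1 node2 node3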
[17:39] <JustEra> alfredodeza, already did that and it's the same :(
[17:40] * sagelap (~sage@2600:100d:b120:a27e:6c20:a595:3297:eb24) Quit (Ping timeout: 480 seconds)
[17:40] <alfredodeza> can you paste the complete output from the start?
[17:41] * roald (~roaldvanl@87.209.150.214) has joined #ceph
[17:42] <JustEra> alfredodeza, http://pastebin.com/NqwgEvKU
[17:43] <claenjoy> andreask: this my output after the pg froce_create for the 5 of before : http://paste.ubuntu.com/6124416/
[17:43] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[17:43] <alfredodeza> JustEra: woah, you are using public IPs ?
[17:44] <alfredodeza> e.g. 88.190.54.6
[17:44] <JustEra> alfredodeza, it's a testing env; it will be on a private RPN in prod :)
[17:45] <claenjoy> andreask: my fault, I ran force_create_pg on a wrong one!!! so let me see what I can do
[17:46] <claenjoy> andreask : ok now it's better : "health HEALTH_WARN 6 pgs stuck inactive; 6 pgs stuck unclean"
[17:47] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[17:47] * ChanServ sets mode +v andreask
[17:48] <alfredodeza> I am running out of ideas JustEra
[17:48] <alfredodeza> I don't see how this might be related to ceph-deploy though
[17:48] <alfredodeza> :/
[17:48] <claenjoy> alfredodeza: does the new release support LVM partitions?
[17:49] <JustEra> yeah, me too, but it was working with the fqdn and with the previous ceph-deploy version :/
[17:49] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) has joined #ceph
[17:51] <JustEra> alfredodeza, going out of work, I will retry tomorrow and tell you if it was related or not see you o/
[17:51] <kraken> \o
[17:52] * jeff-YF (~jeffyf@67.23.117.122) Quit (Quit: jeff-YF)
[17:52] * sleinen1 (~Adium@2001:620:0:26:5cce:a212:73f3:fbc5) Quit (Quit: Leaving.)
[17:52] * sleinen (~Adium@130.59.94.146) has joined #ceph
[17:52] * JustEra (~JustEra@89.234.148.11) Quit (Quit: This computer has gone to sleep)
[17:54] <claenjoy> I now need all 6 pgs to become active+clean; how can I do that?
[17:55] * sleinen (~Adium@130.59.94.146) Quit (Read error: Connection reset by peer)
[17:56] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[17:59] * markbby (~Adium@168.94.245.2) has joined #ceph
[18:00] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has left #ceph
[18:01] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) has joined #ceph
[18:01] * markbby (~Adium@168.94.245.2) Quit ()
[18:02] * jcfischer (~fischer@user-23-23.vpn.switch.ch) Quit (Quit: jcfischer)
[18:04] * sjm_ (~sjm@dhcp-108-168-18-236.cable.user.start.ca) has joined #ceph
[18:06] <claenjoy> any suggestions?
[18:07] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[18:07] * mattt (~mattt@92.52.76.140) Quit (Read error: Connection reset by peer)
[18:07] * roald (~roaldvanl@87.209.150.214) Quit (Ping timeout: 480 seconds)
[18:09] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[18:09] * ScOut3R (~scout3r@4E5C2305.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[18:11] * markbby (~Adium@168.94.245.2) has joined #ceph
[18:11] <claenjoy> andreask: thanksssss!!!! I did the pg force_create and then restarted ceph-all on the osd.3 host; now it's fixed
[18:12] <andreask> claenjoy: good to hear! but still ... would be interesting what you did to get into this state ... maybe the osd logs on osd.3 give hints
[18:18] <rsanti> so... how can I debug a situation when I launch a monitor and it just... quits with error code 1?
[18:18] <claenjoy> mmm, I'm not sure; maybe I deleted ceph-3 in the directory (/var/lib/ceph/osd/) and then restarted the machine
[18:20] * a (~a@209.12.169.218) has joined #ceph
[18:20] * a is now known as Guest7042
[18:21] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[18:21] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:22] * med is now known as medbot
[18:22] * medbot is now known as med
[18:22] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[18:24] * thomnico (~thomnico@2a01:e35:8b41:120:7c6e:2871:7cf:8198) Quit (Quit: Ex-Chat)
[18:24] <claenjoy> andreask: do you want any of my logs to check?
[18:26] * danieagle (~Daniel@177.133.172.16) Quit (Ping timeout: 480 seconds)
[18:27] * angdraug (~angdraug@204.11.231.50.static.etheric.net) has joined #ceph
[18:31] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:32] * jcfischer (~fischer@62.50.241.95) has joined #ceph
[18:33] * fabioFVZ (~fabiofvz@213.187.20.119) has joined #ceph
[18:33] * fabioFVZ (~fabiofvz@213.187.20.119) Quit (Remote host closed the connection)
[18:35] * danieagle (~Daniel@186.214.63.143) has joined #ceph
[18:41] * diegows (~diegows@190.190.11.42) Quit (Ping timeout: 480 seconds)
[18:49] * vata (~vata@2607:fad8:4:6:e963:8811:f4e6:44c0) has joined #ceph
[18:50] * xarses (~andreww@204.11.231.50.static.etheric.net) has joined #ceph
[18:59] * aliguori (~anthony@204.57.119.28) has joined #ceph
[19:02] * nwat (~nwat@eduroam-237-79.ucsc.edu) has joined #ceph
[19:04] * doubleg (~doubleg@69.167.130.11) Quit (Quit: Lost terminal)
[19:05] * aliguori (~anthony@204.57.119.28) Quit (Quit: Ex-Chat)
[19:05] * MACscr (~Adium@c-98-214-103-147.hsd1.il.comcast.net) Quit (Quit: Leaving.)
[19:05] * bandrus1 (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[19:09] * nhm (~nhm@184-97-187-196.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[19:09] * jcfischer_ (~fischer@user-23-12.vpn.switch.ch) has joined #ceph
[19:09] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[19:10] * jcfischer (~fischer@62.50.241.95) Quit (Ping timeout: 480 seconds)
[19:10] * jcfischer_ is now known as jcfischer
[19:10] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[19:10] <claenjoy> andreask: anyway I have a problem with clock skew detected on a mon
[19:10] * jcfischer (~fischer@user-23-12.vpn.switch.ch) Quit ()
[19:11] <xarses> claenjoy: a drift of around 50ms will do that
[19:12] <claenjoy> every time I restart ntp and then ceph-mon-all it works! so I checked ps aux for ceph and ntp, and I can see that ceph-mon is running before the ntp service
[19:13] <claenjoy> should it be the opposite? so ntp service first and then ceph-mon, correct?
[19:15] <claenjoy> xarses: thanks, so what do you suggest?
[19:18] * doxavore (~doug@99-7-52-88.lightspeed.rcsntx.sbcglobal.net) has joined #ceph
[19:19] * gregaf (~Adium@2607:f298:a:607:2026:a478:8d61:dbdf) has joined #ceph
[19:21] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[19:24] * diegows (~diegows@190.190.11.42) has joined #ceph
[19:27] * rturk-away is now known as rturk
[19:30] <xarses> claenjoy, the start order is ok, as long as the time stays in sync
[19:30] <xarses> ntpstat
[19:30] <xarses> synchronised to NTP server (10.0.0.2) at stratum 4
[19:30] <xarses> time correct to within 546 ms
[19:30] <xarses> polling server every 1024 s
[19:30] <peetaur> silly terrible question... how would one mount something from ceph into a windows client? mount an RBD image in Linux and samba share it?
[19:32] <xarses> claenjoy, if you have a drift of 546 ms like me, then you will probably get clock skew warnings a lot; iirc they tolerate 50ms before that message is posted
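The checks xarses is describing, roughly (a sketch; 0.05s is the default value of the mon clock drift allowed option):

    ntpstat                              # is this node synchronised, and how tightly?
    ntpq -p                              # which peers is ntpd actually using?
    ceph health detail | grep -i skew    # which mons are past the 0.05s default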
[19:33] <xarses> peetaur, you could skin the rados part a couple of ways, but for the windows side you have cifs (samba) or nfs (if you have the tools/lib)
[19:33] * gregmark (~Adium@cet-nat-254.ndceast.pa.bo.comcast.net) has joined #ceph
[19:34] * scuttlemonkey (~scuttlemo@38.127.1.5) has joined #ceph
[19:34] * ChanServ sets mode +o scuttlemonkey
[19:34] <xarses> for the former you could use CephFS, or rbd to mount the image
[19:34] * MACscr (~Adium@c-98-214-103-147.hsd1.il.comcast.net) has joined #ceph
[19:35] <xarses> morning MACscr
[19:35] * danieagle (~Daniel@186.214.63.143) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[19:35] <MACscr> morning xarses
[19:36] <peetaur> xarses: ok thx. So no chance of a ceph client of some sort, eh?
[19:36] <xarses> peetaur, not that I'm aware of
[19:37] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[19:37] <xarses> http://wiki.ceph.com/03FAQs/01General_FAQ#Do_Ceph_Clients_Run_on_Windows.3F
[19:37] <claenjoy> xarses: thanks !
[19:42] <peetaur> k, so some day I guess that'll exist ;)
[19:42] <peetaur> but either way, samba on top is no worse than anything else you can do in such a situation.
[19:43] <xarses> seems that way
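A minimal sketch of the RBD-plus-Samba approach being discussed (image name, size and mount point are placeholders; assumes the rbd kernel module and Samba are available on the gateway box):

    rbd create winshare --size 102400            # 100 GB image in the default rbd pool
    rbd map winshare                             # typically shows up as /dev/rbd/rbd/winshare
    mkfs.xfs /dev/rbd/rbd/winshare
    mkdir -p /srv/winshare
    mount /dev/rbd/rbd/winshare /srv/winshare
    # then export /srv/winshare to the Windows clients as a normal Samba share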
[19:43] <xarses> mon crea
[19:43] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[19:44] <angdraug> are there any best practices for drive topology with osd journals on SSDs?
[19:45] * jksM (~jks@3e6b5724.rev.stofanet.dk) has joined #ceph
[19:45] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[19:46] * gregmark1 (~Adium@68.87.42.115) has joined #ceph
[19:46] * dmsimard1 (~Adium@108.163.152.2) has joined #ceph
[19:46] * jantje_ (~jan@paranoid.nl) has joined #ceph
[19:46] * Snow-_ (~snow@sputnik.teardrop.org) has joined #ceph
[19:46] * MACscr1 (~Adium@c-98-214-103-147.hsd1.il.comcast.net) has joined #ceph
[19:46] * Svedrin (svedrin@ketos.funzt-halt.net) has joined #ceph
[19:46] * grepory1 (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[19:46] * mech4221 (~steve@ip68-2-159-8.ph.ph.cox.net) has joined #ceph
[19:47] * b1tbkt___ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[19:47] <angdraug> in particular, I'm looking for insights on how putting journals for multiple OSDs (1 HDD each) on a single SSD would impact the resilience of the whole cluster
[19:48] <angdraug> on one hand, 1:1 ratio between HDD for OSDs and SSD for journals is an obvious overkill
[19:48] <angdraug> on the other, having all journals on a single SSD creates an SPoF and turns a drive failure into a server failure
[19:48] <angdraug> thoughts? links?
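For what it's worth, the usual way to share one SSD across a few OSD journals with ceph-deploy is to give each OSD its own journal partition on the SSD, roughly (a sketch; device names are placeholders):

    # three spinners on node1, journals on three partitions of one SSD (sdd)
    ceph-deploy osd create node1:sda:/dev/sdd1
    ceph-deploy osd create node1:sdb:/dev/sdd2
    ceph-deploy osd create node1:sdc:/dev/sdd3
    # if sdd dies, all three OSDs go with it - the SPoF trade-off raised above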
[19:48] * markbby1 (~Adium@168.94.245.2) has joined #ceph
[19:48] * twx_ (~twx@rosamoln.org) has joined #ceph
[19:49] * joelio_ (~Joel@88.198.107.214) has joined #ceph
[19:49] * pmatulis_ (~peter@64.34.151.178) has joined #ceph
[19:49] * todin_ (tuxadero@kudu.in-berlin.de) has joined #ceph
[19:49] * zjohnson_ (~zjohnson@guava.jsy.net) has joined #ceph
[19:49] * codice_ (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) has joined #ceph
[19:49] * ikla_ (~lbz@c-67-190-136-245.hsd1.co.comcast.net) has joined #ceph
[19:49] * Zethrok_ (~martin@95.154.26.34) has joined #ceph
[19:49] * Rocky_ (~r.nap@188.205.52.204) has joined #ceph
[19:49] * MACscr (~Adium@c-98-214-103-147.hsd1.il.comcast.net) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * gregmark (~Adium@cet-nat-254.ndceast.pa.bo.comcast.net) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * markbby (~Adium@168.94.245.2) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * todin (tuxadero@kudu.in-berlin.de) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * dmsimard (~Adium@108.163.152.2) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * leseb (~leseb@88-190-214-97.rev.dedibox.fr) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * claenjoy (~leggenda@37.157.33.36) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * Rocky (~r.nap@188.205.52.204) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * mech422 (~steve@ip68-2-159-8.ph.ph.cox.net) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * xmltok_ (~xmltok@pool101.bizrate.com) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * phantomcircuit (~phantomci@covertinferno.org) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * jks (~jks@3e6b5724.rev.stofanet.dk) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * ikla (~lbz@c-67-190-136-245.hsd1.co.comcast.net) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * miniyo (~miniyo@0001b53b.user.oftc.net) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * Snow- (~snow@sputnik.teardrop.org) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * codice (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * wrale (~wrale@wrk-28-217.cs.wright.edu) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * Zethrok (~martin@95.154.26.34) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * zjohnson (~zjohnson@guava.jsy.net) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * twx (~twx@rosamoln.org) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * jantje (~jan@paranoid.nl) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * pmatulis (~peter@64.34.151.178) Quit (resistance.oftc.net oxygen.oftc.net)
[19:49] * joelio (~Joel@88.198.107.214) Quit (resistance.oftc.net oxygen.oftc.net)
[19:50] <dmsimard1> angdraug: What I understand on that topic is that all writes for an OSD go through its journal
[19:50] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[19:50] * b1tbkt___ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[19:50] * b1tbkt_ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Read error: Connection reset by peer)
[19:50] * b1tbkt__ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Read error: Connection reset by peer)
[19:51] <dmsimard1> angdraug: With that in mind, the drive the journal is on can quickly become a bottleneck if it is not fast enough (e.g. if you put many OSD journals on a single SSD)
[19:51] * dmsimard1 is now known as dmsimard
[19:51] <dmsimard> I have yet to do some tests with journals, i'll get there eventually but that is what I make of the discussions on the channel
[19:53] * phantomcircuit (~phantomci@covertinferno.org) has joined #ceph
[19:53] * leseb (~leseb@88-190-214-97.rev.dedibox.fr) has joined #ceph
[19:54] <angdraug> http://www.hastexo.com/resources/hints-and-kinks/solid-state-drives-and-ceph-osd-journals
[19:55] <angdraug> ^ recommends using different SSDs for different OSDs, or forgoing SSDs if there are too many spinners
[19:56] * b1tbkt_ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[19:56] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[19:56] * b1tbkt___ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[19:56] * b1tbkt__ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[19:57] <angdraug> I'm not sure I buy the argument about doubling the load on the SSDs, I guess it's a tradeoff between TCO and downtime
[19:58] <dmsimard> Well, the risk of outage is multiplied - since you now have both SSD and OSD failures to worry about
[20:00] * wrale (~wrale@wrk-28-217.cs.wright.edu) has joined #ceph
[20:00] <angdraug> well that one is a tradeoff between resilience and performance
[20:01] <angdraug> if you want more performance out of your osds, there's probably no way around putting journals on ssd
[20:01] <mikedawson> angdraug: just wrote about this on the mailing list http://comments.gmane.org/gmane.comp.file-systems.ceph.user/4267
[20:01] * miniyo (~miniyo@0001b53b.user.oftc.net) has joined #ceph
[20:01] <xarses> i'd guess the crush map could be updated to prevent OSDs that share a journal SSD from being part of the same replication group
[20:02] * claenjoy (~leggenda@37.157.33.36) has joined #ceph
[20:03] <jjgalvez> xarses: as long as you keep replicas off the same host you would not have them on any osd sharing an ssd for journals
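That behaviour can be confirmed by decompiling the CRUSH map; the stock replicated rule separates replicas at the host level (a sketch):

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # the default rule in crush.txt contains:
    #     step chooseleaf firstn 0 type host
    # so each replica lands on a different host, and OSDs that share a
    # journal SSD inside one host never hold two copies of the same PG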
[20:03] <angdraug> mikedawson: nice thread, thanks!
[20:05] <xarses> "Dell - Internal Use - Confidential"...
[20:05] <mikedawson> angdraug: glad to help
[20:09] <sprachgenerator> I'm having trouble activating an OSD using ceph-deploy (1.2.5) - it reports back as "activating" but it never shows up and I receive "journal read_header error decoding journal header"; the data and journal are on /dev/sda1 and /dev/sda2 respectively
[20:12] * nhm (~nhm@72.21.225.66) has joined #ceph
[20:13] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[20:14] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[20:16] <angdraug> mikedawson: I think your analysis belongs on http://ceph.com/docs/next/install/hardware-recommendations/#additional-considerations
[20:17] <claenjoy> How can I use a NAS as a backup for a ceph cluster that's already working?
[20:20] <mikedawson> angdraug: thanks. it could be useful there.
[20:23] * rweeks (~rweeks@50-0-136-111.dsl.dynamic.sonic.net) has joined #ceph
[20:24] <rweeks> hey rturk and scuttlemonkey: does ceph-deploy not deploy a Ceph Gateway?
[20:24] <rturk> don't believe ceph-deploy does radosgw yet, no
[20:24] * rweeks is reading through docs and can only find "Manual Install" for the Gateway
[20:25] <rweeks> ah.
[20:26] <xarses> rweeks: you mean radosgw?
[20:26] <xarses> no it doesn't deploy radosgw
[20:27] <rweeks> ok.
[20:27] <rweeks> any chef recipes or other stuff out there for radosgw, out of curiosity?
[20:28] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) has joined #ceph
[20:29] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has left #ceph
[20:29] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[20:30] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (Quit: Leaving.)
[20:31] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[20:31] * b1tbkt___ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[20:31] * b1tbkt_ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[20:31] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[20:31] * b1tbkt__ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[20:31] <xarses> i think the chef repo has some radosgw config in it
[20:31] <scuttlemonkey> rweeks: he juju charms do rgw
[20:31] <xarses> but dnt quote me on it
[20:31] <scuttlemonkey> the*
[20:32] <xarses> s/dnt/dont/
[20:33] <rweeks> ok, I will check those out. Not used juju much yet
[20:36] <rweeks> although I did the setup using ceph-deploy, so i'm not sure if I could use juju to deploy the radosgw at this point
[20:36] * sagelap (~sage@2600:100d:b12c:6006:6c20:a595:3297:eb24) has joined #ceph
[20:36] * Snow-_ is now known as Snow-
[20:37] * b1tbkt_ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[20:37] <scuttlemonkey> rweeks: yeah, if you want to keep what you have already done it will be easier to just install by hand
[20:37] * b1tbkt__ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[20:37] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[20:37] * b1tbkt___ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[20:37] <rweeks> ok
[20:37] * schlitzer|work (~schlitzer@109.75.189.45) Quit (Ping timeout: 480 seconds)
[20:37] <rweeks> I'll keep the juju in mind for future ones
[20:38] * ScOut3R (~ScOut3R@4E5C2305.dsl.pool.telekom.hu) has joined #ceph
[20:39] * b1tbkt____ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[20:42] * nhm (~nhm@72.21.225.66) Quit (Ping timeout: 480 seconds)
[20:43] * dosaboy (~dosaboy@host109-158-232-255.range109-158.btcentralplus.com) Quit (Quit: leaving)
[20:43] * dosaboy (~dosaboy@65.93.189.91.lcy-01.canonistack.canonical.com) has joined #ceph
[20:44] * sagelap (~sage@2600:100d:b12c:6006:6c20:a595:3297:eb24) Quit (Ping timeout: 480 seconds)
[20:44] * dosaboy (~dosaboy@65.93.189.91.lcy-01.canonistack.canonical.com) Quit ()
[20:46] * dosaboy (~dosaboy@65.93.189.91.lcy-01.canonistack.canonical.com) has joined #ceph
[20:47] * scuttlemonkey (~scuttlemo@38.127.1.5) Quit (Ping timeout: 480 seconds)
[20:49] * swinchen (~swinchen@samuel-winchenbach.ums.maine.edu) has joined #ceph
[20:50] * Meths_ is now known as Meths
[20:54] * nwat (~nwat@eduroam-237-79.ucsc.edu) Quit (Ping timeout: 480 seconds)
[20:56] * claenjoy (~leggenda@37.157.33.36) Quit (Quit: Leaving.)
[20:59] * sjm_ (~sjm@dhcp-108-168-18-236.cable.user.start.ca) Quit (Quit: Leaving)
[21:03] * sagelap (~sage@2600:100d:b106:43a2:6c20:a595:3297:eb24) has joined #ceph
[21:03] * gregaf1 (~Adium@38.122.20.226) has joined #ceph
[21:03] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[21:04] * nwat (~nwat@eduroam-237-79.ucsc.edu) has joined #ceph
[21:05] * scuttlemonkey (~scuttlemo@204.57.119.28) has joined #ceph
[21:05] * ChanServ sets mode +o scuttlemonkey
[21:11] * gregaf1 (~Adium@38.122.20.226) Quit (Quit: Leaving.)
[21:20] * sprachgenerator (~sprachgen@130.202.135.210) Quit (Quit: sprachgenerator)
[21:24] * b1tbkt___ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[21:24] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[21:24] * b1tbkt____ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[21:24] * b1tbkt__ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[21:24] * b1tbkt_ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[21:28] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[21:29] * b1tbkt_ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[21:29] * b1tbkt__ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[21:30] * b1tbkt___ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[21:30] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[21:30] <rsanti> can ceph possibly work with 1 osd?
[21:34] <xarses> yes
[21:34] <xarses> but as always, it's not recommended
[21:34] * rturk is now known as rturk-away
[21:34] <rsanti> yeah, I'm just testing the kvm-ceph integration
[21:35] <rsanti> except ofc it's not working for me so I guess I must've botched it up
[21:37] <xarses> rsanti see http://ceph.com/docs/next/start/quick-ceph-deploy/#create-a-cluster
[21:37] <xarses> "Single Node Quick Start"
[21:38] <xarses> it might still require 2 OSD's on the same host though
[21:38] <rsanti> then I suppose the answer is "no, must have two OSDs"
[21:38] <xarses> there is probably some way to update that with the crush map
[21:38] * AfC (~andrew@2001:44b8:31cb:d400:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[21:40] <xarses> rsanti, you can start multiple osd's on the filesystem
[21:40] <xarses> just point ceph-deploy osd create /some/mounted/path
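For a throwaway single-node test like this, the usual trick is to relax the defaults in the generated ceph.conf before bringing anything up (a sketch; node1 and the OSD path are placeholders):

    # after `ceph-deploy new node1`, add under [global] in the generated ceph.conf:
    #     osd pool default size = 1       # one copy is enough with a single OSD
    #     osd crush chooseleaf type = 0   # allow replicas on the same host
    # then continue as usual:
    ceph-deploy install node1
    ceph-deploy mon create node1
    ceph-deploy osd create node1:/some/mounted/path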
[21:45] <rsanti> xarses: I have to create an lvm anyway given the odd ways the computers I'm working with are partitioned so I might as well have it on a second machine
[21:45] * b1tbkt__ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[21:45] <rsanti> *lvm volume
[21:45] * b1tbkt___ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[21:45] * b1tbkt_ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[21:47] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[21:48] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[21:58] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[21:59] * allsystemsarego (~allsystem@188.25.131.49) Quit (Quit: Leaving)
[22:02] * rturk-away is now known as rturk
[22:02] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[22:02] * b1tbkt_ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[22:03] * codice_ (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) Quit (Remote host closed the connection)
[22:04] * markbby1 (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[22:06] * Vjarjadian (~IceChat77@94.4.30.72) has joined #ceph
[22:07] * sprachgenerator (~sprachgen@130.202.135.210) has joined #ceph
[22:10] * codice (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) has joined #ceph
[22:16] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (Quit: Leaving.)
[22:20] * grepory1 (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[22:20] <rsanti> heh. by adding a second osd I went from a status of health_warn 192 to health_warn 192, 1/1 osd down
[22:20] <rsanti> a restart of all services later, launching ceph just gives me a process that's waiting for goto
[22:21] <rsanti> *godot
[22:21] <rsanti> I feel like Homer Simpson mixing milk and cereals
[22:21] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[22:22] * rturk is now known as rturk-away
[22:24] * ScOut3R (~ScOut3R@4E5C2305.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[22:24] * rturk-away is now known as rturk
[22:31] * sjm (~sjm@dhcp-108-168-18-236.cable.user.start.ca) Quit (Ping timeout: 480 seconds)
[22:35] <xarses> progress is progress
[22:37] <rsanti> waiting for godot is progress? :)
[22:37] <xarses> no, having osds is progress
[22:37] <xarses> i don't know what godot is
[22:38] <rsanti> is it normal for ceph to just.... stall?
[22:38] <xarses> no
[22:38] <xarses> not really
[22:39] <xarses> it will behave oddly if your osds don't have tcp ports 6800-7100 open between them
[22:39] <xarses> and could partially explain the issue
[22:39] <xarses> also your mon should have port 6879 open
[22:40] <xarses> erm
[22:40] <xarses> 6789 even
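On an iptables-based host, opening what xarses lists would look something like this (a sketch; adapt to whatever firewall tooling is in place):

    iptables -A INPUT -p tcp --dport 6789 -j ACCEPT        # monitor
    iptables -A INPUT -p tcp --dport 6800:7100 -j ACCEPT   # OSD daemon port range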
[22:42] * sagelap (~sage@2600:100d:b106:43a2:6c20:a595:3297:eb24) Quit (Ping timeout: 480 seconds)
[22:43] * sjm (~sjm@dhcp-108-168-18-236.cable.user.start.ca) has joined #ceph
[22:45] * nwat (~nwat@eduroam-237-79.ucsc.edu) Quit (Ping timeout: 480 seconds)
[22:46] * sagelap (~sage@2600:100d:b106:43a2:6c20:a595:3297:eb24) has joined #ceph
[22:54] * nwat (~nwat@eduroam-237-79.ucsc.edu) has joined #ceph
[22:57] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[22:58] * gucki (~smuxi@HSI-KBW-109-192-187-143.hsi6.kabel-badenwuerttemberg.de) Quit (Ping timeout: 480 seconds)
[22:58] <rsanti> yeah, the monitors seem alive and well and if I take one down ceph notices
[22:58] <rsanti> otherwise, no signal
[22:59] <rsanti> attaching strace shows blocking futex calls
[23:01] * ircolle1 (~Adium@c-67-165-237-235.hsd1.co.comcast.net) has joined #ceph
[23:01] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) Quit (Read error: Connection reset by peer)
[23:01] * sagelap (~sage@2600:100d:b106:43a2:6c20:a595:3297:eb24) Quit (Ping timeout: 480 seconds)
[23:02] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[23:03] * rweeks (~rweeks@50-0-136-111.dsl.dynamic.sonic.net) Quit (Quit: Textual IRC Client: www.textualapp.com)
[23:05] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[23:06] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[23:06] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit ()
[23:06] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) has joined #ceph
[23:11] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) Quit ()
[23:13] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) has joined #ceph
[23:13] * b1tbkt_ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[23:13] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[23:13] * ircolle1 (~Adium@c-67-165-237-235.hsd1.co.comcast.net) Quit (Read error: Connection reset by peer)
[23:16] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[23:17] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[23:19] * BillK (~BillK-OFT@124-169-207-19.dyn.iinet.net.au) has joined #ceph
[23:23] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:28] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[23:29] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[23:32] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Remote host closed the connection)
[23:32] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[23:33] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[23:33] * nwat (~nwat@eduroam-237-79.ucsc.edu) Quit (Ping timeout: 480 seconds)
[23:35] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[23:35] * ChanServ sets mode +v andreask
[23:37] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[23:38] <xarses> alfredodeza issue 6348
[23:38] <kraken> xarses might be talking about: http://tracker.ceph.com/issues/6348 [ceph-deploy new.new needs to use same seperator as conf.write]
[23:38] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[23:41] * sjm (~sjm@dhcp-108-168-18-236.cable.user.start.ca) Quit (Quit: Leaving)
[23:46] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[23:48] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[23:48] * ChanServ sets mode +v andreask
[23:59] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Bye!)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.