#ceph IRC Log


IRC Log for 2012-10-03

Timestamps are in GMT/BST.

[0:00] * dty (~derek@testproxy.umiacs.umd.edu) Quit (Ping timeout: 480 seconds)
[0:02] * cblack101 (86868b4a@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[0:04] * pentabular (~sean@adsl-70-231-128-149.dsl.snfc21.sbcglobal.net) Quit (Quit: pentabular)
[0:05] * sage1 (~sage@ Quit (Ping timeout: 480 seconds)
[0:10] <kyle_> gregaf: i'm not sure where i went wrong, i have both monitors running but the cluster is unresponsive. Is there a way to sort of, "restart" the cluster. Meaning have the services all try to reconnect to each other?
[0:13] * pentabular (~sean@adsl-70-231-128-149.dsl.snfc21.sbcglobal.net) has joined #ceph
[0:14] <gregaf> kyle_: do you have an admin socket for the monitors?
[0:14] <kyle_> or actually it looks like my problem is this. this is my mon0 log entry:
[0:14] <kyle_> -- >> pipe(0x1334000 sd=17 pgs=0 cs=0 l=0).failed verifying authorize reply
[0:15] <gregaf> if you do, run "ceph --admin-daemon </path/to/socket> mon_status", and also quorum_status, and pastebin the output
[0:15] <kyle_> okay i'll see
[0:15] <gregaf> kyle_: how did you set up your second monitor?
[0:15] * sage1 (~sage@ has joined #ceph
[0:15] <kyle_> using this: http://ceph.com/docs/master/cluster-ops/add-or-rm-mons/
[0:17] <gregaf> okay, get me that output
[0:17] <kyle_> sorry i'm not sure what my path to the socket would be or when i would've setup an admin socket
[0:18] <gregaf> look in /var/lib/ceph/mon, that's the default path iirc
[0:18] <gregaf> it'll end in ".asok"
[0:18] <gregaf> if not there, you can add it to the ceph.conf file and restart the daemon
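For reference, the admin-socket setup gregaf describes can be sketched as a ceph.conf fragment (the path below is a typical default, not necessarily what this cluster uses):

```ini
; ceph.conf fragment: give each monitor an admin socket
; $id expands to the monitor's id; the path shown is illustrative
[mon]
    admin socket = /var/run/ceph/ceph-mon.$id.asok
```

After restarting the daemon, `ceph --admin-daemon /var/run/ceph/ceph-mon.0.asok mon_status` should answer even when the cluster has no quorum, which is what makes it useful for debugging cases like this one.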
[0:21] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[0:24] <kyle_> { "name": "0",
[0:24] <kyle_> "rank": 0,
[0:24] <kyle_> "state": "probing",
[0:24] <kyle_> "election_epoch": 0,
[0:24] <kyle_> "quorum": [],
[0:24] <kyle_> "outside_quorum": [
[0:24] <kyle_> "0"],
[0:24] <kyle_> "monmap": { "epoch": 2,
[0:24] <kyle_> "fsid": "32f5d82e-76bb-4b7d-84fa-7e2ecefdd06a",
[0:24] <kyle_> "modified": "2012-10-01 16:02:15.572455",
[0:24] <kyle_> "created": "2012-08-24 09:53:17.523557",
[0:24] <kyle_> "mons": [
[0:24] <kyle_> { "rank": 0,
[0:24] <kyle_> "name": "0",
[0:24] <kyle_> "addr": "\/0"},
[0:24] <kyle_> { "rank": 1,
[0:24] <kyle_> "name": "mon.0",
[0:24] <kyle_> "addr": "\/0"}]}}
[0:24] <kyle_> so yeah the rank 1 mon should just be name = 1 not mon.0. epic fail there.
[0:25] <gregaf> what's the output when run on the new monitor?
[0:25] <gregaf> having this bad name will be annoying but should be usable
[0:26] <gregaf> (especially if the first thing you do is get rid of it :p)
[0:26] <kyle_> right so yeah the new one looks like it's running independently
[0:26] <kyle_> { "name": "1",
[0:26] <kyle_> "rank": -1,
[0:26] <kyle_> "state": "probing",
[0:26] <kyle_> "election_epoch": 0,
[0:26] <kyle_> "quorum": [],
[0:26] <kyle_> "outside_quorum": [],
[0:26] <kyle_> "monmap": { "epoch": 0,
[0:26] <kyle_> "fsid": "32f5d82e-76bb-4b7d-84fa-7e2ecefdd06a",
[0:27] <kyle_> "modified": "2012-08-24 09:53:17.523557",
[0:27] <kyle_> "created": "2012-08-24 09:53:17.523557",
[0:27] <kyle_> "mons": [
[0:27] <kyle_> { "rank": 0,
[0:27] <kyle_> "name": "0",
[0:27] <kyle_> "addr": "\/0"}]}}
[0:27] <kyle_> new one should be .81 which is correct in the first one
[0:28] <kyle_> i think i know what to do.
[0:29] * pentabular (~sean@adsl-70-231-128-149.dsl.snfc21.sbcglobal.net) has left #ceph
[0:29] <Tv_> elder: i have an xfs question..
[0:29] <Tv_> sudo mkfs.xfs -f /dev/vdb1
[0:29] <Tv_> mkfs.xfs: /dev/vdb1 contains a mounted filesystem
[0:29] <Tv_> oh mounted not just pre-existing.. oops
[0:29] <gregaf> you'll want to shut down the new monitor and recreate with the name "mon.0", kyle_; I think that should bring it all back up
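A rough sketch of the recreate-and-rejoin procedure gregaf suggests, following the add-or-rm-mons doc linked earlier (the id, paths, and ordering are illustrative, and the first two commands assume the cluster can still answer `ceph` requests):

```shell
# fetch the current monmap and mon keyring from the running cluster
ceph mon getmap -o /tmp/monmap
ceph auth get mon. -o /tmp/keyring

# initialize the new monitor's data dir under the expected name
ceph-mon -i mon.0 --mkfs --monmap /tmp/monmap --keyring /tmp/keyring

# register it in the monmap, then start it
ceph-mon -i mon.0
```

As kyle_ notes later in the log, this breaks down when the cluster is already unresponsive, since fetching the monmap and keyring needs a working quorum.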
[0:29] <Tv_> elder: nevermind
[0:29] <elder> Glad I could help.
[0:29] <Tv_> elder: you're a good rubber ducky
[0:30] <joao> gregaf, correct me if I'm wrong, but the new one only has that monmap because it's the only map it knows about
[0:30] <joao> it should be updated once it joins the quorum, no?
[0:30] <gregaf> yeah, but it's also got some internal beliefs about its name which don't match what the existing cluster expects
[0:31] <joao> the problem is why it is not joining the quorum, but I suppose that's because it's ceph.conf has its id as '1' (or smth) instead of mon.0
[0:31] <gregaf> so they aren't able to connect to each other
[0:31] <gregaf> right?
[0:31] <gregaf> yeah, the old version of the monmap isn't the problem
[0:31] <joao> s/it's/its
[0:33] <sagewk> joao: can you look at http://tracker.newdream.net/issues/3252 ?
[0:33] <sagewk> houkouonchi-work just reproduced on burnupi
[0:34] <joao> I don't have much experience cracking down on these kinds of issues (vstart ftw), but I would assume that even starting the monitor (on the cli) with './ceph-mon -i mon.0 -d' should work and provide useful info
[0:34] <joao> sagewk, looking
[0:34] <gregaf> yeah, I don't remember what its beliefs are; it actually might be good enough just restarting with that name
[0:34] <joao> sagewk, that was happening to me too
[0:35] <joao> I mean, when I set the policy on the quorum
[0:35] <joao> erm
[0:35] <joao> on the monitor; w8, too many lines of thought at the same time for this kind of hour
[0:35] <sagewk> joao: in this case i'd compile/build arognaut, vstart, then recompile, then restart just one ceph-mon and debug..
[0:35] <joao> let me rephrase: that happened to me on *my* branch after I set the messenger's policy
[0:36] * aliguori (~anthony@cpe-70-123-140-180.austin.res.rr.com) has joined #ceph
[0:36] <elder> sagewk, I updated my patch to use page_offset() as hch suggested. Still "reviewed-by" from you?
[0:36] <joao> sagewk, that might have something to do with the GV policy; I will take a look though
[0:37] <sagewk> elder: yep!
[0:37] <elder> OK.
[0:37] <sagewk> joao: sweet thanks.
[0:37] <joao> with --debug-ms 10 (?) it should show something like "missing features"
[0:37] <sagewk> i suspect it's something slightly more annoying than that, but yeah
[0:38] <joao> sagewk, can this wait till morning?
[0:38] <sagewk> yeah np
[0:40] <joao> I will start on it now though; just making sure that I can go to bed if I eventually get frustrated :p
[0:42] <elder> sagewk, ceph-client/testing now contains: old content of testing branch rebased to 3.6; your four invalid mapping patches; the 32-bit page index fix; and the four btrfs patches.
[0:43] <elder> I'd like to have it go through a nightly cycle or two, then push all but the four btrfs fixes to master.
[0:43] <sagewk> k. we can drop the btrfs ones, actually
[0:43] <sagewk> we don't need them in the nightly
[0:43] <elder> Where are they needed?
[0:44] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has left #ceph
[0:44] <elder> I thought we'd drop them once they're present upstream, at which point we might move ahead (after 3.7-rc1 at some point).
[0:49] * loicd1 (~loic@2a01:e35:2eba:db10:120b:a9ff:feb7:cce0) has joined #ceph
[0:49] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:51] <joao> my connection to the planas is painfully slow
[0:54] <joao> oh, nevermind; looks like I had two terminals with running monitors dumping a whole lot of logs and hogging bandwidth
[0:56] <kyle_> so i tried as suggested to re-create the monitor using the "correct" name. I get this when I try to start up the new mon:
[0:56] <kyle_> root@ceph-mon1:/var/run/ceph# ceph-mon -i mon.0 -d
[0:56] <kyle_> 2012-10-02 15:54:46.383632 7ffac6652780 0 ceph version 0.52-922-g2bf3f8c (commit:2bf3f8c5882961fe1ba27279d07b81c17fa82362), process ceph-mon, pid 5516
[0:56] <kyle_> 2012-10-02 15:54:46.383646 7ffac6652780 1 store(/data/mon.mon.0) mount
[0:56] <kyle_> 2012-10-02 15:54:46.383770 7ffac6652780 0 mon.mon.0 does not exist in monmap, will attempt to join an existing cluster
[0:56] <kyle_> starting mon.mon.0 rank -1 at mon_data /data/mon.mon.0 fsid 32f5d82e-76bb-4b7d-84fa-7e2ecefdd06a
[0:56] <kyle_> 2012-10-02 15:54:46.384550 7ffac6652780 1 mon.mon.0@-1(probing) e0 init fsid 32f5d82e-76bb-4b7d-84fa-7e2ecefdd06a
[0:56] <kyle_> 2012-10-02 15:54:56.385076 7ffac0ecd700 1 mon.mon.0@-1(probing) e0 discarding message auth(proto 0 26 bytes epoch 0) v1 and sending client elsewhere; we are not in quorum
[0:56] <kyle_> 2012-10-02 15:54:56.385102 7ffac0ecd700 1 mon.mon.0@-1(probing) e0 discarding message auth(proto 0 26 bytes epoch 0) v1 and sending client elsewhere; we are not in quorum
[0:56] <kyle_> 2012-10-02 15:55:01.385161 7ffac0ecd700 1 mon.mon.0@-1(probing) e0 discarding message auth(proto 0 26 bytes epoch 0) v1 and sending client elsewhere; we are not in quorum
[0:56] <kyle_> 2012-10-02 15:55:01.385179 7ffac0ecd700 1 mon.mon.0@-1(probing) e0 discarding message auth(proto 0 26 bytes epoch 0) v1 and sending client elsewhere; we are not in quorum
[0:56] <kyle_> 2012-10-02 15:55:06.385232 7ffac0ecd700 1 mon.mon.0@-1(probing) e0 discarding message auth(proto 0 26 bytes epoch 0) v1 and sending client elsewhere; we are not in quorum
[0:56] <kyle_> 2012-10-02 15:55:06.385252 7ffac0ecd700 1 mon.mon.0@-1(probing) e0 discarding message auth(proto 0 26 bytes epoch 0) v1 and sending client elsewhere; we are not in quorum
[0:56] <kyle_> last line just repeats itself
[0:58] <kyle_> i was going to just start from scratch using the new name but i cannot get the monmap or keyring because the cluster is unresponsive so i just moved /data/mon.1 to /data/mon.mon.0
[1:03] <joao> well, the only other thing I can think of is to rebuild the monmap and restart the first monitor with --monmap <path-to-new-monmap> and re-add the second monitor
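joao's rebuild-the-monmap idea could look something like this with monmaptool (the fsid is taken from the mon_status paste above; the monitor names and map path are illustrative, using the intended names rather than the garbled ones in the current map):

```shell
# build a fresh monmap containing both monitors under their intended names
monmaptool --create --fsid 32f5d82e-76bb-4b7d-84fa-7e2ecefdd06a \
    --add 0 --add 1 /tmp/monmap

# restart the surviving monitor against the new map, per joao's suggestion
ceph-mon -i 0 --monmap /tmp/monmap
```

Whether pointing an already-initialized monitor at a rebuilt map is safe here is exactly the open question joao raises next.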
[1:03] <joao> maybe someone else can give some input on whether or not this is advised?
[1:04] <joao> btw, kyle_, are those two monitors running the same version of ceph?
[1:06] <kyle_> hmm actually no they are not one is 0.50 and one is 0.52
[1:07] <kyle_> is it possible to just rebuild the entire cluster from scratch but keep the 500 or so gigs that have been copied to it?
[1:07] <gregaf> nope, that's not going to work
[1:08] <kyle_> okay. understandable. i may just start over, since i have a much better understanding of things now
[1:09] <gregaf> I'm uncertain why it's not joining quorum with the other monitor, though
[1:09] <gregaf> I don't think any feature bits got added between .50 and .52
[1:09] <joao> does 0.52 the latest master?
[1:09] <joao> err, s/does/is/
[1:10] <joao> and by that I mean, does 0.52 already include wip-mon-gv? :)
[1:11] <kyle_> oh that could be it. i'm not using the tagged version. only for the osd node.
[1:11] <kyle_> i think i may start over with everything tagged .52
[1:12] <kyle_> if everything is working right it shouldn't take more than a couple days to get that 500GB copied back over once up.
[1:12] <joao> kyle_, I think you are experiencing exactly the same issue I am about to debug
[1:12] <gregaf> if that works for you!
[1:13] <joao> I mean, if you are using the latest master in the 'newer' monitor
[1:13] <kyle_> yes i am
[1:13] <gregaf> couple days? that's awfully slow, isn't it?
[1:13] <kyle_> yeah it is but when the memory was maxing out and hitting swap, it took like 5 days
[1:14] <gregaf> ah right
[1:14] <gregaf> it should be much faster than that now ;)
[1:14] <kyle_> yeah so i should be alright considering that
[1:15] <kyle_> my main issue was the DRBD i'm migrating from is running out of space so i'm kind of time constrained to have this up before it get too full
[1:22] * johnl (~johnl@2a02:1348:14c:1720:2cc6:1331:fad8:91a8) Quit (Remote host closed the connection)
[1:22] * johnl (~johnl@2a02:1348:14c:1720:edf2:aa8:a67e:4d3b) has joined #ceph
[1:23] <joao> sagewk, was unable to reproduce it
[1:24] <joao> 3 monitors; first ran all 3 using argonaut, then killed them all; then brought only 2 using argonaut, the third using master: nothing popped up
[1:24] <joao> then killed each other of the remaining 2 running argonaut and brought them up using master, and all looks good
[1:25] <joao> maybe there was more to it than just the 2:1 conf?
[1:27] <gregaf> well, obviously there was a client involved that was spamming the new one with attempts to authorize
[1:29] <sagewk> 0.52 does not have wip-mon-gv
[1:30] <joao> sagewk, kyle_ was using master for the monitor
[1:30] * lofejndif (~lsqavnbok@83TAABCZK.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[1:30] <sagewk> ah
[1:30] <sagewk> so, weird coincidence that this is the same bug you were looking at?
[1:30] * loicd1 (~loic@2a01:e35:2eba:db10:120b:a9ff:feb7:cce0) Quit (Quit: Leaving.)
[1:31] <joao> given the output and behavior, would seem that way
[1:32] <sagewk> weird!
[1:32] * jlogan (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[1:32] <sagewk> anyway, it's happened/happening on burnupi03 if you wanna look there. and/or check with houkouonchi-work for the other machines involved
[1:33] <joao> what is houkouonchi-work?
[1:33] <sagewk> sandon
[1:33] <houkouonchi-work> me
[1:33] <houkouonchi-work> =P
[1:33] <joao> oh
[1:33] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:33] <joao> right! :p
[1:33] <sagewk> is that something like "hoo-koo-on-chi"?
[1:34] <joao> sagewk, second run at it, same configuration, ran ./ceph -w on the sidelines; no joy :\
[1:34] <houkouonchi-work> it said like 'hoe koh ohn chi'
[1:34] <joao> I would read it as hôkô-onchi
[1:35] <houkouonchi-work> its 方向音痴 which means no sense of direction in Japanese
[1:35] <houkouonchi-work> yeah its long o's
[1:35] <Cube1> which is fitting for sandon...
[1:35] <Cube1> :)
[1:35] * dty (~derek@pool-71-178-175-208.washdc.fios.verizon.net) has joined #ceph
[1:36] <houkouonchi-work> joao: and actually writing it using Hepburn's roomaji system it would be hōkōonchi
[1:41] * scuttlemonkey (~scuttlemo@ has joined #ceph
[1:44] <Tv_> hrmmph..
[1:44] <Tv_> sagewk: have a moment?
[1:44] * edv (~edjv@107-1-75-186-ip-static.hfc.comcastbusiness.net) has left #ceph
[1:47] <joao> sagewk, houkouonchi-work, what other machines belong to this cluster?
[1:49] * aliguori (~anthony@cpe-70-123-140-180.austin.res.rr.com) Quit (Remote host closed the connection)
[1:49] <joao> nevermind; found the admin socket :p
[2:10] * Tv_ (~tv@2607:f298:a:607:b899:20f7:e1bb:234c) Quit (Quit: Tv_)
[2:11] * dty (~derek@pool-71-178-175-208.washdc.fios.verizon.net) Quit (Quit: dty)
[2:12] * dty (~derek@pool-71-178-175-208.washdc.fios.verizon.net) has joined #ceph
[2:16] * dty_ (~derek@pool-71-178-175-208.washdc.fios.verizon.net) has joined #ceph
[2:16] * dty (~derek@pool-71-178-175-208.washdc.fios.verizon.net) Quit (Read error: Connection reset by peer)
[2:16] * dty_ is now known as dty
[2:21] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:32] <joao> sagewk, still no luck
[2:32] <joao> will look into it again tomorrow
[2:33] <joao> night #ceph
[2:35] * slang (~slang@ace.ops.newdream.net) Quit (Ping timeout: 480 seconds)
[2:50] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[2:51] * chutzpah (~chutz@ Quit (Quit: Leaving)
[2:51] <gregaf> wido: congrats on your launch!
[2:52] <gregaf> (http://www.42on.com/)
[2:52] * loicd (~loic@magenta.dachary.org) has joined #ceph
[2:53] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[3:05] * sagelap (~sage@216.sub-70-197-146.myvzw.com) has joined #ceph
[3:07] * Karcaw (~evan@68-186-68-219.dhcp.knwc.wa.charter.com) Quit (Ping timeout: 480 seconds)
[3:12] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Quit: leaving)
[3:15] * dmick (~dmick@2607:f298:a:607:1a03:73ff:fedd:c856) Quit (Quit: Leaving.)
[3:16] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[3:18] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[3:18] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[3:20] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Quit: Leaving.)
[3:23] <iggy> wido: on the front page - "You can storage millions of objects through language bindings for..." You can store... maybe
[3:24] <iggy> wido: also "and various other Open source project surrounding it and we will continue to do so" ... projects ... continue to be?
[3:26] <iggy> wido: "Our experiences with storage date back more then 10 years" not technically wrong, but in normal/USian English, you'd hear it more like "Our experience with storage dates back more then 10 years" (singular experience, plural dates)
[3:26] * Karcaw (~evan@68-186-68-219.dhcp.knwc.wa.charter.com) has joined #ceph
[3:26] <iggy> wido: "It should be solid and well thought of, if your storage fails" ... well thought out
[3:27] <iggy> grammar nazi out
[3:28] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[3:29] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) Quit ()
[3:30] * sagelap (~sage@216.sub-70-197-146.myvzw.com) Quit (Ping timeout: 480 seconds)
[3:35] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[3:35] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Read error: Connection reset by peer)
[3:38] * Cube1 (~Adium@ Quit (Quit: Leaving.)
[3:40] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[3:43] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[3:47] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[3:49] * loicd (~loic@magenta.dachary.org) has joined #ceph
[4:21] * scuttlemonkey (~scuttlemo@ Quit (Quit: zzzzzzzzzzzzzzzzzzzz)
[4:27] * LarsFronius (~LarsFroni@ip-109-47-0-115.web.vodafone.de) has joined #ceph
[4:36] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[4:38] * LarsFronius (~LarsFroni@ip-109-47-0-115.web.vodafone.de) Quit (Quit: LarsFronius)
[4:45] * maelfius (~mdrnstm@ Quit (Quit: Leaving.)
[4:46] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[4:55] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[5:07] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[5:13] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[5:20] * amatter (~amatter@ has joined #ceph
[5:24] * nhm (~nhm@174-20-35-45.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[5:25] * amatter_ (~amatter@ Quit (Ping timeout: 480 seconds)
[5:38] * scuttlemonkey (~scuttlemo@ has joined #ceph
[5:52] * LarsFronius (~LarsFroni@2a02:8108:3c0:79:29b6:cf95:db28:2c6c) has joined #ceph
[5:56] * sagelap (~sage@ has joined #ceph
[6:01] * LarsFronius (~LarsFroni@2a02:8108:3c0:79:29b6:cf95:db28:2c6c) Quit (Quit: LarsFronius)
[6:07] * DonaHolmberg (~Adium@cpe-76-173-241-97.socal.res.rr.com) has joined #ceph
[6:11] * nhm (~nhm@174-20-35-45.mpls.qwest.net) has joined #ceph
[6:20] * The_Bishop (~bishop@2001:470:50b6:0:6814:9778:4a30:fd6d) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[6:31] * Cube1 (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[6:36] * tomaw (tom@tomaw.netop.oftc.net) Quit (Ping timeout: 600 seconds)
[6:45] * deepsa (~deepsa@ Quit (Read error: Connection reset by peer)
[6:47] * deepsa (~deepsa@ has joined #ceph
[6:49] * miroslavk (~miroslavk@c-98-248-210-170.hsd1.ca.comcast.net) has joined #ceph
[6:52] * miroslavk (~miroslavk@c-98-248-210-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[7:02] * mosu001 (~mosu001@en-439-0331-001.esc.auckland.ac.nz) Quit (Read error: Connection reset by peer)
[7:21] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) has joined #ceph
[7:45] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[8:00] * SvenDowideit (~SvenDowid@203-206-171-38.perm.iinet.net.au) Quit (Ping timeout: 480 seconds)
[8:08] * DonaHolmberg (~Adium@cpe-76-173-241-97.socal.res.rr.com) Quit (Quit: Leaving.)
[8:08] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[8:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:27] * gaveen (~gaveen@ has joined #ceph
[8:47] * deepsa_ (~deepsa@ has joined #ceph
[8:49] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[8:49] * deepsa_ is now known as deepsa
[9:02] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[9:02] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[9:07] * verwilst (~verwilst@d5152D6B9.static.telenet.be) has joined #ceph
[9:08] * amatter_ (~amatter@ has joined #ceph
[9:13] * amatter (~amatter@ Quit (Ping timeout: 480 seconds)
[9:14] * Cube1 (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[9:18] * Leseb (~Leseb@ has joined #ceph
[9:23] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[9:25] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:27] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:37] * loicd (~loic@jem75-2-82-233-234-24.fbx.proxad.net) has joined #ceph
[9:44] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[9:53] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[9:53] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[9:58] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[9:58] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[10:00] * pradeep (~6a3379fc@2600:3c00::2:2424) has joined #ceph
[10:40] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:42] * tomaw_ is now known as tomaw
[10:46] <joao> iggy, wido, "dates back more then 10 years" -> "dates back more than 10 years" :p
[10:48] * alexxy[home] (~alexxy@ has joined #ceph
[10:51] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Ping timeout: 480 seconds)
[10:57] * idnc_sk (~idnc_sk@ has joined #ceph
[10:57] <idnc_sk> hi
[10:57] <joao> hello
[10:59] * idnc_sk (~idnc_sk@ Quit ()
[10:59] <Leseb> hi all
[11:01] * idnc_sk (~idnc_sk@ has joined #ceph
[11:02] <idnc_sk> sorry - doing a linux introduction to a non-linux guy - first baby steps
[11:02] <idnc_sk> .. :)
[11:02] <idnc_sk> irc was on the attraction list right after package managers
[11:04] <idnc_sk> nooo worries, I'm searching for a suitable RTFM first wallpaper for him
[11:06] <joao> consider yourself lucky that guy is not a BSc in Law trying out linux on a whim, and wondering why linux has a window manager and not just a CLI :p
[11:10] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[11:10] * LarsFronius (~LarsFroni@2a02:8108:3c0:79:7586:c2c0:a1b7:4037) has joined #ceph
[11:11] <joao> besides, idnc_sk, why not just use a wallpaper like this one instead of a rtfm one? ;) http://etix.files.wordpress.com/2011/01/ubuntucommands.png?w=1200
[11:11] <joao> that resolution sucks though
[11:12] <idnc_sk> joao: great
[11:12] <idnc_sk> fw him the link - btw, smart kid
[11:12] * LarsFronius_ (~LarsFroni@2a02:8108:3c0:79:fd06:aa61:f88b:5d87) has joined #ceph
[11:12] <idnc_sk> heh
[11:12] <idnc_sk> ok, have to go, cu later
[11:13] * idnc_sk (~idnc_sk@ Quit (Quit: leaving)
[11:19] * LarsFronius (~LarsFroni@2a02:8108:3c0:79:7586:c2c0:a1b7:4037) Quit (Ping timeout: 480 seconds)
[11:19] * LarsFronius_ is now known as LarsFronius
[11:20] * Cube1 (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[11:21] * Cube1 (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit ()
[11:36] * pradeep (~6a3379fc@2600:3c00::2:2424) Quit (Quit: TheGrebs.com CGI:IRC)
[11:41] * Leseb_ (~Leseb@ has joined #ceph
[11:41] * Leseb (~Leseb@ Quit (Read error: Connection reset by peer)
[11:41] * Leseb_ is now known as Leseb
[11:56] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:28] * SvenDowideit (~SvenDowid@203-206-171-38.perm.iinet.net.au) has joined #ceph
[12:30] * lofejndif (~lsqavnbok@83TAABDPB.tor-irc.dnsbl.oftc.net) has joined #ceph
[12:35] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[12:37] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[12:37] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[12:42] * xiu (~xiu@ Quit (Quit: leaving)
[12:42] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[12:42] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[12:43] * xiu (~xiu@ has joined #ceph
[12:45] * loicd (~loic@jem75-2-82-233-234-24.fbx.proxad.net) Quit (Quit: Leaving.)
[12:46] * loicd (~loic@jem75-2-82-233-234-24.fbx.proxad.net) has joined #ceph
[12:53] * stxShadow (~Jens@ip-178-203-169-190.unitymediagroup.de) has joined #ceph
[12:59] * loicd (~loic@jem75-2-82-233-234-24.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[13:06] * loicd (~loic@magenta.dachary.org) has joined #ceph
[13:08] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[13:08] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[13:10] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[13:10] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Remote host closed the connection)
[13:11] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[13:15] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[13:15] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[14:04] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Quit: Leaving.)
[14:11] * BManojlovic (~steki@ has joined #ceph
[14:11] * stxShadow (~Jens@ip-178-203-169-190.unitymediagroup.de) Quit (Read error: Connection reset by peer)
[14:23] * lofejndif (~lsqavnbok@83TAABDPB.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[14:45] * EmilienM (~EmilienM@ has joined #ceph
[14:47] * gohko_ (~gohko@natter.interq.or.jp) has joined #ceph
[14:47] * MikeMcClurg (~mike@ has joined #ceph
[14:47] * aliguori (~anthony@cpe-70-123-140-180.austin.res.rr.com) has joined #ceph
[14:50] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[14:50] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[14:52] * gohko (~gohko@natter.interq.or.jp) Quit (Ping timeout: 480 seconds)
[15:05] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[15:14] * glowell (~Adium@c-98-210-224-250.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[15:14] * slang (~slang@ace.ops.newdream.net) has joined #ceph
[15:14] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[15:33] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[15:53] <joao> houkouonchi-work, around?
[16:10] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[16:10] * loicd (~loic@ has joined #ceph
[16:10] * dty (~derek@pool-71-178-175-208.washdc.fios.verizon.net) Quit (Quit: dty)
[16:20] * cblack101 (c0373626@ircip3.mibbit.com) has joined #ceph
[16:24] * DonaHolmberg (~Adium@cpe-76-173-241-97.socal.res.rr.com) has joined #ceph
[16:25] * EmilienM (~EmilienM@ Quit (Ping timeout: 480 seconds)
[16:26] * dty (~derek@129-2-129-153.wireless.umd.edu) has joined #ceph
[16:30] * EmilienM (~EmilienM@ has joined #ceph
[16:32] * DonaHolmberg (~Adium@cpe-76-173-241-97.socal.res.rr.com) has left #ceph
[16:42] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[16:46] * dty_ (~derek@129-2-129-153.wireless.umd.edu) has joined #ceph
[16:46] * dty (~derek@129-2-129-153.wireless.umd.edu) Quit (Read error: Connection reset by peer)
[16:46] * dty_ is now known as dty
[16:48] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[16:50] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[16:59] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[17:04] * spaceman139642 is now known as spaceman-39642
[17:06] * sagelap (~sage@2600:1013:b017:a67a:5937:7102:3b80:124d) has joined #ceph
[17:09] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:13] * kyle_ (~kyle@ip03.foxyf.simplybits.net) Quit (Quit: Leaving)
[17:15] * slang (~slang@ace.ops.newdream.net) Quit (Remote host closed the connection)
[17:16] * verwilst (~verwilst@d5152D6B9.static.telenet.be) Quit (Quit: Ex-Chat)
[17:17] * dty_ (~derek@129-2-129-153.wireless.umd.edu) has joined #ceph
[17:17] * dty (~derek@129-2-129-153.wireless.umd.edu) Quit (Read error: Connection reset by peer)
[17:17] * dty_ is now known as dty
[17:17] * scuttlemonkey (~scuttlemo@ Quit (Ping timeout: 480 seconds)
[17:22] * dabeowul1 (dabeowulf@free.blinkenshell.org) Quit (Remote host closed the connection)
[17:22] * tryggvil (~tryggvil@163-60-19-178.xdsl.simafelagid.is) has joined #ceph
[17:24] * Tv_ (~tv@2607:f298:a:607:5c1e:e9a0:aa30:35e7) has joined #ceph
[17:26] * ninkotech_ (~duplo@ Quit (Quit: Konversation terminated!)
[17:26] * ninkotech (~duplo@ has joined #ceph
[17:28] * dabeowulf (dabeowulf@free.blinkenshell.org) has joined #ceph
[17:29] * loicd1 (~loic@ has joined #ceph
[17:29] * loicd (~loic@ Quit (Read error: No route to host)
[17:34] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[17:36] * scuttlemonkey (~scuttlemo@ has joined #ceph
[17:38] * EmilienM (~EmilienM@ Quit (Read error: No route to host)
[17:39] * dty (~derek@129-2-129-153.wireless.umd.edu) Quit (Quit: dty)
[17:43] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Remote host closed the connection)
[17:44] * jlogan1 (~Thunderbi@ has joined #ceph
[17:45] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[17:46] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[17:47] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[17:50] * sagelap (~sage@2600:1013:b017:a67a:5937:7102:3b80:124d) Quit (Ping timeout: 480 seconds)
[17:52] * sagelap1 (~sage@ has joined #ceph
[17:52] <sagewk> glowell: pushed some edits to release-process.rst, look ok?
[17:52] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[17:56] <glowell> Yes, thanks for the corrections.
[18:03] <sagewk> glowell: let's fix the the gen_reprepro_conf so that it gets rid of the current weird dists/components stuff and pulls the list out of $bindir/deb_dists
[18:03] <sagewk> that'll simplify somewhat
[18:05] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[18:07] * Leseb (~Leseb@ Quit (Quit: Leseb)
[18:09] * gaveen (~gaveen@ Quit (Ping timeout: 480 seconds)
[18:09] <slang> I pushed some changes to teuthology to allow for easier deployment elsewhere
[18:09] <slang> its in wip-gendeploy
[18:09] <slang> joshd: want to take a look?
[18:10] <glowell> sagewk: ok, I was also looking at gen_reprepro_conf for the fix to Dan's complaint about the Content-<arch>.gz file not being generated.
[18:10] <slang> joshd: I'm not sure how locker.py was getting started on sepia -- I had to add the main to get it to run
[18:11] <joshd> slang: yeah, I just noticed those
[18:12] <slang> hmm...I should probably put the config values back to what they were :-)
[18:12] <joshd> locker.py is being run by apache with mod_wsgi on sepia
[18:12] <slang> ah
[18:13] <joshd> but in general it's a lot easier to get running without worrying about the locking infrastructure (i.e. check_locks: false)
[18:14] <slang> I just saw sage's email
[18:15] <sagewk> joshd, slang: maybe teuthology should default to check-locks: false behavior when the .teuthology.yaml doesn't specify a lock server.
[18:15] <joshd> the user/host genericizing make sense
[18:15] <sagewk> that will make it less painful for casual users
[18:15] <joshd> sagewk: that's a good idea
[18:16] <slang> how does it find a list of machines without the lock server?
[18:16] <joshd> teuthology-{schedule,suite,worker} should have a useful error message when queue_server isn't defined too
[18:16] * gregaf (~Adium@ Quit (Remote host closed the connection)
[18:16] * loicd1 (~loic@ Quit (Ping timeout: 480 seconds)
[18:16] * gregaf (~Adium@ has joined #ceph
[18:16] <joshd> slang: you can specify the machines to run on in a targets: section in your config
[18:17] <slang> joshd: I see
[18:17] <gregaf> we'll need to be super-careful about our processes in that case, or every new hire's first teuthology run will smash into somebody else's ;)
[18:18] * gaveen (~gaveen@ has joined #ceph
[18:18] <slang> joshd: the workers adding themselves might make the process of setting up a lock server a lot more straightforward - the locker could even add the table to the database if it doesn't exist yet
[18:18] <joshd> gregaf: that's why it'd only happen before they configure their .teuthology.yaml. they aren't likely to get a list of targets to run on before then
[18:19] <joshd> slang: that's another part that's not necessary for the base setup though - you can run jobs without having them go into a queue and run by workers
[18:20] <slang> oh no workers either
[18:20] <slang> well snap
[18:20] <joshd> slang: a while ago teuthology-suite would just run everything itself instead of putting jobs in a queue - it wouldn't be hard to restore that functionality if e.g. queue_server isn't defined
[18:21] <joshd> your changes are certainly good if someone wants to set up the whole infrastructure, but it's not the easiest way to get going
[18:22] <gregaf> oh lol, I didn't realize everything routed through the queue server now :p
[18:23] <joshd> the canonicalize_hostname stuff could read default values from .teuthology.yaml as well (loaded by teuthology.misc.read_config)
[18:24] <slang> I guess I can see that from the doc as I read it now, but the 'Reserving target machines' section makes it seem like the locking stuff is necessary
[18:25] <slang> joshd: you mean for ubuntu/sepia defaults?
[18:26] <joshd> yeah
[18:26] <slang> user: slang
[18:26] <slang> default-user: ubuntu
[18:26] <slang> *shrugs*
[18:27] <slang> that seems a bit confusing to me
[18:28] * miroslavk (~miroslavk@c-98-248-210-170.hsd1.ca.comcast.net) has joined #ceph
[18:28] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[18:29] <joshd> slang: it's not too important, I've just tried to put any setup-specific stuff like hosts in .teuthology.yaml
[18:34] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:36] * miroslavk (~miroslavk@c-98-248-210-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:39] * allsystemsarego (~allsystem@ has joined #ceph
[18:43] * edv (~edjv@ has joined #ceph
[18:45] * tryggvil (~tryggvil@163-60-19-178.xdsl.simafelagid.is) Quit (Quit: tryggvil)
[18:46] * adjohn (~adjohn@ has joined #ceph
[18:56] * MikeMcClurg (~mike@ Quit (Quit: Leaving.)
[18:57] * adjohn (~adjohn@ Quit (Quit: adjohn)
[18:59] * adjohn (~adjohn@ has joined #ceph
[18:59] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[18:59] * MishMash (~mishmash@5355D083.cm-6-6d.dynamic.ziggo.nl) has joined #ceph
[19:00] * scuttlemonkey (~scuttlemo@ Quit (Quit: zzzzzzzzzzzzzzzzzzzz)
[19:00] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[19:09] * adjohn (~adjohn@ Quit (Quit: adjohn)
[19:10] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[19:12] * chutzpah (~chutz@ has joined #ceph
[19:12] * LarsFronius (~LarsFroni@2a02:8108:3c0:79:fd06:aa61:f88b:5d87) Quit (Quit: LarsFronius)
[19:31] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:32] * yehudasa_ (~yehudasa@ has joined #ceph
[19:32] <stan_theman> i'll be on call
[19:32] <stan_theman> ww
[19:34] * maelfius (~mdrnstm@ has joined #ceph
[19:41] * LarsFronius (~LarsFroni@2a02:8108:3c0:79:4862:93e3:9b56:6cce) has joined #ceph
[19:46] * dmick (~dmick@2607:f298:a:607:1a03:73ff:fedd:c856) has joined #ceph
[19:46] * miroslavk (~miroslavk@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[19:51] * pentabular (~sean@adsl-70-231-128-149.dsl.snfc21.sbcglobal.net) has joined #ceph
[19:51] <pentabular> Tv_:
[19:51] <pentabular> https://github.com/seanchannel/ceph/commit/bd425c06fedd6e18659120ba142d4586668b8a9e#admin/build-doc
[19:52] <pentabular> easy peasy and just how you like it :)
[19:54] * scuttlemonkey (~scuttlemo@ has joined #ceph
[19:56] * yehudasa_ (~yehudasa@ Quit (Ping timeout: 480 seconds)
[19:57] * amatter_ (~amatter@ Quit (Remote host closed the connection)
[19:57] * amatter (~amatter@ has joined #ceph
[20:00] * Kioob (~kioob@luuna.daevel.fr) Quit (Ping timeout: 480 seconds)
[20:10] * gaveen (~gaveen@ Quit (Ping timeout: 480 seconds)
[20:11] * gaveen (~gaveen@ has joined #ceph
[20:20] <joao> has anyone seen this?
[20:20] <joao> 2012-10-03 11:19:23.203782 7f7448951700 -1 msg/SimpleMessenger.cc: In function 'void SimpleMessenger::Pipe::register_pipe()' thread 7f7448951700 time 2012-10-03 11:19:23.202133
[20:20] <joao> msg/SimpleMessenger.cc: 1297: FAILED assert(msgr->rank_pipe.count(peer_addr) == 0)
[20:21] <sagewk> yeah.. on argonaut.
[20:22] <sagewk> but not v0.52 or later.. which crashed?
[20:22] <joao> argonaut
[20:22] <sagewk> ok.
[20:22] <joao> ran just fine the second time around
[20:22] <sagewk> yeah, it's a hard race to hit.
[20:22] <sagewk> the fix is extensive, not sure if it'll be merged into argonaut or not.
[20:22] <joao> well, there it goes again
[20:23] <joao> maybe third time's the charm? :p
[20:23] <joao> yep, it was
[20:32] * morpheus__ (~morpheus@foo.morphhome.net) has joined #ceph
[20:32] <Tv_> pentabular: so the --reinstall is to ensure it's a sphinx-in-venv not python-sphinx.deb?
[20:32] <Tv_> pentabular: i'm not sure why that has to be done
[20:33] * amatter_ (~amatter@ has joined #ceph
[20:33] <pentabular> if you have sphinx already installed you need all those options to force it to be compiled into the virtualenv
[20:33] <pentabular> that way it ignores your system-installed sphinx and copes gracefully
[20:33] <Tv_> pentabular: but why?
[20:33] * amatter (~amatter@ Quit (Ping timeout: 480 seconds)
[20:34] <pentabular> ah: because you were right
[20:34] <pentabular> ..can
[20:34] <pentabular> ...can't use sphinx from system installed package
[20:34] <Tv_> pentabular: no my reason was that the sphinx.deb at the time was too old
[20:35] <pentabular> ..... https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1056103/comments/6
[20:35] <Tv_> that's just not understanding virtualenv
[20:35] <pentabular> even if you have sphinx already installed (as I do for other things), you cannot make use of asphyxiate and sphinx-ditaa
[20:36] <Tv_> pentabular: you put the extensions in virtualenv; you need to have that virtualenv active
[20:36] <pentabular> yes, even with it active the system-package for sphinx does not pick them up
[20:36] <Tv_> pentabular: running /usr/bin/sphinx-build does not achieve that
[20:36] * amatter (~amatter@ has joined #ceph
[20:37] <pentabular> by the way: this sped up the docs build by 10 min.
[20:37] <Tv_> pentabular: ugly kludge: PYTHONPATH=`pwd`/../src/pybind ./virtualenv/bin/python /usr/bin/sphinx-build -a -b dirhtml -d doctrees ../doc output/html
[20:37] <pentabular> ^ ^ that's not my code. ;)
[20:37] <pentabular> ..existing
[20:37] <Tv_> pentabular: read what i pasted
[20:38] <Tv_> pentabular: i'm running sphinx-build from python-sphinx.deb using the python from the virtualenv
[20:39] <pentabular> oh.. you're using said "ugly kludge" ?
[20:39] * loicd (~loic@ has joined #ceph
[20:39] <Tv_> pentabular: it assumes sphinx-build is in /usr/bin instead of doing a path lookup, so it's not all that pretty
[20:39] <Tv_> and if sphinx-build were to become a shell wrapper that execs something etc, it'd stop working
[20:39] <Tv_> but there's no sphinx in my virtualenv
[20:40] <Tv_> so take that part out of your patch, clean up that kludge if you can find a good way ;) and i'll merge your branch
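The kludge Tv_ pasted hardcodes /usr/bin/sphinx-build. A possible cleanup, sketched here only as an illustration (the paths mirror the pasted command and are not from the actual build-doc script), is to resolve sphinx-build through PATH first; the echo just shows the command that would be run:

```shell
# Hypothetical cleanup of the hardcoded-path kludge: resolve sphinx-build
# via PATH, falling back to /usr/bin, then run it under the virtualenv's
# python. The echo is illustrative; build-doc would execute the command.
sphinx_build="$(command -v sphinx-build || echo /usr/bin/sphinx-build)"
echo "PYTHONPATH=\$(pwd)/../src/pybind ./virtualenv/bin/python $sphinx_build -a -b dirhtml -d doctrees ../doc output/html"
```

This still breaks if sphinx-build ever becomes a shell wrapper that execs something else, as Tv_ notes below.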
[20:41] * amatter_ (~amatter@ Quit (Ping timeout: 480 seconds)
[20:41] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[20:41] <pentabular> hm.. okay. but unclear on who/which kludge we mean (looking..)
[20:42] <Tv_> compare that line and what's in master admin/build-doc
[20:43] <pentabular> got it.
[20:43] <pentabular> haven't tried it that way.
[20:44] <Tv_> now, if the system sphinx happens to be old enough, and pip -r installs >=1.1.2 according to admin/doc-requirements.txt, then /usr/bin/sphinx-build is still the old one, and argh
[20:45] <Tv_> it'll keep working as long as /usr/bin/sphinx-build is a tiny python wrapper that just imports a module
[20:45] * amatter_ (~amatter@ has joined #ceph
[20:45] <pentabular> right, so I figured forcing it to build sphinx in the venv let you have whatever system packages you want and the rest are dealt with by pip
[20:45] <pentabular> no-breaky
[20:45] <Tv_> pentabular: also, if you do this, then please add check for python-sphinx into the dpkg packages and sphinx-build into the else branch command list
[20:46] <pentabular> well, okay, but pip already does that, hence the forceful args
[20:45] <Tv_> pentabular: is --system-site-packages all that useful if sphinx still needs to be installed? that's like the biggest package that goes in there
[20:46] <pentabular> sphinx is not that big at all, and this shortened the docs build by 10 min.
[20:47] <Tv_> curious about what's slow then
[20:47] <pentabular> compiling everything in venv
[20:47] <Tv_> compiling?
[20:47] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[20:47] <Tv_> do you mean .pyc, or is there some c extension getting installed?
[20:48] <pentabular> right now everything from doc-requirements.txt gets downloaded and built within the venv, and that's what takes so long.
[20:48] <pentabular> I have most of that stuff in system packages, so no need.
[20:48] <Tv_> that's 3 lines
[20:48] <Tv_> or is this about sphinx's dependencies?
[20:49] <pentabular> er.. actually it's all the dependencies I think
[20:49] * amatter (~amatter@ Quit (Ping timeout: 480 seconds)
[20:49] <pentabular> sphinx's deps, that is
[20:49] * cblack101 (c0373626@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[20:49] <Tv_> sounds like downloading docutils takes ages
[20:49] <Tv_> i wonder if their webhosting is busted
[20:49] <Tv_> ohh lxml
[20:49] <Tv_> how i hate you so
[20:49] <Tv_> ok that explains everything
[20:50] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[20:50] * slang (~slang@ace.ops.newdream.net) has joined #ceph
[20:50] <pentabular> hm.. checking that
[20:51] <Tv_> pentabular: yeah it's lxml that's slow
[20:51] <Tv_> pentabular: ok so one last question in my mind is, whether to install sphinx in venv or not
[20:51] <Tv_> pentabular: i'm off to a meeting, sorry!
[20:51] <pentabular> cheers!
[20:53] * amatter (~amatter@ has joined #ceph
[20:57] * amatter_ (~amatter@ Quit (Ping timeout: 480 seconds)
[21:01] * slang (~slang@ace.ops.newdream.net) Quit (Ping timeout: 480 seconds)
[21:13] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[21:14] <stan_theman> is ceph-debugpack still the preferred way of submitting information with a report?
[21:19] <Tv_> stan_theman: if it isn't, we'll need to improve it ;)
[21:20] <amatter> hi guys. still working on troubleshooting a significant performance issue where we write some data then the whole cluster comes to a halt. See rados bench here: http://pastebin.com/qVt9C0Mc. 7 osds running 12.04 on raid0 7.2k btrfs with latest 3.5.4 kernel.
[21:20] <Tv_> amatter: please check dmesg on your osd nodes, just to eliminate reasons
[21:22] <Tv_> amatter: but that's not just bad performance, that's a hang that lasts quite a long time
[21:22] <Tv_> "finished" sitting at 34
[21:23] <amatter> no messages in dmesg, once the bench starts all of the osds start complaining about slow ops
[21:23] <Tv_> amatter: yeah i see three options: 1. underlying fs broke 2. underlying block device broke 3. ceph bug
[21:23] <amatter> here's a dd on one of the osds: dd if=/dev/zero of=/data/test bs=4096 count=409600 conv=fdatasync : 1677721600 bytes (1.7 GB) copied, 10.9161 s, 154 MB/s
[21:24] <Tv_> nhm: do you have time? ^
[21:25] * danieagle (~Daniel@ has joined #ceph
[21:25] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[21:26] <amatter> Tv_: yes, I've been trying to isolate where the problem is. At first I thought btrfs 3.2 was broken, so now I've just updated everything to 3.5.4, but maybe I need to delete the volumes and have them rebuild in case the performance issue is being grandfathered in by the newer btrfs kernel code
[21:27] <Tv_> amatter: oh yeah btrfs definitely experiences bad aging still
[21:27] * MishMash (~mishmash@5355D083.cm-6-6d.dynamic.ziggo.nl) has left #ceph
[21:28] <amatter> Tv_: I think I'm just going to abandon btrfs and switch each osd over to xfs.
[21:29] <sagewk> amatter: the btrfs performance issues are related to fragmentation. the new code is better, but an fs previously fragmented by the old code won't get better by upgrading the kernel
[21:29] <amatter> is xfs or ext4 the better alternative to btrfs?
[21:30] <sagewk> we've been using xfs. nhm would know more about specific ext4 performance behavior
[21:31] <amatter> sagewk: thanks.
[21:31] <sagewk> np :)
[21:32] <Tv_> sagewk: last i heard ext4+leveldb was kinda in between xfs & btrfs (pre-fragmentation) in performance
[21:32] <Tv_> sagewk: but he hadn't done much aging, i recall
[21:32] <amatter> sagewk: would that fragmentation cause poor writes on the bench that I'm seeing or just slow access to existing data?
[21:32] <sagewk> both. it's metadata fragmentation that is the issue, so it'll affect reads and writes
[21:32] <Tv_> sagewk: oh also, that reminds me.. what do we want to do with depending/not depending on xfsprogs and btrfs-tools?
[21:33] <Tv_> sagewk: ceph-disk-prepare still has ext4 by default because mkfs.ext4 is the one i can rely on
[21:33] <sagewk> is there any reason *not* to depend on them? as long as they are in main on ubuntu and whatever the debian equivalent is
[21:33] <Tv_> sagewk: yeah though ubuntu is no guarantee of suse/centos/...
[21:34] <Tv_> they are both in main in 12.04
[21:34] <sagewk> the only loser there will be btrfsprogs and centos/rhel, i think
[21:34] <sagewk> so maybe make xfsprogs required, btrfsprogs recommended, and default to xfs?
[21:35] <sagewk> i don't think we should make ext4 the default when we don't use/test it as much
[21:35] <Tv_> sagewk: works for me, i'll put commits into wip-fstypes
[21:35] <Tv_> sagewk: yeah it's a sad default, always installable but not necessarily well-tested
[21:41] <Tv_> sagewk: ohh crap gdisk is in universe
[21:41] <sagewk> ah man... sigh
[21:42] <Tv_> sagewk: it's the only scriptable gpt editor i've been able to find, and ceph-disk-prepare needs it; it's currently a Recommends: (plus installed explicitly by ceph-deploy install)
[21:42] <sagewk> that is probably ok
[21:42] <Tv_> that means ceph-disk-prepare without ceph-deploy will often fail, for newbies
[21:43] <Tv_> we can improve the error message, though
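Since gdisk lives in universe, a friendlier failure mode is cheap to add. A hedged sketch of what "improve the error message" could look like (this is not the actual ceph-disk-prepare code, just an illustration):

```shell
# Hedged sketch (not the real ceph-disk-prepare code): point the user at
# the 'gdisk' package instead of failing with a bare "command not found".
check_sgdisk() {
    if ! command -v sgdisk >/dev/null 2>&1; then
        echo "error: sgdisk not found; install the 'gdisk' package" >&2
        return 1
    fi
}
check_sgdisk || true   # demo call; real code would abort here instead
```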
[21:43] <Tv_> not a showstopper
[21:43] <pentabular> $.02 ubuntu installs all 'recommends' by default, fwiw, tho users like me tend to change that
[21:43] <Tv_> but fundamentally, we'll be actively using software from universe
[21:43] <Tv_> pentabular: yeah except a lot of automated install things explicitly ask to not have that, as Recommends: gets soo much stuff
[21:44] <pentabular> indeed.
[21:44] <Tv_> but perhaps that'll help the newbies just enough
[21:44] * Cube (~Adium@ has joined #ceph
[21:50] <Tv_> sagewk: pushed to wip-fstypes
[21:52] <sagewk> tv_: k
[21:54] * pentabular (~sean@adsl-70-231-128-149.dsl.snfc21.sbcglobal.net) Quit (Quit: pentabular)
[21:58] * cblack101 (86868949@ircip4.mibbit.com) has joined #ceph
[21:58] <nhm> amatter: heya, sorry, was working on other stuff.
[21:59] <amatter> nhm: hi
[21:59] <nhm> amatter: what kind of write was causing the stall?
[22:00] * adjohn (~adjohn@ has joined #ceph
[22:00] <amatter> nhm: pretty much any sustained write, but in this case a bench from rados bench. There is no other load on the cluster.
[22:01] <nhm> amatter: Does the write itself stall, or just other stuff while the write is going?
[22:01] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[22:01] <amatter> nhm: the write itself: see http://pastebin.com/qVt9C0Mc
[22:01] * Cube (~Adium@ Quit (Ping timeout: 480 seconds)
[22:02] <nhm> amatter: How many OSDs?
[22:03] * Cube (~Adium@ has joined #ceph
[22:03] <amatter> nhm: 7 identical osds, raid0 btrfs on 12.04 3.5.4. Here's a result of a representative dd on an osd: dd if=/dev/zero of=/data/test bs=4096 count=409600 conv=fdatasync : 1677721600 bytes (1.7 GB) copied, 10.9161 s, 154 MB/s
[22:04] <nhm> amatter: My armchair analysis is that one of your OSDs is slow/stalled and by the time it's completed 34 ops, all of the in-flight operations are waiting on some backed up OSD.
[22:04] <nhm> amatter: if you increase the number of in flight ops with the -t parameters to rados bench, I bet the same thing will happen, but it will just take a bit longer.
[22:05] * EmilienM (~EmilienM@195-132-228-252.rev.numericable.fr) has joined #ceph
[22:08] <amatter> nhm: hmm, see: http://pastebin.com/8ecmDM39
[22:08] <amatter> nhm: no stall there except for the first few seconds when I start a bench
[22:09] <nhm> amatter: hrm. I would expect it to do better, but it's interesting that it doesn't stall at all.
[22:09] <nhm> amatter: what if you try with -t 1?
[22:10] <amatter> nhm: -t 1: http://pastebin.com/snpQTBVa
[22:10] <nhm> hrm... does -t 16 still stall?
[22:10] * slang (~slang@ace.ops.newdream.net) has joined #ceph
[22:10] * alexxy[home] (~alexxy@ Quit (Ping timeout: 480 seconds)
[22:11] <amatter> nhm: no, working now. i've been fighting this problem for a month and now it's mysteriously resolved itself
[22:11] <nhm> amatter: it's because we are watching. The gremlins know.
[22:12] <amatter> nhm: seems to be the case. you indicate that the performance of -t 64 wasn't that great. Any suggestions on how to improve it?
[22:12] <nhm> amatter: What kind of boxes are these?
[22:13] <nhm> amatter: Probably need to know more about your setup. What controllers, using expanders? how many drives, where are the journals, etc.
[22:13] <amatter> nhm: Dell PowerEdge 850 / 3.2GHz dual core / 4GB ECC ram / 2x 7.2k SATA
[22:13] <amatter> nhm: mdadm raid except one box where I'm experimenting with btrfs's own raid0
[22:13] <nhm> amatter: I'd try skipping the raid and just doing 2 OSDs per box.
[22:14] <amatter> nhm: journals are on the same array and partition as the data
[22:18] <nhm> amatter: Yeah, I guess my thought would be to skip the md raid, put 2 partitions on each disk, one for the journal and one for the data disk, and have 2 OSDs per box.
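The layout nhm suggests might look roughly like this in ceph.conf (hostnames, device names, and paths here are made up for illustration; they are not from the discussion):

```ini
; hypothetical per-box layout per the suggestion above: skip md raid,
; give each disk a journal partition plus a data partition, one OSD per disk
[osd.0]
    host = storage1
    osd journal = /dev/sda1
    ; data filesystem mounted on /var/lib/ceph/osd/ceph-0 from /dev/sda2
[osd.1]
    host = storage1
    osd journal = /dev/sdb1
    ; data filesystem mounted on /var/lib/ceph/osd/ceph-1 from /dev/sdb2
```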
[22:18] * Tv_ (~tv@2607:f298:a:607:5c1e:e9a0:aa30:35e7) Quit (Quit: Tv_)
[22:19] <nhm> amatter: btrfs has the best performance on a fresh filesystem but has some issues over time. XFS starts out slower but ages better (though still slows down some). Ext4 seems to start out in-between BTRFS and XFS. We don't really know yet how well it ages.
[22:20] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:22] <dmick> Tv: *seconds* ahead of me with the latest response about How to write data to rbd pool
[22:23] <dmick> I'm beginning to wonder...
[22:23] <dmick> oh gone doh
[22:23] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[22:24] * scuttlemonkey (~scuttlemo@ Quit (Quit: zzzzzzzzzzzzzzzzzzzz)
[22:25] <jlogan1> What is the state of libvirt integration, client side? I'm looking at Foreman and virt-manager as my initial 2 entry points to running my VMs under Ceph rbd.
[22:25] <jlogan1> But neither of those allow me to use rbd as the storage device when I run the setup. I am able to use those tools to make a VM then use qemu-img convert and then edit the host directly.
[22:25] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:31] <Anticimex> hmm, how does a ceph cluster compare with more familiar SANs like netapp/3par etc
[22:31] <Anticimex> let me just throw out a number from 3par of "150k IOPS"
[22:31] <gregaf> compare in what sense?
[22:31] <gregaf> ah
[22:32] <gregaf> that depends very, very much on your configuration
[22:32] <Anticimex> would it be feasible (cheap) to create a ceph storage cluster with some really sharp frontends that could expose this somehow?
[22:32] <Anticimex> indeed
[22:32] <Anticimex> :)
[22:32] <Anticimex> preferably even as FCoE
[22:32] <Anticimex> i wonder if anyone has had those thoughts
[22:33] <gregaf> if we put it on SSDs then we get something like 10k IOPs per daemon, so you could get 150k (replicated) over 30 disks, which you can squeeze into a pretty small form factor
[22:34] <gregaf> or if you just want that much for a benchmark then you just need SSD journals and can wait to flush it out to spinning disks
[22:34] <gregaf> and of course if using the filesystem from the client then you can merge ops and get even better numbers
[22:35] <nhm> gregaf: I'm not sure we actually scale that well on SSDs.
[22:35] <nhm> gregaf: for small writes. I don't know that we don't, but I'm wary.
[22:35] <gregaf> but most people running on spinny disks see something that approximates (disk_num*disk_IOPS/replication)
[22:35] <gregaf> for sustained load
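Both of gregaf's estimates above fall out of the same back-of-envelope formula. A quick check of the arithmetic (the ~10k-per-SSD-daemon and ~100-IOPS-per-7.2k-disk figures are the rough numbers from the conversation, not measurements):

```shell
# Rule of thumb from above: disk_num * disk_IOPS / replication
echo $(( 30 * 10000 / 2 ))   # 30 SSD-backed daemons, 2x replication
echo $(( 7 * 100 / 2 ))      # 7 spinning 7.2k disks, 2x replication
```

The first line reproduces the "150k over 30 disks" figure; the second gives a ballpark for a small spinny-disk cluster like amatter's.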
[22:35] <Anticimex> the scenario i'm considering is essentially the feasibility of using ceph as the storage system, while still having front ends that expose it in market-familiar ways (SMB, NFS, FCoE, iSCSI)
[22:35] <gregaf> ah
[22:36] <Anticimex> such a front end could even be commercial, though it would require some integration i suppose, with ceph i mean
[22:36] <gregaf> well with a proper frontend you can do all kinds of ridiculous things
[22:36] <Anticimex> right
[22:36] <gregaf> like flash and battery-backed RAM caches
[22:36] <gregaf> so hell if I know
[22:36] <Anticimex> mmm
[22:37] <Anticimex> it boils down to cost of such a front end, various stability issues of course, and how ceph storage scales vs the wellknown storage vendors racks
[22:37] <gregaf> yeah
[22:37] <Anticimex> saw one data point of typical DC costs spread over networking, mgm sys, servers, storage etc
[22:37] <gregaf> you should be able to build a Ceph system with similar performance but less cost than the vendors, or we've failed ;)
[22:37] <Anticimex> that said storage represented 51% of investment. which to me sounds a bit high, ie, storage systems are overpriced :)
[22:38] <Anticimex> gregaf: mmm, i'm just wonder how much margin there is there :)
[22:38] <Anticimex> i suppose quite a lot
[22:38] <nhm> Anticimex: popular storage vendors make a lot of money.
[22:38] <gregaf> but if you want to get into their IOP ranges then you'll definitely need to spend some real money
[22:38] <Anticimex> sure
[22:38] * loicd (~loic@ Quit (Quit: Leaving.)
[22:38] <Anticimex> gregaf: the popular vendors charge real money too :)
[22:39] <gregaf> yeah
[22:39] <gregaf> I'm just saying it's not magic, if you have SSD journals and spinny disks stores your long-term throughput is still going to be stuck at the spinny disk throughput
[22:40] <gregaf> whereas some of the vendors have things like internal log-structured filesystems so they can compress a bunch of small IOs into one big IO
[22:40] <gregaf> aggregate throughput scales very nicely so far though, right nhm?
[22:41] * loicd (~loic@magenta.dachary.org) has joined #ceph
[22:41] <gregaf> whereas because they have those silly frontends you can stack up more storage and not ever increase your throughput, which just makes it a bigger and bigger wall
[22:42] <nhm> gregaf: Large IO yes. SmallIO, we seem to hit some limits.
[22:42] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[22:44] <dmick> jlogan1: yes, virt-manager can't do it directly; don't know Foreman
[22:45] <jlogan1> dmick: Thanks for confirming that.
[22:45] <jlogan1> How are you creating your VMs?
[22:46] <jlogan1> http://theforeman.org/ does a really nice job of working with puppet to make the VMs and then build/configure the VMs it creates. It's a set of web pages; it even uses libvirt to show you the VNC console in the browser.
[22:47] <dmick> I don't fully understand the interfaces involved yet, but I guess there's different interfaces for managing storage pools vs. storage volumes, and that's part of the reason it doesn't fit in as well as one might hope?...
[22:47] <dmick> I've hoped to have time to dig further
[22:47] <dmick> I generally create them with virt-manager, but
[22:47] <jlogan1> if I could have Foreman add rbd then it would be one-stop shopping for new hosts.
[22:47] <dmick> if I want rbd volumes, I have to edit the xml
[22:48] <dmick> but I'm just hacking, not doing this for any kind of production
[22:54] <joao> can anyone confirm that this syntax is correct (on ceph.conf's [global])
[22:54] <joao> # able to list their addresses
[22:54] <joao> mon host =,,
[22:54] <joao> fsid = ca6b70a6-4cf9-4ec0-81bb-ad54cde3cfa0
[22:54] <joao> mon_initial_members = plana61,plana84,plana41
[22:55] <jlogan1> dmick: Thanks for your input.
[22:55] <jlogan1> Are there any tools in place today for VM creation that know rbd?
[22:56] <gregaf> joao: that looks right to me
[22:56] <joao> gregaf, thanks
[22:56] <gregaf> err, sorry, use spaces instead of underscores for mon initial members
[22:56] <dmick> jlogan1: openstack, cloudstack. Those are a little heavier than just VM creation
[22:57] <joao> gregaf, would that cause any problems? I was under the impression that it would end up being converted to '_' anyway
[22:57] <joao> or '-'
[22:57] <gregaf> Openstack doesn't really do a great job of handling it either in this version :/
[22:57] <gregaf> joao: I don't remember exactly how the parsing works; I suspect it would be fine but am not certain
[22:58] <joao> ok, will give it a shot with spaces instead
[22:58] <dmick> gregaf: if you mean Folsom, which is released, I'm told it does things pretty well
[22:58] <gregaf> oh right, forgot it was out now
[22:58] <gregaf> was thinking of Essex, of course
[22:58] <dmick> there's in-progress documentation at https://github.com/ceph/ceph/tree/wip-rbd-openstack-doc, jlogan1
[23:00] <jlogan1> I'm on Ubuntu 12.10, should I give Openstack another look? Before it seemed a bit clumsy to setup, and I was having nothing but trouble trying to put hosts on different vlans for the load balancer to see.
[23:03] <dmick> It is moving fast, and getting measurably better as far as I can see
[23:03] <dmick> it's still a big deployment framework
[23:05] * pentabular (~sean@adsl-70-231-128-149.dsl.snfc21.sbcglobal.net) has joined #ceph
[23:05] * pentabular is now known as Guest404
[23:06] * Guest404 is now known as pentabular
[23:07] <pentabular> Tv_: how does this look?
[23:07] <pentabular> https://github.com/seanchannel/ceph/commit/08f13c95bf210f433636d00d76b725feee6d3a37
[23:10] <joshd> jlogan1: it looks like Foreman is using libvirt storage pools
[23:11] <joshd> the libvirt method for building a vm is not implemented by the rbd storage driver right now
[23:11] * dty (~derek@testproxy.umiacs.umd.edu) has joined #ceph
[23:11] <gregaf> pentabular: he had to take off for the day, might be on later though
[23:11] * EmilienM (~EmilienM@195-132-228-252.rev.numericable.fr) has left #ceph
[23:11] <pentabular> ah. thanks!
[23:18] * tryggvil (~tryggvil@163-60-19-178.xdsl.simafelagid.is) has joined #ceph
[23:23] <pentabular> does ceph accept pull requests via github or only via email?
[23:25] <gregaf> github works!
[23:25] <pentabular> thx! that's a little easier (and prettier)
[23:29] * miroslavk1 (~miroslavk@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[23:34] * miroslavk (~miroslavk@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[23:36] * scuttlemonkey (~scuttlemo@ has joined #ceph
[23:38] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[23:41] * LarsFronius_ (~LarsFroni@2a02:8108:3c0:79::2) has joined #ceph
[23:41] <dty> btw, if anyone is trying to deploy radosgw with RHEL6/CENTOS6 and mod_fastcgi you definitely need the newest version of it and not 2.4.6 (eg. SNAP-0910052141). Fought with it for a long time before comparing it to a precise 12.04 install and noticing the version precise was deploying
[23:47] * LarsFronius (~LarsFroni@2a02:8108:3c0:79:4862:93e3:9b56:6cce) Quit (Ping timeout: 480 seconds)
[23:47] * LarsFronius_ is now known as LarsFronius
[23:49] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Quit: Leaving.)
[23:58] * slang (~slang@ace.ops.newdream.net) Quit (Quit: slang)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.