#ceph IRC Log


IRC Log for 2012-04-30

Timestamps are in GMT/BST.

[1:19] <sage> burnupi healthy:
[1:19] <sage> 2012-04-29 19:18:29.676004 pg v22507: 65536 pgs: 65536 active+clean; 102320 KB data, 2138 GB used, 185 TB / 189 TB avail
[1:19] <sage> 2012-04-29 19:18:29.829559 osd e11033: 206 osds: 206 up, 206 in
[1:19] <sage> (11000 torturous epochs of thrashing)
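The two summary lines sage pasted can be checked mechanically. Below is a minimal POSIX sh sketch. It assumes the exact 2012-era line formats shown above; current Ceph releases print status differently. The function names `pgs_all_clean` and `osds_all_up_in` are hypothetical helpers and are not part of Ceph.

```shell
#!/bin/sh
# Sketch only: parses the two status-line formats pasted above (2012-era Ceph).
# Function names are hypothetical; the sed patterns are assumptions about format.

# pgs_all_clean LINE -> success if "N pgs: N active+clean;" (every PG clean)
pgs_all_clean() {
    total=$(printf '%s\n' "$1" | sed -n 's/.* \([0-9][0-9]*\) pgs: .*/\1/p')
    clean=$(printf '%s\n' "$1" | sed -n 's/.*pgs: \([0-9][0-9]*\) active+clean;.*/\1/p')
    [ -n "$total" ] && [ "$total" = "$clean" ]
}

# osds_all_up_in LINE -> success if "N osds: U up, I in" has N = U = I
osds_all_up_in() {
    counts=$(printf '%s\n' "$1" |
        sed -n 's/.*: \([0-9][0-9]*\) osds: \([0-9][0-9]*\) up, \([0-9][0-9]*\) in.*/\1 \2 \3/p')
    set -- $counts              # split "206 206 206" into $1 $2 $3
    [ $# -eq 3 ] && [ "$1" = "$2" ] && [ "$2" = "$3" ]
}

pg_line='2012-04-29 19:18:29.676004 pg v22507: 65536 pgs: 65536 active+clean; 102320 KB data, 2138 GB used, 185 TB / 189 TB avail'
osd_line='2012-04-29 19:18:29.829559 osd e11033: 206 osds: 206 up, 206 in'

if pgs_all_clean "$pg_line" && osds_all_up_in "$osd_line"; then
    echo healthy
else
    echo "not healthy"
fi
```

A line listing several PG states (for example `16 active+degraded, 8 active+replay+degraded`) fails the first check by design, because not every PG is `active+clean`.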
[1:20] <joao> cool
[13:46] <elder> Thanks for stopping by mermi!
[13:47] <joao> lol
[13:47] <joao> morning elder
[13:47] <elder> Good morning joao. What time is it in your part of the world?
[13:47] <joao> lunch hour
[15:37] * aliguori (~anthony@ has joined #ceph
[15:51] <ao> Hi, does anyone here know when 0.46 might be available? I am preparing new tests and would like to use the write cache to speed them up.
[16:16] <joao> nhm, around?
[16:28] * cattelan_away is now known as cattelan
[18:04] <nhm> joao: sorry, I didn't have my IRC window FGed, what's up?
[18:06] <joao> have you ever configured a cluster of 1 of each on the same machine?
[18:06] <joao> I mean a ceph cluster
[18:06] <joao> "cluster"
[18:06] <nhm> joao: hrm, not sure what you mean by "one of each"?
[18:06] <joao> 1 osd, 1 mon, 1 mds
[18:07] <The_Bishop> i have such setup
[18:07] <nhm> joao: I've done configurations with each node having each service, and other configurations with each node running multiple services before...
[18:07] <joao> somehow I always end up with one of them not starting, or dying, or unresponsive
[18:08] <The_Bishop> or did the osd process die soon after start?
[18:08] <nhm> joao: I've had pretty good luck with things starting...
[18:08] <nhm> joao: where are you setting this up?
[18:08] <joao> nhm, my desktop
[18:08] <nhm> joao: ah, ok. Are you running the kernel client on the same machine?
[18:09] <joao> I feel like this is the wrong answer, but no
[18:10] <The_Bishop> this is good
[18:10] <nhm> joao: hehe, that would be too easy. ;)
[18:10] <nhm> joao: Any errors?
[18:11] <joao> nhm, usually, the osd starts, mounts, replays, unmounts and closes the journal
[18:11] <joao> all in one go
[18:11] <The_Bishop> what do you see in the log?
[18:12] <joao> although, if I start off fresh, new data dirs and all, everything goes according to plan; on the second run though, not so much
[18:13] <joao> just a sec
[18:13] <joao> now it appears to be working, testing with rados
[18:14] <joao> and no, I didn't do anything different this time
[18:14] <nhm> huh, I don't think I've ever seen that. Usually it takes a while to get all of the PGs in order, but that's it. My problems are much more performance related.
[18:15] <joao> oh, here we go
[18:15] <joao> okay, so I started a mon and a mds with init-ceph
[18:15] <joao> started an osd with -d
[18:16] <joao> everything went okay; then I killed the osd, and re-ran the ceph-osd with -d
[18:17] <joao> no errors, but the osd gets to this point and then hangs, and rados bench can't finish any ops
[18:17] <joao> 2012-04-30 17:14:57.772956 7fcf981e3700 20 filestore(deploy/dev/osd0) flusher_entry finish
[18:17] <joao> 2012-04-30 17:14:57.780404 7fcf9ed56780 1 journal close deploy/dev/osd0.journal
[18:17] <joao> mon, mds and osd are still up and running though
[18:18] <joao> although 'ceph -s' states that the mds is "laggy or crashed"
[18:18] <joao> teuthology makes this look so damn easy
[18:29] <sagewk> joao: you're running current master branch?
[18:30] <joao> last thursday's master
[18:30] <joao> actually, am compiling the current as we speak
[18:31] <sagewk> does ceph -s report the pgs and active+clean or stale+active+clean?
[18:31] <joao> jecluis@Magrathea:~/Code/dreamhost/ceph/src$ ./ceph -s
[18:31] <joao> 2012-04-30 17:18:03.892402 pg v77: 24 pgs: 16 active+degraded, 8 active+replay+degraded; 3978 KB data, 51382 MB used, 4185 MB / 58497 MB avail; 1042/2084 degraded (50.000%)
[18:32] <sagewk> replay means there is a 30s timeout that has to expire before you can write to the pg
[18:32] <sagewk> because of the weird ack/commit semantics, there's a window for clients to replay requests.
[18:32] <sagewk> for the data pool only (fs file content)
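To spot the situation sagewk describes in a `ceph -s` line like the one joao pasted, you can count the PGs whose state includes `replay` and recompute the degraded ratio. Below is a minimal POSIX sh and awk sketch. It assumes the 2012-era summary format shown above, and `pg_count_with` and `degraded_pct` are hypothetical helper names. A nonzero replay count only suggests that data-pool writes to those PGs will wait out the roughly 30 s replay window. The script does not check that.

```shell
#!/bin/sh
# Sketch only: the summary-line format is assumed from the log above.

# pg_count_with LINE FLAG -> total PGs whose "+"-separated state contains FLAG
pg_count_with() {
    printf '%s\n' "$1" | awk -v want="$2" '
        {
            sub(/.* pgs: /, "")          # drop everything up to the PG total
            sub(/;.*/, "")               # drop the data/usage part after the first ";"
            n = split($0, groups, /, /)  # e.g. "16 active+degraded", "8 active+replay+degraded"
            total = 0
            for (i = 1; i <= n; i++) {
                split(groups[i], kv, " ")          # kv[1] = count, kv[2] = state
                m = split(kv[2], flags, "+")       # single-char separator is literal
                for (j = 1; j <= m; j++)
                    if (flags[j] == want) { total += kv[1]; break }
            }
            print total
        }'
}

# degraded_pct LINE -> recomputes "X/Y degraded" as a percentage with 3 decimals
degraded_pct() {
    printf '%s\n' "$1" |
        sed -n 's/.*; \([0-9][0-9]*\)\/\([0-9][0-9]*\) degraded.*/\1 \2/p' |
        awk '$2 > 0 { printf "%.3f\n", 100 * $1 / $2 }'
}

line='2012-04-30 17:18:03.892402 pg v77: 24 pgs: 16 active+degraded, 8 active+replay+degraded; 3978 KB data, 51382 MB used, 4185 MB / 58497 MB avail; 1042/2084 degraded (50.000%)'

echo "PGs in replay: $(pg_count_with "$line" replay)"   # prints: PGs in replay: 8
echo "degraded: $(degraded_pct "$line")%"               # prints: degraded: 50.000%
```

The percentage matches the `(50.000%)` that Ceph printed, so the field parsing lines up with the pasted output.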
[18:36] <joao> sagewk, indeed, it works again
[18:37] <joao> although the osd stopped outputting to the console
[18:41] * Tv_ (~tv@aon.hq.newdream.net) has joined #ceph
[20:13] <joao> does any of you use eclipse?
[20:17] * The_Bishop (~bishop@cable-86-56-102-91.cust.telecolumbus.net) Quit (Quit: Who the hell is this peer? If I catch him, I'll reset his connection!)
[20:20] <nhm> joao: I do, but I haven't been recently because the eclipse python stuff keeps crashing.
[20:21] <nhm> I didn't want to spend a lot of time on it, so I've just been using vim lately.
[20:22] * aliguori (~anthony@ Quit (Quit: Ex-Chat)
[20:23] <joao> somehow, since I upgraded to precise, it just stopped resolving some of dout's symbols, which makes it nearly impossible to navigate the source
[20:25] <nhm> no idea. Eclipse is great when it works. ;)
[20:39] * aliguori (~anthony@ has joined #ceph
[20:47] * adamcrume (~quassel@adsl-99-115-83-144.dsl.pltn13.sbcglobal.net) has joined #ceph
[20:48] <adamcrume> Is the mkcephfs script maintained?
[20:48] <adamcrume> I get several errors when running it.
[20:51] <gregaf> joao: it's a change from Helios to Indigo
[20:51] <gregaf> I have no idea why, though
[20:52] <gregaf> just rolled myself back to Helios since I'm not using the packaged versions anyway
[20:52] <gregaf> adamcrume: it should be working for now, although we hope to deprecate it soon
[20:53] <joao> yeah, I noticed the upgrade just a few moments ago; gonna try that as soon as I get the laptop working
[20:53] <joao> not even time machine is working today
[20:53] * joao is pissed
[20:53] <gregaf> can somebody review and merge wip-2352 for me?
[20:53] <gregaf> joao: bad luck :(
[20:53] <elder> My laptop isn't working again, if it's any consolation.
[20:53] <gregaf> I reinstalled my Air yesterday since something's been a little bit broken for a while
[20:54] <gregaf> first time I've had to do that with an Apple device; it pissed me off
[20:54] <adamcrume> gregaf: If mkcephfs is going to be deprecated, what's the preferred method of setting up a simple cluster? Chef is mentioned in the online docs, but there aren't any details.
[20:55] <nhm> gregaf: I had performance problems on my last air which is ultimately what drove me to install linux.
[20:55] <gregaf> adamcrume: yeah, that's why it isn't deprecated yet: chef stuff isn't done
[20:55] <nhm> so far this one seems to be alright...
[20:56] <gregaf> I've gotta run off to lunch soon, but give us your errors and somebody will deal with them :)
[20:58] <adamcrume> To start, I run it with "mkcephfs -a -c adam-cluster.conf -k adam-cluster.keyring -v", where adam-cluster.conf is a configuration file in the current directory.
[21:00] <adamcrume> I get "cp: missing file operand", cmon usage messages, and others that I think are caused by those.
[21:03] <elder> I think my brain is starting to decompose from looking at all this hex crap for so long.
[21:37] <elder> OK, going offline for a while. Back this evening.
[22:13] <Tv_> adamcrume: the chef stuff isn't ready to be run in anger
[22:13] <Tv_> adamcrume: can you share your adam-cluster.conf?
[22:15] <adamcrume> Tv_: Here's the config file: http://pastebin.com/P43rnwmP
[22:21] <Tv_> adamcrume: hmm.. can you perhaps run it as sh -x /sbin/mkcephfs -a -c ....
[22:21] <Tv_> adamcrume: pastebin the debug it spews
[22:21] <Tv_> adamcrume: it's not immediately obvious which one of the cp's fails
[22:22] <adamcrume> Tv_: http://pastebin.com/LZkDqmRr
[22:23] <Tv_> adamcrume: what version of ceph is this?
[22:23] <adamcrume> Tv_: I think it turns out that $monkeyring and $tmpkeyring are empty. I'm not sure, though.
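adamcrume's guess fits a common shell failure mode. If both variables expand to nothing and are unquoted, `cp` receives no operands at all, and `sh -x` traces the call as a bare `+ cp`. The reproduction below is hypothetical. It reuses the variable names from the log but none of mkcephfs's actual code, and `copy_keyring` is an illustrative guard that mkcephfs does not provide.

```shell
#!/bin/sh
# Hypothetical reproduction: the names mirror the mkcephfs variables mentioned
# above, but this is not mkcephfs's code. Both are deliberately left empty.
monkeyring=''
tmpkeyring=''

# Unquoted empty variables vanish during word splitting, so cp is invoked
# with no operands (GNU cp then reports "cp: missing file operand").
cp $monkeyring $tmpkeyring 2>&1 | head -n 1

# Illustrative guard: refuse to run cp when either path is empty.
copy_keyring() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "copy_keyring: empty source or destination" >&2
        return 1
    fi
    cp -- "$1" "$2"
}

copy_keyring "$monkeyring" "$tmpkeyring" || echo "keyring copy skipped"
```

Quoting alone would not fix the underlying bug, because `cp "" ""` still fails. The guard instead reports which precondition broke, so the error points at the empty variable.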
[22:24] <adamcrume> Tv_: ceph version 0.21 (090436f5)
[22:24] <Tv_> adamcrume: that's pretty darn old
[22:24] <Tv_> adamcrume: this bug has already been fixed, and mkcephfs looks somewhat different these days
[22:24] <adamcrume> Tv_: I used apt-get.
[22:25] <Tv_> adamcrume: what distro?
[22:25] <dmick> flab/flak unreachable?
[22:26] <Tv_> 0.21 was released in july 2010.. that's almost 2 years ago
[22:26] <adamcrume> Tv_: Good question. I'm using my lab's cluster, and I'm not sure how to find out. If I install Ceph from source, should this work better?
[22:27] <Tv_> adamcrume: http://ceph.newdream.net/docs/master/ops/install/mkcephfs/ shows how to add our apt repository, to get newer packages
[22:27] <Tv_> adamcrume: but really, our current focus is ubuntu 11.10 / 12.04 and that era of distributions
[22:27] <adamcrume> Tv_: Here we go. I think I'm using Ubuntu-Server 10.10 _Maverick Meerkat_ - Release amd64 (20101007).
[22:28] <Tv_> adamcrume: we should still provide debs for that; see the link above
[22:28] <adamcrume> Tv_: Okay, I'll try that. Thanks.
[22:30] <Tv_> adamcrume: but that is 2.5 years out of date, and ubuntu will stop supporting it right about now: https://wiki.ubuntu.com/LTS
[22:31] <adamcrume> Tv_: I'll have to talk with our admins about upgrading it.
[22:31] <adamcrume> Tv_: We don't have proper admins, and everyone's short on time.
[22:31] <Tv_> adamcrume: yeah, and 10.10 is probably fine for a quick experiment, but i hope you're not planning on setting up anything new with it..
[22:32] <Tv_> i miss greg.. by which i mean, the darn light keeps turning off without him in the room
[22:32] <dmick> lol
[22:33] <dmick> http://www.toysrus.com/product/index.jsp?productId=11562289&CAWELAID=882887841
[22:33] <adamcrume> Tv_: Nothing permanent. This is just to run some tests to compare my file system simulator against.
[22:34] <Tv_> dmick: if you get one of those for the office, i'm sure someone will equip the rc helicopters with a sharp spike
[22:46] <yehudasa> sagewk: ok, finally figured that out.. wasn't the -system-libs3 thing
[22:47] <yehudasa> was the default param, however, it required using the deprecated AC_CHECK_LIB instead of AC_SEARCH_LIBS
These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.