#ceph IRC Log


IRC Log for 2011-02-07

Timestamps are in GMT/BST.

[0:22] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[0:24] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[0:26] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[0:37] * uwe (~uwe@mb.uwe.gd) Quit (Quit: sleep)
[0:40] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[1:54] * L7X (~lanang@leonhart.its.ac.id) has joined #ceph
[1:56] <- *L7X* helo everybody..
[1:56] <- *L7X* :)
[1:56] <L7X> helo everybody... :)
[2:44] * L7X (~lanang@leonhart.its.ac.id) has left #ceph
[3:24] * darkfader (~floh@ has joined #ceph
[5:09] * chrisrd (~chrisrd@ Quit (Quit: chrisrd)
[5:09] * chrisrd (~chrisrd@ has joined #ceph
[5:13] * chrisrd (~chrisrd@ has left #ceph
[5:18] * chrisrd (~chrisrd@ has joined #ceph
[7:52] * votz (~votz@dhcp0020.grt.resnet.group.UPENN.EDU) Quit (Quit: Leaving)
[8:11] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:46] * allsystemsarego (~allsystem@ has joined #ceph
[9:02] * Psi-Jack (~psi-jack@yggdrasil.hostdruids.com) has joined #ceph
[9:02] <Psi-Jack> Interesting. No topic at all?
[9:10] * uwe (~uwe@ has joined #ceph
[9:13] <jantje> Hi !
[9:14] <Psi-Jack> Mornin
[9:20] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:51] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) has joined #ceph
[10:06] * tuhl (~tuhl@ has joined #ceph
[10:07] * tuhl (~tuhl@ has left #ceph
[10:07] * tuhl (~tuhl@ has joined #ceph
[10:11] <tuhl> is sage online?
[10:12] <tuhl> what is his nick?
[10:17] * Yoric (~David@ has joined #ceph
[10:36] <tuhl> is anybody using ceph in production yet?
[10:43] * verwilst (~verwilst@router.begen1.office.netnoc.eu) has joined #ceph
[10:46] * tuhl (~tuhl@ Quit (Remote host closed the connection)
[11:03] * Yoric_ (~David@ has joined #ceph
[11:03] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[11:03] * Yoric_ is now known as Yoric
[11:15] * Yoric (~David@ Quit (Quit: Yoric)
[13:01] * allsystemsarego (~allsystem@ Quit (Remote host closed the connection)
[13:27] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[13:34] * uwe (~uwe@ Quit (Ping timeout: 480 seconds)
[13:36] * uwe (~uwe@ has joined #ceph
[13:40] * uwe (~uwe@ Quit (Remote host closed the connection)
[13:41] * uwe (~uwe@ has joined #ceph
[13:44] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[13:54] * tuhl (~tuhl@business-092-079-173-189.static.arcor-ip.net) has joined #ceph
[15:45] <jantje> evening
[15:46] <prometheanfire> yo
[16:12] * baldben (~bencheria@cpe-76-173-232-163.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[16:19] * hijacker_ (~hijacker@ Quit (Quit: Leaving)
[16:45] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[16:46] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[17:01] * baldben (~bencheria@ip-66-33-206-8.dreamhost.com) has joined #ceph
[17:01] * eternaleye (~eternaley@ Quit (Read error: No route to host)
[17:02] * Juul (~Juul@static.88-198-13-205.clients.your-server.de) has joined #ceph
[17:04] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[17:04] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[17:10] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[17:11] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Remote host closed the connection)
[17:11] * Ormod (~hjvalton@vipunen.hut.fi) has joined #ceph
[17:13] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[17:13] <Ormod> While looking at librados I came across the fact that not everything there is being exported in the C API
[17:14] <Ormod> Is that just oversight/future work or is there some design reason behind that? (i.e. if I were to write a couple of patches would they be accepted)
[17:14] <Ormod> stumbled on to this with my python librados wrapper
[17:18] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[17:27] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[17:30] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[17:40] * Yoric (~David@ has joined #ceph
[17:41] <Psi-Jack> I'm having issues trying to compile ceph for opensuse 11.3, it's failing to find libcrypto++ despite it being installed.
[17:46] * tuhl (~tuhl@business-092-079-173-189.static.arcor-ip.net) Quit (Ping timeout: 480 seconds)
[17:50] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[17:52] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:56] * verwilst (~verwilst@router.begen1.office.netnoc.eu) Quit (Quit: Ex-Chat)
[17:57] * greglap (~Adium@ has joined #ceph
[17:59] * ack_ (ANONYMOUS@ has joined #ceph
[17:59] * ack_ is now known as [ack]
[18:01] <greglap> Ormod: pretty sure it's just an oversight — what's missing?
[18:08] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[18:20] <Ormod> librados.hpp
[18:20] <Ormod> 108: int list_pools(std::list<std::string>& v);
[18:20] <Ormod> for example
[18:20] <greglap> well you can't create a list_pools that looks like that, is there not a list_next_pool call or something?
[18:21] <greglap> anyway, if you want to design replacements we'll take the patches!
[18:22] <Ormod> greglap: yeah I know but there's nothing for listing pools afaik
[18:22] <Ormod> but ok, I'll probably slap something together once I get some free time
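Note on the exchange above: a C++ signature that takes a std::list<std::string>& cannot be exposed as-is through a C API, which is greglap's point. One C-friendly shape, sketched here purely as an assumption and not as what librados offered at the time of this log, is a caller-supplied byte buffer filled with NUL-terminated names and closed by an empty name. A Python wrapper would then only need to split that buffer. The function name parse_name_buffer and the buffer layout are hypothetical.

```python
def parse_name_buffer(buf):
    """Split a bytes buffer of NUL-terminated names into a list of str.

    Assumed layout (hypothetical): name1\\0name2\\0...\\0\\0
    An empty name (two NULs in a row) marks the end of the list.
    """
    names = []
    start = 0
    while start < len(buf):
        end = buf.find(b"\0", start)
        if end == -1:
            # A name without its terminator means the buffer was truncated.
            raise ValueError("buffer is not NUL-terminated")
        if end == start:
            break  # empty name: end of list
        names.append(buf[start:end].decode("utf-8"))
        start = end + 1
    return names
```

A wrapper built this way would typically call the C function once to learn the required length, allocate a buffer of that size, and call it again before parsing.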
[18:25] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[18:27] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[18:32] * Yoric_ (~David@ has joined #ceph
[18:32] * Yoric (~David@ Quit (Ping timeout: 480 seconds)
[18:32] * Yoric_ is now known as Yoric
[18:35] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:41] * greglap (~Adium@ Quit (Quit: Leaving.)
[18:42] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[18:45] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[18:49] * Yoric (~David@ Quit (Quit: Yoric)
[18:51] * uwe (~uwe@ Quit (Quit: sleep)
[18:53] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[18:56] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[18:57] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:00] <gregaf> Psi-Jack: what's your error code?
[19:00] <Psi-Jack> configure: error: libcrypto++ not found
[19:00] <Psi-Jack> See `config.log' for more details.
[19:00] <Psi-Jack> error: Bad exit status from /var/tmp/rpm-tmp.oQj7TL (%build)
[19:01] <Psi-Jack> Just that, during the configure.
[19:02] <gregaf> what version of libcrypto++ do you have?
[19:03] <Psi-Jack> 5.6.0
[19:05] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[19:05] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[19:06] * fzylogic (~fzylogic@ has joined #ceph
[19:07] <yehudasa> Psi-Jack: do you know where the libcrypto++.so resides?
[19:07] <Psi-Jack> From what I can tell, listing the package contents, there is no shared library, just the .a. heh
[19:08] <yehudasa> is that /usr/lib/libcrypto++.a?
[19:08] <Psi-Jack> Well /usr/lib64/libcryptopp.a
[19:09] <yehudasa> can you 'nm /usr/lib64/libcryptopp.a | grep EncryptionE'?
[19:09] <Psi-Jack> Yep
[19:10] <Psi-Jack> Lots of stuff. heh
[19:10] <yehudasa> and grep _ZTIN8CryptoPP14CBC_EncryptionE ?
[19:10] <Psi-Jack> 4 lines worth with that.
[19:11] <yehudasa> can you copy paste them?
[19:11] <Psi-Jack> 14CBC_EncryptionE
[19:11] <Psi-Jack> U _ZTIN8CryptoPP14CBC_EncryptionE
[19:11] <Psi-Jack> U _ZTIN8CryptoPP14CBC_EncryptionE
[19:11] <Psi-Jack> 0000000000000000 V _ZTIN8CryptoPP14CBC_EncryptionE
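Note on the nm check above: in nm output a "U" entry is an undefined reference and has no address column, while a line with an address and another type letter (here "V", a weak object) is a definition. The following is a minimal sketch, assuming plain default nm output, of how the pasted lines could be classified automatically. The helper names symbol_types and is_defined are made up for illustration, and treating every non-"U" letter as a definition is a simplification.

```python
def symbol_types(nm_output, symbol):
    """Return the set of nm type letters seen for `symbol`."""
    letters = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined entries look like "U name"; defined ones look like
        # "0000000000000000 V name". Either way the name is the last
        # field and the type letter is the one before it.
        if len(fields) >= 2 and fields[-1] == symbol:
            letters.add(fields[-2])
    return letters


def is_defined(nm_output, symbol):
    """True if at least one object in the archive defines `symbol`."""
    return any(letter != "U" for letter in symbol_types(nm_output, symbol))
```

Fed the lines Psi-Jack pasted, this reports the symbol as both referenced and (weakly) defined inside the static archive.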
[19:11] <bchrisman> Left an idle cluster/filesystem sitting around over the weekend… there was a power outage… looks like it came back up automatically but the filesystem is hung (any ls command hangs in uninterruptible sleep).. messages: http://pastebin.com/MH7Q6Pqu ceph -s: http://pastebin.com/Mb3kQbhW … I'm trying to reconcile my reading of the ceph -s output regarding mds status… there's a message regarding 'laggy or crashed' which I guess means ther
[19:12] * cmccabe (~cmccabe@ has joined #ceph
[19:13] <gregaf> bchrisman: looks like the MDS has crashed
[19:13] <gregaf> if there were available standbys it would kick that MDS out and replace it, but in the absence of such conveniences the cluster just has to wait around and hope the MDS starts responding again
[19:13] <bchrisman> gregaf: I'm guessing no backup mds took over, or did they all crash?
[19:14] <gregaf> if you had your system configured with backup MDSes, they probably all crashed
[19:14] <gregaf> in that case my guess would be there's a problem with the MDS journal
[19:14] <yehudasa> Psi-Jack: can you 'gcc -o test test.c -lcryptopp'?
[19:14] * Juul (~Juul@static.88-198-13-205.clients.your-server.de) Quit (Ping timeout: 480 seconds)
[19:15] <yehudasa> or -lcrypto++
[19:15] <Psi-Jack> Well, I would, if there was a test.c :)
[19:15] <yehudasa> you don't need it
[19:15] <yehudasa> just 'touch test.c'
[19:16] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:16] <bchrisman> gregaf: yeah.. that makes sense… power outage occurred several hours before the hang message… ceph mds dump output: http://pastebin.com/JgRvyQ1G
[19:16] <bchrisman> gregaf: that's telling me that my max_mds was set to 1, meaning no failover mds configured?
[19:16] <Psi-Jack> /usr/lib64/gcc/x86_64-suse-linux/4.5/../../../../lib64/crt1.o: In function `_start':
[19:16] <Psi-Jack> /usr/src/packages/BUILD/glibc-2.11.2/csu/../sysdeps/x86_64/elf/start.S:109: undefined reference to `main'
[19:16] <Psi-Jack> collect2: ld returned 1 exit status
[19:17] <gregaf> bchrisman: to have failover MDSes you need to have more MDSes in your ceph.conf than you have max_mds set to :)
[19:17] <yehudasa> Psi-Jack: that's with libcrypto++ or libcryptopp?
[19:17] <Psi-Jack> pp, not ++
[19:18] <yehudasa> what's in your config.log?
[19:18] <bchrisman> gregaf: My config defines three mds's http://pastebin.com/XdUhw7t4
[19:19] <Psi-Jack> configure:15024: error: libcrypto++ not found
[19:19] <yehudasa> Psi-Jack: anything before that?
[19:19] <gregaf> bchrisman: so you did have standby MDSes, and they all crashed, presumably
[19:19] <bchrisman> gregaf: nevermind
[19:19] <bchrisman> gregaf: I screwed up my conf file.
[19:19] <gregaf> can you look and see if you have core dumps or logs? (there should be a backtrace at the end of the log)
[19:20] <gregaf> heh
[19:20] <gregaf> well, you should still have a core dump or log backtrace
[19:20] <Psi-Jack> Heh, well, there's quite a bit before that. heh
[19:21] <Psi-Jack> http://pastebin.com/69uf0UQq
[19:22] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[19:25] <yehudasa> Psi-Jack: let's do the test.c check again, now with a test.c that contains something like:
[19:25] <yehudasa> int main() {}
[19:27] <Psi-Jack> /tmp/ccITuiOX.o: In function `main':
[19:27] <Psi-Jack> test.c:(.text+0x0): multiple definition of `main'
[19:27] <Psi-Jack> /tmp/ccZ0fjrQ.o:test.c:(.text+0x0): first defined here
[19:27] <Psi-Jack> collect2: ld returned 1 exit status
[19:27] <Psi-Jack> Weird, cause there's only the one main, just like you said. heh
[19:28] <yehudasa> Psi-Jack: does that happen also when you're omitting the -lcryptopp?
[19:28] <yehudasa> brb
[19:28] <Psi-Jack> Oh, oops, I was doing gcc -o test test.c -lcryptopp test.c
[19:31] * baldben (~bencheria@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[19:31] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[19:32] <Psi-Jack> Without the redundant test.c, it compiles.
[19:41] <Psi-Jack> Hmm, why does ceph need libcrypto++ anyway? Seems like a very arbitrary library to me, from examining it a little more.
[19:45] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[19:48] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Quit: Ex-Chat)
[19:53] <yehudasa> Psi-Jack: now try a new test.c: http://pastebin.com/g1QHd9xE
[19:54] * tjikkun (~tjikkun@195-240-122-237.ip.telfort.nl) has joined #ceph
[19:54] <Psi-Jack> Yeah, with that, it fails with a lot of stuff. I thought ahead on that, and also grabbed a different libcrypto++ src.rpm from opensuse factory, instead of opensuse packman repos.
[19:55] <yehudasa> yeah, looks like that package is broken
[19:55] <Psi-Jack> Yeah, now I have /usr/lib64/libcryptopp.so.9
[19:55] <yehudasa> does the new test.c compile now?
[19:55] * baldben (~bencheria@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:55] <Psi-Jack> I didn't test that, but the build process is now actually compiling for ceph.
[19:56] <yehudasa> oh, so problem is fixed
[19:56] <Psi-Jack> Heh yeah.
[19:56] <Psi-Jack> Gods, I hope openSUSE 11.4 fixes a lot of crap.
[19:57] <Psi-Jack> 11.3 was like the worst version I have ever seen since they became Novell.
[19:57] <gregaf> bchrisman: did you figure out your problem?
[19:58] <Psi-Jack> Okay, yeah, it's compiling, just without gtkmm support.
[19:59] <bchrisman> gregaf: well, my cluster was misconfigured, and there were power outages involved taking down the whole cluster, so I kind of punted until I see it again under more controlled circumstances.
[19:59] <gregaf> mmm, okay — we'd like to survive cluster death, though!
[20:00] <bchrisman> gregaf: my ceph.conf was trying to list mds's as [mds(hostname)] rather than [mds.(hostname)]...
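Note on the misconfiguration above: the two points from this conversation (MDS sections must be named [mds.(hostname)], and standbys exist only when more MDSes are defined than max_mds) can be checked mechanically. The sketch below is an illustration only. It scans section headers with a regular expression rather than using any Ceph tooling, and the function names mds_sections and standby_count are hypothetical.

```python
import re

# Matches an INI-style section header such as "[mds.alpha]".
SECTION = re.compile(r"^\s*\[([^\]]+)\]\s*$")


def mds_sections(conf_text):
    """Return (well_formed, suspect) lists of MDS section names."""
    well_formed, suspect = [], []
    for line in conf_text.splitlines():
        match = SECTION.match(line)
        if not match:
            continue
        name = match.group(1).strip()
        if name.startswith("mds."):
            well_formed.append(name)
        elif name.startswith("mds") and name != "mds":
            # e.g. "[mdsalpha]": the dot separator is missing.
            suspect.append(name)
    return well_formed, suspect


def standby_count(conf_text, max_mds=1):
    """MDSes left over as standbys once max_mds are active."""
    well_formed, _ = mds_sections(conf_text)
    return max(0, len(well_formed) - max_mds)
```

With three correctly named sections and max_mds set to 1, standby_count returns 2. With the sections misnamed as in the log, it returns 0, which matches the observed absence of any failover.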
[20:00] <bchrisman> gregaf: yeah.. I'll be testing full cluster down later..
[20:01] <bchrisman> node failure testing went swimmingly… and drive pull testing went reasonably well (things we can hack around to make it work basically)
[20:01] <bchrisman> I'm looking at the NFS issue right now with the mailing list recommendation of checking out using the NFSv4 expiration flags instead of returning stale filehandles.
[20:02] * uwe (~uwe@ip-94-79-145-210.unitymediagroup.de) has joined #ceph
[20:21] <Psi-Jack> I was trying to read up on how ceph actually works, but the front page leaves little to actually detail it out. heh
[20:24] <gregaf> Psi-Jack: Ceph started as an academic project; the best resource right now is the sequence of published papers on it
[20:24] <gregaf> http://ceph.newdream.net/publications/
[20:25] <Psi-Jack> Ahhh, I see.. Not bad, starting out as an academic project and making its way into kernel mainline, impressive.
[20:27] <Psi-Jack> Heh. Tap tap tap.. I love how much slower C++ takes to compile. blah.
[20:28] <cmccabe> hi all, if nobody is using sepia1 and sepia2, I'd like to use them for a sec
[20:33] * bcherian (~bencheria@ip-66-33-206-8.dreamhost.com) has joined #ceph
[20:37] * uwe (~uwe@ip-94-79-145-210.unitymediagroup.de) Quit (Quit: sleep)
[20:40] * baldben (~bencheria@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[20:44] <Tv|work> hell is escaping regexps over ssh remote calls calling sh -c..
[20:47] <cmccabe> tv: does re.escape help at all?
[20:48] <Tv|work> actually it seems the escaping part was done for me
[20:48] <Tv|work> but still..
[20:48] <Tv|work> sudo("""perl -ne 'if (m{^([^#]\S*\s+/\s+\S+\s+)(\S+)(\s+.*)$}) { $_="$1$2,user_xattr$3\n" unless $2=~m{(^|,)user_xattr(,|$)}; } print' -i.bak /etc/fstab""")
[20:49] <Tv|work> i feel like typing NO CARRIER after that
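Note on the escaping problem above: re.escape protects regular-expression metacharacters, which is a different job from protecting a string against shell parsing. For the "ssh calling sh -c" case, each layer of shell needs its own round of quoting. The sketch below uses shlex.quote from the Python standard library (Python 3.3 and later) and is an illustration only. The helper name remote_sh_command is hypothetical and is not part of paramiko or any other library mentioned here.

```python
import shlex


def remote_sh_command(argv):
    """Build a string that survives one extra `sh -c` layer.

    The inner command is quoted word by word, then the whole inner
    command is quoted again as the single argument to `sh -c`.
    """
    inner = " ".join(shlex.quote(arg) for arg in argv)
    return "sh -c " + shlex.quote(inner)
```

Passing arguments as a list and quoting once per shell layer avoids hand-counting backslashes, at the cost of output that is not pleasant to read.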
[20:50] <cmccabe> I got the impression that paramiko creates persistent sessions and there's no great cost to splitting up one command into several
[20:50] <Tv|work> there isn't
[20:50] <cmccabe> tv: I suppose having to create files in /tmp might be viewed as inelegant by some
[20:51] <cmccabe> tv: but it's still often the least-bad choice
[20:51] <Tv|work> not using perl -i means i worry about preserving access modes etc
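Note on the fstab one-liner above: the Perl command appends user_xattr to the mount options of the root filesystem line unless it is already present. The same transformation is sketched below as a pure Python function over the file's text, using the same regular expression. This is an illustration only. The function name add_user_xattr is hypothetical, and writing the result back while preserving ownership and access modes (the concern raised about not using perl -i) is deliberately left out.

```python
import re

# Same pattern as the Perl one-liner: a non-comment line whose mount
# point is "/", split into (device, mountpoint, fstype), options, rest.
ROOT_LINE = re.compile(r"^([^#]\S*\s+/\s+\S+\s+)(\S+)(\s+.*)$")


def add_user_xattr(fstab_text):
    """Return fstab text with user_xattr added to the root mount options."""
    out = []
    for line in fstab_text.splitlines(keepends=True):
        newline = "\n" if line.endswith("\n") else ""
        body = line[: len(line) - len(newline)]
        match = ROOT_LINE.match(body)
        if match and "user_xattr" not in match.group(2).split(","):
            body = match.group(1) + match.group(2) + ",user_xattr" + match.group(3)
        out.append(body + newline)
    return "".join(out)
```

Running the function twice gives the same result as running it once, which mirrors the "unless" guard in the original command.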
[20:54] <wido> sjust: You there?
[20:54] <wido> I keep getting PG's which break down, did my logs help you?
[20:56] <sjust> wido: Sorry, I haven't looked much into it yet. I'll start looking shortly, though.
[20:59] <gregaf> johnl: think I found the problem with your OSDs
[21:01] <gregaf> looks like you've got some ops that are bigger than the journal
[21:03] <sjust> wido: sorry for the delay, looking into it no
[21:03] <sjust> *now
[21:17] <wido> sjust: No problem at all! Just wanted to know
[21:17] <wido> "8224 pgs: 8191 active+clean, 33 active+clean+inconsistent"
[21:29] <wido> sjust: I'm afk for a moment. Have to work tonight, I'll be back in about 2:30h if you need me
[21:41] * uwe (~uwe@ip-94-79-145-210.unitymediagroup.de) has joined #ceph
[22:12] * `gregorg` (~Greg@ has joined #ceph
[22:12] * gregorg_taf (~Greg@ Quit (Read error: Connection reset by peer)
[22:14] * gregorg_taf (~Greg@ has joined #ceph
[22:14] * `gregorg` (~Greg@ Quit (Read error: Connection reset by peer)
[22:24] * verwilst (~verwilst@dD576FAAE.access.telenet.be) has joined #ceph
[22:43] * verwilst (~verwilst@dD576FAAE.access.telenet.be) Quit (Quit: Ex-Chat)
[23:28] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[23:47] <Tv|work> i wonder how much time i've spent watching ceph compile..
[23:48] <Tv|work> i need to add something on top of autotest for pre-built tarballs, this is driving me crazy
[23:49] <cmccabe> tv: can you just install the dpkg?
[23:51] <Tv|work> cmccabe: that might be the easiest route, but it still means having some way of delivering the *right* deb
[23:52] <cmccabe> tv: the other favorite solution is running make install DESTDIR=/my/chroot/type/environment
[23:52] <Tv|work> deb or with binaries, not much difference there
[23:52] <cmccabe> tv: then you have a directory with just the ceph binaries and support stuff in it, which you can copy over
[23:52] <Tv|work> but it needs to be built & shipped to many machines etc
[23:52] <cmccabe> tv: making a deb is a more difficult process than typing make install, at least at the moment
[23:52] <Tv|work> and not every test wants the same version
[23:52] <Tv|work> and not every test runs on the same hardware platform
[23:52] <Tv|work> etc etc
[23:53] <cmccabe> tv: making a deb involves making a tarball, running pbuilder, all this stuff.
[23:53] <cmccabe> tv: I would probably stick to copying binaries out of some location using rsync
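Note on the suggestion above: cmccabe's approach is two steps, a make install into a staging directory via DESTDIR, then an rsync of that directory to each test machine. The sketch below only builds the argument lists for those commands so they can be inspected or handed to subprocess. It does not address the objections raised next in the log (multiple versions and multiple platforms). The function name staging_commands and the remote directory are hypothetical.

```python
def staging_commands(stage_dir, hosts, remote_dir):
    """Return argv lists: one staged install, then one rsync per host."""
    commands = [["make", "install", "DESTDIR=" + stage_dir]]
    # A trailing slash makes rsync copy the directory's contents
    # rather than the directory itself.
    source = stage_dir.rstrip("/") + "/"
    for host in hosts:
        commands.append(["rsync", "-a", source, "%s:%s" % (host, remote_dir)])
    return commands
```

Each list can be passed to subprocess.check_call as-is, which sidesteps shell quoting entirely.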
[23:54] <Tv|work> sadly it ain't that simple
[23:54] <Tv|work> multiple platforms, building master vs "what i just typed in my editor", etc
[23:55] <cmccabe> tv: if we come up with a solution that works for a simple case, perhaps we can add features over time
[23:55] <cmccabe> tv: I understand the appeal of compiling everything inside the VM, but unfortunately ceph takes a long, long time to compile in restricted-resource environments.
[23:55] <Tv|work> oh that's not the point here
[23:55] <Tv|work> the point is compiling once vs many times
[23:56] <cmccabe> tv: I've been trying to fight this battle for a while-- for example, getting people to stop inlining everything in headers so much
[23:56] <cmccabe> so why can't I just set up a directory with binaries and tell your test infrastructure to use that?
[23:57] <Tv|work> cmccabe: uhh, because of networking?
[23:57] <cmccabe> the VMs must have some sort of network
[23:57] <cmccabe> tv: you need some kind of communications channel in
[23:57] <Tv|work> which would be what i need to build
[23:57] <Tv|work> come on
[23:58] <cmccabe> tv: well let me know if you want help or ideas
[23:59] <cmccabe> tv: I think the success of this project will be very important to ceph and I'm glad you've made so much progress on it

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.