#ceph IRC Log


IRC Log for 2011-03-31

Timestamps are in GMT/BST.

[0:00] <DJLee> oh actually enough metaops with file w/r, etc, but still doesnt make mds link no more than 1mb/s
[0:00] <DJLee> including cpu load
[0:01] <DJLee> oh actually cpu is sweating a bit;
[0:01] <gregaf> the MDS is really never going to max out its network link; it's the CPU usage that you've got to worry about
[0:02] <gregaf> or occasionally the disk latency, but hopefully not too often
[0:02] <Tv> and even without the cpu load, you can have clients suffering from locking delays
[0:02] <Tv> synchronous updates in multi-writer scenarios etc
[0:04] <gregaf> that's just a performance disaster anyway though
[0:04] <gregaf> not that it's any worse for us than for anybody else, mind you
[0:10] <bchrisman> yeah… libceph doesn't implement any lock calls… though if I remember from some testing I did, posix locks work...
[0:11] <Tv> i'm sure there are similar scenarios with two clients manipulating the same directory
[0:11] <Tv> they'll invalidate each others caches all the time
[0:18] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[0:25] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[0:29] * imcsk8 (~ichavero@ Quit (Quit: Leaving)
[0:41] <Tv> sagewk: sepia{23,24,25,26,27,50,54,55,56,60,61,62,63,64,65,68,69} have rbd in kernel now
[0:41] <Tv> sagewk: more to come as soon as i can lock machines in autotest..
[0:42] <sagewk> sweet thanks
[0:42] <bchrisman> Curious about what the DirResult struct is doing/being used for? I ran into it trying to chase down why lib ceph does an indirect return mechanism with DIR ** rather than what's probably more usual, returning a DIR *… it looks to me like the ceph client is getting bits and pieces of the directory, possibly from different places?
[0:42] <bchrisman> (for the opendir call, but seems to be in other calls as well)
[0:46] <sagewk> bchrisman: yeah, the directory may be fragmented, and we fetch in pieces from the mds
[0:47] <bchrisman> sagewk: okay.. that's what I was expecting… good enough..
[0:55] <sagewk> not sure about the DIR ** tho :)
[0:56] <Tv> canonical answer: to provide out-values, where the return value itself is an int to pass error info
[0:57] <Tv> source seems to confirm that theory
[1:00] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[1:01] <sagewk> i suspect it's ugly because I tried to fake the standard types. that was probably a dumb idea. we can clean that up...
[1:02] <bchrisman> looks like it works for the fuse client well enough :)
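The out-value pattern Tv describes can be sketched roughly as follows. This is only an illustration of the idea: `DirHandle`, `open_dir` and `close_dir` are hypothetical names and not the libceph API, and the real DirResult holds far more state.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

// Hypothetical stand-in for a directory handle (not the real DirResult).
struct DirHandle {
    std::string path;
    std::vector<std::string> entries;
};

// Returns 0 on success or a negative errno value on failure.
// The handle itself is handed back through the out-parameter.
int open_dir(const std::string& path, DirHandle** out) {
    assert(out != nullptr);               // caller must supply a slot for the handle
    *out = nullptr;                       // never leave the out-value dangling
    if (path.empty() || path[0] != '/') return -ENOENT;
    *out = new DirHandle{path, {".", ".."}};
    return 0;
}

// Releases a handle obtained from open_dir().
int close_dir(DirHandle* dir) {
    if (dir == nullptr) return -EBADF;
    delete dir;
    return 0;
}
```

A caller would declare `DirHandle* d = nullptr;`, call `open_dir("/some/dir", &d)`, and check the int result before touching `d`, which is what a plain pointer return cannot express without a side channel such as errno.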
[1:04] <Tv> sjust1: you have sepia{13,14,16,17,18,89} locked. i'd like to upgrade their kernels..
[1:05] <sjust1> ah, ok
[1:08] <Tv> sjust1: it seems sepia13 is down, any clue?
[1:08] <sjust1> oh, right
[1:10] <sjust1> sepia13 is reinstalling
[1:12] <Tv> ok then it'll get the kernel with the usual run, that's fine
[1:15] <Tv> sjust1: fyi seems like sepia89 is booting from the network, going into bios
[1:15] <sjust1> nooooooooooooooo
[1:15] <sjust1> ok
[1:16] <Tv> oh the serial console speed is busted
[1:16] <sjust1> yes
[1:16] <Tv> ok leaving it locked so it's not used for tests
[1:16] <sjust1> it is where the kernel building stuff is anyway
[1:17] <Tv> oh yeah i already filed https://dev.newdream.net/issues/9292
[1:17] <Tv> yeah just taking it more clearly out of rotation, then
[1:19] <Tv> alright that means kernel upgrade is done
[1:19] <Tv> and painful things in my tools are yet again more obvious :(
[1:19] <Tv> (need to deal with down or locked hosts nicer)
[1:27] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:19] * samsung (~samsung@ has joined #ceph
[2:26] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[2:27] <bchrisman> libceph's lseek… from what I can tell, ceph_read and ceph_write are effectively pread & pwrite, in that they take in an offset… so I'm not sure what ceph_lseek would do?
[3:01] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[3:19] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[3:20] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:31] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[3:33] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit ()
[3:42] * pombreda (~Administr@dev.nexb.us) has joined #ceph
[3:50] * cmccabe (~cmccabe@ has left #ceph
[4:05] * pombreda (~Administr@dev.nexb.us) Quit (Quit: Leaving.)
[5:00] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[5:06] * MarkN (~nathan@ has joined #ceph
[5:06] * MarkN (~nathan@ has left #ceph
[5:14] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[5:24] <samsung> hi,can i set filestore_btrfs_snap to false?
[5:48] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[5:51] <greglap> bchrisman: been a while since I looked at libceph, but I think there are either two sets of read/write functions or else the offset can be set to −1 and is ignored?
[5:52] <greglap> samsung: you can set it false, but you don't want to — that'll slow down your commits and stuff rather a lot!
[6:00] <samsung> that can result in osd timeout?
[6:03] <greglap> ummm, probably not
[6:03] <greglap> just slower throughput
[6:06] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[6:12] * lxo (~aoliva@ Quit (Ping timeout: 480 seconds)
[6:33] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: neurodrone)
[7:45] * lxo (~aoliva@ has joined #ceph
[8:11] * root (~root@ has joined #ceph
[8:11] * root is now known as chengmao
[8:12] <chengmao> hi,ai
[8:12] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[8:13] * chengmao (~root@ Quit ()
[8:14] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:16] * Administrator_ (~Yuki@ has joined #ceph
[8:20] <Administrator_> exit
[8:20] * Administrator_ (~Yuki@ Quit ()
[8:21] <samsung> did snap sync do faster than btrfs sync?
[8:30] * alexxy (~alexxy@ has joined #ceph
[8:57] * allsystemsarego (~allsystem@ has joined #ceph
[9:08] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:46] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: neurodrone)
[10:05] * lxo (~aoliva@ Quit (Read error: Connection reset by peer)
[10:14] * lxo (~aoliva@ has joined #ceph
[10:59] * WhiteKIBA (~WhiteKIBA@vm3.rout0r.org) has left #ceph
[11:02] * DJLee (82d8d198@ircip2.mibbit.com) Quit (Remote host closed the connection)
[11:28] * gregorg (~Greg@ Quit (Quit: Quitte)
[12:45] * st-7068 (~st-7068@a89-154-147-132.cpe.netcabo.pt) has joined #ceph
[12:46] * st-7068 (~st-7068@a89-154-147-132.cpe.netcabo.pt) Quit ()
[13:07] * Yoric (~David@ has joined #ceph
[13:14] * st-7320 (~st-7320@a89-154-147-132.cpe.netcabo.pt) has joined #ceph
[13:25] * gregorg (~Greg@ has joined #ceph
[13:31] * Administrator_ (~samsung@ has joined #ceph
[13:32] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[13:35] * samsung (~samsung@ Quit (Ping timeout: 480 seconds)
[13:52] * st-7320 (~st-7320@a89-154-147-132.cpe.netcabo.pt) Quit (Remote host closed the connection)
[14:33] * Yoric (~David@ Quit (Ping timeout: 480 seconds)
[14:37] * Yoric (~David@ has joined #ceph
[14:58] * Yoric (~David@ Quit (Quit: Yoric)
[15:17] * Administrator__ (~samsung@ has joined #ceph
[15:23] * Administrator_ (~samsung@ Quit (Ping timeout: 480 seconds)
[16:01] * Administrator_ (~samsung@ has joined #ceph
[16:02] * Administrator_ (~samsung@ Quit ()
[16:06] * Administrator__ (~samsung@ Quit (Ping timeout: 480 seconds)
[16:52] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[16:52] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[17:43] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:55] * greglap (~Adium@ has joined #ceph
[18:01] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[18:08] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:20] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[18:38] * greglap (~Adium@ Quit (Quit: Leaving.)
[18:58] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:02] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[19:17] <bchrisman> had a question from late yesterday, apologies for repeating: libceph's lseek… from what I can tell, ceph_read and ceph_write are effectively pread & pwrite, in that they take in an offset… so I'm not sure what ceph_lseek would do?
[19:19] <gregaf> bchrisman: been a while since I looked at libceph, but I think there are either two sets of read/write functions or else the offset can be set to −1 and is ignored?
[19:19] <Tv> bchrisman: libceph does seem to have ways to "read from current offset" too
[19:19] <bchrisman> ahh maybe that's implemented down the stack in the client write method.
[19:19] <bchrisman> okie
[19:19] <Tv> if (offset < 0) {
[19:19] <Tv> lock_fh_pos(f);
[19:19] <Tv> offset = f->pos;
[19:19] <Tv> movepos = true;
[19:20] <Tv> }
[19:20] <Tv> looks like negative offsets are "current offset"
[19:20] <bchrisman> good enough. :)
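A minimal sketch of the behaviour in the snippet Tv pasted, assuming an in-memory file: a non-negative offset acts like pread and leaves the handle position alone, while a negative offset means "read from the current position and advance it". `FileHandle` and `read_at` are hypothetical names, and the locking done by `lock_fh_pos` in the real client is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical in-memory file handle; the real client keeps far more state.
struct FileHandle {
    std::string data;  // file contents
    int64_t pos;       // current offset, the thing an lseek call would change
};

// Reads up to `size` bytes. offset >= 0 behaves like pread and leaves
// f.pos alone; offset < 0 reads from f.pos and advances it.
int64_t read_at(FileHandle& f, char* buf, int64_t size, int64_t offset) {
    assert(buf != nullptr && size >= 0);
    bool movepos = false;
    if (offset < 0) {
        offset = f.pos;
        movepos = true;
    }
    const int64_t len = static_cast<int64_t>(f.data.size());
    if (offset >= len) return 0;                     // at or past end of file
    const int64_t n = std::min(size, len - offset);
    std::memcpy(buf, f.data.data() + offset, static_cast<std::size_t>(n));
    if (movepos) f.pos = offset + n;
    return n;
}
```

Under this reading an lseek-style call only matters for callers that pass a negative offset, which is why the read and write calls can look like pread and pwrite while a seek call still has a purpose.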
[19:28] <cmccabe> should perhaps add a comment to the header about that
[19:30] <Tv> or, like, document the API?-)
[19:30] <gregaf> heresy!
[19:36] <wido> I'm seeing a buffer overflow in the radosgw, it's due to CEPH_CRYPTO_HMACSHA1_DIGESTSIZE which is too small
[19:36] <wido> the default is 20, but I needed at least 42 to get my segfault (abort due to buffer overflow) away, but auth_sign.compare() is still failing
[19:38] <Tv> wido: that's funky.. what commit are you at?
[19:39] <wido> Tv: d5f10bcb054613e7
[19:39] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Remote host closed the connection)
[19:39] <wido> s3 -u list fails for example
[19:40] <wido> after incrementing the constant from 20 to 42 I got it working, but getting gdb attached to a fastcgi process isn't trivial
[19:41] <Tv> wido: ok i'm due to start testing rgw for real soon, i will look at that shortly.. feel free to file a ticket
[19:42] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:42] <Tv> i don't see where the 42 might come from though.. hexdigits + '\0' + off by one error, perhaps
[19:45] <wido> Tv: Me neither, but that worked, I've been incrementing until it started working
[19:46] <Tv> wido: 70b021d4f926d20f7478851cb44ff9609420ad4e might be related
[19:46] <Tv> wido: i never went through that commit in detail enough to understand it
[19:48] <wido> Tv: CEPH_CRYPTO_HMACSHA1_DIGESTSIZE * 2 + 1
[19:48] <Tv> is 41
[19:48] <wido> 20*2+1 = 41
[19:48] <wido> you were faster, let me test with 41 :-)
[19:48] <Tv> needs that extra off by one..
[19:49] <Tv> and i just looked at the hexifying logic and it wasn't obviously wrong
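The 20 * 2 + 1 = 41 arithmetic is just the space needed to hex-encode a 20-byte digest: two characters per byte plus a terminating NUL. A small sketch under that assumption follows; `to_hex`, `kDigestSize` and `kHexSize` are hypothetical names, not the `buf_to_hex` helper from the Ceph tree.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constant mirroring the 20-byte SHA-1 digest size.
constexpr std::size_t kDigestSize = 20;
// Two hex characters per byte plus the terminating NUL: 20 * 2 + 1 = 41.
constexpr std::size_t kHexSize = kDigestSize * 2 + 1;

// Writes `len` bytes as lowercase hex into `out`. Returns false, writing
// nothing, when `out_len` cannot hold 2 * len characters plus the NUL.
bool to_hex(const unsigned char* in, std::size_t len, char* out, std::size_t out_len) {
    assert(in != nullptr && out != nullptr);
    if (out_len < len * 2 + 1) return false;
    static const char digits[] = "0123456789abcdef";
    for (std::size_t i = 0; i < len; ++i) {
        out[2 * i] = digits[in[i] >> 4];
        out[2 * i + 1] = digits[in[i] & 0x0f];
    }
    out[2 * len] = '\0';
    return true;
}
```

Passing the destination size explicitly lets the helper refuse a 40-byte buffer instead of writing the NUL one byte past its end.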
[19:50] <Tv> oh wait what
[19:50] <Tv> /* destination should be CEPH_CRYPTO_HMACSHA1_DIGESTSIZE bytes long */
[19:50] <Tv> buf_to_hex((unsigned char *)dest, CEPH_CRYPTO_HMACSHA1_DIGESTSIZE, hex_str);
[19:50] <Tv> now that is wrong
[19:51] <Tv> err wait buf_to_hex destination arg is the last arg, not the first one..
[19:51] <Tv> ok that hex string is only used for the log
[19:53] <wido> Tv: it wasn't 42, but 40. 39 fails, 40 works
[19:53] <wido> verified it again just now
[19:53] <Tv> wido: can you give me the exact line you're changing?
[19:54] <Tv> i think i saw some places where rgw concatenated two sha-1's, intentionally
[19:54] <wido> Tv: I noticed the overflow in s3::authorize, saw the constant coming from common/ceph_crypto.h
[19:55] <wido> to see if it really was the problem I incremented CEPH_CRYPTO_HMACSHA1_DIGESTSIZE to 40
[19:55] <wido> in ceph_crypto.h
[19:55] <wido> no real fix though, just to see if that was the issue.
[19:55] <Tv> ok but in authorize
[19:55] <Tv> i think Wes found the same bug at the same time ;)
[19:57] <Tv> wido: since you can reproduce it nicely, can you try restoring the define to 20, but at the end of rgw_rest_s3.cc changing char hmac_sha1[CEPH_CRYPTO_HMACSHA1_DIGESTSIZE];
[19:57] <Tv> wido: just say [40] there
[19:58] <Tv> i don't yet understand how it could crash, but that looks like it'd be the place
[19:59] <Tv> wido: btw i definitely see the ceph_armor call right after it crashing if you changed the DIGESTSIZE define
[19:59] <Tv> wido: so i think you tried to work around a bug, but caused another :-/
[19:59] <Tv> hmm except that should only crash with DIGESTSIZE>45
[19:59] <Tv> so maybe you were safe, after all
[20:01] <wido> Tv: I tried changing that char to a size of 40 this afternoon, still the buffer overflow
[20:02] <wido> do you have a simple way to get gdb attached to a FastCGI process? That's what I've been looking for all afternoon
[20:04] <Tv> wido: i still haven't run rgw, personally..
[20:04] <cmccabe> wido: the core file isn't helpful?
[20:04] <Tv> wido: but i'm due to automate tests of it, so that will happen soon
[20:10] <cmccabe> wido: you can get core files by setting ulimit -c unlimited and setting /proc/sys/kernel/core_pattern before starting fastcgi
[20:10] <wido> cmccabe: Haven't looked at the core, but it's running under www-data, so I'd have to set another core dump location
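When the launching shell is awkward to control (as with a FastCGI wrapper), one alternative to `ulimit -c unlimited` is for the process to raise its own soft core limit at startup. The sketch below uses the POSIX `getrlimit`/`setrlimit` calls; `raise_core_limit` is a hypothetical helper, and nothing in the log says radosgw does this. The core file location is still governed by /proc/sys/kernel/core_pattern.

```cpp
#include <cassert>
#include <sys/resource.h>

// Raises the soft core-file size limit to the hard limit, the in-process
// counterpart of running `ulimit -c` before launching the daemon.
// Returns 0 on success, -1 on failure (errno is set by the system call).
int raise_core_limit() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) return -1;
    lim.rlim_cur = lim.rlim_max;  // unlimited only if the hard limit allows it
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Raising the soft limit up to the hard limit needs no privileges, so this works for a daemon running as www-data as long as the hard limit itself is not zero.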
[20:11] <cmccabe> wido: also, are you using cryptopp or the other crypto lib
[20:11] <bchrisman> gregaf: is there an ETA on 0.26.. and will that osd fix be in it? :)
[20:11] <bchrisman> gregaf: schedule on tracker suggests 0.26 on 3/26
[20:12] <gregaf> well we're trying two-week sprints but I think we actually just started it…not really sure how we're handling the transition though
[20:12] <wido> cmccabe: libcrypto8
[20:12] <gregaf> it's a pretty short patch you can apply yourself, or we autobuild packages of the next branch so you can just grab those if you like
[20:13] <gregaf> (and yes, the next branch is for .26 so the fix will be in it)
[20:13] <cmccabe> wido: I think it's either cryptopp or nss
[20:13] <cmccabe> wido: I don't know what crypto8 is, but chances are, ceph isn't using it
[20:14] <Tv> cmccabe: sounds like libcrypto++8
[20:14] <wido> cmccabe: my bad, cryptopp indeed, the package is called libcrypto
[20:15] <cmccabe> I see one potential overflow, which is that key_len is not checked against the length of key_buf
[20:15] <cmccabe> unless it's checked earlier
[20:15] <bchrisman> gregaf: cool.. thanks… we were tracking master… but that threw some smoke into the process so we backed up to tracking tags.
[20:15] <gregaf> ah, yeah
[20:15] <gregaf> the next and stable branches should be much safer than master
[20:16] <cmccabe> unless hmac.Final is overflowing its buffer, I don't see how dest could be too small
[20:18] <cmccabe> it's not really clear to me what buffer you need for hmac.Final. Maybe there's a surprise hidden in cryptopp
[20:18] <cmccabe> but an md5 is only 20 bytes, so it seems pretty logical to output an md5 into a 20-byte buffer
[20:18] <cmccabe> er, SHA1 rather
[20:19] <Tv> cmccabe: oh this might be it
[20:19] <Tv> src/rgw/rgw_admin.cc:21:#define SECRET_KEY_LEN 40
[20:19] <Tv> that's the user info ->secret_key buffer size
[20:20] <Tv> which is passed as key, key_len to calc_hmac_sha1
[20:20] <cmccabe> yeah...
[20:20] <cmccabe> definitely more than 20 bytes
[20:20] <Tv> which blindly memcpys to buf of size 20
[20:20] <Tv> i'll fix it as soon as i get a local rgw running
[20:21] <cmccabe> I don't understand why SECRET_KEY_LEN would be 40 rather than 20
[20:21] <Tv> yeah me neither but not changing code until i can test it
[20:21] <cmccabe> all my amazon and dreamhost s3 keys are 20 characters
[20:21] <cmccabe> unless he's concatenating secret key and access key into one or something
[20:22] <cmccabe> even then, you would need a null terminator
[20:22] <cmccabe> yeah, that should probably just be 20+1
[20:22] <Tv> there's a +1 in the actual buffer allocation
[20:22] <cmccabe> k
[20:22] <Tv> anyway, i'm on it
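The failure mode discussed above is a 40-byte secret key being copied with memcpy into a 20-byte buffer. A minimal sketch of checking the length before copying follows; `copy_key` is a hypothetical helper and this is not the fix that was eventually committed.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>

// Copies a key into a fixed-size buffer, refusing keys that do not fit
// instead of writing past the end of the buffer.
// Returns the number of bytes copied, or -EINVAL if the key is too long.
int copy_key(char* buf, std::size_t buf_len, const char* key, std::size_t key_len) {
    assert(buf != nullptr && key != nullptr);
    if (key_len > buf_len) return -EINVAL;
    std::memcpy(buf, key, key_len);
    return static_cast<int>(key_len);
}
```

With a check like this, a 40-byte key handed to a 20-byte buffer produces an error return instead of the abort wido saw.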
[20:22] <underdark> Im looking at ceph to do a block-device and possibly file storage cluster, but im lost as to how production ready everything is
[20:23] <Tv> underdark: rbd is pretty good, the ceph filesystem side is not quite as robust yet; proceed with care
[20:23] <Tv> underdark: you might run into lots of issues still, so don't put it blindly onto production, but early adopters are appreciated
[20:24] <underdark> Tv: ok, im looking to build a bunch of systems and first only need block storage for kvm based vm's
[20:25] <underdark> what's the advice on using the debian stock kernel or rolling our own using the latest release from the ceph website?
[20:25] <wido> underdark: The Qemu-RBD code is pretty stable, haven't seen much issues lately
[20:25] <wido> underdark: there is no special kernel needed when running KVM
[20:26] <Tv> underdark: if you use the rbd or ceph kernel modules, you will want the bugfixes from our kernel fork
[20:26] <Tv> underdark: to use the qemu features, you don't need those kernel modules
[20:26] <underdark> ok, sounds good
[20:26] <bchrisman> I'm also guessing libceph doesn't implement lock calls because it was originally built for hadoop … which is … well… lockless-ish...?
[20:27] <gregaf> bchrisman: libceph wasn't built for hadoop, that's just the only in-tree user
[20:27] <gregaf> it doesn't implement locks because….well, it never got updated after we added locking
[20:28] <gregaf> there's no advisory locking support in the userspace client at all right now
[20:28] <bchrisman> ahh haven't tested that over fuse.
[20:28] <bchrisman> guess we know how that would turn out then.. :)
[20:28] <gregaf> heh
[20:28] <gregaf> patches welcome! ;)
[20:30] <bchrisman> yeah… I'm guessing we can look at the kernel source to see how that's implemented… and then implement in client/Client.cc… and then export that interface to libceph/cfuse?
[20:30] <gregaf> yep
[20:30] <bchrisman> okie
[20:31] <wido> Tv: I think I found it
[20:32] <wido> char key_buf[(CEPH_CRYPTO_HMACSHA1_DIGESTSIZE * 2) + 1];
[20:32] <Tv> wido: that's the one being used to hexify it, then logged? you could even comment out those few lines and not change the actual functioning
[20:33] <wido> right now my RGW is working again, creating and listing
[20:33] <wido> got to run! be back in about 1.5 hours
[20:33] <cmccabe> wido: we did find it. it's the fact that SECRET_KEY_LEN doesn't match CEPH_CRYPTO_HMACSHA1_DIGESTSIZE
[20:33] <cmccabe> wido: yet the code assumes that it does
[20:33] <wido> cmccabe: Ah, yes, a few lines back. Missed that part
[20:34] <Tv> wido: if you got a patch, please share just so we can verify we're talking about the same thing; i'm setting up a dev env right now
[20:34] <cmccabe> wido: I would just try making SECRET_KEY_LEN the same as CEPH_CRYPTO_HMACSHA1_DIGESTSIZE
[20:34] <wido> Tv: sure, I'll be back in 1.5 hours, I'll do it by then
[20:34] <Tv> wido: thanks
[20:34] <cmccabe> wido: bye, see you later
[20:36] <bchrisman> gregaf: right now I'm putting up a quick and dirty samba vfs layer which attaches directly to libceph instead of going through cfuse/kernel client… that's why all the libceph questions… good to see that it's flushing out issues that we'll need to deal with (aka, patch etc) in the underlying functionality..
[21:05] <Tv> yay for a working rgw
[21:19] <cmccabe> tv: I am really confused by the difference between RGWHandler_REST_OS and RGWHandler_REST_S3
[21:20] <cmccabe> tv: I thought I understood this class hierarchy until I saw these two classes, which seem to be doing the same thing?
[21:20] <cmccabe> oh
[21:20] <cmccabe> it's for openstack support
[21:21] <Tv> "OS" is a horrible abbreviation for openstack :-(
[21:21] <cmccabe> I think I'm going to end up creating somewhere between 6 and 12 new classes
[21:21] <cmccabe> giving this adventure a java-like feel
[21:22] <Tv> cmccabe: what are you working on?
[21:22] <cmccabe> status command
[21:22] <Tv> StatusFactoryFactoryFactoryGetter
[21:22] <cmccabe> .... factory
[21:23] <cmccabe> it would be nice to see people start using composition more than inheritance
[21:23] <cmccabe> I also have a test env, so let me know if you want to test the key_len fix
[21:24] <Tv> cmccabe: my real interest is making sure that the tests we already kind of have cover bugs like that
[21:25] <cmccabe> tv: yeah
[21:31] <bchrisman> does 'cwd' get initialized somewhere after mount in Client?
[21:32] <bchrisman> wondering if I call 'path_walk' after initialize/mount, whether it's going to choke on asserting cwd (as cur)
[21:33] <cmccabe> bchrisman: perhaps call chdir first?
[21:33] <cmccabe> bchrisman: I'm not really sure why it starts as NULL rather than /, you'd have to ask sage or someone
[21:34] <bchrisman> cmccabe: chdir calls path_walk :)
[21:35] <cmccabe> bchrisman: doh!
[21:35] <bchrisman> add_update_inode looks like it does something..
[21:35] <bchrisman> maybe that's internal.. will see where that's called from..
[21:37] <gregaf> bchrisman: yeah, it gets set as soon as there are any inodes in the cache, which happens on mount
[21:38] <gregaf> it's perhaps not the greatest startup but I'm sure there was some reason for it
[21:48] <bchrisman> ahh cool.. that's fine… thanks
[21:52] <cmccabe> do we have a way to parse xml in nagios?
[21:53] <cmccabe> since generally s3 responses are xml, we could return somewhat structured data
[22:06] <frank_> what would general recommendation be for an offsite (gigE, +/- 5ms RTT latency) replication of a ceph cluster?
[22:08] <gregaf> frank_ depends on how much data you're pushing
[22:08] <gregaf> you could set up a cluster and custom CRUSH map to replicate all the data off-site if your writes will fit inside a gigE connection
[22:09] <gregaf> otherwise, rsync+cron? ;)
[22:10] <frank_> gregaf: wouldn't the reads be the problem?
[22:10] <frank_> (see http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg01488.html)
[22:11] <gregaf> you can set up a CRUSH map so that the off-site storage never serves reads
[22:11] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[22:11] <gregaf> you'd put the off-site OSDs in a separate grouping from the on-site ones, and for each crush rule grab the number of on-site replicas you want, and then grab one off-site replica
[22:11] <frank_> myes
[22:11] <frank_> would be possible
[22:12] <gregaf> the primary will then always be on-site and you will push writes out to the off-site replica synchronously (so that connection needs to be able to keep up with all your writes)
[22:12] <frank_> but i was thinking along the lines of a zfs send | ssh | zfs receive like strategy
[22:12] <gregaf> but reads are always served from the primary
[22:13] <gregaf> ah, dunno about that kind of stuff then
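The rule gregaf describes (take the on-site replicas first, then one off-site replica, so the primary is always on-site) can be modelled with a toy placement function. This is not CRUSH and does not use its map syntax; `place` and the integer OSD ids are assumptions made purely to show the ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy placement: pick `n_onsite` distinct on-site OSDs first, then one
// off-site OSD. This is not CRUSH; it only mirrors the ordering idea.
std::vector<int> place(const std::string& object,
                       const std::vector<int>& onsite,
                       const std::vector<int>& offsite,
                       std::size_t n_onsite) {
    assert(!onsite.empty() && !offsite.empty());
    assert(n_onsite >= 1 && n_onsite <= onsite.size());
    const std::size_t h = std::hash<std::string>()(object);
    std::vector<int> replicas;
    for (std::size_t i = 0; i < n_onsite; ++i)
        replicas.push_back(onsite[(h % onsite.size() + i) % onsite.size()]);
    replicas.push_back(offsite[h % offsite.size()]);
    return replicas;  // replicas[0] plays the role of the primary
}
```

Because the on-site picks always come first, the first entry (the one serving reads in this model) is never an off-site OSD, while every write still has to reach the last entry over the slower link.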
[22:22] <Tv> cmccabe: nagios doesn't parse anything; the monitoring plugins then again are arbitrary code you write
[22:23] <Tv> cmccabe: but that implies having to write one, as opposed to using one that makes sure that a http status==200 and body=="ok"
[22:23] <wido> Is there any particular reason that CEPH_CRYPTO_HMACSHA1_DIGESTSIZE is set to 20?
[22:23] <cmccabe> tv: it's just a thought. I'm sure the first thing will be status==200
[22:23] <wido> it's only being used in the RGW
[22:24] <Tv> wido: that's the size of SHA1 output
[22:24] <wido> Isn't my secret 40 chars long?
[22:24] <cmccabe> wido: see if 025816648912258c4eac2f16506eb7d782f3701f fixes your crash bug
[22:24] <wido> the access key is 20, the secret 40
[22:24] <Tv> wido: that might be the source of the bug
[22:26] <Tv> cmccabe: oh wow nothing ever used the key_buf?
[22:26] <Tv> that's ridiculous
[22:26] <cmccabe> tv: yeah, it was a real brown paper bag one
[22:27] <wido> cmccabe: Still compiling, but I think that was the source indeed, my hacking on those lines resolved it for me
[22:28] <wido> cmccabe: confirmed, fixed :) tnx
[22:28] <cmccabe> wido: great
[22:29] <wido> but the question remains, why the separate constants in rgw_admin and ceph_crypto
[22:29] <wido> they seem to do the same, but could only cause confusion
[22:29] <Tv> wido: perhaps the user secret can be near anything
[22:31] <wido> Tv: in size you mean?
[22:31] <Tv> ret = gen_rand_base64(secret_key_buf, sizeof(secret_key_buf));
[22:31] <Tv> yup, it's 40 bytes of base64-encoded randomness
[22:32] <Tv> that is, it really is 40 bytes
[22:32] <wido> ah, yes
[22:38] <wido> i'm goin afk, ttyl
[22:39] <cmccabe> I do think we should unify those constants, or put in a static_assert
[22:40] <Tv> this particular one is an arbitrary size though, unrelated to SHA1 etc
[22:40] <Tv> you could s/40/57/ and be happy
[22:41] <cmccabe> oh, on closer inspection, yeah. It doesn't really have anything to do with SHA1
[22:41] <Tv> apart from silly bugs
[22:41] <cmccabe> the bug was ever thinking that it did
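Since the two sizes turn out to be unrelated, one way to read the static_assert suggestion is to assert only what the code actually relies on: that a buffer meant for the secret key can hold it. The constant and type names below are hypothetical stand-ins for SECRET_KEY_LEN and CEPH_CRYPTO_HMACSHA1_DIGESTSIZE.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kSha1DigestSize = 20;  // fixed by SHA-1
constexpr std::size_t kSecretKeyLen = 40;    // arbitrary choice, per the log

// Buffer sized from the key constant, not from the digest constant.
struct SecretKeyBuf {
    char bytes[kSecretKeyLen + 1];  // +1 for the NUL terminator
};

// True when a buffer of `buf_len` bytes can hold `key_len` bytes.
constexpr bool key_fits(std::size_t buf_len, std::size_t key_len) {
    return key_len <= buf_len;
}

// State the requirement at compile time instead of assuming it.
static_assert(key_fits(sizeof(SecretKeyBuf), kSecretKeyLen + 1),
              "secret key buffer must hold the key plus its NUL");
// A digest-sized buffer is the wrong place for a secret key.
static_assert(!key_fits(kSha1DigestSize, kSecretKeyLen),
              "digest-sized buffer cannot hold a secret key");
```

If someone later changes the key length, as in Tv's s/40/57/ example, the first assertion keeps holding because the buffer is derived from the same constant.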
[22:46] <sagewk> bchrisman: are you working on the libceph/samba glue, or is that coming later?
[22:47] <bchrisman> sagewk: yeah… doing a rough first implementation
[22:48] <bchrisman> sagewk: we may go with samba-over-cfuse… but we're going to startup/explore this a bit along the way
[22:48] <bchrisman> sagewk: I've mapped most of the calls.. mainly just working on xattr stuff for now...
[22:49] <bchrisman> sagewk: putting that into libceph.. then calling from vfs/samba… it's an unusual vfs layer because it backs the whole thing, rather than making a few mods for a particular filesystem
[22:50] <sagewk> ok. fwiw i think libceph is going to perform much better :)
[22:50] <bchrisman> sagewk: but it would get the kernel pretty much completely out of the way, except for socket calls
[22:50] <sagewk> also, can you resend the Client.cc namespace patch when you have a chance?
[22:50] <bchrisman> sagewk: yeah.. cfuse will go into and out of the kernel.. and *then* make those same socket calls
[22:50] <bchrisman> sagewk: yup.. I'll do that before end of the week…
[22:52] <sagewk> k thanks. if you run into other issues with libceph (besides setxattr) let us know
[22:52] <sagewk> (and/or send patches :)
[22:52] <sagewk> starting on the lookup by ino stuff now
[22:53] <Tv> so RGWRados::create_bucket is taking ridiculously long for me, calling from rgw_admin user create.. how do i debug this?
[22:54] <Tv> (from rgw_store_user_info)
[22:58] <sagewk> this is in your test environment?
[22:58] <sagewk> give radosgw --log_file /tmp/foo --debug-objecter 10 --debug-ms 1 and see what it's doing
[22:59] <Tv> sagewk: it's vstart.sh env
[23:01] <Tv> sagewk: you mean radosgw_admin? i can't get it to recognize those options
[23:02] <Tv> i can just slap them in ceph.conf, though...
[23:04] <Tv> http://pastebin.com/raw.php?i=53uqk275
[23:05] <Tv> osd log: http://pastebin.com/raw.php?i=chv5B0Jj
[23:08] <sagewk> probably the .users bucket needs to be created first
[23:08] <sagewk> er, object.
[23:09] <sagewk> can you do debug osd = 20?
[23:09] <Tv> on the cosd or on the rgwadmin?
[23:09] <sagewk> i'm not sure what the rgw bootstrapping looks like.. some of these buckets/objects may need to be precreated.
[23:09] <sagewk> cosd
[23:09] <Tv> ok hold on
[23:12] <Tv> http://pastebin.com/raw.php?i=UUnZL3Qn
[23:12] <Tv> bleh 2011-03-31 14:11:48.815011 7fbcfc458700 osd0 3 still in boot state, dropping message osd_op(client4120.0:1 .users [create 0~0] 4.f3e9) v2
[23:15] <Tv> restarting to cluster, just because that's the hammer i have..
[23:15] <Tv> s/to/the/
[23:17] <Tv> ahahaah
[23:17] <Tv> [mon.a]
[23:17] <Tv> mon data = "dev/mon.a"
[23:17] <Tv> those quotes are in the actual filenames
[23:17] <Tv> nice, vstart.sh ;)
[23:18] <sagewk> that's a new config parsing issue... wanna take a look cmccabe?
[23:18] <Tv> well it could also be just a matter of not writing the quotes in
[23:18] <Tv> trivial change in vstart.sh
[23:18] <sagewk> that's a quick workaround
[23:18] <Tv> i'll commit it once i have some clarity on what's blowing up here
[23:18] <sagewk> that error message on the osd is a bug btw, fixing now.
[23:20] <Tv> sagewk: you mean it's expected to work even if it says bad things?
[23:21] <sagewk> the osd shouldn't drop messages in the boot state or else there's a race during startup (that you hit it looks like)
[23:21] <Tv> ahh
[23:21] <Tv> well i'm more curious why it didn't move out of boot state
[23:21] <sagewk> do you have the full log?
[23:22] <Tv> hold on, vstart ate my ceph.conf ;)
[23:22] <Tv> well, now i get different behavior
[23:23] <Tv> $ ./radosgw_admin user create --display-name=foo
[23:23] <Tv> 2011-03-31 14:23:01.993664 7fced3fe0720 librados: client.admin authentication error Operation not permitted
[23:23] <Tv> couldn't init storage provider
[23:23] <Tv> this one i should be able to debug
[23:23] <Tv> oh it didn't kill the previous cmon properly
[23:24] <sagewk> you don't have the old log do you?
[23:24] <Tv> sagewk: not one with the debug you asked for
[23:24] <sagewk> k. if you see it again let me know
[23:24] <Tv> yeah
[23:24] <Tv> perhaps it was the same thing though -- vstart failing to kill the old cmon, under some circumstances, new one can't bind and thus exits, osd has old keys that are now useless
[23:25] <Tv> doesn't look like it, but i don't have anything better :(
[23:26] <Tv> and now radosgw_admin worked
[23:27] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[23:33] <cmccabe> tv: yeah, the quotes thing is a bug
[23:33] <cmccabe> tv: I really want to fix it in ConfUtils, but maybe we should put a temporary workaround in place if it's bothering you
[23:34] <Tv> cmccabe: i have enough workaround already, will commit when i'm in cleanup phase
[23:34] <Tv> there's no reason to have quotes there in the first place
[23:34] * eternaleye_ is now known as eternaleye
[23:34] * eternaleye (~eternaley@ Quit (Remote host closed the connection)
[23:34] <cmccabe> tv: well... it is something we support in the config. so it's kind of good to test
[23:34] <Tv> cmccabe: dumb question but.. why?
[23:34] <Tv> as in, what's the value of "" in the config file
[23:35] <cmccabe> so that people can use strings with spaces in them
[23:35] <sagewk> btrfs devs = "/dev/sda /dev/sdb /dev/sdc \
[23:35] <sagewk> /dev/sdd /dev/sde"
[23:35] <cmccabe> otherwise, confutils is aggressive about trimming whitespace
[23:35] <Tv> you mean leading/trailing/more than one contiguous whitespace?
[23:35] <cmccabe> I'm not sure if it strips internal whitespace from keys
[23:36] <cmccabe> so probably just leading and trailing
[23:36] <cmccabe> and more generally, the INI / python conf style formats that everyone uses support quoted strings.
[23:36] <sagewk> hmm i guess we don't really need them actually, even if we do support \. unless we want leading/trailing whitespace in the value.
[23:36] <cmccabe> I think it's weird not to have quoted strings
[23:37] <cmccabe> although I must admit logically you don't need them
[23:37] <sagewk> yeah
[23:37] <cmccabe> but logically you don't need to support linebreaks either, but people are rather fond of them in conf files :)
[23:37] <cmccabe> anyway, I think I can work around this without too much difficulty...
[23:38] <cmccabe> a full fix will come later this week... in the config parser
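The behaviour being debated (trim surrounding whitespace, but let double quotes protect it) can be sketched as below. `trim` and `parse_value` are hypothetical helpers and do not reflect how ConfUtils or md_config_t actually parse values; backslash continuations are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Strips leading and trailing spaces and tabs.
std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Hypothetical value parser: trims whitespace, then removes one pair of
// surrounding double quotes and keeps whatever was inside them verbatim.
std::string parse_value(const std::string& raw) {
    const std::string v = trim(raw);
    if (v.size() >= 2 && v.front() == '"' && v.back() == '"')
        return v.substr(1, v.size() - 2);
    return v;
}
```

With this kind of handling, `mon data = "dev/mon.a"` yields the path dev/mon.a without the quote characters, and a quoted value is the only way to keep leading or trailing whitespace.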
[23:39] <Tv> cmccabe: umm, i don't think ConfigParser does anything automatic about quotes, no idea what you're talking about there
[23:39] <cmccabe> tv: one of my changes was to move parsing of values into md_config_t
[23:40] <cmccabe> tv: part of the reason this was done was to support get_val and set_val (the new library API allowing you to get/set conf values)
[23:40] <cmccabe> tv: long story short, used to, will again, doesn't now
[23:41] <Tv> cmccabe: i'm questioning the need
[23:41] <cmccabe> here's one good thing. We want to be compatible with python's ConfigParser class
[23:41] <cmccabe> and that supports quotes I think
[23:41] <Tv> cmccabe: it doesn't
[23:41] <Tv> cmccabe: and it doesn't like the indents anyway
[23:42] <cmccabe> tv: hmm. Are you saying there's no quoting mechanism in ConfigParser? How do you get leading/trailing whitespace in a value then?
[23:42] <Tv> cmccabe: mostly, you don't
[23:42] <Tv> i guess you could do tricks with \ continuations
[23:42] <Tv> but really, why would you want to?
[23:43] <cmccabe> if you can do it with backslashes, and ConfigParser doesn't support it, that does make me question the need for quotes
[23:45] <cmccabe> tv: mostly I just want us to use something that users will find familiar. If quotes aren't usually supported in these kind of config files then we shouldn't either.
[23:46] <Tv> in my mind, it's code you can rip out without losing anything valuable
[23:46] <Tv> less code = happier Tv
[23:46] <Tv> (especially in this case, where the code has already been ripped out ;)
[23:46] <cmccabe> heh
[23:46] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[23:47] <Tv> well, the s3 tests kinda run
[23:47] <Tv> not very well though
[23:48] <Tv> it's not running all of them, though, and other mysterious things
[23:48] <Tv> def run(self, *args, **vargs):
[23:48] <Tv> pass
[23:48] <Tv> oh Wes
[23:49] <Tv> there are lots of traps here
[23:49] <Tv> and a Grue
[23:51] <Tv> of 15 huge test cases, 3 were enabled and 2 succeeded; forcibly enabling all makes 6 succeed
[23:52] <Tv> i have >600 lines to clean up

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.