#ceph IRC Log


IRC Log for 2011-03-15

Timestamps are in GMT/BST.

[0:11] * MK_FG (~MK_FG@ Quit (Server closed connection)
[0:12] * MK_FG (~MK_FG@ has joined #ceph
[1:25] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[1:28] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:00] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[2:05] * cmccabe (~cmccabe@ has left #ceph
[2:41] * rajeshr (~Adium@ Quit (Quit: Leaving.)
[2:53] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[2:56] * greglap (~Adium@ has joined #ceph
[3:15] * rajeshr (~Adium@99-7-122-114.lightspeed.brbnca.sbcglobal.net) has joined #ceph
[3:41] * lxo (~aoliva@ Quit (Read error: Connection reset by peer)
[3:42] * lxo (~aoliva@ has joined #ceph
[3:56] * DJlee (82d8d198@ircip1.mibbit.com) has joined #ceph
[4:02] * greglap (~Adium@ Quit (Read error: Connection reset by peer)
[4:47] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[4:59] <lxo> upgraded to 0.25.1, let's see if I have better luck with it!
[5:00] <lxo> got rpms for blag140k/x86_64, if anyone's interested
[5:14] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[5:28] * neurodrone_ (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[5:28] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Read error: Connection reset by peer)
[5:28] * neurodrone_ is now known as neurodrone
[6:50] * greglap1 (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[6:50] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[7:03] * hijacker_ (~hijacker@ has joined #ceph
[7:03] * hijacker (~hijacker@ Quit (Read error: Connection reset by peer)
[7:24] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: neurodrone)
[7:26] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:02] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[8:02] * greglap1 (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[8:15] * eternaleye_ (~eternaley@ has joined #ceph
[8:21] * eternaleye (~eternaley@ Quit (Ping timeout: 480 seconds)
[8:44] * johnl (~johnl@johnl.ipq.co) Quit (Remote host closed the connection)
[8:45] * lidongyang (~lidongyan@ Quit (Remote host closed the connection)
[8:50] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[8:51] * johnl (~johnl@johnl.ipq.co) has joined #ceph
[8:52] * lidongyang (~lidongyan@ has joined #ceph
[9:07] * johnl (~johnl@johnl.ipq.co) Quit (Remote host closed the connection)
[9:13] * johnl (~johnl@johnl.ipq.co) has joined #ceph
[9:28] * allsystemsarego (~allsystem@ has joined #ceph
[9:51] * Yoric (~David@ has joined #ceph
[10:09] * Yoric_ (~David@ has joined #ceph
[10:09] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[10:09] * Yoric_ is now known as Yoric
[10:14] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[10:28] * Yoric (~David@ has joined #ceph
[13:49] * [ack] (ANONYMOUS@ Quit (Server closed connection)
[13:49] * [ack] (ANONYMOUS@ has joined #ceph
[15:59] * iggy (~iggy@theiggy.com) Quit (Server closed connection)
[15:59] * iggy (~iggy@theiggy.com) has joined #ceph
[16:28] * prometheanfire (~mthode@mx1.mthode.org) Quit (Server closed connection)
[16:28] * prometheanfire (~mthode@mx1.mthode.org) has joined #ceph
[16:35] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[16:37] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[16:50] * greglap (~Adium@ has joined #ceph
[16:58] * Yoric (~David@ Quit (Quit: Yoric)
[17:20] * greglap (~Adium@ Quit (Ping timeout: 480 seconds)
[17:27] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[17:28] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[17:31] * lxo (~aoliva@ Quit (Read error: Connection reset by peer)
[17:32] * lxo (~aoliva@ has joined #ceph
[17:33] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[17:37] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[17:38] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[17:53] * cmccabe (~cmccabe@m430536d0.tmodns.net) has joined #ceph
[17:53] <Tv> dout("auth_init name '%s' secret '%s'\n", name, secret);
[17:54] <Tv> i question the sensibility of that..
[17:54] <cmccabe> tv: especially since dout doesn't support format specifiers...
[17:55] <Tv> ahahah
[17:55] <Tv> oh this is kernel side
[17:55] <Tv> it's probably a wrapper on printk
[17:55] <cmccabe> tv: yeah, it is
[17:56] <cmccabe> I forget who gets to look at klog
[17:56] <cmccabe> isn't it a root capability or something?
[17:56] <Tv> anyone, by default
[17:56] <Tv> dmesg
[17:57] <cmccabe> but yeah, it does seem that the key should not be echoed
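Editorial sketch, not code from the Ceph tree: one way to avoid echoing a key in a debug line like the one quoted above is to log a short fingerprint of the secret instead. The names `fingerprint` and `auth_init_message` and the message format are assumptions for illustration, and this is Python rather than kernel C.

```python
import hashlib


def fingerprint(secret: str, length: int = 8) -> str:
    """Return a short, non-reversible tag that identifies a secret in logs."""
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    return digest[:length]


def auth_init_message(name: str, secret: str) -> str:
    """Build a debug line that names the key without echoing the key itself."""
    return "auth_init name '%s' secret sha256:%s" % (name, fingerprint(secret))
```

Two log lines with the same fingerprint were very likely produced with the same key, which is usually enough for debugging auth mismatches without putting the key in dmesg.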
[17:58] <Tv> does the on-wire auth protocol actually pass the "name"? how is that used, is it a hashtable key to find the right secret?
[17:58] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:00] <Tv> you know what i love? "git grep" is repeatably faster than "grep"
[18:01] <Tv> (and avoids searching the wrong files)
[18:02] <gregaf> Tv: it's not just a wrapper around grep to avoid searching the wrong files?
[18:02] <Tv> gregaf: nope
[18:03] <Tv> it can do things like search arbitrary commits etc, so it's from scratch C
[18:03] <gregaf> iiiinteresting
[18:03] <Tv> well it'll probably use some regex lib
[18:03] <Tv> there seems to be one bundled in git sources, too, under compat/
[18:04] <Tv> but as is typical with git, it was finetuned by performance maniacs
[18:04] <Tv> oh also, if you like e.g. emacs's M-x grep and just hitting enter on the right line (or if you have something similar for vi/whatever):
[18:05] <Tv> $ cat /home/tv/bin/grip
[18:05] <Tv> #!/bin/sh
[18:05] <Tv> git grep -n "$@" | cat
[18:05] <Tv> (the |cat disables colorization, which makes emacs detect the patterns right)
[18:05] <cmccabe> tv: you could use --no-color and save a process
[18:06] <Tv> no such option
[18:06] <cmccabe> I'm on version 1.7.4
[18:07] * rajeshr (~Adium@99-7-122-114.lightspeed.brbnca.sbcglobal.net) Quit (Quit: Leaving.)
[18:07] <Tv> ah it seems i'm using the deb on this machine, not my own build, so it's older than i'm used to.. yeah it was added in .2 or .3 i recall
[18:07] <Tv> i was puzzled when that didn't work, but just put in a quick workaround..
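Editorial sketch: the `grip` wrapper above, rewritten in Python with cmccabe's `--no-color` suggestion in place of the `| cat` workaround. It assumes a git recent enough to accept `--no-color` for `git grep` (cmccabe's 1.7.4 does, per the exchange above); the function names are made up for this example.

```python
import subprocess


def grip_command(args):
    """Build the git grep invocation: line numbers on, colorization off."""
    return ["git", "grep", "-n", "--no-color"] + list(args)


def run_grip(args):
    """Run git grep in the current repository and return its exit status."""
    # git grep exits 0 when something matched and 1 when nothing did.
    return subprocess.call(grip_command(args))
```

On an older git without the option, the original `| cat` trick remains the fallback.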
[18:07] <gregaf> I wonder if it's safe to assume the ARM build will complete if it's been going for 16-20 hours without an issue
[18:07] <Tv> haha
[18:08] <Tv> well, there's the halting problem..
[18:08] <gregaf> actually it doesn't matter — I need to try running it to make sure the instructions it's inserting actually exist
[18:08] <gregaf> arm development blows :(
[18:09] <Tv> yup
[18:09] <Tv> and remember you have a high-end arm that's emulated there
[18:09] <gregaf> yeah
[18:09] <Tv> ask me about powerpc-nommu some day...
[18:09] <gregaf> hahaha
[18:09] <cmccabe> you could try setting up a cross-compile env
[18:10] <gregaf> either this machine (and the debian build farm one) has an arm arch that the atomic-ops people don't know about/didn't include in their checks
[18:10] <cmccabe> it's not easy but it would allow you to build on multicore
[18:10] <gregaf> or else it's too old to include the instructions
[18:10] <gregaf> but my rough understanding is that ARMv5 machines are all really old and crappy
[18:10] <gregaf> and ARMv6 has all the instructions they want
[18:10] <Tv> cmccabe: real world crosscompiles are so hard i know embedded shops that choose not to do it..
[18:11] <Tv> last, best hope for sanity: http://www.scratchbox.org/
[18:12] <Tv> note: "Uses either QEMU or a real target hardware to execute cross-compiled binaries (extremely useful when cross-compiling software which uses autoconf & co.)"
[18:12] <Tv> the autoconf feature test thing is a royal pain with cross-compilation
[18:12] <cmccabe> tv: I've done cross-compile before. You mostly tell autoconf not to test at all because of the binary compat issues
[18:13] <cmccabe> tv: it's not easy, but it's not impossible. You just need to build binutils, build gcc, build binutils again, and then you're ready
[18:14] <Tv> and patch a bazillion libraries that your project uses, so they'll hopefully start working when cross-compiling, fix every naive makefile snippet that ignores cross-compilation, ....
[18:14] <Tv> it's ok when you're isolated
[18:14] <cmccabe> tv: to help you avoid this, a lot of projects give out pre-compiled gcc and binutils binaries, like android
[18:14] <cmccabe> tv: a lot of manufacturers give out precompiled toolchains for their arm boards
[18:15] <Tv> which are mostly pure crap ;)
[18:15] <cmccabe> I dunno. I didn't find them to be that crappy
[18:15] <cmccabe> I mean it's not like they do anything to it. They just download gcc and hit compile.
[18:15] <Tv> i did business supporting OpenEmbedded in .fi, because it was so much better than the manufacturer's own sdk
[18:15] <Tv> we kept saying, the sdk is a demo only..
[18:16] <cmccabe> calling a prebuilt toolchain an "sdk" is a little grandiose
[18:16] * midnightmagic (~nouser@S0106000102ec26fe.gv.shawcable.net) has joined #ceph
[18:16] <cmccabe> I think we're talking about different things
[18:16] <Tv> yeah i mean the thing that goes from source to a firmware image
[18:17] <Tv> and sucks along the way
[18:17] <cmccabe> prebuilt toolchains are good if you want to put a new kernel on something without waiting a few hours to build binutils and gcc
[18:17] <cmccabe> OpenEmbedded is like a full distro almost
[18:18] <cmccabe> I support the idea of OE in theory, but I've never actually used it
[18:20] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[18:21] <cmccabe> tv: scratchbox is kind of interesting
[18:21] <cmccabe> tv: seems to be chroot-based
[18:21] <Tv> yeah the cross-compiler is in a chroot if i recall
[18:21] <Tv> so you can just say "gcc"
[18:22] <Tv> minimizes the need for fixing bad makefiles
[18:28] * rajeshr (~Adium@ has joined #ceph
[18:28] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[18:28] <bchrisman> you guys have recommendations for tracing down cfuse issues? I've got a cfuse process that just 'goes away'.. I see initialization in messages and I run some stress testing… and at some point 'poof'.. messages notices the fuse init.. but the going away is only noticed by fuse (Transport endpoint is not connected)
[18:29] <Tv> bchrisman: core dumps perhaps?
[18:29] <bchrisman> didn't see any..
[18:29] <Tv> ulimit?
[18:30] <bchrisman> unlimited
[18:30] <bchrisman> err
[18:30] <bchrisman> yeah..
[18:30] <bchrisman> feh.
[18:30] <bchrisman> core file size zero..
[18:31] <bchrisman> wonder if that prevents file creation at all.
[18:31] <Tv> yes
[18:31] <gregaf> team meeting, back soon!
[18:31] <bchrisman> cool… thx for waking me up.. :)
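Editorial sketch: the check Tv prompts for above (a soft core-file limit of zero means no core file is written) can be done from a test harness before it launches cfuse. This uses Python's standard `resource` module; the helper names are assumptions, and limits are in bytes here, whereas the shell's `ulimit -c` reports blocks.

```python
import resource


def describe_core_limit(soft: int) -> str:
    """Turn a soft RLIMIT_CORE value into a human-readable description."""
    if soft == 0:
        return "disabled"
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    return "capped at %d bytes" % soft


def current_core_limit() -> str:
    """Describe the soft core-file limit of the calling process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return describe_core_limit(soft)


def enable_core_dumps() -> None:
    """Raise the soft limit to the hard limit; child processes inherit it."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
```

Calling `enable_core_dumps()` before spawning the daemon only helps if the hard limit is itself non-zero.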
[18:31] * cmccabe (~cmccabe@m430536d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[18:33] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[18:42] * cmccabe (~cmccabe@m340536d0.tmodns.net) has joined #ceph
[19:01] * DJlee (82d8d198@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[19:04] <Tv> somebody force pushed master.. naughty
[19:04] <gregaf> ....when?
[19:05] <Tv> + fd3de5a...4ee75a8 master -> newdream/master (forced update)
[19:05] <gregaf> oh, sage did that yesterday
[19:06] <gregaf> iirc tossing out some bad librbd/librados merges
[19:06] <gregaf> they weren't very long-lived and it's something I think we're stuck with until we institute proper reviews prior to merge
[19:09] * cmccabe (~cmccabe@m340536d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[19:11] <Tv> cfuse always daemonize change breaks my autotest framework :(
[19:11] <gregaf> it used to daemonize by default — does it force a daemonize now?
[19:11] <Tv> yes
[19:12] <Tv> db25852fd9a6138c67e7f33f0334bcfb4ca832f0
[19:12] <Tv> then again, df8c00945f8ddb0553b74fe169af70b4e2d202b2
[19:12] <Tv> no clue what the real end result is currently
[19:12] <gregaf> sagewk: see above
[19:12] <Tv> i just need the foreground feature, to have it as child
[19:12] <gregaf> and Colin's not here....
[19:12] <Tv> i can just develop against an older version, that's fine
[19:13] <gregaf> the recent configuration setup changes broke the old behavior and they tried to set it up so it would daemonize by default
[19:13] <Tv> but it needs to get fixed at some point
[19:13] <gregaf> I guess it's still missing functionality, though
[19:14] <sagewk> tv: sigh. ok. i'll hack a slight better workaround for the stable branch
[19:15] <sagewk> it's fixed in master....
[19:15] <sagewk> just tedious to backport to stable...
[19:15] <sagewk> is that good enough for autotest?
[19:15] <Tv> sagewk: oh yeah screw stable ;)
[19:16] <sagewk> that's what i always say :)
[19:17] <Tv> ah ok so the merge really does the right thing, good
[19:17] <Tv> i was worried the temporary hack got inherited from stable
[19:18] * yehudasa (~quassel@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:20] * cmccabe (~cmccabe@m3f0536d0.tmodns.net) has joined #ceph
[19:21] <cmccabe> IRC keeps kicking me because of my eeevil cell phone IP address
[19:21] <cmccabe> I keep getting this message about SASL, but I thought that was just for email
[19:22] <Tv> sasl is an authentication library
[19:30] * cmccabe (~cmccabe@m3f0536d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[19:41] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:52] * cmccabe (~cmccabe@m4a0536d0.tmodns.net) has joined #ceph
[19:52] * lxo (~aoliva@ Quit (Read error: Connection reset by peer)
[19:52] * lxo (~aoliva@ has joined #ceph
[20:01] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[20:01] * cmccabe (~cmccabe@m4a0536d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[20:15] * cmccabe (~cmccabe@m4c0536d0.tmodns.net) has joined #ceph
[20:28] * DanielFriesen (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[20:34] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[20:46] <Tv> hmm my mons just hang there, ceph -s and ceph health just hang, ...
[20:46] <Tv> no new log entries
[20:50] <Tv> [pid 3316] sendmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\0\354\f\0\0\0\2\0\0\n\3\0165\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 136}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 136
[20:50] <Tv> [pid 3316] sendmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"\372\1\0\0\0\0\0\0\10\0\0\0\2\0\0\0\0\0\0\0\17\0\0\0\0\0\0\0\0\0\0\0"..., 33}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 33
[20:50] <Tv> [pid 3316] poll([{fd=4, events=POLLIN|0x2000}], 1, 900000
[20:50] <Tv> and that just times out
[20:50] <Tv> bleh
[21:06] <Tv> next run: it got a few commands further (ceph auth adds), hangs in ceph mds set_max_mds now
[21:06] <Tv> oh and i simplified to 1 mon, 1 mds, 1 osd
[21:08] <Tv> [pid 2094] sendmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"ceph v027", 9}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9
[21:08] <gregaf> Tv: you're going to have to give us more than that...
[21:08] <Tv> ok so cmon does say hello
[21:08] <Tv> gregaf: i know, i wish i had more
[21:08] <gregaf> what are you doing?
[21:08] <Tv> basic bring up of cluster is where it starts failing
[21:09] <Tv> i have "debug mon = 20"
[21:09] <gregaf> via init-ceph?
[21:09] <Tv> not really, just running the daemons individually.. the autotest stuff wants a lot of control over them
[21:10] <Tv> 2011-03-15 13:01:54.975043 7fc40b481720 mon.0@0(starting).auth v0 importing client.admin auth(auid = 0 key=AQCxxX9NKJwuMBAADvX0lPyxQyCUErL5vWe/ZA== with 3 caps)
[21:10] <Tv> 2011-03-15 13:01:54.975055 7fc40b481720 mon.0@0(starting).auth v0 encode_pending v 1
[21:10] <Tv> 2011-03-15 13:01:54.975089 7fc40b481720 store(dev/mon.0) put_bl auth/1 = 217 bytes
[21:10] <Tv> 2011-03-15 13:01:55.016943 7fc40b481720 store(dev/mon.0) set_int auth/last_committed = 1
[21:10] <Tv> 2011-03-15 13:01:55.060373 7fc40b481720 store(dev/mon.0) put_bl monmap/latest = 195 bytes
[21:10] <Tv> that's the end of the log
[21:10] * Tv notices a secret in the log file.. :-(
[21:10] <gregaf> log files can have user permissions preventing access, and we want to make it possible to debug auth issues
[21:11] <gregaf> did you run mkcephfs et al?
[21:11] <gregaf> and have you checked that the nodes can all communicate with each other?
[21:11] <Tv> currently, there's just one node
[21:11] <Tv> 1 mon, 1 mds, 1 osd
[21:12] <gregaf> well have you made sure that loopback works?
[21:12] <gregaf> the daemons are still going to send tcp messages to each other
[21:12] <Tv> ping is fine
[21:12] <Tv> this setup worked a week or two ago
[21:12] <Tv> i may have to manually bisect :(
[21:12] <gregaf> are you specifying config files for each daemon?
[21:12] <Tv> yes
[21:13] <Tv> there is nothing ceph in /etc
[21:13] <gregaf> well maybe you should add debugging when you run ceph tool
[21:13] <Tv> how?
[21:13] <gregaf> put it on the command line
[21:13] <gregaf> ceph --debug_ms 1 --debug_monc 10 ....
[21:13] <Tv> --help was not helpful
[21:14] <gregaf> you can pass in any of our configuration options on the command line like that :)
[21:14] <Tv> 2011-03-15 13:14:02.881485 7fd15622a700 -- mark_down -- 0x1d1fd10
[21:15] <Tv> http://pastebin.com/raw.php?i=mastqjbJ
[21:15] <Tv> it just thinks the daemon is slow to respond
[21:15] <Tv> which it is (from strace)
[21:15] <gregaf> so it can't communicate with the mon
[21:15] <Tv> http://pastebin.com/raw.php?i=YiQzyuzd
[21:15] <gregaf> it waits 3 seconds without any communication back
[21:15] <Tv> that's entirety of mon.0 log
[21:15] <Tv> and the daemon is still running
[21:16] <gregaf> can you gdb attach and see what it's doing?
[21:16] <Tv> sure let me fiddle
[21:17] <Tv> pthread_cond_wait under messenger->wait(); in cmon main
[21:18] <Tv> lots of threads, it seems
[21:18] <gregaf> Tv: that put_bl is it trying to commit to disk
[21:18] <gregaf> is the disk access slow enough that it might just be hanging there?
[21:19] <Tv> it's perfectly interactive
[21:20] <gregaf> well that's the last thing your monitor log has it doing, is writing to disk
[21:20] <Tv> nothing special in kernel logs
[21:20] <gregaf> how much debugging do you have on the monitor?
[21:20] <gregaf> what levels, I mean
[21:22] <Tv> backtrace of all threads is huge..
[21:22] <Tv> http://pastebin.com/raw.php?i=DPa2jmV5
[21:22] <gregaf> I'm really surprised there's not more debugging than that — something's getting stuck on startup
[21:23] <Tv> pastebin refuses to host my traceback file as too big ;)
[21:24] <Tv> http://paste2.org/p/1304653
[21:24] <gregaf> wait, that's your entire log file from the monitor?
[21:24] <gregaf> that doesn't look anything like my vstart mon's log file
[21:25] <Tv> log file is 79 lines total
[21:25] <Tv> ooh it seems it got new entries..
[21:25] <Tv> ah no just misreading
[21:32] <Tv> proc = utils.BgJob(command='{bindir}/cmon -f -i {id} -c {conf}'.format(
[21:32] <Tv> not using -D
[21:33] <Tv> man page says
[21:33] <Tv> -D Debug mode: do not daemonize after startup (run in foreground) and send log output
[21:33] <Tv> to stdout.
[21:33] <Tv> -f do not daemonize after startup (run in foreground), but log to the usual location.
[21:33] <Tv> Useful when run via crun(8).
[21:33] <cmccabe> those are switched unfortunately
[21:33] <cmccabe> -f logs to foreground; -D is do not daemonize
[21:34] <cmccabe> the man page needs an update
[21:34] <cmccabe> I don't think you want -f unless you're prepared to handle a lot of spew to stderr
[21:36] <Tv> that's... undesirable
[21:36] <Tv> -f is a reasonably standard switch to "do not double-fork"
[21:36] <gregaf> if we don't have any users we should probably just switch the names, then
[21:36] <Tv> also, i'm still getting some content in the log file
[21:36] <Tv> gregaf: well, a couple of weeks ago this worked..
[21:37] <gregaf> probably they got inadvertently switched, then
[21:37] <cmccabe> it's been this way for a pretty long time...
[21:37] <Tv> generic_usage() is even more confused
[21:37] <Tv> http://paste2.org/p/1304653
[21:38] <Tv> err
[21:38] <Tv> -D Run in the foreground.\n\
[21:38] <Tv> -f Run in foreground. Show all log messages on stderr.\n\
[21:38] <cmccabe> that isn't confused. It describes what it does now.
[21:38] <Tv> and unusable
[21:38] <Tv> and the old behavior was different, this hasn't been like this for long
[21:38] <Tv> oh well
[21:39] <cmccabe> I'm pretty sure that it's been like this since you've started :)
[21:39] <Tv> let me just say that "foreground logging" is a weird combination of words
[21:39] <Tv> cmccabe: this setup worked before the common_init etc changes
[21:39] <cmccabe> the usage doesn't include the phrase "foreground logging"
[21:39] <cmccabe> -f Run in foreground. Show all log messages on stderr.\n\
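Editorial sketch: going by the `generic_usage()` text quoted above (`-D` runs in the foreground, `-f` runs in the foreground and also sends all log messages to stderr), a harness that wants the daemon as a direct child could pick the flag explicitly. The helper names and arguments are assumptions for illustration, and the flag meanings are taken only from this log, so they may differ in other versions.

```python
import subprocess


def daemon_command(bindir, daemon, daemon_id, conf, log_to_stderr=False):
    """Build an argv that keeps the daemon in the foreground."""
    # Per the usage text quoted in the log: -D stays in the foreground,
    # -f stays in the foreground and also shows log messages on stderr.
    flag = "-f" if log_to_stderr else "-D"
    return ["%s/%s" % (bindir, daemon), flag, "-i", str(daemon_id), "-c", conf]


def start_daemon(bindir, daemon, daemon_id, conf, log_to_stderr=False):
    """Start the daemon as a direct child so the harness can wait on or kill it."""
    argv = daemon_command(bindir, daemon, daemon_id, conf, log_to_stderr)
    return subprocess.Popen(argv)
```

Keeping the flag choice in one place means the harness needs a one-line change if the two switches are ever renamed, as gregaf suggests.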
[21:40] <wido> cmccabe: 4c22c159d203f046c98a7d636df700bb550b597d is a result of my post on the ml?
[21:41] <cmccabe> wido: I think that commit will help us achieve what we need to with librbd
[21:41] <cmccabe> wido: er sorry, rbd
[21:41] <cmccabe> wido: but rbd itself probably still needs to be changed to call that function
[21:42] <joshd> wido: qemu uses the default config now
[21:42] <joshd> cmccabe: I think the command line tools still use common_init
[21:43] <cmccabe> joshd: that's fine...
[21:43] <wido> joshd: I see! You took away my Signed-Off ;)
[21:43] <cmccabe> joshd: common_init works great for command-line tools where we control the whole process
[21:44] <wido> Oh, no, I see the commit is a bit different
[21:44] <cmccabe> joshd: I guess the confusing thing is that currently since we haven't implemented true per-cluster config, the stupid global config may be modified by rados.init()
[21:45] <cmccabe> joshd: I think for now, just call rados.read_conf(NULL) after rados.init()
[21:46] <wido> right now there is no way to pass config options to librbd through Qemu, is there?
[21:47] <joshd> wido: not yet
[21:47] <wido> Any ideas? Some special formatted string in the 'drive' argument?
[21:47] <cmccabe> joshd: eventually common_init will return an md_config_t* and rados_init will be able to take that config as a param
[21:47] <joshd> we plan to add a way to specify a separate ceph config for each drive, yeah
[21:48] <wido> rbd:rbd/beta:/etc/ceph/beta.conf
[21:48] <wido> something like that
[21:48] <cmccabe> joshd: actually we might be able to create an API to initialize rados with a pre-existing config without too much trouble, even without de-globalizing g_conf
[21:49] <cmccabe> joshd: we just have a constructor that takes md_config_t* instead of the normal params
[21:49] <cmccabe> joshd: this api would really only be useful for our own utility programs, but it would still be useful.
[21:50] <joshd> that'll let you use drives from more than one cluster as well
[21:50] <joshd> wido: I'll add your signed-off - I didn't notice the patches were so similar
[21:51] <wido> joshd: Oh, it was just a joke!
[21:51] <wido> But feel free
[21:53] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[21:53] <joshd> cmccabe: I'm not too worried about it atm, I don't think it causes many problems to use common_init in rbd and rados right now
[21:54] <joshd> wido: not sure about the exact format yet, but something like you suggested, probably
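Editorial sketch: a parser for the drive string wido floats above, `rbd:pool/image[:conf]`. The format is hypothetical (joshd says it is not settled), and `RbdDrive` and `parse_rbd_drive` are names invented for this example, not part of qemu or librbd.

```python
from typing import NamedTuple, Optional


class RbdDrive(NamedTuple):
    pool: str
    image: str
    conf: Optional[str]  # path to a per-drive ceph config, if one was given


def parse_rbd_drive(spec: str) -> RbdDrive:
    """Parse 'rbd:pool/image' or 'rbd:pool/image:/path/to.conf'."""
    scheme, sep, rest = spec.partition(":")
    if scheme != "rbd" or not sep:
        raise ValueError("not an rbd drive string: %r" % spec)
    # Everything after the next colon is treated as the config path.
    target, _, conf = rest.partition(":")
    pool, slash, image = target.partition("/")
    if not slash or not pool or not image:
        raise ValueError("expected pool/image in %r" % spec)
    return RbdDrive(pool, image, conf or None)
```

With a per-drive config path, two drives could point at two different clusters, which is the benefit joshd mentions.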
[21:55] <joshd> wido: I was wondering if you noticed qemu using a lot of memory before crashing when you were writing 100GB
[21:55] <wido> joshd: No, sorry. I wasn't really paying attention to the machine, i started the dd and started doing some other work
[21:55] <joshd> unfortunately valgrind and qemu don't mix well
[21:55] <wido> what I did notice, my write speeds were low, very low, about 7MB/sec
[21:57] <joshd> wido: is that much slower than qemu before librbd?
[21:59] <Tv> ./src/mount.ceph -v -o secret=AQC70X9NwCfNCRAAdjXjdlajlO6EH9UI0AOogA== /mnt
[21:59] <Tv> Can't understand option: 'secret=AQC70X9NwCfNCRAAdjXjdlajlO6EH9UI0AOogA=='
[21:59] <Tv> what's all that about then?
[21:59] <Tv> ohhh -v expects an extra arg???
[21:59] <Tv> without -v it works
[22:00] <Tv> the code in mount.ceph.c doesn't look like that, but now i'm just suspicious
[22:00] <cmccabe> mount.ceph has its own argument parsing which is all inside mount/mount.ceph.c
[22:00] <Tv> [0 tv@dreamer ~/src/ceph.git]$ ./src/mount.ceph -v -o secret=AQC70X9NwCfNCRAAdjXjdlajlO6EH9UI0AOogA== /mnt 2>&1|head -1
[22:00] <Tv> Can't understand option: 'secret=AQC70X9NwCfNCRAAdjXjdlajlO6EH9UI0AOogA=='
[22:00] <Tv> [0 tv@dreamer ~/src/ceph.git]$ ./src/mount.ceph -o secret=AQC70X9NwCfNCRAAdjXjdlajlO6EH9UI0AOogA== /mnt 2>&1|head -1
[22:00] <Tv> Can't understand option: ''
[22:00] <Tv> [0 tv@dreamer ~/src/ceph.git]$
[22:01] <Tv> it seems the -v gobbles the next arg
[22:01] <Tv> or something.. it changes the behavior
[22:01] <gregaf> our arg parsing is pretty fragile, a lot of them do just gobble the next argument
[22:02] <cmccabe> actually that's not it in this case
[22:02] <Tv> *src = argv[1];
[22:02] <Tv> *node = argv[2];
[22:02] <Tv> uhh, unconditional
[22:02] <cmccabe> it's that mount.ceph hardwires argv[1] and argv[2]
[22:02] <Tv> lovely
[22:02] <Tv> fscking lovely
[22:03] <cmccabe> tv: it may suck, but it looks like all the mount.foo programs operate this way
[22:03] <cmccabe> [cmccabe@highcastle src]$ mount.nfs -h
[22:03] <cmccabe> usage: mount.nfs remotetarget dir [-rvVwfnsih] [-o nfsoptions]
[22:04] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[22:04] <gregaf> probably they're all copy-pastad
[22:04] <gregaf> I'm quite certain that's how we wrote ours, although I don't remember if Sage or I actually did it :)
[22:04] <cmccabe> I think the mount program execs these programs
[22:05] <cmccabe> so you're not really "supposed" to invoke them directly
[22:06] <Tv> yeah that i can buy
[22:06] <cmccabe> so it's just sort of a secret handshake between the mount program and mount.foo
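Editorial sketch: a Python model of the behavior discussed above, where a mount helper takes its first two arguments positionally and only then looks for switches. It is an illustration, not a port of `mount.ceph.c`; the function name, the `mount.example` program name and the option handling are assumptions, with only the error wording borrowed from the log.

```python
def parse_mount_args(argv):
    """Model a mount helper that reads argv[1] and argv[2] unconditionally."""
    if len(argv) < 3:
        raise ValueError("usage: mount.example src node [-v] [-o options]")
    # The first two arguments are taken as source and mount point, no matter
    # what they look like; switches are only recognized after them.
    parsed = {"src": argv[1], "node": argv[2], "verbose": False, "options": None}
    i = 3
    while i < len(argv):
        arg = argv[i]
        if arg == "-v":
            parsed["verbose"] = True
        elif arg == "-o":
            i += 1
            if i >= len(argv):
                raise ValueError("-o requires an argument")
            parsed["options"] = argv[i]
        else:
            raise ValueError("Can't understand option: '%s'" % arg)
        i += 1
    return parsed
```

Called the way mount(8) would call it (`src node -v -o secret=...`), this parses cleanly; called as `-v -o secret=... /mnt`, it takes `-v` as the source and `-o` as the mount point, then rejects `secret=...` as an unknown option.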
[22:06] <cmccabe> [cmccabe@highcastle src]$ mount.nilfs2 -h
[22:06] <cmccabe> mount.nilfs2: invalid option -- 'h'
[22:06] <cmccabe> sigh...
[22:07] <Tv> http://www.ackbar.org/images/ackbar.jpg
[22:08] <cmccabe> it's a trap?
[22:08] <Tv> yup
[22:08] <cmccabe> "nooooooooooooooooooooooooooooooo"
[22:08] <Tv> some day i wish to have an ackbar-free source tree
[22:10] <wido> joshd: I didn't notice a real difference when librbd got used, but I normally had about 35MB/sec write
[22:10] <wido> the 7MB/sec is something I noticed today
[22:12] <wido> But while my write is busy, a "df -h" is very slow, all read ops seem to be very slow too
[22:12] <wido> "time df -h": 6sec
[22:14] <joshd> hmm, yehudasa may know more about qemu performance
[22:14] <wido> Well, not really important now.
[22:14] <joshd> yeah, something to look into later though
[22:15] <wido> re-running the 100GB file, Qemu just went from 19.6% mem to 21.2%
[22:15] <wido> but then dropped again to 18.5%, seems to stay between 18 and 22% (for now)
[22:17] <joshd> that's what I've been seeing too
[22:17] <joshd> thanks for testing again
[22:18] <wido> np! My write is still running, so the behaviour might change in time. My write crashed at 11GB
[22:32] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Server closed connection)
[22:32] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[22:34] <wido> just passed the 11GB point, right now at 12GB
[22:36] <wido> I'm going afk, ttyl
[22:39] <Tv> anyone have a good idea why trying to mount ceph would give "mount error 16 = Device or resource busy"?
[22:39] <Tv> ohh it got mounted, and is doing something special to avoid double-mounts
[22:39] <Tv> whee this switched from not working to working and i didn't even notice
[22:40] <Tv> yay
[22:40] <gregaf> huh?
[22:41] <Tv> never mind
[22:41] <Tv> there's special logic in mount to prevent double-mounting the exact same thing
[22:41] <Tv> i've played too much with mounting things on top of each other that i always forget it has that special case that it forbids
[22:42] <Tv> (bind mounts are a mindwarp)
[22:50] <Tv> whee autotest is running fsx with kernel client
[22:51] <Tv> completely io-bound on the single osd i have in use right now
[23:21] * DJLee (82d8d198@ircip1.mibbit.com) has joined #ceph
[23:36] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.