#ceph IRC Log


IRC Log for 2012-03-28

Timestamps are in GMT/BST.

[0:00] <imjustmatthew> yeap, I'll put it someplace and send you a link
[0:00] <imjustmatthew> thanks for your help
[0:00] <gregaf> awesome, thanks!
[0:00] <gregaf> np
[0:01] <nhm> sjust: were you able to get teuthology to create OSD journals on an alternate drive? I'd like to do the same thing...
[0:01] <sjust> nhm: sorry it took so long, about to commit it
[0:01] <sjust> had to do battle with udev
[0:01] <nhm> sjust: ugh, udev can be a pain.
[0:02] <sjust> or rather, it took a while to realize that I needed to add ubuntu to the disk group rather than attempting to change the permissions
[0:03] <nhm> heh, html5 mmo: BrowserQuest
[0:12] <joao> lol
[0:16] <joao> sometimes I wonder if it's even possible to debug c++
[0:20] <gregaf> yehudasa: reviewed, few minor style/doc things on github, but looks good
[0:21] <yehudasa> gregaf: thanks
[0:42] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[0:52] <yehudasa> gregaf: looking at Throttle.h:
[0:52] <yehudasa> bool _should_wait(int64_t c) {
[0:52] <yehudasa> return
[0:52] <yehudasa> max &&
[0:52] <yehudasa> ((c < max && count + c > max) || // normally stay under max
[0:52] <yehudasa> (c >= max && count > max)); // except for large c
[0:53] <yehudasa> shouldn't it be count + c >= max ?
[0:54] <yehudasa> when I'm setting a Throttle with max=1, I'm able to do get(1) twice on it before it blocks
[0:59] <gregaf> yehudasa: umm, I think switching that would be okay
[0:59] <gregaf> although it's not really designed for very small values like that
[0:59] <yehudasa> gregaf: I'm not completely sure that would be correct
[0:59] <yehudasa> we're failing on the second part of the if
[1:00] <gregaf> not unless you have already exceeded max
[1:01] <yehudasa> in this case c == max, and count == max
[1:01] <gregaf> yeah
[1:01] <gregaf> so swapping the first term to a >= should deal with it
[1:01] <gregaf> really though Throttler isn't designed for use where an off-by-one like that is going to matter
[1:01] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[1:01] <gregaf> and the specific lines you're looking at are a hack Sage wrote so that very large messages could work (iirc)
[1:02] <yehudasa> yeah, don't care too much about it, was just noticing that when I tried to verify that the rgw throttling was working
[1:04] <gregaf> yeesh, that second term is wrong anyway
[1:04] <yehudasa> Hmm, actually I think it should be (c <= max && count + c > max)
[1:04] <yehudasa> for the first term
[1:04] <gregaf> should be
[1:04] <gregaf> (c >= max && count < max))
[1:04] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[1:04] <gregaf> so that it only allows through more if currently UNDER max, not if currently OVER max
[1:04] <yehudasa> hmm, but it's _should_wait
[1:05] <yehudasa> so if count > max it should wait
[1:05] * ivan\ (~ivan@ Quit (Quit: ERC Version 5.3 (IRC client for Emacs))
[1:05] <gregaf> *sigh* right
[1:08] <yehudasa> gregaf: in any case, can you look at wip-rgw-throttle?
[1:08] <yehudasa> pretty trivial
[1:09] <gregaf> actually I think you need to leave it the way it is... if you do c<=max in the first term then it's going to evaluate to true for c == max and any value of count
[1:10] <gregaf> 1 is just not a safe max to use in the throttler
[1:11] <yehudasa> it would only evaluate to true if count > 0
[1:12] <yehudasa> otherwise we'll always have an off by one issue
[1:12] <gregaf> no, the first term would... wow, my stack is not working
[1:13] <gregaf> \me bangs forehead
[1:13] * gregaf bangs forehead
[1:13] <gregaf> there we go
[1:13] <gregaf> not real familiar with irc commands
[1:14] <yehudasa> I thought for a second that you switched to TeX
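The Throttle.h exchange above (0:52 to 1:12) can be checked with a small model. This is only a Python transcription of the predicate as quoted in the log, not the Ceph source; `gets_before_blocking` and `should_wait_proposed` are hypothetical helper names, and the model assumes each admitted get(c) simply adds c to count.

```python
def should_wait(max_, count, c):
    """Python transcription of the _should_wait predicate quoted from Throttle.h."""
    return bool(max_) and (
        (c < max_ and count + c > max_)      # normally stay under max
        or (c >= max_ and count > max_)      # except for large c
    )


def should_wait_proposed(max_, count, c):
    """Same predicate with the first term changed to c <= max, as yehudasa suggests."""
    return bool(max_) and (
        (c <= max_ and count + c > max_)
        or (c >= max_ and count > max_)
    )


def gets_before_blocking(predicate, max_, c, limit=10):
    """Count how many get(c) calls are admitted before the predicate says to wait."""
    count = 0
    admitted = 0
    while admitted < limit and not predicate(max_, count, c):
        count += c          # an admitted get(c) raises the running count
        admitted += 1
    return admitted
```

With max=1 and c=1 the quoted predicate admits two gets before blocking, which matches what yehudasa reports; with c <= max in the first term it admits one, because that term is only true once count > 0.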
[1:15] <gregaf> just the HEAD commit on that branch?
[1:15] <yehudasa> yeah, there's nothing else
[1:15] <gregaf> okay, looks boring enough :)
[1:16] <gregaf> although I love the git history...yikes!
[1:19] * ivan\ (~ivan@108-213-76-179.lightspeed.frokca.sbcglobal.net) has joined #ceph
[1:26] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[1:38] <Tv|work> \banghead{\me}
[1:39] <gregaf> yep, that'd be \latex's way of doing it
[1:42] * lofejndif (~lsqavnbok@04ZAACBN2.tor-irc.dnsbl.oftc.net) Quit (Quit: Leaving)
[1:46] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[2:09] * MarkDude (~MT@ has joined #ceph
[2:46] * perplexed (~ncampbell@ Quit (Ping timeout: 480 seconds)
[2:48] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[2:50] <joao> sagewk, still there?
[2:53] <joao> well, the workload gen already destroys collections
[2:53] <joao> still not sure if everything works, as I just rebased the whole thing on top of master and, well, had to rebuild the whole repo
[2:54] <joao> and it is taking its time, so I'll leave it compiling and I'm heading to bed
[2:54] <joao> (btw, haven't pushed it to github yet, as I want to make sure everything is working properly)
[2:55] <joao> talk to you guys tomorrow
[2:55] <joao> bye
[3:01] <joao> hmm...
[3:02] <joao> so, I think I messed something up
[3:02] <joao> make[3]: *** No rule to make target `librados.cc', needed by `librados_la-librados.lo'. Stop.
[3:02] <joao> any clues?
[3:03] <joshd> joao: you might just need to 'make distclean' and rebuild, since the location of that file changed a little while ago
[3:03] <gregaf> joao: I'm surprised if your repo was old enough for it, but... lol, what Josh said
[3:03] <gregaf> or if you did an improper merge, librados.cc is located in librados/librados.cc now, so make sure that the Makefile.am is right
[3:04] <joao> honestly, I thought I had been working with the latest master as of a week and a half ago
[3:05] <joao> but what makes me a sad panda is rebuilding ceph again
[3:12] <joao> well, looks like I'm not waiting to see if it compiles ;)
[3:13] <joao> thanks
[3:13] <joao> off to bed
[3:13] <joao> o/
[3:19] <nhm> g'night joao
[3:54] * imjustmatthew (~imjustmat@pool-96-228-59-130.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[3:55] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[4:08] * Oliver (~oliver1@ip-109-90-14-183.unitymediagroup.de) has joined #ceph
[4:10] * Oliver1 (~oliver1@ip-109-90-14-183.unitymediagroup.de) Quit (Read error: Connection reset by peer)
[5:04] * Tv__ (~tv@cpe-24-24-131-250.socal.res.rr.com) has joined #ceph
[5:53] <elder> Tv__, I'm looking at how debuginfo/dbgsym packages are built. Do you happen to know anything about that?
[5:57] <Tv__> elder: for what?
[5:57] <elder> The kernel.
[5:58] <elder> And in particular, a kernel that's not packaged by a distro
[5:58] <elder> (i.e., one from autobuilder)
[5:58] <Tv__> well that's all from make deb-pkg
[5:58] <elder> There doesn't seem to be a rule in it to make the debug symbols package though.
[5:59] <elder> I see where such a rule would probably belong though.
[5:59] <Tv__> does it build them?
[5:59] <Tv__> seems like a no
[6:00] <elder> Well, I don't see them. I did do a "make deb-pkg" for the first time and see that it does generate linux-firmware, linux-headers, linux-image, and linux-libc-dev
[6:00] <Tv__> yeah.. no debug in there, right?
[6:00] <elder> (and the firmware is the out-of-date one from the kernel tree, I think)
[6:00] <elder> Right.
[6:00] <Tv__> so the kernel is not set to build debug symbol debs
[6:00] <elder> As far as I can see, that's a correct statement.
[6:01] <Tv__> now what do you want to do about that?
[6:01] <Tv__> you started out as "how"
[6:01] <elder> But I'm finding my way around with, well poor lighting at least. I mostly wanted to know if you knew anything about it to help make sure I'm not going too far off in the weeds.
[6:02] <Tv__> i've used the symbols in a different context, but i think that was more of a custom thing
[6:02] <Tv__> like, packaged a product and archived the symbols
[6:03] <elder> There is a utility that I installed that wraps dh_strip with commands that grab the symbols and package them. But I saw a reference somewhere saying it wasn't needed for the kernel. Which led me to think there may already be a way to get it, I just haven't found it yet.
[6:03] <Tv__> that's for userspace
[6:03] <elder> Right.
[6:03] <elder> I tried it, and it did seem to produce a partial ddeb
[6:03] <Tv__> it's about calling strip on dynamic linked executables
[6:03] <elder> But it produced an error.
[6:03] <Tv__> not applicable to kernel
[6:04] <elder> I expect it's out there somewhere and I'll keep searching for clues.
[6:04] <elder> I mean, Ubuntu builds them, I just don't know how. Actually, I think the Ubuntu kernel build system has a whole debian/ infrastructure that supports it.
[6:04] <Tv__> well, you need to get to a point where you know how to run "make bzImage" and get the debug symbols somewhere from that
[6:05] <Tv__> we can package it from there
[6:05] <Tv__> but i've forgotten what files to archive etc
[6:05] <elder> I think I can extract the info, but I'm just not sure what all is expected to be packaged.
[6:05] <Tv__> where do you see ubuntu kernel having debug symbols? my package search failed to find that
[6:05] <elder> You have to add a different repository and then pull them from there.
[6:05] <elder> Just a sec.
[6:06] <Tv__> ah right
[6:06] <Tv__> anyway, we can look at their debian/rules to see what they do
[6:06] <Tv__> and maintain a branch against linus's tree that does similar
[6:07] <elder> http://ddebs.ubuntu.com $(lsb_release -cs)
[6:07] <elder> https://wiki.ubuntu.com/DebuggingProgramCrash
[6:07] <elder> (described in that link, under "Debug Symbol Packages")
[6:08] <elder> OK. I'll take a look at that before I go to bed. Maybe we can discuss it tomorrow. Thanks.
[6:08] * MarkDude (~MT@ Quit (Read error: Connection reset by peer)
[6:16] * chutzpah (~chutz@ Quit (Quit: Leaving)
[6:24] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: Connection reset by peer)
[6:25] <elder> For tomorrow... I think the rules are in this file, in the ubuntu kernel tree (which I have downloaded): git://kernel.ubuntu.com/ubuntu/ubuntu-oneiric.git
[6:25] <elder> Whoops
[6:26] <elder> That's what I cloned. Here is the file therein: debian/rules.d/2-binary-arch.mk
[6:26] <elder> Good night.
[6:38] * f4m8 (f4m8@kudu.in-berlin.de) has joined #ceph
[6:39] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[7:01] * wi3dzma (3eb503c8@ircip1.mibbit.com) has joined #ceph
[7:37] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[7:45] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[7:49] * cattelan is now known as cattelan_away
[7:50] * Tv__ (~tv@cpe-24-24-131-250.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[8:08] * raso (~raso@debian-multimedia.org) Quit (Quit: WeeChat 0.3.6)
[9:00] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[9:00] * LarsFronius (~LarsFroni@f054106020.adsl.alicedsl.de) has joined #ceph
[9:13] * stass (stas@ssh.deglitch.com) Quit (Read error: Connection reset by peer)
[9:15] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:18] * Oliver (~oliver1@ip-109-90-14-183.unitymediagroup.de) Quit (Quit: Leaving.)
[9:22] * stass (stas@ssh.deglitch.com) has joined #ceph
[9:46] * LarsFronius (~LarsFroni@f054106020.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[9:50] * Theuni (~Theuni@ Quit (Quit: Leaving.)
[9:53] * Oliver (~oliver1@p4FFFE978.dip.t-dialin.net) has joined #ceph
[10:02] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:07] <joao> any of you guys still up by any chance?
[10:24] * BManojlovic (~steki@ has joined #ceph
[10:57] * lofejndif (~lsqavnbok@659AAA35W.tor-irc.dnsbl.oftc.net) has joined #ceph
[11:30] <Dieter_be> i'm up
[11:30] <Dieter_be> but i'm none of them guys
[11:30] <Dieter_be> maybe i can help you though
[11:49] * LarsFronius_ (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[11:54] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Read error: Operation timed out)
[11:54] * LarsFronius_ is now known as LarsFronius
[12:07] <joao> Dieter_be, thanks for offering, but I managed to solve the problem by myself :)
[12:08] * lofejndif (~lsqavnbok@659AAA35W.tor-irc.dnsbl.oftc.net) Quit (Quit: Leaving)
[12:16] * phil (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[12:17] * phil is now known as Guest8014
[12:36] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[13:02] * Theuni (~Theuni@ has joined #ceph
[14:04] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[14:13] * wi3dzma (3eb503c8@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[14:22] * raso (~raso@debian-multimedia.org) has joined #ceph
[14:27] * Theuni (~Theuni@ Quit (Quit: Leaving.)
[14:50] <nhm> good morning #ceph
[14:52] * andret (~andre@pcandre.nine.ch) Quit (Ping timeout: 480 seconds)
[14:53] * andret (~andre@pcandre.nine.ch) has joined #ceph
[14:59] <joao> morning nhm
[14:59] <nhm> joao: have you used teuthology-suite at all?
[15:00] <joao> I've used teuthology to run tests, if that's what you mean
[15:00] <nhm> joao: ok, I was wondering specifically about the teuthology-suite and teuthology-schedule scripts.
[15:00] <joao> then, no :)
[15:01] <nhm> joao: ok, thanks anyway
[15:02] <joao> nhm, did you by any chance ever run into a "perpetually waiting" behavior on FileStore?
[15:02] <nhm> joao: not yet
[15:02] <joao> more exactly on apply_transaction()
[15:02] <joao> :\
[15:02] <joao> well, this has been taking me all morning to figure out
[15:03] <nhm> joao: any errors?
[15:03] <joao> everything worked before rebasing the workload gen on top of master
[15:03] <joao> no, just waiting for a condition
[15:03] <joao> #1 0x000000000044e832 in Wait (mutex=..., this=0x7fffffffdc40) at ./common/Cond.h:48
[15:03] <joao> #2 FileStore::apply_transactions (this=0x7c1760, tls=..., ondisk=<optimized out>)
[15:03] <joao> at os/FileStore.cc:2317
[15:03] <joao> #3 0x000000000044ef2e in FileStore::apply_transaction (this=0x7c1760, t=..., ondisk=0x0)
[15:03] <joao> at os/FileStore.cc:2299
[15:04] <joao> this also happens for queue_transaction(s)
[15:04] <nhm> Yeah, I can't say I have seen that.
[15:05] <joao> well, back to trying to figure out what signals that condition then :)
[15:05] <joao> thanks
[15:25] <nhm> np, good luck
[15:28] <joao> wohoo!
[15:28] <nhm> figure it out?
[15:28] <joao> kinda
[15:28] <joao> the store->mount() should fail
[15:29] * stass (stas@ssh.deglitch.com) Quit (Read error: Connection reset by peer)
[15:29] <joao> I mean, should tell me it is failing or smth
[15:29] <joao> well, in reality the fault is all mine
[15:29] <joao> since I'm not checking the return value of store->mount()
[15:32] * stass (stas@ssh.deglitch.com) has joined #ceph
[15:41] * Oliver (~oliver1@p4FFFE978.dip.t-dialin.net) Quit (Quit: Leaving.)
[15:41] <joao> note to future self: --filestore-xattr-use-omap is your friend
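The hang joao describes (15:02 to 15:29) came from carrying on after an unchecked store->mount(). A minimal sketch of the check follows; `FakeStore` and `mount_or_fail` are hypothetical names invented for illustration, not the FileStore API, and the sketch assumes the usual C convention of 0 on success and a negative errno on failure.

```python
class FakeStore:
    """Hypothetical stand-in for an object store; not the real FileStore API."""

    def __init__(self, mount_result):
        self._mount_result = mount_result
        self.mounted = False

    def mount(self):
        # Assumed convention: 0 on success, negative errno on failure.
        if self._mount_result == 0:
            self.mounted = True
        return self._mount_result


def mount_or_fail(store):
    """Mount the store and refuse to continue if mount() reports an error."""
    ret = store.mount()
    if ret < 0:
        # Failing loudly here avoids queueing transactions on an unmounted store.
        raise RuntimeError("mount failed with error %d" % ret)
    return store
```

Failing at the mount call puts the error where the problem is, rather than leaving a later apply_transaction() waiting on a condition that is never signalled.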
[15:44] * f4m8 is now known as f4m8_
[15:46] * stass (stas@ssh.deglitch.com) Quit (Read error: Connection reset by peer)
[15:48] * stass (stas@ssh.deglitch.com) has joined #ceph
[16:14] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[16:25] <joao> sagewk, ping me when you're around
[17:13] * cattelan_away is now known as cattelan
[17:22] * Guest8014 (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Remote host closed the connection)
[17:22] * xarthisius (~xarth@hum.astri.uni.torun.pl) has joined #ceph
[17:32] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:46] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[18:00] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:19] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[18:32] <sagewk> joao: ping
[18:32] * sjust (~sam@aon.hq.newdream.net) Quit (Remote host closed the connection)
[18:33] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[18:33] * sjust (~sam@aon.hq.newdream.net) Quit (Remote host closed the connection)
[18:34] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[18:34] * sjust (~sam@aon.hq.newdream.net) Quit (Remote host closed the connection)
[18:34] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[18:37] <sagewk> any reason not to set lpg_num=0 by default?
[18:39] <sjust> what would happen if you attempted to use the preferred pg feature?
[18:41] <sagewk> objecter would currently just hang, i think
[18:41] <sjust> that's my only concern
[18:41] <sjust> we would want librados to detect that case and return some form of error
[18:42] <sagewk> that'd be pretty easy
[18:42] <sjust> yeah
[18:56] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[19:03] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[19:06] <sagewk> have the conf room back, back to vid-yo
[19:08] <joao> great
[19:08] <nhm> okay, which one today?
[19:08] <joao> btw, is there any way I can run a standalone test with teuthology?
[19:09] <nhm> joao: what kind of stand-alone test?
[19:09] <nhm> joao: I've been running various non-ceph stuff via teuthology...
[19:09] <joao> yeah, I figured I was being way too abstract
[19:10] <joao> well, I've got the workload generator, and I would like to run it on a btrfs volume on a plana
[19:10] <joao> without building ceph on the said plana
[19:10] <joao> any ideas on how to achieve this?
[19:10] <nhm> joao: ah, you need to do what I need to do. I want to move the partitioning code out of the ceph task.
[19:10] <nhm> joao: you could just run a task interactive and partition one of the virtual devices yourself...
[19:11] <nhm> that is probably the quickest way right now.
[19:11] <joao> how would I do that? any docs on that?
[19:12] <nhm> joao: no docs, but I can help you. You basically just want to create a teuthology file that has the "interactive" task, and then you can either use the python prompt to do stuff or just log into the machine(s) and do whatever yourself.
[19:12] <nhm> joao: I'll send you something in a bit.
[19:13] <elder> sagewk, are we on skype today?
[19:13] <nhm> elder: vidyo, but I don't know which room
[19:14] <elder> I only have a link to one of them.
[19:14] <joao> I'm seeing myself double on the Danger Room
[19:14] <joao> not anymore
[19:15] <joao> nhm, thanks
[19:18] <joao> are we on the right room?
[19:18] <nhm> Can you guys see me?
[19:18] <joao> I can
[19:18] <nhm> joao: I saw everyone else before, but then vidyo crashed and now I can see you guys
[19:20] <elder> joao, can you hear me?
[19:20] <joao> elder, I can
[19:26] <nhm> elder: are you running vidyo on ubuntu?
[19:26] <elder> Yes.
[19:27] <nhm> elder: it keeps crashing. It just crashed when I tried to change the microphone to use, and now it's not even letting me log in.
[19:27] <elder> Hmm. I don't know, I just clicked on the link and it was already running from doing it before.
[19:28] <nhm> elder: yeah, apparently that's how it's supposed to work
[19:28] <elder> I don't remember what was necessary to get it running the first time.
[19:28] <elder> But I vaguely remember you told me you started it from the command line or something.
[19:29] <nhm> elder: yeah. I still get the little notification icon that it's running, but then it crashes or says I'm not logged in, even when I try to log in.
[19:29] <nhm> it seems like the USB headset makes it really unhappy.
[19:30] <elder> Mine is just audio earbuds.
[19:30] <nhm> yeah, I tried using the headset they sent me.
[19:30] <nhm> ugh
[19:30] <elder> I had some ipod headphones sitting around.
[19:31] <nhm> yeah, it's segfaulting on login now.
[19:32] <nhm> wonderful
[19:34] <nhm> well, I got it working again by blowing away .vidyo
[19:35] * perplexed (~ncampbell@ has joined #ceph
[19:35] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[19:37] <Tv|work> vidyo has a daemon and a browser plugin, i understand
[19:38] <nhm> Tv|work: it wasn't respecting the system wide microphone settings, so it changed the preferences in vidyo, which made it segfault, and then it segfaulted on login until I blew away .vidyo and started over.
[19:38] <dmick> ugh
[19:38] <Tv|work> lovely
[19:39] <Tv|work> that's like my first ever good skype experience -- on a webos tablet where it's bundled by the vendor
[19:39] <Tv|work> i'm now convinced video conference belongs on the list of inherently broken software jwz's groupware rant talks about
[19:47] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:48] <nhm> sweet, I've spread the plague to Dan's machine.
[19:50] <joao> so, how would one interact with the cluster using teuthology's interactive shell?
[19:51] <nhm> joao: oh sorry, got distracted.
[19:51] <nhm> joao: one sec
[19:51] <joao> nhm, np
[19:51] <Tv|work> joao: ctx.cluster.run(args=['uptime'])
[19:51] <joao> oh, neat
[19:52] <nhm> joao: you can do that, or just ssh into the nodes too.
[19:52] <joao> and is there any way I could run a local script on a remote plana using this?
[19:52] * imjustmatthew (~imjustmat@pool-96-228-59-130.rcmdva.fios.verizon.net) has joined #ceph
[19:53] <nhm> joao: sounds like you've got an interactive task running now then?
[19:53] <joao> yeah
[19:54] <nhm> joao: cool.
[19:55] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:01] <joao> ok, changing my approach: is there any way to deploy a compiled ceph branch onto a plana, without executing, and leaving it there??
[20:02] <nhm> joao: I think the intent was to let you run individual context managers in tasks. For now I think you need to modify it to only create the config without launching the daemons.
[20:02] <nhm> in the ceph task that is.
[20:03] <gregaf> is anybody able to use the tracker? from where I am looks like a database problem since i can ping it, but loading pages doesn't work
[20:03] <joao> I'm not fully aware of how teuthology works, but I'm going out on a limb here and assume it's easier just to get the branch on the plana and compile ceph there, no?
[20:04] <joao> gregaf, sure looks like it's perpetually waiting for server response
[20:04] <dmick> gregaf: me2
[20:04] * sjust (~sam@aon.hq.newdream.net) Quit (Quit: Leaving.)
[20:05] <gregaf> k, long as it's not just me :/
[20:05] <dmick> tracker.newdream.net/ returns immediately with "can't contact"
[20:06] <nhm> joao: I guess it depends if you would rather spend your time compiling ceph or modifying python code.
[20:06] <dmick> joao: you can always just scp/ssh
[20:06] <joao> oh man, we've had over 200 forest fires just this afternoon
[20:06] <gregaf> sagewk: ping, tracker's down
[20:06] <dmick> lock the machines and do what you will
[20:06] * sjust (~sjust@aon.hq.newdream.net) has joined #ceph
[20:06] <joao> and it's late march
[20:06] <joao> dmick, not sure if that would work, but I can give it a try
[20:07] * sjust (~sjust@aon.hq.newdream.net) has left #ceph
[20:07] <joao> nhm, I'm not sure what would be more time consuming: dealing with python given that I'm a noob, or installing ceph's dependencies and running an install on a plana
[20:07] * sjust (~sjust@aon.hq.newdream.net) has joined #ceph
[20:07] <joao> I'm sure the first one would be more rewarding long-term though
[20:08] <joao> nhm, what do you think would be needed in teuthology to support this?
[20:08] * sjust (~sjust@aon.hq.newdream.net) has left #ceph
[20:08] <nhm> joao: I'm the wrong person to ask honestly. I just started with python/teuthology 3 weeks ago.
[20:08] * sjust (~sjust@aon.hq.newdream.net) has joined #ceph
[20:09] <dmick> joao: why would it possibly not work?
[20:09] <dmick> they're just remote machines
[20:09] <joao> oh yeah... Tv|work, what do you think would be needed to support this? :p
[20:09] <joao> dmick, I was thinking of scp'ing the compiled binaries :p
[20:09] <joshd> did someone install debian on plana71 and forget to reimage it?
[20:09] <Tv|work> joao: deploy without executing?
[20:09] <joao> Tv|work, yeah
[20:10] <dmick> It's official: the Vidyo Linux client is an utter PoS
[20:10] <dmick> and unusable.
[20:10] <Tv|work> joao: just drop the tarball into a temp dir
[20:10] <Tv|work> joao: that's just about as much "deploy" as you can get without executing
[20:10] <joao> yeah
[20:10] <Tv|work> joao: setting up the cluster means running ceph code
[20:11] <joao> and what about running tests that are not dependent on osd's, mon's and the works?
[20:11] <Tv|work> joao: if you mean "don't start the daemons", well, nobody's needed that, so it doesn't exist.. you could make it happen, but instead you might be happier going into interactive mode and then killing the daemons you don't want
[20:11] <Tv|work> joao: there's no harm from having them running; simple is good
[20:12] <joao> Tv|work, I'm actually trying to just run the workload generator on a btrfs volume
[20:12] <joao> I believe I'm overthinking this a lot
[20:12] <Tv|work> joao: so just do it! ignore the ceph cluster
[20:12] <dmick> yeah. it's just a machine. teuthology is probably irrelevant
[20:12] <Tv|work> or let teuth give you the software, but just don't use the cluster
[20:12] <Tv|work> plana aren't exactly running out of hard drives you know
[20:13] <nhm> Tv|work: at some point I'm probably going to look into creating partitions outside of the ceph task.
[20:13] <dmick> (just know that if you use Teuthology to install the cluster, you'll need to use it from /tmp/cephtest)
[20:13] <Tv|work> nhm: i want to get the whole chef thing going....
[20:13] <joao> okay
[20:13] <Tv|work> dmick: with ugly environment variables too
[20:13] <joao> thanks a lot
[20:13] <nhm> Tv|work: Yeah, I was just going to say that it may be better to just focus on chef.
[20:14] <nhm> have any of you guys tried ekiga?
[20:15] <dmick> not seriously, but I know of it
[20:16] * bchrisman (~Adium@ has joined #ceph
[20:16] <nhm> dmick: I'm going to try setting it up. If you have time to test it we could see how that works.
[20:17] <Tv|work> nhm: well the office is kinda playing by the Corp IT rules, and vidyo was their thing...
[20:17] <Tv|work> nhm: i don't think they'll be enthusiastic about ekiga
[20:18] <nhm> Tv|work: hrm, ok. I won't waste time on it if they won't go for it.
[20:18] <Tv|work> nhm: though you should talk about your vidyo issues with them
[20:18] <nhm> Tv|work: who is them?
[20:18] <dmick> I just emailed cephbix
[20:18] <dmick> *biz
[20:18] <sagewk> gregaf: poked.
[20:18] <Tv|work> i wish i remembered who the right guy is
[20:19] <sagewk> fwiw, redmine keeps doing this:
[20:19] <sagewk> 14831 ? S 60:17 Rails: /home/cephtracker/tracker.newdream.net
[20:19] <sagewk> 19382 ? S 0:03 \_ git --git-dir /home/sage/git/ceph.git log --no-co
[20:19] <sagewk> 14833 ? R 77:53 Rails: /home/cephtracker/tracker.newdream.net
[20:19] <sagewk> 4947 ? S 0:03 \_ git --git-dir /home/sage/git/ceph.git log --no-co
[20:19] <sagewk> to which i say
[20:19] <sagewk> $ sudo kill 4947 14833 19382 14831
[20:20] <Tv|work> a rails programmer didn't understand timeouts and defensive programming? *gasp!*
[20:20] <dmick> ow, the sarcasm, it burns
[20:32] <joao> oh, looks like teuthology keeps the binaries at /tmp/cephtest/binary
[20:32] <joao> which is pretty useful for this :)
[20:45] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[20:51] * chutzpah (~chutz@ has joined #ceph
[21:08] <elder> joao, you want to do this:
[21:08] <elder> export LD_LIBRARY_PATH="/tmp/cephtest/binary/usr/local/lib:${LD_LIBRARY_PATH}"
[21:08] <elder> export PATH="/tmp/cephtest/binary/usr/local/bin:${PATH}"
[21:08] <elder> export PATH="/tmp/cephtest/binary/usr/local/sbin:${PATH}"
[21:09] <elder> Also, if you are going to log in to client "client.0", you may want this as well:
[21:09] <elder> export CEPH_ARGS="--conf /tmp/cephtest/ceph.conf"
[21:09] <elder> export CEPH_ARGS="${CEPH_ARGS} --keyring /tmp/cephtest/data/client.0.keyring"
[21:09] <elder> export CEPH_ARGS="${CEPH_ARGS} --name client.0"
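The exports elder lists above can also be assembled in a script, for example to hand to a subprocess. This is a sketch only: `cephtest_env` is a hypothetical helper, the paths and the client name are copied from the lines above, and unlike the shell version it does not leave a trailing colon when a variable was previously unset.

```python
import os


def cephtest_env(base_env=None, root="/tmp/cephtest", client="client.0"):
    """Return a copy of base_env with the settings from elder's exports applied."""
    env = dict(os.environ if base_env is None else base_env)
    prefix = root + "/binary/usr/local"

    def prepend(var, path):
        # Put path in front of the existing value, if there is one.
        old = env.get(var, "")
        env[var] = path + (":" + old if old else "")

    prepend("LD_LIBRARY_PATH", prefix + "/lib")
    prepend("PATH", prefix + "/bin")
    prepend("PATH", prefix + "/sbin")   # sbin ends up first, as in the exports

    env["CEPH_ARGS"] = " ".join([
        "--conf", root + "/ceph.conf",
        "--keyring", "%s/data/%s.keyring" % (root, client),
        "--name", client,
    ])
    return env
```

The result can be passed as the `env` argument of `subprocess.run`, so the calling shell's own environment stays untouched.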
[21:09] <joao> elder, thanks
[21:09] <joao> :)
[21:10] <joao> btw, is it at all possible to force gitbuilder to rebuild a specific branch?
[21:10] <elder> If you update the branch, it will get built.
[21:10] <elder> Or maybe, it will gitbuilt
[21:11] <dmick> groan
[21:11] <elder> Do you prefer sarcasm?
[21:12] <joao> lol
[21:12] <joao> I don't suppose it is built *right away* though
[21:12] <joao> that, or I'm doing something wrong
[21:13] <joao> also, there is the chance that everything is okay and I'm just being silly
[21:14] <elder> What branch, what repository?
[21:15] <joao> branch wip-2087 @ ceph.git
[21:15] <joao> but hey, even if it's not the *latest*, it works
[21:15] <dmick> joao: it polls pretty frequently
[21:16] <dmick> just rebuilt 10 min ago
[21:16] <dmick> well, 15
[21:16] <elder> 2087 is done on oneiric-amd64
[21:17] <elder> what architecture were you looking for?
[21:17] <elder> http://ceph.newdream.net/gitbuilder-oneiric-amd64/
[21:17] <elder> http://ceph.newdream.net/gitbuilder.cgi
[21:18] <joao> elder, was just looking for the latest version of that branch (pushed an hour ago or so) to run on the planas
[21:18] <joao> so I guess oneiric-amd64 is fine
[21:19] <dmick> 4f0d170, right?
[21:19] <joao> yeah
[21:19] <joao> that's the one
[21:19] <elder> That's the head of branch wip-2087 at the moment.
[21:19] <joao> will redeploy teuthology
[21:20] <joao> I suppose that at the time it wasn't the one
[21:21] <elder> dmick, can you help me again with looking at the console for a machine? ubuntu@plana54.front.sepia.ceph.com
[21:21] <dmick> sage just asked about three others
[21:21] <elder> I suspect I reproduced the problem I was looking for when running kernel_untar_build.sh
[21:22] <elder> Not that I can do much about it...
[21:22] <elder> However I may try to set up the core dumping and re-start the tests, even without the debug symbols. I've trudged through raw memory plenty of times before.
[21:23] <elder> Whenever you get the chance. I have another one too, but I have to look into it first.
[21:23] * elder thinks dmick's time would be better spent enabling us to have console access...
[21:32] <elder> Is this a problem?
[21:32] <elder> 2012-03-28 04:28:00.468493 7f2515b56700 log [INF] : 0.4 scrub ok
[21:32] <elder> 2012-03-28 04:28:01.167595 7f251dd68700 journal check_for_full at 548864 : JOURNAL FULL 548864 >= 528383 (max_size 104857600 start 1077248)
[21:32] <elder> 2012-03-28 04:28:01.603414 7f251dd68700 journal check_for_full at 548864 : JOURNAL FULL 548864 >= 528383 (max_size 104857600 start 1077248)
[21:42] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[21:42] * sjust (~sjust@aon.hq.newdream.net) Quit (Quit: sjust)
[21:43] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[21:45] <gregaf> elder: means that the OSD journal filled up so it needs to wait for writes to flush (assuming I'm in the right context here)
[21:45] <elder> Sounds reasonable. So it's a transient situation that should correct itself?
[21:46] <gregaf> yep
[21:46] <gregaf> well, depends on what you mean by transient: the journal will block the incoming write until it has flushed more data out to the main store and can wrap around again
[21:46] <elder> Do we have to dump it many, many times within a minute?
[21:47] <gregaf> but it's caused by writing faster to the OSD than its main store can keep up, so if you keep doing that what's actually happening is you're backing up the OSD
[21:47] <gregaf> could probably implement exponential backoff; it's not a complicated check or printout
[21:47] <gregaf> but somebody would have to actually do it :)
[21:47] <elder> Printing messages doesn't help that any...
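gregaf's suggestion of exponential backoff for the repeated JOURNAL FULL message could look roughly like the sketch below. It is not Ceph code: `BackoffLogger`, its parameters, and the "similar messages suppressed" suffix are all assumptions made for illustration, and the caller supplies the clock value so the behaviour is easy to test.

```python
class BackoffLogger:
    """Emit a repeated message at intervals that double after every emit."""

    def __init__(self, emit, initial=1.0, maximum=60.0):
        self._emit = emit              # callable that receives the message text
        self._initial = initial        # first quiet period, in seconds
        self._maximum = maximum        # cap on the quiet period
        self._interval = initial
        self._next_allowed = None      # time before which messages are dropped
        self._suppressed = 0

    def log(self, now, message):
        """Log message at time now; return True if it was actually emitted."""
        if self._next_allowed is not None and now < self._next_allowed:
            self._suppressed += 1
            return False
        if self._suppressed:
            message = "%s (%d similar messages suppressed)" % (message, self._suppressed)
        self._emit(message)
        self._suppressed = 0
        self._next_allowed = now + self._interval
        self._interval = min(self._interval * 2, self._maximum)
        return True

    def reset(self):
        """Call once the condition clears so the next occurrence logs at once."""
        self._interval = self._initial
        self._next_allowed = None
        self._suppressed = 0
```

The suppressed count keeps the information elder's log shows (that the condition persisted) without printing the same line many times a minute.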
[21:54] * rosco (~r.nap@ Quit (Remote host closed the connection)
[21:54] * rosco (~r.nap@ has joined #ceph
[21:56] <elder> I'm trying to figure out the meaning of a teuthology error. Anyone willing to take a look at the traceback?
[22:07] * BManojlovic (~steki@ has joined #ceph
[22:18] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:28] * perplexed_ (~ncampbell@ has joined #ceph
[22:30] * perplexed_ (~ncampbell@ Quit ()
[22:31] <Tv|work> elder: pastebin it
[22:32] * lofejndif (~lsqavnbok@09GAAEEMF.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:34] <elder> http://pastebin.com/rwGUBFUc
[22:34] <elder> And just before that: Removing image: 100% complete...done.
[22:35] <elder> tasks:
[22:35] <elder> - ceph:
[22:35] <elder>     log-whitelist:
[22:35] <elder>     - wrongly marked me down or wrong addr
[22:35] <elder>     - objects unfound and apparently lost
[22:35] <elder> - thrashosds:
[22:35] <elder> - rbd:
[22:35] <elder>     all:
[22:35] <elder>       image_size: 20480
[22:35] <elder> - workunit:
[22:35] * perplexed (~ncampbell@ Quit (Ping timeout: 480 seconds)
[22:35] <elder>     all:
[22:35] <elder>     - suites/iozone.sh
[22:35] <elder>     - suites/iozone.sh
[22:35] <elder>     - suites/iozone.sh
[22:35] <elder>     - suites/iozone.sh
[22:35] <elder>     - suites/iozone.sh
[22:35] <elder>     - suites/iozone.sh
[22:35] <elder>     - suites/iozone.sh
[22:35] <elder>     - suites/iozone.sh
[22:35] <elder>     - suites/iozone.sh
[22:35] <elder>     - suites/iozone.sh
[22:35] <elder>     - suites/iozone.sh
[22:35] <elder>     - suites/iozone.sh
[22:36] <Tv|work> elder: gevent stupidity is hiding the original traceback :(
[22:36] <Tv|work> elder: ceph_manager.Thrasher has a bug
[22:36] <elder> OK, well I don't understand what that means, but I take it the traceback is not useful...
[22:36] <elder> OK.
[22:36] <Tv|work> sjust: ^
[22:37] <elder> So, it didn't really look like an error. Do you think there was no error, and it was a test problem?
[22:37] <Tv|work> or perhaps it really was self._exception that was None.. let's see
[22:37] <Tv|work> no, that results in something else
[22:37] <Tv|work> so somewhere in the thrasher, "a=None; a()" happens
[22:37] <Tv|work> and gevent lost the traceback
[22:38] * grape (~grape@ has joined #ceph
[22:40] <Tv|work> i dislike the fact that it inherits Greenlet in the first place
[22:40] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[22:40] <Tv|work> this is not java :(
[22:40] <Tv|work> elder: you need to talk to Sam
[22:41] <elder> OK.
[22:41] <elder> sjust, whenever you happen to notice, I guess I need to talk with you...
[22:41] <Tv|work> sjust: gevent re-raises the exception from inside the greenlet without preserving the stack trace
[22:41] <sjust> ugh
[22:42] <Tv|work> sjust: if you didn't inherit Greenlet, there'd be things to do
[22:42] <Tv|work> now that you do, i don't much care ;)
[22:42] <sjust> what?
[22:42] <Tv|work> nothing else in the code needs to inherit Greenlet; you're too special
[22:43] <sjust> so don't inherit from Greenlet?
[22:43] <sjust> done
[22:43] <Tv|work> well then it's just a function
[22:43] <Tv|work> and there's less of this OO crap and self and it's easier to wrap in a try: except: if we need to
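Tv|work's point about a plain function being easy to wrap in a try/except can be sketched as follows. This is a minimal illustration, not teuthology code: preserve_traceback and its report callable are hypothetical names, and the example uses only the standard library. The idea is to record the full traceback inside the worker function itself, before the exception crosses into whatever runs it (for instance a function handed to gevent.spawn), so the original stack is not lost even if the caller re-raises without it.

```python
import functools
import traceback


def preserve_traceback(report):
    """Decorator factory: report the full traceback of any exception, then re-raise.

    `report` is any callable taking one string (e.g. a logger's error method).
    Hypothetical helper, not part of teuthology or gevent.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # format_exc() captures the stack at the point of failure,
                # which is exactly what gets lost on a bare re-raise elsewhere.
                report("%s failed:\n%s" % (func.__name__, traceback.format_exc()))
                raise
        return wrapper
    return decorator
```

With this in place, a failure such as calling None inside the wrapped function still propagates to the caller, but the report already contains the frame where it happened.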
[22:48] <sjust> diff --git a/teuthology/task/ceph_manager.py b/teuthology/task/ceph_manager.py
[22:48] <sjust> index f4d85a6..bf23e64 100644
[22:48] <sjust> --- a/teuthology/task/ceph_manager.py
[22:48] <sjust> +++ b/teuthology/task/ceph_manager.py
[22:48] <sjust> @@ -5,7 +5,7 @@ import re
[22:48] <sjust> import gevent
[22:48] <sjust> import json
[22:48] <sjust>
[22:48] <sjust> -class Thrasher(gevent.Greenlet):
[22:48] <sjust> +class Thrasher:
[22:48] <sjust> def __init__(self, manager, config, logger=None):
[22:48] <sjust> self.ceph_manager = manager
[22:48] <sjust> self.ceph_manager.wait_for_clean()
[22:48] <sjust> @@ -28,8 +28,8 @@ class Thrasher(gevent.Greenlet):
[22:48] <sjust> # prevent monitor from auto-marking things out while thrasher runs
[22:48] <sjust> manager.raw_cluster_cmd('mon', 'tell', '*', 'injectargs',
[22:48] <sjust> '--mon-osd-down-out-interval', '0')
[22:48] <sjust> - gevent.Greenlet.__init__(self, self.do_thrash)
[22:48] <sjust> - self.start()
[22:48] <sjust> + thread = gevent.spawn(self.do_thrash)
[22:48] <sjust> + thread.start()
[22:48] * Tv|work waits for sjust to get kicked
[22:48] <sjust>
[22:48] <sjust> def kill_osd(self, osd=None):
[22:48] <sjust> if osd is None:
[22:48] <sjust> @@ -81,7 +81,7 @@ class Thrasher(gevent.Greenlet):
[22:48] <sjust>
[22:48] <sjust> def do_join(self):
[22:48] <sjust> self.stopping = True
[22:48] <sjust> - self.get()
[22:48] <sjust> + thread.get()
[22:48] <sjust>
[22:48] <sjust> def choose_action(self):
[22:48] <sjust> chance_down = self.config.get("chance_down", 0)
[22:48] <sjust> that should be enough to remove the Greenlet inheritance
[22:49] <sjust> whoops
[22:49] <Tv|work> sjust: do you mean self.thread perhaps?
[22:49] <sjust> I do
[22:49] <sjust> hence the whoops
[22:49] <sjust> that look ok
[22:49] <sjust> ?
[22:49] <Tv|work> still ugly but better ;)
[22:54] * perplexed (~ncampbell@ has joined #ceph
[23:04] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[23:18] <elder> So sjust, Tv|work, can I assume that there's nothing to learn from my teuthology error, and therefore can kind of ignore it and move on?
[23:19] <Tv|work> elder: i think sjust would like to know how to reproduce it
[23:19] <Tv|work> elder: but from the breakage itself, probably not
[23:21] <elder> sjust, I just pasted you the entire yaml file in a private window. I doubt all the kernel_untar runs are necessary.
[23:21] <elder> Wait a minute, sent you the wrong one. I'll e-mail you the right one. (Sorry)
[23:22] <Tv|work> i think he's distracted by a bunch of things
[23:23] <elder> Ritalin.
[23:23] * lofejndif (~lsqavnbok@09GAAEEMF.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[23:26] <nhm> looks like vidyo doesn't support any ubuntu versions more recent than 10.10.
[23:27] <Tv|work> nhm: how about running it on an android phone, then?
[23:27] <elder> Do they support Windows versions newer than Vista?
[23:28] <nhm> Tv|work: I wonder if they'd support cyanogenmod. ;)
[23:29] <Tv|work> they don't need to know ;)
[23:30] <nhm> Tv|work: wait until they ask me what model it is and figure out it's supposed to be running windows mobile.
[23:30] * danieagle (~Daniel@ has joined #ceph
[23:30] <elder> nhm, what are you going to do with firmware installation? All firmware? Selected ones?
[23:30] <Tv|work> nhm: who's asking you what model your phone is? android is android.
[23:31] <elder> I got this warning on an initramdisk: W: Possible missing firmware /lib/firmware/bnx2/bnx2-mips-06-6.2.3.fw for module bnx2
[23:31] <Tv|work> elder: all firmware
[23:31] <elder> OK.
[23:31] <elder> That'll fix it.
[23:31] <nhm> elder: all firmware put into linux-updates. You can try wip-firmware if you are adventurous.
[23:31] * LarsFronius (~LarsFroni@f054096124.adsl.alicedsl.de) has joined #ceph
[23:31] <elder> I've got plenty of adventure, thank you.
[23:31] <nhm> er s/linux-updates/updates
[23:32] * lofejndif (~lsqavnbok@19NAAHPD4.tor-irc.dnsbl.oftc.net) has joined #ceph
[23:32] <nhm> elder: speaking of which, can you send me whatever kernel you are requesting in teuthology so I can test it with that?
[23:32] <elder> Just use testing or master.
[23:32] <nhm> not the kernel itself, just the yaml
[23:32] <elder> They're both the same.
[23:32] <nhm> ok
[23:33] <elder> kernel:
[23:33] <elder>   mon:
[23:33] <elder>     branch: master
[23:33] <elder>   client:
[23:33] <elder>     branch: master
[23:35] * perplexed (~ncampbell@ Quit (Quit: perplexed)
[23:37] * perplexed (~ncampbell@ has joined #ceph
[23:49] * LarsFronius (~LarsFroni@f054096124.adsl.alicedsl.de) Quit (Quit: LarsFronius)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.