#ceph IRC Log


IRC Log for 2012-04-10

Timestamps are in GMT/BST.

[0:01] * Guest1336 (~adjohn@ Quit (Ping timeout: 480 seconds)
[0:01] <sagewk> yehudasa, sjust: want to look at wip-encoding?
[0:03] <yehudasa__> sagewk: sure
[0:06] <nrheckman> Put content is no longer throwing up errors, but I think I still have a problem with the URL encoded / in the request path. It's just overwriting the first entry each time and dropping everything including and after the first instance of '%2F'
[0:06] <nrheckman> For instance... I put a file "dbcontent%2F0%2F1%2F0%2F6" and all that exists is "dbcontent"
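Editorial sketch, not part of the log: the symptom described above can be modelled in a few lines of Python. The helper names are hypothetical, and the truncation function only mimics what nrheckman observed; it makes no claim about how radosgw handles the path internally.

```python
from urllib.parse import quote, unquote


def encode_key(key: str) -> str:
    """Percent-encode an object key; safe='' means '/' is escaped as well."""
    return quote(key, safe="")


def truncate_at_encoded_slash(encoded: str) -> str:
    """Hypothetical model of the reported symptom: keep only the text before the first '%2F'."""
    return encoded.split("%2F", 1)[0]


key = "dbcontent/0/1/0/6"
encoded = encode_key(key)
print(encoded)                             # the form the Java client sends
print(unquote(encoded))                    # decoding restores the original key
print(truncate_at_encoded_slash(encoded))  # what ended up being listed
```

If every key under the same prefix collapses to that prefix, each PUT would overwrite the same object, which matches the "overwriting the first entry each time" description.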
[0:08] <yehudasa__> are you sure that's all that exists? might be that the java api just shows you the common prefix?
[0:08] <nrheckman> I'm listing the files with libs3
[0:09] <nrheckman> if I put without encoded slashes (with libs3) everything seems fine...
[0:09] <yehudasa__> are you putting the file through libs3?
[0:09] <yehudasa__> the file with the '%2F0'
[0:09] <nrheckman> That works fine, the java client is url encoding the path. which is giving me fits.
[0:10] <yehudasa__> how do you put the "dbcontent%2F0%2F1%2F0%2F6" file?
[0:11] <nrheckman> java aws client
[0:11] * jluis (~JL@89-181-153-140.net.novis.pt) has joined #ceph
[0:12] <yehudasa__> apache log? rgw log?
[0:13] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:13] <nrheckman> rgw log: "2012-04-09 14:51:13.756463 7fd0eebfd700 src=/dbcontent%2F0%2F1%2F0%2F304"
[0:13] <yehudasa__> can you provide the complete log for this requset?
[0:13] <yehudasa__> request
[0:14] * bchrisman (~Adium@ Quit (Ping timeout: 480 seconds)
[0:14] <nrheckman> Sure.
[0:15] * bchrisman (~Adium@ has joined #ceph
[0:16] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[0:16] <nrheckman> http://pastebin.com/Fr5WzTKR
[0:17] <nrheckman> I think I got the whole thing.
[0:19] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[0:21] <yehudasa__> nrheckman: yeah, something weird is going on. What version are you running?
[0:22] <nrheckman> # radosgw -v
[0:22] <nrheckman> ceph version 0.44.1 (commit:c89b7f22c8599eb974e75a2f7a5f855358199dee)
[0:39] * lofejndif (~lsqavnbok@09GAAERD5.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[0:39] <yehudasa__> nrheckman: ok, that's a bug. I can reproduce it with libs3 also. It's not really related to the url encoding afaict
[0:40] <nrheckman> yehudasa: you know of a work around?
[0:40] <yehudasa__> nrheckman: I'll look at it, fix should be pretty trivial
[0:40] <yehudasa__> are you compiling from source?
[0:42] <nrheckman> Yeah
[0:43] <yehudasa__> nrheckman: created issue #2259
[0:55] <yehudasa__> sagewk: looks ok, other than my comment
[0:59] <yehudasa__> nrheckman: I pushed a fix, you can cherry-pick 8d5c87a86e070b4e95ef0d58a469bdbbef4a826c
[1:00] <nrheckman> yehudasa: thanks for that. I will give it a go.
[1:02] * MarkN (~nathan@ has joined #ceph
[1:17] * Tv_ (~tv@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[1:18] <nrheckman> yehudasa: Working great!
[1:30] <yehudasa__> yay
[2:12] <elder> Well, I need <xfs/xfs.h> which I believe would normally be packaged as part of an xfsprogs-dev package, and that doesn't seem to be available for Debian. At least as far as I can see so far.
[2:12] * adjohn is now known as Guest1345
[2:12] * adjohn (~adjohn@ has joined #ceph
[2:13] <elder> Maybe it's xfslibs-dev...
[2:14] * Guest1345 (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Read error: Operation timed out)
[2:14] <dmick> ubuntu claims it is
[2:14] <dmick> Tv turned me on to apt-file. handy program.
[2:38] <elder> Yes, I have now installed apt-file and I see it will be useful.
[2:39] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[2:48] * jlkinsel (~jlk@ has joined #ceph
[2:49] <jlkinsel> hey guys - saw mention of this in the irc logs but soln didn't help...getting an undefined reference error when trying to build 0.44.1 http://pastebin.com/HBs54buq
[2:49] <jlkinsel> I've tried with the git source as well and ran submodule init/submodule update, didn't help
[2:52] * jluis (~JL@89-181-153-140.net.novis.pt) Quit (Quit: Leaving)
[2:58] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:04] * adjohn (~adjohn@ Quit (Quit: adjohn)
[3:50] * jlkinsel (~jlk@ Quit (Quit: leaving)
[3:53] * MarkN (~nathan@ Quit (Quit: Leaving.)
[3:54] * MarkN (~nathan@ has joined #ceph
[4:17] * MarkN (~nathan@ Quit (Quit: Leaving.)
[4:18] * MarkN (~nathan@ has joined #ceph
[4:18] * MarkN (~nathan@ has left #ceph
[4:24] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[5:43] * Qten1 (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (Ping timeout: 480 seconds)
[6:01] * mig5 (~mig5@ppp59-167-182-161.vic.adsl.internode.on.net) has joined #ceph
[6:25] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[6:29] * brambles (brambles@ Quit (Remote host closed the connection)
[6:30] * brambles (brambles@ has joined #ceph
[6:40] * f4m8_ is now known as f4m8
[6:43] * mig5 (~mig5@ppp59-167-182-161.vic.adsl.internode.on.net) has left #ceph
[6:56] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[6:59] * chutzpah (~chutz@ Quit (Quit: Leaving)
[7:19] * cattelan is now known as cattelan_away
[7:44] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[8:00] * stxShadow (~Jens@ip-78-94-239-132.unitymediagroup.de) has joined #ceph
[8:00] * stxShadow (~Jens@ip-78-94-239-132.unitymediagroup.de) has left #ceph
[8:17] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:21] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[8:39] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) has joined #ceph
[8:51] * LarsFronius (~LarsFroni@f054106219.adsl.alicedsl.de) has joined #ceph
[8:57] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[8:58] * LarsFronius (~LarsFroni@f054106219.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[8:59] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) has joined #ceph
[9:19] * loicd (~loic@ has joined #ceph
[9:29] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:48] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:03] * BManojlovic (~steki@ has joined #ceph
[10:04] * stxShadow (~Jens@ip-78-94-239-132.unitymediagroup.de) has joined #ceph
[11:17] * Meths (rift@ Quit (Quit: leaving)
[12:13] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[12:22] * joao (~JL@ has joined #ceph
[12:23] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[12:30] * MK_FG (~MK_FG@ Quit (Remote host closed the connection)
[12:33] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[13:47] * lofejndif (~lsqavnbok@82VAACYOM.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:58] <nhm> good morning #ceph
[14:59] * aliguori (~anthony@nat-pool-3-rdu.redhat.com) has joined #ceph
[15:10] * MK_FG (~MK_FG@ has joined #ceph
[15:24] <joao> good morning nhm
[15:35] * lofejndif (~lsqavnbok@82VAACYOM.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[16:03] * f4m8 is now known as f4m8_
[16:25] <wonko_be> is it currently possible to do a ceph-osd --mkfs without having a config file (taking all parameters on the command line?)
[16:28] <joao> wonko_be, I don't think so
[16:29] <joao> I believe that the function handling ceph's arguments will fail if it doesn't find a config file
[16:31] <joao> well, actually, what I said isn't entirely true, but it holds for ceph-osd
[16:51] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[17:08] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[17:14] * nrheckman (~chatzilla@75-149-56-241-SFBA.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[17:38] * BManojlovic (~steki@ has joined #ceph
[17:44] <sagewk> wonko_be: try -c /dev/null ?
[17:44] <sagewk> or -c ''
[17:46] <sagewk> firmware stuff is still taking down plana nodes :(
[17:46] <joao> :(
[17:48] <elder> Didn't we get that fixed?
[17:48] <elder> Never checked in?
[17:48] <sagewk> it seems to consistently fail on some subset of nodes
[17:49] <nhm> elder: It's not committed to master, and doesn't seem to work consistently.
[17:50] <elder> I am still able to run my manual update script if it's needed.
[17:50] <nhm> elder: but I was under the impression that it shouldn't be strictly necessary since we already had the updated firmware in place (I thought this was just to get new firmware in the future).
[17:50] <elder> If the machines ever get re-imaged they need to be updated again.
[17:50] <elder> And it may also depend on the kernel version (?)
[17:50] <nhm> sagewk: yes, I noticed the same problem. It seems to be due to the way the git pull is being performed.
[17:51] <sagewk> it's in master now, that's why the nightly qa runs aren't working
[17:52] <nhm> sagewk: In my testing on burnupi, the git pull seems to fail consistently on roughly 30% of the nodes.
[17:52] * cattelan_away is now known as cattelan
[17:52] <elder> Do we have a local mirror of the git repository?
[17:52] <nhm> sagewk: a git pull origin master works on those nodes. I don't know enough about git to know what the downsides to doing that are vs the approach that is being used now.
[17:53] <elder> It really should only be necessary to pull the latest occasionally anyway.
[17:53] <elder> So a recursive wget or something might be better. Maybe.
[17:54] <sagewk> i don't think it's a network thing.. something else is going on.
[17:54] <sagewk> in any case, latest teuthology.git should pull from our mirror. nhm, want to see if it still fails with that?
[17:54] <sagewk> is there a node where it consistently fails i can look at?
[17:55] <nhm> sagewk: hrm, let me see if I can find a burnupi node for you.
[17:58] <nhm> heh, finding one that breaks is tough now that I got them all working.
[17:59] <sagewk> how did you fix them?
[17:59] <nhm> sagewk: instead of doing a "sudo git pull <git_url>", I did a "sudo git pull origin master".
[18:00] <nhm> sagewk: on burnupi the first command was returning 'fatal: Couldn't find remote ref git', but only on like 30% of the nodes.
[18:02] <nhm> Tommi said something about the git url looking "like a <rbranch>:<lbranch> refspec. PEBKAC."
[18:03] <nhm> so apparently the command is being invoked wrong, but I don't really know why.
[18:03] <elder> If you were using fetch you might use: git fetch remotename remote-branchname:local-branchname
[18:04] <elder> That allows you to say which local branch will contain what is on the remote branch specified.
[18:04] <elder> But I think if you just did "master" it implies "master:master"
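Editorial sketch, not part of the log: a simplified model of the `[+]<src>[:<dst>]` refspec shape discussed above. This is an assumption-laden toy, not git's actual parser, and the URL is a made-up placeholder. It shows why a `git://` URL that lands in the refspec position could plausibly be read as a ref named "git", which is consistent with the 'Couldn't find remote ref git' error nhm quotes.

```python
from typing import NamedTuple, Optional


class Refspec(NamedTuple):
    force: bool          # True when the spec starts with '+'
    src: str             # name on the remote side
    dst: Optional[str]   # local name, or None when no ':' was given


def parse_refspec(text: str) -> Refspec:
    """Toy parser: strip an optional leading '+', then split on the first ':'."""
    force = text.startswith("+")
    if force:
        text = text[1:]
    src, sep, dst = text.partition(":")
    return Refspec(force, src, dst if sep else None)


print(parse_refspec("+master:master"))
print(parse_refspec("master"))
# Hypothetical URL; treated as a refspec, everything before ':' becomes the source ref.
print(parse_refspec("git://example.invalid/repo.git"))
```

What git does with a bare `master` (no destination) depends on the command and version, so the sketch leaves `dst` as `None` rather than guessing.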
[18:05] * aliguori (~anthony@nat-pool-3-rdu.redhat.com) Quit (Quit: Ex-Chat)
[18:10] <nhm> elder: so any idea why the "git pull <url>" would only sometimes break?
[18:10] <sagewk> grr tons of machines seem to be down. not sure if it's because of this or something else
[18:10] <elder> No.
[18:10] <elder> I'd have to look at what was getting reported.
[18:10] <nhm> I'd rather just do a git pull origin <branch> which seems to be standard, not sure why that's not what we are doing.
[18:11] <sagewk> hard to tell when the nodes where teuthology log show it failed aren't up. the ... pull origin master works on 9 other random nodes, but that's not very conclusive
[18:11] <nhm> since the origin is being set anyway.
[18:12] <sagewk> yeah
[18:12] <elder> We actually might *not* want to do a git pull. Instead, we might want: git checkout -b temp; git fetch origin +master:master; git checkout master; git branch -D temp
[18:12] <elder> That will forcibly fetch whatever is in the remote master and make ours match.
[18:12] <sagewk> or git fetch origin ; git reset --hard origin/master
[18:13] <elder> I suppose. Though there's no reason to fetch anything other than master. And I think it refused to fetch on top of the current branch.
[18:14] <elder> Regardless of how many ways we can think of to do it, I think we should simply be making our tree match the remote master, period. And updating may not be needed very frequently at all.
[18:14] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[18:15] <nhm> elder: indeed. In the long run all of this is kind of insane anyway. This is what chef/puppet/etc were made for.
[18:16] <sagewk> - 'sudo', 'git', 'pull', 'origin', 'master'
[18:16] <sagewk> + 'sudo', 'git', 'fetch', 'origin',
[18:16] <sagewk> + run.Raw('&&'),
[18:16] <sagewk> + 'sudo', 'git', 'reset', '--hard', 'origin/master'
[18:16] <sagewk> ?
[18:16] <nhm> sagewk: seems clear to me.
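Editorial sketch, not part of the log: the fetch-then-reset sequence from sagewk's diff, assembled as a single shell command string. The function name and the example path are hypothetical, and this does not use teuthology's own `run` helpers; it only shows the shape of the command, using `shlex.join` (Python 3.8+) for quoting.

```python
import shlex


def build_sync_command(repo_dir: str, remote: str = "origin", branch: str = "master") -> str:
    """Return a shell command that makes repo_dir match <remote>/<branch>, discarding local changes."""
    steps = [
        ["cd", repo_dir],
        ["sudo", "git", "fetch", remote],
        ["sudo", "git", "reset", "--hard", f"{remote}/{branch}"],
    ]
    # '&&' stops the chain as soon as one step fails, so reset never runs after a failed fetch.
    return " && ".join(shlex.join(step) for step in steps)


# Hypothetical checkout location, for illustration only.
print(build_sync_command("/tmp/example-repo"))
```

Unlike a pull, this never attempts a merge: the working tree is simply forced to the fetched remote branch, which is the "make our tree match the remote master, period" behaviour elder asks for.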
[18:17] <elder> Looks like C++ to me. :)
[18:17] <sagewk> heh
[18:17] <nhm> elder: I'm not sure how I should interpret that comment. ;)
[18:17] <elder> Why do the pull?
[18:17] <elder> Fetch will fetch it.
[18:17] <sagewk> that's the removed line
[18:17] <nhm> elder: that's being remoted
[18:17] <elder> Oh.
[18:17] <nhm> er removed
[18:17] <elder> Yes.
[18:18] <elder> Now it looks like a diff of a Perl file.
[18:18] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[18:18] <sagewk> hrm, well, updating the kernel seems to have taken down all the nodes i just tested on.
[18:18] <wonko_be> sagewk: seems to work - now I only need a way to specify my journal options (filesize) to ceph-osd --mkfs ...
[18:19] <nhm> sagewk: We can fix them with ipmitool....
[18:19] <nhm> sagewk: I can help if you want
[18:19] <sagewk> --osd-journal-size <sizeinmb>
[18:20] <wonko_be> ah, perfect, thx
[18:20] <sagewk> nhm: is there console?
[18:20] <sagewk> before we blindly powercycle a few dozen machines? i have a feeling they won't come back up anyway
[18:21] <sagewk> also,
[18:21] <nhm> sagewk: There are on the burnupi nodes. Haven't tried plana yet.
[18:21] <sagewk> Error: Unable to establish IPMI v2 / RMCP+ session
[18:21] <sagewk> Unable to set Chassis Power Control to Cycle
[18:21] * yehudasa (~yehudasa@aon.hq.newdream.net) has joined #ceph
[18:21] <sagewk> where should ipmitool be run from?
[18:22] <nhm> sagewk: there are ip addresses for planaXX ipmi.
[18:22] <nhm> sagewk: I just run it from my box.
[18:23] <nhm> username/password are probably different from the burnupi nodes though.
[18:23] <nhm> yep, got in to plana02
[18:23] <sagewk> ah, working now
[18:25] * cattelan (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Read error: Operation timed out)
[18:29] * cattelan (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[18:34] * lxo gives up on cephfs snapshots for the time being, and decides to start experimenting with farms of hard links
[18:35] <lxo> I've got a couple of trivially reproducible bugs in my setting, that I can't seem to find time to debug :-(
[18:36] <lxo> and reduced tests don't seem to trigger it
[18:38] <sagewk> lxo: can you make sure that the steps to reproduce are documented somewhere (if they're not already)? Ideally in the form of a convenient bash script
[18:42] <lxo> sagewk, unfortunately one of my reproducers is precisely documented, but you couldn't trigger the problem with it :-( for the other, I have a script, but triggering the problem depends a bit on timing. the other is a simple rsync, which presumably depends on pre-existing conditions
[18:43] <lxo> even the script seems to rely on pre-existing conditions, for it doesn't seem to trigger the bug on a similar, just-constructed tree
[18:43] <sagewk> lxo: ok. i was running in an odd environment when i tried to reproduce (uml, single node), so probably with some more effort it will work.
[18:45] <lxo> one thing that I think would help test things out more reproducibly is some way to get the mds to flush its cache (à la --reset-journal, but only after making sure everything in the journal is recoverable from cephfs metadata, and without actually restarting the mds)
[18:46] <sagewk> yeah
[18:46] <sagewk> that wouldn't be too difficult, actually
[18:49] <lxo> something like ceph mds # flush
[18:49] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[18:50] * lofejndif (~lsqavnbok@83TAAEVOT.tor-irc.dnsbl.oftc.net) has joined #ceph
[18:52] * rturk (~rturk@aon.hq.newdream.net) has joined #ceph
[18:54] <lxo> incidentally, --reset-journal proper is still broken for me :-( ceph-mds remains waiting forever, and the mons don't seem to take notice of it
[18:56] * gregaf (~Adium@aon.hq.newdream.net) has left #ceph
[19:00] <joao> so, using ccache to compile ceph should be just 'CC="ccache g++" make', right?
[19:01] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[19:03] <sagewk> CXX=... i think?
[19:04] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:05] * chutzpah (~chutz@ has joined #ceph
[19:06] <joao> oh yeah, that makes sense
[19:17] <elder> I can't hear anybody on the con call
[19:17] <elder> I'm kind of screwed.
[19:18] <elder> I'm dropping off. Someone let me know how it goes.
[19:19] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[19:25] <nhm> elder: notable: 10GE should be working soon on test nodes, some networking problems being fixed on plana, Sam is finding lock contention issues on the OSD side, I've got some bulk performance tests I'm going to send out to interested people.
[19:31] <elder> Thank you.
[19:31] <elder> Going offline for a minute.
[19:32] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[19:38] * gregaf (~Adium@aon.hq.newdream.net) has joined #ceph
[19:39] * LarsFronius (~LarsFroni@g231137245.adsl.alicedsl.de) has joined #ceph
[19:43] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[19:53] * stxShadow (~Jens@ip-78-94-239-132.unitymediagroup.de) Quit (Read error: Connection reset by peer)
[19:55] * yehudasa__ (~yehudasa@aon.hq.newdream.net) Quit (Quit: Ex-Chat)
[20:05] <sagewk> teuthology vm back up
[20:05] <sagewk> the host oopsed.
[20:05] <sagewk> (maverick, uptime 224 days)
[20:16] <nhm> hehe
[20:28] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[20:30] * loicd (~loic@ Quit (Quit: Leaving.)
[20:39] * joao (~JL@ Quit (Remote host closed the connection)
[20:47] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[21:12] <lxo> lesson #1) mkdir .snap/whatever is *much* faster than cp -lR . ../snaps/whatever :-)
[21:14] <dmick> heh. yes, real snapshots are nice
[21:14] * lxo wonders if this has to do more with the overhead of turning an inode with a single name into a hard-linked inode, so this will become faster from the second snapshot on, or if it's going to be like that for all snapshots
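Editorial sketch, not part of the log: roughly what a `cp -lR`-style hard-link farm does, written with the Python standard library. The function name is hypothetical and the sketch ignores ownership, permissions, and special files. It makes the cost difference visible: every directory is recreated and every file gets its own link call, whereas the snapshot is a single mkdir.

```python
import os
import tempfile


def hardlink_farm(src_root: str, dst_root: str) -> int:
    """Mirror the directory tree under dst_root, hard-linking each file; return the number of links made."""
    linked = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # One link operation per file: the work grows with the size of the tree.
            os.link(os.path.join(dirpath, name), os.path.join(target_dir, name))
            linked += 1
    return linked


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as base:
    src = os.path.join(base, "data")
    os.makedirs(os.path.join(src, "sub"))
    with open(os.path.join(src, "sub", "file.txt"), "w") as handle:
        handle.write("hello\n")
    count = hardlink_farm(src, os.path.join(base, "snaps", "whatever"))
    print(count, "file(s) linked")
```

The destination must live outside the source tree (as `../snaps/whatever` does in lxo's command), otherwise the walk would descend into its own output.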
[21:16] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:17] <lxo> anyway, I'll feel much safer with my data safely guarded in actual directories, rather than in snapshots that change from under me in unpredictable ways, (timestamps, loss of files when I move or hardlink or unhardlink them elsewhere in the tree, etc), or even cause mds failures when I change files in a directory that contains several snapshots. once I move it all into a sufficiently replicated and ceph_snapshotted ceph filesystem, I'll be able to free up
[21:17] <lxo> some disk space and get back to trying to nail these bugs down
[21:17] <gregaf> lxo: I'm not certain, but I actually suspect no; you have to touch a lot more inodes directly with that cp command than with a ceph snapshot
[21:18] <gregaf> turning all of those inodes into hardlinks definitely isn't helping, but it's the same order of work as all the other stuff
[21:18] <lxo> I don't expect it to be quite as fast as an actual snapshot, but I'm hoping it won't take as close to forever as my first experiment is taking ;-)
[21:19] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[21:19] <lxo> anyway... I've been trying to create this snapshot farm for over a year now, a few more weeks (if it takes that long) to get all the data in won't bother me too much
[21:19] <sagewk> lxo: be a little careful with huge numbers of hard links.. they populate the ceph mds anchor table, and the current implementation won't scale well
[21:19] <lxo> if only I weren't so fond of backups ;-)
[21:19] <lxo> OMG
[21:20] <lxo> ok, I'll watch out for that. maybe I'm screwed one way or another ;-)
[21:20] <lxo> one more incentive for me to nail the snapshot bugs down ;-)
[21:20] <sagewk> yeah, us too
[21:21] <gregaf> someday we will get to work on the MDS again... *dreamy smile*
[21:21] <lxo> I understand snapshots are not much of a priority for you guys, so I kind of feel I ought to look into that myself, for I really want to rely on them
[21:21] <lxo> :-)
[21:22] <gregaf> we definitely appreciate the work and the patches :)
[21:22] <lxo> I'll probably need some hand holding, for the mds is a piece of ceph whose working I haven't quite figured out yet
[21:22] <gregaf> makes us feel like maybe we really can be a community project!
[21:22] <lxo> :-)
[21:23] <lxo> it kind of feels odd to work at Red Hat, that recently acquired gluster, and help out with ceph, but hey, I got involved with ceph long before Red Hat's move ;-)
[21:24] <yehudasa> "<gregaf> someday we will get to work on the MDS again... *dreamy smile*" | s/dreamy/terrified/g
[21:25] <yehudasa> all in the point of view
[21:25] <lxo> but ceph's design appeals a lot to me, so I'm inclined to stay around and enjoy my ROI on it ;-) I haven't even looked at gluster ;-)
[21:26] <gregaf> yehudasa: well you don't have to if you don't want to, but right now I'm the unattached dev without a daemon ;)
[21:26] <lxo> not that I invested much time into improving it (I know), but just learning about its features and seeing how to apply them to my wishes took some time
[21:27] <lxo> anyhow... gotta go now. ttyl
[21:29] <nhm> lxo: I'm friends with one of the gluster guys. I think they'll probably end up with a bit of a different focus than us, but who knows.
[21:35] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) Quit (Read error: Connection reset by peer)
[21:41] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) has joined #ceph
[21:43] * kloo (~kloo@a82-92-246-211.adsl.xs4all.nl) has joined #ceph
[21:43] <kloo> hi!
[21:44] <kloo> i have a few directories on my cephfs that appear empty on the client side, but they can't be rmdir'ed because mds says they are in fact not empty.
[21:44] <yehudasa> hi!
[21:45] <kloo> the mds debug for such a failing rmdir looks like http://pastebin.com/My0PxErx
[21:45] <kloo> hey yehudasa, i have another.. carrot?
[21:45] <kloo> looking at the code it appears this is a known problem area, i see some conditional MDS_VERIFY_FRAGSTAT stuff.
[21:45] * aliguori (~anthony@nat-pool-3-rdu.redhat.com) has joined #ceph
[21:45] <kloo> is this being worked on? can i help with this behaviour i get?
[21:46] <yehudasa> kloo: this gets awkward.. maybe we should forget about the carrot and the stick?
[21:46] <kloo> it seems to get into this state when i 'rm -rf' some large/deep directory structure, which causes delayed (and retried?) operations.
[21:46] <kloo> .. on the osds.
[21:46] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[21:47] <kloo> i haven't run with MDS_VERIFY_FRAGSTAT yet but i expect it to fail with an assertion.. which is worse than empty dirs i can't rmdir. :)
[21:47] <kloo> awkward is my middle name..
[21:48] <gregaf> kloo: most of the FS bug reports we get have something to do with this stuff, yeah
[21:48] <gregaf> the info about delayed and retried ops on the OSD maybe being related is new, though
[21:49] <gregaf> how did you arrive at that conclusion?
[21:49] <kloo> well, the two things seem to coincide based on my small sample set.
[21:50] <kloo> experiencing this, can i help in some way?
[21:51] <gregaf> we're still trying to get a set of logs which show a folder "going bad"
[21:51] <kloo> mds debug 20 logs?
[21:51] <gregaf> are you just noticing the delayed ops due to output in "ceph -w"?
[21:52] <kloo> yeah and my osd nodes go cpu-bound for a spell during 'rm -rf', oprofile points to btrfs.
[21:53] <kloo> basically every time i rm -rf a largish directory structure i get 1+ of these.
[21:53] <kloo> this evening was particularly fruitful, i created 35.
[21:54] <kloo> so if debug logging doesn't prevent it, i can produce such logs.
[21:54] <gregaf> interesting
[21:54] <gregaf> let me look up the settings we want
[21:55] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) has joined #ceph
[21:56] <gregaf> kloo: do your current MDS logs contain any lines that contain "mismatch between"?
[21:56] <kloo> grepping..
[21:57] <kloo> yes, of two flavours:
[21:58] <gregaf> okay, good
[21:58] <kloo> "mismatch between head items and fnode.fragstat! printing dentries" and "mismatch between child accounted_rstats and my rstats!"
[21:58] <gregaf> so yes, if you can turn up mds debugging to 20 and then break a directory, and post that log for me, that would be helpful
[21:59] <gregaf> but make sure it's newly-broken and wasn't broken before :)
[21:59] <kloo> of course.
[22:00] <kloo> but it's ok to create an additional one on this fs that already has a bunch?
[22:00] <kloo> or should i preferably do so on a fresh cephfs?
[22:00] <kloo> the latter would be more difficult, i don't have a scratch setup at present.
[22:01] <gregaf> if you can do it on a fresh FS it would be awesome, but as long as the new to-be-broken dir isn't a parent or child of a broken directory it should be fine
[22:01] <gregaf> (only difference I can think of off-hand between the two is less noise on a fresh FS)
[22:01] <kloo> acknowledged, i'll attempt the simpler thing first.
[22:02] <kloo> i have to go now but i will hopefully return soon, with a gift. :)
[22:02] <kloo> oh and ceph rocks.
[22:02] <kloo> bye.
[22:02] * kloo (~kloo@a82-92-246-211.adsl.xs4all.nl) Quit (Quit: ...)
[22:02] <gregaf> cool, thanks!
[22:09] <elder> If I want to test a new or modified teuthology task, is it possible to do so without committing the change to the git tree?
[22:12] <gregaf> elder: yeah, just run teuthology out of the dirty tree
[22:12] <yehudasa> elder: yes
[22:12] <elder> So run it out of my local tree and that's where it gets for tasks?
[22:12] <elder> goes for/gets the tasks
[22:12] <gregaf> yep
[22:12] <yehudasa> yeah
[22:12] <elder> Great
[22:12] <elder> Great
[22:13] <yehudasa> oh, man, I just need to set up a script to copy whatever greg says
[22:13] <yehudasa> will save me lots of typing
[22:13] <elder> Good idea.
[22:13] <elder> Or just say "what he said"
[22:13] <yehudasa> oh, good idea
[22:14] <gregaf> I'm reading a lot of email today; irc is more interesting ;)
[22:14] <yehudasa> what he said
[22:14] <elder> Nice.
[22:20] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[22:20] <elder> Workunits, on the other hand, you have to commit to the tree, is that right?
[22:20] <gregaf> yes
[22:20] <dmick> "that's what he said"
[22:20] <dmick> oh wait, wrong punchline
[22:20] <gregaf> just do it in a branch and run that branch
[22:21] <gregaf> (or specify a local Ceph repo to run from instead)
[22:21] <elder> Don't know how to run that branch for the ceph code.
[22:21] <dmick> gitbuilders build any pushed branch
[22:21] <dmick> and you can specify which branch to install in the task
[22:22] <gregaf> elder: I think it's documented in the ceph task docstring
[22:22] <elder> OK, I'll look there. Thanks.
[22:22] <gregaf> explanations for how to run a branch, run a local directory, etc
[22:22] <sagewk> firmware git update still failing all over the place
[22:23] <dmick> - ceph:
[22:23] <dmick> # path: /home/dmick/src/ceph/ceph
[22:23] <dmick> branch: v0.43
[22:23] <dmick> (I've used path or branch successfully)
[22:23] <dmick> sagewk: hm
[22:23] <elder> Just say the word and I can push updated firmware to all plana nodes that need it.
[22:23] <dmick> can I look at a log/error somewhere?
[22:24] <elder> As a stopgap
[22:24] <dmick> assuming you can talk to them :)
[22:24] <sagewk> i want to make this work correctly so we don't hit this later
[22:24] <dmick> which, with a 10g interface up now, might be easier
[22:24] <sagewk> plana08 hit it and is now offline
[22:24] <dmick> but..yeah. Can you point me to, or clip, the test log while I attempt to log in?
[22:24] <sagewk> 2012-04-10T12:52:51.596 DEBUG:teuthology.orchestra.run:Running: 'sudo git --git-dir=/lib/firmware/updates/.git config --get remote.origin.url >/dev/null || sudo git --git-dir=/lib/firmware/updates/.git remote add origin git://ceph.newdream.net/git/linux-firmware.git'
[22:24] <sagewk> 2012-04-10T12:52:51.620 DEBUG:teuthology.orchestra.run:Running: 'cd /lib/firmware/updates && sudo git pull origin master'
[22:24] <sagewk> 2012-04-10T12:52:51.745 INFO:teuthology.orchestra.run.err:fatal: Not a git repository (or any of the parent directories): .git
[22:25] <dmick> it's rebooting ATM (on serial console)
[22:25] <sagewk> the whole thing is at /a/next-2012-04-10_12:52:44/163/teuthology.log on teuthology vm
[22:25] <sagewk> k
[22:26] <sagewk> i forgot to push my change to do git fetch origin ; git reset --hard origin/master instead of git pull origin master.. but i don't think that is the problem ehre
[22:26] <dmick> 08?...is up, and has fw
[22:26] <sagewk> here
[22:26] <dmick> or well let me rephrase that
[22:27] <dmick> it didn't fail to open its bnx2, but /lib/firmware/updates isn't present. hm.
[22:27] <sagewk> ok, can trigger it by hand.
[22:29] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:41] * joao (~JL@ has joined #ceph
[22:49] <sagewk> wip-discard should be ready to merge into the other rbd stuff
[23:21] * Meths (rift@ has joined #ceph
[23:46] <yehudasa> gregaf: just pushed a trivial fix to master
[23:46] <gregaf> cool, thanks
[23:52] * aliguori (~anthony@nat-pool-3-rdu.redhat.com) Quit (Remote host closed the connection)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.