#ceph IRC Log

IRC Log for 2011-06-21

Timestamps are in GMT/BST.

[0:04] * aliguori (~anthony@32.97.110.65) Quit (Quit: Ex-Chat)
[0:04] * aliguori (~anthony@32.97.110.65) has joined #ceph
[0:07] * aliguori (~anthony@32.97.110.65) Quit ()
[0:13] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[0:20] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[0:33] * swendel (~swendel@p578D18A2.dip.t-dialin.net) Quit (Quit: Lost terminal)
[0:43] * lx0 (~aoliva@9YYAABKDG.tor-irc.dnsbl.oftc.net) Quit (Read error: Connection reset by peer)
[0:44] * lx0 (~aoliva@09GAAE0RP.tor-irc.dnsbl.oftc.net) has joined #ceph
[1:00] <yehudasa> cmccabe: did you push -f?
[1:01] <gregaf1> by which he means: don't push -f if your config is set up to push all branches
[1:01] <gregaf1> <-- learned this the hard way in a secondary repo without his normal git settings
[1:04] <cmccabe> hmm...
[1:05] <cmccabe> I think maybe I did that from flab?
[1:05] <cmccabe> sigh
[1:05] <cmccabe> yehudasa: yeah, I can confirm that my repo on flab, which I had to use because metropolis was down, did not have push = HEAD in the .git/config
[1:06] <cmccabe> I think maybe I just need to start putting the branch name in the git push command...
[1:06] <gregaf1> only lost one commit, not a big deal :)
[1:06] <cmccabe> relying on that thing being set in the config seems like a slender thread to hang on
[1:07] <cmccabe> sorry about that anyway
[1:07] <gregaf1> I think I erased a few merges last time
[1:07] <yehudasa> cmccabe: yes.. or at least when you push -f look at the command output
[1:07] <yehudasa> you erased a couple of commits I pushed today
[1:07] <gregaf1> if you're cleverer than I am and use git to version your homedir you can put it in your global prefs
[1:08] <cmccabe> well, you can put it in your global prefs anyway I think
[1:08] * allsystemsarego (~allsystem@188.27.164.204) Quit (Quit: Leaving)
[1:10] <cmccabe> hmm
[1:10] <cmccabe> I don't think you can set this in the .gitconfig
[1:10] <cmccabe> it has to be in the project config
[1:16] <Tv> gregaf1: don't ever -f, say +branchname instead
[1:17] <gregaf1> cmccabe: you can, I have it in mine and I'm pretty sure it works
[1:20] <cmccabe> tv: I like the + idea, I wasn't aware of that
[1:21] <Tv> chopping off a toe is more pleasant than losing the whole foot
[1:21] * verwilst (~verwilst@dD576F64D.access.telenet.be) Quit (Quit: Ex-Chat)
[1:21] <Tv> (new mnemonic: -f as in foot)
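The "+branchname" suggestion above can be sketched as follows. This is a minimal illustration in throwaway repositories, not the setup actually used on flab or metropolis: the paths and the branch names `topic` and `other` are hypothetical. `push.default` is one standard git setting that can also be set globally; whether it is the setting gregaf1 keeps in his global prefs is an assumption. The `push = HEAD` line discussed above lives in the `[remote "origin"]` section of the per-project `.git/config`.

```shell
#!/bin/sh
# Sketch: force-push a single branch with a "+" refspec instead of "push -f".
# All repository paths and branch names below are hypothetical.
set -eu

work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/clone"
cd "$work/clone"
git config user.name "Demo User"
git config user.email "demo@example.invalid"
git remote add origin "$work/remote.git"

# Per-repository form of "git config --global push.default current":
# a bare "git push" then only pushes the branch that is checked out.
git config push.default current

echo one > file.txt
git add file.txt
git commit -q -m "first"
git branch topic
git branch other
git push -q origin topic other

# Rewrite history on both branches locally.
git checkout -q topic
git commit -q --amend -m "first, rewritten on topic"
git checkout -q other
git commit -q --amend -m "first, rewritten on other"

# Force only "topic"; "other" is not touched on the remote.
git push -q origin +topic

local_topic=$(git rev-parse topic)
local_other=$(git rev-parse other)
remote_topic=$(git --git-dir="$work/remote.git" rev-parse refs/heads/topic)
remote_other=$(git --git-dir="$work/remote.git" rev-parse refs/heads/other)

[ "$remote_topic" = "$local_topic" ] && echo "topic: force-updated on the remote"
[ "$remote_other" != "$local_other" ] && echo "other: left alone on the remote"
```

With `push -f` and a config that pushes all matching branches, both rewritten branches would have been forced; the `+` prefix limits the damage to the one refspec that was named.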
[2:10] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:38] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Remote host closed the connection)
[2:43] * greglap (~Adium@166.205.136.123) has joined #ceph
[2:44] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[2:47] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[3:11] * greglap (~Adium@166.205.136.123) Quit (Read error: Connection reset by peer)
[3:14] * cmccabe (~cmccabe@208.80.64.174) has left #ceph
[3:38] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[3:42] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[3:54] * lx0 (~aoliva@09GAAE0RP.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[3:54] * lx0 (~aoliva@1GLAACCZN.tor-irc.dnsbl.oftc.net) has joined #ceph
[4:17] * lx0 (~aoliva@1GLAACCZN.tor-irc.dnsbl.oftc.net) Quit (Read error: Connection reset by peer)
[4:18] * lx0 (~aoliva@82VAAB498.tor-irc.dnsbl.oftc.net) has joined #ceph
[5:58] * ork (~user@125.33.176.195) has joined #ceph
[6:26] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[6:27] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[6:28] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit ()
[6:40] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[7:19] * ork` (~user@123.116.123.137) has joined #ceph
[7:19] * ork (~user@125.33.176.195) Quit (Read error: Connection reset by peer)
[7:37] * ork` (~user@123.116.123.137) Quit (Remote host closed the connection)
[12:45] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[13:03] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has left #ceph
[13:04] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[13:58] * sugoruyo (~george@athedsl-408632.home.otenet.gr) has joined #ceph
[14:01] <sugoruyo> hey folks, can someone help me figure out why my MDSs are crashing?
[14:06] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) has joined #ceph
[14:11] <sugoruyo> they seem to be in recovery but, when they reach rejoin, it says laggy or crashed next to them
[14:13] <sugoruyo> and the processes die. i'm also noticing that after a week of simply sitting there mounted by a client that wouldn't write anything to it, the mds cluster is at e6403 and increasing about once every two minutes
[14:13] <sugoruyo> ceph -w just keeps churning out these:
[14:13] <sugoruyo> 2011-06-21 15:13:14.737460 mds e6407: 3/3/1 up {0=2=up:rejoin(laggy or crashed),1=1=up:rejoin(laggy or crashed),2=0=up:rejoin(laggy or crashed)}
[14:43] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: Leaving)
[15:36] * aliguori (~anthony@32.97.110.65) has joined #ceph
[15:51] * nolan (~nolan@phong.sigbus.net) Quit (Ping timeout: 480 seconds)
[17:50] * greglap (~Adium@166.205.143.183) has joined #ceph
[18:12] <sagewk> http://www.storagebod.com/wordpress/?p=699
[18:13] <yehudasa> sugoruyo: what does your mds log say?
[18:14] <yehudasa> do you have a core dump?
[18:40] * greglap (~Adium@166.205.143.183) Quit (Read error: Connection reset by peer)
[18:45] <sugoruyo> yehudasa: i was afk, since this is a test system i reran mkcephfs, deleted the old logs and everything
[18:46] <sugoruyo> i don't know how to obtain a core dump, i might have some output from the mds logs though, the last few lines looked like a stack trace i think
[18:46] <yehudasa> sugoruyo: yeah.. the stack trace might be interesting
[18:47] <yehudasa> also, what version are you running?
[18:47] <sugoruyo> well before re-mkcephfs'ing i ran latest in the ubuntu repos - 1 update
[18:47] <sugoruyo> currently i'm running the latest in the repos
[18:48] <sugoruyo> trace is about 22 lines, pastie?
[18:48] <sugoruyo> http://pastie.org/2102249
[18:48] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:52] * cmccabe (~cmccabe@208.80.64.174) has joined #ceph
[19:03] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:04] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit ()
[19:07] <Tv> sagewk, *: new repo ceph-qa-suite.git, new cli tool teuthology-suite, use like this: mkdir z; teuthology-suite --archive-dir=z --suite=.../ceph-qa-suite.git/ my-sepia-machines.yaml
[19:08] <sagewk> yay!
[19:09] <yehudasa> Tv: I assume these instructions are in some README?
[19:10] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:10] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit ()
[19:10] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:10] <Tv> yehudasa: not.. really..
[19:10] <Tv> the code *just* started working
[19:11] <yehudasa> Tv: oh, ok
[19:35] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[20:04] <wido> I removed my "latest" file on my monitor, my 'find' deleted too much stuff....
[20:04] <wido> It's not a monitor map, but what is it?
[20:09] <yehudasa> cmccabe: commit f6c7343f7581b3fcfe1d773ca3b3c997fd883a4d broke AuthNone
[20:10] <cmccabe> yehudasa: what seems to be the problem
[20:10] <cmccabe> yehudasa: vstart without auth worked fine for me, for what that's worth
[20:10] <yehudasa> vstart works for me, but other stuff doesn't
[20:11] <yehudasa> e.g., rados -p foo ls
[20:11] <yehudasa> specifically you removed the encoding of the entity name
[20:11] <cmccabe> perhaps the change to AuthNoneAuthorizer::build_authorizer is to blame
[20:12] <cmccabe> where is the decode that corresponds to this encoding?
[20:12] <yehudasa> you can't just change that protocol
[20:14] <cmccabe> we can restore the name there if you like. The cephx build_authorizer doesn't encode the name, so I was hoping I didn't need it there
[20:15] <yehudasa> well.. we need to get the entity name there
[20:15] <yehudasa> with cephx we don't need it because the tickets hold it
[20:16] <yehudasa> but with auth=none we do need it because there are no tickets
[20:16] <cmccabe> yeah, I'm looking at CephXTicketHandler::build_authorizer now
[20:23] <cmccabe> yehudasa: 9e9cec694ac89c1c2ed162bad68ce5da362cc3b3 should fix it
[20:23] <cmccabe> yehudasa: I don't know what I was thinking there. I think I put an entry on my TODO list to track down whether we needed that entity name, but then I forgot to do it
[20:23] <yehudasa> ok, thanks
[20:34] <stingray> yehudasa: have you seen my bug about rbd offsets mismatch?
[20:34] <yehudasa> stingray: I pushed a fix yesterday, do you still see it?
[20:34] <stingray> yehudasa: I haven't recompiled anything
[20:34] <stingray> did you push it to stable?
[20:34] <yehudasa> stingray: it requires updating both osds and librados.. it was pushed to master
[20:35] <yehudasa> we can cherry-pick it to stable
[20:35] <stingray> I'm tracking stable
[20:35] <stingray> I guess I can cherrypick it myself
[20:35] <stingray> maybe
[20:36] <yehudasa> stingray: we'll send it to stable
[20:36] <stingray> shall I or will you?
[20:36] <stingray> aha
[20:36] <stingray> great
[20:37] <stingray> yehudasa: thanks!
[20:37] <yehudasa> stingray: ok, it's pushed now, let me know if it works for you.. it requires updating both sides
[20:38] <stingray> that mismatch stuff, it only affected rbd, right?
[20:38] <yehudasa> yeah
[20:38] <yehudasa> only rbd was using it.. it would have affected other stuff though because there was also an osd bug
[20:39] <stingray> okay
[20:39] <stingray> I am kicking off my rebuilds
[20:39] <stingray> will see if it helped in 1 hour
[20:39] <yehudasa> great
[20:40] * sugoruyo (~george@athedsl-408632.home.otenet.gr) Quit (Ping timeout: 480 seconds)
[20:43] * sugoruyo (~george@athedsl-408632.home.otenet.gr) has joined #ceph
[20:44] <cmccabe> ah, looks like that dirfrag stuff was the last obstacle to building libcommon without globals
[21:03] * gregorg_taf (~Greg@78.155.152.6) has joined #ceph
[21:05] * alexxy[home] (~alexxy@79.173.81.171) has joined #ceph
[21:06] <cmccabe> is gitbuilder down?
[21:08] * wido_ (~wido@fubar.widodh.nl) has joined #ceph
[21:08] * Yulya_ (~Yu1ya_@ip-95-220-242-20.bb.netbynet.ru) has joined #ceph
[21:08] * df__ (davidf@dog.thdo.woaf.net) has joined #ceph
[21:08] * murb_ (~murb@red.danu.be) has joined #ceph
[21:11] * Yulya__ (~Yu1ya_@ip-95-220-242-20.bb.netbynet.ru) Quit (reticulum.oftc.net magnet.oftc.net)
[21:11] * alexxy (~alexxy@79.173.81.171) Quit (reticulum.oftc.net magnet.oftc.net)
[21:11] * jbd (~jbd@ks305592.kimsufi.com) Quit (reticulum.oftc.net magnet.oftc.net)
[21:11] * wido (~wido@fubar.widodh.nl) Quit (reticulum.oftc.net magnet.oftc.net)
[21:11] * `gregorg` (~Greg@78.155.152.6) Quit (reticulum.oftc.net magnet.oftc.net)
[21:11] * df (davidf@dog.thdo.woaf.net) Quit (reticulum.oftc.net magnet.oftc.net)
[21:11] * andret (~andre@pcandre.nine.ch) Quit (reticulum.oftc.net magnet.oftc.net)
[21:11] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (reticulum.oftc.net magnet.oftc.net)
[21:11] * murb (~murb@red.danu.be) Quit (reticulum.oftc.net magnet.oftc.net)
[21:18] * andret (~andre@pcandre.nine.ch) has joined #ceph
[21:19] <cmccabe> never mind, it's responding again
[21:19] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[21:28] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) Quit (Remote host closed the connection)
[21:43] * darkfader (~floh@188.40.175.2) has joined #ceph
[21:43] <stingray> yehudasa: didn't seem to help
[21:43] <stingray> but there's a chance I messed up the update
[21:43] * murb_ is now known as murb
[21:43] <stingray> [root@mpi-m2 t]# rbd --version
[21:44] <stingray> ceph version 0.29.1-8-gc48540a (commit:c48540aec6107199cc6585ec968682f43ed8c050)
[21:44] <stingray> osds are at the same ver
[21:45] <yehudasa> stingray: that's the version
[21:46] * lidongyang_ (~lidongyan@222.126.194.154) has joined #ceph
[21:46] <stingray> yep
[21:46] <stingray> still export doesn't match import
[21:47] <yehudasa> stingray: can you send the log?
[21:47] <stingray> I didn't look at offsets yet, just contents
[21:47] <stingray> I am grabbing the offsets now
[21:48] * darkfaded (~floh@188.40.175.2) Quit (Ping timeout: 480 seconds)
[21:48] <yehudasa> also, if you could compile and run http://pastebin.com/52SdBjUt on the source object it could help
[21:49] * lidongyang (~lidongyan@222.126.194.154) Quit (Ping timeout: 480 seconds)
[21:51] <stingray> sure I can but I doubt it's the source copy
[21:52] <yehudasa> stingray: the source object is a sparse file, and apparently its specific structure triggers a bug
[21:52] <stingray> http://pastebin.com/qAsQkW8n
[21:53] <stingray> first few lines in import log and export log
[21:53] <stingray> second extent offsets match
[21:53] <stingray> third doesn't. reading 4096 bytes at offset 1331200 vs writing 4096 bytes at ofs 2367488
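The symptom described above (an extent read at one offset but written at another) can be reproduced on ordinary sparse files. This is only a rough sketch using `dd`, not rbd or librados code; the file names and the 4 MiB image size are hypothetical, and only the two offsets (1331200 and 2367488, i.e. 4096-byte blocks 325 and 578) come from the log.

```shell
#!/bin/sh
# Sketch: copying one extent of a sparse file to the right and to the wrong offset.
# File names and the 4 MiB size are hypothetical; the offsets are from the log.
set -eu

work=$(mktemp -d)
src="$work/src.img"
good="$work/good.img"
bad="$work/bad.img"

# Create three 4 MiB files consisting only of a hole.
for f in "$src" "$good" "$bad"; do
    dd if=/dev/zero of="$f" bs=1 count=0 seek=4194304 2>/dev/null
done

# Put one 4096-byte block of random data at byte offset 1331200 (block 325).
head -c 4096 /dev/urandom > "$work/block"
dd if="$work/block" of="$src" bs=4096 seek=325 conv=notrunc 2>/dev/null

# Correct copy: read block 325, write block 325.
dd if="$src" of="$good" bs=4096 skip=325 seek=325 count=1 conv=notrunc 2>/dev/null

# Mangled copy: read block 325, write at byte offset 2367488 (block 578).
dd if="$src" of="$bad" bs=4096 skip=325 seek=578 count=1 conv=notrunc 2>/dev/null

cmp -s "$src" "$good" && good_result=match || good_result=mismatch
cmp -s "$src" "$bad" && bad_result=match || bad_result=mismatch

echo "same offsets:      $good_result"
echo "different offsets: $bad_result"
```

All three files have the same length, so a size check alone would not catch the mangled copy; only a content comparison does.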
[21:55] <yehudasa> hmm.. had a similar issue yesterday, I thought I fixed it
[21:57] <stingray> doesn't look fixed :)
[21:57] <stingray> so, do you still need fiemap?
[21:57] <yehudasa> hmm.. at the moment I can do without it I think
[21:58] <stingray> http://pastebin.com/82yxFKye anyway
[21:59] <yehudasa> cool, thanks
[21:59] <stingray> will go home, will be great if you ping me when you update stable again :)
[21:59] <stingray> so far this one prevents me from running qemu-kvm
[22:00] <stingray> I ported your async changes to my kvm, and it seems to be working fast and stable except for the mangled data
[22:01] <yehudasa> great
[22:05] * aliguori (~anthony@32.97.110.65) Quit (Ping timeout: 480 seconds)
[22:07] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[22:08] <sagelap> let's meet at 2 for the planning mtg
[22:14] * aliguori (~anthony@32.97.110.64) has joined #ceph
[22:14] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) has joined #ceph
[22:24] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[22:28] * aliguori (~anthony@32.97.110.64) Quit (Ping timeout: 480 seconds)
[22:37] * aliguori (~anthony@32.97.110.64) has joined #ceph
[22:48] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) Quit (Quit: Ex-Chat)
[23:23] * lxo (~aoliva@186.214.51.231) has joined #ceph
[23:30] * lx0 (~aoliva@82VAAB498.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[23:31] * jmlowe (~Adium@217.5.145.235) has joined #ceph
[23:32] <jmlowe> quick question, is there going to be a 0.30 release?
[23:48] * aliguori (~anthony@32.97.110.64) Quit (Quit: Ex-Chat)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.