#ceph IRC Log


IRC Log for 2011-05-11

Timestamps are in GMT/BST.

[0:01] <joshd> yeah, 0.26 worked too
[0:03] <rageguy> [root@mpi-m2 1]# qemu-img convert -f rbd -O raw rbd:rbd/wtest1 wtest-0.img
[0:03] <rageguy> qemu-img: error while reading
[0:03] <rageguy> bah
[0:04] <rageguy> [root@mpi-m2 1]# rbd export wtest1 wtest-0.img
[0:04] <rageguy> writing 4096 bytes at ofs 0
[0:04] <rageguy> writing 2117632 bytes at ofs 4194304
[0:04] <rageguy> ...
[0:04] <rageguy> facepalm
[0:06] <rageguy> it looks like it reads first 4k and then fails
[0:07] <rageguy> okay, I'll try nonsparse one now
[0:07] <rageguy> what else can I try, eh
[0:08] <joshd> rageguy: can you turn on debugging with 'debug monc 10', 'debug objecter 20', and 'debug rados 20' in the global section of your ceph.conf
[0:08] <rageguy> sure
[0:09] <rageguy> where is it supposed to dump logs?
[0:09] <joshd> I think it depends on your 'log dir' setting
[0:09] <rageguy> qemu-img still prints the same error
[0:10] <joshd> if you just have 'log dir = ' it should print to stderr
[0:10] <rageguy> well it doesn't
[0:11] <rageguy> aha
[0:11] <rageguy> sorry typo
[0:11] <rageguy> now it does
[0:11] <joshd> cool
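The debug settings joshd walks rageguy through would look roughly like this in ceph.conf. This is a sketch based only on the conversation above (option names as joshd gives them, for a 2011-era Ceph); verify the exact syntax against your version's documentation:

```ini
; [global] section sketch per joshd's suggestion above
[global]
    debug monc = 10
    debug objecter = 20
    debug rados = 20
    ; an empty 'log dir' routes the debug output to stderr,
    ; as discussed in the exchange above
    log dir =
```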
[0:11] <Tv> sepia clocks being adjusted; please avoid running autotests for a while
[0:11] <Tv> they are way off, and this time in all kinds of directions
[0:12] <rageguy> I flooded your private chat with stuff
[0:13] <joshd> rageguy: I think it's faster to pastebin
[0:15] <Tv> gregaf: bleh the hardware clocks are all accurate to <1 second, that's not a valid explanation
[0:15] <gregaf> Tv: but the software clocks are off?
[0:15] <Tv> no wait, i'm misreading something
[0:15] <Tv> ok the hardware clocks are off
[0:15] <Tv> let's see, if i reset them now, and see what happens after a reboot
[0:19] <Tv> we've lost a bunch of sepia boxes again, and serial consoles are dead too
[0:19] <Tv> ops hate us :(
[0:49] <gregaf> Tv: where's autotest get its Ceph builds from? gitbuilder?
[0:49] <Tv> yup
[0:50] <gregaf> okay
[0:50] <Tv> you can pass in custom urls if you want your branch as gitbuilder builds it / your local tarball that has special magic in it
[0:50] <gregaf> ah, right
[0:51] <gregaf> I'm testing a possible band-aid locally now and will want to run it on the cluster shortly (I hope)
[0:51] <gregaf> but I still don't know how the client got into the different state to begin with :/
[0:52] <Tv> well clocks are synced on the surviving sepia machines, and an ntp install run is in progress
[0:52] <gregaf> oh, I'll wait until we can run it on the old code then
[0:53] <Tv> gregaf: start the job already, times are in sync
[0:53] <gregaf> *pushes buttons*
[0:54] <Tv> *bleep* *bloop*
[0:55] <Tv> or is it more like http://www.youtube.com/watch?v=1jjN-H62U64
[0:56] <gregaf> errr, hmm
[0:56] <gregaf> how long does it normally take for cfuse to mount?
[0:57] <gregaf> 15:55:42 DEBUG| Running '/usr/local/autotest/tests/download/ceph_newdream_net_8116_tarball_wip_cfuse_debug/ceph_dbench/usr/local/bin/cfuse -f -c /usr/local/autotest/tmp/tmpLHtcgS_ceph_dbench.cluster0/ceph.conf --name=client.0 /usr/local/autotest/tmp/tmpLHtcgS_ceph_dbench.cluster0/mnt.0'
[0:57] <gregaf> 15:55:42 DEBUG| cfuse not yet mounted, got fs type 'ext2/ext3'
[0:57] <gregaf> 15:55:47 DEBUG| cfuse not yet mounted, got fs type 'ext2/ext3'
[0:57] <gregaf> 15:55:52 DEBUG| cfuse not yet mounted, got fs type 'ext2/ext3'
[0:57] <gregaf> 15:55:57 DEBUG| cfuse not yet mounted, got fs type 'ext2/ext3'
[0:57] <gregaf> 15:56:02 DEBUG| cfuse not yet mounted, got fs type 'ext2/ext3'
[0:57] <gregaf> 15:56:07 DEBUG| cfuse not yet mounted, got fs type 'ext2/ext3'
[0:57] <gregaf> 15:56:12 DEBUG| cfuse not yet mounted, got fs type 'ext2/ext3'
[0:57] <gregaf> 15:56:17 DEBUG| cfuse not yet mounted, got fs type 'ext2/ext3'
[0:57] <gregaf> 15:56:22 DEBUG| cfuse not yet mounted, got fs type 'ext2/ext3'
[0:57] <gregaf> 15:56:27 DEBUG| cfuse not yet mounted, got fs type 'ext2/ext3'
[0:57] <gregaf> 15:56:32 DEBUG| cfuse not yet mounted, got fs type 'ext2/ext3'
[0:57] <gregaf> 15:56:37 DEBUG| cfuse not yet mounted, got fs type 'ext2/ext3'
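The autotest loop gregaf pastes above polls the mountpoint's filesystem type every five seconds until cfuse appears. A hypothetical stand-in for that check (not the real harness code, which compares fs-type strings like 'ext2/ext3') can use the classic device-number heuristic: a mounted directory sits on a different device than its parent.

```python
import os
import time

def is_mounted(mountpoint):
    """Heuristic mount check: a mountpoint lives on a different
    st_dev than its parent directory once something is mounted."""
    here = os.stat(mountpoint)
    parent = os.stat(os.path.join(mountpoint, ".."))
    return here.st_dev != parent.st_dev

def wait_for_mount(mountpoint, timeout=60.0, interval=5.0):
    """Poll like the autotest loop above: retry every `interval`
    seconds until the mount appears or `timeout` expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if is_mounted(mountpoint):
            return True
        time.sleep(interval)
    return False
```

With a five-second interval and no upper bound, a crashed cfuse (or a stuck MDS, as turns out to be the case below) makes this loop print "not yet mounted" forever, which is exactly the symptom in the log.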
[1:19] <Tv> gregaf: how long does it take for a ceph cluster to get healthy...
[1:19] <Tv> gregaf: sounds like you have a problem there
[1:20] <Tv> well, "ceph health" should be checked good before it even gets there
[1:20] <gregaf> yeah
[1:20] <Tv> but i've seen that a few times
[1:20] <Tv> often a daemon died just seconds after startup
[1:22] <gregaf> did the daemon logs get moved when you fixed it up to save them?
[1:22] <gregaf> they're not in those debug directories anymore that I can find :(
[1:22] <gregaf> although there's a "client.log" file that's just got autotest output
[1:23] <gregaf> …er, actually, that's probably because it's an old ceph-autotest branch
[1:23] <gregaf> let me try it with an updated one
[1:23] <gregaf> oh, nope, guess you didn't change the autotest repo to do that
[1:29] <gregaf> Tv: hmm, it's doing it again, can you take a look?
[1:29] <gregaf> I just don't know autotest or our control hacks well enough to start guessing
[1:29] <Tv> gregaf: yeah hold on
[1:29] <gregaf> http://autotest.ceph.newdream.net/afe/#tab_id=view_job&object_id=575
[1:29] <Tv> well at this point it's just three machines trying to use ceph..
[1:31] <Tv> well the cfuse log says
[1:31] <Tv> 2011-05-10 16:31:00.749548 7f9475bcf720 client4108 target mds0 not active, waiting for new mdsmap
[1:31] <Tv> i guess that's related
[1:31] <gregaf> where's the cfuse log located?
[1:31] <Tv> /usr/local/autotest/tmp/tmptCSkOO_ceph_dbench.cluster0/results/log/client.0.log
[1:32] <Tv> on sepia69
[1:32] <gregaf> oh, I thought there were links to it in the web interface
[1:32] <gregaf> did I make that up?
[1:32] <Tv> yeah but that's only updated every now and then
[1:32] <Tv> it runs rsync at certain points
[1:33] <Tv> there's another mechanism that tails the log files, but i haven't figured out how to plug ceph logs into it, or whether that's even possible..
[1:33] <gregaf> ah
[1:34] <gregaf> yep, mds crashed on a Journaler assert
[1:35] <gregaf> oh dammit, shouldn't have aborted until I looked at all the logs
[1:36] <Tv> hehe... it should collect them in the web ui though
[1:36] <gregaf> yeah
[1:36] <gregaf> the Journaler assert was on a response from the OSD
[1:37] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[1:53] * greglap (~Adium@ has joined #ceph
[1:53] <greglap> oh, I bet I know what broke
[1:53] <greglap> it's the pre-zero code and we just swapped the OSD to return ENOENT if you try to delete a non-existent object
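The breakage greglap suspects is a semantics change: deleting a non-existent object used to succeed silently, and now the OSD returns ENOENT, so any caller that treated delete as idempotent (like the pre-zero code) trips an assert. A toy illustration of the caller-side fix, with a made-up in-memory store standing in for the OSD backend (names here are hypothetical, not Ceph APIs):

```python
import errno

class MemStore:
    """Hypothetical in-memory stand-in for the OSD object store."""
    def __init__(self):
        self.objects = {}

    def remove(self, name):
        if name not in self.objects:
            raise KeyError(name)  # models the new ENOENT behaviour
        del self.objects[name]

def remove_object(store, name):
    """Caller that tolerates the new semantics: map a missing
    object to -ENOENT instead of asserting, so a redundant
    delete is harmless."""
    try:
        store.remove(name)
        return 0
    except KeyError:
        return -errno.ENOENT
```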
[2:01] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[2:05] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[2:05] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[2:35] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Read error: Operation timed out)
[2:41] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:55] * greglap (~Adium@ Quit (Read error: Connection reset by peer)
[3:11] * zwu (~root@ Quit (Ping timeout: 480 seconds)
[3:15] * cmccabe (~cmccabe@ has left #ceph
[3:17] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[3:23] * zwu (~root@ has joined #ceph
[3:35] * zwu (~root@ Quit (Ping timeout: 480 seconds)
[3:38] * zwu (~root@ has joined #ceph
[4:02] * zwu (~root@ Quit (Ping timeout: 480 seconds)
[4:06] * zwu (~root@ has joined #ceph
[4:19] * zwu (~root@ Quit (Ping timeout: 480 seconds)
[4:23] * zwu (~root@ has joined #ceph
[4:36] * zwu (~root@ Quit (Ping timeout: 480 seconds)
[4:49] * zwu (~root@ has joined #ceph
[5:00] * zwu (~root@ Quit (Ping timeout: 480 seconds)
[5:14] * zwu (~root@ has joined #ceph
[5:52] * zwu (~root@ Quit (Ping timeout: 480 seconds)
[6:14] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[6:30] * zwu (~root@ has joined #ceph
[6:47] * zwu (~root@ Quit (Ping timeout: 480 seconds)
[6:53] * zwu (~root@ has joined #ceph
[7:15] * zwu (~root@ Quit (Ping timeout: 480 seconds)
[7:46] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: zzZZZZzz)
[7:52] * zwu (~root@ has joined #ceph
[8:14] * zwu (~root@ Quit (Ping timeout: 480 seconds)
[8:28] * zwu (~root@ has joined #ceph
[8:57] * zwu (~root@ Quit (Ping timeout: 480 seconds)
[9:18] * yehuda_hm (~yehuda@bzq-79-182-117-140.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[9:40] * zwu (~root@ has joined #ceph
[9:40] * zwu (~root@ Quit ()
[13:10] * yehuda_hm (~yehuda@nesher4.haifa.il.ibm.com) has joined #ceph
[13:17] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:36] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: Leaving)
[15:15] * MK_FG (~MK_FG@ Quit (Quit: o//)
[15:26] * MK_FG (~MK_FG@ has joined #ceph
[15:33] * yehuda_hm (~yehuda@nesher4.haifa.il.ibm.com) Quit (Ping timeout: 480 seconds)
[15:47] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[16:47] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[16:51] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[16:53] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[16:55] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[17:12] * yehuda_hm (~yehuda@bzq-79-182-117-140.red.bezeqint.net) has joined #ceph
[17:36] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[17:39] * lxo (~aoliva@ Quit (Quit: later)
[17:39] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:57] * lxo (~aoliva@ has joined #ceph
[18:00] * greglap1 (~Adium@ has joined #ceph
[18:03] <greglap1> sagewk, all: running late today, be in to the office at ~11:30 (online until then)
[18:10] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[18:11] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[18:26] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:59] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:01] * aliguori (~anthony@ has joined #ceph
[19:09] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[19:38] <sagewk> let's meet in here
[19:41] <bchrisman> here where? :)
[19:41] <rageguy> ---> here ?
[19:41] <rageguy> <--- or there?
[19:41] <rageguy> :)
[19:42] <bchrisman> these guys paid pretty well without finagling
[19:42] <bchrisman> so I didn't go looking around
[20:06] <sagewk> btw we finally have a clean gitbuilder! http://ceph.newdream.net/gitbuilder/
[20:06] <Tv> ooh
[20:06] <Tv> it's so.. pale green
[20:06] <sagewk> :)
[20:07] * greglap1 (~Adium@ Quit (Quit: Leaving.)
[20:08] <bchrisman> looks boring with just green :)
[20:12] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[20:15] * aliguori (~anthony@ has joined #ceph
[20:28] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[21:13] * aliguori (~anthony@ Quit (Quit: Ex-Chat)
[21:13] * yehuda_hm (~yehuda@bzq-79-182-117-140.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[21:24] * yehuda_hm (~yehuda@bzq-79-178-112-50.red.bezeqint.net) has joined #ceph
[21:30] * lxo (~aoliva@ Quit (Read error: Connection reset by peer)
[21:32] * lxo (~aoliva@ has joined #ceph
[21:33] * MK_FG (~MK_FG@ Quit (Ping timeout: 480 seconds)
[21:35] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[22:17] <Tv> 2011-05-11 13:16:46.143455 log 2011-05-11 13:15:55.138793 mon0 24 : [WRN] message from mon1 was stamped 0.031635s in the future, clocks not synchronized
[22:17] <Tv> gregaf: even ntp-synced, there's drift
[22:17] <gregaf> HA! it hung!
[22:18] <gregaf> it just took 4 minutes instead of 4 seconds...
[22:18] <Tv> and i get this
[22:18] <Tv> 2011-05-11 13:18:18.763055 mon0 -> 'HEALTH_WARN 18 pgs degraded, 21/42 degraded (50.000%)' (0)
[22:18] <Tv> i wonder if that's about timing too
[22:18] <Tv> because this run doesn't look at all special otherwise
[22:22] <gregaf> Tv: that's what happens if you only have one OSD
[22:22] <gregaf> ceph health didn't previously warn on degradation but that got added last week, remember?
[22:25] <yehuda_hm> sagewk: did you need anything?
[22:25] <sagewk> oh yeah i had a question..
[22:26] <sagewk> can you remember what the commit_op_seq fast-forward block is for in FileStore::mount()? line 1675 or so
[22:26] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[22:27] <sagewk> AFAICS the rollback code above will swap _some_ consistent subvol onto current/, and a few lines down we read in that seq value. i'm not sure why we would want to warp the seq number forward again...
[22:27] <yehuda_hm> hmm.. no idea
[22:28] <yehuda_hm> did I do that?
[22:28] <sagewk> yeah
[22:28] <sagewk> it was part of the original patch that added the use_stale_snap option
[22:28] <sagewk> c1d078160a454c92fea899659d506e0b0ab7d92b
[22:32] <Tv> gregaf: well my tests have completed just fine several times today...
[22:33] <gregaf> Tv: I added an option to introduce clock skew to try and reproduce it locally, although I've only managed to break it once
[22:33] <gregaf> which is odd since there haven't been any patches pushed that should impact it
[22:34] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[22:34] <Tv> gregaf: perhaps only some odd codepath breaks..
[22:34] <gregaf> yeah, but there shouldn't be anything different about the runs at this point
[22:35] <gregaf> and previously it would break after ~4 seconds on sepia, right?
[22:35] <Tv> gregaf: i mean something stochastic, from e.g. timing, sometimes goes via path A, sometimes path B
[22:35] <Tv> gregaf: i don't know/remember how fast it failed
[22:35] <Tv> mostly i start a job and come back to it minutes later
[22:35] <yehuda_hm> sagewk: I think warping current is meaningless there
[22:35] <gregaf> hmm, maybe it takes longer than I remember it taking
[22:35] <sagewk> yehuda_hm: ok cool, i'll take it out.
[22:35] <Tv> gregaf: old logs might have that answer
[22:35] <yehuda_hm> probably just wanted to set to the higher number
[22:36] <yehuda_hm> just in case
[22:36] <sagewk> the only thing i can think of is possible journal interaction, but that code will only apply events if they match perfectly... and in fact warping the seq could mean applying updates that shouldn't be
[22:37] <Tv> gregaf: the degraded thing... job 587 ran dbench against a single osd, the cluster was perfectly healthy... :-/
[22:37] <Tv> 05/11 10:19:15 DEBUG| ceph:0033| Ceph health: HEALTH_OK
[22:37] <gregaf> well, that's odd, maybe the degraded detection is borked?
[22:37] <gregaf> or 587 didn't actually write anything, but I don't think that's possible...
[22:38] <Tv> oh it might have had actual dbench commented out
[22:38] <Tv> but it'd still mount everything etc
[22:39] <Tv> i think it goes something like this
[22:39] <Tv> 0 osds is fine for the first time, as there's no data
[22:40] <Tv> if you kill and osd, it comes back up but cluster stays degraded
[22:40] <Tv> err i mean not "0 osds" but "less than replication count osds"
[22:41] <Tv> i think i meant to put code somewhere to autoadjust replication count to be num of osds in test cluster, but forgot how to do it ;)
[22:42] <gregaf> well it reports the cluster's degraded if the replication level is higher than the number of up+in osds, but there might be some triggers first
[22:43] <gregaf> cmccabe probably knows, I think he wrote the health stuff
[22:43] <Tv> it sounds like the logic is wrong when there's no data, or something like that
[22:43] <sagewk> iirc it reports pgs with degraded flag set
[22:43] <sagewk> or if up or in is less than num_osds
[22:43] <cmccabe> gregaf: as far as the osd health stuff, it's mostly checking PG flags
[22:44] <cmccabe> gregaf: I think it might warn about degraded objects too, let me check
[22:44] <sagewk> it does now, yeah
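Pulling together what gregaf, sagewk, and cmccabe describe: health goes to WARN when any PG carries a degraded flag, or when fewer OSDs are up+in than the replication level needs, which is why a single-OSD test cluster with 2x replication reports 50% degraded. A minimal sketch of that decision logic (not the real mon code, whose names and triggers differ):

```python
def cluster_health(num_up_in_osds, replication_size, num_degraded_pgs):
    """Sketch of the health check discussed above: degraded PGs,
    or too few up+in OSDs to satisfy replication, mean HEALTH_WARN."""
    if num_degraded_pgs > 0:
        return "HEALTH_WARN"
    if num_up_in_osds < replication_size:
        return "HEALTH_WARN"
    return "HEALTH_OK"
```

This also captures Tv's workaround idea: a test cluster with one OSD only reports healthy if the replication level is lowered to match the OSD count.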
[22:56] <Tv> hrrrmmph another cfuse hanging before the goes through
[22:56] <gregaf> before the what?
[22:57] <Tv> after starting cfuse, before the actual mount happens
[22:57] <Tv> = talking to ceph etc
[22:57] <gregaf> that probably means the MDS hasn't started up all the way
[22:57] <Tv> ah i see i dropped the word "mount" from the original
[22:57] <Tv> that's a guaranteed sign of me getting tired
[22:58] <Tv> 2011-05-11 13:58:10.385758 mds e4: 1/1/1 up {0=0=up:creating}
[22:58] <Tv> what's "creating" there?
[22:58] <gregaf> part of the startup routine
[22:58] <gregaf> if it's sitting there for very long it either means the MDS crashed and hasn't been marked laggy yet, or the OSDs aren't responding properly to the MDS
[22:59] <Tv> the process is alive
[22:59] <Tv> the osd killer did manage to restart at least one of the osds already
[22:59] <gregaf> oh, you're doing that thing now?
[22:59] <gregaf> heh
[22:59] <Tv> 2011-05-11 13:59:39.995629 osd e3: 2 osds: 2 up, 2 in
[22:59] <gregaf> are all the PGs active?
[22:59] <Tv> it seems mds doesn't recover from outages?
[22:59] <Tv> 2011-05-11 13:59:39.995407 pg v9: 36 pgs: 36 creating; 0 KB data, 10352 MB used, 58005 MB / 72016 MB avail
[23:00] <Tv> "creating" again
[23:00] <gregaf> yeah
[23:00] <Tv> i don't really know what all this means
[23:00] <gregaf> the PGs are in the creating state
[23:00] <gregaf> which is…oh, I don't remember
[23:00] <Tv> but why aren't they making progress from there on
[23:01] <gregaf> some kind of bug
[23:01] <gregaf> two kinds are possible:
[23:01] <Tv> oh well, i guess i can declare the osd killer test to be good enough ;)
[23:01] <gregaf> 1) timing is just not working out and due to the load and everything none of them are finishing before one of the OSDs get killed, at which point they have to start again
[23:01] <gregaf> 2) there's a bug in the state machine that you've uncovered
[23:02] <Tv> oh the killing stopped already
[23:03] <Tv> (due to my screwup, the test stopped killing osds almost as soon as it started; already moved that trigger later)
[23:03] <Tv> oh well, rerun
[23:04] <Tv> but this is what i mean when i say that external QA doesn't work very well; if i don't know the degraded etc behavior, i can't figure out any of the problems until i stop and learn it
[23:04] <Tv> which is why *everyone* needs to create tests for the things they've worked on
[23:07] <cmccabe> tv: that's really not a very good argument
[23:07] <cmccabe> tv: obvious flaw #1: most organizations have more than 1 person working on QA
[23:07] <cmccabe> tv: and that allows some specialization to occur there as well
[23:08] <Tv> like, some guy setting up the autotest server and writing a framework for the tests?
[23:08] <Tv> or wait, i guess you mean mirror developer knowledge in the qa department
[23:09] <Tv> sure, but if you really have people with those skills, they start to count as developers in my book
[23:09] <cmccabe> at Laurel Networks, if QA had to ask the developer why it was behaving in a certain way, that was already halfway to filing a bug
[23:09] <cmccabe> in some cases that was all the way. confusing behaviors are equivalent to bugs in end users' eyes.
[23:11] <cmccabe> web developers often don't have QA as a separate department, but that's because their requirements are different
[23:12] <cmccabe> quality isn't as important in a website as it is in many other products. Bugfixes can be rolled out instantly.
[23:13] <cmccabe> it's easy to test websites and developers are often actually "customers"
[23:13] <cmccabe> as in, facebook developers probably use facebook even when not at work
[23:23] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[23:27] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Read error: Connection reset by peer)
[23:28] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[23:31] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[23:38] <Tv> i think i just raced against the fact that cwd is a global in POSIX :(
[23:38] <Tv> *sigh*
[23:39] * Tv shakes fist at autotest
[23:39] <cmccabe> race against the machine?
[23:40] <Tv> in this case, rage is appropriate too
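The race Tv hit comes from POSIX making the current working directory per-process, not per-thread: one thread's chdir() instantly changes what every other thread sees. A small demonstration (hypothetical, just illustrating the semantics Tv mentions):

```python
import os
import tempfile
import threading

def cwd_is_process_global():
    """Show that a chdir() in the main thread is visible to a
    concurrently running worker thread: the cwd is one global
    per process under POSIX."""
    a, b = tempfile.mkdtemp(), tempfile.mkdtemp()
    seen = {}
    go = threading.Event()

    def worker():
        go.wait()                    # run after main has chdir'd
        seen["cwd"] = os.getcwd()    # observes main's chdir

    t = threading.Thread(target=worker)
    t.start()
    old = os.getcwd()
    try:
        os.chdir(a)
        os.chdir(b)                  # last chdir wins, process-wide
        go.set()
        t.join()
    finally:
        os.chdir(old)                # restore for the caller
    return os.path.realpath(seen["cwd"]) == os.path.realpath(b)
```

In a test harness that runs jobs in threads, two jobs interleaving chdir() calls like this will each see the other's directory, which is the kind of race being cursed at above.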

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.