#ceph IRC Log


IRC Log for 2010-10-08

Timestamps are in GMT/BST.

[1:26] * Guest1671 (quasselcor@bas11-montreal02-1128536392.dsl.bell.ca) Quit (Remote host closed the connection)
[1:32] * bbigras (quasselcor@bas11-montreal02-1128536392.dsl.bell.ca) has joined #ceph
[1:33] * bbigras is now known as Guest2078
[6:18] * andret (~andre@pcandre.nine.ch) Quit (Remote host closed the connection)
[6:53] * f4m8_ is now known as f4m8
[8:09] * allsystemsarego (~allsystem@ has joined #ceph
[8:24] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:43] * atgeek (~atg@please.dont.hacktheinter.net) Quit (Remote host closed the connection)
[8:43] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[8:43] * MarkN (~nathan@mail.zomojo.com) Quit (Ping timeout: 480 seconds)
[8:46] * MarkN (~nathan@ has joined #ceph
[8:46] * andret (~andre@pcandre.nine.ch) has joined #ceph
[8:48] * cclien (~cclien@60-250-103-120.HINET-IP.hinet.net) Quit (Remote host closed the connection)
[8:48] * cclien (~cclien@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[9:14] * atg (~atg@please.dont.hacktheinter.net) Quit (Remote host closed the connection)
[9:14] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[9:23] * LW (~jkreger@rrcs-98-101-117-50.midsouth.biz.rr.com) Quit (Remote host closed the connection)
[9:23] * LW (~jkreger@rrcs-98-101-117-50.midsouth.biz.rr.com) has joined #ceph
[9:40] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[10:28] * Yoric (~David@ has joined #ceph
[11:20] * MarkN (~nathan@ Quit (synthon.oftc.net weber.oftc.net)
[11:20] * allsystemsarego (~allsystem@ Quit (synthon.oftc.net weber.oftc.net)
[11:20] * conner (~conner@leo.tuc.noao.edu) Quit (synthon.oftc.net weber.oftc.net)
[11:20] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) Quit (synthon.oftc.net weber.oftc.net)
[11:20] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) Quit (synthon.oftc.net weber.oftc.net)
[11:24] * Yoric_ (~David@ has joined #ceph
[11:24] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[11:24] * Yoric_ is now known as Yoric
[11:30] * MarkN (~nathan@ has joined #ceph
[11:35] * conner (~conner@leo.tuc.noao.edu) has joined #ceph
[11:35] * allsystemsarego (~allsystem@ has joined #ceph
[11:35] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[11:36] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[12:27] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[12:40] * Guest2078 (quasselcor@bas11-montreal02-1128536392.dsl.bell.ca) Quit (Remote host closed the connection)
[12:42] * bbigras (quasselcor@bas11-montreal02-1128536392.dsl.bell.ca) has joined #ceph
[12:43] * bbigras is now known as Guest2111
[14:52] * morse (~morse@supercomputing.univpm.it) Quit (Quit: Bye, see you soon)
[14:52] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[14:58] * morse (~morse@supercomputing.univpm.it) Quit (Quit: Bye, see you soon)
[14:58] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[15:16] * sentinel_e86_ (~sentinel_@ has joined #ceph
[15:16] * sentinel_e86 (~sentinel_@ Quit (Quit: sh** happened)
[15:18] * sentinel_e86_ (~sentinel_@ Quit ()
[15:19] * sentinel_e86 (~sentinel_@ has joined #ceph
[16:17] * MarkN (~nathan@ Quit (Ping timeout: 480 seconds)
[17:32] * f4m8 is now known as f4m8_
[18:39] * Yoric (~David@ Quit (Quit: Yoric)
[18:58] * morse_ (~morse@supercomputing.univpm.it) has joined #ceph
[19:03] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[19:11] * morse_ (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[19:16] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[19:31] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[19:34] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[19:47] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[19:49] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[20:01] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[20:19] <wido> sagewk: node07 and node12 are working again. osd0 seems to be crashing due to another bug right now
[20:22] <sagewk> wido: ok, i'll look in a bit.
[20:31] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[20:37] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[20:52] <wido> I'm looking into http://tracker.newdream.net/issues/417, but is it correct that current mons don't have a "whoami" file? They get their name from the ceph.conf (if host matches their hostname), correct?
[20:53] <wido> A new mon should just get the monitor data from another mon, no need to modify any data, just add it to the monmap and ceph.conf
[20:53] <sagewk> right. i'm just waiting to update that until 0.22 is released.
[20:54] <sagewk> we'll change it to show alphanumeric names for the monitors, and skip the whoami step.
[20:54] <sagewk> (oh actually the whoami might be gone in v0.21... is that what you mean? i've lost track :))
[20:55] <wido> yes, my mon's don't have a whoami
[20:55] <wido> I just wanted to update the wiki, saw the issue
[20:55] <sagewk> your mons are running 0.22~rc though right?
[21:24] <wido> sagewk: yes, the latest unstable
[21:25] <wido> btw, the rc branch? Release Candidate? Should I switch to that one or stick to unstable?
[21:31] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[21:39] <sagewk> rc = release candidate, yeah. i'd switch.
[21:46] <sagewk> colinm: one thing we really need to do is keep ceph[1-4]/sepia busy with the qa workloads. yehuda is working with ceph1.. but can you work on 2,3 and see if the qa.sh/g.sh scripts can be improved? should probably be logging to disk
[21:46] <sagewk> i kinda suspect they're keeping history in ram and swapping or something? they act weird after a while. and in any case, our goal should be to have a history of the full run on disk somewhere where we can look at it later
[21:50] <cmccabe2> hmm
[21:52] <cmccabe2> does it make sense to have swap enabled on ceph{1,2,3,4}?
[21:52] <cmccabe2> hopefully we know better than the VM about what time to flush things to disk
[21:53] <sagewk> they're just clients running random workloads. they probably won't need swap, but i don't see any reason to disable it.
[21:53] <cmccabe2> oh, right, clients
[21:53] <cmccabe2> the servers definitely shouldn't have swap on
[21:53] <cmccabe2> clients probably should
[21:54] <cmccabe2> well, I can run qa.sh and put the output on disk
[21:54] <cmccabe2> see if there is anything interesting
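
The goal stated above, a full run history kept on disk rather than in RAM, could be sketched roughly as follows. This is a hypothetical Python wrapper, not the actual qa.sh or g.sh; the function name `run_and_log` and the timestamp format are assumptions made for illustration.

```python
import subprocess
import time
from pathlib import Path


def run_and_log(cmd, log_path):
    """Run cmd, appending each output line to log_path as it arrives.

    Output is streamed line by line, so the history of the run lives
    on disk instead of accumulating in memory.
    """
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # buffering=1 gives line-buffered writes in text mode
    with log_path.open("a", buffering=1) as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep errors in the same history
            text=True,
        )
        for line in proc.stdout:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())
            log.write(f"[{stamp}] {line.rstrip()}\n")
        status = proc.wait()
        log.write(f"[exit status {status}]\n")
    return status
```

A call such as `run_and_log(["sh", "/root/qa.sh"], "log/qa.sh.log")` would then leave a timestamped record that can be inspected after the run; the log location here is only an example.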
[21:56] <cmccabe2> We do need a better test suite infrastructure
[21:56] <cmccabe2> I worked at a company where you could just do something like this from the command line:
[21:56] <cmccabe2> $ smoke test -t all
[21:56] <cmccabe2> and then it would create a new smoke test job, insert it into a queue, and run it on a cluster when there was time.
[21:56] <cmccabe2> The QA department developed a bunch of tests to make sure that things were reasonable.
[21:57] <cmccabe2> It was a pretty sophisticated system considering that it was written in TCL
[21:59] <sagewk> yeah, we're interviewing qa ppl now. in the meantime, we just need something to keep these boxes busy testing, with logging so we have the info to fix any problems we find
[21:59] <cmccabe2> I saw the kernel client bug
[22:00] <cmccabe2> couldn't see any obvious way that con->osd->o_osd->map_sem could become NULL though
[22:00] <cmccabe2> I'll kick off a few qa.sh jobs before lunch then
[22:02] <cmccabe2> btw where is qa.sh
[22:03] <sagewk> in /root
[22:03] <cmccabe2> k
[22:05] <gregaf> if a semaphore becomes null or an access segfaults it's usually because the container got deleted
[22:06] <gregaf> I've seen that a few times with a session dying and then the object/struct gets deleted but pointers to it are hanging around
[22:06] <cmccabe2> sounds like an object lifecycle/locking issue then
[22:07] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[22:08] <gregaf> yep, we specifically saw it (server-side) with messages containing a pointer to their connection
[22:09] <gregaf> and the connection/pipe getting erased due to a reconnect
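
The lifecycle problem described above (a message still pointing at a connection that was erased on reconnect) can be illustrated with a small analogue. This is a sketch in Python, not Ceph code: the `Connection`, `Message`, and `reply_target` names are hypothetical, and the real kernel client and server code is C/C++, where the equivalent fix would typically involve reference counting or a weak pointer rather than a raw pointer.

```python
import weakref


class Connection:
    """Stand-in for a connection/pipe that may be erased on reconnect."""

    def __init__(self, peer):
        self.peer = peer


class Message:
    """Holds only a weak reference to its connection."""

    def __init__(self, payload, connection):
        self.payload = payload
        self._connection = weakref.ref(connection)

    def connection(self):
        # Returns the Connection, or None once it has been destroyed.
        return self._connection()


def reply_target(message):
    """Return the peer to reply to, or None if the connection is gone."""
    conn = message.connection()
    if conn is None:
        return None  # connection was erased; caller must drop or requeue
    return conn.peer
```

The point of the design is that the holder must check whether the container still exists before touching anything inside it, instead of assuming the pointer it was given is still valid.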
[22:46] <wido> I just tried to build packages, got a message from dh_install that cdebugpack couldn't be found.
[22:46] <wido> I added to Makefile.am: sh -c "if [ \"$(bindir)\" = \"/usr/bin\" ]; then mkdir -p $(DESTDIR)/usr/bin ; $(install_sh_SCRIPT) -m 0755 cdebugpack $(DESTDIR)/usr/bin/cdebugpack ; else mkdir -p $(DESTDIR)$(bindir) ; $(install_sh_SCRIPT) -m 0755 cdebugpack $(DESTDIR)$(bindir)/cdebugpack ; fi"
[22:47] <wido> that places cdebugpack in the right place so dh_install adds it to ceph.deb
[22:47] <wido> anyone hit that too?
[22:48] <yehudasa> wido: is that a debian issue?
[22:49] <wido> I'm running Ubuntu, but that shouldn't matter. dh_install wants to include cdebugpack, but that file isn't moved to the right place by dpkg-buildpackage, so the install fails
[22:50] <wido> But yes, it's for debian packages
[22:50] <yehudasa> hmm.. haven't seen it, but then again, I'm not installing on debian, just compiling from source
[22:50] <yehudasa> cdebugpack was just added this week
[22:51] <wido> Ah, ok :) That explains it. I've got an automated build process set up, builds packages every morning
[22:51] <wido> just add this to src/Makefile.am, there are two similar lines like this one, for mkcephfs
[22:52] <yehudasa> oh, ok
[22:52] <wido> around line 300
[22:52] <wido> "install-data-local"
[22:53] <wido> I'm going afk
[22:53] <wido> ttyl!
[22:53] <yehudasa> yep!
[23:01] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Quit: bye)
[23:17] <cmccabe2> so I've got ceph3 and ceph4 running qa.sh. Logs under /home/cmccabe/log/ceph3/untar_snap_rm.sh.log and /home/cmccabe/log/ceph4/untar_snap_rm.sh.log
[23:17] <cmccabe2> so far no obvious problems
[23:17] <cmccabe2> but it's only been an hour I guess :)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.