#ceph IRC Log

IRC Log for 2010-10-26

Timestamps are in GMT/BST.

[0:13] * mib_jw83r5 (a3011351@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[0:24] <jantje> sage: right, but I'm more interested in why this doesn't show up on my other servers (which are 100% identical)
[0:26] <sage> i'd verify the disk utilizations are similar. if so, it might just be a slow disk or something (happens surprisingly often)
[0:26] <jantje> the active MDS is on that machine
[0:27] <sage> the slow one or the fast one? :)
[0:27] <jantje> the one that's throttling the journal
[0:28] <jantje> i should mount using a different MDS
[0:28] <jantje> see what that gives me
[0:37] * jantje loves sysrq-trigger
[0:40] <johnl> hrm, rados client started hanging. df returns but an ls on a pool hangs
[0:41] <johnl> put hangs too. was working a min ago
[0:41] <sage> johnl: ceph -s output?
[0:42] <johnl> http://pastebin.com/7YJkVSjG
[0:42] <johnl> 2 mons, 3 osds
[0:46] <gregaf> johnl: (different topic) do you still have the core dump and executable for that cfuse crash?
[0:47] <johnl> gregaf: not sure, but it happens every time so I can get whatever you need
[0:47] <johnl> though this is the same cluster, so suspect I should wait until rados is working again :)
[0:47] <gregaf> okay
[0:47] <sage> johnl: the writes are hanging bc some of the pg's are peering (not active).
[0:47] <sage> an osd must have recently restarted or something?
[0:48] <johnl> process startup times for all the osds suggest not. been up for hours.
[0:49] <jantje> sage: well, it's probably nothing to worry about
[0:49] <johnl> I changed the replication "size" for the data pool to 2 and then 3 and then 2, to watch it re-replicate. which seem to work fine
[0:49] <johnl> then I wrote some files with rados, which worked fine
[0:49] <johnl> then one file just didn't write, hung waiting.
[0:50] <johnl> now most stuff hangs, except lspools and df
[0:50] <johnl> gregaf: I'll attach a core (and executable?) to the ticket I opened, that ok?
[0:50] <sage> sounds like something we need to fix :).
[0:50] <gregaf> johnl: that'd be great
[0:51] <johnl> gregaf: or I can give you access to the server it's on if you want. it's just a throwaway vm
[0:51] <gregaf> I just need to check the value of a few variables to see how the invalid function input is being generated, to make sure I'm using a reasonable stick on it :)
[0:51] <gregaf> assuming they're within the redmine size limits — I'm actually not sure what they are
[0:51] <johnl> same for you sage, can give you access to this hung cluster if you want to poke it. just an ssh key away!
[0:53] <sage> johnl: are osd logs enabled?
[0:53] <johnl> got a /var/log/ceph/osd.0.log file with current entries
[0:54] <johnl> and osd.1.log
[0:54] <jantje> Hmm, I marked all OSDs on the machine as down, but I still see like 200MB/sec of data flowing
[0:54] <jantje> to that machine
[0:56] <jantje> ok, much better with 'out'
[0:56] <jantje> :)
[1:17] <johnl> gregaf: how can I get a core dump from cfuse? it's segfaulting but I'm not seeing a core dump anywhere
[1:28] * seibert (~seibert__@drl-dhcp42-115.sas.upenn.edu) Quit (Ping timeout: 480 seconds)
[1:33] <sage> johnl: ulimit -c unlimited?
[1:34] <johnl> that worked, thanks sage. 46M core file.
[1:35] <johnl> that rlimit was set to 0 by default, didn't know that!
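Note on the exchange above: `ulimit -c unlimited` raises the shell's core-file size limit, which is why the segfault started producing a core file. A minimal sketch of the same idea from inside a process, using Python's standard `resource` module on a Unix-like system. The function name `enable_core_dumps` is hypothetical and is not part of Ceph or cfuse; it only raises the soft limit as far as the hard limit allows, so it may still leave core dumps disabled if the hard limit is 0.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the current hard limit.

    Hypothetical helper for illustration; an unprivileged process may raise
    its soft limit up to, but not beyond, the hard limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Keep the hard limit unchanged and lift the soft limit to match it.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    # resource.RLIM_INFINITY here corresponds to "unlimited" in the shell.
    print("core limit (soft, hard):", soft, hard)
```

The limit is inherited by child processes, so it has to be set in the shell or process that launches the crashing program, before it starts.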
[1:39] <johnl> core and executable attached to bug report. bzipped up good :)
[1:40] <gregaf> johnl: thanks!
[1:41] <johnl> np, very glad to help!
[1:41] <johnl> will see what else I can break
[1:45] * yehuda (~yehuda@adsl-69-225-137-176.dsl.irvnca.pacbell.net) has joined #ceph
[2:00] <johnl> bedtime for me now, night all.
[2:06] * jantje too
[2:06] <jantje> 2am
[2:06] <jantje> :(
[2:10] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:12] * yehuda (~yehuda@adsl-69-225-137-176.dsl.irvnca.pacbell.net) Quit (Ping timeout: 480 seconds)
[2:12] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[2:13] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[2:17] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:59] * greglap (~Adium@166.205.137.22) has joined #ceph
[3:14] * deksai (~deksai@96-35-100-192.dhcp.bycy.mi.charter.com) Quit (Ping timeout: 480 seconds)
[3:30] * sjust (~sam@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[3:51] * greglap (~Adium@166.205.137.22) Quit (Read error: Connection reset by peer)
[4:31] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) has joined #ceph
[5:34] * yehuda (~yehuda@adsl-69-225-137-176.dsl.irvnca.pacbell.net) has joined #ceph
[5:35] * yehuda (~yehuda@adsl-69-225-137-176.dsl.irvnca.pacbell.net) Quit ()
[6:00] * Jiaju (~jjzhang@222.126.194.154) Quit (Remote host closed the connection)
[6:09] * Jiaju (~jjzhang@222.126.194.154) has joined #ceph
[8:07] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:32] * lidongyang_ (~lidongyan@222.126.194.154) has joined #ceph
[8:32] * lidongyang (~lidongyan@222.126.194.154) Quit (Read error: Connection reset by peer)
[8:57] * dubst (~me@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[9:10] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:34] * dubst (~me@pool-173-55-24-140.lsanca.fios.verizon.net) has joined #ceph
[10:04] * gregorg (~Greg@epoc-01.easyrencontre.com) has joined #ceph
[10:53] * Meths_ (rift@91.106.171.17) has joined #ceph
[10:57] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[10:59] * Meths (rift@91.106.140.149) Quit (Ping timeout: 480 seconds)
[11:09] * allsystemsarego (~allsystem@188.27.167.113) has joined #ceph
[11:18] <johnl> hey gregaf, gimme a shout when you're back re: tracker #518. gdb isn't working like you're expecting
[11:37] * gregorg (~Greg@epoc-01.easyrencontre.com) Quit (Ping timeout: 480 seconds)
[11:40] * Yoric (~David@213.144.210.93) has joined #ceph
[11:47] * gregorg (~Greg@epoc-01.easyrencontre.com) has joined #ceph
[12:20] * Devin (~ddiez@62.82.8.34.static.user.ono.com) has joined #ceph
[12:20] <Devin> hi
[12:23] <Devin> is there a way to check a "high level" feature roadmap? it would be great to have the http://ceph.newdream.net/about/ features split into "versions"
[12:24] <Devin> right now it's not easy to know what will be available in 0.24 or 1.0
[12:26] <Devin> from what I can understand 0.24 could be a candidate to replace nfs servers?
[12:35] <jantje> http://tracker.newdream.net/projects/ceph/roadmap
[12:35] <jantje> for basic things it's running pretty stable actually
[12:36] <jantje> (just my personal opinion)
[12:36] <jantje> but I think most people will wait until there is a working fsck
[12:40] <Devin> but the redmine roadmap is very "low level", I think it could be a good idea to have a more high level feature roadmap with version number on it
[12:40] <Devin> will 1.0 have all the features in the about document?
[12:40] <Devin> questions like this one
[13:35] <dubst> I'm curious as to how #400 is gonna generate keys for each node. Or if they're going to be created during the node creation.
[13:55] * gregorg_taf (~Greg@epoc-01.easyrencontre.com) has joined #ceph
[13:55] * gregorg_taf (~Greg@epoc-01.easyrencontre.com) Quit ()
[14:00] * gregorg (~Greg@epoc-01.easyrencontre.com) Quit (Ping timeout: 480 seconds)
[14:07] <jantje> Devin: I don't know, but I know for sure v1.0 is going to kick ass :-)
[15:04] * gregorg (~Greg@epoc-01.easyrencontre.com) has joined #ceph
[15:06] * seibert (~seibert__@drl-dhcp192.sas.upenn.edu) has joined #ceph
[15:09] * deksai (~deksai@96-35-100-192.dhcp.bycy.mi.charter.com) has joined #ceph
[15:16] * Meths_ is now known as Meths
[15:50] * allsystemsarego (~allsystem@188.27.167.113) Quit (Quit: Leaving)
[15:59] <jantje> ouch, a simple cp of a 500MB dir took 22 minutes
[16:26] <Devin> jantje: that's what I'm waiting for :D
[16:27] <Devin> we want to do some testing so we can get rid of our old and not-so-beloved nfs servers
[16:27] <jantje> actually, pnfs should be pretty OK once it gets into the wild
[16:27] <jantje> except for the lack of redundancy of course
[16:27] <jantje> :)
[16:28] <Devin> that's why we want to test ceph, because it's not only "parallel" but redundant
[16:37] <jantje> a few months ago I was really trying to figure out pnfs, to get a setup working, but that failed horribly
[16:38] <jantje> I didn't know CEPH back then, and actually I found CEPH by accident ...
[16:45] <Devin> :)
[16:45] <Devin> we plan to test 0.24
[16:46] <Devin> so we are ready to go when 1.0 is released
[16:59] * alexxy[home] (~alexxy@79.173.82.178) has joined #ceph
[17:01] * alexxy (~alexxy@79.173.82.178) Quit (Ping timeout: 480 seconds)
[17:20] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) Quit (Quit: Leaving.)
[17:28] <jantje> Devin: you can never start early enough
[17:28] <jantje> Devin: what are you planning to use it for ?
[17:29] <jantje> I'm going to try to build a high-performance parallel build cluster
[17:30] * gregorg (~Greg@epoc-01.easyrencontre.com) Quit (Ping timeout: 480 seconds)
[17:33] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[17:35] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[17:37] <Devin> jantje: webservers
[17:37] <Devin> we have a few millions of images
[17:46] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) has joined #ceph
[17:50] * Yoric (~David@213.144.210.93) Quit (Quit: Yoric)
[17:54] <greglap> johnl: what's gdb doing instead?
[17:54] <johnl> ello
[17:54] <johnl> lemme go run again
[17:57] <johnl> No symbol "prior_size" in current context.
[17:57] <johnl> the stack trace output differs a bit from the one I pasted in the ticket
[17:58] <greglap> what is it?
[17:58] <greglap> (pastebin or just here)
[17:58] <johnl> if you gimme your ssh key, you can debug it yourself if you like. but I also don't mind going through it though
[17:59] <johnl> http://pastebin.com/CkQHtfp0
[18:00] <greglap> ah, looks like the frame numbers are a bit different, same assert though
[18:00] <greglap> frame 8
[18:01] <greglap> I can just ssh in if you like, though, shall I email it to you or paste it here?
[18:01] <johnl> paste
[18:01] <greglap> ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAzqJQfdAKaggPDgSpk8P6QjLeyQYOz183fVSKnqAa5EeaCJozS1dn8z+iphswT3YL6wqIMqVHMfkSUcEeTBwZmvydaghOMJEIQSP1PCzH0V2XprELrEcBC3QTvaElPjjEYgBVUvEw8sL/RIQ7mgt9vxD0NcCDspsz35fmU222KzFxD41PZ5XzHQ3Lumwdax7mcpqrw8Aa5/tgcEqAIu54HYRf1Qmk6+ueMAvpwkX3eiSEtkVvuV1Kxpc3/0mOfJ7cn+eyPX3qUbo4oY76hIYseoUIwEaVE/pNW+pjwDSy4gSOcuxYDEGKzCwZPZeQtRQWzRt3YNafxi5xsByobGLlZw== gfarnum@GF-Macbook.local
[18:22] * gregorg (~Greg@81.253.43.15) has joined #ceph
[18:22] * gregorg (~Greg@81.253.43.15) Quit ()
[18:49] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:54] * Devin (~ddiez@62.82.8.34.static.user.ono.com) Quit (Quit: Devin)
[18:57] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) has left #ceph
[18:57] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) has joined #ceph
[18:58] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) Quit (Quit: Leaving.)
[19:00] * sjust (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:20] * greglap (~Adium@166.205.137.189) has joined #ceph
[19:35] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:35] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) has left #ceph
[19:40] <greglap> johnl: where did you get the version of Ceph you're running?
[19:40] <johnl> debian repository
[19:40] <greglap> it's marked itself as a git revision which I can't find anywhere
[19:40] <greglap> ah
[19:40] <johnl> sorry, the ceph repository
[19:41] <johnl> debs
[19:41] <johnl> weird
[19:41] <greglap> I'll ask Sage about how he packages those
[19:41] <johnl> actually, possibly unrelated, but I have a git tree clone here and I tried pulling today and am getting merge errors
[19:41] <johnl> I've made no local changes
[19:41] <johnl> someone been fiddling with the git repo perhaps?
[19:42] <greglap> which branch is it?
[19:42] <johnl> master
[19:42] <greglap> hmm, maybe, although I think we try to avoid rebasing that
[19:42] <johnl> tip of my head is 140284fb32b9ab8018f6ad740ee54a7603322779
[19:42] <johnl> so like may, long time ago
[19:43] <johnl> yeah, gotta be careful rebasing stuff that's already been pushed
[19:44] <johnl> anyway, probably unrelated. I didn't build the packages, they're direct from the repository
[19:45] <greglap> yeah, the package repo might be separate, which could explain it
[19:45] <johnl> yer
[19:45] <greglap> I do see a few discard commits in the master branch, we'll talk about it once everybody's in
[19:45] <johnl> it's a common thing to do, maintain a separate packaging repo
[19:45] <johnl> k
[19:46] <greglap> just most of our testers are still using the dev tree so I wasn't expecting to have to check, and I want to make sure I'm looking at the right pieces :)
[19:57] * allsystemsarego (~allsystem@188.27.167.113) has joined #ceph
[20:20] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[20:29] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[20:36] * greglap (~Adium@166.205.137.189) Quit (Ping timeout: 480 seconds)
[20:47] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[20:59] * greglap1 (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[20:59] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Read error: Connection reset by peer)
[21:25] * phreak- (~phreak@gangsta.nl) Quit (Quit: -)
[21:58] * allsystemsarego (~allsystem@188.27.167.113) Quit (Quit: Leaving)
[21:59] <gregaf> johnl: just pushed a hopeful fix to testing
[21:59] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Remote host closed the connection)
[22:00] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[22:01] <gregaf> you can cherry-pick it if you like (b5d9bec659daa8ba26810e7508ec473aba8ad287)
[22:07] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[22:09] * allsystemsarego (~allsystem@188.27.167.113) has joined #ceph
[22:13] * allsystemsarego (~allsystem@188.27.167.113) Quit ()
[22:18] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[22:39] * seibert (~seibert__@drl-dhcp192.sas.upenn.edu) Quit (Ping timeout: 480 seconds)
[23:07] * dubst (~me@pool-173-55-24-140.lsanca.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[23:12] * greglap1 (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[23:24] * eternaleye_ (~eternaley@195.215.30.181) has joined #ceph
[23:24] * eternaleye (~eternaley@195.215.30.181) Quit (Remote host closed the connection)
[23:46] * dubst (~me@ip-66-33-206-8.dreamhost.com) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.