#ceph IRC Log


IRC Log for 2010-12-01

Timestamps are in GMT/BST.

[0:05] <johnl> build of unstable is failing
[0:05] <johnl> cosd.cc: In function 'int main(int, const char**)':
[0:05] <johnl> cosd.cc:65: error: 'IsHeapProfilerRunning' was not declared in this scope
[0:05] * MarkN (~nathan@ Quit (Read error: Connection reset by peer)
[0:05] <johnl> worth filing bug reports about unstable not building?
[0:06] <sagewk> what os are you on?
[0:06] <johnl> Linux. Ubuntu Lucid i386
[0:06] * MarkN (~nathan@ has joined #ceph
[0:06] <johnl> unstable bf784cdb4f60
[0:07] <sagewk> iirc this is a library versioning issue or something. gregaf do you remember?
[0:07] <johnl> a google-perftools thing?
[0:07] <sagewk> yeah
[0:07] <johnl> too old a version perhaps?
[0:09] <johnl> it's 0.98. quite old.
[0:09] <johnl> I'll backport a newer version and rebuild
[0:09] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[0:10] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[0:10] <johnl> ta.
[0:18] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[0:19] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[0:22] <gregaf> johnl: as I recall this problem occurs because google-perftools changed the prototype for that function from "bool IsHeapProfilerRunning();" to "int IsHeapProfilerRunning();"
[0:22] <gregaf> in April or May
[0:23] <gregaf> and for some reason some systems have a header from after that change and a library from before that change
[0:24] <johnl> hrm right. well, latest package will likely sort that for me.
[0:24] <johnl> ta
[0:26] <gregaf> now that i think of it the other person who reported this was running some version of Ubuntu as well
[0:29] * cmccabe1 (~cmccabe@adsl-76-199-100-125.dsl.pltn13.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[0:30] <johnl> heh
[0:41] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[0:42] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[0:48] * cmccabe (~cmccabe@adsl-75-37-28-50.dsl.pltn13.sbcglobal.net) has joined #ceph
[0:59] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[1:00] * tjikkun (~tjikkun@195-240-122-237.ip.telfort.nl) has joined #ceph
[1:19] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[1:53] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:24] * tjikkun (~tjikkun@195-240-122-237.ip.telfort.nl) Quit (Read error: Operation timed out)
[2:35] * greglap (~Adium@ has joined #ceph
[3:09] * lidongyang_ (~lidongyan@ Quit (Remote host closed the connection)
[3:37] * sjust (~sam@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[3:37] * greglap (~Adium@ Quit (Read error: Connection reset by peer)
[4:41] * lidongyang (~lidongyan@ has joined #ceph
[5:54] * cmccabe (~cmccabe@adsl-75-37-28-50.dsl.pltn13.sbcglobal.net) has left #ceph
[6:21] * sentinel_e86 (~sentinel_@ Quit (Quit: sh** happened)
[6:21] * sentinel_e86 (~sentinel_@ has joined #ceph
[6:22] * iggy (~iggy@theiggy.com) Quit (synthon.oftc.net resistance.oftc.net)
[6:22] * cclien (~cclien@ec2-175-41-146-71.ap-southeast-1.compute.amazonaws.com) Quit (synthon.oftc.net resistance.oftc.net)
[6:22] * Meths (rift@ Quit (synthon.oftc.net resistance.oftc.net)
[6:22] * LW_ (~jkreger@rrcs-98-101-117-50.midsouth.biz.rr.com) Quit (synthon.oftc.net resistance.oftc.net)
[6:22] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) Quit (synthon.oftc.net resistance.oftc.net)
[6:22] * morse (~morse@supercomputing.univpm.it) Quit (synthon.oftc.net resistance.oftc.net)
[6:22] * andret (~andre@pcandre.nine.ch) Quit (synthon.oftc.net resistance.oftc.net)
[6:22] * Mark23 (~mark@ Quit (synthon.oftc.net resistance.oftc.net)
[6:22] * wido (~wido@fubar.widodh.nl) Quit (synthon.oftc.net resistance.oftc.net)
[6:22] * monrad-51468 (~mmk@domitian.tdx.dk) Quit (synthon.oftc.net resistance.oftc.net)
[6:22] * MarkN (~nathan@ Quit (synthon.oftc.net resistance.oftc.net)
[6:22] * alexxy (~alexxy@ Quit (synthon.oftc.net resistance.oftc.net)
[6:22] * failboat (~stingray@stingr.net) Quit (synthon.oftc.net resistance.oftc.net)
[6:23] * MarkN (~nathan@ has joined #ceph
[6:23] * alexxy (~alexxy@ has joined #ceph
[6:23] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[6:23] * andret (~andre@pcandre.nine.ch) has joined #ceph
[6:23] * Mark23 (~mark@ has joined #ceph
[6:23] * wido (~wido@fubar.widodh.nl) has joined #ceph
[6:23] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[6:23] * iggy (~iggy@theiggy.com) has joined #ceph
[6:23] * cclien (~cclien@ec2-175-41-146-71.ap-southeast-1.compute.amazonaws.com) has joined #ceph
[6:23] * Meths (rift@ has joined #ceph
[6:23] * failboat (~stingray@stingr.net) has joined #ceph
[6:23] * LW_ (~jkreger@rrcs-98-101-117-50.midsouth.biz.rr.com) has joined #ceph
[6:23] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[6:40] * ijuz__ (~ijuz@p4FFF7443.dip.t-dialin.net) has joined #ceph
[6:40] <f4m8_> Good morning, everyone
[6:40] * f4m8_ is now known as f4m8
[6:47] * ijuz (~ijuz@p57999A8A.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[6:51] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) has joined #ceph
[8:12] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[8:32] * lidongyang (~lidongyan@ Quit (Remote host closed the connection)
[9:17] * lidongyang (~lidongyan@ has joined #ceph
[9:22] * ijuz__ (~ijuz@p4FFF7443.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[9:50] <failboat> sagewk: not yet
[9:50] <failboat> sagewk: I managed to crash anchorserver again
[9:50] <failboat> can't do it in synthetic workload though
[9:50] <failboat> only rsync on my homedir :(
[9:55] * allsystemsarego (~allsystem@ has joined #ceph
[10:22] <jantje_> maybe you can do a strace of your rsync
[10:22] <jantje_> (I'm not sure if that would be of any help)
[10:58] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[11:24] <failboat> if only I could find a place to store that gigantic trace
[11:24] <failboat> anyway
[11:24] <failboat> I'll continue
[11:26] * johnl_ (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[12:07] * tjikkun_ (~tjikkun@195-240-122-237.ip.telfort.nl) has joined #ceph
[12:08] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Read error: No route to host)
[12:13] * lidongyang (~lidongyan@ Quit (Remote host closed the connection)
[12:27] * lidongyang (~lidongyan@ has joined #ceph
[13:10] * johnl_ (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Quit: bye)
[14:40] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[14:52] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[15:34] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[15:53] * f4m8 is now known as f4m8_
[16:30] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) Quit (Quit: Leaving.)
[16:35] * Meths_ (rift@ has joined #ceph
[16:42] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[16:44] * Meths_ is now known as Meths
[16:49] * greglap (~Adium@ has joined #ceph
[17:18] <greglap> failboat: the AnchorServer is part of how Ceph implements hard links, so any synthetic test you come up with will probably need to use those
[17:19] <greglap> if that helps you
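
A synthetic workload along the lines greglap suggests could be sketched as below. This is only a sketch under stated assumptions: the function names, directory layout, and file counts are hypothetical, and deleting the tree afterwards is an added step rather than something greglap specifies. It creates files, hard-links each one into a different directory, then removes the whole tree.

```python
import os
import shutil
import tempfile


def build_hardlink_tree(root, n_dirs=4, n_files=8):
    """Create n_dirs directories of n_files files each, and hard-link every
    file into a neighbouring directory. Returns the number of links made."""
    dirs = [os.path.join(root, f"d{i}") for i in range(n_dirs)]
    for d in dirs:
        os.makedirs(d, exist_ok=True)
    links = 0
    for i, d in enumerate(dirs):
        other = dirs[(i + 1) % n_dirs]
        for j in range(n_files):
            src = os.path.join(d, f"f{j}")
            with open(src, "w") as fh:
                fh.write("x" * 128)
            # Link into a different directory so the inode has two names.
            os.link(src, os.path.join(other, f"link_{i}_{j}"))
            links += 1
    return links


def run_once(base=None):
    """Build the tree under base (system temp dir by default), then delete it.
    Returns (links created, whether the tree still exists afterwards)."""
    root = tempfile.mkdtemp(dir=base)
    links = build_hardlink_tree(root)
    shutil.rmtree(root)
    return links, os.path.exists(root)
```

To exercise a Ceph mount rather than the local temp directory, pass the mount point as base, for example run_once("/mnt/ceph") (a hypothetical path).
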
[17:39] * greglap (~Adium@ Quit (Quit: Leaving.)
[18:04] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:26] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[18:37] <sagewk> failboat: if you can reproduce the (workload leading up to the) crash with the mds logging enabled (debug mds = 20 and debug ms = 1) that log should also be sufficient.
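
For reference, the two debug settings sagewk names might look like this in ceph.conf. Placing them under an [mds] section is an assumption; the log gives only the option names and values.

```ini
[mds]
    ; assumed placement: the log names only the two options and their values
    debug mds = 20
    debug ms = 1
```
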
[18:57] * sjust (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:13] * cmccabe (~cmccabe@dsl081-243-128.sfo1.dsl.speakeasy.net) has joined #ceph
[19:21] * shdb (~shdb@217-162-231-62.dclient.hispeed.ch) Quit (Read error: Connection reset by peer)
[19:22] * shdb (~shdb@217-162-231-62.dclient.hispeed.ch) has joined #ceph
[19:34] <wido> hi
[19:34] <cmccabe> wido: hi
[19:34] <wido> After the recent crashes my OSDs keep crashing with new pg errors
[19:34] <wido> A lot of them
[19:35] <cmccabe> wido: is this the latest unstable?
[19:35] <wido> In the past I had situations that only I would run into
[19:35] * ajnelson (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[19:35] <wido> cmccabe: yesterday's
[19:35] <cmccabe> v0.25~rc?
[19:35] <wido> I would only run into these, since I have an "old" FS, a few weeks old
[19:36] <wido> cmccabe: No, "ceph version 0.24~rc (commit:463d624d38d2c5444cc9aa6a2c8e6d3fbcca65fd)"
[19:36] <wido> I'll upgrade to the latest right now, see what that does
[19:36] <cmccabe> wido: great
[19:36] <cmccabe> wido: yeah, we're stabilizing 0.25~rc now. A bunch of known bugs were fixed
[19:37] <wido> But, I had some things like these in the past, my cluster would keep crashing and you guys concluded that no new cluster would run into these issues
[19:37] <sagewk> wido: you should use the 'rc' branch for the time being
[19:37] <wido> sagewk: ok, i'll switch
[19:38] <wido> and see what that does
[19:42] <wido> sagewk: the rc branch is still at 0.23-1, correct?
[19:43] * ajnelson (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[19:47] <sagewk> rc should be 0.24~1
[19:48] <sagewk> er, 0.24~rc .. i.e., not quite 0.24
[19:50] * ajnelson (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[20:01] <wido> ah, it's the debian changelog which is outdated
[20:02] <wido> for example, all my OSDs (except for one) crash directly after startup: http://pastebin.com/DTwsx9Sk
[20:03] <wido> Before I start creating issues, is this (If you can see it that fast) due to my very damaged fs / old fs?
[20:03] <sagewk> do you have a gdb backtrace?
[20:03] <sagewk> i was just working with that code, probably my fault
[20:04] <wido> sagewk: yes: http://pastebin.com/qcmyaDQX
[20:04] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Read error: Operation timed out)
[20:05] <sagewk> http://fpaste.org/Amv4/ should fix it
[20:05] <sagewk> btw you're running in trailing journal mode, which is unusual.. is that on purpose?
[20:07] <wido> sagewk: No, that is not on purpose at all. btrfs journal or OSD journal?
[20:10] <sagewk> osd journal
[20:10] <sagewk> which node is this?
[20:11] <wido> on node01
[20:17] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[20:18] <johnl> doh, got disconnected
[20:18] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[20:20] <sagewk> wido: oh, i see the problem
[20:21] <wido> I'm just wondering, is it worth the time hunting these issues down? Or is this just another corner case nobody will run into? I'll happily create an issue for everything I find, no problem
[20:26] <sagewk> this one is real :)
[20:27] <sagewk> in general, adding issues is good. the recovery stuff you're hitting i wouldn't tho since it's likely fallout from instability earlier in this cycle
[20:27] <wido> Yes, I get it. Well, take your time checking this one out, I just wanted to check the btrfs issue tomorrow and for that I need a working cluster
[20:30] <sagewk> ok. well i'll push a fix for this in a few minutes in any case
[20:30] <sagewk> once that's verified i suggest wiping and then looking for the btrfs bug
[20:31] <sagewk> no need for that patch from before.. the async snap creation is disabled by default now until the ioctl interface is finalized
[20:31] <wido> btw, something else, how "big" is IPv6 in the US? I've got an issue with Brocade/Foundry about IPv6 and it seems to me that the US is not really hurrying to implement IPv6
[20:31] <wido> sagewk: Ok, i'll try to recreate the issue without your patch, just see if I get some warnings
[20:31] <gregaf> IPv6? what's that?
[20:32] <wido> gregaf: Thanks, that explains it :)
[20:32] <gregaf> ;)
[20:32] <johnl> IPv4 will never catch on. I'm still on IPX.
[20:32] <cmccabe> I think the IPv4 address space is going to run out this year
[20:32] <cmccabe> er, early next year
[20:33] <cmccabe> I saw an article that said 6 months, tops
[20:33] <johnl> the IPX address space is still going strong.
[20:33] <cmccabe> haha
[20:33] <cmccabe> I remember it was an option for warcraft 2
[20:33] <cmccabe> and also I think Lotus Notes might have supported it at one point?
[20:34] <gregaf> IPX was huge, it was the only way to play games online for a while
[20:34] <gregaf> or maybe just multiplayer, period
[20:34] <wido> indeed, I remember IPX :-)
[20:35] <wido> No, but here in Europe the IPv4 space will run out somewhere next year
[20:35] <cmccabe> some people believe that ISPs will just start NATing everyone
[20:35] <gregaf> I don't think the US is in any better shape in terms of available addresses
[20:35] <wido> In a few weeks we will be running dual-stack, but Brocade/Foundry has a real bug in their router platform
[20:35] <cmccabe> that would be kind of horrible I think... we'd have to tunnel absolutely everything over port 80 probably
[20:35] <gregaf> but the projections keep getting pushed back as IP holders start more aggressively gating and reclaiming unused ones and stuff
[20:36] <wido> and they don't seem to be willing to fix it
[20:36] <gregaf> and there are still tons of issues with IPv6 adoption
[20:36] <wido> gregaf: yes, that's true :) Like I'm seeing right now
[20:36] <cmccabe> well, the "running out" is in terms of companies buying new blocks of addresses
[20:36] <cmccabe> it doesn't mean that companies that already have them don't have some headroom
[20:38] <wido> ok, but it answers my question, the problems are the same. Brocade is just my problem right now
[20:43] <gregaf> Ars Technica had a pretty good overview of the state of IPv6 and its issues recently if you were looking for more background
[20:43] <gregaf> (at least, it seemed good to me: http://arstechnica.com/business/news/2010/09/there-is-no-plan-b-why-the-ipv4-to-ipv6-transition-will-be-ugly.ars/4 )
[20:44] <wido> gregaf: Yes, I saw that one :)
[20:44] <wido> nice indeed
[20:56] * ajnelson (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[21:01] * ajnelson (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[22:01] * ajnelson (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[22:06] <johnl> hey sagewk, sorry if I missed your response to this, but in bug #621 you mention commit:307404231ecb09fdd2f6dd6e50677e746bba4236 but that isn't available in the git repository. you pushed?
[22:08] <gregaf> johnl: Sage is at lunch but I think the commit got renamed or something
[22:08] <gregaf> cbb562089c788e5eeb8cbee7a2be5de0b40d84b4 is pushed and I'm pretty sure that's the one he meant
[22:08] <gregaf> commit cbb562089c788e5eeb8cbee7a2be5de0b40d84b4
[22:08] <gregaf> Author: Sage Weil <sage@newdream.net>
[22:08] <gregaf> Date:   Wed Dec 1 09:51:27 2010 -0800
[22:08] <gregaf>    rbd: use MIN instead of min()
[22:08] <gregaf>    Not even sure where min() was coming from, but it seems to be missing on
[22:08] <gregaf>    i386 lucid.:
[22:10] <johnl> ah yeah
[22:11] <wido> yehudasa: If you are playing with the RGW sometime, this might be fun to try: http://ceph.newdream.net/wiki/RADOS_Gateway#Accelerating_the_gateway_with_Varnish
[22:11] <johnl> gregaf: that's not on unstable branch though. what is unstable branch exactly?
[22:12] <johnl> I'm looking for a branch to get the latest fixes (so I can test when my bugs are fixed :)
[22:13] <gregaf> unstable is our main dev branch
[22:14] <gregaf> most bug fixes go into the testing branch (they all should but sometimes they get mixed up or we think a bug got introduced in unstable but it was actually older)
[22:15] <failboat> sagewk: it usually crashes when I try to rm -rf the tree
[22:16] <failboat> which I previously created by rsyncing (which succeeded)
[22:17] <johnl> gregaf: can't see it on the testing branch either.
[22:17] <johnl> it's in the repo but I can't for the life of me find what branch it's on!
[22:18] <wido> sagewk: I'm going afk, if you have a fix for the crash I'm seeing, mail me or post it here, i'll read it tomorrow before doing the mkcephfs for the btrfs test
[22:18] <wido> Let me know which backtrace I should NOT be seeing, so that I know if it's fixed
[22:19] <gregaf> johnl: oh, he only put it into the rc branch
[22:19] <johnl> ah!
[22:20] <johnl> bunch of other commits on there not on unstable or testing too!
[22:20] <johnl> hard for an outsider to follow!
[22:21] <johnl> could you merge it all to unstable? or is there a reason it's separate?
[22:21] <gregaf> yeah, we're trying to firm up our release practices but we haven't fully established them yet
[22:21] <gregaf> my guess is you'll want to follow either testing or rc, but I'm not entirely clear on the rc branch myself so I'll let Sage sort it out with you
[22:25] <johnl> ok ta.
[22:26] <johnl> I'm rigging up an automated build atm
[22:26] <johnl> so would be good to know of one good branch to do that from
[22:28] <johnl> you working on ceph full time greg?
[22:28] <gregaf> yep!
[22:28] <johnl> sweet. you and sage? or more?
[22:30] <gregaf> Sage and Yehuda, I came in summer last year, we added another full-time over the summer (cmccabe) and we have a couple guys who are split about 50/50 between Ceph and other company products (sjust and joshd)
[22:31] <johnl> wow! ace.
[22:31] <gregaf> (and hello, I hope you enjoyed your IM alerts to the three of you :P)
[22:31] <cmccabe> gregaf: for some reason, my IM only alerts when someone starts the line with cmccabe:
[22:32] <gregaf> well that's a lame alert system
[22:32] <cmccabe> gregaf: yeah, maybe the later versions of pidgin are better or something
[22:32] <gregaf> it's a configurable option in some clients I've used, maybe you should check your prefs
[22:33] <wido> you guys are using pidgin?
[22:34] <wido> I'm using plain old irssi on a console running in a screen
[22:34] <cmccabe> wido: that has some advantages, but I was never able to get screen's notification system to work reliably
[22:34] <gregaf> I think we're all using some multi-protocol client since we have a company jabber server
[22:35] <cmccabe> wido: the whole monitor-for-activity thing sort of worked, but sometimes seemed to miss events
[22:35] <gregaf> I'm actually on Adium since I use a Mac desktop/laptop
[22:35] <cmccabe> also as gregaf said, we usually use a multi-protocol client just for convenience
[22:35] <cmccabe> although there is a text version of pidgin (finch), it has some quirks
[22:35] <wido> I'm using pidgin, but only for my own Jabber
[22:35] <wido> I switch places a lot during the day, office, home, other office, etc
[22:36] <wido> log on to my own server and "screen -x"
[22:36] <wido> and i'm back in the IRC channel
[22:36] <cmccabe> I run most applications inside screen, but pidgin is one exception
[22:37] <wido> Yeah, my notification right now is a yellow line, so I have to check it myself every now and then.
[22:37] <cmccabe> do you use screen's monitor-for-activity?
[22:37] <wido> uh no, I simply have a terminal open somewhere on my Ubuntu desktop, where I have the IRC channel
[22:37] <cmccabe> k
[22:38] <wido> but, I'm really going afk now! My message to sagewk has gone up a lot of lines, could one of you point him to it?
[22:38] <wido> He was working on a fix, which I had to try before cleaning my cluster for another test
[22:39] <cmccabe> ok
[22:39] <wido> tnx! ttly
[22:39] <wido> ttyl
[22:39] <cmccabe> bye
[22:40] <gregaf> johnl: sagewk says following the rc branch would be ideal
[22:41] <gregaf> everything goes into there before it goes out in any release
[22:46] <johnl> right. I'll use that. ta
[23:01] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[23:14] * johnl builds ceph ubuntu lucid packages...
[23:15] * ajnelson (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[23:15] * ajnelson (~Adium@soenat3.cse.ucsc.edu) Quit ()
[23:16] * ajnelson (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[23:37] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.