#ceph IRC Log


IRC Log for 2011-09-13

Timestamps are in GMT/BST.

[0:10] <Tv> greglap: well, it failed again..
[0:10] <Tv> different error though
[0:10] <Tv> this is valid
[0:10] <Tv> i'll push a fix asap
[0:14] <Tv> greglap: gitbuilder should go green with 37f1b96
[0:14] <greglap> ah, I see
[0:15] <Tv> there was another issue there also, but that went away after clearing the cache
[0:22] <Tv> test/test_librbd.cc: In function ‘void write_test_data(librbd::Image&, const char*, off_t)’:
[0:22] <Tv> warning: test/test_librbd.cc:679: comparison between signed and unsigned integer expressions
[0:22] <Tv> gitbuilder is good apart from that
[0:22] <Tv> (i386 compiled faster ;)
[0:27] * greglap1 (~Adium@aon.hq.newdream.net) has joined #ceph
[0:27] * greglap (~Adium@aon.hq.newdream.net) Quit (Read error: Connection reset by peer)
[0:53] <Tv> gregaf, sjust, joshd: fyi I'll most likely be remote tomorrow, I have building maintenance coming in etc crap.. I'll also be without electricity in the morning, but that *should* end by 8am.
[0:54] <joshd> ok, should we call you for the daily?
[0:55] <Tv> joshd: yeah, tommi.virtanen should work for skype
[1:55] <cp> Question: I'm trying to inject a new monmap using this command "cmon -i 1 --inject-monmap /tmp/mm1"
[1:56] <cp> This is based on the instructions on the wiki, but cmon complains that these options don't exist
[1:58] * greglap1 (~Adium@aon.hq.newdream.net) Quit (Quit: Leaving.)
[1:58] <joshd> hmm, that may have broken with some recent argument parsing cleanups
[1:59] <cp> joshd: Is there an alternate way you'd suggest for adding monitors?
[2:01] <joshd> cp: no, you have to start the daemon (cmon) at some point
[2:01] <joshd> is there any output before the usage message?
[2:01] <cp> joshd: just the warning message
[2:02] * adjohn (~adjohn@ Quit (Remote host closed the connection)
[2:02] * adjohn (~adjohn@ has joined #ceph
[2:02] <cmccabe> cp: the error I get is must specify '--mon-data=foo' data path
[2:02] <cp> cmccabe: Hmmm... I'm running from the deb packages
[2:03] <cp> Does this mean I'm just using a horribly out-of-date version? (my compiling attempts didn't work out)
[2:04] <joshd> no, not too old - and I don't think the options changed
[2:05] <cmccabe> cp: --inject-monmap works fine for me
[2:06] <cp> cmccabe: what's the complete command?
[2:06] <cmccabe> cp: I think your problem is that there is no mon.1
[2:06] <cmccabe> cp: perhaps you want mon.a?
[2:06] <cmccabe> cp: ./cmon -i a --inject-monmap /tmp/mm1
[2:06] <cp> Currently I have [mon1], trying to add [mon2] (then [mon0] etc)
[2:07] <cmccabe> do you have a section for [mon.1] in the config?
[2:07] <cp> cmccabe: ah, I do see the usage thing now. Sorry
[2:08] <cp> but still not working: cmon -i 1 --mon-data=/mnt/ceph/mon1 --inject-monmap /tmp/mm1
[2:08] <cp> ** WARNING: Ceph is still under heavy development, and is only suitable for **
[2:08] <cp> ** testing and review. Do not trust it with important data. **
[2:08] <cp> usage: cmon -i monid [--mon-data=pathtodata] [flags]
[2:09] <cp> cmccabe: [mon1]
[2:09] <cp> ; host = node21
[2:09] <cp> mon addr =
[2:09] <cp> mon data = /mnt/ceph/mon1
[2:09] <cmccabe> it should be [mon.1]
[2:10] <joshd> the current version will complain if you try to add a monitor without a period:'2011-09-12 17:50:38.699599 7ff63423c720 ERROR! old-style section name(s) found: mon1. Please use the new style section names that include a period.'
[2:10] <cmccabe> anyway, I'm guessing it never even loads that configuration file, otherwise you'd see a lot of warning messages
[2:11] <cmccabe> either install it to /etc/ceph/ceph.conf, or use -c to tell the commands where it is
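
The section-name rule joshd quotes above can be illustrated with a small check. This is a minimal sketch and not Ceph's actual parser: the helper name `old_style_sections`, the regular expression, and the second sample section are assumptions made for illustration; only the `[mon1]` versus `[mon.1]` distinction and the `/mnt/ceph/mon1` path come from the log.

```python
import configparser
import re

# Hypothetical pattern: a daemon type immediately followed by an id, no period.
OLD_STYLE = re.compile(r"^(mon|osd|mds)[0-9a-zA-Z]+$")

def old_style_sections(conf_text):
    """Return section names that look like daemon sections without a period."""
    parser = configparser.ConfigParser(strict=False)
    parser.read_string(conf_text)
    return [name for name in parser.sections() if OLD_STYLE.match(name)]

# Sample config text; the [mon.2] section is invented for contrast.
sample = """
[mon1]
mon data = /mnt/ceph/mon1

[mon.2]
mon data = /mnt/ceph/mon2
"""

print(old_style_sections(sample))  # ['mon1']
```

Running something like this against a config before starting the daemons would flag `[mon1]` so it can be renamed to `[mon.1]` (or `[mon.a]`, as cmccabe suggests later).
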
[2:14] <cmccabe> cp: does that clear it up?
[2:15] <cp> cmccabe: not yet. It's a bit messed up right now..
[2:16] <cmccabe> cp: maybe try something simple like ceph -s
[2:16] <cmccabe> cp: are you able to get a result from that command?
[2:19] <cp> cmccabe: that hangs. I tried stopping everything and restarting the box, but now it's not working so well.
[2:20] <cmccabe> cp: is your configuration installed to /etc/ceph/ceph.conf?
[2:20] <cp> cmccabe: yup
[2:20] <joshd> ceph -s will hang when it can't communicate with a monitor
[2:21] <cp> There's a monitor up and running. I just stopped everything and started them again. pgrep shows a pid
[2:21] <cmccabe> cp: what do you get for this: cconf --name mon.1 'mon data'
[2:22] <cp> cmccabe: nothing
[2:22] <cmccabe> cp: ok. that means there is no monitor with id 1
[2:22] <cmccabe> cp: what about mon.a?
[2:23] <cp> cmccabe: nothing for that either
[2:23] <cmccabe> what is actually in your config file?
[2:28] <cmccabe> cp: I have to go in a little bit
[2:28] <cp> cmccabe: pasting now
[2:28] <cmccabe> cp: basically, cconf is a tool that can retrieve information out of your config file
[2:28] <cmccabe> cp: if cconf can't find how your monitors are configured, then either your configuration file does not exist, or there is no monitor with that name
[2:29] <cp> http://pastebin.com/8xazvVAd
[2:30] <cmccabe> cp: what about cconf --name mon.1 -c <path-to-config> 'mon data'
[2:31] <cp> cconf --name mon.1 -c /etc/ceph/ceph.conf 'mon data' gives nothing
[2:31] <cmccabe> how about cat /etc/ceph/ceph.conf
[2:32] <cp> that works at least :)
[2:33] <cmccabe> cp: well, I pasted your configuration and ran cconf --name mon.1 -c ./ceph.conf 'mon data'
[2:33] <cmccabe> and got /mnt/ceph/mon1
[2:34] <cp> cmccabe: Hmm... so there's something up with my vm then. Perhaps I'll try wiping it again and start over and see if the cconf command works then
[2:34] <cmccabe> so either you're running some version of the software that's really, really old, or there's some kind of permissions problem
[2:34] <cmccabe> what version of the software are you running
[2:35] <cp> cmccabe: how do I tell?
[2:35] * yoshi (~yoshi@p15222-ipngn101marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:35] <cmccabe> ./ceph --version
[2:35] <cp> ceph version 0.24.3 (commit:2cd2c56dd07c4862da6a5a8b4c2febafacc37d22)
[2:36] <cmccabe> cp: that is pretty old, 7 months ago
[2:36] <cp> Ahhh...
[2:36] <cp> I used "apt-get install ceph"
[2:36] <cmccabe> cp: another thing is that monitors should be named mon.a, mon.b and so forth.
[2:36] <joshd> which repo is that from?
[2:36] <cmccabe> cp: I don't think anyone has tested configurations with mon.1, mon.2 and so forth
[2:36] <joshd> that might be the default one in ubuntu
[2:37] <cmccabe> cp: mon.1 might work, but nobody else but you is trying it, so...
[2:37] <joshd> you should add our repo for up to date packages: http://ceph.newdream.net/wiki/Debian#Release_packages
[2:37] <cp> cmccabe: I'm happy to follow the crowd
[2:37] <cmccabe> cp: hopefully the newer software won't have those parsing problems
[2:38] <cmccabe> cp: I honestly don't remember what cconf did 7 months ago; it was a little buggier then as I recall
[2:38] <cp> cmccabe: :) Well I'll try over with a newer version as soon as I get it down. Thanks
[2:38] <cmccabe> k
[2:44] * cmccabe (~cmccabe@ has left #ceph
[2:49] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:21] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:25] * adjohn is now known as Guest10059
[3:25] * adjohn (~adjohn@ has joined #ceph
[3:26] * adjohn (~adjohn@ Quit ()
[3:27] * Guest10059 (~adjohn@ Quit (Read error: Operation timed out)
[3:30] * jojy (~jojyvargh@70-35-37-146.static.wiline.com) Quit (Quit: jojy)
[3:44] * sagelap (~sage@ has left #ceph
[3:54] * cp (~cp@ Quit (Quit: cp)
[4:17] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[4:17] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) Quit ()
[6:34] * yehuda_hm (~yehuda@bzq-79-183-235-246.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[6:45] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[7:35] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) has joined #ceph
[7:55] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[8:28] * lxo (~aoliva@09GAAAF7K.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[8:37] * yehuda_hm (~yehuda@ has joined #ceph
[9:31] * ghaskins_ (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Read error: Connection reset by peer)
[9:31] * ghaskins_ (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[9:41] * Ormod (~valtha@ohmu.fi) Quit (Ping timeout: 480 seconds)
[9:44] * Ormod (~valtha@ohmu.fi) has joined #ceph
[10:46] * djlee (~dlee064@des152.esc.auckland.ac.nz) Quit (Quit: Ex-Chat)
[11:10] * yehuda_hm (~yehuda@ Quit (Ping timeout: 480 seconds)
[12:38] * yehuda_hm (~yehuda@ has joined #ceph
[12:40] * yehuda_hm (~yehuda@ Quit (Read error: Connection reset by peer)
[14:25] * gregorg (~Greg@ Quit (Quit: Quitte)
[14:25] * gregorg (~Greg@ has joined #ceph
[14:31] * gregorg (~Greg@ Quit (Quit: Quitte)
[14:31] * gregorg (~Greg@ has joined #ceph
[14:36] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Remote host closed the connection)
[15:33] * yoshi (~yoshi@p15222-ipngn101marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[15:47] * lxo (~aoliva@83TAADCTR.tor-irc.dnsbl.oftc.net) has joined #ceph
[16:37] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[17:03] * aliguori (~anthony@ has joined #ceph
[17:09] <wido> The Ceph Ubuntu packages got updated to 0.34: http://packages.ubuntu.com/search?keywords=ceph&searchon=names&suite=oneiric&section=all
[17:09] <wido> So 11.10 will have a newer version of Ceph
[17:16] * Tv_ (~Tv_@cpe-76-168-227-45.socal.res.rr.com) has joined #ceph
[17:21] <wido> I had a few dead disks again, so my cluster has some really heavy recovery to do, which is not working out that well.
[17:21] <wido> On all my machines I see "btrfs-cleaner" eating about 50% of all the CPU capacity and the OSDs eating the rest
[17:21] <wido> are you seeing the same?
[17:45] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[18:01] * acaos (~zac@209-99-103-42.fwd.datafoundry.com) has joined #ceph
[18:02] <acaos> Hello; I'm setting up a Ceph cluster and I was wondering if anyone has any experience with memory usage ballooning on the cosd processes
[18:06] <Tv_> acaos: we had a bug like that recently
[18:07] <acaos> Anything I could do to help you debug it? Or should I cherry-pick a fix?
[18:07] <Tv_> acaos: i see a few fixes going in on the 7th
[18:07] <acaos> I'm running the 0.34 release; 3 monitors and 320 osds
[18:07] <Tv_> acaos: if this is just a test setup, you might want to try running master until 0.35 comes out
[18:08] <acaos> it's hopefully production soon, but right now it's still in testing, so I'll pull master and give it a shot
[18:08] <Tv_> acaos: commits 228bd59216e355e95d5484259b1ee5acd369d8c4 and 676dc9ce6a485ec1fbd4ed03fce89109e64290d4 are mem leak fixes in cosd
[18:09] <acaos> also, the thread bloat with that many OSDs and the default crush map setup is .. an issue
[18:09] <Tv_> those don't look like the fairly serious mem leak i remember hearing about earlier
[18:09] <Tv_> acaos: are you running lots of OSDs per host?
[18:09] <acaos> yes
[18:09] <acaos> 1 per device
[18:10] <Tv_> acaos: how many devices, how many cpu cores?
[18:10] <acaos> 16 devices, 16 cores
[18:10] <Tv_> acaos: oh that actually sounds fairly balanced
[18:11] <Tv_> acaos: it pretty much comes down to this: we're not focusing on performance quite yet; right now we'll be happy to hear it works right ;)
[18:11] <acaos> with that many OSDs, it falls over badly with the default crush map; runs out of available pids
[18:12] <Tv_> acaos: your 320 osd setup sounds big enough that you probably want to tune pgs & crush rules to match your setup
[18:12] <acaos> yeah, already had to do that
[18:12] <Tv_> whoa
[18:12] <acaos> even with our tuning, over 5000 threads per host
[18:12] <Tv_> acaos: i haven't heard of *that*
[18:14] <Tv_> acaos: ok so two things: 1. 5000 threads for 16 osds ~= 312 threads/osd, which sounds a bit high but i haven't checked lately
[18:14] <Tv_> acaos: but 2. 5000 threads is *nothing* to a modern linux box!
[18:14] <Tv_> it definitely should not be running out of pids
[18:15] <acaos> that's post-tuning, pre-tuning it was over 30000
[18:15] <Tv_> ah that i can see being an issue
[18:16] <acaos> the problem is the default crush map has every OSD talking to every other, pretty much, so you've got at least 320 * 320 threads right there
[18:16] <acaos> well, 16 * 320, but I think it has one thread per direction, too
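
The thread figures traded above can be checked with a few lines of arithmetic. This is only a rough sketch: the one-thread-per-direction model is acaos's guess from the log, not a confirmed description of the messenger, and the variable names are made up for illustration.

```python
# Figures quoted in the discussion above.
osds_total = 320
osds_per_host = 16
threads_per_host = 5000  # acaos's post-tuning number

# Tv_'s estimate of threads per OSD.
threads_per_osd = threads_per_host / osds_per_host

# Assumption: every local OSD connects to every OSD in a flat crush map,
# with one thread per direction. Real behaviour may differ.
connections_per_host = osds_per_host * osds_total
threads_estimate = connections_per_host * 2

print(threads_per_osd)       # 312.5
print(connections_per_host)  # 5120
print(threads_estimate)      # 10240
```

Under these assumptions a flat map already implies on the order of ten thousand messenger threads per host, which is the kind of growth acaos describes before grouping the OSDs.
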
[18:16] <Tv_> that's what PGs are for
[18:17] <acaos> yeah
[18:17] <acaos> what I did was grouped the OSDs into subpools so they had a limited group of others they talk to
[18:17] <bchrisman> ahh.. that makes sense.. the default is a flat space of OSDs.. which is bad for everything when you have more than a trivial cluster.
[18:18] <Tv_> but even then, it's not really OSDs that talk to OSDs, it's PG talking to other replicas of the PG
[18:18] <acaos> yeah
[18:18] <Tv_> that should limit the amount of communication
[18:18] <acaos> if, say, osd.1 and osd.2 share 3 PGs, do they establish 3 connections or 1?
[18:18] <bchrisman> yeah... the flat space means that osds could potentially talk to every other osds.. by making that hierarchy.. that limits it.. right?
[18:19] <acaos> bchrisman: correct, that's how we solved it
[18:20] <Tv_> my understanding is that the crush hierarchy is more a tool for fault isolation
[18:20] <acaos> Tv_: we killed two birds with one stone there
[18:20] <bchrisman> seems like at least the idea of having multiple nodes with multiple osds is common enough that the default should be to have node-level redundancy.. but right now it doesn't.
[18:20] <bchrisman> (the default crushmap.. that is)
[18:21] <Tv_> bchrisman: the flat default is there because it's hard to automagically detect what osds are on the same node
[18:21] <Tv_> bchrisman: that'll change when we publish the chef cookbooks to bring up a whole cluster at once
[18:22] <acaos> anyway, I will give master a try and see if it helps our memory leak issues, because we were having each OSD use 500m-2g after a weekend of benchmarking, and the monitors were using >20g
[18:22] * eternaleye (~eternaley@ Quit (Ping timeout: 480 seconds)
[18:22] <bchrisman> yeah... can drag that out of ceph.conf, but it's not 'naturally enshrined'.. :)
[18:22] <Tv_> my understanding is that the thing you really care about is number of PGs per physical machine
[18:22] <Tv_> that's the unit of memory, cpu etc consumption
[18:23] <Tv_> so with 16 osds per machine, you might accidentally have way too many PGs on the machine
[18:23] <Tv_> and the CRUSH hierarchy shouldn't really be all that related
[18:23] <acaos> we've only got ~8000 PGs at the moment
[18:23] <acaos> though we want to get up to a lot more
[18:23] <Tv_> acaos: so ~400 per machine? last I heard ~100 was a good guess for max PGs for one host
[18:24] <acaos> the docs said 100/OSD
[18:24] <acaos> rather than 100/host
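
The per-host versus per-OSD distinction is easy to see with the numbers from this exchange. A minimal sketch, assuming each PG is counted once (replicas ignored) and using a hypothetical helper name `pg_density`:

```python
def pg_density(total_pgs, total_osds, osds_per_host):
    """Return (PGs per host, PGs per OSD), counting each PG once."""
    hosts = total_osds // osds_per_host
    return total_pgs / hosts, total_pgs / total_osds

# ~8000 PGs, 320 OSDs, 16 OSDs per host, as stated in the log.
per_host, per_osd = pg_density(8000, 320, 16)
print(per_host, per_osd)  # 400.0 25.0
```

So the cluster is well under a 100-per-OSD guideline but about four times a 100-per-host one, which is why the two readings of the docs lead to different conclusions.
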
[18:24] <Tv_> acaos: (we haven't benchmarked it yet to find the sweet spot, and we definitely haven't tuned it much)
[18:24] <Tv_> acaos: ahh docs are confusing there
[18:24] <Tv_> acaos: cpu & ram is consumed from the whole host
[18:24] <acaos> true
[18:24] <Tv_> acaos: we sometimes mistake "osd" to be "host"
[18:24] <Tv_> trying to get out of that as much as we can, but there's lots of old references
[18:25] <acaos> thing is, every pool we make is adding more PGs too
[18:25] <Tv_> acaos: oh and performance of large numbers of pools is also a bit of an unknown right now
[18:26] <Tv_> because of just this overhead
[18:26] <acaos> how's performance of pools with very large numbers of objects?
[18:26] <Tv_> should be significantly easier
[18:26] <Tv_> i don't have a good benchmark on ridiculously large pools, but nothing obvious jumped out
[18:26] <acaos> I am noticing though that pgs are flat directories inside
[18:27] <acaos> wouldn't that have directory performance issues at some point?
[18:27] <Tv_> there's work pending on auto-split/merge of pgs
[18:27] <Tv_> but also, htrees & btrfs are way better at flat directories these days than old school unix
[18:27] <acaos> I have some experience with ext4 with very large directories and the performance is ... poor even with directory hashing
[18:28] <Tv_> in the future, it'll hash the contents out to subdirs when the dir gets too big
[18:28] <acaos> yeah, the stuff I work on uses directory hashing extensively
[18:28] <Tv_> depends on the operations you do; and "ls" will still suck, independently
[18:28] <acaos> is that the intent of the new HashIndex stuff which was just committed?
[18:28] <acaos> (for hashed subdirectories)
[18:28] <acaos> I was reading the commits, but it was a little unclear what the new filestore workw as for
[18:28] <acaos> er, was for
[18:29] <Tv_> yeah that's all PG expansion/contraction stuff
[18:29] * gregaf1 (~Adium@aon.hq.newdream.net) has joined #ceph
[18:29] * sjust (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[18:29] * eternaleye (~eternaley@ has joined #ceph
[18:30] * Tv|work (~Tv|work@aon.hq.newdream.net) has joined #ceph
[18:30] * gregaf (~Adium@aon.hq.newdream.net) Quit (Read error: Connection reset by peer)
[18:30] <Tv_> heh
[18:30] * Tv_ waves at Tv|work
[18:30] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Read error: Connection reset by peer)
[18:30] <acaos> is there an additional architecture mailing list, or other source I could keep up on?
[18:30] <Tv_> acaos: if you want to chat more about that, sjust is the one working on it
[18:31] * sagewk (~sage@aon.hq.newdream.net) Quit (Read error: Connection reset by peer)
[18:31] <Tv_> acaos: ceph-devel and this irc is pretty much where it's at
[18:31] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[18:31] <acaos> because I was very interested in the filestore changes, but I couldn't find as much info as I wanted
[18:31] <acaos> ok, thank you
[18:35] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:48] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Remote host closed the connection)
[18:52] * eternaleye (~eternaley@ Quit (Quit: eternaleye)
[18:52] * eternaleye (~eternaley@ has joined #ceph
[19:05] * greglap (~Adium@aon.hq.newdream.net) has joined #ceph
[19:09] <greglap> acaos: Tv_: I'm still doing some work with memory leaks, but the really bad one in the OSDs is fixed in current master
[19:11] <Tv_> oh right clarification: sjust was working on the directory hashing things, greg has been grinding with valgrind ;)
[19:19] <acaos> greglap: thank you; I'm actually building that now to test
[19:19] <acaos> there's a really bad one in the monitors too
[19:19] <acaos> (our monitors hit >20g in size and went belly up)
[19:20] * cmccabe (~cmccabe@maa2636d0.tmodns.net) has joined #ceph
[19:27] <Tv_> sounds like a bug alright ;)
[19:28] <acaos> I suspect it may be related to the fact that one of our OSDs fell half-off the network (the cluster address did, but not the public)
[19:28] <Tv_> acaos: ooh interesting
[19:28] <acaos> however, we don't have the client_messenger/cluster_messenger fix from last week in
[19:28] <Tv_> we haven't tested failures other than fail-stop that much
[19:29] <Tv_> perhaps that left the osd in a half-alive state, and it still got messages queued for it
[19:29] <acaos> it was still able to communicate with the mon, but not the other osds
[19:29] <acaos> it was spam-killing the other osds
[19:29] <greglap> acaos: are you using cephx?
[19:29] <acaos> no, we are not
[19:29] <bchrisman> also that can screw up other nodes, as there's no throttling of repeering traffic
[19:29] <greglap> and yes, I could see a split death doing horrible things to memory on other nodes
[19:30] <acaos> the memory leak was before that split death
[19:30] <acaos> at least, the OSD one
[19:30] <acaos> the monitor one was after
[19:30] <greglap> yeah
[19:31] <greglap> the OSD one you're worried about is probably 2f04acb3ccc198076e37e4751cb71ea4fc6e6949
[19:31] <acaos> basically, it was doing stuff like this over and over: mon0 28065 : [INF] osd166 failed (by osd255
[19:31] <acaos> is the one with the half-dead network
[19:31] <greglap> although actually 8c5cb598357ea452a07704554db27bb674efe21a might be relevant too
[19:32] <acaos> let me glance at those really quickly
[19:34] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[19:34] <acaos> would that pg leak fix in 2f04... happen in a no-failure case?
[19:36] <greglap> acaos: hmm, I don't actually remember
[19:37] <acaos> hm, I have another strange issue just now, actually - I've killed all our osds, but the monitors still think: osd e16469: 320 osds: 50 up, 319 in
[19:37] <greglap> they'll go down eventually
[19:37] <Tv_> acaos: if you kill them *all*, the monitor needs to wait a timeout to realize they're dead
[19:37] <acaos> it's been 20 minutes
[19:38] <Tv_> if you kill some, the survivors report the others dead
[19:38] <Tv_> oh
[19:38] <Tv_> i think the timeout is 15 minutes
[19:38] <acaos> I'll wait a bit more, then
[19:38] <Tv_> our qa coverage doesn't handle this stuff yet :(
[19:39] <acaos> I'll need to file some bug reports =)
[19:42] <acaos> so, another question - I see blacklist applies to the osds, but is there a version for the monitors, too?
[19:42] <acaos> to tell a monitor 'there's a misbehaving osd, ignore it'
[19:43] <gregaf1> acaos: actually, the blacklist isn't used for that
[19:43] <gregaf1> it's for telling the OSDs to ignore dead MDSes and other kinds of nodes
[19:43] <gregaf1> you could maybe use it for ignoring other OSDs, but we have other tools for that
[19:44] <acaos> what would the best way to do that be, then?
[19:44] <gregaf1> sjust: actually, how do you kill an OSD via the monitors?
[19:44] <gregaf1> I forgot that marking down didn't stick if the OSD disagrees
[19:44] <sjust> gregaf1: I'm not sure, checking the wiki
[19:44] <acaos> will 'out' work?
[19:44] <acaos> or can an osd disagree?
[19:45] <sjust> acaos: marking an osd out causes the map to assign its pgs elsewhere
[19:45] <gregaf1> acaos: out should work, but it's stronger - it will remove it from the mapping and rereplicate the data elsewhere
[19:45] <acaos> yeah, I just don't know if the osd can disagree on that, too
[19:45] <acaos> (since, presumably, it would 'know' when it's up)
[19:45] <sjust> acaos: no, out/in up/down are separate
[19:45] <acaos> so an osd can never say it's in?
[19:46] <sjust> right
[19:46] <sjust> although on boot I think they default to in
[19:46] <acaos> yeah, which is essentially 'an osd can mark itself in'
[19:46] <acaos> I guess I'm asking for an 'administratively killed' status
[19:46] <sjust> yeah, we probably want to change that
[19:46] <sjust> for forcing the osd to die...
[19:47] <gregaf1> it's a bit odd for us since we presume you have good admin tools, and failures are always considered catastrophic
[19:47] <Tv_> the fire alarms in my building are playing classical music ;)
[19:47] <acaos> failures happen
[19:47] <Tv_> the joys of building maintenance
[19:47] <gregaf1> you just picked up on one of the (hopefully rare) ones that isn't
[19:47] <sjust> I think the "correct" way to do it is to just kill the process
[19:48] <acaos> yeah, it just happened that that particular server couldn't be reached by administration due to the network split
[19:48] <acaos> but it could still talk to the monitor
[19:48] <sjust> oh
[19:48] <Tv_> sjust: yeah i'm an enemy of in-ceph communication when there's plenty of good outside-ceph communication & control mechanisms
[19:49] <acaos> I would agree with that, but I think there's a good use-case for a 'killed' status
[19:49] <Tv_> i mean, what next; "ceph remoteshell"?
[19:49] <acaos> 'ceph dwim'
[19:49] <Tv_> acaos: but a good setup will just restart an exiting daemon, unless you tell the supervisor otherwise!
[19:50] <acaos> yeah, I meant by 'killed' rather 'the osd itself will attempt to die, and all other ceph services in the cluster will ignore, refuse, and close connections from it'
[19:51] <acaos> I guess it could be done with cephx; it's just rather heavy-weight to need the full auth system for that
[19:51] * cp (~cp@ has joined #ceph
[20:18] * Dantman (~dantman@S01060023eba7eb01.vc.shawcable.net) has joined #ceph
[20:40] * Tv_ (~Tv_@cpe-76-168-227-45.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[21:12] * Tv (~Tv@cpe-76-168-227-45.socal.res.rr.com) has joined #ceph
[21:12] <Tv> I wonder how many times I can spill things on my Macbook Air before it's completely fried.
[21:19] <xns> you'd be surprised, my keyboard backlight is a pleasing coffee brown now
[21:19] <Tv> i think i'm at my 4th "fails to power on" situation now
[21:19] <Tv> waiting for it to dry up..
[21:20] <Tv> that's 4 in 9 months
[21:23] <ajm> I pancaked mine flat, still going strong.
[21:38] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[21:49] * aliguori (~anthony@ has joined #ceph
[22:09] * cmccabe (~cmccabe@maa2636d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[22:42] <slang> sjust: still trying to debug this problem I'm seeing with pgs stuck in crashed+peering
[22:45] <slang> it looks like the issue is just that GetMissing doesn't send any requests out because the peer info is empty for all the acting osds in this pg
[22:45] <slang> this results in peer_missing_requested being empty and need_up_thru being true...
[22:46] <slang> so we never transition to active
[22:46] * jclendenan (~jclendena@ Quit (Read error: Connection reset by peer)
[22:49] * jclendenan (~jclendena@ has joined #ceph
[23:02] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) has joined #ceph
[23:11] * ghaskins_ (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: Leaving)
[23:16] * cp (~cp@ Quit (Quit: cp)
[23:23] <sjust> slang: I'm making some progress now, sorry for the delay
[23:29] <sjust> slang: which osd got marked lost?
[23:29] <acaos> gregaf1: the changes in 0.35 have had an interesting effect; vsize for the cosd processes is the same as before (and appears to be slowly growing), but rss is much lower (current split is ~650m/~200m; before the vsize and rss were almost the same)
[23:31] * al (d@niel.cx) Quit (Ping timeout: 480 seconds)
[23:32] <greglap> acaos: you mean the current master over the last release?
[23:32] <acaos> yes, sorry
[23:32] <acaos> and wrong person as well, sorry
[23:32] <acaos> current master versus 0.34
[23:33] <greglap> acaos: they're both me, I'm running around so I'm on multiple computers at different times today :)
[23:33] <acaos> I am enlightened =)
[23:33] <acaos> screen junkie here, so I tend to just log in once
[23:33] <greglap> heh
[23:34] <sjust> slang: I seem to have found the (a?) problem
[23:34] <greglap> there are definitely still some leaks of various types that might be accounting for lost vsize, or it might just be the allocation patterns, not sure
[23:34] <sjust> I kid you not, it's a 0 vs o bug
[23:34] <acaos> it could be a memory fragmentation thing, yeah
[23:34] <acaos> I have had no real luck trying to use tcmalloc's heap dump, though
[23:35] <greglap> hmmm, we were just using it today
[23:35] <acaos> well, it's functional, just unenlightening
[23:35] <greglap> had some trouble with debug symbols but otherwise it was fine
[23:35] <greglap> ah, yeah
[23:35] <acaos> yeah, debug symbol issues, and telling me 27.1% of the memory is used by std::string
[23:35] <greglap> oh, learning to read it can take a while
[23:36] <greglap> it's probably easier if you go through the hassle of installing the graphical bits
[23:36] <acaos> some day I'll be able to read 0000000000571af8
[23:36] <greglap> then you can see call strings for who owns it and stuff
[23:37] <greglap> ah, yeah, that's probably a problem with debug symbols
[23:37] <greglap> which...I can't help you with, unfortunately :/
[23:37] <greglap> probably the vsize is so large because of the number of threads for the messenger
[23:37] <acaos> I have them all built and around, I can gdb and so on just fine
[23:37] <acaos> but pprof doesn't seem to be able to use them
[23:37] <sjust> slang: I just pushed a fix to master for a prior_set bug which is probably causing your issues
[23:37] <acaos> I probably just need to figure out the right invocation
[23:37] <greglap> acaos: oh, strange - perhaps it doesn't do as well pulling the symbols in; it's just a perl script
[23:37] <greglap> or you could probably get gdb to translate the offsets into file and line numbers?
[23:38] <greglap> not sure about that, though
[23:38] <acaos> that actually failed too, I tried disas *addr
[23:38] <acaos> and that didn't get me anywhere
[23:38] <greglap> hmm, dunno then
[23:38] <acaos> tcmalloc's offsets don't match with gdb's for whatever reason
[23:38] <acaos> I'll figure it out, just not as enlightening as I had hoped
[23:39] <acaos> it could be thread stacks using up all that vsize, but the interesting thing was just that before, vsize and rss were almost the same
[23:39] <acaos> but now, big difference
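
For anyone wanting to track the vsize/rss split discussed here, one option on Linux is to read `VmSize` and `VmRSS` from `/proc/<pid>/status`. The sketch below only parses text in that format; the function name `parse_vm_sizes` and the sample values (chosen to mirror the ~650m/~200m split mentioned above) are assumptions for illustration, not measurements.

```python
def parse_vm_sizes(status_text):
    """Extract VmSize and VmRSS (in kB) from /proc/<pid>/status text."""
    sizes = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # Lines look like "VmRSS:\t  204800 kB"; take the number.
            sizes[key] = int(rest.split()[0])
    return sizes

# Hypothetical sample text, not taken from a real cosd process.
sample = "Name:\tcosd\nVmSize:\t  665600 kB\nVmRSS:\t  204800 kB\n"
print(parse_vm_sizes(sample))  # {'VmSize': 665600, 'VmRSS': 204800}
```

On a live system the input would come from reading `/proc/<pid>/status` for the process of interest; sampling it periodically would show whether vsize keeps growing while rss stays flat.
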
[23:40] <greglap> hmm, not sure then
[23:45] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[23:53] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[23:55] <acaos> ah, very nice - heap release worked to drop a lot of that RSS too
[23:56] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) Quit (Quit: jojy)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.