#ceph IRC Log


IRC Log for 2012-09-19

Timestamps are in GMT/BST.

[0:17] <pentabular> re: topic: still?
[0:20] <nhmlap_> hrm, that probably needs to be fixed
[0:35] <pentabular> does ceph use multiple config files e.g. per-host in /etc/ceph?
[0:35] <dmick> lol, no ops
[0:35] <elder> dmick, I can't get to plana53 console again. (Thanks to my !#@ freezing machine.) Can you help?
[0:35] <elder> Info: SOL payload already active on another session
[0:36] <dmick> ISTR ipmitool has a "steal" option
[0:36] <dmick> try "sol deactivate"
[0:36] <elder> OK.
[0:37] <pentabular> re: configs: or just per-cluster?
[0:39] <joshd> pentabular: just per-cluster - the default is /etc/ceph/$cluster.conf, and the default cluster is ceph
[0:39] <elder> dmick, "sol deactivate" did the trick for me. I have a new option in my plana script now.
[0:40] <elder> Thank you.
[0:40] <joshd> pentabular: you can always specify a config file to use instead
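A minimal sketch of the lookup joshd describes, assuming the only inputs are a cluster name and an optional explicitly specified file. The helper name `conf_path` and its parameters are hypothetical and for illustration only; this is not Ceph's own code.

```python
from typing import Optional


def conf_path(cluster: str = "ceph", override: Optional[str] = None) -> str:
    """Return the config file path for a cluster (hypothetical helper)."""
    # An explicitly specified config file always wins.
    if override:
        return override
    # Otherwise use the per-cluster default, /etc/ceph/$cluster.conf,
    # where the default cluster name is "ceph".
    return f"/etc/ceph/{cluster}.conf"


print(conf_path())                           # default cluster
print(conf_path("backup"))                   # "backup" is a made-up cluster name
print(conf_path(override="/tmp/test.conf"))  # explicit file instead
```

With no arguments this resolves to /etc/ceph/ceph.conf, which matches the default named above.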
[0:40] <dmick> Que bon.
[0:40] <pentabular> trying to separate various deploy automations from what is actually ceph
[0:40] <pentabular> e.g. puppet, chef...
[0:40] <pentabular> ceph-deploy
[0:41] <dmick> or, C'est bon. I don't know. I don't do French.
[0:42] <dmick> it should be said though that ceph.conf doesn't necessarily have to be the same on every machine
[0:42] * pentabular should probably look for installed manpages
[0:42] <dmick> if that's more what you were driving at. dunno
[0:43] <nhmlap_> dmick: I originally read that as "ceph.com".
[0:43] <nhmlap_> dmick: yay for dynamic webpages
[0:45] <dmick> heh
[0:46] <dmick> I suppose ceph.conf could be made to contain a URI
[0:46] <dmick> (I'm in a sick mood today; I blame libvirt)
[0:47] <nhmlap_> lol
[2:29] * pentabular (~sean@adsl-70-231-141-128.dsl.snfc21.sbcglobal.net) has joined #ceph
[2:31] <pentabular> why are there empty init conf files?
[2:31] <pentabular> /etc/init/ceph-mds-all.conf
[2:31] <pentabular> /etc/init/ceph-mon-all.conf
[2:31] <pentabular> ^ ^ nothing but a one-line description
[2:32] <dmick> I dunno, let's look
[2:32] <dmick> mine are not empty
[2:32] <pentabular> populated dynamically?
[2:32] <dmick> possibly they were placeholders for when we knew they'd contain contents
[2:33] <dmick> so that the packaging was done before the upstart scripts were complete
[2:33] <dmick> that's a guess
[2:33] <pentabular> ah.. these are in fact from the debian repo
[2:33] <dmick> let's git log them
[2:33] <dmick> and see
[2:34] <dmick> hum. no, the only version I have has contents in it
[2:35] <dmick> hm. but not the contents I have in my filesystem. Perhaps upstart scripts are magic
[2:35] <pentabular> never really heard of dynamically created upstart scripts, but why not I guess
[2:36] <pentabular> haven't got very far.. perhaps magic will occur later
[2:36] <dmick> well, as I say, the sources are in the github repo
[2:36] <dmick> but they don't match
[2:37] <pentabular> somewhere under debian?
[2:37] <dmick> oh hang on, looking at wrong branch
[2:37] <dmick> src/upstart
[2:37] <pentabular> lol. thks
[2:39] <pentabular> looks like Tv may have added content there just last week
[2:41] <pentabular> should I be using github or tarball release instead of debian repo?
[2:42] <pentabular> that sounds obvious out loud
[2:42] <dmick> well
[2:42] <dmick> the debian repo is also built nightly
[2:42] <dmick> (or actually on-demand)
[2:42] <dmick> just depends on which debian repo
[2:43] <pentabular> ORLY?
[2:43] <dmick> yeah for instance
[2:44] <dmick> I get the bleeding edge every time I upgrade
[2:44] <pentabular> I'm using "deb http://ceph.com/debian/ precise main"
[2:45] <pentabular> ..wouldn't mind using whatever dev branch
[2:45] <dmick> those are the "STABLE RELEASE"
[2:45] <dmick> see http://ceph.com/docs/master/install/debian/ for other options
[2:45] <pentabular> "STABL^H^HE"
[2:45] <dmick> but of course you get what you pay for
[2:45] <dmick> might be busted
[2:46] <pentabular> N.P. OK w/ that
[2:46] <dmick> I just mean that's the heading in that section
[2:46] <pentabular> better than being lost in the past
[2:47] <dmick> yeah, ok, that matches now
[2:47] <dmick> you probably had 3764ca6115fe7c0bf8dcba448af012625ea5abc3 in yours
[2:47] <dmick> I had 60e273ad5cdaaea9f3177e4eb7369a591f6105bb in mine
[2:48] <pentabular> dunno where to spot that in the installed stuff
[2:49] <pentabular> ah.. yes, re: ...install/debian. thanks.
[2:50] <pentabular> had got my repo from the "5-min. quickstart guide"
[2:51] <pentabular> domo arigato
[2:51] <pentabular> [mr. robato]
[2:52] * pentabular sploosh
[2:52] * pentabular (~sean@adsl-70-231-141-128.dsl.snfc21.sbcglobal.net) has left #ceph
[3:31] * sagelap (~sage@150.sub-166-250-34.myvzw.com) has joined #ceph
[3:39] * pentabular (~sean@adsl-70-231-141-128.dsl.snfc21.sbcglobal.net) has joined #ceph
[3:41] <pentabular> how many things need a port in ceph? Is it just the one "6789" (example)?
[3:44] <pentabular> my other guess would be: mon, rbd, and osd (3 separate ports)
[3:45] * sagelap (~sage@150.sub-166-250-34.myvzw.com) Quit (Ping timeout: 480 seconds)
[3:47] <pentabular> okay, no.. just the one port.. it's all on 6789 (or whatever you set), right?
[8:49] <xiu> hi, is using rbd on a osd server still tricky ?
[10:51] <NaioN> xiu: yes :)
[11:27] <xiu> NaioN: ok thanks :) just found out http://tracker.newdream.net/issues/3076
[13:39] <nhmlap_> good morning #ceph
[13:44] <joao> good morning nhmlap_ :)
[16:13] <amatter_> I need to take a large osd down for maintenance. Is there a way to suspend the replication of its several terabytes of data while it is down so that the other osds aren't flooded with activity and the cluster takes a big performance hit?
[16:22] <nhmlap_> amatter_: hrm, I'm not sure. There is a pause/unpause command that seems to stop IO requests from going to the OSDs, but I don't think that's quite what you want.
[16:23] <nhmlap_> ugh, too early in the morning
[16:26] <joao> I don't think there is a way to do specifically that, but I believe there's a timeout that's responsible for triggering the replication once an osd is down for too long
[16:27] <joao> maybe setting that timeout for a higher value could avoid the replication during the maintenance, but I don't think that's advised
[16:36] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:59] <mgalkiewicz> anybody home?
[16:59] <nhmlap_> mgalkiewicz: Sort of, what's up?
[16:59] <mgalkiewicz> failed upgrade from 0.48.1 to 0.51
[17:00] * nhmlap_ crawls back under the covers
[17:00] <mgalkiewicz> I have upgraded osd0 to 0.51 and waited some time for recovery
[17:00] <nhmlap_> mgalkiewicz: well that doesn't sound pleasant. Any messages?
[17:01] <mgalkiewicz> when it was finished I have shut down osd2 and started upgrade
[17:01] <mgalkiewicz> during this osd0 and osd1 started to crash
[17:01] <mgalkiewicz> I will provide you with logs in a minute
[17:02] <nhmlap_> mgalkiewicz: Ok. Not sure how much help I can be but I'll give it a try.
[17:03] <elder> I don't remember seeing this before in my teuthology runs. Is it new?
[17:03] <elder> <type 'exceptions.StopIteration'>
[17:05] <nhmlap_> elder: No idea. I haven't been using teuthology lately.
[17:05] <mgalkiewicz> https://dl.dropbox.com/u/5820195/ceph-osd.1.log.gz
[17:07] <mgalkiewicz> after the upgrade I have started osd2 and shut down osd1
[17:07] <nhmlap_> mgalkiewicz: do you use cephx?
[17:07] <mgalkiewicz> osd1 is the only not upgraded
[17:07] <mgalkiewicz> yes
[17:07] <mgalkiewicz> now the cluster is slowly recovering but it shows some pgs unfound
[17:08] <mgalkiewicz> do you think it is a good idea to upgrade osd1 and start it?
[17:08] <mgalkiewicz> without upgrading the rest of osds crash randomly
[17:08] <nhmlap_> ok, so you only updated the one OSD and not the other?
[17:09] <mgalkiewicz> osd0 is upgraded, osd1 not and osd2 was stopped for upgrade during which the problems started
[17:10] <mgalkiewicz> all three are replicated
[17:11] <nhmlap_> hrm, I wonder if this is the same problem as here: http://tracker.newdream.net/issues/3072
[17:14] <mgalkiewicz> I guess that a debian package with this fix is not released yet
[17:14] <nhmlap_> There is a patch for the problem reported in the tracker: http://tracker.newdream.net/projects/ceph/repository/revisions/03136d057f0048e9cd840a6e83efedfc20969247
[17:15] <mgalkiewicz> the log I sent you is from 0.48.1
[17:15] <mgalkiewicz> it looks like the problem does not occur on upgraded osds 1 and 2
[17:15] <nhmlap_> mgalkiewicz: Yeah, I'm not sure how long that bug has been present.
[17:15] <mgalkiewicz> 0 and 2
[17:16] <mgalkiewicz> do you think that the upgrade of osd 1 may help?
[17:16] <mgalkiewicz> it is now shut down and osd0 and 2 are recovering the cluster
[17:18] <nhmlap_> mgalkiewicz: It doesn't seem like that bug was fixed until after 0.51 was released, but it's always possible that something else resolved the issue on your 0.51 nodes.
[17:19] <mgalkiewicz> for now it looks like 0.51 can't cooperate properly with 0.48.1
[17:20] <mgalkiewicz> I did the same in my staging cluster and it worked without a problem
[17:20] <nhmlap_> mgalkiewicz: I don't do a lot of mixed cluster testing myself, so I can't really say. It may be that a bug in 0.48.1 is getting exposed when you upgraded the other nodes.
[17:25] <mgalkiewicz> and what about those slow requests?
[17:26] <mgalkiewicz> nhmlap_: what might have caused it?
[17:26] <mgalkiewicz> status: https://gist.github.com/3750250
[17:27] <mgalkiewicz> it does not seem to recover further
[17:30] <nhmlap_> mgalkiewicz: I don't have any real expert knowledge of this problem. Based on the tracker entry it sounds like during replay you'll end up waiting on the wrong version of a repop. I think we'll need to wait for Sam or Sage to get a better answer if this is what you are seeing and what we should do from here.
[17:31] <nhmlap_> mgalkiewicz: I don't want to give you bad advice that could screw it up more.
[17:32] <mgalkiewicz> ok but if I understand correctly only those 32 pgs (unfound) are not usable right now?
[17:33] * maelfius (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) has joined #ceph
[17:38] <nhmlap_> mgalkiewicz: not sure if this might be useful to check: http://ceph.com/docs/master/cluster-ops/troubleshooting-osd/?highlight=unfound
[18:23] <mgalkiewicz> nhmlap_: I have upgraded osd1 and after start everything is fine
[18:23] <mgalkiewicz> only mds is laggy for some reason; nothing interesting in mds logs
[18:23] <nhmlap_> mgalkiewicz: Oh good. I'm glad that's all it took.
[22:16] <pentabular> is there no support for 32-bit systems?
[22:16] <pentabular> re: "Ceph was designed to run on commodity hardware"
[22:17] <joshd> huh? where do you get that idea?
[22:17] <pentabular> bot. of hardware requirements
[22:18] <pentabular> http://ceph.com/docs/master/install/hardware-recommendations/
[22:18] <joshd> the only thing that doesn't work on 32-bit is the fuse client for cephfs
[22:18] <pentabular> of all things, really?
[22:18] * pentabular stops being snarky
[22:19] <joshd> hehe
[22:19] <pentabular> good to know, though the listed min. req's say 64
[22:20] <joshd> looks like a bad header
[22:20] <joshd> the section above has 64/i386
[22:20] <pentabular> okay, nevermind my 64/286 :)
[22:25] <pentabular> obviously 64-bit is going to support better scale, just trying out on 32-bit hardware
[22:26] <dmick> I didn't know 32 bit hardware still existed :-P
[22:26] <dmick> did you blow the dust out first pentabular? :)
[22:26] <dmick> (I'm just kidding)
[22:26] <pentabular> oh, there were a few moths between the tubes..
[22:26] <dmick> Hopper bugs!
[22:27] <pentabular> picture a McIntosh amp w/ monitor and keyboard...
[22:28] <pentabular> oops: forgot to feed the squirrel..
[22:28] <dmick> it's kinda slow, but it sounds *fabulous*
[22:29] <pentabular> dmick: noticed the missing init files / contents also in dev/release pkgs
[22:29] <pentabular> they are older than Tv's commit
[22:29] <pentabular> so.. dev/qa-test it is..
[22:30] <pentabular> oh yeah: that's why I asked:
[22:30] <dmick> pentabular: yes, that stuff is evolving quickly
[22:30] <pentabular> the repo for the QA/Test pkgs includes "-x86_64"
[22:30] <dmick> we may well not be building 32-bit packages; that's possible
[22:31] * pentabular warms hands over glowing tubes
[22:31] <pentabular> huh.
[22:31] <dmick> or maybe not
[22:31] <dmick> http://gitbuilder.ceph.com/ceph-deb-precise-i686-basic/ref/master/
[22:31] <kyle_> Hello, excuse my ignorance. I haven't used IRC since the nineties. I have a small ceph cluster and have a question about ceph-mds CPU usage.
[22:32] <dmick> ask away kyle_
[22:32] <kyle_> My MDS server is only using one core by the looks of it; should I have multiple instances per server to utilize all cores?
[22:33] <dmick> pentabular: those haven't been refreshed lately tho
[22:35] <gregaf1> kyle_: systems with more than one active MDS are considerably more likely to break
[22:35] <pentabular> dmick: indeed, but thanks.
[22:35] <gregaf1> unfortunately it does have a "Big MDS Lock" so it's effectively pretty single-threaded
[22:36] <gregaf1> but I'm afraid you're going to have to live with it unless you like being very bleeding-edge (we don't call CephFS production-ready yet, but there are lots of workloads a single-MDS system does fine at; not so much with a multi-MDS system)
[22:37] <dmick> it might be worth restating that the MDS is not as much of a bottleneck with CephFS as it is with other distributed filesystems
[22:38] <kyle_> So is adding multiple MDS servers a stable solution, or are there settings I can tweak to squeeze better utilization per box?
[22:38] <nhm> kyle_: are you seeing poor metadata throughput?
[22:41] <joao> btw, is any of you guys having issues with thunderbird syncing with the inktank account?
[22:41] <kyle_> Currently I have a DRBD dual-server setup that is running out of space. So as a more scalable solution I have decided to use ceph. Currently I am rsyncing my DRBD servers to the ceph cluster. It seems to be going slowly, and I noticed in top that my mds server never goes above 13% (with irix mode off). This box has dual quad-core Xeons.
[22:44] <darkfader> wtf is an irix mode?
[22:44] <darkfader> *chuckle*
[22:45] <kyle_> when you are in top hit shift+i and it accounts for all cores
[22:45] <darkfader> ahhh
[22:45] <darkfader> hehe ok
[22:45] <kyle_> I don't have much more in terms of numbers. My initial benchmarks using sysbench were limited by the NIC (which I plan to bond soon). Local benchmarks (not through mounting externally) were well above 300 MB/s.
[22:45] <darkfader> nice :>
[22:46] <kyle_> So i believe the mds CPU usage is bottlenecking. But I can have better numbers for you after this initial sync. Not sure how long it will take since it's about 725GB.
[22:47] <gregaf1> yeah, rsync is a pretty mean workload for the MDS — it does stuff that's not optimized as well as a lot of the operations are
[22:47] <darkfader> do you see the mds process always consume 100% one of the cores?
[22:47] <gregaf1> (and of course rsync on a distributed FS is always going to be slow compared to a local FS)
[22:47] <darkfader> 8 cores could end up as 13% for one fully loaded
[22:47] <kyle_> yes but nothing more than the 100%
[22:47] <kyle_> yeah exactly
[22:48] <darkfader> ok then you're right
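The arithmetic behind darkfader's estimate can be checked with a short sketch. It assumes the box has 8 cores (dual quad-core, as kyle_ says) and that top with irix mode off reports usage as a share of the whole machine. The function name `overall_cpu_percent` is hypothetical.

```python
def overall_cpu_percent(busy_cores: float, total_cores: int) -> float:
    """Whole-machine CPU share when `busy_cores` cores are fully loaded."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return 100.0 * busy_cores / total_cores


# One saturated core on an 8-core box.
print(overall_cpu_percent(1, 8))  # 12.5
```

One saturated core out of eight is 12.5% of the machine, which is consistent with the roughly 13% ceiling kyle_ saw and with a process that is effectively single-threaded.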
[22:48] <gregaf1> but if you're moving from DRBD you could also use RBD and have just about the same front-end, right?
[22:48] <kyle_> i understand RSync is not the typical load. Was only curious if there was a way to get more out of mds
[22:49] <kyle_> i believe I haven't familiarized myself enough with RBD.
[22:49] <darkfader> gregaf1: could that ever be spread over multiple mds? only if he runs multiple rsyncs for different trees of the source fs, right?
[22:49] <darkfader> i successfully used rsync to kill my first ceph fs back then :)
[22:49] <kyle_> hahaha
[22:50] <gregaf1> darkfader: umm, I'd have to look again at exactly what rsync is doing and what the tree looks like
[22:50] <kyle_> is there a better way to move 1TB of data?
[22:50] <darkfader> kyle_: and triple-read what greg said about /dev/rbd / or test it
[22:50] <darkfader> because it might be the right thing
[22:50] <gregaf1> if there are parallel metadata accesses then it would be faster, if not, no
[22:50] <kyle_> hmm okay i could do parallel access i believe. good point.
[22:53] <kyle_> so i plan to add another mds server in the near future. is that as unstable as multiple daemons per box?
[22:54] <joshd> multiple active mds anywhere is unstable
[22:54] <kyle_> I'm adding it mostly for fault tolerance.
[22:54] <kyle_> i see
[22:54] <joshd> but for fault tolerance you can have another one in standby mode
[22:55] <joshd> this is the default right now, if you add another one
[22:55] <kyle_> ah that's right. I'll surely do that.
[22:58] <nhm> ooh, raspberry Pi turbo mode
[23:00] <kyle_> Thanks for all your help gentlemen. My normal load/usage i believe will be fine since this is for user generated content and I have a CDN to mitigate the load.
[23:15] * nhm pokes github
[23:19] * pentabular thinks I'll tackle building from git source
[23:19] <pentabular> for my dusty 32-bit hardware
[23:20] <dmick> pentabular: it's fairly straightforward on Ubuntu at least, and I'm led to believe not hard on Debian and Centos
[23:21] <dmick> I'll take a look at that gitbuilder and see what's up
[23:21] <pentabular> please and thank you :)
[23:32] <pentabular> master branch seems pretty quiet
[23:38] <slang> anyone want to review wip-vstartfixes for me?
[23:42] * scuttlemonkey (~scuttlemo@96-42-146-5.dhcp.trcy.mi.charter.com) has joined #ceph
[23:42] <dmick> slang: +1
[23:42] <gregaf1> are you sure that's the actual problem for vstart.sh -n though? I think it might be that without -x (cephx on) then the monitors don't create themselves a keyring but they still expect it, or something…ie, giving them an existing empty keyring masks the problem but I'm not sure it's the right fix...
[23:43] <dmick> oh other commit
[23:43] <gregaf1> (haven't looked into it in any depth myself; I just run with -x because the few times I've had to do so I was too lazy)
[23:43] <dmick> testing vstart right now, I'll test that fix
[23:44] <slang> without a keyring file the ceph-mon process fails
[23:44] <dmick> the one I had used was
[23:44] <dmick> -- [ "$cephx" -eq 1 ] && cmd="$cmd --keyring=$keyring_fn"
[23:44] <dmick> ++ cmd="$cmd --keyring=$keyring_fn"
[23:45] <dmick> I'm not sure which makes more sense
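The difference between the two lines dmick pasted can be sketched as follows. This is an illustration of the logic only, not the vstart.sh code: the function `build_cmd`, the `always_keyring` switch, and the base command `["some-tool"]` are all hypothetical, and the keyring path is a placeholder.

```python
from typing import List


def build_cmd(base: List[str], keyring_fn: str, cephx: bool,
              always_keyring: bool) -> List[str]:
    """Append --keyring either only under cephx (old line) or always (new line)."""
    cmd = list(base)
    # Old behaviour: add the option only when cephx is enabled.
    # New behaviour: add it unconditionally.
    if always_keyring or cephx:
        cmd.append(f"--keyring={keyring_fn}")
    return cmd


base = ["some-tool"]  # placeholder for whatever $cmd holds at that point
print(build_cmd(base, "keyring", cephx=False, always_keyring=False))
print(build_cmd(base, "keyring", cephx=False, always_keyring=True))
```

Only the cephx-off case differs between the two variants, which is the case slang reports as failing when no keyring file is given.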
[23:45] <slang> also, any reason not to have set -e at the top of the script?
[23:46] <dmick> really surprised -e isn't the default
[23:46] <joao> gregaf1, sagelap1, what do you think is best: cherry-pick the relevant wip-mon-gv commits and fix conflicts, or hack around the incompat feature bit to get this thing running?
[23:46] <dmick> reading its description. wonder what *doesn't* cause script exit?
[23:47] <gregaf1> not sure what you mean, joao?
[23:48] <slang> dmick: anything returning 0?
[23:49] <joao> making it impossible to run a live converted store, which has that feature set
[23:49] <gregaf1> ah, right
[23:49] <dmick> slang: yeah, but I mean -e vs non-e
[23:49] <gregaf1> and a merge is too expensive for an initial test?
[23:50] <gregaf1> in that case, yes, cherry-pick what you need, but keep a pristine branch/tag that you can use for a proper merge later
[23:50] <joao> yeah, full blown merge would be nasty
[23:50] <joao> cool, cherry-picking it
[23:58] <dmick> I can confirm -e, either set or on the #! line, seems to work for simple ('-n' only) runs, and that moving keyring to [global] seems to work
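On dmick's question of what does not cause a script to exit under -e, a small sketch may help. It assumes a POSIX `sh` is available on PATH and drives it from Python; the script text and the helper `run` are made up for the demonstration. Under POSIX rules, -e is ignored for a failing command on the left of `||` or `&&`, in an `if` or `while` condition, and in a pipeline negated with `!`.

```python
import subprocess

# Hypothetical demo script: three failures that -e ignores, then one it does not.
SCRIPT = """
false || true          # failure on the left of || is ignored
if false; then :; fi   # failure in an if condition is ignored
! true                 # a negated command does not trigger exit
echo survived
false                  # a plain failing command: -e exits here
echo unreachable
"""


def run(flags):
    """Run SCRIPT under sh with the given flags; return (exit code, words printed)."""
    result = subprocess.run(["sh", *flags, "-c", SCRIPT],
                            capture_output=True, text=True)
    return result.returncode, result.stdout.split()


print(run([]))      # without -e the script runs to the end
print(run(["-e"]))  # with -e it stops at the plain `false`
```

Without -e both echo lines run and the exit status is that of the last command; with -e the script stops at the first unguarded failure and returns its non-zero status.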

