#ceph IRC Log

IRC Log for 2012-11-05

Timestamps are in GMT/BST.

[0:03] * stass (stas@ssh.deglitch.com) Quit (Read error: Connection reset by peer)
[0:03] * stass (stas@ssh.deglitch.com) has joined #ceph
[0:03] * Cube (~Cube@184-231-35-186.pools.spcsdns.net) Quit (Quit: Leaving.)
[0:10] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[0:29] * vhasi_ (vhasi@vha.si) has joined #ceph
[0:31] * vhasi (vhasi@vha.si) Quit (Ping timeout: 480 seconds)
[0:33] * Cube (~Cube@184-231-35-186.pools.spcsdns.net) has joined #ceph
[0:37] * maxim (~pfliu@111.194.201.250) has joined #ceph
[0:41] * Cube (~Cube@184-231-35-186.pools.spcsdns.net) Quit (Ping timeout: 480 seconds)
[1:04] * joao (~JL@62.50.239.160) Quit (Quit: Leaving)
[1:09] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[1:13] * maxim (~pfliu@111.194.201.250) Quit (Quit: Ex-Chat)
[1:15] * loicd (~loic@31.36.8.41) Quit (Quit: Leaving.)
[1:22] <lennie> ok, good bye till an other day...
[1:22] * lennie (~leen@524A9CD5.cm-4-3c.dynamic.ziggo.nl) Quit (Quit: bye)
[1:57] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[1:57] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit ()
[1:59] * joey_alu_ (~root@135.13.255.151) has joined #ceph
[1:59] * jmcdice_ (~root@135.13.255.151) Quit (Remote host closed the connection)
[2:00] * jmcdice (~root@135.13.255.151) has joined #ceph
[2:01] * joey_alu (~root@135.13.255.151) Quit (Ping timeout: 480 seconds)
[2:26] * lightspeed (~lightspee@82-68-190-217.dsl.in-addr.zen.co.uk) has joined #ceph
[2:35] * noob2 (47f46f24@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[2:41] * lightspeed (~lightspee@82-68-190-217.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[3:00] * Cube (~Cube@184-231-35-186.pools.spcsdns.net) has joined #ceph
[4:20] <Robe> yay, insomnia
[4:50] <Robe> how do pool PGs map onto OSD pgs?
[4:50] <Robe> n:1?
[5:17] * Cube (~Cube@184-231-35-186.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[5:26] * danieagle (~Daniel@186.214.79.76) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[5:33] * stp__ (~stp@dslb-084-056-002-013.pools.arcor-ip.net) has joined #ceph
[5:41] * stp (~stp@dslb-084-056-021-011.pools.arcor-ip.net) Quit (Ping timeout: 480 seconds)
[6:04] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[6:06] * curmet (ca037809@ircip3.mibbit.com) has joined #ceph
[6:06] <curmet> Gm Folks !
[6:06] <curmet> :)
[6:06] <curmet> can anyone guide me about UBUNTU kernel version and ceph v0.51 compatibility ?
[6:08] * Cube (~Cube@184-231-35-186.pools.spcsdns.net) has joined #ceph
[6:17] * mib_68vijz (ca037809@ircip3.mibbit.com) has joined #ceph
[6:19] * yoshi_ (~yoshi@p20198-ipngn3002marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[6:26] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Ping timeout: 480 seconds)
[6:46] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[6:50] * Cube (~Cube@184-231-35-186.pools.spcsdns.net) Quit (Quit: Leaving.)
[7:44] * sagelap (~sage@62-50-223-253.client.stsn.net) has joined #ceph
[8:13] * yoshi_ (~yoshi@p20198-ipngn3002marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[8:46] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[8:51] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[8:54] * sagelap (~sage@62-50-223-253.client.stsn.net) Quit (Ping timeout: 480 seconds)
[9:16] * gucki (~smuxi@80-218-125-247.dclient.hispeed.ch) has joined #ceph
[9:17] <gucki> good morning :)
[9:21] <curmet> :)
[9:25] <Robe> sgott
[9:38] * loicd (~loic@31.36.8.41) has joined #ceph
[9:40] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[9:44] * isomorphic (~isomorphi@659AABQ4O.tor-irc.dnsbl.oftc.net) has joined #ceph
[9:45] <Robe> the metadata pool is solely used for mds?
[9:54] <isomorphic> Can anybody tell me if ceph supports xattrs in the sense that Selinux requires? I can see the xattr references in the source, but mounting a ceph fs suggests that it is not supported :/
[10:00] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[10:10] <todin> morning #ceph
[10:10] <Robe> hoi
[10:23] * sagelap (~sage@ip212-238-58-191.hotspotsvankpn.com) has joined #ceph
[10:30] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[10:32] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[10:33] <tnt> Anyone used obsync on 0.48.2 ? I'm getting the ever so helpful message "ERROR TYPE: unknown, ORIGIN: destination"
[10:47] * mib_68vijz (ca037809@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[10:49] * sagelap (~sage@ip212-238-58-191.hotspotsvankpn.com) Quit (Ping timeout: 480 seconds)
[10:52] <tnt> meh ... I don't see how it could possibly work.
[10:53] <tnt> FileStore.__init__ calls Store.__init__(self, "file://" + url) but Store.__init__ doesn't take any arguments
[10:59] <tnt> mmm, the faulty commit dates back to "Thu Dec 8 14:30:02 2011". Does anyone use obsync to sync to a local dir?
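
A minimal Python sketch of the mismatch tnt describes; the class bodies here are assumptions based on his reading of the code, not the actual obsync source:

    class Store(object):
        def __init__(self):          # base class takes no url argument
            pass

    class FileStore(Store):
        def __init__(self, url):
            # passes an argument the base class doesn't accept
            Store.__init__(self, "file://" + url)

    FileStore("/tmp/dest")  # raises TypeError: __init__() takes 1 argument (2 given)
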
[11:02] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[11:45] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[11:47] * loicd (~loic@31.36.8.41) Quit (Quit: Leaving.)
[11:50] <gucki> is there any way to estimate crushmap changes before actually performing them? like when i remove osd X and add osd Y then XX % will have to rebalance?
[11:53] * loicd (~loic@31.36.8.41) has joined #ceph
[11:55] * vhasi_ is now known as vhasi
[11:56] <gucki> anybody know how to interpret the values of slow request messages like here http://pastie.org/5188677 ?
[11:57] <gucki> am i correctly guessing that osd.2 is reporting that osd.3 was slow when writing data?
[11:57] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[12:02] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[12:08] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[12:08] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[12:10] * KindOne (KindOne@h116.26.131.174.dynamic.ip.windstream.net) Quit (Remote host closed the connection)
[12:15] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[12:15] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit ()
[12:17] * KindOne (KindOne@h116.26.131.174.dynamic.ip.windstream.net) has joined #ceph
[12:27] * curmet (ca037809@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[12:33] * andreask1 (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[12:33] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Read error: Connection reset by peer)
[12:39] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[12:40] * maxim (~pfliu@222.128.129.242) has joined #ceph
[12:45] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has joined #ceph
[13:09] <NaioN> gucki: you get those when one or more osd's get overloaded
[13:09] <NaioN> and the backstore can't keep up
[13:10] <NaioN> gucki: in theory you can calculate which pg's will be moved/replicated but as far as I know there isn't a tool for it
[13:11] <NaioN> but are you replacing the osd? because then it will be all the pg's that resided on the old osd
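
Not something the log shows, but newer crushtool builds can approximate the answer offline by mapping the same set of inputs through the old and new compiled maps and diffing; a rough sketch, assuming the --test/--show-mappings flags are available in your crushtool:

    # map a sample of inputs through both maps and count changed placements
    crushtool -i old.crushmap --test --num-rep 2 --show-mappings > before.txt
    crushtool -i new.crushmap --test --num-rep 2 --show-mappings > after.txt
    diff before.txt after.txt | grep -c '^>'   # rough count of inputs that would move
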
[13:14] <gucki> NaioN: yes, but how do i read the log messages so that i know which osd is overloaded? i think the osd on the left is not the overloaded one, but the reporting one?
[13:15] <gucki> NaioN: is there a place where one can add feature requests? like the tool for estimation?
[13:15] <gucki> NaioN: yeah, i'm replacing one machine and at the same time adding a new machine...so one osd out, 2 new in.
[13:15] <gucki> the new machine has 2 disks..
[13:15] <NaioN> gucki: you should ask one of the developers to know for sure what the messages mean
[13:16] <NaioN> I use something like iostat to see what is happening
[13:16] <gucki> NaioN: yes, same for me. quite nice tool, especially the % util column :)
[13:16] <NaioN> ok well in that case some pg's will get moved
[13:16] <gucki> NaioN: 25% in my case ;)
[13:17] <NaioN> hmmm those 25% is that on a multicore system?
[13:18] <NaioN> because I don't know if iostat deals with multi-core
[13:18] <NaioN> so it could be that one core is full
[13:19] <NaioN> gucki: http://tracker.newdream.net/ for issue/bugs/features
[13:20] <gucki> NaioN: ah the 25% were percentage of pg's rebalanced. output of ceph -w
[13:20] <gucki> NaioN: now i'm already at 20% :)
[13:20] <NaioN> ohhh
[13:21] <NaioN> i thought you meant the iostat :)
[13:22] <NaioN> ceph -w gives you indeed the output, but after you've added the osd
[13:23] <tnt> Just out of curiosity, do you guys use a lot of different rados pools ?
[13:24] <NaioN> not a lot
[13:24] <NaioN> about 8-10
[13:24] * aliguori (~anthony@213.27.183.42) has joined #ceph
[13:25] <tnt> ok. Here rgw creates 8 just for various system stuff :p
[13:25] <NaioN> well could be, we don't use rgw
[13:26] <tnt> Ok, I was just worried to create too many and have too many pgs for nothing :p
[13:27] <tnt> But I have about 8 for real data as well.
[13:27] <NaioN> as far as i know it is considered stable for 100's of pools
[13:27] * aliguori (~anthony@213.27.183.42) Quit (Remote host closed the connection)
[13:27] <NaioN> but not 1000's and above
[13:28] <tnt> ok, good to know
[13:29] <Robe> btw.
[13:29] <Robe> who's responsible for the docs at inktank?
[13:38] * aliguori (~anthony@213.27.183.42) has joined #ceph
[13:41] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[13:48] * aliguori (~anthony@213.27.183.42) Quit (Remote host closed the connection)
[13:59] <rosco> Where are the cephday@amsterdam presentation slides? Are they online yet?
[13:59] <rosco> wido: ^^
[14:07] <gucki> rosco: these? http://vimeo.com/50620695
[14:07] <gucki> rosco: ah no, seems too old...sorry
[14:14] <rosco> gucki: np, it's always the "The slides will be up later" thing after presentations :)
[14:25] * mtk (R3jgasDv96@panix2.panix.com) has joined #ceph
[14:27] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[14:31] * maxim (~pfliu@222.128.129.242) Quit (Ping timeout: 480 seconds)
[14:34] * gregorg (~Greg@78.155.152.6) Quit (Ping timeout: 480 seconds)
[14:36] * gregorg (~Greg@78.155.152.6) has joined #ceph
[14:37] <wido> rosco: they will be online, they are being collected
[14:39] <rosco> wido: Cool thnx.
[14:40] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[14:40] <wido> brings me to it. gregorg: sent your slides to Ross yet? ;)
[14:41] <wido> rturk: is collecting the slides and putting them online on ceph.com
[14:46] <gucki> any dev online who has good experience with adding new monitors? :)
[14:53] <tnt> there is really nothing to it ...
[15:09] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:15] <gucki> tnt: well, last time it rendered my cluster unavailable because no monitor had a quorum. yesterday we talked a lot about it here...
[15:16] <gucki> http://irclogs.ceph.widodh.nl/index.php?date=2012-11-04
[15:17] <gucki> tnt: do you think this step by step should be working? http://pastie.org/5189700
[15:17] <gucki> tnt: or did i mixup the correct order of the commands...?
[15:21] <tnt> No, looks good to me ... but the cluster will be down for a few secs after you add mon.b and before it actually joins the quorum IIRC.
[15:22] <gucki> tnt: ok, a few secs is no problem....last time it stayed down and i had to recover by manually injecting the old monmap into the primary (old) monitor
[15:22] <gucki> tnt: are you a dev? (then i'll try again later using my step by step guide.. *g*)
[15:22] <tnt> one issue I could see with the steps is that when you add mon.b to mon.a, then you get a new monmap and you might need to inject it into mon.b manually?
[15:23] <tnt> Nope, not a dev sorry ...
[15:23] <tnt> but I do have a test cluster hanging around so I could test those steps :p
[15:27] <jefferai> what's the currently recommended version of ceph to install -- debian stable or testing? I've been told testing is very stable usually and has bug fixes...
[15:27] <gucki> tnt: would be great :)
[15:28] <gucki> jefferai: version of ceph or distro?
[15:28] <jefferai> version of ceph
[15:28] <jefferai> they have two repos
[15:28] <jefferai> debian, and debian-testing
[15:28] <jefferai> debian-testing comes from master, but apparently by the time things get there they've had a lot of QA already
[15:28] <gucki> jefferai: i think there's only one stable version, which is 0.48.2 argonaut
[15:28] <jefferai> right
[15:28] <gucki> jefferai: not sure in which repo it is
[15:29] <jefferai> that's debian
[15:29] <gucki> jefferai: i'm using ubuntu quantal, which has 0.48.2 ..
[15:29] <jefferai> I'm not
[15:29] <gucki> jefferai: sorry, no idea how stable master is...
[15:30] <jefferai> me neither, that's why I'm asking
[15:30] <gucki> jefferai: what features does master have that are missing in the current stable
[15:30] <jefferai> I haven't the foggiest idea
[15:30] <jefferai> I'm more concerned with bug fixes
[15:30] <gucki> jefferai: so is it worth going with the latest, probably unstable version?
[15:30] <gucki> jefferai: normally the stable version is the stablest one? (at least i'd assume that)
[15:31] <jefferai> gucki: master isn't the "latest"
[15:31] <gucki> jefferai: ah ok, normally master in git has the last commit....sorry
[15:31] <jefferai> it doesn't
[15:31] <tnt> Anybody know how to configure the rgw log level ? they're very verbose right now ...
[15:31] <jefferai> branches do
[15:31] <jefferai> after some amount of QA they go into master
[15:38] * noob2 (a5a00214@ircip4.mibbit.com) has joined #ceph
[15:48] <tnt> gucki: mmm ...
[15:55] <tnt> gucki: well, it didn't work :p
[16:00] * PerlStalker (~PerlStalk@perlstalker-1-pt.tunnel.tserv8.dal1.ipv6.he.net) has joined #ceph
[16:12] <tnt> gucki: Ok, got it to work :p
[16:23] <gucki> tnt: ah, what did you change? why did it fail at first?
[16:27] <tnt> gucki: I just retried from scratch and basically I just didn't do a ceph mon add ... I just started mon.b and let it join the cluster by itself
[16:40] <gucki> tnt: ok, probably that's the key difference
[16:40] <gucki> i'll try it that way and see if it wants to join the cluster itself :)
[16:42] <tnt> because basically when you do ceph mon add, it ends up being 1/2 mons online and so it's out of quorum and refuses to do any operation. And mon.b can't join it because at that point it's not entirely initialized (the fs is really minimal) ...
[16:43] <gucki> tnt: sounds plausible
[16:46] <tnt> you actually only get the issue when going from 1->2; once you get 2, you can add others without problems
[16:50] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[16:51] * match (~mrichar1@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[17:16] * vata (~vata@208.88.110.46) has joined #ceph
[17:21] * senner (~Wildcard@68-113-232-90.dhcp.stpt.wi.charter.com) has joined #ceph
[17:23] * jlogan1 (~Thunderbi@2600:c00:3010:1:9b2:ed42:a1f6:a6ec) has joined #ceph
[17:28] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[17:28] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:28] <gucki> tnt: i'll try a bit later on and let you know if it worked :)
[17:29] <gucki> tnt: thanks in advance
[17:41] * sagelap (~sage@bzq-218-183-205.red.bezeqint.net) has joined #ceph
[17:41] <sagelap> elder: speaking of xfstests, 045 is still failing.. any idea what that's about?
[17:42] <elder_> Yes. I added something to the bug
[17:42] <elder_> but haven't gotten back to it.
[17:42] <elder_> The mount() command is returning 0 despite a failure.
[17:42] <elder_> Dave Chinner said the next version of the command beyond what we have on our nodes fixes it again.
[17:43] <elder_> But I haven't gone in to see if that's true.
[17:43] <elder_> And I don't know how to update all the target nodes to fix that problem. Meantime that test could be removed from the list...
[17:43] <elder_> (We really ought to define the tests to run in the nightly test separate from the run_xfs_tests.sh script)
[17:47] <sagelap> that'd be nice
[17:48] <sagelap> did dchinner mention which util-linux version by chance?
[17:56] <elder_> Sorry, yes.
[17:56] <elder_> Just a minute.
[17:59] <elder_> tracker is kind of slow
[18:01] <elder_> I can't find it right now. But I believe it's version 4.01. That is, I think it was one minor revision after what we now have. And I see this:
[18:02] <elder_> ubuntu@plana49:~$ quota --version
[18:02] <elder_> Quota utilities version 4.00.
[18:02] <elder_> Compiled with: USE_LDAP_MAIL_LOOKUP EXT2_DIRECT HOSTS_ACCESS ALT_FORMAT RPC RPCR
[18:02] <elder_> Bugs to jack@suse.cz
[18:02] <elder_> ubuntu@plana49:~$
[18:02] <sagelap> k
[18:03] <sagelap> hmm there is an updated version of the package
[18:04] <sagelap> http://changelogs.ubuntu.com/changelogs/pool/main/q/quota/quota_4.00-3ubuntu1/changelog ... that should be the fix we need, right?
[18:05] <sagelap> joshd: there?
[18:05] <elder_> Maybe. I didn't go looking and was going to try what I found to see.
[18:06] <elder_> But reportedly, yes, Dave said he no longer saw the problem with 4.0.1
[18:06] <elder_> (Or rather, didn't see the problem)
[18:07] <sagelap> that's a backport of the (er, a) fix to precise's version. if you apt-get update ; apt-get install quota on a plana that'll get the update.
[18:07] <sagelap> next time you do an xfstest run let's see if it fixes it?
[18:07] <elder_> OK.
[18:07] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:07] <elder_> Tommi didn't like people installing stuff on plana nodes... So at some point they should all get updated if this does fix it.
[18:08] <elder_> Using whatever the appropriate route for doing that.
[18:08] <elder_> (is)
[18:08] <elder_> I can try that right now I think on my plana machines.
[18:10] <elder_> sagelap can you remind me again how I can get an arbitrary command to be run on a target node using teuthology?
[18:10] <elder_> In my yaml file?
[18:11] <gucki> tnt: yeah, this time it worked as you said. the key is also to wait till the second monitor is up, as can be seen by "ceph mon stat" :)
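
Pulling the recipe that worked into one place; a rough sketch assuming mon.b is seeded from the running cluster's mon key and current monmap (ids and paths hypothetical):

    # on the new monitor host
    ceph auth get mon. -o /tmp/mon.keyring        # the cluster's mon key
    ceph mon getmap -o /tmp/monmap                # current monmap from mon.a
    ceph-mon -i b --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    ceph-mon -i b                  # just start it; no 'ceph mon add' first
    ceph mon stat                  # repeat until mon.b shows up in the quorum
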
[18:16] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[18:19] * tnt (~tnt@34.23-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:26] <elder_> sagewk, "apt-get update; apt-get install quota" did update quota. The quota command still self-reports "Quota utilities version 4.00." And the failure still occurs.
[18:27] <elder_> (All this on plana53)
[18:31] <Robe> does the loss of a journal automatically invalidate OSD content?
[18:37] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[18:40] <Robe> google answered it: http://www.spinics.net/lists/ceph-devel/msg06074.html
[18:43] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[18:50] <jtang> for those that care - https://github.com/jcftang/tchpc-vagrant/tree/ansible/ceph-playbooks
[18:50] <jtang> i just made a start at a ceph playbook
[18:50] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[18:50] <jtang> partly because i wanted it for my own testing and development
[18:52] <Robe> and I wonder - is the 1GiB Journal per OSD not massively oversized?
[18:53] * sagelap (~sage@bzq-218-183-205.red.bezeqint.net) Quit (Read error: Connection reset by peer)
[18:53] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[18:54] * sagelap (~sage@bzq-218-183-205.red.bezeqint.net) has joined #ceph
[18:57] <elder_> sagelap, see comments above Re: quotas.
[19:21] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[19:24] <sagelap> elder_ too bad, ok.
[19:24] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:24] <elder_> It's possible updating my sources.list would change that result, I don't know.
[19:25] <joshd> sagelap: I'm here
[19:29] <sagelap> elder_: we want the fix in precise so we don't run anything weird. probably need to identify the upstream patch and ping the ubuntu package maintainer.
[19:29] <elder_> I'm glad you say that.
[19:29] <sagelap> joshd: hammered on wip-rbd-read a bunch. it fixes all the crashes I was seeing before.. haven't been able to break it
[19:30] <elder_> I really prefer sticking with stock stuff.
[19:30] <elder_> Have you tried to aon it or lax it?
[19:30] <sagelap> joshd: i hope it doesn't conflict with the objectcacher stuff you started working on!
[19:30] <sagelap> :)
[19:31] <joshd> sagelap: probably not, but I'll look at it in a little bit
[19:31] * benpol (~benp@garage.reed.edu) has joined #ceph
[19:31] <nhm> sagelap: apparently ceph started off as a php project. ;)
[19:31] <sagelap> hehe
[19:33] <elder_> Are there any rbd requests that might return EAGAIN?
[19:33] <elder_> I mean data I/O requests.
[19:33] <elder_> I want to select an error code to use so the original completion doesn't actually complete if the I/O has been re-submitted to a parent.
[19:34] <rweeks> anyone know if we've run Ceph on one of these: https://help.ubuntu.com/community/UbuntuStudio/RealTimeKernel
[19:34] <joshd> elder: only when you're using an option that currently is never used with rbd, and requires client side support
[19:34] <sagelap> nothing that would escape osd_client
[19:34] <elder_> What option?
[19:34] <joshd> localize_reads, but sage is right about it not escaping osd_client
[19:34] <joshd> it's not supported by the kernel currently
[19:35] <elder_> OK. I'm going to use that then, and will maybe add an assertion if possible to make sure we never actually see it.
[19:35] <joshd> ok
[19:36] <elder_> I'm open to another errno value if anyone thinks it would make more sense.
[19:38] <sagelap> EAGAIN works for me
[19:42] * Cube (~Cube@12.248.40.138) has joined #ceph
[19:44] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[19:58] * adjohn (~adjohn@69.170.166.146) has joined #ceph
[20:05] * mtk (R3jgasDv96@panix2.panix.com) Quit (Remote host closed the connection)
[20:05] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[20:08] * CristianDM (~CristianD@host173.186-124-185.telecom.net.ar) has joined #ceph
[20:12] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[20:29] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[20:41] <benpol> I upgraded my test cluster from argonaut to 0.53 last week and now cephx authenticated connections from outside the cluster (i.e. both kernel rbd and libvirt) are failing. I've updated the cluster's ceph.conf to use the three new cephx related configs as described in http://ceph.com/docs/master/cluster-ops/authentication/#enabling-cephx. Is there another step I've missed?
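
For reference, the three settings that page describes are presumably the "auth required" trio; a sketch of a [global] section, not benpol's actual file:

    [global]
        auth cluster required = cephx
        auth service required = cephx
        auth client required = cephx
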
[20:43] <benpol> A command like this: rbd map rbd/v1image --id garage --keyfile /tmp/garage.base64
[20:43] <benpol> returns: add failed: (1) Operation not permitted
[20:44] <benpol> prior to the upgrade libvirt and the kernel rbd driver were both working.
[20:52] <gucki> is using async io for the journal safe and does it bring performance improvements? (i'm using kernel 3.5 and latest argonaut release) :)
[20:58] * wtipton (~wtipton@lithium.ccmr.cornell.edu) has joined #ceph
[21:01] <wtipton> Hi ceph chat, I've set up a new cluster to try to use CephFS, and I have an issue I'm hoping to get help troubleshooting
[21:01] <wtipton> When I try to mount it, I get:
[21:01] <wtipton> [236123.741993] libceph: mon2 192.168.0.108:6789 feature set mismatch, my 8a < server's 4008a, missing 40000
[21:01] <wtipton> [236123.742000] libceph: mon2 192.168.0.108:6789 missing required protocol features
[21:01] <wtipton> All the machines in the cluster run (and have only ever run) 0.53
[21:02] <wtipton> and kernels are up to date w/ the Ubuntu 12.04 repositories
[21:06] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[21:07] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[21:08] <wtipton> and if I look in the linux/ceph/ceph_fs.h header, it looks like feature flags are only defined up to (1<<7). I don't see anything about 0x40000 which is something like (1<<18)
[21:08] <wtipton> any ideas?
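
The bit arithmetic behind that dmesg line checks out: the "missing" value is the server's feature bits with the client's masked off. A quick interactive check (not from the log):

    >>> hex(0x4008a & ~0x8a)     # server features minus client features
    '0x40000'
    >>> 0x40000 == 1 << 18       # i.e. feature bit 18, beyond what the header defines
    True
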
[21:25] <joshd> gucki: it's safe, but I'm not sure how much improvement it gives
[21:28] * jjgalvez (~jjgalvez@cpe-76-175-16-2.socal.res.rr.com) has joined #ceph
[21:29] <joshd> wtipton: are any of the machines 32-bit? I wonder if there's a bug in the feature interpretation there
[21:29] * dmick (~dmick@2607:f298:a:607:6d30:a089:4f65:a21d) has joined #ceph
[21:30] * jjgalvez1 (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[21:30] <gucki> joshd: do i need to install any special libraries to get it working? i added "osd journal aio = true" to the [osd] section but in the log it says aio = 0 on startup..
[21:31] <joshd> gucki: it needs to be compiled against libaio
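
One quick way to check whether a packaged ceph-osd was actually linked against libaio; standard ldd usage, nothing ceph-specific:

    ldd $(which ceph-osd) | grep -i libaio || echo "no libaio linked in"
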
[21:32] <wtipton> joshd: they're all 64bit
[21:32] <dmick> davidz: re your push comment: I think it's just noting that that branch has all those commits past master
[21:33] <tnt> wtipton: the default 12.04 kernels aren't exactly 'up to date' ceph-wise AFAIK.
[21:33] <joshd> yeah, but those features haven't changed in a long time
[21:34] <wtipton> tnt: i understand, but I was hoping it'd be workable anyway. fwiw, i got impatient and just deleted everything and reran mkcephfs, and now it reports healthy and appears to mount and work fine
[21:35] <gucki> joshd: ok, then it seems aio is not compiled in for the ubuntu packages :(
[21:35] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[21:35] * Cube (~Cube@12.248.40.138) has joined #ceph
[21:35] <davidz> dmick: yup…my branch was somehow based on the "testing" branch instead of "master"
[21:36] <tnt> wtipton: I don't know about cephfs, but if you're planning to use rbd, the default 12.04 kernel definitely isn't all that good; there is a nasty bug that's been fixed for a while but isn't fixed in it.
[21:36] <gucki> joshd: when an osd has been shut down gracefully its journal can safely be deleted, right? it'll just create a new one on the next startup..?
[21:36] <dmick> davidz: but, hm, testing isn't an ancestor, the way I read it...
[21:36] * jjgalvez (~jjgalvez@cpe-76-175-16-2.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[21:38] <joshd> wtipton: ah, there was one feature added recently - support for crush tunables
[21:38] * allsystemsarego (~allsystem@188.27.166.61) has joined #ceph
[21:38] <wtipton> tnt: ah, I see. I'm planning on just using cephfs, but I'll look into updating the kernels
[21:38] <joshd> wtipton: you'll need to reset to the legacy values (http://ceph.com/docs/master/cluster-ops/crush-map/#legacy-values) or update your kernel to be able to use a kernel client
[21:39] <wtipton> joshd: ah, ok. thank you :)
[21:40] <joshd> wtipton: tnt: this page was just added last week to help explain that: http://ceph.com/docs/master/install/os-recommendations/
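
A sketch of the offline route the legacy-values page describes, using the documented legacy tunables; assumes a crushtool new enough to know the tunables flags:

    ceph osd getcrushmap -o /tmp/crush
    crushtool -i /tmp/crush \
        --set-choose-local-tries 2 \
        --set-choose-local-fallback-tries 5 \
        --set-choose-total-tries 19 \
        -o /tmp/crush.legacy
    ceph osd setcrushmap -i /tmp/crush.legacy
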
[21:41] <nhm> gucki: talk to Sam about that, I think there is something you have to run to make sure it's safe.
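
The usual safeguard here (assuming this is what Sam would point at) is ceph-osd's explicit journal flush before the old journal is discarded, then recreating it:

    # with osd.0 cleanly stopped:
    ceph-osd -i 0 --flush-journal   # drain pending entries into the filestore
    ceph-osd -i 0 --mkjournal       # create a fresh journal before restarting
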
[21:45] <joshd> benpol: is there anything in syslog/dmesg after the map fails?
[21:52] <tnt> joshd: ah nice
[22:06] * gregorg (~Greg@78.155.152.6) Quit (Ping timeout: 480 seconds)
[22:13] <todin> is this worth a bug report? http://pastebin.com/g06LYmbZ
[22:14] <dmick> todin: definitely, particularly if you can reproduce and document how
[22:14] <dmick> unrelatedly: apparently ceph creates /tmp/memlog somehow (from MemoryModel). I've been wondering where this comes from.
[22:14] <todin> dmick: just started 20 vms and one of them threw this assertion
[22:16] <dmick> librbd aio handling is seeing a lot of work right now, so it could be one we've seen/fixed, but, it's always worth filing
[22:16] <benpol> joshd: just this: libceph: client0 fsid 2b101c75-ba61-4789-a29f-9406bf9df557
[22:17] <benpol> joshd: and this: libceph: mon0 134.10.139.130:6789 session established
[22:19] <dmick> mds, in MDCache::check_memory_usage(); looks unconditional. hump.
[22:19] <dmick> er, humph.
[22:20] <benpol> joshd: and this in the ceph-mon.log: 2012-11-05 13:19:53.621427 7fe98ba74700 0 -- 134.10.139.130:6789/0 >> 134.10.15.23:0/3575620971 pipe(0x2e5cd80 sd=18 :6789 pgs=0 cs=0 l=0).accept peer addr is really 134.10.15.23:0/3575620971 (socket is 134.10.15.23:32896/0)
[22:23] <todin> dmick: http://tracker.newdream.net/issues/3444
[22:24] <dmick> thanks
[22:30] * wtipton (~wtipton@lithium.ccmr.cornell.edu) Quit (Remote host closed the connection)
[22:32] <benpol> joshd: the same rbd map command succeeds when I run it with admin credentials like this: rbd map rbd/v1image --id admin --keyfile /tmp/client.admin.base64
[22:34] * allsystemsarego (~allsystem@188.27.166.61) Quit (Quit: Leaving)
[22:35] <todin> dmick: got the assertion a second time on another hw-node
[22:40] * masterpe (~masterpe@2001:990:0:1674::1:82) Quit (Ping timeout: 480 seconds)
[23:12] <jefferai> dmick: hey -- do you suggest, at this point (given bug fixes and all), going with the debian repo (argonaut) or the debian-testing repo?
[23:13] * tryggvil (~tryggvil@16-80-126-149.ftth.simafelagid.is) has joined #ceph
[23:16] * CristianDM (~CristianD@host173.186-124-185.telecom.net.ar) Quit ()
[23:19] <benpol> So with 0.53 a client that had the following cap "[osd] allow rw pool=rbd" can't talk to the rbd pool anymore.
[23:19] <jmlowe> there is a bug, fixed in 0.54: it can't parse the equals; leave it out and it will work
[23:20] <benpol> jmlowe: ah, thanks!
[23:20] <jmlowe> tripped over the same thing myself until joshd helped me
[23:20] <benpol> parsing: can't live with it, can't live without it.
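
To make the workaround concrete, a hypothetical keyring entry; on 0.53 the pool restriction trips the parser, so dropping it (at the cost of a broader cap) gets the client going again:

    [client.garage]
        key = AQ...                       # key elided
        caps mon = "allow r"
        caps osd = "allow rw"             # 0.53 chokes on: allow rw pool=rbd
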
[23:31] <benpol> jmlowe: thanks again!
[23:31] * benpol heads home
[23:41] <jefferai> jmlowe: you have any insight into my question? :-)
[23:41] * noob2 (a5a00214@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[23:46] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[23:50] <dmick> jefferai: what are you after?
[23:50] <dmick> personally I would always go later than argonaut
[23:51] <jefferai> dmick: stability, but ideally with known bugs fixed
[23:51] <dmick> but I'd rather have 'latest insight' at this stage
[23:51] <jefferai> like, setting up a production cluster, but I just know there are a lot of known bugs in argonaut
[23:51] <gucki> jefferai: mh, i wonder why they then call it the stable release?!
[23:51] <dmick> we're trying to backport fixes for severe bugs
[23:52] <jefferai> sure
[23:52] <jefferai> dmick: so the text on the debian/ubuntu page suggests that the debian-testing repo ought to be pretty stable
[23:52] <gucki> jefferai: and for me it works without any problems so far :) (i'm only using kvm-rbd)
[23:52] <dmick> but of course there's always tension between top-of-tree, latest numbered release, and "supported"
[23:52] <jefferai> right
[23:52] <jefferai> and I see in changelogs that e.g. defaults have changed and so on
[23:53] <dmick> testing is a good middle ground IMO
[23:53] <jefferai> so I'm just trying to decide whether, at this point, I should base a new cluster off of the code that is nearly ready for the next stable release
[23:53] <jefferai> or the "old" stable release
[23:53] <dmick> it's seen more shakedown, but has more benefit of later development
[23:53] <jefferai> yeah
[23:53] <jefferai> that was my thought
[23:53] <dmick> bobtail is imminent, so this will be moot in a week or two
[23:53] <dmick> (or so)
[23:53] <gucki> dmick: next stable is expected around mid december right?
[23:53] <jefferai> yeah, my time is short so I need to set up this week
[23:53] <dmick> uh....actually not sure about schedule. it's closing soon, but there will be some testing
[23:53] <jefferai> but if it's that close, testing seems like a good choice
[23:54] <dmick> sage just posted...let me see if I can find that email
[23:54] <dmick> he's shooting for freeze end of this week, release probably 3wks after
[23:55] <dmick> so, early Dec, most likely
[23:55] <dmick> Message-ID: <alpine.DEB.2.00.1210311341580.20646@cobra.newdream.net>
[23:55] <joshd> jefferai: you should be aware of this bug in 0.53 http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/9700
[23:58] <jefferai> joshd: great, thanks
[23:59] <dmick> dec: yt?
[23:59] * gucki (~smuxi@80-218-125-247.dclient.hispeed.ch) Quit (Remote host closed the connection)
[23:59] <dmick> I think we've gotten to the bottom of the reason you and I couldn't start kvms

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.