#ceph IRC Log

IRC Log for 2012-03-21

Timestamps are in GMT/BST.

[0:22] * LarsFronius (~LarsFroni@f054112005.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[0:23] * Tv|work (~Tv_@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[0:37] * gregorg (~Greg@78.155.152.6) has joined #ceph
[0:37] * gregorg_taf (~Greg@78.155.152.6) Quit (Read error: Connection reset by peer)
[0:40] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[1:03] * BManojlovic (~steki@212.200.240.216) Quit (Remote host closed the connection)
[1:13] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:37] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:05] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:05] <lxo> sage, ping
[2:07] <lxo> remember that problem of snapshots changing timestamps? it looks like I get it when I touch the directory or move another directory into it, but *not* when I create new files in it. does that make any sense to you?
[2:15] * lofejndif (~lsqavnbok@09GAAD4SS.tor-irc.dnsbl.oftc.net) Quit (Quit: Leaving)
[2:21] * tnt_ (~tnt@194.11-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[3:00] * gohko (~gohko@natter.interq.or.jp) Quit (Quit: Leaving...)
[3:03] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[3:18] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[4:06] * ajm (adam@adam.gs) Quit (Quit: ajm)
[4:12] * ajm (~ajm@64.188.63.86) has joined #ceph
[4:27] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Remote host closed the connection)
[4:29] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[5:26] * Liam_SA (~Liam_SA@41.161.35.68) Quit (Read error: Connection reset by peer)
[5:37] * chutzpah (~chutz@216.174.109.254) Quit (Quit: Leaving)
[6:03] * cattelan is now known as cattelan_away
[7:06] * Theuni (~Theuni@46.253.59.219) has joined #ceph
[7:36] * Theuni (~Theuni@46.253.59.219) Quit (Ping timeout: 480 seconds)
[8:03] * tnt_ (~tnt@194.11-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:19] * LarsFronius (~LarsFroni@f054111069.adsl.alicedsl.de) has joined #ceph
[8:31] * LarsFronius (~LarsFroni@f054111069.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[9:13] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:19] * tnt_ (~tnt@194.11-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:27] * Theuni (~Theuni@82.141.26.50) has joined #ceph
[9:37] * tnt_ (~tnt@212-166-48-236.win.be) has joined #ceph
[9:45] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[9:48] * Theuni (~Theuni@82.141.26.50) Quit (Ping timeout: 480 seconds)
[10:07] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[10:16] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[10:23] <dwm__> Hmm. HTTP service on kernel.org is down.
[10:38] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:00] <hijacker> dwm__, confirmed, 80/tcp filtered http on pub2.kernel.org
[11:12] * tjikkun_ (~tjikkun@82-169-255-84.ip.telfort.nl) has joined #ceph
[11:15] <dwm__> Though the btrfs wiki still works, for example. As does git://.
[11:39] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[11:54] * stxShadow (~jens@p4FFFEBF1.dip.t-dialin.net) has joined #ceph
[12:06] * oliver1 (~oliver@p4FFFEBF1.dip.t-dialin.net) has joined #ceph
[12:53] <dwm__> Hmm, I think I might be running into the fragmentation performance problem Christian Brunner ran into.
[12:53] <dwm__> Does anyone know if the larger-node/leaf size changes went into Linux v3.3?
[13:02] <dwm__> Right, looks like it's not. Josef's doing dev work on it in his tree, but the key commit -- "Btrfs: allow metadata blocks larger than the page size" is still in Chris Mason's dangerdonteveruse branch.
[13:44] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[14:22] * Theuni (~Theuni@195.62.106.110) has joined #ceph
[14:44] <Dieter_be> if you don't use xfs, will rados itself do the data checksumming on all blocks it reads?
[15:05] * tnt_ (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[15:13] * Theuni (~Theuni@195.62.106.110) Quit (Quit: Leaving.)
[15:17] * Theuni (~Theuni@195.62.106.110) has joined #ceph
[15:17] * Theuni (~Theuni@195.62.106.110) Quit ()
[15:17] * Theuni (~Theuni@195.62.106.110) has joined #ceph
[15:23] * tnt_ (~tnt@194.11-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[15:23] <Dieter_be> err i meant, if you don't use btrfs, of course
[15:23] <Dieter_be> such as when you use xfs
[15:34] * Theuni (~Theuni@195.62.106.110) Quit (Quit: Leaving.)
[15:35] <Dieter_be> hmm where is workingdir.conf ? http://ceph.newdream.net/wiki/Simple_test_setup says it comes with the code checkout by default, but it doesn't
[15:39] * Theuni (~Theuni@195.62.106.110) has joined #ceph
[15:40] <wonko_be> that page was last updated in 2009 - it might be outdated
[15:40] <wonko_be> (understatement)
[15:41] <Dieter_be> whoa!
[15:41] <Dieter_be> didn't notice that
[15:46] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) has joined #ceph
[15:47] <jmlowe> Anybody else having this problem?
[15:47] <jmlowe> W: A error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://ceph.newdream.net oneiric InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 6EAEAE2203C3951A
[15:48] <wonko_be> the repo signing key changed location when the move from github.com/NewDreamNetwork to github.com/ceph was made
[15:49] <jmlowe> ah, missed that detail
[15:49] <wonko_be> key "https://raw.github.com/ceph/ceph/master/keys/release.asc"
[15:50] <wonko_be> I guess that will be the problem
[15:52] <jmlowe> hmm, that key doesn't seem to be working for me
[15:52] <jmlowe> it's apt-key add right, or has my coffee intake been insufficient?
[15:59] * guilhem (~spectrum@sd-20098.dedibox.fr) has joined #ceph
[15:59] * guilhem is now known as Guest7104
[16:00] <Guest7104> hello all
[16:00] <stxShadow> hi
[16:00] <Guest7104> we have deployed a ceph - radosgw infrastructure last week
[16:01] <Guest7104> and today we have some problems :x
[16:01] * nhm (~nh@68.168.168.19) Quit (Read error: Operation timed out)
[16:01] <Guest7104> on the radosgw side : "idle timeout (30 sec)" on Apache
[16:02] <Guest7104> and with ceph -w, many "[WRN] old request osd_op(client.4740.0:1450 .dir.18 [call rgw.bucket_prepare_op] 11.58234bb1) v4 received at 2012-03-21 16:01:19.775234 currently waiting for sub ops"
[16:02] <stxShadow> the last message is -> one of your osds (maybe journaling) is slow
[16:03] <stxShadow> are there any other messages on ceph -w ?
[16:03] <Guest7104> nothing else
[16:04] <Guest7104> PG active+clean
[16:04] <jmlowe> So any chance we could get the actual key that signed the repo?
[16:04] <Guest7104> only many "old requests"
[16:05] <jmlowe> I'm looking for DSA key ID 03C3951A
[16:05] <wonko_be> ah, sec
[16:06] <wonko_be> jmlowe: that is labeled "autobuild" on my machine
[16:06] <jmlowe> well never mind, keys.gnupg.net has it
[16:06] <wonko_be> try https://raw.github.com/ceph/ceph/master/keys/autobuild.asc
[16:06] <jmlowe> gpg --verify InRelease
[16:06] <jmlowe> gpg: Signature made Wed 21 Mar 2012 09:15:31 AM EDT using DSA key ID 03C3951A
[16:06] <jmlowe> gpg: Good signature from "Ceph automated package build (Ceph automated package build) <sage@newdream.net>"
[16:06] <jmlowe> gpg: WARNING: This key is not certified with a trusted signature!
[16:06] <jmlowe> gpg: There is no indication that the signature belongs to the owner.
[16:06] <jmlowe> Primary key fingerprint: FCC5 CB2E D8E6 F6FB 79D5 B331 6EAE AE22 03C3 951A
[16:07] <wonko_be> it should be that one:
[16:07] <wonko_be> pub 1024D/03C3951A 2011-02-08 [expires: 2013-02-07] Key fingerprint = FCC5 CB2E D8E6 F6FB 79D5 B331 6EAE AE22 03C3 951A
[16:07] <wonko_be> uid Ceph automated package build (Ceph automated package build) <sage@newdream.net>
[16:07] <jmlowe> ok, great, really dragging this morning
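
The NO_PUBKEY value in the apt error (6EAEAE2203C3951A) and the "DSA key ID 03C3951A" printed by gpg are both tails of the fingerprint quoted above: for an OpenPGP v4 key, the long key ID is the low 64 bits of the fingerprint and the short ID is the low 32 bits. A minimal Python sketch of that relationship; the function name `key_ids` is purely illustrative and not part of any tool:

```python
def key_ids(fingerprint: str) -> tuple:
    """Return (long_id, short_id) for a 160-bit OpenPGP v4 fingerprint."""
    hex_digits = "".join(fingerprint.split()).upper()
    if len(hex_digits) != 40:
        raise ValueError("expected a 40-hex-digit v4 fingerprint")
    int(hex_digits, 16)  # raises ValueError if the string is not hexadecimal
    # Long ID = last 16 hex digits, short ID = last 8 hex digits.
    return hex_digits[-16:], hex_digits[-8:]


# Fingerprint as printed by gpg --verify in the log above.
FINGERPRINT = "FCC5 CB2E D8E6 F6FB 79D5 B331 6EAE AE22 03C3 951A"

long_id, short_id = key_ids(FINGERPRINT)
print(long_id)   # the value apt reports after NO_PUBKEY
print(short_id)  # the value gpg reports as the key ID
```

Comparing the two tails this way is a quick check that the key fetched from a keyserver is the one apt was asking for.
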
[16:46] * Tv|work (~Tv_@aon.hq.newdream.net) has joined #ceph
[16:46] * Theuni1 (~Theuni@195.62.106.110) has joined #ceph
[16:46] * Theuni (~Theuni@195.62.106.110) Quit (Read error: Connection reset by peer)
[16:49] * imjustmatthew (~imjustmat@pool-96-228-59-130.rcmdva.fios.verizon.net) has joined #ceph
[16:51] <imjustmatthew> hey, when I know I need to restart the active mds can I use one of the "ceph" control commands to transfer active mds role to one of the standbys?
[17:06] * Theuni1 (~Theuni@195.62.106.110) Quit (Quit: Leaving.)
[17:31] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[17:35] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[17:38] <gregaf1> imjustmatthew: nope, don't think that option exists — you should make a bug!
[17:39] * stxShadow (~jens@p4FFFEBF1.dip.t-dialin.net) Quit (Remote host closed the connection)
[17:39] * oliver1 (~oliver@p4FFFEBF1.dip.t-dialin.net) has left #ceph
[17:41] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[17:48] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[17:54] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[17:54] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (Remote host closed the connection)
[18:08] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[18:14] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[18:18] * chutzpah (~chutz@216.174.109.254) has joined #ceph
[18:22] * cattelan_away is now known as cattelan
[18:38] * bchrisman (~Adium@69.239.249.254) has joined #ceph
[18:40] * LarsFronius (~LarsFroni@f054111069.adsl.alicedsl.de) has joined #ceph
[18:43] * perplexed (~perplexed@mobile-198-228-210-160.mycingular.net) has joined #ceph
[18:50] * bchrisman (~Adium@69.239.249.254) Quit (Quit: Leaving.)
[18:53] * perplexed (~perplexed@mobile-198-228-210-160.mycingular.net) Quit (Quit: perplexed)
[19:02] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[19:02] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[19:04] * The_Bishop (~bishop@178-17-163-220.static-host.net) has joined #ceph
[19:06] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[19:06] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[19:11] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[19:15] <dwm__> Ha, that explains a lot: I think my autogenerated CRUSH map has done something odd.
[19:16] <dwm__> (Context: 12 OSDs, each on their own spindle in a single host.)
[19:16] <dwm__> `rados bench` appears to only be exercising two of them..
[19:17] <dwm__> (One for each replica.)
[19:24] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:33] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:37] * LarsFronius (~LarsFroni@f054111069.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[19:39] <dwm__> Hmm, no -- it's just rados bench that seems to map solely onto a pair of OSDs -- RBD use spreads evenly.
[19:44] <joshd> dwm_: check the crushmap for the pool rados bench is using (or are you possibly setting object locators when running rados bench?)
[19:45] * Guest7104 (~spectrum@sd-20098.dedibox.fr) has left #ceph
[19:51] * aliguori (~anthony@32.97.110.59) has joined #ceph
[19:53] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[19:54] <dwm__> joshd: Was initially using a newly-minted 'bench' pool, and also the default RBD pool -- both gave the same behaviour.
[19:54] <dwm__> (i.e. iostat /dev/sd? 1 was showing that only two block devices were active.)
[19:55] <dwm__> But when I'm running iozone over the same rbd pool, then it spreads the load across the entire set.
[19:55] <dwm__> (I'm not *knowingly* setting object locators..)
[19:56] <joshd> hmm, maybe an object locator is accidentally getting set by rados bench by default
[19:59] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[20:03] <joshd> dwm_: try rados -p rbd bench 10 write --debug-objecter 10 --log-to-stderr 2>&1 | grep op_submit
[20:03] <joshd> the output should be like: op_submit oid benchmark_write_data @2...
[20:04] <joshd> if the @2 part is more than just @POOL_ID, an object locator is being used
[20:04] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[20:05] <dwm__> Simply "@2", followed by [write 0~4194304], which I assume is the object to be written.
[20:06] <dwm__> However, each line ended with the same OSD identifier.
[20:06] <dwm__> Ah, save the final 'benchmark_write_data', which happened to go to a different one.
[20:07] <dwm__> Multiple runs target different osds.
[20:07] <dwm__> Stab in the dark: maybe the fact the OSDs are all on the same IP is causing minor confusion..
[20:08] <dwm__> Also: This is .43. Haven't updated to .44 yet.
[20:08] <dwm__> Hmm. Would you expect the oid to change each time?
[20:08] <dwm__> 'Cos it's not. :)
[20:09] <joshd> that would be the problem :)
[20:10] <dwm__> Yeah, it shows an oid of 'illustrious.doc.ic.ac.uk_1492' each time -- combination of hostname + pid?
[20:10] <joshd> yeah, but it should have an increasing number at the end too
[20:10] <dwm__> Logical. Seems to be absent.
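
Why a constant oid pins the benchmark to one pair of OSDs: RADOS derives an object's placement group from a hash of its name, and CRUSH then maps that placement group to a set of OSDs, so writes that all carry the same name always land on the same OSDs. A rough Python sketch of the effect follows. It rests on loud assumptions: `zlib.crc32` stands in for Ceph's real hash, the plain modulo and `PG_NUM = 64` are hypothetical, and `pg_for` is an illustrative name, not a Ceph function.

```python
import zlib

PG_NUM = 64  # hypothetical placement-group count for the pool


def pg_for(oid: str, pg_num: int = PG_NUM) -> int:
    """Map an object name to a placement group (stand-in hash, not Ceph's)."""
    return zlib.crc32(oid.encode()) % pg_num


# What the debug output showed: every write used the same name.
truncated = ["illustrious.doc.ic.ac.uk_1492"] * 16
# What the benchmark intended: a distinct suffix per object.
unique = [f"illustrious.doc.ic.ac.uk_1492_object{i}" for i in range(16)]

print(len({pg_for(o) for o in truncated}))  # a single placement group
print(len({pg_for(o) for o in unique}))     # more than one placement group
```

With two replicas, one placement group means one primary and one replica OSD, which matches the two busy block devices seen in iostat.
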
[20:25] <dwm__> Hmm, I'm just looking at the rados bench code. generate_object_name() seems to be where this /should/ happen -- and indeed, the object names have _object%d suffixes which are absent from my debug output.
[20:26] <dwm__> Ah, got it.
[20:26] <dwm__> The buffer for the object name is 30 chars long.
[20:26] <dwm__> illustrious.doc.ic.ac.uk_NNNN is 30 chars long. The unique identifiers are being truncated.
[20:29] <dwm__> Should be easy to work around -- either put the serial number first or make the oid larger.
[20:30] <dwm__> Is there an upper bound on the size of oids?
[20:30] <dwm__> (Given I'm about to compile a local copy of 0.44 /anyway/ ... :)
[20:30] <joshd> looks like 128 is allocated for it in write_bench
[20:30] <joshd> I don't see why it doesn't just use std::strings though
[20:34] <dwm__> I'll try the trivial change of changing to 128 in 0.44 and retest.
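
The truncation can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the rados bench source: `snprintf_like` imitates how C's snprintf keeps at most size - 1 characters (the last byte holds the terminating NUL), and the name layout "hostname_pid_objectN" is inferred from the hostname + pid prefix and the _object%d suffix mentioned in the discussion.

```python
def snprintf_like(buf_size: int, text: str) -> str:
    """Model C snprintf: keep at most buf_size - 1 characters."""
    if buf_size <= 0:
        return ""
    return text[: buf_size - 1]


def object_name(hostname: str, pid: int, objnum: int, buf_size: int) -> str:
    # Assumed layout: hostname, pid, then an "_object<N>" serial suffix.
    return snprintf_like(buf_size, f"{hostname}_{pid}_object{objnum}")


host, pid = "illustrious.doc.ic.ac.uk", 1492

small = {object_name(host, pid, n, 30) for n in range(4)}
large = {object_name(host, pid, n, 128) for n in range(4)}

print(small)       # every serial number collapses to the same name
print(len(large))  # with 128 bytes the names stay distinct
```

The prefix "illustrious.doc.ic.ac.uk_1492" is 29 characters, so together with the terminating NUL it exactly fills a 30-byte buffer and the serial suffix is dropped entirely. A shorter hostname or pid would have hidden the problem.
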
[20:38] <gregaf1> rados bench is just a hack that's been hacked on a lot; this issue would be something that fell through the cracks when the names started being made unique on each test runner
[20:38] <gregaf1> should make a bug for it in the tracker, though!
[20:39] <dwm__> Righto, shall mint one.
[20:44] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) Quit (Quit: Leaving.)
[20:52] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[20:58] <dwm__> Issue created as #2196.
[21:05] <elder> Is "git submodule update" enough to fix my "(new commits)" status for src/leveldb?
[21:07] * jmlowe (~Adium@173-15-112-198-Illinois.hfc.comcastbusiness.net) has joined #ceph
[21:07] <joshd> elder: if you've already run 'git submodule init', yes
[21:07] <elder> Well I believe I had.
[21:17] <sjust1> elder: you may need to run git submodule sync
[21:17] <elder> OK.
[21:18] <elder> Is that a bit like: submodule update (similar to remote update) and submodule sync (similar to fetch), or along those lines?
[21:18] <sjust1> elder: it's less consistent than that, git stores the remote for each submodule
[21:18] <sjust1> if the remote changed, git submodule sync updates it
[21:18] <sjust1> git submodule init registers the submodules
[21:19] <sjust1> and git submodule update just checks out the relevant hash
[21:19] <elder> But a submodule is basically including something from a different git repository by reference rather than duplicating it, right?
[21:19] <sjust1> yeah, you would think it would be simpler
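
The three subcommands described above can be chained when a submodule such as src/leveldb shows "(new commits)". A small Python sketch, assuming git is on PATH; the helper name `refresh_submodules` and the dry-run default are illustrative choices, so nothing is executed unless explicitly requested:

```python
import shlex
import subprocess

# Order follows the roles described above: sync refreshes each submodule's
# remote URL, init registers submodules, update checks out the recorded hash.
REFRESH_STEPS = [
    ["git", "submodule", "sync"],
    ["git", "submodule", "init"],
    ["git", "submodule", "update"],
]


def refresh_submodules(repo_dir: str, dry_run: bool = True) -> list:
    """Return the commands as strings; run them in repo_dir unless dry_run."""
    shown = [shlex.join(step) for step in REFRESH_STEPS]
    if not dry_run:
        for step in REFRESH_STEPS:
            subprocess.run(step, cwd=repo_dir, check=True)
    return shown


for line in refresh_submodules("."):
    print(line)
```

Calling `refresh_submodules(path, dry_run=False)` runs the steps in the given checkout and stops at the first failure.
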
[21:22] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[21:22] * lofejndif (~lsqavnbok@19NAAHI40.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:31] * BManojlovic (~steki@212.200.240.216) has joined #ceph
[21:32] <dwm__> Hmm, I'm not sure I believe iftop.
[21:32] <dwm__> It's currently claiming that my gigabit NIC is pushing out at (at peak) 1.5Gbit/sec..
[21:34] <elder> That's faast.
[21:34] <elder> You should turn down your electricity.
[21:45] <dwm__> I suspect an artifact of using a local br0.
[21:46] <dwm__> Either that, the advantages of using good-ole' 230V rather than you're watered-down USian voltage. :)
[21:46] <elder> Or maybe it's Hz.
[21:46] <elder> That has something to do with speed, right?
[21:46] <dwm__> Oh dear, I just used the wrong "you're".
[21:47] <elder> Your so dum.
[21:47] <dwm__> No, duum.
[21:47] <dwm__> Shortened to dwm.
[21:47] <elder> I meant sew.
[21:47] <dwm__> It's in the name.
[22:03] * jmlowe (~Adium@173-15-112-198-Illinois.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[22:54] * wlbilljg (~wgallaghe@nat-204-14-239-208-sfo.net.salesforce.com) has joined #ceph
[22:57] <wlbilljg> i know this is probably a very common question, but any idea when ceph would be stable enough for production?
[22:57] <wlbilljg> http://ceph.newdream.net/wiki/FAQ#Is_Ceph_ready_for_use_in_a_production_environment.3F
[22:57] <wlbilljg> we're considering using it for an internal project
[23:13] <iggy> wlbilljg: there have been a few emails where the devs have talked about features they want to finish off before they say it's ready, but I don't recall there being a time frame
[23:13] <wlbilljg> ok
[23:32] <dwm__> In practice, different access layers will be viable at different rates.
[23:33] <wlbilljg> are we talking about weeks or months... or years?
[23:33] <dwm__> There seems to be an opinion that the OSD, RADOSGW and RBD layers are approaching maturity, while the cephfs layer is a little ways off.
[23:33] <dwm__> (The cephfs layer is, of course, dependent on the OSD.)
[23:35] * Q (Q@ppp59-167-157-24.static.internode.on.net) has joined #ceph
[23:35] * Q is now known as Guest7150
[23:37] * Qten (Q@ppp59-167-157-24.static.internode.on.net) Quit (Ping timeout: 480 seconds)
[23:43] * aliguori (~anthony@32.97.110.59) Quit (Quit: Ex-Chat)
[23:53] * perplexed (~perplexed@mobile-198-228-210-160.mycingular.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.