#ceph IRC Log


IRC Log for 2011-08-08

Timestamps are in GMT/BST.

[0:13] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[1:20] * verwilst (~verwilst@d51A5B7A8.access.telenet.be) Quit (Quit: Ex-Chat)
[1:20] * verwilst (~verwilst@d51A5B7A8.access.telenet.be) has joined #ceph
[1:22] * verwilst (~verwilst@d51A5B7A8.access.telenet.be) Quit ()
[1:35] <lxo> does anyone else experience cmon growing to several GBs of virtual memory use every now and again? (seen with 0.31 and 0.32)
[3:29] * lxo (~aoliva@09GAAFY7E.tor-irc.dnsbl.oftc.net) Quit (Quit: later)
[3:36] * lxo (~aoliva@19NAACW54.tor-irc.dnsbl.oftc.net) has joined #ceph
[3:48] * iggy (~iggy@theiggy.com) has joined #ceph
[4:08] * yoshi (~yoshi@p10166-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[6:41] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) Quit (Ping timeout: 480 seconds)
[7:55] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) has joined #ceph
[7:58] * huangjun (~root@ has joined #ceph
[9:01] * jantje_ (~jan@paranoid.nl) has joined #ceph
[9:08] * jantje (~jan@paranoid.nl) Quit (Ping timeout: 480 seconds)
[10:07] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[10:10] * darktim (~andre@ticket1.nine.ch) has joined #ceph
[10:51] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Remote host closed the connection)
[10:53] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[11:08] * yoshi (~yoshi@p10166-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:10] * yoshi (~yoshi@p10166-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[11:17] * yoshi (~yoshi@p10166-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[13:06] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) has joined #ceph
[13:20] * verwilst (~verwilst@d51A5B689.access.telenet.be) has joined #ceph
[13:37] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) Quit (Remote host closed the connection)
[13:48] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) has joined #ceph
[13:49] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) Quit (Remote host closed the connection)
[15:18] * gregorg (~Greg@ has joined #ceph
[16:54] * huangjun (~root@ Quit (Remote host closed the connection)
[17:03] * gregorg_taf (~Greg@ has joined #ceph
[17:03] * gregorg (~Greg@ Quit (Read error: Connection reset by peer)
[17:06] * gregorg (~Greg@ has joined #ceph
[17:06] * gregorg_taf (~Greg@ Quit (Read error: Connection reset by peer)
[17:08] * gregorg_taf (~Greg@ has joined #ceph
[17:08] * gregorg (~Greg@ Quit (Read error: Connection reset by peer)
[17:11] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[17:12] * greglap (~Adium@ has joined #ceph
[17:27] * gregorg_taf (~Greg@ Quit (Quit: Quitte)
[17:46] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:58] * rsharpe (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:18] * lxo (~aoliva@19NAACW54.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[18:19] <Tv> sagewk: no 10:15 daily then, right?
[18:24] * lxo (~aoliva@9YYAAAO6R.tor-irc.dnsbl.oftc.net) has joined #ceph
[18:36] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:40] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:41] * greglap (~Adium@ Quit (Quit: Leaving.)
[18:56] <sagewk> tv: right
[18:56] <sagewk> slang: around?
[18:56] <slang> yes
[18:56] <sagewk> slang: looking through your heartbeat logs now. is this a flat network, or are there different subnets for front/backend networks, or?
[18:57] <slang> sagewk: flat
[18:57] <slang> sagewk: 5 nodes, each node has two 10gige ports, each port is plugged into a single switch
[18:58] <sagewk> how are the two ports used?
[18:58] <sagewk> does ceph.conf specify which ips to bind to or anything?
[18:58] <slang> sagewk: the first port on each node is[1-5], the second is[1-5]
[18:58] <slang> sagewk: yes
[18:58] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[18:58] <slang> sagewk: 4 osds per address
[18:59] <sagewk> slang: i think that may be (partly?) what is confusing things
[19:00] <slang> I've been trying to figure out why there are so many ms_handle_reset messages in the logs
[19:00] <sagewk> can you pastebin your ceph.conf?
[19:00] <slang> I see them in the osd logs, mon logs, client logs, etc.
[19:01] <sagewk> if the osds are flapping that's probably why..
[19:02] <slang> sagewk: well, the osd processes never actually fail
[19:02] <slang> sagewk: they just get marked down
[19:02] <slang> sagewk: but continue running
[19:02] <sagewk> they bind to new addresses/ports when they're marked down/back up
[19:02] <slang> oh
[19:03] <sagewk> so the old connections are all closed out. otherwise it's a big mess figuring out which messages are intended for the old instance
[19:03] <slang> sagewk: ok
[19:03] <sagewk> can you do an experiment with one of the interfaces disabled? if nothing else it will make the logs easier to understand
[19:03] <slang> sagewk: the flapping appears to stop after a while though
[19:03] <sagewk> and if that turns out to be the problem, i'll know to focus on that
[19:04] <slang> sagewk: and I'm able to add them back (the ones marked down) individually, one at a time
[19:04] <slang> sagewk: sure
[19:04] <slang> sagewk: posting the conf file now
[19:04] <slang> sagewk: you want me to run with the same config file, just disable one interface?
[19:05] <slang> sagewk: one interface on each node, or just one?
[19:05] <sagewk> one interface/ip on each node
[19:05] <sagewk> let me see the conf first..
[19:06] <slang> sorry - I can't seem to get to pastebin.com at the moment...
[19:07] <slang> http://dl.dropbox.com/u/18702194/ceph.conf
[19:09] <sagewk> slang: ok conf should be fine.
[19:09] <sagewk> btw the host is usually a hostname
[19:09] <sagewk> and fwiw i reach for fpaste.org when pastebin ads are getting me down
[19:10] <slang> sagewk: once all osds are running (after restarting the downed ones), I don't see any more osds getting marked down, but I still see the ms_handle_reset messages
[19:10] <slang> ah nice re: fpaste
[19:10] <slang> sagewk: I wanted to avoid dns lookups
[19:11] <sagewk> it's just matching the output from `hostname`
[19:11] <sagewk> it just controls which daemons are started/stopped by the init script
[19:11] <slang> ah ok
[19:11] <slang> sagewk: so should I run the experiment with one interface down?
[19:12] <sagewk> slang: yeah let's see how well that does
[19:12] <slang> sagewk: i just want to make sure I understand the experiment
[19:13] <slang> sagewk: down eth1 on one node, and start the servers up again
[19:15] <sagewk> slang: yeah
[19:17] <sagewk> slang: my worry is that the daemons learn their ip when they connect to a remote host (the remote node tells you what ip you appear to come from as part of the handshake). and which ip that is is random with the two interfaces and two ips.
[19:17] <sagewk> want to make sure that's not part of the problem
[19:18] <slang> according to ceph -w, only two osds are down
[19:18] <slang> should be 4 though
[19:19] <gregaf> it is possible to specify specific IPs to bind to; if the problem is random IP assignment then that could probably fix it
[19:20] <slang> and the down osds aren't on the same node as the downed interface
[19:21] <slang> so the osds with the IP on the downed interface aren't binding to that IP already?
[19:25] <slang> I can post the logs if you want
[19:25] <sagewk> slang: hard to say since each osd actually binds three times, once for public, once for cluster, once for heartbeats
[19:25] <sagewk> was everything restarted after you downed the second interface?
[19:26] <slang> the --bind arg seems to just set the public_addr
[19:26] <sagewk> slang: yeah let's ignore that for now and just make sure the 'default' binding behavior works with one eth
[19:27] <sagewk> did you restart all osds after downing the interface? (they could have bound to both ips for different msgr instances)
[19:28] <slang> sagewk: yes, no cosd/cmon/cmds processes were running, then I did ifdown eth1, then /etc/init.d/ceph -a start
[19:29] <sagewk> ok cool. if you post the new logs i'll dig in!
[19:29] <sagewk> brb
[19:52] <slang> http://dl.dropbox.com/u/18702194/ceph-logs-t4.tgz
[19:53] <slang> logs are big, the ceph-w.out file shows summary output of ceph -w
[20:08] <slang> hrm
[20:08] <slang> putting: cluster addr = <ip>
[20:08] <slang> for each osd makes everything happy
[20:09] <slang> no osds get marked down
[20:39] * squig (~bendeluca@soho-94-143-249-50.sohonet.co.uk) Quit (Ping timeout: 480 seconds)
[20:40] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) Quit (Ping timeout: 480 seconds)
[20:56] * verwilst (~verwilst@d51A5B689.access.telenet.be) Quit (Quit: Ex-Chat)
[21:04] <sagewk> slang: happy with two interfaces or just one?
[21:04] <slang> with two interfaces
[21:06] <sagewk> slang: great. completely happy? no heartbeat weirdness at all?
[21:06] <slang> sagewk: does that help figure out what happens in the default binding case?
[21:06] <sagewk> slang: it points to the general problem, i'm not sure exactly which part falls over.
[21:07] <slang> sagewk: yep completely happy, no heartbeat weirdness that I can see
[21:07] <sagewk> slang: i think for any multi interface setup like yours to behave sanely tho it needs the cluster addr. you may want to set public addr too, so that the osds are split across the interfaces for client io
[21:08] <slang> yep I set the public addr too
[21:08] <sagewk> slang: great. yay! i'll go ahead and merge the heartbeat stuff then for 0.33
[21:08] <slang> sagewk: sorry for all that trouble for what turned out to just be a config issue
[21:08] <sagewk> slang: thanks for your help tracking this down.
[21:09] <sagewk> slang: well, i think it would have worked before with those settings, but we also realized there was a larger design problem that's now fixed
[21:09] <sagewk> slang: so win-win anyway :)
[21:09] <slang> sagewk: ok cool :-)
[21:09] <sagewk> slang: your mds crash is up next. lunch first
[21:10] <slang> :-)
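
The fix slang describes above (a `cluster addr` and a `public addr` for each osd) can be sketched as a small generator for the per-osd sections of ceph.conf. This is only a minimal sketch, not slang's actual config: the addresses are placeholders from the documentation ranges (the real ones do not appear in the log), and the `[osd.N]` section naming, the numbering order, and the helper names `assign_osd_addrs` and `render_sections` are assumptions. It follows the layout given in the log: 5 nodes, two addresses per node, 4 osds per address.

```python
def assign_osd_addrs(node_addrs, osds_per_addr=4):
    """Map osd id -> address, filling each address with osds_per_addr osds.

    node_addrs is a list with one tuple of addresses per node
    (one address per network port). The numbering order is an assumption.
    """
    addrs = {}
    osd_id = 0
    for ports in node_addrs:
        for addr in ports:
            for _ in range(osds_per_addr):
                addrs[osd_id] = addr
                osd_id += 1
    return addrs


def render_sections(addrs):
    """Render ceph.conf-style sections that pin both addresses per osd."""
    out = []
    for osd_id in sorted(addrs):
        out.append(f"[osd.{osd_id}]")  # section naming is an assumption
        out.append(f"\tpublic addr = {addrs[osd_id]}")
        out.append(f"\tcluster addr = {addrs[osd_id]}")
    return "\n".join(out)


# Placeholder addresses (hypothetical): first port, second port, for nodes 1-5.
NODE_ADDRS = [(f"192.0.2.{n}", f"198.51.100.{n}") for n in range(1, 6)]

if __name__ == "__main__":
    print(render_sections(assign_osd_addrs(NODE_ADDRS)))
```

Pinning both options to the same address per osd keeps each daemon on one interface, which matches sagewk's suggestion that the osds be split across the interfaces for client io.
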
[21:11] <sagewk> wido: around? about that osd craziness you were seeing...
[21:12] * MK_FG (~MK_FG@ Quit (Ping timeout: 480 seconds)
[21:16] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Quit: Tv)
[21:17] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[21:39] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[21:40] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[22:14] * hutchins (~hutchins@c-75-71-83-44.hsd1.co.comcast.net) has joined #ceph
[22:22] <jantje_> Hey all
[22:23] <joshd> hi jantje_
[22:25] <jantje_> so, whats up with the buzz about CEPH and OpenStack
[22:27] <joshd> we added a rbd driver to openstack compute (nova)
[22:27] <joshd> this means you can attach/detach rbd drives to guests
[22:28] <jantje_> hmm, i see
[22:28] <jantje_> well, nothing I would need anyway ;-)
[22:28] <joshd> we'll probably be making a glance backend at some point, so you can store images in ceph
[22:28] <jantje_> I just need a high performance cloud storage for a distributed build environment ;-)
[22:30] <joshd> if you don't need a hierarchy, you could build that on top of rados
[22:40] <jantje_> what do you mean with hierarchy
[22:41] <jantje_> I did some testing a half year ago (and some bugreports :p), and I just mounted the cephfs. Is there a different approach now?
[22:42] <joshd> by hierarchy I mean directory hierarchy, like in a filesystem
[22:42] <jantje_> Oh, yes, I need a FS.
[22:43] <joshd> the ceph filesystem is built on top of rados, which just stores objects
[22:49] <jantje_> so how's the btrfs stability lately
[22:51] * jojy (~jojyvargh@70-35-37-146.static.wiline.com) has joined #ceph
[22:53] <jojy> regarding the readahead patch, does it help the client side buffer cache in kernel ?
[22:56] <joshd> jantje_: has seemed pretty stable, sagewk would know more about recent btrfs
[22:57] <yehudasa> jojy: the readahead patch makes the readahead more effective
[23:01] <jojy> yehudasa: just on the osds or also on the client kernel? I was wondering if the client rbd's tcp messaging could improve. Say a request for sequential blocks on a client can make use of the new readahead stuff for its buffer cache..
[23:04] <jantje_> joshd: or would it be worth trying the 3.0 series already? I'm not sure.
[23:06] <joshd> jantje_: there were some btrfs bugs people running ceph saw in 3.0, I'm not sure if those are fixed in 3.1-rc1
[23:07] <yehudasa> jojy: the kernel already handles that in a different layer
[23:09] <jojy> yehudasa: in a non clustered io, the readahead can optimize the case of sequential read on a block device(and thus make use of the page cache), but for the clustered case, i was wondering if that can be done.
[23:11] <yehudasa> jojy: there are a few aspects to the problem. First, sage's patch doesn't introduce a mechanism that hasn't been used before, but makes it more effective, as it was synchronous beforehand and now it's async and can cross block boundaries
[23:12] <yehudasa> jojy: second, rbd doesn't use the specific readahead infrastructure, but otoh, the kernel itself manages that for it. Also, rbd is already reading cross-object-boundaries.
[23:20] <jojy> yehudasa: the readahead mechanism in kernel is a fs layer thing or does it apply at the block layer also. So say i dont use fs and directly access block(scsi), would it be able to use the readahead mechanism in the kernel
[23:27] <yehudasa> jojy: block device readahead is controlled in a different layer
[23:27] <yehudasa> jojy: you can try using the blockdev utility to set it up
[23:28] <yehudasa> jojy: e.g., blockdev --setra xxx
[23:29] <jojy> yehudasa: thanks. i will play around with it
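
The `blockdev --setra` value yehudasa mentions is counted in 512-byte sectors, so a readahead size in KiB has to be converted first. The snippet below is a minimal sketch of that conversion. It only builds the command line and does not run it (running it needs root). The device path `/dev/rbd0`, the 4096 KiB figure, and the helper name `setra_command` are illustrative assumptions, not values from the log.

```python
SECTOR_BYTES = 512  # blockdev --setra takes a count of 512-byte sectors


def setra_command(device, readahead_kib):
    """Build the blockdev argv that sets readahead to readahead_kib KiB."""
    sectors = readahead_kib * 1024 // SECTOR_BYTES
    return ["blockdev", "--setra", str(sectors), device]


if __name__ == "__main__":
    # Hypothetical device and size, for illustration only.
    print(" ".join(setra_command("/dev/rbd0", 4096)))
```

The current setting can be read back with `blockdev --getra <device>` to confirm the change.
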
[23:42] <jantje_> joshd: I think I'm going to go with 3.0, it seems there are some changes that could give me some gains
[23:43] <sagewk> jantje_: yeah 3.0 is mostly better... there is just a performance issue where the filesystem starts slowing down after a few days that christian brought up on linux-btrfs last week. not sure if that's resolved yet
[23:46] <jantje_> shouldn't be a problem, yet.
[23:49] <johnl_> hi all
[23:50] <johnl_> I just upgraded ceph and now I'm seeing "[ERR] 0.e3 scrub stat mismatch" errors
[23:50] <johnl_> never seen those before, ever.
[23:51] <johnl_> just finding what commit my build is from
[23:56] * verwilst (~verwilst@d51A5B689.access.telenet.be) has joined #ceph
[23:56] <joshd> johnl_: that can happen if the underlying disk or filesystem for the osd goes bad
[23:58] <johnl_> joshd: yeah, but it was all fine, I upgraded and restarted ceph and every osd started reporting that
[23:58] <johnl_> but suspicious :)
[23:58] <johnl_> unless scrub hasn't actually been working until now!
[23:59] <joshd> johnl_: I'm not aware of any scrub bugs in the last few months
[23:59] <johnl_> git commit 68cbbf42c42
[23:59] <gregaf> johnl_: don't think there's anything new in it recently, what version did you upgrade from?
[23:59] <johnl_> gregaf: let me check that.

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.