#ceph IRC Log

IRC Log for 2011-08-05

Timestamps are in GMT/BST.

[0:33] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) Quit (Read error: Connection reset by peer)
[0:34] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) has joined #ceph
[0:44] * aliguori (~anthony@32.97.110.59) Quit (Remote host closed the connection)
[1:22] <slang> sagewk: I'm still seeing a bunch of nodes get marked down with this branch
[1:22] <gregaf> sagewk: sjust: you guys want to review the wip-pg-creation branch I just pushed?
[1:23] <slang> uploading the logs now
[1:23] <gregaf> it should be good; I'm testing it with heavy pool creation but it can always use eyes on the fundamental stuff :)
[1:24] <sagewk> slang: ok thanks
[1:34] <slang> http://dl.dropbox.com/u/18702194/ceph-logs-t3.tbz2
[1:45] * morse_ (~morse@supercomputing.univpm.it) has joined #ceph
[1:49] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[1:58] <slang> log [WRN] : map e17 had wrong client addr (192.168.101.112:6819/25397 != my 192.168.101.112:6819/25397
[1:58] <slang> I'm really confused by those kinds of errors
[1:59] <slang> oh
[2:00] <slang> else if (osdmap->get_cluster_addr(whoami).probably_equals(cluster_messenger->get_myaddr()))
[2:00] <slang> clog.warn() << "map e" << osdmap->get_epoch()
[2:00] <slang> << " had wrong client addr (" << osdmap->get_cluster_addr(whoami)
[2:00] <slang> << " != my " << cluster_messenger->get_myaddr();
[2:00] <slang> else if (osdmap->get_hb_addr(whoami).probably_equals(hbout_messenger->get_myaddr()))
[2:00] <slang> clog.warn() << "map e" << osdmap->get_epoch()
[2:00] <slang> << " had wrong client addr (" << osdmap->get_hb_addr(whoami)
[2:00] <slang> << " != my " << hbout_messenger->get_myaddr();
[2:00] <slang>
[2:00] <slang> those should be: else if(!osdmap...
[2:00] <slang> yes?
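
For reference, a minimal sketch of the fix slang is suggesting, assuming the surrounding OSD code is exactly the paste above; the only change is the added '!', so the warning is logged when the addresses do not match:

    // Sketch of the corrected checks: negate probably_equals() so the
    // warning fires only when the map's address does NOT match ours.
    else if (!osdmap->get_cluster_addr(whoami).probably_equals(cluster_messenger->get_myaddr()))
      clog.warn() << "map e" << osdmap->get_epoch()
                  << " had wrong client addr (" << osdmap->get_cluster_addr(whoami)
                  << " != my " << cluster_messenger->get_myaddr();
    else if (!osdmap->get_hb_addr(whoami).probably_equals(hbout_messenger->get_myaddr()))
      clog.warn() << "map e" << osdmap->get_epoch()
                  << " had wrong client addr (" << osdmap->get_hb_addr(whoami)
                  << " != my " << hbout_messenger->get_myaddr();

With the check inverted, the message is printed precisely when the two addresses are equal, which would explain why the warning quoted at [1:58] shows two identical addresses.
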
[2:13] <rsharpe> Can we get the following in libceph.h changed?
[2:13] <rsharpe> /* Get the CephContext of this mount */
[2:13] <rsharpe> CephContext *ceph_get_mount_context(struct ceph_mount_info *cmount);
[2:14] <rsharpe> It should be struct CephContext … so that it works in both C and C++
[2:14] <rsharpe> Or something else that works for both C and C++.
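
For reference, a sketch of the declaration rsharpe is asking for (hypothetical wording, not the actual commit): a C translation unit cannot see the C++ class definition, so the header has to refer to CephContext through a forward-declared struct tag, which compiles under both languages.

    /* Hypothetical C-compatible version of the prototype quoted above. */
    struct CephContext;   /* forward declaration; legal in both C and C++ */

    /* Get the CephContext of this mount */
    struct CephContext *ceph_get_mount_context(struct ceph_mount_info *cmount);
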
[2:15] <cmccabe> rsharpe: good point
[2:16] <cmccabe> I thought we had some C test programs for libceph, but maybe not
[2:17] <cmccabe> looks like testceph is C++ now
[2:17] <bchrisman> always has been :)
[2:18] <cmccabe> that's kind of weird because it's using the C API
[2:18] <bchrisman> ayup :)
[2:19] <cmccabe> and no C++ features except for cout
[2:26] <cmccabe> rsharpe: fixed by 66c3d8ff60ca585b97540daee942e2c5c6e5538f
[2:27] * cmccabe (~cmccabe@69.170.166.146) has left #ceph
[2:27] * cmccabe (~cmccabe@69.170.166.146) has joined #ceph
[2:27] <cmccabe> rsharpe: hope that helps
[2:29] * cmccabe (~cmccabe@69.170.166.146) has left #ceph
[2:33] * huangjun (~root@122.225.105.244) has joined #ceph
[3:00] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:02] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:27] * yoshi (~yoshi@p10166-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[4:14] <sage> slang: yeah, those last 3... altho one of those ifs should eval to true if you got into that block at all
[4:31] * hutchint (~hutchint@c-75-71-83-44.hsd1.co.comcast.net) has joined #ceph
[4:46] * phil__ (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Remote host closed the connection)
[5:03] * ajm (adam@adam.gs) Quit (Read error: Connection reset by peer)
[5:03] * ajm- (adam@adam.gs) has joined #ceph
[5:05] * kemo (~kemo@c-68-54-224-104.hsd1.tn.comcast.net) has joined #ceph
[5:07] * kemo (~kemo@c-68-54-224-104.hsd1.tn.comcast.net) Quit (Read error: Connection reset by peer)
[5:09] * kemo (~kemo@c-68-54-224-104.hsd1.tn.comcast.net) has joined #ceph
[5:22] * votz (~votz@pool-72-78-219-212.phlapa.fios.verizon.net) Quit (Quit: Leaving)
[6:13] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[6:17] <hutchint> I have a quick question, is the communication between the different nodes done using tcp/ip?
[6:18] <sage> hutchint: yep
[6:19] <hutchint> Sweet, thanks. I'm currently reading all the literature but haven't gotten there yet
[7:20] * hutchint (~hutchint@c-75-71-83-44.hsd1.co.comcast.net) Quit (Quit: Leaving)
[7:26] * kemo (~kemo@c-68-54-224-104.hsd1.tn.comcast.net) Quit ()
[7:48] * kemo (~kemo@c-68-54-224-104.hsd1.tn.comcast.net) has joined #ceph
[9:32] <huangjun> Are there any publications on the recovery state machine in ceph?
[9:41] <huangjun> i got an MLog event, but the recovery state machine is in the GetInfo state, so should it be Crashed?
[10:54] * lxo (~aoliva@83TAACP02.tor-irc.dnsbl.oftc.net) Quit (Quit: later)
[10:55] * lxo (~aoliva@09GAAFXB0.tor-irc.dnsbl.oftc.net) has joined #ceph
[11:01] * lxo (~aoliva@09GAAFXB0.tor-irc.dnsbl.oftc.net) Quit (Quit: later)
[11:02] * lxo (~aoliva@28IAABEJO.tor-irc.dnsbl.oftc.net) has joined #ceph
[11:12] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) Quit (Ping timeout: 480 seconds)
[11:34] * pmjdebruijn (~pascal@overlord.pcode.nl) has joined #ceph
[11:34] <pmjdebruijn> hi all
[11:41] <pmjdebruijn> are there any recent ceph patches for 2.6.32-stable (e.g. .43)?
[11:43] * yoshi (~yoshi@p10166-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:19] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[12:20] * Juul (~Juul@3408ds2-vbr.4.fullrate.dk) has joined #ceph
[14:20] * phil__ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[14:25] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[15:22] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:22] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[15:22] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[16:06] * huangjun (~root@122.225.105.244) Quit (Remote host closed the connection)
[16:11] <wido> pmjdebruijn: Did you take a look at the standalone / backport branch?
[16:11] <wido> But personally I'd recommend using a newer kernel, like 3.0.
[16:14] * morse_ (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[16:16] <pmjdebruijn> wido: well we'll take that under consideration
[16:16] <pmjdebruijn> 2.6.32 would be convenient, but it's not a hard requirement
[16:16] <wido> pmjdebruijn: Running Ubuntu, RHEL or CentOS I guess?
[16:17] <wido> All the new development goes into the latest kernel, they do backport it to older kernels from time to time
[16:17] <wido> .32 is going to be the long term support kernel for all major distros, so I guess they'll backport back to that kernel when the time is right
[16:19] <pmjdebruijn> ok
[16:19] <pmjdebruijn> well, we are on Ubuntu
[16:19] <pmjdebruijn> but we have a custom netboot setup
[16:19] <pmjdebruijn> so we roll our own kernels
[16:19] <pmjdebruijn> we do stick with the 2.6.32.x branch for stability indeed
[16:19] <pmjdebruijn> for most machines
[16:20] <wido> Right now Ceph isn't ready yet for production. Test whatever you like, but using it in production is at your own risk ;)
[16:24] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[16:27] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[16:31] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[16:42] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[16:50] <slang> sagewk: any more info I can provide for that heartbeat issue?
[16:51] <slang> Once things settle down (after a minute or so), I'm able to kill the osd processes that have been marked down and restart them, which adds them back into the osdmap
[16:52] <slang> at that point the pgs seem stable, so I don't see more osds getting marked down
[16:54] * Juul (~Juul@3408ds2-vbr.4.fullrate.dk) Quit (Ping timeout: 480 seconds)
[17:00] <pmjdebruijn> wido: yeah fair enough
[17:04] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[17:15] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[17:43] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:44] <wido> pmjdebruijn: Ok, I'd say be careful ;) Although it is a fun toy to play with :-)
[17:44] * Juul (~Juul@95.209.230.202.bredband.3.dk) has joined #ceph
[17:45] <rsharpe> cmccabe: Thanks
[17:59] * greglap (~Adium@166.205.141.226) has joined #ceph
[18:02] <pmjdebruijn> wido: haha thanks :)
[18:29] <greglap> sagewk: hmm, does cfuse maybe need readahead default tuning?
[18:29] <greglap> it seems like it ought to be able to saturate the NIC with a simple dd
[18:33] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:42] * greglap (~Adium@166.205.141.226) Quit (Quit: Leaving.)
[18:50] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Quit: Tv)
[18:50] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[18:51] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:58] * cmccabe (~cmccabe@69.170.166.146) has joined #ceph
[19:11] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Quit: Tv)
[19:12] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[19:13] * Tv (~Tv|work@aon.hq.newdream.net) has left #ceph
[19:13] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[19:23] * lx0 (~aoliva@83TAACQQE.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:24] * lxo (~aoliva@28IAABEJO.tor-irc.dnsbl.oftc.net) Quit (Quit: later)
[19:24] * lx0 (~aoliva@83TAACQQE.tor-irc.dnsbl.oftc.net) Quit ()
[19:25] * lx0 (~aoliva@9KCAABALW.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:26] * lx0 (~aoliva@9KCAABALW.tor-irc.dnsbl.oftc.net) Quit ()
[19:26] * lx0 (~aoliva@83TAACQQF.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:26] * lx0 (~aoliva@83TAACQQF.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[19:29] * lxo (~aoliva@9KCAABALZ.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:30] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) has joined #ceph
[19:37] <kemo> Hmm, while Ceph isn't production ready, is it stable? =D
[19:40] <gregaf> Yes!
[19:40] <gregaf> No!
[19:40] <gregaf> Maybe!
[19:40] <gregaf> depends on which pieces you're trying to use :)
[19:46] * Juul (~Juul@95.209.230.202.bredband.3.dk) Quit (Quit: Leaving)
[19:52] <sagewk> tv, cmccabe: pushed a debian-deps branch that strips out all the explicit (non-python) lib deps.. once that builds we can see what if anything debian is doing wrong
[19:52] <Tv> yay
[19:52] <cmccabe> k
[20:28] <yehudasa> sagewk: git repository is really slow
[20:32] * jojy (~jojyvargh@70-35-37-146.static.wiline.com) has joined #ceph
[20:34] <jojy> was looking at the crush code and thought "calc_bits_of" could use "__builtin_clz" but i guess it's not that critical a path for optimization
[20:39] <kemo> Heeeeyyyyy....
[20:40] <yehudasa> jojy: I'd rather have the code clear of unportable compiler intrinsics, unless the gain would be really high
[20:40] <yehudasa> jojy: and I don't think we'll get anything from that specific micro-optimization
[20:41] <jojy> i understand.. just browsing code so..
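
For reference, a sketch of the micro-optimization jojy is describing, assuming calc_bits_of simply counts how many bits are needed to represent its argument (the names below are illustrative, not Ceph's API). As yehudasa points out, __builtin_clz is a GCC/Clang intrinsic, and it is also undefined for zero, so the portable loop remains the safer choice:

    #include <stdint.h>

    /* Portable loop: count the bits needed to represent t. */
    static int calc_bits_of_loop(uint32_t t)
    {
      int b = 0;
      while (t) {
        t >>= 1;
        b++;
      }
      return b;
    }

    /* GCC/Clang intrinsic equivalent; t == 0 must be guarded because
     * __builtin_clz(0) is undefined. */
    static int calc_bits_of_clz(uint32_t t)
    {
      return t ? 32 - __builtin_clz(t) : 0;
    }
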
[20:44] <gregaf> kemo: seriously, it depends on what parts of the project you're interested in
[20:45] <gregaf> the POSIX filesystem ought to do fine if you're just, like copying backups to it, but that's the only thing that I'd say it's well-tested on
[20:45] <gregaf> RADOS is pretty stable for small clusters and simple object storage, as is RGW
[20:46] <yehudasa> speaking of stable rados, joshd: latest master crashes on the first hint of watch/notify
[20:47] <joshd> taking a look
[20:47] <yehudasa> osd/ReplicatedPG.cc: 2442: FAILED assert(!ctx->ops.empty())
[20:48] <yehudasa> that's on a notify msg
[20:55] <joshd> yehudasa: my watch/notify tests still pass - what operations are you doing?
[20:55] <joshd> oh, reproduced it with testlibrbd
[20:56] <joshd> no, that's a different issue
[21:01] <yehudasa> joshd: forget it for now.. I think it's my bad
[23:36] <sagewk> slang: do you have a core for #1366
[23:36] <sagewk> ?

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.