#ceph IRC Log

IRC Log for 2012-02-27

Timestamps are in GMT/BST.

[0:59] * lofejndif (~lsqavnbok@56.Red-88-19-214.staticIP.rima-tde.net) Quit (Quit: Leaving)
[1:00] * tnt_ (~tnt@80.63-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:31] * yoshi (~yoshi@p8031-ipngn2701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:04] * steki-BLAH (~steki@212.200.243.16) Quit (Remote host closed the connection)
[2:50] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:20] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: Connection reset by peer)
[3:33] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[7:52] * tnt_ (~tnt@80.63-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[9:19] * pmjdebruijn (~pascal@overlord.pcode.nl) has joined #ceph
[9:19] <pmjdebruijn> hi all
[9:20] <pmjdebruijn> http://ceph.newdream.net/download/ceph-0.42.2.tar.gz
[9:20] <pmjdebruijn> is missing
[9:23] * tnt_ (~tnt@80.63-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:24] <pmjdebruijn> I could checkout from git, though I'd rather have the official tarball
[9:24] <pmjdebruijn> :D
[9:26] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:32] * tnt_ (~tnt@212-166-48-236.win.be) has joined #ceph
[9:41] * stxShadow (~jens@p4FD061E8.dip.t-dialin.net) has joined #ceph
[10:20] * yoshi (~yoshi@p8031-ipngn2701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:26] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[11:35] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[12:17] * absynth (~absynth@absynth.de) has joined #ceph
[12:18] * absynth is now known as filoo_absynth
[12:18] <filoo_absynth> hello everyone
[12:20] <filoo_absynth> i'm trying to set up radosgw/s3 with apache2 and mod_fcgid. unfortunately, i seem to be missing an important piece of information about radosgw
[12:36] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[13:33] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Remote host closed the connection)
[13:34] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[14:02] * nyeates (~nyeates@pool-173-59-237-75.bltmmd.fios.verizon.net) has joined #ceph
[15:23] * lofejndif (~lsqavnbok@56.Red-88-19-214.staticIP.rima-tde.net) has joined #ceph
[15:24] * lofejndif (~lsqavnbok@56.Red-88-19-214.staticIP.rima-tde.net) Quit (Max SendQ exceeded)
[15:25] * lofejndif (~lsqavnbok@56.Red-88-19-214.staticIP.rima-tde.net) has joined #ceph
[15:36] * steki-BLAH (~steki@91.195.39.5) has joined #ceph
[15:36] * BManojlovic (~steki@91.195.39.5) Quit (Read error: Connection reset by peer)
[15:38] * lofejndif (~lsqavnbok@56.Red-88-19-214.staticIP.rima-tde.net) Quit (Quit: Leaving)
[17:10] <wido> filoo_absynth: What is the problem?
[17:10] <wido> What are you missing?
[17:19] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) Quit (Quit: Who the hell is this Peer? If I catch him, I'm going to reset his connection!)
[17:21] <stxShadow> hmmm .... can someone explain the following log message to me:
[17:21] <stxShadow> 2012-02-27 13:57:39.070122 log 2012-02-27 13:57:38.603317 osd.2
[17:21] <stxShadow> 10.10.10.14:6801/5115 556 : [WRN] bad locator @56 on object @79 loc @56
[17:21] <stxShadow> op osd_op(client.44350.0:417452 rb.0.0.00000000136c [write
[17:21] <stxShadow> 3538944~28672] 56.9fb2fa17) v4
[17:21] <stxShadow> i've a lot of them in my logs
[17:33] * steki-BLAH (~steki@91.195.39.5) Quit (Quit: I'm off, and you do whatever you want...)
[17:37] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:38] * tnt_ (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:41] * nyeates (~nyeates@pool-173-59-237-75.bltmmd.fios.verizon.net) Quit (Quit: Zzzzzz)
[17:46] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[17:51] * tnt_ (~tnt@80.63-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[17:59] * stxShadow (~jens@p4FD061E8.dip.t-dialin.net) Quit (Remote host closed the connection)
[17:59] * sagewk (~sage@aon.hq.newdream.net) Quit (Remote host closed the connection)
[18:16] * Tv|work (~Tv__@aon.hq.newdream.net) has joined #ceph
[18:20] * tnt__ (~tnt@235.36-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:20] * tnt_ (~tnt@80.63-67-87.adsl-dyn.isp.belgacom.be) Quit (Read error: Connection reset by peer)
[18:27] <yehudasa_> filoo_absynth: anything specific that we can help you with?
[18:37] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[18:38] * imjustmatthew (~matthew@pool-96-228-59-130.rcmdva.fios.verizon.net) has joined #ceph
[18:39] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:43] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[18:43] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[18:49] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Quit: fronlius)
[18:49] * Enoria (~Enoria@albaldah.dreamhost.com) has joined #ceph
[18:57] <imjustmatthew> I got a "mon/PGMonitor.cc: 265: FAILED assert(paxosv == pg_map.version)" over the weekend, but am having trouble reproducing it, does anyone want the log/stack trace?
[18:58] <Tv|work> imjustmatthew: that sounds severe enough that that's a definite yes
[18:58] <Tv|work> imjustmatthew: please open a bug report
[18:58] <imjustmatthew> k, what should I include?
[18:59] <Tv|work> imjustmatthew: at least the monitor logs.. perhaps the monitor state directory
[19:01] <imjustmatthew> Tv|work: will do, thanks
[19:01] <gregaf1> imjustmatthew: any monitor logs you have, and the contents of three files in the mon state dir...
[19:01] <gregaf1> first_committed, last_committed, and latest
[19:01] <gregaf1> they'll be in /path/to/mon/dir/pgmap
[19:02] <gregaf1> oh, and the range of the numbered files in that directory, but not their contents :)
[19:02] <gregaf1> and yes, we're very interested in that bug; it'll be high priority for me
[19:03] <imjustmatthew> gregaf1: okay, I'll give you the bugID in a minute
[19:04] <imjustmatthew> gregaf1: Be warned, I haven't been able to reproduce it, and the cluster is now in a broken state from other issues
[19:04] * nyeates (~nyeates@pool-173-59-237-75.bltmmd.fios.verizon.net) has joined #ceph
[19:04] <gregaf1> well, whatever you've got :)
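A rough sketch of gathering what gregaf1 asks for above: the contents of first_committed, last_committed and latest, plus the range (but not the contents) of the numbered files in the mon's pgmap directory. The function name is made up for illustration, the path is whatever your mon state dir is, and the code assumes the numbered files have purely numeric names. File contents are kept as raw bytes because no on-disk encoding is assumed.

```python
from pathlib import Path


def summarize_pgmap_dir(pgmap_dir):
    """Collect the three named files and the numbered-file range.

    Hypothetical helper; pgmap_dir is the pgmap directory inside
    the monitor state directory.
    """
    d = Path(pgmap_dir)
    named = {}
    for name in ("first_committed", "last_committed", "latest"):
        f = d / name
        # Raw bytes only: the file encoding is not assumed here.
        named[name] = f.read_bytes() if f.is_file() else None
    # Assumption: "numbered files" are files whose names are plain integers.
    versions = sorted(int(p.name) for p in d.iterdir() if p.name.isdecimal())
    numbered_range = (versions[0], versions[-1]) if versions else None
    return named, numbered_range
```

Calling summarize_pgmap_dir("/path/to/mon/dir/pgmap") would return the three blobs and a (lowest, highest) tuple that can be pasted into the bug report.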
[19:05] <sagewk> see http://tracker.newdream.net/issues/1789
[19:05] * fronlius (~fronlius@g231138073.adsl.alicedsl.de) has joined #ceph
[19:06] <imjustmatthew> sagewk: that's probably it, the only difference is the (memory addresses?) in the square brackets
[19:06] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[19:06] <imjustmatthew> sagewk: and I'm on 0.42.2
[19:12] * BManojlovic (~steki@212.200.243.16) has joined #ceph
[19:13] * chutzpah (~chutz@216.174.109.254) has joined #ceph
[19:15] <sagewk> imjustmatthew: can you attach/reference whatever logs you have to that bug?
[19:15] <sagewk> thanks!
[19:16] * fronlius_ (~fronlius@f054187150.adsl.alicedsl.de) has joined #ceph
[19:18] <imjustmatthew> sagewk: Attached now.
[19:19] * fronlius (~fronlius@g231138073.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[19:19] * fronlius_ is now known as fronlius
[19:19] <imjustmatthew> sagewk: Is there anything else you'd like saved from the cluster data itself before I nuke it? I'll retain the logs for a week or so for my own use.
[19:30] * fronlius (~fronlius@f054187150.adsl.alicedsl.de) Quit (Quit: fronlius)
[19:38] * stxShadow (~Jens@ip-88-153-224-220.unitymediagroup.de) has joined #ceph
[19:39] <stxShadow> hmm .... my osd log is full of "connect protocol version mismatch, my 9 != 0" ..... anything to worry about ?
[19:51] * joao (~JL@ace.ops.newdream.net) has joined #ceph
[20:00] <Tv|work> stxShadow: quick guess: sounds like you were running a mixed cluster, nodes on both sides of the recent protocol change
[20:00] <Tv|work> stxShadow: if you upgraded lately, re-read the release announcement to see what changed
[20:01] <sagewk> stxshadow: is that really a 0?
[20:04] <stxShadow> yes
[20:04] <stxShadow> 2012-02-27 06:25:53.890440 7f8c40894700 -- 10.10.10.12:6801/9751 >> 10.10.10.10:6800/24027 pipe(0x12ecfa00 sd=30 pgs=0 cs=0 l=0).connect protocol version mismatch, my 9 != 0
[20:05] <stxShadow> the hosts are all on the same version
[20:06] <stxShadow> root@fcmsmon0:~# ceph -v
[20:06] <stxShadow> ceph version 0.42-70-g0e4367a (commit:0e4367aaac88b99c36386b6ce5e8d816fdd4ada0)
[20:06] <stxShadow> -> same on all other 7 nodes
[20:12] <gregaf1> stxShadow: that's pretty odd, can you restart them all with --debug_ms 10?
[20:16] <stxShadow> not on this cluster .... but on my test system -> same error
[20:16] <stxShadow> the test cluster was set up freshly yesterday (no other ceph binaries there before)
[20:17] <stxShadow> 2012-02-27 20:15:30.086448 7f71ef664700 -- 10.0.0.11:6801/29483 >> 10.0.0.12:6802/5849 pipe(0x4185000 sd=25 pgs=0 cs=0 l=0).connect protocol version mismatch, my 9 != 0
[20:17] <stxShadow> 2012-02-27 20:15:45.087024 7f71ef664700 -- 10.0.0.11:6801/29483 >> 10.0.0.12:6802/5849 pipe(0x4185000 sd=25 pgs=0 cs=0 l=0).connect protocol version mismatch, my 9 != 0
[20:17] <stxShadow> -> the same there
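When an OSD log is "full of" these lines, it can help to see which local/peer address pairs are affected. A minimal sketch, assuming the line layout is exactly what stxShadow pasted above; the regular expression and function names are illustrative and not part of Ceph.

```python
import re
from collections import Counter

# Pattern written against the pasted lines only; other messenger
# log formats are not covered.
MISMATCH_RE = re.compile(
    r"-- (?P<local>\S+) >> (?P<peer>\S+) pipe\(.*?\)\."
    r"connect protocol version mismatch, "
    r"my (?P<mine>\d+) != (?P<theirs>\d+)"
)


def parse_mismatch(line):
    """Return a dict describing one mismatch line, or None if it does not match."""
    m = MISMATCH_RE.search(line)
    if m is None:
        return None
    return {
        "local": m.group("local"),
        "peer": m.group("peer"),
        "mine": int(m.group("mine")),
        "theirs": int(m.group("theirs")),
    }


def count_mismatches(lines):
    """Count mismatch lines per (local, peer) address pair."""
    counts = Counter()
    for line in lines:
        hit = parse_mismatch(line)
        if hit is not None:
            counts[(hit["local"], hit["peer"])] += 1
    return counts


sample = (
    "2012-02-27 20:15:30.086448 7f71ef664700 -- 10.0.0.11:6801/29483 >> "
    "10.0.0.12:6802/5849 pipe(0x4185000 sd=25 pgs=0 cs=0 l=0)."
    "connect protocol version mismatch, my 9 != 0"
)
print(parse_mismatch(sample))
```

Feeding a whole log file through count_mismatches shows quickly whether every connection is affected or only one pair of daemons.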
[20:19] <stxShadow> ok ....restarted the testcluster with "debug ms = 10" in ceph.conf
[20:23] <stxShadow> http://pastebin.de/23720
[20:24] <stxShadow> both nodes 10.0.0.11 and 10.0.0.12 are on the same version
[20:24] * elder (~elder@aon.hq.newdream.net) has joined #ceph
[20:24] <stxShadow> ceph version 0.42-70-g0e4367a (commit:0e4367aaac88b99c36386b6ce5e8d816fdd4ada0)
[20:27] <gregaf1> stxShadow: can I get the log from the other side too? (10.0.0.12:6800)
[20:27] <stxShadow> sure ..... just a moment .... same timeframe ?
[20:28] <gregaf1> yeah, I want to see the failed connection from both sides
[20:31] <stxShadow> http://pastebin.de/23722
[20:35] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[20:35] * aliguori_ (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[20:37] <gregaf1> well that's interesting, 10.0.0.12 doesn't even see 10.0.0.11's incoming connect, although they're sending pings back and forth
[20:37] <gregaf1> I wonder if the log level outputs are asymmetric
[20:40] <gregaf1> stxShadow: sorry, I'm just hunting for clues here…can you do it again with --debug_ms 20 and --debug_osd 10 and zip up the logs and send them to me?
[20:46] <stxShadow> ok .....
[20:47] <stxShadow> i set the debug ms = 10 in the global section of my ceph.conf
[20:47] <stxShadow> then i copied it to all 3 nodes
[20:47] <stxShadow> and restarted from node1 with "/etc/init.d/ceph -a restart"
[20:48] <stxShadow> node1 has passwordless ssh access to the others
[20:48] <stxShadow> -> i think the way is ok
[20:48] <gregaf1> yeah, it's working right
[20:49] <gregaf1> when I said asymmetric log level outputs I meant I wondered if the accepter code was set differently from the connecter code :)
[20:49] <gregaf1> it is not enough to have shown nothing, but it is enough to make me want to see the rest
[20:50] <stxShadow> oh ... sorry :S
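For the approach stxShadow describes (putting the debug setting in the global section of ceph.conf and copying it to all nodes), here is a small sketch that renders such a fragment with the levels gregaf1 asked for. It only generates text: the helper name is invented, and "debug osd" as the ceph.conf spelling of --debug_osd is an assumption modelled on the "debug ms = 10" line quoted above.

```python
import configparser
import io


def global_debug_fragment(ms_level, osd_level):
    """Render a [global] section with messenger and OSD debug levels."""
    cp = configparser.ConfigParser()
    cp["global"] = {
        "debug ms": str(ms_level),    # same key spelling as in the log above
        "debug osd": str(osd_level),  # assumed spelling, by analogy
    }
    buf = io.StringIO()
    cp.write(buf)
    return buf.getvalue()


print(global_debug_fragment(20, 10))
```

The rendered text would still need to be merged into the existing ceph.conf by hand, since a real file normally has more in its global section than these two keys.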
[20:51] <gregaf1> stxShadow: also, your bad locator warning
[20:51] <gregaf1> it is confusing us, what workload was running?
[20:52] <gregaf1> oh, it looks like rbd?
[20:52] <stxShadow> kvm vserver -> using rbd
[20:52] <gregaf1> hrm
[20:52] <gregaf1> how many OSDs?
[20:52] <stxShadow> 4
[20:52] <stxShadow> if you want to take a look
[20:52] <stxShadow> i can give you access
[20:52] <stxShadow> to the cluster ....
[20:53] <joshd> rbd never sets locators, so that's pretty strange
[20:53] <gregaf1> that might be helpful, but it'll have to be later, I've got a meeting coming up
[20:53] <gregaf1> joshd: ah, I was about to be asking about that :)
[20:54] <stxShadow> where do i send the osd logs ?
[20:54] <stxShadow> or do you need the mon / mds logs too ?
[20:54] <gregaf1> no, just the OSD logs
[20:55] <gregaf1> you can send them to me: gregory.farnum AT dreamhost.com
[20:55] <gregaf1> oh, you're talking about the messenger stuff...good
[20:55] <stxShadow> hehe .... yes
[20:56] <gregaf1> if you still have the ones with the bad locator I'll take those too :)
[20:56] <stxShadow> hmmm .... will have a look for them
[20:56] <stxShadow> first i will send the messenger stuff
[20:56] <gregaf1> cool
[20:56] <gregaf1> I have to run now though, will be back later :)
[20:57] <stxShadow> ok ... thank you
[20:59] * aliguori (~anthony@32.97.110.59) has joined #ceph
[20:59] <filoo_absynth> re
[21:24] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[21:25] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) has joined #ceph
[21:31] * stxShadow (~Jens@ip-88-153-224-220.unitymediagroup.de) Quit (Read error: Connection reset by peer)
[21:54] * nyeates (~nyeates@pool-173-59-237-75.bltmmd.fios.verizon.net) Quit (Quit: Zzzzzz)
[21:56] * aliguori (~anthony@32.97.110.59) Quit (Quit: Ex-Chat)
[21:57] <pulsar> sagewk: got my messages?
[22:01] * filoo (~jens@ip-88-153-224-220.unitymediagroup.de) has joined #ceph
[22:13] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[22:13] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:13] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:16] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[22:19] * ^conner (~conner@leo.tuc.noao.edu) Quit (Ping timeout: 480 seconds)
[22:28] * ^conner (~conner@leo.tuc.noao.edu) has joined #ceph
[22:28] * lofejndif (~lsqavnbok@25.Red-88-1-167.dynamicIP.rima-tde.net) has joined #ceph
[22:29] * lofejndif (~lsqavnbok@25.Red-88-1-167.dynamicIP.rima-tde.net) Quit (Max SendQ exceeded)
[22:31] * lofejndif (~lsqavnbok@25.Red-88-1-167.dynamicIP.rima-tde.net) has joined #ceph
[22:41] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[22:41] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:45] * jens (~jens@ip-88-153-224-220.unitymediagroup.de) has joined #ceph
[22:46] * jens (~jens@ip-88-153-224-220.unitymediagroup.de) Quit ()
[22:48] * filoo (~jens@ip-88-153-224-220.unitymediagroup.de) Quit (Remote host closed the connection)
[22:48] * stxShadow (~jens@ip-88-153-224-220.unitymediagroup.de) has joined #ceph
[22:48] * stxShadow (~jens@ip-88-153-224-220.unitymediagroup.de) has left #ceph
[22:49] * stxShadow (~jens@ip-88-153-224-220.unitymediagroup.de) has joined #ceph
[22:50] <imjustmatthew> I'm trying to load a core dump for ceph-mds with "gdb ceph-mds <corefile>", but getting errors, the first one being about not finding debugging symbols in /usr/bin/ceph-mds. Do I need a different package of ceph?
[22:51] * stxShadow (~jens@ip-88-153-224-220.unitymediagroup.de) Quit ()
[22:52] <joshd> imjustmatthew: if you're on a debian-style distro you need the ceph-dbg package
[23:05] * elder_ (~elder@aon.hq.newdream.net) has joined #ceph
[23:05] * gregaf (~Adium@aon.hq.newdream.net) has joined #ceph
[23:05] * jluis (~JL@ace.ops.newdream.net) has joined #ceph
[23:05] * yehudasa__ (~yehudasa@aon.hq.newdream.net) has joined #ceph
[23:05] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[23:05] * joshd1 (~joshd@aon.hq.newdream.net) has joined #ceph
[23:06] * dmick (~dmick@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[23:06] * joshd (~joshd@aon.hq.newdream.net) Quit (Write error: connection closed)
[23:06] * sagewk1 (~sage@aon.hq.newdream.net) has joined #ceph
[23:07] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[23:07] <imjustmatthew> That worked, and I'm in the correct frame to get the variable but it looks like I'm still doing something wrong.
[23:07] <imjustmatthew> I'm trying to print r from http://tracker.newdream.net/issues/2110
[23:07] <imjustmatthew> and getting
[23:07] <imjustmatthew> #11 0x00000000006a2271 in Journaler::_finish_write_head (this=0x2861000, r=<optimized out>, wrote=..., oncommit=0x0) at osdc/Journaler.cc:360
[23:07] * Tv|work (~Tv__@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[23:07] <sagewk1> what if you go up one frame?
[23:08] <imjustmatthew> (gdb) print r $1 = <optimized out>
[23:08] <imjustmatthew> up is lower number frame?
[23:08] <sagewk1> f 12
[23:08] <sagewk1> p rc
[23:09] <sagewk1> or p m->result
[23:09] <imjustmatthew> p rc gives $2 = -108
[23:09] <sagewk1> perfect, thanks
[23:10] <imjustmatthew> np, I hope it helps
[23:10] <sagewk1> dup of http://tracker.newdream.net/issues/1796
[23:11] * sjust2 (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[23:11] <sagewk1> thanks!
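The -108 that gdb printed looks like a negated errno value, which is a common way for C and C++ code to return errors. A quick sketch for turning such a value into its symbolic name, assuming that convention applies here. The numbering is platform dependent (on Linux, 108 is ESHUTDOWN), and the helper name is made up for illustration.

```python
import errno
import os


def describe_rc(rc):
    """Return (symbolic name, message) for a negated-errno return code."""
    code = abs(rc)
    # errno.errorcode maps numbers to names for the current platform only.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


print(describe_rc(-108))
```

Run this on the same kind of system that produced the core dump, since the number-to-name mapping differs between operating systems.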
[23:12] * elder (~elder@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[23:12] * joao (~JL@ace.ops.newdream.net) Quit (Ping timeout: 480 seconds)
[23:12] * yehudasa_ (~yehudasa@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[23:12] * gregaf1 (~Adium@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[23:12] * sagewk (~sage@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[23:16] * __nolife (~Lirezh@83-64-53-66.kocheck.xdsl-line.inode.at) Quit (Quit: changing servers)
[23:17] * __nolife (~Lirezh@83-64-53-66.kocheck.xdsl-line.inode.at) has joined #ceph
[23:20] * gohko_ (~gohko@natter.interq.or.jp) Quit (Read error: Connection reset by peer)
[23:26] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[23:30] <sagewk1> sjust: pushed wip-split2 with the osd.cc bits of the split refactor, disabled.
[23:30] * sagewk1 is now known as sagewk
[23:31] <sjust> ok, looking
[23:40] <sjust> sagewk: other than the nit I picked, looks fine
[23:43] * fronlius (~fronlius@f054187150.adsl.alicedsl.de) has joined #ceph
[23:43] <sagewk> sjust: thanks, will fix that up and merge.
[23:43] <sjust> ko
[23:43] <sjust> *ok
[23:47] * lofejndif (~lsqavnbok@25.Red-88-1-167.dynamicIP.rima-tde.net) Quit (Ping timeout: 480 seconds)
[23:57] * lofejndif (~lsqavnbok@29.Red-81-39-149.dynamicIP.rima-tde.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.