#ceph IRC Log

IRC Log for 2011-08-23

Timestamps are in GMT/BST.

[0:23] <johnl_> just upgraded my ceph cluster and all my osds are marking each other as failed
[0:23] <johnl_> 4 osds
[0:23] <sagewk> johnl_ to v0.33?
[0:24] <johnl_> sorry, was just finding the commit I built
[0:24] <johnl_> built early this morning from master
[0:25] <sagewk> cmccabe: I've had to add _GNU_SOURCE to a few other places.. should we just define that in the Makefile?
[0:25] <sagewk> johnl_: do you see 'wrongly marked down' in the ceph -w output/log?
[0:25] <cmccabe> hmm
[0:26] <cmccabe> sagewk: I can't think of anywhere we don't want _GNU_SOURCE, so probably
[0:28] <johnl_> commit 3a623bb327, so Sunday afternoon
[0:28] <johnl_> nah, not wrongly marked down. "2011-08-22 22:19:25.777459 log 2011-08-22 22:19:25.690509 mon0 10.126.174.94:6789/0 35 : [INF] osd0 10.208.188.226:6800/1829 failed (by osd2 10.219.16.42:6800/1865)"
[0:30] <sagewk> johnl_: are you using multiple interfaces?
[0:30] <johnl_> no
[0:30] <johnl_> if I restart the entire cluster, it comes up 2011-08-22 22:29:38.800097 osd e168: 4 osds: 4 up, 4 in
[0:31] <johnl_> then within about 30 seconds they start failing each other
[0:31] <johnl_> this is the same cluster with the scrub errors I reported actually
[0:31] <johnl_> but it's been working for the last 48 hours. written over 700gig of data to it without complaint.
[0:32] <johnl_> upgraded and then osds start failing
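To see which OSD reported which as failed, lines like the one johnl_ quoted from the monitor log can be tallied with a short script. This is a minimal sketch: the regular expression is modeled only on that single quoted line, so the exact log format in other Ceph versions is an assumption, and the names `parse_failure` and `count_reports` are hypothetical helpers, not part of Ceph.

```python
import re
from collections import Counter

# Pattern modeled on the one "[INF] osdN ... failed (by osdM ...)" line quoted
# in the log above; other Ceph versions may format this differently.
FAILED_RE = re.compile(
    r"\[INF\] osd(?P<target>\d+) \S+ failed \(by osd(?P<reporter>\d+) \S+\)"
)


def parse_failure(line):
    """Return (reporter, target) OSD ids, or None if the line is not a failure report."""
    match = FAILED_RE.search(line)
    if match is None:
        return None
    return int(match.group("reporter")), int(match.group("target"))


def count_reports(lines):
    """Count how often each (reporter, target) pair appears in the given lines."""
    pairs = (parse_failure(line) for line in lines)
    return Counter(pair for pair in pairs if pair is not None)


sample = (
    "2011-08-22 22:19:25.777459 log 2011-08-22 22:19:25.690509 mon0 "
    "10.126.174.94:6789/0 35 : [INF] osd0 10.208.188.226:6800/1829 "
    "failed (by osd2 10.219.16.42:6800/1865)"
)
print(parse_failure(sample))  # (reporter, target)
```

Feeding the saved `ceph -w` output through `count_reports` would show whether one OSD is doing most of the reporting or whether the failures are spread evenly across the cluster.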
[0:34] <johnl_> btw, I rigged launchpad to build packages from git master each night: https://launchpad.net/~ceph/+archive/unstable
[0:34] * chossette (~chossette@212-198-248-35.rev.numericable.fr) has joined #ceph
[0:34] <johnl_> been running fine for months
[0:35] <sagewk> johnl_: can you capture some osd logs and post them somewhere?
[0:36] <johnl_> will do that now
[0:36] <sagewk> debug ms = 1, debug osd = 10 should be sufficient
[0:36] * chossette (~chossette@212-198-248-35.rev.numericable.fr) Quit ()
[0:36] <sagewk> johnl_: thanks!
[0:46] <johnl_> sagewk: http://tracker.newdream.net/issues/1434
[0:47] <jojy> mount is returning i/o error (5)
[0:47] <sagewk> johnl_: can you do it with 'debug ms = 1', and attach 2 osd logs? ideally a pair where one marked down the other
[0:48] <sagewk> jojy: check dmesg|tail
[0:48] <jojy> nothing to see there. it's clear
[0:49] <johnl_> that was with debug ms = 1
[0:49] <johnl_> hrm, debug ms = 1 in the mon section, debug osd = 10 in osd section. that not right?
[0:49] <sagewk> debug ms =1 in the osd section too, so it'll affect the osd
[0:49] <johnl_> doh. what does "ms" stand for?
[0:49] <sagewk> messenger
[0:50] <johnl_> k
[0:50] <sagewk> tnx :)
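The exchange above comes down to where the debug settings are placed: `debug ms = 1` under the mon section does not raise messenger logging on the OSDs, so it has to be repeated under the osd section. The sketch below shows a config fragment with that layout and checks it with Python's `configparser`. The fragment is illustrative only, `debug_levels` is a hypothetical helper, and the check looks at one section in isolation, ignoring any inheritance that Ceph itself applies from other sections.

```python
import configparser

# Illustrative fragment only: section and option names are taken from the
# conversation above, everything else about the file is assumed.
CONF = """
[mon]
debug ms = 1

[osd]
debug ms = 1
debug osd = 10
"""


def debug_levels(conf_text, section):
    """Return the 'debug ...' options set directly in one section."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section(section):
        return {}
    return {
        key: value
        for key, value in parser.items(section)
        if key.startswith("debug ")
    }


print(debug_levels(CONF, "osd"))
```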
[0:52] <johnl_> ah, heh. all the osds bar one are crashing. just happened to get the log off the non-crashed one before.
[0:53] <johnl_> posting logs
[0:58] <johnl_> ticket updated
[1:08] <gregaf> well if they're crashing they ought to be marking each other down?
[1:08] <gregaf> oh, I'm probably behind in that conversation, n/m me
[1:11] <johnl_> yeah, I suspect that the bug is improperly filed now :)
[1:12] * u3q (~ben@uranus.tspigot.net) has joined #ceph
[1:18] <johnl_> bed time now, nn
[1:27] <sagewk> johnl_: thanks!
[1:54] * The_Bishop (~bishop@p4FCDEE25.dip.t-dialin.net) has joined #ceph
[2:11] * pinklady (~pinklady@212-198-248-35.rev.numericable.fr) has joined #ceph
[2:13] * pinklady (~pinklady@212-198-248-35.rev.numericable.fr) Quit ()
[2:24] * Juul (~Juul@3408ds2-vbr.4.fullrate.dk) Quit (Quit: Leaving)
[2:29] * MKFG (~MK_FG@188.226.51.71) has joined #ceph
[2:32] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) Quit (Ping timeout: 480 seconds)
[2:32] * MKFG is now known as MK_FG
[2:37] * cmccabe (~cmccabe@69.170.166.146) has left #ceph
[2:39] <jojy> failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/osd.0.asok': error 2: No such file or directory
[2:40] <jojy> i think this error causes mkceph to not succeed
[2:41] <jojy> i dont see any other error before the mount time I/O err
[2:43] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[2:59] * huangjun (~root@61.184.206.217) has joined #ceph
[3:12] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:17] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[3:26] * aaagirl (~aaagirl@212-198-248-35.rev.numericable.fr) has joined #ceph
[3:28] * aaagirl (~aaagirl@212-198-248-35.rev.numericable.fr) Quit ()
[3:31] * The_Bishop (~bishop@p4FCDEE25.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[3:40] * The_Bishop (~bishop@p4FCDEE25.dip.t-dialin.net) has joined #ceph
[4:11] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Remote host closed the connection)
[4:14] * The_Bishop (~bishop@p4FCDEE25.dip.t-dialin.net) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[4:18] * jojy (~jojyvargh@70-35-37-146.static.wiline.com) Quit (Quit: jojy)
[4:19] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[5:05] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[5:05] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) Quit ()
[5:16] * zyh (~zyh@182.92.247.2) has joined #ceph
[5:21] * yoshi (~yoshi@p10166-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[6:23] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has left #ceph
[6:23] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[7:12] * The_Bishop (~bishop@port-92-206-21-65.dynamic.qsc.de) has joined #ceph
[7:51] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) Quit (Quit: Leaving.)
[7:51] * sage1 (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[8:14] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[9:55] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Ping timeout: 480 seconds)
[9:57] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[11:33] * cybergggirl (~cybergggi@212-198-248-35.rev.numericable.fr) has joined #ceph
[11:35] * cybergggirl (~cybergggi@212-198-248-35.rev.numericable.fr) Quit ()
[11:47] * yoshi (~yoshi@p10166-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[13:23] * zyh (~zyh@182.92.247.2) Quit (Ping timeout: 480 seconds)
[13:43] * huangjun (~root@61.184.206.217) Quit (Quit: Lost terminal)
[15:13] * ajm (adam@adam.gs) Quit (Quit: ajm)
[15:13] * ajm (adam@adam.gs) has joined #ceph
[15:20] * u3q (~ben@uranus.tspigot.net) Quit (Read error: Connection timed out)
[15:21] * u3q (~ben@uranus.tspigot.net) has joined #ceph
[15:33] * jclendenan (~jclendena@204.244.194.20) Quit (Read error: Connection timed out)
[15:33] * jclendenan (~jclendena@204.244.194.20) has joined #ceph
[17:51] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Remote host closed the connection)
[18:22] * greglap (~Adium@166.205.141.168) has joined #ceph
[18:22] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[18:23] * jojy (~jojyvargh@70-35-37-146.static.wiline.com) has joined #ceph
[18:25] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[18:44] * visiogirl (~visiogirl@212-198-248-35.rev.numericable.fr) has joined #ceph
[18:44] * greglap (~Adium@166.205.141.168) Quit (Read error: Connection reset by peer)
[18:46] * visiogirl (~visiogirl@212-198-248-35.rev.numericable.fr) Quit ()
[18:57] * greglap (~Adium@aon.hq.newdream.net) has joined #ceph
[18:58] * greglap (~Adium@aon.hq.newdream.net) Quit ()
[18:59] * cmccabe (~cmccabe@69.170.166.146) has joined #ceph
[19:17] * Dantman (~dantman@S01060023eba7eb01.vc.shawcable.net) has joined #ceph
[19:31] <gregaf> bchrisman: yep, that assert is fixed in 80dfc981a0f70127b475d30b36f97e39eae49994
[19:31] <gregaf> errr, 4a17d71c8e369c47c7fa110de1569190cd555657
[19:45] <bchrisman> ahh okay thx
[20:01] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Quit: Ex-Chat)
[20:16] * aliguori (~anthony@32.97.110.59) has joined #ceph
[21:30] <Tv> dpkg-query: warning: parsing file '/var/lib/dpkg/status' near line 1967 package 'linux-image-3.0.0-ceph-00026-gf07cebb':
[21:30] <Tv> error in Version string 'ceph': version number does not start with digit
[21:30] <Tv> hrmmm
[21:31] <Tv> that's from sepia
[21:31] <Tv> Version: ceph
[21:38] <Tv> http://tracker.newdream.net/issues/1438
[21:47] <sagewk> yeah 3.0.0-ceph-... is what we want
[21:47] <sagewk> probably passing wrong magic to make-kpkg or something
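The dpkg warning Tv pasted is about the package's Version field being the bare string 'ceph', which does not start with a digit. A minimal sketch of that one check follows. It covers only the "starts with a digit" rule after an optional epoch, not full Debian version validation, and `upstream_starts_with_digit` is a hypothetical helper name.

```python
import re


def upstream_starts_with_digit(version):
    """Check that the version, after an optional 'epoch:' prefix, starts with a digit."""
    _epoch, sep, rest = version.partition(":")
    upstream = rest if sep else version
    # re.match anchors at the start; an empty string never matches.
    return re.match(r"[0-9]", upstream) is not None


for candidate in ("ceph", "3.0.0-ceph-00026-gf07cebb"):
    print(candidate, upstream_starts_with_digit(candidate))
```

A check like this could run before the kernel package is built, so that a wrong version argument is caught before it reaches `/var/lib/dpkg/status`.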
[22:59] * aliguori (~anthony@32.97.110.59) Quit (Ping timeout: 480 seconds)
[23:08] * aliguori (~anthony@32.97.110.65) has joined #ceph
[23:42] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.