#ceph IRC Log


IRC Log for 2011-05-07

Timestamps are in GMT/BST.

[0:31] * Meduka_Meguca (~Yulya@ip-95-220-156-143.bb.netbynet.ru) has joined #ceph
[0:32] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[0:38] * Meduka_M1guca (~Yulya@ip-95-220-133-98.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[0:42] * Meduka_M1guca (~Yulya@ip-95-220-174-242.bb.netbynet.ru) has joined #ceph
[0:47] <bchrisman> here's an excerpt from monlog (20) during that cfuse failure
[0:49] * Meduka_Meguca (~Yulya@ip-95-220-156-143.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[0:55] <bchrisman> hrm
[0:55] <bchrisman> time synchronization was too far off
[0:55] <bchrisman> 15s... when I had them synch'ed, mounted immediately.
[0:58] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[0:59] <Tv> bchrisman: huh, the sepia cluster here seems to have very unreliable clocks, they're regularly 30sec off, that only ever affects things that look at mtimes
[0:59] * neurodrone (~neurodron@ Quit (Ping timeout: 480 seconds)
[0:59] <Tv> i don't think there's anything time-related in ceph auth
[1:00] <bchrisman> wonder if there's a dependence on what mon you're mounting against and whether its time offset is positive or negative.. dunno
[1:01] <bchrisman> in this case, mon I was targeting with mount was ???15s??? but that was the same node as the client
[1:01] * neurodrone (~neurodron@ has joined #ceph
[1:01] <bchrisman> other nodes were 0 & +8sec
[1:01] <Tv> bchrisman: what i'm saying is that it's statistically very unlikely for me to never have seen that problem, if it were so
[1:02] <Tv> Mr. Occam says I should look for another explanation, at least for a while.
[1:02] <sagewk> there are problems in the mon communications if the clocks are off, which can prevent it from being readable, and may prevent auth
[1:02] <sagewk> at least there used to be :)
[1:02] <Tv> sagewk: i wonder why we don't trigger then more often
[1:03] <Tv> as in, i can run pjdtests with the cluster wildly off-sync, and only the mtime stuff fails
[1:03] <sagewk> maybe only 1 monitor in the sepia workloads?
[1:03] <Tv> most often 3
[1:03] <bchrisman> I'm running 3 mons... and only see this in failing mount
[1:04] <sagewk> it's auth that fails, or auth request never gets a reply?
[1:04] <Tv> sagewk: his log had auth timing out
[1:04] <bchrisman> that's the message I get yeah
[1:04] <bchrisman> there a bug out for it?
[1:05] <sagewk> nope
[1:06] <bchrisman> I'll put one in.. can at least probably have a better error message..
[1:07] <Tv> bchrisman: the message should also say whether it got a tcp connection or not, etc
[1:07] <Tv> bchrisman: to head people in the right path when troubleshooting
[1:08] * DLange (~DLange@dlange.user.oftc.net) Quit (Quit: +++ATH :))
[1:09] <Tv> "failed to connect to any monitor" vs "monitors did not reply, tried, "
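
The distinction Tv draws here (no TCP connection at all vs. connected but no reply) could be sketched roughly as follows. This is only an illustration in Python, not Ceph's actual code: `MonAttempt`, `describe_mount_failure`, the address strings, and the exact wording after each message are all hypothetical, and it assumes the caller already knows that no monitor answered the auth request.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MonAttempt:
    """Outcome of one attempt to reach a monitor (hypothetical record)."""
    addr: str            # monitor address as the client tried it
    tcp_connected: bool  # True if the TCP connection itself succeeded


def describe_mount_failure(attempts: List[MonAttempt]) -> str:
    """Build an error message for a mount where no monitor replied.

    Reports how far the client got, so whoever is troubleshooting knows
    whether to look at the network or at the monitors themselves.
    """
    connected = [a.addr for a in attempts if a.tcp_connected]
    if not connected:
        # Nothing was reachable: most likely a network or address problem.
        tried = ", ".join(a.addr for a in attempts)
        return f"failed to connect to any monitor (tried: {tried})"
    # TCP worked, but no reply came back (e.g. the auth request timed out).
    return f"monitors did not reply (connected to: {', '.join(connected)})"


if __name__ == "__main__":
    attempts = [
        MonAttempt("10.0.0.1:6789", tcp_connected=False),
        MonAttempt("10.0.0.2:6789", tcp_connected=True),
    ]
    print(describe_mount_failure(attempts))
```

With a message like the second one, a timeout such as the one bchrisman hit would at least point at the monitors rather than at the network.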
[1:10] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[1:21] * neurodrone (~neurodron@ Quit (Quit: zzZZZZzz)
[2:30] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:46] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:25] * cmccabe (~cmccabe@ has left #ceph
[3:42] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[4:54] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[5:23] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[5:30] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[7:40] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[10:25] <trollface> ugh
[10:43] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[11:32] * allsystemsarego (~allsystem@ has joined #ceph
[12:27] * MarkN (~nathan@ has joined #ceph
[12:39] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[13:07] * Meths (rift@ has joined #ceph
[13:51] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:25] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[14:35] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[14:41] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:41] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[16:36] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[16:41] * yehuda_hm (~yehuda@bzq-79-183-23-236.red.bezeqint.net) Quit (Read error: Connection reset by peer)
[18:06] <trollface> I have 2 pgs in crashed+down+peering state, is it fixable?
[19:17] * murb (~murb@red.danu.be) has joined #ceph
[19:30] * tuhl (~tuhl@p50895ED6.dip.t-dialin.net) has joined #ceph
[19:32] <tuhl> what is the current state of btrfs? Is it production ready for ceph?
[20:13] * tuhl (~tuhl@p50895ED6.dip.t-dialin.net) Quit (Remote host closed the connection)
[21:36] * pachuco (5138106c@ircip2.mibbit.com) has joined #ceph
[21:59] * Nadir_Seen_Fire (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[22:04] * DanielFriesen (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[22:51] * pachuco (5138106c@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[23:21] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[23:23] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit ()
[23:30] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[23:36] * verwilst (~verwilst@dD5767030.access.telenet.be) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.