#ceph IRC Log


IRC Log for 2012-07-16

Timestamps are in GMT/BST.

[0:01] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) Quit (Quit: ZNC - http://znc.sourceforge.net)
[0:01] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) has joined #ceph
[0:03] * nymous (~darthampe@109-161-126-188.pppoe.yaroslavl.ru) Quit (Ping timeout: 480 seconds)
[0:27] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Remote host closed the connection)
[0:43] * The_Bishop (~bishop@2a01:198:2ee:0:1d6b:f9e3:63b6:b4cd) has joined #ceph
[0:50] * danieagle (~Daniel@177.43.213.15) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[1:08] * dpemmons (~dpemmons@204.11.135.146) Quit (Remote host closed the connection)
[1:23] * BManojlovic (~steki@212.200.241.106) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:35] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[1:44] * tremon (~aschuring@d594e6a3.dsl.concepts.nl) Quit (Quit: getting boxed in)
[1:45] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has left #ceph
[1:46] * Qten (~qgrasso@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[2:55] * Qu310 (~qgrasso@120.88.69.209) has joined #ceph
[3:01] * Qten (~qgrasso@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (Ping timeout: 480 seconds)
[3:11] * Qu310 is now known as Qten
[3:39] * Qu310 (~qgrasso@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[3:45] * Qten (~qgrasso@120.88.69.209) Quit (Ping timeout: 480 seconds)
[3:50] * nIMBVS (~nIMBVS@82.79.97.36) has joined #ceph
[3:51] <nIMBVS> hello. I'm running obsync like so: obsync --dry-run --verbose --delete-after --src-type="s3" --src-host="s3.amazon.com" --src-bucket="src-bucket" --dst-type="s3" --dst-host="s3.amazon.com" --dst-bucket="dst-bucket"
[3:52] <nIMBVS> but I'm getting "s3.amazon.com: no such bucket as src-bucket"
[3:52] <nIMBVS> using obsync 0.41 and 0.47 on ubuntu precise (12.04) and quantal (12.10)
[3:53] <nIMBVS> any ideas what I'm doing wrong?
[4:43] * deepsa (~deepsa@122.172.169.247) has joined #ceph
[4:51] * renzhi (~renzhi@180.169.73.90) has joined #ceph
[6:24] * renzhi (~renzhi@180.169.73.90) Quit (Quit: Leaving)
[6:50] <sage> nimbvs: did you replace src-bucket with the name of your s3 bucket?
[8:00] * nIMBVS (~nIMBVS@82.79.97.36) Quit (Quit: User pushed the X - because it's Xtra, baby)
[8:01] * nIMBVS (~nIMBVS@82.79.97.36) has joined #ceph
[8:02] <nIMBVS> of course. when I run the command, src-bucket contains my bucket's name
[9:05] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:15] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:57] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[10:00] * LarsFronius (~LarsFroni@95-91-243-243-dynip.superkabel.de) has joined #ceph
[10:19] * loicd (~loic@z2-8.pokersource.info) has joined #ceph
[10:26] * loicd1 (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[10:27] * loicd1 (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit ()
[10:27] * loicd1 (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[10:27] * loicd (~loic@z2-8.pokersource.info) Quit (Read error: Connection reset by peer)
[10:27] * loicd1 is now known as loicd
[11:00] <loicd> Hi. I wonder if http://ceph.com/docs/master/config-cluster/chef/ is up to date ?
[11:01] <loicd> It seems to be since it mentions "Ceph 0.48 Argonaut" ;-)
[11:24] * sileht (~sileht@sileht.net) has joined #ceph
[11:25] * fc (~fc@83.167.43.235) has joined #ceph
[11:33] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[11:56] * sileht (~sileht@sileht.net) Quit (Ping timeout: 480 seconds)
[12:04] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[12:13] * sileht (~sileht@sileht.net) has joined #ceph
[13:10] * hijacker (~hijacker@213.91.163.5) Quit (Quit: Leaving)
[13:11] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) has joined #ceph
[13:17] * hijacker (~hijacker@213.91.163.5) has joined #ceph
[13:22] * tjikkun_ (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Ping timeout: 480 seconds)
[13:23] * goedi (goedi@195.26.5.166) Quit ()
[13:22] <nIMBVS> Eureka! I got it. I needed to use S3's region endpoints for src-host and dst-host, not just s3.amazon.com for both. In my case --src-host="s3.amazonaws.com" and --dst-host="s3-us-west-1.amazonaws.com"
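
For reference, a corrected invocation along the lines nIMBVS describes would look roughly like this (the bucket names are placeholders carried over from the example above, not real buckets):

    # source bucket on the default (us-east-1) endpoint, destination in us-west-1
    obsync --dry-run --verbose --delete-after \
        --src-type="s3" --src-host="s3.amazonaws.com" --src-bucket="src-bucket" \
        --dst-type="s3" --dst-host="s3-us-west-1.amazonaws.com" --dst-bucket="dst-bucket"
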
[13:34] * tjikkun_ (~tjikkun@82-169-255-84.ip.telfort.nl) has joined #ceph
[14:03] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) Quit (Ping timeout: 480 seconds)
[14:07] * gregorg (~Greg@78.155.152.6) has joined #ceph
[14:25] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:28] * kirin` (telex@xn--phnix-ibb.net) Quit (Ping timeout: 480 seconds)
[14:47] * jamespage (~jamespage@tobermory.gromper.net) Quit (Quit: Coyote finally caught me)
[14:48] * kirin` (telex@xn--phnix-ibb.net) has joined #ceph
[14:56] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[14:56] * lofejndif (~lsqavnbok@28IAAFYQY.tor-irc.dnsbl.oftc.net) has joined #ceph
[15:13] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[15:15] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Remote host closed the connection)
[15:16] * ninkotech (~duplo@89.177.137.231) Quit (Remote host closed the connection)
[15:18] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[15:28] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Read error: Operation timed out)
[15:45] * kirin` (telex@xn--phnix-ibb.net) Quit (Ping timeout: 480 seconds)
[15:47] * lofejndif (~lsqavnbok@28IAAFYQY.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[15:48] * kirin` (telex@xn--phnix-ibb.net) has joined #ceph
[16:00] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) Quit (Ping timeout: 480 seconds)
[16:01] * kirin` (telex@xn--phnix-ibb.net) Quit (Read error: Connection reset by peer)
[16:11] * kirin` (telex@xn--phnix-ibb.net) has joined #ceph
[16:27] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) has joined #ceph
[16:41] * jamespage (~jamespage@tobermory.gromper.net) has joined #ceph
[17:13] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) Quit (Ping timeout: 480 seconds)
[17:20] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:20] * Tv_ (~tv@2607:f298:a:607:394a:5e1a:feb6:b166) has joined #ceph
[17:23] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:25] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[17:34] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) has joined #ceph
[17:38] * lofejndif (~lsqavnbok@28IAAFYU0.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:38] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:46] * sagelap (~sage@2600:1012:b00b:783e:7d77:dbf5:c28c:67a1) has joined #ceph
[17:51] * ninkotech (~duplo@89.177.137.231) has joined #ceph
[17:52] * SpamapS (~clint@xencbyrum2.srihosting.com) has joined #ceph
[17:53] <SpamapS> sagewk: when you're here.. I am wondering about how viable libfcgi's upstream is... their mailing list seems to have been dead since May 2011.
[17:55] * sagelap (~sage@2600:1012:b00b:783e:7d77:dbf5:c28c:67a1) Quit (Read error: Operation timed out)
[18:05] <sagewk> spamaps: that isn't so surprising. there probably isn't any new development, and the current code is already well hardened?
[18:07] <SpamapS> sagewk: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=681591 <-- Stuff like this still needs a place to land though
[18:08] <SpamapS> sagewk: when I say it's dead, I don't mean no messages due to inactivity.. I mean.. all messages bounce.
[18:08] * deepsa_ (~deepsa@101.63.194.24) has joined #ceph
[18:10] <SpamapS> sagewk: anyway, it's just something that has come up while looking at it for adding to our supported seed in Ubuntu (aka "main")
[18:11] * deepsa (~deepsa@122.172.169.247) Quit (Ping timeout: 480 seconds)
[18:11] * deepsa_ is now known as deepsa
[18:13] * jluis is now known as joao
[18:15] <sagewk> bummer. ok
[18:17] <SpamapS> I emailed the admin listed on the mailing list page (http://mailman.chelsea.net/mailman/listinfo/fastcgi-developers) about it
[18:24] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Ping timeout: 480 seconds)
[18:25] * lofejndif (~lsqavnbok@28IAAFYU0.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[18:30] * Cube (~Adium@12.248.40.138) has joined #ceph
[18:32] * deepsa_ (~deepsa@122.172.4.234) has joined #ceph
[18:35] * deepsa (~deepsa@101.63.194.24) Quit (Ping timeout: 480 seconds)
[18:40] * deepsa_ (~deepsa@122.172.4.234) Quit (Ping timeout: 480 seconds)
[18:40] * allsystemsarego (~allsystem@188.25.131.234) has joined #ceph
[18:41] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[18:50] * deepsa (~deepsa@122.172.4.234) has joined #ceph
[18:50] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[18:51] * The_Bishop (~bishop@2a01:198:2ee:0:1d6b:f9e3:63b6:b4cd) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[18:56] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[18:58] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:08] * andrew (~andrew@50-93-251-66.fttp.usinternet.com) has joined #ceph
[19:08] * andrew is now known as andrewbogott
[19:32] * dmick (~dmick@38.122.20.226) has joined #ceph
[19:35] * andrewbogott (~andrew@50-93-251-66.fttp.usinternet.com) Quit (Quit: andrewbogott)
[19:38] * chutzpah (~chutz@100.42.98.5) has joined #ceph
[19:43] * andrew (~andrewbog@50-93-251-66.fttp.usinternet.com) has joined #ceph
[19:44] * andrew is now known as andrewbogott
[19:52] * tjikkun_ (~tjikkun@82-169-255-84.ip.telfort.nl) Quit (Quit: Ex-Chat)
[19:58] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[20:24] * BManojlovic (~steki@212.200.241.106) has joined #ceph
[20:25] * loicd (~loic@67.23.204.5) has joined #ceph
[20:32] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[20:34] * danieagle (~Daniel@177.43.213.15) has joined #ceph
[20:34] * JJ1 (~JJ@12.248.40.138) has joined #ceph
[20:41] * kirin` (telex@xn--phnix-ibb.net) Quit (Quit: Lost terminal)
[20:42] * danieagle (~Daniel@177.43.213.15) Quit (Remote host closed the connection)
[20:43] * danieagle (~Daniel@177.43.213.15) has joined #ceph
[20:55] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[21:02] * loicd (~loic@67.23.204.5) Quit (Ping timeout: 480 seconds)
[21:08] * andrewbogott (~andrewbog@50-93-251-66.fttp.usinternet.com) Quit (Quit: andrewbogott)
[21:13] * loicd (~loic@67.23.204.5) has joined #ceph
[21:23] * Dr_O (~owen@host-78-149-118-190.as13285.net) has joined #ceph
[21:49] * loicd1 (~loic@67.23.204.5) has joined #ceph
[21:49] * loicd (~loic@67.23.204.5) Quit (Read error: Operation timed out)
[21:49] * loicd1 (~loic@67.23.204.5) Quit ()
[22:00] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has joined #ceph
[22:01] <mgalkiewicz> hi guys, I have a problem with rbd map hanging. Is it true that an rbd volume should not be mapped on a server running an osd?
[22:03] <joshd> mgalkiewicz: it can lead to the same deadlock that loopback nfs or the cephfs kernel client mounted on a server with an osd can
[22:04] <joshd> mgalkiewicz: you can make it less likely by making sure your libc and kernel support syncfs, but the deadlock is still possible
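
The syncfs support joshd mentions can be checked quickly; a minimal sketch, assuming the usual version thresholds (the syscall was added in kernel 2.6.39 and the glibc wrapper in glibc 2.14):

    # kernel must be >= 2.6.39 for the syncfs(2) syscall
    uname -r
    # glibc must be >= 2.14 for the syncfs() library wrapper
    ldd --version | head -n1
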
[22:05] <mgalkiewicz> hmm I have some rbd map processes waiting forever in iowait and it is not possible to kill them
[22:05] <mgalkiewicz> but it happens randomly
[22:05] <joshd> ok, that's a different problem
[22:05] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:06] <mgalkiewicz> could you give me some hints how to debug this?
[22:07] * Dr_O (~owen@host-78-149-118-190.as13285.net) Quit (Quit: Leaving)
[22:07] * danieagle (~Daniel@177.43.213.15) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[22:07] <joshd> if you can turn on kernel debugging, you can see where in the process things are going wrong
[22:07] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[22:08] <joshd> i.e. the kernel section of http://ceph.com/wiki/Debugging
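
The kernel section of that page leans on the dynamic debug facility; a minimal sketch, assuming CONFIG_DYNAMIC_DEBUG is enabled and debugfs can be mounted:

    # mount debugfs if it isn't already mounted
    mount -t debugfs none /sys/kernel/debug
    # enable pr_debug output for the rbd and libceph kernel modules
    echo 'module rbd +p' > /sys/kernel/debug/dynamic_debug/control
    echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
    # the extra messages then appear in dmesg / the kernel log
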
[22:08] <mgalkiewicz> yeah sure I will try
[22:09] <joshd> actually, someone just mailed the list about the same problem - do you have anything in syslog like "cannot create duplicate filename '/devices/virtual/block/rbd0'"?
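
Checking the logs for that symptom is a one-liner (log file locations vary by distro):

    grep -F "duplicate filename '/devices/virtual/block" /var/log/syslog /var/log/kern.log
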
[22:09] <mgalkiewicz> ehh CONFIG_DYNAMIC_DEBUG is not configured in my kernel
[22:10] <mgalkiewicz> checking
[22:11] <mgalkiewicz> hmm nothing like this
[22:13] <mgalkiewicz> in my case the process just hangs
[22:14] <mgalkiewicz> the same happens with unmap
[22:15] <mgalkiewicz> I have noticed that after successfully mapping, say, /dev/rbd0 and then unmapping it, /dev/rbd0 still exists but /dev/rbd/vol_name does not
[22:15] <joshd> can you still access the monitors with e.g. ceph -s when this happens?
[22:15] <mgalkiewicz> yes and status shows that everything is fine
[22:16] <mgalkiewicz> rbd showmapped returns nothing
[22:16] <joshd> definitely sounds like a race condition in the kernel rbd module then (for map and unmap)
[22:16] <mgalkiewicz> but /dev/rbd/vol_name exists
[22:17] <mgalkiewicz> hmm I guess it will not be easy to find a workaround
[22:18] <mgalkiewicz> so the best solution is to enable kernel debugging and provide you with some logs?
[22:18] <joshd> yeah, that'd be the most helpful
[22:19] <mgalkiewicz> I am using 3.2.20, do you remember any similar problems in that version?
[22:24] <joshd> not exactly, but the kernel code used by rbd map has changed a bit since then
[22:25] <joshd> there was a separate issue with device id (i.e. the 0 in /dev/rbd0) re-use that was fixed in 3.4
[22:26] <joshd> it'd be good to see if the issue still happens in 3.4
[22:30] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:31] <mgalkiewicz> do you remember which issue number?
[22:32] <mgalkiewicz> which ticket described that problem
[22:34] <joshd> http://tracker.newdream.net/issues/1907
[22:35] <joshd> as a result of that, the way rbd map works was restructured a bit
[22:38] <mgalkiewicz> hmm I am not mounting the volume, only creating a filesystem on it
[22:38] <mgalkiewicz> is there any difference between using rbd tool and executing for example echo 0 > /sys/bus/rbd/remove
[22:38] <mgalkiewicz> ?
[22:40] <joshd> no, rbd map/unmap/showmapped are just wrappers around the kernel module's sysfs interface
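
A sketch of what those wrappers write, for kernels of that era (the monitor address, key, pool, and image name below are placeholders):

    # roughly what `rbd map` does: "<mon addrs> <options> <pool> <image>" into the bus
    echo "192.168.0.1:6789 name=admin,secret=PLACEHOLDERKEY== rbd vol_name" > /sys/bus/rbd/add
    # roughly what `rbd unmap /dev/rbd0` does: write the device id
    echo 0 > /sys/bus/rbd/remove
    # roughly what `rbd showmapped` reads
    ls /sys/bus/rbd/devices
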
[22:42] * loicd (~loic@67.23.204.5) has joined #ceph
[22:43] <mgalkiewicz> it was mentioned in this ticket "use the kernel interface to remove the device instead of using 'rbd unmap', i.e."
[22:54] <mgalkiewicz> I have quite interesting strace output from mapping https://gist.github.com/3124972
[22:55] <mgalkiewicz> it looks like the problem with connecting to mon
[22:56] <mgalkiewicz> what's more, in the mon logs there are some complaints about time synchronization: "clocks are too skewed for us to function"
[22:56] * lofejndif (~lsqavnbok@09GAAGRLM.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:56] <joshd> that's just the entry point for the kernel
[22:57] <joshd> that's where the actual work of mapping happens
[22:57] <mgalkiewicz> how precisely does the time need to be synchronized?
[22:58] <mgalkiewicz> there are differences around 0.9s
[22:59] <joshd> there's a configurable threshold, mon_clock_drift_allowed, that defaults to 0.05
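
If the skew can't be fixed immediately, the threshold can be raised in ceph.conf; a hedged example (the value is illustrative, and raising it masks skew rather than fixing it):

    [mon]
        ; default is 0.05 seconds
        mon clock drift allowed = 1
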
[22:59] * LarsFronius_ (~LarsFroni@2a02:8108:3c0:24:69bf:5e36:57f3:f253) has joined #ceph
[23:01] <joshd> I don't think the clock drift would cause the hang like that
[23:01] <joshd> can you see the client connection in the monitor logs when the map is hanging?
[23:02] <mgalkiewicz> no
[23:02] <mgalkiewicz> at least not when executing through strace
[23:03] <joshd> I'm wrong, too much clock drift will prevent the monitors from making progress and letting a client connect
[23:03] <joshd> it could be the source of rbd map hanging
[23:03] <mgalkiewicz> I run it through strace because then I am able to kill it, as opposed to running it without strace
[23:04] * LarsFronius (~LarsFroni@95-91-243-243-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[23:04] * LarsFronius_ is now known as LarsFronius
[23:05] <mgalkiewicz> no, I am wrong, it still hangs
[23:06] <mgalkiewicz> joshd: is it possible that clock drift caused this, and because some rbd map commands still hang I won't be able to map anything till they are killed?
[23:10] <joshd> that looks very possible, actually
[23:11] <mgalkiewicz> what a shame that I can't test it right now because it happened in my production environment
[23:12] <mgalkiewicz> rebooting the server or ceph is probably the only way to kill them
[23:13] <mgalkiewicz> unless you have any idea?
[23:13] <joshd> that locking issue looks like it's fixed in 3.4 at least
[23:13] <joshd> but unfortunately there's no way to recover without rebooting in 3.2, sorry
[23:14] <mgalkiewicz> ok
[23:14] <mgalkiewicz> right now I am mapping all volumes on the node with only a mon
[23:14] <mgalkiewicz> I will add ntpd servers to all ceph machines
[23:15] <joshd> only the monitors need their time synchronized
[23:15] <mgalkiewicz> yeah I have 3 mons, 2 osds and an mds, so that's all my ceph machines
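
Verifying the monitors' clocks afterwards can be done along these lines (assumes ntpd; the reference server is a placeholder):

    # on each mon host: the peer marked '*' is the current sync source
    ntpq -p
    # one-shot offset query without setting the clock
    ntpdate -q pool.ntp.org
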
[23:17] <mgalkiewicz> ok thx for help anyway
[23:17] * andrewbogott (~andrewbog@c-76-113-214-220.hsd1.mn.comcast.net) has joined #ceph
[23:18] <joshd> you're welcome
[23:45] <andrewbogott> I am trying to configure my first tiny ceph cluster, and am immediately hitting this error "unable to read magic from mon data" <- does that indicate a specific misunderstanding on my part?
[23:45] <andrewbogott> That is, am I supposed to prime the mon data dir somehow?
[23:48] <andrewbogott> My conf file (largely cribbed from the ceph wiki): http://pastebin.com/40PBp6Mu
[23:48] <dmick> it looks like that whole message is "unable to read magic from mon data.. did you run mkcephfs?"
[23:49] <dmick> and, well, did you? :) or how did you set the cluster up?
[23:49] * allsystemsarego (~allsystem@188.25.131.234) Quit (Quit: Leaving)
[23:50] <andrewbogott> dmick: Maybe things on the wiki are out of order… I had the impression that I could just set up dirs and conf files on each node and then do 'service ceph start -a'
[23:50] <andrewbogott> um… 'service ceph -a start' that is
[23:51] <dmick> that's been changing a lot, but, now I know how you were setting up and starting; I'll research unless someone else beats me
[23:51] <andrewbogott> dmick: Thank you. I did try mkcephfs earlier, but encountered a different equally cryptic error. So I thought, if I have to fight through cryptic errors, might as well be doing it by hand so I learn about the steps.
[23:51] <andrewbogott> If that's foolish, I'll go back to mkcephfs :)
[23:52] <dmick> heh. no, I understand.
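
For context, the mkcephfs-based bring-up being discussed ran roughly like this in the 0.48 era (paths are the conventional defaults, not taken from this conversation):

    # create mon/osd/mds data dirs and keys on every node listed in ceph.conf
    mkcephfs -a -c /etc/ceph/ceph.conf -k /etc/ceph/keyring
    # start all daemons across the cluster
    service ceph -a start
    # confirm the monitors formed a quorum
    ceph -s
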

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.