#ceph IRC Log

IRC Log for 2011-06-05

Timestamps are in GMT/BST.

[0:23] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[1:26] * verwilst (~verwilst@dD576974E.access.telenet.be) Quit (Quit: Ex-Chat)
[2:23] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[3:33] * Juul_ (~Juul@c-76-21-88-119.hsd1.ca.comcast.net) has joined #ceph
[3:37] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[3:43] * Juul_ (~Juul@c-76-21-88-119.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[3:56] * Juul_ (~Juul@c-76-21-88-119.hsd1.ca.comcast.net) has joined #ceph
[3:56] * Juul_ (~Juul@c-76-21-88-119.hsd1.ca.comcast.net) Quit ()
[4:43] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[4:55] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[5:02] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[5:07] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit ()
[5:10] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[5:13] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit ()
[5:19] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[6:22] * joshd (~jdurgin@rrcs-74-62-34-205.west.biz.rr.com) has joined #ceph
[6:34] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[7:00] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[7:08] * nolan (~nolan@phong.sigbus.net) Quit (Remote host closed the connection)
[7:10] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[7:20] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: Been here. Done that.)
[8:45] * joshd (~jdurgin@rrcs-74-62-34-205.west.biz.rr.com) Quit (Quit: Leaving.)
[10:06] * allsystemsarego (~allsystem@188.27.167.240) has joined #ceph
[10:07] * sugoruyo (~george@athedsl-397373.home.otenet.gr) has joined #ceph
[10:07] <sugoruyo> hi all, anyone here?
[10:09] <sugoruyo> i'm trying to set up a small ceph system (3 machines) for testing
[10:09] <sugoruyo> and i'm having some trouble with the `ceph` command and authentication; can anyone help me out a bit?
[10:16] <wonko_be> paste the error in pastie or pastebin
[10:16] <wonko_be> and post it here
[10:19] <sugoruyo> wonko_be: i don't exactly get an error, it's just that if i try to use the command in any way (e.g. i want to send a pg dump command) it puts me into interactive mode, but there is no interaction even then, just a bunch of authentication errors (and i haven't set up auth explicitly)
[10:30] <wonko_be> "a bunch of errors" seems to be a good candidate to pastie :-)
[10:31] <wonko_be> together with the command...
[10:31] <sugoruyo> ok gimme a sec to gather some output and paste it
[10:32] <wonko_be> also, what auth have you configured in the config...
[10:32] <sugoruyo> nothing, i have nothing in my config to do with any kind of auth
[10:34] <sugoruyo> http://pastie.org/2021528
[10:34] <sugoruyo> here's the output from running `ceph`
[10:35] <sugoruyo> now if i type in `ceph osd stat`
[10:36] <sugoruyo> i still get put into interactive mode and receive just these messages
[10:39] <wonko_be> seems like it can't contact the mon
[10:40] <sugoruyo> you mean the `ceph` command can't contact the mon? it's running on the same machine, and the conf file is where it looks by default; if i try it with -m <mon ip> it still does the same thing
[10:40] <sugoruyo> i got no errors when i did mkcephfs so i just assumed that my ceph.conf is correct
[10:41] <wonko_be> not to sound stupid, but you're sure the mon is running?
[10:42] <wonko_be> also, check that /etc/ceph/keyring exists and has data in it
[10:43] <sugoruyo> when i run `sudo service ceph -a start` (it's on Ubuntu server 10.10) the output regarding the mon is
[10:43] <sugoruyo> === mon.0 ===
[10:43] <sugoruyo> Starting Ceph mon.0 on littleboy...
[10:43] <sugoruyo> starting mon.0 rank 0 at 10.254.254.100:0/0 mon_data /srv/mon0 fsid f5c8cd4b-7d58-6c4f-e836-4d981a2f9fcc
[10:44] <sugoruyo> (i cut out the warning about ceph being under development)
[10:44] <wonko_be> verify that it is running using "ps auxwww | grep cmon"
[10:44] <wonko_be> if it isn't, check your logs
[10:45] <sugoruyo> `ps auxwww | grep cmon` output: root 8577 0.0 0.3 66308 7792 ? Ssl 11:31 0:00 /usr/bin/cmon -i 0 -c /etc/ceph/ceph.conf
[10:47] <wonko_be> do you have data in /etc/ceph/keyring?
[10:47] <sugoruyo> i have a suspicion (it's probably stupid but...): do i need to specify a port for mon.0 to be contacted on in the ceph.conf?
[10:47] <wonko_be> it should default okay
[10:48] <sugoruyo> yes i have some data in the keyring, a [client.admin] section with a key and auid
[10:48] <wonko_be> hm
[10:48] <wonko_be> have you checked you log for the mon?
[10:49] <sugoruyo> i'm looking at it right now, i'm not sure what to look for however
[10:51] <sugoruyo> it says it won the standalone election, but then it says '0 osds: 0 up, 0 in' when i should have 6 osds
[10:51] <wonko_be> hmmz
[10:52] <wonko_be> you got some firewalling there?
[10:52] <wonko_be> and is the mon standalone (only one running?)
[10:52] <sugoruyo> no f/w, but now i noticed something weird in the output of the start command
[10:52] <sugoruyo> the mon is standalone
[10:53] <sugoruyo> the output seems to indicate it was starting the osds and mdss on 0.0.0.0
[10:53] <wonko_be> that is okay
[10:55] <sugoruyo> the host part of each osd's definition is its hostname, which is in the /etc/hosts file, so that should be enough right? i mean it SSHes into things and it seemed to be contacting the right machines for the right daemons upon creation
[10:55] <wonko_be> true
[10:55] <wonko_be> but for communication once it is running, the nodes should be able to contact each other network-wise
[10:55] <wonko_be> and somehow your osd's don't connect to your mon
[10:56] <wonko_be> looks like connectivity issues
[10:56] <wonko_be> so, firewalling (general or on the host with iptables), selinux, ... would be the first things I'd look at
[10:57] <wonko_be> check what port the mon is listening on, and try to connect to it
[10:57] <sugoruyo> they're on a Gigabit 8-port unmanaged switch and of course in the same subnet
[10:57] <sugoruyo> i haven't configured any f/w on the machines but i'm not sure if they would be blocked by default
[10:58] <sugoruyo> i'll try telnet into the mon's port
[10:58] <wonko_be> do a "iptables -L"
[10:58] <wonko_be> and "netstat -anp | grep cmon | grep LISTEN"
[10:58] <wonko_be> or something alike
[10:58] <sugoruyo> i see 3 chains, they all have policy accept
[10:58] <sugoruyo> i'm not familiar with iptables though
[10:58] <wonko_be> and no rules I assume
[10:58] <wonko_be> only the three chains?
[10:59] <sugoruyo> just the three chains, the netstat | grep... command shows one line of output
[11:00] <sugoruyo> tcp 0 0 10.254.254.100:6800 0.0.0.0:* LISTEN 8577/cmon
[11:00] <wonko_be> that is a funky port
[11:01] <wonko_be> sorry, should be okay
[11:01] <wonko_be> if you defined it like that in your config
[11:01] <sugoruyo> isn't 6800 what a mon should be listening on? that's what i read in the wiki, but i defined no ports in the config
[11:02] <wonko_be> try to telnet into it... it should be garbage that you see, but you should get something
[11:03] <sugoruyo> it connects and gives me garbage
[11:03] <wonko_be> hmmz, mon should be on port 6789
[11:04] <wonko_be> no idea if that is the problem
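A minimal sketch collecting the connectivity checks from the exchange above; the address and port are the ones reported in this log, so adjust them for your own cluster:

    # look for firewall rules that could block the monitor (empty chains
    # with policy ACCEPT, as seen here, means no local filtering)
    iptables -L

    # confirm the monitor daemon is up and find the port it listens on
    ps auxwww | grep cmon
    netstat -anp | grep cmon | grep LISTEN

    # try the port netstat reported; a connection that prints protocol
    # "garbage" means the socket is reachable
    telnet 10.254.254.100 6800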
[11:05] <wonko_be> check the sample config on http://ceph.newdream.net/wiki/Cluster_configuration#Example_configuration
[11:05] <sugoruyo> you're right, i'm looking at the wiki: 6800 is the first port for an mds, and the machine running the monitor also runs an mds
[11:06] <wonko_be> notice that you should define the sections with a dot now ([mon.0] and not [mon0])
[11:06] <sugoruyo> yeah i know, it wouldn't run mkcephfs if i didn't change that
[11:06] <wonko_be> uhu
[11:08] <sugoruyo> if i go to my ceph.conf and add the port to the mon.0 section, how do i propagate this change? do i need to copy the new ceph.conf file to all nodes then restart everything or do i need to `mkcephfs` again?
[11:10] <wonko_be> copy it around, no need to recreate the fs
[11:10] <sugoruyo> ok let me try that
[11:10] <wonko_be> then restart your mon
[11:11] <wonko_be> use "ceph -s" to see the status of your monitor
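A sketch of the copy-around-and-restart step being suggested; the remote hostnames are hypothetical placeholders, and the init script is the one invoked earlier in this log:

    # push the updated config to the other nodes (node2/node3 are placeholders)
    scp /etc/ceph/ceph.conf node2:/etc/ceph/ceph.conf
    scp /etc/ceph/ceph.conf node3:/etc/ceph/ceph.conf

    # restart all daemons, then check cluster status
    sudo service ceph -a stop
    sudo service ceph -a start
    ceph -s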
[11:13] <sugoruyo> when i run `ceph -s` it should just print some output then quit right? it reacts like you might expect from `ceph -w`
[11:14] <wonko_be> indeed
[11:14] <sugoruyo> it just spews out all those messages
[11:14] <sugoruyo> restarting the mon i get this in the output
[11:14] <sugoruyo> WARNING: 'mon addr' config option 10.254.254.100:6789/0 does not match monmap file
[11:14] <sugoruyo> continuing with monmap configuration
[11:14] <wonko_be> ah, indeed
[11:15] <sugoruyo> i'm guessing the monmap is cached somewhere
[11:15] <wonko_be> it is
[11:15] <wonko_be> in the mon metadata
[11:15] <wonko_be> just a sec
[11:15] <sugoruyo> that's where i tell it to put it in my config right?
[11:16] <wonko_be> right, but don't go changing it... anyhow, even with the mon on a different port, it should work
[11:16] <wonko_be> unless the osds/mdss don't honour the port in the config
[11:17] <wonko_be> let me verify that quickly
[11:17] <sugoruyo> well not even the mon honours it! it still starts at 6800
[11:18] <wonko_be> that is normal, it tells you this: "continuing with monmap configuration"
[11:18] <sugoruyo> in fact i copied the ceph.conf around and restarted everything to be safe but it seems to do the same
[11:19] <wonko_be> just a second, let me put up a cluster with the mon on a different port
[11:19] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[11:24] <wonko_be> actually, this works for me
[11:25] <wonko_be> but then again, it gives me my port on my starting line "starting mon.0 rank 0 at 10.1.10.240:6800/0 mon_data /data/mon0 fsid 6e52aeb8-5f5a-b0ce-6de5-1f6bed6e3f6a"
[11:25] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[11:28] <sugoruyo> well, i get the stuff about the monmap
[11:28] <sugoruyo> and then a similar line but the port is 0
[11:29] <sugoruyo> starting mon.0 rank 0 at 10.254.254.100:0/0
[11:29] <sugoruyo> maybe i need to get rid of the monmap... so it goes by the config
[11:29] <wonko_be> switching back to 6800 should be okay
[11:29] <wonko_be> I verified it, it works
[11:30] <wonko_be> and then it will be in sync with the monmap again
[11:30] <sugoruyo> then my problem lies elsewhere
[11:30] <wonko_be> lets see the output of your monmap
[11:31] <wonko_be> for me it is "monmaptool --print /ceph/mon0/monmap/1"
[11:31] <wonko_be> but your path will be different
[11:31] <wonko_be> "/ceph/mon0" is my monitor metadata path
[11:31] <sugoruyo> i changed the config back to no port specification, copied it around and restarted the whole thing
[11:31] <wonko_be> ah, no port
[11:32] <wonko_be> interesting
[11:32] <sugoruyo> i still get port 0 when starting mon
[11:32] <sugoruyo> let me check the monmap
[11:32] <wonko_be> still get the error about the monmap not being in sync with the config?
[11:33] <sugoruyo> nope, but i still get 10.254.254.100:0/0 in the output when it starts
[11:33] <sugoruyo> should i paste monmap/1 in here or pastie? 6 lines
[11:33] <wonko_be> just the one line with the IP of the monitor
[11:33] <wonko_be> 0: xxx
[11:34] <sugoruyo> 0: 10.254.254.100:0/0 mon.0
[11:34] <wonko_be> no idea how it handles a port of 0
[11:34] <wonko_be> but that can't be good
[11:34] <sugoruyo> tcp 0 0 10.254.254.100:6800 0.0.0.0:* LISTEN 19576/cmon
[11:34] <sugoruyo> that's the netstat output
[11:35] <wonko_be> can you pastie your config?
[11:35] <sugoruyo> gimme a sec
[11:35] <sugoruyo> i also customised the crushmap
[11:36] <sugoruyo> http://pastie.org/2021685
[11:38] <sugoruyo> looking back at it
[11:38] <sugoruyo> my crushmap lists a device twice (in two different second-layer buckets)
[11:42] <wonko_be> aha, I think I have recreated your problem
[11:45] <sugoruyo> i see, does that shed any light? this is the first ceph setup i've attempted so i lack experience with things, the docs aren't always helpful either
[11:46] <wonko_be> yeah, if you can restart, I'd suggest to explicitly specify the port in the config, and then recreate everything
[11:46] <wonko_be> or hang on for 5 more minutes to see how I can get this thing going again
[11:47] <sugoruyo> well since i need to fix my crushmap i might just do another `mkcephfs`
[11:48] <wonko_be> jup, but add the port to your mon addr
[11:48] <wonko_be> mon addr = 10.254.254.100
[11:48] <wonko_be> mon addr = 10.254.254.100:6789
[11:48] <wonko_be> like that
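For reference, a minimal ceph.conf sketch with the monitor port made explicit. Only mon.0's host, address, and data path appear in this log; the mds and osd entries are hypothetical placeholders for the rest of a 3-machine setup:

    [mon]
        mon data = /srv/mon$id
    [mon.0]
        host = littleboy
        mon addr = 10.254.254.100:6789
    [mds.littleboy]
        host = littleboy
    [osd]
        ; placeholder path; use wherever your osd data actually lives
        osd data = /srv/osd$id
    [osd.0]
        host = littleboy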
[11:48] <sugoruyo> yeah, i fixed the map, and changed that in ceph.conf
[11:49] <sugoruyo> do you see anything else before i `mkcephfs` again?
[11:49] <wonko_be> if you can wait for 5 more minutes
[11:49] <sugoruyo> sure
[11:52] <wonko_be> yeah, can't seem to fix it easily
[11:52] <wonko_be> (i'm just a normal user, no ceph developer :-))
[11:52] <wonko_be> also, with your custom crushmap, you might want to load it once the cluster has started okay
[11:52] <wonko_be> and not upon creation of the cluster
[11:53] <sugoruyo> well you seem to have a much better handle on things than me so any help is greatly appreciated
[11:53] <wonko_be> if you can wait, some real dev might step in and tell you the solution ...
[11:54] <wonko_be> but no idea when that might happen, being sunday and all
[11:54] <sugoruyo> well that depends on how long i would have to wait...
[11:54] <wonko_be> it seems that not specifying a port makes it default to 6800 sometimes, and to 0 on other occasions
[11:55] <sugoruyo> the thing with the crushmap is that i have this problem with my `ceph` command that just kicks me into that mode where i can only watch the output, not send any commands, and i'm afraid it might make it impossible to load the map afterwards
[11:55] <wonko_be> problem that I face is that I don't see how I should update the monmap without the monitor being running, which gives me a chicken-and-egg problem
[11:56] <sugoruyo> the good thing is i don't have any data in there so i can just mkcephfs again and again until it works
[11:56] <wonko_be> just get your cluster running first, ceph should not give you any trouble reloading a new map
[11:56] <wonko_be> i've done that a million times (give or take a few)
[11:57] <wonko_be> check this page for info on loading a new map http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH
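The page linked above describes loading a new map into a running cluster; roughly like this (filenames are placeholders, and exact options can vary by version):

    # pull and decompile the current map, edit it, then recompile and inject it
    ceph osd getcrushmap -o current.map
    crushtool -d current.map -o current.txt
    # ... edit current.txt ...
    crushtool -c current.txt -o new.map
    ceph osd setcrushmap -i new.map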
[11:59] <sugoruyo> i `mkcephfs`'d again; starting things now brings the mon up at 6789
[11:59] <sugoruyo> i'll try the `ceph` command now
[11:59] <sugoruyo> `ceph -s` now outputs some stuff and then drops me back to the shell (as i believe it should)
[12:00] <wonko_be> aha
[12:00] <wonko_be> that looks good
[12:00] <wonko_be> if you do ceph -w, it should be talking about scrubbing
[12:00] <wonko_be> ctrl-c to abort
[12:00] <sugoruyo> should i pastie the ceph -s output so you can have a look?
[12:00] <wonko_be> feel free, but should be okay I assume
[12:01] <sugoruyo> ceph -w outputs a bunch of messages which end in scrub ok
[12:04] <sugoruyo> do you know how i can tell it to replicate something x times? e.g. i want 2x replication for data and 3x for metadata
[12:04] <wonko_be> jup, let me look that up
[12:04] <sugoruyo> is it the min_size, max_size parameters in the crushmap?
[12:05] <wonko_be> there you go: http://ceph.newdream.net/wiki/Adjusting_replication_level
[12:05] <wonko_be> has (nearly) nothing to do with the crushmap
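Per that page, the replication level is a per-pool setting rather than a crushmap one; a hedged sketch of the era's commands (check the page above for your version's exact syntax):

    # 2x replication for the data pool, 3x for metadata
    ceph osd pool set data size 2
    ceph osd pool set metadata size 3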
[12:05] <sugoruyo> the one i generated with crushtool and then altered by hand seems to only have a ruleset for data
[12:06] <wonko_be> you should copy/paste it for the other sets also
[12:06] <wonko_be> and adapt the ruleset nr and the name
[12:07] <sugoruyo> also do you know where i can find some sort of guide on the whole crushmap syntax thing? i mean in rules it has these "step" lines like
[12:07] <sugoruyo> step chooseleaf firstn 0 type node
[12:07] <wonko_be> it is somehow summarized on the "custom data placement"
[12:07] <wonko_be> but not that lengthy
[12:08] <wonko_be> this is the only place I know: http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH#Writing_rules
[12:10] <sugoruyo> yeah, i've gone through that in the wiki, i was wondering if there was something more complete
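For illustration, a complete rule of the kind quoted above, in the decompiled crushtool syntax; the bucket and rule names are placeholders in the style of the wiki's examples, not this user's map. `chooseleaf firstn 0 type node` picks as many distinct "node" buckets as the pool has replicas, and one leaf device under each, which is what keeps two copies off the same machine:

    rule data {
        ruleset 0
        type replicated
        # the rule applies to pools whose replica count falls in this range;
        # these bounds do not themselves set the replication level
        min_size 1
        max_size 10
        # start the placement walk at the root bucket
        step take root
        # choose N distinct "node" buckets (N = pool size, since firstn 0)
        # and descend to one leaf device under each
        step chooseleaf firstn 0 type node
        step emit
    }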
[12:14] <sugoruyo> i'll try mounting now!
[12:14] <sugoruyo> thanks a lot for all your help
[12:16] <wonko_be> np, I turned your problem into a ceph issue on the issue tracker
[12:16] <wonko_be> feel free to add your comments to it if I missed something http://tracker.newdream.net/issues/1143
[12:20] <sugoruyo> i looked at the bug report you posted, seems to sum up things nicely and completely (at least as far as i can tell), i'll keep an eye on it
[12:25] <sugoruyo> seems to work so far, do you know how i can make it mount with read/write permissions for all users?
[12:27] <wonko_be> just mount it somewhere, and start using it as normal
[12:27] <wonko_be> see it like a normal networked filesystem
[12:27] <sugoruyo> i'm on ubuntu so i need to sudo mount it, and then it only allows root to access it
[12:28] <wonko_be> nfs and the likes
[12:28] <wonko_be> add it to your fstab, so it mounts at system boot
[12:28] <wonko_be> that is nothing ceph-related but normal sysadmin stuff
[12:28] <wonko_be> same goes with other filesystems (local and/or remote)
[12:28] <sugoruyo> unless i'm stupid enough to have left the mountpoint with perms for only root access ...
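A sketch of the mount-and-fstab suggestion above, using the kernel client; the mount point and mode are assumptions, and no auth options are given since this cluster runs without cephx:

    # one-off mount (monitor address from this log)
    sudo mkdir -p /mnt/ceph
    sudo mount -t ceph 10.254.254.100:6789:/ /mnt/ceph

    # permissions on the mounted tree are ordinary POSIX permissions,
    # e.g. let every user write, /tmp-style:
    sudo chmod 1777 /mnt/ceph

    # /etc/fstab entry so it mounts at boot
    10.254.254.100:6789:/  /mnt/ceph  ceph  defaults  0 0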
[12:39] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[12:54] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[13:43] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[13:51] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[13:55] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:10] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[14:53] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[15:34] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[15:50] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[16:57] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[17:09] * sugoruyo (~george@athedsl-397373.home.otenet.gr) Quit (Quit: sugoruyo)
[17:32] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[17:52] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[18:05] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[18:10] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[19:09] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[19:15] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[20:24] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[21:30] * verwilst (~verwilst@d51A5B40A.access.telenet.be) has joined #ceph
[22:02] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[22:20] * alexxy (~alexxy@79.173.81.171) Quit (Ping timeout: 480 seconds)
[22:26] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[22:41] * allsystemsarego (~allsystem@188.27.167.240) Quit (Quit: Leaving)
[22:57] * Yulya__ (~Yu1ya_@ip-95-220-235-13.bb.netbynet.ru) has joined #ceph
[23:04] * Yulya_ (~Yu1ya_@ip-95-220-173-252.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[23:16] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.