#ceph IRC Log


IRC Log for 2011-02-16

Timestamps are in GMT/BST.

[0:03] <Tv> sjust: btw funky indents: git show 28bb6fb5271802beaed14309ff663719944c9aa4|grep prefix.oid
[0:03] <sjust> Tv: yeah, fixing it, forgot the header line for emacs
[0:04] <cmccabe> how to get backtrace_symbols_fd to log to syslog?
[0:04] <cmccabe> there seems to be no good way
[0:27] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[0:46] <Tv> alright time to go pick up my car / slash my wrists in LA traffic
[0:54] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[1:02] * Juul (~Juul@static.88-198-13-205.clients.your-server.de) Quit (Ping timeout: 480 seconds)
[2:42] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[2:57] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:59] * ooolinux (~bless@203.114.244.22) has joined #ceph
[3:00] <ooolinux> hi
[3:01] <cmccabe> ooolinux: hi
[3:02] <ooolinux> pg_pool 0 'data' pg_pool(rep pg_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 lpg_num 2 lpgp_num 2 last_change 4 owner 0)
[3:03] <ooolinux> that line is from the output of 'ceph osd dump -o -'; do you know the meaning of each part?
[3:03] <cmccabe> ooolinux: looks like that line describes pool 0, which is named 'data'
[3:05] <cmccabe> ooolinux: pgp_num, lpg_num, and lpgp_num have something to do with sizing the PGs in the pool
[3:05] <cmccabe> the other stuff is crush-related and I haven't dealt with it
[3:05] <ooolinux> yes, rep pg_size means it is duplicated 2 times
[3:08] <cmccabe> ?
[3:18] <ooolinux> rep pg_size means the 'data' pool keeps 2 copies.
[3:21] <cmccabe> ooolinux: yeah, good point.
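For reference, a best-effort gloss of that dump line (the chat only confirms the replication count; the rest is the usual reading of these fields and may differ slightly by version):

    pg_pool 0 'data'        pool id 0, named 'data'
    rep pg_size 2           replicated pool keeping 2 copies of each object
    crush_ruleset 0         index of the CRUSH rule that places this pool's PGs
    object_hash rjenkins    hash used to map object names onto PGs
    pg_num 128              number of placement groups in the pool
    pgp_num 128             number of PGs considered for placement
    lpg_num 2, lpgp_num 2   per-OSD "local" PG counts
    last_change 4           map epoch in which the pool last changed
    owner 0                 auid of the pool's owner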
[3:21] <cmccabe> ooolinux: I have to go, night!
[3:21] <ooolinux> bye
[3:21] * cmccabe (~cmccabe@208.80.64.79) has left #ceph
[5:52] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[7:36] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[7:51] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[9:12] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:24] * Psi-Jack (~psi-jack@yggdrasil.hostdruids.com) has joined #ceph
[9:24] <Psi-Jack> I just installed Debian 6.0 with ceph 0.24.3 from ceph's repos and I'm trying to work with rbd. The problem I'm getting is: cclass -a shows the rbd class, and I can 'ceph class activate rbd 1.3' no problem, but when I use the rbd tool, such as rbd list, it just hangs there doing nothing. I can't list, create, anything.
[9:49] * verwilst (~verwilst@router.begen1.office.netnoc.eu) has joined #ceph
[9:54] * allsystemsarego (~allsystem@188.25.130.49) has joined #ceph
[10:08] * Yoric (~David@213.144.210.93) has joined #ceph
[10:33] <Psi-Jack> okay, now I have the ceph 0.24.3 mds, mon, and osd running on my Debian 6 server, and when I try to cfuse it from my Debian 5 server it never actually mounts anything; it does exactly what the rbd tool does and hangs, doing nothing.
[10:42] * ooolinux (~bless@203.114.244.22) Quit (Ping timeout: 480 seconds)
[10:57] * ao (~ao@85.183.4.97) has joined #ceph
[12:53] * Guest1424 (quasselcor@bas11-montreal02-1128535712.dsl.bell.ca) Quit (Remote host closed the connection)
[12:56] * bbigras (quasselcor@bas11-montreal02-1128535712.dsl.bell.ca) has joined #ceph
[12:56] * bbigras is now known as Guest1571
[13:43] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[13:43] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has left #ceph
[13:45] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[13:47] * Guest1571 (quasselcor@bas11-montreal02-1128535712.dsl.bell.ca) Quit (Ping timeout: 480 seconds)
[14:10] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[15:13] * allsystemsarego (~allsystem@188.25.130.49) Quit (Quit: Leaving)
[15:18] * alexxy (~alexxy@79.173.81.171) Quit (Ping timeout: 480 seconds)
[15:20] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[15:30] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[15:54] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[15:54] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[16:05] <cclien_> hello
[16:05] <cclien_> any tips for debugging radosgw w/ lighttpd?
[16:08] <cclien_> lighttpd doesn't show any log entries from radosgw.
[16:24] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[16:27] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[16:28] <greglap> cclien_: I think it requires apache right now
[16:41] <cclien_> greglap: yes. I have tried it with Apache and it works very well. Just curious why it doesn't work on lighttpd :)
[16:41] <cclien_> With lighttpd, it treats the object name as part of the bucket name.
[16:42] <greglap> it runs on lighttpd?
[16:42] <greglap> I'm not real up to speed on rgw's environment stuff
[16:43] <greglap> but it has pretty specific requirements about how plugins interact with the server daemon in order to perform well and within spec
[16:43] <greglap> like to behave properly it actually requires a hacked modfcgi
[16:49] * gregorg (~Greg@78.155.152.6) Quit (Quit: Quitte)
[16:49] * gregorg (~Greg@78.155.152.6) has joined #ceph
[16:51] <cclien_> greglap: I think so. I am wondering if there is any possibility to make it lighttpd-compatible.
[16:51] <cclien_> that's why I am looking for tips for debugging :D
[16:52] <greglap> I'll ask yehuda about it, but I wouldn't count on it :(
[16:52] <cclien_> greglap: Thanks :D
[16:53] * ao (~ao@85.183.4.97) Quit (Quit: Leaving)
[17:19] * FoxMURDER (~fox@ip-89-176-11-254.net.upcbroadband.cz) has joined #ceph
[17:21] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[17:25] <FoxMURDER> Hey. I know i'm not the first, nor the last, to ask, but is ceph, especially its rbd part, trustworthy? Just figured out that redhat-cluster+drbd+clvm is too complicated for my setup and want to give ceph a shot. But i wasn't able to figure out the probability of losing data.
[17:41] * verwilst (~verwilst@router.begen1.office.netnoc.eu) Quit (Quit: Ex-Chat)
[17:45] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:53] * greglap (~Adium@166.205.136.169) has joined #ceph
[17:58] <greglap> FoxMURDER: as a dev, I think RBD is doing pretty well — people are building systems with it now but I'm not sure that anybody's actually got a production system currently running on it
[17:59] <greglap> it's pretty stable though, I've heard of some issues with snapshots but I don't recall hearing about anything that's lost live data in a long time
[18:00] <greglap> if you stick around for a while you may get more feedback from people using it more than I do, though
[18:00] <greglap> :)
[18:05] <FoxMURDER> sounds pretty good :) thanks for the info ...
[18:07] <FoxMURDER> one more question ... is there a way to have two distinct storage pools? i mean i have two different kinds of storage: small but fast (15k rpm drives) and large but slower (7.2k rpm drives), and i want to be able to configure which data goes where. is crush the thing i should be looking at?
[18:08] <greglap> yep!
[18:08] <greglap> that's one of the motivations for RADOS' "pools" :)
[18:09] <FoxMURDER> ah. perfect. i'm off to do some more ceph reading :)
[18:09] <greglap> you'd create a pool "fast" and a pool "big" or whatever, and then assign each of them different crush rules
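For context, a rough sketch of what those per-pool CRUSH rules could look like in a decompiled CRUSH map, assuming the 15k-rpm and 7.2k-rpm OSDs have already been grouped under buckets named "fast" and "big" (the bucket names and exact rule syntax here are illustrative, not from the chat, and may vary by version):

    rule fast {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take fast                       # only consider the 15k-rpm bucket
            step chooseleaf firstn 0 type host   # spread replicas across hosts
            step emit
    }
    rule big {
            ruleset 2
            type replicated
            min_size 1
            max_size 10
            step take big                        # only consider the 7.2k-rpm bucket
            step chooseleaf firstn 0 type host
            step emit
    }

Each pool's crush_ruleset (the field visible in the 'ceph osd dump -o -' output earlier in this log) would then be pointed at ruleset 1 or 2.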
[18:09] <monrad-51468> but you can't "tier" the ceph data right?
[18:09] <monrad-51468> oh you can
[18:09] <greglap> tier it how?
[18:09] <greglap> like in what sense?
[18:10] <monrad-51468> well we have some hitachi storage that can keep the more heavily used data on faster (but smaller) disks, e.g. SSDs
[18:11] <greglap> ah, well it's not capable of moving hotspot data to faster storage or otherwise automatically detecting it, no :(
[18:11] <greglap> but you can set up some pretty complex data placement rules
[18:12] <monrad-51468> but i guess that could make its way into ceph at some point along the road
[18:12] <monrad-51468> but right now we are limited by metadata lookups
[18:12] <greglap> the possibility for certain kinds of optimizations definitely exists
[18:12] <monrad-51468> and at least ceph can do many MDS's
[18:15] <monrad-51468> how do they scale?
[18:15] <greglap> across multiple MDSes?
[18:17] <greglap> haven't tested it lately I'm afraid, but sage tested an early form of it for his thesis
[18:18] <monrad-51468> ah ok
[18:18] <monrad-51468> i think that would be a big hit here
[18:19] <greglap> he's got a nice little graph in his thesis, you can download it from the website
[18:19] <greglap> but eg "In the makedirs workload, each client creates a tree of nested directories four levels deep, with ten files and subdirectories in each directory. Average MDS throughput drops from 2000 ops per MDS per second with a small cluster, to about 1000 ops per MDS per second (50% efficiency) with 128 MDSs (over 100,000 ops/sec total)."
[18:20] <greglap> I think we've implemented some stuff since then to make per-MDS throughput higher in that kind of workload, but I'd have to check
[18:24] * sagelap (~sage@216.2.29.104) has joined #ceph
[18:32] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:35] * sagelap (~sage@216.2.29.104) Quit (Ping timeout: 480 seconds)
[18:39] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:40] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[18:42] * greglap (~Adium@166.205.136.169) Quit (Quit: Leaving.)
[18:49] * atg (~atg@please.dont.hacktheinter.net) Quit (Remote host closed the connection)
[18:49] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[18:52] * Yoric (~David@213.144.210.93) Quit (Quit: Yoric)
[18:57] * cmccabe (~cmccabe@c-24-23-253-6.hsd1.ca.comcast.net) has joined #ceph
[19:15] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:23] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[19:26] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[19:26] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[19:28] * allsystemsarego (~allsystem@188.25.130.49) has joined #ceph
[19:47] <cmccabe> tv: so one thing about configuration that is a little funky right now is this
[19:47] <cmccabe> tv: cconf only parses the stuff that it cares about
[19:48] <Tv> huh
[19:48] <cmccabe> tv: so it's not possible to, say, iterate over all MDS configs in cconf.
[19:48] <cmccabe> tv: each time you run cconf, you set g_conf.id
[19:48] <Tv> cmccabe: i was *just* doing that, with cconf, by using --list-sections
[19:48] <cmccabe> tv: that determines which configuration sections are read. The rest are thrown away
[19:48] <cmccabe> tv: well, listing sections is one thing
[19:49] <cmccabe> tv: but did you notice that g_conf only has one data structure, for, say, lockdep?
[19:49] <Tv> that gives me what ids to iterate over, then i just call -i ... -t ...
[19:49] <cmccabe> tv: or addr, or other fields like that
[19:49] <cmccabe> tv: yeah, you can do it by invoking cconf many times I guess
[19:49] <cmccabe> tv: that is one reason I would just like to have something that allows me to look at the config file as a whole
[19:50] <Tv> cmccabe: yes but if we write that in python, then i'm going to argue it must take over cconf's role
[19:50] <Tv> hence, i write cconf commands for now, we can clean that up later
[19:50] <cmccabe> tv: I'm fine with taking over cconf's role
[19:50] <Tv> frankly, i'd like to postpone that work
[19:50] <Tv> there's huge missing gaps
[19:51] <Tv> and cconf is plenty fine to do what's needed, at least for me
[19:51] <Tv> come back to that when we have >100 automated tests ;)
[19:51] <Tv> that'll also tell you *what* is needed, so the replacement is not written in a vacuum
[19:52] <cmccabe> tv: here is something we've talked about doing that depends on having a real config file parser
[19:52] <cmccabe> tv: having a python script that is like dsh (executing a command on many machines), but which uses ceph.conf to discover the IP addresses of those machines.
[19:52] <cmccabe> tv: then we could drop the buggy -a stuff from init-ceph and stop burning time on that
[19:52] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[19:52] <Tv> that's completely doable with cconf, i think
[19:53] <Tv> anyway, i don't think replacing one kind of ssh with another is much value
[19:53] <cmccabe> tv: it's painful and annoying
[19:53] <cmccabe> tv: the value is getting it out of init-ceph
[19:54] <Tv> look, i'm writing "give the ips and ports of all mons" right now, and it looks like ~30 lines of verbose python
[19:54] <bchrisman> just checking here… cconf extracts values only from a config file… doesn't actually pull values from running services…. that correct?
[19:54] <Tv> bchrisman: yes
[19:54] <cmccabe> tv: can you put those ~30 lines of verbose python into a library that I can use?
[19:55] <cmccabe> tv: hmm.. separate repos. Guess not.
[19:55] <Tv> cmccabe: right now they're going into teuthology.ceph, but i don't see that as the big challenge..
[19:55] <Tv> also, autotest stuff goes through a few utility functions you won't have outside of it
[19:55] <Tv> but anyway, it's pretty simple stuff
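For reference, a minimal sketch of the sort of helper being discussed, shelling out to cconf instead of reparsing ceph.conf. The --list-sections, -t and -i flags are the ones mentioned above, but the exact invocation and output format here are assumptions, not the real teuthology code:

    #!/usr/bin/python
    # Hedged sketch: collect the ip:port of every mon by invoking cconf per id.
    # Assumes `cconf -c <conf> --list-sections mon` prints section names such as
    # "mon.0", one per line, and that `-t mon -i <id> 'mon addr'` prints that
    # mon's address.
    import subprocess

    def mon_addrs(conf='/etc/ceph/ceph.conf'):
        sections = subprocess.Popen(
            ['cconf', '-c', conf, '--list-sections', 'mon'],
            stdout=subprocess.PIPE).communicate()[0]
        addrs = {}
        for section in sections.split():
            if '.' not in section:
                continue                        # skip a bare [mon] section, if any
            mon_id = section.split('.', 1)[1]   # "mon.0" -> "0"
            addr = subprocess.Popen(
                ['cconf', '-c', conf, '-t', 'mon', '-i', mon_id, 'mon addr'],
                stdout=subprocess.PIPE).communicate()[0]
            addrs[mon_id] = addr.strip()
        return addrs

    if __name__ == '__main__':
        print mon_addrs()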
[19:56] <cmccabe> tv: I could write a python ceph.conf file parser in 15 mins. If you hate it, you can go back to using cconf. But I think it will be much better.
[19:56] <Tv> cmccabe: i don't see it being that valuable right now
[19:56] <Tv> cmccabe: as in, my priorities are definitely elsewhere
[19:57] <Tv> and honestly, i'd rather argue for using a more common config file format than writing custom parsers
[19:58] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[20:00] <cmccabe> tv: I think the differences between our format and the standard python configfile format are totally trivial
[20:00] <cmccabe> tv: they're all whitespace
[20:00] <Tv> cmccabe: except ConfigParser uses leading whitespace for continuation lines
[20:00] <Tv> now every line looks like a continuation line
[20:01] <cmccabe> tv: oh those python people and their significant whitespace...
[20:01] <Tv> RFC-822...
[20:01] <cmccabe> tv: anyway, this class is now totally trivial. It literally consists of writing a regexp, let's see if I'm up to it.
[20:01] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[20:02] <sjust> Tv: I can't seem to find the wiki page, what is the hostname for the testvm host?
[20:02] <Tv> sjust: you see the hostnames in the autotest web ui, they're testvm-NNN.ceph.newdream.net
[20:02] <Tv> sjust: oh you mean the kvm host
[20:02] <sjust> yeah
[20:03] <Tv> looking..
[20:03] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[20:03] <Tv> sjust: https://uebernet.dreamhost.com/wiki/CephLibvirt looks good to me
[20:03] <Tv> linked from the Ceph page
[20:03] <sjust> Ah! thanks
[20:04] <Tv> sjust: you'll need to be able to ssh in as root to use virt-manager
[20:04] <prometheanfire> https://uebernet.dreamhost.com/wiki/CephLibvirt doesn't work for me
[20:04] <sjust> Tv: yeah
[20:04] <Tv> prometheanfire: staff only
[20:04] <prometheanfire> ah
[20:04] <sjust> Tv: actually, I just need atomic ops installed on the test machines
[20:05] <Tv> sjust: dev or libs?
[20:05] <sjust> dev
[20:05] <gregaf> I don't think there is a lib — just a header file :)
[20:06] <sjust> true
[20:06] <Tv> yeah
[20:07] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[20:07] <Tv> sjust: done
[20:07] <sjust> Tv: thanks
[20:08] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[20:13] <cmccabe> tv: oh, have you taken a look at /root/packages on flak?
[20:13] <Tv> cmccabe: nope
[20:13] <cmccabe> tv: I'm guessing that many of those packages are things people will want in the test env
[20:13] <cmccabe> tv: of course... probably not emacs or vi, or all that :)
[20:14] <Tv> there's a way for tests to say "i need this deb installed", i'm just not quite clear on the details yet
[20:14] <gregaf> yeah, that's all the stuff that should be installed on all our dev machines; it should probably get on the test ones too
[20:14] <gregaf> it's our standard dev environment ;)
[20:15] <Tv> so, urr, cconf is a read-only tool? bummer
[20:15] <Tv> i was kinda expecting it to be the equivalent of "git config"
[20:16] <cmccabe> tv: it's basically just calling common_init and then reading from g_conf
[20:29] <Tv> ConfigObj can *almost* parse ceph.conf.. just ";" as comment instead of "#" is stopping it
[20:29] <Tv> checking one alternative, then forking ConfigObj a bit for our porpoises
[20:31] <cmccabe> #!/usr/bin/python
[20:31] <cmccabe> from ConfigParser import *
[20:31] <cmccabe> import re, sys, tempfile
[20:31] <cmccabe> class CephConfigParser(ConfigParser):
[20:31] <cmccabe>     """Extended ConfigParser that can read Ceph config files"""
[20:31] <cmccabe>     def read(self, conf_file_name):
[20:31] <cmccabe>         ceph_conf_re = re.compile("[ \t]*(?P<key>[^=]*)[ \t]*=(?P<val>.*)$")
[20:31] <cmccabe>         conf = open(conf_file_name, 'r')
[20:31] <cmccabe>         temp = tempfile.NamedTemporaryFile(mode='w+b', delete=False)
[20:31] <cmccabe>         while True:
[20:31] <cmccabe>             line = conf.readline()
[20:31] <cmccabe>             if (line == ""):
[20:31] <cmccabe>                 break
[20:31] <cmccabe>             match = ceph_conf_re.match(line)
[20:31] <cmccabe>             if (not match):
[20:31] <cmccabe>                 temp.write("%s" % line)
[20:31] <cmccabe>             else:
[20:31] <cmccabe>                 key = match.group('key').replace (" ", "_")
[20:31] <cmccabe>                 if key[-1] == '_':
[20:31] <cmccabe>                     key = key[0:-1]
[20:31] <cmccabe>                 temp.write("%s=%s\n" % (key, match.group('val')))
[20:31] <cmccabe>         conf.close()
[20:31] <cmccabe>         print "temp.name = '%s'" % temp.name
[20:31] <cmccabe>         ConfigParser.read(self, temp.name)
[20:32] <cmccabe> anyway. I have to meet someone for lunch, I'm curious if you come up with something better. If you do, can you put it somewhere in the main repo so I can use it too?
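For reference, a cleaned-up sketch of the same idea. This is not the version that ended up anywhere; it is just the paste above with the missing flush added, the temp file cleaned up, and leading whitespace stripped so ConfigParser does not mistake indented keys for continuation lines (the problem Tv points out at 20:00):

    #!/usr/bin/python
    # Hedged sketch, assuming Python 2.x like the paste above.
    from ConfigParser import ConfigParser
    import re, tempfile

    class CephConfigParser(ConfigParser):
        """ConfigParser that accepts ceph.conf-style 'key with spaces = value' lines."""
        def read(self, conf_file_name):
            key_val_re = re.compile(r"^\s*(?P<key>[^=\[#;]+?)\s*=\s*(?P<val>.*)$")
            temp = tempfile.NamedTemporaryFile(mode='w')
            with open(conf_file_name) as conf:
                for line in conf:
                    match = key_val_re.match(line)
                    if not match:
                        # section headers, comments, blank lines: keep them, but
                        # drop indentation so nothing looks like a continuation
                        temp.write(line.lstrip(' \t'))
                    else:
                        key = match.group('key').replace(' ', '_')
                        temp.write("%s = %s\n" % (key, match.group('val')))
            temp.flush()                        # the paste above forgot this
            ConfigParser.read(self, [temp.name])
            temp.close()                        # temp file is deleted on close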
[20:33] * cmccabe (~cmccabe@c-24-23-253-6.hsd1.ca.comcast.net) has left #ceph
[20:43] <Tv> for tests, that control their circumstances anyway, just avoid ";" comments and use configobj
[20:43] <Tv> http://www.voidspace.org.uk/python/configobj.html
[20:43] <Tv> works perfectly
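For what it's worth, a minimal sketch of that, assuming a ceph.conf that uses only '#' comments:

    #!/usr/bin/python
    # Hedged sketch: parse ceph.conf with ConfigObj (works as long as ';' comments
    # are avoided, per the discussion above).
    from configobj import ConfigObj

    conf = ConfigObj('/etc/ceph/ceph.conf')
    for name in conf.sections:          # e.g. 'global', 'mon.0', 'osd.0', ...
        print name, dict(conf[name])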
[21:11] * sagelap (~sage@216.2.29.104) has joined #ceph
[21:14] * sagelap (~sage@216.2.29.104) Quit ()
[21:14] * sagelap (~sage@216.2.29.104) has joined #ceph
[21:20] * sagelap (~sage@216.2.29.104) Quit (Remote host closed the connection)
[21:55] <gregaf> Tv: I can't remember if we discussed this already — could we set up gitbuilder to email people when they break a branch?
[21:55] <Tv> gregaf: perrrhaps
[21:56] <Tv> the builder part doesn't really always know why it's building a particular commit, etc
[21:56] <Tv> as in, the same commit is on multiple branches
[21:56] <gregaf> I'm just noticing that master got broken (I think in the tests) :(
[22:02] <Tv> gregaf: the existing email things in there seem to be about emailing about everything new.. maybe something could be done with the rss feed
[22:02] <Tv> the rss isn't very helpful about blame either
[22:03] <gregaf> yeah, it's kind of a problem since if somebody breaks a branch and another person branches off from there then the new branch will be broken
[22:03] <gregaf> hmm
[22:05] <Tv> well the next step from that starts to sound like gerrit or something
[22:08] <prometheanfire> if there are not enough osd nodes up, or some go down, is it similar to a pulled plug as far as what happens to ceph?
[22:08] <Psi-Jack> Wow, NOW people are here? heh
[22:12] <gregaf> prometheanfire: I'm not quite sure what you mean
[22:13] <gregaf> Psi-Jack: of course! all the Ceph devs are at work now
[22:13] <Psi-Jack> hehe
[22:13] <prometheanfire> how does ceph fail if a bunch of nodes fail
[22:13] <gregaf> prometheanfire: depends on which particular nodes in the system fail :)
[22:13] <prometheanfire> say I have it all on 1 rack and that racks power goes out
[22:13] <Psi-Jack> Well, I've been trying to actually use ceph, but it doesn't seem to be working at all for me.
[22:14] <prometheanfire> I might end up using it instead of nfs :D
[22:14] <gregaf> if you lose it all, then yeah, I think the clients will attempt to connect but be unable to reach their host devices
[22:14] <Psi-Jack> I installed Debian 6.0 and ceph 0.24.3 on it from ceph's repos and I'm trying to work with rbd. The problem I'm getting is: cclass -a shows the rbd class, and I can 'ceph class activate rbd 1.3' no problem, but when I use the rbd tool, such as rbd list, it just hangs there doing nothing. I can't list, create, anything.
[22:15] <Psi-Jack> Also, I have the ceph 0.24.3 mds, mon, and osd running on my Debian 6 server, and when I try to cfuse it from my Debian 5 server it never actually mounts anything; it does exactly what the rbd tool does and hangs, doing nothing.
[22:15] <prometheanfire> and if I set the redundancy to 2 for osd and lose three osd nodes, same thing?
[22:15] <prometheanfire> (I'm basically wondering if I'll lose data like xfs)
[22:15] <yehudasa> Psi-Jack: sounds like your system is not up
[22:15] <gregaf> if you lose only part of your nodes but you have a current copy of all your data then everything will keep running, although it may hang for a short time while Ceph determines that nodes are really down and reassigns OSDs to cover them
[22:16] <Psi-Jack> yehudasa: It's up, running, and everything.
[22:16] <gregaf> prometheanfire: if you lose all copies of some data then attempts to access that data will hang but everything else will keep working
[22:16] <Psi-Jack> In fact, if I stop ceph services on the server, I get fault messages from cfuse, and that stops when I start it back up.
[22:17] <prometheanfire> ok, so it is like if nfs goes down, just hanging
[22:17] <yehudasa> Psi-Jack: what does 'ceph -s' show?
[22:17] <Psi-Jack> Sorta, yeah. Except it NEVER connects or anything.
[22:17] <yehudasa> you might just have monitors up but not osds
[22:17] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[22:18] <gregaf> prometheanfire: yeah, I guess so then
[22:18] <Psi-Jack> You want all 5 lines pasted here or pastebin'd?
[22:18] <gregaf> in a well-designed crush map it will take a lot of node failures for that to happen though!
[22:18] <yehudasa> pastebin is always appreciated
[22:18] <Psi-Jack> http://pastebin.com/fPbXk1Du
[22:19] <prometheanfire> gregaf: true, I like to know the worst case though :D
[22:19] <yehudasa> 2011-02-16 16:17:33.699964 osd e3: 0 osds: 0 up, 0 in
[22:19] <yehudasa> Psi-Jack: you don't have osds up
[22:19] <Psi-Jack> Hmm. How do I get them up?
[22:19] <yehudasa> how do you start the system?
[22:19] <Psi-Jack> It's a single server setup for now, just testing. ceph's running in a kvm guest for testing purposes.
[22:20] <Psi-Jack> I started it all with /etc/init.d/ceph start
[22:20] <yehudasa> how does your ceph.conf look like?
[22:21] <Psi-Jack> http://pastebin.com/6vKQbYWv
[22:23] <yehudasa> Psi-Jack: did you run mkcephfs?
[22:23] <Psi-Jack> Yep
[22:23] <yehudasa> do you have any logs e.g., at /var/log/ceph?
[22:23] <yehudasa> specifically osd logs
[22:24] <Psi-Jack> http://pastebin.com/6TgrTBAV
[22:25] <Psi-Jack> I'd originally had it in /mnt/ceph, then changed it to /data/osd0
[22:25] <yehudasa> you should probably run mkcephfs again
[22:25] <yehudasa> there's some inconsistency there
[22:26] <yehudasa> probably because you switched the data volume
[22:27] * cmccabe (~cmccabe@c-24-23-253-6.hsd1.ca.comcast.net) has joined #ceph
[22:27] <Psi-Jack> Okay. Now it shows 1 osds: 1 up, 1 in
[22:27] <yehudasa> and what does the mon line show?
[22:28] <yehudasa> I mean the 'pg' line
[22:28] <Psi-Jack> mon e1: 1 mons at {0=172.17.6.201:6789/0}
[22:28] <Psi-Jack> pg v6: 264 pgs: 264 active+clean+degraded; 24 KB data, 3364 KB used, 79507 MB / 79511 MB avail; 21/42 degraded (50.000%)
[22:28] <yehudasa> yeah, so rbd should be working now
[22:28] <Psi-Jack> Woohoo, indeed, it does.
[22:29] <Psi-Jack> When using cfuse, I should be able to see the mount using mount, yes?
[22:29] <Psi-Jack> Oh, blah, now cfuse is getting a connection timed out. :/
[22:29] <gregaf> "see the mount using mount"?
[22:29] <yehudasa> what does your mds line in ceph -s show?
[22:30] <Psi-Jack> mds e4: 1/1/1 up {0=up:active}
[22:30] <yehudasa> yeah, so you should be able to mount it
[22:30] <gregaf> you did restart cfuse, right?
[22:30] <Psi-Jack> Yes.
[22:30] <Psi-Jack> 2011-02-16 16:29:19.712041 7f83ee6656f0 monclient(hunting): authenticate timed out after 30
[22:31] <gregaf> hmmm, how're you starting cfuse?
[22:31] <Psi-Jack> cfuse -m 172.17.6.201:6789 /mnt/data
[22:32] <Psi-Jack> Which that is the right IP and port.
[22:33] <yehudasa> Psi-Jack, what happens if you just run 'cfuse /mnt/data'?
[22:34] <Psi-Jack> Same thing
[22:34] <yehudasa> and you said that rbd worked?
[22:34] <Psi-Jack> Yes, rbd list was able to show me something for once.
[22:34] <yehudasa> does it still show it?
[22:35] <Psi-Jack> Yes. Just not rbd images presently. ;)
[22:35] <yehudasa> can you 'cfuse --debug-ms=1 /mnt/data'?
[22:36] <Psi-Jack> Though I can't create anything. create error: Operation not supported, from rbd create foo --size 1024
[22:36] <yehudasa> 'cclass -a' might solve that
[22:36] <Psi-Jack> Nope.
[22:36] <yehudasa> ceph class list
[22:36] <yehudasa> ?
[22:36] <Psi-Jack> Getting the same Operation not supported in cfuse, too; one sec and I'll pastebin
[22:37] <Psi-Jack> http://pastebin.com/EUjH1yjN
[22:38] <Psi-Jack> As for ceph class list, this is under mon0 installed classes: rbd (v1.3 [x86-64]) [active]
[22:38] <yehudasa> yeah, I think there is some other issue
[22:39] <yehudasa> can you paste the mon log?
[22:39] <Psi-Jack> Hmm, I had the auth stuff commented out in the ceph.conf
[22:39] <yehudasa> did you restart the mon after doing that?
[22:40] <Psi-Jack> Yes. But that's likely why it was failing: auth not being there.
[22:40] <yehudasa> but the mon shouldn't expect auth
[22:40] <yehudasa> because you commented that out
[22:41] <yehudasa> basically from what I see here, the cfuse is trying to use the 'none' authentication method and the mon tells it that it's not supported
[22:41] <Psi-Jack> Heh
[22:42] <gregaf> can you kill your monitor and restart the whole system?
[22:42] <gregaf> probably the cmon didn't get killed and when you try to start the system now the monitor starts up and can't bind its port because the old instance is using it
[22:42] <Psi-Jack> Sure
[22:43] <Psi-Jack> The ceph vm is restarting now. :)
[22:43] <Psi-Jack> Rebooting that is.
[22:44] <Psi-Jack> Nope, still getting the same stuff.
[22:45] <yehudasa> can you paste the mon log?
[22:45] <Psi-Jack> http://pastebin.com/WWiDTn3a
[22:48] <Psi-Jack> Hmmm
[22:48] <Psi-Jack> WITH auth supported = cephx, I can't get the osds to come back up, even after a fresh mkcephfs.
[22:50] <yehudasa> can you add 'debug ms = 1' to your ceph.conf?
[22:50] <yehudasa> also 'debug auth = 20'
[22:50] <Psi-Jack> I presume in global?
[22:50] <yehudasa> put it in the global section
[22:51] <Psi-Jack> Okay.
[22:52] <Psi-Jack> I still have complete failure to do anything when I do have auth supported = cephx; the osds won't even come up.
[22:53] <yehudasa> Psi-Jack: at this point just disable the auth
[22:53] <Psi-Jack> Okay.
[22:54] <Psi-Jack> It's disabled, and everything's up and running now it seems.
[22:54] <yehudasa> cfuse?
[22:54] <Psi-Jack> Not yet, I'm trying that now and going to pastebin another mon log
[22:55] <Psi-Jack> http://pastebin.com/12rtKx1B
[23:02] <yehudasa> Psi-Jack: where's your ceph.conf located?
[23:02] <Psi-Jack> Standard location, /etc/ceph/ceph.conf
[23:02] <yehudasa> does your cfuse spit out the extra networking messages now when you run it?
[23:03] <Psi-Jack> Yes
[23:04] <Psi-Jack> Blah.
[23:05] <Psi-Jack> You know what it was, on that side of things? I didn't copy the ceph.conf over from the server to the client so it was still attempting auth.
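In other words, the client's /etc/ceph/ceph.conf has to agree with the server's about authentication. A hedged illustration of the relevant [global] lines (not Psi-Jack's actual file; the settings are the ones quoted in the chat above):

    [global]
            ; either enable cephx everywhere or comment it out everywhere,
            ; on clients as well as on the mon/mds/osd host
            ;auth supported = cephx
            debug ms = 1
            debug auth = 20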
[23:07] <Psi-Jack> Though, rbd still doesn't work.
[23:08] <Psi-Jack> And now it does. heh
[23:14] <Psi-Jack> Hmmm, and so far the only side-effect is that Proxmox itself doesn't work at all with qemu's rbd or ceph's rbd features. Blah. ;)
[23:22] * frankl_ (~frank@november.openminds.be) Quit (Ping timeout: 480 seconds)
[23:23] * frankl_ (frank@november.openminds.be) has joined #ceph
[23:35] * allsystemsarego (~allsystem@188.25.130.49) Quit (Quit: Leaving)
[23:53] <Tv> sjust: i think you filled the disk on the vm..
[23:53] <prometheanfire> what about proxmox?
[23:54] <sjust> Tv: oops, yeah, forgot to mention that
[23:54] <prometheanfire> http://forum.proxmox.com/threads/5760-Ceph-and-file-storage-backend-capabilities
[23:54] <sjust> Tv: I would have expected it to remove the old runs from the client after archiving...
[23:54] <Tv> sjust: i gave them just 20giggers because copying them around was so slow
[23:54] <Tv> sjust: yeah it should.. i think this is from one run
[23:54] <sjust> Tv:
[23:54] <Tv> checking
[23:54] <sjust> oh
[23:54] <sjust> Tv: oh, actually, yeah
[23:54] <Psi-Jack> prometheanfire: Yep. I know. I saw your posting about it already. :)
[23:54] <Tv> nope, there's a bunch of temp dirs in there
[23:54] <prometheanfire> :D
[23:54] <sjust> Tv: ah
[23:55] <Tv> 13G /usr/local/autotest/tmp/tmpTd70Cs_snaps
[23:55] <Tv> lovely
[23:55] <Tv> so it doesn't know how to clean up after itself
[23:55] <Tv> writing that one down..
[23:55] <Psi-Jack> I'm using Proxmox for my virtualization infrastructure. I'm waiting for Proxmox 2.0 because it's going to be so much more awesome, and likely have more CLI tools to manage things due to the pacemaker usage. ;)
[23:56] <Tv> sjust: also, i haven't plugged in anything in the "repair" hook, that's supposed to do stuff like that i guess
[23:56] <sjust> Tv: ah
[23:56] <prometheanfire> ya, same
[23:56] <prometheanfire> but ganeti is getting shared storage in the next couple of weeks
[23:56] <Psi-Jack> Looked at ganeti, and just wasn't very... thrilled.
[23:56] <prometheanfire> which means ceph
[23:56] <prometheanfire> and rbd
[23:56] <Psi-Jack> Possibly also sheepdog? heh
[23:57] <prometheanfire> sec
[23:57] <prometheanfire> http://groups.google.com/group/ganeti/msg/ea48ff118c0d2729
[23:58] <Psi-Jack> Hmm, interesting.
[23:58] <cmccabe> how does ganeti compare to ganglia I wonder?
[23:58] <Psi-Jack> My /current/ setup is multipath'd iSCSI, multiple NAS servers exporting iSCSI and replicating by DRBD, then the vm host servers use those iSCSI targets under multipath.
[23:59] * bbigras (quasselcor@bas11-montreal02-1128535815.dsl.bell.ca) has joined #ceph
[23:59] <Psi-Jack> cmccabe: Ermm.. ganglia the monitoring system?
[23:59] * bbigras is now known as Guest1645

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.