#ceph IRC Log


IRC Log for 2011-11-29

Timestamps are in GMT/BST.

[8:10] <chaos_> sagewk, thanks for patch
[8:11] <chaos_> (hack)
[8:11] <chaos_> now i've built ceph ;p
[8:33] <chaos_> checking for LIBEDIT... configure: error: No usable version of libedit found.p
[8:33] <chaos_> i can't build 0.38
[8:39] <chaos_> oh well ;) dpkg-buildpackage works better than manual compilation
[9:48] <chaos_> and hell broke loose ;/ i've hit http://tracker.newdream.net/issues/1612
[9:48] <chaos_> which kernel version fixes this?
[9:48] <chaos_> 3.1?
[9:48] <chaos_> 3.2?
[10:30] <chaos_> sjust, ping
[10:52] <chaos_> sagewk, gregaf - http://tracker.newdream.net/issues/1757, if u need anything else add a comment to the bug report
[11:44] * pmjdebruijn (~pascal@overlord.pcode.nl) has joined #ceph
[11:44] <pmjdebruijn> hi
[11:45] <pmjdebruijn> I'm trying to build Ceph git master
[11:45] <pmjdebruijn> I noticed the new dependency on libuuid
[11:46] <pmjdebruijn> however it does not seem to build with version 2.17.2
[11:46] <pmjdebruijn> even though configure seems to think it's okay
[11:47] <pmjdebruijn> https://launchpadlibrarian.net/86176461/buildlog_ubuntu-lucid-amd64.ceph_0.38%2B250~gc2889fe-1unnet2~lucid_FAILEDTOBUILD.txt.gz
[11:47] <pmjdebruijn> this is one of my buildlogs
[11:49] <pmjdebruijn> "checking for uuid_parse in -luuid... yes"
[11:50] <pmjdebruijn> http://packages.ubuntu.com/lucid-updates/amd64/uuid-dev/filelist
[11:50] <pmjdebruijn> "./include/types.h:22:18: error: uuid.h: No such file or directory"
[11:52] <pmjdebruijn> erhm
[11:52] <pmjdebruijn> shouldn't that have been src/include/types.h:#include "uuid.h"
[11:53] <pmjdebruijn> sorry
[11:53] <pmjdebruijn> #include "uuid/uuid.h"
[11:56] <pmjdebruijn> hmm wait, there is a local uuid.h
[15:48] <grape> Is there a definitive dependency list for building 0.38?
[15:48] <damoxc> grape: could check the debian/control file
[16:17] <grape> damoxc: thanks
[16:20] <Olivier_bzh> hi everybody,
[16:21] <Olivier_bzh> I would like to know if ceph can work with different osd's with different sizes ?
[16:22] <Olivier_bzh> is it possible, or should it be avoided ?
[16:22] <todin> Olivier_bzh: yes, you can adjust the weight of an osd
[16:23] <Olivier_bzh> ok thanks, and one more question to adjust and tune better ceph on my cluster :
[16:24] <lxo> gregaf, remember that snapshot creation problem I mentioned the other day (Friday?), that caused some files in snapshots to appear in getdents of the enclosing dir, but fail to stat? it's repeatable! I created a new copy of the same tree, took a snapshot, and voila, the same two files fail in the same way
[16:25] <Olivier_bzh> I have 3 servers with 1 osd per server
[16:25] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) has joined #ceph
[16:26] <Olivier_bzh> the osd is taking a lot of proc, but I have often free procs available
[16:27] <Olivier_bzh> I would like to know if this is a good idea to have more osds on each server to balance the proc load ?
[16:27] <lxo> now, repeatable is good, but where should I start looking, in the snapshot creation code or in the dir listing out of snapshotted dirs, or elsewhere?
[16:28] <lxo> are there any utilities I could use to decode directories from the file that hold their info in osds?
[16:28] <todin> Olivier_bzh: it depends, I think you shouldn't have more than one osd per disk, if you have more than one disk in the machine you could start more than one osd
[16:29] <Olivier_bzh> thank you todin, yes I have 3 disks in each server, and for the moment, they are pooled in one btrfs file system
[16:31] <todin> Olivier_bzh: so you could start 9 osds overall, 3 per server on 3 servers
[16:32] * NightDog (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[16:32] <Olivier_bzh> that's fine... and overall, I've some free ethernet interfaces : I guess that if I launch osds on different interfaces, this will also balance the network traffic, won't it ?
[16:34] <todin> Olivier_bzh: you could give the osds different ips which are bound to different interfaces, or use nic bonding
[16:35] <Olivier_bzh> Thank you very much todin, I've finished with my questions ;-)
[16:35] <Olivier_bzh> it helps me a lot to understand how ceph works
[16:35] <todin> Olivier_bzh: ok, no problem
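A minimal sketch of the weight idea todin describes above: give each OSD a weight proportional to its disk capacity, so that larger disks take proportionally more data. The OSD names and sizes are hypothetical, and scaling the largest disk to 1.0 is only an illustrative convention, not something the log or Ceph prescribes.

```python
def relative_weights(sizes_gb):
    """Return a weight per OSD proportional to its capacity.

    The largest disk gets weight 1.0. This scaling is an illustrative
    choice; only the ratios between the weights matter in this sketch.
    """
    if not sizes_gb:
        return {}
    largest = max(sizes_gb.values())
    if largest <= 0:
        raise ValueError("disk sizes must be positive")
    return {osd: round(size / largest, 3) for osd, size in sizes_gb.items()}


# Hypothetical cluster: three OSDs with different disk sizes (in GB).
sizes = {"osd.0": 500, "osd.1": 1000, "osd.2": 2000}
weights = relative_weights(sizes)
for osd, weight in sorted(weights.items()):
    print(osd, weight)
```

How the resulting numbers are applied to a running cluster depends on the Ceph version and tooling in use, which this sketch does not cover.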
[16:37] <lxo> hmm, interesting... the directory hasn't changed for days, even though I created the snapshot (of a parent dir) today. I suppose that's how it should be, so now I gotta figure out whether the snapshot root was changed in any way to reflect the snapshot change, or whether the mds is still behind, or where the snapshot information is held as far as directories are concerned
[16:46] <lxo> looks like it's the mds running slow, for some ancestors have already been updated, but not this one dir, nor its parent and grandparent
[16:48] <lxo> it's a tree with 3 backups of SHR GNU/Linux root filesystems, with thousands of small files scattered over hundreds of directories each; the one directory that fails has a particularly large number of entries
[17:09] <lxo> even more interesting: after the osd file holding the root of the snapshot was last modified (by the creation of the snapshot), a bunch of other osd files holding directories in there were modified over less than a minute, and from then on, the only osd files modified within 1.*/ are 200.* files, which I believe to be the mds journal, at a rate of one new 4MB file every 4-5 minutes
[17:09] <lxo> this seems to imply that the mds got stuck in committing stuff to directories, no?
[17:11] <lxo> I've recently stopped any other activity in the filesystem, which was getting new journal files created, but it didn't make progress any more. now, let's see what happens when I restart the mds...
[17:15] <lxo> but, before that... let me see whether the last-modified directory holds any relationship with the one from which files disappeared
[17:31] <lxo> hmm, apparently not. it updated only 21 directories, none of which are even close to the failing one
[17:39] <lxo> ok, the mds has officially lost it (TM); other files are now disappearing from under it, within other opkg/info dirs (package manager meta-info with a few files per package)
[17:52] <lxo> couldn't get anything relevant from the running mds session with gdb
[17:54] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[17:56] <chaos_> sagewk, i didn't have commit id
[17:56] <chaos_> i'll look if it's fixed
[17:57] <chaos_> oh.. its fixed in 3.2-rc1
[17:57] <chaos_> i'll try
[18:03] <lxo> ok, mds restarted and recovered, files that last failed still fail, but re-creating them doesn't seem to bring about any new failures
[18:07] <lxo> or just not in the first round... :-(
[18:08] <lxo> oh well... I'll keep looking into this, but suggestions of where to look for higher likelihood of finding a bug would be appreciated ;-)
[18:34] <sage> chaos_: keep in mind the btrfs xattr bug may have corrupted the object metadata such that upgrading the kernel now won't help.
[18:36] <sage> lxo: if you can reproduce the bug with debug mds = 20 that should be enough. if you watch the _fetch_dir debug output you should see what was contained on disk, although that may not actually be the culprit (due to the way recent changes are journaled but not nec. in the directory object)
[18:37] <sage> lxo: btw, does your btrfs patch #2 address the same problem christian was seeing with performance degradation after a couple days of usage?
[18:39] <sage> will be in late.. waiting at home for guy to show up
[18:41] <chaos_> sage, how i can recover from this?
[18:43] <sage> chaos_: you can kludge around it with something like http://fpaste.org/PKwW/
[18:44] <chaos_> i'll
[18:44] <sage> although if you're unlucky it may manifest in some other way down the line
[18:44] <chaos_> unfortunately 3.2 kernel has patched this bug, osd crashes as before
[18:44] <sage> yeah
[18:45] <chaos_> ehh
[18:45] <sage> i'm not sure if there is a right way to work around the bug. btrfs basically dropped setxattr, and we don't know exactly what it should have contained
[18:46] <lxo> sage, yeah, it's probably a fix for that one problem
[18:47] <sage> that hack will make it not crash (there) at least, but there are probably other objects with the same problem. you're best off wiping ..
[18:47] <lxo> I'll bump up mds logging and see how that goes
[18:47] <chaos_> well this data wasn't so important, i can lose file or two
[18:47] <chaos_> sage, wiping what?!
[18:47] <chaos_> whole cluster?
[18:48] <sage> chaos_: yeah, if there is important data. this probably isn't the only object to hit that btrfs bug
[18:48] <chaos_> ...
[18:48] <lxo> chaos_, since he's speaking of btrfs xattrs, I'd guess that just the failed osd would suffice, assuming the data is replicated in other osds it should recover by itself
[18:48] <chaos_> lxo, bots osd crashed at the same time
[18:49] <chaos_> both*
[18:49] <lxo> ugh
[18:49] <lxo> that sucks
[18:49] <lxo> been there before
[18:49] <chaos_> very much
[18:50] <sage> if i were you i'd probably apply the workaround and try to copy stuff out, see how things hold up. at least then you'll know where you stand
[18:50] <chaos_> we'll have to reconsider btrfs, how much ceph sucks with ext4?
[18:51] <sage> we could spend some time trying to cope with this particular corruption.. i hate to spend time on specific old bugs, though, so it's a question of how generally useful we can make a workaround
[18:51] <chaos_> sage, if i recover most of data it will be fine
[18:51] <sage> ext4 has xattr size limits. xfs is probably a better bet. honestly, though, i'd stick with btrfs for testing, unless you need to load this up with production data immediately.
[18:52] <gregphone> ext4 depends on what you're doing
[19:06] <chaos_> ok, building ubuntu package with sage's workaround
[19:08] <NaioN> sage or somebody else who knows, I noticed that the roadmap has been updated to .40 but I don't see a .39 tag in git?
[19:09] <NaioN> I like to have a .39 build...
[19:09] <chaos_> everyone want something;p i want my data back
[19:10] <NaioN> also I've asked keizerflipje (seen some time back on this channel) to build the latest git (of today), but we're getting an error with the uuid.h include?
[19:10] <NaioN> chaos_: okok yours a little more important ;)
[19:10] <NaioN> but then again, it isn't tagged as stable yet ;)
[19:13] <joshd> NaioN: libuuid is a new dependency in master (packaged as uuid-dev or libuuid-devel)
[19:13] <joshd> 0.39 will probably be tagged at the end of the week
[19:15] <NaioN> joshd: thanks! I'll inform keizerflipje, so he can solve his build issue
[19:16] <NaioN> and we will check at the end of the week to build a .39 release
[19:19] <pmjdebruijn> joshd: I already fixed the Build-Depends to include uuid-dev (which in turn pulls in libuuid1)
[19:20] <pmjdebruijn> the build log I pasted contains everything including the setup of the build environment
[19:25] <joshd> pmjdebruijn: looks like it's failing for us too - sage: deb-amd64 gitbuilder shows the same errors since libuuid was added
[19:26] <joshd> there must be something missing from the control file, since building from source (instead of building a package) works
[19:27] <pmjdebruijn> hmmm
[19:27] <pmjdebruijn> very odd
[19:28] <Tv> joshd: what errors?
[19:28] <pmjdebruijn> Tv: https://launchpadlibrarian.net/86176461/buildlog_ubuntu-lucid-amd64.ceph_0.38%2B250~gc2889fe-1unnet2~lucid_FAILEDTOBUILD.txt.gz
[19:29] <Tv> pmjdebruijn: i'd rather see the error from the gitbuilder where i can fix it ;)
[19:29] <Tv> that's just a pile of unknown.. it's missing the uuid library
[19:29] <joshd> Tv: In file included from mds/MDS.cc:17: error: ./include/types.h:22:18: uuid.h: No such file or directory
[19:29] <Tv> joshd: oh so someone started using a lib without adding it to build-depends
[19:30] <joshd> except it was added to build-depends, that's the odd part
[19:30] <Tv> joshd: and then the gitbuilders actually need to get the package installed; the build-dep is just a more explicit error for that
[19:30] <pmjdebruijn> Tv: I am pulling in uuid-dev and libuuid1
[19:31] <Tv> oh uuid.h is in /usr/include/uuid/
[19:31] <Tv> no the includes are <uuid/uuid.h> that should work..
[19:31] <pmjdebruijn> well it's in the local source there is a uuid.h as well
[19:31] <Tv> src/include/types.h:22:#include "uuid.h"
[19:31] <NaioN> Tv: be aware that there are two uuid.h's
[19:31] <Tv> oh evil
[19:31] <Tv> we should really use pkg-config for libuuid
[19:31] <NaioN> Tv: pmjdebruijn also thought that was the problem
[19:31] <pmjdebruijn> that would be nicest
[19:31] <pmjdebruijn> NaioN: yeah that confused me too
[19:32] <pmjdebruijn> there is a hilarious monologue of mine in the backlog because of that :D
[19:32] <chaos_> sage, it looks that osd booted, and i've lost 1 pg, i'll boot second one in a minute
[19:32] <NaioN> but it seems it refers to another uuid.h
[19:32] <NaioN> so it's a little more complicated :)
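The confusion above comes from two files sharing the name uuid.h. As a rough illustration (not the actual Ceph build), the sketch below imitates the usual GCC lookup order: a quoted include looks in the directory of the including file first, then the -I directories, then the system directories, while an angle-bracket include skips the first step. The directory layout is a throwaway assumption built in a temp dir, and `resolve_include` is a hypothetical helper, not a real toolchain API.

```python
import os
import tempfile


def resolve_include(name, quoted, including_file, include_dirs, system_dirs):
    """Return the first existing path for an include, or None.

    Quoted includes search the including file's own directory first;
    angle-bracket includes skip that step.
    """
    search = []
    if quoted:
        search.append(os.path.dirname(including_file))
    search.extend(include_dirs)
    search.extend(system_dirs)
    for directory in search:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    """Build a tiny tree with a local uuid.h and a 'system' uuid/uuid.h."""
    with tempfile.TemporaryDirectory() as root:
        local = os.path.join(root, "src", "include")
        system = os.path.join(root, "usr", "include")
        os.makedirs(local)
        os.makedirs(os.path.join(system, "uuid"))
        for path in (os.path.join(local, "types.h"),
                     os.path.join(local, "uuid.h"),
                     os.path.join(system, "uuid", "uuid.h")):
            open(path, "w").close()

        types_h = os.path.join(local, "types.h")
        found = (
            resolve_include("uuid.h", True, types_h, [], [system]),
            resolve_include("uuid/uuid.h", False, types_h, [], [system]),
            resolve_include("uuid.h", False, types_h, [], [system]),
        )
        # Report paths relative to the temp root so they are readable.
        return tuple(os.path.relpath(p, root) if p else None for p in found)


quoted, angled, missing = demo()
print('#include "uuid.h"      ->', quoted)
print("#include <uuid/uuid.h> ->", angled)
print("#include <uuid.h>      ->", missing)
```

In this toy layout the quoted form picks up the project-local file, and the bare name is not found at all when only the system directory is searched, which is one way a "No such file or directory" error could arise even with uuid-dev installed.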
[19:33] <pmjdebruijn> so I'm wondering whether it's a uuid version or gcc version issue
[19:33] * pmjdebruijn never tested a non-package build on Ubuntu Lucid
[19:33] <pmjdebruijn> NaioN: btw I'm not called keizerflipje over here :D
[19:34] <NaioN> yeah i noticed
[19:34] <pmjdebruijn> :D
[19:34] <NaioN> tab completion didn't work
[19:34] <NaioN> so i thought you left the channel already :)
[19:34] <Tv> joshd: can you show me the internal build error we have for that?
[19:34] <Tv> joshd: that's way more reproducible for me than launchpad
[19:35] <Tv> afk
[19:35] <greglap> Tv: josh is sitting next to me now like you should be :p
[19:35] <pmjdebruijn> https://launchpad.net/~unnet-pkg-master/+archive/ceph-release/+files/ceph_0.38%2B250~gc2889fe.orig.tar.gz
[19:35] <pmjdebruijn> https://launchpad.net/~unnet-pkg-master/+archive/ceph-release/+files/ceph_0.38%2B250~gc2889fe-1unnet2~lucid.diff.gz
[19:36] <pmjdebruijn> those are our package sources btw
[19:37] <pmjdebruijn> anyhow, I'll be afk for a bit too, I'll be around again tomorrow
[19:40] <chaos_> sage, yey! my data is here, at least partially
[22:30] <grape> similarly, what should the permissions be on the mon.* and osd.* mountpoints?
[22:31] <grape> and the log files would probably be in the same boat
[22:32] <NaioN> grape: what could be a problem with sudo is the environment
[22:32] <grape> NaioN: so running it as root would be safer?
[22:33] <NaioN> no
[22:33] <grape> NaioN: not safe by security, but safe as in getting it running properly
[22:33] <NaioN> but you have to check the options for sudo
[22:34] <NaioN> because (same as with su) it's not always that all the environment variables are changed
[22:34] <NaioN> eg if you do a sudo -s your HOMEDIR remains that of the user
[22:35] <NaioN> so i think you don't have troubles with the permissions, but with some variables that are different under user root and sudo root
[22:36] <NaioN> but the process shall be executed as root (0)
[22:36] <NaioN> so security wise it's the same
[22:38] <NaioN> but normally that's not a problem for programs that don't do anything specific for a user, because in that case you don't depend on an environment variable of the user
[22:38] <NaioN> grape: what goes wrong?
[22:39] <grape> lol chasing my tail :-)
[22:40] <NaioN> looked in the man of sudo and I think you need -i
[22:40] <grape> ah that is what I needed to know :-)
[22:42] <grape> I was working on some install docs and noticed that I was doing something different than the other docs said. namely I was running "sudo ssh hostname command" when some other docs said "ssh hostname sudo command"
[22:43] <grape> it occurred to me that this could make a difference
[22:43] <NaioN> grape: sudo isn't a security tool, it's just that you can work under your normal user and invoke commands with root privileges if needed, so you don't have to run around as root all the time :)
[22:43] <NaioN> grape: yes that's a difference
[22:43] <grape> sure. running with your pants down is hard :-)
[22:44] <NaioN> with the first you wouldn't have the problems
[22:44] <NaioN> because you would become root on the first host and logon as root on the second host and get the root environment
[22:45] <grape> ok, then that would give a consistent outcome
[22:45] <NaioN> with the second line you ssh as the user to the second host and sudo to root (environment remaining) and execute the command
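The two invocations grape and NaioN compare differ only in where the privilege change happens. A minimal sketch, with a hypothetical hostname and a placeholder command, that builds both command lines without running them:

```python
import shlex


def sudo_then_ssh(host, command):
    """Elevate locally first: the ssh client itself runs as root."""
    return ["sudo", "ssh", host] + list(command)


def ssh_then_sudo(host, command):
    """Log in as the current user, then elevate on the remote host."""
    return ["ssh", host, "sudo"] + list(command)


host = "node1.example"                  # hypothetical hostname
command = ["service", "ceph", "start"]  # placeholder command

print(shlex.join(sudo_then_ssh(host, command)))
print(shlex.join(ssh_then_sudo(host, command)))
```

Whether the first form can actually log in as root depends on the remote sshd configuration, and which environment variables survive the sudo in the second form depends on the remote sudoers policy, so neither is guaranteed to behave the same across distros.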
[22:45] <grape> this could bring trouble depending on distro and sudo configuration then
[22:46] <grape> environment setup, rather
[22:46] <NaioN> well what I don't get is in what sort of trouble do you get?
[22:47] <grape> my problem was that ceph wasn't starting
[22:47] <grape> the root/sudo thing just eliminates another possibility
[22:48] <NaioN> hmmm maybe a PATH problem (different PATHs for a user and root)
[22:48] <NaioN> I could imagine that that's different between user and root
[22:48] <grape> well, everything is strictly being run as root, so I don't think that is a problem
[22:50] <NaioN> you said that if you run as root natively then everything works, but if you sudo to root it doesn't?
[22:50] <grape> no, I don't know that. I only know that it does not start if I run everything as root.
[22:51] <NaioN> do you have an error?
[22:51] <grape> that problem is only if crypt is on. when I turn crypt off, I skip that problem and go on to not being able to connect to the other nodes
[22:52] <grape> actually, let me run my setup script again so I have something fresh to work with
[22:52] <grape> i need a fresh mess to work with :-)
[22:52] <NaioN> :)
[23:04] <grape> ok, nice and fresh. the problem is that monclient wants to send_mon_message to mon.c at
[23:04] <grape> mon.c doesn't want to talk to monclient
[23:05] <grape> the address looks funny to me though
[23:05] <grape> why is it /0? shouldn't it be /24?
[23:06] <NaioN> no that's right
[23:06] <grape> ok
[23:07] <grape> that's good to know.
[23:07] <NaioN> but if you do a ceph -s what do you see?
[23:08] <grape> a whole bunch of failed connections like this:
[23:08] <grape> 2011-11-29 16:59:20.319364 7fc350fb0700 -- :/17338 >> pipe(0xaeb040 sd=5 pgs=0 cs=0 l=0).fault 111: Connection refused
[23:08] <grape> 2011-11-29 16:59:20.319425 7fc350fb0700 -- :/17338 >> pipe(0xaeb040 sd=5 pgs=0 cs=0 l=0).fault first fault
[23:08] <grape> 2011-11-29 16:59:20.319684 7fc350fb0700 -- :/17338 >> pipe(0xaeb040 sd=5 pgs=0 cs=0 l=0).connect error, 111: Connection refused
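A small sketch for picking the error code out of messenger lines like the three grape pasted above. The regular expression is an assumption fitted to these samples only, not a general parser for Ceph logs. On Linux, error 111 is ECONNREFUSED, which typically means nothing is listening at the target address.

```python
import re

# Assumed layout, based only on the pasted samples:
# "<date> <time> <thread id> ... <errno>: <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:.]+) (?P<thread>[0-9a-f]+) "
    r".*?(?P<code>\d+): (?P<text>.+)$"
)


def parse_fault(line):
    """Return timestamp, thread, code and text, or None if no error code is present."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return {
        "timestamp": match.group("date") + " " + match.group("time"),
        "thread": match.group("thread"),
        "code": int(match.group("code")),
        "text": match.group("text"),
    }


# The three lines from the log, copied as pasted (addresses are elided there too).
lines = [
    "2011-11-29 16:59:20.319364 7fc350fb0700 -- :/17338 >> pipe(0xaeb040 sd=5 pgs=0 cs=0 l=0).fault 111: Connection refused",
    "2011-11-29 16:59:20.319425 7fc350fb0700 -- :/17338 >> pipe(0xaeb040 sd=5 pgs=0 cs=0 l=0).fault first fault",
    "2011-11-29 16:59:20.319684 7fc350fb0700 -- :/17338 >> pipe(0xaeb040 sd=5 pgs=0 cs=0 l=0).connect error, 111: Connection refused",
]

parsed = [parse_fault(line) for line in lines]
refused = [p for p in parsed if p is not None and p["code"] == 111]
print(len(refused), "of", len(lines), "lines report error 111")
```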
[23:09] <NaioN> hmmmm has the monitor been started?
[23:09] <NaioN> what do you see in the mon log
[23:10] <grape> it looks like it has imported all of the keys
[23:11] <grape> it created the initial map, etc
[23:11] <NaioN> I run without cephx :)
[23:11] <grape> no errors
[23:11] <grape> I am running without cephx as well right now
[23:12] <grape> well, i'm not running, but if I were...
[23:12] <NaioN> which keys are you talking about?
[23:12] <NaioN> hmmmm
[23:12] <NaioN> could you paste the ceph.conf to eg pastebin?
[23:12] <grape> sure
[23:15] <grape> http://pastebin.com/mtLvBFHM
[23:20] <grape> while I am at it, here is the ceph -s output
[23:20] <grape> http://pastebin.com/P3a1mYnH
[23:27] * verwilst (~verwilst@dD576FA87.access.telenet.be) Quit (Quit: Ex-Chat)
[23:28] <NaioN> grape: lets have a look
[23:30] <NaioN> you have run mkcephfs?
[23:30] <NaioN> with -a en --mkbtrfs?
[23:31] <grape> yeah
[23:31] <NaioN> s/en/and :)
[23:31] <grape> here is mon.a.log http://pastebin.com/pPtHNBa8
[23:31] <NaioN> and it didn't give any errors?
[23:31] <grape> no errors other than the connection fail
[23:32] <NaioN> and the process runs?
[23:32] <NaioN> cepn-mon?
[23:32] <NaioN> ceph-mon
[23:34] <grape> ps ax doesn't return anything with ceph in it
[23:34] <NaioN> hmmmm that isn't good :)
[23:34] <NaioN> how do you start the service?
[23:35] <NaioN> with /etc/init.d or with service ceph start? depends on the distro
[23:35] <grape> with /etc/init.d/
[23:35] <grape> im on ubuntu 11.10
[23:35] <NaioN> then service ceph start
[23:35] <NaioN> does that give errors?
[23:36] <grape> nah still does nothing
[23:36] <NaioN> hmmm you get output with service ceph start?
[23:37] <grape> nothing with both
[23:37] <NaioN> that isn't good
[23:38] <NaioN> you have another kind of problem
[23:38] <NaioN> you should see something of output from the ceph daemons
[23:39] <grape> do you need to be on one of the osd nodes to init properly? I can try starting from a different node
[23:39] <NaioN> http://pastebin.com/efjejtej
[23:40] <NaioN> best to start with the mon
[23:40] <NaioN> because then you can do ceph -s
[23:40] <NaioN> and you get information about the cluster
[23:40] <grape> it did nothing
[23:41] <grape> i have one node that is only mon, the other two nodes are mon + osd *2
[23:42] <NaioN> you don't have a mds?
[23:42] <grape> no
[23:42] <NaioN> well i don't know if its needed for running
[23:42] <grape> I am only going to be using it for block devices - didn't think it was needed
[23:43] <NaioN> yeah ok, but i don't know if you need to specify it
[23:43] <grape> I can add one
[23:43] <NaioN> yes it doesn't hurt
[23:46] <grape> the script is running to build the cluster again
[23:55] <grape> it looks like maybe it is pushing monmap before it creates the filesystems
[23:56] <grape> === osd.1 ===
[23:56] <grape> pushing conf and monmap to alef-storage:/tmp/mkfs.ceph.19737
[23:56] <grape> umount: /srv/osd.1: not mounted
[23:56] <grape> umount: /dev/sdc1: not mounted
[23:56] <grape> that would explain why it isn't connecting
[23:56] <NaioN> no thats correct
[23:57] <NaioN> if you use --mkbtrfs mkcephfs first tries to umount and then creates the fs
[23:57] <grape> then when does it push the monmap?
[23:58] <NaioN> 23:55 < grape> pushing conf and monmap to alef-storage:/tmp/mkfs.ceph.19737
[23:58] <grape> gotcha
[23:58] <NaioN> then it creates the fs and then it puts the info on the right spot
[23:59] <NaioN> sorry have to go...
[23:59] <grape> thanks for your help

