#ceph IRC Log


IRC Log for 2010-12-02

Timestamps are in GMT/BST.

[0:01] * ijuz (~ijuz@p4FFF6162.dip.t-dialin.net) has joined #ceph
[0:17] * ajnelson (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[0:21] <johnl> hey, shouldn't the librados debian package provide a librados.so
[0:22] <johnl> as a symlink to librados.so.1.0.0 ?
[0:22] <sagewk> apparently that's the job of the -dev package. .so is only needed when building
[0:23] <johnl> ah right.
[0:23] <sagewk> or maybe ldconfig makes those? dunno, i'm not super familiar with how the shared library stuff works
[0:23] <johnl> that indeed does the job
[0:23] <johnl> however it works, installing the -dev package got it created
[0:24] <johnl> looks to be in the package itself, not ldconfig
[0:33] <johnl> right, got a launchpad ppa building ubuntu packages from the rc git branch
[0:33] <johnl> https://launchpad.net/~johnleach/+archive/ceph-rc
[0:34] <johnl> not quite automated because launchpad can't auto-pull from any branch but master
[0:34] <johnl> so I have to git pull and bzr push atm
[0:34] * ajnelson (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[0:34] <johnl> am running the rc packages now though on my cluster. looks good so far.
[0:40] <johnl> oop, crashed it.
[0:41] <sagewk> johnl: stack trace?
[0:42] <johnl> http://pastebin.com/X7jFwvKQ
[0:42] <johnl> wasn't running in debug mode. I could probably reproduce
[0:42] <sagewk> ha, I just pushed a fix for that about 5 minutes ago :)
[0:42] <johnl> I was writing lots of objects and then added a new osd
[0:42] <johnl> lol
[0:43] <johnl> on rc branch?
[0:43] <sagewk> yeah
[0:43] <johnl> right, I'll set a rebuild going.
[0:43] <johnl> canonical pay for the build time :)
[0:47] <johnl> neat, builds whenever I push.
[0:47] <sagewk> what is it?
[0:48] <johnl> launchpad.net
[0:48] <sagewk> ah
[0:48] <johnl> new feature called recipes
[0:48] <johnl> just write a recipe, which takes a bzr repo, applies a transform on the package version and does a build.
[0:49] <johnl> so I'm just mirroring your git repo. the debian/ dir on there is enough. the recipe adds the commit to the version and builds
[0:49] <johnl> unfortunately it doesn't know the git commit, just the bzr commit :(
[0:50] <johnl> easy to run a build for all the ubuntu distros though. just a checkbox :)
[0:50] <sagewk> cool
[0:51] <cmccabe> sagewk: good fix in 78a14622438addcd5c337c4924cce1f67d053ee9
[0:51] <cmccabe> sagewk: I hadn't considered backlogs in that case
[0:51] <sagewk> yeah
[0:54] <johnl> off to bed. nn.
[0:55] <sagewk> ttyl
[0:55] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Quit: bye)
[1:05] <bchrisman> I've got a question on build: I'm starting with CentOS5.5 with EPEL pkgs, autogen.sh succeeds, ./configure succeeds… but it looks like $(builddir) is not getting set in the makefile: make[2]: *** No rule to make target `/ceph_ver.c', needed by `ceph_ver.o'. Stop.
[1:07] <bchrisman> I'm not so familiar with the details of the autogen stuff and was wondering if anybody here had any pointers before I go digging around too much?
[1:07] <sagewk> which version are you building?
[1:08] <bchrisman> a few minutes ago: git clone git://ceph.newdream.net/ceph.git
[1:09] <sagewk> can you try the 'rc' branch? there were a number of changes in the Makefile with srcdir vs builddir recently
[1:09] <bchrisman> ahh ok… will do.
[1:10] <cmccabe> bchrisman: you might have an old version of the code combined with an old version of automake
[1:11] <cmccabe> bchrisman: older versions of automake didn't define $builddir at all, so you might see messages like the one you pasted
[1:12] <cmccabe> bchrisman: we have a workaround now though
[1:12] <bchrisman> cmccabe: that's in the rc branch?
[1:12] <cmccabe> bchrisman: yeah, try the rc branch
[1:13] <sagewk> i'm looking at 4adfdee7, which fixes a bunch of problems jim schutt had. i can't remember exactly what he's running, but i think it's rhel-based
[1:14] <cmccabe> sagewk: yeah, that commit should resolve it
[1:14] <cmccabe> sagewk: although I didn't like that commit because it broke VPATH builds (where srcdir != builddir)
[1:15] <bchrisman> how do I grab the rc branch? I'm not terribly familiar with git yet?
[1:15] <cmccabe> sagewk: so I resolved in 62075f34b316b03c
[1:15] <cmccabe> bchrisman: do you have a git repository checked out yet?
[1:16] <bchrisman> I just cloned from what I guess is trunk.
[1:16] <cmccabe> bchrisman: you should be able to do "git checkout rc"
[1:17] <cmccabe> bchrisman: and then you will be on the rc branch. It might be a good idea to run git pull as well in case there are any new changes
[1:17] <bchrisman> cmccabe: error: pathspec 'rc' did not match any file(s) known to git. (sorry for the newbie questions on git here.. )
[1:18] <cmccabe> bchrisman: try doing git pull first?
[1:18] <sagewk> git checkout -b rc origin/rc
[1:18] <bchrisman> sagewk: thanks that worked.
[1:18] <bchrisman> will build again now..
[1:20] <bchrisman> that fixed the make problem.. thanks..
[1:20] <cmccabe> bchrisman: np
[2:47] * tjikkun_ (~tjikkun@195-240-122-237.ip.telfort.nl) Quit (Ping timeout: 480 seconds)
[2:51] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[3:00] * greglap (~Adium@ has joined #ceph
[3:09] * ajnelson (~Adium@soenat3.cse.ucsc.edu) Quit (Ping timeout: 480 seconds)
[3:19] * Dakota_Moss (~C0-k0nToL@ has joined #ceph
[3:19] * Dakota_Moss (~C0-k0nToL@ has left #ceph
[3:20] * sjust (~sam@ip-66-33-206-8.dreamhost.com) Quit (Read error: Operation timed out)
[3:52] * greglap (~Adium@ Quit (Read error: Connection reset by peer)
[4:03] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) has joined #ceph
[4:23] * ajnelson (~Adium@dhcp-128-22.cruznetsecure.ucsc.edu) has joined #ceph
[6:37] * Guest792 (quasselcor@bas11-montreal02-1128535712.dsl.bell.ca) Quit (Remote host closed the connection)
[6:39] * ajnelson (~Adium@dhcp-128-22.cruznetsecure.ucsc.edu) Quit (Read error: Operation timed out)
[6:39] * bbigras (quasselcor@bas11-montreal02-1128535712.dsl.bell.ca) has joined #ceph
[6:39] * bbigras is now known as Guest1304
[6:46] * ijuz_ (~ijuz@p4FFF5EB0.dip.t-dialin.net) has joined #ceph
[6:48] * f4m8_ is now known as f4m8
[6:53] * ijuz (~ijuz@p4FFF6162.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[8:17] * todinini (tuxadero@kudu.in-berlin.de) Quit (Remote host closed the connection)
[8:34] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[8:49] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:51] * todinini (tuxadero@kudu.in-berlin.de) has joined #ceph
[9:14] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:50] * gregorg_taf (~Greg@ Quit (Quit: Quitte)
[9:50] * gregorg (~Greg@ has joined #ceph
[9:54] * allsystemsarego (~allsystem@ has joined #ceph
[10:44] * Yoric (~David@ has joined #ceph
[11:19] * Meths_ (rift@ has joined #ceph
[11:27] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[11:36] * Meths_ is now known as Meths
[13:15] * Yoric_ (~David@ has joined #ceph
[13:15] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[13:15] * Yoric_ is now known as Yoric
[14:24] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[14:34] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) has joined #ceph
[15:17] * Yoric_ (~David@ has joined #ceph
[15:17] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[15:17] * Yoric_ is now known as Yoric
[15:18] * Yoric_ (~David@ has joined #ceph
[15:18] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[15:18] * Yoric_ is now known as Yoric
[15:53] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[16:04] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) Quit (synthon.oftc.net graviton.oftc.net)
[16:04] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) Quit (synthon.oftc.net graviton.oftc.net)
[16:04] * DeHackEd (~dehacked@dhe.execulink.com) Quit (synthon.oftc.net graviton.oftc.net)
[16:04] * nolan (~nolan@phong.sigbus.net) Quit (synthon.oftc.net graviton.oftc.net)
[16:04] * michael-ndn (~michael-n@ Quit (synthon.oftc.net graviton.oftc.net)
[16:13] * sentinel_e86 (~sentinel_@ Quit (Quit: sh** happened)
[16:14] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) has joined #ceph
[16:14] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[16:14] * DeHackEd (~dehacked@dhe.execulink.com) has joined #ceph
[16:14] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[16:14] * michael-ndn (~michael-n@ has joined #ceph
[16:14] * sentinel_e86 (~sentinel_@ has joined #ceph
[16:27] <johnl> any devs about? I've got a cluster I can break, but I can't seem to see what is going on. no crashes. just stops accepting data
[16:27] <johnl> currently reproducible
[17:30] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) Quit (Quit: Leaving.)
[17:45] * Meths_ (rift@ has joined #ceph
[17:50] * greglap (~Adium@ has joined #ceph
[17:51] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[18:06] * verwilst (~verwilst@router.begen1.office.netnoc.eu) Quit (Quit: Ex-Chat)
[18:09] <greglap> johnl: I'm around for a bit now, what's going on with your cluster?
[18:09] <johnl> writing a large object to rados hangs
[18:09] <johnl> large being 100meg
[18:10] <johnl> can insert 1, 5 and 20 meg
[18:10] <johnl> but try a 100meg and it hangs. but then can't insert any other objects
[18:10] <johnl> whole cluster needs a restart
[18:10] <greglap> hmmm
[18:10] <johnl> debug logs from mon and osd don't seem to show anything untoward
[18:10] <johnl> but the "ceph -w" stops outputting stats
[18:11] <johnl> and any other writers stop too
[18:11] <greglap> oh
[18:11] <greglap> that is odd
[18:11] <johnl> this is the latest build from rc
[18:11] <johnl> yeah, is weird.
[18:11] <greglap> what steps are you taking?
[18:11] <greglap> like what are you doing, with which tools, before trying to put the 100MB object
[18:12] <johnl> restart whole cluster. run "ceph -w" on one node. write a 1meg object. write a 5 meg object. all fine. then write 100meg, all sticks.
[18:12] <johnl> rados put
[18:12] <greglap> this is all from the same client node?
[18:12] <johnl> ceph -w on one node. writes from another
[18:13] <greglap> how many of each server daemon do you have running?
[18:14] <johnl> 2 mons. 4 osd
[18:15] <greglap> 1 MDS?
[18:15] <johnl> 2 mds too I believe. though are they used for rados only?
[18:15] <johnl> lemme check
[18:15] <greglap> no, they aren't
[18:15] <johnl> two mds
[18:15] <greglap> I'll try reproducing it real quick using vstart, gimme a few here
[18:16] <johnl> I can give you access to my cluster if you like
[18:16] <johnl> is a test one. no real data
[18:23] <greglap> hmmm, looks like I can reproduce it
[18:26] <johnl> my journal is 100mb btw. first thing that came to mind :)
[18:26] <greglap> yeah, that's definitely a possibility, and Sage reworked a good bit of that stuff recently
[18:26] <greglap> I just want to rule out one other thing first
[18:27] <greglap> we've had some issues with our message throttling lately
[18:28] <johnl> glad you can reproduce :)
[18:29] <greglap> yeah, it definitely makes life easier!
[18:34] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:34] * Meths_ is now known as Meths
[18:36] <greglap> all right, looks like it's probably a journal issue to me
[18:36] <greglap> sagewk, you see this?
[18:39] <greglap> johnl: oh, sage says this is a known issue with writeahead journals
[18:39] <greglap> my bad, I don't play around at the extremes too often *blush*
[18:39] <greglap> single OSD ops can't exceed the size of the journal in writeahead mode
[18:39] <johnl> righty
[18:40] <greglap> we'll try and fix it to return a useful error at some point, but the issue isn't really something that can be fixed while ensuring data consistency
[18:40] <johnl> I take it a journal is not necessary if using btrfs?
[18:40] <greglap> it is, but it can run in parallel mode
[18:40] <sagewk> with btrfs you can journal in 'parallel' mode, in which case that op won't get journaled but can still be applied to the fs.
[18:41] <greglap> heh, I'll let sage take over, he knows more about this than me
[18:41] <sagewk> there will be a latency spike, but things will (er, should, haven't tested this :) still work
[18:41] <greglap> at the station anyway! :)
[18:41] <johnl> hehe
[18:41] <johnl> ta greg
[18:42] <johnl> sagewk: so does cephfs never write large objects? does it split everything?
[18:42] <johnl> I believe rbd does
[18:42] <sagewk> normally the largest write is a stripe unit of a file, by default 4mb
[18:42] * greglap (~Adium@ Quit (Read error: Connection reset by peer)
[18:43] <johnl> can I write one object larger than the journal in smaller writes?
[18:43] <sagewk> although with dir fragmentation off a large directory flush can also get big... but that's harder to trigger and metadata workload dependent
[18:43] <sagewk> oh, yeah definitely
[18:43] <johnl> ah so the rados tool is just generating a big write
[18:44] <sagewk> it's just a single atomic write that can't exceed the journal size
[18:44] <sagewk> right
[18:44] <sagewk> oh, you're doing 'rados put objname /some/file' or something?
[18:44] <johnl> yeah. 100 meg file (random data)
[18:44] <sagewk> yeah the tool should probably write in (largish) chunks by default
[18:44] <johnl> suppose nobody is really using the rados tool, but perhaps it should write in smaller chunks?
[18:44] <johnl> :)
[18:45] <johnl> I'll open a ticket :)
[18:45] <johnl> it does seem to hang the entire cluster btw. which seems pretty bad.
[18:45] <johnl> I'll double check that
[18:46] <sagewk> too slow http://tracker.newdream.net/issues/624
[18:46] <johnl> ha
[18:46] <sagewk> thanks
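
Editor's note: the exchange above says a single atomic write cannot exceed the journal size in writeahead mode, and that an object larger than the journal can still be written as a series of smaller writes. The Python sketch below only illustrates that chunking idea. It is not the rados tool's code: `write_at` is a hypothetical callback standing in for whatever performs one write at a given offset, and the 4 MB default chunk size is an assumption borrowed from the stripe unit mentioned in the log.

```python
import io


def chunked_writes(stream, chunk_size):
    """Yield (offset, data) pairs, each holding at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield offset, data
        offset += len(data)


def put_in_chunks(stream, write_at, chunk_size=4 * 1024 * 1024):
    """Copy stream to a target through write_at(offset, data), one chunk at a time.

    write_at is a hypothetical callback, not a real Ceph API. Keeping
    chunk_size below the journal size means no single write exceeds it.
    Returns the total number of bytes written.
    """
    total = 0
    for offset, data in chunked_writes(stream, chunk_size):
        write_at(offset, data)
        total += len(data)
    return total


if __name__ == "__main__":
    # Demo with an in-memory buffer standing in for the object store.
    payload = b"x" * (10 * 1024 * 1024)
    target = bytearray(len(payload))

    def write_at(offset, data):
        target[offset:offset + len(data)] = data

    written = put_in_chunks(io.BytesIO(payload), write_at)
    print(written, bytes(target) == payload)
```

Note that chunking like this gives up atomicity for the object as a whole, which is the trade-off implied by the journal constraint discussed above.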
[18:47] <sagewk> are you guys looking at rbd or the distributed fs?
[18:47] * sjust (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:47] <johnl> well, I'm just a bit of a distributed storage nerd and am just playing right now
[18:47] <johnl> not so interested in rbd just now
[18:48] <johnl> the fs is less interesting to me too. the distributed object store is exciting
[18:49] <johnl> we do web deployments, many of which generate assets such as avatars or whatever
[18:49] <johnl> we're using glusterfs to keep a mirrored store of them atm.
[18:50] <johnl> but it starts to be difficult managing 6 million files or more
[18:50] <johnl> we don't need all the posix stuff, so storing them as objects in rados would be perfect
[18:51] <johnl> but I'm really just playing around atm. ceph looks ace.
[18:51] <sagewk> cool.
[18:51] <sagewk> there's also a (poorly documented, atm) class mechanism that may be of interest
[18:51] <sagewk> you can load .so's in to the object store and define your own read/modify transformations on objects (implement new methods on top of the existing read/write/remove/etc)
[18:52] <johnl> oh very interesting
[18:52] <johnl> I was imagining doing that kind of work with a proxy in front of ceph
[18:53] <johnl> how will ceph be with many millions of little files?
[18:53] <johnl> is there some ram limitation on that?
[18:58] * ajnelson (~Adium@dhcp-128-22.cruznetsecure.ucsc.edu) has joined #ceph
[18:59] <gregaf> johnl: depends on the workload
[18:59] <johnl> I'd imagine millions of files < 200k. in tens of thousands of pools. Write once, read many.
[19:00] <gregaf> oh, if you're just talking about objects in pools, definitely not
[19:00] <gregaf> if you mean at the filesystem level, Ceph doesn't need to hold all metadata in-memory or anything
[19:00] <johnl> sweet
[19:01] <johnl> I was worried about resyncing a degraded filesystem of tens of millions of files needing many tens of gigs of ram on the monitors or something
[19:01] <johnl> don't understand the ceph architecture yet
[19:01] <gregaf> but it does need to hold the working set (at the directory level) in memory to do a lot of stuff, so if you have ridiculously large directories you can get things going pretty slowly
[19:02] <johnl> I'm definitely thinking pools and objects only, not the filesystem.
[19:02] <gregaf> bonnie for instance doesn't play very nicely on the userspace client, though I think the kernel client is more efficient in terms of how it handles capabilities so it works better there
[19:02] <johnl> yer, I upset cfuse with bonnie :)
[19:02] <gregaf> oh, yeah, the object store is fine assuming you've been sane about setting the number of PGs and stuff
[19:02] <johnl> ace
[19:03] <johnl> I'll read the paper about it all sometime soon :)
[19:03] <gregaf> I mean, it's not well-tested at the moment so there may be implementation issues but architecturally the number of objects is not a problem
[19:03] <johnl> when I exhaust my current methods of finding bugs ;)
[19:04] <johnl> yeah, that's what I want to hear. bugs can be fixed. architecture is harder :)
[19:05] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:05] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit ()
[19:06] <johnl> right, curry time. back later.
[19:06] <johnl> thanks for answering my questions!
[19:06] <gregaf> np
[19:24] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Read error: Connection reset by peer)
[19:28] * Yoric (~David@ Quit (Quit: Yoric)
[20:30] <wido> sagewk: are you sure my bug isn't caused by switching from unstable to rc?
[20:31] <wido> I've got a cdebugpack, do you need that?
[20:36] <sagewk> wido: which bug was it?
[20:38] <wido> Oh, the MDS bug
[20:38] <wido> load table 2
[20:40] * Meths_ (rift@ has joined #ceph
[20:41] <sagewk> it probably is, but shouldn't have crashed :) i'll take a look shortly
[20:44] <wido> sagewk: Oh, no hurry at all, but I don't want to send you on a ghost chase. Btw, from the logger machine it's simply ssh root@noisy
[20:47] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[21:43] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[22:13] * Meths_ is now known as Meths
[22:16] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[22:19] <cmccabe> johnl: I dunno if you're cc'ed on the bug
[22:19] <cmccabe> johnl: but I found out what is behind 622
[22:22] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[22:29] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[22:45] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[22:53] * rmull (~rmull@acsx02.bu.edu) has joined #ceph
[23:08] <rmull> Could Ceph be used to replace an NFS-attached fileserver?
[23:10] <iggy> rmull: can you elaborate on nfs-attached fileserver?
[23:10] <rmull> Example: A single machine with a large number of disks is running storage, monitor, and metadata daemons. I want to access that data from a machine with a small disk.
[23:10] <iggy> do you just mean an nfs server?
[23:10] <rmull> Yep
[23:11] <iggy> you can have a single server ceph cluster afaik
[23:11] <iggy> if I understand the question correctly
[23:11] <gregaf> you could do that, I'm not sure that the added complexity would be worth it though
[23:11] <rmull> gregaf: Okay, I think that's basically why I am asking
[23:11] <rmull> Thanks to both of you
[23:13] <gregaf> Ceph is designed for storage problems that are too large for NFS; if NFS satisfies your bandwidth and availability needs I'm not sure why you'd use anything else?
[23:14] <rmull> Yes, there wouldn't be much sense.
[23:26] <rmull> Could ceph be used in a F(riend)2F(riend) network, where people across the world can each run a server in a common ceph cluster, so that everyone has access to everyone else's data?
[23:30] <cmccabe> rmull: most people's internet connections are pretty poor in terms of both latency and bandwidth. The network protocols people generally use to share data peer-to-peer take this into account.
[23:32] <rmull> cmccabe: Thanks. I guess I'll stop asking noob questions.
[23:32] <rmull> haha
[23:33] <cmccabe> rmull: well, don't be afraid to ask questions, even newbie ones
[23:34] <rmull> Well, in that case, since you offered...
[23:34] <rmull> One more: does ceph segment the transfers so that multiple chunks from a single replicated file can come from more than one node?
[23:35] <cmccabe> well, ceph breaks the files up into objects which are stored in PGs
[23:36] <cmccabe> and PGs spread them across multiple OSDs
[23:36] <cmccabe> so basically yes
[23:36] <rmull> Okay, thank you cmccabe
[23:37] <cmccabe> np
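
Editor's note: cmccabe's answer describes a chain of file, objects, PGs and OSDs. The toy Python sketch below walks through that chain so the "basically yes" is easier to picture. Everything in it is a simplifying assumption: the object naming scheme, the CRC32 hash and the round-robin PG-to-OSD placement are stand-ins chosen for illustration, not Ceph's actual placement logic. The 4 MB object size is taken from the default stripe unit mentioned earlier in the log.

```python
import zlib

OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size, from the 4 MB default in the log


def objects_for_file(file_id, file_size, object_size=OBJECT_SIZE):
    """Return hypothetical object names covering a file of file_size bytes."""
    count = (file_size + object_size - 1) // object_size  # ceiling division
    return [f"{file_id}.{index:08x}" for index in range(count)]


def pg_for_object(name, pg_count):
    """Map an object name to a placement group with a stand-in hash (CRC32)."""
    return zlib.crc32(name.encode("utf-8")) % pg_count


def osds_for_pg(pg, osd_count, replicas=2):
    """Toy placement: pick `replicas` consecutive OSD ids starting at pg % osd_count."""
    if replicas > osd_count:
        raise ValueError("cannot place more replicas than there are OSDs")
    return [(pg + i) % osd_count for i in range(replicas)]


if __name__ == "__main__":
    # A 10 MB file becomes three objects, each mapped to a PG and then to OSDs.
    for name in objects_for_file("10000000000", 10 * 1024 * 1024):
        pg = pg_for_object(name, pg_count=64)
        print(name, "-> pg", pg, "-> osds", osds_for_pg(pg, osd_count=4))
```

Because each object of a file can hash to a different PG, and each PG lands on its own set of OSDs, reads of one large file can be served by several nodes at once.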
[23:49] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.