#ceph IRC Log


IRC Log for 2011-05-19

Timestamps are in GMT/BST.

[0:09] * MarkN (~nathan@ has left #ceph
[0:13] * alexxy[home] (~alexxy@ has joined #ceph
[0:14] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[0:17] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[0:31] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[1:34] <sagewk> bchrisman: there?
[1:35] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Read error: Operation timed out)
[1:36] <bchrisman> sagewk: ayup
[1:36] <bchrisman> whassup?
[1:36] <sagewk> were you able to sort out the readdir_r callback return value thing?
[1:36] <sagewk> from monday?
[1:36] <sagewk> er.. last week? i forget :)
[1:37] <bchrisman> getting there... there's something odd going on with samba there...
[1:37] <bchrisman> I implemented ceph_readdir() and Client::readdir(), and I exercise those in testceph and it behaves as expected.
[1:38] <bchrisman> however the vfs layer cephwrap_readdir returns what I think it should, but samba goes into some nutty loop calling telldir then readdir then telldir......
[1:38] <bchrisman> so I'm still tracking that down.
[1:39] <bchrisman> that's why I've been going over the types involved carefully.
[1:41] <bchrisman> what's odd is that after readdir returns the last entry in the directory, samba calls 'telldir' and then does a stat which gets passed in with an unprintable character:
[1:41] <bchrisman> 2011-05-18 23:23:40.862288 7f5f5c8ce7c0 client5116 lstat enter (relpath./]_^? mask 341)
[1:41] <bchrisman> that *looks* like it's coming from the samba request.. but I'll check into it.
[1:42] <bchrisman> Here's a snippet of context on that client log entry: http://pastebin.com/4x9z7Q8J
[2:23] <bchrisman> ahh 'ls' works now on samba-vfs mounted share.. read/write still needs help
[2:24] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[2:42] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[3:06] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has left #ceph
[3:41] * djlee_ (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[3:48] * djlee__ (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[4:04] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[4:42] * votz_ (~votz@dhcp0020.grt.resnet.group.UPENN.EDU) Quit (Quit: Leaving)
[5:45] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[6:09] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Read error: Connection reset by peer)
[6:17] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[8:40] * neurodrone_ (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[8:40] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Read error: Connection reset by peer)
[8:40] * neurodrone_ is now known as neurodrone
[8:45] * neurodrone_ (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[8:45] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Read error: Connection reset by peer)
[8:45] * neurodrone_ is now known as neurodrone
[8:55] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: zzZZZZzz)
[9:00] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[9:15] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: zzZZZZzz)
[9:26] * MK_FG (~MK_FG@ Quit (Ping timeout: 480 seconds)
[10:13] * allsystemsarego (~allsystem@ has joined #ceph
[10:25] * MK_FG (~MK_FG@ has joined #ceph
[11:38] <lxo> yay, 0.28! building rpms...
[11:41] * Guest805 (quasselcor@bas11-montreal02-1128536388.dsl.bell.ca) Quit (Quit: No Ping reply in 180 seconds.)
[11:42] * bbigras (quasselcor@bas11-montreal02-1128536388.dsl.bell.ca) has joined #ceph
[11:42] * bbigras is now known as Guest1463
[11:59] * Hugh_ (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) Quit (Read error: No route to host)
[12:03] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) has joined #ceph
[12:07] * damien1 (~damien@94-23-154-182.kimsufi.com) has joined #ceph
[12:07] * damien1 is now known as damoxc
[12:07] <damoxc> does anyone know what the directory size limit is supposed to be (file count, not file size)
[13:19] * Meths_ (rift@ has joined #ceph
[13:25] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[14:27] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:38] * alexxy[home] (~alexxy@ Quit (Remote host closed the connection)
[14:43] * alexxy (~alexxy@ has joined #ceph
[14:51] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[14:52] * DanielFriesen (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[14:57] * alexxy (~alexxy@ has joined #ceph
[14:59] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[15:22] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:58] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[16:20] <bchrisman> damoxc: the ceph design allows directory metadata to be split across multiple nodes/mds servers, which should allow much larger single directories than other clustered filesystems while maintaining good lookup performance.
[16:21] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[16:26] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[16:32] <damoxc> bchrisman: is the splitting something I have to do, or should it take place automatically?
[16:32] <damoxc> bchrisman: at about 300,000 files it grinds to a halt
[16:32] <bchrisman> should be automatic...
[16:32] <bchrisman> that's a multi-mds feature.. right now I'm doing single-mds testing, but I think others are testing the multi-mds capabilities.
[16:33] <damoxc> ah yeah I've spotted things about that
[16:33] <damoxc> i have 3 mds servers
[16:33] <damoxc> i assumed they'll be all used, or is that not the case?
[16:34] <bchrisman> ceph -s should show their states
[16:34] <bchrisman> mine will show 1 active, 2 standby
[16:34] <damoxc> mds e13: 1/1/1 up {0=up:active}, 2 up:standby
[16:34] <damoxc> hmm
[16:34] <damoxc> that doesn't look good
[16:34] <bchrisman> there's a way to enable multi-mds.. probably not on by default.
[16:35] <damoxc> yeah just trying to google it
[16:35] <bchrisman> but I'm not completely certain.. yeah.. probably something on the wiki about it.
[16:36] <damoxc> ceph mds set_max_mds N looks promising
[16:36] <bchrisman> yep: OPTION(max_mds, OPT_INT, 1),
[16:37] <bchrisman> that just defaults to one.
[16:37] <bchrisman> though you might want to change it in your ceph.conf and restart
[16:38] <damoxc> true, i'll give that a try
[16:38] <damoxc> thanks for your help!
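
The `ceph -s` line damoxc pasted at 16:34 can be checked mechanically. The following is a minimal Python sketch, assuming the status line always has exactly the shape shown in the log; the function name `parse_mds_status` is hypothetical, and the three slash-separated counters are deliberately left uninterpreted because the log does not say what they mean.

```python
import re

# Pattern for a line shaped like the one in the log:
#   mds e13: 1/1/1 up {0=up:active}, 2 up:standby
# Assumption: other Ceph versions may print this line differently.
_MDS_LINE = re.compile(
    r"mds e(\d+): (\d+)/(\d+)/(\d+) up \{([^}]*)\}(?:, (\d+) up:standby)?"
)


def parse_mds_status(line):
    """Return the map epoch plus active and standby counts from one status line."""
    match = _MDS_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised mds status line: %r" % line)

    # Entries inside the braces look like "0=up:active", separated by commas.
    ranks = {}
    for entry in filter(None, (e.strip() for e in match.group(5).split(","))):
        rank, state = entry.split("=", 1)
        ranks[int(rank)] = state

    return {
        "epoch": int(match.group(1)),
        "active": sum(1 for state in ranks.values() if state == "up:active"),
        "standby": int(match.group(6) or 0),
    }


if __name__ == "__main__":
    print(parse_mds_status("mds e13: 1/1/1 up {0=up:active}, 2 up:standby"))
```

With three MDS daemons and `max_mds` left at its default of 1, a result of one active and two standby matches what bchrisman describes for his own cluster.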
[17:35] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:06] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[18:14] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:41] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:45] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:56] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[18:58] * alexxy (~alexxy@ has joined #ceph
[19:06] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[19:07] * alexxy (~alexxy@ has joined #ceph
[19:08] * cmccabe (~cmccabe@ has joined #ceph
[19:10] <cmccabe> fyi, I have to implement getattrs() in librados
[19:15] <bchrisman> cmccabe: hey.. can you push in that spec file change? :)
[19:15] <bchrisman> think it was just a description
[19:16] <cmccabe> yeah
[19:16] <bchrisman> cool thx
[19:20] <sagewk> cmccabe: this is weird: http://ceph.newdream.net/gitbuilder-deb-amd64/log.cgi?log=2fc13de1613415e7b84189e2e8839f362262393c
[19:20] <sagewk> i can't see where libcrush.c is coming from
[19:21] <sagewk> oh, maybe the build environment on the gitbuilder machine is dirty... is that possible, tv?
[19:21] <cmccabe> sagewk: this kind of change might require a make distclean
[19:22] <cmccabe> sagewk: let me try it on my machine...
[19:25] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: zzZZZZzz)
[19:27] <Tv> sagewk: the tree gets cleaned, but it uses ccache
[19:28] <Tv> (and i mean git clean, not make clean)
[19:28] <Tv> i'll clean the cache and rerun the failures
[19:29] <cmccabe> I was doing a clean rebuild but I just ran out of disk space
[19:29] <Tv> oh wait this is the deb builder -- i actually don't know as much about that ;)
[19:29] <Tv> yeah the deb builder has empty ccache, so i don't think that's at fault
[19:30] <Tv> non-deb builder failed too: http://ceph.newdream.net/gitbuilder/log.cgi?log=2fc13de1613415e7b84189e2e8839f362262393c
[19:30] <Tv> running locally with empty ccache
[19:31] <Tv> meeting?
[19:32] <sagewk> meeting!
[19:32] <cmccabe> I freed some disk space and am doing a build from scratch now. If that still has the problem then I'll take a closer look.
[19:44] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[19:47] <cmccabe> so to summarize
[19:47] <cmccabe> clean rebuild works, non-clean does not
[19:47] <cmccabe> make distclean itself appears to fail because of some internal confusion in automake
[19:48] <cmccabe> I can take a closer look and possibly find a more elegant solution if you give me access to the build machine; otherwise, I recommend rm -rf followed by git-clone
[19:49] <cmccabe> actually, another possibility would be git-reset to an earlier changing, running make distclean from that, and then pulling
[19:49] <cmccabe> *earlier change
[19:51] <Tv> make[3]: *** No rule to make target `libcrush.o', needed by `libcrush.a'. Stop.
[19:51] <Tv> that just happened in a clean clone with empty ccache :(
[19:52] <cmccabe> tv: find | grep Makefile.am | xargs -l grep libcrush
[19:52] <cmccabe> tv: <no results>
[19:52] <cmccabe> tv: libcrush does not exist anywhere in the makefiles; I don't know what to tell you
[19:53] <Tv> cmccabe:
[19:53] <Tv> err
[19:53] <cmccabe> tv: Is it possible that you didn't use make distclean or clone from scratch?
[19:53] <Tv> the tree is very very clean
[19:53] <Tv> hold on i'll get something reproducible
[19:54] <cmccabe> tv: that doesn't really answer the question
[19:54] <cmccabe> tv: did you use make distclean or clone from scratch?
[19:54] <Tv> clone from scratch / git clean -fdx
[19:54] <cmccabe> you did both?
[19:55] <cmccabe> actually, maybe the deb and RPM need to be updated...
[19:55] <Tv> it's not about packaging
[19:55] <cmccabe> I can see ceph.spec.in still refers to libcrush.so
[19:56] <Tv> git reset --hard 2fc13de1613415e7b84189e2e8839f362262393c && git clean -fdx && ./autogen.sh && ./configure --with-debug --with-radosgw && DISTCC_HOSTS='--randomize @flak/16,cpp,lzo @vit/16,cpp,lzo @swab/16,cpp,lzo @slider/16,cpp,lzo @kai/16,cpp,lzo' distcc-pump make -j100 CXX='distcc g++-4.4' CC='distcc gcc-4.4'
[19:56] <cmccabe> what do you see when you do find | grep libcrush
[19:56] <Tv> after failed build:
[19:57] <Tv> ./src/.deps/libcrush.Po
[19:57] <Tv> ./debian/libcrush1.postrm
[19:57] <Tv> ./debian/libcrush1.postinst
[19:57] <Tv> ./debian/libcrush-dev.install
[19:57] <Tv> ./debian/libcrush1.install
[19:57] <Tv> the Po goes away at git clean -fdx
[19:57] * Meths_ is now known as Meths
[19:57] <Tv> run that command line in a throwaway repo
[19:57] <cmccabe> earlier we speculated that removing the .deps directory would fix these kinds of problems
[19:57] <cmccabe> can you try rm -rf .deps and see if that helps?
[19:57] <Tv> git clean -fdx dude
[19:58] <cmccabe> ic
[19:58] <cmccabe> try grepping for libcrush
[19:58] <Tv> just run the command, it should be 100% reproducible
[19:58] <cmccabe> it must be embedded into some build product or script that is not getting cleaned
[19:59] <cmccabe> I can build fine here, cloning from scratch
[19:59] <Tv> that revision?
[19:59] <cmccabe> I will clone from scratch and try the command. one moment.
[20:01] <cmccabe> this is the command I'm running: git reset --hard 2fc13de1613415e7b84189e2e8839f362262393c && git clean -fdx && ./autogen.sh && ./configure --with-debug --with-radosgw && make -j8
[20:01] <cmccabe> none of that distcc stuff
[20:01] * alexxy (~alexxy@ Quit (Remote host closed the connection)
[20:02] <sagewk> maybe run the command on the ubuntu gitbuilder box in a clean repo without the distcc bits
[20:02] <Tv> sagewk: gitbuilders don't do distcc anyway
[20:03] <cmccabe> ok, I do have the same error here, about libcrush.o
[20:04] <cmccabe> oh, the reason is because the original change was flawed
[20:04] <cmccabe> you need both 14a3f262f5b57c5228f56b17e781b1383b7f17da and 2fc13de1613415e7b84189e2e8839f362262393c
[20:04] <sagewk> screwed up merge?
[20:05] <cmccabe> I think so
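
The Makefile.am search cmccabe ran at 19:52 (`find | grep Makefile.am | xargs -l grep libcrush`) can be sketched in Python for anyone repeating the check on a fresh clone. This is only an illustration: `find_references` is a hypothetical helper, and unlike the shell pipeline it matches files named exactly `Makefile.am` rather than any path containing that string.

```python
import os


def find_references(root, needle, filename="Makefile.am"):
    """Return sorted (path, line_number, line) tuples for lines containing needle.

    Only files whose name equals `filename` are read; everything else under
    `root` is skipped.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name != filename:
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going if a file is not valid UTF-8.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append((path, lineno, line.rstrip("\n")))
    return sorted(hits)


if __name__ == "__main__":
    # Run from the top of a source tree; an empty result corresponds to the
    # "<no results>" cmccabe reported.
    for path, lineno, line in find_references(".", "libcrush"):
        print("%s:%d: %s" % (path, lineno, line))
```

An empty result from the makefiles, combined with a build that still asks for `libcrush.o`, fits the later conclusion that the checked-out revision itself was incomplete rather than the tree being dirty.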
[20:06] <df> If one were to use a crush map to replicate data in ceph, who does the replication? does the client write the data to two osds? or does it write to one, then that osd forwards it on to the second host?
[20:07] <sagewk> df: the osds do the replication on your behalf
[20:08] <df> good :)
[20:12] <df> on a vaguely related note, i haven't managed to test this part yet (still trying to solve some performance issues): how well is failure of an osd handled? assuming that it doesn't host any data required by the current working set of clients, will it just be ignored (and clients would only block if they attempted to access something on it)?
[20:12] <cmccabe> tv: if my experience is correct, gitbuilder should succeed with the latest master?
[20:13] <sjust> df: the clients will switch to accessing data on other osds while the osds re-replicate that data among the still-up osds
[20:14] <Tv> cmccabe: i see a green master..
[20:14] <df> sure, but assume the case where there is no replication -- (and is the advice never to do that)
[20:14] <cmccabe> tv: k
[20:14] <Tv> but not on the deb builder, digging..
[20:15] <cmccabe> yeah, debs need a fix, which I am about to post.
[20:15] <sjust> df: ah, the clients will block on data that was handled by the downed osd until it comes back up
[20:15] <Tv> cmccabe: ah ok
[20:15] <cmccabe> tv: do we have a test that runs make distcheck periodically?
[20:15] <df> sjust, but shouldn't block if they don't need to touch it
[20:15] <sjust> df: right
[20:16] <Tv> cmccabe: i don't think so; it was slow enough that i didn't want to make the existing gitbuilder do it, and i didn't have resources to run another one
[20:16] <Tv> but now that i shelved the testvm's, i might have the resources again..
[20:16] <cmccabe> it would really be nice to do that once a day or something
[20:16] <cmccabe> at least
[20:17] <cmccabe> it's just so easy to break it, and even though I always get mad when someone else does, I always do it too
[20:17] <Tv> yeah Just a Simple Matter of Sysadmin Time
[20:18] <cmccabe> we could make a tracker issue or something I guess
[20:18] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[20:23] * alexxy (~alexxy@ has joined #ceph
[20:26] <cmccabe> how do we build the deb again?
[20:26] <cmccabe> dpkg-buildpackage?
[21:03] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[21:26] <bchrisman> rpmbuild isn't building libcrush anymore.. not sure if that's a more general problem.. tracing it down
[21:28] <cmccabe> bchrisman: libcrush was merged into libcommon
[21:28] <cmccabe> bchrisman: should be removed from the spec file as well now
[21:28] <bchrisman> ahh so that needs to come out of the spec file...
[21:29] <cmccabe> it was removed from the spec.in, did you rerun make?
[21:29] <bchrisman> yup... that in master? I'll rebase and check again
[21:29] <cmccabe> yes
[21:29] * aliguori (~anthony@ has joined #ceph
[21:32] <bchrisman> cmccabe: indeed it was
[21:45] <cmccabe> k
[21:57] * votz (~votz@dhcp0020.grt.resnet.group.UPENN.EDU) has joined #ceph
[22:03] * votz (~votz@dhcp0020.grt.resnet.group.UPENN.EDU) Quit (Quit: Leaving)
[22:11] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[22:12] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[22:13] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit ()
[22:14] * aliguori (~anthony@ has joined #ceph
[22:27] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[22:31] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[22:48] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[22:54] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[22:55] * lxo (~aoliva@ Quit (Ping timeout: 480 seconds)
[22:57] * lxo (~aoliva@ has joined #ceph
[22:58] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[23:18] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[23:25] * verwilst (~verwilst@dD5769611.access.telenet.be) has joined #ceph
[23:28] * alexxy (~alexxy@ has joined #ceph
[23:50] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.