#ceph IRC Log


IRC Log for 2011-04-27

Timestamps are in GMT/BST.

[0:10] * neurodrone_ (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[0:10] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Read error: Connection reset by peer)
[0:10] * neurodrone_ is now known as neurodrone
[0:17] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[1:00] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[1:10] * joshd (~jdurgin@ has joined #ceph
[1:11] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[1:36] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[1:57] * joshd (~jdurgin@ Quit (Ping timeout: 480 seconds)
[2:09] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[2:21] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:30] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:37] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[2:41] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has left #ceph
[4:17] * Guest1850 (quasselcor@bas11-montreal02-1128536388.dsl.bell.ca) Quit (Quit: http://quassel-irc.org - Discuter simplement. Partout.)
[4:18] * bbigras (quasselcor@bas11-montreal02-1128536388.dsl.bell.ca) has joined #ceph
[4:18] * bbigras is now known as Guest3338
[4:30] * DanielFriesen (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[4:34] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[4:36] * DanielFriesen (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (Quit: http://daniel.friesen.name or ELSE!)
[4:36] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[8:04] * Juul (~Juul@c-76-21-88-119.hsd1.ca.comcast.net) has joined #ceph
[8:13] * joshd (~jdurgin@ has joined #ceph
[8:22] * lidongyang (~lidongyan@ has joined #ceph
[8:24] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:56] * Juul (~Juul@c-76-21-88-119.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[9:19] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: zzZZZZzz)
[9:24] * allsystemsarego (~allsystem@ has joined #ceph
[9:29] * chraible (~chraible@blackhole.science-computing.de) Quit (Remote host closed the connection)
[9:40] * joshd (~jdurgin@ Quit (Quit: Leaving.)
[9:51] * chraible (~chraible@blackhole.science-computing.de) has joined #ceph
[10:26] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[10:44] * greghome (~greghome@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[10:55] * greghome (~greghome@cpe-76-170-84-245.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[10:56] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[11:28] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[14:16] * hijacker (~hijacker@ Quit (Ping timeout: 480 seconds)
[14:23] * hijacker (~hijacker@ has joined #ceph
[14:38] <stingray> I have this interesting state where degraded doesn't go down below certain number
[14:38] * zk (identsucks@whatit.is) has left #ceph
[14:38] <stingray> is it because of some missing chunks, and how I can fix it?
[14:45] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Read error: Connection reset by peer)
[14:45] * aliguori_ (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[14:46] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[14:55] * MarkN (~nathan@ has joined #ceph
[15:01] * stingray is now known as trollface
[15:24] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[15:31] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[15:35] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit ()
[16:15] * aliguori_ (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[16:18] * aliguori_ (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[16:45] <wido> trollface: you mean degraded objects?
[16:45] <wido> I guess you have some non-clean PG's
[16:45] <wido> my cluster for example: pg v611200: 10608 pgs: 379 active, 549 active+clean, 4 active+clean+crashed, 755 peering,
[16:46] <wido> In a tip-top situation you would have 10608 active+clean PG's
[16:46] <trollface> 2011-04-27 18:46:01.925357 pg v175903: 808 pgs: 779 active+clean, 29 active+clean+degraded; 1229 GB data, 3693 GB used, 9486 GB / 13885 GB avail; 17495/950796 degraded (1.840%)
[16:46] <wido> something is causing the cluster not to recover
[16:46] <wido> ceph pg dump -o -|grep degraded
[16:46] <wido> that will give you the degraded PG's
[16:47] <wido> Are all the OSD's up and in?
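Editor's sketch: the one-line `pg v...` summary that trollface pasted above can be picked apart mechanically to see which PG states are not clean and what share of objects is degraded. The Python below is a minimal, unofficial parser. The names parse_pg_summary and not_clean are hypothetical and are not part of any Ceph tool, and the line format is assumed only from the two summaries quoted in this log.

```python
import re


def parse_pg_summary(line):
    """Split a `pg v...` summary line into (total_pgs, state_counts, degraded).

    `degraded` is a (degraded_objects, total_objects) tuple, or None when the
    line carries no "N/M degraded" field.
    """
    total = int(re.search(r"(\d+) pgs:", line).group(1))
    # The per-state counts sit between "pgs:" and the first ";" (if any).
    states_part = line.split("pgs:", 1)[1].split(";", 1)[0]
    states = {}
    for chunk in states_part.split(","):
        chunk = chunk.strip()
        if not chunk:  # tolerate a trailing comma
            continue
        count, state = chunk.split(None, 1)
        states[state] = int(count)
    match = re.search(r"(\d+)/(\d+) degraded", line)
    degraded = (int(match.group(1)), int(match.group(2))) if match else None
    return total, states, degraded


def not_clean(states):
    """Return only the states other than plain active+clean."""
    return {s: n for s, n in states.items() if s != "active+clean"}


if __name__ == "__main__":
    line = ("2011-04-27 18:46:01.925357 pg v175903: 808 pgs: 779 active+clean, "
            "29 active+clean+degraded; 1229 GB data, 3693 GB used, "
            "9486 GB / 13885 GB avail; 17495/950796 degraded (1.840%)")
    total, states, degraded = parse_pg_summary(line)
    print(total, not_clean(states), degraded)
```

Fed trollface's line, this reports 29 PGs in active+clean+degraded out of 808, with 17495 of 950796 objects degraded, which matches the 1.840% shown in the summary itself.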
[17:14] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[17:21] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[17:35] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[17:41] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:50] * greglap (~Adium@ has joined #ceph
[17:51] <trollface> wido: yeah, to the extent
[17:51] <trollface> (there were a couple more osds but they're gone)
[17:52] <trollface> so, I've got the list
[17:52] <trollface> now how can I fix it ?
[18:05] <greglap> trollface: do you have any OSDs currently marked down?
[18:28] * eternaleye_ (~eternaley@ has joined #ceph
[18:28] * eternaleye (~eternaley@ Quit (Read error: Connection reset by peer)
[18:43] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:44] * greglap (~Adium@ Quit (Quit: Leaving.)
[18:48] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:49] <trollface> gregaf: no
[18:49] <trollface> 4 osds, 4 in, 4 up
[19:11] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:31] <sagewk> skype!
[19:42] <sagewk> bchrisman: whats the automake thing you hit?
[19:45] <bchrisman> testceph.cc doesn't compile/link: 1) doesn't build testceph.o 2) doesn't attempt to link versus libceph (-lceph) 3) linking doesn't include testceph.o. It might be our version of the automake stuff. I'll get the exact command line it's performing and what I executed to get it to compile in one sec
[19:47] * Yulya_ (~Yulya@ip-95-220-190-12.bb.netbynet.ru) has joined #ceph
[19:50] <bchrisman> sagewk: http://pastebin.com/FZziFRkW
[19:52] <bchrisman> we're still ironing out some bugs in our automatic build/deploy tools so I won't have a cluster to test on for another couple hours.. once that's up, I'll verify that testceph works... Colin mentioned he'd tested it, so I'll just be verifying that there aren't any environment dependencies.
[19:52] <bchrisman> From there I'm converting testceph.cc into a testceph.c, which I'll use to make sure all the libceph calls are working before my next iteration of testing the samba vfs module.
[19:53] <Tv> bchrisman: have you changed e.g. src/Makefile.am at all?
[19:53] <bchrisman> Tv: nope.
[19:53] <bchrisman> fresh checkout of newdream master
[19:53] <Tv> bchrisman: i'll do a clean build & see what happens for me
[19:53] <bchrisman> I did that after you guys mentioned it might be a cleanup thing.
[19:54] <bchrisman> Yeah.. I'm guessing there's automake dark magic at work...
[19:54] * Yulya (~Yulya@ip-95-220-161-228.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[19:57] <bchrisman> this isn't blocking anything we're working on...
[19:58] <sagewk> hrm testceph builds for me just fine, but it isn't authenticating for some reason. i'll futz with that in a bit
[19:58] <sjust> trollface: you mentioned that you had more osds earlier? How did they get removed?
[20:03] <trollface> sjust: by force :)
[20:03] <bchrisman> sagewk: yeah... just did it again from a fresh git clone: ./autogen.sh -> ./configure --without-tcmalloc -> cd src -> make testceph -> can't find main.. must be a different version of some build tool
[20:04] <bchrisman> does your libtool line list -lceph and testceph.o?
[20:06] <Tv> /bin/bash ../libtool --tag=CXX --mode=link ccache distcc g++-4.4 -Wall -D__CEPH__ -D_FILE_OFFSET_BITS=64 -D_REENTRANT -D_THREAD_SAFE -rdynamic -Wtype-limits -Wignored-qualifiers -Winit-self -Wpointer-arith -fno-strict-aliasing -Wnon-virtual-dtor -Wno-invalid-offsetof -Wstrict-null-sentinel -g -O2 -Wl,--as-needed -latomic_ops -o testceph testceph.o libceph.la libcrush.la -lpthread -lm -lcrypto++
[20:06] <Tv> the end looks very different
[20:08] <bchrisman> not all too familiar with the libtool library archives.. is that going to statically link testceph?
[20:10] <Tv> .libs/testceph: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped
[20:10] <Tv> dynamic
[20:11] <bchrisman> ahh cool
[20:15] <Tv> bchrisman: ohhhh
[20:16] <Tv> bchrisman: you need to say ./configure --with-debug
[20:16] <Tv> right now you're asking it to compile just a random .c it has no knowledge of -> it can't pull in the right libs
[20:17] <bchrisman> It's attempting to compile testceph.cc
[20:17] <bchrisman> I can test that quickly..
[20:18] <Tv> bchrisman: yeah but only because you said "make testceph"
[20:18] <Tv> bchrisman: if you had said "make", it wouldn't have touched it at all
[20:18] <bchrisman> ahhh
[20:18] <Tv> now you're just hitting the make default rules
[20:18] <bchrisman> is there a rule there for building testceph?
[20:18] <bchrisman> how are you building it?
[20:18] <Tv> yeah, as long as you say --with-debug
[20:18] <bchrisman> ahh ok
[20:19] <Tv> now that i look at it, we could probably redo how --with-debug works, so that it *knows* about the things, but doesn't try to compile them by default...
[20:20] <bchrisman> hmm.. come to think of it, I should've checked that there was a rule in the generated makefile...
[20:21] <bchrisman> hmm.. I did check that actually... there's a target testceph$(EXEEXT): which evaluates to testceph:
[20:21] <bchrisman> but yeah..
[20:21] <bchrisman> works with configure --with-debug
[20:23] * MarkN (~nathan@ Quit (Ping timeout: 480 seconds)
[20:24] <sjust> trollface: at the same time?
[20:24] <trollface> sjust: no, I was allowing the system to resync
[20:25] <trollface> but I do not exclude the possibility something went missing
[20:25] <trollface> now I just want to fix it. Is it possible? if there are missing chunks I can recreate them
[20:25] * MarkN (~nathan@ has joined #ceph
[20:25] <trollface> I will run sha1deep -r /ceph overnight to check if it can read everything
[20:26] <trollface> but I can do rsync -c as well
[20:29] <gregaf> trollface: sjust: but none of the objects are missing, right?
[20:29] <gregaf> just marked as degraded
[20:37] <trollface> yes
[20:38] <trollface> all pgs in dump that match degraded are active+clean+degraded
[20:38] <trollface> but it doesn't undegrade itself
[20:38] <trollface> ...
[20:48] <sjust> trollface: try ceph pg dump -o -
[20:48] <sjust> the 4th column will give you unfound objects for each pg
[20:48] <sjust> let me know if it's non-zero for any pg
[20:50] <bchrisman> Tv: sagewk: yeah... that was it.. running ./configure --with-debug...
[20:50] <bchrisman> Tv: thx
[20:55] <Tv> bchrisman: i have a branch that makes the source work the way you tried to use it ;)
[20:57] <bchrisman> Tv: that going into master soon? :)
[21:05] * neurodrone_ (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[21:05] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Read error: Connection reset by peer)
[21:05] * neurodrone_ is now known as neurodrone
[21:08] <trollface> sjust: it's zero for all pgs
[21:09] <trollface> (the "unf" column)
[21:10] <trollface> [3,2] i guess it's osd ids
[21:10] <trollface> I have replication set to 3
[21:12] <trollface> okay, let me try fun stuff with crushmap
[21:13] <trollface> I'll get back to you with a nice "I am an idiot" facial expression
[21:15] <jjchen> Some quick questions about Rados object attributes. 1. Is there any limit on the number of attributes or the length of an attribute? 2. How are attributes stored physically? The reason I am asking is that we plan to store user metadata inside object attributes. If the application always replaces the entire metadata, is it more efficient to store all metadata inside one attribute than to store it in separate attributes?
[21:16] <bchrisman> I'm passing void *'s into libceph for things like ceph_mount_t and ceph_dir_result_t... a bit messy, but I'm guessing that should work
[21:17] <Tv> bchrisman: waiting for http://ceph.newdream.net/gitbuilder/#origin/with-debug to go green..
[21:18] <gregaf> jjchen: object attributes are stored as xattrs on the local filesystem
[21:18] <Tv> jjchen: those are probably (i'm not the expert on this) mapped to xattrs in the user.ceph.* namespace, and thus have some size limits
[21:19] <gregaf> the OSD code will split them up if necessary so there's not a limit on attribute size, but...
[21:20] <gregaf> if your underlying fs has a limit on the number and size of xattrs then the total size needs to fall within that limit, minus some space for bookkeeping
[21:20] <Tv> bchrisman: hey the mailing list thing.. that seems to be a confusion between "struct ceph_mount_t" vs "ceph_mount_t" without the struct
[21:20] <Tv> bchrisman: as in, it's not a typedef
[21:21] <gregaf> I think with btrfs there are no limits on object attributes (it can store an unlimited number), but with ext* you're stuck at 4k total or something
[21:21] <bchrisman> Tv: ahh yes.. so that would need to be typedef'd to be used in the prototypes?
[21:21] <Tv> bchrisman: so you probably shouldn't try to work around it, it looks like it's just invalid C as is
[21:21] <Tv> bchrisman: yeah, or the prototypes should use struct
[21:22] <Tv> (there's schools of thought that think C typedefs are evil; Linus being the head of one of them)
[21:22] <bchrisman> Tv: heh... the alternative is defining the struct wherever you're planning on dealing with it? or using a macro of some kind?
[21:23] <Tv> bchrisman: if it's always used as a pointer in the prototypes, just "struct foo;" is enough of a declaration
[21:23] <Tv> http://lkml.indiana.edu/hypermail/linux/kernel/0206.1/0402.html
[21:30] <Tv> bchrisman: well, my with-debug branch is green.. i'm just gonna merge it in and see who yells ;)
[21:31] * eternaleye_ is now known as eternaleye
[21:32] <Tv> bchrisman: let me know if you don't have enough to make progress on the libceph stuff; should be easy to fix the prototypes to really work with C, nobody's just tried that yet I guess
[22:18] <jjchen> Just found the link: http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg01011.html on the questions I am asking, which says "It appears that the various underlying filesystems that we currently use (btrfs, ext3, ext4) have some limits on the xattrs sizes. For ext3/4 the total sizes of all xattrs on a single file is limited and can't go beyond a single block (e.g., typically 4k), whereas in btrfs, the limitation is per each xatt
[22:18] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[22:20] <gregaf> ah, yep!
[22:21] <gregaf> so if you use btrfs as a backing store you're fine, with the others you will be limited in size (Ceph also uses some of that xattr space internally so you actually have a bit less space available than if you were just using ext4)
[22:21] <jjchen> Sounds right
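Editor's sketch: the ext3/4 constraint discussed above (all xattrs on one file must fit within a single block, typically 4k, and Ceph uses part of that space itself) can be turned into a rough pre-flight check. The Python below is only an estimate under stated assumptions: it counts name and value bytes and ignores the filesystem's per-entry headers, and the `reserved` figure is a made-up placeholder, since the log does not say how much xattr space Ceph uses internally. The function names are hypothetical.

```python
def xattr_payload_bytes(attrs):
    """Sum name and value bytes for a {name: bytes_value} mapping.

    Rough estimate only: real filesystems add per-entry overhead on top.
    """
    return sum(len(name.encode("utf-8")) + len(value)
               for name, value in attrs.items())


def fits_in_budget(attrs, block_size=4096, reserved=512):
    """Check the payload against one block minus a reserved amount.

    block_size=4096 follows the "typically 4k" figure quoted in the log;
    reserved=512 is a hypothetical allowance for Ceph's own xattrs.
    """
    return xattr_payload_bytes(attrs) <= block_size - reserved


if __name__ == "__main__":
    single = {"user.meta": b"x" * 1000}
    split = {"user.a": b"x" * 500, "user.b": b"x" * 500}
    print(xattr_payload_bytes(single), fits_in_budget(single))
    print(xattr_payload_bytes(split), fits_in_budget(split))
```

Under this crude accounting, splitting the same 1000 bytes across two attributes costs only the extra name bytes, so the total size is what matters on ext3/4 rather than the number of attributes.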
[22:42] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[22:53] <wido> Tv: My mon is about to explode again
[22:53] <wido> res = 3.7G atm
[22:54] <Tv> wido: my only guess is that some oddity in messenger is queueing outgoing undelivered messages
[22:54] <Tv> wido: (not acked because osds are slow)
[22:54] <Tv> wido: but i don't know enough of that code to debug it effectively
[22:54] <wido> could be, I've got logs from mon = 20, ms = 1
[22:54] <wido> would that be worth anything?
[22:54] <Tv> sagewk, gregaf: do you have time?
[22:55] <wido> Oh, no rush, going afk in a bit
[22:55] <wido> I now know how to reproduce it
[22:56] <wido> I just want to know which logs are useful; I don't know if mon = 20 and ms = 1 is enough
[22:56] <wido> you said that you guess it's the messenger, if so, would ms = 20 be better?
[22:59] <Tv> wido: sadly i just don't know enough to answer
[22:59] <Tv> i have not internalized what debug settings affect what
[22:59] <wido> No prob, I'll retry with both mon and ms set to 20, and I'll file a bug tomorrow
[22:59] <wido> I'm afk!
[22:59] <sagewk> wido: not sure logs will help.. is it reproducible?
[23:00] <sagewk> tcmalloc has a way to give you a heap profile/report.. it's in the wiki IIRC
[23:00] <sagewk> may need to be enabled at startup tho
[23:01] <wido> sagewk: It's semi-reproducible, but only on my cluster I think
[23:01] <wido> I'll check out the heap profile tomorrow, really afk now
[23:03] <sagewk> wido: sounds good thanks!
[23:25] * joshd (~jdurgin@ has joined #ceph
[23:26] * joshd (~jdurgin@ Quit ()
[23:27] * joshd (~jdurgin@ has joined #ceph
[23:33] <yehudasa> Tv: trying to compile --with-nss, fails due to missing pk11pub.h. Shouldn't we include nss/pk11pub.h instead, or is there other magic involved?
[23:34] <Tv> yehudasa: the pkg-config is supposed to add the right -I
[23:35] <Tv> $ pkg-config --cflags nss
[23:35] <Tv> -I/usr/include/nss -I/usr/include/nspr
[23:36] <yehudasa> yeah, that's what returns
[23:36] <yehudasa> but I guess it's only added for the specific modules that use ceph_crypto.h?
[23:36] <Tv> yehudasa: you need to use ${CRYPTO_CXXFLAGS} etc
[23:37] <yehudasa> Tv: right
[23:38] <Tv> that's not just modules that use ceph_crypto.h, though rgw is the only one that wants crypto but not ceph_crypto.h
[23:38] <yehudasa> Tv: anything that includes ceph_crypto.h needs to have that
[23:39] <Tv> yes, but there are things that do not include ceph_crypto.h that need that
[23:39] <Tv> anything that wants more than AES

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.