#ceph IRC Log


IRC Log for 2012-08-15

Timestamps are in GMT/BST.

[0:00] <sjust> it looks reasonable
[0:50] <dmick> oh hey, 2918
[0:57] <sagewk> dmick: hah thanks. otherwise sane?
[0:58] <dmick> I'm frightened by things still using maxosd, but it's not like I've checked all the other uses
[1:00] <dmick> osd_state/weight are presumably sparse, at least
[1:00] <dmick> or sparse enough
[1:01] <dmick> hm. although it does resize() them..
[3:49] <lxo> eek! cluster snapshots are disabled in 0.50!
[4:36] <lxo> sjust, any idea of what it takes to unbreak cluster_snap ?
[8:45] <Tobarja> anyone using a desktop app such as gladinet to touch ceph storage via s3 or swift?
[9:08] <alphe> hello, I have found a problem compiling ceph 0.50 with libboost 1.50 under arch linux
[9:09] <alphe> I searched but didn't find any reference to that
[9:10] <alphe> it involves osd/PG.cc and the boost/smart_ptr management, more precisely the intrusive_ptr_add_ref and intrusive_ptr_release methods in the boost namespace
[9:17] <alphe> hello
[9:48] <NaioN> alphe: most developers are sleeping at the moment
[9:48] <NaioN> different timezone...
[9:49] <NaioN> sorry I can't help you
[10:40] <joao> alphe, what kind of a problem?
[11:06] <alphe> joao
[11:07] <alphe> sorry I was busy trying to solve the issue
[11:07] <alphe> ok so here is the prob
[11:07] <alphe> ceph 0.50 when compiling on arch linux brings up an error during compilation
[11:08] <alphe> that error is related to boost/smart_ptr
[11:09] <alphe> for instance the error is something like intrusive_ptr_add_reference wasn't declared in this scope and same for intrusive_ptr_reference
[11:09] <alphe> using libboost 1.50 (latest one)
[11:10] <alphe> yeah I know arch is a pain... but at the same time it lets you keep your linux OS down to the strict minimum
[11:10] <alphe> and then better figure out the source of problems
[11:11] <alphe> and OSD layer is full of problems
[11:11] <joao> I see
[11:11] <joao> can you pinpoint which line that error is coming from?
[11:11] <alphe> so the ceph files involved are ceph-0.50/src/osd/PG.cc and PG.h
[11:11] <alphe> 5201
[11:11] <alphe> :)
[11:11] <joao> ty
[11:11] <alphe> PG.cc:5201
[11:12] <alphe> I tried to backtrack, but in fact those functions are meant not to be prototyped in boost/smart_ptr/intrusive_ptr.hpp but directly in the .h of your project
[11:13] <alphe> and the way it is done at the end of both osd/PG.h and PG.cc should be the right way
[11:13] <alphe> now I'm trying to compile ceph 0.48 with libboost 1.50 to see if it works
[11:14] <alphe> oh, I'm using a 64-bit linux kernel
[11:16] <alphe> the only difference I see is that in boost's intrusive_ptr.hpp the function in the comments looks like intrusive_ptr_add_ref( T * p); while in PG.cc it is like intrusive_ptr_add_ref(PG * p) or something like that
[11:19] <joao> alphe, try adding "#include <boost/intrusive_ptr.hpp>" on PG.h somewhere after the first bunch of boost includes, right on the beginning of the file
[11:19] <alphe> tried that
[11:19] <joao> no joy?
[11:20] <alphe> didn't work
[11:20] <alphe> I even tried adding back the commented lines into intrusive_ptr.hpp and no results T___T
[11:21] <joao> maybe there's some other dependency I'm missing; I think sjust would be the one to talk to when he's available
[11:21] <alphe> with that I got rid of the errors at .o generation, but at the final library linking stage it failed because of multiple definitions of those 2 functions
[11:23] <alphe> hum, during compilation of ceph 0.48 I get a cannot-allocate-memory error
[11:24] <alphe> trying to compile on virtualbox virtual machines with archlinux on centos 5.8, it worked without problems
[11:26] <alphe> swap wasn't activated T___T
[11:27] <alphe> 5:30 am I worked all night long I'm broken
[11:29] <alphe> with 0.48 the -Wnarrowing compilation flag seems different than with 0.50; I don't remember which one it was
[11:30] <alphe> wow, with ceph-0.48 + libboost 1.50 on arch linux it works perfectly, but the compilation uses -Wnarrowing
[11:41] <alphe> I'm trying a compilation of 0.50 now
[11:41] <alphe> to see if the problem wasn't related to the swap of my linux VM not being activated
[11:46] <alphe> boost::intrusive_ptr_add_ref(PG *) and the compilation option is -fpermissive
[11:50] <alphe> going to bed now, sorry; hope someone will look into that mystery :)
[11:51] <alphe> bye
[16:33] <tjpatter> Using the 5-min startup guide, I am getting "HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean"
[16:33] <tjpatter> Any thoughts?
[16:38] <dspano> tjpatter: How many OSDs do you have?
[16:38] <tjpatter> 1
[16:39] <dspano> If you're just playing around, you'll need to set your replication level to one then.
[16:39] <tjpatter> Do you know the setting off of the top of your head?
[16:41] <dspano> ceph osd pool set $poolname size 1
[16:42] <dspano> You're just using the default pools, so you'll want to run the command for the data, metadata, rbd pools
[16:43] <dspano> You can see your current replication and pools with this command.
[16:43] <dspano> ceph osd dump | grep size
[16:44] <dspano> My config looks like this.
[16:44] <dspano> pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 1 owner 0 crash_replay_interval 45
[16:44] <dspano> pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 1 owner 0
[16:44] <dspano> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 128 pgp_num 128 last_change 1 owner 0
[16:44] <dspano> pool 3 'nova' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 6 owner 0
[16:44] <dspano> pool 4 'images' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 8 owner 0
[16:44] <dspano> My osd dump I mean.
[16:44] <dspano> I have two OSDs that it replicates data to, so my size is 2.
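The reason a single-OSD setup stays degraded can be made concrete: a pool whose rep size is larger than the number of OSDs cannot put every replica on a separate OSD, so its PGs never become clean. The C++ sketch below is not a Ceph tool; it assumes the `ceph osd dump` line format shown in dspano's output above and flags the pools whose size would need lowering with `ceph osd pool set $poolname size 1`. The function name `undersized_pools` is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Return the names of pools whose replication size exceeds the OSD count.
// Assumes lines shaped like: pool <id> '<name>' rep size <n> ...
std::vector<std::string> undersized_pools(const std::vector<std::string>& dump_lines,
                                          int num_osds) {
  std::vector<std::string> result;
  for (const std::string& line : dump_lines) {
    std::istringstream in(line);
    std::vector<std::string> tok;
    for (std::string t; in >> t;) tok.push_back(t);
    if (tok.size() < 6 || tok[0] != "pool") continue;  // not a pool line

    // Strip the single quotes around the pool name.
    std::string name = tok[2];
    if (name.size() >= 2 && name.front() == '\'' && name.back() == '\'')
      name = name.substr(1, name.size() - 2);

    // Find "rep size <n>" and compare n with the number of OSDs.
    for (std::size_t i = 0; i + 2 < tok.size(); ++i) {
      if (tok[i] == "rep" && tok[i + 1] == "size") {
        if (std::stoi(tok[i + 2]) > num_osds) result.push_back(name);
        break;
      }
    }
  }
  return result;
}
```

Fed the five pool lines above with `num_osds = 1` (tjpatter's setup), every pool is flagged; with `num_osds = 2` (dspano's cluster), none is.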
[16:44] <tjpatter> I just changed my sizes to all 1's and I'm still getting the health warning
[16:45] <tjpatter> Should I restart the cluster for it to take effect?
[16:45] <dspano> What does your OSD log say?
[16:46] <tjpatter> a lot of "scrub ok" entries
[16:46] <tjpatter> Nothing out of the ordinary
[16:51] <dspano> That's good.
[16:51] <dspano> Do you have your monitor server running on the same machine?
[16:57] <tjpatter> Yes
[16:58] <tjpatter> 1 node cluster :)
[16:58] <tjpatter> I have a separate, staging cluster with many nodes. That is working great. Just the 1 node setup I get these errors on.
[16:59] <dspano> tjpatter: That's weird.
[17:00] <tjpatter> I can start over. No worries. Thanks for your help!
[17:00] <tjpatter> Was just wondering if anyone had run into it out of the gate before.
[17:02] <dspano> tjpatter: Not out of the gate. I had that problem due to a messed up LAG. The two OSDs were sort of connected but not connected. It drove me nuts for weeks.
[17:02] <tjpatter> Ouch! Have any hair left after that?
[17:06] <Leseb> hi guys
[17:07] <Leseb> is this statement still true: "client with RBD device mapped writes **only** on one OSD?" My client shows connections to **every** OSD of my cluster and seems to write to every OSD (according to a tcpdump)
[17:38] <dspano> tjpatter: Yeah, I just slapped myself on the forehead after I figured it out.
[17:55] <BasketCase> What does ceph mds compat rm_incompat 1 do? I've been playing around a bit, and it seems my CephFS-mounted file system is blank now that I ran that... oops... After creating new MDSs and rebooting, somehow I have it back again: compat compat={},rocompat={},incompat={1=base v0.20,3=default file layouts on dirs,4=dir inode in separate object} but no file system? (although the cluster storage usage of the pools still seem a
[18:13] <gregaf> BasketCase: wow, I don't know why that command exists... you removed some compatibility bits from the MDSMap by running it; I'm not sure what the implications of that are. What nodes did you reboot?
[18:17] <Leseb> is this statement still true: "client with RBD device mapped writes **only** on one OSD?" My client shows connections to **every** OSD of my cluster and seems to write to every OSD (according to a tcpdump)
[18:18] <gregaf> Leseb: that statement was never true; where did you find it?
[18:18] <dspano> gregaf: I'll be sure to stay away from that command.
[18:20] <Leseb> gregaf: I saw that in one of sage's presentations, on a slide
[18:21] <gregaf> which one?
[18:21] <Leseb> but maybe I was confused
[18:21] <Leseb> msst, april 16, 2012
[18:21] <dspano> gregaf: Is that compatibility bits as in POSIX compatibility bits?
[18:21] <Leseb> the slide says: "client writes the first replica"
[18:21] <Leseb> but client writes multiple chunks
[18:22] <gregaf> dspano: I don't think POSIX has a formalization of compat bits? but it's comparable to the compat bits in ext*, yes
[18:22] <gregaf> Leseb: ah, that's talking about how there are 2 or 3 (or N) replicas, but the client only writes one copy
[18:22] <dspano> gregaf: That's interesting. Thanks.
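gregaf's comparison with the ext* compat bits can be sketched as follows. In ext2/3/4, an unknown "compat" feature is ignored, an unknown "ro_compat" feature limits the mount to read-only, and an unknown "incompat" feature means the filesystem must not be used at all. The C++ below models that rule with three integer sets; `FeatureSet`, `Access` and `check_features` are hypothetical names, and this is not Ceph's CompatSet code or the MDS's exact behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Simplified ext*-style feature sets (hypothetical; not Ceph's CompatSet).
struct FeatureSet {
  std::set<int> compat, ro_compat, incompat;
};

enum class Access { Full, ReadOnly, Refuse };

// True if every feature in `needed` also appears in `known` (sets are sorted).
static bool covers(const std::set<int>& known, const std::set<int>& needed) {
  return std::includes(known.begin(), known.end(), needed.begin(), needed.end());
}

// How may a daemon that understands `supported` use data marked `on_disk`?
Access check_features(const FeatureSet& supported, const FeatureSet& on_disk) {
  if (!covers(supported.incompat, on_disk.incompat)) return Access::Refuse;
  if (!covers(supported.ro_compat, on_disk.ro_compat)) return Access::ReadOnly;
  return Access::Full;  // unknown plain "compat" features are safe to ignore
}
```

Using the incompat bits from BasketCase's maps purely as sample data: a daemon that only knows {1, 3, 4} would refuse a map requiring {1, 2, 3, 4}, while one that knows all four would accept it.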
[18:23] <Leseb> yes, so in the end the client writes multiple chunks to different OSDs, but each only once (one replica)?
[18:24] <Leseb> gregaf: ?
[18:25] <BasketCase> gregaf: well, I basically rebooted everything... hehehe.. now it's at compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object}
[18:25] <BasketCase> would I be able to do a ceph mds setmap 1209, which is the one directly before I did such a silly thing?
[18:25] <gregaf> Leseb: the client has a virtual disk which is composed of 4MB objects; when it writes, it will write a given object to only one OSD, but it has many objects
[18:26] <gregaf> BasketCase: so I think those are the right compatbits, and honestly I think you ought to be seeing your filesystem again if you've restarted the cluster and the clients
[18:26] <Leseb> gregaf: I see, thanks for the clarification :)
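gregaf's explanation can be sketched: an RBD image is split into 4 MB objects, and each object is written to one (primary) OSD, which then takes care of replication. Because consecutive objects are placed independently (by CRUSH), a client writing across an image ends up talking to many OSDs, which fits Leseb's tcpdump. The C++ below covers only the offset-to-object step; the 4 MB default and the name `objects_for_extent` are assumptions for illustration, and object naming and CRUSH placement are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Indices of the fixed-size objects touched by a write of `len` bytes at
// byte offset `off` of the virtual disk (object size defaults to 4 MB).
std::vector<uint64_t> objects_for_extent(uint64_t off, uint64_t len,
                                         uint64_t obj_size = 4ull << 20) {
  std::vector<uint64_t> objs;
  if (len == 0) return objs;
  uint64_t first = off / obj_size;             // object holding the first byte
  uint64_t last = (off + len - 1) / obj_size;  // object holding the last byte
  for (uint64_t i = first; i <= last; ++i) objs.push_back(i);
  return objs;
}
```

A 1 KB write straddling the 4 MB boundary touches objects 0 and 1, which may well live on two different primary OSDs.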
[18:26] <gregaf> if you aren't, you could restart one of your MDSes with high debugging enabled, zip up the resulting log, and ftp it to cephdrop@ceph.com
[18:27] <BasketCase> I have almost all the logs up to this point at debug levels, but honestly I was playing last night... and tinkering for a while...
[18:28] <BasketCase> as in I may have also added and removed MDS nodes too... and rebooted many times
[18:28] <BasketCase> it all started from an MDS failing and no other MDS would take over...
[18:28] <gregaf> next time that happens you probably don't want to run random commands ;)
[18:29] <BasketCase> heheh.. yah i was seeing what would happen.. i'm good at doing that :)
[18:29] <gregaf> so turn off all your MDS nodes and generate a fresh log when you start up one of them
[18:29] <gregaf> and include the output of "ceph -s" :)
[18:39] <BasketCase> k. i'll send them over here in a bit
[18:40] <BasketCase> no big deal.. nothing I can't lose.. just wondering if it's fixable or not
[19:26] <dspano> Leseb: I read your post on multitail. That tool is awesome!
[19:53] <joao> do we have a way to obtain a hash of a bufferlist?
[19:54] <sjust> what kind of hash?
[19:56] <joao> I think a crc would be fine
[19:56] <sjust> buffer.h I think has a crc32 method
[19:57] <mikeryan> nhm: osd_scrub_min_interval and osd_scrub_max_interval
[19:57] <mikeryan> each is in seconds
[19:57] <mikeryan> set them to like 86400 each
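mikeryan's two options control how often PGs get scrubbed. Roughly, per the Ceph configuration reference, osd_scrub_min_interval is the shortest interval between scrubs and applies only when cluster load is low, while osd_scrub_max_interval forces a scrub once that much time has passed regardless of load. The C++ below is a simplified model of that rule, not the OSD's actual scheduler; `should_scrub` and its `load_is_low` flag are hypothetical. With both options set to 86400 seconds, a PG ends up scrubbed about once a day.

```cpp
#include <cassert>

// Simplified model of the two scrub intervals (not Ceph's scheduler code).
// All times are in seconds.
bool should_scrub(double seconds_since_last_scrub, double min_interval,
                  double max_interval, bool load_is_low) {
  // Past the max interval: scrub no matter how busy the cluster is.
  if (seconds_since_last_scrub >= max_interval) return true;
  // Past the min interval: scrub only if the load is low.
  return load_is_low && seconds_since_last_scrub >= min_interval;
}
```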
[20:00] <joao> sjust, it does indeed; thanks
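The point of a CRC over a bufferlist is that the data lives in several non-contiguous segments, so the checksum has to be chained from one segment to the next. The C++ below is a standalone, bitwise CRC-32C (Castagnoli) sketch using the common convention of an all-ones initial value and final inversion; Ceph's own crc32c helpers may use a different seed convention, so check buffer.h in your tree before relying on matching values. `crc32c_update`, `crc32c_segments`, and the `std::vector<std::string>` stand-in for a bufferlist are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bitwise CRC-32C (reflected polynomial 0x82F63B78). Passing the previous
// return value as `crc` continues the checksum over the next piece of data.
uint32_t crc32c_update(uint32_t crc, const unsigned char* data, std::size_t len) {
  crc = ~crc;  // undo the final inversion of the previous call
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return ~crc;
}

// Hypothetical stand-in for a bufferlist: a list of separate segments.
uint32_t crc32c_segments(const std::vector<std::string>& segments) {
  uint32_t crc = 0;
  for (const std::string& s : segments)
    crc = crc32c_update(crc, reinterpret_cast<const unsigned char*>(s.data()),
                        s.size());
  return crc;
}
```

Splitting "123456789" into "1234" and "56789" gives the same value as the contiguous string, 0xE3069283, the standard CRC-32C check value.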
[20:01] <Tobarja> is `du -sb {cephfs-mounted-partition}` known to be inaccurate vs ceph -s?
[20:09] <sagewk> looks like a regression in -rc1
[20:43] <BasketCase> Is there a particular login to use for cephdrop@ceph.com?
[20:45] <gregaf> BasketCase: you should be able to just ftp files in without a password
[20:46] <BasketCase> hmmm, it says password required for cephdrop... sftp://cephdrop@ceph.com
[21:10] <maelfius> dmick: so the hang I was seeing yesterday, seems like it was something wrong with the environment. I tore everything down and spun up (albeit a slightly larger cluster) and it seems to work now. Thanks for the help yesterday.
[21:35] <dmick> maelfius: glad to hear it. odd problem.
[21:55] <alphe> Hello
[21:56] <alphe> I found a problem with libboost and osd/PG.cc and osd/PG.h when compiling 0.50 on arch linux using libboost 1.50
[21:56] <alphe> it is due to the adding of intrusive_ptr_add_ref and intrusive_ptr_release in the code of ceph 0.50
[21:57] <dmick> alphe: I just saw a bug for that go by
[21:57] <dmick> http://tracker.newdream.net/issues/2946
[21:57] <dmick> or, if not that, something pretty similar
[21:59] <alphe> ty dmick, I'm looking at it right now
[21:59] <dmick> sjust: did you see that bug, and any comments?
[22:00] <sjust> yeah, we've seen it
[22:00] <sjust> looking again now
[22:01] <alphe> thank you. it seems important in the smart_ptr managing stuff, but when you look at the boost/smart_ptr documentation, the test source code they provide never mentions the need to implement them directly
[22:02] <alphe> rather you use swap_ptr things ...
[22:02] <sjust> it's necessary specifically for intrusive_ptr
[22:03] <alphe> if you just remove those added lines at the end of osd/PG.cc and osd/PG.h, then you get the error that those methods are needed and are not defined
[22:04] <sjust> right
[22:04] <sjust> alphe: exactly what was the error/
[22:04] <sjust> ?
[22:04] <alphe> but a diff between 0.48 and 0.50 for those files only shows those lines involving intrusive_ptr as additions
[22:05] <alphe> I prepared an email can I send it to you ?
[22:05] <sjust> yep
[22:05] <sjust> sam.just@inktank.com
[22:05] <dmick> or ceph-devel?...
[22:05] <sjust> even better
[22:05] <joshd> yeah, others might have the same problem
[22:05] <alphe> I subscribed to ceph-devel but got no reply
[22:06] <dmick> you mean majordomo didn't send you the confirmation email back?
[22:06] <dmick> it can be picky about its email commands
[22:06] <alphe> he did but then told me that the owner had to grant me access
[22:10] <dmick> I don't remember needing that step. Maybe try again?...but if it doesn't work, we can post it there for you of course
[22:11] <dmick> http://vger.kernel.org/majordomo-info.html may be useful
[22:13] <alphe> I sent you the email
[22:14] <alphe> it is not very verbose: you have the 2 diff files for PG.cc and PG.h and a copy-paste of the bunch of error messages dumped by g++
[22:16] <alphe> couldn't it be that it needs some keyword to overload those methods into the boost namespace?
[22:16] <sjust> I'm not sure what you mean
[22:16] <alphe> like using friend or extern
[22:17] <alphe> something to tell the compiler that we will insert the definitions of both methods into the remote boost namespace
[22:20] <dmick> alphe: are you using gcc 4.7?
[22:22] <alphe> that is the one that comes with arch linux base-devel let me ask the version
[22:22] <alphe> 4.7.1
[22:22] <alphe> gcc 4.7.1 to be precise
[22:23] <alphe> that is the one that comes by default with pacman -S base-devel
[22:24] <alphe> prerelease version of the 20120721
[22:24] <dmick> ok, that's common to where we've seen the problem manifest, just making certain
[22:27] <alphe> ok
[22:28] <alphe> I tried stuff like adding the prototypes of those methods back into boost/smart_ptr/intrusive_ptr.hpp; then the compilation error at the .o stage doesn't show up, but it fails at the final linking because of multiple definitions
[22:31] <alphe> could it be that the argument types are different?
[22:32] <alphe> because in intrusive_ptr.hpp you have things like intrusive_ptr_add_ref(T * p); and in PG.h/cc you have intrusive_ptr_add_ref(PG *pg)
[22:44] <sjust> 4.7 changed the argument dependent lookup rules
[22:44] <alphe> ok lost ...
[22:45] <alphe> ahahaha
[22:45] <alphe> ok so basically T * p should be substituted by PG *pg
[22:45] <sjust> yeah, but actually ReplicatedPG* pg
[22:45] <sjust> since ReplicatedPG > PG
[22:46] <alphe> yeah it's like a chained list but with class
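sjust's remark about gcc 4.7 refers to two-phase name lookup: from 4.7 on, an unqualified call with a dependent argument inside a template (such as intrusive_ptr's call to intrusive_ptr_add_ref) only sees declarations visible at the template's definition, plus whatever argument-dependent lookup (ADL) finds at instantiation; -fpermissive, which appeared in alphe's error output, relaxes this. Hooks declared inside namespace boost after the boost header, as the boost::intrusive_ptr_add_ref(PG *) in that output suggests, are missed because ADL for a PG* only searches PG's own namespace. The C++ below is a minimal sketch of the usual remedy under that assumption, not necessarily the fix adopted for tracker issue 2946: declare the hooks in the namespace of the class. `mini::intrusive_ptr` is a hypothetical stand-in for boost::intrusive_ptr, and `pgdemo` is a hypothetical miniature of the PG/ReplicatedPG hierarchy.

```cpp
#include <cassert>

// Hypothetical stand-in for boost::intrusive_ptr (copy assignment omitted).
// The hooks are called unqualified with a dependent argument, so they are
// found by ADL when the template is instantiated.
namespace mini {
template <class T>
class intrusive_ptr {
 public:
  explicit intrusive_ptr(T* p) : p_(p) { if (p_) intrusive_ptr_add_ref(p_); }
  intrusive_ptr(const intrusive_ptr& o) : p_(o.p_) { if (p_) intrusive_ptr_add_ref(p_); }
  intrusive_ptr& operator=(const intrusive_ptr&) = delete;
  ~intrusive_ptr() { if (p_) intrusive_ptr_release(p_); }
  T* get() const { return p_; }

 private:
  T* p_;
};
}  // namespace mini

// Hypothetical miniature of the PG class hierarchy.
namespace pgdemo {
int live_pgs = 0;  // number of PG objects not yet deleted

struct PG {
  int refs = 0;
  PG() { ++live_pgs; }
  virtual ~PG() { --live_pgs; }
};
struct ReplicatedPG : PG {};

// Declared in PG's own namespace rather than in mini (or boost), so ADL finds
// them for PG* and, through the base class, for ReplicatedPG*.
void intrusive_ptr_add_ref(PG* pg) { ++pg->refs; }
void intrusive_ptr_release(PG* pg) {
  if (--pg->refs == 0) delete pg;
}
}  // namespace pgdemo
```

Because the hooks take a PG*, a single pair also serves intrusive_ptr<ReplicatedPG>, which covers sjust's point that the pointers in question are really ReplicatedPG*.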
[22:52] <alphe> I have to go now you have my mail in case it works
[22:52] <alphe> or if you find a workaround, please tell me. thank you, and thank you for your time
[22:53] * alphe (~shadwolf@pc-61-190-86-200.cm.vtr.net) has left #ceph
[23:40] <joao> can someone please poke me when you guys head out to the business update?
[23:41] <gregaf> Jude says within 5 minutes
[23:42] <joao> cool
[23:42] <joao> thanks
[23:52] <dmick> 10 and counting
[23:53] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.