[4:11] <ryann> having trouble with rbd create. I'm aware of the bug, added osd_class_dir = /usr/local/lib/rados-classes to ceph.conf. still doesn't work. :-/
[4:14] <dmick> what's happening ryann?
[4:16] <ryann> [librbd: failed to assign a block name for image] is what I'm getting. I'm trying to perform the command on the mon.0 node.
[4:16] <ryann> tried adding /usr/local/lib/rados-classes to ld.so.conf, did ldconfig. Futile, I know. :P
[4:17] <dmick> so you built from source and installed in /usr/local?
[4:18] <ryann> dmick: Yeah, I know. I didn't do a PREFIX=/usr. is that my problem?
[4:18] <dmick> I'm not suggesting anything, just trying to make sure I understand where we're at
[4:19] <ryann> dmick: no offense taken :)
[4:19] <dmick> anything in the osd log file with cls_ in it?
[4:19] <dmick> er, sorry, cls
[4:22] <ryann> dmick: nothing [cat ceph-osd.0.log | grep cls] Do i need to change how it logs debugging or something?
[4:22] <dmick> possibly; depends on your ceph.conf; just looking for clues
[4:22] <dmick> which bug did you mean, btw?
[4:23] <ryann> dmick: http://ceph.com/docs/master/dev/osd-class-path/ has it laid out nicely.
[4:23] <dmick> look at that.
[4:24] <dmick> which commit did you build?
[4:24] <dmick> (ceph -v)
[4:25] <ryann> dmick: ceph-0.48argonaut
[4:25] <ryann> oops; ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
[4:25] <dmick> mmk
[4:26] <dmick> so that fix was definitely included (i.e. the fix for 1722) so I doubt that's the issue specifically
[4:27] <dmick> but it could be some other problem loading the libcls_rbd.so, certainly
[4:27] <dmick> let's see..
[4:28] <dmick> what exactly did you put in your ceph.conf, and in which section?
[4:28] <dmick> (relating to osd_class_dir)
[4:30] <ryann> I put it under [osd] and the exact line was osd_class_dir = /usr/local/lib/rados-classes
[4:31] <ryann> should it be under [global]
[4:31] <dmick> I think it must not have the _
[4:31] <dmick> notice how the others don't; I don't know if our parser forgives that or not
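Both the section and the spelling of the key are in question here, so it helps to see exactly what ceph.conf contains. The sketch below searches the file and treats spaces and underscores in key names as the same. It assumes a plain INI-style ceph.conf. `find_conf_key` is a hypothetical helper name. The normalization only makes the search easier and says nothing about how Ceph's own parser handles the two spellings.

```shell
# Hypothetical helper: print "[section] key = value" for every line in an
# INI-style ceph.conf whose key matches KEY. Spaces and underscores in key
# names are treated alike for the search only, so both spellings are found.
find_conf_key() {   # usage: find_conf_key KEY FILE
    awk -v want="$1" '
        function norm(s) { gsub(/[ \t_]+/, "_", s); return s }
        /^[ \t]*[#;]/ { next }                       # skip comment lines
        /^[ \t]*\[/ {                                # section header, e.g. [osd]
            section = $0
            gsub(/^[ \t]+|[ \t]+$/, "", section)
            next
        }
        /=/ {
            key = $0
            sub(/=.*/, "", key)                      # keep text left of the first "="
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            if (norm(key) == norm(want)) {
                line = $0
                gsub(/^[ \t]+/, "", line)
                print section " " line
            }
        }
    ' "$2"
}

# Usage:
#   find_conf_key 'osd class dir' /etc/ceph/ceph.conf
```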
[4:31] <ryann> dmick: OOOOOOOHHHHH.... :P
[4:31] <ryann> ( I'll try that...)
[4:31] <dmick> but...it also should no longer be necessary at all
[4:32] <dmick> the default should work
[4:32] <dmick> there's a way to ask it to dump the config from the daemon so you know what it thinks currently
[4:32] <dmick> looking
[4:35] <dmick> an admin socket command, "config show"...
[4:35] <ryann> dmick: ??
[4:36] <dmick> yeah, looking for an exact syntax
[4:38] <ryann> :q
[4:38] <dmick> error: can't quit :)
[4:38] <ryann> I know! :) wrong focus...
[4:42] <dmick> there it is
[4:42] <dmick> ceph --admin-daemon <path-to-admin-socket> config show
[4:43] <dmick> the admin sockets show up as "client.id.asok" in the client dirs
[4:43] <ryann> dmick: Trying that...
[4:43] <dmick> i.e. mon.a.asok, etc.
[4:43] <dmick> so for instance mine shows, among other things
[4:43] <dmick> osd_class_dir = /usr/lib/rados-classes
[4:44] <dmick> the underscore-vs-space thing has bitten me before
[4:45] <dmick> but as I say, the default should be right
[4:45] <ryann> dmick: in the client dirs? Ok this may help. Do i need to be at a client? I don't have the FS mounted, nor did I have the kernel module loaded yet. Perhaps i need to run this on another machine as a client (not the monitor)?
[4:45] <dmick> "client" in this case means "daemon"
[4:46] <dmick> what's your cluster look like? how many mons/osds on how many machines?
[4:46] <ryann> 3 mon/mds's (same nodes) 4 OSD nodes
[4:46] <dmick> I *think* you should be able to do this from any monitor node
[4:47] <dmick> since it's an osd setting, it should also work from the osd node
[4:48] <ryann> dmick: be right back
[4:55] <dmick> the .asok path may be mentioned in the .conf file; if not it defaults to /var/run/ceph/*asok
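The `config show` command above can be looped over every socket in the default directory to check all the daemons on a node at once. This is a minimal sketch. It assumes the sockets live in /var/run/ceph and that the output contains a line mentioning `osd_class_dir`, as in dmick's example. `show_class_dir` and the `CEPH_BIN` override are hypothetical names, added so the loop can be tested without a running cluster.

```shell
# Hypothetical helper: ask every daemon with an admin socket in ASOK_DIR
# which osd_class_dir it is actually using.
# CEPH_BIN lets you substitute a stub for testing; it defaults to the ceph CLI.
show_class_dir() {   # usage: show_class_dir [ASOK_DIR]
    asok_dir=${1:-/var/run/ceph}
    ceph_bin=${CEPH_BIN:-ceph}
    for sock in "$asok_dir"/*.asok; do
        [ -e "$sock" ] || continue              # no sockets: glob stayed literal
        name=$(basename "$sock" .asok)          # e.g. osd.0
        val=$("$ceph_bin" --admin-daemon "$sock" config show | grep osd_class_dir || true)
        printf '%s: %s\n' "$name" "${val:-osd_class_dir not found}"
    done
}

# Usage on an OSD or monitor node:
#   show_class_dir /var/run/ceph
```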
[4:55] <ryann> dmick: what is the socket file called? not sure if I'm seeing that. :-/
[4:56] <ryann> oh you juuuust answered that...
[4:57] <ryann> that command shows the correct path under osd_class_dir, at least for osd.0
[4:58] <dmick> this is with your edit to the ceph.conf with the underscores?
[4:59] <ryann> This is the edit without underscores. so my config file shows osd class dir = /usr/local/lib/rados-classes
[4:59] <dmick> ok. (and you restarted the cluster)
[5:00] <ryann> dmick: sorry if I take a while to respond here and there. I'm at a TV station, and we just went to air for our 10pm show. Yes, I stopped, then restarted the cluster (service ceph -a stop, then service ceph -a start).
[5:00] <dmick> ah. ok. Sorry I'm going through this with baby steps; I'm a bit new too
[5:02] <ryann> Also, I noticed the webpage said that it was looking for cls_rbd.so, but I only have libcls_rbd.so (and .so.1, and .so.1.0.0 <- the actual library). Is there a file 'cls_rbd.so' that I'm missing? Does your setup have it?
[5:02] <dmick> no, it meant libcls_rbd.so
[5:03] <dmick> what does ceph-clsinfo /usr/local/lib/rados-classes/libcls_rbd.so say?
[5:03] <ryann> rbd 2.0 x86-64
[5:03] <ryann> both on the mon.a and osd.0
[5:04] <dmick> is there anything in the osd log about load_class?
[5:05] <dmick> (by which I guess I mean "osd logs", if there are 4)
[5:05] <ryann> not in ceph-osd.0.log should I be looking at a different file?
[5:06] <dmick> I'm not 100% clear on that; if you could check them all that would be solid proof
[5:06] <dmick> i.e. ceph-osd.N.log
[5:06] <dmick> (on each osd node, IOW)
[5:07] <dmick> (I find having cssh open to the machines all at once to be very useful for stuff like this)
[5:08] <ryann> cssh huh. I just tried an ssh. hold on...
[5:08] <dmick> actually we can tell from pmap whether the osd loaded the .so or not
[5:09] <dmick> on the osd nodes, pmap $(pgrep ceph-osd) | grep libcls_rbd
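pmap reports the mappings the kernel exposes in /proc/<pid>/maps, so the same check can be scripted for each OSD process. This is a minimal, Linux-only sketch. `lib_loaded` and `check_osds` are hypothetical helper names. Reading another user's maps file, such as a root-owned OSD's, generally requires running as root.

```shell
# Hypothetical helpers (Linux only). pmap shows the same mappings that the
# kernel exposes in /proc/<pid>/maps, so we read that file directly.
lib_loaded() {                 # usage: lib_loaded PID STRING
    grep -qF -- "$2" "/proc/$1/maps" 2>/dev/null
}

check_osds() {                 # usage: check_osds [LIBNAME]
    lib=${1:-libcls_rbd}
    for pid in $(pgrep ceph-osd); do
        if lib_loaded "$pid" "$lib"; then
            echo "ceph-osd pid $pid: $lib mapped"
        else
            echo "ceph-osd pid $pid: $lib NOT mapped"
        fi
    done
}

# Usage on each OSD node, as root:
#   check_osds libcls_rbd
```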
[5:09] <ryann> So far no.
[5:10] <ryann> no osd nodes have loaded the library
[5:10] <dmick> hm. ok
[5:11] <dmick> and no mention of load_class in the logfiles?
[5:11] <ryann> Wait! Got hit on osd4...
[5:12] <ryann> _load_class could not open class /usr/local/lib/rados-classes/libcls_rbd.so (dlopen failed) No such file or directory. I know it's there. however it's a link to the other so.1.0.0 file. I wonder if it's permissions?
[5:12] <dmick> No such file wouldn't be permissions, I don't think
[5:13] <dmick> but yes, we're narrowing down here
[5:13] <ryann> Wait. hold on. The file libcls_rbd.so.1 is there, NOT libcls_rbd.so. hmmm wonder why..
[5:13] <dmick> hmmm
[5:13] <ryann> manually created. Trying again........
[5:14] <dmick> lrwxrwxrwx 1 root root 19 Jul 30 10:56 libcls_rbd.so -> libcls_rbd.so.1.0.0
[5:14] <dmick> lrwxrwxrwx 1 root root 19 Jul 30 10:56 libcls_rbd.so.1 -> libcls_rbd.so.1.0.0
[5:14] <dmick> is what I have
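The listing above shows the layout the OSD expects. Both the unversioned `libcls_rbd.so` and the soname link `libcls_rbd.so.1` point at the real `libcls_rbd.so.1.0.0`. Note that ldconfig, which ryann ran earlier, maintains soname links such as `.so.1`. It does not create the unversioned `.so` name that the failed dlopen was asking for. The sketch below recreates whichever links are missing. It assumes the real file follows the `libNAME.so.X.Y.Z` pattern. `fix_so_links` is a hypothetical helper name.

```shell
# Hypothetical helper: for each real versioned library libNAME.so.X.Y.Z in DIR,
# create the soname link (libNAME.so.X) and the unversioned link (libNAME.so)
# if either is missing. Existing files and links are left alone.
fix_so_links() {                     # usage: fix_so_links DIR
    dir=${1:?usage: fix_so_links DIR}
    for f in "$dir"/*.so.*.*.*; do
        [ -f "$f" ] && [ ! -L "$f" ] || continue   # skip links and unmatched globs
        target=$(basename "$f")                    # libcls_rbd.so.1.0.0
        base=${f%.so.*}                            # $dir/libcls_rbd
        ver=${f##*.so.}                            # 1.0.0
        major=${ver%%.*}                           # 1
        for link in "$base.so.$major" "$base.so"; do
            if [ ! -e "$link" ] && [ ! -L "$link" ]; then
                ln -s "$target" "$link"            # relative link, as in the listing above
                echo "created $link -> $target"
            fi
        done
    done
}

# Usage on each OSD node, as root:
#   fix_so_links /usr/local/lib/rados-classes
```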
[5:14] <ryann> yah. checking the other nodes...
[5:15] <ryann> Sigh....
[5:15] <ryann> Worked :)
[5:16] <ryann> Here is the time when the abbrv OMG really applies.
[5:16] <dmick> good. so something went awry in the install.
[5:16] <dmick> I don't suppose you saved make install logs :)
[5:16] <dmick> (no one ever does)
[5:17] <ryann> I agree. I haven't deleted the build folders. However, the build machine and the node machines are 2 different systems. It was annoying as heck to package up the ceph stuff after a make install and transfer it to a machine that didn't have gcc, etc installed.
[5:18] <dmick> yeah. there's a .deb target, but if you're not on Debian that's not very useful. I *think* the RPM target sorta works, but I have little experience with that
[5:18] <ryann> The nodes only have a very small USB partition as root, so I couldn't afford to build ceph on those machines. I had to create a build machine that matched them exactly (libc, etc.; CentOS 6.2) and then build on that for the other machines.
[5:18] <dmick> did you make .rpms?
[5:18] <ryann> Ya know, if make DEST_PREFIX=/something worked like the kernel builds do, then I could easily package it into a tarball. :P
[5:19] <dmick> there's almost certainly a tarball target too; I'd be surprised if that didn't work
[5:19] <ryann> dmick: make DEST_PREFIX=/something install, I mean. I wanted to do an RPM build, but there was a huge caveat that kept me from doing that. I can't remember what it was.
[5:20] <dmick> trying to remember how to make tarballs
[5:22] <dmick> make dist?...
[5:22] <dmick> naw, that's source
[5:22] <dmick> DESTDIR, perhaps?
[5:23] <dmick> http://www.gnu.org/software/automake/manual/automake.html#Staged-Installs
[5:32] <dmick> yeah, that did the job
[5:32] <dmick> $ mkdir inst; make DESTDIR=$(pwd)/inst install
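Once the staged tree exists, it can be packed without losing symlinks. That matters here because the actual failure was a missing `libcls_rbd.so` link. This is a minimal sketch. `pack_staged_install` and the tarball name are hypothetical. Unpacking at / on each node assumes the nodes use the same prefix (/usr/local) as the build.

```shell
# Staged install, as in the log (run from the build tree):
#   mkdir inst; make DESTDIR=$(pwd)/inst install
#
# Hypothetical helper: pack the staged tree into a tarball. tar records
# symlinks as symlinks, so links like libcls_rbd.so arrive intact.
pack_staged_install() {       # usage: pack_staged_install STAGEDIR OUT.tar.gz
    stage=${1:?missing staging directory}
    out=${2:?missing output file}
    case $out in
        /*) ;;                    # already absolute
        *) out=$PWD/$out ;;       # make relative paths independent of -C
    esac
    tar -C "$stage" -czf "$out" .
}

# On each node, as root, unpack relative to / (same prefix as the build):
#   tar -C / -xzf ceph-inst.tar.gz
```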
[5:38] <ryann> dmick: I'm here, just tied up with something for a sec.
[5:38] <dmick> k. I'm about to go
[5:38] <dmick> but yeah, .rpm build or DESTDIR build is probably more convenient and error-proof. for next time.
[5:41] <ryann> thanks dmick!
[5:41] <dmick> np! Sorry it took so long to get there, but thank you for helping me learn with the diagnostic process!
[5:52] <ryann> C Ya!
[10:10] <loicd> fc: interesting information from Sage Weil on the "Puppet modules for Ceph" thread. What do you make of it ?
[10:32] <tnt> How can I tell ceph that a PG is lost forever? I see "pg 12.1 is stuck stale+active+clean, last acting [0]" and I know there won't be a copy available anywhere ever again... Since the failure I replaced osd0 with a new one, but that PG was only on osd0, with no other copies.
[11:18] <tnt> I tried 'ceph osd lost X' but that doesn't change anything
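The stuck-PG lines can be filtered by acting set to list every PG that, like 12.1, had only the failed OSD in its last acting set. This is a minimal sketch. It assumes the report lines look exactly like the one quoted above, for example as printed by `ceph health detail`. `pgs_only_on_osd` is a hypothetical helper name. It only identifies candidate PGs and does not change any cluster state.

```shell
# Hypothetical helper: read stuck-PG report lines on stdin and print the ids of
# PGs whose last acting set was exactly the given OSD (no other copies).
# Assumed line format:
#   pg 12.1 is stuck stale+active+clean, last acting [0]
pgs_only_on_osd() {   # usage: ... | pgs_only_on_osd OSD_ID
    awk -v osd="$1" '
        $1 == "pg" && $3 == "is" && $4 == "stuck" {
            acting = $NF            # e.g. [0] or [0,3]
            sub(/^\[/, "", acting)
            sub(/]$/, "", acting)
            if (acting == osd) print $2
        }'
}

# Usage:
#   ceph health detail | pgs_only_on_osd 0
```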
[15:31] <spongie> is there an in depth explanation of how the ssh keys are used for authentication?
[16:28] <tnt> Mmm, when have cephx auth enabled, the OSD fail to start with "** ERROR: osd init failed: (95) Operation not supported" ...
[16:37] <tnt> Nevermind, PEBKAC ... I put the 'auth supported = cephx' in the wrong section of the config ...
[17:13] <newtontm> hi, when running "ceph -s" or "ceph health" is there a way to specify a different user than "client.admin" ?
[17:14] <newtontm> let's say I want to specify "client.nagios" for the monitoring...
[17:21] <tnt> -n ?
[17:22] <newtontm> thx ;)
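For the monitoring case, the output of `ceph -n client.nagios health` can be mapped onto the Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This is a minimal sketch. It assumes the first word of the output is HEALTH_OK, HEALTH_WARN or HEALTH_ERR. `ceph_health_to_nagios` is a hypothetical helper name. client.nagios must already exist and have a key that the monitoring host can read.

```shell
# Hypothetical helper: read `ceph health` output on stdin and translate it
# into the Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
ceph_health_to_nagios() {
    line=
    IFS= read -r line || true
    case $line in
        HEALTH_OK*)   echo "OK - $line";       return 0 ;;
        HEALTH_WARN*) echo "WARNING - $line";  return 1 ;;
        HEALTH_ERR*)  echo "CRITICAL - $line"; return 2 ;;
        *)            echo "UNKNOWN - ${line:-no output from ceph}"; return 3 ;;
    esac
}

# Usage, authenticating as the client from the discussion:
#   ceph -n client.nagios health | ceph_health_to_nagios
```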
[22:16] <dabeowulf> tnt: Out of curiosity, did you manage to do that somehow: "How can I tell ceph that a PG is lost forever ?"
[22:17] <joshd> dabeowulf: http://ceph.com/docs/master/ops/manage/failures/osd/#unfound-objects
[23:29] <Tv_> FYI: all gitbuilders going through a rolling reboot -- i'm making all cache=writeback for faster IO
[23:39] <dmick> faster gitbuilders yay
[23:44] <elder> Any benchmarks on what difference it will make?
[23:56] <Tv_> 200MB/s vs 250MB/s streaming writes

