#ceph IRC Log


IRC Log for 2013-08-15

Timestamps are in GMT/BST.

[0:12] <alfredodeza> verdurin: ping
[0:38] <sjustlaptop> dmick: the "rw" and "r" annotations in the MonCommands.h
[0:39] <sjustlaptop> those differentiate between commands which only return data and those with side effects?
[0:40] * mschiff (~mschiff@port-49581.pppoe.wtnet.de) has joined #ceph
[0:42] <sagewk> sjustlaptop: yeah
[1:00] <sagewk> https://github.com/ceph/ceph/pull/501 <-- gussy up ceph -s
[1:14] <ishkabob> does anyone know if dumpling has any problems with bootstrapping monitors?
[1:17] <joshd> sagewk: https://github.com/ceph/ceph/pull/503
[1:19] <sagewk> joshd: looks good to me!
[1:22] <bitblt> I'm sure someone has seen this before, but I haven't been able to figure it out myself. I set glance up to use rbd and get this: Error: Rados(): can't supply both rados_id and name
[1:22] <bitblt> I'm also running the mon on my openstack controller...if that makes any difference
[1:23] <sagewk> ishkabob: nope
[1:23] <sagewk> bitblt: joshd is pushing a fix for that right now :)
[1:23] <bitblt> great...I had done this before and it worked...thought I borked something :)
[1:34] <Tamil> ishkabob: please file a bug with the logs you have
[1:35] * ishkabob (~c7a82cc0@webuser.thegrebs.com) Quit (Quit: TheGrebs.com CGI:IRC (Ping timeout))
[1:59] <gregaf> sagewk: hrm, I'm doing a basic librados demo app and when trying to link it with librados.so (which should be the only dependency it needs, I think?), I hit an undefined reference error on bufferlists:
[2:00] <sagewk> yehudasa_: repushed the rgw drain thing against master; the signal stuff was refactored a bit. if it works ok we should backport the whole set to dumpling
[2:00] <gregaf> gregf@kai:~/ceph/examples/librados [master]$ g++ hello_world.o /usr/lib/librados.so
[2:00] <gregaf> hello_world.o: In function `ceph::buffer::list::iterator::iterator(ceph::buffer::list*, unsigned int)':
[2:00] <gregaf> hello_world.cc:(.text._ZN4ceph6buffer4list8iteratorC1EPS1_j[ceph::buffer::list::iterator::iterator(ceph::buffer::list*, unsigned int)]+0x6c): undefined reference to `ceph::buffer::list::iterator::advance(unsigned int)'
[2:00] <gregaf> collect2: ld returned 1 exit status
[2:00] <sagewk> -lrados btw, no need to ref the .so explicitly
[2:00] <sagewk> hrm
[2:01] <sagewk> weird, buffer.cc is part of librados.so
[2:01] <gregaf> I'm really hoping I might have done something wrong, but I can't imagine what it would be
[2:01] <sagewk> try with -lrados
[2:01] <gregaf> same output
[2:02] <gregaf> I'm of course not doing any twiddling with "advance" on my own, just basic init-from-string and c_str() calls
[2:02] <sagewk> does your installed librados-dev package match?
[2:02] <sagewk> void advance(int o);
[2:02] <sagewk> is what i have in master, not unsigned int
[2:03] <gregaf> it's what you get from the debian dumpling repo with apt-get install librados-dev
[2:03] * jmlowe (~Adium@c-98-223-198-138.hsd1.in.comcast.net) has joined #ceph
[2:03] <dmick> 36d42deab8746245cc9900e5cf1cce9a9aceb43d, last april. ??
[2:03] <sagewk> advance(unsigned int) doesn't appear in the dumpling tree :/
[2:04] <sagewk> dpkg -l librados-dev ?
[2:04] <dmick> are you *sure* you know which version you're linking against?
[2:04] <dmick> (does librados get a sha1 stamp?..)
[2:04] <gregaf> hrm, what's a good way to check?
[2:06] <gregaf> I did apt-get remove librados-dev and there is still a librados.so.2.0.0 there, but it's dated yesterday
[2:06] <gregaf> Version: 0.67-1~bpo60+1
[2:06] <gregaf> from apt-cache
[2:10] <joao> sagewk, wip-4635 ?
[2:15] <Psi-Jack_> Finally!
[2:15] <gregaf> ahah, I've got a stray /usr/local/include/rados directory from somewhere
[2:15] <dmick> doh
[2:15] <gregaf> at least, I hope
[2:15] <Psi-Jack_> My ceph cluster is about to be fully recovered, after almost a full day of rebuilding from 1 osd being bugged out. :/
[2:15] <Psi-Jack_> sjustlaptop: You handy?
[2:15] <nhm_> Psi-Jack_: lots of data on the OSD?
[2:16] <Psi-Jack_> nhm_: no, actually, just 1TB.
[2:16] <gregaf> yep, zapping that seems to have done it
[2:16] <nhm_> Psi-Jack_: yeah, seems like 24h to recover is a bit excessive.
[2:16] <Psi-Jack_> Yeah. Quite a bit. :)
[2:17] <nhm_> Psi-Jack_: was it transferring consistently over that time?
[2:17] <gregaf> is the OSD dead so everything was replicating?
[2:17] <Psi-Jack_> But, it wasn't exactly 24 hours of recovery. Last night, I took osd.7 out because of a leveldb issue found, determined by sjustlaptop in case 5859 (IIRC).
[2:18] <Psi-Jack_> This morning, I wiped the OSD disk itself, set it up, put it back into the crushmap, and let it rebuild, but when I did that my entire VM infrastructure went AWOL for 4 hours, until I could get home and restart ceph on the same system running osd.7.
[2:18] <Psi-Jack_> I just restarted ALL ceph on the 1 server, and it just finished recovering from everything.
[2:18] <Psi-Jack_> Almost finished.. :)
[2:19] <Psi-Jack_> Still 0.383 degraded,
[2:20] <dmick> gregaf: interesting that your compiler was looking in /usr/local/include by default
[2:20] <Psi-Jack_> heh.
[2:21] <gregaf> is it not supposed to? that looks like a standard path to me, but I might have done it
[2:21] <dmick> maybe it's standard for gcc; looking
[2:21] <Psi-Jack_> sjustlaptop: thought it was the HDD that went bad, but it's less than 6 months old, all diags show it's fine, and it seems to be maintaining itself now that it's back in the cluster.
[2:21] <Psi-Jack_> Ahh, here we go. bug 5958.
[2:22] <dmick> terrifyingly enough, yes
[2:22] <Psi-Jack_> I need to know how to upload to cephdrop@ceph.com
[2:22] <dmick> live and learn
[2:22] <dmick> http://gcc.gnu.org/onlinedocs/cpp/Search-Path.html
[2:23] <dmick> Psi-Jack_: pm on its way
[2:23] <Psi-Jack_> Thanks. :)
[2:23] * sagelap (~sage@2600:1012:b028:8858:4d3e:ed03:8189:552d) has joined #ceph
[2:24] <yehudasa_> sagelap: so basically you retained the old SIGTERM behavior?
[2:26] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[2:28] * bandrus (~Adium@cpe-76-95-220-174.socal.res.rr.com) has joined #ceph
[2:30] <Psi-Jack_> sjustlaptop: Cool. Just got everything uploaded and updated in the ticket for ya. :)
[2:31] <Psi-Jack_> YAY!
[2:31] <Psi-Jack_> health HEALTH_OK
[2:31] <Psi-Jack_> Finally. :D
[2:42] <sagelap> yehudasa_: yeah. i forget all the bumps i hit on the way there, but yeah. it uses the async signal thread instead of doing it directly in the handler.
[2:43] * berant (~blemmenes@24-236-241-163.dhcp.trcy.mi.charter.com) has joined #ceph
[2:44] * Schelluri (~Sriram@108-225-16-176.lightspeed.sntcca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[2:45] * Schelluri (~Sriram@108-225-16-176.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[2:50] <sagelap> joao: did you run the crush_ops.sh against wip-4635?
[2:50] <yehudasa_> sagelap: not sure I'm following. The original SIGTERM signal (that you retained) was just causing _exit(0)
[2:53] <sagelap> on master it does handle_sigterm
[2:53] <yehudasa_> ah
[2:53] <sagelap> i didn't put those changes in dumpling bc it seemed like it should get more testing
[2:54] <sagelap> joao: still there?
[2:55] * yanzheng (~zhyan@ has joined #ceph
[2:58] <joao> sagelap, sure thing
[2:58] * huangjun (~kvirc@ has joined #ceph
[2:58] <joao> okay, "sure thing" in reply to being still here
[2:59] <huangjun> hello, I use ceph-fuse -m -r test /mnt
[2:59] <joao> I'll run crush_ops.sh against it
[2:59] <huangjun> but it outputs: mount failed with (116) Stale file handle
[3:00] <joao> sagelap, no need to run crush_ops.sh; vstart fails creating the osd
[3:00] * joao dives back

