#ceph IRC Log


IRC Log for 2013-08-15

Timestamps are in GMT/BST.

[0:00] * jmlowe (~Adium@2601:d:a800:97:3d71:6a87:eafe:ac4e) Quit (Ping timeout: 480 seconds)
[0:01] * jeff-YF (~jeffyf@ Quit (Ping timeout: 480 seconds)
[0:02] * jmlowe (~Adium@c-98-223-198-138.hsd1.in.comcast.net) has joined #ceph
[0:04] * clayb (~kvirc@ Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[0:06] * zhyan_ (~zhyan@ has joined #ceph
[0:08] * yanzheng (~zhyan@ Quit (Ping timeout: 480 seconds)
[0:10] * athrift (~nz_monkey@ Quit (Remote host closed the connection)
[0:10] * jmlowe (~Adium@c-98-223-198-138.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[0:11] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:11] * athrift (~nz_monkey@ has joined #ceph
[0:11] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[0:12] <alfredodeza> verdurin: ping
[0:14] * scuttlemonkey (~scuttlemo@ has joined #ceph
[0:14] * ChanServ sets mode +o scuttlemonkey
[0:18] * zhyan_ (~zhyan@ Quit (Ping timeout: 480 seconds)
[0:19] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:19] * zhyan_ (~zhyan@ has joined #ceph
[0:19] * alram (~alram@ Quit (Ping timeout: 480 seconds)
[0:23] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:25] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[0:25] * mschiff (~mschiff@ has joined #ceph
[0:27] * alram (~alram@ has joined #ceph
[0:29] * mschiff (~mschiff@ Quit (Remote host closed the connection)
[0:29] * mschiff (~mschiff@ has joined #ceph
[0:30] * MarkN (~nathan@ has joined #ceph
[0:31] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[0:31] * MarkN (~nathan@ has left #ceph
[0:31] * rudolfsteiner (~federicon@ Quit (Quit: rudolfsteiner)
[0:32] * mschiff_ (~mschiff@port-49581.pppoe.wtnet.de) has joined #ceph
[0:34] * mschiff (~mschiff@ Quit (Read error: Operation timed out)
[0:38] <sjustlaptop> dmick: the "rw" and "r" annotations in the MonCommands.h
[0:39] <sjustlaptop> those differentiate between commands which only return data and those with side effects?
[0:40] * mschiff (~mschiff@port-49581.pppoe.wtnet.de) has joined #ceph
[0:42] <sagewk> sjustlaptop: yeah
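
Note on the exchange above: the "r" and "rw" annotations separate commands that only return data from commands with side effects. A minimal Python sketch of that idea follows. The command names, the `Perm` type, and the `allowed` check are all hypothetical and are not taken from MonCommands.h; how the annotations are used beyond the read/write distinction is an assumption.

```python
from enum import Flag, auto


class Perm(Flag):
    """Hypothetical permission bits, loosely modelled on "r" / "rw" annotations."""
    R = auto()  # command only returns data
    W = auto()  # command changes state (has side effects)


# Hypothetical command table; these names are made up for illustration
# and do not come from MonCommands.h.
COMMANDS = {
    "example status": Perm.R,
    "example set-flag": Perm.R | Perm.W,
}


def has_side_effects(command: str) -> bool:
    """Return True if the command is annotated as writing ("rw")."""
    return bool(COMMANDS[command] & Perm.W)


def allowed(command: str, caps: Perm) -> bool:
    """Assumed usage: a caller needs every bit the command is annotated with."""
    needed = COMMANDS[command]
    return (needed & caps) == needed
```

Under these assumptions a caller holding only `Perm.R` could run "example status" but not "example set-flag".
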
[0:48] * mschiff_ (~mschiff@port-49581.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[0:50] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[0:50] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Read error: Connection reset by peer)
[0:50] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[0:56] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[1:00] <sagewk> https://github.com/ceph/ceph/pull/501 <-- gussy up ceph -s
[1:00] * mschiff (~mschiff@port-49581.pppoe.wtnet.de) Quit (Remote host closed the connection)
[1:01] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[1:01] * mschiff (~mschiff@port-49581.pppoe.wtnet.de) has joined #ceph
[1:02] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) Quit (Quit: ...)
[1:03] * jmlowe (~Adium@c-98-223-198-138.hsd1.in.comcast.net) has joined #ceph
[1:05] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Quit: No Ping reply in 180 seconds.)
[1:05] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[1:06] * BManojlovic (~steki@fo-d- Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:07] * jmlowe (~Adium@c-98-223-198-138.hsd1.in.comcast.net) Quit (Read error: Operation timed out)
[1:08] * alexxy (~alexxy@2001:470:1f14:106::2) Quit ()
[1:08] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[1:09] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[1:09] * AfC (~andrew@2407:7800:200:1011:8c69:bbc4:9be:f05e) has joined #ceph
[1:11] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[1:14] <ishkabob> does anyone know if dumpling has any problems with bootstrapping monitors?
[1:16] * mschiff (~mschiff@port-49581.pppoe.wtnet.de) Quit (Remote host closed the connection)
[1:17] <joshd> sagewk: https://github.com/ceph/ceph/pull/503
[1:17] * mschiff (~mschiff@port-49581.pppoe.wtnet.de) has joined #ceph
[1:19] <sagewk> joshd: looks good to me!
[1:21] * mschiff (~mschiff@port-49581.pppoe.wtnet.de) Quit (Remote host closed the connection)
[1:21] * bitblt (~don@rtp-isp-nat1.cisco.com) has joined #ceph
[1:22] * mschiff (~mschiff@port-49581.pppoe.wtnet.de) has joined #ceph
[1:22] <bitblt> I'm sure someone has seen this before, but I haven't been able to figure it out myself. I set glance up to use rbd and get this: Error: Rados(): can't supply both rados_id and name
[1:22] <bitblt> I'm also running the mon on my openstack controller...if that makes any difference
[1:23] <sagewk> ishkabob: nope
[1:23] <sagewk> bitblt: joshd is pushing a fix for that right now :)
[1:23] <bitblt> great...I had done this before and it worked...thought I borked something :)
[1:24] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[1:24] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Quit: No Ping reply in 180 seconds.)
[1:25] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[1:26] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[1:26] * mschiff_ (~mschiff@port-49581.pppoe.wtnet.de) has joined #ceph
[1:28] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[1:31] * mschiff (~mschiff@port-49581.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[1:32] * bitblt (~don@rtp-isp-nat1.cisco.com) Quit (Quit: Leaving)
[1:34] <Tamil> ishkabob: please file a bug with the logs you have
[1:35] * ishkabob (~c7a82cc0@webuser.thegrebs.com) Quit (Quit: TheGrebs.com CGI:IRC (Ping timeout))
[1:40] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[1:42] * todin (tuxadero@kudu.in-berlin.de) Quit (Read error: Operation timed out)
[1:42] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[1:47] * carif (~mcarifio@ has joined #ceph
[1:47] * bandrus (~Adium@cpe-76-95-220-174.socal.res.rr.com) has joined #ceph
[1:50] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[1:50] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[1:55] * scuttlemonkey (~scuttlemo@ Quit (Ping timeout: 480 seconds)
[1:55] * bandrus (~Adium@cpe-76-95-220-174.socal.res.rr.com) Quit (Quit: Leaving.)
[1:56] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[1:59] * LeaChim (~LeaChim@ Quit (Read error: Operation timed out)
[1:59] <gregaf> sagewk: hrm, I'm doing a basic librados demo app and when trying to link it with librados.so (which should be the only dependency it needs, I think?), I hit an undefined reference error on bufferlists:
[2:00] <sagewk> yehudasa_: repushed the rgw drain thing against master; the signal stuff was refactored a bit. if it works ok we should backport the whole set to dumpling
[2:00] <gregaf> gregf@kai:~/ceph/examples/librados [master]$ g++ hello_world.o /usr/lib/librados.so
[2:00] <gregaf> hello_world.o: In function `ceph::buffer::list::iterator::iterator(ceph::buffer::list*, unsigned int)':
[2:00] <gregaf> hello_world.cc:(.text._ZN4ceph6buffer4list8iteratorC1EPS1_j[ceph::buffer::list::iterator::iterator(ceph::buffer::list*, unsigned int)]+0x6c): undefined reference to `ceph::buffer::list::iterator::advance(unsigned int)'
[2:00] <gregaf> collect2: ld returned 1 exit status
[2:00] <sagewk> -lrados btw, no need to ref the .so explicitly
[2:00] <sagewk> hrm
[2:01] <sagewk> weird, buffer.cc is part of librados.so
[2:01] <gregaf> I'm really hoping I might have done something wrong, but I can't imagine what it would be
[2:01] <sagewk> try with -lrados
[2:01] <gregaf> same output
[2:02] <gregaf> I'm of course not doing any twiddling with "advance" on my own, just basic init-from-string and c_str() calls
[2:02] <sagewk> does your installed librados-dev package match?
[2:02] * mschiff (~mschiff@port-7806.pppoe.wtnet.de) has joined #ceph
[2:02] <sagewk> void advance(int o);
[2:02] <sagewk> is what i have in master, not unsigned int
[2:03] <gregaf> it's what you get from the debian dumpling repo with apt-get install librados-dev
[2:03] * jmlowe (~Adium@c-98-223-198-138.hsd1.in.comcast.net) has joined #ceph
[2:03] <dmick> 36d42deab8746245cc9900e5cf1cce9a9aceb43d, last april. ??
[2:03] <sagewk> advance(unsigned int) doesn't appear in the dumpling tree :/
[2:04] <sagewk> dpkg -l librados-dev ?
[2:04] <dmick> are you *sure* you know which version you're linking against?
[2:04] <dmick> (does librados get a sha1 stamp?..)
[2:04] <gregaf> hrm, what's a good way to check?
[2:05] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[2:05] <dmick> dpkg is a good start; I'd check librados2 and librados-dev both
[2:06] <gregaf> I did apt-get remove librados-dev and there is still a librados.so.2.0.0 there, but it's dated yesterday
[2:06] <gregaf> Version: 0.67-1~bpo60+1
[2:06] <gregaf> from apt-cache
[2:06] <joshd> -dev packages are for headers
[2:06] <joshd> .so comes from the librados2 package
[2:07] <gregaf> ah, right, but I'm surprised it wasn't auto-removed
[2:07] <dmick> yeah, that's not how debian package mgmt works
[2:07] <dmick> it surprises me too
[2:08] <dmick> apt-get autoremove might delete it if there are no other dependants
[2:08] * mschiff_ (~mschiff@port-49581.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[2:09] <gregaf> ah, right, I was thinking it had freed up so much space it must have removed some dependencies (librados-dev+librados-2 is 130MB, but removing librados-2 only frees 5MB?)
[2:10] * zhyan_ (~zhyan@ Quit (Ping timeout: 480 seconds)
[2:10] <joao> sagewk, wip-4635 ?
[2:11] <dmick> librados2 has only the .so, pretty much
[2:11] * jmlowe (~Adium@c-98-223-198-138.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[2:11] <gregaf> more like how did librados-dev get to be 125MB for what I would expect to be some text files
[2:11] <dmick> librados.a
[2:12] * Schelluri (~Sriram@108-225-16-176.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[2:12] <dmick> -rw-r--r-- 1 root root 188338616 Aug 9 16:29 /usr/lib/librados.a
[2:12] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[2:13] <dmick> -rw-r--r-- 1 root root 5948048 Aug 9 16:31 /usr/lib/librados.so.2.0.0
[2:14] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:15] <Psi-Jack_> Finally!
[2:15] <gregaf> ahah, I've got a stray /usr/local/include/rados directory from somewhere
[2:15] <dmick> doh
[2:15] <gregaf> at least, I hope
[2:15] <Psi-Jack_> My ceph cluster is about to be fully recovered, after almost a full day of rebuilding from 1 osd being bugged out. :/
[2:15] <Psi-Jack_> sjustlaptop: You handy?
[2:15] <nhm_> Psi-Jack_: lots of data on the OSD?
[2:16] <Psi-Jack_> nhm_: no, actually, just 1TB.
[2:16] <gregaf> yep, zapping that seems to have done it
[2:16] <nhm_> Psi-Jack_: yeah, seems like 24h to recover is a bit excessive.
[2:16] <Psi-Jack_> Yeah. Quite a bit. :)
[2:17] <nhm_> Psi-Jack_: was it transferring consistently over that time?
[2:17] <gregaf> is the OSD dead so everything was replicating?
[2:17] <Psi-Jack_> But, it wasn't exactly 24 hours of recovery. Last night, I took osd.7 out because of a leveldb issue found, determined by sjustlaptop in case 5859 (IIRC).
[2:17] * mschiff (~mschiff@port-7806.pppoe.wtnet.de) Quit (Remote host closed the connection)
[2:18] <Psi-Jack_> This morning, I wiped the OSD disk itself, set it up, put it back into crushmap, and let it rebuild back into it, but when I did that my entire VM infrastructure went AWOL for 4 hours, until I could get home and restart ceph on the same system running osd.7.
[2:18] <Psi-Jack_> I just restarted ALL ceph on the 1 server, and it just finished recovering from everything.
[2:18] <Psi-Jack_> Almost finished.. :)
[2:19] <Psi-Jack_> Still 0.383 degraded,
[2:20] <dmick> gregaf: interesting that your compiler was looking in /usr/local/include by default
[2:20] <Psi-Jack_> heh.
[2:21] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) Quit (Ping timeout: 480 seconds)
[2:21] <gregaf> is it not supposed to? that looks like a standard path to me, but I might have done it
[2:21] <dmick> maybe it's standard for gcc; looking
[2:21] <Psi-Jack_> sjustlaptop: thought it was the HDD that went bad, but it's less than 6 months old, all diags show it's fine, and it seems to be maintaining itself now being back in the cluster.
[2:21] <Psi-Jack_> Ahh, here we go. bug 5958.
[2:22] <dmick> terrifyingly enough, yes
[2:22] <Psi-Jack_> I need to know how to upload to cephdrop@ceph.com
[2:22] <dmick> live and learn
[2:22] <dmick> http://gcc.gnu.org/onlinedocs/cpp/Search-Path.html
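
Note on the linker problem above: the cause turned out to be a stray /usr/local/include/rados directory that the compiler searched before the packaged headers. A small Python sketch of how one might look for such shadowing copies is below. It assumes GCC's `-v` output format (the "search starts here" and "End of search list." markers); the function names are hypothetical helpers, not part of any Ceph or GCC tooling.

```python
import os
import subprocess
from typing import List


def parse_search_dirs(verbose_output: str) -> List[str]:
    """Extract the <...> include directories from `g++ -E -v` style output."""
    dirs: List[str] = []
    in_list = False
    for line in verbose_output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
        elif line.startswith("End of search list."):
            break
        elif in_list:
            dirs.append(line.strip())
    return dirs


def find_header_copies(dirs: List[str], header: str) -> List[str]:
    """Return every search directory, in search order, that contains `header`."""
    return [d for d in dirs if os.path.exists(os.path.join(d, header))]


def compiler_search_dirs(compiler: str = "g++") -> List[str]:
    """Ask the compiler for its search path (assumes GCC-style -v output on stderr)."""
    result = subprocess.run(
        [compiler, "-x", "c++", "-E", "-v", os.devnull],
        capture_output=True, text=True, check=True,
    )
    return parse_search_dirs(result.stderr)
```

For example, `find_header_copies(compiler_search_dirs(), "rados/librados.hpp")` returning more than one directory would suggest a second copy of the headers; the first entry is the one the compiler picks up.
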
[2:23] <dmick> Psi-Jack_: pm on its way
[2:23] <Psi-Jack_> Thanks. :)
[2:23] * sagelap (~sage@2600:1012:b028:8858:4d3e:ed03:8189:552d) has joined #ceph
[2:24] <yehudasa_> sagelap: so basically you retained the old SIGTERM behavior?
[2:26] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[2:28] * bandrus (~Adium@cpe-76-95-220-174.socal.res.rr.com) has joined #ceph
[2:30] <Psi-Jack_> sjustlaptop: Cool. Just got everything uploaded and updated in the ticket for ya. :)
[2:31] <Psi-Jack_> YAY!
[2:31] <Psi-Jack_> health HEALTH_OK
[2:31] <Psi-Jack_> Finally. :D
[2:37] * bandrus (~Adium@cpe-76-95-220-174.socal.res.rr.com) Quit (Quit: Leaving.)
[2:37] * carif (~mcarifio@ Quit (Quit: Ex-Chat)
[2:41] * bandrus (~Adium@cpe-76-95-220-174.socal.res.rr.com) has joined #ceph
[2:42] <sagelap> yehudasa_: yeah. i forget all the bumps i hit on the way there, but yeah. it uses the async signal thread instead of doing it directly in the handler.
[2:43] * berant (~blemmenes@24-236-241-163.dhcp.trcy.mi.charter.com) has joined #ceph
[2:44] * Schelluri (~Sriram@108-225-16-176.lightspeed.sntcca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[2:45] * Schelluri (~Sriram@108-225-16-176.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[2:50] <sagelap> joao: did you run the crush_ops.sh against wip-4635?
[2:50] <yehudasa_> sagelap: not sure I'm following. The original SIGTERM signal (that you retained) was just causing _exit(0)
[2:53] <sagelap> on master it does handle_sigterm
[2:53] <yehudasa_> ah
[2:53] <sagelap> i didn't put those changes in dumpling bc it seemed like it should get more testing
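
Note on the SIGTERM discussion above: sagelap describes handling the signal in an async signal thread rather than directly in the handler. A rough Python sketch of that general pattern (a self-pipe plus a worker thread) follows. The class and method names are hypothetical, and this is only an illustration of the technique, not Ceph's implementation.

```python
import os
import signal
import threading


class AsyncSignalThread:
    """Defer signal work to a normal thread via a self-pipe (illustrative only)."""

    def __init__(self, on_signal):
        self._on_signal = on_signal  # callback that runs outside the handler
        self._rfd, self._wfd = os.pipe()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def handler(self, signum, frame=None):
        # Do the bare minimum here: record which signal arrived.
        os.write(self._wfd, bytes([signum]))

    def install(self, signum):
        # In Python, signal.signal() must be called from the main thread.
        signal.signal(signum, self.handler)

    def _loop(self):
        while True:
            data = os.read(self._rfd, 1)
            if not data or data[0] == 0:  # 0 is this sketch's private stop marker
                break
            self._on_signal(data[0])

    def stop(self):
        os.write(self._wfd, b"\0")
        self._thread.join()
        os.close(self._rfd)
        os.close(self._wfd)
```

The handler only writes one byte to the pipe; the real shutdown work happens in `_loop`, where ordinary locking and I/O are safe.
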
[2:54] <sagelap> joao: still there?
[2:55] * yanzheng (~zhyan@ has joined #ceph
[2:58] <joao> sagelap, sure thing
[2:58] * huangjun (~kvirc@ has joined #ceph
[2:58] <joao> okay, "sure thing" in reply to being still here
[2:59] <huangjun> hello, use ceph-fuse -m -r test /mnt
[2:59] <joao> I'll run crush_ops.sh against it
[2:59] <huangjun> but it outputs: mount failed with (116) Stale file handle
[3:00] <joao> sagelap, no need to run crush_ops.sh; vstart fails creating the osd
[3:00] * joao dives back

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.