#ceph IRC Log

IRC Log for 2011-12-30

Timestamps are in GMT/BST.

[0:47] * mgalkiewicz (~maciej.ga@85.89.186.247) has joined #ceph
[0:50] <mgalkiewicz> Hello guys
[0:50] <mgalkiewicz> my osd crashes after start
[0:50] <mgalkiewicz> http://pastie.org/3093013
[0:50] <mgalkiewicz> I have two osds in the cluster
[0:54] * jojy (~jvarghese@108.60.121.114) Quit (Quit: jojy)
[1:08] <mgalkiewicz> restarting the other osd fixed the problem
[1:08] * mgalkiewicz (~maciej.ga@85.89.186.247) Quit (Quit: Ex-Chat)
[1:16] * aa (~aa@r190-135-22-128.dialup.adsl.anteldata.net.uy) has joined #ceph
[1:17] * BManojlovic (~steki@212.200.241.197) has joined #ceph
[1:23] * steki-BLAH (~steki@212.200.241.171) Quit (Ping timeout: 480 seconds)
[1:41] * BManojlovic (~steki@212.200.241.197) Quit (Remote host closed the connection)
[1:47] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[2:16] * pc_ (~pc@88-117-123-181.adsl.highway.telekom.at) has joined #ceph
[2:23] * pc (~pc@91-115-229-221.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[2:35] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[2:35] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) Quit ()
[3:19] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:56] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[4:05] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (Quit: Leaving)
[4:20] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[4:21] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[4:37] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (Quit: Leaving)
[4:40] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[4:41] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (Remote host closed the connection)
[4:41] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[4:44] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has left #ceph
[4:59] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[7:38] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[8:52] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:01] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[9:05] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:34] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[9:39] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[9:49] * fronlius (~fronlius@f054101171.adsl.alicedsl.de) has joined #ceph
[10:00] * fghaas (~florian@85-127-155-32.dynamic.xdsl-line.inode.at) has joined #ceph
[10:00] * fronlius (~fronlius@f054101171.adsl.alicedsl.de) Quit (Quit: fronlius)
[10:02] * fronlius (~fronlius@f054101171.adsl.alicedsl.de) has joined #ceph
[10:03] * fronlius (~fronlius@f054101171.adsl.alicedsl.de) Quit ()
[10:17] * pc (~pc@88-117-113-196.adsl.highway.telekom.at) has joined #ceph
[10:23] * pc_ (~pc@88-117-123-181.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[10:29] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[10:30] * pc (~pc@88-117-113-196.adsl.highway.telekom.at) Quit (Quit: Verlassend)
[11:06] * gregorg (~Greg@78.155.152.6) Quit (Read error: Connection reset by peer)
[11:08] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[11:15] * gregorg (~Greg@78.155.152.6) has joined #ceph
[11:36] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Quit: fronlius)
[11:53] * gregorg_taf (~Greg@78.155.152.6) has joined #ceph
[11:53] * gregorg (~Greg@78.155.152.6) Quit (Read error: Connection reset by peer)
[12:16] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[12:53] * andresambrois (~aa@r186-52-185-7.dialup.adsl.anteldata.net.uy) has joined #ceph
[12:57] * aa (~aa@r190-135-22-128.dialup.adsl.anteldata.net.uy) Quit (Ping timeout: 480 seconds)
[12:57] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[12:57] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[13:35] <wonko_be> are there any chef cookbooks public yet?
[14:21] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[14:29] * root (~root@79.126.243.235) has joined #ceph
[14:45] * andresambrois (~aa@r186-52-185-7.dialup.adsl.anteldata.net.uy) Quit (Remote host closed the connection)
[15:09] * root (~root@79.126.243.235) Quit (Remote host closed the connection)
[15:54] * fghaas (~florian@85-127-155-32.dynamic.xdsl-line.inode.at) has left #ceph
[15:58] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[17:12] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Quit: fronlius)
[17:16] <sagewk> wonko_be: https://github.com/NewDreamNetwork/ceph-cookbooks
[17:16] <sagewk> wonko_be: the multiple monitor stuff isn't in there yet, tho. only does single monitor for now.
[17:18] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[17:19] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[17:34] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[17:38] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:47] <wonko_be> sagewk: am I right that these recipes are rather a way of replacing mkcephfs script?
[17:47] <sagewk> yeah
[17:49] <wonko_be> okay
[17:49] <wonko_be> then I'll clean up mine, and publish them
[17:49] <wonko_be> but that will be next year :-)
[17:58] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[17:59] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[18:05] <jojy> Unhandled exception in service handler: boost::exception_detail::clone_impl<Forte::DbLookupFailedException>(invalid target_guid)
[18:23] <Sargun> Oh, god, C++
[18:27] <nhm> sargun: Would you prefer fortran? ;)
[18:27] <wonko_be> lol
[18:28] <Sargun> C.
[18:28] <Sargun> kthxbai.
[18:41] <yehudasa_> jojy: where do you see that exception?
[18:42] <jojy> sorry yehudasa wrong window :|
[18:42] <yehudasa_> no worries
[18:53] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:11] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[19:32] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[19:33] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[19:47] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[19:50] * fghaas (~florian@85-127-155-32.dynamic.xdsl-line.inode.at) has joined #ceph
[19:53] <fghaas> sagewk, thanks for merging those ocf patches -- that was quick and painless :)
[19:53] <sagewk> fghaas: no problem :)
[19:54] <fghaas> I wasn't able to find any preferred way of submitting patches -- are the list, and a github pull request, both equally acceptable for you guys?
[19:54] <sagewk> both are fine. the list is a bit more transparent, i guess
[19:55] <fghaas> true; by contrast github has line notes which greatly facilitate the review process
[19:55] <fghaas> so I'm fine either way, just curious what you guys preferred
[19:57] <sagewk> yeah.. a pull request accompanied by an email on the list might capture the best of both? it doesn't matter too much, tho :)
[19:58] <sagewk> i'm just happy to have contributions in the first place :)
[19:58] <fghaas> I can work with that
[19:59] <fghaas> working on that humble patch set I noticed a few oddities in the spec that I'll tackle next, I guess
[20:02] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[20:05] * jojy_ (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[20:05] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:05] * jojy_ is now known as jojy
[20:56] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[21:23] * jojy (~jvarghese@75-54-228-176.lightspeed.sntcca.sbcglobal.net) Quit (Quit: jojy)
[21:39] * fghaas (~florian@85-127-155-32.dynamic.xdsl-line.inode.at) has left #ceph
[22:22] * verwilst (~verwilst@d51A5B710.access.telenet.be) has joined #ceph
[22:24] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:44] * verwilst (~verwilst@d51A5B710.access.telenet.be) Quit (Quit: Ex-Chat)
[22:45] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Remote host closed the connection)
[22:56] * elder (~elder@c-71-193-71-178.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[23:04] <nhm> sagewk: ping
[23:05] <sagewk> nhm: hi
[23:06] <nhm> sagewk: Does MDS::ms_dispatch get called anywhere? I've been digging into 1682.
[23:07] <sagewk> it's called by the messenger when a message is being delivered
[23:08] <nhm> ah, I see that now. I must have missed it.
[23:08] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[23:09] <sagewk> fwiw, it may be a dup of 1549
[23:10] <nhm> sagewk: I've just been running through the code and familiarizing myself with it.
[23:10] <nhm> sagewk: and ruling out various things in the process.
[23:10] <sagewk> i suspect the way to catch that one is going to be to hammer the mds while running it through valgrind...
[23:10] <sagewk> i think there's some use-after-free or some such going on that's making us clobber things
[23:11] <wido> hi
[23:11] <wido> has there been any development on the libvirt storage pool lately?
[23:12] <nhm> sagewk: yeah, I wasn't sure if there were specific workloads that triggered it. Seems like there might be multiple things going on.
[23:12] <wido> I've been implementing this for the last few days and never thought about asking if somebody else had already done it
[23:12] <joshd> wido: not that I know of
[23:12] <wido> joshd: ok, thanks
[23:13] <sagewk> wido: nice. how is it coming?
[23:13] <wido> sagewk: Well, it's a bit of a struggle, since the current storage pools all rely on either a regular file or device being present
[23:13] <nhm> sagewk: all of the various functions that call finish() delete the contexts immediately after as far as I can tell.
[23:14] <wido> since RBD is 'virtual' it has a lot more work involved
[23:14] <wido> also, the cephx auth makes it a bit harder, since you have to deal with that in the pool as well
[23:14] <nhm> sagewk: I noticed that trim() is getting called before find_idle_sessions in MDS::tick(), but I haven't looked into it any further.
[23:15] <sagewk> nhm: shouldn't matter as long as the references are being held correctly. probably there's a pointer somewhere without a reference..
[23:15] <wido> I actually needed the libvirt integration for CloudStack, we chose to start using CloudStack instead of OpenStack, but it's missing RBD support
[23:15] <wido> So I wanted to implement RBD, but they fully rely on libvirt's storage pool management
[23:16] <wido> Step 1: RBD storage pool support in libvirt, Step 2: RBD support in CloudStack
[23:16] <sagewk> wido: ah. well, getting the storage pools to work will be a win for casual virt-manager users, too
[23:17] <sagewk> yeah
[23:17] <wido> sagewk: True, so it's always a good thing :)
[23:17] <wido> but it's just a lot of work in libvirt and I don't think the first patch will make it ;)
[23:17] <wido> I found your conversation on the libvirt mailing list from about a year ago
[23:17] <wido> they suggested to do the kernel RBD first, letting libvirt set up rbd devices
[23:18] <wido> but I prefer the librbd implementation in Qemu over the krbd
[23:18] <wido> although krbd would be easier, since you then simply feed block devices to libvirt, which it already knows well
[23:20] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[23:21] <wido> I'm going afk, but at least I know nobody is doing duplicate work here
[23:21] <wido> Give me a few weeks, lot of stuff going on here
[23:21] <wido> ttyl
[23:21] <sagewk> wido: for the storage pool part, i think it needs to use librbd
[23:22] <sagewk> wido: since all it is doing is listing, creating, removing images.
[23:22] <wido> sagewk: Yes, I figured the same
[23:22] <sagewk> wido: whether the vm then uses qemu+librbd or krbd shouldn't matter
[23:22] <wido> well, the LVM implementation is actually kind of nasty, they invoke lvs, vgcreate and all that stuff
[23:22] <wido> same goes for the iSCSI implementation
[23:22] <sagewk> getting the kernel rbd support working will be another nice piece that will make non-kvm people happy
[23:23] <sagewk> wido: yeah.. using librbd directly should be much nicer, no external processes or parsing.
[23:23] <wido> For now I'll stick to the librbd only, but I get the point that krbd would be nice for non-kvm
[23:23] <sagewk> someday :)
[23:23] <wido> I might find a way to work it out somehow, it might do both
[23:24] <wido> but I'll focus on KVM first
[23:24] <sagewk> in any case, it's not directly related to the storage pool piece.
[23:24] <wido> But I'm really going afk now. ttyl!
[23:24] <sagewk> :) ttyl
[23:31] * BManojlovic (~steki@79.101.99.102) has joined #ceph
[23:39] <nhm> sagewk: did you ever try running with logging at 20 for 1682?
[23:39] <sagewk> nhm: yeah, no luck hitting it that way..
[23:41] <nhm> hrm. Well, this is my first time looking at the ceph code so I'm still absorbing a lot, but I think I concur with your thoughts on throwing valgrind at it.
[23:44] <nhm> oh well, time to head home. Have a happy new years!

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.