#ceph IRC Log

IRC Log for 2012-04-06

Timestamps are in GMT/BST.

[0:00] <sjust> wip_omap_check has the fix, if someone could review it
[0:02] <sjust> kloo: looks like there will be a 44.2 soonish
[0:02] <kloo> excellent.
[0:02] <kloo> thanks!
[0:02] <sjust> no problem
[0:03] <sjust> on the plus side, I guess it's been passing...
[0:03] <kloo> it appears so. :)
[0:03] <kloo> i will consider my leveldb structures well-checked.
[0:03] <kloo> also one wonders about the powerful hardware others use, that this wasn't reported before.
[0:04] <sjust> kloo: are you running ext4?
[0:04] <kloo> no, btrfs.
[0:04] <sjust> oh
[0:04] <sjust> hmm
[0:04] <sjust> are you using the omap interfaces?
[0:04] <sjust> oh, I guess cephfs is using them
[0:04] <sjust> right
[0:04] <kloo> i'm using mostly cephfs, a bit of rbd.
[0:05] <sjust> I think cephfs is using it for directories
[0:06] <yehudasa> sjust: looks ok to me, but maybe we should spin some basic sanity check?
[0:06] <kloo> it's midnight here in CET, sleepy time for me.
[0:06] <kloo> thanks again.
[0:06] <sjust> yehudasa: I'll run one now
[0:06] <sjust> kloo: no problem, thanks for the report
[0:06] * kloo (~kloo@a82-92-246-211.adsl.xs4all.nl) Quit (Quit: bye.)
[0:10] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[0:18] * rturk_ (~rturk@aon.hq.newdream.net) has joined #ceph
[0:18] * rturk_ (~rturk@aon.hq.newdream.net) Quit ()
[0:19] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[0:21] * perplexed (~ncampbell@216.113.168.141) has joined #ceph
[0:56] * Oliver (~oliver1@ip-37-24-160-195.unitymediagroup.de) has left #ceph
[1:47] * LarsFronius (~LarsFroni@g231137242.adsl.alicedsl.de) has joined #ceph
[1:48] * BManojlovic (~steki@212.200.243.246) Quit (Remote host closed the connection)
[2:12] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[2:15] * LarsFronius (~LarsFroni@g231137242.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[2:43] * lofejndif (~lsqavnbok@28IAADR6Y.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[3:00] * perplexed (~ncampbell@216.113.168.141) has left #ceph
[3:38] * rturk (~rturk@aon.hq.newdream.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[3:54] * joao (~JL@89-181-153-140.net.novis.pt) has joined #ceph
[4:16] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[4:21] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[4:40] * joao (~JL@89-181-153-140.net.novis.pt) Quit (Remote host closed the connection)
[4:49] * chutzpah (~chutz@216.174.109.254) Quit (Quit: Leaving)
[5:27] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[6:04] * elder (~elder@12.207.22.82) Quit (Remote host closed the connection)
[6:15] <Qten> heyas i was checking out the roadmap last night, i'm trying to find out which version is going to be considered stable?
[6:17] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[7:19] * LarsFronius (~LarsFroni@g231137242.adsl.alicedsl.de) has joined #ceph
[8:05] * cattelan_away is now known as cattelan_away_away
[8:05] * cattelan_away_away is now known as cattelan_away
[8:05] * cattelan_away is now known as cattelan_away_away
[8:06] <iggy> Qten: pretty much the same as all open source projects... it's ready when it's ready
[8:06] * cattelan_away_away is now known as cattelan_away
[8:07] <iggy> things are coming along quite nicely... rados seems to be in pretty good shape... cephfs is surely soon to follow
[8:07] * cattelan_away is now known as cattelan_away_away
[8:50] <NaioN> Qten: they are removing the warnings in 0.45 ;)
[8:51] <NaioN> but it depends on what your usecase is
[8:51] * LarsFronius (~LarsFroni@g231137242.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[9:03] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[9:28] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:29] * LarsFronius (~LarsFroni@ip-2-206-0-114.web.vodafone.de) has joined #ceph
[9:30] * LarsFronius (~LarsFroni@ip-2-206-0-114.web.vodafone.de) Quit ()
[10:06] * Qten (Qten@ppp59-167-157-24.static.internode.on.net) Quit ()
[10:06] * Qten (Qten@ppp59-167-157-24.static.internode.on.net) has joined #ceph
[10:13] * loicd (~loic@83.167.43.235) has joined #ceph
[12:29] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[13:28] * joao (~JL@89-181-153-140.net.novis.pt) has joined #ceph
[14:07] * notjacques (~claude_eg@122x212x156x18.ap122.ftth.ucom.ne.jp) has left #ceph
[14:35] * lofejndif (~lsqavnbok@09GAAEN3B.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:55] * morpheusx (~morpheus@foo.morphhome.net) has joined #ceph
[15:09] <wonko_be> is there a command to get the fsid of a monitor?
[15:11] * Oliver1 (~oliver1@p5483BC4F.dip.t-dialin.net) has joined #ceph
[15:23] <wonko_be> got it, just found monmaptool
[15:32] <nhm> good morning all
[15:35] <Oliver1> Good afternoon ;)
[15:35] * lofejndif (~lsqavnbok@09GAAEN3B.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[15:39] <nhm> Oliver1: How goes your ceph work?
[15:39] <Oliver1> Well, waiting on a commitment from Sage, that my git-stuff is the right version: v0.44.1-212-g57dff03
[15:40] <Oliver1> Should include the new "reordering"...
[15:43] <Oliver1> 2b more exact: "wip-osd-reorder branch", here is holiday in germany, but well... weather is bad ;)
[16:17] <nhm> Oliver1: sorry, got pulled away for a bit. Yeah, it's a Holiday weekend here too.
[16:22] <Oliver1> No prob.
[16:30] <nhm> good luck with the new branch
[16:35] <Oliver1> If this solves our rbd-corruption problem, it would be a big step in the right direction, keeping fingers crossed ;)
[16:43] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[16:53] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[16:59] * imjustmatthew (~imjustmat@pool-71-176-237-208.rcmdva.fios.verizon.net) has joined #ceph
[17:02] * kloo (~kloo@a82-92-246-211.adsl.xs4all.nl) has joined #ceph
[17:02] <kloo> hi!
[17:03] <kloo> my ceph-osd processes are spending an awful lot of time doing this: http://pastebin.com/wYnPKWeJ
[17:04] <kloo> in src/include/buffer.h (version 0.44.1) i find the following comment about a bunch of operations including that copy_in():
[17:04] <kloo> **** WARNING: this are horribly inefficient for large bufferlists. ****
[17:04] <kloo> // **** WARNING: this are horribly inefficient for large bufferlists. ****
[17:06] <kloo> my non-expert reading of the code suggests that seeks in the buffer lists are done by jumping to the start of the list and then advancing the iterator until it lands at the right place.
[17:07] <kloo> this may underlie http://tracker.newdream.net/issues/2161 ?
[17:07] <kloo> is this stuff on your radar?
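A minimal sketch of the access pattern kloo describes above, assuming a buffer made of several separate chunks. `SegmentedBuffer` and all of its methods are hypothetical names used only for illustration; this is not Ceph's bufferlist code and not the fix that was later pushed. It contrasts locating an offset by walking from the first chunk with locating it through a precomputed table of chunk start offsets.

```python
from bisect import bisect_right
from itertools import accumulate


class SegmentedBuffer:
    """Toy buffer stored as a list of chunks (hypothetical, not Ceph's bufferlist)."""

    def __init__(self, chunks):
        self.chunks = [bytearray(c) for c in chunks]
        lengths = [len(c) for c in self.chunks]
        self.length = sum(lengths)
        # starts[i] is the absolute offset of the first byte of chunk i.
        self.starts = [0] + list(accumulate(lengths))[:-1]

    def locate_linear(self, off):
        """Find (chunk index, offset in chunk) by walking from the first chunk.

        Also returns the number of chunks visited, which grows with the
        number of chunks that precede the requested offset.
        """
        if not 0 <= off < self.length:
            raise IndexError(off)
        steps = 0
        for i, chunk in enumerate(self.chunks):
            steps += 1
            if off < len(chunk):
                return i, off, steps
            off -= len(chunk)

    def locate_indexed(self, off):
        """Find (chunk index, offset in chunk) by binary search over starts."""
        if not 0 <= off < self.length:
            raise IndexError(off)
        i = bisect_right(self.starts, off) - 1
        return i, off - self.starts[i]

    def copy_in(self, off, data):
        """Overwrite len(data) bytes starting at off, spanning chunks as needed."""
        if off < 0 or off + len(data) > self.length:
            raise IndexError(off)
        if not data:
            return
        i, pos = self.locate_indexed(off)
        done = 0
        while done < len(data):
            chunk = self.chunks[i]
            n = min(len(chunk) - pos, len(data) - done)
            chunk[pos:pos + n] = data[done:done + n]
            done += n
            i += 1
            pos = 0

    def to_bytes(self):
        return b"".join(bytes(c) for c in self.chunks)


buf = SegmentedBuffer([b"ab", b"cde", b"fgh"])
print(buf.locate_linear(6))   # chunk index, offset in chunk, chunks visited
print(buf.locate_indexed(6))  # chunk index, offset in chunk
buf.copy_in(1, b"XYZW")
print(buf.to_bytes())
```

With many small chunks, repeated calls that each restart from the first chunk do work proportional to the chunk count per call, whereas the start-offset table keeps each lookup logarithmic. The table has to be rebuilt whenever chunks are added or removed, which is the trade-off of this particular sketch.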
[17:11] * LarsFronius (~LarsFroni@f054114231.adsl.alicedsl.de) has joined #ceph
[17:11] <kloo> in my setup i think it is causing osds to be marked down during recovery, that's how much cpu it takes / how slow it is.
[17:12] <imjustmatthew> is ceph-fuse considered more or less stable than the kernel client right now?
[17:12] <kloo> in fact osds were being marked out until i upped the down to out time.
[17:16] <nhm> kloo: not sure if we are looking into that or not. Sage and Co. should be around in an hour or two.
[17:16] <kloo> thanks nhm.
[17:17] <nhm> kloo: Maybe comment on the bug with your findings
[17:17] <nhm> kloo: doesn't hurt to have it documented
[17:26] <kloo> done.
[17:33] * lofejndif (~lsqavnbok@9YYAAE48O.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:43] * Oliver1 (~oliver1@p5483BC4F.dip.t-dialin.net) Quit (Quit: Leaving.)
[18:02] <sagewk> joao: around?
[18:02] <joao> I'm here
[18:20] * adjohn (~adjohn@s24.GtokyoFL16.vectant.ne.jp) has joined #ceph
[18:20] * adjohn (~adjohn@s24.GtokyoFL16.vectant.ne.jp) has left #ceph
[18:24] * rturk (~rturk@aon.hq.newdream.net) has joined #ceph
[18:24] * loicd (~loic@83.167.43.235) Quit (Quit: Leaving.)
[18:29] <kloo> thanks sage, looking forward to a more efficient copy_in().
[18:34] <sagewk> kloo: which version are you using currently? i can push a branch for you to test
[18:37] <kloo> 0.44.1 minus DBObjectMap::check() calls (discussed with Samuel yesterday).
[18:37] <kloo> and yes please. :)
[18:55] * sC (54b5afbb@ircip1.mibbit.com) has joined #ceph
[18:56] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[18:57] <sC> I have a small question regarding the stability of ceph - is it stable enough to be run in a working environment?
[18:59] <vhasi> if you by "working" mean "production" you should probably do extensive testing first
[19:01] <vhasi> last time i checked it was not considered "stable" but if i'm not misinformed there are a few production environments running Ceph, although i can't say to what degree
[19:02] <sC> The only thing I found was "The object store (RADOS), radosgw, and RBD are considered reasonably stable. However, we do not yet recommend storing valuable data with it yet without proper precautions."
[19:03] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:03] <sC> But I am not sure about how this applies to the CephFS in general :/
[19:03] <kloo> in my recent experience it mostly works but i do run into issues, that i come here to discuss.
[19:05] <sC> The question is not "does it work" but "is it safe" :)
[19:06] <sC> thanks anyway :)
[19:06] * LarsFronius (~LarsFroni@f054114231.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[19:07] <sagewk> kloo: i poushed a wip-encoding branch for you
[19:07] <sagewk> pushed
[19:07] <vhasi> i would advise against using it in a production environment without reliable redundancy unless you're a developer who can find the issues and salvage corrupt filesystem data
[19:09] <sC> vhasi: okay thank you that answers my question pretty much :/
[19:10] <kloo> thanks sage, i'm going to give it a try.
[19:10] <vhasi> sC: mind you i'm not a developer and do not speak for the development team in any way
[19:10] <vhasi> sC: it's just my personal recommendation
[19:10] * chutzpah (~chutz@216.174.109.254) has joined #ceph
[19:10] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[19:11] <sC> vhasi: yep but that is what i feared by reading the manuals
[19:16] <joao> nhm, is it crashing again?
[19:29] <nhm> joao: yep
[19:30] <nhm> joao: how did you fix your empty room problem?
[19:30] <joao> I didn't
[19:31] <joao> have no idea what happened; one day it didn't work, the next day it did
[19:31] * cattelan_away_away is now known as cattelan_away
[19:31] <nhm> heh, yay for mysterious and inconsistent bugs.
[19:32] * lofejndif (~lsqavnbok@9YYAAE48O.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[19:59] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[20:06] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[20:22] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:28] * LarsFronius (~LarsFroni@d210095.adsl.hansenet.de) has joined #ceph
[20:32] * LarsFronius_ (~LarsFroni@g231168143.adsl.alicedsl.de) has joined #ceph
[20:32] * LarsFronius (~LarsFroni@d210095.adsl.hansenet.de) Quit (Read error: Connection reset by peer)
[20:32] * LarsFronius_ is now known as LarsFronius
[21:07] <kloo> sage, my first impression is that it seems a bit better but the bl copy_in() still runs hot during recovery.
[21:09] <kloo> it's no longer 100% user mode cpu.
[21:25] * lofejndif (~lsqavnbok@19NAAHXOV.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:25] <sagewk> kloo, ok thanks. will pull it into master, and we'll look at improving it further.
[21:27] <kloo> is there anything i could do to help?
[21:34] <sagewk> kloo: do you have a profile of where the time is being spent?
[21:36] <kloo> no but i can do that; you mean from gprof?
[21:36] <sagewk> kloo: or whatever you're comfortable with!
[21:37] <kloo> okidoki.
[21:38] <sagewk> kloo: thanks!
[21:40] * f4m8_ (f4m8@kudu.in-berlin.de) Quit (Remote host closed the connection)
[21:45] * rturk (~rturk@aon.hq.newdream.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[21:49] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:01] * lofejndif (~lsqavnbok@19NAAHXOV.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[22:04] * cattelan_away is now known as cattelan_away_away
[22:04] * cattelan_away_away is now known as cattelan_away
[22:20] * nrheckman (~chatzilla@75-149-56-241-SFBA.hfc.comcastbusiness.net) has joined #ceph
[22:22] <nrheckman> Hey everybody, I'm having what will probably turn out to be a configuration issue. Just learning how to get ceph set up... Whenever I attempt to start it, I'm seeing the following in my osd.0.log "2012-04-06 13:14:59.996748 7f06a96a9780 ** ERROR: osd init failed: (1) Operation not permitted". Can anybody give me a clue what to look at? Thanks!
[22:31] <nrheckman> Ahh... I think I narrowed it down. If i remove "auth supported = cephx" from my global config, everything starts up fine.
[22:46] <sagewk> tv|work: ceph will no longer do its own pidfile handling, right?
[23:08] * Oliver1 (~oliver1@84.131.188.79) has joined #ceph
[23:10] * kloo (~kloo@a82-92-246-211.adsl.xs4all.nl) Quit (Quit: time for bed.)
[23:33] * Oliver1 (~oliver1@84.131.188.79) Quit (Quit: Leaving.)
[23:52] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[23:57] <nrheckman> Hrmm... having some trouble with radosgw and mod_fastcgi. Every request results in a timeout: "[Fri Apr 06 14:41:23 2012] [error] [client 192.168.122.1] FastCGI: comm with (dynamic) server "/var/www/s3gw.fcgi" aborted: (first read) idle timeout (30 sec)".

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.