#ceph IRC Log


IRC Log for 2012-08-16

Timestamps are in GMT/BST.

[0:37] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:48] * tnt (~tnt@113.39-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[0:52] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:03] <sagewk> dmick, joshd: i see lots of valgrind warnings in the current master.. is this stuff fixed in the rbd branches?
[1:05] <dmick> uhhh
[1:05] <dmick> I'll get back to you on that? :)
[1:05] <joshd> sagewk: probably not
[1:05] <dmick> rbd cli seems fairly clean with the new code; not sure about everything else
[1:05] <sagewk> http://fpaste.org/grJQ/
[1:05] <dmick> but it's not like I did a lot of coverage testing
[1:06] <dmick> ramping up on it
[1:06] <joshd> hmm, I don't remember seeing warnings in those places
[1:06] <joshd> pretty sure I fixed some of those at least
[1:06] <sagewk> test_librbd_fsx doesn't try to close the image and shutdown rados... need to add that to find the leaks in the noise
[1:06] <joshd> sagewk: did that in wip-rbd-protect as well
[1:07] <joshd> sagewk: should probably test on that branch anyway since the i/o paths are a bit different
[1:08] <sagewk> joshd: ah, ok.
[1:08] <sagewk> is that stuff moving into master soon?
[1:08] <joshd> yeah, in the next day or so probably
[1:10] * tnt (~tnt@113.39-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:14] <dmick> I can push the full-on merge to the public ws if you want
[1:14] <dmick> (it's in my private repo ATM just because I continue to feel guilt about the gitbuilders)
[1:15] <gregaf> they're a lot healthier now than they used to be
[1:15] <dmick> yeah
[1:16] <Tobarja> what was that business update joao was talking about?
[1:16] <dmick> Tobarja: inktank meeting. IRC channel mission creep. :)
[1:17] <dmick> oh I guess it's already there. wip-rbd-dmick
[2:02] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[2:16] * lofejndif (~lsqavnbok@19NAABS8O.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[2:16] <sjust> sage: that branch looks good
[2:53] * James259 (~James259@ has joined #ceph
[2:54] * chutzpah (~chutz@ Quit (Quit: Leaving)
[2:55] <James259> Hi Folks. Wonder if anyone can tell me what I am doing wrong. osd.3 had to be turned off for a few days. There are 3 osd's in total that are 'in' and it appeared to properly move all data onto the remaining 2 osd's and showed HEALTH_OK. Now, I reconnect osd.3 and the osd process bails on startup. Here is the error: http://pastebin.com/RSsTpG8r
[2:56] <James259> If anyone can help me get this OSD back online, it would be very much appreciated.
[2:56] <sjust> ah, I think this is fixed in 0.48.1
[2:56] <James259> how do i check my version?
[2:56] <sjust> you appear to be running 0.48
[2:56] <sjust> one sec
[2:56] <sjust> ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
[2:57] <sjust> I'll check the commit
[2:57] <sjust> yeah, that's the 0.48 release
[2:57] <James259> aha.. ceph -v returns... ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
[2:57] <James259> :) thanks so much.
[2:57] <sjust> no problem at all :)
[2:58] <sjust> I'll be around for at least another hour
[2:58] <James259> okay. So is there a fix? (or are you still checking the commit)
[2:58] <James259> oh, sorry
[2:58] <James259> Just read it again - I need 0.48.1
[2:58] <sjust> hang on
[2:58] <James259> :)
[2:58] <sjust> ec5cd6def9817039704b6cc010f2797a700d8500
[2:59] <sjust> I think that's what you'll want
[2:59] <James259> kk... thank you. Can I just apt-get upgrade to get that?
[2:59] <sjust> or rather, that's the specific commit, you probably want to just upgrade to v0.48.1, one sec
[3:01] <sjust> http://ceph.com/docs/master/install/debian/
[3:01] <James259> aptitude says there is an update available. Should I just let it go get it?
[3:02] <James259> Version: 0.48argonaut-1precise
[3:02] <sjust> that should work
[3:02] <James259> do I need to update all servers at the same time or can I do them one at a time while everything is online?
[3:02] <sjust> upgrading one at a time should be ok
[3:02] <James259> brill. you are a star. thank you so much.
[3:07] <dmick> strcmp(sjust, "*") == 0
[3:15] <James259> that osd seems to be working now. thanks sjust.
[3:16] <sjust> cool, good to hear
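
The version check in the exchange above can be sketched in a few lines of Python. This is only an illustration: the function name `parse_ceph_version` and the constant `FIXED_IN` are made up for this example, and the sample line is the `ceph -v` output quoted in the log.

```python
import re

def parse_ceph_version(line):
    """Extract (major, minor, patch) from a `ceph -v` style line.

    A missing patch component (as in "0.48argonaut") is treated as 0.
    """
    m = re.search(r"ceph version (\d+)\.(\d+)(?:\.(\d+))?", line)
    if m is None:
        raise ValueError("not a ceph version line: %r" % line)
    major, minor, patch = m.groups()
    return (int(major), int(minor), int(patch or 0))

# Release that sjust says contains the fix (hypothetical constant name).
FIXED_IN = (0, 48, 1)

line = "ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)"
version = parse_ceph_version(line)
needs_upgrade = version < FIXED_IN  # tuples compare element by element
print(version, "needs upgrade" if needs_upgrade else "ok")
```

Comparing tuples avoids string comparison pitfalls such as "0.5" sorting after "0.48".
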
[3:18] * dmick (~dmick@ Quit (Quit: Leaving.)
[3:27] * tightwork (~tightwork@ has joined #ceph
[3:32] * James259 (~James259@ Quit ()
[3:42] * Cube (~Adium@ Quit (Quit: Leaving.)
[3:54] <tightwork> What level of development is ceph? is it ready for live production environment?
[3:54] <sjust> depends on the situation
[3:54] <sjust> what part would you want to use?
[3:55] <sjust> or, what is your use case?
[3:57] * deepsa (~deepsa@ has joined #ceph
[4:01] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Quit: Leaving.)
[4:07] <tightwork> rados to store vm images
[4:07] <sjust> using rbd, or just as raw rados objects/
[4:07] <sjust> ?
[4:07] <tightwork> yea sorry, rbd
[4:07] <sjust> either way, those parts are fairly stable at this point
[4:08] <tightwork> great!
[4:08] <sjust> we are in the process of getting rbd layering working
[4:08] <sjust> that should be in one of the next few releases
[4:10] <tightwork> whats that do? :) I just have the concept of ceph and why I want it in the context with the negatives in scaling with traditional san/das/raid arrays
[4:11] <sjust> it allows you to clone an existing image (like an ubuntu image) and mount a vm on the clone
[4:12] <sjust> and the clone will be lazily created as writes happen
[4:12] <sjust> so, thin provisioned clones
[4:14] <tightwork> ah I see
[4:14] <tightwork> that will be great
[4:15] <sjust> yup!
[4:15] <tightwork> is ceph similar to swift? Is a replica defined?
[4:15] <sjust> we are working on rbd and rados performance...well, as we speak
[4:16] <sjust> we do provide a swift interface to the ceph object store via rados gateway
[4:17] <sjust> we do replicate data, is that what you mean?
[4:17] <tightwork> I suppose I am thinking of glance backed swift
[4:18] <sjust> we have hooks in glance to support rbd
[4:19] <tightwork> i see
[4:19] <sjust> it's also possible to boot directly from rbd
[4:19] <tightwork> would you need say, gpxe?
[4:20] <tightwork> or an iscsi layer?
[4:20] <maelfius> tightwork: I'm working on building out glance using radosgw (s3 compat). though i am personally looking forward to cleaner support for booting RBD for instances
[4:20] <sjust> we have direct support via qemu as well as linux kernel support
[4:21] <maelfius> tightwork: (for my company that is), it seems to be quite functional.
[4:21] <tightwork> so where are you guys seeing bottlenecks? network?
[4:25] <sjust> ideally but, we are still working on small random io performance
[4:30] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[4:30] * Ryan_Lane (~Adium@ has joined #ceph
[4:30] * Ryan_Lane (~Adium@ Quit ()
[4:31] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:40] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[6:49] <tightwork> How can I push mkcephfs to my nodes? Im using ubuntu I know its picky about not using a root account.
[6:55] * nhm (~nhm@ Quit (Ping timeout: 480 seconds)
[7:00] * Cube (~Adium@ has joined #ceph
[7:10] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[7:10] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[7:14] <tightwork> Oh ok, it wants to ask for the password million times
[7:14] * sjust (~sam@ Quit (Read error: Operation timed out)
[7:14] * mkampe (~markk@2607:f298:a:607:222:19ff:fe31:b5d3) Quit (Write error: connection closed)
[7:17] * gregaf (~Adium@ Quit (Ping timeout: 480 seconds)
[7:17] * gregaf1 (~Adium@2607:f298:a:607:f9f3:ac0f:d584:3283) has joined #ceph
[7:18] <tightwork> I am reading quick start and the docs. Where is the btrfs partition specified? /var/lib/ceph ?
[7:19] * yehudasa (~yehudasa@ Quit (Ping timeout: 480 seconds)
[7:19] * sagewk (~sage@2607:f298:a:607:219:b9ff:fe40:55fe) Quit (Ping timeout: 480 seconds)
[7:20] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[7:20] <maelfius> tightwork: i used shared keys to get around the password issues. for the btrfs partition stuff, I am using XFS because as far as i know btrfs isn't ready for prime-time
[7:21] <maelfius> but i also define specific locations for my environment to find the mounted partitions via "osd data = <path>" in the ceph.conf
[7:21] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[7:22] <maelfius> tightwork: I assume that is what you were meaning.
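
As a rough illustration of the "osd data = <path>" setting maelfius mentions, the sketch below reads per-OSD data paths from an INI-style fragment with Python's `configparser`. The section names, host names, and paths are invented for the example, and a real ceph.conf may use syntax that `configparser` does not accept, so treat this as a sketch rather than a ceph.conf parser.

```python
import configparser

# Hypothetical ceph.conf fragment; hosts and paths are placeholders.
SAMPLE_CONF = """
[osd.0]
host = node-a
osd data = /srv/ceph/osd.0

[osd.1]
host = node-b
osd data = /srv/ceph/osd.1
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONF)

# Map each [osd.N] section to the directory where that OSD keeps its data.
osd_paths = {
    section: parser[section]["osd data"]
    for section in parser.sections()
    if section.startswith("osd.")
}

for name, path in sorted(osd_paths.items()):
    print(name, "->", path)
```
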
[7:27] * bchrisman1 (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[7:27] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[7:29] * sjust (~sam@ has joined #ceph
[7:33] * mkampe (~markk@ has joined #ceph
[7:34] * yehudasa (~yehudasa@ has joined #ceph
[7:35] * loicd (~loic@brln-4db81486.pool.mediaWays.net) has joined #ceph
[7:38] * sagewk (~sage@ has joined #ceph
[7:46] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[7:46] * bchrisman1 (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[7:48] * tightwork (~tightwork@ Quit (Ping timeout: 480 seconds)
[8:14] * bchrisman1 (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[8:14] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[8:21] * tnt (~tnt@113.39-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:26] * Cube (~Adium@ Quit (Quit: Leaving.)
[8:27] * EmilienM (~EmilienM@ has joined #ceph
[8:54] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[8:57] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[9:04] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[9:24] * johnl (~johnl@2a02:1348:14c:1720:1531:7033:ff8a:c766) Quit (Remote host closed the connection)
[9:24] * johnl (~johnl@2a02:1348:14c:1720:9859:49dd:fb7d:4146) has joined #ceph
[9:26] * tnt (~tnt@113.39-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:27] * loicd (~loic@brln-4db81486.pool.mediaWays.net) Quit (Quit: Leaving.)
[9:32] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[9:38] * Leseb (~Leseb@ has joined #ceph
[9:40] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:42] <Leseb> hi all
[9:44] * loicd (~loic@p5B2C523C.dip.t-dialin.net) has joined #ceph
[9:49] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:19] * The_Bishop (~bishop@2a01:198:2ee:0:146b:1f19:e860:cfc7) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[10:30] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[10:30] * maelfius (~Adium@pool-71-160-33-115.lsanca.fios.verizon.net) Quit (Quit: Leaving.)
[10:38] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[10:39] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[10:49] * maelfius (~Adium@pool-71-160-33-115.lsanca.fios.verizon.net) has joined #ceph
[10:49] * BManojlovic (~steki@ has joined #ceph
[10:56] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:59] * maelfius (~Adium@pool-71-160-33-115.lsanca.fios.verizon.net) has left #ceph
[11:14] * senner (~Home_User@68-113-228-89.dhcp.stpt.wi.charter.com) Quit (Quit: Leaving.)
[11:37] * bchrisman1 (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[11:37] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[13:34] * tightwork (~tightwork@ has joined #ceph
[13:38] * lofejndif (~lsqavnbok@04ZAAE3MH.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:02] * loicd (~loic@p5B2C523C.dip.t-dialin.net) Quit (Quit: Leaving.)
[14:12] * lofejndif (~lsqavnbok@04ZAAE3MH.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[14:15] * tightwork (~tightwork@ Quit (Ping timeout: 480 seconds)
[14:22] * lofejndif (~lsqavnbok@09GAAHJ4R.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:23] * loicd (~loic@brln-4db81486.pool.mediaWays.net) has joined #ceph
[14:31] * lofejndif (~lsqavnbok@09GAAHJ4R.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[15:30] * tightwork (~tightwork@rrcs-71-43-128-65.se.biz.rr.com) has joined #ceph
[15:52] * nhm (~nhm@253-231-179-208.static.tierzero.net) has joined #ceph
[16:01] * senner (~Wildcard@68-113-228-89.dhcp.stpt.wi.charter.com) has joined #ceph
[16:04] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[16:12] * deepsa (~deepsa@ has joined #ceph
[16:20] * rosco (~r.nap@ Quit (Quit: *Poof*)
[16:20] * rosco (~r.nap@ has joined #ceph
[16:22] * exec (~v@ Quit (Ping timeout: 480 seconds)
[16:27] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[16:41] * nhm (~nhm@253-231-179-208.static.tierzero.net) Quit (Ping timeout: 480 seconds)
[16:56] * nhm (~nhm@253-231-179-208.static.tierzero.net) has joined #ceph
[16:59] * jluis (~JL@89-181-148-52.net.novis.pt) has joined #ceph
[17:00] * liiwi (liiwi@idle.fi) Quit (Ping timeout: 480 seconds)
[17:02] * johnmwilliams_ (u4972@irccloud.com) has joined #ceph
[17:03] <johnmwilliams_> Good morning. Is ceph-authtool available on lucid?
[17:03] <johnmwilliams_> Trying to install via chef: http://ceph.com/docs/master/config-cluster/chef/
[17:05] * joao (~JL@ Quit (Ping timeout: 480 seconds)
[17:09] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:26] * nhm (~nhm@253-231-179-208.static.tierzero.net) Quit (Ping timeout: 480 seconds)
[17:26] * loicd1 (~loic@brln-4db801aa.pool.mediaWays.net) has joined #ceph
[17:31] * loicd (~loic@brln-4db81486.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[17:38] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:46] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:46] * nhm (~nhm@253-231-179-208.static.tierzero.net) has joined #ceph
[17:48] * tnt (~tnt@113.39-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:01] * Leseb (~Leseb@ Quit (Quit: Leseb)
[18:01] <dspano> Is it a known issue for the cephfs client to crash when you're using virtual nics in ubuntu?
[18:02] * Tv_ (~tv@ has joined #ceph
[18:07] * BManojlovic (~steki@ has joined #ceph
[18:07] <gregaf1> johnmwilliams_: if you're using the default lucid packages it'll be called cauthtool, but you'll want to add the official Ceph repos and use those instead to get an up-to-date Ceph
[18:07] <gregaf1> dspano: definitely not
[18:08] <dspano> I added this to my /etc/network/interfaces file and it will lock up when the client tries to use any of the mounted files.
[18:08] <dspano> auto bond0:0
[18:08] <dspano> iface bond0:0 inet static
[18:08] <dspano> address
[18:08] <dspano> netmask
[18:09] <dspano> It works after I remove it.
[18:09] <senner> create a bridge to the bond0 vlan
[18:09] <senner> or a vlan.. on the bond
[18:09] <senner> don't try to put a virtual interface on the bond directly
[18:10] * nhm (~nhm@253-231-179-208.static.tierzero.net) Quit (Ping timeout: 480 seconds)
[18:11] <dspano> Alright. I'll give that a try. If you don't mind my asking a stupid question, why does it mess it up?
[18:11] <johnmwilliams_> gregaf1: What is the official repo?
[18:11] <gregaf1> it's in those docs
[18:11] <gregaf1> http://ceph.com/docs/master/install/debian/
[18:16] <johnmwilliams_> I have the latest lucid release from http://ceph.com/debian/dists/lucid/. It does not look to be up to date.
[18:17] <dspano> senner: Forget my previous question, I'm an idiot. I get it now.
[18:18] <johnmwilliams_> gregaf1: Lucid = 0.31 Precise = 0.48
[18:18] <gregaf1> yeah, hrm
[18:19] <johnmwilliams_> Are there any later builds for lucid?
[18:19] <gregaf1> There might not be provided ones; I'm afraid I don't handle the packaging; I'm not sure if they just don't have new enough libraries or what
[18:20] <johnmwilliams_> BTW, I got past the cauthtool problem but now I am stuck on preparing the disks on 0.31
[18:20] <johnmwilliams_> sudo ceph-disk-prepare --cluster-uuid={fsid} /mnt
[18:21] <gregaf1> okay, yeah, we don't build packages because the dependencies got hard to track, but you should be able to build from source if you're interested...
[18:21] <gregaf1> yeah, all the chef stuff is in much newer versions; you aren't going to get that going with 0.31
[18:22] <johnmwilliams_> 10-4. Thanks.
[18:29] <senner> dspano: cool let me know if it works. I had some issues with bridging before and I noticed i was doing it the way you describe. I use open-vswitch now and it works pretty nice
[18:30] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Remote host closed the connection)
[18:30] <dspano> senner: I'm only setting up the virtual ip temporarily while I add this host and another to a pacemaker cluster. I hope I don't run into the same issue when I have pacemaker handle this virtual ip.
[18:31] <dspano> senner: I've heard of open-vswitch. I've wanted to check it out, but haven't had the time to do any research.
[18:32] * nhm (~nhm@ has joined #ceph
[18:33] * gohko (~gohko@natter.interq.or.jp) Quit (Quit: Leaving...)
[18:35] * Cube (~Adium@ has joined #ceph
[18:36] * liiwi (liiwi@idle.fi) has joined #ceph
[18:40] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[18:41] * Cube (~Adium@ Quit (Quit: Leaving.)
[18:47] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:52] * aliguori (~anthony@ has joined #ceph
[18:52] <tnt> sagewk: btw, you're lucky that only loopback fails for you. When I tried 3.6-rc1 a couple of days ago the entire networking behavior was random ...
[18:53] <sagewk> tnt: i don't follow netdev.. but if that's the case i expect they are all over the problem?
[18:54] * Cube (~Adium@2607:f298:a:607:e2f8:47ff:fe08:733a) has joined #ceph
[18:55] <tnt> I don't follow netdev either so I'm not sure. And I had no time to investigate further yet, I had to bring services back up so went to 3.5.1 instead.
[18:56] <tnt> I have bridges over vlan over bonded interfaces and different MTUs ... so ... plenty of places that could fail.
[18:57] <tnt> btw, are the latest libceph/rbd kernel fixes backported to older kernels ?
[18:58] * bchrisman (~Adium@ has joined #ceph
[19:04] * chutzpah (~chutz@ has joined #ceph
[19:05] <dspano> senner: I think that did the trick.
[19:05] <dspano> senner: Thanks!
[19:09] * dmick (~dmick@ has joined #ceph
[19:21] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:21] * glowell_ (~glowell@c-98-210-226-131.hsd1.ca.comcast.net) has joined #ceph
[19:27] * danieagle (~Daniel@ has joined #ceph
[19:30] <senner> dspano: no problem
[19:31] <dmick> How polite, Evan: https://bugs.launchpad.net/whoopsie/+bug/1022435
[19:32] * bitsweat (~bitsweat@ip68-106-243-245.ph.ph.cox.net) has joined #ceph
[19:32] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[19:36] <dspano> dmick: I had that same issue when they gave me a company issued laptop. Lol.
[19:36] <dspano> dmick: Luckily, I had hacking familiarity.
[19:41] <dmick> er, which issue?
[19:45] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[19:47] <tightwork> I used ceph osd create and it gives me ... mon< - [osd,create] mon.0 -> '3' (0) .. so is the new id 3 ?
[19:51] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:14] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[20:32] <dmick> tightwork: I'm not sure. checking
[20:42] <dspano> dmick: I was joking. I was referring to the whoopsie bug.
[20:58] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[21:00] <dspano> Is this a normal log entry? 1 mon.2@0(leader).log v545953 check_sub sub mdsmap not log type
[21:02] <dspano> I just saw this commit, so it will be taken care of in a later release then? http://ceph.com/git/?p=ceph.git;a=commitdiff;h=7adc6c08f1474806c83ef6c89e8752d6cfd35bfd
[21:12] <joshd> dspano: yeah, that's harmless
[21:13] <dspano> josh: Thanks!
[21:31] * The_Bishop (~bishop@2a01:198:2ee:0:c50d:2f48:eff6:9e1d) has joined #ceph
[21:41] <Leseb> joshd: hi josh! one question if you don't mind.. could we consider that marking an OSD as out is a sort of maintenance mode? thank you in advance :)
[21:41] <yehudasa> SpamapS: we want to make sure that we can get rgw to 12.10 main, is there any issue we had trouble with last time that we should fix now?
[21:42] <Leseb> I noticed that since the OSD is marked as out, the client doesn't write anymore on this OSD
[21:44] <joshd> yeah, marking it out means data isn't placed on it, it's equivalent to setting its weight to 0 in the crushmap
[21:45] <Leseb> yes I notice that the weight switched from 1 to 0 on the crush map :)
[21:45] <joshd> you can prepare for maintenance by slowly decreasing an osd's weight to zero
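
joshd's suggestion of slowly decreasing an OSD's weight to zero could be scripted roughly as follows. The helper names `drain_steps` and `drain_commands` are hypothetical, the step count is arbitrary, and the sketch assumes the `ceph osd crush reweight` command is available in the installed release. It only builds the command strings and does not run them.

```python
def drain_steps(start_weight, steps):
    """Return evenly spaced weights from below start_weight down to 0."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [round(start_weight * (steps - i) / steps, 4)
            for i in range(1, steps + 1)]

def drain_commands(osd_id, start_weight=1.0, steps=4):
    """Build one reweight command per step; the caller decides when to run each."""
    return ["ceph osd crush reweight osd.%d %s" % (osd_id, weight)
            for weight in drain_steps(start_weight, steps)]

for command in drain_commands(0):
    print(command)
```

Between steps you would normally wait for the cluster to settle (for example, until `ceph health` reports HEALTH_OK again) before lowering the weight further.
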
[21:45] <Tv_> so apparently my tolerance for shell scripting is now ~50 lines
[21:45] <tightwork> howd that get there? http://ceph.com/wiki/File:Male_enhancement_pictures_5077.jpg
[21:45] <Tv_> rewriting in python because that's faster
[21:45] <Tv_> tightwork: lots of spam. i want to kill the wiki, it's too much work to keep clean.
[21:45] <tightwork> true
[21:46] <tightwork> its public wiki?
[21:46] <Tv_> tightwork: http://ceph.com/docs is the new hotness
[21:46] <Leseb> joshd: I also noticed that even if the OSD is out, it receives the replica from other OSD
[21:46] <Leseb> still correct?
[21:46] <joshd> Leseb: it might take a little while until other osds get the updated osdmap that has the osd marked out
[21:47] <joshd> they're distributed via gossip, so not everyone is up to date at once
[21:47] <Leseb> is this why the other OSDs still replicate?
[21:47] <Leseb> ok but my test setup is only 3 machines... 1 OSD each
[21:48] <joshd> just to make sure, what exactly do you mean by replicates?
[21:49] <Leseb> I mean that the OSD receives data from other OSDs while that one is marked as out. I assume that the client is still writing on another OSD
[21:50] <gregaf1> what kind of data is it receiving?
[21:50] <Leseb> let's say osd 0,1,2. and 0 is marked as down, it seems that client writes on osd 1 and then OSD 1 writes to OSD 0
[21:50] <gregaf1> since the OSD is still "up", my guess is the others are backfilling from it
[21:50] <Leseb> I can see the OSD partition growing
[21:53] <joshd> that doesn't mean it's getting writes though, it might be internal data it's writing, like more osdmaps
[21:55] <Leseb> ok, so if I set an OSD as out, even if the client writes this OSD won't receive anything?
[21:55] <Leseb> is that the expected behavior?
[21:56] <dmick> tightwork: lunch got in the way. Yes, I believe '3' is the new id
[21:56] <joshd> it won't receive any client writes once the client has the updated osdmap (it will refuse them if it has the updated osdmap)
[21:58] <Leseb> joshd: thank you for the clarification, one more question please, could you explain me those 2 options? filestore_fsync_flushes_journal_data and filestore_sync_flush (set to false by default)
[21:58] <joshd> I'll defer to sjust on that
[21:58] <sjust> hi, looking
[21:59] <Leseb> joshd: sjust thank you guys :D
[22:00] <sjust> filestore_fsync_flushes_journal_data:
[22:00] <sjust> there are filesystems where fsync coincidentally flushes the filesystem journal which makes it effectively a syncfs
[22:01] <sjust> it's not used much anymore
[22:03] <Leseb> sjust: ok for that one
[22:08] <sjust> as far as filestore_sync_flush goes: it appears to enable running sync_file_range to reduce the amount of dirty data in the fs after writes
[22:08] <sjust> but I'm not quite sure that it'
[22:08] <sjust> s wired up correctly
[22:09] <sjust> you should consider it to be obsoleted by filestore_flusher
[22:11] <Leseb> ok since the filestore_flusher comes with true by default
[22:11] <Leseb> so the filestore_flusher simply flush data from the journal to the backend fs?
[22:22] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:36] <sjust> it's an attempt to make the periodic syncs faster by reducing the amount of dirty data
[22:36] <sjust> turning it off frequently improves performance
[22:47] <SpamapS> yehudasa: this bug shows the status of libfcgi, which is needed for enabling radosgw again https://bugs.launchpad.net/ubuntu/+source/libfcgi/+bug/1017978
[22:48] <yehudasa> SpamapS: reading through it ...
[22:49] <SpamapS> yehudasa: it seems to me that we just need the security team to ACK that all their concerns are addressed.
[22:50] <SpamapS> actually it looks like they've ACK'd .. since the compiler warnings have been addressed
[22:50] <yehudasa> SpamapS: there's a message on 7/23 that says that radosgw package will stay in universe?
[22:51] <Leseb> sjust: thanks I'll try and check if I get better pert ;)
[22:51] <Leseb> *perf
[22:52] <SpamapS> yehudasa: indeed. That just means it can't go on any CDs and won't get active security team support for quantal's lifecycle
[22:53] <SpamapS> yehudasa: and a very small subsection of users will not see it (some people do actually turn off universe because of the lack of timely security updates)
[22:53] <yehudasa> SpamapS: can we get it up to main now that libfcgi is acked?
[22:53] <SpamapS> yehudasa: I believe James Page is alluding to the fact that we should probably target doing that when it is fully integrated w/ keystone.
[22:55] <yehudasa> SpamapS: is there a reason why not putting it in main without keystone support?
[22:55] <yehudasa> SpamapS: it's completely orthogonal
[22:55] <SpamapS> yehudasa: available resources mostly
[22:56] <SpamapS> yehudasa: It has not been a small task getting libfcgi into main.. and thats just so we can build radosgw :-P
[22:56] <SpamapS> yehudasa: we are approaching feature freeze.. which is typically when the MIR team gets *slammed*
[22:57] <yehudasa> SpamapS: haven't they looked at it beforehand already?
[22:57] * Cube (~Adium@2607:f298:a:607:e2f8:47ff:fe08:733a) Quit (Ping timeout: 480 seconds)
[22:57] <yehudasa> SpamapS: the issues that they had before were all about libfcgi, and there were a few ceph issues that were addressed long ago
[22:57] <SpamapS> yehudasa: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/932898
[22:59] <yehudasa> SpamapS: message for 4/12 states that the problem is with libfcgi, if that's cleared then what else is needed?
[23:01] * Cube (~Adium@ has joined #ceph
[23:01] <SpamapS> yehudasa: in comment 8 there are specific concerns raised about radosgw's code quality
[23:02] <SpamapS> yehudasa: I think we will need to file a new MIR for radosgw, and have a second security review done.
[23:03] <SpamapS> yehudasa: I had thought bug 1017978 would be that. Its not clear why jdstrand does not think radosgw is part of it.
[23:04] <yehudasa> SpamapS: can we do that now?
[23:06] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[23:06] <SpamapS> yehudasa: we can, but I'm not finding the extra hammer we need to bump its priority up (usually we'd say "agreed to in blueprint X" but the blueprint is quite vague) https://blueprints.launchpad.net/ubuntu/+spec/servercloud-q-ceph-object-integration
[23:07] <sagewk> spamaps, yehudasa: i'm not quite sure where the link to keystone is coming from wrt prioritization, too.. i haven't been very involved in that discussion. i'll run that by bryan and nick i guess?
[23:08] <SpamapS> sagewk: yeah, get Nick to throw a business partnership stakeholder assertion into the bug and it should get bumped up to High importance.
[23:09] <SpamapS> I'm changing the ceph task back to 'New' and asserting that there is a strategic desire to have radosgw in main.
[23:09] <sagewk> sweet, thanks.
[23:09] <sagewk> i want to get all of this sorted out well before freeze this time :)
[23:10] <Leseb> sjust: wow it's really impressive, I really got extra performance :D
[23:10] <SpamapS> sagewk: yes please!
[23:11] <sagewk> spamaps: are there other unresolved issues that you know of? fyi we're aiming to get all the rbd layering stuff in our next stable release in time for freeze
[23:12] <sagewk> (at least freeze our release when 12.10 freezes)
[23:13] <SpamapS> sagewk: No AFAIK things are on track and our packaging is (slowly) changing back to be more like yours.
[23:14] <sagewk> spamaps: great. let us know if anything comes up :)
[23:15] <SpamapS> sagewk: I will. jamespage is also closer to the issue so I'll ping him tomorrow morning since its well into the night for him (UK)
[23:20] <sagewk> thanks!
[23:24] * EmilienM (~EmilienM@ Quit (Quit: Leaving...)
[23:24] * gregaf1 (~Adium@2607:f298:a:607:f9f3:ac0f:d584:3283) Quit (Quit: Leaving.)
[23:28] * DLange (~DLange@dlange.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:28] * gregaf (~Adium@ has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.