#ceph IRC Log

IRC Log for 2012-09-21

Timestamps are in GMT/BST.

[0:06] <dmick> sagewk: repeating for your benefit:
[0:06] <dmick> (02:55:53 PM) SpamapS: sagelap: 717 files changed, 180478 insertions(+), 88133 deletions(-)
[0:06] <dmick> (02:56:00 PM) SpamapS: sagelap: *wow*
[0:06] <sagewk> spamaps: from what to what?
[0:07] <SpamapS> 0.48.1 -> 0.48.2
[0:07] <SpamapS> but
[0:08] <sagelap> eh... that can't be right
[0:08] <SpamapS> I think I grabbed the tarball, where the previous packages were from the git tag
[0:08] <SpamapS> so a lot of that is generated files
[0:08] <SpamapS> redoing w/ git
[0:08] <sagewk> fatty:src 03:23 PM $ git diff --stat v0.48.1argonaut..v0.48.2argonaut | tail -1
[0:08] <sagewk> 43 files changed, 715 insertions(+), 198 deletions(-)
[0:09] <SpamapS> debian/rules really needs a get-orig-source rule
[0:09] <sagelap> k
[0:13] * liiwi (liiwi@idle.fi) Quit (Remote host closed the connection)
[0:13] * liiwi (liiwi@idle.fi) has joined #ceph
[0:17] <SpamapS> 40 files changed, 712 insertions(+), 197 deletions(-)
[0:17] <SpamapS> much better :)
[0:19] <sagewk> :)
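The two summaries quoted above (tarball vs. git tags) can be compared programmatically. This is a minimal sketch that parses the final line printed by `git diff --stat`; the helper name `parse_diffstat_summary` is hypothetical, and the regular expression assumes the usual English summary format shown in the log.

```python
import re

# Matches the summary line that `git diff --stat` prints last,
# e.g. "43 files changed, 715 insertions(+), 198 deletions(-)".
_SUMMARY = re.compile(
    r"(\d+) files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?"
)


def parse_diffstat_summary(line):
    """Return (files, insertions, deletions) from a diffstat summary line."""
    match = _SUMMARY.search(line)
    if match is None:
        raise ValueError("not a diffstat summary: %r" % line)
    # A missing insertions/deletions clause means zero of that kind.
    files, insertions, deletions = (int(g) if g else 0 for g in match.groups())
    return files, insertions, deletions


# The two lines quoted in the channel.
tarball = parse_diffstat_summary(
    "717 files changed, 180478 insertions(+), 88133 deletions(-)"
)
git_tags = parse_diffstat_summary(
    "43 files changed, 715 insertions(+), 198 deletions(-)"
)
```

Comparing `tarball[0]` with `git_tags[0]` makes the point from the conversation: most of the 717 files come from generated content in the tarball, not from changes between the two tags.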
[0:20] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:20] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:22] <sagewk> spamaps: there is a watch file.. should the get-orig-source rule just run uscan?
[0:22] <SpamapS> sagewk: shoot, missed that.. it does the right thing
[0:23] <sagewk> k cool
[0:39] * danieagle (~Daniel@177.43.213.15) has joined #ceph
[0:48] * scuttlemonkey (~scuttlemo@96-42-146-5.dhcp.trcy.mi.charter.com) Quit (Quit: zzzzzzzzzzzzzzzzzzzz)
[0:49] * Karcaw (~evan@96-41-198-212.dhcp.elbg.wa.charter.com) has joined #ceph
[0:55] * BManojlovic (~steki@212.200.243.39) Quit (Quit: I'm off, and you do whatever you want...)
[0:57] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[1:12] * MikeMcClurg (~mike@83-131-44-240.adsl.net.t-com.hr) Quit (Quit: Leaving.)
[1:13] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[1:23] <nhm> thank god, the last set of raid controller tests is about to run.
[1:25] <dmick> raid. it kills nhm dead.
[1:27] <nhm> at least this time I remembered to pass the stripe width and chunk size into ext4 and xfs properly. Not that it is making a whole lot of difference.
[1:34] * lofejndif (~lsqavnbok@1RDAADRAA.tor-irc.dnsbl.oftc.net) has joined #ceph
[1:40] <sagewk> what if: the primary tells the replica "i have [N?] objects for you", and the replica says "give me the next one" when it is ready for it. that way it's pull, so flow control is easier, and the primary still controls the consistency/ordering of the pushed object
[1:40] <sjust> sagewk, mikeryan: seems similar to extending the backfill reservation concept to a recovery reservation
[1:41] <sagewk> yeah.. although in this case the reservation isn't strictly needed, it's more implicit because the puller is only requesting as much data as it is ready for
[1:41] <sagewk> it might make the reservation unnecessary?
[1:41] <sjust> mikeryan: we should be able to demonstrate it on burnupi pretty easily
[1:42] <mikeryan> sjust: wrong window?
[1:42] <sjust> mikeryan: meh, this is probably a good place for the design discussion
[1:43] <sagewk> we should think about the related problems we want to solve, like making small object recovery go fast too. if the replica is pulling, it won't know how big the objects are
[1:43] <sagewk> it could say "send me more", though, and the primary responds with one 4MB big object or 10 little ones, or something
[1:44] <mikeryan> "send me more" or "send me chunks as big as $N MB"?
[1:45] <mikeryan> we can get into trouble if we try to get too cute about optimizing things as well
[1:45] <sjust> I'm not sure I see a big advantage of that over doing a pg wide reservation combined with prioritizing the message queue
[1:45] <mikeryan> bin packing is NP-hard :)
[1:45] <sagewk> it won't know whether there are big objects or little objects coming next.. so i think we need some generic 'more' metric, N units of work.
[1:45] <sjust> the pg reservation has the added benefit of causing us to finish single pgs faster
[1:45] <sagewk> true
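The pull model sketched in the messages above (the replica asks for "more", the primary decides what to send and in which order) might look roughly like this. It is only an illustration under stated assumptions: the `Primary` class, the `recover` loop, and the byte budget standing in for "N units of work" are all hypothetical and are not Ceph code.

```python
from collections import deque


class Primary:
    """Holds the objects a replica is missing, in the order they must be sent."""

    def __init__(self, objects):
        # objects: iterable of (name, size_bytes); order is fixed by the primary.
        self.pending = deque(objects)

    def announce(self):
        """'I have N objects for you.'"""
        return len(self.pending)

    def next_batch(self, budget_bytes):
        """Answer one 'send me more' request.

        Objects are taken strictly in order until the budget is used up.
        At least one object is always returned so that an object larger
        than the budget cannot stall recovery.
        """
        batch, used = [], 0
        while self.pending:
            _, size = self.pending[0]
            if batch and used + size > budget_bytes:
                break
            batch.append(self.pending.popleft())
            used += size
        return batch


def recover(primary, budget_bytes):
    """Replica side: keep pulling until the primary has nothing left."""
    rounds = []
    while primary.announce() > 0:
        rounds.append(primary.next_batch(budget_bytes))
    return rounds


MB = 1024 * 1024
objects = [("big", 4 * MB)] + [("small%d" % i, 400 * 1024) for i in range(10)]
rounds = recover(Primary(objects), budget_bytes=4 * MB)
```

With a 4 MB budget the first request returns the single big object and the second returns all ten small ones, which mirrors the "4MB big object or 10 little ones" idea. The replica never needs to know object sizes in advance.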
[1:46] <sjust> we probably need to prioritize the op queue anyway to prevent the replica needing to predict load
[1:46] <sagewk> hmm. and it would just have a long/slow queue if it is otherwise busy, throttling recovery that way
[1:46] <sjust> right
[1:46] <sagewk> yeah
[1:46] <mikeryan> long/slow queue
[1:46] <mikeryan> someone explain?
[1:46] <sagewk> recovery vs client
[1:47] <mikeryan> ah, we give clients higher priority
[1:47] <sjust> right
[1:47] <nhm> sagewk: you might want to think about this in the context of things like RDMA too. Wouldn't a generic "more" metric be better in that case?
[1:47] <sagewk> so the real challenge there is going to be making sure there aren't subtle ordering requirements with repops vs recovery ops
[1:47] <sjust> so the recovery messages will just sort of accumulate
[1:47] <sjust> sagewk: yeah, but it shouldn't be too bad
[1:47] <mikeryan> sjust: careful, that's what i thought about chunky scrub
[1:48] <sjust> mikeryan: :)
[1:48] <sagewk> we can artificially skew the recovery (or client op) queue in qa to flush out those bugs
[1:48] <sjust> yeah
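The prioritized op queue discussed above (client ops ahead of recovery ops, so recovery messages accumulate when the OSD is busy) can be illustrated with a small heap-based queue. This is a sketch, not the OSD's actual queue: the `OpQueue` class and the `CLIENT` and `RECOVERY` constants are hypothetical names.

```python
import heapq
import itertools

# Lower value is served first (hypothetical priority classes).
CLIENT, RECOVERY = 0, 1


class OpQueue:
    """Strict-priority queue that stays FIFO within each priority class."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def enqueue(self, klass, op):
        heapq.heappush(self._heap, (klass, next(self._seq), op))

    def dequeue(self):
        _, _, op = heapq.heappop(self._heap)
        return op

    def __len__(self):
        return len(self._heap)


q = OpQueue()
q.enqueue(RECOVERY, "push-1")
q.enqueue(CLIENT, "write-1")
q.enqueue(RECOVERY, "push-2")
q.enqueue(CLIENT, "write-2")
order = [q.dequeue() for _ in range(len(q))]
```

Note that this reorders recovery ops relative to client ops that arrived later, which is exactly where the subtle repop vs. recovery ordering requirements mentioned in the channel would have to be checked. Strict priority can also starve recovery under constant client load.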
[1:48] <sjust> a good first step would probably be to add a bit of instrumentation to find out how many pushes are in progress on an OSD vs time
[1:48] <sagewk> already there, i think..
[1:48] <sagewk> oh, not in progress
[1:49] * Ryan_Lane (~Adium@216.38.130.162) Quit (Quit: Leaving.)
[1:49] <sjust> that way, we can reproduce the conditions and verify that a small number of osds are seeing a disproportionate number of pushes
[1:49] <sagewk> yeah
[1:49] <sagewk> hmm, harder to count on the receiving end, since they're sitting in the op queue
[1:49] <sjust> yeah
[1:49] <sagewk> bleh
[1:50] <sjust> just counting the number of pushes completed vs time would be enough
[1:50] <sagewk> nhm: i think the transport is orthogonal
[1:50] <sjust> I guess the perf counters already do that?
[1:50] <sagewk> on the primary they do, probably not on the replica.. i forget
[1:50] <sagewk> but trivial to add
[1:50] <sjust> yep
[1:51] <sjust> mikeryan: that would be done in ReplicatedPG::handle_push()
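The instrumentation idea above, counting completed pushes per OSD over time to see whether a few OSDs receive a disproportionate share, could be prototyped as follows. This sketch does not use Ceph's perf counter API; `PushCounter`, `record`, and the one-second bucket width are assumptions made for illustration.

```python
from collections import Counter


class PushCounter:
    """Counts completed pushes per (osd, time bucket)."""

    def __init__(self, bucket_seconds=1.0):
        self.bucket_seconds = bucket_seconds
        self.buckets = Counter()

    def record(self, osd_id, timestamp):
        """Call once per completed push, with a timestamp in seconds."""
        bucket = int(timestamp // self.bucket_seconds)
        self.buckets[(osd_id, bucket)] += 1

    def totals(self):
        """Total completed pushes per OSD across all buckets."""
        per_osd = Counter()
        for (osd_id, _), count in self.buckets.items():
            per_osd[osd_id] += count
        return per_osd


counter = PushCounter(bucket_seconds=1.0)
# Hypothetical sample: osd 3 completes far more pushes than osd 7.
for ts in (0.1, 0.2, 0.9, 1.5, 1.6):
    counter.record(3, ts)
counter.record(7, 0.4)
totals = counter.totals()
```

Plotting `counter.buckets` against the bucket index gives the "pushes vs. time" view; `totals` gives the per-OSD skew.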
[1:53] <nhm> sagewk: you are probably right, I think I was thinking about it incorrectly.
[1:53] * lofejndif (~lsqavnbok@1RDAADRAA.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[1:55] <mikeryan> k, so add that perf counter
[1:56] <mikeryan> next step is duplicate the problem somewhere?
[1:56] <sjust> yep
[1:58] <mikeryan> which git tag?
[1:58] <mikeryan> v0.48.2argonaut?
[2:00] * The_Bishop (~bishop@2001:470:50b6:0:5ff:eeaf:4d92:4854) Quit (Quit: Who the hell is this Peer? If I catch him, I'll reset his connection!)
[2:02] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:04] * sagelap1 (~sage@227.sub-70-197-143.myvzw.com) has joined #ceph
[2:07] * sagelap (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[2:08] <sagelap1> mikeryan: let's work off of master for this
[2:08] <mikeryan> k
[2:08] * slang (~slang@2607:f298:a:607:b919:8ff9:ead4:a81e) Quit (Quit: Leaving.)
[2:09] * danieagle (~Daniel@177.43.213.15) Quit (Quit: See you later :-) and Thank You Very Much For Everything!!! ^^)
[2:29] * tryggvil (~tryggvil@46-239-230-168.tal.is) Quit (Quit: tryggvil)
[2:31] * scuttlemonkey (~scuttlemo@96-42-146-5.dhcp.trcy.mi.charter.com) has joined #ceph
[2:35] * dmick (~dmick@2607:f298:a:607:9409:5dc2:eaeb:4218) Quit (Quit: Leaving.)
[2:51] * sagelap1 (~sage@227.sub-70-197-143.myvzw.com) Quit (Ping timeout: 480 seconds)
[2:51] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[2:54] <nhm> heh, the dirt cheap controller without cache is doing better in raid0 testing than the expensive controller with cache.
[3:01] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[3:03] * loicd (~loic@magenta.dachary.org) has joined #ceph
[3:06] * maelfius (~mdrnstm@66.209.104.107) Quit (Quit: Leaving.)
[3:19] * scuttlemonkey (~scuttlemo@96-42-146-5.dhcp.trcy.mi.charter.com) Quit (Quit: zzzzzzzzzzzzzzzzzzzz)
[3:33] * scuttlemonkey (~scuttlemo@96-42-146-5.dhcp.trcy.mi.charter.com) has joined #ceph
[3:48] * yehudasa_ (~yehudasa@static-66-14-234-139.bdsl.verizon.net) has joined #ceph
[3:57] * yehudasa_ (~yehudasa@static-66-14-234-139.bdsl.verizon.net) Quit (Ping timeout: 480 seconds)
[4:06] * yehudasa_ (~yehudasa@ace.ops.newdream.net) has joined #ceph
[4:39] * scuttlemonkey (~scuttlemo@96-42-146-5.dhcp.trcy.mi.charter.com) Quit (Quit: zzzzzzzzzzzzzzzzzzzz)
[4:41] * maelfius (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) has joined #ceph
[5:06] * deepsa (~deepsa@122.172.5.120) has joined #ceph
[5:12] * yehudasa_ (~yehudasa@ace.ops.newdream.net) Quit (Ping timeout: 480 seconds)
[5:20] * Ryan_Lane1 (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[5:20] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[5:23] * aliguori (~anthony@cpe-70-123-140-180.austin.res.rr.com) Quit (Remote host closed the connection)
[5:56] * nhm (~nhm@67-220-20-222.usiwireless.com) Quit (Ping timeout: 480 seconds)
[6:07] * Cube (~Adium@12.248.40.138) Quit (Quit: Leaving.)
[6:24] * lxo (~aoliva@28IAAHRRF.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[6:29] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[6:48] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[6:55] * pentabular (~sean@adsl-70-231-141-128.dsl.snfc21.sbcglobal.net) has left #ceph
[7:19] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) Quit (Read error: Connection reset by peer)
[7:28] * Tobarja (~athompson@cpe-071-075-064-255.carolina.res.rr.com) has joined #ceph
[7:34] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[7:36] * loicd (~loic@magenta.dachary.org) has joined #ceph
[8:17] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:29] * Ryan_Lane1 (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[8:31] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[8:42] * helloadam (~adam@office.netops.me) Quit (Ping timeout: 480 seconds)
[8:54] * MikeMcClurg (~mike@83-131-44-240.adsl.net.t-com.hr) has joined #ceph
[9:10] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) has joined #ceph
[9:22] * deepsa_ (~deepsa@115.242.146.166) has joined #ceph
[9:23] * deepsa (~deepsa@122.172.5.120) Quit (Ping timeout: 480 seconds)
[9:23] * deepsa_ is now known as deepsa
[9:30] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) has joined #ceph
[9:31] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[9:33] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[9:35] <pmjdebru1jn> hi folks
[9:35] <pmjdebru1jn> 0.48.2 :o
[9:36] <pmjdebru1jn> the release notes mention changes to the upstart scripts etc
[9:39] * pentabular (~sean@adsl-70-231-141-128.dsl.snfc21.sbcglobal.net) has joined #ceph
[9:39] * pentabular is now known as Guest7788
[9:39] * Guest7788 (~sean@adsl-70-231-141-128.dsl.snfc21.sbcglobal.net) has left #ceph
[9:52] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[10:02] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[10:11] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[10:15] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[10:23] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[10:35] * maelfius (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) Quit (Quit: Leaving.)
[11:03] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[11:17] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[11:27] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[11:28] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:05] * deepsa_ (~deepsa@122.172.5.120) has joined #ceph
[12:06] * deepsa (~deepsa@115.242.146.166) Quit (Ping timeout: 480 seconds)
[12:06] * deepsa_ is now known as deepsa
[12:31] * joao (~JL@89.181.153.232) Quit (Quit: Leaving)
[12:40] * joao (~JL@89.181.153.232) has joined #ceph
[12:51] * loicd (~loic@LPuteaux-156-16-100-112.w80-12.abo.wanadoo.fr) Quit (Quit: Leaving.)
[13:15] * loicd (~loic@90.84.144.232) has joined #ceph
[13:17] * loicd1 (~loic@90.84.144.176) has joined #ceph
[13:23] * loicd (~loic@90.84.144.232) Quit (Ping timeout: 480 seconds)
[13:24] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[13:32] * loicd1 (~loic@90.84.144.176) Quit (Ping timeout: 480 seconds)
[13:35] * loicd (~loic@magenta.dachary.org) has joined #ceph
[14:03] * steki-BLAH (~steki@85.222.222.69) has joined #ceph
[14:06] * nhm (~nhm@67-220-20-222.usiwireless.com) has joined #ceph
[14:07] * BManojlovic (~steki@91.195.39.5) Quit (Ping timeout: 480 seconds)
[14:14] * nhm_ (~nhm@174-20-32-79.mpls.qwest.net) has joined #ceph
[14:16] * nhm (~nhm@67-220-20-222.usiwireless.com) Quit (Ping timeout: 480 seconds)
[14:24] * SvenDowideit (~SvenDowid@203-206-171-38.perm.iinet.net.au) Quit (Ping timeout: 480 seconds)
[14:26] * ninkotech (~duplo@89.177.137.231) has joined #ceph
[14:33] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[14:36] * steki-BLAH (~steki@85.222.222.69) Quit (Ping timeout: 480 seconds)
[14:37] * aliguori (~anthony@cpe-70-123-140-180.austin.res.rr.com) has joined #ceph
[14:54] * ninkotech (~duplo@89.177.137.231) Quit (Remote host closed the connection)
[15:01] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:07] * ninkotech (~duplo@89.177.137.231) has joined #ceph
[15:29] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[15:31] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[15:31] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[15:51] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[15:54] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[15:55] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Remote host closed the connection)
[15:56] * The_Bishop (~bishop@e179020010.adsl.alicedsl.de) has joined #ceph
[15:58] * guerby_ (~guerby@nc10d-ipv6.tetaneutral.net) has joined #ceph
[15:59] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) Quit (Read error: No route to host)
[16:09] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[16:11] * deepsa (~deepsa@122.172.5.120) Quit (Remote host closed the connection)
[16:13] * scuttlemonkey (~scuttlemo@96-42-146-5.dhcp.trcy.mi.charter.com) has joined #ceph
[16:14] * deepsa (~deepsa@122.172.18.25) has joined #ceph
[16:21] * slang (~slang@2607:f298:a:607:c048:944b:5099:9a42) has joined #ceph
[16:22] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:26] * mrjack (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[16:32] * mrjack_ (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[16:41] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[16:44] * jamespage (~jamespage@tobermory.gromper.net) has joined #ceph
[16:44] * loicd (~loic@magenta.dachary.org) has joined #ceph
[16:51] * cblack101 (c037362a@ircip2.mibbit.com) has joined #ceph
[16:53] * sagelap (~sage@157.sub-70-197-142.myvzw.com) has joined #ceph
[16:56] * cron0 (~cron0@palpatine.privatedns.com) has joined #ceph
[16:56] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[17:02] * BManojlovic (~steki@91.195.39.5) Quit (Quit: I'm off, and you do whatever you want...)
[17:08] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[17:18] * slang (~slang@2607:f298:a:607:c048:944b:5099:9a42) Quit (Read error: Connection reset by peer)
[17:19] * slang (~slang@2607:f298:a:607:c048:944b:5099:9a42) has joined #ceph
[17:23] * ninkotech (~duplo@89.177.137.231) Quit (Remote host closed the connection)
[17:29] * ninkotech (~duplo@89.177.137.231) has joined #ceph
[17:34] * ninkotech (~duplo@89.177.137.231) Quit (Remote host closed the connection)
[17:35] * ninkotech (~duplo@89.177.137.231) has joined #ceph
[17:44] * ninkotech (~duplo@89.177.137.231) Quit (Read error: Connection reset by peer)
[17:50] * ninkotech (~duplo@89.177.137.231) has joined #ceph
[17:54] * sagelap1 (~sage@157.sub-70-197-142.myvzw.com) has joined #ceph
[17:59] * sagelap (~sage@157.sub-70-197-142.myvzw.com) Quit (Ping timeout: 480 seconds)
[18:01] * nhm_ giggles while installing more drives
[18:05] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[18:09] * ninkotech (~duplo@89.177.137.231) Quit (Remote host closed the connection)
[18:10] * ninkotech (~duplo@89.177.137.231) has joined #ceph
[18:16] * sagelap1 (~sage@157.sub-70-197-142.myvzw.com) Quit (Ping timeout: 480 seconds)
[18:16] * Ryan_Lane (~Adium@207.239.114.206) has joined #ceph
[18:16] * ninkotech (~duplo@89.177.137.231) Quit (Remote host closed the connection)
[18:19] * ninkotech (~duplo@89.177.137.231) has joined #ceph
[18:22] * Ryan_Lane1 (~Adium@222.sub-166-250-35.myvzw.com) has joined #ceph
[18:23] * Ryan_Lane (~Adium@207.239.114.206) Quit (Read error: Operation timed out)
[18:23] * Cube (~Adium@12.248.40.138) has joined #ceph
[18:29] * MikeMcClurg (~mike@83-131-44-240.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[18:29] <amatter_> I'm getting a lot of "misdirected client" messages in the logs. Should I do anything about them?
[18:30] * gregaf (~Adium@2607:f298:a:607:c54c:c94:b0fa:326b) has joined #ceph
[18:31] * sagelap (~sage@169.sub-70-197-145.myvzw.com) has joined #ceph
[18:39] * MikeMcClurg (~mike@93-137-170-239.adsl.net.t-com.hr) has joined #ceph
[18:41] * Cube (~Adium@12.248.40.138) Quit (Ping timeout: 480 seconds)
[18:41] * Cube (~Adium@12.248.40.138) has joined #ceph
[18:43] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) Quit (Remote host closed the connection)
[18:44] * BManojlovic (~steki@212.200.243.39) has joined #ceph
[18:46] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Ping timeout: 480 seconds)
[18:47] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) has joined #ceph
[18:51] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[18:51] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[18:52] * pradeep (~6ac616a2@2600:3c00::2:2424) has joined #ceph
[18:55] <pradeep> hi
[18:57] * sagelap (~sage@169.sub-70-197-145.myvzw.com) Quit (Ping timeout: 480 seconds)
[18:58] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) Quit (Read error: No route to host)
[18:58] <pradeep> ./bootstrap is generating an error in teuthology, what is the solution?
[18:58] <pradeep> hi all, i have installed virtualenv and pip, and when i run ./bootstrap it generates the error "-su: ./bootstrap: No such file or directory". what is the solution?
[18:59] * pradeep (~6ac616a2@2600:3c00::2:2424) Quit (Quit: TheGrebs.com CGI:IRC (EOF))
[19:02] <cblack101> I had osd.27 in my cluster fail due to a disk problem, replaced the disk, reformatted, followed the instructions at http://ceph.com/wiki/Replacing_a_failed_disk/OSD for replacing a failed disk, and ceph -s reports 47 up, 47 in and 48 OSDs... Is the last step here to get back to 48 just to run a ceph osd in osd.27? or is there something else afterward?
[19:02] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) has joined #ceph
[19:03] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) Quit ()
[19:03] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) has joined #ceph
[19:03] * benpol (~benp@garage.reed.edu) has joined #ceph
[19:04] <Cube> cblack101: Is the osd already in your crushmap?
[19:05] <cblack101> I never modified the map after the failure, does it self-modify on failure?
[19:05] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) Quit (Read error: No route to host)
[19:06] <Cube> You should be able to just mark it as in then I believe, (ceph osd in 27)
[19:07] <cblack101> ok will give that a shot, was ok to here and got nervous and figured I'd ask before I blow something up! :-)
[19:07] <Cube> :)
[19:08] <joao> sagewk, around?
[19:08] <sagewk> joao: yeah
[19:10] * sagelap (~sage@38.122.20.226) has joined #ceph
[19:11] <wido> cblack101: iirc OSD's don't go auto "in" again
[19:12] <sagewk> wido: it depends.. there are several config options to control it. by default, if it's automatically marked out, it'll get automatically marked in again.
[19:12] <wido> cblack101: Never mind, that's not true as long as you didn't touch "mon_osd_auto_mark_auto_out_in"
[19:12] <sagewk> yeah :)
[19:12] <wido> sagewk: You just beat me :)
[19:12] <wido> Was checking config_opts.h
[19:16] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) has joined #ceph
[19:17] <cblack101> OK, looks like we're in business, once I ran ceph osd up 27... ceph -s showed some pages stuck and degraded, now shows OK but still looking at: osdmap e66: 48 osds: 47 up, 47 in ceph -s, did I do something wrong?
[19:18] <cblack101> and osd.27 is now not running
[19:18] <cblack101> Uh oh
[19:19] * Ryan_Lane1 (~Adium@222.sub-166-250-35.myvzw.com) Quit (Quit: Leaving.)
[19:20] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) Quit (Read error: No route to host)
[19:20] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) has joined #ceph
[19:20] * jjgalvez (~jjgalvez@12.248.40.138) has joined #ceph
[19:21] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) Quit (Read error: Connection reset by peer)
[19:21] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) has joined #ceph
[19:21] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) Quit ()
[19:21] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) has joined #ceph
[19:23] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) Quit (Remote host closed the connection)
[19:24] * EmilienM (~EmilienM@55.67.197.77.rev.sfr.net) has joined #ceph
[19:28] * slang (~slang@2607:f298:a:607:c048:944b:5099:9a42) Quit (Quit: Leaving.)
[19:33] * slang (~slang@38.122.20.226) has joined #ceph
[19:38] * dmick (~dmick@2607:f298:a:607:a5ae:52fa:c5f9:dc59) has joined #ceph
[19:39] <joshd> cblack101: can you pastebin the output of 'ceph osd tree' and 'ceph osd dump'?
[19:44] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:58] <cblack101> http://mibpaste.com/fjO6db ok, I can't seem to get osd.27 to start now, tried unmounting, reformatting and re-running the ceph-osd mkfs... thoughts?
[19:58] * nhm (~nhm@67-220-20-222.usiwireless.com) has joined #ceph
[20:00] * nhm_ (~nhm@174-20-32-79.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[20:05] * The_Bishop_ (~bishop@e179008215.adsl.alicedsl.de) has joined #ceph
[20:05] <joshd> cblack101: what's in osd.27's log after you try to start it?
[20:06] <joshd> there should be something at the end telling us why it's no longer running
[20:07] <dmick> cblack101: is the daemon in fact not running (from ps)?
[20:12] * The_Bishop (~bishop@e179020010.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[20:14] <sagewk> spamaps: have a minute to share where things are at?
[20:15] <cblack101> ps ax | grep osd reports .../usr/bin/ceph-osd -i 24 --pid-file /var/run/ceph/osd.24.pid....for -i 24-26 & 28-31... no -i 27, so I'm thinking that 27 is not running, josh
[20:15] * steki-BLAH (~steki@212.200.241.145) has joined #ceph
[20:16] <SpamapS> sagewk: nearly ready to upload to the beta freeze queue, it may not get reviewed until next week tho
[20:17] <sagewk> k
[20:18] <nhm> hrm, doesn't look like a SAS9211 likes to live in the same machine with 4 SAS9207s in it.
[20:20] <dmick> cblack101: ok; and is there nothing else in the log from when it tried to start?
[20:21] * BManojlovic (~steki@212.200.243.39) Quit (Ping timeout: 480 seconds)
[20:22] <cblack101> Here's a paste of the osd.27.log after I did a service ceph -a stop/start http://mibpaste.com/cvfWPG - we're 48 up/ 48 in now with 2 pgs stuck
[20:27] <joshd> what state are they stuck in?
[20:28] <cblack101> We have pgs recovering now, log on osd.27 is counting down ---- 17599/172826 degraded (10.183%) --- % is getting lower, this is good I assume?
[20:29] <dmick> yep
[20:29] <dmick> sounds like it's proceeding now
[20:30] <cblack101> down to 9.6% now and dropping, I'll let you know how we're doing once we get to 0.000%
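The degraded figure quoted above, 17599/172826 degraded (10.183%), is simply the degraded count over the total count expressed as a percentage. A minimal sketch, with the helper name `degraded_percent` being hypothetical:

```python
def degraded_percent(degraded, total):
    """Percentage of object copies that are degraded."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * degraded / total


# Values quoted from the osd.27 log line in the channel.
pct = degraded_percent(17599, 172826)
formatted = "%.3f%%" % pct
```

As recovery proceeds the numerator shrinks, so the percentage counts down toward 0.000%.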
[20:37] * maelfius (~mdrnstm@66.209.104.107) has joined #ceph
[20:45] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) Quit (Quit: ZNC - http://znc.sourceforge.net)
[20:47] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) has joined #ceph
[20:48] * darkfaded (~floh@188.40.175.2) has joined #ceph
[20:48] * darkfader (~floh@188.40.175.2) Quit (Read error: Connection reset by peer)
[20:49] * BManojlovic (~steki@212.200.241.157) has joined #ceph
[20:49] * dilemma (~dilemma@2607:fad0:32:a02:1e6f:65ff:feac:7f2a) has joined #ceph
[20:52] <dilemma> So I'm doing some failure testing with my Ceph storage nodes, and I'm running into something that's a bit concerning. It looks like more of a btrfs issue, but I wonder if anyone here has run into it. When I hot-unplug the drive out from under an OSD (with a mounted and active btrfs filesystem on it), it takes the kernel down with it.
[20:53] <dilemma> I get a stack trace, along with the error: "Fixing recursive fault but reboot is needed"
[20:55] * steki-BLAH (~steki@212.200.241.145) Quit (Ping timeout: 480 seconds)
[20:55] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[20:57] * slang (~slang@38.122.20.226) Quit (Quit: Leaving.)
[21:05] * The_Bishop_ (~bishop@e179008215.adsl.alicedsl.de) Quit (Quit: Who the hell is this Peer? If I catch him, I'll reset his connection!)
[21:06] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[21:07] * tryggvil (~tryggvil@163-60-19-178.xdsl.simafelagid.is) has joined #ceph
[21:08] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:13] * pentabular (~sean@adsl-70-231-141-128.dsl.snfc21.sbcglobal.net) has joined #ceph
[21:18] * scuttlemonkey (~scuttlemo@96-42-146-5.dhcp.trcy.mi.charter.com) Quit (Quit: zzzzzzzzzzzzzzzzzzzz)
[21:39] <SpamapS> sagewk: hey did you guys take any pics of the shuttle?
[21:40] * The_Bishop (~bishop@2001:470:50b6:0:dc25:80e5:4676:e618) has joined #ceph
[21:40] <SpamapS> I got to the roof of The Grove right as it was heading south away from griffith observatory :-P
[21:44] <SpamapS> http://instagram.com/p/P2VpT8lFyO/ .. hehe.. I dare you to say you couldn't see it
[21:45] <nhm> nice!
[21:45] <nhm> Too bad I'm back here in MN. :(
[21:46] <nhm> That would have been a great shot.
[21:50] <Cube> watch out! It's heading straight for AON.
[21:51] <nhm> Is it bad that I still have to move PCIE cards around before I can get a working configuration?
[21:52] <nhm> aren't we past that in this day and age?
[21:59] <nhm> alright, after a small setback, let's see how 24 OSDs in one node goes.
[21:59] <elder> Wheee!!!
[22:00] <nhm> elder: 24 spinning disks + 8 SSDs for journals.
[22:00] <nhm> elder: it'll probably suck because I have CRC32c calculations still on.
[22:00] <elder> I think it will still be good.
[22:00] <nhm> elder: maybe. :)
[22:01] <nhm> elder: I want to break 2GB/s.
[22:01] <elder> Do it.
[22:01] <nhm> OH YEAH
[22:02] <nhm> heh, the fans are spinning up louder
[22:03] <elder> That means it's going to be FAST.
[22:03] <nhm> too bad it's doing 4K random IO.
[22:05] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:25] * dilemma (~dilemma@2607:fad0:32:a02:1e6f:65ff:feac:7f2a) Quit (Quit: Leaving)
[22:28] <pentabular> the fans push the bits harder to go faster
[22:28] <pentabular> :)
[22:35] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[22:35] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[22:38] * cron0 (~cron0@palpatine.privatedns.com) Quit (Ping timeout: 480 seconds)
[22:44] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[22:46] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:46] <SpamapS> sagewk: 13:45 -queuebot:#ubuntu-release- Unapproved: ceph (quantal-release/main) [0.48.1-0ubuntu2 => 0.48.2-0ubuntu1] (core)
[22:46] <SpamapS> sagewk: it's up to the release team now
[22:47] * loicd (~loic@magenta.dachary.org) has joined #ceph
[22:54] <nhm> boo, initial numbers for 4M IOs: 1226MB/s. Gotta try it with crc32c disabled.
[23:03] <nhm> probably should disable blktrace/collectl/perf too.
[23:12] <elder> Turn up the fans too.
[23:16] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:18] * loicd (~loic@82.235.173.177) has joined #ceph
[23:22] * nhm (~nhm@67-220-20-222.usiwireless.com) Quit (Ping timeout: 480 seconds)
[23:26] <sagelap> spamaps: good view from here: http://cleverdevil-public.objects.dreamhost.com/view-from-the-top.jpg (tho i was about 12 floors down.. still good though!)
[23:27] * nhm (~nhm@67-220-20-222.usiwireless.com) has joined #ceph
[23:31] <mikeryan> sagelap: were you guys watching the shuttle?
[23:32] <dmick> we were, during pizza lunch, from our floor
[23:32] <dmick> made two loops around the area
[23:32] * slang (~slang@38.122.20.226) has joined #ceph
[23:32] <dmick> well two passes anyway
[23:32] <mikeryan> i was holed up at nasa ames for a good two hours
[23:32] <mikeryan> we got one half-assed pass
[23:32] <SpamapS> I was about a minute too late to see it pass directly over the hollywood sign apparently. Still saw it circling Griffith
[23:32] * slang (~slang@38.122.20.226) Quit ()
[23:33] <mikeryan> still qualifies as the oldest 747 i've ever seen in operation
[23:33] <mikeryan> oh yeah, the shuttle was cool too ;)
[23:33] <SpamapS> does make you wonder what will happen to the 747
[23:33] * jjgalvez (~jjgalvez@12.248.40.138) Quit (Read error: Connection reset by peer)
[23:34] * slang (~slang@38.122.20.226) has joined #ceph
[23:34] <mikeryan> NASA's pretty famous for leaving its outdated planes parked at its various and sundry bases
[23:35] <mikeryan> they also operate some laughably old planes, probably because they take very good care of them
[23:35] <mikeryan> http://www-gte.larc.nasa.gov/img/DC-8.jpg
[23:35] <mikeryan> like that guy
[23:36] <pentabular> jealous of the view from Inktank. any pics?
[23:36] * slang (~slang@38.122.20.226) Quit (Remote host closed the connection)
[23:36] <dmick> http://framework.latimes.com/2012/09/19/endeavour-photos/
[23:37] <pentabular> ..I meant you guys' :)
[23:38] <mikeryan> http://latimesphoto.files.wordpress.com/2012/09/shuttle13.jpg
[23:38] <pentabular> 'your guy's" gOOsh!
[23:38] <mikeryan> pentabular: the second tallest building in the picture is where inktank is located
[23:39] <pentabular> very neat one
[23:39] <elder> (It's basically the building in the middle of that picture. Aon tower.)
[23:39] * nhm (~nhm@67-220-20-222.usiwireless.com) Quit (Ping timeout: 480 seconds)
[23:40] <pentabular> nobody snapped one from pizza lunch??
[23:42] <dmick> pentabular: the one sagewk posted is from the top of our building, with a few of our guys
[23:43] <dmick> http://cleverdevil-public.objects.dreamhost.com/view-from-the-top.jpg
[23:43] * pentabular scrolls up
[23:43] <dmick> haven't seen many that people here took that were particularly good
[23:43] <elder> No, but where's the picture of the pizza?
[23:43] * sagelap (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[23:45] * sagelap (~sage@2607:f298:a:607:64c8:48b:8f5f:4504) has joined #ceph
[23:47] <pentabular> wow.. that is surprisingly far away. Neat shot, though, sagewk
[23:48] <dmick> far away from what? It's right at our building :)
[23:48] * jjgalvez (~jjgalvez@12.248.40.138) has joined #ceph
[23:55] * danieagle (~Daniel@177.43.213.15) has joined #ceph
[23:56] * slang (~slang@2607:f298:a:607:fc7e:bfaf:6f9b:d9be) has joined #ceph
[23:56] * slang (~slang@2607:f298:a:607:fc7e:bfaf:6f9b:d9be) has left #ceph
[23:56] * slang (~slang@2607:f298:a:607:fc7e:bfaf:6f9b:d9be) has joined #ceph
[23:58] * benpol (~benp@garage.reed.edu) Quit (Read error: Connection reset by peer)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.