#ceph IRC Log

IRC Log for 2012-02-08

Timestamps are in GMT/BST.

[0:08] <jmlowe> ok, I've moved everything aside and created a fresh osd.10, when should I use the repair command?
[0:24] <joshd> jmlowe: first wait to see if everything goes active+clean
[0:25] <jmlowe> slowly ticking down 2012-02-07 18:25:20.497293 pg v1425267: 2376 pgs: 2279 active+clean, 8 active+degraded+backfill, 15 active+clean+inconsistent, 68 active+backfill, 6 active+clean+scrubbing+inconsistent; 950 GB data, 2031 GB used, 18855 GB / 22004 GB avail; 21079/523440 degraded (4.027%)
[0:26] <jmlowe> then do I repair osd's that are throwing errors?
[0:26] <joshd> jmlowe: yeah, repair the primaries for the pgs that are inconsistent
[0:27] <jmlowe> just to double check "[6,2]" is 6 primary 2 secondary?
[0:28] <joshd> right, it's the first one in the second list in pg dump
[0:29] <jmlowe> the acting column
[0:29] <joshd> yeah
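The rule joshd describes above (the primary is the first OSD in the second bracketed list, the acting column) can be sketched in Python. This is only an illustration: the function names are hypothetical, and the column layout is assumed from the pg dump lines pasted later in this log, not taken from any Ceph API.

```python
import re

def acting_primary(pg_line):
    """Return (pgid, state, primary_osd) for one pg dump line, or None."""
    fields = pg_line.split()
    # Bracketed OSD lists in order of appearance: up set first, acting set second.
    osd_lists = re.findall(r"\[([\d,]*)\]", pg_line)
    if not fields or len(osd_lists) < 2 or not osd_lists[1]:
        return None
    # Assumption: the state is the first lowercase, '+'-joined field after the pg id.
    state = next(
        (f for f in fields[1:] if re.fullmatch(r"[a-z]+(?:\+[a-z]+)*", f)),
        None,
    )
    primary = int(osd_lists[1].split(",")[0])
    return fields[0], state, primary


def primaries_to_repair(pg_lines):
    """Collect primary OSD ids of pgs whose state includes 'inconsistent'."""
    primaries = set()
    for line in pg_lines:
        parsed = acting_primary(line)
        if parsed and parsed[1] and "inconsistent" in parsed[1].split("+"):
            primaries.add(parsed[2])
    return sorted(primaries)
```

Feeding it the saved text of a pg dump would give the list of OSD ids to consider for repair; the repair itself is still issued by hand.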
[0:30] * fghaas (~florian@85-127-86-65.dynamic.xdsl-line.inode.at) has left #ceph
[0:36] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[0:45] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (Remote host closed the connection)
[0:48] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) has joined #ceph
[0:55] * fronlius_ (~fronlius@f054115095.adsl.alicedsl.de) Quit (Quit: fronlius_)
[0:58] * lollercaust (~paper@85.Red-83-41-151.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[0:58] * guido (~guido@mx1.hannover.ccc.de) Quit (Server closed connection)
[0:58] * guido (~guido@mx1.hannover.ccc.de) has joined #ceph
[1:04] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[1:22] * joao (~joao@89.181.154.123) Quit (Quit: joao)
[1:33] <jmlowe> looks like I'm stuck here
[1:33] <jmlowe> 2012-02-07 19:32:35.815727 pg v1427363: 2376 pgs: 2352 active+clean, 2 active+degraded+backfill, 21 active+clean+inconsistent, 1 active+backfill; 950 GB data, 2025 GB used, 18861 GB / 22004 GB avail; 618/501571 degraded (0.123%)
[1:34] * yoshi (~yoshi@p8031-ipngn2701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:34] <jmlowe> 0.2dc 1 0 1 0 1114112 109000 109000 active+backfill 2025'2662 2188'174 [10,0] [10,0,5] 108'3227 2012-01-25 13:13:27.613815
[1:34] <jmlowe> 0.11e 551 0 469 0 2237364736 110217 110217 active+degraded+backfill
[1:34] <jmlowe> 1224'111354 2185'113173 [10,2] [2,10] 1224'111354 2012-02-07 16:16:06.732255
[1:34] <jmlowe> 0.73 272 0 148 0 1110159360 108682 108682 active+degraded+backfill
[1:34] <jmlowe> 2027'17499 2185'18426 [10,2] [2,10] 2027'17499 2012-02-07 16:16:06.952122
[1:35] * yoshi (~yoshi@p8031-ipngn2701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[1:49] * axisys (~axisys@ip68-98-189-233.dc.dc.cox.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * sjust (~sam@aon.hq.newdream.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * kirkland (~kirkland@74.126.19.140.static.a2webhosting.com) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * iggy (~iggy@theiggy.com) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * u3q (~ben@uranus.tspigot.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * guido (~guido@mx1.hannover.ccc.de) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * amichel (~amichel@salty.uits.arizona.edu) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * Meths (rift@2.25.213.150) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * eightyeight (~88@pthree.org) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * gregaf (~Adium@aon.hq.newdream.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * sagewk (~sage@aon.hq.newdream.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * gohko (~gohko@natter.interq.or.jp) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * eternaleye___ (~eternaley@195.215.30.181) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * acaos (~zac@209-99-103-42.fwd.datafoundry.com) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * grape_ (~grape@216.24.166.226) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * aneesh (~aneesh@122.248.163.3) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * nolan (~nolan@phong.sigbus.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * jmlowe (~Adium@c-98-223-195-84.hsd1.in.comcast.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * chutzpah (~chutz@216.174.109.254) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * bchrisman (~Adium@108.60.121.114) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * joshd (~joshd@aon.hq.newdream.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * Tv|work (~Tv|work@aon.hq.newdream.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * SpamapS (~clint@xencbyrum2.srihosting.com) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * __jt__ (~james@jamestaylor.org) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * MK_FG (~MK_FG@188.226.51.71) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * rosco (~r.nap@188.205.52.204) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * yehudasa (~yehudasa@aon.hq.newdream.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * ameen (~ameen@unstoppable.gigeservers.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * darkfader (~floh@188.40.175.2) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * nhm (~nh@68.168.168.19) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * ottod (~ANONYMOUS@li127-75.members.linode.com) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * ajm (adam@adam.gs) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * Sargun (~sargun@208-106-98-2.static.sonic.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * dmick (~dmick@aon.hq.newdream.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:49] * edwardw`away (~edward@ec2-50-19-100-56.compute-1.amazonaws.com) Quit (reticulum.oftc.net charon.oftc.net)
[1:50] * yoshi (~yoshi@u700041.xgsfmg22.imtp.tachikawa.mopera.net) has joined #ceph
[1:50] * guido (~guido@mx1.hannover.ccc.de) has joined #ceph
[1:50] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[1:50] * jmlowe (~Adium@c-98-223-195-84.hsd1.in.comcast.net) has joined #ceph
[1:50] * amichel (~amichel@salty.uits.arizona.edu) has joined #ceph
[1:50] * chutzpah (~chutz@216.174.109.254) has joined #ceph
[1:50] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[1:50] * Tv|work (~Tv|work@aon.hq.newdream.net) has joined #ceph
[1:50] * Meths (rift@2.25.213.150) has joined #ceph
[1:50] * eightyeight (~88@pthree.org) has joined #ceph
[1:50] * SpamapS (~clint@xencbyrum2.srihosting.com) has joined #ceph
[1:50] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[1:50] * axisys (~axisys@ip68-98-189-233.dc.dc.cox.net) has joined #ceph
[1:50] * __jt__ (~james@jamestaylor.org) has joined #ceph
[1:50] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[1:50] * gregaf (~Adium@aon.hq.newdream.net) has joined #ceph
[1:50] * rosco (~r.nap@188.205.52.204) has joined #ceph
[1:50] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[1:50] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[1:50] * yehudasa (~yehudasa@aon.hq.newdream.net) has joined #ceph
[1:50] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) has joined #ceph
[1:50] * ameen (~ameen@unstoppable.gigeservers.net) has joined #ceph
[1:50] * darkfader (~floh@188.40.175.2) has joined #ceph
[1:50] * MK_FG (~MK_FG@188.226.51.71) has joined #ceph
[1:50] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) has joined #ceph
[1:50] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[1:50] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[1:50] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[1:50] * eternaleye___ (~eternaley@195.215.30.181) has joined #ceph
[1:50] * acaos (~zac@209-99-103-42.fwd.datafoundry.com) has joined #ceph
[1:50] * aneesh (~aneesh@122.248.163.3) has joined #ceph
[1:50] * iggy (~iggy@theiggy.com) has joined #ceph
[1:50] * kirkland (~kirkland@74.126.19.140.static.a2webhosting.com) has joined #ceph
[1:50] * u3q (~ben@uranus.tspigot.net) has joined #ceph
[1:50] * grape_ (~grape@216.24.166.226) has joined #ceph
[1:50] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) has joined #ceph
[1:50] * edwardw`away (~edward@ec2-50-19-100-56.compute-1.amazonaws.com) has joined #ceph
[1:50] * Sargun (~sargun@208-106-98-2.static.sonic.net) has joined #ceph
[1:50] * ajm (adam@adam.gs) has joined #ceph
[1:50] * ottod (~ANONYMOUS@li127-75.members.linode.com) has joined #ceph
[1:50] * nhm (~nh@68.168.168.19) has joined #ceph
[1:51] <joshd> jmlowe: we might need '--debug-filestore 10 --debug-ms 1 --debug-osd 20'
[1:51] <joshd> jmlowe: if you could make a bug and attach the logs that'd be great
[1:52] * yoshi_ (~yoshi@p8031-ipngn2701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:52] <joshd> jmlowe: I'm not sure the best way to recover while you're hitting that bug
[1:59] * yoshi (~yoshi@u700041.xgsfmg22.imtp.tachikawa.mopera.net) Quit (Ping timeout: 480 seconds)
[2:04] <jmlowe> what are those options passed to?
[2:10] <jmlowe> So how helpful would it be to get on the nodes in this cluster?
[2:11] <joshd> jmlowe: pretty helpful
[2:12] <joshd> jmlowe: those options are for "ceph tell osd \* injectargs '--debug-filestore 10 --debug-ms 1 --debug-osd 20'", to change logging levels at runtime
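For anyone scripting this, the command joshd quotes can be assembled as an argument list. A minimal sketch, assuming the 2012-era syntax exactly as quoted above (newer Ceph releases may spell it differently); `build_injectargs` is a hypothetical helper name, and nothing here actually runs `ceph`.

```python
import shlex

def build_injectargs(levels, target="*"):
    """Build argv for: ceph tell osd <target> injectargs '<debug flags>'."""
    # One "--debug-<subsystem> <level>" pair per entry, in insertion order.
    flags = " ".join(f"--debug-{name} {level}" for name, level in levels.items())
    return ["ceph", "tell", "osd", target, "injectargs", flags]

argv = build_injectargs({"filestore": 10, "ms": 1, "osd": 20})
# Print a shell-quoted form for review before running it by hand.
print(shlex.join(argv))
```

Passing the list to `subprocess.run` without a shell would avoid the need to escape the `*`, which is why the target is kept as its own argument.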
[2:12] <jmlowe> send me your ssh key, throw an o between the j and m of my username here and append at iu edu
[2:12] <joshd> ok, thanks
[2:17] <jmlowe> ok, weird, just now broke free a little bit
[2:17] <jmlowe> 2012-02-07 20:17:16.755041 pg v1427386: 2376 pgs: 2353 active+clean, 1 active+degraded+backfill, 21 active+clean+inconsistent, 1 active+backfill; 950 GB data, 2025 GB used, 18861 GB / 22004 GB avail; 149/501571 degraded (0.030%)
[2:20] <joshd> maybe there's still a slow disk somewhere?
[2:21] <joshd> hmm, that's several epochs later
[2:21] <jmlowe> should be some mail coming your way with the details
[2:22] <joshd> got it
[2:25] <jmlowe> forgot to mention in the email there is a 3rd mon, you can find it in the ceph.conf and you should be able to jump through to it once you sudo
[2:25] <jmlowe> anything I missed?
[2:28] * jkenreich1lakelandregional1org (~jkenreich@9KCAADXAJ.tor-irc.dnsbl.oftc.net) has joined #ceph
[2:28] <joshd> looks good
[2:28] <joshd> I probably won't need to look at the other mon
[2:31] <jmlowe> back in 30
[2:33] <jkenreich1lakelandregional1org> jkenreich@lakelandregional.org== untouchable : I pwn U fuckers
[2:37] * jkenreich1lakelandregional1org is now known as juker
[2:38] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[2:44] <jmlowe> set password and checked sudo, probably fat fingered it before
[2:45] <joshd> great, thanks
[2:46] * juker (~jkenreich@9KCAADXAJ.tor-irc.dnsbl.oftc.net) Quit (Quit: Leaving)
[2:49] * Tv|work (~Tv|work@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[2:59] <joshd> jmlowe: is your ceph built with debugging symbols?
[3:01] <jmlowe> don't think so
[3:01] <joshd> I think this may be a bug that I saw before, but couldn't reproduce after another change that I thought fixed it
[3:02] <joshd> it looks like the pg stuck in backfill is not requeueing itself for recovery, so it never finishes
[3:12] <jmlowe> ok, what's the best thing for me to do?
[3:14] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: No route to host)
[3:22] <joshd> jmlowe: if you care about the data, I might be able to fix the bug tomorrow, I'm not sure that there's a good way around it right now
[3:25] <jmlowe> I do care about the data
[3:26] <joshd> jmlowe: I'll save the logs, but I'm afraid there's not much I can do tonight - sage might have an idea for a work around, but he's gone
[3:28] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:33] * amichel (~amichel@salty.uits.arizona.edu) Quit (Quit: Bad news, everyone!)
[3:36] * chutzpah (~chutz@216.174.109.254) Quit (Quit: Leaving)
[3:47] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:49] * henrycc (~Henry@122.146.30.126) has joined #ceph
[3:51] <jmlowe> I'll pull off all the data I can
[3:58] * f4m8_ (~f4m8@lug-owl.de) Quit (Server closed connection)
[3:58] * f4m8_ (~f4m8@lug-owl.de) has joined #ceph
[4:05] * wonko_be (bernard@november.openminds.be) Quit (Server closed connection)
[4:05] * wonko_be (bernard@november.openminds.be) has joined #ceph
[4:40] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Ping timeout: 480 seconds)
[4:43] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[4:44] * henrycc (~Henry@122.146.30.126) Quit (Quit: Leaving)
[4:58] * henrycc (~Henry@122.146.30.126) has joined #ceph
[5:14] * notmyname (~notmyname@cpe-72-191-30-112.satx.res.rr.com) has joined #ceph
[5:16] <notmyname> watching the videos at http://www.pistoncloud.com/2012/02/ceph-lords-san-francisco-openstack-event/ right now
[5:22] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[5:39] * ninkotech_lite (~dp@ip-85-161-20-141.eurotel.cz) has joined #ceph
[5:51] * jmlowe (~Adium@c-98-223-195-84.hsd1.in.comcast.net) Quit (Quit: Leaving.)
[6:01] * jmlowe (~Adium@c-98-223-195-84.hsd1.in.comcast.net) has joined #ceph
[6:29] * aneesh (~aneesh@122.248.163.3) has left #ceph
[7:13] * ninkotech_lite_ (~dp@85.162.221.125) has joined #ceph
[7:19] * ninkotech_lite (~dp@ip-85-161-20-141.eurotel.cz) Quit (Ping timeout: 480 seconds)
[8:49] * henrycc (~Henry@122.146.30.126) Quit (Quit: Leaving)
[9:20] * fghaas (~florian@85-127-86-65.dynamic.xdsl-line.inode.at) has joined #ceph
[9:33] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[10:11] * yoshi_ (~yoshi@p8031-ipngn2701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:26] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[10:36] * mib_0cdbfp (4b653613@ircip1.mibbit.com) has joined #ceph
[10:37] <mib_0cdbfp> .
[10:38] * mib_0cdbfp (4b653613@ircip1.mibbit.com) Quit ()
[11:59] * joao (~joao@89-181-154-123.net.novis.pt) has joined #ceph
[12:11] * Azrael (~azrael@terra.negativeblue.com) has joined #ceph
[12:25] * pruby (~tim@leibniz.catalyst.net.nz) Quit (Server closed connection)
[12:25] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[12:32] * ninkotech_lite_ (~dp@85.162.221.125) Quit (Ping timeout: 480 seconds)
[12:35] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[13:33] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[13:45] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[13:45] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[14:13] * ninkotech_lite_ (~dp@85.162.187.105) has joined #ceph
[14:20] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[15:27] * notmyname (~notmyname@cpe-72-191-30-112.satx.res.rr.com) Quit (Remote host closed the connection)
[15:41] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: No route to host)
[15:55] * jmlowe1 (~Adium@c-98-223-195-84.hsd1.in.comcast.net) has joined #ceph
[15:55] * jmlowe (~Adium@c-98-223-195-84.hsd1.in.comcast.net) Quit (Read error: Connection reset by peer)
[15:56] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:06] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) Quit (Remote host closed the connection)
[16:15] * ninkotech_lite_ (~dp@85.162.187.105) Quit (Remote host closed the connection)
[16:16] * aa (~aa@r186-52-170-104.dialup.adsl.anteldata.net.uy) has joined #ceph
[16:38] * aa (~aa@r186-52-170-104.dialup.adsl.anteldata.net.uy) Quit (Ping timeout: 480 seconds)
[16:55] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[16:59] * notmyname (~notmyname@50.56.228.64) has joined #ceph
[17:06] * Tv|work (~Tv|work@aon.hq.newdream.net) has joined #ceph
[17:20] * fghaas (~florian@85-127-86-65.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[17:27] <jmlowe1> anybody around?
[17:38] <nhm> jmlowe1: yes, though I'm not sure that will help you. ;)
[17:39] <jmlowe1> just wondering if there is any more useful information to be gotten from my broken ceph cluster
[17:42] <nhm> jmlowe1: Ah, I saw that you and joshd were working on it last night.
[17:42] <nhm> Hopefully Sage has some ideas.
[17:43] <jmlowe1> at this point I think it's unrecoverable, but there may be some useful debugging that could happen
[17:45] <nhm> jmlowe1: Were you guys able to track down any specifics as to why it broke?
[17:49] <jmlowe1> there was a bug that was thought to be solved, stuck in active+backfill following an osd crash; when I was trying to force the rebuild I rebooted into a 3.2 kernel that is broken for my raid controller, the raid controller would only write at 100kbs and couldn't keep up so the objects became severely trashed
[17:49] <jmlowe1> booting into a working kernel I had inconsistent osd's
[17:49] <nhm> yikes
[17:49] <nhm> what raid controller?
[17:50] <jmlowe1> hp p800
[17:50] <jmlowe1> this kernel http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2.1-precise/
[17:51] <jmlowe1> I've seen disk io peak at 800MBs on the stock ubuntu 3.0 kernel from oneiric
[17:53] <nhm> hrm, fusion MPT driver?
[17:53] <jmlowe1> sounds right
[17:53] <nhm> That's good to know to watch out for.
[17:53] <jmlowe1> noticed the change logs have a bunch of "we are doing this because the windows driver does it"
[17:53] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[17:53] <jmlowe1> doesn't sound like a good idea to me
[17:55] <nhm> I'm all about getting rid of raid controllers.
[17:55] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:56] <jmlowe1> I have 24x1tb disks in msa 60 chassis so I'm trying to recycle
[17:58] <nhm> A lot of the lustre problems we've had on our production lustre deployments have been due to raid controller hardware/driver failures.
[17:58] <nhm> er s/lustre problems/problems
[17:59] <nhm> It's one of the things that will be a huge advantage for ceph imho.
[17:59] <jmlowe1> ddn is the bane of our existence, we are currently down and will experience some data loss on our lustre install due to ddn problems
[18:00] <nhm> jmlowe1: we've only got a couple of 6620s from DDN which are not running lustre.
[18:02] <jmlowe1> we have the 9k series
[18:02] <jmlowe1> not sure the exact model we are using these days
[18:03] <jmlowe1> it's a perfect match for lustre, they both value speed over safety
[18:03] <nhm> jmlowe1: well, speed so long as you are doing large transfer IO with minimal metadata operations. ;)
[18:07] <nhm> jmlowe1: so out of curiosity, how do you have the drives on your P800 configured?
[18:10] <jmlowe1> wouldn't let me do jbod so I have 6 raid 0's
[18:10] <jmlowe1> 12 slots in a chassis
[18:12] <gregaf> raid0
[18:12] <gregaf> calling that RAID is worse than calling RAID a backup
[18:12] <gregaf> ;)
[18:13] <nhm> if you don't want raid, you might be able to load the "it" version of the firmware for that card.
[18:13] <jmlowe1> "it"?
[18:14] <nhm> initiator target. From what I understand it's basically the "non-raid" firmware for LSI controllers.
[18:16] <jmlowe1> ah
[18:16] <jmlowe1> I'm going to try 3.2.5 while there isn't any more damage I can do
[18:16] <nhm> http://kb.lsi.com/KnowledgebaseArticle16266.aspx
[18:47] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: Connection reset by peer)
[18:58] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[19:02] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:06] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Quit: fronlius)
[19:09] * chutzpah (~chutz@216.174.109.254) has joined #ceph
[19:10] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:14] * lollercaust (~paper@85.Red-83-41-151.dynamicIP.rima-tde.net) has joined #ceph
[19:22] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) has joined #ceph
[19:24] * fronlius (~fronlius@f054115095.adsl.alicedsl.de) has joined #ceph
[19:44] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:52] * lollercaust (~paper@85.Red-83-41-151.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[19:55] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:17] * vodka (~paper@85.Red-83-41-151.dynamicIP.rima-tde.net) has joined #ceph
[20:26] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[21:15] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Quit: adjohn)
[21:30] * jdwilson (~jdwilson@smtp.builderadius.com) has joined #ceph
[21:36] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[21:48] * dwm__ (~dwm@2001:ba8:0:1c0:225:90ff:fe08:9150) Quit (Server closed connection)
[21:48] * dwm_ (~dwm@2001:ba8:0:1c0:225:90ff:fe08:9150) has joined #ceph
[21:55] * yehudasa_ (~yehudasa@aon.hq.newdream.net) has joined #ceph
[21:56] * gregaf1 (~Adium@aon.hq.newdream.net) has joined #ceph
[21:56] * sjust1 (~sam@aon.hq.newdream.net) has joined #ceph
[21:57] * sagewk1 (~sage@aon.hq.newdream.net) has joined #ceph
[21:57] * joshd1 (~joshd@aon.hq.newdream.net) has joined #ceph
[22:00] * Tv|work (~Tv|work@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:02] * sjust (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:02] * joshd (~joshd@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:02] * gregaf (~Adium@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:02] * sagewk (~sage@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:03] * yehudasa (~yehudasa@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:11] * fghaas (~florian@85-127-86-65.dynamic.xdsl-line.inode.at) has joined #ceph
[22:11] * fghaas (~florian@85-127-86-65.dynamic.xdsl-line.inode.at) has left #ceph
[22:11] <jdwilson> are there any good docs on balancing osds?
[22:12] <joshd1> what do you mean by balancing?
[22:13] <elder> http://www.youtube.com/watch?v=LS70x-bQRWI
[22:22] * Tv|work (~Tv|work@aon.hq.newdream.net) has joined #ceph
[22:25] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:32] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph
[22:36] * fronlius_ (~fronlius@e176053242.adsl.alicedsl.de) has joined #ceph
[22:36] * joshd1 (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[22:36] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[22:38] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:41] * fronlius (~fronlius@f054115095.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[22:41] * fronlius_ is now known as fronlius
[22:50] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:59] * joao (~joao@89-181-154-123.net.novis.pt) has left #ceph
[23:00] * joao (~joao@89-181-154-123.net.novis.pt) has joined #ceph
[23:01] <jdwilson> plate spinning aside, i mean how do you decide how to spread your data across osds, do i need each osd to have the same amount of space as all the others, how do i make sure if a node goes down i'll be able to recover, etc
[23:06] <nhm> jdwilson: I'm not really an expert yet, but my understanding is that data will be spread across OSDs in a pseudo-random fashion in 4MB chunks by default while replicating data based on the topology specified in the crushmap.
[23:06] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[23:07] <nhm> jdwilson: data should be spread more or less evenly, and when new OSDs are added a subset of the data should be remapped to the new OSD to rebalance the distribution of data.
[23:10] <joshd> jdwilson: for different sized osds, you'd set the weights in your crushmap relative to the size of the osds, so they get the right proportion of the data
[23:11] <joshd> jdwilson: see http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH
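The weighting idea joshd describes can be sketched as simple arithmetic. This assumes, purely for illustration, that each OSD's weight is set equal to its capacity in TB; `expected_shares` is a hypothetical helper, and real placement is pseudo-random, so actual usage only approximates these fractions.

```python
def expected_shares(osd_sizes_tb):
    """Map OSD name -> (weight, expected fraction of the data)."""
    total = sum(osd_sizes_tb.values())
    if total <= 0:
        raise ValueError("total capacity must be positive")
    # Assumption: weight == size in TB, so the share is size / total size.
    return {name: (size, size / total) for name, size in osd_sizes_tb.items()}

shares = expected_shares({"osd.0": 1.0, "osd.1": 2.0, "osd.2": 1.0})
for name, (weight, fraction) in shares.items():
    print(f"{name}: weight {weight}, about {fraction:.0%} of the data")
```

With those example sizes, the 2 TB OSD would be expected to hold roughly twice the data of each 1 TB OSD.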
[23:12] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:22] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:30] <Tv|work> remote console won't let me type, and remote serial console says "Serial Device 2 is currently in use" *sigh*
[23:40] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: No route to host)
[23:46] * fronlius (~fronlius@e176053242.adsl.alicedsl.de) Quit (Quit: fronlius)
[23:50] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:51] * notmyname (~notmyname@50.56.228.64) Quit (Remote host closed the connection)
[23:59] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Remote host closed the connection)
[23:59] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.