#ceph IRC Log


IRC Log for 2013-03-20

Timestamps are in GMT/BST.

[0:01] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[0:02] * ninkotech_ (~duplo@ip-89-102-24-167.net.upcbroadband.cz) has joined #ceph
[0:02] * ninkotech_ (~duplo@ip-89-102-24-167.net.upcbroadband.cz) Quit ()
[0:05] * ninkotech_ (~duplo@ip-89-102-24-167.net.upcbroadband.cz) has joined #ceph
[0:05] * ninkotech_ (~duplo@ip-89-102-24-167.net.upcbroadband.cz) Quit ()
[0:07] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[0:10] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:14] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:16] * ScOut3R (~ScOut3R@c83-249-233-227.bredband.comhem.se) Quit (Remote host closed the connection)
[0:20] * al (d@niel.cx) has joined #ceph
[0:23] * jlogan2 (~Thunderbi@2600:c00:3010:1:8c00:81c9:796a:9e97) has joined #ceph
[0:27] * jlogan (~Thunderbi@2600:c00:3010:1:74e2:3ecb:40cd:3b85) Quit (Ping timeout: 480 seconds)
[0:30] * ninkotech_ (~duplo@ip-89-102-24-167.net.upcbroadband.cz) has joined #ceph
[0:35] * LeaChim (~LeaChim@5ad4a53c.bb.sky.com) Quit (Ping timeout: 480 seconds)
[0:39] * diegows (~diegows@190.190.2.126) has joined #ceph
[0:46] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[0:47] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:55] * tnt (~tnt@54.211-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[0:57] * ninkotech_ (~duplo@ip-89-102-24-167.net.upcbroadband.cz) Quit (Remote host closed the connection)
[0:58] * ninkotech_ (~duplo@ip-89-102-24-167.net.upcbroadband.cz) has joined #ceph
[1:00] * jjgalvez1 (~jjgalvez@12.248.40.138) has joined #ceph
[1:02] * jjgalvez1 (~jjgalvez@12.248.40.138) Quit ()
[1:05] * jjgalvez (~jjgalvez@12.248.40.138) Quit (Ping timeout: 480 seconds)
[1:13] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[1:20] <nz_monkey_> nhm_: Hey Mark, is there any information on common configurations and what performance to expect ? Similar to what Nexenta have done e.g. http://www.nexenta.com/corp/solutions/dell-and-nexenta-storage-solutions
[1:21] * alram (~alram@38.122.20.226) Quit (Ping timeout: 480 seconds)
[1:30] * xiaoxi (~xiaoxiche@jfdmzpr03-ext.jf.intel.com) has joined #ceph
[1:30] <xiaoxi> is ceph.com down?
[1:31] <dmick> xiaoxi: yes
[1:31] <dmick> it's being worked on
[1:35] * noob2 (~cjh@173.252.71.3) Quit (Quit: Leaving.)
[1:37] * jtang1 (~jtang@79.97.135.214) Quit (Quit: Leaving.)
[1:39] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[1:42] * ninkotech_ (~duplo@ip-89-102-24-167.net.upcbroadband.cz) Quit (Quit: Konversation terminated!)
[1:47] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[2:03] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[2:03] * chutzpah (~chutz@199.21.234.7) Quit (Quit: Leaving)
[2:09] <iggy> oooh, I wonder if it's an announcement/upgrade or if somebody just broke something
[2:10] <dmick> iggy: DC issues.
[2:11] <iggy> awww :( less exciting (at least for me... I'm sure the person trying to fix it feels differently)
[2:24] <nz_monkey_> dmick: lol I was blaming our upstream DNS, glad to know its not
[2:28] <dmick> nod
[2:33] <yehuda_hm> iggy: you've been since forever, hoping for an exciting announcement?
[2:33] <yehuda_hm> been here I mean
[2:56] <iggy> I don't know... sites being down these days gets me all excited
[3:14] * jlogan2 (~Thunderbi@2600:c00:3010:1:8c00:81c9:796a:9e97) Quit (Ping timeout: 480 seconds)
[3:20] * Cube (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[4:34] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) has joined #ceph
[4:34] * rturk-away (~rturk@ip-64-90-56-3.dreamhost.com) has joined #ceph
[4:44] * rturk-away (~rturk@ip-64-90-56-3.dreamhost.com) Quit (Ping timeout: 480 seconds)
[4:44] * rturk-away (~rturk@ds2390.dreamservers.com) has joined #ceph
[4:45] * noob2 (~cjh@pool-96-249-204-90.snfcca.dsl-w.verizon.net) has joined #ceph
[4:47] * noob2 (~cjh@pool-96-249-204-90.snfcca.dsl-w.verizon.net) Quit (Read error: Connection reset by peer)
[5:14] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[5:19] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) Quit (Quit: Leaving.)
[5:25] * SvenPHX-home (~scarter@71-209-155-46.phnx.qwest.net) Quit (Remote host closed the connection)
[5:44] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[5:56] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[5:56] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[5:59] * StormBP (~StormBP@109.195.66.120) has left #ceph
[6:00] * StormBP (~StormBP@109.195.66.120) has joined #ceph
[6:00] * StormBP (~StormBP@109.195.66.120) has left #ceph
[6:02] * StormBP (~StormBP@109.195.66.120) has joined #ceph
[6:16] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[6:47] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) has joined #ceph
[7:09] * capri (~capri@212.218.127.222) has joined #ceph
[7:31] <dmick> ceph.com and inktank.com are back, fwiw
[7:35] * The_Bishop (~bishop@2001:470:50b6:0:658b:f0ee:70f9:7308) Quit (Ping timeout: 480 seconds)
[7:43] * The_Bishop (~bishop@2001:470:50b6:0:25a0:cc49:4f3d:68df) has joined #ceph
[7:49] * eschnou (~eschnou@223.86-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[7:55] * tnt (~tnt@54.211-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:09] * eschnou (~eschnou@223.86-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[8:15] * gregorg_taf (~Greg@78.155.152.6) has joined #ceph
[8:15] * StormBP (~StormBP@109.195.66.120) Quit (Read error: Connection reset by peer)
[8:15] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Read error: Connection reset by peer)
[8:15] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[8:16] * gregorg (~Greg@78.155.152.6) Quit (Read error: Connection reset by peer)
[8:55] * loicd (~loic@magenta.dachary.org) has joined #ceph
[9:04] * sleinen (~Adium@2001:620:0:26:bdf8:7d1f:958e:607d) has joined #ceph
[9:04] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[9:05] * gerard_dethier (~Thunderbi@85.234.217.115.static.edpnet.net) has joined #ceph
[9:13] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[9:15] * guocai (~dinglbo@171.216.81.164) has joined #ceph
[9:15] * jluis (~JL@89.181.156.206) has joined #ceph
[9:16] * tnt (~tnt@54.211-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:16] <guocai> hello everyone. has anyone run into this error: "mount error 5 = Input/output error"?
[9:17] <vipr_> /ignore -channels #ceph * JOINS PARTS QUITS NICKS
[9:18] <vipr_> :D
[9:19] <guocai> can anyone help me?
[9:20] <guocai> i am using the bobtail version
[9:22] * joao (~JL@89.181.156.206) Quit (Ping timeout: 480 seconds)
[9:24] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[9:28] * vipr_ is now known as vipr
[9:30] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:32] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:33] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[9:37] * gerard_dethier (~Thunderbi@85.234.217.115.static.edpnet.net) has left #ceph
[9:40] * dosaboy (~gizmo@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[9:41] * l0nk (~alex@83.167.43.235) has joined #ceph
[9:42] * guocai (~dinglbo@171.216.81.164) has left #ceph
[9:42] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: If your not living on the edge, you're taking up too much space)
[9:45] * dosaboy1 (~gizmo@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[9:48] * jtang1 (~jtang@79.97.135.214) Quit (Quit: Leaving.)
[9:50] * dosaboy (~gizmo@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Ping timeout: 484 seconds)
[9:52] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:54] * LeaChim (~LeaChim@5ad4a53c.bb.sky.com) has joined #ceph
[9:58] * ScOut3R (~ScOut3R@c83-249-245-183.bredband.comhem.se) has joined #ceph
[10:08] * scuttlemonkey (~scuttlemo@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[10:08] * ChanServ sets mode +o scuttlemonkey
[10:22] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[10:34] <BillK> guocai: I see this when ceph is too busy ... usually able to mount ok when its sorted itself out (busy after startup, major migration etc.)
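A quick sanity check for the "mount error 5" case above is to confirm the cluster has settled before retrying the mount; a minimal sketch using the standard ceph CLI (monitor address and paths are illustrative):

    ceph health          # should eventually report HEALTH_OK
    ceph -s              # shows mon quorum, OSD up/in counts and PG states
    # retry the kernel mount once PGs are active+clean
    mount -t ceph 192.168.10.1:6789:/ /mnt/ceph -o name=admin,secretfile=/etc/ceph/admin.secret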
[10:36] * dosaboy1 (~gizmo@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Quit: Leaving.)
[10:45] * scuttlemonkey (~scuttlemo@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Ping timeout: 480 seconds)
[10:46] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[10:46] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[10:50] * mib_95n5o2 (57ee8a78@ircip2.mibbit.com) has joined #ceph
[10:50] <mib_95n5o2> hello
[10:50] * mib_95n5o2 (57ee8a78@ircip2.mibbit.com) has left #ceph
[10:51] * noobie (57ee8a78@ircip2.mibbit.com) has joined #ceph
[10:59] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[11:01] * sleinen (~Adium@2001:620:0:26:bdf8:7d1f:958e:607d) Quit (Quit: Leaving.)
[11:01] * sleinen (~Adium@130.59.94.204) has joined #ceph
[11:03] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[11:06] * mcclurmc (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[11:09] * janisg (~troll@85.254.50.23) Quit (Ping timeout: 480 seconds)
[11:09] * sleinen (~Adium@130.59.94.204) Quit (Ping timeout: 480 seconds)
[11:09] * sleinen (~Adium@2001:620:0:26:4d09:fc82:325e:1177) has joined #ceph
[11:19] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[11:21] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) Quit (Quit: Leaving.)
[11:25] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (Read error: Connection reset by peer)
[11:26] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[11:26] * janisg (~troll@85.254.50.23) has joined #ceph
[11:39] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[11:43] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[11:51] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) has joined #ceph
[12:01] * jtang1 (~jtang@2001:770:10:500:84ca:85e1:17d:1ca) has joined #ceph
[12:03] * jtang2 (~jtang@2001:770:10:500:35d6:21a0:3b85:18db) has joined #ceph
[12:03] * sleinen (~Adium@2001:620:0:26:4d09:fc82:325e:1177) Quit (Quit: Leaving.)
[12:03] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[12:08] * loicd (~loic@lvs-gateway1.teclib.net) has joined #ceph
[12:09] * sleinen (~Adium@130.59.94.204) has joined #ceph
[12:09] * jtang1 (~jtang@2001:770:10:500:84ca:85e1:17d:1ca) Quit (Ping timeout: 480 seconds)
[12:10] * sleinen1 (~Adium@2001:620:0:26:a4f7:7127:35d:f47d) has joined #ceph
[12:17] * sleinen (~Adium@130.59.94.204) Quit (Ping timeout: 480 seconds)
[12:27] * rturk-away (~rturk@ds2390.dreamservers.com) Quit (Ping timeout: 480 seconds)
[12:32] <topro> how to specify ceph fuse mount option "-r some_root_directory" in fstab?
[12:34] * BillK (~BillK@124-148-197-216.dyn.iinet.net.au) Quit (Read error: Operation timed out)
[12:41] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[12:43] * scuttlemonkey (~scuttlemo@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[12:43] * ChanServ sets mode +o scuttlemonkey
[12:45] * BillK (~BillK@124-148-94-74.dyn.iinet.net.au) has joined #ceph
[12:47] <Psi-jack> So, ceph's website is down.
[12:52] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[12:54] <scuttlemonkey> psi-jack: dreamhost having datacenter problems
[12:58] <janos> does linkedin use dreamhost? ;)
[12:58] <janos> they are having issues too
[12:59] <scuttlemonkey> hah
[12:59] <scuttlemonkey> http://www.dreamhoststatus.com/2013/03/19/power-disruption-affecting-us-west-data-center-irvine-ca/
[12:59] <absynth> redundant power, anyone?
[12:59] <scuttlemonkey> looks like the data center had ups issues
[13:00] <Psi-jack> heh
[13:00] <Psi-jack> Wow. :)
[13:00] <absynth> is that your own facility or a colo?
[13:00] <scuttlemonkey> Update March 19th 6:30pm: As of now we have two network devices, that handle traffic internal to the data center, that are down. One main one, and one backup. Our vendor has said, that BOTH devices were fried during the power outage, and are suggesting we RMA BOTH of them. We are in the process of deploying a spare to that location and we estimate that will take 1-2 hours. We will continue to update as we get more information.
[13:00] <scuttlemonkey> dunno
[13:00] <scuttlemonkey> I don't work for dreamhost :)
[13:00] <absynth> sandboxing for the win, huh? ;)
[13:01] <scuttlemonkey> poor guys, sounds like many have been up all night fighting with this
[13:02] <absynth> i like how they plan to have unplanned issues
[13:02] <absynth> This maintenance is to prevent possible unplanned problems with the server today.
[13:03] <absynth> but yeah, power outages at a scale are really annoying, since many of the long-running boxes might not come back up _at all_
[13:04] <janos> ugh. i dont like that fear
[13:05] <janos> have a machine with 5+ years uptime that i was afraid of after a while
[13:05] <janos> have/had
[13:05] <absynth> there's a video circulating in the german datacenter scene
[13:05] <janos> planned it out of production much to my joy
[13:05] <absynth> where some guys are moving a live server between colocations
[13:05] <absynth> in the subway
[13:05] <janos> hahah
[13:05] <scuttlemonkey> hah
[13:05] <absynth> legend has it that it was one of those machines where you do _not_ want to risk downtime
[13:05] <janos> redundant psu and multiple ups's?
[13:06] <janos> SWAP NOW!
[13:06] <absynth> yeah, they had a little cart stacked with car batteries and the machine or something
[13:06] <janos> hahaha
[13:06] <janos> awesome
[13:06] <scuttlemonkey> wow
[13:06] <scuttlemonkey> that's excellent
[13:06] <absynth> checking if i can find it
[13:07] <absynth> http://www.youtube.com/watch?v=vQ5MA685ApE
[13:07] <absynth> german unfortunately
[13:08] <absynth> basically, they soldered a redundant power supply to the main board _in production_
[13:09] <scuttlemonkey> o_0
[13:09] <jluis> and we're down again
[13:10] <absynth> you don't have offsite backup for the websites, right?
[13:10] <absynth> otherwise i'd fire up a couple VMs and you could switch DNS
[13:10] <scuttlemonkey> jluis: yeah, DH having a rough go of it
[13:10] <jluis> scuttlemonkey, I feel like heading to Irvine and slap the data center folk with a huge 'redundant power supply for dummies'
[13:11] <scuttlemonkey> haha
[13:11] <jluis> not that I understand half of it
[13:11] <absynth> jluis: issue is, redundancy costs $$$. and you don't want DH's controller slapping you with "corporate finance for dummies" :)
[13:11] <scuttlemonkey> sounds like the issue was more along the lines of the power outage surged and fried both networking devices
[13:11] <jluis> I just know I lost contact with the planas and the tracker :(
[13:11] <absynth> jluis: good weather in lisboa?
[13:12] <jluis> absynth, few clouds, otherwise amazing
[13:12] <absynth> you know what to do then
[13:12] <scuttlemonkey> that video was pure, unadulterated awesome
[13:12] <absynth> nice walk on the tejo and some sushi in that place directly at the river
[13:13] <jluis> grab the laptop and head to the nearest coffee shop?
[13:13] <jluis> :p
[13:13] <scuttlemonkey> and all just so they could keep their uptime number
[13:13] <jluis> absynth, I have absolutely no idea which sushi place you're talking about
[13:13] <jluis> but there's an awesome one in Costa
[13:13] <jluis> by the beach
[13:14] <jluis> and now I'm thinking about sushi and it's lunch time
[13:14] <absynth> my ex took me there
[13:14] <absynth> it was directly adjacent to the river, a 2 story place with lots of windows
[13:15] <absynth> there were some bars next to it
[13:16] <jluis> ah!
[13:16] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[13:17] <jluis> that's probably in the old shipping warehouses
[13:17] <jluis> 'docks'
[13:17] <jluis> they call it
[13:17] <jluis> anyway, don't know the place, but now I'm craving for sushi
[13:17] <jluis> good job absynth :p
[13:17] <absynth> yeah
[13:17] <absynth> i do my very best
[13:19] <topro> how do I specify the ceph fuse mount option "-r some_root_directory" in fstab?
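topro's question goes unanswered in the log; a hedged sketch of how the -r root directory is usually expressed in fstab for ceph-fuse, assuming the fuse.ceph mount helper passes the client_mountpoint option through (user id and paths are illustrative):

    # hypothetical /etc/fstab entry: client_mountpoint plays the role of `ceph-fuse -r <dir>`
    id=admin,client_mountpoint=/some_root_directory  /mnt/ceph  fuse.ceph  defaults,_netdev  0 0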
[13:20] <absynth> so, where do i get sushi in this godforsaken place now?
[13:31] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:33] * rturk-away (~rturk@ip-64-90-56-3.dreamhost.com) has joined #ceph
[13:37] <jluis> absynth, I don't know where you're getting it, but I know I'm having it for lunch in roughly one hour :p
[13:41] * loicd1 (~loic@lvs-gateway1.teclib.net) has joined #ceph
[13:41] * loicd (~loic@lvs-gateway1.teclib.net) Quit (Read error: Connection reset by peer)
[13:41] * rturk-away (~rturk@ip-64-90-56-3.dreamhost.com) Quit (Ping timeout: 480 seconds)
[13:43] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[13:43] * rturk-away (~rturk@ds2390.dreamservers.com) has joined #ceph
[13:44] <absynth> jluis: damn your urban environment!
[13:44] <absynth> jluis: damn my rural environment!
[13:45] * scuttlemonkey (~scuttlemo@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Ping timeout: 480 seconds)
[13:45] * mcclurmc (~mcclurmc@firewall.ctxuk.citrix.com) has joined #ceph
[13:48] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[13:56] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) has joined #ceph
[13:57] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:59] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:02] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) Quit (Read error: Operation timed out)
[14:08] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[14:10] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[14:12] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[14:23] <vipr> haha
[14:24] <vipr> dat server migration
[14:24] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[14:36] * noobie (57ee8a78@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[14:36] * mib_g4r7wt (57ee8a78@ircip2.mibbit.com) has joined #ceph
[14:36] * mib_g4r7wt (57ee8a78@ircip2.mibbit.com) Quit ()
[14:42] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[14:44] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[14:46] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[14:50] * drokita (~drokita@199.255.228.128) has joined #ceph
[14:53] * gaveen (~gaveen@175.157.131.51) has joined #ceph
[14:53] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[14:55] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[15:01] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[15:07] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[15:08] <dspano> Not sure if anyone has let you know or not. tracker.ceph.com is down.
[15:10] <absynth> there's bigger issues than this
[15:10] <absynth> http://www.dreamhoststatus.com/2013/03/19/power-disruption-affecting-us-west-data-center-irvine-ca/
[15:10] <absynth> basically, dreamhost is just in the process of recovering from a total outage
[15:10] <absynth> anyone with ops, wanna /topic that URL or something?
[15:11] * loicd1 (~loic@lvs-gateway1.teclib.net) Quit (Quit: Leaving.)
[15:11] * markbby (~Adium@168.94.245.4) has joined #ceph
[15:11] * loicd (~loic@lvs-gateway1.teclib.net) has joined #ceph
[15:18] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[15:19] * diegows (~diegows@200-081-044-086.wireless.movistar.net.ar) has joined #ceph
[15:22] * sleinen1 (~Adium@2001:620:0:26:a4f7:7127:35d:f47d) Quit (Quit: Leaving.)
[15:22] * sleinen (~Adium@130.59.94.204) has joined #ceph
[15:25] * dosaboy (~gizmo@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[15:25] * loicd (~loic@lvs-gateway1.teclib.net) Quit (Ping timeout: 480 seconds)
[15:26] * barryo (~borourke@cumberdale.ph.ed.ac.uk) Quit (Quit: Leaving.)
[15:30] * Morg (b2f95a11@ircip4.mibbit.com) has joined #ceph
[15:30] * sleinen (~Adium@130.59.94.204) Quit (Ping timeout: 480 seconds)
[15:31] * barryo (~borourke@mable.ph.ed.ac.uk) has joined #ceph
[15:32] * markbby (~Adium@168.94.245.4) Quit (Remote host closed the connection)
[15:32] * sleinen (~Adium@user-23-14.vpn.switch.ch) has joined #ceph
[15:33] * markbby (~Adium@168.94.245.4) has joined #ceph
[15:34] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[15:40] * jlogan1 (~Thunderbi@2600:c00:3010:1:8c00:81c9:796a:9e97) has joined #ceph
[15:40] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[15:47] * portante (~user@66.187.233.206) has joined #ceph
[15:47] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[15:51] * dosaboy (~gizmo@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Read error: Connection reset by peer)
[15:55] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[16:00] * sleinen (~Adium@user-23-14.vpn.switch.ch) Quit (Quit: Leaving.)
[16:01] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[16:02] * sleinen (~Adium@user-23-9.vpn.switch.ch) has joined #ceph
[16:04] <joelio> Any peeps here have recommendations for s3 C libs perchance (appreciate plain old libcurl and some REST wranglings would do) - interested to see if there are any other options
[16:04] <absynth> hrrm, isn't there some comprehensive tools in python by amazon themselves?
[16:05] <absynth> and there's even a c lib
[16:05] <absynth> libs3
[16:05] <absynth> http://aws.amazon.com/developertools/1648
[16:06] <Robe> http://basho.com/riak-cs-is-now-open-source/
[16:06] <Robe> interesting
[16:07] <absynth> is "architected" even an actual verb?
[16:07] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[16:07] <absynth> never heard of anyone architecting a hous
[16:07] <absynth> +e
[16:09] <drokita> A house can be well architected
[16:11] <absynth> merriam webster doesn't know that verb
[16:12] <joelio> absynth: seen the last updated time on that S3 lib - obviously I googled about before coming here ;)
[16:12] <joelio> hence why I'm asking ;)
[16:12] <dspano> absynth: I wasn't aware of the outage. Guess that would explain why it's so quiet in here.
[16:13] <absynth> joelio: many of the amazon tools seem to have a rather old mtime
[16:13] <joelio> absynth: also was asking for a C library, not python. The app we're creating is written in C and needs to talk to Ceph via S3 :)
[16:13] <joelio> yea, maybe I should test first... just something that's not been updated for 5 years in such a quick moving environment makes me hesitant
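For the "plain old libcurl and some REST wranglings" route joelio mentions, here is a minimal C sketch that fetches a publicly readable object from an S3-compatible endpoint such as radosgw. The endpoint, bucket and key are made up; authenticated requests would additionally need an AWS-style Authorization header, which is the part libs3 computes for you.

    /* Minimal sketch: GET a publicly readable object from an S3-compatible
     * endpoint (e.g. radosgw) using plain libcurl.  Real buckets normally
     * require an AWS-style Authorization header; the endpoint, bucket and
     * key below are purely illustrative.
     * Build (assumption): cc s3get.c -lcurl -o s3get
     */
    #include <stdio.h>
    #include <curl/curl.h>

    /* stream the response body into the FILE* passed via CURLOPT_WRITEDATA */
    static size_t write_cb(char *data, size_t size, size_t nmemb, void *userdata)
    {
        return fwrite(data, size, nmemb, (FILE *)userdata);
    }

    int main(void)
    {
        FILE *out = fopen("object.out", "wb");
        CURL *curl;
        CURLcode rc;

        if (!out)
            return 1;

        curl_global_init(CURL_GLOBAL_DEFAULT);
        curl = curl_easy_init();
        if (!curl) {
            fclose(out);
            return 1;
        }

        /* hypothetical radosgw endpoint / bucket / key */
        curl_easy_setopt(curl, CURLOPT_URL, "http://rgw.example.com/mybucket/mykey");
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);

        rc = curl_easy_perform(curl);
        if (rc != CURLE_OK)
            fprintf(stderr, "GET failed: %s\n", curl_easy_strerror(rc));

        curl_easy_cleanup(curl);
        curl_global_cleanup();
        fclose(out);
        return rc == CURLE_OK ? 0 : 1;
    }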
[16:14] <absynth> watch your tone, if you don't want help, /ignore me and don't bitch at me
[16:14] <absynth> jeez
[16:14] * diegows (~diegows@200-081-044-086.wireless.movistar.net.ar) Quit (Ping timeout: 480 seconds)
[16:14] <absynth> dspano: i think everyone who is "expendable" is now working on dreamhost machines
[16:15] <joelio> absynth: umm? haha - been reading ember.js posts by any chance? :D
[16:15] <absynth> plus, many of the internal tools at inktank seem to be affected, too
[16:16] <dspano> absynth: What a harsh hump day. I read the posts in the link you posted. There's a lot of anger there.
[16:16] <absynth> dspano: i cannot even imagine how the techs are now feeling
[16:16] <absynth> probably held upright only by caffeine
[16:17] <dspano> No doubt.
[16:17] <jluis> absynth, couldn't help it: http://goo.gl/YrsEq
[16:17] <jluis> the view isn't that awesome though
[16:17] <absynth> jluis: wait, let me counter
[16:17] <darkfader> absynth: the good thing is after 30 hours you don't "feel" anything anymore :)
[16:17] <jluis> absynth, eh :p
[16:18] <dspano> darkfader: Hahahahaha!
[16:18] <absynth> i know, i had some data center night shifts myself
[16:19] <darkfader> damn, power failure 50 mins after they had it restored
[16:19] <darkfader> that's one for the management books
[16:19] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[16:19] <absynth> yeah, that's even worse than bounceback failure
[16:19] <darkfader> technically it's no biggie, but after a long time you start losing hope
[16:20] <darkfader> like, you basically expect it to continue like that for the rest of your life :)
[16:20] <absynth> and some customers will perceive the two separate outages as one, continuous issue
[16:20] <absynth> hah, that's how we felt with ceph, 3 months ago :o)
[16:20] <darkfader> absynth: depends on the sla's
[16:20] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[16:20] <darkfader> basically it's the same failure
[16:21] <dspano> I guess it could be worse. You could be working on an Alaskan crab fishing boat.
[16:21] <joelio> or dead
[16:21] <dspano> Lol!
[16:21] <darkfader> it's best to not give clearance all too early, but of course everyone is eager to get back onto their box
[16:22] <absynth> i mean, those crab guys...
[16:23] <absynth> it's kinda awesome but probably only in front of a TV screen
[16:23] <absynth> in a warm, cozy living room
[16:23] <absynth> jluis: https://owncloud.christopher-kunz.de/public.php?service=files&t=208f853f5beb952cab8d819bdc3b1aed
[16:23] <dspano> Yeah, I'm too much of a wuss for that crap.
[16:24] <jluis> absynth, freaking awesome
[16:24] <absynth> wanna switch?
[16:24] <jluis> but looks too cold for my taste
[16:24] <jluis> not really no :p
[16:24] <absynth> at least we have a reason to drink glögg again
[16:24] <jluis> you're welcome to come and join me though
[16:26] <absynth> i think the guys at WHD have it worse
[16:26] <absynth> they are in a *tent*
[16:26] <jluis> woot?
[16:27] <darkfader> absynth: wtf? that is outside?
[16:27] <darkfader> i'm suddenly glad i didn't go there
[16:27] <absynth> parts of it are, yeah
[16:27] <jluis> what happened to the nice building used last year?
[16:27] <absynth> jluis: you weren't present at the sit-down with sage and dona last year, right?
[16:27] <darkfader> moved to cloud
[16:27] <jluis> besides, last year it was warmer in Rust than it was in Lisbon
[16:27] <jluis> absynth, no
[16:27] <absynth> the bar-restaurant thingy, the round one, that's a tent
[16:28] <absynth> but ATM, there's 8.9C in rust, so it should be fine
[16:28] <darkfader> it was sunny here during the day but now it's headed for 5 or so
[16:28] <jluis> lol 9C being "fine"
[16:28] <darkfader> donotlike
[16:28] <absynth> jluis: compared to 1C and snowing, 9C is fine
[16:29] <jluis> right
[16:29] <jluis> I should stop talking to you guys about the weather; I always end up feeling like a pussy
[16:30] <absynth> well, until 5 mins ago when everyone else cancelled, i was about to go training
[16:30] <absynth> outside
[16:31] <absynth> because generally, the weather is fine - there's daylight
[16:31] <joelio> Freezing here too.. ironic it's been the spring equinox too
[16:31] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[16:47] <nhm_> nz_monkey_: ooh, that's nice. We actually just started working on defining some "standard" configurations that could be fulfilled by multiple vendors. The goal I think is to eventually get some numbers for them. It'd be neat to put a sheet like this together.
[16:48] <absynth> nhm_: did you ever try operating ceph on the intel ICH on-board sata controllers?
[16:48] <absynth> +OSDs
[16:48] <tnt> yehudasa: ping
[16:49] <nhm_> absynth: hrm, I think some of our QA testing nodes might just have ICH on-board.
[16:50] <nhm_> absynth: I forget if they are intel or AMD based.
[16:50] * stxShadow (~jens@p4FECED20.dip.t-dialin.net) has joined #ceph
[16:50] * zK4k7g (~zK4k7g@digilicious.com) has joined #ceph
[16:51] <barryo> some of my test hosts have ICH on board
[16:53] <absynth> there was a guy in here who ran OSDs off those, and had about 20% iowait permanently
[16:54] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[16:54] <barryo> mine is sitting at 1.80 just now
[16:55] * Morg (b2f95a11@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[16:58] <barryo> and the host that also holds my 4 vms is sitting at 8%
[16:58] * neerbeer (~Adium@208.254.28.100) has joined #ceph
[16:58] <absynth> that still sounds like too much, somehow
[16:58] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[16:59] <barryo> these are pretty old desktops
[16:59] <absynth> then again, maybe not too much
[16:59] <barryo> and one of the vms running on them is a pretty busy zabbix server
[17:00] <neerbeer> hello. Is it possible to use say, 4 compute nodes and let ceph rbd just use underlying local disk storage for osds. That way you wouldn't need separate 'compute' nodes and 'storage' nodes. Anyone tried this ? Or is there an openstack rbd architecture diagram somewhere.
[17:00] <absynth> neerbeer: we are colocating compute and OSD on one and the same node
[17:01] <tnt> we are too. All physical machines are Xen dom0 and each also host a OSD.
[17:01] <absynth> works like a charm
[17:01] <tnt> (OSD being in a VM, not directly in dom0)
[17:01] <barryo> I'll be separating mine, I have VMs on 3 separate networks, some I trust more than others
[17:02] <Vjarjadian> i'm hoping i can do that with proxmox... just waiting for my new server so i can try it
[17:02] <absynth> barryo: VM network != OSD network
[17:02] <absynth> that is a given
[17:02] <joelio> I thought (if possible) keeping OSDs on separate hardware (non-VM hosts) is best?
[17:02] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[17:02] <absynth> joelio: only if you want to minimize one system's failures bleeding into another one's
[17:03] <absynth> i.e. if your OSDs leak memory (which happens), you don't want VMs to suffer
[17:03] <absynth> and vice versa
[17:03] <tnt> joelio: well ... depends on your requirements and budget ...
[17:03] <absynth> but generally, i don't think that separating OSDs and VM hosts is considered a best practice
[17:03] <absynth> err, that was not put right
[17:04] <absynth> i meant to say: "i don't think that colocating OSDs and VMs is frowned upon"
[17:04] <tnt> what you can't do is have the kernel rbd client on the same logical machine as an OSD. That causes issues.
[17:05] <absynth> if you do colocate both, then adding another host gives you more VM capacity AND more ceph capacity, which is awesome
[17:05] <barryo> that's true, but the VMs I have need to be on separate physical networks
[17:05] <tnt> vlans not good enough ?
[17:05] <absynth> i was just about to say
[17:06] <absynth> openvswitch + openflow = done
[17:06] <tnt> or even different physical interface on the host is needed.
[17:06] <absynth> each vm gets their own vlan and you're set
[17:06] <joelio> + arbitrary security policy about not 'crossing the streams' == BOOM
[17:06] <joelio> :)
[17:06] <absynth> in addition, the OSDs should really, really, really be in a separate layer2 cloud
[17:08] <barryo> I've not really done much network design in the past, I'll need to read up on it
[17:08] <barryo> colocating does sound quite tempting
[17:09] <neerbeer> The purpose here is to just purchase a physical 'node' that adds compute and storage space.
[17:10] <barryo> It would certainly make things more affordable than separating storage and compute
[17:10] <absynth> neerbeer: then go for the colocation idea
[17:10] <absynth> most barebones have two or more eth anyway
[17:11] <absynth> so you put frontnet/vm frontnet on one, ceph on the other
[17:11] <absynth> with a separate switch if possible
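A minimal ceph.conf sketch of the split absynth describes, assuming each host has two interfaces (the subnets are illustrative):

    [global]
        ; clients and VM front traffic
        public network  = 192.168.10.0/24
        ; OSD replication and recovery traffic, ideally on its own switch
        cluster network = 192.168.20.0/24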
[17:11] <neerbeer> So perhaps a single physical host w/ Raid 5, single volume, running an osd on the hypervisor (kvm). The VMs would have rbd info in their xml/kvm config and one would need (preferably) just to maintain an odd number of OSDs/compute nodes if we're running osds on the compute nodes.
[17:11] <absynth> you are confusing OSDs and mons
[17:11] <absynth> OSDs do not need to be present in an odd number
[17:11] <absynth> mons do.
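To make the odd-number point concrete, a sketch of three monitor sections in ceph.conf, colocated on three of the combined compute/OSD hosts (hostnames and addresses are made up):

    [mon.a]
        host = node1
        mon addr = 192.168.10.1:6789
    [mon.b]
        host = node2
        mon addr = 192.168.10.2:6789
    [mon.c]
        host = node3
        mon addr = 192.168.10.3:6789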
[17:11] <neerbeer> Or just could just trunk all vlans and lacp bond the physical interfaces ..
[17:12] <neerbeer> @absynth : thanks for the clarification.
[17:12] <absynth> we did something like that and it fell at our feet during an ARP storm
[17:12] <absynth> so we decided to physically separate the OSD network from the rest of our net
[17:13] <neerbeer> @absynth: at what scale did you get an arp storm ?
[17:13] <absynth> i don't know the reason, but the effect was that the OSDs did not see each other anymore because the switch started shutting ports or something
[17:13] <absynth> s/know/remember the reason
[17:15] * sagelap (~sage@76.89.177.113) has joined #ceph
[17:15] <joelio> I still (personally) think splitting into compute and storage is a more robust plan if you can accommodate it. Means you can specify the configurations better and are less liable to take down both OSDs and/or hypervisors
[17:16] <absynth> whatever floats your boat, but see it this way
[17:16] <absynth> if a machine goes down, i have 4 OSDs down. that is mitigated by ceph automatically
[17:16] <absynth> the 4 downed VMs are just config files stored centrally, i can restart them on another node in seconds
[17:17] <absynth> if i have a current snapshot, i can even minimize the user impact to a moving cursor
[17:17] <barryo> or use pacemaker to do it automatically ;)
[17:17] <absynth> well, that... does not quite scale in our setup :)
[17:17] <absynth> the "4 downed vms" would rather be something ... bigger.
[17:18] <barryo> I have a total of 60 vm's that I need to host
[17:19] <neerbeer> OSD is on the same node as rbd storage correct ? Is that our assumption in the above thread ?
[17:19] <absynth> neerbeer: OSD is on the same node as RBD consumer
[17:19] <absynth> i.e. the qemu-kvm process that mounts the rbd
[17:20] <joelio> absynth: I'm not following pal, if I lost a storage node, then the VM backed devs wouldn't care either?
[17:20] <jtangwk> cool road map
[17:20] <jtangwk> will there be particular branches where the rest api features will be published under?
[17:22] <joelio> absynth: just trying to understand your rationale (not being picky or anything)
[17:22] <absynth> joelio: yes, but you would have spent a lot more in capex and opex with only one advantage: you would have zero VM downtime instead of a couple seconds
[17:22] <jtangwk> ah, i didnt read the road map more carefully
[17:22] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has left #ceph
[17:22] <absynth> so my point is, i guess, that spending a lot to gain rather little does not make sense in every scenario
[17:22] <joelio> absynth: No, the capex is the same. We have a finite pot
[17:22] <joelio> I'd rather spend that finite pot wisely
[17:23] * The_Bishop (~bishop@2001:470:50b6:0:25a0:cc49:4f3d:68df) Quit (Ping timeout: 480 seconds)
[17:23] <absynth> joelio: i don't think capex is the same
[17:23] <joelio> absynth: seriously we have a set amount of funds, so the capex for the platform is bound to be the same.. all of the pot available :)
[17:23] <joelio> it's about what you do with that capex
[17:23] <absynth> ok, then maybe it's "bang for the buck" or something
[17:24] <joelio> umm, still don't follow though
[17:24] <absynth> i look at it from this pov: "how much do i need to spend to host 100 vms"
[17:24] <absynth> if i follow my idea, i need two machines
[17:24] <absynth> each hosting some OSDs, and VMs
[17:24] <barryo> absynth: your idea is starting to grow on me
[17:24] <absynth> i get redundancy (crushmap distributes across nodes)
[17:24] <absynth> and i get some headroom for additional Vms if i dimension my hosts right
[17:25] <absynth> (leaving out mon machines here, because they are obviously irrelevant to the case in question)
[17:25] <absynth> if i go with your route, i have to buy at least 3 machines
[17:25] <neerbeer> @absynth: I'd need to add some code/script to automatically rebuild crushmap if I lose a node,correct. Or at least have a script that can do this manually.
[17:25] <absynth> 2 to provide OSDs and ceph redundancy, and at least 1 for VM hosting
[17:25] <joelio> absynth: sure, but that's really not what I'm getting at, is it
[17:25] <joelio> I'm saying your arrangement is sub-optimal
[17:26] <neerbeer> Same goes for when I add a node ( node has compute and OSD/MON and MDS
[17:26] <absynth> and here, the statement from above comes back into play, "it does not make sense to split in every scenario"
[17:26] <absynth> split OSDs and VM hosts, that is
[17:27] <janos> absynth - when hosting vm's on ont he OSD hosts, using qemu-img to create images i assume?
[17:27] <absynth> you cannot guarantee 100% vm uptime in either scenario, so what gives?
[17:28] <neerbeer> If I've got OSDs running on physically separate hosts then I've got to carry back 10GE infrastructure between compute and storage to support that separate osd cluster. Or … if I'm spreading my vm blocks across multiple physical hosts, then do I need 10GE between nodes ? I guess it depends.
[17:28] <absynth> neerbeer: yep, depends on what your VMs will be doing
[17:28] <joelio> absynth: seriously don't agree I'm afraid. your argument holds water for an instance where you wanted to get by on bare minimum funds - but that's not what I was saying, is it. What I'm saying is mixed nodes are sub-optimal
[17:28] <absynth> you will want 10Gbe between OSDs, either way
[17:29] <absynth> joelio: and i am saying that i cannot see where they are
[17:29] <absynth> joelio: counter argument would be "what happens if your VM node dies?"
[17:29] <absynth> you cannot host the VMs on the OSD node
[17:29] <joelio> When an OSD blows up and takes down your VMs on the same box
[17:29] <joelio> you lose both on one box
[17:29] <absynth> if one of my nodes dies, i can colocate the VMs on any other OSD node
[17:30] <joelio> whilst the box groans under load
[17:30] <joelio> and probably falls over
[17:30] <absynth> an osd being down is a transparent event, it leads to a rebalance but never to an outage. what concerns me (from experience) is downed VMs
[17:31] <drokita> Are there any alternative ways to get the up/down OSD status apart from 'ceph osd tree'
[17:31] * dosaboy (~gizmo@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[17:32] <absynth> joelio: if you are strictly speaking from a _theoretical_ point of view, splitting OSDs and VM hosts is "ideal" in the way that neither component can affect the other, i guess that is what you were getting at
[17:32] * The_Bishop (~bishop@2001:470:50b6:0:658b:f0ee:70f9:7308) has joined #ceph
[17:32] <joelio> if you read, yes :)
[17:32] <absynth> well, in practice that argument is just not that valid, in my experience
[17:32] <joelio> what I'm saying is under theory and (really if you can afford to do it) - split the nodes
[17:33] <joelio> I can see really bad things happening in maintenance/recovery situations with an all-in-one design - but ah well, what do I know :)
[17:34] <absynth> shoot
[17:34] <absynth> what can you see?
[17:35] <joelio> Not being able to adequately account for memory increases
[17:35] <joelio> both offloading VMs from a failing host onto the other hypervisor and for OSD/metadata recovery
[17:36] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) has joined #ceph
[17:36] <joelio> losing individual OSDs on VM hosts
[17:36] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Remote host closed the connection)
[17:36] <joelio> and then having a massive amount of memory chewed up if you try to fsck it
[17:36] <absynth> memory issues are present in ceph-osd. these will kill you even on single-hosted OSD boxes, because the OSD processes grow several GBytes per minute
[17:36] <joelio> I don't in my tiered design ;)
[17:37] <absynth> what do you mean, you don't?
[17:37] <joelio> as I know what's needed for recovery and headrooms
[17:37] <absynth> try deep-scrubbing your cluster, then. :)
[17:37] <joelio> it does
[17:37] <absynth> we have 0.56.2 and that has a massive deep-scrub memleak with big PGs
[17:37] <absynth> and the headroom, however big you dimension it, will hardly be 40G per OSD process
[17:38] <joelio> every fscked a very large XFS fs?
[17:38] <joelio> :O)
[17:38] <absynth> we have seen individual OSD failure on VM hosts, and i cannot see the issue with that
[17:38] <absynth> osd fails, you wait for the rebalance to finish, done
[17:38] <drokita> joelio: 3TB before... who knows when it will get done
[17:38] <absynth> what does that have to do with colocation of VM and OSD?
[17:42] <joelio> it has to do with the maintenance of OSDs( if you're on XFS - as I am)
[17:42] <joelio> things that need to be thought about
[17:42] <joelio> it requires RAM for metadata etc.
[17:43] <joelio> if you have loads of sparsely populated VMs and you have KSM enabled
[17:43] <joelio> BOOOM
[17:43] <Gugge-47527> Just the chance of an error in an OSD crashing the machine is enough for me not to host VMs there
[17:43] <absynth> drokita: i don't know of any other way than osd tree
[17:43] <joelio> Gugge-47527: +1
[17:43] <Gugge-47527> I want my VM hosts to do as little as possible :)
[17:44] <joelio> again +1 :)
[17:44] <Gugge-47527> a colocated setup is better than a lot of setups ive seen, but i would not do it
[17:44] <Gugge-47527> just as i would not do a lot of those setups ive seen :P
[17:45] <neerbeer> how many different osd/vm setups are there ? You are either colocating osds and vms or you aren't
[17:45] <Gugge-47527> neerbeer: who said anything about other setups including ceph? :)
[17:45] <drokita> absynth: ceph osd dump works as well, but you lose the ability to reference the host that it is on. Trying to put some SNMP monitors in place and I want them to be host specific. Going a different direction now :)
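A hedged sketch of the two options drokita and absynth mention, for pulling per-OSD up/down state from the CLI (exact output fields vary by version):

    ceph osd tree                  # groups OSDs under their host buckets with up/down status
    ceph osd dump | grep '^osd\.'  # one line per OSD including its up/down and in/out state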
[17:45] <Gugge-47527> setups like "single machines hosting vms on non raided storage"
[17:45] <neerbeer> ah, ok
[17:46] <absynth> or setups like "N vmware machines hosting vms on one netapp", awesome if you have 20 VMs but not so awesome if you have 100
[17:47] <Gugge-47527> absynth: "but the netapp stuff never fails!" :)
[17:47] <absynth> it gracefully degrades. :)
[17:49] <absynth> we have seen a vmware installation at a customer's which had about 150 VMs served off one netapp (no idea which, but rather pricey) which had massive performance issues. things you see if you are running VMs off a severely degraded ceph
[17:50] <darkfader> absynth: did they use nfs?
[17:50] <darkfader> or did they not understand shit and use iscsi *hrhr*
[17:50] <absynth> i think it was mixed, actually :)
[17:50] <darkfader> ok fair enough
[17:50] <absynth> but they were vmware certified!!!11111
[17:51] <darkfader> netapps sales people need to be beaten up daily
[17:51] <absynth> btw inktank people how's that ceph engineer certification coming along? :)
[17:51] <darkfader> they love selling a head with 3x as many heads as it should have
[17:51] <darkfader> shelfs
[17:51] * dosaboy (~gizmo@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Quit: Leaving.)
[17:51] <darkfader> damn headache
[17:52] <absynth> i would love to place a "giving head" joke here, but can't think of one that works
[17:52] <darkfader> hrhr
[17:53] <janos> this place isn't that scsi
[17:53] <darkfader> absynth: what do you know about that engineer cert?
[17:53] <darkfader> i missed it, would be a fun thing to take
[17:54] <absynth> darkfader: about this time last year (WHD 12), they were pondering a certification programme
[17:54] <absynth> that's about it
[17:54] <darkfader> ahh ic
[17:54] <absynth> i might want to poke scuttlemonkey or someone about it
[17:54] * dwm37 (~dwm@northrend.tastycake.net) Quit (Server closed connection)
[17:54] * dwm37 (~dwm@northrend.tastycake.net) has joined #ceph
[17:54] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:54] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:57] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[17:57] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[18:02] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[18:02] <darkfader> hehe "Wanna work with Ceph every day?
[18:02] <darkfader> "
[18:02] <darkfader> on the career page
[18:03] <darkfader> (i was just searching about for "ceph engineer" ... fail :)
[18:04] <neerbeer> What are the actual bandwidth numbers folks are seeing between a vm and a few ceph nodes ? Or what would I expect to see if I was 10GE connected and my vm was spread across 3 osds ?
[18:05] * stxShadow (~jens@p4FECED20.dip.t-dialin.net) Quit (Remote host closed the connection)
[18:08] * BillK (~BillK@124-148-94-74.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:09] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[18:10] * sleinen (~Adium@user-23-9.vpn.switch.ch) Quit (Quit: Leaving.)
[18:10] * sleinen (~Adium@130.59.94.204) has joined #ceph
[18:11] <joelio> neerbeer: That's a 'how long is a piece of string' question, it really needs qualifying. OSDs could be on spinners or SSD, XFS, btrfs, ext4 etc. etc. etc.
[18:11] <joelio> rbd backed VMs etc?
[18:11] <joelio> caching?
[18:12] <joelio> Otherwise, 710345791201581 is an answer :)
[18:13] <absynth> no! Pi times 710345791201581 is much more accurate!!!11
[18:13] * tnt (~tnt@54.211-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:13] <joelio> does not comute
[18:13] <joelio> or compute
[18:13] <joelio> (Jonny 5 on a work from home day I guess)
[18:15] <absynth> neerbeer: traffic spikes occur during recovery, apart from that the traffic is not massive
[18:15] <absynth> do not expect that 10gbe link to be fully utilized
[18:15] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[18:15] * tryggvil_ is now known as tryggvil
[18:15] <absynth> (this is obviously a very rough answer to a very rough question)
[18:16] * sleinen (~Adium@130.59.94.204) Quit (Read error: Operation timed out)
[18:18] * Cube (~Cube@cpe-76-95-217-215.socal.res.rr.com) Quit (Quit: Leaving.)
[18:19] <joelio> yea, I've managed to saturate a 10Gb VM host node, but only during heavy threaded I/O
[18:19] <joelio> on several VMs
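For rough numbers like the ones neerbeer asks about, rados bench is a common starting point; a short sketch (pool name and duration are arbitrary, and results depend heavily on disks, journals and replication size):

    rados -p data bench 30 write   # 30 seconds of 4 MB object writes from this client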
[18:22] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[18:22] * sagelap (~sage@76.89.177.113) Quit (Read error: Connection reset by peer)
[18:22] * sagelap (~sage@76.89.177.113) has joined #ceph
[18:28] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) Quit (Quit: Leaving.)
[18:28] * BillK (~BillK@124-169-38-84.dyn.iinet.net.au) has joined #ceph
[18:30] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[18:32] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) Quit (Remote host closed the connection)
[18:34] * neerbeer (~Adium@208.254.28.100) Quit (Quit: Leaving.)
[18:36] * stacker100 (~stacker66@206.pool85-61-191.dynamic.orange.es) has joined #ceph
[18:36] * stacker666 (~stacker66@215.pool85-58-189.dynamic.orange.es) Quit (Read error: Connection reset by peer)
[18:38] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[18:42] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:44] * Cube (~Cube@12.248.40.138) has joined #ceph
[18:55] * mynameisbruce (~mynameisb@tjure.netzquadrat.de) Quit (Remote host closed the connection)
[19:00] * mynameisbruce (~mynameisb@tjure.netzquadrat.de) has joined #ceph
[19:06] * sleinen (~Adium@2001:620:0:26:4d0b:c04e:5d5:72b8) has joined #ceph
[19:07] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[19:07] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[19:12] * Vjarjadian_ (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[19:13] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[19:14] * Struckto (d995cb93@ircip3.mibbit.com) has joined #ceph
[19:14] <Struckto> PUSSY
[19:14] <Struckto> DILDO
[19:14] <Struckto> PISSFAG
[19:14] <Struckto> ASSFUCK
[19:14] * Struckto (d995cb93@ircip3.mibbit.com) has left #ceph
[19:15] <janos> dang i wonder if he lost a game on xbox. he sounds like it
[19:15] <dmick> finally, some intelligent commentary in the channel
[19:16] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Ping timeout: 480 seconds)
[19:19] * Vjarjadian_ (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: He who laughs last, thinks slowest)
[19:20] * mcclurmc (~mcclurmc@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[19:21] * BillK (~BillK@124-169-38-84.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[19:21] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[19:22] <acaos> so, I'm having an issue with a new Ceph installation (and actually with existing ones) related to the crushmap - the PGs are not evenly balanced amongst the OSDs (there's 50% or more variance)
[19:23] <acaos> I've tried all the various algorithms and even tried changing weights to get them to even up, but they don't seem to really follow any rules - once, I changed the weight by 0.05 on an OSD and it nearly doubled in the number of PGs it had
[19:23] <acaos> can anyone offer any advice on how to spread PGs evenly amongst all the OSDs?
[19:27] * jtang2 (~jtang@2001:770:10:500:35d6:21a0:3b85:18db) Quit (Quit: Leaving.)
[19:28] <sjustlaptop> acaos: how many pools do you have?
[19:29] <sjustlaptop> also, how many pgs?
[19:30] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has joined #ceph
[19:31] <acaos> I have just one pool (data), and 1712 PGs across 96 OSDs
[19:32] <acaos> the cluster is arranged as 3 hives of 4 hosts, each host with 8 OSDs, and my crush rule is basically choose 1 hive, then chooseleaf 0 host (with size=3)
[19:32] <acaos> I've also tried choose 1 hive, choose 0 host, choose 1 device
[19:33] <acaos> right now my OSDs range from a low of 23 PG-replicas to a high of 115 PG-replicas
[19:34] <acaos> that's using alg tree on root and hive and alg uniform on host (though I've tried a dozen different mixes of algorithms)
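A sketch of the rule acaos describes, in crushmap text syntax; 'hive' would have to be declared as a bucket type in the same map, and the names here simply mirror the discussion rather than the exact map in use:

    rule data {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take root
            step choose firstn 1 type hive
            step chooseleaf firstn 0 type host
            step emit
    }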
[19:36] <sjustlaptop> acaos: we suggest 100-200 pgs per osd for reasonably smooth placement
[19:36] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[19:36] <sjustlaptop> so num_pgs * num_replicas = N * num_osds
[19:36] <sjustlaptop> where N is 100-200
[19:36] <sjustlaptop> so for 150:
[19:37] <acaos> ok, I'll try bumping it up
[19:37] <sjustlaptop> num_pgs = 150*num_osds / num_replicas
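Plugging this cluster's numbers into that formula: 150 * 96 OSDs / 3 replicas = 4800 PGs, versus the 1712 currently in the pool. A hedged sketch of trying that on the lab cluster by creating a fresh test pool (pg_num is easiest to get right at creation time; raising it on an existing pool was still rough around bobtail):

    ceph osd pool create testpool 4800 4800   # pg_num and pgp_num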
[19:37] <acaos> is there any other way to improve it other than to increase the number of PGs?
[19:37] <sjustlaptop> not really
[19:37] <sjustlaptop> it's a pseudorandom placement
[19:38] <acaos> even with no straw present?
[19:38] * neerbeer (~Adium@65.221.12.128) has joined #ceph
[19:38] <sjustlaptop> yeah
[19:38] <sjustlaptop> and you want to use the default ( which I think is straw)
[19:39] <sjustlaptop> more pgs is beneficial for a few other reasons as well, the OSD uses them as the unit of concurrency
[19:39] <sjustlaptop> it will also increase the number of OSDs involved in recovery when an OSD dies
[19:40] <acaos> but load also increases as the number of PGs do, right?
[19:40] <acaos> this is actually a lab cluster intended to test some stuff we don't want to test on our live cluster
[19:40] <sjustlaptop> load gets bad when the number of pgs gets very large
[19:41] <sjustlaptop> up <500/osd seems to behave ok
[19:41] * neerbeer (~Adium@65.221.12.128) Quit ()
[19:41] <sjustlaptop> there is per-pg memory overhead
[19:41] <sjustlaptop> though much less in current master than in the past
[19:41] * gaveen (~gaveen@175.157.131.51) Quit (Remote host closed the connection)
[19:41] <sjustlaptop> due to more aggressive log trimming
[19:41] <acaos> yeah, the lab cluster is actually for doing a dry run on an upgrade to 0.55
[19:41] <sjustlaptop> why to 0.55?
[19:41] <acaos> isn't that the latest release?
[19:42] <janos> 0.56.3
[19:42] <acaos> sorry, 0.65
[19:42] <acaos> er, 56
[19:42] <acaos> my mistake
[19:42] <sjustlaptop> 0.60 (?) I think is the latest release
[19:42] <sjustlaptop> 0.56.3 is bobtail, the latest stable release
[19:42] <acaos> yeah
[19:42] <acaos> that's what I had meant, to Bobtail
[19:42] <sjustlaptop> you want 56.3
[19:42] <acaos> 0.56.3
[19:42] <gregaf> .59 isn't out yet, sjustlaptop :p
[19:42] <sjustlaptop> gregaf: ok, so I can't keep the numbers straight :)
[19:43] <sjustlaptop> the latest is N where 0.56.3 < N < version_of(cuttlefish)
[19:45] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[19:54] * dennis (~dweazle@tilaa.krul.nu) Quit (Server closed connection)
[19:54] * dennis (~dweazle@tilaa.krul.nu) has joined #ceph
[19:55] <joelio> sjustlaptop: thought it was N = 50-100 for OSD (unless docs have changed)
[19:55] <sjustlaptop> joelio: if you have replication 2, that's what it ends up being
[19:55] <joelio> ahh, I see, cool :)
[20:02] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[20:10] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[20:10] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[20:11] <sjustlaptop> gregaf, sagewk: anyone want to review the osd shutdown cleanup pull request?
[20:12] <gregaf> not available right now, maybe later
[20:12] <sjustlaptop> sage, sagelap: ^
[20:12] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[20:15] <sjustlaptop> gregaf: what message would an OSD send to the mon to get itself marked down?
[20:17] <sjustlaptop> would I just send a MMonCommand?
[20:21] <janos> hrmm. if i were to make a small machine which just uses kernel rbd to mount and map and export via samba - should that machine have a connection to the cluster network or the front network? (or both?)
[20:21] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[20:22] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) Quit (Quit: Leaving.)
[20:25] * eschnou (~eschnou@223.86-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:27] <gregaf> sjustlaptop: I don't know that we have a proper mechanism for that right now
[20:27] <gregaf> so (god help us) constructing an MMonCommand which refers to itself and just marks down might be the only way to do it today
[20:28] <gregaf> janos: front network; only the OSDs communicate over the cluster network
[20:35] <janos> gregaf: sounds good, thanks
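A rough sketch of what that samba gateway box would run, assuming the pool and image already exist and the kernel rbd module is loaded; the device path assumes the udev rules are installed (otherwise it appears as /dev/rbd0), and the share name is illustrative:

    rbd map mypool/myimage                      # talks to mons/OSDs over the front network
    mkfs.xfs /dev/rbd/mypool/myimage            # first use only
    mount /dev/rbd/mypool/myimage /export/share

    # /etc/samba/smb.conf fragment
    [share]
        path = /export/share
        read only = no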
[20:35] * jjgalvez (~jjgalvez@12.248.40.138) has joined #ceph
[20:38] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[20:39] <sjustlaptop> gregaf: hmm, that is unsatisfying
[20:40] <gregaf> I could just be forgetting something, though
[20:40] <sjustlaptop> I think you are right
[20:40] <sjustlaptop> I'll add a message, I think
[20:40] <gregaf> don't forget the inter-version compatibility layers… ;)
[20:41] <sjustlaptop> I think old osds don't send it and old mons won't decode it/
[20:41] <sjustlaptop> ?
[20:41] <sjustlaptop> what do mons do when they get a message they don't recognize?
[20:41] <gregaf> right, but if somebody has a new OSD and an old mon the OSD shouldn't send it
[20:41] <sjustlaptop> ok
[20:41] <sjustlaptop> easy enough
[20:41] <gregaf> I believe everybody crashes if they get an unrecognized message
[20:41] <gregaf> on an assert
[20:41] <sjustlaptop> ok, good for debugging anyway
[20:42] <dmick> default:
[20:42] <dmick> ret = false;
[20:44] <dmick> lsubdout(cct, ms, 0) << "ms_deliver_dispatch: unhandled message " << m << " " << *m << " from "
[20:44] <dmick> << m->get_source_inst() << dendl;
[20:44] <dmick> assert(!cct->_conf->ms_die_on_unhandled_msg);
[20:44] <sjustlaptop> ah
[20:44] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[20:45] <dmick> defaults to false
[20:45] <dmick> fwiw
[20:45] <gregaf> that's different — that's a message which the endpoint doesn't have a registered handler for
[20:45] <gregaf> check Message::decode
[20:46] <gregaf> oh, it's got a separate ms_die_on_bad_msg
[20:46] <sjustlaptop> ah, can just send a MOSDFailure about itself
[20:46] <sjustlaptop> and update the mon accordingly
[20:46] <gregaf> sjustlaptop: you'd need to add special-case handling for a self-report :/
[20:46] * diegows (~diegows@host28.190-30-144.telecom.net.ar) has joined #ceph
[20:47] <gregaf> I'm not saying no, just that it's quite separate from the rest of the logic handling
[20:48] <gregaf> dmick: sjustlaptop: regarding unknown messages, it looks like what's supposed to happen is that the connection gets closed, and that might actually work
[20:48] <gregaf> but we're not exercising those paths so don't count on it behaving correctly
[20:48] <dmick> gregaf: that *seems* like what happens when (all the) ms_dispatch returns false, which is what the mon does if it gets an unknown type, but I'm definitely grasping
[20:50] <sjustlaptop> gregaf: yeah, I know
[20:50] <sjustlaptop> I'm looking at the code now
[20:50] <sjustlaptop> the osd will only conditionally send it
[20:50] <dmick> oh but decode_message, which probably comes earlier. I see.
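To untangle the two layers being discussed: an unknown message type can fail at decode time (Message::decode, where gregaf expects the connection to get closed), or it can decode fine but find no dispatcher willing to handle it (the ms_deliver_dispatch path dmick pasted, which only asserts if ms_die_on_unhandled_msg is set, and that defaults to false). A generic sketch of that split, not the actual Ceph messenger code:

    #include <cassert>
    #include <cstdio>

    enum DeliverResult { DELIVERED, UNKNOWN_TYPE, UNHANDLED };

    // Stand-in for the config knob; mirrors ms_die_on_unhandled_msg defaulting to false.
    struct FakeConfig { bool die_on_unhandled = false; };

    DeliverResult deliver(int type, bool known_type, bool dispatcher_claims_it,
                          const FakeConfig &conf) {
      if (!known_type) {
        // Layer 1: decode failed; the peer connection would be dropped.
        printf("unknown message type %d, dropping connection\n", type);
        return UNKNOWN_TYPE;
      }
      if (!dispatcher_claims_it) {
        // Layer 2: decoded fine, but nobody handled it; log, and only die
        // if the option is set, matching the snippet quoted above.
        printf("unhandled message type %d\n", type);
        assert(!conf.die_on_unhandled);
        return UNHANDLED;
      }
      return DELIVERED;
    }

    int main() {
      FakeConfig conf;
      deliver(1234, /*known_type=*/false, /*dispatcher_claims_it=*/false, conf);
      deliver(42,   /*known_type=*/true,  /*dispatcher_claims_it=*/false, conf);
      return 0;
    }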
[20:51] <sjustlaptop> gregaf: actually, would it be cleaner to overload MOSDFailure or create a new message?
[20:51] <sjustlaptop> need compat flag either way
[20:51] <gregaf> I think I'd rather create a new message
[20:51] <sjustlaptop> yeah, I agree
[20:51] <sjustlaptop> this path is only tangentially related to the Failure code
[20:51] <gregaf> then we can also have it do stuff like supplying a one-line log to the monitor about why and such
[20:52] <sjustlaptop> yeah
[20:52] <gregaf> and given that it should not be OSD-specific
[20:52] <sjustlaptop> osd specific?
[20:52] <gregaf> we'll want to do this on the MDS eventually too
[20:52] <sjustlaptop> yeah, but the log will be totally different
[20:52] <sjustlaptop> and will live in a different *Monitor
[20:53] <sjustlaptop> *the logic will be
[20:53] <gregaf> mmm
[20:53] <gregaf> the handling logic will be, but the communication is going to be the same data "I'm shutting down and this is why"
[20:54] <sjustlaptop> yeah
[20:54] <sjustlaptop> seems like the least interesting part though
[20:55] <gregaf> I suppose
[20:55] <gregaf> mostly I just don't want to redo all the wrapping in six months, since I think the wrapping is going to be most of the code
[20:56] <sjustlaptop> what wrapping?
[20:56] <gregaf> perhaps that's misguided, though
[20:56] <gregaf> compat bits, new message type
[20:56] <sjustlaptop> the switch cases in Monitor and OSDMonitor?
[20:56] <sjustlaptop> you'll need the compat bits anyway since I'm not doing the MDS now
[20:57] <sjustlaptop> plus, better for the Monitor dispatcher to be able to ignore message contents when routing the message
[20:57] <gregaf> yeah, but if we get a self-shutdown report that we can't handle it's fine to just dump it
[20:57] <gregaf> and the dispatcher would ignore the message contents — just look at the source type ;)
[20:58] <sjustlaptop> you win this round
[20:58] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[20:58] <dmick> lol
[20:58] <sjustlaptop> so MMarkMeDown rather than MOSDMarkMeDown?
[20:58] <gregaf> unless you think that's a really bad idea, yeah
[20:59] <sjustlaptop> you don't think we'll want daemon specific sub information at some point?
[20:59] <sjustlaptop> it seems likely that we would
[20:59] <dmick> using namespace lebowski; class MMarkItZero
[21:00] <gregaf> hmmm
[21:00] <sjustlaptop> I think we'll want daemon specific sub info, MOSDMarkMeDown seems unambiguous
[21:01] <gregaf> yeah, that's persuasive
[21:01] <gregaf> okay
[21:01] <dmick> would we ever want this message to originate somewhere besides the OSD itself?
[21:01] <sjustlaptop> nope
[21:02] <sjustlaptop> if it comes from another OSD, that's the MOSDFailure system
[21:02] <sjustlaptop> if it comes from a client, that's an MMonCommand
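So the three ways an OSD ends up marked down each arrive as a different message type, and the monitor can route purely on type and source, as gregaf suggested earlier. A schematic dispatcher with placeholder kinds standing in for the real monitor classes:

    #include <cstdio>

    // Placeholders for the real classes discussed:
    //   MARK_ME_DOWN - the new self-report (MOSDMarkMeDown), sent only by the OSD itself
    //   OSD_FAILURE  - a peer OSD reporting someone else (MOSDFailure)
    //   MON_COMMAND  - an administrator via the ceph tool ("osd down <id>")
    enum MsgKind { MARK_ME_DOWN, OSD_FAILURE, MON_COMMAND };

    void route(MsgKind kind) {
      switch (kind) {
      case MARK_ME_DOWN:
        printf("self-report: mark the sender down and ack so it can finish shutdown\n");
        break;
      case OSD_FAILURE:
        printf("peer report: feed the existing failure-report machinery\n");
        break;
      case MON_COMMAND:
        printf("operator command: handle through the normal command path\n");
        break;
      }
    }

    int main() {
      route(MARK_ME_DOWN);
      return 0;
    }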
[21:02] * Svedrin (svedrin@2a01:4f8:100:3061::10) Quit (Server closed connection)
[21:02] * dmick nods
[21:02] <sjustlaptop> and what's more, the OSD needs an ack to continue shutting down
[21:02] * Svedrin (svedrin@2a01:4f8:100:3061::10) has joined #ceph
[21:02] <gregaf> ...huh?
[21:02] <sjustlaptop> osd -> mon : shut me down
[21:02] <sjustlaptop> mon -> osd : mk
[21:03] <sjustlaptop> otherwise
[21:03] <sjustlaptop> osd-> mon : shut me down
[21:03] <sjustlaptop> osd.killmessenger()
[21:03] <sjustlaptop> did the shut me down get there?
[21:03] <gregaf> you're just worried about the message not actually going out
[21:03] <gregaf> okay
[21:04] <gregaf> we definitely don't want to do just that
[21:04] <sjustlaptop> hmm?
[21:04] <gregaf> otherwise failing to transmit the message prevents the OSD from shutting down
[21:04] <sjustlaptop> that's right, it does
[21:04] <sjustlaptop> there'll be a timeout
[21:04] <gregaf> there's nothing catastrophic about failing to inform the monitor we're going away
[21:04] <gregaf> it just means a fallback to the failure reports system
[21:04] <sjustlaptop> you're right, it'll just time out after 20s or something
[21:04] <gregaf> I wouldn't make it that large
[21:05] <gregaf> more like 1second
[21:05] <sjustlaptop> well, it's going to be gconf->osd_shutdown_wait_for_mon_timeout
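The shape of the handshake sjustlaptop is describing: send the mark-me-down, then hold shutdown until the monitor's ack arrives or the configured timeout expires, so a lost message only costs the timeout rather than wedging shutdown. A standalone sketch using a condition variable; the names (including the timeout option mentioned above) are placeholders, not the eventual implementation:

    #include <chrono>
    #include <condition_variable>
    #include <mutex>

    // Sketch of "osd -> mon: shut me down", "mon -> osd: ack", with a timeout
    // so shutdown still proceeds if the monitor never answers.
    struct MarkDownWaiter {
      std::mutex lock;
      std::condition_variable cond;
      bool acked = false;

      // Called from the message handler when the monitor's ack arrives.
      void handle_ack() {
        std::lock_guard<std::mutex> l(lock);
        acked = true;
        cond.notify_all();
      }

      // Called from the shutdown path after sending the mark-me-down request.
      // Returns true if the ack arrived, false if we gave up after `timeout`.
      bool wait_for_ack(std::chrono::seconds timeout) {
        std::unique_lock<std::mutex> l(lock);
        return cond.wait_for(l, timeout, [this] { return acked; });
      }
    };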
[21:05] <gregaf> damn, I wish we exported a "this message has been received by the remote socket" interface
[21:05] <sjustlaptop> that would be neat
[21:05] <sjustlaptop> I could add that instead
[21:05] <sjustlaptop> also not very hard, I think
[21:06] <gregaf> well, you'd still need the timeout ;)
[21:06] <sjustlaptop> yeah, but not the return message or message handling
[21:07] <sjustlaptop> actually, that wouldn't guarantee that the message was paxos'd
[21:07] <gregaf> no, but who cares
[21:07] <sjustlaptop> so a poorly timed election or whatnot could cause a 20s interruption in IO
[21:08] <sjustlaptop> which is suboptimal
[21:08] <gregaf> how would that happen?
[21:08] <sjustlaptop> I have no idea
[21:08] <sjustlaptop> mon failure?
[21:08] <gregaf> and anyway, surely by the time you're sending this shutdown the OSD has stopped handling client IO
[21:08] <sjustlaptop> no, that's the point
[21:08] <gregaf> yeah, if the mon we send to fails then we'd have to retransmit
[21:08] <gregaf> ah
[21:09] <gregaf> this is front gate, not an ending action
[21:09] <sjustlaptop> correct
[21:09] <sjustlaptop> the point is a more seamless osd shutdown procedure
[21:09] <gregaf> okay, in that case you definitely need a full response from the monitor then
[21:09] <sjustlaptop> yah
[21:09] <sjustlaptop> k
[21:09] <gregaf> anything else widens the read hole
[21:10] <sjustlaptop> eh, that's a separate issue
[21:10] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[21:10] <gregaf> …not really…?
[21:10] <sjustlaptop> it is.
[21:10] <sjustlaptop> in this scenario even if we didn't tell the mon anything
[21:10] <sjustlaptop> the worst case is that we have to wait for the heartbeat timeout to continue IO
[21:11] <sjustlaptop> for the read hole to be a thing, the osd would have to not know that it had been marked out and continue serving reads
[21:11] <sjustlaptop> which it won't do, since the process is no longer living
[21:11] * mcclurmc (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[21:12] <gregaf> right, but if you're continuing to serve reads at any point after sending a shutdown to the monitor then you're basically trying to race into the hole
[21:12] <gregaf> and if you stop serving reads once you transmit the message then there's not much point to gating shutdown
[21:13] <sjustlaptop> ah, right
[21:13] <sjustlaptop> but it's still the same problem
[21:14] <gregaf> yeah, it's the same problem but making it more likely is all I'm saying
[21:14] <sjustlaptop> hmm, I think you are right
[21:14] <sjustlaptop> though, it's no worse than marking an osd out
[21:14] <sjustlaptop> via the ceph tool
[21:15] * leseb (~leseb@HSI-KBW-46-237-220-11.hsi.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[21:15] <gregaf> s/out/down
[21:15] <sjustlaptop> no, out
[21:15] <sjustlaptop> or down
[21:15] <sjustlaptop> right
[21:15] <gregaf> they'll keep trying to talk to it if it's out
[21:15] <sjustlaptop> either way
[21:15] <sjustlaptop> right
[21:15] <gregaf> anyway, yes, add a monitor reply
[21:16] <gregaf> maybe a tracker entry for the messenger "socket received" thing, but that'll require some thought as it could be misused pretty badly
[21:18] <gregaf> joshd: the ObjectCacher assert(ob->last_commit_tid < tid) thing — do we have fixes for that queued in the bobtail branch?
[21:18] <gregaf> some people are talking about it on the list again
[21:19] <gregaf> oh, wait, he says he's on latest bobtail
[21:20] <gregaf> or sjust: I don't remember what was at fault last time we looked at this (apart from the kernel client one)
[21:20] <gregaf> and now I'm going to go hide because I really need to get through this review
[21:21] <joshd> yeah, I was already responding. all the latest fixes are in the bobtail branch already afaict
[21:22] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[21:24] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:25] * alram (~alram@38.122.20.226) has joined #ceph
[21:31] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[21:42] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[21:42] * markbby (~Adium@168.94.245.4) Quit (Remote host closed the connection)
[21:42] * markbby (~Adium@168.94.245.4) has joined #ceph
[21:48] * LeaChim (~LeaChim@5ad4a53c.bb.sky.com) Quit (Ping timeout: 480 seconds)
[21:52] * ScOut3R (~ScOut3R@c83-249-245-183.bredband.comhem.se) Quit (Remote host closed the connection)
[21:57] * LeaChim (~LeaChim@b0fad8fe.bb.sky.com) has joined #ceph
[21:58] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) Quit (Quit: ZNC - http://znc.in)
[21:58] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) has joined #ceph
[21:59] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:00] * kyle_ (~kyle@ip03.foxyf.simplybits.net) has joined #ceph
[22:00] <kyle_> Hello all.
[22:01] <dmick> hi kyle_
[22:03] <kyle_> if I have an old version of ceph in my cluster that has been idle for a few months... is there an easy way to remove all the old ceph stuff so I can rely on packages? Running Ubuntu 12.04 on all boxes.
[22:05] <Cube> Is the old version installed via packages?
[22:06] <kyle_> no. I was building from a git clone
[22:06] <Cube> oh okay, going to have to manually remove everything then I believe
[22:06] <dmick> maybe make uninstall
[22:06] <kyle_> there was no support for packages at the time, that I know of.
[22:06] <kyle_> okay, I'll try that, thanks
[22:07] <dmick> that won't clean up things like data directories etc.
[22:07] <dmick> so you might want to ferret those out too, perhaps with ceph.conf's help if you customized
[22:07] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[22:07] <kyle_> okay. i think that part should be pretty painless manually
[22:11] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[22:13] * LeaChim (~LeaChim@b0fad8fe.bb.sky.com) Quit (Ping timeout: 480 seconds)
[22:22] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Quit: Leaving.)
[22:22] * LeaChim (~LeaChim@5e0d7853.bb.sky.com) has joined #ceph
[22:22] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[22:22] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[22:33] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) Quit (Quit: ZNC - http://znc.in)
[22:33] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) has joined #ceph
[22:34] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) Quit ()
[22:45] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[22:47] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[22:48] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) has joined #ceph
[22:50] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[22:52] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[22:56] * sleinen (~Adium@2001:620:0:26:4d0b:c04e:5d5:72b8) Quit (Quit: Leaving.)
[23:05] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:09] * calebamiles1 (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[23:10] * drokita (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[23:11] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[23:13] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[23:16] * sagelap (~sage@76.89.177.113) Quit (Quit: Leaving.)
[23:16] * sagelap (~sage@76.89.177.113) has joined #ceph
[23:17] * diegows (~diegows@host28.190-30-144.telecom.net.ar) Quit (Ping timeout: 480 seconds)
[23:17] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[23:19] * zK4k7g (~zK4k7g@digilicious.com) Quit (Quit: Leaving.)
[23:35] * rturk-away is now known as rturk
[23:42] * janisg (~troll@85.254.50.23) Quit (Ping timeout: 480 seconds)
[23:43] * eschnou (~eschnou@223.86-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[23:43] * janisg (~troll@85.254.50.23) has joined #ceph
[23:46] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[23:47] <sjustlaptop> gregaf: how do I get the feature bits off of a MonClient*?
[23:47] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[23:48] * kyle_ (~kyle@ip03.foxyf.simplybits.net) Quit (Quit: Leaving)
[23:49] <sjustlaptop> seems if there is no open session, send_mon_message queues the message in the MonClient, so I might not be able to trust that the connection the message is eventually sent on is the same as the connection on the session which existed when I first attempted to send it
[23:50] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[23:51] <sjustlaptop> gregaf: is there a way to get a least common feature set?
[23:51] <sjustlaptop> **greatest common feature set
[23:58] * Kioob (~kioob@2a01:e35:2432:58a0:21a:92ff:fe90:42c5) Quit (Quit: Leaving.)
[23:58] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[23:58] <gregaf> sjustlaptop: sorry, my brain is being DDOSed by Yan's patch series
[23:59] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[23:59] <gregaf> you're trying to figure out what features your monitor has, to decide whether you can send that MOSDGoingDown or whatever?
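The usual meaning of a "greatest common feature set" here is just the bitwise AND of the peers' feature bitmasks, and it should be evaluated at the moment the message is actually sent (since, as sjustlaptop notes, a queued message may go out on a different connection than the one that existed when it was queued). A generic sketch with a made-up feature bit, not the real MonClient API:

    #include <cstdint>
    #include <cstdio>

    // Hypothetical feature bit for the new self-mark-down message.
    static const uint64_t FEATURE_MARK_ME_DOWN = 1ull << 40;

    // Greatest common feature set across a connection: bitwise AND of what we
    // support and what the peer advertised during the handshake.
    static uint64_t common_features(uint64_t ours, uint64_t peers) {
      return ours & peers;
    }

    // Decide, at send time, whether the new message is safe to use.
    static bool can_send_mark_me_down(uint64_t ours, uint64_t peers) {
      return common_features(ours, peers) & FEATURE_MARK_ME_DOWN;
    }

    int main() {
      uint64_t ours = FEATURE_MARK_ME_DOWN | 0xff;  // we support the new message
      uint64_t old_mon = 0xff;                      // an old mon does not
      printf("%d\n", (int)can_send_mark_me_down(ours, old_mon));  // 0: fall back
      return 0;
    }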
[23:59] * The_Bishop (~bishop@2001:470:50b6:0:658b:f0ee:70f9:7308) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.