#ceph IRC Log

IRC Log for 2013-03-11

Timestamps are in GMT/BST.

[0:00] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[0:01] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) Quit (Read error: Operation timed out)
[0:01] * BillK (~BillK@124-169-226-205.dyn.iinet.net.au) has joined #ceph
[0:03] * dosaboy (~user1@host86-164-227-220.range86-164.btcentralplus.com) has joined #ceph
[0:05] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[0:05] <nz_monkey_> We are using bcache on our OSDs. With this in place, I am failing to see the logical benefit of Ceph's journal. Is it possible to disable it completely?
[0:06] * ScOut3R (~scout3r@1F2EAE22.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[0:11] <iggy> i've never heard of a way
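
For context: there was no supported way at the time to run a FileStore OSD without a journal; it can, however, be relocated to a faster device or resized in ceph.conf. A minimal sketch, with a hypothetical path and size:

    [osd]
        osd journal = /srv/ceph/osd.$id/journal   # hypothetical path on the bcache-backed device
        osd journal size = 1024                   # journal size in MB
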
[0:13] * Philip__ (~Philip@hnvr-4d079a78.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[0:20] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: I'm off, and you do what you want...)
[0:21] * sagelap (~sage@24-121-147-29.npg.sta.suddenlink.net) Quit (Ping timeout: 480 seconds)
[0:23] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit (Quit: Leaving.)
[0:29] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: Now if you will excuse me, I have a giant ball of oil to throw out my window)
[0:31] <BillK> saw a config the other day with a journal size of zero - meant to ask but didn't
[0:43] * BillK (~BillK@124-169-226-205.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[0:48] * Tiger (~kvirc@80.70.238.91) Quit (Ping timeout: 480 seconds)
[0:50] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[0:52] * ninkotech_ (~duplo@ip-89-102-24-167.net.upcbroadband.cz) has joined #ceph
[1:04] * sagelap (~sage@2600:1010:b121:4fe2:b1fe:5b5:58c7:7392) has joined #ceph
[1:15] * BillK (~BillK@124-168-243-201.dyn.iinet.net.au) has joined #ceph
[1:19] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) Quit (Ping timeout: 480 seconds)
[1:19] * sagelap1 (~sage@2600:1010:b107:35b:b1fe:5b5:58c7:7392) has joined #ceph
[1:21] * sagelap (~sage@2600:1010:b121:4fe2:b1fe:5b5:58c7:7392) Quit (Ping timeout: 480 seconds)
[1:21] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) has joined #ceph
[1:24] * jtang1 (~jtang@79.97.135.214) Quit (Quit: Leaving.)
[1:30] * sagelap1 (~sage@2600:1010:b107:35b:b1fe:5b5:58c7:7392) Quit (Ping timeout: 480 seconds)
[1:35] * esammy (~esamuels@host-2-102-70-24.as13285.net) Quit (Quit: esammy)
[2:09] * janeUbuntu (~jane@2001:3c8:c103:a001:8544:d054:eff8:6fa5) Quit (Ping timeout: 480 seconds)
[2:17] * yehuda_hm (~yehuda@99-48-177-65.lightspeed.irvnca.sbcglobal.net) Quit (Remote host closed the connection)
[2:20] * janeUbuntu (~jane@118.175.7.68) has joined #ceph
[2:29] <Kioob> Hi
[2:30] <Kioob> I changed the ruleset of a pool, which implies moving 10% of the data. It's now running, and the whole cluster is nearly unavailable: a simple 4k write takes 7 seconds
[2:32] <Kioob> one point: one pool uses totally different OSDs (full SSD) and is not concerned by the remapping & backfilling, but it is also very very slow
[2:32] <Kioob> so
[2:32] <Kioob> It seems to be a MON overload, or a network limit, right?
[2:45] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:46] <Kioob> well, it doesn't seem to be the network....
[2:47] <Kioob> (1.5Gbps used, on 10Gbps network)
[2:50] <Kioob> and the 'ceph-mon' process doesn't seem to use a lot of CPU
[2:50] <Kioob> so... I don't know
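
For readers hitting a similar slowdown during remapping/backfill: recovery traffic can usually be throttled at runtime so client I/O recovers. A sketch, assuming these option names exist in the running version; the values are illustrative:

    ceph osd tell \* injectargs '--osd-max-backfills 1'
    ceph osd tell \* injectargs '--osd-recovery-max-active 1'
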
[2:54] * stackevil (~stackevil@77.116.17.65.wireless.dyn.drei.com) Quit (Quit: This Mac has gone to sleep!)
[3:05] * sagelap (~sage@2600:1012:b01e:fb5d:b1fe:5b5:58c7:7392) has joined #ceph
[3:18] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[3:28] * sagelap (~sage@2600:1012:b01e:fb5d:b1fe:5b5:58c7:7392) Quit (Ping timeout: 480 seconds)
[3:42] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 19.0.2/20130307023931])
[4:35] <BillK> using .58, changed feature set from legacy to optimal, "libceph: mon0 192.168.44.90:6789 feature set mismatch, my 4008a < server's 204008a, missing 2000000" on mount
[5:33] * stackevil (~stackevil@77.116.17.65.wireless.dyn.drei.com) has joined #ceph
[5:35] * stackevil (~stackevil@77.116.17.65.wireless.dyn.drei.com) Quit ()
[5:36] * stackevil (~stackevil@77.116.17.65.wireless.dyn.drei.com) has joined #ceph
[5:37] * janeUbuntu (~jane@118.175.7.68) Quit (Ping timeout: 480 seconds)
[5:38] * stackevil (~stackevil@77.116.17.65.wireless.dyn.drei.com) Quit ()
[6:22] <BillK> fixed ... set tunables back to legacy and used the manual method without --set-chooseleaf-descend-once; the docs are misleading / the kernel version needed for TUNABLES2 should be highlighted more
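
The switch BillK describes corresponds to the crush tunables profiles; newer profiles (TUNABLES2) need kernel clients that understand the extra feature bits, which is what the mismatch above indicates. A sketch, assuming the convenience command is available in this release (otherwise the manual crushtool route applies):

    ceph osd crush tunables optimal   # needs kernel clients that understand the newer feature bits
    ceph osd crush tunables legacy    # revert if clients report a feature set mismatch on mount
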
[7:00] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[7:07] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[7:09] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[7:10] * sleinen1 (~Adium@2001:620:0:26:e063:3965:dfd1:b2b8) has joined #ceph
[7:11] * sleinen1 (~Adium@2001:620:0:26:e063:3965:dfd1:b2b8) Quit ()
[7:12] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Read error: Connection reset by peer)
[7:14] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:17] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[7:46] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[7:49] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[7:52] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) has joined #ceph
[7:58] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[8:00] * sleinen1 (~Adium@2001:620:0:25:9121:a393:542a:b8e9) has joined #ceph
[8:06] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[8:15] * Philip__ (~Philip@hnvr-4d079a78.pool.mediaWays.net) has joined #ceph
[8:15] * esammy (~esamuels@host-2-102-70-24.as13285.net) has joined #ceph
[8:16] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[8:19] * sleinen1 (~Adium@2001:620:0:25:9121:a393:542a:b8e9) Quit (Quit: Leaving.)
[8:19] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[8:25] * The_Bishop_ (~bishop@e179008236.adsl.alicedsl.de) has joined #ceph
[8:27] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[8:27] * leseb (~leseb@78.251.62.76) has joined #ceph
[8:30] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit (Quit: Leaving.)
[8:31] * The_Bishop (~bishop@e179004212.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[8:40] * Philip__ (~Philip@hnvr-4d079a78.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[8:45] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has joined #ceph
[8:50] * sleinen (~Adium@2001:620:0:25:c829:1e69:3c83:bfe) has joined #ceph
[8:52] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) has joined #ceph
[8:58] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:00] * ShaunR (~ShaunR@staff.ndchost.com) Quit (Read error: Connection reset by peer)
[9:00] * ShaunR (~ShaunR@staff.ndchost.com) has joined #ceph
[9:07] * Lennie`away is now known as leen
[9:07] * gerard_dethier (~Thunderbi@85.234.217.115.static.edpnet.net) has joined #ceph
[9:07] * leen is now known as Lennie_
[9:08] <Lennie_> hi, has anyone had a (or all) ceph-mon's stuck at start up ?
[9:08] * gucki (~smuxi@HSI-KBW-095-208-162-072.hsi5.kabel-badenwuerttemberg.de) has joined #ceph
[9:09] <Lennie_> it seems to be stuck applying incrementals in the loop on line 120 of https://github.com/ceph/ceph/blob/master/src/mon/OSDMonitor.cc
[9:10] * ninkotech_ (~duplo@ip-89-102-24-167.net.upcbroadband.cz) Quit (Quit: Konversation terminated!)
[9:17] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[9:23] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[9:24] <Lennie_> the version/build is 0.57-1quantal
[9:28] * jtang1 (~jtang@79.97.135.214) Quit (Quit: Leaving.)
[9:35] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[9:37] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[9:45] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:47] <Lennie_> anyway, if anyone wants to discuss this problem, let me know; maybe I'll even make it a bug report
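
For anyone debugging a similarly stuck ceph-mon, the usual first step is to restart it in the foreground with verbose monitor/paxos logging. A sketch, with 'a' as a placeholder mon id:

    ceph-mon -i a -d --debug-mon 20 --debug-paxos 20
    # or, if the daemon is still responsive, query its admin socket
    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok mon_status
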
[9:47] * Lennie_ is now known as Lennie`away
[9:51] * sleinen (~Adium@2001:620:0:25:c829:1e69:3c83:bfe) Quit (Quit: Leaving.)
[9:54] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[9:55] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[9:57] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit ()
[10:00] * leseb__ (~leseb@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[10:08] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[10:09] * l0nk (~alex@83.167.43.235) has joined #ceph
[10:10] * leseb__ is now known as leseb_
[10:13] * LeaChim (~LeaChim@b0faa428.bb.sky.com) has joined #ceph
[10:19] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[10:31] * leseb (~leseb@78.251.62.76) Quit (Read error: Connection reset by peer)
[10:31] * leseb (~leseb@78.251.62.76) has joined #ceph
[10:34] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[10:36] * livekcats (~stackevil@cpe90-146-43-165.liwest.at) has joined #ceph
[10:37] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[10:37] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:43] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) Quit (Ping timeout: 480 seconds)
[10:43] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:45] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[10:46] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:47] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[10:49] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:54] * mattch (~mattch@pcw3047.see.ed.ac.uk) has joined #ceph
[10:55] * jtang1 (~jtang@2001:770:10:500:40c4:414:a4e3:b6c1) has joined #ceph
[10:57] * jtang2 (~jtang@2001:770:10:500:f0b2:3cf4:a78d:2f49) has joined #ceph
[11:03] * jtang1 (~jtang@2001:770:10:500:40c4:414:a4e3:b6c1) Quit (Ping timeout: 480 seconds)
[11:05] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Read error: Operation timed out)
[11:14] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[11:14] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit ()
[11:14] * hybrid512 (~w.moghrab@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[11:48] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[11:48] * Frank9999 (~frank@kantoor.transip.nl) has joined #ceph
[11:58] * BManojlovic (~steki@91.195.39.5) Quit (Read error: Operation timed out)
[11:58] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[12:07] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) Quit (Remote host closed the connection)
[12:08] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[12:12] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[12:19] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) has joined #ceph
[12:31] * dosaboy (~user1@host86-164-227-220.range86-164.btcentralplus.com) Quit (Quit: Leaving.)
[12:31] * dosaboy (~user1@host86-164-227-220.range86-164.btcentralplus.com) has joined #ceph
[12:37] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[12:38] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[12:43] * noahmehl (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) has joined #ceph
[12:45] * JohansGlock (~quassel@kantoor.transip.nl) Quit (Remote host closed the connection)
[12:46] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[12:46] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) Quit (Read error: Operation timed out)
[13:09] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[13:20] * livekcats is now known as stackevil
[13:27] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:38] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[13:38] * Philip__ (~Philip@hnvr-4dbd242e.pool.mediaWays.net) has joined #ceph
[13:39] * Anticimex (anticimex@netforce.csbnet.se) Quit (Ping timeout: 480 seconds)
[13:43] * Anticimex (anticimex@netforce.csbnet.se) has joined #ceph
[13:46] * JohansGlock (~quassel@kantoor.transip.nl) has joined #ceph
[13:51] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[13:53] * dosaboy (~user1@host86-164-227-220.range86-164.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[14:02] * dosaboy (~user1@host86-164-227-220.range86-164.btcentralplus.com) has joined #ceph
[14:04] * leseb__ (~leseb@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[14:05] * markbby (~Adium@168.94.245.2) has joined #ceph
[14:06] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[14:10] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) Quit (Quit: There are 10 types of people. Those who understand binary and those who don't.)
[14:10] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) has joined #ceph
[14:11] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[14:15] * leseb__ (~leseb@3.46-14-84.ripe.coltfrance.com) Quit (Remote host closed the connection)
[14:16] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[14:25] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[14:25] * dosaboy (~user1@host86-164-227-220.range86-164.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[14:26] * dosaboy (~user1@host86-164-227-220.range86-164.btcentralplus.com) has joined #ceph
[14:30] * dosaboy (~user1@host86-164-227-220.range86-164.btcentralplus.com) Quit (Remote host closed the connection)
[14:31] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) has joined #ceph
[14:32] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit (reticulum.oftc.net charon.oftc.net)
[14:32] * jbd_ (~jbd_@34322hpv162162.ikoula.com) Quit (reticulum.oftc.net charon.oftc.net)
[14:32] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) Quit (reticulum.oftc.net charon.oftc.net)
[14:32] * BillK (~BillK@124-168-243-201.dyn.iinet.net.au) Quit (reticulum.oftc.net charon.oftc.net)
[14:32] * samppah (hemuli@namibia.aviation.fi) Quit (reticulum.oftc.net charon.oftc.net)
[14:32] * MK_FG (~MK_FG@00018720.user.oftc.net) Quit (reticulum.oftc.net charon.oftc.net)
[14:32] * gregorg_taf (~Greg@78.155.152.6) Quit (reticulum.oftc.net charon.oftc.net)
[14:32] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (reticulum.oftc.net charon.oftc.net)
[14:32] * morse (~morse@supercomputing.univpm.it) Quit (reticulum.oftc.net charon.oftc.net)
[14:32] * jefferai (~quassel@quassel.jefferai.org) Quit (reticulum.oftc.net charon.oftc.net)
[14:32] * fred1 (~fredl@2a00:1a48:7803:107:8532:c238:ff08:354) Quit (reticulum.oftc.net charon.oftc.net)
[14:32] * Kioob (~kioob@2a01:e35:2432:58a0:21a:92ff:fe90:42c5) Quit (reticulum.oftc.net charon.oftc.net)
[14:32] * lurbs (user@uber.geek.nz) Quit (reticulum.oftc.net charon.oftc.net)
[14:32] * Lennie`away (~leen@lennie-1-pt.tunnel.tserv11.ams1.ipv6.he.net) Quit (reticulum.oftc.net charon.oftc.net)
[14:32] * dosaboy (~user1@host86-164-227-220.range86-164.btcentralplus.com) has joined #ceph
[14:33] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) has joined #ceph
[14:33] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[14:33] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) has joined #ceph
[14:33] * BillK (~BillK@124-168-243-201.dyn.iinet.net.au) has joined #ceph
[14:33] * samppah (hemuli@namibia.aviation.fi) has joined #ceph
[14:33] * MK_FG (~MK_FG@00018720.user.oftc.net) has joined #ceph
[14:33] * gregorg_taf (~Greg@78.155.152.6) has joined #ceph
[14:33] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[14:33] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[14:33] * jefferai (~quassel@quassel.jefferai.org) has joined #ceph
[14:33] * fred1 (~fredl@2a00:1a48:7803:107:8532:c238:ff08:354) has joined #ceph
[14:33] * Kioob (~kioob@2a01:e35:2432:58a0:21a:92ff:fe90:42c5) has joined #ceph
[14:33] * lurbs (user@uber.geek.nz) has joined #ceph
[14:33] * Lennie`away (~leen@lennie-1-pt.tunnel.tserv11.ams1.ipv6.he.net) has joined #ceph
[14:34] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[14:34] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[14:35] * ChanServ sets mode +o dmick
[14:35] * ChanServ sets mode +o joao
[14:45] * drokita (~drokita@199.255.228.128) has joined #ceph
[14:55] <absynth> so, did the guy with the STONITH problem turn up over the weekend?
[14:55] * l0nk (~alex@83.167.43.235) Quit (Quit: Leaving.)
[14:55] * l0nk (~alex@83.167.43.235) has joined #ceph
[14:55] <janos> not that i saw
[14:57] * Norman (53a31f10@ircip1.mibbit.com) has joined #ceph
[14:57] <absynth> aww
[14:57] <absynth> i'd really love to hear the rest of the story
[14:58] <absynth> it sounded genuinely interesting
[14:58] <janos> yeah, hopefully he'll show up today
[14:58] <janos> i want to know what collision of situations caused that
[14:58] <Norman> hey guys! does anyone know if there is a stable RBD patch for a 2.6.32 kernel?
[15:00] <absynth> i don't know, but i think... maybe not?
[15:00] <absynth> what is that, debian squeeze stock kernel?
[15:07] <Norman> absynth: rhel kernel
[15:09] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[15:16] * yehuda_hm (~yehuda@2602:306:330b:1410:218b:dec7:4e73:e322) has joined #ceph
[15:23] * The_Bishop__ (~bishop@f052101142.adsl.alicedsl.de) has joined #ceph
[15:26] <scuttlemonkey> the Stonith guy sent me an update
[15:26] <scuttlemonkey> apparently it was some unrelated kernel issue
[15:26] <scuttlemonkey> they had just updated to 3.8.x a few days before... once they rolled back to 3.6.x it apparently fixed whatever issue was causing the reboots
[15:27] <janos> @scuttlemonkey: funky. at least they were able to rollback
[15:27] <cephalobot> janos: Error: "scuttlemonkey:" is not a valid command.
[15:27] <scuttlemonkey> the good news is that once they fixed that problem the ceph cluster came right back up with no issue or data corruption
[15:27] <janos> hopefully no data corruption
[15:27] <janos> cool!
[15:30] * The_Bishop_ (~bishop@e179008236.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[15:33] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has left #ceph
[15:34] * lofejndif (~lsqavnbok@212.84.206.250) has joined #ceph
[15:38] <absynth> interesting... the guy was really, really panicky
[15:40] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit (Quit: Leaving.)
[15:41] <janos> he sounded like he was not the techie, but his butt was still on the line
[15:41] <janos> not a good position
[15:41] <scuttlemonkey> yeah, his tech was gone for the long weekend
[15:41] <scuttlemonkey> he was the boss-guy w/ multiple-hundreds of potentially angry customers
[15:42] <absynth> scratch "potentially", i guess
[15:43] <janos> seems like some pre-upgrade deployment tests were not done - something they may start doing in the future i hope ;)
[15:43] <janos> when you have a customer base that large
[15:50] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) has joined #ceph
[15:50] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has joined #ceph
[15:51] * jlogan (~Thunderbi@2600:c00:3010:1:3500:efc8:eaed:66fd) has joined #ceph
[15:58] * gerard_dethier (~Thunderbi@85.234.217.115.static.edpnet.net) Quit (Quit: gerard_dethier)
[16:02] * ninkotech (~duplo@ip-89-102-24-167.net.upcbroadband.cz) Quit (Quit: Konversation terminated!)
[16:06] * vata (~vata@2607:fad8:4:6:c835:900e:24a0:35fa) has joined #ceph
[16:06] * BManojlovic (~steki@91.195.39.5) Quit (Quit: I'm off, and you do what you want...)
[16:08] <Norman> this is for building the kernel client http://ceph.com/w/index.php?title=Building_kernel_client, where can I find how to build the RBD.ko ? :)
[16:16] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit (Quit: Leaving.)
[16:26] <Gugge-47527> Norman: it should be in all recent kernels :)
[16:30] <barryo> it's not in el6's default kernel sadly
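
A quick way to check whether a given kernel already ships the rbd client, rather than patching one in:

    modinfo rbd                        # shows module details if the kernel provides it
    modprobe rbd && lsmod | grep rbd   # loads it and confirms it is present
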
[16:41] * b1tbkt (~Peekaboo@68-184-193-142.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[16:42] * The_Bishop__ (~bishop@f052101142.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[16:45] * sagelap (~sage@107.sub-70-197-64.myvzw.com) has joined #ceph
[16:48] * gregaf1 (~Adium@2607:f298:a:607:e847:b709:a931:7089) Quit (Quit: Leaving.)
[16:51] * The_Bishop__ (~bishop@f052101142.adsl.alicedsl.de) has joined #ceph
[16:52] * gregaf (~Adium@2607:f298:a:607:7d6f:9e6d:762c:9089) has joined #ceph
[16:52] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) Quit (Remote host closed the connection)
[16:58] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[16:59] <absynth> sagelap: it turns out i used your software as early as 96
[16:59] <absynth> and i can prove that
[17:06] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Remote host closed the connection)
[17:08] * BillK (~BillK@124-168-243-201.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[17:09] * brambles (lechuck@s0.barwen.ch) Quit (Remote host closed the connection)
[17:09] * brambles (lechuck@s0.barwen.ch) has joined #ceph
[17:12] * gerard_dethier (~Thunderbi@72.26-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[17:15] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) has joined #ceph
[17:23] <Karcaw> is anyone working on RPM's for 0.58?
[17:29] <joao> absynth, are you going to show us your page on geocities from back then? :p
[17:29] <janos> tripod!
[17:30] <joao> I have a feeling it's really on geocities, and using webring I bet :p
[17:30] <janos> haha
[17:31] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[17:32] * sagelap (~sage@107.sub-70-197-64.myvzw.com) Quit (Ping timeout: 480 seconds)
[17:42] * noob2 (~cjh@173.252.71.3) has joined #ceph
[17:42] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[17:44] * sagelap (~sage@2607:f298:a:607:6845:ba75:64c3:82a8) has joined #ceph
[17:56] * mynameisbruce (~mynameisb@tjure.netzquadrat.de) Quit (Remote host closed the connection)
[18:01] * mynameisbruce (~mynameisb@tjure.netzquadrat.de) has joined #ceph
[18:05] * Norman (53a31f10@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[18:05] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[18:07] * l0nk (~alex@83.167.43.235) Quit (Quit: Leaving.)
[18:08] * jtang2 (~jtang@2001:770:10:500:f0b2:3cf4:a78d:2f49) Quit (Ping timeout: 480 seconds)
[18:08] * jtangwk (~Adium@2001:770:10:500:9d7f:aa39:c3c1:4b7c) Quit (Ping timeout: 480 seconds)
[18:10] <absynth> joao: no, it wasn't geocities
[18:10] <joao> :(
[18:10] <absynth> but yes, it was in a webring
[18:10] <joao> :)
[18:10] <absynth> http://www.rhwd.owl.de/
[18:10] <absynth> (german only)
[18:11] <joao> I noticed
[18:11] <joao> good thing chrome translated pages on the fly :p
[18:11] <joao> *translates
[18:21] * joelio imagines a use case for using a LeapMotion in Ceph Administration
[18:22] * capri (~capri@p54A552CC.dip0.t-ipconnect.de) has joined #ceph
[18:22] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) Quit (Remote host closed the connection)
[18:23] <absynth> what's that?
[18:23] <absynth> a motion controller?!
[18:23] * capri (~capri@p54A552CC.dip0.t-ipconnect.de) Quit ()
[18:24] <joelio> absynth: https://www.leapmotion.com/
[18:26] <absynth> yeah, i had a cursory glance
[18:26] <joelio> pretty much a micro Kinect for your desk (without IR mapping by the looks of it)
[18:27] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[18:27] <joelio> I'm imagining some cheesy hackers style interactions with Ceph. Rubbing a matrix of PGs to kick off a scrub or something, could look quite fun :)
[18:28] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[18:28] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[18:29] <absynth> you are probably an avid fan of psDoom, too?
[18:30] <joelio> once upon a time, maybe :)
[18:31] * sagelap (~sage@2607:f298:a:607:6845:ba75:64c3:82a8) Quit (Quit: Leaving.)
[18:31] * sagelap (~sage@38.122.20.226) has joined #ceph
[18:33] <joao> I can imagine your cat jumping in front of that thing and taking all your osds down
[18:33] <joao> that said, http://www.youtube.com/watch?v=JQCP85FngzE
[18:33] * gerard_dethier (~Thunderbi@72.26-201-80.adsl-dyn.isp.belgacom.be) Quit (Quit: gerard_dethier)
[18:34] * jtangwk (~Adium@2001:770:10:500:3565:409a:b866:e8b2) has joined #ceph
[18:35] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[18:41] * ScOut3R (~scout3r@1F2EAE22.dsl.pool.telekom.hu) has joined #ceph
[18:54] * noahmehl_ (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) has joined #ceph
[18:58] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[18:59] * noahmehl (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) Quit (Ping timeout: 480 seconds)
[18:59] * noahmehl_ is now known as noahmehl
[19:02] * danieagle (~Daniel@177.97.248.247) has joined #ceph
[19:03] * oddover (~oddover@glados.colorado.edu) has joined #ceph
[19:05] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[19:05] <oddover> Hi all. I'm evaluating ceph for my environment. I'm trying to set it up on some test servers right now. My question is about the 5-minute quick start guide. It mentions a "ceph server" and "ceph client". would the client be a desktop or something that I'm mounting the FS to? or would it be a "slave" server that will have another copy of the FS that can be shared out.
[19:05] <oddover> I hope that makes sense
[19:06] * ScOut3R (~scout3r@1F2EAE22.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[19:07] <scuttlemonkey> any 'client' can mount the cephfs
[19:07] <scuttlemonkey> oddover: ^
[19:07] <dmick> In general "server" is the one or ones providing the service. So those would be the machines running the cluster and storing the data. "client" is the one accessing the service. So that would be the one mounting the filesystem, or accessing the cluster through S3, or whatever.
[19:07] <dmick> (there are many access methods)
[19:07] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[19:08] <oddover> dmick: That makes sense. that's what I was looking for.
[19:08] <scuttlemonkey> oddover: I assume you are specifically interested in cephfs since that's what you mentioned
[19:08] <oddover> so it is possible to have more than one server?
[19:08] <jmlowe> ceph is an object store (rados), there is a posix adaptor (cephfs), a s3 adaptor (radosgw), a thingy to turn a bunch of objects into a block device (rbd)
[19:08] <oddover> scuttlemonkey: yes.
[19:08] <scuttlemonkey> oddover: you can have many many "server" machines...but it becomes one cluster
[19:09] <dmick> oddover: the whole point of Ceph is that it's massively distributed for redundancy; it doesn't make much sense to have only one server. So, yes, by nature.
[19:09] <scuttlemonkey> client machines can mount it in a number of ways, even when we're just talking about cephfs specifically (and not the block device or object store)
[19:09] <oddover> ok. so lemme rephrase. I'm trying to set up a cephfs cluster with 3 servers, and potentially lots of clients (for now just one).
[19:09] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[19:10] <scuttlemonkey> so doing what...one monitor and 2 osd machines?
[19:10] <oddover> not sure I understand that terminology
[19:11] <scuttlemonkey> ok, a ceph cluster is made up of several different pieces
[19:11] <dmick> oddover: what's your actual question?
[19:11] * Philip__ (~Philip@hnvr-4dbd242e.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[19:11] <oddover> dmick: I'm trying to get a good understanding of the different pieces of ceph.
[19:12] <oddover> so I can set something up and compare with other distributed FS out there (glusterfs, xtreemFS, etc)
[19:12] <jmlowe> osd stores objects, mon knows where everything is, mds provides the mapping between posix semantics and objects
[19:12] <dmick> sounds like an intro presentation would fill in a lot of gaps
[19:12] <scuttlemonkey> oddover: I might recommend watching one of the webinars
[19:12] <jmlowe> you will have one or more of each of those
[19:12] <scuttlemonkey> sage does a great job of explaining the arch
[19:12] <scuttlemonkey> one sec, I'll grab link
[19:12] <dmick> ^ +10
[19:13] <oddover> thanks. that's probably what I'm missing
[19:13] <oddover> I only started looking at ceph today, so I'm very new
[19:13] <scuttlemonkey> the resources section on inktank has tons of good stuff: http://www.inktank.com/resources/
[19:14] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[19:14] <scuttlemonkey> but the advanced ceph also does a good job of doing basic arch: http://youtu.be/fQ64X7oCXkE
[19:15] <oddover> "Introduction to OpenStack and Ceph" and "Webinar – Getting Started With Ceph" look to be what I want. can you recommend one over the other?
[19:15] <dmick> take two, they're free
[19:16] <oddover> :P but they're an hour each
[19:16] <scuttlemonkey> the advanced one isn't all that advanced
[19:16] <scuttlemonkey> it's just aimed at a technical audience vs non
[19:16] <Gugge-47527> i would recommend the advanced one :)
[19:16] <oddover> ok, I'll try that one
[19:16] <Gugge-47527> and the other two :P
[19:17] <Gugge-47527> 3 hours is not much time to invest in it :)
[19:17] <oddover> thanks guys. I'll let you know if I have more questions later
[19:17] <scuttlemonkey> good luck :)
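
For completeness, once a cluster is up a client mounts CephFS against a monitor address. A minimal sketch, using a placeholder monitor IP and key:

    # kernel client
    mount -t ceph 192.0.2.10:6789:/ /mnt/ceph -o name=admin,secret=<key>
    # or the FUSE client
    ceph-fuse -m 192.0.2.10:6789 /mnt/ceph
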
[19:22] * danieagle (~Daniel@177.97.248.247) Quit (Quit: See you :-) and Thank You So Much for Everything!!! ^^)
[19:29] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Remote host closed the connection)
[19:31] <oddover> aha! much better understanding already.
[19:31] <scuttlemonkey> :)
[19:32] <oddover> the picture that's on the ceph homepage is very much complemented by the one before it in this presentation
[19:32] <oddover> I didn't get that
[19:32] * ScOut3R (~scout3r@1F2EAE22.dsl.pool.telekom.hu) has joined #ceph
[19:32] <oddover> that the red was disk/storage, blue is "api", and white/gray is sort of clients.
[19:35] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: Oops. My brain just hit a bad sector)
[19:36] <joshd> t0rn: can you check bobtail memory usage when you set log_max_recent = 500?
[19:40] <joshd> joao: can the global_id adjust by rank collide when the number of monitors changes, or is there a mechanism to prevent that?
[19:42] <joao> joshd, I'm pretty sure that won't be a problem
[19:43] * ScOut3R_ (~ScOut3R@1F2EAE22.dsl.pool.telekom.hu) has joined #ceph
[19:43] <joao> the monitor resets 'last_allocated_id' upon a finished election; during update_from_paxos() it will set the last_allocated_id to the max
[19:43] <joao> that will prompt the ceiling to be raised
[19:44] <joao> so you shouldn't have a clash with ids assigned prior to the last election
[19:44] <t0rn> joshd: is log_max_recent a client or cluster setting ?
[19:45] <joao> and all ids are assigned taking the rank and the number of monitors into consideration, regardless of whether you added/removed monitors
[19:45] <joshd> t0rn: both, but in this case it's just on the client that's probably causing the problem (since the default is 100000)
[19:45] <joao> joshd, was my explanation clear? any thoughts?
[19:47] <joshd> joao: yeah, that makes sense. thanks for the explanation
[19:47] <t0rn> joshd , ty, i'll test it in a bit and let you know of the results
[19:48] <joshd> t0rn: great, thanks
[19:57] * noahmehl (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) Quit (Read error: Connection reset by peer)
[19:58] * noahmehl (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) has joined #ceph
[19:59] * noahmehl (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) Quit ()
[20:05] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[20:05] * sagelap (~sage@38.122.20.226) Quit (Quit: Leaving.)
[20:05] * sagelap (~sage@2607:f298:a:607:6845:ba75:64c3:82a8) has joined #ceph
[20:17] * capri (~capri@p54A552CC.dip0.t-ipconnect.de) has joined #ceph
[20:23] <t0rn> joshd , here's how i tested. On my client, i placed a /etc/ceph/ceph.conf , with log_max_recent = 500 in [global] (otherwise blank). I then started my test instance with virsh create (using qemu-rbd). So far, the memory usage is much lower than before, and is about the same as argonaut client. I can let the test run for 24 hours and graph the result if you wish, but so far in the first 25 minutes of the test running, the most its went up
[20:25] * ScOut3R_ (~ScOut3R@1F2EAE22.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[20:26] <joshd> t0rn: great, no need to do 24hours, I was pretty sure that was the problem
[20:26] <joshd> t0rn: that'll be the default in 0.56.4 for clients (log_max_recent=500)
[20:27] <t0rn> ty joshd. Indeed I am now seeing about the same memory usage in my 0.56.3 client as i was on argonaut client
[20:30] <joshd> t0rn: yw. todin's 'memory leak' turned out to be caused by that, so I wanted to make sure you were hitting the same underlying issue
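
The client-side setting t0rn tested, as a ceph.conf snippet on the machine running qemu/librbd:

    [global]
        log_max_recent = 500   # cap on in-memory recent log entries (default at the time was 100000)
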
[20:37] * yehudasa (~yehudasa@2607:f298:a:607:8c77:4d39:f04c:96ab) Quit (Ping timeout: 480 seconds)
[20:37] * capri (~capri@p54A552CC.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[20:38] * capri (~capri@p54A552CC.dip0.t-ipconnect.de) has joined #ceph
[20:39] * fghaas (~florian@vpn13.hotsplots.net) has joined #ceph
[20:41] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[20:42] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) has joined #ceph
[20:43] * fghaas (~florian@vpn13.hotsplots.net) Quit ()
[20:44] * eschnou (~eschnou@46.85-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:45] * yehudasa (~yehudasa@2607:f298:a:607:1cb4:8aaa:37d8:7521) has joined #ceph
[20:53] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) Quit (Quit: This Mac has gone to sleep!)
[20:56] * capri (~capri@p54A552CC.dip0.t-ipconnect.de) Quit (Quit: Verlassend)
[21:00] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[21:01] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:02] * sagelap (~sage@2607:f298:a:607:6845:ba75:64c3:82a8) Quit (Quit: Leaving.)
[21:11] * fghaas (~florian@vpn13.hotsplots.net) has joined #ceph
[21:12] * sagelap (~sage@2607:f298:a:607:6845:ba75:64c3:82a8) has joined #ceph
[21:13] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[21:17] * noahmehl (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) has joined #ceph
[21:21] <fghaas> joshd: I'm seeing a "IOError: [Errno 32] Corrupt image download. Checksum was <md5> expected <othermd5>" error on a test folsom glance box backed by rbd (bobtail, 0.56.3). does that ring a bell to you at all? google is astonishingly silent. I'm tempted to rule out a general glance problem, as a file backed store works just fine
[21:29] <fghaas> that's an error coming from python-glanceclient afaict
[21:30] * eschnou (~eschnou@46.85-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[21:31] * Cube (~Cube@12.248.40.138) has joined #ceph
[21:31] <joshd> fghaas: I haven't gotten that message, but it seems it's verifying the image it gets back against the md5 stored by glance
[21:32] <fghaas> joshd: yeah it's from integrity_iter(iter, checksum)
[21:33] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[21:33] <joshd> fghaas: I'd suggest doing a deep scrub to check for issues
[21:33] <fghaas> hm. the checksum that glance is expecting, would that be the same as an md5sum on a kernel rbd dev from that image?
[21:34] <joshd> I'd expect so. or an rbd export
[21:34] <fghaas> joshd: the error existed before bobtail; I upgraded to rule out any version issues, and there was a deep scrub as part of the upgrade
[21:34] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit (Quit: Leaving.)
[21:35] <joshd> is it possible the original upload was incomplete somehow?
[21:35] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[21:36] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[21:38] <elder> Anybody else having trouble with google?
[21:38] <fghaas> joshd, ok the md5sum from rbd export matches the one that glance reports a seeing. so that rules out an issue on download. up next, as you suggested, I'm checking the upload :)
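
The comparison fghaas describes, roughly; the pool, image id, and the glance lookup are illustrative and may differ by glance version:

    rbd -p images export <image-id> /tmp/img.raw     # dump the RBD-backed glance image
    md5sum /tmp/img.raw                              # compare against the checksum glance recorded
    glance image-show <image-id> | grep checksum     # stored checksum, if the CLI exposes it this way
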
[21:39] <scuttlemonkey> elder: what kind of trouble? I have been enjoying the Adams doodle all day :)
[21:39] <elder> Back now.
[21:39] <elder> Weird.
[21:42] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[21:45] * ScOut3R (~scout3r@1F2EAE22.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[21:47] * ScOut3R (~scout3r@1F2EAE22.dsl.pool.telekom.hu) has joined #ceph
[21:48] * ScOut3R (~scout3r@1F2EAE22.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[21:51] <gregaf> oh dear
[21:51] <gregaf> I hate doodles
[21:53] <scuttlemonkey> but...but....Douglas Adams! :P
[21:54] <gregaf> I'm *still* desperately clicking on the Guide
[21:54] <gregaf> and now I'm waiting....
[21:54] <gregaf> I think there's going to be an answer, maybe?
[21:55] <gregaf> ah, and now it's repeating
[21:55] <gregaf> there we go, all done
[21:59] * kyann (~kyann@did75-15-88-160-187-237.fbx.proxad.net) Quit (Quit: I love my HydraIRC -> http://www.hydrairc.com <-)
[22:02] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) has joined #ceph
[22:04] <fghaas> joshd: you were totally right. the image had broken on upload. stupid me, I always suspected only a download problem
[22:05] <joshd> fghaas: I'm glad it wasn't a corruption on the backend :) I'm surprised it wasn't set to an error state by glance though
[22:06] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[22:06] <fghaas> joshd: the original image may have been created with glance image-create --download-from, and in that case I don't know whether glance ever creates a checksum before uploading to the store
[22:07] * andrew_ (~andrew@ip68-231-33-29.ph.ph.cox.net) has joined #ceph
[22:08] <joshd> I'd still expect the state to stay 'downloading' or whatever, and not be available
[22:08] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Quit: WeeChat 0.3.2)
[22:10] <andrew_> how do i deal with osd messages about slow requests? what do they actually mean?
[22:11] <joshd> fghaas: you may be hitting https://bugs.launchpad.net/glance/+bug/1146830 then
[22:14] <Gugge-47527> can mapping rbds on machines running osds cause deadlocks?
[22:16] * dpippenger (~riven@216.103.134.250) has joined #ceph
[22:16] <dmick> Gugge-47527: http://ceph.com/docs/master/faq/?highlight=deadlock#how-can-i-give-ceph-a-try
[22:16] <dmick> second paragraph
[22:17] <Gugge-47527> thankyou, i remember reading that before :)
[22:17] <dmick> andrew_: http://ceph.com/docs/master/rados/operations/troubleshooting-osd/?highlight=slow%20requests#old-requests-or-slow-requests
[22:18] <Gugge-47527> i think i just hit something like that on a 3.8.2 kernel though
[22:18] <Gugge-47527> im gonna get a few more machines to run my tests :)
[22:18] <dmick> "though" makes it sound like you think newer kernels eliminate the problem; they don't
[22:19] <Gugge-47527> ahh misread the first sentence :)
[22:20] <andrew_> thanks dmick; exactly to the point. no errors anywhere but dmesg.
[22:20] <Gugge-47527> just read it again, and now i get it
[22:21] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[22:23] <andrew_> if i can add storage to a server, is it better to add it to the sole osd on that server, or to start another osd on that server?
[22:23] * Philip__ (~Philip@hnvr-4dbd242e.pool.mediaWays.net) has joined #ceph
[22:24] <janos> i'd add another osd
[22:24] <janos> in most cases you're better off thinking of 1:1 osd:disk
[22:24] <janos> like anything there can be exceptions
[22:26] <andrew_> thanks. is there a rule of thumb about how many TB per OSD for a modern fast CPU?
[22:26] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit ()
[22:27] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has left #ceph
[22:27] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[22:27] <janos> they don't really give recommendations on a per-GB/TB measure, that i've seen
[22:27] <janos> it's more per osd
[22:27] <janos> i forget the exact numbers, but i think something like 1GHz per osd. and 1-2 GB of ram per OSD
[22:28] <janos> and that's not full-time
[22:28] <janos> but when heavy rebalancing or the like is going on, you'll want it
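
By that rule of thumb, a server with 8 data disks run as 8 OSDs would want very roughly 8 GHz of aggregate CPU and 8-16 GB of RAM, with the headroom mattering most during recovery and rebalancing.
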
[22:28] <sagewk> elder: ping?
[22:29] <elder> Yes/
[22:29] <sagewk> elder: did you do the ceph-client.git rebasing yet?
[22:29] <elder> No
[22:29] <sagewk> is master is empty (nothing there since the pull)?
[22:29] <elder> I'm gathering things, probably won't get to that until tonight.
[22:29] <elder> I haven't done anything to master.
[22:30] <sagewk> ok, i'm going to stick the crush/osdmap decoding fix in there, and then send it to linus..
[22:30] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[22:30] <elder> OK.
[22:30] <elder> I haven't looked to see if there's anything else to go there (or back-port for that matter). I forgot to last week.
[22:32] <sagewk> np
[22:32] <sagewk> i think this is the only regression
[22:36] <fghaas> joshd: possibly, yes. (sorry for the delay)
[22:40] <joao> fghaas, were you satisfied by my answer on G+? (assuming you had the time to read it since)
[22:41] <fghaas> joao: well satisfied in the sense that you explained your reasoning well, yes, definitely. thanks a lot. satisfied in the sense of yes, this will make it more reliable and I'm now less afraid of mon-massacring issues like the one in that bug report, not quite
[22:45] <fghaas> ... but if someone could come up with the leveldb equivalent of git reset --hard $tag to roll back to a previous version, that would be awesome
[22:52] * noahmehl_ (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) has joined #ceph
[22:52] <joao> fghaas, that wouldn't be so hard to accomplish really
[22:52] <joao> give me a morning and I'll get you a tool just for that
[22:52] <joao> I'll add that to my stack and get it done next weekend, as soon as this sprint's over
[22:54] * nick5 (~nick@74.222.153.12) Quit (Remote host closed the connection)
[22:54] * nick5 (~nick@74.222.153.12) has joined #ceph
[22:56] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[22:57] * noahmehl (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) Quit (Ping timeout: 480 seconds)
[22:57] * noahmehl_ is now known as noahmehl
[23:02] <fghaas> joao: that would be awesome. if someone kills all their mons with a similar issue, telling them "shut down all your mons, run /usr/sbin/joaos-awesome-repair-tool --dwim --all-osds, then fire them back up" is orders of magnitude better than "ouch. 'fraid you accept either open heart surgery or death"
[23:04] <joao> yeah
[23:04] <joao> there's a drawback on the whole thing though, but we might be able to surpass it without much fuss
[23:05] <joao> I've added that to my calendar, and will take a proper look over the weekend
[23:08] * drokita (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[23:08] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Read error: Connection reset by peer)
[23:08] <joao> nope, figured it out
[23:08] <joao> should be easy enough, just not sure if gregaf or sage will be happy with the solution :p
[23:10] <gregaf> huhwhat?
[23:11] <joao> say, you want to easily rollback your monitors to latest_monmap_version-2
[23:11] * stackevil (~stackevil@178.112.38.246.wireless.dyn.drei.com) has joined #ceph
[23:12] <joao> although that's trivial to do on the monmap side, you'd still have to consider Paxos versions, right?
[23:12] <gregaf> yeah, but it should all btrfs snapshot just fine
[23:13] * vata (~vata@2607:fad8:4:6:c835:900e:24a0:35fa) Quit (Quit: Leaving.)
[23:13] <joao> so, one possible solution, besides surgically removing those versions if still present on the paxos side (which would be tough to do), is that we could just create a new Paxos version that would revert to a state representing the store's state
[23:13] <joao> gregaf, but I'm guessing there's not many people running mons on btrfs
[23:13] <gregaf> true
[23:13] <gregaf> "new Paxos version"?
[23:13] <joao> I knew you wouldn't buy it
[23:14] <joao> I have a hard time buying it myself
[23:14] <gregaf> I don't know what the phrase means
[23:14] <joao> ah, kay
[23:14] <joao> so
[23:14] <joao> remember how we keep paxos versions independently from the service-specific versions, right?
[23:14] <gregaf> yeah
[23:15] <joao> a paxos version being short of a whole service-specific transaction encoded in a bufferlist
[23:15] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[23:15] <gregaf> you want to issue a command "snapshot" and bundle up the store in a transaction, and then let people say "revert" and pull it out and set the whole store based on that?
[23:15] <joao> so, in order to guarantee that the paxos state represents the store state, in case a new monitor joins in after the restart and performs a sync, or a recovery or whatever
[23:16] <joao> well, that would certainly be an option, but I guess the performance would severely drop during the snapshot
[23:16] <joao> not sure
[23:16] <joao> big enough stores would certainly suffer a toll with that approach
[23:16] <gregaf> I don't think we could pull it off with that much data — among other things the LevelDB store would hate you
[23:17] <gregaf> also I'm very reluctant to put anything like that into the monitors in a way that's accessible without separate tools
[23:17] <joao> I was thinking more on the lines of, 'set osdmap:last_committed = osdmap:last_commited-1'
[23:17] <joao> and add a Paxos transaction with just that
[23:17] <joao> err, a Paxos version, or append to the latest Paxos version, or whatever
[23:17] <gregaf> that would require also cleaning up the store OSDMaps past that version, etc
[23:18] <gregaf> s/store/stored
[23:18] <joao> not really
[23:18] <joao> you could, but not really
[23:18] <joao> as long as you set foo:latest_committed to 'x', you won't ever read versions past 'x'
[23:19] <gregaf> until you restart and go to figure out what you've got on-disk that you'd like to propose...
[23:19] <joao> that's done on the Paxos level
[23:19] <joao> services are no longer paxos-machines
[23:19] <gregaf> ah, right
[23:19] <gregaf> hurray
[23:19] <gregaf> still, eww
[23:19] <joao> yeah :(
[23:19] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[23:20] <joao> this might be a bit trickier than I thought
[23:20] <gregaf> also, what happens when somebody tries to revert back to versions that have already been trimmed
[23:20] * portante (~user@66.187.233.206) Quit (Quit: upgrading)
[23:20] <joao> I don't think that's a problem
[23:20] <gregaf> you're just really not supposed to revert
[23:21] <joao> right, you're not
[23:21] <gregaf> we might want to make it possible but we're not going to do so over a weekend
[23:21] <joao> but you might need to
[23:21] <gregaf> even if the code is that easy we want to think through the interfaces very carefully
[23:22] <gregaf> when could you force yourself into a situation that you want to revert and it's safe to do so only on the monitors and not the rest of the cluster?
[23:23] <joao> screwed monmaps for instance
[23:23] <gregaf> monmaps get distributed
[23:25] <joao> they do, but if you fubar'ed, you will want to bring the monitors down (if they're not already down) and do some monmap injection, or surgically roll back to the latest previous monmap
[23:25] <fghaas> gregaf: remember http://tracker.ceph.com/issues/2752? that, for example.
[23:25] <PerlStalker> You know, growing a ceph cluster is really easy when you get the crush map right.
[23:26] <joao> gregaf, I'm guessing this would only be used as a last resort kind of thing
[23:26] <gregaf> I'm not saying these are bad tools, I'm just saying that users do stupid things and so we don't want to unlock it until we've also simplified the conceptually difficult problems which are hidden behind these mechanically difficult problems
[23:28] <dmick> PerlStalker: that's the idea :)
[23:28] <gregaf> fghaas: joao: the maxosd one is actually a good example — what if your monitors aren't the same size, so one of them happily served up the new map with a higher cap for several seconds and shared it out to the OSDs, but you lost quorum?
[23:29] <gregaf> then the admin goes "aha! I know how to fix this!" and rolls back to a previous map, but now the OSDs and the monitors disagree on the state...
[23:29] <PerlStalker> dmick: I tried to break my cluster last week because I didn't add the osd to the crush map properly.
[23:29] * Philip__ (~Philip@hnvr-4dbd242e.pool.mediaWays.net) Quit (Read error: Operation timed out)
[23:30] <joao> yeah, rollbacks (if that is actually to come to life) should probably be done via 'ceph' and with a quorum in place
[23:30] <joao> gregaf, I thought about that, it's a major problem if we can't guarantee that all the monitors are in the same state
[23:31] <joao> but then again, there's no 'only a monitor serving an osdmap with a higher cap'
[23:32] <gregaf> joao: yes, and a whole ton of issues that would prompt a rollback are those which killed the monitors so you can't form a quorum of them — in general other stuff (not all of it) can be solved by simply undoing the change in a later map
[23:32] <joao> there's only the leader changing the osdmap, and the other monitors either committing that version or not
[23:32] <joao> if quorum fails and the monitors don't commit that version, the leader will make sure it is reproposed on the next election
[23:32] <dmick> PerlStalker: sounds like a bad idea, I recommend against it
[23:33] <joao> either way, the monmap wasn't really changed
[23:33] <joao> err, osdmap
[23:33] <gregaf> joao: not sure how that applies to what I was saying — in the bug referenced the monitors did commit, and then they crashed on the update_paxos
[23:34] <PerlStalker> dmick: It was, though I managed to avoid data loss. The fun will come when I create a new storage pool with different rules.
[23:35] <joao> I might have misunderstood what you meant, but I was under the impression that you were suggesting that the problem would be that we could end up with one monitor with an osdmap version and the remaining monitors in the quorum without said osdmap version
[23:35] <joao> gregaf, was that not it?
[23:36] <gregaf> no, I'm saying that applying the rollback after information has been leaked out of the monitors and into the cluster is *really bad*, and that can happen under any of a number of scenarios
[23:36] <gregaf> and cleaning up that leakage is the hard part
[23:36] <joao> ah, right
[23:36] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[23:36] <joao> sure, that's certainly true
[23:37] <gregaf> and the only ways in which it's easy to guarantee that the rollback has been done correctly on the monitors are cases in which it's almost guaranteed that the information has leaked out to the cluster
[23:42] * mo- (~mo@2a01:4f8:141:3264::3) has joined #ceph
[23:47] * BillK (~BillK@58-7-239-bcast.dyn.iinet.net.au) has joined #ceph
[23:48] * stackevil (~stackevil@178.112.38.246.wireless.dyn.drei.com) Quit (Quit: There are 10 types of people. Those who understand binary and those who don't.)
[23:49] * stackevil (~stackevil@178.112.38.246.wireless.dyn.drei.com) has joined #ceph
[23:49] * stackevil (~stackevil@178.112.38.246.wireless.dyn.drei.com) Quit ()
[23:50] <mo-> I'm feeling very stupid but I wanted to set up my first ceph node on a newly installed ubuntu 12.04 box. I added the ceph repo but I'm only seeing this: http://pastebin.com/KxsgE0tr
[23:50] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[23:51] <mo-> any ideas?
[23:51] <fghaas> mo-: which ubuntu repos do you have enabled?
[23:51] <mo-> main and updates
[23:51] <fghaas> just main, or universe too?
[23:52] <mo-> so I guess libbost is only available in universe?
[23:52] <mo-> +o
[23:52] <fghaas> no, libgoogle-perftools is
[23:52] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit (Quit: Leaving.)
[23:53] <mo-> hm okay that still leaves libboost and gdisk then
[23:53] <fghaas> oh yeah, libboost-thread too
[23:53] * stackevil (~stackevil@178.112.38.246.wireless.dyn.drei.com) has joined #ceph
[23:53] <mo-> well yea that does look much better now, thanks
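
The fix mo- applied, roughly; the release name is for Ubuntu 12.04, and add-apt-repository may require the python-software-properties package (otherwise add the line to /etc/apt/sources.list by hand):

    sudo add-apt-repository "deb http://archive.ubuntu.com/ubuntu precise universe"
    sudo apt-get update && sudo apt-get install ceph
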
[23:56] * andrew_ (~andrew@ip68-231-33-29.ph.ph.cox.net) Quit (Quit: andrew_)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.