#ceph IRC Log

IRC Log for 2013-04-29

Timestamps are in GMT/BST.

[0:04] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Read error: No route to host)
[0:06] * yehuda_hm (~yehuda@2602:306:330b:1410:c54d:84c4:231a:4ca6) Quit (Ping timeout: 480 seconds)
[0:08] * yehuda_hm (~yehuda@2602:306:330b:1410:d56:c508:5918:9bdb) has joined #ceph
[0:12] * mega_au (~chatzilla@94.137.213.1) Quit (Ping timeout: 480 seconds)
[0:17] * yehuda_hm (~yehuda@2602:306:330b:1410:d56:c508:5918:9bdb) Quit (Ping timeout: 480 seconds)
[0:21] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[0:21] * BillK (~BillK@58-7-104-61.dyn.iinet.net.au) has joined #ceph
[0:30] * yehuda_hm (~yehuda@2602:306:330b:1410:d56:c508:5918:9bdb) has joined #ceph
[0:31] * sjustlaptop (~sam@m842736d0.tmodns.net) has joined #ceph
[0:32] * sjustlaptop (~sam@m842736d0.tmodns.net) Quit ()
[0:32] * sjustlaptop (~sam@m842736d0.tmodns.net) has joined #ceph
[0:37] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[0:40] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:47] <iggy> dgbaley27: you can't have a normal filesystem mounted in 2 places at once (not even "read only")
[0:48] <iggy> which is another way of saying ^ that (oops, was slightly scrolled up)
[1:05] * john_barbee1 (~nobody@c-98-226-73-253.hsd1.in.comcast.net) has joined #ceph
[1:11] * tnt_ (~tnt@109.130.111.118) Quit (Ping timeout: 480 seconds)
[1:14] * sjustlaptop (~sam@m842736d0.tmodns.net) Quit (Read error: Connection reset by peer)
[1:14] * john_barbee1 (~nobody@c-98-226-73-253.hsd1.in.comcast.net) Quit (Remote host closed the connection)
[1:20] * yehuda_hm (~yehuda@2602:306:330b:1410:d56:c508:5918:9bdb) Quit (Ping timeout: 480 seconds)
[1:22] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:26] * dgbaley27 (~matt@mrct45-133-dhcp.resnet.colorado.edu) has left #ceph
[1:39] * yehuda_hm (~yehuda@2602:306:330b:1410:d56:c508:5918:9bdb) has joined #ceph
[1:58] * diegows (~diegows@190.190.2.126) has joined #ceph
[1:59] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[1:59] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[2:00] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:09] * yehuda_hm (~yehuda@2602:306:330b:1410:d56:c508:5918:9bdb) Quit (Ping timeout: 480 seconds)
[2:19] * yehuda_hm (~yehuda@2602:306:330b:1410:cc65:e250:dc68:8807) has joined #ceph
[2:40] * capri_wk (~capri@pd95c3284.dip0.t-ipconnect.de) has joined #ceph
[2:40] * capri_on (~capri@pd95c3284.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[2:50] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[3:11] * loicd (~loic@magenta.dachary.org) has joined #ceph
[3:17] * kfox1111 (~kfox@96-41-208-2.dhcp.elbg.wa.charter.com) Quit (Ping timeout: 480 seconds)
[3:37] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:03] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[4:06] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[4:06] * john_barbee1 (~nobody@c-98-226-73-253.hsd1.in.comcast.net) has joined #ceph
[4:06] * loicd (~loic@magenta.dachary.org) has joined #ceph
[4:06] * john_barbee1 (~nobody@c-98-226-73-253.hsd1.in.comcast.net) Quit ()
[4:13] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[4:17] * b1tbkt_ (~Peekaboo@68-184-193-142.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[4:25] * wschulze1 (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[4:30] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:32] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[4:36] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[5:03] * rustam (~rustam@94.15.91.30) has joined #ceph
[5:11] * nhm (~nhm@65-128-150-185.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[5:32] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[5:40] * wschulze1 (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[5:49] * loicd (~loic@magenta.dachary.org) Quit (Ping timeout: 480 seconds)
[6:45] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[6:55] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[7:14] * yehuda_hm (~yehuda@2602:306:330b:1410:cc65:e250:dc68:8807) Quit (Read error: Connection reset by peer)
[7:54] * tnt (~tnt@109.130.111.118) has joined #ceph
[8:01] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[8:03] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: If you think nobody cares, try missing a few payments)
[8:13] * bergerx_ (~bekir@78.188.101.175) has joined #ceph
[8:15] * mega_au (~chatzilla@94.137.213.1) has joined #ceph
[8:35] * jimyeh (~Adium@60-250-129-63.HINET-IP.hinet.net) has joined #ceph
[8:35] <jimyeh> Hi
[8:36] <jimyeh> I have a question about the leader of a monitor service.
[8:37] <jimyeh> Is the leader the only role which can call an election?
[8:46] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Read error: Connection reset by peer)
[8:52] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[8:56] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:09] * capri_wk (~capri@pd95c3284.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[9:09] * capri_on (~capri@pd95c3284.dip0.t-ipconnect.de) has joined #ceph
[9:23] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:28] * tnt (~tnt@109.130.111.118) Quit (Ping timeout: 480 seconds)
[9:29] <madkiss1> jimyeh: If that was the case, a cluster partition that has lost its quorum leader wouldn't be able to elect a new one
[9:33] * l0nk (~alex@83.167.43.235) has joined #ceph
[9:33] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[9:37] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[9:37] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:41] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[9:41] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[9:43] * leseb (~Adium@83.167.43.235) has joined #ceph
[9:44] <wido> jimyeh: This is what the monitors use: http://en.wikipedia.org/wiki/Paxos_%28computer_science%29
[9:56] * loicd (~loic@2a01:e35:2eba:db10:f06a:70:5e05:bfce) has joined #ceph
[10:03] * LeaChim (~LeaChim@90.204.16.57) has joined #ceph
[10:05] * tziOm (~bjornar@194.19.106.242) has joined #ceph
[10:08] * loicd (~loic@2a01:e35:2eba:db10:f06a:70:5e05:bfce) Quit (Quit: Leaving.)
[10:08] * rahmu (~rahmu@83.167.43.235) has joined #ceph
[10:15] * l0nk (~alex@83.167.43.235) Quit (Ping timeout: 480 seconds)
[10:30] * l0nk (~alex@83.167.43.235) has joined #ceph
[10:43] * coyo (~unf@00017955.user.oftc.net) Quit (Remote host closed the connection)
[10:46] * coyo (~unf@71.21.193.106) has joined #ceph
[10:48] * l0nk (~alex@83.167.43.235) Quit (Ping timeout: 480 seconds)
[10:50] * jtangwk (~Adium@2001:770:10:500:49cf:4204:1351:99c6) has joined #ceph
[10:53] * vo1d (~v0@91-115-229-155.adsl.highway.telekom.at) has joined #ceph
[11:00] * v0id (~v0@91-115-228-132.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[11:02] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[11:02] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[11:08] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[11:13] * loicd (~loic@185.10.252.15) has joined #ceph
[11:20] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[11:21] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[11:23] * jimyeh (~Adium@60-250-129-63.HINET-IP.hinet.net) Quit (Quit: Leaving.)
[11:27] * tnt (~tnt@212-166-48-236.win.be) Quit (Quit: leaving)
[11:35] * l0nk (~alex@83.167.43.235) has joined #ceph
[11:45] * jimyeh (~Adium@60-250-129-63.HINET-IP.hinet.net) has joined #ceph
[11:45] <nwl> morning!
[11:45] <liiwi> good afternoon
[11:46] <nwl> rare that I get to see people in the European time zone..
[11:47] <andreask> there are some here ;-)
[11:49] <nwl> i work for Inktank, so other than Joao, we are all in the US timezones mostly
[12:06] * Havre (~Havre@2a01:e35:8a2c:b230:2cd5:a92f:87c0:a2d1) Quit (Ping timeout: 480 seconds)
[12:16] * rahmu (~rahmu@83.167.43.235) Quit (Remote host closed the connection)
[12:17] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[12:17] * jimyeh (~Adium@60-250-129-63.HINET-IP.hinet.net) Quit (Quit: Leaving.)
[12:18] * joao (~JL@89-181-146-10.net.novis.pt) has joined #ceph
[12:18] * ChanServ sets mode +o joao
[12:18] <joao> morning #ceph
[12:23] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[12:35] * mynameisbruce (~mynameisb@tjure.netzquadrat.de) Quit (Quit: Bye)
[12:43] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[12:47] * LeaChim (~LeaChim@90.204.16.57) Quit (Ping timeout: 480 seconds)
[12:48] * jimyeh (~Adium@122-116-78-170.HINET-IP.hinet.net) has joined #ceph
[12:49] * LeaChim (~LeaChim@90.204.16.57) has joined #ceph
[12:50] <athrift> morning joao
[12:51] <joao> hi there
[12:51] * mynameisbruce (~mynameisb@tjure.netzquadrat.de) has joined #ceph
[12:56] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[13:05] * diegows (~diegows@190.190.2.126) has joined #ceph
[13:12] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Read error: Connection reset by peer)
[13:15] * sstan (~chatzilla@dmzgw2.cbnco.com) has joined #ceph
[13:15] <sstan> morning
[13:23] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[13:24] * nhm (~nhm@65-128-150-185.mpls.qwest.net) has joined #ceph
[13:33] <wido> joao: Around? :)
[13:34] <wido> I'm playing with the monitors built from the next branch, but it's not working for me
[13:34] <wido> $ ceph auth get-or-create hangs for ever
[13:34] <wido> The monitors say "handle command", but nothing changes. Already have the logs at debug mon = 10, but nothing yet
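For context, the hanging command wido describes is of the shape below; the client name, pool and caps are illustrative assumptions, not his exact invocation:

    ceph auth get-or-create client.example mon 'allow r' osd 'allow rwx pool=rbd'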
[13:35] * virsibl (~virsibl@94.231.117.244) has joined #ceph
[13:37] <joao> wido, I'm here, but still catching up with last week
[13:37] <wido> joao: No problem :)
[13:38] <wido> It's just that the monitors are not playing nicely. Don't think the next branch is ready for a 0.61 yet
[13:39] <wido> joao: Just for my idea. "forward_request" means the monitor is forwarding the request to the leader? Looking at Monitor.cc
[13:39] <joao> yes
[13:40] <wido> joao: Ok, I see now. While talking, I see that my leader is now handling my command. I tried it a couple of times and it is processing them now
[13:40] <wido> Like 10 minutes later
[13:40] <joao> hmm
[13:40] <joao> does your leader appear to be overloaded for some reason?
[13:41] <wido> No, not at all. Load of 0.54
[13:41] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[13:41] <wido> This machine is dedicated to being a monitor
[13:41] <joao> wido, point me to the leader's logs?
[13:41] <wido> All mons are
[13:41] <wido> joao: Yes, only logging isn't high there :(
[13:41] <joao> oh
[13:42] <joao> how's the memory consumption on the leader?
[13:42] <wido> joao: http://pastebin.com/J7BxE3ga
[13:42] <wido> joao: The whole machine uses 551MB
[13:42] <wido> out of the 4GB
[13:43] <wido> In the log you notice it's creating the keys, the commands suddenly come through. My client sent them to mon2 and mon3
[13:46] <joao> yeah, we should grab a bit more info with higher debug levels
[13:46] <joao> if you are able to reproduce this behavior, that is
[13:46] <joao> and if you're okay with it
[13:46] <joao> anyway, you should be able to inject the debug levels into the leader
[13:47] <wido> joao: Yes, I've been using the admin socket. It's just that I didn't do it on mon1 yet
[13:47] <wido> joao: 10 or 20?
[13:47] <joao> 20 would be better
[13:47] <joao> and add ms 1 too
[13:48] <wido> joao: I'll try. But my commands are coming through now. I just created a new key within 2 seconds
[13:49] <joao> wido, was there any monitor synchronizing by any chance?
[13:49] <wido> joao: No, they were already online for about 3 hours
[13:50] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[13:50] <joao> okay
[13:50] <joao> let me know if you get those logs
[13:51] <wido> joao: I'm trying to reproduce, but for now all works
[13:52] <joao> wido, okay, better reduce mon debug to 10 then; otherwise your logs will get overpopulated
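A sketch of the injection joao suggests, assuming the default admin socket path and a monitor named mon1 (both assumptions):

    ceph --admin-daemon /var/run/ceph/ceph-mon.mon1.asok config set debug_mon 20
    ceph --admin-daemon /var/run/ceph/ceph-mon.mon1.asok config set debug_ms 1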
[14:03] * KindTwo (KindOne@h121.181.130.174.dynamic.ip.windstream.net) has joined #ceph
[14:06] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[14:06] * KindTwo is now known as KindOne
[14:12] * vipr (~vipr@78-23-116-42.access.telenet.be) has joined #ceph
[14:14] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[14:17] * vipr_ (~vipr@78-23-113-134.access.telenet.be) Quit (Read error: Operation timed out)
[14:18] * vipr_ (~vipr@78-23-113-70.access.telenet.be) has joined #ceph
[14:18] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[14:18] * ChanServ sets mode +o scuttlemonkey
[14:22] * sstan (~chatzilla@dmzgw2.cbnco.com) Quit (Remote host closed the connection)
[14:23] * vipr (~vipr@78-23-116-42.access.telenet.be) Quit (Read error: Operation timed out)
[14:31] <matt_> joao, just an update but my store is up to 15G now :/
[14:32] * sstan (~chatzilla@dmzgw2.cbnco.com) has joined #ceph
[14:34] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[14:36] <wido> so joao, I have logs! This time it was mon2 (now leader) who took 4 minutes to process my command. Do you suggest any open bug?
[14:38] * jimyeh (~Adium@122-116-78-170.HINET-IP.hinet.net) Quit (Quit: Leaving.)
[14:40] <joao> matt_, I've been briefly looking into that today, will let you know as soon as I know something
[14:40] <joao> I'm still catching up with last week, as I've been out since past Wednesday :\
[14:41] <joao> wido, sure, create a ticket and attach the logs
[14:41] <joao> I'm just going to have lunch really quick and will be back soon
[14:42] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) has joined #ceph
[14:45] <niklas> hi there. When accessing RadosGW from the Amazon S3 Java Library (http://aws.amazon.com/de/sdkforjava/) I get "FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: error parsing headers: duplicate header 'Status'" in my apache error log
[14:46] <niklas> rados log looks good to me, and wireshark tells me, that the RadosGW sends a 500 for some reason I don't see.
[14:47] <tnt> if you're using a standard apache, you must disable the 100-continue in rgw config.
[14:48] <niklas> ok… how come?
[14:48] * TiCPU (~jeromepou@190-130.cgocable.ca) Quit (Ping timeout: 480 seconds)
[14:49] <niklas> It looks to me as if the fastCGI script passes an illegal response to apache, which apache doesn't want to handle…
[14:50] <niklas> see: http://tracker.ceph.com/issues/439, https://www.google.de/search?q=+FastCGI%3A+comm+with+server+"%2Fvar%2Fwww%2Fs3gw.fcgi"+aborted%3A+error+parsing+headers%3A+duplicate+header+'Status'&ie=utf-8&oe=utf-8&redir_esc=&ei=rWZ-UZz0FYWg4gS44YGADA#client=ubuntu&hs=xZo&channel=fs&sclient=psy-ab&q=FastCGI%3A+comm+with+server++aborted%3A+error+parsing+headers%3A+duplicate+header+'Status'&oq=FastCGI:+comm+with+server++aborted%3A+error+parsing+headers%3A+du
[14:50] <niklas> looks like a common problem with cgi scripts…
[14:51] <niklas> can't that be fixed?
[14:51] <tnt> add "rgw print continue = false" in your config
[14:51] <niklas> but for the time being: how do I disable the 100-continue in rgw?
[14:51] <niklas> thx
[14:51] <tnt> rgw has an "optimization" for the 100-continue but it's not compatible with the standard mod_fastcgi, you need a patched one.
[14:52] <niklas> ok, interesting
[14:52] <niklas> where would I get a patched fcgi?
[14:52] <niklas> btw: The config change works
[14:53] <niklas> thanks, I've been googling for hours^^
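The change tnt describes would look roughly like this in ceph.conf; the section name [client.radosgw.gateway] is an assumption, use whatever client section your radosgw instance runs under:

    [client.radosgw.gateway]
        rgw print continue = false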
[14:54] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:01] * TiCPU (~jeromepou@190-130.cgocable.ca) has joined #ceph
[15:02] <wido> joao: I created the issue for the mon: http://tracker.ceph.com/issues/4851
[15:06] <tnt> niklas: I had the same issue when trying nginx as frontend, and I had to resort to use wireshark on the fastcgi traffic to see the problem :p
[15:08] <tnt> mmm, let's hope 0.61 fixes this http://i.imgur.com/WSyPlkh.png this time.
[15:11] <mikedawson> joao: I am seeing the same things as wido. "ceph auth get-or-create hangs for ever" when the monitor you are starting can't join the quorum (or quorum isn't possible yet i.e. it is the only mon started out of three).
[15:12] <wido> mikedawson: In this case the monitors were already running for a long time and established a quorum
[15:13] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[15:16] <mikedawson> wido: ok. that is different then. All of my issues with ceph commands stem from time periods where quorum isn't established
[15:16] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 20.0.1/20130409194949])
[15:17] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:17] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit ()
[15:18] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:21] <mikedawson> matt_: When you say your store is 15G, I assume you are talking about the monitor's leveldb, right? Mine are huge, too. mon.a: 19GB, mon.b: 36GB, and mon.c: 37GB. Have you had quorum issues? Can you correlate the periods of rapid growth with quorum issues?
[15:28] * capri (~capri@pd95c3283.dip0.t-ipconnect.de) has joined #ceph
[15:29] * jimyeh (~Adium@112.104.142.211) has joined #ceph
[15:31] * capri_on (~capri@pd95c3284.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[15:41] * capri_on (~capri@pd95c3284.dip0.t-ipconnect.de) has joined #ceph
[15:47] * capri (~capri@pd95c3283.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[15:49] * madkiss1 (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Quit: Leaving.)
[15:50] <mikedawson> joao, matt_: You can see my monitor leveldb growth by day at: http://pastebin.com/raw.php?i=9m8rsCiR
[15:52] <joao> mikedawson, thanks
[15:53] <mikedawson> joao: I was running 0.60 with all three monitors in quorum on April 25th, performing several rados bench throughput tests. At some point mon.a fell out of sync and has not been able to catch up. On the 26th, I started running "next" and working with Sage, but he doesn't appear to have it diagnosed yet
[15:56] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[15:57] <joao> mikedawson, matt_, wrt the store size, I've just checked the store matt_ provided last week, and I'm baffled to say that the store only contains roughly 810MB worth of data despite being 9GB in size
[15:58] <mikedawson> joao: it also looks like mon_leveldb_cache_size may have an issue. The code appears to default to 32 * 1024 * 1024, but when I query the mon admin socket, I get "mon_leveldb_cache_size": "0". If I add "mon leveldb cache size = 67108864" I see it set properly in the admin socket
[15:59] <mikedawson> joao: I used 64mb above based on a comment from Jim Schutt
[16:00] <joao> mikedawson, OPTION(mon_leveldb_cache_size, OPT_U64, 0)
[16:00] <joao> default is set to 0
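For reference, the override mikedawson mentions goes in the [mon] section of ceph.conf; 67108864 (64 MB) is simply the value he tried, not a recommended setting:

    [mon]
        mon leveldb cache size = 67108864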
[16:00] <mikedawson> joao: I also cephdropped my three mon stores on Friday. Assuming they are still there, they were named "mikedawson-mon-*.tar.gz" or something similar, if you'd like to check them out
[16:00] <joao> kay thanks
[16:01] <matt_> mikedawson, Yep, it correlated with my quorum issues
[16:03] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[16:05] * yehuda_hm (~yehuda@2602:306:330b:1410:9843:ade5:62ab:dd40) has joined #ceph
[16:06] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) has joined #ceph
[16:06] * ssejour (~sebastien@out-chantepie.fr.clara.net) has joined #ceph
[16:09] <mikedawson> joao: when Jim proposed the patch, he had "OPTION(leveldb_cache_size, OPT_U64, 256 * 1024 * 1024)" http://www.spinics.net/lists/ceph-devel/msg13841.html . Then Greg merged it with the cache at 0 to save RAM. Setting any value there didn't help my situation, so if you don't think it is related, I'll go back to 0.
[16:10] <joao> I don't think store size is related with cache, but I can only wonder if that would somehow help with sync being faster
[16:12] <mikedawson> joao: yep, for me, it is all about getting a sync to work. There is a backtrace available http://tracker.ceph.com/issues/4815 for you to see where mon.a gets stuck
[16:12] <ssejour> hello. When I have multiple mon process declared in my ceph.conf. How the load is managed between my ceph clients and my mon processes? Is it a round robin?
[16:15] <Kdecherf> hey world
[16:16] <Kdecherf> does it exist any opened issue about a high memory consumption of MON on 0.60?
[16:22] * gmason (~gmason@hpcc-fw.net.msu.edu) has joined #ceph
[16:23] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[16:23] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[16:26] * gmason (~gmason@hpcc-fw.net.msu.edu) Quit ()
[16:26] * gmason (~gmason@hpcc-fw.net.msu.edu) has joined #ceph
[16:28] * capri_on (~capri@pd95c3284.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[16:30] <mikedawson> Kdecherf: there was a period of time where the build process did not properly include tcmalloc and memory consumption ballooned. iirc 0.60 was affected, but the gitbuilder "next" builds certainly were for a while. glowell fixed the build last Friday.
[16:30] * Meyer__ (meyer@c64.org) Quit (Ping timeout: 480 seconds)
[16:32] <Kdecherf> mikedawson: ok thanks, the issue only affects installations with tcmalloc enabled?
[16:34] <jtangwk> nice that someone is looking at erasure coding for ceph
[16:35] <mikedawson> Kdecherf: the issue I am aware of affects installations from packages built without tcmalloc. The test is to connect to the admin socket and examine heap stats. It will be something like this: "ceph -m {ip address of mon} heap stats"
[16:36] <Kdecherf> mikedawson: oh ok, well i'm affected :)
[16:37] <mikedawson> Kdecherf: so it says something like "no tcmalloc", right?
[16:38] <Kdecherf> mikedawson: all our packages are explicitly built without tcmalloc
[16:46] <mikedawson> joao: if the leveldb store has a high percentage of useless space (810MB is useful compared to a 9GB total), is there hope of truncating/shrinking the store?
[16:48] * kfox1111 (~kfox@96-41-208-2.dhcp.elbg.wa.charter.com) has joined #ceph
[16:49] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[16:55] * tziOm (~bjornar@194.19.106.242) Quit (Remote host closed the connection)
[16:58] * virsibl (~virsibl@94.231.117.244) has left #ceph
[17:02] <joao> mikedawson, don't know yet
[17:02] * kfox1111 (~kfox@96-41-208-2.dhcp.elbg.wa.charter.com) Quit (Ping timeout: 480 seconds)
[17:11] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Remote host closed the connection)
[17:18] * Meyer__ (meyer@c64.org) has joined #ceph
[17:23] * loicd (~loic@185.10.252.15) Quit (Ping timeout: 480 seconds)
[17:26] * vata (~vata@208.88.110.46) has joined #ceph
[17:36] * rustam (~rustam@94.15.91.30) has joined #ceph
[17:40] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[17:41] * madkiss (~madkiss@194.112.182.215) has joined #ceph
[17:41] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[17:44] * rustam (~rustam@94.15.91.30) Quit (Ping timeout: 480 seconds)
[17:51] * gregaf1 (~Adium@2607:f298:a:607:3536:c4f2:afcb:c49e) Quit (Quit: Leaving.)
[17:52] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Quit: wogri_risc)
[17:58] <mikedawson> sage: Agree with your punch list email. I have about 6 uninterrupted hours if I can assist in any way.
[18:00] <sage> mikedawson: thanks
[18:00] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:01] * gregaf (~Adium@2607:f298:a:607:3536:c4f2:afcb:c49e) has joined #ceph
[18:02] <sage> gregaf: waiting for mike's tarballs to download. in the meantime, looking at the fs hang
[18:03] <sage> gregaf: the mon issues is definitely the main thing tho
[18:03] <mikedawson> ^ hear, hear!
[18:03] <gregaf> yeah, I've got wido's logs from #4837 (that damned sync role assert) and Tamil's argonaut->bobtail->cuttlefish upgrade
[18:03] <sage> http://tracker.ceph.com/issues/4815 <-- the sync hang appears to be something with leveldb
[18:04] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:04] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[18:04] <sage> maybe sjust has an idea there
[18:05] <gregaf> oh, I haven't seen that full trace
[18:05] <gregaf> try turning up the leveldb tunables (and then crying if that fixes it), sage or mikedawson?
[18:05] <sage> mikedawson: did you have a chance to try that over the weekend?
[18:05] <mikedawson> gregaf, sage, sjust: did you guys see Joao's analysis of matt_'s leveldb store: "I'm baffled to say that the store only contains roughly 810MB worth of data despite being 9GB in size"
[18:05] <gregaf> (how big a pgmap are we looking at here?)
[18:06] <gregaf> hmm, I think I saw that go by but haven't thought about it at all
[18:06] <sage> mikedawson: which bug is that on?
[18:06] <gregaf> that is…bizarrely large
[18:07] <sage> i wonder if we could make ceph-mon explicitly trigger leveldbs compaction on startup or something
[18:07] <mikedawson> sage, I messed with mon_leveldb_cache_size at 64mb following Jim Schutt's thread. no change. Is there anything else you'd recommend?
[18:07] <sage> raise/lower write buffer size?
[18:07] <gregaf> that's a read cache, the write buffer size is more important
[18:08] <gregaf> and block size, although I'm not sure if bigger or smaller is better
[18:08] * BillK (~BillK@58-7-104-61.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:08] <gregaf> sage, joao, have we looked at how big an individual PGMap is?
[18:08] <sage> nope
[18:08] <joao> yeah, I can get that for you
[18:08] <joao> just a sec
[18:08] <sage> is it a problem if that exceeds the write buffer?
[18:08] <gregaf> there must be something about the data access patterns that is triggering this misbehavior, and that's the obvious one that's markedly different from what our testing covers
[18:08] <gregaf> sage: I have no idea
[18:09] <gregaf> and it shouldn't be 4MB anyway, but I could *imagine* it being problematic
[18:10] <mikedawson> I also have the issue of mon.a (totally won't sync), but mon.b can be a leader with mon.c a peon. That is until I start OSDs.... mon.b typically gets stuck probing (but sometimes mon.c gets stuck probing). Whichever doesn't start probing remains in its previous leader or peon state
[18:11] <sage> can you generate logs for that second behavior?
[18:12] * madkiss (~madkiss@194.112.182.215) Quit (Ping timeout: 480 seconds)
[18:13] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[18:13] <mikedawson> sage: is it bad that makes me chuckle? I'll get you the logs asap
[18:13] <sage> :)
[18:13] * tnt (~tnt@109.130.111.118) has joined #ceph
[18:14] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[18:14] <gregaf> bet that's just another overloaded emergent behavior (ugh)
[18:14] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:15] <sage> yeah
[18:15] * madkiss (~madkiss@089144192030.atnat0001.highway.a1.net) has joined #ceph
[18:17] <gregaf> sage: got another one of those disconnected inodes, you want the logs?
[18:18] <sage> attach it to the bug?
[18:18] <gregaf> (also, that's twice in two days but I've not seen it in years prior)
[18:18] <sage> we fixed something in that area ~2 months ago
[18:18] <sage> but yeah
[18:19] <sage> oh wait, you mean on the mds?
[18:19] <gregaf> no, ceph-fuse
[18:19] <gregaf> do you have the bug number handy?
[18:19] * madkiss (~madkiss@089144192030.atnat0001.highway.a1.net) Quit ()
[18:19] <joao> okay, gregaf, full pgmaps on matt_'s store from last week go as high as 2MB; mostly they are <1MB
[18:20] <sage> sorry no
[18:20] <gregaf> np, you just updated the bug :p
[18:20] <gregaf> oh, crap, wrong one
[18:21] * sagelap (~sage@76.89.177.113) has joined #ceph
[18:21] * ssejour (~sebastien@out-chantepie.fr.clara.net) has left #ceph
[18:21] <gregaf> but I found it
[18:21] * stacker666 (~stacker66@33.pool85-58-181.dynamic.orange.es) Quit (Ping timeout: 480 seconds)
[18:22] <gregaf> joao: hmm, that's definitely small enough to fit into our defaults, so I guess that's less likely to be the issue :/
[18:22] * jskinner (~jskinner@69.170.148.179) Quit (Ping timeout: 480 seconds)
[18:22] <joao> gregaf, we might be having some issues with leveldb's architecture though
[18:22] <gregaf> yeah, maybe
[18:26] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[18:28] <mikedawson> joao: Do you need to look at the size of my pgmaps? I uploaded my mon leveldbs to cephdrop, but they're pretty big, so it may be easier to do it on my end. Is there a process you can show me?
[18:32] <joao> mikedawson, I'll pastebin a script that will obtain those sizes then
[18:32] * l0nk (~alex@83.167.43.235) Quit (Quit: Leaving.)
[18:34] <joao> sage, gregaf, compacting the store reduces the size from 9GB to ~250MB
[18:34] <joao> didn't time it though
[18:35] <joao> timing it now
[18:36] <sage> i'm adding a --compact startup option for ceph-mon to test
[18:37] <joao> hmm, mikedawson, is installing py-leveldb an issue for you?
[18:38] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[18:38] <mikedawson> joao: apparently it isn't as easy as apt-get install py-leveldb
[18:39] * yehuda_hm (~yehuda@2602:306:330b:1410:9843:ade5:62ab:dd40) Quit (Ping timeout: 480 seconds)
[18:39] <joao> mikedawson, python-leveldb
[18:39] <mikedawson> joao: thx. got it
[18:43] <gregaf> joao: sage: sjust suggests that compaction is because we upgraded our config settings so much
[18:43] <sage> could be
[18:43] <sage> it may also be that our trimming workload pattern is pathological
[18:43] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[18:44] <sage> the CompactRange takes a range of keys.. we may want to have tick() or something doing some preemptive cleanup
[18:44] * yehuda_hm (~yehuda@2602:306:330b:1410:9843:ade5:62ab:dd40) has joined #ceph
[18:44] <mikedawson> sage: requested logs are under "mikedawson3". Test starts on line 1. mon.b (leader) and mon.c (peon) achieve quorum. Then I start OSDs between 16:33:16 and 16:33:27. Soon after, you'll see mon.b start probing.
[18:44] <joao> sage, gregaf, according to ##leveldb it might be an issue with our kind of workload
[18:45] <gregaf> sjust: joao: do you know if leveldb forces a block to be the specified size (if say you send through a synchronous write)?
[18:45] <gregaf> brb, standup
[18:45] <joao> 8<dominictarr> if you have lots of overwrites,
[18:45] <joao> 8<dominictarr> it needs compactions
[18:46] <gregaf> joao: we don't overwrite much, though?
[18:46] <joao> deletes are considered updates, and we do quite a few of them
[18:46] <sage> mikedawson: pushed wip-mon-compact
[18:46] <sage> will take a few to build, but: start ceph-mon with --compact option and it will compact leveldb before starting up
[18:46] <joao> compacting a 9GB store took roughly 1 minute
[18:46] <joao> just fyi
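A minimal sketch of invoking the new option, assuming a monitor named "a" and a binary built from the wip-mon-compact branch:

    ceph-mon -i a --compact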
[18:47] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[18:47] <mikedawson> sage, will it get built in ceph-deb-raring-x86_64-?
[18:47] <sage> -basic i think
[18:47] <sage> yeah
[18:48] <mikedawson> sage: ceph-deb-raring-x86_64-basic is a 404
[18:49] <sage> that one then :)
[18:49] <mikedawson> ha
[18:50] <gregaf> joao: ah, true
[18:50] <gregaf> but why would that prevent syncing?
[18:50] <gregaf> seems like the new monitor should just end up with a very small store compared to the others (like by doing compaction)
[18:51] <gregaf> I guess we wipe out the store and don't follow it by a compaction, though
[18:51] <joao> mikedawson, http://pastebin.com/uecBFAk7
[18:52] <joao> doh, just noticed I hardcoded the store path
[18:52] <joao> better run it from the mon directory
[18:52] * yehuda_hm (~yehuda@2602:306:330b:1410:9843:ade5:62ab:dd40) Quit (Ping timeout: 480 seconds)
[18:52] <joao> gregaf, that might be the issue
[18:53] <joao> we probably should compact the store after a clear
[18:53] <sage> joao: sync clear you mean?
[18:53] <gregaf> in ideal circumstances that would be pretty much never, though :p
[18:53] <joao> yes
[18:53] <sage> could also do it from bootstrap (when we're out of quorum anyway)
[18:54] <sage> let's see if that magically fixes mikedawson's mon.a sync...
[18:54] <joao> it would be great to take advantage of leveldb's compression for the sync though
[18:54] <joao> this store I've been testing on was compacted down to 250MB although it still contains 800+MB in data
[18:55] <joao> I guess libsnappy is being used
[18:55] <gregaf> oh, you know what it is, with a 4MB block size but a typical much-smaller value size we're getting a million whiteout entries that never get merged with the original data
[18:56] <gregaf> I bet that's why this is happening?
[18:56] <sage> so... small block size but still large write buffer? would that resolve jim's problem?
[18:56] <gregaf> or maybe with our all-sync workload we're running into some broken leveldb optimization where it's not merging sstables as often as we'd like
[18:56] <gregaf> sage: sjust was asking, but I don't know
[18:57] <sage> sjust: you're following Travis Rhoden's ceph-users thread?
[18:57] <sage> that's a fixed bug, right?
[18:57] <sjust> yes
[18:57] <sjust> I don't think so
[18:57] <sjust> I'm not sure we've seen this particular one
[18:57] * yehuda_hm (~yehuda@99-48-177-65.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[19:02] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:04] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Relax, its only ONES and ZEROS!)
[19:06] <cjh_> hey guys, question about ceph throughput. I'm seeing behavior with a 20 server cluster where the aggregate throughput is high but throughput to each individual server is low. I get about 40MB/s from each server when I would expect it to be much higher. I bench'd these servers at 500MB/s write speed to the raid 5 array. CPU and disk seem to not be using more than about 30%
[19:06] * rturk-away is now known as rturk
[19:06] <mikedawson> joao: entries: 552, size: 452.18M, avg: 838.83K
[19:07] <mikedawson> joao: and that's a mon that is 37GB on disk
[19:08] <sjust> cjh_: how are you generating the load?
[19:09] <cjh_> so i'm running radosbench -t 30 60 write on all the nodes
[19:09] <cjh_> it's writing to a 2 replica pool
[19:10] <cjh_> all the machines are in the same rack with a 10Gb switch at the top
[19:12] <sjust> cjh_: try running 4 rados bench instances per node
[19:12] * doubleg (~doubleg@69.167.130.11) Quit (Quit: leaving)
[19:12] <sjust> I seem to recall that radosbench throughput tends to plateau with a single process
[19:14] * doubleg (~doubleg@69.167.130.11) has joined #ceph
[19:14] * dmick (~dmick@2607:f298:a:607:d933:5785:4663:fa77) has joined #ceph
[19:14] <cjh_> ok i'll give that a shot :)
[19:14] * sjustlaptop (~sam@2607:f298:a:697:6526:b1a:3cb1:97d7) has joined #ceph
[19:14] <cjh_> so even if you say 30 threads that doesn't matter?
[19:14] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[19:15] <gregaf> cjh_: sjust: you're running a process on each node already? or one process across all nodes?
[19:15] <cjh_> gregaf: one process per node
[19:15] <sjust> cjh_: it matters, just less than you'd hope
[19:15] <cjh_> ok
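The multi-instance test sjust suggests might look like the sketch below; the pool name and the choice of four instances are assumptions:

    # four concurrent rados bench writers, 60 s each, 30 ops in flight per instance
    rados -p testpool bench 60 write -t 30 &
    rados -p testpool bench 60 write -t 30 &
    rados -p testpool bench 60 write -t 30 &
    rados -p testpool bench 60 write -t 30 &
    wait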
[19:17] <cjh_> sjust: if i integrate a radosgw on each host do you think i would get better performance writing to that instead of using the kernel client? It would reduce one layer by not having an extra xfs file system
[19:19] <nhm> cjh: how are you measuring per-server throughput?
[19:19] <cjh_> you're right. dual rados bench processes on each node gets slightly better throughput. i think i've maxed out my network
[19:19] <cjh_> i'm just going by what radosbench tells me
[19:19] <nhm> cjh_: so like 1 client node with 1 copy of rados bench going to 20 servers?
[19:19] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[19:20] <cjh_> nhm: not quite. running rados bench on each of the 20 servers and they're all writing to the same 2x replication pool
[19:20] <nhm> cjh_: so rados bench on the same servers as the OSDs are on?
[19:20] <cjh_> trying to simulate the entire cluster being written to at once
[19:20] <cjh_> right
[19:20] <nhm> cjh_: all concurrently?
[19:20] <cjh_> yup
[19:20] <cjh_> would it be better if i spun up 20 remote clients and ran rados bench on them?
[19:20] <cjh_> something not on the cluster
[19:20] <nhm> ok. And in that case, each one gets 40MB/s?
[19:21] <cjh_> yeah approx
[19:21] <nhm> ok, so like 800MB/s aggregate?
[19:21] <cjh_> yup
[19:21] <nhm> how many drives per host?
[19:21] <cjh_> 12 i believe in a raid5 configuration
[19:21] <nhm> all in 1 big RAID5?
[19:22] <cjh_> yeah
[19:22] * jimyeh (~Adium@112.104.142.211) Quit (Quit: Leaving.)
[19:22] <cjh_> now if i run rados bench on only 1 node i get 107MB/s
[19:22] <cjh_> but i would expect much higher right?
[19:22] <nhm> Are you sure the traffic is going over the 10G interface?
[19:23] <cjh_> good question
[19:23] <cjh_> it does sound like it's stuck on 1Gb doesn't it
[19:23] * l0uis (~l0uis@madmax.fitnr.com) has left #ceph
[19:23] <nhm> kinda sounds like it, but for performance, ceph tends to like 1-disk OSDs too.
[19:24] <cjh_> yeah i was thinking of breaking it into a jbod next
[19:24] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[19:24] * stephane (~stephane@109.88.67.89) has joined #ceph
[19:24] <nhm> though in this case that does sound like 1G.
[19:24] <cjh_> gah :(
[19:24] <cjh_> you're right. my ip is on my 1Gb interface
[19:24] <cjh_> hahah i don't know why i didn't pick up on that this weekend. guess i needed another set of eyes
[19:25] <nhm> cjh_: it happens, no worries. :)
[19:25] <joao> mikedawson, in case you want to check for more info on sizes on the store: http://pastebin.com/FZHjzFV8
[19:25] <sage> mikedawson: wip-mon-compact is built.. have a chance to test?
[19:25] <cjh_> still though, 800MB/s aggregate over 1Gb is good :)
[19:26] <joao> mikedawson, './script-name sizes' should provide more info on the whole store
[19:27] <nhm> cjh_: yeah, with 2x replication and raid5 that's not bad at all! :)
[19:27] * stephane (~stephane@109.88.67.89) Quit (autokilled: Do not spam. Mail support@oftc.net with questions. (2013-04-29 17:27:20))
[19:27] <cjh_> these servers are fairly powerful. that's why i was like wtf is going on
[19:27] <nhm> cjh_: what kind of servers?
[19:27] <cjh_> i believe they're hp brand
[19:28] <nhm> cjh_: Ok, there's someone else here that has some HP 2U boxes too.
[19:28] <cjh_> you have supermicro's right?
[19:28] <cjh_> i remember you saying awhile back
[19:28] * jskinner (~jskinner@69.170.148.179) Quit (Ping timeout: 480 seconds)
[19:31] <mikedawson> sage: starting now
[19:31] <nhm> cjh_: I've just got the 1 847a and a bunch of much much older nodes.
[19:32] <nhm> cjh_: some dell R515s too.
[19:32] <cjh_> gotcha
[19:32] <cjh_> how are they performing for you?
[19:32] <nhm> cjh_: which ones?
[19:32] <cjh_> the R515's
[19:33] * Tamil (~tamil@38.122.20.226) has joined #ceph
[19:33] * sjustlaptop (~sam@2607:f298:a:697:6526:b1a:3cb1:97d7) Quit (Ping timeout: 480 seconds)
[19:33] <nhm> cjh_: Problematic. I suspect something strange is going on with the drives/expanders/controller.
[19:34] <cjh_> oh man
[19:34] * BillK (~BillK@58-7-104-61.dyn.iinet.net.au) has joined #ceph
[19:35] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) has joined #ceph
[19:36] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[19:36] <matt_> nhm, are you still having issues with the 3.8 kernel?
[19:37] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[19:37] <mikedawson> sage, joao: the bad monitor (mon.a) went from 19G to 529k. Wow
[19:39] <joao> mikedawson, cool
[19:40] <nhm> matt_: our builds yes. I grabbed a 3.8 ppa kernel for rarring and that's working much better.
[19:40] <nhm> matt_: We have a ton of debugging enabled, it seems that it may be having a much bigger performance impact than it used to.
[19:41] <matt_> nhm, I don't know if it's relevant but my IOwait using raring's 3.8 and btrfs is through the roof even with a basically idle cluster
[19:41] * LeaChim (~LeaChim@90.204.16.57) Quit (Ping timeout: 480 seconds)
[19:41] <sage> mikedawson: can it sync now?
[19:42] * dwt (~dwt@128-107-239-234.cisco.com) has joined #ceph
[19:42] <nhm> matt_: hrm, I don't think I'm seeing that, but I haven't been paying too much attention since my throughput is fine.
[19:42] <nhm> matt_: doesn't happen with an older kernel?
[19:42] <matt_> nhm, it seems to write out data in huge chunks for no reason. iowait is 15% compared to 2% on a server running xfs
[19:42] <mikedawson> sage: I'm backing up store.db on mon.b and mon.c, then will try. got scared seeing how much was compacted on mon.a
[19:42] <matt_> but somehow large write throughput in unaffected.... it's strange
[19:43] <matt_> nhm, it did happen with the previous kernel but I don't think it was as pronounced. It seems much higher now
[19:44] <nhm> matt_: I've seen some remarks in the past that iowait statistics can be a bit misleading.
[19:44] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has left #ceph
[19:44] <nhm> matt_: I don't know enough about what's going on there to have much opinion though.
[19:47] <matt_> nhm, it's a bit odd because iostat confirms it's hitting the drives pretty hard with writes. The XFS server is writing out small 100k requests but the btrfs server is writing out bigger >4mb chunks
[19:48] <nhm> matt_: what's the max_sectors_kb set to for the devices?
[19:49] <matt_> nhm, 512
[19:49] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[19:49] <nhm> doesn't seem like btrfs should be able to write out data any larger than that?
[19:50] <matt_> nhm, sorry, I meant >4mb/s aggregate a second by looking at iostat
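The value nhm asks about can be inspected (and, as root, raised) through sysfs; "sdb" below is a placeholder device name:

    cat /sys/block/sdb/queue/max_sectors_kb
    echo 1024 > /sys/block/sdb/queue/max_sectors_kb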
[19:50] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[19:50] <joao> mikedawson, you can nonetheless check with the tool I pastebined to make sure the data is still there
[19:50] * LeaChim (~LeaChim@176.250.170.34) has joined #ceph
[19:50] <mikedawson> sage, joao: after compacting the monitors 37G -> 531M. And mon.a is in quorum and listed as the leader, mon.b and mon.c are peons.
[19:50] <joao> it ought to show the real data size for all keys
[19:51] <sage> excellent
[19:51] <sage> i'm adding 'mon compact on bootstrap = true' (default) to that branch
[19:51] <joao> sage, adding the leveldb interface
[19:51] <joao> err
[19:51] <nhm> matt_: not entirely sure what you mean? like BTRFS is more bursty or something?
[19:52] <mikedawson> osds started, quorum held up, and I'm back to HEALTH_OK
[19:52] <sage> joao: work off that branch.. i already added a 'compact al teh things' call
[19:52] <joao> leveldbstore function I meant
[19:52] <joao> okay
[19:52] * sjustlaptop (~sam@38.122.20.226) Quit ()
[19:52] <joao> sage, wip-mon-compact is it?
[19:52] <sage> yeah
[19:52] <joao> kay
[19:52] <sage> just repushed
[19:53] <sage> gregaf, joao: want to review? i think the bootstap is a good safety-net, even if we do figure out how to compact on trim
[19:53] <mikedawson> sage, joao, gregaf: great work! thank you so much!
[19:54] <matt_> nhm, so right now I have a pool with replica 2. One server is btrfs, one is xfs. The btrfs server is writing out data at almost 3x the rate of the xfs server via iotop and iostat
[19:55] <joao> sage, will do
[19:59] * eschnou (~eschnou@252.94-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[19:59] <gregaf> sage, joao: the monitor has pretty good docs now, let's not add new functions without maintaining those
[20:00] * rturk is now known as rturk-away
[20:00] * rturk-away is now known as rturk
[20:00] * pconnelly (~pconnelly@71-93-233-229.dhcp.mdfd.or.charter.com) has joined #ceph
[20:00] <mikedawson> sage, joao: I can see the mon leveldb growing at the rapid pace still, need to put something in for preventative maint
[20:00] <mikedawson> growing at maybe 10MB/min
[20:01] <gregaf> mikedawson: yeah, nothing implemented for that yet
[20:01] <mikedawson> gregaf: I'd say it's a cuttlefish blocker. 10MB/min is 15GB/day
[20:02] <gregaf> yeah, I'm with you
[20:02] <sage> heading to the dentist, wish me luck
[20:02] <dmick> gl
[20:02] <scuttlemonkey> pconnelly: howdy!
[20:03] <pconnelly> Patrick?
[20:03] <scuttlemonkey> you're in the right place :)
[20:03] <pconnelly> ok
[20:03] <scuttlemonkey> gregaf: dunno if sage or joshd mentioned an interesting MDS issue that was cropping up
[20:04] <scuttlemonkey> pconnelly has some logs for us (you)...says everything works ok for a while then it just randomly stops responding
[20:04] <pconnelly> where are the MDS logs you want to see?
[20:04] <gregaf> is it the restart-every-four-hours-to-fix one?
[20:04] <scuttlemonkey> a restart of the MDS fixes things...but it has gotten to the point where they have had to set up a cron to restart every 4 hours
[20:04] <scuttlemonkey> hah
[20:04] <gregaf> haven't heard any more than that it exists
[20:04] <scuttlemonkey> k
[20:04] <scuttlemonkey> logs incoming!
[20:05] <gregaf> okay, doing monitor stuff to haul joao out of the fire right now though :p
[20:05] <pconnelly> it got more frequent on Friday, so we changed the restart to every 30 mins...
[20:05] <scuttlemonkey> http://goo.gl/yDPt3
[20:05] <nhm> matt_: I typically see btrfs doing better, but not by that much.
[20:05] * sagelap1 (~sage@2600:1012:b02e:845e:74d7:2e7d:f7ab:339d) has joined #ceph
[20:05] <nhm> matt_: or do you mean that btrfs is writing out more backend data for the same amount of client data?
[20:06] <scuttlemonkey> gregaf: gotcha...will this give us the level of logging we need?
[20:06] <scuttlemonkey> ceph mds tell 0 injectargs '--debug_ms 1 --debug_mds 10'
[20:06] <gregaf> there's a good chance; if there's room though might as well put it at 20
[20:07] <matt_> nhm, Yep. That seems like the case but it's really only for small writes I think because I can max out the disks at the same speed as the xfs server
[20:07] <scuttlemonkey> pconnelly: ok, you have a bit of HDD space on the MDS box?
[20:07] <matt_> nhm, I'm going to convert the osd's to xfs over the next couple of days so it's not a huge deal. Just an FYI :)
[20:09] <scuttlemonkey> pconnelly: also, all ceph logs should be at /var/log/ceph/ if I remember correctly (so /var/log/ceph/mds/ for this)
[20:10] * sagelap (~sage@76.89.177.113) Quit (Ping timeout: 480 seconds)
[20:12] * eschnou (~eschnou@252.94-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[20:12] <joao> sorry, had to go afk for a while, but got to look at wip-mon-compact
[20:12] <pconnelly> in /var/log/ceph there's ceph.log, ceph-mon.a.log, ceph-osd.[0-3].log and no MDS directory
[20:13] <pconnelly> and a bunch of gzip'd files from previous days
[20:13] <scuttlemonkey> did you turn up the mds logging already?
[20:13] <joao> sage, gregaf, it looks good, and it got me thinking that we ought to aim at controlling the compaction ourselves instead of relying on leveldb for doing that at the most inconvenient time possible
[20:13] <scuttlemonkey> a la: ceph mds tell 0 injectargs '--debug_ms 1 --debug_mds 20'
[20:13] <gregaf> joao: yeah, we want to come up with a good method and interface for initiating compaction when we do deletes and such
[20:14] * tkensiski (~tkensiski@209.66.64.134) has joined #ceph
[20:14] * tkensiski (~tkensiski@209.66.64.134) has left #ceph
[20:14] <joao> doing it when we're out of quorum, right after a sync store clear and every now and then after trimming should keep things small
[20:15] <gregaf> joao: well, like mikedawson was saying, it's growing at tens of megabytes per minute, so we probably need to do it somewhat frequently
[20:15] <joao> ahm, right
[20:15] <gregaf> this might also be less of a problem if we used a smaller block size by default :/
[20:15] <pconnelly> scuttlemonkey: just did that
[20:16] <scuttlemonkey> pconnelly: cool, should be dropping stuff in an /mds/ dir I believe
[20:16] <gregaf> we should get a running cluster and see what different leveldb config settings produce in terms of that
[20:16] <joao> gregaf, but we could run into the whole performance issue again though
[20:16] * sagelap (~sage@2600:1012:b01f:4345:c685:8ff:fe59:d486) has joined #ceph
[20:16] <gregaf> scuttlemonkey: pretty sure logs all go to the same dir
[20:16] <scuttlemonkey> hrm...maybe I did the dir thing on my own at some point
[20:16] <scuttlemonkey> I remember fiddling
[20:16] <gregaf> joao: yeah, we might need to start tuning based on activity
[20:16] <pconnelly> no mds directory...
[20:17] <scuttlemonkey> but at the very least it should be ceph-mds.*
[20:17] <gregaf> (and obviously we need the cleanups either way, but if we can make it hurt less...)
[20:17] <joao> gregaf, on-the-fly?
[20:17] <sagelap> gregaf, sjust: pushed updated wip-mon-pg (pgs stuck creating)
[20:17] <sjust> k, I'll take a look
[20:17] <gregaf> will check that later today
[20:17] <gregaf> joao: probably not
[20:17] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[20:17] <gregaf> joao: sagelap: looking at the failed upgrade issues, and I'm super confused
[20:18] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[20:18] <gregaf> it appears that nothing updates the feature_set on disk once the cluster's created
[20:18] <pconnelly> scuttlemonkey: here's an ls
[20:18] <sagelap> that's entirely possible...
[20:18] <pconnelly> root@storage1:/var/log/ceph# ls -l
[20:18] <gregaf> but that's what is looked at by the cuttlefish store converter
[20:18] <pconnelly> total 17188
[20:18] <pconnelly> -rw------- 1 root root 3388492 Apr 29 11:18 ceph.log
[20:18] <joao> gregaf, I think that adjusting those options on the fly would involve closing and reopening the store anyway, so I don't think that would be much of an option
[20:18] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[20:18] <pconnelly> -rw------- 1 root root 1778038 Apr 29 06:25 ceph.log.1.gz
[20:18] <pconnelly> -rw------- 1 root root 1798411 Apr 28 06:25 ceph.log.2.gz
[20:18] <pconnelly> -rw------- 1 root root 1808273 Apr 27 06:25 ceph.log.3.gz
[20:18] <gregaf> and *something* is doing it for some of the monitors
[20:18] <pconnelly> -rw------- 1 root root 1787525 Apr 26 06:25 ceph.log.4.gz
[20:18] <pconnelly> -rw------- 1 root root 1769974 Apr 25 06:25 ceph.log.5.gz
[20:18] <pconnelly> -rw------- 1 root root 1753879 Apr 24 06:25 ceph.log.6.gz
[20:18] <pconnelly> -rw------- 1 root root 1746848 Apr 23 06:25 ceph.log.7.gz
[20:18] <pconnelly> -rw-r--r-- 1 root root 4673 Apr 29 11:15 ceph-mon.a.log
[20:18] * sagelap1 (~sage@2600:1012:b02e:845e:74d7:2e7d:f7ab:339d) Quit (Ping timeout: 480 seconds)
[20:18] <scuttlemonkey> pconnelly: ack!
[20:18] <pconnelly> -rw-r--r-- 1 root root 2551 Apr 29 06:24 ceph-mon.a.log.1.gz
[20:18] <gregaf> joao: no, I mean not even upgrading updates it
[20:18] <pconnelly> -rw-r--r-- 1 root root 2249 Apr 28 06:24 ceph-mon.a.log.2.gz
[20:18] <pconnelly> -rw-r--r-- 1 root root 1765 Apr 27 06:24 ceph-mon.a.log.3.gz
[20:18] <pconnelly> -rw-r--r-- 1 root root 802 Apr 26 04:24 ceph-mon.a.log.4.gz
[20:18] <pconnelly> -rw-r--r-- 1 root root 503 Apr 25 04:24 ceph-mon.a.log.5.gz
[20:18] <pconnelly> -rw-r--r-- 1 root root 444 Apr 24 04:24 ceph-mon.a.log.6.gz
[20:18] <scuttlemonkey> http://pastebin.com
[20:18] <pconnelly> -rw-r--r-- 1 root root 511 Apr 23 04:24 ceph-mon.a.log.7.gz
[20:19] <scuttlemonkey> please
[20:19] <pconnelly> -rw-r--r-- 1 root root 58942 Apr 29 11:17 ceph-osd.0.log
[20:19] <pconnelly> -rw-r--r-- 1 root root 47235 Apr 29 06:24 ceph-osd.0.log.1.gz
[20:19] <pconnelly> -rw-r--r-- 1 root root 46252 Apr 28 06:24 ceph-osd.0.log.2.gz
[20:19] * pconnelly was kicked from #ceph by joao
[20:19] <joao> gregaf, that is weid
[20:19] <joao> *weird
[20:19] <joao> let me take a look
[20:20] <gregaf> I have two monitors on the same node (burnupi45) which went through the same upgrade procedure, and they both have global versions in the right places, but for some reason one of them has that feature listed in the feature_set and one doesn't
[20:21] <joao> gregaf, I've seen that happening but was never able to reproduce it or figure out why
[20:21] <gregaf> should have made a bug...
[20:21] <gregaf> anyway, I can't even see how *any* of them are supposed to be updated, so help me out?
[20:21] <gregaf> where does that happen?
[20:21] <joao> actually, one of those times was that a monitor was still on pre-bobtail
[20:21] <gregaf> no, I mean when is the feature_set supposed to go to disk?
[20:21] <joao> gregaf, IIRC, something like 'update_on_disk_feature'
[20:22] <joao> I think that was removed for 0.59
[20:22] <scuttlemonkey> ok, so where do the mds logs go when you turn up the logging?
[20:22] <gregaf> all I can see in HEAD or v0.56.4 is update_msgr_features
[20:23] <gregaf> joao: the quorum features get updated in the monmap
[20:23] <wido> gregaf: I'm here for a bit, so if you need me
[20:23] <gregaf> but I've grepped for feature_set, COMPAT_SET_LOC, and write_features and am finding no writers except via mkfs
[20:23] <wido> it's a national holiday tomorrow, so I won't be around tomorrow
[20:23] <joao> gregaf, https://github.com/ceph/ceph/blob/v0.58/src/mon/Monitor.cc#L264
[20:23] <gregaf> wido: don't have any questions yet, thanks — just mentioning something to somebody :)
[20:24] <wido> sure, np
[20:24] <gregaf> scuttlemonkey: same place they go before you turn up the logging, which defaults to /var/log/ceph, but could be elsewhere by config
[20:24] <scuttlemonkey> yeah, pretty sure they haven't changed it
[20:24] <scuttlemonkey> he turned up logging and all I see is ceph.log.* ceph-mon, and ceph-osd
[20:25] <gregaf> turning up logging isn't going to create the file if it doesn't already exist (which it should)
[20:25] <gregaf> check the config via the admin socket, I guess
[20:26] <sjust> sagewk: wip-mon-pg looks good
[20:26] <sagelap> sjust: did your bug include mon thrashing?
[20:27] <sjust> one moment
[20:27] * BillK (~BillK@58-7-104-61.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[20:27] <joao> gregaf, did both monitors reach quorum?
[20:27] <sjust> no
[20:27] <joao> after the upgrade I mean
[20:28] <sagelap> yeah :( i think there is a similar bug lurking still
[20:28] <gregaf> joao: they should have, but I looked at the log and am seeing something strange
[20:28] <sjust> yeah
[20:28] <sagelap> i'll merge this tho
[20:28] <sjust> yep
[20:28] <sagelap> and mark the bugs dups
[20:28] <joao> gregaf, fwiw, they should both have this on their logs upon setting the on-disk flag: 'setting CEPH_MON_FEATURE_INCOMPAT_GV'
[20:28] <gregaf> I see "2013-04-26 17:21:32.771363 7fd50d4f1700 0 mon.b@1(leader) e1 setting CEPH_MON_FEATURE_INCOMPAT_GV" on mon.b, but it's waaaay after it should have been
[20:29] <gregaf> I think maybe it's only happening if they're the leader?
[20:29] <joao> that should also happen on peons
[20:29] <joao> gregaf, see https://github.com/ceph/ceph/blob/bobtail/src/mon/Monitor.cc#L252
[20:30] <sagelap> sjust: can probably downgrade the bug though.. don't think it's a blocker
[20:30] <sjust> I've got a bunch of logs to read after lunch, I'll downgrade it if something doesn't come out of that
[20:31] <sagelap> k
[20:31] <joao> gregaf, with debug mon 10, you should be able to grep for 'recovered_peon' and obtain something useful
[20:32] <gregaf> joao: yeah, you're right
[20:32] <gregaf> weird
[20:32] <gregaf> I'll have to dig into what's happening a bit more (it sure looks like it got restarted at the right times), but lunch first
[20:33] * danieagle (~Daniel@177.97.249.14) has joined #ceph
[20:38] <mikedawson> gregaf, joao: my leveldb is growing more like 26MB/min right now (37GB/day)
[20:39] * sagelap (~sage@2600:1012:b01f:4345:c685:8ff:fe59:d486) Quit (Ping timeout: 480 seconds)
[20:41] * diegows (~diegows@host28.190-30-144.telecom.net.ar) has joined #ceph
[20:44] * alram (~alram@cpe-75-83-127-87.socal.res.rr.com) has joined #ceph
[20:47] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:55] * eschnou (~eschnou@252.94-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:58] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[20:58] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[21:02] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[21:06] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[21:09] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[21:09] * ChanServ sets mode +o elder
[21:14] * atb (~atb@d24-141-198-231.home.cgocable.net) has joined #ceph
[21:15] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[21:19] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[21:21] * sileht (~sileht@sileht.net) Quit (Quit: WeeChat 0.4.0)
[21:22] <mikedawson> joao: second script output -> http://pastebin.com/raw.php?i=r23FDBM3 this mon is up to 3.1GB now
[21:22] <paravoid> so, uhm
[21:23] <paravoid> this happened a few minutes ago http://p.defau.lt/?shYp3tCTscSSRZeViQeu9Q
[21:23] <paravoid> this could be #4552
[21:24] <joao> mikedawson, thanks, I'll try to follow up with the leveldb guys, maybe they have an inkling on what may be happening to trigger this
[21:25] <mikedawson> joao: np. thanks for your help
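A simple way to measure that growth, assuming the default mon data path (/var/lib/ceph/mon/<cluster>-<id>) and the post-0.58 layout where the monitor keeps its leveldb under store.db (both assumptions worth verifying locally):

    watch -n 60 du -sh /var/lib/ceph/mon/ceph-a/store.db

Sampling once a minute makes rates like the ~26MB/min quoted above easy to confirm.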
[21:26] <paravoid> any reason not to increase min down reporters to 4 or something?
[21:26] <paravoid> or even <number of osds per box> + 1?
[21:33] * Cube (~Cube@12.248.40.138) has joined #ceph
[21:34] <gregaf> paravoid: feel free to set it that way :p but the defaults don't do that because it would be a lot of built-in intelligence nobody has found necessary so far, it would be a lot of work, and setting it to 9 would suck for people with smaller clusters ;)
[21:34] <paravoid> I have 12 disks per box, so I'd set it to 13 or 14
[21:34] <paravoid> considering the default is 1 it seems a bit excessive
[21:34] <paravoid> but otoh, a single hung SCSI controller marking the cluster down is also not very nice :)
[21:35] <paravoid> 14 is ~10% of the cluster though
[21:35] <gregaf> yeah
[21:36] <gregaf> that should be fixed when we release cuttlefish, like Sam said
[21:37] <paravoid> what's the downside though?
[21:39] <gregaf> just need to acquire more reports — probably not a downside in practice
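For the record, the settings being discussed are mon osd min down reporters (how many distinct OSDs must report a peer down) and mon osd min down reports (how many total reports are needed); a sketch of the one-box-plus-margin value paravoid proposes, assuming those option names match your running version:

    [mon]
        mon osd min down reporters = 14

The trade-off is the one gregaf notes: failure detection needs more independent witnesses before the monitors will mark an OSD down.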
[21:43] * atb (~atb@d24-141-198-231.home.cgocable.net) Quit ()
[21:43] * sagelap (~sage@2600:1012:b001:6f46:e4ba:9b67:a158:8a80) has joined #ceph
[21:46] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[21:53] * mib_02jdm9 (5b09d5e2@ircip2.mibbit.com) has joined #ceph
[21:54] * sagelap (~sage@2600:1012:b001:6f46:e4ba:9b67:a158:8a80) Quit (Ping timeout: 480 seconds)
[21:54] <gregaf> joao: figured out the upgrade issue — reset() is never called on peons in v0.56.4
[21:54] <gregaf> this is possibly not great
[21:55] <gregaf> wait, damn, maybe it is…must look more carefully...
[21:56] <sage> back
[21:57] <gregaf> *checks more* yeah, I don't think it does…do the monitors go through bootstrap to do an election?
[22:00] * sagelap (~sage@76.89.177.113) has joined #ceph
[22:01] <sage> where are we at with the leveldb thing? have we tried compacting on trim?
[22:04] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[22:08] <mikedawson> sage: I'm still up and running, currently growing leveldb at ~26MB/min (~37GB/day)
[22:08] * pconnelly (~pconnelly@71-93-233-229.dhcp.mdfd.or.charter.com) has joined #ceph
[22:10] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[22:11] <sage> joao: ping
[22:11] <gregaf> I think he went for dinner
[22:11] <sage> k
[22:11] <sage> do you know if he was going to try compacting on trim?
[22:12] * lofejndif (~lsqavnbok@SIPB-TOR.MIT.EDU) has joined #ceph
[22:12] <joao> sage, gregaf, here now
[22:12] <paravoid> hey sage, any downsides to setting osd-min-reporters to 14? :)
[22:13] <sage> just makes it harder to mark failed osds down, but ought to be fine on a large cluster
[22:13] <paravoid> yeah, I had an incident where basically 2-3 osds marked everything down again
[22:13] <joao> sage, yeah, but I wonder if we should always compact on trim or take a more relaxed approach
[22:13] <sage> similar thing where there was a hang from hw or something?
[22:14] <paravoid> could be, who knows
[22:14] <paravoid> good memory btw :)
[22:14] <sage> sjust just worked on a fix for that last week.
[22:14] <joao> then again, working on getting debug infos out during proposals
[22:14] <paravoid> this was #4552, fixed in cuttlefish
[22:14] <paravoid> yeah
[22:14] <paravoid> but in any case it seemed prudent to increase min reporters
[22:14] <sage> yeah
[22:14] <paravoid> whatever the underlying cause
[22:15] <paravoid> 14 is one box (12) + 2
[22:15] <paravoid> also ~10% of the cluster (144 osds)
[22:15] <paravoid> I'll leave it at that for now :)
[22:15] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[22:15] <paravoid> thanks
[22:15] <sage> sounds reasonable. will be interested in hearing how it works out
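If restarting the monitors is inconvenient, the same value can usually be pushed into a running mon over its admin socket; a hedged example assuming mon.a, the default socket path, and that the option is re-read when failure reports are processed (if not, a restart is still needed):

    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config set mon_osd_min_down_reporters 14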
[22:15] * eschnou (~eschnou@252.94-201-80.adsl-dyn.isp.belgacom.be) Quit (Remote host closed the connection)
[22:16] <sage> joao: you tested doing a full compaction or just the trimmed range?
[22:16] <joao> sage, ?
[22:16] * eschnou (~eschnou@252.94-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[22:17] <joao> I tested the compaction on a 9GB store using a modified ceph_test_store_tool
[22:17] <sage> i'm just catching up on the compacting on trim.. what have you tried/are trying?
[22:17] <joao> sage, don't have a test for that going yet
[22:18] <joao> first I'm going to get some output on transactions, to check how big each put is (to assess whether the store is growing faster due to block sizes and/or page alignment of sorts)
[22:18] <joao> then going to try to tune leveldb a bit to play with that
[22:18] <sage> k
[22:19] <joao> and then, force compact when appropriate
[22:19] <sage> i'll make a branch that blindly compacts then so that mikedawson can test it
[22:19] <joao> kay
[22:19] <gregaf> sage: joao: http://tracker.ceph.com/issues/4858
[22:20] <joao> sage, I believe that compacting directly on MonitorDBStore::apply_transaction ought to do it
[22:20] <sage> compacting the whole thing?
[22:20] <joao> that and on MonitorDBStore::clear() (or whatever the name is)
[22:20] <sage> i'm thinking we can just compact the range we trimmed (or similar)
[22:20] <joao> sage, we can mark a transaction with deletes, for instance
[22:20] <joao> and only compact then
[22:20] <joao> hmm
[22:20] <joao> that might work
[22:20] <joao> but we don't really know that
[22:20] <sage> are there normally deletes outside of trim?
[22:21] <joao> I mean, we'd have to infer it somehow
[22:21] <joao> sage, no
[22:21] <gregaf> sage: we don't really know what the ranges are, and the interface needs to not suck for exposing that at the right layers
[22:21] <joao> none that I recall, besides the sync store clear
[22:21] <sage> gregaf: yeah
[22:21] * eschenal (~eschnou@252.94-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[22:21] <joao> the store sync clear does indeed delete every single key in every paxos service
[22:21] <sage> well, we could compact the full range for the paxosservice that is trimming
[22:21] <joao> it's a bunch of deletes
[22:22] <joao> but that shouldn't happen often
[22:22] <joao> sage, yeah, that would probably work
[22:22] <joao> but then again, that would have to be done on the PaxosService level
[22:22] <joao> not that it is a problem imo, just not as fancy
[22:23] <sage> i'll try that.
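To make the compact-on-trim idea concrete, here is a rough sketch against the plain leveldb API of "apply the transaction, and compact if it carried deletes"; this is an illustration only, not the wip branch mentioned below, and the helper function, its arguments, and the deletes flag are hypothetical:

    #include <cassert>
    #include <leveldb/db.h>
    #include <leveldb/write_batch.h>

    // Hypothetical helper: apply a write batch and, if it contained
    // deletes (e.g. a paxos trim), ask leveldb to compact so the freed
    // space is reclaimed instead of lingering in old sstables.
    void apply_and_maybe_compact(leveldb::DB *db,
                                 leveldb::WriteBatch &batch,
                                 bool batch_had_deletes)
    {
      leveldb::WriteOptions opts;
      opts.sync = true;                    // monitor writes must be durable
      leveldb::Status s = db->Write(opts, &batch);
      assert(s.ok());                      // real code would handle errors

      if (batch_had_deletes) {
        // nullptr/nullptr compacts the whole key range; limiting this to
        // the trimmed service's prefix, as discussed above, would be the
        // gentler variant.
        db->CompactRange(nullptr, nullptr);
      }
    }

Whether to compact the full range or only the trimmed prefix is exactly the trade-off being weighed above.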
[22:26] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:27] * eschenal (~eschnou@252.94-201-80.adsl-dyn.isp.belgacom.be) Quit (Quit: Leaving)
[22:36] * danieagle (~Daniel@177.97.249.14) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[22:37] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[22:38] <sage> mikedawson: still around? i have something you can test...
[22:39] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[22:39] <sage> 0f7d951003b09973b75b99189a560c3a308fef23 new wip-mon-compact... can you try that on one mon and see if growth stops?
[22:44] * mib_02jdm9 (5b09d5e2@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[22:46] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) has joined #ceph
[22:48] * sagelap (~sage@76.89.177.113) Quit (Ping timeout: 480 seconds)
[22:51] <Vjarjadian> so, has there been any progress on Geo-Replication or the async replication that is planned for the next version?
[22:53] * sagelap (~sage@2600:1012:b021:ada9:7896:aa93:7771:c315) has joined #ceph
[23:07] <yehuda_hm> Vjarjadian: we're working on it
[23:09] * sagelap (~sage@2600:1012:b021:ada9:7896:aa93:7771:c315) Quit (Ping timeout: 480 seconds)
[23:19] * eschnou (~eschnou@252.94-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[23:42] <cjh_> are there docs on setting up a metadata server for ceph? i don't see much on the wiki. maybe i'm looking in the wrong spot
[23:42] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[23:42] * portante is now known as portante|afk
[23:43] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Ping timeout: 480 seconds)
[23:44] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[23:45] * DarkAceZ (~BillyMays@50.107.54.92) has joined #ceph
[23:45] <sage> mikedawson: ping
[23:46] * sagelap (~sage@76.89.177.113) has joined #ceph
[23:47] <davidz> cjh_: Did you see this: http://ceph.com/docs/master/start/quick-ceph-deploy/#add-a-mds
[23:48] <cjh_> davidz: yup, but how do i do that without ceph-deploy? I'm running bobtail
[23:53] <davidz> cjh_: I may be totally wrong, but if you've configured the mds in your ceph.conf and everything else is running, just try starting the mds.
[23:55] <cjh_> sounds easy enough :)
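For completeness, the no-ceph-deploy version davidz is describing is roughly the following on bobtail; a sketch assuming the stock sysvinit script and cephx disabled (with cephx on, an mds key has to be created first, e.g. via ceph auth get-or-create), and using a hypothetical host name:

    [mds.a]
        host = mdshost1

    # on that host, using the bundled init script:
    service ceph start mds.a

Once the mds registers with the monitors it should show up in the ceph -s output.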
[23:58] <sstan> getting : mon/AuthMonitor.cc: 148: FAILED assert(ret == 0)
[23:58] <sstan> is that a common problem? (0.60)
[23:59] <sstan> when trying to start mon

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.