#ceph IRC Log

IRC Log for 2013-06-04

Timestamps are in GMT/BST.

[0:01] * eegiks (~quassel@2a01:e35:8a2c:b230:50b6:5de:170f:95f2) Quit (Ping timeout: 480 seconds)
[0:02] * eegiks (~quassel@2a01:e35:8a2c:b230:dd7:5938:8fb7:d3b0) has joined #ceph
[0:03] * drokita1 (~drokita@199.255.228.128) has joined #ceph
[0:03] * mrjack (mrjack@office.smart-weblications.net) has joined #ceph
[0:06] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[0:08] * Tamil (~tamil@38.122.20.226) has joined #ceph
[0:09] * tziOm (~bjornar@ti0099a340-dhcp0745.bb.online.no) Quit (Remote host closed the connection)
[0:09] * drokita (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[0:09] * portante (~user@66.187.233.206) Quit (Quit: home)
[0:14] <jlogan> I'm watching a problem with 1 OSD on my setup.
[0:14] <jlogan> it goes up for a while, then down, and then back in.
[0:15] <jlogan> the log has: "initiating reconnect" and "had timed out after 15" as the main messages
[0:16] * drokita1 (~drokita@199.255.228.128) Quit (Quit: Leaving.)
[0:18] <cjh_> yehuda_hm: that works! so now when i try to suspend a user it returns 200 but does nothing haha
[0:18] <cjh_> i used this format: {admin}/user?format=json&uid=BadUser&suspended=True
[0:20] <davidz> jlogan: The first thing I'd check is that you don't have a disk that is failing. Check for kernel I/O error messages.
[0:20] <davidz> what about load on that machine?
[0:20] <jlogan> davidz: no indication of disk errors in dmesg
[0:21] <jlogan> uptime shows: 14 14 14
[0:21] <jlogan> We have watched the disk usage on that OSD go down today. 45% down to 40%.
[0:22] * via (~via@smtp2.matthewvia.info) has joined #ceph
[0:24] <davidz> That seems high to me. Can you find what processes are using the most CPU, like with "top" or "ps"?
[0:28] <jlogan> Load is currently 5. ceph-osd is using 36.5% and 37.6%, 101% and 3.9%
[0:29] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[0:29] <davidz> jlogan: You might want to 'ceph osd set noout' while you diagnose this, so that the OSD doesn't get marked out and recovery starts moving things around.
[0:30] * redeemed_ (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) Quit (Quit: bia)
[0:30] <yehuda_hm> ccourtaut: sorry, I'm kinda afk now, will be in later
[0:31] <ccourtaut> yehuda_hm: ok no problem, i probably won't be there as i'm in euro timezone
[0:31] <jlogan> We did a stop, umount, mount, and now we are watching it again.
[0:31] <yehuda_hm> so, tomorrow (my) morning is good too
[0:32] <ccourtaut> yehuda_hm: what is the best time for you, in your timezone?
[0:32] <jlogan> We may try the noout as well to see if that lets it finish the rebuilding.
[0:34] <jlogan> We have a mon (mon-b) that keeps being removed from the list of 3 hosts.
[0:34] <yehuda_hm> ccourtaut
[0:34] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[0:35] <yehuda_hm> ccourtaut: probably ~10am pacific
[0:35] <ccourtaut> ok, i take note
[0:35] * Meths (rift@2.27.72.232) Quit (Remote host closed the connection)
[0:35] * Meths (rift@2.27.72.232) has joined #ceph
[0:36] <ccourtaut> yehuda_hm: bye
[0:37] * tnt (~tnt@91.176.24.98) Quit (Read error: Operation timed out)
[0:39] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[0:39] <davidz> jlogan: what version of ceph are you running?
[0:40] <jlogan> 0.63
[0:41] * tnt (~tnt@91.176.13.220) has joined #ceph
[0:46] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) Quit (Quit: leaving)
[0:49] <jlogan> I noticed one of the hosts swapping with 12G of disk cache. Is there a sysctl you suggest around swap?
[0:52] <nigwil> I've got a rogue OSD that is too full, the cluster is in backfill_toofull, is there a way of draining the single too full OSD (is this the "delete some PGs" trick)?
[0:52] * athrift (~nz_monkey@203.86.205.13) Quit (Quit: No Ping reply in 180 seconds.)
[0:53] * athrift (~nz_monkey@203.86.205.13) has joined #ceph
[0:54] <sjust> jlogan: odd, how much memory/how many osds?
[0:54] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: I'm off, and you do whatever you want...)
[0:57] <jlogan> 24GB ram, 5 or 6 OSDs per host.
[0:57] <jlogan> 3 hosts
[0:59] <sjust> hmm, should be enough memory
[0:59] <sjust> though with 12G of disk cache, it would be tighter
[0:59] <sjust> how much are the osds using?
[1:00] <jlogan> via top, looks like 1-3G per OSD.
[1:00] <sjust> that's not too bad...
[1:00] <sjust> it oom'd with <3G res?
[1:01] <jlogan> I have a heart in your response G res?
[1:03] <jlogan> I have not had any OOM in the dmesg output.
[1:04] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:07] <sjust> oh, they didn't oom?
[1:07] <dmick> jlogan: "less than 3 GB resident"
[1:08] <sjust> sorry, misread
[1:08] <sjust> hmm 1-3G is fairly normal
[1:08] <sjust> during recovery, that is
[1:08] <sjust> how many pgs do you have?
[1:09] <jlogan> no, nothing went OOM, just slow...
[1:09] <jlogan> pgmap v4264363: 4472 pgs: 4472 active+clean; 4013 GB data, 8402 GB used, 20512 GB / 29808 GB avail; 0B/s rd, 16640B/s wr, 9op/s
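The pgmap line jlogan pasted packs the cluster's counters into one string. A small parsing sketch, assuming this cuttlefish-era line format; `parse_pgmap` is a hypothetical helper, not a ceph tool:

```python
import re

def parse_pgmap(line):
    """Pull the numeric counters out of a 0.6x-era 'pgmap' status line (sketch)."""
    m = re.search(
        r"pgmap v(?P<version>\d+): (?P<pgs>\d+) pgs: .*?"
        r"(?P<data>\d+) GB data, (?P<used>\d+) GB used, "
        r"(?P<avail>\d+) GB / (?P<total>\d+) GB avail",
        line,
    )
    return {k: int(v) for k, v in m.groupdict().items()} if m else None

status = ("pgmap v4264363: 4472 pgs: 4472 active+clean; 4013 GB data, "
          "8402 GB used, 20512 GB / 29808 GB avail; 0B/s rd, 16640B/s wr, 9op/s")
info = parse_pgmap(status)
print(info["pgs"], "pgs,", info["data"], "GB data,", info["avail"], "GB free")
```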
[1:09] <jlogan> The OSD recovery also finished, and the mon which was unhappy is up.
[1:10] * tnt (~tnt@91.176.13.220) Quit (Read error: Operation timed out)
[1:10] <jlogan> So I think we will just keep watching the cluster for now.
[1:10] <jlogan> but it sure felt like it was stuck.
[1:10] <davidz> nigwil: Can you copy and paste the last part of "ceph pg dump" that shows each OSD and its kbused/kbavail values…
[1:11] * espeer (~espeer@105-236-45-136.access.mtnbusiness.co.za) Quit (Remote host closed the connection)
[1:11] * espeer (~espeer@105-236-45-136.access.mtnbusiness.co.za) has joined #ceph
[1:11] * espeer (~espeer@105-236-45-136.access.mtnbusiness.co.za) Quit ()
[1:14] <nigwil> http://pastebin.com/g8MhbSLY
[1:22] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) Quit (Ping timeout: 480 seconds)
[1:23] * newbie (~kvirc@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[1:25] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[1:26] * mschiff_ (~mschiff@port-27667.pppoe.wtnet.de) Quit (Remote host closed the connection)
[1:27] * LeaChim (~LeaChim@2.122.119.234) Quit (Ping timeout: 480 seconds)
[1:30] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[1:45] <davidz> nigwil: Look at "ceph osd tree" to see what the weight of the rogue OSD is compared to the others.
[1:46] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:47] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:59] <nigwil> all are showing weight 1
[2:11] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[2:24] * vata (~vata@2607:fad8:4:6:7030:ba13:af9d:b4c) Quit (Quit: Leaving.)
[2:28] * xiaoxi1 (~xiaoxi@shzdmzpr02-ext.sh.intel.com) has joined #ceph
[2:30] * DarkAce-Z (~BillyMays@50.107.53.195) Quit (Ping timeout: 480 seconds)
[2:31] * DarkAceZ (~BillyMays@50.107.53.195) has joined #ceph
[2:35] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:38] * dosaboy (~dosaboy@12.15.145.130) has joined #ceph
[2:42] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:42] * Tamil (~tamil@38.122.20.226) Quit (Quit: Leaving.)
[3:08] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[3:08] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) Quit (Quit: Leaving.)
[3:14] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) has joined #ceph
[3:26] * tkensiski (~tkensiski@13.sub-70-211-64.myvzw.com) has joined #ceph
[3:27] * tkensiski (~tkensiski@13.sub-70-211-64.myvzw.com) has left #ceph
[3:31] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[3:32] * loicd (~loic@magenta.dachary.org) has joined #ceph
[3:37] * dosaboy (~dosaboy@12.15.145.130) Quit (Quit: leaving)
[3:41] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) Quit (Quit: Leaving.)
[3:45] * rturk is now known as rturk-away
[3:47] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[3:47] * loicd (~loic@magenta.dachary.org) has joined #ceph
[3:54] * lx0 is now known as lxo
[3:54] * The_Bishop_ (~bishop@e179000167.adsl.alicedsl.de) has joined #ceph
[3:57] * Rorik (~rorik@199.182.216.68) Quit (Quit: .)
[3:59] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[4:01] * The_Bishop (~bishop@e177088019.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[4:30] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[4:31] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[4:32] * newbie (~kvirc@pool-71-164-242-68.dllstx.fios.verizon.net) Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[4:37] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[4:53] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[4:57] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[5:02] * Vanony_ (~vovo@i59F79922.versanet.de) has joined #ceph
[5:08] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[5:08] * loicd (~loic@magenta.dachary.org) has joined #ceph
[5:09] * Vanony (~vovo@i59F7A4DD.versanet.de) Quit (Ping timeout: 480 seconds)
[5:43] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[5:51] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[5:56] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[5:58] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) has joined #ceph
[6:17] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[6:17] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit ()
[6:18] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[6:19] * yehuda_hm (~yehuda@99-48-177-65.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[6:20] * yehuda_hm (~yehuda@2602:306:330b:1410:803d:aead:f897:570a) has joined #ceph
[6:26] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[6:26] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[6:27] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) has joined #ceph
[6:33] * noahmehl_ (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) has joined #ceph
[6:37] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) Quit (Ping timeout: 480 seconds)
[6:37] * noahmehl_ is now known as noahmehl
[6:43] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[6:51] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[6:52] * eschnou (~eschnou@220.177-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[6:59] * mnash (~chatzilla@vpn.expressionanalysis.com) Quit (Ping timeout: 480 seconds)
[7:05] * eschnou (~eschnou@220.177-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[7:14] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[7:14] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) Quit (Quit: Leaving.)
[7:41] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) has joined #ceph
[7:42] * Cube (~Cube@c-38-80-203-93.rw.zetabroadband.com) Quit ()
[7:59] * mnash (~chatzilla@vpn.expressionanalysis.com) has joined #ceph
[8:10] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:14] * madkiss (~madkiss@p5DCA3D5C.dip0.t-ipconnect.de) has joined #ceph
[8:20] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Always try to be modest, and be proud about it!)
[8:21] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Read error: Operation timed out)
[8:28] * mntzn (~zn@62.65.33.58) has joined #ceph
[8:30] <mntzn> Hi, in ceph status, pgmap ... $NUM data, what does that $NUM data refer to?
[8:35] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[8:36] * xiaoxi1 (~xiaoxi@shzdmzpr02-ext.sh.intel.com) Quit (Ping timeout: 480 seconds)
[8:36] * xiaoxi1 (~xiaoxi@shzdmzpr02-ext.sh.intel.com) has joined #ceph
[8:51] * fridudad (~oftc-webi@fw-office.allied-internet.ag) Quit (Quit: Page closed)
[8:51] * fridudad (~oftc-webi@fw-office.allied-internet.ag) has joined #ceph
[8:52] * loicd (~loic@185.10.252.15) has joined #ceph
[8:54] * mnash (~chatzilla@vpn.expressionanalysis.com) Quit (Ping timeout: 480 seconds)
[8:56] * tnt (~tnt@91.176.13.220) has joined #ceph
[8:58] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) has joined #ceph
[8:59] * dcasier (~dcasier@120.48.132.79.rev.sfr.net) has joined #ceph
[9:05] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:05] * ChanServ sets mode +v andreask
[9:06] * dignus (~dignus@bastion.jkit.nl) has left #ceph
[9:07] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) Quit (Quit: noahmehl)
[9:09] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[9:10] * gregaf (~Adium@2607:f298:a:607:ed33:570f:1106:e36b) Quit (Read error: No route to host)
[9:15] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[9:17] <loicd> good morning ceph !
[9:17] * gregaf (~Adium@2607:f298:a:607:ed33:570f:1106:e36b) has joined #ceph
[9:19] * Esmil (esmil@horus.0x90.dk) has joined #ceph
[9:20] <wido> joao: You online by any chance?
[9:20] <wido> morning loicd :)
[9:20] <loicd> \o
[9:21] <loicd> wido: did you take a look at yehuda's work on geo replication by any chance ?
[9:21] <wido> loicd: Not yet. Working with a customer on his OpenStack deployment for the last few weeks. Didn't get a chance yet :(
[9:22] <wido> loicd: One thing that came to mind though. We need a DNS director :)
[9:22] <wido> Like Amazon has. Which points the DNS of your bucket to the right endpoint
[9:22] <loicd> it makes a lot of sense indeed :-)
[9:23] <loicd> wido: I though you were more in cloudstack than openstack ;-)
[9:23] <wido> loicd: I am, but doing the Ceph work behind OpenStack
[9:23] <loicd> s/though/thought/
[9:24] <loicd> oh, I see
[9:24] <wido> But that DNS director wouldn't be that hard to develop. You just need a backend behind PowerDNS or BIND
[9:25] <loicd> maybe yehuda thought about this
[9:25] * JohansGlock (~quassel@kantoor.transip.nl) has left #ceph
[9:25] <loicd> wido: are there rgw dedicated meetings where this is discussed ?
[9:26] <wido> loicd: No, not that I know. Just an idea I got a couple of days back
[9:26] <wido> Still need to put it on the ml or so
[9:26] <wido> or somewhere at least
[9:29] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[9:30] * gregaf (~Adium@2607:f298:a:607:ed33:570f:1106:e36b) Quit (Read error: No route to host)
[9:33] * gregaf (~Adium@2607:f298:a:607:ed33:570f:1106:e36b) has joined #ceph
[9:35] * bergerx_ (~bekir@78.188.204.182) has joined #ceph
[9:36] * yehudasa (~yehudasa@2607:f298:a:607:3d09:4ac6:3111:56c6) Quit (Ping timeout: 480 seconds)
[9:36] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) Quit (Ping timeout: 480 seconds)
[9:37] * sjust (~sam@38.122.20.226) has joined #ceph
[9:38] * sagewk (~sage@2607:f298:a:607:b8b1:2d0c:7124:f5a7) Quit (Ping timeout: 480 seconds)
[9:41] * gregaf (~Adium@2607:f298:a:607:ed33:570f:1106:e36b) Quit (Ping timeout: 480 seconds)
[9:41] * dmick (~dmick@2607:f298:a:607:2de1:5205:5306:e5df) Quit (Ping timeout: 480 seconds)
[9:41] * joshd (~joshd@2607:f298:a:607:91e2:9879:95d9:fc7a) Quit (Ping timeout: 480 seconds)
[9:44] * gregaf (~Adium@38.122.20.226) has joined #ceph
[9:44] * tnt (~tnt@91.176.13.220) Quit (Ping timeout: 480 seconds)
[9:45] * LeaChim (~LeaChim@2.122.119.234) has joined #ceph
[9:46] * joshd (~joshd@38.122.20.226) has joined #ceph
[9:50] * dmick (~dmick@2607:f298:a:607:d931:c44d:5888:22e8) has joined #ceph
[9:51] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:53] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:55] * sagewk (~sage@38.122.20.226) has joined #ceph
[9:58] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[10:00] * dxd828 (~dxd828@195.191.107.205) Quit (Remote host closed the connection)
[10:03] * yehudasa (~yehudasa@38.122.20.226) has joined #ceph
[10:23] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[10:24] * yehuda_hm (~yehuda@2602:306:330b:1410:803d:aead:f897:570a) Quit (Ping timeout: 480 seconds)
[10:24] * yehuda_hm (~yehuda@2602:306:330b:1410:803d:aead:f897:570a) has joined #ceph
[10:29] * wwformat (~chatzilla@222.240.177.34) has joined #ceph
[10:32] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:33] * wwformat (~chatzilla@222.240.177.34) Quit ()
[10:39] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[10:40] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[10:44] * rongze (~zhu@117.79.232.150) has joined #ceph
[10:45] * san (~san@81.17.168.194) has joined #ceph
[10:51] <rongze> Hi All, I downloaded tutorial.vdi.gz from http://ceph.com/tutorial/ and started the virtual machine, but I can't log in to Ubuntu because I don't know the root password.
[10:51] <rongze> How to get the password?
[10:54] <loicd> try logging in with the ubuntu user instead, rongze, and then use sudo
[10:57] <rongze> tutorial.vdi.gz is a VirtualBox image with Ubuntu already installed..
[10:58] <rongze> I don't know which user can log in
[10:58] * dmick (~dmick@2607:f298:a:607:d931:c44d:5888:22e8) Quit (Ping timeout: 480 seconds)
[10:58] * sjust (~sam@38.122.20.226) Quit (Ping timeout: 480 seconds)
[10:59] * joshd (~joshd@38.122.20.226) Quit (Ping timeout: 480 seconds)
[10:59] * sagewk (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[10:59] * gregaf (~Adium@38.122.20.226) Quit (Ping timeout: 480 seconds)
[10:59] * yehudasa (~yehudasa@38.122.20.226) Quit (Ping timeout: 480 seconds)
[11:04] * xiaoxi1 (~xiaoxi@shzdmzpr02-ext.sh.intel.com) Quit (Ping timeout: 480 seconds)
[11:05] * gregaf (~Adium@38.122.20.226) has joined #ceph
[11:08] * yehudasa (~yehudasa@38.122.20.226) has joined #ceph
[11:09] <rongze> loicd: I have solved this problem. I booted Ubuntu into recovery mode and reset the root password.
[11:11] * dmick (~dmick@38.122.20.226) has joined #ceph
[11:12] * sagewk (~sage@38.122.20.226) has joined #ceph
[11:13] * joshd (~joshd@38.122.20.226) has joined #ceph
[11:13] * sjust (~sam@38.122.20.226) has joined #ceph
[11:28] * gregaf (~Adium@38.122.20.226) Quit (Read error: Operation timed out)
[11:29] * gregaf (~Adium@2607:f298:a:607:ed33:570f:1106:e36b) has joined #ceph
[11:30] <mntzn> Hi, in ceph status, pgmap ... $NUM data, what does this $NUM data account for? Or what type of data is it?
[11:31] * mschiff (~mschiff@tmo-111-50.customers.d1-online.com) has joined #ceph
[11:36] * gregaf1 (~Adium@2607:f298:a:607:ed33:570f:1106:e36b) has joined #ceph
[11:39] * gregaf (~Adium@2607:f298:a:607:ed33:570f:1106:e36b) Quit (Ping timeout: 480 seconds)
[11:48] * mschiff (~mschiff@tmo-111-50.customers.d1-online.com) Quit (Remote host closed the connection)
[11:50] * dcasier (~dcasier@120.48.132.79.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[11:52] * dcasier (~dcasier@120.48.132.79.rev.sfr.net) has joined #ceph
[11:57] * sjust (~sam@38.122.20.226) Quit (Ping timeout: 480 seconds)
[12:00] * yehudasa (~yehudasa@38.122.20.226) Quit (Ping timeout: 480 seconds)
[12:00] * yehudasa (~yehudasa@2607:f298:a:607:4d8c:2491:279:9f1) has joined #ceph
[12:02] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) has joined #ceph
[12:05] * dcasier (~dcasier@120.48.132.79.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[12:08] * joshd (~joshd@38.122.20.226) Quit (Ping timeout: 480 seconds)
[12:08] * dmick (~dmick@38.122.20.226) Quit (Ping timeout: 480 seconds)
[12:08] * Anticime1 (anticimex@95.80.32.80) has joined #ceph
[12:08] * dmick (~dmick@2607:f298:a:607:d931:c44d:5888:22e8) has joined #ceph
[12:08] * joshd (~joshd@2607:f298:a:607:1d5a:ef63:530c:5125) has joined #ceph
[12:10] * Anticimex (anticimex@netforce.csbnet.se) Quit (Ping timeout: 480 seconds)
[12:13] * madkiss1 (~madkiss@p4FE641FB.dip0.t-ipconnect.de) has joined #ceph
[12:16] * madkiss (~madkiss@p5DCA3D5C.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[12:17] * pixel (~pixel@81.195.203.34) has joined #ceph
[12:19] * Anticime1 is now known as Anticimex
[12:21] <pixel> hello, please help me understand the op/s (operations per second) value in the 'pgmap v71520: 3352 pgs: 3352 active+clean; 212 GB data, 429 GB used, 23444 GB / 23874 GB avail; 89237KB/s wr, 24op/s' line. What does "operation" mean?
[12:23] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[12:28] * gregaf1 (~Adium@2607:f298:a:607:ed33:570f:1106:e36b) Quit (Read error: Connection reset by peer)
[12:28] <joao> afair, it's an average of the operations hitting the pgs
[12:29] <joao> pixel, ^
[12:32] <pixel> joao> Something like 1 op = 1 element of PG?
[12:32] <Gugge-47527> it's the number of writes per second :)
[12:33] <joao> I believe Gugge-47527 is right
[12:33] * gregaf (~Adium@2607:f298:a:607:ed33:570f:1106:e36b) has joined #ceph
[12:33] <Gugge-47527> it's actually some pretty big writes .... 89237KB / 24 :)
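Gugge-47527's arithmetic, spelled out: the KB/s and op/s figures from pixel's pgmap line imply the average size of a write operation (a throwaway helper for illustration, not a ceph API):

```python
def avg_op_size_kb(write_kb_per_s, ops_per_s):
    """Average write size implied by the pgmap throughput and op-rate counters."""
    return write_kb_per_s / ops_per_s

# rates from the pgmap line pixel pasted: 89237KB/s wr, 24op/s
avg = avg_op_size_kb(89237, 24)
print(round(avg), "KB per write")  # roughly 3.6 MB each, hence "pretty big writes"
```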
[12:34] * sagewk (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[12:34] * sagewk (~sage@38.122.20.226) has joined #ceph
[12:36] <pixel> Gugge-47527> :) Yep, there is a backup process running and I want to understand what unit, and what size, is counted as one operation
[12:37] * fridudad (~oftc-webi@fw-office.allied-internet.ag) Quit (Remote host closed the connection)
[12:38] <pixel> ^ block size, file, etc
[12:39] * maximilian (~maximilia@212.79.49.65) Quit (Ping timeout: 480 seconds)
[12:39] * maximilian (~maximilia@212.79.49.65) has joined #ceph
[12:40] * joao hates living in a world where reproducing bugs only appears to be easy for those with production clusters
[12:40] <joao> /rant
[12:43] * dcasier (~dcasier@120.48.132.79.rev.sfr.net) has joined #ceph
[12:45] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:51] * n3c8-35575 (~mhattersl@pix.office.vaioni.com) Quit (Ping timeout: 480 seconds)
[12:51] <mikedawson> joao: at least it's easy!
[12:51] <joao> :(
[12:52] <joao> I left a dozen nodes locked with 3 different jobs trying to reproduce a tiny little hang
[12:52] <joao> last night, I mean
[12:52] <joao> and haven't been able to reproduce it as of now
[12:52] <joao> I'm tempted to lock all the nodes in the lab to run the freaking job
[12:55] * loicd (~loic@185.10.252.15) Quit (Ping timeout: 480 seconds)
[12:58] <maximilian> echo
[13:00] * maximilian (~maximilia@212.79.49.65) Quit ()
[13:00] * maximilian (~maximilia@212.79.49.65) has joined #ceph
[13:01] * san (~san@81.17.168.194) Quit (Ping timeout: 480 seconds)
[13:01] * san (~san@81.17.168.194) has joined #ceph
[13:06] <nhm> good morning #ceph
[13:06] <joao> howdy nhm
[13:06] <nhm> joao: going to test out wip-mon this morning
[13:06] <joao> cool
[13:06] <ofu> every pg has one master and n-1 slaves, with n the number of copies of the pool? Is a write ACKed to the client when all copies are synced, or when the master pg got the write?
[13:07] <joao> nhm, I haven't been able to reproduce the hangs we're after though
[13:07] <nhm> joao: I was reproducing it pretty easily on the burnupi cluster I built
[13:07] <nhm> joao: mons are on burnupi08
[13:08] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) has joined #ceph
[13:09] <joao> nhm, the sync hang?
[13:09] <joao> #5215 ?
[13:09] * Anticimex (anticimex@95.80.32.80) Quit (Remote host closed the connection)
[13:10] * Anticimex (anticimex@95.80.32.80) has joined #ceph
[13:10] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit ()
[13:11] <nhm> joao: sorry, I might be talking about something else. That's where I was seeing the crazy leveldb reads
[13:11] * nhm just woke up
[13:11] <joao> ah, kay
[13:12] <joao> yeah, wip-mon should tackle those
[13:12] <joao> my guess is that it should also tackle #5215, but we would *love* to reproduce it reliably to make sure :\
[13:13] <nhm> joao: I can run my benchmarks in such a way that it will basically just recreate the cluster and create new pools over and over.
[13:13] <nhm> joao: not sure what situations trigger it...
[13:14] <joao> nhm, best guess is loaded mon that gets thrashed, a store sync is triggered and leveldb hangs while cleaning up
[13:15] <joao> *cleaning up after the sync is finished
[13:20] * pressureman (~pressurem@62.217.45.26) has joined #ceph
[13:20] <nhm> joao: lovely. :)
[13:32] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[13:33] * dcasier (~dcasier@120.48.132.79.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[13:33] <andrei> wido: hello
[13:33] <andrei> are you oinline?
[13:34] <wido> andrei: For a sec, have to go afk in a minute
[13:34] <andrei> do you have a few minutes
[13:34] <andrei> okay, i'll try to be quick
[13:34] <andrei> in your rbd CS support, do you use cache=none by default when you start vms?
[13:35] <andrei> i've read that for better performance you should use cache=writeback
[13:35] <nhm> joao: ping
[13:35] <andrei> and enable rbd cache
[13:35] <wido> andrei: True, but CS doesn't support per storage pool settings
[13:35] <wido> it is on my wishlist to do this though
[13:35] <joao> nhm, pong
[13:35] <andrei> ah, i see
[13:35] <nhm> joao: mind jumping on burnupi08 and looking at /var/log/ceph/mon.b.log?
[13:35] <andrei> so if i change the cache= option it would change it for every storage type, right?
[13:36] <joao> nhm, sure
[13:36] <joao> nhm, wip-mon ?
[13:36] <andrei> I am planning to have ceph storage exclusively, so it should not matter for me much
[13:36] <andrei> at least that is what i think
[13:36] <nhm> joao: yeah
[13:36] <andrei> so, how do I change that in cloudstack?
[13:37] <joao> well, this is unfortunate
[13:37] <joao> let me grab this log
[13:38] <andrei> wido: are you away already?
[13:39] <joao> nhm, will take a look
[13:39] <joao> after lunch
[13:39] <joao> thanks
[13:39] <joao> :)
[13:40] <nhm> joao: heh, sounds good
[13:41] <joao> oh, btw
[13:42] <joao> nhm, one of the symptoms of #5215 being triggered is a monitor staying out-of-quorum forever (or, at least, for a long, long time) and maybe even synchronizing multiple times
[13:42] <joao> just in case you hit it
[13:43] <andrei> hi guys
[13:43] <andrei> how do I enable rbd caching? is it simply by adding this to the ceph.conf:
[13:43] <andrei> [client]
[13:43] <andrei> rbd cache = true
[13:44] <andrei> also, do i need to restart ceph services for this option to take effect?
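For reference, the client-side settings andrei is asking about would sit in ceph.conf roughly like this. A sketch only: rbd cache is read by the librbd client when an image is opened, so it is the client (e.g. the qemu process) that needs restarting, not the OSD/mon daemons; the size values shown are the defaults nhm pastes later, included purely for illustration.

```ini
[client]
    rbd cache = true
    ; optional tuning; these are the defaults of that era
    rbd cache size = 33554432        ; 32 MB of cache per image
    rbd cache max dirty = 25165824   ; bytes allowed to be dirty
```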
[13:48] * diegows (~diegows@190.190.2.126) has joined #ceph
[13:50] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:50] * ChanServ sets mode +v andreask
[13:50] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Read error: Operation timed out)
[13:55] <andrei> how do I check the running ceph configuration?
[13:59] * pixel (~pixel@81.195.203.34) Quit (Quit: I'm leaving you (xchat 2.4.5 or higher))
[14:09] * loicd (~loic@80.12.100.14) has joined #ceph
[14:15] <nhm> andrei: you can check the running config through the admin socket
[14:16] <nhm> andrei: and I stick "rbd cache = true" in global, but there's probably a more appropriate place to put it.
[14:16] <nhm> andrei: you'll also need it in the VM description.
[14:16] <andrei> ah, okay
[14:17] <andrei> i've checked docs and they are saying to put it in the [client] section
[14:17] <nhm> ie:
[14:17] <nhm> <driver name='qemu' type='raw' cache='writeback'/>
[14:17] <andrei> nhm: could you please show me the cli for the config check
[14:18] <nhm> andrei: something like: sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show
[14:19] * loicd (~loic@80.12.100.14) Quit (Max SendQ exceeded)
[14:21] * loicd (~loic@80.12.100.14) has joined #ceph
[14:27] <andrei> nhm: nice one, thanks
[14:29] <andrei> nhm: does your running config show rbd_cache true?
[14:30] <andrei> because when I change ceph.conf according to the docs, my running config still shows rbd_cache: false
[14:32] <nhm> andrei: "rbd_cache": "true",
[14:32] <nhm> "rbd_cache_size": "33554432",
[14:32] <nhm> "rbd_cache_max_dirty": "25165824",
[14:32] <nhm> "rbd_cache_target_dirty": "16777216",
[14:32] <nhm> "rbd_cache_max_dirty_age": "1",
[14:32] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[14:34] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) has joined #ceph
[14:37] * loicd (~loic@80.12.100.14) Quit (Read error: Connection timed out)
[14:38] * ay (~ay@91.247.228.48) has left #ceph
[14:40] * loicd (~loic@80.12.100.14) has joined #ceph
[14:41] * loicd (~loic@80.12.100.14) Quit ()
[14:43] <maximilian> I have two nodes and I'm trying some failover tests
[14:45] <andrei> nhm: could you please show me your section from the ceph.conf? Did you simply add rbd cache = true to the [global]?
[14:45] <maximilian> I have reinstalled the OS & ceph on the first partition; now I want to rebuild the cluster that is on the second partition from the earlier installation
[14:45] <andrei> maximilian: I will be doing some failover tests pretty soon
[14:45] <andrei> would be happy to hear what scenarios you've tried
[14:45] <andrei> and what the results are
[14:46] <maximilian> of course I'll post it, but right now I'm stuck with messages like these
[14:46] <maximilian> 2013-06-04 14:46:12.240455 7fede2fee700 0 -- :/14483 >> 192.168.1.1:6789/0 pipe(0x7fedd800b910 sd=4 :0 s=1 pgs=0 cs=0 l=1).fault
[14:46] <maximilian> 2013-06-04 14:46:15.240950 7fede971c700 0 -- :/14483 >> 192.168.1.2:6789/0 pipe(0x7fedd8005c80 sd=4 :0 s=1 pgs=0 cs=0 l=1).fault
[14:47] <imjustmatthew> maximilian: that usually shows up when the monitors aren't in quorum
[14:47] <maximilian> looks like btrfs scan/check
[14:48] <imjustmatthew> do you have both those two systems setup as monitors?
[14:48] <maximilian> yes
[14:48] <maximilian> on both nodes osd mon and mds
[14:48] <imjustmatthew> then you also need a third machine to be just a monitor
[14:48] <imjustmatthew> quorum requires n/2+1 monitors to be online
[14:48] <imjustmatthew> so when you lose either machine the other falls out of quorum
[14:49] <imjustmatthew> if you use another machine (it doesn't need to be dedicated or anything) to act as a third monitor you'll be able to get the failover to work
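imjustmatthew's n/2+1 rule can be sketched numerically (helper names are made up for illustration):

```python
def monitors_needed_for_quorum(n):
    """Paxos majority: strictly more than half of the monitors must be up."""
    return n // 2 + 1

def survives_one_failure(n):
    """Can a cluster of n monitors keep quorum after losing one of them?"""
    return (n - 1) >= monitors_needed_for_quorum(n)

# with 2 mons, losing either drops below the required 2-of-2 majority;
# with 3 mons, the remaining 2 still form a majority
print(monitors_needed_for_quorum(2), survives_one_failure(2))
print(monitors_needed_for_quorum(3), survives_one_failure(3))
```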
[14:52] <maximilian> Ok I got the point. Is it possible to rebuild a ceph cluster just from the btrfs devices/partitions, in case all machines crash except the OSD partitions?
[14:53] <maximilian> that's what I'm trying to do now
[14:55] <maximilian> andrei: do you also have 2 x all-in-one nodes??
[14:56] <andrei> max: nope, i've got 2 storage servers and 2 clients. i've got 3 mons, 2 mds and 2 osds
[14:56] <andrei> one of my clients is running a mon service
[14:59] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:03] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[15:07] <erwan_taf> hi there
[15:08] <erwan_taf> is the ceph project aware of https://blueprints.launchpad.net/swift/+spec/diskfile-databasebroker-as-apis ?
[15:08] <erwan_taf> looks like it will be possible to set up multiple backends, and ceph could be one of those
[15:09] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:11] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (Quit: work)
[15:23] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[15:25] <nhm> erwan_taf: josh is probably the most knowledgeable about openstack plans.
[15:25] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) has joined #ceph
[15:26] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[15:27] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) Quit ()
[15:28] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Quit: Ex-Chat)
[15:31] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[15:31] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) has joined #ceph
[15:35] * noahmehl_ (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) has joined #ceph
[15:38] * senner (~Wildcard@68-113-235-86.dhcp.stpt.wi.charter.com) has joined #ceph
[15:38] * rongze1 (~zhu@117.79.232.187) has joined #ceph
[15:39] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) Quit (Ping timeout: 480 seconds)
[15:39] * noahmehl_ is now known as noahmehl
[15:42] * rongze (~zhu@117.79.232.150) Quit (Ping timeout: 480 seconds)
[15:56] * portante (~user@66.187.233.206) has joined #ceph
[16:00] * mntzn (~zn@62.65.33.58) Quit (Quit: Lost terminal)
[16:01] <joao> nhm, managed to trigger #5215 ! :)
[16:01] <joao> sagewk, ^
[16:02] <nhm> joao: was that on burnupi08?
[16:02] <absynth> sagewk: around?
[16:02] <joao> no
[16:02] <nhm> joao: ah, ok. Good deal!
[16:02] <joao> absynth, he should take another hour and change
[16:03] <absynth> oh yeah, it's only 4pm
[16:03] * yehuda_hm (~yehuda@2602:306:330b:1410:803d:aead:f897:570a) Quit (Ping timeout: 480 seconds)
[16:03] <absynth> we really need to make headway with the osd scrubbing ticket
[16:03] * xiaoxi1 (~xiaoxi@shzdmzpr02-ext.sh.intel.com) has joined #ceph
[16:05] * Volture (~quassel@office.meganet.ru) Quit (Read error: Connection reset by peer)
[16:05] * Volture (~quassel@office.meganet.ru) has joined #ceph
[16:07] <nhm> joao: so any thoughts on that mon log?
[16:07] <joao> nhm, has it happened again?
[16:08] <joao> I have a theory, but haven't investigated it yet
[16:08] <joao> just trying to finish up looking into #5215 now that I have a gdb backtrace
[16:09] * aliguori (~anthony@32.97.110.51) has joined #ceph
[16:09] * yehuda_hm (~yehuda@2602:306:330b:1410:78aa:5cbb:d4b7:2aa9) has joined #ceph
[16:14] <nhm> joao: I'll rerun and see if it happens again I guess.
[16:14] <joao> nhm, I've got the log, so feel free to do whatever you want with the cluster :)
[16:15] <nhm> k
[16:16] * joelio (~Joel@88.198.107.214) has joined #ceph
[16:18] * redeemed (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[16:29] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[16:35] <nhm> joao: mon.b.log again
[16:37] <joao> nhm, gotcha
[16:37] <joao> definitely urgent
[16:37] <joao> looking
[16:38] <nhm> joao: worth making a bug?
[16:38] <joao> yes
[16:38] <nhm> k
[16:38] <joao> thanks
[16:39] <joao> oh wow, I triggered #5215 in two other test clusters
[16:39] <joao> this job is a miracle worker
[16:39] <nhm> heh, which job?
[16:40] <joao> I attached a version of it to #5215, but what really triggers it appears to be forcing the store to sync every time a monitor is thrashed
[16:40] <joao> eventually it goes off
[16:40] <joao> with the job on #4515 it was quicker though
[16:40] <joao> maybe because I was running 7 monitors and 13 osds
[16:41] <nhm> heh
[16:44] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[16:46] <nhm> joao: #5246
[16:46] <joao> thanks
[16:47] * bergerx_ (~bekir@78.188.204.182) Quit (Remote host closed the connection)
[16:51] <mikedawson> absynth: are you talking about #5232?
[16:52] * fmarchand (~fmarchand@90.84.146.243) has joined #ceph
[16:52] <fmarchand> Hi
[16:52] <fmarchand> I have a little problem with the rados gw ...
[16:53] <fmarchand> could someone help me?
[16:54] * joshd1 (~jdurgin@2602:306:c5db:310:2d8f:668d:89ff:fbad) Quit (Quit: Leaving.)
[16:57] * andrei (~andrei@212.183.128.236) has joined #ceph
[16:57] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[16:58] * fmarchand (~fmarchand@90.84.146.243) Quit (Read error: No route to host)
[16:59] * fmarchand (~fmarchand@90.84.146.243) has joined #ceph
[17:04] * joshd1 (~joshd@2602:306:c5db:310:6cb2:fed2:1e06:5459) has joined #ceph
[17:04] * fmarchand (~fmarchand@90.84.146.243) Quit (Read error: No route to host)
[17:08] * jahkeup (~jahkeup@199.232.79.50) has joined #ceph
[17:10] * madkiss1 (~madkiss@p4FE641FB.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[17:10] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[17:11] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[17:19] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:22] * joshd1 (~joshd@2602:306:c5db:310:6cb2:fed2:1e06:5459) Quit (Ping timeout: 480 seconds)
[17:25] * andrei (~andrei@212.183.128.236) Quit (Read error: Operation timed out)
[17:25] * jnq (~jon@198.199.79.59) has joined #ceph
[17:25] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:27] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) has joined #ceph
[17:36] * senner (~Wildcard@68-113-235-86.dhcp.stpt.wi.charter.com) Quit (Quit: Leaving.)
[17:37] * senner (~Wildcard@68-113-235-86.dhcp.stpt.wi.charter.com) has joined #ceph
[17:38] * senner (~Wildcard@68-113-235-86.dhcp.stpt.wi.charter.com) Quit ()
[17:40] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[17:42] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Remote host closed the connection)
[17:42] * dcasier (~dcasier@120.48.132.79.rev.sfr.net) has joined #ceph
[17:42] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[17:42] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[17:44] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) Quit (Ping timeout: 480 seconds)
[17:53] * jnq (~jon@0001b7cc.user.oftc.net) Quit (Quit: WeeChat 0.3.7)
[17:53] * jnq (~jon@198.199.79.59) has joined #ceph
[17:53] * senner (~Wildcard@97-91-79-214.static.fdul.wi.charter.com) has joined #ceph
[17:53] * jnq (~jon@198.199.79.59) Quit ()
[17:53] * pressureman (~pressurem@62.217.45.26) Quit (Quit: Ex-Chat)
[17:54] * jnq (~jon@198.199.79.59) has joined #ceph
[17:56] * senner1 (~Wildcard@97-91-79-214.static.fdul.wi.charter.com) has joined #ceph
[17:58] * xiaoxi1 (~xiaoxi@shzdmzpr02-ext.sh.intel.com) Quit (Remote host closed the connection)
[18:03] * senner (~Wildcard@97-91-79-214.static.fdul.wi.charter.com) Quit (Ping timeout: 480 seconds)
[18:07] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:08] * BillK (~BillK@124-148-124-185.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:13] * mschiff (~mschiff@81.92.22.210) has joined #ceph
[18:14] * Tamil (~tamil@38.122.20.226) has joined #ceph
[18:18] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) has joined #ceph
[18:18] * tnt (~tnt@91.176.13.220) has joined #ceph
[18:21] * jnq (~jon@0001b7cc.user.oftc.net) Quit (Quit: WeeChat 0.3.7)
[18:22] * jnq (~jon@198.199.79.59) has joined #ceph
[18:29] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:30] * drokita (~drokita@199.255.228.128) has joined #ceph
[18:41] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:03] * senner (~Wildcard@68-113-235-86.dhcp.stpt.wi.charter.com) has joined #ceph
[19:04] <TiCPU> after powering off my 6 ceph servers using the power breaker, it seems the MDS won't start anymore, aborts with segfault, I guess I should report a bug?
[19:05] * senner (~Wildcard@68-113-235-86.dhcp.stpt.wi.charter.com) Quit ()
[19:07] * Maskul (~Maskul@host-92-25-195-4.as13285.net) has joined #ceph
[19:07] * senner1 (~Wildcard@97-91-79-214.static.fdul.wi.charter.com) Quit (Ping timeout: 480 seconds)
[19:12] * tkensiski (~tkensiski@106.sub-70-197-16.myvzw.com) has joined #ceph
[19:12] * tkensiski (~tkensiski@106.sub-70-197-16.myvzw.com) has left #ceph
[19:13] * rturk-away is now known as rturk
[19:14] * loicd (~loic@brln-4d0cdd77.pool.mediaWays.net) has joined #ceph
[19:15] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Read error: Operation timed out)
[19:16] * tkensiski1 (~tkensiski@106.sub-70-197-16.myvzw.com) has joined #ceph
[19:16] * tkensiski1 (~tkensiski@106.sub-70-197-16.myvzw.com) has left #ceph
[19:19] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) has joined #ceph
[19:23] <ccourtaut> morning
[19:25] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) Quit (Quit: leaving)
[19:25] <ccourtaut> yehuda: i'm currently working on the rgw geo replication blueprint as we discussed a few weeks ago
[19:26] <yehudasa> ccourtaut: hi
[19:27] <ccourtaut> yehudasa, and i was wondering if you had some way to set up a test environment with master/slave region/zone?
[19:27] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:27] <yehudasa> ccourtaut: I have some scripts that I use for creating master and non-master regions
[19:28] <ccourtaut> yehudasa, are they on github, or published somewhere?
[19:28] <yehudasa> ccourtaut: not published yet, I'll try to make them available somehow now
[19:29] <ccourtaut> ok great!
[19:29] * tkensiski (~tkensiski@132.sub-70-197-10.myvzw.com) has joined #ceph
[19:29] <ccourtaut> it will help me to focus only on the tests of the API, not on deployment related stuff :)
[19:29] * tkensiski (~tkensiski@132.sub-70-197-10.myvzw.com) has left #ceph
[19:30] <yehudasa> ccourtaut: http://pastebin.com/uX0QYzMu
[19:30] <yehudasa> sorry for the manual work ...
[19:31] <yehudasa> ah .. there's a typo there in the swab.region, should be skinny:82 and not swab:81
[19:32] * dcasier (~dcasier@120.48.132.79.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[19:34] <ccourtaut> yehudasa, I don't see the typo here; in swab.region it is already skinny:82
[19:34] <yehudasa> there's the endpoint for the region, and there's the different zones' endpoints
[19:35] <yehudasa> since it has only one zone, the same endpoint is used
[19:35] <yehudasa> but it's not necessarily true, you can have zones communication on different ports than the endpoint
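The distinction yehudasa draws here can be sketched as data: a region has its own endpoint list, and each zone carries separate endpoints that may reuse the region's endpoint (as in the single-zone case) or listen on different ports. Host names, ports, and key names below are purely illustrative, not the actual pastebin contents:

```python
# Hypothetical sketch of the region/zone endpoint relationship being
# discussed: region-level endpoints and per-zone endpoints are distinct
# lists, even when (as with a single zone) they happen to be identical.
region = {
    "name": "example-region",
    "endpoints": ["http://skinny:82/"],      # region-level endpoint
    "zones": [
        {
            "name": "example-zone",
            # With only one zone the same endpoint is typically reused,
            # but a zone may communicate on different ports than the
            # region endpoint:
            "endpoints": ["http://skinny:82/"],
        },
    ],
}

# In this single-zone sketch the two endpoint lists coincide:
assert region["endpoints"] == region["zones"][0]["endpoints"]
```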
[19:35] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) has left #ceph
[19:35] <ccourtaut> ok, i think i'll figure it out :)
[19:36] <yehudasa> well, for now it doesn't matter, still missing some code to support that
[19:37] <ccourtaut> btw, i'd like to know what is the current status of the rgw geo blueprint on the rgw part? and after that, what about the agents?
[19:38] <yehudasa> ccourtaut: most of the rgw-side of things is implemented, still missing copying of objects between zones/regions (which I'm working on now). Work on agents has been started recently.
[19:38] <ccourtaut> yehudasa, and about the master/slave region/zone test env, do you use vstart to run your scripts on it? or something else?
[19:38] * tziOm (~bjornar@ti0099a340-dhcp0745.bb.online.no) has joined #ceph
[19:39] * jahkeup (~jahkeup@199.232.79.50) Quit (Ping timeout: 480 seconds)
[19:39] <ccourtaut> yehudasa, ok, i'm quite new to ceph, and rgw, and i was mostly focused on understanding the way rgw works before starting to contribute to it.
[19:40] <yehudasa> ccourtaut: my tests are pretty limited thus far, I have two clusters I'm running (using vstart). There are two gateways running against these, and two apache sites configured to support these
[19:40] <ccourtaut> yehudasa, i recently thought that the agents, which have no legacy code, might be an easier entry point for me to contribute to this blueprint. What do you think?
[19:40] <yehudasa> ccourtaut: early testing of the interfaces is probably the best help now
[19:41] <ccourtaut> yehudasa, now that i understand better how to set region/zone with "real" configuration, i think it will be my next focus
[19:41] <sjust> loicd: 327 and 340 are live?
[19:41] <yehudasa> ccourtaut: also, most of the RESTful api for the new stuff (for the agents) is described here: http://wiki.ceph.com/RESTful_API_for_DR_%2F%2F_Geo-Replication
[19:41] <yehudasa> you can give that a go too
[19:43] <ccourtaut> yehudasa, thanks for the link!
[19:43] <ccourtaut> yehudasa, Btw i know that work on the agent code has started recently, is there some code around?
[19:44] * sebastiandeutsch (~sebastian@p57A07ABB.dip0.t-ipconnect.de) has joined #ceph
[19:45] <yehudasa> ccourtaut: not yet, jbuck could help with that (he's offline now)
[19:46] * ccourtaut thanks yehudasa for his help!
[19:46] <ccourtaut> yehudasa, is jbuck in Pacific too?
[19:46] <yehudasa> ccourtaut: also look at http://pastebin.com/EM58ShNa
[19:46] * sebastiandeutsch (~sebastian@p57A07ABB.dip0.t-ipconnect.de) has left #ceph
[19:48] <ccourtaut> yehudasa, ok, i'll take a look too
[19:50] <ccourtaut> well i will look at everything tomorrow at the office, i think it will make things move a little faster from my side
[19:51] <loicd> sjust: if by live you mean good to merge in my opinion, then yes :-)
[19:51] <sjust> loicd: just checking
[19:52] <loicd> s/good to merge/good to review/ ;-)
[19:52] * loicd being presumptuous
[19:54] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[19:54] * ChanServ sets mode +v andreask
[19:55] <paravoid> are we there yet^Hwhen is 0.61.3 expected to be released?
[19:56] * The_Bishop_ (~bishop@e179000167.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[20:03] * via (~via@smtp2.matthewvia.info) Quit (Quit: brb)
[20:05] * tkensiski (~tkensiski@209.66.64.134) has joined #ceph
[20:06] * tkensiski (~tkensiski@209.66.64.134) has left #ceph
[20:11] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) has joined #ceph
[20:14] <sagewk> paravoid: :) i have two bugs i'm trying to trigger to make sure it isn't something that shoudl block .3... my goal is this afternoon
[20:14] <paravoid> perfect
[20:15] <paravoid> I was about to offer running git cuttlefish and do some "QA" of my own :) but it doesn't make sense if it's this afternoon
[20:16] <saaby> sagewk: http://tracker.ceph.com/issues/5239 <- do you want us to try out the latest cuttlefish branch?
[20:17] <saaby> i.e. might this be fixed?
[20:17] <sagewk> paravoid: any testing you can provide would be most appreciated! :)
[20:20] * janos (~janos@static-71-176-211-4.rcmdva.fios.verizon.net) has left #ceph
[20:21] * eschnou (~eschnou@220.177-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:21] * janos (~janos@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[20:23] <sagewk> saaby: please do.
[20:24] <saaby> ok
[20:29] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) has joined #ceph
[20:33] * jamespage (~jamespage@culvain.gromper.net) Quit (Quit: Coyote finally caught me)
[20:33] * jamespage (~jamespage@culvain.gromper.net) has joined #ceph
[20:37] * jluis (~JL@89.181.153.209) has joined #ceph
[20:42] * joao (~JL@89.181.150.251) Quit (Ping timeout: 480 seconds)
[20:43] * eschnou (~eschnou@220.177-201-80.adsl-dyn.isp.belgacom.be) Quit (Quit: Leaving)
[20:49] * mjeanson (~mjeanson@00012705.user.oftc.net) Quit (Remote host closed the connection)
[20:50] * joao (~JL@89-181-151-112.net.novis.pt) has joined #ceph
[20:50] * ChanServ sets mode +o joao
[20:50] * mjeanson (~mjeanson@bell.multivax.ca) has joined #ceph
[20:51] * Tamil (~tamil@38.122.20.226) Quit (Quit: Leaving.)
[20:52] * Tamil (~tamil@38.122.20.226) has joined #ceph
[20:55] * jluis (~JL@89.181.153.209) Quit (Ping timeout: 480 seconds)
[20:56] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[21:02] * wonko_be_ (bernard@november.openminds.be) has joined #ceph
[21:02] * ggreg_ (~ggreg@int.0x80.net) has joined #ceph
[21:03] * soren_ (~soren@hydrogen.linux2go.dk) has joined #ceph
[21:03] * ggreg (~ggreg@int.0x80.net) Quit (charon.oftc.net solenoid.oftc.net)
[21:03] * sbadia (~sbadia@yasaw.net) Quit (charon.oftc.net solenoid.oftc.net)
[21:03] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) Quit (charon.oftc.net solenoid.oftc.net)
[21:03] * soren (~soren@hydrogen.linux2go.dk) Quit (charon.oftc.net solenoid.oftc.net)
[21:03] * wonko_be (bernard@november.openminds.be) Quit (charon.oftc.net solenoid.oftc.net)
[21:03] * nyerup (irc@jespernyerup.dk) Quit (charon.oftc.net solenoid.oftc.net)
[21:05] * ggreg (~ggreg@int.0x80.net) has joined #ceph
[21:05] * nyerup (irc@jespernyerup.dk) has joined #ceph
[21:05] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) has joined #ceph
[21:05] * sbadia (~sbadia@yasaw.net) has joined #ceph
[21:05] * ggreg (~ggreg@int.0x80.net) Quit (Ping timeout: 480 seconds)
[21:07] * rturk is now known as rturk-away
[21:07] * rturk-away is now known as rturk
[21:14] * DarkAceZ (~BillyMays@50.107.53.195) Quit (Ping timeout: 480 seconds)
[21:17] * rturk is now known as rturk-away
[21:19] * DarkAceZ (~BillyMays@50.107.53.195) has joined #ceph
[21:19] * rturk-away is now known as rturk
[21:20] * via (~via@smtp2.matthewvia.info) has joined #ceph
[21:27] * rturk is now known as rturk-away
[21:39] * eschnou (~eschnou@220.177-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:44] * drokita1 (~drokita@199.255.228.128) has joined #ceph
[21:51] * drokita (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[21:58] * mattbenjamin (~matt@aa2.linuxbox.com) has joined #ceph
[21:58] * dcasier (~dcasier@120.48.132.79.rev.sfr.net) has joined #ceph
[22:02] * sagelap (~sage@2600:1012:b00d:2a9c:a5b8:d69b:857f:ca26) has joined #ceph
[22:02] * sagelap (~sage@2600:1012:b00d:2a9c:a5b8:d69b:857f:ca26) Quit (Read error: Connection reset by peer)
[22:15] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:15] * Machske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[22:16] * sagelap1 (~sage@201.sub-70-197-72.myvzw.com) has joined #ceph
[22:17] * Machske (~Bram@d5152D87C.static.telenet.be) Quit ()
[22:17] * Machske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[22:20] * sagelap1 (~sage@201.sub-70-197-72.myvzw.com) Quit (Quit: Leaving.)
[22:26] * rturk-away is now known as rturk
[22:27] <Machske> Lost an osd today on a server running kernel 3.6.1. An error in btrfs.c line xxxx. After remount, the osd did not want to come back online even though the osd process was running. Recreated the ceph osd fs on it so it could rebalance again. Seems that btrfs on 3.6.1 isn't reliable yet, even though Oracle Linux and SUSE already support it in production envs.
[22:28] * Maskul (~Maskul@host-92-25-195-4.as13285.net) Quit (Read error: Connection reset by peer)
[22:28] <Machske> And formatted it with xfs
[22:28] * Maskul (~Maskul@host-92-25-195-4.as13285.net) has joined #ceph
[22:31] <tnt> Machske: I think ceph uses "advanced" btrfs features which might not be stressed a lot in normal usage.
[22:34] <Machske> so, for now, only to play with, but not to actually store "important" data on it ?
[22:35] <Machske> I'm still wondering what a reliable fs is. This was the first issue I've ever encountered with btrfs. I had several issues with xfs, most of the time fixable, but still.
[22:35] <Kioob> I also have a lot of issues with btrfs for backup usage.
[22:36] <Machske> Kioob: what kernel version ?
[22:36] <Kioob> 3.8.*
[22:36] <Machske> hmm :s
[22:36] <Kioob> Creating concurrent snapshots doesn't work very well
[22:37] <Kioob> Btrfs with low activity works well
[22:37] <Kioob> On a desktop, it's good enough for example
[22:37] <Machske> I like zfs on linux, but have high hopes for btrfs, because the resource requirements appear to be lower than zfs
[22:39] * ggreg_ is now known as ggreg
[23:07] * eschnou (~eschnou@220.177-201-80.adsl-dyn.isp.belgacom.be) Quit (Quit: Leaving)
[23:17] <sjust> loicd: you there?
[23:20] <loicd> yes
[23:20] <sjust> sending an email
[23:20] <sjust> didn't initially notice it went to ceph devel as well :P
[23:20] * loicd wonders what he got wrong this time :-)
[23:21] <loicd> sjust: I tend to do that, hoping that someone else will have an answer instead of bugging you every time ;-)
[23:21] <sjust> yep, good call
[23:22] <sjust> sent email
[23:22] * loicd reading
[23:23] <loicd> *dang*
[23:23] * loicd hugs sjust
[23:25] * rkeene (1011@oc9.org) has joined #ceph
[23:25] * dcasier (~dcasier@120.48.132.79.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[23:27] * sagelap (~sage@2600:1012:b00d:2a9c:a5b8:d69b:857f:ca26) has joined #ceph
[23:30] * Maskul (~Maskul@host-92-25-195-4.as13285.net) Quit (Quit: Maskul)
[23:34] * BillK (~BillK@124-148-124-185.dyn.iinet.net.au) has joined #ceph
[23:35] <Gugge-47527> anyone had any luck running zfs on top of kernel rbd?
[23:36] <Machske> Nope, our test pool kept on corrupting...
[23:36] <Gugge-47527> i tried it quick, and got a hang on zpool export right after a zpool create :)
[23:52] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Bye!)
[23:59] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.