#ceph IRC Log

IRC Log for 2013-08-22

Timestamps are in GMT/BST.

[0:00] * doxavore (~doug@99-7-52-88.lightspeed.rcsntx.sbcglobal.net) Quit (Quit: :qa!)
[0:03] * buck1 (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[0:03] * haomaiwang (~haomaiwan@218.71.76.208) has joined #ceph
[0:05] * loicd gives up laughing after recompiling libtool on AIX and getting configure.ac:35: warning: macro 'AM_PROG_LIBTOOL' not found in library
[0:07] * haomaiwa_ (~haomaiwan@218.71.73.3) Quit (Ping timeout: 480 seconds)
[0:07] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[0:10] * Cube1 (~Cube@88.128.80.12) has joined #ceph
[0:10] * Cube (~Cube@88.128.80.12) Quit (Read error: Connection reset by peer)
[0:12] * berngp (~bernardo@50-193-59-202-static.hfc.comcastbusiness.net) has joined #ceph
[0:13] * jeff-YF (~jeffyf@67.23.117.122) Quit (Ping timeout: 480 seconds)
[0:13] * ircolle1 (~Adium@c-67-165-237-235.hsd1.co.comcast.net) has left #ceph
[0:13] * alphe (~alphe@0001ac6f.user.oftc.net) has joined #ceph
[0:13] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) has joined #ceph
[0:21] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[0:22] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[0:29] * carif (~mcarifio@ip-207-145-81-212.nyc.megapath.net) Quit (Quit: Ex-Chat)
[0:29] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) Quit (Read error: Connection reset by peer)
[0:29] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) has joined #ceph
[0:31] * nwat (~nwat@eduroam-237-79.ucsc.edu) Quit (Ping timeout: 480 seconds)
[0:33] * alphe (~alphe@0001ac6f.user.oftc.net) Quit (Remote host closed the connection)
[0:34] * mschiff (~mschiff@85.182.236.82) Quit (Remote host closed the connection)
[0:34] * rudolfsteiner (~federicon@181.21.134.196) has joined #ceph
[0:46] * indeed (~indeed@206.124.126.33) has joined #ceph
[0:50] * BillK (~BillK-OFT@58-7-52-33.dyn.iinet.net.au) has joined #ceph
[0:52] * tnt (~tnt@109.130.102.13) Quit (Ping timeout: 480 seconds)
[0:52] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) Quit (Read error: Connection reset by peer)
[0:53] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) has joined #ceph
[0:53] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) Quit ()
[0:53] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[0:54] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[0:57] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) has joined #ceph
[0:57] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) Quit (Quit: ...)
[1:17] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[1:18] * devoid (~devoid@130.202.135.235) Quit (Quit: Leaving.)
[1:27] * LeaChim (~LeaChim@176.24.168.228) Quit (Ping timeout: 480 seconds)
[1:33] * haomaiwang (~haomaiwan@218.71.76.208) Quit (Ping timeout: 480 seconds)
[1:33] * rudolfsteiner (~federicon@181.21.134.196) Quit (Quit: rudolfsteiner)
[1:39] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[1:40] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[1:41] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[1:54] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:01] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[2:02] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[2:07] * yo61 (~yo61@lin001.yo61.net) Quit (Ping timeout: 480 seconds)
[2:07] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) Quit (Quit: Leaving.)
[2:09] * AfC (~andrew@2407:7800:200:1011:8d47:5008:7573:ba95) has joined #ceph
[2:09] * todayman1 (~Adium@128.220.70.121) has joined #ceph
[2:09] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[2:10] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[2:11] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[2:12] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Read error: Operation timed out)
[2:23] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) has joined #ceph
[2:31] * buck1 (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has left #ceph
[2:34] * madkiss (~madkiss@64.125.181.92) Quit (Quit: Leaving.)
[2:41] * berngp (~bernardo@50-193-59-202-static.hfc.comcastbusiness.net) has left #ceph
[2:47] * Cube1 (~Cube@88.128.80.12) Quit (Read error: Connection reset by peer)
[2:47] * Cube (~Cube@88.128.80.12) has joined #ceph
[2:48] * haomaiwang (~haomaiwan@125.108.230.219) has joined #ceph
[2:51] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[2:56] * yanzheng (~zhyan@134.134.139.72) has joined #ceph
[3:04] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Bye!)
[3:07] * Cube (~Cube@88.128.80.12) Quit (Quit: Leaving.)
[3:13] * haomaiwa_ (~haomaiwan@125.108.227.141) has joined #ceph
[3:14] * indeed (~indeed@206.124.126.33) Quit (Remote host closed the connection)
[3:14] * yy-nm (~Thunderbi@218.74.38.31) has joined #ceph
[3:17] * haomaiwang (~haomaiwan@125.108.230.219) Quit (Ping timeout: 480 seconds)
[3:19] * hflai_ is now known as hflai
[3:28] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has left #ceph
[3:29] * madkiss (~madkiss@184.105.243.188) has joined #ceph
[3:39] * nerdtron (~kenneth@202.60.8.252) has joined #ceph
[3:40] * clayb (~kvirc@199.172.169.79) Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[3:57] * markbby (~Adium@168.94.245.4) has joined #ceph
[3:59] * markbby (~Adium@168.94.245.4) Quit ()
[4:00] * markbby (~Adium@168.94.245.4) has joined #ceph
[4:00] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[4:03] * markbby1 (~Adium@168.94.245.1) has joined #ceph
[4:03] * markbby (~Adium@168.94.245.4) Quit (Remote host closed the connection)
[4:05] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[4:13] <todayman1> I'm running version 0.61.8. Most of my pgs are stuck stale. When I try to query the pg (with ceph pg {id} query), the response is "i don't have pgid {id}". Yesterday, 3/5 of the disks I was using were corrupted and I replaced them. I'd like to just delete most of the data and get the Ceph cluster into a usable state. Any ideas on how to proceed?
[4:13] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[4:15] <yanzheng> ceph osd pool delete xxx xxx --yes-i-really-really-mean-it
[4:16] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[4:20] <todayman1> yanzheng: excellent! Thank you.
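A minimal sketch of the deletion yanzheng suggests, with a hypothetical pool named "data" (the pool name is given twice as a safety check):

    # Destroys the pool and every object in it; both name arguments must match.
    ceph osd pool delete data data --yes-i-really-really-mean-it
    # Recreate an empty pool afterwards, e.g. with 128 placement groups:
    ceph osd pool create data 128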
[4:23] * rudolfsteiner (~federicon@181.21.167.249) has joined #ceph
[4:28] * Mrwoofer (~oftc-webi@ool-44c430bb.dyn.optonline.net) has joined #ceph
[4:28] <Mrwoofer> Heya guys
[4:44] <mmmucky> hi
[4:46] * todayman2 (~Adium@128.220.70.121) has joined #ceph
[4:46] * todayman3 (~Adium@128.220.70.121) has joined #ceph
[4:47] * todayman3 (~Adium@128.220.70.121) has left #ceph
[4:51] * davidz (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (Remote host closed the connection)
[4:54] * todayman1 (~Adium@128.220.70.121) Quit (Ping timeout: 480 seconds)
[4:54] * todayman2 (~Adium@128.220.70.121) Quit (Ping timeout: 480 seconds)
[4:56] <dmick> sup mmmucky
[4:56] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[4:57] * rudolfsteiner (~federicon@181.21.167.249) Quit (Quit: rudolfsteiner)
[4:58] * rudolfsteiner (~federicon@181.21.167.249) has joined #ceph
[5:05] * fireD (~fireD@93-142-214-29.adsl.net.t-com.hr) has joined #ceph
[5:06] * madkiss (~madkiss@184.105.243.188) Quit (Quit: Leaving.)
[5:07] * fireD_ (~fireD@93-142-214-218.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:08] <mmmucky> oh, not much. just saying hi to Mrwoofer.
[5:13] <Mrwoofer> Hi!
[5:14] <Mrwoofer> I was curious about a feature of ceph, that I heard in one of the OS presentations last year
[5:15] <dmick> what was that?
[5:23] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) has joined #ceph
[5:25] <Mrwoofer> I heard that you can have multiple types of storage and use one as primary i.e ot serve write/reads and the others as 2nd and 3rd replication drives.
[5:25] <Mrwoofer> i.e SSD's and SATA drives
[5:26] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[5:28] * rudolfsteiner (~federicon@181.21.167.249) Quit (Quit: rudolfsteiner)
[5:40] <dmick> yep
[5:41] <dmick> http://ceph.com/docs/master/rados/operations/crush-map/#placing-different-pools-on-different-osds
[5:41] <dmick> that may not make much sense without reading the rest of that page
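The linked page separates pools by drive type with CRUSH rules. A rough sketch of such a rule, assuming the CRUSH map already contains a root bucket named "ssd" holding the SSD-backed hosts (bucket name, rule number and pool name are illustrative):

    rule ssd {
            ruleset 4
            type replicated
            min_size 1
            max_size 10
            step take ssd
            step chooseleaf firstn 0 type host
            step emit
    }
    # then point a pool at that rule:
    # ceph osd pool set ssd-pool crush_ruleset 4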
[5:48] * devoid (~devoid@107-219-204-197.lightspeed.cicril.sbcglobal.net) has joined #ceph
[5:48] * devoid (~devoid@107-219-204-197.lightspeed.cicril.sbcglobal.net) Quit ()
[5:53] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[6:08] * yy-nm (~Thunderbi@218.74.38.31) Quit (Quit: yy-nm)
[6:08] * markbby1 (~Adium@168.94.245.1) Quit (Remote host closed the connection)
[6:10] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) Quit (Quit: leaving)
[6:11] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[6:19] <Mrwoofer> Ahh. there it is
[6:19] <Mrwoofer> So that keeps costs way down
[6:19] <Mrwoofer> excellent
[6:27] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[6:29] <dmick> actual tiering is being planned now
[6:29] <dmick> http://wiki.ceph.com/01Planning/02Blueprints/Emperor
[6:38] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[6:39] * Mrwoofer (~oftc-webi@ool-44c430bb.dyn.optonline.net) Quit (Remote host closed the connection)
[6:41] * madkiss (~madkiss@184.105.243.188) has joined #ceph
[6:43] * MrWoofer (~oftc-webi@ool-44c430bb.dyn.optonline.net) has joined #ceph
[6:44] <MrWoofer> Tiering is based on actual data usage, i.e. which data is used more actively; what you pasted me before seemed to be more for redundancy than anything else, i.e. SSD as the primary read/write and the replicas as a failover/rebuild tool, yes/no?
[6:45] <dmick> yes
[6:46] <dmick> tiering can bring the benefits of faster storage with less fast storage required (at least for read)
[6:49] * jmlowe1 (~Adium@c-98-223-198-138.hsd1.in.comcast.net) has joined #ceph
[6:50] * jmlowe (~Adium@2601:d:a800:97:6403:f9a9:1669:5ce4) Quit (Ping timeout: 480 seconds)
[6:51] * rongze (~quassel@106.120.176.78) Quit (Ping timeout: 480 seconds)
[6:53] <sage> yanzheng: around?
[6:55] * brzm (~medvedchi@node199-194.2gis.com) has joined #ceph
[7:05] * yy-nm (~Thunderbi@218.74.38.31) has joined #ceph
[7:12] * rongze (~quassel@211.155.113.206) has joined #ceph
[7:16] <yanzheng> yes
[7:17] <yanzheng> sage, ?
[7:17] <sage> hey, just wondering if you'd looked at the second patch from majianpeng
[7:18] <sage> oh i see
[7:18] <sage> cool :)
[7:27] <dmick> sage: got any tricks for populating a cluster quickly? rbd-fuse and dd aren't really cutting it
[7:28] <sage> rados bench -p rbd bench 30000 write --no-cleanup
[7:28] <sage> you could try using the long-term cluster tho
[7:28] <dmick> sounds good
[7:28] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:28] <dmick> er, the long-term cluster?
[7:28] <sage> mira049
[7:28] <dmick> ah
[7:29] <sage> just don't poke on it too hard; it has real data in it
[7:29] <dmick> well I have one built, but just wanna shove some data in
[7:29] <dmick> either way.
[7:30] <dmick> probably better to have a separate one so we can play with osd states
[7:30] <sage> yeuah
[7:30] <sage> yeah
[7:32] <dmick> never found myself wishing for 10GB networks more :)
[7:33] <sage> yeah mira is only 1g :(
[7:33] <dmick> yep
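For reference, the bench invocation sage quotes above appears to contain a stray "bench"; the usual form for filling the "rbd" pool with benchmark objects and keeping them afterwards is:

    # Write 4 MB objects for 30000 seconds and leave them in place (--no-cleanup),
    # which doubles as a quick way to shove data into a test cluster.
    rados -p rbd bench 30000 write --no-cleanup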
[7:34] * madkiss (~madkiss@184.105.243.188) Quit (Quit: Leaving.)
[7:34] * sjusthm (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:50] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[7:51] * tnt (~tnt@109.130.102.13) has joined #ceph
[7:59] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[8:00] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Bye!)
[8:03] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:12] * MrWoofer (~oftc-webi@ool-44c430bb.dyn.optonline.net) Quit (Quit: Page closed)
[8:19] * bwesemann (~bwesemann@2001:1b30:0:6:1907:2ef:3dde:28f4) Quit (Remote host closed the connection)
[8:20] * bwesemann (~bwesemann@2001:1b30:0:6:6832:7482:7288:7e60) has joined #ceph
[8:28] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[8:30] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[8:37] * lx0 is now known as lxo
[8:38] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:47] * hijacker (~hijacker@213.91.163.5) Quit (Quit: Leaving)
[8:52] * AfC (~andrew@2407:7800:200:1011:8d47:5008:7573:ba95) has left #ceph
[8:53] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[8:54] * ssejour (~sebastien@out-chantepie.fr.clara.net) has joined #ceph
[8:56] * tnt (~tnt@109.130.102.13) Quit (Ping timeout: 480 seconds)
[8:56] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[8:56] * ChanServ sets mode +v andreask
[9:00] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[9:00] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:01] * nerdtron (~kenneth@202.60.8.252) Quit (Quit: Leaving)
[9:02] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:05] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[9:06] * mschiff (~mschiff@pD95104B7.dip0.t-ipconnect.de) has joined #ceph
[9:11] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[9:12] * LeaChim (~LeaChim@176.24.168.228) has joined #ceph
[9:14] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[9:14] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[9:19] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:30] * tnt (~tnt@ip-188-118-44-117.reverse.destiny.be) has joined #ceph
[9:36] * sleinen1 (~Adium@130.59.94.162) has joined #ceph
[9:38] * sleinen (~Adium@2001:620:0:46:f504:2be1:d863:12b6) Quit (Ping timeout: 480 seconds)
[9:39] * rongze_ (~quassel@117.79.232.249) has joined #ceph
[9:44] * rongze (~quassel@211.155.113.206) Quit (Ping timeout: 480 seconds)
[9:45] * vipr (~vipr@78-23-114-68.access.telenet.be) has joined #ceph
[9:45] * jbd_ (~jbd_@2001:41d0:52:a00::77) has joined #ceph
[9:47] * sleinen1 (~Adium@130.59.94.162) Quit (Ping timeout: 480 seconds)
[9:47] * sleinen (~Adium@2001:620:0:25:78e9:2366:3e:5113) has joined #ceph
[9:50] * scipkw (~chatzilla@nusnet-238-37.dynip.nus.edu.sg) has joined #ceph
[9:51] <scipkw> hi..
[9:53] * sel (~sel@212.62.233.233) has joined #ceph
[9:54] * nerdtron (~kenneth@202.60.8.252) has joined #ceph
[9:55] <scipkw> I was searching the mailing list to find out what happens when the cluster network fails in ceph, and came across a comment from greg that the cluster will behave badly when this happens
[9:55] <scipkw> it was posted in jan 2013
[9:55] <scipkw> may i know if cuttlefish or dumpling still face this issue?
[9:55] * yo61 (~yo61@lin001.yo61.net) has joined #ceph
[9:57] <yo61> OSD and MON nodes done and understood (sort of). Starting to look at actually putting some data on the cluster this morning
[10:02] * lx0 is now known as lxo
[10:03] <sel> Hello, I'm trying to set up a cluster with ceph-deploy (last time I set up a cluster was with mkceph), I'm having a bit of a problem with some keys, how is the bootstrap-osd key generated? Is it created during the ceph-deploy install step?
[10:03] * cofol1986 (~xwrj@110.90.119.113) has joined #ceph
[10:03] * cofol1986 (~xwrj@110.90.119.113) has left #ceph
[10:05] <yo61> sel: I did this:
[10:05] <yo61> ceph-deploy new mon0{1,2,3}
[10:06] * X3NQ (~X3NQ@195.191.107.205) has joined #ceph
[10:06] <yo61> (installed ceph manually from an internal repo)
[10:06] <sel> Same as me...
[10:07] <yo61> ceph-deploy mon create mon0{1,2,3}
[10:07] <sel> My problem is "[ceph_deploy.gatherkeys][WARNIN] Unable to find /var/lib/ceph/bootstrap-osd/...."
[10:07] <yo61> OK, check on the mon hosts - is create keys still running?
[10:07] * ScOut3R (~ScOut3R@catv-89-133-25-52.catv.broadband.hu) has joined #ceph
[10:08] <sel> Not as far as I can see...
[10:09] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Ping timeout: 480 seconds)
[10:09] <yo61> Try gatherkeys on just the first mon host?
[10:11] <sel> same problem on all
[10:12] * tobru (~quassel@2a02:41a:3999::94) has joined #ceph
[10:12] <yo61> What do the logs say on the mon nodes?
[10:12] <tobru> hi. does anyone know what "WARNING: cannot read region map" means? (radosgw)
[10:12] <yo61> (I should point out that I'm no expert - still getting to grips with it myself)
[10:13] * cofol1986 (~xwrj@110.90.119.113) has joined #ceph
[10:13] <tobru> "radosgw-admin region-map get" says "failed to read region map: (2) No such file or directory"
[10:13] * cofol1986 (~xwrj@110.90.119.113) has left #ceph
[10:15] <yo61> sel: /win 15
[10:15] <yo61> Gah
[10:17] <yo61> sel: I found a couple of problems
[10:17] <yo61> Firstly, I installed haveged to produce more entropy for the key generation
[10:18] * cofol1986 (~xwrj@110.90.119.113) has joined #ceph
[10:19] <yo61> I also found that communication problems between nodes caused the mons to not come up cleanly
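The sequence yo61 describes, spelled out with placeholder hostnames; gatherkeys can only find the bootstrap-osd key once the monitors have formed a quorum and created it:

    ceph-deploy new mon01 mon02 mon03
    ceph-deploy install mon01 mon02 mon03    # or install ceph by hand from an internal repo
    ceph-deploy mon create mon01 mon02 mon03
    ceph-deploy gatherkeys mon01             # pulls ceph.bootstrap-osd.keyring and the other keys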
[10:26] * sleinen (~Adium@2001:620:0:25:78e9:2366:3e:5113) Quit (Quit: Leaving.)
[10:26] * sleinen (~Adium@130.59.94.162) has joined #ceph
[10:31] * jksM (~jks@3e6b5724.rev.stofanet.dk) Quit (Read error: Connection reset by peer)
[10:32] * jks (~jks@3e6b5724.rev.stofanet.dk) has joined #ceph
[10:34] * sleinen1 (~Adium@130.59.94.162) has joined #ceph
[10:34] * sleinen (~Adium@130.59.94.162) Quit (Read error: Connection reset by peer)
[10:44] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[10:51] * Cube (~Cube@88.128.80.12) has joined #ceph
[10:52] <cofol1986> test
[10:59] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[11:01] * yanzheng (~zhyan@134.134.139.72) Quit (Quit: Leaving)
[11:03] * andreask1 (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[11:03] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Read error: Connection reset by peer)
[11:03] * ChanServ sets mode +v andreask1
[11:03] * andreask1 is now known as andreask
[11:06] * yy-nm (~Thunderbi@218.74.38.31) Quit (Quit: yy-nm)
[11:06] * ChoppingBrocoli_ (~quassel@rrcs-74-218-204-10.central.biz.rr.com) has joined #ceph
[11:09] * allsystemsarego (~allsystem@5-12-37-127.residential.rdsnet.ro) has joined #ceph
[11:10] * hijacker (~hijacker@213.91.163.5) has joined #ceph
[11:11] * saabylaptop (~saabylapt@2a02:2350:18:1010:1847:efe5:a03:5c6) has joined #ceph
[11:13] * ChoppingBrocoli (~quassel@rrcs-74-218-204-10.central.biz.rr.com) Quit (Ping timeout: 480 seconds)
[11:23] <yo61> So, delayed trains all behind me now
[11:23] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[11:23] * KindTwo (~KindOne@198.14.195.103) has joined #ceph
[11:24] <yo61> Are there any guidelines/best-practices for creating pools for use with Openstack?
[11:24] * KindTwo is now known as KindOne
[11:26] * scipkw_ (~chatzilla@nusnet-238-37.dynip.nus.edu.sg) has joined #ceph
[11:26] * scipkw (~chatzilla@nusnet-238-37.dynip.nus.edu.sg) Quit (Read error: Connection reset by peer)
[11:26] * scipkw_ is now known as scipkw
[11:36] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[11:36] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[11:36] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit ()
[11:37] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[11:38] * BillK (~BillK-OFT@58-7-52-33.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[11:42] * scipkw (~chatzilla@nusnet-238-37.dynip.nus.edu.sg) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 23.0.1/20130814063812])
[11:42] * sleinen (~Adium@macsl.switch.ch) has joined #ceph
[11:43] * sleinen1 (~Adium@130.59.94.162) Quit (Ping timeout: 480 seconds)
[11:45] * athrift_ (~nz_monkey@203.86.205.13) Quit (Remote host closed the connection)
[11:53] * athrift (~nz_monkey@203.86.205.13) has joined #ceph
[12:02] * Cube (~Cube@88.128.80.12) Quit (Quit: Leaving.)
[12:02] * cofol1986 (~xwrj@110.90.119.113) Quit (Quit: Leaving.)
[12:03] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[12:04] * cofol1986 (~xwrj@110.90.119.113) has joined #ceph
[12:10] * ScOut3R (~ScOut3R@catv-89-133-25-52.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[12:14] * X3NQ (~X3NQ@195.191.107.205) Quit (Quit: Leaving)
[12:17] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[12:23] * BillK (~BillK-OFT@58-7-52-33.dyn.iinet.net.au) has joined #ceph
[12:34] <loicd> ccourtaut: would you have time to comment on https://github.com/ceph/ceph/pull/518/files ?
[12:34] * loicd fishing for reviewers :-)
[13:07] * jks (~jks@3e6b5724.rev.stofanet.dk) Quit (Quit: jks)
[13:09] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[13:11] * jks (~jks@3e6b5724.rev.stofanet.dk) has joined #ceph
[13:17] * flickerdown (~flickerdo@westford-nat.juniper.net) Quit (Read error: Connection reset by peer)
[13:25] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[13:25] * rudolfsteiner (~federicon@181.21.152.152) has joined #ceph
[13:27] * nerdtron (~kenneth@202.60.8.252) Quit (Ping timeout: 480 seconds)
[13:28] * BillK (~BillK-OFT@58-7-52-33.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[13:29] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[13:35] * ChoppingBrocoli_ (~quassel@rrcs-74-218-204-10.central.biz.rr.com) Quit (Remote host closed the connection)
[13:50] <ccourtaut> loicd: i usually use explicit constructor, but i do not know what the ceph dev usually do about that
[13:51] <loicd> ccourtaut: I thought about it too :-) But that's not widely used in the codebase therefore I refrained from it
[13:51] <loicd> explicit hobject_t(const sobject_t &o) :
[13:51] <ccourtaut> ok
[13:52] <loicd> maybe in ~20 other places
[13:53] <ccourtaut> not widely used indeed
[13:53] <ccourtaut> i like to use it to avoid ambiguity, but it's clearly not mandatory, and sometimes you might want to not use it
[13:55] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[13:58] <ccourtaut> do not know if i would have exposed the plugins variable from the registry in public
[13:59] <ccourtaut> might be more convenient, but allows devs to do crap with it :)
[14:00] <ccourtaut> loicd: another question, is there any threads involved?
[14:00] <ccourtaut> i assume you are in a thread-safe zone here, if threads there is
[14:01] <loicd> hum
[14:02] * loicd thinking about threads
[14:02] <ccourtaut> ^^'
[14:02] <loicd> could you comment on this ( threads ) in the pull request. I need to think about it and can't reply right now.
[14:02] <loicd> ?
[14:02] <ccourtaut> sure
[14:03] <loicd> which plugin variables are you refering to ?
[14:03] <ccourtaut> the plugins variable member or ErasureCodePluginRegistry
[14:03] <ccourtaut> s/or/of/
[14:03] <loicd> ah
[14:04] * BillK (~BillK-OFT@58-7-52-33.dyn.iinet.net.au) has joined #ceph
[14:04] <loicd> right, it should be protected
[14:04] * rudolfsteiner (~federicon@181.21.152.152) Quit (Quit: rudolfsteiner)
[14:05] <loicd> ?
[14:05] <ccourtaut> so you can have a getter to get a const& to it
[14:05] <loicd> could you please add this as a comment to https://github.com/ceph/ceph/pull/518/files#L4R43 ?
[14:05] <ccourtaut> and a method to add a plugin to it
[14:06] <ccourtaut> that way, if threads are involved, you can protect the acces in the method, without looking around for anyone messing with it manually
[14:06] <loicd> yes
[14:07] <ccourtaut> loicd: does ErasureCodePluginRegistry is meant to be derived?
[14:07] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:08] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:09] <ccourtaut> loicd: i don't think it is, so plugins should be private, protected would tell developers that it is meant to be derived
[14:10] <loicd> no, it is not. I despise private: because it forces you to friend test classes to mess with the members and create conditions for errors and the like. I tend to avoid private: for this reason.
[14:10] <ccourtaut> loicd: oh
[14:10] <ccourtaut> ok
[14:10] <ccourtaut> :)
[14:12] <ccourtaut> loicd: commented
[14:13] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[14:13] * ChanServ sets mode +v andreask
[14:13] * flickerdown (~flickerdo@westford-nat.juniper.net) has joined #ceph
[14:14] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:14] <loicd> thanks !
[14:15] * Wolff_John (~jwolff@vpn.monarch-beverage.com) has joined #ceph
[14:23] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Quit: wogri_risc)
[14:24] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[14:24] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[14:24] * markbby (~Adium@168.94.245.4) has joined #ceph
[14:28] * markbby1 (~Adium@168.94.245.4) has joined #ceph
[14:29] * yanzheng (~zhyan@101.82.60.167) has joined #ceph
[14:30] * Cube (~Cube@88.128.80.12) has joined #ceph
[14:33] * markbby (~Adium@168.94.245.4) Quit (Remote host closed the connection)
[14:36] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:42] <thelan> Hello there
[14:42] <thelan> It seems i've got an issue with data rebalancing after adding an osd to an existing cluster
[14:43] <thelan> i've put the osd in but it keeps a 0 weight in the 2nd column when i run the ceph osd tree command
[14:48] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:50] * smiley (~smiley@cpe-67-251-108-92.stny.res.rr.com) Quit (Quit: smiley)
[14:50] * brzm (~medvedchi@node199-194.2gis.com) Quit (Quit: Leaving.)
[14:50] * rudolfsteiner (~federicon@181.21.152.152) has joined #ceph
[14:53] <Gugge-47527> thelan: if you paste the output from ceph osd tree, it is easier to help :)
[14:54] * ntranger_ (~ntranger@proxy2.wolfram.com) Quit (Remote host closed the connection)
[14:56] <thelan> Gugge-47527: sure => http://pastebin.com/pVr8t4VA
[14:56] <thelan> osd.0 was removed and then re-added
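A weight of 0 in the second column of ceph osd tree usually means the re-added OSD never got a CRUSH weight, so no data maps to it. A hedged sketch of giving it one (the weight and host name are placeholders):

    # Place/weight the OSD under its host in the CRUSH map (weight ~ capacity, often 1.0 per TB):
    ceph osd crush add osd.0 1.0 host=your-osd-host
    # or, if it is already placed but weighted 0:
    ceph osd crush reweight osd.0 1.0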
[15:00] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[15:01] * iii8 (~Miranda@91.207.132.71) Quit (Ping timeout: 480 seconds)
[15:04] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[15:07] <loicd> is there a *const* version of std::map[] ?
[15:08] <loicd> no there is not
[15:08] <loicd> tedious aspect of map
[15:19] * yanzheng (~zhyan@101.82.60.167) Quit (Ping timeout: 480 seconds)
[15:23] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) has joined #ceph
[15:24] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[15:24] <ccourtaut> loicd: what do you want to do?
[15:25] <loicd> if (parameters.find(name) == parameters.end() ||
[15:25] <loicd> parameters.find(name)->second.size() == 0) {
[15:25] <loicd> dout(10) << name << " defaults to " << default_value << dendl;
[15:25] * dmsimard (~Adium@ap01.wireless.cl.mtl.iweb.com) has joined #ceph
[15:25] * rudolfsteiner (~federicon@181.21.152.152) Quit (Quit: rudolfsteiner)
[15:25] <loicd> it's heavy
[15:25] <loicd> oh well :-)
[15:26] <ccourtaut> loicd: i don't get the use case of your const map
[15:26] <ccourtaut> :)
[15:26] <ccourtaut> but you could have a const map&
[15:28] * yanzheng (~zhyan@101.83.188.231) has joined #ceph
[15:28] * dmsimard1 (~Adium@108.163.152.66) has joined #ceph
[15:31] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[15:31] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) has joined #ceph
[15:33] * dmsimard (~Adium@ap01.wireless.cl.mtl.iweb.com) Quit (Ping timeout: 480 seconds)
[15:37] * BillK (~BillK-OFT@58-7-52-33.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[15:43] * madkiss (~madkiss@184.105.243.235) has joined #ceph
[15:44] * rudolfsteiner (~federicon@181.21.156.154) has joined #ceph
[15:51] * madkiss (~madkiss@184.105.243.235) Quit (Ping timeout: 480 seconds)
[15:51] * madkiss (~madkiss@184.105.243.235) has joined #ceph
[15:57] * aliguori (~anthony@32.97.110.51) has joined #ceph
[15:57] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[15:58] <via> if i replace an osd and it has to rebuild, is that backfill or recovery? and assuming i don't need to do a lot with the cluster, is there a way to speed it up?
[15:58] <via> i tried setting the max backfill/max recoveries higher with little to no effect
[15:58] * sleinen1 (~Adium@130.59.94.174) has joined #ceph
[16:03] * sleinen1 (~Adium@130.59.94.174) Quit (Read error: Connection reset by peer)
[16:03] * flickerdown (~flickerdo@westford-nat.juniper.net) Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[16:03] * sleinen1 (~Adium@2001:620:0:2d:1452:a81d:48c2:a7ca) has joined #ceph
[16:04] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[16:04] * sleinen2 (~Adium@130.59.94.174) has joined #ceph
[16:04] * torment2 (~torment@pool-173-78-201-127.tampfl.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[16:06] * sleinen (~Adium@macsl.switch.ch) Quit (Ping timeout: 480 seconds)
[16:09] * LPG_ (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[16:11] * sleinen1 (~Adium@2001:620:0:2d:1452:a81d:48c2:a7ca) Quit (Ping timeout: 480 seconds)
[16:12] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[16:12] * sleinen2 (~Adium@130.59.94.174) Quit (Ping timeout: 480 seconds)
[16:24] * sleinen (~Adium@2001:620:0:25:5471:514e:dba7:f501) has joined #ceph
[16:25] * sleinen (~Adium@2001:620:0:25:5471:514e:dba7:f501) Quit ()
[16:29] <mikedawson> via: it is backfill http://ceph.com/docs/next/dev/osd_internals/backfill_reservation/
[16:31] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[16:32] <via> is osd_num_concurrent_backfills a configurable parameter?
[16:32] <mikedawson> via: you can typically tune the backfill until your disk is at 100% iowait "osd max backfills = 10" is default I believe
[16:32] <via> so osd_max_backfills is the right one to up?
[16:33] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[16:34] <mikedawson> via: turning it down limits disk thrashing for me; at 10 my disk is maxed (I get 100% utilization on 'iostat -x')
[16:35] <via> the two disks this his happening on are 85% and 40%
[16:35] <via> but it's doing ~3 recovery op/s and typically around 10 MB/s
[16:35] <via> which i feel is somewhat slow
[16:36] <via> whenever i replace a disk it usually takes about 2 days to complete
[16:36] <via> i'll try turning it down
[16:38] <mikedawson> via: unfortunately backfill doesn't seem to be one big sequential write. Backfill duration is typically limited by spindle contention more than by the sequential write speed of the drive, it seems.
[16:38] <via> i see
[16:39] * jskinner (~jskinner@199.127.136.233) has joined #ceph
[16:39] <mikedawson> via: I would actually inject that value up "ceph osd tell \* injectargs '--osd_max_backfills 15'" on Cuttlefish or "ceph tell osd.* injectargs '--osd_max_backfills 15'" on Dumpling
[16:40] <via> i was injecting it up and down on the two nodes that are being backfilld
[16:40] <via> i've tried going as high as 40
[16:41] <via> now i'm trying to go lower, but so far i don't really see any effect from doing any of this
[16:41] <via> do i need to do it on all osd's for it to be effective?
[16:42] <mikedawson> via: You should see a difference in the %util on the backfilling drives watching 'iostat -xt 2'
[16:42] <via> one of the drives is showing close to 90%, the other is basically 0
[16:42] <mikedawson> via: I have done it across the whole cluster, not sure if that makes the difference.
[16:43] <via> i'll try and see if it does
[16:43] * grepory (~Adium@108-218-234-162.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[16:44] <mikedawson> via: yeah, you'll get periods like that. There seems to be some sort of ordering. If you have graphs, you'll notice some patterns that make it appear it could go faster (like when one of the backfilling drives is basically idle). Haven't found a solution to that issue yet.
[16:45] <via> yeah, thats basically what i see
[16:45] <via> thanks for the advice, i'll keep messing with it
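The knob discussed above, in the Dumpling syntax mikedawson quotes (the value is illustrative; Cuttlefish uses "ceph osd tell \* injectargs ..." instead):

    # Raise or lower the number of concurrent backfills per OSD at runtime:
    ceph tell osd.* injectargs '--osd_max_backfills 10'
    # Watch disk utilisation on the backfilling hosts while tuning:
    iostat -xt 2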
[16:45] <mikedawson> via: do you happen to run rbd?
[16:45] * devoid (~devoid@130.202.135.184) has joined #ceph
[16:46] <mikedawson> via: if you do, are your guests responsive during this backfill?
[16:46] <via> i do for a few vms, but they're not doing anything at the moment, i'll check
[16:46] <via> yeah, they're plenty responsive
[16:48] <mikedawson> via: my windows guests booted from rbd volumes with heavy writes show extreme latency during recovery or backfill
[16:49] <via> oh.. i mean this is a linux guest, and it seems fine, as if nothing was happening on the cluster
[16:50] <mikedawson> via: yep, my Linux guests seem to behave, while my Windows guests struggle
[16:52] <via> it *is* windows
[16:56] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) has joined #ceph
[16:57] * dmsimard1 (~Adium@108.163.152.66) Quit (Quit: Leaving.)
[16:57] * dmsimard (~Adium@108.163.152.66) has joined #ceph
[17:03] * AaronSchulz_ (~chatzilla@216.38.130.164) has joined #ceph
[17:03] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:04] * sagelap (~sage@2600:1012:b02a:354f:a06c:ddbf:ff76:8bdc) has joined #ceph
[17:05] * alram (~alram@38.122.20.226) has joined #ceph
[17:08] * AaronSchulz (~chatzilla@216.38.130.164) Quit (Ping timeout: 480 seconds)
[17:08] * AaronSchulz_ is now known as AaronSchulz
[17:13] * saabylaptop (~saabylapt@2a02:2350:18:1010:1847:efe5:a03:5c6) has left #ceph
[17:17] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) Quit (Quit: Ex-Chat)
[17:20] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) has joined #ceph
[17:23] * yanzheng (~zhyan@101.83.188.231) Quit (Remote host closed the connection)
[17:23] <ssejour> hi. I need some help regarding cephx
[17:24] <ssejour> I have auth_cluster_required = cephx with no problem
[17:25] <ssejour> when I set auth_service_required = none , I can not manage my cluster anymore
[17:25] <sagelap> loicd: ping!
[17:25] <loicd> sagelap: pong
[17:25] <ssejour> when I do a ceph auth list , I can see the client.admin entry
[17:25] <sagelap> loicd: doc gitbuilder still has some problems on master
[17:26] * loicd checking
[17:26] <sagelap> ssejour: can you pastebin the output of 'ceph -s --debug-auth 20 --debug-ms 1' so we can see what it is negotiating?
[17:26] <ssejour> and the key is the same as the one in /etc/ceph/ceph.client.admin.keyring
[17:26] <sagelap> ssejour: my guess is that the mon still have auth client required = cephx
[17:27] <ssejour> sagelap: how can I check that it still have this conf?
[17:28] <ssejour> sagelap: auth_client_required = none in ceph.conf
[17:29] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[17:29] * haomaiwa_ (~haomaiwan@125.108.227.141) Quit (Remote host closed the connection)
[17:30] * haomaiwang (~haomaiwan@li498-162.members.linode.com) has joined #ceph
[17:30] <ssejour> sagelap: http://pastebin.com/UMitq7rW
[17:31] <ssejour> "adding auth protocol: none" ... :(
[17:34] * haomaiwa_ (~haomaiwan@125.108.227.141) has joined #ceph
[17:36] <sagelap> is your goal just to turn it all off?
[17:36] <sagelap> try removing the other configs and just put 'auth supported = none'
[17:36] <sagelap> yehudasa: rgw next tests failed last night
[17:38] * madkiss (~madkiss@184.105.243.235) Quit (Quit: Leaving.)
[17:38] * allsystemsarego (~allsystem@5-12-37-127.residential.rdsnet.ro) Quit (Quit: Leaving)
[17:39] * sagelap (~sage@2600:1012:b02a:354f:a06c:ddbf:ff76:8bdc) Quit (Quit: Leaving.)
[17:39] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) Quit (Quit: Ex-Chat)
[17:41] <ssejour> sagelap: my goal is to turn it on :)
[17:41] * haomaiwang (~haomaiwan@li498-162.members.linode.com) Quit (Ping timeout: 480 seconds)
[17:41] * paravoid (~paravoid@scrooge.tty.gr) Quit (Ping timeout: 480 seconds)
[17:44] * grepory (~Adium@108-218-234-162.lightspeed.sntcca.sbcglobal.net) Quit (Quit: Leaving.)
[17:57] <loicd> sage: I fixed the file I missed yesterday and a) git clone fresh, b) run admin/build-doc, c) grep ERROR and found nothing. https://github.com/ceph/ceph/pull/529 . Sorry for the double fail .
[18:00] <ssejour> do you have any advice to help me find what I did wrong?
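Since the stated goal is to have authentication on everywhere, a sketch of the ceph.conf section that keeps the three settings consistent (all daemons and clients must agree, and daemons need a restart after the change):

    [global]
        auth cluster required = cephx
        auth service required = cephx
        auth client required = cephx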
[18:01] <sagewk> np, thanks for fixing it up!
[18:01] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[18:06] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[18:16] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[18:21] * madkiss (~madkiss@64.125.181.92) has joined #ceph
[18:21] * terje- (~root@135.109.216.239) Quit (Ping timeout: 480 seconds)
[18:23] * tnt (~tnt@ip-188-118-44-117.reverse.destiny.be) Quit (Ping timeout: 480 seconds)
[18:24] <sel> Is ceph-deploy the preferred way to deploy a production cluster?
[18:24] <sagewk> yes. or chef, or juju
[18:28] * sleinen (~Adium@2001:620:0:2d:64b5:6e84:2f6:d7fe) has joined #ceph
[18:29] * sleinen1 (~Adium@2001:620:0:26:c431:eee9:4f54:6cb0) has joined #ceph
[18:32] * tnt (~tnt@109.130.102.13) has joined #ceph
[18:33] * mschiff (~mschiff@pD95104B7.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[18:36] * sleinen (~Adium@2001:620:0:2d:64b5:6e84:2f6:d7fe) Quit (Ping timeout: 480 seconds)
[18:43] * sleinen (~Adium@2001:620:0:46:699a:1e8e:f82a:c697) has joined #ceph
[18:46] * Wolff_John (~jwolff@vpn.monarch-beverage.com) Quit (Ping timeout: 480 seconds)
[18:47] * dmsimard (~Adium@108.163.152.66) Quit (Ping timeout: 480 seconds)
[18:49] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[18:49] * sleinen1 (~Adium@2001:620:0:26:c431:eee9:4f54:6cb0) Quit (Ping timeout: 480 seconds)
[18:50] * jskinner (~jskinner@199.127.136.233) Quit (Remote host closed the connection)
[18:51] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:52] * jeff-YF_ (~jeffyf@67.23.117.122) has joined #ceph
[18:57] * jeff-YF (~jeffyf@67.23.117.122) Quit (Ping timeout: 480 seconds)
[18:57] * jeff-YF_ is now known as jeff-YF
[18:57] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: If at first you don't succeed, skydiving is not for you)
[18:59] * ssejour (~sebastien@out-chantepie.fr.clara.net) Quit (Quit: Leaving.)
[18:59] * Tamil1 (~Adium@cpe-108-184-66-69.socal.res.rr.com) has joined #ceph
[19:09] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Remote host closed the connection)
[19:11] * paravoid (~paravoid@scrooge.tty.gr) has joined #ceph
[19:21] <odyssey4me> When setting up a mount through cephfs, what are the minimum rights required?
[19:21] <odyssey4me> ie should it be rwx for osd?
[19:21] <odyssey4me> something like this: ceph-authtool -n client.foo --cap osd 'allow rwx pool=customer-pool'
[19:22] * davidz (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[19:23] <gregaf> just rw on the osd, and mds allow, and mon allow rw (or maybe just read); I believe this is in the docs somewhere
[19:27] <odyssey4me> gregaf - thanks, perhaps then all of the examples here need to be executed? http://ceph.com/docs/master/rados/operations/auth-intro/
[19:27] <kraken> \o
[19:28] <gregaf> all looks good, just not everything you needed
[19:31] <odyssey4me> so this is all that's needed?
[19:31] <odyssey4me> caps mds = "allow"
[19:31] <odyssey4me> caps mon = "allow r"
[19:31] <odyssey4me> caps osd = "allow rw"
[19:32] <gregaf> I believe so, but you should search the docs as I haven't done it in a while
[19:35] <odyssey4me> odd, I keep getting an auth error on mount
[19:36] * n1md4 (~nimda@anion.cinosure.com) has joined #ceph
[19:36] <n1md4> hi. i think mon is broken ... can any one help me getting it back online
[19:37] <n1md4> Currently, I get this "2013-08-22 18:37:50.789064 7f36e0b9e700 0 -- :/11773 >> 192.168.99.211:6789/0 pipe(0x29c82c0 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault"
[19:40] <odyssey4me> ok, I think I'm missing something in the process of generating the key properly
[19:41] <odyssey4me> I'm guessing that the key must be generated somewhere specific?
[19:43] * odyssey4me is working through http://ceph.com/docs/master/rados/operations/authentication/ again
[19:46] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[19:46] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[19:46] * Tamil1 (~Adium@cpe-108-184-66-69.socal.res.rr.com) Quit (Quit: Leaving.)
[19:54] <odyssey4me> gregaf - no, there's something funky going on here... even client.admin can't mount
[19:54] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[19:54] <odyssey4me> can anyone assist me with getting cephfs working?
[19:54] * pbojanic (~Adium@65-112-206-178.dia.static.qwest.net) has joined #ceph
[19:56] <odyssey4me> haha, that's funny - the mount option is not client.admin... it's just admin
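Putting gregaf's caps and the mount detail together, a hedged sketch with a hypothetical client name "foo" and pool "data":

    # Create the key with the capabilities discussed above:
    ceph auth get-or-create client.foo mon 'allow r' mds 'allow' osd 'allow rw pool=data' \
        -o /etc/ceph/ceph.client.foo.keyring
    # Kernel mount: the name= option takes the bare id, without the "client." prefix:
    mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs -o name=foo,secretfile=/etc/ceph/foo.secret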
[19:57] * nwat_ (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[19:59] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[20:00] * Wolff_John (~jwolff@vpn.monarch-beverage.com) has joined #ceph
[20:00] * jbd_ (~jbd_@2001:41d0:52:a00::77) has left #ceph
[20:02] <n1md4> anyone even hint toward the problem?
[20:02] <Kioob> Please which kernel is recommended for Kernel RBD client, since 3.10.* doesn't work and 3.9.* is not stable ?
[20:03] <Kioob> Can I use 3.4.* ?
[20:04] * dmsimard (~Adium@ap05.wireless.co.mtl.iweb.com) has joined #ceph
[20:04] <Kioob> n1md4: it looks like a not running MON yes
[20:04] * alfredodeza is now known as alfredo|lunch
[20:05] <n1md4> init.d/ceph restart doesn't bring it back though ... how do I start it
[20:07] <Kioob> start with "/etc/init.d/ceph status"
[20:07] <Kioob> to see if it's running
[20:07] <Kioob> then, look in logs
[20:07] <sagewk> if it's ubuntu it may be 'start ceph-mon-all'
[20:08] <n1md4> Kioob: http://pastie.org/pastes/8260391/text
[20:08] <Kioob> so "mon.pp-ceph-1: not running."
[20:08] <n1md4> right
[20:08] <n1md4> and osd.2
[20:09] <n1md4> but not sure how to fix either of these problems ... can you lend a couple of minutes?
[20:09] <Kioob> Try what sagewk said, he knows really better than me :D
[20:09] <n1md4> sagewk: it's debian
[20:10] <sagewk> does 'service ceph start mon ' start it?
[20:11] <n1md4> sagewk: no http://pastie.org/pastes/8260401/text
[20:12] <Kioob> tail -n1 -f /var/log/ceph/mon.*log & /etc/init.d/ceph start ; fg
[20:12] <Kioob> failed: 'ulimit -n 8192; /usr/bin/ceph-mon -i pp-ceph-1 --pid-file /var/run/ceph/mon.pp-ceph-1.pid -c /etc/ceph/ceph.conf '
[20:12] <Kioob> ???
[20:12] <Kioob> You modified the /etc/init.d/ceph script ?
[20:13] <n1md4> http://pastie.org/pastes/8260403/text
[20:13] <n1md4> I have not modified anything.
[20:13] <n1md4> will ceph mind being powered off?
[20:14] <n1md4> I have/had 2 nodes, with 3 osds on each, and one of the nodes also acted as the mon/mds
[20:14] <n1md4> i powered them both off to move, and back on again ..
[20:14] <n1md4> certainly the cause of the problem, so 2 questions really, i) can ceph handle being powered off ii) how to get mon started
[20:20] <Kioob> I'm not sure that the "ulimit" stuff works that way
[20:22] <sagewk> oh, hte ulimit might be the problem, can you run that command from the prompt?
[20:22] <sagewk> do you ahve 'max open files' in the ceph.conf anywhere?
[20:23] <Kioob> what do you obtain with ? ceph-conf -c /etc/ceph/ceph.conf -n mon.pp-ceph-1 max_open_files
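One way to see why the monitor dies, rather than going through the init script: set the open-file limit by hand and run the daemon in the foreground so errors land on the terminal (the mon id comes from the paste above):

    ulimit -n 8192
    /usr/bin/ceph-mon -d -i pp-ceph-1 -c /etc/ceph/ceph.conf    # -d: stay in foreground, log to stderr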
[20:24] * Kioob is now known as Kioob`aw
[20:27] * dmsimard1 (~Adium@108.163.152.2) has joined #ceph
[20:28] * Cube (~Cube@88.128.80.12) Quit (Read error: Connection reset by peer)
[20:28] * Cube (~Cube@88.128.80.12) has joined #ceph
[20:30] * dmsimard (~Adium@ap05.wireless.co.mtl.iweb.com) Quit (Ping timeout: 480 seconds)
[20:33] * guppy (~quassel@guppy.xxx) Quit (Remote host closed the connection)
[20:35] * guppy (~quassel@guppy.xxx) has joined #ceph
[20:56] * Tamil1 (~Adium@cpe-108-184-66-69.socal.res.rr.com) has joined #ceph
[20:59] * pbojanic (~Adium@65-112-206-178.dia.static.qwest.net) Quit (Quit: Leaving.)
[20:59] * pbojanic (~Adium@65-112-206-178.dia.static.qwest.net) has joined #ceph
[21:01] * pbojanic (~Adium@65-112-206-178.dia.static.qwest.net) Quit ()
[21:05] * pbojanic (~Adium@65-112-206-178.dia.static.qwest.net) has joined #ceph
[21:08] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[21:12] * alfredo|lunch is now known as alfredodeza
[21:17] * aliguori (~anthony@32.97.110.51) Quit (Remote host closed the connection)
[21:33] * yasu` (~yasu`@99.23.160.231) has joined #ceph
[21:37] * sagelap (~sage@38.122.20.226) has joined #ceph
[21:39] <mtl1> Hi all… I made some updates to a crushmap today, and ended up with 6 pgs stuck in "active+remapped" state. I didn't really do much to the crush map, aside from adding a datacenter entry and a rack entry for each osd host. Currently, each osd host is in a separate rack. At least the first stuck pg is showing "waiting_on_backfill"
[21:39] <mtl1> I'm not finding anything out there that's pointing me in the direction of a fix for that.
[21:40] <mtl1> This is dumpling, the first release of it.
[21:40] * rturk-away is now known as rturk
[21:53] <gregaf> mtl1: you've got some data moving around as a result of the crushmap change
[21:53] <gregaf> in the meantime, the OSDs have stuck an override mapping of PGs->OSDs so that OSDs which actually have the data are servicing accesses
[21:54] <mtl1> gregaf: I watched while it rebalanced. I thought that was the case originally as well. It stopped moving data around hours ago though. And, there isn't much there.
[21:54] <gregaf> when there is an available backfill slot (which is a reservation to ship data over to the PG) then it will do so, and eventually the remapped entry will be removed as all the data arrives where it should be
[21:55] <gregaf> hmm, maybe some other issue then that sjust or davidz are more likely to know about
[21:55] <gregaf> you could try restarting the OSDs involved; there is some issue we haven't tracked down that makes them get stuck because the OSD forgets to move them on to the next state
[21:56] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[21:56] <mtl1> 4.6d is one of the ones that shows in this state, if I query that, it isn't showing which OSD(s) it's using.
[21:57] <gregaf> ceph pg dump | grep 4.6d
[21:58] <mtl1> [6,5] is what I'm looking for?
[21:58] <gregaf> there should be another pair of brackets next to that one as well
[21:58] <gregaf> is it empty or does it list some other numbers?
[21:58] <mtl1> That one is just [6]
[21:58] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[21:59] <gregaf> ah, got it
[21:59] <gregaf> this is the crush tunables issue where crush isn't finding a good mapping
[21:59] <gregaf> search for that in the docs and it should explain things (including your options) pretty well
[22:04] <mtl1> gregaf: Thanks. "ceph osd crush tunables optimal" did the trick.
[22:04] <gregaf> np, thanks for using Ceph :)
[22:04] <mtl1> Thanks for making it. ;)
[22:07] * rudolfsteiner (~federicon@181.21.156.154) Quit (Ping timeout: 480 seconds)
[22:08] <davidz> mtl1: You don't want "ceph osd crush tunables optimal" if "you are using a pre-3.8(ish) kernel or other very old (pre-bobtail)
[22:08] <davidz> clients."
[22:08] * Tamil1 (~Adium@cpe-108-184-66-69.socal.res.rr.com) Quit (Quit: Leaving.)
[22:09] * Tamil1 (~Adium@cpe-108-184-66-69.socal.res.rr.com) has joined #ceph
[22:10] <mtl1> My cluster servers are using 3.8.0-27. My (kvm servers) clients are running 3.5.0-37.
[22:10] <mtl1> I tried 3.8 on the kvm servers, and was having pretty bad networking issues with the guests.
[22:10] <davidz> mtl1: You can always set them back see http://ceph.com/docs/master/rados/operations/crush-map/#tunables
[22:10] <joshd> with kvm the kernel version doesn't matter
[22:12] * jhujhiti (~jhujhiti@00012a8b.user.oftc.net) has joined #ceph
[22:13] <jhujhiti> can anyone tell me the current status of a freebsd port?
[22:13] <nwat_> jhujhiti: in ceph/wip-port are updates to get freebsd to compile on the latest stable. however, there are some stability issues with it that haven't been resolved
[22:14] <jhujhiti> nwat_: thanks. is there a mailing list or somewhere i can get regular status updates without having to come here and ask every few weeks?
[22:14] <davidz> mtl1: You should be ok then, the cluster server kernels don't matter either.
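The diagnosis and fix from the exchange above, condensed (the PG id is from mtl1's cluster; the legacy profile is the fallback davidz mentions for very old clients):

    ceph pg dump | grep 4.6d           # the two bracketed lists are the up set and the acting set
    ceph pg 4.6d query                 # detailed per-PG state
    ceph osd crush tunables optimal    # triggers remapping; needs bobtail+ / ~3.8+ kernel clients
    ceph osd crush tunables legacy     # revert if older clients must still connect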
[22:14] <nwat_> jhujhiti: it'd be ok to discuss it on ceph-devel
[22:15] <n1md4> sagewk: ulimit = unlimited
[22:15] * Kioob`aw is now known as Kioob
[22:15] <sagewk> ulimit adjusts limits.. ulimit -n unlimited would make num files unlimited. my guess is that ulimit -a will say num files is 4096?
[22:15] <jhujhiti> nwat_: cool. looks like there's a recent thread on portability there. thanks
[22:18] * jhujhiti (~jhujhiti@00012a8b.user.oftc.net) has left #ceph
[22:19] <dmick> sagewk: IIRC there's a kernel limit, and it might be that unlimited is rejected (dim memories)
[22:19] <dmick> (specifically for -n)
[22:20] <sagewk> it's pretty big, though.. and the default is 4096, and the failed start line showed 8192
[22:22] <Kioob> I checked on my Debian, the syntax is good
[22:22] <Kioob> on mine, the $cmd is : 'ulimit -n 131072; /usr/bin/ceph-osd -i 65 --pid-file /var/run/ceph/osd.65.pid -c /etc/ceph/ceph.conf'
[22:24] <Kioob> (ceph 0.61.7 on Debian wheezy)
[22:25] <yasu`> Hi, does osd-num has some special meaning still ? Last time I checked it does IIRC.
[22:25] <sagewk> what happens if you type that ulimit -n 131072 as root?
[22:25] <sagewk> and what does ulimit -a say before that?
[22:26] <yasu`> My question exactly is whether if I can remove osd.1 to (say) osd.5 from 16 nodes.
[22:26] * markbby1 (~Adium@168.94.245.4) Quit (Quit: Leaving.)
[22:26] <Kioob> sagewk: on mine it works fine and it's a production cluster. So can't say what the value was before starting any OSD or MON
[22:27] <sagewk> right sorry, that was meant for n1md4
[22:27] <sagewk> n1md4: what does ulimit -a say, and waht ahppens when you type ulimit -n 8192 as root?
[22:28] <dmick> is there a "which PGs are on this OSD" report, or does one need to assemble that from the pg active list?
[22:28] <sagewk> dmick: need to assemble it
[22:29] <Kioob> I think the problem comes from the ceph-mon part. So it should probably be launched directly with : '/usr/bin/ceph-mon -i pp-ceph-1 --pid-file /var/run/ceph/mon.pp-ceph-1.pid -c /etc/ceph/ceph.conf '
[22:29] <Kioob> no ?
[22:29] <dmick> k
[22:30] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:34] <n1md4> sagewk: ulimit says unlimited
[22:34] <n1md4> and -n 8192 returns nothing
[22:34] <sagewk> Kioob: yeah so it's ceph-mon. what happens when you try to launch it?
[22:36] * wrencsok1 (~wrencsok@wsip-174-79-34-244.ph.ph.cox.net) has left #ceph
[22:36] * wrencsok1 (~wrencsok@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[22:37] * wrencsok1 (~wrencsok@wsip-174-79-34-244.ph.ph.cox.net) has left #ceph
[22:37] * wrencsok (~wrencsok@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[22:38] <Kioob> Any help for that bug http://tracker.ceph.com/issues/5876 please ? What should I do ?
[22:42] <sagewk> Kioob: have you tried 3.10?
[22:42] <sagewk> has this happened more than once?
[22:43] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) has joined #ceph
[22:44] * sagelap (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[22:44] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) Quit (Quit: Leaving.)
[22:49] <Kioob> sagewk: I have the problem once a day, on at least 1 of 3 servers
[22:49] <sagewk> Kioob: can you try a 3.10 kernel?
[22:49] <Kioob> and for 3.10, I have this bug : http://tracker.ceph.com/issues/5760 ; it's not usable at all
[22:50] <sagewk> ah
[22:51] * rudolfsteiner (~federicon@181.21.157.137) has joined #ceph
[22:53] <sagewk> Kioob: how many snapshots do you have on the image?
[22:53] <Kioob> I look
[22:54] <Kioob> 221 images, and 15 snapshots per images, so more than 3000
[22:54] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[22:54] <wrencsok> what is the latest version of ceph-deploy i should install on a 61.7 cluster? is 1.2.2 compatible?
[22:54] <Kioob> (and I would like to increase that number... by a factor of 20)
[22:54] <alfredodeza> wrencsok: it is
[22:55] <wrencsok> thanks
[22:55] <alfredodeza> I *just* released 1.2.2 to PyPI, no RPM/DEB packages yet
[22:55] <Kioob> sagewk: I suppose it's not a good idea to increase snapshot number for now ?
[22:55] <alfredodeza> if you need the OS packages you will have to settle with 1.2.1
[22:57] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[22:57] <wrencsok> ah ok. thanks again.
[22:59] <sagewk> Kioob: it's the # snapshots per image that i think is the trigger
[22:59] <sagewk> can you try adding lots of snaps to a single image and see if that triggers it?
[22:59] <Kioob> oh, good to know
[22:59] <Kioob> mmm
[22:59] <Kioob> I move VM, and I try
[23:00] <alfredodeza> wrencsok: if you can wait a couple of hours, we should have the OS packages then :)
[23:00] <alfredodeza> 1.2.2 fixes a couple of issues and improves error handling a bit
[23:00] <alfredodeza> is there any specific reason you were looking for 1.2.2 ?
[23:00] * pbojanic (~Adium@65-112-206-178.dia.static.qwest.net) Quit (Quit: Leaving.)
[23:00] <alfredodeza> I consider it a bug fix release
[23:03] <wrencsok> haven't set it up, been running/tuning/testing perf without it. i don't fetch, another fetches that for me and puts it in the lab mirror. he's out for the night a few time zones away. he'll grab the latest for me when he gets in and throw it on the lab deb mirror.
[23:04] <wrencsok> i have an old 1.0 version, but i'll wait to tomorrow to enable it and play around with it.
[23:04] <alfredodeza> oh there are a *wealth* of improvements and changes from 1.0
[23:04] * tobru_ (~quassel@217-162-50-53.dynamic.hispeed.ch) has joined #ceph
[23:04] <alfredodeza> you would be already in a great spot even with 1.2.1
[23:05] <sagewk> Kioob: thanks!
[23:08] * Wolff_John (~jwolff@vpn.monarch-beverage.com) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 23.0.1/20130814063812])
[23:11] * rudolfsteiner (~federicon@181.21.157.137) Quit (Ping timeout: 480 seconds)
[23:15] <Kioob> sagewk: on this host I have only one VM. In that VM I do "dd if=$RBD" in a loop (the RBD device is bigger than the RAM). Then, each second I create a snapshot of this drive
[23:15] <Kioob> is it a good test ?
[23:15] <Kioob> or too slow ?
[23:17] <sagewk> Kioob: sure
[23:18] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Quit: Ex-Chat)
[23:19] <Kioob> I create 600 snapshots
[23:20] <Kioob> Should I do writes instead of reads in the VM ?
[23:21] <sagewk> oh.. yeah, writes pelase ;)
[23:22] <Kioob> like that : while [ 1 ] ; do dd if=/dev/urandom of=FILE bs=2M conv=sync ; rm -f FILE ; done
[23:23] <Kioob> ?
[23:24] <sagewk> sure. just needs to generate some write activity
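A sketch of the snapshot side of the test Kioob describes, with placeholder pool/image names (one snapshot per second against an image that is also being written to):

    i=0
    while true; do
        rbd snap create rbd/testimg@snap-$i    # take a snapshot of the image under write load
        i=$((i+1))
        sleep 1
    done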
[23:25] <yasu`> problem was solved by myself. what a flexible system the Ceph is...
[23:26] <Kioob> > 2000 snapshot now (yes I increased frequency)
[23:30] * pbojanic (~Adium@65-112-206-178.dia.static.qwest.net) has joined #ceph
[23:34] * sjusthm (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[23:35] <n1md4> sagewk: http://pastebin.com/raw.php?i=B7eNZfLW
[23:40] <Kioob> n1md4: /usr/bin/ceph-mon -i pp-ceph-1 --pid-file /var/run/ceph/mon.pp-ceph-1.pid -c /etc/ceph/ceph.conf
[23:42] * Tamil1 (~Adium@cpe-108-184-66-69.socal.res.rr.com) Quit (Quit: Leaving.)
[23:43] <n1md4> Kioob: http://pastebin.com/raw.php?i=sDqp87Jw
[23:43] <Kioob> strange :S
[23:43] <Kioob> and with strace ?
[23:44] * Tamil1 (~Adium@cpe-108-184-66-69.socal.res.rr.com) has joined #ceph
[23:46] <Kioob> sagewk: 5000 snapshots now. It's slower to add snapshots, but it works. No hang for now (still on kernel 3.9.11) Should I continue ?
[23:48] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[23:49] * ninkotech__ (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Read error: Connection reset by peer)
[23:53] <sagewk> Kioob: back
[23:53] <sagewk> Kioob: sorry, no.. it's not something that simple it seems.
[23:55] * LeaChim (~LeaChim@176.24.168.228) Quit (Ping timeout: 480 seconds)
[23:56] <Kioob> ok :)
[23:56] <Kioob> so now, remove them... maybe it will throw the bug
[23:58] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:58] <Kioob> thanks for your time sagewk, I leave for the night.
[23:59] <Kioob> ++
[23:59] * tobru_ (~quassel@217-162-50-53.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[23:59] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.