#ceph IRC Log


IRC Log for 2013-10-22

Timestamps are in GMT/BST.

[0:01] * mschiff (~mschiff@port-49786.pppoe.wtnet.de) Quit (Remote host closed the connection)
[0:01] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[0:03] * sleinen1 (~Adium@user-23-15.vpn.switch.ch) has joined #ceph
[0:05] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[0:05] <wrencsok> off the top of someone's head, does anyone know if there is an active memory leak with mon daemons?
[0:06] <wrencsok> didn't think they were supposed to spike so high, this is 67.4 cuttle: 7315 root 20 0 3975m 3.1g 6684 S 5 19.7 519:26.69 ceph-mon
[0:06] * BillK (~BillK-OFT@58-7-67-236.dyn.iinet.net.au) has joined #ceph
[0:07] <wrencsok> 3.1g of real memory usually they sit much lower around ~1G
[0:09] * sleinen (~Adium@2001:620:0:25:6d0d:d2f1:43ea:c314) Quit (Ping timeout: 480 seconds)
[0:10] <wrencsok> er, dumpling 0.67.4. seems over the course of a few weeks memory slowly creeps up on the mons. will start tracking it to verify. after restarting the mons things are back where they should be, at a little over 1g
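For anyone chasing the same symptom, a minimal sketch of how one might track mon resident memory over time and bounce a monitor on a Dumpling-era (0.67.x) box; the log path, mon id and init commands are illustrative and depend on how the packages were installed.

    # record the resident size of every ceph-mon process once a minute (illustrative)
    while true; do
        ps -o pid,rss,etime,cmd -C ceph-mon >> /var/log/ceph-mon-rss.log
        sleep 60
    done

    # restart a single monitor; "a" is a placeholder mon id
    restart ceph-mon id=a          # Ubuntu/upstart packaging
    service ceph restart mon.a     # sysvinit packaging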
[0:11] * dmsimard1 (~Adium@2607:f748:9:1666:45b9:6e15:1cd7:5785) Quit (Ping timeout: 480 seconds)
[0:12] * danieagle (~Daniel@186.214.61.130) has joined #ceph
[0:13] * markbby (~Adium@168.94.245.4) Quit (Remote host closed the connection)
[0:15] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[0:16] * john_barbee_ (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[0:16] * diegows (~diegows@200-127-157-157.net.prima.net.ar) Quit (Ping timeout: 480 seconds)
[0:24] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[0:32] * sleinen1 (~Adium@user-23-15.vpn.switch.ch) Quit (Quit: Leaving.)
[0:32] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[0:32] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[0:33] * dxd828 (~dxd828@host-2-97-72-213.as13285.net) Quit (Quit: Textual IRC Client: www.textualapp.com)
[0:40] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[0:49] * glanzi (~glanzi@187.107.160.160) has joined #ceph
[0:49] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[0:50] * rongze (~rongze@114.249.24.120) has joined #ceph
[0:52] * The_Bishop (~bishop@g230082012.adsl.alicedsl.de) has joined #ceph
[0:55] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[1:00] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[1:02] * rongze (~rongze@114.249.24.120) Quit (Ping timeout: 480 seconds)
[1:05] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[1:10] * jwilliams_ (~jwilliams@72.5.59.176) Quit (Read error: Operation timed out)
[1:12] * gsaxena (~gsaxena@pool-71-178-225-182.washdc.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[1:14] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:14] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[1:19] * JoeGruher (~JoeGruher@134.134.139.72) Quit ()
[1:20] * cfreak201 (~cfreak200@p4FF3EB0C.dip0.t-ipconnect.de) has joined #ceph
[1:20] * cfreak200 (~cfreak200@p4FF3EB0C.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[1:25] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[1:28] * bandrus (~Adium@c-98-238-148-252.hsd1.ca.comcast.net) has joined #ceph
[1:30] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Read error: Operation timed out)
[1:34] * bandrus1 (~Adium@c-98-238-148-252.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[1:35] * JoeGruher (~JoeGruher@134.134.139.76) has joined #ceph
[1:47] * ircolle (~Adium@2601:1:8380:2d9:4cf:f5a8:9108:e140) Quit (Quit: Leaving.)
[1:53] * Tamil1 (~Adium@cpe-108-184-77-181.socal.res.rr.com) Quit (Quit: Leaving.)
[1:57] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) has joined #ceph
[1:59] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[2:00] * Guest3030 (~a@209.12.169.218) Quit (Quit: This computer has gone to sleep)
[2:00] * bandrus (~Adium@c-98-238-148-252.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[2:06] * Tamil1 (~Adium@cpe-108-184-77-181.socal.res.rr.com) has joined #ceph
[2:07] * BillK (~BillK-OFT@58-7-67-236.dyn.iinet.net.au) Quit (Read error: Connection reset by peer)
[2:08] * BillK (~BillK-OFT@58-7-67-236.dyn.iinet.net.au) has joined #ceph
[2:11] * alram (~alram@216.103.134.250) Quit (Ping timeout: 480 seconds)
[2:15] * nwat (~nwat@eduroam-225-58.ucsc.edu) Quit (Ping timeout: 480 seconds)
[2:18] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit (Ping timeout: 480 seconds)
[2:23] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:25] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[2:31] * mozg (~andrei@host86-184-120-113.range86-184.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[2:32] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Read error: Operation timed out)
[2:34] * zhyan__ (~zhyan@134.134.139.76) has joined #ceph
[2:35] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[2:35] * angdraug (~angdraug@64-79-127-122.static.wiline.com) Quit (Quit: Leaving)
[2:36] * JustEra (~JustEra@ALille-555-1-116-86.w90-7.abo.wanadoo.fr) Quit (Quit: This computer has gone to sleep)
[2:42] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) Quit (Ping timeout: 480 seconds)
[2:45] * Tamil1 (~Adium@cpe-108-184-77-181.socal.res.rr.com) Quit (Quit: Leaving.)
[2:46] * Cube (~Cube@66-87-64-232.pools.spcsdns.net) has joined #ceph
[2:47] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[2:49] * yy-nm (~Thunderbi@122.224.154.38) has joined #ceph
[2:49] * gsaxena (~gsaxena@pool-71-178-29-202.washdc.fios.verizon.net) has joined #ceph
[2:52] * LeaChim (~LeaChim@host86-162-2-255.range86-162.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[2:53] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[2:59] <pmatulis_> shouldn't this be more of a WARNING than an ERROR?
[2:59] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) Quit (Remote host closed the connection)
[3:00] <pmatulis_> [node1][ERROR ] INFO:ceph-disk:Will colocate journal with data on /dev/sdb
[3:01] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[3:05] <joshd1> pmatulis_: yeah, that could be improved. it's INFO from ceph-disk, but going to stderr, which is logged at ERROR by ceph-deploy aiui
[3:06] <pmatulis_> joshd1: ok
[3:07] <pmatulis_> joshd1: but is there anything else i need to do after 'ceph-deploy osd create node1:sdb' ? 'ceph osd tree' shows the new osd as 'down & out'
[3:08] <pmatulis_> (the docs don't say i need to do anything else)
[3:09] * nerdtron (~Administr@202.60.8.250) has joined #ceph
[3:11] <joshd1> pmatulis_: that should do it - the osd must have failed to start or is having trouble talking to the monitors
[3:11] <nerdtron> hi all, my cluster has scrub errors, how do i start troubleshooting? http://pastebin.com/hHhm9ZzG
[3:12] * haomaiwang (~haomaiwan@183.220.22.42) has joined #ceph
[3:16] * thomnico (~thomnico@70.35.39.20) Quit (Ping timeout: 480 seconds)
[3:16] * shang (~ShangWu@70.35.39.20) Quit (Ping timeout: 480 seconds)
[3:18] <pmatulis_> joshd1: this is what i find on node1: http://paste.ubuntu.com/6280346/ - normal?
[3:19] <nerdtron> how do i set another mds server as hot standby?
[3:20] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[3:21] <joshd1> pmatulis_: that looks normal - does the osd show up in the monitor logs (it should send them a boot message)
[3:22] <joshd1> pmatulis_: and is the ceph-osd process still running?
[3:26] * Pedras (~Adium@64.191.206.83) Quit (Quit: Leaving.)
[3:26] * Husky (~sam@host81-138-206-9.in-addr.btopenworld.com) has joined #ceph
[3:27] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Quit: Leaving.)
[3:28] * rongze (~rongze@114.249.24.120) has joined #ceph
[3:28] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[3:28] * Husky_ (~sam@host81-138-206-9.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[3:29] <pmatulis_> joshd1: i don't see any reference to osd.3 , let alone a 'boot' for it. i do see a 'boot' for osd.0 when the node was rebooted a while back. yes, the osd process is running, but i restarted it just in case
[3:30] <joshd1> pmatulis_: have the monitors formed a quorum?
[3:31] <pmatulis_> joshd1: i only have 2 but yeah
[3:33] <joshd1> pmatulis_: sounds like a network issue between osd and mon then, debug ms = 1 on osds or mons might tell you more
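For reference, a sketch of what turning up messenger debugging can look like; the osd id and section placement are placeholders, and the exact forms vary a little between versions.

    # persistent form: add to ceph.conf on the affected hosts, then restart the daemons
    [osd]
        debug ms = 1
    [mon]
        debug ms = 1

    # or inject into a running osd without a restart (osd.3 is a made-up id)
    ceph tell osd.3 injectargs '--debug-ms 1'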
[3:36] <joshd1> nerdtron: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-February/000238.html
[3:36] <joshd1> nerdtron: new mdses are standby by default
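A hedged sketch of what a warm standby can look like in ceph.conf; the daemon name is illustrative, and with no extra options a second ceph-mds simply registers itself as a plain standby.

    # optional standby-replay config for a hypothetical second MDS "b":
    [mds.b]
        mds standby replay = true    # tail the active MDS journal so failover is faster
        mds standby for rank = 0     # follow rank 0 specifically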
[3:39] * haomaiwang (~haomaiwan@183.220.22.42) Quit (Remote host closed the connection)
[3:40] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[3:42] * rongze (~rongze@114.249.24.120) Quit (Ping timeout: 480 seconds)
[3:47] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[3:48] * glanzi (~glanzi@187.107.160.160) Quit (Quit: glanzi)
[3:50] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[3:56] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[3:57] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit ()
[3:58] * julian (~julianwa@125.70.135.165) has joined #ceph
[4:04] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[4:04] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[4:06] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[4:07] * haomaiwang (~haomaiwan@183.220.21.78) has joined #ceph
[4:10] <nerdtron> thnx joshd i'll try it now
[4:12] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[4:20] * julian (~julianwa@125.70.135.165) Quit (Quit: afk)
[4:21] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[4:23] * a (~a@pool-173-55-143-200.lsanca.fios.verizon.net) has joined #ceph
[4:24] * a is now known as Guest3075
[4:26] * rongze (~rongze@211.155.113.208) has joined #ceph
[4:27] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) has joined #ceph
[4:32] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Read error: Operation timed out)
[4:36] * Guest3075 (~a@pool-173-55-143-200.lsanca.fios.verizon.net) Quit (Quit: This computer has gone to sleep)
[4:41] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Read error: Connection reset by peer)
[4:47] * rongze_ (~rongze@117.79.232.204) has joined #ceph
[4:53] * rongze (~rongze@211.155.113.208) Quit (Read error: Operation timed out)
[4:54] * Pedras (~Adium@c-24-130-196-123.hsd1.ca.comcast.net) has joined #ceph
[4:57] * Tamil1 (~Adium@cpe-108-184-77-181.socal.res.rr.com) has joined #ceph
[5:02] * a_ (~a@pool-173-55-143-200.lsanca.fios.verizon.net) has joined #ceph
[5:05] * fireD (~fireD@93-139-180-195.adsl.net.t-com.hr) has joined #ceph
[5:07] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[5:07] * fireD_ (~fireD@78-0-203-148.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:09] * The_Bishop_ (~bishop@f048124069.adsl.alicedsl.de) has joined #ceph
[5:16] * The_Bishop (~bishop@g230082012.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[5:21] * glzhao (~glzhao@118.195.65.67) has joined #ceph
[5:26] * mrmayhm (~rollietik@69.80.103.108) has joined #ceph
[5:30] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[5:32] * glzhao (~glzhao@118.195.65.67) Quit (Ping timeout: 480 seconds)
[5:37] * yy-nm (~Thunderbi@122.224.154.38) Quit (Quit: yy-nm)
[5:41] * Tamil1 (~Adium@cpe-108-184-77-181.socal.res.rr.com) has left #ceph
[5:43] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:44] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[5:44] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[5:46] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[5:51] * KindTwo (KindOne@h77.37.186.173.dynamic.ip.windstream.net) has joined #ceph
[5:51] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[5:52] * KindTwo is now known as KindOne
[5:53] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:59] * themgt (~themgt@201-223-204-108.baf.movistar.cl) has joined #ceph
[6:01] * mrmayhm (~rollietik@69.80.103.108) Quit (Ping timeout: 480 seconds)
[6:04] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:04] * glzhao (~glzhao@118.195.65.67) has joined #ceph
[6:05] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[6:07] * haomaiwang (~haomaiwan@183.220.21.78) Quit (Remote host closed the connection)
[6:08] * haomaiwang (~haomaiwan@183.220.21.78) has joined #ceph
[6:10] * JoeGruher (~JoeGruher@134.134.139.76) Quit (Remote host closed the connection)
[6:16] * haomaiwang (~haomaiwan@183.220.21.78) Quit (Ping timeout: 480 seconds)
[6:22] * sarob (~sarob@2601:9:7080:13a:e157:2c7a:e70:7e14) has joined #ceph
[6:26] * MK_FG (~MK_FG@00018720.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:30] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[6:31] * rongze_ (~rongze@117.79.232.204) Quit (Remote host closed the connection)
[6:31] * sleinen (~Adium@2001:620:0:26:3de1:b4fa:1ec:ef9c) has joined #ceph
[6:32] * rongze (~rongze@211.155.113.241) has joined #ceph
[6:32] * ksingh (~Adium@2001:708:10:91:f803:7660:eef:ff2) has joined #ceph
[6:33] * Cube (~Cube@66-87-64-232.pools.spcsdns.net) Quit (Ping timeout: 480 seconds)
[6:34] * ksingh1 (~Adium@2001:708:10:10:54fb:c1b8:d896:5c3d) has joined #ceph
[6:38] * rongze (~rongze@211.155.113.241) Quit (Read error: Operation timed out)
[6:40] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[6:40] * ksingh (~Adium@2001:708:10:91:f803:7660:eef:ff2) Quit (Ping timeout: 480 seconds)
[6:41] * Cube (~Cube@66-87-66-191.pools.spcsdns.net) has joined #ceph
[6:45] * sarob (~sarob@2601:9:7080:13a:e157:2c7a:e70:7e14) Quit (Remote host closed the connection)
[6:45] * sarob (~sarob@2601:9:7080:13a:e157:2c7a:e70:7e14) has joined #ceph
[6:49] * sleinen (~Adium@2001:620:0:26:3de1:b4fa:1ec:ef9c) Quit (Quit: Leaving.)
[6:50] * sarob_ (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[6:50] * a_ (~a@pool-173-55-143-200.lsanca.fios.verizon.net) Quit (Quit: This computer has gone to sleep)
[6:51] * sarob_ (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[6:51] * sarob_ (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[6:53] * sarob (~sarob@2601:9:7080:13a:e157:2c7a:e70:7e14) Quit (Ping timeout: 480 seconds)
[6:59] * sarob_ (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:01] * nerdtron (~Administr@202.60.8.250) Quit (Ping timeout: 480 seconds)
[7:01] * haomaiwang (~haomaiwan@119.6.74.130) has joined #ceph
[7:03] * nerdtron (~Administr@202.60.8.250) has joined #ceph
[7:03] * haomaiwang (~haomaiwan@119.6.74.130) Quit (Read error: Connection reset by peer)
[7:07] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[7:08] * Administrator_ (~Administr@202.60.8.250) has joined #ceph
[7:11] * sleinen (~Adium@2001:620:0:25:4508:a191:a6b6:2b09) has joined #ceph
[7:11] * haomaiwang (~haomaiwan@183.220.18.230) has joined #ceph
[7:11] * glzhao (~glzhao@118.195.65.67) Quit (Quit: leaving)
[7:11] * nerdtron (~Administr@202.60.8.250) Quit (Read error: Operation timed out)
[7:14] * gsaxena (~gsaxena@pool-71-178-29-202.washdc.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:17] * Pedras (~Adium@c-24-130-196-123.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[7:17] * nerdtron (~Administr@202.60.8.250) has joined #ceph
[7:19] * danieagle (~Daniel@186.214.61.130) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[7:19] * Administrator_ (~Administr@202.60.8.250) Quit (Ping timeout: 480 seconds)
[7:29] * nerdtron (~Administr@202.60.8.250) Quit (Ping timeout: 480 seconds)
[7:32] * nerdtron (~Administr@202.60.8.250) has joined #ceph
[7:37] * ksingh1 (~Adium@2001:708:10:10:54fb:c1b8:d896:5c3d) Quit (Quit: Leaving.)
[7:40] * haomaiwa_ (~haomaiwan@183.220.18.230) has joined #ceph
[7:40] * haomaiwang (~haomaiwan@183.220.18.230) Quit (Read error: Connection reset by peer)
[7:40] * AfC (~andrew@2001:44b8:31cb:d400:2ad2:44ff:fe08:a4c) has joined #ceph
[7:41] * Cube (~Cube@66-87-66-191.pools.spcsdns.net) Quit (Quit: Leaving.)
[7:44] * nerdtron (~Administr@202.60.8.250) Quit (Ping timeout: 480 seconds)
[7:47] * nerdtron (~Administr@202.60.8.250) has joined #ceph
[7:47] <nerdtron> i tried repairing pg using ceph pg repair but i still have inconsistent pgs
[7:47] <nerdtron> what is the next step to take?
[7:48] * ksingh (~Adium@teeri.csc.fi) has joined #ceph
[7:50] * ksingh (~Adium@teeri.csc.fi) Quit ()
[7:57] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[7:57] * haomaiwang (~haomaiwan@124.161.77.141) has joined #ceph
[8:01] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[8:03] * ajazdzewski (~quassel@lpz-66.sprd.net) has joined #ceph
[8:04] * haomaiwa_ (~haomaiwan@183.220.18.230) Quit (Ping timeout: 480 seconds)
[8:06] <davidzlap> nerdtron: Check out bugs #5141 or #5148 to see if they describe your repair issue.
[8:06] * sleinen (~Adium@2001:620:0:25:4508:a191:a6b6:2b09) Quit (Quit: Leaving.)
[8:06] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[8:08] * foosinn (~stefan@office.unitedcolo.de) has joined #ceph
[8:09] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (Quit: Leaving.)
[8:10] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:10] * MK_FG (~MK_FG@00018720.user.oftc.net) has joined #ceph
[8:12] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[8:14] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[8:15] * Cube (~Cube@66-87-66-191.pools.spcsdns.net) has joined #ceph
[8:17] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[8:17] * haomaiwa_ (~haomaiwan@183.220.16.23) has joined #ceph
[8:17] * mschiff (~mschiff@port-28576.pppoe.wtnet.de) has joined #ceph
[8:19] <nerdtron> davidz, nope.. how can i repair the pgs?
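Nobody spells it out here, so for context, the usual sequence for chasing an inconsistent PG looks roughly like this; the pgid 3.9 is only an example.

    # find the inconsistent PGs and the OSDs that hold them
    ceph health detail | grep inconsistent
    ceph pg dump | grep inconsistent

    # re-scrub and attempt a repair on one PG (3.9 is an example pgid)
    ceph pg deep-scrub 3.9
    ceph pg repair 3.9

    # if repair keeps failing, the primary OSD's log usually names the exact object
    grep -i scrub /var/log/ceph/ceph-osd.*.log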
[8:23] * mschiff (~mschiff@port-28576.pppoe.wtnet.de) Quit (Remote host closed the connection)
[8:24] * haomaiwang (~haomaiwan@124.161.77.141) Quit (Ping timeout: 480 seconds)
[8:26] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has left #ceph
[8:33] * RuediR (~Adium@130.59.94.192) has joined #ceph
[8:35] * sleinen (~Adium@2001:620:0:26:a9b0:fa92:8370:bfd5) has joined #ceph
[8:35] * gsaxena (~gsaxena@pool-108-56-185-167.washdc.fios.verizon.net) has joined #ceph
[8:48] * haomaiwang (~haomaiwan@124.161.77.141) has joined #ceph
[8:48] * Vjarjadian (~IceChat77@94.1.37.151) Quit (Read error: Connection reset by peer)
[8:51] * ksingh (~Adium@teeri.csc.fi) has joined #ceph
[8:55] * aarontc (~aaron@static-50-126-79-226.hlbo.or.frontiernet.net) Quit (Ping timeout: 480 seconds)
[8:55] * ksingh (~Adium@teeri.csc.fi) Quit (Read error: Connection reset by peer)
[8:55] * haomaiwa_ (~haomaiwan@183.220.16.23) Quit (Ping timeout: 480 seconds)
[8:59] * rongze (~rongze@123.151.28.71) has joined #ceph
[9:01] * JustEra (~JustEra@ALille-555-1-116-86.w90-7.abo.wanadoo.fr) has joined #ceph
[9:01] * JustEra (~JustEra@ALille-555-1-116-86.w90-7.abo.wanadoo.fr) Quit ()
[9:01] * sleinen (~Adium@2001:620:0:26:a9b0:fa92:8370:bfd5) Quit (Quit: Leaving.)
[9:01] * sleinen (~Adium@130.59.94.193) has joined #ceph
[9:02] * sleinen1 (~Adium@130.59.94.193) has joined #ceph
[9:02] * sleinen (~Adium@130.59.94.193) Quit (Read error: Connection reset by peer)
[9:03] * sleinen (~Adium@2001:620:0:25:70ef:b6c5:9183:2183) has joined #ceph
[9:10] * sleinen1 (~Adium@130.59.94.193) Quit (Ping timeout: 480 seconds)
[9:20] * simulx (~simulx@vpn.expressionanalysis.com) Quit (Quit: Nettalk6 - www.ntalk.de)
[9:25] * JustEra (~JustEra@89.234.148.11) has joined #ceph
[9:27] * mattt_ (~textual@94.236.7.190) has joined #ceph
[9:28] * nerdtron (~Administr@202.60.8.250) Quit (Ping timeout: 480 seconds)
[9:30] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) Quit (Remote host closed the connection)
[9:32] * nigwil (~chatzilla@2001:44b8:5144:7b00:39ff:fd0b:6dee:4268) has joined #ceph
[9:34] * nerdtron (~Administr@202.60.8.250) has joined #ceph
[9:36] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[9:36] * ChanServ sets mode +v andreask
[9:38] * nigwil_ (~chatzilla@2001:44b8:5144:7b00:39ff:fd0b:6dee:4268) Quit (Ping timeout: 480 seconds)
[9:50] * freedomhui (~freedomhu@117.79.232.204) has joined #ceph
[9:51] * ksingh (~Adium@teeri.csc.fi) has joined #ceph
[9:55] * ksingh (~Adium@teeri.csc.fi) Quit (Read error: Connection reset by peer)
[9:56] <jerker> Is Ceph stable on 32-bit hardware? Is the memory limit based on OSD or storage space? I might get hold of a rack of old storage blades (2 drives, 2 GB RAM, 1.5 GHz Intel Celeron 32-bit each)
[9:58] * haomaiwang (~haomaiwan@124.161.77.141) Quit (Ping timeout: 480 seconds)
[9:58] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[9:59] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[9:59] * ChanServ sets mode +v andreask
[10:00] <jerker> Aha, the recommendation is 1 GB/OSD not 2 GB/OSD as I thought. Thanks. Cool.
[10:01] * ScOut3R (~ScOut3R@catv-89-133-25-52.catv.broadband.hu) has joined #ceph
[10:01] <jerker> What about 32-bit. Is it possible? The total amount of storage might be up to a couple of PB if I may play with it all.
[10:03] * sleinen (~Adium@2001:620:0:25:70ef:b6c5:9183:2183) Quit (Quit: Leaving.)
[10:06] * sleinen (~Adium@130.59.94.193) has joined #ceph
[10:07] * sleinen1 (~Adium@2001:620:0:25:8019:3e22:ce3b:e119) has joined #ceph
[10:11] * glzhao (~glzhao@118.195.65.67) has joined #ceph
[10:13] * mschiff (~mschiff@pD951181F.dip0.t-ipconnect.de) has joined #ceph
[10:14] * sleinen (~Adium@130.59.94.193) Quit (Ping timeout: 480 seconds)
[10:17] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) Quit (Quit: Leaving.)
[10:21] * rongze_ (~rongze@117.79.232.204) has joined #ceph
[10:23] * mschiff (~mschiff@pD951181F.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[10:28] * rongze (~rongze@123.151.28.71) Quit (Ping timeout: 480 seconds)
[10:37] * huangjun (~kvirc@111.172.153.78) has joined #ceph
[10:39] * ksingh (~Adium@teeri.csc.fi) has joined #ceph
[10:41] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[10:48] * ksingh (~Adium@teeri.csc.fi) Quit (Read error: Connection reset by peer)
[10:51] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[10:54] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit ()
[10:57] * sleinen1 (~Adium@2001:620:0:25:8019:3e22:ce3b:e119) Quit (Quit: Leaving.)
[10:57] * sleinen (~Adium@130.59.94.193) has joined #ceph
[11:02] * Macheske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[11:05] * LeaChim (~LeaChim@host86-162-2-255.range86-162.btcentralplus.com) has joined #ceph
[11:05] * sleinen (~Adium@130.59.94.193) Quit (Ping timeout: 480 seconds)
[11:05] * mjeanson (~mjeanson@00012705.user.oftc.net) Quit (Remote host closed the connection)
[11:06] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[11:07] * claenjoy (~leggenda@37.157.33.36) has joined #ceph
[11:07] * Machske (~Bram@d5152D87C.static.telenet.be) Quit (Ping timeout: 480 seconds)
[11:07] * tryggvil (~tryggvil@178.19.53.254) has joined #ceph
[11:07] * mjeanson (~mjeanson@bell.multivax.ca) has joined #ceph
[11:14] * mattt__ (~textual@92.52.76.140) has joined #ceph
[11:17] * mattt_ (~textual@94.236.7.190) Quit (Read error: Connection reset by peer)
[11:17] * rendar (~s@host184-179-dynamic.10-87-r.retail.telecomitalia.it) has joined #ceph
[11:26] * jantje (~jan@paranoid.nl) has left #ceph
[11:34] * haomaiwang (~haomaiwan@183.220.23.6) has joined #ceph
[11:36] * ksingh (~Adium@2001:708:10:91:5844:914b:a77b:9690) has joined #ceph
[11:37] <niklas> hi
[11:37] <niklas> dmesg -T is full with theese:
[11:37] <niklas> http://pastebin.com/20yt0vbf
[11:37] <niklas> what do I do about that?
[11:37] <niklas> thats on my OSD Hosts
[11:38] <niklas> The ceph Bugtracker does not seem to have anything about it
[11:38] * Machske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[11:40] * Macheske (~Bram@d5152D87C.static.telenet.be) Quit (Ping timeout: 480 seconds)
[11:42] * ksingh1 (~Adium@b-v6-0014.vpn.csc.fi) has joined #ceph
[11:43] * haomaiwang (~haomaiwan@183.220.23.6) Quit (Read error: Connection reset by peer)
[11:44] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) has joined #ceph
[11:44] <Gdub> Good morning everyone
[11:45] <Gdub> am struggling with setting up ceph-deploy (using http://ceph.com/docs/master/start/quick-ceph-deploy/)
[11:45] <Gdub> am stuck at gathering the keys
[11:46] <Gdub> i have 4 different warnings
[11:46] <Gdub> i couldn't see anything that could help in the mailing list
[11:46] <erwan_taf> niklas: your trace is a little bit too short
[11:46] <erwan_taf> niklas: sounds like your xfs have some troubles
[11:46] <Gdub> [ceph_deploy.gatherkeys][WARNIN] Unable to find /etc/ceph/ceph.client.admin.keyring on
[11:46] * gucki (~smuxi@77-56-39-154.dclient.hispeed.ch) has joined #ceph
[11:46] <Gdub> [ceph_deploy.gatherkeys][WARNIN] Unable to find /var/lib/ceph/bootstrap-osd/ceph.keyring on
[11:47] <niklas> erwan_taf: thats what dmesg gives me… Where would I get a longer trace from?
[11:47] <niklas> erwan_taf: And what do I do about it?
[11:47] <Gdub> [ceph_deploy.gatherkeys][WARNIN] Unable to find /var/lib/ceph/bootstrap-mds/ceph.keyring on
[11:47] * ksingh (~Adium@2001:708:10:91:5844:914b:a77b:9690) Quit (Ping timeout: 480 seconds)
[11:48] <Gdub> did anyone ever experienced that ?
[11:48] <erwan_taf> niklas: syslog
[11:48] <Gdub> should i manually create those folders/files?
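No one answers Gdub directly; for context, the quick-start he links boils down to roughly the following, and gatherkeys can only succeed after the monitor has actually formed quorum and written its bootstrap keyrings, so those warnings usually mean the mon create step failed rather than that files need creating by hand. Hostnames are placeholders.

    ceph-deploy new node1              # writes ceph.conf and the initial monitor list
    ceph-deploy install node1 node2    # install ceph on the target hosts
    ceph-deploy mon create node1       # the mon must come up and form quorum first...
    ceph-deploy gatherkeys node1       # ...before these keyrings exist to be fetched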
[11:49] <niklas> erwan_taf: nope, syslog got just the same trace
[11:50] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[11:51] <erwan_taf> nothing on top ?
[11:54] <niklas> nope, on top of that there is the previous trace
[11:56] <niklas> erwan_taf: I get one after another, about 20 of them per minute
[11:56] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[11:57] * zhyan__ (~zhyan@134.134.139.76) Quit (Remote host closed the connection)
[12:07] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:07] * ScOut3R (~ScOut3R@catv-89-133-25-52.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[12:08] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[12:09] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[12:13] * ksingh1 (~Adium@b-v6-0014.vpn.csc.fi) Quit (Ping timeout: 480 seconds)
[12:14] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[12:18] * Cube (~Cube@66-87-66-191.pools.spcsdns.net) Quit (Ping timeout: 480 seconds)
[12:20] * otisspud (~otisspud@198.15.79.50) Quit (Remote host closed the connection)
[12:20] * otisspud (~otisspud@198.15.79.50) has joined #ceph
[12:21] * ScOut3R (~ScOut3R@catv-89-133-21-203.catv.broadband.hu) has joined #ceph
[12:23] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[12:28] * allsystemsarego (~allsystem@188.27.166.164) has joined #ceph
[12:29] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[12:29] * gucki (~smuxi@77-56-39-154.dclient.hispeed.ch) Quit (Read error: Operation timed out)
[12:32] * sleinen (~Adium@2001:620:0:2d:8470:1e78:9607:4481) has joined #ceph
[12:33] * Cube (~Cube@66-87-66-191.pools.spcsdns.net) has joined #ceph
[12:36] * gucki (~smuxi@77-56-39-154.dclient.hispeed.ch) has joined #ceph
[12:36] <AndreyGrebennikov> hi there people
[12:36] <AndreyGrebennikov> does anyone know if WindRiver Linux is supported by Ceph
[12:36] * i_m (~ivan.miro@deibp9eh1--blueice1n1.emea.ibm.com) has joined #ceph
[12:37] <wogri_risc> AndreyGrebennikov: probably not in terms of pre-built packages. but you can certainly build it yourself.
[12:37] <mattt__> maybe a better question would be has anyone heard of windriver linux?
[12:37] <mattt__> :P
[12:39] <mattt__> ah, this is more for embedded stuff
[12:39] <mattt__> interesting
[12:40] <wogri_risc> heh @ mattt__ :)
[12:40] <wogri_risc> AFAIR ceph only builds on 64 bit platforms. or is it just the packages?
[12:40] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[12:40] * sleinen (~Adium@2001:620:0:2d:8470:1e78:9607:4481) Quit (Ping timeout: 480 seconds)
[12:53] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[12:54] * rongze_ (~rongze@117.79.232.204) Quit (Remote host closed the connection)
[12:58] * nerdtron (~Administr@202.60.8.250) Quit (Quit: Leaving)
[13:00] * haomaiwang (~haomaiwan@183.220.21.84) has joined #ceph
[13:05] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[13:07] * sleinen (~Adium@2001:620:0:25:bcca:2b0:db57:a572) has joined #ceph
[13:09] * diegows (~diegows@190.190.11.42) has joined #ceph
[13:10] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[13:10] * ChanServ sets mode +v andreask
[13:13] * Cube1 (~Cube@66-87-66-191.pools.spcsdns.net) has joined #ceph
[13:13] * Cube (~Cube@66-87-66-191.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[13:18] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[13:19] * yanzheng (~zhyan@134.134.137.73) has joined #ceph
[13:22] * huangjun (~kvirc@111.172.153.78) Quit (Read error: Connection reset by peer)
[13:25] <jerker> wogri_risc: it won't easily build on CentOS 5.X 32-bit with EPEL, it needs some packages not available there. Well, have to go CentOS/SL 6.X anyway, so just wait and see until I may get hold of the storage nodes.
[13:28] <jerker> And ZFSonLinux is not really meant for 32-bit (problems with virtual memory) so that means I have to run XFS/Ext4 or something without compression. :-( (Doesn't matter much if I just store compressed backups on it, but I was hoping to run generic storage/backup there and then it is very useful.)
[13:31] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[13:32] * erice (~erice@50.240.86.181) Quit (Read error: Operation timed out)
[13:35] * freedomhui (~freedomhu@117.79.232.204) Quit (Quit: Leaving...)
[13:36] * AfC (~andrew@2001:44b8:31cb:d400:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[13:40] * ksingh (~Adium@hermes1-231.csc.fi) has joined #ceph
[13:44] * ksingh (~Adium@hermes1-231.csc.fi) Quit (Read error: Connection reset by peer)
[13:44] * ksingh (~Adium@b-v6-0018.vpn.csc.fi) has joined #ceph
[13:45] <mozg> wido, hello there
[13:45] <mozg> are you around?
[13:46] <mozg> I am setting up the radosgw to be the secondary storage with CS
[13:46] <mozg> and I was wondering what should I set under the bucket in the Add Secondary Storage Page
[13:47] <mozg> will CS create the bucket for me, or will I need to manually create it?
[13:50] <wido> mozg: I'm not sure, but it can't hurt to create it
[13:50] * markbby (~Adium@168.94.245.3) has joined #ceph
[13:51] <ksingh> guys, need help, i am stuck
[13:51] <ksingh> http://pastebin.com/qN2Tdcb3
[13:51] <ksingh> unable to add second monitor
[13:51] <ksingh> first monitor added without any problem
[13:52] <ksingh> pls help
[14:02] <ksingh> any one active in here
[14:03] <mozg> wido, cheers
[14:03] <mozg> ksingh, i think there is an issue with having just two mons
[14:03] <mozg> you need to create a third one
[14:04] <mozg> before startig the second mon
[14:04] <ksingh> i am planning for total 3 mons
[14:04] <ksingh> first 1 is done , and while creating second getting error
[14:04] <mozg> once the third one is created you will need to start all three at the same time to form the quorum
[14:04] <mozg> i remember i had to do it this way
[14:04] <mozg> someone from #ceph has suggested me this
[14:05] <ksingh> do u mean at this point , i should proceed with third mon creation and finally restart all MON together
[14:05] <mozg> ksingh, sorry man, I am not sure
[14:05] <mozg> you can try
[14:06] <ksingh> well how to restart all monitors service together
[14:06] <mozg> but I don't know if it will allow you to add the third mon if you are not in quorum
[14:06] <mozg> is your ceph cluster working now?
[14:06] <mozg> ksingh, you simply run the restart commands on each server about the same time. It doesn't have to be at the exact second
[14:06] <mozg> just close to each other
[14:07] <mozg> ksingh, from what i remember
[14:07] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Ping timeout: 480 seconds)
[14:07] <mozg> i've used the ceph guide on installing additional mons
[14:07] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:07] <mozg> and at that time the documentation was misleading a bit
[14:07] <mozg> when you are going from 1 to 3 mons you need to skip the point where it is asking you to start the monitor
[14:08] <mozg> and instead go and create a third mon
[14:08] <mozg> and start the second and third together
[14:08] <mozg> that has worked for me
[14:08] <mozg> as following the guide to the letter didn't
[14:08] <mozg> someone here has told me to do it this way and it worked
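For reference, the manual add-a-monitor flow mozg is describing looks roughly like this; the mon id "b" and the address are placeholders, and the same steps are repeated for the third mon before starting the new ones close together.

    # on the new monitor host
    mkdir -p /var/lib/ceph/mon/ceph-b
    ceph auth get mon. -o /tmp/mon.keyring     # key shared by all monitors
    ceph mon getmap -o /tmp/monmap             # current monitor map
    ceph-mon -i b --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring

    # register it in the map, then start the daemon
    ceph mon add b 192.168.0.11:6789
    ceph-mon -i b --public-addr 192.168.0.11:6789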
[14:08] <mozg> ksingh, by the way, do you use ceph-deploy?
[14:09] <mozg> or manual cluster creation?
[14:10] * gsaxena (~gsaxena@pool-108-56-185-167.washdc.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[14:14] <ksingh> ceph deploy
[14:17] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[14:19] * sarob (~sarob@2601:9:7080:13a:854e:f3b4:16ba:61fa) has joined #ceph
[14:19] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) Quit (Quit: smiley)
[14:21] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:21] <alfredodeza> mozg: it would be really useful if you could report this so we can fix it
[14:21] <ksingh> thanks Mozg , lemme check this
[14:21] <ksingh> hello alfredodeza , help me :^)
[14:22] <ksingh> do you have any other solution to this
[14:23] <alfredodeza> you are having problems adding monitors right?
[14:24] <alfredodeza> I would just try what mozg just suggested. I didn't even know we had such a problem in the docs for adding mons :(
[14:25] <ksingh> yes problem in adding monitor , did you check my logs http://pastebin.com/qN2Tdcb3
[14:25] * sarob (~sarob@2601:9:7080:13a:854e:f3b4:16ba:61fa) Quit (Remote host closed the connection)
[14:26] <ksingh> first monitor properly added , problem comes in adding second and third , i am following the ceph documentation
[14:26] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[14:26] <alfredodeza> ksingh: those logs look like you have a bunch of problems
[14:26] <alfredodeza> you need to overwrite the conf
[14:26] <ksingh> i tried that
[14:26] <ksingh> no luck with that
[14:26] <alfredodeza> and you are failing on the ulimit it looks like?
[14:26] <alfredodeza> well, you should show me *those* logs :)
[14:27] <ksingh> okey
[14:28] <ksingh> please check this http://pastebin.com/ucWBjFY9
[14:28] * haomaiwa_ (~haomaiwan@112.193.130.112) has joined #ceph
[14:29] <alfredodeza> ksingh: have you tried the command that is failing on that host?
[14:29] <alfredodeza> failed: 'ulimit -n 32768; /usr/bin/ceph-mon -i ceph-mon2 --pid-file /var/run/ceph/mon.ceph-mon2.pid -c /etc/ceph/ceph.conf '
[14:29] <ksingh> if i run this command manually on the node shell , it runs without any problem
[14:29] <alfredodeza> hrmn
[14:30] <ksingh> same problem with third monitor
[14:30] <alfredodeza> this might be a workflow issue like mozg pointed out
[14:30] <alfredodeza> can you try his approach?
[14:31] <alfredodeza> that way we can confirm this problem
[14:31] <ksingh> would try that ,
[14:31] <mozg> ksingh, alfredodeza, the issue with documentation that i've mentioned relates to the manual guide and not ceph-deploy
[14:31] <mozg> i don't know much about ceph-deploy
[14:32] <mozg> i've tried to use it before without much luck
[14:32] <alfredodeza> mozg: sure, but that will also help me when I implement the functionality to 'add' another monitor :)
[14:32] <alfredodeza> also very useful to get it right for other people
[14:32] <ksingh> and do i need to update ceph.conf file with new monitor node details
[14:32] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[14:32] <ksingh> ?
[14:32] <mozg> alfredodeza, I've actually built my cluster manually, but now using ceph-deploy to manage it
[14:32] <ksingh> before restarting services at same time
[14:33] <mozg> and i've tested adding fourth and fifth mon with ceph-deploy and it worked like a charm
[14:33] <alfredodeza> interesting
[14:33] <mozg> but i've not done the ceph-deploy from scratch
[14:33] <alfredodeza> sure
[14:33] <alfredodeza> would you still be able to create a ticket describing what you had to do to get additional mons up?
[14:33] <ksingh> mozg , yes a few days back i was doing it manually and everything worked
[14:33] <ksingh> now i am trying ceph.deploy
[14:33] <mozg> ah
[14:34] <mozg> well, you are not the first one with issues with ceph-deploy
[14:34] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[14:34] <mozg> i was at the Ceph day in London and they have now dedicated a person to look after this tool and make it better
[14:34] <mozg> so improvements are on their way
[14:34] <BillK> After an MDS crash I recreated the data and metadata but there is still almost 200G being used by the lost objects - is there anyway to recover the space?
[14:34] <alfredodeza> that would be me :)
[14:34] <mozg> but if you have any issues, it is best to report it to the bug tracking system so that it will get looked at and fixed
[14:35] <alfredodeza> ceph-deploy has had a boatload of fixes in the past few months
[14:35] * rongze (~rongze@117.79.232.235) has joined #ceph
[14:35] <alfredodeza> and has had a release almost every week
[14:35] <mozg> alfredodeza, oh, cool. so, is this your project now?
[14:35] <alfredodeza> so I expect people are finding it a bit better than a few months back :)
[14:35] * haomaiwang (~haomaiwan@183.220.21.84) Quit (Ping timeout: 480 seconds)
[14:35] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[14:35] <alfredodeza> mozg: I am working a lot on it, yes. I wouldn't say it is mine though
[14:36] <mozg> i mean under your wing ))
[14:36] <alfredodeza> sure
[14:36] <ksingh> guys one more silly question - please forgive me , i have 5 machines ( 1 - ceph deploy admin machine ) ( 3 for monitors mon1 , mon2 , mon3 ) ( 2 for osd )
[14:36] <mattt__> mozg: what you mean is, "who do we speak to when it's broken?" :P
[14:36] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has left #ceph
[14:36] <ksingh> now which node to use for managing cluster
[14:36] <mozg> alfredodeza, well, you should be pleased to know that i've not had any issues in cluster management with ceph-deploy
[14:36] <ksingh> cluster management and operation
[14:37] <mozg> it works like a charm
[14:37] <alfredodeza> \o/
[14:38] <mozg> guys, i wanted to ask you a question. I've ordered a new server to replace my old one. So, I will have two identical new servers instead of one new and one old
[14:38] <mozg> i need to replace the old server with the new one and I was wondering what is the best way to do it?
[14:38] <mozg> i can simply replace the system disk and hard disks and journal disks
[14:38] <mozg> would it work?
[14:40] <ksingh> mozg - can you see my question above :^)
[14:40] <ksingh> how are you managing your ceph cluster
[14:40] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has left #ceph
[14:40] <mozg> ksingh, i have it on one of my mon servers
[14:41] <mozg> and manage it from there
[14:41] <mozg> it doesn't matter really
[14:41] <mattt__> mozg: you're using replicas right ?
[14:41] <mozg> as long as your management ceph-deploy pc/server has ssh access to your nodes
[14:41] * Levy (Levy@b.clients.kiwiirc.com) has joined #ceph
[14:41] <mozg> mattt__, yeah, replica 2
[14:42] <ksingh> mozg : which one will be a good option: management from the ceph-deploy node or any mon node ?
[14:42] <mattt__> mozg: is each pg on OSDs on the two different machines?
[14:43] <mozg> mattt__, yeah, each machine is a replica of the other
[14:43] <mattt__> mozg: so you plan on just deleting those OSDs on the box to be decommissioned, then bringing up new OSDs on the new host?
[14:44] <mattt__> (and by deleting i mean removing from crush map etc.)
[14:44] <mattt__> mozg: if the cluster is in use, maybe you want to add the new OSDs and have a 3rd replica, then remove the old box?
[14:45] <mozg> mattt__, I've got a bit of an issue with space in the rack. i can't fit another server unfortunately
[14:46] <mattt__> mozg: eek
[14:46] <mozg> yeah
[14:46] <mattt__> mozg: well, that's what i'd do … stop/remove OSDs on the old machine, put up new box, add OSDs, and then re-add to crush map
[14:46] <mattt__> but i've not done this first hand, so maybe some other input would be good :D
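The out-and-remove dance mattt__ is describing usually looks like this; osd.5 is a made-up id and the init commands depend on the packaging.

    ceph osd out 5                 # let data drain off it and wait for HEALTH_OK
    stop ceph-osd id=5             # or: service ceph stop osd.5
    ceph osd crush remove osd.5
    ceph auth del osd.5
    ceph osd rm 5
    # new disks on the replacement host are then added back one at a time,
    # e.g. with ceph-deploy osd create newhost:sdX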
[14:47] <ksingh> mozg : which one will be a good option: management from the ceph-deploy node or any mon node ?
[14:47] <mozg> mattt__, what if I simply replace the osds as well as the os disk
[14:47] <mozg> it should power on and start working, shouldn't it?
[14:47] <mozg> it will have the same IP address, etc
[14:47] <mattt__> mozg: you mean move the disks from old host into new one?
[14:47] <mozg> yeah, move the disks
[14:47] <mattt__> i'd imagine that'd work?
[14:47] <mozg> i can replace them one by one
[14:48] <mattt__> yeah, if you can retain disks then i'm sure that'd be easier
[14:48] <mozg> as I also need to move from 1TB disks to 3TB disks
[14:48] <mozg> but i can do it one by one
[14:48] <mattt__> oh
[14:48] <mattt__> yeah, i'd probably move the disks up so you have a working cluster
[14:48] <mattt__> then do a disk at a time … more time consuming but presumably a bit safer
[14:48] <mattt__> s/working cluster/redundant cluster/
[14:49] <mozg> mattt__, thanks
[14:49] <mozg> that is something i had in mind
[14:50] <mozg> it should not cause any downtime
[14:50] <mozg> and the disk replacement one by one should not cause a great deal of performance loss
[14:50] <mozg> i hope
[14:50] <mattt__> dunno, that is a lot of data to duplicate to another node
[14:50] <mattt__> but i'm not sure what the alternative would be
[14:51] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[14:51] * erice (~erice@50.240.86.181) has joined #ceph
[14:52] <mattt__> mozg: only thing i'm not sure about
[14:52] <mattt__> mozg: if you put the disks in the new host, and it recognizes them differently
[14:53] <mozg> mattt__, when i've created my cluster i have manually partitioned disks and journal disks
[14:53] <mozg> and i am manually mounting them via fstab
[14:53] <mozg> with UUIDs instead of direct block devices
[14:53] <mattt__> oh nice!
[14:53] <mozg> so it should mount the disks without many issues
[14:53] <mozg> i hope
[14:54] <mattt__> lots of critical data on the cluster ?
[14:54] <mozg> yeah
[14:54] <mozg> and i can't afford the downtime
[14:54] <mozg> not much of it anyway
[14:58] <mattt__> good luck :)
[14:58] <cfreak201> Is there a way to mitigate the effect of crush map changes in production environments? Like doing it with a low priority so normal I/O is not delayed?
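There is no priority class for map changes as such, but the knobs people usually turn down to soften recovery and backfill impact look like this; the values are just examples, not recommendations.

    # runtime, across all OSDs
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

    # or persistently in ceph.conf
    [osd]
        osd max backfills = 1
        osd recovery max active = 1
        osd recovery op priority = 1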
[14:58] * ksingh (~Adium@b-v6-0018.vpn.csc.fi) Quit (Quit: Leaving.)
[15:01] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[15:06] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[15:17] * ScOut3R (~ScOut3R@catv-89-133-21-203.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[15:18] * dmsimard (~Adium@2607:f748:9:1666:1c78:9c8d:38a0:914d) has joined #ceph
[15:20] * tryggvil (~tryggvil@178.19.53.254) Quit (Quit: tryggvil)
[15:21] * mattt__ (~textual@92.52.76.140) Quit (Quit: Textual IRC Client: www.textualapp.com)
[15:21] * tryggvil (~tryggvil@178.19.53.254) has joined #ceph
[15:21] * mattt_ (~textual@92.52.76.140) has joined #ceph
[15:29] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[15:43] * julian (~julianwa@125.70.135.165) has joined #ceph
[15:48] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) Quit (Quit: Leaving.)
[15:53] * BillK (~BillK-OFT@58-7-67-236.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[15:54] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) has joined #ceph
[15:54] <mattt_> when an OSD comes online, does it always automatically go into "in" state?
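The question goes unanswered; as far as I recall, an OSD that boots is normally marked in again automatically, and the cluster-wide noin flag is the usual way to suppress that while inspecting things:

    ceph osd set noin      # newly started OSDs stay "out" until you mark them in
    ceph osd unset noin    # restore the default behaviour afterwards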
[16:02] * freedomhui (~freedomhu@117.79.232.203) has joined #ceph
[16:13] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[16:14] * freedomhui (~freedomhu@117.79.232.203) Quit (Quit: Leaving...)
[16:16] * The_Bishop_ (~bishop@f048124069.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[16:23] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:23] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[16:24] * erice_ (~erice@50.240.86.181) has joined #ceph
[16:26] * gregmark (~Adium@68.87.42.115) has joined #ceph
[16:26] * gsaxena (~gsaxena@addr1621292483.ippl.jhmi.edu) has joined #ceph
[16:26] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[16:29] * bandrus (~Adium@c-98-238-148-252.hsd1.ca.comcast.net) has joined #ceph
[16:29] * erice (~erice@50.240.86.181) Quit (Ping timeout: 480 seconds)
[16:31] * freedomhui (~freedomhu@123.151.28.71) has joined #ceph
[16:34] * rongze (~rongze@117.79.232.235) Quit (Remote host closed the connection)
[16:34] * rongze (~rongze@117.79.232.203) has joined #ceph
[16:43] * rongze (~rongze@117.79.232.203) Quit (Ping timeout: 480 seconds)
[16:44] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) Quit (Quit: Leaving.)
[16:48] * julian (~julianwa@125.70.135.165) Quit (Quit: afk)
[16:53] * jcsp (~jcsp@212.20.242.100) has joined #ceph
[16:54] * i_m (~ivan.miro@deibp9eh1--blueice1n1.emea.ibm.com) Quit (Ping timeout: 480 seconds)
[16:54] <mozg> wido, do you know if there is a fix for ACS 4.2 to stop it copying snapshots to the secondary storage?
[16:54] * Henson_D (~kvirc@lord.uwaterloo.ca) has joined #ceph
[16:55] * JustEra (~JustEra@89.234.148.11) Quit (Quit: This computer has gone to sleep)
[16:56] * ircolle (~Adium@2601:1:8380:2d9:4cf:f5a8:9108:e140) has joined #ceph
[16:57] * a (~a@pool-173-55-143-200.lsanca.fios.verizon.net) has joined #ceph
[16:57] * a is now known as Guest3128
[16:59] * gsaxena (~gsaxena@addr1621292483.ippl.jhmi.edu) Quit (Ping timeout: 480 seconds)
[17:00] * foosinn (~stefan@office.unitedcolo.de) Quit (Quit: Leaving)
[17:02] <Henson_D> hello everyone, I may or may not have found a bug, and I'd like to bounce it off you guys. I got a FAILED assert(p->second.size == snapset.clone_size[*curclone]) message while doing either scrub or repair on a PG with an error in it. A couple weeks ago I had a problem with the following kind of error:
[17:03] <Henson_D> deep-scrub 3.9 3082bd09/rb.0.102c.2ae8944a.000000003020/head//3 on disk size (4194304) does not match object info size (4096), I corrected this by checking to see if the object in question (part of an RBD ext4 filesystem) was in use using debugfs, and it was. So I unmounted the filesystem and simply did "get" and "put" on the object to reset the object info size. However before doing this I
[17:04] <Henson_D> took a snapshot of the rbd device in case I messed up. I then called "repair" on the PG with the error. In the case of an RBD device that had 2x replication, the error was fixed. But in the case of the RBD device with 1x replication, I got the FAILED assert message above. This first crashed the OSD with the error, then it would just give me the error and refuse to repair the error, either
[17:04] * Starheaven (~Starheave@208.78.140.246) has joined #ceph
[17:05] <Henson_D> with a scrub or repair. The way I got this to work was to remove the snapshot on the 1x replicated RBD and run repair again, at which point there was no error and the object info size was corrected. So it seems as though if it's 1x replicated with a snapshot, then "get" and "put" to fix the object size causes problems. Does this behaviour make sense, or did I find a bug?
[17:06] <Henson_D> I also know that this assert type error was reported and fixed about 3 years ago, but I'm running 0.67.2 and got this error.
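A sketch of the get/put round-trip Henson_D describes, with a placeholder pool name and the object name from his paste; the idea is simply that rewriting the object in place refreshes the recorded object size, after which the PG can be re-scrubbed or repaired.

    # pool name "rbd" is a placeholder for the pool that holds the object
    rados -p rbd get rb.0.102c.2ae8944a.000000003020 /tmp/obj
    rados -p rbd put rb.0.102c.2ae8944a.000000003020 /tmp/obj
    ceph pg repair 3.9     # then retry the repair on the affected PG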
[17:08] * yehudasa (~yehudasa@2602:306:330b:1980:ea03:9aff:fe98:e8ff) has joined #ceph
[17:11] * JustEra (~JustEra@ALille-555-1-116-86.w90-7.abo.wanadoo.fr) has joined #ceph
[17:15] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[17:17] * markbby (~Adium@168.94.245.3) Quit (Quit: Leaving.)
[17:19] * Guest3128 (~a@pool-173-55-143-200.lsanca.fios.verizon.net) Quit (Quit: This computer has gone to sleep)
[17:19] * yanzheng (~zhyan@134.134.137.73) Quit (Remote host closed the connection)
[17:22] * simulx (~simulx@66-194-114-178.static.twtelecom.net) has joined #ceph
[17:24] * markbby (~Adium@168.94.245.2) has joined #ceph
[17:27] * gsaxena (~gsaxena@addr1621292483.ippl.jhmi.edu) has joined #ceph
[17:27] * RuediR (~Adium@130.59.94.192) Quit (Quit: Leaving.)
[17:28] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[17:30] * ajazdzewski (~quassel@lpz-66.sprd.net) Quit (Remote host closed the connection)
[17:30] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) has joined #ceph
[17:30] * jharley (~jharley@69-196-134-224.dsl.teksavvy.com) has joined #ceph
[17:30] <jharley> hey, I've got two monitors printing "failed to create new leveldb store" on upgrade to 0.67.4
[17:30] <jharley> thoughts?
[17:31] * a_ (~a@pool-173-55-143-200.lsanca.fios.verizon.net) has joined #ceph
[17:34] * haomaiwa_ (~haomaiwan@112.193.130.112) Quit (Remote host closed the connection)
[17:34] * Cube1 (~Cube@66-87-66-191.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[17:35] <jharley> better news: just one mon that reports that message
[17:35] * haomaiwang (~haomaiwan@112.193.130.112) has joined #ceph
[17:35] <jharley> "2013-10-22 15:34:52.584644 7f090cb097c0 0 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process ceph-mon, pid 6470
[17:35] <jharley> 2013-10-22 15:34:53.206956 7f090cb097c0 -1 failed to create new leveldb store"
[17:35] * shang (~ShangWu@70.35.39.20) has joined #ceph
[17:35] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[17:35] * nwat (~nwat@eduroam-225-58.ucsc.edu) has joined #ceph
[17:36] * Cube (~Cube@66-87-66-191.pools.spcsdns.net) has joined #ceph
[17:36] <niklas> Is there any reason my cluster would stop recovering with 10 PGs active+remapped and 218 PGs active+degraded after an OSD failure?
[17:36] <niklas> Why won't it fix those, too?
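Nobody picks this up; for what it's worth, the usual places to look when recovery stalls in active+remapped/active+degraded are below, and quite often the query output shows CRUSH simply cannot find a valid new location for those PGs (for example, not enough hosts left for the replica count).

    ceph health detail             # lists the stuck PGs and why
    ceph pg dump_stuck unclean     # PGs that never reached active+clean
    ceph pg 3.9 query              # per-PG peering/recovery detail (3.9 is an example pgid)
    ceph osd tree                  # check where CRUSH can still place replicas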
[17:37] * JustEra (~JustEra@ALille-555-1-116-86.w90-7.abo.wanadoo.fr) Quit (Quit: This computer has gone to sleep)
[17:37] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:40] <jharley> running "ceph-mon -id mon-02 -d" prints "Corruption: checksum mismatch"
[17:40] * mschiff (~mschiff@85.182.236.82) has joined #ceph
[17:42] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[17:43] * jcsp (~jcsp@212.20.242.100) has joined #ceph
[17:43] * haomaiwang (~haomaiwan@112.193.130.112) Quit (Ping timeout: 480 seconds)
[17:43] <jharley> 'store.db' is 34M in size
[17:44] * raipin (raipin@a.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[17:44] * raipin (raipin@a.clients.kiwiirc.com) has joined #ceph
[17:46] * Cube1 (~Cube@66-87-66-191.pools.spcsdns.net) has joined #ceph
[17:46] * Cube (~Cube@66-87-66-191.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[17:47] * Cube1 (~Cube@66-87-66-191.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[17:47] * Cube (~Cube@66-87-66-191.pools.spcsdns.net) has joined #ceph
[17:48] * sjm (~sjm@38.98.115.250) has joined #ceph
[17:49] * a_ (~a@pool-173-55-143-200.lsanca.fios.verizon.net) Quit (Quit: This computer has gone to sleep)
[17:49] * glanzi (~glanzi@201.75.202.207) has joined #ceph
[17:50] * jharley (~jharley@69-196-134-224.dsl.teksavvy.com) has left #ceph
[17:51] * jharley (~jharley@69-196-134-224.dsl.teksavvy.com) has joined #ceph
[17:51] <jharley> tried starting with '--force-sync --yes-i-really-mean-it' with no luck either
[17:51] <jharley> do I need to destroy and reinit this mon?
[17:54] * Levy (Levy@b.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[17:56] * gchristensen (~gchristen@li65-6.members.linode.com) has left #ceph
[17:57] * Cube (~Cube@66-87-66-191.pools.spcsdns.net) Quit (Quit: Leaving.)
[17:58] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[17:59] * The_Bishop (~bishop@2a02:2450:102f:4:d10e:5b68:fd24:ff14) has joined #ceph
[18:00] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[18:03] * JustEra (~JustEra@ALille-555-1-116-86.w90-7.abo.wanadoo.fr) has joined #ceph
[18:05] <jharley> FWIW: I rebuilt the mon using the instructions here http://ceph.com/docs/master/rados/operations/add-or-rm-mons/
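Condensed, the remove-and-re-add flow on that page is roughly the following; the mon id and address are placeholders, and since the monitor store is wiped it only makes sense while the remaining monitors still have quorum.

    stop ceph-mon id=mon-02                    # or: service ceph stop mon.mon-02
    ceph mon remove mon-02
    rm -rf /var/lib/ceph/mon/ceph-mon-02

    ceph auth get mon. -o /tmp/mon.keyring
    ceph mon getmap -o /tmp/monmap
    ceph-mon -i mon-02 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    ceph mon add mon-02 10.0.0.12:6789         # 10.0.0.12 is a made-up address
    start ceph-mon id=mon-02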
[18:06] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[18:06] * mattt_ (~textual@92.52.76.140) Quit (Read error: Connection reset by peer)
[18:07] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[18:07] * nwat (~nwat@eduroam-225-58.ucsc.edu) Quit (Ping timeout: 480 seconds)
[18:07] * nwat (~nwat@eduroam-225-58.ucsc.edu) has joined #ceph
[18:10] * markbby (~Adium@168.94.245.2) has joined #ceph
[18:11] * aarontc (~aaron@static-50-126-79-226.hlbo.or.frontiernet.net) has joined #ceph
[18:14] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[18:16] * jcsp1 (~jcsp@212.20.242.100) has joined #ceph
[18:18] * sleinen (~Adium@2001:620:0:25:bcca:2b0:db57:a572) Quit (Quit: Leaving.)
[18:18] * sleinen (~Adium@130.59.94.193) has joined #ceph
[18:19] * a_ (~a@209.12.169.218) has joined #ceph
[18:22] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[18:22] * KindTwo (KindOne@h90.50.186.173.dynamic.ip.windstream.net) has joined #ceph
[18:23] * sleinen (~Adium@130.59.94.193) Quit (Read error: Operation timed out)
[18:23] * sprachgenerator (~sprachgen@130.202.135.205) has joined #ceph
[18:24] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:24] * KindTwo is now known as KindOne
[18:26] * sprachgenerator (~sprachgen@130.202.135.205) Quit ()
[18:28] * ksingh (~Adium@2001:708:10:10:9062:5b25:24f0:5ce8) has joined #ceph
[18:31] * jcsp1 (~jcsp@212.20.242.100) Quit (Quit: Leaving.)
[18:34] * angdraug (~angdraug@64-79-127-122.static.wiline.com) has joined #ceph
[18:40] * yehudasa (~yehudasa@2602:306:330b:1980:ea03:9aff:fe98:e8ff) Quit (Read error: Connection reset by peer)
[18:40] * yehudasa__ (~yehudasa@2602:306:330b:1980:ea03:9aff:fe98:e8ff) has joined #ceph
[18:41] * Cube (~Cube@12.248.40.138) has joined #ceph
[18:45] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[18:45] * sjustlaptop (~sam@172.56.17.252) has joined #ceph
[18:46] * jcsp (~jcsp@212.20.242.100) has joined #ceph
[18:47] * sleinen1 (~Adium@2001:620:0:26:d843:1b52:c296:a83d) has joined #ceph
[18:51] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[18:52] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Ping timeout: 480 seconds)
[18:53] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[18:53] * JoeGruher (~JoeGruher@134.134.137.71) has joined #ceph
[18:57] * tryggvil (~tryggvil@178.19.53.254) Quit (Quit: tryggvil)
[18:58] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:59] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[18:59] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[19:03] * JustEra (~JustEra@ALille-555-1-116-86.w90-7.abo.wanadoo.fr) Quit (Quit: This computer has gone to sleep)
[19:06] * jbd_ (~jbd_@2001:41d0:52:a00::77) has left #ceph
[19:06] * Steki (~steki@198.199.65.141) has joined #ceph
[19:09] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Ping timeout: 480 seconds)
[19:19] * gsaxena_ (~gsaxena@mail.atlas-advertising.com) has joined #ceph
[19:19] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[19:20] * gsaxena_ (~gsaxena@mail.atlas-advertising.com) Quit (Remote host closed the connection)
[19:23] * sjustlaptop (~sam@172.56.17.252) Quit (Ping timeout: 480 seconds)
[19:26] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[19:28] * Pedras (~Adium@64.191.206.83) has joined #ceph
[19:31] <JoeGruher> question... sometimes i see parameters both with an underscore and without, for example, "cluster_network" versus "cluster network" in ceph.conf... is ceph generally smart enough to figure it out either way, or is one preferred, or?
[19:33] * freedomhui (~freedomhu@123.151.28.71) Quit (Quit: Leaving...)
[19:35] <peetaur> JoeGruher: I see underscores, and I've tried without them and it works for cluster network and public network. (did not try with them)
[19:36] * rongze (~rongze@117.79.232.203) has joined #ceph
[19:36] * thomnico (~thomnico@70.35.39.20) Quit (Quit: Ex-Chat)
[19:36] <JoeGruher> just curious if ceph.conf parses underscores the same as spaces... i've seen this with other parameters as well, not just the cluster/public network stuff
[19:37] <ksingh> guys, from where do we manage the ceph cluster? i have a ceph-deploy node , Monitor nodes and OSD nodes
[19:38] <ksingh> so from which node should we usually manage the entire cluster
[19:38] <ksingh> which node to use for centralized management of the entire cluster
[19:42] <peetaur> JoeGruher: I think so
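peetaur's experience matches how ceph.conf parsing generally behaves: spaces and underscores in option names are interchangeable. A minimal fragment showing both spellings of the same kind of option (the subnets are placeholders, not from any cluster in this log):

    [global]
    # the next two lines use the two equivalent spellings;
    # "cluster network" and "cluster_network" name the same option
    cluster network = 10.0.1.0/24
    public_network  = 10.0.2.0/24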
[19:42] <peetaur> ksingh: probably from the ceph-deploy node
[19:43] <ksingh> peetaur : so on the ceph-deploy node , we must have the ceph as well as the ceph-deploy packages installed
[19:43] * claenjoy (~leggenda@37.157.33.36) Quit (Quit: Leaving.)
[19:43] <ksingh> it has to
[19:43] <peetaur> oh yeah, I do that
[19:43] <peetaur> but others I believe said they were told they should do the ceph commands on the OSDs or monitors
[19:43] <peetaur> I do it on the ceph-deploy node so I can loop through things where my ssh key is too
[19:44] <peetaur> (but I'm only testing it.... no production system yet)
[19:44] * simulx (~simulx@66-194-114-178.static.twtelecom.net) Quit (Read error: No route to host)
[19:45] <ksingh> my next question is , since we are managing the entire cluster from the ceph-deploy node , all the commands like monitor addition / deletion , OSD addition / deletion etc will be performed from the ceph-deploy node
[19:45] <ksingh> so the ceph.conf file gets updated only on ceph-deploy node
[19:45] * simulx (~simulx@vpn.expressionanalysis.com) has joined #ceph
[19:45] <ksingh> do we need to every time push the ceph.conf file from the ceph-deploy node to the other cluster nodes manually
[19:50] * rongze (~rongze@117.79.232.203) Quit (Ping timeout: 480 seconds)
[19:51] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:52] <ksingh> guys pls answer my query
[19:53] * yehudasa__ (~yehudasa@2602:306:330b:1980:ea03:9aff:fe98:e8ff) Quit (Ping timeout: 480 seconds)
[19:54] * glanzi (~glanzi@201.75.202.207) Quit (Quit: glanzi)
[19:55] * Gdub (~Adium@dsl093-174-037-223.dialup.saveho.com) Quit (Quit: Leaving.)
[19:56] <peetaur> ksingh: ceph-deploy --help
[19:56] <peetaur> admin Push configuration and client.admin key to a remote host.
[19:56] <peetaur> config Push configuration file to a remote host.
[19:56] * dmick (~dmick@2607:f298:a:607:f09d:ac37:81d9:ec58) has joined #ceph
[19:57] <ksingh> peetaur : thanks i know that , my question was do we need to manually push the configuration each time to all the nodes
[19:57] <peetaur> I guess so
[19:57] <ksingh> aha
[19:57] <peetaur> but with that easy ceph-deploy thing
[19:58] <ksingh> like what
[19:58] * jharley (~jharley@69-196-134-224.dsl.teksavvy.com) Quit (Quit: jharley)
[19:58] <peetaur> ceph-deploy [--overwrite-conf] node{1,2,3}
[19:59] <peetaur> bash expands the {1,2,3} try with echo to see
[19:59] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[19:59] <ksingh> yep
[20:00] <peetaur> forgot the admin word in there: ceph-deploy [--overwrite-conf] admin node{1,2,3}
[20:00] <ksingh> one more thing , each node has its own copy of ceph.conf
[20:00] <ksingh> do the contents of the ceph.conf file have to be exactly the same for all nodes
[20:00] <peetaur> and also {1..3} works
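A short sketch of the push workflow peetaur describes, using the ceph-deploy subcommands quoted above (node1..node3 are hypothetical hostnames):

    # brace expansion is done by bash, not ceph-deploy; check it with echo first
    echo node{1..3}                                  # prints: node1 node2 node3

    # push ceph.conf to each host, and the admin keyring where it is needed
    ceph-deploy --overwrite-conf config push node{1..3}
    ceph-deploy --overwrite-conf admin node{1..3}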
[20:01] <peetaur> it has to or doesn't have to depending on your needs... for example rbd caching stuff wouldn't have any effect on the OSDs, only clients
[20:06] <ksingh> Thanks
[20:07] <ksingh> by any chance does anyone in this channel know about a solution for this http://pastebin.com/ucWBjFY9
[20:07] <ksingh> i am not able to extend my cluster , i.e. not able to add monitor nodes
[20:09] <peetaur> not sure what that is
[20:11] * nwat (~nwat@eduroam-225-58.ucsc.edu) Quit (Ping timeout: 480 seconds)
[20:12] * glanzi (~glanzi@201.75.202.207) has joined #ceph
[20:13] <ksingh> :-(
[20:13] * freedomhui (~freedomhu@106.120.176.65) has joined #ceph
[20:14] * mozg (~andrei@host86-184-120-113.range86-184.btcentralplus.com) has joined #ceph
[20:15] * jeff-YF (~jeffyf@67.23.117.122) Quit (Quit: jeff-YF)
[20:18] * Henson_D (~kvirc@lord.uwaterloo.ca) Quit (Quit: KVIrc KVIrc Equilibrium 4.1.3, revision: 5988, sources date: 20110830, built on: 2011-12-05 12:15:22 UTC http://www.kvirc.net/)
[20:22] * gucki (~smuxi@77-56-39-154.dclient.hispeed.ch) Quit (Remote host closed the connection)
[20:22] <TVR> how many nodes in the initial cluster?
[20:22] <dmick> ksingh: check monitor logs on the node and see why it didn't start
[20:22] <TVR> how many mons in the initial cluster actually?
[20:23] <pmatulis_> looks like ulimit command failed. can you manually/locally make such a change? security framework policy?
[20:24] * glanzi (~glanzi@201.75.202.207) Quit (Quit: glanzi)
[20:27] <mikedawson> looks like ksingh's issue is going from one mon to two mons. When the second monitor is added but not yet synced, there is a special case of one working monitor out of two (i.e. no quorum). Does ceph-deploy handle that case properly?
[20:28] <dmick> pmatulis_: no, it wasn't ulimit, it was the entire run of ceph-mon most likely
[20:28] <dmick> mikedawson: maybe, but ceph-mon shouldn't have failed to start
[20:28] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[20:28] <dmick> all of which is why I recommended looking at the mon log to see why it apparently failed to start
[20:29] <mikedawson> dmick: yep
[20:32] <ksingh> guys i am checking the logs , give me a minute ; ulimit, if i run it manually on the shell, completes with no error
[20:32] * tsnider1 (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[20:33] <TVR> It seems ceph-deploy and ceph for that matter cannot go from one mon to > one mon.. as discussed in this thread: http://www.mail-archive.com/ceph-users@lists.ceph.com/msg02109.html
[20:33] <TVR> if there is a way, I would love to hear about it please.
[20:34] * nhm (~nhm@184-97-129-163.mpls.qwest.net) Quit (Read error: Operation timed out)
[20:36] <ksingh> where to check for the monitor logs ? on the ceph-deploy node i didn't find any monitor logs
[20:36] <ksingh> there is only ceph.log , and that doesn't say much about the issue
[20:36] * nwat (~nwat@eduroam-225-58.ucsc.edu) has joined #ceph
[20:37] <mikedawson> ksingh: on the node hosting the monitor at /var/log/ceph/ceph-mon.<name>.log
[20:37] <ksingh> okay thanks got it
[20:37] * yehudasa__ (~yehudasa@2607:f298:a:607:ea03:9aff:fe98:e8ff) has joined #ceph
[20:38] <ksingh> only 3 lines in log file
[20:38] <ksingh> 2013-10-22 21:33:41.843931 7f60739dd7a0 0 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7), process ceph-mon, pid 22247
[20:38] <ksingh> 2013-10-22 21:33:41.918344 7f60739dd7a0 0 mon.ceph-mon2 does not exist in monmap, will attempt to join an existing cluster
[20:38] <ksingh> 2013-10-22 21:33:41.918491 7f60739dd7a0 -1 no public_addr or public_network specified, and mon.ceph-mon2 not present in monmap or ceph.conf
[20:38] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) Quit (Ping timeout: 480 seconds)
[20:38] <dmick> ah. yeah, that'll do it
[20:39] <ksingh> i have made the entry in the ceph.conf file manually , is there anything else i need to change
[20:39] <ksingh> i have done this change before , but it seems this is also not helping
[20:39] <dmick> so TVR is likely right about the ultimate cause: ceph-deploy is designed such that the original "new" establishes which monitors will be in the cluster, and isn't happy about that changing later with "mon create"
[20:40] <ksingh> TVR , your advice pls :^)
[20:40] <ksingh> mike : what do you say
[20:40] <dmick> if you add it to ceph.conf, it could work, but bear in mind you need an odd number of mons (1 or 3, not 2)
[20:40] <dmick> so if you're adding to 1, add 2 more, and "mon create" them both
[20:41] <ksingh> yes i have a plan of having 3 mons
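A hedged sketch of the ceph.conf additions dmick is describing, keyed off the "no public_addr or public_network" error in the mon log above. Only 192.168.1.28 (ceph-mon1) comes from this log; the mon2/mon3 addresses and the subnet are placeholders:

    [global]
    mon_initial_members = ceph-mon1, ceph-mon2, ceph-mon3
    mon_host = 192.168.1.28, 192.168.1.29, 192.168.1.30   # .29/.30 are hypothetical
    public_network = 192.168.1.0/24                       # assumed subnet

    # then, from the ceph-deploy node:
    #   ceph-deploy --overwrite-conf config push ceph-mon{1,2,3}
    #   ceph-deploy mon create ceph-mon2 ceph-mon3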
[20:41] <peetaur> ksingh: if it's a test cluster and you just want to play, I'd say stop the monitor, get the mon map, edit it so there's just 1 monitor again, inject, start it (optionally see it fail and cry and start over), and then try adding 2 monitors next time instead of 1.
[20:41] <TVR> I wrote mine (in puppet) to have a $quarum value of at least 3 servers... so it won't create just one.. and then, from there, it can be expanded...
[20:41] <ksingh> 1 is already working fine , i am stuck on mon2 and mon3
[20:41] <dmick> I'm not sure if there's anything else you'll have to make up for or not but that might work.
[20:42] <peetaur> ksingh: http://ceph.com/docs/next/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster
[20:42] <peetaur> does that 1 have quorum?
[20:42] <TVR> I really was kind of hoping to be flamed and have someone tell me I was wrong.... to be honest... I was hoping there was a way to expand from 1..
[20:42] <peetaur> when mine doesn't have quorum, even a thing like "ceph -s" just hangs forever
[20:43] <dmick> TVR: ceph-deploy is a simple tool to handle simple installation situations; it intentionally doesn't cover every possible scenario. but I think this only requires a little bit of extra help
[20:43] <TVR> if it is a quarum of 1 then its a quarum.. it has to have an odd number to work
[20:44] <mikedawson> TVR: correction - an odd number is not needed (but is certainly advisable). The requirement is that more than half of the defined monitors are up for quorum (e.g. 2 of 3; with only 2 defined, both must be up).
[20:44] <ksingh> Peetaur : one mon is up , so my cluster should be in quarum , and yes i am doing this in a test box , no problem with any change
[20:45] * nwat (~nwat@eduroam-225-58.ucsc.edu) Quit (Ping timeout: 480 seconds)
[20:45] <dmick> 1) the word is "quorum"
[20:45] <TVR> ok.. yea.. that makes sens.. yes
[20:45] <peetaur> ksingh: well I'd take a look at the monmap anyway, even if not to edit it
[20:45] <peetaur> ksingh: it would be nice to know if the 2nd one is in there
[20:45] <TVR> sense.. heh
[20:45] <dmick> 2) quorum depends on how many monitors the cluster currently believes should be there
[20:46] <dmick> if you've been editing ceph.conf ksingh it's difficult to tell
[20:47] <mikedawson> dmick: would the admin socket of the working monitor be authoritative?
[20:47] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[20:49] <dmick> likely, but I think I was sorta counseling for describing where one is at rather than tidbits of "I may have messed with this"
[20:49] <mikedawson> ksingh: what does 'ceph --admin-daemon /var/run/ceph/ceph-mon.<monitor_name>.asok quorum_status' say?
[20:51] * nwat (~nwat@eduroam-225-58.ucsc.edu) has joined #ceph
[20:51] <ksingh> it says admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[20:52] <dmick> and so the first question is "does the pathname you gave as an argument exist"
[20:52] * glanzi (~glanzi@201.75.202.207) has joined #ceph
[20:53] <ksingh> peetaur : the link that you gave me , in step 2 , the command is not doing anything : ceph-mon -i 0 --extract-monmap /tmp/monmap
[20:53] <ksingh> it should create /tmp/monmap , but no luck
[20:54] <ksingh> dmick : no
[20:54] <ksingh> only till /var/run/ceph exists
[20:54] <ksingh> on monitor node2 , that is giving problem
[20:54] <mikedawson> ksingh: do it on the working monitor
[20:55] <ksingh> i am talking about mon2 node
[20:55] <dmick> ksingh: you can't run that monitor command on a non-working monitor
[20:56] <ksingh> dmick : bingo , gave me output from the working monitor
[20:56] <ksingh> { "election_epoch": 1,
[20:56] <ksingh> "quorum": [
[20:56] <ksingh> 0],
[20:56] <ksingh> "quorum_names": [
[20:56] <ksingh> "ceph-mon1"],
[20:56] <ksingh> "quorum_leader_name": "ceph-mon1",
[20:56] <ksingh> "monmap": { "epoch": 1,
[20:56] <ksingh> "fsid": "7f89d921-a938-463a-984e-8783368514fb",
[20:56] <ksingh> "modified": "0.000000",
[20:56] <ksingh> "created": "0.000000",
[20:56] <ksingh> "mons": [
[20:56] <ksingh> { "rank": 0,
[20:56] <ksingh> "name": "ceph-mon1",
[20:56] <ksingh> "addr": "192.168.1.28:6789\/0"}]}}
[20:57] <dmick> so the cluster still is running with 1 mon defined
[20:57] <dmick> 1) don't paste multiline output here, use something like fpaste.org or pastebin.com
[20:57] <dmick> 2) pastebin the current state of your ceph.conf
[20:57] <mikedawson> ksingh: have you considered taking what you've learned so far, and starting over with your expanded understanding? Then do it again, and again... You'll get it
[20:58] * glzhao (~glzhao@118.195.65.67) Quit (Quit: leaving)
[20:59] <mikedawson> and dmick, I hereby publicly thank you for helping me through a bunch of these questions over the past year!
[20:59] <pmatulis_> any idea when cloning will be supported in the kernel rbd module?
[21:00] * danieagle (~Daniel@186.214.61.130) has joined #ceph
[21:01] <peetaur> ksingh: what I used was: sudo ceph-mon -i $(hostname) --extract-monmap monmap.blah -d
[21:01] <peetaur> ksingh: which fills in the hostname as the mon id, and names the output monmap.blah
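For reference, the extract/edit/inject workflow from the add-or-rm-mons doc linked earlier looks roughly like this when run on the monitor's own host, with the mon stopped first (init commands vary by distro, so treat this as a sketch):

    # dump the monitor's current view of the monmap to a file
    sudo ceph-mon -i $(hostname) --extract-monmap /tmp/monmap -d

    # inspect it; monmaptool can also drop a half-added mon if one is listed
    monmaptool --print /tmp/monmap
    monmaptool --rm ceph-mon2 /tmp/monmap    # only if ceph-mon2 actually appears

    # write the edited map back, then start the mon again
    sudo ceph-mon -i $(hostname) --inject-monmap /tmp/monmap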
[21:02] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Ping timeout: 480 seconds)
[21:03] <dmick> pmatulis_: it is now. I don't know which kernel version you need exactly but it's been in mainline for some time.
[21:03] * KindTwo (~KindOne@h33.38.28.71.dynamic.ip.windstream.net) has joined #ceph
[21:04] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[21:04] * ksingh (~Adium@2001:708:10:10:9062:5b25:24f0:5ce8) Quit (Ping timeout: 480 seconds)
[21:04] * KindTwo is now known as KindOne
[21:07] <mozg> guys, has anyone here managed to get ceph/s3 working with CloudStack Secondary Storage?
[21:07] * freedomhui (~freedomhu@106.120.176.65) Quit (Quit: Leaving...)
[21:10] * allsystemsarego (~allsystem@188.27.166.164) Quit (Quit: Leaving)
[21:11] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[21:13] <aarontc> is it a known bug that CephFS isn't supporting sticky bits? I searched trac and didn't find anything
[21:13] <pmatulis_> dmick: ok, i'm reading the docs...
[21:13] <pmatulis_> dmick: but good to hear
[21:16] <dmick> http://comments.gmane.org/gmane.comp.file-systems.ceph.user/3898 so 3.9 at least
[21:16] <dmick> I thought I remembered 3.8 but that's iffy
[21:18] * tsnider1 (~tsnider@nat-216-240-30-23.netapp.com) Quit (Ping timeout: 480 seconds)
[21:30] * jeff-YF (~jeffyf@pool-108-48-217-245.washdc.east.verizon.net) has joined #ceph
[21:30] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[21:32] * thomnico (~thomnico@70.35.39.20) Quit (Ping timeout: 480 seconds)
[21:36] * nwat (~nwat@eduroam-225-58.ucsc.edu) Quit (Ping timeout: 480 seconds)
[21:38] <pmatulis_> dmick: ok thanks, that's fairly recent
[21:38] * jeff-YF (~jeffyf@pool-108-48-217-245.washdc.east.verizon.net) Quit (Ping timeout: 480 seconds)
[21:39] * gsaxena (~gsaxena@addr1621292483.ippl.jhmi.edu) Quit (Ping timeout: 480 seconds)
[21:44] * Vjarjadian (~IceChat77@94.1.37.151) has joined #ceph
[21:44] * shang (~ShangWu@70.35.39.20) Quit (Ping timeout: 480 seconds)
[21:45] <JoeGruher> anyone have thoughts on why ceph deploy might hang here: [joceph01][INFO ] Running command: rpm -Uvh --replacepkgs http://ceph.com/rpm-dumpling/el6/noarch/ceph-release-1-0.el6.noarch.rpm
[21:45] <JoeGruher> command completes just fine if I run it myself: [ceph@joceph05 ceph]$ ssh joceph01 'sudo rpm -Uvh --replacepkgs http://ceph.com/rpm-dumpling/el6/noarch/ceph-release-1-0.el6.noarch.rpm'
[22:03] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) has joined #ceph
[22:03] * sjustlaptop (~sam@172.56.9.102) has joined #ceph
[22:06] * JustEra (~JustEra@ALille-555-1-116-86.w90-7.abo.wanadoo.fr) has joined #ceph
[22:07] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) Quit (Read error: Connection reset by peer)
[22:08] * ksingh (~Adium@b-v6-0001.vpn.csc.fi) has joined #ceph
[22:18] * sarob (~sarob@nat-dip6.cfw-a-gci.corp.ne1.yahoo.com) has joined #ceph
[22:18] * dmsimard (~Adium@2607:f748:9:1666:1c78:9c8d:38a0:914d) Quit (Quit: Leaving.)
[22:19] <dmick> JoeGruher: no, but I'd look with strace -f -p
[22:19] <dmick> awesome hostnames btw
[22:19] * dmsimard (~Adium@2607:f748:9:1666:1c78:9c8d:38a0:914d) has joined #ceph
[22:20] <JoeGruher> hehe thanks
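A minimal sketch of the strace approach dmick suggests, assuming the PID of the stuck ceph-deploy run (12345 is a placeholder):

    # attach to the hung process and all of its children, log syscalls to a file
    sudo strace -f -p 12345 -s 256 -o /tmp/ceph-deploy.trace
    # the last call it blocks on (often a read() on the ssh pipe) shows where it hangs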
[22:21] * danieagle (~Daniel@186.214.61.130) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[22:22] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[22:22] * scott__ (~scott@dc.gigenet.com) has joined #ceph
[22:23] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[22:23] <ksingh> dmick : here is the my ceph.conf http://pastebin.com/4PJ4XTQB
[22:24] <loicd> While working on puppet-ceph, the following debate started. I advocate that puppet ( and other similar tools ) should only deal with OSDs by calling ceph-disk prepare on designated disks / partitions and let the udev / upstart logic deal with the rest. The tool does not need to know anything about OSD ids or even running osd daemons. That's handled automatically by ceph.
[22:24] <ksingh> this output is from my ceph-deploy node
[22:24] * diegows (~diegows@190.190.11.42) Quit (Read error: Operation timed out)
[22:25] <dmick> ksingh: of course the ceph.conf should be the same on all the nodes or else chaos results.
[22:25] <ksingh> yes i pushed ceph.conf manually on all monitors
[22:25] <dmick> but given that, I'm not sure what happens if monmap disagrees with mon_initial_members
[22:25] <dmick> and mon_host
[22:25] <loicd> Danny Al Gaaf argues that a configuration system must be able to deal with OSD ids, and chooses to deal with how disks are provisioned instead of letting udev / upstart deal with it.
[22:25] <kraken> (⌐■_■)
[22:26] <dmick> this is a section of code I keep exhorting someone to document but I don't think it's happened yet
[22:26] <loicd> I'm curious to know if someone has an opinion about this.
[22:26] <ksingh> dmick : i also tried adding all the monitor nodes in mon_initial_members , but no result
[22:27] <dmick> looks like they're already there to me
[22:27] <scott__> Can anyone give me a direct guide for Rbd and cloudstack. I have a storage cluster working, but can't get it to be added as primary storage
[22:27] * dmsimard (~Adium@2607:f748:9:1666:1c78:9c8d:38a0:914d) Quit (Quit: Leaving.)
[22:28] <ksingh> dmick : have a look http://www.mail-archive.com/ceph-users@lists.ceph.com/msg02109.html is this the same issue i am facing
[22:28] * danieagle (~Daniel@179.176.57.59.dynamic.adsl.gvt.net.br) has joined #ceph
[22:33] <dmick> ksingh: yeah. I don't have any easy answers. It might be that futzing with the monmap and injecting it would help.
[22:33] <dmick> but I don't have time to reproduce/try it out.
[22:34] <joshd1> loicd: I think ideally ceph-disk should be smart enough to handle various disk+journal configurations itself, rather than duplicating that logic across many deployment tools
[22:35] <ksingh> alright dmick , thanks a big ton for your help , let's discuss tomorrow
[22:35] <loicd> joshd1: that's also my thinking. You're saying "ideally", do you have a limitation in mind ?
[22:36] <joshd1> loicd: no, I don't have a good handle on the current state though
[22:37] * diegows (~diegows@190.190.11.42) has joined #ceph
[22:38] <ksingh> one quick check : from the ceph-deploy node i am not able to check cluster health , however from the ceph monitor i am able to see health ; below is the output i am getting from the ceph-deploy node
[22:38] <ksingh> [root@ceph-admin ~]# ceph health
[22:38] <ksingh> 2013-10-22 23:36:48.045059 7fc8bc92a700 0 -- :/10779 >> 192.168.1.28:6789/0 pipe(0x2541510 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
[22:38] <ksingh> ^C
[22:38] <ksingh> [root@ceph-admin ~]#
[22:38] <loicd> I've spent some time exploring it and it looks like it's all covered. It is impressively simple.
[22:38] <loicd> joshd1: ^
[22:39] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[22:40] <dmick> ksingh: the ceph command selects a monitor from the list of known monitors to send the command to
[22:40] <dmick> if you only have 1 of 3 up you can get unlucky
[22:40] <dmick> you can ask a specific mon with -m <addr>
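For example, pointing the client straight at the monitor known to be up (192.168.1.28 is ceph-mon1 from ksingh's quorum_status output; any reachable mon address works):

    # bypass monitor selection and ask a specific mon directly
    ceph -m 192.168.1.28:6789 health
    ceph -m 192.168.1.28:6789 quorum_status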
[22:41] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[22:43] <joshd1> loicd: glad to hear it. I don't see any need for puppet to handle osd ids
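A sketch of the udev-driven flow loicd is advocating: the deployment tool only names the devices, and udev / upstart activates the OSD and assigns its id (the device paths here are placeholders):

    # prepare a data disk with a separate journal device; no osd id is chosen here
    ceph-disk prepare --fs-type xfs /dev/sdb /dev/sdc

    # udev normally triggers activation on its own; done by hand it would be:
    #   ceph-disk activate /dev/sdb1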
[22:43] * sagelap (~sage@2600:1012:b01e:fc6a:18f7:88d2:cd17:da33) has joined #ceph
[22:45] * rongze (~rongze@117.79.232.203) has joined #ceph
[22:47] * shang (~ShangWu@38.126.120.10) has joined #ceph
[22:51] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:51] * iii8 (~Miranda@91.207.132.71) has joined #ceph
[22:51] * scott__ (~scott@dc.gigenet.com) Quit (Quit: Lost terminal)
[22:54] * rongze (~rongze@117.79.232.203) Quit (Ping timeout: 480 seconds)
[22:56] * shang (~ShangWu@38.126.120.10) Quit (Ping timeout: 480 seconds)
[22:58] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:59] <JoeGruher> is this the correct way to specify to install v0.71? "ceph-deploy -v install --dev 0.71 <hosts>"
[23:00] <JoeGruher> i assume not because it fails due to lack of a 0.71 (or v0.71) directory in http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/ref/
[23:01] <JoeGruher> but then how would one install 0.71 using ceph-deploy?
[23:01] * dxd828 (~dxd828@host-2-97-72-213.as13285.net) has joined #ceph
[23:03] * sarob (~sarob@nat-dip6.cfw-a-gci.corp.ne1.yahoo.com) Quit (Ping timeout: 480 seconds)
[23:03] * sagelap (~sage@2600:1012:b01e:fc6a:18f7:88d2:cd17:da33) Quit (Quit: Leaving.)
[23:04] * lx0 is now known as lxo
[23:09] * freedomhui (~freedomhu@211.155.113.161) has joined #ceph
[23:10] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[23:10] * ChanServ sets mode +v andreask
[23:11] * shang (~ShangWu@70.35.39.20) has joined #ceph
[23:15] * dmsimard (~Adium@2607:f748:9:1666:e82b:5855:bf2a:7a39) has joined #ceph
[23:15] * nwat (~nwat@eduroam-225-58.ucsc.edu) has joined #ceph
[23:16] * sarob (~sarob@nat-dip6.cfw-a-gci.corp.ne1.yahoo.com) has joined #ceph
[23:17] * freedomhui (~freedomhu@211.155.113.161) Quit (Ping timeout: 480 seconds)
[23:18] * sagelap (~sage@42.sub-70-197-65.myvzw.com) has joined #ceph
[23:24] * dmsimard (~Adium@2607:f748:9:1666:e82b:5855:bf2a:7a39) Quit (Quit: Leaving.)
[23:26] * Starheaven (~Starheave@208.78.140.246) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[23:33] * jcsp (~jcsp@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[23:35] * alram (~alram@216.103.134.250) has joined #ceph
[23:35] * thomnico (~thomnico@70.35.39.20) Quit (Read error: Connection reset by peer)
[23:35] * alram (~alram@216.103.134.250) Quit ()
[23:36] <mozg> hello again guys
[23:36] <tsnider> ceph osd tree lists an osd down. There's nothing apparent in the ceph-osd.61.log about why. How can I determine why it's down? This is a newly created cluster.
[23:36] <tsnider> 61 0 osd.61 down 0
[23:36] <mozg> i've got a strange issue I hope someone can help me with?
[23:36] <mozg> i've got a cluster with 2 osd servers, 2 mds and 5 mon servers
[23:36] * shang (~ShangWu@70.35.39.20) Quit (Ping timeout: 480 seconds)
[23:36] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[23:38] <mozg> my ceph -s shows the following: http://ur1.ca/fx8os
[23:38] <mozg> it shows that all 5 mons are down for some reason
[23:39] <mikedawson> tsnider: it looks like osd.61 has a crush weight of 0 (the second column). That doesn't seem right.
[23:39] <mozg> as you can see on line 3
[23:39] <mozg> however, line four shows that the cluster has a quorum
[23:39] <mozg> and all 5 mons seems to be okay
[23:40] <mozg> one thing which i've noticed is it takes ages for simple commands to run
[23:40] <mozg> like ceph -s sometimes takes over a minute
[23:40] <mozg> and things like ceph osd tree also could take over a minute
[23:40] * BillK (~BillK-OFT@106-68-202-154.dyn.iinet.net.au) has joined #ceph
[23:40] * diegows (~diegows@190.190.11.42) Quit (Read error: Connection reset by peer)
[23:40] <tsnider> mikedawson: yeah -- Let me look thru the creation output and see if there's anything there.
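If the crush weight really is zero, a hedged way to fix that part (0.5 is an arbitrary example; the usual convention is roughly the disk size in TB). Note this only addresses the weight; why the daemon itself is down still has to come from the osd log:

    # give the osd a non-zero crush weight, then re-check the tree
    ceph osd crush reweight osd.61 0.5
    ceph osd tree | grep osd.61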
[23:41] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) Quit (Quit: ...)
[23:41] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[23:42] <mikedawson> mozg: what version is this?
[23:42] <mozg> and my mon logs show an unusual number of messages like these:
[23:42] <mozg> 2013-10-22 22:41:14.008649 7f985386e700 1 mon.arh-ibstorage2-ib@4(peon).paxos(paxos active c 16304877..16305554) is_readable now=2013-10-22 22:41:14.008652 lease_expire=2013-10-22 22:41:19.001426 has v0 lc 16305554
[23:42] <mozg> mikedawson, it's 0.67.4
[23:43] <mikedawson> mozg: when you have a working quorum, you'll see something like "quorum 0,1,2 a,b,c" not just "quorum"
[23:43] <mozg> mikedawson, ah, okay
[23:43] <mozg> so, where do i begin?
[23:44] <mozg> i've restarted ceph-mons on all servers
[23:45] <mikedawson> mozg: sometimes I do things like ssh mon1 tail /var/log/ceph/ceph-mon.a.log && ssh mon2 tail /var/log/ceph/ceph-mon.b.log && ssh mon3 tail /var/log/ceph/ceph-mon.c.log
[23:45] <mikedawson> mozg: that'll give you an idea what your monitors are doing. Repeat it often to see the state of the cluster change
[23:46] * a_ (~a@209.12.169.218) Quit (Quit: Leaving)
[23:46] <mikedawson> mozg: or look at the admin sockets 'ceph --admin-daemon /var/run/ceph/ceph-mon.<monitor_name>.asok quorum_status'
[23:46] <mozg> the message i've sent at 22:42 shows the type of entries I am seeing
[23:46] <mozg> loads of them
[23:47] <mozg> and these:
[23:47] <mozg> 2013-10-22 22:46:40.663727 7f93d5e26700 1 mon.arh-ibstorage1-ib@3(electing).elector(50482) init, last seen epoch 50482
[23:47] <mozg> 2013-10-22 22:46:40.689689 7f93d5e26700 1 mon.arh-ibstorage1-ib@3(electing) e7 handle_timecheck drop unexpected msg
[23:47] <mozg> nothing much apart from these
[23:47] * rongze (~rongze@117.79.232.203) has joined #ceph
[23:48] * JoeGruher (~JoeGruher@134.134.137.71) Quit (Ping timeout: 480 seconds)
[23:48] <mikedawson> mozg: if everyone goes through an election, one monitor should become the leader and the rest should become peons. If that doesn't get done, you won't get a working quorum, I don't believe
[23:49] * rendar (~s@host184-179-dynamic.10-87-r.retail.telecomitalia.it) Quit ()
[23:49] <mozg> it's been doing this since this morning in UK
[23:49] <mozg> for like 12 hours at least
[23:49] <mikedawson> mozg: is client i/o working?
[23:50] * sagelap (~sage@42.sub-70-197-65.myvzw.com) Quit (Quit: Leaving.)
[23:50] <mozg> mikedawson, seems so
[23:50] <mozg> my vms are running
[23:50] <mozg> and can read from the disk
[23:50] <mozg> no hang tasks yet
[23:52] <mikedawson> mozg: this seems like a possible bug. joao is likely the expert
[23:52] <mozg> i can see that one mon server is showing slightly different entries
[23:53] <mozg> it's showing this: http://ur1.ca/fx8qh
[23:53] <mozg> where as all other mons are only showing messages i've pasted in the chat
[23:53] <mozg> nothing else
[23:53] <mozg> joao, hello
[23:53] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) Quit (Quit: Leaving.)
[23:53] * jeff-YF (~jeffyf@67.23.117.122) Quit (Read error: Operation timed out)
[23:54] <mozg> joao, are you online? I've got a strange mons problem and i can't seems to get a quorum formed
[23:54] <mozg> was wondering if you could help me?
[23:54] <mozg> or someone else who's got experience with troubleshooting mons?
[23:55] * dmick (~dmick@2607:f298:a:607:f09d:ac37:81d9:ec58) has left #ceph
[23:55] <mikedawson> mozg: that last paste makes it look like monitor 1 isn't joining the others "quorum 0,2,3,4"
[23:56] * sagelap (~sage@2600:1012:b01e:fc6a:18f7:88d2:cd17:da33) has joined #ceph
[23:56] <mozg> shall I try to stop it and see if quorum will be formed?
[23:56] * rongze (~rongze@117.79.232.203) Quit (Ping timeout: 480 seconds)
[23:56] <mikedawson> mozg: and you shouldn't have elections happening that frequently (every 5 seconds in your log)
[23:57] <mozg> yeah, strange indeed
[23:57] * sleinen1 (~Adium@2001:620:0:26:d843:1b52:c296:a83d) Quit (Quit: Leaving.)
[23:57] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[23:57] * ksingh1 (~Adium@91-157-122-80.elisa-laajakaista.fi) has joined #ceph
[23:58] <mozg> mikedawson, after stopping that mon i get the quorum now
[23:59] <mozg> quorum 1,2,3,4
[23:59] <wrencsok> you have 4 mon's?
[23:59] <mozg> i had 5, but it was for a test
[23:59] <wrencsok> ah, yeah that's a scary number. can't break a tie

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.