#ceph IRC Log

IRC Log for 2013-09-17

Timestamps are in GMT/BST.

[0:00] * ceph-newbie (~oftc-webi@egress.nitrosecurity.com) has joined #ceph
[0:01] * mnash (~chatzilla@vpn.expressionanalysis.com) has joined #ceph
[0:02] <ceph-newbie> ceph-deploy Error: ceph-disk: Unmounting filesystem failed: Command '['/bin/umount', '--', '/var/lib/ceph/tmp/mnt.4boFdP']' returned non-zero exit status 1
[0:02] <ceph-newbie> Command was: ceph-deploy osd prepare ceph-node2:sdb with ceph-node2 running Ubuntu 13.04 Desktop with all the latest updates
[0:03] <bandrus> running with sudo or as root?
[0:04] <ceph-newbie> ceph-node2 /dev/sdb, having first executed ceph-deploy disk zap ceph-node2:sdb, now had xfs 'ceph data' partition and 'ceph journal' partition with no type
[0:04] <ceph-newbie> ceph-deploy was not run as sudo or root
[0:06] <ceph-newbie> I thought ceph-deploy did not have to be. The 'ceph' user on the ceph-node2 machine is in sudoers as root, as specified in the ceph preflight instructions
[0:07] * Cube (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[0:08] * Cube (~Cube@12.248.40.138) has joined #ceph
[0:08] <ceph-newbie> I'm guessing the command should be 'sudo /bin/umount ...' instead of '/bin/umount ...' but I can't figure out if that's right
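
A minimal recovery sketch for the failure above, assuming the node name ceph-node2 and the mount point taken from the error message; the stale tmp mount is released with sudo on the OSD node, then the disk is wiped and prepared again from the admin node:

    # on ceph-node2: release the temporary mount ceph-disk left behind
    sudo /bin/umount /var/lib/ceph/tmp/mnt.4boFdP
    # from the admin node: wipe the disk and prepare it again
    ceph-deploy disk zap ceph-node2:sdb
    ceph-deploy osd prepare ceph-node2:sdb
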
[0:11] <bstillwell> So my previous method for configuring multiple rados gateways isn't working any more.
[0:12] <bstillwell> I was using multiple client.radosgw.gateway sections to do that, but now when I do a 'ceph-deploy --overwrite-conf config push ceph001' it removes one of the entries
[0:14] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) Quit (Ping timeout: 480 seconds)
[0:14] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[0:18] <bstillwell> looks like I might need to name them differently. Does client.radosgw.gateway1 and client.radosgw.gateway2 sound right?
[0:19] * DarkAceZ (~BillyMays@50.107.55.36) has joined #ceph
[0:19] * a_ (~a@209.12.169.218) Quit (Quit: This computer has gone to sleep)
[0:19] <joshd> bstillwell: yeah, that's probably best. as long as they have the same caps (access to all the same pools) that should be fine
[0:20] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[0:20] <bstillwell> joshd: ok, thanks
[0:22] <bstillwell> that worked! :)
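
A sketch of the ceph.conf layout this lands on, with one uniquely named section per gateway instance; only the section naming comes from the conversation, while the second hostname, keyring paths and socket paths are illustrative assumptions:

    [client.radosgw.gateway1]
    host = ceph001
    keyring = /etc/ceph/keyring.radosgw.gateway1
    rgw socket path = /var/run/ceph/radosgw.gateway1.sock

    [client.radosgw.gateway2]
    host = ceph002
    keyring = /etc/ceph/keyring.radosgw.gateway2
    rgw socket path = /var/run/ceph/radosgw.gateway2.sock
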
[0:23] * AfC (~andrew@1.147.141.141) has joined #ceph
[0:24] * madkiss (~madkiss@srvnet-01-055.ikbnet.co.at) has joined #ceph
[0:25] * dmsimard (~Adium@108.163.152.2) Quit (Ping timeout: 480 seconds)
[0:27] * diegows (~diegows@200.68.116.185) Quit (Ping timeout: 480 seconds)
[0:28] * a (~a@209.12.169.218) has joined #ceph
[0:28] * a is now known as Guest6852
[0:31] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[0:32] * madkiss (~madkiss@srvnet-01-055.ikbnet.co.at) Quit (Ping timeout: 480 seconds)
[0:33] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[0:33] * rturk is now known as rturk-away
[0:35] * ceph-newbie (~oftc-webi@egress.nitrosecurity.com) Quit (Quit: Page closed)
[0:35] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[0:36] * AfC (~andrew@1.147.141.141) Quit (Ping timeout: 480 seconds)
[0:38] * nwf (~nwf@67.62.51.95) has joined #ceph
[0:42] * rturk-away is now known as rturk
[0:47] * JustEra (~JustEra@ALille-555-1-93-207.w90-7.abo.wanadoo.fr) Quit (Quit: This computer has gone to sleep)
[0:51] * sagelap (~sage@63-237-196-66.dia.static.qwest.net) has joined #ceph
[0:53] * BillK (~BillK-OFT@124-169-207-19.dyn.iinet.net.au) has joined #ceph
[0:53] * malcolm_ (~malcolm@silico24.lnk.telstra.net) has joined #ceph
[0:54] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:57] <yehuda_hm> nhm: so, where do we stand wrt the benchmark?
[1:00] * grepory1 (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[1:00] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Read error: Connection reset by peer)
[1:02] * carif (~mcarifio@ip-207-145-81-212.nyc.megapath.net) Quit (Read error: Operation timed out)
[1:06] * grepory1 (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[1:06] * sagelap (~sage@63-237-196-66.dia.static.qwest.net) Quit (Read error: Connection reset by peer)
[1:06] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[1:07] * sagelap (~sage@63-237-196-66.dia.static.qwest.net) has joined #ceph
[1:07] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[1:08] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[1:15] * Tamil1 (~Adium@cpe-108-184-67-162.socal.res.rr.com) Quit (Quit: Leaving.)
[1:15] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Read error: Operation timed out)
[1:16] * Tamil1 (~Adium@cpe-108-184-67-162.socal.res.rr.com) has joined #ceph
[1:18] * Lea (~LeaChim@host86-135-252-168.range86-135.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:25] * madkiss (~madkiss@srvnet-01-055.ikbnet.co.at) has joined #ceph
[1:25] * ScOut3R_ (~scout3r@4E5C2305.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[1:26] * sagelap (~sage@63-237-196-66.dia.static.qwest.net) Quit (Ping timeout: 480 seconds)
[1:32] * sagelap (~sage@63-237-196-66.dia.static.qwest.net) has joined #ceph
[1:37] * erice (~erice@c-98-245-48-79.hsd1.co.comcast.net) has joined #ceph
[2:05] * peetaur (~peter@CPEbc1401e60493-CMbc1401e60490.cpe.net.cable.rogers.com) Quit (Ping timeout: 480 seconds)
[2:10] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) has joined #ceph
[2:11] * rweeks (~rweeks@50-0-136-111.dsl.dynamic.sonic.net) has joined #ceph
[2:11] <malcolm_> As far as tuning RBD cache goes, I've got 4 OSD hosts with 60 disks apiece; would larger than 32MB make sense in my case?
[2:11] <rweeks> hey, anyone around who is familiar with iozone?
[2:11] <malcolm_> Oh, and the OSDs are all connected via IB.
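
For reference, the client-side cache settings under discussion, as a sketch of a [client] section; 32 MB is the default being asked about, and the larger numbers are purely illustrative:

    [client]
    rbd cache = true
    rbd cache size = 67108864          # 64 MB instead of the 32 MB default, illustrative
    rbd cache max dirty = 50331648     # keep below rbd cache size
    rbd cache target dirty = 33554432
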
[2:12] * dmsimard1 (~Adium@108.163.152.66) has joined #ceph
[2:12] * Guest6852 (~a@209.12.169.218) Quit (Quit: This computer has gone to sleep)
[2:15] <rweeks> I'm trying to duplicate some testing that one of my customers is doing, and I'm running into stupid iozone issues.
[2:16] * xarses (~andreww@204.11.231.50.static.etheric.net) Quit (Ping timeout: 480 seconds)
[2:18] <rweeks> http://pastebin.com/cQsfzE3p if anyone has brain cycles to take a look.
[2:18] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) Quit (Ping timeout: 480 seconds)
[2:20] * alram (~alram@38.122.20.226) Quit (Quit: leaving)
[2:24] <gregaf> not that I know how to use iozone, but I'm thinking
[2:24] <gregaf> 1. Can not open temp file: iozone.tmp
[2:24] <gregaf> 2. open: Not a directory
[2:24] <gregaf> is the problem
[2:24] <gregaf> you passing it directories instead of files?
[2:25] <gregaf> err, other way around
[2:27] * Tamil1 (~Adium@cpe-108-184-67-162.socal.res.rr.com) has left #ceph
[2:27] <rweeks> I don't think so - the /mnt/ceph/foo/test in the input is a file
[2:28] <rweeks> I touched the file and then made it 777 just to see if that's the problem
[2:28] <rweeks> but the command syntax says
[2:28] <rweeks> " -F filename filename filename ?
[2:28] <rweeks> Specify each of the temporary file names to be used in the
[2:28] <rweeks> throughput testing. The number of names should be equal to the
[2:28] <rweeks> number of processes or threads that are specified."
[2:29] * sagelap (~sage@63-237-196-66.dia.static.qwest.net) Quit (Read error: Operation timed out)
[2:29] <gregaf> maybe you need to first specify a directory for it to store results in, then?
[2:29] <gregaf> dunno, sorry
[2:30] <rweeks> I will look into that
[2:30] <rweeks> no worries
[2:30] <gregaf> I say that because it's clearly trying to create iozone.tmp and not liking the path it's doing that with
[2:30] <gregaf> use strace and see what the failed syscall is? *shrug*
[2:34] * dmsimard1 (~Adium@108.163.152.66) Quit (Ping timeout: 480 seconds)
[2:34] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) has joined #ceph
[2:34] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) Quit ()
[2:36] <rweeks> oooh!
[2:36] <rweeks> that was a great suggestion
[2:37] <rweeks> it pointed to my problem, which is that there was a missing space between filenames.
[2:37] <rweeks> d'oh
[2:37] <rweeks> thanks greg!
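
The strace approach gregaf suggested, as a rough sketch; the iozone arguments here are illustrative rather than the exact ones from the pastebin:

    # trace file-related syscalls so the failing path shows up
    strace -f -e trace=open,mkdir,stat -o /tmp/iozone.strace \
        iozone -t 2 -s 1g -r 128k -F /mnt/ceph/foo/test1 /mnt/ceph/foo/test2
    # then look for the error behind "Not a directory"
    grep -E 'ENOTDIR|ENOENT' /tmp/iozone.strace
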
[2:39] * madkiss (~madkiss@srvnet-01-055.ikbnet.co.at) Quit (Ping timeout: 480 seconds)
[2:41] * peetaur (~peter@CPEbc1401e60493-CMbc1401e60490.cpe.net.cable.rogers.com) has joined #ceph
[2:50] * angdraug (~angdraug@204.11.231.50.static.etheric.net) Quit (Quit: Leaving)
[2:52] * sarob (~sarob@ip-64-134-224-253.public.wayport.net) has joined #ceph
[2:58] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:59] * yy-nm (~Thunderbi@122.233.44.183) has joined #ceph
[3:00] * dpippenger (~riven@tenant.pas.idealab.com) Quit (Quit: Leaving.)
[3:07] * cofol19861 (~xwrj@110.90.119.113) Quit (Read error: Connection reset by peer)
[3:11] * scuttlemonkey (~scuttlemo@38.127.1.5) has joined #ceph
[3:11] * ChanServ sets mode +o scuttlemonkey
[3:12] * cfreak201 (~cfreak200@p4FF3F447.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[3:15] * DarkAceZ (~BillyMays@50.107.55.36) Quit (Ping timeout: 480 seconds)
[3:15] * lx0 is now known as lxo
[3:20] * rturk is now known as rturk-away
[3:21] * nerdtron (~kenneth@202.60.8.252) has joined #ceph
[3:23] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[3:25] * DarkAceZ (~BillyMays@50.107.55.36) has joined #ceph
[3:27] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[3:27] * scuttlemonkey (~scuttlemo@38.127.1.5) Quit (Ping timeout: 480 seconds)
[3:30] * cfreak200 (~cfreak200@p4FF3E2F9.dip0.t-ipconnect.de) has joined #ceph
[3:31] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) has joined #ceph
[3:32] * madkiss (~madkiss@srvnet-01-055.ikbnet.co.at) has joined #ceph
[3:38] * sarob (~sarob@ip-64-134-224-253.public.wayport.net) Quit (Remote host closed the connection)
[3:38] * sarob (~sarob@ip-64-134-224-253.public.wayport.net) has joined #ceph
[3:38] * glzhao (~glzhao@211.155.113.239) has joined #ceph
[3:41] * madkiss (~madkiss@srvnet-01-055.ikbnet.co.at) Quit (Ping timeout: 480 seconds)
[3:42] * Macheske (~Bram@d5152D87C.static.telenet.be) Quit (Ping timeout: 480 seconds)
[3:44] * sarob (~sarob@ip-64-134-224-253.public.wayport.net) Quit (Read error: Operation timed out)
[3:46] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[3:47] * sjustlaptop (~sam@172.56.21.10) has joined #ceph
[3:48] * a (~a@pool-173-55-143-200.lsanca.fios.verizon.net) has joined #ceph
[3:48] * a is now known as Guest6865
[3:52] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[3:52] * Guest6865 (~a@pool-173-55-143-200.lsanca.fios.verizon.net) Quit ()
[3:53] * a_ (~a@pool-173-55-143-200.lsanca.fios.verizon.net) has joined #ceph
[3:53] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[3:57] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[4:02] * sarob (~sarob@2601:9:7080:13a:1987:da6d:bc7c:8c07) has joined #ceph
[4:06] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:07] * rweeks (~rweeks@50-0-136-111.dsl.dynamic.sonic.net) Quit (Quit: Textual IRC Client: www.textualapp.com)
[4:11] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[4:19] * mech422 (~steve@ip68-2-159-8.ph.ph.cox.net) has joined #ceph
[4:19] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[4:23] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[4:31] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[4:35] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[4:37] * sjustlaptop (~sam@172.56.21.10) Quit (Ping timeout: 480 seconds)
[4:43] * sjm (~sjm@dhcp-108-168-18-236.cable.user.start.ca) has joined #ceph
[4:54] * sarob (~sarob@2601:9:7080:13a:1987:da6d:bc7c:8c07) Quit (Remote host closed the connection)
[4:54] * sarob (~sarob@2601:9:7080:13a:1987:da6d:bc7c:8c07) has joined #ceph
[4:55] * sjustlaptop (~sam@172.56.21.10) has joined #ceph
[4:55] * sjustlaptop (~sam@172.56.21.10) Quit (Read error: Connection reset by peer)
[4:58] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[5:02] * sarob (~sarob@2601:9:7080:13a:1987:da6d:bc7c:8c07) Quit (Ping timeout: 480 seconds)
[5:02] * sagelap (~sage@156.39.10.22) has joined #ceph
[5:06] * fireD_ (~fireD@93-142-250-43.adsl.net.t-com.hr) has joined #ceph
[5:08] * fireD (~fireD@93-142-213-185.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:15] * nerdtron (~kenneth@202.60.8.252) Quit (Quit: Leaving)
[5:17] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[5:31] * sagelap (~sage@156.39.10.22) Quit (Read error: Connection reset by peer)
[5:31] * sagelap (~sage@156.39.10.22) has joined #ceph
[5:32] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[5:34] * madkiss (~madkiss@srvnet-01-055.ikbnet.co.at) has joined #ceph
[5:42] * madkiss (~madkiss@srvnet-01-055.ikbnet.co.at) Quit (Ping timeout: 480 seconds)
[5:46] * sarob (~sarob@2601:9:7080:13a:914d:3f1f:d2f4:7c31) has joined #ceph
[5:49] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[5:56] * dmsimard (~Adium@108.163.152.66) has joined #ceph
[6:05] * yy-nm (~Thunderbi@122.233.44.183) Quit (Remote host closed the connection)
[6:05] * sagelap (~sage@156.39.10.22) Quit (Ping timeout: 480 seconds)
[6:34] * malcolm_ (~malcolm@silico24.lnk.telstra.net) Quit (Ping timeout: 480 seconds)
[6:44] * sarob_ (~sarob@nat-dip6.cfw-a-gci.corp.yahoo.com) has joined #ceph
[6:50] * malcolm_ (~malcolm@131.181.9.246) has joined #ceph
[6:51] * sarob (~sarob@2601:9:7080:13a:914d:3f1f:d2f4:7c31) Quit (Ping timeout: 480 seconds)
[6:56] * malcolm__ (~malcolm@cfcafwp.sgi.com) has joined #ceph
[6:56] * malcolm_ (~malcolm@131.181.9.246) Quit (Read error: Connection reset by peer)
[6:56] * malcolm__ (~malcolm@cfcafwp.sgi.com) Quit (Read error: Connection reset by peer)
[7:00] * malcolm__ (~malcolm@131.181.9.246) has joined #ceph
[7:10] * malcolm__ (~malcolm@131.181.9.246) Quit (Ping timeout: 480 seconds)
[7:10] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[7:12] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[7:17] * capri_on (~capri@212.218.127.222) has joined #ceph
[7:20] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[7:23] * Cube (~Cube@12.248.40.138) has joined #ceph
[7:23] * capri (~capri@212.218.127.222) Quit (Ping timeout: 480 seconds)
[7:26] * sarob_ (~sarob@nat-dip6.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[7:29] * dmsimard (~Adium@108.163.152.66) Quit (Quit: Leaving.)
[7:30] * sagelap (~sage@2600:1012:b029:8e87:a9c1:9e98:20c8:9735) has joined #ceph
[7:33] * a_ (~a@pool-173-55-143-200.lsanca.fios.verizon.net) Quit (Quit: This computer has gone to sleep)
[7:40] * malcolm__ (~malcolm@silico24.lnk.telstra.net) has joined #ceph
[7:43] * wenjianhn (~wenjianhn@222.129.35.164) has joined #ceph
[7:57] * sagelap (~sage@2600:1012:b029:8e87:a9c1:9e98:20c8:9735) Quit (Read error: Connection reset by peer)
[8:00] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[8:06] * foosinn (~stefan@office.unitedcolo.de) has joined #ceph
[8:15] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[8:16] * sarob (~sarob@nat-dip6.cfw-a-gci.corp.yahoo.com) has joined #ceph
[8:32] * madkiss (~madkiss@089144192241.atnat0001.highway.a1.net) has joined #ceph
[8:35] * rendar (~s@host154-179-dynamic.12-79-r.retail.telecomitalia.it) has joined #ceph
[8:36] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[8:40] * allsystemsarego (~allsystem@188.25.131.49) has joined #ceph
[8:49] * sarob (~sarob@nat-dip6.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[8:52] * malcolm__ (~malcolm@silico24.lnk.telstra.net) Quit (Read error: Operation timed out)
[9:06] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[9:06] * ChanServ sets mode +v andreask
[9:16] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[9:16] <Kioob`Taff> Hi
[9:17] <Kioob`Taff> from one kernel client I have a lot of errors like that : Sep 17 09:15:41 alg kernel: [90186.020705] libceph: osd47 10.0.0.9:6807 socket error on read
[9:17] * JustEra (~JustEra@89.234.148.11) has joined #ceph
[9:17] <Kioob`Taff> and on the osd47, I have : 10.0.0.9:6807/13094 submit_message osd_op_reply(37035805 rb.0.32576a.238e1f29.000000002314 [write 892928~4096] ondisk = 0) v4 remote, 10.0.0.11:0/3966050692, failed lossy con, dropping message 0x16c9f200
[9:18] * jbd_ (~jbd_@2001:41d0:52:a00::77) has joined #ceph
[9:18] <yanzheng> that's normal
[9:18] <Kioob`Taff> but... I have a lot of lines of this error in the logs, filling the disk. And the block device hangs
[9:19] <Kioob`Taff> (the RBD)
[9:21] <yanzheng> maybe you have a network problem
[9:22] <Kioob`Taff> mmm
[9:22] <Kioob`Taff> The socket really is open, and other OSDs on the same host work. :/
[9:22] <Kioob`Taff> I will check
[9:23] * nigwil (~chatzilla@2001:44b8:5144:7b00:f870:e853:7acb:4154) has joined #ceph
[9:23] * mnash (~chatzilla@vpn.expressionanalysis.com) Quit (Remote host closed the connection)
[9:24] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (Quit: Leaving.)
[9:24] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:27] <Kioob`Taff> Is there a way to verify connectivity with ceph ? with netcat I can open the socket and see the first OSD message, starting with "ceph v027&3"
[9:27] <nigwil> getting the ceph client installed on CentOS 6.4 is an exercise in frustration. Is it planned to be fixed one day?
[9:28] <Kioob`Taff> For me it looks like a problem in kernel client... not a network problem.
[9:28] * LeaChim (~LeaChim@host86-135-252-168.range86-135.btcentralplus.com) has joined #ceph
[9:35] * madkiss (~madkiss@089144192241.atnat0001.highway.a1.net) Quit (Quit: Leaving.)
[9:41] <nigwil> Kioob`Taff: if you do ceph osd tree, you can see all your OSDs
[9:45] <Kioob`Taff> nigwil: mmm yes, I know that. But all my OSDs work. I only have one client which has problems.
[9:46] <nigwil> what error are you seeing?
[9:50] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) Quit (Ping timeout: 480 seconds)
[9:51] <Kioob`Taff> I solved that by rebooting it.
[9:51] <Kioob`Taff> from one kernel client I have a lot of errors like that : Sep 17 09:15:41 alg kernel: [90186.020705] libceph: osd47 10.0.0.9:6807 socket error on read
[9:51] <Kioob`Taff> and on the osd47, I have : 10.0.0.9:6807/13094 submit_message osd_op_reply(37035805 rb.0.32576a.238e1f29.000000002314 [write 892928~4096] ondisk = 0) v4 remote, 10.0.0.11:0/3966050692, failed lossy con, dropping message 0x16c9f200
[9:52] <nigwil> I wonder if your network connection is dropping out?
[9:52] <Kioob`Taff> So, only one OSD from one client is unreachable. After the reboot it works, and this client was able to connect to the other OSDs
[9:52] <Kioob`Taff> The network was working fine
[9:52] <Kioob`Taff> Only the kernel client was complaining about "socket error"
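
A rough checklist for the situation Kioob`Taff describes, using the OSD address from the log; nothing here is specific to the suspected kernel-client bug:

    # the cluster's view of the OSD
    ceph osd tree | grep osd.47
    # reachability of the socket itself; a live OSD answers with a
    # banner that starts with "ceph v0"
    nc -v 10.0.0.9 6807
    # the kernel client's own messages on the affected host
    dmesg | grep libceph
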
[9:59] * roald (~roaldvanl@139-63-21-115.nodes.tno.nl) has joined #ceph
[10:01] * jcfischer_ (~fischer@user-23-19.vpn.switch.ch) has joined #ceph
[10:06] * jcfischer (~fischer@macjcf.switch.ch) Quit (Ping timeout: 480 seconds)
[10:06] * jcfischer_ is now known as jcfischer
[10:12] * scuttlemonkey (~scuttlemo@38.127.1.5) has joined #ceph
[10:12] * ChanServ sets mode +o scuttlemonkey
[10:15] * shimo_ (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[10:16] * wenjianhn (~wenjianhn@222.129.35.164) Quit (Ping timeout: 480 seconds)
[10:17] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[10:17] * shimo_ is now known as shimo
[10:19] * jcfischer_ (~fischer@macjcf.switch.ch) has joined #ceph
[10:22] * jcfischer (~fischer@user-23-19.vpn.switch.ch) Quit (Ping timeout: 480 seconds)
[10:22] * jcfischer_ is now known as jcfischer
[10:25] * mozg (~oftc-webi@host86-184-120-250.range86-184.btcentralplus.com) has joined #ceph
[10:25] <mozg> hello guys
[10:26] * wenjianhn (~wenjianhn@222.129.35.164) has joined #ceph
[10:35] * hggh (~jonas@aljona.brachium-system.net) has joined #ceph
[10:37] * wenjianhn (~wenjianhn@222.129.35.164) Quit (Ping timeout: 480 seconds)
[10:37] * wenjianhn (~wenjianhn@222.129.35.164) has joined #ceph
[10:39] <mozg> is anyone here running database servers on ceph cluster?
[10:39] <mozg> i was hoping we could have a chat on how to improve the performance
[10:40] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) Quit (Quit: Leaving.)
[10:42] * scuttlemonkey (~scuttlemo@38.127.1.5) Quit (Ping timeout: 480 seconds)
[10:44] * Cube (~Cube@12.248.40.138) Quit (Read error: Operation timed out)
[10:45] * sjm (~sjm@dhcp-108-168-18-236.cable.user.start.ca) Quit (Read error: Operation timed out)
[10:57] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[10:59] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) Quit (Remote host closed the connection)
[11:04] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) has joined #ceph
[11:05] * agh (~oftc-webi@gw-to-666.outscale.net) has joined #ceph
[11:07] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Ping timeout: 480 seconds)
[11:08] <agh> Hello
[11:08] <agh> I need help on radosgw. anyone ?
[11:09] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[11:12] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[11:13] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[11:17] * shimo_ (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[11:21] * claenjoy (~leggenda@37.157.33.36) has joined #ceph
[11:21] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[11:21] * shimo_ is now known as shimo
[11:28] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[11:44] <agh> I've a big issue with RadosGW
[11:44] <agh> :'(
[11:44] <agh> I can connect with my creds
[11:44] <agh> i can create a bucket
[11:44] <agh> but, I CAN'T upload a file into it
[11:45] <agh> here is the logfile : http://pastebin.com/6NNuczC5
[11:45] <agh> I need your help
[11:48] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Read error: Operation timed out)
[11:49] <joelio> agh: how big is your file?
[11:49] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[11:49] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[11:50] <joelio> agh: main point here, if you're not using chunked encoding, the largest POST is 2Gb
[11:50] <agh> joelio: it's a little file :) 2KB
[11:50] <agh> joelio: it does not work with any file, little or big
[11:51] <agh> And now, i've a lot of these lines in my radosgw.log :
[11:51] <agh> 2013-09-17 09:50:34.490035 7fee56030700 1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7fee113f7700' had timed out after 600
[11:51] <joelio> agh: how are you uploading?
[11:51] <agh> with CyberDuck
[11:51] <agh> joelio: but the fact is, i did a PoC 3 days ago, and it worked !!
[11:52] <agh> joelio: I purged my cluster and reinstalled it to do a clean one for prod. But it doesn't work
[11:52] * Rocky (~r.nap@188.205.52.204) has joined #ceph
[11:54] <joelio> agh: did it recreate the pools (does automatically)
[11:55] <joelio> .rgw.root,5 .rgw.control,6 .rgw,7 .rgw.gc,8 .users.uid,9 .users.email,10 .users,11 .rgw.buckets.index,12 .rgw.buckets,
[11:55] <joelio> although the automatic creation only gives 8 PGs per pool, annoyingly :)
[11:56] <hggh> having 3 osds in the same server, do I need a custom crush map to ensure that data is replicated to another server, rather than saved on different osds in the same hardware?
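
For the question above: the stock CRUSH rule usually already places replicas on different hosts, so a custom map is only needed if that has been changed. A quick way to check, sketched here:

    # decompile the CRUSH map and inspect the rule in use
    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
    # the default replicated rule contains a step like
    #   step chooseleaf firstn 0 type host
    # if it says "type osd" instead, replicas can land on one server
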
[11:58] * ScOut3R (~ScOut3R@catv-89-133-25-52.catv.broadband.hu) has joined #ceph
[11:59] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[12:01] * gucki (~smuxi@HSI-KBW-109-192-187-143.hsi6.kabel-badenwuerttemberg.de) has joined #ceph
[12:05] <joelio> Any raring packages for fastcgi/apache2 mod available?
[12:07] * ScOut3R (~ScOut3R@catv-89-133-25-52.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[12:09] * gucki_ (~smuxi@HSI-KBW-109-192-187-143.hsi6.kabel-badenwuerttemberg.de) has joined #ceph
[12:09] * gucki_ (~smuxi@HSI-KBW-109-192-187-143.hsi6.kabel-badenwuerttemberg.de) Quit (Read error: Connection reset by peer)
[12:11] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Read error: Operation timed out)
[12:18] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[12:21] * wenjianhn (~wenjianhn@222.129.35.164) Quit (Ping timeout: 480 seconds)
[12:34] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) Quit (Ping timeout: 480 seconds)
[12:34] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[12:34] * ChanServ sets mode +v andreask
[12:39] * glzhao (~glzhao@211.155.113.239) Quit (Quit: leaving)
[12:39] <agh> joelio: yes pools are here :
[12:40] <agh> 0 data,1 metadata,2 rbd,13 .rgw.root,14 .rgw.control,15 .rgw,16 .rgw.gc,17 .users.uid,18 .users,19 .rgw.buckets.index,20 .rgw.buckets,
[12:41] * ShaunR (~ShaunR@staff.ndchost.com) Quit (Ping timeout: 480 seconds)
[12:42] * yanzheng (~zhyan@101.83.192.202) has joined #ceph
[12:43] <agh> I've a big issue with RadosGW
[12:45] <joelio> agh: I'd say it's an issue with how you've got it set up - as you've had it working already, right?
[12:47] <agh> joelio: yes
[12:47] <agh> joelio: that's why i'm really desperate now
[12:47] <agh> agh: i really do not understand
[12:48] <agh> it seems that radosgw can't talk with the osds
[12:49] <joelio> agh: did you set up the auth caps properly?
[12:49] <joelio> can you list buckets and create them?
[12:49] <agh> joelio: yes, i can list buckets and create
[12:49] <agh> joelio: for auth caps i did that : ceph-authtool -n client.radosgw.gateway --cap mon 'allow rw' --cap osd 'allow rwx' keyring.radosgw.gateway
[12:50] <joelio> yea, looks right
[12:52] <joelio> agh: where is the error on that paste, I'm having trouble finding it :)
[12:52] <joelio> 2013-09-17 08:12:58.740063 7f3e44efb820 0 ERROR: FCGX_Accept_r returned -4
[12:52] <joelio> ?
[12:52] <agh> joelio: there is no error I think ! That's the problem :(
[12:53] * yanzheng (~zhyan@101.83.192.202) Quit (Ping timeout: 480 seconds)
[12:53] <agh> But, Apache tells me :
[12:53] <agh> [Tue Sep 17 09:44:25 2013] [error] [client 46.231.147.8] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi" [Tue Sep 17 10:51:09 2013] [error] [client 46.231.147.8] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
[13:01] <andreask> agh: any chance your osds ran out of space?
[13:12] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Read error: Connection reset by peer)
[13:19] * zhangjf_zz2 (~zjfhappy@222.128.1.105) has joined #ceph
[13:19] * zhangjf_zz2 (~zjfhappy@222.128.1.105) Quit ()
[13:19] * ScOut3R (~scout3r@4E5C2305.dsl.pool.telekom.hu) has joined #ceph
[13:26] <joelio> what's the largest object you can store in s3? Seems like I can only get so much data in an object before I get 2013-09-17 12:20:56.538826 7fddfd7fa700 20 get_obj_state: s->obj_tag was set empty
[13:26] <joelio> 2013-09-17 12:20:56.538841 7fddfd7fa700 20 prepare_atomic_for_write_impl: state is not atomic. state=0x7fdda08120d8
[13:27] <joelio> unless that's a red herring and the object has been stored properly and those errors are post load
[13:32] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[13:35] <joelio> hmm, yea, a 4.7Gb file fails with a 500
[13:36] * nigwil (~chatzilla@2001:44b8:5144:7b00:f870:e853:7acb:4154) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 23.0.1/20130814063812])
[13:45] <joelio> yea, quick googling shows loads of the same errors on the list
[13:45] <joelio> kinda sucks
[13:48] <joelio> 2013-09-17 12:48:47.351960 7fddeffdf700 20 prepare_atomic_for_write_impl: state is not atomic. state=0x7fdd2c422f18
[13:49] <joelio> 2013-09-17 12:48:47.365536 7fddeffdf700 0 WARNING: set_req_state_err err_no=27 resorting to 500
[13:49] <joelio> :(
[13:49] <joelio> ceph version 0.67.3 (408cd61584c72c0d97b774b3d8f95c6b1b06341a)
[13:52] <joelio> "osd_max_attr_size": "65536",
[13:52] <andreask> joelio: i think you need to use multipart object upload
[13:52] <joelio> ^ has no effect either
[13:53] <joelio> andreask: I'm using s3 gems - http://ceph.com/docs/next/radosgw/s3/ruby/
[13:53] <joelio> I'd hope the store method does multipart
[13:54] <joelio> andreask: also, people on the list are having the same issue
[13:55] <andreask> joelio: hmm ... at least I see here a special AWS::S3::MultipartUpload class
[13:56] <joelio> really? right, I'll give that a go!
[13:56] * yanzheng (~zhyan@134.134.139.70) has joined #ceph
[13:57] <andreask> joelio: http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html#multipart_upload-instance_method
[13:58] <joelio> the docs on Ceph reference - http://amazon.rubyforge.org/doc/
[13:58] <joelio> can't see that in there, old perhaps?
[13:58] * haomaiwang (~haomaiwan@211.155.113.239) Quit (Remote host closed the connection)
[13:58] <andreask> I only googled ;-)
[13:59] * haomaiwang (~haomaiwan@117.79.232.243) has joined #ceph
[14:00] * haomaiwang (~haomaiwan@117.79.232.243) Quit (Remote host closed the connection)
[14:01] <joelio> hmm, .store is a streamed upload method
[14:01] <joelio> I think it was already right
[14:02] * wenjianhn (~wenjianhn@222.129.35.164) has joined #ceph
[14:03] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[14:04] * haomaiwang (~haomaiwan@211.155.113.239) has joined #ceph
[14:04] <joelio> andreask: I don't know what that link you provided is, but there are no methods for that
[14:05] * claenjoy (~leggenda@37.157.33.36) Quit (Remote host closed the connection)
[14:05] <joelio> also, if you have to force multipart in some way and the gem doesn't support it, the docs should really show a method that works
[14:05] <joelio> but I really don't think that's the issue tbh
[14:06] <andreask> joelio: this is for the official AWS Ruby SDK
[14:06] <joelio> yea, which isn't the aws/s3 gem
[14:06] <andreask> yes
[14:07] <joelio> I'll try this one instead
[14:08] * madkiss (~madkiss@089144192241.atnat0001.highway.a1.net) has joined #ceph
[14:08] <joelio> I just find it ironic, people are having the same issue using client side tools like cyberduck
[14:09] <joelio> if they don't support multipart, what's the point in them?
[14:09] <joelio> and tbh I think they will...
[14:11] <andreask> cyberduck does S3 multipart uploads IIRC
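
One way to force multipart from the command line while debugging this, sketched with s3cmd (not a tool anyone above is using, so purely an assumed alternative); the endpoint and keys in ~/.s3cfg are placeholders:

    # ~/.s3cfg pointed at the gateway rather than Amazon (illustrative values):
    #   host_base = rgw.example.com
    #   host_bucket = %(bucket)s.rgw.example.com
    #   access_key = <key>
    #   secret_key = <secret>
    # upload in 15 MB parts so no single PUT carries the whole object
    s3cmd put --multipart-chunk-size-mb=15 bigfile.bin s3://testbucket/
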
[14:11] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[14:13] * claenjoy (~leggenda@37.157.33.36) has joined #ceph
[14:17] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[14:18] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has left #ceph
[14:19] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:21] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:22] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[14:26] * AfC (~andrew@2001:44b8:31cb:d400:2ad2:44ff:fe08:a4c) has joined #ceph
[14:29] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[14:29] <peetaur> someone please convince me thoroughly whether I should use consumer SATA, nearline/enterprise SATA or SAS disks for a small cluster :)
[14:29] <peetaur> hadoop clusters, for example, can use consumer SATA without any issues... if you want more performance, scale horizontally
[14:31] <peetaur> I will probably use 2 SSDs with 3 OSD journals each ... so then maybe the consumer SATA vs enterprise SATA vs SAS is not so relevant (since the SSDs or network would be the bottleneck)
[14:41] <joelio> hmm, giving this s3 stuff up as a bad joke
[14:41] <joelio> shame, as we wanted to use it as an object store for video captures
[14:42] <joelio> kinda pointless if I can only upload a bit
[14:42] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:43] <andreask> joelio: oh ... aws sdk libraries don't work for you?
[14:47] <jerker> peetaur: i'm going consumer SATA - SSD and HDD. small office server. low end.
[14:50] * AfC (~andrew@2001:44b8:31cb:d400:2ad2:44ff:fe08:a4c) Quit (Ping timeout: 480 seconds)
[14:53] <joelio> andreask: afraid not
[14:53] * madkiss (~madkiss@089144192241.atnat0001.highway.a1.net) Quit (Remote host closed the connection)
[14:53] * sjm (~sjm@dhcp-108-168-18-236.cable.user.start.ca) has joined #ceph
[14:54] <joelio> andreask: I'm getting XML errors on the stream method. I think that's down to some zone issue. Even using client side tools that should work, still same issue
[14:54] <andreask> joelio: you mean you receive an error from the gateway?
[14:55] <joelio> get 500 on large uploads
[14:55] <andreask> joelio: and at which size do you split them?
[14:56] <joelio> I'm using a client which supports multipart, so not entirely sure how it's chunking
[14:56] * wenjianhn (~wenjianhn@222.129.35.164) Quit (Quit: Leaving)
[14:57] * mozg (~oftc-webi@host86-184-120-250.range86-184.btcentralplus.com) Quit (Remote host closed the connection)
[14:57] <joelio> andreask: the sdk - afaik - doesn't allow the setting of a server name too.. unless I'd set that as REGION?
[14:58] <joelio> but the sdk is not referenced in Ceph's docs
[14:58] <joelio> the aws-s3 gem is however
[14:58] * diegows (~diegows@190.190.11.42) has joined #ceph
[14:58] <andreask> joelio: yes, that can be a problem ... they only allow official s3 servers
[15:02] <andreask> joelio: any errors from the radosgw or apache?
[15:03] <joelio> I'll gist some up now
[15:09] * ScOut3R (~scout3r@4E5C2305.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[15:12] <joelio> andreask: https://gist.github.com/joelio/050059e3c8b266defc2b
[15:12] <joelio> is the rgw
[15:12] <joelio> WARNING: set_req_state_err err_no=27 resorting to 500
[15:12] <joelio> http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/3664
[15:12] * markbby (~Adium@168.94.245.3) has joined #ceph
[15:13] <andreask> joelio: yeah, I found this http://www.spinics.net/lists/ceph-users/msg04042.html
[15:13] <andreask> you might give this a try
[15:14] <joelio> I already have done that
[15:14] <joelio> osd max attr size = 655360
[15:14] <joelio> no good
[15:14] <joelio> changed logging.. still get all the logspam and eventual error
[15:14] <joelio> checked using the admin socket that the config applied.. it has
[15:15] <joelio> increased PG's yesterday (as 8 sucks for a default!)
[15:15] <andreask> yep
[15:16] <joelio> and I'm using the precise provided packages (running on raring, but that'll be ok)
[15:17] <joelio> as there are no debs in gitbuilder for raring with 100-continue support
[15:17] <joelio> there is a git repo, but a dpkg-buildpackage fails spectacularly
[15:18] <joelio> even using precise sources doesn't work (to get builddeps) as the upstream has moved significantly since that package was built
[15:18] <ccourtaut> hi there, i'm trying to run some teuthology tests, but got failures
[15:19] <ccourtaut> any ideas on how i could find out more precisely what's happening here
[15:22] <joelio> andreask: I also see several RGWDataChangesLog::ChangesRenewThread: start while uploading
[15:22] <joelio> before it all turns to errors
[15:22] <joelio> so I guess the multipart is working
[15:22] <joelio> just rgw isn't
[15:23] <andreask> joelio: hmm ... I see there this (solved) bug in the tracker http://tracker.ceph.com/issues/6111
[15:24] <joelio> andreask: ooooh, interesting, thanks
[15:24] <joelio> this is on a test cluster, so I'll try that
[15:24] <andreask> joelio: looks suspicious
[15:26] * sagelap (~sage@2600:1012:b015:14d:dcbb:1888:17ac:b4e3) has joined #ceph
[15:29] * sagelap (~sage@2600:1012:b015:14d:dcbb:1888:17ac:b4e3) Quit (Read error: No route to host)
[15:30] <joelio> oh, I like the client io addition to dev
[15:31] <joelio> andreask: nope, same thing
[15:32] <andreask> joelio: looks worth posting to the mailing-list so devs can comment
[15:32] <joelio> will do
[15:34] <joelio> another (minor) thing is that there are slashes in the secret_key - don't know if that's supposed to be part of the spec, but in the docs, the hash uses single quotes.
[15:34] <joelio> not going to work :)
[15:35] <joelio> I have slashes in my keys
[15:37] * iii8 (~Miranda@91.207.132.71) Quit (Read error: Connection reset by peer)
[15:43] * yanzheng (~zhyan@134.134.139.70) Quit (Remote host closed the connection)
[15:46] * mnash (~chatzilla@vpn.expressionanalysis.com) has joined #ceph
[15:46] <joelio> yea, just tested with a 700 MB file, works great both uploading and downloading
[15:47] <joelio> same issue as the guys on the list have
[15:47] <joelio> soooo close, yet soooo far :)
[15:48] <andreask> ;-) ... I'd say that is something Yehuda can help you with
[15:51] * BillK (~BillK-OFT@124-169-207-19.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[15:56] <peetaur> jerker: okay thanks. But I'll keep thinking about it. I'm likely to go with consumer SATA and small cheap enterprise SSD, and scale out horizontally unless I find a good reason not to.
[16:03] * markbby1 (~Adium@168.94.245.2) has joined #ceph
[16:06] * markbby (~Adium@168.94.245.3) Quit (Remote host closed the connection)
[16:07] * allsystemsarego (~allsystem@188.25.131.49) Quit (Quit: Leaving)
[16:13] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[16:19] * sagelap (~sage@2600:1012:b015:14d:811c:4a63:828:ae31) has joined #ceph
[16:23] <Kioob`Taff> « rbd: error: image still has watchers »
[16:23] <Kioob`Taff> any way to find where ?
[16:24] * dmsimard (~Adium@108.163.152.2) has joined #ceph
[16:24] <Kioob`Taff> because I don't find any client referencing this image
[16:24] <Kioob`Taff> (for 5 minutes now)
[16:26] <Kioob`Taff> « This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout. »
[16:26] <Kioob`Taff> maybe the timeout is greater than 30s ?
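
A sketch of how to find what still holds the image open, assuming a format 1 image named myimage in the rbd pool (the name is a placeholder); the header object is what carries the watch:

    # format 1 images keep their header in <name>.rbd
    rados -p rbd listwatchers myimage.rbd
    # the output lists the watcher's client address; on that host:
    rbd showmapped
    rbd unmap /dev/rbd0
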
[16:26] <agh> Hello
[16:26] <agh> I can't succeed with RadosGW
[16:26] <agh> I followed the docs, on CentOS 6.4 with Ceph 0.67.3
[16:27] <agh> here is the output log of radosgw just after start :
[16:27] <agh> 2013-09-17 14:25:39.064951 7f4c09917820 0 ERROR: FCGX_Accept_r returned -4
[16:27] <agh> do you know what this error is?
[16:27] <agh> it's really strange
[16:27] <agh> I can create bucket
[16:27] <agh> I can create folders in the bucket
[16:28] <agh> PUT /test/truc%2F HTTP/1.1" 200 - "-" "Cyberduck/4.2.1 (Mac OS X/10.8.4) (i386)"
[16:28] <agh> but, I CAN'T upload a file (even a little one) :
[16:28] <agh> "PUT /test/troc%2Fimage001.png HTTP/1.1" 500 534 "-" "Cyberduck/4.2.1 (Mac OS X/10.8.4) (i386)"
[16:28] <agh> error 500
[16:28] <agh> I really do not understand
[16:31] * leseb (~leseb@88-190-214-97.rev.dedibox.fr) Quit (Ping timeout: 480 seconds)
[16:32] * sbadia (~sbadia@yasaw.net) Quit (Ping timeout: 480 seconds)
[16:35] * gregmark (~Adium@68.87.42.115) has joined #ceph
[16:36] * odi (~quassel@2a00:12c0:1015:136::9) Quit (Remote host closed the connection)
[16:36] * odi (~quassel@2a00:12c0:1015:136::9) has joined #ceph
[16:37] <joelio> agh: checked the perms on the fcgi too?
[16:37] <agh> yes, chmod +x and apache user
[16:38] <joelio> fastcgi is spawned as an external process
[16:38] <agh> joelio: it's really strange
[16:38] <agh> i've been on it for 2 days :!
[16:38] <joelio> tell me about it, having rgw issues myself :)
[16:39] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[16:41] * odi (~quassel@2a00:12c0:1015:136::9) Quit (Remote host closed the connection)
[16:41] * odi (~quassel@2a00:12c0:1015:136::9) has joined #ceph
[16:42] <agh> i am desperate
[16:42] <jcfischer> ahrg
[16:42] <jcfischer> (sorry)
[16:43] <jcfischer> I have played around with snapshots on CephFS and managed to reliably crash our two MDS
[16:43] <agh> is there an radosgw expert there ?
[16:43] <jcfischer> and they don't come up any more either (0.61.8)
[16:45] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[16:45] <jcfischer> http://pastebin.com/BpireiNB
[16:46] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has left #ceph
[16:46] <jcfischer> this kind of sucks really bad, because we put our OpenStack VMs on CephFS
[16:47] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[16:47] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[16:47] <tsnider> I'm trying to set up a new Ceph cluster following the instructions on http://ceph.com/docs/next/start/quick-ceph-deploy/ . ceph-deploy successfully installed the server nodes (at least on the surface). Adding a monitor failed with No such file or directory: '/etc/ceph/ceph.conf.24239.tmp'. Details in http://paste.openstack.org/show/47154/. Can someone tell me what I missed? thx
[16:47] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[16:48] <alfredodeza> tsnider: have you tried the command that failed on the remote host?
[16:48] <alfredodeza> this --> ceph-mon --cluster ceph --mkfs -i controller21 --keyring /var/lib/ceph/tmp/ceph-controller21.mon.keyring
[16:48] <alfredodeza> ah wait
[16:48] <alfredodeza> your error is before that
[16:49] * leseb (~leseb@88-190-214-97.rev.dedibox.fr) has joined #ceph
[16:51] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[16:52] <jcfischer> so this is the assertion that brings the mds down immediately: 2013-09-17 16:51:15.620817 7ffead5f6700 -1 mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*, MDSlaveUpdate*)' thread 7ffead5f6700 time 2013-09-17 16:51:15.620077
[16:52] <jcfischer> mds/journal.cc: 1173: FAILED assert(in->first == p->dnfirst || (in->is_multiversion() && in->first > p->dnfirst))
[16:53] <jcfischer> Is there anything I can do to make the MDS come up?
[16:54] * sagelap (~sage@2600:1012:b015:14d:811c:4a63:828:ae31) Quit (Read error: Connection reset by peer)
[16:54] <joao> tsnider, here
[16:54] <tsnider> joao: ok
[16:55] * sagelap (~sage@2600:1012:b015:14d:a9c1:9e98:20c8:9735) has joined #ceph
[16:55] <joao> ceph-deploy should have created the mon dir
[16:55] <joao> are you using ceph-deploy?
[16:55] <joao> or are you deploying the monitors manually?
[16:55] <tsnider> joao: didn't know how ceph operated / or which IRC was active
[16:56] <tsnider> joao - ceph-deploy
[16:56] <joao> both are
[16:56] <joao> #ceph-devel is less than a week old :)
[16:56] <tsnider> joao:following http://ceph.com/docs/next/start/quick-start-preflight/ ...
[16:56] <joao> tsnider, can you check the existence of the mon's data dir?
[16:57] <joao> defaults to /var/lib/ceph/mon/
[16:57] <joao> i.e., that directory should be populated by one or more other directories in the form of 'ceph-ID'
[16:58] <joao> oh nop
[16:58] <joao> forget it
[16:58] <joao> not the issue
[16:58] <joao> only now did I notice you had pasted the resulting output from ceph-deploy
[16:58] <tsnider> joao: /var/lib/ceph exists
[16:58] <tsnider> ceph@controller21:~$ ls -R /var/lib/ceph
[16:58] <tsnider> /var/lib/ceph:
[16:58] <tsnider> mon tmp
[16:58] <tsnider> /var/lib/ceph/mon:
[16:58] <tsnider> ceph-controller21
[16:58] <tsnider> /var/lib/ceph/mon/ceph-controller21:
[16:58] <tsnider> /var/lib/ceph/tmp:
[16:58] <tsnider> ceph-controller21.mon.keyring
[16:58] <joao> alfredodeza, ping
[16:59] <jcfischer> so this is a known bug: http://tracker.ceph.com/issues/5250 - I guess I need to manually build mds and comment out that assertion
[16:59] <alfredodeza> joao: pong
[16:59] <joao> alfredodeza, http://paste.openstack.org/show/47154/
[16:59] <joao> looks familiar?
[16:59] <alfredodeza> it does not, but I am looking into it
[17:00] <joao> alfredodeza, is there a ticket I can point tsnider to?
[17:00] <alfredodeza> joao: searching right now
[17:00] <tsnider> maybe I need to create it manually and move on
[17:03] <jcfischer> trying to rebuild the mds now and commenting out that assertion
[17:03] <alfredodeza> tsnider: does your user have super user permissions?
[17:03] <tsnider> yeah
[17:05] <alfredodeza> tsnider: are you able to replicate that every single time you try?
[17:05] <tsnider> alfredodeza: assuming that:
[17:05] <tsnider> ssh user@ceph-server
[17:05] <tsnider> sudo useradd -d /home/ceph -m ceph
[17:05] <tsnider> sudo passwd ceph
[17:05] <tsnider> echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
[17:05] * sagelap (~sage@2600:1012:b015:14d:a9c1:9e98:20c8:9735) Quit (Read error: Connection reset by peer)
[17:05] <tsnider> sudo chmod 0440 /etc/sudoers.d/ceph
[17:05] <tsnider> is correct
[17:05] <alfredodeza> yes that is correct
[17:06] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[17:06] * foosinn (~stefan@office.unitedcolo.de) Quit (Quit: Leaving)
[17:07] <alfredodeza> tsnider: on that remote host, what are the contents of `/etc/ceph/*` ?
[17:08] <tsnider> alfredodeza: replicates trying it as user ceph or root
[17:08] <alfredodeza> this looks like we are being overly protect that temp file, which should not be the case
[17:08] <alfredodeza> *protecting
[17:08] <alfredodeza> we could make a proper temporary file and use that, right now we are doing os.getpid() which is kind of brittle
[17:09] <alfredodeza> tsnider: if you could run `ls -alh /etc/ceph/*` and paste the results of that, it would definitely help
[17:09] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[17:10] <tsnider> alfredodeza: hmm: /etc/ceph doesn't exist on the node. :( ??
[17:10] <alfredodeza> aha
[17:10] <alfredodeza> even more telling
[17:11] <alfredodeza> did you actually install ceph in the remote host?
[17:11] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:11] <alfredodeza> when ceph gets installed, it creates /etc/ceph/
[17:11] <alfredodeza> tsnider: ^ ^
[17:12] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[17:12] <tsnider> alfredodeza: ok -- instructions:
[17:12] <tsnider> To install Ceph on your server node, open a command line on your admin node and type the following: ceph-deploy install {server-node-name}[,{server-node-name}] / ceph-deploy install mon-ceph-node
[17:13] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[17:13] <alfredodeza> sorry, I am not following you. Did you actually install ceph on the remote host
[17:13] <tsnider> alfredodeza: ceph-deploy install mon-ceph-node should be
[17:13] <tsnider> ceph-deploy install {mon-ceph-node}
[17:14] <tsnider> alfredodeza: ah --- installed on the servers but not on the monitor (oops) (now leaving quietly & quickly ;) )
[17:14] <alfredodeza> you need to have ceph installed in all the nodes you want to deploy monitors to
[17:22] <tsnider> alfredodza: yeah -- that helped -- on http://ceph.com/docs/next/start/quick-ceph-deploy/ is this:
[17:22] <tsnider> ceph-deploy mon create {mon-server-name}
[17:22] <tsnider> ceph-deploy mon create mon-ceph-node
[17:22] <tsnider> one or two commands. If two, what's the difference between mon-server-name and mon-node-name?
[17:25] <tsnider> alfredodza: oh never mind; one's the server and the other's the monitor
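
The ordering that bit here, as a sketch of the quick-start flow with placeholder hostnames; the point is that ceph-deploy install has to reach every node that will run a daemon before mon create is attempted:

    ceph-deploy new mon-node                  # writes the initial ceph.conf
    ceph-deploy install mon-node osd-node1    # installs packages and creates /etc/ceph
    ceph-deploy mon create mon-node
    ceph-deploy gatherkeys mon-node           # collects the bootstrap keyrings
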
[17:25] * alram (~alram@38.122.20.226) has joined #ceph
[17:25] * angdraug (~angdraug@204.11.231.50.static.etheric.net) has joined #ceph
[17:27] * JustEra (~JustEra@89.234.148.11) Quit (Ping timeout: 480 seconds)
[17:31] * scuttlemonkey (~scuttlemo@38.127.1.5) has joined #ceph
[17:31] * ChanServ sets mode +o scuttlemonkey
[17:33] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) has joined #ceph
[17:34] * ShaunR (~ShaunR@staff.ndchost.com) has joined #ceph
[17:34] <claenjoy> hey guys, I need to create an osd on the machine. I have already formatted the partition /dev/sdb with an xfs filesystem and will use ceph-deploy osd create machine:/dev/sdb. Before launching that command, does /dev/sdb have to be mounted somewhere? Is it correct to:
[17:34] <claenjoy> make a folder, for example "ceph-01", in /var/lib/ceph/osd/ and mount it there? thanks a lot
[17:37] * joelio finds Cyberduck *doesn't* do multipart after all!
[17:37] <joelio> cyberduck 4, yes
[17:37] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has left #ceph
[17:37] <joelio> damn, I'm already using that
[17:38] <tsnider> (am I running into | will I run into) installation problems now and/or execution-time problems later using a short node name vs its full fqdn for commands?
[17:38] <tsnider> e.g ceph-deploy mon monitorNode vs. ceph-deploy mon labA.MonitorNode.mysite.com
[17:38] * mattt (~mattt@92.52.76.140) has joined #ceph
[17:38] <mattt> if i change ceph.conf on my osd nodes, what is the recommended way to get the running processes to re-read that file?
[17:41] * sagelap (~sage@199.106.166.12) has joined #ceph
[17:43] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[17:43] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[17:44] <mattt> also, do you need to explicitly add each osd to your ceph.conf? it doesn't look to be necessary
[17:44] <mattt> but now i'm having some issues restarting osds and wonder if it's related
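
On the reload question, a sketch of the usual options; and for what it's worth, OSDs brought up with ceph-deploy do not normally need their own sections in ceph.conf, since they register with the monitors when they start:

    # push a changed value into a running daemon without a restart
    ceph tell osd.0 injectargs '--osd_recovery_max_active 2'
    # confirm what the daemon is actually using, via its admin socket
    ceph daemon osd.0 config show | grep osd_recovery_max_active
    # options that are only read at startup still need a daemon restart
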
[17:45] <jcfischer> so the compiled mds comes past the assertion (that I commented out) but crashes as soon as it tries to handle the snapshot: http://pastebin.com/HB6Jpdjc
[17:46] * sleinen (~Adium@2001:620:0:25:4198:379c:5f48:c67a) has joined #ceph
[17:46] <jcfischer> is there a way to coerce mds to ignore the snapshot or remove it (while I can't mount the CephFS)
[17:47] <jcfischer> is http://tracker.ceph.com/issues/5250 fixed in dumpling?
[17:48] * gregmark (~Adium@cet-nat-254.ndceast.pa.bo.comcast.net) has joined #ceph
[17:50] * scuttlemonkey (~scuttlemo@38.127.1.5) Quit (Ping timeout: 480 seconds)
[17:54] * sjustlaptop (~sam@172.56.21.10) has joined #ceph
[17:55] * ScOut3R (~scout3r@4E5C2305.dsl.pool.telekom.hu) has joined #ceph
[17:57] * madkiss (~madkiss@089144192241.atnat0001.highway.a1.net) has joined #ceph
[17:59] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) has joined #ceph
[18:02] <tsnider> In http://ceph.com/docs/next/start/quick-ceph-deploy/ is this 1 or 2 commands:
[18:02] <tsnider> ceph-deploy mon create {mon-server-name} ceph-deploy mon create mon-ceph-node
[18:03] <tsnider> Rather:
[18:03] <tsnider> ceph-deploy mon create {mon-server-name}
[18:03] <tsnider> ceph-deploy mon create mon-ceph-node
[18:03] <tsnider> What's the difference between mon-server-name and mon-ceph-node?
[18:03] * DarkAceZ (~BillyMays@50.107.55.36) Quit (Ping timeout: 480 seconds)
[18:05] * X3NQ (~X3NQ@195.191.107.205) Quit (Remote host closed the connection)
[18:06] * sjustlaptop (~sam@172.56.21.10) Quit (Ping timeout: 480 seconds)
[18:06] * joelio throws sharp things at S3 and goes home
[18:09] <bandrus> tsnider: the lower line is an example based on the syntax of the upper line
[18:10] <bandrus> so you would replace {mon-server-name} with your server's name, like the example shown on the lower line
[18:10] * madkiss (~madkiss@089144192241.atnat0001.highway.a1.net) Quit (Quit: Leaving.)
[18:11] <tsnider> bandrus: That's what I thought -- but knowing that all my assumptions are probably wrong.... I've run the mon create command several times and there are no keyrings in /var/lib/ceph/bootstrap-mds or bootstrap-osd, like the instructions say there should be.
[18:11] <tsnider> sudo ls -R /var/lib/ceph
[18:11] <tsnider> /var/lib/ceph:
[18:11] <tsnider> bootstrap-mds bootstrap-osd mds mon osd tmp
[18:11] <tsnider> /var/lib/ceph/bootstrap-mds:
[18:11] <tsnider> /var/lib/ceph/bootstrap-osd:
[18:11] <tsnider> /var/lib/ceph/mds:
[18:11] <tsnider> /var/lib/ceph/mon:
[18:11] <tsnider> ceph-controller21
[18:11] <tsnider> /var/lib/ceph/mon/ceph-controller21:
[18:11] <tsnider> done keyring store.db upstart
[18:11] <tsnider> /var/lib/ceph/mon/ceph-controller21/store.db:
[18:11] <tsnider> 000005.sst 000006.log CURRENT LOCK LOG LOG.old MANIFEST-000004
[18:11] <tsnider> /var/lib/ceph/osd:
[18:11] <tsnider> /var/lib/ceph/tmp:
[18:12] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:13] <tsnider> bandrus: http://paste.openstack.org/show/47160/ has details
[18:16] <claenjoy> hello, I can see that ceph-deploy prepare and ceph-deploy activate make a new folder, ceph-0, in /var/lib/ceph/osd/. Which parameters do I need to add in /etc/fstab? Example 1: rw,noexec,nodev,noatime,nodiratime,barrier=0 / example 2: rw,noatime,nodiratime / example 3: rw,noatime ?
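
On the fstab question: OSD data partitions prepared by ceph-deploy/ceph-disk are normally mounted by Ceph itself rather than from /etc/fstab, and the mount options can be set once in ceph.conf instead; a sketch, with the values only as an example:

    [osd]
    # options used when ceph mounts xfs-backed OSD data directories
    osd mount options xfs = rw,noatime,inode64
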
[18:16] <bandrus> check the bootstrap directories on the mon server, not the one you're running ceph-deploy on.
[18:16] <bandrus> ^ tsnider
[18:17] <tsnider> bandrus: yeah -- it's the same node.
[18:18] * a (~a@209.12.169.218) has joined #ceph
[18:18] <bandrus> got it
[18:18] * sage (~sage@76.89.177.113) Quit (Ping timeout: 480 seconds)
[18:18] * a is now known as Guest6922
[18:20] * scuttlemonkey (~scuttlemo@204.57.119.28) has joined #ceph
[18:20] * ChanServ sets mode +o scuttlemonkey
[18:20] <jcfischer> sagewk, scuttlemonkey: when mds restarts: does it replay a journal (that I can kill) or do clients that were connected while mds crashed retry the last operation that crashed (i.e. doing a snapshot)
[18:21] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[18:21] <tsnider> bandrus: deploy node == monitor node shouldn't be a problem should it?
[18:24] <bandrus> tsnider: shouldn't be. Are any ceph processes running at all, or can you run "ceph auth list"?
[18:25] <claenjoy> UP !
[18:30] * roald (~roaldvanl@139-63-21-115.nodes.tno.nl) Quit (Ping timeout: 480 seconds)
[18:32] <tsnider> bandrus: not successfully :ceph@controller21:~/my-cluster$ ceph auth list
[18:32] <tsnider> 2013-09-17 09:32:24.053824 7f53287a3700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
[18:32] <tsnider> 2013-09-17 09:32:24.053837 7f53287a3700 0 librados: client.admin initialization error (2) No such file or directory
[18:32] <tsnider> Error connecting to cluster: ObjectNotFound
[18:37] * mattt (~mattt@92.52.76.140) Quit (Read error: Connection reset by peer)
[18:38] * speedy (~speedy@89.110.198.232) has joined #ceph
[18:40] <speedy> hello, could anyone provide some guidance on what's the best way to export Samba/CIFS mounts to Windows clients, using Ceph as the backend storage?
[18:41] <bandrus> tsnider: are you using sudo?
[18:42] <speedy> would it work to mount cephFS on each OSD and export CIFS mounts from all of them separately?
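
One common pattern for this is to mount CephFS on a gateway host and re-export it over Samba; a minimal sketch, with the monitor address, secret file and share name as placeholders, and no claim about how well several such gateways behave side by side:

    # kernel-client mount of CephFS on the gateway host
    sudo mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
    # then a plain Samba share over it, e.g. in smb.conf:
    #   [cephfs]
    #   path = /mnt/cephfs
    #   read only = no
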
[18:42] * scuttlemonkey (~scuttlemo@204.57.119.28) Quit (Ping timeout: 480 seconds)
[18:43] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (Quit: Leaving.)
[18:44] <tsnider> bandrus: yes -- maybe I should remove the done file and try again.
[18:46] * imjustmatthew (~imjustmat@pool-173-53-100-217.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[18:47] * sbadia (~sbadia@yasaw.net) has joined #ceph
[18:49] <gregaf> jcfischer: you'll need to excise the snapshot from your system; you shouldn't be using them with valued data
[18:49] <gregaf> you can identify where the request comes from by turning up the logging on the mds (mds = 20, journaler = 10, ms = 1) and seeing what source causes it to crash
[18:49] <jcfischer> I created a test directory for the snaps - still it took down everything
[18:49] <gregaf> yes
[18:50] <gregaf> snapshots are an unstable feature of a not-production-ready system
[18:50] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[18:50] <jcfischer> gregaf: I have just learned that the hard way :)
[18:50] <gregaf> actually just look at what phase the crash happens in
[18:50] <jcfischer> -1> 2013-09-17 18:17:18.194668 7f0cd768b700 4 mds.0.server handle_client_request client_request(client.177845:1628 lookupsnap #100008d5fdf//two) v1
[18:50] <gregaf> what mode is the MDS in when it crashes?
[18:50] <jcfischer> 0> 2013-09-17 18:17:18.197371 7f0cd768b700 -1 *** Caught signal (Segmentation fault) **
[18:51] <jcfischer> how do I tell the mode?
[18:51] <gregaf> if you look at the central log is the mds in replay or reconnect or something else?
[18:52] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[18:52] <gregaf> but yeah, that's coming straight from a client; journaled events are a transcription
[18:53] <gregaf> nodes shouldn't be in the snapshot area unless you sent them there, so it should be easy to identify the node sending that request
[18:53] <jcfischer> gregaf: central log? ceph -s shows me that it is in up:reconnect - which is not true, bc both our mds are down
[18:53] <gregaf> it also says "laggy or crashed", right? ;)
[18:53] <jcfischer> ok - turning up log now and restarting mds
[18:53] <jcfischer> it did
[18:53] <gregaf> no, that's all I need
[18:54] * sleinen (~Adium@2001:620:0:25:4198:379c:5f48:c67a) Quit (Quit: Leaving.)
[18:54] * sleinen (~Adium@130.59.94.162) has joined #ceph
[18:54] <gregaf> so the node which you had playing with snapshots is replaying the request which caused the initial crash; you can just make that node go away (I don't remember if a force-unmount will work or not)
[18:54] <gregaf> you might need to reboot that client all the way, not sure
[18:54] <jcfischer> ok - now I just need to remember which node it was :)
[18:54] <gregaf> and then you should get your data backed up because I'm not entirely sure what the long-term consequences are here
[18:55] <jcfischer> k
[19:02] * sleinen (~Adium@130.59.94.162) Quit (Ping timeout: 480 seconds)
[19:03] * speedy (~speedy@89.110.198.232) Quit (Quit: HydraIRC -> http://www.hydrairc.com <- Now with extra fish!)
[19:03] <nhm> yehudasa: did you delete the log at all?
[19:04] * sjustlaptop (~sam@172.56.21.10) has joined #ceph
[19:07] * sprachgenerator (~sprachgen@130.202.135.232) has joined #ceph
[19:09] <tsnider> For some reason keyring files are missing from /var/lib/ceph/... after "ceph-deploy mon create" was run. Can I delete the done file and rerun mon create to recreate the keyring files so I can finish installation?
[19:10] <tsnider> e.g. How can I back out and retry installation?
[19:10] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[19:12] <ron-slc> Odd pg-map status line:
[19:12] <ron-slc> 2013-09-17 10:54:37.279618 mon.0 [INF] pgmap v10210678: 1920 pgs: 1826 active+clean, 84 active+recovery_wait, 10 active+recovering; 2842 GB data, 8225 GB used, 17793 GB / 26019 GB avail; 38985B/s wr, 76op/s; -1033/3889237 degraded (-0.027%); recovering 4 o/s, 16494KB/s
[19:12] <ron-slc> Notice the math in the degraded area?
[19:14] * xarses (~andreww@204.11.231.50.static.etheric.net) has joined #ceph
[19:15] <mikedawson> ron-slc: http://tracker.ceph.com/issues/5884 and http://tracker.ceph.com/issues/3720
[19:22] <jcfischer> gregaf: rebooting the (possibly offending) node made mds restart cleanly
[19:23] <ron-slc> mikedawson: that's it! Thanks.
[19:23] <gregaf> glad to hear it!
[19:23] <mikedawson> ron-slc: sure thing
[19:23] <jcfischer> makes a great war story for my presentation tomorrow about Ceph and openstack
[19:23] <jcfischer> :)
[19:26] * scuttlemonkey (~scuttlemo@204.57.119.28) has joined #ceph
[19:26] * ChanServ sets mode +o scuttlemonkey
[19:29] <jcfischer> gregaf: but now one of the mds is stuck in up:replay...
[19:29] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[19:29] <jcfischer> and all cephfs operations are hanging
[19:29] <gregaf> what is the other MDS doing?
[19:30] <jcfischer> it died (suicide) and I restarted it
[19:30] <jcfischer> now it is in 2013-09-17 19:30:40.483787 7f6882803700 1 mds.-1.0 handle_mds_map standby
[19:31] <jcfischer> ah - 2nd mds is in up:rejoin
[19:31] <jcfischer> up:active
[19:31] <jcfischer> fs operations still hanging
[19:32] <jcfischer> and succeeded
[19:32] <jcfischer> … my nerves
[19:33] <jcfischer> and we just lost another compute/osd host ...
[19:42] * yasu` (~yasu`@99.23.160.231) has joined #ceph
[19:42] * Steki (~steki@198.199.65.141) has joined #ceph
[19:42] <jcfischer> getting back on track - slowly
[19:43] <yasu`> Hi, is there any way to set a new osdmap through the command-line tool ?
[19:45] <bandrus> yasu`: ceph osd setmap -i <mapfile>
[19:45] <wrencsok> http://ceph.com/docs/master/rados/operations/crush-map/
[19:46] * sbadia (~sbadia@yasaw.net) Quit (Quit: WeeChat 0.3.8)
[19:46] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Ping timeout: 480 seconds)
[19:46] <yasu`> ceph --help didn't show it... but thanks, I'll try it
[19:46] * sbadia (~sbadia@yasaw.net) has joined #ceph
[19:47] <yasu`> I think the crushmap and the osdmap are different things
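yasu` is right that the two maps are distinct. As a reference sketch (filenames are arbitrary), the osdmap is normally only dumped and inspected, while the CRUSH map is the one that gets decompiled, edited and injected back:

    # inspect the current osdmap
    ceph osd getmap -o osdmap.bin
    osdmaptool --print osdmap.bin

    # round-trip the CRUSH map
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt      # decompile; edit crush.txt as needed
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new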
[19:48] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Quit: Leaving.)
[19:56] * jbd_ (~jbd_@2001:41d0:52:a00::77) has left #ceph
[19:56] <tsnider> bandrus: any other ideas or should I ask the group again?
[19:57] <bandrus> tsnider: sorry my friend, try asking your question again, I don't have any further ideas at this time
[19:58] * peetaur (~peter@CPEbc1401e60493-CMbc1401e60490.cpe.net.cable.rogers.com) Quit (Quit: Konversation terminated!)
[19:58] <yasu`> I think my cluster has got a broken osdmap; ceph osd dump shows no osds...
[19:59] * jcfischer (~fischer@macjcf.switch.ch) Quit (Ping timeout: 480 seconds)
[20:00] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[20:01] <alfredodeza> xarses: ping
[20:01] <xarses> pong
[20:01] * scuttlemonkey (~scuttlemo@204.57.119.28) Quit (Ping timeout: 480 seconds)
[20:01] <alfredodeza> xarses: how long did you say you had to wait for a monitor to form part of the quorum ?
[20:02] <xarses> up to 20 sec
[20:02] <xarses> in a vm environment
[20:02] <alfredodeza> I am now getting the stuff ready to loop and wait until they have joined
[20:02] <xarses> sweet
[20:02] <alfredodeza> that is roughly the upper bound?
[20:02] <xarses> 15 is usually the longest
[20:02] <xarses> 12 is normal for me
[20:02] <xarses> i have my scripts to max at 60 sec
[20:02] * Guest6343 (~coyo@thinks.outside.theb0x.org) Quit (Ping timeout: 480 seconds)
[20:03] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[20:03] <alfredodeza> maybe I should do something a bit fancy like 5s -> 10s -> 15s -> 20s
[20:03] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[20:03] <xarses> so it would top out around 50 sec
[20:03] <xarses> ya, that should work
[20:03] <xarses> i would 1,5,10,15,20
[20:03] <xarses> but ya
[20:04] <alfredodeza> this is just for deploying intially BTW
[20:05] <tsnider> Hi --- Ceph newbie here. I'm trying to get a monitor node installed following the instructions in http://ceph.com/docs/next/start/quick-ceph-deploy/. After executing the "ceph-deploy mon create {node} \" command there are no keyring files in /var/lib/ceph/bootstrap-mds or /var/lib/ceph/bootstrap-osd. Rerunning the command as suggested doesn't produce them either. This causes the next step of gathering the keyrings to fail. I'd like to know ho
[20:06] <Tamil> tsnider: do you see a ceph-create-keys process running?
[20:06] * sjustlaptop (~sam@172.56.21.10) Quit (Ping timeout: 480 seconds)
[20:06] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Read error: Operation timed out)
[20:07] <xarses> alfredodeza: the quorum, or just waiting for the remote side to finish the commands?
[20:08] <alfredodeza> xarses: `ceph-deploy mon create-initial` will create all the mons defined initially, wait for them to form quorum and then gatherkeys
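A minimal sketch of the back-off loop being discussed, assuming the monitor admin socket lives at its default path and using the 5/10/15/20-second delays mentioned above:

    # poll the local mon until it reports leader or peon state, backing off between tries
    for delay in 5 10 15 20; do
        state=$(ceph --admin-daemon "/var/run/ceph/ceph-mon.$(hostname -s).asok" mon_status 2>/dev/null \
                | grep -o '"state":[^,]*')
        case "$state" in
            *leader*|*peon*) echo "monitor is in quorum"; break ;;
            *) echo "not in quorum yet (${state:-no socket}); sleeping ${delay}s"; sleep "$delay" ;;
        esac
    done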
[20:08] <tsnider> Tamil: yeah -- root@controller21:/home/ceph/my-cluster# ps -ef |grep ceph
[20:08] <tsnider> Tamil: root 9958 21215 0 11:07 pts/3 00:00:00 grep ceph
[20:08] <tsnider> Tamil: root 25663 1 0 08:19 ? 00:00:07 /usr/bin/ceph-mon --cluster=ceph -i controller21 -f
[20:08] <tsnider> Tamil: root 25664 1 0 08:19 ? 00:00:12 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i controller21
[20:09] <xarses> tsnider, run that /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i controller21
[20:09] <xarses> and it will tell you what it's waiting for
[20:09] <xarses> alfredodea, sounds lovely :)
[20:09] <alfredodeza> \o/
[20:09] <xarses> alfredodeza even
[20:09] <alfredodeza> :)
[20:10] <tsnider> xarses: admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[20:10] <tsnider> xarses: INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
[20:10] <xarses> hmm, ceph-mon didn't create its socket...
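A quick way to check what xarses suspects here, assuming the default cluster name and socket path:

    ls -l /var/run/ceph/                                                      # should contain ceph-mon.<hostname>.asok
    ceph --admin-daemon /var/run/ceph/ceph-mon.controller21.asok mon_status   # only answers once the socket exists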
[20:11] <xarses> is there a ceph.mon.keyring in the working directory of ceph-deploy
[20:12] <tsnider> xarses: yes: in /home/ceph/my-cluster
[20:12] * carif (~mcarifio@146-115-183-141.c3-0.wtr-ubr1.sbo-wtr.ma.cable.rcn.com) has joined #ceph
[20:12] <xarses> tsnider, you can also try service ceph -a stop
[20:12] <xarses> kill the ceph-create-keys
[20:12] <xarses> and then service ceph -a start
[20:13] <xarses> tsnider, you have only one monitor?
[20:14] * skm (~smiley@205.153.36.170) Quit (Read error: Connection reset by peer)
[20:16] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[20:16] <tsnider> xarses: yes one monitor and 3 server nodes. Killed and restarted the ceph service. Now the python command gets:
[20:16] <tsnider> INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
[20:16] * sage (~sage@76.89.177.113) has joined #ceph
[20:16] <xarses> thats slightly better =)
[20:16] <tsnider> root@controller21:/home/ceph/my-cluster# ps -ef |grep ceph
[20:16] <tsnider> root 12505 1 0 11:15 ? 00:00:00 /usr/bin/ceph-mon --cluster=ceph -i controller21 -f
[20:16] <tsnider> root 12506 1 0 11:15 ? 00:00:00 /usr/bin/python /usr/sbin/ceph-create-keys --cluster=ceph -i controller21
[20:18] <pmatulis> if i assign a small weight (e.g.: 0.5) to an OSD does that mean little data will make it onto the corresponding storage device relative to the other higher-weighted OSDs?
[20:19] * carif (~mcarifio@146-115-183-141.c3-0.wtr-ubr1.sbo-wtr.ma.cable.rcn.com) Quit (Quit: Ex-Chat)
[20:19] <tsnider> xarses: now what -- still nothing in /var/lib/ceph/bootstrap*
[20:21] * sage (~sage@76.89.177.113) Quit (Read error: Operation timed out)
[20:22] <xarses> tsnider, can you post your ceph.conf and the command you used with ceph-deploy. please use a paste service like pastebin pastie or similar
[20:24] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[20:24] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[20:24] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[20:25] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[20:25] * b1tbkt_ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[20:25] * b1tbkt_ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[20:25] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Read error: Connection reset by peer)
[20:25] <tsnider> xarses: it's the one that the scripts create automagically -- I haven't edited it at all. http://paste.openstack.org/show/47176/.
[20:26] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[20:26] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[20:26] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[20:28] * sleinen1 (~Adium@2001:620:0:25:40bb:5d79:3b9e:1e4e) has joined #ceph
[20:29] <xarses> tsnider, and what was the ceph-deploy commandline you used?
[20:30] <xarses> for mon create
[20:31] * ScOut3R (~scout3r@4E5C2305.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[20:31] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[20:31] * b1tbkt_ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[20:31] * b1tbkt__ (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[20:33] <xarses> tsnider: also, not that it should be an issue by itself, but does your iptables policy allow port 6789?
[20:33] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[20:34] <tsnider> xarses:ceph-deploy mon create ictp-R2C4-Controller21.ict.englab.netapp.com
[20:34] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[20:34] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[20:34] <xarses> tsnider, go ahead and stop / kill ceph again, and do the ceph-deploy again, but drop the fqdn, just use hostname -s
[20:35] <xarses> the 1.2.4 version should take the fqdn fine, but 1.2.3 only likes `hostname -s`
[20:36] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Quit: Ex-Chat)
[20:36] <tsnider> xarses: service ceph stop doesn't kill ceph-mon
[20:37] <xarses> ya, you can outright kill -9 it
[20:37] * markbby1 (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[20:37] * sage (~sage@76.89.177.113) has joined #ceph
[20:37] * markbby (~Adium@168.94.245.2) has joined #ceph
[20:38] <tsnider> xarses: it restarts and creates another create-keys process
[20:38] <tsnider> xarses: with kill -9
[20:38] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[20:40] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[20:40] <xarses> odd, service ceph -a stop
[20:40] <xarses> and kill -9 should work
[20:40] * sagelap (~sage@199.106.166.12) Quit (Remote host closed the connection)
[20:40] <tsnider> xarses: WRT deploy -- the /var/lib/ceph/mon/ceph-controller21/done file will prevent any real action -- I'll try removing it.
[20:41] <tsnider> xarses: root@controller21:/home/ceph/my-cluster# service ceph -a stop
[20:41] <tsnider> xarses: root@controller21:/home/ceph/my-cluster# ps -ef|grep ceph
[20:41] <tsnider> xarses: root 18701 1 0 11:37 ? 00:00:00 /usr/bin/ceph-mon --cluster=ceph -i controller21 -f
[20:41] <tsnider> xarses: hmmm
[20:42] <xarses> ps aux | grep ceph | awk '//{system("kill "$2)}'
[20:42] <xarses> was enough for me to tear it down
[20:43] <xarses> export host=localhost ; ceph-deploy purge $host && ceph-deploy purgedata $host
[20:43] <xarses> and that should be enough to nuke it
[20:43] <xarses> especally since the config files and package will be gone
[20:44] <xarses> you'll have to ceph-deploy install again
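Put together, the reset sequence xarses is describing for a single node looks roughly like this (hostname as used elsewhere in this log):

    host=controller21
    ceph-deploy purge "$host"        # removes the ceph packages
    ceph-deploy purgedata "$host"    # removes /etc/ceph and /var/lib/ceph
    ceph-deploy install "$host"      # reinstall before re-running 'ceph-deploy new' and 'mon create'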
[20:44] <tsnider> xarses: yeah -- that worked -- I'll retry with a short host name
[20:45] <bstillwell> I just upgraded my home cluster from cuttlefish to dumpling, and now 'ceph crush osd reweight' doesn't appear to be working any more.
[20:45] <bstillwell> Is there a new method for doing that now?
[20:52] * jcfischer (~fischer@user-23-15.vpn.switch.ch) has joined #ceph
[20:52] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[20:54] * rturk-away is now known as rturk
[20:56] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[20:57] * scuttlemonkey (~scuttlemo@204.57.119.28) has joined #ceph
[20:57] * ChanServ sets mode +o scuttlemonkey
[20:58] <tsnider> xarses: the nodes have 2 networks: a public 1GbE that has a DNS server, and a private 10GbE local network without DNS. If I use the short name associated with the private network, will I have trouble later on if/when server nodes are referenced by the public FQDN? Using the public name on 1GbE is fine since this is a 'kick the tires' trial. Would node IPs work instead of names?
[20:58] <xarses> tsnider, you will need to add "cluster network" and "public network" to your ceph.conf
[20:59] <xarses> all clients and the mons will use the public network
[20:59] <xarses> and the osd's will use cluster network
[20:59] <tsnider> xarses: ok -- that sounds good. syntax example?
[21:00] <xarses> if you want the monitor to bind to an interface other than the DNS value of `hostname -s` you pass hostname:IP to the ceph-deploy new and ceph-deploy mon create commands
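A hypothetical ceph.conf fragment and ceph-deploy invocation matching what xarses describes; the subnets and the monitor IP are placeholders, not values taken from this cluster:

    # ceph.conf
    [global]
    public network  = 10.113.193.0/24    # side used by clients and the monitors
    cluster network = 192.168.10.0/24    # private side used for OSD replication

    # bind the monitor to a specific address instead of whatever its hostname resolves to
    ceph-deploy new controller21:10.113.193.10
    ceph-deploy mon create controller21:10.113.193.10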
[21:04] <tsnider> xarses: because there's no dns for the short names those nodes can't be found. e.g.
[21:04] <tsnider> xarses: root@controller21:/home/ceph/my-cluster# ping Compute21
[21:04] <tsnider> xarses: ping: unknown host Compute21
[21:04] <tsnider> xarses: root@controller21:/home/ceph/my-cluster# ping ictp-R2C4-Compute21.ict.englab.netapp.com
[21:04] <tsnider> xarses: PING ictp-R2C4-Compute21.ict.englab.netapp.com (10.113.193.207) 56(84) bytes of data.
[21:04] <tsnider> xarses: 64 bytes from ictp-R2C4-Compute21.ict.englab.netapp.com (10.113.193.207): icmp_req=1 ttl=64 time=0.251 ms
[21:06] * sjustlaptop (~sam@172.56.21.10) has joined #ceph
[21:06] <xarses> thats ok, as long as the ip is passed with the hostname to new and mon create
[21:06] * scuttlemonkey (~scuttlemo@204.57.119.28) Quit (Read error: Operation timed out)
[21:08] <tsnider> xarses: Do I need to purge and recreate the storage nodes also?
[21:12] <xarses> the storage nodes wont create without a monitor in quorum
[21:12] <xarses> so not likely
[21:12] <xarses> you will probably need to send them an updated ceph.conf
[21:12] <xarses> since the OSDs dial the monitors, I've never had to pass hostname:ip
[21:12] * wido__ (~wido@2a00:f10:121:100:4a5:76ff:fe00:199) has joined #ceph
[21:12] <xarses> here's your example
[21:12] <xarses> https://gist.github.com/xarses/6599234
[21:12] * wido (~wido@2a00:f10:121:100:4a5:76ff:fe00:199) Quit (Read error: Connection reset by peer)
[21:12] * sagelap (~sage@2600:100d:b105:b745:a9c1:9e98:20c8:9735) has joined #ceph
[21:12] <xarses> we have 3 or more interfaces defined, and we don't run the monitors on the DNS resolvable interface
[21:12] <xarses> using the method noted
[21:14] <bstillwell> So I ran the 'ceph osd crush reweight' command with --verbose and I'm still not seeing why it isn't working:
[21:14] <bstillwell> Submitting command {'prefix': 'osd crush reweight', u'name': 'osd.5', u'weight': 0.1}
[21:14] * Cube (~Cube@12.248.40.138) has joined #ceph
[21:19] <bstillwell> looks like the reweight command might be broken. I figured out a workaround though:
[21:19] <bstillwell> ceph osd crush set osd.5 0.1 host=b2
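For reference, the three commands that tend to get mixed up in this area; all are standard, and the first is the one bstillwell reports misbehaving after the upgrade:

    ceph osd crush reweight osd.5 0.1        # changes the CRUSH weight (share of data placed on the OSD)
    ceph osd crush set osd.5 0.1 host=b2     # same weight change, but also (re)positions the item in the CRUSH tree
    ceph osd reweight 5 0.9                  # a different knob: a temporary 0..1 override, not a CRUSH weight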
[21:20] * aardvark (~Warren@2607:f298:a:607:956:fb3f:2383:3023) Quit (Read error: Connection reset by peer)
[21:20] * aardvark (~Warren@2607:f298:a:607:956:fb3f:2383:3023) has joined #ceph
[21:20] * sagelap (~sage@2600:100d:b105:b745:a9c1:9e98:20c8:9735) Quit (Ping timeout: 480 seconds)
[21:22] * indeed (~indeed@206.124.126.33) has joined #ceph
[21:23] <tsnider> xarses: ok -- should that also be done with server node installation, e.g. ceph-deploy install osd-1:192.168.10.20?
[21:24] <xarses> tsnider, you shouldn't need to do it with ceph-deploy osd or install
[21:24] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has left #ceph
[21:24] <xarses> only with the monitors since they are extra sensitive
[21:24] <xarses> install only sets up the packages and such
[21:24] <tsnider> xarses: thx -- I'll see how it goes
[21:25] <xarses> ceph-deploy new and ceph-deploy mon create need host:ip for your example
[21:26] <xarses> ceph-deploy osd [create, activate, prepare] don't need it, especially if you define "cluster network"
[21:27] <tsnider> xarses: thx -- I'm trying it now
[21:28] * WarrenUsui (~Warren@2607:f298:a:607:956:fb3f:2383:3023) has joined #ceph
[21:28] * indeed_ (~indeed@206.124.126.33) has joined #ceph
[21:28] * bandrus1 (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[21:29] <xarses> whee github is 500 again
[21:31] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[21:33] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (Remote host closed the connection)
[21:33] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[21:34] * sjustlaptop (~sam@172.56.21.10) Quit (Ping timeout: 480 seconds)
[21:34] * indeed (~indeed@206.124.126.33) Quit (Ping timeout: 480 seconds)
[21:35] * wusui (~Warren@2607:f298:a:607:956:fb3f:2383:3023) Quit (Ping timeout: 480 seconds)
[21:36] * mattt (~mattt@92.52.76.140) has joined #ceph
[21:38] <angdraug> that fox interview did them in
[21:38] <xarses> lols
[21:38] * gucki (~smuxi@HSI-KBW-109-192-187-143.hsi6.kabel-badenwuerttemberg.de) Quit (Ping timeout: 480 seconds)
[21:39] <xarses> "I want to fork a repo but i have to submit a pull request first"
[21:39] * gregaf1 (~Adium@2607:f298:a:607:4de2:a634:c058:52e4) has joined #ceph
[21:39] * sjustlaptop (~sam@172.56.21.10) has joined #ceph
[21:40] <xarses> those e-notes must have created the 500 error
[21:40] <mattt> hi all, have a cluster w/ 20 OSDs across 4 nodes … did a for loop to change all pool replica sizes from 2 to 3 and in doing so i've knocked 1/4 of my OSDs offline … trying to bring them back online fails
[21:40] * gregaf (~Adium@2607:f298:a:607:5eb:7fe6:865e:ac5) Quit (Quit: Leaving.)
[21:40] <mattt> the cluster was more or less empty, so there shouldn't have been a massive amount of rebalancing to do
[21:45] * scuttlemonkey (~scuttlemo@204.57.119.28) has joined #ceph
[21:45] * ChanServ sets mode +o scuttlemonkey
[21:51] * gregaf (~Adium@2607:f298:a:607:85:5c30:c21c:83) has joined #ceph
[21:52] * gregaf (~Adium@2607:f298:a:607:85:5c30:c21c:83) Quit ()
[21:55] <mikedawson> mattt: did the OSD processes stop? can you restart them?
[21:58] <mattt> mikedawson: yeah, they are stopped …. when i start them up again they die with this: http://paste.org/67411
[21:58] * nwat (~nwat@eduroam-237-79.ucsc.edu) has joined #ceph
[21:59] * gregaf1 (~Adium@2607:f298:a:607:4de2:a634:c058:52e4) Quit (Ping timeout: 480 seconds)
[22:00] * gregaf (~Adium@38.122.20.226) has joined #ceph
[22:00] <mikedawson> mattt: looks like you have run into a reportable bug. I'd recommend the mailing list and/or entering a bug at tracker.ceph.com
[22:01] <mikedawson> mattt: what version is this?
[22:01] <mattt> mikedawson: 0.67.3-1precise
[22:02] <mattt> mikedawson: 7/85 unfound (8.235%)
[22:02] <mattt> is it possible for me to see which objects?
[22:05] <mikedawson> mattt: the unfound objects are in placement groups that live on OSDs that aren't running. ''ceph health detail' or 'ceph pg dump' may show you the info
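Following mikedawson's pointer, a short sketch of drilling into unfound objects (the pgid below is hypothetical):

    ceph health detail | grep unfound      # lists the PGs reporting unfound objects
    ceph pg 2.5 list_missing               # shows the missing/unfound objects in that PG
    ceph pg 2.5 query                      # shows which OSDs the PG is still trying to probe for the data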
[22:06] * sjustlaptop (~sam@172.56.21.10) Quit (Ping timeout: 480 seconds)
[22:10] <mattt> mikedawson: was what i did completely unreasonable?
[22:10] <mattt> mikedawson: for x in $(rados lspools); do ceph osd pool set $x size 3; done
[22:10] <mattt> bearing in mind this is an unused cluster
[22:10] <mikedawson> mikedawson: seems reasonable
[22:11] <mattt> cool, i'll post to the mlist and see what people say
[22:12] * markbby1 (~Adium@168.94.245.2) has joined #ceph
[22:12] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[22:15] * sagelap (~sage@2600:100d:b119:bd83:583e:a21c:c5b2:a056) has joined #ceph
[22:19] * scuttlemonkey (~scuttlemo@204.57.119.28) Quit (Ping timeout: 481 seconds)
[22:19] * sjustlaptop (~sam@172.56.21.10) has joined #ceph
[22:20] * Coyo (~coyo@thinks.outside.theb0x.org) has joined #ceph
[22:20] * Coyo is now known as Guest6943
[22:23] * sjustlaptop (~sam@172.56.21.10) Quit (Read error: Connection reset by peer)
[22:25] * sjm_ (~sjm@dhcp-108-168-18-236.cable.user.start.ca) has joined #ceph
[22:25] * nwat (~nwat@eduroam-237-79.ucsc.edu) Quit (Ping timeout: 480 seconds)
[22:27] * sjustlaptop (~sam@172.56.21.10) has joined #ceph
[22:27] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) has joined #ceph
[22:31] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[22:31] * sjm (~sjm@dhcp-108-168-18-236.cable.user.start.ca) Quit (Read error: Connection reset by peer)
[22:31] * sjm_ (~sjm@dhcp-108-168-18-236.cable.user.start.ca) Quit (Read error: Connection reset by peer)
[22:37] <mattt> mikedawson: do you think part of the problem may be that i'm running a single mon?
[22:37] <mikedawson> mattt: doubtful
[22:39] * scuttlemonkey (~scuttlemo@204.57.119.28) has joined #ceph
[22:39] * ChanServ sets mode +o scuttlemonkey
[22:40] <mikedawson> mattt: but you should run three monitors for a production workload
[22:40] <mattt> mikedawson: understood!
[22:48] * lupine (~lupine@lupine.me.uk) has joined #ceph
[22:50] * scuttlemonkey (~scuttlemo@204.57.119.28) Quit (Ping timeout: 480 seconds)
[22:51] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[22:59] * sjustlaptop (~sam@172.56.21.10) Quit (Read error: Connection reset by peer)
[23:02] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:02] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[23:03] * jcfischer (~fischer@user-23-15.vpn.switch.ch) Quit (Quit: jcfischer)
[23:20] * markbby1 (~Adium@168.94.245.2) Quit (Ping timeout: 480 seconds)
[23:23] * sjm (~sjm@dhcp-108-168-18-236.cable.user.start.ca) has joined #ceph
[23:23] * mattt (~mattt@92.52.76.140) Quit (Read error: Connection reset by peer)
[23:25] * scuttlemonkey (~scuttlemo@38.127.1.5) has joined #ceph
[23:25] * ChanServ sets mode +o scuttlemonkey
[23:28] * markbby (~Adium@168.94.245.2) has joined #ceph
[23:28] * nwat (~nwat@eduroam-237-79.ucsc.edu) has joined #ceph
[23:33] <tsnider> my last DQOTD: the documentation mentions "Ensure your Ceph Storage Cluster is in an active + clean ..." state in several places. What command displays the cluster state?
[23:35] <xarses> ceph -s
[23:35] <xarses> or ceph health
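For the 'active + clean' check the documentation refers to, either command works; healthy output looks roughly like this (the counts are illustrative):

    $ ceph health
    HEALTH_OK
    $ ceph -s | grep pgmap
    pgmap v1234: 192 pgs: 192 active+clean; ...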
[23:36] * nwat (~nwat@eduroam-237-79.ucsc.edu) Quit (Ping timeout: 480 seconds)
[23:39] <tsnider> xarses: hmm -- created OSDs using ceph-deploy osd create ictp-R2C4-$1.ict.englab.netapp.com:$item in a script over 3 nodes with 12 devices each.
[23:39] <tsnider> seemed like it worked I got a bunch of messages in the form: [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ictp-R2C4-Swift21.ict.englab.netapp.com:/dev/sdb1:
[23:39] <tsnider> [ceph_deploy.osd][DEBUG ] Deploying osd to ictp-R2C4-Swift21.ict.englab.netapp.com
[23:39] <tsnider> But ceph health gives: HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
[23:40] <tsnider> I'll see what I missed
[23:40] <xarses> create is supposed to do prepare and activate; you're the third case, myself included, where that appears not to be happening
[23:41] <xarses> run ceph-deploy osd activate in addition to the ceph-deploy osd create you did
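In concrete terms, using the host and device naming from tsnider's run, that is roughly:

    # activate the OSDs that 'osd create' left prepared but not running
    ceph-deploy osd activate ictp-R2C4-Swift21.ict.englab.netapp.com:/dev/sdb1
    # or run the two phases explicitly next time instead of 'osd create'
    ceph-deploy osd prepare  ictp-R2C4-Swift21.ict.englab.netapp.com:/dev/sdb1
    ceph-deploy osd activate ictp-R2C4-Swift21.ict.englab.netapp.com:/dev/sdb1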
[23:42] <tsnider> xarses: I'll add activate and see what happens
[23:44] <tsnider> xarses: Yeah -- that's doing something more / different. I guess create didn't really work
[23:44] <tsnider> 2
[23:44] * sjm (~sjm@dhcp-108-168-18-236.cable.user.start.ca) Quit (Quit: Leaving)
[23:46] * indeed_ (~indeed@206.124.126.33) Quit (Remote host closed the connection)
[23:48] <tsnider> xarses: cool -- HEALTH_OK :) :)
[23:49] * ScOut3R (~scout3r@4E5C2305.dsl.pool.telekom.hu) has joined #ceph
[23:51] * Vjarjadian (~IceChat77@176.254.6.245) has joined #ceph
[23:51] * vata (~vata@2607:fad8:4:6:156c:5b52:b19c:7128) Quit (Quit: Leaving.)
[23:53] <xarses> alfredodeza: Isn't ceph-deploy osd create supposed to do both prepare and activate?
[23:54] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[23:57] <tsnider> xarses: maybe I screwed up the syntax in the script.
[23:57] <wrale> if i wanted to co-locate my osd's on my openstack hypervisors, could i use linux cgroups to keep things from stepping all over the ram I reserve for ceph? i'm planning a possibly openstack (maybe mesos) cluster of ~36 1U nodes with 2*xeon cpu's, 4*3TB SATA and 256GB ecc ram
[23:59] <wrale> wild thought: perhaps if i run the OSD daemons in dedicated lxc's? :)
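wrale's question goes unanswered in this log; purely as an illustration of the cgroup idea (not advice from the channel), a cgroup-v1 memory cap on the OSD processes might look like this, with the group name and the 16 GB limit chosen arbitrarily:

    # cap resident memory for all ceph-osd processes using the libcgroup tools (cgroup v1)
    sudo cgcreate -g memory:ceph-osd
    echo $((16 * 1024**3)) | sudo tee /sys/fs/cgroup/memory/ceph-osd/memory.limit_in_bytes
    sudo cgclassify -g memory:ceph-osd $(pgrep ceph-osd)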

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.