#ceph IRC Log

IRC Log for 2014-02-05

Timestamps are in GMT/BST.

[0:02] * JeffK (~Narb@38.99.52.10) Quit (Ping timeout: 480 seconds)
[0:03] * Meistarin (sid19523@0001c3c8.user.oftc.net) Quit ()
[0:07] * AfC (~andrew@2407:7800:400:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[0:09] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) has joined #ceph
[0:12] * ScOut3R (~scout3r@BC0652CA.dsl.pool.telekom.hu) Quit ()
[0:18] * sarob (~sarob@2001:4998:effd:7801::1032) has joined #ceph
[0:21] * jo0nas (~jonas@188-183-5-254-static.dk.customer.tdc.net) Quit (Quit: Leaving.)
[0:22] * rmerritt (~rmerritt@nyc-333.nycbit.com) has joined #ceph
[0:27] * sarob_ (~sarob@ip-64-134-227-63.public.wayport.net) has joined #ceph
[0:28] * rmerritt (~rmerritt@nyc-333.nycbit.com) Quit (Quit: Leaving)
[0:28] * sarob (~sarob@2001:4998:effd:7801::1032) Quit (Read error: Connection reset by peer)
[0:29] * sprachgenerator (~sprachgen@130.202.135.219) Quit (Quit: sprachgenerator)
[0:33] * dis (~dis@109.110.66.110) Quit (Ping timeout: 480 seconds)
[0:33] * kaizh (~oftc-webi@128-107-239-234.cisco.com) Quit (Remote host closed the connection)
[0:35] <ron-slc> http://slashdot.org/submission/3316063/hewlett-packard-turns-buggy-software-and-firmware-into-a-revenue-stream
[0:35] <ron-slc> I know many here use servers, obviously.. I'm trying to raise a huge stink over pay-for-drivers/firmware
[0:39] * ferai (~quassel@corkblock.jefferai.org) has joined #ceph
[0:39] * fejjerai (~quassel@corkblock.jefferai.org) Quit (Read error: Connection reset by peer)
[0:40] * garphy is now known as garphy`aw
[0:41] * dmsimard (~Adium@70.38.0.246) Quit (Quit: Leaving.)
[0:42] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[0:44] * fejjerai (~quassel@corkblock.jefferai.org) has joined #ceph
[0:45] * pkjames (~pkjames@65.90.80.230) has joined #ceph
[0:46] * fdmanana (~fdmanana@bl5-4-53.dsl.telepac.pt) Quit (Quit: Leaving)
[0:47] <pkjames> I'm thinking about trying to demo out across 4 large servers using ceph with 2 clusters. Each server has two raids - 1 SSD, and 1 SATA. I would like to have those separated out for different pools. Can this be done using juju?
[0:48] * ferai (~quassel@corkblock.jefferai.org) Quit (Ping timeout: 480 seconds)
[0:52] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) has joined #ceph
[0:57] * ferai (~quassel@corkblock.jefferai.org) has joined #ceph
[1:03] * fejjerai (~quassel@corkblock.jefferai.org) Quit (Ping timeout: 480 seconds)
[1:14] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) Quit (Quit: mtanski)
[1:18] * sarob (~sarob@ip-64-134-227-63.public.wayport.net) has joined #ceph
[1:18] * nwat (~textual@eduroam-252-51.ucsc.edu) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[1:22] <bens> OOOOH
[1:24] * sarob_ (~sarob@ip-64-134-227-63.public.wayport.net) Quit (Ping timeout: 480 seconds)
[1:25] * zerick (~eocrospom@190.187.21.53) Quit (Read error: Connection reset by peer)
[1:26] * JeffK (~Narb@38.99.52.10) has joined #ceph
[1:26] * humbolt1 (~elias@178-190-250-181.adsl.highway.telekom.at) Quit (Read error: Connection reset by peer)
[1:27] * humbolt (~elias@178-190-250-181.adsl.highway.telekom.at) has joined #ceph
[1:28] * gregsfortytwo1 (~Adium@cpe-172-250-69-138.socal.res.rr.com) has joined #ceph
[1:29] * ircolle (~Adium@2601:1:8380:2d9:61b1:7a13:c31f:eaaf) Quit (Quit: Leaving.)
[1:30] * carif (~mcarifio@pool-173-76-155-34.bstnma.fios.verizon.net) has joined #ceph
[1:35] * fejjerai (~quassel@corkblock.jefferai.org) has joined #ceph
[1:36] * garphy`aw is now known as garphy
[1:39] * sarob (~sarob@ip-64-134-227-63.public.wayport.net) Quit (Remote host closed the connection)
[1:39] * ferai (~quassel@corkblock.jefferai.org) Quit (Ping timeout: 480 seconds)
[1:41] * garphy is now known as garphy`aw
[1:42] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) has joined #ceph
[1:48] * xmltok (~xmltok@216.103.134.250) Quit (Quit: Leaving...)
[1:52] * sarob (~sarob@mobile-166-137-178-029.mycingular.net) has joined #ceph
[1:56] * sarob (~sarob@mobile-166-137-178-029.mycingular.net) Quit (Remote host closed the connection)
[1:57] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[1:58] * reed (~reed@net-188-153-207-115.cust.dsl.teletu.it) Quit (Ping timeout: 480 seconds)
[2:15] * ferai (~quassel@corkblock.jefferai.org) has joined #ceph
[2:15] * fejjerai (~quassel@corkblock.jefferai.org) Quit (Read error: Connection reset by peer)
[2:16] * Cube (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[2:21] * gregsfortytwo2 (~Adium@2607:f298:a:607:c087:1d16:36ff:dd6b) has joined #ceph
[2:23] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:27] * gregsfortytwo (~Adium@2607:f298:a:607:69ab:3858:67ac:52c5) Quit (Ping timeout: 480 seconds)
[2:30] * humbolt (~elias@178-190-250-181.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[2:36] * xarses (~andreww@12.164.168.117) Quit (Ping timeout: 480 seconds)
[2:38] * sprachgenerator (~sprachgen@c-67-167-211-254.hsd1.il.comcast.net) has joined #ceph
[2:38] * rmoe (~quassel@12.164.168.117) Quit (Ping timeout: 480 seconds)
[2:47] * humbolt (~elias@178-190-250-181.adsl.highway.telekom.at) has joined #ceph
[2:47] * mattt (~textual@182.55.84.224) has joined #ceph
[2:48] * peetaur (~peter@CPE788df73fb301-CM788df73fb300.cpe.net.cable.rogers.com) Quit (Ping timeout: 480 seconds)
[2:50] * rmoe (~quassel@173-228-89-134.dsl.static.sonic.net) has joined #ceph
[2:52] * mattt (~textual@182.55.84.224) Quit (Quit: Computer has gone to sleep.)
[2:55] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) has joined #ceph
[2:57] * sarob (~sarob@2601:9:7080:13a:e509:f5f2:9044:36cf) has joined #ceph
[3:00] * mattt (~textual@182.55.84.224) has joined #ceph
[3:01] * JC (~JC@c-24-21-120-173.hsd1.or.comcast.net) has joined #ceph
[3:05] * sarob (~sarob@2601:9:7080:13a:e509:f5f2:9044:36cf) Quit (Ping timeout: 480 seconds)
[3:07] * sarob (~sarob@2601:9:7080:13a:3091:6a47:d8bf:93c1) has joined #ceph
[3:16] * sarob (~sarob@2601:9:7080:13a:3091:6a47:d8bf:93c1) Quit (Ping timeout: 480 seconds)
[3:17] * mattt (~textual@182.55.84.224) Quit (Quit: Computer has gone to sleep.)
[3:19] * sjustlaptop (~sam@24-205-43-60.dhcp.gldl.ca.charter.com) has joined #ceph
[3:20] * LeaChim (~LeaChim@host86-166-182-74.range86-166.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[3:23] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) Quit (Quit: mtanski)
[3:24] * bandrus (~Adium@adsl-75-5-248-216.dsl.scrm01.sbcglobal.net) Quit (Quit: Leaving.)
[3:38] * angdraug (~angdraug@12.164.168.117) Quit (Quit: Leaving)
[3:40] * Tamil2 (~Adium@cpe-108-184-77-29.socal.res.rr.com) has joined #ceph
[3:43] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[3:45] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[3:45] * sarob (~sarob@2601:9:7080:13a:5c4d:da82:b71c:bb68) has joined #ceph
[3:47] * Tamil (~Adium@cpe-108-184-77-29.socal.res.rr.com) Quit (Read error: Operation timed out)
[3:47] * sarob_ (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[3:53] * sarob (~sarob@2601:9:7080:13a:5c4d:da82:b71c:bb68) Quit (Ping timeout: 480 seconds)
[3:56] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) Quit (Remote host closed the connection)
[3:57] * keds (Ked@cpc6-pool14-2-0-cust202.15-1.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[3:57] * sarob_ (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[3:57] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[4:02] * diegows (~diegows@190.190.17.57) Quit (Read error: Operation timed out)
[4:02] * Tamil2 (~Adium@cpe-108-184-77-29.socal.res.rr.com) Quit (Quit: Leaving.)
[4:02] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[4:04] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[4:04] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) Quit ()
[4:04] * mattt (~textual@182.55.84.224) has joined #ceph
[4:05] * Tamil (~Adium@cpe-108-184-77-29.socal.res.rr.com) has joined #ceph
[4:06] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[4:13] * mattt (~textual@182.55.84.224) Quit (Quit: Computer has gone to sleep.)
[4:13] * Pedras (~Adium@c-67-188-26-20.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[4:13] * Tamil (~Adium@cpe-108-184-77-29.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:14] * bandrus (~Adium@adsl-75-5-248-216.dsl.scrm01.sbcglobal.net) has joined #ceph
[4:17] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[4:18] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) Quit ()
[4:25] * bandrus (~Adium@adsl-75-5-248-216.dsl.scrm01.sbcglobal.net) Quit (Quit: Leaving.)
[4:32] * pkjames (~pkjames@65.90.80.230) Quit (Quit: Leaving)
[4:33] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[4:33] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[4:46] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:55] * xarses (~andreww@c-24-23-183-44.hsd1.ca.comcast.net) has joined #ceph
[4:57] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) has joined #ceph
[4:57] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[5:05] * fireD (~fireD@93-139-156-30.adsl.net.t-com.hr) has joined #ceph
[5:07] * fireD_ (~fireD@93-142-240-141.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:07] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[5:08] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[5:10] * Vacum_ (~vovo@i59F79E5B.versanet.de) has joined #ceph
[5:11] * Siva (~sivat@vpnnat.eglbp.corp.yahoo.com) has joined #ceph
[5:16] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:17] * Vacum (~vovo@88.130.197.151) Quit (Ping timeout: 480 seconds)
[5:26] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[5:28] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[5:37] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) Quit (Quit: mtanski)
[5:40] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) has joined #ceph
[5:43] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) Quit ()
[5:52] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[5:56] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[6:03] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has left #ceph
[6:08] * sarob (~sarob@2601:9:7080:13a:71f7:c446:1725:a68) has joined #ceph
[6:09] * Siva (~sivat@vpnnat.eglbp.corp.yahoo.com) Quit (Quit: Siva)
[6:10] <iggy> I've done some googling/mailing list searching... it seems like a windows cephfs client has been mentioned a few times. It doesn't look like anyone has actually tried doing the actual work. Does that sound right?
[6:12] <dmick> alphe had been talking seriously about working on it. no idea of his progress.
[6:12] <dmick> have you searched the channel logs?
[6:13] <iggy> I don't think anything came up there... I'll look up the irc logs specifically too
[6:15] * jjgalvez1 (~jjgalvez@ip98-167-16-160.lv.lv.cox.net) has joined #ceph
[6:17] * sarob (~sarob@2601:9:7080:13a:71f7:c446:1725:a68) Quit (Ping timeout: 480 seconds)
[6:18] * sarob (~sarob@2601:9:7080:13a:cca5:3a4a:1f7:acf4) has joined #ceph
[6:19] * jjgalvez (~jjgalvez@ip98-167-16-160.lv.lv.cox.net) Quit (Ping timeout: 480 seconds)
[6:21] * Siva (~sivat@vpnnat.eglbp.corp.yahoo.com) has joined #ceph
[6:23] * MK_FG (~MK_FG@00018720.user.oftc.net) Quit (Read error: Operation timed out)
[6:23] * MK_FG (~MK_FG@00018720.user.oftc.net) has joined #ceph
[6:27] * sarob (~sarob@2601:9:7080:13a:cca5:3a4a:1f7:acf4) Quit (Ping timeout: 481 seconds)
[6:28] * devrim_ (~devrim@208.72.139.54) has joined #ceph
[6:31] * sprachgenerator (~sprachgen@c-67-167-211-254.hsd1.il.comcast.net) Quit (Quit: sprachgenerator)
[6:33] <devrim_> hi everyone, ceo of koding here, we have an emergency, ur ceph cluster is on its knees, most of our PGs are stuck in peering, and it will not service requests, i'd like to offer $500/hour to someone who can fix this for us
[6:33] <devrim_> *our
[6:34] <devrim_> ceph -s http://d.pr/n/Or8p
[6:34] <devrim_> please contact me at devrim@koding.com if this is something you can help us with.
[6:51] * geraintjones (~geraint@122.56.234.161) has joined #ceph
[6:51] <houkouonchi-home> damn... 117 TB available with only 25 OSD's
[6:51] <geraintjones> Hi guys
[6:52] <geraintjones> yeah that will be 60 as soon as we can get this damn thing stable
[6:57] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[6:58] <geraintjones> They are staying up
[6:58] <geraintjones> but not progressing through peering
[6:58] * lupu (~lupu@86.107.101.246) Quit (Ping timeout: 480 seconds)
[7:00] <janos> what version is that?
[7:00] <geraintjones> root@storage2:~# ceph -v
[7:00] <geraintjones> ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
[7:01] <janos> is the noout flag still needed?
[7:01] * jesus (~jesus@emp048-51.eduroam.uu.se) Quit (Ping timeout: 480 seconds)
[7:01] <geraintjones> no
[7:01] <geraintjones> i will unset
[7:01] <janos> i don't know if that's holding anything up. more curiosity on that aspect
[7:02] <geraintjones> no change
[7:02] <geraintjones> could it be a network issue ?
[7:02] <janos> in ceph -w are there any communication issues?
[7:02] <janos> yeah i was wondering that
[7:02] <janos> all those waits
[7:02] <geraintjones> they can all ping each other on the cluster network
[7:02] <geraintjones> and public network
[7:03] <janos> are there any particular problem OSD's?
[7:03] <geraintjones> we did have one which kicked this whole thing off
[7:03] <geraintjones> but that was rmed
[7:03] <geraintjones> it was running at 700KB
[7:03] <geraintjones> :-/
[7:03] <janos> ouch
[7:04] <janos> 8 mons would make me nervous. but if they have quorum they should be fine
[7:04] <sage> what do you mean by 'rmed'?
[7:04] <geraintjones> set as out
[7:04] <geraintjones> then removed from the crush map
[7:04] <geraintjones> and stopped
[7:04] <geraintjones> as per the docs
[7:04] <sage> what was wrong with it?
[7:05] <geraintjones> H/W was trashed
[7:05] <sage> only 1 osd? was the cluster otherwise healthy at the time?
[7:05] <geraintjones> yeah other than some slow requests because of it
[7:05] <geraintjones> we don't have any unfound objects which makes me semi-confident that all the data is fine
[7:06] <geraintjones> its also rebuilding fairly fast ~180MB/s
[7:06] * Tamil (~Adium@cpe-108-184-77-29.socal.res.rr.com) has joined #ceph
[7:06] <geraintjones> not seeing anything alarming in the output of iostat -xk 3
[7:07] <sage> it is the 'down' and peering pgs that are concerning. pick a random down pg (from ceph pg dump) and 'ceph pg <pgid> query'
[7:08] <geraintjones> http://pastebin.com/6WfXGnng
[7:08] * fatih_ (~fatih@c-50-174-71-251.hsd1.ca.comcast.net) has joined #ceph
[7:08] <geraintjones> "peering_blocked_by": []}, ??!?!?!
[7:09] <geraintjones> so from that I am assuming it should be peered just fine ?
[7:09] * jesus (~jesus@emp048-51.eduroam.uu.se) has joined #ceph
[7:10] <sage> it's in the process of probing other osds. but not making progress. when you grep the down pgs from the dump output, is there a pattern in which osds they map to? look for the list of 2 or 3 osds in brackets
[7:10] * cenkalti (~cenkalti@208.72.139.54) has joined #ceph
[7:11] <geraintjones> yes they are all on two new boxes we added this morning
[7:11] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[7:11] * ybrs (~aybarsbad@212.253.42.32) has joined #ceph
[7:11] * ybrs (~aybarsbad@212.253.42.32) has left #ceph
[7:12] <geraintjones> I say "all" it was a quick scan of the list
[7:12] <geraintjones> but it sure looks like they are centred on those boxes.
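For reference, the checks sage walked through above look roughly like this; the PG id 2.3f is a placeholder for whatever the dump actually reports as down:

    # list PGs that are down or stuck peering, with the OSDs they map to in brackets
    ceph pg dump_stuck inactive
    ceph pg dump | grep -E 'down|peering'
    # query one of them to see what it is waiting on
    ceph pg 2.3f query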
[7:12] * sarob (~sarob@2601:9:7080:13a:dd99:81de:40a7:f11d) has joined #ceph
[7:12] <sage> perhaps focus on one pg (say, the one you queried before) and restart the osd daemon on the new box that is involved
[7:12] <geraintjones> okay
[7:12] <geraintjones> up": [
[7:12] <geraintjones> 3,
[7:12] <geraintjones> 26],
[7:12] <geraintjones> "acting": [
[7:12] <geraintjones> 3,
[7:12] <geraintjones> 26]},
[7:12] <geraintjones> { "first": 4622,
[7:12] <geraintjones> "last": 4692,
[7:12] <geraintjones> "maybe_went_rw": 1,
[7:12] <geraintjones> "up": [
[7:12] <geraintjones> 26],
[7:12] <geraintjones> "acting": [
[7:13] <geraintjones> 26]}],
[7:13] <geraintjones> so 3 or 26 ?
[7:13] <janos> i assume 26 is a newer one
[7:13] <sage> probably; can start there
[7:14] <sage> (it's not clear from the query output which one it is waiting on; could be any of 4 6 16 17 26.)
[7:14] <geraintjones> if I had a linux bridge between storage0 and storage2 and 3 do you think that would be an issue ?
[7:15] <sage> if it prevents them from talking over tcp then yeah
[7:15] <janos> i would think you'd see some output in ceph -w about communication problems, but sage would know much more than i on that
[7:15] <houkouonchi-home> geraintjones: why would they be bridged? to save on switch ports or something?
[7:16] <geraintjones> we had storage0 and storage1 back to back on 10gbe with a cable; to avoid having to take both s0 and s1 offline to move them to a switch I added a new nic to s1 that goes to a Nexus 5k
[7:16] <geraintjones> it was just a bandaid until I knew s0 would be happy to go for a reboot
[7:17] <geraintjones> i have to go offline for 10mins
[7:17] <geraintjones> we have a few guys here who can run commands for you
[7:17] <geraintjones> fatih_:
[7:17] <geraintjones> is one
[7:17] <geraintjones> :)
[7:18] <houkouonchi-home> I have never tried with ceph but I have had issues with bridging suddenly stop working and/or having intermittent issues in the past
[7:20] * sarob (~sarob@2601:9:7080:13a:dd99:81de:40a7:f11d) Quit (Ping timeout: 480 seconds)
[7:20] <sage> if there is a communication issue you would normally see osds marking each other down (unless you've adjusted the mon osd min down reporters (or similar) options from the defaults). if there is a weird tcp issue though (stuck stream or something) that could explain this
[7:21] <sage> anyway, if you hit a weird network bug, restarting osds or even just marking them down will restart peering, and query will show you which osds it is trying to talk to.
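A sketch of what sage suggests here, assuming osd.26 is the suspect and the hosts use the stock Ubuntu upstart jobs; the OSD and PG ids are the placeholders from this conversation, not a general recipe:

    # mark the OSD down so it drops its connections and re-peers; it rejoins on its own
    ceph osd down 26
    # or restart the daemon on the host that carries it
    restart ceph-osd id=26
    # then re-run the query to see which OSDs the PG is still probing
    ceph pg 2.3f query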
[7:25] * geraintjones (~geraint@122.56.234.161) Quit (Ping timeout: 480 seconds)
[7:28] <devrim_> we've been down for 8 hours, don't know what to do
[7:28] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[7:34] <dmick> suggestions above; hard to get a better source for advice
[7:37] * jjgalvez1 (~jjgalvez@ip98-167-16-160.lv.lv.cox.net) Quit (Quit: Leaving.)
[7:39] * odyssey4me (~odyssey4m@41-135-15-47.dsl.mweb.co.za) has joined #ceph
[7:39] * geraintjones (~geraint@222-152-77-45.jetstream.xtra.co.nz) has joined #ceph
[7:42] * Tamil (~Adium@cpe-108-184-77-29.socal.res.rr.com) Quit (Quit: Leaving.)
[7:46] * sjustlaptop (~sam@24-205-43-60.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:49] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) has joined #ceph
[7:56] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[7:59] * Cube (~Cube@netblock-75-79-17-34.dslextreme.com) has joined #ceph
[8:03] * cenkalti (~cenkalti@208.72.139.54) Quit (Remote host closed the connection)
[8:08] * AfC (~andrew@2407:7800:400:1011:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[8:15] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[8:19] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[8:23] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:23] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Quit: Leaving.)
[8:25] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[8:29] * humbolt1 (~elias@212-197-132-210.adsl.highway.telekom.at) has joined #ceph
[8:34] * Cube (~Cube@netblock-75-79-17-34.dslextreme.com) Quit (Quit: Leaving.)
[8:34] * humbolt (~elias@178-190-250-181.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[8:35] <geraintjones> It was a misconfigured switch....
[8:36] <geraintjones> someone (namely me) forgot to enable jumbos
[8:40] * Sysadmin88 (~IceChat77@94.4.30.95) Quit (Quit: Clap on! , Clap off! Clap@#&$NO CARRIER)
[8:40] * thomnico (~thomnico@41.164.30.187) has joined #ceph
[9:03] * fatih_ (~fatih@c-50-174-71-251.hsd1.ca.comcast.net) Quit (Quit: Linkinus - http://linkinus.com)
[9:07] * srenatus (~stephan@185.27.182.16) has joined #ceph
[9:11] * steki (~steki@91.195.39.5) has joined #ceph
[9:11] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Remote host closed the connection)
[9:16] * thomnico (~thomnico@41.164.30.187) Quit (Quit: Ex-Chat)
[9:18] * thomnico (~thomnico@41.164.30.187) has joined #ceph
[9:23] * Siva (~sivat@vpnnat.eglbp.corp.yahoo.com) Quit (Quit: Siva)
[9:28] * haomaiwa_ (~haomaiwan@218.71.76.134) Quit (Remote host closed the connection)
[9:28] * haomaiwang (~haomaiwan@101.78.195.61) has joined #ceph
[9:32] * haomaiwa_ (~haomaiwan@218.71.76.134) has joined #ceph
[9:34] * fghaas (~florian@91-119-84-76.dynamic.xdsl-line.inode.at) has joined #ceph
[9:36] * devrim_ (~devrim@208.72.139.54) Quit (Remote host closed the connection)
[9:37] <wonko_be> for the journal, is there a pro or con on either file vs lv's vs partitions
[9:37] <wonko_be> partitions would add the least amount of layers I assume
[9:38] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[9:38] <wonko_be> giving best performance
[9:38] <wonko_be> but i might be missing some points
[9:39] * haomaiwang (~haomaiwan@101.78.195.61) Quit (Ping timeout: 480 seconds)
[9:41] * srenatus_ (~stephan@185.27.182.16) has joined #ceph
[9:41] * srenatus (~stephan@185.27.182.16) Quit (Read error: Connection reset by peer)
[9:42] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:42] <svg_> wonko_be: wido once recommended to just partition a small part (say 4g) at the beginning of the ssd, and let the ssd internals do the scrubbing-or-how-is-this-called so it reuses all the disk
[9:43] * thomnico (~thomnico@41.164.30.187) Quit (Read error: Operation timed out)
[9:44] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[9:44] * ChanServ sets mode +v andreask
[9:44] * Siva (~sivat@vpnnat.eglbp.corp.yahoo.com) has joined #ceph
[9:45] * svg_ is now known as svg
[9:48] * hp_ (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[9:48] * TMM is now known as Guest183
[9:48] * hp_ is now known as tmm
[9:48] * tmm is now known as TMM
[9:51] <wonko_be> svg: it's not a question of where to lay out the journal, or how much space of the ssd i should use. It's whether i point the journal for osd X to /dev/sdc5, or i add /dev/sdc to lvm, create vg and lvs on it, and point the journal to /dev/mapper/journals-osdX, or i create an xfs on /dev/sdc, and point the journal to /where/i/mount/the/journals/journal-for-osdX
[9:51] * srenatus_ (~stephan@185.27.182.16) Quit (Read error: Connection reset by peer)
[9:52] <wonko_be> it adds layers to the journal, but in a FS, the fscache will kick in
[9:52] * Guest183 (~hp@c97185.upc-c.chello.nl) Quit (Ping timeout: 480 seconds)
[9:53] <wonko_be> on the devices, direct_io and derivatives are possible
[9:53] <wonko_be> lvm sits in between
[9:53] <wonko_be> a file is easy to move around if you shuffle osds to another box
[9:54] <wonko_be> partitions are doable too, but require more than a simple scp/rsync
[9:54] <wonko_be> etc etc... would like the feedback from the community on this
[9:54] * ScOut3R (~ScOut3R@catv-80-99-64-8.catv.broadband.hu) has joined #ceph
[9:56] * srenatus (~stephan@185.27.182.16) has joined #ceph
[9:56] * senk (~senk@194.95.73.117) has joined #ceph
[9:56] <senk> hey guys
[9:57] <senk> started with the quickstart guide on 4 vms yesterday
[9:57] <senk> now im stuck with a problem after trying to add additional monitors
[9:58] <senk> http://pastebin.com/W4TtGyFA
[9:58] <senk> thats the log
[10:01] <senk> version 0.72.2
[10:01] * alexm_ (~alexm@83.167.43.235) has joined #ceph
[10:02] * devrim_ (~devrim@208.72.139.54) has joined #ceph
[10:14] * thomnico (~thomnico@41.164.30.187) has joined #ceph
[10:15] * senk (~senk@194.95.73.117) Quit ()
[10:16] <svg> wonko_be: the suggestion was to go with partitions, I'm not aware of other suggestions though.
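A minimal sketch of the partition-based journal layout svg relays from wido, assuming /dev/sdc is the journal SSD and osd.0 is the OSD in question; the device name and the 4G size are illustrative only:

    # carve a small partition at the start of the SSD and leave the rest unpartitioned
    # so the drive's firmware has spare area to work with
    parted -s /dev/sdc mklabel gpt
    parted -s /dev/sdc mkpart ceph-journal 1MiB 4GiB
    # then point the OSD at the raw partition in ceph.conf:
    #   [osd.0]
    #   osd journal = /dev/sdc1
    #   osd journal size = 0    # 0 means use the whole block device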
[10:17] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[10:20] * senk (~Adium@194.95.73.117) has joined #ceph
[10:22] * garphy`aw is now known as garphy
[10:23] * fdmanana (~fdmanana@bl5-4-53.dsl.telepac.pt) has joined #ceph
[10:25] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[10:28] * KindTwo (KindOne@198.14.193.180) has joined #ceph
[10:28] * odyssey4me2 (~odyssey4m@165.233.71.2) has joined #ceph
[10:28] * dis (~dis@109.110.66.216) has joined #ceph
[10:29] * geraintjones (~geraint@222-152-77-45.jetstream.xtra.co.nz) Quit (Quit: geraintjones)
[10:30] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[10:30] * KindTwo is now known as KindOne
[10:31] * odyssey4me (~odyssey4m@41-135-15-47.dsl.mweb.co.za) Quit (Read error: Operation timed out)
[10:43] * srenatus_ (~stephan@185.27.182.16) has joined #ceph
[10:45] * LeaChim (~LeaChim@host86-166-182-74.range86-166.btcentralplus.com) has joined #ceph
[10:47] * jbd_ (~jbd_@2001:41d0:52:a00::77) has joined #ceph
[10:47] * mattt (~textual@92.52.76.140) has joined #ceph
[10:50] * srenatus (~stephan@185.27.182.16) Quit (Ping timeout: 480 seconds)
[10:52] * devrim_ (~devrim@208.72.139.54) Quit ()
[10:57] * thomnico (~thomnico@41.164.30.187) Quit (Ping timeout: 480 seconds)
[11:12] * mattt (~textual@92.52.76.140) Quit (Read error: Connection reset by peer)
[11:12] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[11:17] * allsystemsarego (~allsystem@86.121.79.14) has joined #ceph
[11:20] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[11:26] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[11:39] * jjgalvez (~jjgalvez@ip98-167-16-160.lv.lv.cox.net) has joined #ceph
[11:49] * dpippenger (~riven@66-192-9-78.static.twtelecom.net) Quit (Quit: Leaving.)
[11:49] * srenatus_ is now known as srenatus
[11:59] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Read error: Operation timed out)
[12:02] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[12:05] <dwm> Has anyone experimented with putting the OSD XFS filesystem journal on SSDs, alongside a Ceph journal?
[12:08] <humbolt1> Are live migrations based on ceph RBD supported in Openstack Havana already and out of the box (as far as ceph is concerned)?
[12:09] * capri_on (~capri@212.218.126.52) has joined #ceph
[12:09] <fghaas> humbolt1: yep
[12:10] <fghaas> nova live_migration is expected to work
[12:10] <humbolt1> fghaas: yep sounds cool. But I guess I still have to configure libvirtd for it.
[12:10] <humbolt1> fghaas: because just following the ubuntu install guide does not cut it.
[12:11] <fghaas> well you do need to start libvirtd with -l of course, and you need to set up your TLS certs
[12:12] <fghaas> ah wait, scratch the TLS certs part; the default for live_migration_uri in nova.conf is actually qemu+tcp://%s/system
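On Ubuntu 12.04, the "start libvirtd with -l" part amounts to roughly the following; file locations may differ on other distros, and auth_tcp = "none" is only reasonable on a trusted management network:

    # /etc/default/libvirt-bin
    #   libvirtd_opts="-d -l"
    # /etc/libvirt/libvirtd.conf
    #   listen_tls = 0
    #   listen_tcp = 1
    #   auth_tcp = "none"
    service libvirt-bin restart
    # nova.conf can then stay at its default:
    #   live_migration_uri = qemu+tcp://%s/system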
[12:12] * keds (~ked@cpc6-pool14-2-0-cust202.15-1.cable.virginm.net) has joined #ceph
[12:12] * Shmouel (~Sam@fny94-12-83-157-27-95.fbx.proxad.net) has joined #ceph
[12:14] <fghaas> also, http://docs.openstack.org/trunk/config-reference/content/section_configuring-compute-migrations.html#configuring-migrations-kvm-libvirt
[12:15] <humbolt1> I hate the fact that this is still necessary: Shared storage: NOVA-INST-DIR/instances/ (for example, /var/lib/nova/instances) has to be mounted by shared storage. This guide uses NFS but other options, including the OpenStack Gluster Connector, are available.
[12:16] <fghaas> that's not true if you boot from volume
[12:17] * capri_wk (~capri@212.218.127.222) Quit (Ping timeout: 480 seconds)
[12:17] <fghaas> if all your guests boot from RBD backed cinder volumes rather than from glance images, then there is positively no need for your instances directory to be shared
[12:17] * capri_wk (~capri@212.218.127.222) has joined #ceph
[12:19] <humbolt1> fghaas: Great. I only boot from volumes.
[12:21] <humbolt1> fghaas: Actually, I have to use the "boot from image (creates a new volume)" option to get anything to work. Should the plain "boot from image" option work as well, when using ceph RBD for both cinder and glance?!
[12:22] <fghaas> no, if you think that is the case, then you're fundamentally misunderstanding how glance works
[12:23] <fghaas> if your glance is rbd backed, then booting from an image will make Nova stream the image from Glance, convert it to a qcow2 file, and use that as a backing store for your new Nova guest
[12:24] <fghaas> which will be completely ephemeral and will go away if you delete your instance
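A boot-from-volume flow of the kind fghaas describes, so the guest's root disk is an RBD-backed Cinder volume rather than an ephemeral qcow2; all names, sizes and UUIDs below are placeholders:

    # create a bootable 20G volume from a Glance image, then boot the instance from it
    cinder create --image-id <glance-image-uuid> --display-name vol-root 20
    nova boot --flavor m1.small --block-device-mapping vda=<volume-uuid>:::0 my-guest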
[12:24] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[12:24] <fghaas> there's some pretty high quality OpenStack training coming up in your neck of the woods, humbolt1 by the way... just in case you're interested :)
[12:25] * capri_on (~capri@212.218.126.52) Quit (Ping timeout: 480 seconds)
[12:27] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:29] * i_m (~ivan.miro@deibp9eh1--blueice2n2.emea.ibm.com) has joined #ceph
[12:30] * i_m (~ivan.miro@deibp9eh1--blueice2n2.emea.ibm.com) Quit ()
[12:31] * i_m (~ivan.miro@deibp9eh1--blueice4n2.emea.ibm.com) has joined #ceph
[12:31] * jjgalvez (~jjgalvez@ip98-167-16-160.lv.lv.cox.net) Quit (Quit: Leaving.)
[12:32] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[12:38] * zebra (~zebra@101.165.194.249) has joined #ceph
[12:39] <zebra> anyone around I can chat with about some Ceph use cases?
[12:40] <fghaas> zebra: pretty much everyone in the channel might have some thoughts to share on that; just ask your question
[12:41] <zebra> k
[12:41] <zebra> so I have about 4PB of data at the moment
[12:41] <zebra> sitting in a large HSM environment
[12:41] <zebra> for one reason or another, I'm having to make some tough decisions about infrastructure
[12:41] <zebra> one of those is the nature/role HSM and tiering even plays these days
[12:41] <zebra> and the other is data locality
[12:42] <zebra> here is the thing - I want to retain my copy/replication/many-copies architecture that HSM allows for and the ability to bring back infinite snapshots of old data that have been written
[12:42] <zebra> but it might be unwieldy to do it with tape silos for all kinds of local site issue reasons
[12:42] <zebra> so I am wondering
[12:43] <zebra> is Ceph a good/solid thing in its filesystem component for a very large file serving (10GbE+ flat speeds) infrastructure?
[12:44] * markbby (~Adium@168.94.245.2) has joined #ceph
[12:45] * Siva (~sivat@vpnnat.eglbp.corp.yahoo.com) Quit (Quit: Siva)
[12:47] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[12:53] <fghaas> zebra: the filesystem isn't considered stable at this point
[12:53] <zebra> ok
[12:53] <zebra> that is good information to have heard here
[12:53] <zebra> I was aware of this
[12:53] <zebra> but it's nice to hear it from the locals.
[12:54] <zebra> so we're not talking production grade yet - not durable/stable enough for 400 people slamming it with scientific research data.
[12:55] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[12:55] * markbby (~Adium@168.94.245.2) has joined #ceph
[12:56] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[12:56] * ChanServ sets mode +v andreask
[12:58] <fghaas> zebra: you may wait for input from gregsfortytwo or sage, but considering they're on the U.S. West Coast and probably asleep, don't hold your breath -- but as per my information, the "the filesystem isn't production ready" mantra still applies
[12:58] <zebra> understood
[12:58] <zebra> so you'd say object store = win, filesystem = "wait and see" ?
[12:59] * thb (~me@0001bd58.user.oftc.net) has joined #ceph
[13:01] * thb (~me@0001bd58.user.oftc.net) Quit ()
[13:02] * thb (~me@0001bd58.user.oftc.net) has joined #ceph
[13:05] <fghaas> as for filesystem wait and see, yes.
[13:06] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[13:06] * lupu (~lupu@86.107.101.246) has joined #ceph
[13:06] * markbby (~Adium@168.94.245.2) has joined #ceph
[13:11] * ircuser-1 (~ircuser-1@35.222-62-69.ftth.swbr.surewest.net) Quit (Ping timeout: 480 seconds)
[13:13] * ScOut3R (~ScOut3R@catv-80-99-64-8.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[13:14] * t0rn (~ssullivan@c-24-11-198-35.hsd1.mi.comcast.net) has joined #ceph
[13:14] * diegows (~diegows@190.190.17.57) has joined #ceph
[13:16] * srenatus_ (~stephan@185.27.182.16) has joined #ceph
[13:22] * srenatus (~stephan@185.27.182.16) Quit (Ping timeout: 480 seconds)
[13:23] * leochill (~leochill@nyc-333.nycbit.com) has joined #ceph
[13:27] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[13:29] * Siva (~sivat@vpnnat.eglbp.corp.yahoo.com) has joined #ceph
[13:41] * mattt (~textual@182.55.84.224) has joined #ceph
[13:41] * stewiem2000 (~stewiem20@195.10.250.233) has joined #ceph
[13:42] * mattt (~textual@182.55.84.224) Quit ()
[13:48] * fdmanana (~fdmanana@bl5-4-53.dsl.telepac.pt) Quit (Quit: Leaving)
[13:49] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[13:51] <humbolt1> I can't get RBD-based live migration to work, even though everything is in place
[13:53] * ircuser-1 (~ircuser-1@35.222-62-69.ftth.swbr.surewest.net) has joined #ceph
[13:53] * KindTwo (KindOne@h59.35.186.173.dynamic.ip.windstream.net) has joined #ceph
[13:53] * ircuser-1_ (~ircuser-1@35.222-62-69.ftth.swbr.surewest.net) has joined #ceph
[13:54] * keds (~ked@cpc6-pool14-2-0-cust202.15-1.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[13:54] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:54] * KindTwo is now known as KindOne
[13:59] * markbby (~Adium@168.94.245.3) has joined #ceph
[14:12] * markbby (~Adium@168.94.245.3) Quit (Remote host closed the connection)
[14:14] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[14:16] <tnt> humbolt1: with what vm system ?
[14:17] <humbolt1> kvm
[14:17] <humbolt1> openstack
[14:19] <humbolt1> 2014-02-05 13:18:20.549+0000: 47358: warning : virAuditSend:135 : Failed to send audit message virt=kvm vm="instance-00000067" uuid=f25b85ef-8dde-4754-93b2-49c87652261c vm-ctx=118:126 img-ctx=118:126 model=dac: Operation not permitted
[14:19] <fghaas> humbolt1: red herring
[14:19] <humbolt1> ?
[14:21] <fghaas> that's a red herring; "failed to send audit message" warnings should not preclude live migration
[14:21] * mattt (~textual@182.55.84.224) has joined #ceph
[14:21] <fghaas> check your nova-compute.log
[14:21] <fghaas> also, make sure you can do "virsh --connect qemu+tcp://<migrationtargethost>/system list" to see whether your libvirtd is set up the way nova expects
[14:24] <humbolt1> 47500 TRACE nova.openstack.common.rpc.amqp TypeError: libvirt_info() takes exactly 6 arguments (7 given)
[14:24] <humbolt1> the virsh command is successful!
[14:24] <fghaas> humbolt1: um, are you an openstack newbie, by any chance? :)
[14:25] <humbolt1> fghaas: pretty much.
[14:26] <fghaas> your questions may be better suited for #openstack on freenode, because the issues you are troubleshooting are unrelated to Ceph. At any rate, check your nova-compute.log for ERROR messages and make sure the (poorly named) live_migration_flag option in nova.conf does include VIR_MIGRATE_LIVE
[14:27] <humbolt1> fghaas: it does
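The nova.conf setting fghaas refers to looks roughly like this on Havana; the exact flag list varies between deployments, the only point being that VIR_MIGRATE_LIVE has to be in it:

    # /etc/nova/nova.conf
    #   live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
    # and look for ERRORs on both compute hosts
    grep ERROR /var/log/nova/nova-compute.log | tail -20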
[14:27] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[14:28] * schmee (~quassel@phobos.isoho.st) Quit (Quit: No Ping reply in 180 seconds.)
[14:28] * Siva (~sivat@vpnnat.eglbp.corp.yahoo.com) Quit (Quit: Siva)
[14:29] <fghaas> yeah, no way to tell without more info and without looking at your logs
[14:29] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[14:30] * schmee (~quassel@phobos.isoho.st) has joined #ceph
[14:30] * Siva (~sivat@vpnnat.eglbp.corp.yahoo.com) has joined #ceph
[14:31] * sroy (~sroy@2607:fad8:4:6:3e97:eff:feb5:1e2b) has joined #ceph
[14:31] <humbolt1> which kernel to run on ubuntu 12.04 LTS for latest ceph+havana?
[14:33] * mattt (~textual@182.55.84.224) Quit (Quit: Computer has gone to sleep.)
[14:35] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[14:38] * Siva (~sivat@vpnnat.eglbp.corp.yahoo.com) Quit (Quit: Siva)
[14:38] <fghaas> humbolt1: for your ceph hosts or for your Nova nodes?
[14:39] <humbolt1> well, they are one and the same. each ceph node is an openstack node as well.
[14:39] <fghaas> 3.11
[14:41] <fghaas> should be in the HWE stack, iirc
[14:43] <fghaas> also, surely you are aware that Ceph doesn't give you any local affinity, so don't assume that just because some Ceph data is on the same node as the Nova guest using it, Ceph will actually read it locally
[14:44] * mattt (~textual@182.55.84.224) has joined #ceph
[14:47] <humbolt1> fghaas: I only have 3 nodes for this minimal setup. So I have no choice but to have everything on the same hardware.
[14:54] <fghaas> humbolt1: yeah, just letting you know... even if you set your cinder/glance pool size to 3 and you have a copy of all your ceph data on all of your nodes, rbd will still fetch some data over the wire
[14:55] <humbolt1> alright
[14:55] <fghaas> (for good measure, by the way, glusterfs is no better)
[14:56] <humbolt1> I have a dedicated nic for cluster_network and I plan to upgrade it to 10GbE some time soon.
[14:57] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Ping timeout: 480 seconds)
[14:59] * japuzzo (~japuzzo@ool-4570886e.dyn.optonline.net) has joined #ceph
[15:02] * jesus (~jesus@emp048-51.eduroam.uu.se) Quit (Ping timeout: 480 seconds)
[15:06] * TMM (~hp@c97185.upc-c.chello.nl) has joined #ceph
[15:06] * fdmanana (~fdmanana@bl5-4-53.dsl.telepac.pt) has joined #ceph
[15:07] * BillK (~BillK-OFT@124-148-105-94.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[15:08] * markbby (~Adium@168.94.245.3) has joined #ceph
[15:08] * ScOut3R (~ScOut3R@catv-80-99-64-8.catv.broadband.hu) has joined #ceph
[15:09] * jesus (~jesus@emp048-51.eduroam.uu.se) has joined #ceph
[15:11] * plut0 (~cory@pool-108-4-147-13.albyny.fios.verizon.net) has joined #ceph
[15:16] * dereky (~dereky@129-2-129-152.wireless.umd.edu) has joined #ceph
[15:18] * capri_wk (~capri@212.218.127.222) Quit (Ping timeout: 480 seconds)
[15:19] <plut0> how goes the zfs development?
[15:20] * Siva (~sivat@117.192.53.165) has joined #ceph
[15:21] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[15:24] * sleinen2 (~Adium@2001:620:0:25:e0da:c8f0:1511:544e) has joined #ceph
[15:31] * sprachgenerator (~sprachgen@c-67-167-211-254.hsd1.il.comcast.net) has joined #ceph
[15:35] * jcsp (~Adium@0001bf3a.user.oftc.net) has joined #ceph
[15:37] * JeffK (~Narb@38.99.52.10) Quit (Ping timeout: 480 seconds)
[15:38] * capri (~capri@212.218.127.222) has joined #ceph
[15:43] * JC (~JC@c-24-21-120-173.hsd1.or.comcast.net) Quit (Quit: Leaving.)
[15:45] * kickr (~kickr@0001c4ca.user.oftc.net) has joined #ceph
[15:46] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[15:46] * rendar (~s@host94-182-dynamic.37-79-r.retail.telecomitalia.it) has joined #ceph
[15:48] * kickr (~kickr@0001c4ca.user.oftc.net) Quit ()
[15:50] * dereky (~dereky@129-2-129-152.wireless.umd.edu) Quit (Quit: dereky)
[15:51] * dereky (~dereky@129-2-129-152.wireless.umd.edu) has joined #ceph
[15:54] * dmsimard (~Adium@108.163.152.2) has joined #ceph
[15:56] <humbolt1> Do I need NFS for live migration of a RBD VM now in Havana?
[15:58] <fghaas> again, if your nova guest boots from volume, then no you don't
[15:58] <fghaas> if it boots from an image, then yes you do need something that shares /var/lib/nova/instances
[16:01] * yanzheng (~zhyan@134.134.139.76) has joined #ceph
[16:04] * sleinen2 (~Adium@2001:620:0:25:e0da:c8f0:1511:544e) Quit (Quit: Leaving.)
[16:04] * sleinen (~Adium@130.59.94.140) has joined #ceph
[16:05] * ircolle (~Adium@c-67-172-132-222.hsd1.co.comcast.net) has joined #ceph
[16:07] * kickr (~kickr@0001c4ca.user.oftc.net) has joined #ceph
[16:07] * dereky (~dereky@129-2-129-152.wireless.umd.edu) Quit (Quit: dereky)
[16:09] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) Quit (Quit: Leaving)
[16:09] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[16:10] * yanzheng (~zhyan@134.134.139.76) Quit (Remote host closed the connection)
[16:12] * sleinen (~Adium@130.59.94.140) Quit (Ping timeout: 480 seconds)
[16:14] * kickr (~kickr@0001c4ca.user.oftc.net) Quit (Quit: Leaving)
[16:14] * mattt (~textual@182.55.84.224) Quit (Quit: Computer has gone to sleep.)
[16:16] * japuzzo (~japuzzo@ool-4570886e.dyn.optonline.net) Quit (Quit: Leaving)
[16:21] * thomnico (~thomnico@41.164.30.187) has joined #ceph
[16:22] * Siva_ (~sivat@vpnnat.eglbp.corp.yahoo.com) has joined #ceph
[16:27] * Siva (~sivat@117.192.53.165) Quit (Ping timeout: 480 seconds)
[16:27] * Siva_ is now known as Siva
[16:30] * sleinen (~Adium@130.59.94.140) has joined #ceph
[16:32] * sleinen1 (~Adium@2001:620:0:25:81a5:cdb4:5d04:2624) has joined #ceph
[16:32] * sleinen1 (~Adium@2001:620:0:25:81a5:cdb4:5d04:2624) Quit ()
[16:32] * sleinen (~Adium@130.59.94.140) Quit (Read error: Connection reset by peer)
[16:32] * sleinen1 (~Adium@130.59.94.140) has joined #ceph
[16:33] * i_m (~ivan.miro@deibp9eh1--blueice4n2.emea.ibm.com) Quit (Ping timeout: 480 seconds)
[16:34] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[16:35] * kickr (~kickr@0001c4ca.user.oftc.net) has joined #ceph
[16:35] * diegows (~diegows@190.190.17.57) Quit (Ping timeout: 480 seconds)
[16:36] * japuzzo (~Joe@ool-4570886e.dyn.optonline.net) has joined #ceph
[16:37] * sleinen1 (~Adium@130.59.94.140) Quit (Read error: Connection reset by peer)
[16:37] * sleinen (~Adium@130.59.94.140) has joined #ceph
[16:41] * thomnico (~thomnico@41.164.30.187) Quit (Ping timeout: 480 seconds)
[16:45] * sleinen (~Adium@130.59.94.140) Quit (Ping timeout: 480 seconds)
[16:45] * sleinen (~Adium@2001:620:0:25:74ab:7c36:8943:357c) has joined #ceph
[16:46] * kickr (~kickr@0001c4ca.user.oftc.net) Quit (Quit: Leaving)
[16:47] * kickr (~kickr@0001c4ca.user.oftc.net) has joined #ceph
[16:48] * senk (~Adium@194.95.73.117) Quit (Ping timeout: 480 seconds)
[16:49] * dereky (~dereky@129-2-129-152.wireless.umd.edu) has joined #ceph
[16:49] * sprachgenerator (~sprachgen@c-67-167-211-254.hsd1.il.comcast.net) Quit (Quit: sprachgenerator)
[16:50] * dereky (~dereky@129-2-129-152.wireless.umd.edu) Quit ()
[16:52] * freedomhui (~freedomhu@119.139.94.114) has joined #ceph
[16:55] * xarses (~andreww@c-24-23-183-44.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:56] * leochill (~leochill@nyc-333.nycbit.com) Quit (Quit: Leaving)
[17:00] * dereky (~dereky@129-2-129-152.wireless.umd.edu) has joined #ceph
[17:02] * ScOut3R (~ScOut3R@catv-80-99-64-8.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[17:03] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[17:03] * sleinen (~Adium@2001:620:0:25:74ab:7c36:8943:357c) Quit (Quit: Leaving.)
[17:03] * sleinen (~Adium@130.59.94.140) has joined #ceph
[17:07] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[17:09] * steki (~steki@91.195.39.5) Quit (Ping timeout: 480 seconds)
[17:11] * sleinen (~Adium@130.59.94.140) Quit (Ping timeout: 480 seconds)
[17:17] * xarses (~andreww@12.164.168.117) has joined #ceph
[17:18] <fghaas> anyone here aware of the realistic number of rbd clients you could run on a machine with 512G RAM, where the caller's max processes ulimit is 8192, before you would see pthread_create hit EAGAIN, causing the error message in https://github.com/ceph/ceph/blob/master/src/common/Thread.cc#L107?
[17:18] <fghaas> hundreds? thousands? would it be expected that you hit that limit with no more than about 200 qemu-rbd callers?
[17:19] <fghaas> (just trying to get a second opinion before I file yet another bug today that is then determined to be a feature)
[17:20] * sleinen (~Adium@2001:620:0:2d:708a:79f2:3266:19ce) has joined #ceph
[17:21] * lupu (~lupu@86.107.101.246) Quit (Quit: Leaving.)
[17:21] * bandrus (~Adium@adsl-75-5-248-216.dsl.scrm01.sbcglobal.net) has joined #ceph
[17:22] * lupu (~lupu@86.107.101.246) has joined #ceph
[17:25] * freedomhui (~freedomhu@119.139.94.114) Quit (Quit: Leaving...)
[17:26] * rotbeard (~redbeard@2a02:908:df19:7a80:76f0:6dff:fe3b:994d) has joined #ceph
[17:27] * senk (~Adium@ip-5-147-216-213.unitymediagroup.de) has joined #ceph
[17:27] <gregsfortytwo1> it'll depend on whether you're invoking them as separate clients or unified, and how many OSDs there are that each client needs to connect to
[17:27] <gregsfortytwo1> fghaas: ^
[17:28] <fghaas> gregsfortytwo, for qemu/kvm domains I would guess that's "separate", not "unified" per your definition, correct?
[17:28] * sleinen (~Adium@2001:620:0:2d:708a:79f2:3266:19ce) Quit (Ping timeout: 480 seconds)
[17:28] <gregsfortytwo1> I don't actually know the kvm architecture well enough to be sure, but with that phrasing I'm going to guess "separate"
[17:29] <gregsfortytwo1> joshd would know
[17:30] <fghaas> yowzers. so 200 qemu domains talking to 200 OSDs makes 40k threads? eek.
[17:30] <fghaas> well it's definitely 200 different processes -- absent threads being sharable across processes, that would definitely meet the standard definition of "separate"
[17:31] * sleinen (~Adium@2001:620:0:25:a036:d669:31e3:9b21) has joined #ceph
[17:33] <gregsfortytwo1> that's our standard 2 threads per Pipe pseudo-problem
[17:33] <fghaas> so more like 80k threads in that scenario?
[17:33] <gregsfortytwo1> yeah :)
[17:34] <fghaas> um, and can you give me an estimate for memory footprint per thread?
[17:34] <gregsfortytwo1> so far the only trouble it's actually caused is "you want me to set my thread limit *where*!?!?!"
[17:34] <gregsfortytwo1> very low
[17:35] <gregsfortytwo1> probably the Linux minimum (4k?)
[17:35] * ScOut3R (~ScOut3R@catv-89-133-22-210.catv.broadband.hu) has joined #ceph
[17:35] <gregsfortytwo1> or maybe there's not a minimum and I'm just thinking of the max stack size (or are they the same? argh need sleep)
[17:35] <fghaas> ok, so about 300M, that's bearable
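The arithmetic behind those numbers, using the figures quoted in this exchange (estimates, not measurements):

    # 200 rbd clients x 200 OSDs x 2 threads per Pipe = 80,000 threads
    echo $((200 * 200 * 2))
    # at the ~4 KiB per idle thread guessed above, that is ~312 MiB, i.e. the "about 300M"
    echo $((200 * 200 * 2 * 4 / 1024))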
[17:36] <gregsfortytwo1> we've had a lot of people convinced it will destroy their systems who try it out and say "oh, okay then"
[17:36] <gregsfortytwo1> it'll be a problem eventually
[17:37] <jtangwk> is there a way of limiting osd recovery ?
[17:37] <gregsfortytwo1> but both Linux and processors are a lot better at that sort of thing than they were in the nineties
[17:37] <jtangwk> i want to place an osd into an existing cluster and it nerfs the entire system performance
[17:38] <jtangwk> mind you we only have two osd's right now
[17:38] <gregsfortytwo1> jtangwk: check the documentation for the max backfill config options
[17:39] <fghaas> gregsfortytwo1: still, in that scenario (200 OSDs, 200 VMs)... with libvirt-qemu's nproc limit set to 260,000 I wouldn't expect to hit that assertion
[17:39] <jtangwk> ah okay, thanks
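The throttles gregsfortytwo1 is pointing jtangwk at; the value 1 is just the usual "be gentle" setting, not a recommendation for any particular cluster:

    # at runtime, across all OSDs
    ceph tell osd.\* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    # or persistently in ceph.conf, under [osd]:
    #   osd max backfills = 1
    #   osd recovery max active = 1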
[17:41] <gregsfortytwo1> fghaas: sorry, I think I'm missing some crucial context
[17:41] <gregsfortytwo1> you hit that Thread create assert?
[17:41] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[17:41] * rmoe (~quassel@173-228-89-134.dsl.static.sonic.net) Quit (Read error: Operation timed out)
[17:41] <fghaas> yeah exactly. well, andreask did, I'm standing in for him while he is feeding his kids dinner :)
[17:42] * senk (~Adium@ip-5-147-216-213.unitymediagroup.de) Quit (Quit: Leaving.)
[17:42] <fghaas> so with 8,000 threads per process (as per /proc/<pid>/limits) and 260k per user, you should get nowhere near those limits with 400 threads per process and 200 processes running under that user... right?
[17:42] <fghaas> s/so with/so with the limit being/
[17:42] <kraken> fghaas meant to say: so with the limit being 8,000 threads per process (as per /proc/<pid>/limits) and 260k per user, you should get nowhere near those limits with 400 threads per process and 200 processes running under that user... right?
[17:43] <gregsfortytwo1> hrm, I wouldn't expect so
[17:43] <fghaas> see I wouldn't either :)
[17:44] <fghaas> unless there's something that leaks threads
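The limits and thread counts being compared here can be read off the host; the pgrep pattern is only an assumption about what the qemu processes are called:

    # per-process limit for one qemu guest (threads count against 'Max processes')
    grep 'Max processes' /proc/$(pgrep -f qemu | head -1)/limits
    # thread count of that guest, and a rough total across all guests
    ps -o nlwp= -p $(pgrep -f qemu | head -1)
    ps -eLf | grep -c '[q]emu'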
[17:46] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Quit: Konversation terminated!)
[17:46] * sjustlaptop (~sam@24-205-43-60.dhcp.gldl.ca.charter.com) has joined #ceph
[17:46] <gregsfortytwo1> yeah, I'm not sure
[17:46] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) has joined #ceph
[17:47] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[17:47] <gregsfortytwo1> I could see going a little past that if the OSDs were flapping, but not by 3x
[17:47] <fghaas> ok, we'll keep digging into that
[17:47] <fghaas> and see if we can get this to reliably reproduce
[17:48] * dereky (~dereky@129-2-129-152.wireless.umd.edu) Quit (Quit: dereky)
[17:49] * ScOut3R (~ScOut3R@catv-89-133-22-210.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[17:50] * JeffK (~JeffK@38.99.52.10) has joined #ceph
[17:54] * markbby (~Adium@168.94.245.3) Quit (Quit: Leaving.)
[17:54] * JoeGruher (~JoeGruher@134.134.137.75) has joined #ceph
[17:55] * peetaur (~peter@dhcp-108-168-3-60.cable.user.start.ca) has joined #ceph
[17:56] * rmoe (~quassel@173-228-89-134.dsl.static.sonic.net) has joined #ceph
[17:56] * markbby (~Adium@168.94.245.3) has joined #ceph
[17:58] * angdraug (~angdraug@12.164.168.117) has joined #ceph
[17:58] * dereky (~dereky@129-2-129-152.wireless.umd.edu) has joined #ceph
[17:59] * plut0 (~cory@pool-108-4-147-13.albyny.fios.verizon.net) has left #ceph
[17:59] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[18:02] * stewiem2000 (~stewiem20@195.10.250.233) Quit (Quit: Leaving.)
[18:06] <JeffK> Hi everyone, I'm in need of some help. I was hoping to present a proposal for a project involving running VMs off a Ceph cluster to my boss today and I really wanted to show off a proof of concept with it, but haven't been able to get it working quite yet.
[18:07] * kickr (~kickr@0001c4ca.user.oftc.net) Quit (Quit: Leaving)
[18:07] <bens> JeffK what is happenin'?
[18:07] * kickr (~kickr@0001c4ca.user.oftc.net) has joined #ceph
[18:08] <JeffK> Well, I started by setting up a 3 node cluster and that seems to be working fine. I am having trouble getting a VMM to talk to it.
[18:09] * kickr (~kickr@0001c4ca.user.oftc.net) Quit ()
[18:09] * kickr (~kickr@0001c4ca.user.oftc.net) has joined #ceph
[18:10] <pmatulis> JeffK: you will need to be more specific. what did you do to have a vm talk to it...
[18:10] <JeffK> At first, I was trying to run Xen on one of the cluster nodes, but I couldn't get it to talk to libvirt to create a storage repository on RBD.
[18:10] * dereky (~dereky@129-2-129-152.wireless.umd.edu) Quit (Quit: dereky)
[18:10] * dereky (~dereky@129-2-129-152.wireless.umd.edu) has joined #ceph
[18:10] <JeffK> I got an error saying the libvirt driver was not recognised.
[18:11] * Underbyte (~jerrad@pat-global.macpractice.net) has joined #ceph
[18:11] <JeffK> I then set up a 4th box and installed the XenServer tech preview that is supposed to have Ceph compatibility built in.
[18:11] * dereky (~dereky@129-2-129-152.wireless.umd.edu) Quit ()
[18:11] <pmatulis> JeffK: i presume you use Xen already?
[18:12] <JeffK> Yeah, we have a couple servers running XenServer right now, so I figure my boss will be more likely to approve of a product he is already familiar with.
[18:13] <JeffK> We are a consultancy firm and he is a big fan of plug and play type stuff. Doesn't want to spend much time fiddling with stuff (plus, he doesn't know Linux at all).
[18:13] * c74d (~c74d@2002:4404:712c:0:d441:98f3:f34d:c664) has joined #ceph
[18:15] <JeffK> Anyway, the 4th box with the XenServer tech preview doesn't seem to want to talk to Ceph.
[18:16] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) has joined #ceph
[16:16] <JeffK> When I was initially configuring the cluster, I had trouble getting it set up using CentOS (the networking in that distro seems quite foreign to me). I was able to get it working using Ubuntu no problem though.
[18:17] <JeffK> So, ideally, instead of using the 4th box with the CentOS based tech preview, I'd like to just get Xen working with Ceph on one of the cluster nodes.
[18:18] <JeffK> It doesn't even have to work all that well, I just want something to show him.
[18:19] * rmoe (~quassel@173-228-89-134.dsl.static.sonic.net) Quit (Read error: Operation timed out)
[18:20] * ScOut3R (~scout3r@BC0652CA.dsl.pool.telekom.hu) has joined #ceph
[18:23] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[18:24] * dereky (~dereky@129-2-129-152.wireless.umd.edu) has joined #ceph
[18:28] * Sysadmin88 (~IceChat77@94.4.30.95) has joined #ceph
[18:30] * odyssey4me2 (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[18:32] * Tamil (~Adium@cpe-108-184-77-29.socal.res.rr.com) has joined #ceph
[18:33] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 26.0/20131205075310])
[18:38] * Steki (~steki@198.199.65.141) has joined #ceph
[18:38] * rmoe (~quassel@12.164.168.117) has joined #ceph
[18:41] * alexm_ (~alexm@83.167.43.235) Quit (Ping timeout: 480 seconds)
[18:42] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Ping timeout: 480 seconds)
[18:44] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:50] * garphy is now known as garphy`aw
[18:52] * srenatus_ (~stephan@185.27.182.16) Quit (Read error: Operation timed out)
[18:52] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[18:53] * diegows (~diegows@200.68.116.185) has joined #ceph
[18:54] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) Quit (Quit: mtanski)
[18:56] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[18:57] * Siva (~sivat@vpnnat.eglbp.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[18:57] * dereky (~dereky@129-2-129-152.wireless.umd.edu) Quit (Quit: dereky)
[18:59] * gregmark (~Adium@68.87.42.115) has joined #ceph
[19:02] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[19:02] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) Quit ()
[19:04] * neurodrone (~neurodron@rrcs-72-43-115-186.nyc.biz.rr.com) has joined #ceph
[19:07] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) has joined #ceph
[19:08] * Tamil (~Adium@cpe-108-184-77-29.socal.res.rr.com) has left #ceph
[19:10] * gregsfortytwo (~Adium@2607:f298:a:607:88b1:f0c9:a56a:5850) has joined #ceph
[19:12] * Sysadmin88 (~IceChat77@94.4.30.95) Quit (Quit: Depression is merely anger without enthusiasm)
[19:12] * sprachgenerator (~sprachgen@130.202.135.186) has joined #ceph
[19:15] * sleinen (~Adium@2001:620:0:25:a036:d669:31e3:9b21) Quit (Quit: Leaving.)
[19:15] * gregsfortytwo2 (~Adium@2607:f298:a:607:c087:1d16:36ff:dd6b) Quit (Ping timeout: 480 seconds)
[19:15] * sleinen (~Adium@130.59.94.140) has joined #ceph
[19:15] * srenatus (~stephan@185.27.182.16) has joined #ceph
[19:16] <JeffK> Hmm, so weird. After I install the XenServer tech preview, the networking gets all screwed up. I can't even ping in or out.
[19:16] <JeffK> It adds some bridges, but everything looks fine to me. I even have iptables disabled just to be sure.
[19:21] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[19:22] * senk (~Adium@ip-5-147-216-213.unitymediagroup.de) has joined #ceph
[19:23] * sleinen (~Adium@130.59.94.140) Quit (Ping timeout: 480 seconds)
[19:24] * Steki (~steki@198.199.65.141) Quit (Ping timeout: 480 seconds)
[19:26] * Tamil (~Adium@cpe-108-184-77-29.socal.res.rr.com) has joined #ceph
[19:26] * markbby (~Adium@168.94.245.3) Quit (Quit: Leaving.)
[19:27] * srenatus (~stephan@185.27.182.16) Quit (Ping timeout: 480 seconds)
[19:31] * dmsimard1 (~Adium@ap02.wireless.co.mtl.iweb.com) has joined #ceph
[19:33] * Tamil2 (~Adium@cpe-108-184-77-29.socal.res.rr.com) has joined #ceph
[19:33] * Tamil (~Adium@cpe-108-184-77-29.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[19:35] * Tamil (~Adium@cpe-108-184-77-29.socal.res.rr.com) has joined #ceph
[19:36] * dmsimard (~Adium@108.163.152.2) Quit (Ping timeout: 480 seconds)
[19:39] * nwat (~textual@eduroam-227-20.ucsc.edu) has joined #ceph
[19:41] * Pedras (~Adium@216.207.42.132) has joined #ceph
[19:42] * Tamil2 (~Adium@cpe-108-184-77-29.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[19:43] * xdeller (~xdeller@213.33.189.50) Quit (Quit: Leaving)
[19:48] * jjgalvez (~jjgalvez@ip98-167-16-160.lv.lv.cox.net) has joined #ceph
[19:49] * dereky (~dereky@proxy00.umiacs.umd.edu) has joined #ceph
[19:49] * sroy (~sroy@2607:fad8:4:6:3e97:eff:feb5:1e2b) Quit (Ping timeout: 480 seconds)
[19:52] * geraintjones (~geraint@222-152-77-45.jetstream.xtra.co.nz) has joined #ceph
[19:54] * sroy (~sroy@2607:fad8:4:6:3e97:eff:feb5:1e2b) has joined #ceph
[19:54] * dgbaley27 (~matt@c-76-120-64-12.hsd1.co.comcast.net) has joined #ceph
[19:58] * markbby (~Adium@168.94.245.3) has joined #ceph
[20:05] * leochill (~leochill@nyc-333.nycbit.com) has joined #ceph
[20:08] * Cube (~Cube@66-87-131-70.pools.spcsdns.net) has joined #ceph
[20:08] * markbby (~Adium@168.94.245.3) Quit (Ping timeout: 480 seconds)
[20:10] <geraintjones> Hi Guys
[20:10] * sroy (~sroy@2607:fad8:4:6:3e97:eff:feb5:1e2b) Quit (Ping timeout: 480 seconds)
[20:11] <geraintjones> is backfill_toofull a problem ?
[20:11] * jks (~jks@3e6b5724.rev.stofanet.dk) Quit (Read error: Connection reset by peer)
[20:11] * jks (~jks@3e6b5724.rev.stofanet.dk) has joined #ceph
[20:12] * markbby (~Adium@168.94.245.3) has joined #ceph
[20:13] <geraintjones> recovery has seemingly stalled
[20:13] * sroy (~sroy@2607:fad8:4:6:6e88:14ff:feff:5374) has joined #ceph
[20:13] <geraintjones> http://pastebin.com/pMCjr4Ma
[20:14] <jcsp> geraintjones: it means that a PG is trying to create a new copy on an OSD which doesn't have enough space.
[20:14] <jcsp> it is bad news
[20:14] <geraintjones> all of my osds have heaps of free space
[20:14] <Gugge-47527> the one its trying to use does not :)
[20:15] <Gugge-47527> its free space is under osd_backfill_full_ratio
[20:15] <geraintjones> http://pastebin.com/QdfBADdT
[20:16] <Gugge-47527> and ceph pg dump
[20:16] <Gugge-47527> maybe just grep for the toofull lines :)
[20:17] <geraintjones> http://pastebin.com/PuEJsbF2
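A sketch of the kind of check being suggested here, for readers hitting backfill_toofull themselves. It uses the standard ceph CLI of this era; the grep pattern is illustrative, and the injectargs override at the end is an assumption about how far you want to push osd_backfill_full_ratio (default 0.85):

    # Which PGs are currently refusing to backfill?
    ceph pg dump | grep backfill_toofull
    # Any near-full warnings, and per-OSD usage (kb_used / kb_avail columns)?
    ceph health detail
    ceph pg dump osds
    # As a last resort the threshold can be raised temporarily, e.g.:
    ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'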
[20:18] <Gugge-47527> my first guess is you have an osd defined using the / filesystem on storage3
[20:18] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[20:18] * ChanServ sets mode +v andreask
[20:19] <geraintjones> pah
[20:19] <geraintjones> you are right
[20:19] <geraintjones> seems 26 isn't mounted
[20:19] <geraintjones> if I stop id=26
[20:19] <Gugge-47527> i would reweight it to 0
[20:20] <geraintjones> and mount it then copy everything to the right place and start it ?
[20:20] <Gugge-47527> and wait for it to be empty
[20:20] <geraintjones> okay I will do that first
[20:20] <Gugge-47527> I'm pretty sure a simple copy won't keep all the needed attributes
[20:20] <fghaas> Gugge-47527: "ceph osd out 26" would be, um, simpler
[20:21] <fghaas> than your reweight suggestion, I mean
[20:21] * jbd_ (~jbd_@2001:41d0:52:a00::77) has left #ceph
[20:21] <Gugge-47527> maybe :)
[20:22] <geraintjones> yay
[20:22] <geraintjones> now its backfilling :)
[20:22] <geraintjones> will re-add 26 once its all happy
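Both suggestions above boil down to one-liners; a sketch of the two options discussed, using osd.26 from this conversation (exact weights and timing depend on the cluster, and which route geraintjones actually took is not spelled out in the log):

    # Gugge-47527's route: push the OSD's weight to 0 so its PGs drain off it
    ceph osd reweight 26 0
    # fghaas's simpler route: mark it out, same net effect on data placement
    ceph osd out 26
    # watch recovery/backfill until the cluster is healthy again
    ceph -w
    # then fix the missing mount for osd.26 and bring it back
    ceph osd in 26        # or: ceph osd reweight 26 1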
[20:24] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[20:25] * markbby (~Adium@168.94.245.3) Quit (Quit: Leaving.)
[20:25] * markbby (~Adium@168.94.245.3) has joined #ceph
[20:25] * dmsimard (~Adium@70.38.0.249) has joined #ceph
[20:26] * sleinen1 (~Adium@2001:620:0:26:3c05:a639:7d32:c2da) has joined #ceph
[20:27] * dmsimard2 (~Adium@70.38.0.245) has joined #ceph
[20:28] * jcsp1 (~Adium@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[20:28] * kickr (~kickr@0001c4ca.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:29] * dmsimard1 (~Adium@ap02.wireless.co.mtl.iweb.com) Quit (Read error: Connection reset by peer)
[20:31] * senk (~Adium@ip-5-147-216-213.unitymediagroup.de) Quit (Quit: Leaving.)
[20:32] * jcsp (~Adium@0001bf3a.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:33] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[20:33] * dmsimard2 (~Adium@70.38.0.245) Quit (Quit: Leaving.)
[20:33] * dmsimard (~Adium@70.38.0.249) Quit (Ping timeout: 480 seconds)
[20:35] * dmsimard (~Adium@ap03.wireless.co.mtl.iweb.com) has joined #ceph
[20:39] * jcsp (~Adium@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[20:40] * jcsp1 (~Adium@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[20:40] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[20:41] * gregmark (~Adium@68.87.42.115) has joined #ceph
[20:42] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[20:43] * joelio (~Joel@88.198.107.214) Quit (Ping timeout: 480 seconds)
[20:54] * ScOut3R (~scout3r@BC0652CA.dsl.pool.telekom.hu) Quit ()
[20:58] * Pedras1 (~Adium@216.207.42.134) has joined #ceph
[20:58] * dgbaley27 (~matt@c-76-120-64-12.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[21:01] * Pedras1 (~Adium@216.207.42.134) Quit ()
[21:02] * gregsfortytwo1 (~Adium@cpe-172-250-69-138.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[21:02] * gregsfortytwo1 (~Adium@cpe-172-250-69-138.socal.res.rr.com) has joined #ceph
[21:03] * fdmanana (~fdmanana@bl5-4-53.dsl.telepac.pt) Quit (Quit: Leaving)
[21:04] * Pedras1 (~Adium@216.207.42.134) has joined #ceph
[21:04] * Pedras (~Adium@216.207.42.132) Quit (Ping timeout: 480 seconds)
[21:07] * garphy`aw is now known as garphy
[21:08] * garphy is now known as garphy`aw
[21:09] * garphy`aw is now known as garphy
[21:14] * Pedras1 (~Adium@216.207.42.134) Quit (Ping timeout: 480 seconds)
[21:15] * kickr (~kickr@0001c4ca.user.oftc.net) has joined #ceph
[21:18] * geraintjones (~geraint@222-152-77-45.jetstream.xtra.co.nz) Quit (Quit: geraintjones)
[21:18] * Underbyte (~jerrad@pat-global.macpractice.net) Quit (Ping timeout: 480 seconds)
[21:18] * Cube (~Cube@66-87-131-70.pools.spcsdns.net) Quit (Quit: Leaving.)
[21:20] * lupu (~lupu@86.107.101.246) Quit (Ping timeout: 480 seconds)
[21:21] * Underbyte (~jerrad@pat-global.macpractice.net) has joined #ceph
[21:34] <dwm> Oooh, I just saw that v0.76 has landed, and brings MDS directory storage changes.
[21:34] <dwm> Does this have performance implications?
[21:35] <dwm> (I've just started testing CephFS performance, and -- at least in the cold-cache case -- running `find` on a populated CephFS appears to be remarkably slow. I'm wondering if this will affect that..)
[21:37] * fdmanana (~fdmanana@bl5-4-53.dsl.telepac.pt) has joined #ceph
[21:43] * xmltok (~xmltok@216.103.134.250) has joined #ceph
[21:44] <nwat> I have a whole bunch of inconsistent PGs in a single pool. Is there a form of 'ceph pg repair' that'll do it all at once..?
[21:45] <sage> nwat: have to go one at a time
[21:45] <sage> dwm: that's surprising; find should be pretty fast since it's loading entire directories with one object io
[21:46] * nwat deletes pool
[21:47] <fghaas> nwat, that is -- ahem -- decisive action, but let me know whether that fixes your problem if the PGs are inconsistent :)
[21:47] * geraintjones (~geraint@222-152-77-45.jetstream.xtra.co.nz) has joined #ceph
[21:47] * sleinen1 (~Adium@2001:620:0:26:3c05:a639:7d32:c2da) Quit (Quit: Leaving.)
[21:47] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[21:48] <nwat> Heh.. I was just kidding. Yeah, I did a repair on each PG listed as inconsistent, but only about half have resolved. There don't seem to be any active repairs, though.
[21:48] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) Quit (Quit: mtanski)
[21:49] * Sysadmin88 (~IceChat77@94.4.30.95) has joined #ceph
[21:51] <dmick> nwat, sage: well, there's ceph osd repair, right?
[21:51] <dmick> for those that share a primary
[21:52] <dmick> although I'm sure it handles them serially anyway
[21:52] <fghaas> I don't think that's in dumpling though, iirc
[21:52] <dmick> I think it's pretty old code
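Since there is no single command to repair a whole pool, the usual workaround is a small loop over the inconsistent PGs. A sketch assuming the dumpling-era CLI; the awk pattern just pulls PG ids out of `ceph health detail` lines of the form "pg 0.6 is active+clean+inconsistent, acting [...]", and the osd id in the last command is only an example:

    # Kick off a repair for every PG currently reported inconsistent
    ceph health detail | awk '$1 == "pg" && /inconsistent/ {print $2}' | \
        while read pg; do
            ceph pg repair "$pg"
        done
    # Or, as dmick suggests, repair all PGs whose primary lives on one OSD
    ceph osd repair 12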
[21:54] * Cube (~Cube@12.248.40.138) has joined #ceph
[21:55] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[21:55] <sage> jcsp: can you redo the allow_new_snaps PR against next? the change is there too
[21:56] <xmltok> are firefly rpms available?
[21:56] <sage> jcsp: otherwise, Reviewed-by:
[21:57] * sarob (~sarob@2001:4998:effd:600:cc1d:bfe3:c1e1:b725) has joined #ceph
[21:57] <dmick> xmltok: firefly isn't finalized yet
[21:57] * ircolle (~Adium@c-67-172-132-222.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[21:59] <xmltok> oh, there was a release notification put out yesterday on ceph.com
[22:01] <jcsp> sage: done (I think). do you merge things to next+master in parallel or pick from one to the other usually?
[22:02] <jcsp> I'm sort of fuzzily inferring the relationship
[22:02] <jcsp> the next PR is https://github.com/ceph/ceph/pull/1190
[22:05] <sage> we semi-regularly merge next back into master, so it can go just into next.
[22:05] <sage> merged, thanks!
[22:05] <sage> jcsp: ^
[22:05] <jcsp> gotcha
[22:05] <jcsp> thanks
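For readers unfamiliar with the branch arrangement being described: fixes are targeted at next, and next is periodically merged back into master. A generic git sketch of retargeting a pull-request branch under that workflow; branch and remote names other than next/master (my-feature, my-fork) are placeholders:

    # Rebase the feature branch so it applies cleanly on top of next
    git fetch origin
    git rebase --onto origin/next origin/master my-feature
    git push --force my-fork my-feature   # then open the PR against next
    # Maintainer side: next is merged back into master now and then,
    # so anything merged into next eventually reaches master too
    git checkout master && git merge next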
[22:06] * ScOut3R (~ScOut3R@BC0652CA.dsl.pool.telekom.hu) has joined #ceph
[22:07] * kickr (~kickr@0001c4ca.user.oftc.net) Quit (Remote host closed the connection)
[22:07] * xarses (~andreww@12.164.168.117) Quit (Ping timeout: 480 seconds)
[22:07] * kickr (~kickr@0001c4ca.user.oftc.net) has joined #ceph
[22:11] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) has joined #ceph
[22:18] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[22:18] * xarses (~andreww@12.164.168.117) has joined #ceph
[22:20] * t0rn (~ssullivan@c-24-11-198-35.hsd1.mi.comcast.net) Quit (Quit: Leaving.)
[22:25] * zebra (~zebra@101.165.194.249) Quit (Quit: Mmm. Sleep.)
[22:31] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[22:31] * mtanski (~mtanski@cpe-72-229-51-156.nyc.res.rr.com) Quit (Quit: mtanski)
[22:33] * JoeGruher (~JoeGruher@134.134.137.75) Quit (Remote host closed the connection)
[22:33] * kickr (~kickr@0001c4ca.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:34] * srenatus (~stephan@185.27.182.16) has joined #ceph
[22:41] * geraintjones (~geraint@222-152-77-45.jetstream.xtra.co.nz) Quit (Quit: geraintjones)
[22:42] * srenatus (~stephan@185.27.182.16) Quit (Ping timeout: 480 seconds)
[22:44] * rotbeard (~redbeard@2a02:908:df19:7a80:76f0:6dff:fe3b:994d) Quit (Quit: Verlassend)
[22:44] * geraintjones (~geraint@222-152-77-45.jetstream.xtra.co.nz) has joined #ceph
[22:48] * rendar (~s@host94-182-dynamic.37-79-r.retail.telecomitalia.it) Quit ()
[22:51] * dmsimard (~Adium@ap03.wireless.co.mtl.iweb.com) Quit (Quit: Leaving.)
[22:57] * zerick (~eocrospom@200.1.177.147) has joined #ceph
[23:04] * rkeene (1011@oc9.org) has left #ceph
[23:05] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) Quit (Remote host closed the connection)
[23:06] * xarses (~andreww@12.164.168.117) Quit (Ping timeout: 480 seconds)
[23:06] * sprachgenerator (~sprachgen@130.202.135.186) Quit (Quit: sprachgenerator)
[23:06] <dwm> sage: Sorry, was AFK. I haven't got a feel for where the latency is just yet; it appears to be fine for listing the contents of a directory, but actually proceeding from one to the next feels slow.
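One way to narrow down where that latency lives is to time the readdir step separately from the per-entry stats. A sketch assuming a kernel-client mount; the path /mnt/cephfs/somedir is a placeholder, not dwm's setup:

    # Drop local caches so the measurement is genuinely cold
    sync; echo 3 > /proc/sys/vm/drop_caches
    # readdir only: should be roughly one directory-object read per directory
    time ls /mnt/cephfs/somedir > /dev/null
    # readdir plus a stat per entry: extra MDS round trips show up here
    time ls -l /mnt/cephfs/somedir > /dev/null
    # full recursive walk, roughly what `find` does
    time find /mnt/cephfs/somedir > /dev/null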
[23:06] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) has joined #ceph
[23:07] * sprachgenerator (~sprachgen@130.202.135.186) has joined #ceph
[23:08] * dupont-y (~dupont-y@2a01:e34:ec92:8070:51d7:99c0:f1ee:2bfe) has joined #ceph
[23:09] * sprachgenerator (~sprachgen@130.202.135.186) Quit ()
[23:11] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[23:11] * dereky (~dereky@proxy00.umiacs.umd.edu) Quit (Ping timeout: 480 seconds)
[23:12] * sroy (~sroy@2607:fad8:4:6:6e88:14ff:feff:5374) Quit (Quit: Quitte)
[23:13] * xarses (~andreww@12.164.168.117) has joined #ceph
[23:15] * lupu (~lupu@86.107.101.246) has joined #ceph
[23:16] * AfC (~andrew@2407:7800:400:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[23:17] * geraintjones (~geraint@222-152-77-45.jetstream.xtra.co.nz) Quit (Quit: geraintjones)
[23:18] * BillK (~BillK-OFT@124-148-105-94.dyn.iinet.net.au) has joined #ceph
[23:22] * allsystemsarego (~allsystem@86.121.79.14) Quit (Quit: Leaving)
[23:22] * JoeGruher (~JoeGruher@jfdmzpr02-ext.jf.intel.com) has joined #ceph
[23:23] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[23:29] * brambles (lechuck@s0.barwen.ch) Quit (Ping timeout: 480 seconds)
[23:31] * lupu (~lupu@86.107.101.246) Quit (Quit: Leaving.)
[23:31] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[23:31] * diegows (~diegows@200.68.116.185) Quit (Ping timeout: 480 seconds)
[23:31] * lupu (~lupu@86.107.101.246) has joined #ceph
[23:32] * Pedras (~Adium@216.207.42.132) has joined #ceph
[23:33] * brambles (lechuck@s0.barwen.ch) has joined #ceph
[23:36] * lupu (~lupu@86.107.101.246) Quit ()
[23:36] * lupu (~lupu@86.107.101.246) has joined #ceph
[23:37] * Pedras (~Adium@216.207.42.132) Quit (Read error: Connection reset by peer)
[23:37] * Pedras (~Adium@216.207.42.132) has joined #ceph
[23:38] * mikedawson (~chatzilla@adsl-108-90-194-29.dsl.ipltin.sbcglobal.net) has joined #ceph
[23:39] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[23:40] * carif (~mcarifio@pool-173-76-155-34.bstnma.fios.verizon.net) Quit (Read error: Operation timed out)
[23:41] * john_barbee (~jbarbee@2001:430:ffff:667:a4fd:b7a5:8660:5756) has joined #ceph
[23:44] <xmltok> loic around?
[23:44] <bens> I like big drives and I cannot lie. You other brothers can't deny. That when data walks in and needs a place to store, ceph has the room and then it has some more.
[23:45] * lupu (~lupu@86.107.101.246) Quit (Read error: No route to host)
[23:45] <bens> sorry dmick.
[23:45] <dmick> hi, I'm bens and I'm addicted to big butt parodies :)
[23:45] * lupu (~lupu@86.107.101.246) has joined #ceph
[23:47] <fghaas> rturk-away: look no further than to bens for inspiration for this year's oscon ignite talk of yours
[23:47] <bens> ooh, is ceph comin' to oscon?
[23:49] <bens> oh no! oscon conflicts with HOPE this year!
[23:50] <fghaas> well I hope someone submitted a talk this year
[23:51] <dupont-y> hello everybody : quick question : is ceph osd repair * supposed to hang after a while ?
[23:51] <dupont-y> seems like there is a deadlock somewhere
[23:51] * nwat (~textual@eduroam-227-20.ucsc.edu) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[23:52] * TMM (~hp@c97185.upc-c.chello.nl) Quit (Quit: Ex-Chat)
[23:52] <dmick> heh. no, I can just about guarantee it's not supposed to hang
[23:53] <dupont-y> well, hang is not quite correct. the osds just start to repair the pgs
[23:53] * leochill (~leochill@nyc-333.nycbit.com) Quit (Quit: Leaving)
[23:53] <dupont-y> and after a while they're just not repairing anymore
[23:55] * garphy is now known as garphy`aw
[23:55] <dmick> has the repair finished? It's not guaranteed to fix the problem; are there log messages about what it did?
[23:55] <dupont-y> no
[23:56] <dupont-y> i've been hit by the 0.75 bug
[23:56] * carif (~mcarifio@pool-173-76-155-34.bstnma.fios.verizon.net) has joined #ceph
[23:56] <dupont-y> lots of pgs have been tagged with scrub problems
[23:56] <dupont-y> (corrected in 0.76)
[23:57] <dupont-y> ceph-mon-lmb-B-1:~# ceph health
[23:57] <dupont-y> HEALTH_ERR 2611 pgs inconsistent; 18 pgs stuck unclean; 2611 scrub errors
[23:58] <dupont-y> if I do ceph osd repair *, I see pg 0.xxx repairing, then 1.xxx ... but the further the osds advance through the pgs, the slower the repair gets
[23:59] <dupont-y> and after 10 minutes, no osd is repairing any pg anymore
[23:59] <dupont-y> for example :
[23:59] <dupont-y> ceph-mon-lmb-B-1:~# ceph osd repair "*"
[23:59] <dupont-y> osds 0,1,2,3,4,5,9,10,11,12,13,14,15,16,17,18,19,20,24,25,26 instructed to repair
[23:59] <dupont-y> 2014-02-05 23:59:38.072605 osd.5 [INF] 0.d1 repair ok, 0 fixed
[23:59] <dupont-y> 2014-02-05 23:59:38.566393 osd.15 [INF] 0.12 repair ok, 0 fixed
[23:59] <dupont-y> 2014-02-05 23:59:38.828461 osd.1 [INF] 0.a9 repair ok, 0 fixed
[23:59] <dupont-y> 2014-02-05 23:59:38.931034 osd.26 [INF] 0.25 repair ok, 0 fixed
[23:59] <dupont-y> 2014-02-05 23:59:39.102552 osd.5 [INF] 0.f7 repair ok, 0 fixed
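When a blanket `ceph osd repair "*"` tails off like this, it helps to check what is actually still outstanding rather than re-running the blanket command. A sketch using the same CLI shown above; nothing here is specific to dupont-y's cluster:

    # How many PGs remain inconsistent?
    ceph pg dump pgs_brief | grep -c inconsistent
    # Which ones, and which OSDs are acting for them?
    ceph health detail | grep inconsistent | head -20
    # Follow repair results as the OSDs report them to the cluster log
    ceph -w | grep repair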

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.