#ceph IRC Log


IRC Log for 2015-05-12

Timestamps are in GMT/BST.

[0:02] <Hydrar> gregsfortytwo: Gentoo's ebuild defaults to it off it seems, maybe I should poke maintainer to turn it on by default
[0:03] <gregsfortytwo> Hydrar: uh, yeah, not sure why it would be off anywhere
[0:03] <gregsfortytwo> maybe libatomicops isn't packaged/working on some obscure architecture
[0:05] <Hydrar> gregsfortytwo: The ebuilds can conditionally enable it by default for specific arches I think
[0:05] <Hydrar> Can't remember if it was nginx where libatomicops was not suppose to be used rather, so that use flag is a bit weird
[0:11] * pvh_sa (~pvh@197.79.8.193) Quit (Ping timeout: 480 seconds)
[0:12] * boichev2 (~boichev@213.169.56.130) has joined #ceph
[0:13] * shohn (~shohn@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[0:14] * Georgyo (~georgyo@shamm.as) Quit (Remote host closed the connection)
[0:14] * Georgyo (~georgyo@shamm.as) has joined #ceph
[0:15] * Georgyo (~georgyo@shamm.as) Quit (Remote host closed the connection)
[0:15] * Georgyo (~georgyo@shamm.as) has joined #ceph
[0:16] * boichev (~boichev@213.169.56.130) Quit (Ping timeout: 480 seconds)
[0:17] * ircolle1 (~ircolle@c-71-229-136-109.hsd1.co.comcast.net) Quit (Ping timeout: 480 seconds)
[0:20] * alram (~alram@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[0:21] * hamiller (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) has joined #ceph
[0:21] * hamiller (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) Quit ()
[0:25] * bene (~ben@nat-pool-bos-t.redhat.com) Quit (Quit: Konversation terminated!)
[0:26] * harold (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) Quit (Ping timeout: 480 seconds)
[0:27] * dneary (~dneary@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[0:29] * reed (~reed@216.3.101.62) has joined #ceph
[0:31] * cephiroth (~oftc-webi@br167-098.ifremer.fr) Quit (Remote host closed the connection)
[0:35] * marrusl (~mark@cpe-24-90-46-248.nyc.res.rr.com) Quit (Remote host closed the connection)
[0:38] * DV (~veillard@2001:41d0:1:d478::1) has joined #ceph
[0:45] * LorenXo (~lmg@176.106.54.54) has joined #ceph
[0:48] * DV (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[0:48] * primechuck (~primechuc@host-95-2-129.infobunker.com) Quit (Remote host closed the connection)
[0:49] * primechuck (~primechuc@host-95-2-129.infobunker.com) has joined #ceph
[0:50] * diegows (~diegows@190.190.5.238) has joined #ceph
[0:58] * primechuck (~primechuc@host-95-2-129.infobunker.com) Quit (Ping timeout: 480 seconds)
[0:59] * reed (~reed@216.3.101.62) Quit (Quit: Ex-Chat)
[0:59] * oblu (~o@62.109.134.112) Quit (Remote host closed the connection)
[1:01] * lavalake (~chatzilla@c-98-239-240-118.hsd1.pa.comcast.net) has joined #ceph
[1:03] * mwilcox_ (~mwilcox@116.251.192.71) has joined #ceph
[1:04] * lavalake (~chatzilla@c-98-239-240-118.hsd1.pa.comcast.net) Quit (Remote host closed the connection)
[1:13] * alram (~alram@173-14-159-105-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[1:14] * lavalake (~chatzilla@c-98-239-240-118.hsd1.pa.comcast.net) has joined #ceph
[1:15] * LorenXo (~lmg@7R2AAASOM.tor-irc.dnsbl.oftc.net) Quit ()
[1:15] * Scaevolus (~BillyBobJ@2WVAAB9UF.tor-irc.dnsbl.oftc.net) has joined #ceph
[1:18] * BManojlovic (~steki@cable-89-216-175-133.dynamic.sbb.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:22] * alram (~alram@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[1:22] * xarses (~andreww@12.164.168.117) Quit (Ping timeout: 480 seconds)
[1:23] * oblu (~o@62.109.134.112) has joined #ceph
[1:30] * LeaChim (~LeaChim@host86-159-233-65.range86-159.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:37] * oms101_ (~oms101@p20030057EA032C00EEF4BBFFFE0F7062.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:44] <alphe> ceph 0.94 hammer is really buggy stuff
[1:44] <alphe> I bring osd.1 osd.2 osd.3 back into the cluster, then osd.8 osd.7 osd.9 go missing ..
[1:45] * ylmson (~Moriarty@chulak.enn.lu) has joined #ceph
[1:45] * Scaevolus (~BillyBobJ@2WVAAB9UF.tor-irc.dnsbl.oftc.net) Quit ()
[1:46] * oms101_ (~oms101@p20030057EA028300EEF4BBFFFE0F7062.dip0.t-ipconnect.de) has joined #ceph
[1:46] <alphe> i've never seen ceph behave that way ...
[1:47] <alphe> I've used ceph since 0.5X and this is the worst release of them all ..
[1:48] <alphe> normally you stop monitors, mds, osds, then start osds, monitors, mds, and all is back to normal, cool and dandy
[1:48] <alphe> with hammer, as soon as my nodes are back in the ring they are knocked down by the heavy load of unnecessary rebuilding ...
[1:57] * daniel2_ (~dshafer@0001b605.user.oftc.net) Quit (Remote host closed the connection)
[1:58] * xarses (~andreww@c-73-202-191-48.hsd1.ca.comcast.net) has joined #ceph
[1:58] * primechuck (~primechuc@173-17-128-216.client.mchsi.com) has joined #ceph
[2:06] * MentalRay (~MRay@MTRLPQ42-1176054809.sdsl.bell.ca) Quit (Ping timeout: 480 seconds)
[2:11] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz???)
[2:13] <sage> alphe: are they crashing, or getting marked down because they are loady?
[2:13] <sage> if the latter, you can work around it by throttling recovery or even 'ceph osd set nodown'
[2:14] <sage> if they are crashy, then bug report please! we haven't seen it before
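(A minimal sketch of what sage suggests, for reference; the throttle values are illustrative, not from this conversation:)
    ceph osd set nodown        # stop loaded-but-alive OSDs from being marked down
    ceph tell 'osd.*' injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'   # throttle recovery/backfill
    ceph osd unset nodown      # clear the flag once the cluster has settled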
[2:15] * joshd (~jdurgin@66-194-8-225.static.twtelecom.net) Quit (Ping timeout: 480 seconds)
[2:15] * ylmson (~Moriarty@7R2AAASRD.tor-irc.dnsbl.oftc.net) Quit ()
[2:15] * Zyn (~rushworld@exit1.tor-proxy.net.ua) has joined #ceph
[2:15] * oblu- (~o@62.109.134.112) has joined #ceph
[2:16] * angdraug (~angdraug@12.164.168.117) Quit (Quit: Leaving)
[2:17] * wkennington (~william@76.77.181.17) Quit (Remote host closed the connection)
[2:18] <alphe> sage marked down because they have too much work to do
[2:18] <sage> try nodown then
[2:18] <alphe> they become laggy
[2:18] <alphe> ok
[2:19] <alphe> actually I tried everything
[2:19] <alphe> I tried pause nodown noup etc...
[2:20] <alphe> sage I really wish I could tell ceph: hey dude, stop trying to rebuild stuff that is already there
[2:20] <alphe> wait until everyone is back online, free some heap, then you can try your silly pack of rebuild stuff
[2:20] * oblu (~o@62.109.134.112) Quit (Ping timeout: 480 seconds)
[2:20] <sage> if you did nodown then the osds shouldn't go down. unless they (crash and) restart, in which case that is a different problem (and logs in a bug report please :)
[2:21] <alphe> but really it is the first ceph release where I've had this everyone-is-laggy problem
[2:21] <sage> in hammer there is a new 'ceph osd set norebalance' that will do repair but not migration.. that might be what you are asking for?
[2:21] <sage> it might be the age of the cluster changing the load/performance characteristics (more expensive recovery due to fragmentation, maybe)
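(The norebalance flag is set and cleared like the other cluster flags; a quick sketch:)
    ceph osd set norebalance     # recover degraded PGs but skip moving misplaced data around
    ceph osd unset norebalance   # resume normal rebalancing once things have settled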
[2:21] <alphe> it's a newly installed cluster ...
[2:22] <alphe> 37 TB total / 2 TB used ...
[2:22] <Kingrat> so it is new and not upgraded, even from giant?
[2:22] <alphe> and it's lagging, rebuilding like it was rebuilding 37TB ...
[2:23] <alphe> kingrat nope ... I learned my lesson long ago: better to start a brand new cluster from scratch
[2:23] <alphe> delete / format / wipe everything, even install a clean virgin linux ...
[2:24] <Kingrat> ive never done that and ive gone from firefly to giant to hammer
[2:24] * KevinPerks (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[2:24] <alphe> with each release ... hammer / ubuntu vivid is my way of trying to solve some annoying issues
[2:25] <alphe> with cephfs/ntfs, in the previous version the folder tree was disappearing from the bindfs for no reason
[2:25] <alphe> Yang jin told me it was a kernel bug and that 3.18 fixed it
[2:26] <alphe> so far I had no problems until suddenly 1 osd went laggy/missing, then a second, then a third
[2:26] <alphe> etc. until I only had 3/20 active osds ...
[2:26] <Kingrat> it has been fine for me, but im only using rbd
[2:26] <alphe> stopping and restarting never gets the cluster fully back ...
[2:27] <Kingrat> if you pastebin your config id be glad to look at it, even though im no expert
[2:27] <alphe> at most I get 16/20 nodes back, then it starts to lag again and slowly osds are marked missing
[2:27] <Kingrat> also what is your cluster configuration? number of hosts, number of osds per host, network config, etc
[2:27] * joshd (~jdurgin@206.169.83.146) has joined #ceph
[2:28] <alphe> ok so the ceph cluster is down; I limited the backfill processes to 4
[2:28] <alphe> a nice little trick that saves some resources on my atom based nodes ---
[2:29] <alphe> yes, every node of my cluster is an atom with 2GB of ram; 1 of them has 4GB of ram and it is the most stable, so yes I get that the main problem is lack of RAM
[2:29] <Kingrat> how many nodes?
[2:29] <alphe> 20
[2:29] <Kingrat> so 1 osd each?
[2:29] <alphe> 10 nodes 20 osds
[2:29] <Kingrat> oh
[2:29] <Kingrat> 2 each, ok
[2:30] <Kingrat> which atom?
[2:30] <alphe> yes ... with a 2TB disk in each node
[2:30] <alphe> 2500 CC
[2:30] <alphe> intel 2500 CC motherboard
[2:30] <alphe> and for 2 and a half years it was working great
[2:31] <alphe> I had some difficult times getting clients to connect, but never had this mess in the rebuild
[2:31] <alphe> since all osds have at some point been marked missing, the whole thing is rebuilding ...
[2:32] * wkennington (~william@76.77.181.17) has joined #ceph
[2:32] <alphe> yes ... with a 2TB disk for each osd
[2:32] * dneary (~dneary@pool-96-252-45-212.bstnma.fios.verizon.net) has joined #ceph
[2:32] <Kingrat> hammer seems to be using up a little more ram for me than the older versions, so maybe that is your issue as you expect
[2:33] <Kingrat> given my number of pgs is a bit on the high side, i am around 900mb-1 gig per osd right now, not loaded, not rebuilding
[2:33] <alphe> since the whole thing is rebuilding, my osds are maxed out and laggy; they don't see their friends anymore, so they start tweeting on the net that life is unfair ...
[2:33] <alphe> etc... damn adolescent osds
[2:34] <Kingrat> i would try to slow down all the backfill and rebuild
[2:34] <alphe> normally limiting the backfill processes on each osd makes the load nicer
[2:34] <Kingrat> as you have already done, but i would go to 1
[2:36] <devicenull> bleh
[2:36] <alphe> maybe the rebuild is so aggressive because there is not much data in the cluster
[2:36] <devicenull> can I force ceph to prioritize repairing down+peering PGs
[2:36] <devicenull> versus ones that are just misplaced?
[2:37] <devicenull> I have this: https://gist.githubusercontent.com/devicenull/df59bb2e8f1d5db0b98e/raw/d78e549e7e874408a76bd0d1020d1dfd46816e75/gistfile1.txt
[2:37] <Kingrat> alphe, nah i dont think so, its probably just a huge load because you are running gig-e and low ram
[2:37] <alphe> when you have 35TB of 37TB used then probably the rebuild algorithm is more ... hum ... lazy... so you have time to flip-flap your osds and that doesn't create an avalanche
[2:37] <devicenull> seems very silly that I can't use some of my volume while ceph moves things around that are just in the wrong spot
[2:37] <alphe> devicenull do you know which osd has the wonky pgs ?
[2:38] <alphe> then yes you can
[2:38] <alphe> ceph osd repair osdXX
[2:38] <devicenull> ah, that'll increase the priority?
[2:38] <alphe> that will repair your dead pg on the osd ...
[2:39] <alphe> but sometimes, because of the load, your osd gets laggy and it appears that pgs are stuck
[2:39] <devicenull> yea, they've been like this for a couple hours
[2:39] <alphe> you can simply restart the osd that has the stuck pgs and then ceph will restart the self repair process
[2:40] <alphe> stop ceph-osd id=XXX
[2:40] <alphe> id is the number of your osd in the osd tree
[2:40] <devicenull> yea I should have tried the usual trick of blindly restarting everything
[2:40] <alphe> as you can see it on the ceph osd tree
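(Roughly what alphe is describing, assuming an Ubuntu/upstart node and osd id 12 as a placeholder:)
    ceph osd repair 12           # ask osd.12 to repair its placement groups
    sudo stop ceph-osd id=12     # upstart: stop the daemon for osd.12 ...
    sudo start ceph-osd id=12    # ... and start it again so peering/repair restarts
    ceph osd tree                # confirm the osd comes back up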
[2:41] <alphe> devicenull hum I did that with hammer and hammer didn t like it at all ...
[2:41] <devicenull> ahh
[2:41] <alphe> but not at all not at all the whole cluster is tossed in the air rebuilding
[2:41] <devicenull> "starting or marking this osd lost may let us proceed"
[2:41] <devicenull> it's like I forgot how to debug this stuff
[2:41] <alphe> that is why I recommend a more subtle approach
[2:43] <devicenull> yea.. I had an OSD that had failed but hadn't been fully removed, looks like rebuilding is blocked on it
[2:43] <alphe> yeap
[2:44] <alphe> you can mark it as lost or out and the rebuild process will restart
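(A sketch of the two options alphe mentions, with osd.5 as a placeholder for the dead osd; note that 'ceph osd lost' tells the cluster to give up on any data only that osd held, so it is a last resort:)
    ceph osd out 5                            # stop mapping data to it; recovery routes around it
    ceph osd lost 5 --yes-i-really-mean-it    # declare it permanently gone so blocked PGs can proceed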
[2:45] * Zyn (~rushworld@8Q4AAAPP4.tor-irc.dnsbl.oftc.net) Quit ()
[2:45] * CoMa (~Lattyware@tor-exit-2.zenger.nl) has joined #ceph
[2:47] <devicenull> or I can just bring it back up for awhile.. it was down because the hardware is flakey but it'll probably stay up long enough...
[2:53] * rahatm1 (~rahatm1@d173-183-79-206.bchsia.telus.net) has joined #ceph
[2:53] * diegows (~diegows@190.190.5.238) Quit (Quit: Leaving)
[2:55] <devicenull> yea, maybe next time I'll actually look at the errors instead of assuming it was just because of the rebuild
[2:55] <devicenull> *facepalm*
[2:56] * lucas1 (~Thunderbi@218.76.52.64) has joined #ceph
[2:56] * B_Rake (~B_Rake@69-195-66-67.unifiedlayer.com) Quit (Ping timeout: 480 seconds)
[2:57] * debian112 (~bcolbert@24.126.201.64) Quit (Quit: Leaving.)
[2:59] <flaf> Hi,
[3:00] <flaf> T1w, jeroenvh: I have read your discussion about ceph in multisites etc. (sorry I'm a little late ;)).
[3:02] <flaf> We are thinking too of purchasing a second location with a dark fiber between the 2 datacenters (10km)
[3:02] * alram (~alram@173-14-159-105-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[3:03] <flaf> In fact, there will be 3 locations, 2 rooms in DC1 and 1 room in DC2.
[3:05] <flaf> Of course, we want to install a ceph cluster split in the 3 locations.
[3:05] * shyu (~Shanzhi@119.254.196.66) has joined #ceph
[3:06] <flaf> My question is about the connections between the locations.
[3:06] * calvinx (~calvin@101.100.172.246) has joined #ceph
[3:06] <alphe> consider that the recommended ceph cluster builds from supermicro use dual 10gbe
[3:06] <flaf> Is it recommended to have simple L2 connections between the locations?
[3:07] <alphe> that gives you a glimpse of what you will need
[3:07] <alphe> I would say the higher the network bandwidth the better ...
[3:08] <alphe> flaf that will work until one site is down and the rest is stuck reconstructing it
[3:08] <flaf> I ask the question because some guys recommended L3 connections to us (i.e. different IP networks between the locations).
[3:09] <flaf> alphe: in fact, my question is not currently about performance but about L2 or L3 connections.
[3:10] <alphe> you will need as much network bandwidth as possible between your sites
[3:10] * florz (nobody@2001:1a50:503c::2) Quit (Ping timeout: 480 seconds)
[3:10] * alram (~alram@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[3:11] <alphe> ceph uses a public network for monitor traffic and for exchanging data with clients
[3:11] <alphe> ceph uses a private network for osd interchange such as rebalancing, rebuilding, scrubbing, all that admin data stuff
[3:12] <flaf> The guy who proposed the 3rd location with the dark fiber to us recommends L3 connections between the datacenters, not L2 (but it was generic advice; the guy was not aware of our intention to build a ceph cluster).
[3:12] <flaf> alphe, yes I know.
[3:12] <alphe> so if you get one part of the cluster in washington, another part in london and the last one in paris, then between those 3 sites you will need a stable, reliable connection, as fast as possible
[3:13] <alphe> or your ceph cluster will constantly detect failures and rebuild
[3:13] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[3:14] <flaf> But, my question is: is it possible to have cluster nodes in different L3 networks (IP networks)?
[3:14] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has left #ceph
[3:14] <alphe> cpu:9.6 mem:90.6 5:35.49 ceph-osd hehehe do you think I have enough ram to start the other osd ?
[3:15] * CoMa (~Lattyware@8Q4AAAPQF.tor-irc.dnsbl.oftc.net) Quit ()
[3:15] * Grimhound (~ylmson@185.77.129.11) has joined #ceph
[3:15] <flaf> In other words, is it possible to have one node in DC1 and another node in DC2 but with an IP router between DC1 and DC2?
[3:15] <alphe> and that is just after performing a heap release on that osd ...
[3:16] <alphe> ok tomorrow i will cry to my boss for a mem upgrade ...
[3:16] * m0zes (~mozes@beocat.cis.ksu.edu) Quit (Remote host closed the connection)
[3:16] <alphe> flaf theoretically of course, that is why vpns exist
[3:16] * m0zes (~mozes@beocat.cis.ksu.edu) has joined #ceph
[3:17] <alphe> but in practice the outside network is laggy and messy, and you don't have much control over its use or quality unless you use point-to-point connections
[3:18] * cholcombe (~chris@pool-108-42-124-94.snfcca.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[3:20] <flaf> Ok, I thought too that L3 connections between sites were not a good thing.
[3:21] * JV (~chatzilla@204.14.239.107) Quit (Ping timeout: 480 seconds)
[3:24] * kefu (~kefu@114.92.105.224) has joined #ceph
[3:25] * georgem (~Adium@69-196-174-91.dsl.teksavvy.com) has joined #ceph
[3:27] * zack_dolby (~textual@e0109-114-22-11-74.uqwimax.jp) has joined #ceph
[3:28] * georgem1 (~Adium@69-196-174-91.dsl.teksavvy.com) has joined #ceph
[3:28] * georgem (~Adium@69-196-174-91.dsl.teksavvy.com) Quit (Read error: Connection reset by peer)
[3:29] * zack_dolby (~textual@e0109-114-22-11-74.uqwimax.jp) Quit (Read error: Connection reset by peer)
[3:30] <alphe> I have too many pgs ....
[3:31] <alphe> basically my rebuild nightmare comes from there ...
[3:32] * zack_dolby (~textual@pw126255083120.9.panda-world.ne.jp) has joined #ceph
[3:33] <alphe> 594 active+clean, woot, my cluster started to feel much better
[3:34] <flaf> alphe: if you have too many pgs, normally you should have a warning with "ceph -s".
[3:34] <alphe> I set nobackfill nodown norebalance, then I waited until I got 17/20 nodes in the ring, then started backfill
[3:34] <alphe> flaf I disabled the warning ...
[3:35] <flaf> How many pgs per osd?
[3:35] <flaf> (in the warning)
[3:35] <alphe> big granularity was supposed to be better for small configs ... in 0.47 or something around that
[3:35] * Concubidated (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[3:37] <alphe> the cluster is super laggy omg ceph -s takes a whole 5 to 10 seconds to complete
[3:39] * MentalRay (~MRay@107.171.161.165) has joined #ceph
[3:39] <alphe> pgs = 8196
[3:39] <alphe> or something around that crazy number
[3:40] * Concubidated1 (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[3:41] * Concubidated2 (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[3:41] <flaf> and size = 2 or 3 I guess.
[3:41] * Concubidated3 (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[3:41] * lurbs always sets 'size = 2.71828'
[3:41] <alphe> should be 100 pg * number osd / number of pools
[3:42] * Concubidated2 (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[3:42] <alphe> should be (100 pg * number osd )/ number of pools
[3:42] * Concubidated3 (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[3:42] * Concubidated2 (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[3:43] * Concubidated3 (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[3:43] <flaf> If you have size == 2 for instance, you have ~ 819 pgs / OSD, which is really too much indeed.
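(flaf's estimate spelled out, assuming roughly 8192 PGs and size 2 as guessed above:)
    PG copies per OSD ≈ (8192 PGs × 2 replicas) / 20 OSDs ≈ 819
    (the usual rule of thumb aims for something on the order of 100 per OSD)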
[3:43] <alphe> is that even possible ?
[3:43] <alphe> client io 0 B/s rd, 2847 MB/s wr, 7013 op/s
[3:45] <flaf> No idea.
[3:45] * Grimhound (~ylmson@2WVAAB9VX.tor-irc.dnsbl.oftc.net) Quit ()
[3:45] * DJComet (~Zeis@5NZAACGO9.tor-irc.dnsbl.oftc.net) has joined #ceph
[3:45] * Concubidated (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[3:45] <flaf> If I understand correctly, you have no reads, only writes...
[3:47] <alphe> yeah reconstructions ...
[3:47] <alphe> of undead osds
[3:47] * Concubidated3 (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Read error: No route to host)
[3:47] <flaf> But it says "client" here?
[3:47] * Concubidated1 (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Read error: No route to host)
[3:48] * ira (~ira@0001cb91.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:48] <alphe> flaf yeah, sometimes the osds use the public network if it's not in use, to speed up interchanges
[3:48] <alphe> one bus to read the other to write or something like that
[3:49] <flaf> Ah ok, I didn't know.
[3:50] <alphe> it is a wild guess ...
[3:50] <alphe> but since I have no clients connected right now, it should be something like that
[3:50] * Concubidated2 (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[3:51] * DV (~veillard@2001:41d0:1:d478::1) has joined #ceph
[3:52] * Concubidated (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[3:54] * primechuck (~primechuc@173-17-128-216.client.mchsi.com) Quit (Remote host closed the connection)
[3:54] <alphe> with norebalance nobackfill nodown the rebuild process is way faster
[3:55] * primechuck (~primechuc@173-17-128-216.client.mchsi.com) has joined #ceph
[3:55] * Concubidated (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Read error: No route to host)
[3:59] * rahatm1 (~rahatm1@d173-183-79-206.bchsia.telus.net) Quit (Remote host closed the connection)
[4:04] * Concubidated (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[4:06] * kefu (~kefu@114.92.105.224) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[4:07] * fam is now known as fam_away
[4:07] * kefu (~kefu@114.92.105.224) has joined #ceph
[4:13] * fam_away is now known as fam
[4:14] * zhaochao (~zhaochao@125.39.8.227) has joined #ceph
[4:15] * DJComet (~Zeis@5NZAACGO9.tor-irc.dnsbl.oftc.net) Quit ()
[4:15] * airsoftglock (~TheDoudou@37.187.129.166) has joined #ceph
[4:16] * florz (nobody@2001:1a50:503c::2) has joined #ceph
[4:19] * kefu_ (~kefu@114.92.109.171) has joined #ceph
[4:21] * MentalRay (~MRay@107.171.161.165) Quit (Quit: This computer has gone to sleep)
[4:23] * calvinx (~calvin@101.100.172.246) Quit (Quit: calvinx)
[4:26] * kefu (~kefu@114.92.105.224) Quit (Ping timeout: 480 seconds)
[4:30] * fam is now known as fam_away
[4:34] * bkopilov (~bkopilov@bzq-79-183-181-186.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[4:35] * fam_away is now known as fam
[4:36] * fam is now known as fam_away
[4:37] * fam_away is now known as fam
[4:39] * shang (~ShangWu@175.41.48.77) has joined #ceph
[4:43] * dlan (~dennis@116.228.88.131) Quit (Remote host closed the connection)
[4:44] * dlan (~dennis@116.228.88.131) has joined #ceph
[4:45] * airsoftglock (~TheDoudou@0SGAAAO1E.tor-irc.dnsbl.oftc.net) Quit ()
[4:45] * delcake (~VampiricP@7R2AAASY0.tor-irc.dnsbl.oftc.net) has joined #ceph
[4:49] * dlan (~dennis@116.228.88.131) Quit (Remote host closed the connection)
[4:51] * alram (~alram@173-14-159-105-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[4:53] * dlan (~dennis@116.228.88.131) has joined #ceph
[4:56] * calvinx (~calvin@101.100.172.246) has joined #ceph
[4:59] * dlan (~dennis@116.228.88.131) Quit (Remote host closed the connection)
[4:59] * alram (~alram@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[4:59] * dlan (~dennis@116.228.88.131) has joined #ceph
[5:06] * MentalRay (~MRay@107.171.161.165) has joined #ceph
[5:11] * dlan (~dennis@116.228.88.131) Quit (Remote host closed the connection)
[5:11] * dlan (~dennis@116.228.88.131) has joined #ceph
[5:15] * delcake (~VampiricP@7R2AAASY0.tor-irc.dnsbl.oftc.net) Quit ()
[5:16] * dlan (~dennis@116.228.88.131) Quit (Remote host closed the connection)
[5:16] * dlan (~dennis@116.228.88.131) has joined #ceph
[5:21] * dlan (~dennis@116.228.88.131) Quit (Remote host closed the connection)
[5:24] * zack_dolby (~textual@pw126255083120.9.panda-world.ne.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz???)
[5:25] * zack_dolby (~textual@pw126255083120.9.panda-world.ne.jp) has joined #ceph
[5:26] * dlan (~dennis@116.228.88.131) has joined #ceph
[5:26] * shylesh (~shylesh@121.244.87.124) has joined #ceph
[5:26] * jeevan_ullas (~Deependra@114.143.35.153) has joined #ceph
[5:26] * zack_dolby (~textual@pw126255083120.9.panda-world.ne.jp) Quit (Read error: Connection reset by peer)
[5:27] * KevinPerks (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[5:29] * JV (~chatzilla@216.2.50.205) has joined #ceph
[5:30] * JV_ (~chatzilla@204.14.239.107) has joined #ceph
[5:38] * JV (~chatzilla@216.2.50.205) Quit (Ping timeout: 480 seconds)
[5:38] * lavalake is now known as jian_wang
[5:38] * kefu_ (~kefu@114.92.109.171) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[5:38] * dneary (~dneary@pool-96-252-45-212.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[5:41] * ircolle (~ircolle@c-71-229-136-109.hsd1.co.comcast.net) has joined #ceph
[5:45] * Frostshifter (~SaneSmith@politkovskaja.torservers.net) has joined #ceph
[5:46] * lightspeed (~lightspee@2001:8b0:16e:1:8326:6f70:89f:8f9c) Quit (Ping timeout: 480 seconds)
[5:47] * dalgaaf (uid15138@charlton.irccloud.com) Quit (Quit: Connection closed for inactivity)
[5:50] * DV (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[5:50] * MrAbaddon (~MrAbaddon@194.38.157.12) has joined #ceph
[5:52] * lightspeed (~lightspee@2001:8b0:16e:1:8326:6f70:89f:8f9c) has joined #ceph
[5:56] * Vacuum__ (~vovo@i59F79BF4.versanet.de) has joined #ceph
[5:57] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[5:59] * MrAbaddon (~MrAbaddon@194.38.157.12) Quit (Ping timeout: 480 seconds)
[6:00] * DV (~veillard@2001:41d0:1:d478::1) has joined #ceph
[6:01] * MentalRay (~MRay@107.171.161.165) Quit (Quit: This computer has gone to sleep)
[6:02] <alphe> my ceph cluster is getting better
[6:02] * Vacuum_ (~vovo@i59F790B1.versanet.de) Quit (Ping timeout: 480 seconds)
[6:03] * haomaiwa_ (~haomaiwan@115.218.159.105) Quit (Quit: Leaving...)
[6:04] * MrAbaddon (~MrAbaddon@62.48.251.6) has joined #ceph
[6:05] * jamespage (~jamespage@culvain.gromper.net) Quit (Quit: Coyote finally caught me)
[6:06] * jamespage (~jamespage@culvain.gromper.net) has joined #ceph
[6:07] * bkopilov (~bkopilov@nat-pool-tlv-t.redhat.com) has joined #ceph
[6:08] * kefu (~kefu@114.92.109.171) has joined #ceph
[6:09] * kefu_ (~kefu@114.92.109.171) has joined #ceph
[6:09] * kefu (~kefu@114.92.109.171) Quit (Read error: Connection reset by peer)
[6:11] * MrAbaddon (~MrAbaddon@62.48.251.6) Quit (Remote host closed the connection)
[6:15] * Frostshifter (~SaneSmith@0SGAAAO4I.tor-irc.dnsbl.oftc.net) Quit ()
[6:15] * sese_ (~N3X15@195.169.125.226) has joined #ceph
[6:22] * gregmark (~Adium@68.87.42.115) Quit (Read error: Connection reset by peer)
[6:22] * gregmark (~Adium@68.87.42.115) has joined #ceph
[6:25] * shaunm (~shaunm@74.215.76.114) Quit (Ping timeout: 480 seconds)
[6:28] * linjan (~linjan@213.8.240.146) has joined #ceph
[6:35] * shang (~ShangWu@175.41.48.77) Quit (Ping timeout: 480 seconds)
[6:37] * rdas (~rdas@121.244.87.116) has joined #ceph
[6:37] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Ping timeout: 480 seconds)
[6:37] * amote (~amote@121.244.87.116) has joined #ceph
[6:45] * sese_ (~N3X15@0SGAAAO5M.tor-irc.dnsbl.oftc.net) Quit ()
[6:45] * Da_Pineapple (~WedTM@nx-74205.tor-exit.network) has joined #ceph
[6:45] * wschulze (~wschulze@cpe-69-206-242-231.nyc.res.rr.com) Quit (Quit: Leaving.)
[6:46] * red (~red@chello084112110034.11.11.vie.surfer.at) has joined #ceph
[6:47] * georgem1 (~Adium@69-196-174-91.dsl.teksavvy.com) Quit (Quit: Leaving.)
[6:49] * linjan (~linjan@213.8.240.146) Quit (Ping timeout: 480 seconds)
[6:51] * DV (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[6:52] * haomaiwang (~haomaiwan@115.218.159.105) has joined #ceph
[6:59] * rotbeard (~redbeard@2a02:908:df10:d300:76f0:6dff:fe3b:994d) Quit (Quit: Verlassend)
[7:03] * kefu_ (~kefu@114.92.109.171) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[7:07] <alphe> ok so now I have 2783 active+clean+replay
[7:07] <alphe> what should I do to get rid of them ?
[7:15] * Da_Pineapple (~WedTM@53IAAAR7C.tor-irc.dnsbl.oftc.net) Quit ()
[7:15] * cryptk (~cryptk@2.tor.exit.babylon.network) has joined #ceph
[7:18] * shaunm (~shaunm@74.215.76.114) has joined #ceph
[7:19] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[7:23] * oro (~oro@80-219-254-208.dclient.hispeed.ch) has joined #ceph
[7:23] * oro_ (~oro@80-219-254-208.dclient.hispeed.ch) has joined #ceph
[7:26] * shang (~ShangWu@125.227.44.79) has joined #ceph
[7:35] * karnan (~karnan@121.244.87.117) has joined #ceph
[7:45] * cryptk (~cryptk@0SGAAAO7X.tor-irc.dnsbl.oftc.net) Quit ()
[7:45] * jacoo (~Altitudes@nx-01.tor-exit.network) has joined #ceph
[7:45] * linjan (~linjan@95.35.27.121) has joined #ceph
[7:46] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[7:58] * pvh_sa (~pvh@197.79.6.49) has joined #ceph
[8:04] * Nacer (~Nacer@203-206-190-109.dsl.ovh.fr) has joined #ceph
[8:04] * Nacer (~Nacer@203-206-190-109.dsl.ovh.fr) Quit (Remote host closed the connection)
[8:05] * i_m (~ivan.miro@pool-109-191-92-175.is74.ru) has joined #ceph
[8:05] * Nacer (~Nacer@2001:41d0:fe82:7200:fd2c:400e:3153:6372) has joined #ceph
[8:10] * linjan (~linjan@95.35.27.121) Quit (Read error: No route to host)
[8:12] * kefu (~kefu@114.92.109.171) has joined #ceph
[8:15] * jacoo (~Altitudes@53IAAAR95.tor-irc.dnsbl.oftc.net) Quit ()
[8:15] * PuyoDead (~capitalth@nx-01.tor-exit.network) has joined #ceph
[8:17] * kefu (~kefu@114.92.109.171) Quit (Max SendQ exceeded)
[8:18] * kefu (~kefu@114.92.109.171) has joined #ceph
[8:23] * Hemanth (~Hemanth@121.244.87.117) has joined #ceph
[8:25] * pvh_sa (~pvh@197.79.6.49) Quit (Ping timeout: 480 seconds)
[8:26] * smithfarm (~ncutler@nat1.scz.suse.com) has joined #ceph
[8:26] * Sysadmin88 (~IceChat77@054527d3.skybroadband.com) Quit (Quit: Copywight 2007 Elmer Fudd. All wights wesewved.)
[8:29] * cok (~chk@2a02:2350:18:1010:a88d:ce55:507e:3276) has joined #ceph
[8:33] * shang (~ShangWu@125.227.44.79) Quit (Ping timeout: 480 seconds)
[8:34] * ntt (~oftc-webi@195.60.190.156) Quit (Remote host closed the connection)
[8:35] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[8:35] * Nacer (~Nacer@2001:41d0:fe82:7200:fd2c:400e:3153:6372) Quit (Remote host closed the connection)
[8:39] * fam is now known as fam_away
[8:41] * oro_ (~oro@80-219-254-208.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[8:41] * oro (~oro@80-219-254-208.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[8:42] * ircolle1 (~ircolle@c-71-229-136-109.hsd1.co.comcast.net) has joined #ceph
[8:42] * ircolle (~ircolle@c-71-229-136-109.hsd1.co.comcast.net) Quit (Read error: Connection reset by peer)
[8:42] * ircolle1 (~ircolle@c-71-229-136-109.hsd1.co.comcast.net) Quit (Read error: Connection reset by peer)
[8:42] * ircolle (~ircolle@c-71-229-136-109.hsd1.co.comcast.net) has joined #ceph
[8:45] * PuyoDead (~capitalth@53IAAASBU.tor-irc.dnsbl.oftc.net) Quit ()
[8:46] * aj__ (~aj@p578b6aa1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[8:47] * fridim_ (~fridim@56-198-190-109.dsl.ovh.fr) has joined #ceph
[8:47] * swami1 (~swami@49.32.0.182) has joined #ceph
[8:48] <swami1> loicd: Hi
[8:48] <loicd> swami1: morning !
[8:48] <loicd> swami1: afternoon actually, right ?
[8:48] <swami1> loicd: Good morning. How are you doing?
[8:49] <loicd> swami1: very well, thank you :-) What can I do for you ?
[8:50] * ircolle (~ircolle@c-71-229-136-109.hsd1.co.comcast.net) Quit (Ping timeout: 480 seconds)
[8:54] * JV_ (~chatzilla@204.14.239.107) Quit (Ping timeout: 480 seconds)
[8:59] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[9:01] * Be-El (~quassel@fb08-bcf-pc01.computational.bio.uni-giessen.de) has joined #ceph
[9:01] <Be-El> hi
[9:02] <swami1> loicd: Hope you will be visiting the openstack summit in Canada
[9:02] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[9:02] <loicd> swami1: I won't be there unfortunately :-(
[9:03] * vbellur (~vijay@91.126.187.62) Quit (Ping timeout: 480 seconds)
[9:06] * chasmo77 (~chas77@158.183-62-69.ftth.swbr.surewest.net) has joined #ceph
[9:06] * vbellur (~vijay@91.126.187.62) has joined #ceph
[9:08] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[9:13] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Quit: Ex-Chat)
[9:14] * TMM (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[9:15] * treenerd (~treenerd@85.193.140.98) has joined #ceph
[9:15] * vbellur (~vijay@91.126.187.62) Quit (Ping timeout: 480 seconds)
[9:17] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[9:18] * dgurtner (~dgurtner@178.197.231.88) has joined #ceph
[9:18] * ChrisNBl_ (~ChrisNBlu@dhcp-ip-230.dorf.rwth-aachen.de) has joined #ceph
[9:19] * analbeard (~shw@support.memset.com) has joined #ceph
[9:23] * nljmo (~nljmo@5ED6C263.cm-7-7d.dynamic.ziggo.nl) has joined #ceph
[9:25] <swami1> loicd: OK... as I planned to be there, if you were coming I was thinking we could meet in person
[9:25] <loicd> it would have been nice
[9:26] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit (Ping timeout: 480 seconds)
[9:26] * nljmo_ (~nljmo@5ED6C263.cm-7-7d.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[9:27] * pvh_sa (~pvh@41.164.8.114) has joined #ceph
[9:28] * thomnico (~thomnico@2a01:e35:8b41:120:3911:f487:be0d:3d95) has joined #ceph
[9:30] * shang (~ShangWu@125.227.44.79) has joined #ceph
[9:31] * nardial (~ls@dslb-178-009-182-130.178.009.pools.vodafone-ip.de) has joined #ceph
[9:33] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[9:33] <swami1> loicd: :(
[9:34] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[9:35] * vbellur (~vijay@91.126.187.62) has joined #ceph
[9:35] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[9:41] * swami1 (~swami@49.32.0.182) has left #ceph
[9:45] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[9:46] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[9:48] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[9:50] * cok (~chk@2a02:2350:18:1010:a88d:ce55:507e:3276) has left #ceph
[9:51] * shang (~ShangWu@125.227.44.79) Quit (Ping timeout: 480 seconds)
[9:51] * thomnico (~thomnico@2a01:e35:8b41:120:3911:f487:be0d:3d95) Quit (Ping timeout: 480 seconds)
[9:54] * lupin7474 (~mbrc@62.241.6.96) Quit (Quit: leaving)
[9:55] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[9:55] * oro_ (~oro@2001:620:20:16:f592:6b60:c6b1:f7e5) has joined #ceph
[9:56] * branto (~branto@178-253-131-113.3pp.slovanet.sk) has joined #ceph
[9:56] * antoine (~bourgault@192.93.37.4) has joined #ceph
[9:57] * oro (~oro@2001:620:20:16:f592:6b60:c6b1:f7e5) has joined #ceph
[10:00] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[10:02] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[10:04] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[10:04] * linjan (~linjan@195.110.41.9) has joined #ceph
[10:04] * mattch (~mattch@pcw3047.see.ed.ac.uk) has joined #ceph
[10:07] * b0e (~aledermue@213.95.25.82) has joined #ceph
[10:10] * jordanP (~jordan@213.215.2.194) has joined #ceph
[10:12] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[10:13] * fam_away is now known as fam
[10:13] * fam is now known as fam_away
[10:14] * fam_away is now known as fam
[10:19] * treenerd (~treenerd@85.193.140.98) Quit (Remote host closed the connection)
[10:20] * ChrisNBl_ (~ChrisNBlu@dhcp-ip-230.dorf.rwth-aachen.de) Quit (Ping timeout: 480 seconds)
[10:20] * treenerd (~treenerd@85.193.140.98) has joined #ceph
[10:22] * fam is now known as fam_away
[10:22] * fam_away is now known as fam
[10:22] * fam is now known as fam_away
[10:25] * bitserker (~toni@88.87.194.130) has joined #ceph
[10:25] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[10:26] * thomnico (~thomnico@2a01:e35:8b41:120:3911:f487:be0d:3d95) has joined #ceph
[10:32] * DV (~veillard@2001:41d0:1:d478::1) has joined #ceph
[10:34] * fam_away is now known as fam
[10:37] * smithfarm (~ncutler@nat1.scz.suse.com) Quit (Quit: Leaving.)
[10:38] * smithfarm (~ncutler@nat1.scz.suse.com) has joined #ceph
[10:38] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[10:40] * fam is now known as fam_away
[10:40] * fam_away is now known as fam
[10:41] * coredumb (~coredumb@irc.mauras.ch) has joined #ceph
[10:41] <coredumb> Hello folks
[10:42] <coredumb> is there a way to run a redundant MDS setup ?
[10:44] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[10:45] * wicope (~wicope@0001fd8a.user.oftc.net) has joined #ceph
[10:45] * tunaaja (~Peaced@tor.metaether.net) has joined #ceph
[10:47] * smithfarm (~ncutler@nat1.scz.suse.com) Quit (Quit: Leaving.)
[10:47] * ntt (~oftc-webi@195.60.190.156) has joined #ceph
[10:47] * smithfarm (~ncutler@nat1.scz.suse.com) has joined #ceph
[10:48] <ntt> Hi. I'm trying to install ceph on centos 7 following this guide -> http://docs.ceph.com/docs/master/start/quick-ceph-deploy but it simply doesn't work. I have an error: "KeyNotFoundError: Could not find keyring file: /etc/ceph/ceph.client.admin.keyring on host node1". Can someone help me please? Installing ceph is very frustrating....
[10:50] * cok (~chk@nat-cph1-sys.net.one.com) has joined #ceph
[10:50] <coredumb> mmmh i really don't get the documentation
[10:51] <loicd> ntt: could you copy in http://paste2.org/ the full sequence of command + output that you are using ?
[10:51] <coredumb> it states in multiple different pages that you can run multiple MDSs for high availability but on the ceph-deploy-mds page you get this nice warning:
[10:51] <coredumb> Important
[10:51] <coredumb> You must deploy at least one metadata server to use CephFS. There is experimental support for running multiple metadata servers. Do not run multiple metadata servers in production.
[10:51] <ntt> loicd: sure! thank you
[10:52] <loicd> coredumb: what is troubling you ?
[10:52] <coredumb> loicd: can i or not / should i or not run multiple MDS servers ?
[10:53] <coredumb> i mean running only one is a single point of failure
[10:53] <coredumb> ...
[10:53] <ntt> loicd: wait a moment... i launch ceph-deploy purge and install again...
[10:54] * smithfarm (~ncutler@nat1.scz.suse.com) Quit (Remote host closed the connection)
[10:54] <loicd> coredumb: you can run multiple MDS servers. But it's not recommended to do so in production because it's still fragile. And you're correct: having a single MDS is a single point of failure. That's one of the reasons why CephFS is not ready yet. It's very actively developed though and things will change :-)
[10:55] <coredumb> loicd: :(
[10:55] <loicd> coredumb: :-)
[10:55] <Be-El> loicd: does the stability problem also refer to an active/standby setup?
[10:55] * thomnico (~thomnico@2a01:e35:8b41:120:3911:f487:be0d:3d95) Quit (Ping timeout: 480 seconds)
[10:56] <loicd> Be-El: I'm not familiar with CephFS really.
[10:56] <coredumb> loicd: so it means that i must stick to glusterfs
[10:56] <Be-El> loicd: i'm only using it, but as far as i know the documentation states that active/active is fragile, while active/standby should be ok
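(A minimal sketch of the active/standby layout Be-El describes, assuming two hypothetical hosts named mdshost1 and mdshost2; with a single active MDS, any extra ceph-mds daemons simply become standbys:)
    ceph-deploy mds create mdshost1 mdshost2   # deploy an MDS daemon on each host
    ceph mds stat                              # expect one up:active and one up:standby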
[10:57] * smithfarm (~ncutler@nat1.scz.suse.com) has joined #ceph
[10:57] <loicd> coredumb: I understand your disappointment. You would like to use CephFS in production and it's not ready yet.
[10:57] <coredumb> loicd: yes :)
[10:58] * Kioob`Taff (~plug-oliv@2a01:e35:2e8a:1e0::42:10) has joined #ceph
[10:58] <coredumb> actually i was interested in the fact that the ceph client is available at kernel level and not only limited to fuse
[10:58] <loicd> Be-El: I don't know enough to give you a better insight.
[10:58] <ntt> loicd: http://paste2.org/6M7DZfX6
[10:59] * DV (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[10:59] <Be-El> coredumb: maybe you should repeat your question later in the day, when more developers with insight into cephfs are active on the channel
[10:59] <ntt> loicd: i have a 4 node configuration (like the tutorial): nadmin -> admin node, node1 -> monitor node, node2 -> osd1, node3 -> osd2
[10:59] <Be-El> coredumb: or use the mailing list
[11:00] <coredumb> Be-El: well seems like it's clear enough with this red warning in documentation :D
[11:02] <Be-El> coredumb: as loicd already said, cephfs is under constant development. the hammer release included a number of bugfixes especially for cephfs
[11:03] <loicd> ntt: [ceph_deploy.gatherkeys][DEBUG ] Checking node1 for /etc/ceph/ceph.client.admin.keyring
[11:03] <loicd> ntt: should have been created by
[11:04] <coredumb> Be-El: i'll roam around here for a while and ask again later then :)
[11:04] <loicd> [node1][DEBUG ] Starting ceph-create-keys on node1...
[11:05] <ntt> loicd: should i run that command manually and look for errors?
[11:05] <loicd> ntt: can you verify that /etc/ceph/ceph.client.admin.keyring does not exist on node1 ? in case it has been created *after* ceph-deploy tried to copy it
[11:06] <ntt> loicd: /etc/ceph/ceph.client.admin doesn't exist on node1
[11:06] <loicd> ntt: can you also verify that ceph-create-keys is still running ?
[11:06] <loicd> it should *not* be running
[11:06] <loicd> on node1 that is
[11:06] * oro (~oro@2001:620:20:16:f592:6b60:c6b1:f7e5) Quit (Ping timeout: 480 seconds)
[11:06] <loicd> it is the process that creates the keys when you bootstrap a cluster
[11:07] <ntt> ceph-create-keys: error: argument --id/-i is required <-- This is the result if i run ceph-create-keys on node1
[11:07] <loicd> it is supposed to create /etc/ceph/ceph.client.admin but something went wrong
[11:07] * oro_ (~oro@2001:620:20:16:f592:6b60:c6b1:f7e5) Quit (Ping timeout: 480 seconds)
[11:08] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[11:10] <loicd> ntt: ok. I suspect this is a bug related to systemd or something.
[11:10] <ntt> yes
[11:10] <loicd> ntt: ceph-create-keys is called by init scripts
[11:11] <ntt> yes.... i'm checking
[11:11] <loicd> ntt: did you search http://tracker.ceph.com/ for this error ?
[11:11] <loicd> http://tracker.ceph.com/projects/ceph-deploy/search?utf8=%E2%9C%93&q=ceph-create-keys
[11:11] <ntt> no... i'm really a newb with ceph
[11:12] <ntt> i'm looking for a reliable object storage for openstack and i don't want to use swift
[11:12] * loicd exploring
[11:12] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[11:12] <loicd> ntt: it's a little frustrating to run into that kind of problem when trying something for the first time :-/
[11:13] <ntt> yes... i confirm :)
[11:13] <ntt> so... this problem is related to the os in some way.... should i use ubuntu
[11:13] <ntt> ?
[11:15] * tunaaja (~Peaced@8Q4AAAPY3.tor-irc.dnsbl.oftc.net) Quit ()
[11:15] <loicd> if that's not too much trouble for you to switch operating systems, that's likely to solve your issue
[11:15] * Rens2Sea (~FNugget@ncc-1701-a.tor-exit.network) has joined #ceph
[11:15] <loicd> ntt: otherwise you can manually run ceph-create-keys and keep going
[11:16] <loicd> let me create a bug report with what you have
[11:16] <ntt> what is the value of the id parameter?
[11:16] <loicd> I think it's the mon id, let me check
[11:16] <ntt> ok. thank you
[11:16] <loicd> yeah, it's the mon id
[11:17] <loicd> ntt: ceph-create-keys --verbose --id node1
[11:17] <loicd> should create /etc/ceph/*keyring
[11:18] <kaisan> did anyone write a shell oneliner to return if "this server" is the leading monitor?
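(One way to answer kaisan's question, a sketch assuming jq is installed and the mon name matches the short hostname; ceph quorum_status reports the current leader:)
    [ "$(ceph quorum_status -f json | jq -r .quorum_leader_name)" = "$(hostname -s)" ] && echo leader || echo "not leader"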
[11:18] <ntt> admin_socket: exception getting command descriptions: [Errno 2] No such file or directory INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
[11:18] <ntt> loicd: this is the result
[11:19] * loicd checking
[11:19] * oro_ (~oro@pat.zurich.ibm.com) has joined #ceph
[11:19] * oro_ (~oro@pat.zurich.ibm.com) Quit (Read error: Connection reset by peer)
[11:19] <ntt> loicd: should i call this command from the admin node?
[11:19] <loicd> no, it must be run from node1
[11:20] <loicd> ntt: ls -l /var/run/ceph/ ?
[11:20] <ntt> ok. i've done from node1
[11:20] <loicd> ah
[11:20] <ntt> srwxr-xr-x 1 root root 0 12 mag 11.15 ceph-mon.node1.asok -rw-r--r-- 1 root root 5 12 mag 11.15 mon.node1.pid
[11:20] <loicd> ntt: ceph-create-keys --verbose --id node1 --cluster ceph
[11:21] <loicd> (the --cluster argument does not default to the ceph value, it has to be set explicitly)
[11:21] <loicd> ntt: this is the socket by which you can talk to the monitor, FYI ;-)
[11:21] <ntt> http://paste2.org/dPcbC2dg
[11:21] * oro (~oro@pat.zurich.ibm.com) has joined #ceph
[11:21] * oro (~oro@pat.zurich.ibm.com) Quit (Read error: Connection reset by peer)
[11:22] * loicd thinking
[11:22] <ntt> loicd: firewalld is disabled
[11:23] <loicd> netstat -tlpn ?
[11:24] <loicd> ntt: http://paste2.org/6M7DZfX6 has [node1][DEBUG ] "addr": "192.168.122.201:6789/0",
[11:25] <loicd> but it looks like your /etc/ceph.conf has a line with 10.0.0.10 instead
[11:25] <loicd> could it be that the name resolution is different from the admin node and the node1 perspective ?
[11:25] <ntt> tcp 0 0 192.168.122.201:6789 0.0.0.0:* LISTEN 5243/ceph-mon
[11:25] <loicd> right
[11:26] <ntt> yes.... because i'm trying to install with 2 separate networks. 10.0.0.0/24 private and 192.168.122.0/24 public
[11:26] <loicd> what's in your /etc/ceph.conf on node1 ?
[11:26] <ntt> 10.0.01
[11:26] <ntt> 10.0.0.10
[11:26] <ntt> this is the problem
[11:26] <loicd> can you show the full /etc/ceph/ceph.conf ?
[11:27] <loicd> you should just change the mon hosts = ... to have 192.168.122.201 instead of 10.0.0.10
[11:27] <loicd> and then ceph-create-keys will run successfully
[11:27] <ntt> http://paste2.org/EgbAz1e1
[11:27] <ntt> ok.... but my idea is that the "replication" traffic should go on 10.0.0.0/24
[11:28] <loicd> ntt: you can worry about that after the cluster is setup, you don't need to do that right now
[11:28] <loicd> you'll just add cluster_network = 10.0.0.0/24 and restart the osd
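(Roughly what loicd is describing, as a ceph.conf fragment; the subnets are the ones mentioned in this conversation:)
    [global]
    public_network = 192.168.122.0/24    # client- and monitor-facing traffic
    cluster_network = 10.0.0.0/24        # OSD-to-OSD replication/recovery traffic
(then restart the osds so they pick up the cluster network)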
[11:29] <ntt> ok.... but can i change only the local file on node1?
[11:29] <loicd> vi /etc/ceph/ceph.conf ?
[11:29] <ntt> yes
[11:29] <loicd> well, yes
[11:29] * thomnico (~thomnico@2a01:e35:8b41:120:3911:f487:be0d:3d95) has joined #ceph
[11:29] <loicd> and also change it on admin so they are in sync
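(The usual way to keep ceph.conf in sync from the admin node, a sketch using the node names from this thread:)
    ceph-deploy --overwrite-conf config push node1 node2 node3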
[11:31] <ntt> ok. now it works! What i've done: changed /etc/ceph/ceph.conf only on node1 and manually called the script ceph-create-keys. Next, I adjusted /etc/ceph/ceph.conf on node1 and called ceph-deploy mon create-initial
[11:32] <loicd> ok
[11:32] <ntt> loicd: really thank you! Now i proceed with the tutorial
[11:32] <loicd> on admin node you need to ceph-deploy gatherkeys now
[11:33] <loicd> to collect the keys on the admin node from node1
[11:33] <loicd> so that they can be distributed on the OSD
[11:33] <ntt> no... i don't think so, because create-initial already called gatherkeys
[11:33] <ntt> from logs
[11:34] <ntt> [ceph_deploy.gatherkeys][DEBUG ] Got ceph.client.admin.keyring key from node1.
[11:34] <ntt> and so on...
[11:34] <loicd> oh, you re-ran create-initial ?
[11:34] <loicd> ah right
[11:34] <ntt> yes
[11:34] <loicd> ok then :-)
[11:34] <ntt> I manually re-adjusted /etc/ceph/ceph.conf on node1
[11:34] <loicd> so, it was not a bug after all. It was just a mis-configuration :-)
[11:34] <ntt> yes
[11:35] <loicd> cool
[11:35] <alphe> sage I succeeded in dealing with my laggy nodes !!!
[11:35] <alphe> so here is the way to do it: nobackfill noscrub norecover nodeep-scrub
[11:35] <ntt> loicd: but it's important to understand how ceph-deploy can handle the problem of replication traffic on a secondary network
[11:35] <alphe> then pause
[11:35] <alphe> then restart all the osds until they are stabilised
[11:36] * dyasny (~dyasny@198.251.56.148) Quit (Ping timeout: 480 seconds)
[11:36] <alphe> then you remove the pause and they start to reconstruct slowly but nicely without overload
[11:36] <alphe> then you unset the norecover
[11:37] <alphe> that will speed up the process
[11:38] <alphe> at the end of it you will have a good segment of your pgs that are active+clean+replay
[11:38] <alphe> and a warning that there are stuck requests on some osds
[11:38] <alphe> so you run ceph health detail and restart osd by osd, keeping the nodown flag set all the way
[11:39] <alphe> and after 8 hours of fighting your laggy ceph cluster is back online
[11:39] <alphe> my next move will be to improve ram on my ceph nodes
[11:40] <alphe> I discovered the use of ceph tell osd.XX heap release which in my case was neat
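(alphe's procedure collected as commands, a sketch; osd.12 is a placeholder id, and the unset order follows what he describes above:)
    ceph osd set nobackfill
    ceph osd set norecover
    ceph osd set noscrub
    ceph osd set nodeep-scrub
    ceph osd set nodown
    ceph osd set pause               # stop client io while the osds come back
    # restart the osds one by one until they are all up and stable, then:
    ceph osd unset pause
    ceph osd unset norecover         # recovery resumes without the initial overload
    ceph tell osd.12 heap release    # per osd, hands unused heap back to the OS
(the remaining flags get cleared with 'ceph osd unset ...' once the cluster is healthy)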
[11:40] <alphe> have a good night all
[11:40] <alphe> bye
[11:40] * alphe (~alphe@0001ac6f.user.oftc.net) Quit (Quit: Leaving)
[11:45] * sankarshan (~sankarsha@121.244.87.117) Quit (Quit: Are you sure you want to quit this channel (Cancel/Ok) ?)
[11:45] * Rens2Sea (~FNugget@5NZAACGYH.tor-irc.dnsbl.oftc.net) Quit ()
[11:45] * Bromine (~AluAlu@176.10.99.201) has joined #ceph
[11:46] * dyasny (~dyasny@198.251.58.219) has joined #ceph
[11:46] * rendar (~I@host76-193-dynamic.252-95-r.retail.telecomitalia.it) has joined #ceph
[11:53] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[11:57] * cok (~chk@nat-cph1-sys.net.one.com) has left #ceph
[11:58] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[11:59] * kawa2014 (~kawa@89.184.114.246) Quit (Ping timeout: 480 seconds)
[11:59] * kawa2014 (~kawa@212.110.41.244) has joined #ceph
[12:00] * thomnico (~thomnico@2a01:e35:8b41:120:3911:f487:be0d:3d95) Quit (Quit: Ex-Chat)
[12:00] * thomnico (~thomnico@2a01:e35:8b41:120:3911:f487:be0d:3d95) has joined #ceph
[12:01] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[12:05] * antoine (~bourgault@192.93.37.4) Quit (Ping timeout: 480 seconds)
[12:11] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[12:15] * Bromine (~AluAlu@7R2AAATIY.tor-irc.dnsbl.oftc.net) Quit ()
[12:15] * Phase (~Qiasfah@62-210-170-27.rev.poneytelecom.eu) has joined #ceph
[12:16] * Phase is now known as Guest4899
[12:17] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[12:20] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[12:23] * lucas1 (~Thunderbi@218.76.52.64) Quit (Quit: lucas1)
[12:26] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[12:26] * shyu (~Shanzhi@119.254.196.66) Quit (Remote host closed the connection)
[12:30] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[12:32] * kawa2014 (~kawa@212.110.41.244) Quit (Ping timeout: 480 seconds)
[12:32] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[12:44] * ChrisNBl_ (~ChrisNBlu@dhcp-ip-230.dorf.rwth-aachen.de) has joined #ceph
[12:44] * nsoffer (~nsoffer@nat-pool-tlv-t.redhat.com) has joined #ceph
[12:45] * Guest4899 (~Qiasfah@53IAAASPK.tor-irc.dnsbl.oftc.net) Quit ()
[12:45] * Jaska (~Sirrush@176.10.99.202) has joined #ceph
[12:45] * kefu (~kefu@114.92.109.171) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[12:48] <ntt> loicd: i'm trying to install an rgw on the monitor node (following the tutorial), but i have an error. Can you help me please?
[12:49] <loicd> ntt: it's lunch time here in France. Food is sacred for us, you know ? ;-)
[12:50] <ntt> yes.... i'm from italy so food is sacred :)
[12:50] * ChrisNB__ (~ChrisNBlu@178.255.153.117) has joined #ceph
[12:50] * linjan (~linjan@195.110.41.9) Quit (Ping timeout: 480 seconds)
[12:51] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit (Ping timeout: 480 seconds)
[12:52] * ira (~ira@c-71-233-225-22.hsd1.ma.comcast.net) has joined #ceph
[12:53] * KevinPerks (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[12:54] * ira (~ira@c-71-233-225-22.hsd1.ma.comcast.net) Quit ()
[12:54] * ira (~ira@c-71-233-225-22.hsd1.ma.comcast.net) has joined #ceph
[12:57] * dopesong (~dopesong@lb0.mailer.data.lt) has joined #ceph
[12:57] * ChrisNBl_ (~ChrisNBlu@dhcp-ip-230.dorf.rwth-aachen.de) Quit (Ping timeout: 480 seconds)
[13:02] * fdmanana (~fdmanana@bl5-2-127.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[13:03] * fxmulder_ (~fxmulder@cpe-24-55-6-128.austin.res.rr.com) has joined #ceph
[13:05] * wschulze (~wschulze@cpe-69-206-242-231.nyc.res.rr.com) has joined #ceph
[13:06] * hellertime (~Adium@72.246.0.14) has joined #ceph
[13:09] * fxmulder (~fxmulder@cpe-24-55-6-128.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[13:13] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[13:15] * Jaska (~Sirrush@5NZAACG0Q.tor-irc.dnsbl.oftc.net) Quit ()
[13:15] * xENO_ (~vend3r@176.10.99.200) has joined #ceph
[13:15] * cok (~chk@2a02:2350:18:1010:b9fb:d81c:72ac:66b2) has joined #ceph
[13:16] * linjan (~linjan@195.110.41.9) has joined #ceph
[13:19] <jcsp> I'm British, my lunch came out of a can :-)
[13:20] <Tetard> damn I'm hungry now
[13:21] * madkiss (~madkiss@2001:6f8:12c3:f00f:b516:4290:b046:6cba) Quit (Quit: Leaving.)
[13:27] * Concubidated (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[13:31] * cok (~chk@2a02:2350:18:1010:b9fb:d81c:72ac:66b2) has left #ceph
[13:35] * marrusl (~mark@cpe-24-90-46-248.nyc.res.rr.com) has joined #ceph
[13:39] * shang (~ShangWu@211-75-72-139.HINET-IP.hinet.net) has joined #ceph
[13:42] <loicd> jcsp: :-D
[13:45] * xENO_ (~vend3r@8Q4AAAP2H.tor-irc.dnsbl.oftc.net) Quit ()
[13:45] * Kealper (~SurfMaths@TerokNor.tor-exit.network) has joined #ceph
[13:47] * fdmanana (~fdmanana@bl5-2-127.dsl.telepac.pt) has joined #ceph
[13:48] <loicd> shylesh: could you tell me more about your use case ?
[13:49] <loicd> shylesh: ideally I could read the fix you're trying to test and suggest a simple way
[13:52] * KevinPerks (~Adium@173-14-159-105-NewEngland.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[13:54] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[13:59] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[14:01] * dopesong (~dopesong@lb0.mailer.data.lt) Quit (Quit: Leaving...)
[14:04] <shylesh> loicd: There is a perf improvement fix which says degraded pgs should always get priority over misplaced ones while recovery is happening
[14:05] <shylesh> loicd: suppose PG A is mapped to [0,1,2] and PG B mapped to [0,1,3] and assume 2 goes down and 3 is marked out
[14:06] * shang (~ShangWu@211-75-72-139.HINET-IP.hinet.net) Quit (Quit: Ex-Chat)
[14:06] <shylesh> loicd: now the new set becomes [0,1,4], where 4 has PGs A and B to backfill and recover
[14:06] <loicd> shylesh: can I get a look at the fix ?
[14:07] <shylesh> loicd: I want to check that PG A gets the priority over B because A is degraded and B is misplaced
[14:07] <shylesh> loicd: I don't have pointer to the commit
[14:07] <shylesh> loicd: sam gave me some steps to test this
[14:08] <shylesh> loicd: I am still working on it
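(A rough way to reproduce the scenario shylesh describes, assuming osd ids 2 and 3 as in his example and an upstart-based node; a sketch, not necessarily the steps sam gave:)
    sudo stop ceph-osd id=2     # take osd.2 down -> PG A becomes degraded
    ceph osd out 3              # mark osd.3 out -> PG B becomes misplaced/remapped
    ceph -w                     # watch recovery; degraded PGs should be handled before misplaced ones
    ceph pg dump pgs_brief      # inspect up/acting sets while backfill runs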
[14:08] <loicd> shylesh: ah, cool :-)
[14:08] <loicd> I'm glad you found help.
[14:08] <shylesh> loicd: I should thank u too
[14:08] * treenerd (~treenerd@85.193.140.98) Quit (Ping timeout: 480 seconds)
[14:08] * shylesh (~shylesh@121.244.87.124) Quit (Remote host closed the connection)
[14:13] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) has joined #ceph
[14:14] * georgem (~Adium@184.151.178.179) has joined #ceph
[14:14] * hellerbarde2 (~quassel@nat-dok-04-084.nat.fhnw.ch) has joined #ceph
[14:14] * georgem (~Adium@184.151.178.179) Quit ()
[14:15] * Kealper (~SurfMaths@8Q4AAAP20.tor-irc.dnsbl.oftc.net) Quit ()
[14:15] * Borf (~AotC@ncc-1701-a.tor-exit.network) has joined #ceph
[14:17] * dgurtner (~dgurtner@178.197.231.88) Quit (Ping timeout: 480 seconds)
[14:19] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[14:19] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[14:19] * dgurtner (~dgurtner@178.197.231.88) has joined #ceph
[14:20] * treenerd (~treenerd@85.193.140.98) has joined #ceph
[14:21] * antoine (~bourgault@192.93.37.4) has joined #ceph
[14:23] * primechuck (~primechuc@173-17-128-216.client.mchsi.com) Quit (Remote host closed the connection)
[14:24] * wschulze (~wschulze@cpe-69-206-242-231.nyc.res.rr.com) Quit (Quit: Leaving.)
[14:24] * zhaochao (~zhaochao@125.39.8.227) Quit (Quit: ChatZilla 0.9.91.1 [Iceweasel 31.6.0/20150331233809])
[14:28] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Ping timeout: 480 seconds)
[14:28] * kanagaraj (~kanagaraj@121.244.87.117) Quit (Ping timeout: 480 seconds)
[14:30] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) Quit (Quit: Leaving.)
[14:31] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has joined #ceph
[14:32] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has left #ceph
[14:34] * shang (~ShangWu@211-75-72-139.HINET-IP.hinet.net) has joined #ceph
[14:34] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[14:35] * bkopilov (~bkopilov@nat-pool-tlv-t.redhat.com) Quit (Ping timeout: 480 seconds)
[14:37] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[14:39] * shang (~ShangWu@211-75-72-139.HINET-IP.hinet.net) Quit (Quit: Ex-Chat)
[14:43] * kanagaraj (~kanagaraj@121.244.87.124) has joined #ceph
[14:44] * rdas (~rdas@121.244.87.116) has joined #ceph
[14:45] * Borf (~AotC@7R2AAATON.tor-irc.dnsbl.oftc.net) Quit ()
[14:45] * Vidi (root@89-73-177-236.dynamic.chello.pl) has joined #ceph
[14:47] * Concubidated (~Adium@nat-pool-bos-t.redhat.com) has joined #ceph
[14:47] * KevinPerks (~Adium@nat-pool-bos-t.redhat.com) has joined #ceph
[14:49] * cok (~chk@nat-cph1-sys.net.one.com) has joined #ceph
[14:54] * kefu (~kefu@114.92.109.171) has joined #ceph
[14:55] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[14:55] * alram (~alram@nat-pool-bos-t.redhat.com) has joined #ceph
[14:56] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Remote host closed the connection)
[14:57] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[14:57] * kanagaraj (~kanagaraj@121.244.87.124) Quit (Quit: Leaving)
[14:57] * cok (~chk@nat-cph1-sys.net.one.com) Quit (Quit: Leaving.)
[15:02] * mwilcox_ (~mwilcox@116.251.192.71) Quit (Ping timeout: 480 seconds)
[15:03] * rotbeard (~redbeard@x5f75050a.dyn.telefonica.de) has joined #ceph
[15:05] * rlrevell (~leer@vbo1.inmotionhosting.com) has joined #ceph
[15:05] * bandrus (~brian@nat-pool-bos-u.redhat.com) has joined #ceph
[15:08] * tupper (~tcole@2001:420:2280:1272:8900:f9b8:3b49:567e) has joined #ceph
[15:09] * hellerbarde2 (~quassel@nat-dok-04-084.nat.fhnw.ch) Quit (Ping timeout: 480 seconds)
[15:10] * jrankin (~jrankin@d53-64-170-236.nap.wideopenwest.com) has joined #ceph
[15:11] * primechuck (~primechuc@host-95-2-129.infobunker.com) has joined #ceph
[15:11] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[15:13] * sjm (~sjm@nat-pool-bos-u.redhat.com) has joined #ceph
[15:15] * Vidi (root@7R2AAATPY.tor-irc.dnsbl.oftc.net) Quit ()
[15:15] * rcfighter (~PuyoDead@5.61.34.63) has joined #ceph
[15:16] * shohn (~shohn@nat-pool-bos-u.redhat.com) has joined #ceph
[15:17] * linuxkidd (~linuxkidd@nat-pool-bos-u.redhat.com) has joined #ceph
[15:18] * alram (~alram@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[15:18] * treenerd (~treenerd@85.193.140.98) Quit (Ping timeout: 480 seconds)
[15:21] * pvh_sa (~pvh@41.164.8.114) Quit (Ping timeout: 480 seconds)
[15:23] * alram (~alram@nat-pool-bos-t.redhat.com) has joined #ceph
[15:24] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[15:27] * treenerd (~treenerd@85.193.140.98) has joined #ceph
[15:27] * Hemanth (~Hemanth@121.244.87.117) Quit (Ping timeout: 480 seconds)
[15:29] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[15:31] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[15:31] * zviratko (~zviratko@241-73-239-109.cust.centrio.cz) Quit (Ping timeout: 480 seconds)
[15:35] * harold (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) has joined #ceph
[15:40] * zviratko (~zviratko@241-73-239-109.cust.centrio.cz) has joined #ceph
[15:40] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[15:42] * calvinx (~calvin@101.100.172.246) Quit (Quit: calvinx)
[15:44] * debian112 (~bcolbert@24.126.201.64) has joined #ceph
[15:45] * rcfighter (~PuyoDead@8Q4AAAP5L.tor-irc.dnsbl.oftc.net) Quit ()
[15:45] * Zeis (~cmrn@81-89-96-90.blue.kundencontroller.de) has joined #ceph
[15:47] * marrusl (~mark@cpe-24-90-46-248.nyc.res.rr.com) Quit (Quit: bye!)
[15:47] * marrusl (~mark@cpe-24-90-46-248.nyc.res.rr.com) has joined #ceph
[15:52] * hasues (~hasues@kwfw01.scrippsnetworksinteractive.com) has joined #ceph
[15:52] * harold (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) Quit (Quit: Leaving)
[15:54] * thomnico_ (~thomnico@2a01:e35:8b41:120:31a1:bb5a:1e1f:9b46) has joined #ceph
[15:54] * kanagaraj (~kanagaraj@27.7.33.209) has joined #ceph
[15:55] * bene (~ben@nat-pool-bos-t.redhat.com) has joined #ceph
[15:58] * thomnico (~thomnico@2a01:e35:8b41:120:3911:f487:be0d:3d95) Quit (Ping timeout: 480 seconds)
[16:01] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[16:05] * kefu (~kefu@114.92.109.171) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[16:07] * kefu (~kefu@114.92.109.171) has joined #ceph
[16:07] * hasues (~hasues@kwfw01.scrippsnetworksinteractive.com) has left #ceph
[16:09] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[16:13] * ircolle (~ircolle@c-71-229-136-109.hsd1.co.comcast.net) has joined #ceph
[16:13] * bkopilov (~bkopilov@bzq-79-183-181-186.red.bezeqint.net) has joined #ceph
[16:13] * boichev2 (~boichev@213.169.56.130) Quit (Ping timeout: 480 seconds)
[16:15] * Zeis (~cmrn@5NZAACG56.tor-irc.dnsbl.oftc.net) Quit ()
[16:15] * raindog (~Spikey@thoreau.gtor.org) has joined #ceph
[16:15] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[16:17] * MentalRay (~MRay@MTRLPQ42-1176054809.sdsl.bell.ca) has joined #ceph
[16:18] * kefu (~kefu@114.92.109.171) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[16:21] * pvh_sa (~pvh@41.164.8.114) has joined #ceph
[16:24] * kefu (~kefu@114.92.109.171) has joined #ceph
[16:24] * pvh_sa (~pvh@41.164.8.114) Quit (Read error: Connection reset by peer)
[16:25] <burley> If you have an inconsistent object, and replication = 3, will ceph pg repair still always pick from the primary OSD, or does it select the 2 that match as correct to fix the third?
[16:25] <m0zes> picks from primary.
[16:25] <m0zes> based on recent mailing list posts.
[16:27] <m0zes> this thread: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001331.html
[16:28] <burley> yeah, read that thread, still don't like the answer :)
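(For reference, the usual sequence for handling an inconsistent object, with the caveat from that thread that repair copies the primary's version over the replicas — so check the primary's copy of the object before repairing; the pg id below is illustrative:)

    ceph health detail | grep inconsistent    # lists PGs flagged inconsistent by scrub
    ceph pg deep-scrub 2.3f                   # optionally re-verify the PG first
    ceph pg repair 2.3f                       # rewrites the replicas from the primary's copy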
[16:28] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) has joined #ceph
[16:30] <mlausch> hi - i have a problem with my ceph mons and the leveldb
[16:30] <mlausch> the leveldb store on each of my 5 mons is at approx. 80-90 GB
[16:31] <mlausch> we triggered "ceph tell mon.mon1 compact" on one of the nodes
[16:33] <mlausch> the node compacted its data and then tried to resynchronise with the cluster. while synchronising, the mon it was reading from stopped the network stream after about 20 seconds, and the cluster called a new leader election, which took quite a long time
[16:34] <mlausch> after a new leader was elected, the synchronising process started again
[16:34] <mlausch> i don't know how i can bring my mon back into the cluster
[16:35] <mlausch> can anyone help me?
[16:36] <mlausch> i'm using ceph dumpling
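(Not an answer to the resync problem itself, but for the store size: besides the one-off "ceph tell mon.<id> compact", there is a mon option to compact the store on daemon start — a sketch only, worth checking against the dumpling documentation before relying on it:)

    [mon]
        mon compact on start = true    # compact the leveldb store each time the mon starts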
[16:37] * linjan (~linjan@195.110.41.9) Quit (Ping timeout: 480 seconds)
[16:41] * kefu (~kefu@114.92.109.171) Quit (Max SendQ exceeded)
[16:41] * kefu (~kefu@114.92.109.171) has joined #ceph
[16:45] * raindog (~Spikey@8Q4AAAP7A.tor-irc.dnsbl.oftc.net) Quit ()
[16:52] * mtanski (~mtanski@65.244.82.98) has joined #ceph
[16:52] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[16:57] * reed (~reed@75-101-54-131.dsl.static.fusionbroadband.com) has joined #ceph
[16:57] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[17:00] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[17:02] * wushudoin (~wushudoin@nat-pool-bos-u.redhat.com) has joined #ceph
[17:05] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[17:11] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[17:11] * KevinPerks (~Adium@nat-pool-bos-t.redhat.com) Quit (Read error: Connection reset by peer)
[17:11] * KevinPerks (~Adium@nat-pool-bos-t.redhat.com) has joined #ceph
[17:11] * segutier (~segutier@173.231.115.58) has joined #ceph
[17:12] * madkiss (~madkiss@2001:6f8:12c3:f00f:20c0:26fb:d47:5062) has joined #ceph
[17:13] * kanagaraj (~kanagaraj@27.7.33.209) Quit (Quit: Leaving)
[17:13] * MentalRay (~MRay@MTRLPQ42-1176054809.sdsl.bell.ca) Quit (Quit: This computer has gone to sleep)
[17:13] <mlausch> does no one have an idea?
[17:14] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[17:15] * dux0r (~galaxyAbs@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[17:16] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[17:19] * leseb_ (~leseb@81-64-215-19.rev.numericable.fr) Quit (Quit: ZNC - http://znc.in)
[17:19] * vbellur (~vijay@91.126.187.62) Quit (Ping timeout: 480 seconds)
[17:21] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[17:21] * barra204 (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) has joined #ceph
[17:27] * cholcombe (~chris@pool-108-42-124-94.snfcca.fios.verizon.net) has joined #ceph
[17:29] * JV (~chatzilla@204.14.239.106) has joined #ceph
[17:30] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[17:31] * joshd1 (~jdurgin@68-119-140-18.dhcp.ahvl.nc.charter.com) has joined #ceph
[17:32] * B_Rake (~B_Rake@69-195-66-67.unifiedlayer.com) has joined #ceph
[17:33] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[17:37] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[17:40] * frednass (~fred@dn-infra-12.lionnois.univ-lorraine.fr) has left #ceph
[17:42] * rwheeler (~rwheeler@5.29.243.114) Quit (Quit: Leaving)
[17:43] * zaitcev (~zaitcev@2001:558:6001:10:61d7:f51f:def8:4b0f) has joined #ceph
[17:44] * vbellur (~vijay@91.126.187.62) has joined #ceph
[17:45] * dux0r (~galaxyAbs@7R2AAATXN.tor-irc.dnsbl.oftc.net) Quit ()
[17:45] * treenerd (~treenerd@85.193.140.98) Quit (Ping timeout: 480 seconds)
[17:45] * PeterRabbit (~Rens2Sea@wannabe.torservers.net) has joined #ceph
[17:47] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[17:47] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[17:49] * smithfarm (~ncutler@nat1.scz.suse.com) Quit (Quit: Leaving.)
[17:51] * TMM_ (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[17:52] * huangjun (~kvirc@175.8.105.71) has joined #ceph
[17:54] <devicenull> too many PGs per OSD (563 > max 300)
[17:54] <devicenull> wtf am I supposed to do about that?
[17:54] <devicenull> given that I can't reduce the number of PGs in a pool
[17:54] <huangjun> hi all, i want to print the body of the messages that fail the bad-crc check, but setting debug_ms=20 doesn't work
[17:55] * rotbeard (~redbeard@x5f75050a.dyn.telefonica.de) Quit (Quit: Leaving)
[17:55] * daniel2_ (~dshafer@0001b605.user.oftc.net) has joined #ceph
[17:55] * jrankin (~jrankin@d53-64-170-236.nap.wideopenwest.com) Quit (Quit: Leaving)
[17:55] * mwilcox_ (~mwilcox@116.251.192.71) has joined #ceph
[17:55] <huangjun> is the message body maybe flushed to stdout or stderr?
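(A sketch of one way to check: messenger debugging can be raised at runtime, and the output normally goes to the daemon's log file rather than stdout/stderr — the osd id and log path below are illustrative:)

    ceph tell osd.0 injectargs '--debug_ms 20'
    tail -f /var/log/ceph/ceph-osd.0.log    # bad-crc messages land here unless "log to stderr" is enabled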
[17:56] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:56] * MentalRay (~MRay@MTRLPQ42-1176054809.sdsl.bell.ca) has joined #ceph
[17:56] <devicenull> nm, found "mon pg warn max per osd"
[17:56] <devicenull> seems like a very silly warning, given that there's really nothing you can do about it
[17:57] <huangjun> yes, that controls the warning message
[17:57] <huangjun> you can delete a pool or add more osds
[17:57] <devicenull> great, so a version upgrade either means a massive data migration
[17:57] <devicenull> or purchasing new hardware
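(For the archives, the threshold devicenull found can be raised, or disabled, in ceph.conf on the mons — the value below is only an example, and a non-positive value should disable the warning entirely:)

    [mon]
        mon pg warn max per osd = 600    # default is 300; 0 disables the warning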
[17:57] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Ping timeout: 480 seconds)
[17:57] * capri_oner (~capri@212.218.127.222) has joined #ceph
[17:59] * vbellur (~vijay@91.126.187.62) Quit (Ping timeout: 480 seconds)
[17:59] * N00b (59fb9d43@107.161.19.53) has joined #ceph
[18:01] * alram (~alram@nat-pool-bos-t.redhat.com) Quit (Read error: Connection reset by peer)
[18:02] * xarses (~andreww@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:03] * mwilcox_ (~mwilcox@116.251.192.71) Quit (Ping timeout: 480 seconds)
[18:04] * kefu (~kefu@114.92.109.171) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[18:04] * wenjunhuang__ (~wenjunhua@61.135.172.68) has joined #ceph
[18:04] * capri_on (~capri@212.218.127.222) Quit (Ping timeout: 480 seconds)
[18:04] * alram (~alram@nat-pool-bos-t.redhat.com) has joined #ceph
[18:07] * N00b (59fb9d43@107.161.19.53) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[18:07] * rendar (~I@host76-193-dynamic.252-95-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[18:09] * jeevan_ullas (~Deependra@114.143.35.153) Quit (Quit: Textual IRC Client: www.textualapp.com)
[18:09] * antoine (~bourgault@192.93.37.4) Quit (Ping timeout: 480 seconds)
[18:11] * rendar (~I@host76-193-dynamic.252-95-r.retail.telecomitalia.it) has joined #ceph
[18:11] * linuxkidd (~linuxkidd@nat-pool-bos-u.redhat.com) Quit (Read error: Connection reset by peer)
[18:11] * wenjunhuang_ (~wenjunhua@61.135.172.68) Quit (Ping timeout: 480 seconds)
[18:15] * Kioob`Taff (~plug-oliv@2a01:e35:2e8a:1e0::42:10) Quit (Quit: Leaving.)
[18:15] * kapsel (~k@psel.dk) Quit (Read error: Connection reset by peer)
[18:15] * PeterRabbit (~Rens2Sea@5NZAACG9I.tor-irc.dnsbl.oftc.net) Quit ()
[18:15] * xENO_ (~rapedex@5NZAACHAG.tor-irc.dnsbl.oftc.net) has joined #ceph
[18:16] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Ping timeout: 480 seconds)
[18:20] * kawa2014 (~kawa@89.184.114.246) Quit (Quit: Leaving)
[18:21] * linjan (~linjan@213.8.240.146) has joined #ceph
[18:21] * ntt_ (~oftc-webi@89.21.199.250) has joined #ceph
[18:23] <ntt_> Hi. Following this guide -> http://docs.ceph.com/docs/master/start/quick-ceph-deploy I installed ceph with 1 mon and 2 OSDs, but I get an error when I try to add an RGW. Can someone help me please?
[18:25] * ngoswami (~ngoswami@121.244.87.116) Quit (Quit: Leaving)
[18:25] <ntt_> error is: "/etc/init.d/ceph: rgw.rgw.node1 not found (/etc/ceph/ceph.conf defines mon.node1 , /var/lib/ceph defines mon.node1)"
[18:25] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) has joined #ceph
[18:28] * nsoffer (~nsoffer@nat-pool-tlv-t.redhat.com) Quit (Ping timeout: 480 seconds)
[18:31] <huangjun> maybe you need to add a client.radosgw section to ceph.conf
[18:32] <huangjun> you should look at the ceph radosgw docs
[18:33] * xarses (~andreww@12.164.168.117) has joined #ceph
[18:34] <ntt_> shouldn't ceph-deploy add this section to ceph.conf, or am i wrong?
[18:35] <huangjun> you may need to do it yourself
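(For the record, a minimal sketch of the kind of section the radosgw docs of this era describe — the instance name "gateway", keyring path and socket path are illustrative and must match however the daemon is actually named and started:)

    [client.radosgw.gateway]
        host = node1
        keyring = /etc/ceph/ceph.client.radosgw.keyring
        rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
        log file = /var/log/ceph/client.radosgw.gateway.log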
[18:41] * JV (~chatzilla@204.14.239.106) Quit (Ping timeout: 480 seconds)
[18:41] * leseb_ (~leseb@81-64-215-19.rev.numericable.fr) has joined #ceph
[18:45] * xENO_ (~rapedex@5NZAACHAG.tor-irc.dnsbl.oftc.net) Quit ()
[18:45] * Bj_o_rn (~HoboPickl@tor-exit2-readme.puckey.org) has joined #ceph
[18:55] <ntt_> following the quick-ceph-deploy tutorial, why does "ceph osd lspools" return only "0 rbd,"? where do i define pools with ceph-deploy?
[19:01] * KevinPerks1 (~Adium@nat-pool-bos-t.redhat.com) has joined #ceph
[19:01] * KevinPerks (~Adium@nat-pool-bos-t.redhat.com) Quit (Read error: Connection reset by peer)
[19:03] <huangjun> create pools with the "ceph osd pool create" or "rados" commands
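(For example — the pool name and placement-group counts below are illustrative:)

    ceph osd pool create mypool 128 128    # <pool-name> <pg_num> <pgp_num>
    ceph osd lspools                       # the new pool appears alongside the default rbd pool
    rados lspools                          # same listing via the rados tool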
[19:08] * championofcyrodi1 (~championo@50-205-35-98-static.hfc.comcastbusiness.net) has left #ceph
[19:11] * pdrakewe_ (~pdrakeweb@oh-71-50-38-193.dhcp.embarqhsd.net) has joined #ceph
[19:12] * alram (~alram@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[19:12] * Hemanth (~Hemanth@117.192.241.91) has joined #ceph
[19:12] * shylesh (~shylesh@123.136.222.112) has joined #ceph
[19:13] * jordanP (~jordan@213.215.2.194) Quit (Quit: Leaving)
[19:15] * Bj_o_rn (~HoboPickl@5NZAACHA4.tor-irc.dnsbl.oftc.net) Quit ()
[19:15] * PierreW (root@89-73-177-236.dynamic.chello.pl) has joined #ceph
[19:19] * pdrakeweb (~pdrakeweb@cpe-65-185-74-239.neo.res.rr.com) Quit (Ping timeout: 480 seconds)
[19:19] * alram (~alram@nat-pool-bos-t.redhat.com) has joined #ceph
[19:22] * sjm (~sjm@nat-pool-bos-u.redhat.com) Quit (Quit: Leaving.)
[19:22] * sjm (~sjm@nat-pool-bos-u.redhat.com) has joined #ceph
[19:25] * vbellur (~vijay@91.126.187.62) has joined #ceph
[19:25] * JV (~chatzilla@204.14.239.106) has joined #ceph
[19:29] * shohn (~shohn@nat-pool-bos-u.redhat.com) Quit (Read error: Connection reset by peer)
[19:32] * analbeard (~shw@support.memset.com) has joined #ceph
[19:33] * analbeard (~shw@support.memset.com) Quit ()
[19:33] * shohn (~shohn@nat-pool-bos-u.redhat.com) has joined #ceph
[19:34] * analbeard (~shw@support.memset.com) has joined #ceph
[19:39] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[19:39] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[19:41] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) Quit (Quit: Ex-Chat)
[19:41] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) has joined #ceph
[19:42] * joshd1 (~jdurgin@68-119-140-18.dhcp.ahvl.nc.charter.com) Quit (Quit: Leaving.)
[19:45] * PierreW (root@8Q4AAAQBH.tor-irc.dnsbl.oftc.net) Quit ()
[19:45] * Kyso_ (~SaneSmith@176.10.99.208) has joined #ceph
[19:45] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has joined #ceph
[19:45] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has left #ceph
[19:46] * vbellur (~vijay@91.126.187.62) Quit (Ping timeout: 480 seconds)
[19:46] * markl (~mark@knm.org) Quit (Ping timeout: 480 seconds)
[19:47] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[19:50] * Be-El (~quassel@fb08-bcf-pc01.computational.bio.uni-giessen.de) Quit (Remote host closed the connection)
[19:50] * jeroenvh (~jeroen@37.74.194.90) Quit (Ping timeout: 480 seconds)
[19:53] * Hemanth (~Hemanth@117.192.241.91) Quit (Ping timeout: 480 seconds)
[19:54] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[19:59] * BManojlovic (~steki@cable-89-216-224-179.dynamic.sbb.rs) has joined #ceph
[19:59] * MentalRay (~MRay@MTRLPQ42-1176054809.sdsl.bell.ca) Quit (Quit: This computer has gone to sleep)
[20:00] * fdmanana (~fdmanana@bl5-2-127.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[20:00] * wenjunhuang__ (~wenjunhua@61.135.172.68) Quit (Read error: Connection reset by peer)
[20:00] * wenjunhuang__ (~wenjunhua@61.135.172.68) has joined #ceph
[20:00] * alram (~alram@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[20:01] * georgem (~Adium@fwnat.oicr.on.ca) has left #ceph
[20:02] * Hemanth (~Hemanth@117.192.235.238) has joined #ceph
[20:04] * rlrevell1 (~leer@vbo1.inmotionhosting.com) has joined #ceph
[20:07] * rlrevell (~leer@vbo1.inmotionhosting.com) Quit (Ping timeout: 480 seconds)
[20:08] * pvh_sa (~pvh@105-236-14-195.access.mtnbusiness.co.za) has joined #ceph
[20:10] * ChrisNB__ (~ChrisNBlu@178.255.153.117) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[20:11] * angdraug (~angdraug@12.164.168.117) has joined #ceph
[20:11] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[20:12] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[20:14] * fdmanana (~fdmanana@bl5-2-127.dsl.telepac.pt) has joined #ceph
[20:15] * Kyso_ (~SaneSmith@8Q4AAAQB4.tor-irc.dnsbl.oftc.net) Quit ()
[20:15] * Epi (~xolotl@212.7.194.71) has joined #ceph
[20:16] * branto (~branto@178-253-131-113.3pp.slovanet.sk) has left #ceph
[20:17] * alram (~alram@nat-pool-bos-t.redhat.com) has joined #ceph
[20:19] * loganlsfkd (~logan@216.245.207.2) Quit (Remote host closed the connection)
[20:19] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[20:20] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[20:21] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) Quit (Quit: Ex-Chat)
[20:21] * Hemanth (~Hemanth@117.192.235.238) Quit (Ping timeout: 480 seconds)
[20:22] * ira (~ira@c-71-233-225-22.hsd1.ma.comcast.net) Quit (Remote host closed the connection)
[20:22] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit ()
[20:23] * dneary (~dneary@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[20:26] * rlrevell (~leer@vbo1.inmotionhosting.com) has joined #ceph
[20:28] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[20:29] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[20:30] * rlrevell1 (~leer@vbo1.inmotionhosting.com) Quit (Ping timeout: 480 seconds)
[20:30] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit ()
[20:30] * Hemanth (~Hemanth@117.192.236.139) has joined #ceph
[20:33] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[20:35] * JV (~chatzilla@204.14.239.106) Quit (Ping timeout: 480 seconds)
[20:35] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit ()
[20:38] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[20:40] * rlrevell1 (~leer@vbo1.inmotionhosting.com) has joined #ceph
[20:40] * rlrevell (~leer@vbo1.inmotionhosting.com) Quit (Read error: Connection reset by peer)
[20:40] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit ()
[20:41] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[20:42] * nsoffer (~nsoffer@bzq-109-65-255-36.red.bezeqint.net) has joined #ceph
[20:42] * Hemanth (~Hemanth@117.192.236.139) Quit (Ping timeout: 480 seconds)
[20:43] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit ()
[20:43] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[20:45] * Epi (~xolotl@7R2AAAT72.tor-irc.dnsbl.oftc.net) Quit ()
[20:45] * Quackie (~tallest_r@static-ip-85-25-103-119.inaddr.ip-pool.com) has joined #ceph
[20:45] * kingcu (~kingcu@kona.ridewithgps.com) has joined #ceph
[20:45] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit ()
[20:48] <theanalyst> is it ok to bring up ceph clients (like rgw/rbd) while the ceph cluster is still in a WARN state (i.e. while the mons, osds etc are still being brought up)?
[20:48] * MentalRay (~MRay@MTRLPQ42-1176054809.sdsl.bell.ca) has joined #ceph
[20:48] <theanalyst> i meant this in the context of a deployment-orchestrator pipeline, e.g. a puppet-managed setup
[20:48] <kingcu> theanalyst: i've been doing it without a problem
[20:48] <kingcu> matter of fact, my ceph cluster never leaves the WARN state :)
[20:49] <kingcu> https://gist.github.com/kingcu/499c3d9373726e5c7a95
[20:49] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[20:50] <theanalyst> hm seems familiar :)
[20:50] <kingcu> you having the same issue?
[20:51] <theanalyst> we're facing some issues when we start with one mon + one osd (with rgw starting) before the others come up
[20:51] <kingcu> dropped in here to see if anyone else had any recommendations, would like to have proper monitoring enabled without parsing out specific warning conditions
[20:51] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit ()
[20:52] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[20:54] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit ()
[20:55] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[20:56] <theanalyst> yes, that's also something we wanted to know: the general sequence of operations for bringing up a cluster
[20:56] * goberle (~goberle@195.154.71.151) has joined #ceph
[20:57] * wschulze (~wschulze@nat-pool-bos-t.redhat.com) has joined #ceph
[20:57] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit ()
[20:58] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[21:00] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit ()
[21:00] * rotbeard (~redbeard@2a02:908:df10:d300:6267:20ff:feb7:c20) has joined #ceph
[21:00] <kingcu> theanalyst: documentation recommends restarting services in this order for a rolling upgrade, which I would expect would be the general recommendation for bringing up new services as well: mons, osds, mds, rgw
[21:02] * bitserker (~toni@88.87.194.130) Quit (Ping timeout: 480 seconds)
[21:03] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[21:03] <theanalyst> yes; but when bootstrapping a cluster i'm kind of tempted to do the osds and things like mds & rgw in parallel
[21:04] <theanalyst> the problem here being that adding osds takes quite a bit of time (considering the number of disks to be added as osds etc), while the clients can come up much more quickly
[21:04] <kingcu> i've done that and haven't run into a problem, though you probably don't want to do that on existing production clusters
[21:04] <kingcu> i'm using ansible and run rolling updates sequentially
[21:06] * alram (~alram@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[21:07] * srk (~oftc-webi@32.97.110.56) has joined #ceph
[21:10] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[21:12] <theanalyst> kingcu: yeah upgrades are okay I guess, are there more recommendations for fresh cluster installs
[21:13] <kingcu> not sure unfortunately, sorry that i can't help. i wouldn't be too worried about fresh clusters, because there's not much to break. if you are automating a system that's going to be used by other people and has to be reliable, there are probably best practices
[21:14] <kingcu> but, bringing up a fresh cluster, semi-automated (you are running your chef/puppet/ansible script and can babysit) i wouldn't worry about it. but, i am a crappy sysadmin so YMMV
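(A sketch of the kind of gate an orchestrator can put in front of the client daemons, since a fresh cluster may legitimately sit in HEALTH_WARN for a while — the thresholds and service name are illustrative, not a documented recommendation:)

    until ceph quorum_status >/dev/null 2>&1; do sleep 5; done                     # mons have formed quorum
    until ceph osd stat 2>/dev/null | grep -q '[1-9][0-9]* up'; do sleep 5; done   # at least one OSD reports up
    service ceph-radosgw start    # or however rgw/mds daemons are launched on this distro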
[21:14] * shylesh (~shylesh@123.136.222.112) Quit (Remote host closed the connection)
[21:14] <theanalyst> kingcu: :)
[21:15] <srk> hi, anyone seen osds crashing (seg fault) during crush map updates with Hammer?
[21:15] * Quackie (~tallest_r@0SGAAAP19.tor-irc.dnsbl.oftc.net) Quit ()
[21:15] * thomnico_ (~thomnico@2a01:e35:8b41:120:31a1:bb5a:1e1f:9b46) Quit (Ping timeout: 480 seconds)
[21:15] * alram (~alram@nat-pool-bos-t.redhat.com) has joined #ceph
[21:17] <srk> stack: https://pastebin.osuosl.org/26376/
[21:22] * rotbeard (~redbeard@2a02:908:df10:d300:6267:20ff:feb7:c20) Quit (Quit: Leaving)
[21:27] * rendar (~I@host76-193-dynamic.252-95-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[21:30] * rendar (~I@host76-193-dynamic.252-95-r.retail.telecomitalia.it) has joined #ceph
[21:35] * barra204 (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[21:42] * treenerd (~treenerd@cpe90-146-100-181.liwest.at) has joined #ceph
[21:44] <srk> old defect related to monitors crashing with similar stack: http://tracker.ceph.com/issues/7487
[21:44] <srk> it was resolved though
[21:45] <srk> The issue we are seeing is osd crash
[21:45] * Plesioth (~Kyso_@195.169.125.226) has joined #ceph
[21:47] * ntt_ (~oftc-webi@89.21.199.250) Quit (Remote host closed the connection)
[21:48] <srk> @dmic, @sage, any clues on the osd crash? stack is here: https://pastebin.osuosl.org/26376/
[21:48] <cephalobot> srk: Error: "dmic," is not a valid command.
[21:49] * barra204 (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) has joined #ceph
[21:50] * bkopilov (~bkopilov@bzq-79-183-181-186.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[21:50] * bkopilov (~bkopilov@bzq-79-180-27-146.red.bezeqint.net) has joined #ceph
[21:55] * srkr (~srk@32.97.110.56) has joined #ceph
[21:55] * dneary (~dneary@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[21:56] <sage> srk: reproducible?
[21:56] <sage> can you post a copy of your crush map? (ceph osd getcrushmap -o /tmp/map)
[21:57] <srk> yes, reproducible
[21:57] * LeaChim (~LeaChim@host86-159-233-65.range86-159.btcentralplus.com) has joined #ceph
[21:58] * rotbeard (~redbeard@2a02:908:df10:d300:6267:20ff:feb7:c20) has joined #ceph
[21:58] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[22:02] * cl (~oftc-webi@32.97.110.54) has joined #ceph
[22:03] * treenerd (~treenerd@cpe90-146-100-181.liwest.at) Quit (Quit: Verlassend)
[22:04] <srk> crushmap is here: https://pastebin.osuosl.org/26386/
[22:04] <srk> posted it in txt format
[22:06] * oro (~oro@pat.zurich.ibm.com) has joined #ceph
[22:06] * oro (~oro@pat.zurich.ibm.com) Quit (Read error: Connection reset by peer)
[22:06] * oro (~oro@pat.zurich.ibm.com) has joined #ceph
[22:06] * oro (~oro@pat.zurich.ibm.com) Quit (Read error: Connection reset by peer)
[22:06] * i_m (~ivan.miro@pool-109-191-92-175.is74.ru) Quit (Quit: Leaving.)
[22:06] * oro (~oro@pat.zurich.ibm.com) has joined #ceph
[22:06] * oro (~oro@pat.zurich.ibm.com) Quit (Read error: Connection reset by peer)
[22:07] * oro_ (~oro@pat.zurich.ibm.com) has joined #ceph
[22:07] * oro_ (~oro@pat.zurich.ibm.com) Quit (Read error: Connection reset by peer)
[22:12] * wschulze (~wschulze@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving.)
[22:12] * dneary (~dneary@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[22:15] * Plesioth (~Kyso_@2WVAACAE1.tor-irc.dnsbl.oftc.net) Quit ()
[22:15] * KUSmurf (~redbeast1@manning1.torservers.net) has joined #ceph
[22:16] * wschulze (~wschulze@nat-pool-bos-t.redhat.com) has joined #ceph
[22:19] <srk> @sage, pls let me know if you need any further details
[22:21] * jluis (~joao@249.38.136.95.rev.vodafone.pt) has joined #ceph
[22:21] * ChanServ sets mode +o jluis
[22:22] <sage> srk: do you mind posting the binary version somewhere?
[22:23] * hellertime (~Adium@72.246.0.14) Quit (Quit: Leaving.)
[22:23] <JohnPreston78> hi everyone
[22:24] <JohnPreston78> has someone done bench tests changing the MTU on the ceph OSD hosts?
[22:25] <sage> srk: it doesn't seem to compile
[22:26] <srk> I've changed the hostnames after decompiling
[22:26] <srk> would that matter
[22:27] <srk> I can post the original output of the crushtool
[22:27] * joao (~joao@249.38.136.95.rev.vodafone.pt) Quit (Ping timeout: 480 seconds)
[22:31] <srk> https://pastebin.osuosl.org/26396/
[22:31] * Sysadmin88 (~IceChat77@054527d3.skybroadband.com) has joined #ceph
[22:33] * alfredodeza (~alfredode@198.206.133.89) has left #ceph
[22:35] <sage> the replicated_ruleset still references bucket0, which does not exist
[22:37] * logan (~a@63.143.49.103) Quit (Ping timeout: 480 seconds)
[22:39] <srk> I think we removed the default bucket and didn't remove the ruleset
[22:40] <sage> can you do 'ceph osd getcrushmap -o /tmp/foo' to grab the actual crush map that led to the crash?
[22:41] * PaulC (~paul@222-154-14-12.jetstream.xtra.co.nz) has joined #ceph
[22:42] <srk> I didn't mean it was removed manually from the text file
[22:44] <srk> so, the crushmap that I posted was the actual output
[22:44] * wenjunhuang__ (~wenjunhua@61.135.172.68) Quit (Ping timeout: 480 seconds)
[22:44] <srk> got this line in ceph.conf too: osd pool default crush rule = 0
[22:44] * wenjunhuang__ (~wenjunhua@61.135.172.68) has joined #ceph
[22:45] * KUSmurf (~redbeast1@2WVAACAFM.tor-irc.dnsbl.oftc.net) Quit ()
[22:45] * EdGruberman (~Yopi@exit1.ipredator.se) has joined #ceph
[22:46] <srk> We are just separating ssd and sata disks and created new rulesets
[22:46] <sage> ceph osd getcrushmap should dump it in compiled/binary form..
[22:46] <srk> ok
[22:47] <srk> can I send it to your email? not sure where I can post it :)
[22:47] * MentalRay (~MRay@MTRLPQ42-1176054809.sdsl.bell.ca) Quit (Quit: This computer has gone to sleep)
[22:47] <sage> sure
[22:48] * MentalRay (~MRay@MTRLPQ42-1176054809.sdsl.bell.ca) has joined #ceph
[22:49] * KevinPerks1 (~Adium@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving.)
[22:49] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) Quit (Quit: jdillaman)
[22:52] * KevinPerks (~Adium@nat-pool-bos-t.redhat.com) has joined #ceph
[22:52] <cmdrk> to see if I'm using straw or straw2, do i need to get the crushmap and decompile it?
[22:53] * JV_ (~chatzilla@204.14.239.106) has joined #ceph
[22:53] <srk> @sage, sent to newdream id
[22:53] <cephalobot> srk: Error: "sage," is not a valid command.
[22:54] <sage> ah.. yeah, the map is bad, because the rule refers to a bucket that does not exist.
[22:55] <cmdrk> hmm.. well decompiling the crushmap is easy enough. i guess that answers my own question
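(For example — the file paths below are illustrative:)

    ceph osd getcrushmap -o /tmp/cm
    crushtool -d /tmp/cm -o /tmp/cm.txt
    grep alg /tmp/cm.txt             # each bucket shows "alg straw" or "alg straw2"
    ceph osd crush dump | grep alg   # JSON alternative without decompiling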
[22:55] <sage> and crashes when it tries to use it. do you have the command history that led to this state? my guess is that you deleted a bucket and it didn't verify that the bucket wasn't in use by a rule
[22:57] <srk> yes, we are deleting the default bucket without removing the ruleset that is using it.
[22:57] * segutier (~segutier@173.231.115.58) Quit (Quit: segutier)
[22:57] * segutier (~segutier@173.231.115.58) has joined #ceph
[22:58] <srk> so, the crash is expected. correct?
[22:58] <sage> explained. will patch it to not crash in this case, and we'll want to prevent deleting buckets that are still referenced by a rule, too
[22:58] <sage> if you create a new bucket it should fix the crash
[22:59] * Vacuum_ (~vovo@i59F79BED.versanet.de) has joined #ceph
[22:59] * tupper (~tcole@2001:420:2280:1272:8900:f9b8:3b49:567e) Quit (Ping timeout: 480 seconds)
[22:59] <srk> ok
[23:00] <kingcu> theanalyst: yoyo you still around?
[23:00] <kingcu> i believe i just fixed my issue with "failing to respond to cache pressure" warnings, you mentioned you were experiencing them as well
[23:00] <sage> srk: thanks for the bug report!
[23:01] <cmdrk> kingcu: did upgrading your clients fix the cache pressure issue? /me reading mailing list
[23:01] <srk> sure, no problem. For now, we will either delete both bucket and rule or keep both.
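(The full round-trip for fixing a map by hand, for anyone who hits the same thing — the edit step is manual, and as sage notes, simply recreating the missing bucket also avoids the crash:)

    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
    # edit /tmp/crushmap.txt: remove or fix any rule whose "step take" references a deleted bucket
    crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
    ceph osd setcrushmap -i /tmp/crushmap.new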
[23:01] <kingcu> yeah just responded. when i upgraded my cluster thinking it would solve the problem, i had neglected to update the clients (not on purpose, just overlooked the issue)
[23:02] <kingcu> it was indeed the kernel hanging on to dentries. upgrading the clients may not have fixed it, only a couple days of not having an issue will confirm that
[23:02] * MentalRay (~MRay@MTRLPQ42-1176054809.sdsl.bell.ca) Quit (Quit: This computer has gone to sleep)
[23:02] * alram (~alram@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[23:03] <kingcu> but, was able to temporarily fix the issue by clearing dentries and inodes. Try this, on the client machines that are having the issue (i just grepped the mds log for the client ids and saw the IPs associated with the client ids)
[23:03] <cmdrk> i have seen the same issue -- just realized after reading your post that my test client is firefly and my new cluster is hammer
[23:03] <kingcu> echo 2 | sudo tee /proc/sys/vm/drop_caches
[23:03] <cmdrk> i'll have to give upgrading a whirl.
[23:04] <kingcu> if this fixes the issue, i'll be happily continuing my move away from NFS+nginx for some of my main file storage uses
[23:04] <cmdrk> unfortunately i have a firefly cluster as well.. not sure if it's safer to have hammer clients mounting a firefly mds or a firefly client talking to a hammer mds
[23:05] <cmdrk> s/mounting/communicating\ with
[23:05] <kingcu> cmdrk: hopefully that mailing list discussion will help others, there were a couple mentions of the problem I found by searching, but nothing with good search terms
[23:05] <kingcu> cmdrk: out of my depth on that one
[23:05] * bene (~ben@nat-pool-bos-t.redhat.com) Quit (Quit: Konversation terminated!)
[23:05] <gregsfortytwo> we don't test that scenario much but it ought to work
[23:05] <gregsfortytwo> or I will be a sad panda
[23:05] <cmdrk> gregsfortytwo: which? :)
[23:05] * JV_ (~chatzilla@204.14.239.106) Quit (Ping timeout: 480 seconds)
[23:06] <gregsfortytwo> older or newer clients
[23:06] <gregsfortytwo> either way
[23:06] <cmdrk> it has been fine thus far with a 3.18 kernel
[23:06] * Vacuum__ (~vovo@i59F79BF4.versanet.de) Quit (Ping timeout: 480 seconds)
[23:06] <cmdrk> with a firefly client and hammer cluster
[23:06] * oro (~oro@pat.zurich.ibm.com) has joined #ceph
[23:06] * oro (~oro@pat.zurich.ibm.com) Quit (Read error: Connection reset by peer)
[23:06] <cmdrk> except for the cache pressure thing popping up from time to time
[23:08] * JV (~chatzilla@204.14.239.106) has joined #ceph
[23:10] * rotbeard (~redbeard@2a02:908:df10:d300:6267:20ff:feb7:c20) Quit (Quit: Leaving)
[23:12] * espeer (~quassel@phobos.isoho.st) Quit (Remote host closed the connection)
[23:13] * schmee (~quassel@phobos.isoho.st) Quit (Remote host closed the connection)
[23:13] * MentalRay (~MRay@MTRLPQ42-1176054809.sdsl.bell.ca) has joined #ceph
[23:13] * vbellur (~vijay@91.126.187.62) has joined #ceph
[23:14] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)
[23:15] * EdGruberman (~Yopi@5NZAACHGJ.tor-irc.dnsbl.oftc.net) Quit ()
[23:15] * Sun7zu (~Jourei@tor-exit0-readme.dfri.se) has joined #ceph
[23:18] * linjan (~linjan@213.8.240.146) Quit (Ping timeout: 480 seconds)
[23:19] * Concubidated (~Adium@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving.)
[23:21] * KevinPerks (~Adium@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving.)
[23:21] * wschulze (~wschulze@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving.)
[23:21] * sjm (~sjm@nat-pool-bos-u.redhat.com) has left #ceph
[23:21] * bandrus (~brian@nat-pool-bos-u.redhat.com) Quit (Quit: Leaving.)
[23:23] * primechuck (~primechuc@host-95-2-129.infobunker.com) Quit (Remote host closed the connection)
[23:25] * MentalRay (~MRay@MTRLPQ42-1176054809.sdsl.bell.ca) Quit (Quit: This computer has gone to sleep)
[23:25] * angdraug (~angdraug@12.164.168.117) Quit (Quit: Leaving)
[23:26] * linjan (~linjan@80.179.241.26) has joined #ceph
[23:26] * shohn (~shohn@nat-pool-bos-u.redhat.com) Quit (Quit: Leaving.)
[23:31] * MentalRay (~MRay@MTRLPQ42-1176054809.sdsl.bell.ca) has joined #ceph
[23:32] * elder (~elder@c-24-245-18-91.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[23:34] * jwilkins (~jwilkins@c-50-131-97-162.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[23:37] * fridim_ (~fridim@56-198-190-109.dsl.ovh.fr) Quit (Ping timeout: 480 seconds)
[23:38] * espeer (~quassel@phobos.isoho.st) has joined #ceph
[23:38] * wushudoin (~wushudoin@nat-pool-bos-u.redhat.com) Quit (Quit: Leaving)
[23:39] * schmee (~quassel@phobos.isoho.st) has joined #ceph
[23:40] * angdraug (~angdraug@12.164.168.117) has joined #ceph
[23:42] * JV (~chatzilla@204.14.239.106) Quit (Ping timeout: 480 seconds)
[23:43] * elder (~elder@c-24-245-18-91.hsd1.mn.comcast.net) has joined #ceph
[23:43] * ChanServ sets mode +o elder
[23:44] * jwilkins (~jwilkins@c-50-131-97-162.hsd1.ca.comcast.net) has joined #ceph
[23:45] * Sun7zu (~Jourei@8Q4AAAQHP.tor-irc.dnsbl.oftc.net) Quit ()
[23:49] * rlrevell1 (~leer@vbo1.inmotionhosting.com) Quit (Ping timeout: 480 seconds)
[23:54] * srk (~oftc-webi@32.97.110.56) Quit (Quit: Page closed)
[23:55] * cl (~oftc-webi@32.97.110.54) Quit (Remote host closed the connection)
[23:55] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[23:57] * JV (~chatzilla@204.14.239.106) has joined #ceph
[23:58] * debian112 (~bcolbert@24.126.201.64) Quit (Quit: Leaving.)
[23:59] * shohn (~shohn@173-14-159-105-NewEngland.hfc.comcastbusiness.net) has joined #ceph
[23:59] <Anticimex> is it possible to specify per qemu rbd volume whether it should use rbd caching or not?
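(For reference: qemu's per-drive cache= setting should control the rbd cache, and ceph options can also be appended to the rbd file spec — a hedged example, pool and image names illustrative:)

    -drive format=raw,file=rbd:rbd/vm-disk-1,cache=writeback    # rbd cache enabled for this volume
    -drive format=raw,file=rbd:rbd/vm-disk-2,cache=none         # rbd cache disabled for this one
    # or explicitly in the rbd spec: file=rbd:rbd/vm-disk-1:rbd_cache=true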

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.