#ceph IRC Log


IRC Log for 2015-06-05

Timestamps are in GMT/BST.

[0:02] * sleinen1 (~Adium@84-72-160-233.dclient.hispeed.ch) Quit ()
[0:03] * sleinen1 (~Adium@84-72-160-233.dclient.hispeed.ch) has joined #ceph
[0:03] * sleinen (~Adium@2001:620:0:82::100) Quit (Read error: Connection reset by peer)
[0:04] * sleinen (~Adium@2001:620:0:82::100) has joined #ceph
[0:05] * sleinen1 (~Adium@84-72-160-233.dclient.hispeed.ch) Quit (Read error: Connection reset by peer)
[0:08] * vata (~vata@207.96.182.162) Quit (Ping timeout: 480 seconds)
[0:11] * daviddcc (~dcasier@77.151.197.84) Quit (Ping timeout: 480 seconds)
[0:12] * rlrevell (~leer@184.52.129.221) has joined #ceph
[0:12] * rlrevell (~leer@184.52.129.221) has left #ceph
[0:13] * sleinen (~Adium@2001:620:0:82::100) Quit (Ping timeout: 480 seconds)
[0:15] * Thayli (~Miho@9S0AAANHP.tor-irc.dnsbl.oftc.net) Quit ()
[0:15] * hassifa (~yuastnav@zscore.mit.edu) has joined #ceph
[0:21] * jskinner (~jskinner@host-95-2-129.infobunker.com) Quit (Remote host closed the connection)
[0:21] * jskinner (~jskinner@host-95-2-129.infobunker.com) has joined #ceph
[0:26] * danieagle (~Daniel@177.45.216.63) has joined #ceph
[0:27] * primechuck (~primechuc@host-95-2-129.infobunker.com) Quit (Remote host closed the connection)
[0:29] * jskinner (~jskinner@host-95-2-129.infobunker.com) Quit (Ping timeout: 480 seconds)
[0:30] * Kupo1 (~tyler.wil@23.111.254.159) has joined #ceph
[0:37] * zaitcev (~zaitcev@2001:558:6001:10:61d7:f51f:def8:4b0f) Quit (Quit: Bye)
[0:40] * linuxkidd__ (~linuxkidd@63.79.91.17) Quit (Quit: Leaving)
[0:45] * Silentspy (~JamesHarr@185.77.129.54) has joined #ceph
[0:45] * moore (~moore@64.202.160.88) has joined #ceph
[0:49] * hassifa (~yuastnav@5NZAAC90W.tor-irc.dnsbl.oftc.net) Quit ()
[0:53] * fsimonce (~simon@host253-71-dynamic.3-87-r.retail.telecomitalia.it) Quit (Quit: Coyote finally caught me)
[0:58] * midnight_ (~midnightr@216.113.160.71) Quit (Remote host closed the connection)
[1:01] * danieagle (~Daniel@177.45.216.63) Quit (Quit: Thanks for everything! :-) See you later :-))
[1:01] * vata (~vata@cable-21.246.173-197.electronicbox.net) has joined #ceph
[1:05] * alram (~alram@192.41.52.12) Quit (Ping timeout: 480 seconds)
[1:07] * stiopa (~stiopa@cpc73828-dals21-2-0-cust630.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[1:07] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[1:09] * jclm (~jclm@211.177.245.106) Quit (Quit: Leaving.)
[1:09] * oms101_ (~oms101@p20030057EA3A1500C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:15] * Silentspy (~JamesHarr@8Q4AAA90J.tor-irc.dnsbl.oftc.net) Quit ()
[1:15] * Defaultti1 (~Thononain@185.77.129.11) has joined #ceph
[1:18] * oms101_ (~oms101@p20030057EA332800C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[1:26] * jclm (~jclm@118.130.198.164) has joined #ceph
[1:45] * Defaultti1 (~Thononain@5NZAAC93R.tor-irc.dnsbl.oftc.net) Quit ()
[1:53] * rwheeler (~rwheeler@38.140.108.3) has joined #ceph
[1:57] * wushudoin (~wushudoin@2601:9:4b00:f10:2ab2:bdff:fe0b:a6ee) Quit (Ping timeout: 480 seconds)
[2:00] * LeaChim (~LeaChim@host86-163-124-72.range86-163.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[2:04] * vata (~vata@cable-21.246.173-197.electronicbox.net) Quit (Quit: Leaving.)
[2:04] * vata (~vata@cable-21.246.173-197.electronicbox.net) has joined #ceph
[2:04] * badone (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) has joined #ceph
[2:08] * CScrace (cb638001@107.161.19.53) has joined #ceph
[2:09] <CScrace> Is it worth putting journals on RAID1 SSDs? Or just put them straight on the SSDs and spread them out?
[2:11] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[2:17] * moore (~moore@64.202.160.88) Quit (Remote host closed the connection)
[2:18] * ircolle (~Adium@2601:1:a580:1735:290b:ded2:2213:cc90) Quit (Quit: Leaving.)
[2:19] * brianjjo (~jwandborg@176.106.54.54) has joined #ceph
[2:21] * bandrus (~brian@55.sub-70-214-35.myvzw.com) Quit (Quit: Leaving.)
[2:24] <Kupo1> afaik it's more reliable to just have more journals/osds than to try raiding things
[2:26] * debian112 (~bcolbert@24.126.201.64) Quit (Quit: Leaving.)
[2:27] <CScrace> hmmm
[2:28] <CScrace> having trouble getting journals to work on raid devices anyway
[2:30] * wschulze (~wschulze@cpe-69-206-242-231.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:34] * rwheeler (~rwheeler@38.140.108.3) Quit (Remote host closed the connection)
[2:38] * lucas1 (~Thunderbi@218.76.52.64) has joined #ceph
[2:43] * Concubidated (~Adium@66.87.66.104) Quit (Quit: Leaving.)
[2:45] * cholcombe (~chris@c-73-180-29-35.hsd1.or.comcast.net) Quit (Ping timeout: 480 seconds)
[2:49] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) Quit (Quit: jdillaman)
[2:49] * brianjjo (~jwandborg@8Q4AAA92K.tor-irc.dnsbl.oftc.net) Quit ()
[2:49] * ylmson (~Hazmat@static-ip-85-25-103-119.inaddr.ip-pool.com) has joined #ceph
[3:02] * lpabon (~quassel@24-151-54-34.dhcp.nwtn.ct.charter.com) has joined #ceph
[3:03] * lpabon (~quassel@24-151-54-34.dhcp.nwtn.ct.charter.com) Quit (Remote host closed the connection)
[3:06] * OutOfNoWhere (~rpb@199.68.195.102) has joined #ceph
[3:08] * kefu (~kefu@114.92.116.93) has joined #ceph
[3:19] * ylmson (~Hazmat@5NZAAC97V.tor-irc.dnsbl.oftc.net) Quit ()
[3:19] * Popz (~Kealper@31.31.78.141) has joined #ceph
[3:20] * yguang11 (~yguang11@nat-dip30-wl-d.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[3:29] <m0zes> CScrace: journals on raid1 ssds doesn't make much sense. ssds tend to die at the same time if they were bought together (and same mfg/ssd controller)
[3:29] <m0zes> in that case raid1 doesn't help
[3:32] * shohn1 (~shohn@dslb-094-223-167-060.094.223.pools.vodafone-ip.de) has joined #ceph
[3:34] * yghannam (~yghannam@0001f8aa.user.oftc.net) Quit (Quit: Leaving)
[3:36] <CScrace> ok thanks, think we have decided to not use raid1 :)
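As a rough sketch of the alternative discussed above (journals spread across plain SSD partitions instead of RAID1): ceph-deploy's DISK:JOURNAL syntax lets each OSD point at its own journal partition. Host, disk and partition names below are placeholders for illustration only.

    ceph-deploy osd prepare node1:/dev/sdb:/dev/sde1
    ceph-deploy osd prepare node1:/dev/sdc:/dev/sde2

If that SSD dies, only the OSDs whose journals live on it are lost, which is why more independent journal devices are generally preferred over RAID1 here.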
[3:37] * alram (~alram@64.134.221.151) has joined #ceph
[3:38] * shohn (~shohn@dslb-188-102-027-024.188.102.pools.vodafone-ip.de) Quit (Ping timeout: 480 seconds)
[3:41] * kefu (~kefu@114.92.116.93) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[3:49] * Popz (~Kealper@8Q4AAA934.tor-irc.dnsbl.oftc.net) Quit ()
[3:49] * Lattyware (~TGF@89.105.194.70) has joined #ceph
[3:51] * kefu (~kefu@114.92.116.93) has joined #ceph
[3:56] * flisky (~Thunderbi@106.39.60.34) has joined #ceph
[3:57] * zhaochao (~zhaochao@125.39.8.226) has joined #ceph
[4:00] * CScrace (cb638001@107.161.19.53) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[4:19] * yghannam (~yghannam@0001f8aa.user.oftc.net) has joined #ceph
[4:19] * Lattyware (~TGF@9S0AAANT2.tor-irc.dnsbl.oftc.net) Quit ()
[4:20] * nicatronTg (~Shnaw@tor-exit1.arbitrary.ch) has joined #ceph
[4:20] * alram (~alram@64.134.221.151) Quit (Quit: leaving)
[4:20] * angdraug (~angdraug@12.164.168.117) Quit (Quit: Leaving)
[4:26] * jclm (~jclm@118.130.198.164) Quit (Quit: Leaving.)
[4:27] * jclm (~jclm@118.130.198.164) has joined #ceph
[4:31] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (Quit: leaving)
[4:40] * wschulze (~wschulze@cpe-69-206-242-231.nyc.res.rr.com) has joined #ceph
[4:43] * wschulze (~wschulze@cpe-69-206-242-231.nyc.res.rr.com) Quit ()
[4:49] * nicatronTg (~Shnaw@8Q4AAA953.tor-irc.dnsbl.oftc.net) Quit ()
[4:54] * Random (~Frostshif@37.157.196.230) has joined #ceph
[4:59] * kefu (~kefu@114.92.116.93) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[5:05] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[5:19] * vbellur (~vijay@122.171.123.165) Quit (Ping timeout: 480 seconds)
[5:20] * deepsa (~Deependra@00013525.user.oftc.net) has joined #ceph
[5:21] * ketor (~ketor@182.48.117.114) has joined #ceph
[5:24] * Vacuum_ (~Vacuum@88.130.216.125) has joined #ceph
[5:24] * Random (~Frostshif@8Q4AAA96W.tor-irc.dnsbl.oftc.net) Quit ()
[5:24] * ira (~ira@c-71-233-225-22.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[5:25] * Nats_ (~natscogs@114.31.195.238) has joined #ceph
[5:25] * Nats (~natscogs@114.31.195.238) Quit (Read error: Connection reset by peer)
[5:31] * Vacuum__ (~Vacuum@88.130.211.207) Quit (Ping timeout: 480 seconds)
[5:39] * jyoti-ranjan (~ranjanj@idp01webcache1-z.apj.hpecore.net) has joined #ceph
[5:41] * squ (~Thunderbi@46.109.36.167) has joined #ceph
[5:54] * Revo84 (~Jourei@89.105.194.90) has joined #ceph
[6:02] * OutOfNoWhere (~rpb@199.68.195.102) Quit (Ping timeout: 480 seconds)
[6:14] * yguang11 (~yguang11@12.31.82.125) has joined #ceph
[6:21] * yguang11_ (~yguang11@2001:4998:effd:7801::10a0) has joined #ceph
[6:24] * Revo84 (~Jourei@5NZAADAFP.tor-irc.dnsbl.oftc.net) Quit ()
[6:28] * yguang11 (~yguang11@12.31.82.125) Quit (Ping timeout: 480 seconds)
[6:41] * Concubidated (~Adium@71.21.5.251) has joined #ceph
[6:48] * oro (~oro@84-72-20-79.dclient.hispeed.ch) has joined #ceph
[6:53] * jclm (~jclm@118.130.198.164) Quit (Quit: Leaving.)
[6:54] * Bonzaii (~Silentkil@8Q4AABAAL.tor-irc.dnsbl.oftc.net) has joined #ceph
[6:55] * vbellur (~vijay@121.244.87.117) has joined #ceph
[6:56] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[7:12] * amote (~amote@121.244.87.116) has joined #ceph
[7:12] * karnan (~karnan@106.51.132.78) has joined #ceph
[7:21] * Hemanth (~Hemanth@117.192.249.169) has joined #ceph
[7:22] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has joined #ceph
[7:22] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has left #ceph
[7:24] * Bonzaii (~Silentkil@8Q4AABAAL.tor-irc.dnsbl.oftc.net) Quit ()
[7:24] * adept256 (~cyphase@hessel0.torservers.net) has joined #ceph
[7:30] * vbellur (~vijay@121.244.87.117) Quit (Ping timeout: 480 seconds)
[7:31] * shang (~ShangWu@175.41.48.77) has joined #ceph
[7:40] <raw> "http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/" says "
[7:40] <raw> Tip
[7:40] <raw> DO NOT mount kernel clients directly on the same node as your Ceph Storage Cluster, because kernel conflicts can arise. However, you can mount kernel clients within virtual machines (VMs) on a single node." how serious is this?
[7:41] * branto (~borix@ip-213-220-214-203.net.upcbroadband.cz) has joined #ceph
[7:45] * treenerd (~treenerd@83-64-142-10.zwischennetz.xdsl-line.inode.at) has joined #ceph
[7:47] * vbellur (~vijay@121.244.87.124) has joined #ceph
[7:50] * overclk (~overclk@121.244.87.117) has joined #ceph
[7:53] * rdas (~rdas@121.244.87.116) has joined #ceph
[7:54] * vikhyat (~vumrao@121.244.87.116) Quit (Remote host closed the connection)
[7:54] * adept256 (~cyphase@9S0AAAN31.tor-irc.dnsbl.oftc.net) Quit ()
[7:58] * Hemanth (~Hemanth@117.192.249.169) Quit (Ping timeout: 480 seconds)
[8:04] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[8:10] * treenerd (~treenerd@83-64-142-10.zwischennetz.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[8:12] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Quit: Ex-Chat)
[8:20] * Sysadmin88 (~IceChat77@054527d3.skybroadband.com) Quit (Quit: Relax, its only ONES and ZEROS!)
[8:23] * karnan (~karnan@106.51.132.78) Quit (Ping timeout: 480 seconds)
[8:24] * Averad (~skney@chomsky.torservers.net) has joined #ceph
[8:28] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) has joined #ceph
[8:35] * cooldharma06 (~chatzilla@14.139.180.40) has joined #ceph
[8:37] * sleinen (~Adium@2001:620:0:2d:7ed1:c3ff:fedc:3223) has joined #ceph
[8:44] * fridim_ (~fridim@56-198-190-109.dsl.ovh.fr) has joined #ceph
[8:49] * karnan (~karnan@223.229.156.182) has joined #ceph
[8:50] * cooldharma06 (~chatzilla@14.139.180.40) Quit (Quit: ChatZilla 0.9.91.1 [Iceweasel 21.0/20130515140136])
[8:51] * TMM (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[8:54] * Averad (~skney@7R2AABIBL.tor-irc.dnsbl.oftc.net) Quit ()
[8:54] * Thayli (~Tumm@37.187.129.166) has joined #ceph
[8:56] * jyoti-ranjan (~ranjanj@idp01webcache1-z.apj.hpecore.net) Quit (Read error: Connection reset by peer)
[8:58] * karnan (~karnan@223.229.156.182) Quit (Ping timeout: 480 seconds)
[9:04] * wicope (~wicope@0001fd8a.user.oftc.net) has joined #ceph
[9:06] * stiopa (~stiopa@cpc73828-dals21-2-0-cust630.20-2.cable.virginm.net) has joined #ceph
[9:06] * Hemanth (~Hemanth@121.244.87.117) has joined #ceph
[9:07] * derjohn_mob (~aj@tmo-111-165.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[9:07] * ketor (~ketor@182.48.117.114) Quit (Remote host closed the connection)
[9:07] * karnan (~karnan@223.229.156.182) has joined #ceph
[9:08] * ketor (~ketor@182.48.117.114) has joined #ceph
[9:09] * derjohn_mob (~aj@tmo-111-165.customers.d1-online.com) has joined #ceph
[9:10] * analbeard (~shw@support.memset.com) has joined #ceph
[9:12] * bitserker (~toni@88.87.194.130) has joined #ceph
[9:14] * stiopa (~stiopa@cpc73828-dals21-2-0-cust630.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[9:14] * sleinen (~Adium@2001:620:0:2d:7ed1:c3ff:fedc:3223) Quit (Quit: Leaving.)
[9:15] * dgurtner (~dgurtner@178.197.231.155) has joined #ceph
[9:24] * Thayli (~Tumm@5NZAADART.tor-irc.dnsbl.oftc.net) Quit ()
[9:27] * yguang11_ (~yguang11@2001:4998:effd:7801::10a0) Quit (Ping timeout: 480 seconds)
[9:27] * oro (~oro@84-72-20-79.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[9:30] * roland_ (~roland@asselijn.lorentz.leidenuniv.nl) Quit (Remote host closed the connection)
[9:31] * jyoti-ranjan (~ranjanj@idp01webcache4-z.apj.hpecore.net) has joined #ceph
[9:33] * Concubidated (~Adium@71.21.5.251) Quit (Quit: Leaving.)
[9:34] * johnmce (~oftc-webi@netserv2.netserv.nsdsl.net) has joined #ceph
[9:35] <johnmce> Hi, I've got a Ceph cluster down right now - people screaming. Can anyone give advice?
[9:35] * karnan (~karnan@223.229.156.182) Quit (Ping timeout: 480 seconds)
[9:35] <johnmce> One ceph osd host said something about internal memory corruption, so I shut it down.
[9:36] <johnmce> I've got PGs down+peering
[9:36] * treenerd (~treenerd@85.193.140.98) has joined #ceph
[9:36] <johnmce> Example: pg 14.598 is stuck unclean since forever, current state down+peering, last acting [13,33]
[9:36] <johnmce> Example2: pg 14.91d is down+peering, acting [20,36]
[9:37] <vikhyat> johnmce: osd_pool_default_size ?
[9:37] <vikhyat> johnmce: in ceph.conf I think 2
[9:38] <vikhyat> johnmce: osd_pool_default_min_size ?
[9:38] <johnmce> vikhyat: I dropped the min_size to 1 on several pools, although not all
[9:39] <johnmce> default size is either 2 or 3 for the different pools
[9:39] <vikhyat> johnmce: in ceph.conf ?
[9:39] <vikhyat> johnmce: or you have used ceph tell command ?
[9:39] <johnmce> No, I did this: ceph osd pool set cinder-ceph min_size 1
[9:40] <vikhyat> johnmce: did you set that osd out
[9:40] <vikhyat> means osds in that node
[9:40] <johnmce> No, should I do that?
[9:41] * jyoti-ranjan (~ranjanj@idp01webcache4-z.apj.hpecore.net) Quit (Ping timeout: 480 seconds)
[9:42] * derjohn_mob (~aj@tmo-111-165.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[9:42] <johnmce> vikhyat: Should I set the osds to out?
[9:44] <vikhyat> johnmce: there are two things
[9:45] <vikhyat> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#stopping-w-out-rebalancing
[9:45] * thomnico (~thomnico@2a01:e35:8b41:120:c4da:8f3f:b541:fb3f) has joined #ceph
[9:45] <vikhyat> either you can do that if you do not want rebalance of data
[9:46] <vikhyat> http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
[9:46] <vikhyat> or this out will declare the osd out of the cluster and then rebalance will start
[9:46] <vikhyat> HTH
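A condensed sketch of the two options vikhyat links above; the osd id is a placeholder, and the commands are the standard ones from those docs.

    # option 1: stop rebalancing while the broken host is being repaired
    ceph osd set noout
    # ...fix/restart the host, then re-enable rebalancing
    ceph osd unset noout

    # option 2: declare the failed osds out so their data is rebuilt elsewhere
    ceph osd out 12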
[9:48] * fsimonce (~simon@host253-71-dynamic.3-87-r.retail.telecomitalia.it) has joined #ceph
[9:49] <johnmce> vikhyat: noout was not set, so backfilling is well underway. Are rebalancing and backfilling the same thing?
[9:49] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[9:50] <vikhyat> johnmce: you can say they are the same; once an osd goes down in any replica set, that set needs to backfill the down osd's data from some other osd
[9:52] * vbellur (~vijay@121.244.87.124) Quit (Ping timeout: 480 seconds)
[9:52] <vikhyat> if you want to check for issues you can run pg query on pgs which are in a bad state
[9:52] <johnmce> vikhyat: The OSDs are already down and backfilling is happening. Are you saying that the stuck+peering problem will go away if I mark the OSDs as out?
[9:53] <johnmce> vikhyat: When I run this "ceph pg 14.91d query" I wait five minutes and get no response
[9:53] * rotbart (~redbeard@2a02:908:df10:d300:6267:20ff:feb7:c20) has joined #ceph
[9:54] * tallest_red (~Uniju@46.182.106.190) has joined #ceph
[9:55] <johnmce> vikhyat: These are my pgs that are down: http://pastebin.com/nHTh94XY
[9:55] <johnmce> vikhyat: So a number of pg 14.xxx are down+peering
[9:56] <johnmce> vikhyat: And a query give no info
[9:56] <vikhyat> johnmce: are the osds in these pgs' acting sets down?
[9:57] * mdxi (~mdxi@50-199-109-154-static.hfc.comcastbusiness.net) Quit (Quit: leaving)
[9:57] <johnmce> vikhyat: Not sure how to answer that. How do I find that?
[9:57] <vikhyat> last acting [32,26]
[9:57] <vikhyat> these are nothing but osd numbers
[9:58] <vikhyat> which means you have replica count 2
[9:58] <vikhyat> for these pgs
[9:59] * treenerd_ (~treenerd@85.193.140.98) has joined #ceph
[10:00] * sleinen (~Adium@130.59.94.173) has joined #ceph
[10:00] <johnmce> vikhyat: None of the OSDs I see as last acting are down
[10:01] * johnmce (~oftc-webi@netserv2.netserv.nsdsl.net) Quit (Remote host closed the connection)
[10:01] * sleinen1 (~Adium@2001:620:0:82::102) has joined #ceph
[10:03] * ChrisNBlum (~ChrisNBlu@dhcp-ip-230.dorf.rwth-aachen.de) has joined #ceph
[10:04] * johnmce (~oftc-webi@netserv2.netserv.nsdsl.net) has joined #ceph
[10:04] <johnmce> vikhyat: Sorry I went offline. Do you have any advice?
[10:06] * vbellur (~vijay@121.244.87.117) has joined #ceph
[10:07] <johnmce> Can anyone help me get my ceph cluster working again? I've got a number of OSDs on a single host down and out and I've got 6 PGs down+peering.
[10:08] <johnmce> List of down PGs: http://pastebin.com/nHTh94XY
[10:08] * sleinen (~Adium@130.59.94.173) Quit (Ping timeout: 480 seconds)
[10:08] * RomeroJnr (~h0m3r@hosd.leaseweb.net) has joined #ceph
[10:11] * daviddcc (~dcasier@77.151.197.84) has joined #ceph
[10:15] <johnmce> Please can anyone help me with my ceph cluster that is completely non-functional? I've been down for hours and am getting nowhere.
[10:15] <johnmce> I've got a number of OSDs on a single host down and out and I've got 6 PGs down+peering.
[10:17] * ketor (~ketor@182.48.117.114) Quit (Remote host closed the connection)
[10:17] * ketor (~ketor@182.48.117.114) has joined #ceph
[10:20] <Nats_> perhaps a stupid question, but is your min_size set to 1
[10:20] <johnmce> Please, can anyone help. Running "ceph pg 14.91d query" against my down+peering PG just hangs. What causes that?
[10:20] <johnmce> I did drop the min_size to 1. Is that a problem?
[10:20] <Nats_> no thats what it should be
[10:20] <johnmce> We are talking min_size on the pools?
[10:21] <Nats_> (if you have 2 replicas, which is what it appears)
[10:21] <Nats_> what does ceph osd tree look like
[10:21] <johnmce> Some pools have 2 and some 3.
[10:22] <johnmce> Ceph OSD tree : http://pastebin.com/qFBQzta4
[10:22] <Nats_> seems like crush map is wrong if that causes a problem
[10:23] <Nats_> not that it helps u now
[10:23] <Nats_> and what about ceph pg dump | grep down
[10:24] * tallest_red (~Uniju@3DDAAAPJM.tor-irc.dnsbl.oftc.net) Quit ()
[10:24] * kalmisto (~Malcovent@192.42.116.16) has joined #ceph
[10:25] <johnmce> List of down PGs: http://pastebin.com/nHTh94XY
[10:26] <johnmce> How do I find out which pool the down PGs belong to?
[10:26] <Nats_> the pg number corresponds to the pool
[10:26] <Nats_> pg 14.91d means pool 14
[10:27] <Nats_> ceph --id compute osd pool stats is one way to map that to a name, there's presumably others
[10:28] <Nats_> err, drop the --id part
[10:28] <Nats_> just 'ceph osd pool stats'
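Putting that together, one way to go from a stuck PG id to its pool (the pool id 14 comes from the PG ids above; the pool name in the comment is only an example):

    ceph pg dump_stuck inactive      # list the pgs stuck in down/peering
    ceph osd lspools                 # e.g. "..., 14 cinder-ceph, ..."
    ceph osd pool stats              # per-pool io/recovery stats, keyed by name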
[10:28] * daviddcc (~dcasier@77.151.197.84) Quit (Ping timeout: 480 seconds)
[10:29] * joao (~joao@bl16-158-23.dsl.telepac.pt) has joined #ceph
[10:29] * ChanServ sets mode +o joao
[10:31] <johnmce> Nats_: OK. I've double-checked that pool and min_size is 1, yet the PGs are still stuck+peering. Any idea what my course of action should be?
[10:34] * fdmanana__ (~fdmanana@bl13-138-253.dsl.telepac.pt) has joined #ceph
[10:34] * linjan (~linjan@213.8.240.146) has joined #ceph
[10:35] <Nats_> ceph pg dump | grep down will give you two sets of id's, where it should be and where it currently is
[10:35] <Nats_> tbh my first strategy would be to look at how to get your host back up, since in a correct config you'd not have any issues
[10:36] <Nats_> if recovery is actually making progress, you can perhaps just nurse it through recovery
[10:36] <Nats_> good luck :|
[10:39] <johnmce> Nats_: thanks for your help. I'm going to try to bring that host back up. It complained of something "corrupted in memory", which is why I took it down
[10:41] <johnmce> Nats_: This was the error on the host "XFS (bcache3): Corruption of in-memory data detected. Shutting down filesystem". That was after a reboot and that's when I shut it down.
[10:43] <johnmce> Nats_: The host is a Dell server with ECC memory
[10:43] * DV_ (~veillard@2001:41d0:1:d478::1) Quit (Remote host closed the connection)
[10:46] * oro (~oro@2001:620:20:16:b5ff:10b9:d8a8:b807) has joined #ceph
[10:52] * jclm (~jclm@211.177.245.3) has joined #ceph
[10:54] * kalmisto (~Malcovent@9S0AAAOGM.tor-irc.dnsbl.oftc.net) Quit ()
[10:54] * Rosenbluth (~toast@tor-amici-exit.tritn.com) has joined #ceph
[10:56] * joao (~joao@bl16-158-23.dsl.telepac.pt) Quit (Quit: Changing server)
[10:56] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[10:57] * joao (~joao@bl16-158-23.dsl.telepac.pt) has joined #ceph
[10:57] * ChanServ sets mode +o joao
[10:58] * karnan (~karnan@223.229.156.182) has joined #ceph
[11:06] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) Quit (Quit: Leaving.)
[11:11] * fxmulder (~fxmulder@cpe-24-55-6-128.austin.res.rr.com) Quit (Remote host closed the connection)
[11:11] * fxmulder (~fxmulder@cpe-24-55-6-128.austin.res.rr.com) has joined #ceph
[11:16] * fam is now known as fam_away
[11:22] * fdmanana__ (~fdmanana@bl13-138-253.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[11:23] * lovejoy (~lovejoy@57519dc8.skybroadband.com) has joined #ceph
[11:23] * lovejoy (~lovejoy@57519dc8.skybroadband.com) Quit ()
[11:24] * Rosenbluth (~toast@9S0AAAOHS.tor-irc.dnsbl.oftc.net) Quit ()
[11:24] * AGaW (~Heliwr@bolobolo2.torservers.net) has joined #ceph
[11:31] * fdmanana__ (~fdmanana@bl13-138-253.dsl.telepac.pt) has joined #ceph
[11:32] * Nacer_ (~Nacer@203-206-190-109.dsl.ovh.fr) Quit (Remote host closed the connection)
[11:43] * derjohn_mob (~aj@2001:6f8:1337:0:2db2:14bc:2d74:477a) has joined #ceph
[11:47] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[11:54] * kefu (~kefu@60.247.111.66) has joined #ceph
[11:54] * AGaW (~Heliwr@8Q4AABAII.tor-irc.dnsbl.oftc.net) Quit ()
[11:54] * Szernex (~ggg@8Q4AABAJO.tor-irc.dnsbl.oftc.net) has joined #ceph
[12:05] <raw> i wrote yesterday that i have reproducible file corruptions when using cephfs. those were only reproducible with 30GB+ files and only when streaming the file from a 3rd machine at maximum speed.
[12:06] <raw> i was not able to reproduce it by copying the file using a 50% speed throttle or by copying it from a local drive (but that may also be speed related)
[12:06] <raw> after some testing, the problem also does not occur when i have no OSDs running on the machine where cephfs is mounted
[12:06] <raw> also, mounting cephfs with option wsize=32768 makes the problem disappear
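For reference, the workaround raw mentions corresponds to a kernel-client mount roughly like the sketch below (monitor address, mount point and secret file are placeholders); wsize caps the size of a single write request from the kernel client.

    mount -t ceph 192.168.0.10:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret,wsize=32768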
[12:07] * perpetualrabbit (~perpetual@asselijn.lorentz.leidenuniv.nl) has joined #ceph
[12:07] * shohn (~shohn@dslb-094-223-165-016.094.223.pools.vodafone-ip.de) has joined #ceph
[12:11] * shohn1 (~shohn@dslb-094-223-167-060.094.223.pools.vodafone-ip.de) Quit (Ping timeout: 480 seconds)
[12:12] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[12:14] * lucas1 (~Thunderbi@218.76.52.64) Quit (Quit: lucas1)
[12:19] * jyoti-ranjan (~ranjanj@idp01webcache4-z.apj.hpecore.net) has joined #ceph
[12:24] * Szernex (~ggg@8Q4AABAJO.tor-irc.dnsbl.oftc.net) Quit ()
[12:24] <raw> my ceph is dreaming: recovery io 14577 MB/s, 3644 objects/s client io 101814 MB/s rd, 102 GB/s wr, 2533 kop/s :)
[12:26] * badone (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) Quit (Ping timeout: 480 seconds)
[12:29] * badone (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) has joined #ceph
[12:32] <smerz> 102 GB/s write?
[12:32] <smerz> wow
[12:34] <smerz> do you see 102 GB/s over the network ?
[12:42] <sugoruyo> smerz: I think that's why raw said ceph is dreaming...
[12:42] * karnan (~karnan@223.229.156.182) Quit (Ping timeout: 480 seconds)
[12:44] <smerz> yeah
[12:50] * kefu (~kefu@60.247.111.66) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[12:51] * kefu (~kefu@107.191.53.55) has joined #ceph
[12:53] * kefu_ (~kefu@60.247.111.66) has joined #ceph
[12:54] * jyoti_ranjan (~ranjanj@idp01webcache4-z.apj.hpecore.net) has joined #ceph
[12:54] * jyoti-ranjan (~ranjanj@idp01webcache4-z.apj.hpecore.net) Quit (Read error: Connection reset by peer)
[12:55] * karnan (~karnan@106.51.135.178) has joined #ceph
[12:58] * treenerd (~treenerd@85.193.140.98) Quit (Ping timeout: 480 seconds)
[12:58] * treenerd_ (~treenerd@85.193.140.98) Quit (Ping timeout: 480 seconds)
[12:59] * kefu (~kefu@107.191.53.55) Quit (Ping timeout: 480 seconds)
[13:06] * Mika_c (~Mk@114-38-51-24.dynamic.hinet.net) has joined #ceph
[13:07] * treenerd (~treenerd@85.193.140.98) has joined #ceph
[13:10] * treenerd_ (~treenerd@85.193.140.98) has joined #ceph
[13:10] <treenerd> hi ceph community, what happens if the journal size is too small and the partition is full?
[13:11] <treenerd> Just a comprehension question.
[13:11] * kefu_ (~kefu@60.247.111.66) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[13:13] * ira (~ira@c-71-233-225-22.hsd1.ma.comcast.net) has joined #ceph
[13:13] * kefu (~kefu@60.247.111.66) has joined #ceph
[13:15] * ira (~ira@c-71-233-225-22.hsd1.ma.comcast.net) Quit ()
[13:17] * ira (~ira@c-71-233-225-22.hsd1.ma.comcast.net) has joined #ceph
[13:20] * nils_ (~nils@doomstreet.collins.kg) has joined #ceph
[13:20] * jyoti_ranjan (~ranjanj@idp01webcache4-z.apj.hpecore.net) Quit (Read error: Connection reset by peer)
[13:20] * jyoti_ranjan (~ranjanj@idp01webcache4-z.apj.hpecore.net) has joined #ceph
[13:21] * zhaochao (~zhaochao@125.39.8.226) Quit (Quit: ChatZilla 0.9.91.1 [Iceweasel 38.0.1/20150526223604])
[13:23] * bene (~ben@c-24-60-237-191.hsd1.nh.comcast.net) has joined #ceph
[13:25] * karnan (~karnan@106.51.135.178) Quit (Ping timeout: 480 seconds)
[13:26] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[13:28] * badone (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) Quit (Ping timeout: 480 seconds)
[13:29] * Izanagi (~Tralin|Sl@178-175-128-50.ip.as43289.net) has joined #ceph
[13:31] * flisky (~Thunderbi@106.39.60.34) Quit (Quit: flisky)
[13:32] * wschulze (~wschulze@cpe-69-206-242-231.nyc.res.rr.com) has joined #ceph
[13:36] <tuxcraft1r> http://paste.debian.net/203602/
[13:36] * jyoti_ranjan (~ranjanj@idp01webcache4-z.apj.hpecore.net) Quit (Read error: Connection reset by peer)
[13:36] <tuxcraft1r> ^ hi everyone, can somebody take a look at my pastebin
[13:36] <tuxcraft1r> i use ceph-deploy osd create to make osds
[13:37] <tuxcraft1r> but not all of them are activated (some are), and on reboot I got errors
[13:39] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[13:40] * johnmce (~oftc-webi@netserv2.netserv.nsdsl.net) Quit (Quit: Page closed)
[13:44] <treenerd> Hi tuxcraft1r, which ceph-deploy version did you use, and which distro?
[13:45] <treenerd> Ah i see debian jessie
[13:46] * kefu is now known as kefu|afk
[13:47] * ChrisNBl_ (~ChrisNBlu@dhcp-ip-190.dorf.rwth-aachen.de) has joined #ceph
[13:50] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[13:51] * peem (~oftc-webi@office.forlinux.co.uk) has joined #ceph
[13:52] <peem> hi. just created a ceph cluster with 5 osds, but health shows health_warn. Any hints on where to start looking at what is wrong ?
[13:53] <vikhyat> peem ceph -s
[13:53] * Hemanth (~Hemanth@121.244.87.117) Quit (Quit: Leaving)
[13:54] * ChrisNBlum (~ChrisNBlu@dhcp-ip-230.dorf.rwth-aachen.de) Quit (Ping timeout: 480 seconds)
[13:55] <peem> vikhyat: looked there, still makes little sense to me.
[13:56] * ganders (~root@190.2.42.21) has joined #ceph
[13:56] * jks (~jks@178.155.151.121) Quit (Ping timeout: 480 seconds)
[13:56] <peem> vikhyat: I can see bunch of pgs being degraded, peering, stale, stuck, and not sure what it means.
[13:57] <vikhyat> peem http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
[13:57] <vikhyat> peem: HTH !
[13:59] * Izanagi (~Tralin|Sl@5NZAADA2E.tor-irc.dnsbl.oftc.net) Quit ()
[13:59] * cmrn (~luckz@destiny.enn.lu) has joined #ceph
[14:00] * kefu|afk (~kefu@60.247.111.66) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[14:00] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Read error: Connection reset by peer)
[14:01] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[14:01] * rotbart (~redbeard@2a02:908:df10:d300:6267:20ff:feb7:c20) Quit (Quit: Leaving)
[14:02] <peem> vikhyat: would issues with hardware be the reason for such a thing ? It is a test system for demonstration, and I had to use some old hardware, so they may be bad disks.
[14:02] * TMM_ (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[14:03] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Ping timeout: 480 seconds)
[14:03] <vikhyat> peem: could be, but then your osds would be down
[14:03] <vikhyat> peem: check ceph osd tree command output
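The usual first-look commands for a HEALTH_WARN like this one, as a quick reference:

    ceph -s              # overall status and pg state summary
    ceph health detail   # exactly which pgs/osds the warning refers to
    ceph osd tree        # which osds exist, and whether they are up or down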
[14:04] * alfredodeza (~alfredode@198.206.133.89) has left #ceph
[14:14] * ganders (~root@190.2.42.21) Quit (Quit: WeeChat 0.4.2)
[14:21] <peem> vikhyat: thanks, the tree had many more osds, I think leftovers from previous prepare/activate commands which failed .. zap'd the drives and starting from scratch with osds
[14:22] <vikhyat> peem: great so now you can debug !
[14:22] * finster (~finster@2a01:4f8:d15:1000::2) has left #ceph
[14:23] * jskinner (~jskinner@host-95-2-129.infobunker.com) has joined #ceph
[14:23] <peem> vikhyat: already started with new osds, but will be more careful and spot issues like that quicker.
[14:24] * RomeroJnr (~h0m3r@hosd.leaseweb.net) Quit ()
[14:26] <tuxcraft1r> someone?
[14:29] * cmrn (~luckz@8Q4AABANK.tor-irc.dnsbl.oftc.net) Quit ()
[14:32] * itsjpr (~imjpr@thing2.it.uab.edu) has joined #ceph
[14:32] * imjpr (~imjpr@thing2.it.uab.edu) has joined #ceph
[14:34] * wschulze1 (~wschulze@cpe-69-206-242-231.nyc.res.rr.com) has joined #ceph
[14:39] <smerz> tuxcraft1r, from the POC i tried: "invalid (someone else's?) journal". i got that after moving the journal to a ramdisk (for testing purposes ofc!). there is a command to reinitialize an empty journal.
[14:40] <smerz> the journal location. did you use the default ? or some other custom location(s)
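The command smerz is alluding to is presumably ceph-osd's --mkjournal; a hedged sketch for a single osd (id 3 is a placeholder), run only with the daemon stopped, and with --flush-journal first if the old journal is still readable:

    service ceph stop osd.3          # sysvinit; use your init system's equivalent
    ceph-osd -i 3 --flush-journal    # only if the existing journal is intact
    ceph-osd -i 3 --mkjournal        # create a fresh, empty journal
    service ceph start osd.3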
[14:40] * wschulze (~wschulze@cpe-69-206-242-231.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[14:41] <peem> one more question, though. I can see an "osd down" command, but not "osd up". My osd seems to be going down by itself, not sure why, so I wanted to bring it back up..
[14:43] * ketor (~ketor@182.48.117.114) Quit (Remote host closed the connection)
[14:44] * squ (~Thunderbi@46.109.36.167) Quit (Quit: squ)
[14:44] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has joined #ceph
[14:45] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has left #ceph
[14:48] * ChrisNBl_ (~ChrisNBlu@dhcp-ip-190.dorf.rwth-aachen.de) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[14:54] * marrusl (~mark@cpe-24-90-46-248.nyc.res.rr.com) Quit (Quit: bye!)
[14:54] * marrusl (~mark@cpe-24-90-46-248.nyc.res.rr.com) has joined #ceph
[14:55] * deepsa (~Deependra@00013525.user.oftc.net) Quit (Quit: Textual IRC Client: www.textualapp.com)
[14:57] * KevinPerks (~Adium@cpe-75-177-32-35.triad.res.rr.com) has joined #ceph
[14:58] * mdxi (~mdxi@50-199-109-154-static.hfc.comcastbusiness.net) has joined #ceph
[14:59] * ChauffeR1 (~Defaultti@tor-exit1.arbitrary.ch) has joined #ceph
[15:00] * KevinPerks1 (~Adium@cpe-75-177-32-14.triad.res.rr.com) has joined #ceph
[15:00] * vikhyat is now known as vikhyat|brb
[15:02] <janos_> peem, the cluster determines when it is considered up
[15:02] <janos_> you can mark it in and out
[15:02] * kefu (~kefu@60.247.111.66) has joined #ceph
[15:02] <janos_> and once in, if the cluster is happy with it, it can be marked up
[15:03] <peem> janos_: ok, that makes sense... looks like my issues may be with the firewall, so looking at it now.
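In command form (osd id 7 is a placeholder): in/out is the flag an admin toggles, while up/down is decided by the cluster once the daemon is running and heartbeating, as janos_ describes.

    ceph osd out 7    # stop placing data on osd.7
    ceph osd in 7     # allow data on it again; it shows "up" once the cluster accepts it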
[15:03] * treenerd_ (~treenerd@85.193.140.98) Quit (Ping timeout: 480 seconds)
[15:03] * treenerd (~treenerd@85.193.140.98) Quit (Ping timeout: 480 seconds)
[15:05] * KevinPerks (~Adium@cpe-75-177-32-35.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[15:06] * vbellur (~vijay@121.244.87.117) Quit (Ping timeout: 480 seconds)
[15:07] * sjm (~sjm@pool-173-70-76-86.nwrknj.fios.verizon.net) has joined #ceph
[15:11] * treenerd_ (~treenerd@85.193.140.98) has joined #ceph
[15:12] * treenerd (~treenerd@85.193.140.98) has joined #ceph
[15:16] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[15:18] * tupper (~tcole@2001:420:2280:1272:8900:f9b8:3b49:567e) has joined #ceph
[15:19] * vbellur (~vijay@121.244.87.124) has joined #ceph
[15:20] * jrankin (~jrankin@d53-64-170-236.nap.wideopenwest.com) has joined #ceph
[15:25] * primechuck (~primechuc@host-95-2-129.infobunker.com) has joined #ceph
[15:25] * primechuck (~primechuc@host-95-2-129.infobunker.com) Quit (Remote host closed the connection)
[15:25] * primechuck (~primechuc@host-95-2-129.infobunker.com) has joined #ceph
[15:25] * jyoti-ranjan (~ranjanj@idp01webcache5-z.apj.hpecore.net) has joined #ceph
[15:26] * dyasny (~dyasny@198.251.58.23) has joined #ceph
[15:27] <tuxcraft1r> smerz: the deploy list command shows that deploy create made two partitions and used one for data and one for journal
[15:29] * ChauffeR1 (~Defaultti@3OZAAB2XG.tor-irc.dnsbl.oftc.net) Quit ()
[15:30] * KevinPerks (~Adium@cpe-75-177-32-14.triad.res.rr.com) has joined #ceph
[15:30] * KevinPerks1 (~Adium@cpe-75-177-32-14.triad.res.rr.com) Quit (Read error: Connection reset by peer)
[15:36] * daviddcc (~dcasier@LCaen-656-1-144-187.w217-128.abo.wanadoo.fr) has joined #ceph
[15:41] * harold (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) has joined #ceph
[15:48] * treenerd_ (~treenerd@85.193.140.98) Quit (Quit: Verlassend)
[15:48] * treenerd (~treenerd@85.193.140.98) Quit (Remote host closed the connection)
[15:49] * vikhyat|brb is now known as vikhyat
[15:50] * dephcon (~oftc-webi@c73-110.rim.net) has joined #ceph
[15:50] * linuxkidd_ (~linuxkidd@17.sub-70-215-195.myvzw.com) has joined #ceph
[15:52] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[15:54] * amote (~amote@121.244.87.116) Quit (Quit: Leaving)
[15:54] * vbellur (~vijay@121.244.87.124) Quit (Ping timeout: 480 seconds)
[15:54] * overclk (~overclk@121.244.87.117) Quit (Quit: Leaving)
[15:55] * alphe (~alphe@0001ac6f.user.oftc.net) has joined #ceph
[15:56] <alphe> one of my osds suddenly crashed
[15:56] <alphe> and now when I try to start that particular osd it crashes after a moment
[15:56] <alphe> with a ton of dumps and failed assertions
[15:57] * ChrisNBlum (~ChrisNBlu@dhcp-ip-190.dorf.rwth-aachen.de) has joined #ceph
[15:57] <alphe> using 0.94.1, anyone have a clue what is happening to my poor osd?
[15:57] <alphe> the disk is available and it seems I can browse it
[15:59] * sleinen (~Adium@2001:620:0:2d:7ed1:c3ff:fedc:3223) has joined #ceph
[15:59] * sjm (~sjm@pool-173-70-76-86.nwrknj.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[16:00] <raw> alphe, does dmesg tell you something?
[16:01] <alphe> indeed, didn't think to look there
[16:01] <alphe> seems like an xfs problem
[16:02] <alphe> I have an XFS (mydrive) metadata I/O error: block etc.. (xfs_trans_read_buf_map)
[16:02] <alphe> should I run xfs_repair ?
[16:03] <raw> if the rest of your cluster is healthy, i would just reformat that osd
[16:04] <alphe> ceph-deploy zap host:/thedisk ?
[16:06] * sleinen1 (~Adium@2001:620:0:82::102) Quit (Ping timeout: 480 seconds)
[16:06] <alphe> raw can it be done with ceph-disk ?
[16:08] <raw> alphe, i'm doing it using ceph-deploy, which uses ceph-disk, so i think yes. you have to completely remove the osd from your cluster manually (see http://ceph.com/docs/master/rados/operations/add-or-rm-osds/ )
[16:08] <alphe> ok I understand how to do it now thank you :)
[16:09] <alphe> I was wondering if there were a way that didn't mean erasing the osd and recreating it
[16:09] * blynch (~blynch@vm-nat.msi.umn.edu) has joined #ceph
[16:09] <raw> after it is completely removed, you can redeploy that osd like you did the first time you set it up. it will get the next free osd id - which should be the one you just freed up by removing the osd.
[16:10] <raw> alphe, yes, you can try fsck and stuff and repair it. i just prefer to reformat to be 100% sure that everything is correct afterwards
[16:10] <alphe> yeah
[16:11] <alphe> that is the faster way too
[16:11] * flisky (~Thunderbi@118.186.147.15) has joined #ceph
[16:11] * flisky (~Thunderbi@118.186.147.15) Quit ()
[16:11] * alram (~alram@192.41.52.12) has joined #ceph
[16:11] <alphe> how would I know if my disk is getting bad sectors ? Do I need to monitor smart on my nodes apart from ceph ?
[16:11] <alphe> smartctl ?
[16:13] <raw> yes, use smartctl -A or --all on that disk and see if something looks suspicious
[16:13] <raw> also run a --test=long on that disk to see if it got any bad sectors
[16:13] <alphe> ok will unmount first then smartctl the drive
[16:13] <alphe> and then remove the osd, zap the disk, and recreate it with ceph-deploy
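That plan, spelled out as a sketch (osd id, host and device names are placeholders; the removal steps follow the add-or-rm-osds doc linked earlier):

    ceph osd out 12
    service ceph stop osd.12          # on the osd host
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12
    # then wipe the disk and redeploy
    ceph-deploy disk zap node3:/dev/sdd
    ceph-deploy osd create node3:/dev/sdd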
[16:14] * jyoti-ranjan (~ranjanj@idp01webcache5-z.apj.hpecore.net) Quit (Read error: Connection reset by peer)
[16:16] * wushudoin (~wushudoin@38.140.108.2) has joined #ceph
[16:16] <raw> xfs is very stable, it's unlikely that it will destroy itself... so if the disk is not corrupt, there is possibly another reason like bad memory, a bad sata cable, a bad disk controller or something.
[16:17] <alphe> I have smart errors on the disk that smells bad
[16:17] <raw> yeah
[16:18] <alphe> but it would be fun if ceph, like a sas raid card, warned when there is a smart alert... well, not really ceph's purpose ...
[16:20] <raw> would make life easier. ceph could watch the sata state of drives and warn about bad health while adjusting the osd weight just in case :)
[16:24] * rlrevell (~leer@vbo1.inmotionhosting.com) has joined #ceph
[16:25] * shaunm (~shaunm@74.215.76.114) Quit (Ping timeout: 480 seconds)
[16:27] * sleinen (~Adium@2001:620:0:2d:7ed1:c3ff:fedc:3223) Quit (Quit: Leaving.)
[16:33] <alphe> raw exactly, and then ceph creates its first terminator and we are all in trouble, I mean the sysadmin IT guys ...
[16:33] <alphe> with a terminator in your data center, no need for any IT guys
[16:34] <raw> ceph will then call you at night and demand new hard drives
[16:37] * moore (~moore@97-124-123-201.phnx.qwest.net) has joined #ceph
[16:42] * branto (~borix@ip-213-220-214-203.net.upcbroadband.cz) has left #ceph
[16:43] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) has joined #ceph
[16:45] * kawa2014 (~kawa@89.184.114.246) Quit (Quit: Leaving)
[16:46] * kawa2014 (~kawa@2001:67c:1560:8007::aac:c1a6) has joined #ceph
[16:50] * jashank42 (~jashan42@117.197.170.152) has joined #ceph
[16:52] <rlrevell> is anyone else getting 404 errors from the ceph package repos?
[16:52] <rlrevell> http://gitbuilder.ceph.com/apache2-deb-trusty-x86_64-basic/ref/master/dists/trusty/main/binary-amd64/Packages for example
[16:53] * jashank42 (~jashan42@117.197.170.152) Quit (Remote host closed the connection)
[16:54] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) Quit (Quit: Leaving.)
[16:55] * kefu (~kefu@60.247.111.66) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[16:56] * kawa2014 (~kawa@2001:67c:1560:8007::aac:c1a6) Quit (Quit: Leaving)
[16:56] * debian112 (~bcolbert@24.126.201.64) has joined #ceph
[16:56] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[16:58] * kawa2014 (~kawa@2001:67c:1560:8007::aac:c1a6) has joined #ceph
[16:58] * jashank42 (~jashan42@117.207.177.7) has joined #ceph
[16:59] * luigiman (~Catsceo@176.10.104.240) has joined #ceph
[16:59] * TMM_ (~hp@sams-office-nat.tomtomgroup.com) Quit (Quit: Ex-Chat)
[16:59] * nils_ (~nils@doomstreet.collins.kg) Quit (Quit: This computer has gone to sleep)
[16:59] * kefu (~kefu@60.247.111.66) has joined #ceph
[17:02] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[17:07] * ksperis (~ksperis@46.218.42.103) Quit (Remote host closed the connection)
[17:17] * yguang11 (~yguang11@nat-dip30-wl-d.cfw-a-gci.corp.yahoo.com) has joined #ceph
[17:20] * ChrisNBlum (~ChrisNBlu@dhcp-ip-190.dorf.rwth-aachen.de) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[17:20] * daviddcc (~dcasier@LCaen-656-1-144-187.w217-128.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[17:21] * derjohn_mob (~aj@2001:6f8:1337:0:2db2:14bc:2d74:477a) Quit (Ping timeout: 480 seconds)
[17:24] <kapil_> a quick question on krbd. I mapped an rbd image; after I reboot the host, should the mapped rbd device come up on its own ?
[17:24] <kapil_> In my case I don't see the mapped rbd device after I reboot
[17:24] * shang (~ShangWu@175.41.48.77) Quit (Remote host closed the connection)
[17:25] * reed (~reed@198.8.80.61) has joined #ceph
[17:26] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) has joined #ceph
[17:28] * sleinen1 (~Adium@2001:620:0:82::102) has joined #ceph
[17:29] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) Quit (Read error: Connection reset by peer)
[17:29] * luigiman (~Catsceo@7R2AABIJK.tor-irc.dnsbl.oftc.net) Quit ()
[17:29] * alphe (~alphe@0001ac6f.user.oftc.net) Quit (Quit: Leaving)
[17:31] * kefu (~kefu@60.247.111.66) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[17:32] * jrocha (~jrocha@vagabond.cern.ch) Quit (Quit: Leaving)
[17:36] * sleinen1 (~Adium@2001:620:0:82::102) Quit (Ping timeout: 480 seconds)
[17:37] * oro (~oro@2001:620:20:16:b5ff:10b9:d8a8:b807) Quit (Ping timeout: 480 seconds)
[17:38] * wschulze1 (~wschulze@cpe-69-206-242-231.nyc.res.rr.com) Quit (Quit: Leaving.)
[17:38] * danieagle (~Daniel@177.9.74.223) has joined #ceph
[17:43] * vbellur (~vijay@122.172.39.207) has joined #ceph
[17:44] * kawa2014 (~kawa@2001:67c:1560:8007::aac:c1a6) Quit (Ping timeout: 480 seconds)
[17:46] * kefu (~kefu@60.247.111.66) has joined #ceph
[17:46] * joao (~joao@bl16-158-23.dsl.telepac.pt) Quit (Read error: Connection reset by peer)
[17:48] * daviddcc (~dcasier@LCaen-656-1-144-187.w217-128.abo.wanadoo.fr) has joined #ceph
[17:49] * joao (~joao@bl16-158-23.dsl.telepac.pt) has joined #ceph
[17:49] * ChanServ sets mode +o joao
[17:51] <rkeene> Ceph emits invalid XML
[17:52] <rkeene> ceph osd dump --format xml yields: ...<blacklist><10.0.98.71:0/1004094>2015-06-05 16:18:37.789683</10.0.98.71:0/1004094></blacklist>...
[17:52] <raw> kapil_, i think mapped rbds are gone after reboot
[17:53] <rkeene> Where XML tags may not contain the characters [:/] and may not start with a number. See the XML specification here: http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name
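Until that is fixed, the JSON output is the safer machine-readable form; a small sketch (jq is an assumption, and the blacklist field name is taken from the XML shown above):

    ceph osd dump --format json-pretty
    ceph osd dump --format json | jq '.blacklist'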
[17:54] <raw> kapil_, maybe this helps you: http://www.sebastien-han.fr/blog/2013/11/22/map-slash-unmap-rbd-device-on-boot-slash-shutdown/
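The approach in that post amounts to what the rbdmap helper shipped with many ceph packages does; a sketch with placeholder pool, image and client names (availability and service name vary by distro):

    # /etc/ceph/rbdmap
    rbd/myimage    id=admin,keyring=/etc/ceph/ceph.client.admin.keyring

    # map the listed images now, and at every boot via the init script/unit
    service rbdmap start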
[17:54] * bandrus (~brian@55.sub-70-214-35.myvzw.com) has joined #ceph
[17:56] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[17:56] * zaitcev (~zaitcev@2001:558:6001:10:61d7:f51f:def8:4b0f) has joined #ceph
[17:58] * kefu (~kefu@60.247.111.66) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[17:59] * Tralin|Sleep (~Corneliou@marcuse-2.nos-oignons.net) has joined #ceph
[17:59] * bitserker (~toni@88.87.194.130) Quit (Quit: Leaving.)
[18:01] * kawa2014 (~kawa@89.184.114.246) Quit ()
[18:01] * jashank42 (~jashan42@117.207.177.7) Quit (Read error: Connection reset by peer)
[18:02] * jashank42 (~jashan42@117.207.177.7) has joined #ceph
[18:02] * bitserker (~toni@88.87.194.130) has joined #ceph
[18:05] * Nacer_ (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[18:05] * Nacer_ (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[18:06] * ganders (~root@190.2.42.21) has joined #ceph
[18:09] <ganders> I have ceph 0.82 and I want to deploy with the ceph-deploy install cmd to a centos 7.0.1406 node but i'm getting the UnsupportedPlatform msg
[18:09] * halbritt_ (~halbritt@65.50.222.90) Quit (Ping timeout: 480 seconds)
[18:09] * harold (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) Quit (Quit: Leaving)
[18:09] * cholcombe (~chris@c-73-180-29-35.hsd1.or.comcast.net) has joined #ceph
[18:11] * wschulze (~wschulze@cpe-69-206-242-231.nyc.res.rr.com) has joined #ceph
[18:12] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Ping timeout: 480 seconds)
[18:12] * moore (~moore@97-124-123-201.phnx.qwest.net) Quit (Remote host closed the connection)
[18:12] * ircolle (~Adium@2601:1:a580:1735:a1f8:dc80:9764:90cb) has joined #ceph
[18:14] * moore (~moore@97-124-123-201.phnx.qwest.net) has joined #ceph
[18:15] * mattch (~mattch@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[18:18] * sugoruyo (~georgev@paarthurnax.esc.rl.ac.uk) Quit (Quit: I'm going home!)
[18:24] * halbritt (~halbritt@65.50.222.90) has joined #ceph
[18:25] * bitserker (~toni@88.87.194.130) Quit (Quit: Leaving.)
[18:29] * Tralin|Sleep (~Corneliou@5NZAADBHQ.tor-irc.dnsbl.oftc.net) Quit ()
[18:29] * jordanP (~jordan@scality-jouf-2-194.fib.nerim.net) has joined #ceph
[18:29] * GuntherDW (~blank@politkovskaja.torservers.net) has joined #ceph
[18:29] * midnightrunner (~midnightr@c-67-174-241-112.hsd1.ca.comcast.net) has joined #ceph
[18:30] * jashank42 (~jashan42@117.207.177.7) Quit (Ping timeout: 480 seconds)
[18:32] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[18:32] * joshd1 (~jdurgin@68-119-140-18.dhcp.ahvl.nc.charter.com) Quit (Quit: Leaving.)
[18:34] * linuxkidd_ (~linuxkidd@17.sub-70-215-195.myvzw.com) Quit (Quit: Leaving)
[18:34] * shaunm (~shaunm@74.215.76.114) has joined #ceph
[18:34] * srk (~srk@32.97.110.57) has joined #ceph
[18:35] * sleinen (~Adium@194.230.159.114) has joined #ceph
[18:37] * midnightrunner (~midnightr@c-67-174-241-112.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:37] * midnightrunner (~midnightr@216.113.160.71) has joined #ceph
[18:38] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[18:40] * jashank42 (~jashan42@117.197.162.17) has joined #ceph
[18:45] * sleinen (~Adium@194.230.159.114) Quit (Ping timeout: 480 seconds)
[18:46] * Miouge (~Miouge@h-72-233.a163.priv.bahnhof.se) has joined #ceph
[18:46] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Quit: Leaving.)
[18:48] * jordanP (~jordan@scality-jouf-2-194.fib.nerim.net) Quit (Quit: Leaving)
[18:49] * dgurtner (~dgurtner@178.197.231.155) Quit (Ping timeout: 480 seconds)
[18:54] * oro (~oro@84-72-20-79.dclient.hispeed.ch) has joined #ceph
[18:55] * colonD (~colonD@173-165-224-105-minnesota.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[18:56] * dneary (~dneary@nat-pool-bos-u.redhat.com) Quit (Remote host closed the connection)
[18:56] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[18:58] * Concubidated (~Adium@71.21.5.251) has joined #ceph
[18:59] * GuntherDW (~blank@7R2AABILT.tor-irc.dnsbl.oftc.net) Quit ()
[18:59] * Unforgiven (~hgjhgjh@193.111.136.164) has joined #ceph
[19:02] * Mika_c (~Mk@114-38-51-24.dynamic.hinet.net) Quit (Quit: Konversation terminated!)
[19:04] * colonD (~colonD@173-165-224-105-minnesota.hfc.comcastbusiness.net) has joined #ceph
[19:14] * georgem (~Adium@184.151.179.99) has joined #ceph
[19:16] * raw (~raw@37.48.65.169) Quit (Remote host closed the connection)
[19:16] * thomnico (~thomnico@2a01:e35:8b41:120:c4da:8f3f:b541:fb3f) Quit (Quit: Ex-Chat)
[19:18] * Nacer (~Nacer@c2s31-2-83-152-89-177.fbx.proxad.net) has joined #ceph
[19:20] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) has joined #ceph
[19:22] * kl4m (~kl4m@modemcable226.111-70-69.static.videotron.ca) has joined #ceph
[19:29] * Unforgiven (~hgjhgjh@8Q4AABAXW.tor-irc.dnsbl.oftc.net) Quit ()
[19:29] * jacoo (~pepzi@108.61.123.227) has joined #ceph
[19:33] * adeel (~adeel@2602:ffc1:1:face:2c68:d561:c794:86c5) has joined #ceph
[19:37] * georgem (~Adium@184.151.179.99) Quit (Quit: Leaving.)
[19:37] * moore (~moore@97-124-123-201.phnx.qwest.net) Quit (Remote host closed the connection)
[19:37] * moore (~moore@64.202.160.233) has joined #ceph
[19:38] * dneary (~dneary@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[19:39] * Nacer (~Nacer@c2s31-2-83-152-89-177.fbx.proxad.net) Quit (Remote host closed the connection)
[19:42] * oro (~oro@84-72-20-79.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[19:42] * srk (~srk@32.97.110.57) Quit (Ping timeout: 480 seconds)
[19:43] * gaveen (~gaveen@175.157.13.68) has joined #ceph
[19:43] * gaveen (~gaveen@175.157.13.68) Quit ()
[19:43] * itsjpr (~imjpr@thing2.it.uab.edu) Quit (Ping timeout: 480 seconds)
[19:44] * imjpr (~imjpr@thing2.it.uab.edu) Quit (Ping timeout: 480 seconds)
[19:59] * jacoo (~pepzi@8Q4AABAY0.tor-irc.dnsbl.oftc.net) Quit ()
[19:59] * darks (~Blueraven@37.46.122.202) has joined #ceph
[20:00] * kl4m (~kl4m@modemcable226.111-70-69.static.videotron.ca) Quit ()
[20:02] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[20:03] * daviddcc (~dcasier@LCaen-656-1-144-187.w217-128.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[20:03] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Read error: Connection reset by peer)
[20:03] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[20:06] <gleam> 0.82?
[20:07] * srk (~srk@32.97.110.57) has joined #ceph
[20:17] * kl4m (~kl4m@modemcable226.111-70-69.static.videotron.ca) has joined #ceph
[20:18] * itsjpr (~imjpr@164.111.200.170) has joined #ceph
[20:18] * imjpr (~imjpr@164.111.200.170) has joined #ceph
[20:18] <debian112> So I hear that I should use an HBA card for ceph. Can anyone recommend one?
[20:20] <debian112> what would be the benefits of running an HBA vs a RAID card with RAID0 on each disk?
[20:21] * vbellur (~vijay@122.172.39.207) Quit (Ping timeout: 480 seconds)
[20:22] * evilrob00 (~evilrob00@cpe-72-179-3-209.austin.res.rr.com) Quit (Remote host closed the connection)
[20:23] * joao (~joao@bl16-158-23.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[20:29] * darks (~Blueraven@7R2AABIM7.tor-irc.dnsbl.oftc.net) Quit ()
[20:29] * rf` (~hoopy@72.52.91.30) has joined #ceph
[20:29] * Hemanth (~Hemanth@117.221.98.214) has joined #ceph
[20:31] * jwilkins (~jwilkins@2601:9:4580:f4c:ea2a:eaff:fe08:3f1d) Quit (Ping timeout: 480 seconds)
[20:37] * ganders (~root@190.2.42.21) Quit (Quit: WeeChat 0.4.2)
[20:37] * BManojlovic (~steki@cable-89-216-232-165.dynamic.sbb.rs) has joined #ceph
[20:43] <darkfader> debian112: ideally, faster error detection and more stable performance. also easier to setup for people that don't know raid controller setup well
[20:44] <darkfader> oh and a lot cheaper if you have many systems :)
[20:44] <rlrevell> debian112: some raid cards make it a real pain to do JBOD; with the ones i use you have to make each disk a raid0
[20:45] <rlrevell> oh, just saw that in your post ;-)
[20:45] * capri_on (~capri@212.218.127.222) has joined #ceph
[20:45] <debian112> darkfader, rlrevell: so is everyone saying HBA?
[20:45] <debian112> I will have 16 drives
[20:47] <mtanski> Ceph already handles data replication and recovery, so it's somewhat redundant
[20:47] <mtanski> to use RAID
[20:48] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[20:48] <mtanski> With a much better failure domain than most raid levels (well, depending on your conf)
[20:48] <mtanski> Using raid0 with one disk just seems moot
[20:49] <mtanski> And using raid0 across many drives just makes recovery that much more painful if only one drive in the array fails
[20:49] <rlrevell> mtanski: yeah it's dumb, but we couldn't find a jbod mode anywhere in the raid bios settings
[20:49] <tuxcraft1r> im trying to setup a test cluster (i now used one disk as osd on all my three nodes)
[20:49] <mtanski> you got to do what you got to do
[20:50] <Tetard_> rlrevell: create individual Raid 0 volumes on each disk
[20:50] <tuxcraft1r> i used ceph-deploy osd create ceph0X:/dev/sda on all my disk
[20:50] * Tetard_ is now known as Tetard
[20:50] <tuxcraft1r> they are all mounted
[20:50] <rlrevell> that
[20:50] <rlrevell> s what we did
[20:50] <tuxcraft1r> and show up in ceph osd tree
[20:50] <tuxcraft1r> but all as down
[20:50] <tuxcraft1r> what is next?
[20:50] * Hemanth (~Hemanth@117.221.98.214) Quit (Ping timeout: 480 seconds)
[20:51] <debian112> mtanski: yup, just checking what is recommended. My only reservation is that going with the HBA, I will have to run software RAID 1 on my two OS disks, or purchase another RAID card for those two disks.
[20:51] <rlrevell> tuxcraft1r: ceph-deploy osd activate ceph0X:/dev/sda
[20:51] <debian112> my drive layout: http://paste.debian.net/204655/
[20:51] <mtanski> debian112: your OS disk should be hardly used really
[20:51] * capri (~capri@212.218.127.222) Quit (Ping timeout: 480 seconds)
[20:52] <tuxcraft1r> rlrevell: shouldn't the activate be handled by ceph-deploy osd create ?
[20:52] <mtanski> so I don't think it makes a difference if it's software or hardware, unless you have some standard organizational policy
[20:52] <rlrevell> tuxcraft1r: iirc create just makes the data structures on the disk, activate tells the cluster to start using it
[20:53] <tuxcraft1r> rlrevell: how come it is already mounted then
[20:53] <tuxcraft1r> if i run ceph-deploy osd activate ceph01:/dev/sda after the create it hangs
[20:53] <tuxcraft1r> [ceph01][INFO ] Running command: ceph-disk -v activate --mark-init sysvinit --mount /dev/sda
[20:53] <tuxcraft1r> and then it freezes
[20:54] * supay (sid47179@id-47179.highgate.irccloud.com) Quit (Remote host closed the connection)
[20:54] * carmstrong (sid22558@id-22558.highgate.irccloud.com) Quit (Remote host closed the connection)
[20:54] * kamalmarhubi (sid26581@id-26581.highgate.irccloud.com) Quit (Remote host closed the connection)
[20:54] * kklimonda_ (sid72883@id-72883.highgate.irccloud.com) Quit (Remote host closed the connection)
[20:54] <rlrevell> are your mons created and in quorum?
[20:54] * ngoswami (~ngoswami@121.244.87.116) Quit (Quit: Leaving)
[20:54] <tuxcraft1r> 2015-06-05 20:42:03.601944 7f14619ad7c0 -1 filestore(/var/lib/ceph/tmp/mnt.p9Ngyu) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[20:54] <debian112> mtanski: agree, just always been a stickler for hardware RAID for OS.
[20:54] <debian112> Thanks
[20:54] <debian112> looks like I will go with the HBA
[20:54] <debian112> thanks everyone
[20:56] <rlrevell> tuxcraft1r: it appears from my records that i never did a ceph-deploy osd create ceph01:/dev/sda. it was ceph-deploy osd prepare ... then ceph-deploy osd activate ...
[20:57] <tuxcraft1r> i've been at this for a few days now
[20:57] <tuxcraft1r> the only way i can get it to cleanly stop is to purge everything, reboot, and then start all over again
[20:57] <tuxcraft1r> the osd is auto mounted
[20:58] <tuxcraft1r> but it is not "in"
[20:58] <rlrevell> tuxcraft1r: what guide are you following
[20:59] * rf` (~hoopy@5NZAADBSO.tor-irc.dnsbl.oftc.net) Quit ()
[20:59] * Thononain (~PcJamesy@95.211.169.35) has joined #ceph
[20:59] <tuxcraft1r> rlrevell: bits and pieces from everywhere
[20:59] <rlrevell> tuxcraft1r: i would recommend http://ceph.com/docs/master/start/quick-ceph-deploy/
[21:00] <tuxcraft1r> rlrevell: it doesn't use real disks in that documentation
[21:00] <tuxcraft1r> so i got lost
[21:01] <rlrevell> oh right. i think i meant http://ceph.com/docs/master/rados/deployment/ceph-deploy-new/
[21:02] <rlrevell> that one uses the real drives
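The firefly-era flow from that guide, roughly (hostname and device are placeholders; after preparing a whole disk, the data partition is typically /dev/sda1, which is what activate expects):

    ceph -s                                  # confirm the mons are in quorum first
    ceph-deploy disk zap ceph01:/dev/sda
    ceph-deploy osd prepare ceph01:/dev/sda
    ceph-deploy osd activate ceph01:/dev/sda1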
[21:06] * evilrob0_ (~evilrob00@128.107.241.170) has joined #ceph
[21:14] * tupper (~tcole@2001:420:2280:1272:8900:f9b8:3b49:567e) Quit (Read error: Connection reset by peer)
[21:14] * tupper_ (~tcole@2001:420:2280:1272:8900:f9b8:3b49:567e) has joined #ceph
[21:15] <tuxcraft1r> how do i know if my ceph version is firefly, giant, or hammer, or something else
[21:15] * bandrus1 (~brian@55.sub-70-214-35.myvzw.com) has joined #ceph
[21:15] <tuxcraft1r> so i can figure out what my matching version of ceph-deploy should be
[21:16] * bandrus (~brian@55.sub-70-214-35.myvzw.com) Quit (Read error: Connection reset by peer)
[21:16] <tuxcraft1r> as my activate keeps hanging and maybe the version of ceph-deploy is wrong?
[21:16] * raw (raw@ipb2180b11.dynamic.kabel-deutschland.de) has joined #ceph
[21:16] <rlrevell> tuxcraft1r: on a debian based system dpkg -l ceph
[21:16] <rlrevell> rpm -q ceph on redhat iirc
[21:17] <tuxcraft1r> 0.80.7-2, but it doesn't say if it is firefly, giant, or hammer
[21:18] <rlrevell> that is firefly
[21:18] <rlrevell> http://ceph.com/docs/master/release-notes/
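For reference, a rough way to map a package version to a release codename (the commands are the ones mentioned above plus ceph's own version flag; the mapping is the standard one for these series):
    ceph --version     # e.g. "ceph version 0.80.7 (...)"
    dpkg -l ceph       # Debian/Ubuntu package version
    rpm -q ceph        # RHEL/CentOS package version
    # 0.80.x = firefly, 0.87.x = giant, 0.94.x = hammer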
[21:18] * jrankin (~jrankin@d53-64-170-236.nap.wideopenwest.com) Quit (Quit: Leaving)
[21:18] <rlrevell> why are you using such an old version?
[21:18] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) has joined #ceph
[21:18] <tuxcraft1r> rlrevell: it is the version available in debian stable
[21:19] * sudocat (~davidi@2601:e:2b80:a80:fcd6:11c1:1293:d3bb) has joined #ceph
[21:19] <tuxcraft1r> and there is no ceph upstream release for debian stable, only for oldstable
[21:19] <tuxcraft1r> so im using the debian packages
[21:20] <rlrevell> hmm. i'm not sure my advice even applies because the versions are so far apart. every time i've had things just hang it was an issue with no mons up or mons not in quorum... you could also try zapping all OSDs or even blowing away all of ceph and starting again
[21:20] <tuxcraft1r> but my ceph-deploy server is using debian oldstable with version
[21:21] <tuxcraft1r> version 1.5.25~bpo70+1
[21:21] <tuxcraft1r> of ceph-deploy
[21:21] <tuxcraft1r> and i don't know how i can figure out if it works with 0.80.7-2 of ceph on my nodes
[21:22] <tuxcraft1r> rlrevell: yes, i've been starting over every time
[21:22] * jashank42 (~jashan42@117.197.162.17) Quit (Quit: Leaving)
[21:22] <rlrevell> i would ceph-deploy purge as per http://ceph.com/docs/master/rados/deployment/ceph-deploy-install/ and start again following the steps in the guide
[21:22] <tuxcraft1r> rlrevell: but it takes so much time and I keep hanging on the activate
[21:22] <rlrevell> but you probably want a version of that guide that matches your ceph version
[21:22] <tuxcraft1r> this is my fourth purge
[21:22] <tuxcraft1r> with reboots after all purges
[21:22] <tuxcraft1r> and zaps on all disks
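The clean-slate sequence rlrevell is describing is roughly the following (a sketch based on the ceph-deploy docs linked above; the node names are placeholders):
    ceph-deploy purge ceph01 ceph02 ceph03       # uninstall the ceph packages on the nodes
    ceph-deploy purgedata ceph01 ceph02 ceph03   # erase /var/lib/ceph and /etc/ceph on the nodes
    ceph-deploy forgetkeys                       # drop the keyrings cached in the local working directory
    ceph-deploy disk zap ceph01:/dev/sda         # clear the partition table on each OSD disk before reuse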
[21:23] <rlrevell> does the whole system hang or just the ceph-deploy activate process?
[21:23] * sudocat (~davidi@2601:e:2b80:a80:fcd6:11c1:1293:d3bb) Quit ()
[21:23] <tuxcraft1r> only activate
[21:24] <rlrevell> and it's not doing IO?
[21:24] <tuxcraft1r> maybe i should try a separate disk for the journal
[21:25] <raw> tuxcraft1r, +1 for rebooting everything
[21:25] <tuxcraft1r> raw: i reboot after the purge
[21:25] <srk> tuxcraft1r, if you haven't tried already, might want to "dd" all the disks that are going to be used as osds
[21:25] <raw> tuxcraft1r, also, you can add the ceph deb repository and install every version you want, just make sure the priorities are right.
[21:26] <srk> something like: dd if=/dev/zero of=/dev/sdb bs=1M count=512
[21:26] <tuxcraft1r> srk: did that with all disks, clean mbr or gpt
[21:26] <srk> that saved me a lot of headaches during activate
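One caveat with the dd approach (an assumption worth checking, not something srk stated): a GPT label keeps a backup copy at the end of the disk, which a wipe of only the first 512MB will not touch; sgdisk can clear both, and ceph-deploy's zap does much the same thing:
    dd if=/dev/zero of=/dev/sdb bs=1M count=512   # clears the start of the disk (MBR/primary GPT)
    sgdisk --zap-all /dev/sdb                     # also removes the backup GPT at the end of the disk
    ceph-deploy disk zap myhost:/dev/sdb          # wraps a similar zap via ceph-disk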
[21:28] <tuxcraft1r> i will start all over again and see
[21:28] <tuxcraft1r> another thing that i didn't get is how to partition the ssd
[21:29] <tuxcraft1r> i can't find that in the manuals
[21:29] <srk> hmm, can you try: ceph-deploy osd prepare --zap-disk myhost:/dev/sdb, this will do the zap-disk as well as activate the osd
[21:29] * sjm (~sjm@pool-173-70-76-86.nwrknj.fios.verizon.net) has joined #ceph
[21:29] * Thononain (~PcJamesy@7R2AABIN5.tor-irc.dnsbl.oftc.net) Quit ()
[21:29] <tuxcraft1r> it uses a partition for every osd on the ssd
[21:29] * slowriot (~nicatronT@nx-01.tor-exit.network) has joined #ceph
[21:29] <tuxcraft1r> and the prepare command doesn't make the partition
[21:29] <tuxcraft1r> for the ssd
[21:30] <tuxcraft1r> so i did not use my ssds as i could not find the info
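On the SSD-journal question: with ceph-deploy/ceph-disk of this era you normally do not pre-partition the journal SSD yourself; passing the whole SSD as the journal device makes prepare carve out a new journal partition on it for each OSD. A sketch, where /dev/sdf stands in for the SSD:
    ceph-deploy osd prepare ceph01:/dev/sda:/dev/sdf   # data on sda, journal partition created on the SSD
    ceph-deploy osd prepare ceph01:/dev/sdb:/dev/sdf   # next OSD gets its own journal partition on the same SSD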
[21:32] * itsjpr (~imjpr@164.111.200.170) Quit (Ping timeout: 480 seconds)
[21:32] * imjpr (~imjpr@164.111.200.170) Quit (Ping timeout: 480 seconds)
[21:37] * linjan (~linjan@213.8.240.146) Quit (Remote host closed the connection)
[21:37] * yguang11 (~yguang11@nat-dip30-wl-d.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[21:38] * linjan (~linjan@213.8.240.146) has joined #ceph
[21:41] <tuxcraft1r> my udev rule auto picks up the disk after the prepare
[21:41] <tuxcraft1r> and then it hangs
[21:41] <tuxcraft1r> 2015-06-05 21:39:44.472042 7f8b21fdd7c0 -1 filestore(/var/lib/ceph/tmp/mnt.wpsk4b) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[21:42] <raw> :( i'm really out of ideas. i always get corrupt files with cephfs. i have a 5 node cluster with 128gb ram, 10GbE, 2 SSDs and 2 HDDs, and a custom crush map. when i copy in a file a few gb in size, it ends up with 500 to 5000 byte long runs of \00 bytes.
[21:42] <tuxcraft1r> http://paste.debian.net/204715/
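A few things that can be checked while an activate is hanging (a sketch; nothing here is specific to this particular failure):
    ceph-disk list                          # how ceph-disk classifies each disk/partition
    mount | grep /var/lib/ceph              # which OSD data partitions actually got mounted
    tail -f /var/log/ceph/ceph-osd.*.log    # OSD log output while the activate is stuck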
[21:42] <raw> i have tried hammer and giant release versions, linux kernel 4.0.4 and 3.16.7. i also kicked every server off the cluster to see if it gets better
[21:44] <tuxcraft1r> raw: im still new to ceph, wish i could help you more
[21:45] * angdraug (~angdraug@12.164.168.117) has joined #ceph
[21:45] <raw> i have no idea in which direction i should look. if i md5sum the file after transfer, it looks correct. if i drop the linux buffers and retest, the file is corrupt.
[21:45] * shohn (~shohn@dslb-094-223-165-016.094.223.pools.vodafone-ip.de) Quit (Quit: Leaving.)
[21:46] <raw> also, scrubbing the whole cluster does not show problems (if scrub detects inconsistencies, it complains in ceph status right?)
[21:46] <raw> that leads me to the conclusion that the file gets submitted correctly to cephfs, but it somehow gets corrupted before it is finally written to rados
[21:47] * delattec (~cdelatte@cpe-172-72-105-98.carolina.res.rr.com) Quit (Quit: Leaving)
[21:47] <raw> i have no idea how to debug this
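One way to narrow down where raw's corruption appears (a sketch; the path is a placeholder for the cephfs mount): compare checksums before and after forcing the kernel to drop its page cache, so the second read has to come back from the OSDs, and check whether scrub has flagged anything:
    md5sum /mnt/cephfs/bigfile
    sync; echo 3 > /proc/sys/vm/drop_caches   # drop the client page cache (as root)
    md5sum /mnt/cephfs/bigfile                # this read is served by the cluster, not the cache
    ceph health detail                        # inconsistent PGs found by scrubbing show up here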
[21:51] <tuxcraft1r> i sent an email to the users mailing list
[21:52] <tuxcraft1r> maybe the udev rules get triggered before the ceph-deploy commands are finished...
[21:53] <tuxcraft1r> tried rebooting a node
[21:53] <tuxcraft1r> but i still get the same error
[21:53] <tuxcraft1r> 2015-06-05 21:52:42.431494 7f12aabd17c0 -1 filestore(/var/lib/ceph/tmp/mnt.DQ2D36) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[21:58] * Nacer (~Nacer@c2s31-2-83-152-89-177.fbx.proxad.net) has joined #ceph
[21:59] * slowriot (~nicatronT@8Q4AABA3W.tor-irc.dnsbl.oftc.net) Quit ()
[21:59] * PuyoDead (~Kayla@ncc-1701-a.tor-exit.network) has joined #ceph
[21:59] * ChrisNBlum (~ChrisNBlu@dhcp-ip-190.dorf.rwth-aachen.de) has joined #ceph
[22:00] <tuxcraft1r> ceph-deploy osd prepare --zap-disk ceph03:sda:/dev/sdd
[22:01] <tuxcraft1r> with the ssd as journal i still get the same errors
[22:01] <tuxcraft1r> so i'm giving it a rest until i get help from the mailing list
[22:02] <tuxcraft1r> Linux ceph03 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3~deb8u1 (2015-04-24) x86_64 GNU/Linux
[22:02] <tuxcraft1r> that can be used as a kernel, right?
[22:02] * pcsquared (sid11336@id-11336.ealing.irccloud.com) Quit (Remote host closed the connection)
[22:04] * burley (~khemicals@cpe-98-28-239-78.cinci.res.rr.com) Quit (Quit: burley)
[22:05] * bene (~ben@c-24-60-237-191.hsd1.nh.comcast.net) Quit (Quit: Konversation terminated!)
[22:05] * ChrisNBlum (~ChrisNBlu@dhcp-ip-190.dorf.rwth-aachen.de) Quit (Quit: Goodbye)
[22:05] <raw> tuxcraft1r, yes, it even supports all crush tunables
[22:05] * rlrevell (~leer@vbo1.inmotionhosting.com) Quit (Ping timeout: 480 seconds)
[22:06] <raw> and that's the one i'm also currently using ...
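Two quick checks on the kernel-client question (a sketch; whether a given kernel really supports every tunable the cluster uses should be verified against the release notes, not assumed here):
    uname -r                        # kernel version on the client
    ceph osd crush show-tunables    # which CRUSH tunables profile the cluster is running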
[22:06] * ChrisNBlum (~ChrisNBlu@dhcp-ip-190.dorf.rwth-aachen.de) has joined #ceph
[22:17] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[22:18] * evilrob0_ (~evilrob00@128.107.241.170) Quit (Remote host closed the connection)
[22:18] * oro (~oro@84-72-20-79.dclient.hispeed.ch) has joined #ceph
[22:19] * yguang11 (~yguang11@12.31.82.125) has joined #ceph
[22:21] * yguang11_ (~yguang11@2001:4998:effd:7804::102c) has joined #ceph
[22:21] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Quit: Leaving.)
[22:21] * tupper_ (~tcole@2001:420:2280:1272:8900:f9b8:3b49:567e) Quit (Ping timeout: 480 seconds)
[22:25] * Nacer (~Nacer@c2s31-2-83-152-89-177.fbx.proxad.net) Quit (Remote host closed the connection)
[22:26] * jwilkins (~jwilkins@2601:9:703:f100:ea2a:eaff:fe08:3f1d) has joined #ceph
[22:27] * yguang11 (~yguang11@12.31.82.125) Quit (Ping timeout: 480 seconds)
[22:27] * DV_ (~veillard@2001:41d0:1:d478::1) has joined #ceph
[22:29] * PuyoDead (~Kayla@9S0AAAPPA.tor-irc.dnsbl.oftc.net) Quit ()
[22:30] * dneary (~dneary@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[22:30] * owasserm (~owasserm@52D9864F.cm-11-1c.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[22:31] * owasserm (~owasserm@52D9864F.cm-11-1c.dynamic.ziggo.nl) has joined #ceph
[22:35] * nwf (~nwf@00018577.user.oftc.net) has joined #ceph
[22:38] * pcsquared (sid11336@id-11336.ealing.irccloud.com) has joined #ceph
[22:43] * yguang11_ (~yguang11@2001:4998:effd:7804::102c) Quit (Ping timeout: 480 seconds)
[22:45] * carmstrong (sid22558@id-22558.highgate.irccloud.com) has joined #ceph
[22:48] * kamalmarhubi (sid26581@id-26581.highgate.irccloud.com) has joined #ceph
[22:48] * georgem (~Adium@184.151.179.99) has joined #ceph
[22:50] * jskinner (~jskinner@host-95-2-129.infobunker.com) Quit (Remote host closed the connection)
[22:51] * fridim_ (~fridim@56-198-190-109.dsl.ovh.fr) Quit (Ping timeout: 480 seconds)
[22:58] * sjm (~sjm@pool-173-70-76-86.nwrknj.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[22:59] * rogst (~Grimhound@spftor1e1.privacyfoundation.ch) has joined #ceph
[23:00] * supay (sid47179@id-47179.highgate.irccloud.com) has joined #ceph
[23:00] * georgem1 (~Adium@184.151.178.44) has joined #ceph
[23:01] * alphe (~alphe@0001ac6f.user.oftc.net) has joined #ceph
[23:02] * itsjpr (~imjpr@138.26.125.8) has joined #ceph
[23:02] * imjpr (~imjpr@thing2.it.uab.edu) has joined #ceph
[23:06] * georgem (~Adium@184.151.179.99) Quit (Ping timeout: 480 seconds)
[23:07] * stiopa (~stiopa@cpc73828-dals21-2-0-cust630.20-2.cable.virginm.net) has joined #ceph
[23:09] * oro (~oro@84-72-20-79.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[23:11] * rogst (~Grimhound@5NZAADBZV.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[23:12] * kklimonda_ (sid72883@id-72883.highgate.irccloud.com) has joined #ceph
[23:14] * wicope (~wicope@0001fd8a.user.oftc.net) Quit (Read error: Connection reset by peer)
[23:16] * gregsfortytwo (~gregsfort@209.132.181.86) Quit (Quit: bye!)
[23:17] * owasserm (~owasserm@52D9864F.cm-11-1c.dynamic.ziggo.nl) Quit (Quit: Ex-Chat)
[23:17] * gregsfortytwo (~gregsfort@transit-86-181-132-209.redhat.com) has joined #ceph
[23:18] * sjm (~sjm@pool-173-70-76-86.nwrknj.fios.verizon.net) has joined #ceph
[23:22] * alfredodeza (~alfredode@198.206.133.89) has joined #ceph
[23:23] * Sigma (~clarjon1@179.43.152.254) has joined #ceph
[23:23] * dyasny (~dyasny@198.251.58.23) Quit (Ping timeout: 480 seconds)
[23:27] * sjm (~sjm@pool-173-70-76-86.nwrknj.fios.verizon.net) has left #ceph
[23:28] * ChrisNBlum (~ChrisNBlu@dhcp-ip-190.dorf.rwth-aachen.de) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[23:30] * Sysadmin88 (~IceChat77@054527d3.skybroadband.com) has joined #ceph
[23:31] * wschulze (~wschulze@cpe-69-206-242-231.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[23:31] * wushudoin_ (~wushudoin@transit-86-181-132-209.redhat.com) has joined #ceph
[23:32] * georgem1 (~Adium@184.151.178.44) Quit (Quit: Leaving.)
[23:33] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) Quit (Quit: jdillaman)
[23:37] * Miouge (~Miouge@h-72-233.a163.priv.bahnhof.se) Quit (Quit: Miouge)
[23:37] * BManojlovic (~steki@cable-89-216-232-165.dynamic.sbb.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[23:39] * wushudoin (~wushudoin@38.140.108.2) Quit (Ping timeout: 480 seconds)
[23:40] * wushudoin_ (~wushudoin@transit-86-181-132-209.redhat.com) Quit (Ping timeout: 480 seconds)
[23:40] * wushudoin_ (~wushudoin@38.140.108.2) has joined #ceph
[23:43] * kl4m (~kl4m@modemcable226.111-70-69.static.videotron.ca) Quit ()
[23:44] * srk (~srk@32.97.110.57) Quit (Ping timeout: 480 seconds)
[23:51] * imjpr (~imjpr@thing2.it.uab.edu) Quit (Ping timeout: 480 seconds)
[23:51] * itsjpr (~imjpr@138.26.125.8) Quit (Ping timeout: 480 seconds)
[23:53] * Sigma (~clarjon1@3DDAAAQZH.tor-irc.dnsbl.oftc.net) Quit ()
[23:56] * jbautista- (~wushudoin@transit-86-181-132-209.redhat.com) has joined #ceph
[23:58] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)
[23:59] * primechuck (~primechuc@host-95-2-129.infobunker.com) Quit (Remote host closed the connection)
[23:59] * fsimonce (~simon@host253-71-dynamic.3-87-r.retail.telecomitalia.it) Quit (Quit: Coyote finally caught me)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.