#ceph IRC Log


IRC Log for 2016-08-01

Timestamps are in GMT/BST.

[0:10] * dux0r (~Kidlvr@5AEAAAOUR.tor-irc.dnsbl.oftc.net) Quit ()
[0:10] * adnavare (~adnavare@192.55.55.41) has joined #ceph
[0:15] * aNupoisc (~adnavare@jfdmzpr04-ext.jf.intel.com) Quit (Remote host closed the connection)
[0:16] * [0x4A6F]_ (~ident@p4FC277D6.dip0.t-ipconnect.de) has joined #ceph
[0:18] * [0x4A6F] (~ident@0x4a6f.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:18] * [0x4A6F]_ is now known as [0x4A6F]
[0:20] * adnavare (~adnavare@192.55.55.41) Quit (Ping timeout: 480 seconds)
[0:23] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:25] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[0:27] * rendar (~I@host229-27-dynamic.44-79-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[0:27] * Pulp (~Pulp@63-221-50-195.dyn.estpak.ee) Quit (Read error: Connection reset by peer)
[0:55] * Scrin (~Jones@108-61-59-19ch.openskytelcom.net) has joined #ceph
[0:56] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[0:59] * kuku (~kuku@119.93.91.136) has joined #ceph
[1:04] * garphy is now known as garphy`aw
[1:05] * debian112 (~bcolbert@c-73-184-103-26.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[1:13] * kuku_ (~kuku@203.177.235.23) has joined #ceph
[1:19] * badone (~badone@66.187.239.16) Quit (Quit: k?thxbyebyenow)
[1:21] * kuku (~kuku@119.93.91.136) Quit (Ping timeout: 480 seconds)
[1:25] * Scrin (~Jones@108-61-59-19ch.openskytelcom.net) Quit ()
[1:30] * Harryhy (~delcake@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[1:31] * Nats (~natscogs@114.31.195.238) has joined #ceph
[1:55] * oms101 (~oms101@p20030057EA170600C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:59] * wido (~wido@2a00:f10:121:100:4a5:76ff:fe00:199) Quit (Read error: Connection reset by peer)
[1:59] * wido (~wido@2a00:f10:121:100:4a5:76ff:fe00:199) has joined #ceph
[2:00] * Harryhy (~delcake@9YSAAA0YL.tor-irc.dnsbl.oftc.net) Quit ()
[2:04] * oms101 (~oms101@p20030057EA028700C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[2:11] * miller7 (~user@145.132.0.252) Quit (Ping timeout: 480 seconds)
[2:13] * DJComet (~Pettis@26XAAAQOA.tor-irc.dnsbl.oftc.net) has joined #ceph
[2:20] * miller7 (~user@145.132.0.252) has joined #ceph
[2:23] * Nats (~natscogs@114.31.195.238) Quit (Read error: Connection reset by peer)
[2:25] * Nats (~natscogs@114.31.195.238) has joined #ceph
[2:30] * miller7 (~user@145.132.0.252) Quit (Ping timeout: 480 seconds)
[2:35] * kuku (~kuku@119.93.91.136) has joined #ceph
[2:40] * miller7 (~user@81.171.104.252) has joined #ceph
[2:43] * kuku_ (~kuku@203.177.235.23) Quit (Ping timeout: 480 seconds)
[2:43] * DJComet (~Pettis@26XAAAQOA.tor-irc.dnsbl.oftc.net) Quit ()
[2:50] * wkennington (~wkenningt@c-71-204-170-241.hsd1.ca.comcast.net) has joined #ceph
[3:06] * Teddybareman (~click@185.65.134.81) has joined #ceph
[3:07] * miller7 (~user@81.171.104.252) Quit (Ping timeout: 480 seconds)
[3:11] * miller7 (~user@145.132.0.252) has joined #ceph
[3:19] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[3:21] * The1_ (~the_one@5.186.54.143) has joined #ceph
[3:26] * miller7 (~user@145.132.0.252) Quit (Read error: Connection reset by peer)
[3:27] * T1 (~the_one@5.186.54.143) Quit (Ping timeout: 480 seconds)
[3:35] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[3:36] * Teddybareman (~click@61TAAAZ4B.tor-irc.dnsbl.oftc.net) Quit ()
[3:38] * aj__ (~aj@x590d98e4.dyn.telefonica.de) has joined #ceph
[3:38] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[3:38] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[3:40] * sebastian-w (~quassel@212.218.8.138) has joined #ceph
[3:44] * sebastian-w_ (~quassel@212.218.8.138) Quit (Ping timeout: 480 seconds)
[3:45] * derjohn_mobi (~aj@x4db1270b.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[3:50] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[3:51] * chengpeng_ (~chengpeng@180.168.126.179) Quit (Quit: Leaving)
[3:52] * chengpeng (~chengpeng@180.168.197.82) has joined #ceph
[3:53] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[3:56] * kefu (~kefu@183.193.165.164) has joined #ceph
[4:05] * kefu_ (~kefu@114.92.96.253) has joined #ceph
[4:09] * jfaj (~jan@p20030084AF2FD6005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[4:13] * kefu (~kefu@183.193.165.164) Quit (Ping timeout: 480 seconds)
[4:18] * jfaj (~jan@p20030084AF2C62005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) has joined #ceph
[4:23] * kuku (~kuku@119.93.91.136) Quit (Remote host closed the connection)
[4:25] * ade_b (~abradshaw@p4FF7ACEB.dip0.t-ipconnect.de) has joined #ceph
[4:30] * newdave (~newdave@14-202-180-170.tpgi.com.au) has joined #ceph
[4:32] * ade (~abradshaw@p4FF79371.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[4:39] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) has joined #ceph
[4:59] * kuku (~kuku@119.93.91.136) has joined #ceph
[5:00] * brians (~brian@80.111.114.175) Quit (Read error: No route to host)
[5:00] * brians (~brian@80.111.114.175) has joined #ceph
[5:08] * zdzichu (zdzichu@pipebreaker.pl) Quit (Ping timeout: 480 seconds)
[5:13] * rwheeler (~rwheeler@1.186.34.66) Quit (Quit: Leaving)
[5:28] * lmg (~notarima@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[5:36] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[5:38] * baojg (~baojg@61.135.155.34) has joined #ceph
[5:41] * vimal (~vikumar@114.143.165.7) has joined #ceph
[5:44] <baojg> When using s3-tests I hit this issue: MissingSectionHeaderError: File contains no section headers. Can someone help?
[5:45] <baojg> The config file is in YAML format, but it always fails to parse.
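MissingSectionHeaderError is raised by Python's ConfigParser, which means s3-tests is being given a file without INI-style [section] headers; the suite expects an INI config rather than YAML. A minimal sketch of that shape, assuming a local RGW endpoint and using placeholder credentials (none of these values come from this log; the s3-tests repo's own sample config lists the full field set):

    # write an INI-style config with section headers (all values are placeholders)
    cat > s3tests.conf <<'EOF'
    [DEFAULT]
    host = rgw.example.com
    port = 7480
    is_secure = no

    [s3 main]
    user_id = main-user
    display_name = Main User
    access_key = MAINACCESSKEY
    secret_key = MAINSECRETKEY

    [s3 alt]
    user_id = alt-user
    display_name = Alt User
    access_key = ALTACCESSKEY
    secret_key = ALTSECRETKEY
    EOF
    # then point the suite at it, roughly:
    S3TEST_CONF=s3tests.conf ./virtualenv/bin/nosetests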
[5:48] * Vacuum__ (~Vacuum@i59F79DEC.versanet.de) has joined #ceph
[5:51] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[5:55] * Vacuum_ (~Vacuum@88.130.211.5) Quit (Ping timeout: 480 seconds)
[5:56] * dcwangmit01 (~dcwangmit@162-245.23-239.PUBLIC.monkeybrains.net) Quit (Quit: Lost terminal)
[5:58] * lmg (~notarima@61TAAAZ6X.tor-irc.dnsbl.oftc.net) Quit ()
[6:01] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[6:09] * jfaj (~jan@p20030084AF2C62005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[6:09] * jfaj (~jan@p578E6A67.dip0.t-ipconnect.de) has joined #ceph
[6:12] * Aethis (~Gibri@ip95.ip-94-23-150.eu) has joined #ceph
[6:14] * efirs (~firs@5.128.174.86) has joined #ceph
[6:14] * vimal (~vikumar@114.143.165.7) Quit (Quit: Leaving)
[6:19] * kefu_ (~kefu@114.92.96.253) Quit (Max SendQ exceeded)
[6:20] * kefu (~kefu@ec2-54-64-13-168.ap-northeast-1.compute.amazonaws.com) has joined #ceph
[6:20] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[6:20] * jidar (~jidar@r2d2.fap.me) Quit (Quit: brb)
[6:24] * jidar (~jidar@104.207.140.225) has joined #ceph
[6:27] * wkennington (~wkenningt@c-71-204-170-241.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[6:28] * wkennington (~wkenningt@c-71-204-170-241.hsd1.ca.comcast.net) has joined #ceph
[6:35] * rwheeler (~rwheeler@121.244.87.118) has joined #ceph
[6:35] * aj__ (~aj@x590d98e4.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[6:40] * vimal (~vikumar@121.244.87.116) has joined #ceph
[6:42] * Aethis (~Gibri@26XAAAQS6.tor-irc.dnsbl.oftc.net) Quit ()
[6:42] * KeeperOfTheSoul (~Chrissi_@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[6:44] * rraja (~rraja@121.244.87.117) has joined #ceph
[6:54] * ira (~ira@121.244.87.117) has joined #ceph
[6:57] * sankar (~Thunderbi@14.99.134.160) has joined #ceph
[6:57] * Jeffrey4l (~Jeffrey@110.244.236.183) has joined #ceph
[7:12] * KeeperOfTheSoul (~Chrissi_@5AEAAAO03.tor-irc.dnsbl.oftc.net) Quit ()
[7:16] * rf`1 (~ZombieL@se4x.mullvad.net) has joined #ceph
[7:17] * sankar (~Thunderbi@14.99.134.160) Quit (Ping timeout: 480 seconds)
[7:19] * kefu_ (~kefu@114.92.96.253) has joined #ceph
[7:20] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Read error: Connection reset by peer)
[7:20] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[7:22] * kefu (~kefu@ec2-54-64-13-168.ap-northeast-1.compute.amazonaws.com) Quit (Ping timeout: 480 seconds)
[7:30] * efirs (~firs@5.128.174.86) Quit (Ping timeout: 480 seconds)
[7:33] * bvi (~Bastiaan@185.56.32.1) has joined #ceph
[7:37] * swami1 (~swami@49.38.1.162) has joined #ceph
[7:44] * aj__ (~aj@88.128.80.36) has joined #ceph
[7:46] * rf`1 (~ZombieL@61TAAAZ8Q.tor-irc.dnsbl.oftc.net) Quit ()
[7:55] * cheese^ (~Tenk@46.166.190.180) has joined #ceph
[8:13] * Miouge (~Miouge@109.128.94.173) has joined #ceph
[8:19] * karnan (~karnan@121.244.87.117) has joined #ceph
[8:20] <sickolog1> is there a way to change owner of all objects in a bucket?
[8:22] * ade_b (~abradshaw@p4FF7ACEB.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[8:25] * DV__ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[8:25] * cheese^ (~Tenk@26XAAAQVD.tor-irc.dnsbl.oftc.net) Quit ()
[8:26] * ntpttr_ (~ntpttr@134.134.139.82) has joined #ceph
[8:26] * andrew (~andrew@38.123.99.230) has joined #ceph
[8:27] * rdas (~rdas@121.244.87.116) has joined #ceph
[8:29] * andrew (~andrew@38.123.99.230) has left #ceph
[8:34] * ntpttr_ (~ntpttr@134.134.139.82) Quit (Ping timeout: 480 seconds)
[8:43] * rendar (~I@host216-43-dynamic.31-79-r.retail.telecomitalia.it) has joined #ceph
[8:55] * evelu (~erwan@46.231.131.178) has joined #ceph
[8:58] * aj__ (~aj@88.128.80.36) Quit (Ping timeout: 480 seconds)
[9:01] * Moriarty (~nih@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[9:07] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[9:15] * analbeard (~shw@support.memset.com) has joined #ceph
[9:15] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[9:17] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[9:17] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[9:22] * kuku_ (~kuku@119.93.91.136) has joined #ceph
[9:22] * kuku (~kuku@119.93.91.136) Quit (Read error: Connection reset by peer)
[9:31] * Moriarty (~nih@5AEAAAO3L.tor-irc.dnsbl.oftc.net) Quit ()
[9:35] * allaok (~allaok@machine107.orange-labs.com) has joined #ceph
[9:36] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[9:38] * ade (~abradshaw@p4FF7ACEB.dip0.t-ipconnect.de) has joined #ceph
[9:39] * aj__ (~aj@fw.gkh-setu.de) has joined #ceph
[9:41] * flesh (~oftc-webi@static.ip-171-033-130-093.signet.nl) has joined #ceph
[9:44] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[9:44] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[9:49] * JANorman (~JANorman@81.137.246.31) has joined #ceph
[9:52] * walcubi (~walcubi@p5797A96F.dip0.t-ipconnect.de) has joined #ceph
[9:52] <walcubi> Hi, what's the best way to communicate with ceph when all admin commands are hanging?
[9:55] <walcubi> Communication with the monitors, it seems, is timing out.
[9:55] <walcubi> All that I've worked out so far is that: "56 pgs are stuck inactive for more than 300 seconds; 32 pgs degraded; 47 pgs peering; 1 pgs recovering; 22 pgs recovery_wait; 56 pgs stuck inactive; 91 pgs stuck unclean; 9 requests are blocked > 32 sec; 2 osds have slow requests"
[9:56] * fsimonce (~simon@host99-64-dynamic.27-79-r.retail.telecomitalia.it) has joined #ceph
[9:56] * kuku_ (~kuku@119.93.91.136) Quit (Remote host closed the connection)
[9:57] <destrudo> you need to hit up the logs on your monservers
[9:59] * DanFoster (~Daniel@office.34sp.com) has joined #ceph
[9:59] <walcubi> Looks like the monitors are having an election every minute.
[10:00] <destrudo> how many mons are running?
[10:00] * JANorman (~JANorman@81.137.246.31) Quit (Quit: Leaving...)
[10:00] * sankar (~Thunderbi@182.156.226.146) has joined #ceph
[10:00] <walcubi> Should be three
[10:01] <destrudo> I'd take the first one in your initial list down just to see if it clears up
[10:02] <destrudo> Are you doing some weird networking layout that's creating a loop or something like that?
[10:02] <destrudo> hardware errors reported on any of the nodes?
[10:03] <walcubi> Shouldn't be any hardware errors, but you never know with new machines.
[10:03] <destrudo> I'm thinking more networking, but I have seen disk errors cause mons to call elections
[10:04] <walcubi> I just destroyed a very large test pool before the weekend, created a new one and started filling it up.
[10:04] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[10:05] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[10:05] <destrudo> just check for link failures all over, and make sure there are no loops or dying disks
[10:05] <destrudo> do ping tests between the nodes
[10:05] <destrudo> normal debugging stuff.
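A minimal sketch of the checks destrudo is suggesting, assuming default admin-socket and log paths; the mon id and peer IP below are placeholders:

    # quorum and election state, cluster-wide and from one mon's admin socket
    ceph quorum_status --format json-pretty
    ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok mon_status
    # clock skew between mons is a classic cause of repeated elections
    ceph health detail | grep -i clock
    # checks whether large frames survive end-to-end (only relevant on 9000-MTU networks)
    ping -c 3 -M do -s 8972 192.0.2.11
    # watch one mon's log for election chatter
    tail -f /var/log/ceph/ceph-mon.$(hostname -s).log | grep -i elect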
[10:10] * garphy`aw is now known as garphy
[10:10] * kefu_ is now known as kefu
[10:13] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[10:20] <walcubi> It *may* just be that the disk thrashing of the osd daemons is causing it to drop out repeatedly.
[10:22] <destrudo> the osd's hopefully are not using the same disk as the mon
[10:22] <destrudo> because if they are you did one weird deployment
[10:23] <destrudo> I don't know what happens if you get major disk utilization on the mon daemon's mount for constant time periods
[10:24] <destrudo> I would suspect that there is no disk activity, or at least not consistent
[10:24] <destrudo> since it keeps getting cut off
[10:24] <destrudo> (I've done absolutely horrible things to my technically production cluster.)
[10:25] <walcubi> This is only a test environment. And they are separated disks from a top-down view.
[10:26] <destrudo> what does the view matter? the OSDs are operating on different disks, right?
[10:26] <walcubi> Due to limitations of our hosting provider, they would be sharing the same physical disk.
[10:26] <destrudo> oh
[10:26] <destrudo> hmm.
[10:26] <walcubi> I absolutely wouldn't be doing this in production.
[10:26] <destrudo> lol, I wasn't saying you were.
[10:26] <destrudo> but
[10:27] * newdave (~newdave@14-202-180-170.tpgi.com.au) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[10:28] <destrudo> did you recently add mons or something?
[10:28] <destrudo> or was it just nuke-a-pool and refill
[10:28] <walcubi> OK, managed to destroy the broken pool after some stopping and starting of mons to avoid timeouts.
[10:29] <destrudo> gj. if it happens again enable debugging all over
[10:30] <walcubi> It's starting to look somewhat happier - but still producing a fault message on some stuff.
[10:30] <walcubi> destrudo, just nuke a pool and refill
[10:32] <walcubi> Some 63M objects consuming 700GB of data replicated x4.
[10:33] <baojg> When using s3-tests I hit this issue: MissingSectionHeaderError: File contains no section headers. Can someone help?
[10:33] * DanFoster (~Daniel@office.34sp.com) Quit (Quit: Leaving)
[10:34] <walcubi> destrudo, by the way, would you know if things will continue working if all copies of an object are lost?
[10:35] <walcubi> I'll soon find out anyway, I just couldn't see anything in the documentation.
[10:35] * DanFoster (~Daniel@2a00:1ee0:3:1337:8983:aec3:c6fb:7848) has joined #ceph
[10:43] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[10:47] <Hatsjoe> How does one explain the following output: pgmap v2601: 4352 pgs, 2 pools, 0 bytes data, 0 objects --- 244 GB used, 27136 GB / 27380 GB avail
[10:48] <Hatsjoe> Tried running fstrim but did not free up the 244GB of "ghost" usage
[10:55] * fdmanana (~fdmanana@bl12-226-64.dsl.telepac.pt) has joined #ceph
[10:58] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) has joined #ceph
[10:58] * branto (~branto@178-253-133-229.3pp.slovanet.sk) has joined #ceph
[10:58] * wkennington (~wkenningt@c-71-204-170-241.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[11:06] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[11:06] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[11:14] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[11:18] * aj__ (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[11:19] <sickolog1> hi guys, does anyone know how to change the owner of objects in a bucket? I need to change the owner of all buckets/objects to a new user... I linked the bucket to a new user, and the owner of the bucket is now the new user, but all the objects are still owned by the old user...
[11:20] <sickolog1> and of course i get forbidden when trying to access anything
[11:25] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Read error: Connection reset by peer)
[11:26] * Miouge_ (~Miouge@109.128.94.173) has joined #ceph
[11:26] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[11:28] * aj__ (~aj@fw.gkh-setu.de) has joined #ceph
[11:31] * Miouge (~Miouge@109.128.94.173) Quit (Ping timeout: 480 seconds)
[11:31] * Miouge_ is now known as Miouge
[11:31] * allaok (~allaok@machine107.orange-labs.com) has left #ceph
[11:38] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[11:41] * kees_ (~kees@2001:610:600:8774:681d:d581:d297:aae3) has joined #ceph
[11:42] * sickolog1 is now known as sickology
[11:42] <kees_> how much traffic should i expect a monitor server to do in a 6-node-osd, ~100TB cluster?
[11:44] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Read error: Connection reset by peer)
[11:45] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[11:49] * Jebula (~xENO_@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[11:49] * sankar (~Thunderbi@182.156.226.146) Quit (Read error: Connection reset by peer)
[11:49] * sankar (~Thunderbi@182.156.226.146) has joined #ceph
[11:58] * sankar (~Thunderbi@182.156.226.146) Quit (Ping timeout: 480 seconds)
[12:00] * sankar (~Thunderbi@182.156.226.146) has joined #ceph
[12:19] * Jebula (~xENO_@9YSAAA09S.tor-irc.dnsbl.oftc.net) Quit ()
[12:19] * kalleeen (~Skyrider@tsn109-201-154-141.dyn.nltelcom.net) has joined #ceph
[12:23] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[12:25] * penguinRaider (~KiKo@146.185.31.226) Quit (Ping timeout: 480 seconds)
[12:30] <flesh> is it normal that recovery doesn't start with a HEALTH_ERR state?
[12:34] * penguinRaider (~KiKo@146.185.31.226) has joined #ceph
[12:36] * bniver (~bniver@pool-173-48-58-27.bstnma.fios.verizon.net) has joined #ceph
[12:37] <flesh> I have HEALTH_ERR due to an inconsistent PG - scrub error
[12:38] <flesh> probably due to I/O errors on one of the OSDs owning that PG
[12:38] <flesh> ceph pg repair works fine, but as soon as I set the problematic OSD down inconsistent PG comes back
[12:39] <flesh> I am in a deadlock... I need to get that OSD out, but I need that OSD to not be in HEALTH_ERR....
[12:39] * salwasser (~Adium@c-76-118-229-231.hsd1.ma.comcast.net) has joined #ceph
[12:39] <flesh> and recovery doesn't start... is it because of HEALTH_ERR? or do you guys know if I am experiencing weird behaviour?
[12:40] * salwasser (~Adium@c-76-118-229-231.hsd1.ma.comcast.net) Quit ()
[12:41] * DrWhax (~DrWhax_@000199fa.user.oftc.net) has joined #ceph
[12:43] * garphy is now known as garphy`aw
[12:45] <MrBy> flesh: you should paste all the relevant information to a nopaste; then it's more likely that someone will be willing to help
[12:46] <flesh> MrBy thanks for the tip
[12:46] <flesh> http://pastebin.com/r0q5x5bG here is the query of the inconsistent PG
[12:47] <MrBy> flesh: ceph -s ; ceph osd tree
[12:47] * sankar (~Thunderbi@182.156.226.146) Quit (Remote host closed the connection)
[12:48] * sankar (~Thunderbi@182.156.226.146) has joined #ceph
[12:49] <flesh> osd tree http://pastebin.com/62rmT9m1
[12:49] * kalleeen (~Skyrider@tsn109-201-154-141.dyn.nltelcom.net) Quit ()
[12:50] <flesh> $ceph health detail: HEALTH_ERR 1 pgs inconsistent; 1 scrub errors pg 2.ae is active+clean+inconsistent, acting [11,4] 1 scrub errors
[12:51] * garphy`aw is now known as garphy
[12:52] <flesh> $ceph pg repair 2.ae ends up with HEALTH_OK. http://pastebin.com/RY67juPc
[12:52] <flesh> but as soon as I stop osd.11. Get back to HEALTH_ERR, 1 pg inconsistent
[12:53] <flesh> after HEALTH_OK, I have also forced pg scrub, and pg scrub-deep on that PG. it ends up just fine. So i have the feeling something is off
[12:53] * rakeshgm (~rakesh@121.244.87.117) has joined #ceph
[12:58] * chrisinajar (~HoboPickl@46.166.190.201) has joined #ceph
[12:59] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) has joined #ceph
[13:00] * gregmark (~Adium@68.87.42.115) has joined #ceph
[13:00] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) Quit (Quit: Leaving)
[13:06] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[13:07] * [arx] (~arx@the.kittypla.net) Quit (Ping timeout: 480 seconds)
[13:08] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[13:16] <MrBy> flesh: once you stop an osd, isn't it obvious that you have inconsistent pgs until it has recovered?
[13:20] * kuku (~kuku@112.203.53.134) has joined #ceph
[13:23] * kuku (~kuku@112.203.53.134) Quit (Remote host closed the connection)
[13:26] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[13:28] <flesh> recovery doesn't start. that's what I don't understand. But also, why is it obvious that it goes back to inconsistent pg??
[13:28] * chrisinajar (~HoboPickl@46.166.190.201) Quit ()
[13:29] <flesh> after pg repair the PG is active and ok again. So recovery should go just fine without inconsistent states
[13:29] <MrBy> flesh: recovery does not start immediately, it starts after a timeout of X seconds, which is good, because the osd "down" might come back within X seconds
[13:29] <flesh> ok...
[13:32] <flesh> I have been for more than a minute now with this state http://pastebin.com/5LDfgfsz
[13:33] <flesh> do you know if the inconsistent state prevents recovery to start ? otherwise I don't understand...
[13:33] <MrBy> flesh: the health_err state is because of the scrub error.
[13:34] <MrBy> flesh: https://www.sebastien-han.fr/blog/2015/04/27/ceph-manually-repair-object/
[13:34] <MrBy> have you checked the log file of the broken pg?
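A minimal sketch of that workflow for the PG in question (2.ae), assuming default log paths; the list-inconsistent-obj step only exists on Jewel and newer:

    # look for the object the scrub flagged
    grep -Hn 'ERR' /var/log/ceph/ceph-osd.*.log
    # re-run a deep scrub on the PG and, on Jewel+, list what is inconsistent
    ceph pg deep-scrub 2.ae
    rados list-inconsistent-obj 2.ae --format=json-pretty
    # then ask the primary OSD to repair it
    ceph pg repair 2.ae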
[13:34] * jarrpa (~jarrpa@2602:3f:e183:a600:eab1:fcff:fe47:f680) has joined #ceph
[13:35] <flesh> MrBy thanks, I already had a look at that. My problem is on osd logs there is no ERR to be found
[13:36] <MrBy> flesh: you checked osd.11 and osd.4 ?
[13:36] <flesh> however, recovery just started! 5 minutes later... strange... I still think something is off in this cluster.
[13:36] <flesh> Yep. I checked both logs. and no trace of ERR messages
[13:37] <flesh> I will see how this recovery process finishes...
[13:39] <flesh> grep -Hn 'ERR' /var/log/ceph/ceph-osd.*.log doesn't return anything in any case
[13:41] * johnavp19891 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) has joined #ceph
[13:41] <- *johnavp19891* To prove that you are human, please enter the result of 8+3
[13:41] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[13:42] <MrBy> flesh: what does ceph -s now looks like?
[13:43] <flesh> MrBy http://pastebin.com/z7NEUg3x
[13:47] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[13:47] * rburkholder (~overonthe@199.68.193.62) has joined #ceph
[13:47] <flesh> MrBy, do you think the inconsistent state will be fixed after recovery? or.. if ceph pg repair will work again? Since I can't really find any problematic objects, it's hard for me to see how else I could fix it
[13:48] * dynamicudpate (~overonthe@199.68.193.62) has joined #ceph
[13:48] * garphy is now known as garphy`aw
[13:48] <MrBy> flesh: yes, but i am not sure about the scrub error
[13:48] <flesh> I thought they came hand in hand
[13:49] <MrBy> flesh: maybe
[13:49] * johnavp1989 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[13:49] <flesh> in my case I haven't seen the one without the other
[13:49] * kefu (~kefu@114.92.96.253) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[13:50] <MrBy> flesh: I don't have experience with all ceph error states yet. Haven't had scrub errors yet, so I can't tell you precisely ;)
[13:50] <MrBy> flesh: fingers crossed
[13:50] <flesh> then I am not so sure if putting down osd.11 is the best option. I might get stuck with this 1 scrub error
[13:50] <flesh> hehe
[13:50] <flesh> thanks
[13:50] <MrBy> flesh: wait until recovery completes first
[13:50] <flesh> ok
[13:50] <MrBy> flesh: should not last that long
[13:51] * wewe0901 (uid146646@id-146646.tooting.irccloud.com) has joined #ceph
[13:52] <boolman_> what config options do I need to change if it takes a super long time (several minutes) before the OSDs get marked as DOWN when an entire node crashes? during that time all my vm guests hang on reads/writes
[13:55] * ira (~ira@121.244.87.117) Quit (Quit: Leaving)
[13:58] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[14:00] <MrBy> boolman_: http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/
[14:01] <boolman_> MrBy: yeah i read that one, but it's not entirely clear what is supposed to mark the osd down and which config parameter controls it, since it doesn't seem to work very well
[14:03] * aj__ (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[14:03] <boolman_> i have some things in prod on this cluster so I can't really wait forever
[14:03] * dnunez (~dnunez@209-6-91-147.c3-0.smr-ubr1.sbo-smr.ma.cable.rcn.com) has joined #ceph
[14:04] <flesh> MrBy recovery isn't done yet, but the inconsistent PG has been backfilled and there's still 1 pg inconsistent / 1 scrub error :(
[14:05] <flesh> boolman_ it seems like mon osd report timeout. Did you try that one?
[14:07] <boolman_> flesh: is that setting set on the monitor nodes?
[14:07] <boolman_> or do i need to set it on the osd nodes as well
[14:08] <boolman_> i believe I tried it on the monitor nodes, but it didn't make any difference
[14:09] <flesh> it is a monitor option, yes
[14:09] * dnunez (~dnunez@209-6-91-147.c3-0.smr-ubr1.sbo-smr.ma.cable.rcn.com) Quit (Remote host closed the connection)
[14:09] <flesh> boolman_ what do the logs say?
[14:09] * zdzichu (zdzichu@pipebreaker.pl) has joined #ceph
[14:10] <flesh> how long has the osd been marked as down?
[14:10] <flesh> sorry. has been down
[14:11] * aj__ (~aj@fw.gkh-setu.de) has joined #ceph
[14:11] <boolman_> flesh: I dont have any logs saved unfortunately, will have to try it again
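For boolman_'s several-minutes-to-mark-down problem, a hedged sketch of the settings usually involved, written as a ceph.conf fragment from the shell. The numbers shown are roughly the stock defaults rather than tuning advice; lowering the heartbeat grace speeds up detection but invites flapping:

    # append to ceph.conf on the relevant nodes, then restart the daemons
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [osd]
    # peers ping each other this often and report a peer down after the grace period
    osd heartbeat interval = 6
    osd heartbeat grace = 20
    [mon]
    # how many distinct OSDs must report a peer down before the mons believe it
    mon osd min down reporters = 2
    # mark OSDs down if they stop reporting to the mons entirely (whole node gone)
    mon osd report timeout = 900
    # how long a "down" OSD waits before being marked "out" and rebalancing starts
    mon osd down out interval = 600
    EOF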
[14:16] * rwheeler (~rwheeler@121.244.87.118) Quit (Quit: Leaving)
[14:22] * jfaj (~jan@p578E6A67.dip0.t-ipconnect.de) Quit (Quit: WeeChat 1.5)
[14:28] * jfaj (~jan@p20030084AF2C62005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) has joined #ceph
[14:29] * Racpatel (~Racpatel@2601:87:0:24af::53d5) has joined #ceph
[14:36] * m0zes (~mozes@ns1.beocat.ksu.edu) Quit (Remote host closed the connection)
[14:36] * m0zes (~mozes@ns1.beocat.ksu.edu) has joined #ceph
[14:36] * jrowe (~jrowe@pool-173-67-1-110.bltmmd.fios.verizon.net) has joined #ceph
[14:39] * garphy`aw is now known as garphy
[14:40] * b0e (~aledermue@213.95.25.82) has joined #ceph
[14:42] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[14:44] * sileht (~sileht@gizmo.sileht.net) Quit (Quit: WeeChat 1.5)
[14:45] * johnavp1989 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) has joined #ceph
[14:45] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[14:46] * sileht (~sileht@gizmo.sileht.net) has joined #ceph
[14:47] * baojg (~baojg@61.135.155.34) Quit (Ping timeout: 480 seconds)
[14:47] * baojg (~baojg@61.135.155.34) has joined #ceph
[14:48] * jrowe (~jrowe@pool-173-67-1-110.bltmmd.fios.verizon.net) Quit (Remote host closed the connection)
[14:49] * jrowe (~jrowe@204.14.236.152) has joined #ceph
[14:52] * johnavp19891 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[14:53] * dnunez (~dnunez@209-6-91-147.c3-0.smr-ubr1.sbo-smr.ma.cable.rcn.com) has joined #ceph
[14:57] * karnan (~karnan@106.51.141.85) has joined #ceph
[15:02] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) has joined #ceph
[15:03] * vend3r (~Nanobot@104.156.240.133) has joined #ceph
[15:06] * johnavp1989 (~jpetrini@pool-100-14-10-2.phlapa.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[15:08] * kees_ (~kees@2001:610:600:8774:681d:d581:d297:aae3) Quit (Quit: Leaving)
[15:10] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[15:14] * GeoTracer (~Geoffrey@41.77.153.99) Quit (Ping timeout: 480 seconds)
[15:14] * GeoTracer (~Geoffrey@41.77.153.99) has joined #ceph
[15:20] * dougf (~dougf@96-38-99-179.dhcp.jcsn.tn.charter.com) Quit (Quit: bye)
[15:20] * dougf (~dougf@96-38-99-179.dhcp.jcsn.tn.charter.com) has joined #ceph
[15:21] * mattbenjamin (~mbenjamin@12.118.3.106) has joined #ceph
[15:23] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[15:24] * m0zes (~mozes@ns1.beocat.ksu.edu) Quit (Quit: WeeChat 1.5)
[15:24] * m0zes (~mozes@ns1.beocat.ksu.edu) has joined #ceph
[15:26] * karnan (~karnan@106.51.141.85) Quit (Ping timeout: 480 seconds)
[15:26] * salwasser (~Adium@72.246.3.14) has joined #ceph
[15:26] * salwasser (~Adium@72.246.3.14) Quit ()
[15:26] * salwasser (~Adium@72.246.3.14) has joined #ceph
[15:27] * jfaj (~jan@p20030084AF2C62005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) Quit (Quit: WeeChat 1.5)
[15:27] * gregmark (~Adium@68.87.42.115) has joined #ceph
[15:33] * vend3r (~Nanobot@104.156.240.133) Quit ()
[15:37] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) has joined #ceph
[15:37] * DeMiNe0 (~DeMiNe0@104.131.119.74) Quit (Ping timeout: 480 seconds)
[15:40] * DeMiNe0 (~DeMiNe0@104.131.119.74) has joined #ceph
[15:43] * karnan (~karnan@106.51.137.133) has joined #ceph
[15:45] * m0zes (~mozes@ns1.beocat.ksu.edu) Quit (Quit: WeeChat 1.5)
[15:45] * m0zes (~mozes@ns1.beocat.ksu.edu) has joined #ceph
[15:47] * jfaj (~jan@p20030084AF2C62005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) has joined #ceph
[15:47] * jfaj (~jan@p20030084AF2C62005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) Quit ()
[15:48] * jfaj (~jan@p20030084AF2C62005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) has joined #ceph
[15:49] * MatTahaari (~mmattern@p578a8542.dip0.t-ipconnect.de) has joined #ceph
[15:50] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[15:55] * aj__ (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[15:55] * jfaj (~jan@p20030084AF2C62005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) Quit (Quit: WeeChat 1.5)
[15:56] <MatTahaari> Hello. Maybe someone can help me with ceph-deploy mon create-initial - if I try to create a monitor I get the error "Failed to execute command: systemctl enable ceph.target". It's on a Debian 8 with ceph 0.94.7 and ceph-deploy 1.5.34
[15:59] * jfaj (~jan@p20030084AF2C62005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) has joined #ceph
[15:59] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[15:59] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[16:02] <MrBy> flesh: how does it look like?
[16:02] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[16:03] <Hatsjoe> MatTahaari what does systemctl status and journalctl tell you?
[16:03] <Hatsjoe> Also checked the logs in /var/log/ceph?
[16:03] * bene3 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[16:04] * aj__ (~aj@fw.gkh-setu.de) has joined #ceph
[16:04] <flesh> MrBy recovery finished with 1 pg inconsistent, 1 scrub error. I ran ceph pg repair, and got HEALTH_OK. Just like before... I wonder now, if I take out the new primary OSD, whether that PG will go into an inconsistent state again (just as happened with osd.11)
[16:06] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[16:06] <flesh> i still think something is off... but at least now OSD.11 (with I/O errors) is out of the cluster
[16:07] * Annttu (annttu@0001934a.user.oftc.net) Quit (Remote host closed the connection)
[16:07] * Annttu (annttu@irc.annttu.fi) has joined #ceph
[16:07] * vimal (~vikumar@121.244.87.116) Quit (Quit: Leaving)
[16:08] * debian112 (~bcolbert@c-73-184-103-26.hsd1.ga.comcast.net) has joined #ceph
[16:09] <MatTahaari> Hatsjoe: status shows nothing about ceph, journalctl also. There is only one log for the new monitor and it's empty.
[16:09] * tries (~tries__@2a01:2a8:2000:ffff:1260:4bff:fe6f:af91) Quit (Ping timeout: 480 seconds)
[16:10] * JCL (~JCL@ip68-96-196-245.lv.lv.cox.net) has joined #ceph
[16:10] <Hatsjoe> systemctl status ceph.target or systemctl status ceph-mon@<mon-id> must say something
[16:11] <Hatsjoe> Else it wouldn't fail
[16:13] * karnan (~karnan@106.51.137.133) Quit (Quit: Leaving)
[16:14] * debian112 (~bcolbert@c-73-184-103-26.hsd1.ga.comcast.net) Quit (Quit: Leaving.)
[16:15] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) has joined #ceph
[16:16] <MatTahaari> Ah, ok. Sorry misunderstood that. Status shows "ceph.target Loaded: not-found (Reason: No such file or directory) Active: inactive (dead)"
[16:17] * tries (~tries__@2a01:2a8:2000:ffff:1260:4bff:fe6f:af91) has joined #ceph
[16:18] <MrBy> flesh: i see, if you don't get rid of the scrub error you might want to write to the mailing list
[16:18] <flesh> MrBy the scrub error is gone together with the inconsistent PG
[16:19] <flesh> in my case, they both appeared at the same time... and disappeared at the same time :)
[16:19] <MrBy> flesh: oic, so everything is healthy again, perfect
[16:19] <flesh> yep. thanks a lot
[16:20] * xarses_ (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:20] <Hatsjoe> MatTahaari there you go, now you know why it would fail, so now you can fix the issue, could you try `systemctl enable ceph.target` and if that doesnt work, try `systemctl enable ceph-mon@<mon-id>`
[16:21] * sudocat (~dibarra@192.185.1.20) Quit (Quit: Leaving.)
[16:21] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[16:21] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[16:22] <MatTahaari> both give me the error "Failed to execute operation: No such file or directory"
[16:30] * kefu (~kefu@183.193.165.164) has joined #ceph
[16:30] <Hatsjoe> Did you do ceph-deploy install first? i.e. is ceph installed? Also, did the other steps of `ceph-deploy mon create-initial` complete successfully?
[16:31] <Hatsjoe> And just to be sure, did you perform those systemctl commands on the same server which was the target for the ceph-deploy command?
[16:31] * johnavp1989 (~jpetrini@8.39.115.8) has left #ceph
[16:31] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[16:31] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[16:34] <MatTahaari> Yes. I installed all nodes with ceph-deploy install and then created a cluster with ceph-deploy new. Then I tried to create the first monitor on the same node I ran the commands from.
[16:34] * sankar (~Thunderbi@182.156.226.146) Quit (Ping timeout: 480 seconds)
[16:36] * batrick (~batrick@2600:3c00::f03c:91ff:fe96:477b) Quit (Quit: WeeChat 1.5)
[16:38] <Hatsjoe> Can you verify the mon data dir (/var/lib/ceph/mon/<mon-id>) contains any data? i.e. is installed properly
[16:38] <MatTahaari> What I found now is that there are no files named "ceph.target" anywhere in the filesystem. I put ceph.target and ceph-mon@.service from git into /lib/systemd/system and now ceph-deploy continues.
[16:39] <Hatsjoe> Now that I think of it, you installed hammer; I only have experience with Jewel, so it is possible systemd support is not available in hammer by default
[16:39] * newdave (~newdave@36-209-181-180.cpe.skymesh.net.au) Quit (Quit: Textual IRC Client: www.textualapp.com)
[16:40] <JohnPreston78> Hello everyone
[16:40] * batrick (~batrick@2600:3c00::f03c:91ff:fe96:477b) has joined #ceph
[16:40] <JohnPreston78> I have a few questions regarding scrubbing: 1. what is it that triggers scrubbing on a PG? 2. Do the scrub-related settings we have in the ceph.conf file work? Do they have to go in a specific section or can they go in the general one?
[16:42] * mykola (~Mikolaj@91.245.79.155) has joined #ceph
[16:43] <SamYaple> JohnPreston78: scrubbing is triggered based on time since last scrub and that is configurable in the ceph.conf, yes
[16:44] <JohnPreston78> SamYaple: thanks, I wanted to be 100% sure. So it is not at all triggered by what the clients do for IOps
[16:44] <SamYaple> configuring for scrubs is in the [osd] section i believe
[16:44] <SamYaple> not that im aware of
[16:45] <JohnPreston78> Looking at a ceph.conf example, it seems indeed that this is specifically osd config
[16:45] <JohnPreston78> which makes sense
[16:45] <JohnPreston78> thanks SamYaple
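A hedged ceph.conf sketch of the scrub scheduling knobs SamYaple is referring to, again written from the shell; intervals are in seconds and the values shown are the commonly documented defaults, so verify against your release:

    cat >> /etc/ceph/ceph.conf <<'EOF'
    [osd]
    # a PG is scrubbed at most every min interval (when load allows)
    # and at least every max interval regardless of load
    osd scrub min interval = 86400
    osd scrub max interval = 604800
    osd deep scrub interval = 604800
    # skip scheduled scrubs while the host load average is above this
    osd scrub load threshold = 0.5
    EOF

Manual scrubs can still be triggered per PG with ceph pg scrub <pgid> or ceph pg deep-scrub <pgid>, independent of this schedule.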
[16:49] <MatTahaari> Hatsjoe: found it. It seems that hammer supports systemd. The hammer branch in git has files for that under "ceph/systemd". Copied them to /lib/systemd/system and now ceph-deploy works. Monitors are running. Thanks for your help.
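A minimal shell sketch of the fix MatTahaari describes; the branch name, unit paths and the mon instance name are assumptions reconstructed from the messages above:

    # copy the units shipped in the hammer branch into systemd's unit directory
    git clone --branch hammer --depth 1 https://github.com/ceph/ceph.git
    cp ceph/systemd/ceph.target ceph/systemd/ceph-mon@.service /lib/systemd/system/
    systemctl daemon-reload
    # enable and start the monitor instance (the instance name is normally the short hostname)
    systemctl enable ceph.target
    systemctl enable ceph-mon@$(hostname -s)
    systemctl start ceph-mon@$(hostname -s)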
[16:52] * swami1 (~swami@49.38.1.162) Quit (Quit: Leaving.)
[16:53] * xarses_ (~xarses@64.124.158.192) has joined #ceph
[17:00] * karnan (~karnan@106.51.137.133) has joined #ceph
[17:00] * karnan (~karnan@106.51.137.133) Quit ()
[17:02] * wushudoin (~wushudoin@2601:646:8281:cfd:2ab2:bdff:fe0b:a6ee) has joined #ceph
[17:02] * wushudoin (~wushudoin@2601:646:8281:cfd:2ab2:bdff:fe0b:a6ee) Quit ()
[17:03] * wushudoin (~wushudoin@2601:646:8281:cfd:2ab2:bdff:fe0b:a6ee) has joined #ceph
[17:04] <Hatsjoe> Excellebnt, good luck MatTahaari
[17:04] <Hatsjoe> s/Excellebnt/Excellent
[17:10] * MatTahaari (~mmattern@p578a8542.dip0.t-ipconnect.de) Quit (Quit: Ex-Chat)
[17:11] * Nats_ (~natscogs@114.31.195.238) has joined #ceph
[17:12] * Nats (~natscogs@114.31.195.238) Quit (Ping timeout: 480 seconds)
[17:18] * shaunm (~shaunm@74.83.215.100) has joined #ceph
[17:26] * analbeard (~shw@support.memset.com) Quit (Ping timeout: 480 seconds)
[17:30] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[17:30] * branto (~branto@178-253-133-229.3pp.slovanet.sk) Quit (Quit: Leaving.)
[17:34] * joshd1 (~jdurgin@2602:30a:c089:2b0:90e:4e0d:6342:2f69) has joined #ceph
[17:34] * md_ (~john@205.233.53.45) has joined #ceph
[17:35] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) has joined #ceph
[17:35] * blizzow (~jburns@50.243.148.102) has joined #ceph
[17:35] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[17:42] * kefu is now known as kefu|afk
[17:44] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) has joined #ceph
[17:44] * DanFoster (~Daniel@2a00:1ee0:3:1337:8983:aec3:c6fb:7848) Quit (Quit: Leaving)
[17:50] * md_ (~john@205.233.53.45) Quit (Remote host closed the connection)
[17:51] * md_ (~john@205.233.53.42) has joined #ceph
[17:52] * rraja (~rraja@121.244.87.117) Quit (Quit: Leaving)
[17:53] * rotbeard (~redbeard@2a02:908:df13:bb00:bc0e:1b4d:b663:a8a4) has joined #ceph
[17:53] * sankar (~Thunderbi@182.156.226.146) has joined #ceph
[17:53] * sudocat (~dibarra@192.185.1.20) Quit (Read error: Connection reset by peer)
[17:54] * sudocat1 (~dibarra@192.185.1.20) has joined #ceph
[17:54] * tsg (~tgohad@134.134.139.78) has joined #ceph
[17:57] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[18:00] * bearkitten (~bearkitte@cpe-76-172-86-115.socal.res.rr.com) Quit (Quit: WeeChat 1.5)
[18:04] * bearkitten (~bearkitte@cpe-76-172-86-115.socal.res.rr.com) has joined #ceph
[18:04] * rakeshgm (~rakesh@121.244.87.117) Quit (Quit: Leaving)
[18:07] * post-factum (~post-fact@vulcan.natalenko.name) Quit (Killed (NickServ (Too many failed password attempts.)))
[18:07] * post-factum (~post-fact@vulcan.natalenko.name) has joined #ceph
[18:09] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[18:11] * sankar (~Thunderbi@182.156.226.146) Quit (Ping timeout: 480 seconds)
[18:20] * ntpttr_ (~ntpttr@134.134.139.78) has joined #ceph
[18:21] * ntpttr_ (~ntpttr@134.134.139.78) Quit ()
[18:24] * rotbeard (~redbeard@2a02:908:df13:bb00:bc0e:1b4d:b663:a8a4) Quit (Quit: Leaving)
[18:26] * jfaj (~jan@p20030084AF2C62005EC5D4FFFEBB68A4.dip0.t-ipconnect.de) Quit (Quit: WeeChat 1.5)
[18:28] * ntpttr (~ntpttr@134.134.139.77) Quit (Remote host closed the connection)
[18:29] * ntpttr (~ntpttr@134.134.139.77) has joined #ceph
[18:29] * ntpttr_ (~ntpttr@fmdmzpr03-ext.fm.intel.com) has joined #ceph
[18:33] * reed (~reed@216.38.134.18) has joined #ceph
[18:34] * Szernex (~rhonabwy@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[18:42] * jermudgeon (~jhaustin@gw1.ttp.biz.whitestone.link) has joined #ceph
[18:46] * aNupoisc (~adnavare@192.55.54.43) has joined #ceph
[18:48] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[18:51] * [arx] (~arx@six.happyforever.com) has joined #ceph
[18:52] * flesh (~oftc-webi@static.ip-171-033-130-093.signet.nl) Quit (Quit: Page closed)
[18:52] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[18:54] * bvi (~Bastiaan@185.56.32.1) Quit (Quit: Leaving)
[18:55] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[19:03] * bniver (~bniver@pool-173-48-58-27.bstnma.fios.verizon.net) Quit (Remote host closed the connection)
[19:04] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) has joined #ceph
[19:04] * Szernex (~rhonabwy@61TAAA0N0.tor-irc.dnsbl.oftc.net) Quit ()
[19:05] * joshd1 (~jdurgin@2602:30a:c089:2b0:90e:4e0d:6342:2f69) Quit (Quit: Leaving.)
[19:09] * evelu (~erwan@46.231.131.178) Quit (Ping timeout: 480 seconds)
[19:11] * linuxkidd (~linuxkidd@ip70-189-207-54.lv.lv.cox.net) Quit (Ping timeout: 480 seconds)
[19:16] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[19:22] * dcwangmit01 (~dcwangmit@162-245.23-239.PUBLIC.monkeybrains.net) has joined #ceph
[19:24] * sudocat1 (~dibarra@192.185.1.20) Quit (Quit: Leaving.)
[19:24] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[19:26] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:26] * linuxkidd (~linuxkidd@166.170.45.244) has joined #ceph
[19:27] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[19:28] * walcubi (~walcubi@p5797A96F.dip0.t-ipconnect.de) Quit (Quit: Leaving)
[19:31] * kefu (~kefu@114.92.96.253) has joined #ceph
[19:34] * ron-slc (~Ron@173-165-129-118-utah.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[19:36] * kefu|afk (~kefu@183.193.165.164) Quit (Ping timeout: 480 seconds)
[19:43] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Quit: Leaving.)
[19:45] * aj__ (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[19:45] * RaidSoft (~brannmar@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[19:47] * georgem (~Adium@108.161.124.52) has joined #ceph
[19:47] <blizzow> I was thinking of having two ceph pools running, one based on SSDs and one based on spinner drives. I see an article about altering the crush map to do so, but am wondering if it may be better to just set up two different ceph clusters on the same network. The spinners would be for an object gateway (a la s3); the SSDs would be mainly for hypervisors to access VM images.
[19:47] <blizzow> Anyone here have recommendations?
[19:49] <blizzow> I'm also wondering how performance scales with more drives, I have one machine with 8 spinner drives, and 7 machines with 4 spinners each. Each machine has two SSDs.
[19:49] <SamYaple> blizzow: you can absolutely split the crush map to ensure fast data is on fast storage and slow data is on slow storage.... but as someone who has done that I prefer setting up two clusters
[19:50] <SamYaple> however, if there is even the slightest _chance_ of wanting to move data between the two types of storage, use one cluster
[19:50] <SamYaple> it will just mess up your free space readings
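A hedged sketch of the single-cluster crush split SamYaple describes, using the pre-Luminous CLI; bucket names, hostnames, OSD ids, weights and PG counts are all placeholders to adapt:

    # two roots, one per media type
    ceph osd crush add-bucket ssd-root root
    ceph osd crush add-bucket hdd-root root
    # a host bucket per node under each root, with its OSDs placed beneath it
    # (repeat for every host and OSD; 1.0 is a placeholder crush weight)
    ceph osd crush add-bucket node1-ssd host
    ceph osd crush move node1-ssd root=ssd-root
    ceph osd crush create-or-move osd.0 1.0 host=node1-ssd root=ssd-root
    # one replicated rule per root, with host as the failure domain
    ceph osd crush rule create-simple ssd-rule ssd-root host
    ceph osd crush rule create-simple hdd-rule hdd-root host
    # pools pinned to each rule
    ceph osd pool create vm-images 512 512 replicated ssd-rule
    ceph osd pool create rgw-data 1024 1024 replicated hdd-rule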
[19:53] * mykola (~Mikolaj@91.245.79.155) Quit (Ping timeout: 480 seconds)
[19:54] * kefu (~kefu@114.92.96.253) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[19:55] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[19:56] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) has joined #ceph
[19:57] <blizzow> SamYaple: I'm not particularly worried about transferring data between different ceph clusters. I am struggling a little with making my hypervisors happily know how to differentiate between pools.
[19:57] * Miouge_ (~Miouge@109.128.94.173) has joined #ceph
[19:58] * mykola (~Mikolaj@193.93.217.39) has joined #ceph
[19:58] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Read error: Connection reset by peer)
[19:59] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[19:59] <SamYaple> blizzow: libvirt should allow multiple secrets defined
[19:59] <SamYaple> however if we are talking about something like openstack, it doesn't support multiple ceph clusters
[20:00] * Miouge (~Miouge@109.128.94.173) Quit (Ping timeout: 480 seconds)
[20:00] * Miouge_ is now known as Miouge
[20:00] <SamYaple> well, technically you could kind of make it work, but you don't want to do it that way
[20:00] * davidzlap (~Adium@cpe-172-91-154-245.socal.res.rr.com) has joined #ceph
[20:01] * Hemanth (~hkumar_@27.34.254.131) has joined #ceph
[20:01] * Hemanth (~hkumar_@27.34.254.131) Quit ()
[20:01] * Hemanth (~hkumar_@27.34.254.131) has joined #ceph
[20:03] * EthanL (~lamberet@cce02cs4036-fa12-z.ams.hpecore.net) has joined #ceph
[20:07] * sudocat (~dibarra@192.185.1.20) Quit (Read error: Connection reset by peer)
[20:09] <blizzow> Any ideas what kind of read/write performance I should have with 3 OSDs (1 with 8 SATA drives and 2 with 4 sata drives)? 10Gbe network. 16GB RAM per OSD.
[20:10] <SamYaple> blizzow: osds are the drives themselves, not nodes
[20:10] * fdmanana (~fdmanana@bl12-226-64.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[20:10] <SamYaple> 8 sata drives == 8 osds
[20:11] <SamYaple> performance is going to depend on disk speed, configuration, network, and hypervisor
[20:11] * EthanL (~lamberet@cce02cs4036-fa12-z.ams.hpecore.net) Quit (Read error: Connection reset by peer)
[20:14] * swami1 (~swami@27.7.167.243) has joined #ceph
[20:15] * RaidSoft (~brannmar@61TAAA0PP.tor-irc.dnsbl.oftc.net) Quit ()
[20:15] <blizzow> I'm just trying to get a ballpark on speed and if performance increases somewhat linearly with more drives.
[20:16] <SamYaple> it's so dependent on so many factors it's hard to say. best to test
[20:17] <SamYaple> for example, are we talking about raw throughput? iops? which hypervisor? which driver for the hypervisor? same type disks throughout? networking?
[20:17] <SamYaple> so many questions
[20:17] <m0zes> individual client throughput increases a bit with each added osd (to a point) but the aggregate supported throughput increases linearly.
[20:20] <blizzow> raw throughput, just a ballpark of how many MB/s can I write to the cluster? KVM based hypervisor using librbd. 10GBe network, 7200 RPM SATA drives.
[20:22] * bniver (~bniver@pool-173-48-58-27.bstnma.fios.verizon.net) has joined #ceph
[20:23] <SamYaple> from a single vm or aggregate vms? what type of data drives? how many? how many nodes are these data drives spread across?
[20:23] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[20:24] <SamYaple> it's definitely estimable, but way more info is needed
[20:24] <SamYaple> also throughput in MB/s isn't really a good measure of speed. iops are very important as well
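On the "best to test" point, a minimal rados bench sketch for a raw-throughput baseline; the pool name and durations are placeholders, and this should only be run against a throwaway pool:

    # 60 seconds of 4 MB object writes, keeping the objects for the read tests
    rados bench -p bench-pool 60 write --no-cleanup
    # sequential and random reads of the objects written above
    rados bench -p bench-pool 60 seq
    rados bench -p bench-pool 60 rand
    # remove the benchmark objects afterwards
    rados -p bench-pool cleanup

For the guest-visible path, rbd bench-write against a test image, or fio with its rbd engine inside a VM, gives numbers closer to what the hypervisors will actually see.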
[20:25] * Nacer (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[20:26] * Nacer (~Nacer@176.31.89.99) has joined #ceph
[20:30] * georgem (~Adium@108.161.124.52) Quit (Quit: Leaving.)
[20:30] * Miouge (~Miouge@109.128.94.173) Quit (Quit: Miouge)
[20:31] * georgem (~Adium@108.161.124.52) has joined #ceph
[20:32] * swami1 (~swami@27.7.167.243) Quit (Quit: Leaving.)
[20:33] * georgem (~Adium@108.161.124.52) Quit ()
[20:34] * Miouge (~Miouge@109.128.94.173) has joined #ceph
[20:34] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[20:34] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[20:35] * Nacer (~Nacer@176.31.89.99) Quit (Read error: Connection reset by peer)
[20:35] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[20:35] * Nacer (~Nacer@176.31.89.99) has joined #ceph
[20:39] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Quit: Ex-Chat)
[20:42] * sudocat (~dibarra@192.185.1.20) has joined #ceph
[20:42] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) has joined #ceph
[20:46] * Nacer (~Nacer@176.31.89.99) Quit (Ping timeout: 480 seconds)
[20:46] * sudocat1 (~dibarra@192.185.1.20) has joined #ceph
[20:47] * Nacer_ (~Nacer@pai34-5-88-176-168-157.fbx.proxad.net) Quit (Remote host closed the connection)
[20:47] * Nacer (~Nacer@176.31.89.99) has joined #ceph
[20:48] * vbellur (~vijay@nat-pool-bos-u.redhat.com) has joined #ceph
[20:50] * rendar (~I@host216-43-dynamic.31-79-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[20:53] * sudocat (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[21:10] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[21:11] * aNupoisc (~adnavare@192.55.54.43) Quit (Remote host closed the connection)
[21:16] * rendar (~I@host216-43-dynamic.31-79-r.retail.telecomitalia.it) has joined #ceph
[21:16] * Miouge (~Miouge@109.128.94.173) Quit (Quit: Miouge)
[21:17] * Hemanth (~hkumar_@27.34.254.131) Quit (Quit: Leaving)
[21:19] * Miouge (~Miouge@109.128.94.173) has joined #ceph
[21:27] * Miouge (~Miouge@109.128.94.173) Quit (Ping timeout: 480 seconds)
[21:31] * vbellur (~vijay@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[21:37] * JCL (~JCL@ip68-96-196-245.lv.lv.cox.net) Quit (Quit: Leaving.)
[21:42] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[21:45] * BManojlovic (~steki@cable-94-189-162-3.dynamic.sbb.rs) has joined #ceph
[21:52] <rkeene> I'm trying to upgrade to Ceph 10.2.2 -- my only client is QEMU. The VM running in QEMU can't access the disk backed by RBD after this upgrade (from 0.94.7). It just hangs in the open() system call
[21:54] <rkeene> Looks like this might be due to RBD locks -- do they prevent I/O from reaching the RBD now ?
[21:54] * aNupoisc (~adnavare@134.134.137.75) has joined #ceph
[21:55] * linuxkidd_ (~linuxkidd@ip70-189-207-54.lv.lv.cox.net) has joined #ceph
[22:02] * linuxkidd (~linuxkidd@166.170.45.244) Quit (Ping timeout: 480 seconds)
[22:06] * ntpttr (~ntpttr@134.134.139.77) Quit (Quit: Leaving)
[22:06] * cyphase (~cyphase@000134f2.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:07] * cyphase (~cyphase@000134f2.user.oftc.net) has joined #ceph
[22:15] <SamYaple> rkeene: they did something with some setting to indicate the number of clients that can map the rbd at once i believe
[22:16] <SamYaple> i seem to recall something about that
[22:16] <SamYaple> knowing your setup, id start there (might be a new option?)
[22:16] <rkeene> I think it's that the exclusive-lock feature is enabled by default now
[22:16] <rkeene> On newly created RBDs
[22:16] <SamYaple> that is true
[22:16] <rkeene> Testing that out
[22:16] <SamYaple> you can adjust the defaults still
[22:16] <rkeene> How do you specify that a default feature be omitted from "rbd create" ?
[22:16] <rkeene> I want all the other default features, just not this one
[22:16] <SamYaple> its something in ceph.conf lemme check
[22:17] <rkeene> I mean from the command-line
[22:17] <SamYaple> i do this on one of my clusters
[22:17] <SamYaple> oh
[22:17] <SamYaple> you can set the features you want explicitly
[22:17] <SamYaple> as in all of them
[22:17] <SamYaple> i think you have to use the bitmap notation though
[22:17] <rkeene> I want the defaults, minus one
[22:18] <rkeene> In case the defaults change in the future
[22:19] <SamYaple> i think if you do --image-features [] with an explicit list it doesn't install the defaults
[22:19] <SamYaple> would need to test that out
[22:20] <rkeene> Singular, --image-feature; --image-features is deprecated
[22:21] <rkeene> I think I can just remove the feature after the image is created
[22:21] <SamYaple> well image-features is deprecated, but image-feature can take multiple comma-separated values i believe
[22:22] <SamYaple> i dont remember. i dont use the cli for creation much
[22:22] <SamYaple> yea thats right
[22:23] <SamYaple> comma-separated works with a singular --image-feature
[22:25] <rkeene> Testing out just disabling that feature
[22:26] <rkeene> Takes a while for the installer to recompile and install and test, back in 20 minutes
[22:31] * aj__ (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[22:33] * aNupoisc (~adnavare@134.134.137.75) Quit (Remote host closed the connection)
[22:35] * Jeffrey4l_ (~Jeffrey@110.252.75.163) has joined #ceph
[22:38] * Jeffrey4l (~Jeffrey@110.244.236.183) Quit (Ping timeout: 480 seconds)
[22:40] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[22:51] * mykola (~Mikolaj@193.93.217.39) Quit (Quit: away)
[22:53] * aNupoisc (~adnavare@134.134.139.82) has joined #ceph
[22:53] * nhm (~nhm@c-50-171-139-246.hsd1.mn.comcast.net) Quit (Quit: Lost terminal)
[22:53] <rkeene> !@$1
[22:53] <rkeene> 2016-08-01 20:53:40.121528 7fb2f665cd80 -1 librbd: cannot disable exclusive lock
[22:54] <rkeene> "rbd feature disable rbd/blah exclusive-lock" fails in 10.2.2
[22:54] * nhm (~nhm@c-50-171-139-246.hsd1.mn.comcast.net) has joined #ceph
[22:54] * ChanServ sets mode +o nhm
[22:54] <SamYaple> i think that's because other enabled features _require_ exclusive-lock
[22:54] <SamYaple> what features are enabled?
[22:55] <rkeene> layering,object-map,fast-diff,deep-flatten
[22:56] <SamYaple> deep-flatten needs exclusive-lock i think...
[22:56] <SamYaple> no
[22:56] <rkeene> I removed deep-flatten, still fails in the same way
[22:56] <SamYaple> rkeene: object-map requires it
[22:56] <SamYaple> and fast-diff requires object-map
[22:57] <SamYaple> so disable fast-diff, then object-map, then exclusive-lock should come free
[22:57] <rkeene> Yes
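Pulling the exchange together, a minimal sketch using the image name from above: features come off in dependency order, a new image can be created with an explicit feature list, and the default feature set for future images can be pinned in ceph.conf (the [client] section placement and size below are assumptions; rbd default features = 1 means layering only):

    # strip features from an existing image, dependents first
    rbd feature disable rbd/blah fast-diff
    rbd feature disable rbd/blah object-map
    rbd feature disable rbd/blah exclusive-lock
    # or create an image with only the features you want (size in MB here)
    rbd create --size 10240 --image-feature layering rbd/newimage
    # and/or pin the defaults for future images via ceph.conf
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [client]
    rbd default features = 1
    EOF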
[23:02] * Nacer (~Nacer@176.31.89.99) Quit (Remote host closed the connection)
[23:20] * wak-work (~wak-work@2620:15c:202:0:b944:e257:ccb6:e02b) Quit (Remote host closed the connection)
[23:20] * wak-work (~wak-work@2620:15c:202:0:6deb:8e17:a5d1:8755) has joined #ceph
[23:23] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) Quit (Ping timeout: 480 seconds)
[23:29] * jermudgeon (~jhaustin@gw1.ttp.biz.whitestone.link) Quit (Quit: jermudgeon)
[23:30] * sudocat1 (~dibarra@192.185.1.20) Quit (Ping timeout: 480 seconds)
[23:47] * cronburg (~cronburg@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[23:48] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[23:48] * poningru (~evarghese@gw-sfo.plos.org) has joined #ceph
[23:49] <poningru> hello
[23:49] <poningru> http://docs.ceph.com/docs/hammer/rados/configuration/filesystem-recommendations/
[23:49] <poningru> is that doc current?
[23:49] <poningru> it says copyright 2014
[23:50] <poningru> trying to figure out if xfs or btrfs is the recommended file system now
[23:51] * BManojlovic (~steki@cable-94-189-162-3.dynamic.sbb.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[23:56] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[23:57] <rkeene> There we go, now everything is working
[23:58] * jermudgeon (~jhaustin@gw1.ttp.biz.whitestone.link) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.