#ceph IRC Log

IRC Log for 2016-06-13

Timestamps are in GMT/BST.

[0:02] * IvanJobs (~ivanjobs@183.192.78.179) Quit (Ping timeout: 481 seconds)
[0:06] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[0:07] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[0:11] * linjan_ (~linjan@176.195.88.134) has joined #ceph
[0:18] * linjan (~linjan@176.195.198.8) Quit (Ping timeout: 480 seconds)
[0:20] * tsuraan (~tsuraan@c-50-157-98-31.hsd1.mn.comcast.net) has joined #ceph
[0:23] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[0:24] <tsuraan> The cache tiering doc says that tiering will normally degrade performance, unless the caching behaviour is such that only a small number of objects are regularly touched. Is this "small number" dependent on the cache size? I'd like to have a few 400GiB NVMe drives as a replicated cache tier on top of some sort of LRC bulk pool running on spinning rust. Does that just mean that my active set needs to be able
[0:25] <tsuraan> to fit on the NVMe, or is there more to it than that?
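A minimal sketch of how such a tier is typically wired up and bounded, with hypothetical pool names (bulk_lrc as the erasure/LRC backing pool, nvme_cache as the replicated NVMe tier). Part of the answer to the question above is that target_max_bytes and cache_target_full_ratio are what keep the active set inside the NVMe capacity (divided by the cache pool's replica count):

    ceph osd tier add bulk_lrc nvme_cache
    ceph osd tier cache-mode nvme_cache writeback
    ceph osd tier set-overlay bulk_lrc nvme_cache
    ceph osd pool set nvme_cache hit_set_type bloom
    # bound the cache below (raw NVMe capacity / cache replica count); ~350 GiB is only an example value
    ceph osd pool set nvme_cache target_max_bytes $((350 * 1024 * 1024 * 1024))
    ceph osd pool set nvme_cache cache_target_full_ratio 0.8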
[0:40] * IvanJobs (~ivanjobs@183.192.78.179) has joined #ceph
[0:40] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) has joined #ceph
[0:48] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) Quit (Ping timeout: 480 seconds)
[0:58] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[1:04] * kuku (~oftc-webi@119.93.91.136.static.pldt.net) has joined #ceph
[1:11] * oms101 (~oms101@p20030057EA160D00C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:14] * dnunez (~dnunez@c-73-38-0-185.hsd1.ma.comcast.net) has joined #ceph
[1:20] * oms101 (~oms101@p20030057EA1A1000C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[1:24] * dnunez (~dnunez@c-73-38-0-185.hsd1.ma.comcast.net) Quit (Quit: Leaving)
[1:46] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) has joined #ceph
[1:48] * vbellur (~vijay@71.234.224.255) Quit (Quit: Leaving.)
[1:48] * vbellur (~vijay@71.234.224.255) has joined #ceph
[1:49] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[1:51] * FNugget1 (~osuka_@ded31663.iceservers.net) has joined #ceph
[1:52] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) has joined #ceph
[1:55] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) Quit (Ping timeout: 480 seconds)
[1:59] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) Quit (Quit: treenerd_)
[2:20] * FNugget1 (~osuka_@06SAADW1Z.tor-irc.dnsbl.oftc.net) Quit ()
[2:20] * Uniju1 (~tokie@exit1.ipredator.se) has joined #ceph
[2:48] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) has joined #ceph
[2:50] * Uniju1 (~tokie@7V7AAF4IK.tor-irc.dnsbl.oftc.net) Quit ()
[2:59] * DoDzy (~superdug@tor.piratenpartei-nrw.de) has joined #ceph
[3:00] * IvanJobs (~ivanjobs@183.192.78.179) Quit ()
[3:00] * huangjun (~kvirc@113.57.168.154) has joined #ceph
[3:05] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[3:19] * kefu (~kefu@183.193.187.151) has joined #ceph
[3:21] * Nicola-1980 (~Nicola-19@x4db48315.dyn.telefonica.de) has joined #ceph
[3:27] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) Quit (Remote host closed the connection)
[3:27] * Nicola-1_ (~Nicola-19@x4db4b000.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[3:29] * DoDzy (~superdug@7V7AAF4J6.tor-irc.dnsbl.oftc.net) Quit ()
[3:29] * Grum (~vegas3@nl9x.mullvad.net) has joined #ceph
[3:32] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) has joined #ceph
[3:37] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[3:43] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) Quit (Remote host closed the connection)
[3:43] * sebastian-w (~quassel@212.218.8.138) Quit (Remote host closed the connection)
[3:43] * sebastian-w (~quassel@212.218.8.138) has joined #ceph
[3:44] * yanzheng (~zhyan@118.116.113.63) has joined #ceph
[3:47] * kefu_ (~kefu@114.92.120.18) has joined #ceph
[3:49] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) has joined #ceph
[3:49] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[3:55] * kefu (~kefu@183.193.187.151) Quit (Ping timeout: 480 seconds)
[3:55] * derjohn_mobi (~aj@x590e5d1c.dyn.telefonica.de) has joined #ceph
[3:58] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) Quit (Remote host closed the connection)
[3:59] * Grum (~vegas3@7V7AAF4LC.tor-irc.dnsbl.oftc.net) Quit ()
[3:59] * Random (~sixofour@marylou.nos-oignons.net) has joined #ceph
[4:00] * Nats__ (~natscogs@114.31.195.238) has joined #ceph
[4:01] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Remote host closed the connection)
[4:03] * aj__ (~aj@x4db10af5.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[4:03] * EinstCrazy (~EinstCraz@58.247.117.134) has joined #ceph
[4:04] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) has joined #ceph
[4:07] * vbellur (~vijay@71.234.224.255) Quit (Read error: Connection reset by peer)
[4:07] * Penn (~penn11.li@14.154.254.12) has joined #ceph
[4:07] * Nats_ (~natscogs@114.31.195.238) Quit (Ping timeout: 480 seconds)
[4:08] * chengpeng (~chris@180.168.126.243) Quit (Quit: Leaving)
[4:08] * rongze (~rongze@117.136.67.195) has joined #ceph
[4:08] * chengpeng (~chris@180.168.170.2) has joined #ceph
[4:14] * rongze (~rongze@117.136.67.195) Quit (Read error: Connection reset by peer)
[4:14] * kuku (~oftc-webi@119.93.91.136.static.pldt.net) Quit (Ping timeout: 480 seconds)
[4:14] * rongze (~rongze@117.136.67.195) has joined #ceph
[4:29] * Random (~sixofour@06SAADW60.tor-irc.dnsbl.oftc.net) Quit ()
[4:30] * Schaap (~blank@torexit-readme.balist.es) has joined #ceph
[4:30] * rongze_ (~rongze@223.64.60.123) has joined #ceph
[4:32] * rongze (~rongze@117.136.67.195) Quit (Ping timeout: 480 seconds)
[4:34] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[4:35] * shyu (~shyu@218.241.172.114) has joined #ceph
[4:37] * vbellur (~vijay@2601:18f:700:55b0:5e51:4fff:fee8:6a5c) has joined #ceph
[4:48] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) has joined #ceph
[4:57] * kefu_ (~kefu@114.92.120.18) Quit (Max SendQ exceeded)
[4:58] * kefu (~kefu@114.92.120.18) has joined #ceph
[4:59] * Schaap (~blank@7V7AAF4NL.tor-irc.dnsbl.oftc.net) Quit ()
[4:59] * CoZmicShReddeR (~Epi@atlantic480.us.unmetered.com) has joined #ceph
[5:08] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[5:08] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) Quit (Remote host closed the connection)
[5:08] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) has joined #ceph
[5:10] * NTTEC (~nttec@112.198.77.21) has joined #ceph
[5:20] * NTTEC (~nttec@112.198.77.21) Quit (Ping timeout: 480 seconds)
[5:22] <chengpeng> anyone run radosgw with nginx instead of apache or civetweb?
[5:23] * kefu (~kefu@114.92.120.18) Quit (Max SendQ exceeded)
[5:23] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[5:24] * kefu (~kefu@114.92.120.18) has joined #ceph
[5:29] * CoZmicShReddeR (~Epi@06SAADW9B.tor-irc.dnsbl.oftc.net) Quit ()
[5:30] * VampiricPadraig (~Rens2Sea@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[5:32] * Vacuum_ (~Vacuum@88.130.220.118) has joined #ceph
[5:34] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[5:37] * EinstCra_ (~EinstCraz@58.247.117.134) has joined #ceph
[5:37] * EinstCrazy (~EinstCraz@58.247.117.134) Quit (Read error: Connection reset by peer)
[5:37] * NTTEC (~nttec@203.177.235.23) has joined #ceph
[5:39] * Vacuum__ (~Vacuum@88.130.210.216) Quit (Ping timeout: 480 seconds)
[5:41] * NTTEC (~nttec@203.177.235.23) Quit (Remote host closed the connection)
[5:42] * XCat (~XCat@80.232.241.112) has joined #ceph
[5:43] <XCat> Hi all,
[5:45] <XCat> I found a very suspicious benchmark by virtuozzo stating that their storage is sometimes 10x faster. Can anyone comment on whether such a thing can be so, or if it's just their marketing.
[5:48] <XCat> could this be because they were testing on 1gbit? I am planning also to host storage for containers with ceph, but not on 1gbit of course.
[5:49] <iggy> you could probably detune and/or find 1 specific test where one type of storage is faster than another
[5:50] <iggy> you really need to test your own workload
[5:50] <XCat> the link to their "test" is https://virtuozzo.com/wp-content/uploads/2016/05/Virtuzzo-Storage-v-Ceph-Tech-Note-20160509.pdf
[5:50] <iggy> but there are reasons why a large majority of the openstack deployments in the wild use ceph
[5:51] <iggy> their site is wordpress... automatic grounds for dismissal
[5:51] <XCat> :D
[5:52] <XCat> i have already decided to use ceph, just not ready for production level workload testing yet.
[5:55] <iggy> yeah, I wouldn't guess that kind of hardware was particularly well suited for a ceph cluster
[5:55] <iggy> that's a lot of spinners and no journals
[5:57] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) has joined #ceph
[5:57] <iggy> it also sounds like they don't have as strong of data guarantees as ceph
[5:58] <XCat> i did stumble on them because they do claim that some of their customers have 8000 vms or something like that
[5:59] * iggy slow claps
[5:59] <iggy> call me when they've got 25k systems to deal with
[5:59] * VampiricPadraig (~Rens2Sea@7V7AAF4PW.tor-irc.dnsbl.oftc.net) Quit ()
[6:00] * Harryhy (~poller@hessel0.torservers.net) has joined #ceph
[6:00] <iggy> it doesn't really say how they are accessing ceph either
[6:00] <XCat> in my case i just want to use ceph as storage backend for customer's VPS servers, for reliability
[6:01] * vata (~vata@cable-192.222.249.207.electronicbox.net) Quit (Quit: Leaving.)
[6:01] <XCat> because i have a problem with the current 'standard cheap VPS' paradigm that on hardware failure you either lose data or have to use an hour-old backup with inconsistent state.
[6:01] <iggy> I mean they say _they_ use fuse... does that mean they are using ceph's fuse adapter? if so... definitely not a fair test
[6:02] <iggy> oh, there's some SSDs... 2 for 36 spinners... sounds like they bought hardware designed for their setup and shoehorned ceph onto it to try to look good
[6:02] <XCat> i had suspicions about the test, thats why i decided to come here and ask someone who actually knows :)
[6:02] <XCat> so if i have ssd for each node i should be ok? :)
[6:02] <iggy> my storageops guy could probably rip tons more holes in that benchmark
[6:03] <iggy> SSDs depends on how many spinners you have
[6:04] <XCat> can one ask an upside-down question - how many spinners would a single 1.2TB PCIE nvme ssd "support"?
[6:05] <XCat> they say it has 400k read and 300k write iops
[6:05] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[6:06] <iggy> that's also a pretty old version of ceph
[6:07] <iggy> nvme's are still new enough that I'm not sure there's a ratio worked out yet
[6:07] * overclk (~quassel@117.202.96.223) has joined #ceph
[6:07] <XCat> it has not yet arrived in the mail
[6:08] <iggy> how many spinners do you have per box?
[6:08] <XCat> so i dont even know what the real world iops are
[6:08] <XCat> at the moment i have 6 2.5" bays available for spinners per box
[6:10] <XCat> however i don't have all of the actual spinners yet, nor the ssd yet. But im still curious about the best practice ratio
[6:10] <XCat> i saw different numbers in different ceph presentations
[6:11] <iggy> we have 1:7 and it's rough... not performance wise, that seems fine, but failure wise
[6:12] <XCat> the problem is that most of the hardware we have is very old like 2010 dell servers
[6:12] <iggy> if we lose 1 SSD (we've lost 2 already in less than a year) the hit to the entire cluster is pretty rowdy
[6:12] <XCat> they do have 2x6core xeons, and still do work quite well, so we don't throw them out yet
[6:12] <iggy> 610's?
[6:12] <XCat> yes
[6:12] <XCat> but it has only 1 free pci slot for the ssd
[6:13] <iggy> lol, we have a few of those laying around too
[6:13] <XCat> as the other one is taken by the connectx-3
[6:13] <jamespd_> <-- just commissioned a 12 x SATA + 1 x nvme node.
[6:13] <jamespd_> don't have any numbers yet :|
[6:13] * rongze_ (~rongze@223.64.60.123) Quit (Remote host closed the connection)
[6:14] <XCat> what if i add 1 sata ssd in addition to the 1 nvme ssd
[6:14] <XCat> however both ssds would be very different in performance
[6:14] <XCat> so then i have 2 ssds and 5 spinners per node
[6:14] <iggy> depends on what you're after, and what size cluster you are planning
[6:15] <iggy> 1 nvme can definitely handle 6 spinners
[6:15] <iggy> performance wise
[6:15] <XCat> experimental size, small for starters
[6:15] <XCat> like no more than 5-6 nodes, to see how it goes
[6:15] <iggy> but if you have a small cluster, that makes losing an nvme akin to losing a box and having to rebuild all that data
[6:16] <XCat> so when i read about ceph with thousands of nodes i wonder whether those apply to me
[6:16] <iggy> is this being deployed or poc?
[6:17] <XCat> i want to see if it works
[6:17] <XCat> if it does for a while and i cant easily break it, i will try to bring it in production
[6:18] <iggy> like I said, performance wise, you should be okay
[6:18] <XCat> thank you for your help :)
[6:18] <iggy> it's the failure scenarios that would scare me
[6:18] <XCat> so what 2ssds + 5 spinners ?
[6:18] <XCat> what happens if 1 ssd is nvme and other is sata ?
[6:19] <iggy> it really depends on what you think you can survive
[6:19] <XCat> how would 2 ssd journals with 2x speed difference affect ceph ?
[6:19] <iggy> my next ceph cluster is going to be for internal devs, so I'd be fine with 1 nvme to 6 spinners
[6:19] <iggy> you'd basically be limited to the slowest one probably
[6:21] <XCat> thats what i thought
[6:21] <iggy> I mean you could do it that way and if you don't have an issue with rebuilding a whole box, you could always turn the SSDs into a separate SSD pool for customers that want to pay the extra $
[6:21] <iggy> or something
[6:22] <iggy> the bad thing is there's no definitive answer
[6:22] <XCat> my initial plan was to use the nvme for similar purposes, in addition to ceph
[6:22] <iggy> you'll likely have to test and see what you get
[6:22] <XCat> depending how much of the nvme ceph will eat during testing
[6:22] <XCat> as 400k iops in the nvme datasheet seem a lot
[6:23] <iggy> also depends what your required performance is
[6:24] <XCat> at the moment i have VPS with local storage, spinners as well as ssd ones. but when node breaks down, i lose customers.
[6:24] <iggy> right
[6:24] <XCat> so i want to move to proper 'cloud' storage, to deliver "normal" vps performance yet more reliably
[6:24] <jamespd_> FYI: Our storage servers for general use block devices are 4 SATA to 1 SSD, or 12 SATA to 1 NVME.
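For context, this is roughly how a shared journal device gets carved up at provisioning time with ceph-disk (device names are placeholders); each prepare call adds another journal partition on the shared NVMe, one per spinner:

    # one OSD per spinner, journal partition created automatically on the shared NVMe
    ceph-disk prepare /dev/sdb /dev/nvme0n1
    ceph-disk prepare /dev/sdc /dev/nvme0n1
    # repeat for the remaining spinners, then activate the data partitions
    ceph-disk activate /dev/sdb1
    ceph-disk activate /dev/sdc1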
[6:25] <iggy> are you going to have like 50 VMs connected to this storage? 500? 5000?
[6:26] <iggy> you can do some rough math to figure out how much perf you'll lose if a box dies
[6:26] * Penn (~penn11.li@14.154.254.12) Quit (Ping timeout: 480 seconds)
[6:26] <iggy> see if that's acceptable
[6:27] <XCat> atm i have no idea how many vms it can handle
[6:27] <XCat> i want to test what it does with 400
[6:27] <XCat> or a little bit less actually ~380
[6:28] <iggy> I'd say go with the 1 nvme + 6 spinners
[6:28] <XCat> so you think it would do the reshuffling stuff fast enough if one nvme breaks?
[6:29] <iggy> the nvme's should last a couple of years, so you'll be scaled beyond 5-6 nodes by the time one of them decides to shit itself
[6:29] * Harryhy (~poller@06SAADXBS.tor-irc.dnsbl.oftc.net) Quit ()
[6:29] * ZombieTree (~rogst@lispspb.fvds.ru) has joined #ceph
[6:30] <XCat> in theory one can break any time
[6:30] <XCat> i have not seen a nvme break however
[6:31] <iggy> well, like I said, you can do the math pretty easily to see how much of an impact rebuilding a node would be
[6:31] <iggy> and there are tunables to make that impact your running cluster less
[6:31] <XCat> the impact is the transfer of the equivalent size of the broken node between remaining nodes?
[6:32] <iggy> yeah, at whatever speed the rest of the cluster can maintain
[6:32] <iggy> equivalent size of the used space on the broken node
[6:32] <XCat> thank you for explaining :)
[6:33] <iggy> unless you are offering a huge amount of space, that's going to be what maybe 2T
[6:33] <iggy> (assuming 400 nodes * 10G per node * 50% util)
[6:34] * IvanJobs (~ivanjobs@103.50.11.146) has joined #ceph
[6:34] <iggy> you need to make sure the rest of your cluster has that space available and that performance wise it can handle it
[6:34] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[6:35] <iggy> so each node will then have 400G incoming
[6:35] <iggy> also keep in mind memory pressure goes up pretty high when rebuilding
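The rough estimate above, spelled out under the same assumptions (about 400 guests at ~10G each, 50% utilised, re-replicated across the 5 surviving nodes):

    # 400 guests * 10 GiB each * 50% utilisation ~= 2000 GiB (~2 TiB) of used data to re-replicate
    echo $(( 400 * 10 / 2 ))   # 2000
    # spread over the 5 remaining nodes ~= 400 GiB of incoming writes per node
    echo $(( 2000 / 5 ))       # 400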
[6:35] <XCat> i have some connectx3 adapters, and i ordered some more. i hope the infiniband will remove any network wise bottleneck.
[6:36] <XCat> the nodes in their current configuration have 128g ram, i don't know how much ceph would actually use
[6:36] <iggy> unless I missed ceph getting IB transport, you'll be stuck using the IP stack on those
[6:37] <iggy> for better or worse...
[6:37] <iggy> 128G seems like it should be plenty for 6 spinners
[6:37] <iggy> what size spinners you planning to have?
[6:38] <XCat> i have some 2tb and some 4tb. i dont have enough
[6:38] * Penn (~penn11.li@14.154.254.12) has joined #ceph
[6:39] <XCat> i plan to order more, but i have not decided how big i actually need them. maybe i need them actually smaller, if i look at the current usage.
[6:40] * NTTEC (~nttec@203.177.235.23) has joined #ceph
[6:41] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[6:41] * TomasCZ (~TomasCZ@yes.tenlab.net) Quit (Quit: Leaving)
[6:45] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[6:45] * rongze (~rongze@223.64.60.123) has joined #ceph
[6:52] * ircuser-1 (~Johnny@158.183-62-69.ftth.swbr.surewest.net) has joined #ceph
[6:55] * Concubidated (~cube@c-50-173-245-118.hsd1.ca.comcast.net) has joined #ceph
[6:57] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[6:59] * ZombieTree (~rogst@7V7AAF4R3.tor-irc.dnsbl.oftc.net) Quit ()
[6:59] * allenmelon (~Jaska@vmd7136.contabo.host) has joined #ceph
[7:05] * zerick_ (~zerick@104.131.101.65) has joined #ceph
[7:07] * deepthi (~deepthi@115.118.48.46) has joined #ceph
[7:10] * IvanJobs (~ivanjobs@103.50.11.146) Quit (Remote host closed the connection)
[7:13] * allenmelon (~Jaska@06SAADXD3.tor-irc.dnsbl.oftc.net) Quit (autokilled: This host may be infected. Mail support@oftc.net with questions. BOPM (2016-06-13 05:13:50))
[7:14] * dontron (~notarima@4MJAAGE3T.tor-irc.dnsbl.oftc.net) has joined #ceph
[7:18] * zerick_ (~zerick@104.131.101.65) Quit (Remote host closed the connection)
[7:18] * zerick_ (~zerick@104.131.101.65) has joined #ceph
[7:20] * Penn (~penn11.li@14.154.254.12) Quit (Ping timeout: 480 seconds)
[7:29] * rdas (~rdas@121.244.87.116) has joined #ceph
[7:32] * NTTEC_ (~nttec@203.177.235.23) has joined #ceph
[7:32] * NTTEC (~nttec@203.177.235.23) Quit (Read error: Connection reset by peer)
[7:32] * shyu (~shyu@218.241.172.114) Quit (Read error: Connection reset by peer)
[7:33] * swami1 (~swami@49.32.0.204) has joined #ceph
[7:37] * natarej_ (~natarej@101.188.54.14) Quit (Read error: Connection reset by peer)
[7:41] * Penn (~penn11.li@14.154.254.12) has joined #ceph
[7:43] * dontron (~notarima@4MJAAGE3T.tor-irc.dnsbl.oftc.net) Quit ()
[7:44] * Grum (~DougalJac@tor-exit4-readme.dfri.se) has joined #ceph
[7:48] * kawa2014 (~kawa@dynamic-adsl-84-220-86-76.clienti.tiscali.it) has joined #ceph
[7:49] * IvanJobs (~ivanjobs@103.50.11.146) has joined #ceph
[7:50] * bitserker (~toni@81.184.9.72.dyn.user.ono.com) has joined #ceph
[7:55] * Lokta (~Lokta@carbon.coe.int) has joined #ceph
[7:58] * kawa2014 (~kawa@dynamic-adsl-84-220-86-76.clienti.tiscali.it) Quit (Ping timeout: 480 seconds)
[7:58] * kefu (~kefu@114.92.120.18) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[7:58] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) has joined #ceph
[8:03] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) has joined #ceph
[8:07] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[8:07] * Kurt (~Adium@2001:628:1:5:ac2e:24a2:6ec9:de19) has joined #ceph
[8:07] * kawa2014 (~kawa@dynamic-adsl-84-220-89-95.clienti.tiscali.it) has joined #ceph
[8:08] * gauravbafna (~gauravbaf@106.206.156.241) has joined #ceph
[8:12] * Heebie (~thebert@dub-bdtn-office-r1.net.digiweb.ie) Quit (Ping timeout: 480 seconds)
[8:13] * Grum (~DougalJac@7V7AAF4UW.tor-irc.dnsbl.oftc.net) Quit ()
[8:14] * CydeWeys (~luigiman@hessel3.torservers.net) has joined #ceph
[8:15] * Concubidated (~cube@c-50-173-245-118.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[8:15] * gauravbafna (~gauravbaf@106.206.156.241) Quit (Remote host closed the connection)
[8:16] * kefu (~kefu@114.92.120.18) has joined #ceph
[8:19] * zerick (~zerick@irc.quassel.zerick.me) Quit (Remote host closed the connection)
[8:20] * kefu (~kefu@114.92.120.18) Quit (Max SendQ exceeded)
[8:20] * kefu (~kefu@114.92.120.18) has joined #ceph
[8:20] * gauravbafna (~gauravbaf@106.206.156.241) has joined #ceph
[8:26] * kefu (~kefu@114.92.120.18) Quit (Remote host closed the connection)
[8:27] * Miouge (~Miouge@91.177.17.5) has joined #ceph
[8:29] * karnan (~karnan@121.244.87.117) has joined #ceph
[8:31] * kefu (~kefu@114.92.120.18) has joined #ceph
[8:35] * gauravbafna (~gauravbaf@106.206.156.241) Quit (Remote host closed the connection)
[8:36] * zerick (~zerick@188.166.156.218) has joined #ceph
[8:37] * zerick (~zerick@188.166.156.218) Quit (Remote host closed the connection)
[8:38] * gauravbafna (~gauravbaf@106.206.156.241) has joined #ceph
[8:38] * karnan (~karnan@121.244.87.117) Quit (Ping timeout: 480 seconds)
[8:39] * karnan (~karnan@121.244.87.117) has joined #ceph
[8:41] * gauravbafna (~gauravbaf@106.206.156.241) Quit (Remote host closed the connection)
[8:42] * mhuang (~mhuang@182.92.253.2) has joined #ceph
[8:43] * CydeWeys (~luigiman@06SAADXG6.tor-irc.dnsbl.oftc.net) Quit ()
[8:43] * KristopherBel (~hyst@46.166.188.222) has joined #ceph
[8:49] * gauravbafna (~gauravbaf@106.206.156.241) has joined #ceph
[8:49] * karnan_ (~karnan@121.244.87.117) has joined #ceph
[8:51] * dgurtner (~dgurtner@178.197.226.162) has joined #ceph
[8:51] * gauravbafna (~gauravbaf@106.206.156.241) Quit (Remote host closed the connection)
[8:52] * dvanders (~dvanders@2001:1458:202:225::102:124a) has joined #ceph
[8:53] * EinstCra_ (~EinstCraz@58.247.117.134) Quit (Remote host closed the connection)
[8:56] * EinstCrazy (~EinstCraz@58.247.119.250) has joined #ceph
[8:56] * lmb (~Lars@ip5b41f0a4.dynamic.kabel-deutschland.de) Quit (Quit: Leaving)
[8:58] * CobraKhan0071 (~dicko@freedom.ip-eend.nl) has joined #ceph
[9:00] * KristopherBel (~hyst@7V7AAF4XG.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[9:02] * gauravbafna (~gauravbaf@106.206.156.241) has joined #ceph
[9:03] * NTTEC_ (~nttec@203.177.235.23) Quit (Read error: Connection reset by peer)
[9:04] * NTTEC (~nttec@203.177.235.23) has joined #ceph
[9:09] * shyu (~shyu@218.241.172.114) has joined #ceph
[9:09] * NTTEC (~nttec@203.177.235.23) Quit (Read error: Connection reset by peer)
[9:10] * gauravbafna (~gauravbaf@106.206.156.241) Quit (Ping timeout: 480 seconds)
[9:10] * NTTEC (~nttec@203.177.235.23) has joined #ceph
[9:13] * mhuang (~mhuang@182.92.253.2) Quit (Quit: This computer has gone to sleep)
[9:13] * erwan_taf (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[9:16] * gauravbafna (~gauravbaf@106.206.156.241) has joined #ceph
[9:17] * fsimonce (~simon@host107-37-dynamic.251-95-r.retail.telecomitalia.it) has joined #ceph
[9:23] * analbeard (~shw@185.32.110.1) has joined #ceph
[9:23] <IvanJobs> I want to verify the saying "PG's acting osds use heartbeats to check each other's health". So I found that when the OSD inits, it creates a heartbeat thread; this thread calls heartbeat_entry, heartbeat_entry calls heartbeat, then sends MOSDPing msgs to peers.
[9:24] <IvanJobs> I didn't get it. the process above just runs once, where is the polling logic? Can anyone explain this? thx in advance.
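A note on the question above: the polling loop lives inside heartbeat_entry itself, which calls heartbeat() on each pass and then waits roughly osd_heartbeat_interval before the next one. The interval and failure grace can be inspected on a running OSD via the admin socket (osd.0 is just an example):

    ceph daemon osd.0 config show | grep heartbeat
    # relevant options include osd_heartbeat_interval (how often MOSDPing is sent)
    # and osd_heartbeat_grace (how long before a silent peer is reported down)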
[9:24] * overclk_ (~quassel@117.202.96.223) has joined #ceph
[9:25] * rendar (~I@host102-48-dynamic.1-87-r.retail.telecomitalia.it) has joined #ceph
[9:27] * overclk (~quassel@117.202.96.223) Quit (Ping timeout: 480 seconds)
[9:28] * CobraKhan0071 (~dicko@4MJAAGE81.tor-irc.dnsbl.oftc.net) Quit ()
[9:28] * WedTM (~ChauffeR@192.87.28.82) has joined #ceph
[9:29] * khyron (~khyron@fixed-190-159-187-190-159-75.iusacell.net) has joined #ceph
[9:32] * wjw-freebsd (~wjw@39.237.62.188.dynamic.wline.res.cust.swisscom.ch) has joined #ceph
[9:41] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[9:42] * allaok (~allaok@machine107.orange-labs.com) has joined #ceph
[9:43] * khyron (~khyron@fixed-190-159-187-190-159-75.iusacell.net) Quit ()
[9:46] * overclk (~quassel@117.202.96.223) has joined #ceph
[9:47] * overclk_ (~quassel@117.202.96.223) Quit (Ping timeout: 480 seconds)
[9:50] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[9:58] * WedTM (~ChauffeR@7V7AAF4ZK.tor-irc.dnsbl.oftc.net) Quit ()
[9:58] * Esge1 (~Quackie@46.101.169.151) has joined #ceph
[9:58] * dugravot6 (~dugravot6@194.199.223.4) has joined #ceph
[10:02] * derjohn_mobi (~aj@x590e5d1c.dyn.telefonica.de) Quit (Ping timeout: 480 seconds)
[10:03] * analbeard (~shw@185.32.110.1) Quit (Quit: Leaving.)
[10:04] * NTTEC (~nttec@203.177.235.23) Quit (Remote host closed the connection)
[10:05] * branto (~branto@178-253-128-187.3pp.slovanet.sk) has joined #ceph
[10:05] * NTTEC (~nttec@203.177.235.23) has joined #ceph
[10:05] * NTTEC (~nttec@203.177.235.23) Quit (Remote host closed the connection)
[10:08] * gauravbafna (~gauravbaf@106.206.156.241) Quit (Remote host closed the connection)
[10:09] * dgurtner (~dgurtner@178.197.226.162) Quit (Ping timeout: 480 seconds)
[10:10] * overclk (~quassel@117.202.96.223) Quit (Ping timeout: 480 seconds)
[10:11] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Ping timeout: 480 seconds)
[10:14] * XCat (~XCat@80.232.241.112) Quit (Read error: Connection reset by peer)
[10:17] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:18] * gauravbafna (~gauravbaf@106.206.156.241) has joined #ceph
[10:26] * gauravbafna (~gauravbaf@106.206.156.241) Quit (Ping timeout: 480 seconds)
[10:28] * Esge1 (~Quackie@06SAADXLE.tor-irc.dnsbl.oftc.net) Quit ()
[10:28] * xanax` (~rikai@7V7AAF42E.tor-irc.dnsbl.oftc.net) has joined #ceph
[10:30] * garphy`aw is now known as garphy
[10:31] * overclk (~quassel@117.202.96.223) has joined #ceph
[10:32] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[10:40] * Heebie (~thebert@dub-bdtn-office-r1.net.digiweb.ie) has joined #ceph
[10:43] * manoj (~oftc-webi@183.82.3.27) has joined #ceph
[10:43] <manoj> Hi
[10:44] <manoj> I would like to know if we can configure synchronous replication between 2 different ceph clusters?
[10:44] * rdias (~rdias@2001:8a0:749a:d01:f578:f02a:e0d6:c275) Quit (Remote host closed the connection)
[10:44] * rraja (~rraja@121.244.87.117) has joined #ceph
[10:45] * rdias (~rdias@2001:8a0:749a:d01:79cc:3e0e:a421:3e7e) has joined #ceph
[10:47] * bitserker (~toni@81.184.9.72.dyn.user.ono.com) Quit (Quit: Leaving.)
[10:48] <vikhyat> manoj: as I suggested in #sepia
[10:49] <vikhyat> manoj: ceph has async replication for block storage (RBD ) as RBD mirroring and for object storage (Rados Gateway Multisite)
[10:50] <vikhyat> http://docs.ceph.com/docs/jewel/radosgw/multisite/
[10:50] <vikhyat> http://docs.ceph.com/docs/master/rbd/rbd-mirroring/
[10:50] <manoj> oh.. thank you
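As a rough sketch of what the RBD-side option looks like in jewel (pool, image, and cluster names are placeholders; an rbd-mirror daemon must run on the receiving cluster):

    # on both clusters: enable pool-level mirroring
    rbd mirror pool enable mypool pool
    # on the backup cluster: register the primary cluster as a peer
    rbd mirror pool peer add mypool client.admin@primary
    # images need the exclusive-lock and journaling features to be mirrored
    rbd feature enable mypool/myimage exclusive-lock journaling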
[10:50] * karnan_ (~karnan@121.244.87.117) Quit (Quit: Leaving)
[10:50] * karnan (~karnan@121.244.87.117) Quit (Quit: Leaving)
[10:50] * shylesh__ (~shylesh@45.124.227.70) has joined #ceph
[10:50] * karnan (~karnan@121.244.87.117) has joined #ceph
[10:58] * xanax` (~rikai@7V7AAF42E.tor-irc.dnsbl.oftc.net) Quit ()
[10:58] * Averad (~Jase@216.218.134.12) has joined #ceph
[11:00] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[11:00] * rdias (~rdias@2001:8a0:749a:d01:79cc:3e0e:a421:3e7e) Quit (Remote host closed the connection)
[11:07] * thomnico (~thomnico@2a01:e35:8b41:120:4810:b973:7519:fa42) has joined #ceph
[11:09] * Hemanth (~hkumar_@121.244.87.117) has joined #ceph
[11:10] * dvanders_ (~dvanders@2001:1458:202:16b::102:124a) has joined #ceph
[11:10] * TMM (~hp@185.5.121.201) has joined #ceph
[11:11] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[11:12] * derjohn_mobi (~aj@2001:6f8:1337:0:6824:4dfa:419b:8856) has joined #ceph
[11:13] * dgurtner (~dgurtner@178.197.226.162) has joined #ceph
[11:16] * dvanders (~dvanders@2001:1458:202:225::102:124a) Quit (Ping timeout: 480 seconds)
[11:21] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[11:22] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[11:23] * rdias (~rdias@2001:8a0:749a:d01:2c1f:6405:a506:436e) has joined #ceph
[11:28] * Averad (~Jase@7V7AAF43I.tor-irc.dnsbl.oftc.net) Quit ()
[11:28] * Kwen (~Yopi@marylou.nos-oignons.net) has joined #ceph
[11:29] * dgurtner (~dgurtner@178.197.226.162) Quit (Remote host closed the connection)
[11:29] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[11:29] * dgurtner (~dgurtner@178.197.226.162) has joined #ceph
[11:33] * mhuang (~mhuang@182.92.253.2) has joined #ceph
[11:35] * maxx2042 (~m.vernimm@asa01.comparegroup.eu) has joined #ceph
[11:35] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[11:35] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[11:36] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:ccf:dc55:3b8d:d313) has joined #ceph
[11:39] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[11:43] * dvanders_ (~dvanders@2001:1458:202:16b::102:124a) Quit (Remote host closed the connection)
[11:46] * rdas (~rdas@121.244.87.116) has joined #ceph
[11:47] * gauravbafna (~gauravbaf@106.206.156.241) has joined #ceph
[11:50] * mhuang (~mhuang@182.92.253.2) Quit (Quit: This computer has gone to sleep)
[11:51] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[11:52] * rongze (~rongze@223.64.60.123) Quit (Remote host closed the connection)
[11:55] * gauravbafna (~gauravbaf@106.206.156.241) Quit (Ping timeout: 480 seconds)
[11:56] <maxx2042> hi! I've got some unfound objects in my cluster and I've been analysing them. The situation is a bit weird because I can find the objects manually just fine. Would someone be interested to help figure out why ceph isn't able to find the objects while I can find them right where they are supposed to be?
[11:58] * praveen (~praveen@122.167.128.246) Quit (Remote host closed the connection)
[11:58] * Kwen (~Yopi@4MJAAGFGP.tor-irc.dnsbl.oftc.net) Quit ()
[11:58] * hassifa (~Moriarty@orion1626.startdedicated.com) has joined #ceph
[11:59] <TMM> could someone have a look at this for me? http://paste.debian.net/738302/ With those settings my writeback cache tiers seem to act like writethrough caches and it's killing write performance on my cluster
[11:59] <TMM> I must be doing something really dumb wrong, but I can't figure it out
[12:02] * huangjun (~kvirc@113.57.168.154) Quit (Read error: Connection reset by peer)
[12:03] * huangjun (~kvirc@113.57.168.154) has joined #ceph
[12:04] * bitserker (~toni@63.pool85-52-240.static.orange.es) has joined #ceph
[12:08] <Be-El> TMM: the hit_set_count might be too high, but the last time i tried cache tiers was some month ago
[12:08] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[12:19] * bitserker (~toni@63.pool85-52-240.static.orange.es) Quit (Ping timeout: 480 seconds)
[12:19] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[12:19] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[12:20] <TMM> Be-El, I've already tried lowering it to 1, but with the same results
[12:20] <TMM> hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0}
[12:20] <TMM> hmm, is that 'target_size' perhaps what I'm looking at?
[12:21] * bitserker (~toni@63.pool85-52-240.static.orange.es) has joined #ceph
[12:21] <TMM> guess not, on my hammer cluster it's the same
[12:22] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[12:22] <Be-El> there was some work on the cache tier on infernalis and jewel. not sure how to setup it up in those releases
[12:22] <TMM> "decay_rate 0 search_last_n 0"
[12:22] <TMM> those appear to be new in jewel
[12:22] <TMM> perhaps those are the problem?
[12:22] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[12:23] <TMM> I don't see those mentioned in the documentation for jewel
[12:23] <Be-El> maybe you also need to add target_max_objects for the flush/evict to work properly
[12:23] <TMM> you think it may default to 0?
[12:24] <Be-El> you can validate it with 'ceph osd pool ls detail'
[12:24] <Be-El> but since you do not set it explicitly it will have its default value
[12:24] <TMM> it's completely unset
[12:24] <TMM> it doesn't appear at all
[12:25] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[12:26] <TMM> http://www.spinics.net/lists/ceph-users/msg28229.html <-- seems like someone else is seeing the same issue
[12:26] <TMM> I guess I have to use ec on the backend for this to work? that is weird
[12:27] * chengpeng (~chris@180.168.170.2) Quit (Ping timeout: 480 seconds)
[12:27] * chengpeng (~chris@180.168.126.179) has joined #ceph
[12:27] <Be-El> but in that case target_objects is also set
[12:27] <TMM> I can set it and give it another go
[12:27] <TMM> sec
[12:28] * kefu (~kefu@114.92.120.18) Quit (Quit: Textual IRC Client: www.textualapp.com)
[12:28] * hassifa (~Moriarty@4MJAAGFH4.tor-irc.dnsbl.oftc.net) Quit ()
[12:28] * DoDzy (~storage@anonymous.sec.nl) has joined #ceph
[12:29] <Be-El> there's also a cache_target_dirty_high_ratio setting in jewel. maybe it is also missing
[12:30] <TMM> I do set that to .6
[12:30] <TMM> maybe I *have* to use ec pools for the backend, I kind of wanted to avoid that
[12:30] <Be-El> that's cache_target_dirty_ratio, not cache_target_dirty_high_ratio. there are three ratio in jewel now
[12:31] <Be-El> but i can only guess what's necessary in jewel. the documentation on cache tiers has been incomplete for some time now
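For reference, the knobs being poked at in this exchange map to these jewel pool settings; a sketch only, with an illustrative pool name and values, not a verified recipe:

    ceph osd pool set images_cache hit_set_type bloom
    ceph osd pool set images_cache hit_set_count 1
    ceph osd pool set images_cache hit_set_period 3600
    ceph osd pool set images_cache target_max_bytes 100000000000
    ceph osd pool set images_cache target_max_objects 1000000
    ceph osd pool set images_cache cache_target_dirty_ratio 0.4
    ceph osd pool set images_cache cache_target_dirty_high_ratio 0.6
    ceph osd pool set images_cache cache_target_full_ratio 0.8
    ceph osd pool set images_cache cache_min_flush_age 600
    ceph osd pool set images_cache cache_min_evict_age 1800
    ceph osd pool set images_cache min_read_recency_for_promote 1
    ceph osd pool set images_cache min_write_recency_for_promote 1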
[12:32] * huangjun (~kvirc@113.57.168.154) Quit (Ping timeout: 480 seconds)
[12:32] <TMM> adding the target_max_objects didn't change anything
[12:32] <TMM> neither does adding cache_target_dirty_high_ratio
[12:32] <TMM> I set cache_target_dirty_ratio, cache_target_dirty_high_ratio, cache_target_full_ratio
[12:33] <TMM> I even set cache_min_evict_age now, and it still evicts immediately
[12:33] <TMM> trying an ec backing pool now
[12:35] * bitserker1 (~toni@63.pool85-52-240.static.orange.es) has joined #ceph
[12:35] * bitserker (~toni@63.pool85-52-240.static.orange.es) Quit (Read error: Connection reset by peer)
[12:38] <TMM> even with an ec backing pool objects immediately flow into the backing pool
[12:38] <TMM> I'm completely stumped now
[12:38] <Be-El> so something is still missing
[12:38] * kawa2014 (~kawa@dynamic-adsl-84-220-89-95.clienti.tiscali.it) Quit (Ping timeout: 480 seconds)
[12:39] <Be-El> can you post the complete configuration for the rep. pool and its cache tier (lines from ceph osd pool ls detail)?
[12:40] <TMM> http://paste.debian.net/738600/
[12:40] <TMM> I uploaded 800mb to the 'images' pool
[12:41] <TMM> and I see 800mb in the backing pool and 500 in the cache tier
[12:41] <TMM> this makes 0 sense to me
[12:43] <Be-El> does the number of objects in the cache pool change?
[12:44] <TMM> what do you mean change?
[12:45] <Be-El> does the cache tier evict the objects?
[12:45] <TMM> oh you mean does it go up then down?
[12:45] <Be-El> exactly
[12:48] <TMM> let me see
[12:48] <TMM> deleting the pools again
[12:50] * gauravbafna (~gauravbaf@106.206.156.241) has joined #ceph
[12:50] <TMM> hmm, hard to tell the images_cache object count never goes down
[12:50] <TMM> it grows, just slower than the backend pools'
[12:51] <Be-El> so the objects are actually cached, but also flushed to the backend
[12:51] <TMM> well, partially
[12:51] <TMM> I have 112 objects in the backend tier, and 76 in the cache tier
[12:51] <TMM> so they don't even all seem to make it to the cache
[12:52] * EinstCrazy (~EinstCraz@58.247.119.250) Quit (Read error: Connection reset by peer)
[12:53] <Be-El> in your last pastebin there were 109 write operations, and 2239 read operations. how long did the pool exist and what was the timespan between adding the objects (the 109 write operations) and reading them?
[12:53] <TMM> the pool existed for maybe 1 minute
[12:53] <Be-El> if the objects are not 'hot enough', the cache tier might already have evicted them
[12:53] <TMM> I keep deleting the pools and recreating them between tests
[12:53] <TMM> but, there's a min age of 1800
[12:55] <TMM> just adding some new settings from jewel now
[12:55] <TMM> sec
[12:58] * gauravbafna (~gauravbaf@106.206.156.241) Quit (Ping timeout: 480 seconds)
[12:58] * DoDzy (~storage@7V7AAF464.tor-irc.dnsbl.oftc.net) Quit ()
[13:01] <TMM> Be-El, oh! I managed to make it worse! progress!
[13:06] <TMM> hmm, any change to min_read_recency_for_promote or min_write_recency_for_promote just makes it even worse
[13:06] <Be-El> \o/
[13:07] <TMM> I haven't found a setting that makes it 'work' though
[13:08] * tumeric (~jcastro@56.73.54.77.rev.vodafone.pt) has joined #ceph
[13:08] <tumeric> hello guys, I want to divide my pgs into OSD's evenly so in case one of my machines goes down I can still serve IOs
[13:09] <tumeric> my setup: 3 machines, 72osds, 4224 PG's, 2 replicas
[13:09] <tumeric> is it possible?
[13:09] <tumeric> 24osd per machine
[13:09] * gauravbafna (~gauravbaf@106.216.131.73) has joined #ceph
[13:09] <TMM> that should happen by default
[13:10] <tumeric> my crush map has the same weight for all osd's
[13:10] <tumeric> Well, I did reboot one machine and I got "peered" pgs
[13:10] <tumeric> meaning 24osd went down, I still had 48
[13:10] <TMM> Be-El, it appears that cache tiers are just completely broken in jewel, I think I'm just going to downgrade to hammer
[13:10] <Be-El> tumeric: size=2 and min_size=1 for the pools?
[13:11] <tumeric> size=2 and min_size=2
[13:11] <Be-El> tumeric: in that case the pgs are distributed over all three machines, and both replicates have to be accessible to allow I/O
[13:11] <tumeric> should I change to min_size=1 ?
[13:11] <TMM> no, you should change to size=3 and min_size=2
[13:11] <tumeric> I want two replicas
[13:12] <Be-El> tumeric: even number of replicates might result in a problem in split brain situations
[13:12] <tumeric> what advantage would it bring, TMM?
[13:12] <TMM> tumeric, not losing data when something goes wrong?
[13:12] <TMM> if 'not losing data' is not that important then size=2 would work too I suppose
[13:12] <tumeric> It is important, of course
[13:12] <Be-El> well, not actually losing data, but not knowing which dataset is the recent one
[13:12] <TMM> Be-El, sure, but it comes down to the same thing :P
[13:13] <tumeric> The thing is that I will lose a lot of space, right?
[13:13] <Be-El> tumeric: sure, that's the problem with replication setups
[13:13] <tumeric> you are right, the thing is that I have to explain this to my boss :p
[13:14] <Be-El> tumeric: do you use ceph for rbd, object storage or cephfs?
[13:14] <tumeric> fs
[13:14] <tumeric> So with size=3 and min_size=2 if I reboot one machine I will still be able to serve IO?
[13:15] <Be-El> too bad, in the object storage case i would have suggested using erasure coding pools in a 4+2 setup and a custom crush ruleset
[13:15] <tumeric> We are using fs, devs love posix :p
[13:15] <Be-El> tumeric: in that case one machine may be inaccessible, and your clients will still have access to all data (read + write)
[13:15] <tumeric> what about FS?
[13:15] <Be-El> tumeric: until the devs find out that most third party software does not handle posix correctly....
[13:16] <Be-El> cephfs does not directly support erasure coding pools
[13:16] <tumeric> I know Be-El but it's not my decision. boss pays, boss demands :p
[13:16] <tumeric> :(
[13:16] <Be-El> you'll need a cache tier on top of the ec pool, which again should be a replicated setup, preferably on ssds
[13:17] <Be-El> and their size should match your working sets etc.
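A sketch of the layout Be-El describes, assuming jewel-era option names and placeholder profile/pool names and PG counts:

    ceph osd erasure-code-profile set ec42 k=4 m=2 ruleset-failure-domain=host
    ceph osd pool create ecdata 512 512 erasure ec42
    ceph osd pool create ecdata_cache 128 128 replicated
    ceph osd tier add ecdata ecdata_cache
    ceph osd tier cache-mode ecdata_cache writeback
    ceph osd tier set-overlay ecdata ecdata_cache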
[13:17] <tumeric> But in my situation what would be the best option?
[13:17] * gauravbafna (~gauravbaf@106.216.131.73) Quit (Ping timeout: 480 seconds)
[13:17] <tumeric> 3 replicas, right? first of all..
[13:18] <tumeric> I thought split brain issues would only come with the monitors
[13:18] <Be-El> tumeric: you just need to let your boss decide: more storage (size=2, min_size=1) with possible data loss in case of a failing machine, or less available space (size=3, min_size=2), but better protection against data loss
[13:19] <tumeric> for now I can afford size 3 and min_size 2
[13:19] <tumeric> but are you sure it will work?
[13:19] <Be-El> in the latter case you might even lose two machines without data loss (but the data will not be accessible as long as only one machine is available)
[13:19] <tumeric> I mean, my crush map has just 3 hosts, 24 osd per host, same weight
[13:19] <Be-El> tumeric: you can test it
[13:20] <Be-El> tumeric: setup the pool with the correct size/min_size settings, wait for the cluster to become healthy, shutdown one machine
[13:20] <tumeric> yeah
[13:20] <tumeric> I will do it
[13:20] <tumeric> but Be-El should it work with size=2 min_size=1 ? I mean, should ceph be smart enough not to ever place 2 pgs in the same machine?
[13:21] <Be-El> and i'm not sure about the split brain situation with respect to data. the placement groups have a timestamp/epoch counter, and the most recent one is the acting one.
[13:21] <Be-El> tumeric: the default crush rulesets distribute replicates across hosts
[13:22] * chengpeng (~chris@180.168.126.179) Quit (Quit: Leaving)
[13:22] * b0e (~aledermue@213.95.25.82) has joined #ceph
[13:22] <tumeric> Be-El, yes. thats what I thought too
[13:22] <tumeric> but then I noticed that one machine had the two replicas pgs, and it went down I wasnt able to serve files
[13:22] <tumeric> argh :p
[13:23] <Be-El> tumeric: in that case your crush ruleset might not be correct
[13:24] * overclk_ (~quassel@117.202.96.180) has joined #ceph
[13:25] <tumeric> Be-El, or maybe it was serving and I thought it wasnt. What does the peered mean?
[13:25] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[13:25] <Be-El> tumeric: afaik it means that the osds involved for a placement group are synchronizing their inventory
[13:26] <tumeric> undersized+degraded+peered
[13:26] <tumeric> should still be able to serve, right?
[13:27] * overclk (~quassel@117.202.96.223) Quit (Ping timeout: 480 seconds)
[13:27] <Be-El> nope, only active pgs allows I/O
[13:28] <tumeric> so maybe there is an issue with the crushmap.
[13:28] <tumeric> because only one machine went down
[13:28] * e0ne (~e0ne@194.213.110.76) has joined #ceph
[13:28] * vicente (~~vicente@125-227-238-55.HINET-IP.hinet.net) Quit (Quit: Leaving)
[13:30] <Be-El> well, just upload the crush map (ceph osd tree) and your crush rulesets (ceph osd crush rule dump) to some pastebin and i might have a look at it
[13:32] <tumeric> Thanks, I will do it
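For completeness, besides the two commands above, the full CRUSH map can also be pulled and decompiled for inspection:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt   # human-readable buckets and rulesets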
[13:33] * Frymaster (~Kyso_@tor-exit.cappuccino.pw) has joined #ceph
[13:33] <tumeric> Well, seems the PG's are recovering
[13:33] <tumeric> but very slow
[13:34] <tumeric> http://pastebin.com/5uXDKBky
[13:34] <tumeric> number keeps reducing
[13:35] <Be-El> how much data is currently stored in your cluster?
[13:36] * rotbeard (~redbeard@185.32.80.238) Quit (Ping timeout: 480 seconds)
[13:37] <tumeric> 3TB
[13:37] <Heebie> tumeric: How fast are your cluster & client networks and what types of disks?
[13:37] <tumeric> but, lots and lots of inodes
[13:37] <tumeric> I have a slow backfill.
[13:38] <tumeric> On purpose
[13:38] <tumeric> I have HDD's and client works are 2gb
[13:38] <tumeric> but they are not full
[13:39] <tumeric> crushmap: http://pastebin.com/ACVJpFNi
[13:39] <Be-El> do you use ssd based journals?
[13:39] * IvanJobs (~ivanjobs@103.50.11.146) Quit (Remote host closed the connection)
[13:39] <tumeric> No
[13:40] <tumeric> http://pastebin.com/1WxDeJRc
[13:40] <tumeric> OSD dump (rgw pools are just for testing)
[13:40] * dgurtner (~dgurtner@178.197.226.162) Quit (Read error: Connection reset by peer)
[13:41] <Be-El> the a_metadata and a_data pools are the problem
[13:41] <tumeric> Thanks
[13:41] <tumeric> so whats the problem with them?
[13:41] * IvanJobs (~ivanjobs@103.50.11.146) has joined #ceph
[13:41] <Be-El> and the rbd pool (which you probably do not use)
[13:42] <tumeric> yeah I dont
[13:42] <tumeric> is it the ruleset?
[13:42] <Be-El> well, as discussed before... their size is 2, and their min_size is also 2. one third of them will have one replicate on the host that's currently not available, so there's only one existing replicate
[13:43] <tumeric> Aha. So min_size = 1
[13:43] <tumeric> or
[13:43] <tumeric> size = 3 and min_size = 2 ?
[13:43] <Be-El> with one replicate the min_size=2 requirement is not met -> pg is not active -> no I/O
[13:43] <tumeric> Makes sense.
[13:43] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Remote host closed the connection)
[13:44] <tumeric> Thanks :)
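In command form, the change being discussed would look something like this (pool names taken from the osd dump that appears later in the log; the cephfs data/metadata pools are the ones that matter):

    ceph osd pool set a_data size 3
    ceph osd pool set a_data min_size 2
    ceph osd pool set a_metadata size 3
    ceph osd pool set a_metadata min_size 2
    ceph -s   # wait for all PGs to be active+clean before re-testing a node shutdown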
[13:44] <zdzichu> what sense does min_size have? in this situation you clearly want min_size=1 to have it replicated ASAP
[13:44] <zdzichu> when would one want min_size bigger than number of available replicas?
[13:44] <Be-El> size=3/min_size=2 will put you on the safe side. it will also result in an unhealthy cluster since you do not have three available hosts at the moment
[13:44] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[13:44] <tumeric> zdzichu, Right.
[13:45] <Be-El> zdzichu: min_size should be lower/equal to size. it defines the number of replicates that have to be available to allow I/O to a PG
[13:45] <Heebie> size 5, min_size 3... that way if you lose a single replica, traffic continues... if you lose two, traffic continues. It isn't until you reach the loss of three of 5 that traffic stops.
[13:45] * praveen (~praveen@122.167.128.246) has joined #ceph
[13:45] <Be-El> so size - min_size = the number of hosts/osds/<failure domains> that may be unavailable without stalling I/O
[13:45] <Heebie> (so size 2, min_size 2 is probably "normal"
[13:46] <Heebie> size >3<, min_size 2. =O
[13:48] <zdzichu> why would I want the traffic to stop?
[13:48] <Heebie> If the number of replicas becomes too low... and one more OSD fails in the middle of a write... you've lost data.
[13:48] <tumeric> Be-El, worked like a charm
[13:48] <tumeric> :p
[13:48] <tumeric> I tried with 2-1
[13:48] * wjw-freebsd (~wjw@39.237.62.188.dynamic.wline.res.cust.swisscom.ch) Quit (Ping timeout: 480 seconds)
[13:49] <Heebie> traffic is stopped in order to protect data that has fallen under the "min_size" limit.
[13:49] <tumeric> and everything is now active
[13:49] * tumeric buys Be-El a sixpack
[13:49] <Be-El> tumeric: if the third box is back online, you might want to consider upgrading to size=3,min_size=2
[13:50] <Be-El> tumeric: if you can convince your boss ;-)
[13:50] <tumeric> I will.
[13:50] <TMM> Be-El, I don't suppose you know of any other documentation with regards to cache tiering than the http://docs.ceph.com/docs/jewel/dev/cache-pool/ pages?
[13:50] <tumeric> Be-El, with 4 machines would it still make sense size=3 and min_size=2 ?
[13:51] <Be-El> TMM: nope, sorry. there've been some thread on the mailing list in the past since jewel was released
[13:51] * valeech (~valeech@wsip-70-166-79-23.ga.at.cox.net) has joined #ceph
[13:51] <Be-El> tumeric: sure, in that case a machine does not have a replicate of all PGs (but all PGs still have 3 replicates)
[13:52] <TMM> Be-El, it does seem a bit odd that it would be this broken, I'm considering I'm misunderstanding something again :)
[13:52] <Heebie> I think size & min_size depend on how much protection you are willing to give to the data set, and how many possible failure-domains of the highest level you have available are available.
[13:52] <tumeric> Heebie, what I told my boss was that we would be able to lose half of the disks (OSDs) and still be able to serve data.
[13:53] * IvanJobs (~ivanjobs@103.50.11.146) Quit (Remote host closed the connection)
[13:53] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[13:53] <Be-El> tumeric: with the current setup you might lose one host. but if you lose two disk in two different hosts, you are still in trouble
[13:53] <TMM> tumeric, you probably shouldn't have said that, unless you added 'but our data won't be safe'
[13:53] <Heebie> Then you can only use size 2, min_size 1. :( and you need to have at least four servers.
[13:54] <Heebie> and what Be-El said still applies.. if you lose exactly the right two disks.. a chunk of data is gone.
[13:54] <tumeric> at the same time, right?
[13:54] <tumeric> I guess I can convince him to use the 3/2 size
[13:55] <Be-El> Heebie: any two disk failure in a size=3/min_size=2 setup with three hosts will result in some PG being unhappy
[13:55] <tumeric> that would be the perfect solution, right?
[13:55] <Heebie> at the same time, or between when the first one fails, and when the data is completely replicated to match the ruleset again.
[13:55] <tumeric> Thats highly unlikely
[13:55] <tumeric> but anyway, It can happen
[13:55] <tumeric> and both disks have to hold the same pg's
[13:56] <Be-El> tumeric: the pg distribution will result in some pgs being distributed across the affected disks
[13:56] * ngoswami_ (~ngoswami@121.244.87.116) has joined #ceph
[13:56] <Heebie> tumeric: You should get your boss to decide how much he wants to protect his data. Failure of single disks? failure of a single server? failure of a cabinet of servers? Failure of a data centre?
[13:56] <Be-El> tumeric: but in this case io will stall until the remaining replicate is copied to another disk in another host
[13:57] <tumeric> Heebie, We are using NFS and RAID. anything is better. We want to survive with 2 machines
[13:57] <tumeric> out of 3
[13:57] <tumeric> But
[13:57] <tumeric> to survive two disks in different machines failing, maybe 3 replicas is better
[13:58] <tumeric> Be-El, what do you mean by "stall"?
[13:58] <Be-El> tumeric: to survive two disk failure you definitely need 3 replicates
[13:58] <Heebie> across 3 machines, yes, replica 3 is pretty much a must. (one server is your largest failure domain.)
[13:58] <Be-El> tumeric: pg that are not active, e.g. with only one replicate left.
[13:58] <tumeric> If two disks fail, holding different pgs, shouldn't min_size=1 be able to serve IO?
[13:59] <tumeric> I am taking your advice, but I just need some actual arguments
[13:59] <Heebie> You never want min_size to be 1, because that puts your data at-risk if any writing is going on, and you've lost any copies.
[13:59] <Be-El> the problem about any distributed system (local raid with several disks up to ceph across machines) is the fact that in case of a failure, the recovery process puts extra load on the disks, which might trigger secondary or even tertiary failures
[14:00] <tumeric> makes sense, Be-El
[14:00] <tumeric> thanks
[14:01] * ngoswami (~ngoswami@121.244.87.116) Quit (Ping timeout: 480 seconds)
[14:02] <tumeric> I will explain exactly this to him, then it's up to him.
[14:02] * bniver (~bniver@nat-pool-bos-u.redhat.com) has joined #ceph
[14:02] <Be-El> to be on the safe side, bring back the third machine and use size=3,min_size=2. with this setup you might lose one disk or one host without affecting data availability. if you lose two disks in two different hosts, ceph will replicate the data, but I/O to the affected PGs will stall until the PG is recovered
[14:02] <Be-El> if you lose three disks in three different hosts you are in trouble.
[14:02] <tumeric> :D
[14:03] <tumeric> Yeah, if I lose three machines as well
[14:03] * Frymaster (~Kyso_@4MJAAGFLZ.tor-irc.dnsbl.oftc.net) Quit ()
[14:03] <tumeric> In the past if I lost one disk I was having downtime..
[14:03] <tumeric> so :-)
[14:03] * QuantumBeep (~KungFuHam@marcuse-2.nos-oignons.net) has joined #ceph
[14:03] <tumeric> what about backfills?
[14:03] <Heebie> once backfilling brings data back to a min_size of 2, I/O will restart.
[14:04] <Be-El> that's why your failure domain is at host level, since anything affecting a host (including failure of a disk in the host) will result in an effect on your cluster
[14:04] <Be-El> tumeric: you are at size=2,min_size=1 at the moment?
[14:04] <tumeric> Be-El, yes, I am just testing
[14:04] <tumeric> we are not in prod yet.
[14:05] <tumeric> and its national holiday here hehe
[14:05] <tumeric> osd recovery op priority = 4 osd recovery max active = 5 osd max backfills = 2
[14:05] <tumeric> I have this.
[14:05] <tumeric> I didnt want to push the limits as the hardware we have is not optimal
[14:05] <tumeric> anyway, we are caching a lot of stuff, so requests to cephfs will not be so many
[14:06] <tumeric> we have a huge amount of cache servers
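As an aside, the recovery throttles tumeric quotes above can also be changed on a live cluster without restarting the OSDs, e.g.:

    ceph tell osd.* injectargs '--osd-recovery-op-priority 4 --osd-recovery-max-active 5 --osd-max-backfills 2'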
[14:06] <Be-El> the problem is probably the missing ssd journal. all data is written to the journal first in a synchronous way. for disks this operation is very io-demanding
[14:06] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[14:06] <Be-El> the same is also true if ssd are not suitable for journal usage
[14:07] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[14:07] <Be-El> ceph does not acknowledge a write operation if the data is not written to the journals of all involved osds
[14:07] <Be-El> and backfill = write operation between osds
[14:07] <tumeric> Yeah, I know
[14:07] <tumeric> But we will not afford ssd's..
[14:07] <Be-El> so you just need to give it some time
[14:08] <tumeric> Im using btrfs
[14:08] <tumeric> I think it helped a bit
[14:08] <tumeric> I know Be-El, my question is, regarding the hardware, Either I wait or have slow requests
[14:08] <tumeric> right?
[14:08] <Be-El> right
[14:09] <tumeric> Its better to wait
[14:09] <Be-El> the first cluster we built was also on "bad" hardware
[14:09] <tumeric> then
[14:09] <Be-El> and the ssds we used as journals had a horrible performance for synchronous writes
[14:09] <tumeric> We were using NFS directly, so I think even with bad hardware ceph gives a lot more performance
[14:10] <Be-El> do not expect too much. cephfs also may give you a lot of trouble
[14:10] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit ()
[14:10] <Heebie> CEPH performance can suffer miserably from bad hardware. It can only write as fast as the SLOWEST disk in a PG.
[14:10] <tumeric> Be-El, we will be testing and see if it fits
[14:10] <tumeric> Be-El, what kind of trouble btw?
[14:10] <tumeric> Heebie, all of them are slow
[14:10] <tumeric> :D
[14:10] <Be-El> tumeric: hanging I/O, problems with posix locking etc.
[14:11] <tumeric> True, but NFS as well. There is no perfect solution on premises
[14:11] <tumeric> s3 buckets would be perfect :p
[14:11] <Be-El> tumeric: if you want to use cephfs, you should either use the latest ceph release and ceph-fuse, or use the latest kernel available for your distribution
[14:11] <tumeric> Be-El, yeah, we have jewel and ubuntu 16.04
[14:11] * valeech (~valeech@wsip-70-166-79-23.ga.at.cox.net) Quit (Quit: valeech)
[14:12] <tumeric> the release from 16/5
[14:12] <Be-El> and ceph-fuse might have its advantages....but the context switch between userspace and kernel takes time
[14:12] <tumeric> Tbh, I was copying a lot (I mean a lot) of files at the same time with rsync
[14:12] <tumeric> small files
[14:12] <tumeric> for more than 4 days
[14:12] <tumeric> and tbh it wasnt bad
[14:12] <Be-El> the kernel implementation is faster, but has different problems
[14:12] <tumeric> now the machine doesnt come up hahah
[14:13] * jmn (~jmn@nat-pool-bos-t.redhat.com) Quit (Quit: Coyote finally caught me)
[14:13] * jmn (~jmn@nat-pool-bos-t.redhat.com) has joined #ceph
[14:13] <tumeric> 655 down+peering
[14:13] <tumeric> oops
[14:13] <tumeric> yeah, now I shut down the other machine
[14:13] <tumeric> makes sense
[14:14] <Be-El> your data should survive it, if you are able to bring the machine back online
[14:15] <tumeric> yeah
[14:15] <tumeric> I am
[14:15] <Be-El> i would advise to stick to ceph-fuse. in case of problems it's easier to kill the ceph-fuse process and remount cephfs than rebooting the machine
[14:15] <tumeric> Be-El, that's a point
[14:15] <tumeric> I was having a lot of issues with the kernel implementation due to processes in D state
[14:15] * e0ne (~e0ne@194.213.110.76) Quit (Read error: Connection reset by peer)
[14:15] <Be-El> some processes might not be happy (and thus users might not be happy), but you are not affecting the whole machine
[14:16] <tumeric> but
[14:16] <tumeric> not at mount level
[14:16] <tumeric> at OSD level
[14:16] <tumeric> btrfs would just get crazy
[14:16] <Be-El> but you should keep in mind that ceph-fuse's page cache support might not be complete (at least in the case of the hammer release. jewel is a different story)
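For reference, a rough sketch of the two client options being weighed here; the monitor address, mount point and secret file are placeholders:

    # ceph-fuse: userspace client; a hung mount can be killed and remounted without rebooting
    ceph-fuse -m mon1:6789 /mnt/cephfs
    umount -l /mnt/cephfs        # or: fusermount -uz /mnt/cephfs
    ceph-fuse -m mon1:6789 /mnt/cephfs

    # kernel client: usually faster, but a hang tends to leave processes stuck in D state
    mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret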
[14:16] <Be-El> tumeric: do you want to mount cephfs on the machine with the osds?
[14:16] <tumeric> No
[14:17] <tumeric> I am dumb, just not that dumb
[14:17] <tumeric> OSDs are mounted
[14:17] <tumeric> on a disk
[14:17] <tumeric> var/lib/ceph*
[14:17] <Be-El> oh, it should be working, as long as you do not mix osds and kernel based ceph clients
[14:17] <tumeric> and sometimes they get crazy
[14:18] <Be-El> the io pattern of btrfs might be a problem. that's why i'm phasing out btrfs in favor of xfs
[14:19] <tumeric> it does yes
[14:19] <tumeric> But
[14:19] <tumeric> It has its advantages.
[14:19] <tumeric> I am not sure yet what ill use
[14:19] <tumeric> especially on the hardware i've been given
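If the choice ends up being xfs, the filesystem type and mount options for ceph-disk-created OSDs can be declared in ceph.conf; a sketch with commonly quoted values (illustrative, not tuned for any particular hardware):

    [osd]
    osd mkfs type = xfs
    osd mkfs options xfs = -f -i size=2048
    osd mount options xfs = noatime,largeio,inode64,swalloc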
[14:19] * valeech (~valeech@wsip-70-166-79-23.ga.at.cox.net) has joined #ceph
[14:19] <Be-El> i'm a fan of data checksums, too. they have saved my ass several times in the past....
[14:20] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) Quit (Remote host closed the connection)
[14:21] <tumeric> Yup.
[14:21] <tumeric> and with ceph you are double protected
[14:21] <tumeric> btrfs+ceph scrub
[14:21] <tumeric> thing is, btrfs has its issues
[14:21] <tumeric> from time to time I have to rebuild an OSD..
[14:21] * valeech (~valeech@wsip-70-166-79-23.ga.at.cox.net) Quit ()
[14:22] <TMM> I have tried every pool setting for the cache tiering now, it's just not working on jewel
[14:22] <TMM> I'm surprised nobody else found this out
[14:22] <tumeric> anyway guys I'll let you help other people now, thanks so much for the help
[14:23] <Be-El> tumeric: you're welcome
[14:23] * kawa2014 (~kawa@dynamic-adsl-84-220-89-95.clienti.tiscali.it) has joined #ceph
[14:24] <Be-El> TMM: the only relevant thread from 5/17 about cephfs and cache tier does not have an answer yet
[14:24] <TMM> I don't use cephfs though
[14:26] <Be-El> but your setup is similar. cephfs is just a different client
[14:28] <TMM> Be-El, ha, that's the day I joined the ml, I don't have that on my local client :)
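For reference, a sketch of what a jewel-era writeback cache tier setup normally covers; the pool names and thresholds below are placeholders, not TMM's actual configuration:

    ceph osd tier add base-pool cache-pool
    ceph osd tier cache-mode cache-pool writeback
    ceph osd tier set-overlay base-pool cache-pool
    ceph osd pool set cache-pool hit_set_type bloom
    ceph osd pool set cache-pool hit_set_count 1
    ceph osd pool set cache-pool hit_set_period 3600
    ceph osd pool set cache-pool target_max_bytes 400000000000
    ceph osd pool set cache-pool cache_target_dirty_ratio 0.4
    ceph osd pool set cache-pool cache_target_full_ratio 0.8
    ceph osd pool set cache-pool min_read_recency_for_promote 1
    ceph osd pool set cache-pool min_write_recency_for_promote 1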
[14:30] * zer0def (~zer0def@00021960.user.oftc.net) has joined #ceph
[14:31] <zer0def> ok, actually went on the wrong network
[14:31] <zer0def> so hey there, if i prepared and activated an osd with `--dmcrypt`, but there's no /etc/ceph/dmcrypt-keys, where should i start looking for the keys?
[14:31] * IvanJobs (~ivanjobs@183.192.78.179) has joined #ceph
[14:33] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[14:33] * QuantumBeep (~KungFuHam@7V7AAF5BG.tor-irc.dnsbl.oftc.net) Quit ()
[14:35] <zer0def> ok, nvm, just noticed it's stashed on monitors
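For anyone hitting the same thing: with jewel's ceph-disk --dmcrypt the keys live in the monitor config-key store rather than /etc/ceph/dmcrypt-keys. A sketch of how to look them up (the exact key path layout can differ between releases):

    # list stored keys; dmcrypt entries typically look like dm-crypt/osd/<osd-uuid>/luks
    ceph config-key list
    # fetch one key (the path shown is an example)
    ceph config-key get dm-crypt/osd/<osd-uuid>/luks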
[14:37] * Hemanth (~hkumar_@121.244.87.117) Quit (Ping timeout: 480 seconds)
[14:37] * mhack (~mhack@24-107-236-214.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:40] * bitserker1 (~toni@63.pool85-52-240.static.orange.es) Quit (Quit: Leaving.)
[14:41] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[14:41] * dgurtner (~dgurtner@178.197.226.162) has joined #ceph
[14:41] * bara (~bara@nat-pool-brq-t.redhat.com) Quit (Ping timeout: 480 seconds)
[14:45] * ira (~ira@nat-pool-bos-t.redhat.com) has joined #ceph
[14:46] * ngoswami__ (~ngoswami@121.244.87.116) has joined #ceph
[14:48] * b0e (~aledermue@213.95.25.82) Quit (Ping timeout: 480 seconds)
[14:50] * praveen (~praveen@122.167.128.246) Quit (Remote host closed the connection)
[14:51] * EinstCrazy (~EinstCraz@116.224.226.201) has joined #ceph
[14:52] * allaok1 (~allaok@machine107.orange-labs.com) has joined #ceph
[14:52] * jmp242 (~kvirc@lnx6187.classe.cornell.edu) Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[14:52] * allaok (~allaok@machine107.orange-labs.com) Quit (Remote host closed the connection)
[14:52] * overclk (~quassel@117.202.96.73) has joined #ceph
[14:52] * ngoswami_ (~ngoswami@121.244.87.116) Quit (Ping timeout: 480 seconds)
[14:53] * mattia (20026@ninthfloor.org) has joined #ceph
[14:53] * bara (~bara@213.175.37.12) has joined #ceph
[14:53] * MentalRay (~MentalRay@107.171.161.165) has joined #ceph
[14:55] * overclk_ (~quassel@117.202.96.180) Quit (Ping timeout: 480 seconds)
[14:55] * EinstCrazy (~EinstCraz@116.224.226.201) Quit (Remote host closed the connection)
[14:57] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[15:00] * b0e (~aledermue@213.95.25.82) has joined #ceph
[15:00] * georgem (~Adium@206.108.127.16) has joined #ceph
[15:01] * rwheeler (~rwheeler@pool-173-48-195-215.bstnma.fios.verizon.net) has joined #ceph
[15:03] * Sigma (~Spessu@4MJAAGFR4.tor-irc.dnsbl.oftc.net) has joined #ceph
[15:03] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[15:10] * rongze (~rongze@58.210.133.42) has joined #ceph
[15:11] * shyu (~shyu@218.241.172.114) Quit (Ping timeout: 480 seconds)
[15:11] * rongze (~rongze@58.210.133.42) Quit (Read error: Connection reset by peer)
[15:16] * bara (~bara@213.175.37.12) Quit (Ping timeout: 480 seconds)
[15:16] * allaok (~allaok@161.105.181.113) has joined #ceph
[15:16] * jeromeb (~jerome__b@131-9-170-31.fibre.evolix.net) has joined #ceph
[15:17] * madkiss (~madkiss@2001:6f8:12c3:f00f:74ce:6906:ffc1:ca6e) Quit (Quit: Leaving.)
[15:17] * MentalRay (~MentalRay@107.171.161.165) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[15:18] * allaok1 (~allaok@machine107.orange-labs.com) Quit (Remote host closed the connection)
[15:18] * MentalRay (~MentalRay@107.171.161.165) has joined #ceph
[15:19] * MentalRay (~MentalRay@107.171.161.165) Quit ()
[15:20] * bene (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[15:20] * allaok (~allaok@161.105.181.113) Quit ()
[15:21] * post-factum (~post-fact@vulcan.natalenko.name) Quit (Killed (NickServ (Too many failed password attempts.)))
[15:21] * post-factum (~post-fact@vulcan.natalenko.name) has joined #ceph
[15:22] * valeech (~valeech@wsip-70-166-79-23.ga.at.cox.net) has joined #ceph
[15:26] * praveen (~praveen@122.167.128.246) has joined #ceph
[15:26] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[15:28] * deepthi (~deepthi@115.118.48.46) Quit (Ping timeout: 480 seconds)
[15:28] * praveen (~praveen@122.167.128.246) Quit (Remote host closed the connection)
[15:28] * allaok (~allaok@machine107.orange-labs.com) has joined #ceph
[15:29] <Be-El> is it possible to change the report time for slow requests? e.g. 10 seconds instead of 30 seconds
[15:33] * Sigma (~Spessu@4MJAAGFR4.tor-irc.dnsbl.oftc.net) Quit ()
[15:33] * offender (~CoZmicShR@7V7AAF5GD.tor-irc.dnsbl.oftc.net) has joined #ceph
[15:34] * m0zes__ (~mozes@n117m02.cis.ksu.edu) has joined #ceph
[15:35] * zer0def (~zer0def@00021960.user.oftc.net) has left #ceph
[15:37] * deepthi (~deepthi@115.117.168.114) has joined #ceph
[15:37] * manoj (~oftc-webi@183.82.3.27) Quit (Quit: Page closed)
[15:37] * kefu (~kefu@114.92.120.18) has joined #ceph
[15:40] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[15:40] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[15:43] * Penn (~penn11.li@14.154.254.12) Quit (Ping timeout: 480 seconds)
[15:44] <lurbs> Yes, 'osd op complaint time'.
[15:44] <tumeric> Be-El, yes
[15:44] <lurbs> http://docs.ceph.com/docs/jewel/rados/configuration/osd-config-ref/#operations
[15:45] <Be-El> ah, thx
[15:45] * scg (~zscg@valis.gnu.org) has joined #ceph
[15:45] <Be-El> let's see if hell breaks loose if i set it to 10 secs...
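A minimal sketch of that change (10 seconds is just the value from the question above):

    # persistent, in ceph.conf on the OSD hosts
    [osd]
    osd op complaint time = 10

    # or injected at runtime into all OSDs
    ceph tell osd.* injectargs '--osd_op_complaint_time 10'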
[15:46] * bitserker (~toni@63.pool85-52-240.static.orange.es) has joined #ceph
[15:47] <SamYaple> Be-El: esh. RIP
[15:47] * huangjun (~kvirc@117.151.41.23) has joined #ceph
[15:47] * haomaiwang (~haomaiwan@li1068-35.members.linode.com) has joined #ceph
[15:49] <Be-El> SamYaple: hmm?
[15:49] * oms101 (~oms101@p20030057EA1A1000C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[15:50] * mitchty_ is now known as mitchty
[15:50] * kefu (~kefu@114.92.120.18) Quit (Max SendQ exceeded)
[15:50] <SamYaple> Be-El: are your logs overflowing? i lowered mine one time trying to find the cause of the slow requests and the logs overflowed
[15:51] <Be-El> SamYaple: everything is fine, although the cluster is backfik$%K$LTMKY$K NO CARRIER
[15:51] * kefu (~kefu@114.92.120.18) has joined #ceph
[15:52] * mattbenjamin (~mbenjamin@12.118.3.106) has joined #ceph
[15:56] <tumeric> guys, is it possible to increase network throughput within ceph? or does it have to be in the backfill options?
[15:56] <IcePic> Be-El: "swoosh" is the sound of that passing over the head of all the youngsters of today. ;)
[15:57] <Be-El> tumeric: you can increase the number of parallel backfills with osd-max-backfills
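A sketch of bumping that (and the related recovery knob) at runtime; the values are illustrative, and higher numbers mean backfill competes harder with client I/O:

    # more parallel backfills per OSD (the default is low on purpose)
    ceph tell osd.* injectargs '--osd_max_backfills 2'
    # more concurrent recovery ops per OSD
    ceph tell osd.* injectargs '--osd_recovery_max_active 5'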
[15:58] <rkeene> How do you delete snapshots from a pool with 1 OSD, which is full ?
[15:58] <Be-El> rkeene: temporarily setting the full threshold to allow operations on the osd?
[15:59] * DeMiNe0 (~DeMiNe0@104.131.119.74) Quit (Ping timeout: 480 seconds)
[16:03] * offender (~CoZmicShR@7V7AAF5GD.tor-irc.dnsbl.oftc.net) Quit ()
[16:03] * totalwormage (~notarima@snowfall.relay.coldhak.com) has joined #ceph
[16:06] <tumeric> thanks Be-El
[16:06] * DeMiNe0 (~DeMiNe0@104.131.119.74) has joined #ceph
[16:09] * vata (~vata@207.96.182.162) has joined #ceph
[16:09] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) has joined #ceph
[16:10] * gregmark (~Adium@68.87.42.115) has joined #ceph
[16:12] * Penn (~penn11.li@14.154.254.12) has joined #ceph
[16:14] <SamYaple> rkeene: like 100% full? or %95 "full"?
[16:15] * IvanJobs (~ivanjobs@183.192.78.179) Quit (Remote host closed the connection)
[16:16] * vbellur (~vijay@2601:18f:700:55b0:5e51:4fff:fee8:6a5c) Quit (Ping timeout: 480 seconds)
[16:16] * IvanJobs (~ivanjobs@183.192.78.179) has joined #ceph
[16:16] <rkeene> SamYaple, 95% full
[16:17] <rkeene> I have like 47GB free
[16:18] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:20] * swami1 (~swami@49.32.0.204) Quit (Quit: Leaving.)
[16:21] <tumeric> ceph mds getting crazy
[16:21] <tumeric> fsmap e814: 1/1/1 up {0=mds01=up:rejoin}, 1 up:standby
[16:21] <tumeric> stuck on rejoin :p
[16:25] <tumeric> sigh..
[16:25] <SamYaple> rkeene: you can bump it up to like 98% and drop the weight on that osd. or that's what i would do
[16:26] <SamYaple> rkeene: enough to get the data moving at least
[16:26] <rkeene> What would changing the weight accomplish ?
[16:27] <m0zes__> nothing in a 1 osd pool
[16:32] * huangjun (~kvirc@117.151.41.23) Quit (Ping timeout: 480 seconds)
[16:33] * totalwormage (~notarima@06SAADX2F.tor-irc.dnsbl.oftc.net) Quit ()
[16:33] * ZombieTree (~JohnO@94.31.53.203) has joined #ceph
[16:35] * MentalRay (~MentalRay@MTRLPQ42-1176054809.sdsl.bell.ca) has joined #ceph
[16:36] <SamYaple> rkeene: lessen the amount of data on that osd
[16:36] <SamYaple> rkeene: bringing it under the 95% and allowing you to use it. (unless the whole cluster is full up, then add more disks asap)
[16:38] * kefu is now known as kefu|afk
[16:39] * EinstCrazy (~EinstCraz@116.224.226.201) has joined #ceph
[16:39] * vbellur (~vijay@nat-pool-bos-u.redhat.com) has joined #ceph
[16:39] * shaunm (~shaunm@cpe-192-180-17-174.kya.res.rr.com) has joined #ceph
[16:40] * shaunm (~shaunm@cpe-192-180-17-174.kya.res.rr.com) Quit (Remote host closed the connection)
[16:40] * shaunm (~shaunm@cpe-192-180-17-174.kya.res.rr.com) has joined #ceph
[16:42] <rkeene> SamYaple, It's a pool with exactly 1 OSD, I'm trying to lessen the amount of data on the OSD by deleting a snapshot -- but it fails.
[16:43] * danieagle (~Daniel@201-69-180-23.dial-up.telesp.net.br) has joined #ceph
[16:44] <SamYaple> rkeene: oh. yeah, what i've said isn't super useful then
[16:44] * vanham (~vanham@191.185.29.65) has joined #ceph
[16:45] <Be-El> the part about raising the 'full' threshold should help
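A sketch of that route on a jewel-era cluster; the ratios and names are placeholders, and the threshold should go back down as soon as space is freed:

    # temporarily raise the full threshold so the OSD accepts operations again
    ceph pg set_full_ratio 0.97

    # remove the pool snapshot that is eating the space
    ceph osd pool rmsnap <pool-name> <snap-name>
    # (equivalently: rados -p <pool-name> rmsnap <snap-name>)

    # restore the default once the deletion has freed space
    ceph pg set_full_ratio 0.95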
[16:46] * maxx2042 (~m.vernimm@asa01.comparegroup.eu) Quit (Quit: maxx2042)
[16:47] * Penn (~penn11.li@14.154.254.12) Quit (Ping timeout: 480 seconds)
[16:48] * ZombieTree (~JohnO@7V7AAF5KD.tor-irc.dnsbl.oftc.net) Quit ()
[16:51] * blizzow (~jburns@2601:284:8200:e200:7e7a:91ff:fe14:9b91) has joined #ceph
[16:56] * antongribok (~antongrib@216.207.42.140) has joined #ceph
[16:58] * hoopy1 (~theghost9@tor-exit.insane.us.to) has joined #ceph
[16:59] * allaok (~allaok@machine107.orange-labs.com) has left #ceph
[16:59] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) Quit (Remote host closed the connection)
[17:01] * mykola (~Mikolaj@91.245.78.125) has joined #ceph
[17:01] * TMM (~hp@185.5.121.201) Quit (Quit: Ex-Chat)
[17:05] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[17:07] * maxx2042 (~m.vernimm@84.241.212.46) has joined #ceph
[17:10] * jowilkin (~jowilkin@2601:644:4000:b0bf:56ee:75ff:fe10:724e) has joined #ceph
[17:15] * wushudoin (~wushudoin@2601:646:9501:d2b2:2ab2:bdff:fe0b:a6ee) has joined #ceph
[17:15] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[17:17] * jproulx (~jon@kvas.csail.mit.edu) has joined #ceph
[17:18] * linuxkidd (~linuxkidd@ip70-189-207-54.lv.lv.cox.net) has joined #ceph
[17:20] * maxx2042 (~m.vernimm@84.241.212.46) Quit (Quit: maxx2042)
[17:20] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) has joined #ceph
[17:21] * praveen (~praveen@122.167.128.246) has joined #ceph
[17:21] * xarses (~xarses@ip-64-134-222-177.public.wayport.net) has joined #ceph
[17:23] * bniver (~bniver@nat-pool-bos-u.redhat.com) Quit (Remote host closed the connection)
[17:25] * bniver (~bniver@nat-pool-bos-u.redhat.com) has joined #ceph
[17:28] * hoopy1 (~theghost9@7V7AAF5L4.tor-irc.dnsbl.oftc.net) Quit ()
[17:28] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[17:31] * Racpatel (~Racpatel@2601:641:200:4c30:4e34:88ff:fe87:9abf) has joined #ceph
[17:32] * nhm (~nhm@46.20.243.154) has joined #ceph
[17:32] * ChanServ sets mode +o nhm
[17:32] * djidis__ (~datagutt@4MJAAGF2B.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:35] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[17:36] * rdias (~rdias@2001:8a0:749a:d01:2c1f:6405:a506:436e) Quit (Ping timeout: 480 seconds)
[17:37] * bara (~bara@nat-pool-brq-t.redhat.com) Quit (Quit: Bye guys!)
[17:38] * jdillaman (~jdillaman@pool-108-18-97-95.washdc.fios.verizon.net) has joined #ceph
[17:38] * derjohn_mobi (~aj@2001:6f8:1337:0:6824:4dfa:419b:8856) Quit (Ping timeout: 480 seconds)
[17:41] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) has joined #ceph
[17:43] * swami1 (~swami@27.7.164.182) has joined #ceph
[17:44] * gmmaha (~gmmaha@00021e7e.user.oftc.net) Quit (Quit: Off to save the world!!)
[17:44] * rdias (~rdias@2001:8a0:749a:d01:2c1f:6405:a506:436e) has joined #ceph
[17:44] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) Quit (Remote host closed the connection)
[17:44] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) has joined #ceph
[17:45] * gmmaha (~gmmaha@162.243.31.130) has joined #ceph
[17:46] * gmmaharaj (~gmmaha@162.243.31.130) has joined #ceph
[17:46] * gmmaha (~gmmaha@162.243.31.130) Quit (Read error: Connection reset by peer)
[17:46] * gmmaharaj is now known as gmmaha
[17:47] * swami1 (~swami@27.7.164.182) Quit ()
[17:49] * yanzheng (~zhyan@118.116.113.63) Quit (Quit: This computer has gone to sleep)
[17:52] * wjw-freebsd (~wjw@177-100-31-185.ftth.cust.k-sys.ch) has joined #ceph
[17:52] * Penn (~penn11.li@14.154.254.12) has joined #ceph
[17:52] * yanzheng (~zhyan@118.116.113.63) has joined #ceph
[17:55] * yanzheng (~zhyan@118.116.113.63) Quit ()
[17:56] * yanzheng (~zhyan@118.116.113.63) has joined #ceph
[17:56] * cathode (~cathode@50.232.215.114) has joined #ceph
[17:57] * swami1 (~swami@27.7.164.182) has joined #ceph
[18:00] * deepthi (~deepthi@115.117.168.114) Quit (Quit: Leaving)
[18:01] * MentalRay (~MentalRay@MTRLPQ42-1176054809.sdsl.bell.ca) Quit (Ping timeout: 480 seconds)
[18:02] * djidis__ (~datagutt@4MJAAGF2B.tor-irc.dnsbl.oftc.net) Quit ()
[18:02] * RaidSoft (~Sketchfil@ori.enn.lu) has joined #ceph
[18:03] * davidz (~davidz@2605:e000:1313:8003:b16a:6606:2126:de3e) has joined #ceph
[18:04] * EinstCrazy (~EinstCraz@116.224.226.201) Quit (Remote host closed the connection)
[18:06] <jproulx> root@ceph-osd6:~# mkfs.ext4 /dev/sdc1
[18:06] <jproulx> mke2fs 1.42.9 (4-Feb-2014)
[18:06] <jproulx> /dev/sdc1 is not a block special device.
[18:06] * jproulx switches to correct window mea culpa...
[18:08] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[18:08] * Racpatel (~Racpatel@2601:641:200:4c30:4e34:88ff:fe87:9abf) Quit (Ping timeout: 480 seconds)
[18:10] * kawa2014 (~kawa@dynamic-adsl-84-220-89-95.clienti.tiscali.it) Quit (Ping timeout: 480 seconds)
[18:13] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) has joined #ceph
[18:18] * IvanJobs (~ivanjobs@183.192.78.179) Quit (Remote host closed the connection)
[18:18] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[18:19] * kawa2014 (~kawa@dynamic-adsl-84-220-79-106.clienti.tiscali.it) has joined #ceph
[18:21] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) Quit (Quit: Leaving.)
[18:22] * Racpatel (~Racpatel@c-73-170-66-165.hsd1.ca.comcast.net) has joined #ceph
[18:25] * Penn (~penn11.li@14.154.254.12) Quit (Ping timeout: 480 seconds)
[18:26] * shaunm (~shaunm@cpe-192-180-17-174.kya.res.rr.com) Quit (Ping timeout: 480 seconds)
[18:27] * yanzheng (~zhyan@118.116.113.63) Quit (Quit: This computer has gone to sleep)
[18:28] * IvanJobs (~ivanjobs@183.192.78.179) has joined #ceph
[18:28] * nhm (~nhm@46.20.243.154) Quit (Ping timeout: 480 seconds)
[18:32] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) Quit (Remote host closed the connection)
[18:32] * RaidSoft (~Sketchfil@7V7AAF5PN.tor-irc.dnsbl.oftc.net) Quit ()
[18:32] * allenmelon (~Oddtwang@06SAADYAH.tor-irc.dnsbl.oftc.net) has joined #ceph
[18:32] * ira (~ira@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving)
[18:33] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) has joined #ceph
[18:36] * IvanJobs (~ivanjobs@183.192.78.179) Quit (Ping timeout: 480 seconds)
[18:43] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[18:47] * jordanP (~jordan@204.13-14-84.ripe.coltfrance.com) has joined #ceph
[18:47] * jordanP (~jordan@204.13-14-84.ripe.coltfrance.com) Quit (Read error: Connection reset by peer)
[18:48] * vbellur (~vijay@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[18:48] * kawa2014 (~kawa@dynamic-adsl-84-220-79-106.clienti.tiscali.it) Quit (Quit: Leaving)
[18:51] * winston-d_ (uid98317@id-98317.richmond.irccloud.com) has joined #ceph
[18:52] * rraja (~rraja@121.244.87.117) Quit (Quit: Leaving)
[18:53] * erwan_taf (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) Quit (Quit: Leaving)
[18:53] * evelu (~erwan@2a01:e34:eecb:7400:4eeb:42ff:fedc:8ac) has joined #ceph
[18:57] * rakeshgm (~rakesh@106.51.24.145) has joined #ceph
[18:58] * bniver (~bniver@nat-pool-bos-u.redhat.com) Quit (Remote host closed the connection)
[18:59] * dustinm` (~dustinm`@68.ip-149-56-14.net) Quit (Ping timeout: 480 seconds)
[18:59] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[19:02] * allenmelon (~Oddtwang@06SAADYAH.tor-irc.dnsbl.oftc.net) Quit ()
[19:03] * swami1 (~swami@27.7.164.182) Quit (Quit: Leaving.)
[19:04] * krypto (~krypto@G68-90-105-31.sbcis.sbc.com) has joined #ceph
[19:07] * rwheeler (~rwheeler@pool-173-48-195-215.bstnma.fios.verizon.net) Quit (Quit: Leaving)
[19:09] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[19:12] <Heebie> Is an index_pool the only pool you need specific to RadosGW?
[19:14] * kefu|afk (~kefu@114.92.120.18) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[19:17] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[19:18] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) has joined #ceph
[19:19] * tumeric (~jcastro@56.73.54.77.rev.vodafone.pt) Quit (Ping timeout: 480 seconds)
[19:22] <vanham> Heebie, that's a good pool to put on an ssd
[19:23] <vanham> I recommend you create all of them manually, with 32 pgs (except for the data one). Otherwise you could get too many pgs per OSD and that's a warning
[19:25] * tumeric (~jcastro@56.73.54.77.rev.vodafone.pt) has joined #ceph
[19:25] * tumeric (~jcastro@56.73.54.77.rev.vodafone.pt) Quit ()
[19:26] * bitserker (~toni@63.pool85-52-240.static.orange.es) Quit (Ping timeout: 480 seconds)
[19:29] <Heebie> vanham: All of which pools? I might need to look at the calculator for that, I guess.
[19:30] * squizzi (~squizzi@107.13.31.195) has joined #ceph
[19:31] * overclk (~quassel@117.202.96.73) Quit (Remote host closed the connection)
[19:32] * isaxi (~mr_flea@ns316491.ip-37-187-129.eu) has joined #ceph
[19:36] * IvanJobs (~ivanjobs@183.192.78.179) has joined #ceph
[19:37] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[19:37] * branto (~branto@178-253-128-187.3pp.slovanet.sk) Quit (Quit: Leaving.)
[19:44] * IvanJobs (~ivanjobs@183.192.78.179) Quit (Ping timeout: 480 seconds)
[19:46] * Hemanth (~hkumar_@103.228.221.132) has joined #ceph
[19:46] * shylesh__ (~shylesh@45.124.227.70) Quit (Remote host closed the connection)
[19:54] * TomasCZ (~TomasCZ@yes.tenlab.net) has joined #ceph
[19:55] * penguinRaider (~KiKo@116.202.34.78) has joined #ceph
[19:55] * Penn (~penn11.li@14.154.254.12) has joined #ceph
[20:00] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) has joined #ceph
[20:02] * isaxi (~mr_flea@4MJAAGF82.tor-irc.dnsbl.oftc.net) Quit ()
[20:03] * dustinm` (~dustinm`@68.ip-149-56-14.net) has joined #ceph
[20:07] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[20:09] * dgurtner (~dgurtner@178.197.226.162) Quit (Read error: Connection reset by peer)
[20:10] * MentalRay (~MentalRay@office-mtl1-nat-146-218-70-69.gtcomm.net) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[20:10] * Hemanth (~hkumar_@103.228.221.132) Quit (Ping timeout: 480 seconds)
[20:11] * Hemanth (~hkumar_@103.228.221.182) has joined #ceph
[20:12] * penguinRaider_ (~KiKo@116.202.34.78) has joined #ceph
[20:19] * derjohn_mobi (~aj@x590e5d1c.dyn.telefonica.de) has joined #ceph
[20:20] * penguinRaider (~KiKo@116.202.34.78) Quit (Ping timeout: 480 seconds)
[20:20] * Hemanth (~hkumar_@103.228.221.182) Quit (Quit: Leaving)
[20:22] * xarses (~xarses@ip-64-134-222-177.public.wayport.net) Quit (Ping timeout: 480 seconds)
[20:22] * praveen (~praveen@122.167.128.246) Quit (Remote host closed the connection)
[20:22] * ira (~ira@c-24-34-255-34.hsd1.ma.comcast.net) has joined #ceph
[20:30] * Penn (~penn11.li@14.154.254.12) Quit (Ping timeout: 480 seconds)
[20:32] <vanham> Heebie, all of the radosgw pools. The calculator is the right answer here
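A sketch of pre-creating the radosgw pools with small pg counts; the names below match jewel's default zone (they differ with custom realms/zones), and only the bucket data pool gets a pg_num taken from the calculator:

    for pool in .rgw.root default.rgw.control default.rgw.data.root \
                default.rgw.gc default.rgw.log default.rgw.buckets.index; do
        ceph osd pool create "$pool" 32 32
    done
    # the bucket data pool is the one that needs a real pg count from the calculator
    ceph osd pool create default.rgw.buckets.data 256 256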
[20:32] * Scaevolus (~Maariu5_@h-184-90.a322.priv.bahnhof.se) has joined #ceph
[20:34] * penguinRaider__ (~KiKo@116.202.34.78) has joined #ceph
[20:35] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:ccf:dc55:3b8d:d313) Quit (Ping timeout: 480 seconds)
[20:41] * penguinRaider_ (~KiKo@116.202.34.78) Quit (Ping timeout: 480 seconds)
[20:46] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) has joined #ceph
[20:49] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[20:54] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[20:56] * ibravo (~ibravo@72.83.69.64) has joined #ceph
[20:57] * MentalRay (~MentalRay@MTRLPQ42-1176054809.sdsl.bell.ca) has joined #ceph
[20:59] * gregmark (~Adium@68.87.42.115) has joined #ceph
[20:59] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) Quit (Remote host closed the connection)
[21:02] * Long_yanG (~long@15255.s.time4vps.eu) has joined #ceph
[21:02] * Scaevolus (~Maariu5_@06SAADYHQ.tor-irc.dnsbl.oftc.net) Quit ()
[21:02] * PuyoDead (~richardus@tor-exit1-readme.dfri.se) has joined #ceph
[21:03] * The_Ball (~pi@20.92-221-43.customer.lyse.net) Quit (Ping timeout: 480 seconds)
[21:04] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) has joined #ceph
[21:07] * penguinRaider_ (~KiKo@116.202.34.78) has joined #ceph
[21:08] * LongyanG (~long@15255.s.time4vps.eu) Quit (Ping timeout: 480 seconds)
[21:10] * ktdreyer (~kdreyer@polyp.adiemus.org) has joined #ceph
[21:10] * ktdreyer (~kdreyer@polyp.adiemus.org) Quit ()
[21:10] * ktdreyer (~kdreyer@polyp.adiemus.org) has joined #ceph
[21:12] * MentalRay (~MentalRay@MTRLPQ42-1176054809.sdsl.bell.ca) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[21:12] * madkiss (~madkiss@2001:6f8:12c3:f00f:c9da:2c2f:a87e:97f7) has joined #ceph
[21:13] * joelc (~joelc@cpe-24-28-78-20.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[21:13] * MentalRay (~MentalRay@MTRLPQ42-1176054809.sdsl.bell.ca) has joined #ceph
[21:14] * penguinRaider__ (~KiKo@116.202.34.78) Quit (Ping timeout: 480 seconds)
[21:15] * penguinRaider__ (~KiKo@116.202.34.78) has joined #ceph
[21:16] * rakeshgm (~rakesh@106.51.24.145) Quit (Ping timeout: 480 seconds)
[21:18] * rakeshgm (~rakesh@106.51.24.145) has joined #ceph
[21:19] * rakeshgm (~rakesh@106.51.24.145) Quit (Remote host closed the connection)
[21:22] * The_Ball (~pi@20.92-221-43.customer.lyse.net) has joined #ceph
[21:22] * penguinRaider_ (~KiKo@116.202.34.78) Quit (Ping timeout: 480 seconds)
[21:22] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) has joined #ceph
[21:22] * treenerd_ (~gsulzberg@cpe90-146-148-47.liwest.at) Quit ()
[21:29] * shaunm (~shaunm@74.83.215.100) has joined #ceph
[21:31] * rendar (~I@host102-48-dynamic.1-87-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[21:32] * PuyoDead (~richardus@4MJAAGGDW.tor-irc.dnsbl.oftc.net) Quit ()
[21:38] * xarses (~xarses@64.124.158.100) has joined #ceph
[21:38] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[21:39] <vanham> Folks, question. I want a pre-release build of the branch that will become v10.2.2 later, and gitbuilder has that. There is a fix I need that's already merged. Should I get the jewel branch or the master branch on gitbuilder?
[21:40] * bene (~bene@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[21:40] * whatevsz___ (~quassel@185.22.140.109) Quit (Quit: No Ping reply in 180 seconds.)
[21:42] * whatevsz (~quassel@185.22.140.109) has joined #ceph
[21:46] * antongribok (~antongrib@216.207.42.140) Quit (Quit: Leaving...)
[21:47] * ngoswami__ (~ngoswami@121.244.87.116) Quit (Quit: Leaving)
[21:50] <vanham> There are too many new things on master... So, no good here...
[21:50] * krypto (~krypto@G68-90-105-31.sbcis.sbc.com) Quit (Read error: Connection reset by peer)
[21:54] * lx0 is now known as lxo
[21:57] * rendar (~I@host102-48-dynamic.1-87-r.retail.telecomitalia.it) has joined #ceph
[21:59] * thomnico (~thomnico@2a01:e35:8b41:120:4810:b973:7519:fa42) Quit (Quit: Ex-Chat)
[21:59] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Quit: Konversation terminated!)
[22:02] * mykola (~Mikolaj@91.245.78.125) Quit (Quit: away)
[22:03] <ktdreyer> vanham: you want jewel, or jewel-backports
[22:05] <TMM> be-el: After downgrading to hammer the cache pools seem to work as expected (by me at least) I'll test in my lab to see if infernalis does the same thing
[22:06] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving.)
[22:08] * penguinRaider_ (~KiKo@116.202.34.78) has joined #ceph
[22:08] * Nicola-1_ (~Nicola-19@x4db48315.dyn.telefonica.de) has joined #ceph
[22:08] * Nicola-1980 (~Nicola-19@x4db48315.dyn.telefonica.de) Quit (Read error: Connection reset by peer)
[22:14] * penguinRaider__ (~KiKo@116.202.34.78) Quit (Ping timeout: 480 seconds)
[22:14] * georgem (~Adium@206.108.127.16) has left #ceph
[22:17] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:19] <vanham> ktdreyer, thanks! But I just checked that the fix is not backported yet
[22:20] <ktdreyer> vanham: ah, ok. is there a redmine ticket, with "Backport: jewel" ?
[22:20] <vanham> Yep!
[22:20] <ktdreyer> good deal
[22:20] <vanham> It's in progress
[22:20] <vanham> http://tracker.ceph.com/issues/16041
[22:22] <vanham> Let's wait a bit more for it then. Doing RBD+NFS is bad for the mind. CephFS is the way to go
[22:23] <derjohn_mobi> Hey, folks! Can I run 0.94.5, 0.94.6 and 0.94.7 in the same cluster or will that mixing cause trouble?
[22:23] <vanham> derjohn_mobi, in theory yes, but you shouldn't. why would you need that?
[22:24] <derjohn_mobi> During Zero-Downtime Upgrade ...
[22:24] <derjohn_mobi> I mean I don't want to stop everything
[22:24] <vanham> Ok ok ok
[22:25] <SamYaple> derjohn_mobi: of course. that's what ceph excels at
[22:25] <vanham> You can do live upgrades
[22:25] <SamYaple> but you don't want to stay in that state
[22:25] <derjohn_mobi> I just wanted to know if it's "semver" compatible versioning ;)
[22:25] <derjohn_mobi> OK, thx !
[22:25] <vanham> x2: just don't stay in that mode 4ever
[22:25] <derjohn_mobi> sure ..... :)
[22:26] <derjohn_mobi> thx, for the quick reply !
[22:26] <vanham> You can do live upgrades from Hammer to Infernalis and later from Infernalis to Jewel
[22:26] <vanham> So, you can even do different major versions
[22:26] <vanham> But then the order is important
[22:26] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:27] <vanham> The docs on upgrading are always complete
[22:27] * wjw-freebsd (~wjw@177-100-31-185.ftth.cust.k-sys.ch) Quit (Ping timeout: 480 seconds)
[22:28] <SamYaple> yea 0.94.5 directly to jewel (10.2.x) is doable too
[22:28] <SamYaple> derjohn_mobi: for upgrading the order of service restarts is important
[22:28] <SamYaple> derjohn_mobi: normally mons, then osds. but the docs always have the right info on that order
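A rough sketch of a rolling hammer point-release upgrade, one host at a time; the restart commands depend on the distro's init system, and the release notes for the target version remain the authority:

    ceph osd set noout                       # avoid rebalancing while daemons restart
    # on each monitor host: upgrade the packages, then
    restart ceph-mon id=$(hostname -s)       # upstart; systemd: systemctl restart ceph-mon@$(hostname -s)
    # once the mons are done, on each OSD host: upgrade the packages, then
    restart ceph-osd-all                     # upstart; systemd: systemctl restart ceph-osd.target
    ceph tell osd.* version                  # confirm every daemon runs the expected version
    ceph osd unset noout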
[22:32] * dug (~cryptk@89.187.143.81) has joined #ceph
[22:34] * wjw-freebsd (~wjw@177-100-31-185.ftth.cust.k-sys.ch) has joined #ceph
[22:35] * vanham (~vanham@191.185.29.65) Quit (Quit: Ex-Chat)
[22:38] <[arx]> I am trying to setup jewel on centos 7. when running ceph-disk activate, i get the following backtrace due to a permissions error http://sprunge.us/FBiS
[22:39] <[arx]> weird thing is, the ceph user has rwx permissions to /var/lib/ceph/tmp/
[22:40] <m0zes__> [arx]: selinux or apparmor?
[22:40] <[arx]> selinux is disabled
[22:41] <[arx]> no apparmor
[22:41] <m0zes__> that is strange, then. I'm out of ideas...
[22:47] * penguinRaider__ (~KiKo@116.202.34.78) has joined #ceph
[22:47] * nhm (~nhm@46.20.243.154) has joined #ceph
[22:47] * ChanServ sets mode +o nhm
[22:54] * penguinRaider_ (~KiKo@116.202.34.78) Quit (Ping timeout: 480 seconds)
[23:02] * dug (~cryptk@7V7AAF52R.tor-irc.dnsbl.oftc.net) Quit ()
[23:02] * Curt` (~CydeWeys@06SAADYOF.tor-irc.dnsbl.oftc.net) has joined #ceph
[23:02] * IvanJobs (~ivanjobs@183.192.78.179) has joined #ceph
[23:07] * johnavp1989 (~jpetrini@8.39.115.8) Quit (Remote host closed the connection)
[23:08] * penguinRaider_ (~KiKo@116.202.34.78) has joined #ceph
[23:10] * xarses (~xarses@64.124.158.100) Quit (Ping timeout: 480 seconds)
[23:10] * IvanJobs (~ivanjobs@183.192.78.179) Quit (Ping timeout: 480 seconds)
[23:11] <[arx]> weee, openstack/puppet-ceph disables the udev rules that chown the partitions to ceph:ceph, which prevents ceph-disk activate from working
[23:12] <jamespd_> [arx]: how are they being disabled?
[23:13] <jamespd_> sounds similar to an issue I had recently
[23:14] <[arx]> http://sprunge.us/eeIO
[23:15] * penguinRaider__ (~KiKo@116.202.34.78) Quit (Ping timeout: 480 seconds)
[23:16] <[arx]> the module itself doesn't claim to support jewel on centos yet, i was trying to add that support.
[23:16] <jamespd_> heh
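For context: jewel's packages ship a udev rule (95-ceph-osd.rules) that chowns the OSD partitions to ceph:ceph, which ceph-disk activate relies on now that the daemons no longer run as root. A sketch of the manual workaround while the puppet module is being fixed (device names are examples):

    # give the ceph user the data and journal partitions, then retry activation
    chown ceph:ceph /dev/sdb1 /dev/sdb2
    ceph-disk activate /dev/sdb1

    # longer term: leave the packaged rule enabled and re-trigger udev
    udevadm trigger --action=add --subsystem-match=block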
[23:21] * xarses (~xarses@64.124.158.100) has joined #ceph
[23:21] * penguinRaider__ (~KiKo@116.202.34.78) has joined #ceph
[23:21] * jclm (~jclm@rrcs-69-193-93-163.nys.biz.rr.com) has joined #ceph
[23:22] * mattbenjamin (~mbenjamin@12.118.3.106) Quit (Quit: Leaving.)
[23:28] * penguinRaider_ (~KiKo@116.202.34.78) Quit (Ping timeout: 480 seconds)
[23:30] * penguinRaider__ (~KiKo@116.202.34.78) Quit (Ping timeout: 480 seconds)
[23:32] * Curt` (~CydeWeys@06SAADYOF.tor-irc.dnsbl.oftc.net) Quit ()
[23:32] * Maza (~Zombiekil@103.41.177.49) has joined #ceph
[23:34] * fsimonce (~simon@host107-37-dynamic.251-95-r.retail.telecomitalia.it) Quit (Quit: Coyote finally caught me)
[23:39] * ffilz (~ffilz@c-76-115-190-27.hsd1.or.comcast.net) Quit (Quit: Leaving)
[23:40] * penguinRaider__ (~KiKo@146.185.31.226) has joined #ceph
[23:45] * reed (~reed@216.38.134.18) has joined #ceph
[23:45] * ffilz (~ffilz@c-76-115-190-27.hsd1.or.comcast.net) has joined #ceph
[23:48] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) Quit (Remote host closed the connection)
[23:52] * m0zes__ (~mozes@n117m02.cis.ksu.edu) Quit (Quit: m0zes__)
[23:54] * xarses (~xarses@64.124.158.100) Quit (Ping timeout: 480 seconds)
[23:58] * Lokta (~Lokta@carbon.coe.int) Quit (Ping timeout: 480 seconds)
[23:59] * mhack (~mhack@24-107-236-214.dhcp.oxfr.ma.charter.com) Quit (Remote host closed the connection)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.