#ceph IRC Log

IRC Log for 2014-06-18

Timestamps are in GMT/BST.

[0:03] * sleinen1 (~Adium@2001:620:0:26:b5f5:f6f8:e3a6:2a59) Quit (Quit: Leaving.)
[0:03] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) has joined #ceph
[0:05] * sleinen1 (~Adium@2001:620:0:26:1d75:14eb:f0e8:167) has joined #ceph
[0:10] * madkiss (~madkiss@2001:6f8:12c3:f00f:1923:1794:9374:f199) Quit (Quit: Leaving.)
[0:10] * ikrstic (~ikrstic@178-222-94-242.dynamic.isp.telekom.rs) has joined #ceph
[0:11] * sleinen1 (~Adium@2001:620:0:26:1d75:14eb:f0e8:167) Quit (Quit: Leaving.)
[0:11] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[0:13] * phantomcircuit (~phantomci@2600:3c01::f03c:91ff:fe73:6892) has joined #ceph
[0:16] * rotbeard (~redbeard@2a02:908:df11:9480:6267:20ff:feb7:c20) Quit (Quit: Leaving)
[0:17] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[0:17] * ikrstic (~ikrstic@178-222-94-242.dynamic.isp.telekom.rs) Quit (Quit: Konversation terminated!)
[0:18] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) Quit (Quit: doppelgrau)
[0:22] * aldavud_ (~aldavud@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[0:23] * KevinPerks (~Adium@cpe-174-098-096-200.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[0:46] * nljmo (~nljmo@173-11-110-227-SFBA.hfc.comcastbusiness.net) has joined #ceph
[0:47] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) has joined #ceph
[0:47] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[0:48] * john_dee (~id@91.185.84.243) has joined #ceph
[0:48] <john_dee> Hi.
[0:49] <john_dee> Are there any projections on when CephFS will be production-ready?
[0:49] <rweeks> the same year that nuclear fusion is viable
[0:49] * rweeks grins and ducks
[0:49] <john_dee> I'm considering it for a project and, you know, it's not recommended for production data on the site ^)
[0:51] <john_dee> The alternative of GlusterFS isn't very tempting...
[0:53] <Knorrie> I think it's more like: "we're still developing on this, don't blame us if this kills your cat"
[0:53] <john_dee> rweeks: Very optimistic :)
[0:53] <Knorrie> just start using ceph, build a test setup, put production load on it, and you'll see what happens in your case
[0:54] <bens> dmick: people around the office are wearing "i like big data and cannot lie" shirts
[0:54] <Knorrie> the dangers are in corner cases that do not often occur
[0:54] <darkfader> those are called production use
[0:54] * barnim (~redbeard@2a02:908:df11:9480:76f0:6dff:fe3b:994d) Quit (Quit: Verlassend)
[0:55] <john_dee> Hehe
[0:55] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[0:55] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[0:56] * ChrisNBlum (~Adium@dhcp-ip-230.dorf.rwth-aachen.de) Quit (Quit: Leaving.)
[0:56] * kevinc (~kevinc__@client65-78.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[0:56] <john_dee> The most recent comment from the devs just a month ago
[0:56] <john_dee> http://comments.gmane.org/gmane.comp.file-systems.ceph.user/10140
[0:57] <Knorrie> oh, excuse me, I misread CephFS for Cepf
[0:57] * john_dee thinks he should look through the bugtracker.
[0:57] <Knorrie> s/f/h
[0:57] <kraken> Knorrie meant to say: oh, excuse me, I misread CephFS hor Ceph
[0:57] <john_dee> Yep, FS indeed.
[0:57] * fdmanana (~fdmanana@bl4-61-209.dsl.telepac.pt) Quit (Quit: Leaving)
[0:57] <Knorrie> in that case, ignore what I just said
[0:57] * jtang1 (~jtang@80.111.79.253) Quit (Quit: Leaving.)
[0:58] <darkfader> john_dee: i'd read that as a "6 months"
[0:58] <john_dee> I'd deploy Ceph without a second thought as object storage, but FS is another thing.
[0:58] <Knorrie> But you can contribute to the project by starting to use it in a test setup, and reporting any issues you encounter
[0:58] <Knorrie> :)
[0:59] <darkfader> if they're now in the stabilizing phase it'll progress well (like before, mon and rbd in the last years)
[0:59] <john_dee> And it's not that I'm prepared to take chances with "unexpected bugs" in a production setup :)
[0:59] <darkfader> if you handle anything like financial data it's just not possible
[0:59] <darkfader> but then i'd also not (ever) go with gluster
[0:59] <john_dee> Knorrie: That I will definitely do. In fact I'm in the process of setting up a test env.
[1:00] <rweeks> just wait for GLADOS
[1:00] <john_dee> darkfader: Well, it's not financial data, but still a mission-critical system. So not much space for improvisation :)
[1:01] <john_dee> rweeks: It might take longer than nuclear fusion :)
[1:01] * jtang1 (~jtang@80.111.79.253) has joined #ceph
[1:01] <darkfader> no go imho
[1:01] <john_dee> darkfader: Why not Gluster?
[1:01] <john_dee> It's a web service, by the way.
[1:01] * jtang1 (~jtang@80.111.79.253) Quit ()
[1:01] <darkfader> test it for yourself, try how it handles outages
[1:01] <john_dee> A lot of small files on a web frontend.
[1:01] <john_dee> darkfader: Gluster?
[1:01] <darkfader> yes
[1:02] <john_dee> Not good, I assume? ^)
[1:02] <darkfader> that was my impression. it is said to be much better now but i don't trust it anymore
[1:02] <john_dee> I'm running a test setup at the moment and it doesn't handle the load too well :\
[1:02] <darkfader> cephfs has bugs, the other thing has design flaws
[1:03] <Knorrie> heh
[1:03] <john_dee> darkfader: Gluster doesn't have SPOFs like the MDS to take care of.
[1:03] <john_dee> It???s a spof in itself :)
[1:04] <rweeks> every distributed FS has to do _something_ to manage metadata
[1:04] <rweeks> this is not a trivial problem.
[1:04] <rweeks> gluster keeps metadata in a namespace cache in a separate volume
[1:04] <rweeks> that volume is the SPOF
[1:04] <darkfader> john_dee: for the actual thing you need, if you have small files, maybe you could just use the radosgw and turn small files into objects
[1:05] * kevinc (~kevinc__@client65-78.sdsc.edu) has joined #ceph
[1:05] <rweeks> the CephFS can have many MDSes, but each of those have metadata for a chunk of filesystem
[1:05] <rweeks> so you can reduce the SPOF
[1:05] <rweeks> but IIRC some of the bugs are around the multiple MDS
[1:06] <john_dee> darkfader: I wish it was possible at the moment, but it has to be a POSIX fs, not an object store.
[1:06] <john_dee> rweeks: This is not the issue since CephFS itself isn't quite stable.
[1:07] <john_dee> Any recommendations for the use case besides Gluster?
[1:07] <john_dee> GFS, OCFS2?
[1:07] <john_dee> Anyone has experience with these?
[1:07] <rweeks> john_dee, what I recall is that CephFS with a single MDS is actually quite stable
[1:07] <rweeks> it's when you get multiple MDS that it is not
[1:07] <darkfader> if ocfs2 is in the list (so you have some kind of shared storage) i'd go buy veritas
[1:07] <rweeks> (bear in mind my recollection may not be accurate)
[1:08] <darkfader> if the data is important and you need something that already works now
[1:08] <john_dee> darkfader: Budget = $0
[1:08] <rweeks> budget for a mission critical system is zero?
[1:08] <darkfader> sigh. :>
[1:08] <rweeks> pff
[1:08] <rweeks> can't be THAT critical
[1:08] * KevinPerks (~Adium@cpe-174-098-096-200.triad.res.rr.com) has joined #ceph
[1:08] * aldavud_ (~aldavud@217-162-119-191.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[1:08] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[1:09] <darkfader> ocfs2 is nice because it's pretty small and trivial, imho. but i always hear from people (not me) it blew up on them
[1:09] <john_dee> rweeks: Nice, all you have to do is to wait until multiple MDS matures enough so you can get rid of this SPOF :)
[1:09] <rweeks> basically
[1:09] <john_dee> Well, maybe not zero, but probably not enough for Veritas :)
[1:09] <darkfader> if you don't need high perf, moosefs is also a nice thing
[1:10] <darkfader> it's like a cessna trainer for distributed filesystems ;)
[1:10] <john_dee> darkfader: Ouch. Blowing up doesn't sound too reliable
[1:10] <rweeks> I'd still go back to a previous recommendation and give CephFS a try in your lab.
[1:10] <darkfader> john_dee: idk. i just _never_ saw
[1:10] <john_dee> High perf is nice to have. It's a web frontend after all.
[1:11] <john_dee> rweeks: Will definitely do.
[1:11] <darkfader> cephfs is fun to use
[1:11] <darkfader> <3 it
[1:12] <john_dee> darkfader: On the other hand, I've read about Gluster blowing up too, and about every other solution as well.
[1:12] <john_dee> Probably a matter of edge cases, so to say %)
[1:12] <darkfader> for gluster i've seen it lol
[1:12] <john_dee> Beyond repair?
[1:13] <darkfader> beyond care
[1:13] <darkfader> it can take 30 minutes to fail over
[1:13] <darkfader> to a different "data brick" while the data is waiting there
[1:14] <john_dee> Real time, eh…
[1:14] <darkfader> and that thing they always repeat about how you can still access the data on a single node
[1:15] <darkfader> it's not true because you need to run it in a distributed raid mode with async write cache
[1:15] <darkfader> all nodes just have sparse files i think, so if you do a 'ls' it's there, but it's only some blocks inside
[1:20] <john_dee> Doesn't get me closer to a decision, but thanks for the info. I will at least give CephFS a try on a test setup.
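
A minimal sketch of the kind of CephFS test setup being discussed, assuming a cluster already deployed with ceph-deploy, a single MDS (the configuration described above as the stable one), and a client with the kernel CephFS driver; host names and the secret are placeholders:

    # add one MDS to the cluster ('mds-host' is a placeholder)
    ceph-deploy mds create mds-host
    # mount CephFS on a client via the kernel driver ('mon-host' is a placeholder)
    sudo mkdir -p /mnt/cephfs
    sudo mount -t ceph mon-host:6789:/ /mnt/cephfs -o name=admin,secret=<admin-key>
    # rough smoke test with many small files, as in the web-frontend use case above
    for i in $(seq 1 1000); do echo test > /mnt/cephfs/file-$i; done
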
[1:20] <ponyofdeath> anyone know if it's safe to remove the snapshots of an osd and only keep current? to help free up some room so that i can start the osd?
[1:25] * john_dee (~id@91.185.84.243) has left #ceph
[1:26] * rweeks (~goodeats@192.169.20.75.static.etheric.net) Quit (Quit: Leaving)
[1:31] * nljmo (~nljmo@173-11-110-227-SFBA.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[1:34] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) has joined #ceph
[1:37] * LeaChim (~LeaChim@host86-174-77-240.range86-174.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:38] * nljmo (~nljmo@173-11-110-227-SFBA.hfc.comcastbusiness.net) has joined #ceph
[1:42] * The_Bishop (~bishop@e180178126.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[1:48] * xinyi (~xinyi@2406:2000:ef96:3:b81f:9fa3:ffff:dfb0) has joined #ceph
[1:48] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[1:52] * ScOut3R (~ScOut3R@5401C5FF.dsl.pool.telekom.hu) has joined #ceph
[1:54] * narb_ (~Jeff@38.99.52.10) has joined #ceph
[1:54] * narb (~Jeff@38.99.52.10) Quit (Read error: Connection reset by peer)
[1:54] * narb_ is now known as narb
[1:55] * diegows (~diegows@190.190.5.238) Quit (Ping timeout: 480 seconds)
[1:55] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Read error: No route to host)
[1:56] * xinyi (~xinyi@2406:2000:ef96:3:b81f:9fa3:ffff:dfb0) Quit (Ping timeout: 480 seconds)
[1:57] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[2:00] * zack_dolby (~textual@pdf8519e7.tokynt01.ap.so-net.ne.jp) has joined #ceph
[2:00] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[2:01] * aldavud_ (~aldavud@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[2:02] * cephtron (~cephtron@58-65-166-154.nayatel.pk) Quit (Quit: HydraIRC -> http://www.hydrairc.com <- \o/)
[2:05] * adamcrume (~quassel@2601:9:6680:47:38f4:bc14:868e:786) Quit (Remote host closed the connection)
[2:06] * ScOut3R (~ScOut3R@5401C5FF.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[2:07] * dmick (~dmick@74.203.127.5) has joined #ceph
[2:08] * sarob (~sarob@2001:4998:effd:600:953c:769d:7d2d:b99a) Quit (Remote host closed the connection)
[2:08] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) Quit (Remote host closed the connection)
[2:08] * sarob (~sarob@2001:4998:effd:600:953c:769d:7d2d:b99a) has joined #ceph
[2:08] * zack_dolby (~textual@pdf8519e7.tokynt01.ap.so-net.ne.jp) Quit (Read error: Connection reset by peer)
[2:08] * sz0 (~sz0@94.54.193.66) Quit (Read error: Connection reset by peer)
[2:09] * zack_dolby (~textual@pdf8519e7.tokynt01.ap.so-net.ne.jp) has joined #ceph
[2:11] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) has joined #ceph
[2:11] * ScOut3R (~ScOut3R@5401C5FF.dsl.pool.telekom.hu) has joined #ceph
[2:14] * sz0 (~sz0@94.54.193.66) has joined #ceph
[2:15] * zerick (~eocrospom@190.187.21.53) Quit (Ping timeout: 480 seconds)
[2:16] * sarob (~sarob@2001:4998:effd:600:953c:769d:7d2d:b99a) Quit (Ping timeout: 480 seconds)
[2:19] * ScOut3R (~ScOut3R@5401C5FF.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[2:19] * dmick1 (~dmick@74.203.127.5) has joined #ceph
[2:19] * dmick (~dmick@74.203.127.5) Quit (Read error: Connection reset by peer)
[2:20] * dmick1 (~dmick@74.203.127.5) Quit (Read error: Connection reset by peer)
[2:20] * kevinc (~kevinc__@client65-78.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[2:20] * dmick (~dmick@74.203.127.5) has joined #ceph
[2:20] * xarses (~andreww@12.164.168.117) Quit (Ping timeout: 480 seconds)
[2:21] * dmick1 (~dmick@2607:fb90:c15:cfb6:6c7a:f5bb:6dbd:4c18) has joined #ceph
[2:28] * dmick (~dmick@74.203.127.5) Quit (Ping timeout: 480 seconds)
[2:28] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Quit: Computer has gone to sleep.)
[2:28] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[2:28] * aldavud_ (~aldavud@217-162-119-191.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[2:31] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[2:33] * yguang11 (~yguang11@2406:2000:ef96:e:f5e1:188e:5db:7eb8) Quit (Remote host closed the connection)
[2:34] * yguang11 (~yguang11@2406:2000:ef96:e:f5e1:188e:5db:7eb8) has joined #ceph
[2:36] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) Quit (Read error: Operation timed out)
[2:37] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[2:37] * zidarsk8 (~zidar@89-212-142-10.dynamic.t-2.net) has joined #ceph
[2:40] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[2:40] * dlan_ (~dennis@116.228.88.131) Quit (Read error: Operation timed out)
[2:41] * dlan (~dennis@116.228.88.131) has joined #ceph
[2:42] * dmick1 (~dmick@2607:fb90:c15:cfb6:6c7a:f5bb:6dbd:4c18) Quit (Ping timeout: 480 seconds)
[2:44] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[2:47] * cookednoodles (~eoin@eoin.clanslots.com) Quit (Quit: Ex-Chat)
[2:49] * xinyi (~xinyi@2406:2000:ef96:3:a9a3:341:654a:5fbc) has joined #ceph
[2:55] * zidarsk8 (~zidar@89-212-142-10.dynamic.t-2.net) Quit (Ping timeout: 480 seconds)
[2:57] * xinyi (~xinyi@2406:2000:ef96:3:a9a3:341:654a:5fbc) Quit (Ping timeout: 480 seconds)
[2:57] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) Quit (Ping timeout: 480 seconds)
[2:57] * dlan_ (~dennis@116.228.88.131) has joined #ceph
[2:58] * zidarsk8 (~zidar@89-212-142-10.dynamic.t-2.net) has joined #ceph
[2:59] <sherry> Hey guys, I'd like to test the writeback cache tier but I am not sure which options best suit me for hit_set_count, hit_set_period, etc. Is there any document other than the ceph website that explains it in detail?
[2:59] * angdraug (~angdraug@12.164.168.117) Quit (Quit: Leaving)
[2:59] <sherry> or is there anyone that can advise me on the right choice?
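
For reference, a sketch of the cache-tier knobs being asked about, assuming a backing pool named cold-storage and a cache pool named hot-cache (both placeholders); the values are purely illustrative, not recommendations:

    # attach the cache pool in writeback mode
    ceph osd tier add cold-storage hot-cache
    ceph osd tier cache-mode hot-cache writeback
    ceph osd tier set-overlay cold-storage hot-cache
    # hit-set tracking (bloom filter) decides which objects count as hot
    ceph osd pool set hot-cache hit_set_type bloom
    ceph osd pool set hot-cache hit_set_count 1
    ceph osd pool set hot-cache hit_set_period 3600
    # flush/evict thresholds
    ceph osd pool set hot-cache target_max_bytes 1099511627776
    ceph osd pool set hot-cache cache_target_dirty_ratio 0.4
    ceph osd pool set hot-cache cache_target_full_ratio 0.8
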
[2:59] * xarses (~andreww@c-24-23-183-44.hsd1.ca.comcast.net) has joined #ceph
[2:59] * dlan (~dennis@116.228.88.131) Quit (Ping timeout: 480 seconds)
[3:00] * tobiash (~quassel@mail.bmw-carit.de) has joined #ceph
[3:00] * rmoe (~quassel@12.164.168.117) Quit (Ping timeout: 480 seconds)
[3:04] * nljmo (~nljmo@173-11-110-227-SFBA.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[3:05] * nljmo (~nljmo@173-11-110-227-SFBA.hfc.comcastbusiness.net) has joined #ceph
[3:08] * tobiash_ (~quassel@host-88-217-137-244.customer.m-online.net) Quit (Ping timeout: 480 seconds)
[3:08] * KaZeR (~kazer@64.201.252.132) Quit (Remote host closed the connection)
[3:09] * nhm_ (~nhm@74.203.127.5) Quit (Ping timeout: 480 seconds)
[3:09] * narb (~Jeff@38.99.52.10) Quit (Quit: narb)
[3:10] * sz0 (~sz0@94.54.193.66) Quit ()
[3:15] <classicsnail> http://www.supermicro.com/products/system/1U/5018/SSG-5018A-AR12L.cfm
[3:16] <classicsnail> 1u, 12 hot swap 3.5" disks, 8 core avoton atom, up to 64GB of RAM on a board, and redundant power
[3:17] * rmoe (~quassel@173-228-89-134.dsl.static.sonic.net) has joined #ceph
[3:27] * The_Bishop (~bishop@2001:470:50b6:0:d95d:e9f8:c7bc:b977) has joined #ceph
[3:44] * PureNZ (~paul@122-62-45-132.jetstream.xtra.co.nz) has joined #ceph
[3:49] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) has joined #ceph
[3:52] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[3:57] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) Quit (Read error: Operation timed out)
[4:02] * jcsp1 (~Adium@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[4:06] * jcsp (~Adium@0001bf3a.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:09] * shang (~ShangWu@175.41.48.77) has joined #ceph
[4:18] * yguang11 (~yguang11@2406:2000:ef96:e:f5e1:188e:5db:7eb8) Quit (Remote host closed the connection)
[4:19] * yguang11 (~yguang11@2406:2000:ef96:e:a182:ccee:600:2fad) has joined #ceph
[4:20] * masta (~masta@190.7.213.210) Quit (Quit: Leaving...)
[4:30] * xinyi_ (~xinyi@2406:2000:ef96:e:d04:132f:a9d7:514f) has joined #ceph
[4:36] * haomaiwa_ (~haomaiwan@112.193.130.61) Quit (Read error: Connection reset by peer)
[4:39] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Quit: Computer has gone to sleep.)
[4:44] * DV_ (~veillard@2001:41d0:1:d478::1) has joined #ceph
[4:44] * haomaiwang (~haomaiwan@112.193.130.61) has joined #ceph
[4:45] * zhaochao (~zhaochao@124.205.245.26) has joined #ceph
[4:56] <sherry> classicsnail: thanks for ur reply bt I meant in terms of c
[4:57] <sherry> classicsnail: thanks for ur reply bt I meant in terms of Ceph configurations.
[4:57] * sarob (~sarob@mobile-166-137-185-036.mycingular.net) has joined #ceph
[5:01] * KevinPerks (~Adium@cpe-174-098-096-200.triad.res.rr.com) Quit (Quit: Leaving.)
[5:03] * xinyi (~xinyi@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[5:04] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) Quit (Quit: Leaving.)
[5:07] * xinyi_ (~xinyi@2406:2000:ef96:e:d04:132f:a9d7:514f) Quit (Ping timeout: 480 seconds)
[5:11] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[5:15] * zack_dolby (~textual@pdf8519e7.tokynt01.ap.so-net.ne.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[5:17] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Read error: Operation timed out)
[5:18] <classicsnail> sherry: I wasn't replying, I was merely putting it in here for general comment if anyone saw fit
[5:18] <classicsnail> your query, I don't know sorry
[5:18] <sherry> ah okay
[5:21] * Vacum_ (~vovo@i59F79493.versanet.de) has joined #ceph
[5:23] * Cube (~Cube@66-87-133-44.pools.spcsdns.net) has joined #ceph
[5:25] * themgt (~themgt@c-76-104-28-47.hsd1.va.comcast.net) has joined #ceph
[5:28] * Vacum (~vovo@88.130.206.118) Quit (Ping timeout: 480 seconds)
[5:33] * lucas1 (~Thunderbi@222.247.57.50) has joined #ceph
[5:33] * zack_dolby (~textual@pdf8519e7.tokynt01.ap.so-net.ne.jp) has joined #ceph
[5:47] * Cube (~Cube@66-87-133-44.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[5:48] * Cube (~Cube@66-87-133-44.pools.spcsdns.net) has joined #ceph
[5:49] * jrist (~jrist@174-29-95-54.hlrn.qwest.net) Quit (Read error: Connection reset by peer)
[5:52] * Cube (~Cube@66-87-133-44.pools.spcsdns.net) Quit ()
[5:57] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) Quit (Quit: Leaving.)
[5:57] * shalicke (~shalicke@192.241.186.125) Quit (Read error: Connection reset by peer)
[5:59] <mastamind> hi. does ceph support multipathing in any way? if it does not, would tcp multipathing be an option?
[6:02] * lucas1 (~Thunderbi@222.247.57.50) Quit (Quit: lucas1)
[6:05] * primechu_ (~primechuc@69.170.148.179) has joined #ceph
[6:05] * primechuck (~primechuc@69.170.148.179) Quit (Remote host closed the connection)
[6:11] * sarob (~sarob@mobile-166-137-185-036.mycingular.net) Quit (Remote host closed the connection)
[6:11] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[6:12] * sarob (~sarob@mobile-166-137-185-036.mycingular.net) has joined #ceph
[6:17] * haomaiwang (~haomaiwan@112.193.130.61) Quit (Remote host closed the connection)
[6:18] * haomaiwang (~haomaiwan@li721-169.members.linode.com) has joined #ceph
[6:18] * vbellur (~vijay@nat-pool-blr-t.redhat.com) has joined #ceph
[6:20] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[6:20] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) has joined #ceph
[6:20] * sarob (~sarob@mobile-166-137-185-036.mycingular.net) Quit (Ping timeout: 480 seconds)
[6:22] <pinguini> why do my KVM VMs get stuck when ceph is in a recovery state?
[6:22] <pinguini> where is the priority setting?
[6:23] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) has joined #ceph
[6:26] <lurbs_> pinguini: When Ceph is backfilling, you mean?
[6:26] * lurbs_ is now known as lurbs
[6:26] * dgbaley27 (~matt@c-98-245-167-2.hsd1.co.comcast.net) has joined #ceph
[6:27] * haomaiwa_ (~haomaiwan@112.193.130.61) has joined #ceph
[6:27] <lurbs> You can set the 'priority' of backfills by changing various config options, specifically 'osd max backfills' and the like: https://ceph.com/docs/master/rados/configuration/osd-config-ref/#backfilling
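
A sketch of how those settings can be lowered on a running cluster, assuming the stock defaults are still in place; the values are illustrative:

    # inject lower limits into all running OSDs (immediate, but not persistent)
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'
    # persist the same settings in ceph.conf under [osd]
    #   osd max backfills = 1
    #   osd recovery max active = 1
    #   osd recovery op priority = 1
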
[6:28] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[6:28] * sarob (~sarob@2601:9:1d00:c7f:d5e9:37c:ba9a:5704) has joined #ceph
[6:31] * xinyi (~xinyi@vpn-nat.peking.corp.yahoo.com) Quit (Remote host closed the connection)
[6:31] * xinyi (~xinyi@2406:2000:ef96:e:d04:132f:a9d7:514f) has joined #ceph
[6:34] * haomaiwang (~haomaiwan@li721-169.members.linode.com) Quit (Ping timeout: 480 seconds)
[6:36] * sarob (~sarob@2601:9:1d00:c7f:d5e9:37c:ba9a:5704) Quit (Ping timeout: 480 seconds)
[6:40] * xinyi (~xinyi@2406:2000:ef96:e:d04:132f:a9d7:514f) Quit (Ping timeout: 480 seconds)
[6:40] <sherry> If I have a cache pool and a main storage pool, and I would like to set the layout of a directory in CephFS, do I need to map the directory to the main storage pool?
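
For what it's worth, a sketch of how a directory layout can be pointed at a specific data pool, assuming a pool named mainpool that already exists and a CephFS mount at /mnt/cephfs (both placeholders); exact commands can vary between Ceph releases:

    # register the pool as a CephFS data pool
    ceph mds add_data_pool mainpool
    # point the directory's layout at it; new files created below it inherit the layout
    setfattr -n ceph.dir.layout.pool -v mainpool /mnt/cephfs/somedir
    # check the result
    getfattr -n ceph.dir.layout /mnt/cephfs/somedir
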
[6:41] * michalefty (~micha@188-195-129-145-dynip.superkabel.de) has joined #ceph
[6:54] * xinyi (~xinyi@2406:2000:ef96:e:3cb6:c0a8:5e14:bdf5) has joined #ceph
[6:56] * xinyi (~xinyi@2406:2000:ef96:e:3cb6:c0a8:5e14:bdf5) Quit (Remote host closed the connection)
[6:56] * xinyi (~xinyi@2406:2000:ef96:e:3cb6:c0a8:5e14:bdf5) has joined #ceph
[6:57] * sjm (~sjm@12.7.204.3) has joined #ceph
[6:57] * wschulze (~wschulze@12.7.204.3) has joined #ceph
[7:07] * yguang11 (~yguang11@2406:2000:ef96:e:a182:ccee:600:2fad) Quit (Remote host closed the connection)
[7:07] * zidarsk8 (~zidar@89-212-142-10.dynamic.t-2.net) Quit (Ping timeout: 480 seconds)
[7:07] * yguang11 (~yguang11@2406:2000:ef96:e:a182:ccee:600:2fad) has joined #ceph
[7:08] * nhm (~nhm@74.203.127.5) has joined #ceph
[7:08] * ChanServ sets mode +o nhm
[7:09] * drankis_ (~drankis__@89.111.13.198) has joined #ceph
[7:12] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[7:16] * zidarsk8 (~zidar@89-212-142-10.dynamic.t-2.net) has joined #ceph
[7:18] * brytown (~brytown@142-254-47-204.dsl.dynamic.sonic.net) has joined #ceph
[7:18] * brytown (~brytown@142-254-47-204.dsl.dynamic.sonic.net) Quit ()
[7:20] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:22] * joshd (~joshd@74.203.127.5) has joined #ceph
[7:22] * joshd (~joshd@74.203.127.5) Quit ()
[7:22] * lucas1 (~Thunderbi@222.240.148.130) has joined #ceph
[7:22] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[7:24] * rdas (~rdas@nat-pool-pnq-t.redhat.com) has joined #ceph
[7:26] * Pedras (~Adium@50.185.218.255) Quit (Quit: Leaving.)
[7:27] * xinyi_ (~xinyi@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[7:29] * sarob (~sarob@2601:9:1d00:c7f:10c9:2d6f:ae5d:26ff) has joined #ceph
[7:31] * xinyi (~xinyi@2406:2000:ef96:e:3cb6:c0a8:5e14:bdf5) Quit (Ping timeout: 480 seconds)
[7:36] * wschulze (~wschulze@12.7.204.3) Quit (Quit: Leaving.)
[7:37] * sarob (~sarob@2601:9:1d00:c7f:10c9:2d6f:ae5d:26ff) Quit (Ping timeout: 480 seconds)
[7:39] * vbellur (~vijay@nat-pool-blr-t.redhat.com) Quit (Ping timeout: 480 seconds)
[7:39] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) Quit (Quit: Leaving.)
[7:39] <dec> we still see a pretty heavy performance impact on libvirt machines using RBD disks when backfilling is occurring
[7:39] <dec> fwiw
[7:40] <dec> even with osd_max_backfills=1 or thereabouts
[7:41] * reed (~reed@75-101-54-131.dsl.static.sonic.net) Quit (Ping timeout: 480 seconds)
[7:43] * nhm (~nhm@74.203.127.5) Quit (Ping timeout: 480 seconds)
[7:49] * lalatenduM (~lalatendu@nat-pool-blr-t.redhat.com) has joined #ceph
[7:52] * michalefty (~micha@188-195-129-145-dynip.superkabel.de) Quit (Quit: Leaving.)
[7:55] * vbellur (~vijay@209.132.188.8) has joined #ceph
[7:59] * saurabh (~saurabh@nat-pool-blr-t.redhat.com) has joined #ceph
[8:13] * rturk-away (~rturk@ds2390.dreamservers.com) Quit (Read error: Operation timed out)
[8:13] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[8:13] * jcsp1 (~Adium@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[8:16] * rturk-away (~rturk@ds2390.dreamservers.com) has joined #ceph
[8:20] * ikrstic (~ikrstic@178-222-94-242.dynamic.isp.telekom.rs) has joined #ceph
[8:21] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[8:24] * jcsp (~Adium@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[8:26] * sjm (~sjm@12.7.204.3) Quit (Read error: Operation timed out)
[8:28] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) has joined #ceph
[8:29] * sleinen1 (~Adium@2001:620:0:26:1932:5e3f:488:398b) has joined #ceph
[8:29] * topro (~prousa@host-62-245-142-50.customer.m-online.net) has joined #ceph
[8:29] * sarob (~sarob@2601:9:1d00:c7f:e9a8:df2f:a88e:1c83) has joined #ceph
[8:31] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) has joined #ceph
[8:34] * osier (~osier@123.116.48.122) has joined #ceph
[8:36] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[8:37] * sarob (~sarob@2601:9:1d00:c7f:e9a8:df2f:a88e:1c83) Quit (Ping timeout: 480 seconds)
[8:39] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) Quit (Remote host closed the connection)
[8:41] * Sysadmin88 (~IceChat77@94.4.22.173) Quit (Quit: Why is the alphabet in that order? Is it because of that song?)
[8:47] * sleinen1 (~Adium@2001:620:0:26:1932:5e3f:488:398b) Quit (Quit: Leaving.)
[8:52] * DV_ (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[9:00] * steki (~steki@91.195.39.5) has joined #ceph
[9:00] * Hell_Fire_ (~HellFire@123-243-155-184.static.tpgi.com.au) has joined #ceph
[9:00] * Hell_Fire (~HellFire@123-243-155-184.static.tpgi.com.au) Quit (Read error: Network is unreachable)
[9:01] * pressureman (~pressurem@62.217.45.26) has joined #ceph
[9:01] * masta (~masta@190.7.205.254) has joined #ceph
[9:02] * vbellur (~vijay@209.132.188.8) Quit (Ping timeout: 480 seconds)
[9:10] * rendar (~I@host37-182-dynamic.37-79-r.retail.telecomitalia.it) has joined #ceph
[9:11] * masta (~masta@190.7.205.254) Quit (Ping timeout: 480 seconds)
[9:12] * vbellur (~vijay@nat-pool-blr-t.redhat.com) has joined #ceph
[9:13] * steki (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[9:14] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[9:14] * steki (~steki@91.195.39.5) has joined #ceph
[9:15] * steki (~steki@91.195.39.5) Quit ()
[9:15] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Read error: No route to host)
[9:16] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[9:20] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) has joined #ceph
[9:21] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Read error: Operation timed out)
[9:21] * sleinen1 (~Adium@2001:620:0:26:51f7:9810:f834:7348) has joined #ceph
[9:21] * hybrid512 (~walid@195.200.167.70) has joined #ceph
[9:21] * zidarsk8 (~zidar@89-212-142-10.dynamic.t-2.net) Quit (Ping timeout: 480 seconds)
[9:22] * ScOut3R (~ScOut3R@catv-80-99-64-8.catv.broadband.hu) has joined #ceph
[9:23] * TMM (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[9:28] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[9:31] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[9:32] * lucas1 (~Thunderbi@222.240.148.130) Quit (Ping timeout: 480 seconds)
[9:33] * sleinen1 (~Adium@2001:620:0:26:51f7:9810:f834:7348) Quit (Quit: Leaving.)
[9:38] * AfC (~andrew@nat-gw2.syd4.anchor.net.au) Quit (Quit: Leaving.)
[9:41] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) has joined #ceph
[9:44] * zidarsk8 (~zidar@89-212-142-10.dynamic.t-2.net) has joined #ceph
[9:44] * zidarsk8 (~zidar@89-212-142-10.dynamic.t-2.net) has left #ceph
[9:48] * sleinen (~Adium@user-28-9.vpn.switch.ch) has joined #ceph
[9:53] * ade (~abradshaw@dslb-088-074-027-132.pools.arcor-ip.net) has joined #ceph
[9:55] * rotbeard (~redbeard@2a02:908:df11:9480:76f0:6dff:fe3b:994d) has joined #ceph
[9:56] * thb (~me@2a02:2028:78:c0a0:c5b2:b14d:688d:e11f) has joined #ceph
[10:00] * sleinen (~Adium@user-28-9.vpn.switch.ch) Quit (Read error: Connection reset by peer)
[10:01] * leseb (~leseb@185.21.174.206) Quit (Killed (NickServ (Too many failed password attempts.)))
[10:01] * sleinen (~Adium@user-28-10.vpn.switch.ch) has joined #ceph
[10:03] * aldavud (~aldavud@213.55.176.177) has joined #ceph
[10:03] * leseb (~leseb@185.21.174.206) has joined #ceph
[10:14] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[10:16] * sleinen (~Adium@user-28-10.vpn.switch.ch) Quit (Ping timeout: 480 seconds)
[10:16] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[10:17] * thomnico (~thomnico@2a01:e35:8b41:120:80fb:f58e:1330:1aae) has joined #ceph
[10:18] * koleosfuscus (~koleosfus@ws11-189.unine.ch) has joined #ceph
[10:20] * zack_dolby (~textual@pdf8519e7.tokynt01.ap.so-net.ne.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[10:22] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Read error: Operation timed out)
[10:26] <ssejourne> hi. do you know a good guide for troubleshooting iowait on an rbd? I only get 22 tps (mainly write ops) on an rbd with high await.
[10:26] <ssejourne> I heard about the op tracker but I can't find this tool/debug info
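
The op tracker information mentioned here is exposed through each OSD's admin socket, so a hedged sketch of where to look (osd.3 and the rbd device name are placeholders):

    # on an OSD node: operations currently in flight and recent slow ones
    ceph daemon osd.3 dump_ops_in_flight
    ceph daemon osd.3 dump_historic_ops
    # on the client: per-device latency for the mapped rbd
    iostat -x 5 /dev/rbd0
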
[10:30] * boichev (~boichev@213.169.56.130) has joined #ceph
[10:31] * ksingh (~Adium@2001:708:10:10:7942:5278:5efb:8381) has joined #ceph
[10:32] * haomaiwa_ (~haomaiwan@112.193.130.61) Quit (Ping timeout: 480 seconds)
[10:34] <ksingh> guys, need help: ceph-radosgw is not generating any logs. I have added the entries below to ceph.conf, but it is still not generating any logs.
[10:34] <ksingh> [client.radosgw.gateway]
[10:34] <ksingh> host = storage0111-ib
[10:34] <ksingh> keyring = /etc/ceph/ceph.client.radosgw.keyring
[10:34] <ksingh> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
[10:34] <ksingh> log file = /var/log/ceph/client.radosgw.gateway.log
[10:34] <ksingh> rgw print continue = false
[10:34] <ksingh> rgw ops log rados = true
[10:34] <ksingh> rgw enable ops log = true
[10:35] <ksingh> can you suggest what i should do to generate logs? i have also tried restarting the ceph services, but no luck
[10:35] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[10:42] * xinyi_ (~xinyi@vpn-nat.peking.corp.yahoo.com) Quit (Remote host closed the connection)
[10:43] * xinyi (~xinyi@2406:2000:ef96:e:3cb6:c0a8:5e14:bdf5) has joined #ceph
[10:45] * LeaChim (~LeaChim@host86-174-77-240.range86-174.btcentralplus.com) has joined #ceph
[10:45] * xinyi_ (~xinyi@2406:2000:ef96:3:453:b2e9:2cf0:75da) has joined #ceph
[10:46] * PureNZ (~paul@122-62-45-132.jetstream.xtra.co.nz) Quit (Ping timeout: 480 seconds)
[10:48] * hijacker (~hijacker@bgva.sonic.taxback.ess.ie) Quit (Quit: Leaving)
[10:48] * b0e (~aledermue@juniper1.netways.de) has joined #ceph
[10:51] * xinyi (~xinyi@2406:2000:ef96:e:3cb6:c0a8:5e14:bdf5) Quit (Ping timeout: 480 seconds)
[10:53] * hijacker (~hijacker@213.91.163.5) has joined #ceph
[10:58] * steki (~steki@91.195.39.5) has joined #ceph
[11:01] * allsystemsarego (~allsystem@86.121.2.97) has joined #ceph
[11:02] * hijacker (~hijacker@213.91.163.5) Quit (Quit: Leaving)
[11:05] * aldavud (~aldavud@213.55.176.177) Quit (Ping timeout: 480 seconds)
[11:08] * zidarsk8 (~zidar@prevod.fri1.uni-lj.si) has joined #ceph
[11:09] * zidarsk8 (~zidar@prevod.fri1.uni-lj.si) has left #ceph
[11:12] * hijacker (~hijacker@213.91.163.5) has joined #ceph
[11:12] * BManojlovic (~steki@cable-94-189-165-169.dynamic.sbb.rs) Quit (Remote host closed the connection)
[11:13] * zack_dolby (~textual@219.117.239.161.static.zoot.jp) has joined #ceph
[11:15] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) Quit (Remote host closed the connection)
[11:17] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) has joined #ceph
[11:17] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[11:23] * xinyi_ (~xinyi@2406:2000:ef96:3:453:b2e9:2cf0:75da) Quit (Ping timeout: 480 seconds)
[11:25] <tziOm> is ceph-disk supposed to work, or is it deprecated?
[11:25] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[11:26] * zack_dolby (~textual@219.117.239.161.static.zoot.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[11:27] * zack_dolby (~textual@219.117.239.161.static.zoot.jp) has joined #ceph
[11:29] <tziOm> because it for sure does not work very well here
[11:48] * cookednoodles (~eoin@eoin.clanslots.com) has joined #ceph
[11:51] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) has joined #ceph
[11:53] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[11:56] * koleosfuscus (~koleosfus@ws11-189.unine.ch) Quit (Quit: koleosfuscus)
[11:58] * jnq (~jon@0001b7cc.user.oftc.net) Quit (Quit: WeeChat 0.3.7)
[12:03] * zack_dolby (~textual@219.117.239.161.static.zoot.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[12:06] * koleosfuscus (~koleosfus@ws11-189.unine.ch) has joined #ceph
[12:07] * koleosfuscus (~koleosfus@ws11-189.unine.ch) Quit ()
[12:09] * zack_dolby (~textual@219.117.239.161.static.zoot.jp) has joined #ceph
[12:18] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[12:19] * Shmouel (~Sam@fny94-12-83-157-27-95.fbx.proxad.net) has joined #ceph
[12:23] * ade (~abradshaw@dslb-088-074-027-132.pools.arcor-ip.net) Quit (Quit: Too sexy for his shirt)
[12:23] <ssejourne> tziOm: from my point of view, ceph-disk is used in ceph-deploy and it's not supposed to be deprecated
[12:26] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[12:29] * KevinPerks (~Adium@cpe-174-098-096-200.triad.res.rr.com) has joined #ceph
[12:32] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[12:38] <tziOm> ssejourne, ok, then I am disappointed by the shape it seems to be in
[12:38] <tziOm> ceph-disk activate
[12:38] <tziOm> got monmap epoch 1
[12:38] <tziOm> 2014-06-18 12:33:18.071392 7fa4a3a0e780 -1 journal read_header error decoding journal header
[12:38] <tziOm> 2014-06-18 12:33:18.125691 7fa4a3a0e780 -1 filestore(/var/lib/ceph/tmp/mnt.3dapWJ) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[12:38] <tziOm> ...ok..
[12:39] <tziOm> 2014-06-18 12:33:18.163760 7fa4a3a0e780 -1 created object store /var/lib/ceph/tmp/mnt.3dapWJ journal /var/lib/ceph/tmp/mnt.3dapWJ/journal for osd.0 fsid fd594f3b-d6ca-4267-bb45-138eec9c8629
[12:39] <tziOm> 2014-06-18 12:33:18.163837 7fa4a3a0e780 -1 auth: error reading file: /var/lib/ceph/tmp/mnt.3dapWJ/keyring: can't open /var/lib/ceph/tmp/mnt.3dapWJ/keyring: (2) No such file or directory
[12:39] <tziOm> ..why?
[12:39] <tziOm> 2014-06-18 12:33:18.164144 7fa4a3a0e780 -1 created new key in keyring /var/lib/ceph/tmp/mnt.3dapWJ/keyring
[12:39] <tziOm> Error EACCES: access denied
[12:39] <tziOm> Error EACCES: access denied
[12:39] <tziOm> hmm..
[12:41] <ssejourne> of course you use ceph-disk with root privileges?
[12:41] <tziOm> yes
[12:41] <tziOm> client.bootstrap-osd
[12:41] <tziOm> key: AQBDV6FTgGo+LBAAE267tqtWq0XBL4XoCJE3fg==
[12:41] <tziOm> caps: [mon] allow profile osd
[12:41] <tziOm> caps: [osd] allow *
[12:42] <tziOm> ..perhaps those permissions are wrong?
[12:42] <ssejourne> ceph-disk list ?
[12:42] <tziOm> It seems hardly documented..
[12:42] <tziOm> ceph-disk list lists nothing
[12:42] <ssejourne> that's bad
[12:42] <tziOm> ..even tho the ceph-disk-prepare --zap-disk --fs-type xfs /dev/sdc
[12:42] <tziOm> finishes seemingly without error
[12:43] <tziOm> ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
[12:43] <ssejourne> client.bootstrap-osd
[12:43] <ssejourne> caps: [mon] allow profile bootstrap-osd
[12:44] <tziOm> ok.. perhaps that is wrong.
[12:44] <tziOm> how do I remove a auth ?
[12:46] <tziOm> ssejourne, but still osd allow * ?
[12:46] <ssejourne> I have nothing for osd for bootstrap-osd
[12:47] * KevinPerks (~Adium@cpe-174-098-096-200.triad.res.rr.com) has left #ceph
[12:47] <tziOm> ok, seems the auth was the key ... perhaps should be documented ;)
[12:47] * ade (~abradshaw@dslb-088-074-027-132.pools.arcor-ip.net) has joined #ceph
[12:47] <ssejourne> try : sudo ceph --id admin auth get-or-create client.bootstrap-osd mon 'allow' 'profile osd'
[12:47] <ssejourne> ok ;)
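
For the record, a sketch of how the bootstrap-osd key is usually recreated with only the bootstrap-osd profile, assuming the default keyring path used by ceph-disk; adjust names as needed:

    # drop the old entry and recreate it
    ceph auth del client.bootstrap-osd
    ceph auth get-or-create client.bootstrap-osd mon 'allow profile bootstrap-osd' -o /var/lib/ceph/bootstrap-osd/ceph.keyring
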
[12:48] <tziOm> also, ceph-disk does not know how to make a directory ;) df: `/srv/ceph/osd.1/.': No such file or directory
[12:48] <tziOm> df: no file systems processed
[12:49] <tziOm> also, ceph-disk does not respect settings in ceph.conf (osd data) ... mounted my disk in /var/lib/ceph/osd... even tho ceph.conf says: osd data = /srv/ceph/osd.$id
[12:50] <tziOm> ..but it complained about /srv/ceph ... but then created directory in /var/lib/ceph...
[12:50] * nhm (~nhm@74.203.127.5) has joined #ceph
[12:50] * ChanServ sets mode +o nhm
[12:52] <tziOm> ceph disk is a pile of crap, thats for sure!
[12:52] * fdmanana (~fdmanana@81.193.61.209) has joined #ceph
[12:53] * allsystemsarego_ (~allsystem@79.115.62.26) has joined #ceph
[12:53] <tziOm> ..one would believe --zap-disk actually zapped the disk, but: 7f1e54ac3780 -1 journal check: ondisk fsid a1301354-c72a-4d33-b323-56e99dca497b doesn't match expected a7091af3-c8a7-4dab-a306-3a4df6be8e7c, invalid (someone else's?) journal
[12:53] * allsystemsarego_ (~allsystem@79.115.62.26) Quit ()
[12:53] * zack_dolby (~textual@219.117.239.161.static.zoot.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[12:55] * racingferret (~racingfer@81.144.225.214) has joined #ceph
[12:55] <djh-work> My pgs are in the state "creating", but they are in this state for quite a while now. ceph health shows them as inactive and unclean; pg map shows no osd associated with pgs: ".. -> up [] acting []". How am I able to recover from this state?
[12:56] <djh-work> If I try to remove all the (default) pools, ceph issues an "pool busy" error.
[12:56] <racingferret> hi guys, I'm a bit of a ceph n00b, so bear with me...
[12:56] <djh-work> (I even get an "is in use by CephFS" error, but I'm not using CephFS at all)
[12:59] <classicsnail> do you have an mds running?
[12:59] * tdasilva_ (~quassel@nat-pool-bos-u.redhat.com) has joined #ceph
[12:59] * allsystemsarego (~allsystem@86.121.2.97) Quit (Ping timeout: 480 seconds)
[13:00] * tdasilva (~thiago@nat-pool-bos-u.redhat.com) has joined #ceph
[13:00] * tdasilva (~thiago@nat-pool-bos-u.redhat.com) Quit ()
[13:00] * allsystemsarego (~allsystem@79.115.62.26) has joined #ceph
[13:01] <tziOm> 0 librados: osd.0 authentication error (1) Operation not permitted
[13:01] <tziOm> Error connecting to cluster: PermissionError
[13:01] <tziOm> failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.0 --keyring=/etc/ceph/keyring.osd.0 osd crush create-or-move -- 0 0.91 host=cdc01n01 root=default'
[13:01] <tziOm> ceph auth get osd.0
[13:01] <tziOm> exported keyring for osd.0
[13:01] <tziOm> [osd.0]
[13:01] <tziOm> key = AQAQcaFTaLW+LRAAZ4XAUS/OdCzuVw1UyoLsaQ==
[13:01] <tziOm> caps mon = "allow profile osd"
[13:01] <tziOm> caps osd = "allow *"
[13:02] * i_m (~ivan.miro@gbibp9ph1--blueice2n1.emea.ibm.com) has joined #ceph
[13:02] <tziOm> ..and this is the key created by ceph-disk ...
[13:02] <djh-work> classicsnail: no, just three monitors, two osds
[13:03] <racingferret> I've got a ceph cluster running as the backend to a Fuel-deployed OpenStack instance - ceph version 0.67.5. I have 9 OSDs all up, but there is always one that reports it's over 90%, even though "ceph df" reports 303G available.
[13:03] * drankis_ (~drankis__@89.111.13.198) Quit (Quit: Leaving)
[13:03] * drankis_ (~drankis__@89.111.13.198) has joined #ceph
[13:04] <racingferret> Is there a way to manually move some of the pgid's on that node to another less utilised node in the cluster?
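
There is no supported way to hand-place individual PGs, but here is a hedged sketch of the usual rebalancing knobs for an over-full OSD (the osd id 7 and the values are placeholders):

    # lower the OSD's reweight so CRUSH moves some of its PGs elsewhere
    ceph osd reweight 7 0.85
    # or let ceph pick OSDs that are above 110% of average utilisation
    ceph osd reweight-by-utilization 110
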
[13:05] <classicsnail> djh-work: what does ceph -s say?
[13:06] * sm1ly_ (~sm1ly@ppp109-252-169-241.pppoe.spdop.ru) has joined #ceph
[13:07] * sm1ly (~sm1ly@broadband-188-255-25-37.nationalcablenetworks.ru) Quit (Read error: Connection reset by peer)
[13:07] * sm1ly_ (~sm1ly@ppp109-252-169-241.pppoe.spdop.ru) Quit ()
[13:09] <djh-work> classicsnail: 192 pgs stuck inactive; 192 pgs stuck unclean, and 192 creating
[13:09] <classicsnail> okay, the output of ceph mds dump
[13:12] <djh-work> classicsnail: ceph -s: http://paste.kde.org/p8tok0zhq/64yiqq and ceph mds dump: http://paste.kde.org/pjzybrdle/8h1woi
[13:13] * osier (~osier@123.116.48.122) Quit (Read error: Operation timed out)
[13:15] <pressureman> has anyone noticed a sudden increase in inconsistent pgs since upgrading to firefly (0.80.1)?
[13:15] <classicsnail> djh-work: okay, wasn't what I thought it was... no big deal, you could try newfsing the ceph pools, then removing them
[13:15] <classicsnail> djh-work: I'd also bring in a third osd if you can, or just set size to 2 for all your pools
[13:15] <pressureman> one of my clusters with 24 OSDs is getting about 2-3 inconsistent pgs per day, and the affected OSDs seems to be pretty random
[13:16] <Serbitar> RAM problems or controller problems?
[13:17] <pressureman> on two separate nodes, that never had any problems before firefly?
[13:17] <djh-work> classicsnail: I already set the default size to 2 in ceph.conf; it's just a testcluster so that's fine for now. What do you mean by newfsing the ceph pools?
[13:17] * rotbart (~redbeard@2a02:908:df11:9480:6267:20ff:feb7:c20) has joined #ceph
[13:18] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[13:18] <pressureman> i'm also seeing a ton of "xfsfilestorebackend(/var/lib/ceph/osd/ceph-10) set_extsize: FSSETXATTR: (22) Invalid argument", but i believe that has been fixed upstream, and backported to 0.80.next
[13:18] <classicsnail> ceph mds newfs 1 0 --yes-i-really-mean-it
[13:18] <classicsnail> will destroy your cephfs
[13:18] <classicsnail> but if you're not using it, it deletes everything associated with it
[13:19] <classicsnail> and I've been able to reliably remove the cephfs pools after that
[13:19] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[13:19] <classicsnail> 1 and 0, assuming those are your metadata and data pool ids
[13:19] <djh-work> classicsnail: so that's probably the issue right there? https://www.mail-archive.com/ceph-users@lists.ceph.com/msg10247.html
[13:20] <djh-work> classicsnail: but how is it even possible to come to such a state?
[13:20] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[13:20] <classicsnail> my answer will be "magic"
[13:21] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Read error: Operation timed out)
[13:21] <classicsnail> sounds like the issue
[13:22] * shang (~ShangWu@175.41.48.77) Quit (Ping timeout: 480 seconds)
[13:22] <djh-work> classicsnail: okay, I issued your newfs command succesfully, but nothing changed: 192 pgs stuck inactive; 192 pgs stuck unclean, and 192 creating
[13:23] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[13:23] <classicsnail> oh, that is probably another issue
[13:24] <classicsnail> when I've changed size with only 2 suitable places for pgs, I've either had to increase the pg_num and pgp_num count to fix that error, or to restart one half of the osds
[13:24] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[13:24] <classicsnail> that's with firefly
[13:24] <classicsnail> I've resolved it both ways, as recently as this afternoon
[13:26] <djh-work> classicsnail: restarting osds does not help either :)
[13:26] <djh-work> maybe I should just tear down this test cluster and start from scratch..
[13:28] <classicsnail> ceph osd pool get rbd size and ceph osd pool get rbd min_size ?
[13:29] <djh-work> classicsnail: 3 and 2
[13:32] <classicsnail> there's your problem
[13:32] <classicsnail> ceph osd pool set rbd size 2
[13:32] <classicsnail> you're trying to create three replicas on two suitable targets
[13:32] * number80 (~80@218.54.128.77.rev.sfr.net) Quit (Quit: 35)
[13:32] <classicsnail> of course, you could instead add a third osd
[13:32] <classicsnail> on a third host
[13:36] * osier (~osier@123.116.48.122) has joined #ceph
[13:37] * saurabh (~saurabh@nat-pool-blr-t.redhat.com) Quit (Quit: Leaving)
[13:41] * ChrisNBlum (~Adium@dhcp-ip-230.dorf.rwth-aachen.de) has joined #ceph
[13:42] * yguang11 (~yguang11@2406:2000:ef96:e:a182:ccee:600:2fad) Quit (Remote host closed the connection)
[13:43] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[13:44] <djh-work> classicsnail: so, is my "osd pool default size = 2" setting in ceph.conf not enough to override the default of three replicas?
[13:47] <classicsnail> I don't think the default settings override the initial pools, no, as evidenced by the fact that, when queried, your default pools report different numbers
[13:47] <djh-work> Argh! That's quite a tripping hazard for newcomers :(
[13:50] <classicsnail> yup, just tested it all right then on a brand new cluster I created just now
[13:50] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) Quit (Read error: Connection reset by peer)
[13:50] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[13:50] <classicsnail> either set your size via the osd pool set command, or add a third osd on a third host (if using the default crush map)
[13:52] <djh-work> classicsnail: after setting all pools' size to 2, does it take a while? Because nothing has changed yet.
[13:52] * leseb (~leseb@185.21.174.206) Quit (Killed (NickServ (Too many failed password attempts.)))
[13:52] * leseb (~leseb@185.21.174.206) has joined #ceph
[13:53] <classicsnail> I've either had to restart one set of osds, or increase the pg_num accordingly
[13:56] * vbellur (~vijay@nat-pool-blr-t.redhat.com) Quit (Read error: Operation timed out)
[13:56] * SpComb (terom@zapotek.paivola.fi) has joined #ceph
[13:57] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[13:57] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[13:58] <SpComb> I have a test cluster with two OSDs, and a pool with `replicated size 2`... but creating an rbd object only places it on one osd, and doesn't seem to replicate it to the second osd?
[13:58] <SpComb> osdmap e20 pool 'test' (3) object 'test0' -> pg 3.6e65e3ba (3.2) -> up ([0], p0) acting ([0], p0)
[13:58] <SpComb> not sure what the "up ([0], p0)" syntax there means
[13:58] <SpComb> but I can tell by the disk util. on the second osd that it's not there
[13:59] <SpComb> what have I screwed up? :)
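
A few things worth checking in a situation like this, assuming a small test cluster where both OSDs might sit on the same host, which is a common reason only one replica gets placed (the default CRUSH rule spreads replicas across hosts):

    # confirm how the OSDs are arranged in the CRUSH hierarchy
    ceph osd tree
    # list stuck PGs and their acting sets
    ceph pg dump_stuck unclean
    # for single-host test clusters only: allow replicas on the same host
    # (set in ceph.conf before the cluster is created)
    #   osd crush chooseleaf type = 0
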
[13:59] <djh-work> classicsnail: restarting one osd does not help; trying to increase the pg_num gives me the error "currently creating pgs, wait".
[14:02] <classicsnail> you have to wait for the existing pg_num increase to take effect, if you overlap them on the same pool it will complain
[14:04] * lalatenduM (~lalatendu@nat-pool-blr-t.redhat.com) Quit (Quit: Leaving)
[14:05] * osier (~osier@123.116.48.122) Quit (Ping timeout: 480 seconds)
[14:06] * zhaochao (~zhaochao@124.205.245.26) has left #ceph
[14:08] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[14:09] <classicsnail> went over those changes on the test cluster I just built
[14:09] <classicsnail> 2014-06-18 12:09:10.588615 mon.0 [INF] osdmap e342: 72 osds: 72 up, 72 in
[14:09] <classicsnail> 2014-06-18 12:09:10.592450 mon.0 [INF] pgmap v875: 320 pgs: 320 active+clean; 0 bytes data, 3600 MB used, 392 TB / 392 TB avail
[14:09] <classicsnail> 2 hosts, 72 osds, 36 a side
[14:10] <classicsnail> I increased the number of pgs and pgps after setting size 2
[14:10] <classicsnail> you have to do that to all your pools, by the way
[14:10] <classicsnail> the size
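
A compact sketch of the fix being described, assuming the firefly-era default pools (data, metadata, rbd) and only two hosts to place replicas on; the pg_num values are illustrative:

    # drop the replica count to match the number of failure domains
    for pool in data metadata rbd; do ceph osd pool set $pool size 2; done
    # if pgs stay stuck creating, bump pg_num first, then pgp_num once that settles
    ceph osd pool set rbd pg_num 128
    ceph osd pool set rbd pgp_num 128
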
[14:12] * vbellur (~vijay@209.132.188.8) has joined #ceph
[14:15] <ksingh> guys, need help: ceph-radosgw is not generating any logs. I have added the entries below to ceph.conf, but it is still not generating any logs.
[14:15] <ksingh> [client.radosgw.gateway]
[14:15] <ksingh> host = storage0111-ib
[14:15] <ksingh> keyring = /etc/ceph/ceph.client.radosgw.keyring
[14:15] <ksingh> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
[14:15] <ksingh> log file = /var/log/ceph/client.radosgw.gateway.log
[14:15] <ksingh> rgw print continue = false
[14:15] <ksingh> rgw ops log rados = true
[14:15] <ksingh> rgw enable ops log = true
[14:15] <ksingh> can you suggest what i should do to generate logs? i have also tried restarting the ceph services, but no luck
[14:16] <SpComb> seems like all my PGs are active+degraded on osd.0 and no PGs are on osd.1, I guess it's waiting for backfill?
[14:16] <SpComb> what makes a two-osd cluster with both osds up degraded?
[14:18] * DV_ (~veillard@libvirt.org) has joined #ceph
[14:19] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) Quit (Quit: Leaving.)
[14:21] * tdasilva (~quassel@nat-pool-bos-t.redhat.com) has joined #ceph
[14:22] <absynth> ksingh: shot in the dark - permission problem?
[14:25] * ufven (~ufven@130-229-28-186-dhcp.cmm.ki.se) has joined #ceph
[14:26] * tdasilva_ (~quassel@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[14:28] * osier (~osier@123.116.48.122) has joined #ceph
[14:29] <Hell_Fire_> SpComb: another one that can be missed is the OSDs not able to talk to each other for whatever reason, I'm getting that myself atm on my demo cluster
[14:29] * Hell_Fire_ is now known as Hell_Fire
[14:30] <ksingh> absynth: Your shot in the dark was almost accurate. Permissions were correct, it was an ownership problem :-) Thanks
[14:30] <SpComb> Hell_Fire: I've checked with netstat that they seem to have a TCP connection open
[14:30] <ksingh> absynth: the log file should be owned by apache
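
A minimal sketch of that fix, assuming the gateway runs under Apache/mod_fastcgi as the apache user (the user and service names differ per distro):

    sudo touch /var/log/ceph/client.radosgw.gateway.log
    sudo chown apache:apache /var/log/ceph/client.radosgw.gateway.log
    # restart the gateway so it reopens the log file
    sudo service ceph-radosgw restart
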
[14:31] <SpComb> but yeah, I had e.g. both OSDs fill up their disk when I was first testing, so they were in some kind of osd-full state for a while
[14:31] <SpComb> then I resized them both, and wrote some more... still not replicating :/
[14:37] * jnq (~jnq@128.199.169.81) has joined #ceph
[14:37] * jnq (~jnq@128.199.169.81) Quit ()
[14:37] * jnq (~jnq@128.199.169.81) has joined #ceph
[14:43] * BranchPredictor (branch@predictor.org.pl) Quit (Ping timeout: 480 seconds)
[14:43] * sleinen (~Adium@2001:620:0:26:28b6:21e:be74:49cc) has joined #ceph
[14:44] * nhm (~nhm@74.203.127.5) Quit (Ping timeout: 480 seconds)
[14:46] * jtangwk (~Adium@gateway.tchpc.tcd.ie) Quit (Quit: Leaving.)
[14:46] * jtangwk (~Adium@gateway.tchpc.tcd.ie) has joined #ceph
[14:47] * BranchPredictor (branch@predictor.org.pl) has joined #ceph
[14:51] * sroy (~sroy@2607:fad8:4:6:3e97:eff:feb5:1e2b) has joined #ceph
[14:54] * JayJ_ (~jayj@157.130.21.226) has joined #ceph
[14:56] * osier (~osier@123.116.48.122) Quit (Ping timeout: 480 seconds)
[14:58] * koleosfuscus (~koleosfus@ws11-189.unine.ch) has joined #ceph
[14:58] * osier (~osier@123.116.48.122) has joined #ceph
[15:04] * rotbart (~redbeard@2a02:908:df11:9480:6267:20ff:feb7:c20) Quit (Quit: Leaving)
[15:05] * lpabon (~lpabon@66-189-8-115.dhcp.oxfr.ma.charter.com) has joined #ceph
[15:06] * japuzzo (~japuzzo@ool-4570886e.dyn.optonline.net) has joined #ceph
[15:07] * KB (~oftc-webi@cpe-74-137-252-159.swo.res.rr.com) has joined #ceph
[15:12] * ChrisNBlum (~Adium@dhcp-ip-230.dorf.rwth-aachen.de) Quit (Ping timeout: 480 seconds)
[15:13] * rdas_ (~rdas@nat-pool-pnq-t.redhat.com) has joined #ceph
[15:14] <stj> i think I found a bug with the ceph apache/fastcgi repository...
[15:14] <stj> looks like libapache2-mod-fastcgi depends on apache2 >= 2.2.4, but the ceph repos only provide apache 2.2.22
[15:14] * vbellur1 (~vijay@209.132.188.8) has joined #ceph
[15:14] <stj> so the packages can't be installed as described on the radosgw documentation
[15:15] * JayJ_ (~jayj@157.130.21.226) Quit (Read error: Operation timed out)
[15:15] <stj> this is with Ubuntu trusty
[15:17] <stj> though... 2.2.22 is technically a higher version number than 2.2.4? maybe I'm doing something wrong
[15:17] * DV_ (~veillard@libvirt.org) Quit (Ping timeout: 480 seconds)
[15:17] * rturk|afk is now known as rturk
[15:17] * rdas (~rdas@nat-pool-pnq-t.redhat.com) Quit (Read error: Operation timed out)
[15:18] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) Quit (Remote host closed the connection)
[15:18] * rdas__ (~rdas@nat-pool-pnq-t.redhat.com) has joined #ceph
[15:19] * vbellur2 (~vijay@209.132.188.8) has joined #ceph
[15:20] <stj> ah, I think I found the problem. apache2.2-bin wants to be pulled from the trusty repos, since it's a newer version, but the ceph-provided apache2.2-common explicitly requires apache2.2-bin 2.2.22-2trusty.ceph, which isn't getting installed
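
One common workaround for this kind of mixed-repo dependency clash is apt pinning, roughly along these lines; the preferences file and the origin string are assumptions and should be checked against the output of apt-cache policy apache2:

    # /etc/apt/preferences.d/ceph-apache (hypothetical example)
    Package: apache2* libapache2-mod-fastcgi*
    Pin: origin "gitbuilder.ceph.com"
    Pin-Priority: 1001

followed by an apt-get update before reinstalling the packages.
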
[15:20] * vbellur (~vijay@209.132.188.8) Quit (Ping timeout: 480 seconds)
[15:22] * diegows (~diegows@170.51.67.192) has joined #ceph
[15:24] * vbellur1 (~vijay@209.132.188.8) Quit (Ping timeout: 480 seconds)
[15:25] * rturk is now known as rturk|afk
[15:25] * rturk|afk is now known as rturk
[15:26] * rdas_ (~rdas@nat-pool-pnq-t.redhat.com) Quit (Ping timeout: 480 seconds)
[15:27] * hasues (~hazuez@kwfw01.scrippsnetworksinteractive.com) has joined #ceph
[15:30] * rturk is now known as rturk|afk
[15:34] * vbellur2 (~vijay@209.132.188.8) Quit (Quit: Leaving.)
[15:43] * thomnico_ (~thomnico@2a01:e35:8b41:120:2891:86a8:c9a9:6075) has joined #ceph
[15:46] * boichev (~boichev@213.169.56.130) Quit (Quit: Nettalk6 - www.ntalk.de)
[15:46] * thomnico (~thomnico@2a01:e35:8b41:120:80fb:f58e:1330:1aae) Quit (Ping timeout: 480 seconds)
[15:49] * rotbeard (~redbeard@2a02:908:df11:9480:76f0:6dff:fe3b:994d) Quit (Quit: Verlassend)
[15:50] * lalatenduM (~lalatendu@209.132.188.8) has joined #ceph
[15:51] * Andreas-IPO (~andreas@2a01:2b0:2000:11::cafe) Quit (Read error: Connection reset by peer)
[15:51] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) Quit (Remote host closed the connection)
[15:52] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) has joined #ceph
[15:52] * Andreas-IPO (~andreas@2a01:2b0:2000:11::cafe) has joined #ceph
[15:58] * rturk|afk is now known as rturk
[16:00] * joef (~Adium@2620:79:0:131:d507:471d:e09c:7631) Quit (Read error: No route to host)
[16:00] * ksingh (~Adium@2001:708:10:10:7942:5278:5efb:8381) Quit (Quit: Leaving.)
[16:00] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[16:00] * joef (~Adium@138-72-131-163.pixar.com) has joined #ceph
[16:01] * kwmiebach__ (sid16855@id-16855.charlton.irccloud.com) Quit (Read error: No route to host)
[16:01] * kwmiebach__ (~sid16855@id-16855.charlton.irccloud.com) has joined #ceph
[16:02] * dmsimard_away is now known as dmsimard
[16:02] * mondkalbantrieb_ (~quassel@sama32.de) Quit (Quit: No Ping reply in 180 seconds.)
[16:04] * mondkalbantrieb (~quassel@sama32.de) has joined #ceph
[16:05] * zack_dolby (~textual@pdf8519e7.tokynt01.ap.so-net.ne.jp) has joined #ceph
[16:08] * rpowell (~rpowell@128.135.219.215) has joined #ceph
[16:09] * bkero (~bkero@216.151.13.66) Quit (Ping timeout: 480 seconds)
[16:09] * pressureman (~pressurem@62.217.45.26) Quit (Quit: Ex-Chat)
[16:10] * bkero (~bkero@216.151.13.66) has joined #ceph
[16:13] * rpowell (~rpowell@128.135.219.215) Quit (Quit: Leaving.)
[16:13] * rpowell (~rpowell@128.135.219.215) has joined #ceph
[16:13] * gregmark (~Adium@68.87.42.115) has joined #ceph
[16:15] * ivan`_ (~ivan`@192.241.198.49) has joined #ceph
[16:17] * ivan` (~ivan`@000130ca.user.oftc.net) Quit (Ping timeout: 480 seconds)
[16:17] * ivan`_ is now known as ivan`
[16:22] * Pedras (~Adium@50.185.218.255) has joined #ceph
[16:23] * osier (~osier@123.116.48.122) Quit (Ping timeout: 480 seconds)
[16:23] * haomaiwang (~haomaiwan@124.161.75.196) has joined #ceph
[16:23] * kapil (~ksharma@2620:113:80c0:5::2222) has joined #ceph
[16:29] * osier (~osier@123.116.48.122) has joined #ceph
[16:32] * angdraug (~angdraug@12.164.168.117) has joined #ceph
[16:35] * diegows (~diegows@170.51.67.192) Quit (Ping timeout: 480 seconds)
[16:38] * b0e (~aledermue@juniper1.netways.de) Quit (Quit: Leaving.)
[16:38] * osier (~osier@123.116.48.122) Quit (Ping timeout: 480 seconds)
[16:39] * Pedras (~Adium@50.185.218.255) Quit (Quit: Leaving.)
[16:39] * nhm (~nhm@nat-pool-rdu-u.redhat.com) has joined #ceph
[16:39] * markbby (~Adium@168.94.245.3) has joined #ceph
[16:39] * ChanServ sets mode +o nhm
[16:39] * sleinen1 (~Adium@2001:620:0:26:e41d:df9e:a45e:f88a) has joined #ceph
[16:40] * osier (~osier@123.116.48.122) has joined #ceph
[16:40] * sleinen (~Adium@2001:620:0:26:28b6:21e:be74:49cc) Quit (Ping timeout: 480 seconds)
[16:43] * ajazdzewski (~quassel@lpz-66.sprd.net) has joined #ceph
[16:43] * markbby (~Adium@168.94.245.3) Quit (Remote host closed the connection)
[16:45] * markbby (~Adium@168.94.245.3) has joined #ceph
[16:46] * xinyi (~xinyi@2406:2000:ef96:3:a17a:3548:c031:2de1) has joined #ceph
[16:47] * primechu_ (~primechuc@69.170.148.179) Quit (Remote host closed the connection)
[16:48] * bcundiff_ (~oftc-webi@h216-165-139-220.mdsnwi.dedicated.static.tds.net) Quit (Quit: Page closed)
[16:50] * nljmo (~nljmo@173-11-110-227-SFBA.hfc.comcastbusiness.net) Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[16:52] <SpComb> ok, adding a third osd, and then dropping/nuking/preparing/readding each of the two osds in turn fixed the cluster and let it rebalance properly
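The drop/nuke/prepare/re-add cycle described above maps roughly onto the firefly-era commands below; the OSD id, hostname and device are hypothetical, and the cluster should be allowed to settle between steps:

    ceph osd out 0                          # stop placing data on the OSD
    stop ceph-osd id=0                      # Ubuntu upstart; else: service ceph stop osd.0
    ceph osd crush remove osd.0             # drop it from the CRUSH map
    ceph auth del osd.0                     # and from the auth database
    ceph osd rm 0                           # and from the OSD map
    ceph-deploy disk zap node1:/dev/sdb     # wipe the disk (hypothetical host:device)
    ceph-deploy osd prepare node1:/dev/sdb  # re-create and re-add the OSD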
[16:53] * yguang11 (~yguang11@vpn-nat.corp.tw1.yahoo.com) has joined #ceph
[16:54] * xinyi (~xinyi@2406:2000:ef96:3:a17a:3548:c031:2de1) Quit (Ping timeout: 480 seconds)
[16:54] * primechuck (~primechuc@69.170.148.179) has joined #ceph
[16:54] * osier (~osier@123.116.48.122) Quit (Ping timeout: 480 seconds)
[16:55] * osier (~osier@123.116.48.122) has joined #ceph
[16:55] * i_m (~ivan.miro@gbibp9ph1--blueice2n1.emea.ibm.com) Quit (Quit: Leaving.)
[16:56] * steki (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[16:57] * rdas__ (~rdas@nat-pool-pnq-t.redhat.com) Quit (Quit: Leaving)
[16:59] * vbellur (~vijay@42.104.63.49) has joined #ceph
[17:02] * rdas (~rdas@nat-pool-pnq-t.redhat.com) has joined #ceph
[17:02] * markbby (~Adium@168.94.245.3) Quit (Remote host closed the connection)
[17:03] * kevinc (~kevinc__@client65-78.sdsc.edu) has joined #ceph
[17:04] * osier (~osier@123.116.48.122) Quit (Ping timeout: 480 seconds)
[17:04] * tdasilva (~quassel@nat-pool-bos-t.redhat.com) Quit (Read error: Operation timed out)
[17:05] * xarses (~andreww@c-24-23-183-44.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[17:05] * neurodrone (~neurodron@static-108-30-171-7.nycmny.fios.verizon.net) has joined #ceph
[17:07] * dgbaley27 (~matt@c-98-245-167-2.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[17:07] * sleinen1 (~Adium@2001:620:0:26:e41d:df9e:a45e:f88a) Quit (Quit: Leaving.)
[17:08] * markbby (~Adium@168.94.245.1) has joined #ceph
[17:09] * neurodrone (~neurodron@static-108-30-171-7.nycmny.fios.verizon.net) Quit ()
[17:11] * ChrisNBlum (~Adium@dhcp-ip-230.dorf.rwth-aachen.de) has joined #ceph
[17:11] * ChrisNBlum (~Adium@dhcp-ip-230.dorf.rwth-aachen.de) has left #ceph
[17:11] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) has joined #ceph
[17:12] * rturk is now known as rturk|afk
[17:12] * primechuck (~primechuc@69.170.148.179) Quit (Read error: Connection reset by peer)
[17:12] * rturk|afk is now known as rturk
[17:12] * primechuck (~primechuc@69.170.148.179) has joined #ceph
[17:13] * tdasilva (~quassel@nat-pool-bos-u.redhat.com) has joined #ceph
[17:15] * rturk is now known as rturk|afk
[17:16] * brytown (~brytown@2620:79:0:2420::b) has joined #ceph
[17:17] * rturk|afk is now known as rturk
[17:17] <SpComb> I'm seeing qemu-system-x86 just spinning at 100% CPU and not going anywhere when booting up a VM with an rbd-backed disk :/
[17:17] * markbby (~Adium@168.94.245.1) Quit (Quit: Leaving.)
[17:17] * markbby (~Adium@168.94.245.1) has joined #ceph
[17:18] * yguang11 (~yguang11@vpn-nat.corp.tw1.yahoo.com) Quit ()
[17:18] * yguang11 (~yguang11@vpn-nat.corp.tw1.yahoo.com) has joined #ceph
[17:19] * adamcrume (~quassel@2601:9:6680:47:8c06:c13c:1870:a001) has joined #ceph
[17:21] * rmoe (~quassel@173-228-89-134.dsl.static.sonic.net) Quit (Ping timeout: 480 seconds)
[17:21] * ScOut3R (~ScOut3R@catv-80-99-64-8.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[17:25] <absynth> anyone from dreamhost around?
[17:26] <SpComb> attaching to the spinning qemu process with gdb, I get this stacktrace from the spinning thread: https://gist.github.com/anonymous/5072fce635011e2f4290
[17:26] <SpComb> multiple times
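For anyone wanting to capture the same kind of trace, attaching gdb to the busy process and dumping every thread is usually enough; the process name below is a guess, so adjust it to whatever is actually spinning:

    gdb -p "$(pidof qemu-system-x86_64)"   # process name may differ on your build
    (gdb) info threads                     # spot the thread burning CPU
    (gdb) thread apply all bt              # backtraces for every thread
    (gdb) detach
    (gdb) quit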
[17:30] * narb (~Jeff@38.99.52.10) has joined #ceph
[17:30] * rturk is now known as rturk|afk
[17:33] * kevinc (~kevinc__@client65-78.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[17:34] * rturk|afk is now known as rturk
[17:35] * Pedras (~Adium@50.185.218.255) has joined #ceph
[17:36] * rmoe (~quassel@12.164.168.117) has joined #ceph
[17:36] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[17:37] <SpComb> https://gist.github.com/anonymous/d10c8c479b0baa572a7c with debug symbols; some qemu-kvm thread is consistently spinning hard in that stack path
[17:37] * brytown (~brytown@2620:79:0:2420::b) Quit (Quit: Leaving.)
[17:38] * kevinc (~kevinc__@client65-78.sdsc.edu) has joined #ceph
[17:39] <SpComb> https://github.com/ceph/ceph/blob/master/src/osdc/Striper.cc#L144 gdb print shows that su=1
[17:39] * rturk is now known as rturk|afk
[17:41] <SpComb> so uuh with len=4M or so that's four million extents?
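That back-of-the-envelope number checks out: the striper cuts a request into roughly one extent per stripe unit, so a 1-byte stripe unit turns a 4 MiB request into millions of extents:

    echo $((4 * 1024 * 1024 / 1))        # 4194304 extents with a 1-byte stripe unit
    echo $((4 * 1024 * 1024 / 4194304))  # 1 extent with the default 4 MiB stripe unit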
[17:41] * nljmo (~nljmo@64.125.103.162) has joined #ceph
[17:41] * rturk|afk is now known as rturk
[17:42] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) has joined #ceph
[17:43] * sleinen1 (~Adium@2001:620:0:26:ad15:cfd9:b287:6277) has joined #ceph
[17:46] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Ping timeout: 480 seconds)
[17:46] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) has joined #ceph
[17:47] <SpComb> yeah, disabling the rbd cache for the guest (<driver name='qemu' type='raw' cache='none' />) lets it actually continue
[17:47] * vbellur1 (~vijay@42.104.63.49) has joined #ceph
[17:47] <SpComb> seems like there's something in my Ubuntu 14.04 ceph/libvirt setup that's causing the rbd writeback cache to go absolutely nuts trying to stripe the block device
[17:47] <absynth> good luck with performance then ;)
[17:47] <saturnine> Is there a way to make radosgw accept large uploads with no file extension?
[17:48] <saturnine> With the s3 api
[17:49] * ajazdzewski (~quassel@lpz-66.sprd.net) Quit (Remote host closed the connection)
[17:50] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[17:50] * Pedras (~Adium@50.185.218.255) Quit (Quit: Leaving.)
[17:50] * rturk is now known as rturk|afk
[17:51] * rturk|afk is now known as rturk
[17:51] * vbellur (~vijay@42.104.63.49) Quit (Ping timeout: 480 seconds)
[17:51] <SpComb> lolwut, `rbd --pool libvirt info catcp2-test2.disk` shows "stripe unit: 1 bytes" "stripe count: 4194304"
[17:51] <SpComb> that can't be a good idea
[17:51] * haomaiwang (~haomaiwan@124.161.75.196) Quit (Remote host closed the connection)
[17:51] <janos> :O
[17:51] <SpComb> I created it via libvirt's vol-create-as on an rbd pool...
[17:51] <SpComb> must have terrible terrible defaults
[17:51] <janos> man that's awesome
[17:52] <janos> haha
[17:52] * haomaiwang (~haomaiwan@124.161.75.196) has joined #ceph
[17:53] <SpComb> Removing image: 1% complete...
[17:54] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[18:00] * aldavud (~aldavud@213.55.184.211) has joined #ceph
[18:00] * rturk is now known as rturk|afk
[18:01] * xarses (~andreww@12.164.168.117) has joined #ceph
[18:02] * zerick (~eocrospom@190.187.21.53) has joined #ceph
[18:02] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Quit: Ex-Chat)
[18:03] * tdasilva (~quassel@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[18:06] <SpComb> found it: http://comments.gmane.org/gmane.comp.emulators.libvirt/96702
[18:06] * madkiss (~madkiss@213162068060.public.t-mobile.at) has joined #ceph
[18:06] <SpComb> or rather https://bugzilla.redhat.com/show_bug.cgi?id=1092208
[18:06] <SpComb> damn libvirt devs fat-fingered their stripe_count/stripe_unit argument order, swapping the two around :D
[18:06] * lalatenduM (~lalatendu@209.132.188.8) Quit (Quit: Leaving)
[18:08] * haomaiwang (~haomaiwan@124.161.75.196) Quit (Remote host closed the connection)
[18:08] <SpComb> lesson learned: do not use `virsh vol-create-as` with rbd pools on libvirt <1.2.4
[18:08] <SpComb> now can I get two hours of my life back :<
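Until a fixed libvirt is available, a common workaround is to create the image with the rbd tool itself, which picks sane defaults (4 MiB objects, stripe count 1), and only use libvirt to attach it; the pool, image name and size below are examples:

    rbd create libvirt/catcp2-test2.disk --size 20480 --image-format 2
    rbd --pool libvirt info catcp2-test2.disk   # sanity-check the striping before booting a guest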
[18:13] * joef1 (~Adium@2620:79:0:8207:f9be:8808:58f1:4bf0) has joined #ceph
[18:16] * rturk|afk is now known as rturk
[18:16] * joef1 (~Adium@2620:79:0:8207:f9be:8808:58f1:4bf0) Quit ()
[18:24] <mastamind> what is the maximum number of pools that can be created?
[18:25] * rdas (~rdas@nat-pool-pnq-t.redhat.com) Quit (Quit: Leaving)
[18:25] * cephtron (~cephtron@58-65-166-154.nayatel.pk) has joined #ceph
[18:26] <cephtron> hello everyone
[18:26] * markbby (~Adium@168.94.245.1) Quit (Remote host closed the connection)
[18:27] <cephtron> Can someone please tell me what this error means: 2014-06-18 09:02:55.778427 osd.83 10.247.68.108:6814/4233 308282 : [WRN] slow request 56.274530 seconds old, received at 2014-06-18 09:01:59.503594: osd_sub_op(client.3217467.0:5021924 17.106b bf67506b/rbd_data.3117ab5c0c906d.0000000000001094/head//17 [] v 31279'1679 snapset=0=[]:[] snapc=0=[]) v7 currently commit sent
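For context, that [WRN] means an op on osd.83 has been outstanding longer than the complaint threshold (30 seconds by default); a few standard commands that help narrow down where it is stuck (the `ceph daemon` calls have to run on the node hosting the OSD):

    ceph health detail                       # which OSDs currently have blocked requests
    ceph daemon osd.83 dump_ops_in_flight    # ops in progress and the state they are waiting in
    ceph daemon osd.83 dump_historic_ops     # recently completed slow ops with per-stage timings
    iostat -x 1                              # check whether the underlying disk is saturated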
[18:28] * KaZeR (~kazer@64.201.252.132) has joined #ceph
[18:31] * markbby (~Adium@168.94.245.4) has joined #ceph
[18:32] * thomnico_ (~thomnico@2a01:e35:8b41:120:2891:86a8:c9a9:6075) Quit (Quit: Ex-Chat)
[18:32] * paveraware (~tomc@216.51.73.42) has joined #ceph
[18:34] <paveraware> I'm seeing absolutely massive cpu usage in an idle cluster with FEC enabled. just wondering if anyone knows what difference there might be between an FEC pool and a normal pool. Trying to determine how I've misconfigured this system
[18:35] <paveraware> I've got about 4.5TB of data loaded into the cluster, and all the OSD processes are just spinning, using as much CPU as they can…
[18:35] * fdmanana (~fdmanana@81.193.61.209) Quit (Quit: Leaving)
[18:35] * koleosfuscus (~koleosfus@ws11-189.unine.ch) Quit (Ping timeout: 480 seconds)
[18:35] <paveraware> 12 cores on 3 nodes pegged at 100%
[18:36] <paveraware> no writes or reads are happening on the cluster currently
[18:37] * leseb (~leseb@185.21.174.206) Quit (Killed (NickServ (Too many failed password attempts.)))
[18:39] * madkiss1 (~madkiss@213162068091.public.t-mobile.at) has joined #ceph
[18:40] * leseb (~leseb@185.21.174.206) has joined #ceph
[18:42] * madkiss (~madkiss@213162068060.public.t-mobile.at) Quit (Ping timeout: 480 seconds)
[18:42] <KaZeR> hey folks.
[18:43] <KaZeR> i was wondering why ceph does not try to auto-repair inconsistent PGs, since most of the time doing a simple pg repair fixes the issue
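For reference, the manual flow being described looks roughly like this (the PG id is hypothetical); repair is left to the operator partly because it copies what the primary considers the good replica over the others, which is not always the right call:

    ceph health detail | grep inconsistent   # list the inconsistent PGs
    ceph pg deep-scrub 2.3f                  # hypothetical PG id; re-check it first
    ceph pg repair 2.3f                      # then ask the primary to repair it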
[18:43] * haomaiwang (~haomaiwan@124.161.75.196) has joined #ceph
[18:44] * osier (~osier@111.199.101.82) has joined #ceph
[18:45] * raul (~Aksel@AC9E0567.ipt.aol.com) has joined #ceph
[18:46] * gleam (gleam@dolph.debacle.org) Quit (Ping timeout: 480 seconds)
[18:47] <raul> try this site www.SoccerTips4Sure.com very profesional, nice earnings
[18:47] * raul (~Aksel@AC9E0567.ipt.aol.com) Quit (autokilled: Do not spam other people. Mail support@oftc.net if you feel this is in error. (2014-06-18 16:47:06))
[18:47] * rturk is now known as rturk|afk
[18:47] * xinyi (~xinyi@2406:2000:ef96:3:69df:2918:8b82:2a25) has joined #ceph
[18:47] * Sysadmin88 (~IceChat77@94.4.22.173) has joined #ceph
[18:48] * rturk|afk is now known as rturk
[18:50] * gleam (gleam@dolph.debacle.org) has joined #ceph
[18:51] * sleinen1 (~Adium@2001:620:0:26:ad15:cfd9:b287:6277) Quit (Quit: Leaving.)
[18:51] * haomaiwang (~haomaiwan@124.161.75.196) Quit (Ping timeout: 480 seconds)
[18:55] * xinyi (~xinyi@2406:2000:ef96:3:69df:2918:8b82:2a25) Quit (Ping timeout: 480 seconds)
[18:56] * rturk is now known as rturk|afk
[18:57] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[18:58] * joef1 (~Adium@2620:79:0:8207:45cc:a983:5d77:2a92) has joined #ceph
[18:59] * fdmanana (~fdmanana@81.193.61.209) has joined #ceph
[19:00] * rturk|afk is now known as rturk
[19:05] * aldavud (~aldavud@213.55.184.211) Quit (Ping timeout: 480 seconds)
[19:08] * hybrid512 (~walid@195.200.167.70) Quit (Quit: Leaving.)
[19:09] * madkiss1 (~madkiss@213162068091.public.t-mobile.at) Quit (Ping timeout: 480 seconds)
[19:10] * vbellur1 (~vijay@42.104.63.49) Quit (Ping timeout: 480 seconds)
[19:15] * madkiss (~madkiss@212095007051.public.telering.at) has joined #ceph
[19:23] * Sysadmin88 (~IceChat77@94.4.22.173) Quit (Quit: For Sale: Parachute. Only used once, never opened, small stain.)
[19:24] * tdasilva (~quassel@nat-pool-bos-t.redhat.com) has joined #ceph
[19:27] * vbellur (~vijay@42.104.61.178) has joined #ceph
[19:28] * lyncos (~chatzilla@208.71.184.41) has joined #ceph
[19:28] <lyncos> Hi guys .. I need a little help with the feature set mismatch message ...
[19:29] <lyncos> On the ceph servers I get kernel 3.11 (ubuntu lts) and on the client I get 3.14 (wheezy)... and it seems to complain about: missing 4000000000 which seems to be erasure coding
[19:29] <lyncos> yes I'm using erasure coding
[19:29] <lyncos> I thought if my client were a higher version than the server everything should be fine... any idea?
[19:32] <lyncos> maybe upgrading kernel of my servers would help ?
[19:32] <SpComb> I accidentally bootstrapped the mon keyring with `... --cap mds 'allow *'` and that caused ceph-create-keys to hang with "Error EINVAL: key for client.admin exists but cap mds does not match"
[19:32] <SpComb> gotcha :)
[19:34] <lyncos> by the way I get that exact error message: feature set mismatch, my 384a042aca < server's 784a042aca, missing 4000000000
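That mismatch generally means the cluster is advertising features (erasure-coded pools, newer CRUSH tunables) that the client kernel does not implement, so the options are a newer client kernel or not exposing those features to kernel clients. Two firefly-era commands that help pin down which feature is responsible (treat this as a sketch, not a diagnosis):

    ceph osd crush show-tunables   # which tunables profile the cluster requires
    ceph osd dump | grep erasure   # whether any erasure-coded pools exist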
[19:34] * rotbeard (~redbeard@2a02:908:df11:9480:76f0:6dff:fe3b:994d) has joined #ceph
[19:35] * lalatenduM (~lalatendu@122.167.7.156) has joined #ceph
[19:37] * ScOut3R (~ScOut3R@catv-89-133-44-70.catv.broadband.hu) has joined #ceph
[19:39] * joef1 (~Adium@2620:79:0:8207:45cc:a983:5d77:2a92) Quit (Quit: Leaving.)
[19:44] * haomaiwang (~haomaiwan@124.161.75.196) has joined #ceph
[19:45] * ScOut3R (~ScOut3R@catv-89-133-44-70.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[19:48] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) has joined #ceph
[19:52] * haomaiwang (~haomaiwan@124.161.75.196) Quit (Read error: Operation timed out)
[19:52] <ponyofdeath> is it possible to move an rbd image from one pool to another?
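As far as I know, yes: an image can be copied between pools with the rbd tool (snapshots are not carried over by cp); pool and image names below are hypothetical:

    rbd cp rbd/myimage otherpool/myimage
    # or, via an intermediate file if you want to change image options on the way in
    rbd export rbd/myimage /tmp/myimage.img
    rbd import /tmp/myimage.img otherpool/myimage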
[19:53] <lyncos> I get the same problem with Kernel 3.14 on both Wheezy and Ubuntu LTS
[19:56] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[19:57] * madkiss (~madkiss@212095007051.public.telering.at) Quit (Quit: Leaving.)
[19:57] * vbellur (~vijay@42.104.61.178) Quit (Quit: Leaving.)
[19:58] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) Quit (Ping timeout: 480 seconds)
[19:59] * jharley (~jharley@69-196-185-180.dsl.teksavvy.com) has joined #ceph
[20:00] * rturk is now known as rturk|afk
[20:01] * drankis_ (~drankis__@89.111.13.198) Quit (Ping timeout: 480 seconds)
[20:02] * rturk|afk is now known as rturk
[20:04] * kevinc (~kevinc__@client65-78.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[20:05] * primechuck (~primechuc@69.170.148.179) Quit (Read error: Connection reset by peer)
[20:05] * primechuck (~primechuc@69.170.148.179) has joined #ceph
[20:06] * rturk is now known as rturk|afk
[20:06] * sigsegv (~sigsegv@188.26.160.142) has joined #ceph
[20:07] * brad_mssw (~brad@shop.monetra.com) has joined #ceph
[20:08] <brad_mssw> Is there any reason not to use a mounted LVM logical volume formatted with XFS as a ceph OSD? I noticed proxmox's wrappers seem to enforce the use of raw disks for use as OSDs in ceph. I was wondering if this is based on any real requirements, or if it is arbitrary?
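Nobody answered directly, but an OSD only needs a filesystem with working xattrs mounted at its data directory, so an XFS-formatted logical volume should work; the raw-disk requirement appears to come from the ceph-disk/ceph-deploy tooling rather than from ceph itself. A sketch of the manual procedure from the docs of this era, with hypothetical host, device and weight:

    OSD_ID=$(ceph osd create)                      # allocate an id
    mkdir -p /var/lib/ceph/osd/ceph-$OSD_ID
    mount /dev/vg0/ceph-osd /var/lib/ceph/osd/ceph-$OSD_ID
    ceph-osd -i $OSD_ID --mkfs --mkkey             # initialise the data dir and keyring
    ceph auth add osd.$OSD_ID osd 'allow *' mon 'allow profile osd' \
        -i /var/lib/ceph/osd/ceph-$OSD_ID/keyring
    ceph osd crush add osd.$OSD_ID 1.0 host=node1  # host bucket must already exist in the map
    start ceph-osd id=$OSD_ID                      # Ubuntu upstart; else: service ceph start osd.$OSD_ID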
[20:09] <blSnoopy> can pgs be recreated forcibly? (fresh cluster, something probably went wrong)
[20:11] * rturk|afk is now known as rturk
[20:12] * drankis_ (~drankis__@37.148.173.239) has joined #ceph
[20:12] <ponyofdeath> what is the best way to drain an osd? ceph osd out 13 does not seem to work; my osd is still at 98% full
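For completeness: `ceph osd out` only stops new data being mapped to the OSD and kicks off backfill, which takes time; if it stays full, pushing its CRUSH weight to zero and watching recovery is the usual next step:

    ceph osd out 13
    ceph -w                            # watch the backfill progress
    ceph osd crush reweight osd.13 0   # if it still holds data, force its weight to zero
    ceph osd tree                      # confirm the weight change took effect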
[20:14] * sigsegv (~sigsegv@188.26.160.142) Quit (Quit: sigsegv)
[20:16] * rturk is now known as rturk|afk
[20:17] * rturk|afk is now known as rturk
[20:18] * sigsegv (~sigsegv@188.26.160.142) has joined #ceph
[20:18] * lalatenduM (~lalatendu@122.167.7.156) Quit (Quit: Leaving)
[20:19] * lalatenduM (~lalatendu@122.167.7.156) has joined #ceph
[20:27] * sjm (~sjm@nat-pool-rdu-u.redhat.com) has joined #ceph
[20:34] <gleam> wow, redhat bought enovance too now
[20:37] * rturk is now known as rturk|afk
[20:37] <jharley> gleam: for a lot of money, even
[20:37] <gleam> quite a lot, yeah
[20:48] * rendar (~I@host37-182-dynamic.37-79-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[20:50] * rendar (~I@host37-182-dynamic.37-79-r.retail.telecomitalia.it) has joined #ceph
[20:51] * dignus (~jkooijman@t-x.dignus.nl) Quit (Read error: Connection reset by peer)
[20:54] <grepory> Was support for variable block size in rados bench removed?
[20:55] <grepory> (i'm using ceph firefly)
[20:55] * hijacker (~hijacker@213.91.163.5) Quit (Ping timeout: 480 seconds)
[20:56] <grepory> It isn't listed under rados bench options or shown in the help for rados bench.
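As far as I can tell the option was not removed, it just is not shown in the bench help text in some releases; `-b` still sets the object size for the write phase. A sketch with a hypothetical pool name (the benchmark objects are left behind because of --no-cleanup):

    rados -p testpool bench 60 write -b 4096 -t 16 --no-cleanup   # 4 KiB writes, 16 threads
    rados -p testpool bench 60 seq -t 16                          # read back what was written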
[20:57] <grepory> Also, is there some client configuration that caps iops per client?
[20:58] * imriz (~imriz@5.29.200.177) has joined #ceph
[21:00] * dignus (~jkooijman@t-x.dignus.nl) has joined #ceph
[21:01] * lupu (~lupu@86.107.101.214) has left #ceph
[21:03] * rmoe (~quassel@12.164.168.117) Quit (Ping timeout: 480 seconds)
[21:04] * lalatenduM (~lalatendu@122.167.7.156) Quit (Ping timeout: 480 seconds)
[21:06] * lpabon (~lpabon@66-189-8-115.dhcp.oxfr.ma.charter.com) Quit (Remote host closed the connection)
[21:06] <grepory> no... there doesn't appear to be.
[21:07] * brytown (~brytown@2620:79:0:8204:1de0:1dcf:dd76:ebf5) has joined #ceph
[21:07] * brytown (~brytown@2620:79:0:8204:1de0:1dcf:dd76:ebf5) has left #ceph
[21:10] * KaZeR (~kazer@64.201.252.132) Quit (Remote host closed the connection)
[21:11] * rmoe (~quassel@12.164.168.117) has joined #ceph
[21:16] * rturk|afk is now known as rturk
[21:16] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) has joined #ceph
[21:17] * hijacker (~hijacker@213.91.163.5) has joined #ceph
[21:19] * rturk is now known as rturk|afk
[21:20] * jharley (~jharley@69-196-185-180.dsl.teksavvy.com) Quit (Quit: jharley)
[21:23] * rturk|afk is now known as rturk
[21:25] * rturk is now known as rturk|afk
[21:25] * The_Bishop (~bishop@2001:470:50b6:0:d95d:e9f8:c7bc:b977) Quit (Ping timeout: 480 seconds)
[21:30] * rturk|afk is now known as rturk
[21:31] * osier (~osier@111.199.101.82) Quit (Ping timeout: 480 seconds)
[21:35] * kevinc (~kevinc__@client65-78.sdsc.edu) has joined #ceph
[21:36] * The_Bishop (~bishop@cable-86-56-95-128.cust.telecolumbus.net) has joined #ceph
[21:37] * jcsp_ (~john@nat-pool-rdu-u.redhat.com) has joined #ceph
[21:38] <grepory> Hmm... I have a client system with an RBD volume mounted. I changed the crush tunables on my cluster and now the client system can't interact with the mount point. Any process that tries to interact with the volume gets stuck in an uninterruptible state.
[21:40] <paveraware> yeah, I've seen that a lot with the kernel RBD client, any change to crushmap causes it to hard hang
[21:40] <paveraware> anybody have any sizeable data in an erasure coded pool?
[21:40] <paveraware> I'm seeing absolutely abysmal cpu usage that scales with the amount of data in the pool
[21:41] <paveraware> more data == more cpu usage
[21:42] <paveraware> also, the cluster is currently idle, but all osd nodes are consuming 100% cpu on all cores, all in the osd processes
[21:46] <grepory> paveraware: thanks. i thought that was the case, so glad to know someone else has seen it too
[21:46] <paveraware> grepory: I've seen certain kernel versions/ceph versions behave better
[21:47] <paveraware> grepory: but I don't have a "use version X with version Y" mapping that always works, we basically stopped using the kernel driver and only use rbd-fuse or librbd
[21:47] <grepory> paveraware: hmmm... librbd via qemu?
[21:48] <paveraware> yes
[21:48] <paveraware> and other places too, we're writing data directly with librbd
[21:49] <grepory> gotcha
[21:49] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) has joined #ceph
[21:52] * diegows (~diegows@r190-64-89-10.su-static.adinet.com.uy) has joined #ceph
[21:52] * abhas (~oftc-webi@122.171.79.188) has joined #ceph
[21:52] <abhas> Hi everyone... I'm trying to explore and learn about ceph...
[21:53] <abhas> so my most basic question is -- if it is possible to run all the ceph processes on the same physical server?
[21:53] <imriz> abhas, yes
[21:53] <imriz> it is possible
[21:53] <abhas> and can I do so in production too? or is this recommended only for development?
[21:54] <abhas> i'm looking at ceph as a way to implement high capacity email archives... and most deployments would start with a single machine
[21:56] <iggy> abhas: don't mount cephfs on computers running mds
[21:56] <iggy> or krbd on anything
[21:56] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) Quit (Read error: Operation timed out)
[21:57] <iggy> aside from that, it should be possible (although not necessarily recommended)
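For a single-box test setup (not the multi-server layout recommended below), the usual ceph.conf tweaks are to lower the replica count and let CRUSH place replicas on the same host; a minimal sketch:

    [global]
    osd pool default size = 2
    osd pool default min size = 1
    # 0 = choose leaves of type osd, so replicas may share a host
    osd crush chooseleaf type = 0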
[21:57] <imriz> abhas, a single server will also lose a lot of Ceph's goodies
[21:57] <iggy> you really have to know what you're doing to tune the cluster well enough to pull it off
[21:58] * kevinc (~kevinc__@client65-78.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[21:58] <iggy> i.e. during rebuild, OSDs can use a ton of memory and kick off the oom killer if you aren't careful
[21:58] * tdasilva (~quassel@nat-pool-bos-t.redhat.com) Quit (Remote host closed the connection)
[21:58] <abhas> sounds complicated...
[21:59] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) Quit (Quit: Leaving.)
[21:59] <imriz> abhas, I'd start with at least 2 osds and server, and at least 3 mons running, on 3 different servers
[21:59] <imriz> * 2 OSDs servers
[21:59] <abhas> we started off by using mongodb - gridfs to store binary data and that didn't scale too well ... and so I wanted to explore ceph
[22:00] <imriz> if you're looking for any kind of performance, you will need more servers
[22:00] <abhas> binary data is basically mail files... (as in a maildir)
[22:00] <abhas> initially data integrity, sharding are the priorities
[22:01] <abhas> since the problem we are trying to solve is archival
[22:01] <imriz> like I said, start with at least two different servers for data
[22:01] <iggy> you can't expect much (availability wise) from a single node ceph setup
[22:02] <abhas> in that case, would it make sense to use VMs within a powerful single physical machine?
[22:02] <imriz> but in general, I recommend at least N+1 OSD servers (N being the replica count)
[22:02] <imriz> abhas, no
[22:02] <imriz> if you want protection, you need at least different physical servers
[22:02] <imriz> if not in different racks, etc
[22:03] <abhas> sure... makes sense, of course.
[22:03] <imriz> and benchmark benchmark benchmark
[22:03] <abhas> Thanks imriz and iggy!
[22:04] <imriz> My main issues with Ceph so far are performance, not data integrity or availability
[22:04] <imriz> the last two are superb
[22:04] <imriz> the former, not so much
[22:05] * abhas (~oftc-webi@122.171.79.188) Quit ()
[22:06] <iggy> are you using SSD journals?
[22:06] <imriz> iggy, no
[22:06] <imriz> and my problems are with reads
[22:06] <imriz> the whole concept of SSD journals is problematic
[22:07] <imriz> if you take the ratio of 1 SSD per 5 OSDs, you risk losing 5 OSDs in a row
[22:07] <imriz> if the SSD fails
[22:07] <imriz> if you lower the ratio, a ceph cluster becomes really expensive
[22:07] <janos> depends on the size of the deployment
[22:07] <iggy> correct which is why you want multiple hosts
[22:08] <imriz> still
[22:08] <iggy> how exactly are you using ceph? cephfs? rbd? librados? etc?
[22:08] <imriz> radosgw
[22:08] * iggy backs away slowly
[22:09] <imriz> the competitors are already giving a good fight for the price
[22:09] <imriz> iggy, that is actually one of the areas Inktank claims to be the most complete
[22:09] <imriz> we started by wanting to use the FS
[22:10] <imriz> but Inktank doesn't support it yet
[22:10] * adisney1 (~adisney1@2602:306:cddb:49d0:6888:937b:e35c:699f) has joined #ceph
[22:10] * schmee (~quassel@phobos.isoho.st) Quit (Ping timeout: 480 seconds)
[22:10] <imriz> and anyway, my problem seems to affect rbd too
[22:11] <imriz> I see very high latency when fetching xattrs
[22:11] <imriz> the read itself seems to be fast (enough)
[22:11] <imriz> I am getting 70ms+ latency for ~5.5 KB objects :(
[22:12] <imriz> anyway, the whole idea of SSD journals is nice, but it works well only if you have a huge budget and a very large cluster where 5 OSDs aren't so much
[22:13] <imriz> don't forget the impact of backfills on the cluster
[22:13] * wattsmarcus5 (~mdw@aa2.linuxbox.com) Quit (Read error: No route to host)
[22:15] * baylight (~tbayly@69-195-66-4.unifiedlayer.com) Quit (Ping timeout: 480 seconds)
[22:15] * rturk is now known as rturk|afk
[22:19] * schmee (~quassel@phobos.isoho.st) has joined #ceph
[22:20] * ade (~abradshaw@dslb-088-074-027-132.pools.arcor-ip.net) Quit (Ping timeout: 480 seconds)
[22:20] * lupu (~lupu@86.107.101.214) has joined #ceph
[22:22] <cookednoodles> what do you mean by expensive ?
[22:22] <cookednoodles> ceph is pretty cheap compared to any competing 'san/nas' solution
[22:23] <imriz> cookednoodles, you'd be surprised how hard EMC are pushing Isilon
[22:25] <imriz> if you include support, use smaller disks and more servers, and throw in good SLC SSDs at a lower OSD-to-SSD ratio
[22:25] <imriz> a decent ceph cluster becomes not that cheap
[22:25] * nhm (~nhm@nat-pool-rdu-u.redhat.com) Quit (Ping timeout: 480 seconds)
[22:26] <imriz> it's a budget game
[22:26] * joef (~Adium@138-72-131-163.pixar.com) has left #ceph
[22:26] <imriz> Ceph is superior in features and in design
[22:26] <imriz> but if you can afford..
[22:26] <imriz> *can't
[22:27] * sroy (~sroy@2607:fad8:4:6:3e97:eff:feb5:1e2b) Quit (Quit: Quitte)
[22:28] * jcsp_ (~john@nat-pool-rdu-u.redhat.com) Quit (Read error: Operation timed out)
[22:31] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) Quit (Remote host closed the connection)
[22:31] * markbby (~Adium@168.94.245.4) Quit (Remote host closed the connection)
[22:32] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) has joined #ceph
[22:36] * sjm (~sjm@nat-pool-rdu-u.redhat.com) has left #ceph
[22:40] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[22:40] * baylight (~tbayly@69-195-66-4.unifiedlayer.com) has joined #ceph
[22:44] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[22:46] <grepory> is there a way to set the default client name in ceph.conf?
[22:47] * diegows (~diegows@r190-64-89-10.su-static.adinet.com.uy) Quit (Read error: Operation timed out)
[22:49] <grepory> i don't even see documentation for the client section of ceph.conf
[22:50] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) has joined #ceph
[22:54] * angdraug (~angdraug@12.164.168.117) Quit (Quit: Leaving)
[22:56] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[22:56] <JayJ> Are there puppet modules to install Ceph?
[22:56] <JayJ> Or should I ask, which puppet module do you folks suggest to install Ceph?
[22:58] * xinyi (~xinyi@corp-nat.peking.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[22:59] <cookednoodles> JayJ, I think there are a few but none official; it's on the todo list
[22:59] <cookednoodles> the best support iirc is via ansible, you could try porting
[23:00] * angdraug (~angdraug@12.164.168.117) has joined #ceph
[23:00] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) has joined #ceph
[23:01] <grepory> JayJ: there is one that isn't really maintained for puppet. it took considerable effort to get it going on puppet 2.7 (appears it was intended for either a later version of puppet or an installation with working puppetdb)
[23:02] * sleinen1 (~Adium@2001:620:0:26:e478:5973:9326:6d8d) has joined #ceph
[23:03] * kevinc (~kevinc__@client65-78.sdsc.edu) has joined #ceph
[23:03] <grepory> JayJ: https://github.com/enovance/puppet-ceph is what i originally used, however: https://github.com/stackforge/puppet-ceph is also there.
[23:03] <grepory> we've since switched to deploying with a combination of chef and ceph-deploy.
[23:05] <JayJ> grepory: were there any issues with enovance/puppet-ceph that you recall?
[23:06] <imriz> JayJ, we are working with enovance/puppet-ceph
[23:06] <imriz> very successfully
[23:06] <grepory> JayJ: It just didn't exactly work for us, but that may have been at the very least partially our own fault. It was more than a year ago at this point, though, so I don't recall.
[23:07] <JayJ> imriz: Thanks!
[23:07] <JayJ> grepory: thanks. I'll take a look at that.
[23:08] <imriz> JayJ, we have been using it for a couple of months now
[23:08] <imriz> tested it from 0% to 100% deployment
[23:08] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[23:08] <grepory> so, according to config_opts.h there is no way to set the client name in ceph.conf
[23:09] <grepory> at least, according to my read of it.
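That matches my reading as well: the client name is given per invocation (or via the environment) rather than as a ceph.conf default; the id 'backup' below is hypothetical:

    rbd --id backup ls mypool                                              # runs as client.backup
    ceph -n client.backup -k /etc/ceph/ceph.client.backup.keyring status
    export CEPH_ARGS="--id backup"                                         # applies to later ceph/rbd/rados invocations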
[23:09] * primechuck (~primechuc@69.170.148.179) Quit (Remote host closed the connection)
[23:09] * rendar (~I@host37-182-dynamic.37-79-r.retail.telecomitalia.it) Quit ()
[23:09] <JayJ> imriz: Is there anything you want me to be careful before I go down that path?
[23:09] * jksM (~jks@3e6b5724.rev.stofanet.dk) Quit (Quit: jksM)
[23:09] <imriz> yes, we disabled the notify on changes to ceph.conf
[23:10] <imriz> we did not want it to restart OSDs automatically on each change
[23:12] <imriz> but other than that, it works perfectly as-is
[23:12] <JayJ> imriz: I am going back and forth between stackforge/puppet-ceph and eNovance/ceph, did not see enovance/puppet-ceph. Thanks a bunch.
[23:12] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) Quit (Read error: Operation timed out)
[23:12] <imriz> you are most welcome
[23:13] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) has joined #ceph
[23:13] * allsystemsarego (~allsystem@79.115.62.26) Quit (Quit: Leaving)
[23:14] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) Quit (Remote host closed the connection)
[23:15] <imriz> JayJ, the only thing I currently see missing is the automatic discovery of disks
[23:15] <imriz> Currently, you have to specify the disks statically in the manifest
[23:15] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) has joined #ceph
[23:15] <JayJ> imriz: What do you need to do to add them?
[23:15] <imriz> just add them to the manifest
[23:15] <imriz> like
[23:16] <imriz> ceph::osd::device { '/dev/sdc': }
[23:16] <imriz> using hiera with create_resources will probably be a small improvement here
[23:17] <imriz> but since our servers are identical, we didn't see the benefit of doing so
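The create_resources approach mentioned above would look roughly like this, reusing the ceph::osd::device define shown earlier; the hiera key name is hypothetical:

    # manifest
    $osd_devices = hiera_hash('ceph_osd_devices', {})
    create_resources('ceph::osd::device', $osd_devices)

    # hieradata (YAML)
    ceph_osd_devices:
      '/dev/sdc': {}
      '/dev/sdd': {}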
[23:17] <JayJ> imriz: Too new to this. I will start with this and probably bug you more soon :)
[23:17] <imriz> sure, glad to help
[23:18] <JayJ> imriz: Thanks
[23:18] <imriz> we added a define for radosgw, if you're interested
[23:18] <imriz> we haven't pushed it back yet
[23:18] * diegows (~diegows@r190-64-89-10.su-static.adinet.com.uy) has joined #ceph
[23:19] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[23:19] * drankis_ (~drankis__@37.148.173.239) Quit (Remote host closed the connection)
[23:19] <imriz> also, this module depends on exported resources, so install puppetdb if you haven't done so yet
[23:20] <adisney1> Can anyone help me with ceph-deploy? I'm following the steps on the ceph website and I do "ceph-deploy mon create-initial" and during that I get an error about admin_socket no such file or directory.
[23:20] <ponyofdeath> hi, I have a caching pool infront of one of my rbd kvm pools. I am unable to start the vm's now with ,cache=writeback: error reading header from 4f512594-ddac-4626-82b1-b0f525dfc194
[23:20] <adisney1> I assume this is related to the .asok file in the previous command but that file DOES exist on the monitor machine.
[23:21] <imriz> adisney1, just a wild guess, but it is probably looking for it in the wrong place
[23:22] <imriz> what is the manual you're working with?
[23:23] <adisney1> It has an explicit path and it is there.
[23:23] <adisney1> http://ceph.com/docs/master/start/quick-ceph-deploy/
[23:23] <imriz> you can strace it and see where it looks for it
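A concrete way to do that, run on the mon host; the admin socket is a unix socket, so connect() is the call to watch:

    strace -f -e trace=connect,open,stat \
        ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.plank1.asok mon_status 2>&1 \
        | grep -i asok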
[23:26] <adisney1> I'm unfamiliar with strace. All I know is the previous command ceph-deploy is using is sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.plank1.asok mon_status
[23:27] <adisney1> And I look in that directory and it is there
[23:28] <imriz> previous command? as in ceph-deploy install ?
[23:29] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Quit: Computer has gone to sleep.)
[23:29] <adisney1> I do "ceph-deploy mon create-initial" and it displays everything it's doing.
[23:29] <adisney1> The previous line in the log before the error is the sudo ceph
[23:30] <imriz> ahh, not the previous command in the manual
[23:30] <imriz> what if you run ceph -s ?
[23:31] <imriz> and what is the command which is failing?
[23:31] <SpComb> I kinda wish ceph let you define the osd id yourself
[23:31] <SpComb> would make puppeting it easier
[23:32] <imriz> adisney1, also, what is in your ceph.conf?
[23:33] * sleinen1 (~Adium@2001:620:0:26:e478:5973:9326:6d8d) Quit (Quit: Leaving.)
[23:34] <imriz> SpComb, what do you mean? why is it hard to puppetize it?
[23:34] * hasues (~hazuez@kwfw01.scrippsnetworksinteractive.com) Quit (Quit: Leaving.)
[23:35] <SpComb> imriz: managing the osd service needs the id
[23:35] <imriz> you don't need to specify it
[23:35] <imriz> you can discover it
[23:35] <imriz> look at the puppet module we discussed earlier
[23:35] <SpComb> yeah, but multiple puppet runs etc :/
[23:36] <SpComb> I named the monitors by $hostname
[23:36] <adisney1> Just the defaults that ceph-deploy creates. fsid, mon_initial_members, mon_host, auth_cluster_required = cephx, auth_service_required = cephx, auth_client_required = cephx, filestore_xattr_use_omap = true
[23:36] <adisney1> all under global
[23:37] <imriz> SpComb, I just let it enumerate it as it wants
[23:37] <imriz> adisney1, I need to recheck, but I don't think the default name includes the hostname
[23:38] <imriz> maybe that is the reason it is failing
[23:40] <imriz> show us the relevant part from ceph.conf
[23:40] * jtangwk (~Adium@gateway.tchpc.tcd.ie) Quit (Ping timeout: 480 seconds)
[23:40] * diegows (~diegows@r190-64-89-10.su-static.adinet.com.uy) Quit (Read error: Operation timed out)
[23:41] <imriz> do you have something like [mon.plank1] there ?
[23:42] <adisney1> No. It's just a [global] section
[23:43] <adisney1> 7 lines
[23:43] <adisney1> mon_initial_members = plank1
[23:43] <adisney1> I'm just trying to setup one monitor at the moment
[23:44] <adisney1> By the example on that page
[23:44] <imriz> try checking if you have a mon process running
[23:45] <imriz> also, try checking what is under /var/lib/ceph/mon/
[23:46] <imriz> you'll probably see a directory named mon.plank1, is that correct?
[23:46] * ikrstic (~ikrstic@178-222-94-242.dynamic.isp.telekom.rs) Quit (Quit: Konversation terminated!)
[23:46] <adisney1> yes there's a monitor running and there's a directory there
[23:47] <adisney1> called ceph-plank1
[23:47] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[23:48] <imriz> show us the command which is failing
[23:51] * xinyi (~xinyi@2406:2000:ef96:3:8fc:746a:d359:b185) has joined #ceph
[23:59] * masta (~masta@190.7.213.210) has joined #ceph
[23:59] * xinyi (~xinyi@2406:2000:ef96:3:8fc:746a:d359:b185) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.