#ceph IRC Log

IRC Log for 2015-05-06

Timestamps are in GMT/BST.

[0:00] * fred`` (fred@earthli.ng) Quit (Quit: +++ATH0)
[0:01] * rendar (~I@host143-177-dynamic.8-79-r.retail.telecomitalia.it) Quit ()
[0:02] <rkeene> I was looking for some documentation on the new RBD mandatory locking... is there any ?
[0:03] * jdillaman (~jdillaman@pool-173-66-110-250.washdc.fios.verizon.net) Quit (Quit: jdillaman)
[0:08] * oro (~oro@80-219-254-208.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[0:09] * badone (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) has joined #ceph
[0:13] * fred`` (fred@earthli.ng) has joined #ceph
[0:14] <joshd> rkeene: no, definitely need to write some... basically it's automating the previous rbd lock-[add,remove,break] functionality with lock handoff to work with live migration
[0:16] <rkeene> So it doesn't deal with the underlying problem of the lock persisting longer than the process holding the RBD open ?
[0:16] <joshd> rkeene: no, watches still have timeouts
[0:17] * MACscr (~Adium@2601:d:c800:de3:b943:79cd:baeb:1ce3) Quit (Quit: Leaving.)
[0:17] <rkeene> And no watches with rbd still, right ?
[0:19] <joshd> rbd still uses watches to tell whether a client is alive
[0:20] <rkeene> (I mean, you can set watches on RBD objects obviously, but you have to work out the backing object name and apply using "ceph ..." instead of "rbd ...")
[0:20] * Swompie` (~roaet@53IAAAFK0.tor-irc.dnsbl.oftc.net) Quit ()
[0:20] * aleksag (~visored@heaven.tor.ninja) has joined #ceph
[0:21] <joshd> watch/notify is used more now, to communicate with the lock holder
[0:22] <rkeene> The use case I have for a locking mechanism is fencing. e.g.: 10 machines are all capable of hosting a VM that will exclusively own an RBD, and if the VM isn't running on any host any host can start it up, and if that host crashes any other host can restart it there
[0:25] <joshd> yeah, that's pretty much what mandatory locking is for
[0:25] <rkeene> Currently I set a lock (called "running"), but if the hosting machine crashes I have no way to remove the lock -- watching to see if the lock was renewed recently is a PITA
[0:25] * nhm (~nhm@184-97-175-198.mpls.qwest.net) has joined #ceph
[0:25] * ChanServ sets mode +o nhm
[0:25] <rkeene> Right, which is why I'm interested in it
[0:25] * bkopilov (~bkopilov@bzq-79-179-9-83.red.bezeqint.net) has joined #ceph
[0:26] <joshd> when the new client tries to use it, the original lock holder will be blacklisted and their lock broken when their watch times out
[0:26] <joshd> important to note this is just in librbd, so kernel clients can't use it yet
[0:27] <rkeene> Right, but the old client may still be alive when the new one tries to use it
[0:28] <rkeene> The new client should fail unless the old client is dead (basic fence, no STONITH)
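A minimal sketch of the manual fencing workflow rkeene describes above, using the advisory `rbd lock` commands that already exist; the image name rbd/data01, the lock id "running", the locker id client.4127 and the blacklist address are hypothetical placeholders:

    # take the advisory lock before starting the VM
    rbd lock add rbd/data01 running
    # list current lockers; the "locker" column (e.g. client.4127) is needed to break a lock
    rbd lock list rbd/data01
    # if the holding host is known dead, break its lock and fence the old client
    rbd lock remove rbd/data01 running client.4127
    ceph osd blacklist add <addr-of-dead-client>    # optional: keep a half-dead client from writing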
[0:28] * JV (~chatzilla@204.14.239.55) has joined #ceph
[0:29] <joshd> yeah, in that case the behavior isn't exactly what you're looking for, since it's worried about the live migration case where two clients do have it open at once temporarily
[0:29] * JV_ (~chatzilla@204.14.239.105) has joined #ceph
[0:30] <joshd> we could add, as an option, dying if another client has the lock and their watch is confirmed as alive
[0:31] * fsckstix (~stevpem@1.136.96.133) Quit (Ping timeout: 480 seconds)
[0:31] <rkeene> Hmm ?
[0:32] <_robbat21irssi> any civetweb users around? looking at switching from the fastcgi to civetweb, but got a weird bug (on Hammer)
[0:32] <rkeene> You mean a watch for the client to die, so each 1 of the 9 remaining systems can attempt to obtain the lock ? That would be handy
[0:33] <rkeene> The only wrinkle there is that since the client dying is broadcast, every node will simultaneously attempt to *REMOVE* the lock, which may coincide with the winning node adding the lock... but I can deal with that
[0:33] <joshd> no, I mean handling the two living client case differently
[0:33] <joshd> the current behavior with two clients trying to write to the image is that they'll cooperatively trade the lock back and forth
[0:33] <joshd> to handle live migration
[0:34] * subscope (~subscope@92-249-244-15.pool.digikabel.hu) Quit (Quit: Textual IRC Client: www.textualapp.com)
[0:34] <rkeene> What happens if one host dies during live migration ?
[0:34] <joshd> but with plain fencing, you'd like the second client to stop if the first one is still alive
[0:35] <rkeene> Yeah, it's the same sort of issue you'd run into if one host died during live migration -- a lock with no client
[0:35] <joshd> then qemu will fail since the memory to run the guest isn't available
[0:36] <joshd> the existing behavior with a lock and no client is to break the lock and optionally (but by default) blacklist the old lock holder
[0:36] * JV (~chatzilla@204.14.239.55) Quit (Ping timeout: 480 seconds)
[0:37] <rkeene> More than that though, the VM could just never be started because (presumably) something will check for that lock (starting up the VM during live migration is probably frowned upon), and if it can start then no live migration can happen (unless the lock has a transient name, in which case just orphaned locks -- which probably cause no significant problems)
[0:38] * LeaChim (~LeaChim@host86-171-90-60.range86-171.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[0:38] <rkeene> Okay
[0:39] <rkeene> I thought the default behaviour for a lock with no client was to just return an error saying it was locked ?
[0:39] <rkeene> I can check again
[0:43] <rkeene> So if I do "rbd lock add rbd/data01 running", start QEMU, then stop QEMU, I should expect that running "rbd lock add rbd/data01 running" again should succeed ?
[0:43] * alram_ (~alram@206.169.83.146) Quit (Ping timeout: 480 seconds)
[0:45] <joshd> yeah, it only cares about a specific lock id that it's using internally
[0:47] * debian112 (~bcolbert@24.126.201.64) Quit (Quit: Leaving.)
[0:49] <rkeene> I get: rbd: lock is already held by someone else when I try to run the second "rbd lock" and there are no clients
[0:49] <rkeene> Is this a lock QEMU needs to create in coordination with its client connection via librbd, instead of me calling "rbd lock" directly ?
[0:50] * haomaiwa_ (~haomaiwan@118.244.254.16) has joined #ceph
[0:50] <joshd> librbd is creating it internally when the image has the exclusive-lock feature bit enabled
[0:50] * aleksag (~visored@5NZAACDPG.tor-irc.dnsbl.oftc.net) Quit ()
[0:50] <joshd> that error sounds like a bug
[0:51] <rkeene> The only lock that ever gets created is the one I create called "running"
[0:51] <joshd> oh no, that's the intended behavior for 'rbd lock'
[0:51] <rkeene> Right, which is what I was asking about
[0:52] <joshd> to use mandatory locking, you need to create the image with --image-features 13 (friendlier format in master)
[0:52] <rkeene> But if QEMU will tell librbd to add a lock that will disappear/invalidate when the client connection goes away that would work for me too -- but there's no documentation I've seen on that
[0:52] <joshd> librbd handles it all internally, qemu is unaware
[0:53] <rkeene> And it will then show up as a lock in "rbd lock list" ?
[0:53] <joshd> yeah
[0:53] <rkeene> Okay
[0:54] <rkeene> So I need 8 (striping) and 1 (layering) in addition to 4 ?
[0:54] <rkeene> err, 8 (object map)
[0:54] * haomaiwang (~haomaiwan@114.111.166.250) Quit (Ping timeout: 480 seconds)
[0:54] * reed (~reed@75-101-54-131.dsl.static.fusionbroadband.com) Quit (Quit: Ex-Chat)
[0:55] <joshd> 13 is just all of the ones in hammer, you only actually need exclusive locking, but the others are handy too
[0:55] <rkeene> Okay, I'll go with 13
[0:55] <rkeene> Rebuilding now for testing
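A minimal sketch of what rkeene is about to test, assuming a hypothetical 10 GB image rbd/data01 on Hammer (feature bits: 1 layering + 4 exclusive-lock + 8 object-map = 13):

    # format 2 image with layering, exclusive-lock and object-map enabled
    rbd create rbd/data01 --size 10240 --image-format 2 --image-features 13
    # the internal exclusive lock only shows up once a client actually has the image open
    rbd lock list rbd/data01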
[0:57] * ircolle1 (~ircolle@2601:1:a580:1735:ea2a:eaff:fe91:b49b) Quit (Ping timeout: 480 seconds)
[1:01] * reed (~reed@75-101-54-131.dsl.static.fusionbroadband.com) has joined #ceph
[1:04] * fsckstix (~stevpem@bh02i525f01.au.ibm.com) has joined #ceph
[1:16] * bene (~ben@nat-pool-bos-t.redhat.com) Quit (Quit: Konversation terminated!)
[1:22] * nsoffer (~nsoffer@bzq-109-64-255-30.red.bezeqint.net) Quit (Quit: Segmentation fault (core dumped))
[1:27] * jwilkins (~jwilkins@c-50-131-97-162.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[1:29] * alram (~alram@cpe-172-250-2-46.socal.res.rr.com) has joined #ceph
[1:33] * JV_ (~chatzilla@204.14.239.105) Quit (Ping timeout: 480 seconds)
[1:35] * shakamunyi (~shakamuny@wbucrp-gdm0a-as.bsc.disney.com) Quit (Ping timeout: 480 seconds)
[1:40] * dgurtner (~dgurtner@217-162-119-191.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[1:44] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[1:46] * carmstrong (sid22558@uxbridge.irccloud.com) has joined #ceph
[1:46] <carmstrong> hey folks, any pointers on what "10.132.253.121:6801/1 >> 10.132.162.16:6801/1 pipe(0x6b65000 sd=150 :6801 s=2 pgs=12 cs=3 l=0 c=0x4270680).fault with nothing to send, going to standby" in an OSD means?
[1:47] <carmstrong> this OSD is running, but is marked down shortly later
[1:47] <carmstrong> I have to restart it and then all placement groups recover almost immediately
[1:47] <carmstrong> even though the process is presumably running the entire time
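A few hedged first checks for the symptom carmstrong describes (daemon still running, but peers mark it down); osd.0 and the log path are placeholders:

    ceph health detail                                   # which OSDs are down and which PGs are affected
    ceph osd tree | grep down                            # confirm which OSD ids are currently marked down
    grep "wrongly marked me down" /var/log/ceph/ceph-osd.0.log   # heartbeat failures typically log this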
[1:48] * jclm (~jclm@192.16.26.2) Quit (Quit: Leaving.)
[1:50] * AGaW (~Hejt@tor-exit.zenger.nl) has joined #ceph
[1:54] * chutwig (~textual@pool-173-63-230-184.nwrknj.fios.verizon.net) has joined #ceph
[2:00] * wushudoin (~wushudoin@209.132.181.86) Quit (Ping timeout: 480 seconds)
[2:03] * oms101 (~oms101@p20030057EA0A2300EEF4BBFFFE0F7062.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[2:05] * alram (~alram@cpe-172-250-2-46.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:12] * xarses (~andreww@12.164.168.117) Quit (Ping timeout: 480 seconds)
[2:12] * oms101 (~oms101@p20030057EA0B4800EEF4BBFFFE0F7062.dip0.t-ipconnect.de) has joined #ceph
[2:13] * dyasny (~dyasny@198.251.61.137) Quit (Ping timeout: 480 seconds)
[2:16] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[2:16] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) Quit (Quit: doppelgrau)
[2:17] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) has joined #ceph
[2:17] * AGaW (~Hejt@53IAAAFQA.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[2:17] * Coe|work (~rhonabwy@8Q4AAAM7A.tor-irc.dnsbl.oftc.net) has joined #ceph
[2:20] * OutOfNoWhere (~rpb@199.68.195.102) has joined #ceph
[2:21] * sleinen1 (~Adium@2001:620:0:82::100) Quit (Ping timeout: 480 seconds)
[2:25] * sjmtest (uid32746@id-32746.uxbridge.irccloud.com) Quit (Quit: Connection closed for inactivity)
[2:31] * jks (~jks@178.155.151.121) Quit (Ping timeout: 480 seconds)
[2:35] * hellertime (~Adium@pool-173-48-154-80.bstnma.fios.verizon.net) has joined #ceph
[2:36] * yguang11 (~yguang11@vpn-nat.corp.tw1.yahoo.com) has joined #ceph
[2:37] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) Quit (Quit: doppelgrau)
[2:38] * yguang11 (~yguang11@vpn-nat.corp.tw1.yahoo.com) Quit ()
[2:44] * JV (~chatzilla@12.19.147.253) has joined #ceph
[2:47] * Coe|work (~rhonabwy@8Q4AAAM7A.tor-irc.dnsbl.oftc.net) Quit ()
[2:47] * zapu (~geegeegee@marylou.nos-oignons.net) has joined #ceph
[2:51] * JV_ (~chatzilla@204.14.239.107) has joined #ceph
[2:52] * lucas1 (~Thunderbi@218.76.52.64) has joined #ceph
[2:57] * JV (~chatzilla@12.19.147.253) Quit (Ping timeout: 480 seconds)
[2:57] * JV_ is now known as JV
[3:01] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[3:02] <dmick> googled; got this as second or third hit: http://www.spinics.net/lists/ceph-devel/msg05691.html
[3:03] <dmick> so the message isn't the problem
[3:05] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:14] * yghannam (~yghannam@0001f8aa.user.oftc.net) has joined #ceph
[3:17] * zapu (~geegeegee@8Q4AAAM7S.tor-irc.dnsbl.oftc.net) Quit ()
[3:30] <carmstrong> dmick: hmm... ok. that would explain why its peers think the osd is down
[3:30] <carmstrong> maybe it's some sort of network issue on the host
[3:31] <dmick> probably where I'd be looking
[3:31] <carmstrong> we're running ceph in Linux containers and see this occasionally all over the place, so it isn't isolated to a single host
[3:31] <carmstrong> (we build a paas which uses ceph) and our users run into this periodically
[3:31] <carmstrong> hmm
[3:46] * shyu (~Shanzhi@119.254.196.66) has joined #ceph
[3:48] * kefu (~kefu@114.86.209.84) has joined #ceph
[3:51] * adept256 (~sixofour@kbtr2ce.tor-relay.me) has joined #ceph
[3:52] * vbellur (~vijay@122.167.250.154) has joined #ceph
[3:53] * georgem (~Adium@69-196-174-91.dsl.teksavvy.com) has joined #ceph
[4:01] * angdraug (~angdraug@12.164.168.117) Quit (Quit: Leaving)
[4:16] * zhaochao (~zhaochao@124.202.190.2) has joined #ceph
[4:20] * JV (~chatzilla@204.14.239.107) Quit (Ping timeout: 480 seconds)
[4:21] * adept256 (~sixofour@53IAAAFU7.tor-irc.dnsbl.oftc.net) Quit ()
[4:21] * cooey (~pepzi@tor-exit2-readme.puckey.org) has joined #ceph
[4:24] * georgem (~Adium@69-196-174-91.dsl.teksavvy.com) Quit (Quit: Leaving.)
[4:26] * hellertime (~Adium@pool-173-48-154-80.bstnma.fios.verizon.net) Quit (Quit: Leaving.)
[4:28] * bandrus (~brian@36.sub-70-211-68.myvzw.com) Quit (Quit: Leaving.)
[4:36] * dmick (~dmick@206.169.83.146) Quit (Quit: Leaving.)
[4:37] * dmick (~dmick@206.169.83.146) has joined #ceph
[4:45] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) has joined #ceph
[4:51] * cooey (~pepzi@789AAAKJW.tor-irc.dnsbl.oftc.net) Quit ()
[5:02] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:03] * Vacuum_ (~vovo@i59F79FB0.versanet.de) has joined #ceph
[5:06] * fxmulder (~fxmulder@cpe-24-55-6-128.austin.res.rr.com) has joined #ceph
[5:06] * jdillaman (~jdillaman@pool-173-66-110-250.washdc.fios.verizon.net) has joined #ceph
[5:06] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[5:09] * fxmulder_ (~fxmulder@cpe-24-55-6-128.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[5:10] * Vacuum__ (~vovo@88.130.200.0) Quit (Ping timeout: 480 seconds)
[5:11] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) has joined #ceph
[5:11] * guerby (~guerby@ip165-ipv6.tetaneutral.net) Quit (Quit: Leaving)
[5:11] * guerby (~guerby@ip165-ipv6.tetaneutral.net) has joined #ceph
[5:13] * shang (~ShangWu@014136240162.static.ctinets.com) has joined #ceph
[5:13] * Kupo1 (~tyler.wil@23.111.254.159) Quit (Read error: Connection reset by peer)
[5:21] * sage (~quassel@2607:f298:6050:709d:44af:276a:1cf4:e2e8) Quit (Remote host closed the connection)
[5:22] * sage (~quassel@2607:f298:6050:709d:f4ed:df5a:830d:eee) has joined #ceph
[5:22] * ChanServ sets mode +o sage
[5:23] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[5:29] * vbellur (~vijay@122.167.250.154) Quit (Ping timeout: 480 seconds)
[5:31] * jeevan_ullas (~Deependra@114.143.38.200) has joined #ceph
[5:32] * kefu (~kefu@114.86.209.84) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[5:32] * jdillaman (~jdillaman@pool-173-66-110-250.washdc.fios.verizon.net) Quit (Quit: jdillaman)
[5:42] * chutwig (~textual@pool-173-63-230-184.nwrknj.fios.verizon.net) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[5:42] * frickler (~jens@v1.jayr.de) Quit (Remote host closed the connection)
[5:45] * b0e (~aledermue@213.95.25.82) has joined #ceph
[5:47] * karnan (~karnan@106.51.243.25) has joined #ceph
[5:48] * overclk (~overclk@121.244.87.117) has joined #ceph
[5:49] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has left #ceph
[5:51] * Doodlepieguy (~uhtr5r@spftor1e1.privacyfoundation.ch) has joined #ceph
[5:55] * shylesh (~shylesh@121.244.87.124) has joined #ceph
[5:58] <rkeene> joshd, I don't see any lock on the image
[5:58] * puffy (~puffy@50.185.218.255) has joined #ceph
[6:04] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[6:15] * namnh (~namnh@118.70.74.159) has joined #ceph
[6:16] <namnh> hi, I had an issue when playing with ceph-deploy
[6:16] <namnh> I used ceph-deploy to create two clusters on one server
[6:17] <namnh> the first cluster's name is dc1
[6:17] <namnh> the second cluster's name is dc2
[6:17] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[6:18] <namnh> the first ceph cluster is ok
[6:19] <namnh> for the second cluster I ran: `ceph-deploy --cluster dc2 new ceph-node1`
[6:19] <namnh> that command is ok
[6:19] <namnh> but when I created the mon node with the command `ceph-deploy --overwrite-conf --cluster dc2 --ceph-conf dc2.conf mon create-initial`
[6:20] <namnh> I got an error
[6:21] <namnh> [ceph-node1][INFO ] Running command: initctl emit ceph-mon cluster=dc2 id=ceph-node1
[6:21] <namnh> [ceph-node1][INFO ] Running command: ceph --cluster=dc2 --admin-daemon /var/run/ceph/dc2-mon.ceph-node1.asok mon_status
[6:21] <namnh> [ceph-node1][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[6:21] <namnh> can anyone help me ?
[6:21] * Doodlepieguy (~uhtr5r@7R2AAAF9G.tor-irc.dnsbl.oftc.net) Quit ()
[6:21] <namnh> thanks
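A hedged way to narrow down namnh's error: check whether the dc2 monitor actually started and created the admin socket ceph-deploy tries to query (paths and names are taken from the log output above):

    ls -l /var/run/ceph/dc2-mon.ceph-node1.asok          # does the socket exist at all?
    # if it does, query it the same way ceph-deploy does
    ceph --cluster=dc2 --admin-daemon /var/run/ceph/dc2-mon.ceph-node1.asok mon_status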
[6:21] * QuantumBeep (~Rosenblut@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[6:23] * cholcombe (~chris@pool-108-42-124-94.snfcca.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[6:24] * rdas (~rdas@122.168.75.126) has joined #ceph
[6:28] * namnh (~namnh@118.70.74.159) Quit (Quit: Leaving)
[6:29] <rkeene> joshd, In fact, with --image-features 4, all I/O to the device just seems broken (Ceph 0.94.1)
[6:30] * shang (~ShangWu@014136240162.static.ctinets.com) Quit (Remote host closed the connection)
[6:30] * Hemanth (~Hemanth@121.244.87.117) has joined #ceph
[6:33] * fxmulder_ (~fxmulder@cpe-24-55-6-128.austin.res.rr.com) has joined #ceph
[6:35] * fxmulder (~fxmulder@cpe-24-55-6-128.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[6:38] * namnh (~namnh@118.70.74.159) has joined #ceph
[6:39] * namnh (~namnh@118.70.74.159) Quit ()
[6:39] * JV (~chatzilla@204.14.239.105) has joined #ceph
[6:51] * QuantumBeep (~Rosenblut@789AAAKNJ.tor-irc.dnsbl.oftc.net) Quit ()
[6:51] * TheDoudou_a (~Atomizer@ec2-54-153-74-162.us-west-1.compute.amazonaws.com) has joined #ceph
[6:55] * linjan (~linjan@213.8.240.146) Quit (Ping timeout: 480 seconds)
[6:56] * i_m (~ivan.miro@pool-109-191-92-175.is74.ru) has joined #ceph
[7:00] * wushudoin (~wushudoin@2601:9:4b00:f10:2ab2:bdff:fe0b:a6ee) has joined #ceph
[7:07] * bkopilov (~bkopilov@bzq-79-179-9-83.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[7:08] * badone_ (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) has joined #ceph
[7:12] * badone (~brad@CPE-121-215-241-179.static.qld.bigpond.net.au) Quit (Ping timeout: 480 seconds)
[7:13] * bkopilov (~bkopilov@bzq-79-177-57-199.red.bezeqint.net) has joined #ceph
[7:14] * vbellur (~vijay@122.167.250.154) has joined #ceph
[7:15] * amote (~amote@121.244.87.116) has joined #ceph
[7:16] * oms101 (~oms101@p20030057EA0B4800EEF4BBFFFE0F7062.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[7:16] * ajazdzewski_ (~ajazdzews@p4FC8F13A.dip0.t-ipconnect.de) has joined #ceph
[7:18] * Hemanth (~Hemanth@121.244.87.117) Quit (Ping timeout: 480 seconds)
[7:21] * rdas (~rdas@122.168.75.126) Quit (Quit: Leaving)
[7:21] * TheDoudou_a (~Atomizer@789AAAKOO.tor-irc.dnsbl.oftc.net) Quit ()
[7:21] * Neon (~Revo84@cloud.tor.ninja) has joined #ceph
[7:21] * bkopilov (~bkopilov@bzq-79-177-57-199.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[7:24] * oms101 (~oms101@p20030057EA6DCA00EEF4BBFFFE0F7062.dip0.t-ipconnect.de) has joined #ceph
[7:26] * Hemanth (~Hemanth@121.244.87.117) has joined #ceph
[7:35] * subscope (~subscope@92-249-244-15.pool.digikabel.hu) has joined #ceph
[7:35] * sleinen1 (~Adium@2001:620:0:82::104) has joined #ceph
[7:36] * derjohn_mob (~aj@tmo-108-150.customers.d1-online.com) has joined #ceph
[7:36] * vbellur (~vijay@122.167.250.154) Quit (Ping timeout: 480 seconds)
[7:37] * wushudoin (~wushudoin@2601:9:4b00:f10:2ab2:bdff:fe0b:a6ee) Quit (Ping timeout: 480 seconds)
[7:40] * puffy (~puffy@50.185.218.255) Quit (Quit: Leaving.)
[7:44] * shohn (~shohn@dslc-082-082-188-008.pools.arcor-ip.net) has joined #ceph
[7:46] * oms101 (~oms101@p20030057EA6DCA00EEF4BBFFFE0F7062.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[7:48] * frickler (~jens@v1.jayr.de) has joined #ceph
[7:48] * rotbeard (~redbeard@2a02:908:df10:d300:76f0:6dff:fe3b:994d) Quit (Quit: Verlassend)
[7:49] * frickler (~jens@v1.jayr.de) Quit ()
[7:51] * rdas (~rdas@122.168.20.134) has joined #ceph
[7:51] * dustinm` (~dustinm`@105.ip-167-114-152.net) Quit (Ping timeout: 480 seconds)
[7:51] * Neon (~Revo84@7R2AAAGD6.tor-irc.dnsbl.oftc.net) Quit ()
[7:51] * Keiya (~QuantumBe@TerokNor.tor-exit.network) has joined #ceph
[7:52] * frickler (~jens@v1.jayr.de) has joined #ceph
[7:53] * Nacer (~Nacer@2001:41d0:fe82:7200:187f:fdd1:31f4:b1c9) has joined #ceph
[7:54] * oms101 (~oms101@p20030057EA73A400EEF4BBFFFE0F7062.dip0.t-ipconnect.de) has joined #ceph
[7:56] * reed (~reed@75-101-54-131.dsl.static.fusionbroadband.com) Quit (Quit: Ex-Chat)
[7:57] * sleinen1 (~Adium@2001:620:0:82::104) Quit (Ping timeout: 480 seconds)
[8:01] * dustinm` (~dustinm`@2607:5300:100:200::160d) has joined #ceph
[8:01] * ajazdzewski_ (~ajazdzews@p4FC8F13A.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[8:02] * subscope (~subscope@92-249-244-15.pool.digikabel.hu) Quit (Quit: Textual IRC Client: www.textualapp.com)
[8:04] * vbellur (~vijay@121.244.87.117) has joined #ceph
[8:06] * oro (~oro@80-219-254-208.dclient.hispeed.ch) has joined #ceph
[8:07] * pvh_sa_ (~pvh@105-237-253-44.access.mtnbusiness.co.za) Quit (Ping timeout: 480 seconds)
[8:15] * yghannam (~yghannam@0001f8aa.user.oftc.net) Quit (Ping timeout: 480 seconds)
[8:17] * shohn (~shohn@dslc-082-082-188-008.pools.arcor-ip.net) Quit (Read error: Connection reset by peer)
[8:17] * shohn1 (~shohn@dslc-082-082-188-008.pools.arcor-ip.net) has joined #ceph
[8:21] * Keiya (~QuantumBe@5NZAACDSQ.tor-irc.dnsbl.oftc.net) Quit ()
[8:22] * antoine (~bourgault@192.93.37.4) has joined #ceph
[8:24] * JV (~chatzilla@204.14.239.105) Quit (Ping timeout: 480 seconds)
[8:25] * wicope (~wicope@0001fd8a.user.oftc.net) has joined #ceph
[8:27] * Sysadmin88 (~IceChat77@054527d3.skybroadband.com) Quit (Quit: I cna ytpe 300 wrods pre mniuet!!!)
[8:34] * branto (~borix@ip-213-220-214-203.net.upcbroadband.cz) has joined #ceph
[8:35] * zaitcev (~zaitcev@2001:558:6001:10:61d7:f51f:def8:4b0f) Quit (Quit: Bye)
[8:39] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[8:41] * Nacer (~Nacer@2001:41d0:fe82:7200:187f:fdd1:31f4:b1c9) Quit (Remote host closed the connection)
[8:46] * fattaneh (~fattaneh@194.225.33.200) has joined #ceph
[8:47] * jclm (~jclm@rrcs-74-87-24-254.west.biz.rr.com) has joined #ceph
[8:47] <schamane> Hi guys, I am going crazy, I've installed ceph around 50 times now, but I can't get disks working as OSDs
[8:47] <schamane> OSD::mkfs: ObjectStore::mkfs failed with error -22
[8:48] <schamane> ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.ix7ZT7: (22) Invalid argument
[8:48] <schamane> ERROR:ceph-disk:Failed to activate
[8:48] <schamane> and i dunno what i can do
[8:48] * KevinPerks (~Adium@cpe-75-177-32-14.triad.res.rr.com) Quit (Quit: Leaving.)
[8:54] * ajazdzewski_ (~ajazdzews@lpz-66.sprd.net) has joined #ceph
[8:55] * ajazdzewski_ (~ajazdzews@lpz-66.sprd.net) Quit ()
[8:55] * cephiroth (~oftc-webi@br167-098.ifremer.fr) Quit (Quit: Page closed)
[8:56] * ajazdzewski (~ajazdzews@lpz-66.sprd.net) has joined #ceph
[8:57] * bkopilov (~bkopilov@bzq-79-177-177-100.red.bezeqint.net) has joined #ceph
[8:57] * Mika_c (~quassel@122.146.93.152) has joined #ceph
[8:59] * nljmo (~nljmo@5ED6C263.cm-7-7d.dynamic.ziggo.nl) has joined #ceph
[8:59] * nljmo_ (~nljmo@5ED6C263.cm-7-7d.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[9:02] * kawa2014 (~kawa@212.77.3.87) has joined #ceph
[9:03] * fattaneh (~fattaneh@194.225.33.200) Quit (Remote host closed the connection)
[9:03] * linjan (~linjan@195.110.41.9) has joined #ceph
[9:06] * dgurtner_ (~dgurtner@178.197.231.80) has joined #ceph
[9:06] * Concubidated (~Adium@71.21.5.251) Quit (Quit: Leaving.)
[9:07] * Concubidated (~Adium@71.21.5.251) has joined #ceph
[9:09] * derjohn_mob (~aj@tmo-108-150.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[9:11] * Concubidated (~Adium@71.21.5.251) Quit ()
[9:12] * fsckstix (~stevpem@bh02i525f01.au.ibm.com) Quit (Ping timeout: 480 seconds)
[9:13] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[9:13] * derjohn_mob (~aj@tmo-108-150.customers.d1-online.com) has joined #ceph
[9:13] * frickler_ (~jens@v1.jayr.de) has joined #ceph
[9:14] * frickler_ (~jens@v1.jayr.de) Quit ()
[9:14] * Be-El (~quassel@fb08-bcf-pc01.computational.bio.uni-giessen.de) has joined #ceph
[9:14] * analbeard (~shw@support.memset.com) has joined #ceph
[9:15] * cooldharma06 (~chatzilla@14.139.180.52) has joined #ceph
[9:16] <Be-El> hi
[9:20] * oro (~oro@80-219-254-208.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[9:21] * nupanick (~mps@spftor5e2.privacyfoundation.ch) has joined #ceph
[9:21] * treenerd (~treenerd@85.193.140.98) has joined #ceph
[9:25] * cok (~chk@2a02:2350:18:1010:1184:68d8:996c:2326) has joined #ceph
[9:25] <schamane> found it, had to wipe the journal disk
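A minimal sketch of the fix schamane found, assuming a hypothetical journal device /dev/sdb on host node1:

    ceph-deploy disk zap node1:sdb      # let ceph-deploy clear the old partition table
    # or, directly on the node:
    sgdisk --zap-all /dev/sdb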
[9:27] * derjohn_mob (~aj@tmo-108-150.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[9:30] * pvh_sa_ (~pvh@uwcfw.uwc.ac.za) has joined #ceph
[9:31] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[9:36] * fsimonce (~simon@host11-35-dynamic.32-79-r.retail.telecomitalia.it) has joined #ceph
[9:40] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[9:41] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Remote host closed the connection)
[9:44] * bobrik_ (~bobrik@83.243.64.45) has joined #ceph
[9:46] * bobrik__ (~bobrik@83.243.64.45) has joined #ceph
[9:46] * bobrik_ (~bobrik@83.243.64.45) Quit (Read error: Connection reset by peer)
[9:47] * bobrik___ (~bobrik@78.25.122.198) has joined #ceph
[9:47] * derjohn_mob (~aj@tmo-108-150.customers.d1-online.com) has joined #ceph
[9:47] <schamane> pg 0.1f is stuck inactive since forever, current state creating, last acting []
[9:47] <schamane> anyone a hint?
[9:48] <schamane> HEALTH_WARN 64 pgs stuck inactive; 64 pgs stuck unclean
[9:49] * bobrik__ (~bobrik@83.243.64.45) Quit (Read error: Connection reset by peer)
[9:49] * bobrik____ (~bobrik@83.243.64.45) has joined #ceph
[9:50] * jclm (~jclm@rrcs-74-87-24-254.west.biz.rr.com) Quit (Quit: Leaving.)
[9:50] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[9:51] * bobrik (~bobrik@83.243.64.45) Quit (Ping timeout: 480 seconds)
[9:51] * nupanick (~mps@789AAAKVM.tor-irc.dnsbl.oftc.net) Quit ()
[9:55] * thomnico (~thomnico@AToulouse-654-1-311-33.w86-199.abo.wanadoo.fr) has joined #ceph
[9:56] * rdas (~rdas@122.168.20.134) Quit (Quit: Leaving)
[9:57] * bobrik___ (~bobrik@78.25.122.198) Quit (Ping timeout: 480 seconds)
[9:58] * derjohn_mobi (~aj@fw.gkh-setu.de) has joined #ceph
[9:58] * thomnico (~thomnico@AToulouse-654-1-311-33.w86-199.abo.wanadoo.fr) Quit (Remote host closed the connection)
[9:59] * cok (~chk@2a02:2350:18:1010:1184:68d8:996c:2326) Quit (Quit: Leaving.)
[10:00] * thomnico (~thomnico@AToulouse-654-1-311-33.w86-199.abo.wanadoo.fr) has joined #ceph
[10:00] * derjohn_mob (~aj@tmo-108-150.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[10:03] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[10:03] <Be-El> schamane: maybe ceph is not able to assign osds for the pg
[10:06] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[10:07] * pdrakewe_ (~pdrakeweb@oh-71-50-38-193.dhcp.embarqhsd.net) Quit (Ping timeout: 480 seconds)
[10:08] * rdas (~rdas@122.168.20.134) has joined #ceph
[10:09] <schamane> Be-El, it tells me there are 3
[10:09] <schamane> osdmap e10: 3 osds: 3 up, 3 in
[10:09] <schamane> pgmap v17: 64 pgs, 1 pools, 0 bytes data, 0 objects
[10:09] <schamane> 64 creating
[10:11] <Be-El> schamane: see query
[10:11] <schamane> query times out
[10:12] <schamane> tested that already
[10:12] * karnan (~karnan@106.51.243.25) Quit (Ping timeout: 480 seconds)
[10:12] <Be-El> schamane: timeout on irc queries?
[10:12] <schamane> ah, oops
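A hedged first look at PGs that sit in "creating" forever, along the lines Be-El suggests; this shows which OSDs (if any) each stuck PG was mapped to:

    ceph pg dump_stuck inactive         # stuck PGs with their current state and acting set
    ceph osd tree                       # check the CRUSH hierarchy actually contains the up OSDs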
[10:13] * jordanP (~jordan@scality-jouf-2-194.fib.nerim.net) has joined #ceph
[10:14] * jordanP (~jordan@scality-jouf-2-194.fib.nerim.net) Quit ()
[10:14] * jordanP (~jordan@213.215.2.194) has joined #ceph
[10:15] * oro (~oro@2001:620:20:16:2cea:29e7:e613:15fa) has joined #ceph
[10:18] * mwilcox_ (~mwilcox@116.251.192.71) Quit (Ping timeout: 480 seconds)
[10:21] * karnan (~karnan@106.51.232.102) has joined #ceph
[10:21] * datagutt (~Drezil@89.105.194.69) has joined #ceph
[10:23] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[10:23] * sleinen (~Adium@130.59.94.213) has joined #ceph
[10:24] * sleinen1 (~Adium@2001:620:0:82::108) has joined #ceph
[10:26] * haomaiwa_ (~haomaiwan@118.244.254.16) Quit (Ping timeout: 480 seconds)
[10:29] * sep (~sep@2a04:2740:1:0:52e5:49ff:feeb:32) Quit (Read error: Connection reset by peer)
[10:31] * sleinen (~Adium@130.59.94.213) Quit (Ping timeout: 480 seconds)
[10:40] * haomaiwang (~haomaiwan@114.111.166.250) has joined #ceph
[10:40] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[10:42] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[10:51] * datagutt (~Drezil@53IAAAGGH.tor-irc.dnsbl.oftc.net) Quit ()
[10:51] * ZombieL (~Aethis@176.10.99.203) has joined #ceph
[11:00] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[11:02] * sugoruyo (~georgev@paarthurnax.esc.rl.ac.uk) has joined #ceph
[11:02] * owasserm (~owasserm@52D9864F.cm-11-1c.dynamic.ziggo.nl) has joined #ceph
[11:03] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[11:08] <sugoruyo> hey folks, anyone ran into "too many PGs per OSD" on a small cluster?
[11:10] * bkopilov (~bkopilov@bzq-79-177-177-100.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[11:13] * branto (~borix@ip-213-220-214-203.net.upcbroadband.cz) has left #ceph
[11:14] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) has joined #ceph
[11:16] <Be-El> sugoruyo: yes, you can disable the warning by setting mon_pg_warn_max_per_osd in the global section of ceph.conf
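A minimal ceph.conf sketch of the setting Be-El mentions; the value is only an example (raise the threshold, or setting it to 0 should disable the check entirely):

    [global]
        mon_pg_warn_max_per_osd = 400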
[11:16] <pvh_sa_> hey there... I've got a cluster here that's been off for some time, and now when it's up again it's complaining that "e20 not in monmap and have been in a quorum before; must have been removed" - but I never removed the mon from the monmap - any ideas how to fix this?
[11:16] <sugoruyo> Be-El: it's not so much the warning as the fact that my PGs aren't finishing their creation process
[11:17] <Be-El> sugoruyo: that's probably a different problem
[11:18] <sugoruyo> Be-El: so I'm trying to create a set of pools with a total number of PGs such that I'd get about 180 PGs/OSD but it's complaining that I've got 1100 PGs / OSD
[11:18] <sugoruyo> lots of PGs are like this: [36,10,2,24,2147483647,33,2147483647,2147483647,20,2147483647]
[11:19] <sugoruyo> it looks like they haven't been assigned the appropriate number of OSDs (the problem pools are EC pools with 8+2 and 16+2 settings)
[11:19] <Be-El> sugoruyo: you need to fix your crush maps. 2147483647 (-1 as signed int) indicates that no valid osd has been found
[11:20] <Be-El> sugoruyo: do you have enough hosts to satisfy the 16+2 ec pool? have you adjusted the crush settings to allow crush to find enough hosts if the number of hosts is close to the number of chunks for ec pools?
[11:21] <sugoruyo> Be-El: that's what I suspect, I have 6 hosts, 42 OSDs
[11:21] <sugoruyo> I haven't yet messed around with CRUSH
[11:21] * ZombieL (~Aethis@2WVAAB73Q.tor-irc.dnsbl.oftc.net) Quit ()
[11:21] <Be-El> sugoruyo: the default crush ruleset uses hosts for distribution. 6 hosts cannot satisfy the 8+2 or 16+2 requirements
[11:22] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[11:22] <sugoruyo> I'm assuming I need to come up with a CRUSH ruleset that will find enough hosts
[11:22] <sugoruyo> I need to read up on rulesets and make my own then
[11:22] <Be-El> sugoruyo: you may want to use a rule that uses osd-based distribution. but keep in mind that in case of a host failure you might have data loss
[11:23] <sugoruyo> Be-El: that's fine, this is a "dev" cluster
[11:23] <sugoruyo> it's used for testing some functionality works before moving on to the larger one for performance and then on to production testing
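A hedged sketch of the osd-based distribution Be-El suggests, using Hammer-era syntax; the profile name, pool name and PG counts are hypothetical:

    ceph osd erasure-code-profile set ec82 k=8 m=2 ruleset-failure-domain=osd
    ceph osd pool create ecpool 256 256 erasure ec82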
[11:24] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[11:24] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) Quit ()
[11:28] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[11:35] * hellertime (~Adium@pool-173-48-154-80.bstnma.fios.verizon.net) has joined #ceph
[11:36] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[11:40] * branto (~branto@213.175.37.10) has joined #ceph
[11:41] * pvh_sa_ (~pvh@uwcfw.uwc.ac.za) Quit (Ping timeout: 480 seconds)
[11:45] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[11:49] * kefu (~kefu@114.86.209.84) has joined #ceph
[11:49] * pvh_sa_ (~pvh@uwcfw.uwc.ac.za) has joined #ceph
[11:50] * macjack (~macjack@122.146.93.152) has joined #ceph
[11:51] * pepzi (~Jaska@nx-74205.tor-exit.network) has joined #ceph
[12:03] * brutusca_ (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[12:03] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[12:05] <theanalyst> what causes a misdirected client request?
[12:07] * floppyraid (~holoirc@1.136.96.133) has joined #ceph
[12:13] * lucas1 (~Thunderbi@218.76.52.64) Quit (Remote host closed the connection)
[12:21] * pepzi (~Jaska@789AAAK19.tor-irc.dnsbl.oftc.net) Quit ()
[12:28] * rendar (~I@host154-179-dynamic.12-79-r.retail.telecomitalia.it) has joined #ceph
[12:29] * linjan (~linjan@195.110.41.9) Quit (Ping timeout: 480 seconds)
[12:33] * linjan (~linjan@195.110.41.9) has joined #ceph
[12:33] * brutusca_ (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[12:35] * derjohn_mobi (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[12:41] <sugoruyo> Be-El, or anyone: would you know what min_size in a ruleset means? I have an EC pool with 8+2 so `ceph osd pool ...` says min_size is 8 and size is 10. but the CRUSH ruleset it's using (auto-generated by Ceph) says min_size is 3
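A hedged illustration of the distinction sugoruyo asks about: in a decompiled CRUSH rule, min_size/max_size only bound the pool sizes the rule will accept, and are unrelated to the pool's own min_size (the number of available shards needed to keep serving I/O). The values below are illustrative, not taken from his cluster:

    rule ecpool {
            ruleset 1
            type erasure
            min_size 3      # the rule is usable for pools with size >= 3 ...
            max_size 10     # ... and <= 10; nothing to do with the pool's min_size
            step take default
            step chooseleaf indep 0 type host
            step emit
    }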
[12:41] * joelm (~joel@81.4.101.217) has left #ceph
[12:44] * derjohn_mobi (~aj@fw.gkh-setu.de) has joined #ceph
[12:46] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[12:51] * shyu (~Shanzhi@119.254.196.66) Quit (Remote host closed the connection)
[12:57] * Mika_c (~quassel@122.146.93.152) Quit (Remote host closed the connection)
[13:02] * rdas (~rdas@122.168.20.134) Quit (Quit: Leaving)
[13:03] <CapnBB> Hi - can I ask if anyone has tried using the 8TB Seagate Archive drives in an EC pool?
[13:05] <CapnBB> we bought 32 x ST8000AS0002-1NA for our first test cluster, which is behaving badly....
[13:05] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[13:05] * pvh_sa_ (~pvh@uwcfw.uwc.ac.za) Quit (Ping timeout: 480 seconds)
[13:05] <nc_ch> what does badly mean ?
[13:06] <CapnBB> but it could just be my inexperience in setting everything up ;)
[13:06] <nc_ch> well, i do not have any such large disks, i have a mix of 2 and 4 TB ones ...
[13:06] <nc_ch> what is the rest of your cluster layout ?
[13:07] <nc_ch> how are these disks distributed in your hosts, etc, that kind of stuff
[13:07] <CapnBB> rados bench writes quickly (500MB/s) for a few seconds, then crawls at 120MB/s, there are 4 OSD nodes, each with 8 x disks
[13:07] <nc_ch> ok, so that means you have only the 8TB disks
[13:08] <CapnBB> each node is a dell R710 with 24GB RAM, 10Gb nic, LSI SAS, and a MD1200 shelf.
[13:08] <nc_ch> cpu wise ?
[13:09] <CapnBB> I'm using a Kingston HyperX PCIe SSD for journal on each OSD server, dual E5520 @ 2.27GHz
[13:09] <nc_ch> the usual 5gb partitions for the journal, i guess ?
[13:09] <CapnBB> yes - all done with ceph-deploy
[13:09] <nc_ch> yes
[13:10] <nc_ch> 500 MB/sec means that in 10 seconds, your journal is full ...
[13:10] <nc_ch> then the writes wait for the disks themselves
[13:10] <nc_ch> i guess
[13:10] * kefu (~kefu@114.86.209.84) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[13:11] <CapnBB> makes sense.... These disks seem OK when streaming large files, perhaps the erasure coding creates a lot of fragments?
[13:12] <CapnBB> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
[13:12] <CapnBB> sdk 0.30 0.60 4.60 7.40 1268.80 116.80 115.47 93.15 8126.52 83.33 100.00
[13:12] <CapnBB> and I see this^
[13:12] <CapnBB> :(
[13:13] * karnan (~karnan@106.51.232.102) Quit (Ping timeout: 480 seconds)
[13:15] <nc_ch> what kind of EC are you running ?
[13:15] <nc_ch> oh you said
[13:15] <nc_ch> 8+2, sorry
[13:15] <nc_ch> ah, no, that was someone else :)
[13:15] <CapnBB> using the isi 3+1
[13:15] <nc_ch> how's your cpu utilisation on the nodes ?
[13:16] <sugoruyo> CapnBB: is the Archive series SMR?
[13:16] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[13:17] <CapnBB> not too bad normally, load average around 3. Yes - these are the SMR drives, hence my worry !
[13:17] <nc_ch> well, in that case, i would say ... the drives are good for cold storage
[13:18] <sugoruyo> CapnBB: this is a very interesting conversation, but I have to go to lunch! In March we were visited by Seagate to talk about SMR drives, their assessment was that they wouldn't work very well with EC, especially once SMR kicks in
[13:18] <nc_ch> sugoruyo has probably a point there
[13:19] <CapnBB> ahh...! I have to go to lunch too! I'll try with some He8 drives, I think these are non SMR
[13:19] <sugoruyo> they're meant for increasing cold data storage density not for serving hot data off, we've thought about it but haven't had a chance to test them ourselves
[13:19] <sugoruyo> a good way to use these is in a mixed cluster
[13:20] <nc_ch> that is what i would think ...
[13:20] <CapnBB> well - as an EC pool, under an SSD tier, I was hoping that we could use this for an archive data dump
[13:20] <sugoruyo> if you're doing EC with them you might want a bunch of normal ones in writeback caching
[13:20] <sugoruyo> CapnBB: you might want to look at the sizes that objects end up as after EC has chunked them up as well
[13:21] * bene (~ben@c-24-60-237-191.hsd1.nh.comcast.net) has joined #ceph
[13:21] <sugoruyo> CERN were tuning that bit but I don't know if they'd have put something on the net as in slides or a talk or something
[13:21] * karnan (~karnan@171.76.57.211) has joined #ceph
[13:21] <CapnBB> OK - I noticed the CERN talk, but that wasn't EC as I recall
[13:21] * bkopilov (~bkopilov@bzq-109-66-178-93.red.bezeqint.net) has joined #ceph
[13:22] <sugoruyo> you could also use SMR with replication pools if you worked some CRUSH magic
[13:22] <sugoruyo> CapnBB: maybe look again, they're using quite a bit of EC on one of their Ceph clusters (obviously their OpenStack is running replication)
[13:23] <sugoruyo> CapnBB: if you're writing 4MB objects then EC 8+2 would turn those into 512KB chunks and write 10 of these, one on each disk
[13:24] <sugoruyo> hence my comment about looking at what size they end up as, we pushed our object size to 64MB with 16+2 so we end up with 4MB chunks
[13:25] <CapnBB> OK - I have a 7TB data set on the cluster that is mainly 32GB files, a quick glance at the OSD showed 4MB files
[13:25] <sugoruyo> I'm off to lunch will be back in about 40' if you want to continue this
[13:25] * floppyraid (~holoirc@1.136.96.133) Quit (Ping timeout: 480 seconds)
[13:25] <CapnBB> OK - I have to go for lunch & meeting - will be back in a couple of hours :)
[13:26] * CapnBB is now known as CapnBB_away
[13:34] * pvh_sa_ (~pvh@uwcfw.uwc.ac.za) has joined #ceph
[13:36] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[13:43] * bkopilov (~bkopilov@bzq-109-66-178-93.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[13:43] * bkopilov (~bkopilov@bzq-79-180-203-79.red.bezeqint.net) has joined #ceph
[13:43] <jnq> how would i start debugging a negative degraded state?
[13:45] * T1w (~jens@node3.survey-it.dk) Quit (Remote host closed the connection)
[13:48] <nc_ch> negative ... fun ...
[13:48] <nc_ch> i had that because of a network problem ... but it worked out all by itself ...
[13:49] <nc_ch> don't ask me where the negative sign came from :)
[13:49] <nc_ch> i was considering strong psychopharmaka at the moment i saw this
[13:49] <nc_ch> ( j/k )
[13:50] <nc_ch> what brought the negative degraded state ?
[13:50] * alfredodeza (~alfredode@198.206.133.89) has left #ceph
[13:51] * KevinPerks (~Adium@cpe-75-177-32-14.triad.res.rr.com) has joined #ceph
[13:51] * dontron (~spate@ec2-52-68-94-85.ap-northeast-1.compute.amazonaws.com) has joined #ceph
[13:52] * floppyraid (~holoirc@202.161.23.74) has joined #ceph
[13:52] * antoine (~bourgault@192.93.37.4) Quit (Ping timeout: 480 seconds)
[13:57] * OutOfNoWhere (~rpb@199.68.195.102) Quit (Ping timeout: 480 seconds)
[13:57] * jdillaman (~jdillaman@pool-173-66-110-250.washdc.fios.verizon.net) has joined #ceph
[13:57] * pvh_sa_ (~pvh@uwcfw.uwc.ac.za) Quit (Ping timeout: 480 seconds)
[14:01] * kefu (~kefu@114.86.209.84) has joined #ceph
[14:04] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[14:05] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[14:06] * derjohn_mobi (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[14:07] * kefu (~kefu@114.86.209.84) Quit (Max SendQ exceeded)
[14:07] <sugoruyo> nc_ch: you were considering what?
[14:07] <nc_ch> stuff like valium :)
[14:08] * kefu (~kefu@114.86.209.84) has joined #ceph
[14:08] <nc_ch> strong medication anyway
[14:09] * pdrakeweb (~pdrakeweb@cpe-65-185-74-239.neo.res.rr.com) has joined #ceph
[14:09] <sugoruyo> nc_ch: I'm just curious about the word you used...
[14:09] <nc_ch> pharmakon is not really widely used ... i suppose it is my cultural background ...
[14:13] * antoine (~bourgault@192.93.37.4) has joined #ceph
[14:16] * derjohn_mobi (~aj@fw.gkh-setu.de) has joined #ceph
[14:16] * kefu (~kefu@114.86.209.84) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[14:21] * dontron (~spate@789AAAK63.tor-irc.dnsbl.oftc.net) Quit ()
[14:21] * spidu_ (~Sketchfil@spftor1e1.privacyfoundation.ch) has joined #ceph
[14:23] * zhaochao (~zhaochao@124.202.190.2) Quit (Quit: ChatZilla 0.9.91.1 [Iceweasel 31.6.0/20150331233809])
[14:24] * nc_ch (~nc@flinux01.tu-graz.ac.at) Quit (Quit: Leaving)
[14:28] * cok (~chk@nat-cph1-sys.net.one.com) has joined #ceph
[14:28] * branto (~branto@213.175.37.10) Quit (Ping timeout: 480 seconds)
[14:28] * cok (~chk@nat-cph1-sys.net.one.com) Quit ()
[14:29] * georgem (~Adium@207.164.79.103) has joined #ceph
[14:35] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[14:35] * karnan (~karnan@171.76.57.211) Quit (Remote host closed the connection)
[14:36] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[14:39] * kefu (~kefu@114.86.209.84) has joined #ceph
[14:40] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has joined #ceph
[14:42] * branto (~branto@nat-pool-brq-t.redhat.com) has joined #ceph
[14:44] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[14:47] * blynch (~blynch@vm-nat.msi.umn.edu) Quit (Ping timeout: 480 seconds)
[14:49] * bene (~ben@c-24-60-237-191.hsd1.nh.comcast.net) Quit (Quit: Konversation terminated!)
[14:49] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[14:50] * blynch (~blynch@vm-nat.msi.umn.edu) has joined #ceph
[14:51] * spidu_ (~Sketchfil@789AAAK8F.tor-irc.dnsbl.oftc.net) Quit ()
[14:53] * cok (~chk@technet1-cph3.net.one.com) has joined #ceph
[14:56] * cooldharma06 (~chatzilla@14.139.180.52) Quit (Quit: ChatZilla 0.9.91.1 [Iceweasel 21.0/20130515140136])
[14:57] * twofish (~twofish@UNIX1.ANDREW.CMU.EDU) has joined #ceph
[14:57] * twofish (~twofish@UNIX1.ANDREW.CMU.EDU) Quit ()
[14:57] * tw0fish (~tw0fish@UNIX1.ANDREW.CMU.EDU) has joined #ceph
[14:57] * sep (~sep@2a04:2740:1:0:52e5:49ff:feeb:32) has joined #ceph
[15:00] * tupper (~tcole@2001:420:2280:1272:8900:f9b8:3b49:567e) has joined #ceph
[15:01] * xdeller (~xdeller@h195-91-128-218.ln.rinet.ru) has joined #ceph
[15:02] * cok (~chk@technet1-cph3.net.one.com) Quit (Ping timeout: 480 seconds)
[15:03] <cmdrk> can an erasure coded profile be changed after creation?
[15:04] * sjm (~sjm@pool-173-70-76-86.nwrknj.fios.verizon.net) has joined #ceph
[15:06] * georgem (~Adium@207.164.79.103) Quit (Quit: Leaving.)
[15:07] * cephiroth (~oftc-webi@br167-098.ifremer.fr) has joined #ceph
[15:09] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has left #ceph
[15:09] * cok (~chk@technet1-cph3.net.one.com) has joined #ceph
[15:09] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[15:12] * amote (~amote@121.244.87.116) Quit (Quit: Leaving)
[15:13] <cephiroth> hello, 2 disks where 2 OSDs were mounted have been resized, and I couldn't make them work again (here is the end of the message when I tried to start them: http://pastebin.com/0Cf5vprV )
[15:14] * overclk (~overclk@121.244.87.117) Quit (Ping timeout: 480 seconds)
[15:14] <cephiroth> so I recreated those OSDs, but since then ceph -s gives me this: http://pastebin.com/yz8c8h9K
[15:15] <cephiroth> and actually i can't manipulate the pool
[15:16] * dyasny (~dyasny@173.231.115.58) has joined #ceph
[15:16] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[15:17] * cok (~chk@technet1-cph3.net.one.com) Quit (Ping timeout: 480 seconds)
[15:21] * Hemanth (~Hemanth@121.244.87.117) Quit (Ping timeout: 480 seconds)
[15:21] * demonspork (~W|ldCraze@176.10.99.209) has joined #ceph
[15:24] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[15:28] * cok (~chk@nat-cph1-sys.net.one.com) has joined #ceph
[15:30] * yghannam (~yghannam@0001f8aa.user.oftc.net) has joined #ceph
[15:32] * hasues (~hasues@kwfw01.scrippsnetworksinteractive.com) has joined #ceph
[15:32] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[15:33] * hasues (~hasues@kwfw01.scrippsnetworksinteractive.com) has left #ceph
[15:36] * jeevan_ullas (~Deependra@114.143.38.200) Quit (Quit: Textual IRC Client: www.textualapp.com)
[15:37] * cok (~chk@nat-cph1-sys.net.one.com) Quit (Quit: Leaving.)
[15:38] * nsoffer (~nsoffer@bzq-109-64-255-30.red.bezeqint.net) has joined #ceph
[15:39] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has joined #ceph
[15:40] * thole3 (~Thomas@85.218.244.90) has joined #ceph
[15:40] * yghannam (~yghannam@0001f8aa.user.oftc.net) Quit (Quit: Leaving)
[15:41] * nsoffer (~nsoffer@bzq-109-64-255-30.red.bezeqint.net) Quit ()
[15:41] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has left #ceph
[15:45] * Nacer_ (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[15:46] * colonD (~colonD@173-165-224-105-minnesota.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[15:46] * Elwell_ (~elwell@203-59-158-233.dyn.iinet.net.au) Quit (Read error: Connection reset by peer)
[15:46] * Manshoon (~Manshoon@208.184.50.131) has joined #ceph
[15:48] * ChrisNBlum (~ChrisNBlu@178.255.153.117) has joined #ceph
[15:50] <thole3> I have deployed a ceph cluster with 3 nodes via juju, but when I am trying to run eg. "sudo ceph status" on one of the nodes I get: ""librados: client.admin authentication error (1) Operation not permitted
[15:50] <thole3> Error connecting to cluster: PermissionError
[15:50] <thole3> ""
[15:50] <thole3> Can anybody lead me in the right direction on troubleshooting this?
[15:50] * Elwell (~elwell@115-166-26-235.ip.adam.com.au) has joined #ceph
[15:50] * harold (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) has joined #ceph
[15:51] * demonspork (~W|ldCraze@789AAALA9.tor-irc.dnsbl.oftc.net) Quit ()
[15:51] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Read error: Connection reset by peer)
[15:53] * shylesh (~shylesh@121.244.87.124) Quit (Remote host closed the connection)
[15:53] <Be-El> thole3: no clue about juju, but you either lack permissions to read the client.admin key file or the file does not contain a valid admin key
[15:54] * JayJ (~jayj@157.130.21.226) has joined #ceph
[15:54] <thole3> I have verified both. Running the command as root and the client.admin key is located on all nodes.
[15:55] * colonD (~colonD@173-165-224-105-minnesota.hfc.comcastbusiness.net) has joined #ceph
[15:55] * nsoffer (~nsoffer@bzq-109-64-255-30.red.bezeqint.net) has joined #ceph
[15:55] <Be-El> thole3: but is it a valid key?
[15:59] <thole3> Be-El: How do I verify that it is valid?
[15:59] * branto (~branto@nat-pool-brq-t.redhat.com) Quit (Ping timeout: 480 seconds)
[16:00] * JV (~chatzilla@12.19.147.253) has joined #ceph
[16:00] * Manshoon_ (~Manshoon@208.184.50.130) has joined #ceph
[16:00] * JV_ (~chatzilla@204.14.239.17) has joined #ceph
[16:00] * MertsA (~oftc-webi@wsip-68-15-214-247.at.at.cox.net) has joined #ceph
[16:00] <Be-El> thole3: that's a good question. you usually use commands like 'ceph auth list' to list users and their keys....but these commands require admin permissions and thus an admin key...
[16:02] * zaitcev (~zaitcev@2001:558:6001:10:61d7:f51f:def8:4b0f) has joined #ceph
[16:04] <thole3> Be-El: Should the admin key be the same on all nodes? Right now they are not.
[16:04] * macjack (~macjack@122.146.93.152) has left #ceph
[16:05] <Be-El> thole3: the admin key for a cluster should be the same on all hosts, yes
[16:05] <Be-El> thole3: did you try to install ceph several times? maybe the key present on the host belongs to a former installation attempt
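A hedged way to test Be-El's theory about stale keys: compare the key the monitors actually hold with the keyring each node presents. Assuming a setup where the mon id is the short hostname (as ceph-deploy/juju usually set it up), the mon. key can be used to bypass the broken admin key:

    # on a monitor node, ask the cluster for the real client.admin key
    ceph --name mon. --keyring /var/lib/ceph/mon/ceph-$(hostname -s)/keyring auth get client.admin
    # compare with what this node is presenting
    cat /etc/ceph/ceph.client.admin.keyring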
[16:05] <cephiroth> 2 disks where 2 OSDs were mounted have been resized, and I couldn't make them work again (here is the end of the message when I tried to start them: http://pastebin.com/0Cf5vprV )
[16:05] <cephiroth> so I recreated those OSDs, but since then ceph -s gives me this: http://pastebin.com/yz8c8h9K
[16:06] <cephiroth> and actually i can't manipulate the pool
[16:06] * wushudoin (~wushudoin@2601:9:4b00:f10:2ab2:bdff:fe0b:a6ee) has joined #ceph
[16:06] * Manshoon (~Manshoon@208.184.50.131) Quit (Ping timeout: 480 seconds)
[16:08] * JV (~chatzilla@12.19.147.253) Quit (Ping timeout: 480 seconds)
[16:08] <MertsA> Anyone use ceph with just two serious nodes and a VPS?
[16:09] <s3an2> Does anyone have experience using a hybrid drive (SSHD) with Ceph? - Currently looking at building a ceph cluster and trying to select the 'best' drives to use so any input is welcome.
[16:09] <MertsA> Or would using a VPS as a monitor just flat out fail?
[16:09] * branto (~branto@nat-pool-brq-t.redhat.com) has joined #ceph
[16:09] * shakamunyi (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) has joined #ceph
[16:10] <thole3> Be-El: I have changed the key on all nodes, and can now run the command, but I have to run it a couple of times before it works.
[16:11] <s3an2> MertsA: I run 3 mon's on VM's to support around 400 OSD's, not had a problem from the mon's in the 200 days it has been running.
[16:11] <smerz> mons do create lots of sync io. so underlying IO needs to be fast (and durable)
[16:11] <MertsA> So here's the kicker, what I meant was 2 mons on site and one mon in AWS
[16:11] <smerz> i wouldn't. apparently latency between osds and mons is important
[16:12] <MertsA> how much traffic goes between mons and is 80ms of latency a complete deal breaker?
[16:12] <smerz> also when the aws mon is down you have no quorum
[16:12] <s3an2> I would not want to go over 5-10ms network latency
[16:12] <MertsA> Well, I guess there goes that idea
[16:13] <MertsA> Anyone have any super cheap mon nodes just for quorum? I basically have 2 nodes to work with and no serious budget for a third :(
[16:13] * shakamunyi (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[16:14] <smerz> no old machine for the 3rd monitor?
[16:14] <smerz> you don't need high end hardware for the monitors. to begin with at least
[16:15] <smerz> unless you're building a large cluster
[16:15] <MertsA> ...All I've got is an old Dell tower running Windows XP right now with 3 GB of RAM and an IDE hard drive.
[16:15] * linjan (~linjan@195.110.41.9) Quit (Ping timeout: 480 seconds)
[16:15] * vbellur (~vijay@121.244.87.117) Quit (Ping timeout: 480 seconds)
[16:15] <thole3> Be-El: And I'am struggling with the health-error: HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean - having no clue why.
[16:15] <smerz> MertsA, use that then i guess
[16:16] <s3an2> MertsA: try > freecycle ;)
[16:16] <MertsA> I think I might lol, I just hate the idea of relying on one of these old clunkers providing quorum in the event failover is needed
[16:17] * MentalRay (~MRay@MTRLPQ42-1176054809.sdsl.bell.ca) has joined #ceph
[16:21] <Be-El> thole3: running commands like ceph -s should work every time. do you get an error message if it does not work?
[16:21] * Bored (~Joppe4899@wannabe.torservers.net) has joined #ceph
[16:25] * Elwell_ (~elwell@58-7-101-177.dyn.iinet.net.au) has joined #ceph
[16:25] * JV_ (~chatzilla@204.14.239.17) Quit (Ping timeout: 480 seconds)
[16:27] * Elwell (~elwell@115-166-26-235.ip.adam.com.au) Quit (Ping timeout: 480 seconds)
[16:27] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[16:29] * daniel2_ (~daniel@cpe-24-28-6-151.austin.res.rr.com) has joined #ceph
[16:30] * jsheeren (~smuxi@systeembeheer.combell.com) has joined #ceph
[16:30] * jsheeren (~smuxi@systeembeheer.combell.com) has left #ceph
[16:30] * jsheeren (~smuxi@systeembeheer.combell.com) has joined #ceph
[16:33] <thole3> Be-El: Yes, I get the error message described before: librados: client.admin authentication error (1) Operation not permitted Error connecting to cluster: PermissionError
[16:33] <devicenull> hmm, something is quite wrong here
[16:33] <devicenull> pg 0.c is stuck unclean since forever, current state stale+active+undersized+degraded, last acting [83]
[16:33] <devicenull> ceph pg 0.c query -> Error ENOENT: i don't have pgid 0.c
[16:33] <Be-El> thole3: and it sometimes works?
[16:34] <Be-El> devicenull: a single acting osd for the pg is probably not enough to satisfy the pool's requirements
[16:34] <devicenull> yea
[16:34] <devicenull> but, I don't know why there's only a single OSD (it's not a crush issue), nor how to fix it
[16:34] <devicenull> plus, I'd expect pg query to work regardless, no?
[16:35] <thole3> Be-El: ~ every 9th time i run the command.
[16:35] * shakamunyi (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) has joined #ceph
[16:36] * Nacer_ (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[16:36] * shakamunyi (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) Quit ()
[16:36] * shakamunyi (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) has joined #ceph
[16:38] <devicenull> odd, restarted a couple osds and now pg query works
[16:38] <devicenull> https://gist.githubusercontent.com/devicenull/e0357308f52377d7ee54/raw/5cf41e712b4df817e9b5bf3e47167f5e671cbcde/gistfile1.txt
[16:38] <devicenull> I still have no idea why that PG is stuck on one OSD though
[16:39] * Manshoon_ (~Manshoon@208.184.50.130) Quit (Ping timeout: 480 seconds)
[16:42] <jsheeren> good afternoon
[16:42] <jsheeren> i have a question about flapping osds..
[16:42] * MertsA (~oftc-webi@wsip-68-15-214-247.at.at.cox.net) Quit (Quit: Page closed)
[16:42] <jsheeren> i'm testing ceph at the moment, we have 16 osds spread over 4 physical servers
[16:43] <jsheeren> due to networking issues, 2 of those servers went down
[16:43] <jsheeren> they are back up now, ceph did the recovery fine
[16:43] <jsheeren> but now, i see in the logs: osd.X map eXXXX wrongly marked me down
[16:44] <jsheeren> every 20 to 30 seconds, several osds go down and then come back up again
[16:44] * JayJ (~jayj@157.130.21.226) Quit (Remote host closed the connection)
[16:45] <jsheeren> the cluster health does not come back to OK, stays in warning
[16:46] <jsheeren> 10% of the pg are peering, 25% are remapped+peering
[16:46] <jsheeren> how can i fix the cluster?
[16:46] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[16:46] * kanagaraj (~kanagaraj@121.244.87.117) Quit (Ping timeout: 480 seconds)
[16:46] <jsheeren> the network issues we had are fixed, every server can communicate fine
[16:51] * Bored (~Joppe4899@7R2AAAHFL.tor-irc.dnsbl.oftc.net) Quit ()
[16:51] * Bromine (~Szernex@176.10.99.209) has joined #ceph
[16:55] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[17:00] <jsheeren> OK
[17:00] <jsheeren> a reboot of the server where the most OSDs went a-flappin' fixed it
[17:01] * nsoffer (~nsoffer@bzq-109-64-255-30.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[17:03] * debian112 (~bcolbert@24.126.201.64) has joined #ceph
[17:04] * Miouge (~Miouge@94.136.92.20) Quit (Quit: Miouge)
[17:04] <Be-El> thole3: how many monitoring nodes do you have in your setup?
[17:05] <Be-El> devicenull: did you restart osd.83, too?
[17:05] <devicenull> yes
[17:06] <devicenull> so, these have been stuck like this for awhile. I actually just upgraded to hammer (from giant) in the hopes it would fix it
[17:06] <devicenull> so the entire cluster has undergone a reboot
[17:06] <thole3> Be-El: 3
[17:06] <Be-El> jsheeren: the flapping might be due to the backfilling traffic. The OSDs use the same network for their heartbeats, AFAIK
[17:07] * Miouge (~Miouge@94.136.92.20) has joined #ceph
[17:08] <Be-El> thole3: did you try to install Ceph several times? My gut feeling is that you might have a mixed-up cluster configuration from several installation attempts. It would also explain why you have different admin keys on different hosts
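One way to check for the mismatched-key scenario Be-El describes (a sketch assuming default paths; the monitor data directory name depends on the mon id) is to compare the admin key the cluster actually stores with the keyring each node carries:

    # on a monitor node, authenticate as mon. to bypass the broken client.admin key
    ceph --name mon. --keyring /var/lib/ceph/mon/ceph-<mon-id>/keyring auth get client.admin
    # compare with what each client host has
    cat /etc/ceph/ceph.client.admin.keyring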
[17:09] <Be-El> devicenull: how many pgs are currently not in an active+clean state?
[17:09] * reed (~reed@75-101-54-131.dsl.static.fusionbroadband.com) has joined #ceph
[17:09] <devicenull> 3
[17:10] <devicenull> https://gist.githubusercontent.com/devicenull/cd8fc8199427eab91ff6/raw/ec79b6e61c5512171dc3bb798346145a6ba07099/gistfile1.txt
[17:10] <thole3> Be-El: Yes, I have tried the install several times. I am removing the Ceph services from the nodes now and will try again. Is there anything other than the /etc/ceph folder I need to remove manually to be sure of a clean install?
[17:10] * kawa2014 (~kawa@212.77.3.87) Quit (Ping timeout: 480 seconds)
[17:10] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[17:11] * nsoffer (~nsoffer@109.64.255.30) has joined #ceph
[17:12] <jsheeren> Be-El: thanks for the explanation... going to read up on backfilling
[17:15] <Be-El> thole3: you might also want to remove the actual osd/mon data in /var/lib/ceph
[17:15] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[17:15] <thole3> Be-El: I will try that.
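If the cluster was rolled out with ceph-deploy (an assumption; adjust to whatever deployment tool was actually used), a full teardown before reinstalling is usually run from the admin node along these lines, with node1..node3 as placeholder hostnames:

    ceph-deploy purge node1 node2 node3        # remove the ceph packages and /etc/ceph
    ceph-deploy purgedata node1 node2 node3    # remove the mon/osd data under /var/lib/ceph
    ceph-deploy forgetkeys                     # drop the locally cached keyrings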
[17:16] <Be-El> jsheeren: you can limit backfilling with several settings; the defaults might be too resource-intensive in some setups
[17:17] <Be-El> jsheeren: http://t75766.file-systems-ceph-user.file-systemstalk.us/backfill-and-recovery-traffic-shaping-t75766.html the first post contains some of the settings that might help reduce the backfilling traffic
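The settings discussed in that post are typically applied either at runtime with injectargs or persistently in ceph.conf; a hedged example with deliberately conservative values (tune to the hardware):

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

    # or persistently, under [osd] in ceph.conf:
    #   osd max backfills = 1
    #   osd recovery max active = 1
    #   osd recovery op priority = 1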
[17:18] <Be-El> devicenull: is pool 0 a replicated or an erasure coded pool?
[17:19] <devicenull> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 3 object_hash rjenkins pg_num 64 pgp_num 64 last_change 26701 crash_replay_interval 45 min_read_recency_for_promote 1 stripe_width 0
[17:19] <devicenull> wait, is that the first part of the pg ID?
[17:19] <devicenull> so 0.c is PG c in pool 0?
[17:21] <Be-El> yes
[17:21] <devicenull> aha, I didn't know that!
[17:21] <jsheeren> Be-El: allright, thanks for the info
[17:21] <devicenull> that makes it easy actually, pool 0 is unused so I'll just nuke it
[17:21] * karnan (~karnan@106.51.240.201) has joined #ceph
[17:21] * Bromine (~Szernex@789AAALFX.tor-irc.dnsbl.oftc.net) Quit ()
[17:21] * MatthewH12 (~Scrin@edwardsnowden0.torservers.net) has joined #ceph
[17:22] <devicenull> unless of course, ceph needs the default 'data' pool?
[17:22] <Be-El> it is the default pool for CephFS. If you do not use CephFS, you can discard the pool
[17:23] * kefu is now known as kefu|afk
[17:23] <Be-El> (actually it used to be the default pool; newer versions of Ceph do not create it automatically anymore)
[17:23] <devicenull> great!
[17:23] <devicenull> removed the pool, cluster is happy again
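For reference, identifying and removing an unused pool such as 'data' generally looks like this (the delete is irreversible, which is why the pool name must be given twice plus a confirmation flag):

    ceph osd lspools      # list pool ids and names (pool 0 is the first part of a PG id like 0.c)
    rados df              # confirm the pool really holds no objects
    ceph osd pool delete data data --yes-i-really-really-mean-it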
[17:25] * JV (~chatzilla@204.14.239.55) has joined #ceph
[17:26] * JV_ (~chatzilla@204.14.239.107) has joined #ceph
[17:27] * ajazdzewski (~ajazdzews@lpz-66.sprd.net) Quit (Ping timeout: 480 seconds)
[17:30] * achieva (ZISN2.9G@foresee.postech.ac.kr) Quit (Quit: Http://www.ZeroIRC.NET ?? Zero IRC ?? Ver 2.9G)
[17:30] * jsheeren (~smuxi@systeembeheer.combell.com) Quit (Read error: Connection reset by peer)
[17:32] <debian112> How big does the journal get? I am going to partition 1 SSD drive into 3 partitions to store the journals for 3x4TB drives.
[17:32] * cholcombe (~chris@pool-108-42-124-94.snfcca.fios.verizon.net) has joined #ceph
[17:33] <debian112> will 50GB per drive be more than enough?
[17:33] * JV (~chatzilla@204.14.239.55) Quit (Ping timeout: 480 seconds)
[17:34] * kefu|afk (~kefu@114.86.209.84) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[17:35] <Be-El> debian112: it depends on your requirements. see the "journal settings" paragraph in http://ceph.com/docs/master/rados/configuration/osd-config-ref/
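The rule of thumb from that page is that the journal should hold roughly twice what the OSD can ingest during one filestore sync interval, i.e. journal size >= 2 * (expected throughput * filestore max sync interval). A quick, hedged example for a single 4TB SATA disk writing ~150MB/s with the default 5s sync interval:

    2 * 150 MB/s * 5 s = 1500 MB

so even a 5-10GB journal partition is generous and 50GB per disk is far more than the journal will ever use. The size is set in MB in ceph.conf, for example:

    [osd]
    osd journal size = 10240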
[17:35] * nsoffer (~nsoffer@109.64.255.30) Quit (Quit: Segmentation fault (core dumped))
[17:35] * alram (~alram@cpe-172-250-2-46.socal.res.rr.com) has joined #ceph
[17:36] <debian112> Be-El: Thanks, I assume many people are running the default?
[17:37] <debian112> I will read over it
[17:37] <s3an2> Just out of interest what SSD are you using for the journal?
[17:37] <debian112> s3an2: Intel SSD DC S3500 Series 300GB, 2.5" SATA 6Gb/s, 20nm MLC (p/n: SSDSC2BB300G4)
[17:38] * blynch (~blynch@vm-nat.msi.umn.edu) Quit (Ping timeout: 480 seconds)
[17:38] * joshd1 (~jdurgin@68-119-140-18.dhcp.ahvl.nc.charter.com) has joined #ceph
[17:38] <debian112> PCI-E would be better
[17:39] <debian112> It blows the budget
[17:40] <s3an2> yea, could you increase the disk-to-journal ratio if using a PCI-E SSD?
[17:44] * blynch (~blynch@vm-nat.msi.umn.edu) has joined #ceph
[17:45] * puffy (~puffy@50.185.218.255) has joined #ceph
[17:45] * puffy (~puffy@50.185.218.255) Quit ()
[17:45] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[17:46] * treenerd (~treenerd@85.193.140.98) Quit (Ping timeout: 480 seconds)
[17:48] * oro (~oro@2001:620:20:16:2cea:29e7:e613:15fa) Quit (Ping timeout: 480 seconds)
[17:51] * MatthewH12 (~Scrin@7R2AAAHK1.tor-irc.dnsbl.oftc.net) Quit ()
[17:51] * Mattress (~Kottizen@chomsky.torservers.net) has joined #ceph
[17:52] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[17:54] * bandrus (~brian@36.sub-70-211-68.myvzw.com) has joined #ceph
[17:58] * nsoffer (~nsoffer@bzq-109-64-255-30.red.bezeqint.net) has joined #ceph
[17:59] * jordanP (~jordan@213.215.2.194) Quit (Quit: Leaving)
[18:01] * vbellur (~vijay@122.167.250.154) has joined #ceph
[18:02] * shakamunyi (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:07] * yghannam (~yghannam@0001f8aa.user.oftc.net) has joined #ceph
[18:09] * jwilkins (~jwilkins@c-50-131-97-162.hsd1.ca.comcast.net) has joined #ceph
[18:10] * jwilkins (~jwilkins@c-50-131-97-162.hsd1.ca.comcast.net) Quit ()
[18:12] * jclm (~jclm@192.16.26.2) has joined #ceph
[18:12] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[18:13] * kawa2014 (~kawa@212.77.30.29) has joined #ceph
[18:14] * gregmark (~Adium@68.87.42.115) has joined #ceph
[18:14] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[18:18] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:18] * Kioob`Taff (~plug-oliv@2a01:e35:2e8a:1e0::42:10) Quit (Quit: Leaving.)
[18:21] * Mattress (~Kottizen@8Q4AAANGT.tor-irc.dnsbl.oftc.net) Quit ()
[18:21] * Quackie (~Teddybare@185-4-227-34.turkrdns.com) has joined #ceph
[18:22] <cmdrk> s3an2: depends on the performance. I would guesstimate 100MB/s of journal performance per OSD
[18:22] <cmdrk> some of those PCI-E SSDs are like 1200MB/s
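A common way to sanity-check how an SSD behaves as a journal device is a small synchronous dd write test (destructive to whatever it is pointed at, so use a spare partition or scratch file; /dev/sdX is a placeholder):

    # small synchronous writes, roughly the journal's write pattern
    dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync
    # sequential throughput
    dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct,dsync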
[18:26] * cdelatte (~cdelatte@2606:a000:6e63:4c00:fcaa:83b2:9be6:8591) Quit (Quit: Leaving)
[18:29] * thomnico (~thomnico@AToulouse-654-1-311-33.w86-199.abo.wanadoo.fr) Quit (Quit: Ex-Chat)
[18:29] <s3an2> yea, I am currently working with 10x Seagate Cheetah 15k 600GB drives, so the journal does need to perform well; the Intel DC S3500 may not be up to the job but a PCI-E SSD may do it
[18:32] * Hemanth (~Hemanth@117.221.97.60) has joined #ceph
[18:33] * cdelatte (~cdelatte@2606:a000:6e63:4c00:fcaa:83b2:9be6:8591) has joined #ceph
[18:36] * joshd1 (~jdurgin@68-119-140-18.dhcp.ahvl.nc.charter.com) Quit (Quit: Leaving.)
[18:37] * oro (~oro@80-219-254-208.dclient.hispeed.ch) has joined #ceph
[18:40] * treenerd (~treenerd@178.115.133.54.wireless.dyn.drei.com) has joined #ceph
[18:40] <cmdrk> we've got 12-disk Ceph nodes; I've been waiting for those PCI-E SSDs to get cheap enough to upgrade our whole cluster.
[18:43] <s3an2> Would you use just one PCI-E SSD for the 12 drives?
[18:46] * jordanP (~jordan@scality-jouf-2-194.fib.nerim.net) has joined #ceph
[18:48] * nsoffer (~nsoffer@bzq-109-64-255-30.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[18:48] * linjan (~linjan@213.8.240.146) has joined #ceph
[18:48] * shakamunyi (~shakamuny@209.66.74.34) has joined #ceph
[18:49] * xarses (~andreww@12.164.168.117) has joined #ceph
[18:50] <cmdrk> s3an2: that's what I'm hoping to do, yeah.
[18:51] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[18:51] * Quackie (~Teddybare@7R2AAAHPL.tor-irc.dnsbl.oftc.net) Quit ()
[18:51] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[18:51] * Enikma (~tokie@lumumba.torservers.net) has joined #ceph
[18:53] * branto (~branto@nat-pool-brq-t.redhat.com) has left #ceph
[18:54] * smerz (~ircircirc@37.74.194.90) Quit (Ping timeout: 480 seconds)
[18:54] * marrusl (~mark@cpe-24-90-46-248.nyc.res.rr.com) Quit (Quit: bye!)
[18:54] * marrusl (~mark@cpe-24-90-46-248.nyc.res.rr.com) has joined #ceph
[18:54] * kanagaraj (~kanagaraj@27.7.34.223) has joined #ceph
[18:55] * amote (~amote@1.39.15.132) has joined #ceph
[18:59] * daniel2_ (~daniel@cpe-24-28-6-151.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[19:00] * treenerd (~treenerd@178.115.133.54.wireless.dyn.drei.com) Quit (Ping timeout: 480 seconds)
[19:01] * dyasny (~dyasny@173.231.115.58) Quit (Ping timeout: 480 seconds)
[19:02] * sleinen1 (~Adium@2001:620:0:82::108) Quit (Ping timeout: 480 seconds)
[19:02] * shakamunyi (~shakamuny@209.66.74.34) Quit (Remote host closed the connection)
[19:04] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has joined #ceph
[19:06] * Sysadmin88 (~IceChat77@054527d3.skybroadband.com) has joined #ceph
[19:06] * linjan (~linjan@213.8.240.146) Quit (Remote host closed the connection)
[19:07] * Hemanth (~Hemanth@117.221.97.60) Quit (Ping timeout: 480 seconds)
[19:08] * shylesh (~shylesh@123.136.222.8) has joined #ceph
[19:09] * Karcaw (~evan@71-95-122-38.dhcp.mdfd.or.charter.com) Quit (Read error: Connection reset by peer)
[19:10] * Karcaw (~evan@71-95-122-38.dhcp.mdfd.or.charter.com) has joined #ceph
[19:11] * jwilkins (~jwilkins@c-50-131-97-162.hsd1.ca.comcast.net) has joined #ceph
[19:11] * treenerd (~treenerd@91.141.5.177.wireless.dyn.drei.com) has joined #ceph
[19:11] * jwilkins (~jwilkins@c-50-131-97-162.hsd1.ca.comcast.net) Quit ()
[19:12] * Nacer_ (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[19:13] * kanagaraj (~kanagaraj@27.7.34.223) Quit (Ping timeout: 480 seconds)
[19:13] * qstion_ (~qstion@37.157.144.44) Quit (Remote host closed the connection)
[19:15] * ircolle is now known as ircolle-afk
[19:15] * jwilkins (~jwilkins@c-50-131-97-162.hsd1.ca.comcast.net) has joined #ceph
[19:18] * pvh_sa_ (~pvh@105-237-253-44.access.mtnbusiness.co.za) has joined #ceph
[19:19] * shakamunyi (~shakamuny@209.66.74.34) has joined #ceph
[19:19] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Ping timeout: 480 seconds)
[19:20] * Nacer_ (~Nacer@252-87-190-213.intermediasud.com) Quit (Ping timeout: 480 seconds)
[19:21] * Enikma (~tokie@7R2AAAHR2.tor-irc.dnsbl.oftc.net) Quit ()
[19:21] * ZombieTree (~cmrn@95.128.43.164) has joined #ceph
[19:22] * rendar (~I@host154-179-dynamic.12-79-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[19:22] * karnan (~karnan@106.51.240.201) Quit (Remote host closed the connection)
[19:24] * jwilkins (~jwilkins@c-50-131-97-162.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[19:25] * rendar (~I@host154-179-dynamic.12-79-r.retail.telecomitalia.it) has joined #ceph
[19:26] * kanagaraj (~kanagaraj@27.7.34.223) has joined #ceph
[19:27] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) has joined #ceph
[19:27] * jwilkins (~jwilkins@c-50-131-97-162.hsd1.ca.comcast.net) has joined #ceph
[19:28] * puffy (~puffy@64.191.206.83) has joined #ceph
[19:29] * sleinen1 (~Adium@2001:620:0:82::100) has joined #ceph
[19:32] * treenerd (~treenerd@91.141.5.177.wireless.dyn.drei.com) Quit (Ping timeout: 480 seconds)
[19:34] * thomnico (~thomnico@92.175.70.202) has joined #ceph
[19:35] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[19:36] * Kupo1 (~tyler.wil@23.111.254.159) has joined #ceph
[19:37] <Anticimex> you have to validate the SSD endurance vs. the expected write levels, to choose a device with an appropriate endurance rating
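A rough, illustrative endurance check (all numbers hypothetical): a drive rated for 150 TBW over a 5-year warranty sustains about

    150,000 GB / (5 * 365 days) ~= 82 GB/day

so three OSDs journalling through that one SSD would share roughly 27 GB/day of write headroom each; if the expected client write rate per OSD is higher than that, a higher-endurance (or PCI-E) device is the safer choice.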
[19:41] * cpceph (~Adium@67.21.63.155) has joined #ceph
[19:41] * sleinen1 (~Adium@2001:620:0:82::100) Quit (Read error: Connection reset by peer)
[19:43] * LeaChim (~LeaChim@host86-171-90-60.range86-171.btcentralplus.com) has joined #ceph
[19:48] * kawa2014 (~kawa@212.77.30.29) Quit (Quit: Leaving)
[19:49] * harold (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) Quit (Quit: Leaving)
[19:51] * kanagaraj (~kanagaraj@27.7.34.223) Quit (Quit: Leaving)
[19:51] * dgbaley27 (~matt@c-67-176-93-83.hsd1.co.comcast.net) has joined #ceph
[19:51] * ZombieTree (~cmrn@789AAALM1.tor-irc.dnsbl.oftc.net) Quit ()
[19:51] * tZ (~Dysgalt@lumumba.torservers.net) has joined #ceph
[19:52] * thole3 (~Thomas@85.218.244.90) Quit (Quit: Leaving)
[19:55] * daniel2_ (~daniel@209.163.140.194) has joined #ceph
[19:58] * davidz1 (~davidz@cpe-23-242-189-171.socal.res.rr.com) has joined #ceph
[19:59] * shaunm (~shaunm@74.215.76.114) Quit (Quit: Ex-Chat)
[19:59] * shaunm (~shaunm@74.215.76.114) has joined #ceph
[20:00] * thomnico (~thomnico@92.175.70.202) Quit (Ping timeout: 480 seconds)
[20:01] * Be-El (~quassel@fb08-bcf-pc01.computational.bio.uni-giessen.de) Quit (Remote host closed the connection)
[20:01] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[20:02] * davidz (~davidz@2605:e000:1313:8003:68f5:c7eb:91f1:371c) Quit (Ping timeout: 480 seconds)
[20:03] * xarses (~andreww@12.164.168.117) Quit (Ping timeout: 480 seconds)
[20:04] * clutchkicker (~Adium@140.247.242.52) has joined #ceph
[20:04] * amote (~amote@1.39.15.132) Quit (Quit: Leaving)
[20:05] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) has joined #ceph
[20:09] * bene (~ben@nat-pool-bos-t.redhat.com) has joined #ceph
[20:09] <clutchkicker> rookie question about disk config: thinking of doing hardware RAID 6 on OSD nodes with 12 4TB disks. Does this sound logical? Also, is there any optimal stripe element size?
[20:10] * pvh_sa_ (~pvh@105-237-253-44.access.mtnbusiness.co.za) Quit (Ping timeout: 480 seconds)
[20:10] <clutchkicker> default on the card is 64KB
[20:11] * rahatm1 (~rahatm1@w134-87-146-216.wireless.uvic.ca) has joined #ceph
[20:14] * rahatm1 (~rahatm1@w134-87-146-216.wireless.uvic.ca) Quit (Remote host closed the connection)
[20:14] * vbellur (~vijay@122.167.250.154) Quit (Ping timeout: 480 seconds)
[20:14] * xarses (~andreww@12.164.168.117) has joined #ceph
[20:15] * sage (~quassel@2607:f298:6050:709d:f4ed:df5a:830d:eee) Quit (Ping timeout: 480 seconds)
[20:15] <cmdrk> clutchkicker: generally, folks recommend against using RAID with Ceph OSDs. You should JBOD the disks, then put 1 OSD per disk, and let Ceph handle redundancy
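With JBOD, each disk becomes its own OSD, usually with its journal on an SSD partition. A hedged sketch using ceph-deploy (one of several deployment tools; hostnames and device names are placeholders):

    ceph-deploy disk zap node1:sdb
    ceph-deploy osd create node1:sdb:/dev/sdf1   # data on sdb, journal on SSD partition sdf1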
[20:16] <clutchkicker> ah ok, I had a feeling that was the case.
[20:18] * rahatm1_ (~rahatm1@w134-87-146-216.wireless.uvic.ca) has joined #ceph
[20:18] * rahatm1 (~rahatm1@w134-87-146-216.wireless.uvic.ca) has joined #ceph
[20:18] * rahatm1_ (~rahatm1@w134-87-146-216.wireless.uvic.ca) Quit (Read error: Connection reset by peer)
[20:19] <clutchkicker> Is this because Ceph writes in triplicate?
[20:20] <clutchkicker> Is that the protection level most people rely on?
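The protection model being referred to is per-pool replication (commonly size 3, i.e. three copies placed on different hosts by CRUSH) rather than per-node RAID. The replica count is a per-pool setting; using the default 'rbd' pool purely as an example:

    ceph osd pool get rbd size
    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2   # keep serving I/O with one replica missing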
[20:20] * pvh_sa (~pvh@197.79.0.113) has joined #ceph
[20:21] * rahatm1 (~rahatm1@w134-87-146-216.wireless.uvic.ca) Quit (Remote host closed the connection)
[20:21] * brutuscat (~brutuscat@74.Red-88-8-87.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[20:21] * tZ (~Dysgalt@0SGAAAH58.tor-irc.dnsbl.oftc.net) Quit ()
[20:22] * bandrus (~brian@36.sub-70-211-68.myvzw.com) Quit (Quit: Leaving.)
[20:23] * JV_ is now known as JV
[20:26] * Concubidated (~Adium@71.21.5.251) has joined #ceph
[20:26] * Oddtwang (~Mattress@TerokNor.tor-exit.network) has joined #ceph
[20:31] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) Quit (Quit: Leaving.)
[20:34] * jordanP (~jordan@scality-jouf-2-194.fib.nerim.net) Quit (Quit: Leaving)
[20:38] * Nacer (~Nacer@2001:41d0:fe82:7200:4de8:9453:3c33:11a9) has joined #ceph
[20:39] * ngoswami (~ngoswami@121.244.87.116) Quit (Quit: Leaving)
[20:40] * dgurtner_ (~dgurtner@178.197.231.80) Quit (Ping timeout: 480 seconds)
[20:42] * sage (~quassel@2607:f298:6050:709d:1c54:454c:5aca:7598) has joined #ceph
[20:42] * ChanServ sets mode +o sage
[20:42] * JV (~chatzilla@204.14.239.107) Quit (Ping timeout: 480 seconds)
[20:43] * dgbaley27 (~matt@c-67-176-93-83.hsd1.co.comcast.net) Quit (Remote host closed the connection)
[20:44] * shylesh (~shylesh@123.136.222.8) Quit (Remote host closed the connection)
[20:46] * ircolle-afk (~Adium@2601:1:a580:1735:80f7:499a:4cdb:8288) Quit (Quit: Leaving.)
[20:49] * ircolle (~ircolle@c-71-229-136-109.hsd1.co.comcast.net) has joined #ceph
[20:50] * i_m (~ivan.miro@pool-109-191-92-175.is74.ru) Quit (Ping timeout: 480 seconds)
[20:52] * ircolle1 (~Adium@2601:1:a580:1735:85b0:e558:fa8:e21d) has joined #ceph
[20:53] * pvh_sa (~pvh@197.79.0.113) Quit (Read error: Connection reset by peer)
[20:56] * Oddtwang (~Mattress@53IAAAHLC.tor-irc.dnsbl.oftc.net) Quit ()
[20:56] * Gibri (~Ian2128@torland1-this.is.a.tor.exit.server.torland.is) has joined #ceph
[20:56] * floppyraid (~holoirc@202.161.23.74) Quit (Ping timeout: 480 seconds)
[21:00] * ircolle (~ircolle@c-71-229-136-109.hsd1.co.comcast.net) Quit (Ping timeout: 480 seconds)
[21:04] * Gibri (~Ian2128@53IAAAHNJ.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[21:06] * wayneeseguin (~wayneeseg@mp64.overnothing.com) Quit (Quit: *poof*)
[21:16] * bandrus (~brian@36.sub-70-211-68.myvzw.com) has joined #ceph
[21:21] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) has joined #ceph
[21:23] * sleinen1 (~Adium@2001:620:0:82::101) has joined #ceph
[21:28] <debian112> s3an2: I would assume yeah
[21:28] <debian112> the ratio can go higher, up to 5 OSDs per SSD
[21:29] <debian112> I would be afraid to go beyond that
[21:29] <m0zes> we do 8 osd per ssd.
[21:29] <debian112> Pci-e ssd?
[21:29] <m0zes> nope, normal ssd.
[21:29] <debian112> m0zes how is the speed?
[21:29] * sleinen (~Adium@84-72-160-233.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[21:30] <m0zes> the ssds themselves aren't fast on a per writer basis.
[21:30] <debian112> but the overall cluster?
[21:31] <debian112> m0zes: so you have 8 partitions on 1 SSD?
[21:32] * bandrus (~brian@36.sub-70-211-68.myvzw.com) Quit (Quit: Leaving.)
[21:32] <m0zes> 11: OS, swap, an SSD OSD, and 8 journals
[21:32] * angdraug (~angdraug@12.164.168.117) has joined #ceph
[21:33] <m0zes> we haven't pushed it too hard yet, but >10000 IOPs and 2GB/s writes with 24 ceph nodes.
[21:33] * bandrus (~brian@36.sub-70-211-68.myvzw.com) has joined #ceph
[21:34] <m0zes> with a caching tier in front of an EC pool.
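For context, a writeback cache tier in front of an erasure-coded pool is wired up with the tier commands; a minimal hedged sketch with placeholder pool names ('ecpool', 'cachepool') and an illustrative cache cap:

    ceph osd tier add ecpool cachepool
    ceph osd tier cache-mode cachepool writeback
    ceph osd tier set-overlay ecpool cachepool
    ceph osd pool set cachepool hit_set_type bloom
    ceph osd pool set cachepool target_max_bytes 1000000000000   # ~1TB, adjust to taste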
[21:34] <debian112> ok: you have 2u servers
[21:35] <debian112> right?
[21:35] <m0zes> yes. Dell 730xd.
[21:36] <m0zes> 16x spinning disks, 2x SSDs, 1x Dual-Port 40GbE nic.
[21:37] <m0zes> the SSDs are Lite-On ECT-480N9S drives. Not wonderful, but they haven't been atrocious.
[21:37] <debian112> how big are the journals?
[21:38] <m0zes> 37GB iirc.
[21:38] <m0zes> for DSYNC operations you need at least 12x simultaneous writers to get the stated IOPs.
[21:38] <debian112> each?
[21:38] <m0zes> yep.
[21:39] <debian112> 12 in a cluster, not per box, right?
[21:40] <m0zes> 12 writers per ssd.
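The multi-writer behaviour m0zes describes can be measured with fio using synchronous 4k writes and a variable job count (destructive to the target device; /dev/sdX and the job count are placeholders):

    fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=write --bs=4k --numjobs=12 --iodepth=1 \
        --runtime=60 --time_based --group_reporting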
[21:40] * thomnico (~thomnico@92.175.70.202) has joined #ceph
[21:40] * pvh_sa (~pvh@105-237-253-44.access.mtnbusiness.co.za) has joined #ceph
[21:41] * bandrus (~brian@36.sub-70-211-68.myvzw.com) Quit (Ping timeout: 480 seconds)
[21:42] <debian112> I was looking at using a ratio of 1 SSD to 3x4TB drives.
[21:43] <debian112> the server will have 3 SSDs for journals and 9x4TB data drives
[21:44] <debian112> I have a 3U server with space for 16 drives
[21:44] * fxmulder (~fxmulder@cpe-24-55-6-128.austin.res.rr.com) has joined #ceph
[21:45] * shohn1 (~shohn@dslc-082-082-188-008.pools.arcor-ip.net) Quit (Quit: Leaving.)
[21:46] * Bored (~drupal@199.188.100.154) has joined #ceph
[21:47] * alram (~alram@cpe-172-250-2-46.socal.res.rr.com) Quit (Quit: leaving)
[21:47] * alram (~alram@cpe-172-250-2-46.socal.res.rr.com) has joined #ceph
[21:49] * fxmulder_ (~fxmulder@cpe-24-55-6-128.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[21:51] * derjohn_mobi (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[22:01] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[22:04] * rotbeard (~redbeard@2a02:908:df10:d300:76f0:6dff:fe3b:994d) has joined #ceph
[22:08] * cpceph (~Adium@67.21.63.155) Quit (Quit: Leaving.)
[22:08] * thomnico (~thomnico@92.175.70.202) Quit (Read error: Connection reset by peer)
[22:13] * ChrisNBlum (~ChrisNBlu@178.255.153.117) Quit (Quit: My Mac has gone to sleep. ZZZzzz???)
[22:16] * Bored (~drupal@8Q4AAANIY.tor-irc.dnsbl.oftc.net) Quit ()
[22:17] * DV (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[22:20] * maku2 (~richardus@tor-exit.mathijs.info) has joined #ceph
[22:20] * sleinen1 (~Adium@2001:620:0:82::101) Quit (Ping timeout: 480 seconds)
[22:22] * dupont-y (~dupont-y@2a01:e34:ec92:8070:b090:7f7a:2cfb:2210) has joined #ceph
[22:23] * dgurtner (~dgurtner@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[22:25] <devicenull> so, i have a pool with replication set to 2
[22:25] <devicenull> I have a 3 pgs that are stuck with only one OSD
[22:25] <devicenull> ceph pg map shows 'osdmap e34153 pg 3.20f (3.20f) -> up [102] acting [102]'
[22:25] <devicenull> how would I figure out why this isn't mapped elsewhere?
[22:26] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[22:31] <dmick> ceph pg query can help IIRC
[22:32] <devicenull> sure, that prints a bunch of stuff, but it's not very obvious about what's wrong
[22:32] * nsoffer (~nsoffer@bzq-109-64-255-30.red.bezeqint.net) has joined #ceph
[22:34] * wicope (~wicope@0001fd8a.user.oftc.net) Quit (Read error: Connection reset by peer)
[22:41] * daniel2_ (~daniel@209.163.140.194) Quit (Ping timeout: 480 seconds)
[22:44] <devicenull> ahh
[22:45] <devicenull> possibly running into what chooseleaf_vary_r fixes
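If that is indeed the cause, chooseleaf_vary_r is a CRUSH tunable. One hedged way to enable it is to edit the decompiled map (expect data movement when it is injected), or switch to the firefly tunables profile if all clients are new enough:

    ceph osd crush show-tunables              # check the current value
    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
    # add or change the line:  tunable chooseleaf_vary_r 1
    crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
    ceph osd setcrushmap -i /tmp/crushmap.new
    # alternatively: ceph osd crush tunables firefly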
[22:45] <loicd> what's the canonical URL for Red Velvet ?
[22:47] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Quit: Leaving.)
[22:49] * bkopilov (~bkopilov@bzq-79-180-203-79.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[22:49] * tw0fish (~tw0fish@UNIX1.ANDREW.CMU.EDU) Quit (Quit: leaving)
[22:50] * maku2 (~richardus@789AAALT4.tor-irc.dnsbl.oftc.net) Quit ()
[22:50] * demonspork (~Szernex@2.shulgin.nl.torexit.haema.co.uk) has joined #ceph
[22:51] * derjohn_mobi (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[22:52] * tupper (~tcole@2001:420:2280:1272:8900:f9b8:3b49:567e) Quit (Ping timeout: 480 seconds)
[22:52] * segutier (~segutier@207.236.250.131) has joined #ceph
[22:54] * dupont-y (~dupont-y@2a01:e34:ec92:8070:b090:7f7a:2cfb:2210) Quit (Ping timeout: 480 seconds)
[22:55] * adeel (~adeel@2602:ffc1:1:face:50a2:3e38:867d:95b9) has joined #ceph
[22:57] * sjmtest (uid32746@id-32746.uxbridge.irccloud.com) has joined #ceph
[22:57] <dmick> https://access.redhat.com/products/red-hat-ceph-storage, maybe?
[22:58] * KevinPerks (~Adium@cpe-75-177-32-14.triad.res.rr.com) Quit (Remote host closed the connection)
[22:59] <devicenull> the new (hammer) ceph status display is a lot more readable!
[22:59] <devicenull> the one status item per line, I mean
[23:01] * bkopilov (~bkopilov@bzq-79-180-203-79.red.bezeqint.net) has joined #ceph
[23:03] * JV (~chatzilla@204.14.239.17) has joined #ceph
[23:06] * m0zes (~mozes@beocat.cis.ksu.edu) Quit (Quit: WeeChat 1.1.1)
[23:09] * m0zes (~mozes@beocat.cis.ksu.edu) has joined #ceph
[23:09] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)
[23:09] <carmstrong> is "too many PGs per OSD (1536 > max 300)" a new warning in hammer? we're deploying in exactly the same way - same number of PGs, same number of OSDs, and have not seen this warning before
[23:09] <carmstrong> these are smaller hosts, so maybe it's doing a calculation based on memory or something?
[23:11] * puffy (~puffy@64.191.206.83) Quit (Quit: Leaving.)
[23:12] * cpceph (~Adium@162.219.43.194) has joined #ceph
[23:12] * segutier (~segutier@207.236.250.131) Quit (Read error: Connection reset by peer)
[23:20] * demonspork (~Szernex@5NZAACD4I.tor-irc.dnsbl.oftc.net) Quit ()
[23:20] * W|ldCraze (~loft@789AAALV8.tor-irc.dnsbl.oftc.net) has joined #ceph
[23:24] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[23:34] * segutier (~segutier@207.236.250.131) has joined #ceph
[23:37] * Guest4380 (~daniel@cpe-76-91-66-107.socal.res.rr.com) has joined #ceph
[23:37] <cmdrk> carmstrong: i think it's a new warning, yes
[23:38] * Guest4380 (~daniel@cpe-76-91-66-107.socal.res.rr.com) Quit ()
[23:39] <carmstrong> cmdrk: do you happen to know how it's calculated? is 300 a hard number or is it based on resources?
[23:39] <carmstrong> I sent a detailed email to the user list - now I'm wondering if we're doing something wrong
[23:40] <cmdrk> i think it's based off of ye olde Ceph pg formula. (# of OSDs * 100)/# of replicas
[23:41] <cmdrk> although i guess that's calculating the number of placement groups per OSD
[23:42] <carmstrong> yeah
[23:42] <carmstrong> we're using 128 PGs for 3 OSDs
[23:42] <carmstrong> which is the recommendation according to the docs
[23:42] <carmstrong> but we have a size of 3, so each OSD has a copy of all PGs
[23:42] <cmdrk> i saw your email, i think it's 128 PGs per pool, yeah?
[23:42] <carmstrong> yep
[23:43] <cmdrk> i _think_ (someone please correct me if i'm wrong) the PG calculator is actually supposed to be for all pools combined
[23:44] <cmdrk> or something like that -- the docs are not clear IMHO.
[23:44] <carmstrong> that would make a lot more sense
[23:44] <carmstrong> we also see ceph taking a while to become healthy (5 minutes or so) on initial platform start, so I'm thinking this may have something to do with it
[23:45] <carmstrong> I don't know - the docs seem pretty explicit (set pg_num to 128 for <5 OSDs)
[23:46] <cmdrk> hm, yep -- that's true.
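Working backwards from the warning (an inference, since the exact pool count is not given here): with pg_num 128, size 3 and 3 OSDs, every pool puts its full set of PG copies on each OSD, so the 1536 figure is consistent with roughly a dozen pools created that way:

    128 PGs/pool * 3 replicas / 3 OSDs = 128 PG copies per OSD per pool
    1536 / 128 ~= 12 pools

The per-pool guidance in the docs and the per-OSD warning count different things, which is why following the former across many pools can still trip the latter.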
[23:49] * KevinPerks (~Adium@cpe-75-177-32-14.triad.res.rr.com) has joined #ceph
[23:50] * W|ldCraze (~loft@789AAALV8.tor-irc.dnsbl.oftc.net) Quit ()
[23:51] * Thononain (~Skyrider@0.tor.exit.babylon.network) has joined #ceph
[23:54] <cmdrk> FWIW, if you just want the warning to go away, you can set "mon pg warn max per osd"
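That option goes in the [mon] (or [global]) section of ceph.conf, or can be injected at runtime; raising it (or, reportedly, setting it to 0) only silences the check and does not change any PGs. A hedged example, with <id> as a placeholder monitor name:

    [mon]
    mon pg warn max per osd = 2000

    # or at runtime:
    #   ceph tell mon.<id> injectargs '--mon-pg-warn-max-per-osd 2000'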
[23:55] <carmstrong> yeah, just thinking this may have exposed a misunderstanding with pg calculation
[23:55] <cmdrk> i'm not sure if it's documented anywhere on ceph.com -- i saw it fly by on the devel list or irc the other day and dug through the code to find it again
[23:55] <cmdrk> yeah
[23:56] <debian112> anybody using AMD processors?
[23:59] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.