#ceph IRC Log

IRC Log for 2015-02-12

Timestamps are in GMT/BST.

[0:01] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Quit: Leaving.)
[0:03] * joef1 (~Adium@2620:79:0:2420::15) has joined #ceph
[0:04] * joef1 (~Adium@2620:79:0:2420::15) has left #ceph
[0:04] * vilobhmm (~vilobhmm@nat-dip33-wl-g.cfw-a-gci.corp.yahoo.com) has joined #ceph
[0:04] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[0:05] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[0:09] * jdillaman (~jdillaman@pool-108-56-67-212.washdc.fios.verizon.net) has joined #ceph
[0:10] <ohnomrbill> georgem: It is better to separate cluster traffic from client traffic and on separate physical networks if you can afford to do so. However if you have only two 10GbE NICs then it may be better for failover that you bond the interfaces. The added benefit being if you configure for link aggregation which you mentioned. If you can do something like LIFs with QOS that would be better there too
[0:13] <georgem> ohnomrbill: thanks a lot
[0:16] <georgem> both links go to the same switch so much failover there :(
[0:16] <georgem> not much failover there
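A minimal ceph.conf sketch of the public/cluster split ohnomrbill describes (the subnets are made-up examples, not from this log); with only two bonded NICs you would point both settings at the same bond:

    [global]
        public network  = 10.0.1.0/24   # client-facing traffic
        cluster network = 10.0.2.0/24   # OSD replication / backfill traffic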
[0:18] <vilobhmm> loicd : in a cinder+ceph deployment, 1) how do we guarantee that user A does not attach a volume created by user B? 2) if some malicious user gets access to the keyring stored in libvirt on the hypervisor, all the volumes/objects in the pool can be accessed; is there a way to prevent this?
[0:19] <loicd> vilobhmm: the user does not have access to the keyring. (s)he only sees a volume that can be attached to the VM and access as a regular disk. It is no different from a LVM volume.
[0:20] <loicd> openstack ensures the separation of the tenants; it is not related to Ceph in any way
[0:20] <vilobhmm> loicd : if someone gets access to the keyring on the hypervisor, say the admin, won't that be harmful, since they'd be able to access anyone's data?
[0:21] * redf_ (~red@chello084112110034.11.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[0:21] <georgem> vilobhmm: the user could encrypt the volume
[0:22] <loicd> if someone has access to the hypervisor it means (s)he breached OpenStack and I'm not sure what happens then.
[0:22] * ircolle-afk is now known as ircolle
[0:22] <vilobhmm> loicd : yes if someone breached the hypervisor
[0:22] * Sysadmin88 (~IceChat77@94.12.240.104) has joined #ceph
[0:22] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Quit: Leaving.)
[0:22] <vilobhmm> then as of now ceph has no additional level of security to prevent data access, is that correct?
[0:22] <loicd> this someone could tamper with the kvm and play all kind of tricks. I'm tempted to say that if that happens absolutely nothing can protect the user.
[0:23] <vilobhmm> georgem : encrypt the volume… is there a command line option for that? but with volume encryption every read/write will consume cpu cycles
[0:24] <vilobhmm> loicd : ok thanks wanted to confirm the same
[0:24] <vilobhmm> is there something coming in upcoming release that can help prevent such situation to happen
[0:24] <loicd> I'm no security expert but it makes sense to me that someone controlling the CPU your machine runs on pretty much can do everything from pretending that 1+1=40 to leaking your data
[0:25] <loicd> vilobhmm: I'm not sure it's possible but, once again, I'm no expert ;-)
[0:25] <loicd> vilobhmm: are you aware of a storage system that pretends otherwise ?
[0:26] <vilobhmm> loicd : wondering how amazon does it… or any public provider… do they also have all the volumes in the same pool, and if the keyring is compromised can all the volumes be accessed?
[0:27] <loicd> I don't know
[0:29] <vilobhmm> the other option is to have one pool per hypervisor, but that's not a better option: as the number of hypervisors grows you have a limited number of PGs and hence can't grow the number of pools exponentially
[0:29] * dgurtner (~dgurtner@217-162-119-191.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[0:32] <flaf> vilobhmm: normally VM in openstack have not access to the "ceph" network and just see attached block devices.
[0:34] * asalor (~asalor@0001ef37.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:36] * togdon (~togdon@74.121.28.6) has joined #ceph
[0:37] <flaf> s/have not/should not have/
[0:37] <kraken> flaf meant to say: vilobhmm: normally VM in openstack should not have access to the "ceph" network and just see attached block devices.
[0:37] <vilobhmm> kraken , flaf : i understand that
[0:38] <vilobhmm> but if someone in the openstack world has the keyring, what's stopping them from accessing the data/volumes/objects stored in the ceph storage pool?
[0:38] <vilobhmm> and that someone being a malicious user
[0:38] <vilobhmm> loicd, kraken, flaf : ^^
[0:42] * steki (~steki@cable-89-216-229-100.dynamic.sbb.rs) has joined #ceph
[0:42] * nitti (~nitti@162.222.47.218) Quit (Ping timeout: 480 seconds)
[0:42] <flaf> keyring files are in the filesystem of the compute hosts, and VMs can't access them. If a VM can, it's a big bug in openstack. ;) And if an openstack user has a keyring file, his VMs are in a network which has no access to the ceph network.
[0:43] * steki (~steki@cable-89-216-229-100.dynamic.sbb.rs) Quit (Read error: Connection reset by peer)
[0:43] * reed (~reed@net-93-144-229-167.cust.dsl.teletu.it) Quit (Quit: Ex-Chat)
[0:44] * elder__ (~elder@210.177.145.249) has joined #ceph
[0:44] * steki (~steki@cable-89-216-229-100.dynamic.sbb.rs) has joined #ceph
[0:45] <vilobhmm> flaf : And if an openstack user has a keyring file -> not an openstack user but some malicious user who logs on to the hypervisor and fetches the key from libvirt on the compute/hypervisor… are you saying this malicious user won't be able to access the ceph network?
[0:47] * tim|kumina (~tim@82-171-142-11.ip.telfort.nl) Quit (Ping timeout: 480 seconds)
[0:47] * steki (~steki@cable-89-216-229-100.dynamic.sbb.rs) Quit (Read error: Connection reset by peer)
[0:50] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) Quit (Quit: ...)
[0:51] * rohanm (~rohanm@mobile-166-173-185-157.mycingular.net) has joined #ceph
[0:54] <flaf> Normally keyring file has "600" unix rights. If a malicious user can read the file in the compute node, there is a big problem.
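For reference, a quick check of the permissions flaf mentions, assuming a hypothetical client keyring path on the compute node:

    ls -l /etc/ceph/ceph.client.cinder.keyring     # should show -rw------- (600)
    chmod 600 /etc/ceph/ceph.client.cinder.keyring # owned by whichever service account needs to read it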
[0:56] * dmsimard is now known as dmsimard_away
[0:56] * zack_dolby (~textual@nfmv001065130.uqw.ppp.infoweb.ne.jp) has joined #ceph
[1:01] * fsimonce (~simon@host217-37-dynamic.30-79-r.retail.telecomitalia.it) Quit (Quit: Coyote finally caught me)
[1:02] <flaf> But yes, if a malicious user is connected to your compute and can read the keyring file, the user has access to the ceph storage of openstack.
[1:03] <blahnana> which really isn't that hard to understand
[1:03] <blahnana> if your compute is compromised, then any data it has access to is compromised
[1:03] <flaf> exactly. :)
[1:04] <flaf> If a web server is compromised the database used by the server is compromised too, etc.
[1:07] <flaf> Just one clarification: if the compute is compromised, only the pools used by the compute are compromised (pools "volumes", "images"), not the whole ceph cluster.
[1:08] <blahnana> similar to your database analogy
[1:09] <flaf> yes indeed.
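A sketch of how the blast radius flaf describes is usually limited, along the lines of the rbd-openstack docs: give the compute/cinder key capabilities only on the pools it needs (pool names here are the usual examples, adjust to taste):

    ceph auth get-or-create client.cinder mon 'allow r' \
        osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images'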
[1:09] * Rickus (~Rickus@office.protected.ca) has joined #ceph
[1:11] * LeaChim (~LeaChim@host86-159-236-51.range86-159.btcentralplus.com) has joined #ceph
[1:13] * oms101 (~oms101@p20030057EA1A5C00EEF4BBFFFE0F7062.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:14] <CephTestC> Does anyone know, if I lose my write-through cache tier, will services be interrupted? I'm asking because my boss is sure that if the cache goes down everything will keep running, just a little slower.
[1:14] * calvinx (~calvin@103.7.202.198) has joined #ceph
[1:17] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:17] * lcurtis (~lcurtis@47.19.105.250) Quit (Ping timeout: 480 seconds)
[1:21] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:22] * oms101 (~oms101@p20030057EA7F5D00EEF4BBFFFE0F7062.dip0.t-ipconnect.de) has joined #ceph
[1:22] * karis (~karis@78-106-206.adsl.cyta.gr) Quit (Remote host closed the connection)
[1:23] * arbrandes (~arbrandes@189.110.13.102) Quit (Ping timeout: 480 seconds)
[1:28] * ircolle (~Adium@2601:1:a580:145a:d48b:2093:624:ab2a) Quit (Quit: Leaving.)
[1:29] * danieagle (~Daniel@201-95-103-54.dsl.telesp.net.br) Quit (Quit: Obrigado por Tudo! :-) inte+ :-))
[1:31] * togdon (~togdon@74.121.28.6) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[1:34] * rohanm (~rohanm@mobile-166-173-185-157.mycingular.net) Quit (Ping timeout: 480 seconds)
[1:36] * OutOfNoWhere (~rpb@76.8.45.168) has joined #ceph
[1:37] * jdillaman (~jdillaman@pool-108-56-67-212.washdc.fios.verizon.net) Quit (Quit: jdillaman)
[1:51] * LeaChim (~LeaChim@host86-159-236-51.range86-159.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[2:00] * jdillaman (~jdillaman@pool-108-56-67-212.washdc.fios.verizon.net) has joined #ceph
[2:01] * LeaChim (~LeaChim@host86-159-114-39.range86-159.btcentralplus.com) has joined #ceph
[2:04] * zack_dol_ (~textual@nfmv001065130.uqw.ppp.infoweb.ne.jp) has joined #ceph
[2:04] * zack_dolby (~textual@nfmv001065130.uqw.ppp.infoweb.ne.jp) Quit (Read error: Connection reset by peer)
[2:06] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:14] * rmoe (~quassel@12.164.168.117) Quit (Ping timeout: 480 seconds)
[2:14] * alexosaurus (~Adium@111.223.237.4) has joined #ceph
[2:16] <alexosaurus> hello, any tips for troubleshooting high latency lstat calls with cephfs/libceph, but from one endpoint only? the only difference so far I can tell is that the 'fast' machine is connected to all 3 nodes (mon, storage-1, storage-2), whereas the 'slow' machine is only connected to two (mon, storage-2). but, there is no problem with connectivity.
[2:23] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) has joined #ceph
[2:23] * sputnik13 (~sputnik13@74.202.214.170) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[2:24] * saltlake (~saltlake@pool-71-244-62-208.dllstx.fios.verizon.net) has joined #ceph
[2:25] * rmoe (~quassel@173-228-89-134.dsl.static.fusionbroadband.com) has joined #ceph
[2:26] <dmick> alexosaurus: that sounds like there is a connectivity problem; why is the slow machine not connected to storage-1?
[2:26] <alexosaurus> im not sure, how can i find out? just using /etc/fstab to mount it
[2:26] <alexosaurus> as per cephfs guide
[2:27] <dmick> maybe I don't understand what you mean. what makes you say it's "not connected", and/or what does "not connected" mean?
[2:27] <alexosaurus> oh. i am looking in netstat for tcp connections to the mds/storage servers. fast node lists 3 connections to all 3 nodes, slow node lists only 2.
[2:28] <dmick> oh. "has open TCP connections" as opposed to "has a cable"
[2:28] <alexosaurus> right
[2:29] * xarses (~andreww@12.164.168.117) Quit (Ping timeout: 480 seconds)
[2:29] <dmick> still, almost certainly both clients should be able to connect to both storage nodes regularly, and if there aren't existing connections, it would lead one to suspect a connectivity issue.
[2:29] <dmick> maybe firewalls or SElinux or something?
[2:29] <alexosaurus> i can establish a connection in netcat and there's nothing in the audit logs so im at a bit of a loss
[2:30] * asalor (~asalor@2a00:1028:96c1:4f6a:204:e2ff:fea1:64e6) has joined #ceph
[2:30] <dmick> oh. ok then, that is indeed weird
[2:30] <dmick> you're using nc with the OSD ports?
[2:31] <alexosaurus> osd and mon all work fine
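The kind of reachability check being discussed (ports are the Ceph defaults: monitors on 6789, OSDs bind somewhere in the 6800-7300 range; hostnames are placeholders):

    nc -zv mon1 6789        # monitor port
    nc -zv storage-1 6800   # one of the OSD ports on the node with no open connection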
[2:31] * tcatm (~quassel@2a01:4f8:200:71e3:5054:ff:feff:cbce) has joined #ceph
[2:31] * sudocat (~davidi@192.185.1.20) Quit (Ping timeout: 480 seconds)
[2:32] <alexosaurus> i just did an experiment where i took storage-2 offline, the slow machine connected to storage-1 but the high latency continued .. so i suppose theres no issue there
[2:32] * sudocat (~davidi@192.185.1.20) has joined #ceph
[2:35] * dmsimard_away is now known as dmsimard
[2:36] <dmick> other than the obvious "crank up the logging, pore through the logs and compare client requests and hope for a pointer to a phase", I don't have any bright ideas
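One way to do the "crank up the logging" step, hedged because the right knob depends on whether the mount is the kernel client or ceph-fuse: raise MDS and messenger debugging temporarily and compare the slow vs. fast client's requests in the MDS log (injected settings revert when the daemon restarts):

    ceph tell mds.0 injectargs '--debug_mds 10 --debug_ms 1'
    # for a ceph-fuse / libcephfs mount, "debug client = 20" in the [client]
    # section of ceph.conf raises client-side logging as well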
[2:37] * LeaChim (~LeaChim@host86-159-114-39.range86-159.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[2:40] * dgbaley27 (~matt@c-67-176-93-83.hsd1.co.comcast.net) has joined #ceph
[2:40] * shyu (~shyu@119.254.196.66) has joined #ceph
[2:45] * dmsimard is now known as dmsimard_away
[2:46] * dmsimard_away is now known as dmsimard
[2:47] * puffy (~puffy@216.207.42.129) Quit (Ping timeout: 480 seconds)
[2:50] * sudocat (~davidi@192.185.1.20) Quit (Ping timeout: 480 seconds)
[2:51] * jdillaman (~jdillaman@pool-108-56-67-212.washdc.fios.verizon.net) Quit (Quit: jdillaman)
[2:55] * saltlake (~saltlake@pool-71-244-62-208.dllstx.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[2:59] * thb1 (~me@2a02:2028:2bf:a191:74e7:1b95:7857:72a3) has joined #ceph
[2:59] * thb is now known as Guest5073
[2:59] * thb1 is now known as thb
[2:59] * macjack (~Thunderbi@123.51.160.200) has joined #ceph
[3:05] * Guest5073 (~me@0001bd58.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:07] * calvinx (~calvin@103.7.202.198) Quit (Read error: Connection reset by peer)
[3:07] * calvinx (~calvin@103.7.202.198) has joined #ceph
[3:08] * calvinx (~calvin@103.7.202.198) Quit ()
[3:15] * zhaochao (~zhaochao@111.161.77.232) has joined #ceph
[3:16] * capri (~capri@212.218.127.222) Quit (Read error: Connection reset by peer)
[3:24] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) has joined #ceph
[3:26] * sudocat (~davidi@2601:e:2b80:9920:5c19:e668:9898:78bb) has joined #ceph
[3:26] * calvinx (~calvin@103.7.202.198) has joined #ceph
[3:27] * cholcombe973 (~chris@7208-76ef-ff1f-ed2f-329a-f002-3420-2062.6rd.ip6.sonic.net) Quit (Remote host closed the connection)
[3:31] * lcurtis (~lcurtis@47.19.105.250) has joined #ceph
[3:35] * calvinx (~calvin@103.7.202.198) Quit (Read error: Connection reset by peer)
[3:35] * calvinx (~calvin@76.164.201.56) has joined #ceph
[3:35] * nitti (~nitti@c-66-41-30-224.hsd1.mn.comcast.net) has joined #ceph
[3:38] * thb (~me@0001bd58.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:39] * alram (~alram@ppp-seco11pa2-46-193-140-198.wb.wifirst.net) has joined #ceph
[3:43] * nitti (~nitti@c-66-41-30-224.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[3:44] * angdraug (~angdraug@12.164.168.117) Quit (Quit: Leaving)
[3:46] * badone (~brad@66.187.239.16) Quit (Ping timeout: 480 seconds)
[3:49] * shakamunyi (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) has joined #ceph
[3:50] * alram (~alram@ppp-seco11pa2-46-193-140-198.wb.wifirst.net) Quit (Quit: Lost terminal)
[3:51] * badone (~brad@66.187.239.11) has joined #ceph
[3:53] * lcurtis (~lcurtis@47.19.105.250) Quit (Ping timeout: 480 seconds)
[3:53] * calvinx (~calvin@76.164.201.56) Quit (Ping timeout: 480 seconds)
[3:54] * calvinx (~calvin@76.164.201.64) has joined #ceph
[3:55] * segutier (~segutier@216-166-19-146.fwd.datafoundry.com) Quit (Quit: segutier)
[3:56] * dalegaard-39554 (~dalegaard@vps.devrandom.dk) Quit (Remote host closed the connection)
[3:58] * vilobhmm (~vilobhmm@nat-dip33-wl-g.cfw-a-gci.corp.yahoo.com) Quit (Quit: Away)
[4:04] * lcurtis (~lcurtis@ool-18bfec0b.dyn.optonline.net) has joined #ceph
[4:19] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) Quit (Quit: Leaving.)
[4:20] * ljou (~chatzilla@c-50-184-100-25.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[4:29] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) has joined #ceph
[4:31] * shang (~ShangWu@175.41.48.77) has joined #ceph
[4:31] * vakulkar (~vakulkar@c-50-185-132-102.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[4:33] * calvinx (~calvin@76.164.201.64) Quit (Ping timeout: 480 seconds)
[4:33] * calvinx_ (~calvin@103.7.202.198) has joined #ceph
[4:38] * bkopilov (~bkopilov@bzq-109-67-167-181.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[4:40] * kefu (~kefu@114.86.208.30) has joined #ceph
[4:44] * bandrus (~brian@54.sub-70-211-78.myvzw.com) Quit (Quit: Leaving.)
[4:47] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[4:50] * bandrus (~brian@54.sub-70-211-78.myvzw.com) has joined #ceph
[4:58] * redf (~red@chello084112110034.11.11.vie.surfer.at) has joined #ceph
[5:00] * dalegaard-39554 (~dalegaard@vps.devrandom.dk) has joined #ceph
[5:01] * vbellur (~vijay@122.167.68.108) has joined #ceph
[5:02] * Steki (~steki@cable-89-216-225-243.dynamic.sbb.rs) has joined #ceph
[5:07] * xahare_ (~pixel@cpe-23-241-195-16.socal.res.rr.com) Quit (Quit: leaving)
[5:09] * BManojlovic (~steki@cable-89-216-229-100.dynamic.sbb.rs) Quit (Ping timeout: 480 seconds)
[5:11] * Vacuum_ (~vovo@i59F79379.versanet.de) has joined #ceph
[5:14] * sjm (~sjm@pool-98-109-11-113.nwrknj.fios.verizon.net) Quit (Quit: Leaving.)
[5:14] * sjm (~sjm@pool-98-109-11-113.nwrknj.fios.verizon.net) has joined #ceph
[5:14] * calvinx_ (~calvin@103.7.202.198) Quit (Quit: calvinx_)
[5:17] * calvinx (~calvin@103.7.202.198) has joined #ceph
[5:18] * Vacuum (~vovo@88.130.204.38) Quit (Ping timeout: 480 seconds)
[5:21] * sudocat (~davidi@2601:e:2b80:9920:5c19:e668:9898:78bb) Quit (Quit: Leaving.)
[5:21] * sudocat (~davidi@2601:e:2b80:9920:5c19:e668:9898:78bb) has joined #ceph
[5:26] * davidz (~davidz@2605:e000:1313:8003:9083:5463:ed5a:3994) Quit (Ping timeout: 480 seconds)
[5:26] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[5:31] * lcurtis (~lcurtis@ool-18bfec0b.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[5:32] * sudocat (~davidi@2601:e:2b80:9920:5c19:e668:9898:78bb) Quit (Quit: Leaving.)
[5:32] * sudocat (~davidi@2601:e:2b80:9920:5c19:e668:9898:78bb) has joined #ceph
[5:32] * OutOfNoWhere (~rpb@76.8.45.168) Quit (Ping timeout: 480 seconds)
[5:35] * barra204 (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) has joined #ceph
[5:35] * dalegaard-39554 (~dalegaard@vps.devrandom.dk) Quit (Remote host closed the connection)
[5:35] * DavidThunder1 (~Thunderbi@cpe-23-242-189-171.socal.res.rr.com) has joined #ceph
[5:39] * DavidThunder (~Thunderbi@2605:e000:1313:8003:905a:be43:660e:a10d) Quit (Ping timeout: 480 seconds)
[5:40] * fmanana (~fdmanana@bl5-3-156.dsl.telepac.pt) has joined #ceph
[5:40] * sileht (~sileht@gizmo.sileht.net) Quit (Read error: Connection reset by peer)
[5:41] * lcurtis (~lcurtis@47.19.105.250) has joined #ceph
[5:42] * shakamunyi (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:47] * dmsimard is now known as dmsimard_away
[5:48] * fdmanana (~fdmanana@bl13-151-100.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[5:49] * KevinPerks (~Adium@cpe-071-071-026-213.triad.res.rr.com) Quit (Quit: Leaving.)
[5:50] * barra204 (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[5:50] * shilpa_ (~SHILPA@122.172.101.162) has joined #ceph
[5:56] * grepory (sid29799@id-29799.brockwell.irccloud.com) Quit ()
[6:00] * CAPSLOCK2000 (~oftc@2001:610:748:1::8) Quit (Ping timeout: 480 seconds)
[6:00] * Concubidated (~Adium@71.21.5.251) Quit (Quit: Leaving.)
[6:01] * MACscr1 (~Adium@2601:d:c800:de3:d818:1450:30c0:332a) Quit (Quit: Leaving.)
[6:02] * hflai (~hflai@alumni.cs.nctu.edu.tw) Quit (Remote host closed the connection)
[6:07] * bkopilov (~bkopilov@nat-pool-tlv-t.redhat.com) has joined #ceph
[6:07] * barra204 (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) has joined #ceph
[6:08] * rotbeard (~redbeard@2a02:908:df10:d300:6267:20ff:feb7:c20) Quit (Quit: Leaving)
[6:15] * lcurtis (~lcurtis@47.19.105.250) Quit (Ping timeout: 480 seconds)
[6:16] * vbellur (~vijay@122.167.68.108) Quit (Ping timeout: 480 seconds)
[6:18] * amote (~amote@121.244.87.116) has joined #ceph
[6:19] * rdas (~rdas@121.244.87.116) has joined #ceph
[6:21] * kefu (~kefu@114.86.208.30) Quit (Quit: Textual IRC Client: www.textualapp.com)
[6:21] * sputnik13 (~sputnik13@c-73-193-97-20.hsd1.wa.comcast.net) has joined #ceph
[6:23] * raso (~raso@deb-multimedia.org) Quit (Quit: WeeChat 1.0.1)
[6:24] * raso (~raso@deb-multimedia.org) has joined #ceph
[6:24] * karnan (~karnan@121.244.87.117) has joined #ceph
[6:25] * overclk (~overclk@121.244.87.117) has joined #ceph
[6:27] * cooldharma06 (~chatzilla@218.248.25.100) has joined #ceph
[6:31] * shilpa_ (~SHILPA@122.172.101.162) Quit (Quit: Leaving)
[6:34] * vbellur (~vijay@121.244.87.117) has joined #ceph
[6:34] * lcurtis (~lcurtis@47.19.105.250) has joined #ceph
[6:35] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) Quit (Quit: Leaving.)
[6:35] * cooldharma06 (~chatzilla@218.248.25.100) Quit (Ping timeout: 480 seconds)
[6:42] * cooldharma06 (~chatzilla@14.139.180.52) has joined #ceph
[6:43] * yguang11_ (~yguang11@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[6:43] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) Quit (Read error: Connection reset by peer)
[6:44] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[6:44] * yguang11_ (~yguang11@vpn-nat.peking.corp.yahoo.com) Quit (Read error: Connection reset by peer)
[6:45] * shang (~ShangWu@175.41.48.77) Quit (Quit: Ex-Chat)
[6:47] * dalegaard-39554 (~dalegaard@vps.devrandom.dk) has joined #ceph
[6:50] * lucas1 (~Thunderbi@218.76.52.64) has joined #ceph
[6:54] * yguang11_ (~yguang11@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[6:54] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) Quit (Read error: Connection reset by peer)
[7:04] * sileht (~sileht@gizmo.sileht.net) has joined #ceph
[7:05] * tim|kumina (~tim@82-171-142-11.ip.telfort.nl) has joined #ceph
[7:06] * Rickus (~Rickus@office.protected.ca) Quit (Ping timeout: 480 seconds)
[7:07] * tim|kumina (~tim@82-171-142-11.ip.telfort.nl) Quit ()
[7:11] * shilpa_ (~SHILPA@122.172.101.162) has joined #ceph
[7:12] * lucas1 (~Thunderbi@218.76.52.64) Quit (Quit: lucas1)
[7:18] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[7:19] * vbellur (~vijay@121.244.87.117) Quit (Ping timeout: 480 seconds)
[7:19] * sjm (~sjm@pool-98-109-11-113.nwrknj.fios.verizon.net) Quit (Quit: Leaving.)
[7:23] * beuwolf (~flo@62.113.200.37) has joined #ceph
[7:25] * alexosaurus (~Adium@111.223.237.4) Quit (Quit: Leaving.)
[7:25] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) has joined #ceph
[7:28] * mykola (~Mikolaj@91.225.200.48) has joined #ceph
[7:36] * capri (~capri@212.218.127.222) has joined #ceph
[7:36] * barra204 (~shakamuny@c-67-180-191-38.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[7:37] * cookednoodles (~eoin@89-93-153-201.hfc.dyn.abo.bbox.fr) has joined #ceph
[7:39] * JCL (~JCL@73.189.243.134) Quit (Quit: Leaving.)
[7:39] * shyu (~shyu@119.254.196.66) Quit (Ping timeout: 480 seconds)
[7:48] * Be-El (~quassel@fb08-bcf-pc01.computational.bio.uni-giessen.de) has joined #ceph
[7:52] <Be-El> hi
[7:54] * cookednoodles (~eoin@89-93-153-201.hfc.dyn.abo.bbox.fr) Quit (Quit: Ex-Chat)
[7:58] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[8:01] * shyu (~shyu@119.254.196.66) has joined #ceph
[8:05] * shang (~ShangWu@175.41.48.77) has joined #ceph
[8:11] <bd> -1401/2258 objects degraded (-62.046%)
[8:12] * dgbaley27 (~matt@c-67-176-93-83.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[8:16] * lucas1 (~Thunderbi@218.76.52.64) has joined #ceph
[8:23] * thb (~me@port-50097.pppoe.wtnet.de) has joined #ceph
[8:23] * lcurtis (~lcurtis@47.19.105.250) Quit (Ping timeout: 480 seconds)
[8:25] * lucas1 (~Thunderbi@218.76.52.64) Quit (Quit: lucas1)
[8:28] * badone (~brad@66.187.239.11) Quit (Ping timeout: 480 seconds)
[8:33] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[8:36] * kawa2014 (~kawa@89.184.114.246) has joined #ceph
[8:37] * Concubidated (~Adium@71.21.5.251) has joined #ceph
[8:41] * alram (~alram@ppp-seco11pa2-46-193-140-198.wb.wifirst.net) has joined #ceph
[8:42] * Sysadmin88 (~IceChat77@94.12.240.104) Quit (Quit: Pull the pin and count to what?)
[8:45] * mookins (~mookins@induct3.lnk.telstra.net) has joined #ceph
[8:46] * mookins (~mookins@induct3.lnk.telstra.net) Quit ()
[8:49] * mookins (~mookins@induct3.lnk.telstra.net) has joined #ceph
[8:50] * mookins (~mookins@induct3.lnk.telstra.net) Quit ()
[8:50] * mookins (~mookins@induct3.lnk.telstra.net) has joined #ceph
[8:51] * dgurtner (~dgurtner@178.197.231.49) has joined #ceph
[8:56] * DV (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[8:58] * MACscr (~Adium@2601:d:c800:de3:fc3a:160e:1eb4:530c) has joined #ceph
[8:58] <mookins> Hello. I am looking at implementing a 3-6 node cluster and I am trying to decide on an architecture as such. All the machines have good RAID controllers which can do RAID5, RAID50, RAID6 and RAID60. Each machine will have 6 drives. If I ran an OSD per disk, I have a lot of overhead, however if I run one OSD per RAID array, I may have overlapping redundancy. My options are RAID with no Ceph
[8:58] <mookins> redundancy, RAID and Ceph redundancy, or no RAID and Ceph redundancy.
[8:59] <mookins> Unfortunately I am not sure how to test any of these because the hardware is tied up at the moment until I go to put together this system.
[9:01] <mookins> This storage system will be primarily for backing VM's
[9:01] <Be-El> mookins: with raid-5/raid-6 you'll have an additional overhead for write operations
[9:02] <mookins> Be-El: Overhead in the CPU?
[9:02] <MACscr> i dont think any raid is recommended with ceph though is it?
[9:02] <Be-El> mookins: no, overhead in the write operation. even with hardware raid controllers, each write to a raid-5/raid-6 is actually a write to several disks
[9:03] <Be-El> mookins: i usually avoid a raid setup for osds. if a disk is behind a raid controller, it will be configured as raid-0 with a single disk
[9:06] <mookins> Be-El: A stripe, eg 5 disks as one disk?
[9:07] <Be-El> mookins: no, just a single disk. i would have used jbod if the raid controllers would support it
[9:07] * zack_dol_ (~textual@nfmv001065130.uqw.ppp.infoweb.ne.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[9:08] <mookins> Be-El: Gotcha. So with my 6 drives per server, run 6 OSDs per machine.
[9:08] <MACscr> yep, you dont want to use any sort of raid if you can control it
[9:08] <Be-El> MACscr: i wouldn't recommend it with respect to iop/s, too. for certain scenarios a raid setup might be a good solution, e.g. if you cannot afford the backfill traffic and latency after a disk failed
[9:10] <mookins> Be-El: Well that settles that. Luckily my controllers do support JBOD mode.
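A sketch of the one-OSD-per-disk layout being settled on here, using the ceph-deploy syntax of the time with made-up host/device names and journals colocated on each disk:

    ceph-deploy osd create node1:sdb node1:sdc node1:sdd \
                           node1:sde node1:sdf node1:sdg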
[9:13] * Concubidated (~Adium@71.21.5.251) Quit (Quit: Leaving.)
[9:13] * alram (~alram@ppp-seco11pa2-46-193-140-198.wb.wifirst.net) Quit (Quit: leaving)
[9:16] * analbeard (~shw@support.memset.com) has joined #ceph
[9:16] <mookins> analbeard... lol
[9:17] <analbeard> :D
[9:17] <analbeard> (apologies if it offends anyone!)
[9:17] * CAPSLOCK2000 (~oftc@2001:610:748:1::8) has joined #ceph
[9:19] * oro (~oro@80-219-254-208.dclient.hispeed.ch) has joined #ceph
[9:20] <mookins> doesn't it stand for 'An L-bear daemon'?
[9:20] <analbeard> it could do
[9:21] <analbeard> it's probably a more polite explanation
[9:21] * JCLM (~JCLM@73.189.243.134) has joined #ceph
[9:22] * JCLM (~JCLM@73.189.243.134) Quit ()
[9:22] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[9:23] * JCLM (~JCLM@73.189.243.134) has joined #ceph
[9:25] * vbellur (~vijay@121.244.87.117) has joined #ceph
[9:31] * jtang (~jtang@109.255.42.21) Quit (Remote host closed the connection)
[9:37] * bandrus (~brian@54.sub-70-211-78.myvzw.com) Quit (Quit: Leaving.)
[9:37] * ScOut3R (~ScOut3R@catv-89-133-22-210.catv.broadband.hu) has joined #ceph
[9:38] * lucas1 (~Thunderbi@218.76.52.64) has joined #ceph
[9:39] * mookins (~mookins@induct3.lnk.telstra.net) Quit ()
[9:41] * ngoswami_ (~ngoswami@121.244.87.124) has joined #ceph
[9:42] * BManojlovic (~steki@178-221-74-244.dynamic.isp.telekom.rs) has joined #ceph
[9:42] * wicked_shell (~wicked_sh@susia1791.etelecom.spb.ru) has joined #ceph
[9:43] * MK_FG (~MK_FG@00018720.user.oftc.net) Quit (Quit: o//)
[9:44] * bkopilov (~bkopilov@nat-pool-tlv-t.redhat.com) Quit (Quit: Leaving)
[9:44] * fsimonce (~simon@host217-37-dynamic.30-79-r.retail.telecomitalia.it) has joined #ceph
[9:44] * bkopilov (~bkopilov@nat-pool-tlv-t.redhat.com) has joined #ceph
[9:46] * ngoswami (~ngoswami@121.244.87.116) Quit (Ping timeout: 480 seconds)
[9:48] * MK_FG (~MK_FG@00018720.user.oftc.net) has joined #ceph
[10:01] * linjan (~linjan@80.178.220.195.adsl.012.net.il) has joined #ceph
[10:02] <sc-rm> Now that I have removed some nodes from our ceph cluster, the hosts are still in the ceph osd tree; how do I get rid of those hosts? Can't find it in any documentation, I have followed all the steps in http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
[10:03] <Be-El> sc-rm: 'ceph osd crush remove <bucket name>' removes the given bucket from the crush tree
[10:04] <sc-rm> Be-El: I have done that :-)
[10:04] <sc-rm> Be-El: http://paste.openstack.org/show/171902/ what I'm talking about is getting rid of the top-level hosts that no longer have any osds
[10:04] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) has joined #ceph
[10:05] <Be-El> sc-rm: 'ceph osd crush remove node-43' should remove the topmost empty host bucket
[10:05] <sc-rm> Be-El: Okay, I'll try that
[10:06] <sc-rm> Be-El: That did the trick - thanks. But it's not very clear that one has to do this to clean up the osd tree
[10:06] * jordanP (~jordan@213.215.2.194) has joined #ceph
[10:07] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[10:08] <Be-El> sc-rm: the empty buckets are only a nuisance; they do not interfere with the crush calculation since they no longer have a weight
[10:08] * Steki (~steki@cable-89-216-225-243.dynamic.sbb.rs) Quit (Ping timeout: 480 seconds)
[10:12] * Steki (~steki@cable-89-216-225-243.dynamic.sbb.rs) has joined #ceph
[10:12] * oro (~oro@80-219-254-208.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[10:13] <sc-rm> Be-El: A nuisance is exactly what I would call it ;-) For the sake of cleanliness in a huge setup, it would be nice to have had that information in the docs :-)
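For reference, the removal sequence from the add-or-rm-osds doc plus the extra host-bucket cleanup discussed above (osd id and host name taken from sc-rm's paste as placeholders):

    ceph osd out 12                  # for each osd on the host, then stop its daemon
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12
    ceph osd crush remove node-43    # finally drop the now-empty host bucket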
[10:17] * yguang11_ (~yguang11@vpn-nat.peking.corp.yahoo.com) Quit ()
[10:19] * TMM (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[10:24] * Steki (~steki@cable-89-216-225-243.dynamic.sbb.rs) Quit (Remote host closed the connection)
[10:30] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) Quit (Quit: Leaving)
[10:34] * lucas1 (~Thunderbi@218.76.52.64) Quit (Quit: lucas1)
[10:38] * liiwi (liiwi@idle.fi) Quit (Ping timeout: 480 seconds)
[10:38] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[10:39] * ssejourne (~ssejourne@2001:41d0:52:300::d16) has joined #ceph
[10:46] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) has joined #ceph
[10:51] * overclk (~overclk@121.244.87.117) Quit (Ping timeout: 480 seconds)
[11:01] * linjan (~linjan@80.178.220.195.adsl.012.net.il) Quit (Ping timeout: 480 seconds)
[11:02] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[11:02] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) has joined #ceph
[11:05] * shang (~ShangWu@175.41.48.77) Quit (Quit: Ex-Chat)
[11:07] * oro (~oro@2001:620:20:16:e433:3ad3:683:82b4) has joined #ceph
[11:07] * jcsp1 (~Adium@82-71-16-249.dsl.in-addr.zen.co.uk) Quit (Quit: Leaving.)
[11:09] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[11:10] * linjan (~linjan@176.195.224.228) has joined #ceph
[11:10] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) has joined #ceph
[11:11] * Eric (~oftc-webi@cano-desk.cern.ch) has joined #ceph
[11:12] <Eric> Hey folks
[11:13] * ngoswami_ (~ngoswami@121.244.87.124) Quit (Ping timeout: 480 seconds)
[11:13] * linjan (~linjan@176.195.224.228) Quit ()
[11:13] * linjan (~linjan@176.195.224.228) has joined #ceph
[11:14] <Eric> I am using the rados c++ client to access objects concurrently in a lockExclusive-read-write-unlock and lockShared-read-unlock fashion.
[11:15] <Eric> I might hold 2-3 locks at a time max in one program, which then loops in a single thread. Yet rados creates a high quantity of threads (>100) for this program.
[11:16] <Eric> Did anyone experience something similar (version 0.91).
[11:16] <Eric> ?
[11:19] * zhaochao (~zhaochao@111.161.77.232) Quit (Quit: ChatZilla 0.9.91.1 [Iceweasel 31.4.0/20150113100542])
[11:20] * wicked_shell (~wicked_sh@susia1791.etelecom.spb.ru) Quit (Ping timeout: 480 seconds)
[11:21] * ngoswami_ (~ngoswami@121.244.87.116) has joined #ceph
[11:27] * bitserker (~toni@63.pool85-52-240.static.orange.es) Quit (Remote host closed the connection)
[11:27] * bitserker (~toni@63.pool85-52-240.static.orange.es) has joined #ceph
[11:29] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[11:30] * elder__ (~elder@210.177.145.249) Quit (Quit: Leaving)
[11:30] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[11:32] * linjan (~linjan@176.195.224.228) Quit (Ping timeout: 480 seconds)
[11:33] * gfidente (~gfidente@0001ef4b.user.oftc.net) has joined #ceph
[11:34] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) has joined #ceph
[11:39] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[11:42] * calvinx (~calvin@103.7.202.198) Quit (Quit: calvinx)
[11:46] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) has joined #ceph
[11:46] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[11:47] * fghaas (~florian@charybdis-ext.suse.de) has joined #ceph
[11:51] * badone (~brad@203-121-198-226.e-wire.net.au) has joined #ceph
[11:59] * overclk (~overclk@121.244.87.117) has joined #ceph
[12:01] <ZyTer> hi !
[12:02] <ZyTer> is it possible to change an OSD key, my key : " cat /var/lib/ceph/osd/ceph-5/keyring | grep -A 1 osd.5 ; ceph auth export | grep -A 1 osd.5 " does not match ...
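One way to reconcile the mismatch ZyTer describes, assuming the key on disk is the one to keep (a sketch only, not verified against this cluster):

    ceph auth get osd.5                                   # what the cluster currently has
    ceph auth del osd.5
    ceph auth add osd.5 osd 'allow *' mon 'allow profile osd' \
        -i /var/lib/ceph/osd/ceph-5/keyring               # re-register the on-disk key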
[12:04] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[12:05] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[12:09] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:10] * ScOut3R (~ScOut3R@catv-89-133-22-210.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[12:11] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) has joined #ceph
[12:13] * badone (~brad@203-121-198-226.e-wire.net.au) Quit (Ping timeout: 480 seconds)
[12:15] * ScOut3R (~ScOut3R@catv-80-98-46-171.catv.broadband.hu) has joined #ceph
[12:18] * liiwi (liiwi@idle.fi) has joined #ceph
[12:19] * shyu (~shyu@119.254.196.66) Quit (Remote host closed the connection)
[12:19] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) has joined #ceph
[12:26] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[12:27] * dustinm` (~dustinm`@105.ip-167-114-152.net) Quit (Remote host closed the connection)
[12:29] * ScOut3R_ (~ScOut3R@catv-80-98-46-171.catv.broadband.hu) has joined #ceph
[12:35] * karis (~karis@conf-nat.admin.grnet.gr) has joined #ceph
[12:35] * ScOut3R (~ScOut3R@catv-80-98-46-171.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[12:37] * dustinm` (~dustinm`@2607:5300:100:200::160d) has joined #ceph
[12:39] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) has joined #ceph
[12:40] * arbrandes (~arbrandes@189.110.13.102) has joined #ceph
[12:43] * gfidente (~gfidente@0001ef4b.user.oftc.net) Quit (Quit: bye)
[12:44] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) Quit (Quit: Leaving)
[12:47] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) has joined #ceph
[12:50] <adam1> The documentation states that the ceph-extras repo isn't required on CentOS 7 (for rbd support in qemu)
[12:50] <adam1> but the system qemu doesn't seem to support rbd...
[12:51] * linjan (~linjan@80.179.241.27) has joined #ceph
[12:58] * andreask (~andreask@2001:67c:1933:800::3200) has joined #ceph
[12:58] * ChanServ sets mode +v andreask
[12:58] * andreask (~andreask@2001:67c:1933:800::3200) has left #ceph
[13:00] * karnan (~karnan@121.244.87.117) Quit (Remote host closed the connection)
[13:00] * rwheeler (~rwheeler@nat-pool-tlv-u.redhat.com) has joined #ceph
[13:12] * brutusca_ (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) has joined #ceph
[13:12] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[13:14] * dgurtner (~dgurtner@178.197.231.49) Quit (Ping timeout: 480 seconds)
[13:20] * dgurtner (~dgurtner@178.197.231.49) has joined #ceph
[13:21] * branto (~borix@ip-213-220-214-203.net.upcbroadband.cz) has joined #ceph
[13:25] * kanagaraj (~kanagaraj@121.244.87.117) Quit (Quit: Leaving)
[13:26] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[13:34] * brutusca_ (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[13:35] * KevinPerks (~Adium@cpe-071-071-026-213.triad.res.rr.com) has joined #ceph
[13:37] * sudocat (~davidi@2601:e:2b80:9920:5c19:e668:9898:78bb) Quit (Ping timeout: 480 seconds)
[13:47] * linjan_ (~linjan@80.179.241.27) has joined #ceph
[13:47] * linjan (~linjan@80.179.241.27) Quit (Read error: Connection reset by peer)
[13:51] * tupper_ (~tcole@rtp-isp-nat-pool1-1.cisco.com) Quit (Remote host closed the connection)
[13:56] * shilpa_ (~SHILPA@122.172.101.162) Quit (Ping timeout: 480 seconds)
[13:57] * wicked_shell (~wicked_sh@susia1791.etelecom.spb.ru) has joined #ceph
[13:59] * jdillaman (~jdillaman@pool-108-56-67-212.washdc.fios.verizon.net) has joined #ceph
[14:00] * marrusl (~mark@cpe-24-90-46-248.nyc.res.rr.com) has joined #ceph
[14:02] * bkopilov (~bkopilov@nat-pool-tlv-t.redhat.com) Quit (Ping timeout: 480 seconds)
[14:03] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[14:06] * wicked_shell (~wicked_sh@susia1791.etelecom.spb.ru) Quit (Ping timeout: 480 seconds)
[14:07] * lcurtis (~lcurtis@47.19.105.250) has joined #ceph
[14:12] * wicked_shell (~wicked_sh@susia1791.etelecom.spb.ru) has joined #ceph
[14:18] * lcurtis (~lcurtis@47.19.105.250) Quit (Ping timeout: 480 seconds)
[14:24] * wicked_shell (~wicked_sh@susia1791.etelecom.spb.ru) Quit (Ping timeout: 480 seconds)
[14:27] * i_m (~ivan.miro@deibp9eh1--blueice2n2.emea.ibm.com) has joined #ceph
[14:30] * bauruine (~bauruine@wotan.tuxli.ch) Quit (Quit: ZNC - http://znc.in)
[14:30] * bauruine (~bauruine@wotan.tuxli.ch) has joined #ceph
[14:31] * kanagaraj (~kanagaraj@121.244.87.117) Quit (Quit: Leaving)
[14:34] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) has joined #ceph
[14:34] * ScOut3R_ (~ScOut3R@catv-80-98-46-171.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[14:35] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) has joined #ceph
[14:37] * wicked_shell (~wicked_sh@susia1791.etelecom.spb.ru) has joined #ceph
[14:40] * thb (~me@0001bd58.user.oftc.net) Quit (Ping timeout: 480 seconds)
[14:41] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has joined #ceph
[14:41] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has left #ceph
[14:44] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) Quit (Ping timeout: 480 seconds)
[14:44] * cooldharma06 (~chatzilla@14.139.180.52) Quit (Quit: ChatZilla 0.9.91.1 [Iceweasel 21.0/20130515140136])
[14:50] * alaind (~dechorgna@ARennes-651-1-125-185.w2-2.abo.wanadoo.fr) has joined #ceph
[14:53] * overclk (~overclk@121.244.87.117) Quit (Quit: Leaving)
[14:53] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) has joined #ceph
[14:55] * tupper (~tcole@2001:420:2280:1272:647f:846:62bd:6086) has joined #ceph
[15:03] * jrankin (~jrankin@d53-64-170-236.nap.wideopenwest.com) has joined #ceph
[15:05] * nitti (~nitti@162.222.47.218) has joined #ceph
[15:06] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) has joined #ceph
[15:06] * nitti (~nitti@162.222.47.218) Quit (Remote host closed the connection)
[15:06] * nitti (~nitti@162.222.47.218) has joined #ceph
[15:06] * nitti (~nitti@162.222.47.218) Quit ()
[15:07] * dmsimard_away is now known as dmsimard
[15:07] * vbellur (~vijay@121.244.87.117) Quit (Ping timeout: 480 seconds)
[15:10] * Eric (~oftc-webi@cano-desk.cern.ch) has left #ceph
[15:10] <BranchPredictor> hello, I have 538026 objects in my cluster
[15:10] <BranchPredictor> and ceph -s shows [..]pgmap v38597: 6912 pgs, 2 pools, 591 GB data, 525 kobjects[..]
[15:10] <BranchPredictor> shouldn't it show 538 kobjects?
[15:11] <BranchPredictor> (i.e. div 1000, not by 1024)
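The displayed number is consistent with a divide-by-1024 ("kibi") convention rather than a decimal one:

    538026 / 1024 ≈ 525.4  ->  shown as "525 kobjects"
    538026 / 1000 ≈ 538.0  ->  what a decimal "k" would give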
[15:11] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) Quit (Quit: Leaving.)
[15:12] * sjm (~sjm@pool-98-109-11-113.nwrknj.fios.verizon.net) has joined #ceph
[15:15] * dmsimard is now known as dmsimard_away
[15:19] * vbellur (~vijay@121.244.87.124) has joined #ceph
[15:20] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[15:21] * shaunm (~shaunm@74.215.76.114) Quit (Ping timeout: 480 seconds)
[15:21] * jrankin (~jrankin@d53-64-170-236.nap.wideopenwest.com) Quit (Quit: Leaving)
[15:22] * dyasny (~dyasny@173.231.115.58) has joined #ceph
[15:28] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) has joined #ceph
[15:29] * hasues (~hazuez@kwfw01.scrippsnetworksinteractive.com) has joined #ceph
[15:29] * hasues (~hazuez@kwfw01.scrippsnetworksinteractive.com) has left #ceph
[15:30] * dyasny (~dyasny@173.231.115.58) Quit (Quit: Ex-Chat)
[15:31] * dyasny (~dyasny@173.231.115.58) has joined #ceph
[15:35] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) Quit (Quit: Leaving)
[15:36] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[15:40] * linjan_ (~linjan@80.179.241.27) Quit (Ping timeout: 480 seconds)
[15:40] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) has joined #ceph
[15:41] * yanzheng (~zhyan@182.139.205.12) has joined #ceph
[15:41] * rohanm (~rohanm@mobile-166-173-185-157.mycingular.net) has joined #ceph
[15:44] * jrankin (~jrankin@d53-64-170-236.nap.wideopenwest.com) has joined #ceph
[15:45] * davidz (~davidz@cpe-23-242-189-171.socal.res.rr.com) has joined #ceph
[15:55] * yanzheng (~zhyan@182.139.205.12) Quit (Quit: This computer has gone to sleep)
[15:57] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) Quit (Quit: Leaving)
[16:02] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[16:05] * bkopilov (~bkopilov@bzq-109-67-167-181.red.bezeqint.net) has joined #ceph
[16:06] * dmsimard_away is now known as dmsimard
[16:14] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) Quit (Quit: Leaving.)
[16:20] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[16:21] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) has joined #ceph
[16:25] * vbellur (~vijay@121.244.87.124) Quit (Ping timeout: 480 seconds)
[16:30] <Be-El> are you able to change the crush ruleset of an existing pool? i would like to move the cephfs metadata pool to ssd-based osds
[16:33] * sputnik13 (~sputnik13@c-73-193-97-20.hsd1.wa.comcast.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[16:34] * joef (~Adium@c-24-130-254-66.hsd1.ca.comcast.net) has joined #ceph
[16:36] * joef1 (~Adium@2620:79:0:2420::f) has joined #ceph
[16:36] <beardo> Be-El, yes: ceph osd pool set {pool-name} crush_ruleset {ruleset number}
[16:36] <beardo> see: http://ceph.com/docs/master/rados/operations/pools/
[16:37] * joef1 (~Adium@2620:79:0:2420::f) has left #ceph
[16:37] <Be-El> beardo: and the existing content of the pool is migrated to the new locations?
[16:38] * jks (~jks@3e6b5724.rev.stofanet.dk) Quit (Ping timeout: 480 seconds)
[16:40] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) has joined #ceph
[16:40] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) has joined #ceph
[16:42] * joef (~Adium@c-24-130-254-66.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:44] <beardo> yes
[16:44] <beardo> the pool will become degraded and ceph will heal to comply with the new ruleset
[16:52] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Ping timeout: 480 seconds)
[16:52] * segutier (~segutier@216-166-19-146.fwd.datafoundry.com) has joined #ceph
[16:53] <Be-El> beardo: ok, thx. the pool is now being remapped
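The exchange above, condensed into commands (the pool name and ruleset number are placeholders for whatever the ssd rule is in your crush map):

    ceph osd pool get metadata crush_ruleset      # current ruleset of the cephfs metadata pool
    ceph osd pool set metadata crush_ruleset 1    # point it at the ssd ruleset
    ceph -s                                       # watch the pgs remap/backfill onto the ssd osds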
[16:54] * karis (~karis@conf-nat.admin.grnet.gr) Quit (Remote host closed the connection)
[16:54] * saltlake (~saltlake@12.250.199.170) has joined #ceph
[17:00] * sudocat1 (~davidi@192.185.1.20) has joined #ceph
[17:01] * jtang (~jtang@109.255.42.21) has joined #ceph
[17:01] * sudocat1 (~davidi@192.185.1.20) Quit ()
[17:02] * sudocat1 (~davidi@192.185.1.20) has joined #ceph
[17:02] * cok (~chk@nat-cph5-sys.net.one.com) has joined #ceph
[17:07] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[17:07] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) Quit (Quit: Ex-Chat)
[17:07] <Tim_> Is the result of: (100 * $NumOfOSDs) / $NumOfReplicas, the number of PG's cluster wide or the number of PG's per pool?
[17:08] <fghaas> Tim_: http://ceph.com/pgcalc/ :)
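A worked example of the rule of thumb Tim_ quotes, as the pgcalc page applies it: the result is a cluster-wide target, rounded to a power of two and then split across pools.

    (100 * 12 OSDs) / 3 replicas = 400  ->  round up to 512 PGs in total,
    then divide that budget among the pools (which is what ceph.com/pgcalc automates)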
[17:08] * ircolle (~Adium@2601:1:a580:145a:c803:d242:7966:7b0b) has joined #ceph
[17:11] * earnThis (~oftc-webi@64.119.147.250) has joined #ceph
[17:12] <earnThis> if I have n storage nodes running a ceph cluster, does ceph have the ability for me to designate which data go to which nodes?
[17:13] <Be-El> earnThis: yes and no. you can define rules how data is to be distributed
[17:14] <Be-El> earnThis: but you are not able to define that a certain chunk of data should be stored on a certain disk on a certain storage node
[17:15] * rturk|afk is now known as rturk
[17:15] * BManojlovic (~steki@178-221-74-244.dynamic.isp.telekom.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:15] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[17:16] <earnThis> Be-El: so what is the purpose of defining the data distribution via the rules?
[17:17] * thb (~me@89.204.155.216) has joined #ceph
[17:17] <Be-El> earnThis: you can use the rules to define your failure domain, associate pools to certain classes of storage nodes (e.g. ssd based storage vs. platter based storage)
[17:20] <earnThis> Be-El: but the cluster never cares which physical nodes the pools are on, right?
[17:21] <earnThis> Be-El: it's a logical designation
[17:21] <Be-El> earnThis: the monitors in the cluster store a mapping of the placement groups (the parts of a pool) to storage devices
[17:21] * shilpa_ (~SHILPA@122.172.101.162) has joined #ceph
[17:22] <Be-El> earnThis: the rules define how this mapping is built and maintained for a given cluster
[17:22] * gregsfortytwo1 (~gregsfort@209.132.181.86) has joined #ceph
[17:22] <Be-El> earnThis: the mapping is actually resolved by the clients themselves
[17:23] * off_rhoden (~off_rhode@209.132.181.86) has joined #ceph
[17:24] * jtang (~jtang@109.255.42.21) Quit (Remote host closed the connection)
[17:26] * gregsfortytwo1 (~gregsfort@209.132.181.86) Quit ()
[17:26] * off_rhoden (~off_rhode@209.132.181.86) Quit ()
[17:26] * gregsfortytwo (~gregsfort@209.132.181.86) has joined #ceph
[17:27] * off_rhoden (~off_rhode@209.132.181.86) has joined #ceph
[17:28] <Tim_> thanks fghaas
[17:31] * joshd1 (~jdurgin@24-205-54-236.dhcp.gldl.ca.charter.com) has joined #ceph
[17:31] <earnThis> Be-El: so if I wanted to be sure two groups of data lived on separate physical nodes I would have to set up two different ceph clusters?
[17:32] <Be-El> earnThis: no, you can just define rules for the groups
[17:33] <Be-El> earnThis: http://ceph.com/docs/master/rados/operations/crush-map/
[17:35] <earnThis> Be-El: i'll take a look, thanks
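A sketch of what Be-El describes, in decompiled-crushmap syntax: a separate root containing only the hosts that should hold one group of data, and a rule that draws from it (names, ids and weights here are hypothetical):

    root groupA {
            id -10
            alg straw
            hash 0  # rjenkins1
            item node-a weight 1.000
            item node-b weight 1.000
    }
    rule groupA_rule {
            ruleset 2
            type replicated
            min_size 1
            max_size 10
            step take groupA
            step chooseleaf firstn 0 type host
            step emit
    }
    # then:  ceph osd pool set poolA crush_ruleset 2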
[17:35] * arbrandes (~arbrandes@189.110.13.102) Quit (Ping timeout: 480 seconds)
[17:35] * jcsp (~jcsp@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[17:36] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit ()
[17:37] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) Quit (Quit: Leaving.)
[17:37] * rohanm (~rohanm@mobile-166-173-185-157.mycingular.net) Quit (Read error: Connection reset by peer)
[17:39] * vbellur (~vijay@122.167.240.22) has joined #ceph
[17:41] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[17:42] * fghaas (~florian@charybdis-ext.suse.de) Quit (Quit: Leaving.)
[17:45] * championofcyrodi (~championo@50-205-35-98-static.hfc.comcastbusiness.net) has joined #ceph
[17:45] <championofcyrodi> ircolle: thank you
[17:46] * rotbeard (~redbeard@2a02:908:df10:d300:76f0:6dff:fe3b:994d) has joined #ceph
[17:46] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) has joined #ceph
[17:46] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) Quit ()
[17:47] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[17:47] <championofcyrodi> Hi all. I am troubleshooting my ceph configuration. my crush map is here: http://pastebin.com/gJRX7HWW and you can see node-40 has a lower weight, since it is a 256GB SSD and not a 1TB disk like the others. However, with 4 nodes and a replication factor of 3, this disk filled up much quicker than the others.
[17:48] * rljohnsn (~rljohnsn@c-73-15-126-4.hsd1.ca.comcast.net) has joined #ceph
[17:48] * derjohn_mob (~aj@88.128.80.3) has joined #ceph
[17:48] <championofcyrodi> i reweighted the osd on that node, and degradation has dropped from 12% to 3%, but now it is still sitting at 3%
[17:48] * thb (~me@0001bd58.user.oftc.net) Quit (Ping timeout: 480 seconds)
[17:48] <championofcyrodi> I also lowered the replication factor on a pool of data that was less important to me.
[17:48] <championofcyrodi> (from 3 to 2)
[17:49] <championofcyrodi> currently i am running a scrub on the osd nodes, one at a time, in hopes that the specific osd and pg with the degraded objects will replicate, or at least alert me to some other issue.
[17:50] * blinky_ghost_ (~psousa@195.245.147.94) has joined #ceph
[17:50] <championofcyrodi> is this the correct way to troubleshoot degraded instances? or is there an easier way to identify the pgs that do not have matching acting and up sets?
[17:51] <championofcyrodi> also of my 4 osds, 4 are up and in.
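For reference, the two reweight knobs in play here (which one championofcyrodi used isn't stated, and the osd id is a guess from the paste):

    ceph osd crush reweight osd.1 0.3   # change the crush weight (capacity used for placement)
    ceph osd reweight 1 0.8             # temporary 0-1 override, as used by reweight-by-utilization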
[17:51] <blinky_ghost_> hi all, I have an ceph cluster in production. I intend to change network in public and cluster storage. Is it safe to change this?
[17:51] <blinky_ghost_> thanks
[17:52] * dyasny (~dyasny@173.231.115.58) Quit (Quit: Ex-Chat)
[17:52] * dyasny (~dyasny@173.231.115.58) has joined #ceph
[17:54] * sputnik13 (~sputnik13@74.202.214.170) has joined #ceph
[17:55] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) Quit (Ping timeout: 480 seconds)
[17:56] <championofcyrodi> this seems to be a good way to find stuck pgs: ceph pg dump_stuck
[17:57] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) has joined #ceph
[17:58] * derjohn_mob (~aj@88.128.80.3) Quit (Ping timeout: 480 seconds)
[17:59] * cok (~chk@nat-cph5-sys.net.one.com) Quit (Quit: Leaving.)
[17:59] <Be-El> championofcyrodi: as long as you have pools with a replication size of 3, one of their replicas will be stored on the smaller ssd
[18:00] * oro (~oro@2001:620:20:16:e433:3ad3:683:82b4) Quit (Ping timeout: 480 seconds)
[18:00] * derjohn_mob (~aj@88.128.80.17) has joined #ceph
[18:00] <Be-El> championofcyrodi: maybe moving the osd might give you a better balance
[18:00] <championofcyrodi> Be-El: that makes sense. I think i'm narrowing it down
[18:01] <championofcyrodi> i have stale pgs, and i think they may be 'Homeless'
[18:01] <championofcyrodi> reading about it now
[18:01] <championofcyrodi> i see many instances like this:
[18:01] <championofcyrodi> pg 10.a is stuck unclean for 4258.772805, current state active+remapped, last acting [2,3,1]
[18:01] <championofcyrodi> pg 9.f1 is active+degraded, acting [3,2]
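A few commands that drill into pgs like the ones pasted above:

    ceph health detail           # lists the degraded/stuck pgs and why
    ceph pg dump_stuck unclean   # the dump mentioned earlier
    ceph pg 9.f1 query           # full peering/recovery detail for a single pg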
[18:02] * rmoe (~quassel@173-228-89-134.dsl.static.fusionbroadband.com) Quit (Ping timeout: 480 seconds)
[18:02] <Be-El> championofcyrodi: does ceph -s report a full osd ?
[18:03] <championofcyrodi> not at all
[18:03] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[18:03] <championofcyrodi> when i reweighted the 256GB SSD to .3, the full osd went back down to a ratio that matched the other 1TB osds
[18:03] <Be-El> then it should be ok
[18:04] <Be-El> did you try to restart the osd process for osd.1?
[18:04] <championofcyrodi> the 'homeless placement groups' section indicates that restarting the osds that were 'last acting' will allow the cluster to recover the placement groups.
[18:04] <championofcyrodi> heh... just now getting to trying that :)
[18:04] <Be-El> i assume there's a recent data backup available....
[18:05] <championofcyrodi> yes
[18:05] <championofcyrodi> im migrating from an older cluster to a newer one
[18:05] <championofcyrodi> (older version)
[18:05] <Be-El> are you able to shuffle the disks physically?
[18:06] <championofcyrodi> wow that was fast...
[18:06] <championofcyrodi> after a reboot, the degraded pgs went from 3% to 0.002%
[18:06] <championofcyrodi> (reboot of the ceph osd service)
[18:07] * ngoswami_ (~ngoswami@121.244.87.116) Quit (Quit: Leaving)
[18:07] <championofcyrodi> Be-El: i could, but there are only 4 "disks" (raid configured). this is really a learning cluster, not production.
[18:08] * puffy (~puffy@50.185.218.255) has joined #ceph
[18:09] <Be-El> so you just learned that a reboot might fix a number of problems ;-)
[18:09] <Be-El> are you using btrfs on the osds?
[18:09] <championofcyrodi> heh, well... i know a reboot can make a problem go away temporarily... but to fix it I really need to know whats going on. at least i know a restart of the osd will help with 'homeless pgs'
[18:10] <championofcyrodi> rbd
[18:10] <championofcyrodi> rados block device images
[18:10] <championofcyrodi> oh wait... the OS filesystem... let me see...
[18:11] <championofcyrodi> looks like it is 'xfs'
[18:12] <Be-El> xfs is the default if you do not specify anything else
[18:12] <championofcyrodi> ah
[18:13] <Be-El> there are some problems with btrfs, but since you are using xfs they probably do not apply
[18:13] <championofcyrodi> well... i've now restarted all of my ceph nodes.. and it's still 0.002%
[18:13] <championofcyrodi> which is 2/86843 objects.
[18:13] <Be-El> what's the state of the pgs?
[18:13] <championofcyrodi> HEALTH_WARN 50 pgs degraded; 1426 pgs stuck unclean; recovery 2/86843 objects degraded (0.002%)
[18:14] <Be-El> if you are able to fix the pgs, the objects should become fine, too
[18:14] * linjan (~linjan@80.179.241.26) has joined #ceph
[18:15] <championofcyrodi> running "ceph pg dump_stuck unclean | wc -l" results in 1427 lines.... whic
[18:15] * rmoe (~quassel@12.164.168.117) has joined #ceph
[18:15] <championofcyrodi> h is minus 1 for the header
[18:15] * roehrich (~roehrich@146.174.238.100) has joined #ceph
[18:15] <Be-El> you can try to set the replication factor for all pools to 2 (it's a learning cluster)
[18:16] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Quit: Ex-Chat)
[18:16] <Be-El> if you trust your host and do not want to reshuffle the disks for a better distribution of available storage, you can also change the crush ruleset for the pools to distribute based on osd instead of host
[18:17] <Be-El> be warned, this will result in several replicas of the same data being stored on the same host
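The two changes Be-El suggests, sketched (the pool name is a placeholder; the crush edit goes in the decompiled map and carries the replica-colocation caveat he gives):

    ceph osd pool set <pool> size 2
    # in the relevant crush rule, replace:
    #     step chooseleaf firstn 0 type host
    # with:
    #     step chooseleaf firstn 0 type osd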
[18:21] <championofcyrodi> yea, i'd rather not set new crush rules, i've heard there can be some bugs with cascading the OSD crushmap
[18:21] * jtang (~jtang@109.255.42.21) has joined #ceph
[18:21] <championofcyrodi> although it's 'learning', i've got a few larger datasets I would prefer not to lose, as i have put a lot of work into other systems running on top of this ceph cluster.
[18:22] * shilpa_ (~SHILPA@122.172.101.162) Quit (Remote host closed the connection)
[18:22] <championofcyrodi> and i'd rather not re-export/import the hundreds of GBs of data.
[18:24] <Be-El> i would move osd.3 to node-40 and one of the osd of node-38 to node-39
[18:24] * jtang (~jtang@109.255.42.21) Quit (Remote host closed the connection)
[18:25] * jtang (~jtang@109.255.42.21) has joined #ceph
[18:25] <Be-El> this would allow you to use ~900GB net storage with a replication size of 3
[18:25] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Remote host closed the connection)
[18:26] * xarses (~andreww@12.164.168.117) has joined #ceph
[18:26] * dyasny (~dyasny@173.231.115.58) Quit (Quit: Ex-Chat)
[18:27] * dyasny (~dyasny@173.231.115.58) has joined #ceph
[18:29] <championofcyrodi> they are heterogeneous systems... HP ProLiant DL380 w/ 5 disk raid, SuperMicro w/ 2 SSDs and 2 2TB disks, and a 'super' Dell C62xx that i can only use temporarily.
[18:29] <championofcyrodi> in the other cluster I have 2 more super micros that are identical (1U racks)
[18:30] <championofcyrodi> so once things are migrated and stable, i'll bring them over and get rid of the dell
[18:30] <championofcyrodi> then it will just be 3 supermicros and 2 HP proliants.
[18:30] <championofcyrodi> and ultimately scale the supermicros and drop the HPs
[18:31] <Be-El> nice zoo, reminds me of my server racks ;-)
[18:32] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) has joined #ceph
[18:34] * Rickus (~Rickus@office.protected.ca) has joined #ceph
[18:36] * ohnomrbill (~ohnomrbil@c-67-174-241-112.hsd1.ca.comcast.net) Quit (Quit: ohnomrbill)
[18:36] <championofcyrodi> heh, we're trying...
[18:36] * vakulkar (~vakulkar@c-50-185-132-102.hsd1.ca.comcast.net) has joined #ceph
[18:37] <championofcyrodi> i just started in May, and I'm trying to do research and get the environment a little more modernized than just puppet, bash and kvm
[18:37] <championofcyrodi> disks are getting thrashed due to poor planning/configuration/layout of services.
[18:38] <championofcyrodi> once we get stability and this 'takes', i'll get more money to buy 10Gbps switch, more servers, etc.
[18:38] <Be-El> well, i started with a single nfs server based on a qnap "enterprise" nas
[18:38] <championofcyrodi> i came from HDFS, read some of the RUSH and CRUSH whitepapers last night... i'm convinced :)
[18:38] * joshd1 (~jdurgin@24-205-54-236.dhcp.gldl.ca.charter.com) Quit (Quit: Leaving.)
[18:39] <Be-El> ceph is indeed a very fine system
[18:39] <championofcyrodi> decentralization is nice, and eliminates bottlenecks during high traffic times
[18:39] <Be-El> no single point of failure
[18:39] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) Quit (Read error: Connection reset by peer)
[18:39] <championofcyrodi> I thought the 'monitor' was needed... seems like it's just for us humans who want to know what's going on :)
[18:39] <Be-El> the learning curve is somewhat steep at the beginning, but as soon as the main concepts are understood it runs fine
[18:40] <Be-El> no, the monitors are essential. they store all information about pools, osds, pgs etc.
[18:40] <Be-El> the trick about ceph is the fact that you can have multiple monitors
[18:40] <championofcyrodi> so how does an 'osd' client actually find a host->pg->object?
[18:40] <Be-El> every client that wants to access data on a ceph cluster contacts the mon for the osd and pg map
[18:41] <championofcyrodi> using the crush map and a hash right?
[18:41] <Be-El> and afterwards it uses the crush map and the hash for selecting the pg
[18:41] <championofcyrodi> okay, so the monitor needs to be there for 'periodic' updates to the client's crushmap?
[18:42] <championofcyrodi> since they probably only support jenkins hash anyway
[18:42] <Be-El> the monitors actually "define" the cluster
[18:42] <Be-El> no monitors, no cluster
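Monitor health and quorum can be checked directly with standard commands (not specific to this cluster):

    ceph mon stat        # one-line monmap summary including the current quorum
    ceph quorum_status   # JSON view of the quorum members and election epoch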
[18:42] <championofcyrodi> okay, so they need the pgmap
[18:43] <championofcyrodi> either way, i love the way a client can just compute the location of an object. it's mathemagical.
[18:43] <Be-El> theoretically the client can do it, yes
[18:44] <championofcyrodi> i see.. the actual ceph implementation is likely different from the [C]RUSH whitepapers
[18:44] <Be-El> if you do some maintenance on the cluster or something fails etc, the reality will not necessarily reflect the crush view
[18:44] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) has joined #ceph
[18:44] <Be-El> that's why you had degraded objects, stuck pgs etc.
[18:45] <Be-El> and that's why the cluster is trying to resolve the problem
[18:45] <Be-El> if the cluster is healthy, the crush view and the reality are in sync
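That calculation can be reproduced from the command line; 'ceph osd map' runs the same CRUSH lookup a client would (the pool and object names here are only examples):

    ceph osd map rbd some-object   # prints the pg id and the up/acting osd set CRUSH picks for that object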
[18:45] <championofcyrodi> hmmm so i had a client just add more data and start more i/o ops from clients... degraded went from 0.002 to 0.092%
[18:46] * linjan_ (~linjan@213.8.240.146) has joined #ceph
[18:46] <championofcyrodi> about 78 objects of the 2000 added.
[18:46] <Be-El> you probably have a pool with a min_size of 2 and a size of 3
[18:46] <championofcyrodi> i set a pool from 3 to 2...
[18:46] <Be-El> with two replicates available, the client is able to write to the pgs; but you need 3 replicates for a non degraded state
[18:47] <championofcyrodi> but the other pools i left at 3
[18:47] <Be-El> did you set the 'size' or the 'min_size' value?
[18:47] <championofcyrodi> only the size
[18:47] <Be-El> you should also adjust the min_size value then
[18:47] <championofcyrodi> on the same pool
[18:48] <Be-El> on the same pool, yes
[18:48] <Be-El> min_size <= size
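The two settings can be inspected and changed per pool like so (the pool name is a placeholder):

    ceph osd pool get <pool> size
    ceph osd pool get <pool> min_size
    ceph osd pool set <pool> min_size 2    # keep min_size <= size, as noted above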
[18:48] <championofcyrodi> i understand, it's reporting degraded, even though it has the '2' replicas that I am happy with
[18:48] <championofcyrodi> because the threshold is set to min_size 3 still
[18:48] <championofcyrodi> Be-El you are awesome, thank you so much for taking the time.
[18:48] <Be-El> i usually have it the other way round
[18:49] <championofcyrodi> yes that is how i would have had it on HDFS
[18:49] <championofcyrodi> i did not realize the min_size existed, but it makes sense that it should
[18:49] <championofcyrodi> otherwise there is no threshold, and hence no way to calc degraded obs
[18:50] <Be-El> it also gives you a way to define when operating on a degraded pg is ok
[18:50] <Be-El> that's why the default size is 3
[18:50] <championofcyrodi> makes sense
[18:51] <Be-El> with min_size set to 2, one disk may fail, but your client can still work with the pgs
[18:51] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Quit: Leaving.)
[18:51] <championofcyrodi> once i set the min size... do i just wait for it to do its automated scrubbing?
[18:51] <Be-El> and you have a quorum, of course
[18:52] <championofcyrodi> the number of degraded objects is actually going up now.
[18:52] * linjan (~linjan@80.179.241.26) Quit (Ping timeout: 480 seconds)
[18:52] <Be-El> the change should be reflected almost immediately
[18:52] * derjohn_mob (~aj@88.128.80.17) Quit (Ping timeout: 480 seconds)
[18:52] <championofcyrodi> .3% now
[18:53] <championofcyrodi> granted a user could be just adding a lot of data at the moment.
[18:54] <Be-El> or the problem is the other pools that still have a size of 3
[18:54] <Be-El> the first column of 'ceph pg dump unclean' is the PG id, which is composed of the pool id and a running number
[18:55] * DC-RH (~dc@2605:6000:fb40:a700:f97a:27a4:6aa9:d62b) has joined #ceph
[18:55] <Be-El> you can also use 'rados df' to get an overview of the pools, their sizes and how many objects are degraded
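Putting the two together: the leading number of a PG id (e.g. a hypothetical 5.1a lives in pool 5) can be matched against the pool listing:

    rados df           # per-pool object counts, space used, and degraded/unfound objects
    ceph osd lspools   # pool ids and names, to resolve the pool part of a PG id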
[18:55] <championofcyrodi> heh... trying to remember ALL these commands
[18:56] <championofcyrodi> so the pool i modified the min_size has 0 degraded...
[18:56] <championofcyrodi> but we have another pool, that has 238
[18:56] <Be-El> well, it's time to call it a day. good luck with your cluster
[18:56] <championofcyrodi> thanks again.
[18:56] <Be-El> you're welcome
[18:57] * danieagle (~Daniel@201-95-103-54.dsl.telesp.net.br) has joined #ceph
[18:58] * rwheeler (~rwheeler@nat-pool-tlv-u.redhat.com) Quit (Quit: Leaving)
[18:59] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[18:59] * Be-El (~quassel@fb08-bcf-pc01.computational.bio.uni-giessen.de) Quit (Remote host closed the connection)
[19:00] * jordanP (~jordan@213.215.2.194) Quit (Quit: Leaving)
[19:00] * DC-RH (~dc@2605:6000:fb40:a700:f97a:27a4:6aa9:d62b) Quit ()
[19:02] * cookednoodles (~eoin@89-93-153-201.hfc.dyn.abo.bbox.fr) has joined #ceph
[19:06] * wicked_shell (~wicked_sh@susia1791.etelecom.spb.ru) Quit (Quit: no specific reason)
[19:12] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[19:12] * smokedmeets (~smokedmee@34.sub-70-197-6.myvzw.com) has joined #ceph
[19:13] * earnThis (~oftc-webi@64.119.147.250) Quit (Quit: Page closed)
[19:14] * vilobhmm (~vilobhmm@nat-dip33-wl-g.cfw-a-gci.corp.yahoo.com) has joined #ceph
[19:16] * smokedmeets (~smokedmee@34.sub-70-197-6.myvzw.com) Quit ()
[19:16] * smokedmeets (~smokedmee@34.sub-70-197-6.myvzw.com) has joined #ceph
[19:19] * i_m (~ivan.miro@deibp9eh1--blueice2n2.emea.ibm.com) Quit (Ping timeout: 480 seconds)
[19:27] * Sysadmin88 (~IceChat77@94.12.240.104) has joined #ceph
[19:31] * branto (~borix@ip-213-220-214-203.net.upcbroadband.cz) has left #ceph
[19:31] * alram (~alram@ppp-seco11pa2-46-193-140-198.wb.wifirst.net) has joined #ceph
[19:36] * alram_ (~alram@ppp-seco11pa2-46-193-140-198.wb.wifirst.net) has joined #ceph
[19:41] * alram (~alram@ppp-seco11pa2-46-193-140-198.wb.wifirst.net) Quit (Ping timeout: 480 seconds)
[19:53] * cdelatte (~cdelatte@2606:a000:dd42:9e00:3e15:c2ff:feb8:dff8) has joined #ceph
[19:55] * cdelatte (~cdelatte@2606:a000:dd42:9e00:3e15:c2ff:feb8:dff8) Quit ()
[20:00] * cholcombe973 (~chris@pool-108-42-144-175.snfcca.fios.verizon.net) has joined #ceph
[20:01] * brutuscat (~brutuscat@137.Red-83-42-88.dynamicIP.rima-tde.net) Quit (Remote host closed the connection)
[20:03] * blinky_ghost_ (~psousa@195.245.147.94) Quit (Ping timeout: 480 seconds)
[20:04] * Concubidated (~Adium@2607:f298:b:635:de3:a30b:7e10:708a) has joined #ceph
[20:08] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[20:09] * vbellur (~vijay@122.167.240.22) Quit (Ping timeout: 480 seconds)
[20:15] * kawa2014 (~kawa@89.184.114.246) Quit (Quit: Leaving)
[20:15] * macjack (~Thunderbi@123.51.160.200) Quit (Remote host closed the connection)
[20:18] * mykola (~Mikolaj@91.225.200.48) Quit (Ping timeout: 480 seconds)
[20:28] * dgurtner (~dgurtner@178.197.231.49) Quit (Ping timeout: 480 seconds)
[20:31] * kindkid (~Adium@66.55.33.66) has joined #ceph
[20:32] <kindkid> Can solr-cloud run on cephfs?
[20:34] * alaind (~dechorgna@ARennes-651-1-125-185.w2-2.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[20:38] * smokedmeets (~smokedmee@34.sub-70-197-6.myvzw.com) Quit (Ping timeout: 480 seconds)
[20:39] * badone (~brad@203-121-198-226.e-wire.net.au) has joined #ceph
[20:42] * thb (~me@0001bd58.user.oftc.net) has joined #ceph
[20:42] * ircolle is now known as ircolle-afk
[20:46] * VisBits (~textual@8.29.138.28) has joined #ceph
[20:47] <VisBits> evening
[20:47] * rohanm (~rohanm@mobile-166-173-185-157.mycingular.net) has joined #ceph
[20:48] * segutier (~segutier@216-166-19-146.fwd.datafoundry.com) Quit (Quit: segutier)
[20:51] * eJunky (~oftc-webi@f052104200.adsl.alicedsl.de) has joined #ceph
[20:51] * DavidThunder1 (~Thunderbi@cpe-23-242-189-171.socal.res.rr.com) Quit (Quit: DavidThunder1)
[20:55] <eJunky> can anybody here help me with a crashed ceph cluster?
[20:55] * zack_dolby (~textual@pa3b3a1.tokynt01.ap.so-net.ne.jp) has joined #ceph
[20:57] <VisBits> What sort of crashed? Are you in a failed state or worse than that?
[20:58] <eJunky> yes, I had 4 old server chassis for a test
[20:58] <eJunky> I took one offline to reinstall so i had only 3 left.
[20:58] <eJunky> then suddenly one of them died completely.
[20:58] * alram_ (~alram@ppp-seco11pa2-46-193-140-198.wb.wifirst.net) Quit (Quit: leaving)
[20:59] <eJunky> and finally a not very smart ansible run tried to restart ceph on the remaining 2 hosts.
[20:59] <eJunky> so they don't come up and I don't know where to start.
[20:59] <VisBits> okay first
[21:00] <VisBits> are your monitors up?
[21:00] <VisBits> at least one of your monitors
[21:00] <eJunky> yes I suppose so: /usr/bin/ceph-mon -i ceph02 --pid-file /var/run/ceph/mon.ceph02.pid -c /etc/ceph/ceph.conf --cluster ceph
[21:01] <eJunky> and the one on ceph03
[21:01] <eJunky> but no osds are running.
[21:01] * cookednoodles (~eoin@89-93-153-201.hfc.dyn.abo.bbox.fr) Quit (Quit: Ex-Chat)
[21:01] <VisBits> does /etc/ceph/ceph.conf on all of your osd nodes have the monitor configuration? they need to know where to connect
[21:01] <VisBits> if so service ceph start on them
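With the sysvinit scripts in use here, that boils down to something like the following, run on the relevant nodes:

    ceph -s                  # does the cluster answer at all? needs a monitor quorum
    service ceph start mon   # on each monitor node, if the mons are down
    service ceph start osd   # on each osd node, once ceph.conf points at reachable mons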
[21:02] <eJunky> yes they have but:
[21:02] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Quit: Leaving.)
[21:03] <eJunky> the log was complaining about the syntax because the port is missing.
[21:03] <eJunky> is that important?
[21:03] * fghaas1 (~florian@212095007047.public.telering.at) has joined #ceph
[21:03] <eJunky> WARNING: 'mon addr' config option 192.168.99.203:0/0 does not match monmap file
[21:05] * kefu (~kefu@114.92.101.83) has joined #ceph
[21:06] * georgem (~Adium@fwnat.oicr.on.ca) has joined #ceph
[21:06] <VisBits> #Monitor Configuration
[21:06] <VisBits> [mon.ceph0-mon0]
[21:06] <VisBits> host = ceph0-mon0
[21:06] <VisBits> mon addr = 10.1.8.40:6789
[21:06] <VisBits> [mon.ceph0-mon1]
[21:06] <VisBits> host = ceph0-mon1
[21:06] <VisBits> mon addr = 10.1.8.41:6789
[21:06] <VisBits> [mon.ceph0-mon2]
[21:06] <VisBits> host = ceph0-mon2
[21:06] <VisBits> mon addr = 10.1.8.42:6789
[21:06] <VisBits> it should look like that
[21:06] <VisBits> mon_initial_members = ceph0-mon0, ceph0-mon1, ceph0-mon2
[21:06] <VisBits> mon_host = 10.1.8.40,10.1.8.41,10.1.8.42
[21:09] * kefu (~kefu@114.92.101.83) Quit (Max SendQ exceeded)
[21:09] * carmstrong (sid22558@id-22558.uxbridge.irccloud.com) Quit (Read error: Connection reset by peer)
[21:09] <eJunky> look very similar
[21:10] * rohanm (~rohanm@mobile-166-173-185-157.mycingular.net) Quit (Remote host closed the connection)
[21:10] * derjohn_mob (~aj@tmo-109-190.customers.d1-online.com) has joined #ceph
[21:11] <eJunky> ah, I think I got the problem.
[21:12] <eJunky> ansible changed the IPs to those of the cluster internal network. but the monmap had the external ips configured
[21:12] <eJunky> after changing to the external ips I can run ceph -s again.
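When ceph.conf and the monmap disagree like this, the monmap a monitor actually holds can be dumped for comparison; a sketch using the mon id from this log and an arbitrary temporary path:

    ceph mon dump                                      # the monmap as the running quorum sees it
    # or, with the ceph-mon daemon stopped:
    ceph-mon -i ceph02 --extract-monmap /tmp/monmap
    monmaptool --print /tmp/monmap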
[21:13] * carmstrong (sid22558@id-22558.uxbridge.irccloud.com) has joined #ceph
[21:16] * rohanm (~rohanm@mobile-166-173-185-157.mycingular.net) has joined #ceph
[21:16] <eJunky> mmh, on one node the osds came up now but not on the other.
[21:16] <eJunky> root@ceph02:~# service ceph start osd.2 === osd.2 === 2015-02-12 21:16:15.213057 7fec2c4d1700 0 -- :/1036749 >> 192.168.99.202:6789/0 pipe(0x12ba050 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x12ba2e0).fault 2015-02-12 21:16:18.213433 7fec2c3d0700 0 -- :/1036749 >> 192.168.99.203:6789/0 pipe(0x12b8010 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x12b82a0).fault 2015-02-12 21:16:21.213548 7fec2c4d1700 0 -- :/1036749 >> 192.168.99.202:6789/0 pipe(0x7fec28008090 sd=4 :0
[21:17] * scuttlemonkey is now known as scuttle|afk
[21:18] * kefu (~kefu@114.92.101.83) has joined #ceph
[21:19] * scuttle|afk is now known as scuttlemonkey
[21:19] * segutier (~segutier@216-166-19-146.fwd.datafoundry.com) has joined #ceph
[21:21] * cookednoodles (~eoin@89-93-153-201.hfc.dyn.abo.bbox.fr) has joined #ceph
[21:22] * brutuscat (~brutuscat@174.34.133.37.dynamic.jazztel.es) has joined #ceph
[21:25] * Azrael_ is now known as Azrael
[21:25] * kefu (~kefu@114.92.101.83) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[21:25] <VisBits> firewall turned off?
[21:27] * puffy (~puffy@50.185.218.255) Quit (Quit: Leaving.)
[21:29] <eJunky> yes, no firewall locally active.
[21:29] <eJunky> It was the "global" mon hosts section pointing to the internal IPs
[21:29] <eJunky> now the remaining OSDs are running, thanks.
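A sketch of the sort of [global] section that avoids the mix-up, using the two mon addresses visible in this log and assuming 192.168.99.0/24 is the public network the mons are bound to; the cluster-internal subnet below is a placeholder:

    [global]
        mon_host = 192.168.99.202, 192.168.99.203
        public network = 192.168.99.0/24
        cluster network = <internal subnet>/24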
[21:30] * puffy (~puffy@50.185.218.255) has joined #ceph
[21:30] <eJunky> But at this point I have a totally different question: I bought some WD Se Series 4T drives to upgrade some OSDs.
[21:30] <VisBits> cant help, never done that
[21:30] <VisBits> lol
[21:30] <VisBits> im a newb too
[21:30] <eJunky> unfortunately they are always shown as 1.8T, I guess due to the 4k block size?
[21:32] * joef (~Adium@2620:79:0:2420::f) has joined #ceph
[21:32] <VisBits> hard drive manufacturers consider the size based on 1000 bytes
[21:32] <lurbs> It'll be, mostly, the difference between the manufacturer specifying the size as terabytes (TB, 10^12), and the OS showing it as tebibytes (TiB, 20^40)
[21:32] <VisBits> vs 1024 Bytes
[21:32] * joef (~Adium@2620:79:0:2420::f) has left #ceph
[21:33] <Anticimex> is it simple to apply a new ruleset to a pool, and data will just migrate?
[21:33] <lurbs> s/20^40/10^40/
[21:33] * fghaas1 (~florian@212095007047.public.telering.at) Quit (Ping timeout: 480 seconds)
[21:33] <lurbs> FECK.
[21:33] <lurbs> I meant "TiB, 2^40".
[21:33] * lurbs is pre-coffee.
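The unit difference alone accounts for a noticeable gap, but not one this large; a quick check (e.g. with bc):

    echo "4 * 10^12 / 2^40" | bc -l    # ~3.64, i.e. a 4 TB drive should show as roughly 3.6 TiB, not 1.8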
[21:36] <eJunky> well, old disks used to address 512 bytes as the smallest block size. there are newer ones like the WD Se series that address 4k as the smallest available block size.
[21:36] <Anticimex> and this new ruleset using a different crush map, based on rules that actually select same osds?
[21:36] <eJunky> but the controller (Areca 1220) assumes 512, which seems wrong.
[21:36] <Anticimex> how is data migrating? pgs are moved from the previous crush map to the new one?
[21:37] * jrankin (~jrankin@d53-64-170-236.nap.wideopenwest.com) Quit (Quit: Leaving)
[21:38] * puffy (~puffy@50.185.218.255) Quit (Read error: Connection reset by peer)
[21:38] * rturk is now known as rturk|afk
[21:46] * fghaas (~florian@213162068049.public.t-mobile.at) has joined #ceph
[21:50] * rohanm (~rohanm@mobile-166-173-185-157.mycingular.net) Quit (Ping timeout: 480 seconds)
[21:59] * BManojlovic (~steki@cable-89-216-225-243.dynamic.sbb.rs) has joined #ceph
[22:01] * georgem1 (~Adium@69-165-159-72.dsl.teksavvy.com) has joined #ceph
[22:01] * georgem (~Adium@fwnat.oicr.on.ca) Quit (Read error: Connection reset by peer)
[22:04] * linjan_ (~linjan@213.8.240.146) Quit (Ping timeout: 480 seconds)
[22:06] * ircolle-afk is now known as ircolle
[22:10] <VisBits> how do i get ceph to play nice with device mapper
[22:10] * smokedmeets (~smokedmee@34.sub-70-197-6.myvzw.com) has joined #ceph
[22:11] * jwilkins (~jwilkins@2601:9:4580:f4c:ea2a:eaff:fe08:3f1d) Quit (Ping timeout: 480 seconds)
[22:11] * fghaas1 (~florian@212095007065.public.telering.at) has joined #ceph
[22:13] * linjan_ (~linjan@176.195.224.228) has joined #ceph
[22:13] * andreask (~andreask@h081217069051.dyn.cm.kabsi.at) has joined #ceph
[22:13] * ChanServ sets mode +v andreask
[22:17] * fghaas (~florian@213162068049.public.t-mobile.at) Quit (Ping timeout: 480 seconds)
[22:17] * brutuscat (~brutuscat@174.34.133.37.dynamic.jazztel.es) Quit (Remote host closed the connection)
[22:22] * erikmack (~user@2602:306:37ec:5bb0::43) has joined #ceph
[22:24] * jwilkins (~jwilkins@c-67-180-123-48.hsd1.ca.comcast.net) has joined #ceph
[22:25] * eJunky (~oftc-webi@f052104200.adsl.alicedsl.de) Quit (Quit: Page closed)
[22:27] * brutuscat (~brutuscat@174.34.133.37.dynamic.jazztel.es) has joined #ceph
[22:28] <erikmack> Why would 'ceph osd lspools' only have rbd (not data or metadata)? I'm working through the quick install with ceph-deploy and four VMs. 'ceph -s' looks happy otherwise.
[22:28] <erikmack> Should I just make the missing pools myself and move on? Thanks for helping a noob.
[22:29] <dmick> erikmack: ceph no longer creates data and metadata pools by default
[22:29] <dmick> I suspect the documentation is lagging that particular change
[22:30] <erikmack> dmick: Yes it is. Thanks for confirming that my install may be working
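If those pools are actually wanted, they can be created by hand; the pg counts below are illustrative only:

    ceph osd pool create data 64 64
    ceph osd pool create metadata 64 64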
[22:31] * badone (~brad@203-121-198-226.e-wire.net.au) Quit (Ping timeout: 480 seconds)
[22:33] * Pl3x0r (Pl3x0r@41.107.128.182) has joined #ceph
[22:34] * LeaChim (~LeaChim@host86-159-114-39.range86-159.btcentralplus.com) has joined #ceph
[22:34] * Pl3x0r (Pl3x0r@41.107.128.182) has left #ceph
[22:34] * badone (~brad@66.187.239.16) has joined #ceph
[22:36] <VisBits> I can't get ceph to create an osd on a multipath device-mapper disk. It fails to detect the newly created partitions yet they show up no problem
[22:42] * sputnik13 (~sputnik13@74.202.214.170) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[22:45] * roehrich (~roehrich@146.174.238.100) Quit (Quit: Leaving)
[22:45] * sputnik13 (~sputnik13@74.202.214.170) has joined #ceph
[22:48] * georgem1 (~Adium@69-165-159-72.dsl.teksavvy.com) Quit (Quit: Leaving.)
[22:54] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) has joined #ceph
[22:56] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[22:58] * fghaas1 (~florian@212095007065.public.telering.at) Quit (Quit: Leaving.)
[22:58] * vilobhmm (~vilobhmm@nat-dip33-wl-g.cfw-a-gci.corp.yahoo.com) Quit (Quit: Away)
[22:59] <VisBits> appears the issue is that when you partition a device-mapper device it names it DEV1 vs DEVP1
[23:01] * dyasny (~dyasny@173.231.115.58) Quit (Ping timeout: 480 seconds)
[23:01] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) Quit (Quit: Leaving.)
[23:02] * puffy (~puffy@50.185.218.255) has joined #ceph
[23:02] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) has joined #ceph
[23:02] * andreask (~andreask@h081217069051.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[23:02] * jdillaman (~jdillaman@pool-108-56-67-212.washdc.fios.verizon.net) Quit (Quit: jdillaman)
[23:06] * linjan_ (~linjan@176.195.224.228) Quit (Ping timeout: 480 seconds)
[23:14] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)
[23:15] * tupper (~tcole@2001:420:2280:1272:647f:846:62bd:6086) Quit (Ping timeout: 480 seconds)
[23:16] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) Quit (Quit: Leaving.)
[23:16] * mookins (~mookins@induct3.lnk.telstra.net) has joined #ceph
[23:18] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) has joined #ceph
[23:19] * georgem (~Adium@69-165-159-72.dsl.teksavvy.com) Quit ()
[23:25] * MACscr1 (~Adium@2601:d:c800:de3:fc3a:160e:1eb4:530c) has joined #ceph
[23:25] * saltlake2 (~saltlake@12.250.199.170) has joined #ceph
[23:25] * MACscr (~Adium@2601:d:c800:de3:fc3a:160e:1eb4:530c) Quit (Ping timeout: 480 seconds)
[23:29] * saltlake (~saltlake@12.250.199.170) Quit (Ping timeout: 480 seconds)
[23:33] * Nacer (~Nacer@203-206-190-109.dsl.ovh.fr) has joined #ceph
[23:33] * saltlake2 (~saltlake@12.250.199.170) Quit (Ping timeout: 480 seconds)
[23:38] * vilobhmm (~vilobhmm@nat-dip33-wl-g.cfw-a-gci.corp.yahoo.com) has joined #ceph
[23:42] * Nacer (~Nacer@203-206-190-109.dsl.ovh.fr) Quit (Remote host closed the connection)
[23:54] * kindkid (~Adium@66.55.33.66) has left #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.