#ceph IRC Log

IRC Log for 2013-04-15

Timestamps are in GMT/BST.

[0:01] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[0:02] * LeaChim (~LeaChim@176.250.177.64) Quit (Ping timeout: 480 seconds)
[0:07] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has left #ceph
[0:20] * BillK (~BillK@124-169-41-139.dyn.iinet.net.au) has joined #ceph
[0:21] * danieagle (~Daniel@177.133.174.100) has joined #ceph
[0:26] * dosaboy (~dosaboy@ip-64-134-229-164.public.wayport.net) has joined #ceph
[0:39] * humbolt (~elias@80-121-53-166.adsl.highway.telekom.at) has joined #ceph
[0:47] * scuttlemonkey (~scuttlemo@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[0:47] * ChanServ sets mode +o scuttlemonkey
[1:06] * danieagle (~Daniel@177.133.174.100) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[1:29] * Yen (~Yen@ip-81-11-211-231.dsl.scarlet.be) Quit (Quit: Exit.)
[1:33] * Yen (~Yen@ip-81-11-211-231.dsl.scarlet.be) has joined #ceph
[1:42] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:48] * loicd (~loic@67.131.102.78) Quit (Quit: Leaving.)
[1:48] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[1:49] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[1:54] * The_Bishop (~bishop@e179012172.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[1:56] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:03] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[2:03] * markl (~mark@tpsit.com) Quit (Ping timeout: 480 seconds)
[2:03] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[2:04] * The_Bishop (~bishop@f052103194.adsl.alicedsl.de) has joined #ceph
[2:05] * Yen (~Yen@ip-81-11-211-231.dsl.scarlet.be) Quit (Ping timeout: 480 seconds)
[2:05] * The_Bishop (~bishop@f052103194.adsl.alicedsl.de) Quit ()
[2:07] * Yen (~Yen@ip-83-134-66-147.dsl.scarlet.be) has joined #ceph
[2:33] <nigwil> can anyone suggest how big the XFS journal device should be in an RBD/RADOS deployment? or, maybe the better question, is there a rule of thumb for the sizing?
[2:39] <lurbs> The short answer is "it depends". I believe that the journal will start flushing to disk if it's half full, or after the number of seconds specified by "filestore max sync interval" (defaults to 5).
[2:39] <mrjack> it seems on any pgmap update io hangs for some ms?
[2:41] <lurbs> nigwil: From http://ceph.com/docs/master/rados/configuration/ceph-conf/ : "A journal size should find the product of the filestore max sync interval and the expected throughput, and multiply the product by two (2): 'osd journal size = {2 * (expected throughput * filestore max sync interval)}'"
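
As a worked example of the formula lurbs quotes, assuming a hypothetical expected throughput of 100 MB/s and the default filestore max sync interval of 5 seconds:

    # 2 * (100 MB/s * 5 s) = 1000 MB; osd journal size is given in megabytes
    [osd]
        osd journal size = 1000
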
[2:42] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[2:43] <nigwil> lurbs, is the OSD journal and the XFS journal the same thing? I got the idea they were different (since EXT4 doesn't have a journal)
[2:44] <lurbs> The OSD journal is a Ceph-ism, and is distinct from the journal on the XFS filesystem itself.
[2:45] <nigwil> makes sense. My original question was about the XFS journal
[2:45] <lurbs> Every OSD needs a journal, which optimally should be on SSD, or at least a separate drive to the OSD.
[2:45] <lurbs> Oh, right. In which case I have no idea. :)
[2:45] <nigwil> can/should OSD's share a journal?
[2:46] <lurbs> They can share a physical device, but not a journal.
[2:46] <nigwil> lurbs: on the XFS journal I suspect the calculation is very similar since the new XFS implementation does delayed logging so the parameters are similar
[2:47] <lurbs> You can have the journals existing as a file on the same filesystem, or as partitions (logical or otherwise) on the same device.
[2:48] <nigwil> so in configuring an OSD host, I would have two OS drives in RAID1, one spindle for OSD journals, and the remaining drives as OSDs
[2:48] <lurbs> But you'll hit bottlenecks quickly if it's not fast enough.
[2:48] <lurbs> All writes have to pass through the OSD journals, so you really want that as fast as is reasonable.
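
A sketch of how per-OSD journals on a shared SSD can be expressed in ceph.conf; the device names are purely illustrative, and leaving "osd journal" unset falls back to a journal file in the OSD's data directory:

    [osd.0]
        osd journal = /dev/sdb1   # first journal partition on the shared SSD
    [osd.1]
        osd journal = /dev/sdb2   # second journal partition on the same SSD
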
[2:49] <nigwil> is an SSD practical though given the wear-leveling/write lifetime limit?
[2:50] <lurbs> Depends on the SSD, but for the newer server-class ones, yes. You might want to look at not allocating all of the SSD's space to enable it to wear-level more effectively though.
[2:51] <lurbs> I'm looking at Intel S3700's for production, personally.
[2:51] <nigwil> ok. I've not used SSD before. I thought the reserved space for wear leveling was hidden from the user, so as you described if I get a 300GB SSD and partition off 100GB for journals, the remaining 200GB for be wear-leveing?
[2:52] <nigwil> leveling
[2:52] <lurbs> There's always *some* hidden space, but it can help to allocate more. Depends on the individual SSD.
[2:53] <lurbs> Larger SSDs can also tend to be faster, so there's that advantage too.
[2:54] <nigwil> ok, it appears that TRIM support is needed to allocate space for wear-leveling
[2:54] <lurbs> http://anandtech.com/show/6489/playing-with-op
[2:55] <lurbs> Spare area vs performance.
[2:57] <nigwil> excellent, that explains it well
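
One way to leave spare area, as discussed above, is simply not to partition the whole device; for example with parted (device name and sizes are only an illustration):

    parted -s /dev/sdb mklabel gpt
    parted -s /dev/sdb mkpart journal0 1MiB 10GiB     # journal for osd.0
    parted -s /dev/sdb mkpart journal1 10GiB 20GiB    # journal for osd.1
    # the rest of the SSD stays unpartitioned and is left for wear-leveling
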
[3:12] * steve_ (~astalsi@c-69-255-38-71.hsd1.md.comcast.net) has joined #ceph
[3:13] <steve_> Hey guys, anyone know anything about using rbds as a storage layer for xen?
[3:14] * portante (~user@12.130.126.82) has joined #ceph
[3:22] <nigwil> steve_: I'm a newbie, so could be wrong, but I didn't think there was direct Xen support (yet), have you heard otherwise?
[3:27] <steve_> nigwil: Theres stuff as far back as the old xen wiki mentioning it, but no real docs
[3:28] <steve_> I'm trying it with phy: devices, but hoping to get something better.
[3:28] <nigwil> yes I just went looking too, and I see a few people trying but that is all
[3:28] * steve_ sighs.
[3:28] <steve_> Its even in the 4.2.1 codebase!
[3:30] <nigwil> "Sylvain Munaut" on the mailing list seems to be the most active with Xen
[3:30] <steve_> nigwil: thanks. I'll probably go poke the mailing list next.
[3:52] * fox_ (~fox@gw.farpost.ru) has joined #ceph
[3:52] * fox_ is now known as foxdalas
[3:55] <foxdalas> hi there. could someone give me some advice? I'm going to install ceph and can't decide between 0.56.4 and 0.60.
[3:56] <nigwil> I've been playing with Ceph (for the first time) over the last few days, and I started with 0.60 but hit a recent show-stopping bug so I switched to 0.56.4 and it works fine
[3:58] <foxdalas> thx
[4:00] <foxdalas> How large is your storage?
[4:10] <nigwil> I'm only testing, so 8 x 500GB drives
[4:10] <nigwil> I am going to rebuild the cluster over the next few days since I mangled adding more MONs
[4:23] * DarkAce-Z (~BillyMays@50.107.54.92) has joined #ceph
[4:28] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Ping timeout: 480 seconds)
[4:44] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[4:57] * portante (~user@12.130.126.82) Quit (Remote host closed the connection)
[5:02] * Cube (~Cube@cpe-76-172-67-97.socal.res.rr.com) has joined #ceph
[5:07] * humbolt (~elias@80-121-53-166.adsl.highway.telekom.at) Quit (Quit: humbolt)
[5:28] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) Quit (Ping timeout: 480 seconds)
[5:39] * Yen (~Yen@ip-83-134-66-147.dsl.scarlet.be) Quit (Ping timeout: 480 seconds)
[5:41] * Yen (~Yen@ip-83-134-66-147.dsl.scarlet.be) has joined #ceph
[5:49] * Yen (~Yen@ip-83-134-66-147.dsl.scarlet.be) Quit (Ping timeout: 480 seconds)
[5:50] * Yen (~Yen@ip-81-11-238-118.dsl.scarlet.be) has joined #ceph
[5:55] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) has joined #ceph
[6:02] * joshd1 (~joshd@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[6:26] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[6:37] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[6:42] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) has joined #ceph
[6:59] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Quit: Leaving.)
[7:08] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) Quit (Quit: Pogoapp - http://www.pogoapp.com)
[7:27] * DLange is now known as Guest2306
[7:28] * Guest2306 is now known as DLange
[7:31] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[7:35] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Quit: Leaving.)
[7:48] * norbi (~nonline@buerogw01.ispgateway.de) has joined #ceph
[7:53] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[7:54] * leseb1 (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[7:54] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Read error: Connection reset by peer)
[7:55] * scuttlemonkey (~scuttlemo@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[7:56] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[8:05] * leseb1 (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[8:29] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[8:30] * joshd1 (~joshd@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[8:37] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[9:10] * vipr (~vipr@78-23-113-211.access.telenet.be) has joined #ceph
[9:11] * rustam (~rustam@94.15.91.30) has joined #ceph
[9:13] * vipr_ (~vipr@78-23-112-53.access.telenet.be) Quit (Read error: Operation timed out)
[9:15] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[9:15] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:17] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:23] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[9:26] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[9:31] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[9:35] * vipr (~vipr@78-23-113-211.access.telenet.be) Quit (Quit: leaving)
[9:35] * vipr (~vipr@78-23-113-211.access.telenet.be) has joined #ceph
[9:37] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[9:40] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[9:44] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[9:58] * l0nk (~alex@83.167.43.235) has joined #ceph
[10:00] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[10:06] * stiller1 (~Adium@2001:980:87b9:1:30f7:19d2:ed14:a775) has left #ceph
[10:06] * stiller1 (~Adium@2001:980:87b9:1:30f7:19d2:ed14:a775) has joined #ceph
[10:10] * tziOm (~bjornar@194.19.106.242) has joined #ceph
[10:15] <jerker> yes i found my ultimate storage node. (high density, 1U) http://www.aaeonusa.com/products/details/?item_id=1748
[10:15] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has joined #ceph
[10:16] <wogri_risc> SATA Disks... bah...
[10:17] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[10:17] <jerker> hey i have 5 faulty 3TB SAS drives here which I can throw at you :-)
[10:18] <jerker> well maybe only two are actually broken but they took down the whole Dell Powervault storage array they were fitted to...
[10:18] <wogri_risc> hah :)
[10:19] <wogri_risc> looking at the server chassis I think: "how do you replace a disk if it fails?"
[10:19] <jerker> no problem, yank power cord, remove server, replace drive, put server back again.
[10:19] <jerker> yank ethernet too
[10:20] <wogri_risc> yay.
[10:20] <jerker> actually, I will probably just remove the power connector to the drive and let it stay. The cost for me switching a drive is probably one hour of work. The cost for a new drive slot is around the same amount. When a couple of drives are down then maybe one can fix the server.
[10:21] <wogri_risc> I think like that, too.
[10:21] <jerker> or actually dont touch anything if it is ok to have a faulty drive in the node. I have not thought about that, i usually remove the faulty ones.
[10:22] <jerker> now the problem is how i can manage to get the swedish government to buy those nodes. (university rules)
[10:23] <wogri_risc> maybe autonomica wants to sponsor. they make the money, I heard :)
[10:25] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[10:29] * rahmu (~rahmu@83.167.43.235) has joined #ceph
[10:31] * foxdalas (~fox@gw.farpost.ru) Quit (Quit: This computer has gone to sleep)
[10:38] <jerker> i wish to decouple the whole university networks from SUNET and the NORDUNET. Cheaper to just go down town and buy 100 Gbit/s from a local/national Internet supplier than be in the business our selves... But what shall those people work with then? Think about the children! :-(
[10:39] * LeaChim (~LeaChim@176.250.177.64) has joined #ceph
[10:40] <wogri_risc> don't do this! you'll lose so much you didn't even know you had. All the SAML integration, eduroam is gone forever, and so on and so forth. I've been working at an NREN, so I know about the benefits the universities have.
[10:43] <absynth> it is generally a highly unwise idea to decouple from the national science network, be it in sweden or elsewhere
[10:43] <jerker> For the compute intensive departments I cannot even use Eduroam, it takes up to two minutes for an Android phone or tablet to connect and get an IP... We have to have a parallel network ourselves to get things to work.
[10:43] <absynth> and you will usually not get this through unless you have *really* good reasons (other than the price)
[10:44] <jerker> We were cutting edge 10 years ago. We are not anymore.
[10:44] <jerker> Well make that 20 years ago.
[10:44] <wogri_risc> my phone connects to eduroam just fine
[10:44] <joao> jerker, around here, eduroam doesn't even roam
[10:45] <absynth> what do you mean, doesn't roam?
[10:45] <wogri_risc> there are myths of researchers out there who really make use of the NREN and geant. not where I live, though. :)
[10:47] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[10:47] <joao> you are a student at university A; you go to university B to (e.g.) attend a seminar; you try to use eduroam with your university A login; that doesn't work because the login servers are such a headache to maintain, that no university around here has them working
[10:47] <joao> absynth, ^
[10:47] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[10:47] <joao> and these are universities in the same city, btw
[10:48] <wogri_risc> a eduroam server is just a radius server.
[10:48] <wogri_risc> sounds like very bad sysadmins to me.
[10:48] * vo1d (~v0@91-115-224-180.adsl.highway.telekom.at) has joined #ceph
[10:48] <joao> no argument there
[10:49] <wogri_risc> heh :)
[10:50] <Vjarjadian> anyone here use Nas4Free as a base for OSDs?
[10:50] <jerker> some universities have very well locked down windows environments, say they are so good at security (no users can install stuff themselves), total control of hardware (we are the government), but trying to get a RADIOS to integrate into their active directory with passwords is too difficult.
[10:50] <jerker> radius
[10:51] * v0id (~v0@62-46-171-140.adsl.highway.telekom.at) Quit (Read error: Operation timed out)
[10:51] <wogri_risc> yeah. great deal. university of vienna used to be different. very good unix sysadmins there. only small group of windows sysadmins. but that is about to change, too. new ceo, new rules...
[10:51] <joao> ceo?
[10:52] <joao> universities have ceo's?
[10:53] <wogri_risc> yeah. very new and very great in vienna.
[10:53] <wogri_risc> and very shitty for all the people there. dozens have left.
[10:53] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) has joined #ceph
[10:53] <wogri_risc> or is he the CIO? don't know, I left when he came.
[10:55] <joao> I'm assuming that's a private university then?
[10:55] <absynth> that constant Americanization in the university system sucks big time
[10:55] <joao> privately held I mean
[10:55] <wogri_risc> nope.
[10:55] <joao> oh
[10:55] <absynth> public universities in Germany have CIOs, now, too
[10:55] <absynth> no CEOs, though, that's kinda the job of the Chancellor
[10:56] <joao> oh, that just seems wrong
[10:56] <wogri_risc> it is.
[10:57] <wogri_risc> there we go - organisational chart: http://zid.univie.ac.at/organigramm/
[10:57] <wogri_risc> it totally states 'CIO' there
[10:57] <jerker> it is quite interesting when they tried to calculate the number of operating systems at the university here... Ok, what to count? desktops. Ok. Servers, yes, but not HPC. No tablets... Lets ignore Android. Ok, we are running Windows (and some macs, but they can dual boot!)! Lets get a campus agreement. Yeah right.
[10:57] <joao> I wonder why an university would need a CIO
[10:57] * jerker should get back to work
[10:58] <joao> I get that universities should be able to generate revenue and all, but I don't even grasp the concept of a university being (even slightly) run as a company :\
[10:59] <jerker> we get a lot of money out of corporate research. :/
[10:59] <joao> but then again, it's probably people like me that drove the country to the deficit sink hole we're all in now
[10:59] <wogri_risc> all the 'new generation' university cio's in austria are coming from companies, hired from the outside. they're introducing buzzwords and complicated workflows. no one likes it.
[11:00] <absynth> ITIL!
[11:00] * absynth ducks and runs
[11:06] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[11:08] <jerker> PM3!
[11:10] <jerker> " pm� is the most popular model for system maintenance management in Sweden today, and can be used by all organizations regardless of the industry. "
[11:10] <absynth> sounds like something women would suffer from
[11:11] <absynth> oh no, that was PMS
[11:41] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[11:42] <rahmu> hello, I'm trying to set up a radosgw install and use it with Swift-compatible clients. I followed the instructions found here (http://ceph.com/docs/master/start/quick-rgw/) but no matter how much I try to connect, I keep getting 403 Forbidden errors. Can anyone help me, please?
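
For reference, a Swift-style smoke test against radosgw looks roughly like the following (user, subuser and key are placeholders; a 403 here is often a missing or wrong swift key for the subuser):

    # create a swift key for the subuser, then list containers via the gateway
    radosgw-admin key create --subuser=testuser:swift --key-type=swift --gen-secret
    swift -V 1.0 -A http://radosgw.example.com/auth -U testuser:swift -K '<swift_secret>' list
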
[11:48] * smeven (~diffuse@101.170.97.238) has joined #ceph
[11:50] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[12:03] * norbi (~nonline@buerogw01.ispgateway.de) Quit (Quit: Miranda IM! Smaller, Faster, Easier. http://miranda-im.org)
[12:25] * acalvo (~acalvo@208.Red-83-61-6.staticIP.rima-tde.net) has joined #ceph
[12:25] <acalvo> Hi!
[12:25] <acalvo> In a test environment (2 virtual machines), I'm seeing the following error a lot: mons are laggy or clocks are too skewed
[12:26] <acalvo> and performance seems poor (uploading a 2.3GB file takes 11m)
[12:26] <acalvo> using latest ubuntu version (0.56.4)
[12:27] * fox_ (~fox@95.154.88.151) has joined #ceph
[12:28] <wogri_risc> acalvo, did you install ntp on both machines?
[12:28] <wogri_risc> oh, you only have one server?
[12:31] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:32] <acalvo> I have 2 MON, 2 MDS, 2 OSD and one radosgw
[12:33] <acalvo> both machines are pulling the time from the host machine, so they are synchronized
[12:35] <wogri_risc> wait, you have vm's running ceph?
[12:35] <acalvo> Yes, it's a test lab to get an overview of the functionality
[12:36] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[12:36] <wogri_risc> is the vm host very busy?
[12:36] <acalvo> it has some load, but it can handle more
[12:37] <wogri_risc> maybe this is not the best place to test a cluster setup where timing matters.
[12:37] <acalvo> you may be right
[12:38] <acalvo> but for an initial overview, should be enough, shouldn't it?
[12:38] <wogri_risc> maybe you're best off ignoring the timing messages; continue your tests but don't measure ceph's performance on this setup.
[12:38] <wogri_risc> you can get an overview on it, no question.
[12:39] <acalvo> thanks, that clarifies things
[12:39] <acalvo> I just wanted to drop a possible configuration error
[12:39] <nigwil> with RADOSgw, how does it scale, DNS RR against the web-service?
[12:40] <acalvo> no DNS, just raw IP access
[12:40] <wogri_risc> if you're sure that the vm's are in sync with a ntp server just blame the hypervisor :)
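
For the clock-skew warnings specifically, a quick check plus the monitor option that controls the warning threshold (raising it only hides skew rather than fixing it; the value below is just an example):

    ntpq -p                        # confirm each mon host is actually synced to an NTP peer
    # ceph.conf, [mon] section
    mon clock drift allowed = 0.5  # default is much stricter (0.05 s)
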
[12:40] <acalvo> it's just one radosgw now
[12:40] <nigwil> thanks acalvo, actually my question was more a general one about how to scale-out RADOSgw services
[12:41] <acalvo> nigwil, I'm sorry, my bad, I thought it was for my setup
[12:41] <nigwil> no worries :-)
[12:42] <wogri_risc> nigwil, I don't know about radosgw in particular, but you can always use a bunch of load balancers with haproxy. combined with vrrp you get a pretty decent ha load balancer.
[12:43] <nigwil> thanks wogri_risc, I'll look into that
[12:44] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[12:45] * rahmu (~rahmu@83.167.43.235) Quit (Remote host closed the connection)
[12:50] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[12:53] * rahmu (~rahmu@83.167.43.235) has joined #ceph
[12:57] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Ping timeout: 480 seconds)
[13:20] * tziOm (~bjornar@194.19.106.242) Quit (Remote host closed the connection)
[13:30] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[13:34] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Remote host closed the connection)
[13:35] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[13:38] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[13:45] * john_barbee (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:08] * rahmu (~rahmu@83.167.43.235) Quit (Remote host closed the connection)
[14:17] * dxd828 (~dxd828@195.191.107.205) Quit (Remote host closed the connection)
[14:19] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[14:19] * rahmu (~rahmu@83.167.43.235) has joined #ceph
[14:24] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[14:24] * fox_ (~fox@95.154.88.151) Quit (Quit: This computer has gone to sleep)
[14:32] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[14:39] * Cube (~Cube@cpe-76-172-67-97.socal.res.rr.com) Quit (Quit: Leaving.)
[14:42] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:45] * diegows (~diegows@190.190.2.126) has joined #ceph
[14:46] * fox_ (~fox@95.154.88.151) has joined #ceph
[14:47] * DarkAce-Z is now known as DarkAceZ
[14:50] * fox_ (~fox@95.154.88.151) Quit ()
[15:09] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:18] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[15:26] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[15:34] * acalvo (~acalvo@208.Red-83-61-6.staticIP.rima-tde.net) Quit (Quit: Ex-Chat)
[15:39] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) has joined #ceph
[15:39] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Read error: Connection reset by peer)
[15:40] * madkiss (~madkiss@2001:6f8:12c3:f00f:189:b6d7:1c53:c946) has joined #ceph
[15:47] * rekby (~Adium@2.93.58.253) has joined #ceph
[15:48] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[15:48] * scuttlemonkey (~scuttlemo@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[15:48] * ChanServ sets mode +o scuttlemonkey
[15:52] <rekby> Hello, I tried sending email to alex@inktank.com and requesting support via the form on the website, but I haven't received an answer.
[15:52] <rekby> Can you write me about your price?
[15:53] * yehuda_hm (~yehuda@2602:306:330b:1410:dc4c:623b:a208:e17) has joined #ceph
[15:54] * drokita (~drokita@199.255.228.128) has joined #ceph
[15:54] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[15:57] <nhm> rekby: info@inktank.com is probably the right email address to use to ask about pricing.
[15:57] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[15:57] <rekby> thanks
[16:00] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[16:03] * mib_j3w0dq (d4af59a2@ircip2.mibbit.com) has joined #ceph
[16:03] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/http://listoffreebitcoinwebsites.blogspot.com/
[16:03] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/
[16:03] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/
[16:03] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/
[16:03] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/
[16:03] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/
[16:04] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/
[16:04] * mib_j3w0dq (d4af59a2@ircip2.mibbit.com) has left #ceph
[16:05] * mib_j3w0dq (d4af59a2@ircip2.mibbit.com) has joined #ceph
[16:05] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/
[16:05] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/
[16:05] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/
[16:05] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/
[16:05] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/
[16:05] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/
[16:05] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/
[16:05] <mib_j3w0dq> http://listoffreebitcoinwebsites.blogspot.com/
[16:05] * mib_j3w0dq (d4af59a2@ircip2.mibbit.com) has left #ceph
[16:06] <nhm> lovely
[16:07] * fox_ (~fox@95.154.88.151) has joined #ceph
[16:07] <vhasi> impressive
[16:08] <absynth> it seems he or she is maintaining a list of free bitcoin websites.
[16:08] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[16:09] <nhm> absynth: I'm not sure, maybe if they provided a link I could verify it for myself.
[16:09] <absynth> i was just guessing, too. it's hard to determine.
[16:10] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[16:10] * Havre (~Havre@2a01:e35:8a2c:b230:a443:6c35:b44c:5746) has joined #ceph
[16:11] <vhasi> that page had a serious amount of total crap on it
[16:12] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[16:13] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[16:16] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[16:16] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[16:20] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[16:22] <rekby> can anybody help me set up a multi-mds config? (1 active and 1-2 hot stand-by)
[16:22] <rekby> Now I have a cluster with 3 mds nodes, and ceph -s shows:
[16:22] <rekby> health HEALTH_OK
[16:22] <rekby> monmap e3: 3 mons at {a=81.177.175.254:6789/0,b=81.177.167.2:6789/0,c=81.177.175.102:6789/0}, election epoch 90, quorum 0,1,2 a,b,c
[16:22] <rekby> osdmap e258: 3 osds: 3 up, 3 in
[16:22] <rekby> pgmap v33690: 576 pgs: 576 active+clean; 178 MB data, 3621 MB used, 206 GB / 209 GB avail; 598B/s wr, 0op/s
[16:22] <rekby> mdsmap e2329: 1/1/1 up {0=a=up:active}
[16:23] <rekby> i see 1 mds active and don't see any standby mds
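
For what it's worth, a rough sketch of a 1-active / hot-standby mds layout in ceph.conf; extra ceph-mds daemons that hold no rank act as standbys by default, and the standby-replay options below (option names as documented around this release, host names hypothetical) keep one of them warm:

    [mds.a]
        host = node-a
    [mds.b]
        host = node-b
        mds standby replay = true     # follow the active mds journal
        mds standby for rank = 0      # stand by for rank 0 specifically
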
[16:24] <PerlStalker> Is it normal to see messages like these from time to time: 2013-04-11 17:23:36.760986 7fddaa64b700 0 -- 192.168.253.80:6803/36411 submit_message osd_op_reply(261265 rb.0.10d0b.2ae8944a.0000000003cc [write 560640~4096] ondisk = 0) v4 remote, 192.168.253.92:0/1011452, failed lossy con, dropping message 0x58e2800
[16:27] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[16:27] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) Quit (Remote host closed the connection)
[16:31] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) has joined #ceph
[16:32] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[16:32] * dosaboy (~dosaboy@ip-64-134-229-164.public.wayport.net) Quit (Ping timeout: 480 seconds)
[16:32] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[16:49] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) Quit (Quit: Leaving)
[16:50] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) has joined #ceph
[16:52] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[16:52] * vata (~vata@2607:fad8:4:6:6c73:55ef:8faa:2314) has joined #ceph
[16:52] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[16:54] * dosaboy (~dosaboy@ip-64-134-229-164.public.wayport.net) has joined #ceph
[16:54] * capri (~capri@p54A54FA2.dip0.t-ipconnect.de) has joined #ceph
[16:56] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[17:04] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[17:08] <topro> is scrubbing a cpu-intense task on a ceph osd host?
[17:08] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:13] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[17:18] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit ()
[17:19] <ron-slc> topro: In my experience, as a ceph-user, no. Though a deep-scrub will be a little more intensive, as it reads much more data and content. I'm using E5 Xeons, so there may also be cpu-optimizations???
[17:20] <ron-slc> I know there have also been optimizations to ceph recently (bobtail) in the scrubbing topic.
[17:22] <topro> well, to be honest, the test host where I see that high ceph-osd cpu-load during scrubbing is an AMD Athlon 4850 dual core, so even mildly cpu-intense jobs will show high cpu-load figures :/
[17:23] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Remote host closed the connection)
[17:23] <ron-slc> Ahh on an athlon that may be possible. I used to have a test system on an Athlon Quad-Core, but I destroyed it a few months back. I know I've seen good results on Opterons..
[17:24] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[17:24] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit ()
[17:25] * fred1 (~fredl@2a00:1a48:7803:107:8532:c238:ff08:354) has left #ceph
[17:25] * dosaboy (~dosaboy@ip-64-134-229-164.public.wayport.net) Quit (Ping timeout: 480 seconds)
[17:27] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:28] <ron-slc> I guess one way you can test the baseline file system utilization (non-ceph). Would be to unmount the OSD's, on that system; and then do : find /var/lib/ceph/osd/ -type f -exec cat {} /dev/null \; And see where that takes CPU on just speed-reading files from your file systems.
[17:28] * fox_ (~fox@95.154.88.151) Quit (Quit: This computer has gone to sleep)
[17:29] <ron-slc> correction cat {} > /dev/null (forgot the Greater-than symbol for redirection.)
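
Putting those two messages together, the baseline read test ron-slc is suggesting reads every file under the OSD data directories and discards the output:

    find /var/lib/ceph/osd/ -type f -exec cat {} \; > /dev/null
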
[17:30] * rahmu (~rahmu@83.167.43.235) Quit (Remote host closed the connection)
[17:39] * markl (~mark@tpsit.com) has joined #ceph
[17:39] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:47] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Quit: Leaving)
[17:49] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[17:49] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[18:01] * scuttlemonkey (~scuttlemo@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[18:08] * BillK (~BillK@124-169-41-139.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:09] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Remote host closed the connection)
[18:09] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[18:10] * Cube (~Cube@cpe-76-172-67-97.socal.res.rr.com) has joined #ceph
[18:13] * alram (~alram@38.122.20.226) has joined #ceph
[18:14] * rustam (~rustam@94.15.91.30) has joined #ceph
[18:15] * loicd (~loic@67.23.204.250) has joined #ceph
[18:15] * loicd (~loic@67.23.204.250) Quit ()
[18:21] * BillK (~BillK@58-7-56-33.dyn.iinet.net.au) has joined #ceph
[18:25] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[18:27] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:30] * l0nk (~alex@83.167.43.235) Quit (Quit: Leaving.)
[18:36] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[18:38] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[18:39] * jeroenm (~jeroenm@176.62.136.225) has joined #ceph
[18:39] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit ()
[18:40] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[18:40] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[18:41] * portante (~user@67.23.204.250) has joined #ceph
[18:41] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[18:49] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[18:49] * joshd1 (~joshd@67.23.204.250) has joined #ceph
[18:55] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[18:57] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[19:00] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: IceChat - Its what Cool People use)
[19:06] <jeroenm> hi room
[19:07] * BillK (~BillK@58-7-56-33.dyn.iinet.net.au) Quit (Read error: No route to host)
[19:07] <jeroenm> can I break the silence with a noob question about phprados?
[19:08] <gregaf> I think it's too late — you just broke the silence with a question about etiquette </pedantry>
[19:08] <gregaf> ;)
[19:08] <jeroenm> :-)
[19:08] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[19:08] <jeroenm> So, I succeeded in writing data, now I want to retrieve it again. Should I use rados_read?
[19:09] <gregaf> depending on the question you might have to wait for wido to be around
[19:09] <gregaf> …but I'm going to hazard a guess that "yes" is the answer there
[19:10] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[19:10] <jeroenm> I thought so, but still I didn't succeed in getting any data back
[19:11] <gregaf> hmm, I haven't used those bindings so I'm afraid I'm not sure
[19:12] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:13] <jeroenm> I'll be patient and wait for wido :-)
[19:18] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[19:22] <mikedawson> gregaf: I am having a libvirt / cephx auth issue on Grizzly + Raring. I assume joshd is at the summit, do you know who else has experience in that area?
[19:22] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[19:23] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[19:23] <gregaf> pretty much him, sorry
[19:24] * BillK (~BillK@124-148-90-35.dyn.iinet.net.au) has joined #ceph
[19:25] <mikedawson> gregaf: no worries, thanks
[19:26] * rustam (~rustam@94.15.91.30) has joined #ceph
[19:27] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[19:28] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[19:34] * joshd1 (~joshd@67.23.204.250) Quit (Ping timeout: 480 seconds)
[19:39] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Operation timed out)
[19:39] * eschnou (~eschnou@182.189-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[19:40] * loicd (~loic@67.23.204.250) has joined #ceph
[19:40] * joshd1 (~joshd@67.23.204.250) has joined #ceph
[19:41] * loicd (~loic@67.23.204.250) Quit ()
[19:43] * rekby (~Adium@2.93.58.253) Quit (Quit: Leaving.)
[19:44] * dmick (~dmick@2607:f298:a:607:2195:1fb3:8d86:a2c4) has joined #ceph
[19:48] * loicd (~loic@67.23.204.250) has joined #ceph
[19:54] * Cube (~Cube@cpe-76-172-67-97.socal.res.rr.com) Quit (Quit: Leaving.)
[19:55] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[20:00] * dosaboy (~dosaboy@67.23.204.250) has joined #ceph
[20:00] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[20:01] * joshd1 (~joshd@67.23.204.250) Quit (Ping timeout: 480 seconds)
[20:03] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:04] * portante (~user@67.23.204.250) Quit (Ping timeout: 480 seconds)
[20:04] * jeroenm (~jeroenm@176.62.136.225) Quit (Ping timeout: 480 seconds)
[20:07] <wido> gregaf: FYI, the phprados bindings are a 1 on 1 mapping to the rados C API
[20:09] <gregaf> that's what I expected, but I don't know php at all so I'm not sure how that translates in terms of data structures or arguments — especially when somebody says "I can write but not read", if they probably didn't write or if they failed to read ;)
[20:11] * Cube (~Cube@12.248.40.138) has joined #ceph
[20:14] * eschnou (~eschnou@182.189-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[20:14] * loicd1 (~loic@67.23.204.250) has joined #ceph
[20:15] * loicd (~loic@67.23.204.250) Quit (Read error: Connection reset by peer)
[20:16] <wido> gregaf: Get it :) Hopefully he sends it to the ml
[20:17] * BillK (~BillK@124-148-90-35.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[20:19] * verwilst (~verwilst@dD576962F.access.telenet.be) has joined #ceph
[20:20] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[20:21] <dspano> I've got 8 pgs stuck in active+degraded due to an osd being restarted too fast. If I let the cluster do its thing, will it eventually sort itself out?
[20:22] <dmick> started too fast?
[20:22] * jeroenm (~jeroenm@176.62.136.225) has joined #ceph
[20:25] <janos> sounds like premature estartulation
[20:25] <dspano> Lol.
[20:25] * pachuco (58f00adf@ircip1.mibbit.com) has joined #ceph
[20:25] <dspano> I had this happen once before because the osd was restarted before the cluster counted it as down.
[20:26] <pachuco> http://listoffreebitcoinwebsites.blogspot.com/
[20:26] <pachuco> http://listoffreebitcoinwebsites.blogspot.com/
[20:26] <pachuco> http://listoffreebitcoinwebsites.blogspot.com/
[20:26] <pachuco> http://listoffreebitcoinwebsites.blogspot.com/
[20:26] <pachuco> http://listoffreebitcoinwebsites.blogspot.com/
[20:26] <pachuco> http://listoffreebitcoinwebsites.blogspot.com/
[20:26] <pachuco> http://listoffreebitcoinwebsites.blogspot.com/
[20:26] <pachuco> http://listoffreebitcoinwebsites.blogspot.com/
[20:26] <janos> Zzzz
[20:26] * pachuco (58f00adf@ircip1.mibbit.com) Quit (Killed (tjfontaine (No reason)))
[20:26] * ChanServ sets mode +o dmick
[20:26] <dmick> sigh
[20:27] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit (Quit: Leaving.)
[20:27] <sjust> I'm going to that website now
[20:28] <sjust> if he'd only posted it 6 times, I would have been able to resist
[20:28] <janos> haha
[20:31] <dspano> It seems that ceph pg.num mark_unfound_lost revert is the only option?
[20:32] <dspano> Looks like some of the pgs are active on two osds when I run ceph pg.num query.
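
That is, commands of the form below, with a concrete pgid substituted for "pg.num" (19.4 is just an example; revert only applies to objects actually reported as unfound):

    ceph pg 19.4 query
    ceph pg 19.4 mark_unfound_lost revert
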
[20:33] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[20:34] * BillK (~BillK@58-7-109-193.dyn.iinet.net.au) has joined #ceph
[20:38] <dmick> dspano: can you show some query output that shows what you mean?
[20:39] * jeroenm (~jeroenm@176.62.136.225) Quit (Ping timeout: 480 seconds)
[20:39] <dspano> Here's the query for one of them. http://pastebin.com/6H1JH43z
[20:41] <dmick> that looks normal except for the 'degraded' state; is your pool size 2, or larger?
[20:43] <dspano> 2 right now.
[20:44] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[20:45] * diegows (~diegows@host28.190-30-144.telecom.net.ar) has joined #ceph
[20:46] * loicd1 (~loic@67.23.204.250) Quit (Quit: Leaving.)
[20:51] * jeroenm (~jeroenm@176.62.136.225) has joined #ceph
[21:02] * chutzpah (~chutz@199.21.234.7) Quit (Remote host closed the connection)
[21:07] * loicd (~loic@67.23.204.250) has joined #ceph
[21:09] * eschnou (~eschnou@182.189-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:09] * capri (~capri@p54A54FA2.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[21:12] * jeroenm (~jeroenm@176.62.136.225) Quit (Quit: Leaving)
[21:15] * loicd (~loic@67.23.204.250) Quit (Ping timeout: 480 seconds)
[21:16] * loicd (~loic@67.23.204.250) has joined #ceph
[21:22] * loicd (~loic@67.23.204.250) Quit (Quit: Leaving.)
[21:22] * loicd (~loic@67.23.204.250) has joined #ceph
[21:23] * loicd (~loic@67.23.204.250) Quit ()
[21:24] <dspano> dmick: Should I wait it out?
[21:24] * MK_FG (~MK_FG@00018720.user.oftc.net) Quit (Remote host closed the connection)
[21:26] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[21:26] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[21:27] * BillK (~BillK@58-7-109-193.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[21:27] * rustam (~rustam@94.15.91.30) has joined #ceph
[21:31] * calebamiles (~caleb@c-50-138-218-203.hsd1.vt.comcast.net) has joined #ceph
[21:35] * MK_FG (~MK_FG@00018720.user.oftc.net) has joined #ceph
[21:36] * dosaboy (~dosaboy@67.23.204.250) Quit (Quit: leaving)
[21:41] * verwilst (~verwilst@dD576962F.access.telenet.be) Quit (Quit: Ex-Chat)
[21:41] * steve_ (~astalsi@c-69-255-38-71.hsd1.md.comcast.net) has left #ceph
[21:44] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[21:48] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) Quit (Remote host closed the connection)
[21:51] * drokita (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[21:58] * gregaf2 (~Adium@2607:f298:a:607:114a:6960:bfaa:e904) has joined #ceph
[21:59] * loicd (~loic@67.23.204.250) has joined #ceph
[22:04] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[22:05] * gregaf1 (~Adium@2607:f298:a:607:4934:567d:851d:dcef) Quit (Ping timeout: 480 seconds)
[22:06] <dmick> dspano: sorry, lunch and distraction
[22:07] <dmick> has anything changed about 19.4?
[22:14] <dspano> Nope.
[22:15] <dspano> I was thinking of setting the OSD that started this to out to see what happens.
[22:17] <dspano> dmick: Here's what ceph health detail says. http://pastebin.com/cxGGNaSh
[22:20] <dmick> ceph -s or ceph osd tree show osd2, 3 are normal?
[22:20] * eschnou (~eschnou@182.189-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[22:21] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[22:24] <dspano> Yeah.
[22:25] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[22:25] <dspano> Both up and in
[22:27] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[22:30] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:33] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) has joined #ceph
[22:33] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) Quit (Remote host closed the connection)
[22:35] * LeaChim (~LeaChim@176.250.177.64) Quit (Ping timeout: 480 seconds)
[22:38] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[22:44] <dmick> dspano: can you post a ceph pg dump as well? we're chatting about your situation
[22:44] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[22:45] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[22:45] * LeaChim (~LeaChim@90.215.24.238) has joined #ceph
[22:47] * joshd1 (~joshd@67.23.204.250) has joined #ceph
[22:47] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit ()
[22:47] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[22:52] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has joined #ceph
[22:52] * rustam (~rustam@94.15.91.30) has joined #ceph
[22:53] <mgalkiewicz> hi guys can someone explain to me option "mon initial members" http://ceph.com/docs/master/rados/configuration/mon-config-ref/?highlight=mon%20initial%20members#initial-members
[22:54] * nigly (~tra26@tux64-13.cs.drexel.edu) has joined #ceph
[22:54] <mgalkiewicz> if I understand correctly, putting 3 mons there will require all 3 mons to form a quorum, even though only 2 are needed
[22:54] <mgalkiewicz> is it correct?
[22:55] * yasu` (~yasu`@dhcp-59-149.cse.ucsc.edu) has joined #ceph
[22:55] <nigly> anyone experienced problems with xfs on ceph, as in everything is fine and then suddenly IO errors?
[22:55] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[22:56] <nigly> because that has been my life for the last 3 days
[22:56] <dmick> when you say "xfs on ceph", what do you mean?
[22:56] <nigly> a rbd that has xfs put on it
[22:57] <dmick> so, mapped by the kernel?
[22:57] <nigly> rbd map image, mkfs.xfs, everything works and then boom its dead
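
Spelled out, the sequence being described is roughly as follows (image name and size are hypothetical; the device node may also appear as /dev/rbd0):

    rbd create test --size 10240      # 10 GB image in the default 'rbd' pool
    rbd map test
    mkfs.xfs /dev/rbd/rbd/test
    mount /dev/rbd/rbd/test /mnt
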
[22:58] <dmick> and you're not running OSDs on the same machine with the mapping, right?
[22:58] <nigly> no definitely now
[22:58] <nigly> not*
[22:58] <dmick> well, no, that's unusual then, and in fact IIRC we test xfs on krbd as part of our suites
[23:00] <dmick> mgalkiewicz: I believe that only sets the initial members in the set, not further requirements on quorum. It probably wants to have them all come up at least once
[23:00] <nigly> we did have an osd disk failure but there should have been a replica. I figured it wasn't normal; I've also had an rbd that wouldn't mount xfs on one host (it worked fine on another but gave io errors on that one)
[23:02] <nigly> time to find out how much support costs, all in all though when everything works ceph is/was awesome
[23:02] <nigly> dmick: any idea what I should look for in the logs or anything?
[23:03] <dmick> what's the nature of the i/o errors? Is it locking up/timing out, or are they data failures, or?...
[23:04] <mgalkiewicz> dmick: thx for info
[23:06] <nigly> I get xfs_log_force errors when I try to xfs_repair it; let me try to find one that is still mounted where I didn't clear the dmesg
[23:06] <dmick> mgalkiewicz: the default is for everyone that comes up to try to join; you can limit the set this way, although I'm not sure why you'd want to have monitors that are not in the quorum
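
In ceph.conf terms that is something like the snippet below (ids, host names and addresses are placeholders); only the listed mons are expected to form the initial quorum, and others can still be added later:

    [global]
        mon initial members = a, b, c
    [mon.a]
        host = mon-a
        mon addr = 192.168.0.1:6789
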
[23:07] <dspano> dmick: http://pastebin.com/g0dY2UmC
[23:07] <dmick> 19.4
[23:10] * loicd (~loic@67.23.204.250) Quit (Read error: Connection reset by peer)
[23:10] * joshd1 (~joshd@67.23.204.250) Quit (Read error: Connection reset by peer)
[23:10] * loicd (~loic@67.23.204.250) has joined #ceph
[23:10] <nigly> ok so a block of hash information on the one but I am going through convo logs and we get a 'can
[23:10] * loicd (~loic@67.23.204.250) Quit (Read error: Connection reset by peer)
[23:11] <nigly> 't read from block device
[23:12] <dmick> nigly: I assume the cluster is healthy? can you access that image with the rbd cli? Are reads outside xfs hung?
[23:13] <dmick> dspano: what version?
[23:13] * joshd1 (~joshd@67.23.204.150) has joined #ceph
[23:14] <nigly> dmick: it is recovering right now, nothing is inactive although the kernel yells about partitions when it maps them. what do you mean access it with the rbd cli?
[23:15] <dmick> like rbd ls
[23:15] <dmick> rbd info
[23:15] <dmick> rbd export
[23:15] <nigly> info works fine
[23:15] <dmick> dspano: and can you also get a ceph osd dump
[23:16] <dmick> nigly: what kernel?
[23:16] * loicd (~loic@67.23.204.150) has joined #ceph
[23:16] * chutzpah (~chutz@199.21.234.7) Quit (Remote host closed the connection)
[23:16] <nigly> 3.2.0-39-generic and 3.2.0-40-generic
[23:20] * josef (~seven@li70-116.members.linode.com) Quit (Ping timeout: 480 seconds)
[23:21] <dmick> that's pretty old for krbd
[23:22] <dmick> http://ceph.com/docs/master/install/os-recommendations/#linux-kernel
[23:22] * LeaChim (~LeaChim@90.215.24.238) Quit (Ping timeout: 480 seconds)
[23:23] <nigly> ah
[23:23] <nigly> yay running precise
[23:24] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[23:24] <dmick> there have been bugs; the usual advice is to retest with a later kernel
[23:24] * dosaboy (~dosaboy@67.23.204.150) has joined #ceph
[23:26] * loicd (~loic@67.23.204.150) Quit (Quit: Leaving.)
[23:26] <dspano> dmick: http://pastebin.com/fap3bXkS I didn't realize the pgs were prefaced by the pool id. The pool in question is a test pool. It has size 3. I just assumed the pgs that were causing problems were in the pool for volumes.
[23:27] <dmick> ah. well that would do it
[23:28] <dmick> I'm gonna guess crush map problems?
[23:28] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) Quit (Ping timeout: 480 seconds)
[23:29] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has joined #ceph
[23:30] <nigly> dmick: thanks I am going to try to get 3.4 on precise and see how that works
[23:30] * Havre (~Havre@2a01:e35:8a2c:b230:a443:6c35:b44c:5746) Quit (Ping timeout: 480 seconds)
[23:30] <dmick> nigly: nod
[23:31] <nigly> dmick: only the machines using krbd need the new kernel, right? the osds can remain where they are?
[23:31] <dmick> yeah
[23:33] <dspano> dmick: I guess so.
[23:33] * danieagle (~Daniel@177.133.174.100) has joined #ceph
[23:34] * loicd (~loic@67.23.204.150) has joined #ceph
[23:34] * Cube (~Cube@cpe-76-172-67-97.socal.res.rr.com) has joined #ceph
[23:36] * loicd (~loic@67.23.204.150) Quit ()
[23:41] * joshd1 (~joshd@67.23.204.150) Quit (Ping timeout: 480 seconds)
[23:41] <dspano> dmick: Are you not able to change the size after creation?
[23:43] * loicd (~loic@67.23.204.150) has joined #ceph
[23:45] <dspano> dmick: Sorry. Stupid question.
[23:46] <dmick> dspano: that capability was just recently added
[23:46] <dmick> still considered experimental
[23:47] * loicd (~loic@67.23.204.150) Quit ()
[23:47] * loicd (~loic@67.23.204.150) has joined #ceph
[23:47] * loicd (~loic@67.23.204.150) Quit ()
[23:49] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[23:50] <dspano> dmick: Lol.
[23:50] <dspano> dmick: Good thing I did it on the test pool.
[23:51] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) Quit (Quit: Ex-Chat)
[23:52] <dspano> I deleted the pool and everything is fine now. :)
[23:52] <dmick> cool
[23:52] <PerlStalker> How "periodically" does ceph auto-scrub pgs?
[23:53] <dmick> PerlStalker: it's tunable, big surprise
[23:53] <dspano> dmick: So at this point the only way to change the replicas is to create a new pool and copy the old one into it.
[23:54] <dmick> argh. sorry; the experimental feature is to change the number of pgs/pool, not the number of replicas
[23:54] <dmick> pg_size, not size, IOW
[23:54] <dmick> changing size I think is doable in general
[23:54] <dmick> sorry
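
Changing the replica count itself is done per pool, along the lines of (pool name hypothetical):

    ceph osd pool set testpool size 3
    ceph osd pool get testpool size   # verify
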
[23:54] <PerlStalker> dmick: Doc link?
[23:55] <PerlStalker> I've been seeing massive network usage during scrubs and I'm trying to understand what's going on.
[23:55] <dmick> look for osd config options scrub load threshold, scrub min interval, scrub max interval, etc.
[23:56] <dmick> http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing
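
A sketch of those scrub knobs in ceph.conf; the values shown are illustrative, not recommendations:

    [osd]
        osd scrub load threshold = 0.5     # skip scheduled scrubs while load is above this
        osd scrub min interval = 86400     # seconds; don't scrub a pg more often than this when load allows
        osd scrub max interval = 604800    # seconds; force a scrub at least this often regardless of load
        osd deep scrub interval = 604800   # seconds between deep scrubs
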
[23:57] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[23:57] <PerlStalker> dmick: Found it. Thanks.
[23:57] * PerlStalker sits back and reads
[23:59] * dosaboy_ (~dosaboy@67.23.204.150) has joined #ceph
[23:59] * dosaboy (~dosaboy@67.23.204.150) Quit (Read error: Connection reset by peer)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.