#ceph IRC Log

IRC Log for 2013-07-06

Timestamps are in GMT/BST.

[0:00] * dosaboy__ (~dosaboy@host86-163-15-98.range86-163.btcentralplus.com) has joined #ceph
[0:01] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[0:01] * BillK (~BillK-OFT@124-169-221-120.dyn.iinet.net.au) has joined #ceph
[0:02] * dosaboy_ (~dosaboy@host86-163-34-209.range86-163.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[0:03] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) Quit (Ping timeout: 480 seconds)
[0:04] * dosaboy (~dosaboy@host86-161-206-168.range86-161.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[0:04] * haomaiwang (~haomaiwan@211.155.113.208) Quit (Ping timeout: 480 seconds)
[0:05] * dosaboy (~dosaboy@host86-161-247-133.range86-161.btcentralplus.com) has joined #ceph
[0:07] * sleinen1 (~Adium@2001:620:0:25:bdf6:d723:c28c:d39b) Quit (Quit: Leaving.)
[0:08] * dosaboy__ (~dosaboy@host86-163-15-98.range86-163.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[0:10] <mozg> guys, what is the recommended mkfs.xfs options for ceph osds if I am to store vm images?
[0:10] <mozg> no specific purpose for the images
[0:10] <mozg> some web servers, some data base, some file servers
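
For reference, a minimal sketch of what formatting an OSD data partition could look like for this use case; the device name and label are made up for the example, and XFS defaults are generally adequate for RBD-backed VM images:

    mkfs.xfs -f -L osd.0 /dev/sdb1
    # an external XFS log device on SSD (as discussed later in this log) is optional:
    # mkfs.xfs -f -l logdev=/dev/sdc1,size=128m /dev/sdb1
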
[0:10] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[0:16] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[0:19] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit ()
[0:22] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[0:32] <grepory> any openstack people in here know how to fix a volume stuck in an attaching state?
[0:32] <grepory> i will also try in their channel… just curious
[0:36] * nwat (~nwatkins@eduroam-226-128.ucsc.edu) has joined #ceph
[0:43] * mkoderer (uid11949@id-11949.tooting.irccloud.com) Quit (Remote host closed the connection)
[0:43] * s2r2 (uid322@id-322.hillingdon.irccloud.com) Quit (Remote host closed the connection)
[0:43] * Tribaal (uid3081@id-3081.hillingdon.irccloud.com) Quit (Remote host closed the connection)
[0:43] * scalability-junk (uid6422@id-6422.hillingdon.irccloud.com) Quit (Remote host closed the connection)
[0:43] * seif (uid11725@hillingdon.irccloud.com) Quit (Remote host closed the connection)
[0:51] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) has joined #ceph
[0:56] * nwat (~nwatkins@eduroam-226-128.ucsc.edu) has left #ceph
[0:56] * haomaiwang (~haomaiwan@117.79.232.249) has joined #ceph
[1:04] * DarkAce-Z (~BillyMays@50.107.55.36) has joined #ceph
[1:04] <sagewk> joao: wip-5509, if you're around!
[1:04] * haomaiwang (~haomaiwan@117.79.232.249) Quit (Ping timeout: 480 seconds)
[1:09] * DarkAceZ (~BillyMays@50.107.55.36) Quit (Ping timeout: 480 seconds)
[1:14] * tnt (~tnt@91.176.58.19) Quit (Ping timeout: 480 seconds)
[1:20] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[1:22] * DarkAce-Z is now known as DarkAceZ
[1:24] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[1:28] * mozg (~andrei@host217-44-214-64.range217-44.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:30] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:31] <Psi-jack> heh.. Ummm, Wow...
[1:31] <Psi-jack> So I got Arch converted to CentOS for my SAN3 server. And ceph is loud as HECK to the pty: messages from syslogd@san3 with ceph-mon messages..
[1:32] <Psi-jack> Is it just not logging to disk?
[1:33] * LeaChim (~LeaChim@90.221.247.164) Quit (Read error: Connection reset by peer)
[1:34] * LeaChim (~LeaChim@90.221.247.164) has joined #ceph
[1:37] * dosaboy_ (~dosaboy@host86-145-219-174.range86-145.btcentralplus.com) has joined #ceph
[1:39] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[1:40] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[1:42] <Psi-jack> There we go. And no, it was logging to syslog, but syslog didn't know what to do with it. heh
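
For anyone hitting the same noise: where the daemons log is controlled in ceph.conf. A minimal sketch (the values are only examples, not a recommendation):

    [global]
        log file = /var/log/ceph/$name.log   ; per-daemon log file on disk
        log to syslog = false                ; keep ceph-mon chatter out of syslog / the pty
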
[1:43] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit ()
[1:43] * dosaboy (~dosaboy@host86-161-247-133.range86-161.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:43] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Bye!)
[1:45] * LeaChim (~LeaChim@90.221.247.164) Quit (Ping timeout: 480 seconds)
[1:48] * jebba (~aleph@72.19.178.3) has left #ceph
[1:57] * haomaiwang (~haomaiwan@117.79.232.249) has joined #ceph
[2:05] * haomaiwang (~haomaiwan@117.79.232.249) Quit (Ping timeout: 480 seconds)
[2:06] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) Quit (Ping timeout: 480 seconds)
[2:17] <joao> sagewk, shall look in 20 minutes :)
[2:19] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Ping timeout: 480 seconds)
[2:21] * eternaleye (~eternaley@2002:3284:29cb::1) Quit (Ping timeout: 480 seconds)
[2:21] * eternaleye (~eternaley@2002:3284:29cb::1) has joined #ceph
[2:47] <joao> sagewk, around?
[2:53] <grepory> is there a way to easily tell if qemu i'm using has support for rbd?
[2:54] <grepory> would it be linked against librados?
[2:57] <joshd1> yeah, you can also see the supported formats with qemu -drive format=?
[2:57] <grepory> Supported formats: raw cow qcow vdi vmdk cloop dmg bochs vpc vvfat qcow2 qed parallels nbd blkdebug host_cdrom host_floppy host_device file
[2:57] * haomaiwang (~haomaiwan@106.3.103.144) has joined #ceph
[2:58] <grepory> (Since Qemu 0.14.0)
[2:58] <grepory> we're on 0.12
[2:59] <grepory> because enterprise linux sucks.
[2:59] <grepory> so that explains that.
[3:03] <joshd1> if you're using rhel and ceph cuttlefish you can use http://www.ceph.com/packages/ceph-extras/rhel6/x86_64/
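
A quick way to check for rbd support once a newer qemu is installed (the binary may be called qemu, qemu-kvm or qemu-system-x86_64 depending on the package):

    qemu -drive 'format=?' 2>&1 | grep -w rbd    # prints the formats line only if rbd is compiled in
    qemu-img --help | grep -ow rbd               # same idea for qemu-img
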
[3:03] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[3:05] * haomaiwang (~haomaiwan@106.3.103.144) Quit (Ping timeout: 480 seconds)
[3:28] <Psi-jack> Hmmm, okay, I'm having an issue trying to get my OSDs back online with the newly installed CentOS.
[3:28] <Psi-jack> http://pastebin.ca/2418596
[3:28] * haomaiwang (~haomaiwan@117.79.232.249) has joined #ceph
[3:28] <Psi-jack> That's the osd log of just one of them.
[3:29] <Psi-jack> Ahhhh, i think .. maybe I know..
[3:30] * haomaiwa_ (~haomaiwan@117.79.232.249) has joined #ceph
[3:31] * haomaiwang (~haomaiwan@117.79.232.249) Quit (Read error: Connection reset by peer)
[3:36] <Psi-jack> yeaaaaah.. I should've noticed that. The cephx keys were distributed only to the nodes that needed them. So they had no authentication because I'd copied ceph1's /etc/ceph over.
[3:42] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[3:43] * portante is now known as portante|afk
[3:44] <buck> I haven't used teuthology in a bit and after updating it, I'm getting this error: "ImportError: No module named libvirt" but I have libvirt and the python bindings installed. This is on an ubuntu host (12.04, all up to date). Has anyone else run into this?
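
A plausible first check for that ImportError, assuming the usual setup where teuthology's ./bootstrap builds a ./virtualenv directory: the system interpreter may see the libvirt binding while the virtualenv does not.

    python -c 'import libvirt; print libvirt.getVersion()'   # system python
    ./virtualenv/bin/python -c 'import libvirt'              # the python teuthology actually runs
    # if only the second fails, recreating the virtualenv with --system-site-packages
    # (or re-running ./bootstrap) is a common fix
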
[3:44] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Remote host closed the connection)
[3:45] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[3:45] * portante|afk is now known as portante
[3:52] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[4:08] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[4:15] * s2r2 (uid322@id-322.charlton.irccloud.com) has joined #ceph
[4:19] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has left #ceph
[4:52] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) has joined #ceph
[5:00] * fireD (~fireD@93-139-159-20.adsl.net.t-com.hr) has joined #ceph
[5:06] * portante is now known as portante|afk
[5:07] * fireD1 (~fireD@93-139-180-146.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:15] * AfC (~andrew@2001:44b8:31cb:d400:64b2:fade:4e44:d8e2) has joined #ceph
[5:50] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) Quit (Ping timeout: 480 seconds)
[5:52] * portante|afk is now known as portante
[5:59] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[6:07] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[6:10] <Psi-jack> yay... 2 of 3 ceph servers converted from Arch to CentOS 6.4, successfully. :D
[6:20] * Tribaal (uid3081@id-3081.ealing.irccloud.com) has joined #ceph
[6:28] * AfC (~andrew@2001:44b8:31cb:d400:64b2:fade:4e44:d8e2) Quit (Quit: Leaving.)
[6:31] * portante is now known as portante|afk
[6:42] * scalability-junk (uid6422@id-6422.charlton.irccloud.com) has joined #ceph
[6:56] * seif (uid11725@ealing.irccloud.com) has joined #ceph
[6:57] * mkoderer (uid11949@ealing.irccloud.com) has joined #ceph
[7:05] * jamespage (~jamespage@culvain.gromper.net) Quit (Quit: Coyote finally caught me)
[7:06] * jamespage (~jamespage@culvain.gromper.net) has joined #ceph
[7:12] * rongze (~zhu@173-252-252-212.genericreverse.com) Quit (Ping timeout: 480 seconds)
[7:12] * haomaiwa_ (~haomaiwan@117.79.232.249) Quit (Ping timeout: 480 seconds)
[7:22] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[7:45] * KindTwo (~KindOne@h39.26.131.174.dynamic.ip.windstream.net) has joined #ceph
[7:47] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[7:47] * KindTwo is now known as KindOne
[7:59] * matt__ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[8:21] <grepory> suddenly my openstack boxes can't talk to ceph again :(
[8:22] * haomaiwang (~haomaiwan@117.79.232.244) has joined #ceph
[8:33] <grepory> rados lspools just hangs...
[8:34] <grepory> no problem talking to the cluster
[8:35] <grepory> it's getting stuck on locking… it gets the timezone, then there are a bunch of futex() calls, and it hangs.
[8:37] <grepory> hrm… found an error in cinder-volume's log: monclient: hunting for new mon
[8:37] <grepory> so i guess it can't talk to one of the mons
[8:37] <grepory> even though ceph says everything is ok....
[8:46] <grepory> hmm…. ceph health even returns okay… but ceph osd dump doesn't
[8:47] <grepory> ceph osd ls, ceph -w, ceph -s, ceph health all work, but … rados df, ceph osd dump, none of those work...
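
One way to see where such a command is stuck is to raise the monitor-client and messenger debugging on the command line; these are ordinary config overrides, and the levels below are just examples:

    ceph --debug-monc 10 --debug-ms 1 osd dump
    rados --debug-monc 10 --debug-ms 1 lspools
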
[8:56] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:03] * wer (~wer@206-248-239-142.unassigned.ntelos.net) has joined #ceph
[9:34] * matt__ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Ping timeout: 480 seconds)
[9:59] * tnt (~tnt@91.176.58.19) has joined #ceph
[11:00] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[11:30] * Cybertinus (~Cybertinu@2001:828:405:30:83:96:177:42) has joined #ceph
[11:35] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[11:46] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[12:13] * matt__ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[12:57] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[13:18] * infinitytrapdoor (~infinityt@p5B255184.dip0.t-ipconnect.de) has joined #ceph
[13:33] * terje-_ (~root@135.109.216.239) has joined #ceph
[13:34] * terje- (~root@135.109.216.239) Quit (Ping timeout: 480 seconds)
[13:51] <Cybertinus> Hello. Is it a choice or could it be a bug that ceph-deploy isn't included in the default RHEL6 repo?
[13:51] <Cybertinus> Or I must be doing something wrong, always a possibility of course :)
[13:55] <ofu> ceph-deploy is a separate repo
[13:56] <ofu> http://ceph.com/rpm/el6/noarch/ceph-release-1-0.el6.noarch.rpm is stable 0.61.4, http://ceph.com/rpm/el6/noarch/ceph-deploy-release-1-0.noarch.rpm is the ceph-deploy for it
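
Spelled out, that comes down to something like the following on an EL6 box (run as root):

    rpm -Uvh http://ceph.com/rpm/el6/noarch/ceph-release-1-0.el6.noarch.rpm
    rpm -Uvh http://ceph.com/rpm/el6/noarch/ceph-deploy-release-1-0.noarch.rpm
    yum install ceph ceph-deploy
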
[13:56] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) has joined #ceph
[13:56] <Cybertinus> ok, thnx ofu
[14:02] * LeaChim (~LeaChim@90.221.247.164) has joined #ceph
[14:06] <Psi-jack> hmm, ceph-deploy?
[14:07] <Cybertinus> yeah, I'm building my first ceph cluster now :)
[14:07] <Cybertinus> following http://eu.ceph.com/docs/master/start/quick-ceph-deploy/, but I adapt it right away for CentOS instead of Ubuntu :)
[14:08] <Cybertinus> don't know if I'm gonna succeed or not, only one way to find out :)
[14:08] <Psi-jack> Yep. I'm rebuilding my Arch-based Ceph cluster in-place with CentOS. :)
[14:08] <Cybertinus> ok
[14:08] <Psi-jack> Kinda fun..-ish..
[14:08] <Cybertinus> I've got it running in VirtualBox now. 1 admin-node, 1 VM for the rest (mon, osd, stuff like that)
[14:09] <Cybertinus> production cluster or testcluster?
[14:09] <Psi-jack> Taking one server out of the cluster, back it up raw from a rescue disc, install centos, keeping partitions all the same, then putting it back into the cluster to rebuild itself with it. And Voila. :D
[14:09] <Psi-jack> I'm running it production.
[14:09] <Cybertinus> nice nice
[14:10] <Cybertinus> using the beauty of Ceph :)
[14:10] <Psi-jack> I just have 0% downtime because it's ceph. :)
[14:10] <Cybertinus> replacing the OS of the cluster without downtime
[14:10] <Cybertinus> I like
[14:10] <Cybertinus> :)
[14:10] <Psi-jack> though my cluster is currently complaining about it being degraded, but it'll fix itself once this is back online. :)
[14:10] <Cybertinus> yup, not a problem there indeed
[14:10] * matt__ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Ping timeout: 480 seconds)
[14:11] <Cybertinus> I think you have set your cluster so that it doesn't move OSDs from down to out?
[14:11] <Psi-jack> I was just getting tired of futzing with Arch crap. And being unable to update anything.
[14:11] <Psi-jack> Oh it does.
[14:11] <Psi-jack> It just tries to re-allocate it elsewhere, but when the osd's come back on, it sees all the data still there, so it checks it out and only rebuilds the new stuff.
[14:12] <Cybertinus> ok
[14:12] <Cybertinus> and that doesn't create too much unneeded I/O?
[14:12] <Psi-jack> Err, no, actually, yeah, they show down, not out. :)
[14:13] <Psi-jack> It does increase I/O yes, but it verifies that the data integrity remains, even if another server happens to fall out somehow.
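
For the record, the usual way to keep the cluster from re-replicating data while a node is deliberately down for maintenance (assuming a release recent enough to have the flag):

    ceph osd set noout     # down OSDs are not marked out, so no backfill starts
    # ... reinstall / reboot the node ...
    ceph osd unset noout
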
[14:15] <Psi-jack> heh, I was just mostly glad that CentOS 6.4 can install to GPT, even BIOS compatibility mode GPT.
[14:15] <Cybertinus> heh, yes, It can do that indeed
[14:15] <Psi-jack> Though it doesn't provide /dev/disk/by-partlabel for GPT. :/
[14:16] <Psi-jack> I was actually using those! LOL
[14:17] <Cybertinus> I still use /dev/sda1 or something
[14:17] <Cybertinus> :)
[14:17] <Cybertinus> that just always works ;)
[14:19] <Psi-jack> heh
[14:19] <Psi-jack> I used partlabels because I labeled every gpt partition with stuff. Since my ceph servers use SSD for the OS and journals, I labeled the raw partitions stuff like osd-log-X, osd-journal-X, mon-X, mds-X
[14:20] <Cybertinus> ah, right. That is clearer than /dev/sda, /dev/sdb, etc.
[14:20] <Cybertinus> :)
[14:21] <Psi-jack> Yes it is.
[14:21] <Psi-jack> Especially since I'm using both xfs logdev, and ceph-journals on SSD. :)
[14:21] <Psi-jack> Last centos installation commencing. :D
[14:21] <Cybertinus> separate SSDs?
[14:21] <Psi-jack> No, same SSD for 3 OSDs.
[14:22] <Cybertinus> ok
[14:22] <Psi-jack> SSD is also storage for mon and mds directly.
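
A sketch of the partition-naming approach described above; the device, partition numbers and names are assumptions, and /dev/disk/by-partlabel only appears on distros whose udev rules expose GPT partition labels (which, as noted above, CentOS 6 does not):

    sgdisk --change-name=2:osd-journal-0 --change-name=3:osd-log-0 /dev/sda
    ls -l /dev/disk/by-partlabel/     # populated by udev where supported
    # the XFS log device can then be referenced by name, e.g.
    # mkfs.xfs -f -l logdev=/dev/disk/by-partlabel/osd-log-0 /dev/sdd1
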
[14:22] <Cybertinus> at my work we just deployed a Ceph cluster
[14:22] <Cybertinus> we use 1 SSD for the OS (Ubuntu), 1 SSD for the journals, for 4 OSDs in 1 server
[14:22] <Cybertinus> ok
[14:23] * Psi-jack nods.
[14:23] <Psi-jack> Not bad, though the OS doesn't use the disk much, save for logging.
[14:23] <Cybertinus> true, but now the Ceph part is completely separate from the OS
[14:23] * Psi-jack nods.
[14:23] <Cybertinus> makes it easier to switch the OS, if ever needed
[14:24] <Psi-jack> Heh, I'm doing it in-place, just formatting the sda2 and sda3, / and swap.
[14:24] <Psi-jack> Everything else is staying exactly the same. :)
[14:24] <Psi-jack> sda1 being the EF02 partition.
[14:24] <Cybertinus> I think we are gonna switch to CentOS too, at some point
[14:25] <Cybertinus> but we don't know when, and even whether we are gonna switch isn't 100% sure
[14:25] <Psi-jack> I actually started switching my entire home cluster environment to CentOS, because I recently changed jobs where I am working with CentOS exclusively now.. I've been using Ubuntu, Debian, and openSUSE for servers for years.. been actually liking the change. CentOS 6 is definitely not bad.. Better than 5 was by a long shot.
[14:26] <Cybertinus> yeah, 6 is way better than 5 indeed
[14:26] <Cybertinus> at my current job we try to get everything on CentOS as much as possible, but we do have some Debian and Ubuntu machines
[14:27] <Psi-jack> Oh, I still have Debian systems, They run my Proxmox VE cluster. :)
[14:27] <Psi-jack> Dangit!
[14:27] <Psi-jack> grub -prompt on the last install. :/
[14:27] <Cybertinus> and there are even a few RHEL machines, but their licence isn't renewed anymore and when possible they too are gonna get formatted to CentOS
[14:27] <Cybertinus> yeah, we also run Proxmox :). And that is Debian based
[14:28] <Cybertinus> don't think we're ever gonna swap that for CentOS :p
[14:28] <Psi-jack> Why.. Oh why... Does THAT box have to be the picky one..
[14:28] <Cybertinus> heh, yeah, almost done. But not quite yet :/
[14:31] <Psi-jack> Sad too, cause I could list all the gpt partitions, all (hd0,gptX), just fine, in the grub rescue.
[14:38] * infinitytrapdoor (~infinityt@p5B255184.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[14:45] * markit (~marco@88-149-177-66.v4.ngi.it) has joined #ceph
[14:46] <markit> Hi, I'm not English, I'm watching a video about ceph, and talking about raid5 the slide says "Annual peta-byte durability for RAID-5 is only 3 nines". What is a "nines"?
[14:46] <markit> does it mean 3 times every 9? but that is 1/3
[14:46] <darkfader> markit: they mean 99.9% availability
[14:47] <darkfader> because (it is a silly example) you have so many components in raid5 sets
[14:47] <markit> mmm that would mean the contrary of its thesis, that raid5 often does not survive a rebuild
[14:47] <darkfader> no
[14:47] * infinitytrapdoor (~infinityt@p5B255184.dip0.t-ipconnect.de) has joined #ceph
[14:47] <darkfader> 99.9% is horribly bad
[14:47] <markit> "Annual peta-byte durability for RAID-5 is only 99.9%" seems hight to me
[14:47] <darkfader> nah
[14:48] <markit> I would bed my life on a 99.9% chance of surviving, no?
[14:48] <markit> bet
[14:48] <darkfader> if you look at it it from a per-year perspective it means a few days of downtime per year
[14:48] <darkfader> total downtime in that example
[14:48] <Psi-jack> There we go. I think I know what happened, sorta.. I never got the CentOS reboot prompt, it just rebooted itself during the install, so grub's install never finished.
[14:48] <markit> well, total data loss
[14:48] <darkfader> markit: yes you could bet your life, but once!
[14:48] <darkfader> now think about betting it every minute
[14:48] <darkfader> for 10 years
[14:49] <darkfader> you're bound to get shot at a point if you do that
[14:49] <Psi-jack> Annnnd.... Booting. :D
[14:49] <markit> ok, I see, but it is "annual durability" -> out of 100 years, you will survive 99.9% of the time
[14:49] <markit> that sounds great
[14:50] <markit> I know you are right, since that is what the speaker wants to prove, I just don't get it that way
[14:50] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[14:50] <darkfader> don't use 100 years for an example
[14:50] <darkfader> since almost none of us is gonna make it to 100 :))
[14:51] <markit> well, he said "annual"; if you take "picoseconds", even 99.99999% is maybe not enough to survive a single day
[14:51] <darkfader> :)
[14:51] <darkfader> if you wanna read more about the calculations you could grab a cheap used copy of that book: http://www.amazon.com/Availability-Network-Fundamentals-Chris-Oggerino/dp/1587130173/
[14:52] <markit> darkfader: thanks a lot!
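
A worked number may make darkfader's point clearer (simple compounding, not from the talk itself): 99.9% durability per petabyte-year means a given petabyte survives ten years with probability about 0.999^10 ≈ 0.990, i.e. roughly a 1-in-100 chance of loss per petabyte per decade; a site keeping ~100 PB for a decade accumulates ~1000 petabyte-years and should expect on the order of one loss event.
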
[14:52] <markit> In any case, I'm a bit depressed, since I've started investigating distributed storage to have shared storage for 2-3 node proxmox (kvm virtualization)
[14:52] <markit> and ceph should have been a "less expensive" solution
[14:53] <markit> but if you try to use branded hw, like Dell, you end up with very high costs...
[14:53] <markit> also if you think about failure points and redundancy, you have to double a lot of stuff
[14:53] <markit> i.e. ceph nodes with 2x10Gb ports, double switch? but that would just make the single public interface redundant
[14:54] <markit> if you want to separate public from storage, you need 2x2x10Gb?
[14:54] <markit> also Dell SSDs are incredibly expensive
[14:55] <markit> it's a minefield of "model x is good for the price but can't expand storage, model y can expand storage but not enough ram, etc"
[14:55] <markit> (or you don't have a 10Gb nic available as an option)
[14:55] <markit> how do you people choose the hardware?
[14:55] <markit> self build?
[14:56] <markit> I can go to my (even if poor) customers and ask them to trust non-branded hw
[14:56] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[14:56] <markit> I'm very open for suggestions :)
[14:56] <markit> I've also evaluated sheepdog running in proxmox nodes
[14:57] <markit> that would be much cheaper, but it seems not as reliable as ceph (maybe in the future...)
[15:00] * Maskul (~Maskul@host-92-25-196-169.as13285.net) has joined #ceph
[15:00] <Psi-jack> Heck, I don't personally use branded hardware, other than Intel NICs, just because they are one of the better ones out there.
[15:00] <markit> Psi-jack: copper?
[15:01] <Psi-jack> I do have, in my house, a singular Dell PowerEdge older model tower unit. It's actually the slowest one I got. LOL
[15:01] * jabadia (~jabadia@77.125.82.90) has joined #ceph
[15:01] <markit> and 1 or 10 Gb? (I'm sure you told me already, and I think 1 Gb)
[15:01] <Psi-jack> 1gb, yes
[15:01] <markit> and ceph nodes with 2 interfaces? I've read that 1 Gb saturates very fast if you have only a public network
[15:02] <markit> but again I've no idea if I'm trying to reach performance that I will never practically need :(
[15:04] <Psi-jack> I run it with 2, yes. 1 LAN, 1 SAN.
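
That split is just the standard public/cluster network setting in ceph.conf; a minimal sketch with made-up subnets:

    [global]
        public network  = 192.168.1.0/24   ; client-facing traffic (LAN)
        cluster network = 10.0.0.0/24      ; OSD replication and heartbeat traffic (SAN)
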
[15:19] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) Quit (Ping timeout: 480 seconds)
[15:26] * infinitytrapdoor (~infinityt@p5B255184.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[15:27] * dosaboy (~dosaboy@host86-164-136-61.range86-164.btcentralplus.com) has joined #ceph
[15:33] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) has joined #ceph
[15:34] * dosaboy_ (~dosaboy@host86-145-219-174.range86-145.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[15:37] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[16:00] * infinitytrapdoor (~infinityt@p5B255184.dip0.t-ipconnect.de) has joined #ceph
[16:05] * dosaboy_ (~dosaboy@host86-161-202-118.range86-161.btcentralplus.com) has joined #ceph
[16:08] * infinitytrapdoor (~infinityt@p5B255184.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[16:11] * dosaboy (~dosaboy@host86-164-136-61.range86-164.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[16:19] * Maskul (~Maskul@host-92-25-196-169.as13285.net) Quit (Quit: Leaving)
[16:22] * john_barbee_ (~jbarbee@173-16-234-208.client.mchsi.com) has joined #ceph
[16:23] * john_barbee_ (~jbarbee@173-16-234-208.client.mchsi.com) Quit ()
[16:29] * john_barbee (~jbarbee@173-16-234-208.client.mchsi.com) Quit (Ping timeout: 480 seconds)
[16:48] * BillK (~BillK-OFT@124-169-221-120.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[16:50] * haomaiwang (~haomaiwan@117.79.232.244) Quit (Remote host closed the connection)
[16:54] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[16:58] <Cybertinus> I'm working through the Quick Start guide now, but I have a small question:
[16:58] <Cybertinus> I'm at http://eu.ceph.com/docs/master/start/quick-ceph-deploy/#add-ceph-osd-daemons
[16:59] * jabadia (~jabadia@77.125.82.90) Quit (Remote host closed the connection)
[16:59] <Cybertinus> If I have separate "disks" (in reality they are logical volumes within my LVM volume group), do I still need to create those "disks" in /tmp?
[16:59] <Cybertinus> I can skip that step right?
[17:32] <Cybertinus> yeah, looks that way
[17:32] <Cybertinus> I'm only having trouble hooking Ceph up to LVM LVs, so I'll switch over to just partitions then
[17:49] * infinitytrapdoor (~infinityt@p5B255184.dip0.t-ipconnect.de) has joined #ceph
[18:06] * infinitytrapdoor (~infinityt@p5B255184.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[18:06] * infinitytrapdoor (~infinityt@p5B255184.dip0.t-ipconnect.de) has joined #ceph
[18:14] * infinitytrapdoor (~infinityt@p5B255184.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[18:35] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[18:51] * haomaiwang (~haomaiwan@117.79.232.244) has joined #ceph
[18:53] * fireD (~fireD@93-139-159-20.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[18:55] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has left #ceph
[18:56] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[18:59] * haomaiwang (~haomaiwan@117.79.232.244) Quit (Ping timeout: 480 seconds)
[19:08] * infinitytrapdoor (~infinityt@p5B255184.dip0.t-ipconnect.de) has joined #ceph
[19:08] * dosaboy_ (~dosaboy@host86-161-202-118.range86-161.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[19:21] * yeled (~yeled@spodder.com) Quit (Ping timeout: 480 seconds)
[19:22] * infinitytrapdoor (~infinityt@p5B255184.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[19:51] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[19:52] * haomaiwang (~haomaiwan@li565-182.members.linode.com) has joined #ceph
[19:56] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[20:00] * haomaiwang (~haomaiwan@li565-182.members.linode.com) Quit (Ping timeout: 480 seconds)
[20:08] <infernix> is there a smart way to combine smartctl with failing osds?
[20:08] <infernix> e.g. can a failed osd be given a smartctl -t short and/or long, and use the result accordingly?
[20:09] <Cybertinus> you could build a script that runs smartctl, and if you don't like the output, you can shut down the corresponding osd, resulting in the PGs being copied to some other place in the cluster if the downtime is longer than 5 minutes
[20:09] <Cybertinus> and run that script every minute or so via cron
[20:09] <infernix> i'm thinking the other way around
[20:10] <Cybertinus> ah, yes, indeed
[20:10] <Cybertinus> hmm
[20:10] <Cybertinus> don't know about that
[20:10] <infernix> i've got like 8 or 10 xfs failures, osds go down
[20:10] <infernix> but is this really a broken disk or is there something else going on
[20:10] <infernix> (out of 120 osds)
[20:11] <Cybertinus> ok
[20:11] <Cybertinus> 8 or 10 all of a sudden?
[20:11] <infernix> no, over time
[20:11] <infernix> errors like
[20:11] <infernix> xfs_log_force: error 5 returned.
[20:13] <Cybertinus> ok
[20:13] <Cybertinus> (before you get your hopes up, I'm a pretty big n00b with Ceph, I just started playing with it yesterday/today)
[20:14] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[20:14] <Cybertinus> what does Google tell you with that command?/
[20:14] * fireD (~fireD@93-139-177-124.adsl.net.t-com.hr) has joined #ceph
[20:15] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[20:15] <Cybertinus> err, what does google tell you with that error?
[20:16] <infernix> i'm sure there's a problem with the disk, just wondering whether ceph is smart enough to figure that out, or if someone has done work on monitoring that incorporates this
[20:18] <Cybertinus> well, you can always hook smartctl up with your current monitoring
[20:18] <Cybertinus> and compare manually if there is some correlation
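
A minimal sketch of the cron-driven script Cybertinus describes above; the devices, the device-to-OSD mapping and the upstart-style stop command are all assumptions for the example:

    #!/bin/sh
    # stop an OSD whose disk no longer passes its SMART health check
    check() {
        dev="$1"; osd="$2"
        if ! smartctl -H "$dev" | grep -q PASSED; then
            logger "SMART health check failed on $dev, stopping osd.$osd"
            stop ceph-osd id="$osd"    # upstart; use 'service ceph stop osd.N' on sysvinit
        fi
    }
    check /dev/sdb 0
    check /dev/sdc 1
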
[20:33] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[20:38] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[20:44] <infernix> hrm
[20:45] <infernix> i restarted a node and now 'initctl restart ceph-osd-all' isn't working
[20:52] * haomaiwang (~haomaiwan@118.186.202.253) has joined #ceph
[20:56] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[21:05] * haomaiwang (~haomaiwan@118.186.202.253) Quit (Ping timeout: 480 seconds)
[21:08] * infernix is puzzled
[21:08] <infernix> it doesn't do anything at all
[21:08] <infernix> doesn't even try to start the osd
[21:11] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[21:16] <infernix> ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-1: (2) No such file or directory
[21:16] <infernix> why isn't it automounting the disks?
[21:16] <infernix> or where does that happen?
[21:18] <infernix> ceph-hotplug
[21:19] <infernix> but that needs a DEVNAME
[21:19] * infinitytrapdoor (~infinityt@p5DDD57CB.dip0.t-ipconnect.de) has joined #ceph
[21:22] * yeled (~yeled@spodder.com) has joined #ceph
[21:22] <infernix> ahh
[21:22] <infernix> ceph-disk-activate
[21:23] <infernix> missing key
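
For anyone following along, roughly the manual equivalent of what ceph-disk-activate does here (device, OSD id and mount point are assumptions; the cephx/bootstrap keys have to be in place first, which was the missing piece):

    mount /dev/sdb1 /var/lib/ceph/osd/ceph-1   # 'unable to open OSD superblock' usually means this mount is missing
    start ceph-osd id=1                        # upstart job; 'service ceph start osd.1' on sysvinit
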
[21:32] <infernix> map e14246 wrongly marked me down
[21:33] * infernix panics
[21:39] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[21:51] <infernix> argh
[21:51] <infernix> restarting osds doesn't fix it
[21:54] * n3c8-35575v2 (~mhattersl@pix.office.vaioni.com) has joined #ceph
[21:54] * n3c8-35575 (~mhattersl@pix.office.vaioni.com) Quit (Read error: Connection reset by peer)
[21:54] <Psi-jack> okay.. So now that I have my Arch servers replaced with CentOS, time to start upgrading from Bobtail to Cuttlefish. :)
[21:58] * haomaiwang (~haomaiwan@106.3.103.134) has joined #ceph
[22:04] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[22:06] * haomaiwang (~haomaiwan@106.3.103.134) Quit (Ping timeout: 480 seconds)
[22:10] <infernix> help
[22:11] <infernix> ceph continuously spits out "wrongly marked me down" for a number of OSDs
[22:11] <infernix> i fluctuate from 46 in to 54 in and back down to 46 in
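
While chasing flapping like this, one commonly used stop-gap (again assuming a release recent enough to have the flag) is to keep the monitors from marking OSDs down while the heartbeat/network path is investigated:

    ceph osd set nodown      # temporary: stops the down/up flapping during investigation
    ceph osd unset nodown    # clear it again afterwards
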
[22:24] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[22:36] * infinitytrapdoor (~infinityt@p5DDD57CB.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[22:38] <Psi-jack> Well, that was fricken easy as hell to upgrade to cuttlefish.
[22:38] * infinitytrapdoor (~infinityt@p5DDD57CB.dip0.t-ipconnect.de) has joined #ceph
[22:38] <Psi-jack> THANK YOU CEPH DEVS! You are brilliant with your work. :)
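
For anyone repeating this, the documented rolling-upgrade order for bobtail to cuttlefish is monitors first, then OSDs, then any MDS, one node at a time; roughly, on an RPM-based install using the sysvinit script:

    yum update ceph              # on each node in turn
    service ceph restart mon     # all monitors first
    service ceph restart osd     # then the OSDs
    service ceph restart mds     # then any MDS daemons
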
[22:48] <infernix> mons can't get into agreement
[22:48] <infernix> one mon always stays out
[22:50] <infernix> and it switches
[22:52] <infernix> one mon switches to 0.0.0.0 ip and then back to the actual ip
[22:52] <infernix> wth is going on >.<
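
A monitor that shows up as 0.0.0.0 often means it started without an address for itself in its monmap; a way to see what each mon thinks is going on is its admin socket (default socket path shown, adjust the mon id):

    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok mon_status      # rank, state and the monmap this mon is using
    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok quorum_status
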
[22:58] * haomaiwang (~haomaiwan@li565-182.members.linode.com) has joined #ceph
[23:06] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[23:06] <Cybertinus> sounds like a network problem :S
[23:06] * haomaiwang (~haomaiwan@li565-182.members.linode.com) Quit (Ping timeout: 480 seconds)
[23:06] <Cybertinus> nice, good to hear Psi-jack :)
[23:06] <Psi-jack> heh
[23:07] <Cybertinus> I've got my first Ceph cluster running in my Virtualbox enviroment now :). 1 mon, 2 osd's, so pretty damn small
[23:07] <Cybertinus> but it runs
[23:07] <Psi-jack> yeah. Now, I'm about to upgrade my PVE cluster. :)
[23:07] <Cybertinus> `ceph health` tells me "HEALTH_OK", so that's good
[23:11] <markit> Psi-jack: is it documented? I mean, a cluster upgrade from 2.x to 3.0; wondering how it can be done, because maybe 3.x is not compatible with 2.x
[23:12] <infernix> i restarted all mons and things are fine now
[23:14] <Cybertinus> markit: yes, it is documented. You can easily upgrade from Proxmox 2.3 to 3.0. They even wrote a script for it. Download it, give it execute rights, run it, done :). Basically. There are some things you have to do afterwards, but nothing major
[23:15] <Cybertinus> markit: the entier procedure is described here: http://pve.proxmox.com/wiki/Upgrade_from_2.3_to_3.0
[23:15] <markit> Cybertinus: I did, but for a SINGLE pve server; wondering if it works for a node too, without losing "clusterness" ;P
[23:16] <markit> i.e. do you have to remove it from the cluster first, then re-insert it?
[23:16] <Cybertinus> markit: ok, that is a good question. Didn't try that yet. I did upgrade my single 2.3 proxmox machine here to 3.0 too, but that is also not a cluster.
[23:17] <Cybertinus> I would try that out in a few Virtualbox machines (Proxmox can be installed in VirtualBox) or ask it on their forum or something
[23:17] <Cybertinus> or just take the gamble and test it in your production environment of course :P ;)
[23:19] * infinitytrapdoor (~infinityt@p5DDD57CB.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[23:40] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[23:40] * ChanServ sets mode +v andreask
[23:42] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has left #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.