#ceph IRC Log


IRC Log for 2013-08-09

Timestamps are in GMT/BST.

[0:00] * scuttlemonkey (~scuttlemo@mb52036d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[0:00] * nwat (~nwat@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[0:02] * sagelap (~sage@2600:1012:b01d:1be6:f904:5d2c:6572:3201) Quit (Ping timeout: 480 seconds)
[0:08] * rturk is now known as rturk-away
[0:12] * netsrob (~thorsten@212.224.79.27) Quit (Quit: cu)
[0:14] * alfredod_ (~alfredode@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[0:14] * alfredodeza (~alfredode@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[0:14] * sagelap1 (~sage@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[0:14] * jeff-YF (~jeffyf@67.23.117.122) Quit (Ping timeout: 480 seconds)
[0:15] * sprachgenerator (~sprachgen@130.202.135.172) Quit (Quit: sprachgenerator)
[0:16] * alfredodeza (~alfredode@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[0:16] * alfredod_ (~alfredode@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[0:17] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[0:19] * nwat (~nwat@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[0:19] * bandrus (~Adium@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[0:19] * alfredodeza (~alfredode@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[0:19] * alfredodeza (~alfredode@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[0:19] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) Quit (Quit: ...)
[0:21] * alfredod_ (~alfredode@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[0:22] * alfredodeza (~alfredode@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[0:23] * nwat (~nwat@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[0:25] * joao (~JL@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[0:27] * bandrus (~Adium@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[0:29] * bandrus (~Adium@38.106.55.34) has joined #ceph
[0:31] * BillK (~BillK-OFT@124-148-246-233.dyn.iinet.net.au) has joined #ceph
[0:33] <ishkabob> when I run ceph-deploy to add an osd, does my disk already need to be mounted somewhere?
[0:34] * joao (~JL@me70436d0.tmodns.net) has joined #ceph
[0:34] * ChanServ sets mode +o joao
[0:35] * __jt___ (~james@rhyolite.bx.mathcs.emory.edu) Quit (Ping timeout: 480 seconds)
[0:37] * thelan (~thelan@paris.servme.fr) Quit (Ping timeout: 480 seconds)
[0:38] * __jt__ (~james@rhyolite.bx.mathcs.emory.edu) has joined #ceph
[0:40] * thelan (~thelan@paris.servme.fr) has joined #ceph
[0:43] * nwat (~nwat@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[0:43] * alfredod_ (~alfredode@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[0:43] * alfredodeza (~alfredode@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[0:44] * Cube (~Cube@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[0:49] * alfredodeza (~alfredode@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Remote host closed the connection)
[0:52] * KippiX (~kippix@coquelicot-a.easter-eggs.com) Quit (Read error: Operation timed out)
[0:52] * KippiX (~kippix@coquelicot-s.easter-eggs.com) has joined #ceph
[0:54] * ishkabob (~c7a82cc0@webuser.thegrebs.com) Quit (Quit: TheGrebs.com CGI:IRC (Ping timeout))
[0:59] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Quit: Konversation terminated!)
[1:01] * Cube (~Cube@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[1:01] * Cube1 (~Cube@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[1:02] <Kioob> how much RAM is needed for mons ?
[1:02] * KippiX_ (~kippix@coquelicot-a.easter-eggs.com) has joined #ceph
[1:03] * bandrus1 (~Adium@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[1:04] <Kioob> I read 1 GB per mon-daemon, but currently my daemons are eating 2 GB each
[1:04] <lurbs> Kioob: http://ceph.com/docs/next/install/hardware-recommendations/
[1:04] * KippiX (~kippix@coquelicot-s.easter-eggs.com) Quit (Ping timeout: 480 seconds)
[1:05] <Kioob> So, it's wrong ?
[1:05] <lurbs> Well, it does say that the 1 GB per monitor is for "small production clusters and development clusters", and is a minimum.
[1:06] * nwat (~nwat@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[1:06] <Kioob> oh, then I read too fast
[1:06] <Kioob> thanks !
[1:08] * bandrus2 (~Adium@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[1:08] * nwat (~nwat@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[1:08] * bandrus (~Adium@38.106.55.34) Quit (Ping timeout: 480 seconds)
[1:09] * bandrus1 (~Adium@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[1:09] * Cube (~Cube@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[1:09] * Cube1 (~Cube@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[1:11] * nwat (~nwat@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[1:11] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[1:15] * The_Bishop__ (~bishop@f052096065.adsl.alicedsl.de) has joined #ceph
[1:18] * stepan_cz (~Adium@host86-134-75-80.range86-134.btcentralplus.com) has joined #ceph
[1:19] * The_Bishop_ (~bishop@f052103091.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[1:20] * stepan_cz (~Adium@host86-134-75-80.range86-134.btcentralplus.com) Quit ()
[1:20] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:20] * thelan (~thelan@paris.servme.fr) Quit (Ping timeout: 480 seconds)
[1:20] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley)
[1:22] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 22.0/20130618035212])
[1:22] * athrift_ (~nz_monkey@203.86.205.13) has joined #ceph
[1:23] * athrift (~nz_monkey@203.86.205.13) Quit (Ping timeout: 480 seconds)
[1:23] * bandrus (~Adium@99-119-181-1.uvs.irvnca.sbcglobal.net) has joined #ceph
[1:23] * bandrus2 (~Adium@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[1:24] * bandrus (~Adium@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit ()
[1:29] * Cube (~Cube@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[1:29] * piti (~piti@82.246.190.142) Quit (Ping timeout: 480 seconds)
[1:29] * nwat (~nwat@99-119-181-1.uvs.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[1:29] * joao (~JL@me70436d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[1:33] * piti (~piti@82.246.190.142) has joined #ceph
[1:35] * tnt_ (~tnt@91.177.243.62) Quit (Ping timeout: 480 seconds)
[1:40] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[1:40] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[1:41] <ron-slc> Quick question on deep-scrubbing. It seems deep-scrub uses a read with buffer-cache. This pushes "more-active" data out of the cache, causing a time-consuming read operation. Would deep-scrubbing benefit from O_DIRECT, to bypass caching?
[1:41] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[1:41] * zjohnson (~zjohnson@guava.jsy.net) has joined #ceph
[1:45] <zjohnson> about to buy some new hw for a ganeti w/ rbd cluster. Having a hard time finding answers to questions like can I buy a bunch of identical nodes and have them all host ceph OSDs as well as virtual machines
[1:45] <zjohnson> currently doing a lot of virtualization with kvm and just using virsh virt-manager for management
[1:46] <ron-slc> I would think this is possible, as the two products are separate, and use separate network ports. Ganeti uses librbd to my knowledge, so same-host should be an issue.
[1:47] <ron-slc> same-host *should-NOT* be an issue...
[1:47] * __jt___ (~james@rhyolite.bx.mathcs.emory.edu) has joined #ceph
[1:47] * __jt__ (~james@rhyolite.bx.mathcs.emory.edu) Quit (Read error: Connection reset by peer)
[1:48] <zjohnson> I am sort of thinking of going with something like a few of these http://www.supermicro.com/products/nfo/2UTwin2.cfm or a few of these http://www.supermicro.com/products/system/3u/5037/sys-5037mr-h8trf.cfm
[1:49] <zjohnson> and by few I mean 2-4
[1:49] <zjohnson> right now I have mostly 1U nodes with 2 or 4 3.5" bays in front
[1:51] <zjohnson> I have used DRBD in the past, but don't fully grasp what a good hw foundation would be for ganeti+rbd
[1:53] <ron-slc> any chassis will technically work. we replaced all our SuperMicros with Intel S2600GZ systems, they include very good BMC remote management: http://www.intel.com/content/www/us/en/motherboards/server-motherboards/server-board-s2600gl-gz.html
[1:54] * DarkAceZ (~BillyMays@50.107.55.36) Quit (Ping timeout: 480 seconds)
[1:55] <ron-slc> ceph/rbd is not as fast as a local RAID-10 with array accelerator, due to the multi host + multi disk transaction (replication) commit. But it is much safer than RAID, and you can reboot your storage nodes. For max performance, I'd still recommend Batter/Flash Backed Write cache modules on an array controller.
[1:56] <ron-slc> recommend *BatterY*/Flash
[1:56] <zjohnson> battery flash?
[1:56] <ron-slc> battery-backed, or flash-backed array controller.
[1:56] * scuttlemonkey (~scuttlemo@2607:f298:a:607:98a7:d2f5:1975:9ca5) has joined #ceph
[1:56] * ChanServ sets mode +o scuttlemonkey
[1:57] <ron-slc> either one provides you with Write-Back caching at the hardware level.
[1:57] <ron-slc> well, provides you with *safe* write-back caching.
[1:58] * DarkAceZ (~BillyMays@50.107.55.36) has joined #ceph
[1:59] <zjohnson> I've been using software raid almost exclusively so far
[1:59] <zjohnson> hm
[1:59] <ron-slc> ahh, then performance should only improve for you.. ;)
[2:01] <zjohnson> generally RBD doesn't use traditional raid though is my understanding
[2:02] <zjohnson> ?
[2:02] * The_Bishop__ (~bishop@f052096065.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[2:03] <ron-slc> well you *can* use underlying software/hardware RAID, but this is highly unnecessary, especially if your replica-size/count is set to 3. We use only RAID0 for our Hardware-arrays, just to benefit from cache-acceleration, NO redundancy above what Ceph/Rados provide.
[2:04] <zjohnson> do you put a single raid0 for all rbd allocated disks on each node?
[2:04] <zjohnson> so if you lose a disk all the node's storage is gone, but thats ok with 3x replication?
[2:05] <ron-slc> we do one physical Disk per RAID0. (same as a single sd* Scsi-Disk) in linux software raid.
[2:05] <ron-slc> well, even with 2x replication you are safe; most people do 3x replication, just to CYA in case of a multi-disk failure during rebuild.
[2:06] * gentleben (~sseveranc@c-98-207-40-73.hsd1.ca.comcast.net) has joined #ceph
[2:06] <zjohnson> sorry if I'm being dense, but what is the benefit to running raid0 for a single physical disk?
[2:06] <zjohnson> isn't that the same as just running the single disk w/o any raid?
[2:07] <ron-slc> In our case, with Hardware RAID accelerators, this gives us Write-back cache acceleration.
[2:08] <zjohnson> ok, so you are implementing raid0 on hw raid controllers to derive a speed benefit over just hooking the drive up to a normal sata or sas or whatever port?
[2:08] <ron-slc> If you don't use hardware RAID accelerators, you just do direct-single Disks. This is how most people here seem to do BULK disk storage.
[2:08] <ron-slc> correct.
[2:08] <zjohnson> what model of hw raid do you commonly use or was that integrated into the mb?
[2:08] <ron-slc> Yea, you are correct RAID0 on software raid would be pointless. :)
[2:08] <lurbs> We do a similar thing, yeah. With the controllers we're using JBOD mode doesn't support writeback caching.
[2:09] * diegows (~diegows@200.68.116.185) Quit (Ping timeout: 480 seconds)
[2:11] <ron-slc> We use the Intel RMS25PB080 with the Intel chassis, this is the same as LSI Logic 2208 Chip.
[2:11] <zjohnson> so if you have a flash-backed write cache with capacitors to let it do its thing in the event of power loss
[2:11] <ron-slc> lurbs: correct, JBOD typically doesn't provide wb-caching.
[2:11] <zjohnson> then if you say lose power
[2:11] <zjohnson> system powers up
[2:12] * mozg (~andrei@host109-151-35-94.range109-151.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[2:12] <zjohnson> raid card does the right thing with the flash-backed cache and you are free to happily go on your way?
[2:12] * The_Bishop (~bishop@e177091115.adsl.alicedsl.de) has joined #ceph
[2:12] * scuttlemonkey (~scuttlemo@2607:f298:a:607:98a7:d2f5:1975:9ca5) Quit (Ping timeout: 480 seconds)
[2:12] <ron-slc> Yes, on power-up RAID Card reads the dirty-writes. And they are written to disk, before the OS boots
[2:13] <zjohnson> and the flash cache may commonly be about 512MB?
[2:13] <zjohnson> do you use discrete hw raid cards or motherboards with integrated controllers?
[2:14] <ron-slc> it is typically sized to the Controller's RAM. Upon power failure, capacitors power RAM + Flash, to do a quick dump of dirty RAM-write cache contents
[2:14] <zjohnson> and since writes don't have to wait for physical spinning disks, overall shared file i/o is greatly improved for a wide variety of vm applications?
[2:15] <ron-slc> We prefer add-in PCIExpress cards, on-board modules are typically harder to troubleshoot, and less portable to other systems.
[2:15] <lurbs> ron-slc: Don't suppose you had any issues getting that controller to work with ACPI on?
[2:16] <zjohnson> and if you were able to drop money on ssd as main storage then you could probably get away with write-through caching on normal motherboard sata ports and maybe no longer see much improvement with write-back enabled?
[2:18] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[2:19] <ron-slc> lurbs: Ah ha, YES. HUGE PITA. A kernel in the past 1/2 years broke the ACPI PCI allocations. The fix is: GRUB_CMDLINE_LINUX="pci=conf1" in your /etc/default/grub. I wrote up workaround details in comment #4 here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1091263
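A minimal sketch of the workaround ron-slc describes, assuming an Ubuntu/Debian-style GRUB 2 install (the kernel parameter comes from the log above; the file path and update-grub step are the usual Ubuntu ones):

    # /etc/default/grub -- force legacy PCI config-space access
    GRUB_CMDLINE_LINUX="pci=conf1"

    # regenerate the boot config and reboot for it to take effect
    sudo update-grub
    sudo reboot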
[2:19] <zjohnson> and this is all pretty much just for writes, since Linux already does good in-memory caching for reads.
[2:21] <ron-slc> BEWARE of SSD, it has a typical daily write-duty cycle of *ONLY* 20GB, over 4-5 years. Double this to 40GB, and expect death in only 2-2.5 years. Most ceph MON nodes exceed this limit on their OWN!!!!! VERY DANGEROUS, check the manufacturer specs BEFORE buying, we had to trash 8 SSD disks.
[2:21] <lurbs> ron-slc: Ah, excellent, thanks for that. I've just had ACPI disabled on our Ceph storage nodes.
[2:21] <ron-slc> correct, read-caching by the OS happens before the hardware RAID read-cache.
[2:22] <zjohnson> hmm
[2:22] <ron-slc> lurbs: yea, lost 2 days to that one. I haven't checked the 3.10, or 3.11 kernels yet..
[2:23] <zjohnson> hmmm
[2:23] <ron-slc> k guys, nice chat! I need to leave for the day!
[2:23] <zjohnson> thanks!
[2:23] <ron-slc> my pleasure!
[2:23] * zjohnson needs to figure this all out soonish
[2:24] <zjohnson> now to find a good cheap flash cached controller for only 2 disks
[2:24] <zjohnson> could go well with the microcloud hw
[2:34] <zjohnson> afk
[2:42] * huangjun (~kvirc@111.174.91.224) has joined #ceph
[2:45] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[2:48] * mschiff_ (~mschiff@port-34442.pppoe.wtnet.de) has joined #ceph
[2:49] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[2:50] * yy-nm (~chatzilla@115.196.74.105) has joined #ceph
[2:52] * yanzheng (~zhyan@134.134.139.76) has joined #ceph
[2:56] * mschiff (~mschiff@port-28851.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[2:57] * LeaChim (~LeaChim@97e00998.skybroadband.com) Quit (Ping timeout: 480 seconds)
[3:04] * bandrus (~Adium@2607:f298:a:697:4586:cadd:3df4:eedb) has joined #ceph
[3:10] * bandrus (~Adium@2607:f298:a:697:4586:cadd:3df4:eedb) Quit (Quit: Leaving.)
[3:10] * thelan (~thelan@paris.servme.fr) has joined #ceph
[3:23] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[3:28] <shimo> hi. for reasons i have little control over, i'm trying to create an OSD from a plain folder on the OS disk.
[3:29] <shimo> it seems like it should be possible, for example ceph-disk prepare accepts DATA, which is a "path to OSD data (a disk block device or directory)"
[3:31] * cfreak201 (~cfreak200@p4FF3E75F.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[3:33] <shimo> and https://github.com/ceph/ceph/blob/master/doc/start/quick-ceph-deploy.rst#multiple-osds-on-the-os-disk-demo-only gives an example using ceph-deploy.. however, i'm using chef and trying to use ceph-disk-prepare && ceph-disk-activate instead
[3:42] * yy (~michealyx@115.196.74.105) has joined #ceph
[3:42] * yy (~michealyx@115.196.74.105) has left #ceph
[3:48] * julian (~julianwa@125.69.104.58) has joined #ceph
[3:48] * julian (~julianwa@125.69.104.58) Quit (Read error: Connection reset by peer)
[3:49] <yy-nm> hi all, i'm wondering if there's a manual or tutorial book for the ceph source?
[3:55] <yanzheng> http://ceph.com/docs/master/dev/ + sage's thesis
[3:59] <yy-nm> thx
[4:00] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Read error: Operation timed out)
[4:03] * markl (~mark@tpsit.com) Quit (Ping timeout: 480 seconds)
[4:04] * jaydee (~jeandanie@124x35x46x8.ap124.ftth.ucom.ne.jp) has joined #ceph
[4:06] * silversurfer (~jeandanie@124x35x46x12.ap124.ftth.ucom.ne.jp) Quit (Read error: Operation timed out)
[4:52] * cfreak200 (~cfreak200@p4FF3E75F.dip0.t-ipconnect.de) has joined #ceph
[5:04] * silversurfer (~jeandanie@124x35x46x15.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:05] * fireD_ (~fireD@93-142-234-132.adsl.net.t-com.hr) has joined #ceph
[5:07] * fireD (~fireD@93-139-160-151.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:08] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[5:08] * jaydee (~jeandanie@124x35x46x8.ap124.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[5:08] * jpieper (~josh@209-6-205-161.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) has joined #ceph
[5:19] * jaydee (~jeandanie@124x35x46x12.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:23] * silversurfer (~jeandanie@124x35x46x15.ap124.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[5:37] * KindTwo (KindOne@h1.42.186.173.dynamic.ip.windstream.net) has joined #ceph
[5:41] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[5:41] * KindTwo is now known as KindOne
[5:43] <huangjun> we use the same rbd on 3 hosts; one of the 3 exports it as iscsi for win users, and the other 2 hosts mount the rbd device on a local directory
[5:46] <huangjun> then i write different files on the 2 hosts that mounted the rbd on local dirs, and they cannot see the files that the other host writes
[5:54] <elder> You can't mount the same rbd device on more than one host at the same time.
[5:54] <huangjun> elder: but we have mounted on two hosts
[5:55] <elder> If you attempt to do this you will at best have it function incorrectly, but more likely you will wind up with a corrupted file system and corrupted data.
[5:55] <elder> One rbd image can only be used by one host at a time.
[5:55] <elder> If you mount it on one only, it will work fine. If you unmount it, then mount it on the other host, that will be OK too.
[5:55] <elder> But if you try to do them both at the same time the two hosts will interfere with each other.
[5:56] <elder> If you want to do something like that you need a shared file system, not just a shared block device.
[5:56] <yanzheng> you can't do this unless using cluster fs such as ocfs2
[5:56] <elder> Ceph (CephFS) is capable of sharing the same file system on multiple hosts. It works but it's not production-ready yet.
[5:57] <huangjun> yes, we found the inconsistency between multi rbd clients
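A sketch of the only safe pattern elder describes when a plain (non-cluster) filesystem sits on an rbd image: map and mount it on one host at a time, and fully unmount/unmap before another host touches it. Image, pool and mount-point names here are made up:

    # on host A
    rbd map myimage --pool rbd     # shows up as e.g. /dev/rbd0
    mount /dev/rbd0 /mnt/data
    # ... use it ...
    umount /mnt/data
    rbd unmap /dev/rbd0

    # only after host A has unmapped, on host B
    rbd map myimage --pool rbd
    mount /dev/rbd0 /mnt/data

Concurrent access from several hosts needs either a cluster filesystem (e.g. ocfs2) on the shared device or CephFS, as yanzheng and elder note above.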
[5:58] <yanzheng> I think single mds cephfs is already quite stable
[6:00] <huangjun> the risk is high if the only mds goes down
[6:00] <yanzheng> there are standby mds
[6:04] <huangjun> yanzheng: are you working on mds stability?
[6:04] <yanzheng> yes
[6:07] <huangjun> so when the mds writes metadata to an osd, does it calculate the location using crush?
[6:14] * hugokuo (~hugokuo@75-101-56-159.dsl.static.sonic.net) has joined #ceph
[6:14] * hugo_kuo (~hugokuo@75-101-56-159.dsl.static.sonic.net) has joined #ceph
[6:15] * hugo_kuo (~hugokuo@75-101-56-159.dsl.static.sonic.net) Quit ()
[6:17] <hugokuo> Hi all, I'm a newbie of Ceph.
[6:18] <hugokuo> There's a RadosGW in front of my RADOS pool, with 3 nodes each having 10 drives for OSDs. After several runs, I found some issues.
[6:19] <hugokuo> 1) I can not purge deleted objects with radosgw-admin; a failure msg is returned. 2) The data placement is very imbalanced. http://i217.photobucket.com/albums/cc280/tonytkdk/imbalanced_zps1e3117a7.png
[6:19] * scuttlemonkey (~scuttlemo@mbf2036d0.tmodns.net) has joined #ceph
[6:19] * ChanServ sets mode +o scuttlemonkey
[6:20] <hugokuo> osdmap e194: 30 osds: 30 up, 30 in
[6:31] <huangjun> hugokuo: data imbalance is related to your crush settings; are you using host as the crush leaf node?
[6:32] <huangjun> do you want to purge the objects by radosgw-admin?
[6:32] <huangjun> what error outputs?
[6:35] * KindTwo (KindOne@h65.40.28.71.dynamic.ip.windstream.net) has joined #ceph
[6:37] <hugokuo> huangjun, I'm not sure whether I use host as the crush leaf node. I deployed the three RADOS nodes with ceph-deploy by following the quick start instructions.
[6:37] <hugokuo> huangjun, as for the temp remove, the failure msg is as follows:
[6:37] <hugokuo> failed to list objects
[6:37] <hugokuo> failure removing temp objects: (2) No such file or directory
[6:37] <huangjun> uhh, the default is using host as leaf node
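To confirm how the CRUSH hierarchy is actually laid out (i.e. whether hosts are the buckets directly above the osds), the tree can be dumped; a small sketch:

    # prints the CRUSH hierarchy: root -> host -> osd, with weights
    ceph osd tree

Uneven per-osd weights, or too few placement groups in the pool, are common causes of the kind of imbalance hugokuo's graph shows.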
[6:38] <huangjun> that means you don't have the object; you should check whether you have it or not
[6:38] <hugokuo> I found that data remain in Rados pool after issued deleting operation from swift CLI client
[6:38] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:38] * KindTwo is now known as KindOne
[6:39] <hugokuo> huangjun, alright, I'd better study more about RADOS first. thanks for your help
[6:41] <hugokuo> btw, are you familiar with RadosGW? It seems to have a limit on the number of concurrent connections.
[6:54] * AfC (~andrew@2407:7800:200:1011:f0a3:241:15ee:11ca) has joined #ceph
[6:59] * Machske (~Bram@81.82.216.124) Quit (Ping timeout: 480 seconds)
[7:04] * Machske (~Bram@81.82.216.124) has joined #ceph
[7:18] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[7:18] <sage> yanzheng: around?
[7:18] <yanzheng> yes
[7:18] <yanzheng> I saw you email
[7:18] <sage> yanzheng: wip-mds looks good. going to pull it into master (which we will resume testing next week)
[7:18] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[7:19] <yanzheng> thanks
[7:19] <sage> heh yeah, can't believe i missed that this was already implemented during the cds slot :)
[7:19] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[7:19] <sage> are the kernel-side fixes in testing yet? i've lost track of what has been merged and what hasn't been
[7:20] <yanzheng> the last series I sent is not in the testing branch
[7:20] <sage> k, let me look
[7:21] <sage> 3 patches from aug 6?
[7:21] <yanzheng> yes
[7:21] <yanzheng> they are relatively big
[7:23] <shimo> ok i know this is not recommended, but does anyone have any idea how to use a directory on the OS disk as an OSD?
[7:23] * KrisK (~krzysztof@213.17.226.11) has joined #ceph
[7:23] <sage> ceph-deploy osd create HOST:/some/dir
[7:23] <sage> should work. although it won't do the mount for you (if that's needed)
[7:24] <sage> 'ceph-disk -v prepare DIR' is what is happening behind the scenes; that will basically do the same thing.
[7:24] <shimo> the thing is that i've been trying to do it with ceph-disk prepare & activate for quite a while
[7:25] <shimo> the prepare goes fine, but it always attempts to create an fs in the folder during activate
[7:25] <sage> hmm that's a bug :)
[7:25] <shimo> yeah, the reason being that ceph-disk always passes --mkfs to ceph-osd
[7:26] <sage> oh, that's normal.
[7:26] <sage> ceph-osd --mkfs means that it needs to initialize it's data directory
[7:26] <sage> yanzheng: the truncate mutex is long overdue, yes
[7:27] <sage> need to review those a bit more carefully tho
[7:27] <shimo> sage: are you familiar with the ceph source?
[7:27] <sage> hmm did i pull in the delete stuff yet?
[7:27] <yanzheng> i think yes
[7:28] <shimo> sage: should https://github.com/ceph/ceph/blob/master/src/ceph_osd.cc#L173 work for a plain directory?
[7:28] <yanzheng> yes for kclient
[7:28] <shimo> because i always get the error
[7:28] <sage> shimo: yeah
[7:28] <yanzheng> I think patches for fuse and mds are still not in
[7:31] <sage> yanzheng: the fuse one doesn't seem to apply (at least with git am, which always seems much more picky)
[7:31] <sage> pulled in the mds one tho
[7:31] <shimo> sage: thanks, i'm going to try with v0.66 and see if it works there
[7:31] <sage> want to run the fuse one through qa before merging it in tho
[7:32] <sage> btw one of the other things that came up during cds was making teuthology work with euca2ools so you can run the test suite on openstack or ec2 or whatever
[7:34] <sage> btw i'll be in hong kong for the openstack summit in november. will you be there by any chance?
[7:35] <sage> shimo: if it doesn't work please send a quick message to ceph-devel (or better yet, open a tracker.ceph.com bug) with the ceph-disk commands that you did
[7:35] <sage> yanzheng: ^ re: hong kong
[7:36] <yanzheng> hong kong?
[7:36] <shimo> sage: you seem to know a lot so could you perhaps take a very quick look? maybe my usage is wrong: http://paste.ubuntu.com/5965089/
[7:36] <sage> 'll be in hong kong for the openstack summit in november. will you be there by any chance?
[7:36] <yanzheng> no
[7:36] <sage> shimo: i bet you are using ext3/ext4 and didn't mount with user_xattr option
[7:36] <sage> mount -o remount,user_xattr /
[7:37] <shimo> that sounds like a very possible cause! thanks!
[7:38] <shimo> (sadly i have no direct control over the FS but i'll see what i can do)
[7:39] * buck1 (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has left #ceph
[7:40] <sage> yanzheng: oh well. i will let you know next time i am in... shanghai, right?
[7:40] <yanzheng> yes
[7:40] <yanzheng> I will send the updated fuse patch soon
[7:42] * yy (~michealyx@115.196.74.105) has joined #ceph
[7:43] * yy (~michealyx@115.196.74.105) has left #ceph
[7:44] * yy (~michealyx@115.196.74.105) has joined #ceph
[7:44] * yy (~michealyx@115.196.74.105) has left #ceph
[7:44] <shimo> unfortunately the error did not change even after a reboot. i can see "/dev/sda1 on / type ext4 (rw,user_xattr)" in `mount` though
[7:45] <shimo> and this is actually running inside a vagrant virtual machine
[7:45] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[7:45] <sage> shimo: can you add 'debug filestore = 20' to the [osd] section of ceph.conf and reproduce the problem?
[7:45] <shimo> sure, just a sec
[7:46] * scuttlemonkey (~scuttlemo@mbf2036d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[7:47] <shimo> umm, where does the log go?
[7:47] * tnt (~tnt@91.177.243.62) has joined #ceph
[7:48] <sage> /var/log/ceph/ceph-osd.NNN.log
[7:50] <shimo> http://paste.ubuntu.com/5965134/
[7:52] <sage> shimo: oh. 'filestore xattr use omap = true' in [osd] section of ceph.conf
[7:53] <shimo> yeah actually got there in the docs right now… hopefully this is it
[7:53] <sage> i think we make it do that automatically now with dumpling
[7:54] <sage> anyway, i'm off to bed. hope that was it! 'night all
[7:54] <shimo> it's working! sorry for wasting your time over a config miss!
[7:54] <shimo> thanks a lot!
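Pulling the pieces of this exchange together, a sketch of what the directory-backed OSD setup ends up needing on an ext3/ext4 root. Paths are illustrative, the debug line is only for troubleshooting, and the prepare/activate pair mirrors the ceph-disk commands shimo was already running:

    # /etc/ceph/ceph.conf
    [osd]
        ; keep xattrs in the filestore's leveldb omap instead of relying on fs xattr limits
        filestore xattr use omap = true
        ; verbose filestore logging, only while debugging
        debug filestore = 20

    # the underlying filesystem must allow user xattrs
    mount -o remount,user_xattr /

    # then prepare and activate the directory as an OSD
    ceph-disk -v prepare /srv/osd0
    ceph-disk activate /srv/osd0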
[8:11] * rongze (~quassel@117.79.232.201) Quit (Read error: Connection reset by peer)
[8:11] * rongze (~quassel@notes4.com) has joined #ceph
[8:18] * ismell_ (~ismell@host-24-56-171-198.beyondbb.com) Quit (Ping timeout: 480 seconds)
[8:36] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:45] * athrift (~nz_monkey@203.86.205.13) has joined #ceph
[8:46] * athrift_ (~nz_monkey@203.86.205.13) Quit (Ping timeout: 480 seconds)
[8:49] * Georg (~georg_hoe@bs.xidrasservice.com) has joined #ceph
[8:58] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[8:58] * tnt (~tnt@91.177.243.62) Quit (Read error: Operation timed out)
[9:02] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Say What?)
[9:04] * [fred] (fred@konfuzi.us) Quit (Ping timeout: 480 seconds)
[9:09] * Georg (~georg_hoe@bs.xidrasservice.com) has left #ceph
[9:10] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[9:12] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[9:12] * ChanServ sets mode +v andreask
[9:12] * tnt (~tnt@ip-188-118-44-117.reverse.destiny.be) has joined #ceph
[9:12] * matt_ (~matt@mail.base3.com.au) has joined #ceph
[9:15] * evil_steve (~evil_stev@irc-vm.nerdvana.net.au) has joined #ceph
[9:15] * tserong (~tserong@124-171-119-22.dyn.iinet.net.au) has joined #ceph
[9:18] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:20] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[9:24] * sleinen (~Adium@2001:620:0:26:b5b0:57b9:6db2:a29e) has joined #ceph
[9:35] * gentleben (~sseveranc@c-98-207-40-73.hsd1.ca.comcast.net) Quit (Quit: gentleben)
[9:39] * matt_ (~matt@mail.base3.com.au) Quit (Quit: Leaving)
[9:46] * mschiff (~mschiff@port-34442.pppoe.wtnet.de) has joined #ceph
[9:47] * gentleben (~sseveranc@c-98-207-40-73.hsd1.ca.comcast.net) has joined #ceph
[9:47] * mschiff_ (~mschiff@port-34442.pppoe.wtnet.de) Quit (Remote host closed the connection)
[9:50] * mschiff_ (~mschiff@port-34442.pppoe.wtnet.de) has joined #ceph
[9:55] * mschiff (~mschiff@port-34442.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[9:55] <loicd> leseb: are you on duty this morning ?
[9:55] <loicd> and good morning :-)
[9:55] <leseb> loicd: yes I am
[9:55] <leseb> loicd: good morning :)
[9:58] <loicd> great :-) I would like to install OpenStack & ceph to work together and I'm not sure what would be the combination that is more likely to work with a minimum amount of effort. Ideally a few hours at most. As you know I'm quite familiar with both OpenStack & Ceph. But I'm not installing OpenStack on a daily basis. Nor Ceph for that matter ;-)
[9:59] * KippiX_ (~kippix@coquelicot-a.easter-eggs.com) has left #ceph
[10:06] <leseb> loicd: well I assume you can use the puppet modules for both openstack and ceph
[10:06] <leseb> how many hosts do you have to deploy?
[10:08] <loicd> leseb: I know the developers of these modules can. I'm not sure I could within hours. Even though I'm very familiar with puppet. Do you think I'll succeed ?
[10:09] <loicd> leseb: I only have a handful of hosts to deploy
[10:10] * yanzheng (~zhyan@134.134.139.76) Quit (Remote host closed the connection)
[10:11] <loicd> leseb: the one thing that's missing is examples of manifests to show how these modules can actually be used. There is some documentation. But unless the documentation is a perfect reference, it's quite difficult to figure things out without an example.
[10:11] <leseb> loicd: then if you're already familiar with puppet I assume you'll succeed :-)
[10:12] <loicd> leseb: would you say it'll take me hours or days ?
[10:12] <leseb> loicd: definitely hours
[10:12] <loicd> ok. I'll give it a try then and let you know how it went ;-)
[10:13] <leseb> loicd: sure :)
[10:13] <loicd> have you tried that yourself recently ?
[10:13] <leseb> loicd: yes and it worked like a charm
[10:14] <loicd> could you share the manifests you've used ? it would be a great start.
[10:15] <loicd> what puppet modules have you used ? and what operating system ?
[10:16] <loicd> given the puppet modules + the OS + the manifests I guess I'll have the winning combination
[10:16] * KindTwo (KindOne@h196.43.28.71.dynamic.ip.windstream.net) has joined #ceph
[10:17] * AfC (~andrew@2407:7800:200:1011:f0a3:241:15ee:11ca) Quit (Quit: Leaving.)
[10:18] <leseb> loicd: I used Ubuntu 12.04.2 with the official puppet modules for openstack (from the forge) and these modules for ceph https://github.com/enovance/puppet-ceph
[10:19] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[10:19] * KindTwo is now known as KindOne
[10:19] * LeaChim (~LeaChim@97e00998.skybroadband.com) has joined #ceph
[10:19] <loicd> https://github.com/stackforge/puppet-openstack ?
[10:19] <leseb> loicd: yes
[10:21] <loicd> can you share your manifest ? That will tell me how you assembled the various components. There are zillions of ways to do it and it will show me one way that actually works ;-)
[10:22] <loicd> and which commit id you used ( so that I do not wonder about possible regressions because of commits that happen *after* you installed )
[10:23] <loicd> I'll try this right now. It so happens that I have three ubuntu 12.04 machines handy :-)
[10:24] * ismell (~ismell@host-24-56-171-198.beyondbb.com) has joined #ceph
[10:25] <leseb> loicd: gimme a sec
[10:25] <loicd> sure
[10:26] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[10:26] * Vincent_Valentine (~Vincent_V@49.206.158.155) has joined #ceph
[10:27] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[10:29] * X3NQ (~X3NQ@195.191.107.205) has joined #ceph
[10:29] * LeaChim (~LeaChim@97e00998.skybroadband.com) Quit (Ping timeout: 480 seconds)
[10:33] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[10:39] * LeaChim (~LeaChim@97e00998.skybroadband.com) has joined #ceph
[10:39] * Vincent_Valentine (~Vincent_V@49.206.158.155) Quit (Ping timeout: 480 seconds)
[10:46] * hugokuo (~hugokuo@75-101-56-159.dsl.static.sonic.net) Quit (Quit: ??)
[10:47] <leseb> loicd: site.pp http://pastebin.com/edpDq3JG
[10:47] <leseb> loicd: I hope this'll help you
[10:52] * allsystemsarego (~allsystem@188.25.130.190) has joined #ceph
[10:55] <loicd> I assume "os_role_quantum" is a manifest in roles/*.pp that "does the right thing" and then uses the quantum module from https://github.com/stackforge/puppet-openstack right ?
[10:55] <loicd> leseb: ^
[10:56] <leseb> loicd: https://github.com/stackforge/puppet-quantum
[10:58] <loicd> yes. Could you also share the corresponding roles/xxx.pp so that I can figure out how you've used the puppet-quantum module ? These are the tiny bits that makes a world of difference :-)
[10:59] <loicd> I guess I have the same question regarding os_role_keystone & os_role_rabbitmq. Unless you're using role files from the stackforge unmodified ? That would be great :-)
[11:01] * dosaboy_ (~dosaboy@host81-156-124-131.range81-156.btcentralplus.com) has joined #ceph
[11:01] <leseb> loicd: I can't really guarantee that, we (at eNovance) have more or less customised the puppet modules too...
[11:06] * dosaboy (~dosaboy@host86-152-196-168.range86-152.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[11:08] <loicd> leseb: understood :-) Would you mind sharing the role files then ?
[11:11] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[11:15] * yy-nm (~chatzilla@115.196.74.105) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 22.0/20130618035212])
[11:15] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[11:22] * indego (~indego@91.232.88.10) has joined #ceph
[11:24] * KindTwo (KindOne@h196.57.186.173.dynamic.ip.windstream.net) has joined #ceph
[11:26] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[11:26] * KindTwo is now known as KindOne
[11:30] <loicd> leseb: are you around ?
[11:47] <shimo> radosgw is giving me "Couldn't init storage provider (RADOS)"
[11:48] <shimo> anyone have any idea how to debug this?
[11:52] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[11:54] <shimo> hmm, okay, with debug ms = 1 i'm getting "pool_op_reply(tid 1 (1) Operation not permitted v5)"
[11:58] * CliMz (~CliMz@194.88.193.33) has joined #ceph
[12:02] * odyssey4me (~odyssey4m@41-133-58-101.dsl.mweb.co.za) has joined #ceph
[12:06] <shimo> trying with auth disabled, in case anyone is interested.
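The "Operation not permitted" on the pool op usually points at the key radosgw runs with lacking the caps to create and use its pools, rather than a RADOS-side failure. A sketch of granting them, following the common docs pattern of the time (client name, cap strings and keyring path are assumptions to adapt):

    # create (or fetch) a key for the gateway with mon and osd caps
    ceph auth get-or-create client.radosgw.gateway \
        mon 'allow rw' osd 'allow rwx' \
        -o /etc/ceph/keyring.radosgw.gateway

    # the matching [client.radosgw.gateway] section in ceph.conf should
    # point at that keyring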
[12:24] * Vincent_Valentine (~Vincent_V@49.206.158.155) has joined #ceph
[12:25] <loicd> leseb: I moved to irc.freenode.net#puppet-openstack to report my progress with the openstack part of the installation. I'll try it using the README.md because I can't figure out how I should use the modules just by reading http://pastebin.com/edpDq3JG ( I'm missing the roles files that contains the actual call to the modules ;-)
[12:27] * Guest2137 (~coyo@thinks.outside.theb0x.org) Quit (Ping timeout: 480 seconds)
[12:31] * toMeloos (~tom@53545693.cm-6-5b.dynamic.ziggo.nl) has joined #ceph
[12:53] * huangjun (~kvirc@111.174.91.224) Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[13:24] * odyssey4me (~odyssey4m@41-133-58-101.dsl.mweb.co.za) Quit (Ping timeout: 480 seconds)
[13:27] * odyssey4me (~odyssey4m@41-133-58-101.dsl.mweb.co.za) has joined #ceph
[13:28] * Vincent_Valentine (~Vincent_V@49.206.158.155) Quit (Ping timeout: 480 seconds)
[13:28] * xdeller (~xdeller@91.218.144.129) has joined #ceph
[13:41] * iggy_ (~iggy@theiggy.com) Quit (Read error: Connection reset by peer)
[13:41] * iggy_ (~iggy@theiggy.com) has joined #ceph
[13:41] * iggy (~iggy@theiggy.com) Quit (Read error: Connection reset by peer)
[14:00] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:02] * madkiss (~madkiss@2001:6f8:12c3:f00f:15b6:17ff:bb27:feb6) Quit (Ping timeout: 480 seconds)
[14:14] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Read error: Operation timed out)
[14:23] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:24] <loicd> leseb: I should have asked "have you tried that yourself recently *using publicly available resources*". That's the trick I think: the non-published code / manifests / instructions make the difference between "it works like a charm" and "I run in circles for hours".
[14:25] <leseb> loicd: arf I'm sorry about this...
[14:25] <loicd> leseb: what?
[14:25] <leseb> loicd: well, I should have mentioned that there were some non-public parts in my deployment
[14:26] <loicd> can you share the role files ?
[14:26] <loicd> I'm trying to get openstack::controller working, I could use them ;-)
[14:28] <leseb> loicd: not sure if I can and to be honest I don't know where to look for now... sorry
[14:28] * The_Bishop_ (~bishop@f052098027.adsl.alicedsl.de) has joined #ceph
[14:29] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[14:31] * The_Bishop (~bishop@e177091115.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[14:33] <loicd> I think that's the main reason why people fail and get a bad impression of puppet modules. There is no walkthru / howto / example ever published that you can follow and have it "just work". I'm referring to the "paper cuts" of the last CDS.
[14:34] * Vincent_Valentine (~Vincent_V@49.206.158.155) has joined #ceph
[14:38] <leseb> loicd: this one also? https://wiki.debian.org/OpenStackPuppetHowto even if it's a bit old, I assume it used to work right?
[14:38] <leseb> loicd: but I mainly agree
[14:39] * The_Bishop__ (~bishop@e179009189.adsl.alicedsl.de) has joined #ceph
[14:40] <loicd> leseb: this one works indeed, if you're looking at deploying OpenStack Essex on Debian GNU/Linux wheezy. And it took many days to get it right. But I would be *extremely* surprised if the Ceph puppet modules worked with such a deployment ;-)
[14:41] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:41] <leseb> loicd: you are going to be extremely surprised then :-), just try them out
[14:42] * The_Bishop_ (~bishop@f052098027.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[14:43] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[14:43] * ChanServ sets mode +v andreask
[14:43] <loicd> it so happens that I have a cluster running with exactly these, let me give it a shot
[14:43] <leseb> :)
[14:45] * sleinen (~Adium@2001:620:0:26:b5b0:57b9:6db2:a29e) Quit (Quit: Leaving.)
[14:46] <loicd> git clone https://github.com/enovance/puppet-ceph ceph
[14:46] <loicd> following instructions from
[14:46] <loicd> https://github.com/enovance/puppet-ceph#minimum-puppet-manifest-for-all-members-of-the-ceph-cluster
[14:46] <loicd> keeping an eye on http://pastebin.com/edpDq3JG
[14:48] * markl (~mark@tpsit.com) has joined #ceph
[14:55] * skm (~smiley@205.153.36.170) Quit (Remote host closed the connection)
[14:56] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[14:59] <loicd> leseb: I have created this manifest snippet, does it look good to you : http://pastebin.com/79NTQaqX
[14:59] <loicd> just trying to get the first mon running
[15:00] * erice (~erice@50.240.86.181) has joined #ceph
[15:00] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[15:02] <loicd> The suggested "ceph-authtool --create /path/to/keyring --gen-key -n mon." in https://github.com/enovance/puppet-ceph#minimum-puppet-manifest-for-all-members-of-the-ceph-cluster is probably pre-cuttlefish syntax, but I'm not sure. ceph-authtool /tmp/keyring --create-keyring --gen-key -n mon.
[15:02] <loicd> worked
[15:08] * yanzheng still uses mkcephfs ;)
[15:08] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[15:08] <loicd> yanzheng: you're hard core ;-)
[15:10] <loicd> after "git clone https://github.com/enovance/puppet-ceph ceph" and running "puppet agent -vt" without changing anything in the manifests I get a number ( ~10)
[15:10] <loicd> 2013-08-09 15:08:01.897390 7fe16aac6760 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
[15:10] <loicd> 2013-08-09 15:08:01.897405 7fe16aac6760 -1 ceph_tool_common_init failed.
[15:11] <loicd> as if the ceph puppet module was trying something when loaded, even if not invoked. I'll assume it's not an issue.
[15:14] <loicd> a puppet --noop says it will install
[15:14] <loicd> +deb http://ceph.com/debian-bobtail/ n/a main
[15:14] <loicd> +deb-src http://ceph.com/debian-bobtail/ n/a main
[15:14] * loicd looking for a way to specify the release as it does not seem to be able to figure it out
[15:16] <leseb> loicd: ceph.pp $release = 'cuttlefish'
[15:16] <loicd> I meant the n/a
[15:17] <loicd> $::lsbdistcodename is not set properly
[15:18] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[15:18] * ChanServ sets mode +v andreask
[15:19] <loicd> fixed
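For reference, facter's lsbdistcodename fact is derived from the lsb_release tool, so a missing codename (the "n/a" in the sources lines above) is typically fixed by installing it; a sketch for Debian/Ubuntu:

    # the fact is empty when lsb_release is not installed
    apt-get install lsb-release

    # verify the agent will now see the codename (e.g. "wheezy" or "precise")
    facter lsbdistcodename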
[15:21] * KrisK (~krzysztof@213.17.226.11) Quit (Quit: KrisK)
[15:23] <loicd> puppet agent -vt worked and the mon is running. Now to the first OSD
[15:24] * loicd following https://github.com/enovance/puppet-ceph#puppet-manifest-for-an-osd
[15:24] <loicd> hum
[15:25] * loicd runs puppet agent three times as recommended
[15:30] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[15:36] <loicd> my version of puppet stdlib misses the ensure_packages function required by the ceph module, upgrading
[15:38] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[15:39] * loicd checking https://forge.puppetlabs.com/puppetlabs/stdlib to figure out which tag to upgrade to . Pretty sure a git pull won't do anything remotely useable ;-)
[15:40] <loicd> Compatibility
[15:40] <loicd> stdlib v2.1.x, v2.2.x
[15:40] <loicd> v2.1.x and v2.2.x of this module are designed to work with Puppet versions 2.6.x and 2.7.x.
[15:40] * loicd trying v2.2.x
[15:41] <loicd> leseb: I think you may appreciate that someone with less knowledge of puppet would already be 100% lost ;-0
[15:44] <loicd> hum my bad
[15:44] * CliMz (~CliMz@194.88.193.33) Quit (Remote host closed the connection)
[15:45] <loicd> the compatibility snippet was obsolete
[15:45] <loicd> https://forge.puppetlabs.com/puppetlabs/stdlib says 3.x is compatible with puppet 2.7
[15:45] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Quit: shimo)
[15:45] <leseb> loicd: oh I see, but in general it's not that easy to get into puppet
[15:46] <loicd> git checkout 3.2.0
[15:46] <loicd> I'm going for this one
[15:46] <loicd> leseb: I agree it's really difficult.
[15:46] * madkiss (~madkiss@089144192103.atnat0001.highway.a1.net) has joined #ceph
[15:47] <leseb> loicd: it remains very obscure for me as well...
[15:47] <loicd> Duplicate declaration: Package[parted] is already \
[15:47] <loicd> declared; cannot redeclare at /etc/puppet/modules/nova/manifests/utilities.pp
[15:52] <loicd> leseb: this is a real error that requires a patch (a simple patch: if (!defined(Package[parted])) ) but it shows that the ceph module is incompatible with the openstack module for wheezy. I'm not surprised. I bet there are a number of similar blockers along the way. Especially because /etc/ceph/ceph.conf already contains an osd stanza although I've not yet defined an osd.
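A sketch of the guard loicd mentions, wrapped around whichever module's parted declaration loses the race; defined() checks are a common, if parse-order-dependent, Puppet idiom for dodging duplicate resource declarations across modules:

    if ! defined(Package['parted']) {
      package { 'parted':
        ensure => installed,
      }
    }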
[15:52] <loicd> it was worth a try though. What made you think it would work out of the box ?
[15:53] <leseb> loicd: hum, because it's usually done over a fresh OS
[15:53] * diegows (~diegows@190.190.11.42) has joined #ceph
[15:53] <leseb> loicd: but I might write up an article then, in order to make it 'out of the box' :D
[15:54] <mxmln> has anyone used fuel for openstack?
[15:55] * iggy_ is now known as iggy
[15:55] * sleinen (~Adium@2001:620:0:2d:1daf:77b:824c:57d2) has joined #ceph
[15:56] * iggy_ (~iggy@theiggy.com) has joined #ceph
[15:57] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[15:59] <jnq> from what i've read you keep a static config file with all the OSDs, monitor hosts etc. Is that managed purely as a text file or is there a tool for managing the config?
[16:03] * sleinen (~Adium@2001:620:0:2d:1daf:77b:824c:57d2) Quit (Ping timeout: 480 seconds)
[16:07] * allsystemsarego (~allsystem@188.25.130.190) Quit (Quit: Leaving)
[16:17] * madkiss (~madkiss@089144192103.atnat0001.highway.a1.net) Quit (Ping timeout: 480 seconds)
[16:18] * madkiss (~madkiss@089144192103.atnat0001.highway.a1.net) has joined #ceph
[16:19] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[16:23] * Vincent_Valentine (~Vincent_V@49.206.158.155) Quit (Ping timeout: 480 seconds)
[16:24] * madkiss1 (~madkiss@213162068088.public.t-mobile.at) has joined #ceph
[16:24] * madkiss (~madkiss@089144192103.atnat0001.highway.a1.net) Quit (Read error: Connection reset by peer)
[16:27] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[16:34] * Coyo (~coyo@thinks.outside.theb0x.org) has joined #ceph
[16:35] * Coyo is now known as Guest2781
[16:43] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) Quit (Remote host closed the connection)
[16:46] <indego> Hello, I am new to ceph and about to embark on my first install on some diverse commodity hardware that we have spare. Are there any good hints that are not obvious in the documentation? System will be Debian Wheezy and I am looking at backporting the 3.10 kernel from Debian SID.
[16:49] <indego> My hardware is a bit mish-mash, nothing standard between the systems. I have a mix of 4TB/2TB/1TB disks no SSDs (yet) and can use some 'other' disks for boot/journal.
[17:01] * loicd (~loicd@bouncer.dachary.org) Quit (Quit: quit)
[17:01] * loicd (~loicd@bouncer.dachary.org) has joined #ceph
[17:03] * madkiss1 (~madkiss@213162068088.public.t-mobile.at) Quit (Quit: Leaving.)
[17:07] <mikedawson> indego: there are countless good hints that are not obvious in the documentation ;-) Plan to spend a significant amount of time learning, and treat your couple implementations as proof of concept
[17:07] <mikedawson> indego: s/couple/first couple/
[17:08] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[17:08] <indego> mikedawson, yes, just need to convince the boss of the latter.
[17:09] <mikedawson> indego: deploy ceph on what you have, learn, repeat.
[17:09] <indego> I have spent time reading about the journal and also looking at bcache. Do these two not kind of do the same thing? Not directly, but if I use a SSD for journal and bcache for the backing store, I am doing 2 SSD writes of the same data just to write it again to disk.
[17:10] <indego> I have wondered if you were to use bcache if the journal is then redundant...
[17:10] <loicd> indego: I found ceph-deploy on Ubuntu precise, following the instructions at http://ceph.com/docs/master/rados/deployment/preflight-checklist/, to be straightforward for getting your first cluster running within one or two hours at most.
[17:11] <indego> loicd, yes, read that. It all *looks* simple.
[17:12] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:12] <loicd> I followed the instructions, in order, and got it running within two hours last week, using three newly created machines. It's not just a theory ;-)
[17:12] <indego> I also have some fiber channel (2Gb/s stuff) that I wondered if I could throw in the mix. Has anyone done OSD replication of FC?
[17:17] * amatter (~oftc-webi@209.63.136.134) has joined #ceph
[17:26] * scuttlemonkey (~scuttlemo@mbf2036d0.tmodns.net) has joined #ceph
[17:26] * ChanServ sets mode +o scuttlemonkey
[17:29] <darkfaded> indego: i don't understand what you mean by OSD replication of FC
[17:30] <darkfaded> for example linux dropped IP over FC support years ago since some person thought FC is a block layer protocol
[17:30] <darkfaded> so that would be out, but then what do you mean
[17:31] * ishkabob (~c7a82cc0@webuser.thegrebs.com) has joined #ceph
[17:31] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[17:32] <ishkabob> hey guys, for some reason, when I'm zapping disks, it doesn't seem to be erasing the partition table
[17:32] <ishkabob> i was wondering how i might debug that
[17:32] <ishkabob> (i'm using ceph-deploy btw)
[17:35] <indego> dalegaar1, sorry s/of/over/
[17:37] * sprachgenerator (~sprachgen@130.202.135.201) has joined #ceph
[17:38] <indego> use the FC 'network' for replication. If there is no IP over FC then I guess not. I do not have much FC experience, I have a box of cards and cables from an old NetApp system that was trashed.
[17:38] <joelio> ishkabob: how are you invoking? It works for me via ceph-deploy osd create --zap-disk
[17:39] <ishkabob> joelio: i'm using "ceph-deploy disk zap camelot:/dev/sdb"
[17:39] <ishkabob> joelio: OR "ceph-deploy disk zap camelot:sdb"
[17:39] <ishkabob> both of them tell me they are zapping the disk in the log, but that's it
[17:39] <joelio> yea, I had trouble doing it that way tbh, hence doing it as part of the create
[17:40] <ishkabob> will this command also run the create for you?
[17:40] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[17:40] <joelio> yea, it zaps disk, creates aligned partitions, puts on xfs and adds as an osd
[17:41] <joelio> I have a little bit of bash that I used to loop through the nodes
[17:41] <joelio> they all have the same number of osds, so it was trivial
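A sketch of the kind of loop joelio describes, assuming every host exposes the same set of data disks (host and device names are made up; --zap-disk wipes whatever partition table is on the drive first):

    for host in ceph1 ceph2 ceph3; do
      for disk in sdb sdc sdd; do
        ceph-deploy osd create --zap-disk ${host}:/dev/${disk}
      done
    done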
[17:41] <ishkabob> joelio: when i run that command, i subsequently run ceph -w, and see this line: osdmap e1: 0 osds: 0 up, 0 in
[17:42] <joelio> did you add a -v flag?
[17:42] <joelio> to the create, get verbosity
[17:42] <joelio> it should take about 30 seconds
[17:42] <ishkabob> cool, i'll do that now
[17:42] * devoid (~devoid@130.202.135.202) has joined #ceph
[17:45] <joelio> I also have another term open with a watch -n 5 ceph -s to keep a counter, ceph -w will do the same thing though
[17:46] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[17:49] * The_Cog (~steve@239.24.187.81.in-addr.arpa) has joined #ceph
[17:52] <ishkabob> joelio: so I just tried that and it gave me this: http://pastebin.com/yY4JThzX
[17:53] <ishkabob> joelio: I do see this in my osd log - "journal read_header error decoding journal header"
[17:54] * scuttlemonkey (~scuttlemo@mbf2036d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[17:54] <ishkabob> joelio: I also looked at the partition table after trying to create the osd, and i only see one partition (i believe there should be another partition for journal right?)
[17:55] <joelio> no osds registering at all?
[17:56] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[17:56] <ishkabob> joelio: nope, still nothing
[17:57] <joelio> is the osd process running on the host?
[17:57] <ishkabob> joelio: no, however the ceph-osd log continues to be written to
[17:57] <ishkabob> with different PIDs
[17:57] <ishkabob> i'll throw it in the pastebin
[17:58] <ishkabob> joelio: oh wait, no it doesn't continue to fill up, but i'll throw what i have in the pastebin anyway
[17:59] <joelio> 2 pids though sounds like cruft - there should just be one afaik
[17:59] * indego (~indego@91.232.88.10) Quit (Quit: Leaving)
[17:59] <joelio> unless the other is some initialisation
[18:01] * The_Bishop__ (~bishop@e179009189.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[18:01] <The_Cog> I'm thinking of trying ceph for backups, large logs, large CSV-data. Would probably want python access to RADOS API from linux and windows. Is there a python swift library available for linux and windows, if so where from?
[18:01] <ishkabob> joelio: and none of them are running when I check them
[18:02] <ishkabob> i believe its actually spawning about 8 pids
[18:02] <joelio> yea, just check via ps auxwww | grep ceph-osd
[18:02] <joelio> should just show parent
[18:05] <ishkabob> joelio: i used the git ceph-deploy and got some more information
[18:05] * gentleben (~sseveranc@c-98-207-40-73.hsd1.ca.comcast.net) Quit (Quit: gentleben)
[18:05] <ishkabob> so when I run (ceph-deploy -v osd create --zap-disk camelot:sdd)
[18:05] <ishkabob> i get this error
[18:05] <ishkabob> GPT data structures destroyed! You may now partition the disk using fdisk or other utilities. Information: Creating fresh partition table; will override earlier problems! Non-GPT disk; not saving changes. Use -g to override. Caution: invalid backup GPT header, but valid main header; regenerating backup header from main header. Warning! Main and backup partition tables differ! Use the 'c' and 'e' options on the recovery & transformation menu to exam
[18:06] <joelio> /dev/sdd right?
[18:06] <ishkabob> correct
[18:06] <ishkabob> i have 6 disks in here i can use, been trying different ones in case one is borked
[18:06] <joelio> well, I didn't do it that way, I used full dev path
[18:06] <ishkabob> lemme try that
[18:06] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[18:06] * gentleben (~sseveranc@c-98-207-40-73.hsd1.ca.comcast.net) has joined #ceph
[18:07] * tnt (~tnt@ip-188-118-44-117.reverse.destiny.be) Quit (Read error: Operation timed out)
[18:08] <ishkabob> joelio: are you defining a journal as well?
[18:08] <joelio> ishkabob: I also do it all on one line, like so.. https://gist.github.com/anonymous/9d2b91d059ed6fa0f93b
[18:08] <joelio> no ssd for journal
[18:08] <joelio> all in one
[18:08] <ishkabob> joelio: also, when I tried that, it went back to telling me everything went fine and does nothing
[18:08] <joelio> I have 6 hosts, 6 osds per host
[18:08] <ishkabob> well, i shouldn't say nothing, it IS creating a partition
[18:10] <ishkabob> joelio: how much space does it determine for the journal? i could try to partition my disks manually and see if that works at least
[18:12] * alfredodeza (~alfredode@2607:f298:a:607:3064:7c81:1997:800f) has joined #ceph
[18:12] <joelio> 2 1049kB 1074MB 1073MB ceph journal
[18:13] <joelio> (from parted)
[18:13] <joelio> I'm going to get some SSDs for them soon, but performance has been great so far, no complaints
[18:14] <ishkabob> joelio: do you know if ceph or ceph-deploy is expecting a GPT part table?
[18:14] <ishkabob> i think i need to wipe the GPT table
[18:14] <joelio> you can always try it
[18:14] * alram (~alram@38.122.20.226) has joined #ceph
[18:14] <joelio> I have GPT
[18:15] * joao (~JL@2607:f298:a:607:c479:555:dc4d:452d) has joined #ceph
[18:15] * ChanServ sets mode +o joao
[18:15] <joelio> that's really the purpose of zap disk I thought though?
[18:17] <ishkabob> yeah, it looks like it does write GPT data
[18:17] <ishkabob> i killed the GPT data with gdisk and tried creating the disk again, opened up fdisk and it's telling me about GPT data again
[18:17] <joelio> use parted
[18:18] <joelio> but then again, the zap disk *should* be doing all that lifting
[18:18] <ishkabob> joelio: what should i use parted to do?
[18:18] <joelio> fdisk isn't really gpt friendly
[18:18] <joelio> "WARNING: GPT (GUID Partition Table) detected on '/dev/sdd'! The util fdisk doesn't support GPT. Use GNU Parted."
[18:19] <joelio> etc.. :)
[18:19] <ishkabob> right of course :)
[18:21] * [fred] (fred@konfuzi.us) has joined #ceph
[18:22] <ishkabob> joelio: indeed it IS creating 2 partitions, fdisk just can't see the other one
[18:22] <ishkabob> joelio: do you see your disks mounted in /etc/mtab?
[18:22] <joelio> /dev/sdd1 on /var/lib/ceph/osd/ceph-2 type xfs (rw)
[18:24] <ishkabob> joelio: wait a minute, i just tried to do
[18:24] <ishkabob> service ceph start
[18:24] <ishkabob> and i got this
[18:25] <ishkabob> unable to authenticate as client.bootstrap-osd
[18:25] <ishkabob> i bet that's the problem :)
[18:25] <ishkabob> the disks are creating just fine, but my keys are screwed
[18:26] <joelio> yea, sounds plausible
[18:26] <joelio> everything's a freaking auth problem :)
[18:27] <ishkabob> i'll just create the cluster from scratch again and see what happens
[18:28] <joelio> aye
[18:29] * grepory1 (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[18:29] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Read error: Connection reset by peer)
[18:30] * Vincent_Valentine (~Vincent_V@49.206.158.155) has joined #ceph
[18:31] * tnt (~tnt@91.177.243.62) has joined #ceph
[18:34] * yehudasa_ (~yehudasa@2602:306:330b:1410:95ae:485a:7b55:dc8c) has joined #ceph
[18:39] <sagewk> yehudasa_: can we make the RGWHTTPClient methods pure abstract?
[18:39] <sagewk> and move the trivial implementations to the children that don't need them?
[18:39] <sagewk> er, pure virtual i guess
[18:40] <yehudasa_> sagewk: probably
[18:40] <yehudasa_> I'll do that now
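As a rough analogy only (the real RGWHTTPClient is C++ inside rgw; the class and method names below are made up for illustration), "pure virtual methods with the trivial implementations pushed down to the children that need them" looks roughly like this in Python:

    class HTTPClientBase(object):
        # "pure virtual": no default bodies in the base class any more
        def receive_header(self, data):
            raise NotImplementedError

        def receive_data(self, data):
            raise NotImplementedError

    class HeaderOnlyClient(HTTPClientBase):
        def receive_header(self, data):
            print('got %d header bytes' % len(data))

        def receive_data(self, data):
            # the trivial no-op implementation now lives only in the child
            pass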
[18:40] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:40] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[18:42] * sprachgenerator_ (~sprachgen@130.202.135.201) has joined #ceph
[18:43] * sprachgenerator (~sprachgen@130.202.135.201) Quit (Read error: Connection reset by peer)
[18:43] * sprachgenerator_ is now known as sprachgenerator
[18:46] * Vincent_Valentine (~Vincent_V@49.206.158.155) Quit (Ping timeout: 480 seconds)
[18:47] <xdeller> <sjust> - is there any estimation on snapshot action throttling in cuttlefish?
[18:47] <The_Cog> I'm thinking of trying ceph for backups, large logs, large CSV-data. Would probably want python access to RADOS API from linux and windows. Is there a python swift library available for linux and windows, if so where from?
[18:49] <alfredodeza> a Python swift library?
[18:49] <sjust> xdeller: you mean snapshot trimming?
[18:50] <xdeller> if i may call it so, it`s about #5844
[18:51] <sjust> xdeller: creating a snapshot causes a hang?
[18:51] <xdeller> 0.61.7 improved the situation quite a bit once the mons were upgraded
[18:51] <xdeller> both creation and deletion
[18:51] <sjust> can you describe the sequence of operations
[18:51] <sjust> ?
[18:51] <xdeller> it`s not a hang, just a temporary increase in read latency
[18:54] <xdeller> create a bunch of vms, say ten with non-cacheable read I/O per osd, then create a large image, commit a lot of data to it continuously and take some snapshots
[18:54] <sjust> xdeller: if it's just a brief increase in read latency, that might be map propagation
[18:54] <xdeller> it seems so
[18:54] <xdeller> just because of new osdmap
[18:55] <sjust> ah... I assumed the problem was the background snapshot removal on the osds causing IO
[18:55] <xdeller> it could take many seconds before the latest mon improvements
[18:55] <sjust> if it's the map propagation, throttling isn't relevant
[18:55] <sjust> the problem is that the clients at that point might have a newer OSDMap than the osds and have to wait for the osds to catch up
[18:56] <The_Cog> alfredodeza: Yes. The CEPH docs don't refer to any particular one. I found python-cloudfiles on github but also read that cloudfiles had a proprietary API so don't know if that's suitable.
[18:56] <xdeller> I/O seems not to be a problem, as seen in per-disk stats
[18:57] <sjust> xdeller: right, the only way to fix that would be to increase OSDMap propagation speed
[18:57] <alfredodeza> The_Cog: there are python bindings for Ceph but they are not packaged yet
[18:58] <xdeller> i had written the ticket above before upgrading to 0.61.7, so right now the delays seem to be _tolerable_
[18:58] <xdeller> though still quite high
[18:58] <sjust> xdeller: hmm, I'll refer you to joao
[18:59] <The_Cog> alfredodeza: I see python-ceph which I think uses librados and is therefore linux only. That's in Ubuntu anyway. I was thinking of python+windows+swift.
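For the python+windows+swift case: a minimal sketch using python-swiftclient, which is pure Python and should behave the same on Linux and Windows. The gateway auth URL, user and key below are placeholders and assume a Swift subuser and key have already been created with radosgw-admin:

    from swiftclient.client import Connection

    # placeholders: point these at your radosgw and its swift subuser/key
    conn = Connection(
        authurl='http://gateway.example.com/auth/1.0',
        user='backupuser:swift',
        key='SWIFT_SECRET_KEY',
    )

    conn.put_container('backups')
    with open('app-2013-08-09.csv', 'rb') as f:
        conn.put_object('backups', 'logs/app-2013-08-09.csv', contents=f)

    # list what was stored
    headers, objects = conn.get_container('backups')
    for obj in objects:
        print(obj['name'], obj['bytes'])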
[18:59] <xdeller> nice, thanks
[19:00] * X3NQ (~X3NQ@195.191.107.205) Quit (Remote host closed the connection)
[19:03] <joao> xdeller, can you get me the mon log for the leader with 'debug mon = 10', 'debug paxos = 10' and 'debug ms = 1'?
[19:03] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[19:03] <joao> would like to know where the monitor is spending its sweet time handling this
[19:04] <joao> also, how many monitors are you running?
[19:05] <xdeller> three both in testing and production
[19:05] * The_Cog (~steve@239.24.187.81.in-addr.arpa) has left #ceph
[19:05] <xdeller> if you don`t mind, i`ll grab those logs from the test cluster by tomorrow and send you a link to @inktank mail
[19:05] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[19:07] * goldfish (~goldfish@91.215.166.4) Quit (Ping timeout: 480 seconds)
[19:08] <joao> xdeller, just drop them on cephdrop@ceph.com and point me to them then
[19:09] <joao> that would be the easiest approach
[19:18] * sjust (~sam@38.122.20.226) Quit (Read error: Connection reset by peer)
[19:22] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:24] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) has joined #ceph
[19:25] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) has joined #ceph
[19:27] * davidzlap (~Adium@cpe-75-84-249-188.socal.res.rr.com) has joined #ceph
[19:28] * netsrob (~thorsten@212.224.79.27) has joined #ceph
[19:33] <netsrob> hello, i'm trying to solve a 403 error from swift/radosgw: http://pastebin.com/Nx5MedfD
[19:35] <netsrob> ok, was blind: RGW_SWIFT_Auth_Get::execute(): bad swift key
[19:35] * Vincent_Valentine (~Vincent_V@49.206.158.155) has joined #ceph
[19:38] * gentleben (~sseveranc@c-98-207-40-73.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[19:42] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[19:44] * Psi-Jack (~Psi-Jack@psi-jack.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:44] * eternaleye (~eternaley@2002:3284:29cb::1) Quit (Ping timeout: 480 seconds)
[19:50] * Psi-Jack (~Psi-Jack@psi-jack.user.oftc.net) has joined #ceph
[20:00] * Gamekiller77 (~oftc-webi@128-107-239-233.cisco.com) has joined #ceph
[20:00] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 22.0/20130618035212])
[20:00] <Gamekiller77> hello channel, i'm having a problem with cinder and ceph; when i run this command at the cli i get this output
[20:00] <Gamekiller77> rados lspools 2013-08-09 10:23:46.592122 7ff960653760 0 librados: client.admin authentication error (1) Operation not permitted couldn't connect to cluster! error -1
[20:00] <Gamekiller77> what am i missing
[20:00] <Gamekiller77> i have ceph.conf on the nova compute node
[20:00] <Gamekiller77> keyring in place
[20:01] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[20:05] <netsrob> Gamekiller77: is 'rados lspools' working on commandline?
[20:08] <Gamekiller77> that's from the command line of the kvm server
[20:08] <Gamekiller77> let me try it from the primary monitor
[20:09] <netsrob> Gamekiller77: should work on any node
[20:09] <Gamekiller77> yup worked fine from the primary monitor
[20:09] <Gamekiller77> btw i'm using RHEL
[20:11] <Gamekiller77> the keyring file has the auth token line for client.admin
[20:14] <zjohnson> curious if anyone is successfully using desktop style drives for data center use? A few years ago I got a bunch of the 3TB Hitachi desktop disks and have had very few failures.
[20:14] * TiCPU (~jeromepou@190-130.cgocable.ca) has joined #ceph
[20:14] <zjohnson> I talked to some cluster people before purchase and for that model people seemed to be under the impression they were exactly the same as the enterprise Ultrastar model
[20:14] <netsrob> Gamekiller77: i'd check auth config on your kvm node
[20:14] * absynth (~absynth@irc.absynth.de) Quit (Remote host closed the connection)
[20:15] <zjohnson> but now I am trying to figure out if that is still the case for any brand of desktop drive
[20:15] * absynth (~absynth@irc.absynth.de) has joined #ceph
[20:16] <netsrob> zjohnson: depends on the manufacturer, we had big issues with seagate desktop drives in the past
[20:16] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[20:16] * ChanServ sets mode +v andreask
[20:17] <zjohnson> Yeah I have some people here who found transfer speeds on desktop drives being quite a bit slower when packed into a storage array due to vibration I presume
[20:17] * joao (~JL@2607:f298:a:607:c479:555:dc4d:452d) Quit (Ping timeout: 480 seconds)
[20:17] <zjohnson> but we did not have that problem with Hitachi HDS72303
[20:18] * zjohnson needs to find someone who is having success with a current desktop drive :)
[20:18] <netsrob> zjohnson: wd and hitachi had only few failures here, too
[20:18] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Quit: Ex-Chat)
[20:19] <zjohnson> netsrob: are you rob-slc as well?
[20:20] <netsrob> zjohnson: guess not ;)
[20:20] <zjohnson> k
[20:20] <zjohnson> netsrob: do you buy enterprise drives at all?
[20:21] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[20:21] <zjohnson> 40% of the price is a pretty big motivator
[20:21] <netsrob> zjohnson: usually only SAS-drives
[20:22] <zjohnson> so sata = desktop, sas = enterprise for you
[20:22] <zjohnson> ?
[20:23] <netsrob> yes, usually
[20:24] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[20:24] <zjohnson> ok, thanks :)
[20:25] <zjohnson> I may just take a leap of faith and do hitachi 4tb deskstars
[20:25] <ishkabob> zjohnson: we're using 2TB 7200 RPM sata drives and a 40gigabit bonded network, pretty fast
[20:26] * joao (~JL@2607:f298:a:607:9eeb:e8ff:fe0f:c9a6) has joined #ceph
[20:26] * ChanServ sets mode +o joao
[20:26] <zjohnson> is your 40gig network 4x 10000BaseT?
[20:26] <zjohnson> or something else?
[20:26] <ishkabob> yeah it is
[20:26] <zjohnson> ok
[20:26] <ishkabob> its 4 bonded 10gig ports
[20:27] <zjohnson> what switches are you using for that?
[20:27] <zjohnson> I have some AT gigE switches which I used for bonding
[20:27] <zjohnson> don't have any 10gigE here yet
[20:27] <ishkabob> they're arist 7xxx
[20:27] <ishkabob> ack
[20:27] <ishkabob> Arista
[20:27] <ishkabob> can't remember the exact model
[20:32] * zjohnson looks
[20:33] <Gamekiller77> netsrob: what's the best way to check the auth config on the KVM node? i followed the rbd doc on importing the key to libvirt, but shouldn't rados lspools pull from the /etc/ceph/ceph.conf and keyring files ?
[20:34] <zjohnson> .
[20:34] * zjohnson is trying to determine if he should roll out ganeti with RBD or just settle for DRBD
[20:37] * Vincent_Valentine (~Vincent_V@49.206.158.155) Quit (Ping timeout: 480 seconds)
[20:46] * alphe (~alphe@0001ac6f.user.oftc.net) has joined #ceph
[20:47] <alphe> hello all
[20:47] <zjohnson> netsrob: do you ever consider buying external hard disks and pulling them out, or is that just not worth the bother?
[20:47] <alphe> I have a question can the machine in my ceph cluster hosting the mds be the rados gateway too ?
[20:48] <ismell> so I have configured ceph's s3 interface to allow CORS. It responds to the OPTIONS request correctly with the Access-Control-Allow-Origin header, but when I do a POST the Access-Control-Allow-Origin header doesn't come back and the browser blocks it
[20:48] <ismell> is there another step of configuration I need to add?
[20:58] <alphe> if i use a rados gateway for amazon s3 do i need a rados block device ?
[21:00] <alphe> "Ceph Object Storage runs on Apache and FastCGI in conjunction with the Ceph Storage Cluster. Install Apache and FastCGI on the server node." what does server node means there ?
[21:00] <alphe> does it refer to one component of the ceph cluster or is it a separate machine ?
[21:01] <alphe> usually in the ceph docs server nodes means OSDs, MDS, MONs,
[21:01] <alphe> can i install the object storage on a machine outside the ceph cluster ?
[21:03] * erice (~erice@50.240.86.181) Quit (Ping timeout: 480 seconds)
[21:05] * xdeller (~xdeller@91.218.144.129) Quit (Quit: Leaving)
[21:06] * mikedawson_ (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[21:09] * alphe (~alphe@0001ac6f.user.oftc.net) Quit (Quit: Leaving)
[21:11] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[21:11] * mikedawson_ is now known as mikedawson
[21:19] * sprachgenerator (~sprachgen@130.202.135.201) Quit (Quit: sprachgenerator)
[21:21] * tnt (~tnt@91.177.243.62) Quit (Ping timeout: 480 seconds)
[21:24] * doubleg (~doubleg@69.167.130.11) has joined #ceph
[21:29] <sagewk> yehudasa_: wip-5921 looks good
[21:31] <yehudasa_> sagewk: cool
[21:31] <yehudasa_> also need to review 5882
[21:32] <sagewk> k
[21:36] <sagewk> 5882 looks good too.
[21:59] <Tamil> alphe: you don't need rbd if you are using rgw
[21:59] <Tamil> alphe: server is where you have your cluster configured and running
[22:00] * jackhill (jackhill@pilot.trilug.org) has joined #ceph
[22:02] * odyssey4me (~odyssey4m@41-133-58-101.dsl.mweb.co.za) Quit (Read error: Connection reset by peer)
[22:02] * odyssey4me (~odyssey4m@41-133-58-101.dsl.mweb.co.za) has joined #ceph
[22:13] * sprachgenerator (~sprachgen@c-50-141-192-36.hsd1.il.comcast.net) has joined #ceph
[22:16] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 22.0/20130618035212])
[22:16] <netsrob> Gamekiller77: usually the ceph.conf should work, that's why i'd test without kvm first :)
[22:18] <netsrob> zjohnson: i'd recommend it only for 2.5 inch drives, somehow some usb-drives are cheaper than internal ones
[22:18] <netsrob> 3.5 inch would be too messy in large volumes of drives
[22:18] <netsrob> also you never know what kind of drive you get
[22:19] <netsrob> so if you want surprises ;)
[22:21] <netsrob> Gamekiller77: btw i had some issues with libvirt not correctly handling keyfiles, maybe its the same issue with your setup, too
[22:22] <netsrob> maybe try using it without auth first and then migrate to keys
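Following netsrob's earlier point about testing outside of kvm/libvirt first: a small python sketch against librados that exercises cephx auth directly, assuming python-ceph is installed and using the usual default paths and client name (adjust to the actual cluster):

    import rados

    cluster = rados.Rados(
        conffile='/etc/ceph/ceph.conf',
        rados_id='admin',   # i.e. client.admin
        conf={'keyring': '/etc/ceph/ceph.client.admin.keyring'},
    )
    cluster.connect()       # fails here if cephx auth is misconfigured
    print('auth ok, pools:', cluster.list_pools())
    cluster.shutdown()

If this reports the same 'Operation not permitted', the keyring or the caps on client.admin are the problem rather than libvirt or cinder.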
[22:30] <yehudasa_> ismell: what version are you running?
[22:30] * alram (~alram@38.122.20.226) Quit (Read error: Connection reset by peer)
[22:46] * amatter (~oftc-webi@209.63.136.134) Quit (Quit: Page closed)
[22:52] * alphe (~alphe@0001ac6f.user.oftc.net) has joined #ceph
[22:52] <alphe> hello :)
[22:52] <alphe> after reading the documentation in object storage quick start I saw a missing part
[22:53] <Tamil> alphe: ?
[22:53] <alphe> so we have in the doc an installed apache fast-cgi radosgw
[22:53] <alphe> we created the keys
[22:53] <loicd> is it just me or is github.com not responding ?
[22:53] <alphe> we create the user and the subuser, and we configure ceph to handle the radosgw
[22:54] <Tamil> loicd: github.com seems accessible
[22:54] <alphe> but then what about the pool and the bucket ?
[22:54] <loicd> Tamil: thanks
[22:54] <Tamil> loicd: np
[22:54] <loicd> works now indeed
[22:54] <Tamil> loicd: kool
[22:54] <alphe> hello tamil and loicd :)
[22:54] <Tamil> alphe: hello
[22:55] <loicd> alphe: \o
[22:55] <Tamil> alphe: what are you trying to do?
[22:55] <alphe> i tried several things to send data to my ceph cluster (fantastically installed thanks to loicd's advice :) )
[22:55] <loicd> :-)
[22:55] <alphe> i went the "man in the middle" way or "hey it is a proxy ?!"
[22:56] <alphe> i made a virtual machine with a ceph-fuse mounting the ceph cluster
[22:56] <alphe> then to interface with windows i tried several things: samba 4, samba 3.6.9, proftpd, webdav
[22:57] <alphe> all of them have really poor bandwidth use
[22:57] <alphe> when transferring tons of really small files from windows clients
[22:58] <netsrob> alphe: maybe iscsi via rbd?
[22:58] <alphe> to give you an idea of the task: it was simply take c:\windows and throw it at the ceph cluster and see how the network cries !
[22:59] <alphe> netsrob I have two important matters: first, my clients are on windows and that can't change, we use hardware tools that run windows only ...
[22:59] <alphe> next, for the software tool to use the hardware tools on windows i need a mapped-drive style of access
[23:01] <alphe> the best bandwidth usage i got with tons of different-sized files was filezilla (windows, 10 connections) / proftpd (linux) / ceph-fuse
[23:02] <alphe> iscsi I don't know how it works, isn't it supposed to require specific capable hardware ?
[23:02] <netsrob> alphe: iscsi is block-storage over network
[23:02] <netsrob> like a physical hdd
[23:03] <netsrob> but multiple clients are a bit problematic
[23:08] <alphe> yep
[23:09] <alphe> so i went the s3 way: I installed apache, fastcgi, radosgw etc. as described on the page
[23:09] <alphe> but then from my client on windows I get an "unable to connect" message
[23:10] <cjh_> ceph: has anyone attempted to boot off of a ceph rbd with a physical server, not a vm? Wondering what might be involved with that
[23:10] <alphe> obviously it is because I have no pool and no bucket
[23:10] <Gamekiller77> netsrob: do i just change this line in the conf: auth_supported = cephx
[23:11] <alphe> cjh_ booting rbd needs the QEMU interface, no ?
[23:11] <cjh_> alphe: without qemu
[23:11] <cjh_> what i'm wondering is how much work would be involved in making an initrd with a ceph rbd driver
[23:11] <alphe> and how do you get your bios to know where to locate the netdrive ?
[23:13] <alphe> cjh_ you should be able to bootp ...
[23:13] <alphe> but not directly i think ...
[23:13] <cjh_> so pxe booting?
[23:14] <Tamil> alphe: you may have to use s3 or swift to create buckets - http://ceph.com/docs/master/radosgw/s3/ or http://ceph.com/docs/master/radosgw/swift/
[23:15] <cjh_> alphe: yeah i see what you mean. no easy way to do that
[23:16] <dmick> cjh_: I think there are some BIOSes that know how to boot from iSCSI, but I have no idea how complex that is (mostly add-in card BIOSes in my experience, but it's been several years since I looked)
[23:17] <alphe> pxe boot bioses can be used for initial install from a server or to boot a diskless terminal
[23:18] <alphe> but you need in both case a server to provide the data.
[23:18] <cjh_> dmick: yeah i'm guessing this isn't a trivial thing to get working
[23:18] * mschiff_ (~mschiff@port-34442.pppoe.wtnet.de) Quit (Remote host closed the connection)
[23:19] <dmick> I'd think your easiest chance is with iscsi export of the rbd image, because iscsi is a lot more widely supported, and I suspect more widely for BIOS boot
[23:19] <cjh_> that's true
[23:19] <dmick> I mean PXE is everywhere, and maybe you could get rbd into the boot driver path somehow. I dunno. It might be impossible, it might be totally doable.
[23:20] <ismell> yehudasa_: is there an easy way to tell? I don't actually manage CEPH
[23:20] <cjh_> yeah that's tough to say without really digging into it
[23:20] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[23:23] <alphe> tamil the pages you provided, I read them and understood nothing ...
[23:23] <yehudasa_> ismell: do you have any logs?
[23:23] <alphe> am I supposed to write a set of webpages with instructions in them ?
[23:23] <ismell> only what I can see in the browser. I can ask our CEPH guy to pull some though
[23:25] <yehudasa_> I'll need to get the gateway logs for this request, what version they're running
[23:26] <ismell> I'll ask the sys admin
[23:27] <ismell> we'll see how responsive he is
[23:27] * TiCPU (~jeromepou@190-130.cgocable.ca) Quit (Ping timeout: 480 seconds)
[23:29] <ismell> https://gist.github.com/ismell/cb7eade8f8ad06800f1e
[23:29] <ismell> this is what I see from the browser,
[23:29] <ismell> but prolly not enough to help :(
[23:29] <alphe> ok so how do i interact with my s3-capable radosgw?
[23:30] <alphe> i used radosgw to create a user
[23:30] <alphe> i used radosgw-admin to create a user
[23:31] <netsrob> Gamekiller77: should work, otherwise disable globally
[23:31] * BillK (~BillK-OFT@124-148-246-233.dyn.iinet.net.au) Quit (Read error: Connection reset by peer)
[23:32] <yehudasa_> ismell: so the actual problem is that the origin is not reflected correctly
[23:32] <yehudasa_> my guess is that there's some issue with the port number
[23:32] <yehudasa_> I'll try looking at it later
[23:32] <yehudasa_> alphe: you need to have some s3 tool
[23:33] <Tamil> alphe: you should be able to use the s3 apis to create buckets. http://ceph.com/docs/master/radosgw/s3/perl/
[23:33] <Tamil> alphe: you can try from the link. for example, if you are using perl, you may have to install the S3 module from cpan
[23:34] <Tamil> alphe: what do you mean by write a set of webpage with instructions in it? I dont get it
[23:37] <alphe> tamil I saw the html-like commands but i don't know how to input them
[23:37] <alphe> any ready to use s3 tools ?
[23:38] <Tamil> alphe: boto
[23:39] <sagewk> joao: looks good!
[23:39] <joao> great
[23:39] * BillK (~BillK-OFT@124-169-72-15.dyn.iinet.net.au) has joined #ceph
[23:42] <alphe> tamil ok thank I will look into that
[23:42] <Tamil> alphe: sure
[23:42] <alphe> but the doc is funny you are escorted all along the process and then left alone with the raw api : )
[23:43] <alphe> tamil python-boto ?
[23:44] <Tamil> alphe: because you are free to choose the s3 tool of your choice
[23:44] <Tamil> alphe: yes
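A minimal boto sketch of that: creating a bucket and listing buckets against radosgw's S3 API. The hostname and the access/secret keys are placeholders for the ones radosgw-admin printed when the user was created; buckets are an rgw-level concept created through the S3 (or Swift) API, layered on top of the rgw pools rather than being rados pools themselves.

    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
        host='gateway.example.com',          # the radosgw host
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

    bucket = conn.create_bucket('my-first-bucket')
    for b in conn.get_all_buckets():
        print(b.name, b.creation_date)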
[23:44] <alphe> oh, another ask ... i know it is tons of asks coming from me
[23:44] <alphe> tamil yes "We are open to fail miserably until the end of times"
[23:45] <alphe> ah ah ah ah
[23:45] * alram (~alram@38.122.20.226) has joined #ceph
[23:45] <alphe> ok so the other question is: I created a pool with rados mkpool
[23:46] <alphe> but i don't see how the data sent by the user i created will end up in that pool
[23:46] <alphe> i don't know how to see the size of that pool either ...
[23:47] <dmick> (02:42:43 PM) alphe: but the doc is funny you are escorted all along the process and then left alone with the raw api : )
[23:47] <alphe> boto's doc at first glance is all cloud this cloud that
[23:48] <dmick> alphe: the radosgw provides an s3-compatible interface. Ceph does not provide or produce s3 clients.
[23:48] <alphe> dmick I know that much ...
[23:48] <dmick> then the doc shouldn't be funny
[23:49] <ismell> yehudasa_: I did the POST with curl so I could actually see the response headers: https://gist.github.com/ismell/cb7eade8f8ad06800f1e#file-using-curl
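The same kind of check sketched with python requests, for comparing the preflight OPTIONS response with the actual POST response; the endpoint URL and origin below are placeholders:

    import requests

    url = 'http://gateway.example.com/some-bucket/some-object'   # placeholder
    origin = 'http://app.example.com:8080'                       # placeholder

    preflight = requests.options(url, headers={
        'Origin': origin,
        'Access-Control-Request-Method': 'POST',
    })
    post = requests.post(url, headers={'Origin': origin}, data=b'payload')

    for name, resp in (('OPTIONS', preflight), ('POST', post)):
        print(name, resp.status_code,
              resp.headers.get('Access-Control-Allow-Origin'))

If the header shows up on the OPTIONS response but not on the POST, that matches the behaviour being reported here.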
[23:49] <alphe> dmick i use netdrive software as the client, but still: how do i use the rados pool to create an s3 bucket linked to my user, so that when the netdrive software tries to log in it opens the share
[23:50] <alphe> dmick the funny part (and it is not an insult, it is a comment from someone that discovered the amazon s3 thingy like yesterday...)
[23:51] <alphe> the funny part is that we are at the end of the object storage quick start with something unachieved and no clues on how to achieve it ...
[23:52] <joao> sage, repushed wip-5205-take2 on top of next
[23:53] <alphe> i am sure that boto is extra and that there is a radosgw-admin command to create the bucket i need
[23:53] <Tamil> alphe: if you are looking to find the pool info, "rados --help" would help
[23:54] * netsrob (~thorsten@212.224.79.27) Quit (Quit: gn8)
[23:54] <alphe> tamil i do rados lspools
[23:54] <alphe> and i see the pool there but only the name no other info
[23:54] <Tamil> alphe: rados ls
[23:56] <Tamil> alphe: rados ls -p <pool_name> should be able to give you the pool info
[23:56] <alphe> hum I do rados ls .rgw.buckets. (which is the name of my pool) and I get pool name was not specified ...
[23:56] <dmick> it's odd, I don't see a -p in your command
[23:56] <alphe> with -p .rgw.buckets. i get the error opening pool: no such file
[23:58] <alphe> rados lspools is a single word ... and it lists data metadata rbd .rgw .rgw.gc .rgw.control .rgw.uid .users
[23:58] <Tamil> alphe: does it list .rgw.buckets too?
[23:58] <alphe> and at last .rgw.buckets which I created with rados mkpool .rgw.buckets
[23:59] <alphe> tamil yes ...
[23:59] <alphe> ok the pool and bucket stuff is too blurred in my mind
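For pool details beyond the bare names that rados lspools prints, `rados df` and `rados ls -p <pool>` help on the command line, and the python rados bindings expose the same information; a small sketch assuming default conf paths and the .rgw.buckets pool from the conversation:

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    print(cluster.list_pools())            # the same names `rados lspools` shows

    ioctx = cluster.open_ioctx('.rgw.buckets')
    stats = ioctx.get_stats()              # per-pool usage counters
    print(stats['num_objects'], 'objects,', stats['num_bytes'], 'bytes')

    for obj in ioctx.list_objects():       # like `rados ls -p .rgw.buckets`
        print(obj.key)

    ioctx.close()
    cluster.shutdown()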

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.