#ceph IRC Log

IRC Log for 2012-12-17

Timestamps are in GMT/BST.

[0:04] * CloudGuy (~CloudGuy@5356416B.cm-6-7b.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[0:05] <phantomcircuit> snapshot cloning says to use rbd snap protect/rbd snap clone
[0:05] <phantomcircuit> in what version was that functionality added?
[0:05] <phantomcircuit> im running 0.51
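For reference, the protect/clone sequence the docs describe looks roughly like this with a recent rbd tool (pool and image names are placeholders, and the source image has to be a format 2 image; as the discussion later in this log suggests, 0.51 appears to predate the feature):

    rbd snap create mypool/parent@snap1          # take a snapshot of the parent image
    rbd snap protect mypool/parent@snap1         # protect it so it cannot be removed while clones exist
    rbd clone mypool/parent@snap1 mypool/child   # create a copy-on-write clone of the snapshot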
[0:05] <scalability-junk> is ceph filestorage ready to use for production data?
[0:06] <scalability-junk> how scalable is it?
[0:06] <phantomcircuit> http://ceph.com/docs/master/start/quick-cephfs/
[0:06] <scalability-junk> and how are backups done? doing it like sql dump into one file won't work I reckon...
[0:07] <scalability-junk> phantomcircuit, thanks thought so
[0:07] <Psi-jack> I only see that warning in one spot.
[0:07] <Psi-jack> So hard to tell, for certain.
[0:10] * Kioob (~kioob@luuna.daevel.fr) Quit (Quit: Leaving.)
[0:12] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[0:12] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[0:16] * Kioob (~kioob@luuna.daevel.fr) Quit ()
[0:17] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[0:17] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[0:20] * LeaChim (~LeaChim@5ad684ae.bb.sky.com) Quit (Remote host closed the connection)
[0:28] * maxiz (~pfliu@221.223.242.247) Quit (Read error: Operation timed out)
[0:47] * yehuda_hm (~yehuda@2602:306:330b:a40:513b:92fb:8a99:e8e8) Quit (Ping timeout: 480 seconds)
[0:49] * yehuda_hm (~yehuda@2602:306:330b:a40:513b:92fb:8a99:e8e8) has joined #ceph
[1:13] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[1:14] <Psi-jack> Hmmm
[1:14] <Psi-jack> Well, looks like I'm going to have to set up NFSv4 afterall. :/
[1:16] <Psi-jack> Tried various alternatives, including ocfs2 on rbd-mapped disks, and that was a fireball situation, kernel panic gallore. heh
[1:19] * loicd1 (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:21] <Psi-jack> Now, here's the curious question.
[1:21] <Psi-jack> Is it safe to rbd mount an rbd disk from one of the storage servers, so that I can export it for NFS from it?
[1:22] <Psi-jack> Using the ceph-rbd kernel module.
[1:24] <iggy> Psi-jack: nope
[1:25] <Psi-jack> Not safe, eh? Blah.
[1:25] <iggy> kernel clients should never be used on osds
[1:25] <iggy> rbd or cephfs
[1:25] <Psi-jack> Not even the ceph-rbd, method, I presume.
[1:26] <Psi-jack> I'm trying to get around the issue where CephFS doesn't like multiple systems mounting it, and it's really frustrating that it's baaaadly broken, especially after the performance I get out of it. :)
[1:27] <Psi-jack> I didn't want to have to spawn up another VM just to mount cephfs and provide NFS access to it.
[1:40] <iggy> you should be able to mount ceph on multiple systems... that's kind of the point
[1:40] <Psi-jack> I know.
[1:40] <Psi-jack> but it causes kernel exceptions.
[1:41] <iggy> that said... cephfs isn't production ready yet
[1:41] <Psi-jack> Literally mounting and utilizing cephfs from multiple systems at the same time causes severe system lockups and kernel oops.
[1:41] <Psi-jack> It won't even allow apache to run on the second node. heh
[1:43] <darkfaded> Psi-jack: lets just call it strict locking
[1:43] <Psi-jack> VERY strict. :)
[1:43] <Psi-jack> Like, showstopping strict. :)
[1:44] <darkfaded> did it only start when accessing the same files, or just the same directories? (just curious)
[1:46] <Psi-jack> Not entirely sure, to be honest. I mounted cephfs on one node, started apache. Some things are shared, like php session files, for those that need them, and ssl certs are in the volume as well, then I'd mount it on the other node, and it would block apache2 from even starting, let alone just ls -lR /ceph/mount
[1:46] <Psi-jack> It would list several directories/files, before ls would dead halt and killing -9'ing it would zombify it.
[1:46] <darkfaded> :/
[1:46] <Psi-jack> So, directories, files, anything, it will become undead.
[1:46] <Psi-jack> And will make both systems become very unstable.
[1:47] <iggy> hmm, that must be a recent thing... i know that was working fine at one point (albeit slowly)
[1:47] <Psi-jack> Possibly. I tried with the cephfs 0.48 client on a 0.55.x git checkout of ceph for the storage servers.
[1:47] <Psi-jack> Upgraded the clients to 0.55 as well, to see if it was a client thing.
[1:47] <Psi-jack> Same thing.
[1:47] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[1:49] <Psi-jack> Based on the fact the problem occurs on 0.48 and 0.55 clients, I'd guestimate this could be a server-side issue?
[1:49] <Psi-jack> Maybe, anyway.. heh
[1:49] <Psi-jack> Hard to determine.
[1:54] <phantomcircuit> i just created a new rbd volume and it's the format v1
[1:54] <phantomcircuit> how can i get them to be v2?
[1:55] <Psi-jack> Hmm, what's the difference?
[1:55] <phantomcircuit> cloning only works in v2
[1:56] <phantomcircuit> format 2 - Use the second rbd format, which is supported by librbd (but not the kernel rbd module) at this time. This adds support for cloning and is more easily extensible to allow more features in the future.
[1:56] <Psi-jack> hmmm
[1:56] <Psi-jack> mkcephfs, apparently made all my pools v1. heh
[1:57] <Psi-jack> If by version, you mean format.
[1:57] <phantomcircuit> yeah format
[1:57] <Psi-jack> Which you do. :)
[1:58] * The_Bishop__ (~bishop@e179009120.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[1:59] <phantomcircuit> hmm so im stuck with a decision between installing 0.55 manually (ie not using package manager) or not being able to cow
[1:59] <phantomcircuit> :(
[1:59] <Psi-jack> Heh.
[2:00] <Psi-jack> I personally used Arch's ceph-git, which I hacked to fix a few issues in it's PKGBUILD, to install 0.55.x from git master, as of Wednesday this past week.
[2:03] <Psi-jack> looks like rbd create <blah> --format 2 would be the answer to setting format to 2
[2:08] <phantomcircuit> 0.51 doesn't support --format
[2:08] <phantomcircuit> which is the latest version in gentoo portage
[2:09] <Psi-jack> So update the ebuild. :)
[2:09] <Psi-jack> Easy!
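A minimal sketch of creating a format 2 image with a new enough rbd tool (pool and image names here are illustrative):

    rbd create mypool/myimage --size 10240 --format 2
    rbd info mypool/myimage     # should report format 2; without --format you get the old v1 format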
[2:11] <Psi-jack> Hmmm
[2:12] <Psi-jack> I have a feeling if I tried to live-migrate a VM now, under RBD, it'd fail or worse cause a kernel oops/panic on the host trying, due to the lack of lock tags. :)
[2:13] * The_Bishop (~bishop@e179009120.adsl.alicedsl.de) has joined #ceph
[2:15] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[2:15] <phantomcircuit> Psi-jack, libvirt can take care of that using sanlock
[2:16] <Psi-jack> Not using that piece of crap. :)
[2:16] <phantomcircuit> lol
[2:16] <phantomcircuit> live on the edge you can use a network lock manager
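For anyone curious, the sanlock integration mentioned here is enabled through libvirt's QEMU driver configuration; a rough sketch, assuming the libvirt sanlock plugin package is installed (paths are the usual defaults):

    # /etc/libvirt/qemu.conf
    lock_manager = "sanlock"

    # /etc/libvirt/qemu-sanlock.conf
    auto_disk_leases = 1
    host_id = 1                                    # must be unique per host
    disk_lease_dir = "/var/lib/libvirt/sanlock"    # must live on storage all hosts share

Then restart libvirtd on each host.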
[2:20] <phantomcircuit> just realized i was invited to a party today and completely forgot about it
[2:25] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[2:25] <phantomcircuit> Psi-jack, created 0.55 ebuild from 0.51 ebuild
[2:26] <phantomcircuit> odds this works ~20%
[2:26] <Psi-jack> Good job. :)
[2:37] <phantomcircuit> lol it worked
[2:37] <phantomcircuit> horray
[2:37] <phantomcircuit> inb4crash
[2:42] <phantomcircuit> nope just accidentally turned on cephx
[2:45] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:47] <iggy> Psi-jack: migration should work without locking (I'd think it would have to be off in fact)
[2:47] <Psi-jack> Hmm, well, we'll eventually see. :)
[2:48] <Psi-jack> I haven't tested any of that yet, or even failure or bringing a storage server offline and back as of yet.
[2:48] <phantomcircuit> iggy, qemu-kvm can do live migration theoretically
[2:48] <Psi-jack> phantomcircuit: It's not theoretical at all.
[2:48] <Psi-jack> It can.
[2:49] <iggy> yeah, works quite well if you know what you're doing
[2:49] <Psi-jack> Yep.:)
[2:51] * zK4k7g (~zK4k7g@digilicious.com) Quit (Quit: Leaving.)
[2:53] <phantomcircuit> Psi-jack, the theoretical part is the shared disk
[2:53] <Psi-jack> Not really.
[2:53] <phantomcircuit> which i can only assume works very well with ceph
[2:53] <Psi-jack> I've done it quite well over NFS with qcow2 disks.
[2:53] <Psi-jack> And iSCSI
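As a concrete illustration of the kind of migration being discussed, with libvirt the live move is a single command, assuming both hosts can reach the same storage (RBD, NFS, iSCSI, ...) and the guest's disk definition is identical on both sides (domain and host names are hypothetical):

    virsh migrate --live --persistent myvm qemu+ssh://otherhost/system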
[3:13] <phantomcircuit> heh well i have a bit of a silly setup
[3:31] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[3:31] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[3:34] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Quit: This computer has gone to sleep)
[3:41] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[3:41] * ChanServ sets mode +o scuttlemonkey
[3:52] <phantomcircuit> hmm
[3:52] <phantomcircuit> trying to create an rbd volume with format 2
[3:52] <phantomcircuit> librbd: error writing header: (38) Function not implemented
[3:55] <iggy> how are you trying to create it?
[3:58] <phantomcircuit> rbd create --format 2 --size 1000 test
[3:59] <phantomcircuit> http://pastebin.com/raw.php?i=HB8aViBH
[4:03] <iggy> is your rbd tool updated?
[4:11] <phantomcircuit> yes
[4:11] <phantomcircuit> iggy, yes
[4:12] <phantomcircuit> # rbd --version
[4:12] <phantomcircuit> ceph version 0.55 (690f8175606edf37a3177c27a3949c78fd37099f)
[4:13] <iggy> hmm...
[4:14] <iggy> i got nothing at this point really
[4:15] <phantomcircuit> huh seems like all of a sudden ceph_init.sh has decided the mon isn't on this system
[4:15] <phantomcircuit> i had this problem once before
[4:16] <phantomcircuit> hostname matches the host settingl
[4:16] <phantomcircuit> setting*
[4:16] <phantomcircuit> so im still running 0.51
[4:19] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[4:26] <phantomcircuit> huh this is odd
[4:26] <phantomcircuit> fixed
[4:27] <phantomcircuit> HORRAY create succeeded
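For readers hitting the same "(38) Function not implemented" on rbd create --format 2: this usually indicates the running cluster is older than the client tool and does not implement the operations needed for format 2 headers, which matches what happened here (the daemons were still effectively 0.51 until they were fixed above). A reasonable first step is checking both sides:

    rbd --version    # version of the client tool
    ceph -v          # version of the locally installed ceph binaries
    # and confirm the mon/osd daemons themselves were actually restarted onto the new build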
[4:27] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[4:30] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[4:34] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Read error: Operation timed out)
[4:56] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[5:07] * renzhi (~renzhi@116.226.37.139) has joined #ceph
[5:11] * deepsa (~deepsa@122.166.161.214) has joined #ceph
[5:23] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[5:32] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[5:53] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[5:53] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[6:01] * flakrat (~flakrat@eng-bec264la.eng.uab.edu) Quit (Read error: Operation timed out)
[6:05] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[6:11] <michaeltchapman> I'm finding my monitor processes are generating quite a lot of iops. I have 192 OSDs, with the mons running off a raid 1 pair of SATA disks I get ~200ms wait times on /dev/sda which I believe is contributing to making the cluster performance very bursty (I get a lot of waiting for sub ops in the logs). Is this pretty normal or should the mons be doing very little?
[6:12] * janos looks at his 3 little test osd's and feels very very small suddenly
[6:14] * flakrat (~flakrat@eng-bec264la.eng.uab.edu) has joined #ceph
[6:15] <michaeltchapman> ha! We have a storage array being physically moved from one dc to another so I get to play with it for a few weeks. I'll be back to testing in VMs soon enough :(
[6:28] * KindOne (KindOne@h161.33.186.173.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[6:30] * KindOne (~KindOne@50.96.87.48) has joined #ceph
[6:36] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[6:38] * kbad (~kbad@malicious.dreamhost.com) has joined #ceph
[6:38] * flakrat_ (~flakrat@eng-bec264la.eng.uab.edu) has joined #ceph
[6:39] * terje__ (~terje@71-218-25-108.hlrn.qwest.net) has joined #ceph
[6:40] * mtk0 (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[6:40] * flakrat (~flakrat@eng-bec264la.eng.uab.edu) Quit (synthon.oftc.net oxygen.oftc.net)
[6:40] * deepsa (~deepsa@122.166.161.214) Quit (synthon.oftc.net oxygen.oftc.net)
[6:40] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (synthon.oftc.net oxygen.oftc.net)
[6:40] * The_Bishop (~bishop@e179009120.adsl.alicedsl.de) Quit (synthon.oftc.net oxygen.oftc.net)
[6:40] * yehuda_hm (~yehuda@2602:306:330b:a40:513b:92fb:8a99:e8e8) Quit (synthon.oftc.net oxygen.oftc.net)
[6:40] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (synthon.oftc.net oxygen.oftc.net)
[6:40] * todin (tuxadero@kudu.in-berlin.de) Quit (synthon.oftc.net oxygen.oftc.net)
[6:40] * kbad_ (~kbad@malicious.dreamhost.com) Quit (synthon.oftc.net oxygen.oftc.net)
[6:40] * terje (~terje@71-218-25-108.hlrn.qwest.net) Quit (synthon.oftc.net oxygen.oftc.net)
[6:40] * terje_ (~joey@71-218-25-108.hlrn.qwest.net) Quit (synthon.oftc.net oxygen.oftc.net)
[6:42] * deepsa (~deepsa@122.166.161.214) has joined #ceph
[6:42] * The_Bishop (~bishop@e179009120.adsl.alicedsl.de) has joined #ceph
[6:42] * yehuda_hm (~yehuda@2602:306:330b:a40:513b:92fb:8a99:e8e8) has joined #ceph
[6:42] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[6:44] * terje (~joey@71-218-25-108.hlrn.qwest.net) has joined #ceph
[6:51] * Machske (~bram@d5152D87C.static.telenet.be) Quit (Quit: Ik ga weg)
[7:06] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[7:06] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[7:30] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) has joined #ceph
[7:31] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) Quit ()
[7:42] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[8:04] * loicd (~loic@magenta.dachary.org) has joined #ceph
[8:12] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:13] * n-other (024893bb@ircip1.mibbit.com) has joined #ceph
[8:24] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[8:24] <madkiss> cheers
[8:25] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:25] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:44] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:44] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:48] * loicd (~loic@90.84.144.207) has joined #ceph
[8:50] * low (~low@188.165.111.2) has joined #ceph
[8:54] * loicd (~loic@90.84.144.207) Quit (Quit: Leaving.)
[9:01] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[9:01] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[9:05] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:24] * LeaChim (~LeaChim@5ad684ae.bb.sky.com) has joined #ceph
[9:30] * maxiz (~pfliu@202.108.130.138) has joined #ceph
[9:35] * loicd (~loic@178.20.50.225) has joined #ceph
[9:46] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:52] * pixel (~pixel@81.195.203.34) has joined #ceph
[9:53] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[9:54] <pixel> Hi everybody, how to resolve the error "FATAL: Module rbd not found. " (modprobe rbd) ?
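The question above goes unanswered here, but the usual cause is a kernel built without the rbd block driver (it was only merged upstream in 2.6.37, so stock EL6 2.6.32 kernels lack it, as noted later in this log). A quick check, as a sketch:

    modinfo rbd                                         # does the module exist for this kernel at all?
    grep CONFIG_BLK_DEV_RBD /boot/config-$(uname -r)    # =m or =y means it was built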
[9:54] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[9:57] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[10:07] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[10:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[10:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[10:22] * Leseb_ (~Leseb@193.172.124.196) has joined #ceph
[10:30] * Leseb (~Leseb@193.172.124.196) Quit (Ping timeout: 480 seconds)
[10:30] * Leseb_ is now known as Leseb
[10:40] * n-other (024893bb@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[10:48] * maxiz (~pfliu@202.108.130.138) Quit (Quit: Ex-Chat)
[10:49] * yoshi (~yoshi@80.30.51.242) has joined #ceph
[10:49] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[10:52] * The_Bishop_ (~bishop@e179011086.adsl.alicedsl.de) has joined #ceph
[10:54] * The_Bishop (~bishop@e179009120.adsl.alicedsl.de) Quit (Read error: Operation timed out)
[11:01] * match (~mrichar1@pcw3047.see.ed.ac.uk) has joined #ceph
[11:27] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[11:34] * irctc505 (~2ee79308@2600:3c00::2:2424) has joined #ceph
[11:34] * irctc505 (~2ee79308@2600:3c00::2:2424) Quit ()
[11:34] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:35] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:48] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[11:52] * agh (~2ee79308@2600:3c00::2:2424) has joined #ceph
[11:53] <agh> hello to all, I need your help ;$
[13:53] * pixel (~pixel@81.195.203.34) Quit (Quit: I'm leaving you (xchat 2.4.5 or later))
[12:27] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[12:27] * Morg (d4438402@ircip3.mibbit.com) has joined #ceph
[12:29] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[12:30] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[12:40] * pixel (~pixel@81.195.203.34) has joined #ceph
[12:43] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[12:43] <dxd828> Hey all
[12:54] * gucki (~smuxi@46-126-114-222.dynamic.hispeed.ch) has joined #ceph
[12:54] * roald (~roaldvanl@87.209.150.214) has joined #ceph
[13:25] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:38] <Psi-jack> Hmmm, blarmy! Still no response to my ML topic, where, mount.cephfs crashes the system, and oops's the kernel? :/
[13:38] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[13:39] * pixel (~pixel@81.195.203.34) Quit (Quit: I'm leaving you (xchat 2.4.5 or later))
[13:47] <agh> Hello, I've a big problem with radosgw
[13:47] <agh> i'm under Centos6.3
[13:47] <agh> Ceph Argonaut
[13:47] <agh> I can't succeed in . I've always a 403 Forbidden error
[13:47] <agh> any idea ?
[13:48] <madkiss> is your wsgi wrapper exedcutable?
[13:48] <agh> what is wsgi wrapper ?
[13:48] <agh> the file with "exec /usr/bin/radosgw" inside ?
[13:48] <madkiss> how did you set stuff up?
[13:49] <agh> i followed the doc
[13:49] <agh> but had to modify the init script, not compatible with CentOS
[13:49] <madkiss> which one?
[13:50] <agh> etc init.d radosgw
[13:50] <madkiss> which document did you follow?
[13:50] <agh> ah, sorry
[13:51] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[13:51] <agh> this one :http://ceph.com/docs/master/radosgw/config/
[13:51] <madkiss> s3gw.fcgi is what I mean
[13:52] <agh> sorry, i don't understand what you are saying
[13:53] <madkiss> is this the first time you deal with Ceph and RADOS or distributed storage as such?
[13:53] <agh> no no, my ceph cluster is working fine
[13:53] <agh> but i now want to install a S3 gateway
[13:54] <madkiss> http://ceph.com/docs/master/radosgw/config/ in the version I see here clearly mentions that you are supposed to add a RADOS GW script
[13:54] <madkiss> which is called s3gw.fcgi
[13:54] <agh> yes, i did it
[13:54] <agh> in /var/www
[13:55] <madkiss> well then, you will need to check apache's error log.
[13:55] <agh> there is no error
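For others fighting radosgw 403s against this era's docs: the wrapper madkiss is asking about is the small s3gw.fcgi FastCGI script the config guide has you place next to the Apache vhost, and it must be executable. Roughly (client.radosgw.gateway is the name used in the docs; substitute whatever the local ceph.conf actually defines):

    cat > /var/www/s3gw.fcgi <<'EOF'
    #!/bin/sh
    exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
    EOF
    chmod +x /var/www/s3gw.fcgi

If Apache's error log really is clean, the vhost's rewrite rules and FastCGI socket configuration are the next things worth double-checking.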
[13:59] * dxd828 (~dxd828@195.191.107.205) Quit (Quit: Computer has gone to sleep.)
[13:59] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[14:03] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[14:11] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[14:11] <dxd828> Can the mkcephfs add a new monitor to an existing cluster?
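The short answer, not given in the log, is that mkcephfs only builds a brand new cluster; adding a monitor to a running one is done by hand. A rough sketch following the manual procedure documented around this release, with the new mon id and address as placeholders:

    ceph auth get mon. -o /tmp/mon.keyring    # fetch the existing mon keyring
    ceph mon getmap -o /tmp/monmap            # fetch the current monmap
    ceph-mon -i c --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    ceph mon add c 192.168.0.3:6789           # register the new mon in the map
    service ceph start mon.c                  # after adding a [mon.c] section to ceph.conf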
[14:12] <Psi-jack> Heh.
[14:12] <Psi-jack> So, anyone know why mount.ceph would cause kernel oops errors? ;)
[14:12] <Psi-jack> When multiple systems try to mount and use it? heh
[14:14] * `10 (~10@juke.fm) has joined #ceph
[14:15] <roald> Psi-jack, what kernel ver do you use?
[14:16] <Psi-jack> On the ceph servers, or the clients trying to mount.ceph?
[14:16] <roald> clients
[14:16] <Psi-jack> Ubuntu 12.04.1 standard: 3.2.0
[14:17] <`10> trying to get cephfs to talk to my ceph cluster; getting: libceph: bad option at 'secretfile=/root/keyring'
[14:17] <roald> that's a pretty old one, did you try to upgrade?
[14:17] <`10> host: Linux dev 3.6.10-1-ARCH #1 SMP PREEMPT Tue Dec 11 09:40:17 CET 2012 x86_64 GNU/Linux
[14:18] <Psi-jack> I hadn't, no. Though, I believe Ceph does make updated kernels for Ubuntu systems, packaged up?
[14:18] <`10> am i out of date? mount.ceph.c in git doesn't even appear to contain such a message
[14:18] <roald> cephfs is upstream in the kernel
[14:18] <`10> but this is not the git, this is whatever is in the 3.6.10 tree
[14:19] <`10> roald: yep, but the mount script is not
[14:19] <`10> does that error look familiar to you?
[14:20] <roald> `10, sorry, was talking to Psi-jack :-)
[14:21] <Psi-jack> roald: Heh, yeah. My CephFS servers are running Arch, so they're up to 3.6.9 right now.
[14:21] <roald> `10, did you try secret=[secret]?
[14:21] <Psi-jack> But, my webservers run Ubuntu.. Previously they ran Ubuntu 10.04, but there were no packages for ceph for 10.04, so I dist-upgraded, reluctantly but successfully, to 12.04.
[14:22] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[14:22] <Psi-jack> But, I also recall seeing someone here saying that ceph makes some kernel packages for Ubuntu, but I'm not seeing them in the ceph official repos at all.
[14:24] <`10> roald: works; any thoughts why this would be the case?
[14:25] <roald> Psi-jack, I don't know anything about ubuntu packages, but I do know that the 3.2 kernel you're running is pretty old, and there were a lot of fixes to cephfs in later versions
[14:25] <roald> (remember, cephfs is in the upstream official kernel)
[14:26] <Psi-jack> Hmmm
[14:26] <roald> `10, IIRC, the secretfile= option was introduced later
[14:28] <Psi-jack> Hmmm.. Seems that 12.04 mainline PPA's only go up to 3.4.0 :/
[14:29] <`10> roald: ok, great; sure this is not a permissions/etc. issue? e.g. if i feed mount a nonexistent secretfile path, the result is the same
[14:29] * dxd828 (~dxd828@195.191.107.205) Quit (Quit: Computer has gone to sleep.)
[14:30] <roald> `10, not sure. Which ceph version are you using?
[14:31] <`10> git
[14:35] <roald> `10, can you supply the option secretfile without a value?
[14:35] <roald> it should say something like 'keyword found, but not file specified'
[14:36] * sagewk (~sage@2607:f298:a:607:64a1:288d:93ad:96c1) Quit (Ping timeout: 480 seconds)
[14:37] <Psi-jack> roald: Hmmm.. Well, I'll investigate that when I can. I'll also likely see about throwing together an Arch-based webserver clustered pair and see how that works.
[14:38] <Psi-jack> I just don't like the idea of running Arch for very exposed things like that. :)
[14:39] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has joined #ceph
[14:39] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[14:42] <`10> roald: i cannot, and i've updated to 3.7.0 mainline
[14:44] <roald> `10, what do you mean, you can't? what does it say?
[14:44] <`10> roald: same result; "bad option"
[14:46] * sagewk (~sage@2607:f298:a:607:6df9:7a80:af99:5918) has joined #ceph
[14:47] <roald> `10, and you're sure you're running latest ceph?
[14:48] <Psi-jack> Heh, Well, interesting. Apparently the "script" to install 3.6.10 into Ubuntu 12.04 just basically downloads the raring ringtail mainline kernel.
[14:48] <Psi-jack> LOL.. Well, can't hurt to try. :)
[14:48] <Psi-jack> Well, can't hurt "too much". heh
[14:51] <`10> roald: yes/no; i've built from branch next
[14:51] <`10> which afaik should be enough, since it supports authx and the cluster doesn't have any bearing on the secretfile mount option
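A likely explanation of the "bad option" here, offered as a hedge rather than a confirmed diagnosis: the kernel itself only understands secret=, while secretfile= is translated by the mount.ceph userspace helper (shipped in the ceph / ceph-fs-common packages) before the real mount(2) call. If that helper is missing from /sbin or too old, the raw secretfile= string reaches libceph and is rejected, which would also explain why a nonexistent path fails identically. When the file is used, it should contain only the base64 key, not a full keyring section (monitor address and key below are placeholders):

    # works without the helper, but leaks the key into shell history and process listings
    mount -t ceph 192.168.0.1:6789:/ /mnt/ceph -o name=admin,secret=AQB...zzz==
    # needs a working mount.ceph helper; /etc/ceph/admin.secret holds just the key itself
    mount -t ceph 192.168.0.1:6789:/ /mnt/ceph -o name=admin,secretfile=/etc/ceph/admin.secret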
[14:57] <Psi-jack> roald: COnfirmed. Tested with 3.6.10, and the same thing happends.
[14:57] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[14:59] <roald> Psi-jack, i don't know much about the whole cephfs stack, so it's probably best to wait for a ceph dev to help you
[14:59] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:59] <Psi-jack> Yeah. That's what I'm waiting for. I wasn't sure if you were or weren't. ;)
[14:59] <roald> but you probably should include stacktraces in your question
[15:00] <agh> does somebody succeed in installing RadosGW on CentOS ?
[15:00] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[15:00] <agh> I'm fighting with it but noway, I don't succeed !
[15:00] <Psi-jack> roald: Yeah, I have screenshots of the stack traces. heh
[15:01] <roald> Psi-jack, I'm just a fan / open source dev, and i just started learning the internals :-)
[15:01] <Psi-jack> hehe
[15:14] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[15:15] <dxd828> On a two server setup is it recommended to run two monitors per server so the quorum algorithm works?
[15:18] <roald> dxd828, then you will also end up with an even number of mons; for paxos to work, you'll need an odd number
[15:18] <dxd828> roald shall I just have 2 on one machine and 1 on another?
[15:18] <Psi-jack> dxd828: Basically, you'll need a third somewhere else, preferably.
[15:19] <Psi-jack> YOu don't want to loose 2 monitors if just 1 server goes down.
[15:19] <dxd828> Psi-jack yeah I get that… But I only have two machines for the cluster
[15:19] * Morg (d4438402@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[15:19] <dxd828> Can't only one monitor work?
[15:20] <roald> dxd828, one monitor will work, but your ceph setup will be down when that mon goes down
[15:20] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[15:21] <dxd828> roald when the others come back will they recover?
[15:22] <dxd828> I have two machines for now I was going to run one osd on each machine, one msd on each and two monitors on each. Then at a later date when I get more machines add dedicated monitors
[15:22] <Psi-jack> roald: Hmmm. I haven't done any failure testing on my setup yet, but I have 3 physical servers running ceph. Each run 3 OSD's, 1 mon, and 1 mds. If I shut one server down, will I loose access to the cluster by doing so, or will it maintain it until it looses another?
[15:24] <roald> dxd828, if it's just for testing purposes, then one monitor is fine
[15:24] <dxd828> Psi-jack why do you lose access to the cluster? Fuse-fs will know about each monitor?
[15:24] <dxd828> roald its testing / production :)
[15:24] <roald> for prod purposes however, you'll want to have at least 3 boxes
[15:24] <Psi-jack> This has very little to do with fuse
[15:25] <Psi-jack> I'm primarily using Ceph for RBD.
[15:25] <janos> hrm, is quorum and issue when 2 mons are left?
[15:25] <roald> Psi-jack, you need at least 3 mon instances (which you have), at least 1 standby mds (which you have), and enough osds to store your replicas (which you have)
[15:25] <roald> so you're good to go :)
[15:26] <Psi-jack> roald: Hehe. yeah. I have all that. Why I went with a 3-node setup to begin with. Ideally, if I shut down one of the servers, I'll still have access to the ceph rbd cluster for reads and writes, and it'll re-balance when it comes back online with the third, if I understand things correctly.
[15:27] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[15:27] <Psi-jack> This, allows me to upgrade each server, one by one, for maintenance, I was hoping. :)
[15:27] <Psi-jack> And, of course, to loose one, temporarily, and bring it back up.
[15:28] <roald> janos, no, because you still have a quorum (2 up 1 down)
[15:28] <Psi-jack> Ahhh
[15:28] <Psi-jack> You loose quorum if you loose 2. :)
[15:29] <Psi-jack> Which is what I thought. :)
[15:29] <janos> i'd think one would be quite capable of determining a vote!
[15:29] <Psi-jack> janos: It is. But it's less than 50%
[15:29] <Psi-jack> Well, less than 51% ;)
[15:33] <Psi-jack> Well, my first time to kill one node is probably this coming weekend, for now I'm testing longevity of the cluster. Good week should be good for that. ;)
[15:33] <Psi-jack> Then I'll be changing out my cluster to use systemd service units I wrote, VERY simple, but should effectively handle it.
[15:34] <Psi-jack> osd@.service, mon@.service, mds@.service, where you enable osd@0, osd@1, osd@2, mon@a, mds@a: on node 1.
[15:34] <Psi-jack> And.. With any luck, it'll log to the journald because I have them launching each service in the foreground. :)
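A minimal sketch of the kind of template unit being described, purely illustrative (unit names, paths and options here are mine, not Psi-jack's actual files); the -f flag keeps the daemon in the foreground so its output lands in the journal:

    # e.g. /etc/systemd/system/osd@.service
    [Unit]
    Description=Ceph OSD %i
    After=network.target

    [Service]
    ExecStart=/usr/bin/ceph-osd -i %i -f
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

    # enable per instance: systemctl enable osd@0 osd@1 osd@2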
[15:40] * psiekl (psiekl@wombat.eu.org) has joined #ceph
[15:41] * psiekl (psiekl@wombat.eu.org) Quit ()
[15:46] <elder> nhm, I'm here.
[15:46] * psiekl (psiekl@wombat.eu.org) has joined #ceph
[15:50] <dxd828> If you have three mons and two die, there is one left how does the algorithm work?
[15:56] * noob2 (~noob2@ext.cscinfo.com) has joined #ceph
[15:57] <match> dxd828: I think it won't do anything as it's no longer quorate
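To spell out the arithmetic behind that answer: Paxos quorum is a strict majority, floor(N/2) + 1 of the monitors in the map. With N = 3 that is 2, so a single surviving monitor cannot form a quorum and the cluster stops serving until at least one of the others returns; this is also why an even monitor count (including the two-per-server idea above) buys nothing over the next lower odd count.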
[15:58] <agh> help help; please ! Does someone has ever succeeded in using RadosGW on CentOS ? I'm killing myself
[15:59] <noob2> i haven't tried. i used ubuntu
[15:59] <nhm> agh: I think so, but I don't actually know for sure. Gary or one of our QA guys might have some insight.
[15:59] <noob2> ceph has custom rados packages that really helped
[16:01] <agh> yes i did it on ubuntu...but my boss WANT CentOS ...
[16:02] <agh> the whole cluster works fine under CentOS.. But i'm stuck with radosGW
[16:02] <noob2> darn
[16:02] <noob2> have you tried something other than apache, nginx?
[16:02] <agh> oh no... i will. Is there any config file for nginx somewhere ?
[16:08] * loicd (~loic@178.20.50.225) Quit (Ping timeout: 480 seconds)
[16:11] * yehudasa (~yehudasa@2607:f298:a:607:6105:88d2:5e1f:8abd) Quit (Ping timeout: 480 seconds)
[16:11] <dxd828> Ok read up on Paxos.. Is there anyway to run ceph redundantly with two separate racks in different DC's? We don't have three locations yet :/
[16:12] <janos> dxd828: i'm no expert, so please get more input than mine, but that should be fine, assuming you have enough role redundancy within each DC
[16:13] <janos> multiple mon's and mds's in each DC
[16:13] <dxd828> janos What I'm worried about is if one DC's network dies or something the whole cluster dies at the other location too :/
[16:13] <janos> i'd think that so long as each DC has enough pieces to be independent that should be fine
[16:14] <janos> i'd get more input than mine, but based on what i've read and messed with myself
[16:15] <janos> if all your mons and mds's are in dc1 and dc1 dies, yeah, dc2 is going to have a stroke ;)
[16:15] * nwat (~Adium@50.12.61.82) has joined #ceph
[16:16] <dxd828> janos I have the same setup in both locations.. But thats not an even number so the system does not work even If i have live mons and mds's :(
[16:17] <janos> sounds like something else is going on then
[16:17] <janos> just a hunch on my part
[16:17] <janos> there are people in here that run large installations
[16:18] <janos> hopefully one will be around and respond
[16:18] <Psi-jack> elder: Aha! Now I have you!
[16:18] <dxd828> Its this majority thing, you have to have an odd number.. I think it is impossible to set up in two DC's :( But thanks for you help..
[16:18] <via> i would imagine you'd want a third monitor at a third location, maybe on a vps or something
[16:18] <dxd828> via That would work..
[16:19] <dxd828> via do the monitors handle the traffic for the data on the OSDs?
[16:19] <via> no
[16:19] * loicd (~loic@magenta.dachary.org) has joined #ceph
[16:20] * yehudasa (~yehudasa@2607:f298:a:607:f417:6a39:eebd:1d71) has joined #ceph
[16:20] <elder> Psi-jack, is there something you need me for? Have I been eluding you? (Should I be?)
[16:21] <Psi-jack> Hehe
[16:21] <janos> haha
[16:21] <Psi-jack> elder: Actually, I'm just trying to figure out what could be causing Kernel Ooops, trying to use mount.ceph on two systems.
[16:22] <elder> Oh. Maybe I could help on that... Do I need to look back in the IRC log for more info?
[16:22] <Psi-jack> elder: Actually, I did post the detail into the ML as well, and somewhat to here in channel.
[16:22] <Psi-jack> From the ML, the subject was: CephFS, multiple nodes sharing 1 cephfs mount, Kernel NULL pointer dereference
[16:24] <elder> Hello, Eric.
[16:24] <Psi-jack> Mornin. ;)
[16:24] <elder> Give me a minute to look at your message.
[16:24] <Psi-jack> No problem. It's a good long one. hehe
[16:29] <elder> Well, first of all, I haven't delved much into the ceph file system code yet. So my insights might not be very deep.
[16:30] * Psi-jack nods.
[16:30] <elder> But I could take a quick look at one or more of your kernel crash stack traces and see if there's anything I can deduce.
[16:30] <Psi-jack> I also realize, it's not considered "production ready" either, but I've seen where people do this setup, and it works.
[16:31] <Psi-jack> Let me see if I can get a quick kern.log dump of one of them, should hopefully still be there, so I can pastebin.
[16:31] <elder> Grea.t
[16:33] <Psi-jack> Hmmm. Interesting. That n ever made it to the kern.log. Either way, I have the screenshots, I can post up to some image site.
[16:34] <elder> OK.
[16:38] <Psi-jack> http://imgur.com/vR8Tj http://imgur.com/YHJqN http://imgur.com/a2TDm http://imgur.com/uuiqO http://imgur.com/saC2e
[16:38] <Psi-jack> Bah, so horribly out of order.
[16:38] <Psi-jack> Last one's the first one, top of the stack dump.
[16:39] <elder> No problem. Let me take a look.
[16:39] * joshd1 (~jdurgin@2602:306:c5db:310:41a7:ad0e:fb84:9bc2) has joined #ceph
[16:40] * vata (~vata@208.88.110.46) has joined #ceph
[16:40] <Psi-jack> http://imgur.com/a/lCJBw This might be easier. :)
[16:40] <roald> hey, cool, a jigsaw puzzle!
[16:40] <Psi-jack> lol
[16:41] <Psi-jack> That last URL actually has them organized by filename and in the "blog" view itself. ;)
[16:41] <Psi-jack> Since the filenames are the exact snapshot times, one after the other. :)
[16:43] * dxd828 (~dxd828@195.191.107.205) Quit (Quit: Computer has gone to sleep.)
[16:44] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[16:44] * ChanServ sets mode +o scuttlemonkey
[16:45] <Psi-jack> elder: Also, I have tried using Linux 3.6.10, but results in similar behavior, so far. It still has the same effective problem, locking up and oops'ing.
[16:45] <elder> Are you able to identify the exact kernel version you're starting with?
[16:46] <Psi-jack> Stock generic kernel from, Ubuntu 12.04 was the original, 3.2.0
[16:46] <Psi-jack> Currently on 3.6.10 now, testing.
[16:46] <elder> I see you're using ceph 0.55, or master as of 12/12/12 (which is still ambiguous but that's OK for now.)
[16:46] <Psi-jack> Hehe
[16:46] <Psi-jack> Correct, that's what the ceph servers are running.
[16:46] <elder> Let me look at 12.04.
[16:46] <Psi-jack> On cweb1/cweb2, I'm using the debian-testing apt repo.
[16:47] <elder> Actually, I'm going to look at 3.6.10, because I already have that.
[16:47] <Psi-jack> Later today, I plan on putting together an Arch-based webserver, using the very same compiled packages on the server. :)
[16:47] <Psi-jack> And, cloning that setup into another VM for my replica. :)
[16:48] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[16:49] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[16:50] <Psi-jack> Just FYI, that oops stack was from 3.2.0
[16:52] * joao (~JL@89.181.148.171) has joined #ceph
[16:52] * ChanServ sets mode +o joao
[16:52] * match (~mrichar1@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[16:52] <Psi-jack> There's joao! Heh
[16:54] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[16:55] * nwat (~Adium@50.12.61.82) Quit (Quit: Leaving.)
[16:57] * low (~low@188.165.111.2) Quit (Quit: Leaving)
[16:57] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[17:00] * dxd828 (~dxd828@195.191.107.205) Quit (Quit: Computer has gone to sleep.)
[17:03] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) has joined #ceph
[17:09] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[17:12] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:19] * nwat (~Adium@50.12.61.82) has joined #ceph
[17:20] * sagelap (~sage@13.sub-70-197-145.myvzw.com) has joined #ceph
[17:22] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:40] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[17:41] <elder> Psi-jack, posted an e-mail analysis. We'll let sage decide what to do with it next.
[17:44] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[17:49] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[17:49] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[17:54] * dxd828 (~dxd828@195.191.107.205) Quit (Quit: Computer has gone to sleep.)
[17:56] * mengesb (~bmenges@servepath-gw3.servepath.com) Quit (Quit: Leaving.)
[17:57] * sagelap1 (~sage@2607:f298:a:607:f5f5:ee4f:6791:8406) has joined #ceph
[18:00] * nwat (~Adium@50.12.61.82) Quit (Quit: Leaving.)
[18:00] * sagelap (~sage@13.sub-70-197-145.myvzw.com) Quit (Ping timeout: 480 seconds)
[18:02] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[18:02] * dxd828 (~dxd828@195.191.107.205) has joined #ceph
[18:02] <dxd828> Does anyone know how to get Fuse to re connect after its lost connection to the Mon's?
[18:06] <dxd828> Also has anyone got cephfs kernel driver to work in CentOS 6.3? After installing the packages I can't enable it using mod probe
[18:10] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Remote host closed the connection)
[18:13] <via> dxd828: 2.6.32, standard kernel with el6, does not have cephfs
[18:13] <via> consider elrepo if you want a newer kernel
[18:14] <dxd828> via, great will do that :)
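For the record, via's elrepo suggestion looks roughly like this on CentOS 6 (a sketch, not tested here; package names are the ones elrepo ships), after which the ceph and rbd modules become loadable:

    # assumes the elrepo-release package for EL6 has been installed first
    yum --enablerepo=elrepo-kernel install kernel-ml
    reboot           # into the new kernel
    modprobe ceph
    modprobe rbd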
[18:16] * nwat (~Adium@50.12.61.82) has joined #ceph
[18:18] * mengesb (~bmenges@servepath-gw3.servepath.com) has joined #ceph
[18:19] * jlogan1 (~Thunderbi@2600:c00:3010:1:5dfe:284a:edf3:5b27) has joined #ceph
[18:23] * nwat (~Adium@50.12.61.82) has left #ceph
[18:25] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Quit: Leaving.)
[18:25] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[18:27] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:32] * jackhill (jackhill@pilot.trilug.org) has joined #ceph
[18:33] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[18:34] * dxd828 (~dxd828@195.191.107.205) Quit (Quit: Textual IRC Client: www.textualapp.com)
[18:39] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[18:39] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[18:39] <Psi-jack> elder: Alrighty! Thanks. :)
[18:39] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:43] <Psi-jack> elder: Hmmm, so you think it's something specifically with the ceph kernel client itself? I had tried this with the fuse client, however, it was half-baked.. cweb1 used mount.ceph while cweb2 used ceph-fuse. Don't know if that makes any bit of difference, though :)
[18:43] <Psi-jack> My work-around is, sadly, to put the CephFS mount on a dedicated VM and export that via NFSv4. heh
[18:43] <denken> would it be safe to service ceph stop mon.a && rm -f /data/mon.a/pgmap/[0-9]* && service ceph start mon.a ?
[18:44] <denken> the mon filled up its file system with multi megabyte pgmap files for some reason
[18:44] <elder> I don't know for sure, I just looked at it and found where the problem was occuring. It looks pretty clear the bad pointer is dentry->d_parent. How we get to that state is something someone else can probably better explain.
[18:52] <Psi-jack> hehehe
[18:52] <Psi-jack> Okay. :)
[18:53] <Psi-jack> I'm no kernel hacker, so I'm lost in that end of things. :)
[18:53] * yehuda_hm (~yehuda@2602:306:330b:a40:513b:92fb:8a99:e8e8) Quit (Read error: Operation timed out)
[19:01] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[19:02] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[19:03] * Kioob (~kioob@luuna.daevel.fr) Quit (Quit: Leaving.)
[19:05] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[19:05] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[19:05] * Kioob (~kioob@luuna.daevel.fr) Quit ()
[19:09] <noob2> so when bobtail comes out should I stick with ubuntu 12.04?
[19:16] * BManojlovic (~steki@198-175-222-85.adsl.verat.net) has joined #ceph
[19:18] * sjustlaptop (~sam@2607:f298:a:607:2482:dfe3:5e9f:bb65) has joined #ceph
[19:20] <Psi-jack> Huh.
[19:20] <Psi-jack> This is wierd.
[19:20] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[19:20] <Psi-jack> elder: I just noticed something else rather interesting.
[19:20] <Psi-jack> So far, if, on another host, I mount /, and not /cweb, it seems to have... No issue..
[19:21] <Psi-jack> But, the client host this time is an Arch system, with exactly matching ceph build as the servers.
[19:23] <Psi-jack> Going to test this out in my Ubuntu web cluster. ;)
[19:25] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[19:30] <Psi-jack> heh, weird..
[19:31] <Psi-jack> Okay, so far, it's isolated to Ubuntu 12.04.
[19:36] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[19:39] * Ryan_Lane (~Adium@216.38.130.167) has joined #ceph
[19:40] * sjustlaptop (~sam@2607:f298:a:607:2482:dfe3:5e9f:bb65) Quit (Ping timeout: 480 seconds)
[19:43] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[19:51] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has left #ceph
[19:52] * sagelap1 (~sage@2607:f298:a:607:f5f5:ee4f:6791:8406) Quit (Ping timeout: 480 seconds)
[19:53] * yasu` (~yasu`@dhcp-59-227.cse.ucsc.edu) has joined #ceph
[19:55] <Psi-jack> Okay, hmm, very strange. I've isolated the problem /specifically/ to cweb2. For some odd reason, only it does this issue.
[20:02] * cblack101 (8686894b@ircip1.mibbit.com) has joined #ceph
[20:03] <Psi-jack> Heh.. Crazy.
[20:04] <Psi-jack> Moved my cweb2 to another host, and it's working. :/
[20:04] <rweeks> odd
[20:04] <Psi-jack> Yes.. Very odd.
[20:04] <rweeks> something funny with the kernel on that system?
[20:04] <Psi-jack> Odd question. It's a VM. :)
[20:05] <Psi-jack> I moved it from ygg2 to ygg1 and it worked, same kernel version. :)
[20:05] <Psi-jack> Will reboot ygg2 itself, and migrate it back and see if it does any differently.
[20:06] <Psi-jack> But yeah, very odd that it would only be acting up on that server. That's actually the newer server. hehe
[20:06] <Psi-jack> Well, newer upgraded server. :)
[20:07] <Psi-jack> Went from an AMD Phenom II 4-Core Black 3.2 Ghz to AMD FX 3.6Ghz Black 8-core.
[20:07] * sagelap (~sage@38.122.20.226) has joined #ceph
[20:08] <Psi-jack> Heck, it did it again.
[20:08] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[20:08] <Psi-jack> locked up just trying to ls -lR the cephfs mount.
[20:10] <rweeks> but it's still only this VM that does it
[20:10] <rweeks> ?
[20:10] <Psi-jack> Yes
[20:10] <Psi-jack> Well only one that's been doing it.
[20:10] <Psi-jack> I wonder if I moved cweb1 to ygg2 if it would do it too.
[20:11] <Psi-jack> I just tried to change the CPU kvm was providing to it, to see if that may be it... Still crashed though..
[20:11] <Psi-jack> Going to move cweb1 to ygg2, and see. :)
[20:13] <Psi-jack> So far...
[20:14] <Psi-jack> cweb1 on ygg2(host), no problems.
[20:14] <Psi-jack> So, what this is basically telling me, destroy cweb2, clone cweb1 to cweb2, and live on. :)
[20:15] <Psi-jack> Wierd..
[20:15] <Psi-jack> Very wierd.. LOL
[20:20] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Read error: Connection reset by peer)
[20:21] <cblack101> basic question: i'm using the rbd kernel driver on a client, after I set a new replication size for a pool, is restarting all the ceph services (OSDs) all I need to do to get things into shape?
[20:22] <sjust> you shoudn't even need to do that
[20:22] <sjust> ceph osd set <pool> size <N>
[20:22] <sjust> right?
[20:24] <cblack101> yep, that's the command, ceph -w churns about in the background as expected until 'degraded' reaches 0.000%
[20:24] <sjust> yeah, that's it
[20:24] <sjust> no need to restart any osds, the objects will re-replicate online
[20:25] <cblack101> Just wondering if restarting OSDs was necessary or I was just being overly cautious
[20:25] <sjust> nope, no need
[20:25] <cblack101> Looks like the latter ;-)
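For reference, the form of the command documented around this release, with a placeholder pool name:

    ceph osd pool set mypool size 3    # change the replica count; takes effect online, no daemon restarts
    ceph -w                            # watch re-replication until 'degraded' reaches 0%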
[20:31] * sagelap (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[20:48] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[21:03] * agh (~2ee79308@2600:3c00::2:2424) Quit (Quit: TheGrebs.com CGI:IRC (Session timeout))
[21:04] <Psi-jack> wow.
[21:05] <Psi-jack> Cloned cweb1's OS disk to cweb2's OS disk, booted up cweb2, changed IP's, hostname, etc..
[21:05] <Psi-jack> Works flawlessly with cephfs now.
[21:05] <Psi-jack> Both servers mounted, can ls -lR their cephfs volume, Apache's started on both, LVS is load balancing.
[21:08] * l0nk (~alex@38.107.128.2) has joined #ceph
[21:09] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[21:12] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[21:12] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:17] <rweeks> huh
[21:17] <rweeks> wonder what was wrong with cweb2
[21:17] <Psi-jack> Indeed.
[21:17] <rweeks> because what you have now is how it should work
[21:17] <Psi-jack> Yep. :)
[21:17] * DrewBeer (~exstatica@216.152.240.194) has joined #ceph
[21:17] <Psi-jack> That's why I wanted it this way. HA storage, for once, instead of failed storage. :)
[21:17] <rweeks> indeed
[21:18] * sagelap (~sage@38.122.20.226) has joined #ceph
[21:18] <Psi-jack> Heh.
[21:18] <Psi-jack> Totally wierd. Those servers were both generally a clone of each other and maintained as such. Any updates or changes to one was mirrored to the other.
[21:20] <Psi-jack> When I converted them from qcow2 disk images via NFSv4 to RBD-Ceph disks, I used a VM that provided the qcow2 disk and RBD disk, partitioned, rsync'd the raw data, and chrooted in and installed grub back into the new disk, and voila.
[21:20] <Psi-jack> Wierd..
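An aside on the qcow2-to-RBD conversion described above: the rsync/chroot route works, but qemu-img can also write straight into an RBD image, assuming qemu was built with rbd support (pool and image names here are placeholders):

    qemu-img convert -O raw vm-disk.qcow2 rbd:rbd/vm-disk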
[21:21] <Psi-jack> I was almost done getting the annoying VM for providing NFSv4 from CephFS, when I noticed.. Oh.. Wait.. This is working, perfectly, for the CephFS portion.
[21:21] <phantomcircuit> if i have two groups of osds which are on two continents am i going to see random very high latency on writes?
[21:21] <phantomcircuit> (im guessing yes)
[21:21] <rweeks> yes
[21:21] <phantomcircuit> Psi-jack, magically fixed itself?
[21:21] <rweeks> that's definitely not a recommended layout for Ceph today
[21:22] <Psi-jack> phantomcircuit: No. Very strange issue I isolated to the failing VM.
[21:22] <phantomcircuit> rweeks, that's what i figured
[21:22] <Psi-jack> So, I cloned cweb1 to cweb2. And it works now, flawlessly.
[21:22] <phantomcircuit> i have really cheap compute power in one dc and really cheap storage in another
[21:22] <phantomcircuit> problem is they're on different continents lol
[21:22] <rweeks> phantomcircuit: geo-replication is on the roadmap, but it is not currently a feature
[21:22] <Psi-jack> I kept cweb2's disk, though, so I can analyze later.
[21:22] * dmick (~dmick@2607:f298:a:607:343b:1e17:4acf:c62d) has joined #ceph
[21:23] <phantomcircuit> Psi-jack, ah so you problem wasn't ceph related at all
[21:23] <phantomcircuit> just the most liekly candidate
[21:23] <phantomcircuit> hate when that happens
[21:23] <Psi-jack> phantomcircuit: It was an issue..... I don't know actual root cause, fully yet. ;)
[21:24] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[21:24] <Psi-jack> I even went as far as to install the ceph-fs-common from the gitbuilder master branch. :)
[21:24] <Psi-jack> WHICH, BTW, had some definite differences in dependancies, needing boost stuff, which wasn't in the debian-testing apt repo for precise.
[21:26] <Psi-jack> So yeah, now we'll see if I maintain my faster access times to my website now. :D
[21:26] <Psi-jack> Switching from NFSv4, which got me averages around 740ms, was reduced to 600ms with CephFS on a single server.
[21:27] <phantomcircuit> 740ms access times o.o
[21:27] <Psi-jack> For Drupal, that's epic. :)
[21:29] <phantomcircuit> oh god drupal
[21:29] <Psi-jack> When I tried GlusterFS, my access times was 2,300ms~3,200ms averaging. Dependant on load.
[21:29] * BManojlovic (~steki@198-175-222-85.adsl.verat.net) Quit (Ping timeout: 480 seconds)
[21:29] <phantomcircuit> someone once asked me to change the color on their drupal site
[21:30] <phantomcircuit> i said i'd try figuring it couldn't be that hard... right?
[21:30] <phantomcircuit> wrong
[21:30] <Psi-jack> GFS2 optimized as much as possible for least amount of locking, upwards of 4,000~5,000ms
[21:30] <Psi-jack> It's really easy, actually, if you know what you're doing. :p
[21:30] <rweeks> dang
[21:30] <rweeks> Psi-jack: that's some really useful comparison numbers. :)
[21:31] <rweeks> nhm: are you following this?
[21:31] <Psi-jack> NFSv4, however, was the fastest solution, but unfortunately, the least HA.
[21:31] <phantomcircuit> Psi-jack, yeah i had no idea and this was an install that had been "customized" by a number of people
[21:31] <Psi-jack> CephFS however. HOLY CRAP.. It's phenomenally faster!
[21:32] <Psi-jack> 1> I can boot up ALL my VM's, at the same time, (roughly 18), all using RBD for their OS disks, and not bog the storage server down so much that it takes the OS 5~10 minutes to load.
[21:33] <Psi-jack> 2> Accessing the storage cluster doesn't cause other systems to bog down trying to read/write to disk, causing ready-timeout issues.
[21:33] <phantomcircuit> lol that vm startup times
[21:34] <phantomcircuit> i had to write an unreasonably complicated script to restart vms in the event of host failure with qcow2 raid1
[21:34] <Psi-jack> 3> Access speeds are majorly increased all over, and high availability, redundancy, and.. Not too sure yet about scalability, but I likely won't be adding a 4th+5th node any time soon to find out. ;)
[21:34] <Psi-jack> phantomcircuit: Dude.. Pacemaker.
[21:34] <Psi-jack> Seriously. Learn it. :)
[21:35] <dmick> Psi-jack: awesome news.
[21:35] <Psi-jack> They have a resource-agent specifically designed for libvirt, even. I used to use it before I ultimately decided to use Proxmox VE when 2.x came out. (I had used 1.x too, switched to libvirt, then switched back to Proxmox VE when 2.x)
[21:35] <Psi-jack> dmick: Yeah. I had some major issues with CephFS, but isolated it to the VM itself having the problem. Very wierd. VERY wierd..
[21:36] <rweeks> also likely weird.
[21:36] <rweeks> *cough* pedant *cough*
[21:36] <Psi-jack> But yeah. Ceph's definitely got my vote of confidence, so far. And I'm running bleeding edge. :D
[21:36] <rweeks> great to hear
[21:37] <Psi-jack> I know, CephFS itself isn't considered production ready, so I will keep reasonably up-to-date until 0.56 comes out, and let you know if I find any issues.
[21:37] <rweeks> actually in your scenario with one MDS it should be fine
[21:38] <Psi-jack> Probably.
[21:38] <phantomcircuit> Psi-jack, lol im not even talking about ha
[21:38] <Psi-jack> But, I wanted it HA too, for who knows which server will go down, and keeping the storage servers up-to-date means updating them, rebooting (kernel updates and all), doing the next one, update, reboot, etc,.
[21:38] <phantomcircuit> im just talking about restarting a host where libvirt has guests set to autostart
[21:38] <Psi-jack> Ahh yeah.
[21:39] <Psi-jack> libvirt's autostart is often fail,
[21:39] <Psi-jack> One of the many reasons I got away from libvirt.
[21:39] <phantomcircuit> Psi-jack, libvirtd itself would lockup for about 30 minutes
[21:39] <phantomcircuit> horrible
[21:39] <Psi-jack> If you don't RA-manage that, it will fail you miserably. :)
[21:39] * jlogan1 (~Thunderbi@2600:c00:3010:1:5dfe:284a:edf3:5b27) Quit (Ping timeout: 480 seconds)
[21:39] <phantomcircuit> i no longer define the domains at all i just use non persistent domains
[21:39] <Psi-jack> Even if you do, it will fail you miserably. :)
[21:39] <nhm> rweeks: I am now!
[21:40] <phantomcircuit> basically im just using libvirtd as a way of talking to qemu
[21:40] <nhm> Psi-jack: that's fantastic. If you feel comfortable doing it and have time, it'd be great to see a writeup of your comparisons.
[21:40] <phantomcircuit> which is sort of silly considering all the code in it...
[21:40] <Psi-jack> rweeks: I my current scenario, I'm only using CephFS between two load-balanced webservers, but now that this is working, I'm going to get my dual-mail servers back up which will also be sharing it's own cephfs subdir as well, for the Maildir.
[21:40] * roald_ (~roaldvanl@87.209.150.214) has joined #ceph
[21:40] <rweeks> nifty.
[21:40] <nhm> Psi-jack: Regardless of where ceph lies, these kinds of things are incredibly useful for the community.
[21:41] <rweeks> ^^^ +1
[21:41] * roald (~roaldvanl@87.209.150.214) Quit (Read error: Connection reset by peer)
[21:41] <Psi-jack> nhm: I will be, actually. I'll be writing my own little review and probably put it up on my site, www.linux-help.org ;)
[21:44] <nhm> Psi-jack: Awesome. :) If you do, let us know and our community/marketing guys will tweet about it and maybe post a link on our blog. :)
[21:45] <rweeks> Yeah, let rturk or scuttlemonkey know when you do
[21:45] <nhm> Psi-jack: Any tuning you've done (both on the ceph side and for NFS/GlusterFS/etc) would be useful too!
[21:45] <Psi-jack> nhm: Hehehe
[21:45] <Psi-jack> nhm: Oh yeah. I'm VERY thorough on stuff like that. :)
[21:46] <nhm> Psi-jack: Glad to hear it! I try to be on the articles I write too. If I'm ever not hopefully people will call me out on it.
[21:46] * jlogan1 (~Thunderbi@72.5.59.176) has joined #ceph
[21:46] <lurbs> Psi-jack: Every permutation of the tuning variables? ;)
[21:47] * BManojlovic (~steki@85.222.178.27) has joined #ceph
[21:47] <nhm> Psi-jack: I'm working on trying to get a Bobtail vs Argonuat comparison out now. Then after that it will be on to smalliobench and then parametric sweeps of tunings for ceph parameters to see how they affect performance on different raid/jbod setups.
[21:48] <Psi-jack> lurbs: Even to the point I configured APC to not stat files after caching them, because with GlusterFS, every stat will cause a self-heal action.
[21:49] <Psi-jack> I will need some help later from you guys on CRUSH stuff. I want to define out my CRUSH map differently that it is now, and insure certain replica's within the cluster. :)
[21:49] <Psi-jack> Feels weird saying CRUSH like that in terms of replication. :D
[21:50] <Psi-jack> Hmm, speaking of which, 10 minutes, and I go home! :D
[21:51] <nhm> Psi-jack: east coast?
[21:51] <Psi-jack> While I'm thinking about it. One simple question I do have is: With CephFS, can you specify a pool to use, rather than just always using the data pool?
[21:51] <Psi-jack> nhm: Yep.
[21:51] <Psi-jack> mount.ceph specifically.
[21:52] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[21:52] <Psi-jack> nhm: 7a-4p. And unfortunately, last week, I found out I'm in the Titanic. Company's deprecating the main product I engineer and support, server-wise. And it's the primary reason this company even has Linux at all. ;)
[21:52] <rweeks> ugh
[21:52] <rweeks> that's no fun
[21:52] <Psi-jack> Yeah... Heh
[21:53] <nhm> Psi-jack: boo
[21:53] <Psi-jack> For me, it's okay, for about a year... Maybe two.. But.. Definitely on a sinking ship./ :)
[21:54] <nhm> Psi-jack: That happened to a friend of mine, though it was kind of lucky for him. He didn't realize how much he hated it until he started a new job that he really likes.
[21:54] <Psi-jack> Yeaaah.
[21:54] <Psi-jack> This job's frustrated me a lot, because of all the Windows nonsense involved.
[21:54] <Psi-jack> Basically the company I started working for was bought out by a Windows .NET shop. :)
[21:55] <rweeks> eww
[21:56] <Psi-jack> Exactly.
[21:56] <janos> don't bag on .net! (bag on windows though)
[21:56] <nhm> Psi-jack: this thread may be useful. Not sure how much of it is valid at this point: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/6148
[21:57] <Psi-jack> I will not just bag on .net, but I will defacate on it! :D
[21:57] <janos> awww man!
[21:57] <janos> i like .net/mono
[21:57] <janos> can't stand windows administration and methods though
[21:57] <Psi-jack> heh
[21:57] <Psi-jack> nhm: Hmm.
[21:57] <janos> well can't stand most .net "developers" either
[21:57] <janos> out of the box crap is well, crap
[21:57] <nhm> I did some work in C# using mono. It's ok. I don't remember loving it or hating it.
[21:58] <janos> or as i call it, clicky-draggy developers
[21:58] <Psi-jack> nhm: Just changing the mount path :/ vs :/cweb doesn't change the actual pool the data comes from.
[21:58] <Psi-jack> Ideally, I would want to use a different pool for different types of data.
[21:58] <nhm> Psi-jack: probably best to talk to Greg. I have no idea what the current state of the FS is.
[21:58] <Psi-jack> Like, I wouldn't want mail servers to share the same data pool as the webserver, only differentiated by its base mount path origin.
[21:59] <Psi-jack> :)
[21:59] <Psi-jack> janos: I thought it was funny.
[21:59] <Psi-jack> my co-worker behind me is making a tool, and was trying to get a textual table to render properly in an Outlook email message.. I was like.. Outlook's using HTML by default..
[22:00] <Psi-jack> And as such, not using a monospaced font...
[22:00] <janos> ugh
[22:00] <janos> mail in general....
[22:00] <Psi-jack> lol
[22:00] <janos> outlook in particular
[22:00] <janos> just not a pretty scene
[22:00] <Psi-jack> Yeah.. I know...
[22:00] <janos> smtp will never get the full rewrite it needs ;)
[22:01] <janos> in the meantime i'll continue to love on postfix
[22:01] <rweeks> Psi-jack: looks like the cephfs command is where you set the pool
[22:01] <rweeks> http://ceph.com/docs/master/man/8/cephfs/
[22:02] <Psi-jack> Hmmm
[22:05] <rweeks> If I'm reading this correctly you can use the set_layout options on a file or directory
[22:05] <rweeks> which means you can then use the --pool option to set the pool for that file or directory
[22:05] <rweeks> but I would want one of the devs to confirm that
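If that reading is right, usage would look roughly like the following (pool name, pool ID and directory are hypothetical; some releases require all layout fields to be given, and the pool generally has to be registered with the MDS as a data pool first):

    ceph osd pool create cweb 128
    ceph osd dump | grep cweb                  # note the pool ID, say 5
    ceph mds add_data_pool 5                   # needed on newer releases
    cephfs /ceph/mount/cweb set_layout --pool 5 \
            --stripe_unit 4194304 --stripe_count 1 --object_size 4194304
    cephfs /ceph/mount/cweb show_layout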
[22:06] <Psi-jack> Odd.. I was thinking a mount option more so than anything. ;)
[22:06] <Psi-jack> The idea I was thinking of is, authentication-wise, having a different auid requirement for different pools, and of course different allocation, CRUSH rules, etc.
[22:06] <rweeks> yep
[22:06] <rweeks> so you'd specify pools for certain dirs that are exported
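A rough sketch of per-pool cephx capabilities for such a client (the client name and pool are hypothetical, and the exact capability syntax has shifted between releases):

    ceph auth get-or-create client.web \
            mon 'allow r' \
            mds 'allow' \
            osd 'allow rwx pool=cweb' \
            -o /etc/ceph/ceph.client.web.keyring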
[22:07] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Read error: Connection reset by peer)
[22:08] * silversurfer (~silversur@61.121.217.50) has joined #ceph
[22:08] <phantomcircuit> janos, there are methods to windows administration?
[22:08] <phantomcircuit> i always got the impression it was just "try this... nope try something else" in a giant loop
[22:08] <houkouonchi-work> nhm: I have made a bunch of modifications to this guy's software that is C# made for Windows, but I am running it on linux with mono =)
[22:08] <houkouonchi-work> that was fun since I am not really a developer, so learning C# =)
[22:09] <houkouonchi-work> it's been a good learning experience though
[22:09] <janos> phantomcircuit: sadly that's my general experience as well
[22:11] <Psi-jack> Blah.
[22:12] <Psi-jack> Love those last minute "I'm going to die" webserver issues, just before it's time to go home. ;)
[22:13] <Psi-jack> BBIAB
[22:17] * kYann (~Yann@did75-15-88-160-187-237.fbx.proxad.net) has joined #ceph
[22:17] * kYann (~Yann@did75-15-88-160-187-237.fbx.proxad.net) Quit ()
[22:18] * silversurfer (~silversur@61.121.217.50) Quit (Read error: Connection reset by peer)
[22:19] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[22:23] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[22:30] <nhm> houkouonchi-work: I wrote a raytracer in C# just to play around with it. Ended up porting it to C and making it into a photon mapping engine.
[22:30] <nhm> C++ rather
[22:31] <nhm> I'm a programmer, just a slow one with bad habits. ;)
[22:31] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[22:33] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:35] <Psi-jack> Okay, nice...
[22:36] <Psi-jack> Starting to get the hourly average reports in since the fix on my load-balanced webservers. Response times are starting to dip down to 550ms. :D
[22:38] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[22:38] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit ()
[22:38] * l0nk (~alex@38.107.128.2) Quit (Quit: Leaving.)
[22:44] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[22:46] <rweeks> sweet
[22:47] <Psi-jack> Very. :)
[22:47] <Psi-jack> heh, I'm half-pondering putting my databases on RBD volumes now. LOL
[22:48] <dmick> CEPH ALL THE THINGS
[22:48] <houkouonchi-work> just ceph-it
[22:49] * KYann (~KYann@did75-15-88-160-187-237.fbx.proxad.net) has joined #ceph
[22:49] <KYann> Hi !
[22:50] <rweeks> Psi-jack: that's a logical step. Since the RBDs are striped across multiple OSDs you should get good performance
[22:50] * yoshi (~yoshi@80.30.51.242) Quit (Remote host closed the connection)
[22:51] <KYann> I'm having trouble with the rados gateway. It timed out on init and won't start
[22:51] <KYann> my cluster is degraded but I don't understand why it is a problem for the radosgateway
[22:52] <KYann> I tried to increase the init timeout but radosgw still won't finish init
[22:52] <KYann> maybe some of you know what to do, or which log to look at in that case?
[22:53] * timmclaughlin (~timmclaug@69.170.148.179) has joined #ceph
[22:54] <nhm> Psi-jack: depending on the database, it can be tough. Some of them are single threaded with small IO sizes. That's painful on any distributed storage system.
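For what it's worth, carving out a dedicated RBD image for a database volume is only a few commands; a minimal sketch with made-up names and sizes (size is in MB, and the device may also show up as /dev/rbd/<pool>/<image> depending on udev rules):

    rbd create pgdata --size 20480        # 20 GB image in the default 'rbd' pool
    rbd map pgdata                        # typically appears as /dev/rbd0
    mkfs.xfs /dev/rbd0
    mount /dev/rbd0 /var/lib/postgresql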
[22:56] * rread (~rread@c-98-234-218-55.hsd1.ca.comcast.net) has joined #ceph
[22:59] <janos> curse, my latent urges to buy hardware have come back strong since discovering ceph
[22:59] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[23:00] <dmick> KYann: certainly the first thing I'd look at is the radosgw log
[23:00] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[23:09] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[23:15] * timmclaughlin (~timmclaug@69.170.148.179) Quit (Remote host closed the connection)
[23:19] <KYann> dmick: Is it debug ms = X to get debug info?
[23:20] <dmick> KYann: there are lots of debug flags possible; debug ms = X turns on debugging for the "messenger", which is basically the daemon-to-daemon message passing engine
[23:21] <dmick> first, do you have any radosgw output at all in the log?
[23:21] <KYann> none, only on console I get 2012-12-17 23:19:24.417814 7f94e2973700 -1 Initialization timeout, failed to initialize
[23:28] <dmick> does ceph -s report status OK?
[23:32] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Read error: Connection reset by peer)
[23:32] * silversurfer (~silversur@61.121.217.50) has joined #ceph
[23:32] <KYann> dmick status is not ok
[23:32] * noob2 (~noob2@ext.cscinfo.com) Quit (Quit: Leaving.)
[23:32] <KYann> but at least 50% of the OSDs are up
[23:33] <dmick> but ceph can connect to the cluster, so the cluster is at least listening
[23:33] * silversurfer (~silversur@61.121.217.50) Quit (Read error: Connection reset by peer)
[23:33] <KYann> yes
[23:33] <dmick> are you using auth?
[23:33] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[23:39] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[23:43] <KYann> yes !
[23:47] <dmick> sorry KYann; long enough delay that I get distracted and go do other things
[23:48] <dmick> so it's possible that the key for the radosgw client isn't properly installed?...I'd try turning on auth debug (debug auth = 20, try starting radosgw again, make sure radosgw has a log file you can find)
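A rough sketch of the kind of ceph.conf stanza that gives radosgw a findable log plus the auth/messenger debugging mentioned here (the section name depends on what the gateway actually runs as; client.radosgw.gateway is just an example):

    [client.radosgw.gateway]
            log file = /var/log/ceph/radosgw.log
            debug ms = 1
            debug auth = 20
            debug rgw = 20
            # rgw init timeout = 300      # optionally give init longer than the default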
[23:50] * roald_ (~roaldvanl@87.209.150.214) Quit (Ping timeout: 480 seconds)
[23:51] * jlogan1 (~Thunderbi@72.5.59.176) Quit (Read error: Connection reset by peer)
[23:52] * jlogan1 (~Thunderbi@72.5.59.176) has joined #ceph
[23:52] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Read error: Connection reset by peer)
[23:52] * silversurfer (~silversur@61.121.217.50) has joined #ceph
[23:57] * silversurfer (~silversur@61.121.217.50) Quit (Read error: Connection reset by peer)
[23:58] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.