#ceph IRC Log


IRC Log for 2012-08-06

Timestamps are in GMT/BST.

[0:41] * Leseb_ (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[0:46] * Leseb (~Leseb@ Quit (Ping timeout: 480 seconds)
[0:46] * Leseb_ is now known as Leseb
[1:12] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[1:13] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[2:13] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[3:05] * cattelan (~cattelan@2001:4978:267:0:21c:c0ff:febf:814b) Quit (Read error: Operation timed out)
[3:17] * cattelan (~cattelan@2001:4978:267:0:21c:c0ff:febf:814b) has joined #ceph
[3:56] * renzhi (~renzhi@ has joined #ceph
[5:03] * themgt (~themgt@cpe-74-78-71-246.maine.res.rr.com) has joined #ceph
[5:24] * deepsa (~deepsa@ has joined #ceph
[5:25] * themgt (~themgt@cpe-74-78-71-246.maine.res.rr.com) Quit (Quit: themgt)
[5:47] * bshah (~bshah@sproxy2.fna.fujitsu.com) has joined #ceph
[5:51] * MarkDude (~MT@c-98-210-253-235.hsd1.ca.comcast.net) has joined #ceph
[5:51] * izdubar (~MT@c-98-210-253-235.hsd1.ca.comcast.net) has joined #ceph
[6:05] * izdubar (~MT@c-98-210-253-235.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[6:37] * themgt (~themgt@cpe-74-78-71-246.maine.res.rr.com) has joined #ceph
[6:47] * themgt (~themgt@cpe-74-78-71-246.maine.res.rr.com) Quit (Quit: themgt)
[7:01] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[7:02] * s[X] (~sX]@ppp59-167-154-113.static.internode.on.net) has joined #ceph
[7:07] * tnt (~tnt@167.39-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[7:11] * cdf (~cdf@ has joined #ceph
[7:11] <cdf> hi, all
[7:18] <cdf> ceph -s shows my osds as 0,0,0, but service ceph -a stop shows that 2 osds are killed. can anyone tell me why? thanks
[7:49] * cdf (~cdf@ Quit (Quit: Leaving)
[8:08] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) has joined #ceph
[8:30] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[8:40] * tnt (~tnt@167.39-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[8:46] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[8:47] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[8:54] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[8:59] * Anticimex (anticimex@netforce.csbnet.se) Quit (Read error: Operation timed out)
[9:00] * cdf (~cdf@ has joined #ceph
[9:03] * Anticimex (anticimex@netforce.csbnet.se) has joined #ceph
[9:08] * MarkDude (~MT@c-98-210-253-235.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[9:09] * Leseb (~Leseb@ has joined #ceph
[9:10] * Leseb (~Leseb@ Quit (Remote host closed the connection)
[9:11] * Leseb (~Leseb@ has joined #ceph
[9:15] * s[X] (~sX]@ppp59-167-154-113.static.internode.on.net) Quit (Ping timeout: 480 seconds)
[9:18] * BManojlovic (~steki@ has joined #ceph
[9:23] * cdf (~cdf@ Quit (Quit: Leaving)
[9:23] * cdf (~cdf@ has joined #ceph
[9:23] * cdf (~cdf@ Quit ()
[9:36] * EmilienM (~EmilienM@arc68-4-88-173-120-14.fbx.proxad.net) has joined #ceph
[9:44] <tnt> Mmmm: "libceph: osd1 connect authorization failure" that doesn't sound too good
[10:08] * verwilst (~verwilst@ has joined #ceph
[10:21] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[10:32] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[10:50] * fc (~fc@ has joined #ceph
[11:12] * deepsa (~deepsa@ Quit (Quit: Computer has gone to sleep.)
[11:43] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[11:45] <tnt> If a host has two disks and is running 2 osd daemons, is there any benefit (or downside) in putting the journal of one osd on the data disk of the other?
[11:48] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[12:10] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[12:12] * deepsa (~deepsa@ has joined #ceph
[12:14] * benner_ (~benner@ Quit (Read error: Connection reset by peer)
[12:14] * benner (~benner@ has joined #ceph
[12:30] <exec> tnt: ceph balances writes. but you'll lose both osds in case of any disk failure. are you sure?
[12:31] <exec> tnt: I have 7 drives with 7 osd on each host. and iostat shows almost identical usage in time of massive writes
[12:37] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[12:37] <exec> ofc, writes will be different in real usage, however a disk failure that crashes both osds is not exactly what you want, right? )
[12:53] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[13:01] * renzhi (~renzhi@ Quit (Quit: Leaving)
[13:13] * deepsa (~deepsa@ Quit (Quit: Computer has gone to sleep.)
[13:35] * deepsa (~deepsa@ has joined #ceph
[13:38] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[13:38] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[13:44] <tnt> exec: well, what I'd want is for the disk not to crash :p
[13:45] <tnt> exec: but yeah, given that on 'average' both osds should end up with the same load, that wouldn't really benefit anything
[14:26] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[14:26] * nhm (~nh@174-20-105-46.mpls.qwest.net) has joined #ceph
[14:32] * deepsa (~deepsa@ Quit (Quit: Computer has gone to sleep.)
[14:33] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Quit: Leaving)
[14:34] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[14:39] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[15:01] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[15:07] * deepsa (~deepsa@ has joined #ceph
[15:13] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[15:17] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit ()
[15:34] * themgt (~themgt@cpe-74-78-71-246.maine.res.rr.com) has joined #ceph
[15:35] * themgt (~themgt@cpe-74-78-71-246.maine.res.rr.com) Quit ()
[15:58] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[16:06] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[16:19] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Quit: Ex-Chat)
[16:19] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[16:43] * stxShadow (~Jens@ip-78-94-238-69.unitymediagroup.de) has joined #ceph
[16:50] * loicd1 (~loic@brln-4dba8516.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[16:56] * deepsa (~deepsa@ Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[17:11] * nhorman_ (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[17:12] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Ping timeout: 480 seconds)
[17:18] * Tv_ (~tv@2607:f298:a:607:d976:71b0:669f:be18) has joined #ceph
[17:25] * loicd (~loic@brln-4dbac78d.pool.mediaWays.net) has joined #ceph
[17:32] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:33] * themgt (~themgt@cpe-74-78-71-246.maine.res.rr.com) has joined #ceph
[17:35] * verwilst (~verwilst@ Quit (Quit: Ex-Chat)
[17:50] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[17:53] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:55] * themgt (~themgt@cpe-74-78-71-246.maine.res.rr.com) Quit (Quit: themgt)
[17:56] * Leseb (~Leseb@ Quit (Quit: Leseb)
[18:00] * flakrat (~flakrat@eng-bec264la.eng.uab.edu) has joined #ceph
[18:01] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[18:03] <flakrat> Anyone know how to resolve this error when attempting to build Ceph 0.48 on CentOS6: "error: File /home/build/rpmbuild/SOURCES/libs3-trunk.tar.gz: No such file or directory"
[18:04] <flakrat> I found the same question asked in the mail archives, but no solution
[18:05] <Tv_> flakrat: sounds like it's actually trying to build libs3, there..
[18:06] <flakrat> yeah, because the spec file doesn't mention libs3 at all, so must be in the build scripts
[18:06] <Tv_> well ceph links against libs3
[18:07] <Tv_> radosgw parts do, that is
[18:07] <Tv_> you can disable building radosgw if you don't want to use it
[18:07] <Tv_> or you'll need to get libs3 built/installed
[18:07] <flakrat> Tv_, thanks, I'll work on getting libs3 installed
[18:08] <mikeryan> flakrat: if you're building out of git, you need to make sure you fetch all the submodules
[18:08] <mikeryan> git submodule update --init
[18:08] <Tv_> flakrat: our git repo has libs3... darn mike
[18:08] <Tv_> where's my tinfoil hat?
[18:09] <Tv_> though usually people building rpms start from .tar.g
[18:09] <Tv_> z
[18:10] * stxShadow (~Jens@ip-78-94-238-69.unitymediagroup.de) has left #ceph
[18:10] <flakrat> mikeryan, Tv_ I'll try building the rpm using the git tree next, after running the submodule update, I see libs3 in the tree now
[18:11] * tnt (~tnt@167.39-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:19] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:29] <flakrat> So, here's what I did to get rpmbuild working: 1. pull the submodules to get libs3 source, 2. tar up libs3 into libs3-trunk.tar.gz and place it in rpmbuild/SOURCES, 3. edit the ceph.spec file and replace package names "devel" and "fuse" with "ceph-devel" and "ceph-fuse", 4. build: rpmbuild ~/rpmbuild/SPECS/ceph.spec
[18:31] <flakrat> so far it's building without any errors :-)
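The tarball step flakrat describes can be sketched in Python as follows. This is only an illustration, not the project's build tooling: the function name `make_source_tarball`, the default `libs3-trunk` prefix used as the top-level directory inside the archive, and the example paths are all assumptions. What rpmbuild expects inside the archive depends on the spec file.

```python
import tarfile
from pathlib import Path


def make_source_tarball(src_dir, sources_dir, name="libs3-trunk"):
    """Pack src_dir into <sources_dir>/<name>.tar.gz.

    The archive's top-level directory is `name` (an assumption; check
    what the spec file's %setup step expects before relying on it).
    """
    src = Path(src_dir)
    out = Path(sources_dir) / f"{name}.tar.gz"
    out.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(out), "w:gz") as tar:
        # arcname renames the root of the tree inside the archive
        tar.add(str(src), arcname=name)
    return out
```

Hypothetical usage, with paths that are placeholders rather than anything taken from the log: `make_source_tarball("ceph/src/libs3", "/home/build/rpmbuild/SOURCES")`.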
[18:32] <Tv_> flakrat: hmm http://www.rpm.org/max-rpm/s1-rpm-inside-package-directive.html says it's supposed to do that "subpackage" thing automatically, but i'm no rpm expert
[18:34] <flakrat> hmm, I'll take a look at the link and see if I can't figure out why it's not doing that
[18:37] * bchrisman (~Adium@ has joined #ceph
[18:43] * adjohn (~adjohn@ has joined #ceph
[18:48] * Cube (~Adium@ has joined #ceph
[18:57] * dmick (~dmick@2607:f298:a:607:5f8:127f:6ff7:5407) has joined #ceph
[18:57] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:10] * chutzpah (~chutz@ has joined #ceph
[19:14] * elder (~elder@2607:f298:a:607:3c9c:52cb:843e:71a9) has joined #ceph
[19:16] <flakrat> strange, I copied the original spec file over my edited one and now it's not complaining about "package devel already exists", looks like the subpackage thing is working.
[19:28] <nhm> joshd: Mark mentioned you were doing some interesting investigation on friday with the filestore. Any news?
[19:29] <nhm> doh, sorry that was for Sam
[19:29] <sjust1> nhm: yeah, I think we are hitting a create bottleneck with that test, I'm putting together a real small random io tester
[19:31] <nhm> sjust1: yeah, it's almost certainly a create bottleneck.
[19:32] <sjust1> nhm, mikeryan: in the mean time, rbd would yield more realistic numbers
[19:32] <nhm> sjust1: did you see the create numbers from the ceph/gluster tests I did?
[19:32] <sjust1> create in ceph? that's going to be mds limited
[19:33] <nhm> sjust1: It would be interesting to compare it and see if it's actually the MDS.
[19:34] <sjust1> nhm: yeah, but at the moment we should probably nail down basic small io performance
[19:35] <nhm> sjust1: agreed, I don't want to spend much time on the MDS until we have the filestore in a place we like.
[19:35] <sjust1> yep
[19:35] <nhm> sjust1: Is your intention to run the create benchmark on the underlying filesystem?
[19:35] <sjust1> nhm: I'm not interested in creates at the moment
[19:36] <sjust1> rados will happily do mid-file partial overwrites, so I'm putting together a benchmark that'll do small writes across maybe 1000 objects
[19:39] <nhm> sjust1: what's the goal?
[19:39] <sjust1> that should be a very easy pattern for the osd, so to verify that it's fast
[19:39] <sjust1> it also is pretty similar to rbd
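The access pattern sjust1 describes (small partial overwrites spread over roughly 1000 existing objects, with no creates) might be generated along these lines. This is a sketch of the workload plan only, not the benchmark being written: the object naming, the 4 MB object size, the 4 KB write size and the function name are assumptions, and the actual writes against RADOS are deliberately left out.

```python
import random


def small_write_plan(num_objects=1000, object_size=4 * 1024 * 1024,
                     io_size=4096, num_ops=10000, seed=0):
    """Yield (object_name, offset, length) for random small overwrites.

    Assumes every object already exists at full size, so that the
    workload measures overwrites rather than creates.
    """
    rng = random.Random(seed)           # seeded so runs are repeatable
    blocks_per_object = object_size // io_size
    for _ in range(num_ops):
        obj = f"bench_obj_{rng.randrange(num_objects)}"
        # offsets are aligned to io_size and stay inside the object
        offset = rng.randrange(blocks_per_object) * io_size
        yield obj, offset, io_size
```

Each tuple would be handed to whatever client performs the write; pre-creating the objects once before the timed run keeps creates out of the measurement.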
[19:45] <sjust1> nhm: mikeryan is going to try to do some basic fio testing on rbd on a small plana cluster
[19:45] <mikeryan> hey, that's me
[20:01] * EmilienM (~EmilienM@arc68-4-88-173-120-14.fbx.proxad.net) Quit (Remote host closed the connection)
[20:06] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Quit: leaving)
[20:07] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[20:11] <dmick> sjust1: there's rbd-fsx, of course; not sure if it's flexible enough to be interesting
[20:11] <dmick> but it has a lot of options
[20:12] <mikeryan> dmick: what's the difference between rbd and rbd-fsx
[20:12] <mikeryan> i'm looking at those right now
[20:12] <dmick> by rbd-fsx I mean the test where we took the fsx test and hacked it to work against rbd
[20:12] <dmick> let me find its actual name
[20:13] <dmick> source in test/rbd/fsx.c
[20:13] <dmick> binary: test_librbd_fsx
[20:13] * danieagle (~Daniel@ has joined #ceph
[20:13] <mikeryan> Copyright (C) 1991, NeXT Computer, Inc. All Rights Reserved.
[20:14] <mikeryan> does this operate at the fs level?
[20:14] <mikeryan> sjust1's idea was to use fio to benchmark the raw block device
[20:15] <joshd> test_librbd_fsx is block level
[20:16] <sjust1> that also sounds good
[20:16] <joshd> what are you trying to measure?
[20:16] <sjust1> joshd: small io without creates
[20:17] <joshd> fsx can do small writes, but you'd need to change it to write a zero to each object to create them all
[20:18] <sjust1> joshd: meh, it's probably ok as long as it runs for a while
[20:20] <mikeryan> running a smoke test now on three plana machines, i'll let you know how it goes
[20:20] <joshd> if you're just measuring throughput in the osd that'll be fine
[20:20] <joshd> it's a stress test, and doesn't try to measure performance
[20:21] <mikeryan> hrm, i believe we're going to instrument FileStore to provide some direct measurements
[20:22] <joshd> that'd be fine then
[20:22] <joshd> just turn off hole punching (-H) so objects aren't deleted
[20:23] <mikeryan> is that configurable through the rbd_fsx teuth task?
[20:23] <mikeryan> doesn't seem to be, but it's easy enough to add
[20:31] * Cube (~Adium@ Quit (Ping timeout: 480 seconds)
[20:31] * Cube (~Adium@ has joined #ceph
[20:33] * EmilienM (~EmilienM@arc68-4-88-173-120-14.fbx.proxad.net) has joined #ceph
[20:36] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[20:39] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[20:55] * tnt_ (~tnt@45.124-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:57] * tnt (~tnt@167.39-67-87.adsl-dyn.isp.belgacom.be) Quit (Read error: Connection reset by peer)
[21:07] * BManojlovic (~steki@ has joined #ceph
[21:08] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[21:12] * JohnS50 (~quassel@71-86-129-2.static.stls.mo.charter.com) has joined #ceph
[21:13] <JohnS50> I have a question about ceph & iscsi. Is this an appropriate place to discuss it?
[21:23] <dmick> JohnS50: sure, go ahead
[21:24] <JohnS50> I am hoping to get a little redundant iscsi drive created
[21:25] <JohnS50> i have 3 linux boxes running the monitor part and 2 running the osd
[21:25] <dmick> and you want to use the Ceph cluster to back an iscsi instance?
[21:25] <JohnS50> i was reading about iscsi targets and saw 1 article that said to expose the rbd to the iscsi target
[21:26] <JohnS50> i saw a different article that said that using the kernel rbd driver on the osd was BAD
[21:26] <JohnS50> yes, I want to use the ceph cluster as the back end
[21:27] <JohnS50> I'm getting a little confused over what iscsi needs - does it need the ceph file system or the rbd block device?
[21:28] <joshd> why do you need iscsi specifically?
[21:28] <dmick> I believe you *can* do that with a Ceph cluster; I'm not sure it gains you a lot over standard software raid, but it should be possible. In this context, iscsi would be the software target daemon running against some background storage. But yeah, what joshd said is probably relevant
[21:29] <joshd> can you not use rbd directly on your clients?
[21:29] <JohnS50> why does a dog lick --nevermind -- i want to connect the storage to a vmware host
[21:30] <JohnS50> I don't know about using rbd directly. help?
[21:31] <dmick> so rbd just creates a normally-usable kernel block device
[21:31] <dmick> (or non-kernel, but for your usage case kernel is probably the right answer)
[21:31] <dmick> that talks to the Ceph cluster instead of talking to a device directly
[21:31] * pentabular (~sean@adsl-71-141-229-174.dsl.snfc21.pacbell.net) has joined #ceph
[21:31] <dmick> so, it's "just a disk", and anything that uses "just a disk" can use it
[21:32] <JohnS50> I saw 1 article that said using the kernel mode driver on the same machine as the osd service was bad - it didn't say why
[21:32] <JohnS50> Any ideas on that? or just bad info?
[21:32] <JohnS50> kernel mode rbd driver
[21:33] <dmick> do you need to run the VMs on some/all of the same machines as the Ceph cluster?
[21:33] <JohnS50> vmware host is a separate machine
[21:33] <dmick> so the kernel mode driver will run on that machine
[21:33] <JohnS50> I was hoping to get some iscsi, redundant storage for it, cheaply
[21:34] <dmick> and provide a block device abstraction as a locally-visible disk for that machine
[21:34] <dmick> through krbd
[21:34] <dmick> krbd will then contact the cluster across the network
[21:35] <JohnS50> you lost me. The vmware host just uses iscsi
[21:35] <dmick> the vmware host doesn't have to use iscsi
[21:35] <dmick> It can surely use a local disk too, right?
[21:36] <JohnS50> vmware can use a local disk, but I don't have enough space, so I wanted to just attach something with iscsi
[21:36] <dmick> right
[21:36] <dmick> but to access Ceph, you don't have to use iscsi; Ceph has its own network protocol for accessing remote storage
[21:36] * MarkDude (~MT@c-98-210-253-235.hsd1.ca.comcast.net) has joined #ceph
[21:36] <dmick> so if you use Ceph's block device to give the appearance of a local disk
[21:37] <dmick> vmware, or anything else (ext3, dd) can use it just as though it were a local disk
[21:37] <dmick> even though it's really accessing across the network
[21:37] <dmick> no need for iscsi
[21:37] <JohnS50> but you would need something on the vmware machine to let it talk to ceph
[21:37] <dmick> and that something is the rbd driver
[21:38] <JohnS50> does vmware come with an rbd driver?
[21:38] <dmick> no, ceph does.
[21:38] <dmick> say you install rbd on the vmware machine.
[21:38] <JohnS50> that's the hard part
[21:38] <dmick> then create an rbd image. That image will show up as /dev/rbd0.
[21:38] <dmick> but none of its storage will live on the vmware machine.
[21:39] <JohnS50> it's esxi - I don't know that you can install an rbd driver for it
[21:40] <dmick> that might be different; looking at the architecture of esxi
[21:41] <JohnS50> I know iscsi works on vmware. The fewer drivers I need to install the better off I'll be.
[21:41] <dmick> oh, ESXi is the thing that *is* the operating system; sorry, I was thinking of VMWare server
[21:42] <JohnS50> sorry - newest version of esxi running on bare metal
[21:45] * jamespage (~jamespage@tobermory.gromper.net) Quit (Quit: Coyote finally caught me)
[21:46] <JohnS50> so esxi (iscsi) -> ubuntu iscsi target ->rbd (kernel mode ????) does anyone do that? Any good docs on setting it up?
[21:48] <joshd> people have done that, but I'm not aware of any docs about it specifically
[21:49] <JohnS50> ok, well if no one knows of a specific problem with kernel mode rbd on the ceph machine, I'll try it and see what happens.
[21:50] <joshd> it's possible to run into a deadlock with a kernel client on the same machine as the osds (cephfs or rbd)
[21:50] <gregaf> it should be fine if it's co-located with the monitor, though -- and that's a "lighter" daemon anyway
[21:50] <joshd> it's an inherent limitation of kernel clients (nfs etc have the same problem)
[21:52] <JohnS50> so if I run osd and monitor and rbd all on 1 machine it might work?
[21:53] <joshd> it might, but it's safer to separate the osd and the rbd kernel client
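The placement rule joshd and gregaf describe (keep kernel clients off OSD hosts; sharing a host with a monitor is fine) can be written as a small check. This is a sketch under stated assumptions: the host names and the role labels `"osd"`, `"mon"` and `"kernel-client"` are made up for illustration and are not Ceph terminology or configuration.

```python
def risky_hosts(layout):
    """Return the hosts that run both an OSD and a kernel client.

    layout maps a host name to the set of roles placed on it.
    Role names here are hypothetical labels, not Ceph config keys.
    """
    return sorted(
        host for host, roles in layout.items()
        if "osd" in roles and "kernel-client" in roles
    )


# Example layout: the iSCSI target host maps rbd through the kernel
# client and shares a machine with a monitor, but not with an OSD.
example = {
    "box1": {"mon", "osd"},
    "box2": {"mon", "osd"},
    "box3": {"mon", "kernel-client"},
}
```

With the example above `risky_hosts(example)` returns an empty list; moving the kernel client onto `box1` would flag that host.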
[21:54] * nhm (~nh@174-20-105-46.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[21:55] <JohnS50> is there a non-kernel rbd client? and if so, is it worth using or is kernel mode way better?
[21:55] <joshd> there is, you can use it through qemu/kvm
[21:56] <JohnS50> ok - thanks for the info!
[21:57] <JohnS50> I'll probably be back in a few days.....
[21:57] <dmick> yeah, sorry, that was me being unclear on exactly what ESXi is; I should know better by now to not assume about VMware. Anyway, yeah, either run rbd on the monitor or
[21:57] <dmick> run it on a separate machine; either should be OK
[21:57] <dmick> and, the monitors and the OSDs can share machines, btw
[21:58] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[21:58] <JohnS50> no problem - it's hard keeping the versions of vmware straight
[21:58] <dmick> with the obvious warnings about redundancy
[22:01] <JohnS50> Thanks again for helping new users with some really cool tech.
[22:01] <JohnS50> have a good day
[22:02] <dmick> you too!
[22:06] * JohnS50 (~quassel@71-86-129-2.static.stls.mo.charter.com) has left #ceph
[22:21] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[22:32] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has joined #ceph
[22:37] <Tv_> FYI: as far as i can tell, sepia lab is having an internal network partition going on
[22:37] <Tv_> trying to pinpoint where, but e.g. all vercoi and some planas are accessible
[22:37] <Tv_> please refrain from running tests right now
[22:38] <dmick> I just updated keys on planas
[22:38] <dmick> only a few were unresponsive then. many more now
[22:38] <Tv_> this is more, plana02 can't reach the dhcp server
[22:40] <dmick> yes, same on 96
[22:40] <dmick> 96 observes everyone ARPing like mad
[22:44] * nhorman_ (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:44] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) Quit (Ping timeout: 480 seconds)
[22:55] <Tv_> all of plana & burnupi will reboot, to renew dhcp leases
[22:55] <Tv_> hooray ISC
[22:56] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has joined #ceph
[22:58] * themgt (~themgt@cpe-74-78-71-246.maine.res.rr.com) has joined #ceph
[23:07] * mpw (~mpw@chippewa-nat.cray.com) has joined #ceph
[23:10] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[23:13] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[23:17] <gregaf> Tv_: so vercoi networking should be good now?
[23:18] <Tv_> gregaf: vercoi themselves have been good all the time; individual vms may have trouble
[23:18] <Tv_> will make a round after plana & burnupi
[23:18] <gregaf> okay
[23:19] <gregaf> I can get back into one that was busted before -- what went wrong?
[23:24] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:26] <mgalkiewicz> hi is it normal that mon takes up to 9.5GB of ram?
[23:26] <mikeryan> Tv_: i've quaffed a few beers with vixie
[23:26] <mikeryan> he's about as proud of BIND and ISC-dhcp as you'd expect
[23:29] <mgalkiewicz> and the next question: is it possible to force mon to listen on many ips, and which osd's ip is reported to the clients?
[23:31] <dmick> mgalkiewicz: that sounds kind of excessive, but it depends
[23:32] <mgalkiewicz> I have 3 mons and 2 osds in my configuration and ceph status reports: pgmap v808564: 432 pgs: 432 active+clean; 7084 MB data, 16002 MB used, 2458 GB / 2477 GB avail
[23:32] <mgalkiewicz> so it is not a big cluster
[23:32] * themgt (~themgt@cpe-74-78-71-246.maine.res.rr.com) Quit (Quit: themgt)
[23:33] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:33] <joshd> that's a very large epoch number - the monitors might be holding a bunch of old maps around (gregaf?)
[23:34] <mgalkiewicz> epoch number? could you explain a little bit?
[23:34] <dmick> as for which OSD's IP: the OSDs vary per object/pg, so different ones will be reported for different object accesses
[23:34] <dmick> monitors on multiple IPs...not sure
[23:34] <joshd> mgalkiewicz: the pgmap v808564 - that's >800k maps
[23:35] <mgalkiewicz> now it is v808603
[23:35] <mgalkiewicz> after a minute
[23:35] <mgalkiewicz> and map represents what?
[23:36] <joshd> hang on, I have to figure out if that's the pgmap or the osdmap epoch
[23:37] <joshd> if it's the osdmap, it's the cluster state, i.e. which osds are in/out, up/down, and the crushmap
[23:37] <mgalkiewicz> so each epoch is a different cluster state?
[23:39] <joshd> ok, that's the pgmap, so it's not surprising that it's high
[23:39] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Read error: Connection reset by peer)
[23:40] <mgalkiewicz> I don't know if you remember but last time I had some problems with cephx entries in the log which indicated that clients cannot be authorised because of "cephx: verify_authorizer could not get service secret for service osd secret_id=0"
[23:40] <mgalkiewicz> I would like to restart the cluster (especially both osds) but I need to be sure that it is safe
[23:40] <mgalkiewicz> and will not cause any downtime
[23:40] <joshd> the pgmap stores the state (active, degraded, etc) and usage information for each pg
[23:40] <mgalkiewicz> in staging cluster the number does not increase with every invocation
[23:41] <joshd> the usage info is updated relatively frequently, so it's not surprising that the pgmap is at a high epoch number
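As a rough worked example of how quickly the pgmap version moves, the two values mgalkiewicz quoted (v808564, then v808603 "after a minute") give the rate below. The 60-second interval is an assumption, since the log only says "a minute", and the function name is illustrative.

```python
def epochs_per_minute(v_start, v_end, elapsed_seconds):
    """Average number of pgmap versions published per minute."""
    return (v_end - v_start) * 60.0 / elapsed_seconds


# Versions quoted in the channel; 60 s is an assumed interval.
rate = epochs_per_minute(808564, 808603, 60)
```

At a pace of a few dozen versions per minute, a version number in the hundreds of thousands is reached within weeks, which fits joshd's point that a high pgmap version is unremarkable.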
[23:41] <mgalkiewicz> ok
[23:41] <mgalkiewicz> what about ram usage on node with mon
[23:42] <joshd> that's definitely not normal
[23:42] <mgalkiewicz> mon logs do not provide any interesting information
[23:42] <joshd> I wonder if it's related to your earlier issue with the authorizer?
[23:43] <mgalkiewicz> only osds report problems with authentication
[23:43] <joshd> but it could potentially trigger a memory leak in the monitor
[23:44] <mgalkiewicz> I would like to fix this as soon as possible because the server is going to be out of memory
[23:45] <mgalkiewicz> especially the ones with osds
[23:45] <joshd> in any case, it is safe to restart them (journal will be replayed on the osds)
[23:45] <mgalkiewicz> ok I will restart the first one
[23:48] <dmick> nhm: ping
[23:48] <mgalkiewicz> joshd: hmm osd.0 which was restarted reports some slow requests in logs
[23:49] <joshd> if it's still replaying the journal requests will be queued
[23:50] <mgalkiewicz> now it is up and in
[23:53] <mgalkiewicz> I assume that it is safe to restart the other one "pgmap v808916: 432 pgs: 432 active+clean; 7084 MB data, 14667 MB used, 2459 GB / 2477 GB avail"
[23:54] <joshd> yup
[23:54] <joshd> all active+clean gives you maximum availability when restarting
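The check mgalkiewicz did by eye before restarting the second osd (every pg reported as active+clean) could be scripted roughly as below. This is a sketch that assumes the summary line has the shape quoted in the channel, with multiple states separated by commas before the semicolon; other Ceph versions may print it differently, and the function names are illustrative.

```python
import re

# Matches e.g. "pgmap v808916: 432 pgs: 432 active+clean; ..."
PGMAP_RE = re.compile(
    r"pgmap v(?P<version>\d+): (?P<total>\d+) pgs: (?P<states>[^;]+);"
)


def parse_pgmap(line):
    """Return (version, total_pgs, {state: count}) from a status line."""
    m = PGMAP_RE.search(line)
    if m is None:
        raise ValueError("no pgmap summary found")
    states = {}
    for part in m.group("states").split(","):
        count, state = part.strip().split(" ", 1)
        states[state] = int(count)
    return int(m.group("version")), int(m.group("total")), states


def all_active_clean(line):
    """True only if every pg is counted under 'active+clean'."""
    _, total, states = parse_pgmap(line)
    return states.get("active+clean", 0) == total
```

Feeding it the line quoted at 23:53 gives version 808916, 432 pgs, and a true result from `all_active_clean`, which is the condition joshd confirms before the second restart.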
[23:55] * glowell (~Adium@2607:f298:a:607:e879:58df:6663:9d8a) has joined #ceph
[23:56] * pentabular (~sean@adsl-71-141-229-174.dsl.snfc21.pacbell.net) has left #ceph
[23:56] <mgalkiewicz> thats what I thought
[23:56] <mgalkiewicz> joshd: could you explain one last thing to me
[23:56] <mgalkiewicz> is it possible to force mon to listen on many ips?
[23:57] <joshd> no, they only have one ip
[23:57] <joshd> why do you want multiple?
[23:58] <mgalkiewicz> because I am changing the infrastructure a little bit and now all servers have 2 network interfaces
[23:59] <mgalkiewicz> old clients are using old public ip
[23:59] <mgalkiewicz> and I want
[23:59] <mgalkiewicz> mon to listen on the private one for new clients
[23:59] <mgalkiewicz> but there is a period when both old and new clients need access

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.