#ceph IRC Log


IRC Log for 2012-11-20

Timestamps are in GMT/BST.

[0:00] <gregaf> 0.48 is an LTS, although it's nearing the end of its supported life as Bobtail (v0.55) will be out soon
[0:01] <gregaf> either v0.54 or v0.48 will upgrade nicely to v0.55, and v0.48 has a lot more testing
[0:01] * timmclaughlin (~timmclaug@69.170.148.179) Quit (Remote host closed the connection)
[0:01] <gregaf> so I'd stick with that for production
[0:01] <met> ok, thanks
[0:04] * paravoid (~paravoid@scrooge.tty.gr) has joined #ceph
[0:08] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:08] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:20] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:21] * met (~met@bl18-229-91.dsl.telepac.pt) has left #ceph
[0:21] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:24] <nwatkins> gregaf: you around?
[0:25] * vjarjadian (~IceChat7@5ad6d001.bb.sky.com) has joined #ceph
[0:25] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[0:25] <vjarjadian> hi
[0:25] <gregaf> nwatkins: yep
[0:27] * glowell (~glowell@ip-64-134-128-132.public.wayport.net) has joined #ceph
[0:28] <nwatkins> gregaf: my client log looks a bit odd with that patch from sage. An fstat following an open(O_CREAT) on some file shows the mtime on the MDS. An lstat shortly after on the same file shows the local node's time. Some time later, lstat on the same file begins to return the MDS time as expected. Is there any way the MDS might return inconsistent data like this?
[0:29] <gregaf> nwatkins: this is all with the hadoop bindings?
[0:29] <nwatkins> gregaf: yeh, it is through hadoop, debug client = 20
[0:29] <gregaf> the MDS…really shouldn't; I can't come up with a mechanism although there might be one somewhere :/
[0:31] <nwatkins> gregaf: mind having a glance at this log to see if I'm interpreting it correctly? if it's actually weird, I'll poke around deeper
[0:33] * vjarjadian (~IceChat7@5ad6d001.bb.sky.com) Quit (Ping timeout: 480 seconds)
[0:36] <gregaf> nwatkins: I'm pretty busy but if you put it somewhere accessible I'll glance over it
[0:36] <nwatkins> gregaf: thanks. I'll stick the log up with some annotations.
[0:37] * jjgalvez (~jjgalvez@12.248.40.138) Quit (Quit: Leaving.)
[0:44] * nwatkins (~Adium@soenat3.cse.ucsc.edu) has left #ceph
[0:50] * CristianDM (~CristianD@host165.186-108-123.telecom.net.ar) has joined #ceph
[0:52] <CristianDM> Is it possible to use ceph 0.54 in the server nodes (mon/osd) and 0.48.2 inside the qemu clients?
[0:54] <gregaf> that shouldn't be a problem, CristianDM
[0:55] <CristianDM> Thanks. I will try
[0:55] <CristianDM> :D
[0:56] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[1:05] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[1:10] * BManojlovic (~steki@212.69.20.139) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:11] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[1:13] * xiaoxi (~xiaoxiche@134.134.139.76) has joined #ceph
[1:15] <xiaoxi> Hello everyone~ I got ~500MB/s sequential write performance for 8 nodes (24 SATA as OSDs, 8 SSDs as journals). Although 500MB/s seems to match my SSDs' write performance (130MB/s*8/2). But:
[1:17] <xiaoxi> 1. I didn't find throttle info for the journal in the log (enabled debugging)
[1:17] <xiaoxi> 2. when I tune filestore queue max ops = 5000
[1:17] <xiaoxi> filestore queue max bytes = 104857600
[1:17] <xiaoxi> filestore max sync interval = 120
[1:17] <xiaoxi> filestore min sync interval = 10
[1:19] <xiaoxi> the performance actually increases (from 410MB/s to 480MB/s)
[1:19] <xiaoxi> I assume performance is not bounded by the journal, right?
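
For reference, a minimal sketch of how the filestore settings xiaoxi lists above would sit in ceph.conf; the values are the ones quoted in the chat, and placing them in the [osd] section is an assumption:

    [osd]
        filestore queue max ops = 5000
        filestore queue max bytes = 104857600
        filestore max sync interval = 120
        filestore min sync interval = 10
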
[1:19] <sjust> is this xfs?
[1:20] <sjust> performance is basically bounded by journal
[1:20] <sjust> so that's three sata/ssd?
[1:20] <sjust> and replication 2?
[1:20] <sjust> ah, I see
[1:20] <xiaoxi> no,btrfs is used.
[1:20] <sjust> your ssd is only good for 130 MB/s?
[1:21] <sjust> rbd?
[1:21] <xiaoxi> in each node, 3 sata +1 ssd are used.
[1:21] <sjust> kernel client?
[1:21] <xiaoxi> yes,rbd
[1:21] <xiaoxi> well,I tried both kernel client and qemu,almost the same
[1:21] <sjust> yeah, performance will be limited by your ssd journal in this case
[1:22] <xiaoxi> but if this is true, I suppose I could see some throttle information in the log, right?
[1:22] <sjust> not necessarily
[1:22] <sjust> the client will only keep some number (16 by default?) of outstanding ops at a time, so it won't send op 17 until op 1 ends
[1:24] <xiaoxi> I got your idea~ I will try a better SSD... BTW, SSDs' sequential write performance is usually not as good as we assume.
[1:24] <sjust> yeah, true
[1:24] <xiaoxi> My ssd is Intel 330 Series
[1:24] <sjust> ah
[1:24] * adjohn (~adjohn@69.170.166.146) Quit (Quit: adjohn)
[1:24] <sjust> some of the newer intel ones were, I thought, good for >300
[1:24] * gucki (~smuxi@80-218-125-247.dclient.hispeed.ch) Quit (Remote host closed the connection)
[1:25] <xiaoxi> From the spec it seems 400+, and I actually have some. Will try it~
[1:28] <xiaoxi> Another question: why is random read performance not as good as I assumed? (Since I have 24 spindles, I assumed 2400+ IOPS for 4K RW, but less than 1K IOPS is measured)
[1:29] <xiaoxi> Although 2 replicas could be an explanation, with the journal's help I really hoped it would be significantly better than 24*100
[1:30] <sjust> the journal only helps with short bursts
[1:31] <sjust> Our testing seems to suggest that you can expect around 60-80 iops/spindle
[1:31] <sjust> and replication knocks a factor of two off of that
[1:31] <sjust> though that depends on the filesystem
[1:32] <xiaoxi> I tried to tune the sync time and queue length of the filestore, but there isn't a noticeable increase
[1:32] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:32] <sjust> how are you measuring random write?
[1:33] <sjust> oops
[1:33] <sjust> you meant random read
[1:33] <sjust> yeah, replication doesn't matter in that case, you should be seeing better than 1k
[1:33] <sjust> the sync time and queue length will only matter for writes
[1:33] <sjust> actually, decreasing sync time will probably hurt reads
[1:33] <xiaoxi> both random read & write. read is acceptable, it is around 80*24 = ~2K iops
[1:34] <sjust> not by a lot
[1:34] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:34] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:34] <xiaoxi> write is ~ 40 *24=960
[1:34] <sjust> you are seeing 2k iops reads?
[1:34] <gregaf> xiaoxi: how did you build this cluster?
[1:34] <xiaoxi> yes, and I cleaned the pagecache before the test
[1:34] <sjust> xiaoxi: cool
[1:35] <gregaf> your numbers aren't atrocious but you would also see lower counts if you don't have enough PGs
[1:35] <sjust> yeah, 960 iops is pretty close to reasonable
[1:35] <gregaf> oh, n/m, I also missed the total counts
[1:36] <xiaoxi> My test method is: create 24 rbd volumes, mount them (through the kernel driver or qemu) in 24 VMs, run aiostress inside the VMs and collect the data.
[1:36] <sjust> is this 0.54 or argonaut?
[1:36] <xiaoxi> 0.54 and argonaut show quite a bit of difference in performance
[1:36] <sjust> these numbers were from 0.54?
[1:36] <xiaoxi> yes
[1:37] <sjust> how much worse was argonaut
[1:38] <xiaoxi> 1 ~ 2 IOPS per volume, that means 30~50 IOPS in total...
[1:38] <sjust> hmm...
[1:39] <xiaoxi> ah,you are hoping 0.54 shows much better result?
[1:39] <sjust> well, it's nice when newer versions are faster
[1:40] <sjust> you mean argonaut was only able to do 30-50 total iops on this cluster?
[1:40] <sjust> that sounds more like a bug
[1:40] <xiaoxi> no, I mean argonaut is only 50 IOPS worse than 0.54
[1:40] <sjust> ah, ok
[1:40] <sjust> that is closer to what I would have expected
[1:42] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:42] <xiaoxi> But why can't the journal help under continuous pressure? If my journal is big enough, I think it would really help merge small requests into big ones
[1:43] <sjust> somewhat, but with a sufficiently random workload, it won't help enough
[1:43] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[1:43] * ajm (~ajm@adam.gs) has left #ceph
[1:43] <gregaf> xiaoxi: it's not merging small requests into big ones — the journal does sequential writes but it still needs to flush those out into the regular disk and its regular filesystem
[1:44] <gregaf> so it's a burst buffer, and it gives the filesystem enough time to optimize requests, but it's not going to dramatically change the random IOPs you can obtain
[1:44] <xiaoxi> Even with small sequential writes, it helps but not that much. In theory, with the help of the journal, 4K/128K/4M sequential writes should show similar performance, right?
[1:45] <sjust> yeah, it should help with sequential small writes
[1:45] * adjohn (~adjohn@69.170.166.146) has joined #ceph
[1:46] <sjust> there is a fair amount of per-op overhead, so I would not expect 4k to be quite as fast
[1:46] <sjust> how close are 128k and 4M?
[1:47] <xiaoxi> 4K is around 2.5MB/s*24,8K is around 3.75MB/s*24,128K ~ 12.5MB/s*24
[1:47] <sjust> how about 4MB?
[1:48] <sjust> aiostress is likely doing directio
[1:48] <sjust> if you were working on top of a filesystem in the vm, the page cache should take care of this for you
[1:49] <xiaoxi> gregaf: Thanks, I get your point. Well, it is really surprising to me that I measured 2K+ IOPS for random write on Amazon's standard EBS
[1:49] <xiaoxi> 4M is ~ 20MB/s *24
[1:50] * The_Bishop (~bishop@p4FCDF0D3.dip.t-dialin.net) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[1:50] <xiaoxi> yes, aiostress is doing directIO and I disabled the qemu-cache
[1:50] <xiaoxi> not really, I run aiostress on top of the raw disk
[1:51] <xiaoxi> raw rbd volume actually
[1:51] <xiaoxi> and I have done dd for the rbd volume before the test
[1:52] <sjust> yeah
[1:53] * tnt (~tnt@162.63-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:57] <xiaoxi> so from the data, 4K/128K are still far from 4M's performance.
[1:57] <sjust> xiaoxi: yes, we need to reduce per-op overhead
[1:58] <sjust> but most use-cases run on top of a filesystem in the guest os and benefit from the guest os pagecache
[1:58] <sjust> which would erase the difference between 4k and 4M sequential io
[2:00] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[2:00] * yoshi (~yoshi@p11251-ipngn4301marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:02] <xiaoxi> There are some kinds of workload, such as DB workloads, that do small sequential writes for the DB journal and flush (sync) them very often.
[2:02] <sjust> yeah, that's true
[2:02] <xiaoxi> In these scenarios, the filesystem and qemu writeback cache cannot provide any help.
[2:02] <sjust> right
[2:02] <sjust> to help that, we simply have to improve small write latency
[2:03] <sjust> or complicatedly improve small write latency :)
[2:03] <xiaoxi> But on a traditional SATA disk, 4K sequential write is fast enough, almost the same as 4M... this is what this kind of workload expects to see..
[2:04] <sjust> only with a deep queue
[2:04] <joshd> xiaoxi: to help that journal workload specifically, you can use a smaller strip size (with recent rbd this can be done while still having larger objects)
[2:04] <joshd> stripe size even
[2:05] <xiaoxi> joshd: Can I tune this via configuration? And can I tune object size also?
[2:05] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[2:06] <joshd> xiaoxi: yes, take a look at ceph.com/docs/master/dev/file-striping/ for the details of the striping
[2:07] <joshd> you decide the stripe unit, stripe count, and object size at rbd creation time
[2:08] <joshd> that's in the next branch, not in 0.54 though
[2:10] <xiaoxi> hmm.. looking forward to 0.55~
[2:11] <xiaoxi> you mean I can modify these configurations (object size, striping mechanism) in the next release, not in 0.54?
[2:11] <joshd> that's right
[2:12] <xiaoxi> Cool, I am hoping larger object size + small stripe size could help with both small and big sequential writes
[2:12] <joshd> you can still change stripe size in 0.54, but only because it's treated as the same as object size
[2:12] <xiaoxi> Can I change object size in 0.54?
[2:12] <xiaoxi> I would like to test it first
[2:13] <joshd> yeah, the --order option to the rbd tool
[2:14] <xiaoxi> rbd create test_volume 30720 --order 16?
[2:14] <xiaoxi> hmm.16<<20
[2:15] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[2:16] <joshd> that would be -s 30720, but that would give you 64k objects (1 << order) bytes
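
A hedged sketch of the commands being discussed: on 0.54 only the object size can be chosen via --order (object size = 1 << order bytes), while the separate stripe settings belong to the later striping support joshd mentions; the --stripe-unit/--stripe-count flag names are assumed from post-0.54 rbd releases:

    # 0.54: a 30 GB image with 64 KB objects (1 << 16 bytes)
    rbd create test_volume -s 30720 --order 16

    # 0.54 default object size: 4 MB (1 << 22 bytes)
    rbd create test_volume -s 30720 --order 22

    # later releases (striping v2, assumed flags): large objects, small stripe unit
    rbd create test_volume -s 30720 --order 22 --stripe-unit 65536 --stripe-count 8
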
[2:16] <elder> Is teuthology.front.sepia.ceph.com down?
[2:16] <elder> Or do I have a problem at my end?
[2:16] <joshd> elder: I can reach it
[2:16] <elder> OK.
[2:16] <elder> What IP do you have?
[2:17] <elder> (I mean what is the ip of teuthology.front.sepia.ceph.com)
[2:17] <rweeks> can't ping it from here, but I'm not on vpn
[2:19] <xiaoxi> joshd: thanks a lot~
[2:19] <xiaoxi> sjust: thanks for your kind help
[2:19] <joshd> you're welcome
[2:20] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[2:20] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[2:50] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[3:12] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[3:12] * loicd (~loic@magenta.dachary.org) has joined #ceph
[3:20] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[3:25] <xiaoxi> How does Ceph choose the sync time? I know there is a range [min_sync_interval, max_sync_interval] that can be configured, but how does Ceph decide the actual sync time within this range?
[3:26] <gregaf> xiaoxi: you'll need to wait for sjust or sage to give a good answer I think
[3:27] <gregaf> but basically there are certain things the OSD does that can trigger an early sync (ie, need to coalesce to make a pool snapshot; have dirtied too much data) as long as the min_time hasn't passed; otherwise it will wait until the max time
[3:27] <gregaf> iirc
[3:27] <gregaf> I've gotta run though, cya tomorrow
[3:27] * Ryan_Lane (~Adium@216.38.130.167) Quit (Quit: Leaving.)
[3:31] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[3:49] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[3:55] * glowell (~glowell@ip-64-134-128-132.public.wayport.net) Quit (Quit: Leaving.)
[4:05] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[4:06] * CristianDM (~CristianD@host165.186-108-123.telecom.net.ar) Quit ()
[4:16] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[4:17] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[4:19] * chutzpah (~chutz@199.21.234.7) Quit (Quit: Leaving)
[4:25] * cclien_ (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) has joined #ceph
[4:25] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) Quit (Read error: Connection reset by peer)
[4:34] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[4:50] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[4:50] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[4:57] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[5:20] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[5:20] * loicd (~loic@magenta.dachary.org) has joined #ceph
[5:23] * glowell (~glowell@ip-64-134-128-132.public.wayport.net) has joined #ceph
[5:26] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[5:26] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[5:28] * deepsa (~deepsa@122.172.159.35) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[5:37] * glowell (~glowell@ip-64-134-128-132.public.wayport.net) Quit (Quit: Leaving.)
[5:38] <dec_> how much memory should an OSD process use?
[5:39] <dec_> ours (currently running 6 OSDs per server) are using 1.1GB resident memory and over 8GB of virtual memory, per OSD
[5:39] <dec_> the 1.1GB is easily explained with a 1.1GB OSD journal configured
[5:40] <dec_> but why are they using so much virtual mem?
[5:41] * joshd1 (~jdurgin@2602:306:c5db:310:9011:885f:57da:3c7e) has joined #ceph
[5:44] * deepsa (~deepsa@122.172.159.35) has joined #ceph
[5:45] <joshd1> dec_: that's high for non-recovery - the journal isn't part of the osd's memory usage
[5:46] <joshd1> dec_: is your ceph compiled to use tcmalloc?
[5:46] <joshd1> that's in google-perftools, and reduces memory usage a lot
[5:50] <dec_> I don't know :)
[5:50] * dec_ is now known as dec
[5:51] <dmick> ldd /usr/bin/ceph-osd | grep tcmalloc
[5:51] <dec> yeah - it doesn't look like it is built with it
[5:55] <dec> we've just built RPMs from the distribution spec (I forgot to mention we're on 0.53)
[5:56] * adjohn (~adjohn@69.170.166.146) Quit (Quit: adjohn)
[6:02] <dec> I'll look at building a packaged tcmalloc in with the ceph build... but in the mean time, even without tcmalloc, is this 'normal'?
[6:05] <dec> the problem is that even though these machines are reasonably beefy, with 48GB of RAM, they're triggering page allocation failures - seemingly because ceph-osd is using so much memory
[6:09] <joshd1> I've seen osds using >700mb in the past without tcmalloc, it really makes a huge difference - with tcmalloc they tend to use more like 200mb most of the time
[6:10] <joshd1> there are rpms these days for a couple platforms too: http://ceph.com/docs/master/install/rpm/
[6:10] <dec> Cool - we are EL6 x86_64
[6:11] <dec> (however I note those are two releases behind - 0.52 <-> current 0.54)
[6:11] <joshd1> hmm, they shouldn't be behind, an rsync may have gotten stuck or something
[6:12] <dec> http://ceph.com/rpms/el6/x86_64/
[6:12] <dec> there only seems to be 0.52 there
[6:13] <dmick> was just doing the same thing, can confirm
[6:13] <dmick> not sure why
[6:14] <joshd1> the gitbuilders certainly built rpms of 0.54
[6:14] <dec> just curious - what do you use to build the rpms from git? (we're looking at our rpm building infrastructure atm)
[6:16] <xiaoxi> joshd1:are you still online?
[6:16] <joshd1> gitbuilder with some customized scripts https://github.com/ceph/autobuild-ceph/blob/master/build-ceph-rpm.sh
[6:17] <xiaoxi> joshd1: I have a question for you: how does Ceph choose the sync time? I know there is a range [min_sync_interval, max_sync_interval] that can be configured, but how does Ceph decide the actual sync time within this range?
[6:17] <joshd1> dec: it understands git enough to try to bisect for failures even
[6:20] <joshd1> xiaoxi: I don't know the details of that exactly
[6:21] <xiaoxi> joshd1: OK. Thanks anyway.
[6:23] <xiaoxi> joshd1: And I still cannot fully understand how the OSD will act when the data disks have bigger aggregate bandwidth than the journal.
[6:25] <xiaoxi> Say I have a journal with 130MB/s for sequential write, and several SATA disks as data disks. Although the journal is slower than the data disks, the request will return when the data disk finishes, right?
[6:28] <joshd1> I'm not sure exactly what happens, since that's not a common situation, but I'd guess once the journal fills up, it will become the bottleneck
[6:30] <joshd1> for xfs and ext4, data must still be written to the journal for crash consistency
[6:36] <xiaoxi> sure, unless there is some mechanism that can "DISCARD" a request in the queue. Say, if the data disk finishes first, discard (cancel) the request queued in the journal's request queue..
[6:40] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[6:52] * jstrunk_ (~quassel@146.6.139.110) Quit (Read error: Operation timed out)
[6:53] <dmick> dec: if you care to, there are rpms built at http://gitbuilder.ceph.com/ceph-rpm-centos6-x86_64-basic/ref/
[6:53] * jstrunk (~quassel@146.6.139.110) has joined #ceph
[6:53] <dmick> not sure why the repos aren't rebuilt; could just be we haven't yet automated that
[6:53] <dec> dmick: cool - thanks
[7:00] <joshd1> xiaoxi: that could work with btrfs using parallel journal mode, but ext4 and xfs require the journal to be up to date to maintain crash consistency of the main data store
[7:01] <joshd1> xiaoxi: it's btrfs snapshots in particular that make maintaining a consistent main data store easy
[7:15] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[7:16] * dmick (~dmick@2607:f298:a:607:8d34:7c9a:3476:1a1e) Quit (Quit: Leaving.)
[7:21] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Quit: Leaving.)
[7:21] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[7:27] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[7:27] * loicd (~loic@magenta.dachary.org) has joined #ceph
[7:36] <dec> ah hah - so it looks like, while ceph is using lots of memory, our allocation failures are actually network rx buffer alloc failures
[7:56] * tnt (~tnt@162.63-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:11] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[8:14] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:31] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[8:50] * gregorg (~Greg@78.155.152.6) Quit (Ping timeout: 480 seconds)
[8:51] * tnt (~tnt@162.63-67-87.adsl-dyn.isp.belgacom.be) Quit (Read error: Operation timed out)
[8:59] * deepsa_ (~deepsa@101.62.109.9) has joined #ceph
[9:00] * deepsa (~deepsa@122.172.159.35) Quit (Ping timeout: 480 seconds)
[9:00] * deepsa_ is now known as deepsa
[9:00] * gregorg (~Greg@78.155.152.6) has joined #ceph
[9:07] * loicd (~loic@178.20.50.225) has joined #ceph
[9:12] * joao (~JL@89.181.157.220) Quit (Ping timeout: 480 seconds)
[9:13] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[9:24] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:25] * gregorg_taf (~Greg@78.155.152.6) has joined #ceph
[9:25] * gregorg (~Greg@78.155.152.6) Quit (Read error: Connection reset by peer)
[9:27] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[9:30] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[9:32] * fc (~fc@home.ploup.net) has joined #ceph
[9:40] * pixel (~pixel@81.195.203.34) has joined #ceph
[9:41] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:41] <pixel> Hi
[9:44] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:44] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:45] * yoshi (~yoshi@p11251-ipngn4301marunouchi.tokyo.ocn.ne.jp) Quit (Ping timeout: 480 seconds)
[9:47] <pixel> when I try to run the command ceph-osd -i 3 --mkfs --mkkey I get the error "OSD::mkfs: FileStore::mkfs failed with error -22" ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-3: (22) Invalid argument. Does anybody know how to fix it?
[9:51] <joshd1> look at the osd log for more clues, but it may be needing the fs (if ext4) to be mounted with the user_xattr option
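
A minimal sketch of the ext4 remount joshd1 is suggesting; the device name in the fstab line is an assumption:

    # remount the OSD data filesystem with user_xattr enabled
    mount -o remount,user_xattr /var/lib/ceph/osd/ceph-3

    # or make it persistent in /etc/fstab
    /dev/sdb1  /var/lib/ceph/osd/ceph-3  ext4  rw,noatime,user_xattr  0 0
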
[9:52] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[10:06] * xiaoxi (~xiaoxiche@134.134.139.76) Quit (Ping timeout: 480 seconds)
[10:19] * pixel (~pixel@81.195.203.34) Quit (Quit: Ухожу я от вас (xchat 2.4.5 или старше))
[10:21] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:22] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[10:22] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[10:32] * yoshi (~yoshi@p11251-ipngn4301marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[10:49] * deepsa_ (~deepsa@122.172.173.96) has joined #ceph
[10:50] * deepsa (~deepsa@101.62.109.9) Quit (Ping timeout: 480 seconds)
[10:50] * deepsa_ is now known as deepsa
[11:07] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[11:09] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[11:32] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[11:34] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[11:39] * xiaoxi (~xiaoxiche@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[11:50] * jtangwk (~Adium@2001:770:10:500:24b3:c9e0:700a:125c) has joined #ceph
[12:15] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has joined #ceph
[12:22] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[12:26] * yoshi (~yoshi@p11251-ipngn4301marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[13:09] * match (~mrichar1@pcw3047.see.ed.ac.uk) has joined #ceph
[13:11] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[13:25] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[13:37] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[13:52] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[13:52] * xiaoxi (~xiaoxiche@jfdmzpr05-ext.jf.intel.com) Quit (Remote host closed the connection)
[13:53] * xiaoxi (~xiaoxiche@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[13:53] * joao (~JL@89.181.153.24) has joined #ceph
[13:53] * ChanServ sets mode +o joao
[14:04] * MikeMcClurg1 (~mike@firewall.ctxuk.citrix.com) has joined #ceph
[14:10] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[14:10] * Leseb (~Leseb@193.172.124.196) Quit (Read error: Connection reset by peer)
[14:10] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[14:42] * timmclaughlin (~timmclaug@69.170.148.179) has joined #ceph
[14:48] * SIN (~SIN@78.107.155.77) has joined #ceph
[14:48] <SIN> Hello
[14:51] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Quit: foo)
[14:51] <SIN> Can someone tell me how to optimize ceph storage IO. Or maybe give a link to some docs about it.
[14:51] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[14:53] * weber (~he@219.85.117.233) has joined #ceph
[15:07] <tnt> look for the slides of the recent "ceph day" event, there were some best practices in there. Also look at the two blog posts about ceph performance.
[15:24] * nhorman (~nhorman@nat-pool-rdu.redhat.com) has joined #ceph
[15:25] <xiaoxi> tnt: could you please paste the links?
[15:27] <joao> http://ceph.com/community/our-very-first-ceph-day/
[15:27] <joao> presentations on the bottom
[15:27] <joao> http://ceph.com/community/ceph-performance-part-1-disk-controller-write-throughput/
[15:28] <joao> http://ceph.com/community/ceph-performance-part-2-write-throughput-without-ssd-journals/
[15:32] <Anticimex> nice
[15:32] <Anticimex> with such stats. moar please :)
[15:34] <tnt> Any clue when 3.1 will be out ?
[15:35] <joao> 3.1?
[15:35] <joao> I'd say a couple more years, at least
[15:36] <tnt> Damnit, wrong channel.
[15:36] <joao> if we keep up with the current versioning scheme
[15:36] <joao> :p
[15:39] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[15:53] <fmarchand> Hi everybody !
[15:53] <fmarchand> hi joao !
[15:53] <joao> hey there :)
[15:57] <fmarchand> I have a question ! My single machine cluster has 2 osd's and 1 mds ... when I ran iotop on it ... I noticed that the osd processes had a lot of read operations on disk ...
[15:57] <fmarchand> is that normal ?
[15:57] * calebamiles (~caleb@65-183-137-95-dhcp.burlingtontelecom.net) has joined #ceph
[16:09] <fmarchand> joao: what are osd's doing ? playing minesweeper ? :)
[16:10] <tnt> scrubbing ?
[16:10] * loicd (~loic@178.20.50.225) Quit (Ping timeout: 480 seconds)
[16:13] <joao> sorry, wasn't paying attention to the channel
[16:14] <joao> fmarchand, tnt may be right
[16:15] <fmarchand> I checked with ceph -w and no .... but It was a good guess !
[16:16] <joao> what does 'ceph -s' report?
[16:16] <fmarchand> health HEALTH_OK
[16:16] <fmarchand> monmap e1: 1 mons at {a=172.16.2.72:6789/0}, election epoch 0, quorum 0 a
[16:16] <fmarchand> osdmap e104: 2 osds: 2 up, 2 in
[16:16] <fmarchand> pgmap v347065: 192 pgs: 192 active+clean; 111 GB data, 385 GB used, 1526 GB / 2014 GB avail
[16:16] <fmarchand> mdsmap e76: 1/1/1 up {0=a=up:active}
[16:16] <fmarchand> everything fine ... according to me
[16:17] <joao> yeah, looks that way
[16:17] <fmarchand> but with iotop : 18131 be/4 root 331.30 K/s 0.00 B/s 0.00 % 25.05 % ceph-osd -i 0 --pid-file /var/run~h/osd.0.pid -c /etc/ceph/ceph.conf
[16:17] <fmarchand> 18154 be/4 root 417.05 K/s 0.00 B/s 0.00 % 18.56 % ceph-osd -i 1 --pid-file /var/run~h/osd.1.pid -c /etc/ceph/ceph.conf
[16:17] <fmarchand> 18130 be/4 root 257.25 K/s 0.00 B/s 0.00 % 16.25 % ceph-osd -i 0 --pid-file /var/run~h/osd.0.pid -c /etc/ceph/ceph.conf
[16:17] <fmarchand> 18155 be/4 root 311.82 K/s 0.00 B/s 0.00 % 10.17 % ceph-osd -i 1 --pid-file /var/run~h/osd.1.pid -c /etc/ceph/ceph.conf
[16:18] <joao> assuming that you don't have anything reading from the osds, I'm not really sure what could make the osds perform that much read IO
[16:18] <joao> but then again, I can't really say I know the osd inside-out
[16:21] <fmarchand> but normally it should be idle (I mean no cpu usage) if you don't ask it anything ?
[16:22] <joao> I would assume so
[16:22] <joao> don't know if the mds can take a toll on that
[16:24] <joao> fmarchand, what version are you running btw?
[16:32] <fmarchand> 0.48.2 argonaut
[16:39] * gaveen (~gaveen@112.134.112.98) has joined #ceph
[16:42] <fmarchand> .... I unmounted all the clients' fs .... killed a few processes .... still doing io ...
[16:59] * loicd (~loic@magenta.dachary.org) has joined #ceph
[17:13] * xiaoxi (~xiaoxiche@jfdmzpr05-ext.jf.intel.com) Quit (Remote host closed the connection)
[17:13] * dilemma (~dilemma@2607:fad0:32:a02:1e6f:65ff:feac:7f2a) has joined #ceph
[17:20] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:21] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[17:26] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:31] <fmarchand> joao : no idea ?
[17:31] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[17:31] <joao> nothing crosses my mind
[17:41] <dilemma> anyone here know much about using rbd with libvirt?
[17:41] <dilemma> I'm having some unusual problems with it after upgrading to libvirt 1.0.0
[17:45] <joshd1> dilemma: are you using cephx?
[17:45] <dilemma> yes
[17:46] <joshd1> what problems are you having?
[17:48] <dilemma> http://pastebin.com/vx0uWdL6
[17:48] <dilemma> basically, I can no longer hot-attach a device
[17:49] <dilemma> obviously, I've removed some identifying info from the config
[17:49] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:49] <dilemma> but I've confirmed that the same config works after downgrading back to libvirt 0.9.13
[17:49] <joshd1> yeah, the config looks fine
[17:50] <dilemma> the error message doesn't make any sense to me
[17:51] <joshd1> my guess is libvirt is trying to run a different qemu/kvm binary, that doesn't have rbd support, and thus tries to open it as a regular file
[17:51] <dilemma> well, that command is attaching to an existing qemu-kvm process
[17:52] <dilemma> which does have rbd support (and in fact, has an rbd volume already attached from when this worked in 0.9.13)
[17:52] <match> dilemma: I got that message before I compiled up libvirt with rbd support
[17:52] * jlogan1 (~Thunderbi@2600:c00:3010:1:1ccf:467e:284:aea8) has joined #ceph
[17:52] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[17:52] <dilemma> http://pastebin.com/YwF4jx6Y
[17:53] <dilemma> looks like I do in fact have rbd support, unless I'm missing something
[17:53] <joshd1> that particular 'cannot open file' string is found in the libvirt pool layer
[17:53] <dilemma> I'm not using pool support
[17:53] <joshd1> it shouldn't be trying to use that at all when attaching
[17:53] <dilemma> since it wasn't present in 0.9.13
[17:54] <joshd1> yeah, I'm trying to figure out where the error is coming from
[17:55] <joshd1> if you turn up libvirtd logging to debug levels, /var/log/libvirt/libvirtd.log might have something useful
[17:55] * shelleyp (~shelleyp@173-165-81-125-Illinois.hfc.comcastbusiness.net) has joined #ceph
[17:56] * noob2 (a5a00214@ircip1.mibbit.com) has joined #ceph
[17:56] <noob2> to have multiple rados gateways do i just repeat the client.radosgw.gateway in the config file with new hostnames?
[17:57] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[17:57] <noob2> or maybe i could just have the one for the host i'm currently on and another for a different host
[17:58] <shelleyp> My test ceph environment had a crash when writing to an rbd - Now the cluster is stuck in a mode where ceph -w says .. 792 stale+active+clean
[17:58] <match> dilemma: what does the following give you? 'qemu-img info -f rbd rbd:poolname/guestname'
[17:59] <joshd1> dilemma: if you add 'allow_disk_format_probing = 0' to your libvirtd.conf it should skip treating it as a file
[17:59] <joshd1> I think this is simply a regression in libvirt
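
A hedged sketch of the /etc/libvirt/libvirtd.conf settings discussed just above and a bit further down (debug logging plus disabling format probing); the log_outputs line is an assumption:

    # /etc/libvirt/libvirtd.conf
    log_level = 1                                         # 1 = debug
    log_outputs = "1:file:/var/log/libvirt/libvirtd.log"
    allow_disk_format_probing = 0
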
[17:59] <dilemma> qemu-img: Unknown file format 'rbd'
[18:00] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[18:00] * tnt (~tnt@162.63-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:01] <dilemma> my fault - I'm running my qemu/rbd stack out of /opt, and I forgot the LD_LIBRARY_PATH when running that command
[18:01] <dilemma> image: rbd:pool-name/volume-name
[18:01] <dilemma> file format: rbd
[18:01] <dilemma> virtual size: 10G (10737418240 bytes)
[18:01] <dilemma> disk size: unavailable
[18:01] <dilemma> cluster_size: 4194304
[18:01] * gaveen (~gaveen@112.134.112.98) Quit (Ping timeout: 480 seconds)
[18:03] <dilemma> also, libvirtd log doesn't have anything that looks useful to me after setting the log level to 1: http://pastebin.com/nqNQsaeQ
[18:03] <match> dilemma: And libvirt knows to use the alt LD_LIBRARY_PATH in calling qemu/kvm?
[18:04] <gregaf> fmarchand: the OSD is never going to be completely idle on disk; it has a lot of background tasks and does some periodic syncs pretty much no matter what
[18:05] <gregaf> joao: ^ fyi :)
[18:05] <dilemma> # strings /proc/$(pidof libvirtd)/environ | grep LD_LIBRARY_PATH
[18:05] <dilemma> LD_LIBRARY_PATH=/opt/kvm-stack-1/lib:/opt/kvm-stack-1/lib64:
[18:05] <joao> gregaf, good to know
[18:05] <joao> thanks
[18:05] <match> dilemma: Just checking :)
[18:05] <gregaf> noob2: yeah, you just repeat the regular RGW invocation; you don't need to do any special coordination config or whatever
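
A hedged sketch of what gregaf describes: one client.radosgw.* section per gateway host, each simply repeating the usual RGW settings; the section names, hostnames, and paths below are assumptions:

    [client.radosgw.gw1]
        host = gateway-host-1
        keyring = /etc/ceph/keyring.radosgw.gw1
        rgw socket path = /tmp/radosgw.gw1.sock
        log file = /var/log/ceph/radosgw.gw1.log

    [client.radosgw.gw2]
        host = gateway-host-2
        keyring = /etc/ceph/keyring.radosgw.gw2
        rgw socket path = /tmp/radosgw.gw2.sock
        log file = /var/log/ceph/radosgw.gw2.log
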
[18:05] <match> dilemma: I think joshd1's suggestion is probably the right one then
[18:05] <dilemma> regression?
[18:06] <shelleyp> Any ideas on the stale cluster state? My ceph health shows..
[18:06] <shelleyp> 2012-11-20 11:05:32.294802 mon <- [health]
[18:06] <shelleyp> 2012-11-20 11:05:32.295826 mon.0 -> 'HEALTH_WARN 792 pgs stale' (0)
[18:06] <gregaf> shelleyp: you probably lost enough OSDs that some PGs aren't active
[18:06] <gregaf> what's the rest of ceph -s output?
[18:07] <fmarchand> gregaf : so if I have constant read operations on disk from my osd process, this is normal. Is that what you're saying ?
[18:08] <gregaf> I can't tell you exactly what is going on; sjust could when he's in — but yes, it's a disk manager; I'd expect it to have some amount of constant activity
[18:08] <shelleyp> Thanks ... ceph -s output is:
[18:08] <shelleyp> 2012-11-20 11:05:32.294802 mon <- [health]
[18:08] <shelleyp> 2012-11-20 11:05:32.295826 mon.0 -> 'HEALTH_WARN 792 pgs stale' (0)
[18:08] <shelleyp> 2012-11-20 11:07:36.933821 pg v7915: 800 pgs: 8 creating, 792 stale+active+clean; 24831 MB data, 0 KB used, 0 KB / 0 KB avail
[18:08] <shelleyp> 2012-11-20 11:07:36.939492 mds e27: 1/1/1 up {0=a=up:replay}
[18:08] <shelleyp> 2012-11-20 11:07:36.939611 osd e87: 4 osds: 0 up, 0 in
[18:08] <shelleyp> 2012-11-20 11:07:36.939850 log 2012-11-20 11:03:45.793488 mon.0 192.168.0.198:6789/0 15 : [INF] osd.0 out (down for 300.851998)
[18:08] <shelleyp> 2012-11-20 11:07:36.940083 mon e1: 1 mons at {a=192.168.0.198:6789/0}
[18:08] <gregaf> shelleyp: right, so all of your OSDs are down
[18:08] <gregaf> meaning, not running
[18:08] <gregaf> you'll need to turn those on to do anything with the cluster ;)
[18:09] <shelleyp> Yes, I tried doing ceph osd in 0 but no luck
[18:09] <dilemma> joshd1 match: adding 'allow_disk_format_probing = 0' had no effect
[18:10] <fmarchand> thx gregaf !
[18:10] <dilemma> added it and restarted libvirtd, same symptoms
[18:10] <noob2> gregaf: thanks :)
[18:11] <match> dilemma: possibly unrelated, but I had an odd bug where if I had a copy of ceph.conf on the vm host, and left the 'host' part out of the xml, it worked, but gave errors if I left it in
[18:11] <joshd1> dilemma: yeah, the log shows virFileOpenAs (the function that shouldn't be called) but not who's calling it
[18:12] <dilemma> already had a copy of ceph.conf there (tried that on my own earlier), and just now, removing the host section has no effect
[18:12] <match> dilemma: Just a stab in the dark. Have to head - hope you get it resolved...
[18:12] <dilemma> joshd1: yeah, that function is still in the logs after setting 'allow_disk_format_probing = 0'
[18:13] <match> dilemma: Last stab in the dark - remove the qemu raw reference?
[18:14] <shelleyp> Is there something other than "ceph osd in x" that I can do to start the OSD's? I have restarted the services already
[18:17] <gregaf> shelleyp: the monitors don't think the OSDs are even running; can you go and check that the daemons are up?
[18:17] <gregaf> (that's what the "0 up" in the OSD section means)
[18:18] * gaveen (~gaveen@112.134.112.226) has joined #ceph
[18:18] <shelleyp> How do I check that the OSD daemons are up?
[18:19] <dilemma> joshd1: any ideas where I should start with the libvirtd source to track down this regression?
[18:19] <gregaf> shelleyp: go look at top or "ps aux | grep ceph" on the hosts?
[18:19] <joshd1> dilemma: my guess would be src/qemu/qemu_hotplug.c, although it may occur before it gets to that step
[18:20] <shelleyp> 3 or the 4 OSD's show up,but missing one..
[18:20] <shelleyp> root 10397 0.0 0.0 307884 5136 ? Ssl Nov19 0:20 /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf
[18:20] <shelleyp> root 10495 0.0 0.0 306856 5144 ? Ssl Nov19 0:19 /usr/bin/ceph-osd -i 1 -c /etc/ceph/ceph.conf
[18:20] <shelleyp> root 10593 0.0 0.0 306856 5216 ? Ssl Nov19 0:19 /usr/bin/ceph-osd -i 2 -c /etc/ceph/ceph.conf
[18:22] <shelleyp> I cannot get the 4th OSD to start
[18:22] <shelleyp> === osd.3 ===
[18:22] <shelleyp> osd.3: not running.
[18:24] <gregaf> can you get the output of osd.3's log (should be in /var/log/ceph if you didn't change defaults) and paste it on pastebin?
[18:24] <gregaf> also grab the output of ceph -s again
[18:32] <shelleyp> Thanks - http://pastebin.com/KJgcGtS1
[18:33] <shelleyp> 2012-11-20 11:33:01.076287 pg v7916: 800 pgs: 8 creating, 792 stale+active+clean; 24831 MB data, 0 KB used, 0 KB / 0 KB avail
[18:33] <shelleyp> 2012-11-20 11:33:01.084309 mds e29: 1/1/1 up {0=a=up:replay}
[18:33] <shelleyp> 2012-11-20 11:33:01.084426 osd e88: 4 osds: 0 up, 0 in
[18:33] <shelleyp> 2012-11-20 11:33:01.084747 log 2012-11-20 11:22:05.193860 mon.0 192.168.0.198:6789/0 2 : [INF] mds.? 192.168.0.198:6800/32144 up:boot
[18:33] <shelleyp> 2012-11-20 11:33:01.085201 mon e1: 1 mons at {a=192.168.0.198:6789/0}
[18:34] * joshd1 (~jdurgin@2602:306:c5db:310:9011:885f:57da:3c7e) Quit (Quit: Leaving.)
[18:34] <gregaf> shelleyp: hmm, that is a bad sign
[18:35] <gregaf> can you shut down all your OSDs, and then run
[18:35] <gregaf> sudo /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf --debug_osd 20 --debug_ms 1
[18:36] <gregaf> let it run for a while, and then pastebin the resulting log?
[18:36] * deepsa (~deepsa@122.172.173.96) Quit (Ping timeout: 480 seconds)
[18:36] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[18:36] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[18:37] <shelleyp> Sure, thanks - Just to be sure, how do I shut down just the OSD's?
[18:37] <gregaf> just kill them or use your service management system if you like
[18:37] <shelleyp> OK
[18:37] * deepsa (~deepsa@122.172.27.154) has joined #ceph
[18:38] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[18:39] * yehudasa_ (~yehudasa@38.122.20.226) has joined #ceph
[18:43] <shelleyp> Here it is ... Thanks http://pastebin.com/AZzdzvZG
[18:43] * deepsa_ (~deepsa@122.172.21.33) has joined #ceph
[18:44] * deepsa (~deepsa@122.172.27.154) Quit (Remote host closed the connection)
[18:44] * deepsa_ is now known as deepsa
[18:58] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:59] * loicd (~loic@jem75-2-82-233-234-24.fbx.proxad.net) has joined #ceph
[19:00] <noob2> for the radosgw are there any advantages to integrating with openstack keystone if you're not running openstack?
[19:00] <noob2> maybe this is a silly question
[19:04] <rweeks> for authentication?
[19:05] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[19:07] * fc (~fc@home.ploup.net) Quit (Quit: leaving)
[19:13] <sjust> fmarchand: how much activity?
[19:13] <gregaf> noob2: not a chance; Keystone is a lot slower than using the other authentication methods RGW provides
[19:17] <noob2> awesome
[19:18] <noob2> i'll leave it out
[19:27] * timmclau_ (~timmclaug@69.170.148.179) has joined #ceph
[19:30] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[19:31] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:31] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[19:32] * timmclaughlin (~timmclaug@69.170.148.179) Quit (Ping timeout: 480 seconds)
[19:32] <gregaf> shelleyp: hmm, this is not normal behavior — can I get the current status of ceph -s again?
[19:34] <gregaf> and then we'll need to get monitor debug logging as well
[19:35] <shelleyp> Sure ...
[19:35] <shelleyp> 2012-11-20 12:34:45.069206 pg v7921: 800 pgs: 8 creating, 792 stale+active+clean; 24831 MB data, 0 KB used, 0 KB / 0 KB avail
[19:35] <shelleyp> 2012-11-20 12:34:45.077439 mds e32: 1/1/1 up {0=a=up:replay}, 1 up:standby
[19:35] <shelleyp> 2012-11-20 12:34:45.077569 osd e93: 4 osds: 0 up, 0 in
[19:35] <shelleyp> 2012-11-20 12:34:45.078004 log 2012-11-20 12:34:33.688525 mon.0 192.168.0.198:6789/0 2 : [INF] mds.? 192.168.0.198:6800/3959 up:boot
[19:35] <shelleyp> 2012-11-20 12:34:45.078430 mon e1: 1 mons at {a=192.168.0.198:6789/0}
[19:35] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[19:35] <shelleyp> I did not use btrfs, maybe I should just rebuild it and do that?
[19:35] <gregaf> I don't think that's your problem here
[19:36] <gregaf> the monitors and the OSDs aren't communicating properly
[19:36] <gregaf> do you have iptables or anything running?
[19:36] * eternaleye (~eternaley@tchaikovsky.exherbo.org) Quit (Remote host closed the connection)
[19:37] <gregaf> shelleyp: did you say your cluster crashed and this is when the problem appeared?
[19:38] <shelleyp> No - This cluster worked really well and was stable, until I accidentally powered down the server when it was writing rbd traffic
[19:39] * loicd (~loic@jem75-2-82-233-234-24.fbx.proxad.net) Quit (Quit: Leaving.)
[19:43] <gregaf> shelleyp: check how your monitor is running and copy that command
[19:43] <gregaf> then kill the monitor and the OSDs and run the command again, appending "--debug_ms 1 --debug_mon 20"
[19:44] <gregaf> then turn on the monitor and the OSD (same line as previously), wait a while (60-120 seconds) and send me the logs
[19:44] <gregaf> this time please include the full log; the last one is just the tail
[19:44] <gregaf> and what version are you using?
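
A minimal sketch of the sequence gregaf is asking for, assuming a monitor named mon.a and the default config path:

    # restart the monitor with debug logging
    ceph-mon -i a -c /etc/ceph/ceph.conf --debug_ms 1 --debug_mon 20

    # restart an OSD with the flags given earlier in the conversation
    ceph-osd -i 0 -c /etc/ceph/ceph.conf --debug_osd 20 --debug_ms 1
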
[19:44] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[19:45] * eternaleye (~eternaley@tchaikovsky.exherbo.org) has joined #ceph
[19:47] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has left #ceph
[19:51] <shelleyp> OK, working on it
[19:52] * dilemma (~dilemma@2607:fad0:32:a02:1e6f:65ff:feac:7f2a) Quit (Quit: Leaving)
[19:53] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[19:54] * shelleyp (~shelleyp@173-165-81-125-Illinois.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[19:58] * dmick (~dmick@2607:f298:a:607:15f3:a75d:146d:65e) has joined #ceph
[19:59] * ChanServ sets mode +o dmick
[20:01] * calebamiles1 (~caleb@32.140.201.253) has joined #ceph
[20:02] * JoDarc (~Adium@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[20:02] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Read error: Operation timed out)
[20:02] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[20:02] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[20:05] * calebamiles (~caleb@65-183-137-95-dhcp.burlingtontelecom.net) Quit (Read error: No route to host)
[20:06] * shelleyp (~shelleyp@173-165-81-125-Illinois.hfc.comcastbusiness.net) has joined #ceph
[20:07] <shelleyp> mon log ... http://pastebin.com/i7dArhtB
[20:12] * calebamiles1 (~caleb@32.140.201.253) Quit (Ping timeout: 480 seconds)
[20:14] * nhorman (~nhorman@nat-pool-rdu.redhat.com) Quit (Quit: Leaving)
[20:21] * calebamiles (~caleb@65-183-137-95-dhcp.burlingtontelecom.net) has joined #ceph
[20:21] * rlr219 (43c87e04@ircip1.mibbit.com) has joined #ceph
[20:22] * MikeMcClurg1 (~mike@firewall.ctxuk.citrix.com) Quit (Quit: Leaving.)
[20:24] * jlogan1 (~Thunderbi@2600:c00:3010:1:1ccf:467e:284:aea8) Quit (Quit: jlogan1)
[20:27] * JoDarc (~Adium@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[20:28] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[20:29] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[20:29] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[20:29] * yehudasa_ (~yehudasa@38.122.20.226) Quit (Ping timeout: 480 seconds)
[20:29] * jlogan1 (~Thunderbi@2600:c00:3010:1:1ccf:467e:284:aea8) has joined #ceph
[20:30] * jjgalvez (~jjgalvez@12.248.40.138) has joined #ceph
[20:30] * slang (~slang@ace.ops.newdream.net) has joined #ceph
[20:33] * jjgalvez1 (~jjgalvez@12.248.40.138) has joined #ceph
[20:36] * Ryan_Lane (~Adium@216.38.130.167) has joined #ceph
[20:37] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[20:38] * jjgalvez (~jjgalvez@12.248.40.138) Quit (Ping timeout: 480 seconds)
[20:39] * jjgalvez (~jjgalvez@12.248.40.138) has joined #ceph
[20:41] <noob2> has anyone checked if the rewrite rule for s3 works on the wiki?
[20:41] * jjgalvez1 (~jjgalvez@12.248.40.138) Quit (Ping timeout: 480 seconds)
[20:41] <noob2> i'm getting an error in apache saying the headers were not passed
[20:42] <dmick> noob2: are you asking if the rewrite configuration documented in the wiki is current and correct?
[20:43] <noob2> yup
[20:43] <noob2> i keep getting this in my logs: FastCGI: incomplete headers (0 bytes) received from server
[20:43] <dmick> in general the wiki is the wrong place for that, but
[20:43] <noob2> i was thinking maybe the rewrite rule is not correct
[20:43] <noob2> i mean the ceph docs
[20:43] <dmick> the first question is: which mod_fastcgi are you using?
[20:43] <noob2> lemme check
[20:44] <dmick> we have hacks to that and Apache to make things work better
[20:44] <dmick> and there are packages
[20:44] <noob2> i think i have the ubuntu one
[20:44] <dmick> http://ceph.com/docs/master/radosgw/manual-install/
[20:44] <noob2> yeah i was looking at that.
[20:44] <noob2> i think it installed the ubuntu ones anyways
[20:45] <noob2> do i have to specify the repo to pull from?
[20:45] <tnt> why do you even need rewrite ? I use lighttpd and don't have any rewrite rules at all.
[20:45] <dmick> yes, if you want the optimized ones. That's not explicit there
[20:45] <noob2> interesting
[20:46] <noob2> yeah i have the ubuntu versions installed
[20:46] <noob2> dpkg confirmed it
[20:46] * brambles (~xymox@shellspk.ftp.sh) has joined #ceph
[20:46] * gaveen (~gaveen@112.134.112.226) Quit (Remote host closed the connection)
[20:47] <dmick> but I don't know that that error is related. Let me check a few things
[20:48] <noob2> dmick: looks like the apache files are not in the ceph dev repo
[20:48] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[20:48] <dmick> no, they're separate
[20:48] <noob2> oh
[20:48] <dmick> http://gitbuilder.ceph.com/apache2-deb-precise-x86_64-basic/
[20:49] <dmick> http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-precise-x86_64-basic/
[20:49] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[20:49] <noob2> thanks
[20:49] <noob2> will those work with 12.10 also?
[20:50] <dmick> don't know if anyone's tested that, but it's worth a try. but again, not sure that's your problem
[20:50] <noob2> ok
[20:50] <noob2> yeah
[20:50] <dmick> tnt: the claim is that you need rewrite to pass HTTP_AUTHORIZATION in the environment
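
The rewrite rule in question is the one from the radosgw Apache setup of this era, which forwards the Authorization header into the FastCGI environment; the exact pattern below is a hedged reconstruction, not a verbatim copy of the docs:

    RewriteEngine On
    RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
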
[20:54] <dmick> noob2: our rgw expert says that's probably a result of the 100-continue handling, so our packages will help
[20:54] <dmick> if they work on quantal
[20:54] <dmick> if not, the sources are available :)
[20:55] * brambles (~xymox@shellspk.ftp.sh) Quit (Quit: leaving)
[20:55] <noob2> ok
[20:55] <noob2> so lightty would also solve this i'd guess
[20:55] <noob2> like tnt says
[20:55] <dmick> not sure how
[20:56] * rweeks is now known as goodeating
[20:56] <goodeating> er.
[20:56] * goodeating is now known as rweeks
[20:57] <rweeks> wrong channel...
[20:57] * brambles (~xymox@shellspk.ftp.sh) has joined #ceph
[20:58] * silversu_ (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[20:58] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[20:59] * eternaleye_ (~eternaley@tchaikovsky.exherbo.org) has joined #ceph
[21:00] * Qten (Q@qten.qnet.net.au) has joined #ceph
[21:00] * cypher6877 (~jay@cpe-76-175-167-163.socal.res.rr.com) has joined #ceph
[21:00] <noob2> tnt: could you share your lighttpd.conf ?
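
tnt's actual config never appears in the log; a minimal hedged sketch of a lighttpd FastCGI setup for radosgw might look like this (the socket path is an assumption and has to match the rgw socket path in ceph.conf):

    server.modules += ( "mod_fastcgi" )

    fastcgi.server = ( "/" =>
      (( "socket"      => "/tmp/radosgw.sock",
         "check-local" => "disable"
      ))
    )
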
[21:00] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[21:02] * jlogan2 (~Thunderbi@2600:c00:3010:1:1ccf:467e:284:aea8) has joined #ceph
[21:02] * jochen_ (~jochen@laevar.de) has joined #ceph
[21:02] * mistur_ (~yoann@kewl.mistur.org) has joined #ceph
[21:02] * mdxi_ (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[21:02] * slang (~slang@ace.ops.newdream.net) Quit (synthon.oftc.net oxygen.oftc.net)
[21:02] * jlogan1 (~Thunderbi@2600:c00:3010:1:1ccf:467e:284:aea8) Quit (synthon.oftc.net oxygen.oftc.net)
[21:02] * eternaleye (~eternaley@tchaikovsky.exherbo.org) Quit (synthon.oftc.net oxygen.oftc.net)
[21:02] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (synthon.oftc.net oxygen.oftc.net)
[21:02] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (synthon.oftc.net oxygen.oftc.net)
[21:02] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (synthon.oftc.net oxygen.oftc.net)
[21:02] * maxiz_ (~pfliu@111.192.248.200) Quit (synthon.oftc.net oxygen.oftc.net)
[21:02] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (synthon.oftc.net oxygen.oftc.net)
[21:02] * Qu310 (Q@qten.qnet.net.au) Quit (synthon.oftc.net oxygen.oftc.net)
[21:02] * cypher497 (~jay@76.175.167.163) Quit (synthon.oftc.net oxygen.oftc.net)
[21:02] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (synthon.oftc.net oxygen.oftc.net)
[21:02] * jochen (~jochen@laevar.de) Quit (synthon.oftc.net oxygen.oftc.net)
[21:02] * mistur (~yoann@kewl.mistur.org) Quit (synthon.oftc.net oxygen.oftc.net)
[21:02] * gohko_ (~gohko@natter.interq.or.jp) Quit (synthon.oftc.net oxygen.oftc.net)
[21:04] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:06] * slang (~slang@ace.ops.newdream.net) has joined #ceph
[21:06] * jlogan1 (~Thunderbi@2600:c00:3010:1:1ccf:467e:284:aea8) has joined #ceph
[21:06] * eternaleye (~eternaley@tchaikovsky.exherbo.org) has joined #ceph
[21:06] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[21:06] * maxiz_ (~pfliu@111.192.248.200) has joined #ceph
[21:06] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[21:06] * cypher497 (~jay@76.175.167.163) has joined #ceph
[21:06] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[21:06] * gohko_ (~gohko@natter.interq.or.jp) has joined #ceph
[21:06] <shelleyp> Found my issue - files system mounting issue - Thanks for all your help guys !
[21:06] * eternaleye (~eternaley@tchaikovsky.exherbo.org) Quit (Remote host closed the connection)
[21:06] * shelleyp (~shelleyp@173-165-81-125-Illinois.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[21:07] * jlogan1 (~Thunderbi@2600:c00:3010:1:1ccf:467e:284:aea8) Quit (Ping timeout: 481 seconds)
[21:10] * gohko_ (~gohko@natter.interq.or.jp) Quit (Read error: Operation timed out)
[21:11] * cypher497 (~jay@76.175.167.163) Quit (Read error: Operation timed out)
[21:15] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[21:25] * timmclau_ (~timmclaug@69.170.148.179) Quit (Remote host closed the connection)
[21:29] * timmclaughlin (~timmclaug@69.170.148.179) has joined #ceph
[21:32] * rlr219 (43c87e04@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[21:37] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[21:37] * BManojlovic (~steki@212.69.20.139) has joined #ceph
[21:38] * eternaleye_ is now known as eternaleye
[21:54] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[22:01] <noob2> i've gotten a little further
[22:01] <noob2> now i crash with: radosgw: must specify 'rgw socket path' to run as a daemon
[22:08] <Robe> gdisk sure looks nice
[22:10] <dmick> noob2: that's in the rgw config docs
[22:10] <dmick> Robe: gdisk?
[22:14] <noob2> yeah i added the socket path and the rgw never returns now
[22:15] <noob2> FastCGI: (dynamic) server "/var/www/s3gw.fcgi" has failed to remain running for 30 seconds given 3 attempts
[22:15] <noob2> it just hangs it seems
[22:15] <Robe> dmick: a less-broken gpt fdisk utility apparently
[22:16] <dmick> Robe: afaik the only one :)
[22:16] <dmick> noob2: check logs maybe?
[22:16] <dmick> (client.rgw.log in "the log dir")
[22:17] <dmick> (or whatever you named the client)
[22:17] <dmick> client.radosgw.gateway perhaps
[22:17] <Robe> dmick: the only one?!
[22:17] <dmick> if you haven't reviewed each step in http://ceph.com/docs/master/radosgw/config/ it might be worth while
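
For context on the FastCGI error above: the wrapper that Apache launches is normally just a tiny shell script, with the socket path set in ceph.conf; a hedged sketch, where the client name and paths are assumptions:

    #!/bin/sh
    # /var/www/s3gw.fcgi, launched by mod_fastcgi
    exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway

    # and the matching ceph.conf section:
    [client.radosgw.gateway]
        rgw socket path = /tmp/radosgw.sock
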
[22:18] <noob2> yeah i followed it exactly
[22:18] <dmick> Robe: that handles gpt, yeah
[22:18] <noob2> just fails immediately
[22:18] <dmick> in which case, then, /var/log/ceph/radosgw.log
[22:18] <Robe> dmick: there's parted and gnu-fdisk as well
[22:18] <noob2> yeah that is oddly empty
[22:19] * timmclaughlin (~timmclaug@69.170.148.179) Quit (Remote host closed the connection)
[22:19] <noob2> i think maybe the radosgw process never started. i can't find it in ps
[22:19] * timmclaughlin (~timmclaug@69.170.148.179) has joined #ceph
[22:19] <dmick> Robe: yes, but their GPT support blows to the point that I don't even acknowledge it
[22:20] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[22:20] <dmick> noob2: are you still using quantal packages for apache2 and mod_fastcgi, or is this with ours?
[22:20] <Robe> dmick: thanks!
[22:21] <noob2> still using the defaults from quantal
[22:21] <dmick> Robe: heh
[22:21] <noob2> i don't think radosgw even started. the logs are blank
[22:21] <Robe> for reaffirming my suspicion, that is
[22:21] <dmick> gdisk could still use lots of love
[22:21] <dmick> but it's a lot closer
[22:21] <dmick> surprising how primitive those tools still are
[22:21] <dmick> noob2: so radosgw should start from your ceph startup, not apache
[22:22] <noob2> yeah i see the init file
[22:22] <noob2> i started it but can't tell if it's running or not
[22:22] <Robe> hrm
[22:22] <dmick> uh
[22:22] <dmick> hm, maybe that's changed since I looked last
[22:22] <Robe> the 5 minute setup claim wasn't an understatement
[22:22] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:25] <dmick> noob2: if it were running you'd see it in ps
[22:25] <noob2> ok then it crashed
[22:26] <dmick> noob2: but it looks like it may be started from apache now...checking
[22:26] <noob2> ok
[22:26] <noob2> weird thing is i can make new users just fine
[22:26] <noob2> it prints out keys and all
[22:27] <dmick> yeah, iirc radosgw-admin doesn't use the daemon
[22:28] <noob2> ok
[22:28] <noob2> interesting
[22:28] <dmick> so yeah it's started from /etc/init.d/ceph (or service ceph)
[22:28] <noob2> oh..
[22:28] <dmick> so if it's going wrong, it's going wrong there
[22:28] <noob2> let me check that
[22:28] <noob2> ok
[22:28] <noob2> i see this now
[22:29] <noob2> radosgw -c /etc/ceph/ceph.conf --rgw-socket-path=/tmp/radosgw.sock -n client.radosgw.1
[22:29] <dmick> and it's not ceph, it's got its own service now; sorry again
[22:29] <dmick> my knowledge is outdated :)
[22:29] <dmick> but it looks like it's running now?
[22:30] <noob2> does that client.radosgw.1 look correct?
[22:30] <noob2> yeah i see 6 radosgw processes
[22:30] <dmick> it should match what's in your ceph.conf
[22:30] <noob2> yup that's what i have in there also
[22:31] <noob2> when i try swift i get a slightly different error
[22:31] <noob2> terminated by calling exit with status '0'
[22:34] * jjgalvez (~jjgalvez@12.248.40.138) Quit (Quit: Leaving.)
[22:35] * vjarjadian (~IceChat7@5ad6d001.bb.sky.com) has joined #ceph
[22:37] <Robe> hah
[22:38] <Robe> fat-fingering pool size/replication settings
[22:46] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Ping timeout: 480 seconds)
[22:48] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[22:49] * iggy_ (~iggy@theiggy.com) has joined #ceph
[22:50] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Remote host closed the connection)
[22:52] * jjgalvez (~jjgalvez@12.248.40.138) has joined #ceph
[22:53] * calebamiles (~caleb@65-183-137-95-dhcp.burlingtontelecom.net) Quit (Quit: Leaving.)
[22:59] <elder> dmick, I'm here, any word on where/how we'll be meeting?
[23:03] <dmick> vidyo, shortly
[23:09] * Steki (~steki@bojanka.net) has joined #ceph
[23:12] * BManojlovic (~steki@212.69.20.139) Quit (Ping timeout: 480 seconds)
[23:13] * ghbizness2 (~ghbizness@host-208-68-233-254.biznesshosting.net) has joined #ceph
[23:13] * ghbizness2 (~ghbizness@host-208-68-233-254.biznesshosting.net) has left #ceph
[23:19] * noob2 (a5a00214@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[23:19] * vjarjadian_ (~IceChat7@5ad6d001.bb.sky.com) has joined #ceph
[23:23] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[23:24] * vjarjadian (~IceChat7@5ad6d001.bb.sky.com) Quit (Ping timeout: 480 seconds)
[23:28] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[23:31] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[23:38] * Steki (~steki@bojanka.net) Quit (Quit: Ja odoh a vi sta 'ocete...)
[23:41] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) has joined #ceph
[23:43] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.