#ceph IRC Log


IRC Log for 2013-02-11

Timestamps are in GMT/BST.

[0:04] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[0:04] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[0:04] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[0:06] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[0:16] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[0:16] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[0:16] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[0:24] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[0:24] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[0:32] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[0:33] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[0:41] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[0:41] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[0:49] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[0:49] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[0:55] * danieagle (~Daniel@177.99.135.238) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[0:57] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[0:57] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[0:59] * ScOut3R (~ScOut3R@2E6B53AC.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[1:07] <CrashHD> how do you increase reps
[1:08] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[1:08] <CrashHD> is it size?
[1:13] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[1:23] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[1:24] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[1:25] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[1:28] * loicd (~loic@2a01:e35:2eba:db10:142c:984e:a283:b04c) Quit (Quit: Leaving.)
[1:28] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:37] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[1:37] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[1:43] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:43] * loicd (~loic@2a01:e35:2eba:db10:120b:a9ff:feb7:cce0) has joined #ceph
[1:45] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[1:45] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[1:49] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[1:52] * LeaChim (~LeaChim@b0faa140.bb.sky.com) Quit (Ping timeout: 480 seconds)
[2:01] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[2:01] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[2:07] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[2:09] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[2:10] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[2:10] <jluis> CrashHD, yes, it's 'size'
[2:12] <CrashHD> gotcha
[2:12] <CrashHD> it's funny
[2:12] <CrashHD> in the osd dump
[2:12] <CrashHD> it's "rep" or something
[2:12] <CrashHD> verbiage is just slightly inconsistent
[2:12] <CrashHD> ahh "rep size"
[2:12] <CrashHD> not so inconsistent I guess
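For reference, the two commands being discussed look roughly like this (the pool name "rbd" is an assumption):

    # show the replica count of each pool; "ceph osd dump" prints it as "rep size"
    ceph osd dump | grep 'rep size'
    # raise the replica count ("size") of the rbd pool to 3
    ceph osd pool set rbd size 3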
[2:13] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:13] <CrashHD> I was surprised when more than 80% of my io test came in at under 1ms
[2:13] <CrashHD> in a full random scenario
[2:13] <CrashHD> granted, in a vm workstation environment against ssd vmdk's
[2:14] <CrashHD> but still
[2:14] <CrashHD> very good
[2:14] <CrashHD> am I missing something?
[2:14] <CrashHD> these are primary storage worthy type numbers
[2:15] * BManojlovic (~steki@105-171-222-85.adsl.verat.net) Quit (Quit: Ja odoh a vi sta 'ocete...)
[2:16] <CrashHD> which I'm surprised to see?
[2:17] <loicd> I would very much like a review on https://github.com/dachary/ceph/commit/ed660ce40fe593c0eab2b32629f8e522ffa4ee93
[2:18] <loicd> in the meantime, I'll get a good night sleep :-)
[2:22] * loicd (~loic@2a01:e35:2eba:db10:120b:a9ff:feb7:cce0) Quit (Quit: Leaving.)
[2:26] <CrashHD> anyone have a good cartoonish ceph image?
[2:27] <CrashHD> not like the drawings
[2:27] <CrashHD> but something clipartish
[2:33] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[2:42] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[2:42] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[2:49] * diegows (~diegows@190.188.190.11) has joined #ceph
[2:50] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[2:50] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[2:58] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[2:59] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[3:00] * diegows (~diegows@190.188.190.11) Quit (Ping timeout: 480 seconds)
[3:11] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:11] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[3:19] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:19] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[3:19] <CrashHD> lol this network hates me
[3:19] <CrashHD> I wonder what's going on
[3:19] <CrashHD> anyone else using an mIRC client?
[3:24] * diegows (~diegows@190.188.190.11) has joined #ceph
[3:36] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:36] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[3:44] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:44] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[3:46] * diegows (~diegows@190.188.190.11) Quit (Ping timeout: 480 seconds)
[3:57] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:57] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[3:58] <phantomcircuit> CrashHD, * CrashHD has quit (Ping timeout: 480 seconds)
[3:58] <phantomcircuit> im guessing comcast is failing you
[3:58] <CrashHD> lol
[3:58] <CrashHD> its not though
[3:58] <CrashHD> I'm connected to two or three other networks
[3:58] <CrashHD> no issue
[3:59] <CrashHD> my ip isn't pingable though
[3:59] <CrashHD> wonder if that is it
[4:06] <ShaunR> CrashHD: i use mirc
[4:07] <ShaunR> I dont think the network uses ping to see if you're alive, they use a form of ping that works through the irc protocol
[4:08] <ShaunR> you got some delay though....
[4:08] <ShaunR> [19:07] [CrashHD PING reply]: 20secs
[4:08] <ShaunR> -
[4:08] <ShaunR> thats what i just got
[4:08] <CrashHD> wow
[4:08] <CrashHD> crazy
[4:08] <CrashHD> hmm just got 1 sec from you
[4:08] <CrashHD> strange
[4:09] <ShaunR> yep i just got back 0 from you
[4:09] <ShaunR> so your connection looks to be inconsistent.
[4:10] <ShaunR> you're on a different server than you were the other day, so it sounds to me like your internet/network connection may be having issues.
[4:10] <ShaunR> since you're blocking icmp i can't try you directly
[4:11] * noob21 (~noob2@pool-71-244-111-36.phlapa.fios.verizon.net) Quit (Quit: Leaving.)
[4:21] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[4:21] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[4:21] <CrashHD> argh
[4:21] <CrashHD> this is crazy
[4:21] <CrashHD> I never have this kind of problem
[4:31] <ShaunR> somebody in #oftc might have more insight into what's going on
[4:31] <CrashHD> good call
[4:32] <CrashHD> anyone seeing sub ms latencies with their ceph install?
[4:32] <CrashHD> I'm just baffled at my last test
[4:32] <CrashHD> against rbd
[4:32] <CrashHD> with a 10gb volume
[4:32] <CrashHD> random read
[4:32] <CrashHD> avg latency = 769 usec
[4:49] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[4:49] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[4:57] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[4:57] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[5:00] <infernix> Average Latency: 0.336671
[5:01] <CrashHD> nice
[5:02] <infernix> Min latency: 0.115112
[5:02] <infernix> Average Latency: 0.127647
[5:02] <CrashHD> rbd?
[5:02] <CrashHD> how many osds?
[5:02] <infernix> 48, rados bench
[5:03] <infernix> rados -p rbd bench 10 write -t 16 -b 1024768
[5:03] <infernix> Average Latency: 0.160141
[5:03] <infernix> i have a bandwidth issue though, still looking into it
[5:03] <infernix> just set it up
[5:08] <CrashHD> it took me all day to setup 3 nodes lol
[5:08] <CrashHD> I didn't see much in the way of puppet work
[5:08] <infernix> 59 osds over 5 nodes
[5:09] <infernix> rados -p rbd bench 10 write -t 256 -b 4194304
[5:09] <infernix> Bandwidth (MB/sec): 349.754
[5:09] * infernix grmbls
[5:10] <infernix> [ 3] 0.0-10.0 sec 12.8 GBytes 11.0 Gbits/sec on iperf
[5:12] <CrashHD> 348 us what 8Gb
[5:12] <CrashHD> 358M(B)/s is more like 6G(b)
[5:13] <CrashHD> 120MB/s for every 1Gb or so
[5:13] <CrashHD> so 4G(b)
[5:13] <CrashHD> hmm seems like a lot of network overhead
[5:13] <infernix> no that's not it
[5:13] <infernix> i've pushed 1.5gbyte before
[5:15] <infernix> i can read from each of the 12 disks with 180mb/sec concurrently. so no bottleneck in the HBA
[5:23] <CrashHD> interesting
[5:23] <CrashHD> 180mb/sec
[5:23] <CrashHD> wow
[5:24] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has left #ceph
[5:25] <infernix> i need to get to my benchmarker code on my dev box but it's at the office and it's off
[5:25] <infernix> meh
[5:37] <infernix> Bandwidth (MB/sec): 1517.565
[5:37] <infernix> reads are fine
[5:38] <infernix> Bandwidth (MB/sec): 1718.333 with 65k mtu
[5:41] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[5:43] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has left #ceph
[5:52] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[5:54] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[5:57] <infernix> 59 OSDs with XFS, 12 per host, and only 330MB/s average write
[6:08] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[6:08] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[6:08] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[6:17] <infernix> 1850MB reads with rbd and 16 random io threads, directio
[6:17] <infernix> it's just the writes that still suck
[6:18] <infernix> i mean 350mbyte/s over 59 OSDS is only 6MByte/s
[6:19] <infernix> aha. but if I take 2 rbd devices to read from i get 2.8gbyte/s reads
[6:20] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[6:20] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[6:20] <infernix> yet the same test with writes and it doesn't scale at all. ~210mb/sec
[6:21] <infernix> what could it be >.<
[6:25] <phantomcircuit> infernix, reads are coming back from cache
[6:26] <phantomcircuit> writes are going to disk
[6:26] <phantomcircuit> infernix, how many placement groups?
[6:27] <phantomcircuit> infernix, copypasta the pool line for rbd from ceph osd dump
[6:27] <infernix> phantomcircuit: just made a new pool with 2900
[6:27] <infernix> and i doubt that all of it is coming from the cache, but i'll rerun with dropped caches
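Clearing the Linux page cache on the OSD hosts before re-reading is usually done like this (run as root):

    sync                                # flush dirty pages to disk first
    echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes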
[6:28] <infernix> ph.
[6:28] <infernix> i just killed the rbd test box
[6:28] <infernix> didn't like to bench on that new pool
[6:30] <infernix> pool 3 'rbd' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 2900 pgp_num 2900 last_change 278 owner 0
[6:31] <infernix> for what it's worth i'm running on double QDR infiniband
[6:31] <infernix> so 3.2gbyte/s should be doable per ib nic
[6:34] <phantomcircuit> infernix, can you dump the crush ruleset?
[6:34] <phantomcircuit> http://ceph.com/docs/master/rados/operations/crush-map/#getcrushmap
[6:35] <infernix> phantomcircuit: http://pastebin.ca/2312720
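The decompile step behind the link above is roughly:

    ceph osd getcrushmap -o crushmap.bin        # fetch the compiled CRUSH map from the cluster
    crushtool -d crushmap.bin -o crushmap.txt   # decompile it into editable text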
[6:36] <infernix> ok, so been writing 350mb of random data to 2 rbd devices for a few mins
[6:36] <infernix> dropping caches
[6:37] <phantomcircuit> infernix, what's the network topology ?
[6:38] <infernix> 950mb/s :|
[6:38] <infernix> QDR infiniband
[6:38] <phantomcircuit> well i mean you have 5 hosts here
[6:38] <infernix> eg 40gbit, but let's say effectively about 17gbit
[6:38] <infernix> per host
[6:38] <phantomcircuit> are they connected to each other or are they in a star topology
[6:38] <infernix> it's fat tree
[6:39] <infernix> each host has the full 40gbit
[6:39] <infernix> including the test node
[6:39] <infernix> i've pushed 1.5gbyte/s with less nodes and less disks before
[6:39] <phantomcircuit> weird
[6:40] <infernix> meh, i need to sleep over it and get my benchmark data from my dev box
[6:45] <infernix> the slowest osd bench does 40mbyte/s
[6:46] <infernix> 59 disks 2 replicas = at least 1.1gb/s writes
[6:46] <infernix> no idea why i'm not going past 350-400
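The per-OSD benchmark infernix mentions can be run from an admin node roughly like this (the osd id is arbitrary, the exact spelling of the tell command varies between releases, and the result shows up in the cluster log):

    ceph osd tell 0 bench    # ask osd.0 to write a test workload and report its throughput
    ceph -w                  # watch the cluster log for the bench result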
[6:47] <Vjarjadian> how many disks and ram per host? iirc ram usage shoots up when accessing the OSDs
[6:49] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[6:49] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[6:49] <infernix> 32GB ram
[6:49] <infernix> 12 per host
[6:50] <infernix> 1 6-core E5-2620 (i.e. 12 hyperthreading threads)
[6:50] <Vjarjadian> nice setup
[6:50] <infernix> ok i'm seeing 600MB with 3x rados bench write now
[6:51] <infernix> yeah i had to sacrifice a node too
[6:51] <infernix> planned on 72 osds
[6:52] <infernix> need to write with 1gb, read with 2.5+
[6:52] <infernix> ok 860mb with 5x rados bench
[6:53] <Vjarjadian> you must have a real heavy application to need that sort of performance...
[6:54] <Vjarjadian> most of the networks i work with are built more for low power consumption... nothing like that
[6:57] <infernix> :)
[6:58] <infernix> it's cool stuff but i've some more work to do to get it ready for prime time
[6:58] <infernix> but sleep. really this time.
[6:58] * infernix &
[7:09] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:09] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[7:17] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:17] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[7:23] * capri (~capri@212.218.127.222) has joined #ceph
[7:25] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:26] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[7:34] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:34] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[7:36] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[7:38] * sleinen1 (~Adium@2001:620:0:26:810d:bd27:b041:49ed) has joined #ceph
[7:44] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[7:54] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:54] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[7:59] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[8:01] * sleinen1 (~Adium@2001:620:0:26:810d:bd27:b041:49ed) Quit (Quit: Leaving.)
[8:02] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:02] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[8:10] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:11] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[8:19] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:19] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[8:27] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:27] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[8:33] <madkiss> ahum
[8:35] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:35] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[8:36] * Morg (b2f95a11@ircip2.mibbit.com) has joined #ceph
[8:37] <madkiss> I have a three-node cluster here
[8:37] <madkiss> where I can not start any MDSes
[8:38] <madkiss> 0> 2013-02-11 08:37:19.012034 7fd52c8d2700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7fd52c8d2700 time 2013-02-11 08:37:19.010463
[8:38] <madkiss> mds/MDSTable.cc: 150: FAILED assert(0)
[8:40] <dignus> morning
[8:43] <madkiss> hello dignus
[8:51] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:51] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[8:54] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[8:57] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[9:02] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[9:04] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Read error: Connection reset by peer)
[9:05] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:09] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[9:13] * hybrid512 (~w.moghrab@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:17] * loicd (~loic@lvs-gateway1.teclib.net) has joined #ceph
[9:24] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[9:26] * andret (~andre@pcandre.nine.ch) has joined #ceph
[9:28] * leseb (~leseb@stoneit.xs4all.nl) has joined #ceph
[9:28] * leseb (~leseb@stoneit.xs4all.nl) Quit (Remote host closed the connection)
[9:28] * leseb (~leseb@2001:980:759b:1:2958:e655:e458:7165) has joined #ceph
[9:34] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:49] * jtang1 (~jtang@79.97.135.214) Quit (Quit: Leaving.)
[9:51] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[9:51] <capri> http://ceph.com/community/ceph-bobtail-jbod-performance-tuning/
[9:56] * dosaboy (~user1@host86-164-229-186.range86-164.btcentralplus.com) has joined #ceph
[10:03] * hybrid512 (~w.moghrab@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Read error: Connection reset by peer)
[10:09] * LeaChim (~LeaChim@b0faa140.bb.sky.com) has joined #ceph
[10:32] * phillipp (~phil@p5B3AF3EE.dip.t-dialin.net) has joined #ceph
[10:37] * phillipp1 (~phil@p5B3AF543.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[10:46] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[10:52] * hybrid512 (~w.moghrab@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[11:52] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[12:01] * scalability-junk (~stp@188-193-201-35-dynip.superkabel.de) has joined #ceph
[12:12] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:22] * BillK (~BillK@124-169-186-232.dyn.iinet.net.au) has joined #ceph
[12:25] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[12:41] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[12:44] * danny (~danny@charybdis-ext.suse.de) has joined #ceph
[12:46] * danny (~danny@charybdis-ext.suse.de) Quit ()
[12:47] * danny (~danny@charybdis-ext.suse.de) has joined #ceph
[12:49] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[12:51] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[12:52] * danny (~danny@charybdis-ext.suse.de) has left #ceph
[12:52] * dalgaaf (~danny@charybdis-ext.suse.de) has joined #ceph
[12:55] * low (~low@188.165.111.2) has joined #ceph
[13:01] * dalgaaf (~danny@charybdis-ext.suse.de) has left #ceph
[13:01] * dalgaaf (~dalgaaf@nat.nue.novell.com) has joined #ceph
[13:08] * dalgaaf (~dalgaaf@nat.nue.novell.com) Quit (Quit: Konversation terminated!)
[13:11] * dalgaaf (~dalgaaf@charybdis-ext.suse.de) has joined #ceph
[13:30] * diegows (~diegows@190.188.190.11) has joined #ceph
[13:38] * leseb (~leseb@2001:980:759b:1:2958:e655:e458:7165) Quit (Remote host closed the connection)
[13:42] * leseb (~leseb@mx00.stone-it.com) has joined #ceph
[13:45] * jks (~jks@3e6b7199.rev.stofanet.dk) Quit (Ping timeout: 480 seconds)
[13:51] * jks (~jks@4810ds1-ns.2.fullrate.dk) has joined #ceph
[13:55] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[14:11] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:13] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[14:25] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[14:26] * sleinen1 (~Adium@2001:620:0:25:d545:5c83:cd50:b6b3) has joined #ceph
[14:27] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[14:28] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Read error: Connection reset by peer)
[14:28] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[14:32] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[14:32] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Read error: Connection reset by peer)
[14:32] * tryggvil_ is now known as tryggvil
[14:33] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[14:38] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[14:43] * fghaas (~florian@91-119-222-199.dynamic.xdsl-line.inode.at) has joined #ceph
[14:43] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Read error: Connection reset by peer)
[14:46] * aliguori (~anthony@cpe-70-112-157-151.austin.res.rr.com) has joined #ceph
[14:57] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:59] <loicd> I see that http://tracker.ceph.com/issues/4070 is not closed or linked to https://github.com/ceph/ceph/pull/44 although it was merged. I would be happy to close it myself but it seems I don't have enough redmine permissions for that. Should I ask for permission or should I just wait for someone else to do it ?
[15:02] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[15:04] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[15:09] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[15:20] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[15:22] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[15:32] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Remote host closed the connection)
[15:33] * Morg (b2f95a11@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[15:39] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Quit: Leaving.)
[15:39] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[16:01] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[16:06] * noob2 (~noob2@ext.cscinfo.com) has joined #ceph
[16:07] * jks (~jks@4810ds1-ns.2.fullrate.dk) Quit (Ping timeout: 480 seconds)
[16:10] * jks (~jks@4810ds1-ns.0.fullrate.dk) has joined #ceph
[16:17] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Remote host closed the connection)
[16:22] * wer (~wer@wer.youfarted.net) has joined #ceph
[16:22] * phillipp (~phil@p5B3AF3EE.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[16:25] * phillipp (~phil@p5B3AF189.dip.t-dialin.net) has joined #ceph
[16:38] * vata (~vata@2607:fad8:4:6:80fc:1aae:f39b:dcbe) has joined #ceph
[16:40] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[16:41] <jks> sage: you here?
[16:43] <joshd1> loicd: I closed it, not sure who grants redmine permissions (rturk? sagewk?)
[16:44] <loicd> joshd1: thanks !
[16:46] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[16:47] <loicd> joshd1: I'm happily working on buffer.{cc,h} unit tests . https://github.com/ceph/ceph/pull/41/files It already is a large diff and will at least double before I'm done. I'm tempted to submit it as a single pull request because it is a whole. But it will be tedious to review. Would you advise me to submit it as multip smaller patches instead ?
[16:48] <loicd> s/multip/multiple/
[16:50] <joshd1> loicd: I was just looking at those :) smaller patches would be nice if they're fixing bugs, or doing things other than adding tests, but I'm fine with a large diff
[16:51] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) has joined #ceph
[16:51] <loicd> joshd1: ok. I'll keep adding to it then :-)
[16:51] <loicd> joshd1: anything you would like me to do differently on https://github.com/ceph/ceph/pull/41/files ?
[16:54] <joshd1> loicd: no, looks good so far
[17:00] <noob2> anyone have experience with serving an rbd out over nfs?
[17:01] <absynth_> yeah. works.
[17:01] <noob2> that's what i figured :)
[17:01] <noob2> i'm bonnie++ testing it now
[17:02] <absynth_> i said "works". not "runs". ;)
[17:02] <noob2> lol
[17:02] <absynth_> (that pun is better in German)
[17:02] <noob2> so far so good
[17:02] <noob2> my rados gw machine is serving it up
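A rough sketch of that kind of NFS re-export, assuming a kernel-mapped image called nfsvol in the rbd pool and an /export mount point (all names and paths here are assumptions, and the device node may be /dev/rbdN if the udev rules aren't installed):

    rbd map rbd/nfsvol
    mkfs.xfs /dev/rbd/rbd/nfsvol
    mkdir -p /export/nfsvol
    mount /dev/rbd/rbd/nfsvol /export/nfsvol
    echo "/export/nfsvol *(rw,no_subtree_check)" >> /etc/exports
    exportfs -ra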
[17:04] <paravoid> is there a way to make radosgw's log a little less verbose?
[17:05] <paravoid> debug radosgw 30?
[17:05] <absynth_> paravoid: did you make any headway on that "wrongly marked me down" issue yet?
[17:07] <paravoid> no
[17:07] <paravoid> it's been stable for the past few days
[17:08] <paravoid> sage said that you tracked it down?
[17:08] <absynth_> yeah
[17:08] * BillK (~BillK@124-169-186-232.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[17:08] <absynth_> actually, a connected issue in the peering code, i think
[17:08] <absynth_> he suspected there might be a connection
[17:12] <topro> with no path settings in ceph.conf, do MONs, MDSs or OSDs write to locations other than /var/lib/ceph?
[17:12] <fghaas> topro: they shouldn't, at least not if you're using debian packages
[17:13] <topro> fghaas: debian packages from ceph.com/debian-bobtail/
[17:13] <fghaas> (haven't used the rpms yet, personally)
[17:13] <wer> my 260tb test cluster is in trouble this morning. two of the mons are not letting me log in and the whole cluster is not happy :(
[17:13] <fghaas> topro: yeah. why, are you observing that they're writing elsewhere?
[17:14] <fghaas> wer: how many mons have you got, and how many of them are down?
[17:14] <wer> fghaas: I have three and two are down.
[17:14] <topro> just asking as I mount an ssd to /var/lib/ceph to handle MON and MDS IO, and then one separate 10k spinning disk for each OSD. But I just get heavy IO load on my /var mounted disk
[17:14] <fghaas> wer: then the remaining one isn't quorate. you'll have to fire at least one of the mons back up
[17:14] <topro> and wonder if it's from a ceph daemon
[17:15] <fghaas> are they failing with asserts, or why won't they start?
[17:16] <wer> fghaas: yeah, I am going to power cycle them and see what happens. Strange that I would lose two over the weekend. I had one node marked noout over the weekend as I was in the middle of transitioning to another interface on the box.... So it was down. hmm.
[17:16] <fghaas> wer, well if a mon is down you should at least see some traces in the logs, provided you're logging at all
[17:17] <fghaas> or do you mean you can't even ssh into the boxes?
[17:17] * low (~low@188.165.111.2) Quit (Quit: Leaving)
[17:17] <noob2> fghaas: you wrote the article on ceph over iscsi right?
[17:17] <fghaas> noob2: ceph over iscsi? you mean rbd exported as an iscsi target?
[17:17] <wer> 1 mon.f@2(probing) e17 discarding message auth(proto 0 27 bytes epoch 17) v1 and sending client elsewhere; we are not in quorum over and over since yesterday.
[17:17] <noob2> right
[17:18] <fghaas> noob2: yeah, that was me
[17:18] <noob2> did you encounter any issues with high load on the proxy machine causing it to kernel panic?
[17:18] <noob2> i can kernel panic my ubuntu 12.10 proxy node that runs fibre channel if i put enough strain on it
[17:18] <noob2> well enough strain + downing a few osd's to make the cluster slow down in response
[17:19] <loicd> fghaas: good afternoon sir :-)
[17:19] <jluis> wer, that message should go away when you have enough monitors up to form quorum
[17:19] <jluis> should go away after they form a quorum, to be more precise
[17:19] <absynth_> but can 2 mons form a quorum? i never quite understood that
[17:19] <jluis> absynth_, if you have a total of 3, yeah
[17:20] <jluis> a quorum is composed by a majority of mons
[17:20] <noob2> absynth: i think 2 mons can figure out what to do but not form a quorum
[17:20] <noob2> i'm not entirely sure
[17:20] <jluis> depends on how many monitors you have in the monmap really
[17:20] <jluis> if you have only 1, the cluster will work just fine; with 2, you'll need them both up, as 1 monitor is not the majority
[17:21] <jluis> with 3 monitors, you'll need at least 2 to form a quorum
[17:21] <jluis> with 4, you'll still need 3
[17:21] <fghaas> loicd: bonjour :)
[17:21] <jluis> q = floor(m/2) + 1
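Two quick ways to see the current quorum from a shell:

    ceph quorum_status   # JSON dump of the election epoch and the mons currently in quorum
    ceph mon stat        # one-line summary of the monmap and quorum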
[17:21] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[17:22] <fghaas> noob2: is that panic in rbd, libceph, or some tcm module?
[17:22] * BillK (~BillK@58-7-100-185.dyn.iinet.net.au) has joined #ceph
[17:22] <jluis> absynth_, does this make sense to you? :)
[17:22] <noob2> the tcm module
[17:22] <noob2> target seems to panic
[17:23] <noob2> rbd performs just fine :)
[17:23] <fghaas> noob2: then talk to nab :)
[17:23] * capri (~capri@212.218.127.222) Quit (Ping timeout: 480 seconds)
[17:23] <fghaas> I've only ever tested with iscsi, not fc
[17:23] <noob2> yeah i've been talking to him on and off. he thinks the latency in the ceph cluster can push target into buggy code
[17:24] <noob2> how much did you beat up the proxy over iscsi? did you really pound on it?
[17:25] <absynth_> jluis: yeah, i had a wrong train of thought
[17:25] <absynth_> or maybe not
[17:25] <absynth_> what happens if 2 mons are up, but both have a different view of the cluster?
[17:25] <absynth_> they could never reach a quorum on what's right
[17:27] <jluis> well, if they happen to know about each other, they will exchange monmaps when they are probing for other monitors
[17:27] <jluis> that *should* make one of them obtain a more recent monmap
[17:28] <absynth_> ok, and then they have quorum, because "who cares what the third guy says, he's not here anyway"
[17:29] <fghaas> noob2: not that much, no, it was intended as a proof of concept
[17:29] * BMDan (~BMDan@69.174.51.46) has joined #ceph
[17:29] <fghaas> have you seen the same kernel panics regardless of frontend?
[17:29] <noob2> i gotcha
[17:29] <jluis> okay, let me make sure I get your idea: you have 3 monitors, each one has a different version of the monmap? or a different monmap altogether?
[17:29] <noob2> yeah and regardless of kernel version also
[17:29] <noob2> i've used every vanilla kernel up to the latest
[17:29] <jluis> or just one of them a different map or a different version?
[17:30] <jluis> all of them should be dealt with in a different fashion, and if not then we must have a bug
[17:30] <ShaunR> so with ceph right out of the box, are there any performance optimizations that should be done specifically for env's that are using rbd with qemu/kvm vm's running on it?
[17:30] <jluis> and that just gave me an idea for a new monitor test
[17:30] <wer> I am getting a crapload of [DBG] osd.85 10.5.0.192:6839/14692 failure report canceled by osd.53 10.5.0.191:6815/25571... and from osd.49. in the mon's logs now. I don't know quite what that means but seemingly all osd's are chiming in with osd.53...
[17:31] * jluis is now known as joao
[17:31] <wer> if that makes sens :)
[17:32] <BMDan> Last week, several people suggested putting multiple OSDs on a single physical node. As I'm about to go into doing this, though, I can't find anything about defining failure groups—that is, saying, "Don't replicate osd.1's data to osd.2, because they're on the same box." I'm sure this is a lack of Google-fu, but can someone throw me a bone, at least what search terms to use?
[17:32] * jlogan1 (~Thunderbi@2600:c00:3010:1:c1d0:fd9:21b5:8830) has joined #ceph
[17:33] <absynth_> BMDan: from the current release, the default crush map distributes replicas over nodes, not osds
[17:33] <absynth_> in other words,
[17:33] * capri (~capri@212.218.127.222) has joined #ceph
[17:34] * carsona (~carsona@office.betterlinux.com) Quit (Quit: leaving)
[17:34] <absynth_> i have no other words
[17:34] <BMDan> lol
[17:34] <BMDan> I was on the edge of my seat!
[17:34] <janos> lol
[17:34] <absynth_> i was looking for the config setting
[17:34] <absynth_> but alas, can't find it currently
[17:35] <joao> crushmap, 'choose[leaf]' parameter on the pool's
[17:35] <joao> I think
[17:35] * loicd (~loic@lvs-gateway1.teclib.net) Quit (Ping timeout: 480 seconds)
[17:35] <absynth_> yeah, exactly
[17:35] <absynth_> it should be set to host
[17:35] <joao> there's some docs on that
[17:35] <BMDan> At the risk of asking something dumb: so you're saying if I just do it, it'll Just Work™?
[17:35] <janos> steve obs will sue you
[17:35] <joao> http://ceph.com/docs/master/rados/operations/crush-map/
[17:35] <janos> *jobs
[17:36] <absynth_> at the risk of answering something dumb: probably
[17:36] <absynth_> (tm)
[17:36] <wer> :)
[17:36] <ShaunR> Here are some FIO tests i ran, one is against a local raid10 4 sata disk array with an image based filesystem and the other against a two server ceph cluster with each server having a raid 10 4 sata disk array. http://pastebin.ca/2312860 http://pastebin.ca/2312862
[17:36] <wer> I am going to shoot osd.49 and 53 in the head.... I can't see in their logs that they are any weirder than the others.... the mons are chatting about them too much.
[17:37] <joao> BMDan, assuming you set the osds as being on the same host, and that you are using the latest release, then yeah, it should just work
[17:37] <joao> if you are running an earlier version, then you should edit your crushmap
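The relevant stanza in a decompiled crushmap looks roughly like the following; "type host" is what keeps two replicas of the same PG off a single box (the rule name and numbers are just the usual defaults):

    rule data {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type host
            step emit
    }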
[17:37] <BMDan> Good to go on that front; everything's the latest version.
[17:38] <joao> BMDan, couldn't hurt to check your crushmap just to make sure though
[17:38] <joao> I forget when that was introduced
[17:38] <BMDan> Yeah, I will; I'm going to blow away config in the course of doing this, anyway, since I want a clean test.
[17:38] <BMDan> That is, not trying to preserve data.
[17:39] <absynth_> then ceph is your man!
[17:39] <absynth_> :>
[17:39] <absynth_> (scnr)
[17:40] <BMDan> Y'know what'd make a spiffy option that would save me a lot of scripting that I'm about to have to do? optional if [ ! -e $osd_datapath ]; then mkdir $osd_datapath; fi
[17:41] <BMDan> If nobody says, "Yeah! Lemme do that real quick!" then I'll submit a ticket for it.
[17:41] <BMDan> Just a thought. :)
[17:41] <wer> anyone know what this means? I think this is what brought down the mons.... osd.5 10.5.0.189:6857/13449 failure report canceled by osd.49 10.5.0.191:6803/25158 over and over.
[17:42] <wer> BMDan: I had to write some nifty scripts for all that stuff.... mkfs'ing and mounting. Nifty things....
[17:42] <wer> fstab and config generation.....etc.
[17:44] <wer> first I wasn't allowed to name osd's the way I wanted :) Then later after an update, ceph would only add additional osd's itself following its convention.... so I had to conform to that.... But once I figured out what ceph wanted to do, I adjusted (being a simple human) and wrote some nifty bash. Now I can make a node in under 10 minutes with 24 osd's on it.
[17:46] <BMDan> Speaking of naming OSD's… do they need to be sequential? Asking because I have twelve disks per node, so I'm thinking 1xx, 2xx, 3xx, 4xx for my four physical boxen.
[17:46] <absynth_> you have 24 spinners per host?
[17:46] <absynth_> BMDan: yep, they do, or at least it is strongly discouraged to do what you are planning to do
[17:46] <BMDan> Grumble grumble grumble.
[17:46] <wer> BMDan: yes.
[17:46] <absynth_> we did it like that ourselves, but alas, the powers that be discourage it
[17:46] <janos> same
[17:46] <BMDan> grumblegrumblegrumblegrumble
[17:47] <wer> you can't skip... and the only way I could make more was to create (with no naming options) and ceph would give me the next sequential.
[17:47] <absynth_> you can create osds that never join
[17:47] <BMDan> ...
[17:47] <janos> the crushmap export looks prety funny when you do that
[17:47] <BMDan> Kind of hate you a little, now, absynth.
[17:47] <absynth_> but then, for a 4XX 5XX 6XX naming convention, you have hundreds of empty OSDs
[17:47] <absynth_> which is... not optimal :)
[17:47] <slang1> BMDan: re mkdir osddata, I thought mkcephfs did that already...are you having to make the osd dirs yourself?
[17:47] <absynth_> BMDan: i'll cope
[17:47] <janos> i have 1xx and 2xx and the crushmap has all those empty entries in it
[17:48] <wer> yup. It is an adjustment to figure out what it wants to do..... being a simple human and all.
[17:48] <wer> absynth_: yes 24 spinners per node.
[17:48] <BMDan> slang1: Well, I know it won't empty them itself. I seem to recall it yelled at me when they didn't exist, though, too.
[17:49] <absynth_> wer: is that working out?
[17:49] <wer> absynth_: until this morning it had been stable for weeks. Performance was acceptable.
[17:50] <wer> right now it is eff'ed cause I moved the mons off those nodes....
[17:50] <wer> e.f.f.'d
[17:51] <slang1> BMDan: ah yeah it doesn't want to delete old data, but it should create them if they're not there
[17:51] <wer> I think I need to shoot these two osd's that are showing up in the logs a lot.... but I don't know why.... response on the commandline for ceph is crazy slow right now and I lost two mons over the weekend.
[17:51] <slang1> test -d $osd_data || mkdir -p $osd_data
[17:51] <BMDan> Okay, last question before I generate this config: any good reason to go with either (in a two-node, four-osd cluster) osd.1=host1, osd.2=host2, osd.3=host1, osd.4=host2 or the alternative of osd.1=host1, osd.2=host1, osd.3=host2, osd.4=host2 ?
[17:51] <absynth_> you usually want the latter
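In mkcephfs-style ceph.conf terms, "the latter" layout would look something like this (hostnames are assumptions):

    [osd.1]
            host = host1
    [osd.2]
            host = host1
    [osd.3]
            host = host2
    [osd.4]
            host = host2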
[17:52] <slang1> k:q
[17:52] * slang1 argh
[17:52] <BMDan> slang1: [ -d $osd_data ] || mkdir -p $osd_data is slightly shorter. Though I'd recommend using "-e"; after all, if it exists but is a file, running mkdir probably will not result in your expected failure mode.
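Spelled out, the more defensive check BMDan is describing would be something like this (same variable name as the script):

    # handle the exists-but-is-a-file case explicitly instead of relying on mkdir's error
    if [ -e "$osd_data" ] && [ ! -d "$osd_data" ]; then
        echo "error: $osd_data exists but is not a directory" >&2
        exit 1
    fi
    mkdir -p "$osd_data"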
[17:53] <BMDan> wer: You can manually pull down those OSDs (that is, pull them out cleanly, one at a time), then reinsert them and see if they de-f*** themselves.
[17:53] <slang1> BMDan: by all means, if you have changes to the mkcephfs script post them to the mailing list as a patch or submit a pull request via github
[17:53] <BMDan> slang1: Ah, thought you were quoting theory, not the actual script. In that case, I will. :)
[17:54] <slang1> BMDan: cool, thanks!
[17:55] * capri (~capri@212.218.127.222) Quit (Ping timeout: 480 seconds)
[17:57] <wer> 1 of my mons is still not in the quorum.....
[17:58] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:58] <wer> man this is suckish.
[17:59] <joao> wer, what's happening on said monitor?
[17:59] <wer> ok since restarting those two osd's ceph is being a little more responsive.....
[17:59] <wer> joao: I had power managed it and it came up.....
[18:00] <wer> osd.8 10.5.0.189:6866/14182 failure report canceled by osd.53 10.5.0.191:6815/25571 saying things like this over and over....
[18:01] <wer> 2013-02-11 17:01:18.545657 7f2339a04700 1 mon.e@1(electing).elector(663) init, last seen epoch 663
[18:02] <joao> wer, when mon.e finishes electing, it will be in-quorum
[18:02] <absynth_> uh, just to rule out the obvious: network issues? datetime issues?
[18:02] <joao> if it never ends up in-quorum, let me know
[18:02] <joao> grabbing some coffee
[18:02] <joao> brb
[18:03] <wer> joao: thanks. will do. grabbing coffee too :)
[18:04] <infernix> so i'm kind of stuck at 350mb writes per rados bench instance
[18:05] <BMDan> slang1: 2013-02-11 12:04:35.296578 7f02b6371780 -1 ** ERROR: error creating empty object store in /opt/data/osd.10: (2) No such file or directory
[18:05] <infernix> has any testing by anyone seen higher writes?
[18:05] <infernix> i'm curious where the bottleneck is
[18:05] <BMDan> Do I have a broken/old copy of mkcephfs relative to yours? I'm on the latest released version.
[18:05] * capri (~capri@212.218.127.222) has joined #ceph
[18:10] * BillK (~BillK@58-7-100-185.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:14] <ircolle> I'm pretty excited to see that everyone on the ceph-dev mailing list won $250k USD, at least that's what Mrs Josephine Rose is telling us… ;-)
[18:15] <slang1> BMDan: possibly. what's your mkcephfs command line?
[18:15] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[18:17] * fghaas (~florian@91-119-222-199.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[18:21] * The_Bishop_ (~bishop@e179018225.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[18:21] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:21] <wer> infernix: I have written enough to max out a 1gig link.... changing to 10gig as we speak so I am looking for the next bottleneck as well... I was getting close to 1gig per rados instance.... and rados lives on each node.
[18:22] * The_Bishop (~bishop@e179018225.adsl.alicedsl.de) has joined #ceph
[18:24] * BillK (~BillK@124-148-77-66.dyn.iinet.net.au) has joined #ceph
[18:25] * The_Bishop_ (~bishop@e179013183.adsl.alicedsl.de) has joined #ceph
[18:25] <scuttlemonkey> ircolle: time for a vacation...thanks western union! :P
[18:26] * leseb (~leseb@mx00.stone-it.com) Quit (Remote host closed the connection)
[18:26] * dalgaaf (~dalgaaf@charybdis-ext.suse.de) Quit (Quit: Konversation terminated!)
[18:27] * sagewk (~sage@2607:f298:a:607:799c:4aca:4834:466d) has joined #ceph
[18:32] * gregaf (~Adium@2607:f298:a:607:7939:b194:6ed2:5b12) has joined #ceph
[18:32] * The_Bishop (~bishop@e179018225.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[18:35] * sagewk (~sage@2607:f298:a:607:799c:4aca:4834:466d) Quit (Remote host closed the connection)
[18:36] * gaveen (~gaveen@112.135.130.211) has joined #ceph
[18:37] <junglebells> 5~3~7~~
[18:37] <junglebells> Sorry for the spam, having some term issues :)
[18:39] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[18:40] <wer> joao: my mon is still not in.
[18:40] <junglebells> infernix: Sorry, I'm pretty much capped at 120MB/s with 3 nodes, 3 OSDs, and 10gig :(. I think I should have gone "jbod" instead of having OSD's on top of RAID 0+1
[18:40] <infernix> wer: bit or byte?
[18:41] <wer> 1gig links..... 850mbps ish.... per radosgw.
[18:41] <wer> about what you would expect when network is the cap...
[18:42] <wer> I also did some tsung on the reads and had similar performance in the other direction.
[18:42] <wer> my setup has 24 osd's per node though.
[18:42] <joao> wer, which version are you running?
[18:42] <joao> wer, also, what does 'ceph -s' show?
[18:43] <infernix> yeah i'm at 59 OSDs, 12 per box, and 15gbit link speeds (inifiniband)
[18:43] <wer> joao: 0.56.1 and hold please....
[18:46] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[18:46] <wer> joao: http://pastebin.com/7Yd5npPt keeping in mind I have 1 node down and noout'd.
[18:46] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[18:47] * sagewk (~sage@2607:f298:a:607:799c:4aca:4834:466d) has joined #ceph
[18:47] <wer> joao: crap... I may have a firewall issue.... one sec.
[18:47] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[18:47] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:47] * ScOut3R (~scout3r@5400CAE0.dsl.pool.telekom.hu) has joined #ceph
[18:48] <wer> joao: I'm an idiot.
[18:49] <noob2> anyone else have iop data from their cluster?
[18:49] <noob2> mine shows about 1,000 iops
[18:49] <joao> wer, a firewall issue then? :)
[18:49] <noob2> i have 6x 12 sata drives
[18:50] <noob2> and a raid0 for each drive configuration
[18:50] <wer> joao: yeah. I have firehol startup automatically in this image. Since I had to reboot them it was up.... everything is snappy now.
[18:50] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[18:50] <wer> everything is responsive now..... now the only lingering question in my mind is why these mons get to be so unhappy. One was chatty about osd.49 and the other about osd.53. I have restarted both osd's.
[18:51] <junglebells> noob2: That's about what I'm seeing as well, 800-1000 during rados bench write
[18:51] <noob2> gotcha
[18:51] <noob2> ok cool
[18:51] <noob2> yeah it depends on what else is going on but about 900-1000
[18:52] <junglebells> noob2: On the same token, I'm also very new to the ceph game and I'm currently running on top of 3 OSD's each with 4x2TB in RAID 0+1. I think I should be using each disk as an OSD though...
[18:52] <wer> joao: the mons are on old hardware so it is possible I am just unlucky.... hmm. I don't really know as there was nothing else telling on the box from any of the logs other than the two osd's I mentioned.
[18:53] <noob2> junglebells: yeah that is the recommendation everyone makes
[18:53] <noob2> i haven't tried it any other way though
[18:53] <junglebells> My read/write speed was incredible on the LSI raid that I have but I'm seeing ~1/2 that performance now testing speeds with rados bench and rbd
[18:55] <junglebells> noob2: I basically have 3 days to get this shiznat flying as I have a MySQL db cluster that's about to run out of space :) After a lot of research it seemed like ceph would likely be my best solution. I got my hardware request fulfilled last week and have been tinkering since
[18:56] * skinder (~skinder@23.31.92.81) has joined #ceph
[18:56] <wer> junglebells: .... I run each disk as an osd... just makes life easier IMO. I also just keep the journal there two... but I am not worried about latency.
[18:56] <wer> *too
[18:56] <noob2> junglebells: how are you going to export your storage to your mysql box?
[18:56] <noob2> just out of curiosity
[18:57] <junglebells> Well since I have both innodb and myisam tables to support, I intend to mount up the rbd on each of my two mysql nodes that are needed (one master, one slave) + mysql proxy for load balancing
[18:58] * fghaas (~florian@91-119-222-199.dynamic.xdsl-line.inode.at) has joined #ceph
[18:58] <junglebells> This storage array is also intended to be the basis for moving to virtualization. So it's a little overkill (besides the space) for what I'm doing now but it's going to be needed shortly
[19:00] <junglebells> Specifically, I have some developers who have pigeonholed us into supporting a database that has one table that's currently at 700GB and growing by 0.9-1.1GB/day. Sharding (at this point) isn't an option. 1) Because we use the FOSS version of MySQL and 2) [at the application layer] because of time constraints.
[19:00] * noob21 (~noob2@ext.cscinfo.com) has joined #ceph
[19:01] <darkfader> junglebells: running is not an option? :))
[19:02] <skinder> i'm trying to compile ceph from the hadoop-common/cephfs/branch-1.0/ branch, and the CephTalker.java is tossing an error: "cannot find symbol : method get_stripe_unit_granularity(), location: class com.ceph.fs.CephMount". i checked, and the libcephfs jar I have, libcephfs-0.56.2, the CephMount class does not have this method. any idea where i can find some libs that do have this method?
[19:05] * noob2 (~noob2@ext.cscinfo.com) Quit (Ping timeout: 480 seconds)
[19:05] <skinder> or should i perhaps be using argo instead of bobtail?
[19:05] <noob21> it looks like splunk recommends 800-1200 iops for their indexing systems. ceph can easily meet that
[19:05] <noob21> maybe another avenue that ceph can take over :D ?
[19:05] * hybrid512 (~w.moghrab@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[19:05] <joao> wer, so all is good with the monitors now?
[19:06] <wer> joao: yes it is. ty.
[19:06] <joao> cool
[19:06] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[19:06] <wer> other than I don't know why they died :
[19:06] <wer> :P
[19:07] <joao> wer, grep the logs for 'FAILED' or 'Abort' and let me know if anything pops up
[19:07] <wer> k
[19:09] <wer> joao: ah... /var/log/ceph/ceph-mon.d.log: 0> 2013-02-09 01:29:10.468004 7f8fbae2e700 -1 *** Caught signal (Aborted) **
[19:09] <joao> okay
[19:10] <joao> could you drop the mons somewhere so I can take a look?
[19:10] <joao> err
[19:10] <joao> the mon logs I mean
[19:11] <wer> yeah. gimmie a minute... and I'll pm.
[19:11] <joao> thanks
[19:13] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: Why is the alphabet in that order? Is it because of that song?)
[19:13] * BillK (~BillK@124-148-77-66.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[19:14] * Cube (~Cube@12.248.40.138) has joined #ceph
[19:20] * gucki (~smuxi@HSI-KBW-095-208-162-072.hsi5.kabel-badenwuerttemberg.de) has joined #ceph
[19:20] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[19:22] * BillK (~BillK@58-7-73-186.dyn.iinet.net.au) has joined #ceph
[19:24] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[19:24] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[19:30] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[19:32] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[19:32] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[19:34] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[19:36] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:43] * rturk-away is now known as rturk
[19:44] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[19:44] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[19:45] * bstaz (~bstaz@ext-itdev.tech-corps.com) Quit (Read error: Connection reset by peer)
[19:47] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[19:47] * noob21 (~noob2@ext.cscinfo.com) Quit (Quit: Leaving.)
[19:47] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Ping timeout: 480 seconds)
[19:51] * bstaz (~bstaz@ext-itdev.tech-corps.com) has joined #ceph
[20:00] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[20:00] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[20:05] * dmick (~dmick@2607:f298:a:607:e4bc:3c7b:ef1d:1a2f) has joined #ceph
[20:05] * noob2 (~noob2@ext.cscinfo.com) has joined #ceph
[20:07] <BMDan> slang1: mkcephfs -a -c /etc/ceph/ceph.conf
[20:07] <BMDan> No "dev = entries
[20:07] <BMDan> No "dev =" entries*
[20:08] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[20:09] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[20:09] * Cube1 (~Cube@12.248.40.138) has joined #ceph
[20:09] <jks> BMDan, and have you got any dev entries in the configuration file?
[20:11] <BMDan> Nope.
[20:12] <jks> have you created the filesystems in advance?
[20:13] <BMDan> The parent directory exists, yes.
[20:13] <BMDan> I'm not dropping OSDs into the root of the partition; I'm going one level deeper so lost+found doesn't get in the way.
[20:14] <BMDan> Old habit from MySQL partitions. ;)
[20:14] <jks> what do you mean by "the parent directory"? and what you set in ceph.conf for each osd?
[20:14] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[20:15] <BMDan> [osd]
[20:15] <BMDan> ; This is where the osd expects its data
[20:15] <BMDan> osd data = /opt/data/$name/data
[20:15] <jks> I would recommend using the paths described in the guide
[20:15] <BMDan> /opt/data/$name exists; /opt/data/$name/data does not.
[20:15] <jks> why do you refer to it if it doesn't exist?
[20:15] <jks> have you mounted your disk on /opt/data/$name ?
[20:16] <BMDan> Yes.
[20:16] <jks> then you need to create the data directory
[20:16] <jks> but consider using the default paths instead
[20:16] * BillK (~BillK@58-7-73-186.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[20:16] <BMDan> Default path or not doesn't affect this; issue is that line 359 of mkcephfs doesn't appear to work.
[20:16] <BMDan> test -d $osd_data || mkdir -p $osd_data
[20:17] <jks> submit a patch :-)
[20:17] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[20:18] <BMDan> I recognize that that is an easy answer to give, especially when it appears that someone is complaining without being willing to fix the issue.
[20:18] <BMDan> In this case, however, I have a contrary report from someone else, which is why I was asking him/her for more feedback on it.
[20:18] <jks> I don't know what other answers I could give you
[20:18] <BMDan> That is, perhaps there's something particular to my configuration that is causing the problem.
[20:18] <jks> if it doesn't work, it doesn't work
[20:18] <jks> if you want to debug it, then make it print out what it has evaluated $osd_data to at that point
[20:18] <BMDan> Therefore, I was letting slang1 know the answer to the question he asks.
[20:18] <BMDan> asked*
[20:19] <jks> okay
[20:25] * skinder (~skinder@23.31.92.81) Quit (Quit: Konversation terminated!)
[20:31] * BillK (~BillK@124-149-91-22.dyn.iinet.net.au) has joined #ceph
[20:33] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[20:33] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[20:34] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[20:45] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[20:45] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[20:49] * noob2 (~noob2@ext.cscinfo.com) Quit (Quit: Leaving.)
[20:50] * yehuda_hm (~yehuda@2602:306:330b:a40:8090:687f:33d3:7315) Quit (Ping timeout: 480 seconds)
[20:58] * gaveen (~gaveen@112.135.130.211) Quit (Remote host closed the connection)
[21:02] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[21:02] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[21:10] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[21:10] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[21:12] * noob2 (~noob2@ext.cscinfo.com) has joined #ceph
[21:14] <slang1> BMDan: yeah looks like it doesn't work. I'm curious what $osd_data is in your case
[21:15] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[21:15] * fghaas (~florian@91-119-222-199.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[21:16] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[21:16] <BMDan> slang1: Yeah, need to futz with some RAID container settings, then will instrument mkcephfs to find out. :)
[21:17] <slang1> BMDan: cool
[21:18] <BMDan> In related news, my burning hatred for LSI just grew to bonfire proportions. Standby for meme...
[21:19] * BillK (~BillK@124-149-91-22.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[21:21] <slang1> BMDan: I think megacli is the most poorly documented tool I've ever used
[21:21] <BMDan> Bah, can't find it now, so I'll just link this: http://memegenerator.net/instance/24391153
[21:23] <BMDan> Other one was Scumbag Steve saying, "Makes hardware used almost exclusively in servers/Makes Linux CLI download link require JavaScript to work."
[21:24] <BMDan> Also, I am somewhat concerned that I've now correctly typed two MegaCli commands in a row without error. :\
[21:31] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[21:32] * BillK (~BillK@58-7-243-105.dyn.iinet.net.au) has joined #ceph
[21:34] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[21:34] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[21:36] * The_Bishop_ (~bishop@e179013183.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[21:40] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[21:41] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[21:42] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[21:51] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[21:51] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[21:52] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[21:53] * yehuda_hm (~yehuda@2602:306:330b:a40:1cf6:5f3:81cd:6df6) has joined #ceph
[21:53] * jjgalvez (~jjgalvez@12.248.40.138) has joined #ceph
[22:03] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[22:03] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[22:06] <ShaunR> I'm converting a qemu raw disk image to an rbd cluster using qemu-img (going from local server to cluster). I'm watching the network traffic and i'm really only seeing a few hundred KB/s... it seems like this number should be A LOT higher!
[22:06] <ShaunR> am i missing something here?
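For reference, the kind of import being described usually looks something like this (source path, pool and image name are made up; the exact output-format flag can vary with the qemu build):

    qemu-img convert -f raw -O raw /var/lib/libvirt/images/vm.img rbd:rbd/vm-image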
[22:11] <mauilion> hey guys. I just ran into a cephx problem. I only just added my mds daemons and used the commands in the linked file to set up authentication.
[22:12] <mauilion> http://nopaste.linux-dev.org/?67828
[22:12] <mauilion> I was able to mount the fs using cephfs
[22:12] <mauilion> then I guess the rotating key expired.
[22:12] <mauilion> and all I see in the logs of the active mds
[22:13] <mauilion> cephx client: could not set rotating key: decode_decrypt failed. error:NSS AES final round failed:
[22:13] <mauilion> so I shut that one down. The other one setup using the same method was in standby and became active.
[22:13] <mauilion> so that's good.
[22:13] <mauilion> now I can't restart the first one cause auth is failing.
[22:14] <mauilion> 2013-02-11 13:02:13.716286 7f1a1de07780 -1 mds.-1.0 ERROR: failed to authenticate: (1) Operation not permitted
[22:14] <mauilion> I am sure I missed something here.
[22:14] <mauilion> But I will be damned if I see it
[22:15] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[22:15] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) has joined #ceph
[22:23] * CrashHD (CrashHD@c-24-10-14-95.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[22:24] * CrashHD (~na@65.74.156.108) has joined #ceph
[22:44] <BMDan> Hmm, /me is looking at ext4 options… anyone know if -I (inode-size) would obviate the need for user_xattr?
[22:49] <sjustlaptop> you always need user_xattr
[22:49] <slang1> ShaunR: that does sound slow, but might depend on the hardware you have
[22:50] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:51] <slang1> mauilion: it sounds like maybe something went wrong during the creation of the key for that first mds
[22:51] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[22:51] <slang1> mauilion: did you try to recreate it?
[22:52] <mauilion> slang1: I have been waiting for the key to expire on my active mds
[22:52] <mauilion> to see if it happens again
[22:53] <slang1> mauilion: ah ok
[22:53] * BMDan blinks… am I reading this correctly? user_xattr is a mount option only; there's no corresponding mkfs option that needs be specified?
[22:53] <slang1> mauilion: how long did it take for the first mds?
[22:53] * noob2 (~noob2@ext.cscinfo.com) has left #ceph
[22:53] <mauilion> not very
[22:53] <mauilion> I can test again
[22:53] <mauilion> back in a bit
[22:54] <BMDan> Hmm, indeed, this seems to be the case. Handy! :)
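Putting the two points together, the setup BMDan is describing would look roughly like this (device and mount point are assumptions; the -I value is just the inode-size idea above, and user_xattr remains a mount-time option):

    mkfs.ext4 -I 512 /dev/sdb1                               # larger inodes leave more room for xattrs
    mount -o user_xattr /dev/sdb1 /var/lib/ceph/osd/ceph-0   # user_xattr is enabled at mount time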
[23:05] <mauilion> slang1: you win the prize
[23:06] <mauilion> slang1: I can see how I did it
[23:06] <slang1> mauilion: ooh a prize
[23:06] <mauilion> slang1: the difference between the content of my keyring file and ceph get-or-create mds.0 was disturbing
[23:06] <slang1> mauilion: woops
[23:06] <mauilion> thanks!!
[23:07] <slang1> mauilion: no worries
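The comparison mauilion describes can be reproduced roughly like this (the mds id and keyring path are assumptions):

    ceph auth get-or-create mds.0           # the key the monitors actually hold for mds.0
    cat /var/lib/ceph/mds/ceph-0/keyring    # the key the daemon starts with

If the two don't match, the daemon presents a key the monitors don't recognize, which is consistent with the "failed to authenticate" error above.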
[23:14] <paravoid> yehuda_hm: hey, around?
[23:29] * jtang1 (~jtang@79.97.135.214) Quit (Quit: Leaving.)
[23:30] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:31] * loicd (~loic@magenta.dachary.org) has joined #ceph
[23:43] * sleinen1 (~Adium@2001:620:0:25:d545:5c83:cd50:b6b3) Quit (Quit: Leaving.)
[23:44] <ShaunR> i'm doing a rbd cp from outside my ceph cluster, any reason it would be going horribly slow? It's not taking that data and sending it to the remote server and then back to the ceph server is it?
[23:44] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[23:46] <joshd> that's exactly what it's doing, and it's doing it synchronously
[23:47] <glowell> ls
[23:47] <ShaunR> is there a way to have it just copy the data within the cluster?
[23:49] <joshd> not without running it inside the cluster
[23:50] <joshd> it's copying from and to many different osds due to the pseudo-random placement of objects
[23:51] <joshd> so making the osds or something do it would be equivalent to running the command from within the cluster
[23:51] <joshd> you could do a clone and flatten it later instead
[23:52] <ShaunR> i'm running it on one of the cluster machines now, still seems to be going slow
[23:52] <ShaunR> i just want to duplicate it
[23:53] <ShaunR> this is a test ceph cluster so it's running on two machines right now (each machine has a mon/mds/osd)
[23:54] * vata (~vata@2607:fad8:4:6:80fc:1aae:f39b:dcbe) Quit (Quit: Leaving.)
[23:55] <joshd> if you created it as a format 2 image, you can just clone it (http://ceph.com/docs/master/rbd/rbd-snapshot/#layering)
[23:55] * jochen_ (~jochen@laevar.de) has joined #ceph
[23:56] * yehudasa (~yehudasa@2607:f298:a:607:51be:f300:de79:d2d2) has joined #ceph
[23:57] <ShaunR> not sure what format 2 is/does... looks like maybe COW?
[23:57] * jochen (~jochen@laevar.de) Quit (Ping timeout: 480 seconds)
[23:58] <joshd> that's what layering does, and format 2 supports it, while format 1 does not, but works with the kernel client
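A sketch of the clone path joshd is pointing at, assuming a fresh format 2 image (names and size are made up):

    rbd create --format 2 --size 10240 rbd/golden   # format 2 is required for layering
    rbd snap create rbd/golden@base
    rbd snap protect rbd/golden@base                # a snapshot must be protected before cloning
    rbd clone rbd/golden@base rbd/copy              # instant copy-on-write child
    rbd flatten rbd/copy                            # optional: copy the data in and detach from the parent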
[23:58] * a1 (d@niel.cx) Quit (Ping timeout: 480 seconds)
[23:58] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[23:59] <ShaunR> I dont do COW.
[23:59] * al (d@niel.cx) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.