#ceph IRC Log


IRC Log for 2012-11-08

Timestamps are in GMT/BST.

[0:00] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: Connection reset by peer)
[0:00] <dmick> jefferai: ok
[0:00] <dmick> osd.3 0 handle_osd_map fsid 6f4a8867-673c-4f98-8239-8f142ac8b664 != 00000000-0000-0000-0000-000000000000
[0:00] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:00] <dmick> it seems as though the fsid is damaged
[0:00] <jefferai> how would that happen? It was freshly created right beforehand
[0:00] <jskinner> interesting.. my secret appears to be gone
[0:01] <jefferai> like, the entire cluster
[0:01] <dmick> not sure, but
[0:01] <jefferai> including the XFS filesystems
[0:01] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Remote host closed the connection)
[0:01] * Tamil1 (~Adium@ Quit (Quit: Leaving.)
[0:01] <dmick> can you examine the contents of <osd-data-dir>/ceph_fsid?
[0:01] <jefferai> all six OSDs on that box have the same:
[0:01] <jefferai> 6f4a8867-673c-4f98-8239-8f142ac8b664
[0:02] <dmick> (I guess that's /var/lib/ceph/osd/data/osd.3/ceph_fsid)
[0:02] <jefferai> actually should have been osd.1
[0:02] <dmick> .3 is the one that failed, apparently
[0:02] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[0:02] * Tamil (~Adium@2607:f298:a:607:3831:fd6b:688:6ba7) has joined #ceph
[0:03] <jefferai> they must all be failing
[0:03] <jefferai> or else I should be able to start osd.1
[0:03] <jefferai> but I can't
[0:03] <dmick> the 00000 is from the monitor
[0:03] <jefferai> same behavior
[0:03] <jefferai> huh
[0:03] <dmick> somehow the monitor doesn't believe you have the right filesystem data
[0:03] <dmick> so the osd is saying "welp, not me, see ya"
[0:03] * jefferai wonders if XFS is being wonky
[0:03] <dmick> got a meeting
[0:03] * Tamil1 (~Adium@ has joined #ceph
[0:03] <jefferai> I do have the mons on xfs as well
[0:03] <dmick> maybe troll the monitor's data store?
[0:03] <jefferai> maybe, not sure what I'd be looking for
[0:04] <jefferai> only osd.2 and osd.3 have that in their logs, but I can't start any of the osds on that box
[0:04] <jefferai> thing is, this is reproducible
[0:05] <jefferai> if I totally wipe everything in /var/lib/ceph
[0:05] <jskinner> OH - I was still in as the cinder user lol
[0:05] <jefferai> which itself is an XFS filesystem on raid-1
[0:05] <jefferai> and I create all the dirs, and re-create XFS filesystems on all of the OSDs
[0:05] <jefferai> and re-run mkcephfs
[0:05] <jefferai> then the same thing happens over and over
[0:05] <jefferai> on this one box, all the osds fail
[0:05] <jefferai> on the other two boxes, they're all fine
[0:06] <jefferai> so I wonder if I should suspect something about the parent /var/lib/ceph XFS filesystem
[0:06] <joshd> jefferai: did you say earlier this one box has some extra kernel patches?
[0:06] <jefferai> joshd: nope
[0:06] <jefferai> it had a slightly different kernel version, but the other two boxes have the same one now
[0:06] <jefferai> (just a normal ubuntu kernel update)
[0:06] * ScottIam (~ScottIam@ Quit (Quit: Leaving)
[0:07] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[0:08] <jskinner> ok, made the change... still seem to be getting a file exists error
[0:10] * Tamil (~Adium@2607:f298:a:607:3831:fd6b:688:6ba7) Quit (Ping timeout: 480 seconds)
[0:11] <joshd> jskinner: can you pastebin the output of 'rbd create --id volumes --size 1 --pool volumes testperms --debug-ms 1'?
[0:11] <jskinner> yep
[0:13] <jskinner> http://pastebin.com/s1jEuZ0g
[0:14] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[0:14] <jefferai> dmick: woah woah
[0:14] <jefferai> http://tracker.newdream.net/issues/2115
[0:14] <jefferai> on my storage-1 box I have no keys in my keyring except for client.admin
[0:14] <jefferai> on the other two boxes I have the appropriate keys for the OSDs on those boxes
[0:14] <jefferai> perhaps a mkcephfs bug?
[0:15] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[0:15] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[0:16] <joshd> jskinner: ok, so that shows it's still a problem with the osd caps
[0:18] <jskinner> this is what I've got now: caps: [osd] allow rwx pool=volumes; allow rx pool=images
[0:18] <joshd> jskinner: 'ceph auth list' shows the osd caps for client.volumes as 'allow rwx pool=volumes; allow rx pool=images'?
[0:19] <jefferai> sage (sagewk? sagelap?): since you closed that bug, maybe you can comment
[0:19] <jefferai> the problem is that at the end of mkcephfs it wipes the keys
[0:20] <joshd> jskinner: are your osds also running v0.48.2?
[0:20] <joshd> jskinner: there are some osd caps parsing changes in 0.53
[0:20] <jefferai> dmick: sage: see http://paste.kde.org/599678/ -- at the end of mkcephfs it wipes the keys in the keyring (I put the line from the mkcephfs output in the middle)
[0:21] * Tamil1 (~Adium@ Quit (Quit: Leaving.)
[0:21] * Tamil (~Adium@2607:f298:a:607:1982:5089:44b4:5e27) has joined #ceph
[0:21] <jefferai> ceph.conf is at http://paste.kde.org/599654/
[0:21] <jskinner> how do I check osd version
[0:21] * jefferai will be back in a bit
[0:21] <joshd> jskinner: ceph-osd --version
[0:21] <joshd> on the osd nodes, of course
[0:22] <jskinner> yes they are also running 0.48.2
[0:22] * houkouonchi-work (~linux@ Quit (Remote host closed the connection)
[0:23] * houkouonchi-work (~linux@ has joined #ceph
[0:25] * slang (~slang@ace.ops.newdream.net) Quit (Quit: slang)
[0:26] <joshd> hmm, I used exactly that capability string and it works... there must be some difference
[0:27] <jskinner> well I pushed my ceph.conf and keyring for volumes out to all my nodes again
[0:27] <jskinner> and ran that command again
[0:27] * Tamil (~Adium@2607:f298:a:607:1982:5089:44b4:5e27) has left #ceph
[0:28] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[0:28] * Tamil (~Adium@2607:f298:a:607:1982:5089:44b4:5e27) has joined #ceph
[0:31] <jskinner> restarted ceph on all nodes - I think I busted it
[0:31] * Q310 (~qgrasso@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[0:32] <jskinner> health HEALTH_WARN 535 pgs peering; 535 pgs stuck inactive; 535 pgs stuck unclean
[0:34] <joshd> are your osds all up?
[0:35] <joshd> it shouldn't take that long for them to peer
[0:35] <jskinner> 12 up 12 in - yep
[0:35] <joshd> what does 'ceph health detail' say?
[0:36] <jskinner> looks like a ton of pgs that are stuck peering
[0:36] <dmick> jefferai: keyring.admin is meant to hold the keys for the client named client.admin; could be that's the issue (setting that keyring filename globally and trying to use it both for osd keys and client.admin keys). Not sure
[0:36] <jskinner> and a lot that are peering, acting
[0:36] <joshd> sounds like it's just a bit slow getting started then
[0:37] <jskinner> yeah i restarted ceph on all nodes at the same time - may have been a bad idea lol
[0:38] <jskinner> I didn't have the ceph.client.volumes.keyring pushed out to all nodes
[0:38] <jskinner> so I did that and restarted ceph
[0:38] <dmick> jefferai: yes, I think that's the issue
[0:39] <dmick> move keyring = /etc/ceph/keyring.admin out of the [global] section and put it in [client.admin], so it only applies to client.admin
[0:39] <dmick> and let the osd and mon keyrings default
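The change dmick describes (moving the keyring setting out of [global] and into [client.admin]) can be sketched as a small config rewrite. This is only an illustration: the sample ceph.conf text, the osd.1 section, and the helper name scope_keyring_to_admin are hypothetical, and the real file from the paste is not reproduced here.

```python
import configparser
import io

# Hypothetical, trimmed-down ceph.conf with the problematic global keyring.
SAMPLE_CONF = """\
[global]
auth supported = cephx
keyring = /etc/ceph/keyring.admin

[osd.1]
host = storage-1
"""


def scope_keyring_to_admin(conf_text):
    """Move 'keyring' from [global] to [client.admin] and return the new text."""
    cp = configparser.ConfigParser(interpolation=None)
    cp.read_string(conf_text)
    keyring = cp.get("global", "keyring", fallback=None)
    if keyring is not None:
        # Drop the global setting so osd and mon keyrings fall back to defaults.
        cp.remove_option("global", "keyring")
        if not cp.has_section("client.admin"):
            cp.add_section("client.admin")
        cp.set("client.admin", "keyring", keyring)
    out = io.StringIO()
    cp.write(out)
    return out.getvalue()


fixed = scope_keyring_to_admin(SAMPLE_CONF)
print(fixed)
```

With the setting scoped this way, only client.admin reads /etc/ceph/keyring.admin, so writing the admin key no longer touches the file the OSDs use for their own keys.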
[0:41] * houkouonchi-work (~linux@ Quit (Remote host closed the connection)
[0:42] <jskinner> well looks like it's actually getting worse - HEALTH_WARN 72 pgs down; 1015 pgs peering; 1015 pgs stuck inactive; 1015 pgs stuck unclean
[0:45] <jskinner> well - no huge deal, it's just a dev environment. Maybe I'll just call it a day and see if it sorts itself out over night.
[0:45] * houkouonchi-work (~linux@ has joined #ceph
[0:48] <jskinner> thanks for all of your help, joshd. I think we're really close here.
[0:48] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Ping timeout: 480 seconds)
[0:49] <joshd> jskinner: you're welcome. you helped uncover a couple bugs in error reporting
[0:51] * PerlStalker (~PerlStalk@perlstalker-1-pt.tunnel.tserv8.dal1.ipv6.he.net) Quit (Quit: ummm)
[0:51] <jskinner> Good to know. I'll tackle this beast in the morning. Thanks again, take it easy.
[0:51] * jskinner (~jskinner@ Quit (Remote host closed the connection)
[0:56] * tnt (~tnt@50.90-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[0:59] * jtang1 (~jtang@ has joined #ceph
[1:14] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[1:18] * The_Bishop (~bishop@2001:470:50b6:0:ec24:ee44:4264:26c) Quit (Remote host closed the connection)
[1:19] * maxiz (~pfliu@ has joined #ceph
[1:24] <jefferai> dmick: suggestion: when creating the keyring, if the file already exists, replace the client.admin entry
[1:24] <jefferai> instead of blowing away the whole thing
[1:25] * gregaf (~Adium@2607:f298:a:607:54ad:7d4e:4ed:af03) Quit (Quit: Leaving.)
[1:26] * jlogan2 (~Thunderbi@ has joined #ceph
[1:28] * jlogan (~Thunderbi@2600:c00:3010:1:9b2:ed42:a1f6:a6ec) Quit (Ping timeout: 480 seconds)
[1:30] <dmick> sure, it's a thought. right now it's just set up to have those rings be different. it's kind of a painful failure scenario in any event
[1:31] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[1:31] <jefferai> yep
[1:31] <jefferai> it is
[1:31] <jefferai> Like the guy in that bug report, I saw it in a configuration file
[1:32] <jefferai> either on the ceph site somewhere
[1:32] <jefferai> or somewhere seemingly authoritative
[1:32] <jefferai> didn't realize it was bad news
[1:32] * loicd (~loic@ Quit (Quit: Leaving.)
[1:36] * yoshi (~yoshi@p30106-ipngn4002marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:37] <dmick> yeah, the sample config file at http://ceph.com/wiki/Cluster_configuration shows a [global]
[1:37] <dmick> but it also shows specific paths for the mds and osd
[1:37] <dmick> in general we really really need to kill the wiki
[1:38] <dmick> there's out-of-date info there
[1:41] <rturk> ugh, that reminds me
[1:41] <rturk> want to run through everything in that wiki and make sure there's nothing we really really need
[1:42] <dmick> YES WE DO :)
[1:42] * houkouonchi-work (~linux@ Quit (Read error: Connection reset by peer)
[1:42] <dmick> jefferai: still doesn't explain why just this node, though; did you somehow have different ceph.conf's on the different machines? or were the OSDs failing everywhere and this is just the first place you diagnosed?
[1:43] <jefferai> nope
[1:43] <jefferai> definitely the same ceph.conf
[1:43] <jefferai> but
[1:43] <jefferai> the client key was copied to the node I ran mkcephfs from
[1:43] <jefferai> so it only overwrote the one keyring
[1:43] * houkouonchi-work (~linux@ has joined #ceph
[1:44] <dmick> oh it was mkcephfs that did the damage. yeah, sure
[1:45] <jefferai> yep
[1:45] <jefferai> ok, stupid question
[1:45] <jefferai> does size = 3 mean replicas = 2?
[1:46] <jefferai> http://ceph.com/docs/master/cluster-ops/pools/ says that for a pool, setting size to 3 means you have two replicas
[1:46] <dmick> replica counts always include the primary
[1:46] <dmick> sigh, no, that's wrong, and that's been noticed before
[1:46] <jefferai> but in http://ceph.com/docs/master/cluster-ops/placement-groups/ it talks about replicas
[1:46] <jefferai> hm
[1:46] <jefferai> ok
[1:46] <jefferai> so size = 3 means the original + 3 replicas?
[1:46] <dmick> I will personally fix that right *(&@#$ now
[1:46] <dmick> no, size is the total number of replicas, including the master
[1:47] <dmick> s/master/primary/
[1:47] <jefferai> ok
[1:47] <dmick> so size == number of copies
[1:47] <jefferai> right
[1:47] <jefferai> ok
[1:47] <jefferai> so http://ceph.com/docs/master/cluster-ops/placement-groups/
[1:47] <jefferai> replicas there means actual replicas
[1:47] <jefferai> or total copies?
[1:47] <jefferai> in the calculation for placement groups
[1:47] <jefferai> I assume total copies, e.g. equal to the pool data size
[1:47] <dmick> we don't tend to talk about replicas as different from copies
[1:47] <jefferai> ok
[1:47] <dmick> the page is just misleading to no end
[1:47] <jefferai> just asking because the language is slightly different
[1:47] <dmick> yes
[1:48] <jefferai> so to have three *total* copies, including master
[1:48] <jefferai> I calculate placement groups with 3 for the value of replicas
[1:48] <dmick> yeah
[1:48] <jefferai> and use a pool data size of 3
[1:48] <jefferai> right?
[1:48] <dmick> right. pool data size is the number of copies is the number of replicas
[1:48] <jefferai> great
[1:48] <jefferai> where "copies" includes master
[1:48] <jefferai> slash primary
[1:48] <jefferai> :-)
[1:48] <dmick> primary is the right term, and yes
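Since pool size, copies, and replicas all mean the same number here, the placement-group estimate takes the pool size directly. A minimal sketch, assuming the rule of thumb of roughly (OSDs * 100) / replicas that the placement-groups page appears to describe; the optional rounding up to a power of two is an additional assumption, and the OSD count of 18 is purely illustrative.

```python
def estimate_pg_count(num_osds, pool_size, round_to_power_of_two=False):
    """Estimate a PG count; pool_size is the total number of copies, primary included."""
    raw = (num_osds * 100) // pool_size
    if not round_to_power_of_two:
        return raw
    # Assumption: round up to the next power of two.
    power = 1
    while power < raw:
        power *= 2
    return power


print(estimate_pg_count(18, 3))                              # 600
print(estimate_pg_count(18, 3, round_to_power_of_two=True))  # 1024
```

The point of the sketch is only that the divisor is 3 for three total copies, not 2.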
[1:50] * jtang1 (~jtang@ Quit (Quit: Leaving.)
[1:56] <jefferai> huh
[1:56] <jefferai> did you know that a default crushmap from a brand new cluster will fail to compile properly? :-)
[1:58] <jefferai> oh, no
[1:58] <jefferai> nm
[1:58] <jefferai> vim changed something with an errant keypress
[1:58] <dmick> I sense trust issues jefferai :)
[1:59] <jefferai> Hah
[1:59] <jefferai> well
[1:59] <jefferai> first cluster
[1:59] <jefferai> finally
[1:59] <jefferai> but, really looking forward to it
[1:59] <dmick> I know. I'm not immune to them either
[2:04] <jefferai> next up, benching
[2:04] <jefferai> :-)
[2:06] <dmick> good deal
[2:07] <dmick> sorry for the missteps and the time it took to diagnose; appreciate your testing and working with me to get things improved
[2:07] * AaronSchulz (~chatzilla@ Quit (Remote host closed the connection)
[2:18] <dmick> jefferai: see 3 latest commits (doc fixes)
[2:29] * Tamil (~Adium@2607:f298:a:607:1982:5089:44b4:5e27) has left #ceph
[2:30] <jefferai> dmick: no problem -- I much appreciate you taking the time to work with me on it
[2:39] * houkouonchi-work (~linux@ Quit (Read error: Connection reset by peer)
[2:42] * Q310 (~qgrasso@ip-121-0-1-110.static.dsl.onqcomms.net) Quit ()
[2:44] <jefferai> dmick: if you're still around and up for more, got the next issue :-)
[2:44] <jefferai> I can run iperf between the boxes at 10Gb/s
[2:45] <jefferai> but when I ran a rados bench I got
[2:45] <jefferai> Bandwidth (MB/sec): 0.008
[2:45] <jefferai> using
[2:45] <jefferai> rados -p ssds bench 60 write -b 4MB -t 16
[2:46] <lurbs> Is there an easy way to check if a snapshot is protected?
[2:47] <joshd> lurbs: rbd info image@snap
[2:47] <dmick> lurbs: or rbd ls -l
[2:47] <lurbs> Ta.
[2:47] <lurbs> I'd seen it before, but couldn't recall where.
[2:48] <dmick> jefferai: well, yeah, that certainly sucks
[2:49] <dmick> joshd: what would you look at first?
[2:49] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[2:49] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[2:49] <jefferai> hm
[2:49] <joshd> do a benchmark of the osds individually (ceph osd tell \* bench) and watch ceph -w for the results
[2:50] <jefferai> in the output of the bench, it has a column for started/finished
[2:50] <jefferai> those are 4MB objects?
[2:51] <elder> joshd, dmick any thoughts on my message from a few hours ago?
[2:52] <dmick> it made sense to me, but of course the devil is in the details
[2:52] <jefferai> joshd: heh, I ran that -- where do I find the results? the command put out "ok" but that's it
[2:53] <dmick> "watch ceph -w"
[2:53] <dmick> i.e. start ceph -w, then do the osd tell
[2:53] <elder> dmick, well the details have been a real devil and as I thought about it I think this approach will make everything go a lot easier.
[2:53] <dmick> nomenclaturally and structurally I like it
[2:54] <dmick> and jefferai: yes, those are "I/Os" in started and finished
[2:54] <elder> This is not the first time I've found stuff where things get allocated and then have nothing to track them afterward, something I'm not that comfortable with in the kernel.
[2:54] <jefferai> dmick: then that doesn't make sense
[2:54] <jefferai> because if the I/O operations are 4MB each
[2:54] <jefferai> and it's going that fast
[2:54] <jefferai> there's no way it's 0.008 MB/sec
[2:55] <jefferai> joshd: dmick: http://paste.kde.org/599708/
[2:55] <jefferai> so that appears to be 18 osds each seeing about 50MB/s
[2:55] <dmick> I think -b 4MB may be wrong, and it may be doing 4 byte writes, checking
[2:56] <dmick> 4MB is the default, so you can omit and get the test you want
[2:56] <jefferai> hm, ok
[2:56] * houkouonchi-work (~linux@ has joined #ceph
[2:56] <jefferai> there we go
[2:56] <jefferai> average MB/s = 251
[2:57] <jefferai> does that sound decent for SSDs over a 10Gb link?
[2:57] * jefferai thought he's seen people talk about 500MB+
[2:58] <joshd> jefferai: it depends more on the disks and config than the network once you've got 10Gb
[2:58] <jefferai> sure
[2:58] <joshd> it seems reasonable given no tuning
[2:58] <jefferai> I see
[2:59] <jefferai> not sure what tuning to do...custom placement I guess, but right now I just have three boxes, each with 6 SSDs, connected with 10Gb
[2:59] <jefferai> each SSD is an OSD, have a journal on another (raid-1) SSD
[3:00] <jefferai> I just don't really have a good idea for what to expect, but this at least seems like a decent amount compared to iSCSI over 2Gb :-)
[3:00] <lurbs> You'd be limited by the write speed of the journals then, especially if you were writing to every OSD at once in the benchmark.
[3:00] <dmick> jefferai: yes, those were 4-byte writes. Not hard to improve on. :)
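The 4-byte explanation can be checked with quick arithmetic: dividing the reported bandwidth by the write size gives the implied operation rate. A rough sketch, assuming the bench tool counts a MB as 1024 * 1024 bytes; the helper name is made up for illustration.

```python
MIB = 1024 * 1024  # assumption: bench reports MB as 2**20 bytes


def implied_ops_per_sec(bandwidth_mb_per_sec, bytes_per_write):
    """Writes per second needed to produce the reported bandwidth."""
    return bandwidth_mb_per_sec * MIB / bytes_per_write


as_4_byte_writes = implied_ops_per_sec(0.008, 4)
as_4_mb_writes = implied_ops_per_sec(0.008, 4 * MIB)

print(f"4-byte writes: about {as_4_byte_writes:.0f} ops/sec")
print(f"4 MB writes:   about {as_4_mb_writes:.3f} ops/sec")
```

At 0.008 MB/sec, 4 MB writes would mean one completed write every several minutes, which does not fit a started/finished column that was advancing quickly. A couple of thousand tiny writes per second does fit.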
[3:01] <jefferai> dmick: hah, ok
[3:01] <jefferai> lurbs: I see -- sadly I am using files for journals -- 5GB files, but still files
[3:01] <jefferai> I guess I could attempt to create 18 partitions
[3:01] <jefferai> if that seems like a better idea
[3:02] <lurbs> I'm not sure how much of a difference that makes. Would be curious to know, though.
[3:02] <dmick> it's better; not sure how much, but better. journals are straight-sequential circular buffers, and don't benefit from filesystem at all
[3:02] <jefferai> sure...but if it can do aio/dio on a single file, it should still be decent...in theory
[3:02] <jefferai> yeah, I'm curious how much better, but not sure I'm curious enough to currently redo all the disks
[3:03] <dmick> yeah, that's why I said "not sure how much"
[3:03] <jefferai> I know the RAID-1 will probably affect it too
[3:03] <jefferai> I guess I could RAID-0 them, or just use one of the two
[3:03] <jefferai> but I thought having raid-ed journals might be a good idea, keeping the box alive if only one of the OSDs goes down
[3:03] <jefferai> so that you don't have to resync all the osds, just the failed one
[3:03] <jefferai> or is that totally faulty thinking
[3:04] <dmick> journals are separate in any case; you're just protecting against a journal SSD failure, right?
[3:04] <jefferai> well
[3:04] <jefferai> I have two ssds
[3:04] <jefferai> raided
[3:04] <jefferai> for the OS
[3:04] <lurbs> I decided against RAID 1, and split the journals across individual SSDs. Faster, more likely to fail, but a failure will only drop half of the OSDs on a single node.
[3:04] <dmick> oh and the journals are there. I see
[3:04] <jefferai> so I raided a second area on the SSDs
[3:04] <lurbs> Seemed a reasonable compromise.
[3:05] <jefferai> because I thought journaling on raid might make it more robust
[3:05] <jefferai> so that if one of the two goes out the box as a whole doesn't suffer
[3:05] <dmick> one of the two SSDs. yes.
[3:05] <jefferai> but I also don't know what I'm talking about :-)
[3:05] <dmick> (you said "OSDs" before when you meant SSDs I think; that threw me for a minute)
[3:05] <jefferai> oh, yes, sorry
[3:05] <jefferai> I did mean that
[3:06] <jefferai> so I guess if putting the journal on the RAID-1 is going to significantly impact speed I'll change it
[3:06] <jefferai> but
[3:06] <jefferai> I figured that was probably still faster than putting the journal on a spinny disk on a second partition
[3:07] <dmick> I don't have a feel for how much (software, I assume) RAID affects performance
[3:07] <dmick> but I do feel fairly confident that no-RAID, no-filesystem will be the raw-speed winner
[3:07] <jefferai> sure
[3:08] <denken> software raid1 has very little overhead aside from the double IO
[3:08] <jefferai> yes, but only on write
[3:08] <jefferai> on read, it can be 2x the speed
[3:09] <jefferai> (especially if you are on spinny disks, btw, you can actually use the raid10 driver with layout=far to get a speed bump when doing mirroring)
[3:09] <jefferai> (with two disks, that is)
[3:10] <denken> i would still put the OS on partition 1 (raid1'd) and split the remaining ssd space into multiple non-raid partitions for raw journal devices
[3:10] <jefferai> hm
[3:10] <jefferai> yeah, I can do that
[3:11] <denken> with good monitoring you'll know as soon as an ssd dies anyway
[3:11] <jefferai> yes
[3:11] <jefferai> I wasn't sure how journals dying affected osds
[3:11] <denken> you'll just have degraded performance until it's replaced
[3:11] <jefferai> I guess those OSDs will be toast
[3:11] <denken> the osds will stop servicing IO and pgs will become stuck
[3:12] <denken> but IO to the block device / objects will continue
[3:12] <jefferai> sure
[3:12] <jefferai> okay
[3:12] <denken> as long as you don't lose another osd in another chassis ;)
[3:12] <jefferai> well
[3:13] <denken> s/another/the other/
[3:13] <jefferai> I plan on size = 2
[3:13] <jefferai> er
[3:13] <jefferai> = 3
[3:13] <jefferai> so I can lose two of 'em
[3:13] <denken> ah
[3:14] * buck (~buck@c-24-7-14-51.hsd1.ca.comcast.net) has left #ceph
[3:15] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[3:16] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[3:27] * slang (~slang@ace.ops.newdream.net) has joined #ceph
[3:43] * jlogan2 (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[3:59] * James259 (~James259@ has joined #ceph
[4:00] <James259> ceph.com broken?
[4:03] <joshd> so it would seem
[4:05] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[4:05] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[4:05] <James259> kk.. just wanted to check it wasn't my connection. :) ty.
[4:05] <joshd> and now it's back
[4:06] <James259> ahh, great.. guess someone did a reboot. :P
[4:25] * James259 (~James259@ Quit ()
[4:26] * chutzpah (~chutz@ Quit (Quit: Leaving)
[4:31] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[4:42] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[4:51] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[5:23] * yoshi (~yoshi@p30106-ipngn4002marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[5:30] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[5:30] <jefferai> denken: dmick: lurbs: so using separate partitions for each journal device, I average 241MB/s
[5:30] <jefferai> actually, not average
[5:30] <jefferai> that's what it was at the end of the 60 second
[5:30] <jefferai> s
[5:30] <jefferai> it started out close to 300
[5:30] <jefferai> max bandwidth of 340MB/s
[5:31] <jefferai> my guess is that I fill up the journals relatively quickly and time goes down
[5:31] <dmick> yeah
[5:31] <dmick> write-only is rough on journals
[5:31] <jefferai> yeah
[5:31] <dmick> did it report an average? Sounds like it won't be a lot different
[5:32] <jefferai> Bandwidth (MB/sec): 241.521
[5:32] <jefferai> Stddev Bandwidth: 52.962
[5:32] <jefferai> Max bandwidth (MB/sec): 340
[5:32] <jefferai> Min bandwidth (MB/sec): 68
[5:33] <dmick> ah, so 241 was also the average
[5:34] <jefferai> well
[5:34] <jefferai> yeah, I guess
[5:34] <jefferai> started out higher
[5:34] <jefferai> and climbed down over time
[5:34] <jefferai> hm, another documentation bug
[5:34] <jefferai> http://ceph.com/docs/master/man/8/rados/?highlight=bench
[5:34] <jefferai> it says there's a mode option, read or write
[5:34] <jefferai> but, if you actually try to do read, it thinks it's bogus
[5:34] <dmick> you have to do write first so there's something to read
[5:35] <jefferai> yeah, but read isn't valid regardless
[5:35] <jefferai> according to the tool itself
[5:35] <jefferai> write/seq/rand
[5:35] <dmick> oh oh
[5:35] <dmick> right
[5:35] <dmick> it was read once, but now there are two kinds of read
[5:35] <jefferai> yeah
[5:36] <dmick> and...now that it cleans up after itself, read doesn't work either. oy.
[5:36] <jefferai> unless you pass --no-cleanup
[5:36] <jefferai> to write
[5:36] <dmick> ah
[5:37] <jefferai> well
[5:37] <jefferai> maybe
[5:37] <jefferai> trying that now
[5:37] <jefferai> ah
[5:37] <jefferai> root@hbfs-storage-1:~# rados -p ssds bench 60 rand
[5:37] <jefferai> Random test not implemented yet!
[5:37] <jefferai> error during benchmark: -1
[5:37] <jefferai> error 1: (1) Operation not permitted
[5:37] <jefferai> hah
[5:38] <dmick> oh yeah, that too :)
[5:38] <jefferai> however, results for sequential reading:
[5:38] <jefferai> Read size: 4194304
[5:38] <jefferai> Bandwidth (MB/sec): 768.959
[5:38] <jefferai> Average Latency: 0.0831432
[5:38] <jefferai> Max latency: 0.290477
[5:38] <jefferai> Min latency: 0.027521
[5:38] <jefferai> woohoo!
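The sequential-read figures are internally consistent, which can be checked with Little's law: throughput is roughly the number of operations in flight divided by the average latency. A sketch under the assumption that the read ran with 16 concurrent operations (the -t 16 used for the earlier write run); the concurrency of the read run is not shown in the log.

```python
def littles_law_bandwidth(in_flight_ops, op_size_bytes, avg_latency_sec):
    """Estimate bandwidth in MB/sec from concurrency and average latency."""
    ops_per_sec = in_flight_ops / avg_latency_sec
    return ops_per_sec * op_size_bytes / (1024 * 1024)


# Read size and average latency are the values jefferai pasted;
# 16 in-flight operations is an assumption.
estimate = littles_law_bandwidth(16, 4194304, 0.0831432)
print(f"estimated {estimate:.1f} MB/sec vs reported 768.959 MB/sec")
```

The estimate lands within about a tenth of a percent of the reported bandwidth, so under that assumption the result is bounded by per-operation latency at this queue depth rather than by the 10Gb link.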
[5:38] * jefferai should have tested the sequential read with the journal files instead of devices, ah well
[5:38] <dmick> yeah, don't think journal really affects read at all
[5:39] <jefferai> unless you're reading before the journal has committed
[5:39] <jefferai> ...maybe?
[5:39] <jefferai> :-)
[5:40] <dmick> don't remember for certain, but of course that's not what you'd call significant portion of reads :)
[5:40] <jefferai> sure
[5:40] <jefferai> even so, that's > 3x faster than the max I could possibly do with my previous iSCSI setup
[5:41] <dmick> w00t
[5:42] <dmick> there is rather less overhead in an object store ...
[5:42] <dmick> what were you using iSCSI for? VM backing?
[5:42] <jefferai> yeah
[5:42] <jefferai> had two boxes that I had essentially gotten donated
[5:42] <jefferai> one was an equallogic
[5:43] <jefferai> the other an EMC CX3-10c
[5:43] <jefferai> both quite old
[5:43] <jefferai> the EMC was originally fiber channel but ended up using iscsi with it
[5:43] <jefferai> with vmware on top
[5:43] <jefferai> performance was eh
[5:43] <jefferai> although I hear both are better these days
[5:43] <jefferai> what kept me up at night was the lack of redundancy
[5:43] <jefferai> especially since two of the EMC's os drives are toast
[5:44] <jefferai> and I can't get them replaced
[5:44] <jefferai> too old to get a support contract
[5:44] <jefferai> (and probably couldn't afford it anyways)
[5:44] <jefferai> so I'm hoping to get everything working with ceph and ganeti and/or ovirt before more os drives die
[5:44] <jefferai> or one of them just dies, period :-)
[5:45] * dmick nods
[5:45] <jefferai> controller or so
[5:45] * yoshi (~yoshi@p30106-ipngn4002marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[5:45] <jefferai> so I have 12 storage boxes now -- 4 with SSDs, 8 with spinny disks (3 online for now)
[5:45] <jefferai> 4 compute nodes
[5:45] <jefferai> and dual switches
[5:45] <jefferai> everything redundant
[5:45] <jefferai> and data size = 3
[5:45] <jefferai> assuming nothing with ceph goes haywire, I'll be able to sleep again :-)
[5:57] <s_parlane> jefferai: what vendor are your storage boxes ?
[6:01] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[6:01] * adjohn is now known as Guest4962
[6:01] * Guest4962 (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[6:01] * _adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[6:01] * _adjohn is now known as adjohn
[6:01] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[6:01] <jefferai> s_parlane: they're just silicon mechanics nstor
[6:01] <jefferai> or whatever they're called
[6:01] <jefferai> 20 2.5" or 12 3.5"
[6:01] <jefferai> I got them to sell them to me without drives
[6:01] <jefferai> so I could put in my own, far cheaper drives
[6:01] * didders_ (~btaylor@ has joined #ceph
[6:02] * didders_ (~btaylor@ has left #ceph
[6:02] * didders_ (~btaylor@ has joined #ceph
[6:03] <jefferai> the quality is eh -- it's your basic Supermicro chassis + mother board
[6:03] <jefferai> with LSI disk controllers
[6:03] <jefferai> so you get what you pay for in some sense
[6:03] <jefferai> but, the price was right
[6:03] <jefferai> has (crappy, but good enough) iKVM
[6:04] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[6:05] <s_parlane> you have 12 of them ?
[6:05] <s_parlane> dont you get 24 2.5" (thats what we have in our supermicro)
[6:05] <jefferai> 2U boxes
[6:05] <jefferai> either 20 2.5
[6:05] <jefferai> or 12 3.5
[6:05] <jefferai> supermicro makes a ton of chassis
[6:06] <s_parlane> ok
[6:06] <jefferai> with various tradeoffs with what you can fit in vs. disk caddies
[6:06] <jefferai> s_parlane: are yours 2U?
[6:06] <s_parlane> yes, we have 1 with 24x 2.5 (23 spin + 1 ssd), and 1 with 12x 3.5 (all 2TB)
[6:07] <s_parlane> both are 2U
[6:07] <jefferai> hm
[6:07] <jefferai> not sure about the 24, probably some difference in the layout of the other bits
[6:08] <s_parlane> what interface are you using ? (gigabit, or 10gig)
[6:08] <jefferai> dual 10gig
[6:08] <s_parlane> what switching gear ?
[6:08] <jefferai> although for redundancy, not because I expect I can saturate it with ceph traffic :-)
[6:08] <jefferai> Arista
[6:08] <jefferai> so I'm doing lacp bonding to the two arista switches which run an mlag
[6:08] <jefferai> working great, so far
[6:08] <jefferai> although the arista switches break pxe booting
[6:09] <jefferai> they haven't figured out the why, yet
[6:09] <s_parlane> interesting
[6:09] <jefferai> they just drop DHCP boot request packets on the floor
[6:09] <s_parlane> are you using sdn ?
[6:09] <jefferai> "normal" dhcp packets go out just fine
[6:09] <jefferai> not sure what sdn is
[6:09] <s_parlane> software defined networking
[6:09] <jefferai> oh
[6:09] <jefferai> no, not at this point
[6:10] <s_parlane> are you running them as pure l2 ?
[6:10] <jefferai> yeah
[6:10] <jefferai> well,
[6:10] <jefferai> technically they have ip routing on
[6:10] <jefferai> since arista requires that just to talk between its own interfaces
[6:10] <s_parlane> what form factor and port density are you getting ?
[6:10] <jefferai> like, for its own vlan interface to talk to another ip on the same subnet
[6:10] <jefferai> 1U, redundant fans/power supply, 48 10GbE copper
[6:10] <jefferai> plus 4 SFP+
[6:11] <s_parlane> oh right
[6:11] <s_parlane> what cost is that ?
[6:11] <jefferai> let me look
[6:12] <s_parlane> 48 10GbE, thats a bit insane, do you have 24 10GbE ethernet devices ?
[6:12] <jefferai> total including power supplies, 3 year support contract, etc. was 33.5 k for both
[6:12] <jefferai> s_parlane: not yet
[6:12] <jefferai> at the moment I have 16
[6:12] <s_parlane> oh but you have 2 of them ?
[6:12] <jefferai> although I'm using some of the ports for 1G
[6:12] <jefferai> point is they're all copper
[6:13] <jefferai> juniper and cisco and extreme wanted to sell me switches that needed SFPs for every port
[6:13] <jefferai> so I didn't have to do that, and could get good/cheap intel dual-port cards
[6:13] <jefferai> s_parlane: yeah, two of them for redundancy
[6:13] <jefferai> theoretically one of the switches could completely die and nobody will know a thing
[6:13] <s_parlane> disclaimer: i work for alliedtelesis
[6:13] <jefferai> except my monitoring
[6:13] <jefferai> :-)
[6:14] * jefferai googles
[6:14] <s_parlane> we sell a product that will give you 8x dual 10Gb in SFP+, ethernet or XFP
[6:14] <jefferai> so you're a service integrator?
[6:15] <s_parlane> i work at one of the r&d centers, we write the layer 3 switching software here
[6:15] <jefferai> ah
[6:15] <jefferai> cool
[6:15] <jefferai> openflow I imagine
[6:15] <s_parlane> nah, we don't support openflow or any sdn yet
[6:15] <jefferai> oh
[6:15] <jefferai> s_parlane: which product are you referring to?
[6:15] <jefferai> you guys have a lot
[6:15] <s_parlane> switchblade x908 (with XEM-2XT/2XP/2XS)
[6:16] <s_parlane> ether/xfp/sfp+ (respectively)
[6:16] <jefferai> ah
[6:16] <s_parlane> plus it will do back-port stacking, to another unit, to get your redundancy
[6:16] <jefferai> oh, I forgot: one of the things I use the ports for are 4 LACPd ports, for a 40Gb interconnect between the switches
[6:17] <jefferai> sure
[6:17] <jefferai> what's the cost on those?
[6:17] <s_parlane> sorry not sure
[6:17] <s_parlane> i write the software, not the price lists
[6:17] <jefferai> heh
[6:17] <jefferai> ok
[6:18] <s_parlane> if i understand correctly, cheaper than the cisco eq
[6:18] <jefferai> yeah, that's not hard
[6:18] <jefferai> TBH, I don't know much about who's out there in telecom
[6:18] <jefferai> we have a lot of cisco/juniper/extreme/foundry around, but mostly too pricy
[6:18] <s_parlane> the backport stacking will give you 40Gb (or more ?) interconnect and the switches will work together
[6:19] <s_parlane> (and not waste your precious front ports)
[6:19] <jefferai> yeah, I know what stacking is :-)
[6:19] <jefferai> arista can do stacking too, I just didn't
[6:19] <jefferai> because I have enough ports, right now
[6:19] <jefferai> if I need to clear up ports I can put in a cheap 1G switch for half the ones I have plugged into the 10G right now, just no need to yet
[6:20] <s_parlane> oh right, there is XEM-STK, which uses the XEM bay to do stacking, but thats more useful for x900 (12 or 24 copper ports, 1U, 1 or 2 XEM bays)
[6:21] <s_parlane> anyways, back to my other questions, how many disks are you intending to put in each node ?
[6:21] <jefferai> eventually, full up
[6:21] <jefferai> I have 8 of the 12x3.5, and 4 of the 20x2.5
[6:21] <jefferai> minus two OS disks in each
[6:21] <jefferai> but, couldn't afford all the drives now
[6:21] <jefferai> so when people want to use storage, they'll have to chip in
[6:23] <s_parlane> so 12x 3.5" drives will give you over 1GB/s of read, which should saturate one of your 10Gb links
[6:23] <jefferai> that's assuming a lot
[6:23] <jefferai> a best case scenario
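Editor's note: a minimal sketch of the arithmetic behind the estimate above. The 100 MB/s sequential read rate per disk is an assumption for illustration (it was not stated in the channel), and the function names are hypothetical. As jefferai says, this is a best case; real throughput depends on the workload.

```python
def aggregate_read_mb_s(disks, per_disk_mb_s):
    """Best-case combined sequential read rate of all disks in a node."""
    return disks * per_disk_mb_s


def link_capacity_mb_s(gbit_per_s):
    """Raw capacity of a network link in MB/s (1 Gb/s = 125 MB/s)."""
    return gbit_per_s * 1000 / 8


def link_utilization(disks, per_disk_mb_s, gbit_per_s):
    """Fraction of the link that the disks could fill in the best case."""
    return aggregate_read_mb_s(disks, per_disk_mb_s) / link_capacity_mb_s(gbit_per_s)


if __name__ == "__main__":
    # Assumed figures: 12 disks at 100 MB/s each, one 10Gb link.
    print(aggregate_read_mb_s(12, 100), "MB/s from disks")
    print(link_capacity_mb_s(10), "MB/s link capacity")
    print(link_utilization(12, 100, 10), "of the link")
```

Under these assumed numbers the disks come close to filling a single 10Gb link, which is consistent with the rough claim above.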
[6:24] <s_parlane> (if anyone can figure a suitable way to work out actual expected performance, i'd love to see it)
[6:24] <jefferai> heh
[6:24] <jefferai> and, if I end up saturating 20Gb of bandwidth end-to-end for each of my compute nodes, I'm okay with that
[6:24] <s_parlane> btw, depending on what system you are using on top of rados, then maybe your backup will do that
[6:24] <jefferai> that would mean an aggregate bandwidth of an order of magnitude more than I had last generation
[6:24] <jefferai> like, a literal order of magnitude
[6:24] <jefferai> so, great!
[6:25] <jefferai> mostly RBD
[6:25] <jefferai> to qemu
[6:25] <jefferai> I'll be running backups out of the VMs to a backup box
[6:25] <s_parlane> oh ok
[6:25] <jefferai> I expect it'll take up a lot of bandwidth, yes
[6:25] <jefferai> but that's okay
[6:25] <jefferai> will run off-hours
[6:26] <s_parlane> for me, off-hours are 1 day a week + (max) 6 hours a day on the other days
[6:26] <jefferai> what kind of backup were you talking about?
[6:26] <jefferai> direct object backup with snapshots, or some such thing?
[6:26] <s_parlane> doing snapshot + copy of the whole OS rbd
[6:26] <jefferai> ah
[6:27] <jefferai> I guess I could
[6:27] <jefferai> I'm not sure it'll buy me much
[6:27] <s_parlane> depends what works for you, and what you run in the VMs
[6:27] <jefferai> I have a preseed file for setting up VMs and a Salt install to configure them
[6:27] <jefferai> I can get up and running again very quickly
[6:27] <s_parlane> Salt ?
[6:27] <jefferai> so the flexibility of direct access to files in the backups is nice
[6:28] <jefferai> http://www.saltstack.org
[6:28] <jefferai> like chef but better
[6:28] <jefferai> I say that from experience :-)
[6:28] <didders_> I create and connect an rbd device through a monitor? What kind of redundancy can I build through that? Can multiple nodes connect to the same rbd through different monitors?
[6:28] <jefferai> no database, everything in files + current daemon info, so you can put all data in Git instead of worrying about backing up the various DBs/queues Chef uses
[6:29] <s_parlane> yay, git
[6:29] <s_parlane> didders_: what protocol are you talking to the monitor with ?
[6:30] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[6:30] <didders_> not sure, still figuring out where the SPOFs are… NFS or iscsi probably
[6:30] <s_parlane> ok, well NFS will cause you a SPOF
[6:30] <didders_> no multipathing?
[6:31] <s_parlane> iSCSI supports multipath, which will mean you can attach the rbd on multiple monitors, and tell it to deal with using any/all of them
[6:31] <jefferai> but you still need a clustered fs on top of that
[6:31] <jefferai> if you have the same iSCSI target exported to multiple hosts
[6:31] <s_parlane> NFS won't if you use GFS2 or OCFS, but that's a whole other clustering system to worry about
[6:31] <didders_> true, I'm assuming higher level is fenced
[6:32] <jefferai> didders_: what's your use case?
[6:32] <jefferai> maybe we can figure out a better way
[6:32] <s_parlane> didders_: how many clients need access to the data ? and how long can it be missing for before you get shot ?
[6:33] <didders_> jefferai: running VM images/instances
[6:33] <didders_> from hypervisors… rather than traditional nas
[6:33] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[6:33] <s_parlane> didders_: multipath the iscsi
[6:33] <jefferai> didders_: which hypervisor?
[6:33] <didders_> vmware
[6:33] <didders_> :-|
[6:33] <jefferai> then yeah
[6:34] <s_parlane> yeah, def multipath iscsi
[6:34] <jefferai> multipath the iscsi, and use VMFS on top
[6:34] <s_parlane> there is a guide for this
[6:34] * jefferai is moving away from vmware, finally
[6:34] <s_parlane> http://ceph.com/wiki/ISCSI
[6:35] <s_parlane> jefferai: i am trying to avoid it in the first place
[6:35] <s_parlane> openstack, cloudstack, eucalyptus ? (or something different again)
[6:35] <didders_> would my replica also play into the expected iops?
[6:35] <didders_> read I assume..
[6:35] <didders_> write just number of odd ?
[6:35] <didders_> osd
[6:36] <didders_> jefferai: nice.. I'm no fan of vmware tbh … its the corp thing you know
[6:36] <jefferai> yeah
[6:36] <s_parlane> didders_: what do you want to calc IOPs for ?
[6:36] <didders_> vthis vthat
[6:36] <s_parlane> didders_: vOpenSwitch
[6:36] <jefferai> s_parlane: openstack and cloudstack seem geared more for hundreds of machines or multi tenancy
[6:37] <jefferai> neither of which fit my needs
[6:37] <jefferai> I got recommendations from people in here to stay away from openstack for my scaling, because it is just far too much work
[6:37] <jefferai> so my current plan is to use ganeti
[6:37] * soomae (~Chatzilla@9YYAAKKL6.tor-irc.dnsbl.oftc.net) has joined #ceph
[6:37] <jefferai> I hear good things about ovirt but I don't really want to use a specialized distro
[6:37] <s_parlane> ganeti, isn't that designed for drbd ?
[6:37] <didders_> s_parlane: the in house cluster is heavily used for dev which the dev uses tons of snapshots
[6:38] <jefferai> oh, good, another tor loser
[6:38] <didders_> and snapshots kill the io
[6:38] <jefferai> dmick: you have ops... ^
[6:38] <didders_> I loved ganeti but I didn't see any clone support.. that could be different. I put a bug/feature request last year or so
[6:39] <jefferai> s_parlane: ganeti has added rbd support
[6:39] <soomae> fuck off jefferai
[6:39] <jefferai> http://docs.ganeti.org/ganeti/current/html/install.html#installing-rbd
[6:39] <didders_> when crowbar gets off the ground it will definitely help out openstack
[6:39] <s_parlane> well, ceph will do 2 IOPs per 1 write, and 1 IOPs per 1 read (distributed across the OSDs)
[6:39] * dmick sets mode +b soomae!*@*
[6:39] <jefferai> soomae: your mom is calling you from upstairs
[6:39] * soomae was kicked from #ceph by dmick
[6:40] <didders_> it kinda sorta works but you know kinda sorta works doesn't work in the board room
[6:40] <jefferai> s_parlane: only thing about ganeti is it says it needs the rbd kernel driver -- I think that's just outdated docs as qemu handles rbd natively
[6:40] <jefferai> but yeah, it appears to fully support rbd for HA
[6:40] <jefferai> I'll know soon
[6:40] <s_parlane> jefferai: oh, how recently did they add rbd ?
[6:40] <jefferai> I think in 2.5 or 2.6
[6:40] <jefferai> and, I don't really need cloning
[6:41] <jefferai> I've used it to clone templates in vmware
[6:41] * nedaja (~Chatzilla@9YYAAKKL6.tor-irc.dnsbl.oftc.net) has joined #ceph
[6:41] <jefferai> but now that I have a preseeding solution it's practically just as fast to do that
[6:41] <didders_> s_parlane: would that metric be multiplied by the number of disks and their rated iops?
[6:41] <jefferai> dmick: ^
[6:41] <didders_> oh come on
[6:41] <didders_> ffs
[6:41] * nedaja was kicked from #ceph by dmick
[6:41] <didders_> halp :)
[6:41] <jefferai> dmick: does oftc allow blocking certain masks from joining?
[6:41] <jefferai> they're all coming from tor
[6:41] <dmick> ueaj
[6:42] <dmick> er, yeah
[6:42] <dmick> that doesn't seem like a great filter tho
[6:42] <jefferai> it doesn't, no
[6:42] <didders_> some misinformed member of my team at work set up the vmware NAS with 4 separate raid 5 groups…64k stripe
[6:42] <jefferai> didders_: :-D
[6:43] * dmick sets mode +b nedaja!*@*
[6:43] <jefferai> dmick: they use different names every time
[6:43] <jefferai> randomly pick them I thin
[6:43] <jefferai> think
[6:43] <dmick> yeah. I don't know what else to do
[6:43] <jefferai> the idea of tor is nice, but practically speaking it's mostly a wasteland
[6:43] <jefferai> like freenet
[6:43] <jefferai> the concept is interesting, but if you actually look at it it's all just kiddie porn
[6:43] <didders_> I'm really tired of raid tbh
[6:44] <s_parlane> didders_: per IOP coming into the cluster, so theoretically, work out the entire maximum read and write IOPs supported, then halve the write value, and those are your MAX numbers for read and write IOPs
[6:44] * jefferai isn't tired of two-disk mirrors :-D
[6:44] <didders_> yeah a mirror i can deal with … a 22TB array is scary
[6:44] <jefferai> sure is
[6:45] <s_parlane> didders_: replace half with third for replica count of 3, and so on
[6:45] <s_parlane> (treat it like raid 1+0)
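Editor's note: a rough sketch of the rule of thumb s_parlane describes above (reads cost one disk IOP, writes cost one IOP per replica, like RAID 1+0). The function name and the per-disk IOPS figure are hypothetical, and this is a theoretical ceiling only; s_parlane himself cautions that it is untested in practice.

```python
def cluster_max_iops(num_disks, iops_per_disk, replicas):
    """Theoretical ceiling on client-visible IOPS for a replicated cluster.

    Reads are served by a single copy, so the read ceiling is the raw total.
    Every client write is written once per replica, so the write ceiling is
    the raw total divided by the replica count.
    """
    raw_total = num_disks * iops_per_disk
    max_read = raw_total
    max_write = raw_total / replicas
    return max_read, max_write


if __name__ == "__main__":
    # Assumed example: 10 disks rated at 100 IOPS each.
    for replicas in (2, 3):
        reads, writes = cluster_max_iops(10, 100, replicas)
        print(f"replicas={replicas}: read ceiling {reads}, write ceiling {writes:.0f}")
```

Journal writes, network latency, and uneven data placement are all ignored here, so treat the result as a whiteboard number rather than a prediction.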
[6:46] <jefferai> s_parlane: so with my 8 3.5" machines, 2 disks for OS /journal each, so 10 drives per machine, that's 320TB raw space, or ~107TB with 3-way replication
[6:46] <jefferai> if I fill it all up
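Editor's note: the capacity figures above can be reproduced with a short calculation. The 4 TB drive size is inferred from 320TB spread over 80 data drives; it is not stated explicitly in the channel, and the function names are hypothetical.

```python
def raw_capacity_tb(machines, data_drives_per_machine, drive_tb):
    """Total raw capacity across all data drives, in TB."""
    return machines * data_drives_per_machine * drive_tb


def usable_capacity_tb(raw_tb, replicas):
    """Capacity left for data when every object is stored `replicas` times."""
    return raw_tb / replicas


if __name__ == "__main__":
    # 8 machines, 10 data drives each (12 bays minus 2 OS/journal disks),
    # assumed 4 TB per drive.
    raw = raw_capacity_tb(8, 10, 4)
    usable = usable_capacity_tb(raw, 3)
    print(f"raw: {raw} TB, usable with 3-way replication: {usable:.1f} TB")
```

This ignores filesystem overhead and the free space a cluster needs in order to rebalance after a failure, so the practical figure would be somewhat lower.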
[6:46] <jefferai> not bad for the very reasonable cost
[6:46] <jefferai> and other people at work tried to tell me that netapp was really quite reasonable :-)
[6:47] <s_parlane> yeah, make sure your journals can handle being written to faster than all of your drives combined
[6:47] <jefferai> (which it isn't, and is extra unreasonable if you want a second box for HA/redundancy)
[6:47] <didders_> s_parlane: think of ceph iops calc as raid10 ?
[6:47] <jefferai> eh
[6:47] <didders_> thats with a 2 replica ?
[6:47] <s_parlane> yes
[6:47] <jefferai> the journals are on SSDs, 8GB per journal
[6:47] <jefferai> It
[6:47] <jefferai> it's the best I can do
[6:48] <jefferai> two SSDs shared amongst the various drives
[6:48] <jefferai> in own partitions, though
[6:48] <s_parlane> jefferai: how much ram per node ?
[6:48] <didders_> ok cool, i feel better something i can scribble on the white board against :)
[6:48] <jefferai> 128GB
[6:48] <jefferai> oh sorry
[6:48] <jefferai> 64 GB
[6:48] <didders_> thanks.. i think this could be wikiable
[6:48] <jefferai> 128GB on my VM nodes
[6:49] <s_parlane> don't ask me to prove that in practice, but in theory it's right
[6:49] <jefferai> that gives me > 2GB RAM per OSD
[6:49] <jefferai> and on most boxes > 4GB RAM per OSD
[6:49] <jefferai> well above the minimums in the docs
[6:49] <s_parlane> jefferai: better than my current 0.5GB per (2TB) disk
[6:49] <jefferai> 12 cores per box
[6:50] <s_parlane> (not to mention, 1/2 core per disk)
[6:50] <jefferai> yeah, I just ordered at the sweet spot
[6:50] <jefferai> Opterons are far cheaper than Xeons
[6:50] <jefferai> and 4GB sticks far cheaper than 8GB
[6:50] * sjustlaptop1 (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[6:51] <s_parlane> im pre-production, so im using pentium4's with 4x 3.5 2TB drives and 2GB ram + 3x GbE
[6:51] <didders_> AMDs stock is like $2 .. crazy
[6:51] <jefferai> yeah, really is
[6:51] <jefferai> I really don't quite understand it
[6:51] * yoshi (~yoshi@p30106-ipngn4002marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[6:51] <jefferai> they have very decent processors for the price
[6:52] * yoshi (~yoshi@p30106-ipngn4002marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[6:52] <jefferai> I know people are concerned with density, but doesn't cost factor in somewhere?
[6:52] <s_parlane> i can't saturate the network, but the cpu and memory are complaining a lot
[6:52] <jefferai> and for something like Ceph, it's quite decent enough
[6:52] <s_parlane> TCO, its all about the TCO
[6:52] <didders_> jefferai: we buy next day support from dell and RARELY ever use it… waste is a fact of life
[6:52] <jefferai> sure, true
[6:53] <s_parlane> initial + running (power *2) + maintenance + downtime + admin time + upgrades
[6:53] <jefferai> the newest xeons are decently faster than the newest opterons, but about 4* the price
[6:53] <didders_> apple is trying to push intel out rumor mill has it
[6:53] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[6:53] <didders_> might shake something up
[6:54] <jefferai> I guess if I ever get all 20 SSDs in this box I won't have a core per
[6:54] <s_parlane> - recoverable materials profit (at EOL)
[6:54] <jefferai> I'll have 0.666 core per
[6:54] <jefferai> or so
[6:54] <jefferai> but hoping that will still be okay
[6:54] <jefferai> and can upgrade the processor...
[6:54] <didders_> you guys ever see http://www.solidfire.com …real deal or vapor?
[6:55] <didders_> ssd array
[6:55] <s_parlane> front page wants me to wait 5 days
[6:55] <didders_> hope you have something nice to drink
[6:56] <didders_> :-p
[6:56] <jefferai> real deal I'm sure
[6:56] <jefferai> but I don't see anything revolutionary there
[6:56] <jefferai> and let's see what the prices end up being
[6:56] <didders_> who else builds SSD arrays?
[6:56] <didders_> other than DIY
[6:56] <jefferai> everyone
[6:56] <didders_> supermicro + ssd
[6:56] <jefferai> any storage vendor now has SSD arrays
[6:56] <didders_> everyone = emc/netapp
[6:56] <didders_> etc
[6:57] <didders_> hmm
[6:57] <jefferai> it's not rocket science
[6:57] <jefferai> nor innovative
[6:57] <didders_> true
[6:57] <jefferai> it's replacing hard drives with SSDs
[6:57] <jefferai> and your IOPS go up
[6:57] <jefferai> magic!
[6:57] <didders_> :)
[6:57] <jefferai> the rest is just nice management consoles on top
[6:57] <jefferai> and some easy-integration/scaling sauce
[6:57] <s_parlane> yeah, but you can fake all that with enough RAM and SSD caches
[6:57] <jefferai> true
[6:58] <jefferai> hell, they could be using ceph under the hood, export rbd volumes
[6:58] <didders_> :)
[6:58] <jefferai> ceph makes it easy to build that kind of thing, for someone with the startup capital
[6:58] <s_parlane> actually i think its rbd<->iSCSI
[6:58] <didders_> I'd hope they're not making raid 5 groups ;)
[6:58] <jefferai> s_parlane: yeah, I meant export rbd volumes over iscsi
[6:58] <s_parlane> yeah
[6:58] <didders_> ugh
[6:58] <didders_> i should just ...
[6:58] <didders_> yeah anyway
[6:58] <didders_> :)
[6:59] <didders_> SMASH
[6:59] <s_parlane> 5 nodes use 1.5kW
[6:59] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[6:59] <didders_> … wow how'd the array get corrupted ..crazy guess we need to rebuild it
[6:59] <didders_> <_<
[7:00] <jefferai> s_parlane: "depending on I/O load"
[7:00] <jefferai> :-)
[7:00] <s_parlane> didders_: dd if=/dev/zero of=/dev/iscsi
[7:00] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) has joined #ceph
[7:00] <s_parlane> oh, its blank anyways, lets just adjust that, oh here you go, more storage
[7:01] <s_parlane> (or less probably, since i guess you want to switch to 10 or 6)
[7:01] <s_parlane> anyways, time to go for a bit, bbl
[7:01] <didders_> yeah I'm pushing 10
[7:01] <jefferai> you're all so cynical :-)
[7:01] * jefferai isn't, of course
[7:02] <jefferai> see yo
[7:02] <jefferai> you
[7:02] <jefferai> time for bed, here
[7:02] <didders_> later
[7:03] <didders_> lol omg they're playing quarters with a giant solo cup on jimmy fallon
[7:04] <didders_> funny story my boss keeps asking me how many servers we have 'left' and i keep telling him we're out :)
[7:05] * dmick (~dmick@2607:f298:a:607:a19c:5287:dc35:bf55) Quit (Quit: Leaving.)
[7:05] <didders_> oh management
[7:07] * sjustlaptop1 (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[7:09] * s_parlane (~scott@ Quit (Ping timeout: 480 seconds)
[7:15] <rweeks> tell him you need more servers to run this amazing ceph software
[7:17] * shitter (~Chatzilla@9YYAAKKL6.tor-irc.dnsbl.oftc.net) has joined #ceph
[7:18] * shitter (~Chatzilla@9YYAAKKL6.tor-irc.dnsbl.oftc.net) Quit ()
[7:25] * didders_ (~btaylor@ Quit (Quit: didders_)
[7:28] * sagelap (~sage@bzq-218-183-205.red.bezeqint.net) Quit (Read error: No route to host)
[7:39] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[7:45] <tore_> if you are made of money, forget ssd's... God loves STEC ZeusRAM
[7:48] * iltisanni (d4d3c928@ircip1.mibbit.com) has joined #ceph
[7:49] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[7:50] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[7:52] * tnt (~tnt@50.90-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:03] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:04] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[8:04] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:09] * sagelap (~sage@bzq-19-168-31-70.red.bezeqint.net) has joined #ceph
[8:28] <dweazle> tore_: yes got one of those in my ZFS cluster, they're awesome:)
[8:30] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:30] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:32] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:32] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:46] <tore_> yeah ssd in raid setups typically don't live too long. There's a site floating around where they track how long you can continuously write to a specific model SSD before it breaks. Most don't last more than a few months.
[8:47] <tore_> obviously I'm talking about MLC SSD's. I'm not sold on SLCs either considering their cost...
[8:55] <dweazle> yes, we use stec zeusram for zil and some stec mach16 for read caching in our zfs cluster, but with ceph and the amount of nodes we would like to deploy that really isn't an option
[8:56] * jtang1 (~jtang@ has joined #ceph
[8:56] <dweazle> we haven't deployed any other ssd's just yet because of the endurance issue
[8:58] <tore_> You running nexenta?
[8:58] <dweazle> no freebsd
[8:59] <tore_> I've only run nexenta and ZFS on ubuntu. to tell you the truth I haven't been impressed with what I've seen on ZFS. It kicks out drives too easily
[8:59] <dweazle> oh i haven't seen any of that, this cluster has been in production for almost 2 years now
[9:00] <dweazle> i only use raid-10 though, no raidz
[9:00] <dweazle> and no dedup, no compression
[9:00] <dweazle> capacity is not an issue, i would rather have more iops
[9:01] <dweazle> i wasn't really that impressed by nexenta, especially considering they charge quite a lot for the HA stuff
[9:02] <dweazle> we now basically have the same on freebsd
[9:04] <tore_> we run raid 10, raidz2 and raidz3. I think we maybe have a dozen nexenta clusters as of last week. part of our problem was with seagate drives. They've had some serious issues in the past 18 months with drive longevity. Hitachi's are also kind of sucking right now with extraordinary failure rates during the first 30 days of service.
[9:05] <tore_> here check this out. this is a good read: http://www.xtremesystems.org/forums/showthread.php?271063-SSD-Write-Endurance-25nm-Vs-34nm
[9:05] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[9:05] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[9:06] <tore_> yeah I would not have bought nexenta, but ***sigh*** management...
[9:06] <dweazle> we only use WD at the moment
[9:06] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:06] <dweazle> had some issues with hitachi and seagate as well
[9:06] <tore_> hehe soon the hitachi problems will be at your doorstep then
[9:06] <dweazle> ah thanks for the link
[9:07] <tore_> yeah it's a good one
[9:07] <dweazle> our zfs cluster only has WD drives
[9:07] <tore_> http://wdc.com/en/company/hgst/
[9:07] <tore_> WD acquired hitachi
[9:07] <dweazle> oh crap
[9:07] <tore_> yep
[9:07] <dweazle> haha didn't know that :)
[9:07] <tore_> even Netapp is freaking out now
[9:08] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[9:08] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[9:08] <tore_> They are losing seagate and hitachi drives left and right. One of our Netapp filers had triple disk failures on 3 different raid6 aggregates in the span of 3 weeks
[9:10] * tnt (~tnt@50.90-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:12] * ramsay_za (~ramsay_za@ has joined #ceph
[9:13] <dweazle> oh wow
[9:14] * jlogan (~Thunderbi@2600:c00:3010:1:4990:f1e9:6310:a09f) has joined #ceph
[9:21] * jtang1 (~jtang@ Quit (Quit: Leaving.)
[9:26] <dweazle> tore_: interesting .. actually it seems the MWI is completely useless to determine how much life an SSD has left :) (in that regard kind of like S.M.A.R.T. on HDD's;)
[9:33] * tnt (~tnt@ptra-178-50-65-72.mobistar.be) has joined #ceph
[9:38] * Leseb (~Leseb@ has joined #ceph
[9:45] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:51] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has joined #ceph
[10:01] * gucki (~smuxi@84-72-8-40.dclient.hispeed.ch) has joined #ceph
[10:02] <gucki> good morning everybody
[10:02] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[10:03] <gucki> does anybody know how much memory overhead rbd has on kvm guests when using rbd disks? I'm asking because I see a kvm guest with 2048 mb ram is actually taking 2700 mb on the host...?! :-(
[10:08] <tnt> Mmm, I guess there is the cache. I think you can configure the amount of RAM you dedicate to it IIRC.
[10:09] * Cube1 (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[10:10] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[10:10] * BillK (~billk@58-7-223-146.dyn.iinet.net.au) has joined #ceph
[10:15] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[10:21] <gucki> tnt: you mean the cache for writeback? it's set to 32mb in the ceph.conf
[10:21] <gucki> tnt: and i only attached one disk
[10:26] <gucki> it put the question and details here http://serverfault.com/questions/446646/kvm-process-has-to-high-memory-footprint-on-host
[10:39] * tnt (~tnt@ptra-178-50-65-72.mobistar.be) Quit (Ping timeout: 480 seconds)
[10:41] * loicd (~loic@ has joined #ceph
[10:43] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[10:43] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[10:48] * tnt (~tnt@ptra-178-50-65-72.mobistar.be) has joined #ceph
[10:51] <gucki> tnt: did you get my last messages? :(
[10:51] <tnt> last one I got was "10:26 < gucki> it put the question and details ..."
[10:52] <tnt> but I had to leave temporarily :p
[10:52] <gucki> tnt: ah ok yeah, it was my last one ;)
[10:52] <tnt> Unfortunately I don't know. You can look at the process map to see what uses that memory maybe.
[10:52] <gucki> tnt: can you explain me how to do this? :)
[10:52] <tnt> cat /proc/<pid>/maps
[10:53] <tnt> then look for what segments are large :p
[10:53] <tnt> but if it's just the heap, it won't help much
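Editor's note: a small sketch of the approach tnt suggests, parsing the text of `/proc/<pid>/maps` and listing the largest mappings. The function name `largest_mappings` is hypothetical. As tnt warns, large anonymous regions (heap or mmap'd memory) carry no name, so this shows where the address space goes but not why; note also that mapping size is virtual address space, not resident memory.

```python
import sys


def largest_mappings(maps_text, top=5):
    """Return the `top` largest mappings as (size_in_bytes, name) tuples.

    Each line of /proc/<pid>/maps looks like:
        start-end perms offset dev inode [pathname]
    Mappings with no pathname are reported as "[anon]".
    """
    regions = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(addr, 16) for addr in fields[0].split("-"))
        name = fields[5].strip() if len(fields) > 5 else "[anon]"
        regions.append((end - start, name))
    regions.sort(key=lambda region: region[0], reverse=True)
    return regions[:top]


if __name__ == "__main__":
    # Usage: python maps.py <pid>   (defaults to this process; Linux only)
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    with open(f"/proc/{pid}/maps") as handle:
        for size, name in largest_mappings(handle.read(), top=10):
            print(f"{size / (1024 * 1024):10.1f} MiB  {name}")
```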
[10:54] * jtang1 (~jtang@2001:770:10:500:a9cc:4d6f:df82:1d7d) has joined #ceph
[10:58] * tnt_ (~tnt@212-166-48-236.win.be) has joined #ceph
[11:02] * tnt (~tnt@ptra-178-50-65-72.mobistar.be) Quit (Ping timeout: 480 seconds)
[11:02] * tnt_ is now known as tnt
[11:12] * loicd (~loic@ Quit (Quit: Leaving.)
[11:15] * s_parlane (~scott@121-74-248-190.telstraclear.net) has joined #ceph
[11:17] * jlogan (~Thunderbi@2600:c00:3010:1:4990:f1e9:6310:a09f) Quit (Ping timeout: 480 seconds)
[11:19] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[11:25] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:25] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:26] * BillK (~billk@58-7-223-146.dyn.iinet.net.au) Quit (Quit: Ex-Chat)
[11:33] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:33] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:40] * maxiz (~pfliu@ Quit (Ping timeout: 480 seconds)
[11:48] * sagelap (~sage@bzq-19-168-31-70.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[11:49] * BManojlovic (~steki@ has joined #ceph
[11:57] * yoshi (~yoshi@p30106-ipngn4002marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:01] <gucki> tnt: mh ok, i just see a bunch of mapped libraries and then huge areas without any explanation... :
[12:17] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[12:17] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[12:27] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[12:45] * gaveen (~gaveen@ has joined #ceph
[12:47] * loicd (~loic@ has joined #ceph
[12:49] * joao (~JL@ Quit (Remote host closed the connection)
[13:04] * didders_ (~btaylor@ has joined #ceph
[13:08] <Qten> hey guys, seems the copy-on-write cloning isn't quite working for me, I'm using the latest version of ceph 0.53 w/ folsom & cinder and a RAW volume, however when i provision an image it "downloads" the full image to the new volume. I also have the option show_image_direct_url=True in my /etc/glance/glance-api.conf and glance is configured to use rbd for storage etc. any ideas?
[13:17] <benner> how many CPU cores are needed per OSD? 1:1?
[13:33] * gaveen (~gaveen@ Quit (Ping timeout: 480 seconds)
[13:40] * starfish (~Chatzilla@659AABU56.tor-irc.dnsbl.oftc.net) has joined #ceph
[13:42] * gaveen (~gaveen@ has joined #ceph
[13:44] * starfish (~Chatzilla@659AABU56.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[13:45] * starfish (~Chatzilla@82VAAHSIV.tor-irc.dnsbl.oftc.net) has joined #ceph
[13:54] * starfish (~Chatzilla@82VAAHSIV.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[13:54] * starfish (~Chatzilla@659AABU67.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:01] * didders_ (~btaylor@ Quit (Quit: didders_)
[14:13] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit (Quit: Leaving.)
[14:17] * Kioob`Taff1 (~plug-oliv@local.plusdinfo.com) has joined #ceph
[14:23] * noob2 (a5a00214@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[14:25] <Kioob`Taff1> hi
[14:25] <starfish> hi
[14:25] * Kioob`Taff1 is now known as Kioob
[14:27] <Kioob> simple question about a RBD setup : is it possible to start with all OSD on one unique server, then add some physical servers after ?
[14:27] <Kioob> will data automatically be moved to the other servers ?
[14:31] <tnt> Any admin around to ban him ?
[14:33] * starfish (~Chatzilla@659AABU67.tor-irc.dnsbl.oftc.net) Quit (Killed (tomaw (No reason)))
[14:33] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[14:34] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[14:35] <tnt> Kioob: yes, you can add osd later and data will be spread around
[14:36] * joao (~JL@89-181-150-224.net.novis.pt) has joined #ceph
[14:38] <Kioob> thanks tnt !
[14:46] * Cube1 (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[14:46] <jtang> whats with the bots?
[14:47] <jtang> im surprised to see two in 24hrs
[14:48] <tnt> yeah, weird ... someone doesn't like ceph ?
[14:48] <jtang> heh
[14:48] <jtang> i need to print me my flight details and maps for sc12
[14:48] <jtang> :)
[14:48] * winsx (~chatzilla@9KCAAC0KF.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:48] <jtang> kinda looking forward to it, i'm gonna be so jet lagged!
[14:51] * starfish (~Chatzilla@83TAACFAL.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:56] * winsx (~chatzilla@9KCAAC0KF.tor-irc.dnsbl.oftc.net) has left #ceph
[15:13] * jstrunk (~quassel@ Quit (Remote host closed the connection)
[15:15] * jstrunk (~quassel@ has joined #ceph
[15:18] * jstrunk (~quassel@ Quit (Remote host closed the connection)
[15:21] * jstrunk (~quassel@ has joined #ceph
[15:21] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[15:23] * danieagle (~Daniel@ has joined #ceph
[15:29] * long (~chatzilla@ has joined #ceph
[15:36] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[15:36] * yanzheng (~zhyan@ has joined #ceph
[15:37] * nhorman (~nhorman@nat-pool-rdu.redhat.com) has joined #ceph
[15:42] * starfish (~Chatzilla@83TAACFAL.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[15:59] * PerlStalker (~PerlStalk@perlstalker-1-pt.tunnel.tserv8.dal1.ipv6.he.net) has joined #ceph
[16:03] <tnt> yehudasa: Yesterday you proposed to make the object belong to the subuser, so I had the idea of making it belong to it but only until the multipart completes, then make it belong to the principal. But I don't quite see how you would assign ownership to a subuser ? they don't have a real uid afaict.
[16:11] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[16:16] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[16:19] * yanzheng (~zhyan@ Quit (Quit: Leaving)
[16:19] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[16:49] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[17:06] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[17:11] <jefferai> Hallo
[17:12] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:14] <jefferai> now that I have a working storage cluster, I want to have my compute nodes connect to it -- so my understanding is that I need to have ceph.conf for the cluster and a client key with sufficient access
[17:14] <jefferai> on each compute node
[17:14] <jefferai> if I do that, then rbd should just work?
[17:14] <jefferai> or is there some client setup I'm missing?
[17:17] * long (~chatzilla@ Quit (Quit: ChatZilla 0.9.89 [Firefox 16.0.2/20121024073032])
[17:17] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:20] * jlogan1 (~Thunderbi@2600:c00:3010:1:4990:f1e9:6310:a09f) has joined #ceph
[17:23] * josef (~seven@li70-116.members.linode.com) has left #ceph
[17:24] * Qten (Q@qten.qnet.net.au) Quit (Ping timeout: 480 seconds)
[17:24] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[17:25] * filooabsynth (~Adium@p4FD07815.dip.t-dialin.net) has joined #ceph
[17:28] * oliver2 (~oliver@ has joined #ceph
[17:28] <filooabsynth> there we are again ;)
[17:32] <oliver2> Yes... small advice... uhm, best-practice changed in the docu, when integrating new OSD's /node, to begin with a weight of 0... q.: what's with the crushmap when weighting an node/pool etc? Example: have 4 new OSD's on a new node.
[17:33] <oliver2> We raised from 16 OSD to 20 now. Should the weight in the pool stay on 16 as long as all OSD's weights have 0?
[17:34] * stxShadow (~jens@jump.filoo.de) has joined #ceph
[17:40] <filooabsynth> anyone around?
[17:49] <scuttlemonkey> hey filooabsynth, what's up?
[17:49] * ScottIam (~ScottIam@ has joined #ceph
[17:54] * jasat (~Chatzilla@1GLAAATFK.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:54] <oliver2> Thing is: 4 hosts in a rack in a pool with weight 4 each, sums up to 16. New new node with 4 OSD's, would end in new overall 20. Should twenty be specified from first crushmap injection, or should it stay at 16 as long as all new OSD's are on 0?
[17:55] <jasat> Thing is: 4 hosts in a rack in a pool with weight 4 each, sums up to 16. New new node with 4 OSD's, would end in new overall 20. Should twenty be specified from first crushmap injection, or should it stay at 16 as long as all new OSD's are on 0?
[17:58] * jasat is now known as jasoom
[18:06] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:10] * sagelap (~sage@bzq-218-183-205.red.bezeqint.net) has joined #ceph
[18:15] * filooabsynth (~Adium@p4FD07815.dip.t-dialin.net) Quit (Quit: Leaving.)
[18:18] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[18:19] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[18:24] * jasoom is now known as barsoom
[18:25] * tnt (~tnt@50.90-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:26] * Leseb (~Leseb@ Quit (Quit: Leseb)
[18:27] * Kioob (~plug-oliv@local.plusdinfo.com) Quit (Ping timeout: 480 seconds)
[18:31] * barsoom (~Chatzilla@1GLAAATFK.tor-irc.dnsbl.oftc.net) Quit (Killed (Ganneff (sod off)))
[18:32] * barsoom (~Chatzilla@659AABVJV.tor-irc.dnsbl.oftc.net) has joined #ceph
[18:32] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[18:33] * barsoom (~Chatzilla@659AABVJV.tor-irc.dnsbl.oftc.net) Quit (Killed (Ganneff (No reason)))
[18:40] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[18:42] * nwatkins (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[18:42] * nwatkins (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has left #ceph
[18:48] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[18:54] <benpol> So with all of the issues with journals, SSDs, and filesystems, I'm curious if anyone's spent much time looking into the use of hybrid drives (like Seagate's Momentus XT) as OSDs.
[18:55] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[18:57] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[18:59] * stxShadow (~jens@jump.filoo.de) Quit (Remote host closed the connection)
[19:02] <benpol> Though as I read more about the Momentus approach I see that the NAND is basically used to speed up *read* operations, which is great but doesn't address the need to accelerate writes via a journal on a separate device. That's all, thanks for listening. :_
[19:02] * kbad (~kbad@malicious.dreamhost.com) has joined #ceph
[19:03] <darkfaded> benpol: it also skips almost all sequential r/w
[19:03] <darkfaded> and from what i understood, the ceph journal is quite sequential write-ish
[19:03] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[19:04] * benpol nods
[19:04] <rweeks> yes, because that makes sense
[19:04] <rweeks> random-write journal seems… heh
[19:05] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[19:08] <sagelap> ceph --admin-daemon <pathname> config set debug_client 20
[19:08] <jefferai> stupid question, but: once I have a running ceph cluster, with public/private addressing, what do clients need access to in order to access e.g. rbd devices?
[19:08] <jefferai> the public network, plus a valid key?
[19:10] * guigouz (~guigouz@ has joined #ceph
[19:10] <slang> sagelap: you were also saying in theory we might be able to set the config in gdb with (gdb) cct->conf.debug_client=20;
[19:10] <slang> ?
[19:10] <slang> or something similar...
[19:10] <sagelap> slang: yeah. that used to be easy because the debug stuff was just a regular config value
[19:11] <sagelap> now it is more like cct->_subsys[ceph_subsys_client].log_level = 20 ; cct->_subsys[ceph_subsys_client].gather_level = 20
[19:11] * slang nods
[19:11] <sagelap> something along those lines
[19:11] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[19:12] * chutzpah (~chutz@ has joined #ceph
[19:13] <davidz> It seemed to work but nothing coming out to client.0.log afterward
[19:13] <davidz> ceph --admin-daemon /tmp/cephtest/asok.client.0 config set debug_client 20
[19:13] <davidz> { "success": "applying configuration change: internal_safe_to_start_threads = 'true'\n"}
[19:13] <sagelap> davidz: hmm. i think it will loop every 10-20 seconds when the mds session msg goes out. unless those have all closed..
[19:17] <joao> has anyone seen this while compiling master?
[19:17] <joao> make[3]: *** No rule to make target `librbd/cls_rbd_client.cc', needed by `librbd_la-cls_rbd_client.lo'. Stop.
[19:17] <yehuda_hm> joao: make distclean, reconfigure, remake
[19:17] <davidz> sagelap: I think that when I raised the debug level a new file client.admin.log appeared but it is empty even after 5 minutes.
[19:17] <sagelap> make clean ; rm -r .deps ; rerun autogen.sh ; make
[19:17] <joao> oh, makes sense
[19:19] <jefferai> hooray
[19:19] <jefferai> got a core dump
[19:19] <jefferai> sagelap: want to know about it?
[19:21] * oliver2 (~oliver@ Quit (Quit: Leaving.)
[19:23] <sagelap> jefferai: yeah
[19:23] <jefferai> sagelap: will file a bug report, but here's the trace;
[19:24] <jefferai> http://paste.kde.org/600266/
[19:26] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) Quit (Quit: Leaving.)
[19:27] <jefferai> sagelap: http://tracker.newdream.net/issues/3463
[19:31] <sagelap> jefferai: if you have a core file, can you gdb and include a 'bt'?
[19:31] <jefferai> if you tell me where to find the core file :-)
[19:31] <sagelap> usually in current directory, or /core
[19:33] <jefferai> sagelap: thread apply all bt or some such thing?
[19:33] <jefferai> cause I'm afraid that there isn't much useful info here
[19:33] <sagelap> or just 'bt', if it's the crash
[19:33] * jefferai looks for a debug package
[19:33] <sagelap> do you have 'ceph-dbg' package installed (on a debian/ubuntu box)?
[19:33] <jefferai> nope, will do
[19:34] <sagelap> that should give you the symboles
[19:34] <sagelap> s
[19:34] <jefferai> yeah, I know
[19:34] <jefferai> getting it installed
[19:35] * nhorman (~nhorman@nat-pool-rdu.redhat.com) Quit (Quit: Leaving)
[19:35] * rino (~rino@ has joined #ceph
[19:35] <jefferai> sagelap: updated the issue
[19:35] <joshd> gucki: what version of qemu are you using? I think it was 1.2 that had a memory leak like that
[19:36] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[19:37] <gucki> joshd: hey :). yeah it's 1.2.0
[19:37] <gucki> joshd: the one from ubuntu quantal
[19:37] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:37] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[19:38] <joshd> gucki: librbd will have some overhead, but no leaks that I know of. if you compare with using a raw image file, you can see if it's qemu
[19:38] <jefferai> joshd: related question - I have separate cluster/public addresses. I assume to access RBDs from a qemu box, I need to have it on the public network, with an appropriate access key...right?
[19:39] <gucki> joshd: does the overhead of librbd depend on the thread settings (op threads and filestore threads)? i set the wb cache to 32mb, so the overhead should be like 50mb in total?
[19:39] <gucki> joshd: yeah seems like i'll have to do some deeper debugging...but 700mb (30%) seems really evil :(
[19:39] <joshd> jefferai: yup, and a ceph.conf with the monitor addresses (or some other way to configure those, depending on your management platform)
[19:40] <jefferai> ah, so the monitors hand out the access permissions?
[19:40] <jefferai> so it uses the client admin key to talk to a monitor, which it finds via ceph.conf, and then gets the storage info?
[19:41] <joshd> yes, although it doesn't have to be the admin key
[19:41] <jefferai> right
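The client-side setup joshd describes (a ceph.conf that lists the monitors, plus a key that need not be the admin key) can be sketched as a small generator. This is only an illustration: the client name `compute`, the addresses and the keyring path are hypothetical, and it assumes the `mon host` and `keyring` options are what your Ceph version expects.

```python
import configparser
import io


def render_client_conf(mon_hosts, client_name, keyring_path):
    """Render a minimal client-side ceph.conf as text.

    Assumption: the 'mon host' option in [global] and a 'keyring' option
    in the [client.<name>] section are honoured by the installed Ceph.
    """
    conf = configparser.ConfigParser()
    # Monitors the client contacts first to fetch the cluster maps.
    conf["global"] = {"mon host": ",".join(mon_hosts)}
    # A non-admin client with its own keyring (names are hypothetical).
    conf[f"client.{client_name}"] = {"keyring": keyring_path}
    out = io.StringIO()
    conf.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(render_client_conf(["10.0.0.1:6789", "10.0.0.2:6789"],
                             "compute", "/etc/ceph/keyring.compute"))
```

The compute node still has to be able to reach the monitors and OSDs on the public network, as confirmed in the chat.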
[19:42] <joshd> gucki: I'm not sure what the overhead is offhand, but it may be a bit higher than 50mb virtual. there will be a bunch of threads for networking
[19:48] * Oliver2 (~oliver1@ip-178-201-146-106.unitymediagroup.de) has joined #ceph
[19:49] * Oliver2 (~oliver1@ip-178-201-146-106.unitymediagroup.de) Quit ()
[19:49] <todin> hi, does anyone know if the webcast about chef by dreamhost from yesterday is somewhere online?
[19:52] <jefferai> sagelap: any chance that could happen if there was an issue with the monmap that I retrieved?
[19:52] <jefferai> I got the monmap and put it into monmap/latest
[19:52] <jefferai> I made an assumption that that was where it was supposed to go
[19:52] * adjohn (~adjohn@m9e0536d0.tmodns.net) has joined #ceph
[19:53] * adjohn (~adjohn@m9e0536d0.tmodns.net) has left #ceph
[20:00] * dmick (~dmick@2607:f298:a:607:752e:36a4:1152:7d34) has joined #ceph
[20:03] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[20:04] * ChanServ sets mode +o dmick
[20:05] * jtang1 (~jtang@2001:770:10:500:a9cc:4d6f:df82:1d7d) Quit (Ping timeout: 480 seconds)
[20:07] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[20:09] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[20:11] * jmlowe (~Adium@2001:18e8:2:28a2:8d57:7a91:8ad9:382c) has joined #ceph
[20:14] <dmick> no more tor trolls today?
[20:14] <elder> Not recently anyway.
[20:15] <jmlowe> I could always start trolling if you need that
[20:16] <elder> I could always start trolling if you need that
[20:16] <dmick> the community, stepping up!
[20:16] <jmlowe> well played sir
[20:17] * guigouz (~guigouz@ Quit (Ping timeout: 480 seconds)
[20:19] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[20:23] * rweeks puts on his troll wig
[20:26] <jefferai> dmick: managed to find a repeatable abort, so yes, I'm still nervous :-)
[20:27] <dmick> repeatable abort?
[20:27] <dmick> bug leading to crash?
[20:27] <joshd> jefferai: I think this is relevant: http://tracker.newdream.net/issues/3438#note-2
[20:30] * guigouz (~guigouz@ has joined #ceph
[20:31] <jefferai> dmick: yeah, http://tracker.newdream.net/issues/3463
[20:31] <jefferai> joshd: very different outcome, at least
[20:32] <jefferai> although I concur that the documentation was quite unclear
[20:33] <joao> that was weird
[20:34] <sagelap> jefferai: that is exactly the problem
[20:34] <sagelap> latest file is prefixed by a u32 of the length of the rest of the data
[20:34] <joao> oh
[20:34] <jefferai> sagelap: hm
[20:34] <jefferai> any idea what went wrong?
[20:34] <sagelap> (iirc :)
[20:35] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Remote host closed the connection)
[20:35] <sagelap> you copied a monmap file to latest manually, right?
[20:35] <jefferai> I did
[20:35] <joao> so, let me get this straight, you got the monmap to monmap/latest?
[20:35] <jefferai> ceph mon getmap -o <mon data dir>/monmap/latest
[20:35] <sagelap> yeah, that's the problem
[20:35] <joao> was that a fresh mkfs'd monitor?
[20:35] <jefferai> joao: no, that was before mkfs
[20:36] <joao> I'm with sagelap
[20:36] <jefferai> step 3
[20:36] <jefferai> mkfs is step 4
[20:36] <sagelap> sorry, i misspoke: it needs a u64 with the version number (monmap epoch) to come first, then the monmap itself
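To see why a raw `ceph mon getmap` dump is not a valid `monmap/latest` file, here is a rough sketch of the layout sagelap describes: a u64 version (the monmap epoch) followed by the monmap bytes. Little-endian byte order is an assumption, the helper names are made up, and this is for illustration only; as the discussion goes on to say, files in the mon data dir should not be written by hand.

```python
import struct


def wrap_with_version(epoch: int, monmap_blob: bytes) -> bytes:
    """Prefix a monmap blob with a u64 epoch (byte order is assumed)."""
    return struct.pack("<Q", epoch) + monmap_blob


def unwrap_version(data: bytes):
    """Split the assumed layout back into (epoch, monmap_blob)."""
    (epoch,) = struct.unpack_from("<Q", data, 0)
    return epoch, data[8:]


if __name__ == "__main__":
    wrapped = wrap_with_version(3, b"raw-monmap-bytes")
    print(unwrap_version(wrapped))
```

A file holding only the raw monmap would have its first eight bytes misread as the version, which fits the failure being discussed.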
[20:36] <joao> jefferai, don't do that
[20:36] <jefferai> joao: don't do what?
[20:36] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[20:36] <joao> you may end up with a corrupted store
[20:36] <jefferai> well
[20:36] <jefferai> I followed this exactly: http://ceph.com/docs/master/cluster-ops/add-or-rm-mons/
[20:36] <joao> drop the monmap on monmap/latest
[20:36] * sagelap (~sage@bzq-218-183-205.red.bezeqint.net) Quit (Read error: No route to host)
[20:36] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit ()
[20:37] <joao> jefferai, you should drop the result of 'ceph mon getmap' somewhere else besides <mon-data>/monmap or any subdirectories of <mon-data>
[20:38] <jefferai> hm, okay
[20:38] * sagelap (~sage@bzq-218-183-205.red.bezeqint.net) has joined #ceph
[20:38] <jefferai> that is *not* clear
[20:38] <jefferai> it seems like you'd *want* to put it in the mon data dir
[20:38] <joao> jefferai, I know; I've updated 3438 with that note and will pass it along to john
[20:38] <jefferai> okay
[20:38] <jefferai> I'll try again
[20:38] <joao> jefferai, it's okay to put it in the mon data dir
[20:38] <joao> just with another name besides 'monmap' :)
[20:39] <joao> and *never* on a subdir of the mon data dir
[20:39] <joao> <mon-data>/mon.map would be fine though
[20:39] <joao> or /tmp/`uuidgen`-monmap for instance ;)
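joao's rule of thumb for where to write the output of `ceph mon getmap` can be written down as a small path check. This is a sketch of the advice as given in the chat, not something Ceph itself enforces; the function name and the example paths are hypothetical, and paths are compared as given (no symlink or `..` resolution).

```python
from pathlib import PurePosixPath


def is_safe_monmap_target(mon_data: str, target: str) -> bool:
    """Return True if 'target' follows the advice from the chat.

    Safe: anywhere outside the mon data dir, or a file directly inside
    it that is not named 'monmap'. Unsafe: the 'monmap' name itself and
    anything in a subdirectory of the mon data dir.
    """
    data = PurePosixPath(mon_data)
    path = PurePosixPath(target)
    try:
        rel = path.relative_to(data)
    except ValueError:
        return True  # outside the mon data dir entirely
    parts = rel.parts
    # Exactly one component means "directly inside the mon data dir".
    return len(parts) == 1 and parts[0] != "monmap"


if __name__ == "__main__":
    mon_data = "/var/lib/ceph/mon/a"
    for candidate in ("/tmp/1234-monmap",
                      mon_data + "/mon.map",
                      mon_data + "/monmap/latest"):
        print(candidate, is_safe_monmap_target(mon_data, candidate))
```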
[20:40] <dmick> joao: if you have a concise recommendation, feel free to just add it to the docs and push, or add it and send to John for final review
[20:40] <dmick> he's buried ATM
[20:41] <sagelap> slang: merged wip-mds
[20:41] <joao> dmick, kay
[20:41] <sagelap> joao, jefferai: yeah best to avoid fiddling anything on the mon dir directly (aside from perhaps the keyring)
[20:41] <slang> sagelap: cool. thanks!
[20:41] <sagelap> slang: is that symlink stuff sorted out?
[20:42] <slang> I believe so
[20:42] <sagelap> wip-test-multiclient is the same patch as the one in wip-mds?
[20:42] <joao> sagelap, even the keyring might be better to keep out of the mon data dir, given that it doesn't really matter where it is when you pass it to --mkfs
[20:42] <slang> sagelap: the tests are passing, if the branch looks good I think its ready for merge
[20:42] <joao> better to just leave the mkfs deal with it as it sees fit
[20:42] <sagelap> the one in the mon dir is what it uses on startup to authenticate with its peers
[20:42] <slang> sagelap: yes
[20:42] <sagelap> so it's needed (and occasionally useful to be able to change manually)
[20:43] <joao> sagelap, yeah, but at time of --mkfs, ceph-mon will take care of it; it's okay to leave it in the mon data dir, but to avoid issues with people doing the wrong thing, I just think it would be best no to
[20:43] <joao> *not
[20:44] <jefferai> sagelap: to be honest, the docs were unclear and I thought that the point of the "manual" method was that you *are* fiddling the keyring and the monmap manually
[20:44] <jefferai> and using that to build the rest of the store
[20:44] * adjohn (~adjohn@ has joined #ceph
[20:44] <jefferai> because step 1 is create the mon data path
[20:44] <jefferai> and step 2
[20:44] <jefferai> er
[20:44] <jefferai> and step 2+ just say "path"
[20:44] <jefferai> that said, I now have 5 mons and quorum, so thanks :-)
[20:45] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[20:45] <joao> jefferai, glad it worked for you, despite the docs ambiguity
[20:45] * Tamil (~Adium@ has joined #ceph
[20:45] <sagelap> np. sorry for the confusion jefferai!
[20:46] <jefferai> it's okay
[20:46] <jefferai> hopefully someone with privs can just put a bolded note in there saying "Do not perform step 2+ in the mon dir"
[20:46] <jefferai> or some such thing
[20:46] <rturk> has someone captured a doc bug in redmine?
[20:46] <joshd> rturk: http://tracker.newdream.net/issues/3438#note-2
[20:46] <sagelap> joao: can you fix up the docs appropriately?
[20:46] <dmick> rturk: I was thinking "just go fix it now", but, yeah
[20:47] <joao> sagelap, I can, as soon as I walk the dog
[20:47] <sagelap> no rush, thanks joao!
[20:50] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Remote host closed the connection)
[20:50] <jefferai> joao: sagelap: thanks much :-)
[20:51] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:53] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[20:57] * yehudasa_ (~yehudasa@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[21:06] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[21:07] <yehudasa_> tnt: can I add signed-off-by to your patch for 3452?
[21:08] * jmlowe (~Adium@2001:18e8:2:28a2:8d57:7a91:8ad9:382c) Quit (Quit: Leaving.)
[21:12] <wer> Does anyone know how I can configure radosgw to allow anonymous puts/
[21:12] <wer> ?
[21:13] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[21:14] * didders_ (~btaylor@rrcs-71-43-128-65.se.biz.rr.com) has joined #ceph
[21:14] <rweeks> s3 or swift?
[21:14] <wer> lol
[21:15] <rweeks> well, it matters, because the APIs are different.
[21:15] <wer> rweeks: I just don't really know :) I am currently s3'ing. And all my s3 libs and keys and things work.
[21:16] <rweeks> see, every S3 user needs an access key
[21:16] <rweeks> that's part of the API
[21:16] * adjohn (~adjohn@ Quit (Quit: adjohn)
[21:16] <wer> yeah. I may be asking the question all wrong.
[21:17] <rweeks> so if you create a single s3 user
[21:17] <rweeks> and a single access key
[21:17] <rweeks> and give that to any client that needs to put, then that would likely work
[21:17] <wer> yes. And I believe I have that functioning. And I may have asked the original question incorrectly......
[21:18] <yehudasa_> wer: anonymous PUT currently requires that you set up the bucket to be publicly writeable
[21:18] <tziOm> How does crush work in relation to osds filling up?
[21:18] <sagelap> sjust: there?
[21:18] <joao> sagelap, do you think we can assume {path} as being a temporary path for both the keyring and the monmap, granted that I just checked that Monitor::mkfs() does indeed stash a copy of the keyring on mon-data/keyring ?
[21:18] <yehudasa_> wer: however, the correct way to do that, coming up in the next stable version, would be to use POST
[21:19] <sagelap> sjust: let's close #3213 and #3208?
[21:19] <sagelap> joao: which document?
[21:19] <joao> http://ceph.com/docs/master/cluster-ops/add-or-rm-mons/
[21:20] <wer> yehudasa: This is the example I was told to make work :) Perhaps I am asking the question all wrong as this is my first time messing with object stores. curl -XPUT http://<hostname>/<uuid>.gz --data-binary @<uuid>.gz
[21:21] <yehudasa_> wer: well, you'll need to set up the bucket to be publicly writeable
[21:21] <wer> so which bucket? Is there a bucket in that curl command :)
[21:21] <wer> sorry I sooooooo ignorant :)
[21:22] <yehudasa_> it should really be http://<hostname>/<bucket>/<uuid>.gz
[21:22] <wer> ahh! ok
[21:22] <wer> And make that bucket public?
[21:22] <rweeks> http://ceph.com/docs/master/radosgw/s3/bucketops/ might also help
[21:22] <rweeks> as well as reading all of http://ceph.com/docs/master/radosgw/s3
[21:22] <rweeks> yes, make that bucket public
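The corrected URL shape (`http://<hostname>/<bucket>/<uuid>.gz`) can be sketched with the Python standard library. The hostname, bucket and object name below are placeholders, and the request is only built, not sent; an unsigned PUT like this would only be accepted if the bucket has already been made publicly writeable with an S3 client.

```python
import urllib.parse
import urllib.request


def build_anonymous_put(hostname: str, bucket: str, key: str,
                        payload: bytes) -> urllib.request.Request:
    """Build (but do not send) an unsigned PUT for bucket/key."""
    # Percent-encode the object name so unusual characters stay valid.
    url = f"http://{hostname}/{bucket}/{urllib.parse.quote(key)}"
    return urllib.request.Request(url, data=payload, method="PUT")


if __name__ == "__main__":
    req = build_anonymous_put("gw.example.com", "uploads", "1234.gz", b"data")
    print(req.get_method(), req.full_url)
    # To actually upload: urllib.request.urlopen(req)
```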
[21:23] <sagelap> joao: i'd make it {temporaryfilename} or something, and make a note that it should be in a separate location that is *not* the mon data dir
[21:23] <joshd> tziOm: you can set the weights of osds based on whatever you want. usually you'd do this based on capacity
[21:23] <sagelap> could even put that step before creating the mon data dir so that it is harder to be misled.
[21:23] <joao> sagelap, okay
[21:24] <wer> yehudasa: ok. I think that makes sense. thank you so much.
[21:24] <wer> rweeks: I will go re-read that stuff :) Thanks!
[21:24] <rweeks> certainly
[21:24] <tziOm> joshd, I know, but say a osd is full, how is it not given data?
[21:24] * sjustlaptop (~sam@m810436d0.tmodns.net) has joined #ceph
[21:24] <wer> This stuff is really cool btw :)
[21:25] <joshd> tziOm: currently that's not handled automatically
[21:25] <rweeks> thanks, wer. we like it.
[21:26] <tziOm> joshd, so full means undefined behaviour?
[21:26] <joshd> tziOm: once an osd passes the full_ratio specified in the osdmap, the osdmap is marked 'full', and writes block until there's more space or the full ratio is increased
[21:27] <joshd> tziOm: there are warnings when an osd passes the nearfull_ratio
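The two thresholds joshd mentions can be pictured with a small classifier. This is a simplified sketch, not Ceph's actual logic: the 0.85 and 0.95 defaults are assumptions (check the values configured in your own cluster), and treating "passes" as "greater than or equal to" is also an assumption.

```python
def osd_fullness_state(used_bytes: int, total_bytes: int,
                       nearfull_ratio: float = 0.85,
                       full_ratio: float = 0.95) -> str:
    """Classify one OSD as 'ok', 'nearfull' or 'full' by utilization."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    utilization = used_bytes / total_bytes
    if utilization >= full_ratio:
        return "full"      # the cluster would block writes at this point
    if utilization >= nearfull_ratio:
        return "nearfull"  # warning only
    return "ok"


if __name__ == "__main__":
    for used in (50, 90, 96):
        print(used, osd_fullness_state(used, 100))
```

Since nothing rebalances automatically, a check like this is only useful as an early warning to add space or adjust weights by hand.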
[21:27] <tziOm> joshd, so I would have to update the crushmap/weight manually?
[21:27] <joshd> tziOm: basically right now you should plan to always have extra space
[21:28] * jtang1 (~jtang@ has joined #ceph
[21:28] <joshd> tziOm: right, with ceph osd crush set ...
[21:28] <tziOm> joshd, how is it normal to use the weight parameter? set to OSD_IN_MB ?
[21:29] <joshd> usually more like TB, but yeah
[21:29] <tziOm> what is the size of weight?
[21:30] <tziOm> 64bit?
[21:30] <joshd> it's a float
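The "weight roughly equals capacity in TB" convention can be sketched as a one-line conversion. Whether a terabyte is counted as 10^12 or 2^40 bytes is a local choice; the decimal unit below is an assumption, and the helper name is made up.

```python
def crush_weight_from_bytes(capacity_bytes: int,
                            bytes_per_unit: int = 10**12) -> float:
    """Return a float weight equal to the capacity in TB (decimal)."""
    if capacity_bytes < 0:
        raise ValueError("capacity_bytes must be non-negative")
    return capacity_bytes / bytes_per_unit


if __name__ == "__main__":
    # A 2 TB and a 500 GB disk: the larger disk gets 4x the weight.
    print(crush_weight_from_bytes(2 * 10**12))
    print(crush_weight_from_bytes(500 * 10**9))
```

The resulting number is what would be passed by hand to `ceph osd crush set ...`, as mentioned in the chat; only the relative proportions between OSDs matter for data placement.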
[21:30] <wer> rweeks: should I be using the s3 libs to change acl's on a bucket or radosgw-admin?
[21:30] <yehudasa_> wer: you need to use an s3 client
[21:30] <wer> ok great. thanks.
[21:33] <tziOm> joshd, are there plans to handle osd full in a automated way?
[21:34] * absynth (~Adium@ip-178-201-144-23.unitymediagroup.de) has joined #ceph
[21:34] * absynth (~Adium@ip-178-201-144-23.unitymediagroup.de) has left #ceph
[21:35] * didders_ (~btaylor@rrcs-71-43-128-65.se.biz.rr.com) Quit (Quit: didders_)
[21:40] <joshd> tziOm: nothing concrete that I know of, but I certainly think it would be great. it's a difficult thing to get correct without accidentally making things worse
[21:42] <joshd> tziOm: there is an 'auto-weight-by-utilization' command, in case your distribution of storage becomes unbalanced for some reason. this just adjusts the weights in the crushmap for you, but only if you run it
[21:42] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[21:43] <joshd> tziOm: that's 'ceph osd reweight-by-utilization'
[21:44] <tziOm> ic
[21:44] <tziOm> so a ceph osd reweight-by-usage
[21:48] * ScottIam (~ScottIam@ Quit (Quit: Leaving)
[21:49] * Tamil (~Adium@ Quit (Quit: Leaving.)
[21:54] * adjohn (~adjohn@ has joined #ceph
[21:54] <tnt> yehudasa_: sure
[21:56] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[21:56] * sjustlaptop (~sam@m810436d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[21:57] * Tamil (~Adium@ has joined #ceph
[22:03] * Tamil (~Adium@ has left #ceph
[22:08] * BManojlovic (~steki@ has joined #ceph
[22:13] <yehudasa_> sagelap: can you take a look at wip-3452, wip-3453? need to push both to next, stable
[22:16] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Read error: Connection reset by peer)
[22:16] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[22:22] * s_parlane (~scott@121-74-248-190.telstraclear.net) Quit (Ping timeout: 480 seconds)
[22:35] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:38] * s_parlane (~scott@ has joined #ceph
[22:49] * sjustlaptop1 (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[22:53] * tryggvil (~tryggvil@16-80-126-149.ftth.simafelagid.is) has joined #ceph
[22:54] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[23:02] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[23:10] * yehudasa_ (~yehudasa@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[23:21] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[23:42] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[23:46] * mdrnstm (~mdrnstm@206-169-78-213.static.twtelecom.net) has joined #ceph
[23:54] * Qten (Q@qten.qnet.net.au) has joined #ceph
[23:56] * gucki (~smuxi@84-72-8-40.dclient.hispeed.ch) Quit (Remote host closed the connection)
[23:56] <Qten> lo, seems the copy-on-write cloning isn't quite working for me, I'm using the latest version of ceph 0.53 w/ folsom & cinder and a RAW volume, however when i provision an image it "downloads" the full image to the new volume. I also have the option show_image_direct_url=True in my /etc/glance/glance-api.conf and glance is configured to use rbd for storage etc. any ideas?
[23:58] <joshd> Qten: using client.volumes (or whichever rados client cinder is using), does 'rbd info glancepool/glanceimage' work?
[23:58] <joshd> Qten: my suspicion is it's related to the osd caps for the rados client cinder is using ('ceph auth list' will show them)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.