#ceph IRC Log


IRC Log for 2011-02-17

Timestamps are in GMT/BST.

[0:00] <cmccabe> psijack: so ganglia is basically read-only, but with ganeti you can change stuff
[0:01] <cmccabe> psijack: oh, ganeti is from google? I thought their python cluster mgmt stuff was closed-source.
[0:02] <Tv> cmccabe: ganeti isn't the big cluster manager
[0:03] <Tv> "Recommended cluster size 1-40 physical nodes"
[0:03] <prometheanfire> I think it's limited by the storage though
[0:03] <cmccabe> tv: weird. If it's really "from Google" I wonder what they use it for
[0:03] <cmccabe> tv: or is it like an employee project
[0:03] <Tv> cmccabe: it's a big company
[0:03] <Tv> they have research clusters, old-school servers, etc
[0:04] <cmccabe> yeah no doubt
[0:04] <Tv> the thing most people think of as "the google", as far as i've heard, uses cgroups not virtualization
[0:05] <Tv> because virtualization would be overhead, and they care about the last 2%
[0:07] <cmccabe> cgroups sound like the right tool for running multiple daemons on the same machine
[0:24] <Tv> bah cauthtool exits without error message in this test :(
[0:25] <Tv> nm, misread the log, it's cconf complaining about usage
[0:33] * greglap (~Adium@ has joined #ceph
[0:36] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[0:40] <greglap> a ways back, but google definitely uses cgroups in their big stuff — they were talking about scheduling policies with cgroups on fsdevel recently
[0:55] * greglap1 (~Adium@ has joined #ceph
[1:02] * greglap (~Adium@ Quit (Ping timeout: 480 seconds)
[1:10] * greglap1 (~Adium@ Quit (Quit: Leaving.)
[1:41] <Tv> note to self: do not select both endpoints of a client-server connection as clients
[1:44] * sagelap (~sage@ has joined #ceph
[1:47] <Tv> note to self: make changes before retrying
[1:47] <Tv> (it's a good thing it's getting 5pm already)
[1:48] <bchrisman> hmm.. perhaps cgroups' resource limiting might make it viable to run the ceph kernel client on the same nodes as the mds/osd servers?
[1:49] <bchrisman> ie w/o resorting to a kvm image or what not.
[1:50] <cmccabe> bchrisman: the NFS-deadlock problem relates to needing more memory in order to flush the page cache
[1:50] <cmccabe> bchrisman: I don't think the page-cache is controlled by cgroups, although I could be wrong
[1:51] <cmccabe> bchrisman: the page cache isn't a per-process thing; it's shared amongst all processes.
[1:51] * sagelap (~sage@ Quit (Read error: Connection reset by peer)
[1:51] * sagelap (~sage@ has joined #ceph
[1:51] <Tv> once we have the test setup decent, i want to build that setup and push it, see if it really is that flaky or not
[1:52] <cmccabe> tv: reminds me of the old calvin & hobbes joke about building bridges
[1:52] <bchrisman> cmccabe: wikipedia page claims it includes page cache… was going to look more into it for an actual reliable source.
[1:53] <Tv> bchrisman: you'd need to cgroup-limit everything using the mountpoint, to prevent them from taking too many pages away from the ceph daemons.. sounds fairly invasive
[1:53] <cmccabe> tv: Calvin asks his dad how weight limits for bridges are measured. He answers, "They drive bigger and bigger trucks over the bridge until it breaks. Then they weigh the last truck and rebuild the bridge."
[1:53] <Tv> bchrisman: that becomes essentially the same as splitting it into two machines
[1:54] <bchrisman> Tv: yeah.. it's mainly a question of which is worse… that or going to a full kvm image...
[1:54] <bchrisman> Tv: neither are pretty
[1:54] <bchrisman> Tv: but full virtualization would be a known-to-work solution.
[1:55] <cmccabe> bchrisman: well the classic reasons to use LXC rather than QEMU are to allow sharing of common memory, easier administration, no virt overhead
[1:56] <bchrisman> cmccabe: ah yeah… nice
[1:57] <cmccabe> tv: I think he says this while they're driving over a bridge too
[1:57] * ooolinux (~bless@ has joined #ceph
[1:58] * ooolinux (~bless@ Quit ()
[1:59] * ooolinux (~bless@ has joined #ceph
[1:59] * ooolinux (~bless@ Quit (Read error: Connection reset by peer)
[1:59] * ooolinux (~bless@ has joined #ceph
[2:02] <cmccabe> http://picayune.uclick.com/comics/ch/1986/ch861126.gif
[2:05] <prometheanfire> common memory can be done with ksm
[2:07] <cmccabe> prometheanfire: yeah, ksm is cool. It's not really going to be anywhere near as efficient as just running LXC though
[2:07] <cmccabe> prometheanfire: for a start, you have to periodically checksum every page if I remember correctly
[2:09] <prometheanfire> true
[2:11] <DeHackEd> ksm doesn't use checksumming
[2:11] <cmccabe> hashing, whatever
[2:11] <DeHackEd> doesn't do that either
[2:12] <cmccabe> DeHackEd: from http://lwn.net/Articles/306704/
[2:12] <cmccabe> The algorithm is relatively simple. The KSM driver, inside a kernel thread, picks one of the memory regions registered with it and start scanning over it. For each page which is resident in memory, KSM will generate an SHA1 hash of the page's contents. That hash will then be used to look up other pages with the same hash value. If a subsequent memcmp() call shows that the contents of the pages are truly identical, all processes with a refe
[2:13] <DeHackEd> http://lwn.net/Articles/309155/
[2:13] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:14] <prometheanfire> :D
[2:15] <DeHackEd> they now build a binary search tree with the pages themselves as the keys. for pages that are sufficiently random in content, a memcmp() along the tree does not scan all 4096 bytes
[2:16] <DeHackEd> depending on the memory contents it's more cache-friendly as well
[2:16] <prometheanfire> right now I'm saving 1664 out of 8304
[2:16] <cmccabe> DeHackEd: interesting, they must have added that later
[2:17] <cmccabe> DeHackEd: it's a good idea to try to minimize the amount of memory you have to read
[2:17] <prometheanfire> the overhead used to be horrid
[2:17] <cmccabe> DeHackEd: I guess since they've only got a single thread doing the scanning, using a non-lockless data structure is fine
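As an aside for the archive: the original scan described in the LWN quote above can be sketched in a few lines of Python (a toy model only; real KSM is kernel C working on 4 KiB pages, and, per DeHackEd, the hash table was later replaced with a comparison tree keyed on page contents):

```python
import hashlib

def dedup_pages(pages):
    """Toy KSM-style scan: hash each 'page' (a bytes object), then confirm
    hash collisions with a full comparison (the memcmp step) before sharing.
    Returns a mapping from page index to the index it was merged into."""
    by_hash = {}   # hash -> index of the first page seen with that hash
    shared = {}    # index -> canonical index it now shares with
    for i, page in enumerate(pages):
        h = hashlib.sha1(page).hexdigest()
        j = by_hash.get(h)
        # the hash match is only a hint; verify the full contents
        if j is not None and pages[j] == page:
            shared[i] = j
        else:
            by_hash[h] = i
    return shared
```

The full comparison after the hash lookup is what makes hash collisions harmless: two pages are only merged when their bytes are actually identical.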
[2:21] <DJLee> guys, when a node is running multiple cosds and runs out of memory, e.g., 4GB, it starts swapping hard and then goes idle,
[2:21] <DJLee> the cosds seem to still hold all that memory,
[2:22] <DJLee> unless i stop them off, and restart just the cosds..
[2:34] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[2:45] * WesleyS (~WesleyS@ Quit (Quit: WesleyS)
[2:58] * ooolinux (~bless@ Quit (Quit: Leaving)
[2:58] * ooolinux (~bless@ has joined #ceph
[3:11] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:12] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[4:23] <ooolinux> hi
[4:31] <ooolinux> any one know ceph's messenger dispatch?
[4:32] <ooolinux> right now every child dispatcher gets offered every message
[4:37] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[4:41] <greglap> ooolinux: I know the messenger dispatch pretty well
[4:41] <greglap> and yes, it's supposed to call all child dispatchers until one of the dispatchers says that they've processed it
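That dispatch loop can be sketched like so (a minimal Python model; the class names are hypothetical, loosely borrowed from Ceph's `ms_dispatch` convention, and the real messenger is C++ with threading and message types):

```python
class Dispatcher:
    def ms_dispatch(self, message):
        """Return True if this dispatcher handled the message."""
        raise NotImplementedError

class Messenger:
    def __init__(self):
        self.dispatchers = []

    def add_dispatcher(self, d):
        self.dispatchers.append(d)

    def deliver(self, message):
        # offer the message to each child dispatcher in turn,
        # stopping at the first one that claims it
        for d in self.dispatchers:
            if d.ms_dispatch(message):
                return True
        return False  # no dispatcher wanted this message
```

So every registered dispatcher may be asked about a message, but only until one says it processed it.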
[4:44] <ooolinux> yes, no error
[4:44] <greglap> DJLee: memory allocators don't seem to deal too well with all the message-passing and stuff — if you're using tcmalloc it should help, but I don't know that we can do too much about it :/
[4:45] <ooolinux> would you ask some question about ceph? if i can answer?
[4:45] <ooolinux> thanks
[4:45] <greglap> you want me to ask you questions?
[4:50] <ooolinux> yes
[4:50] <ooolinux> i am reading monitor code
[4:51] <ooolinux> would you tell me the difference between leader and peon?
[4:53] <ooolinux> lunch time
[4:59] <greglap> sorry, got distracted with something else :)
[4:59] <greglap> the monitor code is based upon guaranteed lossless safe storage, which is provided through a Paxos implementation
[5:01] <greglap> in non-trivial Paxos implementations, it's desirable to have one node that is responsible for handling updates and gathering votes from the other nodes
[5:01] <greglap> that node is the leader, and the other nodes are followers (or "peons" in our nomenclature)
[5:03] <DJLee> greglap: thanks, i think the best way (as far as generating the fair benchmark) is to shut down the cosds, for each run..
[5:03] <greglap> so in Ceph, the leader monitor is the one that creates new maps (which are then ratified by a quorum of all monitors before becoming canon) and handles all incoming requests for state changes
[5:03] <greglap> DJLee: yeah
[5:03] <DJLee> It seems that running multiple tests over the same-running cosds will kinda accumulate some memory, hehe
[5:03] <greglap> we use memory pools in a few parts of Ceph but I'm not sure if messages do — if not that's one thing that might help as well
[5:04] <greglap> in a beefy machine with a bit of RAM and paging enabled then it's not usually a problem (I don't think we've had issues in our long-lived tests) because the OSD doesn't actually have that much live memory
[5:04] <DJLee> i think 0.24.3 is pretty stable, really liking it so far
[5:04] <greglap> so the inactive stuff just gets paged out
[5:05] <DJLee> right, we are doing crappy machine vs beefy machine comparisons :)
[5:05] <greglap> we made a lot of effort in the .24 tree to nail down issues without adding new features (which always disrupt the code and cause problems)
[5:06] <DJLee> right, i find the pg sizes in .24.3 are
[5:07] <DJLee> 264 per osd
[5:07] <DJLee> much less than the unstable (master), which peers so much faster
[5:08] <greglap> pg sizes? you mean in-memory?
[5:08] <DJLee> placementgroup, sorry
[5:08] <greglap> I got that
[5:08] <greglap> I'm just not sure what you mean by their size
[5:08] <DJLee> 2011-02-18 17:08:56.311632 pg v1375: 1056 pgs: 1056 active+clean; 8197 MB data, 25214 MB used, 6937 GB / 7335 GB avail
[5:08] <DJLee> this is running 4 osds.. right
[5:09] <greglap> oh, number of PGs
[5:09] <greglap> got it
[5:09] <DJLee> yeah sorry number and size is clearly different, heh
[5:10] <DJLee> in the next few days, i'll try to bombard the mds node, so i'm thinking of using the 'mdtest' tool, or similar dir creation tests.
[5:10] <greglap> so .24.3 peers faster than master branch?
[5:10] <DJLee> yeah
[5:10] <DJLee> I think mainly due to less peers
[5:10] <greglap> hmmm, they shouldn't have different default PG numbers I don't think
[5:11] <greglap> I wonder what we did
[5:11] <DJLee> i mean less num of pgs (.24.3 has 264 per osd, and master had i think something in the 2300s)
[5:11] <greglap> soon we will have QA suites to help catch this kind of thing
[5:11] <greglap> oh, I wonder if we upped one of them in the config by mistake
[5:11] <DJLee> i see, are you guys located in the same office?
[5:12] <greglap> most of us
[5:13] <DJLee> osd_pg_bits = 7 osd_pgp_bits = 5
[5:13] <DJLee> thats in my ceph.conf, but if i remember correctly that wasnt being read by the master 10 days ago,
[5:13] <greglap> hmm
[5:14] <greglap> I wonder if maybe mkcephfs wasn't looking at the config file
[5:14] <DJLee> still dont get how i end up with the 264 pg number (so i dont think the pg config is being read..?)
[5:14] <DJLee> mkcephfs did read the number of osds and locations though..
[5:14] <greglap> oh, yeah
[5:14] <greglap> humm
[5:15] <greglap> I remember you talking about this some a while ago but I don't remember what the resolution was
[5:15] <DJLee> yeah, the resolution was to set them as above, 7 and 5 (previously they defaulted to 9 and 6 i think)
[5:16] <DJLee> but the pg num didnt change.
[5:16] <greglap> even after mkcephfs?
[5:16] <greglap> it should have :/
[5:17] <greglap> could you create a bug in the tracker for it? :)
[5:17] <DJLee> sure
[5:17] <DJLee> will do after the test finish, (it's still running)
[5:17] <greglap> that's the answer to most things if it doesn't get resolved in chat — make a bug so it doesn't get lost in our heads!
[5:18] <DJLee> yes~
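For the record, the pg_bits settings being discussed relate to the initial PG count roughly like this (a hypothetical sketch; the shift relation is an assumption from the option names, not confirmed anywhere in this chat):

```python
def default_pg_count(num_osds, pg_bits):
    # assumed relation: the initial number of placement groups in a pool
    # scales as num_osds << osd_pg_bits
    return num_osds << pg_bits

# with the ceph.conf above (osd_pg_bits = 7) and 4 OSDs this would predict
# 512 PGs per pool; the observed 264 per OSD not matching any such
# prediction is what suggested the setting was never read by mkcephfs
```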
[5:19] <DJLee> btw, is there easier way to dump the ceph -w to a file, while im away
[5:20] <DJLee> at least some way to stop and start ceph -w for each test, currently ive set to debug none, and most are logged to logger dir = /cephlog directory,
[5:20] <DJLee> but that doesnt have the general ceph -w status message.
[5:23] <greglap> we don't have any better monitoring, but ceph -w just dumps to stdout so you should be able to run it in the background and pipe it into a file
[5:24] <DJLee> yeah,
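One way to capture that output in the background, sketched in Python (the helper name is made up; in a shell, something like `ceph -w >> /cephlog/ceph-w.log 2>&1 &` does the same thing):

```python
import subprocess

def log_to_file(cmd, path):
    # run a command in the background, appending its output to a log file;
    # roughly what `cmd >> path 2>&1 &` would do in a shell
    with open(path, "ab") as f:
        return subprocess.Popen(cmd, stdout=f, stderr=subprocess.STDOUT)

# e.g. proc = log_to_file(["ceph", "-w"], "/cephlog/ceph-w.log")
# ... later: proc.terminate()
```

The child process keeps its own copy of the file descriptor, so the parent closing the file right after `Popen` returns is fine.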
[5:29] <DJLee> for random read/write, wouldn't it be almost impossible to test this fairly? mainly because the file must be created first, and how big?
[5:29] <DJLee> if the file created happened to be residing on the edge of the disk, (and small) then random effect will almost be sequential
[5:30] <DJLee> to truly test random, a single file must be as large as the entire osd
[5:30] <greglap> I really haven't thought about that too much
[5:31] <DJLee> e.g., 2tb x 6 osd = 12tb, so a file needs to be 12TB, in order to make the actuator move all the ways,
[5:31] <greglap> but i think for small random reads/writes the system's ability to mask latency is a much bigger factor than the variations in disk speed
[5:31] <greglap> for larger writes that can be a bigger factor but I'm not sure how much of one
[5:31] <greglap> I think Ceph's 4MB chunks are actually not much over the line between bandwidth and latency breaking even in terms of importance (though I could be wrong about that)
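Rough arithmetic behind that break-even claim, with assumed illustrative numbers (about 100 MB/s streaming bandwidth and 10 ms per seek; neither figure is from the chat):

```python
def transfer_ms(chunk_bytes, mb_per_s):
    # time to stream chunk_bytes at the given sequential bandwidth
    return chunk_bytes / (mb_per_s * 1e6) * 1000.0

SEEK_MS = 10.0                                # assumed avg seek + rotation
chunk_ms = transfer_ms(4 * 1024 * 1024, 100)  # ~42 ms for a 4 MB chunk

# at these numbers a 4 MB object read spends roughly 4x as long
# transferring as seeking, so bandwidth still dominates latency,
# but not by a huge margin
```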
[5:32] <DJLee> right, the latency effect is i guess easily seen (arm movement is significant)
[5:33] <DJLee> because im getting random read pretty high here.. (which is good) but maybe a bit too high?
[5:34] <greglap> you probably are storing most of your data in cache, then?
[5:35] <greglap> it's up to you to decide whether you want to write enough to evict most of it out of cache, or if your usage patterns mean that your data will usually be hot :)
[5:35] <DJLee> i did empty the cache (echo 3 > /proc/sys/vm/drop_caches)
[5:36] <DJLee> disk's journal size shouldnt have any effect on the 'reading'
[5:36] <greglap> nope, journal wouldn't
[5:36] <greglap> I didn't know about that drop_cache option
[5:37] <DJLee> that drop_cache made some biggg difference,
[5:37] <greglap> well what kinds of reads did you get?
[5:37] <greglap> and does your storage have a lot of caching?
[5:37] <DJLee> e.g., dd write to disk, and read it back, (and if you read it back again, you get the 'cached' performance), and so you gotta drop_cache to make sure it's reading from disk again, etc
[5:38] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[5:38] <greglap> well hopefully prefetching would work with a dd read
[5:38] <greglap> and so it's a nice streaming read all around
[5:39] <DJLee> for sequential reading (using iozone, 8 threads 8gig) 6 osds, i get 80-90mb/s, good (almost max nic), but for random read, argh, 50-60mb/s
[5:40] <DJLee> tried on 256k, 1024k and 4096k block, similar results
[5:40] <greglap> those sound like reasonable numbers to me, but we haven't done any profiling yet to improve those
[5:41] <greglap> it's a better ratio than most disks will give you :p
[5:41] <DJLee> oh, what do you mean 'better ratio than' ?
[5:42] <greglap> I think most hard drives drop their bandwidth more than 30% from streaming to random reads :)
[5:43] <DJLee> right (actually worse, drops of 90%+, depending on the position/location of the 'file' and the block size..)
[5:44] <DJLee> e.g., seq read from edge to inner all the way is like 120mb/s down to 70mb/s (typical log curve)
[5:44] <DJLee> but try doing random read for that entire space, goes right now to 10mb/s or less (i think it goes to 200kb/s if its a 4k block)
[5:45] <DJLee> so it also depends on how 'long' the actual benchmark runs for... hehe..
[5:47] <DJLee> btw, in iozone, there's no time limit factor, unlike iometer or fio, if you set 8gb file size, then it will go through all that allocated blocks (8gb) whether it's seq/random
[5:52] <DJLee> but you know.. not many people want to see low performance from configuring things into the worst case, like i just mentioned, putting files in the worst position, heh;
[5:53] <greglap> most systems do attempt to prevent those cases for better overall performance, yes
[5:56] <ooolinux> why sometimes osd listen more than 1 port?
[5:57] <greglap> well it has a number of connections it needs to maintain
[5:57] <greglap> and it has a couple of different threads that send messages
[5:57] <greglap> one for the other OSDs
[5:57] <greglap> one for clients
[5:57] <greglap> and one for sending and receiving heartbeats from other OSDs
[5:57] <ooolinux> are all the connections working all the time?
[5:57] <greglap> yep, it maintains those three at all times
[5:58] <ooolinux> o see
[5:58] <ooolinux> i see
[6:00] <ooolinux> do you have call flow chart of ceph?
[6:00] <greglap> no :(
[6:01] <greglap> there's just not very much documentation, although we're slowly improving it
[6:02] <ooolinux> ok, i dont know what paxos and paxosserver do?
[6:02] <greglap> they're an implementation of Paxos
[6:02] <greglap> it's a pretty well-known algorithm; you can look up a number of descriptions of it online
[6:02] <greglap> or check out Leslie Lamport's original paper :)
[6:06] <ooolinux> i know paxos->read(paxosv, monmap_bl), but i can't find when and where paxosv is written to disk?
[6:08] <greglap> check out the stuff surrounding Paxos::commit()
[6:17] * stingray (~stingray@stingr.net) Quit (Ping timeout: 480 seconds)
[6:17] * stingray (~stingray@stingr.net) has joined #ceph
[6:17] <ooolinux> handle_commit handles the paxos message, just writes latest,last_committed. when and where is the digital file written?
[6:18] <greglap> digital file?
[6:18] <ooolinux> take mdsmap as example
[6:19] <ooolinux> mdsmap dir, digital file and latest, last_committed
[6:19] <greglap> I don't remember the exact code paths here
[6:19] <ooolinux> i know latest and last_committed
[6:19] <greglap> I think it starts in "propose_new_value"
[6:19] <greglap> the basic algorithm is:
[6:19] <greglap> 1) leader propagates proposed new key-value to all nodes
[6:20] <greglap> 2) nodes write key-value to durable storage, along with a "proposed" flag
[6:20] <greglap> 3) nodes send "confirm this key" messages back to leader
[6:21] <greglap> 4) when leader gets a majority of the total nodes confirming, it broadcasts a "commit" message
[6:21] <greglap> 5) all the nodes move on to the newest value
[6:21] <greglap> (and they write to durable storage that they've done so)
[6:21] <greglap> commit() is steps 4-5
[6:22] <greglap> I think the earlier steps start with propose_new_value() :)
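greglap's five steps, as a toy single-proposer sketch (Python; real Paxos also handles competing proposers, version numbers, and recovery, and the method names here are only loosely borrowed from the Ceph code):

```python
class Node:
    def __init__(self):
        self.store = {}  # simulated durable key/value storage

    def handle_propose(self, key, value):
        # step 2: write the proposed value durably, flagged uncommitted
        self.store[key] = (value, "proposed")
        return True      # step 3: ack ("confirm this key") back to leader

    def handle_commit(self, key):
        # step 5: mark the value committed and start using it
        value, _ = self.store[key]
        self.store[key] = (value, "committed")

class Leader(Node):
    def propose_new_value(self, peers, key, value):
        # step 1: propagate the proposed key/value to all nodes
        nodes = [self] + peers
        acks = sum(1 for n in nodes if n.handle_propose(key, value))
        # step 4: with a majority of acks, broadcast the commit
        if acks > len(nodes) // 2:
            for n in nodes:
                n.handle_commit(key)
            return True
        return False
```

The "proposed" flag on durable storage is what lets a node that crashes mid-round recover without losing an accepted-but-uncommitted value.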
[6:22] <greglap> it's bedtime for me though, g'night!
[6:25] <ooolinux> thanks greglap
[6:25] <ooolinux> Paxos::store_state writes those files
[6:32] <DJLee> gnight
[6:37] * cmccabe (~cmccabe@c-24-23-253-6.hsd1.ca.comcast.net) has left #ceph
[7:25] * Psi-Jack_ (~psi-jack@yggdrasil.hostdruids.com) has joined #ceph
[7:31] * Psi-Jack (~psi-jack@yggdrasil.hostdruids.com) Quit (Read error: Operation timed out)
[8:03] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:16] * allsystemsarego (~allsystem@ has joined #ceph
[9:09] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:54] <DJLee> I've made a bug track here, http://tracker.newdream.net/issues/810
[9:55] <DJLee> oh I think the assignee is wrong;
[10:05] * Yoric (~David@ has joined #ceph
[10:17] * verwilst (~verwilst@router.begen1.office.netnoc.eu) has joined #ceph
[10:21] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[10:21] * Yoric (~David@ has joined #ceph
[10:23] * Yoric (~David@ Quit ()
[10:24] * Yoric (~David@ has joined #ceph
[10:40] * ooolinux (~bless@ Quit (Ping timeout: 480 seconds)
[10:45] * Yoric (~David@ Quit (Quit: Yoric)
[10:48] * DJLee (82d8d198@ircip3.mibbit.com) Quit (Remote host closed the connection)
[10:51] * Yoric (~David@ has joined #ceph
[13:08] * verwilst (~verwilst@router.begen1.office.netnoc.eu) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * stingray (~stingray@stingr.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * Ormod (~hjvalton@vipunen.hut.fi) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * votz (~votz@dhcp0020.grt.resnet.group.UPENN.EDU) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * MK_FG (~MK_FG@ Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * sjust (~sam@ip-66-33-206-8.dreamhost.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * monrad-51468 (~mmk@domitian.tdx.dk) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * prometheanfire (~mthode@mx1.mthode.org) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * eternaleye (~eternaley@ Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * Meths (rift@ Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * darkfader (~floh@ Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * johnl (~johnl@ Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * Psi-Jack_ (~psi-jack@yggdrasil.hostdruids.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * Guest1645 (quasselcor@bas11-montreal02-1128535815.dsl.bell.ca) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * atg (~atg@please.dont.hacktheinter.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * Jiaju (~jjzhang@ Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * nolan (~nolan@phong.sigbus.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * [ack] (ANONYMOUS@ Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * __jt__ (~james@jamestaylor.org) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * iggy (~iggy@theiggy.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * allsystemsarego (~allsystem@ Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * morse (~morse@supercomputing.univpm.it) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * alexxy (~alexxy@ Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * hijacker (~hijacker@ Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * Yoric (~David@ Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * pruby (~tim@leibniz.catalyst.net.nz) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * cclien_ (~cclien@ec2-175-41-146-71.ap-southeast-1.compute.amazonaws.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * Anticimex (anticimex@netforce.csbnet.se) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * raso (~raso@debian-multimedia.org) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * DeHackEd (~dehacked@dhe.execulink.com) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * f4m8 (~f4m8@lug-owl.de) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * yx (~yx@82VAABWSB.tor-irc.dnsbl.oftc.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:08] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (kinetic.oftc.net reticulum.oftc.net)
[13:14] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[13:14] * DeHackEd (~dehacked@dhe.execulink.com) has joined #ceph
[13:14] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[13:14] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[13:14] * allsystemsarego (~allsystem@ has joined #ceph
[13:14] * Psi-Jack_ (~psi-jack@yggdrasil.hostdruids.com) has joined #ceph
[13:14] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[13:14] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[13:14] * Guest1645 (quasselcor@bas11-montreal02-1128535815.dsl.bell.ca) has joined #ceph
[13:14] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[13:14] * Jiaju (~jjzhang@ has joined #ceph
[13:14] * votz (~votz@dhcp0020.grt.resnet.group.UPENN.EDU) has joined #ceph
[13:14] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[13:14] * MK_FG (~MK_FG@ has joined #ceph
[13:14] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[13:14] * sjust (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[13:14] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[13:14] * [ack] (ANONYMOUS@ has joined #ceph
[13:14] * prometheanfire (~mthode@mx1.mthode.org) has joined #ceph
[13:14] * eternaleye (~eternaley@ has joined #ceph
[13:14] * Meths (rift@ has joined #ceph
[13:14] * Ormod (~hjvalton@vipunen.hut.fi) has joined #ceph
[13:14] * darkfader (~floh@ has joined #ceph
[13:14] * johnl (~johnl@ has joined #ceph
[13:14] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) has joined #ceph
[13:14] * iggy (~iggy@theiggy.com) has joined #ceph
[13:14] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[13:14] * __jt__ (~james@jamestaylor.org) has joined #ceph
[13:14] * hijacker (~hijacker@ has joined #ceph
[13:14] * alexxy (~alexxy@ has joined #ceph
[13:14] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[13:14] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[13:14] * stingray (~stingray@stingr.net) has joined #ceph
[13:14] * verwilst (~verwilst@router.begen1.office.netnoc.eu) has joined #ceph
[13:14] * Yoric (~David@ has joined #ceph
[13:14] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[13:14] * cclien_ (~cclien@ec2-175-41-146-71.ap-southeast-1.compute.amazonaws.com) has joined #ceph
[13:14] * Anticimex (anticimex@netforce.csbnet.se) has joined #ceph
[13:14] * raso (~raso@debian-multimedia.org) has joined #ceph
[13:14] * f4m8 (~f4m8@lug-owl.de) has joined #ceph
[13:14] * yx (~yx@82VAABWSB.tor-irc.dnsbl.oftc.net) has joined #ceph
[13:14] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[14:05] * Yoric_ (~David@ has joined #ceph
[14:05] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[14:05] * Yoric_ is now known as Yoric
[14:09] * DJLee (82d8d198@ircip1.mibbit.com) has joined #ceph
[15:12] * andret (~andre@pcandre.nine.ch) Quit (Remote host closed the connection)
[15:16] * andret (~andre@pcandre.nine.ch) has joined #ceph
[16:37] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (Remote host closed the connection)
[17:13] * Yoric (~David@ Quit (Quit: Yoric)
[17:31] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[17:57] * greglap (~Adium@ has joined #ceph
[18:02] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:03] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:04] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit ()
[18:04] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:43] * greglap (~Adium@ Quit (Quit: Leaving.)
[18:53] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:54] <Tv> bah multimachine autotests need to pass keys from one host to another.. this gets complicated..
[18:56] * sagelap (~sage@ has joined #ceph
[18:57] * sagelap (~sage@ has left #ceph
[19:09] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[19:21] * cmccabe (~cmccabe@ has joined #ceph
[19:22] <Tv> soo.. http://ceph.newdream.net/wiki/Monitor_commands#AUTH_subsystem says add is for *osds only*, is that wrong?
[19:23] <Tv> how else do you add a new client key, at runtime?
[19:23] <Tv> also, how do the osds get to know the new client key? via the monitors? does anything read its keyring after startup?
[19:24] <gregaf> yehudasa knows this all much better than I remember it
[19:24] <Tv> i need to read the paper on ceph's security architecture..
[19:24] <gregaf> but IIRC clients authenticate against the monitor with their key, and then get rotating keys which they use when communicating with the OSDs and MDSes
[19:24] <Tv> ah that would make sense
[19:24] <gregaf> we don't have a paper on the security, unfortunately
[19:25] <Tv> i thought there was one
[19:25] <gregaf> Yehuda implemented it about 1.5 years ago
[19:25] <gregaf> yeah, that's not what we use; it was way too expensive/complicated
[19:25] <Tv> hahaha
[19:25] <cmccabe> tv: there are some security papers, but none of them describe the current situation exactly
[19:25] <Tv> ok
[19:25] <Tv> so is ceph auth add for osds only, or for client keys too?
[19:25] <Tv> yehudasa: ?
[19:26] <Tv> i mean, the caps should tell what the key can do
[19:26] <gregaf> I think it's for anybody, but I'll poke him
[19:27] <yehudasa> Tv: ceph auth add is for all entities
[19:28] <Tv> ok thanks
[19:31] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[19:46] <gregaf> woah, so the latest gitbuilder of master has an internal compiler error
[19:47] <DeHackEd> nice
[19:47] <cmccabe> heh, an ICE?
[19:47] <cmccabe> has someone been going nuts with templates?
[19:47] <DeHackEd> and what version of gcc?
[19:53] <gregaf> dunno, that stuff belongs to Tv
[19:55] <cmccabe> I thought it was 4.4.5
[19:58] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[20:03] <Tv> gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5)
[20:04] <Tv> gregaf: where do you see the compiler error..
[20:05] * sagelap (~sage@ has joined #ceph
[20:05] * sagelap (~sage@ has left #ceph
[20:05] <Tv> ah i386 version
[20:06] <gregaf> oh, sorry, thought I'd posted that
[20:08] <Tv> rebooting the machine, as it doesn't seem to be related to the source content..
[20:09] <Tv> (difference between ok and fail was purely in .gitignore)
[20:16] <Tv> still does it.. crap
[20:18] <Tv> might be the config flags changed recently, undoing some of that to see
[20:20] <Tv> it's testradospp.cc consistently, though
[20:20] <Tv> which.. yeah, we didn't do --with-debug originally
[20:20] <Tv> that's why it didn't happen originally
[20:25] <Tv> well, without --with-debug it doesn't compile the problematic file.. small comfort
[20:34] <cmccabe> wow, it ICEs consistently?
[20:35] <cmccabe> normally that's something that happens very rarely and isn't that reproducible
[20:35] <cmccabe> you might try checking memory consumption... this might sound stupid, but are you hitting some kind of 32-bit limit?
[20:35] <gregaf> and testradospp.cc, while possibly broken, isn't that complicated
[20:37] <cmccabe> yeah... it's short (which is a good thing btw)
[20:37] <cmccabe> maybe there's some library linking foo on 32-bit?
[20:40] <Tv> i wiped out ccache, waiting to see.. ccache might very well change it from "it crashed once" to "it crashes reliably"
[20:40] <Tv> nope, still busted
[20:49] <Tv> the best i can really offer is reporting the bug to gcc and either disabling testradospp.cc, or modifying it until it avoids the crash
[20:50] <gregaf> I'm not too concerned about it either way; I don't think the tool is actually that useful
[20:53] <Tv> feel free to comment it out in Makefile.am, i think that's the cleanest way around this for now
[20:56] <cmccabe> so when are you supposed to use open_pool, and when are you supposed to use lookup_pool?
[20:58] <joshd> cmccabe: open_pool provides a pool context so you can perform other operations on the pool
[20:59] <cmccabe> so lookup_pool is only for checking that the pool exists?
[20:59] <joshd> lookup_pool just looks at the osdmap
[20:59] <cmccabe> lookup_pool does return an int rather than a bool, is there anything interesting I can do with that
[21:00] <joshd> I'm not sure there is with librados
[21:01] <joshd> but it's -ENOENT if it doesn't exist
[21:01] <cmccabe> so basically it's negative if you can't lookup the pool
[21:02] <cmccabe> but aside from that... its value is unimportant
[21:02] <cmccabe> well, I guess you could have different error codes.
[21:04] * alexxy (~alexxy@ Quit (Remote host closed the connection)
[21:04] <joshd> yeah, afaict
[21:05] <joshd> yehudasa would know for sure
[21:08] * alexxy (~alexxy@ has joined #ceph
[21:12] * peritus (~andreas@h-150-131.A163.priv.bahnhof.se) has joined #ceph
[21:23] <Tv> oh bah corefiles have a limit on how long the path to executable can be.. now i need to jump through hoops to even figure out what crashed
[21:24] <Tv> cmon it is..
[21:26] <gregaf> cmccabe: I believe the return value is the pool number
[21:27] <gregaf> not sure, though
[21:27] <cmccabe> gregaf: yea
[21:27] <cmccabe> gregaf: just wondering what the user can do with it
[21:27] <gregaf> yeah, I don't think there's much since all the ops require a pool context
[21:27] <gregaf> but it is potentially-useful information leakage (uh-oh!)
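The return convention being puzzled out above, as a toy sketch (librados itself is C/C++, and the pool table here is hypothetical):

```python
import errno

# hypothetical pool name -> id table standing in for the osdmap
POOLS = {"data": 0, "metadata": 1, "rbd": 2}

def lookup_pool(name):
    # convention from the discussion: a non-negative result is the pool id,
    # a negative result is -errno (e.g. -ENOENT when the pool doesn't exist)
    if name in POOLS:
        return POOLS[name]
    return -errno.ENOENT
```

Callers that only care about existence just check the sign; the specific negative value carries the error code.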
[22:25] * nolan (~nolan@phong.sigbus.net) Quit (Remote host closed the connection)
[22:27] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[22:39] <johnl> ello
[22:43] <cmccabe> johnl: hi
[22:52] <gregaf> hi johnl
[22:54] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Quit: Tv)
[22:56] <cmccabe> so, I see that there are rados snaps and "selfmanaged snaps"
[22:57] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[22:59] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[22:23] <FoxMURDER> mmm is it a known bug of the rbd kernel driver that adding a nonexistent device from a working ceph setup kind of breaks the thing? :)
[23:23] <FoxMURDER> and rmmod/modprobe rbd is screwed up too :-O
[23:25] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Remote host closed the connection)
[23:25] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[23:25] <yehudasa> FoxMURDER: that's a known bug, was fixed a while ago
[23:26] <FoxMURDER> is it in 2.6.38 or do i have to use some repo?
[23:34] <gregaf> FoxMURDER: the fix is in the 38 rcs but they're currently broken for other reasons thanks to the scalability work that went mainline :(
[23:34] <FoxMURDER> :'-(
[23:35] <gregaf> actually rbd will work fine, but not the Ceph fs
[23:36] <Tv> the good news is, if it never works, it scales sublinearly!
[23:36] <FoxMURDER> lol
[23:36] <gregaf> heh
[23:36] <FoxMURDER> i'd rather hear it's production ready, than having it scale :) would save me so much pain ...
[23:40] <yehudasa> FoxMURDER: you can cherry-pick commit 766fc439 from git://ceph.newdream.net/git/ceph-client.git
[23:51] * sagelap (~sage@ has joined #ceph
[23:51] * sagelap (~sage@ has left #ceph
[23:52] <FoxMURDER> i'm afraid that even with this patched in, i won't get this past my boss :( will have to figure out another way to cluster up such storage ...
[23:59] * sagelap (~sage@ has joined #ceph
[23:59] * sagelap (~sage@ has left #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.