#ceph IRC Log


IRC Log for 2011-02-08

Timestamps are in GMT/BST.

[0:05] * uwe (~uwe@ip-94-79-145-210.unitymediagroup.de) Quit (Quit: sleep)
[0:06] <cmccabe> tv: just out of curiosity, are you compiling everything on a central server and exporting to the VM, or are you compiling inside the VM itself?
[0:08] <Tv|work> cmccabe: right now i'm doing a thing that lets you say make install on your own source tree
[0:09] <Tv|work> cmccabe: then i'll probably make gitbuilder provide that, and use the gitbuilder-provided one in the tests by default
[0:09] <cmccabe> tv: sounds cool.
[0:10] <cmccabe> tv: the best part about that is that I can make and make install with CFLAGS="-Os" or something and use those binaries
[0:10] <Tv|work> yeah, and if we want to run tests against different variants we can run a gitbuilder for each one..
[0:10] <Tv|work> just need to make sure worker cpu platform matches
[0:12] <Tv|work> woo tests ran all the way to trying to unmount cfuse
[0:12] <cmccabe> :)
[0:18] <sjust> cmccabe: where does the osd->clog logging go?
[0:19] <cmccabe> sjust: depends on "clog to monitors" and "clog to syslog"
[0:19] <sjust> ah
[0:19] <cmccabe> sjust: if "clog to syslog" is set, central logs go to syslog
[0:19] <cmccabe> sjust: if "clog to monitors" is set, they are sent to the monitors (see LogClient.cc)
[0:20] <cmccabe> sjust: you can set both options, or neither, if you want.
[0:20] <sjust> cmccabe: thanks
[0:20] <cmccabe> sjust: np
[0:23] <darkfader> cmccabe: wow, so the monitors can now collect messages from all osds? that's sweet
[0:23] <cmccabe> darkfader: Yeah. Of course, central logging has been around for a while, I can't claim credit for it :)
[0:24] <cmccabe> darkfader: I guess the idea behind central logging / CLOG is that we send "important" messages to the monitors so that we can get an idea about what's going on in the cluster as a whole
[0:24] <cmccabe> darkfader: the dilemma we have when debugging, of course, is that it's hard to tell what is important except in retrospect.
[0:25] <darkfader> hehe
[0:25] <cmccabe> darkfader: the reason why syslog support was added is that many people like to use a syslogd to aggregate messages from an entire cluster
[0:26] <darkfader> i need my loghost to have all relevant status messages of what happens, including ceph. but having more info at the mons is cool too
[0:27] <cmccabe> darkfader: it's kind of a shame that it's so network-intensive to send all those messages.
[0:27] <darkfader> only while we got debug turned up... ?
[0:27] <cmccabe> darkfader: yeah, that's why we have log levels.
[0:27] <darkfader> yeah but so don't worry about it too much
[0:28] <darkfader> later on it won't be a problem
[0:28] <darkfader> ah, but i remember someone tarred up over 1GB of logs here last week
[0:28] <darkfader> makes me see your point :)
[0:29] <cmccabe> darkfader: logs will always be important, but hopefully with more stability will come fewer logs
[0:30] <cmccabe> darkfader: well, what I mean to say is higher log levels.
[0:30] <cmccabe> darkfader: sigh. I meant higher as in higher number, as in less logs :P
[0:31] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Remote host closed the connection)
[0:31] <darkfader> haha relax
[0:32] <darkfader> i'm with you
[0:32] <cmccabe> that always gets me with renice() too
[0:32] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[0:32] <cmccabe> I always find myself saying higher priority, when what I really mean is higher nice() value, which is a lower priority
[0:32] <darkfader> hehe. renice is confusing everyone
[0:33] <darkfader> and ionice, too. because it doesn't work reverse
[0:33] <cmccabe> heh
[0:34] <cmccabe> darkfader: I guess it's "nicer" to give your timeslice to other processes. Or something
[0:35] <darkfader> yes - more nice is less nice for yourself. but that will mean your scheduler prio (the other thing) will go up
[0:35] <darkfader> it's best not to think about it unless on sunny days
[0:37] <Tv|work> hmmph vstart.sh is hostile towards "make install"
[0:37] <Tv|work> it wants ./cosd
[0:37] <cmccabe> tv: yeah, vstart.sh is ... unfortunate
[0:37] * Tv|work <-- suffering death by a thousand cuts
[0:37] <Tv|work> cmccabe: yeah i do intend to make vstart.sh go away, but need to use it for now to get going
[0:38] <cmccabe> tv: it's a good thing to look at sort of as a reference; it's not used directly when setting up a multi-machine cluster
[0:38] <cmccabe> tv: too much nasty baked in
[0:40] <cmccabe> tv: you might have better luck integrating with mkcephfs
[0:41] <cmccabe> tv: mkcephfs does a lot of the same stuff-- creating keyrings and monmaps, etc.
[0:41] <cmccabe> tv: for some reason, vstart.sh doesn't call mkcephfs-- just kind of does its own thing
[0:41] <Tv|work> ok vstart.sh wants init-ceph which doesn't even seem to be shipped in "make install"
[0:41] <Tv|work> not using vstart then..
[0:42] <cmccabe> tv: init-ceph is usually installed as /etc/init.d/ceph
[0:42] <Tv|work> not by make install
[0:42] <cmccabe> tv: we kind of agonized over whether make install should install /etc/init.d/ceph, but eventually decided not to do it
[0:42] <Tv|work> yeah i understand that
[0:42] <Tv|work> config files are finicky
[0:42] <cmccabe> tv: the problem is that not all distributions even have sysv-style init scripts any more
[0:42] <Tv|work> but vstart is just not meant to work outside the source tree
[0:42] <cmccabe> tv: like Ubuntu has upstart now
[0:43] <Tv|work> <3 upstart
[0:43] <Tv|work> (though they should have copied more from runit)
[0:43] <cmccabe> tv: upstart is cool but I don't think we have integration with it yet
[0:43] <cmccabe> tv: it would be cool if autoconf could figure out your init system and let the appropriate init script be installed by make install
[0:44] <Tv|work> honestly, autoconf is never going to get that right..
[0:45] <Tv|work> nor will the world wait for autoconf to be updated
[0:45] <prometheanfire> I should be able to test ceph on a single host (just to check things out) right?
[0:45] <Tv|work> i'm more inclined to say upstream should not worry about inits
[0:45] <Tv|work> prometheanfire: yes, though using cfuse is recommended, because loopback mounts may hang if you run out of memory
[0:46] <prometheanfire> I think I should be fine hopefully
[0:46] <prometheanfire> testing rbd and kvm
[0:46] <cmccabe> tv: yeah, we currently put init scripts into the dpkg and RPM, but not make install
[0:47] <cmccabe> tv: you may need to manually copy over init-ceph in your test framework if you plan on using it
[0:47] <Tv|work> cmccabe: i don't want it but vstart wanted; but i'm already moving away from vstart
[0:47] <cmccabe> k
[1:03] <Tv|work> what's the difference between ceph.conf "log dir" and "logger dir"?
[1:03] <yehudasa> tv: log dir is for the debug output, logger dir is for the logger
[1:04] <Tv|work> that's an awesome non-answer :-/
[1:04] <yehudasa> the logger just dumps all kind of statistics that I think only sage ever looked at
[1:04] <cmccabe> tv: logger is short for "profiling logger"
[1:04] <Tv|work> multiple logging subsystems, for some the verbosity goes up to 11?
[1:04] <cmccabe> tv: the profiling logger is a third thing, not the central logger or dout, which can do some useful stuff (allegedly)
[1:08] <prometheanfire> lol, forgot to build btrfs into the kernel
[1:11] <cmccabe> we should probably rename Logger to ProfilingLogger to clear up some of the confusion
[1:11] <cmccabe> and likewise with the configuration options
[1:25] <prometheanfire> I keep on getting failed to assign a block name for image
[1:25] <prometheanfire> when I try to create rbd devices
[1:42] <sagewk> pushed kernel with latest btrfs onto cosd,sepia machines
[1:42] <prometheanfire> 2011-02-07 19:39:40.258733 mon0 -> 'no installed classes!' (0)
[1:43] <prometheanfire> guess that's no good
[1:43] <gregaf> http://ceph.newdream.net/wiki/Rbd
[1:43] <gregaf> looks like you didn't load some of the required rbd stuff into the cluster
[1:44] <prometheanfire> hmm, I was thinking it would be in 0.24.1
[1:45] <gregaf> check out the wiki page
[1:45] <gregaf> RBD uses some of RADOS' cool class stuff for OSD-local customizable calculations
[1:45] <prometheanfire> you talking about .37? if so I am using it
[1:45] <gregaf> so you need to run a few extra steps to set up RBD
[1:45] <prometheanfire> ok
[1:45] <gregaf> the code's all there
[1:46] <prometheanfire> no symbols :D
[1:48] <prometheanfire> ceph class add -i /usr/lib64/rados-classes/libcls_rbd.so.1.0.0 rbd 1.3 x86-64 works though
[1:48] <sjust> wido: are you there?
[1:57] * tuhl (~tuhl@p4FD2788E.dip.t-dialin.net) has joined #ceph
[1:59] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[2:05] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[2:09] <yehudasa> prometheanfire: yeah, it should work
[2:09] <yehudasa> the symbols are only needed for the script to be able to run the same command line
[2:09] <yehudasa> they're used to determine the class name, version and architecture
[2:11] <prometheanfire> ah
[2:11] <tuhl> sage?
[2:11] <prometheanfire> I loaded rbd just fine, but I don't know what else to load
[2:13] <yehudasa> that's all, you don't need to load anything else
[2:13] <gregaf> tuhl: sage is out for the day; what can we help you with?
[2:13] <prometheanfire> huh, worked this time
[2:14] <tuhl> gregaf: he tried to connect via xmpp with me
[2:14] <tuhl> google talk seems not to be compatible with his jabber server
[2:14] <gregaf> oh, he probably just mis-clicked, then?
[2:14] <prometheanfire> looks like I'll need kvm from git too
[2:15] <tuhl> gregaf: he sent me an invite, I accepted, but something went wrong
[2:16] <gregaf> huh, dunno then
[2:16] <tuhl> I will send mail to him
[2:19] * tuhl (~tuhl@p4FD2788E.dip.t-dialin.net) has left #ceph
[2:22] <Tv|work> well, that is a successful "write one file, read it back" on cfuse via autotest
[2:22] <Tv|work> lots of cleanup needed :(
[2:23] <cmccabe> gj!
[2:27] * cmccabe (~cmccabe@ has left #ceph
[2:32] * bcherian (~bencheria@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:41] * fzylogic (~fzylogic@ Quit (Quit: DreamHost Web Hosting http://www.dreamhost.com)
[2:49] * bcherian (~bencheria@ip-66-33-206-8.dreamhost.com) has joined #ceph
[2:53] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:00] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[3:00] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[3:15] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:18] * bcherian (~bencheria@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[3:19] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[3:19] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[3:24] <prometheanfire> would I use rbd cp to copy a disk image to rbd?
[3:27] <joshd> prometheanfire: cp is for copying images within rbd, you probably want to use 'rbd import'
[3:27] <prometheanfire> thanks :D
[3:32] * bcherian (~bencheria@ip-66-33-206-8.dreamhost.com) has joined #ceph
[3:33] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[3:43] * bcherian (~bencheria@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[4:46] * Juul (~Juul@static.88-198-13-205.clients.your-server.de) has joined #ceph
[5:45] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[5:48] * greglap (~Adium@ has joined #ceph
[6:03] * baldben (~bencheria@cpe-76-173-232-163.socal.res.rr.com) has joined #ceph
[6:11] * Juul (~Juul@static.88-198-13-205.clients.your-server.de) Quit (Ping timeout: 480 seconds)
[6:33] * baldben (~bencheria@cpe-76-173-232-163.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[6:34] * baldben (~bencheria@cpe-76-173-232-163.socal.res.rr.com) has joined #ceph
[6:55] * greglap (~Adium@ Quit (Read error: Connection reset by peer)
[7:11] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[8:31] * hijacker (~hijacker@ has joined #ceph
[8:32] * gregorg_taf (~Greg@ Quit (Quit: Quitte)
[8:32] * gregorg (~Greg@ has joined #ceph
[8:56] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[9:01] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[9:12] * uwe (~uwe@ has joined #ceph
[10:27] * allsystemsarego (~allsystem@ has joined #ceph
[11:45] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[12:27] * uwe is now known as Guest690
[12:27] * Guest690 (~uwe@ Quit (Read error: Connection reset by peer)
[12:27] * uwe (~uwe@ has joined #ceph
[13:10] * Yoric (~David@ has joined #ceph
[13:26] * alexxy[home] (~alexxy@ has joined #ceph
[13:27] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[13:30] * Yoric_ (~David@ has joined #ceph
[13:30] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[13:30] * Yoric_ is now known as Yoric
[13:41] * verwilst (~verwilst@router.begen1.office.netnoc.eu) has joined #ceph
[16:00] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[16:14] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[17:00] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[17:21] * baldben (~bencheria@cpe-76-173-232-163.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[17:30] * Psi-Jack_ (~psi-jack@yggdrasil.hostdruids.com) has joined #ceph
[17:31] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[17:37] * Psi-Jack (~psi-jack@yggdrasil.hostdruids.com) Quit (Ping timeout: 480 seconds)
[17:50] * greglap (~Adium@ has joined #ceph
[17:53] * Yoric_ (~David@ has joined #ceph
[17:53] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[17:53] * Yoric_ is now known as Yoric
[17:57] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[18:10] * verwilst (~verwilst@router.begen1.office.netnoc.eu) Quit (Quit: Ex-Chat)
[18:12] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:12] * f4m8 (~f4m8@lug-owl.de) Quit (Server closed connection)
[18:12] * f4m8 (~f4m8@lug-owl.de) has joined #ceph
[18:13] * baldben (~bencheria@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:23] <sagewk> tv|work: 'make dist' failing on master branch.. something with gtest stuff in the makefile
[18:24] <Tv|work> sagewk: first guess: someone changed code without changing tests.. will look into it
[18:25] <sagewk> thanks
[18:27] <Tv|work> oh it seems osdmaptool tries to open a log file in /var/log these days.. that's not a good idea
[18:27] <sagewk> nope
[18:28] <sagewk> mention it to colin when he gets on
[18:28] * Yoric (~David@ has joined #ceph
[18:28] <Tv|work> yeah
[18:29] <Tv|work> <3 gitbuilder for having already bisected it to the commit that fails
[18:30] <sagewk> :)
[18:35] <Tv|work> sagewk: confirmed that reverting eb9e6197e makes clitests (and gtest) succeed again; it is that commit
[18:35] <sagewk> thanks
[18:37] * uwe (~uwe@ Quit (Quit: sleep)
[18:39] * shdb (~shdb@80-219-123-230.dclient.hispeed.ch) Quit (Server closed connection)
[18:39] * shdb (~shdb@80-219-123-230.dclient.hispeed.ch) has joined #ceph
[18:40] * greglap (~Adium@ Quit (Quit: Leaving.)
[18:47] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:49] <Tv|work> okay now i'm just fighting vstart.sh.. time to reimplement (parts of) it
[18:50] <sagewk> which parts?
[18:50] <Tv|work> i can probably hardcode more of ceph.conf, so avoid the bits that generate it..
[18:50] <Tv|work> i can hardcode "3 osds, all on localhost" etc stuff like that
[18:50] <Tv|work> i need the "start these daemons
[18:50] <Tv|work> / stop these daemons"
[18:51] <Tv|work> the mkfs'y bits, but i understand mkcephfs is better for that
[18:51] <sagewk> i think if anything we can reimplement it to just generate a ceph.conf and then run ./mkcephfs and ./init-ceph start
[18:51] <Tv|work> well even init-ceph is wrong for this case
[18:52] <sagewk> how so?
[18:52] <Tv|work> the daemons are neither all in cwd, nor properly installed
[18:52] <Tv|work> i don't need/want pid file handling
[18:52] <Tv|work> i don't want to touch /var/log
[18:53] <Tv|work> etc
[18:53] <sagewk> put them in cwd?
[18:53] <prometheanfire> got kvm working with a template VM :D
[18:53] * fzylogic (~fzylogic@ has joined #ceph
[18:53] <Tv|work> sure but that counts as "fighting vstart.sh" in my mind
[18:53] <Tv|work> there's a lot of hoops to jump through
[18:53] <Tv|work> for something that was never that good of a match
[18:54] <sagewk> vstart assumes everything is in cwd..
[18:54] <Tv|work> this thing is much happier with non-daemonizing things it can keep as children, etc
[18:54] * cmccabe (~cmccabe@c-24-23-253-6.hsd1.ca.comcast.net) has joined #ceph
[18:54] <wido> sjust: here again.
[19:10] <sjust> wido: what ceph version is noisy running?
[19:17] <sjust> wido: also, how do I access the syslog logs?
[19:21] <wido> sjust: A version from about 3 days ago
[19:21] <sjust> master?
[19:21] <wido> the logs are in /var/log
[19:21] <wido> Yes, master
[19:21] <sjust> ok
[19:21] <sjust> thanks
[19:21] <wido> but I also log everything to logger.ceph.widodh.nl
[19:21] <wido> sjust: the logs are in "/srv/ceph/remote-syslog" on the logger machine
[19:21] <sjust> ok, thanks
[19:25] * Yoric (~David@ Quit (Quit: Yoric)
[19:38] * Meths_ (rift@ has joined #ceph
[19:45] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[19:48] * Meths_ is now known as Meths
[20:01] <Tv|work> regarding earlier conversation: s3 clone in ruby: http://code.whytheluckystiff.net/parkplace
[20:02] <Tv|work> a fork of that: https://github.com/razerbeans/boardwalk
[20:03] <Tv|work> don't see any tests though
[20:05] <Tv|work> riak's large-blob storage, Luwak, may serve as inspiration: http://wiki.basho.com/Luwak.html
[20:06] <Tv|work> (they're usually pretty good about REST, though their current apis ignore authentication completely)
[20:06] * cclien (~cclien@ec2-175-41-146-71.ap-southeast-1.compute.amazonaws.com) Quit (Server closed connection)
[20:06] * cclien (~cclien@ec2-175-41-146-71.ap-southeast-1.compute.amazonaws.com) has joined #ceph
[20:09] <cmccabe> riak feels like it was designed for in-datacenter use
[20:09] <cmccabe> the lack of authentication, the fact that buckets are mostly a formality...
[20:09] <Tv|work> oh yeah it's a "put a proxy in front" design
[20:11] <Tv|work> swift has tests of its web api (naturally; they're decent programmers)
[20:11] <cmccabe> riak has an interesting "get all siblings in one request" functionality
[20:12] <cmccabe> they show an example where they do curl -v -H "Accept: multipart/mixed" and get several objects in the same request
[20:12] <sjust> wido: it looks like your problem might be caused by the bug fixed in 3055d094413aa7e6c94bec7ece646856ce7f5f25
[20:13] <Tv|work> that looks more like a request of all branched versions..
[20:13] <Tv|work> riak is a dynamo design, objects can have multiple versions of them, with vector clocks
[20:13] <sjust> osd0,1 erroneously deleted all objects in certain pg's, causing the scrub errors you are seeing
[20:13] <cmccabe> tv: I kind of skipped ahead; I assumed siblings meant "objects at the same level of the hierarchy" but I see I was wrong
[20:16] <cmccabe> tv: so exposing the vector clock might make riak more powerful than s3 in some ways?
[20:16] <cmccabe> tv: I can see that their PUT operation has things like "last_write_wins (true or false) – whether to ignore object history (vector clock) when writing"
[20:16] <Tv|work> cmccabe: very different
[20:17] <sagewk> cmccabe: where did you see the inconsistent write_log on merge_log?
[20:17] <cmccabe> tv: in S3 you don't really get a choice-- you just blast the PUT operation and it overwrites the latest
[20:17] <sagewk> it looks like all paths are doing it
[20:17] <Tv|work> but riak is not really relevant to this conversation as such; i merely wanted to point out that they have "large blob storage" api too
[20:17] <cmccabe> sagewk: 1sec, let me check master
[20:17] <Tv|work> so *luwak* is interesting
[20:17] <Tv|work> but not core riak as such (apart from the fact that they have a really nice REST api design)
[20:18] <cmccabe> sagewk: In OSD::_process_pg_info, only one branch does pg->write_info(*t);
[20:20] <cmccabe> the other branch calls PG::proc_replica_log, which always overwrites peer_info[from] but never calls pg->write_info
[20:21] <Tv|work> so i should type out what i meant with the whole "s3 api authentication etc is painful to use"; web developers have been really, *really* happy about how simple using OAuth2 has been, and using oauth2 can be summarized as 1) use https 2) include this secret token as a query argument
[20:21] <Tv|work> vs oauth1 that was "order your form submit fields, md5 them, sign, blahblah", which was pretty much inspired by the amazon apis
[20:22] <cmccabe> tv: I remember that in S3, the signing is done on the request as a whole rather than just one part
[20:22] <Tv|work> so if i was looking at writing anything like that from scratch, i'd definitely try to use oauth2
[20:22] <cmccabe> tv: I always assumed that the reasoning was that otherwise packet sniffers could steal your key and do nasty things
[20:23] <Tv|work> hence 1) https
[20:23] <cmccabe> tv: I guess you could argue that HTTPS gives you a secure session anyway, so why bother
[20:23] <Tv|work> using oauth2 without https would be very very stupid
[20:23] <cmccabe> tv: does HTTPS obviate replay attacks?
[20:23] <Tv|work> cmccabe: TLS does
[20:24] <Tv|work> oauth2 is not perfect, definitely not -- but it's a really sweet compromise
[20:25] <cmccabe> tv: I wonder why amazon still supports HTTP. I guess compatibility?
[20:25] <Tv|work> cmccabe: yeah, plus they have a huge codebase now..
[20:26] <cmccabe> tv: so basically, amazon's signing scheme is a crude attempt to make HTTP as secure as HTTPS
[20:26] <Tv|work> cmccabe: it's a "yes this request is allowed" mechanism that does not require a secure channel
[20:26] <Tv|work> cmccabe: i've built my fair share of those
[20:27] <cmccabe> :)
[20:27] <Tv|work> cmccabe: that's nice because then you can e.g. share a link to the content
[20:27] <cmccabe> tv: hence their hacks to try to defeat replay attacks
[20:27] <Tv|work> just make it a "read access allowed" signature
[20:27] <Tv|work> and for read-only, you don't need to really care about replay attacks
[20:27] <Tv|work> trying to do more with that just gets painful
[20:27] <Tv|work> but really, non-idempotent web requests are a recipe for pain
[20:28] <Tv|work> make the api look idempotent instead
[20:29] <cmccabe> tv: I guess create bucket, delete bucket at least are going to be non-idempotent
[20:30] <cmccabe> tv: well, I guess maybe you could check to see if a bucket "exactly" like the requested one exists.
[20:30] <cmccabe> tv: this reminds me of the evil old NFSv2 server/client communication in a way
[20:30] <wido> sjust: ok, i'll upgrade and see what happens
[20:38] * shdb (~shdb@80-219-123-230.dclient.hispeed.ch) Quit (Read error: Connection reset by peer)
[20:39] <sjust> wido: if that was the problem, the inconsistent pg's should continue to appear until all are fixed
[20:47] <Tv|work> cmccabe: yeah, it's not idempotent unless it's "*i* now have a bucket by this name"
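(Editor's note: the "*i* now have a bucket by this name" idea above can be sketched roughly as follows. This is a minimal illustration only; `BucketStore` and `ensure_bucket` are hypothetical names, not part of Ceph, S3, or any real API. The point is that repeating the same request always yields the same outcome, so a replayed or retried request is harmless.)

```python
class BucketStore:
    """Hypothetical in-memory bucket registry, for illustration only."""

    def __init__(self):
        # Maps bucket name -> owning user.
        self._owners = {}

    def ensure_bucket(self, name, user):
        """Assert "user has a bucket called name".

        Creates the bucket if it does not exist. Returns True if the
        bucket is owned by `user` after the call, False if someone else
        already owns it. Calling this repeatedly with the same arguments
        never changes the result, which is what makes it idempotent.
        """
        owner = self._owners.setdefault(name, user)
        return owner == user


store = BucketStore()
print(store.ensure_bucket("logs", "alice"))  # first request creates it
print(store.ensure_bucket("logs", "alice"))  # retry: same answer, no error
print(store.ensure_bucket("logs", "bob"))    # someone else owns the name
```

(A plain "create bucket" call that fails with "already exists" on retry would not have this property, which is the distinction drawn in the conversation.)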
[20:48] <Tv|work> cmccabe: the thing that is fugly is access control list manipulation; you need replay protection so attackers can't roll back your acl to re-include the hole fixed
[20:48] <Tv|work> cmccabe: forcing all of that to go via https is so much nicer
[20:50] <cmccabe> tv: maybe this is a dumb question, but what's the performance of HTTPS vs. HTTP
[20:51] <cmccabe> tv: you always hear about websites being stingy with the amount of HTTPS they use, even banks, because there's some kind of perception that HTTP is cheaper somehow
[20:52] <cmccabe> http://stackoverflow.com/questions/149274/http-vs-https-performance
[20:52] <cmccabe> tv: "Bottom line: making lots of short requests over HTTPS will be quite a bit slower than HTTP, but if you transfer a lot of data in a single request, the difference will be insignificant."
[20:53] <Tv|work> cmccabe: there's a classic google statement about that..
[20:54] <Tv|work> http://www.imperialviolet.org/2010/06/25/overclocking-ssl.html
[20:54] <Tv|work> "In order to do this we had to deploy no additional machines and no special hardware. On our production frontend machines, SSL/TLS accounts for less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead. "
[20:54] <Tv|work> (that page is an awesome resource in speeding up ssl, too)
[20:55] <Tv|work> that stackoverflow thing also seems to miss the fact that TLS has sessions that live longer than TCP connections, etc
[20:55] <Tv|work> i rarely trust stackoverflow, these days :-/
[20:56] <Tv|work> also, next gen intel cpus have AES on chip..
[20:56] <cmccabe> tv: yeah, stackoverflow has a lot of noise
[20:57] <cmccabe> tv: it's helpful to get a general idea about things, but sometimes I cringe when I read the C/C++ advice on there
[20:57] <Tv|work> anyway, the difference of tls/no is often dwarfed by your other architectural decisions
[20:58] <Tv|work> managing the frigging keys and certs is the only valid reason to avoid tls
[20:58] <cmccabe> tv: one thing you do have to keep in mind is that google's SSL connections are probably pretty long-lived.
[20:58] <Tv|work> cmccabe: for stuff like gmail.com? yes and no, but definitely not "large objects"
[20:59] <cmccabe> tv: I mean for gmail they're doing ajax-y stuff... doesn't that involve keeping connections open for a while?
[20:59] <Tv|work> sure but there's still lots of just html, css, etc downloads
[21:00] <cmccabe> tv: anyway, do people generally send multiple S3 commands in a single session?
[21:00] * WesleyS (~WesleyS@ has joined #ceph
[21:00] <Tv|work> cmccabe: depends on what the use case is
[21:01] <cmccabe> tv: I remember as an S3 user we would sometimes send multiple S3 REST requests in a session, and sometimes only 1
[21:02] <cmccabe> tv: in our case, we were sending such large objects that I'm sure handshaking was not a big part of the total cost
[21:02] <Tv|work> TLS can reuse the first handshake whether it's the same TCP connection or not
[21:02] <Tv|work> if you just set it up right
[21:02] <cmccabe> tv: anyway, I think any API needs to include some provision for anonymous download via HTTP. (read-only, as you said)
[21:02] <Tv|work> yeah, but that part is really easy
[21:03] <cmccabe> tv: read-only is always easy :)
[21:03] <Tv|work> HMAC-SHA1 the object id, expiry timestamp, and an optional IPv4 class C source address; slap that in the url
[21:03] <Tv|work> oh shared secret too in there ;)
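(Editor's note: a rough sketch of the read-only signed URL scheme described above, assuming HMAC-SHA1 over the object id and an expiry timestamp with a shared secret. The function names, the URL layout, the `expires`/`sig` query parameters, and the secret value are all made up for illustration; the optional source-network component mentioned in the conversation is left out for brevity.)

```python
import hashlib
import hmac
import time
from urllib.parse import quote

# Hypothetical shared secret known only to the signer and the server.
SECRET = b"example-shared-secret"


def _signature(object_id, expires, secret):
    # Sign the object id together with the expiry so neither can be altered.
    msg = f"{object_id}\n{expires}".encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha1).hexdigest()


def sign_url(object_id, expires, secret=SECRET):
    """Return a path granting read access to object_id until `expires`."""
    sig = _signature(object_id, expires, secret)
    return f"/{quote(object_id)}?expires={expires}&sig={sig}"


def verify(object_id, expires, sig, now=None, secret=SECRET):
    """Check that the signature matches and the link has not expired."""
    now = time.time() if now is None else now
    if now > expires:
        return False
    expected = _signature(object_id, expires, secret)
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), sig.encode("utf-8"))


url = sign_url("photo.jpg", int(time.time()) + 3600)
print(url)
```

(Because the link only grants read access and carries its own expiry, replaying it does no harm beyond what the signer already permitted, which matches the point made earlier about read-only signatures.)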
[21:05] <cmccabe> tv: it's nice to be able to generate anonymous URLs on the fly
[21:05] <cmccabe> tv: as in, make something accessible for a short amount of time
[21:06] <Tv|work> hence expiry
[21:06] <cmccabe> tv: S3 also has more interesting mechanisms like buckets that are owned by one person, but which other S3 users can upload to (but those other users get charged)
[21:06] <Tv|work> yeah, that's just ACL
[21:06] <Tv|work> all those operations should be under https
[21:06] <cmccabe> tv: I think those are called golden buckets
[21:07] <cmccabe> tv: ok never mind, I read someone's silly blog post. That's not official terminology.
[21:07] <cmccabe> tv: but amazon did have some interesting payment options, I just don't know the terminology
[21:08] <cmccabe> tv: but yeah, those things probably all should be under HTTPS
[21:15] <Tv|work> that's still just a "i give you access to this, under these terms" = acl
[21:17] <cmccabe> tv: true
[21:19] * chrisrd (~chrisrd@ Quit (Server closed connection)
[21:19] * chrisrd (~chrisrd@ has joined #ceph
[21:37] <Tv|work> alright off to receive a shipment of furniture; bbl ~1h
[21:42] * uwe (~uwe@mb.uwe.gd) has joined #ceph
[22:06] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[22:10] * gregorg_taf (~Greg@ has joined #ceph
[22:10] * gregorg (~Greg@ Quit (Read error: Connection reset by peer)
[22:27] <Tv|work> ..aand back
[22:50] * baldben (~bencheria@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[22:52] * baldben (~bencheria@ip-66-33-206-8.dreamhost.com) has joined #ceph
[22:57] * bcherian (~bencheria@ip-66-33-206-8.dreamhost.com) has joined #ceph
[23:02] * hijacker (~hijacker@ Quit (Server closed connection)
[23:02] * hijacker (~hijacker@ has joined #ceph
[23:04] * baldben (~bencheria@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[23:20] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[23:20] * bcherian (~bencheria@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[23:31] * uwe (~uwe@mb.uwe.gd) Quit (Quit: sleep)
[23:38] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[23:38] * bcherian (~bencheria@ip-66-33-206-8.dreamhost.com) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.