#ceph IRC Log


IRC Log for 2011-02-10

Timestamps are in GMT/BST.

[0:07] <Tv|work> can someone give me a quick braindump on the cls_*.cc stuff, what's the big idea there?
[0:09] <cmccabe> tv: it's a way of dynamically loading code-- yehuda knows the most about it as I recall
[0:09] <Tv|work> cmccabe: yeah looks like plugins, i'm looking more for the "why" -- what was the point of the exercise?
[0:09] <Tv|work> and where are the callers..
[0:10] <Tv|work> grepping for "objclass", not seeing it..
[0:10] <yehudasa> tv: it allows us to load dynamic code into the osd
[0:11] <Tv|work> yehudasa: why? what are you planning on using it for?
[0:11] <yehudasa> for manipulating/handling objects data
[0:11] <Tv|work> the big picture ;)
[0:11] <bchrisman> cmccabe: my mds seemed to ignore SIGHUP… is that a recent addition (was intending to get a log rotation)?
[0:11] <Tv|work> loading an md5 implementation doesn't seem enough to justify it
[0:11] <yehudasa> tv: getting computation closer to the objects
[0:11] <Tv|work> yehudasa: are there hooks that call something dynamic; where?
[0:12] <yehudasa> tv: you call it externally, from the clients
[0:12] <Tv|work> yehudasa: will things work if no classes are loaded?
[0:12] <yehudasa> tv: most of the stuff yes
[0:12] <yehudasa> currently it's only required for rbd
[0:12] <cmccabe> bchrisman: SIGHUP doesn't rotate logs per se. It reloads the configuration, including the logging config, and forces a dout reopen
[0:13] <yehudasa> e.g., rbd header manipulation is done as an external feature
[0:13] <yehudasa> it's not really part of the osd, and shouldn't be part of it per se
[0:13] <cmccabe> bchrisman: so the idea is, you move the old logs to /some/archive/area, then send a SIGHUP. Then the daemon opens a new file in /var/log/whatever
[0:13] <bchrisman> gotchya
[0:13] <bchrisman> makes sense
[0:13] <cmccabe> bchrisman: without this mechanism, the daemon would just continue to log to the "archived" file since it's got an fd
[0:14] <bchrisman> yup
[0:14] <cmccabe> bchrisman: yeah, it's a standard unix thing, been around forever
[0:14] <yehudasa> but otoh, a client can just say 'exec rbd list snapshots' and it'll get all the snapshots that are on the rbd header
[0:14] <bchrisman> cmccabe: yup
[0:14] <yehudasa> and you don't need to implement it on every client, you can just use librados for it
[0:14] <Tv|work> yehudasa: ok.. how does the code get loaded in that case -- does every ceph osd expect to find the .so or whatever locally?
[0:15] <bchrisman> that'll blow away any config options I've changed with injectargs then?
[0:15] <yehudasa> tv: no. It's being loaded via the monitor
[0:15] <yehudasa> the monitors distribute the objects
[0:15] <Tv|work> yehudasa: let me ask a more precise question -- the .so is transferred over the network?
[0:15] <yehudasa> yeah
[0:15] <cmccabe> bchrisman: yeah
[0:15] <Tv|work> oookay
[0:15] <Tv|work> note to self: ceph is ridiculously insecure ;)
[0:16] <yehudasa> tv: it's transferred over the internal cluster network
[0:16] <Tv|work> my argument holds ;)
[0:17] <yehudasa> well.. that depends on the level of ridicule you're aiming at
[0:17] <Tv|work> yeah nothing to worry about now, just trying to understand the whole picture
[0:19] <cmccabe> tv, yehudasa: sending a .so to the OSDs to make use of their computational resources is an interesting idea
[0:19] <cmccabe> tv, yehudasa: there are other ways to do this if you're the sysadmin for the cluster, of course
[0:27] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[0:38] * gregorg (~Greg@ has joined #ceph
[0:38] * gregorg_taf (~Greg@ Quit (Read error: Connection reset by peer)
[0:58] <gregaf> Tv|work: as always, remember that Ceph was designed with a trusted network in mind
[0:58] <gregaf> johnl: are you here right now?
[0:58] <Tv|work> gregaf: sure but there's still layers of defense..
[0:59] * mnigh (~mnigh@75-128-161-124.static.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[0:59] <Tv|work> there's a difference between "if you can talk tcp to me, you can destroy all my data" and "if you can talk TCP to me, you are root"
[1:01] <Tv|work> also, hadoop is a hell of a good example of "hey we trust anyone who can run these commands".. a bit down the line, now it's suddenly "we're actively looking into protecting users from each other, and will embrace kerberos throughout the system"
[1:02] <Tv|work> personally, i think the time of trusted networks has largely passed us
[1:02] <Tv|work> the cloud is not your friend
[1:04] * verwilst (~verwilst@dD576FAAE.access.telenet.be) Quit (Quit: Ex-Chat)
[1:04] <Tv|work> (while you = newdream may plan to run ceph in a network that is largely inaccessible, a large chunk of the userbase will be looking at ec2 and such)
[1:07] <gregaf> yep; it's something for the future
[1:07] <gregaf> keep in mind that something like encrypted communication can be handled just by modifying the messenger, which helps a lot
[1:12] <cmccabe> tv: I hadn't heard that hadoop was looking into kerberos, that's very interesting
[1:12] <cmccabe> tv: I kind of picture hadoop as something people run in-house to analyze large data sets, so I have trouble imagining why high isolation would be important
[1:15] <Tv|work> cmccabe: no longer true; plus big companies are structured like a bunch of unrelated companies
[1:16] <cmccabe> tv: I guess a lot of data is sensitive, so I can see why access control could become important
[1:16] <Tv|work> and even when it's all internal, there's huge rewards for running it all "on the cloud"
[1:16] <sagewk> um, we have a kerberos-like authentication framework for exactly this reason
[1:17] <sagewk> the network is only trusted inasmuch as we don't take any real steps to mitigate DOS
[1:17] <Tv|work> sagewk: sure, but authentication is only part of the answer, data confidentiality is another
[1:17] <cmccabe> sagewk, tv: still when you run something on the cloud, you are trusting the cloud provider to a certain extent
[1:18] <sagewk> yeah. no encryption yet.
[1:18] <Tv|work> cmccabe: or not, and building security on top
[1:18] <cmccabe> sagewk, tv: with stuff like ec2, Amazon gives you a set of kernels to run and you must choose one of those.
[1:18] <Tv|work> cmccabe: let me rephrase that
[1:18] <cmccabe> sagewk, tv: it would be fairly trivial for amazon to backdoor one of them
[1:18] <Tv|work> cmccabe: most people are not defending against the cloud provider; they're defending against other inhabitants of the cloud
[1:18] <Tv|work> you can enter into a legal agreement with the cloud provider
[1:19] <cmccabe> sagewk, tv: are those guys on the same subnet as you? If so, that's bad, bad bad.
[1:19] <Tv|work> that absolves you of responsibility if they attack you..
[1:19] <Tv|work> cmccabe: yup
[1:19] <cmccabe> sagewk, tv: a lot of the basic IP protocols are insecure when a black hat is sharing your LAN. Like ARP has some gaping holes.
[1:19] <Tv|work> sagewk: is ceph's auth more than just a handshake? it smells like you could still hijack an open connection..
[1:20] <sagewk> usually there's a vlan, fwiw. but whatever
[1:20] <Tv|work> sadly, vlans are easy to walk through
[1:20] <Tv|work> seriously
[1:20] <sagewk> tv: handshake during connection startup. you could conceivably hijack the tcp session.
[1:21] <sagewk> could pretty easily add a signature to each message header by the session key
[1:21] <Tv|work> yeah but tls is easier ;)
[1:21] <gregaf> or real encrypted communication for I don't think that much more pain?
[1:21] <sagewk> if you can afford it yeah
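The per-message signature sagewk mentions would amount to a MAC over each message keyed by the session key. The sketch below uses HMAC-SHA256 purely as a generic illustration; it has nothing to do with Ceph's actual wire format.

```python
import hashlib
import hmac

def sign_message(session_key: bytes, header: bytes, payload: bytes) -> bytes:
    # MAC the header and payload together so that someone hijacking the TCP
    # session cannot inject or alter messages without the session key.
    return hmac.new(session_key, header + payload, hashlib.sha256).digest()

def verify_message(session_key: bytes, header: bytes, payload: bytes, sig: bytes) -> bool:
    # Constant-time comparison avoids leaking information via timing.
    expected = sign_message(session_key, header, payload)
    return hmac.compare_digest(expected, sig)
```

A receiver that drops any message whose signature fails to verify defeats session hijacking, though (as Tv notes) it still leaves the payload readable on the wire, unlike TLS.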
[1:22] <gregaf> just not a priority atm with the number of developers and user targets we currently have
[1:23] <gregaf> Tv|work: and if you're worried about the ability to break shit through the classes I guess you could try and exploit privilege escalation stuff...
[1:23] <gregaf> but the security model we already have does include bits for whether you can call class functions on each pool
[1:23] <gregaf> or load class functions at all
[1:24] <Tv|work> yeah i'm just a "if it doesn't have to be dynamic, don't make it dynamic" kinda guy
[1:24] <Tv|work> i don't see your rbd or whatever needs changing on the fly, *and* shipping .so's being a realistic answer
[1:24] <Tv|work> too high risk of crashing stuff, in my mind
[1:25] <DeHackEd> so write it in lua, perl, or other embedded language?
[1:25] <Tv|work> well let's first see what the rbd world looks like in 6 months.
[1:26] <Tv|work> but i do honestly think that a large chunk of ceph's code quality stems from trying to cover a whole lot of ground quickly; something "dumber" and less dynamic is always easier to make stable
[1:27] <Tv|work> alright, that's a working, *good*, autotest setup
[1:27] <Tv|work> now i want to figure out how to have things in libraries nicely, write a handful of examples, and then unleash this beast..
[1:28] <gregaf> Tv|work: the point of the classes is so that other people can have their own code that moves calculation work onto the OSDs
[1:28] <gregaf> I don't think that code has required work since I got brought on
[1:30] <gregaf> and systems where functionality is extended through plug-ins are pretty well recognized as being easier to maintain and stabilize than ones that aren't, so I'm not sure why you think rbd would work better as core OSD code
[1:32] <cmccabe> gregaf, tv: I don't have a very good knowledge of the code in question, but I do feel mildly surprised that something as core as rbd would require plugins
[1:33] <cmccabe> gregaf, tv: it feels like the kind of thing that might be a configure-time option, but having it runtime is weird
[1:33] <gregaf> it's easier to prototype and stabilize when it's implemented as a plugin
[1:33] <cmccabe> gregaf, tv: also keep in mind that there are speed penalties for using the PLT (which .so requires on x86)
[1:41] <Tv|work> gregaf: i'm afraid of the complexity of the interface; this is mostly from watching even the mapreduce world, with their fairly well defined data flow, suffer from people not being able to get their code to work and it being really painful to debug
[1:42] <Tv|work> and the fact that you do it wrong, you crash the osd
[1:42] <gregaf> ah, I don't know much about that — you could ask Yehuda; he's worked with it in rbd :)
[1:42] <Tv|work> at least the mapreduce world was protected from that, unless you wrote to /dev/sda1..
[1:42] <Tv|work> oh actually, you couldn't, you weren't even root
[1:43] <Tv|work> oh right, you could often write over hadoop's hdfs internal files if you wanted
[1:43] <Tv|work> can anyone explain this line in vstart.sh: [ "$cephx" -eq 1 ] && $SUDO $CEPH_BIN/cauthtool --create-keyring --gen-key --name=mon. $keyring_fn
[1:44] <Tv|work> what's the "mon.", why is there a trailing period, what does that mean?
[1:45] <gregaf> I'd expect there to be a $id or something after the period...
[1:45] <Tv|work> it's not inside a loop over anything
[1:45] <Tv|work> $keyring_fn == .ceph_keyring
[1:46] <gregaf> yeah, that's just the filename to write to
[1:46] <Tv|work> yes but what i'm saying, that's shared (once) for all daemons
[1:46] <Tv|work> it's inside a if $start_mon
[1:47] <sagewk> that's just the master key
[1:47] <sagewk> shared by the monitors
[1:47] <Tv|work> sagewk: so the "mon." tries to signify "all monitors"?
[1:48] <yehudasa> tv: the format is <entity type>.<entity name>
[1:48] <yehudasa> for the monitors, the entity name is blank
[1:48] <Tv|work> ok
[1:48] <sagewk> it's just a name.. "mon." instead of "mon.sharedsecretkey" or whatever
[2:01] <Tv|work> hmm it seems there are three ways to make this library shared between tests.. 1) put it on the server (pain to develop changes to the library) 2) bundle with every test (not in source tree, but as part of "make") 3) make it an extra download (slight pain to develop changes to library)
[2:01] <Tv|work> i'm guessing that means 2) wins
[2:01] <cmccabe> tv: err, dumb question, but what is "this library?"
[2:01] <Tv|work> (every test gets a separate .tar.bz2)
[2:01] <cmccabe> tv: the autotest framework for ceph?
[2:01] <Tv|work> cmccabe: utilities to make starting ceph really convenient
[2:01] <Tv|work> in autotest tests
[2:01] <Tv|work> because i don't want to copy-paste this all over the place
[2:02] <cmccabe> tv: why not add it to the ceph codebase?
[2:02] <Tv|work> cmccabe: that'd be one way to do the "extra download"
[2:02] <Tv|work> cmccabe: the ceph tree is not normally on the test workers
[2:03] <cmccabe> tv: so normally it's just the output of make install
[2:03] <Tv|work> yeah; way faster
[2:03] <cmccabe> tv: why not add it to the autotest repo
[2:03] <Tv|work> i tried compiling ceph in the test, but stopped after a few iterations
[2:03] <Tv|work> cmccabe: that's 1)
[2:03] <cmccabe> tv: it sounds like our tests are going to live in the autotest repo anyway
[2:03] <Tv|work> don't think so
[2:04] <Tv|work> that makes changing them a *pain*
[2:04] <cmccabe> tv: we sort of had this conversation yesterday but I think I was confused. If I want to write a test, what repository am I writing it in?
[2:04] <Tv|work> it doesn't exist yet
[2:04] <Tv|work> i just have a directory i fiddle with
[2:05] <Tv|work> a single test gets served to autotest as .tar.bz2 over http
[2:05] <Tv|work> that makes it easy to develop tests; just give autotest url to your own .tar.bz2
[2:06] <cmccabe> tv: looking at history...
[2:06] <Tv|work> i'm thinking all the tests would go in a repo (separate or ceph.git), then there'd be some sort of "make" to build the tarballs
[2:06] <cmccabe> (03:34:42 PM) Tv|work: autotest historically bundles its tests in its own source tree
[2:06] <cmccabe> (03:34:50 PM) cmccabe: tv: I see
[2:06] <cmccabe> (03:34:54 PM) Tv|work: because it was built to test the linux kernel
[2:06] <Tv|work> then making gitbuilder provide the "make install tarballs"
[2:06] <cmccabe> tv: so we're going to *not* do that, and instead create a third repo then?
[2:07] <Tv|work> and that enables autotest to do e.g. nightly test runs automatically
[2:07] <Tv|work> cmccabe: putting the tests in autotest.git is really cumbersome; any change means you need to deploy things on the server
[2:07] <cmccabe> tv: I think you're on the right track, just trying to understand
[2:08] <Tv|work> i have two slight question marks on a diagram here
[2:08] <Tv|work> sorry if i'm talking out loud a bit too much about this ;)
[2:08] <cmccabe> :)
[2:08] <cmccabe> anyway. Yes, not having to restart the autotest server all the time sounds like pretty good thing.
[2:09] <cmccabe> also consider using rsync to transfer test files, so that you don't end up copying a dozen identical 50-MB binaries each time
[2:09] <Tv|work> example #1: to test your own hacked ceph against standard tests, you'll submit this as autotest "control file": job.cache=False; job.run_test('http://something/standard/a-single-test.tar.bz', ceph='http://my-hacked-ceph/ceph-install.tgz')
[2:10] <Tv|work> example #2: to test autobuilt ceph against your hacked tests, you'll submit this as autotest "control file": job.cache=False; job.run_test('http://my-hacked-tests/a-single-test.tar.bz')
[2:10] <cmccabe> tv: is ceph.conf in that tar file?
[2:10] <Tv|work> example #3: to test autobuilt ceph against standard tests, you'll submit job.cache=False; job.run_test('http://something/standard/a-single-test.tar.bz')
[2:11] <Tv|work> etc
[2:11] <Tv|work> cmccabe: it's either in a-single-test.tar.bz, or created by the python in there
[2:11] <cmccabe> tv: ok
[2:12] <Tv|work> i think that'll work, i just need to write the standard ceph-binary & test tarball providers
[2:13] <Tv|work> because i really want autotest to run a test suite against master every night
[2:13] <cmccabe> tv: yes!
[2:13] <Tv|work> fwiw as long as you provide both tarball urls manually, it'll work already
[2:15] <cmccabe> should be pretty easy to write a script to make submitting a test a one-line process
[2:15] <Tv|work> yeah it's more the need to host things on http that makes it ugly
[2:15] <Tv|work> i wish i could just bundle it in the autotest submission, but nooo
[2:17] <cmccabe> well, I'm sure there's some python module that's the equivalent of ruby's mongrel
[2:17] <Tv|work> oh yeah that's not the issue, it's just that you need to leave it running etc
[2:17] <Tv|work> it's not a fire-and-forget thing anymore
[2:17] <Tv|work> where your test might actually sit in a queue for 3 hours before it gets run
[2:17] <Tv|work> you can't take the http server down
[2:18] <Tv|work> and sure you could solve that by adding a layer of abstraction but it gets ugly pretty fast
[2:19] <cmccabe> well, we can have a little script that copies the files to some central server running apache
[2:19] <cmccabe> probably want to prepend a date to the bundles to keep things sorted
[2:19] <Tv|work> yeah right now i'm thinking of writing a bit of python that serves .tar.bz2 on the fly from a git repo
[2:19] <Tv|work> so all you'd need to do is make sure your stuff is pushed to some branch, not even master
[2:19] <Tv|work> you wouldn't even need to tar it yourself
[2:19] <cmccabe> I guess git does have some kind of http functionality
[2:19] <cmccabe> but I thought you were talking about binaries, not code
[2:20] <Tv|work> not good enough -- already checked ;)
[2:20] <Tv|work> oh this is more about the test definitions
[2:20] <Tv|work> the ceph thing you're gonna have to set up yourself, as it really needs to be compiled
[2:20] <cmccabe> I guess git would be good for storing the tests in a concise format
[2:20] <Tv|work> we can have a place to host those too, so it's a "make install && ./shiver-me-tarball"
[2:20] <cmccabe> but honestly the binary package will dominate space consumption
[2:20] <Tv|work> oh i'm not gonna put anything binary in git
[2:21] <cmccabe> yeah, the binaries are the real space hogs.
[2:21] <Tv|work> not even the test .tar.bz2
[2:21] <Tv|work> i'll create that on the fly
[2:21] <Tv|work> it's pretty easy
[2:21] <Tv|work> that way anything in the "test repo" is ready to run at any moment
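Tv's plan is a bit of Python that serves a .tar.bz2 built on the fly. The bundling half of that idea can be sketched with the standard library; this version packs a plain directory, whereas the real thing would export a branch from the git repo (e.g. via `git archive`) instead:

```python
import io
import os
import tarfile

def bundle_dir(path: str) -> bytes:
    # Build a .tar.bz2 entirely in memory from a directory tree, so it can be
    # handed straight to an HTTP response without touching disk.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        tar.add(path, arcname=os.path.basename(path.rstrip("/")))
    return buf.getvalue()
```

Because the archive is produced on demand, nothing binary ever needs to be committed, and whatever is currently in the test directory (or branch) is always what gets served.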
[2:21] <cmccabe> well, I'm looking forward to trying it out
[2:22] <cmccabe> sounds pretty good
[2:22] <Tv|work> so worst case you have your little wip-fiddle-with-it branch, and you keep git commit --amending and git push origin +wip-fiddle-with-it'ing, between autotest submits
[2:23] <Tv|work> alright till tomorrow
[2:31] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[3:05] * mnigh (~mnigh@99-72-217-5.lightspeed.stlsmo.sbcglobal.net) has joined #ceph
[3:30] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:36] * cmccabe (~cmccabe@ has left #ceph
[4:00] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[4:31] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[5:00] * greglap (~Adium@ has joined #ceph
[6:00] * greglap (~Adium@ Quit (Quit: Leaving.)
[6:04] * mnigh (~mnigh@99-72-217-5.lightspeed.stlsmo.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[6:10] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[7:55] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[8:10] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:19] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[8:35] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[8:43] * gregorg (~Greg@ Quit (Ping timeout: 480 seconds)
[8:47] * gregorg (~Greg@ has joined #ceph
[8:51] * Jiaju (~jjzhang@ Quit (Ping timeout: 480 seconds)
[9:02] * allsystemsarego (~allsystem@ has joined #ceph
[9:09] * gregorg_taf (~Greg@ has joined #ceph
[9:09] * gregorg (~Greg@ Quit (Read error: Connection reset by peer)
[9:09] * gregorg_taf (~Greg@ Quit (Read error: Connection reset by peer)
[9:10] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:32] * Jiaju (~jjzhang@ has joined #ceph
[9:32] * uwe (~uwe@ has joined #ceph
[10:08] * gregorg (~Greg@ has joined #ceph
[10:19] * Yoric (~David@ has joined #ceph
[12:00] * mnigh (~mnigh@99-72-217-5.lightspeed.stlsmo.sbcglobal.net) has joined #ceph
[12:08] * mnigh (~mnigh@99-72-217-5.lightspeed.stlsmo.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[12:47] * hijacker (~hijacker@ Quit (Remote host closed the connection)
[12:47] * hijacker (~hijacker@ has joined #ceph
[12:57] * verwilst (~verwilst@router.begen1.office.netnoc.eu) has joined #ceph
[13:39] * Yoric_ (~David@ has joined #ceph
[13:39] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[13:39] * Yoric_ is now known as Yoric
[14:08] * Psi-Jack_ (~psi-jack@yggdrasil.hostdruids.com) Quit (Quit: Leaving)
[14:14] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:27] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[15:08] * mnigh (~mnigh@75-128-161-124.static.stls.mo.charter.com) has joined #ceph
[15:21] * gregorg_taf (~Greg@ has joined #ceph
[15:21] * gregorg (~Greg@ Quit (Read error: Connection reset by peer)
[16:13] * Yoric (~David@ Quit (Quit: Yoric)
[16:36] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[16:37] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[16:51] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[16:58] * greglap (~Adium@static-72-67-79-74.lsanca.dsl-w.verizon.net) has joined #ceph
[17:02] * todin (tuxadero@kudu.in-berlin.de) Quit (Quit: leaving)
[17:07] * greglap1 (~Adium@static-72-67-79-74.lsanca.dsl-w.verizon.net) has joined #ceph
[17:07] * greglap (~Adium@static-72-67-79-74.lsanca.dsl-w.verizon.net) Quit (Read error: Connection reset by peer)
[17:37] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:42] * greglap1 (~Adium@static-72-67-79-74.lsanca.dsl-w.verizon.net) Quit (Quit: Leaving.)
[17:54] * greglap (~Adium@ has joined #ceph
[17:58] * verwilst (~verwilst@router.begen1.office.netnoc.eu) Quit (Quit: Ex-Chat)
[18:08] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:15] * uwe (~uwe@ Quit (Quit: sleep)
[18:16] * greglap1 (~Adium@ has joined #ceph
[18:22] * greglap (~Adium@ Quit (Ping timeout: 480 seconds)
[18:32] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:34] <Tv|work> i wonder how much everyone will hate me if i name this ceph autotest helper library "teuthology"...
[18:34] <Tv|work> "The study of cephalopods is a branch of malacology known as teuthology." says wikipedia ;)
[18:35] <sagewk> hehe
[18:35] <sagewk> works for me :)
[18:41] * greglap1 (~Adium@ Quit (Quit: Leaving.)
[18:47] <wido> hi
[18:47] <wido> I'm seeing something strange. When my OSD is busy, dpkg freezes. For example, apt-get install sysstat blocks forever
[18:48] <wido> seen this before in the past, but never paid attention to it
[18:48] <Tv|work> wido: sounds like it's IO starved, perhaps artificially so (kernel client bugs)
[18:48] <sagewk> huh. this is on the osd, with no client mounts on that machine?
[18:48] <wido> Tv|work: This machine is only running OSD's
[18:48] <wido> sagewk: Yes, only OSD's
[18:49] <sagewk> try cat /proc/$pid/stack for dpkg (or any children) to see what it's blocking on
[18:49] <Tv|work> wido: hmm.. are you out of memory perhaps? dpkg is relatively greedy on RAM? "vmstat 1" is useful
[18:49] <Tv|work> that 2nd one was not meant as a question -- dpkg *is* greedy on ram
[18:50] <wido> Hmm, it just finished. dpkg blocked for about 3 hours
[18:50] <Tv|work> "vmstat 1" is an awesome tool for asking "why is this slow"
[18:50] <wido> just got home, didn't check the machine until now
[18:50] <wido> Tv|work: If it happens again, i'll check
[18:51] <wido> btw, sagewk I've got the first Atom machine running. Dual Core Atom with 4GB RAM, Intel SSD and 4x2TB
[18:51] <Tv|work> sweet
[18:51] * Tv|work is now known as Tv
[18:52] <Tv> (shut up nickserv ;)
[18:53] <wido> Tv: http://zooi.widodh.nl/ceph/osd/
[18:54] <Tv> is the one on top a separate system disk?
[18:54] <wido> Tv: That is a SSD, used for system and journaling
[18:54] <wido> Intel X25-M 80GB
[18:54] <wido> brb, 10 min
[18:54] <Tv> ah ok so one ssd, 4x hdd; for some reason misread that as 4xSSD
[18:54] <Tv> it's silly how empty that box looks
[18:55] <sagewk> nice
[18:55] <Tv> it'll be interesting to know where the cpu/io ratio sweet spot is
[18:55] <Tv> if atom is beefy enough to exercise all that fully
[18:57] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:02] * cmccabe (~cmccabe@c-24-23-253-6.hsd1.ca.comcast.net) has joined #ceph
[19:02] <Tv> whee master plan for autotest diagrammed.. the rest is a Simple Matter of Programming
[19:17] <wido> Tv: It's a need box. The Atom only uses 20W. The setup you see here would use about 3kW of energy when you put 44 of them in a 19" rack
[19:17] <wido> On 230V that is 16A, normal here for a 19" rack
[19:17] <wido> or is it "neet"?
[19:21] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Remote host closed the connection)
[19:22] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:22] <Tv> "neat"?
[19:22] <Tv> wido: yeah and it looks like you could cram two of those in that space, if you wanted..
[19:23] <wido> Tv: "neat" indeed :) We are working on some hardware for the OSD's to build a really cheap large scaling cluster. The Atom seems really suited due to its power consumption
[19:26] <wido> sagewk: I see the Ceph packages are in Ubuntu Natty (11.04), but it does not depend on the tcmalloc() lib. Is that on purpose?
[19:27] <cmccabe> wido: tcmalloc is optional
[19:27] <wido> cmccabe: Yes, but it will give much better memory usage. But can you link the binary to tcmalloc when it was not built against it?
[19:27] <sagewk> hmm that's probably an oversight.. we should use tcmalloc on newer distributions
[19:28] <wido> sagewk: Ah, ok. I'll open a bug at launchpad
[19:28] <cmccabe> sagewk, wido: yeah maybe there is some package management foo that we're not doing
[19:28] <sagewk> the question is if it's wrong in the debian/control in ceph.git, or if clint changed it when he did the ubuntu package
[19:29] <Tv> sagewk: to use tcmalloc, debian/control should have a build-dependency on it; it doesn't
[19:29] <gregaf> wido: tcmalloc is only a linking thing
[19:29] <gregaf> although if you built without it then some of the heap profiler stuff won't be accessible through Ceph
[19:29] <Tv> sagewk: and then debian/rules should say ./configure --with-tcmalloc, and it doesn't
[19:30] <wido> gregaf: Yes, I thought so. But if the Ubuntu packages don't use it, it would give a bad user experience
[19:30] <sagewk> if it's present it'll use it, so the build-dep should be enough
[19:30] <cmccabe> gregaf: yeah, there are some ifdefs in there. I don't think they're super-important, but still.
[19:30] <wido> for the early adopters
[19:30] <sagewk> will fix that up. thanks wido!
[19:30] <Tv> sagewk: that's considered bad form, as I recall it
[19:30] <wido> sagewk: No bug to open at launchpad?
[19:30] <Tv> sagewk: makes it more like to change in surprising ways
[19:30] <Tv> *likely
[19:30] <sagewk> hmm, actually libgoogle-perftools-dev is already in build-depends.
[19:31] <sagewk> are you sure it's just depending on libgooglesomething and not libtcmalloc_minimal or whatever?
[19:31] <sagewk> tv: yeah, can add --with-tcmalloc.
[19:32] <Tv> sagewk: runtime lib deps should be automatically discovered
[19:32] <Tv> 10:30?
[19:32] <sagewk> right, it'll break the package build this way if the build-dep is wrong tho
[19:32] <sagewk> yep, skype!
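Putting the thread together, the fix Tv and sagewk converge on amounts to two packaging declarations. This is a sketch of the shape of the change only, not the actual contents of ceph.git's debian/ directory:

```
# debian/control (sketch): declare the build-dependency so the tcmalloc
# link is reproducible instead of depending on what happens to be installed
Build-Depends: debhelper, autotools-dev, libgoogle-perftools-dev, [...]

# debian/rules (sketch): request tcmalloc explicitly so the build fails
# loudly if the dependency is missing, rather than silently omitting it
./configure --with-tcmalloc
```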
[19:49] <Tv> wido, cmccabe, sagewk: fwiw last ceph deb i see in http://archive.ubuntu.com/ubuntu/pool/universe/c/ceph/ is 0.21-0ubuntu1 and it doesn't have libgoogle-perftools-dev as build-dep
[19:56] <Tv> but i guess that's just old
[20:06] <wido> Tv: https://launchpad.net/ubuntu/+source/ceph/+bugs
[20:06] <wido> They are working on a new version
[20:07] <Tv> whee autobuilt binary packages: http://ceph.newdream.net/gitbuilder-i386/tarball/sha1/
[20:07] <Tv> inching closer to test automation
[20:12] <sjust> :)
[20:57] <Tv> whee http://ceph.newdream.net/gitbuilder-i386/tarball/ref/
[20:57] <Tv> (don't trust the links, but if you do the url manually, it'll work)
[21:00] <Tv> (rm'ed the tarball to test things, it'll show up again after a successful run)
[21:01] <Tv> alright that'll provide binaries for automated (non-developer controlled) tests; next up, serve the tests as a .tar.bz2
[21:01] <cmccabe> tv: wiki page?
[21:01] <Tv> cmccabe: not yet..
[21:59] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[22:36] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[22:37] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) has left #ceph
[22:49] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[23:07] * uwe (~uwe@mb.uwe.gd) has joined #ceph
[23:16] <Tv> gitbuilder going down to get a bigger disk
[23:40] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[23:47] * Juul (~Juul@static.88-198-13-205.clients.your-server.de) has joined #ceph
[23:48] * uwe (~uwe@mb.uwe.gd) Quit (Quit: sleep)
[23:49] <Tv> gitbuilder is back up and will catch up again, slowly

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.