#ceph IRC Log

IRC Log for 2011-03-16

Timestamps are in GMT/BST.

[0:14] <Tv> sooo.. even with all the log to files stuff going on, something *still* writes to stderr for some stuff?
[0:15] <Tv> this makes no sense
[0:16] <cmccabe> tv: there are some hard-coded calls to cerr still lingering
[0:16] <Tv> you mean derr?
[0:17] <cmccabe> tv: no, I mean cerr
[0:17] <cmccabe> tv: as in, hard-coded stderr
[0:17] <Tv> well this message i'm staring at goes to derr
[0:17] <Tv> and it shows up both in log and stderr
[0:17] <cmccabe> tv: as I recall, derr writes both to stderr and to the log prior to daemonize
[0:18] <Tv> there is no daemonize :(
[0:18] <cmccabe> tv: this was done so that users didn't have to always open a log file to find out why daemonize failed
[0:18] <cmccabe> tv: well, you can supply --log_to_stderr=0 if you want to force no logging to stderr
[0:19] <gregaf> logging to stderr on every run is a lot more annoying than having to open a log file every 20th run, if you ask me :/
[0:19] <Tv> <bitchy>why the hell does this have to be so non-unixy</bitchy>
[0:20] <cmccabe> gregaf: it only logs to stderr if it fails before daemonize
[0:20] <gregaf> ....
[0:20] <Tv> or if it thinks you haven't seen enough yellow that day yet
[0:20] <gregaf> it only logs if it fails before daemonize?
[0:20] <cmccabe> gregaf: and it's always logged to stderr. I just changed it from being hard-coded to using derr
[0:20] <gregaf> you mean it only logs before it daemonizes
[0:20] <cmccabe> gregaf: back when it was hard-coded to stderr, the error message would *not* show up in the log
[0:20] <gregaf> it always logs if you tell it to
[0:21] <cmccabe> gregaf: it always logs to the log file
[0:21] <gregaf> yes
[0:21] <cmccabe> gregaf: it logs to stderr if daemonize has not yet been called
[0:21] <gregaf> yeah, I don't think it did that previously
[0:21] <cmccabe> gregaf: come on, that's really not that complex
[0:21] <cmccabe> gregaf: yes, there have been calls to derr in there since the very beginning.
[0:21] <cmccabe> gregaf: like if cosd failed to start for some reason.
[0:21] <gregaf> cmccabe
[0:21] <gregaf> 4:20
[0:21] <gregaf> gregaf: it only logs to stderr if it fails before daemonize
[0:22] <gregaf> is what I was responding to
[0:23] <cmccabe> ok. There was a time when derr did not exist.
[0:23] <cmccabe> It was a dark and scary time.
[0:23] <cmccabe> Some errors went to cerr. Some went to dout(0)
[0:23] <cmccabe> You never knew whether your error would end up in the log file or not.
[0:24] <cmccabe> Then, I created derr. derr *always* logs to the log file if one has been configured.
[0:24] <cmccabe> *sometimes*, derr logs to stderr. Again this is configurable and you can turn it off
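
(A rough illustration of the derr behavior described above: always append to the configured log file, and mirror to stderr only while that option is on. The names below are stand-ins for Ceph's real configuration, not its actual symbols.)

    #include <fstream>
    #include <iostream>
    #include <sstream>

    std::ofstream g_log_file;      // assumed stand-in: opened from the configured log path
    bool g_log_to_stderr = true;   // assumed stand-in: e.g. cleared by --log_to_stderr=0

    // Always append to the log file if one is open; mirror to stderr only
    // while g_log_to_stderr is set (as it is before daemonize).
    #define derr_sketch(msg) do {                                \
        std::ostringstream oss_;                                 \
        oss_ << msg << "\n";                                     \
        if (g_log_file.is_open()) g_log_file << oss_.str();      \
        if (g_log_to_stderr) std::cerr << oss_.str();            \
    } while (0)

    int main() {
        g_log_file.open("osd.log");
        derr_sketch("failed to bind to port " << 6800);  // lands in both sinks
    }
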
[0:26] <cmccabe> tv: the reason why the logging system is "non-unixy" is because unix doesn't have log levels or multiple sinks
[0:26] <Tv> cmccabe: says who?-)
[0:26] <cmccabe> tv: unix has stdout and stderr
[0:27] <cmccabe> tv: yes, you can use pipes and tee and all that to simulate multiple sinks. But that's not generally done with daemons for a variety of good reasons.
[0:27] <cmccabe> tv: I guess a purist might say that we should just replace the whole dout() system with UNIX's built-in syslog().
[0:27] <Tv> let me just say that at this point, my natural reaction to many of the ceph codebase problems is ripping out code
[0:28] <Tv> syslog is pretty bad
[0:28] <cmccabe> tv: but syslog tends to lose messages when you output a lot of messages
[0:28] <cmccabe> tv: its number of priority levels is *very* limited
[0:28] <Tv> i kinda liked the daemontools/runit design, but e.g. upstart doesn't support that at all
[0:28] <cmccabe> tv: and it cannot use shared-memory or other optimizations (it uses a unix domain socket)
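
(For comparison, the POSIX syslog API under discussion looks like this; it offers only eight priority levels, LOG_EMERG through LOG_DEBUG, and on Linux routes every message through the /dev/log unix domain socket, which is cmccabe's point about flexibility.)

    #include <syslog.h>

    int main() {
        openlog("cosd", LOG_PID, LOG_DAEMON);   // identity prepended to each message
        syslog(LOG_ERR, "failed to start: error %d", 95);
        syslog(LOG_DEBUG, "fine-grained detail");  // LOG_DEBUG is the lowest of the 8 levels
        closelog();
        return 0;
    }
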
[0:29] <Tv> you can't use a postprocessor for debug-worthy log verbosity anyway, that's too late
[0:29] <Tv> suppress at the origin
[0:29] <Tv> holy shit, remind me to not let you "optimize" ceph logging
[0:29] <cmccabe> tv: actually, you can meaningfully postprocess logs
[0:29] <cmccabe> tv: for example LTTng has a circular buffer and only copies certain of those events out to disk
[0:30] <Tv> by the time you've stringified your debug junk, you've lost already
[0:30] <cmccabe> tv: this is argument by assertion.
[0:30] <cmccabe> tv: in most cases, that is true. But not always.
[0:31] <cmccabe> tv: anyway, I don't understand why "optimize" is in the scare quotes
[0:31] <Tv> http://i.qkme.me/usx.jpg
[0:31] <cmccabe> tv: logging consistently comes up as a top cpu hog when I profile
[0:31] <Tv> because there's a frigging lock on dout
[0:32] <Tv> and lots of string manipulation
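
(Tv's "suppress at the origin" point, sketched: gate on the debug level before any formatting happens, so a disabled message costs one integer compare and zero string work. The global below is an assumed stand-in, not Ceph's real dout implementation.)

    #include <iostream>
    #include <sstream>

    int g_debug_level = 0;  // assumed stand-in for the configured debug level

    // The level test runs before any formatting, so a suppressed message
    // does no stringification at all. (Real dout additionally serializes
    // on a lock around the shared log stream, the cost complained about above.)
    #define dout_sketch(level, msg) do {        \
        if ((level) <= g_debug_level) {         \
            std::ostringstream oss_;            \
            oss_ << msg << "\n";                \
            std::clog << oss_.str();            \
        }                                       \
    } while (0)

    int main() {
        dout_sketch(0, "always shown at level 0");
        dout_sketch(20, "suppressed: never stringified");  // 20 > g_debug_level
    }
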
[0:32] <DJLee> guys, when does mds receive/send lots of traffic? i don't see many occasions?
[0:32] <Tv> and most ceph runs are still with heavy logging on
[0:32] <DJLee> or mds to osd traffic?
[0:32] <gregaf> what do you mean lots of traffic?
[0:32] <Tv> DJLee: the whole idea is that the metadata operations are comparatively light, and osds take the brunt
[0:32] <sjust> DJLee: should be mostly during heavy metadata operations
[0:33] <cmccabe> tv: I get kind of frustrated when people "diagnose" performance problems without using a profiler
[0:33] <gregaf> DJLee: hopefully, it doesn't see a lot of traffic
[0:33] <cmccabe> tv: it's really a bad practice and we should avoid it
[0:33] <cmccabe> tv: until we have actual data, any speculation about performance should be prefaced by "in my opinion..."
[0:33] <gregaf> as Tv says, hopefully most traffic is delegated to the client cache and to the OSDs
[0:33] <Tv> sjust: http://autotest.ceph.newdream.net/afe/#tab_id=view_host&object_id=sepia24.ceph.dreamhost.com
[0:34] <Tv> cmccabe: even tcp/ip over localhost does 6+ Gbps easily, unix domain sockets are faster, file writes are even faster; that's not your bottleneck
[0:35] <DJLee> cool, this is what i have in my ballpark figure,
[0:35] <DJLee> client -> osd (heavy),
[0:35] <DJLee> but client -> mds (light, ok, except lots of metadata),
[0:35] <DJLee> mds to osd (almost no traffic?),
[0:35] <DJLee> osd to osd (heavy when leveraging replication?)
[0:35] <gregaf> well, the MDS maintains a streaming write to the OSDs, but the volume of that will vary a lot depending on workload and hopefully never gets too high
[0:36] <cmccabe> tv: I never said it was the bottleneck
[0:36] <sjust> Tv: woot
[0:36] <cmccabe> tv: I just pointed out that syslog was less flexible because it had to use unix domain sockets
[0:36] <Tv> cmccabe: i have not endorsed syslog at any point
[0:36] <cmccabe> tv: never said you did either
[0:37] <Tv> *sigh*
[0:39] <Tv> whee sepia24 can run a simple shell command when told to do so.. http://autotest.ceph.newdream.net/afe/#tab_id=view_job&object_id=207
[0:39] <Tv> sjust: moar hardware!
[0:39] <sjust> yep, working on a script
[0:40] <Tv> i wonder if sepia24 would complete this test run faster than the vms will.. the vms have been at it for a while now ;)
[0:43] * midnightmagic (~nouser@S0106000102ec26fe.gv.shawcable.net) Quit (Quit: midnightmagic)
[0:48] <Tv> except i need >=2 to run this, or i risk the same-box memory pressure bug
[0:52] * chip (~chip@brma.tinsaucer.com) Quit (Server closed connection)
[0:52] * chip (~chip@brma.tinsaucer.com) has joined #ceph
[1:02] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[1:11] * rajeshr (~Adium@98.159.94.26) Quit (Ping timeout: 480 seconds)
[1:16] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[1:19] * greglap (~Adium@166.205.140.219) has joined #ceph
[1:20] * rajeshr (~Adium@98.159.94.26) has joined #ceph
[1:27] * rajeshr (~Adium@98.159.94.26) Quit (Quit: Leaving.)
[1:37] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[1:39] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[1:47] * Meths (rift@91.106.202.83) Quit (Server closed connection)
[1:48] * Meths (rift@91.106.202.83) has joined #ceph
[2:03] * greglap (~Adium@166.205.140.219) Quit (Read error: Connection reset by peer)
[2:24] * eternaleye_ is now known as eternaleye
[2:31] * cmccabe (~cmccabe@m4c0536d0.tmodns.net) has left #ceph
[3:15] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[3:34] <neurodrone> Is it possible to try running Ceph on virtual nodes created entirely using a single VM on a single machine?
[3:37] * Juul (~Juul@nat-sonicnet.noisebridge.net) has joined #ceph
[4:14] <bchrisman> neurodrone: sure is.. and it's a good test platform.. not great for performance or availability though.. :)
[4:14] <bchrisman> or rather.. multiple VMs...
[4:16] * rajeshr (~Adium@99-7-122-114.lightspeed.brbnca.sbcglobal.net) has joined #ceph
[4:18] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[4:22] <neurodrone> bchrisman: True that. Although, I am just trying a test implementation myself and I only have a single machine at my disposal at the moment. :)
[4:22] <bchrisman> yeah.. should work between a few VMs.
[4:23] <bchrisman> recommended the clients be on a separate VM from the osd/mds processes.
[4:23] <neurodrone> bchrisman: I was wondering if there was a way to establish connection between multiple virtual nodes through VBox and then ssh into each one of them to operate on them.
[4:23] <neurodrone> I see.
[4:23] <bchrisman> not sure about vbox.. sorry..
[4:24] <neurodrone> bchrisman: That's okay. I will just try and see how it works. Just wanted to know if it was possible to do that. :)
[4:36] <greglap> If you're just using one node you might as well use the vstart script — if you have a fair bit of memory then deadlock is pretty unlikely
[4:36] <greglap> neurodrone:
[4:39] <neurodrone> greglap: Ah. Well, I am just using one physical system to test the workings of Ceph, so I am planning to host 3 nodes each as an MDS/OSD and monitor to simulate the workings of Ceph. I am not too worried about the performance benchmarks right now, I just need to get the system up and running primarily. :)
[4:40] <greglap> well you can run more than one daemon on a single node if you like, and it's probably easier/cheaper than trying to put them all in their own vm
[4:41] <greglap> in fact you can do that with vstart, although that won't teach you anything about how to admin the system
[4:41] <greglap> *shifts uncomfortably*
[4:41] <neurodrone> I agree. That is what I am planning to do. Have 3 different VM instances running, one for each node, each running the daemon service it is responsible for.
[4:42] <neurodrone> I will check up on vstart. Anything that helps. ;)
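
(For reference, a vstart-style single-node bringup is roughly as follows; the flags and environment variables here are hedged from later versions of the script and may differ in this era's tree, so check ./vstart.sh itself for specifics.)

    cd src
    MON=1 MDS=1 OSD=3 ./vstart.sh -n -d   # assumed flags: -n fresh cluster, -d debug output
    ./ceph -s                             # then query cluster status with cephtool
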
[5:53] * Juul (~Juul@nat-sonicnet.noisebridge.net) Quit (Ping timeout: 480 seconds)
[7:16] * MK_FG (~MK_FG@188.226.51.71) Quit (Quit: o//)
[7:19] * MK_FG (~MK_FG@188.226.51.71) has joined #ceph
[8:06] * allsystemsarego (~allsystem@188.25.130.175) has joined #ceph
[8:20] * ooolinux (~bless@203.114.244.241) has joined #ceph
[8:21] * ooolinux (~bless@203.114.244.241) Quit (autokilled: This host may be infected. Mail support@oftc.net with questions. BOPM (2011-03-16 07:21:00))
[8:23] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:43] * neurodrone_ (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[8:43] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Read error: Connection reset by peer)
[8:43] * neurodrone_ is now known as neurodrone
[9:27] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:38] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: neurodrone)
[10:15] * Yoric (~David@213.144.210.93) has joined #ceph
[12:28] * lxo (~aoliva@201.82.54.5) Quit (Quit: later)
[12:42] * lxo (~aoliva@201.82.54.5) has joined #ceph
[13:14] * lxo (~aoliva@201.82.54.5) Quit (Ping timeout: 480 seconds)
[13:17] * lxo (~aoliva@201.82.54.5) has joined #ceph
[14:59] * rajeshr (~Adium@99-7-122-114.lightspeed.brbnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[15:02] * rajeshr (~Adium@99-7-122-114.lightspeed.brbnca.sbcglobal.net) has joined #ceph
[15:13] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[15:15] * rajeshr (~Adium@99-7-122-114.lightspeed.brbnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[15:15] * lxo (~aoliva@201.82.54.5) Quit (Read error: Connection reset by peer)
[15:16] * lxo (~aoliva@201.82.54.5) has joined #ceph
[15:17] * rajeshr (~Adium@99-7-122-114.lightspeed.brbnca.sbcglobal.net) has joined #ceph
[15:18] * alexxy[home] (~alexxy@79.173.81.171) Quit (Ping timeout: 480 seconds)
[15:32] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: Leaving)
[15:39] * Yoric (~David@213.144.210.93) Quit (Quit: Yoric)
[15:42] * __jt___ (~james@jamestaylor.org) Quit (Server closed connection)
[15:42] * __jt__ (~james@jamestaylor.org) has joined #ceph
[15:46] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[15:55] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[15:56] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (Server closed connection)
[15:56] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[16:06] * greglap1 (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[16:06] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[16:10] * Yoric (~David@80.70.32.140) has joined #ceph
[16:24] * greglap1 (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[16:48] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[16:50] * greglap (~Adium@166.205.138.209) has joined #ceph
[17:00] * Yoric (~David@80.70.32.140) Quit (Quit: Yoric)
[17:02] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[17:05] <Tv> 03/15 17:17:54 DEBUG|base_utils:0106| [stdout] All operations completed A-OK!
[17:05] <Tv> wheee fsx completed overnight
[17:06] <Tv> and not even that much after i left... :(
[17:06] <Tv> 03/15 19:27:57 DEBUG|base_utils:0106| [stdout] All operations completed A-OK!
[17:06] <Tv> both 1-node and multinode clusters
[17:28] <sjust> cool!
[17:40] * greglap (~Adium@166.205.138.209) Quit (Quit: Leaving.)
[17:48] * Yoric (~David@213.144.210.93) has joined #ceph
[17:56] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[17:59] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:03] <Tv> sagewk: i'm sure you can come up with the same google search as i did, but it seems http://www.mediawiki.org/wiki/Manual:Combating_spam http://www.mediawiki.org/wiki/Anti-spam_features look like they just need someone to admin the mediawiki properly..
[18:04] * cmccabe (~cmccabe@208.80.64.121) has joined #ceph
[18:06] <sagewk> tv: i guess my real question is who
[18:06] <Tv> yes
[18:10] <sagewk> no normal meeting, i have a stupid thing
[18:14] <Tv> whee 6-node real hardware fsx running
[18:14] <cmccabe> tv: that's cool. which nodes are clients?
[18:15] <Tv> just one client for now
[18:15] <cmccabe> tv: k
[18:16] * Yoric (~David@213.144.210.93) Quit (Read error: Connection reset by peer)
[18:17] * Yoric (~David@213.144.210.93) has joined #ceph
[18:24] <bchrisman> do_autogen.sh is a build script vaguely specific to the ceph group's internal testing?
[18:25] <cmccabe> bchrisman: basically, yes
[18:25] <cmccabe> bchrisman: you don't have to use it; I just got tired of copying and pasting the same arguments to autogen.sh over and over
[18:26] <bchrisman> cmccabe: ahh cool.. yeah.. I'll pull some of that into my build scripting… notice the gtk requirement it includes … there a gui component to ceph I haven't yet run across?
[18:27] <cmccabe> bchrisman: there is a gui version of cephtool
[18:27] <cmccabe> bchrisman: it was done by Michael McThrow originally and we integrated it
[18:27] <cmccabe> bchrisman: it's still a little rough
[18:28] <bchrisman> cmccabe: cephtool… haven't seen that… is it in git master??
[18:28] <cmccabe> it's called ./ceph
[18:28] <bchrisman> oh.. okay.. :)
[18:28] <Tv> gceph perhaps
[18:29] <cmccabe> yes, the graphical version is gceph
[18:29] <bchrisman> gceph uses the gtk stuff then, gotcha.
[18:30] * rajeshr (~Adium@99-7-122-114.lightspeed.brbnca.sbcglobal.net) Quit (Quit: Leaving.)
[18:30] <cmccabe> gtk is definitely optional
[18:36] <bchrisman> yeah cool
[18:37] <bchrisman> there a screenshot floating around of what the gtk stuff looks like?
[18:37] <bchrisman> Would be a bit of work to get it into our environment right now.. we just went through a round of package attrition to get things to the relatively-fast state that they are at present.
[18:41] <cmccabe> bchrisman: I'm not sure if there's a screenshot up at present
[18:43] <cmccabe> let me see if I can post one somewhere accessible
[18:43] <bchrisman> would make a nice addition to the wiki :)
[18:43] <cmccabe> in general, gceph and cephtool need more work to make sure they're displaying the things that administrators really care about
[18:43] * Yoric (~David@213.144.210.93) Quit (Quit: Yoric)
[18:45] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) Quit (Server closed connection)
[18:46] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[18:49] <cmccabe> bchrisman: http://ceph.newdream.net/wiki/Gceph
[18:50] <bchrisman> http://ceph.newdream.net/wiki/Gceph
[18:50] <bchrisman> erf.. thanks. ;)
[18:51] <cmccabe> we would like to move from the icon-based view to something that displayed a list
[18:51] <gregaf> cmccabe: gceph doesn't expose anything that's not in cephtool, does it?
[18:51] <cmccabe> gregaf: nope
[18:51] <gregaf> didn't think so…so what's with the PG grouping?
[18:52] <cmccabe> gregaf: it makes no sense
[18:52] <cmccabe> gregaf: it was just something that was in the original patch and hasn't been fixed up since then
[18:52] <gregaf> the grouping is meaningless?
[18:52] <cmccabe> gregaf: yep
[18:52] <gregaf> oh, lame :(
[18:52] <cmccabe> gregaf: also, PGs have names, rather than numbers
[18:53] <cmccabe> gregaf: essentially all 4 panes should look like the monitor pane, where it's a list
[18:53] <cmccabe> preferably with pretty colored icons in the list
[18:54] <gregaf> yeah, don't get rid of the pretty colored icons or it's just a space-eating view of ceph -s
[18:54] <gregaf> ;)
[18:54] <cmccabe> haha
[18:54] <cmccabe> well, it was always going to be that
[18:54] <gregaf> a good gui should show the data more usefully, though
[18:54] <gregaf> like it could display the association between OSDs and PGs much better than a list-based view can
[18:55] <cmccabe> yeah
[18:56] <bchrisman> yeah… I'm guessing focus will be on administration.. so you'll want to have things which have gone wonky be a bit prominent and all the healthy stuff be aggregated..
[18:57] * sagephone (~yaaic@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:58] <cmccabe> I'm not sure what the use case is really
[18:59] <cmccabe> you can't expect an admin to have this thing open all day
[18:59] <bchrisman> figuring some support guy would pop this kind of thing up to see why customers are screaming.. :)
[18:59] <cmccabe> maybe the most realistic use case is that the admin gets notified that something interesting is happening (through SNMP trap or some other yet-to-be-determined mechanism), and then opens gceph to take a closer look
[18:59] <Tv> eww snmp
[19:01] <cmccabe> I dunno, in the commercial router world SNMP seems to be a requirement for some people
[19:01] <cmccabe> I never had to implement it so I have no idea how good/bad it is as a standard
[19:02] <cmccabe> when your sysadmins are more technical, email-based notification mechanisms seem to be the cool thing to do
[19:07] * sagephone2 (~yaaic@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:07] * sagephone (~yaaic@ip-66-33-206-8.dreamhost.com) Quit (Read error: Connection reset by peer)
[19:23] <bchrisman> looking at docs, seems tcmalloc has some issues with x86_64 platform? also noticed all the precompiled packages are for the i686 platform… I guess I can use the various flags they recommend to get around the issues… is that what you guys are doing when building with tcmalloc? So far I've just been letting it not get used, but there's an autogen change that's turned it into a requirement.. so it's either fix or flag it all off.
[19:24] <cmccabe> bchrisman: I'm not aware of 64-bit issues with tcmalloc, do you have a url?
[19:24] <cmccabe> bchrisman: also tv's autogen change just affects defaults. You can turn off tcmalloc if you want
[19:26] <Tv> bchrisman: maybe you're thinking of the profiling or whatever it was, in the same package, that had trouble on 64-bit
[19:28] <Tv> "There are two issues that can cause program hangs or crashes on x86_64
[19:28] <Tv> 64-bit systems, which use the libunwind library to get stack-traces.
[19:28] <Tv> Neither issue should affect the core tcmalloc library; they both
[19:28] <Tv> affect the perftools tools such as cpu-profiler, heap-checker, and
[19:28] <Tv> heap-profiler."
[19:30] <bchrisman> yeah… saw that… figured it must be why they're not publishing binaries for google-perftools for x86_64 platform (RHEL)…
[19:30] * atgeek (~atg@please.dont.hacktheinter.net) Quit (Server closed connection)
[19:30] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[19:37] <gregaf> bchrisman: hmmm, is there a tcmalloc-minimal or something like that available?
[19:37] <gregaf> that just includes the core memory allocator and heap analyzer, IIRC
[19:37] * Juul (~Juul@131.243.47.30) has joined #ceph
[19:38] * Juul (~Juul@131.243.47.30) Quit (Remote host closed the connection)
[19:38] <gregaf> I'm not sure if our autogen setup would accept the minimal package or not, but if it doesn't we could make it do so
[19:39] * sagephone2 (~yaaic@ip-66-33-206-8.dreamhost.com) Quit (Read error: Connection reset by peer)
[19:39] <bchrisman> I don't see anything similar in any existing rhel packages…
[19:39] * Juul (~Juul@static.3.202.4.46.clients.your-server.de) has joined #ceph
[19:39] <bchrisman> is it a clear perf win for ceph to use tcmalloc?
[19:40] <gregaf> in our testing it pretty dramatically lowers memory usage
[19:40] <bchrisman> (ie, do I not bother doing perf testing without it?)
[19:40] <bchrisman> ahh ok
[19:40] <gregaf> or at least slows the growth
[19:40] <bchrisman> you guys mainly testing on debian?
[19:40] <gregaf> I don't think it should have a significant performance impact
[19:41] <gregaf> yeah, although I think we're moving towards Ubuntu once we get our test system running
[19:41] <bchrisman> I think the autogen/configure stuff basically needs a test program to compile/run successfully.. so I imagine a minimal would work if I can find/build it.
[19:41] <bchrisman> anyways.. thanks...
[19:42] <gregaf> yep
[19:42] <Tv> gregaf: tcmalloc is *designed* to improve performance though.. odd
[19:42] <Tv> bchrisman: sounds like you want ./configure --with-tcmalloc=check, no code changes needed
[19:42] <gregaf> it is, I just don't think that we're doing anything intensive enough for it to matter
[19:43] <Tv> bchrisman: we just made it "you really really should have this", now you need to explicitly say you don't
[19:43] <cmccabe> tv: performance has a lot of dimensions. You can have a low-memory heap allocator that takes more time to give you space, or a fast allocator that burns through your heap quicker
[19:43] <gregaf> nothing's changed in the code though, so if not having tcmalloc worked before it will continue to work now :)
[19:43] <cmccabe> tv: and there's the whole fragmentation discussion
[19:43] <bchrisman> Tv: yeah.. I've got that right now… wanted to figure out whether I need to go make tcmalloc work before doing any perf testing/profiling...
[19:43] <Tv> gregaf: not quite true, see a9afdca18e2264fff70b5aaf864ae9abb0436dca
[19:44] <Tv> you can pass in --with-tcmalloc=check or --without-tcmalloc, whatever is appropriate
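
(Concretely, the two invocations Tv mentions, assuming the usual autogen/configure flow; that =check means "use tcmalloc only if found" is inferred from the discussion, not documented here.)

    ./autogen.sh
    ./configure --with-tcmalloc=check   # inferred: use tcmalloc if present, else build without
    ./configure --without-tcmalloc      # explicitly opt out of the new default requirement
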
[19:44] <gregaf> Tv: I meant that the config change didn't signal any changes in how we used tcmalloc :)
[19:44] <bchrisman> yeah.. that's what I figured...
[19:45] <Tv> yeah
[19:45] <Tv> i think if we manage to optimize the locks better, you'll see a bigger difference between tcmalloc and no tcmalloc
[19:46] <Tv> or perhaps glibc built-in malloc has gotten better in the last few years
[19:46] <Tv> wouldn't surprise me if it had its own threadpooling logic, too
[19:46] <Tv> but yeah the concrete change gregaf says we're seeing is that ceph with tcmalloc uses less memory
[19:51] <bchrisman> always good to see that.
[20:21] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[20:34] * Guest34 (quasselcor@bas11-montreal02-1128535815.dsl.bell.ca) Quit (Server closed connection)
[20:34] * bbigras (quasselcor@bas11-montreal02-1128535815.dsl.bell.ca) has joined #ceph
[20:35] * bbigras is now known as Guest938
[20:40] * Juul (~Juul@static.3.202.4.46.clients.your-server.de) Quit (Quit: Leaving)
[21:49] * MarkN (~nathan@59.167.240.178) has joined #ceph
[22:00] <Tv> hehe.. /me scheduled an fsx run over 6 sepia machines every day at 9pm
[22:51] <Tv> aand now we run bonnie, fsx and ffsb all every night starting at 9pm
[22:57] * sjust (~sam@ip-66-33-206-8.dreamhost.com) Quit (Server closed connection)
[22:57] * sjust (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[23:20] * rajeshr (~Adium@98.159.94.26) has joined #ceph
[23:33] <Tv> i'm adding tests faster than sepia can run them (and thus check for my typos)!
[23:37] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.