#ceph IRC Log


IRC Log for 2012-04-27

Timestamps are in GMT/BST.

[0:00] <nhm> I also tried increasing objecter inflight ops and objecter inflight op bytes which seemed to degrade performance (but maybe it was just general degradation)
[0:00] * s[X]_ (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[0:01] <nhm> Those were ones that Yehuda mentioned yesterday. Any other fun things to tweak?
[0:02] * deam (~deam@dhcp-077-249-088-048.chello.nl) Quit (Ping timeout: 480 seconds)
[0:04] <sagewk> ms_dispatch_throttle_bytes, filestore_queue_{max,committing}_{ops,bytes}
[0:05] <sagewk> journal_max_write_{bytes,entries}
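The throttles sagewk lists would live in the `[osd]` section of ceph.conf. A hypothetical sketch, with the brace-expansions written out; the values here are illustrative placeholders, not recommendations:

```ini
; Sketch of the tunables mentioned above, expanded into ceph.conf form.
; All values are illustrative only.
[osd]
    ms dispatch throttle bytes = 104857600        ; 100 MB of queued dispatch
    filestore queue max ops = 500
    filestore queue max bytes = 104857600
    filestore queue committing max ops = 500
    filestore queue committing max bytes = 104857600
    journal max write bytes = 10485760            ; per journal write
    journal max write entries = 100
```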
[0:05] * deam (~deam@dhcp-077-249-088-048.chello.nl) has joined #ceph
[0:11] <nhm> sagewk: ok, will try those too. I'm going to try just stock right now with debug up and watching the network for some light evening reading material. :P
[0:13] <nhm> the megaraid_sas module on these systems is acting a bit funny too. It doesn't seem to be recognizing the max_sectors module param.
[0:14] <nhm> max_hw_sectors_kb seems to be limited to 128 with this firmware/driver combo.
[0:16] <darkfader> nhm: check if readahead_kb is also limited
[0:16] <darkfader> i saw that in fujitsu servers with lsi raid
[0:18] <nhm> darkfader: I was able to up read_ahead_kb from 128 to 256 without problem.
[0:18] * aliguori (~anthony@ Quit (Remote host closed the connection)
[0:21] * deam_ (~deam@dhcp-077-249-088-048.chello.nl) has joined #ceph
[0:24] * deam (~deam@dhcp-077-249-088-048.chello.nl) Quit (Ping timeout: 480 seconds)
[0:31] * steki-BLAH (~steki@bojanka.net) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:40] <nhm> well, good news is that I'm not seeing any extent fragmentation on this test I just ran.
[0:46] * loicd (~loic@ has joined #ceph
[0:49] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[1:03] <gregaf> nhm: sorry, been in meetings and stuff
[1:04] <gregaf> you'd have to tell me more about what "basically failed" means, but yes, you're going to allow clients to double (or more) the per-OSD memory use by changing that param ;)
[1:04] <nhm> gregaf: nothing crashed, but rados bench didn't report a single successful write.
[1:05] <gregaf> oh… bizarre…
[1:05] <gregaf> that shouldn't be able to happen at all
[1:05] <gregaf> you should reproduce that and gather logs!
[1:05] <nhm> hehe I'll make a note. ;)
[1:05] <gregaf> errr… you're sure you set it properly in bytes, and didn't set it to like 1024 or something?
[1:06] <gregaf> that's the only way I can imagine it blocking writes completely
[1:07] <nhm> gregaf: I believe I set it to 1GB in bytes.
[1:08] <nhm> gregaf: I've seen some behavior like that on this cluster when a bunch of OSDs end up stuck in peering though before they work themselves out. I'll try testing it again and make sure that's not happening.
[1:08] * ssedov (stas@ssh.deglitch.com) Quit (Read error: Connection reset by peer)
[1:09] * danieagle (~Daniel@ has joined #ceph
[1:11] * loicd (~loic@ Quit (Quit: Leaving.)
[1:12] <nhm> gregaf: mind if I send you some spreadsheets/graphs to glance at? Not sure there is much that can be discerned there other than that it looks like something is thrashing.
[1:13] <gregaf> nhm: uh, if you like
[1:13] <gregaf> but I don't guarantee anything except "that's a graph" :)
[1:13] * stass (stas@ssh.deglitch.com) has joined #ceph
[1:13] <nhm> gregaf: that's fine, it makes me feel better. ;)
[1:15] * stass (stas@ssh.deglitch.com) Quit ()
[1:17] * stass (stas@ssh.deglitch.com) has joined #ceph
[1:27] <gregaf> nhm: that's a graph
[1:28] <gregaf> the xfs one seems to peak about every 30 seconds, which I think is the sync interval, so that does look to me like one of the throttles is blocking things and then letting a lot of stuff through very quickly
[1:29] <gregaf> not sure about the other
[1:30] <nhm> gregaf: see, that's helpful. ;) Ok, I gotta go, bbl
[1:38] * judollise (~judollise@28IAAD8YH.tor-irc.dnsbl.oftc.net) has joined #ceph
[1:38] <- *judollise* ok
[1:38] <- *judollise* hello
[1:39] * judollise (~judollise@28IAAD8YH.tor-irc.dnsbl.oftc.net) has left #ceph
[1:44] * judollise (~judollise@28IAAD8YH.tor-irc.dnsbl.oftc.net) has joined #ceph
[2:16] * lofejndif (~lsqavnbok@28IAAD8SF.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[2:17] * Tv_ (~tv@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[2:17] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:23] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[2:26] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:34] * yoshi (~yoshi@p3167-ipngn3601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:35] * judollise (~judollise@28IAAD8YH.tor-irc.dnsbl.oftc.net) has left #ceph
[3:24] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[3:47] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) has joined #ceph
[3:49] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) Quit ()
[3:55] * jefferai (~quassel@quassel.jefferai.org) Quit (Remote host closed the connection)
[3:59] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[4:06] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) has joined #ceph
[4:12] * chutzpah (~chutz@ Quit (Quit: Leaving)
[4:24] * joao (~JL@ Quit (Ping timeout: 480 seconds)
[4:26] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) has joined #ceph
[4:30] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) Quit ()
[4:58] * jefferai (~quassel@quassel.jefferai.org) has joined #ceph
[5:54] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[6:13] * s[X]_ (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[6:24] * s[X]_ (~sX]@ has joined #ceph
[6:24] * s[X]_ (~sX]@ Quit (Remote host closed the connection)
[6:36] * s[X]_ (~sX]@ has joined #ceph
[6:41] * s[X]_ (~sX]@ Quit (Remote host closed the connection)
[7:12] * cattelan is now known as cattelan_away
[7:13] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[8:38] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[9:10] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:22] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[9:22] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[9:22] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[9:22] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[9:46] * The_Bishop (~bishop@cable-82-119-14-175.cust.telecolumbus.net) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[10:09] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[10:15] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:27] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[10:27] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:44] * Theuni (~Theuni@ has joined #ceph
[10:45] * Theuni (~Theuni@ Quit ()
[10:50] * creature (~wadding@ has joined #ceph
[10:56] * Theuni (~Theuni@ has joined #ceph
[10:57] * creature (~wadding@ Quit ()
[11:13] * Theuni (~Theuni@ Quit (Ping timeout: 480 seconds)
[11:29] * Theuni (~Theuni@ has joined #ceph
[11:31] * Theuni (~Theuni@ Quit ()
[11:42] * loicd (~loic@magenta.dachary.org) has joined #ceph
[12:02] * Theuni (~Theuni@mindy.gocept.net) has joined #ceph
[12:14] * joao (~JL@89-181-154-158.net.novis.pt) has joined #ceph
[12:21] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[12:25] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[12:27] * Theuni (~Theuni@mindy.gocept.net) Quit (Ping timeout: 480 seconds)
[12:29] * The_Bishop (~bishop@cable-82-119-14-175.cust.telecolumbus.net) has joined #ceph
[12:38] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[12:44] * joao (~JL@89-181-154-158.net.novis.pt) Quit (Quit: Leaving)
[12:52] * yoshi (~yoshi@p3167-ipngn3601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[13:05] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[13:43] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[13:45] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[13:46] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[13:49] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit ()
[13:50] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[13:53] <nhm> good morning #ceph
[13:53] <Dieter_b1> hello nhm
[14:18] * joao (~JL@89-181-154-158.net.novis.pt) has joined #ceph
[14:23] <nhm> well, performance problems weren't due to retransmission if tshark is to be believed. Time to start digging into the OSD logs.
[14:24] <nhm> sjust: where does your latest filtering stuff live?
[14:27] * loicd (~loic@magenta.dachary.org) has joined #ceph
[14:36] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[14:37] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[14:40] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[14:53] * cattelan_away is now known as cattelan
[15:03] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:09] <joao> nhm, looks like it's safe to upgrade from oneiric to precise
[15:09] <joao> don't know about from natty :)
[15:10] <joao> also, against all that was expected, the unity dock now works on multiple displays, which made me a happier man
[15:11] <elder> I left my (text-based) upgrade going overnight. Just checked and it was prompting me for something...
[15:11] <joao> yeah, that happened to me as well
[15:13] <joao> mid upgrade it prompted me to provide init services to be restarted, and that made me waste an additional half an hour
[15:13] <joao> but all things considered, nothing seems to be broken
[15:14] * cattelan (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[15:20] * cattelan (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[15:31] * RNZ (~RNZ@ has joined #ceph
[15:38] <nhm> joao: elder: good to know. Maybe I'll try upgrading.
[15:38] <nhm> Maybe I'll try cinnamon.
[15:44] <RNZ> hi all. Is Ceph a really fault-tolerant fs?
[15:45] <RNZ> has anybody done a cable-pull test with 2 peers and 1 client?
[15:45] * f4m8 is now known as f4m8_
[15:50] * oliver1 (~oliver@p4FD06EB2.dip.t-dialin.net) has joined #ceph
[15:50] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[15:57] * prometheanfire (~promethea@rrcs-24-173-105-83.sw.biz.rr.com) has joined #ceph
[15:57] <prometheanfire> RNZ: give them time
[16:04] * prometheanfire (~promethea@rrcs-24-173-105-83.sw.biz.rr.com) has left #ceph
[16:39] * cattelan is now known as cattelan_away
[16:42] * Theuni (~Theuni@ has joined #ceph
[16:56] <elder> Anybody know if it's possible to run Vidyo with audio only? I want to participate by phone today--not at home--and would like to reduce the bandwidth requirements.
[16:57] <nhm> elder: you can turn off your own video, not sure if you can turn off remote video.
[16:58] <elder> I'll fiddle with it. That alone would halve it.
[16:59] <elder> nhm do you know the commands to activate certain debug message code in ceph?
[16:59] <elder> It uses debugfs or something. I haven't done it manually in a while and I can't remember what the command was.
[17:00] <nhm> elder: I haven't done that. I've injected stuff and put debug args in ceph.conf.
[17:00] <nhm> I don't know anything about debugfs though.
[17:01] <elder> Any idea how to activate debug in a yaml file?
[17:01] <nhm> For teuthology?
[17:01] <elder> Yes
[17:01] <nhm> sure, one sec
[17:03] <elder> maybe ceph: conf: client.0: debug client: 10
[17:03] <nhm> elder: for say OSD debugging, in overrides->ceph->conf->osd you can put things like "debug osd : 20" or "debug ms : 1"
[17:03] <elder> OK.
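Pieced together from gregaf's overrides path and elder's client guess, such a teuthology override stanza might look like this. A sketch only; the exact nesting is assumed from the discussion above:

```yaml
# Hypothetical teuthology override sketch based on the lines above.
overrides:
  ceph:
    conf:
      osd:
        debug osd: 20
        debug ms: 1
      client.0:
        debug client: 10
```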
[17:04] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[17:05] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[17:06] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit ()
[17:06] <nhm> elder: btw, greg mentioned that xfs sync interval is ~30s, does that sound right?
[17:07] <elder> I think so.
[17:07] <elder> It's configurable.
[17:07] <nhm> elder: I'm seeing periodic spikes in throughput/iops and then drops in between. They aren't perfectly regular and there is some variability, but he mentioned it as a possibility.
[17:08] <elder> There are a number of parameters related to that. If you sync too often you can end up with excess I/O, but it still could perform better in aggregate depending on the load.
[17:08] <elder> I think that may be normal.
[17:08] <elder> If the system is more stressed I think writeback will kick in between those intervals and might smooth it out a bit.
[17:09] <elder> You could verify it by tweaking the XFS parameters.
[17:09] <elder> Let me go see how to do that. It's in /proc I think.
[17:09] <nhm> it's entirely possible it's not related to xfs at all and there is some other limitation somewhere.
[17:10] <nhm> today I'm going to start digging into the OSD logs and look at the lifecycle of the ops in detail.
[17:12] <nhm> elder: sent you the same graphs I sent greg last night. Unfortunately they aren't all that useful except to show that there is a lot of variability and journal/osd/client throughputs all slow down at the same times.
[17:13] <elder> Look at /proc/sys/fs/xfs/* Those are sysctl parameters that can be tweaked. In particular you may like xfssyncd_centisecs, which for me is 3000 (meaning 30 seconds).
[17:14] <elder> I believe if you write those it changes them at runtime (at least some--including that one)
[17:14] <nhm> ok, I'll try tweaking that and see what happens
[17:14] <elder> So if you changed it to, say, 1500, and/or 4500, you would see a corresponding difference in your graphs.
[17:14] <elder> ...if that were indeed the culprit
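The experiment elder proposes can be sketched as below. This only prints the commands to run (as root, on a box with XFS); the 1500/4500 values are the halved/raised intervals from the exchange above:

```shell
# Sketch of the tweak discussed above. xfssyncd_centisecs is in centiseconds,
# so the 3000 default means a 30 second sync interval. This prints the
# commands rather than writing /proc, since that needs root plus XFS.
knob=/proc/sys/fs/xfs/xfssyncd_centisecs
for cs in 1500 4500; do
    echo "echo $cs > $knob   # $((cs / 100)) second sync interval"
done
```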
[17:14] * Theuni (~Theuni@ Quit (Ping timeout: 480 seconds)
[17:16] <elder> nhm, descriptions from the code:
[17:17] <elder> xfssyncd_centisecs: Interval between xfssyncd wakeups
[17:17] <elder> xfsbufd_centisecs: Interval between xfsbufd wakeups.
[17:17] <elder> age_buffer_centisecs: Metadata buffer age before flush.
[17:17] <elder> fstrm_timer: Filestream dir-AG assoc'n timeout.
[17:18] <elder> That's probably all you're interested in, and the last one only matters if you're using the filestreams allocator.
[17:18] <elder> Sorry, last one was filestream_centisecs
[17:18] <nhm> elder: any opinions on what would be most likely (assuming it was any of these)?
[17:19] <elder> xfssyncd_centisecs
[17:19] <elder> It's 3000 on my machine (default value)
[17:19] <elder> The rest are different.
[17:19] <elder> Well, except filestream_centisecs, which is also 30, but see above.
[17:19] <nhm> yeah, 3000 here too. I'm setting it for 500 just to make it obvious.
[17:20] <elder> I'd go the other way, myself, but that should do it either way.
[17:20] <nhm> I'll try it the other way too.
[17:21] <gregaf> elder: nhm: I was actually talking about the filestore's sync interval ;)
[17:21] <elder> Great.
[17:21] <elder> They could also be interacting with each other.
[17:22] <elder> I would start with XFS first.
[17:22] <elder> If you're grabbing stats on the OSD system anyway.
[17:22] <gregaf> filestore_max_sync_interval
[17:22] <gregaf> only, oh god, it's actually 5 seconds
[17:23] <gregaf> so I don't think the xfs one would even be dumping anything since it doesn't have the time to activate, right?
[17:23] <elder> Start closer to the hardware before you move up.
[17:23] <elder> gregaf, I don't really know what the picture is here.
[17:23] <elder> Are stats accumulated on the osd machine?
[17:23] <elder> Is the filestore the osd code?
[17:24] <gregaf> there's a module in the OSD called the FileStore which is responsible for actually putting RADOS objects onto disk, into the filesystem, tailoring behavior to match the underlying FS' consistency guarantees and abilities, etc
[17:25] <elder> So every 5 seconds it issues a sync?
[17:25] <gregaf> it has to run a sync on the whole store fairly frequently, since until then it can't clear out the journal
[17:25] <elder> FS sync, or file fsync?
[17:25] <gregaf> yes, by default 5 seconds (configurable)
[17:25] <gregaf> fs sync
[17:25] <gregaf> it calls syncfs() if it can, otherwise sync()
[17:25] <elder> Nice.
[17:25] <gregaf> (except on btrfs, where it uses snapshots instead and goes craAAAaaazy with optimization attempts)
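The syncfs-with-fallback behavior gregaf describes can be mimicked in shell. A rough analogy, not Ceph's code: coreutils `sync -f` (>= 8.24) calls syncfs(2) on one filesystem, while plain `sync` flushes everything:

```shell
# Rough analogy of the FileStore sync step described above, not Ceph code:
# flush just the filesystem holding the store if syncfs is available,
# else fall back to a global sync(2).
store=/tmp
if sync -f "$store" 2>/dev/null; then
    msg="syncfs: flushed only the filesystem containing $store"
else
    sync
    msg="sync: flushed all filesystems"
fi
echo "$msg"
```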
[17:26] <elder> We should just skip buffering altogether and do everything synchronous.
[17:26] <gregaf> heh
[17:26] <gregaf> ext4 wouldn't like that very much; 5 seconds is pretty unreasonable to sync on but it still provides some buffering
[17:27] <elder> This is for the benefit of getting the journal to disk, right?
[17:27] <gregaf> no, no, the journal is all done directly with fdatasync
[17:27] <gregaf> this is so that we can clear old entries out of the journal
[17:27] <elder> So this is just being really sure user data hits disk?
[17:27] <elder> Kind of spazzy.
[17:28] <gregaf> yeah; since the journal and data store are on different disks we can't just use barriers to ensure ordering
[17:28] <elder> Why so urgent to clear entries from the journal?
[17:29] <gregaf> it has a limited size
[17:29] <elder> How big?
[17:29] <gregaf> configurable
[17:29] <elder> Well then make it bigger. My gut feeling is 5 seconds is too frequent.
[17:29] <gregaf> I suspect that in most production deployments a config interval of 15-60 seconds will be more likely, with a nice large journal
[17:29] <nhm> elder: changing xfssyncd_centisecs didn't seem to have a real major effect.
[17:30] <gregaf> but the defaults are set up so that it works okay on whatever test hardware people happen to have, where they might not want a journal bigger than 100-1024MB
[17:30] <nhm> I'll try upping that next.
[17:30] <gregaf> and yes, I'm with you on 5 seconds being too frequent
[17:30] <nhm> I've got 10G journals on this setup, so upping it should work fine.
[17:31] <gregaf> it used to be even less and we got it upped; I thought it had been upped again but I guess not
[17:31] <elder> nhm, try a larger xfssyncd_centisecs first.
[17:31] <gregaf> nhm: well, you're running 10G networking, right? So I wouldn't push it much past 10 seconds, unfortunately
[17:31] <gregaf> (but if the graphs come out looking much different we'll know where to look)
[17:32] <elder> gregaf, do you know how to turn on the kernel code debug stuff?
[17:32] <gregaf> oh right
[17:32] <gregaf> I was going to look for that and forgot
[17:32] <nhm> elder: did
[17:32] <gregaf> I have a script that Sage dumped on me at one point
[17:32] <elder> nhm, ok, good to know. Must be the spazfs that's doing it.
[17:33] <nhm> elder: a really low one may have had a negative effect. A high one was about the same as I typically see.
[17:33] <elder> I had that too and can't find it, gregaf
[17:33] <gregaf> #!/bin/sh -x
[17:33] <gregaf> p() {
[17:33] <gregaf> echo "$*" > /sys/kernel/debug/dynamic_debug/control
[17:33] <gregaf> }
[17:33] <gregaf> echo 9 > /proc/sysrq-trigger
[17:33] <gregaf> p 'module ceph +p'
[17:33] <gregaf> p 'module libceph +p'
[17:33] <gregaf> p 'module rbd +p'
[17:33] <gregaf> p 'file net/ceph/messenger.c -p'
[17:33] <gregaf> p 'file' `grep -- --- /sys/kernel/debug/dynamic_debug/control | grep ceph | awk '{print $1}' | sed 's/:/ line /'` '+p'
[17:33] <gregaf> p 'file' `grep -- === /sys/kernel/debug/dynamic_debug/control | grep ceph | awk '{print $1}' | sed 's/:/ line /'` '+p'
[17:33] <elder> Exactly.
[17:33] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:33] <elder> Thank you.
[17:33] <gregaf> and:
[17:33] <gregaf> gregf@kai:~$ cat ceph_kclient_debug_off.sh
[17:33] <gregaf> #!/bin/sh -x
[17:33] <gregaf> p() {
[17:33] <gregaf> echo "$*" > /sys/kernel/debug/dynamic_debug/control
[17:33] <gregaf> }
[17:33] <gregaf> echo 9 > /proc/sysrq-trigger
[17:33] <gregaf> p 'module ceph -p'
[17:33] <gregaf> p 'module libceph -p'
[17:33] <gregaf> p 'module rbd -p'
[17:33] <nhm> gregaf: yeah, 10GE, but I have 7 OSDs and 7 10G journals per OSD node.
[17:33] <gregaf> p 'file' `grep -- --- /sys/kernel/debug/dynamic_debug/control | grep ceph | awk '{print $1}' | sed 's/:/ line /'` '-p'
[17:33] <gregaf> p 'file' `grep -- === /sys/kernel/debug/dynamic_debug/control | grep ceph | awk '{print $1}' | sed 's/:/ line /'` '-p'
[17:33] <gregaf> nhm: ah, right, forgot to divvy it up
[17:34] <nhm> gregaf: should I also set filestore_min_sync_interval?
[17:34] <gregaf> nhm: I'm not entirely sure about when that's used, let me look at it
[17:36] <elder> gregaf, I think your debug_off script wants "echo 0 > /proc/sysrq-trigger"
[17:36] <elder> (use 0, not 9)
[17:36] <gregaf> err, if you say so?
[17:36] <gregaf> I don't know how any of it works
[17:36] <elder> The number sets the kernel message output console log level, 0..9.
[17:37] <gregaf> ah
[17:37] <elder> 9 is right for opening the floodgates, 0 is what you want to close them again.
[17:37] <gregaf> Sage probably just runs his at 9 all the time, then
[17:37] <elder> And parses it in real-time.
[17:37] <gregaf> the ceph-specific stuff is what really spews out data ;)
[17:38] <gregaf> nhm: this is too complex for me to guess at, so I'll say yes, you might as well try it out
[17:38] <gregaf> can check with sjust or sagewk when they get in
[17:38] <nhm> ok
[17:46] <nhm> interesting, increasing the min/max sync interval to 29/30s didn't seem to do anything.
[17:46] <nhm> I think I just need to sit down and pore through these OSD logs.
[17:47] <elder> Could be writeback then.
[17:47] <elder> * The longest time for which data is allowed to remain dirty
[17:47] <elder> */
[17:47] <elder> unsigned int dirty_expire_interval = 30 * 100; /* centiseconds */
[17:48] <elder> /proc/sys/vm/dirty_expire_centisecs
[17:49] <elder> Try that.
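The writeback knob elder quotes from the kernel source can be exercised the same way as the XFS sysctls. A sketch that just prints the command (writing it needs root); the 500 value is the aggressive setting nhm used earlier:

```shell
# Print the writeback experiment suggested above; the knob is in
# centiseconds, so the 3000 default means dirty data may age 30 s.
knob=/proc/sys/vm/dirty_expire_centisecs
cmd="echo 500 > $knob"
echo "as root: $cmd   # expire dirty pages after 5 s instead of 30 s"
```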
[17:49] <nhm> ok, will do
[17:53] <nhm> elder: no change
[17:54] <elder> No other suggestions at the moment.
[17:54] <elder> Sorry.
[17:55] <nhm> elder: np, I think I need to go through and start looking at where ops are getting hung up.
[17:59] * Tv_ (~tv@aon.hq.newdream.net) has joined #ceph
[18:01] <sagewk> gregaf, elder: yeah, i use that script with uml and pipe the output to a file, so 9 is what i want in that case. not appropriate for a machine with a serial console
[18:01] <sagewk> klogd usually can't keep up
[18:01] <elder> But the disable?
[18:01] <elder> Disable should be turning off console messages.
[18:02] <elder> I.e., echo 0 > /proc/sysrq-trigger
[18:02] <sagewk> oh, agreed.. that's wrong
[18:08] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[18:11] <elder> Also sagewk and gregaf, you should change "grep ceph" to be "grep /ceph/" in two places.
[18:12] <elder> because "ceph" may end up in all paths in your build kernel (it does in mine). It may not matter though...
[18:12] <elder> You'll only be getting a few other spurious messages from elsewhere that include "---" in them.
[18:20] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) has joined #ceph
[18:23] * LarsFronius_ (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[18:24] <nhm> gregaf: this may not be unexpected, but looking through the osd logs, it seems that periods of inactivity correspond with periods where the number of ops in flight are high and grow old. Things seem to speed back up around the times when the number of objects in flight drop back down.
[18:24] <elder> Is there a throttle for in-flight OSD ops?
[18:24] <elder> Or a maximum?
[18:24] <nhm> ie a spike in performance correlates with:
[18:25] <nhm> 2012-04-27 10:46:20.413467 7fc612b17700 --OSD::tracker-- ops_in_flight.size: 17; oldest is 7.886469 seconds old
[18:25] <nhm> 2012-04-27 10:46:21.421955 7fc612b17700 --OSD::tracker-- ops_in_flight.size: 2; oldest is 7.055721 seconds old
[18:26] * oliver1 (~oliver@p4FD06EB2.dip.t-dialin.net) has left #ceph
[18:28] <joao> INFO:teuthology.orchestra.run.err:W: Possible missing firmware /lib/firmware/bnx2/bnx2-mips-06-6.2.3.fw for module bnx2
[18:28] <joao> should I be concerned?
[18:29] <nhm> joao: you should be ok.
[18:29] <joao> okay then
[18:30] <nhm> joao: it seems to like to complain about that. You are only in trouble if /lib/firmware/updates is empty and you don't have a working firmware in /lib/firmware
[18:30] <nhm> Then the system might boot without proper network access on recent kernels.
[18:30] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[18:30] * LarsFronius_ is now known as LarsFronius
[18:31] <joao> btw, looks like ceph's oneiric-amd64 gitbuilder is "temporarily unavailable"
[18:33] <yehudasa> nhm: didn't you try completely disabling filestore flusher at one point?
[18:34] <yehudasa> so in that case those min/max values had no meaning afaik
[18:34] <nhm> yehudasa: that's true.
[18:35] <nhm> yehudasa: though I think I may want to revisit it again. I need to keep examining these logs for now.
[18:35] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[18:35] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[18:36] * LarsFronius_ (~LarsFroni@95-91-243-252-dynip.superkabel.de) has joined #ceph
[18:37] * bchrisman (~Adium@ has joined #ceph
[18:42] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Read error: No route to host)
[18:42] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[18:44] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[18:44] * ss7pro (~ss7pro@static.nk-net.pl) has joined #ceph
[18:45] <ss7pro> Hi sjust are you online ?
[18:45] <sjust> ss7pro: yeah, sorry, still sidetracked for a bit longer
[18:46] <ss7pro> ok no problem
[18:48] * LarsFronius_ (~LarsFroni@95-91-243-252-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[18:51] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[18:57] * LarsFronius_ (~LarsFroni@95-91-243-252-dynip.superkabel.de) has joined #ceph
[18:57] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:58] <elder> sagewk, nhm, sjust whoever, I'm going to try to participate in the standup today but I'll be in the car. First time I've tried that with VidYO
[18:59] <sagewk> heh ok!
[18:59] <sjust> ...ok
[18:59] <sjust> not driving I hope!
[18:59] <elder> Nope.
[19:00] * LarsFronius__ (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[19:00] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[19:00] * LarsFronius__ is now known as LarsFronius
[19:00] * LarsFronius_ (~LarsFroni@95-91-243-252-dynip.superkabel.de) Quit (Read error: Connection reset by peer)
[19:00] * LarsFronius_ (~LarsFroni@95-91-243-252-dynip.superkabel.de) has joined #ceph
[19:01] * danieagle (~Daniel@ has joined #ceph
[19:01] <gregaf> nhm: yeah, I think sjust has seen that a couple times too; it's one of the reasons I thought the filestore sync might be involved (and the client throttling)
[19:06] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Read error: Operation timed out)
[19:06] * LarsFronius_ is now known as LarsFronius
[19:07] <nhm> gregaf: One of the things that has me scratching my head a bit is what kind of resource would make writes stall across all OSDs on all nodes?
[19:07] <gregaf> you should talk to sjust about this; I've consulted on it with him some but he remembers the ins and outs of what's been explored much better than I do
[19:09] <nhm> gregaf: Sure, though I'll happily transfer knowledge from anyone who is willing. ;)
[19:09] <yehudasa> nhm: can you try a setup similar to jim's?
[19:10] <nhm> yehudasa: yep, I'm on it.
[19:10] <gregaf> as I recall there were two intertwined issues; one was on the messenger and one was on the disk, but if you didn't fix both you didn't get much better performance
[19:10] <gregaf> I really don't remember well though
[19:11] <yehudasa> the interesting thing is that he's lowering the filestore queue max ops
[19:11] <nhm> yehudasa: yeah, I noticed that.
[19:14] * chutzpah (~chutz@ has joined #ceph
[19:27] <joao> Tv_, teuthology is trying to get ceph's oneiric-amd64 build (wip-2323)
[19:27] <joao> and failing: http://ceph.newdream.net/gitbuilder-oneiric-amd64/
[19:30] <sagewk> oh i fixed the git daemon on ceph.newdream.net this morning, that might be it
[19:31] <sagewk> click the 'rebuild' link if it showed up as failed
[19:31] <sagewk> something keeps removing the world x bit on my home dir
[19:31] <joao> no, that's not it
[19:32] <joao> the teuthology task fails to grab the build because the gitbuilder is "temporarily unavailable"
[19:32] <sjust> nhm: plana20:/home/sam/teuthology/log_analyzer.py
[19:33] <sagewk> k
[19:40] <sagewk> joao, tv|work: restarted the vm
[19:41] <joao> thanks
[19:44] * ss7pro (~ss7pro@static.nk-net.pl) Quit (Quit: IRC webchat at http://irc2go.com/)
[19:46] <nhm> sjust: sweet, thanks
[19:46] <sjust> it's a bit cryptic, let me know if you have questions
[19:50] <nhm> sjust: will do. It'll be good for me to learn these logs in and out anyway though.
[19:50] <nhm> It's something I've been meaning to do.
[20:43] * The_Bishop (~bishop@cable-82-119-14-175.cust.telecolumbus.net) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[21:02] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[21:04] <joao> sagewk, it's gone again
[21:04] <joao> and it was finishing my branch's build :p
[21:04] <joao> any chance I can use any other build on the planas?
[21:09] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) has joined #ceph
[21:18] <elder> Well, Ubuntu 12.04 has served me well. X is operational again on my laptop.
[21:19] <nhm> elder: that's great
[21:21] <joao> the only downside so far seems to be that locally compiled binaries no longer work on the planas
[21:21] <joao> libc mismatch
[21:23] <elder> Why don't you compile on the plana machine?
[21:26] <nhm> elder: joao: you guys know anything about the filestore flusher?
[21:26] <elder> Nyet.
[21:31] * ceph-test (~Runner@mail.lexinter-sa.COM) Quit ()
[22:00] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[22:06] <joao> nhm, what about it?
[22:06] <joao> I know of it, read through it and all, but you need to be more specific :p
[22:07] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[22:07] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[22:08] <nhm> joao: when everything grinds to a halt during my tests, osd_ping and fluster_entry are some of the only things that are happening.
[22:08] <nhm> s/fluster/flusher
[22:09] <nhm> actually, that's a lie. Those are the only things reporting in the logs during those times.
[22:10] <nhm> So something is blocking almost everything else, but not that, and not osd_ping.
[22:10] <dmick> T or F: we do not have a Debian repo set up with gitbuilder results.
[22:10] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:10] <elder> True that we do not, or false that we do?
[22:10] <joshd> dmick: false
[22:11] <joshd> deb http://gitbuilder.ceph.com/ceph-deb-oneiric-x86_64-basic/ref/master/ oneiric main
[22:11] <dmick> heh, sorry for the wording. Do we or don't we have a repo?
[22:11] <dmick> oh
[22:11] <dmick> w00t
[22:13] <joao> nhm, afaict, it should be closing file descriptors queued during writes
[22:16] <joao> nhm, what are your filestore log level?
[22:16] <dmick> joshd: where do I get the gpg key for that?
[22:16] <joao> *is
[22:17] <joshd> dmick: I think it's this one: http://ceph.newdream.net/git/?p=ceph.git;a=blob_plain;f=keys/autobuild.asc;hb=HEAD
[22:19] <Tv_> dmick: http://ceph.newdream.net/docs/master/ops/install/mkcephfs/ vs http://ceph.newdream.net/docs/master/ops/autobuilt/
[22:20] <nhm> joao: debug filestore = 20
[22:20] <joao> nhm, have you noticed any "queue_flusher ... hit flusher_max_fds" message?
[22:21] <joao> nhm, thing is, flusher_entry() should be running every time a write happens, closing the fd after the write
[22:22] <joao> with that debug level you should be noticing transactions being run
[22:23] <joao> thus it would be understandable having flusher_entry running (as it would be closing fds after the transactions performing writes)
[22:23] <nhm> joao: nope, no flusher_max_fds.
[22:25] <dmick> Tv_: I .. had no idea that autobuilt page existed. thanks.
[22:25] <Tv_> joao: that gitbuilder's http service is back up
[22:25] <joao> that sucks... with no other messages, the only thing I could come up with was that you'd run out of space in the flusher queue, and would be performing a sync_file_range followed by a close
[22:25] <joao> Tv_, thanks
[22:25] <Tv_> i thought sage's rsync hack worked around that, though
[22:29] <joao> nhm, does flusher_entry keep on running eternally?
[22:34] * andreask1 (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:34] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Read error: Connection reset by peer)
[22:35] <nhm> joao: it seems to run for quite a while between sleeps/awakes
[22:35] <nhm> at least in some cases.
[22:36] <joao> and you see no writes happening?
[22:40] <nhm> joao: depends on the time period. For one period, it awoke at 10:46:27 and slept at 10:46:37. There were no writes from 10:46:32 to 10:46:40.
[22:41] <nhm> joao: the thing is, this isn't isolated to one OSD. It seems that performance on all OSDs tends to suffer at the same times.
[22:42] <nhm> so I'm thinking the OSD is getting starved for some reason.
[22:42] <nhm> unless problems on one OSD can affect writes to other OSDs...
[22:42] <Tv_> nhm: same host?
[22:43] <nhm> Tv_: two hosts, 7 OSDs each.
[22:43] <Tv_> nhm: I mean scope of the "one OSD can affect writes to other OSDs" statement
[22:43] <Tv_> within a host, IO scheduler, kernel holding locks for too long, hardware crapping out, etc will most definitely be intermingled
[22:44] <nhm> Tv_: fairly often the client throughput will drop to zero.
[22:44] <joao> nhm, there is an overlap between the last write and the time flusher_entry woke up, so it's probable that it executed within its expected behavior
[22:45] <nhm> Tv_: with aggregate throughput to the data disks and journals disks on each node getting close to zero (though perhaps not entirely due to lingering writes)
[22:45] <joao> write() -> queue fd -> wake up flusher_entry -> take care of that fd and any subsequent ones -> sleep
[22:46] <joao> it does take 10 seconds, but I'm not sure if that is an acceptable amount of time to run a sync_file_range() and close the descriptor
[22:47] <nhm> joao: yes, I have been noting that sometimes it's awake for 10s at a time.
[22:48] * Tv_ (~tv@aon.hq.newdream.net) has left #ceph
[22:48] <nhm> joao: is there any reasonable way to independently test the FileStore code?
[22:48] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) Quit (Quit: LarsFronius)
[22:49] <joao> nhm, maybe test_filestore_idempotent_sequence?
[22:49] <sagewk> test_filestore_workloadgen?
[22:49] <joao> or the workload generator
[22:52] <nhm> sagewk: while you are looking, is there anything else you can think of that could starve all of the OSDs at the same time, but wouldn't necessarily show up in the OSD logs?
[22:53] <nhm> it wouldn't be on the client side, as two clients on different nodes show the same behavior as one client.
[22:55] <sagewk> nhm: sync(2)?
[22:55] <sagewk> not sure if our kernels have libc wired up to syncfs(2) yet
[22:56] <sagewk> tho that may not nec help anyway
[23:04] <dmick> sagewk: dist build problem reproduced, looking. seems connected to .git_version
[23:05] <sagewk> yeah
[23:38] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) has joined #ceph
[23:43] * yehudasa (~yehudasa@aon.hq.newdream.net) Quit (Remote host closed the connection)
[23:44] * yehudasa (~yehudasa@aon.hq.newdream.net) has joined #ceph
[23:45] <yehudasa> the daily 12.04 boot
[23:55] * wam (~wam@dslb-188-105-143-242.pools.arcor-ip.net) has joined #ceph
[23:56] * wam (~wam@dslb-188-105-143-242.pools.arcor-ip.net) Quit ()

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.