#ceph IRC Log


IRC Log for 2011-01-28

Timestamps are in GMT/BST.

[0:28] * cmccabe (~cmccabe@ has joined #ceph
[1:50] * ajnelson (~Adium@dhcp-63-189.cse.ucsc.edu) Quit (Read error: Operation timed out)
[1:57] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[2:25] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:27] * joshd (~jdurgin@ Quit (Quit: Leaving.)
[2:30] * iggy (~iggy@theiggy.com) Quit (Ping timeout: 480 seconds)
[2:32] * sedulous (~sed@2001:41d0:2:46c1::1) has left #ceph
[2:54] * cmccabe (~cmccabe@ has left #ceph
[4:28] * bbigras (quasselcor@ has joined #ceph
[4:29] * bbigras is now known as Guest1901
[4:31] * Guest1671 (quasselcor@ Quit (Read error: Operation timed out)
[4:41] * bbigras_ (quasselcor@ has joined #ceph
[4:45] * bbigras__ (quasselcor@ has joined #ceph
[4:46] * Guest1901 (quasselcor@ Quit (Ping timeout: 480 seconds)
[4:49] * bbigras_ (quasselcor@ Quit (Ping timeout: 480 seconds)
[6:38] * iggy (~iggy@theiggy.com) has joined #ceph
[8:04] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:45] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Quit: Ex-Chat)
[9:04] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[10:01] * Yoric (~David@ has joined #ceph
[10:39] * eternaleye_ (~eternaley@ has joined #ceph
[10:40] * eternaleye (~eternaley@ Quit (Remote host closed the connection)
[11:33] * fred_ (~fred@80-219-183-100.dclient.hispeed.ch) has joined #ceph
[11:33] <fred_> wido, are you around ?
[15:01] * allsystemsarego (~allsystem@ has joined #ceph
[15:11] * fred_ (~fred@80-219-183-100.dclient.hispeed.ch) Quit (Quit: Leaving)
[16:47] * greglap (~Adium@ has joined #ceph
[17:15] * fred_ (~fred@80-219-183-100.dclient.hispeed.ch) has joined #ceph
[17:16] <fred_> hi
[17:44] * greglap (~Adium@ Quit (Ping timeout: 480 seconds)
[17:46] <wido> fred_: hi
[17:46] <wido> I'm here now :)
[17:46] <fred_> oh hi!
[17:47] <fred_> wido, I wanted to know if you still had problems with #563 ?
[17:47] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:48] <wido> fred_: yes, still seeing it
[17:48] <wido> comes around now and then
[17:49] <fred_> but the ceph cluster is usable?
[17:50] <wido> yes, no problem
[17:50] <wido> just a warning
[17:50] <fred_> ok... maybe I should switch to a newer kernel...
[17:51] <fred_> I'm living with 2.6.35 currently
[17:54] <wido> fred_: I'm at 2.6.38 at the moment with some unstable btrfs code
[17:54] <wido> but it seems to be working fine now; the warning is still there, but it is something sagewk is still hunting down
[17:56] * verwilst (~verwilst@router.begen1.office.netnoc.eu) Quit (Quit: Ex-Chat)
[17:57] <fred_> ok, on one of my OSDs, I was seeing multiple such warnings and then BTRFS stopped working... rebooted... same thing after < 12 hours and so on... So ceph is stopped now
[17:57] <fred_> I'll try the latest 2.6.38 rc and see if it behaves better
[17:59] <wido> fred_: Check my post on the ml to Jim Schutt
[17:59] <wido> there is a git repo of Josef Bacik (btrfs dev) with some new code, that is working fine under 2.6.38
[17:59] <fred_> yep, I saw the post... that's why I wanted to talk to you
[18:00] <wido> brb
[18:02] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:02] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit ()
[18:27] <wido> fred_: back
[18:27] <wido> but you could try that code; it's working fine (except for the warning)
[18:28] * pruby (~tim@leibniz.catalyst.net.nz) Quit (Ping timeout: 480 seconds)
[18:28] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[18:29] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:29] * fred_ (~fred@80-219-183-100.dclient.hispeed.ch) Quit (Quit: Leaving)
[18:39] <bchrisman> I'd appreciate suggestions as to how to figure out what happened to my cluster here. I've got ceph -s, ceph.conf, and my crushmap here: http://pastebin.com/tJiGgDwg
[18:40] <bchrisman> I'm going to go look at the OSD logs now.. but it seems like 5 of my 12 osds are still up.
[18:40] <gregaf> bchrisman: 5 up, and only 8 in so you've got 4 marked out
[18:40] <bchrisman> I was running ping_pong testing… and the setup is kernel client + daemons on all three nodes.
[18:41] <gregaf> switching the power on and off on different nodes/
[18:41] <gregaf> ?
[18:41] <bchrisman> Yeah… so I was going to go sleuthing around to find out why those others left the cluster.
[18:41] <bchrisman> No.
[18:41] * cmccabe1 (~cmccabe@c-24-23-253-6.hsd1.ca.comcast.net) has joined #ceph
[18:41] <gregaf> "ping-pong testing" is what, then?
[18:41] <bchrisman> uptime is same on all nodes.
[18:42] <bchrisman> it's a lock manager test… often used to test whether a cluster filesystem will handle locking properly.
[18:42] <bchrisman> think it's by the samba guys
[18:42] <gregaf> ah
[18:42] <gregaf> well, the only reason OSDs leave is if they get kicked out for downtime
[18:43] <gregaf> the distributed locking isn't well-tested; we don't have many users so you probably just managed to break it
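[editor's note] ping_pong (from the Samba/ctdb tooling, as bchrisman says) hammers POSIX byte-range locks to check lock coherence across nodes. A minimal single-process sketch of the primitive it exercises, using a hypothetical lock-file path:

```python
import fcntl
import os

# Acquire and release an exclusive byte-range lock -- the primitive
# ping_pong bounces between cluster nodes to test lock coherence.
path = "/tmp/ping_pong_demo.lck"  # hypothetical test file
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
try:
    # Lock byte 0 exclusively (blocks if another process holds it).
    fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0)
    # ... the real test bounces this lock between processes/nodes ...
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0)
finally:
    os.close(fd)
    os.unlink(path)
print("lock acquired and released")
```

On a cluster filesystem this only tells you something when two processes on different nodes contend for the same byte range.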
[18:43] <bchrisman> yeah.. so I want to run the tests again (ground up reinstall) and monitor what's going on and why.
[18:43] <gregaf> you don't have any cores or log files?
[18:44] <bchrisman> Yeah.. I'll go searching around.. where would cores be getting dropped?
[18:44] <bchrisman> in /data/osd?...
[18:44] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:44] <bchrisman> or /var/log/ceph?
[18:44] <gregaf> depends on how you've set them up, but by default in the same dir the executable was run from, I think?
[18:45] <gregaf> that's for core files
[18:45] <gregaf> logs are in /var/log/ceph, yeah
[18:45] <gregaf> (unless you set them elsewhere in the config file or on startup ;) )
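[editor's note] gregaf's "same dir the executable was run from" is the Linux default when `kernel.core_pattern` is a bare relative name like `core`; the kernel setting is what actually decides. A quick way to check, sketched in Python:

```python
# Where core dumps land is governed by the kernel's core_pattern.
# A bare name like "core" means the crashed process's cwd (the
# default gregaf describes); an absolute path or "|helper" differs.
with open("/proc/sys/kernel/core_pattern") as f:
    pattern = f.read().strip()

if pattern.startswith("|"):
    print("cores are piped to a helper:", pattern[1:])
elif "/" in pattern:
    print("cores go to a path:", pattern)
else:
    print("cores land in the crashed process's cwd as:", pattern)
```

Also worth checking `ulimit -c`; a zero core size limit suppresses cores entirely.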
[18:46] <bchrisman> and how would I show which OSDs are in/out.. some sort of dump?
[18:46] <bchrisman> ahh scrum time.. brb
[18:47] <gregaf> bchrisman: yeah, via the ceph tool
[18:47] <gregaf> I think it's "ceph osd dump -o -" to put it to stdout
[18:49] * Yoric_ (~David@ has joined #ceph
[18:49] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[18:49] * Yoric_ is now known as Yoric
[19:03] <bchrisman> decrypting the osd dump: (osd8 down out up_from 5 up_thru 23 down_at 29 last_clean_interval 0-0), the 'from X up_thru Y' means that for some ceph-internal time/status unit region, this osd was up (5->23), but then was down at 29?
[19:04] <bchrisman> full output here: http://pastebin.com/Xwcc4WRu
[19:05] <bchrisman> all nodes in my cluster have the same uptime… I'll go ahead and reproduce to make sure it wasn't some server room gremlin pulling and replacing cables or whatnot.
[19:06] <gregaf> up_from is the epoch when the OSD got added to the map
[19:06] <gregaf> up_thru is the last epoch they were known to be active
[19:06] <gregaf> down_at is the first epoch they were known to be down
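[editor's note] The epoch fields gregaf decodes can be pulled straight out of a dump line like the one bchrisman pasted. A small parsing sketch (field meanings per gregaf above):

```python
import re

# Parse the map-epoch fields out of an "osd dump" line like the one
# pasted above: up_from = epoch added, up_thru = last epoch known
# active, down_at = first epoch known down.
line = "osd8 down out up_from 5 up_thru 23 down_at 29 last_clean_interval 0-0"

fields = dict(re.findall(r"(up_from|up_thru|down_at) (\d+)", line))
epochs = {k: int(v) for k, v in fields.items()}

print(epochs)  # {'up_from': 5, 'up_thru': 23, 'down_at': 29}
```

So osd8 here joined at epoch 5, was last known active at 23, and was marked down at 29, which matches bchrisman's reading.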
[19:06] * Yoric (~David@ Quit (Quit: Yoric)
[19:06] <gregaf> actually, let me check that
[19:07] <bchrisman> ahh cool.. that's what I thought… great..
[19:07] <Tv|work> bah nfsroot is making bridging way harder
[19:07] <gregaf> yep, that's correct
[19:07] <Tv|work> if i down eth0, i lose /
[19:08] <Tv|work> i feel tempted to re-do ceph-kvm as old-school local disk using servers; they can still netboot
[19:09] <Tv|work> hmm they have eth1 too.. i wonder if that's connected..
[19:10] <Tv|work> not right now at least
[19:11] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:14] <Tv|work> oh fun, xen people stumbled on that in 2004: http://lists.xensource.com/archives/html/xen-devel/2004-08/msg00133.html
[19:15] <sagewk> yeah you can stick the root fs on the local disk and just netboot the kernel
[19:15] <Tv|work> seems like the only way to go
[19:15] <Tv|work> well i could play tricks with eth1
[19:15] <Tv|work> but i'd rather keep it simple
[19:15] <Tv|work> works fine with nfsroot as long as you're willing to NAT the vms
[19:15] <sagewk> the problem is the udev mac address stuff?
[19:15] <sagewk> we can also wire up eth1
[19:16] <Tv|work> the problem is moving ip address (from kernel bootup-time dhcp) from eth0 to br0
[19:16] <Tv|work> that causes nfsroot to hiccup
[19:16] <Tv|work> could work around that with custom scripts in initramfs doing bridge setup before nfsroot mount, but that's just too much work for little gain
[19:16] <Tv|work> wiring eth1 would let me run the bridge on eth1
[19:16] <Tv|work> leave eth0 for "management"
[19:17] <sagewk> let's do that
[19:17] <sagewk> unless you prefer local root for other reasons
[19:17] <Tv|work> ok if there's plenty of switch ports etc, why not ;)
[19:17] <Tv|work> well that would eliminate the need for bind roots etc
[19:17] <Tv|work> not sure which way is less work in the end
[19:18] <Tv|work> managing n boxes, or managing one more complex nfsroot setup
[19:18] <sagewk> i prefer the nfsroot personally :)
[19:18] <Tv|work> well i'm not arguing too bad, at least yet, so i'll keep trying that
[19:18] <Tv|work> sagewk: who do i talk to for wiring eth1, or will you ask for that to happen?
[19:19] <sagewk> i asked leon, he said it'll be done in a few
[19:20] <Tv|work> sweet
[19:20] <Tv|work> watch 'ethtool eth1|grep Link' ;)
[19:20] * toolbox (~itsme@S0106000102ec26fe.gv.shawcable.net) Quit (Ping timeout: 480 seconds)
[19:22] <gregaf> cmccabe: Tv|work: sjust: joshd: sage wants a 10:30 today :)
[19:22] <Tv|work> i'd be happy with a 10:30 every day, that'd remove some guesswork
[19:23] <sagewk> that'll be the default!
[19:23] <sjust> sounds good
[19:23] <gregaf> I don't remember why we'd decided on 11 initially
[19:24] <joshd> yeah, predictable times are good
[19:24] <gregaf> but as long as we're all here I think earlier is better :)
[19:25] <cmccabe1> 1030 is fine
[19:32] * ajnelson (~Adium@dhcp-63-189.cse.ucsc.edu) has joined #ceph
[19:46] <Tv|work> gregaf: the path was /images/ceph-peon64/cosd/ceph4.2756/ffsb.sh/tmp
[20:02] <Tv|work> waah why is the serial console thingie bound to ^Z
[20:03] <Tv|work> i want to suspend my editor :(
[20:05] <sagewk> tv|work: i know i hate it
[20:10] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[21:04] <wido> hi
[21:04] <wido> I'm running a VM right now where I'm downloading about 1.1TB of data with 100Mb/sec. All goes well, but the load is pretty high on the machine
[21:05] <wido> My repl is at 3, I have 4 OSDs on the same machine (noisy), but the I/O wait is 25%
[21:06] <wido> Is this something you see? The repl = 3 creates 300Mbit of writes which has to be done, but 4 disks should be able to do that
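[editor's note] wido's back-of-envelope arithmetic, worked through: a 100 Mbit/s ingest at replication 3 becomes 300 Mbit/s of OSD writes, which over 4 disks is under 10 MB/s per disk — low enough that sequential throughput alone shouldn't hurt, which is why small random writes become the suspect later in the conversation:

```python
# wido's numbers: 100 Mbit/s download, replication 3, 4 OSD disks
# on one host (noisy).
ingest_mbit = 100
replicas = 3
disks = 4

total_write_mbit = ingest_mbit * replicas   # 300 Mbit/s of writes
total_write_mb = total_write_mbit / 8       # 37.5 MB/s
per_disk_mb = total_write_mb / disks        # ~9.4 MB/s per disk

print(f"{total_write_mbit} Mbit/s total, {per_disk_mb:.1f} MB/s per disk")
```

~9.4 MB/s per spindle is trivial for sequential I/O but can saturate a disk if it arrives as many small synchronous writes.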
[21:06] <sagewk> do you have oprofile? that will give you a pretty good holistic view of where time is being spent (inside and outside the kernel)
[21:07] <sagewk> hmm nevermind its iowait. :/
[21:17] <wido> yes, and could it be that Ceph is doing a lot of small writes?
[21:18] <wido> btw, my journal is on a tmpfs
[21:18] <wido> so that is not the problem.
[21:18] <wido> load on noisy is 7.54 right now, a bit high imho for only one VM and 100Mb/sec download
[21:20] <sagewk> yeah, that is high. I'd be interested in where the CPU time is being spent anyway. Is logging on? That eats a fair bit of CPU
[21:24] <wido> sagewk: No, logging is at a low level
[21:25] <wido> Might be a small blocksize which everything is written with?
[21:25] <sagewk> yeah, could be.
[21:26] <sagewk> would be interesting to see what the osd write workload looks like. ceph osd tell 0 injectargs '--debug-ms 1' will let you see the osd_ops arriving and how big they are
[21:30] <wido> sagewk: for each OSD?
[21:30] <wido> or just for osd0?
[21:32] <sagewk> just osd0 will give a representative sample
[21:32] <wido> ok, done :)
[21:32] <wido> just let it run for a while and upload the log somewhere?
[21:35] <sagewk> yeah sure
[21:45] <wido> sagewk: you want the lines with "ondisk" I assume?
[21:47] <sagewk> actually the osd_op ones
[21:48] <wido> ok, I'll gather the last few thousand lines and upload it
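[editor's note] The filtering wido describes — keeping only the `osd_op` lines sagewk asked for out of a `--debug-ms 1` log — is a plain substring filter. A sketch; the sample lines below are hypothetical stand-ins for the real messenger output, only the filter itself matters:

```python
# Pull the "osd_op" lines sagewk asked for out of a debug-ms log.
# These sample lines are invented stand-ins, not real Ceph output.
sample_log = """\
2011-01-28 21:30:01 -- osd_op(client.4104.0:912 rb.0.0 [write 4096~4096])
2011-01-28 21:30:01 -- osd_op_reply(912 rb.0.0 [write 4096~4096] ondisk)
2011-01-28 21:30:02 -- ping magic
"""

osd_ops = [l for l in sample_log.splitlines()
           if "osd_op(" in l]   # incoming ops, not the replies
print(len(osd_ops), "osd_op lines")
```

On a real log the same filter is just `grep 'osd_op(' osd.0.log`; the `[write offset~length]` part of each op is where the write sizes sagewk wants show up.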
[22:02] <wido> sagewk: I've got about 10,000 lines now. Should I open an issue for it and attach the log (2.6MB)?
[22:03] <sagewk> sure yeah
[23:36] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[23:54] * cmccabe1 (~cmccabe@c-24-23-253-6.hsd1.ca.comcast.net) Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.