#ceph IRC Log


IRC Log for 2011-02-11

Timestamps are in GMT/BST.

[0:03] * mnigh (~mnigh@75-128-161-124.static.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[0:07] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[0:34] <wido> I tried to expand my 4 OSD cluster (with 7.4TB) from 4 to 8 OSD's, to 15TB. Replication is on 3, with 2.6TB of data used
[0:35] <wido> but that seems pretty hard, OSD's keep bouncing up and down
[0:35] <wido> I lowered the recovery ops to 1 from 3, but still they keep bouncing around. Has been running for about 7 hours now
[0:35] <wido> "8224 pgs: 643 active, 4080 active+clean, 1396 peering, 31 crashed+peering, 170 active+degraded, 568 active+clean+degraded, 1102 degraded+peering, 129 crashed+degraded+peering, 105 crashed+down+degraded+peering; 1934 GB data, 5297 GB used, 7726 GB / 13041 GB avail; 625532/1486059 degraded (42.093%)"
[0:36] <wido> Any settings I could try? Num reporters is at 3 btw, just to prevent them reporting each other as down
[0:37] <cmccabe> wido: are you increasing the number of PGs?
[0:37] <wido> cmccabe: No, just added 4 OSD's and added them to the crushmap
[0:37] <gregaf> I imagine there's probably something blocking that shouldn't be :/
[0:38] <gregaf> I always turn to sjust for stuff like this; he's played with peering the most
[0:39] <wido> I expected it to finish in a few hours, but 7 is a lot
[0:39] <sjust> wido:have any logs?
[0:39] <gregaf> it does look like it's making progress?
[0:39] <wido> my stats also show that almost no bandwidth is being used, the data placement on the new OSD's is at 2%, where the current OSD's are at 93%
[0:40] <wido> sjust: Yes, again on logger.ceph.widodh.nl
[0:40] <sjust> cool, thanks
[0:40] <wido> gregaf: well, not really
[0:40] <wido> sjust: From logger you can "ssh root@atom"
[0:40] <sjust> ok
[0:40] <gregaf> oh, I just figured the active/active+clean ones were
[0:40] <wido> that is the new host with the 4 OSD's
[0:40] <wido> sjust: Logs are at "/srv/ceph/remote-syslog"
[0:40] <sjust> ko
[0:40] <sjust> *ok
[0:41] <wido> the hosts "noisy.ceph.widodh.nl" and "atom.ceph.widodh.nl" are the hosts
[0:41] <wido> noisy is the current one, atom is the new one
[0:41] <wido> gregaf: fyi, it is indeed an Atom board. Trying this one with an Intel X25-M: http://zooi.widodh.nl/ceph/osd/20110210_001.jpg
[0:44] <wido> Ah, I think I see something. osd3 got marked out. Killed it and right now I'm trying to unmount its btrfs filesystem
[0:44] <wido> the umount is in state D at the moment, but I saw this before. btrfs keeps blocking for some reason and that causes the OSD's to act weird
[0:44] <sjust> hmm
[0:45] <wido> happening on noisy right now
[0:45] <cmccabe> wido: the osd should die if btrfs blocks forever
[0:45] <wido> but still, with osd3 marked out, the recovery isn't going anywhere
[0:46] <wido> cmccabe: Yes, osd3 died a few hours ago. Couldn't find what it was, so must have been btrfs? There was no core
[0:47] <cmccabe> wido: are there any OOM killer messages in the logs?
[0:48] <cmccabe> wido: grep oom-killer
[0:48] <cmccabe> wido: also of course check that you have cores turned on
[0:49] <cmccabe> wido: I find that often newer systems don't, like default ubuntu installs
[0:49] <gregaf> is his version new enough to die on a blocked FS?
[0:49] <cmccabe> gregaf: what ver?
[0:49] <wido> cmccabe: No, no OOM killer. Core's are on on my systems
[0:49] <gregaf> dunno, that's why I'm asking
[0:50] <wido> I'm running the master branch from about 12 hours ago
[0:50] <cmccabe> that should be new enough
[0:51] <cmccabe> wido: do you have the string "sync_entry timed out" in your logs?
[0:53] <wido> cmccabe: No, but I see that my unmount has finished after 10 minutes
[0:54] <wido> I really have to go, almost 01:00
[0:54] <wido> sjust: Feel free to look around if you want to!
[0:54] <wido> thanks again and ttyl
[0:54] <sjust> wido: ok!
[0:54] <cmccabe> current filestore timeout is 10 minutes
[0:55] <cmccabe> wido: anyway, see you later!
[2:02] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[2:29] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:33] * cmccabe (~cmccabe@c-24-23-253-6.hsd1.ca.comcast.net) has left #ceph
[3:28] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[3:59] * mnigh (~mnigh@99-72-217-5.lightspeed.stlsmo.sbcglobal.net) has joined #ceph
[4:44] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[5:45] * Meths_ (rift@ has joined #ceph
[5:51] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[5:55] * Meths (rift@ has joined #ceph
[6:01] * Meths_ (rift@ Quit (Ping timeout: 480 seconds)
[6:05] * mnigh (~mnigh@99-72-217-5.lightspeed.stlsmo.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[6:30] * Meths_ (rift@ has joined #ceph
[6:34] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[6:38] * Meths_ (rift@ Quit (Ping timeout: 480 seconds)
[6:41] * Meths (rift@ has joined #ceph
[6:46] * Meths_ (rift@ has joined #ceph
[6:51] * Meths__ (rift@ has joined #ceph
[6:51] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[6:55] * Juul (~Juul@static.88-198-13-205.clients.your-server.de) Quit (Quit: Leaving)
[6:56] * Meths (rift@ has joined #ceph
[6:56] * Meths_ (rift@ Quit (Ping timeout: 480 seconds)
[7:00] * Meths__ (rift@ Quit (Ping timeout: 480 seconds)
[7:09] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[7:17] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[7:18] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[7:46] * alexxy[home] (~alexxy@ Quit (Ping timeout: 480 seconds)
[8:07] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[8:45] * alexxy (~alexxy@ has joined #ceph
[8:50] * uwe (~uwe@ has joined #ceph
[8:54] * gregorg_taf (~Greg@ Quit (Quit: Quitte)
[8:54] * gregorg (~Greg@ has joined #ceph
[9:36] * verwilst (~verwilst@router.begen1.office.netnoc.eu) has joined #ceph
[9:41] * Yoric (~David@ has joined #ceph
[12:00] * mnigh (~mnigh@99-72-217-5.lightspeed.stlsmo.sbcglobal.net) has joined #ceph
[12:08] * mnigh (~mnigh@99-72-217-5.lightspeed.stlsmo.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[12:15] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[12:56] * allsystemsarego (~allsystem@ has joined #ceph
[13:10] * eternaleye__ (~eternaley@ has joined #ceph
[13:11] * eternaleye__ is now known as eternaleye
[13:28] * uwe (~uwe@ Quit (Quit: sleep)
[13:38] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[14:02] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[14:04] * allsystemsarego (~allsystem@ has joined #ceph
[14:07] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[15:28] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[16:16] * monrad (~mmk@domitian.tdx.dk) has joined #ceph
[16:19] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (resistance.oftc.net larich.oftc.net)
[16:19] * Jiaju (~jjzhang@ Quit (resistance.oftc.net larich.oftc.net)
[16:19] * WesleyS (~WesleyS@ Quit (resistance.oftc.net larich.oftc.net)
[16:19] * [ack] (ANONYMOUS@ Quit (resistance.oftc.net larich.oftc.net)
[16:19] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (resistance.oftc.net larich.oftc.net)
[16:19] * monrad-51468 (~mmk@domitian.tdx.dk) Quit (resistance.oftc.net larich.oftc.net)
[16:20] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[16:20] * Jiaju (~jjzhang@ has joined #ceph
[16:20] * WesleyS (~WesleyS@ has joined #ceph
[16:20] * [ack] (ANONYMOUS@ has joined #ceph
[16:20] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[16:20] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[16:22] * uwe (~uwe@ has joined #ceph
[16:26] * DLange (~DLange@dlange.user.oftc.net) Quit (Quit: splitty freenode :()
[16:29] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[16:40] * verwilst (~verwilst@router.begen1.office.netnoc.eu) Quit (Quit: Ex-Chat)
[17:09] * mnigh (~mnigh@75-128-161-124.static.stls.mo.charter.com) has joined #ceph
[17:09] * prometheanfire (~mthode@mx1.mthode.org) Quit (Quit: leaving)
[17:11] * prometheanfire (~mthode@mx1.mthode.org) has joined #ceph
[17:19] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[17:24] * todin (tuxadero@kudu.in-berlin.de) Quit (Quit: leaving)
[17:30] * uwe (~uwe@ Quit (Quit: sleep)
[17:47] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:53] * greglap (~Adium@ has joined #ceph
[18:19] * greglap (~Adium@ Quit (Quit: Leaving.)
[18:24] * uwe (~uwe@ip-94-79-145-210.unitymediagroup.de) has joined #ceph
[18:31] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:36] * Yoric (~David@ Quit (Quit: Yoric)
[18:39] * uwe (~uwe@ip-94-79-145-210.unitymediagroup.de) Quit (Quit: quit)
[18:50] * [ack]_ (ANONYMOUS@ has joined #ceph
[18:50] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (resistance.oftc.net larich.oftc.net)
[18:50] * [ack] (ANONYMOUS@ Quit (resistance.oftc.net larich.oftc.net)
[18:50] * WesleyS (~WesleyS@ Quit (resistance.oftc.net larich.oftc.net)
[18:50] * Jiaju (~jjzhang@ Quit (resistance.oftc.net larich.oftc.net)
[18:50] * Jiaju (~jjzhang@ has joined #ceph
[18:50] * WesleyS (~WesleyS@ has joined #ceph
[18:50] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[18:51] * cmccabe1 (~cmccabe@ has joined #ceph
[18:59] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:00] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:14] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[19:37] * verwilst (~verwilst@dD576FAAE.access.telenet.be) has joined #ceph
[19:41] * verwilst (~verwilst@dD576FAAE.access.telenet.be) Quit ()
[19:43] * verwilst (~verwilst@dD576FAAE.access.telenet.be) has joined #ceph
[19:49] * verwilst (~verwilst@dD576FAAE.access.telenet.be) Quit (Quit: Ex-Chat)
[20:03] * [ack]_ is now known as [ack]
[20:05] <wido> hi
[20:05] <cmccabe1> wido: hi
[20:05] <wido> I think my recovery is due to some btrfs issues
[20:06] <wido> I set the filestore timeout to 120 sec and started to see a lot of timeouts
[20:06] <cmccabe1> wido: what kernel are you running?
[20:06] <wido> 2.6.38 with btrfs-work
[20:08] <wido> I also saw one crash today: http://pastebin.com/ugHQJ0eL
[20:08] <wido> Seems to be related to that?
[20:08] <cmccabe1> that is the timeout message
[20:09] <wido> Ah, it doesn't exit nicely :)
[20:09] <cmccabe1> wido: it should come right after the timeout message in the log
[20:09] <cmccabe1> wido: I guess our rationale for using SIGABRT to terminate the process is that it's nice to have a stack trace and core dump
[20:10] <wido> cmccabe1: Yes, one OSD crashed and dumped his core this way. Another OSD is still hanging in status D, the log showed http://pastebin.com/3nDWMduR
[20:10] <wido> osd.0 crashed, but osd.1 is still hanging in status D and won't exit
[20:11] <cmccabe1> wido: hmm
[20:11] <cmccabe1> wido: it would be really nice to find out what system call is hanging
[20:11] <cmccabe1> wido: I forget if you can use gdb on a process in D state or not
[20:12] <cmccabe1> wido: actually, try strace -p <cosd-pid>
[20:14] <wido> A strace doesn't show anything, now I'm not so good with GDB
[20:14] <cmccabe1> wido: oh, actually maybe try cat /proc/<pid>/stack
[20:14] <cmccabe1> wido: I knew that obscure thing would come in handy some day
[20:16] <wido> cmccabe1: Ok, working on it. Right now the cluster seems to be recovering with 7 out of 8 up. Has been running for about 3 hours now
[20:17] <cmccabe1> wido: it would be really helpful to see what's in that proc file
[20:17] <wido> cmccabe1: http://pastebin.com/Y9g8Ya6p
[20:17] <wido> last 100 log lines of that osd and below the stack
[20:18] <cmccabe1> wido: I think that's the /proc/stack for the main process. But what about the thread that's hanging?
[20:21] <wido> cmccabe1: That is the PID of the hanging process, you mean /proc/<pid>/task/*/stack?
[20:21] <cmccabe1> wido: yeah
[20:21] <wido> ok, there are multiple threads, 46 to be exact
[20:21] <cmccabe1> wido: which one is in D state?
[20:22] <wido> PID 1974 is in D
[20:22] <wido> "root 1974 0.0 0.2 467132 15648 ? Dsl 13:18 0:02 /usr/bin/cosd -i 1 -c /etc/ceph/ceph.conf"
[20:22] <cmccabe1> wido: so basically we need the kernel stack for that
[20:22] * allsystemsarego (~allsystem@ Quit (Read error: Operation timed out)
[20:22] <cmccabe1> wido: in order to see what syscall is hanging
[20:23] * allsystemsarego (~allsystem@ has joined #ceph
[20:24] <wido> cmccabe1: Ok, that is some new terrain for me. Should gdb be enough here? "Attaching to process 1974"
[20:24] <cmccabe1> wido: er, I thought it was in /proc/<pid>/task/<task>/stack ?
[20:26] <wido> cmccabe1: That shows the same as I posted before, those few lines
[20:27] <cmccabe1> wido: it's in exit_mm?
[20:27] <wido> cmccabe1: yes
[20:27] <wido> but another task is showing some btrfs calls, let me gather those
[20:28] <wido> cmccabe1: http://pastebin.com/UTYqQfip
[20:28] <wido> All the other tasks are showing the exit_mm
[20:28] <cmccabe1> wido: ah...
[20:29] <cmccabe1> wido: looks like it's hanging in btrfs_ioctl_snap_create_v2
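
The per-thread check discussed above (reading /proc/<pid>/task/*/stack to find the thread stuck in D state) could be scripted roughly as follows. This is a minimal sketch, not something used in the channel: the function names thread_state and blocked_threads and the proc_root parameter are made up for illustration, and it assumes a Linux kernel that exposes the per-task stat and stack files (reading stack usually needs root).

```python
import os


def thread_state(stat_text):
    """Return the one-letter state from the text of a /proc/<pid>/task/<tid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the last ')'.
    return stat_text.rsplit(")", 1)[1].split()[0]


def blocked_threads(pid, proc_root="/proc"):
    """Yield (tid, kernel_stack) for every thread of pid that is in state D."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in sorted(os.listdir(task_dir), key=int):
        base = os.path.join(task_dir, tid)
        try:
            with open(os.path.join(base, "stat")) as f:
                state = thread_state(f.read())
            if state != "D":
                continue
            # Reading the stack file usually requires root.
            with open(os.path.join(base, "stack")) as f:
                yield int(tid), f.read()
        except OSError:
            # The thread exited or the file is not readable; skip it.
            continue
```

For the process in this log it would be called as `for tid, stack in blocked_threads(1974): print(tid, stack)`, run as root on the OSD host.
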
[20:30] <wido> cmccabe1: What I did see, when stopping OSD's and unmounting the btrfs vols, my dmesg sometimes fills up with btrfs messages
[20:30] * verwilst (~verwilst@dD576FAAE.access.telenet.be) has joined #ceph
[20:31] <cmccabe1> wido: looks like ::ioctl(basedir_fd, BTRFS_IOC_SNAP_CREATE_V2, &async_args) is hanging.
[20:31] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[20:31] <wido> cmccabe1: I saw these messages: "btrfs: sdc checksum verify failed on 1695112499200 wanted A6C299CC found 468C2876 level 0"
[20:32] <wido> sdc is the disk of osd.1, after those messages, a btrfs BUG came up
[20:32] <wido> seems like I hit a btrfs bug again
[20:32] <cmccabe1> wido: yeah, it certainly does
[20:32] * verwilst (~verwilst@dD576FAAE.access.telenet.be) Quit ()
[20:33] <wido> I think I saw this one before: http://pastebin.com/URN3ShVb have you seen that one?
[20:34] <cmccabe1> wido: I guess the question to answer is, have there been any btrfs bug fixes since your kernel was built that might address this?
[20:34] <cmccabe1> wido: I don't have the answer but others might
[20:35] <wido> cmccabe1: Oh, no, what I meant, did you hit the same one on your test env?
[20:35] <wido> I'm building btrfs almost daily with the btrfs-work tree
[20:35] <cmccabe1> wido: I haven't seen that bug.
[20:36] <cmccabe1> wido: we're rolling out a better test system soon that will hopefully make it easier for us to test with new btrfs kernels
[20:36] <wido> ok, i'll report this one at btrfs
[20:36] <cmccabe1> wido: k
[20:38] <wido> I'm curious if my expansion from 4 to 8 OSD's will work out
[20:58] <Tv> 11:51:42 ERROR| [stderr] 2011-02-11 11:51:42.666380 7f0edba39720 store(dev/mon.1) MonitorStore::get_int: failed to open 'dev/mon.1/auth/last_pn': error 2: No such file or directory
[20:58] <Tv> i've never seen that before..
[20:59] <Tv> cmccabe1: is that perhaps something you mentioned during the call?
[20:59] <Tv> because that file does seem to exist, at least by now
[21:02] <Tv> hmm it seems my ceph is healthy but cfuse didn't succeed..
[21:03] <cmccabe1> tv: that warning was squashed in 8a1906a017a967c096be032636eb55c8c5bb874b
[21:03] <Tv> cmccabe1: thanks
[21:03] <cmccabe1> tv: np
[21:03] <Tv> oh whaaat, make install doesn't seem to include cfuse?
[21:04] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[21:04] <Tv> --with-fuse grumble grumble
[21:05] <Tv> i hate autodetection of build-deps
[21:05] <gregaf> Tv: I think it should be on by default if you've got the fuse stuff installed
[21:05] <Tv> yeah which means every new machine is a timebomb waiting to build it wrong
[21:05] <cmccabe1> tv: can you run ./do_autogen.sh to do your autoconf/automake?
[21:05] <Tv> cmccabe1: huh?
[21:05] <cmccabe1> tv: the idea is we put all the --prefix, --with-fuse, etc. stuff into do_autogen.sh
[21:06] <Tv> ..which doesn't do cfuse either
[21:06] <cmccabe1> tv: yeah, I guess it should.
[21:06] <Tv> honestly, i don't understand what do_autogen.sh is for
[21:07] <cmccabe1> tv: it's for me not having to remember all those flags to configure all the time, or tell people to use them on the mailing list / chat room
[21:07] <Tv> can't you change the defaults to be with and not without?
[21:08] <cmccabe1> tv: we don't want automake to fail just because it can't find something
[21:08] <Tv> but that's what do_autogen.sh makes it do
[21:08] <cmccabe1> tv: that's what --with-foo makes it do, in general.
[21:08] <cmccabe1> tv: fail if there is no foo
[21:09] <cmccabe1> tv: --without-foo forces no foo
[21:09] <Tv> so what's the difference between putting it in do_autogen.sh and telling people to do, vs doing it directly?
[21:09] <Tv> s/do/use it/
[21:09] <cmccabe1> tv: what's "it"?
[21:09] <Tv> e.g. --with-gtk2=yes
[21:10] <cmccabe1> tv: I think that --with-gtk2 will abort compilation if GTK2-dev is not found. (I could be wrong though)
[21:10] <Tv> yes
[21:10] <cmccabe1> tv: I do think that --with-gtk2 *should* do that if it currently doesn't.
[21:10] <Tv> and if you don't have it, you say --without-gtk2
[21:11] <cmccabe1> tv: the convention in automake-land is to build with what you've got, even if that's not "all features"
[21:11] <Tv> cmccabe1: and e.g. debian spends a crapload of time working around that idiocy, to have reliable builds
[21:11] <cmccabe1> tv: failing the build because the user doesn't have every possible option generally isn't done
[21:11] <Tv> not every possible
[21:11] <Tv> stuff you consider core
[21:11] <Tv> oh well
[21:11] <Tv> i'll just --with-crapo
[21:13] <cmccabe1> tv: maybe I don't really understand all the issues involved, but wouldn't manually specifying every --with-foo be what debian has to do anyway?
[21:14] <Tv> cmccabe1: because nobody knows what all --with flags are needed, and that can change over time
[21:14] <Tv> it's brittle
[21:14] <Tv> so you have things like suddenly the next version of some app is missing a feature
[21:14] <gregaf> I'm confused
[21:15] <cmccabe1> tv: advocating that the build fail if automake can't find every foo would make the build even more brittle though right
[21:15] <gregaf> your problem is that make install didn't install cfuse?
[21:15] <gregaf> so did you want it to fail and say FUSE not available?
[21:15] <Tv> well that's how i discovered this issue
[21:15] <Tv> yeah
[21:15] <cmccabe1> tv: I guess the brittleness would be up-front and visible rather than hidden and non-obvious?
[21:15] <Tv> and if you really don't want to install fuse development libs, say --without-fuse
[21:16] <Tv> just like help claims gtk2 to be already: --with-gtk2 Build the graphical tools. Default=yes.
[21:16] <Tv> cmccabe1: exactly
[21:16] <Tv> fail fast, fail obviously, provide easy way out
[21:16] <gregaf> but cfuse really isn't that important a piece of Ceph
[21:16] <Tv> also stuff like --with-libatomic-ops -- what am i supposed to do? what's the well tested version? etc
[21:16] <Tv> minimize variability
[21:16] <gregaf> you could run clusters and never want it installed
[21:17] <Tv> gregaf: sure, but the cost of having the -dev package installed and compiling a few files is minimal
[21:17] <cmccabe1> tv: there is a comment that says "You want it!" :)
[21:17] <cmccabe1> tv: #libatomic-ops? You want it!
[21:17] <Tv> cmccabe1: where?
[21:17] <cmccabe1> tv: clearly enterprise-grade documentation
[21:17] <cmccabe1> tv: configure.ac
[21:17] <gregaf> in the autogen file, it doesn't count as documentation
[21:17] <Tv> cmccabe1: no user would ever even see that
[21:17] <cmccabe1> tv: haha. yeah, I know.
[21:17] <Tv> most people will run ceph without atomic-ops, given the current state of the source
[21:18] <gregaf> the defaults are all sane, and everything that's important should be in the packaging as recommends and requires
[21:18] <Tv> so what i'm saying here is
[21:18] <Tv> there's two kinds of things here: features and alternate implementations
[21:19] <Tv> features: it's confusing if they go missing when you accidentally don't have the -dev package installed, where you earlier had the feature included (e.g. compiling on a different machine); it'd be IMHO better to have to say "do NOT include this feature, i don't want to install its dependencies"
[21:20] <Tv> alternate implementations: boolean variables causing combinatorial explosion of variants: debug * tcmalloc * libatomic-ops is already 8 different variants that should all be tested
[21:20] <Tv> it's better if you can remove alternates, or at least make them very very rare (only used by that one freak with HP-UX, etc)
[21:21] <Tv> codify what alternates are the canonical setup, make configure demand those
[21:21] <Tv> not "well it depends on how you built it"
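
Tv's count of 8 variants is just 2^3: three independent on/off build options. A small sketch of that arithmetic is below, assuming each option is a plain boolean; the OPTIONS tuple and the build_variants helper are hypothetical names and are not part of the Ceph build system.

```python
from itertools import product

# Hypothetical on/off build options, named after the three that Tv lists.
OPTIONS = ("debug", "tcmalloc", "libatomic-ops")


def build_variants(options):
    """Return every on/off combination of the given options as a list of dicts."""
    return [dict(zip(options, flags))
            for flags in product((False, True), repeat=len(options))]


variants = build_variants(OPTIONS)
print(len(variants))  # 2 ** 3 == 8
```

Each additional boolean option doubles the count, which is the growth Tv is arguing against.
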
[21:21] <gregaf> you realize these alternates don't impact the code path at all?
[21:22] <gregaf> tcmalloc is a library you link against
[21:22] <Tv> surely they change the behavior of the whole
[21:22] <gregaf> libatomic-ops influences which version of atomic variables we use
[21:22] <Tv> the whole is what we test
[21:22] <Tv> especially for stuff like performance
[21:22] <cmccabe1> gregaf: tv does have a point here
[21:22] <cmccabe1> gregaf: using a different malloc will change memory consumption patterns
[21:22] <cmccabe1> gregaf: potentially hiding or revealing problems
[21:23] <gregaf> yes, and all these kinds of things are or should be handled in the packaging system for end users
[21:23] <gregaf> that defines our canonical setup
[21:23] <cmccabe1> tv: greg also has a point. The .deb and .rpm have "canonical" ./configure settings
[21:23] <gregaf> if you're familiar enough with Ceph to be using make install you are familiar enough with it to figure out which options to set
[21:23] <Tv> gregaf: there's no one packaging system; put them in the part that's shared, e.g. configure
[21:23] <cmccabe1> however, I'd like to point out that most of our users aren't on deb or rpm yet. We keep telling them to pull latest master :)
[21:24] <gregaf> well, I really like that it builds on whatever machine I put it on without my having to bitch at it
[21:24] <Tv> except when you try to use it, and then it's suddenly missing cfuse or radosgw or ...
[21:24] <gregaf> but whining about it here is not a way to make decisions; bring it up on the mailing list or in a team meeting as you like
[21:26] <cmccabe1> tv: posting about this on the mailing list probably does kinda make sense
[21:26] * ablyler (~Adium@ has joined #ceph
[21:26] <Tv> yeah, will try to have a concrete proposal
[21:28] <ablyler> Hi all, I am new to ceph and running into an issue where i can't get out of the degraded state. I have 3 servers that are setup, all running osd, mds, and mon roles.
[21:28] <ablyler> I added each one at a time.
[21:28] <ablyler> Any ideas on how I can get out of the degraded state?
[21:28] <cmccabe1> ablyler: first off, what does ./ceph health tell you?
[21:29] <ablyler> # ceph health
[21:29] <ablyler> 2011-02-11 20:28:54.772711 mon <- [health]
[21:29] <ablyler> 2011-02-11 20:28:54.773632 mon2 -> 'HEALTH_OK' (0)
[21:29] <cmccabe1> ablyler: what's your replication level? If you have fewer osds than that, you'll always be degraded
[21:30] <ablyler> 3
[21:30] <cmccabe1> ablyler: how many OSDs?
[21:30] <ablyler> i have 3 osds and the replication level is set to 3
[21:31] <cmccabe1> ablyler: ok. What kind of output do you get from ./ceph pg stat -o -
[21:31] <ablyler> # ceph pg stat -o -
[21:31] <ablyler> 2011-02-11 20:31:19.619847 mon <- [pg,stat]
[21:31] <ablyler> 2011-02-11 20:31:19.620757 mon1 -> 'v65: 272 pgs: 272 active+clean+degraded; 100 MB data, 7750 MB used, 1718 GB / 1818 GB avail; 132/211 degraded (62.559%)' (0)
[21:32] <ablyler> the other thing that is weird is that ceph -s shows "osd e25: 1 osds: 1 up, 1 in"
[21:32] <cmccabe1> how about ./ceph pg dump -o -
[21:32] <cmccabe1> ablyler: yeah, that does kind of make it look like you only have one OSD rather than 3
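
The pg stat summary pasted above can be picked apart mechanically, which makes the degraded ratio easy to check. The following is a rough sketch that assumes the line keeps the shape seen in this log ("N pgs: <count> <state>, ...; ...; a/b degraded (p%)"); parse_pg_summary is a hypothetical helper and not part of any Ceph tool.

```python
import re


def parse_pg_summary(line):
    """Parse a pg summary line as pasted in this log into counts and a degraded ratio."""
    result = {"total_pgs": None, "states": {}, "degraded_ratio": None}

    # "272 pgs: 272 active+clean+degraded; ..." -> total and per-state counts.
    m = re.search(r"(\d+) pgs: ([^;]*)", line)
    if m:
        result["total_pgs"] = int(m.group(1))
        for part in m.group(2).split(","):
            if not part.strip():
                continue
            count, state = part.split()
            result["states"][state] = int(count)

    # "132/211 degraded (62.559%)" -> 132 / 211.
    m = re.search(r"(\d+)/(\d+) degraded", line)
    if m:
        result["degraded_ratio"] = int(m.group(1)) / int(m.group(2))
    return result


summary = parse_pg_summary(
    "v65: 272 pgs: 272 active+clean+degraded; 100 MB data, 7750 MB used, "
    "1718 GB / 1818 GB avail; 132/211 degraded (62.559%)"
)
print(summary["states"], round(summary["degraded_ratio"] * 100, 3))
```

With a single OSD "in" and a replication level of 3, every PG being reported as degraded is consistent with what cmccabe1 points out.
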
[21:33] <cmccabe1> ablyler: dumb question: have you done a ps to see if the other cosds are running?
[21:33] <ablyler> output of pg dump… http://pastebin.com/WeKiSypv
[21:34] <ablyler> hehe, yea i checked and all the cosds are running
[21:34] <cmccabe1> ablyler: looks like the only cosd that's "in" is osd0
[21:35] <ablyler> weird
[21:35] <cmccabe1> ablyler: can I see your conf?
[21:35] <ablyler> sure
[21:36] <ablyler> http://pastebin.com/XwLmuPkZ
[21:36] <ablyler> the same config is on all 3 servers
[21:38] <cmccabe1> ablyler: 1sec
[21:38] <ablyler> cmccabe1: np, thanks for helping me out
[21:38] <ablyler> :-)
[21:39] <gregaf> ablyler: how did you start your machines up?
[21:40] <ablyler> i started the daemon's via gentoo's init script
[21:41] <gregaf> and you had this ceph.conf set up before you did that?
[21:41] <ablyler> yea
[21:42] <gregaf> and you ran mkcephfs with this ceph.conf?
[21:43] <ablyler> yea
[21:44] <ablyler> i have tried restarting ceph and it says it has started just fine
[21:44] <ablyler> starting osd2 at osd_data /var/data/ceph/osd2 /var/data/ceph/osd2/journal [ ok ]
[21:44] <gregaf> hmm
[21:45] <ablyler> the other server says the same thing for osd1
[21:45] <gregaf> can you give me the full output of ceph -s?
[21:46] <ablyler> sure: http://pastebin.com/U00Fvemu
[21:47] <gregaf> huh, I've never seen an issue with getting OSDs set up where all the MDSes were operational
[21:47] <gregaf> :)
[21:47] <ablyler> hehe, are there any commands that could be run to re-add the osds?
[21:48] <gregaf> there is a sequence for adding OSDs to an existing cluster
[21:48] <gregaf> but that shouldn't be necessary I don't think after a clean mkcephfs
[21:50] <gregaf> so just to be clear
[21:50] <gregaf> you wrote your ceph.conf
[21:50] <gregaf> then you ran mkcephfs with that ceph.conf
[21:51] <gregaf> then you ran init ceph (or whatever gentoo does) with that ceph.conf
[21:51] <gregaf> and now you have this
[21:51] <gregaf> that right, ablyler?
[21:51] <ablyler> maybe i did something wrong
[21:51] <ablyler> the ceph.conf
[21:52] <ablyler> that i started with only had one osd
[21:52] <ablyler> then i ran mkcephfs
[21:52] <ablyler> then i started ceph
[21:52] <gregaf> ah, yes, that's what happened
[21:52] <gregaf> I'm much less confused now
[21:52] <ablyler> then i added the other OSDs
[21:52] <gregaf> to the ceph.conf?
[21:52] <gregaf> and restarted ceph?
[21:52] <ablyler> yea and ran cosd -c /path/to/ceph.conf -i 4 --mkfs --monmap /path/to/monmap (--mkjournal (if you are using a journal))
[21:53] <gregaf> http://ceph.newdream.net/wiki/OSD_cluster_expansion/contraction
[21:53] <gregaf> did you look at that page?
[21:53] <ablyler> i think i just fixed it by running mkcephfs again
[21:53] <ablyler> yea
[21:53] <ablyler> that is what i followed
[21:53] <gregaf> heh, or run mkcephfs, yes
[21:53] <gregaf> you probably forgot to run "ceph osd setmaxosd 3" :)
[21:54] <ablyler> :-)
[21:54] <ablyler> probably
[22:00] <ablyler> gregaf: thanks for the help :-)
[22:00] <gregaf> np!
[22:21] <wido> cmccabe1: When looking at my pastebin, I noticed something weird: http://pastebin.com/Y9g8Ya6p
[22:21] <wido> "osd.1[1974]: ileStore: sync_entry timed out after"
[22:22] <wido> the first char of FileStore didn't make it to the log
[22:22] <wido> "osd.1[1974]: n thread 7ff254037700"
[22:22] <wido> the "i" is missing there too
[22:28] <sagewk> sjust gregaf: anybody on sepia?
[22:29] <gregaf> not me
[22:29] <sjust> not me
[22:29] <Tv> not me
[22:30] <Tv> mmm autoserv accounts for everyone & then we can use its locking to reserve the machines
[22:30] <sagewk> :)
[22:33] <Tv> so what are the machine-usable ways to check ceph cluster health? is "ceph health" what i really should be using, will it reliably exit 1 on problems, etc
[22:33] <sagewk> it will exit 1 if it can't connect to determine health
[22:34] <sagewk> or exit 0 and give you a "HEALTH_{OK,WARN,ERR} blah blah" string
[22:34] <Tv> ok i can grep that
[22:34] <sagewk> in master, ceph --concise health will strip out all the timestamp and <- crap and _just_ print the string
[22:36] <sagewk> er, it will as soon as i fix my commit that is :)
[22:36] <Tv> hehe
[22:43] * mnigh (~mnigh@75-128-161-124.static.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[22:44] <Tv> sagewk: since that is way nicer to parse, i'm waiting for your commit..
[22:45] <sagewk> pushed
[22:46] <Tv> Go Thunderbirds, go! Err, I mean, gitbuilder!
[23:00] <Tv> oh heh sometimes " blah blah" isn't there
[23:01] <Tv> need to allow for not having a space after the token
[23:01] <sagewk> in the HEALTH_OK case probably?
[23:01] <Tv> yea
[23:01] <Tv> oh funky while i was looping for it to be healthy it went back to degraded
[23:02] <Tv> HEALTH_WARN Some PGs are: degraded
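
The parsing detail Tv ran into (no space after the token in the HEALTH_OK case) can be handled by splitting on the first space only. This is a minimal sketch, assuming the concise output is a single line that starts with one of the three tokens sagewk lists; parse_health is a hypothetical helper, not part of the ceph tool.

```python
HEALTH_TOKENS = ("HEALTH_OK", "HEALTH_WARN", "HEALTH_ERR")


def parse_health(line):
    """Split a concise health line into (token, detail); detail may be empty."""
    # partition() returns an empty detail when there is no space after the
    # token, which covers the bare "HEALTH_OK" case.
    token, _, detail = line.strip().partition(" ")
    if token not in HEALTH_TOKENS:
        raise ValueError("unrecognised health line: %r" % line)
    return token, detail


print(parse_health("HEALTH_OK"))
print(parse_health("HEALTH_WARN Some PGs are: degraded"))
```

Since sagewk says the command exits 0 whenever it can reach the cluster, a script polling for health would need to compare the token rather than rely on the exit status alone.
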
[23:04] <cmccabe1> tv: it might be nice to have some simple interface to lock things through autoserv, even without using the rest of the testing infrastructure
[23:04] <Tv> <3 the whole just click Abort, Clone job, Submit to get a rerun; it reboots the worker if needed etc
[23:04] <Tv> cmccabe1: that already exists
[23:04] <cmccabe1> tv: excellent. any docs?
[23:05] <cmccabe1> tv: or maybe it's simple enough not to need docs
[23:05] <Tv> Using worker machines manually
[23:05] <Tv> ==============================
[23:05] <Tv> You can use the autotest worker machines for manual testing, by
[23:05] <Tv> *locking* them in the web user interface, or on the command line with
[23:05] <Tv> ``atest host mod --lock``. Remember to unlock them when done.
[23:05] <Tv> that's from the upcoming README
[23:05] <cmccabe1> tv: k
[23:06] <cmccabe1> wido: will check up on that first char thing
[23:06] <wido> cmccabe1: k, I checked the FileStore code, it's fine there.
[23:06] <wido> Might be a syslog issue (the syslog daemon), but it's weird
[23:06] <cmccabe1> wido: I think I know what it is...
[23:10] * ablyler (~Adium@ Quit (Quit: Leaving.)
[23:23] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[23:36] * monrad (~mmk@domitian.tdx.dk) Quit (Quit: bla)
[23:36] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[23:41] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.