[0:37] <cmccabe> gregaf: ok, should be resolved
[0:37] <cmccabe> gregaf: sorry about that, I forgot that those two were inter-dependent
[0:38] <cmccabe> gregaf: at least this eliminates another confusing corner case... log_per_instance now works with log_file.
[0:42] <Tv> ok who uses and did not register it in uebernet :(
[0:42] <cmccabe> I use that IP, but I'm pretty sure I registered it in uebernet
[0:42] <Tv> nope, it was available
[0:42] <Tv> or uebernet is buggy
[0:43] <cmccabe> someone helped me set that up, I forget exactly how
[0:43] <Tv> https://uebernet.dreamhost.com/index.cgi?tree=admin.network&ip=
[0:44] <Tv> cmccabe: fix the machine name, that's all you need to do now
[0:45] <cmccabe> I remember changing this already
[0:45] <cmccabe> I can probably even find the chat room conversation... I had to ask for help a lot since I've never used this system before
[0:48] <cmccabe> (12:37:45 PM) colinm@newdream.net/Home: yehuda suggested https://uebernet.dreamhost.com/index.cgi?tree=admin.network&&
[0:48] <cmccabe> command=editnetwork&network=
[0:48] <cmccabe> (12:37:56 PM) colinm@newdream.net/Home: and
[0:48] <cmccabe> ...
[0:48] <cmccabe> (12:39:16 PM) tv@newdream.net: please make the comment say it's a vm etc, otherwise we'll be hunting the owner of that
[0:48] <cmccabe> mac address some months from now
[0:48] <cmccabe> (12:39:31 PM) colinm@newdream.net/Home: yeah
[0:48] <cmccabe> tv: so apparently... it got cleared somehow after that.
[0:48] <cmccabe> tv: well, I dunno.
[1:29] <Tv> cmccabe: did you really mean this in vstart?
[1:29] <Tv> - log file = out/\$name.log
[1:29] <Tv> + log file = out/\$host
[1:36] <Tv> autotest server is out of space again
[1:36] <Tv> cleaning
[1:36] <Tv> (and hopefully soon the automatic gzipping change will start working!)
[2:02] <cmccabe> tv: vstart just controls some defaults. In this case, I wanted to use log file + log per instance
[2:03] <cmccabe> tv: since log_dir was pasting on a hostname, log_file needed to do the same here
[2:03] <cmccabe> tv: so yes
[9:08] <chraible> hi, I compiled & configured ceph 0.26 today and now I want to start ceph with /etc/init.d/ceph -a start or service ceph -a start, but the service is unknown and in /etc/init.d there is no such file!
[9:11] <wido> chraible: I'm not sure if make && make install places the init file for you
[9:11] <wido> you might want to grab it from the source directory
[9:12] <chraible> k I will try
[9:12] <wido> ceph-init is the name I think
[9:13] <wido> chraible: src/init-ceph
[9:14] <chraible> when i do ceph -c /etc/ceph/ceph.conf start following message is shown: 2011-04-14 09:19:23.773715 40596940 -- :/12245 >> pipe(0xc285e0 sd=4 pgs=0 cs=0 l=0).fault first fault
[9:14] <chraible> init-ceph is not in the directory
[9:14] <wido> chraible: Just saw your ml message
[9:14] <wido> 'ceph' is the client binary, that's only used for monitoring/managing your cluster
[9:14] <chraible> k
[9:14] <wido> the init script should start your daemons like cosd, cmds and cmon
[9:15] <wido> on both nodes, ceph0 and ceph1 you should have the init script, it's in the source tree
[9:15] <wido> copy it to /etc/init.d/ceph and chmod +x
[9:16] <chraible> great :) thx thx thx :D
[9:16] <chraible> === osd.2 ===
[9:16] <chraible> Mounting Btrfs on ceph2:/data/osd2
[9:16] <chraible> Scanning for Btrfs filesystems
[9:16] <chraible> Starting Ceph osd.2 on ceph2...
[9:16] <chraible> ** WARNING: Ceph is still under heavy development, and is only suitable for **
[9:16] <chraible> ** testing and review. Do not trust it with important data. **
[9:16] <chraible> starting osd2 at osd_data /data/osd2 /data/osd2/journa
[9:17] <wido> yes, that's more like it
[9:17] <wido> btw, you have 2 monitors, it's recommended to have an odd number of monitors
[9:17] <wido> so 1 or 3
[9:18] <wido> a small voting system is used between the monitors, when you have an even number of monitors that could cause trouble
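The arithmetic behind wido's odd-number advice can be sketched as follows. This assumes the monitors need a strict majority to agree, which the log only describes as "a small voting system". The function names are hypothetical and not part of Ceph.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority among `monitors` voters (assumed voting rule)."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can be down while a strict majority is still up."""
    return monitors - quorum_size(monitors)


# Compare small cluster sizes: monitors, votes needed, failures tolerated.
for n in (1, 2, 3, 4):
    print(n, quorum_size(n), tolerated_failures(n))
```

Under that assumption, 2 monitors tolerate no more failures than 1 does, and 4 tolerate no more than 3, so an even count adds machines without adding resilience.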
[9:19] <chraible> ok thanks for help :)
[9:19] <wido> np
[9:19] <wido> got to go!
[9:21] * chraible (~chraible@blackhole.science-computing.de) has joined #ceph
[9:31] <chraible> k next question ^^
[9:32] <chraible> when I now want to mount ceph ( mount.ceph /mnt/ceph/ ) with kernel client I got following error :
[9:32] <chraible> FATAL: Module ceph not found.
[9:32] <chraible> mount.ceph: modprobe failed, exit status 1
[9:32] <chraible> mount error: ceph filesystem not supported by the system
[9:32] <chraible> but an lsmod | grep ceph shows: libceph 98864 0
[9:34] <chraible> what did I do wrong?
[16:26] <bchrisman> chraible: ceph and libceph modules should be loaded
[16:26] <bchrisman> ceph 319502 1
[16:26] <bchrisman> libceph 200477 1 ceph
[16:26] <bchrisman> libcrc32c 1196 2 libceph,btrfs
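A quick way to compare the two lsmod outputs pasted above is to parse the first column and check for both module names. This is only a sketch: the helper names are hypothetical, and it assumes lsmod-style text where each line starts with the module name.

```python
def loaded_modules(lsmod_output: str) -> set[str]:
    """Collect module names (first column) from lsmod-style text."""
    names = set()
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":  # skip the header row if present
            names.add(fields[0])
    return names


def missing_ceph_modules(lsmod_output: str) -> list[str]:
    """Return which of the two kernel client modules are not loaded."""
    required = ["libceph", "ceph"]
    present = loaded_modules(lsmod_output)
    return [name for name in required if name not in present]


# chraible's output: only libceph is loaded.
print(missing_ceph_modules("libceph 98864 0"))
# bchrisman's output: both modules are loaded.
print(missing_ceph_modules(
    "ceph 319502 1\nlibceph 200477 1 ceph\nlibcrc32c 1196 2 libceph,btrfs"
))
```

On chraible's machine this reports the ceph module as missing, which matches the "Module ceph not found" error from modprobe.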
[19:23] <gregaf> cmccabe: others: fyi standup will be later today, management meetings
[20:26] <bchrisman> mkcephfs.in line 276: 276: echo mkfs.btrfs $btrfs_devs
[20:27] <bchrisman> haven't always been noticing this because most of the systems on which I'm testing already have btrfs on the same partition layouts from prior tests.
[20:27] <bchrisman> tried to track this down in git, but it seems to have been in some commit which changed most of the file
[20:28] <Tv> bchrisman: i see that
[20:28] <Tv> the relevant lines before that commit were
[20:28] <Tv> if [ $mkbtrfs -eq 1 ]; then
[20:28] <Tv> - get_conf osd_user "root" "user"
[20:28] <Tv> - do_root_cmd "umount $btrfs_path ; for f in $btrfs_devs ; do umount \$f ; done ; modprob
[20:28] <Tv> no echo there
[20:28] <Tv> err that got truncated
[20:29] <cmccabe> bchrisman: sage rewrote most of mkcephfs recently, maybe that's the commit you're thinking of
[20:29] <Tv> - do_root_cmd "umount $btrfs_path ; for f in $btrfs_devs ; do umount \$f ; done ; modprobe btrfs ; mkfs.btrfs $btrfs_devs ; modprobe btrfs ; btrfsctl -a ; mount -t btrfs $btrfs_opt $first_dev $btrfs_path ; chown $osd_user $btrfs_path ; chmod +w $btrfs_path "
[20:30] <Tv> so it seems the rewrite did add the echo with no explanation, perhaps by mistake (leftover debug thing)
[20:31] <Tv> bchrisman: sage is in a meeting right now, hopefully we can confirm that in an hour or so
[20:32] <wido> bchrisman: that mkfs thing was fixed some time ago
[20:32] <wido> I noticed it too, sent a patch to the ml, was about 2 weeks ago I think
[20:32] <Tv> wido: huh..
[20:33] <Tv> $ git show newdream/master:src/mkcephfs.in|grep mkfs.btr
[20:33] <Tv> echo mkfs.btrfs $btrfs_devs
[20:33] <bchrisman> yeah.. it's there… we don't always notice it because when we blast new builds on, the disks were formatted the same with previous OS instance..
[20:33] <sjust> wido: did you get a chance to try that branch?
[20:33] <Tv> wido: i think sage dropped your patch accidentally
[20:34] <wido> Tv: Think so, can't find it in the history
[20:34] <bchrisman> wido: Tv yeah it sounded familiar to something on the ml (must've been your post)
[20:34] <wido> sjust: Oh yes, I updated #990. Still seeing the crash, bit different backtrace though (line numbers)
[20:34] <sjust> ah, ok
[20:35] <Tv> wido, bchrisman: ok since we have sage's ack, i'm just gonna re-commit the patch
[20:35] <wido> http://marc.info/?l=ceph-devel&m=130159486220285&w=2
[20:35] <wido> It's there :)
[20:36] <wido> sjust: I can bring my cluster up, 38 OSD's come up, but slowly they start to die again
[20:36] <sjust> wido: I'm looking at the new backtrace
[20:39] <Tv> bchrisman, wido: in master
[20:39] <sjust> wido: seems I was wrong about the cause of the error, I don't suppose you had OSD debugging on when you retried?
[20:42] <wido> sjust: No, but I could start some OSD's with debug at max
[20:42] <wido> see what happens?
[20:42] <sjust> that would help :)
[20:42] <sjust> thanks!
[20:42] <wido> debug osd enough? Or do you need filestore as well?
[20:42] <sjust> OSD should be enough
[20:57] <Tv> gregaf: my recent push just caused a bounce from the commits mailing list
[21:03] <Tv> whee code paths that have not been run before! brace for impact!
[21:09] <wido> sjust: Ok, got all the OSD's running again
[21:10] <wido> it's already getting evening here, so I'll go afk in a moment
[21:10] <wido> If you want to take a look, log on to "root@logger.ceph.widodh.nl" and go to "monitor" (ssh monitor)
[21:11] <wido> my logs are all being sent through remote-syslog to noisy in /var/log/remote/ceph/osd|mds|mon.log
[21:11] <wido> right now I'm seeing the "wrongly marked me down!" messages, but that's due to the atoms which are lacking CPU power I guess
[21:28] <sjustlaptop> ok, taking a look
[21:35] <wido> sjustlaptop: Seems I'm hitting some other bugs too, seeing a PG::do_peer issue on atom2 as well
[21:35] <wido> "btrfs csum failed ino 321117 off 1662976 csum 476747282 private 1198268964" that doesn't seem right either
[21:35] <sjustlaptop> hmmm
[21:36] <gregaf> Tv: what address got bounced?
[21:36] <Tv> Return-Path: <tv42@ceph-tracker.dreamhostps.com>
[21:37] <gregaf> ah
[21:37] <gregaf> used a pretty simple regex for whitelisting, it wasn't prepared for your digits :)
[21:38] <gregaf> should work in the future
[21:38] <Tv> hey don't blame me for not being able to get the username i want ;)
[21:38] <gregaf> not blaming!
[21:38] <Tv> hehe
[21:38] <gregaf> just amused
[21:38] <gregaf> our panel admin system is hilarious fun
[21:38] <Tv> at least it allowed 4 characters
[21:39] <Tv> my next fallbacks are magic addresses from the commodore 64
[21:39] <Tv> sys64738 is still strong in my fingers
[21:51] <wido> sjustlaptop: you might see, the osd's on atom0 are all dying
[21:52] <sjustlaptop> wido: yeah, looking at the atom2 crash right now, actually
[21:56] <sjustlaptop> wido: what was the osd logging level?
[21:58] <wido> sjustlaptop: 20
[21:58] <wido> The ceph.conf is the same on all the nodes: http://zooi.widodh.nl/ceph/ceph.conf
[21:58] <sjustlaptop> wido: ok
[21:59] <wido> sjustlaptop: I just checked noisy, bwm-ng showed 25MB/sec input due to all the logging. Might be some packets got lost?
[21:59] <wido> Although it's a Gbit network
[22:00] <sjustlaptop> I'm just trying to get my bearings with the logs. Do the backtraces usually show up in the logs?
[22:01] <wido> sjustlaptop: they sometimes show up in the logs, but haven't seen them with these latest crashes
[22:05] <sjustlaptop> yeah, can the logger drop entries?
[22:06] <wido> It shouldn't, but it could be? 25MB/sec is a lot of text
[22:06] <cmccabe> sjustlaptop: syslog can drop entries, doutstreambuf cannot
[22:10] <sjustlaptop> cmccabe, that's what I meant
[22:10] <sjustlaptop> wido: hmm, we seem to be losing logs
[22:11] <cmccabe> sjustlaptop: k
[22:11] <cmccabe> sjustlaptop: one quick hack might be to bump up the priority of the syslogs that dout sends
[22:12] <cmccabe> sjustlaptop: I vaguely remember that there's also some way to get syslog to use TCP, which would probably eliminate the message-dropping ?
[22:12] <sjustlaptop> cmccabe: sorry, I'm not terribly familiar with the newer dout stuff. Is that priority controlled by dout or syslog
[22:13] <sjustlaptop> cmccabe: TCP should eliminate message dropping, the problem seems to be that syslog uses UDP
[22:13] <wido> cmccabe: syslog over TCP is possible indeed, rsyslog supports it
[22:13] <cmccabe> sjustlaptop: it's controlled by dout_prio_to_syslog_prio
[22:13] <cmccabe> sjustlaptop: the TCP approach might be the better one though
[22:13] <cmccabe> sjustlaptop: of course this will slow down the program since it has to wait for all those logs to be sent, rather than just breezing along!
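The UDP versus TCP choice discussed here can be illustrated with Python's standard `logging.handlers.SysLogHandler`, which accepts a `socktype` argument. This is a generic sketch, not how Ceph's dout or wido's rsyslog setup is configured. The helper name and the host in the usage comment are hypothetical.

```python
import logging
import logging.handlers
import socket


def make_syslog_handler(host: str, port: int, use_tcp: bool) -> logging.handlers.SysLogHandler:
    """Build a remote syslog handler over TCP (reliable) or UDP (may drop)."""
    socktype = socket.SOCK_STREAM if use_tcp else socket.SOCK_DGRAM
    return logging.handlers.SysLogHandler(address=(host, port), socktype=socktype)


# Usage sketch (hypothetical host; the receiver must listen on TCP):
#   logger = logging.getLogger("osd")
#   logger.addHandler(make_syslog_handler("loghost.example", 514, use_tcp=True))
#   logger.warning("message that should not be silently dropped")
```

With TCP the sender needs a reachable listener and can block when the receiver is slow, which is the slowdown cmccabe mentions. With UDP the sender never waits, but datagrams can be lost under load.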
[22:14] <sjustlaptop> wido: could you switch it to TCP? Or actually if a large number of the osds are failing, we could just enable logging on 3-4.
[22:14] <wido> sjustlaptop: Yeah, we could do that on a few hosts
[22:14] <wido> let me rewrite the config
[22:14] <sjustlaptop> cool
[22:14] <wido> I'll enable local logging on atom0, 1, 2 and 3
[22:15] <wido> and start all the OSD's again
[22:16] <sjustlaptop> ok, thanks
[22:17] <sjustlaptop> the logs will show up in atom[0123]:/var/log/ceph ?
[22:20] <wido> sjustlaptop: Yes, they should
[22:20] <sjustlaptop> ok
[22:21] <wido> having a small config issue with them, the override in the [osd8] section for example doesn't seem to overrule the general osd config
[22:22] <sjustlaptop> wido: hrm
[22:23] <wido> sjustlaptop: http://zooi.widodh.nl/ceph/ceph.conf
[22:23] <wido> that should work, shouldn't it?
[22:24] <sjustlaptop> hmm, you probably have to add log/clog to syslog = false
[22:25] <sjustlaptop> I think it ignores log dir if log to syslog is true
[22:27] <wido> I added log dir = <empty> and file to prevent them from logging local
[22:27] <wido> syslog = true sent the logs to syslog as well, but will still log local
[22:28] <sjustlaptop> oh
[22:28] <wido> I've just added the 'false' for both clog and syslog, but the osd still doesn't log locally
[22:28] <wido> Seems to me that log dir/file are not overruled and stay empty
[22:28] <wido> could be a bug? I set both values to nothing, then they never get overruled?
[22:29] <sjustlaptop> cmccabe: the [osd?] sections are supposed to override the [osd] section, right?
[22:29] <cmccabe> sjustlaptop: yes
[22:29] <sjustlaptop> wido: yeah, it's almost certainly a bug
[22:29] <sjustlaptop> i
[22:29] <sjustlaptop> *I'll take a look
[22:29] <cmccabe> sjustlaptop: well, actually, it might depend on the order they appear in the configuration file; I'd have to check that
[22:30] <sjustlaptop> in this case it's [osd] followed by [osd8]
[22:30] <wido> cmccabe: config in question: http://zooi.widodh.nl/ceph/ceph.conf
[22:30] <wido> a few OSD's in that config should also log locally, beside their syslog
[22:32] <cmccabe> hmm
[22:32] <wido> sjustlaptop: For now I'll let them log locally, so you can hunt further, but the disk will fill up rather quickly with debug at 20
[22:32] <sjustlaptop> ok
[22:32] <cmccabe> I think there might be a bug here
[22:33] <cmccabe> we probably want to always force the override order [ global, mds, mdsa, mds.a ]
[22:33] <sjustlaptop> cmccabe: yes
[22:33] <sjustlaptop> how does it work now?
[22:34] <cmccabe> actually maybe it's doing the right thing here
[22:34] <cmccabe> except that mdsa takes priority over mds.a, which is odd
[22:35] <cmccabe> oh, actually not. Forget that.
[22:35] <wido> Ok, local logging is set up again
[22:35] <sjust> wido: ok, thanks
[22:35] <wido> all nodes log locally, I hope there is enough diskspace before they crash
[22:35] <sjust> me too
[22:35] <cmccabe> yeah, I'm pretty sure overriding should work as planned
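The override order cmccabe describes ([ global, mds, mdsa, mds.a ]) can be sketched as a lookup that walks sections from least to most specific. This is an illustration of the intended behavior only, not Ceph's actual config parser. The function names and the log path are hypothetical.

```python
def section_chain(daemon_type: str, daemon_id: str) -> list[str]:
    """Sections from least to most specific, e.g. global, osd, osd8, osd.8."""
    return [
        "global",
        daemon_type,
        f"{daemon_type}{daemon_id}",
        f"{daemon_type}.{daemon_id}",
    ]


def resolve(config: dict[str, dict[str, str]], daemon_type: str, daemon_id: str, key: str):
    """Return the value from the most specific section that defines `key`."""
    value = None
    for section in section_chain(daemon_type, daemon_id):
        if key in config.get(section, {}):
            value = config[section][key]  # a more specific section wins
    return value


# Shape of wido's case: [osd] sets an empty log file, [osd8] sets a real one.
config = {
    "osd": {"log file": "", "log to syslog": "true"},
    "osd8": {"log file": "/var/log/ceph/osd8.log"},  # hypothetical path
}
print(resolve(config, "osd", "8", "log file"))
print(resolve(config, "osd", "9", "log file"))
```

The lookup tests whether a key is present rather than whether its value is non-empty, so an empty value in [osd] is still replaced by the [osd8] setting. A parser that treated empty values specially is the kind of behavior wido suspects.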
[22:37] <wido> cmccabe: I'll debug it a bit more tomorrow, see what I can come up with
[22:37] <wido> going afk now
[22:37] <cmccabe> what's the behavior you're seeing?
[22:37] <sjust> ok, thanks for the help!
[22:37] <wido> Well, I set a log file and log dir for a few specific OSD's
[22:37] <wido> but they do not log locally
[22:38] <cmccabe> do they log to syslog?
[22:38] <wido> seems that the global setting isn't overruled
[22:38] <wido> yes, they do. My guess, since I set dir/file to NULL, they are somehow ignored further down
[22:38] <cmccabe> you may find an error message in syslog like "couldn't open log file foo" if it is still logging to syslog.
[22:38] <wido> Hmm, good point, I'll try
[22:38] <cmccabe> remember that setting log file doesn't clear log to syslog
[22:38] <wido> No, I know
[22:38] <cmccabe> if you want to clear log to syslog for those osds, you must do it explicitly.
[22:39] <wido> It was the other way around, but I get it
[22:39] <wido> enough for today, ttyl!
[22:39] <cmccabe> try searching for "failed to open log file"
[22:39] <cmccabe> ok later
[22:39] <wido> and sjustlaptop feel free to play around
[22:39] <wido> tnx!
[22:39] <sjust> ok!
