#ceph IRC Log


IRC Log for 2011-06-25

Timestamps are in GMT/BST.

[0:12] * slang (~slang@chml01.drwholdings.com) Quit (Ping timeout: 480 seconds)
[0:30] <cmccabe> it seems like testceph hasn't worked in a while
[0:30] <cmccabe> it doesn't work in 0.29
[0:30] <cmccabe> the last good version I can find so far is 0.28.1
[0:31] <cmccabe> I'm just going to open up readdir and see what seems to be going on
[0:38] <cmccabe> it looks like it's hanging in Client::unmount
[0:39] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[0:53] * aliguori (~anthony@ Quit (Quit: Ex-Chat)
[1:01] * lxo (~aoliva@9YYAABOY5.tor-irc.dnsbl.oftc.net) Quit (Read error: Connection reset by peer)
[1:02] * lxo (~aoliva@09GAAE4U8.tor-irc.dnsbl.oftc.net) has joined #ceph
[1:02] <gregaf1> cmccabe: you want to assign that testceph bug to me?
[1:04] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[1:04] <gregaf1> stingray: I looked at your journal, there's a splotch of ~100 zeros that should be filled in but everything seems good before that :/
[1:04] <gregaf1> nothing I can diagnose without the rest of the cluster, though
[1:05] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[1:07] <cmccabe> gregaf: I think sage said he was looking at it, you should ask him if he's done anything yet
[1:07] <cmccabe> gregaf: in the interest of non-work-duplication
[1:07] <cmccabe> gregaf: for what it's worth, it reproduces even with only 1 mds
[1:10] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[1:10] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit ()
[1:11] <gregaf1> cmccabe: he said he was looking at it after you bisected?
[1:11] <cmccabe> I didn't really succeed in bisecting
[1:11] <cmccabe> unlike for the testrados bug where I quickly tracked it down
[1:12] <cmccabe> testceph kind of went through periods of working and non-working, which is a mess for the purposes of bisecting
[1:12] <gregaf1> yeah
[1:12] <gregaf1> I just saw the end of it was a few pinned inodes
[1:12] <cmccabe> I know it worked in 0.28.1 and not in 0.29, that's about all I can say
[1:12] <gregaf1> and I've worked on a lot of bugs in that area
[1:13] <cmccabe> it's probably simpler to just fix the thing and try to find the commit after
[1:13] <cmccabe> it's 100% reproducible
[1:13] <gregaf1> yeah
[1:13] <cmccabe> probably someone forgot a put() somewhere
[1:13] <gregaf1> well, Sage isn't here right now so I'm going to do it
[1:14] <cmccabe> makes sense
[1:30] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[1:41] * sugoruyo (~george@athedsl-408632.home.otenet.gr) Quit (Quit: sugoruyo)
[1:57] <gregaf1> cmccabe: ungh, --log-file doesn't work with testceph?
[1:57] <cmccabe> it should...
[1:58] <gregaf1> oh
[1:58] <gregaf1> it does, but it spews to stdout/stderr too
[1:58] <cmccabe> try --log-to-stderr=0
[1:58] <gregaf1> not that I remember the logging rules but that seems off
[1:59] <cmccabe> logging to any one sink is independent of logging to any other
[1:59] <gregaf1> yeah, looks like that does it
[1:59] <cmccabe> think of it this way: turning on the hot water doesn't turn off the cold
[1:59] <cmccabe> flushing the toilet doesn't turn off the sink
[1:59] <cmccabe> etc, etc
[1:59] <gregaf1> yeah, yeah, except that when you run a CLI program you're unlikely to want it filling out however many MB of logging to stderr
[2:00] * MarkN (~nathan@ Quit (Ping timeout: 480 seconds)
[2:00] <cmccabe> well, I think for CLI programs you usually want stderr
[2:01] <gregaf1> for warnings, not for logging :)
[2:01] <cmccabe> I mean if you don't want stderr and do want a logfile, the time-honored way to do it is foo 2> /my/log/file
[2:01] <gregaf1> w/e, it's not likely to bother anybody outside our team
[2:01] <cmccabe> the other option is --log-to-stderr=1, which will log only high-priority messages to stderr
[2:02] <cmccabe> that is the default for daemons
[2:03] <cmccabe> I guess we could make it the default for everything, but I think it would lead to a lot of "where did my logging go?" questions when people run command-line programs
[2:04] <cmccabe> we could also create an --only-log-to-file=logfile option
[2:05] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Remote host closed the connection)
[2:05] <cmccabe> I think if I were designing the system from scratch, I would simply have a single --log=<foo> option for every type of logging
[2:06] <cmccabe> then that option would take a comma-separated list of log sinks. So you would do --log=stderr,/tmp/out,syslog to log to stderr, a file, and syslog all at once
[2:06] <cmccabe> and you would do --log= or --log=none to turn off all logging
[2:06] <cmccabe> then if you want to log to a file which happened to be named stderr, you can do --log=./stderr
[2:07] <cmccabe> that would eliminate all confusion I think. Similarly, in the config files, there would be just one line for all logging. If you specify multiple ones the last one wins
[2:07] <cmccabe> anyway, ... everyone is used to the current behavior so I don't think anything is going to change.
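
The single `--log=<list>` option cmccabe describes above was only a proposal, not something Ceph implements. A minimal C++ sketch of how such a value might be parsed follows. The names `SinkKind`, `LogSink`, and `parse_log_sinks` are hypothetical and chosen for illustration; the rules (comma-separated sinks, empty or `none` disables logging, any other token is a file path so `./stderr` names a file) are taken from the discussion.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sink kinds for the proposed single --log=<list> option.
enum class SinkKind { Stderr, Syslog, File };

struct LogSink {
    SinkKind kind;
    std::string path;  // only meaningful when kind == SinkKind::File
};

// Parse the value of the hypothetical --log= option into a list of sinks.
// An empty value or "none" yields an empty list (all logging off).
std::vector<LogSink> parse_log_sinks(const std::string& value) {
    std::vector<LogSink> sinks;
    if (value.empty() || value == "none") {
        return sinks;
    }
    std::istringstream in(value);
    std::string item;
    while (std::getline(in, item, ',')) {
        if (item.empty()) {
            continue;  // tolerate stray commas
        }
        if (item == "stderr") {
            sinks.push_back({SinkKind::Stderr, ""});
        } else if (item == "syslog") {
            sinks.push_back({SinkKind::Syslog, ""});
        } else {
            // Anything else is treated as a file path, so "./stderr"
            // refers to a file named stderr in the current directory.
            sinks.push_back({SinkKind::File, item});
        }
    }
    return sinks;
}
```

With this sketch, `parse_log_sinks("stderr,/tmp/out,syslog")` returns three sinks in the order given, and `parse_log_sinks("./stderr")` returns a single file sink, which matches the disambiguation cmccabe suggests.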
[2:08] * MarkN (~nathan@ has joined #ceph
[2:30] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:35] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:37] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[2:43] * gregaf1 (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:49] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:53] * cmccabe (~cmccabe@ has left #ceph
[2:54] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[3:16] <stingray> gregaf: bah. the rest of the cluster is already in a completely different state but I'm sure I'll reproduce it at some point
[3:32] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[3:33] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[5:27] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[5:32] <iggy> 6
[5:33] <iggy> oops... i blame the phone keyboard
[5:39] * EricCampbell (~michael@ has joined #ceph
[5:40] <EricCampbell> Afternoon all.
[5:41] <EricCampbell> Just wondering if anyone else is running ceph on debian, don't care what dist..
[5:41] <greglap> EricCampbell: yep, all the development is done on Debian :)
[5:42] <EricCampbell> When I do an apt-get ceph with the newdream repo, i get ceph v29, but when i do an apt-get source i get v25 code.. what's with that?
[5:43] <greglap> uh, not sure...
[5:43] <greglap> I didn't know we had a repo set up for source
[5:43] <greglap> if you want to look at the code you should clone the git repository :)
[5:45] <EricCampbell> http://ceph.newdream.net/debian/dists/
[5:45] <EricCampbell> its got binaries and source. i gather that this area is no longer maintained?
[5:46] <greglap> the binaries are up-to-date, or at least reasonably so
[5:46] <greglap> newest tagged release is .29.1
[5:46] <greglap> but honestly I don't know anything about packaging, I can poke Sage about it Monday
[5:47] <EricCampbell> is the equivalent tarball this? http://ceph.newdream.net/download/ceph-0.29.1.tar.gz
[5:47] <greglap> yah
[5:47] <EricCampbell> sweet.
[5:47] <EricCampbell> and in regards to the repo, its mentioned here.. http://ceph.newdream.net/wiki/Debian
[5:48] <greglap> yeah, but I don't see anything about source apt-get there :)
[5:48] <EricCampbell> That is true. The reason I added in the source is because i kept getting segfaults out of the repo binaries.
[5:49] <greglap> which doesn't mean it doesn't exist, there are lots of things like that which somebody asked for once and are kept in varying states
[5:49] <greglap> segfaults?
[5:49] <greglap> that's unexpected, did you keep any backtraces handy?
[5:50] <EricCampbell> after i did the apt-get install ceph i just tried ceph -v
[5:50] <EricCampbell> ceph -v
[5:50] <EricCampbell> Segmentation fault
[5:50] <greglap> hmmm
[5:51] <EricCampbell> so i compiled it out of the tarball...
[5:51] <EricCampbell> root@debian:/usr/local/src/ceph-0.29.1# gdb ./src/ceph
[5:51] <EricCampbell> GNU gdb (GDB) 7.2-debian
[5:51] <EricCampbell> Copyright (C) 2010 Free Software Foundation, Inc.
[5:51] <EricCampbell> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
[5:51] <EricCampbell> This is free software: you are free to change and redistribute it.
[5:51] <EricCampbell> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
[5:51] <EricCampbell> and "show warranty" for details.
[5:51] <EricCampbell> This GDB was configured as "i486-linux-gnu".
[5:51] <EricCampbell> For bug reporting instructions, please see:
[5:51] <EricCampbell> <http://www.gnu.org/software/gdb/bugs/>...
[5:51] <EricCampbell> Reading symbols from /usr/local/src/ceph-0.29.1/src/ceph...done.
[5:51] <EricCampbell> (gdb) run
[5:51] <EricCampbell> Starting program: /usr/local/src/ceph-0.29.1/src/ceph
[5:51] <EricCampbell> [Thread debugging using libthread_db enabled]
[5:51] <EricCampbell> Program received signal SIGSEGV, Segmentation fault.
[5:51] <EricCampbell> PGMap (this=0x81bfa00) at ./mon/PGMap.h:240
[5:51] <EricCampbell> 240 nearfull_ratio(((float)g_conf.mon_osd_nearfull_ratio)/100) {}
[5:52] <EricCampbell> sorry about the cut and paste should have asked..
[5:52] <greglap> it's possible somebody adjusted the config setup, it wouldn't do anything useful without a cluster running anyway but it shouldn't segfault
[5:52] <EricCampbell> just don't know if i am doing something dumb, or if it something to do with it running on an esx host.
[5:52] <EricCampbell> running gdb with run -v does the same thing...
[5:53] <greglap> we're still getting a proper qa setup so things that wouldn't work don't get regularly tested for nice behavio
[5:53] <greglap> *behavior
[5:53] <EricCampbell> kk.
[5:53] <EricCampbell> i have one physical box running debian stable that's currently compiling the tarball...
[5:54] <EricCampbell> i have a set of 3 virtuals for both squeeze, wheezy and sid debian distro's...
[5:55] <EricCampbell> the squeeze vm's compile and run ceph without segfaulting, but the kernel doesn't support the filesystem.
[5:55] <greglap> I don't recall what kernel version is in there?
[5:56] <EricCampbell> i do a rbd create foo --size 512 and it keeps telling me it couldn't create a block device.
[5:56] <EricCampbell> its 2.6.32, and ceph support appears after 2.6.34
[5:56] <greglap> yeah
[5:56] <greglap> you'll need to get something newer to use the kernel bits :(
[5:56] <greglap> you can test it out with the FUSE client too, though
[5:56] <greglap> that works fine on whatever
[5:56] <EricCampbell> wheezy and sid are much newer...
[5:57] <greglap> *shrug*
[5:57] <EricCampbell> ah, as for the fuse client, i got annoying messages about the filesystem being corrupted and looping.
[5:57] <greglap> ...huh?
[5:57] <greglap> or are you on 32-bit and you mean the startup warnings
[5:58] <EricCampbell> i mounted the ceph cluster i had and ran jigdo-lite to get the latest netinst image as a test and it complained about a circular directory structure.
[5:59] <greglap> "it" being...
[5:59] <EricCampbell> jigdo-lite
[5:59] <EricCampbell> i'll turn the vm's on and see if i can get the output.
[6:02] <greglap> well either it's a better test than fsstress or pjd, or it's confused, or you set up a circular hierarchy somehow :/
[6:04] <EricCampbell> possibly
[6:14] <EricCampbell> seems i have to go for a drive, something went wrong at the datacenter when i turned on the extra vm's. thanks for your help anyway greg.
[6:14] <greglap> np
[6:14] * EricCampbell (~michael@ Quit (Remote host closed the connection)
[6:58] * DanielFriesen (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[7:01] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Read error: Operation timed out)
[7:52] * Yulya_ (~Yu1ya_@ip-95-220-165-239.bb.netbynet.ru) has joined #ceph
[9:04] * sugoruyo (~george@athedsl-408632.home.otenet.gr) has joined #ceph
[13:38] <stingray> EricCampbell!~michael@
[13:38] <stingray> interesting
[13:39] <stingray> by the way, his sigsegv in PGMap.h has to do with globals initialization order
[13:39] <stingray> somehow g_conf isn't initialized at that point
[13:39] <stingray> but it only happens if you compile with gcc 4.6
[13:40] <stingray> I decided to ignore it for now and use something older, as it'll be fixed anyway when you finish the deglob
[13:40] <stingray> .
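
If the crash is an initialization-order problem as stingray suggests, the general issue is that C++ does not define the order in which non-local objects in different translation units are dynamically initialized, so a global's constructor can read another global before it has been constructed. One common workaround is the construct-on-first-use idiom, sketched below. This is not Ceph's code: `Config`, `conf()`, `RatioHolder`, and the value 85 are hypothetical stand-ins for `g_conf` and the `PGMap` constructor seen in the backtrace.

```cpp
// Hypothetical stand-in for a global configuration object such as g_conf.
struct Config {
    int nearfull_ratio_percent = 85;  // illustrative value only
};

// Construct-on-first-use: the function-local static is initialized the
// first time conf() is called, so callers never see an unconstructed object.
Config& conf() {
    static Config instance;
    return instance;
}

// Hypothetical stand-in for a class whose constructor reads the config,
// similar in shape to the constructor at PGMap.h:240 in the backtrace.
struct RatioHolder {
    float nearfull_ratio;
    RatioHolder()
        : nearfull_ratio(static_cast<float>(conf().nearfull_ratio_percent) / 100) {}
};

// A global whose constructor runs during static initialization. Because it
// reaches the config through conf(), it does not depend on the order in
// which other globals happen to be initialized.
RatioHolder g_holder;
```

The trade-off is that every access goes through a function call, and the destruction order of such statics still needs care at shutdown. Removing the globals entirely, as the "deglob" work mentioned above intends, avoids the problem at its source.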
[14:08] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[14:45] <darkfader> maybe in a year we can do some fun benchmark - ceph with gcc vs. ceph with icc :)
[15:43] * sugoruyo (~george@athedsl-408632.home.otenet.gr) Quit (Quit: sugoruyo)
[15:56] <stingray> well
[15:56] <stingray> I don't think it'll yield any significant differences
[15:57] <stingray> much more promising is to make it not move everything around when one osd is out.
[15:57] <stingray> something like - exit degraded asap, then slowly balance
[16:11] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[16:13] <stingray> morning greg
[16:13] <greglap> lol
[16:13] <greglap> morning
[16:16] <greglap> globals in gcc 4.6, huh?
[16:16] <greglap> that would make sense, it will probably make Colin smile
[16:20] <stingray> I've told him a couple of days ago
[16:20] <stingray> my brain exploded when I tried to fix it
[16:22] <greglap> heh
[20:26] * verwilst_ (~verwilst@dD576F54C.access.telenet.be) has joined #ceph
[20:35] * verwilst_ (~verwilst@dD576F54C.access.telenet.be) Quit (Quit: Ex-Chat)
[21:12] * verwilst_ (~verwilst@dD576F54C.access.telenet.be) has joined #ceph
[21:36] * Yulya_ (~Yu1ya_@ip-95-220-165-239.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[22:11] * Yulya_ (~Yu1ya_@ip-95-220-159-15.bb.netbynet.ru) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.