#ceph IRC Log


IRC Log for 2011-06-18

Timestamps are in GMT/BST.

[0:06] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[0:16] <Tv> yehudasa: oh hey while waiting for a slow task i got time to read the s3-tests multipart change; the python looks plenty fine, if it works then it's good to go to master
[0:16] <yehudasa> Tv: ok, great
[0:17] <Tv> huh what, "file" thinks our core files are "from '/ceph.conf'"
[0:18] <Tv> Core was generated by `/tmp/cephtest/binary/usr/local/bin/cosd -f -i 0 -c /tmp/cephtest/ceph.conf'.
[0:18] <Tv> ah that's better, file is just stupid
[0:23] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[0:23] * jmlowe (~Adium@mobile-166-137-141-159.mycingular.net) has joined #ceph
[0:38] * gregorg_taf (~Greg@ has joined #ceph
[0:38] * gregorg (~Greg@ Quit (Read error: Connection reset by peer)
[0:48] <slang> sagewk: sorry for the delay
[0:49] <slang> sagewk: your commit fixes the one valgrind error I was seeing (the invalid read one), but the other one (invalid free) persists: http://pastebin.com/raw.php?i=irtTzL93
[0:50] <slang> The same error shows up in gdb
[0:51] <sagewk> can you generate an mds log that corresponds with that valgrind log?
[0:53] <slang> in valgrind it doesn't fail
[0:53] <slang> just reports those errors and continues to write to the log
[0:54] <sagewk> slang: hmm. can you post the mds log at least up to that point? having trouble making sense of the error
[0:55] <slang> sure
[0:55] <slang> I tried debugging a bit, but got lost in fullbit indirection...
[0:58] <sagewk> yeah
[1:06] <slang> http://pastebin.com/raw.php?i=HJ6f60vq
[1:06] * jmlowe (~Adium@mobile-166-137-141-159.mycingular.net) Quit (Read error: Connection reset by peer)
[1:06] <Tv> teuthology now collects core dumps
[1:06] <sjust> :)
[1:07] <slang> btw, cmds has -D in the usage for run in the foreground (but write to the log), but the binary doesn't recognize the flag
[1:07] <slang> and -f seems to write to the log instead of stderr
[1:07] <cmccabe> slang: I'm glad someone is finally trying valgrind with the mds
[1:08] <Tv> slang: *sigh*
[1:09] <Tv> (i fought that fight for a while.. the usages are ridiculously bad, and the -d/-D/-f thing is a mess)
[1:10] <slang> cmccabe: hasn't been done in a while?
[1:10] <cmccabe> slang: yeah
[1:11] <slang> might be good to set up some automated builds/tests that run the server processes in valgrind
[1:11] <sagewk> slang: can you reproduce with http://pastebin.com/2WtjM2bh ?
[1:11] <Tv> yeah that should be fairly easy, though we'd need a better "flavors of running" mechanism
[1:12] <Tv> but it's kinda the same thing as coverage etc, so the beginning is there
[1:12] <sagewk> slang; it's -d instead of -D now, i'll update the usage
[1:12] <cmccabe> slang: so anyway, -D was renamed to -d at some point
[1:12] <cmccabe> the usage should have been updated but it wasn't
[1:12] <slang> ah ok
[1:12] <sagewk> cmccabe: this is right, right?
[1:12] <sagewk> -d Run in the foreground, log to stderr.\n\
[1:12] <sagewk> -f Run in foreground, log to usual location.\n\
[1:13] <gregaf1> we've run it through valgrind in the past and probably could do test cases in the future but from what I remember it's just so slow that anything significant is basically unusable
[1:13] <cmccabe> sagewk: I was just looking at it, let me confirm
[1:13] <cmccabe> yes, --foreground/-f does not affect logging
[1:14] <sagewk> slang: whoops, make that http://pastebin.com/kYZyvUEe
[1:16] <slang> trying..
[1:16] <cmccabe> it looks like the man pages get the -d/-f thing right, but the usage does not
[1:16] <sagewk> slang: thanks :)
[1:17] <sagewk> cmccabe: pushed fix to stable
[1:17] * dwm (~dwm@vm-shell4.doc.ic.ac.uk) Quit (Ping timeout: 480 seconds)
[1:18] <Tv> cmccabe: please run make check before you push that change
[1:18] <cmccabe> tv: I believe sage is making/has made that change
[1:18] <cmccabe> tv: I pulled stable and didn't see it though
[1:18] <Tv> ah err oh well
[1:18] <cmccabe> ok, now I see it
[1:19] <cmccabe> yes, the tests will need to be updated I suspect
[1:23] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Remote host closed the connection)
[1:25] <cmccabe> I have updated clitests to reflect the new usage message.
[1:27] <cmccabe> I am happy that automake has the ability to do library dependencies
[1:28] <cmccabe> and handle -fPIC intelligently too
[1:28] <cmccabe> I think this can be used to speed up our build significantly, and also help with making the source slightly more modular, which I think is something we've been talking about
[1:29] <slang> still see the errors in valgrind
[1:29] <sagewk> yay!
[1:29] <slang> this is the log till just after the error shows up: http://pastebin.com/raw.php?i=6h01H3Vb
[1:29] <sagewk> slang: yeah but the log should have enough info for me
[1:30] <slang> right
[1:30] <slang> sagewk: I figured that since all the changes were log messages :-)
[1:31] <sagewk> :) can you tell me which line immediately precedes the valgrind error?
[1:31] <slang> uh
[1:31] <sagewk> the timestamp is enough
[1:31] <sagewk> oh not the same terminal probably
[1:32] <sagewk> oh, actually a pastebin of that version of the valgrind error should do it
[1:32] <slang> yeah let me try with -d
[1:32] <sagewk> it has the pointer in there
[1:36] <slang> -d takes too long
[1:36] <slang> or writing to console prevents the error..
[1:36] <sagewk> the valgrind error matching that log is pbly enough, i think it has the pointer value i need
[1:37] <slang> yeah it's gone from my console -- sorry
[1:37] <slang> redoing the whole thing
[1:37] <sagewk> :( thanks!
[1:38] <slang> vg: http://pastebin.com/raw.php?i=yLh5fuSJ
[1:39] <slang> log: http://pastebin.com/raw.php?i=DvC5Xhtf
[1:39] <Tv> <>[2823880.339]ET-s(d1:r-one.Ot:err=eon-oue_at
[1:39] <Tv> that reminds of doing kernel development with serial consoles
[1:39] <Tv> yes that just came out of a kernel ;)
[1:39] <Tv> [2823880.353092] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr
[1:40] <Tv> every other character! smp woo!
[1:41] <Tv> translation: no, you can't read /proc/kmsg while syslog is reading it too
[1:43] <Tv> (boggles my mind how many core software components still read/write one character at a time.. that's gotta be less efficient)
[1:44] <cmccabe> do we still need g_conf.osd_pg_layout?
[1:45] <slang> off to find more beer
[1:45] <slang> sagewk: email if there's other patches I can try (samlang@gmail.com)
[1:45] * slang (~slang@chml01.drwholdings.com) has left #ceph
[1:46] <sagewk> ok. thanks for the help!
[1:47] <cmccabe> tv: isn't /proc/kmsg a pipe or fifo or something?
[1:47] <Tv> cmccabe: a virtual file or something like that.. it's magic
[1:47] <Tv> all of /proc is
[1:49] <cmccabe> I guess to userspace, it appears to be a regular file
[1:49] <Tv> yup, most of /proc does
[1:50] <cmccabe> I would have expected it to show up as a pipe actually
[1:50] <Tv> you used to be able to write it too, etc
[1:51] <cmccabe> you can write to named pipes
[1:51] <sagewk> sigh.. EMetaBlob::dirlump is getting copied somewhere.
[1:52] <sagewk> will fix it up later. have a good weekend guys!
[1:52] <Tv> cmccabe: i think you can't have multiple writers open, or something like that
[1:52] <cmccabe> no, you can, and reads of less than PIPE_BUF bytes are atomic
[1:52] <cmccabe> PIPE_BUF is 4096 on linux
[1:53] <cmccabe> you block if you read more than is available though, so maybe that's the issue?
[1:53] <Tv> ah i remember the non-blocking mode behavior, i think
[1:53] <cmccabe> but still... blocking if you read more than what is there is what you want usually for kmsg
[1:53] <Tv> but anyway, fifos are really rarely used, either in userspace or kernelspace
[1:53] <cmccabe> yeah, the nonblocking pipe behavior is a little more complex, but still reasonable
[1:54] <Tv> some days i dream of a linux with all the "legacy" features stripped
[1:54] <cmccabe> fifos are a great feature
[1:54] <cmccabe> a very simple and easy-to-understand form of IPC
[1:55] <Tv> and mostly done very very wrong
[1:55] <cmccabe> much better than a lot of the complex schemes that crept in later like CORBA, DBUS and so forth
[1:55] <cmccabe> or Microsoft COM, now also legacy
[1:55] <Tv> here's a data point: apart from autotest (which is a mess anyway), there are no fifos on this random sepia node i just ran a find on
[1:55] <cmccabe> what about fifos is "done very very wrong"?
[1:57] <Tv> sorry i don't have time for that conversation
[1:57] <Tv> it just belongs in the school of unix you're not supposed to use anymore
[1:57] <cmccabe> because?
[1:57] <Tv> much like sysv ipc, STREAMS, etc
[1:57] <cmccabe> so far you have not managed to name even one disadvantage, however vague
[1:57] <Tv> yes, because i'm not going to get into that conversation now
[1:58] <cmccabe> and I'm pretty sure Michael Kerrisk's book from 2010 has good things to say about named pipes
[1:58] <cmccabe> Named pipes may be somewhat rare, but anonymous pipes are used by almost every UNIX program out there. And the underlying implementation is the same-- the file descriptor system.
[1:59] <Tv> that's a very weak link
[1:59] <cmccabe> er, what?
[1:59] <Tv> fifos are all about the filesystem ugliness of them
[1:59] <cmccabe> they're both returned from open() and function almost the same
[1:59] <Tv> about needing the concept of special files
[1:59] <Tv> about being dangerous to touch because it might hang
[1:59] <Tv> etc etc
[1:59] <cmccabe> in fact they might be identical aside from how you get the file descriptor, I'd have to check
[2:00] <Tv> pipe(2) and inheriting file descriptors is a completely different api
[2:00] <cmccabe> pipe(2) creates an anonymous pipe, not a fifo
[2:01] <cmccabe> There's nothing "dangerous" about blocking if you're reading from a fifo that is empty.
[2:01] <cmccabe> unless you believe that sockets should also be banned for the same reason.
[2:01] <cmccabe> anyway, if you're so upset about blocking, use non-blocking mode.
[2:03] <cmccabe> The use of the filesystem to identify and organize system resources has been consistently praised as one of the best things about UNIX
[2:03] <cmccabe> in fact I think one thing Kerrisk criticizes about SYSV IPC is that it doesn't use the filesystem to track SYSV semaphores, etc, instead requiring special programs
[2:05] <cmccabe> tv: anyway-- just read the chapter about pipes in Kerrisk if you want to learn more about it.
[2:06] <cmccabe> tv: you are getting confused because someone explained that SYSV IPC was deprecated and bad, which it is, and you somehow conflated that with fifos and anonymous pipes, which are elegant and good.
[2:07] <cmccabe> anyway, I gtg.
[2:07] <cmccabe> have a good friday all.
[2:07] <Tv> *cough*
[2:08] * cmccabe (~cmccabe@ has left #ceph
[2:24] * dwm (~dwm@vm-shell4.doc.ic.ac.uk) has joined #ceph
[2:31] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:44] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[2:45] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[2:57] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[3:05] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[4:10] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[4:30] * votz (~votz@dhcp0020.grt.resnet.group.UPENN.EDU) Quit (Quit: Leaving)
[4:42] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[4:42] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[4:43] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit ()
[5:21] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[7:49] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[8:06] * Yulya_ (~Yu1ya_@ip-95-220-149-165.bb.netbynet.ru) has joined #ceph
[8:06] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[8:55] * allsystemsarego (~allsystem@ has joined #ceph
[9:59] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[10:00] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[10:08] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[10:08] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[11:07] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[11:07] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[11:30] * re_rock (~re_rock@83TAABWLX.tor-irc.dnsbl.oftc.net) has joined #ceph
[11:30] <re_rock> anyone here?
[11:31] * re_rock (~re_rock@83TAABWLX.tor-irc.dnsbl.oftc.net) Quit ()
[12:31] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[13:37] * Yulya_ (~Yu1ya_@ip-95-220-149-165.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[14:33] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[14:39] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[16:39] <stingray> how I am supposed to load classes on recent?
[16:48] * alexxy[home] (~alexxy@ has joined #ceph
[16:52] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[16:56] * alexxy[home] (~alexxy@ Quit (Ping timeout: 480 seconds)
[16:58] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Read error: Connection reset by peer)
[17:02] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[17:03] * alexxy (~alexxy@ has joined #ceph
[17:13] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[17:24] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[17:31] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[17:32] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit ()
[17:42] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Remote host closed the connection)
[17:53] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[18:11] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[18:41] <stingray> okay, it was plain and simple packaging fail
[19:14] * alexxy (~alexxy@ Quit (Remote host closed the connection)
[19:18] * alexxy (~alexxy@ has joined #ceph
[19:27] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[19:30] * alexxy (~alexxy@ has joined #ceph
[21:34] <stingray> huh
[21:54] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) has joined #ceph
[22:18] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) Quit (Quit: Leaving)
[22:18] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) has joined #ceph
[22:49] <stingray> and now rbd export doesn't match rbd import
[22:49] <stingray> fffuuu
[23:05] <sage> stingray: hmm?
[23:11] <stingray> sage: you're here
[23:12] <stingray> I thought everyone is on weekend or something
[23:12] <stingray> anyway
[23:12] <stingray> rbd import:
[23:12] <stingray> rbd import file_pos=0 extent_len=12288
[23:12] <stingray> reading 12288 bytes at offset 0
[23:12] <stingray> rbd import file_pos=1048576 extent_len=278528
[23:12] <stingray> reading 278528 bytes at offset 1048576
[23:12] <stingray> rbd import file_pos=1331200 extent_len=4096
[23:12] <stingray> reading 4096 bytes at offset 1331200
[23:12] <stingray> export:
[23:12] <stingray> writing 12288 bytes at ofs 0
[23:12] <stingray> writing 1335296 bytes at ofs 4194304
[23:12] <stingray> writing 4096 bytes at ofs 9437184
[23:12] <stingray> offsets do not quite match
[23:13] <sage> can you submit a bug report?
[23:13] <stingray> sure
[23:13] <stingray> I was hoping I can understand what's going on before doing so
[23:18] <stingray> sage: submitted
[23:35] <sage> thanks. yehuda should be able to take a look on monday. those offsets should all match
[23:37] * verwilst (~verwilst@dD576F319.access.telenet.be) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.