#ceph IRC Log


IRC Log for 2011-04-16

Timestamps are in GMT/BST.

[0:10] <yehuda_hm> Tv, cmccabe, sagewk: maybe we should make a small effort to find out whether the conf file has the now invalid entity names and warn about it in a descriptive way?
[0:11] <Tv> yehuda_hm: frankly, this early i expect people to actually read release notes
[0:12] <yehuda_hm> Tv: we should make a small effort not to completely frustrate our users
[0:13] <cmccabe> yehuda_hm: a warning wouldn't be hard to put in
[0:13] <sagewk> maybe grep -q osd0 $conf && echo WARNING: blah blah in init-ceph
[0:13] <yehuda_hm> yeah, something like that
[0:14] <cmccabe> I can put it in common_init. Honestly that's probably easier than sed
[0:14] <cmccabe> or grep
[0:15] <cmccabe> in shell scripts, I'm always worried that I'll forget to escape something
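The warning discussed above could be sketched roughly as follows. This is a minimal illustration, not the check cmccabe actually added: it assumes the now-invalid entity names are section headings of the form `[osd0]` (a daemon type directly followed by digits, with no dot), as sagewk's `grep -q osd0` suggests. The function name `find_old_entity_names` and the list of daemon types are hypothetical.

```python
import re

# Assumption: an old-style heading is a daemon type immediately followed by
# digits, e.g. [osd0]. The set of types below is a guess for illustration.
OLD_STYLE = re.compile(r"^\s*\[((?:osd|mon|mds)\d+)\]\s*$")


def find_old_entity_names(conf_text):
    """Return (line_number, name) for each heading that looks old-style."""
    hits = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        match = OLD_STYLE.match(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


def warn_about_old_names(conf_text):
    """Build one descriptive warning string per suspect heading."""
    return [
        f"WARNING: line {lineno}: section [{name}] looks like an old-style "
        f"entity name; check the release notes"
        for lineno, name in find_old_entity_names(conf_text)
    ]
```

Matching whole section headings rather than grepping for a bare substring avoids false positives on host names or paths that happen to contain `osd0`.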
[0:19] * WesleyS (~WesleyS@ has joined #ceph
[0:20] <WesleyS> Sage: Happen to know what's up with the playground?
[0:20] <WesleyS> RGW seems to be timing out
[0:22] <yehuda_hm> wes
[0:22] <yehuda_hm> p
[0:22] <yehuda_hm> wes
[0:22] <gregaf> WesleyS: playground got filled up and we're running into amusing boundary conditions
[0:22] <yehuda_hm> oh, my keyboard
[0:22] <WesleyS> Ah, okay!
[0:22] <gregaf> mostly reminding us that Ceph hates full disks
[0:22] <yehuda_hm> wes: whatever greg said
[0:23] <sagewk> maybe someone can run vstart somewhere and point cephbeta/objects temporarily at that for testing?
[0:24] <yehuda_hm> sagewk: that'll also require using the same keyring as the one the playground uses
[0:26] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[0:29] * WesleyS (~WesleyS@ Quit (Quit: WesleyS)
[0:29] <cmccabe> yehuda_hm: ok, the warning is in there.
[0:30] <cmccabe> yehuda_hm: this is a good warning to have for at least a little while until everyone forgets about the old stuff
[0:41] <sagewk> yehuda_hm: or swap the keyring too?
[0:45] <yehuda_hm> sagewk: actually, can just set ceph auth = none
[0:46] <sagewk> that too :)
[0:47] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[0:50] <yehuda_hm> sagewk: I can run it on ballgate0
[0:51] <sagewk> ah, easy :)
[0:57] <gregaf> Tv: do you remember what happened when one of your test machines ran out of disk space and all the threads hung on dout calls?
[0:58] <Tv> gregaf: not without a bit more nudging...
[0:58] <Tv> all the threads eventually hit the dout lock, yea
[0:58] <gregaf> IIRC it presented as the messenger blocking for some reason and you called me over
[0:59] <Tv> yea
[0:59] <gregaf> and it turned out the disk was full and so the dout prints were never finishing?
[0:59] <Tv> i'm trying to remember why the thread trying to write would hang, instead of fail
[0:59] <gregaf> sagewk says that those should have just errored out so things would have kept going
[0:59] <gregaf> is why I ask
[0:59] <Tv> it might have raised an exception that went unhandled, or something?
[0:59] <gregaf> I thought you said something about pipes filling up, but Sage says no pipes are involved?
[1:00] <Tv> oh yeah
[1:00] <gregaf> was that an autotest pipe or something?
[1:00] <gregaf> or was it just going to stderr and stderr was redirected to disk?
[1:00] <Tv> well logging under autotest goes to stdout/err
[1:00] <gregaf> okay, that makes sense
[1:00] <Tv> but i can't come up with a reason why that'd hang, right now
[1:01] <gregaf> was trying to work out something else and then got stuck on this, but no problem now
[1:01] <Tv> except for..
[1:01] <cmccabe> tv: dout just calls write(2)
[1:01] <Tv> ok now i'm starting to remember
[1:01] <gregaf> I think the stderr/out was supposed to be consumed by a disk writer rather than console, and it wasn't
[1:01] <Tv> so
[1:01] <gregaf> because the disk was full
[1:01] <Tv> it goes like this:
[1:01] <Tv> normally under autotest logs go to out/log/...
[1:01] <Tv> but the process stdout/stderr is also captured
[1:02] <Tv> but that's a pipe that's not actively emptied, until the process exits
[1:02] <Tv> it's meant more for quick error message + exit(1)
[1:02] <Tv> so i think what happened was that something was spewing both to log and stderr
[1:02] <cmccabe> tv: you can turn off stderr completely with [global] log to stderr = 0
[1:02] <Tv> and that filled the pipe buffer
[1:03] <Tv> gregaf: so it had nothing to do with disk full, as far as i can figure out
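The failure mode Tv describes (a captured stderr pipe that nobody drains until the process exits) can be reproduced in isolation. The sketch below is only an illustration and does not involve autotest or Ceph; the function name `bytes_until_pipe_full` is hypothetical. It writes into a pipe that is never read and uses a non-blocking write end so that the point where a normal blocking `write(2)` would hang shows up as an error instead.

```python
import os


def bytes_until_pipe_full(chunk=4096):
    """Write to a pipe that nobody reads and return how many bytes fit."""
    read_fd, write_fd = os.pipe()
    # Non-blocking, so a full buffer raises instead of hanging this process.
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            written += os.write(write_fd, b"x" * chunk)
    except BlockingIOError:
        # The kernel pipe buffer is full. A blocking writer, such as a
        # daemon logging to a captured stderr, would stall at this point.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written
```

The exact capacity depends on the platform, so the function reports it rather than assuming a value. The point is that the capacity is finite: once it is used up, every thread that logs to that descriptor blocks, regardless of how much disk space is free.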
[1:03] <gregaf> ah, okay
[1:03] <Tv> cmccabe: added a todo note
[1:04] <cmccabe> tv: stderr should be closed after the daemons call daemon() though
[1:04] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Read error: Operation timed out)
[1:04] <Tv> cmccabe: no double-forks in this setup
[1:04] <Tv> as there will be in almost any new school deployment...
[1:05] <Tv> will not be, more like it
[1:05] <cmccabe> tv: er, running without daemon is kind of weird
[1:05] <cmccabe> tv: why do you refer to this as new-school?
[1:05] <Tv> cmccabe: because everyone's moving away from sysvinit
[1:06] <Tv> /etc/init.d is gonna go away, and with it, self-daemonizing
[1:06] <Tv> pid files are gonna go away
[1:06] <cmccabe> tv: do upstart and systemd start new processes in the daemon environment already?
[1:06] <Tv> world will be a cleaner, happier, and more robust place
[1:06] <Tv> cmccabe: there is no "daemon environment", there is just unix ;)
[1:07] <cmccabe> tv: daemon does a bunch of things. redirects stderr, stdout to /dev/null, chdir("/"), setsid, reparent
[1:07] <Tv> and none of that is wanted
[1:07] <Tv> -1000 lines of code for every daemon! yay!
[1:08] <cmccabe> tv: well, most daemons just call daemon()
[1:08] <Tv> i find that actually pretty rare
[1:08] <cmccabe> tv: the pidfile thing admittedly is duplicated code
[1:08] <cmccabe> tv: anyway, I'm just saying that whoever is setting up the environment should not neglect to close stderr, stdout, chdir, setsid, reparent
[1:08] <bchrisman> always despised pid files.. :)
[1:08] <Tv> daemon(3) behaves different on linux/bsd etc, most portable software reimplements it
[1:09] <Tv> many play tricks with a self-pipe to make sure parent doesn't exit until child has inited, etc
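For readers unfamiliar with the steps being debated, here is a rough sketch of the detachment work cmccabe lists (setsid, chdir to `/`, stdio redirected to `/dev/null`) combined with the self-pipe handshake Tv mentions. It is a simplified, single-fork illustration, not Ceph's code and not a replacement for daemon(3); the name `start_detached` and the `init` callback are hypothetical, and the child exits right after initializing instead of running a service.

```python
import os


def start_detached(init):
    """Fork a child that detaches and runs init(); return True once it reports ready."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: detach from the parent's session and working directory.
        os.close(read_fd)
        status = 1
        try:
            os.setsid()
            os.chdir("/")
            devnull = os.open(os.devnull, os.O_RDWR)
            for fd in (0, 1, 2):
                os.dup2(devnull, fd)  # stdin, stdout, stderr -> /dev/null
            init()
            os.write(write_fd, b"1")  # tell the parent that init succeeded
            status = 0
        except BaseException:
            pass  # exit without writing; the parent will see EOF
        os._exit(status)
    # Parent: block until the child reports success or closes the pipe.
    os.close(write_fd)
    ready = os.read(read_fd, 1) == b"1"
    os.close(read_fd)
    os.waitpid(pid, 0)
    return ready
```

The parent only learns "ready" after `init()` has returned, which is the property the self-pipe trick is meant to provide. Under a supervisor that tracks the process directly, none of this would be needed, which is Tv's argument.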
[1:09] <cmccabe> tv: I think you are right that centralizing this work will be more robust
[1:09] <gregaf> cmccabe: looks like we need to update the conf tests for that change in accepted config headings
[1:09] <Tv> it's not even just centralizing, it'll make new things possible
[1:09] <Tv> like noticing your sshd dies, and restarting it
[1:09] <cmccabe> tv: of course, it's ironic you would argue against code duplication since you are in essence redoing this work in autotest :)
[1:09] <cmccabe> tv: when you redo it, please do it right!
[1:09] <Tv> cmccabe: autotest is not an init system, but it somewhat simulates one
[1:10] <cmccabe> tv: anything that inits processes is an init system :)
[1:11] <cmccabe> tv: personally I would have just installed the sysv init scripts and had autotest invoke them... but your way works too.
[1:11] <Tv> my shell is not an init system
[1:13] <bchrisman> cp /bin/bash /bin/initbash
[1:13] <bchrisman> done :)
[1:13] <bchrisman> sorry… :)
[1:13] <cmccabe> bchrisman: tgif eh
[1:14] <bchrisman> cmccabe: you bet.. :)
[1:15] * lxo (~aoliva@ Quit (Ping timeout: 480 seconds)
[1:18] * lxo (~aoliva@ has joined #ceph
[1:23] <yehuda_hm> sagewk: cephbeta is up now, but users are not generated, so anyone who wants to use it will need to create a user
[2:14] * Juul (~Juul@slim.dhcp.lbl.gov) Quit (Ping timeout: 480 seconds)
[2:15] * verwilst (~verwilst@dD576FAAE.access.telenet.be) Quit (Quit: Ex-Chat)
[2:37] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:45] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:59] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:59] * greglap (~Adium@ has joined #ceph
[3:22] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:23] * cmccabe (~cmccabe@ has left #ceph
[3:56] * greglap (~Adium@ Quit (Quit: Leaving.)
[4:08] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) Quit (Quit: Leaving)
[4:10] * lxo (~aoliva@ Quit (Ping timeout: 480 seconds)
[4:35] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[4:37] * Juul (~Juul@slim.dhcp.lbl.gov) has joined #ceph
[5:02] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[5:13] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has left #ceph
[5:13] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[5:31] * lxo (~aoliva@ has joined #ceph
[5:32] * Juul (~Juul@slim.dhcp.lbl.gov) Quit (Ping timeout: 480 seconds)
[5:47] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) Quit (Remote host closed the connection)
[5:52] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[6:06] * lxo (~aoliva@ Quit (Read error: Operation timed out)
[8:02] * Juul (~Juul@slim.dhcp.lbl.gov) has joined #ceph
[8:50] * Dantman (~dantman@S0106001eec4a8147.vs.shawcable.net) has joined #ceph
[9:53] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: zzZZZZzz)
[12:05] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[12:39] * Juul (~Juul@slim.dhcp.lbl.gov) Quit (Quit: Leaving)
[14:24] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[15:17] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Quit: Ex-Chat)
[15:22] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[15:24] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[15:39] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[16:37] * lxo (~aoliva@ has joined #ceph
[16:52] * lxo (~aoliva@ Quit (Ping timeout: 480 seconds)
[17:02] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Remote host closed the connection)
[17:04] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[17:13] * lxo (~aoliva@ has joined #ceph
[17:37] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[18:05] * ijuz__ (~ijuz@p4FFF65C2.dip.t-dialin.net) has joined #ceph
[18:53] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[19:08] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Ping timeout: 480 seconds)
[19:11] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[19:19] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Read error: Operation timed out)
[19:20] * jantje_ (~jan@paranoid.nl) has joined #ceph
[19:20] * jantje (~jan@paranoid.nl) Quit (Read error: Connection reset by peer)
[19:21] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[19:24] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[19:25] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[19:28] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[19:35] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[19:39] * lxo (~aoliva@ Quit (Ping timeout: 480 seconds)
[19:55] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) has joined #ceph
[20:00] * allsystemsarego (~allsystem@ has joined #ceph
[20:44] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) Quit (Quit: Leaving)
[20:44] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) has joined #ceph
[21:12] * lxo (~aoliva@ has joined #ceph
[22:22] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[22:38] * lxo (~aoliva@ Quit (Quit: later)
[22:39] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[22:53] * lxo (~aoliva@ has joined #ceph

