#ceph IRC Log


IRC Log for 2011-07-20

Timestamps are in GMT/BST.

[0:10] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) has joined #ceph
[1:07] * MarkN (~nathan@ Quit (Quit: Leaving.)
[1:26] * Tv (~Tv|work@ip-64-111-111-107.dreamhost.com) Quit (Ping timeout: 480 seconds)
[1:47] * stingray headdesks over signalfd
[1:47] <cmccabe> stingray: what's wrong with signalfd?
[1:54] <stingray> well
[1:54] <stingray> I'm trying to solve a simple problem - waitpid with timeout
[1:55] <stingray> the thing is, it has to work in multithreaded environment
[1:56] <stingray> more specific picture is - thread A clone()s pidA1 and starts thread A2 which will need to waitpid_with_timeout(pidA1, ..., timeout) in a loop for a while
[1:57] <cmccabe> stingray: probably SIGCHLD is the best solution
[1:57] <stingray> in order for signalfd to work, I have to sigmask chld out
[1:57] <stingray> yeah, right
[1:58] <stingray> but it's cleaner to have a control thread per cloned child, but signals are being sent to the entire group
[1:59] <stingray> or I can create a signal thread which is a common paradigm and then use a bunch of eventfds to demultiplex the events and send them to control threads
[1:59] <stingray> :(
[2:00] <stingray> well, there's also this way:
[2:00] <cmccabe> probably the best solution is to call wait(WNOHANG) in the signal handler itself
[2:00] <cmccabe> then write a byte to a file descriptor, or post a semaphore for the relevant thread
[2:00] <stingray> start a thread that blocks on waitpid, and another thread that does periodic processing, and if we have to bail the other thread just kills the child and first thread continues
[2:01] <cmccabe> wait is a signal-safe function, at least when called with WNOHANG
[2:01] <cmccabe> and you know wait will return something useful when you receive SIGCHLD
[2:01] <stingray> in this case everything is thread-local and I don't need to demultiplex anything
[2:02] <cmccabe> it all depends on what you're trying to do
[2:02] <stingray> cmccabe: yeah, but how many threads you need to have this signal handler installed?
[2:02] <cmccabe> at least one
[2:02] <stingray> well, as I said, I will have variable number of Subprocess objects, that are independent
[2:03] <stingray> there's some lame rpc stack that creates those subprocess objects
[2:04] <cmccabe> stingray: if your process is massively multithreaded, consider using posix_spawn rather than fork+exec
[2:04] <stingray> and within each subprocess after I start it, while it's running every iTimeQuantum I need to check various things, and if those things don't look good - kill the child
[2:04] <cmccabe> stingray: the overhead of fork can be high due to copying all the TLB entries for a big old process
[2:04] <stingray> cmccabe: I'm not using fork, I'm doing clone with CLONE_VM
[2:04] <cmccabe> stingray: that's pretty inside baseball
[2:05] <cmccabe> stingray: :)
[2:05] <stingray> and it's not just TLB overhead, it's also that for a huge process overcommit settings may prevent fork from succeeding
[2:05] <stingray> it'll give you enomem
[2:05] <cmccabe> stingray: yeah
[2:06] <stingray> that's an interview question I ask sometimes :)
[2:06] <cmccabe> stingray: if you already have a lame rpc stack polling the subprocess, can't it just call wait(WNOHANG)?
[2:06] <stingray> well, it's not polling
[2:07] <cmccabe> stingray: it is kind of frustrating that wait just doesn't have a timeout
[2:07] <cmccabe> stingray: I really expected that it would when I looked at the man page
[2:07] <stingray> it's boost:asio all over the place and when subprocess is finished it's actually callback that posts another callback to io_service()
[2:07] <stingray> yeah
[2:07] <cmccabe> stingray: I mean semaphores do...
[2:08] <stingray> and patches to add waitfd() to unify it with signalfd() and timerfd() were rejected repeatedly
[2:08] <stingray> with reasons like - use signalfd
[2:08] <stingray> what about threads? - threads? what threads?
[2:09] <cmccabe> waitfd does seem like a good idea
[2:09] <cmccabe> it seems like maybe the only way to do it without unnecessary context switches?
[2:10] <stingray> :(
[2:10] <cmccabe> I mean signalfd can do it without unnecessary context switches if there's no other threads waiting for children
[2:12] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) Quit (Remote host closed the connection)
[2:12] <stingray> well it seems that if I create gazillion threads with signalfd each they will all be waking up for every child
[2:12] <stingray> but it's okay since I am doing WNOHANG anyway and skipping over the unnecessary wakeups
[2:14] <cmccabe> stingray: the signalfd man page says "A thread will not be able to read signals that are directed to other threads in the process."
[2:14] <cmccabe> stingray: but I don't know if SIGCHLD gets sent to the specific thread that created the child?
[2:16] <stingray> cmccabe: it's some crazy rule, let me find it
[2:21] <stingray> well I can't find it :( IIRC if you don't do crazy things then sigchld is tgkill to your process
[2:21] <stingray> and all threads shall see it
[2:22] <cmccabe> stingray: it really seems like the kernel should send the signal to the parent thread, but I can't find that documented anywhere
[2:22] <stingray> that's what I think is happening here - in this prototype I create a signalfd in every thread and all of them get notified
[2:22] <stingray> to parent process :)
[2:22] <stingray> and the process is your thread group id which is equal to your first thread id
[2:23] <cmccabe> if you want to be portable, you could just create one thread to handle all signals
[2:23] <cmccabe> block signals in every other thread
[2:23] <cmccabe> using pthread_sigmask
[2:23] <cmccabe> and then have the signal thread just sleep forever in between running signal handlers
[2:24] <cmccabe> which presumably would just write to file descriptors that would wake other threads
[2:24] <cmccabe> it's an extra context switch, but it avoids the thundering herd problem you might have with calling signalfd from multiple threads
[2:25] <stingray> no I don't want to be portable, this is 100% linux only
[2:25] <stingray> with cgroups and stuff
[2:25] <cmccabe> that's the thing about signalfd
[2:26] <cmccabe> people were complaining that it doesn't really buy you that much over just writing to an fd inside your signal handler
[2:26] <stingray> yeah, I've read this thing over
[2:27] <cmccabe> as long as you can't control what thread gets SIGCHLD, you are always going to have at least one needless context switch
[2:27] <stingray> the real complaint is you have to block signals and then on execve the mask is not reset so child process may get confused
[2:29] <cmccabe> hmm, the man page for execve says "The dispositions of any signals that are being caught are reset to the default (signal(7))."
[2:29] <stingray> anyway
[2:29] <cmccabe> I guess the disposition is different than the mask?
[2:29] <stingray> yeah
[2:30] <stingray> it's a few pages down on man 7 signal
[2:30] <stingray> A child created via fork(2) inherits a copy of its parent's signal mask; the signal mask is preserved across execve(2).
[2:30] <cmccabe> for posix_spawn, there is POSIX_SPAWN_SETSIGMASK
[2:30] <cmccabe> and there's always fork + setsigmask + exec
[2:30] <stingray> well, have you actually looked at how glibc implements posix_spawn?
[2:30] <cmccabe> haven't looked...
[2:31] <cmccabe> am I going to be sad?
[2:32] <stingray> yeah
[2:32] <cmccabe> sigh... looking at spawni.c now
[2:32] <stingray> it's either fork or vfork
[2:32] <cmccabe> there is one code path that uses vfork... which avoids the TLB issues
[2:32] <cmccabe> but I think could be racy
[2:32] <cmccabe> vfork had some really odd semantics, I can't remember right now
[2:33] <stingray> the thing I'm doing with clone is not portable but much better
[2:34] <cmccabe> http://linux.die.net/man/2/vfork
[2:34] <cmccabe> Bugs: It is rather unfortunate that Linux revived this spectre from the past
[2:34] <cmccabe> when the function's own man page describes it as "rather unfortunate" that it exists... then you know it's a WTF
[2:34] <stingray> clone ftw!
[2:34] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[2:34] <stingray> vfork/fork wtf
[2:35] <cmccabe> yeah I think clone is much better for this
[2:35] <cmccabe> although I like fork from a philosophical point of view
[2:35] <cmccabe> and I really think the glibc guys should just use clone
[2:35] <cmccabe> for posix_spawn
[2:35] <cmccabe> I'm sure Ulrich would furiously reject any patch to make it do so though
[2:37] <stingray> Ulrich would furiously.
[2:37] <stingray> (fixed)
[2:37] <stingray> :)
[2:38] <stingray> somebody pointed me at debian-devel
[2:38] <stingray> there's a nice thread about systemd
[2:38] <stingray> which turned into a rage thread when lennart intervened
[2:39] <cmccabe> lennart is pretty self-confident, and I do think he's a good engineer
[2:39] <cmccabe> I feel like he doesn't always consider things from the end user's point of view though
[2:40] <cmccabe> a lot of times when you change things it has to be really gradual, just because the other software doesn't want to change
[2:40] <cmccabe> I hope systemd becomes more popular though-- there are a lot of good ideas in there
[2:42] <cmccabe> well, I'm heading out
[2:42] <cmccabe> stingray: good luck with the project
[2:43] * cmccabe (~cmccabe@ has left #ceph
[3:10] * joshd (~joshd@ip-64-111-111-107.dreamhost.com) Quit (Quit: Leaving.)
[3:12] * Anticimex (anticimex@netforce.csbnet.se) has joined #ceph
[3:34] * yoshi (~yoshi@p4094-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[4:17] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[4:26] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[4:54] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[5:57] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Remote host closed the connection)
[6:02] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[7:37] * lx0 (~aoliva@ has joined #ceph
[7:37] * lxo (~aoliva@ Quit (Read error: Connection reset by peer)
[8:13] * lx0 is now known as lxo
[10:26] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[10:39] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) Quit (Quit: Ex-Chat)
[10:50] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) has joined #ceph
[11:14] * yoshi (~yoshi@p4094-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[13:50] * jantje (~jan@paranoid.nl) Quit (Ping timeout: 480 seconds)
[14:20] * jantje (~jan@paranoid.nl) has joined #ceph
[14:43] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) has joined #ceph
[14:45] * lxo (~aoliva@ Quit (Read error: Connection reset by peer)
[14:46] * lxo (~aoliva@9KCAAAXVO.tor-irc.dnsbl.oftc.net) has joined #ceph
[14:54] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:49] * monrad-51468 (~mmk@domitian.tdx.dk) Quit (Quit: bla)
[15:51] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[15:55] * monrad-51468 (~mmk@domitian.tdx.dk) Quit ()
[15:58] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[16:02] * monrad-51468 (~mmk@domitian.tdx.dk) Quit ()
[16:05] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[16:09] * monrad-51468 (~mmk@domitian.tdx.dk) Quit ()
[16:11] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[16:15] * monrad-51468 (~mmk@domitian.tdx.dk) Quit ()
[16:16] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[16:36] * monrad-51468 (~mmk@domitian.tdx.dk) Quit (Quit: bla)
[16:36] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[16:45] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) Quit (Remote host closed the connection)
[16:48] * greglap (~Adium@ has joined #ceph
[17:03] * greglap1 (~Adium@ has joined #ceph
[17:08] * greglap (~Adium@ Quit (Ping timeout: 480 seconds)
[17:19] * Tv (~Tv|work@ip-64-111-111-107.dreamhost.com) has joined #ceph
[17:22] * greglap1 (~Adium@ Quit (Quit: Leaving.)
[17:23] * greglap (~Adium@ has joined #ceph
[17:34] * greglap (~Adium@ Quit (Quit: Leaving.)
[17:47] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:11] * cmccabe (~cmccabe@ has joined #ceph
[18:32] * joshd (~joshd@ip-64-111-111-107.dreamhost.com) has joined #ceph
[18:36] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:45] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[19:06] <Tv> *** glibc detected *** /home/tv/src/ceph.git/src/.libs/lt-radosgw: double free or corruption (out): 0x00007fdc4c017050 ***
[19:06] <Tv> lovely
[19:06] <yehudasa> Tv: the master version is not stable
[19:07] <Tv> ok, won't bother tracking that further
[19:07] <yehudasa> there were a few fixes that went into the wip-rgw-multithreaded branch
[19:07] <yehudasa> you're probably hitting one of those
[19:08] <yehudasa> but that branch requires different apache configuration.. I'll make that optional and merge with master
[19:13] <Tv> i'm not blocked by that by any means, so no worries
[19:13] <Tv> first time i see that, and apache just spawns a new worker, so it's all good
[19:13] <yehudasa> Tv: yeah, but I should do it anyway, trying to debug something and want to make sure that it's a regression..
[19:35] <bchrisman> right now we use ctdb to control ownership of exported block devices..
[20:05] * aliguori (~anthony@ has joined #ceph
[20:15] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[21:36] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[21:46] * aliguori (~anthony@ has joined #ceph
[22:41] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Remote host closed the connection)
[23:46] * aliguori (~anthony@ Quit (Quit: Ex-Chat)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.