Since upgrading to glib 2.36.0 (and GNOME 3.8), it seems that any program launched from the openbox menu becomes a zombie when it exits. All the zombie processes go away on exiting openbox. Programs started from a terminal window don't become zombies.
Created attachment 3342
remove G_SPAWN_DO_NOT_REAP_CHILD
I removed G_SPAWN_DO_NOT_REAP_CHILD and there are no more zombie processes; everything else still seems to work OK.
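For context, GLib documents that a child spawned with G_SPAWN_DO_NOT_REAP_CHILD is not reaped automatically: the caller has to collect it (with g_child_watch_add(), waitpid(), or its own SIGCHLD handling), or it becomes a zombie. The sketch below shows that mechanism in plain POSIX, not GLib's own code. The helper names are hypothetical. A child that has exited but has not yet been waited for still exists in the process table, and it disappears only once waitpid() reaps it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits immediately with `code`.
 * Returns the child's PID in the parent, or -1 if fork() failed. */
static pid_t spawn_exiting_child(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);            /* child: exit without running atexit handlers */
    return pid;
}

/* Block until `pid` has exited but leave it unreaped (WNOWAIT).
 * From here on the child is a zombie: dead, but still in the process table. */
static bool wait_for_exit_without_reaping(pid_t pid)
{
    siginfo_t info;
    return waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == 0;
}

/* True while `pid` still exists in the process table (running or zombie). */
static bool process_exists(pid_t pid)
{
    return kill(pid, 0) == 0 || errno == EPERM;
}

/* Reap `pid` and return its exit code, or -1 on error or abnormal exit. */
static int reap_child(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A menu launcher that passes G_SPAWN_DO_NOT_REAP_CHILD and never reaches the equivalent of reap_child() leaves exactly this kind of zombie behind until the parent (openbox) exits.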
I don't know; I haven't looked into the issue yet, as glib 2.36 isn't available on my distro yet. I can tell you that the patch attached to this bug is not going to be included, though.
Created attachment 3343
use g_child_watch_add to avoid zombies
Looking at glib git commits and bugs, I've seen a couple of process spawning / SIGCHLD related issues, but they all talk about g_spawn_sync, and openbox uses g_spawn_async.
e.g. https://bugzilla.gnome.org/show_bug.cgi?id=698081
Using g_child_watch_add seems to fix the problem.
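g_child_watch_add() asks GLib to watch one specific PID, reap it, and call back with its exit status. The sketch below is a rough POSIX-only analogue of that idea, not GLib's implementation. It assumes a hypothetical struct child_watch and child_watch_poll() that poll waitpid() for a single PID, so any other children of the process are left for whoever is responsible for them.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Callback run once the watched child has been reaped. */
typedef void (*child_exit_cb)(pid_t pid, int status, void *user_data);

/* Hypothetical per-PID watch, loosely modelled on g_child_watch_add(). */
struct child_watch {
    pid_t pid;
    child_exit_cb cb;
    void *user_data;
    bool done;
};

/* Poll the watch once (e.g. per main-loop iteration or after SIGCHLD).
 * Only `w->pid` is ever reaped. Returns true once the callback has run. */
static bool child_watch_poll(struct child_watch *w)
{
    int status;
    if (w->done)
        return true;
    if (waitpid(w->pid, &status, WNOHANG) == w->pid) {
        w->done = true;
        w->cb(w->pid, status, w->user_data);
    }
    return w->done;
}

/* Example callback: store the child's exit code in an int. */
static void record_exit_status(pid_t pid, int status, void *user_data)
{
    int *out = user_data;
    (void)pid;
    *out = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: fork a child that exits immediately with `code`. */
static pid_t spawn_exiting_child(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);
    return pid;
}
```

In real GLib code, the corresponding pattern is g_spawn_async() with G_SPAWN_DO_NOT_REAP_CHILD, followed by g_child_watch_add(pid, callback, data), with g_spawn_close_pid() called from the callback.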
This looks more reasonable. Note, however, that we call g_spawn_* functions in a couple of other places too.
IIRC the reason we have the waitpid(-1,...) is to avoid zombies when processes get reparented to us. I'm not exactly sure when this happens; could you test that this case still works too? Dana?
I never saw any zombie processes left behind from the menu generation program (which creates a pipe menu from all the xdg .desktop files).
I think /usr/libexec/openbox-autostart or /usr/libexec/openbox-xdg-autostart may have become a zombie before, but can't seem to reproduce that currently.
This is also interesting, from the linked bug:
g_warning ("GChildWatchSource: Exit status of a child process was requested but ECHILD was received by waitpid(). Most likely the process is ignoring SIGCHLD, or some other thread is invoking waitpid() with a nonpositive first argument; either behavior can break applications that use g_child_watch_add()/g_spawn_sync() either directly or indirectly.");
We call waitpid(-1,...), which that patch claims interferes with g_child_watch_add(), and g_child_watch_add() is exactly what patch #2 here adds.
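The conflict can be reproduced with plain POSIX calls. In the sketch below, the helper names are hypothetical and it is not Openbox's or GLib's code. A catch-all waitpid(-1, ..., WNOHANG) loop reaps a child first. A later waitpid() for that specific PID, which is what a per-PID child watch needs to do, then fails with ECHILD. That is the condition the GLib warning quoted above describes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits immediately with `code`. */
static pid_t spawn_exiting_child(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);
    return pid;
}

/* Block until `pid` has exited without reaping it (WNOWAIT),
 * so the demonstration below does not depend on timing. */
static bool wait_for_exit_without_reaping(pid_t pid)
{
    siginfo_t info;
    return waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == 0;
}

/* Catch-all reaper in the style of a waitpid(-1, ...) loop: collects
 * every exited child, including ones another component is watching.
 * Returns how many children were reaped. */
static int reap_all_children(void)
{
    int status, reaped = 0;
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    return reaped;
}

/* What a per-PID watcher does next: ask for one child's exit status.
 * Returns the reaped PID, or -1 with errno set (ECHILD if already reaped). */
static pid_t wait_for_specific_child(pid_t pid, int *status)
{
    return waitpid(pid, status, 0);
}
```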
Here's the CL where we added the waitpid(): http://git.openbox.org/?p=dana/openbox.git;a=commit;h=745e851faa0a6f83858ef064ca589a33497e0b5a
"dont have glib reap children, we shall reap them instead to avoid zombies from processes tranferred to us"
It seems that patch #3 on https://bugzilla.gnome.org/show_bug.cgi?id=698081 reverts the behaviour that is stealing our SIGCHLD events. Is this just fixed in a later version of glib 2.36?
It looks like the zombie problem is fixed in the glib-2-36 branch after 2.36.1, even though I thought the patch for GNOME bug #698081 only affected g_spawn_sync, and openbox uses g_spawn_async.
So no openbox changes needed.