Bug 5080 - [patch] code/sys/sys_unix.c:Sys_Dialog doesn't check exit codes properly
Status: RESOLVED FIXED
Alias: None
Product: ioquake3
Classification: Unclassified
Component: Misc
Version: GIT MASTER
Hardware: All Linux
: P3 minor
Assignee: Tim Angus
QA Contact: ioquake3 bugzilla mailing list
URL:
Depends on:
Blocks:
 
Reported: 2011-07-14 20:42 EDT by q3urt.undead
Modified: 2011-07-18 15:33:32 EDT
2 users (show)

See Also:


Attachments
Make Sys_Dialog try executables that exist (3.39 KB, patch)
2011-07-14 20:42 EDT, q3urt.undead
Redirect output to null, check error code against 127 (3.11 KB, patch)
2011-07-18 06:51 EDT, Tim Angus
Redirect output to null, check error code against 127 (3.14 KB, patch)
2011-07-18 07:05 EDT, Tim Angus
Redirect to null, check error codes properly (3.58 KB, patch)
2011-07-18 13:35 EDT, q3urt.undead

Description q3urt.undead 2011-07-14 20:42:14 EDT
Created attachment 2837 [details]
Make Sys_Dialog try executables that exist

There are a few problems with the Sys_Dialog function.

1) It assumes that 1=error, 0=success.  That may be true of the command itself, but it doesn't necessarily apply to the shell.  System() will return 127 when the shell fails including when the file is not present.

2) There aren't enough checks in Sys_Dialog to detect if a program was started up properly.  It only has DT_YES_NO and DT_OK_CANCEL and assumes the others are ok.  In my case, I don't have zenity and is assumed to be DR_OK.  Even if it did have a check, it wouldn't properly detect the error due to #1 above.

The attached patch checks for the command in the path.  With this patch, it skips zenity and kdialog and properly displays with xmessage for me.

vanilla r2080 when I execute in a directory without anything quake3 related (fails to find pak0.pk3 which should be an error).  There is an error message about not finding zenity and it never displays anything.  It should use xmessage since I have that installed.

$ ./ioquake3_r2080_vanilla.i386
ioq3 1.36_SVN2080 linux-i386 Jul 14 2011
Have SSE support
----- FS_Startup -----
Current search path:
/home/undead/.q3a/baseq3
./baseq3

----------------------
0 files in pk3 files
"pak0.pk3" is missing. Please copy it from your legitimate Q3 CDROM. Point Release files are missing. Please re-install the 1.32 point release. Also check that your ioq3 executable is in the correct place and that every file in the "baseq3" directory is present and readable
sh: zenity: not found


patched r2080 where it pops up with xmessage:

$ ./ioquake3_r2080_dialog.i386
ioq3 1.36_SVN2080 linux-i386 Jul 14 2011
Have SSE support
----- FS_Startup -----
Current search path:
/home/undead/.q3a/baseq3
./baseq3

----------------------
0 files in pk3 files
"pak0.pk3" is missing. Please copy it from your legitimate Q3 CDROM. Point Release files are missing. Please re-install the 1.32 point release. Also check that your ioq3 executable is in the correct place and that every file in the "baseq3" directory is present and readable
Comment 1 Simon McVittie 2011-07-15 04:01:49 EDT
Rather than second-guessing the shell's PATH search, it might be better to expect particular exit codes, and treat everything else as an error? Sys_Zenity etc. could return DR_OK, DR_CANCEL or a new DR_ERROR, and do appropriate processing on the return from system() before they return.

Zenity exits 0 for OK, 1 for cancel, -1 on unexpected error (which probably comes out as 127 or 255 from system()), 5 on timeout (not relevant to us). So we could treat 0 as OK/yes, 1 as cancel/no and anything else as error.

xmessage exits 0 for OK, 1 for error, and (as we invoke it) also 1 for cancel (oops). We should probably change the command line options so it exits 2 for cancel, and treat 0 as OK/yes, 2 as cancel/no and anything else as error.

kdialog doesn't document its exit codes, but is probably the same as dialog(1), which exits 0 on OK/yes, 1 on cancel/no, 2 or 3 on buttons we never actually use, and -1 on internal error or exiting via pressing ESC (but ESC can be mapped to something less ambiguous with the DIALOG_ESC environment variable).

If a command isn't found, bash will use 127 (not found) or 126 (found but cannot be executed). dash (Debian/Ubuntu /bin/sh) doesn't document what exit code it will use, but it appears to be 127.
Comment 2 Simon McVittie 2011-07-15 04:09:14 EDT
> kdialog doesn't document its exit codes

https://projects.kde.org/projects/kde/kdebase/kde-baseapps/repository/revisions/8953d76773a39a756513280a04409f8984840268/entry/kdialog/kdialog.cpp says:

// We want to return 0 for ok, yes and continue, 1 for no and 2 for cancel

so I think it should be safe to interpret anything else as error.
Comment 3 Tim Angus 2011-07-15 10:33:48 EDT
I agree regarding iterating over the system PATH variable; I don't think that's a great idea. I think it would probably be enough just to change...

if( exitCode >= 0 )

...to...

if( exitCode >= 0 && exitCode < 64 )

The choice of 64 is rather arbitrary. This way the shell indicating the command is not present is caught, but it still leaves plenty of numerical space incase the return codes of the individual commands change (highly unlikely).

I think it's relatively safe to assume that a zero result from the command means OK/Yes and a non-zero result means Cancel/No.

The less safe part is assuming that the shell itself returns something < 0 or >= 64 if the command cannot be found/exectued, but I think in the grand scheme of things that's still a fair assumption since in almost all cases system will use something like /bin/sh.

Could you give this a try and see if it works for you?
Comment 4 q3urt.undead 2011-07-16 22:45:09 EDT
Tim: That's fine if you want to try to execute commands that don't exist.  However, < 64 is not safe.  When I run xmessage, I get 101 when I click ok.  I think exit code 127 is always command not found because /bin/sh is a POSIX shell.  I don't have the spec in front of me, but my guess is that it's a requirement.  Does anyone have a copy of it?  On Debian, you can use dash instead of /bin/sh (bash) which is smaller but still meets the POSIX requirements.

The system() man page covers these return codes conditions:
When an error occurs in system() such as fork() failure, -1
When /bin/sh cannot be executed and the string is not NULL, the exit status will be exit(127).
When the passed in string is NULL, it returns nonzero if the shell is available or zero if not.  Could it return 127 here too since it's vague?  Not worth worrying about since we don't pass in NULL.
Otherwise, it uses the WEXITSTATUS(status) as specified in wait(2).

At this point, I think you have to look at the POSIX spec to see what it says a /bin/sh must return when a command is not found.  My guess is that it requires 127 which is what we see in practice when running a command string outside of system().

I think these are the safe assumptions:

0 = yes/ok (but this isn't the only exit code that means yes/ok).
127 = command not found

I'm seeing 101 means yes/ok as well in xmessage.  I think someone would have to go through all of the ways you call xmessage, kdialog and zenity to see how they respond to all of the ways you are invoking it.

It would be best to find out if there's a way to force all three commands to use specific exit codes.  For instance, instead of "-buttons OK", use "-buttons OK:0" in xmessage.  Then it returns 0 rather than 101 on yes/ok.

Since you want to run through all of the commands in ioquake3, you will also want to redirect stdout and stderr of the processes so it doesn't clutter the screen with "sh: <command> not found" messages.

Simon: You must be assuming something about the options because that is not necessarily how ioquake3 calls it in practice.  When I get the error condition that I mention above, it uses the default path in the switch because I only have an "OK" button.  When you only have OK, it returns something else.

$ xmessage -center -buttons OK "Hi there"
$ echo $?
101

Now you could change ioquake3 to call it like this in order to force it to use 0:

$ xmessage -center -buttons OK:0 "Hi there"
$ echo $?
0

To summarize, you can assume 127 = command not found and then you need to force all three commands to return certain return codes or go through them all and find out what they return.
Comment 5 Simon McVittie 2011-07-17 14:29:24 EDT
(In reply to comment #4)
> I
> think exit code 127 is always command not found because /bin/sh is a POSIX
> shell.  I don't have the spec in front of me, but my guess is that it's a
> requirement.

POSIX 2008 <http://pubs.opengroup.org/onlinepubs/9699919799/> basically says the same as bash documents:
> If a command is not found, the exit status shall be 127. If the command name
> is found, but it is not an executable utility, the exit status shall be 126.
> Applications that invoke utilities without using the shell should use these
> exit status values to report similar errors.
>
> If a command fails during word expansion or redirection, its exit status
> shall be greater than zero.

(In reply to comment #4)
> On Debian, you can use dash
> instead of /bin/sh (bash) which is smaller but still meets the POSIX
> requirements.

(More precisely, /bin/sh on Debian can either be dash or bash; dash is the default for new installs.)

As I said above, dash doesn't document its behaviour, but in practice it's the same as specified by POSIX.

> To summarize, you can assume 127 = command not found and then you need to force
> all three commands to return certain return codes or go through them all and
> find out what they return.

Yes, that; except that 126 will almost never be seen in practice (unless you deliberately chmod -x /usr/bin/zenity, or whatever), but it'd be correct to treat it like 127.
Comment 6 Tim Angus 2011-07-18 06:20:41 EDT
(In reply to comment #4)
> Tim: That's fine if you want to try to execute commands that don't exist.

Yes, that's how it's supposed to work, but I confess I hadn't considered...

> Since you want to run through all of the commands in ioquake3, you will also
> want to redirect stdout and stderr of the processes so it doesn't clutter the
> screen with "sh: <command> not found" messages.

Hopefully it'll be enough to tack on a "&> /dev/null" to the commands. The assumption is that we're using some breed of sh, so I guess that's fair.

> However, < 64 is not safe.  When I run xmessage, I get 101 when I click ok.

OK, that's a problem. I've fixed that.

> To summarize, you can assume 127 = command not found and then you need to force
> all three commands to return certain return codes or go through them all and
> find out what they return.

OK.
Comment 7 Tim Angus 2011-07-18 06:51:10 EDT
Created attachment 2850 [details]
Redirect output to null, check error code against 127

Hopefully this solves the problems? Let me know.
Comment 8 Simon McVittie 2011-07-18 06:59:11 EDT
(In reply to comment #6)
> Hopefully it'll be enough to tack on a "&> /dev/null" to the commands.

&> is a bash extension and won't work in (at least) dash; please use ">/dev/null 2>/dev/null", which is portable (and specified by POSIX [1]).

The Autoconf documentation has some good information on what is and isn't portable in shell scripts, even if you're not actually using Autoconf.

> The
> assumption is that we're using some breed of sh, so I guess that's fair.

system() is specified by POSIX [2] in terms of "sh -c", where sh is a POSIX (Bourne-derived) shell. Typically that means /bin/sh, which is a symlink to bash, dash, ksh or something.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_02
[2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/system.html
Comment 9 Tim Angus 2011-07-18 07:05:40 EDT
Created attachment 2851 [details]
Redirect output to null, check error code against 127

Alright.
Comment 10 Simon McVittie 2011-07-18 07:39:06 EDT
(In reply to comment #9)
> Redirect output to null, check error code against 127

Looks good to me, FWIW.
Comment 11 q3urt.undead 2011-07-18 13:35:29 EDT
Created attachment 2852 [details]
Redirect to null, check error codes properly

Tim: The system() return codes were not correct.  System() doesn't return the child exit code directly.

There are three cases:

int code = system(...);

code == -1 is a failure (fork() etc)
code & 127 != 0 means the child process had a signal
code >> 8 means the actual child exit code that you want to check against 126/127

Attached is a patch which fixes this.  Using this patch, it works as expected and pops up the xmessage dialog.  I based it off of your patch rather than my earlier one.
Comment 12 Tim Angus 2011-07-18 15:33:32 EDT
Fixed in r2092.