Bug 3427 - Single Player causes SIGBUS on SGI IRIX
Status: RESOLVED FIXED
Alias: None
Product: ioquake3
Classification: Unclassified
Component: Platform
Version: GIT MASTER
Hardware: SGI IRIX
Importance: P3 critical
Assignee: Zachary J. Slater
QA Contact: ioquake3 bugzilla mailing list
URL:
Depends on:
Blocks:
 
Reported: 2007-11-15 02:29 EST by Patrick Baggett
Modified: 2008-02-20 14:42:03 EST
CC List: 0 users

See Also:


Attachments
Patch against R1266 to fix SIGBUS on SGI IRIX 6.5 (4.37 KB, patch)
2008-02-16 21:18 EST, Patrick Baggett

Description Patrick Baggett 2007-11-15 02:29:56 EST
Creating a single player game under SGI IRIX raises a SIGBUS and the game aborts. The cause seems to be the code in botlib/l_precomp.c. I have heard that other RISC ports of Quake3 (PowerPC, SPARC) have a similar problem, though I can't confirm it. If so, it seems likely that RISC-based machines are hitting unaligned data accesses, which are the usual source of SIGBUS.

Looking into l_precomp.c(1616):
typedef struct value_s
{
	signed long int intvalue;
	double floatvalue;
	int parentheses;
	struct value_s *prev, *next;
} value_t;

Reading "floatvalue" raises SIGBUS on a RISC machine when it is not 8-byte aligned. In a 64-bit port, the pointers "prev" and "next" can also end up unaligned.

The following ordering is better:
typedef struct value_s
{
	double floatvalue;
	struct value_s *prev, *next;
	signed long int intvalue;
	int parentheses;
} value_t;


The first field is 8-byte aligned; the two pointers that follow are aligned in both 32- and 64-bit builds; the long int (64 bits on LP64, 32 bits on ILP32) falls on its natural boundary; and finally, the 32-bit int is aligned. There are probably a few other RISC-unfriendly structures in use, and even x86 processors, which tolerate misaligned accesses, benefit from naturally aligned data.

I am currently working on a fix for this under SGI IRIX. It would be interesting to see whether it also solves the problem on other RISC architectures. Reported against SVN revision 1212.

Patrick Baggett
Figgle Software
Comment 1 Patrick Baggett 2007-11-16 21:05:03 EST
Changing the structures in l_precomp.c and l_script.h that use doubles (64 bits, requiring 8-byte alignment) to floats (32 bits, requiring 4-byte alignment) fixes the SIGBUS error but causes a SIGSEGV elsewhere in the code after adding a bot. Still looking into it.
Comment 2 Patrick Baggett 2007-11-18 04:15:08 EST
With the changes mentioned above (double->float) applied, adding a bot to a multiplayer game triggers the SIGSEGV.
The SIGSEGV occurs when accessing sv_maxbarrier in be_ai_move.c:BotGapDistance() at this line:
end[2] -= 48 + sv_maxbarrier->value;

The value of sv_maxbarrier is something strange like 0x30000000.
I added a printf() where sv_maxbarrier is initialized [BotSetupMoveAI()], but that message never showed up, so I'm assuming the function isn't being called. Considering the context, I think it should be. All of the other libvar_t* variables are NULL when the SIGSEGV occurs.


Applying this change somehow magically fixes the problem:
Index: botlib/be_ai_move.c
===================================================================
--- botlib/be_ai_move.c (revision 1212)
+++ botlib/be_ai_move.c (working copy)
@@ -102,7 +102,7 @@
 #define MODELTYPE_FUNC_STATIC  4
 
 libvar_t *sv_maxstep;
-libvar_t *sv_maxbarrier;
+libvar_t *sv_maxbarrier = (libvar_t*)0xFFFFFFFF;
 libvar_t *sv_gravity;
 libvar_t *weapindex_rocketlauncher;
 libvar_t *weapindex_bfg10k;


With this added, I can play with bots just fine. I can't wrap my head around why initializing the pointer to 0xFFFFFFFF would fix this problem, and I doubt this solution is portable. Even so, I feel bad writing something like:

libvar_t *sv_maxbarrier = (libvar_t*)0xFFFFFFFF; //MAGIC, DON'T TOUCH

Any thoughts?
Comment 3 Tim Angus 2007-12-01 15:23:07 EST
Has r1219 helped this at all?
Comment 4 Patrick Baggett 2007-12-01 20:07:27 EST
(In reply to comment #3)
> Has r1219 helped this at all?
> 

I'll test this Sunday night and get back to you on that. Thanks for bringing that up.
Comment 5 Tim Angus 2007-12-02 08:30:37 EST
...and r1225.
Comment 6 Patrick Baggett 2008-02-16 21:18:48 EST
Created attachment 1667
Patch against R1266 to fix SIGBUS on SGI IRIX 6.5

It's been a while, but I promised I'd get this tested. The earlier commits fixing the botlib were successful. Thanks for the help!
This patch (against R1266) fixes the SIGBUS issues described in the initial report. ioquake3 builds cleanly and, with this patch, should run out of the box. Bots work great: no strange errors or crashes with two bots added.
Comment 7 Tim Angus 2008-02-20 14:42:03 EST
Fixed in r1268.