Description Patrick Baggett
2007-11-15 02:29:56 EST
Creating a single player game under SGI IRIX causes a SIGBUS to be raised and the game aborts. This seems to be caused by the code in botlib/l_precomp.c. Since I have heard that other RISC ports of Quake3 (PowerPC, SPARC) are having a similar problem (though I can't confirm it), it seems quite clear that RISC-based machines are hitting unaligned data (the source of SIGBUS).
Looking into l_precomp.c(1616):
    typedef struct value_s
    {
        signed long int intvalue;
        double floatvalue;
        int parentheses;
        struct value_s *prev, *next;
    } value_t;
Reading "floatvalue" will raise SIGBUS on a RISC machine since it is not 8-byte aligned. On a 64-bit port, the pointers "prev" and "next" are also unaligned.
The following is a more optimal ordering:
    typedef struct value_s
    {
        double floatvalue;
        struct value_s *prev, *next;
        signed long int intvalue;
        int parentheses;
    } value_t;
The first field is 8-byte aligned, the next two are aligned in both 32- and 64-bit builds, the long int (which is 64 bits on LP64 and 32 bits on ILP32) is also aligned to its natural boundary, and finally, the 32-bit integer is aligned. There are probably a few other RISC-unfriendly structures being used, though certainly even x86 processors would benefit from data that is naturally aligned.
I am currently working on a fix for this under SGI IRIX. It would be interesting to see if other RISC architectures have their problem solved. Reported against SVN revision 1212.
Patrick Baggett
Figgle Software
Changing the structures in l_precomp.c and l_script.h that use doubles (64 bits, requiring 8-byte alignment) to floats (32 bits, requiring 4-byte alignment) fixes the SIGBUS error, but causes a SIGSEGV elsewhere in the code after adding a bot. Still looking into it.
After the changes mentioned (double->float), the SIGSEGV is triggered when a bot is added to a multiplayer game.
The SIGSEGV occurs when accessing sv_maxbarrier in be_ai_move.c:BotGapDistance() at this line:
end[2] -= 48 + sv_maxbarrier->value;
The value of sv_maxbarrier is something strange like 0x30000000.
I added a printf() where sv_maxbarrier is initialized [BotSetupMoveAI()], but that message never showed up, so I'm assuming the function isn't being called. Considering the context, I think it should. All of the other libvar_t* variables are set to NULL when the SIGSEGV occurs.
Applying this somehow magically fixes this problem.
Index: botlib/be_ai_move.c
===================================================================
--- botlib/be_ai_move.c (revision 1212)
+++ botlib/be_ai_move.c (working copy)
@@ -102,7 +102,7 @@
#define MODELTYPE_FUNC_STATIC 4
libvar_t *sv_maxstep;
-libvar_t *sv_maxbarrier;
+libvar_t *sv_maxbarrier = (libvar_t*)0xFFFFFFFF;
libvar_t *sv_gravity;
libvar_t *weapindex_rocketlauncher;
libvar_t *weapindex_bfg10k;
With this added, I can play with bots just fine. I can't seem to wrap my head around why setting the value to 0xFFFFFFFF would fix this problem, and somehow I don't think this solution is portable. Even so, I feel bad writing something like:
libvar_t *sv_maxbarrier = (libvar_t*)0xFFFFFFFF; //MAGIC, DON'T TOUCH
Any thoughts?
Created attachment 1667
Patch against R1266 to fix SIGBUS on SGI IRIX 6.5
It's been a while, but I promised I'd get this tested. The earlier commits fixing the botlib were successful. Thanks for the help!
This patch (against R1266) fixes the SIGBUS issues described in the initial bug report. ioquake3 builds cleanly and, with this patch, should run out of the box. Bots work great, with no strange errors or crashes with two bots added.