Hello, I originally made a post at http://tremulous.net/phpBB2/viewtopic.php?t=3097
I sometimes get NaN for the s and t arguments passed to R_FogFactor. It crashes my client at random times during the game. I am not 100% sure that the NaN is why it crashes, but the GNU Debugger has specifically pointed me to line 2106 of tr_image.c, which is:
d = tr.fogTable[ (int)(s * (FOG_TABLE_SIZE-1)) ];
(FWIW x86_64 isn't a platform I have the ability or inclination to support right now, so don't be surprised if this doesn't get fixed).
The renderer in Trem has changed very little in comparison to Q3. It would be helpful if you could compile ioq3 (http://icculus.org/quake3/) with similar settings and try to reproduce the bug there. Since the bug seems to involve fog, I suggest only testing on maps with fog.
I've played ioQ3 and OpenArena, which have an unmodified renderer, and it doesn't seem to happen there, though that's not to say the bug isn't there. I have not played them as extensively as Tremulous.
The bug happens to me regardless of whether I'm rolling around in fog or not. I've had this issue in maps that seemingly don't have fog at all (ACTS). I've tried staying up in the fog on Transit for a while, and nothing happened.
It's very random and may take a few hours to reproduce. I'll try to reproduce it under ioq3 or OpenArena.
It is also notable that many people have compiled Tremulous in an x86_64 environment and it has worked exceptionally well. I'm not sure whether this bug is x86_64 specific.
It may even be my hardware, as suggested in the forums. I've run various tests on my hardware to make sure it is sane. Of course, floating point numbers aren't perfectly accurate, but I don't think being 0.0000~ off would affect this code at all.
At first it looked like an array overrun, but further inspection has led me to rule that out.
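For illustration, here is a minimal C sketch of why a NaN s is a different problem from a small rounding error, and how a table lookup could guard against it. Every ordered comparison with NaN is false, so range checks written with < or > let it through, and converting NaN to int is undefined behaviour in C. The helper name fog_index and the table size of 256 are assumptions made for this sketch, not taken from the renderer source. Note also that isnan() may be optimized away when building with -ffast-math.

```c
#include <math.h>

#define FOG_TABLE_SIZE 256  /* assumed size, for illustration only */

/* Hypothetical helper: map s in [0,1] to a valid table index.
   Returns -1 when s is NaN so the caller can skip the lookup. */
static int fog_index(float s)
{
    if (isnan(s)) {
        return -1;          /* NaN would fail both clamps below */
    }
    if (s < 0.0f) {
        s = 0.0f;           /* also catches -infinity */
    } else if (s > 1.0f) {
        s = 1.0f;           /* also catches +infinity */
    }
    return (int)(s * (FOG_TABLE_SIZE - 1));
}
```

A caller would treat a negative result as "no fog value available" instead of indexing the table with it.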
This bug occurs in Tremulous when playing the arachnid2 map. I don't know how
to trigger it; I just move around the level randomly until it occurs.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f32879f6720 (LWP 26668)]
0x000000000049f9c7 in R_FogFactor (s=nan(0x7fffff), t=nan(0x7fffff)) at
src/renderer/tr_image.c:1024
1024 d = tr.fogTable[ (int)(s * (FOG_TABLE_SIZE-1)) ];
(gdb) print s
$1 = nan(0x7fffff)
(gdb) bt
#0 0x000000000049f9c7 in R_FogFactor (s=nan(0x7fffff), t=nan(0x7fffff)) at
src/renderer/tr_image.c:1024
#1 0x00000000004b758c in RB_CalcModulateAlphasByFog (colors=0x1115060
"��\231���\231���\231���\231�",
'�' <repeats 124 times>) at src/renderer/tr_shade_calc.c:765
#2 0x00000000004b4414 in ComputeColors (pStage=0x7f327e9107c0) at
src/renderer/tr_shade.c:933
#3 0x00000000004b4b4b in RB_IterateStagesGeneric (input=0x11017e0) at
src/renderer/tr_shade.c:1061
#4 0x00000000004b4e36 in RB_StageIteratorGeneric () at
src/renderer/tr_shade.c:1188
#5 0x00000000004b53cd in RB_EndSurface () at src/renderer/tr_shade.c:1447
#6 0x000000000049005d in RB_RenderDrawSurfList (drawSurfs=0x7f327ae80020,
numDrawSurfs=690) at src/renderer/tr_backend.c:565
#7 0x0000000000490ebd in RB_DrawSurfs (data=0x7f327afb04f8) at
src/renderer/tr_backend.c:904
#8 0x0000000000491363 in RB_ExecuteRenderCommands (data=0x7f327afb04f8) at
src/renderer/tr_backend.c:1074
#9 0x0000000000498566 in R_IssueRenderCommands (runPerformanceCounters=qtrue)
at src/renderer/tr_cmds.c:157
#10 0x0000000000498b19 in RE_EndFrame (frontEndMsec=0x0, backEndMsec=0x0) at
src/renderer/tr_cmds.c:433
#11 0x000000000041d28d in SCR_UpdateScreen () at src/client/cl_scrn.c:508
#12 0x0000000000417f7a in CL_Frame (msec=12) at src/client/cl_main.c:2278
#13 0x000000000043b792 in Com_Frame () at src/qcommon/common.c:2746
#14 0x00000000004c9126 in main (argc=7, argv=0x7fff8fbe4028) at
src/sys/sys_main.c:607
I have traced the bug to tess.xyz containing NaN values, which were used in RB_CalcFogTexCoords, which is called in RB_CalcModulateAlphasByFog before calling R_FogFactor.
Here's another backtrace:
#0 0x000000000046d5ae in R_FogFactor (s=<value optimized out>,
t=-nan(0x7fffff)) at src/renderer/tr_image.c:4577
#1 0x000000000047d026 in RB_CalcModulateAlphasByFog (
colors=0x10bcf00
"f\031\031�f\031\031�f\031\031�f\031\031�\177\177\177\001\177\177\177\001\177\177\177\001\177\177\177\001\177\177\177f\177\177\177f\177\177\177f\177\177\177f\177\177\177L\177\177\177L\177\177\177L\177\177\177L\177\177\177\032\177\177\177\032\177\177\177\032\177\177\177\032",
'�' <repeats 120 times>...) at src/renderer/tr_shade_calc.c:765
#2 0x000000000047bbd3 in RB_StageIteratorGeneric () at
src/renderer/tr_shade.c:933
#3 0x000000000047aa1f in RB_EndSurface () at src/renderer/tr_shade.c:1447
#4 0x0000000000464047 in RB_RenderDrawSurfList (drawSurfs=<value optimized
out>, numDrawSurfs=3236) at src/renderer/tr_backend.c:565
#5 0x0000000000464358 in RB_DrawSurfs (data=0x7fcf7bf744f8) at
src/renderer/tr_backend.c:906
#6 0x000000000046442b in RB_ExecuteRenderCommands (data=0x7fcf7bf744f8) at
src/renderer/tr_backend.c:1081
#7 0x00000000004698cf in RE_EndFrame (frontEndMsec=0x0, backEndMsec=0x0) at
src/renderer/tr_cmds.c:433
#8 0x00000000004149aa in SCR_UpdateScreen () at src/client/cl_scrn.c:519
#9 0x000000000041175f in CL_Frame (msec=13) at src/client/cl_main.c:2190
#10 0x0000000000427f89 in Com_Frame () at src/qcommon/common.c:2679
#11 0x000000000048bf5b in main (argc=1, argv=0x7fff8f5cd5a8) at
src/unix/unix_main.c:1489
Variable i (int) overflows in RB_CalcModulateAlphasByFog, I guess:
(gdb) f 1
(gdb) print i
$8 = -2147483648
(gdb) list
760 // this is not wasted, because it would only have
761 // been previously called if the surface was opaque
762 RB_CalcFogTexCoords( texCoords[0] );
763
764 for ( i = 0; i < tess.numVertexes; i++, colors += 4 ) {
765 float f = 1.0 - R_FogFactor( texCoords[i][0],
texCoords[i][1] );
I got a sane value of i when it crashed for me: it crashed right at the start, when i = 0. As I said before, try printing the contents of tess.xyz, and use a debug build rather than a release build.
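Rather than printing the whole array by hand, a small scan can report the first bad vertex. This is only a sketch: the function name first_nan_vertex is hypothetical, and the flat layout with a stride of 4 floats per vertex is an assumption about how tess.xyz is stored.

```c
#include <math.h>

/* Hypothetical debugging helper: return the index of the first vertex
   with a NaN in x, y or z, or -1 if every vertex is clean.
   xyz points at count vertices, each stride floats apart
   (stride = 4 is assumed for tess.xyz). */
static int first_nan_vertex(const float *xyz, int count, int stride)
{
    for (int i = 0; i < count; i++) {
        const float *v = xyz + (long)i * stride;
        for (int j = 0; j < 3; j++) {   /* only x, y, z are checked */
            if (isnan(v[j])) {
                return i;
            }
        }
    }
    return -1;
}
```

In a debug build this could be called just before the fog texture coordinates are computed, with a breakpoint or a print on any result other than -1.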
(In reply to comment #8)
> I got a decent value of i when it crashed for me, crashed right at the start
> when i = 0. As I said before, try printing the contents of tess.xyz and don't
> use a release build, use a debug build.
>
Ok, yeah. Using a debug build crashes when i is 0.
I'm almost sure this bug is x86_64 specific. I use the Gentoo ~amd64 arch. I have not had a crash using i386 binaries. As soon as I use x86_64, the crash occurs randomly and very frequently.
If that is the case then the bug is probably due to a buffer overflow somewhere. Effects would be different on i386 because the stack layout is different.
Update on the bug:
The NaN tess.xyz values seem to be caused by a blood entity (when you shoot someone in trem there is a small spot of blood) having a NaN origin.
I've given up tracking this bug. Apparently it is somewhere in the cgame, and only occurs when using qvms. Putting some debug printfs in the code made the bug disappear.
Anyways, here's my fix for it:
--- a/src/renderer/tr_scene.c Wed Jul 30 18:19:53 2008 +0800
+++ b/src/renderer/tr_scene.c Fri Aug 01 01:07:03 2008 +0800
@@ -212,6 +212,9 @@
if ( r_numentities >= MAX_ENTITIES ) {
return;
}
+ if ( Q_isnan(ent->origin[0]) || Q_isnan(ent->origin[1]) || Q_isnan(ent->origin[2]) ) {
+ return;
+ }
if ( ent->reType < 0 || ent->reType >= RT_MAX_REF_ENTITY_TYPE ) {
ri.Error( ERR_DROP, "RE_AddRefEntityToScene: bad reType %i", ent->reType );
}
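For readers who want to try the same guard outside the engine, here is a self-contained sketch of a NaN test that inspects the IEEE 754 bit pattern instead of comparing floats, so it does not rely on comparisons that an aggressive math flag might fold away. The names is_nan_bits and origin_is_valid are hypothetical, this is not claimed to be how Q_isnan is implemented, and 32-bit IEEE 754 floats are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a Q_isnan-style check.
   Assumes float is a 32-bit IEEE 754 value. */
static int is_nan_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined way to read the bits */
    bits &= 0x7FFFFFFFu;              /* drop the sign bit */
    return bits > 0x7F800000u;        /* exponent all ones, mantissa nonzero */
}

/* Returns 1 if no component of a 3-float origin is NaN. */
static int origin_is_valid(const float origin[3])
{
    return !is_nan_bits(origin[0]) &&
           !is_nan_bits(origin[1]) &&
           !is_nan_bits(origin[2]);
}
```

A guard like the one in the patch would then skip any entity for which origin_is_valid returns 0. Infinities pass this test, since only NaN is rejected.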
(In reply to comment #15)
> I was unable to reproduce this bug today after i compiled Tremulous without
> -ffast-math (sed -i -e 's/-ffast-math//g' Makefile).
>
This does not work for me. It still crashes frequently, especially on the map called arachnid2.
(In reply to comment #13)
> I've given up tracking this bug. Apparently it is somewhere in the cgame, and
> only occurs when using qvms. Putting some debug printfs in the code made the
> bug disappear.
>
> Anyways, here's my fix for it:
> --- a/src/renderer/tr_scene.c Wed Jul 30 18:19:53 2008 +0800
> +++ b/src/renderer/tr_scene.c Fri Aug 01 01:07:03 2008 +0800
> @@ -212,6 +212,9 @@
> if ( r_numentities >= MAX_ENTITIES ) {
> return;
> }
> + if ( Q_isnan(ent->origin[0]) || Q_isnan(ent->origin[1]) ||
> Q_isnan(ent->origin[2]) ) {
> + return;
> + }
> if ( ent->reType < 0 || ent->reType >= RT_MAX_REF_ENTITY_TYPE ) {
> ri.Error( ERR_DROP, "RE_AddRefEntityToScene: bad reType %i",
> ent->reType );
> }
>
Amanieu, this patch made the bug disappear for me, too. Thanks.
I put the patch (and a Com_printf) in there, svn revision #1595. My belief is that when this is most likely to trigger, it's a compiler optimization bug, so no further fix is planned.
--ryan.
(In reply to comment #19)
> I put the patch (and a Com_printf) in there, svn revision #1595. My belief
> is that when this is most likely to trigger, it's a compiler optimization
> bug, so no further fix is planned.
Belief versus proven fact. Until we get to the bottom of this, regardless of whether this is compiler bug or not, we risk this issue re-appearing.
(In reply to comment #19)
> I put the patch (and a Com_printf) in there, svn revision #1595. My belief
> is that when this is most likely to trigger, it's a compiler optimization
> bug, so no further fix is planned.
Wow, when I upgraded Bugzilla today, it decided to post a few unsent emails from 3.5 years ago, including Comment #19.
(In reply to comment #20)
> Belief versus proven fact. Until we get to the bottom of this, regardless of
> whether this is compiler bug or not, we risk this issue re-appearing.
You don't look busy, feel free to investigate. :)
--ryan.