Bug 2998 - Getting NaN
Status: RESOLVED FIXED
Alias: None
Product: ioquake3
Classification: Unclassified
Component: Video
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Assignee: Tremulous Bugs
Duplicates: 3709
Reported: 2007-01-05 04:53 EST by Kyle Hunter
Modified: 2013-02-27 20:03:54 EST
Description Kyle Hunter 2007-01-05 04:53:55 EST
Hello, I originally made a post at http://tremulous.net/phpBB2/viewtopic.php?t=3097

I sometimes get NaN for the s and t arguments passed to R_FogFactor. It crashes my client at random times during a game. I am not 100% sure that NaN is the cause of the crash, but the GNU Debugger points specifically to line 2106 of tr_image.c, which is:
d = tr.fogTable[ (int)(s * (FOG_TABLE_SIZE-1)) ];
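The line above casts a float expression to int and uses it as a table index. In C, converting NaN to an integer is undefined behavior, and ordinary range checks such as `if ( s < 0 )` do not catch NaN because every comparison with NaN is false. On x86-64 the SSE conversion typically yields INT_MIN, which would index far outside the table. The following is a minimal sketch of a guarded index computation, not the engine's code: the helper name `fog_table_index` is hypothetical, and the table size of 256 is an assumption, since the real FOG_TABLE_SIZE value is not shown in this report.

```c
#include <math.h>

#define FOG_TABLE_SIZE 256  /* assumed value, for illustration only */

/* Hypothetical helper: map s in [0,1] to a valid table index.
 * NaN and out-of-range inputs are clamped instead of being cast blindly. */
static int fog_table_index(float s)
{
    if (isnan(s) || s <= 0.0f) {
        return 0;                       /* NaN or below range: first entry */
    }
    if (s >= 1.0f) {
        return FOG_TABLE_SIZE - 1;      /* above range (incl. +inf): last entry */
    }
    return (int)(s * (FOG_TABLE_SIZE - 1));
}
```

A guard like this would only hide the symptom; it would not explain where the NaN comes from.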
Comment 1 Tim Angus 2007-01-05 05:20:02 EST
(FWIW, x86_64 isn't a platform I have the ability or inclination to support right now, so don't be surprised if this doesn't get fixed.)

The renderer in Trem has changed very little in comparison to Q3. It would be helpful if you could compile ioq3 (http://icculus.org/quake3/) with similar settings and try to reproduce the bug there. Since the bug seems to involve fog, I suggest testing only on maps with fog.
Comment 2 Kyle Hunter 2007-01-05 05:41:37 EST
I've played ioq3 and OpenArena, which have an unmodified renderer, and it doesn't seem to happen there. That's not to say the bug isn't there; I have not played them as extensively as Tremulous.

The bug happens to me regardless of whether I'm in fog or not. I've hit this issue on maps that seemingly don't have fog at all (ACTS). I've tried staying up in the fog on Transit for a while, and nothing happened.

It's very random and may take a few hours to reproduce. I'll try to reproduce it under ioq3 or OpenArena.
Comment 3 Kyle Hunter 2007-01-05 05:45:30 EST
It is also notable that many people have compiled Tremulous in an x86_64 environment and it has worked exceptionally well. I'm not sure whether this bug is x86_64 specific.

It may even be my hardware, as suggested in the forums. I've run various tests on my hardware to make sure it is sane. Of course, floating-point values aren't perfectly accurate, but I don't think being off by 0.0000~ would affect this code at all.

At first it looked like an array overrun, but further inspection has led me to rule that out.
Comment 4 Tim Angus 2007-08-06 13:33:24 EDT
Closing due to inactivity. Re-open if more information becomes available.
Comment 5 Amanieu d'Antras 2008-07-06 07:13:48 EDT
*** Bug 3709 has been marked as a duplicate of this bug. ***
Comment 6 Amanieu d'Antras 2008-07-06 07:16:51 EDT
This bug occurs in Tremulous when playing the arachnid2 map. I don't know how to trigger it; I just move around the level randomly until it occurs.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f32879f6720 (LWP 26668)]
0x000000000049f9c7 in R_FogFactor (s=nan(0x7fffff), t=nan(0x7fffff)) at src/renderer/tr_image.c:1024
1024            d = tr.fogTable[ (int)(s * (FOG_TABLE_SIZE-1)) ];
(gdb) print s
$1 = nan(0x7fffff)
(gdb) bt
#0  0x000000000049f9c7 in R_FogFactor (s=nan(0x7fffff), t=nan(0x7fffff)) at src/renderer/tr_image.c:1024
#1  0x00000000004b758c in RB_CalcModulateAlphasByFog (colors=0x1115060 "...") at src/renderer/tr_shade_calc.c:765
#2  0x00000000004b4414 in ComputeColors (pStage=0x7f327e9107c0) at src/renderer/tr_shade.c:933
#3  0x00000000004b4b4b in RB_IterateStagesGeneric (input=0x11017e0) at src/renderer/tr_shade.c:1061
#4  0x00000000004b4e36 in RB_StageIteratorGeneric () at src/renderer/tr_shade.c:1188
#5  0x00000000004b53cd in RB_EndSurface () at src/renderer/tr_shade.c:1447
#6  0x000000000049005d in RB_RenderDrawSurfList (drawSurfs=0x7f327ae80020, numDrawSurfs=690) at src/renderer/tr_backend.c:565
#7  0x0000000000490ebd in RB_DrawSurfs (data=0x7f327afb04f8) at src/renderer/tr_backend.c:904
#8  0x0000000000491363 in RB_ExecuteRenderCommands (data=0x7f327afb04f8) at src/renderer/tr_backend.c:1074
#9  0x0000000000498566 in R_IssueRenderCommands (runPerformanceCounters=qtrue) at src/renderer/tr_cmds.c:157
#10 0x0000000000498b19 in RE_EndFrame (frontEndMsec=0x0, backEndMsec=0x0) at src/renderer/tr_cmds.c:433
#11 0x000000000041d28d in SCR_UpdateScreen () at src/client/cl_scrn.c:508
#12 0x0000000000417f7a in CL_Frame (msec=12) at src/client/cl_main.c:2278
#13 0x000000000043b792 in Com_Frame () at src/qcommon/common.c:2746
#14 0x00000000004c9126 in main (argc=7, argv=0x7fff8fbe4028) at src/sys/sys_main.c:607

I have traced the bug to tess.xyz containing NaN values, which were used in RB_CalcFogTexCoords, which is called in RB_CalcModulateAlphasByFog before calling R_FogFactor.
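To confirm this kind of finding without stepping through gdb by hand, one could scan the vertex buffer before the fog calculation runs. The sketch below is only an illustration under stated assumptions: the helper name `first_nan_vertex` is hypothetical, and the layout of four floats per vertex with x, y, z in the first three slots is assumed rather than taken from this report.

```c
#include <math.h>

/* Hypothetical helper: returns the index of the first vertex that has a
 * NaN coordinate, or -1 if none is found.
 * Assumed layout: 4 floats per vertex, with x, y, z in slots 0..2. */
static int first_nan_vertex(const float *xyz, int numVertexes)
{
    int i, j;

    for (i = 0; i < numVertexes; i++) {
        for (j = 0; j < 3; j++) {
            if (isnan(xyz[i * 4 + j])) {
                return i;
            }
        }
    }
    return -1;
}
```

In a build compiled with -ffast-math, the compiler may assume NaN never occurs and fold isnan() to false, so a check like this is best tried in a build without that flag.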
Comment 7 Jaak Ristioja 2008-07-12 15:56:04 EDT
Here's another backtrace:
#0  0x000000000046d5ae in R_FogFactor (s=<value optimized out>, t=-nan(0x7fffff)) at src/renderer/tr_image.c:4577
#1  0x000000000047d026 in RB_CalcModulateAlphasByFog (colors=0x10bcf00 "f\031\031...") at src/renderer/tr_shade_calc.c:765
#2  0x000000000047bbd3 in RB_StageIteratorGeneric () at src/renderer/tr_shade.c:933
#3  0x000000000047aa1f in RB_EndSurface () at src/renderer/tr_shade.c:1447
#4  0x0000000000464047 in RB_RenderDrawSurfList (drawSurfs=<value optimized out>, numDrawSurfs=3236) at src/renderer/tr_backend.c:565
#5  0x0000000000464358 in RB_DrawSurfs (data=0x7fcf7bf744f8) at src/renderer/tr_backend.c:906
#6  0x000000000046442b in RB_ExecuteRenderCommands (data=0x7fcf7bf744f8) at src/renderer/tr_backend.c:1081
#7  0x00000000004698cf in RE_EndFrame (frontEndMsec=0x0, backEndMsec=0x0) at src/renderer/tr_cmds.c:433
#8  0x00000000004149aa in SCR_UpdateScreen () at src/client/cl_scrn.c:519
#9  0x000000000041175f in CL_Frame (msec=13) at src/client/cl_main.c:2190
#10 0x0000000000427f89 in Com_Frame () at src/qcommon/common.c:2679
#11 0x000000000048bf5b in main (argc=1, argv=0x7fff8f5cd5a8) at src/unix/unix_main.c:1489

Variable i (int) overflows in RB_CalcModulateAlphasByFog, I guess:

(gdb) f 1
(gdb) print i
$8 = -2147483648
(gdb) list
760             // this is not wasted, because it would only have
761             // been previously called if the surface was opaque
762             RB_CalcFogTexCoords( texCoords[0] );
763
764             for ( i = 0; i < tess.numVertexes; i++, colors += 4 ) {
765                     float f = 1.0 - R_FogFactor( texCoords[i][0], texCoords[i][1] );
Comment 8 Amanieu d'Antras 2008-07-12 16:00:17 EDT
I got a sensible value of i when it crashed for me: it crashed right at the start, when i = 0. As I said before, try printing the contents of tess.xyz, and use a debug build rather than a release build.
Comment 9 Jaak Ristioja 2008-07-12 17:29:03 EDT
(In reply to comment #8)
> I got a decent value of i when it crashed for me, crashed right at the start
> when i = 0. As I said before, try printing the contents of tess.xyz and don't
> use a release build, use a debug build.
> 

OK, yeah. With a debug build it crashes when i is 0.
Comment 10 keenblade 2008-07-15 04:35:19 EDT
I'm almost sure this bug is x86_64 specific. I use the Gentoo ~amd64 arch. I have not had a crash using i386 binaries. As soon as I use x86_64 binaries, the crash occurs randomly and very frequently.
Comment 11 Amanieu d'Antras 2008-07-15 06:51:44 EDT
If that is the case then the bug is probably due to a buffer overflow somewhere. Effects would be different on i386 because the stack layout is different.
Comment 12 Amanieu d'Antras 2008-07-21 16:27:56 EDT
Update on the bug:
The NaN tess.xyz values seem to be caused by a blood entity (when you shoot someone in trem there is a small spot of blood) having a NaN origin.
Comment 13 Amanieu d'Antras 2008-08-01 09:52:16 EDT
I've given up tracking this bug. Apparently it is somewhere in the cgame, and only occurs when using qvms. Putting some debug printfs in the code made the bug disappear.

Anyways, here's my fix for it:
--- a/src/renderer/tr_scene.c	Wed Jul 30 18:19:53 2008 +0800
+++ b/src/renderer/tr_scene.c	Fri Aug 01 01:07:03 2008 +0800
@@ -212,6 +212,9 @@
 	if ( r_numentities >= MAX_ENTITIES ) {
 		return;
 	}
+	if ( Q_isnan(ent->origin[0]) || Q_isnan(ent->origin[1]) || Q_isnan(ent->origin[2]) ) {
+		return;
+	}
 	if ( ent->reType < 0 || ent->reType >= RT_MAX_REF_ENTITY_TYPE ) {
 		ri.Error( ERR_DROP, "RE_AddRefEntityToScene: bad reType %i", ent->reType );
 	}
Comment 14 Tim Angus 2008-08-01 09:59:52 EDT
Try checking the values of re.origin and re.axis[ ] in CG_RenderParticle, right before the call to trap_R_AddRefEntityToScene.
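A check of that kind could look roughly like the sketch below. It is written for a native build with C99 math.h and uses a hypothetical stand-in struct, `ref_sketch_t`, that holds only the two fields named above; it is not the engine's refEntity_t, and the helper name is likewise made up for illustration. It uses isfinite() so that infinities are flagged as well as NaN.

```c
#include <math.h>

/* Hypothetical stand-in holding only the fields of interest;
 * this is NOT the engine's refEntity_t. */
typedef struct {
    float origin[3];
    float axis[3][3];
} ref_sketch_t;

/* Returns 1 if every origin and axis component is finite
 * (neither NaN nor infinite), 0 otherwise. */
static int ref_is_finite(const ref_sketch_t *re)
{
    int i, j;

    for (i = 0; i < 3; i++) {
        if (!isfinite(re->origin[i])) {
            return 0;
        }
        for (j = 0; j < 3; j++) {
            if (!isfinite(re->axis[i][j])) {
                return 0;
            }
        }
    }
    return 1;
}
```

Calling such a helper right before the entity is submitted, and printing the offending values when it returns 0, would show whether the bad origin already exists on the cgame side.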
Comment 15 Jaak Ristioja 2008-08-01 15:39:58 EDT
I was unable to reproduce this bug today after I compiled Tremulous without -ffast-math (sed -i -e 's/-ffast-math//g' Makefile).
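One reason -ffast-math matters here: in GCC it implies -ffinite-math-only, which lets the compiler assume NaN never occurs, so floating-point NaN checks such as `x != x` can be optimized away. A test on the raw bit pattern does not depend on floating-point comparison semantics. The sketch below illustrates the idea for IEEE 754 single precision; the name `float_bits_are_nan` is hypothetical, and this is not claimed to be how Q_isnan is implemented.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: NaN test on the raw IEEE 754 single-precision bits.
 * A NaN has all eight exponent bits set and a non-zero mantissa. */
static int float_bits_are_nan(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);         /* well-defined way to read the bits */
    return (bits & 0x7F800000u) == 0x7F800000u
        && (bits & 0x007FFFFFu) != 0;
}
```

The value gdb printed earlier in this report, nan(0x7fffff), corresponds to a mantissa with all bits set, which this test reports as NaN; infinities have a zero mantissa and are not flagged.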
Comment 16 keenblade 2008-08-08 17:01:40 EDT
(In reply to comment #15)
> I was unable to reproduce this bug today after i compiled Tremulous without
> -ffast-math (sed -i -e 's/-ffast-math//g' Makefile).
> 

This does not work for me. It still crashes frequently, especially on the map arachnid2.
Comment 17 keenblade 2008-08-09 10:58:26 EDT
(In reply to comment #13)
> I've given up tracking this bug. Apparently it is somewhere in the cgame, and
> only occurs when using qvms. Putting some debug printfs in the code made the
> bug disappear.
> 
> Anyways, here's my fix for it:
> --- a/src/renderer/tr_scene.c   Wed Jul 30 18:19:53 2008 +0800
> +++ b/src/renderer/tr_scene.c   Fri Aug 01 01:07:03 2008 +0800
> @@ -212,6 +212,9 @@
>         if ( r_numentities >= MAX_ENTITIES ) {
>                 return;
>         }
> +       if ( Q_isnan(ent->origin[0]) || Q_isnan(ent->origin[1]) ||
> Q_isnan(ent->origin[2]) ) {
> +               return;
> +       }
>         if ( ent->reType < 0 || ent->reType >= RT_MAX_REF_ENTITY_TYPE ) {
>                 ri.Error( ERR_DROP, "RE_AddRefEntityToScene: bad reType %i",
> ent->reType );
>         }
> 
Amanieu, this patch made the bug disappear for me, too. Thanks.
Comment 18 Amanieu d'Antras 2008-08-09 11:08:08 EDT
The bug is still there, the patch merely prevents it from doing damage.
Comment 19 Ryan C. Gordon 2009-09-14 19:02:57 EDT
I put the patch (and a Com_printf) in there, svn revision #1595. My belief is that when this is most likely to trigger, it's a compiler optimization bug, so no further fix is planned.

--ryan.
Comment 20 Jaak Ristioja 2013-02-27 15:27:00 EST
(In reply to comment #19)
> I put the patch (and a Com_printf) in there, svn revision #1595. My belief
> is that when this is most likely to trigger, it's a compiler optimization
> bug, so no further fix is planned.

Belief versus proven fact. Until we get to the bottom of this, regardless of whether this is a compiler bug or not, we risk this issue reappearing.
Comment 21 Ryan C. Gordon 2013-02-27 20:03:54 EST
(In reply to comment #19)
> I put the patch (and a Com_printf) in there, svn revision #1595. My belief
> is that when this is most likely to trigger, it's a compiler optimization
> bug, so no further fix is planned.

Wow, when I upgraded Bugzilla today, it decided to post a few unsent emails from 3.5 years ago, including Comment #19.

(In reply to comment #20)
> Belief versus proven fact. Until we get to the bottom of this, regardless of
> whether this is compiler bug or not, we risk this issue re-appearing.

You don't look busy, feel free to investigate.  :)

--ryan.