Created attachment 2713
Micro-optimize VectorNormalize{,2}
The current VectorNormalize{,2} functions call sqrt unnecessarily.
vec_t VectorNormalize( vec3_t v ) {
	// NOTE: TTimo - Apple G4 altivec source uses double?
	float length, ilength;

	length = v[0]*v[0] + v[1]*v[1] + v[2]*v[2];
	length = sqrt (length);

	if ( length ) {
		ilength = 1/length;
		v[0] *= ilength;
		v[1] *= ilength;
		v[2] *= ilength;
	}

	return length;
}
sqrt(length) == 0.0f if and only if length == 0.0f. Thus, the sqrt should be moved inside the if statement, where it only runs for non-zero vectors.
Additionally, gcc is unable to recognize that it may use reciprocal-sqrt, when available, to calculate ilength. As such, it's more efficient to do this:
	ilength = 1/sqrtf(length);
	length *= ilength;
rather than this:
	length = sqrt(length);
	ilength = 1/length;
since the first consists of a reciprocal-sqrt and a multiply, while the second is a sqrt followed by a reciprocal or divide.
Also note that sqrtf should be used instead of sqrt. Otherwise gcc wants to convert to and from double precision, which is silly. sqrtf isn't available to QVMs though, so the patch uses the cast (float)sqrt(length), which generates identical code to sqrtf.
Attached is the final patch.
Created attachment 2714
c.c
If you're worried about whether floating-point rounding errors cause sqrt(x) == 0.0 but x != 0.0, try to make this program assert().