At least on x86 you'll get greater precision using temporary f64 values. (The arithmetic operations take the same time; the only instructions that change are the loads and stores between registers and cache. Of course mobile may be another story: I don't know which instructions Android phones etc. have.)
Also you can skip one multiplication by 2 (which is an extra instruction even if the compiler decides to optimize it).
Code:
const f64 scale = f64(sqrtf(diag)); // get scale from diagonal
const f64 scaleinv = 0.5 / scale; // inverse of scale (already halved) for speedup
X = f32( (f64(m[6]) - f64(m[9])) * scaleinv );
Y = f32( (f64(m[8]) - f64(m[2])) * scaleinv );
Z = f32( (f64(m[1]) - f64(m[4])) * scaleinv );
W = f32( 0.50 * scale );
to profile against:
Code:
const f64 scaleinv = 0.5 / f64(sqrtf(diag));
X = f32( (f64(m[6]) - f64(m[9])) * scaleinv );
Y = f32( (f64(m[8]) - f64(m[2])) * scaleinv );
Z = f32( (f64(m[1]) - f64(m[4])) * scaleinv );
W = f32( 0.25 / scaleinv );
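To make the two snippets easier to compare side by side, here is a rough sketch that wraps each one in a function. The `f32`/`f64` typedefs are local stand-ins (assumed to match Irrlicht's float/double), and `QuatSketch`, `fromTraceTwoTemps` and `fromTraceOneTemp` are hypothetical names made up for this post, not Irrlicht API. `diag` is whatever value the surrounding code already computed.

```cpp
#include <cmath>

// Local stand-ins for Irrlicht's f32/f64 typedefs (assumed float/double).
typedef float  f32;
typedef double f64;

// Hypothetical plain struct, not Irrlicht's quaternion class.
struct QuatSketch { f32 X, Y, Z, W; };

// First snippet: keeps both the scale and its halved inverse.
inline QuatSketch fromTraceTwoTemps(const f32* m, f32 diag)
{
    // std::sqrt on a float argument is the same float square root as sqrtf.
    const f64 scale = f64(std::sqrt(diag));
    const f64 scaleinv = 0.5 / scale;
    QuatSketch q;
    q.X = f32((f64(m[6]) - f64(m[9])) * scaleinv);
    q.Y = f32((f64(m[8]) - f64(m[2])) * scaleinv);
    q.Z = f32((f64(m[1]) - f64(m[4])) * scaleinv);
    q.W = f32(0.5 * scale);
    return q;
}

// Second snippet: one temporary, W is recovered with a division.
inline QuatSketch fromTraceOneTemp(const f32* m, f32 diag)
{
    const f64 scaleinv = 0.5 / f64(std::sqrt(diag));
    QuatSketch q;
    q.X = f32((f64(m[6]) - f64(m[9])) * scaleinv);
    q.Y = f32((f64(m[8]) - f64(m[2])) * scaleinv);
    q.Z = f32((f64(m[1]) - f64(m[4])) * scaleinv);
    q.W = f32(0.25 / scaleinv);
    return q;
}
```

Both functions should give the same quaternion up to rounding, so any difference you measure between them is down to the division versus the multiplication for W.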
The compiler can't do this kind of optimization on its own, because rearranging floating-point math changes the rounding of the results and could break a ton of code.
At some point Irrlicht decided to switch the project optimization level, which at least for GCC means going from x87 extended-precision registers to SSE (XMM) registers (which work at plain 32- and 64-bit precision even when the code is not vectorized). That could potentially have reduced precision for all users in all places where temporary f32 values were used (because a float temporary held in an x87 register has 80 bits O_O).
Someone should take the actual Irrlicht code and all the snippets in this thread, and profile both performance and precision.
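If you want to check which behaviour your own build gets, one option (just a sketch, I have not tried it against the Irrlicht project settings) is the standard `FLT_EVAL_METHOD` macro from `<cfloat>`. The function name `describeFloatEvalMethod` is made up for this example.

```cpp
#include <cfloat>

// Describes how the compiler evaluates floating-point intermediates,
// based on the standard FLT_EVAL_METHOD macro (available since C++11).
inline const char* describeFloatEvalMethod()
{
    switch (FLT_EVAL_METHOD)
    {
        case 0:  return "each type evaluated at its own precision (typical for SSE math)";
        case 1:  return "float and double both evaluated as double";
        case 2:  return "everything evaluated as long double (typical for x87 math)";
        default: return "indeterminate or implementation-defined";
    }
}
```

Printing the returned string from a build with the same compiler flags as the engine should tell you whether f32 temporaries are getting extra precision or not.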
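For the performance half, a very small timing helper along these lines might be enough to get started. It is only a sketch: `averageNanoseconds` is a hypothetical name, it assumes the callable takes the loop index and returns a float, and a serious benchmark would also need warm-up runs and realistic input matrices.

```cpp
#include <chrono>

// Hypothetical helper: average wall-clock nanoseconds per call of fn(i).
template <typename Fn>
double averageNanoseconds(Fn fn, int iterations)
{
    // The volatile sink keeps the optimizer from discarding the calls.
    volatile float sink = 0.0f;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = sink + fn(i);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

You would pass it a lambda that runs one of the snippets on a test matrix and returns, say, the resulting W, then compare the averages for each variant.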