The loops which reset some arrays have already been mentioned:
Code:
void CZBuffer::clear()
{
    TZBufferType* p = Buffer;
    while(p != BufferEnd)
    {
        *p = 0;
        ++p;
    }
}
Code:
void CImage::fill(s16 color)
{
    (...)
    s32* bufferEnd = p + ((Size.Width * Size.Height)>>1);
    s32 c = ((color & 0x0000ffff)<<16) | (color & 0x0000ffff);
    while(p != bufferEnd)
    {
        *p = c;
        ++p;
    }
}
The first of these loops can be replaced with memset:
Code:
void CZBuffer::clear()
{
    // Zero the whole buffer in one call; memset needs <cstring> (or <string.h>).
    memset(Buffer, 0, (BufferEnd-Buffer)*sizeof(TZBufferType));
}
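To check that the two versions really do the same thing, here is a small standalone sketch. The TZBufferType typedef and the free functions clearLoop/clearMemset are stand-ins I made up for testing outside the engine, they are not part of Irrlicht.

```cpp
#include <cstring>   // std::memset
#include <cstddef>   // std::size_t

// Hypothetical stand-in for the engine's TZBufferType.
typedef unsigned short TZBufferType;

// Loop version, equivalent to the original CZBuffer::clear().
void clearLoop(TZBufferType* begin, TZBufferType* end)
{
    for (TZBufferType* p = begin; p != end; ++p)
        *p = 0;
}

// memset version: byte count = element count * element size.
void clearMemset(TZBufferType* begin, TZBufferType* end)
{
    std::memset(begin, 0,
                static_cast<std::size_t>(end - begin) * sizeof(TZBufferType));
}
```

Keep in mind that memset writes a single byte value, so this only works for clearing to 0 (or to a value whose bytes are all identical).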
Likewise, the fill loop in CImage::fill:
Code:
s32* bufferEnd = p + ((Size.Width * Size.Height)>>1);
s32 c = ((color & 0x0000ffff)<<16) | (color & 0x0000ffff);
while(p != bufferEnd)
{
    *p = c;
    ++p;
}
This fill loop can be replaced with wmemset in a similar way:
Code:
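// Pack the 16-bit color twice into one 32-bit value (mask first, as in the loop version).
s32 c = ((color & 0x0000ffff)<<16) | (color & 0x0000ffff);
if (sizeof(wchar_t)==(sizeof(s16) << 1))
    // wchar_t is twice as wide as s16: each element covers two pixels.
    wmemset((wchar_t*)Data, c,(Size.Width * Size.Height)>>1);
else if (sizeof(wchar_t)==sizeof(s16))
    // wchar_t is as wide as s16: each element covers one pixel.
    wmemset((wchar_t*)Data, color,Size.Width * Size.Height);
// Any other wchar_t size is not handled here; keep the original loop for that case.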
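Since the wmemset variant depends on sizeof(wchar_t) and on casting the buffer, it may be worth benchmarking a plain std::fill_n against it. This is a different technique from the patch above, and whether it is as fast depends on the compiler. A minimal sketch, where the s16 typedef and the fill16 helper are made-up stand-ins:

```cpp
#include <algorithm> // std::fill_n
#include <cstddef>   // std::size_t

// Hypothetical stand-in for the engine's s16 typedef.
typedef short s16;

// Fills `count` 16-bit pixels with `color`, one pixel at a time,
// so an odd pixel count is covered as well.
void fill16(s16* data, std::size_t count, s16 color)
{
    std::fill_n(data, count, color);
}
```

Note that the versions using (Size.Width * Size.Height)>>1 write two pixels at a time and therefore skip the last pixel when the pixel count is odd.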
@Pr3t3nd3r: Did you use any special compiler flags when testing your sqrt optimization? Maybe -ffast-math would do a better job than your hard-coded approximation. Note that even after loop unrolling your code contains lots of float multiplications, and it will be slower than the assembly code macros, which won't be used anymore if you combine your "optimized" code with compiler optimizations.
hypot(x,y) might do a better job as a replacement for some sqrt calls (it also avoids overflow in the intermediate squares, but the speed should be measured):
Code:
f64 getLength() const { return sqrt(X*X + Y*Y); }
becomes:
Code:
f64 getLength() const { return hypot(X, Y); }
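To compare the two variants side by side, here is a small sketch. The Vec2 struct and the f64 typedef are made up for illustration and are not the engine's vector class; std::hypot in <cmath> assumes a C++11 standard library (older compilers provide hypot from <math.h>).

```cpp
#include <cmath> // std::sqrt, std::hypot

// Hypothetical stand-in for the engine's f64 typedef.
typedef double f64;

// Hypothetical minimal 2D vector, only for comparing both variants.
struct Vec2
{
    f64 X, Y;

    // Naive version: X*X + Y*Y can overflow for very large components.
    f64 getLengthSqrt() const { return std::sqrt(X*X + Y*Y); }

    // hypot version: computes the same length without that intermediate overflow.
    f64 getLengthHypot() const { return std::hypot(X, Y); }
};
```

For ordinary values both return the same length. With components around 1e200 and plain IEEE double arithmetic, the sqrt version overflows to infinity while hypot still returns a finite result. Which one is faster on a given platform is something to benchmark.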
Some optimizations on color bit shifting can be found in my Linux patch, submitted in the following thread
http://irrlicht.sourceforge.net/phpBB2/ ... php?t=5734
and applied to SColor.h (also applicable on Windows; it's just named Linux patch since the major contribution is for X servers).
Note that virtually any compiler can and will evaluate constant expressions at compile time, so there is no need to do it yourself. Thus, all shifts applied to constants are resolved during compilation, as are the calculations proposed in this thread (don't divide PI yourself; the compiler will do a better job, and you still know where the number came from. The same goes for disanti and his first proposal, which was even worse due to #define usage).
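A small sketch of the point about constant expressions. It assumes a C++11 compiler so that constexpr and static_assert can prove the evaluation happens at compile time; older compilers fold the same expressions, they just can't assert it. The names DEGTORAD and packARGB are made up for this example.

```cpp
// Hypothetical stand-in for the engine's f32 typedef.
typedef float f32;

// Write the expression, not the precomputed number: the compiler folds it,
// and the reader still sees where the value comes from.
constexpr f32 PI = 3.14159265359f;
constexpr f32 DEGTORAD = PI / 180.0f;

// Hypothetical helper: shifts applied to constant arguments are resolved
// during compilation as well.
constexpr unsigned int packARGB(unsigned int a, unsigned int r,
                                unsigned int g, unsigned int b)
{
    return (a << 24) | (r << 16) | (g << 8) | b;
}

// Compiles only if the value was computed at compile time.
static_assert(packARGB(0xFF, 0x12, 0x34, 0x56) == 0xFF123456u,
              "constant shifts are folded by the compiler");
```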
Anyway, after applying all these patches the triangle handling becomes the major bottleneck, so further optimizations have to target the more complex algorithms now.