My list of Irrlicht optimizations.

Post by **hybrid** » Fri Mar 11, 2005 9:24 am

OK, I kept searching for places where calls to standard library functions can replace simple code that the compiler cannot optimize on its own.

I already mentioned the loops that reset arrays:

Code:

void CZBuffer::clear()
{
        TZBufferType* p = Buffer;
        while(p != BufferEnd)
        {
                *p = 0; ++p;
        }
}

replaced by

Code:

void CImage::fill(s16 color)
{
(...)
        s32* bufferEnd = p + ((Size.Width * Size.Height)>>1);
        s32 c = ((color & 0x0000ffff)<<16) | (color & 0x0000ffff);
        while(p != bufferEnd)
        {
                *p = c;
                ++p;
        }

void CZBuffer::clear()
{
        memset(Buffer, 0, (BufferEnd-Buffer)*sizeof(TZBufferType));
        return;
}
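
To illustrate the memset replacement in isolation, here is a minimal sketch. It assumes a 16-bit depth type; `DepthValue`, `clearLoop`, `clearMemset` and `clearsMatch` are hypothetical names for illustration, not part of Irrlicht. Note that memset writes bytes, so this trick is only valid because the fill value is 0.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for TZBufferType; the real type is defined by Irrlicht.
typedef unsigned short DepthValue;

// Original approach: reset the buffer element by element.
void clearLoop(DepthValue* begin, DepthValue* end)
{
    for (DepthValue* p = begin; p != end; ++p)
        *p = 0;
}

// Replacement: a single memset over the whole range.
// Only valid because the fill value is 0 (memset works on bytes).
void clearMemset(DepthValue* begin, DepthValue* end)
{
    std::memset(begin, 0,
                static_cast<std::size_t>(end - begin) * sizeof(DepthValue));
}

// Checks that both variants leave identical buffers behind.
bool clearsMatch(std::size_t count)
{
    std::vector<DepthValue> a(count, 0xABCD), b(count, 0xABCD);
    if (a.empty())
        return true; // nothing to clear
    clearLoop(a.data(), a.data() + a.size());
    clearMemset(b.data(), b.data() + b.size());
    return a == b;
}
```

Calling `clearsMatch(64)` should return true, since both functions zero every element.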

A little harder, but possible:

Code:

        s32* bufferEnd = p + ((Size.Width * Size.Height)>>1);
        s32 c = ((color & 0x0000ffff)<<16) | (color & 0x0000ffff);
        while(p != bufferEnd)
        {
                *p = c;
                ++p;
        }

which is already optimized (it writes one 32-bit integer instead of two 16-bit ones). But you can use wmemset in a similar way to the above:

Code:

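        // Pack the 16-bit color twice into one 32-bit value (mask before shifting, as in the original loop)
        s32 c = ((color & 0x0000ffff)<<16) | (color & 0x0000ffff);
        if (sizeof(wchar_t)==(sizeof(s16) << 1))
                // 4-byte wchar_t: each element covers two pixels
                wmemset((wchar_t*)Data, c, (Size.Width * Size.Height)>>1);
        else if (sizeof(wchar_t)==sizeof(s16))
                // 2-byte wchar_t: each element covers one pixel
                wmemset((wchar_t*)Data, color, Size.Width * Size.Height);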

As you can see, I am also using the 32-bit value, and additionally the compiler intrinsics to efficiently write that value to memory. Unfortunately I do not know whether wchar_t is always 4 bytes, so I also left a 2-byte variant in, plus (not shown here) the original loop for all other cases. Could anybody give me a hint on this?
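
On the wchar_t question: its size is implementation-defined (commonly 2 bytes on Windows and 4 bytes on Linux), so the size checks above are needed. As a possible alternative that does not depend on wchar_t at all, here is a minimal sketch using std::fill_n from the standard library. `fillPixels` and `packTwoPixels` are hypothetical names, and whether this matches wmemset in speed would have to be measured with the compiler in use.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fill a 16-bit pixel buffer with one color.
// No assumption about sizeof(wchar_t) is needed here.
void fillPixels(std::int16_t* data, std::size_t pixelCount, std::int16_t color)
{
    std::fill_n(data, pixelCount, color);
}

// Pack one 16-bit color twice into a 32-bit word, like the
// hand-optimized loop does. Going through an unsigned type
// avoids shifting a negative value.
std::uint32_t packTwoPixels(std::int16_t color)
{
    const std::uint32_t c = static_cast<std::uint16_t>(color);
    return (c << 16) | c;
}
```

A call such as `fillPixels(buffer, width * height, color)` would replace both wmemset branches and the fallback loop.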

@Pr3t3nd3r: Did you use any special compiler flags when testing your sqrt optimization? -ffast-math might do a better job than your hard-coded approximation. Note that even after loop unrolling your code contains lots of float multiplications, and it will be slower than the assembly code macros, which won't be used anymore if you combine your "optimized" code with compiler optimizations.

hypot(x, y) might do a better job as a replacement for some sqrt calls:

Code:

f64 getLength() const { return sqrt(X*X + Y*Y); }

replaced with

Code:

f64 getLength() const { return hypot(X, Y); }

This will either still use intrinsics (and probably be even more efficient), or it will give higher precision without wasting more cycles. Either way, a good thing.
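
One concrete difference between the two forms: hypot is specified to avoid overflow and underflow in the intermediate result, while X*X + Y*Y can overflow before sqrt is applied. A minimal sketch follows; `lengthNaive` and `lengthHypot` are hypothetical names, and the relative speed of the two is not shown here and would have to be measured.

```cpp
#include <cmath>

// Straightforward form: the squares can overflow for very large inputs.
double lengthNaive(double x, double y)
{
    return std::sqrt(x * x + y * y);
}

// hypot computes the same length without undue intermediate overflow.
double lengthHypot(double x, double y)
{
    return std::hypot(x, y);
}
```

For ordinary inputs such as (3, 4) both return 5. For inputs around 1e200 the squared terms exceed the range of a double, so the naive form typically yields infinity with IEEE 754 doubles, while hypot still returns a finite value near 1.414e200.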

Some optimizations for color bit shifting can be found in my Linux patch, submitted in the following thread
http://irrlicht.sourceforge.net/phpBB2/ ... php?t=5734
and applied to SColor.h (it also applies on Windows; it is just named Linux patch because the major contribution is for X servers).

Note that virtually any compiler can and will evaluate constant expressions during compilation, so there is no need to do it on your own. All shifts applied to constants are resolved at compile time, as are the calculations proposed in this thread (don't divide PI on your own; the compiler will do a better job, and you still know where the number came from. The same goes for disanti and his first proposal, which was even worse due to #define usage).
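
A minimal sketch of the constant folding point. It uses constexpr and static_assert (C++11) only to make the compile-time evaluation checkable; a plain const with an optimizing compiler behaves the same in practice. All names here (`PI`, `HALF_PI`, `DEG_TO_RAD`, `RED_MASK`, `degToRad`) are hypothetical and not taken from Irrlicht.

```cpp
// Written-out expressions: the compiler evaluates them at compile time,
// and the reader still sees where each number comes from.
constexpr double PI = 3.14159265358979323846;
constexpr double HALF_PI = PI / 2.0;
constexpr double DEG_TO_RAD = PI / 180.0;

// A shift applied to a constant is also resolved at compile time.
constexpr unsigned RED_MASK = 0x1Fu << 10;

// These checks run during compilation, not at run time.
static_assert(RED_MASK == 0x7C00u, "constant shift folded at compile time");
static_assert(HALF_PI > 1.5707 && HALF_PI < 1.5708, "PI / 2 folded at compile time");

// At run time only a single multiplication remains.
inline double degToRad(double degrees)
{
    return degrees * DEG_TO_RAD;
}
```

Writing `PI / 180.0` instead of a pre-divided literal costs nothing at run time and keeps the origin of the value visible.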

Anyway, after applying all these patches, triangle handling becomes the major bottleneck, so optimizations now have to be applied to the more complex algorithms.

Post by **hybrid** » Fri Mar 11, 2005 9:26 am

Sorry, the optimized code for the second example moved into the first example. Cannot edit it, but it should be clear which goes where...
