Optimisation

lincsimp
Posts: 21
Joined: Sat Mar 12, 2005 10:11 pm

Post by lincsimp

Hey
I know that premature optimisation is the root of all evil, etc., etc., but changing:

for (s32 i=0; i<16; ++i)
    M[i] = 0.0f;
M[0] = M[5] = M[10] = M[15] = 1;

in matrix4::makeIdentity to:

M[0]  = 1; M[1]  = 0; M[2]  = 0; M[3]  = 0;
M[4]  = 0; M[5]  = 1; M[6]  = 0; M[7]  = 0;
M[8]  = 0; M[9]  = 0; M[10] = 1; M[11] = 0;
M[12] = 0; M[13] = 0; M[14] = 0; M[15] = 1;

reduces the time taken by the function by ~85% (from 30.39 s to 4.09 s over 1,000,000 calls)

which may help...
esaptonor
Posts: 145
Joined: Sat May 06, 2006 11:59 pm

Post by esaptonor

I know this post is from a long time ago, but I have Irrlicht version 1.0 and it hasn't been changed. Is it worth changing, or does that method not get called often enough to merit it?
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany

Post by hybrid

Simply tell your compiler to unroll loops and you'll get the same thing for free. No need to mess around with the code.
RapchikProgrammer
Posts: 279
Joined: Fri Dec 24, 2004 6:37 pm

Post by RapchikProgrammer

I think his code should be at least a little better, because elements 0, 5, 10 and 15 are written twice: first to 0 and then to 1! And in my opinion even the smallest change here would be really useful, because I think the world, projection and view matrices are set to the identity matrix every time a frame is rendered!
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany

Post by hybrid

The latest matrix4 code is much better: a memset clears the complete data and then only 4 floats are set. This should give a much bigger improvement.
CuteAlien
Admin
Posts: 9716
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany

Post by CuteAlien

hybrid wrote: The latest matrix4 code is much better: a memset clears the complete data and then only 4 floats are set. This should give a much bigger improvement.
Are you sure memset is really faster? I did some tests with gcc, profiling memset against loops. Memset was faster when compiling without optimization; with -O2 both had the same speed, and with -O3 the loop was faster (twice the speed of memset). So it seems to depend on how you compile the application. In games -O3 is often (though not always) useful, so memset seems to be the worse choice here.
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany

Post by hybrid

Did you use -mtune=i686 -msse2 (or whatever your CPU supports)? Memset uses intrinsics that are highly optimized to emit the best machine code. However, for only 16 floats (64 bytes) it might not always be better, since both versions may end up compiled to optimal code.
Also, did you try other numbers in the memset call, and which additional overhead or cache strategy did you target?
CuteAlien
Admin
Posts: 9716
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany

Post by CuteAlien

I just compiled without optimization, with -O2 and with -O3. I have now tried optimizing for i686, but it doesn't seem to make a difference. Maybe the test isn't that good, as I'm just calling it a million times in a loop and measuring that time (I use the values after each step, so the compiler can't optimize the loop away). It isn't a very realistic test, since such an isolated loop may be easier for the compiler to optimize than the same code with more code around it.

But I did some more reading about it around the web, and well... it seems people can't agree on which version is faster ;-). I guess I'll stay with loops until I get a faster result the other way :-)