Optimization and speed-ups
I asked in a previous thread about speed-ups and such but got no reply.
I was looking at some of the simple code, like skybox.cpp, and can see that some things can be done. For example, it reads:
for (i = 0; i < 6; ++i)
{
driver->setMaterial(Material[i]);
driver->drawIndexedTriangleList(&Vertices[i*4], 4, Indices, 2);
}
Now, I know that when this is converted to asm it costs time to check whether i has reached 6:
it does a test on each iteration and then increments, and this all takes clock ticks.
Whereas loop unrolling can speed things up. As we know it's only 6 iterations, we can put:
driver->setMaterial(Material[0]);
driver->drawIndexedTriangleList(&Vertices[0*4], 4, Indices, 2);
driver->setMaterial(Material[1]);
driver->drawIndexedTriangleList(&Vertices[1*4], 4, Indices, 2);
driver->setMaterial(Material[2]);
driver->drawIndexedTriangleList(&Vertices[2*4], 4, Indices, 2);
driver->setMaterial(Material[3]);
driver->drawIndexedTriangleList(&Vertices[3*4], 4, Indices, 2);
driver->setMaterial(Material[4]);
driver->drawIndexedTriangleList(&Vertices[4*4], 4, Indices, 2);
driver->setMaterial(Material[5]);
driver->drawIndexedTriangleList(&Vertices[5*4], 4, Indices, 2);
No checking or incrementing of "i", so it just saves clock ticks.
So can we start a thread of optimizations that can be added to the next release of Irrlicht to make it better and faster? Because this engine is great, it's like click and play.
I know it's not a lot saved, but every little helps.
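To make the comparison concrete, here is a minimal sketch of the two versions side by side. MockDriver and drawFace are hypothetical stand-ins (not Irrlicht's API) that only record the calls they receive, so the sketch shows that the unrolled code issues the same calls in the same order. It does not measure speed, and an optimizing compiler may well unroll a fixed six-pass loop on its own, so any gain would need to be timed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the video driver: it only records calls.
struct MockDriver {
    std::vector<std::string> log;
    void setMaterial(int material) { log.push_back("mat " + std::to_string(material)); }
    void drawFace(int firstVertex) { log.push_back("draw " + std::to_string(firstVertex)); }
};

// Rolled version: the counter is tested and incremented on every pass.
std::vector<std::string> renderLooped() {
    MockDriver d;
    for (int i = 0; i < 6; ++i) {
        d.setMaterial(i);
        d.drawFace(i * 4);
    }
    return d.log;
}

// Hand-unrolled version: same calls, no loop counter in the source.
std::vector<std::string> renderUnrolled() {
    MockDriver d;
    d.setMaterial(0); d.drawFace(0 * 4);
    d.setMaterial(1); d.drawFace(1 * 4);
    d.setMaterial(2); d.drawFace(2 * 4);
    d.setMaterial(3); d.drawFace(3 * 4);
    d.setMaterial(4); d.drawFace(4 * 4);
    d.setMaterial(5); d.drawFace(5 * 4);
    return d.log;
}

// Quick check that both versions issue identical calls.
void checkEquivalent() {
    assert(renderLooped() == renderUnrolled());
}
```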
It might be that everyone is waiting for the engine to stabilize, and waiting to see what niko will put in 0.8, before too much optimization takes place.
Just a thought...
But good ideas here...
I bet there are a lot of small things that could be done.
Although, I have only seen a few cases in the forum where a poster's code was actually commented on by niko (usually a bug fix).
ross, there are many optimizations you can apply, especially with the VS .NET compiler.
* You can compile with support for enhanced instruction sets ( /arch:SSE or /arch:SSE2 ); I only recommend SSE, not SSE2.
* As you've said, you can set "Optimization" to /O2 to maximize the speed of the compiled code.
* You can set "Favor Size or Speed" to /Ot to favor fast code.
* You can set "References" to /OPT:REF to eliminate unreferenced functions from the code.
There are also programmatic ways to optimize the code.
Instead of using the default trigonometry functions, you can use lookup tables.
* You can set the FPU control word; here's the fastest setup:
Code:
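#include <float.h> // MSVC: declares _control87 and the _PC_/_RC_/_DN_ constants
// set FPU precision to 24 bits (single precision)
_control87 ( _PC_24, _MCW_PC );
// set rounding control to chop (truncate toward zero)
_control87 ( _RC_CHOP, _MCW_RC );
// set denormal control to flush (denormals become zero)
_control87 ( _DN_FLUSH, _MCW_DN );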
Pentium processors work best with 16/32 bit alignment, so you should try to set your data alignment to that.
Inline as many short functions as you can.
Declare frequently used variables as class members, to avoid allocating the data anew every time it's used.
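As a rough illustration of the class-member point, here is a minimal sketch. The Smoother class and its scratch_ buffer are hypothetical names made up for this example. The buffer lives in the object, so after the first call it already has storage and later calls of the same size do not allocate again.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example class: averages each value with its right neighbour.
class Smoother {
public:
    const std::vector<float>& smooth(const std::vector<float>& in) {
        scratch_.clear();  // drops the elements but keeps the allocated storage
        for (std::size_t i = 0; i + 1 < in.size(); ++i)
            scratch_.push_back(0.5f * (in[i] + in[i + 1]));
        return scratch_;
    }
    std::size_t capacity() const { return scratch_.capacity(); }

private:
    std::vector<float> scratch_;  // allocated once, reused on every call
};

// Shows that a second call of the same size reuses the existing storage.
void demoReuse() {
    Smoother s;
    s.smooth(std::vector<float>{1.0f, 3.0f, 5.0f});
    const std::size_t before = s.capacity();
    s.smooth(std::vector<float>{2.0f, 4.0f, 6.0f});
    assert(s.capacity() == before);
}
```

Had the buffer been a local variable inside smooth(), every call would have had to allocate it from scratch.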
The list goes on, but I believe the main priority is to get the functionality in, then to worry about the efficiency.
Well, wouldn't it be more compatible to just optimise the code yourself? Setting too many switches in your compiler isn't good, because code hangs and programs crash; I've had this a few times. But as I said, if we all chip in and do it, we all get a better engine without tweaking any compilers. Even people who don't tinker with the compiler will still get a faster engine.
Oh, and about the lookup table: OMG, that brings back some memories of
raycasting. "Hey, shall we use the math of the 486 or shall we use sin and cos in a lookup table?" Hahaha, thanks for that memory, m8.
Anyway, thank you all for your replies to this thread.
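For anyone who missed the raycasting days, the sin lookup table mentioned above can be sketched roughly like this. The names SinTable and kTableSize and the 1024-entry resolution are assumptions picked for illustration. It returns the nearest precomputed entry, so it trades a little accuracy for skipping the trig call, and whether it is actually faster on a given CPU has to be measured.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

const std::size_t kTableSize = 1024;  // hypothetical resolution: entries per full turn

class SinTable {
public:
    SinTable() {
        const double twoPi = 6.283185307179586;
        for (std::size_t i = 0; i < kTableSize; ++i)
            table_[i] = static_cast<float>(std::sin(twoPi * i / kTableSize));
    }

    // Nearest-entry lookup for an angle in radians (no interpolation).
    float lookup(float radians) const {
        const float scale = kTableSize / 6.2831853f;  // table entries per radian
        const long n = static_cast<long>(kTableSize);
        long idx = std::lround(radians * scale);
        idx = ((idx % n) + n) % n;                    // wrap, also for negative angles
        assert(idx >= 0 && idx < n);
        return table_[idx];
    }

private:
    float table_[kTableSize];
};
```

With 1024 entries the result is off by at most about 0.003 from the real sine; a bigger table or linear interpolation between entries would tighten that.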
All this criticism of optimization! I agree that features are more important, but features mean nothing if they don't run at decent speeds. It is too early for excessive optimization, but things such as loop unrolling in critical areas can make a slight difference and cause no problems. Ross, you could try posting optimizations like this on nx++; they're more likely to get included there than by niko.
You do a lot of programming? Really? I try to get some in, but the debugging keeps me pretty busy.
Crucible of Stars