Take advantage of your hardware

full.metal.coder · Post by **full.metal.coder** » Tue Aug 26, 2008 12:16 pm

Recent CPUs have extended instruction sets that can dramatically improve performance, especially in applications that make heavy use of floating-point math.

If your CPU supports MMX and/or SSE, you can get the full benefit of your hardware by adding a couple of flags to your compiler command line.

Important notes:
The following instructions only work with GCC (and its ports). Depending on which version of GCC you are using, some instruction sets may or may not be supported (ssse3, sse4 and sse5 are only available since GCC 4.3.0).

Make sure your CPU supports these instruction sets (refer to online docs or, under Linux, read the output of cat /proc/cpuinfo).

Binaries compiled with these instruction sets will CRASH (typically with an illegal-instruction error) if you attempt to run them on a CPU that does not support them.

This should be the LAST optimization to turn on. Always start by optimizing C++ code with your brains and a profiler BEFORE playing with compiler optimizations.
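For the CPU-support check mentioned in the notes above, here is a minimal sketch of doing it from C++ at run time instead of reading /proc/cpuinfo. It assumes a much newer GCC than the ones discussed in this thread (the __builtin_cpu_supports builtin only appeared in GCC 4.8) and an x86 target; the function name runtime_cpu_features is made up for this example.

```cpp
#include <string>

// True only where the GCC-style CPU-detection builtins can be expected to exist:
// x86 targets built with clang or with GCC 4.8 and later.
#if (defined(__i386__) || defined(__x86_64__)) && \
    (defined(__clang__) || \
     (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))))
#define HAVE_CPU_BUILTINS 1
#else
#define HAVE_CPU_BUILTINS 0
#endif

// Hypothetical helper: lists the instruction sets the *running* CPU reports.
// On other compilers or architectures it simply returns an empty string.
std::string runtime_cpu_features() {
    std::string out;
#if HAVE_CPU_BUILTINS
    __builtin_cpu_init();  // sets up the detection data; harmless to call repeatedly
    // The argument has to be a string literal, hence one line per feature.
    if (__builtin_cpu_supports("mmx"))    out += "mmx ";
    if (__builtin_cpu_supports("sse"))    out += "sse ";
    if (__builtin_cpu_supports("sse2"))   out += "sse2 ";
    if (__builtin_cpu_supports("sse3"))   out += "sse3 ";
    if (__builtin_cpu_supports("ssse3"))  out += "ssse3 ";
    if (__builtin_cpu_supports("sse4.1")) out += "sse4.1 ";
    if (__builtin_cpu_supports("sse4.2")) out += "sse4.2 ";
#endif
    return out;
}
```

A launcher could call this once at startup and refuse to start an SSE3 build when "sse3" is missing from the result, rather than crashing somewhere in the middle of the program.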

The trick: add some of the following options to your CFLAGS / CXXFLAGS (depending on your build system):

-mmmx : enables use of MMX instructions
-msse : enables use of SSE instructions
-msse2 : enables use of SSE2 instructions
-msse3 : enables use of SSE3 instructions
-mssse3 : enables use of SSSE3 instructions
-msse4 : enables use of SSE4 instructions
-msse5 : enables use of SSE5 instructions

-mfpmath=sse : use SSE registers and instructions for floating-point math instead of 'normal' floating point (also known as x87); much better than any fast-math switch in terms of both speed and precision
It is highly recommended to use the same value for all files across the whole project (and external libs as well if possible, though not required).

-march=? : ask the compiler to take advantage of a given CPU architecture; refer to the GCC manual for valid values
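To double-check which of these options actually took effect for a given set of flags, one possible sketch is to look at the macros GCC predefines for each enabled instruction set (__MMX__, __SSE__, __SSE2__, ...) and for SSE floating-point math (__SSE_MATH__). The function name below is just for illustration and is not part of any library.

```cpp
#include <string>

// Hypothetical helper: reports which instruction sets the compiler was allowed
// to use for THIS translation unit, based on GCC's predefined macros.
// With no -m flags on a non-x86 target the result is simply empty.
std::string compiled_instruction_sets() {
    std::string out;
#ifdef __MMX__
    out += "mmx ";
#endif
#ifdef __SSE__
    out += "sse ";
#endif
#ifdef __SSE2__
    out += "sse2 ";
#endif
#ifdef __SSE3__
    out += "sse3 ";
#endif
#ifdef __SSSE3__
    out += "ssse3 ";
#endif
#ifdef __SSE4_1__
    out += "sse4.1 ";
#endif
#ifdef __SSE4_2__
    out += "sse4.2 ";
#endif
#ifdef __SSE_MATH__
    // Defined when single-precision math goes through SSE rather than x87.
    out += "fpmath=sse ";
#endif
    return out;
}
```

Printing the result from a file built with the project's real CXXFLAGS is a quick way to confirm that every part of the build uses the same settings.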

Another important trick: do NOT use -O3 if you are using GCC 4.x. Even the manual confirms that the speed gain is negligible in most cases, while the size increase and the induced instability might not be...

A few examples :

Irrlicht, src/Irrlicht/Makefile :

Code:

ifndef NDEBUG
CXXFLAGS += -g -D_DEBUG
else
CXXFLAGS += -fexpensive-optimizations -O2 -march=prescott -mmmx -msse -msse2 -msse3 -mfpmath=sse
endif
ifdef PROFILE
CXXFLAGS += -pg
endif
CFLAGS := -fexpensive-optimizations -O2 -DPNG_THREAD_UNSAFE_OK -DPNG_NO_MNG_FEATURES -march=prescott -mmmx -msse -msse2 -msse3 -mfpmath=sse

Bullet, Jamrules :

Code:

COMPILER.CFLAGS.optimize += -O2 -fomit-frame-pointer -march=prescott -mmmx -msse -msse2 -msse3 -mfpmath=sse ;

Enjoy the speed improvement


BlindSide · Post by **BlindSide** » Thu Aug 28, 2008 9:12 am

full.metal.coder wrote:-mfpmath=see

typo, should be sse.

torleif · Post by **torleif** » Thu Aug 28, 2008 10:24 am

Looks like the art of optimizing using ints is gone.

Personally I preferred it... One of the great CompSci stories is about a programmer working on the Mac who implemented rounded corners with no floating-point numbers. We'd still have '95-style square corners if it were not for him.

full.metal.coder · Post by **full.metal.coder** » Fri Aug 29, 2008 7:45 am

@BlindSide: indeed. Fortunately this board allows editing even after several days.

@torleif: of course not. Integer arithmetic remains a lot faster than floating point, even with MMX/SSE, but there are a number of places where integers are just not an option... These tricks are not design/algorithmic optimizations but assembly-level optimizations, which can even speed up integer math, by the way, since they give access to a dozen or so new registers, and because the very nature of SSE (Streaming SIMD (Single Instruction, Multiple Data) Extensions) enables some very interesting optimizations.
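To make the "Single Instruction, Multiple Data" part concrete, here is a rough sketch of adding two float arrays four elements at a time. It assumes the standard SSE intrinsics header <xmmintrin.h> that ships with GCC; add_arrays is a name invented for this example, and the code falls back to a plain loop when SSE is not enabled at compile time.

```cpp
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics (_mm_add_ps and friends)
#endif

// Hypothetical helper: dst[i] = a[i] + b[i] for n floats.
void add_arrays(const float* a, const float* b, float* dst, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE__)
    // One SSE instruction adds four packed floats at once.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar loop: handles the leftover tail, or everything when SSE is off.
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

The unaligned load/store variants are used here so the sketch works on any pointer; code that can guarantee 16-byte alignment could use the aligned versions instead.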

And if you want to keep the speed of integers while still using fractional values, you can write your own FIXED-POINT math lib, which relies on integers but simply interprets them differently. For instance, the bits of a 32-bit integer are treated as covering the range [2^16, 2^-15] instead of [2^31, 2^0]. Actually, you could even create floats with a couple of tricks, like storing the position of the point and changing it during calculations according to the value of the result. You can add and subtract fixed-point numbers as plain integers, and multiply them with an integer multiply followed by a shift to rescale the result (division may be a bit trickier, however), and I suppose that's how that Mac coder got proper rounded corners.
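A minimal sketch of such a fixed-point lib, to show where the rescaling happens. It assumes the common 16.16 layout (16 integer bits, 16 fractional bits) rather than the exact split mentioned above, does no overflow checking, and every name in it is invented for the example.

```cpp
#include <cstdint>

// Assumption: 16.16 layout, stored in a plain 32-bit signed integer.
typedef std::int32_t fixed_t;
const std::int64_t FIXED_ONE = 65536;  // 2^16, the fixed-point encoding of 1.0

fixed_t fixed_from_int(int v)       { return static_cast<fixed_t>(v * FIXED_ONE); }
fixed_t fixed_from_double(double v) { return static_cast<fixed_t>(v * FIXED_ONE); }
double  fixed_to_double(fixed_t v)  { return static_cast<double>(v) / FIXED_ONE; }

// Addition and subtraction are plain integer operations.
fixed_t fixed_add(fixed_t a, fixed_t b) { return a + b; }
fixed_t fixed_sub(fixed_t a, fixed_t b) { return a - b; }

// Multiplication: the raw product carries 32 fractional bits, so widen to
// 64 bits and divide by 2^16 to get back to 16 fractional bits.
fixed_t fixed_mul(fixed_t a, fixed_t b) {
    return static_cast<fixed_t>((static_cast<std::int64_t>(a) * b) / FIXED_ONE);
}

// Division: pre-scale the dividend by 2^16 so the quotient keeps its
// fractional bits. The caller must make sure b is not zero.
fixed_t fixed_div(fixed_t a, fixed_t b) {
    return static_cast<fixed_t>((static_cast<std::int64_t>(a) * FIXED_ONE) / b);
}
```

For example, fixed_mul(fixed_from_int(3), fixed_from_double(0.5)) gives the same bits as fixed_from_double(1.5), without a single floating-point operation in the multiply itself.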

Irrlicht Engine
