My list of Irrlicht optimizations.

disanti
Posts: 367
Joined: Sat Jan 17, 2004 1:36 am
Location: California, US


Post by disanti »

I was looking at the Irrlicht source code and decided to optimize a few parts. Niko, include them or not; it's your call.

In CVideoOpenGL.cpp (the 1/255 color scale factor can be a precomputed constant),
replace:


//! clears the zbuffer
bool CVideoOpenGL::beginScene(bool backBuffer, bool zBuffer, SColor color)
{
	CVideoNull::beginScene(backBuffer, zBuffer, color);

	GLbitfield mask = 0;

	if (backBuffer)
	{
		f32 inv = 1.0f / 255.0f;
		glClearColor(color.getRed() * inv, color.getGreen() * inv,
				color.getBlue() * inv, color.getAlpha() * inv);
...
with:


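// The scale factor is a compile-time constant, so no division is left at run time.
// The argument is parenthesized so that expressions can be passed safely.
#define RGB_TO_FLOAT(x) ((x) * (1.0f / 255.0f))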

//! clears the zbuffer
bool CVideoOpenGL::beginScene(bool backBuffer, bool zBuffer, SColor color)
{
	CVideoNull::beginScene(backBuffer, zBuffer, color);

	GLbitfield mask = 0;

	if (backBuffer)
	{
		//f32 inv = 1.0f / 255.0f;
		//glClearColor(color.getRed() * inv, color.getGreen() * inv,
		//		color.getBlue() * inv, color.getAlpha() * inv);
		// Note: alpha is fixed at 1.0f here instead of being taken from color.getAlpha().
		glClearColor(RGB_TO_FLOAT(color.getRed()), RGB_TO_FLOAT(color.getGreen()),
				RGB_TO_FLOAT(color.getBlue()), 1.0f);
Keep your eyes peeled, more coming...
Last edited by disanti on Thu Feb 24, 2011 10:28 am, edited 1 time in total.
disanti

Post by disanti »

In CVideoOpenGL.cpp (unneeded for loop):
In draw2DImage(video::ITexture* texture, const core::rect<s32>& destRect, const core::rect<s32>& sourceRect, const core::rect<s32>* clipRect, video::SColor* colors, bool useAlphaChannelOfTexture), replace:


if(colors==NULL) 
   { 
      colors=new SColor[4];
      for(int i=0;i<4;i++) colors[i]=SColor(0,255,255,255);
      bTempColors=true; 
   } 
with:


if(colors==NULL) 
   { 
      colors=new SColor[4];
      //for(int i=0;i<4;i++) colors[i]=SColor(0,255,255,255);
      colors[0]=SColor(0,255,255,255); 
      colors[1]=SColor(0,255,255,255); 
      colors[2]=SColor(0,255,255,255); 
      colors[3]=SColor(0,255,255,255); 
      bTempColors=true; 
   } 
More coming...
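
A different approach, sketched under assumptions: if the four default colors never change, a function-local constant table would avoid both the loop and the new/delete pair. Color is a hypothetical stand-in for SColor and resolveColors is a made-up helper; whether draw2DImage can work with a const pointer here is also an assumption.

```cpp
// Hypothetical stand-in for irr::video::SColor, for illustration only.
struct Color
{
    unsigned char a, r, g, b;
};

// Returns the caller's colors, or a shared table of four default
// colors (alpha 0, RGB 255, as in the snippet above) when none are given.
inline const Color* resolveColors(const Color* colors)
{
    static const Color defaults[4] = {
        {0, 255, 255, 255}, {0, 255, 255, 255},
        {0, 255, 255, 255}, {0, 255, 255, 255}
    };
    return colors ? colors : defaults;
}
```

With this shape there is nothing to free afterwards, so the bTempColors bookkeeping would not be needed.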
Last edited by disanti on Thu Feb 24, 2011 10:28 am, edited 1 time in total.
disanti

New Irrlicht Feature! Ability to disable depth testing.

Post by disanti »

Hello all,

For my game, I got tired of the character's arm sticking through walls, so I made an option for OpenGL to disable depth testing. This should also work well for FPS games that need their guns to stop poking through walls. I added a flag to the SMaterial class; then, in the OpenGL driver's OnSetMaterial, I call glDisable(GL_DEPTH_TEST) if and only if the flag in the material is disabled.

The changes to SMaterial.h.
The new E_MATERIAL_FLAG:


	//! Material flags
	enum E_MATERIAL_FLAG
	{
		//! Draw as wireframe or filled triangles? Default: false
		EMF_WIREFRAME = 0,

		//! Flat or Gouraud shading? Default: true
		EMF_GOURAUD_SHADING,

		//! Will this material be lighted? Default: true
		EMF_LIGHTING,

		//! Is the ZBuffer enabled? Default: true
		EMF_ZBUFFER,

		//! May be written to the zbuffer or is it readonly. Default: true
		//! This flag is ignored, if the material type is a transparent type.
		EMF_ZWRITE_ENABLE,

		//! Is backfaceculling enabled? Default: true
		EMF_BACK_FACE_CULLING,

		//! Is bilinear filtering enabled? Default: true
		EMF_BILINEAR_FILTER,

		//! Is trilinear filtering enabled? Default: false
		//! If the trilinear filter flag is enabled,
		//! the bilinear filtering flag is ignored.
		EMF_TRILINEAR_FILTER,

		//! Is fog enabled? Default: false
		EMF_FOG_ENABLE,

		//! Is depth testing enabled? Default: true
		EMF_DEPTH_TESTING,

		//! This is not a flag, but a value indicating how many flags there are.
		//! It must stay last so that Flags[EMF_MATERIAL_FLAG_COUNT] covers every flag.
		EMF_MATERIAL_FLAG_COUNT
	};
Replace:


		//! material flag union. This enables the user to access the
		//! material flag using e.g: material.Wireframe = true or
		//! material.flag[EMF_WIREFRAME] = true;
		union
		{
			struct
			{
				//! Draw as wireframe or filled triangles? Default: false
				bool Wireframe;

				//! Flat or Gouraud shading? Default: true
				bool GouraudShading;				

				//! Will this material be lighted? Default: true
				bool Lighting;						

				//! Is the ZBuffer enabled? Default: true
				bool ZBuffer;						

				//! May be written to the zbuffer or is it readonly. Default: true
				//! This flag is ignored, if the MaterialType is a transparent type.
				bool ZWriteEnable;					

				//! Is backfaceculling enabled? Default: true
				bool BackfaceCulling;				

				//! Is bilinear filtering enabled? Default: true
				bool BilinearFilter;	

				//! Is trilinear filtering enabled? Default: false
				//! If the trilinear filter flag is enabled,
				//! the bilinear filtering flag is ignored.
				bool TrilinearFilter;

				//! Is fog enabled? Default: false
				bool FogEnable;
			};

			bool Flags[EMF_MATERIAL_FLAG_COUNT];
		};
With:


		//! material flag union. This enables the user to access the
		//! material flag using e.g: material.Wireframe = true or
		//! material.Flags[EMF_WIREFRAME] = true;
		union
		{
			struct
			{
				//! Draw as wireframe or filled triangles? Default: false
				bool Wireframe;

				//! Flat or Gouraud shading? Default: true
				bool GouraudShading;				

				//! Will this material be lighted? Default: true
				bool Lighting;						

				//! Is the ZBuffer enabled? Default: true
				bool ZBuffer;						

				//! May be written to the zbuffer or is it readonly. Default: true
				//! This flag is ignored, if the MaterialType is a transparent type.
				bool ZWriteEnable;					

				//! Is backfaceculling enabled? Default: true
				bool BackfaceCulling;				

				//! Is bilinear filtering enabled? Default: true
				bool BilinearFilter;	

				//! Is trilinear filtering enabled? Default: false
				//! If the trilinear filter flag is enabled,
				//! the bilinear filtering flag is ignored.
				bool TrilinearFilter;

				//! Is fog enabled? Default: false
				bool FogEnable;

				//! Is depth testing enabled? Default: true
				bool DepthTesting;
			};

			bool Flags[EMF_MATERIAL_FLAG_COUNT];
		};
In CVideoOpenGL.cpp, add to the end of the function
void CVideoOpenGL::setBasicRenderStates(const SMaterial& material, const SMaterial& lastmaterial, bool resetAllRenderstates):


// depth testing
	if (resetAllRenderstates || lastmaterial.DepthTesting != material.DepthTesting)
	{
		if (material.DepthTesting)
			glEnable(GL_DEPTH_TEST);
		else
			glDisable(GL_DEPTH_TEST);
	}
I hope this one helps! ;) It will help me a lot... I haven't tested it yet, though; I'll work on that right now.

Edit: OK, it is not working. What am I doing wrong, Niko? I could have sworn I pinpointed the part of the engine that manages this.
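
A minimal sketch of how a count sentinel and a flag array interact, using hypothetical, shortened names (MaterialFlag, Material) rather than the real Irrlicht types. The point it illustrates is only that a new flag has to come before the count value to be a valid index into the flag array; it is not a claim about what else the driver may need.

```cpp
// Hypothetical, shortened version of a flag enum, for illustration only.
enum MaterialFlag
{
    MF_WIREFRAME = 0,
    MF_LIGHTING,
    MF_DEPTH_TESTING, // new flags go before the count
    MF_FLAG_COUNT     // not a flag: the number of flags above
};

struct Material
{
    bool Flags[MF_FLAG_COUNT];

    Material()
    {
        Flags[MF_WIREFRAME] = false;
        Flags[MF_LIGHTING] = true;
        Flags[MF_DEPTH_TESTING] = true;
    }

    void setFlag(MaterialFlag flag, bool value) { Flags[flag] = value; }
    bool getFlag(MaterialFlag flag) const { return Flags[flag]; }
};

// Fails to compile if a flag is ever added after the count value.
static_assert(MF_DEPTH_TESTING < MF_FLAG_COUNT,
              "every flag must be a valid index into Flags");
```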
Last edited by disanti on Thu Feb 24, 2011 10:28 am, edited 1 time in total.
Guest

Post by Guest »

Regarding your optimizations: you can get the same result from the compiler without changing the Irrlicht source code. Look at the compiler's optimization switches.
Pr3t3nd3r
Posts: 186
Joined: Tue Feb 08, 2005 6:02 pm
Location: Romania

Post by Pr3t3nd3r »

- Step 1: adding the engine code directly into my code. Completed.
- Step 2: benchmark
- Duration: 30 minutes
- Frames rendered: 12
- Average frames per minute: 12/30 = 0.4 (faster at the beginning, slower at the end) ... observation: very low fps, 0.0..

Results
1. irr::scene::CSceneManager::drawAll
- Total time: 389.430.194
- Self time: 8.789
- Calls: 26
- 99.3% of the time is spent in ISceneNode::OnPreRender
// My particles are children of some scene nodes, so ISceneNode::OnPreRender calls CParticleSystemSceneNode::OnPreRender.
// In CParticleSystemSceneNode::OnPreRender, doParticleSystem uses 100% of the time,
// and inside doParticleSystem 97.9% of the time is used by erase:
// irr::core::array<irr::scene::SParticle>::erase
// Going into this function:
- Total time: 378.340.796
- Self time: 44.585.694
- Calls: 110.800
- operator= (irr::scene::SParticle) contribution: 88.2%
- Total time: 334.471.897
- Self time: 182.642.351
- Calls: 99.802.384
- operator= (irr::core::vector3d<float>) contribution inside operator= (irr::scene::SParticle): 45.4%
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
Conclusions:
- doParticleSystem was called 201 times
- erase was called 110.800 times
- operator=(SParticle) was called 99.802.384 times
- operator=(vector3d) was called 299.913.267 times
The big difference in call counts between erase and operator=(SParticle) is explained by this comment in the erase code:
//! Erases an element from the array. May be slow, because all elements
//! following after the erased element have to be copied.
I believe it would be much faster if the particle system used a list, so that the remaining particles would not have to be moved when one particle is removed.
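
A minimal sketch of that difference, assuming standard containers as stand-ins: std::vector plays the role of core::array, std::list the role of a linked list, and Particle is a hypothetical replacement for SParticle that only counts its copy-assignments.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical stand-in for irr::scene::SParticle; it only counts
// how often it is copy-assigned.
struct Particle
{
    static long assignCount;
    int endTime;

    explicit Particle(int end = 0) : endTime(end) {}
    Particle(const Particle& other) : endTime(other.endTime) {}
    Particle& operator=(const Particle& other)
    {
        endTime = other.endTime;
        ++assignCount;
        return *this;
    }
};
long Particle::assignCount = 0;

// Removes every particle with endTime < now from a contiguous array,
// one erase at a time, and returns the number of copy-assignments.
inline long eraseDeadFromVector(std::size_t count, int now)
{
    std::vector<Particle> particles;
    for (std::size_t i = 0; i < count; ++i)
        particles.push_back(Particle(static_cast<int>(i % 2))); // every other one is dead

    Particle::assignCount = 0;
    for (std::size_t i = 0; i < particles.size();)
    {
        if (particles[i].endTime < now)
            particles.erase(particles.begin() + static_cast<std::ptrdiff_t>(i)); // shifts the tail down
        else
            ++i;
    }
    return Particle::assignCount;
}

// Same removal on a linked list: erase only unlinks the node.
inline long eraseDeadFromList(std::size_t count, int now)
{
    std::list<Particle> particles;
    for (std::size_t i = 0; i < count; ++i)
        particles.push_back(Particle(static_cast<int>(i % 2)));

    Particle::assignCount = 0;
    for (std::list<Particle>::iterator it = particles.begin(); it != particles.end();)
    {
        if (it->endTime < now)
            it = particles.erase(it);
        else
            ++it;
    }
    return Particle::assignCount;
}
```

The list version performs no element assignments at all when erasing, at the price of one allocation per particle and less cache-friendly iteration, so it would need to be measured in the real particle system.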

+ While I have the occasion to mention it:
- the bug with the orientation of particles when they are children of an object and the parent moves

http://irrlicht.sourceforge.net/phpBB2/ ... icle#11000
http://irrlicht.sourceforge.net/phpBB2/ ... icle#12796
http://irrlicht.sourceforge.net/phpBB2/ ... icle#26646
http://irrlicht.sourceforge.net/phpBB2/ ... php?t=4525
Please fix the orientation so that it is handled automatically.

I believe there should be two types of particles: one where the particles go in the direction of the emitter (particle sword, etc.) and one where every particle has its own direction, independent of the emitter (trails).
Pr3t3nd3r

Post by Pr3t3nd3r »

Step 3
- removing all the particles from my game to do a real benchmark
frames: 2500
fps: 4
time: 10 min



Optimizations, ordered by number of calls
... these don't improve the speed of the call itself much, but they improve the speed of the calling function
///////
vector3d - constructors
vector2d - constructors


vector3d<float>
calls 4.292.631
self time 2917.860
mostly called in S3DVertex (contribution 81.2%)
suggestion: pass the values by reference ... improvement ~2%
operator= (S3DVertex)
suggestion:
.... inline?
where possible, use the body of operator= directly instead of calling the function
position2d<int> ... probably the same as vector3d
SColor.h
getRed
getGreen
getBlue
getAlpha
... must be sped up .. hmm


S3DVertex (empty default constructor of the structure! S3DVertex(){})
... calls 1.742.051
time 7.957.573
... the time is used by the constructors of the vectors and SColor





Functions to be optimized ...
... drawIndexedTriangleList
calls 178.914 .. time 85.774.905
probably most of the time is spent in pID3DDevice->DrawIndexedPrimitiveUP
... setRenderStates3DMode is 10% of the total time

drawIndexedTriangleList is mostly called by drawMeshBuffer (98.7%)
the switch in drawMeshBuffer
could maybe be replaced by something like

if (mb->getVertexType() == video::EVT_STANDARD)
{
	drawIndexedTriangleList((video::S3DVertex*)mb->getVertices(), mb->getVertexCount(), mb->getIndices(), mb->getIndexCount() / 3);
	return;
}
// only valid if there are no other possible vertex types
drawIndexedTriangleList((video::S3DVertex2TCoords*)mb->getVertices(), mb->getVertexCount(), mb->getIndices(), mb->getIndexCount() / 3);


or maybe the vertex type could be checked in CAnimatedMeshSceneNode::render before the for loop, so that a different function is called for each type
Pr3t3nd3r

Post by Pr3t3nd3r »

Step 4
Duration: 60 minutes
updateAbsolutePosition

matrix4.h
inline void matrix4::transformVect( vector3df& vect) const
{
	f32 vector[3];

	vector[0] = vect.X*M[0] + vect.Y*M[4] + vect.Z*M[8] + M[12];
	vector[1] = vect.X*M[1] + vect.Y*M[5] + vect.Z*M[9] + M[13];
	vector[2] = vect.X*M[2] + vect.Y*M[6] + vect.Z*M[10] + M[14];

	vect.X = vector[0];
	vect.Y = vector[1];
	vect.Z = vector[2];
}

with

inline void matrix4::transformVect( vector3df& vect) const
{
	// copy the inputs first: every output component depends on all three
	const f32 x = vect.X, y = vect.Y, z = vect.Z;

	vect.X = x*M[0] + y*M[4] + z*M[8] + M[12];
	vect.Y = x*M[1] + y*M[5] + z*M[9] + M[13];
	vect.Z = x*M[2] + y*M[6] + z*M[10] + M[14];
}
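
A self-contained sketch for checking this kind of rewrite outside the engine. Vec3 and Mat4 are hypothetical minimal types standing in for core::vector3df and core::matrix4, and translated is a made-up helper; only the element layout (translation in M[12] to M[14]) is taken from the code above.

```cpp
// Hypothetical minimal types, for illustration only.
struct Vec3
{
    float X, Y, Z;
};

struct Mat4
{
    float M[16]; // same element layout as above: translation in M[12..14]

    // Transforms vect in place. X, Y and Z are copied first because
    // each output component depends on all three input components.
    void transformVect(Vec3& vect) const
    {
        const float x = vect.X, y = vect.Y, z = vect.Z;
        vect.X = x * M[0] + y * M[4] + z * M[8]  + M[12];
        vect.Y = x * M[1] + y * M[5] + z * M[9]  + M[13];
        vect.Z = x * M[2] + y * M[6] + z * M[10] + M[14];
    }
};

// Helper for trying it out: identity rotation plus a translation.
inline Vec3 translated(Vec3 v, float tx, float ty, float tz)
{
    const Mat4 m = {{1.0f, 0.0f, 0.0f, 0.0f,
                     0.0f, 1.0f, 0.0f, 0.0f,
                     0.0f, 0.0f, 1.0f, 0.0f,
                     tx,   ty,   tz,   1.0f}};
    m.transformVect(v);
    return v;
}
```

A check with different values for Y and Z catches mixed-up components in the third row.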

ISceneNode.h

virtual void updateAbsolutePosition()
{
	if (Parent)
		AbsoluteTransformation =
			Parent->getAbsoluteTransformation() * getRelativeTransformation();
	else
		AbsoluteTransformation = getRelativeTransformation();
}
maybe this is faster:
virtual void updateAbsolutePosition()
{
	if (Parent)
	{
		AbsoluteTransformation =
			Parent->getAbsoluteTransformation() * getRelativeTransformation();
		return;
	}
	AbsoluteTransformation = getRelativeTransformation();
}



IGUIElement.h

//! Updates the absolute position.
virtual void updateAbsolutePosition()
{
	core::rect<s32> parentAbsolute(0,0,0,0); /// static declaration in the class?
	core::rect<s32> parentAbsoluteClip; // same

	if (Parent)
	{
		parentAbsolute = Parent->AbsoluteRect;
		parentAbsoluteClip = Parent->AbsoluteClippingRect;
	}

	AbsoluteRect = RelativeRect + parentAbsolute.UpperLeftCorner;

	if (!Parent)
		parentAbsoluteClip = AbsoluteRect;

	AbsoluteClippingRect = AbsoluteRect;
	AbsoluteClippingRect.clipAgainst(parentAbsoluteClip);

	// update all children
	core::list<IGUIElement*>::Iterator it = Children.begin();
	for (; it != Children.end(); ++it)
		(*it)->updateAbsolutePosition();
}

with

//! Updates the absolute position.
virtual void updateAbsolutePosition()
{
	// parentAbsolute and parentAbsoluteClip: static declarations in the class?
	if (Parent)
	{
		parentAbsolute = Parent->AbsoluteRect;
		parentAbsoluteClip = Parent->AbsoluteClippingRect;
		AbsoluteRect = RelativeRect + parentAbsolute.UpperLeftCorner;
	}
	else
	{
		// no parent: the offset is (0,0), so the absolute rect is the relative rect
		AbsoluteRect = RelativeRect;
		parentAbsoluteClip = AbsoluteRect;
	}

	AbsoluteClippingRect = AbsoluteRect;
	AbsoluteClippingRect.clipAgainst(parentAbsoluteClip);

	// update all children
	core::list<IGUIElement*>::Iterator it = Children.begin();
	for (; it != Children.end(); ++it)
		(*it)->updateAbsolutePosition();
}

matrix4.h
inline void matrix4::setRotationDegrees( const vector3df& rotation )
{
	setRotationRadians( rotation * (f32)3.1415926535897932384626433832795 / 180.0 );
}

inline void matrix4::setInverseRotationDegrees( const vector3df& rotation )
{
	setInverseRotationRadians( rotation * (f32)3.1415926535897932384626433832795 / 180.0 );
}
with
inline void matrix4::setRotationDegrees( const vector3df& rotation )
{
	// pi / 180 as a single precomputed constant
	setRotationRadians( rotation * (f32)0.017453292519943295769236907684883 );
}

inline void matrix4::setInverseRotationDegrees( const vector3df& rotation )
{
	setInverseRotationRadians( rotation * (f32)0.017453292519943295769236907684883 );
}


virtual core::matrix4 getRelativeTransformation() const
{
	core::matrix4 mat;
	mat.setRotationDegrees(RelativeRotation);
	mat.setTranslation(RelativeTranslation);

	if (RelativeScale != core::vector3df(1,1,1))
	{
		core::matrix4 smat;
		smat.setScale(RelativeScale);
		mat *= smat;
	}

	return mat;
}
with
// mat and smat: static definitions in the class
// in the class: core::vector3df vectunit(1,1,1);
virtual core::matrix4 getRelativeTransformation() const
{
	mat.setRotationDegrees(RelativeRotation);
	mat.setTranslation(RelativeTranslation);

	if (RelativeScale != vectunit)
	{
		smat.setScale(RelativeScale);
		mat *= smat;
	}

	return mat;
}
Pr3t3nd3r

Post by Pr3t3nd3r »

The function isCulled performs culling with cam->getViewFrustrum()->boundingBox, but the real frustum isn't a box. Because I use a very big far value (the game is set in space; anyway, there are optimizations... for me), many meshes are sent to the video card that are never rendered.

If a game renders an outdoor scene with many complex objects in it, those will be sent to the graphics card too.
Maybe a real frustum could be implemented: an (optional) intersection test against the frustum planes, for large far values in outdoor scenes.
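
A minimal sketch of such a plane test, under stated assumptions: Vec3f, Plane and Box are hypothetical types (not the Irrlicht classes), plane normals are assumed to point into the frustum, and culledByTestPlane is a made-up helper that uses a single plane z >= 1 just to try the function.

```cpp
// Hypothetical minimal types; not the Irrlicht classes.
struct Vec3f
{
    float X, Y, Z;
};

// Plane with its normal pointing to the inside of the frustum:
// a point p is inside when dot(Normal, p) + D >= 0.
struct Plane
{
    Vec3f Normal;
    float D;
};

struct Box
{
    Vec3f Min, Max;
};

// Returns true when the box lies completely outside at least one plane.
inline bool isBoxOutsideFrustum(const Box& box, const Plane* planes, int planeCount)
{
    for (int i = 0; i < planeCount; ++i)
    {
        const Vec3f& n = planes[i].Normal;
        // Corner of the box that lies farthest along the plane normal.
        const Vec3f p = {
            n.X >= 0.0f ? box.Max.X : box.Min.X,
            n.Y >= 0.0f ? box.Max.Y : box.Min.Y,
            n.Z >= 0.0f ? box.Max.Z : box.Min.Z
        };
        if (n.X * p.X + n.Y * p.Y + n.Z * p.Z + planes[i].D < 0.0f)
            return true; // even the farthest corner is behind this plane
    }
    return false; // inside or intersecting (conservative)
}

// Helper for trying it out: a single plane that keeps everything with z >= 1.
inline bool culledByTestPlane(const Box& box)
{
    const Plane plane = {{0.0f, 0.0f, 1.0f}, -1.0f};
    return isBoxOutsideFrustum(box, &plane, 1);
}
```

A real frustum would pass six planes. The test is conservative: a box near a frustum corner can be reported as visible although it is not, but a visible box is never culled.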
Pr3t3nd3r

Post by Pr3t3nd3r »

I added optimization.h
for:
abs ... I use it in my game
fast distance (2D)
sqrt: sqrtfast, sqrtfa (meant to be one version for floats and one for doubles) ... and change all the sqrt and sqrtf calls in the engine to something like this, or to other code .. assembly .. etc.
(sqrt, sqrtf, abs ... are very slow)
For the sqrtfast and fast distance code I found some links on the forum.
I think sqrtfast is the same code that the compiler generates, but without the exception handling, function calls, etc., and with a reduced number of iterations (less precision? ... but I haven't seen any real problems from the loss of precision ...).
I don't really understand the code (I didn't even try), so I can't make a version for floats ..
(this code is tested and it is faster ...)


#ifndef optimizari_H
#define optimizari_H
inline float fastabs(float f)
{
	if (f<0) return -f;
	return f;
}

// Fast approximate 2D distance (no square root).
inline float distance(float dx, float dy)
{
	float min, max, approx;
	if ( dx < 0 ) dx = -dx;
	if ( dy < 0 ) dy = -dy;
	if ( dx < dy )
	{
		min = dx;
		max = dy;
	} else
	{
		min = dy;
		max = dx;
	}
	approx = ( max * 0.9833984375 ) + ( min * 0.4306640625 );
	if ( max < ( min * 16 ))
		approx -= ( max * 0.0390625 );
	return approx;
}

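// Table of initial estimates, stored as pairs of 32-bit words (low word
// first) and read back as doubles. This assumes a little-endian machine
// with 32-bit int.
#define itable ((double *)xtable)

// unsigned: several of the values do not fit into a signed 32-bit int
static unsigned int xtable[16] = {
	0x540bcb0d,0x3fe56936, 0x415a86d3,0x3fe35800, 0xd9ac3519,0x3fe1c80d,
	0x34f91569,0x3fe08be9, 0x8f3386d8,0x3fee4794, 0x9ea02719,0x3feb5b28,
	0xe4ff9edc,0x3fe92589, 0x1c52539d,0x3fe76672 };

// Rewrites the exponent of *t in place and returns the halved,
// re-biased exponent bits to be used for the square root result.
static int norm2(double *t) {
	unsigned int e,f,g;
	f = ((((unsigned int *)t)[1])>>1)+0x1FF80000;
	f &= 0xFFF00000;
	e = ((unsigned int *)t)[1];
	g = e&0x000FFFFF;
	((int *)t)[1] = g+0x40000000-(e&0x00100000);
	return f;
}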


//inline float sqrtfast(float r)
inline double sqrtfa(double y)
{
	double a;
	int e,c;

	e = norm2(&y);
	c = (((int *)&y)[1])>>(18)&(7);
	a = itable[c];

	// Newton-Raphson iterations for 1/sqrt(y)
	for(c=5;c>=0;--c)
		a = 0.5*a*(3.0-y*a*a);
	a*=y; // sqrt(y) = y * (1/sqrt(y))

	// put the exponent computed by norm2 back into the result
	((int *)&a)[1] &= 0x000FFFFF;
	((int *)&a)[1] |= e;

	return a;
}

// Currently identical to sqrtfa; there is no float version yet.
//inline float sqrtfast(float r)
inline double sqrtfast(double y)
{
	double a;
	int e,c;

	e = norm2(&y);
	c = (((int *)&y)[1])>>(18)&(7);
	a = itable[c];

	for(c=5;c>=0;--c)
		a = 0.5*a*(3.0-y*a*a);
	a*=y;

	((int *)&a)[1] &= 0x000FFFFF;
	((int *)&a)[1] |= e;

	return a;
}
#endif
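
To get a feeling for the precision of the fast 2D distance, here is a small self-contained check against std::sqrt. approxDistance repeats the same approximation so that the sketch compiles on its own; both function names are hypothetical and not part of the header above. The constants correspond to 1007/1024, 441/1024 and 40/1024.

```cpp
#include <cmath>

// Same approximation as distance() above, repeated so that this
// sketch is self-contained.
inline float approxDistance(float dx, float dy)
{
    dx = std::fabs(dx);
    dy = std::fabs(dy);
    const float lo = dx < dy ? dx : dy;
    const float hi = dx < dy ? dy : dx;
    float approx = hi * 0.9833984375f + lo * 0.4306640625f;
    if (hi < lo * 16.0f)
        approx -= hi * 0.0390625f;
    return approx;
}

// Relative error against the exact Euclidean length (inputs must not both be 0).
inline float relativeError(float dx, float dy)
{
    const float exact = std::sqrt(dx * dx + dy * dy);
    return std::fabs(approxDistance(dx, dy) - exact) / exact;
}
```

Calling relativeError for the vectors that actually occur in a game is probably the quickest way to decide whether the loss of precision is acceptable there.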
disanti

Post by disanti »

Wow... you did a lot Pr3t3nd3r! :shock:
Last edited by disanti on Thu Feb 24, 2011 10:28 am, edited 1 time in total.
Guest

Post by Guest »

In the class CGUIEnvironment, where is the clear or removeAll function? It would be nice to have something like that.
Pr3t3nd3r

Post by Pr3t3nd3r »

Very nicely done with 0.8, Niko. Cool effects. Good job.

I haven't tested the new engine yet, because I made some changes to the old one and need to port them to the new version and recompile it. I believe the particle bug is fixed :D and now we have both types of particles.
Also, I sent you some code for the 3-state buttons, but I didn't receive any answer ...
niko
Site Admin
Posts: 1759
Joined: Fri Aug 22, 2003 4:44 am
Location: Vienna, Austria

Post by niko »

Pr3t3nd3r wrote: Also, I sent you some code for the 3-state buttons, but I didn't receive any answer ...
Wow, I just checked my spam folder and it looks like all your mails were identified as spam. Never read them until now. Sorry! Writing an answer soon.
Guest

Post by Guest »

@ Pr3t3nd3r
Will you show us your optimized engine sometime?
hybrid

Cache friendly programming

Post by hybrid »

Hi,

I was wondering why some parts of the Irrlicht functions traverse matrices (i.e. C arrays) column by column, where row-by-row traversal is much more efficient due to a better cache hit rate. Examples can be found in CImage.cpp and many other files. Is there any good reason?
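
To make the difference concrete, a minimal sketch with hypothetical function names, assuming the image is stored row by row in one contiguous buffer. Both functions compute the same sum; only the memory access order differs.

```cpp
#include <cstddef>
#include <vector>

// Sums a width x height image stored row by row in one contiguous
// buffer (pixel (x, y) lives at index y * width + x).
inline unsigned long sumByRow(const std::vector<unsigned char>& pixels,
                              std::size_t width, std::size_t height)
{
    unsigned long sum = 0;
    for (std::size_t y = 0; y < height; ++y)     // outer loop over rows
        for (std::size_t x = 0; x < width; ++x)  // inner loop walks adjacent bytes
            sum += pixels[y * width + x];
    return sum;
}

// Same result, but the inner loop jumps `width` bytes on every step,
// so consecutive accesses rarely share a cache line for wide images.
inline unsigned long sumByColumn(const std::vector<unsigned char>& pixels,
                                 std::size_t width, std::size_t height)
{
    unsigned long sum = 0;
    for (std::size_t x = 0; x < width; ++x)
        for (std::size_t y = 0; y < height; ++y)
            sum += pixels[y * width + x];
    return sum;
}
```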

Furthermore, some array copy loops can be replaced by memcpy or memset, which has been done in many places already. Should be quite a performance boost. I will try to identify most if not all of these places and fix them.
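
A minimal sketch of the kind of replacement I mean, with hypothetical function names and int elements as an example. It is only valid for trivially copyable element types and non-overlapping ranges.

```cpp
#include <cstddef>
#include <cstring>

// Element-by-element copy, as a hand-written loop.
inline void copyLoop(int* dst, const int* src, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = src[i];
}

// Same effect for trivially copyable elements; the ranges must not overlap.
inline void copyBlock(int* dst, const int* src, std::size_t count)
{
    std::memcpy(dst, src, count * sizeof(int));
}

// Zero-fills a buffer. memset works byte by byte, so it only suits
// patterns such as all-zero.
inline void clearBlock(int* dst, std::size_t count)
{
    std::memset(dst, 0, count * sizeof(int));
}

// Small self-check: both copies produce the same data.
inline bool copiesMatch()
{
    const int src[4] = {1, 2, 3, 4};
    int a[4], b[4];
    copyLoop(a, src, 4);
    copyBlock(b, src, 4);
    return std::memcmp(a, b, sizeof(a)) == 0;
}
```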