Posted: Wed Dec 15, 2010 8:37 am
by Simpe
... Or in a place where you can perform multiple vector operations at once, such as a brute-force culling algorithm.

However, I'm assuming that you've been testing on x86/x64, which are out-of-order CPUs, and those can be pretty awesome at swallowing "bad code". If you compare that to an in-order CPU (Xbox 360/PS3/mobile devices etc.), you'll get completely different results. If Irrlicht is meant to be used on non-x86/x64 platforms, SSE (or the platform's SIMD equivalent) should definitely be implemented for performance.

Or, if you're going to evaluate something like SSE, at least try multiple platforms, specifically platforms that benefit from it.

Posted: Wed Dec 15, 2010 9:37 am
by fmx
Simpe, you have a point, but remember that most uses of Irrlicht are currently on desktop systems, which don't use out-of-order CPUs.

I think more tests should be done before this is rejected outright.
I'm developing a custom GLES 2.0-based renderer for my iOS engine (iPhones, etc.) and I make use of many base Irrlicht types, including Vector2D, Vector3D and Matrix4.

devsh, if you can post your SSE versions of these, I can benchmark the performance of your changes on my iPhone 4.

Posted: Wed Dec 15, 2010 8:14 pm
by devsh
I have posted my matrix class, but the functions are incomplete. Either just copy the missing ones from the original class and use pointer arithmetic to treat the 4 __m128 like 16 floats, or complete the SSE implementation yourself; the reference is pretty easy to follow. SSE is most useful with matrices, so I'd start with that. You will have to test combinations of SSE functions against the normal functions. In my tests, multiplying by another matrix, assignment from a scalar and especially computing the inverse were always obviously faster, and assignment of 4 __m128 was faster than memcpy or the same. Other stuff I found slower, like some assignments (identity), and transposing can be slower with -O3 and -ffast-math.
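As a rough illustration of the kind of comparison described above, here is a minimal sketch of a 4x4 multiply with an SSE path checked against a plain scalar version. This is not the posted matrix class: `Mat4Sketch`, `multiply` and `multiplyScalar` are hypothetical names, the matrix is assumed to be row-major, and the data is stored as 16 aligned floats that are loaded into `__m128` registers on demand (the reverse of treating 4 `__m128` as 16 floats).

```cpp
#include <cassert>
#include <cstdint>

// Use SSE only where the compiler says it is available; otherwise fall back to scalar code.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define MAT4_SKETCH_HAS_SSE 1
#else
#define MAT4_SKETCH_HAS_SSE 0
#endif

// Hypothetical stand-in for an SSE-friendly matrix4 (not Irrlicht's class).
// Row-major storage: element (row, col) lives at m[row * 4 + col].
struct alignas(16) Mat4Sketch {
    float m[16];
};

// Plain scalar reference: out = a * b.
inline Mat4Sketch multiplyScalar(const Mat4Sketch& a, const Mat4Sketch& b) {
    Mat4Sketch out;
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a.m[row * 4 + k] * b.m[k * 4 + col];
            }
            out.m[row * 4 + col] = sum;
        }
    }
    return out;
}

// SSE version where available: each output row is a linear combination
// of the four rows of b, weighted by the elements of the matching row of a.
inline Mat4Sketch multiply(const Mat4Sketch& a, const Mat4Sketch& b) {
#if MAT4_SKETCH_HAS_SSE
    // _mm_load_ps and _mm_store_ps require 16-byte aligned addresses.
    assert(reinterpret_cast<std::uintptr_t>(a.m) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b.m) % 16 == 0);

    Mat4Sketch out;
    const __m128 b0 = _mm_load_ps(b.m + 0);
    const __m128 b1 = _mm_load_ps(b.m + 4);
    const __m128 b2 = _mm_load_ps(b.m + 8);
    const __m128 b3 = _mm_load_ps(b.m + 12);
    for (int row = 0; row < 4; ++row) {
        const float* r = a.m + row * 4;
        __m128 acc = _mm_mul_ps(_mm_set1_ps(r[0]), b0);
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(r[1]), b1));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(r[2]), b2));
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_set1_ps(r[3]), b3));
        _mm_store_ps(out.m + row * 4, acc);
    }
    return out;
#else
    return multiplyScalar(a, b);
#endif
}
```

Timing `multiply` against `multiplyScalar` at the optimisation level you actually ship with is the only way to know which one wins on a given compiler and CPU.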

Posted: Thu Dec 16, 2010 10:30 am
by Simpe
fmx wrote: Simpe, you have a point, but remember that most uses of Irrlicht are currently on desktop systems, which don't use out-of-order CPUs.
I think you meant it the other way around (most desktop systems are out-of-order execution CPUs)... possibly because I typoed it in my post :P

But yeah, most Irrlicht users run on desktop systems, but for those who don't I'd say that something like this is extremely important, since it makes a huge difference in performance. Just like vcalls do on in-order machines ;)

Posted: Thu Dec 16, 2010 12:57 pm
by fmx
:oops: I honestly had no idea. My experience is limited to consumer desktop PCs and iPhones, and I still have a lot to learn about other hardware and differences in CPU architecture.

Posted: Thu Dec 16, 2010 11:02 pm
by BlindSide
You're gonna need to use SOA (structure-of-arrays) form if you want a decent speed boost, and that would require rewriting a lot of the algorithms in Irrlicht.

One trick I found useful is to use a SOA3Vector class which holds four 3-dimensional vectors and performs all the ordinary operations on the 4 vectors at once. So as long as you always have a long list of data to perform operations on, you should be fine :)
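To make the layout concrete, here is a minimal sketch of what such a class could look like. It is not BlindSide's actual SOA3Vector: `SoA3VectorSketch` and its helper functions are hypothetical names, and the lanes are processed with plain loops that a compiler may auto-vectorise. With SSE, each of the three arrays would map directly onto one `__m128`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical SoA container: four 3D vectors stored as
// xxxx / yyyy / zzzz instead of xyz xyz xyz xyz.
struct alignas(16) SoA3VectorSketch {
    float x[4];
    float y[4];
    float z[4];
};

// Component-wise sum of four vector pairs at once.
inline SoA3VectorSketch add4(const SoA3VectorSketch& a, const SoA3VectorSketch& b) {
    SoA3VectorSketch out;
    for (std::size_t i = 0; i < 4; ++i) {
        out.x[i] = a.x[i] + b.x[i];
        out.y[i] = a.y[i] + b.y[i];
        out.z[i] = a.z[i] + b.z[i];
    }
    return out;
}

// Four dot products at once; no horizontal adds are needed in SoA form.
inline void dot4(const SoA3VectorSketch& a, const SoA3VectorSketch& b, float out[4]) {
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = a.x[i] * b.x[i] + a.y[i] * b.y[i] + a.z[i] * b.z[i];
    }
}

// Signed distance of four points to the plane n.p + d = 0, the basic
// building block of a brute-force culling test.
inline void planeDistance4(const SoA3VectorSketch& p,
                           float nx, float ny, float nz, float d,
                           float out[4]) {
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = nx * p.x[i] + ny * p.y[i] + nz * p.z[i] + d;
    }
}

// Extract one of the four vectors as plain x, y, z.
inline void getVector(const SoA3VectorSketch& v, std::size_t lane, float out[3]) {
    assert(lane < 4);
    out[0] = v.x[lane];
    out[1] = v.y[lane];
    out[2] = v.z[lane];
}
```

The catch is the one mentioned above: the data has to already be laid out in groups of four, otherwise the cost of shuffling vectors into this form can eat the gain.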

Re: SSE vector3df and matrix4

Posted: Tue Nov 26, 2013 6:10 pm
by devsh
I'm resurrecting this effort because the CPU is dragging down the performance of Build a World.

The previous code I made is completely unusable because it's not proper SSE.

This time the vector3d, matrix4, rect2d and aabbox classes will all be 16-byte aligned and padded to 4 floats,
even on normal assignment or variable declaration.

We'll provide an aligned16 call which will work on both Windows and Linux, as well as _SSSE3_ #ifdef/#else blocks, so that Irrlicht can still be compiled without SSE.
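For readers who want a picture of that layout, here is a minimal sketch under some loudly stated assumptions. The aligned16 helper itself is not shown in this thread, so the sketch uses C++11 `alignas(16)` in its place; `Vec3Padded`, `isAligned16`, `addPadded` and the `VEC3_SKETCH_HAS_SSE` macro are all hypothetical names, and the feature check uses the compiler's own SSE macros rather than the _SSSE3_ define mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Compile the SSE path only where the compiler reports SSE support,
// so the same source still builds without it.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define VEC3_SKETCH_HAS_SSE 1
#else
#define VEC3_SKETCH_HAS_SSE 0
#endif

// Hypothetical 3D vector padded to four floats and aligned to 16 bytes.
// v[0..2] are x, y, z; v[3] is padding so one vector fills one SSE register.
struct alignas(16) Vec3Padded {
    float v[4];
};

static_assert(sizeof(Vec3Padded) == 16, "padded vector should be exactly 16 bytes");
static_assert(alignof(Vec3Padded) == 16, "padded vector should be 16-byte aligned");

// True if the address can be used with aligned SSE loads and stores.
inline bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Adds two padded vectors; the padding lane is added too, which is harmless.
inline Vec3Padded addPadded(const Vec3Padded& a, const Vec3Padded& b) {
    Vec3Padded out;
#if VEC3_SKETCH_HAS_SSE
    assert(isAligned16(a.v) && isAligned16(b.v));
    _mm_store_ps(out.v, _mm_add_ps(_mm_load_ps(a.v), _mm_load_ps(b.v)));
#else
    for (int i = 0; i < 4; ++i) {
        out.v[i] = a.v[i] + b.v[i];
    }
#endif
    return out;
}
```

Note that `alignas` covers stack variables, globals and members of other aligned types; heap allocations of over-aligned types may still need an aligned allocator, depending on the compiler and language standard.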

We'll release the whole thing when it's done and OpenGL 3.2 compliance is in.

Our Irrlicht is always merged with the latest stable version (1.8 now, but merging with 1.8.1).

Posted: Sat Jan 11, 2014 5:04 am
by Granyte
I have been working on an SSE implementation of matrix4, and it's been able to make my FPS go up 40% in some of my math-heavy situations.
So far only the matrix work has been done, as padding the vector to 4 components ended poorly. Have you gotten it to work?

Posted: Sun Jan 12, 2014 8:41 pm
by devsh
So far, postponed until some game features are in and the OGL 3.2 core context gets sorted.

Posted: Fri Sep 05, 2014 3:50 am
by devsh
Only just sorted out the proper implementation... First I'm going to make and publish the classes, and then you need to change the actual type of the matrix etc.

Head over to http://irrlicht.sourceforge.net/forum/v ... =9&t=50230