About SIMD math (with profiling code)
Re: About SIMD math (with profiling code)
Yet the X axis is stored as M[0], M[1], M[2]...
Re: About SIMD math (with profiling code)
@devsh: 12, 13, 14, 15 are the last row in a row-major matrix. That's what row-major means.
As the documentation says, the matrix is interpreted the same as in D3D. OpenGL puts the translation in a column (because some non-coding math guys prefer that) instead of a row (as D3D does), which is why you might think it always should be a column - but there's no reason for that; both are fine. And as GL then reads the memory as a column-major matrix, it puts the translation elements again at 12, 13, 14 in memory, just as in D3D.
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
Re: About SIMD math (with profiling code)
Nothing you say makes sense. Let me say again, Irrlicht matrices are actually the transposes of the true matrices. Therefore, the translation vector is not in the fourth column, it is in the fourth row.
Re: About SIMD math (with profiling code)
Actually, the memory representation is independent of such stuff (devsh explained it well).
In math, the translation is the last column, but you can represent the matrix with the translation either in M[3], M[7], M[11] or in M[12], M[13], M[14]. In both cases you are referring to a column, as long as you keep consistent across all the code.
Now, in D3D the translation is a row. So if Irrlicht takes that assumption (12, 13, 14), from a D3D point of view Irrlicht is column-major. That can accidentally be thought of as a GL matrix with the translation in a column and row-major order; they are the same. However, that changes when matrix multiplication comes into play; from that you can detect whether the matrix class is conceived as D3D-ish or GL-ish (do you need to post-multiply a vector, or pre-multiply it, to get it transformed?).
Memory representation is independent of the math formalism. (Speculation: D3D used translation in rows because 1) C++ operator * accidentally associates in the wrong direction, 2) to break code and confuse users so they stick with D3D more easily.)
Junior Irrlicht Developer.
Real value in social networks is not about "increasing" number of followers, but about getting in touch with Amazing people.
- by Me
Re: About SIMD math (with profiling code)
matrix multiplication is associative, so it doesn't matter if you multiply matrices as A(BC) or (AB)C, you still get the same answer, so C++ operator associativity is not an issue
Re: About SIMD math (with profiling code)
Anyway
TRY MY SIMD MATRIX AND DO SOME BENCHMARKS OF DEATH:
http://irrlicht.sourceforge.net/forum/v ... 04#p293604
I recommend trying my matrix inverse function (which is untested, so it may not work at all); it should be 10x faster than Irrlicht's.
Re: About SIMD math (with profiling code)
How do you use your SIMD Matrix operation classes?
Is the expected format similar to Irrlicht?
Row or Column major?
Do we need to convert irrlicht vector to SIMD vector format or from matrix4 to SIMD matrix to use it?
Maybe if you could put out one or more example on how to use it.
Regards,
thanh
Re: About SIMD math (with profiling code)
The matrix is a proper maths matrix; it is made of 4 SIMD vectors, each representing a row.
Yes, you need to convert between Irrlicht vectors/matrices and the SIMD ones; there is no implicit conversion (for the sake of speed, and because of the bugs that could arise from the different formats).
there are some conversion functions for the vectorSIMDf, but none for matrixSIMD4 so far
the examples on usage are in the other thread
Re: About SIMD math (with profiling code)
Using my properly implemented class (I was only pitching actual class instances of matrix4 and matrixSIMD4 against each other)
-O0
-O1
RUNTIME of SSE matrix mul 5054530 microseconds
RUNTIME of irrMatrix mul 6241177 microseconds
-O2
RUNTIME of SSE matrix mul 375871 microseconds
RUNTIME of irrMatrix mul 562569 microseconds
-O3
RUNTIME of SSE matrix mul 288762 microseconds
RUNTIME of irrMatrix mul 548489 microseconds
-O4
RUNTIME of SSE matrix mul 301609 microseconds
RUNTIME of irrMatrix mul 574059 microseconds
RUNTIME of SSE matrix mul 296882 microseconds
RUNTIME of irrMatrix mul 538160 microseconds
As we can see, we have a clear winner here (SIMD intrinsics).
The funny thing is that the optimizations begin to top out at O2 (I guess there is nothing more the compiler can do with such simple code).
If we look at O0, we see the perf gain is only around 10-15%, but as we go to O2 we start hitting a 2x speedup.
The reason for that is that the compiler finally starts inlining all the functions, sees a lot of
Code: Select all
...
xmm0 = _mm_load_ps(memory);
xmm0 = _mm_someop_ps(xmm0, xmm1);
_mm_store_ps(memory, xmm0);
xmm0 = _mm_load_ps(memory);
...
and eliminates the redundant loads and stores.
Re: About SIMD math (with profiling code)
devsh wrote: matrix multiplication is associative, so it doesn't matter if you multiply matrices as A(BC) or (AB)C, you still get the same answer, so C++ operator associativity is not an issue

Yes, indeed it matters if you have vectors on one side of the multiplication chain, and anyway in general (AB) is different from (BA). Many implementations would also invert pairs of elements, effectively turning A(BC) into A(CB) and then into (CB)A, which is wrong too. In general, the "wrong" associativity of C++ and the particular chosen implementation force you to write matrices in a different order than in math, and you will get different results.
@devsh. going to test your code XD
Last edited by REDDemon on Wed Jun 10, 2015 9:48 am, edited 1 time in total.
Re: About SIMD math (with profiling code)
-O2 -msse3
Code: Select all
RUNTIME of devsh SSE mat mul 1270000 microseconds
RUNTIME of SSE matrix mul 3384000 microseconds
RUNTIME of regularMatrix mul 3720000 microseconds
RUNTIME of irrMatrix mul 3542000 microseconds
-O3 -msse3
Code: Select all
RUNTIME of devsh SSE mat mul 1301000 microseconds
RUNTIME of SSE matrix mul 3126000 microseconds
RUNTIME of regularMatrix mul 3593000 microseconds
RUNTIME of irrMatrix mul 3532000 microseconds
very nice
You can run the benchmark too here:
https://github.com/Darelbi/PublicProfil ... iplication
Re: About SIMD math (with profiling code)
Very nice indeed!
Any other benchmarks in the works? Like matrix inverse
Re: About SIMD math (with profiling code)
RdR wrote: Very nice indeed!
Any other benchmarks in the works? Like matrix inverse

Thanks! That's possible XD
Re: About SIMD math (with profiling code)
REDDemon wrote:
RdR wrote: Very nice indeed! Any other benchmarks in the works? Like matrix inverse
Thanks! That's possible XD

That would be nice.
Have to say I did not do much research about SIMD yet, but I would like to implement this in the future (or use devsh's code if it's available).
But how do you handle CPUs that don't support SIMD?
Re: About SIMD math (with profiling code)
Good question. It is mostly a maintenance issue.
You need two branches of the same code (you can do preprocessor trickery, or just abuse the build system to include the correct file): one with regular C++ and the other with SIMD instructions.
You also have to build two different binaries (pretty easy as long as you stick with CMake, but a bit of a pain using VS or C::B) and warn users about the different download packages (or just do that selection at runtime; the selection would be platform dependent, so it requires extra code).
You can happily assume everyone has SSE2 (my laptop is 7 years old and supports up to SSE3), but the most interesting stuff comes with SSE3 (horizontal add, for example; my SSE matrix multiplication is as slow as native code because it uses SSE2 and thus no horizontal add, while devsh's code uses SSE3, which gives a huge 3x speed boost).
It surprises me that modern C++ compilers still can't do the following:
- Convert SIMD instructions into multiple regular x86 instructions (it is possible, and it would at least remove some maintenance burden from C++ developers: you would just write SIMD code). As far as I know, Emscripten already does that, but that's for the web.
- Certain processors already translate SIMD instructions into multiple microcode instructions.
The most important point is that optimizing for SIMD also requires some changes at a high level (a different memory layout), and just hardcoding routines in SIMD instructions is not the smartest thing to do (actually, no programming language provides high-level control over the memory layout of data; I once wrote an article in Italian about the topic), so you have to take that into account.