Has anyone got this working for Irrlicht 1.8? I'm using the Direct3D 9 HLSL shader, but strange things happen to the models. The supplied demo works fine, but if I try to use it with 1.8, the meshes tear apart.
Hardware Skinning for Irrlicht 1.7
- Competition winner
- Posts: 523
- Joined: Tue Jan 15, 2013 6:36 pm
Re: Hardware Skinning for Irrlicht 1.7
Today I added a Hardware Skinning example to the shader-pipeline branch.
A library that helps with network requests, task management, logging, etc. in desktop and mobile apps: https://github.com/GrupaPracuj/hermes
Re: Hardware Skinning for Irrlicht 1.7
I'm here to explain why your hardware skinning might be slow or not work at all...
Basically, this is probably why the shader refuses to work with pre-OpenGL 4.0 hardware: you're passing bone data as a uniform array (BAAAAAAD).
When the vertex shader accesses the bone array by the bone ID stored in a vertex attribute, it can ask for different bone IDs between vertex shader invocations.
In pre-GL 4.0 days, the uniform data sits in constant registers, which don't support divergent access.
Basically, 8 to 32 invocations of the vertex shader run at once, and if they ask for different bone data (different indices into the array), this causes branching: the shader fetches all the bone data serially and then masks the results for each vertex-shader thread in the warp.
You need to make a class for the Texture Buffer Object (a texture "window" onto a GPU buffer), which lets you use texelFetch() inside the shader and have the values cached in the texture cache.
You also need my transient or granular buffer to update the buffer object... or just use glBufferSubData every frame.
Then, if different "threads" ask for different texels, they can do it in parallel, just like in the pixel shader when texturing.
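The texelFetch idea can be sketched as a GLSL vertex-shader fragment. This is only an illustration, not the actual shader from the thread: the attribute names and the buffer layout (4 consecutive RGBA32F texels per column-major mat4) are assumptions.

```glsl
#version 330 core
// Bone matrices live in a buffer texture (TBO) instead of a uniform array.
// Assumed layout: 4 consecutive RGBA32F texels = one column-major mat4.
uniform samplerBuffer uBoneTBO;
uniform mat4 uViewProj;

in vec3  aPosition;
in ivec4 aBoneIDs;  // per-vertex bone indices
in vec4  aWeights;  // per-vertex bone weights

mat4 fetchBone(int id) {
    int base = id * 4;
    return mat4(texelFetch(uBoneTBO, base),
                texelFetch(uBoneTBO, base + 1),
                texelFetch(uBoneTBO, base + 2),
                texelFetch(uBoneTBO, base + 3));
}

void main() {
    // Divergent bone indices are fine here: texelFetch goes through the
    // texture cache rather than the constant registers.
    mat4 skin = fetchBone(aBoneIDs.x) * aWeights.x
              + fetchBone(aBoneIDs.y) * aWeights.y
              + fetchBone(aBoneIDs.z) * aWeights.z
              + fetchBone(aBoneIDs.w) * aWeights.w;
    gl_Position = uViewProj * (skin * vec4(aPosition, 1.0));
}
```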
If you really want to test whether I'm correct, then "freeze" the animation, shove the uniform bone data into a small 1xN floating-point texture, and make the shader read its data from that texture instead.
Also, a TBO gives you a minimum size of 128 MB, as opposed to the 16 kB of uniforms you can have in one shader; it's also persistent (you don't have to re-upload every time you set the shader), and its size allows a number of bones limited only by the bit depth of your per-vertex bone ID.
Additionally, the 128 MB can be used for skinned-mesh instancing (1000 dwarves with 34 bones each, stored as 8-float dual quaternions, all drawn in one pass).
Other reasons:
A) Too much data: do you really need floating-point bone weights? Why not use normalized uint8 (a.k.a. bone weights as a color)?
B) Why not store bone transformations as dual quaternions? You don't need 4x4 or 4x3 matrices, and blending them is much faster!
[WARNING] - Blindslide's shader code seems to work only on GLSL 4.0 (as on my primary PC). I tested it on my HTPC (Radeon 4350 - Pentium - GLSL 3.3) and the shader gives 9 errors and fails to compile! I found this out after Lazerblade's message... We'll need to convert the shader to a lower GLSL version, as my current knowledge of shaders is insufficient for the task. Until then, the OpenGL version will only work on very recent cards... (I wonder how shadowlair could have tested OpenGL?!)
With: 580 fps
Without: 830 fps
Conclusion: my hardware sucks! =/
Constant Cascading
With: 23 fps
Without: 47 fps
Lol :)
- Posts: 1638
- Joined: Mon Apr 30, 2007 3:24 am
- Location: Montreal, CANADA
Re: Hardware Skinning for Irrlicht 1.7
Hardware skinning can only work on modern engines. There's still a lot of work to be done before Irrlicht can do more than the basic fixed pipeline it was designed around at the time.
Re: Hardware Skinning for Irrlicht 1.7
I'm making my own version
Explanation of how I'm making sure the FPS is much higher in GPU mode than in CPU mode:
http://irrlicht.sourceforge.net/forum/v ... 40#p299340