Hardware Skinning
Why am I complaining again?
Simply because, for some strange reason, Irrlicht once again does things completely differently from even the most basic tutorial found on the internet.
And it's about to do it again... (lack of TBOs)
What is wrong? - a whole list of things
1) The normals being calculated are completely wrong (two levels of wrong, see below)
2) The weighting of vertices is managed through a list of indices per bone and a whole ton of linked lists
3) Position, rotation and scale hints on joints are completely useless
4) Ever-present recursion
5) Constant recalculation of the bounding box for the mesh
What will be wrong in Irrlicht 1.9?
1) Constant Waterfalling in Hardware Skinning -- you need to implement TBOs before you skin
2) BBox update for skinned meshes
3) The Normals
How did I fix these in my fork?
Texture Buffer Objects
If you attempt to pass joint/bone transformation matrices to the shader as a uniform (GLSL) / constant (HLSL) array, you're going to run into constant waterfalling.
Essentially, unless you run GCN 2.0 and use Uniform Buffer Objects, the uniform data sits in registers.
The GPU is a SIMD processor, i.e. the same instruction is carried out across all "threads", usually 32 at a time; in the case of a vertex shader, 32 vertices are processed in one "warp" (at a time).
This means that while a MUL instruction, for example, can take 32 different values as operands, those values must all come from the same register, and uniform array elements are distinct registers.
The texture fetch instruction is the exception, because the value in the register determines the memory location to fetch from.
So when some threads use values from different registers (array indices), the instruction gets carried out multiple times and the results you don't want are masked out.
A similar thing happens with "divergent flow control", a.k.a. if-statements in shaders.
This is not a problem if all threads use the same register most of the time, but in skinning, 32 consecutive vertices are very unlikely to be influenced by the same set of bones in the same order.
And that is why I implemented TBOs, which sit on top of IGPUBuffers that can be updated any way you deem appropriate (discard/recreate, BufferSubData, persistent mapping, N-buffer round robin),
and the data is fetched inside the shader through "texelFetch" from a "samplerBuffer" in parallel.
The only way one could implement GPU skinning right now without TBOs and the IGPUBuffer infrastructure, and without suffering from constant waterfalling, would be to update an actual 2D (but really 1D) texture all the time.
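To illustrate, here is a minimal GLSL sketch of the fetch side (the `boneTBO` name and the attribute locations are assumptions for the example, not the actual engine interface): each bone matrix occupies 4 consecutive RGBA32F texels, and texelFetch pulls the columns straight from buffer memory instead of indexing the uniform register file.

```glsl
#version 330 core
// Sketch only -- names and locations are illustrative assumptions.
layout(location = 0) in vec3  vPos;
layout(location = 3) in ivec4 vBoneIDs; // up to 4 influencing bones
layout(location = 4) in vec4  vWeights; // weights summing to 1

uniform samplerBuffer boneTBO; // 4 RGBA32F texels (columns) per bone matrix

mat4 fetchBoneMatrix(int boneID)
{
    int base = boneID * 4;
    return mat4(texelFetch(boneTBO, base + 0),
                texelFetch(boneTBO, base + 1),
                texelFetch(boneTBO, base + 2),
                texelFetch(boneTBO, base + 3));
}

void main()
{
    vec4 pos = vec4(0.0);
    for (int i = 0; i < 4; i++)
        pos += vWeights[i] * (fetchBoneMatrix(vBoneIDs[i]) * vec4(vPos, 1.0));
    gl_Position = pos; // apply view-projection afterwards
}
```

Because the texture unit handles a different address per thread natively, divergent bone indices inside one warp cost nothing extra.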
Not Recalculating a Bounding Box for the skinned mesh from the vertices' positions
One of the complaints about SVN Irrlicht (which will become version 1.9) is that bounding boxes are not updated for hardware-skinned meshes, and well, they can't be.
That's because the moved vertices are sent from the vertex shader straight to the rasterizer and no copy is kept.
Hell, even if a copy were retrieved, it would be stupid to download it for culling after the mesh has already been drawn. :D
So if you want a bounding box, you'd need to skin at least the positions on the CPU, which kind of defeats the objective!
WRONG
If you notice, the final vertex position is a linear combination of the original vertex position transformed by N bone/joint matrices.
The weights add up to 1, so the combination MUST lie between the different blended positions (be contained by a 3D convex hull enclosing the positions being mixed).
So if you make bounding boxes for each bone/joint by adding all vertices which it influences (weight > 0.f) into the box, and then transform the bounding boxes by the
matrices of the bones after animation and merge them into one, you get a CONSERVATIVE bounding box for your skinned mesh which completely contains the one
you would have made by recalculating it from the moved vertex positions.
And all this is at least 100x faster, or 800x if you use my new transformBoxEx() function.
Not only that, but you can draw the BoundingBox of the bone and get a much better visualization of the bone than just a line to its parent.
The Awful Linked Lists
The joint has a list of vertices it influences and the weights it exerts on them; to skin, one must keep a bool helper array to know whether this is the first bone to modify the position (use '=') or a later one (use '+=').
Linked lists are horribly inefficient, especially when I have to traverse the vertex array randomly to modify the values.
After we added a flexible vertex format (supporting all OpenGL vertex attribute input data types: floats, integers, packed formats like R10G10B10A2), it became really expensive to set or read a position, which made the whole thing even slower.
And the recursion, it's just awful!
Instead, we keep a list of up to 4 boneIDs per vertex that influence it, and cap the maximum number of bones at 256... everybody does it, even Crysis (except for the 256-bone limit).
We also notice that the weights have to add up to 1, so it's useless to store the 4th weight, and that we don't need the full range of a "float".
We use the RGB10A2 format for the weights and use the last 2 bits to tell us how many bones influence the vertex (1 up to 4).
This all boils down to only 8 bytes of extra data per vertex, and a 4x speed increase.
Every skinning tutorial does it like this.
Useless Caching - Pos/Rot/Scale Hints
I made myself a grid of 100 by 100 animated dwarves; all was fine until I set different animation speeds on them.
It turned out the dwarf was only being skinned once per 10000 instances, because the same frame was being requested all the time.
In practice it almost never happens that all instances of an animated mesh play the same animation, at the same speed, and perfectly in sync.
Instead, I used std::lower_bound to find my frame keys rather than trying to accelerate the search with hints. If log(N) proves to be too slow (versus the O(N) scan when a hint is invalidated by more than one key),
one can use a fixed number of bins (e.g. 1024, fetched in O(1)) to get smaller ranges than (0, maxFrameForLastKey) to binary search.
Normals - Level 1 Of Wrong
Simply multiplying by the 3x3 submatrix of the transformation matrix will not rotate the normal properly; the inverse transpose of that 3x3 is the correct normal matrix!!!
Every skinning tutorial on the internet mentions THIS!
Normals - Level 2 Of Wrong
Here I can't blame anyone, as no implementation really takes care of it: blending the correct 3x3 inverse transposes does not always give the correct normals, unless all vertices involved are influenced by 1 bone with a weight of 1.
The weights change from vertex to vertex, hence vary across the triangle face, which makes the triangle stretch and rotate, and that invalidates any normals which were pre-calculated.
Imagine a cube where the 4 corners at the top fully belong to bone A and the 4 at the bottom to bone B. Now scale bone A down or rotate it, and you'll see that the sides have wrong normals, while the top and bottom are fine.
There are some solutions to this in research papers, so I will update you on how I solve that.