Setting Bones From Matrix
Re: Setting Bones From Matrix
Check this out
https://mmmovania.blogspot.com/2012/11/ ... nning.html
Re: Setting Bones From Matrix
Thank you devsh, that article explained things very well. OZZ Animation provides the bind pose; unfortunately the matrices are in a different format from the ones it gives when playing an animation. I raised an issue on their GitHub page to figure out how to combine the two.
In the meantime I figured I would drop OZZ and go 100% Irrlicht:
Code: Select all
class irrSkinningCallback : public irr::video::IShaderConstantSetCallBack
{
public:
    irr::u32 MaxBones = 70;
    irr::scene::ISkinnedMesh* Mesh = nullptr;
    irr::f32* Uniforms = new irr::f32[MaxBones * 16];
    irr::f32* Uniforms_Scratch = Uniforms;
    irr::core::matrix4 BoneTranslation;
    bool Update = true;
    irrSkinningCallback() : BoneTranslation(irr::core::matrix4::EM4CONST_NOTHING) {}
    ~irrSkinningCallback()
    {
        delete[] Uniforms;
    }
    void SetupNode(irr::video::IVideoDriver* driver, irr::scene::IAnimatedMeshSceneNode* Node)
    {
        Mesh = (irr::scene::ISkinnedMesh*)Node->getMesh();
        // Resize the uniform buffer to the real joint count; the initial 70-bone
        // buffer would overflow on meshes with more joints. The shader declares
        // bones[100], so the joint count must also stay at or below 100.
        MaxBones = Mesh->getAllJoints().size();
        delete[] Uniforms;
        Uniforms = new irr::f32[MaxBones * 16];
        Uniforms_Scratch = Uniforms;
        Mesh->setHardwareMappingHint(irr::scene::EHM_STATIC, irr::scene::EBT_VERTEX_AND_INDEX);
        Mesh->setHardwareSkinning(true);
        irr::video::IGPUProgrammingServices* gpu = driver->getGPUProgrammingServices();
        irr::io::path VertPath = "skinning.vert";
        irr::io::path FragPath = "";
        irr::s32 mtlSkinningShader = gpu->addHighLevelShaderMaterialFromFiles(
            VertPath, "main", irr::video::EVST_VS_4_1,
            FragPath, "main", irr::video::EPST_PS_4_1,
            this, irr::video::EMT_SOLID, 0, irr::video::EGSL_DEFAULT);
        Node->setMaterialType((irr::video::E_MATERIAL_TYPE)mtlSkinningShader);
    }
    void OnSetConstants(irr::video::IMaterialRendererServices* services, irr::s32 userData)
    {
        // Rebuild and upload the bone array only on every other call.
        if (Update) {
            Update = false;
        }
        else {
            Update = true;
            return;
        }
        Uniforms_Scratch = Uniforms;
        for (irr::u32 Bone = 0; Bone < MaxBones; ++Bone)
        {
            // Skinning matrix = animated global pose * inverse bind pose.
            BoneTranslation.setbyproduct(
                Mesh->getAllJoints()[Bone]->GlobalAnimatedMatrix,
                Mesh->getAllJoints()[Bone]->GlobalInversedMatrix);
            for (irr::u8 Float = 0; Float < 16; ++Float) {
                *Uniforms_Scratch++ = BoneTranslation[Float];
            }
        }
        services->setVertexShaderConstant("bones[0]", Uniforms, MaxBones * 16);
    }
};
class irrAnimator
{
    irr::scene::ISceneManager* smgr;
    irr::scene::IAnimatedMeshSceneNode* Node;
    // Shader Callback
    irrSkinningCallback* callback;
public:
    irr::scene::IAnimatedMeshSceneNode* GetNode() { return Node; }
    irrAnimator(irr::video::IVideoDriver* driver, irr::scene::ISceneManager* SceneManager) : smgr(SceneManager)
    {
        callback = new irrSkinningCallback;
        Node = smgr->addAnimatedMeshSceneNode(smgr->getMesh("dwarf.x"));
        Node->setMaterialFlag(irr::video::EMF_LIGHTING, false);
        callback->SetupNode(driver, Node); // Apply Skinning Shader
        // Hacky way to set vertex weights: clear every vertex colour first...
        irr::scene::ISkinnedMesh* SkinnedMesh = (irr::scene::ISkinnedMesh*)Node->getMesh();
        for (irr::u32 Buffer = 0; Buffer < SkinnedMesh->getMeshBuffers().size(); ++Buffer)
        {
            for (irr::u32 Vert = 0; Vert < SkinnedMesh->getMeshBuffers()[Buffer]->getVertexCount(); ++Vert)
            {
                SkinnedMesh->getMeshBuffers()[Buffer]->getVertex(Vert)->Color = irr::video::SColor(0, 0, 0, 0);
            }
        }
        // ...then store (joint index + 1) in the first free colour channel, up to 4 joints per vertex.
        for (irr::u32 Joint = 0; Joint < SkinnedMesh->getAllJoints().size(); ++Joint)
        {
            for (irr::u32 Weight = 0; Weight < SkinnedMesh->getAllJoints()[Joint]->Weights.size(); ++Weight)
            {
                const irr::u32 buffId = SkinnedMesh->getAllJoints()[Joint]->Weights[Weight].buffer_id;
                const irr::u32 vertexId = SkinnedMesh->getAllJoints()[Joint]->Weights[Weight].vertex_id;
                irr::video::SColor* vColor = &SkinnedMesh->getMeshBuffers()[buffId]->getVertex(vertexId)->Color;
                if (vColor->getRed() == 0)
                    vColor->setRed(Joint + 1);
                else if (vColor->getGreen() == 0)
                    vColor->setGreen(Joint + 1);
                else if (vColor->getBlue() == 0)
                    vColor->setBlue(Joint + 1);
                else if (vColor->getAlpha() == 0)
                    vColor->setAlpha(Joint + 1);
            }
        }
    }
    ~irrAnimator()
    {
        // The callback is reference counted and the material renderer holds its own
        // reference, so release ours with drop() instead of deleting it outright.
        callback->drop();
    }
};
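The "hacky" weight setup above stores joint IDs in the four colour channels and the shader reads them back with `int(gl_Color.r * 255)`. Here is a minimal standalone sketch of that round trip. `Rgba`, `addBone` and `decodeBone` are made-up names for illustration, not Irrlicht API, and the rounding in the decode step is my own addition (the shader in this thread truncates instead).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a vertex colour: four 8-bit channels (R, G, B, A).
using Rgba = std::array<std::uint8_t, 4>;

// Store (joint + 1) in the first free channel; 0 means "slot unused".
// Returns false when all four slots are taken or the ID does not fit in 8 bits.
bool addBone(Rgba& color, unsigned joint) {
    if (joint + 1 > 255) return false;
    for (auto& channel : color) {
        if (channel == 0) {
            channel = static_cast<std::uint8_t>(joint + 1);
            return true;
        }
    }
    return false;
}

// Mirror of the shader-side decode: the GPU sees each channel as c / 255.0
// and scales it back up. Adding 0.5 before truncating guards against float error.
int decodeBone(std::uint8_t channel) {
    const float normalized = channel / 255.0f;
    const int id = static_cast<int>(normalized * 255.0f + 0.5f);
    return id - 1;  // -1 means "no bone in this slot"
}
```

Two limits fall out of this encoding: at most four joints per vertex, and at most 255 joints per mesh. There is also no room left for the actual weights.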
And the shader:
Code: Select all
uniform mat4 bones[100];
void main(void)
{
    int BoneID = int(gl_Color.r * 255);
    mat4 vertTran = bones[BoneID - 1];
    BoneID = int(gl_Color.g * 255);
    if(BoneID > 0)
        vertTran += bones[BoneID - 1];
    BoneID = int(gl_Color.b * 255);
    if(BoneID > 0)
        vertTran += bones[BoneID - 1];
    BoneID = int(gl_Color.a * 255);
    if(BoneID > 0)
        vertTran += bones[BoneID - 1];
    gl_Position = gl_ModelViewProjectionMatrix * vertTran * gl_Vertex;
    gl_FrontColor = vec4(1,1,1,1);
    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_TexCoord[1] = gl_MultiTexCoord1;
}
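Each entry of `bones[]` is the product the callback builds from `GlobalAnimatedMatrix` and `GlobalInversedMatrix`. Below is a small standalone sketch of why that product works as a skinning matrix. It uses a made-up `Mat4` type (row-major storage, column-vector convention) and only translations, so it is not Irrlicht's `matrix4` and the layout is an assumption for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical minimal 4x4 matrix, row-major, column-vector convention.
using Mat4 = std::array<float, 16>;

Mat4 identity() {
    return {1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1};
}

Mat4 translation(float x, float y, float z) {
    Mat4 m = identity();
    m[3] = x; m[7] = y; m[11] = z;
    return m;
}

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};  // zero-initialised
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            for (int k = 0; k < 4; ++k)
                r[row * 4 + col] += a[row * 4 + k] * b[k * 4 + col];
    return r;
}

// Skinning matrix: first undo the bind pose, then apply the animated pose.
Mat4 skinningMatrix(const Mat4& globalAnimated, const Mat4& globalInverseBind) {
    return multiply(globalAnimated, globalInverseBind);
}

bool nearlyEqual(const Mat4& a, const Mat4& b, float eps = 1e-5f) {
    for (int i = 0; i < 16; ++i)
        if (std::fabs(a[i] - b[i]) > eps) return false;
    return true;
}
```

While the joint sits in its bind pose the product is the identity, so vertices stay put. Once the joint moves, only the difference from the bind pose is applied to the vertex.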
And it works!
https://youtu.be/rDImu6lNlX8
27 FPS with 1.3 million triangles.
Sadly, turning off hardware skinning and removing the shader gives about a 40% speedup. That doesn't really make sense; you would expect the speed increase to go in the other direction.
Dream Big Or Go Home.
Help Me Help You.
Re: Setting Bones From Matrix
What GPU do you have?
Putting bones in a uniform array may cause constant waterfalling on pre-OpenGL 4.0 GPUs.
Look at IrrlichtBAW's HardwareSkinning example and modify it to have 25x25 dwarves; I get 17 FPS with lighting (on an Nvidia 1060 Mobile).
The problem, however, is that each node (out of the 625) gets its own TextureBufferObject (Texture+Buffer) which gets updated every frame, so it's not fully optimized yet.
Beware of comparing a fully loaded GPU with an idle CPU: skinning hundreds of thousands of vertices on the CPU will be faster than doing it all on the GPU as long as the CPU has nothing else to do. As soon as you add some CPU load (game logic, AI or physics) the balance may change.
ANOTHER NOTE: YOUR DWARVES ARE ALL IN SYNC, SO CPU SKINS ONLY 1 MESH PER FRAME AS ALL 625 NODES SHARE THE SAME MESH.
TRY HAVING THE DWARVES ANIMATE AT DIFFERENT SPEEDS!
(once I did my hardware skinning became 10x faster than CPU, especially when the meshes were off-screen out of camera's view)
http://irrlicht.sourceforge.net/forum/v ... es#p299488
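A rough way to see the "all in sync" point: if nodes that share a mesh and sit on the same animation frame can reuse one skinned result, the CPU cost depends on the number of distinct frames, not on the number of nodes. The function below is a toy model under that assumption, with a made-up name; it is not how any engine reports its work.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Number of CPU skinning passes needed for one rendered frame, assuming that
// nodes sharing a mesh and showing the same animation frame reuse one result.
std::size_t skinningPasses(const std::vector<int>& currentFramePerNode) {
    const std::set<int> distinctFrames(currentFramePerNode.begin(),
                                       currentFramePerNode.end());
    return distinctFrames.size();
}
```

With 625 nodes all on the same frame this gives one pass. Once every node is on a different frame it gives 625, and that is the case a CPU versus GPU comparison should measure.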
Re: Setting Bones From Matrix
Sorry, I get 135 FPS with 625 dwarves... I had the engine compiled in debug mode XD
Re: Setting Bones From Matrix
I suppose I was just expecting to render way more than what I got.
This would probably be a scenario where instancing would greatly improve the situation.
https://www.videocardbenchmark.net/gpu. ... 00%2F8700M
Radeon 8600M/8700M, not the best card but it gets the job done.
The next thing I'll do is cut it down to 14x14 (196 nodes) and ensure they only update at a fixed 30fps
Aside from that I'm not sure there is much I could do to increase performance. The shader itself seems pretty basic at this point.
Last edited by kklouzal on Fri Feb 02, 2018 12:51 am, edited 1 time in total.
Re: Setting Bones From Matrix
We don't have instancing on our skinned meshes, yet.
With an 8800M GPU the highest FPS you can expect is 18-19.
You're getting 27 FPS in CPU skinning mode because you're drawing the same mesh in the same pose 625 times.
There is something wrong with your shader code.
Code: Select all
int BoneID = int(gl_Color.r * 255);
mat4 vertTran = bones[BoneID - 1];
BoneID = int(gl_Color.g * 255);
if(BoneID > 0)
    vertTran += bones[BoneID - 1];
BoneID = int(gl_Color.b * 255);
if(BoneID > 0)
    vertTran += bones[BoneID - 1];
BoneID = int(gl_Color.a * 255);
if(BoneID > 0)
    vertTran += bones[BoneID - 1];
These bone matrices are not weighted by the vertex skinning weights. It's a miracle the skinning works at all; it surely won't work on other meshes (it assumes equal weight on every bone affecting a vertex).
(This may also be the reason why your OZZ skinning doesn't work.)
One note: instead of accumulating your bone matrices, accumulate vertices transformed by matrices.
//adding transformed weighted vertices is better than adding weighted matrices and then transforming
//averaging matrices = [1,4]*(21 fmads) + 15 fmads
//averaging transformed verts = [1,4]*(15 fmads + 7 muls)
Final remark: use mat4x3 for the bone matrices; you'll save 25% of the uniform memory and increase your FPS by 10-25%.
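To make the weighting point concrete, here is a CPU-side sketch of both ways of blending. The types and function names are made up for illustration (a 3x4 affine bone stored as three rows of four floats) and the weights are assumed to sum to 1. It is not the shader fix itself, only the arithmetic behind it.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<float, 3>;
// Hypothetical affine bone: three rows of [rotation/scale | translation].
using Bone = std::array<float, 12>;

Bone translate(float x, float y, float z) {
    return {1, 0, 0, x,  0, 1, 0, y,  0, 0, 1, z};
}

Vec3 transformPoint(const Bone& m, const Vec3& p) {
    return {m[0] * p[0] + m[1] * p[1] + m[2]  * p[2] + m[3],
            m[4] * p[0] + m[5] * p[1] + m[6]  * p[2] + m[7],
            m[8] * p[0] + m[9] * p[1] + m[10] * p[2] + m[11]};
}

// Recommended form: transform by each bone, then blend the weighted results.
Vec3 skinVertex(const Vec3& p, const std::vector<Bone>& bones,
                const std::vector<float>& weights) {
    assert(bones.size() == weights.size());
    Vec3 out{0.f, 0.f, 0.f};
    for (std::size_t i = 0; i < bones.size(); ++i) {
        const Vec3 t = transformPoint(bones[i], p);
        for (std::size_t k = 0; k < 3; ++k) out[k] += weights[i] * t[k];
    }
    return out;
}

// Alternative: blend the weighted matrices first, then transform once.
Vec3 skinVertexViaBlendedMatrix(const Vec3& p, const std::vector<Bone>& bones,
                                const std::vector<float>& weights) {
    assert(bones.size() == weights.size());
    Bone blended{};  // zero-initialised
    for (std::size_t i = 0; i < bones.size(); ++i)
        for (std::size_t k = 0; k < 12; ++k) blended[k] += weights[i] * bones[i][k];
    return transformPoint(blended, p);
}
```

Both functions give the same point, so the choice between them is only about instruction count. Dropping the weights changes the result unless every influencing bone happens to have the same weight.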
Re: Setting Bones From Matrix
Thank you devsh you have been a big help to me and the rest of the community!
Re: Setting Bones From Matrix
kklouzal wrote: The next thing I'll do is cut it down to 14x14 (196 nodes) and ensure they only update at a fixed 30fps
We already have this in IrrBAW:
Code: Select all
//! only for EBUM_NONE and EBUM_READ, it dictates what is the actual frequency we want to bother updating the mesh
//! because we don't want to waste CPU time if we can tolerate the bones updating at 120Hz or similar
virtual void setDesiredUpdateFrequency(const float& hertz) = 0;
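For anyone without that method, a fixed-rate bone update can be approximated by accumulating frame time. The class below is a made-up helper, not part of Irrlicht or IrrlichtBAW, and only shows the gating logic.

```cpp
#include <cassert>

// Hypothetical helper: decides whether bones should be re-evaluated this frame,
// given a desired update frequency in Hz.
class BoneUpdateLimiter {
public:
    explicit BoneUpdateLimiter(float hertz) : interval_(1.0f / hertz) {
        assert(hertz > 0.0f);
    }

    // Call once per rendered frame with the elapsed time in seconds.
    bool shouldUpdate(float deltaSeconds) {
        accumulated_ += deltaSeconds;
        if (accumulated_ < interval_) return false;
        // Keep the remainder so the long-run rate stays close to the target.
        while (accumulated_ >= interval_) accumulated_ -= interval_;
        return true;
    }

private:
    float interval_;
    float accumulated_ = 0.0f;
};
```

Typical use would be `if (limiter.shouldUpdate(dt)) { /* recompute and upload bones */ }` inside the render loop, with `dt` taken from the engine's timer.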
Re: Setting Bones From Matrix
kklouzal wrote: Sadly turning off hardware skinning and removing the shader gives about a 40% speedup.. Doesn't really make any sense, you would think that speed increase would be in the other direction..
Do this to your dwarf nodes, and you'll see this statement is wrong.
Code: Select all
#define kInstanceSquareSize 25
// dwarfNode is assumed to hold the 625 animated mesh scene nodes
for (size_t x=0; x<kInstanceSquareSize; x++)
    for (size_t z=0; z<kInstanceSquareSize; z++)
        dwarfNode[x+kInstanceSquareSize*z]->setAnimationSpeed(18.f*float(x+1+(z+1)*kInstanceSquareSize)/float(kInstanceSquareSize*kInstanceSquareSize));
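Pulled out as a function, the speed formula from the snippet above is easier to inspect. `animationSpeed` is just a name chosen here; the arithmetic is unchanged.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kInstanceSquareSize = 25;

// Same formula as in the loop above: the speed ramps linearly with grid position.
float animationSpeed(std::size_t x, std::size_t z) {
    assert(x < kInstanceSquareSize && z < kInstanceSquareSize);
    return 18.f * float(x + 1 + (z + 1) * kInstanceSquareSize)
                / float(kInstanceSquareSize * kInstanceSquareSize);
}
```

On the 25x25 grid the speeds run from about 0.75 at (0, 0) to about 18.7 at (24, 24), and no two nodes share a speed, so the nodes drift out of sync almost immediately.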
Re: Setting Bones From Matrix
LOL they are doing the wave!
196 nodes with GPU skinning: 70 FPS
196 nodes with CPU skinning: 35 FPS
Literally a 100% increase in performance with GPU skinning :)
Quote: One note, instead of accumulating your bone matrices, accumulate vertices transformed by matrices.
..
Final remark, use mat4x3 for the bone matrices, you'll save 25% of the Uniform memory and increase your FPS by 10-25%.
I'll try to figure these out when I get home from work tonight. I quickly tried the 4x3 matrix tweak but was getting shader compilation errors.
I was able to do it with a 3x3 matrix, but the arms on the nodes were mangled up.
This whole little side project is my first experience with shaders and matrices. I don't really know what I'm doing yet. >.<
I'm more of a learn-by-example type.
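Since the 4x3 tweak came up: a GLSL `mat4x3` has 4 columns of 3 rows, so it keeps the translation column and only drops the constant bottom row of an affine matrix. A 3x3 matrix has no translation at all, which is a likely (though unconfirmed) reason for the mangled arms. Below is a CPU-side packing sketch, assuming column-major 4x4 input as OpenGL expects; the type and function names are made up.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4   = std::array<float, 16>;  // column-major, bottom row (0, 0, 0, 1)
using Mat4x3 = std::array<float, 12>;  // 4 columns x 3 rows, column-major

// Drop the constant bottom row: 12 floats per bone instead of 16 (25% less data).
Mat4x3 packAffine(const Mat4& m) {
    assert(m[3] == 0.f && m[7] == 0.f && m[11] == 0.f && m[15] == 1.f);
    Mat4x3 out{};
    for (std::size_t col = 0; col < 4; ++col)
        for (std::size_t row = 0; row < 3; ++row)
            out[col * 3 + row] = m[col * 4 + row];
    return out;
}

// Flatten all bones into one float array, ready to upload as a mat4x3 array.
std::vector<float> packBones(const std::vector<Mat4>& bones) {
    std::vector<float> flat;
    flat.reserve(bones.size() * 12);
    for (const Mat4& bone : bones) {
        const Mat4x3 packed = packAffine(bone);
        flat.insert(flat.end(), packed.begin(), packed.end());
    }
    return flat;
}
```

On the shader side the uniform would then be declared as an array of `mat4x3`, and the vertex is transformed with `bones[i] * gl_Vertex`, which yields a `vec3`. The upload call has to match the 12-float stride as well.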
Re: Setting Bones From Matrix
Now that I've told you we have the X Hz-only boning mode in IrrBAW, one could take it a whole step further: render to screen and save the triangles to transform feedback (possible with the same shader in the same draw call) only when the current keyframe changes, then draw the mobs in one draw call (a huge batch) with a transform feedback draw until the mobs change the frame they're animating on.
This would add another 100%, based on my comparisons with a shader without any skinning.
Also you'd get the shadow pass for half price (meshes stay static between the shadow and main render).
Not to mention that you could account for the max camera movement in 30 or 120Hz and frustum cull the triangles of your mobs.
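The caching idea above can be illustrated without any GPU code: keep the last skinned result and only redo the work when the keyframe index changes. The class below is a made-up CPU-side analogue, with a callback standing in for the skinning pass. It does not use transform feedback.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical cache: stores the last skinned vertex data and re-skins only
// when a different keyframe is requested.
class SkinnedCache {
public:
    using SkinFn = std::function<std::vector<float>(int keyframe)>;

    explicit SkinnedCache(SkinFn skin) : skin_(std::move(skin)) {}

    const std::vector<float>& get(int keyframe) {
        assert(keyframe >= 0);
        if (keyframe != cachedKeyframe_) {
            cached_ = skin_(keyframe);  // the expensive part
            cachedKeyframe_ = keyframe;
            ++skinCount_;
        }
        return cached_;
    }

    int skinCount() const { return skinCount_; }

private:
    SkinFn skin_;
    std::vector<float> cached_;
    int cachedKeyframe_ = -1;  // nothing cached yet
    int skinCount_ = 0;
};
```

With bones updating at 30 Hz and rendering at a higher rate, several consecutive frames hit the cache. A shadow pass in the same frame can reuse the same data too.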
Re: Setting Bones From Matrix
So anyway, this is my half-assed, semi-optimized attempt at 625 skinned dwarves animating at different speeds with an omnidirectional point light with shadows.
Cubemap shadows are drawn in one render pass using layered rendering with gl_Layer and geometry shader hardware-instancing (fixed function tessellation SM 5.0).
I could optimize with what I said above, as well as by using the OpenGL extension to explicitly specify MSAA sample locations for 8x or 16x fewer pixel shader invocations during the depth-only pass, but I'd have to switch from a cubemap FBO attachment to a Multisample Array.
And then obviously a custom resolve, but that could be combined with mipmap chain generation and the blur pass for VSM or other.
So anyway, 56 FPS on an Nvidia 1060 laptop GPU.
(You'd get about 7 FPS.)
But the good news is that with 100 dwarves I get 1000 FPS.