Reducing CPU loading when using 100s scenenodes
Re: Reducing CPU loading when using 100s scenenodes
Sounds reasonable; maybe the names could be EAC_FRUSTUM_BOX and EAC_FRUSTUM_VIEWPORT? That might make it clearer, or a comment in the header would help. Irrlicht gurus, what is the correct answer here?
Re: Reducing CPU loading when using 100s scenenodes
I've found that the speed issue is caused by calling setRenderTarget to clear the textures of, in the test case, 170 mesh cubes with 6 textures each. This takes about 8 seconds on a 2 GHz machine with DX9 hardware-accelerated graphics.
Creating the mesh cubes and the textures is very fast, but clearing the textures to set colors isn't.
All my textures are render targets because I need to draw on them at run-time.
Any ideas how this could be sped up?
Would it be faster to use setRenderTarget with the clear-frame-buffer parameter set to false, and then use another drawing method, or perhaps a bitblt?
I'd appreciate any feedback if anyone has had the same issue.
-
hybrid
- Admin
- Posts: 14143
- Joined: Wed Apr 19, 2006 9:20 pm
- Location: Oldenburg(Oldb), Germany
Re: Reducing CPU loading when using 100s scenenodes
Well, just try it out. You don't need to change any of the texture content when testing the speed here. But I'd assume that the problem is more the sheer number of textures (really 1200 different textures?!). Using a texture atlas would improve performance and also render speed.
Re: Reducing CPU loading when using 100s scenenodes
I traced through to see how setRenderTarget clears the texture pixels, and it uses D3D9's Clear function, so I imagine that would be implemented in the fastest way... or one would hope! There is little overhead in the setRenderTarget code itself, and if the textures are in video memory, they should be cleared very fast by the GPU.
Well, I guess I have some digging to do!
-
hybrid
Re: Reducing CPU loading when using 100s scenenodes
Well, just measure a loop where 100 RTTs are set with and without clear. I assume that there will be only a little difference, because enabling and disabling these 100 textures is far more expensive than clearing them (how big are they anyway?). That's why I'd use a texture atlas and combine many textures into one.
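The measurement suggested above could be sketched roughly as follows. This is only a minimal, engine-independent timing helper; the name timeLoopMs is made up for illustration, and the actual loop body (the setRenderTarget call with and without the clear flag) is left as a placeholder in the comments because it depends on your own setup.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `body` once for each index in [0, count) and
// returns the elapsed wall-clock time in milliseconds.
double timeLoopMs(int count, const std::function<void(int)>& body)
{
    assert(count >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i)
        body(i);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Intended use (placeholders, not real engine code):
//   double withClear    = timeLoopMs(100, [&](int i) { /* set RTT i, clear on  */ });
//   double withoutClear = timeLoopMs(100, [&](int i) { /* set RTT i, clear off */ });
// Comparing the two numbers shows how much of the cost is the clear itself
// and how much is simply switching between 100 render targets.
```

Since the graphics driver may queue commands rather than execute them immediately, it is probably safer to time a whole frame around the loop than to trust the loop timing alone.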
Re: Reducing CPU loading when using 100s scenenodes
Thanks for the reply!
Most of the mesh cubes have only 2 sides with large textures, 1024 x 64 pixels (there are around 300 textures at this resolution, mapped to 150 cubes); the other sides are reduced to 1x1 pixel textures to allow discrete coloring of the cube sides.
All of the larger textures carry discrete information, so they cannot be referenced off one texture.
So far it seems that the overhead is in the section of my code that clears all these textures using setRenderTarget, but I'll go through it to locate the exact calls that cause the overhead.
Are all render-target textures created in video memory surfaces by your D3D9 driver?
-
hybrid
Re: Reducing CPU loading when using 100s scenenodes
Textures always reside in GPU memory. There's often a shadow copy in CPU memory, though. Locking and unlocking forces these copies to be updated in one direction or the other.
I don't see why a) you need textures to color the cubes and b) you cannot put multiple textures together. You simply have to use a cube which has 24 vertices instead of 12 or 18. This is not much of an overhead, but should give you the flexibility of putting all information into the cube. Then simply put all your colors into one texture, or use vertex colors (with the known limitation of mesh duplication). You can easily reference pure colors from a texture by simply giving all texture coords of a face the same value. Also, grouping maybe 8 of the larger textures into one 1024x1024 texture would even leave enough space to cope with mipmap leaking in case you don't want to code your mipmaps manually. And the texture coord calculations wouldn't be too complicated. This would bring your texture count down to around 40, which is much better.
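The texture coordinate arithmetic mentioned above might look something like this. It is a sketch under assumptions, not engine code: the layout (full-width strips stacked vertically with some padding rows between them) and the names UVRect, stripUV and paletteU are invented for illustration.

```cpp
#include <cassert>

// Texture-coordinate rectangle inside an atlas, in the usual 0..1 range.
struct UVRect { float u0, v0, u1, v1; };

// Assumed layout: full-width strips stacked top to bottom, each followed by
// `padPx` unused rows so that neighbouring strips do not bleed into each other.
UVRect stripUV(int slot, int stripHeightPx, int padPx, int atlasHeightPx)
{
    const int top = slot * (stripHeightPx + padPx);
    assert(slot >= 0 && top + stripHeightPx <= atlasHeightPx);
    const float h = static_cast<float>(atlasHeightPx);
    return UVRect{ 0.0f, top / h, 1.0f, (top + stripHeightPx) / h };
}

// Centre of one texel in a palette texture that is `widthPx` texels wide and
// one texel high. Giving all four vertices of a face this same coordinate
// makes the face sample a single flat color.
float paletteU(int index, int widthPx)
{
    assert(index >= 0 && index < widthPx);
    return (index + 0.5f) / widthPx;
}
```

With eight 1024x64 strips and 64 rows of padding each, the eight slots fill a 1024x1024 atlas exactly, and around 300 such textures would need roughly 38 atlases. On a 24-vertex cube, the four vertices of each face would then take the corners of the UVRect for that face's strip.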
Re: Reducing CPU loading when using 100s scenenodes
That all sounds rather complicated...
I use a 1x1 pixel texture for the cube's sides that can be any color at run-time but don't carry images or text.
This limits the GPU's memory usage and speeds up color updating.
Would it be faster using vertex colors?
Mipmaps are turned off, so that's not an issue.
40 textures wouldn't be enough though, as there are 120 texture surfaces that may contain different information at any one time.
I think it's a matter of finding a faster way to do the initial update of the textures, as loading all cubes and textures takes a fraction of a second, but the update takes over 8 seconds.
Let me dig through the code and I'll find out exactly where the bottlenecks occur.
-
hybrid
Re: Reducing CPU loading when using 100s scenenodes
It seems that you still don't know all the features of textures and texture coords. You can easily batch several textures together and choose the proper parts of the large texture to be placed on each cube. The arithmetic is very simple, and the reduction in management overhead is tremendous.
What update takes so long? Locking and unlocking the textures? Locking the texture will download the data from the GPU, and unlocking uploads it again. This means double the data to transport. On top of that, the texture has to be activated each time it is locked or unlocked. That is quite a lot of work. You can lock textures write_only (but only for OpenGL, and only with the next Irrlicht version). This would save one texture transport. But all in all this is still a lot of overhead.
You could also render to the texture instead of manipulating it on the CPU. But that needs more sophisticated render algorithms. And you must find the real bottleneck first. Otherwise you might optimize things which are fast enough already.
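To put rough numbers on the lock/unlock cost described above, here is a small back-of-the-envelope sketch. The texture count and size are the ones quoted earlier in the thread (around 300 textures of 1024 x 64); the 4 bytes per pixel is an assumption (a 32-bit format), and lockUnlockBytes is just an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

// Bytes moved when every texture is locked (GPU -> CPU) and unlocked
// (CPU -> GPU) once. With a write-only lock the download is skipped,
// so only one direction is counted.
std::uint64_t lockUnlockBytes(std::uint64_t textures,
                              std::uint64_t widthPx,
                              std::uint64_t heightPx,
                              std::uint64_t bytesPerPixel, // assumed, e.g. 4 for 32-bit
                              bool writeOnly)
{
    assert(bytesPerPixel > 0);
    const std::uint64_t oneWay = textures * widthPx * heightPx * bytesPerPixel;
    return writeOnly ? oneWay : 2 * oneWay;
}
```

Under these assumptions, lockUnlockBytes(300, 1024, 64, 4, false) comes to 157,286,400 bytes (150 MiB) for one full pass, and half of that with a write-only lock. None of this applies if the textures are only cleared or rendered to on the GPU.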
Re: Reducing CPU loading when using 100s scenenodes
Thanks for the comments, all very interesting! So there is support for creating one texture and then assigning regions within that texture to all the sides of a mesh cube, for example?
I guess that's exactly what most models have, like the clothes of a mesh model all in one texture.
But if I have a larger texture covering the entire mesh, then locking it will cause a transfer of the entire image bitmap to CPU memory, not just the side I need, and that will cause more overhead. So I'm not sure if that will help.
Of course there is no need for the GPU to transfer the texture to CPU memory just to fill the texture with a color. That should be handled by a short command to the GPU to fill the required texture memory, with no GPU-to-CPU or CPU-to-GPU transfers needed.
Why should Irrlicht transfer textures to memory like that? I'm not calling lock and unlock; I call setRenderTarget, which calls the D3D Clear function directly, and that function should clear the texture without causing large texture buffer transfers.
-
hybrid
Re: Reducing CPU loading when using 100s scenenodes
Of course, changing just one partial texture would cause more traffic. But I'd assume that you change many textures, and hence you'd have to transfer all those bytes anyway.
Correct. Rendering the data directly would not cause the memory transfer. Also, the clear will be done on the GPU. That's why I also assumed that the clear is not the major problem, but rather the high number of textures.
Re: Reducing CPU loading when using 100s scenenodes
Hi, I've tested the code with larger meshes, and the getMesh function is taking quite a chunk of CPU time: on a 2 GHz system it takes over 2 seconds to load a more complex mesh of 40 text characters exported from Blender in irrmesh format. These are quite large meshes, so I guess there is considerable overhead even with simpler identical meshes. Is there a way to speed up mesh loading, or a faster format? One of these meshes is 4 megabytes, and a 9-character mesh is over 2 megabytes.
-
hybrid
Re: Reducing CPU loading when using 100s scenenodes
Yes, almost every mesh format is faster than irrmesh. This format is text-based and needs heavy parsing.
Re: Reducing CPU loading when using 100s scenenodes
I've tried 3DS, which comes out some 50% smaller. Would you think that's the fastest alternative output from Blender?
-
hybrid
Re: Reducing CPU loading when using 100s scenenodes
3ds is a bad format, though, as Irrlicht does not support some very important parts of those files. For static meshes, the fastest format could be .ply. I'm not sure if Blender supports it, though. You might also get good results with the skeletal animated formats, such as b3d, ogre mesh, or .x.