irrlicht inherently slow

virus · Post by **virus** » Fri Dec 02, 2005 3:17 pm

I did find some other posts about Irrlicht being slow and a few optimizations, but I haven't found a single post about the core problems. I noticed the problem with 5000 randomly positioned sydney models (3,395,000 triangles), which produced about 6 FPS on my computer (no animations, using vertex buffers on OpenGL).

1. When I turned the camera so that all the objects were outside the frustum, they were still rendered, since the frustum test currently sucks.
2. After turning more than 90 degrees, the nodes stopped rendering and the framerate went up to a whopping 30 FPS.

So I wondered where the rest of the time goes and, lookie lookie, Irrlicht just goes through every node systematically. It's no wonder I don't get better FPS when culling is tested 5000 times per frame (~31 ms). So here's what I think should be done to gain enough speed to be considered a fast engine.

Better frustum culling: first do a sphere-sphere check against the frustum, then sphere-cone against the frustum, then test whether the sphere is inside the frustum, and only after that test box-plane (if the box is close enough to intersect the frustum).
Occlusion culling: run HW occlusion queries concurrently with rendering (Coherent Hierarchical Culling) to avoid rendering occluded objects.
Rendering: store nodes in kd-trees to allow quick culling of invisible child objects. Create a separate tree for dynamic objects.
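
To make the cheap sphere step concrete, here is a minimal sketch of a sphere-versus-frustum classification. It assumes every node carries a bounding sphere and that the frustum is available as six unit-length planes facing inward. `Vec3`, `Plane`, `Sphere`, `Cull` and `classify` are hypothetical names for illustration, not Irrlicht API.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0, with unit-length n pointing into the frustum.
struct Plane { Vec3 n; float d; };

struct Sphere { Vec3 center; float radius; };

enum class Cull { Outside, Intersecting, Inside };

inline float signedDistance(const Plane& pl, const Vec3& p) {
    return pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d;
}

// Classify a bounding sphere against six inward-facing frustum planes.
// Outside: skip the node. Inside: render it without further tests.
// Intersecting: only now is the more expensive box-plane test worth running.
inline Cull classify(const std::array<Plane, 6>& frustum, const Sphere& s) {
    assert(s.radius >= 0.0f);
    bool fullyInside = true;
    for (const Plane& pl : frustum) {
        const float dist = signedDistance(pl, s.center);
        if (dist < -s.radius) return Cull::Outside;  // entirely behind one plane
        if (dist < s.radius) fullyInside = false;    // straddles this plane
    }
    return fullyInside ? Cull::Inside : Cull::Intersecting;
}
```

The test is conservative: a sphere near a frustum corner can come back as `Intersecting` even though it is actually outside, which costs a wasted box test but never drops a visible node.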

After these updates the engine should run a lot faster even with a lot of stuff. The frustum culling is easy to do and should be done properly; the method I proposed would require adding a collision sphere to every node. Occlusion queries are pretty easy too but aren't very useful with the current system. The Stop & Wait occlusion query method might even make it slower. For rendering you need to create a kd-tree system that can traverse the nodes from front to back.
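
As a rough illustration of front-to-back traversal, the sketch below walks a kd-tree and always visits the child on the camera's side of the splitting plane first. The `KdNode` layout and the `frontToBack` function are assumptions made up for this example; tree construction and culling are left out.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical kd-tree node: inner nodes split space, leaves hold scene node ids.
struct KdNode {
    int axis = 0;                   // 0 = x, 1 = y, 2 = z (inner nodes only)
    float split = 0.0f;             // position of the splitting plane on that axis
    std::unique_ptr<KdNode> below;  // child covering coordinates < split
    std::unique_ptr<KdNode> above;  // child covering coordinates >= split
    std::vector<int> nodeIds;       // scene node ids, stored in leaves only

    bool isLeaf() const { return !below && !above; }
};

// Append leaf contents to `out`, visiting the side of each splitting plane
// that contains the camera before the opposite side.
inline void frontToBack(const KdNode* node, const float cameraPos[3],
                        std::vector<int>& out) {
    if (!node) return;
    if (node->isLeaf()) {
        out.insert(out.end(), node->nodeIds.begin(), node->nodeIds.end());
        return;
    }
    assert(node->axis >= 0 && node->axis < 3);
    const bool cameraBelow = cameraPos[node->axis] < node->split;
    const KdNode* nearChild = cameraBelow ? node->below.get() : node->above.get();
    const KdNode* farChild  = cameraBelow ? node->above.get() : node->below.get();
    frontToBack(nearChild, cameraPos, out);
    frontToBack(farChild, cameraPos, out);
}
```

Visiting near cells first is what makes occlusion queries pay off: the closest geometry is drawn before anything it might hide gets tested.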

Also, I'd suggest adding a timer to the nodes that decides whether or not to run the occlusion check: since objects on the screen usually stay there for more than 1 frame, there's no point in testing every frame. Some heuristics could be used to alter this value depending on the situation (camera movement/rotation speed, object speed, object position etc). The amount of pixels an object must cover before it gets rendered can be adjusted as well. You don't want to render an object if only 1 pixel is visible; some heuristics could again be used to determine the amount of pixels needed.
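
A minimal sketch of such a per-node timer, assuming the occlusion query reports a visible pixel count. All names (`OcclusionState`, `recheckInterval`, `dueForTest`, `recordResult`) and the constants 8 and 2 are illustrative assumptions, not tuned values or existing engine code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-node bookkeeping for spacing out occlusion tests.
struct OcclusionState {
    bool visible = true;      // result of the most recent test
    int framesUntilTest = 0;  // countdown; a test is due when it reaches 0
};

// Pick how many frames to wait between tests. Faster camera or object
// motion shortens the interval; the interval never drops below 1 frame.
inline int recheckInterval(bool wasVisible, float cameraSpeed, float objectSpeed) {
    assert(cameraSpeed >= 0.0f && objectSpeed >= 0.0f);
    const int base = wasVisible ? 8 : 2;  // visible nodes tend to stay visible
    const float motion = cameraSpeed + objectSpeed;
    const int interval = static_cast<int>(base / (1.0f + motion));
    return std::max(1, interval);
}

// Returns true when the node should be occlusion-tested this frame.
inline bool dueForTest(OcclusionState& s) {
    if (s.framesUntilTest > 0) { --s.framesUntilTest; return false; }
    return true;
}

// Store a fresh query result and schedule the next test. The node counts as
// visible only if it covered at least `minPixels` pixels.
inline void recordResult(OcclusionState& s, unsigned visiblePixels, unsigned minPixels,
                         float cameraSpeed, float objectSpeed) {
    s.visible = visiblePixels >= minPixels;
    s.framesUntilTest = recheckInterval(s.visible, cameraSpeed, objectSpeed) - 1;
}
```

With a still camera and a still object, a visible node is re-tested every 8th frame and a hidden one every 2nd frame, so hidden nodes reappear quickly while visible ones cost few queries.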

Once I get my master plan for the engine revamp ready I'll post it here

The goal is to make Irrlicht into a stable next-gen engine

What I'm interested in is what others have done to increase the output of the engine on a large scale. Suggestions about possible optimizations/changes are welcome too

dhenton9000 · Post by **dhenton9000** » Fri Dec 02, 2005 5:17 pm

You should code these changes yourself. If they are as good as you say they are, chances are they'll get included in a release.

Conquistador · Post by **Conquistador** » Fri Dec 02, 2005 5:18 pm

Sounds interesting... Keep us posted!

Guest · Post by **Guest** » Fri Dec 02, 2005 5:33 pm

dhenton9000 wrote: You should code these changes yourself. If they are as good as you say they are, chances are they'll get included in a release.

That's the general idea, I just wanted to inform other people about this aspect of the engine. As I said, the goal is to make Irrlicht into a stable next-gen engine

Other · Post by **Other** » Fri Dec 02, 2005 5:33 pm

Very good idea!!!

PS: What about IrrlichtNX, IrrlichtNX++, Nebula - are they as slow as Irrlicht too?

Guest · Post by **Guest** » Fri Dec 02, 2005 5:45 pm

no actually they are faster

sounds like some good ideas and it's good to know how it should be done (at least for me, I don't want to stick with Irrlicht all my life, no offense here)

virus · Post by **virus** » Fri Dec 02, 2005 5:46 pm

I quickly checked the IrrlichtNX changelogs and it seems it has sphere vs frustum and sphere vs cone culling. So it's a bit faster, but the core problem still remains. I haven't heard of Nebula before

pfo · Post by **pfo** » Fri Dec 02, 2005 7:20 pm

5000 randomly positioned sydney models, 3395000 triangles, this produced about 6 fps on my computer

That is a lot of models and a lot of triangles, what kind of FPS were you expecting? You seem to be pretty knowledgeable on this stuff, what kind of FPS do you think Irrlicht is capable of getting? I ask these questions because right now I am making some simple game objects, and soon I intend to start stress testing them by adding several of them to the game and testing AI, etc... I am using Newton Physics, which takes up a good chunk of CPU time, and I am worried that in the future I will have to scale back my ambitions because I won't be able to compute everything I want to and still get a decent frame rate. I've seen several speed fixes posted on this board, I might grab and test them when I get to that point to see if I can speed things up.

Electron · Post by **Electron** » Fri Dec 02, 2005 7:46 pm

virus, IrrlichtNX/NX++ development has stopped, we are now working on Lightfeather, which has the progressively more accurate frustum checks, a BVH tree for quick culling of children, and hardware occlusion culling. It also has PVS and portal culling, but these of course require external setup/preprocessing. If you want to do these things in Irrlicht, you might want to have a look at how they're done in lf. Another important optimization is of course the use of VBOs (in OGL).

burivuh · Post by **burivuh** » Fri Dec 02, 2005 10:00 pm

And do you have any ideas how to optimize the engine's memory usage?
A scene with 13,000 triangles and 1.5 MB of textures (OpenGL mode, X meshes) is eating about 100 MB of RAM.

roninmagus · Post by **roninmagus** » Fri Dec 02, 2005 10:55 pm

Do you actually need to store a bounding sphere, or can you just keep in memory the distance from the center of the model to the farthest vertex, and use that as a radius for a sphere? It could be recalculated when the transformation is changed...

I'm no core engine programmer though, that may get very expensive on a large poly model that is transformed often.
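
For what it's worth, here is a minimal sketch of that idea, with hypothetical helper names. The farthest-vertex distance is computed once in model space; after that, rotation and translation leave the radius unchanged and only scaling affects it, so no vertex needs to be revisited when the transformation changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Largest distance from `center` to any vertex: the radius of a bounding
// sphere around `center`. Compares squared distances, takes one sqrt at the end.
inline float boundingRadius(const std::vector<Vec3>& vertices, const Vec3& center) {
    float maxSq = 0.0f;
    for (const Vec3& v : vertices) {
        const float dx = v.x - center.x, dy = v.y - center.y, dz = v.z - center.z;
        maxSq = std::max(maxSq, dx * dx + dy * dy + dz * dz);
    }
    return std::sqrt(maxSq);
}

// Conservative world-space radius after a transform: scale the model-space
// radius by the largest absolute axis scale.
inline float scaledRadius(float localRadius, const Vec3& scale) {
    assert(localRadius >= 0.0f);
    const float s = std::max({std::fabs(scale.x), std::fabs(scale.y), std::fabs(scale.z)});
    return localRadius * s;
}
```

The expensive pass over the vertices only has to be repeated when the mesh itself changes shape, for example with vertex animation.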

Electron · Post by **Electron** » Sat Dec 03, 2005 12:36 pm

Storing a sphere doesn't take much memory. A vector3df for the center (three 4-byte floats) and a single float for the radius, so that's only 16 bytes. Compared to the memory eaten up by textures and geometry, 16 more bytes per node is small change

There are various ways to optimize memory, and to be honest I'm not sure exactly how much memory optimization Lightfeather is doing, I had no role in coding that part of the engine.
1) Only store information for each vertex which you need. Many models don't use vertex colors, therefore storing vertex colors can be very wasteful. Unfortunately irrlicht always stores vertex colors, and this is not possible to change in a clean manner because of the way its vertex class is implemented. It is not flexible.

2) Use compressed textures in video RAM (if the video card supports it). Btw, are your 1.5 MB of textures bmp, or something much more compressed, like jpg? Generally, a given image will have roughly the same memory usage as its bmp size, assuming the same color depth. Changing the color depth is one way to reduce memory usage, but of course leads to lower quality

3) Completely offload unneeded data. Once a scene is loaded, some texture and mesh data can be completely offloaded to video RAM and need not consume system RAM. This is fast for the video card, and frees up system memory. Of course you have to be careful that the resources aren't still needed in system memory for something... This obviously can't be done with animated meshes (unless you're doing skeletal animation with GPU skinning).
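
To put rough numbers on point 2, here is a small sketch of the arithmetic. It assumes no mipmaps and uses DXT1 (S3TC) as the example compressed format, which stores each 4x4 pixel block in 8 bytes; the function names are made up for illustration.

```cpp
#include <cstddef>

// Bytes for an uncompressed texture without mipmaps.
constexpr std::size_t uncompressedBytes(std::size_t w, std::size_t h,
                                        std::size_t bytesPerPixel) {
    return w * h * bytesPerPixel;
}

// Bytes for a DXT1 texture without mipmaps: 8 bytes per 4x4 block,
// with both dimensions rounded up to whole blocks.
constexpr std::size_t dxt1Bytes(std::size_t w, std::size_t h) {
    return ((w + 3) / 4) * ((h + 3) / 4) * 8;
}

// Worked example for a single 512x512 texture.
static_assert(uncompressedBytes(512, 512, 4) == 1048576, "32-bit color: 1 MiB");
static_assert(uncompressedBytes(512, 512, 2) == 524288, "16-bit color: half of that");
static_assert(dxt1Bytes(512, 512) == 131072, "DXT1: one eighth of the 32-bit size");
```

So a jpg that is small on disk still expands to the full uncompressed size once loaded, unless it is uploaded in a compressed GPU format.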

Probably something I've forgotten there too, I'm not an expert by any means.

virus · Post by **virus** » Sat Dec 03, 2005 1:01 pm

Yea, 5000 nodes is a lot and the amount was just to stress test the engine, and it clearly pointed out the main problems

The idea is that 5000 randomly positioned nodes in such a small space will occlude each other a lot. With correct culling this could still be possible. Of course calculating physics/AI/etc will all take its toll, but at least you have more time to do other stuff and still retain playable FPS

I noticed IrrlichtNX++ isn't developed anymore and I did check Lf but didn't go into any more detail about it. The features you listed are pretty much what I proposed for Irrlicht. I'll certainly check Lf later. VBOs were the second thing I corrected with Irrlicht

I haven't checked the memory usage much, but I assume the main problems are the storing of vertex and texture information. For example, if you have IrrSpintz 8 texture support you'll store 8 texture coordinates, color, position etc per vertex, which is a great waste if they aren't used.
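
A quick sketch of how much that can add up to. `FatVertex` and `LeanVertex` are hypothetical layouts invented for this comparison, not the actual Irrlicht or IrrSpintz vertex classes, and the sizes assume 4-byte floats with no padding, which holds on common platforms.

```cpp
#include <cstddef>
#include <cstdint>

struct Vec2 { float u, v; };
struct Vec3 { float x, y, z; };

// Hypothetical "store everything" layout: position, normal, color and
// eight texture coordinate sets.
struct FatVertex {
    Vec3 position;
    Vec3 normal;
    std::uint32_t color;
    Vec2 texCoords[8];
};

// Hypothetical lean layout for a mesh with one texture and no vertex colors.
struct LeanVertex {
    Vec3 position;
    Vec3 normal;
    Vec2 texCoord;
};

// Sizes under the stated assumptions (4-byte float, no padding).
static_assert(sizeof(FatVertex) == 92, "12 + 12 + 4 + 8 * 8 bytes");
static_assert(sizeof(LeanVertex) == 32, "12 + 12 + 8 bytes");

// Total bytes for a vertex buffer of the given size.
constexpr std::size_t meshBytes(std::size_t vertexCount, std::size_t vertexSize) {
    return vertexCount * vertexSize;
}
```

For 10,000 vertices that is 920,000 bytes against 320,000 bytes, so nearly two thirds of the vertex memory would be spent on attributes the mesh never uses.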

Guest · Post by **Guest** » Sat Dec 03, 2005 5:53 pm

Lightfeather may be faster internally, but on Windows at least it will be SLOWER for many "older" systems because it only uses OpenGL. Poor OGL support on *OLDER* Windows systems means DX7/8 is a must. They have no plans to put in DX7/8, and DX9 (eventually) is too "high" a level to require in all but famous retail games (if you want widespread compatibility with older systems, as many indie game developers do).

Eternl Knight · Post by **Eternl Knight** » Sat Dec 03, 2005 9:25 pm

Just a small note - DX7/8 on older machines is not guaranteed to speed up anything either. With older machines, you take the risk that the drivers are not up to par for anything (or that there is even a 3D card present).

As such, DX7/8 is not a "required" feature but a "desired" one as it MIGHT make things faster.

--EK