I have a few theories as to why this is, and I'm looking for information on which of them may or may not be the case:
Theory one: Irrlicht renders each mesh buffer as a separate draw call, and I have thousands of mesh buffers per mesh scene node.
Theory two: Irrlicht stores a copy of the material assigned to each mesh buffer and then renders the copies as separate draw calls (functionally identical to theory one, but semantically different).
Theory three: my mesh somehow doesn't agree with Irrlicht.
Theory four: I am vertex bottlenecked at 3 verts per primitive; quads would ease the vertex bottleneck at the cost of fragment bottlenecking.
Theory five: Irrlicht really doesn't agree with my graphics driver (I find this unlikely).
Theory six: Irrlicht is terribly slow on Linux.
Theory seven: Irrlicht is terribly slow in general (poor optimization?).
They are ranked by what I'd wager their probabilities are. If each mesh buffer counts as one draw call, then what's the maximum vertex count for a mesh buffer? (And can a mesh buffer have several materials attached to different primitives?)
If theory two is true, how would one inhibit this behavior?
If theory three is true, I am at a loss as to how to solve it.
If theory four is true, that's a reasonably simple fix.
If theory five is true, I'll just have to hack together my own OpenGL renderer (not an additional workload I'm looking forward to in this case).
Theories six and seven: see theory five.
I wager most of you will be able to identify which theory is the correct one, and possibly point out other reasons why my renders are so slow (the 15 FPS number was picked because that's what my 10k scenes run at; 10x slower than MC is unacceptable).
Any other information on how to speed up renders is greatly appreciated.
Here's my main loop; it's not particularly cluttered:
Code:
while(device->run())
{
    // Poll the chunk manager once every 20 frames.
    if(ticks == 20)
    {
        cMgr->poll();
        ticks = 0;
    }
    ++ticks;

    driver->beginScene(true, true, SColor(255,140,255,140));
    smgr->drawAll();
    guienv->drawAll();

    // Show driver name, version and triangle count in the window caption.
    stringw str = L"Hexahedron World [";
    str += driver->getName();
    str += "] VERSION:";
    str += VERSION;
    str += " Triangles: ";
    str += driver->getPrimitiveCountDrawn();
    device->setWindowCaption(str.c_str());

    driver->endScene();
}
Code:
void chunkMgr::poll()
{
    //unloadList();
    //loadList();
    buildList();
    //updateList();
}
Code:
void chunkMgr::buildList()
{
    // Flag every slot that has no chunk loaded yet, then build the flagged ones.
    for(int index = 0; index != indx; index++)
    {
        if(loadID.at(index) == NULL)
        {
            bList.at(index) = true;
        }
        else
        {
            //printf("Not building local chunk #%d (already built?)\n",index);
            bList.at(index) = false;
        }
    }
    build();
}
This happens every 20 frames.
Let's check build() for completeness:
Code:
void chunkMgr::build()
{
    // Create a chunk for every flagged slot and clear the flag.
    for(int index = 0; index != indx; index++)
    {
        if(bList.at(index) == true)
        {
            loadID.at(index) = new chunk(index);
            bList.at(index) = false;
        }
    }
}
Overall, this might add a millisecond or two at most to the frame in question.
In other words: my rendering loop is not at fault.
But for the sake of argument, let's get the exact triangle count and the correct mesh buffer count:
- Irrlicht Engine version 1.8.1
Linux 4.0.5-1-ARCH #1 SMP PREEMPT Sat Jun 6 18:37:49 CEST 2015 x86_64
Creating X window...
Visual chosen: : 133
Using renderer: OpenGL 2.1
Mesa DRI Intel(R) Ironlake Mobile : Intel Open Source Technology Center
OpenGL driver version is 1.2 or better.
GLSL version: 1.2
Triangles: 10296
858 mesh buffers
FPS: 30 (avg), 31 (peak), 5 (low, only during the first few ms) [as reported by driver->getFPS()]
- Triangles: 82368
6864 mesh buffers
FPS: 4 (avg) 8 (peak) 1 (low, only during the first few ms) [as reported by driver->getFPS()]
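To make the two measurements above easier to compare, here is a small back-of-the-envelope sketch. It only divides the reported numbers; the function names are made up for illustration, and the per-buffer cost assumes (which may well be wrong) that the whole frame is spent submitting mesh buffers.

```cpp
#include <cstdio>

// Average frame time in milliseconds for a given FPS figure.
double frameTimeMs(double fps)
{
    return 1000.0 / fps;
}

// Average time per mesh buffer in microseconds, under the (hypothetical)
// assumption that the entire frame is spent submitting mesh buffers.
double microsPerBuffer(double fps, int meshBuffers)
{
    return frameTimeMs(fps) * 1000.0 / meshBuffers;
}

// Average number of triangles held by each mesh buffer.
double trianglesPerBuffer(int triangles, int meshBuffers)
{
    return static_cast<double>(triangles) / meshBuffers;
}

// Print both measured scenes side by side.
void printReport()
{
    std::printf("small scene: %.1f us/buffer, %.0f tris/buffer\n",
                microsPerBuffer(30.0, 858), trianglesPerBuffer(10296, 858));
    std::printf("large scene: %.1f us/buffer, %.0f tris/buffer\n",
                microsPerBuffer(4.0, 6864), trianglesPerBuffer(82368, 6864));
}
```

Both scenes work out to 12 triangles per mesh buffer, and to roughly 39 and 36 microseconds per buffer respectively. A near-constant cost per buffer is what one would expect if the buffer count, rather than the triangle count, dominates the frame time, though this alone doesn't prove it.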
Other baseline values that aren't 15 FPS:
TES IV: Oblivion, medium/low settings: 30ish FPS (avg), 70 (peak, mostly in empty, well-lit areas), 10 (low, with litebrite enabled in dark, dusky areas with a lot of shadows)
Fallout 3 and Fallout: New Vegas, medium settings: 45 FPS (avg), 70ish (peak), 30 (low)
Half-Life 2, medium settings: 30 FPS (avg), no other data collected. (The low is terribly low, that's the best estimate I can give; the high seems to be about the same as the average. The game is really stable with its FPS; the only times I get dips are with a lot of explosions and on the title screen, which uses a fancy water shader.)
This joke of a GPU is, again, really not good, but it does hold up to some abuse. The values I get are far too low to be explained by the GPU alone; there's a clear bottleneck somewhere.
I intend to find it, or I'll be forced to hack together a rendering engine of my own, as these FPS values (with naive optimization) aren't even remotely in the target range. 80k tris is a reasonable render count for a much larger render distance than 2 chunks in each direction; either way, 80k tris is not that much. Even the PS1 could push that if you really wanted to torture the thing.
As theories one and two state, this would indicate 6864 draw calls instead of a more realistic 16 or so (assuming two mesh buffers per chunk at 8 chunks). If this is indeed the case, then the solution is reasonably obvious: use fewer mesh buffers.
The issue is: how does one merge mesh buffers?
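For reference, the core of merging is just concatenating the vertex arrays and offsetting the appended indices. The sketch below shows that step with plain standalone types; `Vertex`, `MeshBuffer` and `appendBuffer` are hypothetical stand-ins, not Irrlicht classes. I believe Irrlicht's own mesh buffer class has an `append()` that takes raw vertex and index arrays, but that should be checked against the headers of the version in use. It also assumes 16-bit indices and that both buffers share the same material.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for an engine's vertex and mesh buffer types.
struct Vertex
{
    float x, y, z;
};

struct MeshBuffer
{
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices; // 16-bit: at most 65536 vertices
};

// Append src to dst (they must be distinct objects). Returns false and
// leaves dst untouched if the result would not fit 16-bit indices.
bool appendBuffer(MeshBuffer& dst, const MeshBuffer& src)
{
    const std::size_t base = dst.vertices.size();
    if (base + src.vertices.size() > 65536)
        return false;

    // Vertices are copied as-is.
    dst.vertices.insert(dst.vertices.end(),
                        src.vertices.begin(), src.vertices.end());

    // Indices must be shifted by the number of vertices already in dst.
    dst.indices.reserve(dst.indices.size() + src.indices.size());
    for (std::uint16_t i : src.indices)
        dst.indices.push_back(static_cast<std::uint16_t>(base + i));

    return true;
}
```

With 12 triangles per cube, a whole chunk's worth of cubes that use one material could go into a single buffer this way, as long as the 65536-vertex limit of 16-bit indices isn't exceeded.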
On that matter, perhaps it's a better idea to just build the entire mesh in a vertex shader. Given that I upload it to the GPU anyway with VBOs, this would seemingly skip an entire step that could potentially bottleneck the whole thing.
Especially since I'll have to use a vertex shader to do greedy meshing, one could assume that it'd make more sense to build the entire mesh from scratch (or indeed, build it entirely from voxel metadata, since that data is strictly speaking all that's needed to run the optimization algorithm).
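For comparison, this is roughly what greedy meshing looks like when done on the CPU rather than in a shader: a minimal sketch for a single 2D slice of visible faces. `Rect` and `greedyRects` are made-up names, and the mask layout (row-major, nonzero means "visible face of the same type") is an assumption for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One merged rectangle of faces, in cell coordinates.
struct Rect
{
    int x, y, w, h;
};

// Greedily cover all set cells of a row-major width*height mask with
// rectangles. The mask is taken by value so covered cells can be cleared.
std::vector<Rect> greedyRects(std::vector<char> mask, int width, int height)
{
    assert(mask.size() == static_cast<std::size_t>(width) * height);
    std::vector<Rect> rects;

    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            if (!mask[y * width + x])
                continue;

            // Grow to the right as far as the cells are set.
            int w = 1;
            while (x + w < width && mask[y * width + x + w])
                ++w;

            // Grow downwards while the whole span of the next row is set.
            int h = 1;
            bool rowFull = true;
            while (y + h < height && rowFull)
            {
                for (int k = 0; k < w; ++k)
                {
                    if (!mask[(y + h) * width + x + k])
                    {
                        rowFull = false;
                        break;
                    }
                }
                if (rowFull)
                    ++h;
            }

            // Clear the covered cells so they are not emitted again.
            for (int dy = 0; dy < h; ++dy)
                for (int dx = 0; dx < w; ++dx)
                    mask[(y + dy) * width + x + dx] = 0;

            rects.push_back({x, y, w, h});
        }
    }
    return rects;
}
```

Each returned rectangle would become one quad (two triangles) instead of two triangles per cell, so a fully covered 16x16 slice collapses from 512 triangles to 2.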
Other potential issues I see: I was told one cannot allocate a node in the middle of a frame. Does this also go for creating mesh buffers or meshes? Or is that fine so long as they aren't attached to a node until the frame has rendered?
If this is indeed the case, what about modifying a mesh attached to a node mid-frame? Is this also a no-no?
Or to simplify: which mesh and node operations can and can't be done while a frame is being rendered? (This is, of course, vitally important, since I'm going to generate chunks in a multithreaded fashion.)
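Until that is answered, one conservative pattern would be to let worker threads produce plain data only, and have the main thread turn it into meshes and nodes between frames. Below is a minimal sketch of such a hand-off; `ChunkData` and `ChunkHandoff` are hypothetical names, and it rests on the assumption (not confirmed here) that engine objects should only be touched from the thread that runs the render loop.

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical result of generating one chunk on a worker thread:
// plain data only, no engine objects.
struct ChunkData
{
    int chunkIndex;
    std::vector<float> positions; // x, y, z triples
};

// Thread-safe mailbox: workers push finished chunks, the render thread
// drains them between frames.
class ChunkHandoff
{
public:
    // Called from worker threads when a chunk's data is complete.
    void push(ChunkData data)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ready_.push_back(std::move(data));
    }

    // Called from the render thread. Returns everything queued so far
    // and leaves the mailbox empty.
    std::vector<ChunkData> drain()
    {
        std::vector<ChunkData> out;
        std::lock_guard<std::mutex> lock(mutex_);
        out.swap(ready_);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<ChunkData> ready_;
};
```

In the main loop, `drain()` would be called before `beginScene()` (for instance where `cMgr->poll()` is called now), so that meshes and nodes are only ever created or changed outside of a frame.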
To reiterate, since it was mentioned way up: any general advice on speeding up Irrlicht in addition to what I asked would be greatly appreciated. I need to squeeze out every bit of performance I can.