Code:
Each sample counts as 0.01 seconds.
     % cumulative     self               self    total
  time    seconds  seconds     calls  ms/call  ms/call  name
 29.74       0.43     0.43   2097148     0.00     0.00  irr::core::CMatrix4<float>::operator==(irr::core::CMatrix4<float> const&) const
 11.89       0.60     0.17                              irr::scene::SMesh::getMeshBuffer(unsigned int) const
 10.49       0.75     0.15   2097148     0.00     0.00  irr::video::SMaterialLayer::operator!=(irr::video::SMaterialLayer const&) const
  9.10       0.88     0.13   3145748     0.00     0.00  irr::core::aabbox3d<float>::addInternalPoint(float, float, float)
  8.05       0.99     0.12                              irr::video::SMaterial::operator!=(irr::video::SMaterial const&) const
  5.60       1.07     0.08   1835016     0.00     0.00  irr::core::vector3d<float>::vector3d(float, float, float)
  4.90       1.14     0.07    262145     0.00     0.00  irr::IReferenceCounted::drop() const
  2.45       1.18     0.04                              irr::core::vector3d<float>::operator*=(irr::core::vector3d<float> const&)
  2.10       1.21     0.03                              irr::scene::SMesh::getMeshBufferCount() const
  1.40       1.23     0.02   2097148     0.00     0.00  irr::video::SColor::operator!=(irr::video::SColor const&) const
  1.40       1.25     0.02    262144     0.00     0.00  irr::scene::IMesh::~IMesh()
  1.40       1.27     0.02         1    20.01   140.07  chunk::createMesh()
  1.05       1.28     0.02   2883584     0.00     0.00  irr::core::array<irr::scene::IMeshBuffer*, irr::core::irrAllocator<irr::scene::IMeshBuffer*>
That takes about 5000 ms with profiling enabled (rounded up; 3550 ms without profiling) to generate the full 3.14M tris, using roughly 350 MB of RAM.
Here's the relevant code for generating the mesh data:
Code:
ISceneNode *node = smgr->addCubeSceneNode(1.0f, 0, -1, core::vector3df(x,y,z));
node->setMaterialFlag(video::EMF_WIREFRAME, true);
The sad part is that the extremely naive and silly method of generating 262144 cube scene nodes is still faster than this:
Code:
video::SColor c(255, rand() % 256, rand() % 256, rand() % 256);
//SNIP
video::S3DVertex vertices[24] =
{
    // Up
    video::S3DVertex(-2,+2,-2, 0,1,0, c, 0,1),
    video::S3DVertex(-2,+2,+2, 0,1,0, c, 0,0),
    video::S3DVertex(+2,+2,+2, 0,1,0, c, 1,0),
    video::S3DVertex(+2,+2,-2, 0,1,0, c, 1,1),
    // Down
    video::S3DVertex(-2,-2,-2, 0,-1,0, c, 0,0),
    video::S3DVertex(+2,-2,-2, 0,-1,0, c, 1,0),
    video::S3DVertex(+2,-2,+2, 0,-1,0, c, 1,1),
    video::S3DVertex(-2,-2,+2, 0,-1,0, c, 0,1),
    // Right
    video::S3DVertex(+2,-2,-2, 1,0,0, c, 0,1),
    video::S3DVertex(+2,+2,-2, 1,0,0, c, 0,0),
    video::S3DVertex(+2,+2,+2, 1,0,0, c, 1,0),
    video::S3DVertex(+2,-2,+2, 1,0,0, c, 1,1),
    // Left
    video::S3DVertex(-2,-2,-2, -1,0,0, c, 1,1),
    video::S3DVertex(-2,-2,+2, -1,0,0, c, 0,1),
    video::S3DVertex(-2,+2,+2, -1,0,0, c, 0,0),
    video::S3DVertex(-2,+2,-2, -1,0,0, c, 1,0),
    // Back
    video::S3DVertex(-2,-2,+2, 0,0,1, c, 1,1),
    video::S3DVertex(+2,-2,+2, 0,0,1, c, 0,1),
    video::S3DVertex(+2,+2,+2, 0,0,1, c, 0,0),
    video::S3DVertex(-2,+2,+2, 0,0,1, c, 1,0),
    // Front
    video::S3DVertex(-2,-2,-2, 0,0,-1, c, 0,1),
    video::S3DVertex(-2,+2,-2, 0,0,-1, c, 0,0),
    video::S3DVertex(+2,+2,-2, 0,0,-1, c, 1,0),
    video::S3DVertex(+2,-2,-2, 0,0,-1, c, 1,1),
};
u16 indices[6] = {0,1,2,2,3,0};
scene::SMesh *mesh = new scene::SMesh();
for (u32 i=0; i<6; ++i)
{
    scene::IMeshBuffer *buf = new scene::SMeshBuffer();
    buf->append(vertices + 4 * i, 4, indices, 6);
    // Set default material
    buf->getMaterial().setFlag(video::EMF_LIGHTING, false);
    buf->getMaterial().setFlag(video::EMF_BILINEAR_FILTER, false);
    buf->getMaterial().MaterialType = video::EMT_TRANSPARENT_ALPHA_CHANNEL_REF;
    // Add mesh buffer to mesh
    mesh->addMeshBuffer(buf);
    buf->drop();
//SNIP
scene::SAnimatedMesh *anim_mesh = new scene::SAnimatedMesh(mesh);
mesh->drop();
scaleMesh(anim_mesh, scale); // also recalculates bounding box
ISceneNode * Chunk = smgr->addMeshSceneNode(anim_mesh);
Looking over the profiling analysis of this algorithm, I can already tell why it's so bloody inefficient.
Code:
 14.03       2.42     2.42  18874364     0.00     0.00  irr::core::CMatrix4<float>::operator==(irr::core::CMatrix4<float> const&) const
  7.38       5.36     1.27  40866384     0.00     0.00  irr::core::irrAllocator<irr::core::CMatrix4<float> >::irrAllocator()
  5.34       7.22     0.92  18874364     0.00     0.00  irr::video::SMaterialLayer::operator!=(irr::video::SMaterialLayer const&) const
  3.95       9.51     0.68  34574928     0.00     0.00  irr::video::SMaterialLayer::~SMaterialLayer()
  3.31      10.08     0.57  53449312     0.00     0.00  irr::core::irrAllocator<irr::core::CMatrix4<float> >::internal_delete(void*)
  2.27      10.47     0.39  24372313     0.00     0.00  operator new(unsigned long, void*)
  2.00      10.82     0.35  34574928     0.00     0.00  irr::core::irrAllocator<irr::core::CMatrix4<float> >::~irrAllocator()
  1.86      11.14     0.32  34574928     0.00     0.00  irr::core::irrAllocator<irr::core::CMatrix4<float> >::deallocate(irr::core::CMatrix4<float>*)
  1.74      11.44     0.30  34574928     0.00     0.00  irr::core::irrAllocator<irr::core::CMatrix4<float> >::destruct(irr::core::CMatrix4<float>*)
That's a lot of allocation calls, a lot of matrices, and seemingly a whole ton of insane over-processing*.
*Note: the entries in this log are hand-picked from the full analysis to showcase the worst parts.
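To put the two profiles side by side, here is a small sketch that normalizes the `CMatrix4<float>::operator==` call counts per cube. It assumes the first profile is the cube scene node run, the second is the hand-built mesh run, and both cover the same 262144-cube chunk; the constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Calls per cube for a given call count taken from a gprof flat profile.
constexpr double callsPerCube(std::uint64_t calls, std::uint64_t cubes) {
    return static_cast<double>(calls) / static_cast<double>(cubes);
}

// Assumption: both runs generated the same 64^3 = 262144 cubes.
constexpr std::uint64_t kCubes = 262144;

// CMatrix4<float>::operator== call counts copied from the two profiles.
constexpr double kCubeNodeCalls   = callsPerCube(2097148, kCubes);   // about 8 per cube
constexpr double kCustomMeshCalls = callsPerCube(18874364, kCubes);  // about 72 per cube

// How many times more matrix comparisons the hand-built path performs.
constexpr double kRatio = kCustomMeshCalls / kCubeNodeCalls;         // about 9
```

Under those assumptions the hand-built path does roughly nine times as many matrix comparisons per cube, which fits with it creating six separate `SMeshBuffer` objects (each with its own material) for every cube.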
So, the question is: what is a better way to generate these meshes cleanly?
Memory efficiency matters more than runtime speed. As long as generation takes no more than ~5 seconds, the runtime is acceptable (I reckon generating an optimized mesh will actually be orders of magnitude faster).
Ideally I'd like to use less than 40 MB for the mesh data, though this might be impossible with the naive approach of filling the entire volume with blocks (is it?).
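As a rough sanity check on the 40 MB target, here is a back-of-the-envelope estimate. It is only a sketch: it assumes a standard vertex of 36 bytes (3 floats position, 3 floats normal, a 32-bit color, 2 floats UV) and 24 vertices plus 36 indices per cube, as in the vertex table above. `MeshCost` and `naiveCubeCost` are hypothetical names, not Irrlicht API.

```cpp
#include <cstdint>

// Assumed vertex layout: position (3 floats) + normal (3 floats)
// + 32-bit color + one UV pair (2 floats) = 36 bytes.
constexpr std::uint64_t kBytesPerVertex = 3 * 4 + 3 * 4 + 4 + 2 * 4;

// Raw geometry size in bytes, split into vertex and index data.
struct MeshCost {
    std::uint64_t vertexBytes;
    std::uint64_t indexBytes;
};

// Cost of filling a volume with full cubes: 6 faces * 4 vertices and
// 6 faces * 2 triangles * 3 indices per cube, no hidden-face removal.
constexpr MeshCost naiveCubeCost(std::uint64_t cubes, std::uint64_t bytesPerIndex) {
    return MeshCost{cubes * 24 * kBytesPerVertex, cubes * 36 * bytesPerIndex};
}
```

For 262144 cubes with 16-bit indices this gives 226492416 bytes of vertices (216 MiB) and 18874368 bytes of indices (18 MiB). If the layout assumption holds, the vertex data alone is several times over 40 MB, so the target looks out of reach for a full naive fill regardless of how the buffers are organized.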
Really, anything faster than the cube scene node method would be fine, preferably by building the mesh myself rather than calling a helper function, since I'll want to generate fewer faces later (which is why I'm trying, and so far failing, to write my own function).
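One way to picture "building the mesh myself" is to append every cube face into a single shared vertex/index array instead of creating six mesh buffers per cube. The sketch below is illustrative only: `Vertex`, `ChunkBuffer`, `appendCube` and `buildChunk` are hypothetical stand-ins written in plain C++, not Irrlicht types, and the face table reuses the corner order and UVs of the vertex table above, normalized to ±1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for video::S3DVertex; not an Irrlicht type.
struct Vertex {
    float x, y, z;        // position
    float nx, ny, nz;     // normal
    std::uint32_t color;  // packed ARGB
    float u, v;           // texture coordinates
};

// Hypothetical stand-in for one mesh buffer with 32-bit indices.
struct ChunkBuffer {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
};

struct Corner { float x, y, z, u, v; };
struct Face { float nx, ny, nz; Corner corners[4]; };

// Six faces of a cube spanning -1..+1 on each axis.
static const Face kFaces[6] = {
    { 0, 1, 0, {{-1,+1,-1, 0,1}, {-1,+1,+1, 0,0}, {+1,+1,+1, 1,0}, {+1,+1,-1, 1,1}}},  // Up
    { 0,-1, 0, {{-1,-1,-1, 0,0}, {+1,-1,-1, 1,0}, {+1,-1,+1, 1,1}, {-1,-1,+1, 0,1}}},  // Down
    { 1, 0, 0, {{+1,-1,-1, 0,1}, {+1,+1,-1, 0,0}, {+1,+1,+1, 1,0}, {+1,-1,+1, 1,1}}},  // Right
    {-1, 0, 0, {{-1,-1,-1, 1,1}, {-1,-1,+1, 0,1}, {-1,+1,+1, 0,0}, {-1,+1,-1, 1,0}}},  // Left
    { 0, 0, 1, {{-1,-1,+1, 1,1}, {+1,-1,+1, 0,1}, {+1,+1,+1, 0,0}, {-1,+1,+1, 1,0}}},  // Back
    { 0, 0,-1, {{-1,-1,-1, 0,1}, {-1,+1,-1, 0,0}, {+1,+1,-1, 1,0}, {+1,-1,-1, 1,1}}},  // Front
};

// Append one cube centered at (cx, cy, cz) to the shared buffer.
void appendCube(ChunkBuffer &buf, float cx, float cy, float cz,
                float half, std::uint32_t color) {
    static const std::uint32_t kQuad[6] = {0, 1, 2, 2, 3, 0};
    for (const Face &f : kFaces) {
        // Indices of this face are offset by the current vertex count.
        const std::size_t base = buf.vertices.size();
        assert(base <= 0xFFFFFFFFu - 4);  // must still fit in 32-bit indices
        for (const Corner &c : f.corners) {
            buf.vertices.push_back({cx + c.x * half, cy + c.y * half, cz + c.z * half,
                                    f.nx, f.ny, f.nz, color, c.u, c.v});
        }
        for (std::uint32_t i : kQuad) {
            buf.indices.push_back(static_cast<std::uint32_t>(base) + i);
        }
    }
}

// Fill an n*n*n volume with unit cubes, all in ONE buffer.
ChunkBuffer buildChunk(int n) {
    ChunkBuffer buf;
    const std::size_t cubes = static_cast<std::size_t>(n) * n * n;
    buf.vertices.reserve(cubes * 24);  // reserve once up front
    buf.indices.reserve(cubes * 36);
    for (int z = 0; z < n; ++z)
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x)
                appendCube(buf, float(x), float(y), float(z), 0.5f, 0xFFFFFFFFu);
    return buf;
}
```

Calling `buildChunk(4)` yields 1536 vertices and 2304 indices in two contiguous arrays, with one material's worth of state instead of one per face. Skipping hidden faces later would only mean not calling the per-face append for them. Note that the `u16` index type used in the code above can address at most 65536 vertices per buffer, so a full 64³ fill (about 6.29 million vertices) would need either many buffers or 32-bit indices. I believe Irrlicht's `scene::CDynamicMeshBuffer` can be created with `video::EIT_32BIT` for that, but check the API docs for your version.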
Recap: I'm not looking for advice on optimizing the generated geometry. I know full well that generating 262144 cubes (3.14M tris) per chunk isn't a viable long-term strategy, but right now I'm primarily concerned with getting something running so I can start implementing other features. Mesh optimizations are further down the line than other things like paging.