Saga3D - Re-creating Irrlicht with Vulkan

mant · Post by **mant** » Fri Nov 09, 2018 2:44 am

Hello everyone, today I want to announce an open-source project.
My goal is to make a graphics library based on Irrlicht's source code and to modernize everything I can.

First, here is a topic demonstrating some of my work with the original Irrlicht:
http://irrlicht.sourceforge.net/forum/v ... =6&t=52165

Reason: Irrlicht is too slow for my goal (especially with animated models: it only handles 10-20 characters, and its limited graphics pipeline makes good graphics impossible), so I want to create my own graphics engine. More explanation and the design philosophy are in the repo's Readme file.

Saga3D Repository: https://gitlab.com/InnerPieceOSS/Saga3D
Mailing list: https://groups.google.com/forum/#!forum/innerpiece_oss
Discord chat: https://discord.gg/QdY7tuJ
Sample Applications: https://gitlab.com/InnerPieceOSS/Saga3D ... er/samples

I would like to hear your recommendations, ideas, or collaboration offers.
Have a nice day!

Major changes compared to original Irrlicht:
- Uses the SDL2 device only
- STL containers
- Assimp & STB_Image for loading resources
- Simple & Flexible API
- Android Vulkan port is in progress and partially functional (window clearing works)

mant · Post by **mant** » Tue Nov 20, 2018 5:49 am

An Android port is in progress: https://groups.google.com/forum/#!topic ... YWwXmiSyno

mant · Post by **mant** » Tue Nov 27, 2018 2:31 am

The camera scene node is ready; next up are the FPS camera and texture work: texturing, render targets, ...

mant · Post by **mant** » Wed Nov 28, 2018 3:43 pm

Initial Android port

robmar · Post by **robmar** » Wed Nov 28, 2018 9:17 pm

Looks like you've been making some interesting progress. The way I see it, though, adding another driver may not be the best approach when the bgfx project has an Irrlicht version which includes DX12, and Vulkan is underway.

EZ Vulkan would also make the process easier, if one were to be added, but Vulkan is really aimed at parallel processing, and for that to be useful with Irrlicht, Irrlicht would need a major redesign, because it was designed to be single-threaded.

Still, a lot of people would be interested in seeing a Vulkan driver for Irrlicht, as it runs fast and on more devices than any other driver, as far as I know.

mant · Post by **mant** » Thu Nov 29, 2018 2:28 am

Over 90% of the Irrlicht code has been rewritten with a modern approach; I think I only keep the overall shape of Irrlicht for the project.
And yeah, there'll be a development phase focusing on parallelism.

robmar · Post by **robmar** » Thu Nov 29, 2018 7:34 pm

Irrlicht has no structure for CPU or GPU parallel processing. To a large extent it can be used as an API, leaving users to handle multitasking, but without DX12 or Vulkan there is no GPU multitasking. How useful is that anyway, considering most people have one GPU, and that's usually where the bottleneck occurs?

devsh · Post by **devsh** » Fri Nov 30, 2018 5:08 am

It is very useful, because a single GPU now has many command dispatch schedulers (Nvidia is 2-4 cores per scheduler), which allows for async compute or async rendering in general.

Independent tasks can be executed simultaneously with other tasks that would otherwise undersaturate the GPU.

The drawing commands you send to the GPU don't actually execute in the exact order they are submitted; they can overlap, etc.

There are also DMA copy engines, which are asynchronous cores: you can update buffers while the GPU is rendering or doing compute.
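The DMA copy engines usually show up in Vulkan as a queue family that supports transfers but neither graphics nor compute. A minimal sketch of choosing such a family for uploads follows; the bit values match Vulkan's `VK_QUEUE_GRAPHICS_BIT`, `VK_QUEUE_COMPUTE_BIT` and `VK_QUEUE_TRANSFER_BIT`, while `pickUploadQueueFamily` and its plain `uint32_t` input are hypothetical (real code would read `queueFlags` from `vkGetPhysicalDeviceQueueFamilyProperties`).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Same bit values as VkQueueFlagBits in the Vulkan headers.
constexpr uint32_t kQueueGraphics = 0x1;
constexpr uint32_t kQueueCompute  = 0x2;
constexpr uint32_t kQueueTransfer = 0x4;

// Hypothetical helper: returns the index of a queue family suited for
// asynchronous uploads. Prefers a transfer-only family (typically backed by a
// DMA copy engine), then a compute family without graphics (async compute),
// and finally any family that can do transfers at all.
std::optional<uint32_t> pickUploadQueueFamily(const std::vector<uint32_t>& familyFlags)
{
    const uint32_t n = static_cast<uint32_t>(familyFlags.size());
    // 1) Transfer-only: no graphics, no compute.
    for (uint32_t i = 0; i < n; ++i)
        if ((familyFlags[i] & kQueueTransfer) &&
            !(familyFlags[i] & (kQueueGraphics | kQueueCompute)))
            return i;
    // 2) Compute without graphics; such queues also accept transfer commands.
    for (uint32_t i = 0; i < n; ++i)
        if ((familyFlags[i] & kQueueCompute) && !(familyFlags[i] & kQueueGraphics))
            return i;
    // 3) Anything that supports transfers (graphics and compute queues implicitly do).
    for (uint32_t i = 0; i < n; ++i)
        if (familyFlags[i] & (kQueueTransfer | kQueueGraphics | kQueueCompute))
            return i;
    return std::nullopt;
}
```

For example, a device reporting the families `{graphics|compute|transfer, compute|transfer, transfer}` would get family 2, the likely copy engine. Which hardware engine backs a given family is up to the driver, so this is a heuristic rather than a guarantee.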

mant · Post by **mant** » Tue Dec 04, 2018 9:48 am

devsh wrote:It is very useful because a single GPU now has many command dispatch schedulers (Nvidia is 2-4 cores per scheduler), which allows for async compute or async rendering in general.

Thanks for the info, I'll combine this with my future research on GPU parallelism to design Saga3D's multi-threaded API.

mant · Post by **mant** » Tue Dec 04, 2018 9:49 am

The mesh loading and texturing sample is ready.
https://gitlab.com/InnerPieceOSS/Saga3D ... h/main.cpp

robmar · Post by **robmar** » Wed Dec 05, 2018 11:19 am

@devsh So the render process runs: meshes are loaded to the CPU and GPU, textures are loaded/updated on the GPU, animation values are updated on the CPU, and render commands are issued to the GPU. The GPU starts rendering and the shader units access the meshes and textures, so we can't update those meshes or textures until the rendering has finished, but we can upload any new meshes and textures...

After drawAll, endScene exits after all rendering has finished and the frame has been presented.

So parallel updates to the GPU, while the main thread is rendering, would happen in another thread; in fact, we could use one thread to load meshes, another to load textures, etc.
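robmar's idea of dedicated loader threads could look roughly like this for the CPU side (file parsing and decoding). `MeshData`, `TextureData`, `loadMesh`, `loadTexture` and `loadInParallel` are hypothetical stand-ins, not Irrlicht or Saga3D API, and getting the results into GPU memory is a separate step, which devsh's reply addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <string>
#include <vector>

// Hypothetical CPU-side resources; a real engine would parse files here.
struct MeshData    { std::string name; std::vector<float> vertices; };
struct TextureData { std::string name; std::vector<uint8_t> pixels; };

MeshData loadMesh(const std::string& name)
{
    return {name, std::vector<float>(3 * 1024, 0.0f)};       // stand-in for file parsing
}

TextureData loadTexture(const std::string& name)
{
    return {name, std::vector<uint8_t>(256 * 256 * 4, 255)}; // stand-in for image decoding
}

struct LoadedAssets { std::vector<MeshData> meshes; std::vector<TextureData> textures; };

// One worker loads all meshes while another loads all textures; the calling
// (main) thread only blocks when it actually needs the results.
LoadedAssets loadInParallel(const std::vector<std::string>& meshNames,
                            const std::vector<std::string>& textureNames)
{
    auto meshJob = std::async(std::launch::async, [&] {
        std::vector<MeshData> out;
        for (const auto& n : meshNames) out.push_back(loadMesh(n));
        return out;
    });
    auto textureJob = std::async(std::launch::async, [&] {
        std::vector<TextureData> out;
        for (const auto& n : textureNames) out.push_back(loadTexture(n));
        return out;
    });
    // ... the main thread could keep rendering here ...
    return {meshJob.get(), textureJob.get()};
}
```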

How do you see Irrlicht parallel processing being best implemented?

mant · Post by **mant** » Wed Dec 05, 2018 1:51 pm

robmar wrote:How do you see Irrlicht parallel processing being best implemented?

I think there's no way for Irrlicht to implement parallelism without breaking most of its APIs.

netpipe · Post by **netpipe** » Thu Dec 06, 2018 10:56 am

Amazing work. I was wondering about using VirtualGL for mobile devices too, or even virtual machines.

devsh · Post by **devsh** » Thu Dec 06, 2018 4:05 pm

robmar wrote:How do you see Irrlicht parallel processing being best implemented?

This is why I've changed the entire Irrlicht API and made a clear distinction between CPU objects and GPU objects (they mirror each other in structure, except that the GPU counterparts are a little less mutable).

Generally this works by using GPU API fences/events. For example, I can prepare buffers (create or modify them) and textures in other threads and have them already sitting in GPU memory; then, in those threads, I create a fence/event in that thread's context/queue and pass it to the main thread, which renders the prepared data.
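The fence hand-off can be modelled on the CPU with `std::promise`/`std::shared_future` standing in for a GPU fence. In this sketch (all type and function names are hypothetical), a loader thread hands the buffer and its fence to the render thread before the upload has finished; the render thread must not touch the buffer's contents until the fence is signalled.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Stand-in for a buffer that lives in GPU memory.
struct GpuBuffer { std::vector<float> contents; };

// Stand-in for a GPU fence/event: it becomes ready when the upload completes.
using Fence = std::shared_future<void>;

struct PreparedResource {
    std::shared_ptr<GpuBuffer> buffer;
    Fence uploadDone;
};

// Thread-safe hand-off from loader threads to the render (main) thread.
class HandoffQueue {
public:
    void push(PreparedResource r) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(r));
    }
    bool tryPop(PreparedResource& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::queue<PreparedResource> queue_;
};

// Loader thread: create the buffer and its fence, hand both to the render
// thread right away, then perform the (simulated) upload and signal the fence.
void loaderThread(HandoffQueue& handoff, std::vector<float> cpuData)
{
    auto buffer = std::make_shared<GpuBuffer>();
    std::promise<void> signal;
    handoff.push({buffer, signal.get_future().share()});
    buffer->contents = std::move(cpuData); // a GPU would copy this on a transfer queue
    signal.set_value();                    // "fence signalled"
}

// Render thread: a resource may only be drawn once its fence has signalled.
bool readyToDraw(const PreparedResource& r)
{
    return r.uploadDone.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

A real driver would create the fence on the loader thread's queue or context (for example a Vulkan `VkFence` or an OpenGL sync object) instead of a promise.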

I've implemented two ways to use a fence that signals the completion of a GPU upload:
1) Wait for the fence to signal (stall if not ready)
2) Skip some drawcalls (fence attached to an ISceneNode), which doesn't stall; this is especially useful for streaming, as it doesn't matter exactly when the objects appear on screen
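The two fence policies can be sketched with a `std::shared_future<void>` standing in for the GPU fence. `SceneNode`, `FencePolicy` and `drawAll` here are hypothetical simplifications, not IrrBAW's actual classes: with `Wait` the loop stalls on an unsignalled fence, while with `Skip` it polls with a zero timeout and leaves the node out of this frame.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <vector>

// Stand-in for a GPU fence attached to a scene node's resources.
using Fence = std::shared_future<void>;

enum class FencePolicy { Wait, Skip };

struct SceneNode {
    int id;
    Fence resourcesReady; // signalled when the node's GPU upload has completed
};

// Returns the ids of the nodes that were actually drawn this frame.
std::vector<int> drawAll(const std::vector<SceneNode>& nodes, FencePolicy policy)
{
    std::vector<int> drawn;
    for (const SceneNode& node : nodes) {
        if (policy == FencePolicy::Wait) {
            node.resourcesReady.wait();   // 1) stall until the upload has finished
        } else if (node.resourcesReady.wait_for(std::chrono::seconds(0)) !=
                   std::future_status::ready) {
            continue;                     // 2) not ready yet: skip, try again next frame
        }
        drawn.push_back(node.id);         // a real engine would record the drawcall here
    }
    return drawn;
}
```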

Vulkan allows far more parallelism: it lets you overlap drawcalls that render into different render targets (let's say shadow-map rendering with a deferred lighting shader) via explicit dependency specification and pipeline barriers. But that is not related to CPU multithreading.
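The idea behind explicit dependency specification can be illustrated without any GPU API: if each pass declares which passes' outputs it reads, passes with no path between them have no ordering constraint and may overlap. The hypothetical `overlapWaves` function below groups passes into such waves with Kahn's topological sort. In Vulkan each declared edge would become a pipeline barrier or subpass dependency, and the driver still decides what actually overlaps.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical render-graph description: each pass lists the passes whose
// output it reads. Every input must itself appear as a key in the map.
using PassGraph = std::map<std::string, std::vector<std::string>>;

// Groups passes into "waves" (Kahn's algorithm, level by level): every pass in
// a wave depends only on passes in earlier waves, so passes within one wave
// have no ordering constraint between them.
std::vector<std::vector<std::string>> overlapWaves(const PassGraph& deps)
{
    std::map<std::string, int> remaining;                  // unmet dependency count
    std::map<std::string, std::vector<std::string>> users; // reverse edges
    for (const auto& [pass, inputs] : deps) {
        remaining[pass] = static_cast<int>(inputs.size());
        for (const auto& in : inputs) users[in].push_back(pass);
    }
    std::vector<std::vector<std::string>> waves;
    std::vector<std::string> current;
    for (const auto& [pass, count] : remaining)
        if (count == 0) current.push_back(pass);
    std::size_t scheduled = 0;
    while (!current.empty()) {
        scheduled += current.size();
        std::vector<std::string> next;
        for (const auto& pass : current)
            for (const auto& user : users[pass])
                if (--remaining[user] == 0) next.push_back(user);
        waves.push_back(std::move(current));
        current = std::move(next);
    }
    if (scheduled != deps.size())
        throw std::runtime_error("dependency cycle or undeclared pass");
    return waves;
}
```

With hypothetical pass names, `shadowmap` and `gbuffer` land in the first wave and may overlap, while `lighting`, which reads both, has to wait for them.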

robmar wrote:After drawAll, endScene exits after all rendering has finished and the frame has been presented.

endScene never does that in my fork or in current stock Irrlicht; rendering of even the previous frame is still going on when you call endScene.
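To illustrate why endScene does not need to wait: a renderer typically keeps a small number of frames in flight and blocks only when it is about to reuse the per-frame resources of a frame the GPU may still be working on. This `FramePacer` is a hypothetical sketch that again uses `std::shared_future<void>` as a stand-in for the GPU fence returned by a submission.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>

// Hypothetical frame pacing: submitFrame() returns immediately (like a
// non-blocking endScene), and the CPU only waits when it is about to reuse the
// per-frame resources of a frame that may still be executing on the GPU.
template <std::size_t MaxFramesInFlight>
class FramePacer {
    static_assert(MaxFramesInFlight > 0, "need at least one frame slot");
public:
    // Start of a frame: blocks only if the frame submitted MaxFramesInFlight
    // frames ago (which used the same slot) has not finished on the GPU yet.
    std::size_t beginFrame() {
        std::size_t slot = frameIndex_ % MaxFramesInFlight;
        if (inFlight_[slot].valid()) inFlight_[slot].wait();
        return slot;
    }
    // End of a frame: remember the GPU-completion fence and return at once.
    void submitFrame(std::shared_future<void> gpuDone) {
        inFlight_[frameIndex_ % MaxFramesInFlight] = std::move(gpuDone);
        ++frameIndex_;
    }
    std::size_t framesSubmitted() const { return frameIndex_; }
private:
    std::array<std::shared_future<void>, MaxFramesInFlight> inFlight_{};
    std::size_t frameIndex_ = 0;
};
```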

robmar wrote:...so we can't update those meshes or textures until the rendering has finished, but we can upload any new meshes and textures...

This is why people use "staging buffers", as opposed to what stock Irrlicht uses, which is a lot of immediate-mode commands (which are bad). Examples:
- lock()/unlock() in D3D9
- TexSubImage/TexImage with no UNPACK_BUFFER
- etc.

What the above do is block the CPU until the C++ memory is no longer needed (already consumed and copied). Old APIs like D3D9 and OpenGL do implicit synchronization for you, so they will actually wait for all previous rendering and GPU commands to complete before doing the copy you requested, so that everything appears to happen in the same order as your GL/D3D9 commands.

So instead of doing a direct CPU raw C++ memory -> GPU object in VRAM transfer that also blocks the CPU until finished, what we do is:
Raw C++ memory -> GPU visible unsynchronised Staging Buffer -> GPU object in VRAM

This is really good because you can copy to your staging buffer immediately (you do it all yourself: synchronise, and detect when a subrange of the staging buffer is no longer used), throw away the C++ memory, and then let the data wait in the staging buffer until the GPU copies it into the objects you wish to update at the correct time. The staging buffer is a proper API buffer, and buffer-to-buffer or buffer-to-texture copy operations actually go onto a command buffer and get queued up in the right order.
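The two-hop path can be simulated on the CPU to show its ordering and ownership rules. `StagingUploader` and `CopyCommand` are hypothetical names, the destination `std::vector` stands in for a VRAM object, and `executeCommandBuffer` plays the GPU replaying queued copies; there is no real asynchrony here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// A recorded copy from the staging buffer into a destination object.
struct CopyCommand {
    std::size_t stagingOffset;
    std::vector<uint8_t>* destination; // stand-in for a GPU buffer or texture
    std::size_t destinationOffset;
    std::size_t size;
};

class StagingUploader {
public:
    explicit StagingUploader(std::size_t stagingSize) : staging_(stagingSize) {}

    // CPU side: copy into staging right away and record the GPU copy.
    // After this returns, the caller may free its source memory.
    bool upload(const void* src, std::size_t size,
                std::vector<uint8_t>& dst, std::size_t dstOffset) {
        if (head_ + size > staging_.size() || dstOffset + size > dst.size())
            return false; // out of staging space: caller must stall or retry
        std::memcpy(staging_.data() + head_, src, size);
        commands_.push_back({head_, &dst, dstOffset, size});
        head_ += size;
        return true;
    }

    // "GPU" side: execute recorded copies in submission order, then recycle.
    void executeCommandBuffer() {
        for (const CopyCommand& c : commands_)
            std::memcpy(c.destination->data() + c.destinationOffset,
                        staging_.data() + c.stagingOffset, c.size);
        commands_.clear();
        head_ = 0; // safe only because every recorded copy above has completed
    }

    std::size_t pendingCommands() const { return commands_.size(); }

private:
    std::vector<uint8_t> staging_;
    std::vector<CopyCommand> commands_;
    std::size_t head_ = 0;
};
```

The caller's temporary memory can be freed as soon as `upload` returns, while the destination only changes when the recorded copy executes.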

This obviously means that you must pre-allocate enough staging buffer memory to hold all the updates that can happen across 3 or 4 frames (100 ms or so) and/or stall when you run out of staging memory. This is exactly what I do in the latest version of IrrBAW: in SDeviceCreationParameters you can set the staging buffer sizes, and for now the defaults are 64 MB for the download and 64 MB for the upload built-in staging buffers.
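Pre-allocating staging memory and detecting when a subrange is free again is often done with a ring allocator whose ranges are tagged with the frame that uses them. This is a hypothetical sketch, not IrrBAW's implementation: `allocate` fails (so the caller must stall) when a request would overwrite bytes still in flight, and `retireFrame` recycles ranges once that frame's fence has signalled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical ring allocator over one persistent staging buffer.
class StagingRing {
public:
    explicit StagingRing(std::size_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    // Returns the offset of a contiguous range, or nullopt if it would
    // overwrite data that in-flight frames may still be reading.
    std::optional<std::size_t> allocate(std::size_t size, uint64_t frame) {
        if (size == 0 || size > capacity_) return std::nullopt;
        if (used_ == 0) head_ = 0;              // empty: restart at the beginning
        // Ranges must be contiguous, so skip the end of the buffer if needed.
        std::size_t padding = (head_ + size > capacity_) ? capacity_ - head_ : 0;
        if (used_ + padding + size > capacity_)
            return std::nullopt;                // caller must stall or retry later
        if (padding != 0) head_ = 0;
        std::size_t offset = head_;
        head_ = (head_ + size) % capacity_;
        used_ += padding + size;
        records_.push_back({frame, padding + size});
        return offset;
    }

    // Called once the GPU fence for `completedFrame` (and all earlier frames) has signalled.
    void retireFrame(uint64_t completedFrame) {
        while (!records_.empty() && records_.front().frame <= completedFrame) {
            used_ -= records_.front().bytes;
            records_.pop_front();
        }
    }

    std::size_t bytesInUse() const { return used_; }

private:
    struct Record { uint64_t frame; std::size_t bytes; };
    std::size_t capacity_;
    std::size_t head_ = 0;
    std::size_t used_ = 0;
    std::deque<Record> records_; // allocation order == ring order
};
```

As a rough budget: 64 MB of upload staging shared across 4 frames in flight leaves about 64 / 4 = 16 MB of new uploads per frame on average before allocations start failing and the CPU has to wait.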

If the CPU doesn't need to read what it wrote and writes sequentially, then you can even skip the raw C++ memory and write straight into the staging buffer for extra performance.
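Writing straight into the staging buffer means generating data directly through the mapped pointer (for example the one `vkMapMemory` returns in Vulkan) instead of building a temporary array first. In this sketch, `Vertex` and `writeGridRow` are hypothetical, and a plain byte array stands in for mapped memory; the loop only writes, in order, and never reads back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Vertex { float x, y, z; };

// Fills `count` vertices of a flat row directly into mapped staging memory.
// The writes are strictly sequential and nothing is read back, which is the
// access pattern that suits write-combined, GPU-visible memory.
// No intermediate std::vector<Vertex> is created.
void writeGridRow(void* mapped, std::size_t count, float spacing)
{
    auto* out = static_cast<unsigned char*>(mapped);
    for (std::size_t i = 0; i < count; ++i) {
        const Vertex v{static_cast<float>(i) * spacing, 0.0f, 0.0f};
        std::memcpy(out + i * sizeof(Vertex), &v, sizeof(Vertex)); // write-only
    }
}
```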

robmar · Post by **robmar** » Wed Dec 12, 2018 1:11 pm

Seems complicated, though good; but keeping track of even more things happening at once is going to be hard for all of us without brains the size of a small planet.

Rendering large models, like 1 GB meshes of complete cars, seems to load the GPU very heavily. I guess it's still rendering all the sub-meshes even if they are not visible in the final image.

So you made a lot of changes to Irrlicht 1.8.4 for parallel processing, then? Does it also handle sub-mesh rendering, so that sub-meshes are not rendered unless visible?
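On the sub-mesh question, one common approach (not something the thread says Saga3D or IrrBAW does) is to cull each sub-mesh against the view frustum using its bounding box before issuing drawcalls. The types and functions below are hypothetical. Frustum culling only removes geometry outside the view; parts hidden behind other parts of the car would need occlusion culling, which this sketch does not cover.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };   // points p with dot(n, p) + d >= 0 are inside
struct AABB  { Vec3 min, max; };

// Returns false only if the box lies completely outside at least one plane.
// Uses the "positive vertex": the box corner furthest along the plane normal.
bool intersectsFrustum(const AABB& box, const std::array<Plane, 6>& planes)
{
    for (const Plane& p : planes) {
        const Vec3 v{p.n.x >= 0 ? box.max.x : box.min.x,
                     p.n.y >= 0 ? box.max.y : box.min.y,
                     p.n.z >= 0 ? box.max.z : box.min.z};
        if (p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d < 0)
            return false;  // even the most "inside" corner is outside this plane
    }
    return true;  // conservative: a few boxes that are actually hidden may be kept
}

// Returns the indices of the sub-meshes whose bounds touch the frustum.
std::vector<int> visibleSubMeshes(const std::vector<AABB>& bounds,
                                  const std::array<Plane, 6>& planes)
{
    std::vector<int> visible;
    for (int i = 0; i < static_cast<int>(bounds.size()); ++i)
        if (intersectsFrustum(bounds[i], planes)) visible.push_back(i);
    return visible;
}
```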

How about EZ Vulkan? It looks good: easy Vulkan API setup, but you still have all the low-level facilities.
