Streaming load features for IRRlicht...

christianclavet · Post by **christianclavet** » Fri Jul 04, 2008 9:17 pm

Hi,

It would be wonderful if, in the future, the load functions could report the status of the loading process and let us do something ELSE while the file is loading.

At the moment, we have to display a "loading" graphic and hope the user understands that the application is loading and has not crashed.

Doing this will surely mean patching the IRRlicht core to use a separate CPU thread that updates the loader's status. For compatibility with older source code, we could keep both methods and use a function parameter to select the new one.

For this to work, access to the mesh buffers and image buffers should be blocked until their status is load-completed (for example, when you render the scene graph while streaming the level's assets, as in SECOND LIFE). So before rendering the scene, the scene manager would check whether the current geometry and images are complete.

For other useful statuses, we could use a function to retrieve the size of the file and the current loading progress (bytes loaded). With this we could display the % loaded on the load screen while it's loading.
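As a rough illustration of the "bytes loaded out of file size" idea, here is a minimal sketch in standard C++. Nothing in it is Irrlicht API: `LoadStatus`, `loadWithProgress` and `demoProgress` are hypothetical names, and copying an in-memory buffer stands in for reading a real file.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical status block shared between the loader and the load screen.
struct LoadStatus
{
    std::atomic<std::size_t> bytesLoaded{0};
    std::atomic<std::size_t> bytesTotal{0};
    std::atomic<bool> done{false};

    // Percentage for the load screen; 0 while the size is unknown or zero.
    int percent() const
    {
        const std::size_t total = bytesTotal.load();
        if (total == 0)
            return 0;
        return static_cast<int>(bytesLoaded.load() * 100 / total);
    }
};

// Stand-in for reading a file: copies `source` into `dest` chunk by chunk
// and publishes the running byte count after each chunk.
void loadWithProgress(const std::vector<char>& source,
                      std::vector<char>& dest,
                      LoadStatus& status)
{
    const std::size_t chunk = 4096;
    status.bytesTotal = source.size();
    dest.reserve(source.size());
    for (std::size_t pos = 0; pos < source.size(); pos += chunk)
    {
        const std::size_t n = std::min(chunk, source.size() - pos);
        dest.insert(dest.end(), source.begin() + pos, source.begin() + pos + n);
        status.bytesLoaded = pos + n;
    }
    status.done = true;
}

// Runs the loader on a second thread while this thread polls the status,
// as a render loop would once per frame. Returns the final percentage.
int demoProgress(std::size_t fileSize)
{
    std::vector<char> file(fileSize, 'x');
    std::vector<char> loaded; // touched only by the loader until join()
    LoadStatus status;

    std::thread loader(loadWithProgress, std::cref(file),
                       std::ref(loaded), std::ref(status));
    int shown = 0;
    while (!status.done.load())
    {
        shown = status.percent(); // draw "Loading NN%" here
        std::this_thread::yield();
    }
    loader.join();
    shown = status.percent();
    return shown;
}
```

Calling `demoProgress(100000)` polls until the loader finishes and then reports the final percentage. The counters are atomics, so the render thread can read them every frame without taking a lock.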

I only wish that this could be read and evaluated by the IRRlicht dev team so we could have this in future releases (they have so much to do already). Perhaps if someone can do this, they could post a patch request?

I think that having this would make the engine feel faster to the user, because we could load an IRRscene and start rendering right away (we'd see meshes appear on screen as they are loaded). If it's planned carefully by the game designer, the user would not even see the meshes appear (the game could load a small building first, then the rest of the level).

Dark_Kilauea · Post by **Dark_Kilauea** » Fri Jul 04, 2008 10:00 pm

An asynchronous file loader is your answer, however it would require extensive modification to irrlicht to do. Mainly, we would have to separate file loading from the scene manager, then thread safe the scene manager to protect against additions to the scenegraph while rendering. This will probably affect rendering time.

One might be able to pull it off without threads, but IMO threads are both safer and easier to implement. I'm just worried about the mutex that may be required in the rendering loop to pull this off. A mutex will end up hurting rendering speed.

christianclavet · Post by **christianclavet** » Fri Jul 04, 2008 10:32 pm

Just a kind of global value indicating that "loading" is occurring. The rendering will surely be a "little" slower when loading at the same time (I think that's normal; right now we don't have anything).

So when loading has started and not yet completed, a "global" flag somewhere says "loading in progress".

If this flag is on when the scene manager is about to render the scene (a single simple check before rendering the frame), it will check that every mesh/image in the buffers is complete. If loading is complete, it will simply render the scene without checking. Only when loading first starts would it stop the current render (so it will not crash the scene manager) and restart with the flag ON.

Some modifications to the core loading functions, and to the scene manager as well, would be needed.

rogerborg · Post by **rogerborg** » Sat Jul 05, 2008 12:24 am

Dark_Kilauea wrote:One might be able to pull it off without threads

What would be the point? The idea is to do work while the loading thread is busy (if you've got multiple cores) or stalled waiting on I/O.

Dark_Kilauea wrote:but IMO threads are both safer

Ouch.

Dark_Kilauea wrote:and easier to implement.

Now you're trying to wind me up, I just know it.

Dark_Kilauea wrote:I'm just worried about the mutex that may be required in the rendering loop to pull this off. A mutex will end up hurting rendering speed.

It depends what you mutex. If you mutex (e.g.) the entire mesh and texture caches, then you'd either have to stall or skip rendering if you can't take the mutex, either of which would negate the point of parallelising the loading. Mutexing each mesh / texture would allow smaller granularity, but with more opportunities to screw up. Note that you'd have to mutex scene nodes as well, as you wouldn't want the loading thread to modify a node as it's being used by the scene manager, or vice versa, and you'd have to protect the loading thread from accessing any resources while the main thread is deleting resources, including when it's shutting down.

christianclavet wrote:Just a kind of global value indicating that "loading" is occurring. The rendering will surely be a "little" slower when loading at the same time (I think that's normal; right now we don't have anything).

So when loading has started and not yet completed, a "global" flag somewhere says "loading in progress".

If this flag is on when the scene manager is about to render the scene (a single simple check before rendering the frame)

Only if you can guarantee that loading can't begin after the scene has started rendering.

christianclavet wrote:it will check that every mesh/image in the buffers is complete.

I guess if the flag starts clear and is set once on completion that would be as close to thread safe as you're going to get, but since mutexing is required anyway to control access to (at least) the scene nodes, you might as well mutex the resources.
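To make the "starts clear, set once on completion" flag concrete, here is a minimal sketch in standard C++ (not Irrlicht code). `PendingMesh`, `loadMesh`, `drawIfComplete` and `demoFlag` are hypothetical names. It covers only the narrow case of one thread writing a resource and another reading it; it does nothing about scene nodes or deletion, which are the harder problems raised above.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a mesh that a loader thread fills in.
struct PendingMesh
{
    std::vector<float> vertices;       // written only by the loader
    std::atomic<bool> complete{false}; // starts clear, set exactly once
};

// Loader thread: fill the data first, then publish with a release store.
void loadMesh(PendingMesh& mesh, std::size_t vertexCount)
{
    mesh.vertices.assign(vertexCount, 1.0f);
    mesh.complete.store(true, std::memory_order_release);
}

// Render thread: the acquire load pairs with the release store above, so
// once it reads true the vertex data is fully visible. Until then the
// mesh is skipped rather than read half-written.
std::size_t drawIfComplete(const PendingMesh& mesh)
{
    if (!mesh.complete.load(std::memory_order_acquire))
        return 0;                // not ready this frame
    return mesh.vertices.size(); // "draw": report the vertices we would submit
}

// Each loop iteration stands for one rendered frame.
std::size_t demoFlag()
{
    PendingMesh mesh;
    std::thread loader(loadMesh, std::ref(mesh), std::size_t(3000));
    std::size_t drawn = 0;
    while (drawn == 0)
    {
        drawn = drawIfComplete(mesh);
        std::this_thread::yield();
    }
    loader.join();
    return drawn;
}
```

The flag works here only because it goes from false to true once and the loader never touches the mesh afterwards. As soon as a resource can be reloaded or freed while the renderer may be using it, a single flag is not enough.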

christianclavet wrote:If loading is complete, it will simply render the scene without checking. Only when loading first starts would it stop the current render

How? Actually, that may not be an issue, if you can guarantee that loading will only begin outside a scene begin/end.

christianclavet wrote:Some modifications to the core loading functions, and to the scene manager as well, would be needed.

And nodes, and meshes, and textures, and you'd have to ensure that both threads would be deterministically safe under error and shutdown conditions.

I have to ask: have either of you written or maintained a large multithreaded project? It's certainly possible to multithread Irrlicht, but it would be a major rewrite, and the potential to introduce crash bugs and lock ups (race conditions) is significant.

It's not really something that you'd want to commit to if you're learning as you go. I'd suggest that you write some multithreaded sample apps first to help you spot the pitfalls. I'd be happy to help you crash and lock them up.

christianclavet · Post by **christianclavet** » Sat Jul 05, 2008 3:14 am

Hi Rogerborg.

To answer your question:

rogerborg wrote:I have to ask: have either of you written or maintained a large multithreaded project? It's certainly possible to multithread Irrlicht, but it would be a major rewrite, and the potential to introduce crash bugs and lock ups (race conditions) is significant.

It's not really something that you'd want to commit to if you're learning as you go. I'd suggest that you write some multithreaded sample apps first to help you spot the pitfalls. I'd be happy to help you crash and lock them up.

Definitely not. I have only a little knowledge of how threads work. Still, this is something worth investigating. It would improve IRRlicht's performance a lot when loading scenes.

Could there be a way to load the data somewhere else (another special memory buffer) and then move it to the standard buffer ONCE it's complete (checking for this between frames)?

Like a task that does all the work outside IRRlicht and then hands it the buffer, telling it that loading has finished. The user would have to:
1. Issue the threaded loading command
2. Check for a new type of EVENT that tells them the loading is complete
3. Issue a command to "load" from the memory zone filled by the other thread.

Theoretically, something like this:

Code: Select all

1. Load the file into memory in a separate thread (process it as if it were loaded into a buffer). The IRRlicht thread continues execution.

2. When loading completes, control of the memory zone is given to the IRRlicht thread. This could be a new type of event that the main IRRlicht thread uses to continue the loading process (as the third step).

3. Copy the loaded buffer to the mesh buffer and update the scene graph (the same thing as if the file was loaded, but this time from memory). Memory to memory would surely be much faster, and the user would not feel the delay too much.

4. Release the memory used to temporarily hold the buffer data.
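The four steps above can be sketched in standard C++ roughly as follows. This is only an illustration under assumptions: `LoadCompleteEvent`, `EventQueue`, `readInBackground` and `demoEventHandoff` are made-up names, "level.bsp" is a placeholder file name, and a string stands in for the file so the sketch needs no disk access. Irrlicht has no such event type.

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#include <mutex>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical "load complete" event carrying the raw file bytes.
struct LoadCompleteEvent
{
    std::string name;
    std::vector<char> bytes;
};

// Minimal thread-safe event queue; the mutex is held only to push or pop.
class EventQueue
{
public:
    void post(LoadCompleteEvent ev)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        events_.push(std::move(ev));
    }
    bool poll(LoadCompleteEvent& out)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (events_.empty())
            return false;
        out = std::move(events_.front());
        events_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::queue<LoadCompleteEvent> events_;
};

// Step 1: read the whole "file" on a worker thread, then post the event.
void readInBackground(std::string name, std::string fileContents,
                      EventQueue& queue)
{
    std::istringstream in(fileContents);
    std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    queue.post(LoadCompleteEvent{std::move(name), std::move(bytes)});
}

// Steps 2 to 4 on the main thread: poll between frames, build the mesh
// from memory, and let the temporary buffer go out of scope.
std::size_t demoEventHandoff()
{
    EventQueue queue;
    std::thread worker(readInBackground, std::string("level.bsp"),
                       std::string(5000, 'v'), std::ref(queue));
    LoadCompleteEvent ev;
    while (!queue.poll(ev))
        std::this_thread::yield(); // render a frame here instead
    const std::size_t meshBytes = ev.bytes.size(); // parse from memory (step 3)
    worker.join();
    return meshBytes; // ev.bytes is released on return (step 4)
}
```

Only the slow file read happens off the main thread here. Parsing and scene graph updates stay on the main thread, which is why the existing engine code would not need to become thread safe for this variant.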

This could be the equivalent of a simulated "RAM disk" for one file. A real RAM disk could be used, but if the game has more than 1 GB of files, that would not be possible.

I'm only suggesting how I think it could be done "theoretically". I really appreciate that you replied and pointed out the possible implementation problems. If you have any ideas for a realistic solution...

Dark_Kilauea · Post by **Dark_Kilauea** » Sat Jul 05, 2008 4:33 am

To answer your question, rogerborg: Yes I have developed multithreaded commercial applications.

Thinking about it over the 4th of July celebrations, I realized the best way would probably be to create the mesh buffer/texture in a loading thread, then pass the pointer to the scenegraph quickly once loading is complete. After that, the loading thread never touches the mesh buffer/texture again; it is handled just as it is now.

With mutexing only when the pointer gets switched over, the slowdown on the system should be insignificant (not counting the load of loading a file).
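A minimal sketch of that pointer handoff in standard C++ follows. `Mesh`, `HandoffSlot`, `buildAndPublish`, `takeIfReady` and `demoHandoff` are hypothetical stand-ins, not Irrlicht types or functions. The use of `try_lock` on the render side is an extra assumption, added so that a frame never waits on the loader.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a mesh buffer.
struct Mesh { std::vector<float> vertices; };

struct HandoffSlot
{
    std::mutex mutex;
    std::unique_ptr<Mesh> ready; // set by the loader, taken by the renderer
};

// Loader thread: build the mesh with no lock held, then lock only long
// enough to move the pointer into the slot.
void buildAndPublish(HandoffSlot& slot, std::size_t vertexCount)
{
    std::unique_ptr<Mesh> mesh(new Mesh);
    mesh->vertices.assign(vertexCount, 0.0f);

    std::lock_guard<std::mutex> lock(slot.mutex);
    slot.ready = std::move(mesh);
}

// Render thread, once per frame: try_lock so the frame never stalls. If
// the lock is busy or nothing is ready, it simply tries again next frame.
std::unique_ptr<Mesh> takeIfReady(HandoffSlot& slot)
{
    std::unique_lock<std::mutex> lock(slot.mutex, std::try_to_lock);
    if (!lock.owns_lock())
        return nullptr;
    return std::move(slot.ready);
}

std::size_t demoHandoff()
{
    HandoffSlot slot;
    std::vector<std::unique_ptr<Mesh> > scene; // touched by this thread only
    std::thread loader(buildAndPublish, std::ref(slot), std::size_t(1200));

    while (scene.empty())
    {
        if (std::unique_ptr<Mesh> mesh = takeIfReady(slot))
            scene.push_back(std::move(mesh));
        std::this_thread::yield(); // stands in for drawing the frame
    }
    loader.join();
    return scene.front()->vertices.size();
}
```

Because ownership moves with the `unique_ptr`, the loader cannot touch the mesh after publishing it, which matches the "never touch it again" rule above.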

Progress could be reported via a getProgress() method (or whatever).

I'm not looking to do this just yet (I'm quite busy myself). I'm just putting down suggestions for whoever feels brave enough to take this on.

BlindSide · Post by **BlindSide** » Sat Jul 05, 2008 4:48 am

Some stuff doesn't really need any wariness while threading. For example, you can render stuff while the scene manager is busy constructing an octtree triangle selector (which usually takes some time).

Here's a simple example showing how to do this (just wrote it now, so it might have errors, but I've tried something identical to this and it worked fine):

Code: Select all

#include <pthread.h>
#include <windows.h>
#include <stdio.h>
#include <irrlicht.h>

#pragma comment(lib,"pthreadVC2.lib")
#pragma comment(lib,"irrlicht.lib")

using namespace irr;
using namespace scene;
using namespace core;
using namespace video;

struct tStruct
{
	IrrlichtDevice* dev;
	ISceneNode* node;
	IAnimatedMesh* mesh;
};

void* doOtherStuff(void * stuff)
{
	tStruct mytStruct = *((tStruct*)stuff);

	mytStruct.dev->getSceneManager()->createOctTreeTriangleSelector(mytStruct.mesh->getMesh(0),mytStruct.node);

	pthread_exit(NULL);
	return (void*)0;
}

int main()
{
	IrrlichtDevice* device = createDevice(video::EDT_OPENGL,core::dimension2d<s32>(640,480));

	device->getSceneManager()->addCameraSceneNode();
	device->getFileSystem()->addZipFileArchive("../../media/map-20kdm2.pk3");
	
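	// smgr was used below without being declared
	scene::ISceneManager* smgr = device->getSceneManager();
	scene::IAnimatedMesh* q3levelmesh = smgr->getMesh("20kdm2.bsp");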
	scene::ISceneNode* q3node = 0;
	
	if (q3levelmesh)
		q3node = smgr->addOctTreeSceneNode(q3levelmesh->getMesh(0));

	pthread_t* myThread2 = new pthread_t();
	
	tStruct mytStruct;
	mytStruct.dev = device;
	mytStruct.mesh = q3levelmesh;
	mytStruct.node = q3node;

	pthread_create(myThread2,NULL,doOtherStuff,(void*)&mytStruct);

	ISceneNode* cuby = device->getSceneManager()->addCubeSceneNode(10,0,-1,vector3df(0,0,30));
	cuby->setMaterialFlag(EMF_LIGHTING,false);
	ISceneNodeAnimator* anim = device->getSceneManager()->createRotationAnimator(vector3df(0.5f,0.5f,0.5f));

	cuby->addAnimator(anim);
	 
	// Show something cool while it's loading.
	while(device->run())
	{
		device->getVideoDriver()->beginScene(true,true,video::SColor(255,255,0,0));
		device->getSceneManager()->drawAll();
		device->getVideoDriver()->endScene();
		
		Sleep(10);
	}

	device->drop();

	pthread_exit(NULL);
}

Dark_Kilauea · Post by **Dark_Kilauea** » Sat Jul 05, 2008 10:38 am

It should work, but that triangle selector had better be checked for existence beforehand (and probably in the loop).

Dorth · Post by **Dorth** » Sat Jul 05, 2008 1:37 pm

Also, not everything needs to be loaded in a thread. If we know something will go fast or is essential, we can just load it atomically. Implementing threads in Irrlicht is beyond my capacities (and time, too), but I'd help however I could since it could bring benefits all around the place...

rogerborg · Post by **rogerborg** » Sat Jul 05, 2008 2:22 pm

BlindSide wrote:Some stuff doesn't really need any wariness while threading.

And they couldn't hit an elephant at this dist

BlindSide wrote:Here's a simple example showing how to do this (just wrote it now, so it might have errors, but I've tried something identical to this and it worked fine):

Close the window while the createOctTreeTriangleSelector() is still being created, and see what happens. If it doesn't crash, repeat until it does. Or remove the node, or purge the mesh from the cache, both things that would be likely to occur in a real app that's loading progressively, and would thus presumably be discarding resources as well.

I'm not trying to discourage you from doing this, but if you go in with an "unwary" attitude, or assume that something is safe because it works when the sun is shining, then you - or more likely someone else - is going to spend a lot of time debugging pretty horrible contention bugs.

Yes, I know that you could have coded defensively in your example, but my point is that you didn't think it was necessary, which is not the best argument in favour of the safety of multithreading.

BlindSide · Post by **BlindSide** » Sat Jul 05, 2008 3:31 pm

You're right, I gotta be more careful what I convey.

DISCLAIMER: To anyone looking at the above example thinking "Oh, it might not be so hard to utilize multi threading in my application", think again! It is a pain in the ass, even for the simplest stuff, and the problems increase exponentially as the complexity of the program increases. If you are not careful your program may crash repeatedly and without notice, your computer may explode, and last but not least you may get viciously attacked by a prehistoric creature.

christianclavet · Post by **christianclavet** » Sat Jul 05, 2008 6:29 pm

Wow! That's interesting. So there's also a possibility that the OctTree creation could be done in another thread!

On my other comment: just changing the pointer address will be much faster than copying the buffers. I keep forgetting that we can do this with pointers...

Poor me...

What I found interesting is that your code example was small (I could even say tiny) and it did not require changing the whole library just to test an idea.

You're right, BlindSide and Rogerborg. This can cause instability if it's not done well, but we at least have to try it, or it will never be done. It's good to be aware of the problems that could occur, but you must face problems if you want them resolved.

The method mentioned by Dark_Kilauea doesn't seem to imply a complete redesign of the engine, so I think it's a start, provided it is done without affecting the current system.

Then only the new commands would make it unstable, or am I completely wrong here? If someone modifies the engine to add those commands (without modifying the current ones), could the engine become unstable for people who don't use them?

hybrid · Post by **hybrid** » Sat Jul 05, 2008 9:40 pm

There are also two completely different things when we talk about loading meshes in threads. One is the usual loading, but only in parallel to the rendering, and the other one is stream loading, which might require completely different things.
For the former, I guess we could have some kind of an asynchronous mesh loader interface, which returns some proxy object. This object can be queried for the current loading state and then finalized. Finalization would add the mesh, all textures, etc. to the cache. It might even be possible to use the asynchronous interface for normal loading without problems. And it should be usable in threads as well.
I won't think about stream loading, yet, because it wouldn't fit into the current loading scheme at all - at least at first glance. And it might be highly related to the actual file format layout.
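One possible reading of the proxy object idea, sketched in standard C++. Everything here is hypothetical (`MeshLoadProxy`, `LoadState`, `loadMeshAsync`, `MeshCache`, `demoProxy`, and the placeholder name "room.obj"); it is not an existing or planned Irrlicht interface. The proxy wraps a `std::future`, so the same interface works with a real thread (`std::launch::async`) or as plain blocking loading (`std::launch::deferred`).

```cpp
#include <chrono>
#include <cstddef>
#include <future>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Mesh { std::vector<float> vertices; };
typedef std::map<std::string, std::shared_ptr<Mesh> > MeshCache;

enum class LoadState { Loading, Ready, Finalized };

// Hypothetical proxy returned by an asynchronous mesh loader.
class MeshLoadProxy
{
public:
    MeshLoadProxy(std::string name, std::future<std::shared_ptr<Mesh> > result)
        : name_(std::move(name)), result_(std::move(result)) {}

    // Non-blocking query. A deferred load reports Loading until finalized.
    LoadState state() const
    {
        if (finalized_)
            return LoadState::Finalized;
        const std::future_status s = result_.wait_for(std::chrono::seconds(0));
        return s == std::future_status::ready ? LoadState::Ready
                                              : LoadState::Loading;
    }

    // Call on the thread that owns the cache. Blocks if still loading.
    std::shared_ptr<Mesh> finalize(MeshCache& cache)
    {
        if (!finalized_)
        {
            mesh_ = result_.get();
            cache[name_] = mesh_;
            finalized_ = true;
        }
        return mesh_;
    }

private:
    std::string name_;
    std::future<std::shared_ptr<Mesh> > result_;
    std::shared_ptr<Mesh> mesh_;
    bool finalized_ = false;
};

// Stand-in loader: "loading" just fills a vertex array of the given size.
MeshLoadProxy loadMeshAsync(const std::string& name, std::size_t vertexCount,
                            std::launch policy = std::launch::async)
{
    return MeshLoadProxy(name, std::async(policy,
        [vertexCount]() -> std::shared_ptr<Mesh> {
            std::shared_ptr<Mesh> mesh = std::make_shared<Mesh>();
            mesh->vertices.assign(vertexCount, 0.0f);
            return mesh;
        }));
}

std::size_t demoProxy(std::launch policy)
{
    MeshCache cache;
    MeshLoadProxy proxy = loadMeshAsync("room.obj", 64, policy);
    // A real loop would keep rendering while state() == LoadState::Loading.
    std::shared_ptr<Mesh> mesh = proxy.finalize(cache);
    const bool ok = cache.count("room.obj") == 1
                    && proxy.state() == LoadState::Finalized;
    return ok ? mesh->vertices.size() : 0;
}
```

The cache is only modified inside `finalize()`, on whichever thread calls it, so the cache itself needs no locking as long as that is always the render thread.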

rogerborg · Post by **rogerborg** » Sat Jul 05, 2008 9:55 pm

Yup. Lest anyone think I'm poo-pooing the idea offhand, I do like it in principle, and I've been looking into the implementation.

What I'm thinking is that an initial step would be to expose functionality that would allow the user app to pre-cache resources itself.

For textures, for example, we could make CNullDriver::loadTextureFromFile() and CNullDriver::addTexture() public (via IVideoDriver). That would allow a user app to create a texture in a separate thread, then add it to the texture cache with addTexture() outside the scene begin()/end().

It wouldn't be completely safe, since ASSPLOSIONS could still occur if the filesystem went bye-bye during the loading or if the user insists on calling addTexture() during a scene render, but on the other hand, I've had a bottle of cheap Bulgarian Merlot, so suddenly it doesn't seem so very naughty. I love you guys. Who wants to hug?

dlangdev · Post by **dlangdev** » Sun Jul 06, 2008 8:39 am

Found something interesting you guys might want to check-out.

Lock-Free Programming on AMD Multi-Core Systems

http://developer.amd.com/documentation/ ... 00689.aspx

quote:

A recent story in Ars Technica reported that Valve Software, makers of Half Life and other graphics-intensive first-person games, turned to lock-free methods in a deadlock-resolution effort. Adding lock-free was part of the company's overall push to multi-core, which made use of different synchronization strategies, including lock-free, coarse-grain and fine-grain locking, and multi-read exclusive write locks.

Whether a lock-free approach makes sense for your application will depend on the workload. Where parallelism is fine-grained (where each thread requires access to a shared resource for relatively short intervals) and there are a large number of contending threads, a well-written lock-free algorithm might make a better-performing choice, and will avoid deadlock and related conditions.

With a small number of threads, there probably won't be much difference in performance between a lock-free implementation and one that uses locks. But what might be a wash today, on a dual-core system with dozens of threads, might strongly favor lock-free on a 16-core future processor supporting thousands of threads.

end quote: