Learn From my Fail

devsh
Competition winner
Posts: 2057
Joined: Tue Dec 09, 2008 6:00 pm
Location: UK
Contact:

Post by devsh »

It will scale if you set thread priorities/scheduling; you could also merge networking into another thread, since its workload is minute.
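
For what it's worth, here is a minimal sketch of the kind of priority tweak being suggested, assuming Linux/pthreads and a C++11 std::thread (so this is illustrative only, not code from anyone in this thread):

Code: Select all

// Hedged sketch: push a background worker down to SCHED_IDLE on Linux so it
// only runs when nothing more important wants the CPU. Assumes std::thread
// wraps a pthread, which is the usual case on Linux.
#include <pthread.h>
#include <sched.h>
#include <thread>

void makeBackgroundThread(std::thread& worker)
{
    sched_param sp{};
    sp.sched_priority = 0;                        // SCHED_IDLE ignores the value
    pthread_setschedparam(worker.native_handle(), SCHED_IDLE, &sp);
}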
Insomniacp
Posts: 288
Joined: Wed Apr 16, 2008 1:45 am
Contact:

Post by Insomniacp »

I would say just make sure you research how things work behind the scenes before attempting to thread. I currently have 5 threads no matter the processor type. If you program properly, the number of threads you use can be in the hundreds and still be efficient. If your thread is waiting on a system block, such as receiving from a network socket, it will be asleep until the system notifies it that something arrived and it should unblock. This means your thread is doing absolutely nothing and not taking up any CPU resources. So you could have 100 network threads if you really wanted to and not have to worry about them stealing your CPU time, unless they are all receiving data at the same time, at which point you will need the separate threads anyway to process each set of data efficiently. As an example of this, look at any BitTorrent program: downloading a single item, Vuze/Azureus uses about 100 threads.
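
For concreteness, here is a rough sketch of that per-connection receiver idea, assuming POSIX sockets and C++11 threads (handlePacket is a made-up placeholder, not anyone's actual code):

Code: Select all

// One receiver thread per socket: the thread sleeps inside recv() and is only
// scheduled again when the kernel actually has data for it.
#include <sys/socket.h>
#include <cstdint>
#include <thread>
#include <vector>

void handlePacket(const std::vector<std::uint8_t>& data);   // hypothetical

void receiverLoop(int sock)
{
    std::vector<std::uint8_t> buf(4096);
    for (;;)
    {
        // Blocking call: no CPU is used here until bytes arrive or the peer closes.
        ssize_t n = recv(sock, buf.data(), buf.size(), 0);
        if (n <= 0)
            break;                                    // error or connection closed
        handlePacket(std::vector<std::uint8_t>(buf.begin(), buf.begin() + n));
    }
}

// Spawning one is cheap:  std::thread(receiverLoop, sock).detach();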

That said, you should not thread things that take little time to accomplish, such as memory access and manipulation. Things like hard disk I/O are good to thread, since most of the time you are just waiting for the read/write calls to return. It's worth looking up how long certain operations take. From the book Unix Systems Programming: if a processor cycle were 1 second (really only about 0.5 ns at 2 GHz), then a cache access would be 2 seconds, a memory access 30 seconds, a context switch 167 minutes, and a disk access 162 days. So you can see that switching between processes on the CPU takes a lot of time, which is one downfall of multi-threading. But if you do it to avoid waiting for a disk access, you save a huge amount of time.

In my implementation I have:
1) A main thread for Irrlicht and rendering.
2) A render-buffer thread which loads files from disk into memory and then passes them to the main thread; this also handles network data.
3) A network thread which reads data in from the network.
4) A logger thread which writes log data to the disk; all threads send their output here, so no screen writes occur and no other thread writes to the disk.
5) A physics thread. This basically allows the lengthy physics calculations to run on a separate thread at a pace which keeps things smooth but does not need to run every frame.

In my map system I also spawn multiple threads to load map data from disk into memory. With this type of implementation I avoid any lengthy disk access in the rendering and physics threads, so they can constantly run at optimal speed and are never blocked in lengthy system calls. So if I needed to load 100 new objects from disk, rendering would be largely unaffected until the files are in memory, and even then accessing memory is fast and will barely slow it down.
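
A rough sketch of that loader-thread pattern, assuming C++11 primitives (LoadedFile and loadFromDisk are invented placeholders rather than Insomniacp's actual code):

Code: Select all

// Worker threads do the slow disk reads and hand finished results to the main
// thread through a mutex-protected queue; the main thread only pays for a
// quick memory handoff each frame.
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

struct LoadedFile { std::string name; std::vector<char> bytes; };

std::queue<LoadedFile> g_ready;       // filled by loaders, drained by main thread
std::mutex             g_readyMutex;

LoadedFile loadFromDisk(const std::string& path);   // hypothetical blocking read

void loaderThread(std::vector<std::string> paths)
{
    for (const auto& p : paths)
    {
        LoadedFile f = loadFromDisk(p);              // slow: blocks this thread only
        std::lock_guard<std::mutex> lock(g_readyMutex);
        g_ready.push(std::move(f));                  // cheap: just a memory handoff
    }
}

// In the render/main loop, once per frame, drain whatever finished:
void pollLoadedFiles()
{
    std::lock_guard<std::mutex> lock(g_readyMutex);
    while (!g_ready.empty())
    {
        LoadedFile f = std::move(g_ready.front());
        g_ready.pop();
        // create textures/meshes from f.bytes here, on the main thread
    }
}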
devsh
Competition winner
Posts: 2057
Joined: Tue Dec 09, 2008 6:00 pm
Location: UK
Contact:

Post by devsh »

Whatever pleases you, as long as you get the basics of thread synchronization and memory access with threads.

@pat
If I wanted to be picky, I'd say you contradict yourself by saying a context switch takes 167 minutes (10,020 clock cycles in that 1 cycle = 1 second analogy) and that having 100 threads is fine: 10,020 cycles per switch * 100 threads * (I dunno, will 50 Hz do the job to make them seem to run at the same time?) is roughly 50 million cycles per second off your CPU, which is like 50 MHz off your clock. I mean, whatever pleases you, it's not a large hit, but obviously when a thread sleeps you switch context even more times. Personally I just don't like the 100-thread approach because it scares me.
Insomniacp
Posts: 288
Joined: Wed Apr 16, 2008 1:45 am
Contact:

Post by Insomniacp »

I agree it's rather pointless, since one thread could go through all the network sockets using non-blocking methods, but 100 threads accomplish the same goal. Chances are you don't need 100 sockets, and they shouldn't all be receiving at the same time unless you are doing something very extreme, in which case you should already know how OSes handle everything and this topic is pointless for you.

What did you mean by this part?
"but obviously when the thread sleeps you switch context even more times"

When in a system block, the thread gives up its CPU cycles and is not woken until the system returns from the function call. So it only context switches out, and then back in once the read or write call has finished and it's that thread's turn to run again. So if you are reading a file from disk, you pay one switch out and one back in (2 * 167 minutes = 334 minutes in that analogy) instead of blocking for 162 days. That is how you save a lot of time by multi-threading.

It definitely isn't something you should ever just whimsically attempt to implement, and you should definitely read a lot of multi-threading, OS, and data-synchronization material beforehand, or you will crash and burn very quickly.

EDIT:
BTW, you can make all threads run like that, so they block at the system level (never get onto the CPU) until you set a flag that wakes them up. That way you can have threads waiting for data from other threads without using the CPU, until one of your threads sends data and tells them to wake up. This makes things like my logger more efficient: it doesn't use any CPU until it has something to write to the disk.
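
That flag-and-wake idea maps naturally onto a condition variable. A minimal sketch of how the logger thread described above might look with C++11 (none of this is the poster's actual code):

Code: Select all

// Any thread pushes a line and notifies; the logger sleeps in wait() using no
// CPU until there is actually something to write.
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>

std::queue<std::string> g_logLines;
std::mutex              g_logMutex;
std::condition_variable g_logWake;
bool                    g_logShutdown = false;

void log(std::string line)                 // called from any thread
{
    {
        std::lock_guard<std::mutex> lock(g_logMutex);
        g_logLines.push(std::move(line));
    }
    g_logWake.notify_one();                // wake the logger if it is blocked
}

void loggerThread()
{
    std::ofstream out("game.log");
    std::unique_lock<std::mutex> lock(g_logMutex);
    for (;;)
    {
        // Blocks (no CPU used) until notified and the predicate holds.
        g_logWake.wait(lock, [] { return !g_logLines.empty() || g_logShutdown; });
        while (!g_logLines.empty())
        {
            std::string line = std::move(g_logLines.front());
            g_logLines.pop();
            lock.unlock();                 // do the slow disk write without the lock
            out << line << '\n';
            lock.lock();
        }
        if (g_logShutdown)
            break;
    }
}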
devsh
Competition winner
Posts: 2057
Joined: Tue Dec 09, 2008 6:00 pm
Location: UK
Contact:

Post by devsh »

I know blocking only switches context twice, but I mean there are situations where you can't use a mutex and you need to brute-force sleep(); then the thread yields the CPU for "some" time, similar to nanosleep(), and switches back. nanosleep() is like a couple of "your" 1-second clocks, so 10 minutes, meaning you get a context switch every 10 minutes which lasts 167 minutes. This could be catastrophic to performance. I use that in my grass generator to make a worker thread (mesh copy and transform) use a memory buffer while it is still being worked on by a previous thread (raycasts). Basically the raycast thread says how many plants it has done so far, and the other thread makes sure it always lags one behind on copying and transforming meshes. But the thing is, I put that there theoretically, because that thread has more workload per plant and will almost always lag behind the raycast thread anyway.

Unless your terrain is ridiculously highpoly and plants lowpoly
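
A guess at how that lag-one scheme could look with a C++11 atomic counter shared between the two threads; Plant and the commented-out helpers are invented for the example, and the sleep is exactly the brute-force wait being discussed:

Code: Select all

// The raycast thread publishes how many plants it has finished; the mesh-copy
// thread only touches plant i once the counter is past i + 1, so both threads
// can share one buffer without a mutex.
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

struct Plant { /* position, mesh data, ... */ };

std::atomic<std::size_t> g_plantsRaycast{0};   // written by the raycast thread only

void raycastThread(std::vector<Plant>& plants)
{
    for (std::size_t i = 0; i < plants.size(); ++i)
    {
        // raycastPlant(plants[i]);               // place plant i on the terrain
        g_plantsRaycast.store(i + 1, std::memory_order_release);
    }
}

void meshCopyThread(std::vector<Plant>& plants)
{
    for (std::size_t i = 0; i < plants.size(); ++i)
    {
        // Stay one plant behind the raycaster (except at the very end). The
        // sleep is the brute-force wait; in practice the heavier per-plant
        // work means this loop rarely has to wait at all.
        const std::size_t need = std::min(i + 2, plants.size());
        while (g_plantsRaycast.load(std::memory_order_acquire) < need)
            std::this_thread::sleep_for(std::chrono::microseconds(100));
        // copyAndTransform(plants[i]);            // mesh copy + transform
    }
}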
Insomniacp
Posts: 288
Joined: Wed Apr 16, 2008 1:45 am
Contact:

Post by Insomniacp »

Sleeping is a system call, and therefore the thread will not run or get onto the CPU until the timer expires. I can't think of any situation where you would have to use sleep and couldn't use a different method...
devsh
Competition winner
Posts: 2057
Joined: Tue Dec 09, 2008 6:00 pm
Location: UK
Contact:

Post by devsh »

Still, system calls are processed on something...

The last argument would be that more threads waste CPU cache with their local data.
Insomniacp
Posts: 288
Joined: Wed Apr 16, 2008 1:45 am
Contact:

Post by Insomniacp »

Yes, processed once, removing the thread from the CPU's run queue; after that the system monitors the timer, but it does that no matter what, and that is innately the kernel's job.

During a context switch the cache is effectively flushed and reloaded for the next program anyway, so having more threads doesn't add a new way to waste CPU cache.
devsh
Competition winner
Posts: 2057
Joined: Tue Dec 09, 2008 6:00 pm
Location: UK
Contact:

Post by devsh »

Sorry... I meant CPU cache bandwidth (TB/sec or whatever it is), wasted by having more context switches?

But still, same old point... more threads == more switches.


Can't argue on the subject; obviously you know more than me.
Insomniacp
Posts: 288
Joined: Wed Apr 16, 2008 1:45 am
Contact:

Post by Insomniacp »

Bandwidth isn't really an issue. Generally you will always have context switches because of the number of programs running (xorg, gnome, kde, shells, the kernel, desktop applications, etc.), so adding more has little to no effect. You did bring up a lot of points that many people have on the topic, so it's good to have discussed it for future people to read and learn from.
Rocko Bonaparte
Posts: 48
Joined: Tue Aug 31, 2010 6:27 am

Post by Rocko Bonaparte »

You should try to avoid sleep calls in multithreading too, because it's really just polling. Actually, avoiding them as much as possible is wise. If possible, try to have the stuff you're waiting on come back to you when it's ready rather than polling. In multithreading, that means stuff like wait/notify, or outright exiting the waiting code and being reinvoked on a callback.

Sleep calls still get processed, even if the thread is basically just getting its shot at the open mic to say "Ahem" before sitting down again.

OpenGL is similar to GUI frameworks in the regard that it all works in one thread. GUI frameworks are actually much worse, though. I'm sure everybody has experienced a simple program with maybe just a "Stop" and a "Go" button: they push the button, some long work happens, and the GUI completely locks up and refuses to paint anymore. It's because the framework called back into the code doing the work, but that code didn't spawn off in its own thread. So it's hogging the GUI thread, which is the thread that issues the actual system-level draw commands.

I'm just dabbling in game stuff, so right now I only have two threads. One is for a scripting console and the other just has all the engine stuff crammed together. I'll make that more elegant when I see generally what I need to do to accomplish stuff, or else I'd end up in analysis paralysis. There is a risk of exposing game objects that the scripting engine could modify while the engine is actively using them. Generally you'd want the engine to test for a transition in the scripted object's state and then react to that. In some cases they don't even need to hit a mutex, because maybe it doesn't matter for the current frame; the engine can take care of the object's state change the next time around.
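
One way to read that "test for a transition, maybe skip the mutex" idea is a small atomic state flag that only the script thread writes and only the engine thread reads, where picking the change up this frame or the next is fine. A sketch under those assumptions (ScriptState and the fields are invented for the example):

Code: Select all

// The script thread writes the payload, then publishes the transition with a
// release store; the engine thread checks for it once per frame with an
// acquire load and reacts whenever it gets around to it.
#include <atomic>

enum class ScriptState { Idle, MoveRequested };

struct ScriptedObject
{
    std::atomic<ScriptState> state{ScriptState::Idle};
    float targetX = 0, targetY = 0;       // written before the state flips
};

// Scripting/console thread:
void scriptRequestMove(ScriptedObject& obj, float x, float y)
{
    obj.targetX = x;                      // plain writes, then...
    obj.targetY = y;
    obj.state.store(ScriptState::MoveRequested,
                    std::memory_order_release);   // ...publish the transition
}

// Engine thread, once per frame:
void engineUpdate(ScriptedObject& obj)
{
    if (obj.state.load(std::memory_order_acquire) == ScriptState::MoveRequested)
    {
        // safe to read targetX/targetY now; react this frame or the next
        obj.state.store(ScriptState::Idle, std::memory_order_relaxed);
    }
}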
captainkitteh
Posts: 28
Joined: Sat May 21, 2011 10:49 am
Location: Dhaka, Bangladesh
Contact:

Post by captainkitteh »

Premature optimization is the root of all evil

I don't know who said it, but these words should be heeded. Unless you are absolutely sure you need the performance gain of multithreading, don't do it. It might be an ego boost as a programmer to efficiently use all four cores of a modern CPU... but the fun disappears when the debugging starts! 8)
Insert witty saying