It's always going to be slow with 40 enemies on the same level all moving around at once. DirectX 10 is better for rendering large groups of AI characters. It doesn't really matter what method you use to calculate your AI paths, because rendering 40 enemies, each with its own set of rules, can really bog down a system.
You should at least do some profiling to check what percentage of the frame time goes into the calculations. You might also want to compile with -ffast-math (GCC/Clang), or whatever your compiler's equivalent flag is, to relax the strict IEEE floating-point requirements.
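If you don't have a profiler handy, a rough first check could look something like the sketch below. It just times a chunk of work with std::chrono and turns two timings into a percentage. The names secondsTaken and percentOf are made up for this example, and what you pass in as "the AI update" and "the whole frame" depends on your own code.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed wall time in seconds.
template <typename F>
double secondsTaken(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: share of `whole` that `part` accounts for, in percent.
// Returns 0 when `whole` is not positive, to avoid dividing by zero.
double percentOf(double part, double whole) {
    return whole > 0.0 ? 100.0 * part / whole : 0.0;
}
```

Time the AI update on its own, time the whole frame, and feed both numbers to percentOf. A single measurement is noisy, so average over many frames before drawing conclusions.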
I see you use the espeed and rotrad vars in the computation. If those vars don't depend on the individual agent, do the computation once before the loop, and then do the pos.Z += and pos.X += stuff inside the loop with the already computed values. Simply move as much as you can out of the loop.
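Roughly what I mean, as a sketch only. I'm assuming espeed and rotrad are the same for every agent, and I'm guessing at which axis gets sin and which gets cos, so swap them to match your code. Vec3 and moveAgents are placeholder names, not anything from your project.

```cpp
#include <cmath>
#include <vector>

// Placeholder position type; the real one presumably comes from your engine.
struct Vec3 {
    float X, Y, Z;
};

// Moves every agent by the same step.
// Assumption: espeed and rotrad are shared by all agents, so the
// trig calls can be done once instead of once per agent.
void moveAgents(std::vector<Vec3>& positions, float espeed, float rotrad) {
    // Computed once, outside the loop.
    const float dx = std::sin(rotrad) * espeed;
    const float dz = std::cos(rotrad) * espeed;

    // The loop itself is now just two additions per agent.
    for (Vec3& pos : positions) {
        pos.X += dx;
        pos.Z += dz;
    }
}
```

If each agent has its own rotation, this doesn't apply as written, but you can still cache the sin/cos per agent and only recompute them when the rotation actually changes.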
The nice thing about sin and cos is that they are periodic, so any input can be reduced to a fixed range, [0, 2*PI). This makes them a good candidate for optimization using lookup tables. The precision won't be perfect, but for most cases it will be adequate as long as the operations are in a situation where the error won't become cumulative.
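Here is a minimal sketch of the kind of table I have in mind. The table size (1024), the nearest-entry lookup without interpolation, and the names fastSin, fastCos and sinTable are all my own choices for illustration. With this size the error should stay around 0.003 at worst, so measure before assuming it is a win in your code.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

constexpr std::size_t kTableSize = 1024;  // power of two, so wrapping is a cheap bit mask
constexpr float kTwoPi = 6.28318530717958647692f;

// Builds the table on first use; entry i holds sin(2*PI * i / kTableSize).
inline const std::array<float, kTableSize>& sinTable() {
    static const std::array<float, kTableSize> table = [] {
        std::array<float, kTableSize> t{};
        for (std::size_t i = 0; i < kTableSize; ++i) {
            t[i] = std::sin(kTwoPi * static_cast<float>(i) / static_cast<float>(kTableSize));
        }
        return t;
    }();
    return table;
}

// Approximate sine: reduce the angle to one period, then pick the nearest entry.
inline float fastSin(float radians) {
    float turns = radians / kTwoPi;
    turns -= std::floor(turns);  // fractional part, also correct for negative angles
    const std::size_t index =
        static_cast<std::size_t>(turns * kTableSize + 0.5f) & (kTableSize - 1);
    return sinTable()[index];
}

// Cosine is sine shifted by a quarter period.
inline float fastCos(float radians) {
    return fastSin(radians + kTwoPi / 4.0f);
}
```

Each call returns a value with a small fixed error, which is fine for something like a one-off direction vector. If you keep feeding the result back into the next step, those errors add up, which is the cumulative case to avoid.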
You do a lot of programming? Really? I try to get some in, but the debugging keeps me pretty busy.
Well noone88, I set up a whole demo with lookup tables to prove you wrong, but it turns out you were mostly right. In my tests, a lookup table was only 2-3 times faster than std::sin.
Electron, in your test did you also scatter the sin/cos computations around in code that spends most of its time doing other things and moving big heaps of memory around? Because in a naive test application with the computations done in a for-loop, you have the whole table in the cache the whole time. There are no cache misses at all, which is not true for a typical application, where the table moves in and out of the cache all the time, and that makes a LUT much less worthwhile. This topic has been discussed often in other forums, and most came to the conclusion that, if anything, calculation is preferable to a LUT in the end.
Most math implementations in C libraries are based on LUTs, so it's usually not necessary to do it on your own. Moreover, most processors have hardware LUTs, so the IEEE precision fixes won't make it that slow (as you saw).