I wanted to rewrite the cos/sin functions to see how their performance compares to the std ones. My solution uses a mixed approach of branching and lookup tables (8 MB of lookup tables). I know lookup tables seem like an ugly solution at first look. But in the end my version is 20% faster than the std one (precision is the same, except that sometimes the last bit of the mantissa is rounded down). At this point my question is: is that really a gain?
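The basic idea is something like this (just a simplified sketch of the general table technique, not my real code; the real version mixes branching with several tables, and names like g_cosTable and TABLE_SIZE are only for illustration):

Code:
#include <cmath>
#include <cstddef>

namespace myspace
{
    // 2^21 floats * 4 bytes = 8 MB of table.
    const std::size_t TABLE_SIZE = std::size_t(1) << 21;
    static float g_cosTable[TABLE_SIZE];
    const double kTwoPi = 6.283185307179586;

    void init()
    {
        for (std::size_t i = 0; i < TABLE_SIZE; ++i)
            g_cosTable[i] = float(std::cos(i * (kTwoPi / TABLE_SIZE)));
    }

    inline float cos(float x)
    {
        // Reduce the argument to [0, 2*pi) and map it to a table index.
        double r = std::fmod(double(x), kTwoPi);
        if (r < 0.0)
            r += kTwoPi;
        std::size_t idx = std::size_t(r * (TABLE_SIZE / kTwoPi));
        return g_cosTable[idx & (TABLE_SIZE - 1)]; // mask guards the edge case r ~ 2*pi
    }
}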
OK, the code for measuring the speed is something like this:
Code:
float* a = new float[10000000]; // input array
... // fill the array with random numbers
Timer T; // thread-affinity timer with nanosecond precision
volatile float sink = 0.0f; // keeps the compiler from optimizing the calls away
T.start();
for (int i = 0; i < 10000000; ++i)
{
    sink += std::cos(a[i]);
}
T.print();
T.reset();
T.start();
for (int i = 0; i < 10000000; ++i)
{
    sink += myspace::cos(a[i]);
}
T.print();
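(The volatile accumulator matters: if the results of the calls are just thrown away, the compiler is allowed to delete the whole loop and the timings become meaningless.)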
(I tried different randomizers for filling the array, including the Irrlicht one.)
OK, at this point I figured that this function always has to load new data into the cache, since the 8 MB of lookup tables can't all be cached at once. The random input array forces data to be reloaded from memory very frequently.
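One way I could test that hypothesis (again just a sketch; makeInputs is a helper I'd write for this, not existing code): time the same loop twice, once with arguments spread over the whole period so every part of the 8 MB table gets touched, and once with arguments restricted to a tiny interval so the touched slice of the table stays in cache. A big gap between the two timings would point at cache misses rather than arithmetic cost.

Code:
#include <cstddef>
#include <random>
#include <vector>

std::vector<float> makeInputs(std::size_t n, float lo, float hi)
{
    std::mt19937 rng(12345); // fixed seed so runs are repeatable
    std::uniform_real_distribution<float> dist(lo, hi);
    std::vector<float> v(n);
    for (std::size_t i = 0; i < n; ++i)
        v[i] = dist(rng);
    return v;
}

// Wide range: inputs hit the whole table, so expect cache misses.
// Narrow range: the touched table slice fits in cache.
std::vector<float> wide   = makeInputs(10000000, 0.0f, 6.2831853f);
std::vector<float> narrow = makeInputs(10000000, 0.0f, 0.001f);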
Is that function worth it? My computer has a relatively small cache compared to modern machines. But even if the function is faster despite the cache reloads, maybe there are other side effects (less precision: sometimes the least significant bit of the mantissa is lost). And is the bus kept busy during this process? I suspect this faster function might slow down other processes on the computer; I don't know if that can happen, or how to measure the running time of other processes. So at first look I have a faster function, ready to share on SourceForge if it would be useful to anyone. But is it really faster, or is the bus just busier, so other processes slow down?
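The best idea I've had so far for probing the bus question (again only a sketch, nothing I've measured yet): run a memory-bound worker thread alongside the benchmark and count how many passes over a large buffer it manages. If the worker completes noticeably fewer passes while the table version runs than while the std version runs, the lookups really are competing for memory bandwidth.

Code:
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

int main()
{
    std::atomic<bool> stop(false);
    std::atomic<std::uint64_t> passes(0);

    // Memory-bound worker: streams through a buffer much larger than
    // any cache, counting full passes over it.
    std::thread worker([&]{
        std::vector<std::uint64_t> buf(std::size_t(1) << 24, 1); // 128 MB
        volatile std::uint64_t sum = 0;
        while (!stop.load(std::memory_order_relaxed))
        {
            for (std::size_t i = 0; i < buf.size(); ++i)
                sum += buf[i];
            passes.fetch_add(1, std::memory_order_relaxed);
        }
    });

    // ... run the cos benchmark here instead of sleeping ...
    std::this_thread::sleep_for(std::chrono::seconds(5));

    stop = true;
    worker.join();
    // Fewer passes per second next to the table version would mean the
    // lookups are eating shared memory bandwidth.
    std::printf("worker passes: %llu\n",
                (unsigned long long)passes.load());
    return 0;
}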