I wanted to rewrite the cos/sin functions to see how their performance compares to the std ones. My solution uses a mixed approach of branching and lookup tables (8 MB of lookup tables). I know lookup tables seem like an ugly solution at first look. But in the end my version is 20% faster than the std one (precision is the same, except that sometimes the last bit of the mantissa is rounded down). At this point my question is: is that really a gain?
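The basic idea is something like this (just a simplified sketch of the general table technique, not my real code; the real version mixes branching with several tables, and names like g_cosTable and TABLE_SIZE are only for illustration):

Code:
#include <cmath>
#include <cstddef>

namespace myspace
{
    // 2^21 floats * 4 bytes = 8 MB of table.
    const std::size_t TABLE_SIZE = std::size_t(1) << 21;
    static float g_cosTable[TABLE_SIZE];
    const double kTwoPi = 6.283185307179586;

    void init()
    {
        for (std::size_t i = 0; i < TABLE_SIZE; ++i)
            g_cosTable[i] = float(std::cos(i * (kTwoPi / TABLE_SIZE)));
    }

    inline float cos(float x)
    {
        // Reduce the argument to [0, 2*pi) and map it to a table index.
        double r = std::fmod(double(x), kTwoPi);
        if (r < 0.0)
            r += kTwoPi;
        std::size_t idx = std::size_t(r * (TABLE_SIZE / kTwoPi));
        return g_cosTable[idx & (TABLE_SIZE - 1)]; // mask guards the edge case r ~ 2*pi
    }
}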
OK, the code for measuring the speed is something like this:
Code:
float* a = new float[10000000]; // input array
... // fill the array with random numbers
Timer T; // thread-affinity timer with nanosecond precision
volatile float sink = 0.0f; // keeps the compiler from optimizing the calls away
T.start();
for (int i = 0; i < 10000000; ++i)
{
    sink += std::cos(a[i]);
}
T.print();
T.reset();
T.start();
for (int i = 0; i < 10000000; ++i)
{
    sink += myspace::cos(a[i]);
}
T.print();
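(The volatile accumulator matters: if the results of the calls are just thrown away, the compiler is allowed to delete the whole loop and the timings become meaningless.)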
(I tried different randomizers for filling the array, including the Irrlicht one.)
OK, at this point I figured that this function always has to load new data into the cache, since the 8 MB of lookup tables can't all be cached at once. The random input array forces data to be reloaded from memory very frequently.
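One way I could test that hypothesis (again just a sketch; makeInputs is a helper I'd write for this, not existing code): time the same loop twice, once with arguments spread over the whole period so every part of the 8 MB table gets touched, and once with arguments restricted to a tiny interval so the touched slice of the table stays in cache. A big gap between the two timings would point at cache misses rather than arithmetic cost.

Code:
#include <cstddef>
#include <random>
#include <vector>

std::vector<float> makeInputs(std::size_t n, float lo, float hi)
{
    std::mt19937 rng(12345); // fixed seed so runs are repeatable
    std::uniform_real_distribution<float> dist(lo, hi);
    std::vector<float> v(n);
    for (std::size_t i = 0; i < n; ++i)
        v[i] = dist(rng);
    return v;
}

// Wide range: inputs hit the whole table, so expect cache misses.
// Narrow range: the touched table slice fits in cache.
std::vector<float> wide   = makeInputs(10000000, 0.0f, 6.2831853f);
std::vector<float> narrow = makeInputs(10000000, 0.0f, 0.001f);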
Is that function worth it? My computer has a relatively small cache compared to modern machines. But even if the function is faster despite the cache reloads, maybe there are other side effects (less precision: sometimes the least significant bit of the mantissa is lost). And is the bus kept busy during this process? I suspect this faster function might slow down other processes on the computer; I don't know if that can happen, or how to measure the running time of other processes. So at first look I have a faster function, ready to share on SourceForge if it would be useful to anyone. But is it really faster, or is the bus just busier, so other processes slow down?
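The best idea I've had so far for probing the bus question (again only a sketch, nothing I've measured yet): run a memory-bound worker thread alongside the benchmark and count how many passes over a large buffer it manages. If the worker completes noticeably fewer passes while the table version runs than while the std version runs, the lookups really are competing for memory bandwidth.

Code:
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

int main()
{
    std::atomic<bool> stop(false);
    std::atomic<std::uint64_t> passes(0);

    // Memory-bound worker: streams through a buffer much larger than
    // any cache, counting full passes over it.
    std::thread worker([&]{
        std::vector<std::uint64_t> buf(std::size_t(1) << 24, 1); // 128 MB
        volatile std::uint64_t sum = 0;
        while (!stop.load(std::memory_order_relaxed))
        {
            for (std::size_t i = 0; i < buf.size(); ++i)
                sum += buf[i];
            passes.fetch_add(1, std::memory_order_relaxed);
        }
    });

    // ... run the cos benchmark here instead of sleeping ...
    std::this_thread::sleep_for(std::chrono::seconds(5));

    stop = true;
    worker.join();
    // Fewer passes per second next to the table version would mean the
    // lookups are eating shared memory bandwidth.
    std::printf("worker passes: %llu\n",
                (unsigned long long)passes.load());
    return 0;
}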