Hey guys,
I have been doing some work with elementary functions and wanted to increase the speed of ln(x) and exp(x). I believe most compilers do not take full advantage of the cpu and thus I am writing my own. However before I can optimize I need an algorithm that shows promise. I tried an implementation based off AGM but for some reason my version is 10 times slower! They say it is the fastest implementation but I am not seeing the results of my end. I am trying to calculate to 53 bit precision since that is IEEE 754 double mantissa precision.
I haven't tried the AGM 2 or 3 only AGM 1 so far. The
AGM 1-2 Method:http://rnc7.loria.fr/brent_invited.pdf
AGM 3 Iteration:http://rnc7.loria.fr/brent_invited.pdf
Anyone have any advice on what algorithm compilers use?
Fastest LN(X) Implementation: What do compilers use?
-
- Posts: 288
- Joined: Wed Oct 29, 2008 12:07 pm
Re: Fastest LN(X) Implementation: What do compilers use?
Have you checked the source of LLVM, GCC, glibc, uclibc, musl, and a number of other open source compilers and libcs?
-
- Posts: 288
- Joined: Wed Oct 29, 2008 12:07 pm
Re: Fastest LN(X) Implementation: What do compilers use?
The gcc compiler use a Taylor series approximation for a number in between 1/sqrt(2) and sqrt(2) using an argument reduction formula. For 53 bits it looks like they truncate it at about 6 summations which surprised me. I guess the AGM method is for bits >> 53 like 1000 bit precision etc.
Re: Fastest LN(X) Implementation: What do compilers use?
The gcc uses a very slow way to calculate the logarithm.
For normal x86 machines, the assembler command fyl2x exists which calculates the logarithm to the basis 2 of the first and multiplies the result with the second operand. As a formula: y*log2(x).
I tested this code with Microsoft Visual C++ 10 and GCC 4.6.1.
In Visual C++ the assembler function is as fast as the in normal code one, but only if compiler optimization is set to full.
In GCC the assembler function is always 3.5 times faster.
This calculates the logarithm to the basis of 2 but not the natural logarithm. To do this you can just pass the value 1/log2(e) as a second parameter to the function:
For Visual Studio this is a little bit slower than the normal log (in my tests about 1.075 times slower) and for gcc faster by the same factor.
For the calculation of the natural logarithmus there is no real performance gain but for a combination with a multiplication it is much faster than just using the old log.
For normal x86 machines, the assembler command fyl2x exists which calculates the logarithm to the basis 2 of the first and multiplies the result with the second operand. As a formula: y*log2(x).
Code: Select all
// y*log2(x)
// To set a specific set y to 1/log2(base)
#if defined(_MSC_VER)
__forceinline __declspec(naked) float __cdecl log2(float x, float y=1.0f)
{
__asm fld [esp+8]; // push y onto stack
__asm fld [esp+4]; // push x onto the stack
__asm fyl2x; // calculate
__asm retn; // exit function
}
#elif defined (__GNUC__)
__inline float log2(float x, float y=1.0f)
{
float result;
asm ("fyl2x" : "=t" (result) : "0" (x), "u" (y));
return result;
}
#else
__inline float log2(float x, float y=1.0f)
{
static const float _1_div_log_2 = 1/log(2.0f);
return y*log(x)*_1_div_log_2;
}
#endif
In Visual C++ the assembler function is as fast as the in normal code one, but only if compiler optimization is set to full.
In GCC the assembler function is always 3.5 times faster.
This calculates the logarithm to the basis of 2 but not the natural logarithm. To do this you can just pass the value 1/log2(e) as a second parameter to the function:
Code: Select all
inline float ln(float x)
{
static const float y = 0.69314718055994530941f; // 1/log2(e);
return log2(x, y);
}
For the calculation of the natural logarithmus there is no real performance gain but for a combination with a multiplication it is much faster than just using the old log.
Rust fanboy