Irrlicht Engine

Posted: **Sat Dec 01, 2012 2:27 pm**

Hey guys,

I have been doing some work with elementary functions and wanted to increase the speed of ln(x) and exp(x). I believe most compilers do not take full advantage of the cpu and thus I am writing my own. However before I can optimize I need an algorithm that shows promise. I tried an implementation based off AGM but for some reason my version is 10 times slower! They say it is the fastest implementation but I am not seeing the results of my end. I am trying to calculate to 53 bit precision since that is IEEE 754 double mantissa precision.

I haven't tried the AGM 2 or 3 only AGM 1 so far. The

AGM 1-2 Method:http://rnc7.loria.fr/brent_invited.pdf

AGM 3 Iteration:http://rnc7.loria.fr/brent_invited.pdf

Anyone have any advice on what algorithm compilers use?

Posted: **Sat Dec 01, 2012 3:10 pm**

Have you checked the source of LLVM, GCC, glibc, uclibc, musl, and a number of other open source compilers and libcs?

Posted: **Sat Dec 01, 2012 8:57 pm**

The gcc compiler use a Taylor series approximation for a number in between 1/sqrt(2) and sqrt(2) using an argument reduction formula. For 53 bits it looks like they truncate it at about 6 summations which surprised me. I guess the AGM method is for bits >> 53 like 1000 bit precision etc.

Posted: **Wed Dec 12, 2012 7:30 pm**

The gcc uses a very slow way to calculate the logarithm.

For normal x86 machines, the assembler command fyl2x exists which calculates the logarithm to the basis 2 of the first and multiplies the result with the second operand. As a formula: y*log2(x).

Code: Select all

// y*log2(x)
// To set a specific set y to 1/log2(base) 
#if defined(_MSC_VER)
__forceinline __declspec(naked) float __cdecl log2(float x, float y=1.0f)
{
    __asm fld [esp+8];  // push y onto stack
    __asm fld [esp+4];  // push x onto the stack
    __asm fyl2x;        // calculate 
    __asm retn;         // exit function
}
#elif defined (__GNUC__)
__inline float log2(float x, float y=1.0f)
{
    float result;
    asm ("fyl2x" : "=t" (result) : "0" (x), "u" (y));
    return result;
}
#else
__inline float log2(float x, float y=1.0f)
{
    static const float _1_div_log_2 = 1/log(2.0f);
    return y*log(x)*_1_div_log_2;
}
#endif

I tested this code with Microsoft Visual C++ 10 and GCC 4.6.1.
In Visual C++ the assembler function is as fast as the in normal code one, but only if compiler optimization is set to full.
In GCC the assembler function is always 3.5 times faster.

This calculates the logarithm to the basis of 2 but not the natural logarithm. To do this you can just pass the value 1/log2(e) as a second parameter to the function:

Code: Select all

inline float ln(float x)
{
    static const float y = 0.69314718055994530941f; // 1/log2(e);
    return log2(x, y);
}

For Visual Studio this is a little bit slower than the normal log (in my tests about 1.075 times slower) and for gcc faster by the same factor.
For the calculation of the natural logarithmus there is no real performance gain but for a combination with a multiplication it is much faster than just using the old log.

Irrlicht Engine

Fastest LN(X) Implementation: What do compilers use?

Fastest LN(X) Implementation: What do compilers use?

Re: Fastest LN(X) Implementation: What do compilers use?

Re: Fastest LN(X) Implementation: What do compilers use?

Re: Fastest LN(X) Implementation: What do compilers use?