IRRLICHT_FAST_MATH ceil32 and floor32

CuteAlien · Post by **CuteAlien** » Tue Jan 28, 2014 11:53 am

Bruce Dawson wrote an article about testing optimized floating point functions like ceil, floor, round: http://randomascii.wordpress.com/2014/0 ... -them-all/

So I had to test it for Irrlicht:

Code: Select all

 
// Test of ceil functions based based on http://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all/
#define IRRLICHT_FAST_MATH 1
#include <irrMath.h>
#include <cstdio>
#include <cstdint>
 
// have to wrap the functions to receive and return float
float ceil32f ( float x )
{
    return (float)irr::core::ceil32(x);
}
 
float orig_ceil(float x )
{
    return (float)ceil(x);
}
     
union Float_t
{
    Float_t(float num = 0.0f) : f(num) {}
    // Portable extraction of components.
    bool Negative() const { return (i >> 31) != 0; }
    int32_t RawMantissa() const { return i & ((1 << 23) - 1); }
    int32_t RawExponent() const { return (i >> 23) & 0xFF; }
 
    int32_t i;
    float f;
#ifdef _DEBUG
    struct
    {   // Bitfields for exploration. Do not use in production code.
        uint32_t mantissa : 23;
        uint32_t exponent : 8;
        uint32_t sign : 1;
    } parts;
#endif
};   
 
typedef float(*Transform)(float);
 
// Pass in a uint32_t range of float representations to test.
// start and stop are inclusive. Pass in 0, 0xFFFFFFFF to scan all
// floats. The floats are iterated through by incrementing
// their integer representation.
void ExhaustiveTest(uint32_t start, uint32_t stop, Transform TestFunc,
    Transform RefFunc, const char* desc)
{
    printf("Testing %s from %u to %u (inclusive).\n", desc, start, stop);
    // Use long long to let us loop over all positive integers.
    long long i = start;
    while (i <= stop)
    {
        Float_t input;
        input.i = (int32_t)i;
        Float_t testValue = TestFunc(input.f);
        Float_t refValue = RefFunc(input.f);
        // If the results don’t match then report an error.
        if (testValue.f != refValue.f &&
            // If both results are NaNs then we treat that as a match.
            (testValue.f == testValue.f || refValue.f == refValue.f))
        {
            printf("Input %.9g, expected %.9g, got %1.9g        \n",
            input.f, refValue.f, testValue.f);
        }
 
        ++i;
    }
}
 
int main () 
{
    // This is the biggest number that can be represented in
    // both float and int32_t. It’s 2^31-128.
    Float_t maxfloatasint(2147483520.0f);
    const uint32_t signBit = 0x80000000;
    ExhaustiveTest(0, (uint32_t)maxfloatasint.i, ceil32f, orig_ceil,
                "ceil");
    ExhaustiveTest(signBit, signBit | maxfloatasint.i, ceil32f, orig_ceil,
                "ceilSignBit"); 
    
    return 0;
}

Some notes:
- I tested on Linux
- It will only compile with latest irrlicht trunk version
- You have to enable -std=c++11 for compiling this

Unfortunately my computer is obviously not as fast as the one Bruce was using for his tests as it is running here for a lot longer than the 90 seconds already. But spitting out different results already for a while which show the Irrlicht implementation for ceil32 is not correct.

I also tested floor32 for a little and it was just as bad. If fails for all odd numbers (so yeah, it fails with 1, 3, 5, etc - which makes me wonder if that function ever was tested in any way).

I'm not going to try to fix that - my recommendation would be to kick those functions out. That's unless someone is motivated enough to rewrite them and then does actually test them.

hendu · Post by **hendu** » Wed Jan 29, 2014 10:22 am

Agreed, please nuke them. I've never used that define, and it's disabled by default - maybe consider nuking all irrlicht_fast_math?

CuteAlien · Post by **CuteAlien** » Wed Jan 29, 2014 10:44 am

Testing this I also found out that Konsole (KDE console) crashes when an application printf's 25 gb of text-lines ;-)

Granyte · Post by **Granyte** » Wed Jan 29, 2014 12:24 pm

we should either nuke it or make it use intrinsic and sse

CuteAlien · Post by **CuteAlien** » Wed Jan 29, 2014 5:37 pm

Granyte wrote:we should either nuke it or make it use intrinsic and sse

Sure, if someone writes a version which improves speed and actually works that's fine and can be added. Unless that happens I guess deprecating them is the only solution.

Further testing showed that round32 also has pretty bad results. But even round_ fails with very small and with larger numbers (over 8.000.000). We might need round_ on some platforms (wasn't originally in math.h from the c-lib but only got added at some point), so I guess would be nice to figure out a working solution for that one (which must not be written in asm).

Granyte · Post by **Granyte** » Wed Jan 29, 2014 5:53 pm

well we can use the intrinsic unless they are not the same on ARM platforms

CuteAlien · Post by **CuteAlien** » Wed Jan 29, 2014 11:02 pm

Granyte wrote:well we can use the intrinsic unless they are not the same on ARM platforms

Sorry, I don't know about which intrinsic functions you are talking here. I googled a little around and found intrinsic functions for hlsl which do stuff like ceil and floor, but I didn't find that for the PC platform so far.

Granyte · Post by **Granyte** » Thu Jan 30, 2014 12:20 am

Code: Select all

_mm_ceil_ss() //f32
 
_mm_ceil_sd() //f64

are some but they are specific to sse 4.1 how ever they are only 1-3 cycle long

how ever there are others but I don't have them on hand atm

CuteAlien · Post by **CuteAlien** » Thu Jan 30, 2014 10:44 am

Thanks. Must admit I don't get the documentation. What are the first 4 floats for? They get returned untouched and only the last float is ceil'ed? Anyway - I would be fine if someone implements it correct (aka tested and with right defines etc).

But this is not really my area of code in the engine. I just checked and those functions are ~99% used by the Burnings renderer. So maybe all the errors don't matter there or are even deliberate (thought it would wonder me at least in the case of floor32 - I already see at least 1 place in the GUI where that might lead to one-pixel-off bugs). The real problem for me in that case would be that those functions are public while they should only be available internal in that case. So I'm going to pass that on for now and wait if Thomas has some feedback on that.

If Thomas doesn't care I'm going to make them internal functions (which he can then fix whenever he has time and needs that), deprecate the public ones and for now make the public functions return the c-lib stuff. Intrinsic calls can be added later if someone wants to do that, although I guess if anyone really needs those he can also just use them in his code.

Granyte · Post by **Granyte** » Thu Jan 30, 2014 4:33 pm

only the first float get ceiled thier 3 other get picked from the second vector and a _m128 get returned with a the first float from vector a ceiled and the 3 other from vector b untouched.

but ya the single operand version act really weird.

i'm in the process of implementing a sse based math library for irrlicht to speed up matrices so I could add the basic math to the project to

Irrlicht Engine

IRRLICHT_FAST_MATH ceil32 and floor32

IRRLICHT_FAST_MATH ceil32 and floor32

Re: IRRLICHT_FAST_MATH ceil32 and floor32

Re: IRRLICHT_FAST_MATH ceil32 and floor32

Re: IRRLICHT_FAST_MATH ceil32 and floor32

Re: IRRLICHT_FAST_MATH ceil32 and floor32

Re: IRRLICHT_FAST_MATH ceil32 and floor32

Re: IRRLICHT_FAST_MATH ceil32 and floor32

Re: IRRLICHT_FAST_MATH ceil32 and floor32

Re: IRRLICHT_FAST_MATH ceil32 and floor32

Re: IRRLICHT_FAST_MATH ceil32 and floor32