SquareRoot Function Improve

Discuss about anything related to the Irrlicht Engine, or read announcements about any significant features or usage changes.
Post Reply
Zurzaza
Posts: 153
Joined: Fri Mar 26, 2010 9:41 pm
Location: Filo (FE), Italy
Contact:

SquareRoot Function Improve

Post by Zurzaza »

Hi, Surfing in the internet, I found an interesting article that was speaking about some improved square root functions.

Using this function instead of the classic sqrt() function, can improve the engine speed. (Useful over 10000 calls).

I think that the assembly square-root isn't a portable solution, but i'm not sure of that.

Code: Select all

REALINLINE double fastsqrt(double n) //The fastest
{
 	  __asm{
	     fld n
	     fsqrt
	   }
} 


REALINLINE float fastsqrt2(const float x)  
{
  const float xhalf = 0.5f*x;
 
  union 
  {
    float x;
    int i;
  } u;
  u.x = x;
  u.i = 0x5f3759df - (u.i >> 1);  
  return x*u.x*(1.5f - xhalf*u.x*u.x);
}

int main()
{
[..]
	 irr::u32 irrT,fastT;
//Prima = before ; dopo = after
	 irr::u32 prima = device->getTimer()->getRealTime();

	 for(int i=0;i<10000;i++)
		 irr::core::squareroot(128.0f);

	 irr::u32 dopo = device->getTimer()->getRealTime();

	 irrT = dopo - prima;
	 cout<<"IRRLICHT' SQUARE ROOT TIME: "<<irrT<<endl;
	 cout<<"IRRLICHT' SQUARE ROOT RES: "<<squareroot(128.0f)<<endl;

	 prima = device->getTimer()->getRealTime();

	 for(int i=0;i<10000;i++)
		 fastsqrt(128.0f);

	 dopo = device->getTimer()->getRealTime();

	 fastT = dopo - prima;
	 cout<<"FAST SQUARE ROOT TIME: "<<fastT<<endl;
	 cout<<"FAST SQUARE ROOT RES: "<<fastsqrt(128.0f)<<endl;
	 	 prima = device->getTimer()->getRealTime();
	 for(int i=0;i<10000;i++)
		 fastsqrt2(128.0f);

	 dopo = device->getTimer()->getRealTime();

	 fastT = dopo - prima;
	 cout<<"FAST SQUARE ROOT (2nd) TIME: "<<fastT<<endl;
	 cout<<"FAST SQUARE ROOT (2nd) RES: "<<fastsqrt2(128.0f)<<endl;
[..]
}
Hope this can be helpful
shadowslair
Posts: 758
Joined: Mon Mar 31, 2008 3:32 pm
Location: Bulgaria

Post by shadowslair »

Here`re my results for 100k calls:

Code: Select all

IRRLICHT' SQUARE ROOT TIME: 20
IRRLICHT' SQUARE ROOT RES: 11.3137

FAST SQUARE ROOT TIME: 10
FAST SQUARE ROOT RES: 11.3137

FAST SQUARE ROOT (2nd) TIME: 14
FAST SQUARE ROOT (2nd) RES: 11.3109
Anyone comment what are the pros and cons of this sqrt func and its portability please? :)
"Although we walk on the ground and step in the mud... our dreams and endeavors reach the immense skies..."
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany
Contact:

Post by hybrid »

If you really made the test case as above, I guess you should update your compiler. The loop should be solved at compile time :roll: Anyway, portability is near zero, because the asm code depends on the processor and the correct compiler.
sudi
Posts: 1686
Joined: Fri Aug 26, 2005 8:38 pm

Post by sudi »

try this:
code from http://www.codemaestro.com/reviews/9

Code: Select all

#include <irrlicht.h>
float InvSqrt(float x)
{
  float xhalf = 0.5f*x;
  int i = *(int*)&x; // get bits for floating value
  i = 0x5f375a86- (i>>1); // gives initial guess y0
  x = *(float*)&i; // convert bits back to float
  x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy
  return x;
}

float Sqrt(float x)
{
    return 1.f/InvSqrt(x);
}

int main(void)
{

    irr::f32 n = 0.0f;

    scanf("%f", &n);


    irr::u32 start = clock();

    for(int i=0;i<10000000;i++)
       irr::core::squareroot(n);

    irr::u32 end = clock();

    irr::u32 irrT = end - start;
    nlog<<"IRRLICHT' SQUARE ROOT TIME: "<<irrT<<nlendl;
    nlog<<"IRRLICHT' SQUARE ROOT RES: "<<irr::core::squareroot(n)<<nlendl;

    start = clock();

    for(int i=0;i<10000000;i++)
       Sqrt(n);

    end = clock();

    irrT = end - start;
    nlog<<"QUAKE3' SQUARE ROOT TIME: "<<irrT<<nlendl;
    nlog<<"QUAKE3' SQUARE ROOT RES: "<<Sqrt(n)<<nlendl;


    return 0;
}

Last edited by sudi on Sat Sep 04, 2010 9:58 pm, edited 1 time in total.
We're programmers. Programmers are, in their hearts, architects, and the first thing they want to do when they get to a site is to bulldoze the place flat and build something grand. We're not excited by renovation:tinkering,improving,planting flower beds.
slavik262
Posts: 753
Joined: Sun Nov 22, 2009 9:25 pm
Location: Wisconsin, USA

Post by slavik262 »

http://en.wikipedia.org/wiki/Fast_inverse_square_root
http://en.wikipedia.org/wiki/Methods_of ... esentation

It mostly stems from back in the day when vector calculations were done on the processor instead of the GPU via a shader. Also, like Hybrid said, it makes a lot of assumptions about the hardware.
shadowslair
Posts: 758
Joined: Mon Mar 31, 2008 3:32 pm
Location: Bulgaria

Post by shadowslair »

@hybrid: I`m using VisualC++ 2008 def compiler. I called device->sleep(10000); to be able to copy the results from the console window. X)

@all: This is some interesting reading. Worth take a look and make some tests when have some time. Thank you all. :wink:
"Although we walk on the ground and step in the mud... our dreams and endeavors reach the immense skies..."
Post Reply