Endian Issues

zenaku
Posts: 212
Joined: Tue Jun 07, 2005 11:23 pm

Post by zenaku »

What's wrong with the IColor idea?

Code: Select all

IColor color = driver->SColor(r,g,b,a);    //stack alloc, faster but uses stack
IColor *color2 = driver->SColor(r,g,b,a);  //heap alloc
...

//provide stack allocation method
IColor COpenGLDriver::SColor(s32 r, s32 g, s32 b, s32 a)
{
      return COpenGLColor(r,g,b,a);
}

//provide heap allocation method
IColor *COpenGLDriver::SColor(s32 r, s32 g, s32 b, s32 a)
{
      return new COpenGLColor(r,g,b,a);
}

...
class IColor
{
   protected:
   //make constructor protected so that only the driver can create them
   //(having pure virtual methods also works)
   IColor()
   {
       ...
   }
};

class COpenGLColor : public IColor
{
    public:
    COpenGLColor(s32 r, s32 g, s32 b, s32 a)
    {
        ...
    }
    ...
};
AFAIK you gain four bytes per object due to the vtable pointer, and all virtual calls are indirect. I don't know if that's worth it performance-wise, but the design is cleaner, IMO.
-------------------------------------
IrrLua - a Lua binding for Irrlicht
http://irrlua.sourceforge.net/
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

Conceptually it's perfect. Unfortunately, it seems that in practice it would be slow and inefficient. The thought of millions of heap manipulations is rotten.

I suppose we could manage our own cache of available color objects, so as to minimize heap de/allocations, but that could lead to inefficient use of space, and there would still be the overhead of management.
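Such a cache could look roughly like this (a purely hypothetical sketch; IColor is the proposed interface from above, and createColor() is a made-up driver factory method):

Code: Select all

#include <vector>

// Hypothetical sketch of such a cache; not part of Irrlicht.
// Recycles heap-allocated IColor objects to cut down on new/delete calls.
class CColorPool
{
public:
    CColorPool(IVideoDriver* driver) : Driver(driver) {}

    ~CColorPool()
    {
        for (size_t i = 0; i < Free.size(); ++i)
            delete Free[i];
    }

    // Hand out a recycled color if one is available, otherwise allocate.
    IColor* acquire(s32 r, s32 g, s32 b, s32 a)
    {
        if (!Free.empty())
        {
            IColor* c = Free.back();
            Free.pop_back();
            c->set(r, g, b, a);                 // assumes IColor has a set()
            return c;
        }
        return Driver->createColor(r, g, b, a); // made-up factory method
    }

    // Return a color to the cache instead of deleting it.
    void release(IColor* c) { Free.push_back(c); }

private:
    std::vector<IColor*> Free;
    IVideoDriver* Driver;
};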

Moreover, it isn't possible to have the best of both worlds:

Code: Select all

IColor color = driver->SColor(r,g,b,a);    //stack alloc, faster but uses stack 
IColor *color2 = driver->SColor(r,g,b,a);  //heap alloc 
There is no such thing as return-type overloading, and the stack doesn't allow for runtime polymorphism.

I investigated another trick, but came to the conclusion that it wouldn't work here (convince me otherwise, please): http://en.wikipedia.org/wiki/Curiously_ ... te_Pattern

I still believe the best marriage of Practice and Abstraction is the use of the video driver.

IVideoDriver* driver = device->getVideoDriver();

u32 color = driver->ARGB(...);
u32 color = driver->RGB(...);
u32 color = driver->BGRA(...);
driver->setR(&color, ....);

etc

Then the only overhead is the use of virtual methods that create u32 stack values; in practice, the programmer will generally never need to worry about what's going on, and in theory, the programmer can still use different devices by converting properly, as one expects:

u32 color2 = driverD3D->RGB(driverGL->getR(color), driverGL->getG(color), driverGL->getB(color));
or
u32 color2 = driverD3D->ARGB(color, EDT_OPENGL);
or
driverD3D->ARGB(&color, EDT_OPENGL);

Let me add that I think this need to convert colors is nonexistent; however, the need to create device-dependent colors as quickly and as transparently as possible is very real.

Moreover, neither D3D (from what I know) nor OpenGL supports pure colors in various formats, so I think it unnecessary to consider them for now: 32 bits, 8 bits per component (4 bytes per pixel), ARGB is all that matters.
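Something like this is what I'm picturing for the OpenGL side (just a sketch; the method names are the ones proposed above, not existing Irrlicht API, and the ARGB packing is only an assumption for illustration):

Code: Select all

// Sketch only: these methods are the proposal above, not existing Irrlicht API.
class IVideoDriver
{
public:
    virtual u32 ARGB(u32 a, u32 r, u32 g, u32 b) const = 0;
    virtual u32 getR(u32 color) const = 0;
    // ... getG, getB, getA, setR and friends
};

class COpenGLDriver : public IVideoDriver
{
public:
    // Pack into the layout this driver actually hands to the API;
    // ARGB with A in the most significant byte is assumed here.
    virtual u32 ARGB(u32 a, u32 r, u32 g, u32 b) const
    {
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    virtual u32 getR(u32 color) const
    {
        return (color >> 16) & 0xFF;
    }
};

A D3D driver (or a big-endian build) could pack its u32 differently without any of the calling code changing.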
zenaku
Posts: 212
Joined: Tue Jun 07, 2005 11:23 pm

Post by zenaku »

lingwitt wrote:The thought of millions of heap manipulations is rotten
...
There is no such thing as return-type overloading, and the stack doesn't allow for runtime polymorphism.
Maybe you can't polymorph ret vals, but you do NOT have to heap alloc. You CAN use stack alloc, and I see no reason why polymorphism of stack variables won't work. You just create a pointer of the base class you want and assign it the address of your derived class, stack alloc'd or not.

Code: Select all

class IColor
{
public:
    ...
    void foo() { printf("foo\n"); }
};

class IColorEx : public IColor
{
public:
    ...
    void foo() { printf("bar\n"); }
};

   //NO heap alloc here! ColorEx is stack, Color points to stack
   IColorEx ColorEx = driver->SColorEx(r,g,b,a);
   IColor *Color = &ColorEx;

   ColorEx.foo();   //prints "bar"
   Color->foo();    //prints "foo"

Last edited by zenaku on Fri Jul 14, 2006 4:50 pm, edited 1 time in total.
-------------------------------------
IrrLua - a Lua binding for Irrlicht
http://irrlua.sourceforge.net/
zenaku
Posts: 212
Joined: Tue Jun 07, 2005 11:23 pm

Post by zenaku »

I investigated another trick, but came to the conclusion that it wouldn't work here (convince me otherwise, please): http://en.wikipedia.org/wiki/Curiously_ ... te_Pattern

That definitely won't work. The driver must be selected at compile time for that to work.
-------------------------------------
IrrLua - a Lua binding for Irrlicht
http://irrlua.sourceforge.net/
Baal Cadar
Posts: 377
Joined: Fri Oct 28, 2005 10:28 am
Contact:

Post by Baal Cadar »

zenaku, in your example isn't it the opposite? First call yields bar, second one foo.

Also in general stack allocation and polymorphism don't work well together, simply because you have to know the size of an instance at compile time, and you can't know that for the general case. In this special situation slicing won't be an issue though.
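To illustrate the slicing point with zenaku's names (a contrived sketch, not Irrlicht code):

Code: Select all

// Contrived sketch, reusing the names from the example above.
IColorEx ex = driver->SColorEx(r, g, b, a);

IColor* ptr = &ex;    // fine: points at the stack object, no copy, no slicing
IColor& ref = ex;     // fine as well

IColor byValue = ex;  // slicing: only the IColor subobject is copied,
                      // everything IColorEx added is lost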
No offense :)
zenaku
Posts: 212
Joined: Tue Jun 07, 2005 11:23 pm

Post by zenaku »

Baal Cadar wrote:zenaku, in your example isn't it the opposite? First call yields bar, second one foo.

Also in general stack allocation and polymorphism don't work well together, simply because you have to know the size of an instance at compile time, and you can't know that for the general case. In this special situation slicing won't be an issue though.
Yeah, fixed that :) oops!

The problem is that no matter what is decided, SColor will always be dependent upon the driver. You can't escape that fact. It might as well be an interface like the rest of the engine.
-------------------------------------
IrrLua - a Lua binding for Irrlicht
http://irrlua.sourceforge.net/
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

OpenGL has two ways to deal with texture data:
1) standard:
OpenGL deals with bytes. That is, GL_RGB and GL_UNSIGNED_BYTE specify that
unsigned char* data;
data[0] = R;
data[1] = G;
data[2] = B;

On all machines, byte locations are the same, but bit locations are not.

2) nonstandard:
Actual bit locations are specified. That is, GL_BGRA and GL_UNSIGNED_SHORT_1_5_5_5_REV specify that
bits 0-4 = B
bits 5-9 = G
bits 10-14 = R
bit 15 = A

On all machines, bit locations are the same, but byte locations are not.

The Irrlicht standard is more or less bit-associated. That is, ARGB means that A is always the most significant. This means that the nonstandard OpenGL method is an excellent choice. Unfortunately, OpenGL only accepts the standard format for R8G8B8 data. This is why we shuffle the colors for 24 bit data in CColorConverter. However, this shuffling is a major point of confusion, because it is a departure from a system wide bit-associated convention.

We should instead format R8G8B8 data as anticipated, such that R is most significant. Then COpenGLTexture should have the burden of formatting it properly.
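Roughly, the texture upload can then look like this (a sketch only; the real change is in the patch linked below, and width, height and pixels stand in for the texture's data):

Code: Select all

// Sketch only; see the linked patch for the real change.
// A8R8G8B8 data viewed as u32 values (A in the most significant byte):
// GL_BGRA + GL_UNSIGNED_INT_8_8_8_8_REV describes exactly that layout
// relative to the native integer, so it works unchanged on both
// little- and big-endian machines.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
             GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, pixels);

// A1R5G5B5 data (u16 values) works the same way:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
             GL_BGRA, GL_UNSIGNED_SHORT_1_5_5_5_REV, pixels);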

Here is a patch for COpenGLTexture, which should be correct on all platforms now; there is no need for conditional compilation. Moreover, R8G8B8 data is now used directly:
http://web.mit.edu/mfwitten/irrlicht/pa ... .cpp.patch

Other patches are available here:
http://web.mit.edu/mfwitten/irrlicht/patches/Patches/

However, I believe the image loaders are not completely correct. TGA is certainly wrong, and CColorConverter::convert32BitTo32BitFlipMirror looks dubious.

I believe I have solutions in reach, so hold on!
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany
Contact:

Post by hybrid »

The current (SVN) way of handling R8G8B8 is due to the lack of texture conversions for adding alpha channels to non-alpha textures. So if we start with a JPEG file, it would not be possible to add an alpha channel to the texture with the current Irrlicht methods. So I decided to convert R8G8B8 to A8R8G8B8 in COpenGLTexture, as this is the most common non-alpha format and is loaded by many file loaders. This is a rather slow hack until we get the alpha addition methods, or at least a texture creation flag stating whether or not an alpha channel is required. Just remember that the sky boxes alone already add several megabytes of redundant data each time, although no alpha channel is required. But at least 16-bit textures are now used space-efficiently.
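The conversion itself is simple, something along these lines (a sketch only, not the actual SVN code; it assumes the source bytes come in R, G, B order):

Code: Select all

// Sketch only, not the actual SVN code.
// Expand R8G8B8 (3 bytes per pixel) to A8R8G8B8 (one u32 per pixel)
// with a fully opaque alpha channel.
void convertR8G8B8toA8R8G8B8(const u8* in, u32* out, u32 pixelCount)
{
    for (u32 i = 0; i < pixelCount; ++i)
    {
        out[i] = 0xFF000000 | (in[0] << 16) | (in[1] << 8) | in[2];
        in += 3;
    }
}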
And what do you mean by non-standard? All those formats have been an official part of OpenGL since version 1.2.
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

I call them standard and nonstandard for the sake of discussion; I'm defining those two terms. In particular, the OpenGL specification calls them packed pixel formats rather than nonstandard; they were added later, probably because of Windows and endian issues.
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

I think I've got the image loading problems pretty much narrowed down:
1) JPEG and PNG should already work, because they are loaded with jpeglib and libpng.
2) BMP and TGA have been fixed; BMP now supports 16-bit files too.
3) PCX is incomplete.
4) PSD isn't confirmed yet.

I will first get textures working on all machines, post patches, and then work on prettying the code.

For the current working fixes, use the patches that deal with images here: http://web.mit.edu/mfwitten/irrlicht/pa ... Individual
1) COpenGLTexture.cpp.patch
2) CColorConverter.cpp.patch
3) CColorConverter.h.patch
4) CImageLoaderBmp.cpp.patch
5) CImageLoaderTGA.cpp.patch
Last edited by lingwitt on Fri Jul 21, 2006 4:42 am, edited 1 time in total.
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

JPEG, BMP, PNG, and TGA seem more than enough at this point, and they all work now (with the patches listed in my last comment).

The PCX and PSD code is currently very old and incomplete and it certainly doesn't handle endianness.

Does anybody actually use these? Can we just drop them?
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

Currently, Irrlicht makes liberal use of OSReadSwapInt32, and its ilk, for BIG_ENDIAN fixes. However, these are a Mac OS X thing, and the functionality is somewhat incomplete, as there is no way to deal with floats and doubles (in fact, a recent addition to Irrlicht uses OSReadSwapFloat32, which doesn't even exist).

Perhaps we need to make our own built-in byte-swapping functions or macros.

I'll put together some macros and test them out.
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany
Contact:

Post by hybrid »

Check out COgreMeshFileLoader, where a bswap32 define handles all platforms. The only hack is the float swap done in readFloat: it casts the float to s32, swaps, and casts back to float. We should probably put these methods into os::byteswapXX, with XX in 16, 32, f32, and use those methods instead. Sorry for the wrong method in the b3d loader, I skimmed those BIG_ENDIAN parts too quickly.
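Roughly like this (an untested sketch of the idea, with u16/u32/f32 being the usual Irrlicht typedefs):

Code: Select all

#include <string.h> // memcpy

// Untested sketch of the proposed helpers; not the actual Irrlicht code.
namespace os
{
    inline u16 byteswap16(u16 v)
    {
        return (u16)((v >> 8) | (v << 8));
    }

    inline u32 byteswap32(u32 v)
    {
        return (v >> 24) | ((v >> 8) & 0x0000FF00) |
               ((v << 8) & 0x00FF0000) | (v << 24);
    }

    // The float swap reinterprets the bit pattern as a u32, swaps it and
    // reinterprets it back (memcpy avoids aliasing trouble); the value
    // itself is never converted.
    inline f32 byteswapf32(f32 v)
    {
        u32 tmp;
        memcpy(&tmp, &v, 4);
        tmp = byteswap32(tmp);
        memcpy(&v, &tmp, 4);
        return v;
    }
}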
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

How about we use POSIX htonl, htons, ntohl, ntohs?

These will be optimized for the platform they're on, I imagine. We can just use our own #defines to change the API.

Perhaps we should take care of endian issues in the reading and writing functions; we can pass flags to the file readers specifying the endianness of the data.
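For example, the readers could go through something like this (just a sketch of the idea; the names are made up, and byteswap32 is the kind of helper discussed above):

Code: Select all

// Sketch of the idea only; the names here are made up.
// Swap only when the file's endianness differs from the machine's.
static inline bool isHostBigEndian()
{
    const u16 probe = 1;
    return *(const u8*)&probe == 0; // the first byte is 0 on big-endian hosts
}

u32 readU32(io::IReadFile* file, bool dataIsBigEndian)
{
    u32 value = 0;
    file->read(&value, 4);
    if (dataIsBigEndian != isHostBigEndian())
        value = os::byteswap32(value); // unconditional swap helper
    return value;
}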
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany
Contact:

Post by hybrid »

The routines can't be used, as they don't do endian swapping in all cases. If you have a big-endian machine the methods are no-ops (IIRC), because network byte order is always big-endian. But since we might get little-endian files on those machines, we must convert even in such cases. So bswap and relatives are the way to go.