Endian Issues

Discuss anything related to the Irrlicht Engine, or read announcements about any significant features or usage changes.
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Endian Issues

Post by lingwitt »

There seems to be confusion in the engine with regard to endianness.

All endian issues should be pushed as close to the device (hardware) abstraction as possible, in order to minimize the conditional compilation. I believe this can be achieved with some simple standards for colors:

1) Names such as ARGB indicate the byte ordering of components, where A is the most significant. Bit counts should be specified in the name (e.g. A1R5G5B5), but in the case of SColor, a name such as ARGB implies that the bit count is the same for each component. Some existing functions and methods will need to be renamed to fit this.

2) SColor::color should be treated from now on as protected (it would be good to make it actually protected now), since accessing it directly invites bad assumptions about endianness. Internally, the standard has already been established as ARGB, but this fact can be completely hidden by requiring users to get representations with members like
getARGB()
getRGBA()
getBGRA()
getABGR()

setARGB(s32)
setRGBA(s32)
setBGRA(s32)
setABGR(s32)

Then we could get rid of the weird toOpenGLColor() method, and let the device choose which of the above is appropriate according to __BIG_ENDIAN__.
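To make the idea concrete, here is a minimal sketch of such accessors (names and the struct itself are hypothetical, not existing Irrlicht code). It assumes the internal s32 is packed as A8R8G8B8, the existing Irrlicht convention, and reorders components purely with shifts, so no endian-dependent casts or conditionals are needed:

```cpp
#include <cstdint>

// Hypothetical sketch: the internal value is packed ARGB (A most
// significant); each accessor rebuilds the requested byte order from
// the components, so the host's endianness never leaks out.
struct SColorSketch {
    uint32_t color; // internally ARGB; treated as protected in the proposal

    uint32_t getA() const { return (color >> 24) & 0xFF; }
    uint32_t getR() const { return (color >> 16) & 0xFF; }
    uint32_t getG() const { return (color >> 8) & 0xFF; }
    uint32_t getB() const { return color & 0xFF; }

    uint32_t getARGB() const { return color; }
    uint32_t getRGBA() const { return (getR() << 24) | (getG() << 16) | (getB() << 8) | getA(); }
    uint32_t getBGRA() const { return (getB() << 24) | (getG() << 16) | (getR() << 8) | getA(); }
    uint32_t getABGR() const { return (getA() << 24) | (getB() << 16) | (getG() << 8) | getR(); }
};
```

A device would then simply call whichever getter matches its expected format instead of going through toOpenGLColor().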

3) Perhaps similar getARGB(int x, int y) pixel methods should be used in image objects. Moreover, getPixel() is used to return a pixel of the form ARGB, regardless of the format in which the image is stored, which means device dependent textures are created with alpha components when this is perhaps a waste of space.

On the subject of textures, OpenGL treats data such that the first element to be accessed is the lower left corner of the image; the subsequent pixels map from left to right to fill up a line, and subsequent lines map upward until the last pixel is at the top right of the image. BMP is even stored this way. However, CColorConverter is used to flip the data upside down, and in some cases the endianness is switched. Therefore, two problems occur:
1) Image colors are not right for certain formats (due to bad loading from CColorConverter), at least on Big Endian machines,
2) It seems like texture coordinates need to be manipulated strangely in order to account for the unnecessary flipping of an image; perhaps CColorConverter flips images in order to make the D3D code easier, but OpenGL doesn't expect that, and since BMPs are stored as described above, I doubt D3D expects any differently. What's going on here?
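For reference, the flip under discussion is just a row reversal: converting between top-down storage and OpenGL's bottom-up layout means swapping row k with row (height-1-k). This is an illustration of the operation, not Irrlicht's actual CColorConverter code:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Flip a 32-bit-per-pixel image vertically in place by swapping rows
// from the outside in. This converts top-down storage to OpenGL's
// bottom-up layout (and back; the operation is its own inverse).
void flipRowsInPlace(uint32_t* pixels, int width, int height) {
    std::vector<uint32_t> row(width); // scratch buffer for one row
    for (int top = 0, bottom = height - 1; top < bottom; ++top, --bottom) {
        uint32_t* a = pixels + top * width;
        uint32_t* b = pixels + bottom * width;
        std::copy(a, a + width, row.data());
        std::copy(b, b + width, a);
        std::copy(row.begin(), row.end(), b);
    }
}
```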
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

Unfortunately, theory says that practice is better.

If the device chooses the format via a method (such as toABGR()), then bit shifting must occur if the internal format is different. Thus, performance would theoretically improve if--in practice--we decide on an internal format for the color depending on the endianness, so that the choice is made before the device level.

Unfortunately, D3D likes the format ARGB and OpenGL likes RGBA, so the situation is further complicated.

In any case, I think we can achieve the best of theory and practice if we hide the internal representation (s32 color). Then, when the device code requests a format, the data will already be in it.

Perhaps then, since colors are so tightly bound to both hardware and software, we need to let the devices handle colors. We could make IColor instead of SColor and have devices return their own implementations.

Therefore, a Direct3D device will return a CD3DColor that puts its internal representation as ARGB, and an OpenGL device will return a COpenGLColor that puts its internal representation as RGBA or ABGR.

We could keep SColor around to make life easier for programmers, but anytime an SColor is passed to the engine, we could convert it into the proper device representation.
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany
Contact:

Post by hybrid »

You should have a look into IrrSpintz sources. Spintz removed the toOpenGLColor() overkill by implementing such a color structure.
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

Great. However, Spintz has conditionals in the SColor class. We can remove these tests and bit shifting completely by making specialized classes.

In fact, with device proper colors, we wouldn't need to create a color buffer each time a node is rendered.

It seems like the biggest obstacle is the case where two devices are created, one with OpenGL and one with Direct3D. In this case, any specialized colors won't be compatible between the two without conversions. Is this a realistic concern?

If this is never a problem, then it should be rather easy to do all this.

Does using 4 u8's and direct assignment rather than bit operations slow down colors?
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

I've forgotten that specialized classes require polymorphism, which requires objects on the heap. This problem has gotten a bit trickier. I suppose there is no way around conditionals and bit swapping.

We could use function pointers and treat SColor as in procedural C (then we could just have typedef s32 SColor).
Eternl Knight
Posts: 313
Joined: Tue Nov 01, 2005 5:01 am

Post by Eternl Knight »

Actually, it's not the conditionals in the code that cause major issues with Spintz's code - it's the global variable use.

He has a static global variable that is changed by the last device created to the value it requires (ARGB, BGRA, whatever). However, let us assume that someone wants to use TWO devices (such as a software renderer and a hardware renderer), something that is not unusual for non-gaming purposes. If these two devices have incompatible endian/color-order values, then the last device created will screw up the first device's use of the variable. Given Irrlicht is (by default) a DLL, such static variables are basically a 'no-no'.

Personally, I think that there should be a container for the SColor arrays that keeps the endian/color-order flags within it. One would then create these array-containers from the device as needed. When moving such arrays between devices - one could compare the device's expected endian/color-order with the flags in the SColor-array container and convert only if necessary.

Basically, you keep the speed of non-conversion for colors but don't pollute the process space of Irrlicht with variables the end-user should not even need to be aware of.

--EK
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

You're absolutely right.

Rendering
Actually, the problem is deeper than suggested.

Creating more than one device doesn't seem possible currently, as devices are tied both to application code and graphics library code. Thus, devices don't represent just windowing, but also program control.

Thus, right now it doesn't seem possible to get multiple windows with different graphics library backends. This should indeed change.

SColor
In any case, SColor shouldn't select a format based on flags, as this requires conditionals and bit shifting. Instead, colors should be transparently stored in the proper format, so that Color Buffers can be specified along with other vertex properties as interleaved data. Currently a Color Buffer is created (at least with OpenGL) every frame, and after bit shifting.

Specialized classes, transparently used, are the key for this kind of behavior. Unfortunately, such classes must be polymorphic, which requires them to be on the heap. This is not a problem if we limit the heap:

Method 1
One could request a ColorFactory from the device. This would be a structure with function pointers appropriately set. These functions could then be used statically on s32 data:
ColorFactory* color = device->getColorFactory();
SColor c = color->ARGB(...);
color->setR(c, ...);
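A minimal sketch of Method 1 might look like the following. All names here (ColorFactory, packARGB, d3dFactory) are hypothetical illustrations, not existing Irrlicht API: the device hands out a table of function pointers that pack and unpack in whatever byte order that device wants, so the packed s32 is always already in device order and the color itself stays a plain integer.

```cpp
#include <cstdint>

typedef uint32_t SColor32; // the proposed "typedef s32 SColor" idea

// Hypothetical factory: a struct of function pointers the device fills
// in, so calls go straight to that device's packing with no branches.
struct ColorFactory {
    SColor32 (*ARGB)(uint8_t a, uint8_t r, uint8_t g, uint8_t b);
    uint8_t  (*getR)(SColor32 c);
};

// One possible backend: a device that wants ARGB packing (D3D-style).
static SColor32 packARGB(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
    return (uint32_t(a) << 24) | (uint32_t(r) << 16) | (uint32_t(g) << 8) | b;
}
static uint8_t argbGetR(SColor32 c) { return (c >> 16) & 0xFF; }

static const ColorFactory d3dFactory = { packARGB, argbGetR };
```

An OpenGL device would return a different ColorFactory with RGBA- or ABGR-ordered packers, while user code stays identical.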

Method 2
Or perhaps we could create colors from or with a device each time, so that an SColor always has a pointer to a color manager used to delegate methods; SColor would have a new member of type ColorManager* which could be used internally by SColor to polymorphically deal with the SColor data.
SColor c = device->colorARGB(...)
c.setR(...) //uses the color manager to do the grunt work.

Conclusion
The former method seems cleaner, but the latter appears practical.
Please consider the ramifications of either method; for instance, the former would allow colors to be stored simply as s32 (typedef s32 SColor), while the latter requires extra storage. However, the latter maintains an object-oriented aspect.

Consider too the case where we want to display the D3D and OpenGL renderings of one node in different windows. This would cause conversions to occur every frame.
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

Hey!

I'd really like to work out a design here and get this moving. Thus, I need responses from people who have a more intimate understanding of the engine.

Some of the concerns are only fringe cases; we shouldn't cater to them. In particular, not many people will need more than one video driver, so we should put the burden of conversion on the programmer. This isn't actually a burden, as the programmer will always know when he is trying to draw with one driver instead of another, so he can choose to use a built in conversion function/method.

The main problem is the extensive use of SColor. However, I don't think we should try to maintain its use through transparent conversions. Then where should color management methods go? The IVideoDriver? The Device? Both?
Eternl Knight
Posts: 313
Joined: Tue Nov 01, 2005 5:01 am

Post by Eternl Knight »

To be honest, the second method is the more scalable, but both (to me) are nasty hacks that need a more elegant solution.

The first one requires X pointers stored PER SColor struct, in essence at least quadrupling the memory needed for SColor arrays. The second method is better in terms of memory (it only uses a single pointer per SColor struct back to the color manager), but the runtime cost is an extra redirection per color call, and the memory requirement is still double what it was before.

To fix this properly, there needs to be an "device internal" color format used rather than everyone using the SColor. Use the SColor for loading and "ease of development" purposes, but then store them in the driver-based mesh.

My suggestion (which may or may not be feasible, it's off the top of my head at the moment) is to create a ColorArray interface.

Instead of say an array of SColor where one would do the following:

Code:

array[index]->SetRed(c);
You would use the following:

Code:

colorArray.setRed(index, c);
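A rough sketch of that ColorArray idea, off the top of my head as well (the class and its members are hypothetical): the container owns the raw packed values plus a flag describing their byte order, and per-element accessors go through the container rather than through each element, so no per-color pointer is needed.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical ColorArray: stores packed colors plus one order flag for
// the whole array, so converting between devices only happens when the
// flags differ, and element access never branches per component layout
// beyond a single shift lookup.
class ColorArray {
public:
    enum Order { ARGB, RGBA };

    ColorArray(Order order, std::size_t n) : order_(order), data_(n, 0) {}

    void setRed(std::size_t index, uint8_t r) {
        const int shift = (order_ == ARGB) ? 16 : 24; // where R lives
        data_[index] = (data_[index] & ~(0xFFu << shift))
                     | (uint32_t(r) << shift);
    }
    uint32_t raw(std::size_t index) const { return data_[index]; }
    Order order() const { return order_; }

private:
    Order order_;
    std::vector<uint32_t> data_;
};
```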
I personally think one of the main reasons for this is that we have interface code and driver code using the same struct, which causes all manner of issues. I personally think we should treat the driver as just that, a "driver", where there is a set of interfaces but not necessarily the same underlying data types.

But that is just me...

--EK
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

To be honest, the second method is the more scalable, but both (to me) are nasty hacks that need a more elegant solution.
Actually, I've put a lot of thought into it, and I believe the first method is the cleanest and most scalable.

I believe I neglected to mention that method 1 uses the SColor to store a color in an internal format. The methods of the ColorFactory class assume this internal format, but hide that fact from the user.
The first one requires X pointers stored PER SColor struct, in essence at least quadrupling the memory needed for SColor arrays. The second method is better in terms of memory (it only uses a single pointer per SColor struct back to the color manager), but the runtime cost is an extra redirection per color call, and the memory requirement is still double what it was before.
This is not the case. The second method indeed doubles the memory requirements, but the first method adds no additional memory requirement. In fact, the SColor structure could be replaced by a simple s32 for each color (hence the typedef s32 SColor mentioned before; however, I think we should use s32 directly in order to enforce the idea that colors are device dependent).
Use the SColor for loading and "ease of development" purposes, but then store them in the driver-based mesh.
A conversion from an SColor to an internal device (really video driver) dependent color format seems pleasing at first, but only because it is a patch up job to make existing code work. It would only be beneficial for people who want to use two different devices (drivers); this is rarely the case. Moreover, creating nodes with device dependent colors will eliminate any need for conversion for almost every user, and there will only be one conversion used when switching to a different video driver. Besides, someone trying to use multiple video drivers will need to convert before each frame anyway, so we might as well not cater to that crowd.

In any case, programmers know when they will start using another video driver, so they will know how and when to perform a conversion; we just need to supply the methods.
You would use the following:

Code:

colorArray.setRed(index, c);
This is actually the same idea as method 1, except that you're using an array, which would be inflexible. Besides, all colors are already in an interleaved array when stored in a node.
I personally think we should treat the driver as just that, a "driver", where there is a set of interfaces but not necessarily the same underlying data types.
I agree. However, performance always dictates that we cannot ignore nasty real-world dependencies. Thus we cannot escape it, and method 1 is the least nasty. In fact, it is only nasty because programmers need to remember to which device a specific ColorFactory belongs, but this is only a problem for those using multiple video drivers. In this case, the programmer doesn't even need to know about endian issues. We can just supply some methods for any conversion necessary.

I think this says that method 1 is the right choice.
Spintz
Posts: 1688
Joined: Thu Nov 04, 2004 3:25 pm

Post by Spintz »

I honestly don't see a problem with the global variable that I implemented in SColor. I see no need to run a single application with 2 graphics devices (i.e. DX and OpenGL running in the same application). I did it this way because, if you want to change between DX and OpenGL, you can simply inform the user that the application will need to be restarted for the change to take effect.

The fact that Irrlicht supports DX and OpenGL is enough, asking it to be able to switch between the 2, dynamically at runtime, or to use both device modes at the same time is just plain overkill IMO, and I think all this discussion is just over-designing. Hence, I have no plans on changing that implementation, if you guys do, all the power to you, but it just doesn't make sense to me.
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

I personally have no trouble with the global variable. I would have done the same.

However, that implementation hides decisions, but requires recurring conditionals and bit shifting.

By exposing the fact that the implementation is device dependent, we can eliminate those performance hits; at least for OpenGL, the creation of a color buffer for each node every frame will be eliminated.

Moreover, it's best to keep options open. Who are we to suggest that two video drivers will never be used?
Spintz
Posts: 1688
Joined: Thu Nov 04, 2004 3:25 pm

Post by Spintz »

I'm not saying they will never be used; my point was that I know I will never ask for that requirement (without a restart of the engine).

I guess I was trying to defend, or give a reason for, the global variable and why I never pursued it beyond that. It works for me, the conditionals are WAY faster than the toOpenGLColor method, and there was no noticeable speed difference with DX when making the change, so I'm happy with it the way it is (in IrrSpintz).
lingwitt
Posts: 47
Joined: Mon Jun 19, 2006 9:38 am

Post by lingwitt »

Page 132 of the OpenGL 2.0 specification (found here: http://www.opengl.org/documentation/spe ... spec20.pdf) can have the following important interpretation:

OpenGL code is Endian Neutral; the root of trouble is the loading of data from storage.

That is, the OpenGL code could be the same for all platforms, but the data loading code be machine dependent, which makes more sense. Here's why:

Currently, we load data directly from files and then conditionally compile OpenGL texture generation code to account for endian issues. However, this won't work for all formats.

Consider ECF_A1R5G5B5: Irrlicht expects arrrrrgggggbbbbb, where 'a' is the most significant. However, when written to a little endian format and then read into a big endian machine, it becomes: gggbbbbbarrrrrgg; OpenGL has no way of interpreting this jumbled mess of color data.

The solution is to normalize the data upon reading it.
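One endian-neutral way to do that normalization (a sketch with hypothetical helper names, not the proposed CColorConverter replacement itself): assemble the 16-bit value from individual bytes instead of casting the buffer, so the result is identical on big- and little-endian hosts and no conditional byte swap is needed.

```cpp
#include <cstdint>

// Read a little-endian 16-bit A1R5G5B5 value from a raw byte buffer.
// Building the value byte-by-byte yields the same native-format result
// on any host, so OpenGL code downstream never sees a swapped value.
uint16_t readA1R5G5B5_LE(const uint8_t* p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// Unpack the components once the value is in native form
// (layout: arrrrrgggggbbbbb, with 'a' the most significant bit).
uint8_t getA1(uint16_t c) { return (c >> 15) & 0x01; }
uint8_t getR5(uint16_t c) { return (c >> 10) & 0x1F; }
uint8_t getG5(uint16_t c) { return (c >> 5) & 0x1F; }
uint8_t getB5(uint16_t c) { return c & 0x1F; }
```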

We should begin by replacing CColorConverter, which is currently an impenetrable hack.
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany
Contact:

Post by hybrid »

No, colors in file formats are either using fixed endian style (e.g. lwo uses big endian on all systems) or native endian of the source platform (i.e. the one which wrote the file, e.g. Ogre). But these cases are already correctly handled by conversion inside the file loaders. I don't understand how the conversion routines proposed would differ from the existing ones.