ms3d (milkshape 3d) loader fail on BIG_ENDIAN

kas1e · Post by **kas1e** » Mon Oct 21, 2019 10:05 pm

Found today another loader which fail on BIG_ENDIAN same as halflife one. But while with Halflife one there wasn't ever any big_endian ifdefs, the CMS3DMeshFileLoader.cpp have them in all places, and all looks sane enough.

But, when i just trying to port some game which use ms3d, then i found it always crashes in the CMS3DMeshFileLoader::load() , on the "f32 framesPerSecond = *(float*)pPtr;" line, when it want to get animation time. Very possible that my crash-log are borked somehow, and actuall line which cause a crash is not that one, but its for sure in the CMS3DMeshFileLoader::load() anyway (as i can see how debug prints all the stuff about headers, vertices, triangles, groups, materials and stuff), and then crashes. And type of crashlog are "alignment exception" , which make me think that maybe its something about those "PACK_STRUCT" at top of file ?

Any ideas are welcome , thanks

kas1e · Post by **kas1e** » Tue Oct 22, 2019 8:22 am

Oh, with help of one of ppc developers we found issue, and that crash was correct one with point on correct line:

While all PPC CPUs should handle unaligned integer accesses the same cannot be said of floating point. Some other OSes may use exception handling to handle such accesses without crashing but this comes at a high cost to performance, so it is better to fix those programs.

Instead of reading the floating point value from an unaligned pointer you should do something like:

Code: Select all

 
union {
    u32 u;
    float f;
} tmp;
tmp.u = *(u32*)pPtr;
f32 framesPerSecond = tmp.f;

And it works when i compile that loader with "-Wall -fno-exceptions -fno-rtti -fstrict-aliasing -gstabs -D_DEBUG" , but then, when i compile for release version which is "-Wall -fno-exceptions -fno-rtti -fstrict-aliasing -fexpensive-optimizations -O3", then still the same crash. Seems optimisation broke that part , so need to deal with it now, and then that change can be commited to trunk. I assume it will help also on little endian cpus as well (even if a little bit better, why not)

CuteAlien · Post by **CuteAlien** » Tue Oct 22, 2019 10:23 am

There are a few more places where we just swap bytes for floats in that loader :-(

kas1e · Post by **kas1e** » Tue Oct 22, 2019 12:25 pm

@CuteAlien

Well, now after a more checks, i assume that my other crash indeed not because of that fix we do which is correct now. Its just another crash come up with alignment which by some luck just didn't showups when we build without optimisation, but showups when build with it.

But again, pointing out lines in crash-log are strange ones, its now:

Code: Select all

 
for (u16 tmp=0; tmp<numVertices; ++tmp)
    {
#ifdef __BIG_ENDIAN__
        vertices[tmp].Vertex[0] = os::Byteswap::byteswap(vertices[tmp].Vertex[0]);
        vertices[tmp].Vertex[1] = os::Byteswap::byteswap(vertices[tmp].Vertex[1]);
        vertices[tmp].Vertex[2] = -os::Byteswap::byteswap(vertices[tmp].Vertex[2]);
#else
        vertices[tmp].Vertex[2] = -vertices[tmp].Vertex[2];
#endif
    }

And point out exactly on the first line "for (u16 tmp=0; tmp<numVertices; ++tmp)".

Its line 215 in src:

https://sourceforge.net/p/irrlicht/code ... r.cpp#l215

Type of crash also alignment one, and i can see that "numVertices" are "u16 numVertices = *(u16*)pPtr;". But it didn't looks like floats ..

kas1e · Post by **kas1e** » Tue Oct 22, 2019 3:34 pm

@CuteAlien
So seems that vertices structure looks like it's also unaligned so some kind of fix is needed there as well.. There is that vertex struct:

Code: Select all

 
// Vertex information
struct MS3DVertex
{
    u8 Flags;
    float Vertex[3];
    char BoneID;
    u8 RefCount;
} PACK_STRUCT;

I assume that those "float Vertex[3]" in here cause issues ? Have any idea how to align them ?:)

CuteAlien · Post by **CuteAlien** » Tue Oct 22, 2019 3:47 pm

I'm not sure what "aligned" means in this context or how the union changes any of it. The struct is always loaded as binary block. So the only alignmment that should matter is that there shouldn't be any gaps in between. Which is what the PACK_STRUCT is about. I know on some architectures certain types (pointers maybe?) have to be aligned to memory-addresses (like an address dividable by 4). But I can't see how the union would make a difference there.

Can't help until I understand the exact problem... what is really happening?

edit: Maybe start by comparing sizeof information for the struct. Is it different on AmigaOS than on Win32?

kas1e · Post by **kas1e** » Tue Oct 22, 2019 4:11 pm

At moment i only have that kind of answer from developer who looked at:

This code is not written to be portable to CPU architectures which don't allow unaligned accesses.

The best solution that wouldn't require any hacks would be to remove __attribute__((packed)) from all the structures and then rewrite the loader so that it reads one field at a time (this could be done with helper functions that could also handle the endian conversion). Then no unaligned memory accesses would be necessary.

But rewrite whole loader are out of my abilities for sure, as i only can do "hack-slash" things and not clean code

Is it sounds like anything easy, or its mean "whole total rewriting of loader" ? Or only "load()" function ?

CuteAlien · Post by **CuteAlien** » Tue Oct 22, 2019 11:22 pm

OK, this is hard. Just how I understand what he wrote:

It seems that platform does _need_ the gaps inside the struct so it can align certain variables to certain memory addresses. For example a pointer might not follow directly behind a single char, but only can be placed after 4 chars in a row so it's memory address is again dividable by 4 (or whatever alignment it needs). So if a struct isn't doing that then the compiler must insert empty buffers in between variables to ensure pointer variables end up like that. And it can't just be avoided by forcing it to use packing without gaps like we do currently for other platforms.

So in short - there is simply no way to have sizeof (some_struct) work. Not in a platform independent way. Using the PACK_STRUCT macro works only if the platform actually supports packed structs.

Unfortunately c/c++ have no sizeof alternative which sums up the sizes of all members of a struct (at least none I've heard of).

Two alternative I can think of:

First is writing sizeof() functions (or whatever we call them... not sure if sizeof is a legal name for a member function) for each struct in which we sum up the sizes of members ourself. Then indeed we would have to get rid of all the PACK_STRUCT, irrpack.h and irrunpack.h includes. And then all sizeof(struct) stuff has to be replaced by some_struct.sizeof() calls using our functions. And then the harder part... all places which read bytes using pointers to structs like "MS3DTriangle *triangles = (MS3DTriangle*)pPtr;" can't be used any longer. They have to be replaced by pointers to bytes (u8*) instead. And then struct-members have to read in the stuff byte-wise (this unfortunately means also we can't just look for pointers inside structs, but all pointers _to_ structs on loading can likely cause troubles).
This is likely the easier solution to code as it has less changes, but it makes all code uglier and harder to read. So not some patches I'm keen on getting. But if you just want to get it working - do that maybe.

Second alternative is writing a binary serializer interface and rewrite all loaders for binary formats to use that. So some class where you say read(type), write(type) for all kind of types and it then reads/writes from a stream of bytes. Lot of rewriting. I _might_ accept patches like that, but again not too keen on it as it's a huge rewrite.

Maybe we should check first how other Amiga libs solve that. But I can't think of more ideas right now.

So simplest solution likely - do it per hand per loader - but don't merge it back to Irrlicht. Still lots of work, sorry...

Or the super-easy solution. Return 0 for such loaders and print out a log-warning like "This loader is not supported on AmigaOS". Note that PACK_STRUCT is also used in some image format-loaders. Thought you might be lucky with those (see below).

First task is probably finding out the _exact_ alignment rules you need. Like is it possible to use strict packing (without gaps) when there are no pointers inside the struct and when the size of the whole struct ends up being dividable by 4 (in which case all structs which are like that are already no problem - I think some formats might already care about this).

kas1e · Post by **kas1e** » Wed Oct 23, 2019 10:24 am

Uhm it should't be that hard. I mean, all the formats (image ones, mesh ones, etc) from irrlicht already works fine on our cpu. The only format which is not works on our side, its halflife loader, but that expected as it didn't have any big_endian ifdefs.

Also, that ms3d loader actually works fine when i just put that hack for floats, and compile it just with -O1 instead of -O3. So it should't be something which requery very big modifications all over the places..

I will ask more details about that alignemnt issues (i.e. which restrictions we have and in which areas about), so we can discuss it futher.

In end of all i can build it just with -O1 (only that loader) and be done with it. But then hiding bugs with lovering optimisation level was always bad idea

kas1e · Post by **kas1e** » Wed Oct 23, 2019 5:11 pm

I discussed with other ppls a bit as well as doing some google, and that what i found for big-endian cpus:

1. Access with floating point registers to memory need to be either on a 4 byte boundary. The memory accesses which are not properly aligned generate an "alignment exception". If it didnt crashes on some of ppc cpus with linux , it only mean that kernel "trap" those unaligned memory accesses , and doing some realign (which is slooow). Other kernels on other oses on different big-endian cpus may throw and expections / crashes /etc.

2. PPC processors (that ones amigaos4 use) only have alignment requirements for certain types of data such as floating point.

Now, in terms of irrlicht about that ms3d loader:

1. The problem is that the structure is directly loaded from the binary file. So it have floats inside, and they not on top of structures, not aligned then, which mean they should be aligned to not crash on us.
2. I search in all the whole irrlicht 1.8.4 on PACK_STRUCT words, and found them in those ones only:

include/IAnimatedMeshMD3.h

source/Irrlicht/C3DSMeshFileLoader.h
source/Irrlicht/CAnimatedMeshHalfLife.h
source/Irrlicht/CAnimatedMeshMD3.cpp
source/Irrlicht/CImageLoaderBMP.h
source/Irrlicht/CImageLoaderDDS.h
source/Irrlicht/CImageLoaderPCX.h
source/Irrlicht/CImageLoaderPSD.h
source/Irrlicht/CImageLoaderRGB.h
source/Irrlicht/CImageLoaderTGA.h
source/Irrlicht/CImageLoaderWAL.h
source/Irrlicht/CLMTSMeshFileLoader.h
source/Irrlicht/CMD2MeshFileLoader.cpp
source/Irrlicht/CMS3DMeshFileLoader.cpp
source/Irrlicht/CMY3DHelper.h
source/Irrlicht/CMY3DMeshFileLoader.h
source/Irrlicht/COgreMeshFileLoader.h
source/Irrlicht/CTarReader.h
source/Irrlicht/CWADReader.h
source/Irrlicht/CZipReader.h

I checked all those packed structs, to find out where "floats" only used in, and so, there is really not many, there is:

source/Irrlicht/CAnimatedMeshHalfLife.h
source/Irrlicht/CAnimatedMeshMD3.cpp
source/Irrlicht/CLMTSMeshFileLoader.h
source/Irrlicht/CMD2MeshFileLoader.cpp
source/Irrlicht/CMS3DMeshFileLoader.cpp
source/Irrlicht/CMY3DHelper.h
source/Irrlicht/CMY3DMeshFileLoader.h

source/Irrlicht/CImageLoaderDDS.h :

in this one btw, we can found exactly the same kind of union code as our ppc developer give us before, its even called "endian tomfoolery"

Code: Select all

 
/* endian tomfoolery */
typedef union
{
    f32 f;
    c8  c[ 4 ];
}
floatSwapUnion;

I.e. there is already those who code that file at least know that it should be aligned (and that why they use Union as you ask before, to just made an align with float).

So, seeing on that, we can say that things which should crash on big-endian cpus are:

1. Halflife loader. That one even didn't have any big_endian ifdefs, so it wasn't care about at all , all expected.

2. MD2 loader. That one while have floats in packed structures (SMD2Frame and SMD2GLCommand), for sure didn't crashed : devs which checking that say that this loader has everything aligned. That is why it does not crash. The floats are at the top so start at 0. The other struct has a u8 at the top so the first float starts at 1 (and therefore is misaligned for the PPC FPU).

3. MD3 loader. That one not sure if ok or not, but loading a quake3 packs inside of 21.Quake3Explorer works.

4. LMTS - didn't found any test file to test with

5. MY3D - didn't found any test file to test with

In other words, its all come not to whole problem of Irrlicht, but to that single loader, which writen without keeping alignemnt in mind. So that need to be fixed/rewriten and updated in trunk for sure.

Solution there just probabaly one: Loading it as an integer and then convert it to float should work. I.e. idea is to access float values from the loaded binary only with byte access (by declare a union to manage access with float and bytes).

I.e. making something like done in CImageLoaderDDS.h :

Code: Select all

 
union HybridFloat {
    f32 fvalue;
    u8 cvalue[4];
};

And then use it in all the places in that loader in question. It will only mean a little updating of code inside of __BIG_ENDIAN__ ifdefs.

Later the same approach can be choicen when i will add big_endian ifdefs for halflifeloader.

At least, that how it alredy done in other irrlicht's parts where that necessary.

kas1e · Post by **kas1e** » Wed Oct 23, 2019 6:59 pm

@CuteAlien
I thinking about adding something like that on to the ms3d loader (as done for ImageLoaderDDS):

Code: Select all

 
#ifdef __BIG_ENDIAN__
    typedef union
    {
        f32 f;
        c8  c[ 4 ];
    }
    floatSwapUnion;
 
    f32 EndianFloat( f32 src )
    {
        floatSwapUnion in,out;
        in.f = src;
        out.c[ 0 ] = in.c[ 3 ];
        out.c[ 1 ] = in.c[ 2 ];
        out.c[ 2 ] = in.c[ 1 ];
        out.c[ 3 ] = in.c[ 0 ];
        return out.f;
    };
#endif

Then in all ifdefs for big-endian where unaligned floats are used, we can convert it.

CuteAlien · Post by **CuteAlien** » Wed Oct 23, 2019 7:56 pm

OK, so not pointers but floats. Yeah, I guess we can fix it like that, phew :-)
We have a somewhat similar union already in irrMath.h (FloatIntUnion32, but for another use-case).
Maybe we can just add c8 c[ 4 ]; in there and some empty constructor.

kas1e · Post by **kas1e** » Wed Oct 23, 2019 8:03 pm

At least that how i understand it.

Throrught there is also "Instead of reading the floating point value from an unaligned pointer you should do something like" was about that part: f32 framesPerSecond = *(float*)pPtr; , but its not because a pointer unaligned, but because its "float" pointer which start to be unaligned because if *(float *). So issues is with floats only yeah.

Maybe we can just add c8 c[ 4 ]; in there and some empty constructor.

Have idea how to do it with less blood ?:)

I currently have suggestion from the second unalignment crash (first one was that f32 framesPerSecond = *(float*)pPtr, which we fix with that union), and now second one happens on that for() loop:
https://sourceforge.net/p/irrlicht/code ... r.cpp#l215

And idea was something like this:

- Declare a union to manage access with float and bytes, rename the struct MS3DVertex into MS3DVertexPacked and define a struct MS3DVertex without the PACK_STRUCT keyword:

Code: Select all

 
union HybridFloat {
    float fvalue;
    char cvalue[4];
};
 
struct MS3DVertexPacked
{
    u8 Flags;
    HybridFloat Vertex[3];
    char BoneID;
    u8 RefCount;
} PACK_STRUCT;
 
struct MS3DVertex
{
    u8 Flags;
    float Vertex[3];
    char BoneID;
    u8 RefCount;
};

- Modify the code (lines 206 to 225) by something like that (sorry, not able to compile and test here):

Code: Select all

 
    MS3DVertex *vertices;
    MS3DVertexPacked *vertices_packed = (MS3DVertexPacked *)pPtr;
    HybridFloat tmpfloat;
    char *pVPtr;
    pVPtr = (char *)pPtr; // address of the first packed vertex in memory
    pPtr += sizeof(MS3DVertexPacked) * numVertices;
    if (pPtr > buffer+fileSize)
    {
        delete [] buffer;
        os::Printer::log("Loading failed. Corrupted data found.", file->getFileName(), ELL_ERROR);
        return false;
    }
    for (u16 tmp=0; tmp<numVertices; ++tmp)
    {
#ifdef __BIG_ENDIAN__
        // Read per byte with swapping and fill the 3 vertices in the final structure
        // pVPtr[0] is ignored, it contains the char field Flags
        tmpfloat.cvalue[3] = pVPtr[1];
        tmpfloat.cvalue[2] = pVPtr[2];
        tmpfloat.cvalue[1] = pVPtr[3];
        tmpfloat.cvalue[0] = pVPtr[4];
        vertices[tmp].Vertex[0] = tmpfloat.fvalue;
        tmpfloat.cvalue[3] = pVPtr[5];
        tmpfloat.cvalue[2] = pVPtr[6];
        tmpfloat.cvalue[1] = pVPtr[7];
        tmpfloat.cvalue[0] = pVPtr[8];
        vertices[tmp].Vertex[1] = tmpfloat.fvalue;
        tmpfloat.cvalue[3] = pVPtr[9];
        tmpfloat.cvalue[2] = pVPtr[10];
        tmpfloat.cvalue[1] = pVPtr[11];
        tmpfloat.cvalue[0] = pVPtr[12];
        vertices[tmp].Vertex[2] = - tmpfloat.fvalue;
 
        // Go to the next vertex structure
        pVPtr += sizeof(struct MS3DVertexPacked);
#else
        vertices[tmp].Vertex[2] = -vertices[tmp].Vertex[2];
#endif
    }

But that code wasn't tested, its just as idea.

kas1e · Post by **kas1e** » Wed Oct 23, 2019 8:23 pm

Found also interesting aritcle on IBM's site:
https://developer.ibm.com/articles/pa-dalign/

There the says:

The PowerPC takes a hybrid approach. Every PowerPC processor to date has hardware support for unaligned 32-bit integer access. While you still pay a performance penalty for unaligned access, it tends to be small.

On the other hand, modern PowerPC processors lack hardware support for unaligned 64-bit floating-point access. When asked to load an unaligned floating-point number from memory, modern PowerPC processors will throw an exception and have the operating system perform the alignment chores in software. Performing alignment in software is much slower than performing it in hardware

So kind of what we think about then.

CuteAlien · Post by **CuteAlien** » Thu Oct 24, 2019 12:15 pm

Hm, having same struct twice is feeling kinda bad.

I think using HybridFloat always is probably fine. Those structs are generally only used for loading and discarded afterwards. So as long as we document the "real" type we wouldn't need a copy of struct (with a name like HybridFloat32 it's already documented enough - not sure if "hybrid" is a good name though... maybe AlignFloat32 would be more telling what it does) . edit: I can add AlignFloat32 to irrtypes.h if you want.

For the 2 different types of packing needed for some struct - maybe use something like this:

Code: Select all

 
#include <iostream>
 
#pragma pack( 4 )
struct STest
{
    char x;
    int y;
};
 
#pragma pack( 1 )
 
template <typename T>
struct TPack1
{
    T Packed;
};
 
int main()
{
    STest a;
    TPack1<STest> b;
        
    std::cout << "a: " << sizeof(a) << "b: " << sizeof(b) << std::endl;
 
    // copy should work?
    a = b.Packed;   
 
    return 0;
}

So on loading all structs which need packing are put into a TPack1<otherstruct>. And accessed with variable.Packed. After loading they can be copied. That way we still have 2 structs for everything, but they no longer require us to type them twice in code.

Edit: Also be careful - I think PACK_STRUCT is not enough. That one only works on very old gcc compilers. The other trick we use (and is important on all other compilers) is the #include "irrPack.h" and #include "irrunpack.h" before/after structs needing packing set to 1.
Edit2: In my example i just used the pragma for testing if it works in theory.
Edit3: Hehe - I just realized with my TPack1 trick I suddenly have a sizeof with strict packing for any kind of struct. Well, not exactly cost-free (compiler will need to create 2 structs), but still nice.
Edit4: AlignFloat32 is also a bad name. As it's about having an unaligned float. Names are hard... UnalignedFloat32? UnalignedF32? Unaligned32f? (unaligned is a very long word ... )

Irrlicht Engine

ms3d (milkshape 3d) loader fail on BIG_ENDIAN

ms3d (milkshape 3d) loader fail on BIG_ENDIAN

Re: ms3d (milkshape 3d) loader fail on BIG_ENDIAN

Re: ms3d (milkshape 3d) loader fail on BIG_ENDIAN

Re: ms3d (milkshape 3d) loader fail on BIG_ENDIAN

Re: ms3d (milkshape 3d) loader fail on BIG_ENDIAN

Re: ms3d (milkshape 3d) loader fail on BIG_ENDIAN

Re: ms3d (milkshape 3d) loader fail on BIG_ENDIAN

Re: ms3d (milkshape 3d) loader fail on BIG_ENDIAN

Re: ms3d (milkshape 3d) loader fail on BIG_ENDIAN

Re: ms3d (milkshape 3d) loader fail on BIG_ENDIAN

Re: ms3d (milkshape 3d) loader fail on BIG_ENDIAN

Re: ms3d (milkshape 3d) loader fail on BIG_ENDIAN

Re: ms3d (milkshape 3d) loader fail on BIG_ENDIAN

Re: ms3d (milkshape 3d) loader fail on BIG_ENDIAN

Re: ms3d (milkshape 3d) loader fail on BIG_ENDIAN