[fixed]XML reader can't read files produced by XML writer
I've just finished writing the Linux implementation of IrrFontTool, but I've been having trouble with getFont rejecting the produced XML font file. The problem seems to be that the XML writer is producing a file with four bytes per character, which is the size of a wchar_t on 64 bit Linux, but the reader seems to only be able to handle two bytes per character in an XML file.
I have a couple of questions for someone knowledgeable:
1) Is there some sort of "quick fix" that I haven't heard about for 64bit systems that makes this work?
2) If this is a problem, then is it that the reader should be able to handle four byte characters, or is it that the writer shouldn't be producing four byte characters?
As soon as I've got this out the way, I'll be able to finish testing the Linux implementation of IrrFontTool and release it.
Re: XML reader can't read files produced by XML writer
nburlock wrote: 1) Is there some sort of "quick fix" that I haven't heard about for 64bit systems that makes this work?
I believe that wchar_t is 32 bits by default on gcc compilers, regardless of the CPU architecture being targeted. -fshort-wchar should force it to be 16 bits.
nburlock wrote: 2) If this is a problem, then is it that the reader should be able to handle four byte characters, or is it that the writer shouldn't be producing four byte characters?
It's a fundamental problem with wchar_t, which is why it's not a good type for data exchange. It would be great if Irrlicht defined its own wide type instead, perhaps a UCS-2 type (since UTF-16 brings its own sizing problems to the party).
Hmm, I'm meandering here. I guess I should actually look into doing a patch for this, although robustly testing it across all platforms will be interesting.
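To make the size issue concrete, here is a minimal sketch (not Irrlicht code) contrasting wchar_t with a fixed-width character type. The names xml_char16, bytesFor and wideBytesFor are hypothetical, and the sketch assumes a C++11 compiler for <cstdint> and static_assert.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-width character type for on-disk data (not an Irrlicht type).
typedef std::uint16_t xml_char16;

// Bytes occupied by n characters of type CharT.
template <typename CharT>
std::size_t bytesFor(std::size_t n)
{
    return n * sizeof(CharT);
}

// wchar_t's size is implementation-defined: typically 4 with gcc on Linux
// (2 with -fshort-wchar) and 2 with MSVC, so this value differs per platform.
inline std::size_t wideBytesFor(std::size_t n)
{
    return bytesFor<wchar_t>(n);
}

// The fixed-width type is 2 bytes everywhere, checked at compile time.
static_assert(sizeof(xml_char16) == 2, "xml_char16 must be exactly 2 bytes");
```

A writer that emits wchar_t directly produces a file whose character width depends on the compiler, while one that converts to a fixed-width type produces the same bytes everywhere.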
Please upload candidate patches to the tracker.
Need help now? IRC to #irrlicht on irc.freenode.net
How To Ask Questions The Smart Way
Re: XML reader can't read files produced by XML writer
rogerborg wrote: -fshort-wchar should force it to be 16 bits.
Great info, thanks for that.
I've logged it as a bug:
https://sourceforge.net/tracker2/?func= ... tid=540676
I don't think that's the problem. All xml-files produced by Irrlicht on Linux are (unfortunately) always 4 bytes and it usually can also read them.
I have no experience with the IrrFontTool, but search around in the forum, I remember having seen already a few threads about that.
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
The problem is happening inside the read method of IXMLReader. If I give it a four byte per char file (created by Font Tool), that function will fail. If I strip the extra 2 bytes out of each char in the XML file, then read will work. I've also noticed that Text Editor (Ubuntu's Wordpad equivalent) can't open the four byte per char XML file (it thinks it's a binary file), while Firefox can.
I went back and had Irrlicht create the simplest possible XML file, just a header and one tag, and the same problem is present. I checked the file in a Hex editor, and apart from the Unicode header in the first two bytes of the file, 0xFFFE, everything else is one character value followed by 3 zero bytes which should be legal. Again, Firefox can open this file, but Text Editor and Irrlicht can't.
Irrlicht checks for the following formats:
Code: Select all
const unsigned char UTF8[] = {0xEF, 0xBB, 0xBF}; // 0xEFBBBF;
const int UTF16_BE = 0xFFFE;
const int UTF16_LE = 0xFEFF;
const int UTF32_BE = 0xFFFE0000;
const int UTF32_LE = 0x0000FEFF;
So 0xfffe would be utf16_be, only if it's followed by 0000 then it's an utf32_be.
I'm not really an expert on IrrXML, but I'm often using utf32 files with Irrlicht so that's why I would be surprised to see a problem there. Which version of irrlicht are you using?
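As an illustration of the format check described above, the following sketch identifies a byte order mark by comparing individual bytes, so the result does not depend on the size of any integer type or on the host's endianness. It uses the standard Unicode BOM byte sequences; the enum and function names are hypothetical and this is not the irrXML implementation.

```cpp
#include <cstddef>

// Hypothetical names for this sketch; not the identifiers used by irrXML.
enum TextFormat { FMT_UNKNOWN, FMT_UTF8, FMT_UTF16_BE, FMT_UTF16_LE, FMT_UTF32_BE, FMT_UTF32_LE };

// Identify the encoding from the byte order mark at the start of a buffer.
// Comparing individual bytes avoids any dependence on sizeof(long) or on
// the endianness of the machine running the check.
inline TextFormat detectBom(const unsigned char* d, std::size_t size)
{
    // The 4-byte marks must be tested first: FF FE 00 00 (UTF-32 LE)
    // begins with the same two bytes as FF FE (UTF-16 LE).
    if (size >= 4 && d[0] == 0x00 && d[1] == 0x00 && d[2] == 0xFE && d[3] == 0xFF)
        return FMT_UTF32_BE;
    if (size >= 4 && d[0] == 0xFF && d[1] == 0xFE && d[2] == 0x00 && d[3] == 0x00)
        return FMT_UTF32_LE;
    if (size >= 3 && d[0] == 0xEF && d[1] == 0xBB && d[2] == 0xBF)
        return FMT_UTF8;
    if (size >= 2 && d[0] == 0xFE && d[1] == 0xFF)
        return FMT_UTF16_BE;
    if (size >= 2 && d[0] == 0xFF && d[1] == 0xFE)
        return FMT_UTF16_LE;
    return FMT_UNKNOWN;
}
```

The ordering matters: if the two-byte checks ran first, a UTF-32 little-endian file would be reported as UTF-16 little-endian.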
I'm running 1.4.2
I've tracked the problem down. It starts at line 573 of CXMLReaderImpl.h:
Code: Select all
char32* data32 = reinterpret_cast<char32*>(data8);
Then, the following is defined a little further on:
Code: Select all
const int UTF32_BE = 0xFFFE0000;
const int UTF32_LE = 0x0000FEFF;
Two if statements are used to determine if the first four bytes of the file are big (line 587) or little endian (line 594):
Code: Select all
if (size >= 4 && data32[0] == (char32)UTF32_BE)
if (size >= 4 && data32[0] == (char32)UTF32_LE)
Both tests fail because:
Code: Select all
data32[0] = 0x0000FEFF
(char32) UTF32_BE = 0xFFFE0000
(char32) UTF32_LE = 0xFEFF
And the code goes on to determine that it's a 2 byte character file of type UTF16_LE, which is why it doesn't work. This will need someone with more experience of the system to say what needs to be fixed.
Looks like something for hybrid (I guess he's currently on holiday, as he hasn't posted in the last few days and it's holiday time in his area).
Still, I don't really get it, as 0xFEFF should be equal to 0x0000FEFF, so it should recognize UTF32_LE in that 'if' clause.
I mistyped the value of data32[0] in my previous post, it's actually 0x3C0000FFFE.
char32 is defined as an unsigned long, which is eight bytes on my 64 bit system. That explains why this isn't working, because it's comparing the first 8 bytes of the file against a four byte value. The following code demonstrates the problem:
Code: Select all
char data8[8] = { 0xFE,0xFF,0x00,0x00,0x3C,0x00,0x00,0x00 };
char32* data32 = reinterpret_cast<char32*>(&data8[0]);
char16* data16 = reinterpret_cast<char16*>(&data8[0]);
const int UTF32_BE = 0xFFFE0000;
const int UTF32_LE = 0x0000FEFF;
if (data32[0] == (char32)UTF32_BE)
    printf("big endian\n");
if (data32[0] == (char32)UTF32_LE)
    printf("little endian\n");
So then I guess that the solution is to change char32 to some type that is four bytes long on all platforms.
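A minimal sketch of the proposed fix, assuming a C++11 compiler with <cstdint>: reading the mark through std::uint32_t compares exactly four bytes on every platform, whereas an eight-byte read also pulls in the character that follows the mark. The names UTF32_LE_MARK, readFirst and startsWithUtf32LeMark are hypothetical and not part of Irrlicht.

```cpp
#include <cstdint>
#include <cstring>

// Assumed value of the UTF-32 mark when read in the machine's native byte order.
const std::uint32_t UTF32_LE_MARK = 0x0000FEFFu;

// Read the first sizeof(T) bytes of a buffer into an unsigned integer of type T.
// memcpy avoids the alignment and aliasing problems of reinterpret_cast.
template <typename T>
T readFirst(const unsigned char* data)
{
    T value = 0;
    std::memcpy(&value, data, sizeof(T));
    return value;
}

// True when the first four bytes match the mark. std::uint32_t is exactly
// 32 bits wherever it is defined, so the character after the mark is never
// pulled into the comparison.
inline bool startsWithUtf32LeMark(const unsigned char* data)
{
    return readFirst<std::uint32_t>(data) == UTF32_LE_MARK;
}
```

For a buffer holding the mark followed by '<' (0x3C), startsWithUtf32LeMark returns true, while readFirst<std::uint64_t> yields a value that includes the 0x3C and therefore never equals the mark. That is the same mismatch an eight-byte unsigned long produces.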
nburlock wrote: So then I guess that the solution is to change char32 to some type that is four bytes long on all platforms.
Yes, that sounds like a rather good idea :-)
I've posted the info to the bug report, but I'm not going to post a patch - I've no idea what types are constant across all the different compilers and platforms Irrlicht supports.
Last edited by nburlock on Thu Oct 30, 2008 3:17 am, edited 1 time in total.
Do we just want char32 to be an unsigned 32 bit type?
Presumably u32 is an unsigned 32 bit type, even on a 64 bit system?
Unfortunately, we can't just "typedef u32 char32", since that farks up the string<char32> type defined by CXMLReaderImpl ( operator += (const unsigned int i) is the same as operator += (T c) )
What a pretty pickle!
rogerborg wrote: Do we just want char32 to be an unsigned 32 bit type?
I wouldn't think so. I think that a char32 should be a 32-bit integral type that has the same signedness as a char.
rogerborg wrote: Presumably u32 is an unsigned 32 bit type, even on a 64 bit system?
Yeah, it should.
rogerborg wrote: that farks up the string<char32> type defined by CXMLReaderImpl (operator += (const unsigned int i) is the same as operator += (T c) )
There are ways around this. One would be to just remove the operator overloading and use unique method names. Of course that breaks source compatibility for some users. Another way is to use SFINAE and remove one of the overloads when T is unsigned int.
Travis
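A rough sketch of the SFINAE option mentioned above, under the assumption that C++11's std::enable_if is available (Irrlicht at the time targeted older compilers, so a real patch would need a hand-written equivalent). TinyString is a hypothetical stand-in for illustration only, not Irrlicht's core::string.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a string<T> class; not Irrlicht's core::string.
template <typename T>
class TinyString
{
public:
    // Append a single character.
    TinyString& operator+=(T c)
    {
        data_.push_back(c);
        return *this;
    }

    // Append the decimal digits of a number. The defaulted template parameter
    // makes this overload drop out when T is unsigned int, so it no longer
    // collides with operator+=(T) above.
    template <typename U = T,
              typename std::enable_if<!std::is_same<U, unsigned int>::value, int>::type = 0>
    TinyString& operator+=(unsigned int i)
    {
        const std::string digits = std::to_string(i);
        for (std::size_t k = 0; k < digits.size(); ++k)
            data_.push_back(static_cast<T>(digits[k]));
        return *this;
    }

    std::size_t size() const { return data_.size(); }
    T operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<T> data_;
};
```

With T = char, appending 42u goes through the numeric overload and adds the characters '4' and '2'. With T = unsigned int, only the character overload remains, so 42u is appended as a single element. The behaviour of += therefore differs by character type, which is one argument for the alternative of unique method names.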