[fixed] XML reader can't read files produced by XML writer

rogerborg
Admin
Posts: 3590
Joined: Mon Oct 09, 2006 9:36 am
Location: Scotland - gonnae no slag aff mah Engleesh

Post by rogerborg »

vitek wrote:
rogerborg wrote:Do we just want char32 to be an unsigned 32 bit type?
I wouldn't think so. I think that a char32 should be a 32-bit integral type that has the same signedness as a char.
Indeed. It's currently "unsigned long", which is peculiar. I actually didn't spot that and tried typedeffing it to s32 first, but that's got the same string<T> problem anyway.

vitek wrote:
rogerborg wrote:that farks up the string<char32> type defined by CXMLReaderImpl (operator += (const unsigned int i) is the same as operator += (T c) )
There are ways around this. One would be to just remove the operator overloading and use unique method names. Of course that breaks source compatibility for some users.
Anybody we like? ;)

I'd be minded to go that way in the short term, since we know fine well which conversions we want to support.
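A rough sketch of what "unique method names" could look like. SketchString, appendChar and appendNumber are made-up names for illustration, not anything in irrString.h. Since the two members no longer share a name, they stay distinct even when T is unsigned int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for irr::core::string<T>; not the real class.
template <typename T>
class SketchString {
public:
    // Append one character of the string's own character type.
    SketchString& appendChar(T c) {
        data_.push_back(c);
        return *this;
    }

    // Append the decimal digits of an unsigned number.
    // The distinct name avoids any clash with appendChar when T == unsigned int.
    SketchString& appendNumber(unsigned int value) {
        char buf[16];  // enough for a 32-bit value plus terminator
        const int len = std::snprintf(buf, sizeof(buf), "%u", value);
        assert(len > 0 && len < static_cast<int>(sizeof(buf)));
        for (int i = 0; i < len; ++i)
            data_.push_back(static_cast<T>(buf[i]));
        return *this;
    }

    std::size_t size() const { return data_.size(); }
    T operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<T> data_;
};
```

With this, SketchString<unsigned int> instantiates cleanly, at the cost of callers having to rewrite every `+=` that appended a number.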

In the long term, I'd like us to drop wchar_t altogether and define a fixed size UCS-2 type for wide strings. Then after we do that, Alyson Hannigan arrives on a white horse, and we ride off into the sunset together.
Please upload candidate patches to the tracker.
Need help now? IRC to #irrlicht on irc.freenode.net
How To Ask Questions The Smart Way
netpipe
Posts: 670
Joined: Fri Jun 06, 2008 12:50 pm
Location: Edmonton, Alberta, Canada

working workaround for .irr xml file loading.

Post by netpipe »

Here is a workaround, thanks to CuteAlien.

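#!/bin/sh
# utf32to8: convert a UTF-32 file to UTF-8, in place unless a target is given

if test $# = 0
then
    echo "usage: utf32to8 source [target]"
    exit 1
fi
# Convert into a temporary file first so the source is not overwritten while being read
if iconv -f UTF-32 -t UTF-8 "$1" > .utf32to8dummy
then
    mv .utf32to8dummy "${2:-$1}"
else
    # Conversion failed: drop the partial output and report the error
    rm -f .utf32to8dummy
    exit 1
fi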
CuteAlien
Admin
Posts: 9734
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany

Post by CuteAlien »

Just to mention what my workaround does: it's a simple script called utf32to8.sh which wraps iconv so you don't have to type the full iconv syntax each time. iconv converts UTF-32 files to UTF-8, so if you call the script on a typical .xml written by Irrlicht on Linux you get a UTF-8 file afterwards. That doesn't really solve the problem in this thread, but it lets you keep working with the files, since Irrlicht can read the UTF-8. It also lets me work with the files in other tools, as I have no editor on Linux which can handle UTF-32 (which is the reason why I need this script even on a 32-bit platform).
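For anyone curious what the UTF-32 to UTF-8 step amounts to, here is a minimal sketch of the encoding rule in C++. It is not how iconv is implemented, and the function name utf32ToUtf8 is made up for this example. It assumes the input is already a sequence of code points with no BOM.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode a sequence of Unicode code points as UTF-8 and append it to `out`.
// Returns false at the first surrogate or value above U+10FFFF; `out` then
// holds only the characters encoded before the invalid one.
bool utf32ToUtf8(const std::u32string& in, std::string& out) {
    for (char32_t c : in) {
        const std::uint32_t cp = static_cast<std::uint32_t>(c);
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        if (cp < 0x80) {                       // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return true;
}
```

ASCII characters come out as single bytes, which is why the converted .xml files stay readable in ordinary editors.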
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
hiker
Posts: 58
Joined: Thu May 31, 2007 5:12 am

Post by hiker »

Hi,

this is still an issue with Irrlicht 1.7.2: we recently changed one of the config files in SuperTuxKart to be in utf (to support people entering names that might contain non-ASCII characters), and at least on 64-bit Linux systems (and on some Macs) the config files cannot be read anymore, since char32 is typedef'd to unsigned long, which is 64 bits wide there.
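To make the size problem concrete, here is a small sketch. It assumes a C++11 compiler with <cstdint>, and char32_fixed is an illustrative name, not an Irrlicht type. The point is only that unsigned long follows the platform's data model while a fixed-width type does not.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical alias for a character type that is 32 bits on every platform.
typedef std::uint32_t char32_fixed;

// Fails at compile time if the alias is ever anything other than 32 bits.
static_assert(sizeof(char32_fixed) * CHAR_BIT == 32,
              "char32_fixed must be exactly 32 bits");

// Width of `unsigned long` on the platform this is compiled for.
// Typically 64 on 64-bit Linux and macOS (LP64), and 32 on Windows
// and on 32-bit targets.
constexpr unsigned unsignedLongBits() {
    return static_cast<unsigned>(sizeof(unsigned long) * CHAR_BIT);
}
```

A reader that steps through a UTF-32 file in units of sizeof(unsigned long) consumes two code points per character on LP64, which matches the failure described above.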

Any chances for a fix?

Cheers,
Joerg
CuteAlien
Admin
Posts: 9734
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany

Post by CuteAlien »

Sounds to me like we should change that typedef:

Code: Select all

typedef unsigned long char32;
Maybe to the same code used to typedef u32. But I will have to check if this still compiles (I learned once that the templates used in xml are very sensitive to innocent changes). Unfortunately I don't yet have a compile environment to test 64 bit, so maybe you could try whether this works for you and tell me the results here? Replace the line above (in irrXML.h) with:

Code: Select all

#ifdef _MSC_VER
typedef unsigned __int32	char32;
#else
typedef unsigned int		char32;
#endif
A better solution will probably be adding Nalin's new string-class and allowing xml-writing with utf-8. That's high on my list, but right now I'm working on another i18n issue - to find a better solution for entering cyrillic characters than the one I'm using so far with X11.
hiker
Posts: 58
Joined: Thu May 31, 2007 5:12 am

Post by hiker »

Hi,

...
CuteAlien wrote:But will have to check if this still compiles (I learned once that the templates used in xml are very sensitive to innocent changes).
No, it doesn't (I should have mentioned that I tried something like your patch before finding this thread :) ).

Code: Select all

g++ -Wall -pipe -fno-exceptions -fno-rtti -fstrict-aliasing -g -D_DEBUG -O0 -I../../include -Izlib -Ijpeglib -Ilibpng -I/usr/X11R6/include -DIRRLICHT_EXPORTS=1  -c -o irrXML.o irrXML.cpp
../../include/irrString.h: In instantiation of ‘irr::core::string<unsigned int, irr::core::irrAllocator<unsigned int> >’:
CXMLReaderImpl.h:791:   instantiated from ‘irr::io::CXMLReaderImpl<unsigned int, irr::io::IXMLBase>’
irrXML.cpp:167:   instantiated from here
../../include/irrString.h:840: error: ‘irr::core::string<T, TAlloc>& irr::core::string<T, TAlloc>::operator+=(unsigned int) [with T = unsigned int, TAlloc = irr::core::irrAllocator<unsigned int>]’ cannot be overloaded
../../include/irrString.h:803: error: with ‘irr::core::string<T, TAlloc>& irr::core::string<T, TAlloc>::operator+=(T) [with T = unsigned int, TAlloc = irr::core::irrAllocator<unsigned int>]’
make: *** [irrXML.o] Error 1
Note the line number 840 is off by a line or two because of a local change.

If I then comment out

Code: Select all

 string<T,TAlloc>& operator += (const unsigned int i)
...
in irrString.h (around line 840) I get:

Code: Select all

g++ -Wall -pipe -fno-exceptions -fno-rtti -fstrict-aliasing -g -D_DEBUG -O0 -I../../include -Izlib -Ijpeglib -Ilibpng -I/usr/X11R6/include -DIRRLICHT_EXPORTS=1  -c -o CXMeshFileLoader.o CXMeshFileLoader.cpp
CXMeshFileLoader.cpp: In member function ‘virtual irr::scene::IAnimatedMesh* irr::scene::CXMeshFileLoader::createMesh(irr::io::IReadFile*)’:
CXMeshFileLoader.cpp:78: error: ambiguous overload for ‘operator+=’ in ‘tmpString += time’
../../include/irrString.h:803: note: candidates are: irr::core::string<T, TAlloc>& irr::core::string<T, TAlloc>::operator+=(T) [with T = char, TAlloc = irr::core::irrAllocator<char>]
../../include/irrString.h:812: note:                 irr::core::string<T, TAlloc>& irr::core::string<T, TAlloc>::operator+=(const T*) [with T = char, TAlloc = irr::core::irrAllocator<char>] <near match>
../../include/irrString.h:830: note:                 irr::core::string<T, TAlloc>& irr::core::string<T, TAlloc>::operator+=(int) [with T = char, TAlloc = irr::core::irrAllocator<char>]
../../include/irrString.h:849: note:                 irr::core::string<T, TAlloc>& irr::core::string<T, TAlloc>::operator+=(long int) [with T = char, TAlloc = irr::core::irrAllocator<char>]
../../include/irrString.h:858: note:                 irr::core::string<T, TAlloc>& irr::core::string<T, TAlloc>::operator+=(const long unsigned int&) [with T = char, TAlloc = irr::core::irrAllocator<char>]
../../include/irrString.h:867: note:                 irr::core::string<T, TAlloc>& irr::core::string<T, TAlloc>::operator+=(double) [with T = char, TAlloc = irr::core::irrAllocator<char>]
../../include/irrString.h:876: note:                 irr::core::string<T, TAlloc>& irr::core::string<T, TAlloc>::operator+=(float) [with T = char, TAlloc = irr::core::irrAllocator<char>]
make: *** [CXMeshFileLoader.o] Error 1
At this point I decided to ask the experts here :)
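The second error can be reproduced in miniature. The sketch below assumes `time` in CXMeshFileLoader.cpp is an unsigned int, which the log does not state outright. Appender and appendTime are made-up names. Once the unsigned int overload is gone, converting an unsigned int to char, int, long, double or float are all standard conversions of the same rank, so the compiler cannot pick one.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of a string class with numeric += overloads.
// Each overload records which one was chosen.
struct Appender {
    std::string log;
    Appender& operator+=(char)   { log += "char;";   return *this; }
    Appender& operator+=(int)    { log += "int;";    return *this; }
    Appender& operator+=(long)   { log += "long;";   return *this; }
    Appender& operator+=(double) { log += "double;"; return *this; }
    Appender& operator+=(float)  { log += "float;";  return *this; }
};

// Appends an unsigned value by naming the target type explicitly.
std::string appendTime(unsigned int time) {
    Appender a;
    // a += time;                  // would not compile: ambiguous overload
    a += static_cast<int>(time);   // exact match for operator+=(int)
    return a.log;
}
```

So removing the overload means touching every call site that appends an unsigned value, which is one reason a quick local patch doesn't get far here.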

Thanks for your help!
Joerg
Nalin
Posts: 194
Joined: Thu Mar 30, 2006 12:34 am
Location: Lacey, WA, USA

Post by Nalin »

If you really, really want to allow unicode XML files, you pretty much will be forced to use my modifications:
http://irrlicht.sourceforge.net/phpBB2/ ... hp?t=37296

Irrlicht does not understand unicode. At all. Irrlicht's XML reader can read ASCII, UCS-2, and UTF-32/UCS-4. It, by default, reads the XML file as ASCII unless it encounters a unicode byte order mark. Except it doesn't really know unicode, so if your file is saved as UTF-8 or UTF-16, there is a good chance for data corruption as it can't deal with multi-byte characters.
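For reference, a byte order mark check of the kind described above can be sketched like this. It is not Irrlicht's actual detection code; the Encoding enum and detectBom are names invented for the example, and it follows the usual convention of treating FF FE 00 00 as UTF-32 rather than UTF-16.

```cpp
#include <cassert>
#include <cstddef>

// Encodings that can be recognized from a leading byte order mark.
enum class Encoding { None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Inspect the first bytes of a buffer and report which BOM, if any, is present.
Encoding detectBom(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    // The 4-byte marks are tested first because the UTF-32 LE mark
    // starts with the same two bytes as the UTF-16 LE mark.
    if (size >= 4) {
        if (data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00)
            return Encoding::Utf32LE;
        if (data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF)
            return Encoding::Utf32BE;
    }
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return Encoding::Utf8;
    if (size >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE) return Encoding::Utf16LE;
        if (data[0] == 0xFE && data[1] == 0xFF) return Encoding::Utf16BE;
    }
    return Encoding::None;  // no mark: the caller falls back to a default
}
```

Knowing the encoding is only the first step. Decoding the multi-byte sequences of UTF-8 and the surrogate pairs of UTF-16 is the part the stock reader lacks.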

My modifications will allow you to read ASCII, UTF-8, UTF-16, and UTF-32 formatted XML files, as well as allow you to write UTF-8 formatted XML files. It is your best option for now. Unfortunately, that would require you to compile a custom version of Irrlicht, so it may not be possible for you guys. Sorry, but Irrlicht just cannot handle multi-byte unicode.
Auria
Competition winner
Posts: 120
Joined: Wed Feb 18, 2009 1:11 am

Post by Auria »

CuteAlien wrote:Sounds to me like we should change that typedef:

Code: Select all

typedef unsigned long char32;
Maybe to the same code used to typedef u32. But I will have to check if this still compiles (I learned once that the templates used in xml are very sensitive to innocent changes). Unfortunately I don't yet have a compile environment to test 64 bit, so maybe you could try whether this works for you and tell me the results here? Replace the line above (in irrXML.h) with:

Code: Select all

#ifdef _MSC_VER
typedef unsigned __int32	char32;
#else
typedef unsigned int		char32;
#endif
A better solution will probably be adding Nalin's new string-class and allowing xml-writing with utf-8. That's high on my list, but right now I'm working on another i18n issue - to find a better solution for entering cyrillic characters than the one I'm using so far with X11.
Including full Unicode support would be real nice, though IMO this doesn't change a thing to the fact that defining "char32" to be 64 bits is a bug :)
Nalin wrote: If you really, really want to allow unicode XML files, you pretty much will be forced to use my modifications:
http://irrlicht.sourceforge.net/phpBB2/ ... hp?t=37296

Irrlicht does not understand unicode. At all. Irrlicht's XML reader can read ASCII, UCS-2, and UTF-32/UCS-4. It, by default, reads the XML file as ASCII unless it encounters a unicode byte order mark. Except it doesn't really know unicode, so if your file is saved as UTF-8 or UTF-16, there is a good chance for data corruption as it can't deal with multi-byte characters.
Yes, atm we just dump a core::stringw (wchar_t*) to file (with BOM); on Linux and OSX, wchar_t is 32 bits so the file will be UTF-32 (On windows it may be UTF-16). So normally I expect that we can just load it back into a wchar_t* without any unicode conversion.
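Roughly what that round trip looks like, as a sketch using std::wstring in place of core::stringw. dumpWide and loadWide are hypothetical helpers, not STK or Irrlicht functions. It only works when writer and reader agree on sizeof(wchar_t) and byte order, which is the assumption being made in the post.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Serialize a wide string as a BOM followed by the raw wchar_t units,
// in the native width and byte order of the current platform.
std::vector<unsigned char> dumpWide(const std::wstring& text) {
    std::wstring withBom(1, static_cast<wchar_t>(0xFEFF));
    withBom += text;
    std::vector<unsigned char> bytes(withBom.size() * sizeof(wchar_t));
    std::memcpy(bytes.data(), withBom.data(), bytes.size());
    return bytes;
}

// Load the bytes back without any Unicode conversion. Returns false if the
// size is not a whole number of wchar_t units or the BOM is missing.
bool loadWide(const std::vector<unsigned char>& bytes, std::wstring& text) {
    if (bytes.size() < sizeof(wchar_t) || bytes.size() % sizeof(wchar_t) != 0)
        return false;
    std::wstring units(bytes.size() / sizeof(wchar_t), L'\0');
    std::memcpy(&units[0], bytes.data(), bytes.size());
    if (units[0] != static_cast<wchar_t>(0xFEFF))
        return false;
    text.assign(units, 1, std::wstring::npos);  // drop the BOM
    return true;
}
```

The scheme breaks as soon as the reading side uses a unit of a different size, which is what happens when the reader's char32 is 64 bits wide.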

Of course proper Unicode conversion would be a nice feature to have :)
CuteAlien
Admin
Posts: 9734
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Post by CuteAlien »

It seems hybrid already fixed that in svn trunk some time ago. It was probably not fixed in 1.7 because it's an interface change.
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany

Post by hybrid »

Oh sorry, I didn't recall that fix. And I'm far too lazy with changes.txt lately. But IIRC, it was a fix using a simple struct in place of the raw integer. The changes in the headers made this a 1.8 change; there's no safe way to backport it.
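For readers wondering how a struct gets around the overload clash, here is a sketch of the idea. It is not the code from trunk: XmlChar32 and TinyString are invented names. Wrapping the integer in a struct makes the character type distinct from unsigned int, so operator+=(T) and operator+=(unsigned int) are different signatures again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical 32-bit character type: the same bits as an integer,
// but a separate type as far as overload resolution is concerned.
struct XmlChar32 {
    std::uint32_t value;
    explicit XmlChar32(std::uint32_t v = 0) : value(v) {}
};

// Hypothetical miniature of a string template with both overloads.
template <typename T>
class TinyString {
public:
    // Append one character.
    TinyString& operator+=(T c) {
        data_.push_back(c);
        return *this;
    }

    // Append the decimal digits of a number. This no longer collides
    // with the overload above when T is XmlChar32.
    TinyString& operator+=(unsigned int number) {
        char buf[16];
        const int len = std::snprintf(buf, sizeof(buf), "%u", number);
        assert(len > 0 && len < static_cast<int>(sizeof(buf)));
        for (int i = 0; i < len; ++i)
            data_.push_back(static_cast<T>(static_cast<unsigned char>(buf[i])));
        return *this;
    }

    std::size_t size() const { return data_.size(); }
    const T& operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<T> data_;
};
```

Because the character type in the public header changes from an integer to a struct, code compiled against the old header would no longer match, which fits the remark about it not being backportable.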