Unicode-aware Irrlicht

Nalin
Posts: 194
Joined: Thu Mar 30, 2006 12:34 am
Location: Lacey, WA, USA
Contact:

Post by Nalin »

CuteAlien wrote: Thanks, great work again :-)
I suppose you are using some test code for development. If you could post that as well, it would be easier for others to check the patch out. We're all still deep in 1.7 release-fixing at the moment, though, so please be patient (I'm also even deeper in finishing a day-project this week).
Unfortunately, I use my main project to test, and I rely heavily on the debugger. The code I used for testing the ustring class was mentioned earlier in the thread. This is the code I used to test the XML saving/loading:

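Code:

	// Save the current GUI to an XML file, then read it back.
	env->saveGUI("test.utf8.txt", env->getRootGUIElement());
	io::IXMLReader* xml = device->getFileSystem()->createXMLReader("test.utf8.txt");
	while (xml && xml->read())
	{
		// The Unicode test text lives in "string" nodes, so log every attribute of those.
		if (core::stringw(xml->getNodeName()) == L"string")
		{
			for (u32 i = 0; i < xml->getAttributeCount(); ++i)
			{
				device->getLogger()->log(xml->getAttributeName(i), xml->getAttributeValue(i));
			}
		}
	}
	// IXMLReader is reference counted, so release it with drop() rather than delete.
	if (xml)
		xml->drop();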
I then walked through the code with a debugger, checking the internal state as I went to make sure everything was behaving correctly. I made sure it loaded the XML file correctly and converted it to the Unicode string format that matches the size of wchar_t. Since my Unicode string was in a string field, I had it print out all of the strings to see what would happen.
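
For anyone wondering why the size of wchar_t matters here: it is 16 bits on Windows and usually 32 bits on Linux, so a character outside the Basic Multilingual Plane takes a surrogate pair on one platform and a single unit on the other. Below is a rough sketch of that idea in plain C++. appendCodePoint is a hypothetical helper for illustration only; it is not part of Irrlicht or the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from Irrlicht): appends one Unicode code point
// to a wide string, using whichever encoding fits the platform's wchar_t.
void appendCodePoint(std::wstring& out, char32_t cp)
{
    // Only valid scalar values are accepted: no surrogates, nothing above U+10FFFF.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (sizeof(wchar_t) == 2 && cp > 0xFFFF)
    {
        // 16-bit wchar_t: split the code point into a UTF-16 surrogate pair.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<wchar_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<wchar_t>(0xDC00 + (v & 0x3FF)));
    }
    else
    {
        // 32-bit wchar_t, or a code point that fits in one 16-bit unit.
        out.push_back(static_cast<wchar_t>(cp));
    }
}
```

With this, a string holding 'A' followed by U+1F600 ends up three units long where wchar_t is 16 bits and two units long where it is 32 bits.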

I will create a sample project for testing, but it will have to use a butchered version of my CGUITTFont in order to draw the Unicode glyphs correctly.

Post by Nalin »

New version of ustring:
http://irrlicht.pastebin.com/f6aed1724

Changes in this version:
  • Fixed validate().
  • Fixed the trim() default parameter.
  • Fixed some gcc warnings.
  • Various functions now validate the result after converting to UTF-16.
  • Added operator== to the iterators.
  • Fixed the iterator toEnd() function (it wasn't technically broken; an unused code path was removed).
  • Added the ability to insert() whole strings.
  • Added the utility function getUnicodeBOM() to the core::unicode namespace.
  • Improved support for C++0x features.

I fixed a couple of bugs and warnings in the code.

The class now validates the string when you construct a new one or append a character array. I may remove this automatic validation, though. Should the class validate automatically, or rely on the user to call validate()?
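
To make the question concrete, here is a minimal sketch of what validating UTF-16 data can involve: checking that every surrogate is part of a proper high/low pair and replacing the strays with U+FFFD. This is an assumption about the general technique, not the code in ustring's validate(). The name validateUtf16 and the use of std::vector are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not ustring's validate()): replaces every unpaired
// surrogate with U+FFFD and returns how many units were replaced.
std::size_t validateUtf16(std::vector<std::uint16_t>& units)
{
    const std::uint16_t replacement = 0xFFFD;
    std::size_t replaced = 0;

    for (std::size_t i = 0; i < units.size(); ++i)
    {
        const std::uint16_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF)
        {
            // High surrogate: valid only if a low surrogate follows it.
            if (i + 1 < units.size() && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF)
            {
                ++i; // Well-formed pair, skip the low half.
            }
            else
            {
                units[i] = replacement;
                ++replaced;
            }
        }
        else if (u >= 0xDC00 && u <= 0xDFFF)
        {
            // Low surrogate with no high surrogate in front of it.
            units[i] = replacement;
            ++replaced;
        }
    }
    return replaced;
}
```

A pass like this is linear in the length of the string, which is the cost being weighed when deciding between automatic validation and an explicit call.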

I also added an operator== overload to the iterators, so you can use them like you would STL iterators:

Code:

ustring test("This is a test");
for (ustring::iterator i = test.begin(); i != test.end(); ++i)
{
	// Process the character the iterator currently points at (*i).
}
I also updated the C++0x features. If you are using Visual C++ 2010 or have enabled C++0x features in GCC, it will correctly use move semantics for constructing, assigning, or adding (operator+) strings, resulting in a performance boost.
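
For readers who have not used move semantics yet, the following is a generic sketch of where the performance boost comes from: a move constructor takes over the source's buffer instead of allocating and copying. Buffer is a hypothetical stand-in class written in standard C++11, not the ustring code, and older compilers such as Visual C++ 2010 may not accept every keyword used here (noexcept, for example).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a string class that owns a heap buffer.
class Buffer
{
public:
    explicit Buffer(std::size_t n) : data_(new std::uint16_t[n]()), size_(n) {}
    ~Buffer() { delete[] data_; }

    // Copy constructor: allocates a new buffer and copies every unit.
    Buffer(const Buffer& other) : data_(new std::uint16_t[other.size_]), size_(other.size_)
    {
        for (std::size_t i = 0; i < size_; ++i)
            data_[i] = other.data_[i];
    }

    // Move constructor: steals the pointer and leaves the source empty.
    Buffer(Buffer&& other) noexcept : data_(other.data_), size_(other.size_)
    {
        other.data_ = nullptr;
        other.size_ = 0;
    }

    // Copy-and-swap assignment: the by-value parameter is built with the
    // copy or the move constructor, depending on the argument.
    Buffer& operator=(Buffer other) noexcept
    {
        std::swap(data_, other.data_);
        std::swap(size_, other.size_);
        return *this;
    }

    const std::uint16_t* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    std::uint16_t* data_;
    std::size_t size_;
};
```

Constructing one Buffer from std::move of another leaves the new object holding the original pointer, with no allocation and no copy.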

Post by Nalin »

I've uploaded another new version of ustring:
http://irrlicht.suckerfreegames.com/irrUString.h

Changes in this version:
  • Renamed the class to ustring16, added custom allocators back in, and added a ustring typedef.
  • Renamed all the byte order mark constants in the unicode namespace to better convey their functions.
  • Added endianness conversion features.
  • The class checks for byte order marks when you construct a new string.
  • Ability to add byte order marks and specify endianness when using the string conversion functions.
  • Various bugs fixed.
The big new features are the byte order marks and the endianness features. When you construct a new string, it will check to see if a byte order mark is present. If one is, it will correctly construct the string, taking into account the endianness specified by the mark. The byte order mark will NOT be saved into the string, however.
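
As a rough illustration of the byte order mark check, here is a standalone sketch that inspects the first few bytes of a buffer. It is not the code from irrUString.h. The Bom enum, detectBom, and bomLength are hypothetical names chosen for this example, and the byte patterns are the standard Unicode ones.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for this sketch.
enum class Bom { None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Returns which byte order mark, if any, starts the buffer.
Bom detectBom(const unsigned char* p, std::size_t n)
{
    // UTF-32 is tested first because its little-endian mark (FF FE 00 00)
    // begins with the UTF-16 little-endian mark (FF FE). Note that this makes
    // UTF-16LE text starting with a NUL character look like UTF-32LE.
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return Bom::Utf32LE;
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return Bom::Utf32BE;
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return Bom::Utf8;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return Bom::Utf16LE;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return Bom::Utf16BE;
    return Bom::None;
}

// Number of bytes to skip so the mark itself is not stored in the string.
std::size_t bomLength(Bom b)
{
    switch (b)
    {
    case Bom::Utf8:    return 3;
    case Bom::Utf16LE:
    case Bom::Utf16BE: return 2;
    case Bom::Utf32LE:
    case Bom::Utf32BE: return 4;
    default:           return 0;
    }
}
```

Skipping bomLength() bytes before decoding corresponds to the behavior described above, where the mark decides the endianness but is not kept in the string.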

The string now saves its EUTF_ENCODE type. The encoding of the string can be retrieved via the getEncoding() member function. Currently, the only valid return values you should get are EUTFE_UTF16_LE and EUTFE_UTF16_BE.

I also added a new enum, EUTF_ENDIAN, and the getEndianness() member function returns the endianness of the ustring.

I also updated all the Unicode conversion routines (the toUTF8, toUTF16, and toUTF32 member functions). You can choose whether the function adds the appropriate byte order mark to the output string by passing true to the addBOM argument. For the toUTF16 and toUTF32 functions, you can also specify which endianness the resulting string should have: EUTFEE_NATIVE, EUTFEE_LITTLE, or EUTFEE_BIG.

These changes make it very easy to read and write properly formatted Unicode files, complete with the appropriate byte order marks. The class will take care of all the necessary endian conversions for you.

Code:

// Passing true tells the class to include the byte order mark.
core::array<uchar8_t> utf8 = mystring.toUTF8(true);

io::IWriteFile* f = device->getFileSystem()->createAndWriteFile("mysettings.txt");
if (f) // createAndWriteFile() returns 0 if the file could not be opened for writing.
{
	f->write(utf8.pointer(), utf8.size());
	f->drop();
}

Code:

// Passing true tells the class to include the byte order mark, and passing
// core::unicode::EUTFEE_BIG says that we want to save it as big-endian, for whatever reason.
core::array<uchar16_t> utf16be = mystring.toUTF16(core::unicode::EUTFEE_BIG, true);

io::IWriteFile* f = device->getFileSystem()->createAndWriteFile("mysettings.txt");
if (f) // createAndWriteFile() returns 0 if the file could not be opened for writing.
{
	// write() takes a size in bytes, hence the multiplication by sizeof(uchar16_t).
	f->write(utf16be.pointer(), utf16be.size() * sizeof(uchar16_t));
	f->drop();
}
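
If it helps to see what the endian conversion amounts to, below is a small standalone sketch that turns UTF-16 code units into bytes in a chosen order, with an optional byte order mark. It only illustrates the idea and is not how the class is implemented. ByteOrder, putUnit, and serializeUtf16 are hypothetical names; the real class uses the EUTFEE_* values mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for this sketch only.
enum class ByteOrder { Little, Big };

// Appends one 16-bit code unit to 'out' in the requested byte order.
static void putUnit(std::vector<unsigned char>& out, std::uint16_t u, ByteOrder order)
{
    const unsigned char hi = static_cast<unsigned char>(u >> 8);
    const unsigned char lo = static_cast<unsigned char>(u & 0xFF);
    if (order == ByteOrder::Big)
    {
        out.push_back(hi);
        out.push_back(lo);
    }
    else
    {
        out.push_back(lo);
        out.push_back(hi);
    }
}

// Serializes UTF-16 code units to bytes, independent of the host's endianness.
std::vector<unsigned char> serializeUtf16(const std::vector<std::uint16_t>& units,
                                          ByteOrder order, bool addBom)
{
    std::vector<unsigned char> out;
    out.reserve((units.size() + (addBom ? 1 : 0)) * 2);

    // U+FEFF written in the target byte order is the byte order mark.
    if (addBom)
        putUnit(out, 0xFEFF, order);

    for (std::size_t i = 0; i < units.size(); ++i)
        putUnit(out, units[i], order);

    return out;
}
```

For the two units 0x0041 and 0x20AC, big-endian output with a mark is the byte sequence FE FF 00 41 20 AC, and little-endian output without one is 41 00 AC 20.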