How to use UNICODE

milen_prg
Posts: 5
Joined: Tue May 16, 2023 5:42 pm

How to use UNICODE

Post by milen_prg »

I used the Font Tool to generate a Russian font, then loaded it in code:

IGUIEnvironment* guie = device->getGUIEnvironment();
// Load the font generated by the Font Tool and make it the skin's default font
IGUIFont* font = guie->getFont("../../font/sogoeuibru.xml");
if (font)
    guie->getSkin()->setFont(font);
guie->getSkin()->setColor(EGDC_BUTTON_TEXT, SColor{255, 255, 255, 100});

auto *t = guie->addStaticText(L"Proba ПРОба my font.", rect<s32>{10, 10, 300, 100});

The Latin letters are displayed correctly, but the Cyrillic ones show up as wrong glyphs. Can this be corrected? How do I activate the "Irrlicht full Unicode support"?
CuteAlien
Admin
Posts: 9843
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany

Re: How to use UNICODE

Post by CuteAlien »

Using Cyrillic inside C++ code can be a bit tricky. The first thing is that it depends on the format your C++ file is saved in (and maybe also on the compiler).
I think by default C++ source files are saved in ANSI and have trouble with Cyrillic, but I'm not exactly sure about that. Visual Studio warns me and wants to convert to Unicode, which can mean all kinds of formats in a text file. Googling around a bit, it seems that what VS chooses depends on the VS version (it seems older versions used UTF-8 and newer ones may use UTF-16 with BOM). Other editors/IDEs may choose differently or not support it at all. I'm also not sure whether all compilers then interpret the text format correctly. Anyway - in the end it has to be Unicode code points when passed to Irrlicht.

So the usual solution is: don't put Cyrillic strings inside your source files at all. Instead, work with string tables. There are all kinds of ways to implement those, but generally the interface is something that takes a string label (in ANSI) and returns a wchar_t* string in the format you need (UTF-16 for example). You can find 2 simple examples of string tables (using TinyXML) on my website: http://www.michaelzeilfelder.de/irrlicht.htm
Here's an example of what such an XML string table can look like: https://github.com/mzeilfelder/media_hc ... ngs_ru.xml

There are other ways to do that - you can also use irrXML to load the XML, for example. But the general idea is always the same: put your strings in a separate text file, which is usually saved as either UTF-8 or UTF-16 (I think irrXML can handle UTF-8 only in Irrlicht svn; Irrlicht 1.8 probably needs files in UTF-16 or UCS-2).

This also has a big advantage once you want to have your application translated. Then all you have to do is load different string tables for different languages.
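To make the string-table idea a bit more concrete, here is a minimal sketch of the lookup side only. The class name StringTable and its set/get methods are hypothetical (they are not part of Irrlicht or of the examples linked above), and reading and decoding the XML or text file is left out; a loader would simply call set() for every label/text pair it reads.

```cpp
#include <map>
#include <string>

// Hypothetical minimal string table: ASCII labels map to wide strings.
class StringTable
{
public:
    // Add or overwrite one entry. A real loader would call this for every
    // label/text pair it reads from the string-table file.
    void set(const std::string& label, const std::wstring& text)
    {
        entries[label] = text;
    }

    // Returns the stored text, or the label itself (widened) when the label
    // is missing, so untranslated strings stay visible on screen.
    std::wstring get(const std::string& label) const
    {
        std::map<std::string, std::wstring>::const_iterator it = entries.find(label);
        if (it != entries.end())
            return it->second;
        return std::wstring(label.begin(), label.end()); // ASCII-only widening
    }

private:
    std::map<std::string, std::wstring> entries;
};
```

With something like this, the GUI code would pass table.get("greeting").c_str() to addStaticText instead of a literal, and switching language means filling the table from a different file.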

Also, you probably did that already, but I'm mentioning it anyway - make sure your font supports Cyrillic characters. Many don't. You can usually test it by creating an edit box in Irrlicht and just typing into it. Cyrillic characters should show up (as long as the keyboard input in your OS is set to Cyrillic).
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
milen_prg
Posts: 5
Joined: Tue May 16, 2023 5:42 pm

Re: How to use UNICODE

Post by milen_prg »

@CuteAlien, my source code is in UTF-8; I use VSCode, a modern IDE.
milen_prg
Posts: 5
Joined: Tue May 16, 2023 5:42 pm

Re: How to use UNICODE

Post by milen_prg »

I just now saw what the problem was - my system has Irrlicht 1.8.4 installed, which can't display the Unicode text. I tried on a system with v1.8.5 and everything is OK.

Because my 1.8.4 setup uses MSVC and the 1.8.5 one uses GCC, I will try to update the first one to see whether the library version was the only problem.
CuteAlien
Admin
Posts: 9843
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany

Re: How to use UNICODE

Post by CuteAlien »

Even if the text is saved in UTF-8, you still have to figure out what the compiler internally makes of this string. Using L"something" just means that "something" is stored as wchar_t in memory. That is not a string format; it only fixes a size, usually 2 bytes per character on Windows and 4 bytes per character on Linux. So a UTF-8 C++ source file loaded into a wchar_t string may go through some conversion, or the characters may just be copied. It's up to the compiler and not something the C++ standard defines.
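One way to check what a compiler really put into a wide literal is to dump its code units. The following is only a small debugging sketch; dumpCodeUnits is a hypothetical helper name, and the \u escapes (universal character names) used in the comment are a technique not mentioned above, chosen because they do not depend on the encoding of the source file.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Returns the code units of a wide string as space-separated uppercase hex,
// at least four digits each.
// Example: dumpCodeUnits(L"\u041F\u0420\u041E") gives "041F 0420 041E".
std::string dumpCodeUnits(const wchar_t* text)
{
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (std::size_t i = 0; text[i] != 0; ++i)
    {
        if (i > 0)
            out << ' ';
        // setw only applies to the next value, so it is set per code unit
        out << std::setw(4) << static_cast<unsigned long>(text[i]);
    }
    return out.str();
}
```

If dumping L"ПРО" (typed directly into the source) shows more than three code units, typically two per Cyrillic letter, the compiler read the UTF-8 bytes of the file as a single-byte code page, and the fix has to happen at the compiler-flag or source-encoding level.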

You can control this behaviour to a certain extent in most compilers via compile flags. So it might be possible to figure out a set of working compile flags for each compiler which makes it work.

Alternatively - C++11 allows you to use prefixed string literals. So you can write u8"some string", and then you know it's in UTF-8 format in memory (the compiler still has to read the source file in the right encoding, though). You can then use string-conversion functions to convert that string into some wide-string variant. Irrlicht 1.8 doesn't have those, but Irrlicht svn trunk has utf8ToWchar. It's simply some code we got from PhysFS which is in utf8.cpp (https://sourceforge.net/p/irrlicht/code ... t/utf8.cpp), so if you don't want to use the Irrlicht svn trunk version you can still copy that file. You will then need to compile your sources with C++11 support, but basically every C++ compiler supports that by now.

So then it's u8"Proba ПРОба my font.", which you then convert with utf8ToWchar. That function expects you to know the size of the result and is a bit complicated to use, sorry. But when converting in this direction I think strings never get longer, so allocating strlen+1 characters is probably enough.
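For illustration, this is roughly what such a conversion does. It is a hand-written sketch, not Irrlicht's utf8ToWchar and not the PhysFS code; the name utf8ToWide is hypothetical, and the decoder assumes mostly well-formed input (it does not reject overlong encodings or encoded surrogates).

```cpp
#include <cstddef>
#include <string>

// Minimal UTF-8 -> wchar_t sketch. Invalid or truncated sequences
// are replaced by U+FFFD.
std::wstring utf8ToWide(const std::string& utf8)
{
    std::wstring result;
    result.reserve(utf8.size()); // never more code units than input bytes
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        unsigned long cp;
        std::size_t extra; // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else                            { cp = 0xFFFD;      extra = 0; }
        ++i;
        for (std::size_t k = 0; k < extra; ++k)
        {
            if (i >= utf8.size() ||
                (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80)
            {
                cp = 0xFFFD; // truncated or invalid sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
            ++i;
        }
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF)
        {
            // 2-byte wchar_t (e.g. Windows): store as a UTF-16 surrogate pair
            cp -= 0x10000;
            result += static_cast<wchar_t>(0xD800 + (cp >> 10));
            result += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        }
        else
        {
            result += static_cast<wchar_t>(cp);
        }
    }
    return result;
}
```

When compiling as C++11 to C++17, the result of utf8ToWide(u8"Proba ПРОба my font.") can be passed on via .c_str(). Note that since C++20 a u8 literal has type const char8_t[], so it would need a cast to const char* first.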

There's also multibyteToWString in Irrlicht svn trunk, which is slightly easier to use. But what exactly it does depends on the LC_CTYPE of the current C locale.

And I think modern c++ has a few more conversion functions to offer now.

It's a bit of a sorry state in C++, but it was developed back in the days before we built the Tower of Babel, when everyone in the world spoke ANSI ;-)
And Irrlicht is certainly also a bit to blame, but for similar reasons I assume (it was developed around the time Java was the big thing, so I guess it was a bit influenced by that, maybe?)
The easiest solution is still not to use non-ANSI characters in C++ source files. Put them into text files, then load and convert them. That is good software development practice anyway for any string which your program's users can see, as it makes translating your program into other languages easier.
milen_prg
Posts: 5
Joined: Tue May 16, 2023 5:42 pm

Re: How to use UNICODE

Post by milen_prg »

@CuteAlien, thank you very much for this detailed explanation!
Before that I tried Irrlicht with GCC and the Cyrillic font worked just fine. With MSVC it does not work.
Now, after your explanation, I first tried adding some compiler flags for MSVC:

set(CMAKE_C_FLAGS_RELEASE "/source-charset:utf-8 /utf-8 /validate-charset")

but without any success.
I had some problems linking https://sourceforge.net/p/irrlicht/code ... t/utf8.cpp, so in the end I used the function utf8ztowchar from https://github.com/tapika/cutf, which works fine, but it is not as convenient as using the strings directly in the source, as with GCC.

Anyway, I now understand the problem and have a working solution. I will consider loading all strings from a file with proper conversion.

Thank you!
milen_prg
Posts: 5
Joined: Tue May 16, 2023 5:42 pm

Re: How to use UNICODE

Post by milen_prg »

For completeness I must add that I corrected the flags for MSVC C++ (Visual Studio 2022), setting CMAKE_CXX_FLAGS_RELEASE instead of CMAKE_C_FLAGS_RELEASE, and now it works directly with L strings:

set(CMAKE_CXX_FLAGS_RELEASE "/source-charset:utf-8 /execution-charset:utf-8")

or even just:
set(CMAKE_CXX_FLAGS_RELEASE "/utf-8")

(In gcc/g++ this works without any additional flags.)