XMLReader vs XMLWriter

clarks
Posts: 35
Joined: Sat Jul 28, 2012 5:23 am

XMLReader vs XMLWriter

Post by clarks »

I believe that the XMLWriter interface should be changed a bit. Its interface is vastly different from how XMLReader is implemented. For one, XMLReader is templated, allowing both wchar_t and char, and it makes it easy to traverse an XML document, find elements and get attribute values. XMLWriter, on the other hand, only deals with wchar_t, so the conversion is a killer if you're used to using irr::core::stringc for strings. Another thing is that XMLReader allows reading attributes just by name. Shouldn't XMLWriter allow writing attributes just by name? These are just my thoughts, please comment.
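
To illustrate the conversion cost: with a wchar_t-only writer, every narrow string has to be widened before each call. This is a minimal sketch in standard C++ rather than Irrlicht's string classes; widenAscii is a made-up helper name, and it assumes plain 7-bit ASCII input.

```cpp
#include <string>

// Hypothetical helper, not part of Irrlicht: widen a narrow string so it
// can be passed to an API that only accepts wchar_t.
// Assumption: every char is 7-bit ASCII; anything else needs a real
// encoding conversion instead of a per-character cast.
std::wstring widenAscii(const std::string& narrow)
{
    std::wstring wide;
    wide.reserve(narrow.size());
    for (unsigned char c : narrow)
        wide.push_back(static_cast<wchar_t>(c));
    return wide;
}
```

With a templated writer this step would not be needed for char-based strings at all.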
CuteAlien
Admin
Posts: 9648
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany

Re: XMLReader vs XMLWriter

Post by CuteAlien »

Yes, supporting more formats on writing is certainly a feature that would be nice, especially as the current version writes different files on Linux (4-byte characters) and Windows (2-byte characters). There is a patch from Nalin somewhere that uses his utf16 string class to fix that. I hope we can decide on a solution in Irrlicht 1.9.

I don't know what you mean by writing attributes by name. Can you give an example of what that interface would look like?

What I miss a lot is that the writer cannot write CDATA at all. It also replaces special characters in comments, which is probably wrong as well. I'm even considering fixing the CDATA part before 1.8 when I find time for it, as the writer is simply not usable right now when you need CDATA, and it's probably easy to fix.
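
For reference, a minimal sketch of what writing a CDATA section involves, in plain standard C++. makeCData is a hypothetical helper and not something the Irrlicht writer provides. The text goes out unescaped, and the only sequence that needs care is a literal "]]>", which would otherwise end the section early.

```cpp
#include <string>

// Hypothetical helper, not part of Irrlicht: wrap text in a CDATA section.
// The text is written as-is (no entity replacement). A literal "]]>" in
// the text is split across two adjacent CDATA sections so that a parser
// reads back the original characters.
std::string makeCData(const std::string& text)
{
    const std::string terminator = "]]>";
    const std::string split = "]]]]><![CDATA[>";

    std::string out = "<![CDATA[";
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = text.find(terminator, start)) != std::string::npos)
    {
        out.append(text, start, pos - start); // text before the terminator
        out += split;                         // "]]" ends one section, ">" starts the next
        start = pos + terminator.size();
    }
    out.append(text, start, std::string::npos); // remaining text
    out += "]]>";
    return out;
}
```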

Also annoying is that the reader mixes up names and data, using the same variable for both, which often makes coding far more complicated than necessary. I just don't dare change that before 1.8, as we already have a feature freeze and it's not a trivial change (a patch for that has to make sure the name is still available in the close tag and has to minimize breaking existing code). But that is also something I hope to change in Irrlicht 1.9.
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
clarks
Posts: 35
Joined: Sat Jul 28, 2012 5:23 am

Re: XMLReader vs XMLWriter

Post by clarks »

The one I'm thinking about is like a fixed pipeline, and probably templated:

Code:

template <typename char_type>
class IIrrXMLWriter // base class left open here
{
public:
    virtual ~IIrrXMLWriter() {}

    // The function writeXMLHeader probably doesn't need XML in its name
    virtual void writeHeader() = 0;
    virtual const char_type* writeElement(const char_type* name, bool empty, bool holdForAttributes) = 0;
    virtual const char_type* writeAttribute(const char_type* name, const char_type* value) = 0;
    virtual const char_type* writeText(const char_type* text) = 0;
    virtual const char_type* closeElement() = 0;
};

// Placeholder for whatever creates the XML writer
IIrrXMLWriter<char>* createXMLWriter();

int main()
{
    IIrrXMLWriter<char>* xml = createXMLWriter();

    xml->writeHeader();

    // It's empty, but don't close it yet because we have to write attributes
    xml->writeElement("letter", true, true);
    xml->writeAttribute("name", "Joe");
    xml->writeAttribute("address", "no where");
    xml->closeElement(); // no need to supply the name; it is figured out internally, possibly using a stack

    // What if nested elements need to be written?
    // It's not empty, but don't close it yet because we have to write attributes
    xml->writeElement("letter", false, true);
    xml->writeAttribute("name", "Joe");
    xml->writeAttribute("address", "no where");
        xml->writeElement("body", false, false); // automatically writes the > that ends the opening tag of letter
        xml->writeText("Hello, how are you. I am trying to improve Irrlicht.  It's a great engine. You should check it out. Love Always, Irrlicht");
        xml->closeElement(); // close body
    xml->closeElement(); // close letter

    delete xml;
    return 0;
}
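
The "figures it out using possibly a stack" part can be sketched roughly as follows. This is a simplified, hypothetical StackXMLWriter in plain standard C++, not the proposed interface itself and not Irrlicht code. It drops the empty and holdForAttributes flags and instead remembers whether the opening tag is still unfinished, and it does no escaping of special characters.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch, not Irrlicht API: open element names are kept on a
// stack, so closeElement() needs no name. No escaping is done here.
class StackXMLWriter
{
public:
    void writeElement(const std::string& name)
    {
        finishOpenTag();          // a nested element ends the parent's opening tag
        out_ << '<' << name;
        open_.push_back(name);
        tagOpen_ = true;
    }

    void writeAttribute(const std::string& name, const std::string& value)
    {
        // Attributes are only valid while the opening tag is unfinished.
        if (tagOpen_)
            out_ << ' ' << name << "=\"" << value << '"';
    }

    void writeText(const std::string& text)
    {
        finishOpenTag();
        out_ << text;
    }

    void closeElement()
    {
        if (open_.empty())
            return;
        if (tagOpen_)
            out_ << "/>";                          // nothing was written inside: empty element
        else
            out_ << "</" << open_.back() << '>';   // name comes from the stack
        open_.pop_back();
        tagOpen_ = false;
    }

    std::string str() const { return out_.str(); }

private:
    void finishOpenTag()
    {
        if (tagOpen_)
        {
            out_ << '>';
            tagOpen_ = false;
        }
    }

    std::ostringstream out_;
    std::vector<std::string> open_;
    bool tagOpen_ = false;
};
```

Calling writeElement("letter"), writeAttribute("name", "Joe") and closeElement() in that order gives a self-closing element, while writing a nested element or text in between gives a proper closing tag.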
 
An interesting thing is that there are two functions for writing elements in the original interface. One takes an array of names and values using wchar_t, while the other takes a lot of parameters. Then there is the IAttributes class, which is very flexible, but there is some inconsistency. I believe that these two interfaces should write attributes the same way. However, IAttributes writes attributes as elements, while the XML writer writes real attributes:

<body what="this is how attributes should be written"/>

I was reading an article about XML at http://www.w3schools.com/ where the author rejected the idea of using attributes. Personally I am not a big fan of XML, but Irrlicht is great, so if Irrlicht provides it, I use it. Now take a look at this XML doc:

Code:

 
<?xml version="1.0"?>
    <body  
        mass = "1000.0"
        width = "1.83"
        height = "0.963"
        depth = "4.470"
        model = "chassis.obj"
        vshader = "vert.glsl"
        fshader = "frag.glsl"
        wind_sound = "sound/windy.wav"
    />
 
vs.

Code:

 
<?xml version="1.0"?>
 
    <body>  
        <mass>1000.0</mass>
        <width>1.83</width>
        <height>0.963</height>
        <depth>4.470</depth>
        <model>chassis.obj</model>
        <vshader>vert.glsl</vshader>
        <fshader>frag.glsl</fshader>
        <wind_sound>sound/windy.wav</wind_sound>
    </body>
 
The first doc is easy to read and pretty straightforward. Now imagine these attributes written as elements (the second doc); it would be overkill. Some people would actually prefer that. Why on earth they would prefer it, I don't know, but I am a big fan of keeping things simple and clean. If the XML writer can take an array of names and values, it should probably be able to return one. IAttributes is powerful, but its attributes are really elements. Irrlicht gets better every day, and as it continues to mature I think we should keep it clean.
CuteAlien
Admin
Posts: 9648
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany

Re: XMLReader vs XMLWriter

Post by CuteAlien »

Writing templated: maybe. I think reading works like this, although it is maybe more about which parameters to pass in. Then again, there is only a fixed number of possibilities that really make sense (utf-8, utf-16, utf-32), so using a parameter on writing would not be so bad either, and it would be obvious what is supported. I would probably prefer that as a user.
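
A minimal sketch of what such an encoding parameter could look like, in plain standard C++. TextEncoding and encodeText are hypothetical names, not Irrlicht API. The input is assumed to be valid Unicode code points, and only little-endian output is shown for the 16-bit and 32-bit cases.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical names, not Irrlicht API.
enum class TextEncoding { Utf8, Utf16LE, Utf32LE };

// Encode Unicode code points into bytes in the requested encoding.
// Assumption: every input value is a valid code point (<= 0x10FFFF,
// no surrogates); no validation is done in this sketch.
std::vector<std::uint8_t> encodeText(const std::u32string& text, TextEncoding enc)
{
    std::vector<std::uint8_t> out;
    auto put8 = [&out](std::uint32_t b) {
        out.push_back(static_cast<std::uint8_t>(b & 0xFF));
    };
    auto put16 = [&put8](std::uint32_t u) { // one little-endian 16-bit unit
        put8(u);
        put8(u >> 8);
    };

    for (char32_t ch : text)
    {
        const std::uint32_t cp = ch;
        switch (enc)
        {
        case TextEncoding::Utf8:
            if (cp < 0x80) {
                put8(cp);
            } else if (cp < 0x800) {
                put8(0xC0 | (cp >> 6));
                put8(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                put8(0xE0 | (cp >> 12));
                put8(0x80 | ((cp >> 6) & 0x3F));
                put8(0x80 | (cp & 0x3F));
            } else {
                put8(0xF0 | (cp >> 18));
                put8(0x80 | ((cp >> 12) & 0x3F));
                put8(0x80 | ((cp >> 6) & 0x3F));
                put8(0x80 | (cp & 0x3F));
            }
            break;
        case TextEncoding::Utf16LE:
            if (cp < 0x10000) {
                put16(cp);
            } else {
                const std::uint32_t v = cp - 0x10000; // split into a surrogate pair
                put16(0xD800 | (v >> 10));
                put16(0xDC00 | (v & 0x3FF));
            }
            break;
        case TextEncoding::Utf32LE:
            put16(cp & 0xFFFF);
            put16(cp >> 16);
            break;
        }
    }
    return out;
}
```

Because the caller picks the encoding explicitly, the bytes written no longer depend on the size of wchar_t on the platform.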

The two functions for writing probably have a simple reason. One is there for laziness and covers the usual case. The other is there for when you need more than 5 attributes.

The attributes interface is about serialization and the result looks rather like this:

Code:

 
<int name="VariableNameInt" value="152722522" />
<float name="VariableNameFloat" value="1.000000" />
<string name="VariableNameString" value="one" />
 
So each attribute saves name, value and type. Whether that last one is needed depends a little on what you do; in many cases it wouldn't be necessary. But I think it's used, for example, in irrEdit to support editing parameters without irrEdit having to know _which_ parameters an interface supports. So when we add a new variable to a scene node, for example, we just have to serialize it and the editor automatically supports it.
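
As a rough illustration of that output format, here is a minimal sketch in plain standard C++ rather than Irrlicht's own attribute classes. AttributeList and its methods are made-up names for this example. The point is only that the type becomes the tag, while name and value stay real XML attributes.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a serializable attribute list (not Irrlicht's
// IAttributes). Each entry remembers its type, name and value as text.
class AttributeList
{
public:
    void addInt(const std::string& name, int value)
    {
        add("int", name, std::to_string(value));
    }
    void addFloat(const std::string& name, float value)
    {
        add("float", name, std::to_string(value)); // fixed notation, 6 decimals
    }
    void addString(const std::string& name, const std::string& value)
    {
        add("string", name, value);
    }

    // One element per attribute; the tag name carries the type.
    // No escaping of special characters is done in this sketch.
    std::string toXML() const
    {
        std::string xml;
        for (const Entry& e : entries_)
            xml += "<" + e.type + " name=\"" + e.name + "\" value=\"" + e.value + "\" />\n";
        return xml;
    }

private:
    struct Entry
    {
        std::string type;
        std::string name;
        std::string value;
    };

    void add(const std::string& type, const std::string& name, const std::string& value)
    {
        entries_.push_back(Entry{type, name, value});
    }

    std::vector<Entry> entries_;
};
```

An editor reading this back can pick a widget from the tag name alone, without knowing in advance which variables exist.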
clarks
Posts: 35
Joined: Sat Jul 28, 2012 5:23 am

Re: XMLReader vs XMLWriter

Post by clarks »

CuteAlien wrote: The attributes interface is about serialization
I know that and I think it's brilliant.
CuteAlien wrote: So each attribute saves name, value and type. If that last one is needed depends a little bit on what you do - actually in many cases it wouldn't be necessary
My point exactly. I just think the type is a bit overkill because attributes are written as elements, which of course already lets us know the type. But making changes to it would break compatibility, and that's not good.
CuteAlien wrote: But I think it's used for example in irrEdit to allow support for editing parameters without irrEdit having to know _which_ parameters an interface supports.
OK, you made your point: irrEdit would automatically be able to create a widget for the attribute based on its type, like creating a check box for booleans and only allowing numbers to be entered in an edit box for ints and floats. That's brilliant!
What if we could control how IAttributes are read and written? How about specifying whether they should be written as attribute elements or as real XML attributes, and likewise whether they are read as attribute elements or as real XML attributes? I am thinking in terms of making it a bit more general purpose, but I could be pushing this too far.
CuteAlien
Admin
Posts: 9648
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany

Re: XMLReader vs XMLWriter

Post by CuteAlien »

clarks wrote: What if we could control how IAttributes are read and written. How about specifying whether or not it should write as an attribute element or as real xml attributes. A same way for reading them as attribute elements or as real xml attributes. I am thinking in terms of making it a bit more general purpose, but I could be pushing this too far.
I guess we could sometimes have smaller XML files then. Without the type, the format could probably look like this:

Code:

<attributes>
VariableNameInt="152722522"
VariableNameFloat="1.000000"
VariableNameString="one"
</attributes>
 
Or, more likely, we would need another label for attributes to distinguish them from the other style (so we know about it on reading). The question is whether there are really use cases where that is needed. So maybe a nice-to-have feature, but not exactly a high priority for now. The other changes, like writing utf-8, writing CDATA and a cleaner XML interface on reading, just seem more important for now.
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany

Re: XMLReader vs XMLWriter

Post by hybrid »

Oh no, not such an unparsable format. The really proper way would be to put everything into nodes. Due to some legacy considerations inherent to most programmers, the first idea is always to put things into attributes. But that's not a good idea: you lose too much flexibility and the gain is negligible. Of course these things cannot be changed anymore once the format is used that way. But we shouldn't make it worse than it is.
CuteAlien
Admin
Posts: 9648
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany

Re: XMLReader vs XMLWriter

Post by CuteAlien »

Well... worse is always relative and depends on what you do. This is about the shortest format possible without changing our serialization interface, which has some value when it comes to game programming. Irrlicht itself could already parse this with minor changes, because inside Irrlicht we always mention the types on reading and writing (we do things like writeInteger, readInteger). So it would only have to be changed inside the attribute system, not even in the code using that system.
christianclavet
Posts: 1638
Joined: Mon Apr 30, 2007 3:24 am
Location: Montreal, CANADA

Re: XMLReader vs XMLWriter

Post by christianclavet »

Hi, sorry to jump into the conversation, but since the discussion is about XML writing/reading...

I would really appreciate it if the writer were improved someday to support UTF-8 writing on any platform (Linux, Windows, MacOS, etc.) without having to patch the source.

I like that the loader can at least parse these files. At first I thought accented characters were not supported. With the check (2-byte character) and a little offset it finally matches. But the values are not what I got from the UTF reference tables on the internet; I had to tweak until I found the proper offset value.
CuteAlien
Admin
Posts: 9648
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany

Re: XMLReader vs XMLWriter

Post by CuteAlien »

If the utf conversion on loading is not correct, I could use an example. Do you have one, maybe?
christianclavet
Posts: 1638
Joined: Mon Apr 30, 2007 3:24 am
Location: Montreal, CANADA

Re: XMLReader vs XMLWriter

Post by christianclavet »

Hi, you can get my XML UTF-8 example here: http://irrrpgbuilder.svn.sourceforge.ne ... vision=302

I'm using this file to store all the language strings for IRB.

Here is a function I'm using to "offset" the data and give me back the string correctly for display in Irrlicht.
You can find it in the tools directory in the SVN repository of IRB if you need the full code.

Here is how I offset the data:

Code:

// Convert accents from loaded XML files (irrXML).
// WARNING: Tested only on Windows;
// might not work on Linux or other platforms.
core::stringw xmldata::winconvert(core::stringw str)
{
    // Characters above 255 are re-aligned by this offset to match LATIN1.
    // Reference to the table is here:
    // http://www.utf8-chartable.de/unicode-utf8-table.pl
    const u32 offset = 65216;

    core::stringw textline = L"";

    for (u32 a = 0; a < str.size(); a++)
    {
        // Get the character first
        core::stringw text = str.subString(a, 1);

        // Then check this character directly (convert to unsigned 32bit)
        const u32 base = (u32)text[0];

        if (base < 256)
        {
            // Standard characters are copied unchanged
            textline += text;
        }
        else if (base >= offset && (base - offset) < 255)
        {
            // Re-align the character from the offset
            core::stringw replace = L" ";
            replace[0] = (base - offset);
            textline += replace;
        }
        // All other characters above 255 are ignored
    }

    return textline;
}
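
An alternative to a hand-tuned offset would be to decode the UTF-8 bytes directly. This is a minimal sketch in plain standard C++ and assumes the raw file contents are available as bytes before any widening takes place. decodeUtf8 is a made-up helper, not something irrXML provides, and it does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Malformed or truncated sequences are replaced with U+FFFD.
// Overlong encodings and surrogates are not rejected in this sketch.
std::u32string decodeUtf8(const std::string& bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size())
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else
        {
            // Stray continuation byte or invalid lead byte
            out.push_back(0xFFFD);
            ++i;
            continue;
        }
        ++i;

        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k)
        {
            if (i >= bytes.size() ||
                (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80)
            {
                ok = false; // sequence ended too early
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : char32_t(0xFFFD));
    }
    return out;
}
```

The resulting code points for accented Latin characters match the values in the usual Unicode tables, so no platform-specific offset is needed.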
blissed
Posts: 5
Joined: Sun Oct 10, 2004 1:04 pm

Re: XMLReader vs XMLWriter

Post by blissed »

Hey guys, I know this is a few months old, but I thought this was worth mentioning. The reason why data should NOT be stored in attributes is data types. XSD files (XML schemas) can explicitly state what type of data (integer, decimal, string, etc.) is stored in an element's value. So I believe Irrlicht should follow W3C's standards and discourage storing data in attributes. It would be nice if Irrlicht's XML reader could provide an easier way to read element values, something along the lines of a DOM (document object model) type object.
chronologicaldot
Competition winner
Posts: 685
Joined: Mon Sep 10, 2012 8:51 am

Re: XMLReader vs XMLWriter

Post by chronologicaldot »

blissed wrote: The reason why data should NOT be stored in attributes is because of data types.
... Or you could just read everything in as strings and interpret what those strings mean in some other area of your program rather than locking everything into strict data types. E.g. what if I wanted to read an integer attribute as a floating point value? I'm going to have to make the conversion somewhere, so in a sense it doesn't really matter whether I read it in as a string or as an integer. (And no, there isn't much of a time delay anyway, because the value starts as a string to begin with.)
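
To make that concrete, here is a minimal sketch in plain standard C++ of keeping every value as a string and converting only at the point of use. StringAttributes is a hypothetical class for illustration, not part of Irrlicht or XMLStorage.

```cpp
#include <map>
#include <string>

// Hypothetical container: every attribute is kept as the string it was
// read as, and converted only when the caller asks for a type.
class StringAttributes
{
public:
    void set(const std::string& name, const std::string& value)
    {
        values_[name] = value;
    }

    // Returns an empty string if the name is unknown.
    std::string getString(const std::string& name) const
    {
        const auto it = values_.find(name);
        return it == values_.end() ? std::string() : it->second;
    }

    // Both conversions throw std::invalid_argument if the stored text
    // does not start with a number (including the unknown-name case).
    int getInt(const std::string& name) const
    {
        return std::stoi(getString(name));
    }
    float getFloat(const std::string& name) const
    {
        return std::stof(getString(name));
    }

private:
    std::map<std::string, std::string> values_;
};
```

The same stored text "1000" can then be read as an int in one place and as a float in another, without the file format having to commit to either.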

Oh, and if you're looking for a DOM-tree generator for reading XML files, I wrote one called XMLStorage that you can find on my website here:
http://chronologicaldot.web44.net/proje ... xmlstorage