XMLReader vs XMLWriter

clarks · Post by **clarks** » Fri Aug 17, 2012 9:38 pm

I believe the XMLWriter interface should be changed a bit. Its interface is vastly different from how XMLReader is implemented. For one, XMLReader is templated, allowing both wchar_t and char, and it makes it easy to traverse an XML document, find elements and get attribute values. XMLWriter, on the other hand, only deals with wchar_t, so the conversion is a killer if you're used to using irr::core::stringc for strings. Another thing is that XMLReader allows reading attributes just by name. Shouldn't XMLWriter allow writing attributes just by name? These are just my thoughts, please comment.

CuteAlien · Post by **CuteAlien** » Sat Aug 18, 2012 10:20 am

Yes, supporting more formats on writing is certainly a feature that would be nice, especially as the current version writes different files on Linux (4-byte characters) and Windows (2-byte characters). There is a patch from Nalin somewhere that uses his utf16 string class to fix that. I hope we can decide on a solution in Irrlicht 1.9.

I don't know what you mean by writing attributes by name. Can you give an example of how that interface would look?

What I miss a lot is that the writer cannot write CDATA at all. It also replaces special characters in comments, which is probably wrong too. I'm even considering fixing the CDATA part before 1.8 when I find time for it, as the writer is simply not usable right now when you need CDATA, and it's probably easy to fix.

Also annoying is that the reader mixes up names and data, using the same variable for both, which often makes coding way more complicated than necessary. I just don't dare change that before 1.8, as we already have a feature freeze and it's not such a trivial change (a patch for it has to make sure the name is still available in the close tag, and has to minimize breaking existing code). But it's also something I hope to change in Irrlicht 1.9.
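To make the CDATA and special-character points concrete, here is a minimal sketch of the two ways a writer can emit text. The helper names escapeText and wrapCData are made up for illustration and are not part of Irrlicht's writer; the sketch also assumes plain byte strings rather than wchar_t.

```cpp
#include <string>

// Hypothetical helper (not Irrlicht API): escape the characters that must
// not appear literally in XML character data or attribute values.
std::string escapeText(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (char c : in)
    {
        switch (c)
        {
        case '&':  out += "&amp;";  break;
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '"':  out += "&quot;"; break;
        case '\'': out += "&apos;"; break;
        default:   out += c;        break;
        }
    }
    return out;
}

// Hypothetical helper: wrap raw text in a CDATA section without escaping it.
// The only sequence that cannot appear inside CDATA is "]]>", so it is split
// across two adjacent sections.
std::string wrapCData(const std::string& in)
{
    std::string out = "<![CDATA[";
    std::string::size_type pos = 0;
    for (;;)
    {
        const std::string::size_type hit = in.find("]]>", pos);
        if (hit == std::string::npos)
            break;
        // Emit up to and including "]]", close the section, reopen before ">".
        out.append(in, pos, hit - pos + 2);
        out += "]]><![CDATA[";
        pos = hit + 2;
    }
    out.append(in, pos, std::string::npos);
    out += "]]>";
    return out;
}
```

Comments are a third case: XML does not expand entity references inside them, so a writer would normally emit comment text as-is and only guard against "--".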

clarks · Post by **clarks** » Sat Aug 18, 2012 5:12 pm

The one I'm thinking about is like a fixed pipeline, and probably templated:

Code: Select all

 
template< typename char_type >
class IIrrXMLWriter: public whatever
{
public:
 virtual void writeHeader() = 0;
 virtual const char_type* writeElement( const char_type* name, bool empty, bool holdForAttributes ) = 0;
 virtual const char_type* writeAttribute( const char_type* name, const char_type* value ) = 0;
 virtual const char_type* writeText( const char_type* text ) = 0;
 virtual const char_type* closeElement() = 0;
};
 
int main()
{
 IIrrXMLWriter<char>* xml = /* create xml writer */;
 
 // write the header; the function writeXMLHeader probably doesn't need XML in its name
 xml->writeHeader();
 xml->writeElement("letter",true,true); // it's empty, but don't close because we have to write attributes
 xml->writeAttribute("name","Joe");
 xml->writeAttribute("address","no where");
 xml->closeElement(); // no need to supply the name, it internally figures it out, possibly using a stack
 
 // what if nested elements need to be written
 xml->writeElement("letter",false,true); // it's not empty, but don't close because we have to write attributes
 xml->writeAttribute("name","Joe");
 xml->writeAttribute("address","no where");
  xml->writeElement("body",false,false); // automatically writes the > character that closes the opening tag of letter
  xml->writeText("Hello, how are you. I am trying to improve Irrlicht.  It's a great engine. You should check it out. Love Always, Irrlicht");
  xml->closeElement(); // close body
 xml->closeElement(); // close letter
 
 return 0;
}
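The stack idea in the proposal above can be sketched in a few lines. This is an illustration only: SketchXmlWriter is a hypothetical class, not Irrlicht's IXMLWriter, and it writes into a std::string instead of a file. It also drops the two bool parameters, on the assumption that the writer can track for itself whether the start tag is still open.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch (not Irrlicht API): open element names are kept on a
// stack, so closeElement() needs no name argument.
class SketchXmlWriter
{
public:
    void writeElement(const std::string& name)
    {
        finishStartTag();            // closes the parent's "<parent ..." if needed
        Out += "<" + name;
        Open.push_back(name);
        StartTagOpen = true;
    }

    void writeAttribute(const std::string& name, const std::string& value)
    {
        assert(StartTagOpen);        // attributes are only legal inside a start tag
        Out += " " + name + "=\"" + escape(value) + "\"";
    }

    void writeText(const std::string& text)
    {
        assert(!Open.empty());
        finishStartTag();
        Out += escape(text);
    }

    void closeElement()
    {
        assert(!Open.empty());
        if (StartTagOpen)
        {
            Out += " />";            // no content was written: empty element
            StartTagOpen = false;
        }
        else
        {
            Out += "</" + Open.back() + ">";
        }
        Open.pop_back();
    }

    const std::string& str() const { return Out; }

private:
    void finishStartTag()
    {
        if (StartTagOpen)
        {
            Out += ">";
            StartTagOpen = false;
        }
    }

    static std::string escape(const std::string& in)
    {
        std::string out;
        for (char c : in)
        {
            if (c == '&')       out += "&amp;";
            else if (c == '<')  out += "&lt;";
            else if (c == '>')  out += "&gt;";
            else if (c == '"')  out += "&quot;";
            else                out += c;
        }
        return out;
    }

    std::string Out;
    std::vector<std::string> Open;   // names of elements not yet closed
    bool StartTagOpen = false;       // true while "<name ..." lacks its ">"
};
```

With this design, calling writeElement("letter"), two writeAttribute calls and closeElement() yields an empty element, while writing a child or text first makes closeElement() emit a regular end tag.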

An interesting thing is that the original interface has two functions for writing elements. One takes an array of names and values using wchar_t, while the other takes a lot of parameters. Then there is the IAttributes class, which is very flexible, but there is some inconsistency. I believe these two interfaces should write attributes the same way. However, IAttributes writes attributes as elements, while the XML writer writes real attributes:

<body what="this is how attributes should be written"/>

I was reading an article about XML at http://www.w3schools.com/ where the author rejected the idea of using attributes. Personally I am not a big fan of XML, but Irrlicht is great, so if Irrlicht provides it, I use it. Now take a look at this XML doc:

Code: Select all

 
<?xml version="1.0"?>
    <body  
        mass = "1000.0"
        width = "1.83"
        height = "0.963"
        depth = "4.470"
        model = "chassis.obj"
        vshader = "vert.glsl"
        fshader = "frag.glsl"
        wind_sound = "sound/windy.wav"
    />

vs.

Code: Select all

 
<?xml version="1.0"?>
 
    <body>  
        <mass>1000.0</mass>
        <width>1.83</width>
        <height>0.963</height>
        <depth>4.470</depth>
        <model>chassis.obj</model>
        <vshader>vert.glsl</vshader>
        <fshader>frag.glsl</fshader>
        <wind_sound>sound/windy.wav</wind_sound>
    </body>

The first doc is easy to read and pretty straightforward. Now imagine these attributes written as elements (the second doc): it would be overkill. Some people would actually prefer that; why on earth they would, I don't know. But I am a big fan of keeping things simple and clean. If the XML writer can take an array of names and values, it should probably be able to return one too. IAttributes is powerful, but its attributes are really elements. Irrlicht gets better every day, and as it continues to mature I think we should keep it clean.

CuteAlien · Post by **CuteAlien** » Sat Aug 18, 2012 7:51 pm

Writing templated: maybe. I think reading works like this, although it is maybe more about which parameters to pass in. Then again, there's just a fixed number of possibilities that really make sense anyway (utf-8, utf-16, utf-32), so using a parameter on writing would not be so bad either, and it would be obvious what is supported. As a user I would probably prefer that.

The two functions for writing probably have a simple reason. One is there for laziness and covers the usual case. The other is there for when you need more than 5 parameters.

The attributes interface is about serialization and the result looks rather like this:

Code: Select all

 
<int name="VariableNameInt" value="152722522" />
<float name="VariableNameFloat" value="1.000000" />
<string name="VariableNameString" value="one" />

So each attribute saves name, value and type. Whether that last one is needed depends a little on what you do; in many cases it wouldn't be necessary. But I think it's used, for example, in irrEdit to support editing parameters without irrEdit having to know _which_ parameters an interface supports. So when we add a new variable to a scene node, for example, we just have to serialize it and the editor automatically supports it.
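As a rough illustration of the name/value/type layout described above, the following sketch produces lines in the same shape as the example output. It is not Irrlicht's IAttributes implementation; AttrValue and serializeAttribute are hypothetical names, the sketch needs C++17 for std::variant, and it skips escaping of the name and value for brevity.

```cpp
#include <cstdio>
#include <string>
#include <variant>

// Hypothetical value type covering the three kinds shown in the example.
using AttrValue = std::variant<int, float, std::string>;

// Writes one attribute as an element whose tag is the type name, so a
// reader (or an editor) can recover the type without prior knowledge.
std::string serializeAttribute(const std::string& name, const AttrValue& value)
{
    std::string type;
    std::string text;

    if (const int* i = std::get_if<int>(&value))
    {
        type = "int";
        text = std::to_string(*i);
    }
    else if (const float* f = std::get_if<float>(&value))
    {
        type = "float";
        char buf[64];
        // "%f" gives six decimals, matching the sample output above.
        std::snprintf(buf, sizeof(buf), "%f", static_cast<double>(*f));
        text = buf;
    }
    else
    {
        type = "string";
        text = std::get<std::string>(value);
    }

    return "<" + type + " name=\"" + name + "\" value=\"" + text + "\" />";
}
```

Because the tag carries the type, a generic tool can pick an input widget per attribute, which is the irrEdit use case mentioned above.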

clarks · Post by **clarks** » Sat Aug 18, 2012 11:57 pm

CuteAlien wrote: The attributes interface is about serialization

I know that and I think it's brilliant.

CuteAlien wrote: So each attribute saves name, value and type. If that last one is needed depends a little bit on what you do - actually in many cases it wouldn't be necessary

My point exactly. I just think the type is a bit overkill because attributes are written as elements, which of course allows us to know the type. But changing it would break compatibility, and that's not good.

CuteAlien wrote: But I think it's used for example in irrEdit to allow support for editing parameters without irrEdit having to know _which_ parameters an interface supports.

OK, you made your point: irrEdit would automatically be able to create a widget for the attribute based on the type, like creating a check box for booleans, and only allowing numbers to be entered in an edit box for ints and floats. That's brilliant!!!

What if we could control how IAttributes are read and written? How about specifying whether they should be written as attribute elements or as real XML attributes, and likewise whether they should be read as attribute elements or as real XML attributes? I am thinking in terms of making it a bit more general purpose, but I could be pushing this too far.

CuteAlien · Post by **CuteAlien** » Sun Aug 19, 2012 10:51 am

clarks wrote: What if we could control how IAttributes are read and written. How about specifying whether or not it should write as an attribute element or as real xml attributes. A same way for reading them as attribute elements or as real xml attributes. I am thinking in terms of making it a bit more general purpose, but I could be pushing this too far.

I guess we could sometimes have smaller XML files then. Without the type, the format could probably look like this:

Code: Select all

 
<attributes>
VariableNameInt="152722522"
VariableNameFloat ="1.000000" 
VariableNameString = "one"
</attributes>

Or, more likely, we would need another label for the attributes block to distinguish it from the other style (so we know about it on reading). The question is whether there are really use cases where that is needed. So maybe a nice-to-have feature, but not exactly a high priority for now. The other changes, like writing utf-8, writing CDATA and a cleaner XML interface on reading, just seem more important for now.

hybrid · Post by **hybrid** » Sun Aug 19, 2012 11:39 am

Oh no, not such an unparsable format. The really proper way would be to put everything into nodes. Due to some legacy considerations inherent to most programmers, the first idea is always to put things into attributes. But that's not a good idea: you lose too much flexibility and the gain is negligible. Of course these things cannot be changed anymore once the format is used that way. But we shouldn't make it worse than it is.

CuteAlien · Post by **CuteAlien** » Sun Aug 19, 2012 12:11 pm

Well... worse is always relative and depends on what you do. This is about the shortest format that could be done without changing our serialization interface. Which has some value when it's about game-programming. Irrlicht itself could parse this already with minor changes because inside Irrlicht we mention the types always on reading and writing (we do things like writeInteger, readInteger). So it would just have to be changed inside the attribute system - not even in the code using that system.

christianclavet · Post by **christianclavet** » Wed Aug 22, 2012 11:39 pm

Hi, Sorry to get into the conversation... But since the discussion is about XML Writing/Reading...

I would really appreciate it if the writer were improved someday to support UTF-8 writing on any platform (Linux, Windows, MacOS, etc.) without having to patch the source.

I like that the loader can at least parse these files. At first I thought accented characters were not supported. With the check (2-byte characters) and a little offset it finally matches. But the values are not what I got from the UTF reference tables on the internet; I had to tweak until I found the proper offset value.
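For the platform-independent UTF-8 output requested above, the encoding step itself is small. The sketch below is not taken from Irrlicht; appendUtf8 and encodeUtf8 are hypothetical names, and the input is assumed to be Unicode code points already (with 2-byte wchar_t, surrogate pairs would have to be combined first).

```cpp
#include <string>

// Hypothetical helper: append one Unicode code point as UTF-8 bytes.
void appendUtf8(std::string& out, char32_t cp)
{
    // Surrogates and values beyond U+10FFFF are not valid code points;
    // substitute the replacement character U+FFFD.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        cp = 0xFFFD;

    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: encode a whole string of code points.
std::string encodeUtf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text)
        appendUtf8(out, cp);
    return out;
}
```

The byte sequence produced this way does not depend on the size of wchar_t, which is what makes the resulting files identical across platforms.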

CuteAlien · Post by **CuteAlien** » Thu Aug 23, 2012 7:54 am

If the UTF conversion on loading is not correct, I could use an example. Do you have one, maybe?

christianclavet · Post by **christianclavet** » Sat Aug 25, 2012 8:50 pm

Hi, you can get my XML UTF-8 example here: http://irrrpgbuilder.svn.sourceforge.ne ... vision=302

I'm using this file to store all the language strings for IRB.

Here is the function I'm using to "offset" the data and give me back a string that displays correctly in Irrlicht.
You can find it in the tools directory of the IRB SVN repository if you need the full code:

Code: Select all

core::stringw xmldata::winconvert(core::stringw str)
// Convert accents from loaded XML files (irrXML)
// WARNING: Tested only on Windows;
// might not work on Linux or other platforms.
{
    core::stringw textline = L"";
    core::stringw text = L"";
    u32 base = 0;

    // Offset used to re-align the higher characters with LATIN1.
    // Reference table:
    // http://www.utf8-chartable.de/unicode-utf8-table.pl
    const u32 offset = 65216;

    for (u32 a=0; a<str.size(); a++)
    {
        // Get the character first
        text = str.subString(a,1);

        // Then check this character directly (convert to unsigned 32-bit)
        base = (u32)text[0];

        if (base<256) // Standard characters
        {
            textline += text;
        }

        // Characters of 256 and above are ignored, except those that land
        // below 255 once the offset is subtracted; these are re-aligned
        // to match LATIN1.
        core::stringw replace = L" ";
        if ((base>255) && ((base-offset)<255))
        {
            replace[0] = (base-offset);
            textline += replace;
        }
    }

    return textline;
}
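For comparison with the offset approach above, this is what a conventional UTF-8 decoder looks like. It is a sketch under assumptions, not irrXML's loader: decodeUtf8 is a hypothetical name, the input is assumed to be the raw file bytes, and overlong encodings are not rejected.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Malformed sequences produce U+FFFD (the replacement character).
std::u32string decodeUtf8(const std::string& bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size())
    {
        // Read bytes as unsigned so values >= 0x80 are not sign-extended.
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;       // continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else
        {
            // Stray continuation byte or invalid lead byte.
            out += char32_t(0xFFFD);
            ++i;
            continue;
        }

        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k)
        {
            if (i >= bytes.size() ||
                (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80)
            {
                ok = false;          // truncated or malformed sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
            ++i;
        }
        out += ok ? cp : char32_t(0xFFFD);
    }
    return out;
}
```

Decoded this way, an accented Latin-1 letter such as é arrives as its code point 0xE9 directly, so no platform-specific offset is needed.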

blissed · Post by **blissed** » Mon Feb 04, 2013 3:48 am

Hey guys, I know this is a few months old but I thought this was worth mentioning. The reason why data should NOT be stored in attributes is because of data types. XSD files (XML schemas) can explicitly state what type of data (integer, decimal, string, etc.) is stored in an element's value. So I believe Irrlicht should follow W3C's standards in discouraging data from being stored in attributes. It would be nice if Irrlicht's XML reader could provide an easier way to read element values -- something along the lines of a DOM (document object model) type object.

chronologicaldot · Post by **chronologicaldot** » Tue Feb 05, 2013 7:40 pm

blissed wrote: The reason why data should NOT be stored in attributes is because of data types.

... Or you could just read everything in as strings and interpret what those strings mean in some other area of your program rather than locking everything in to strict data types. e.g. What if I wanted to read an integer attribute as a floating point value? I'm going to have to make the conversion somewhere, so in a sense it doesn't really matter if I read it in as a string or as an integer. (And no, there isn't much of a time delay anyways because the value starts as a string to begin with.)
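The "read as strings, interpret later" approach can be sketched as follows. The helpers asInt and asFloat are hypothetical and not part of Irrlicht or XMLStorage; they use only the C++17 standard library and report failure through std::optional instead of guessing.

```cpp
#include <cerrno>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: interpret a stored string as an integer.
// Returns no value if the whole string is not a valid integer.
std::optional<long> asInt(const std::string& s)
{
    if (s.empty())
        return std::nullopt;
    errno = 0;
    char* end = nullptr;
    const long v = std::strtol(s.c_str(), &end, 10);
    if (errno != 0 || end != s.c_str() + s.size())
        return std::nullopt;
    return v;
}

// Hypothetical helper: interpret the same kind of string as a float.
// An integer-looking string such as "42" is accepted here as well.
std::optional<double> asFloat(const std::string& s)
{
    if (s.empty())
        return std::nullopt;
    errno = 0;
    char* end = nullptr;
    const double v = std::strtod(s.c_str(), &end);
    if (errno != 0 || end != s.c_str() + s.size())
        return std::nullopt;
    return v;
}
```

The same stored string "42" can then be read as an integer in one place and as a floating point value in another, which is the flexibility argued for above.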

Oh, and if you're looking for a DOM-tree generator for reading XML files, I wrote one called XMLStorage that you can find on my website here:
http://chronologicaldot.web44.net/proje ... xmlstorage
