irrXML doesn't remove whitespace. Why?

Menuki
Posts: 6
Joined: Sat Apr 02, 2005 6:43 pm
Location: France


Post by Menuki »

(I don't know where to put this message. So...)

Hello

To make my XML files more readable, I indent them like this:

Code:

<?xml version="1.0" ?>
<modele_inverse>
  <config test="ceci est un test">
    <origine>(51,17,21),(3,23,54)</origine>
    <capteur id="CPT1s">description CPT1</capteur>
    <capteur id="CPT2">description CPT2</capteur>
  </config>
</modele_inverse>
My (little) problem is that irrXML reports an EXN_TEXT node between every pair of elements, even when the text between them is only whitespace.
Looking at the source code, whitespace-only text is suppressed only when it is shorter than 3 characters; longer whitespace-only text is reported.

Why only 3 characters? Why not suppress any text composed entirely of whitespace?
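To make the difference concrete, here is a small standalone sketch of both rules. It uses `std::string` rather than irrXML's own types, the function names are hypothetical, and the current rule is reconstructed from the description above (whitespace-only text shorter than 3 characters is dropped):

```cpp
#include <cassert>
#include <string>

// XML 1.0 whitespace: space, tab, carriage return and line feed.
bool isXmlWhitespace(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// True if the text is empty or contains nothing but whitespace.
bool isWhitespaceOnly(const std::string& text)
{
    for (char c : text)
        if (!isXmlWhitespace(c))
            return false;
    return true;
}

// Current behaviour as described above: whitespace-only text is dropped
// only when it is shorter than 3 characters.
bool reportedByCurrentRule(const std::string& text)
{
    return text.size() >= 3 || !isWhitespaceOnly(text);
}

// Proposed behaviour: whitespace-only text is never reported.
bool reportedByProposedRule(const std::string& text)
{
    return !isWhitespaceOnly(text);
}
```

With LF line endings and the indentation shown above, the text between `<modele_inverse>` and `<config ...>` is a newline plus two spaces, exactly 3 characters, so the current rule still reports it while the proposed rule drops it.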

While staying compliant with the XML rules, we could strip the whitespace at the beginning and end of the text and replace each run of whitespace in the middle with a single space.
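As a rough illustration of that rule (not irrXML code: `normalizeWhitespace` is a hypothetical helper working on `std::string`, and whitespace is taken to be the four XML whitespace characters):

```cpp
#include <cassert>
#include <string>

// XML 1.0 whitespace: space, tab, carriage return and line feed.
bool isXmlWhitespace(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Strips leading and trailing whitespace and replaces every inner run of
// whitespace (even a single tab or newline) with exactly one space.
std::string normalizeWhitespace(const std::string& text)
{
    std::string result;
    bool pendingSpace = false;
    for (char c : text)
    {
        if (isXmlWhitespace(c))
        {
            // remember the run, but only emit it once real text follows;
            // leading whitespace is ignored because result is still empty
            pendingSpace = !result.empty();
        }
        else
        {
            if (pendingSpace)
                result += ' ';
            pendingSpace = false;
            result += c;
        }
    }
    // a trailing run leaves pendingSpace set but is never emitted
    return result;
}
```

An empty result means the text was whitespace-only, so the EXN_TEXT node could be suppressed entirely.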

What is your opinion?
elander
Posts: 193
Joined: Tue Oct 05, 2004 11:37 am

Post by elander »

I think these decisions should be made into a configurable option. Perhaps you could edit the irrXML source, modify the API, and then propose your own configuration options.
Menuki

Post by Menuki »

Thanks.

I have already made the modifications.

Behind these remarks, my questions are:
- Does anybody else have the same problem?
- Is anybody interested in the modifications?
- Should these modifications be included in the next release?

Your idea (a configurable option) is very good.
Menuki

Post by Menuki »

Here are the modifications I made to the CXMLReaderImpl::setText() method:

Code:

	bool setText(char_type* start, char_type* end)
	{
		// we always do the check at the beginning, but without the 3-character limit,
		// so if the text is composed only of whitespace, it is squeezed
		// if we add a configuration flag, we can have something like this
#ifdef WITH_WHITESPACE_COMPRESSION
		if (m_bWithWhitespaceCompression)
#endif
		{
			// this part of code is unchanged
			char_type* p = start;
			for(; p != end; ++p)
				if (!isWhiteSpace(*p))
					break;

			if (p == end)
				return false;

			// here, we know there is at least one non-whitespace character
			// the string begins at its position
			start=p;

			// now, we can suppress those at the end of the string

			for (p=end-1;p!=start;--p)
				{
				if (!isWhiteSpace(*p))
					break;
				}
			// normally, it's impossible that p==start here, but it's good to check that...
			if (p == start)
				return false;
			end=p+1;
			// end of modification
		}

		// ... the rest of setText() is unchanged ...
	}
I added a piece of code that tests a whitespace-compression configuration flag. Note that I removed the 3-character length test. I think it is simpler for users if the parser has a binary behavior:
- either you want full whitespace compression,
- or you want no whitespace compression at all.
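Here is a minimal sketch of that binary behavior outside irrXML. It assumes a hypothetical `TextOptions` struct with a single `compressWhitespace` flag (irrXML itself has no such option) and uses `std::string` in place of irrXML's string type:

```cpp
#include <cassert>
#include <string>

// Hypothetical option struct; not part of irrXML.
struct TextOptions
{
    bool compressWhitespace;
};

// Returns false if the text should not be reported as EXN_TEXT.
// Compression on: whitespace-only text is dropped, other text is trimmed.
// Compression off: the text is reported exactly as it appears.
bool filterText(const std::string& raw, const TextOptions& opts, std::string& out)
{
    if (!opts.compressWhitespace)
    {
        out = raw;
        return true;
    }

    const char* ws = " \t\r\n";
    const std::string::size_type first = raw.find_first_not_of(ws);
    if (first == std::string::npos)
        return false; // empty or whitespace-only

    const std::string::size_type last = raw.find_last_not_of(ws);
    out = raw.substr(first, last - first + 1);
    return true;
}
```

Returning false here plays the role of setText() returning false, i.e. no EXN_TEXT node is reported.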

For the moment, I have not implemented whitespace compression inside the text (I don't have that problem yet :oops: ).
The code should look something like this (totally untested):

Code:

	core::string<char_type> compressMultipleWhitespace(
		core::string<char_type>& origstr)
	{
		const int total_len = origstr.size();

		// first find the first multiple whitespace occurrence
		int pos = 0;
		for (; pos < total_len-1; ++pos)
		{
			if (isWhiteSpace(origstr[pos]) &&
				isWhiteSpace(origstr[pos+1]))
				break;
		}

		// if we didn't find any, it's done
		// (">=" also covers empty and one-character strings)
		if (pos >= total_len-1)
			return origstr;

		core::string<char_type> newstr;
		int oldPos = 0;
		for (;;)
		{
			// append the block before the whitespace run
			// (it may still contain single whitespace characters)
			if (pos > oldPos)
				newstr.append(origstr.subString(oldPos, pos-oldPos));

			// find the next non-whitespace character;
			// origstr[pos] and origstr[pos+1] are whitespace, so begin at pos+2
			int next_no_whitespace = pos+2;
			while (next_no_whitespace < total_len &&
				   isWhiteSpace(origstr[next_no_whitespace]))
				++next_no_whitespace;

			// replace the run with one space, unless it ends the string
			if (next_no_whitespace < total_len)
				newstr.append((char_type)' ');

			// search again for the next multiple whitespace occurrence
			oldPos = pos = next_no_whitespace;
			while (pos < total_len-1 &&
				   !(isWhiteSpace(origstr[pos]) && isWhiteSpace(origstr[pos+1])))
				++pos;

			if (pos >= total_len-1)
			{
				// copy the end of the string, if anything is left
				if (oldPos < total_len)
					newstr.append(origstr.subString(oldPos, total_len-oldPos));
				break;
			}
		}
		return newstr;
	}
Voilà. I hope this helps...
Menuki

Post by Menuki »

There is a bug in the code I posted (the setText() method).

Replace

Code:

for (p=end-1;p!=start;--p)
with

Code:

for (p=end-1;p!=start-1;--p)
And replace

Code:

if (p == start)
with

Code:

if (p == start-1)
For a one-character text (more generally, any text whose only non-whitespace character is its first one), the bug caused the text to be squeezed.
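For reference, here is a standalone sketch of the corrected trimming logic on a plain `const char*` range. It is not the irrXML code: `isWhiteSpace` is a stand-in using the four XML whitespace characters, and `trimRange` is a hypothetical name. Instead of comparing against `start-1`, it relies on `*start` being non-whitespace, so the backward loop stops at `start` at the latest and never computes `start-1`:

```cpp
#include <cassert>

// Stand-in for irrXML's whitespace test (space, tab, CR, LF).
bool isWhiteSpace(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Narrows [start, end) to its non-whitespace core, in the spirit of the
// patched setText(). Returns false if the range holds only whitespace.
bool trimRange(const char*& start, const char*& end)
{
    const char* p = start;
    while (p != end && isWhiteSpace(*p))
        ++p;
    if (p == end)
        return false;
    start = p; // *start is known to be non-whitespace

    // Walk back from the last character. Because *start is not whitespace,
    // the loop stops at start at the latest, so a one-character text is kept.
    p = end - 1;
    while (p != start && isWhiteSpace(*p))
        --p;
    end = p + 1;
    return true;
}
```

Both `"a"` and `"  a  "` keep their single character, which is exactly the case the original loop got wrong.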