June 15, 2011

ionwind ionwind
Lab Rat
11 posts

Special characters in XML files created with QDomDocument

Page  
1

I’m using Qt to write a program that creates a XML files that can be opened in Excel. So far I have been progressing rather well but now I have this problem that I need to put newline characters to some of the cells containing some text. Excel XML sheets use 
 as the newline character.

So the problem is that with this:

  1. QString contents = "Foo&amp#10;Bar";
  2. QDomText cellContents = doc->createTextNode(contents);
  3. data.appendChild(cellContents);

I get this:
  1. <Cell><Data ss:Type="String">Foo&ampamp;#10;Bar</Data></Cell>

While this is what I need:
  1. <Cell><Data ss:Type="String">Foo&#10;Bar</Data></Cell>

I tried this:

  1. doc->implementation().setInvalidDataPolicy(QDomImplementation::AcceptInvalidChars);

But either that doesn’t help here or then there’s something wrong on how I’m doing this. Anyone cares to help?

20 replies

June 15, 2011

zester zester
Lab Rat
89 posts

Characters like “<” and “&” are illegal in XML elements.

Entity References <—- See this section
http://www.w3schools.com/xml/xml_syntax.asp

also see

http://www.w3schools.com/xml/xml_cdata.asp

June 15, 2011

ionwind ionwind
Lab Rat
11 posts

Yeah, I know the basics of xml and I know they’re illegal :D
I was just wondering if it is possible to anyway break the rules or to create some kind of a workaround. At least Microsoft breaks the rules: If you create a spreadsheet document in Excel and save it as .xml and you have there a cell with newline it’ll appear as in the xml document.

June 15, 2011

zester zester
Lab Rat
89 posts

The out put that your getting that you say you don’t want use that as your string.

June 15, 2011

ionwind ionwind
Lab Rat
11 posts

If I remember correctly I tried that before, then the output was something like

  1. <Cell><Data ss:Type="String">Foo&ampamp;&ampamp;#10;Bar</Data></Cell>

But I’ll try I tomorrow just to make sure…

June 15, 2011

zester zester
Lab Rat
89 posts

Hmmmm I am not sure how it would be done on Windows.

But from reading I see that…

In Windows applications, a new line is normally stored as a pair of characters: carriage return (CR) and line feed (LF). In Unix applications, a new line is normally stored as an LF character. Macintosh applications also use an LF to store a new line.

XML stores a new line as LF.

June 15, 2011

Volker Volker
Ant Farmer
5428 posts

This should do the trick:

  1. QString contents = "Foo\nBar";
  2. QDomText cellContents = doc->createTextNode(contents);
  3. data.appendChild(cellContents);

Wether the newline is put as an numeric entity (&#10;) or as a literal character is irrelevant. XML wise, both are equivalent.

[EDIT:] PS:
createTextNode() manipulates the input text in such a way, that when the document is parsed again the then retrieved string is exactly the same as your original one. So, every single ampersand (&) is transformed into a &amp;. The same holds for “<” and “>”, and in some cases the single and double quotation marks (’ and “).

June 16, 2011

zester zester
Lab Rat
89 posts

I’m not sure that will work volker but then again I could be wrong.

What the OP was getting at was that excel needs to uses special characters that are illegal in xml and the parser is converting the characters in such away to make them legal, which end’s up not being the proper format for excel.

In my previous post I was back tracking trying to figure out why that maybe.

Excel supports XML-based files, but the file must conform to some rules (excel XML sheet XSD schema).
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats#Excel_XML_Spreadsheet_example

Not sure if that will help ;)

June 16, 2011

Chris Kawa Chris Kawa
Robot Herder
784 posts

Why not just put your text inside CDATA section i.e. instead of QDomText use QDomCDATASection?
&, < and newlines are all valid text inside that markup.

June 16, 2011

ionwind ionwind
Lab Rat
11 posts

Thanks for all the suggestions, but unfortunately none of them seems to work so far :(

zester’s got the right idea there what I’m up to here ;)

I had tried before what Volker suggested. \n in the string does this to the output file:

  1. <Cell><Data ss:Type="String">Foo
  2. Bar</Data></Cell>

While that’s right it’s not what I’m after here, since in Excel that shows up as a space between Foo and Bar, not a newline.

I was also wondering if CDATASection suggested by Krzysztof would work. I was rather hopeful when I saw this to appear in the output file:

  1. <Cell><Data ss:Type="String"><![CDATA[Foo&amp#10;Bar]]></Data></Cell>

But unfortunately, when opening the output file in Excel, the contents of the cell were still Foo&#10;Bar instead of Foo and Bar being on separate lines.

If anyone got any more suggestions I’d be happy to try them, but I guess it’s simpler if I just arrange the output file so that I don’t need the damn newlines :P

June 16, 2011

zester zester
Lab Rat
89 posts

It’s a case of your damed if you do and damed if you don’t thank microsoft
for there non-standard ways lol ;) I will keep looking I have run into this problem
before I just can’t for the life of me remember how I solved it. Maybe when volker
get’s a free minute from the Qt Summit he can enlighten us.

June 16, 2011

jim_kaiser jim_kaiser
Lab Rat
144 posts

So, sounds like you needs ASCII char 10 in your data. Did you try this?

  1.  QString contents = QString("Foo %1 Bar").arg((char)10);

Remove the spaces around %1 .. seems code tag is eating it without the spaces..

[ Edit: No different from putting a \n ofcourse.. that doesn’t work in excel?? ]

[ Edit2: Just use a string with \n while in Qt.. in the end when you save to an excel file… convert characters to excels format. That should work no? Okay my bad.. i see what you need.. the special chars from excel in Qt.. ]

June 16, 2011

Chris Kawa Chris Kawa
Robot Herder
784 posts

When you use CDATA everything you put there will be used as is, without any parsing so no wonder it shows untransformed &#10;
Maybe you tried it already, but I think the mix of those two solutions i.e. CDATA and \n should work.

[Edit.] Heh, just a wild idea just now if that doesn’t work. You can combine CDATA and text like so:

  1.  <![CDATA[some text]]>&#10;<![CDATA[some more text]]>

June 16, 2011

Volker Volker
Ant Farmer
5428 posts

Guys! Get you some book about basic XML!

XML wise all this is equivalent:

  1. &#10; == '\n' == QChar(10)

It is completely irrelevant if you put &#10; or ‘\n’ into an XML parser! If it makes a difference, the parser is not standards compliant.

And just to prove: Make a hand crafted xlsx file an just put a newline in it. At least in OpenOffice it is displayed with a line break. I didn’t have a “genuine” Excel at hand to test.

June 16, 2011

ionwind ionwind
Lab Rat
11 posts

jim_kaiser, just to mention that \n doesn’t work in Excel. Anyway the last solution you posted would work of course. I take it you mean something like this:

  1. QFile exportFile(fileName);
  2. if(exportFile.open(QIODevice::WriteOnly))
  3. {
  4.   QTextStream TextStream(&exportFile);
  5.   QString docString = doc->toString().replace("&ampamp;", "&");
  6.   TextStream << docString;
  7.   exportFile.close();
  8. }

I could do it that way, yes. In my case the exported XML files have thousands of lines and the largest ones have “only” 30-50 thousand lines, so it’s not a problem, even though searching and replacing bits of that large string just isn’t programmatically good-looking ;P

Krzysztof, thanks for the suggestions but so far any combination of CDATA and any newline sequence doesn’t seem to do the trick :\

June 16, 2011

Chris Kawa Chris Kawa
Robot Herder
784 posts

@Volker It doesn’t make a difference if it’s in a text node, but id does if it’s in the CDATA node. &#10; is a 4-letter string there and not a newline mark.
Other than that what you said is absolutely true – that’s why I think \n (or QChar(10) which is not necessarily the same with windows 2-char newlines is it?) should work just fine inside CDATA node(used to have those & and <), but if ionwind says it doesn’t than I don’t really know..

Page  
1

  ‹‹ How to use LibXl (Library C++ for writing/reading excel files) in Qt application      I STRONGLY NEED to know slot where signal is connected to. ››

You must log in to post a reply. Not a member yet? Register here!