July 18, 2012

Thibaut Thibaut
Lab Rat
9 posts

Solved: Reading ASCII/UTF-8 file

 

Hi all,

I am using Qt to read a file generated by the store() method of the Java Properties class. This method generates an ASCII file. However, special characters are written as their UTF-8 equivalent. In the end, the ASCII file looks like this:

firstname = g\u00C9rard
lastname = normand

Now, when I read and display the data, I would like to display the symbols themselves (gérard normand). I tried using the toUtf8() method, but nothing convincing came out. So how do I read this ASCII file while handling the UTF-8 symbols?

Thanks for your help

Thibaut

7 replies

July 18, 2012

Andre Andre
Area 51 Engineer
6075 posts

The format you describe isn’t UTF-8; it is plain ASCII containing Unicode escape sequences (\uXXXX). You’ll need to parse those. AFAIK, there is no standard Qt codec that deals with these. You could create your own, though. See QTextCodec for more information on that.
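
To make the distinction concrete, here is a small sketch in standard C++ (no Qt) that compares what the file actually contains with what a UTF-8 file would contain for the same character. The helper name utf8Encode is made up for illustration, and it only covers code points up to U+FFFF, which is all a single \uXXXX escape can name.

```cpp
#include <string>

// Encode one Unicode code point (U+0000..U+FFFF only) as UTF-8.
// utf8Encode is an illustrative helper, not a Qt or standard library function.
std::string utf8Encode(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // One byte: plain ASCII.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx.
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// What the properties file holds: six plain ASCII characters (\, u, 0, 0, C, 9).
const std::string escaped = "\\u00C9";
// What a UTF-8 file would hold for the same character: the two bytes C3 89.
const std::string utf8 = utf8Encode(0x00C9);
```

The two strings differ in both length and content, which is why a UTF-8 conversion alone cannot turn the escape into the character.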

 Signature 

Looking for Qt developers to join our team @ i-Optics: https://qt-project.org/forums/viewthread/25393/

July 18, 2012

Jeroentje@home Jeroentje@ho..
Robot Herder
272 posts

Hi, \u00C9 is the escape for the Unicode code point U+00C9, the capital letter É, so getting the small letter is going to be troublesome. I think, as Andre mentioned, you need to write your own conversion class to handle this one.

 Signature 

Greetz, Jeroen

July 19, 2012

Thibaut Thibaut
Lab Rat
9 posts

Thanks for the reply. The capital letter É is what I want. I looked at http://www.fileformat.info/info/unicode/char/c9/index.htm and it says that \u00C9 is the Java/C++ source code for the capital É character. So I was thinking there might be a way around it.

July 19, 2012

Thibaut Thibaut
Lab Rat
9 posts

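Finally found what I was looking for.
I need to use the following routine, which replaces each escape with the character it names:

  // Match a literal backslash, 'u', and exactly four hex digits.
  QRegExp rx("(\\\\u[0-9a-fA-F]{4})");
  int pos = 0;
  while ((pos = rx.indexIn(str, pos)) != -1) {
      // Replace the 6-character escape with the single UTF-16 code unit it encodes.
      str.replace(pos++, 6, QChar(rx.cap(1).right(4).toUShort(0, 16)));
  }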

Thanks a lot for your replies
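
For readers without Qt at hand, the same idea can be sketched in standard C++. This is only an illustration under some assumptions: the function name decodeUnicodeEscapes is hypothetical, the result is a std::u16string (UTF-16 code units, the same unit a QString stores), and only \uXXXX is handled. Other escapes that a properties file may contain are copied through unchanged.

```cpp
#include <cctype>
#include <string>

// Decode \uXXXX escapes in an ASCII string into UTF-16 code units.
// decodeUnicodeEscapes is a hypothetical name used for this sketch.
// Assumption: only \uXXXX is recognised; everything else is copied as-is.
std::u16string decodeUnicodeEscapes(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        // An escape needs a backslash, 'u', and four hex digits (6 chars total).
        bool isEscape = in[i] == '\\' && i + 5 < in.size() && in[i + 1] == 'u';
        for (std::size_t k = 2; isEscape && k < 6; ++k) {
            isEscape = std::isxdigit(static_cast<unsigned char>(in[i + k])) != 0;
        }
        if (isEscape) {
            // Parse the four hex digits and emit them as one code unit.
            out += static_cast<char16_t>(std::stoul(in.substr(i + 2, 4), nullptr, 16));
            i += 6;
        } else {
            // Plain ASCII character: widen it unchanged.
            out += static_cast<char16_t>(static_cast<unsigned char>(in[i]));
            ++i;
        }
    }
    return out;
}
```

Because each escape becomes one UTF-16 code unit, a surrogate pair written as two consecutive escapes comes out as a valid pair. Like the regular-expression routine, this sketch does not treat an escaped backslash in front of a `u` specially.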

July 19, 2012

koahnig koahnig
Mad Scientist
2193 posts

Did you check out fromUnicode? [qt-project.org]
At least from the name and description it seems to be fitting. However, Andre may have more experience with this.

July 19, 2012

Andre Andre
Area 51 Engineer
6075 posts
koahnig wrote:
Did you check out fromUnicode? [qt-project.org] At least from the name and description it seems to be fitting. However, Andre may have more experience with this.

The problem is that the encoding used is not a codec in the normal sense. It is a Unicode escape sequence in an otherwise ASCII-encoded file. So to use this method, you first have to actually implement a codec that does that translation back and forth. If you have to work with these files, it is probably a good idea to implement such a codec. Doesn’t seem all that hard to me…

Edit: though, I admit, I did not try doing it myself, so it might be harder than it seems by just looking at the docs…
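
As a rough idea of the "forth" direction such a codec would need, here is a sketch in standard C++ (not an actual QTextCodec subclass). The name encodeUnicodeEscapes is hypothetical, and the rule that code units in the range 0x20..0x7E pass through unchanged is an assumption of this sketch; a real properties writer also escapes some ASCII characters, such as the backslash.

```cpp
#include <string>

// Write UTF-16 code units as ASCII, turning everything outside the
// printable ASCII range into a \uXXXX escape.
// encodeUnicodeEscapes is a hypothetical name used for this sketch.
// Assumption: 0x20..0x7E is passed through without further escaping.
std::string encodeUnicodeEscapes(const std::u16string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char16_t unit : in) {
        if (unit >= 0x20 && unit <= 0x7E) {
            out += static_cast<char>(unit);
        } else {
            // Emit the code unit as four uppercase hex digits.
            out += "\\u";
            out += hex[(unit >> 12) & 0xF];
            out += hex[(unit >> 8) & 0xF];
            out += hex[(unit >> 4) & 0xF];
            out += hex[unit & 0xF];
        }
    }
    return out;
}
```

A full codec would pair this with a decoder for the opposite direction and would have to agree with the writer on which ASCII characters get escaped.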


July 19, 2012

koahnig koahnig
Mad Scientist
2193 posts
Andre, thanks for the clarification.
