January 9, 2011

BlackDante BlackDante
Lab Rat
55 posts

[SOLVED] Regular Expression and national letters

 

Hi, I have another problem :)
In my application I want to catch everything between quotation marks, and I have a problem with national letters like “ą ł ó ę ś ć ż ź”… so I wrote this regular expression:

  QRegExp("\\\"[A-Za-z0-9_+=,.:;'<>-/*+() ąśćźżęłóń]+\\\"");

but it doesn’t work… or rather, it works only up to the first space and doesn’t catch the national letters…
Can anybody help?

 Signature 

sorry for my broken english :)

10 replies

January 9, 2011

peppe peppe
Ant Farmer
1026 posts

Which encoding is your source file saved in? Are you using a proper encoding for it AND a proper QString decoding method to build the string you pass to the QRegExp constructor, such as QString::fromUtf8? For instance, you can save the source file as UTF-8, or save it as ASCII and write the UTF-8 byte escapes for those characters, like “\xc4\x85” for the literal “ą”.
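
To illustrate the second option, here is a minimal Qt-free sketch in standard C++ showing that the escaped bytes “\xc4\x85” are the UTF-8 encoding of U+0105 (ą). The helper name `encodeUtf8` is made up for this example and only handles code points up to U+FFFF; in Qt the resulting bytes would still have to be decoded, for example with QString::fromUtf8 as mentioned above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Assumption: cp is at most U+FFFF and is not a surrogate.
std::string encodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // One byte: plain ASCII.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx.
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// The ASCII-only spelling of "ą" suggested above.
const std::string aOgonek = "\xc4\x85";
```

Calling `encodeUtf8(0x0105)` yields the same two bytes as `aOgonek`, which is why the escaped form survives any source-file encoding.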

 Signature 

Software Engineer
KDAB (UK) Ltd., a KDAB Group company

January 9, 2011

Volker Volker
Robot Herder
5428 posts

If you want everything between two quotation marks you can use a simpler regex:

  QRegExp re("\".+\"");
  re.setMinimal(true);

This matches if at least one character is between quotation marks.
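
The effect of minimal matching can be sketched without Qt using `std::regex`, where the lazy quantifier `+?` plays a comparable role. This is an illustration in standard C++, not QRegExp itself, and `quotedParts` is a hypothetical helper name.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every quoted substring, quotes included.
// The lazy ".+?" stops at the nearest closing quote; a greedy ".+" would
// run from the first quotation mark to the last one on the line.
std::vector<std::string> quotedParts(const std::string& text) {
    static const std::regex re("\".+?\"");
    std::vector<std::string> parts;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        parts.push_back(it->str());
    }
    return parts;
}
```

Because `.` accepts any byte except a line break here, UTF-8 encoded national letters between the quotes are matched too, without being listed in a character class.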

Also, what’s the single backslash before your quotation mark in the regex for?

  # "\\\"[A" in C/C++
  # is actually
  # \"[A
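
The point about escaping can be checked by looking at the characters the compiler actually stores for that literal. A minimal sketch in plain C++ (no Qt involved); the variable names are only for illustration.

```cpp
#include <cassert>
#include <string>

// Source text "\\\"[A" is stored as four characters: \ " [ A
// "\\" collapses to one backslash and "\"" to one quotation mark, so the
// regex engine receives an escaped quotation mark followed by "[A".
const std::string withBackslash = "\\\"[A";

// A quotation mark is not a regex metacharacter, so escaping it for the
// C++ compiler alone is enough. Source "\"[A" is stored as: " [ A
const std::string withoutBackslash = "\"[A";
```

The two strings differ only by the leading backslash that reaches the regex engine.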

January 10, 2011

BlackDante BlackDante
Lab Rat
55 posts

Thank you again, Volker, your solution is perfect :)

Volker wrote:

Also, what’s the single backslash before your quotation mark in the regex for?

  # "\\\"[A" in C/C++
  # is actually
  # \"[A

When I looked at the QRegExp examples, most of them started with “\\”, so I thought it had to be there in my case too, and it works except for national letters ;)

Peppe, I almost forgot about QString encoding, and that was the problem ;) Eh, I am still an amateur. Thanks for the answer, next time I will remember about QString encoding ;)

January 10, 2011

Andre Andre
Area 51 Engineer
6076 posts

Note that you can get around most encoding issues by using hex codes for symbols outside the standard character range. That is less readable, but probably more reliable. The problem with text files (including source files) is that they carry no information about the encoding they are in. That means trouble can arise as soon as somebody else, unaware of your encoding settings, starts editing your file.

 Signature 

Looking for Qt developers to join our team @ i-Optics: https://qt-project.org/forums/viewthread/25393/

January 10, 2011

BlackDante BlackDante
Lab Rat
55 posts

Thanks for the advice, Andre :) But if text files don’t carry information about their encoding, how can I get this information? Suppose that in my application the user can open any text file and its content is displayed in a QPlainTextEdit. Do I really have no chance to find out the encoding?

January 10, 2011

Andre Andre
Area 51 Engineer
6076 posts

Nope, there is no reliable way. You can use some complicated routines that rely on statistics or other heuristics to determine the likely encoding, but that’s not all that reliable. Just hope that UTF-8 will soon replace all other local encodings that are in use…
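
One simple heuristic of this kind, offered here as an assumption rather than a method named in this thread, is to test whether the bytes form valid UTF-8: text in a legacy single-byte encoding that contains national letters rarely passes by accident. `looksLikeUtf8` is a hypothetical name, and the check is simplified (it does not reject overlong forms or surrogates).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if every byte sequence is structurally valid
// UTF-8. A "true" result only makes UTF-8 likely, it does not prove it.
bool looksLikeUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) {
            extra = 0;                      // 0xxxxxxx: ASCII
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1;                      // 110xxxxx
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2;                      // 1110xxxx
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3;                      // 11110xxx
        } else {
            return false;                   // stray continuation or invalid lead
        }
        if (i + extra >= bytes.size()) {
            return false;                   // sequence is cut off at the end
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;               // continuation must be 10xxxxxx
            }
        }
        i += extra + 1;
    }
    return true;
}
```

For instance, the UTF-8 bytes of “ą” (0xC4 0x85) pass, while the single byte 0xB1 that ISO-8859-2 uses for the same letter does not. Pure ASCII text passes under both readings, which is one reason such a check stays a guess.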

January 10, 2011

BlackDante BlackDante
Lab Rat
55 posts

Oh, that’s not good, but thanks for the answer :)

Andre wrote:
Just hope that UTF-8 will soon replace all other local encodings that are in use…

Yes, I will pray for this :)

January 10, 2011

Volker Volker
Robot Herder
5428 posts

If we write hex codes in sources, no vendor will care about proper UTF-8 support in their products. Hex codes are not the solution, they are the source of all that evil.

If you are thorough, you can switch your entire code base to UTF-8 without problems in MS Visual Studio, Qt Creator and Xcode.

Add to your .pro file

  CODECFORTR = UTF-8
  CODECFORSRC = UTF-8

and to your main.cpp

  QTextCodec::setCodecForCStrings( QTextCodec::codecForName( "UTF-8" ) );
  QTextCodec::setCodecForTr( QTextCodec::codecForName( "UTF-8" ) );

This way you can simply tell your code editors to open files in UTF-8 mode unless stated otherwise. It works like a charm here in our team, involving different operating systems, programming languages and IDEs.

We are in the year 2011, in times of mega-supercomputing and whatnot, and I flatly refuse to type hex codes in a file to get an ‘ä’ or ‘ç’.

January 10, 2011

BlackDante BlackDante
Lab Rat
55 posts

I am very grateful for this answer :) It will be very helpful in my little project :)

January 10, 2011

ixSci ixSci
Lab Rat
203 posts

Volker wrote:
If we write hex codes in sources, no vendor will care about proper UTF-8 support in their products. Hex codes are not the solution, they are the source of all that evil.

While that is a correct statement in general and I agree with you, it does not quite apply to regexps. The regexp notation \uXXXX is a standard way to refer to a character by its exact Unicode code point. You have full control over what you are writing, so you won’t get any unexpected results if you use hex notation in regexps. No encoding issues will ever bother you. BTW, there is also \p{L} in regexps, which is enough in most cases.
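
As a sketch of the escape notation, standard C++ `std::wregex` (ECMAScript grammar) accepts \uXXXX, so the pattern source stays pure ASCII whatever the file encoding. This uses the standard library rather than QRegExp, whose escape syntax may differ, and `isQuotedPolishText` is a hypothetical name. `std::regex` has no \p{L}, so the letters are listed explicitly by code point.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the whole input is a quoted run of ASCII
// letters, spaces, and the lowercase Polish letters
// ą ć ę ł ń ó ś ź ż, written as \uXXXX escapes.
bool isQuotedPolishText(const std::wstring& text) {
    static const std::wregex re(
        L"\"[A-Za-z "
        L"\\u0105\\u0107\\u0119\\u0142\\u0144\\u00F3\\u015B\\u017A\\u017C"
        L"]+\"");
    return std::regex_match(text, re);
}
```

The input has to be decoded into wide characters first, so the encoding question moves from the pattern to the data being matched.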

 