[SOLVED] Regular Expression and national letters
Hi, I have another problem :)
In my application I want to catch everything between quotation marks, and I have a problem with national letters like “ą ł ó ę ś ć ż ź”… so I wrote this regular expression:
but it doesn’t work… or rather, it does work, but only up to the first space, and it doesn’t catch the national letters…
Can anybody help?
10 replies
Which encoding is your source file saved in? Are you using a proper encoding for it AND a proper QString decoding method, like QString::fromUtf8, for building the string you pass to the QRegExp ctor? For instance, you can save the source file as UTF-8, or save it as ASCII and write the UTF-8 byte escapes for those characters, like “\xc4\x85” for the literal “ą”.
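To make the escape in the post above concrete, here is a minimal sketch in plain C++ (no Qt), assuming the target encoding is UTF-8. The names kAOgonek and decodeTwoByteUtf8 are made up for illustration and are not part of any library. It shows that “\xc4\x85” is two bytes that decode to the code point U+0105 (“ą”).

```cpp
#include <string>

// The letter "ą" (U+0105) written as explicit UTF-8 bytes, so the result
// does not depend on the encoding the source file happens to be saved in.
const std::string kAOgonek = "\xc4\x85";

// Hypothetical helper: decode a two-byte UTF-8 sequence into its code point.
// Returns 0 if the input is not a well-formed two-byte sequence.
unsigned decodeTwoByteUtf8(const std::string &bytes)
{
    if (bytes.size() != 2)
        return 0;
    const unsigned char lead = static_cast<unsigned char>(bytes[0]);
    const unsigned char cont = static_cast<unsigned char>(bytes[1]);
    // A two-byte sequence looks like 110xxxxx 10xxxxxx.
    if ((lead & 0xE0) != 0xC0 || (cont & 0xC0) != 0x80)
        return 0;
    return ((lead & 0x1Fu) << 6) | (cont & 0x3Fu);
}
```

In Qt code, the same byte string would be handed to QString::fromUtf8, as suggested above, before building the QRegExp from it.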
Thank you again, Volker, your solution is perfect :)
Also, what’s the single backslash before your quotation mark in the regex for?
"\\\"[A" in C/C++
is actually
\"[A
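The same point as a small sketch in plain C++: the compiler strips one level of escaping before the regex engine sees the pattern. The constant names are made up for illustration. The raw string literal (C++11) holds exactly the characters the regex engine receives.

```cpp
#include <string>

// As typed in the source: backslash-backslash, backslash-quote, '[', 'A'.
const std::string kEscaped = "\\\"[A";

// The same text as a raw string literal, with no C++ escaping applied.
// These four characters are what the regex engine actually gets: \"[A
const std::string kRaw = R"(\"[A)";
```

So the regex itself starts with an escaped quotation mark, and that escape is not needed at the regex level, since a quotation mark has no special meaning there.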
When I looked at the QRegExp examples, most of them started with “\\”, so I thought it was required in my case too. And it works, except for the national letters ;)
Peppe, I had almost forgotten about encoding the QString, and that was the problem ;) Eh, I am still an amateur. Thanks for the answer, next time I will remember about encoding the QString ;)
Note that you can get around most encoding issues by using hex codes for symbols outside the standard character range. That is less readable, but probably more reliable. The problem with text files (including source files) is that they carry no information on the encoding they are in. That means that trouble can arise as soon as somebody else, unaware of your encoding settings, starts editing your file.
Thanks for the advice, Andre :) But if text files don’t carry information about their encoding, how can I get this information? Suppose that in my application the user can open any text file and its content is displayed in a QPlainTextEdit. Do I have no chance of finding out the encoding?
If we write hex codes in sources no vendor will care for proper UTF-8 support in their products. Hexcodes are not the solution, they are the source of all that evil.
If you are thorough, you can switch your entire code base to UTF-8 without problems in MS Visual Studio, Qt Creator and Xcode.
Add to your .pro file
- CODECFORTR = UTF-8
- CODECFORSRC = UTF-8
and to your main.cpp
This way you can just tell your code editors to open the files in UTF-8 mode unless stated otherwise. It works like a charm here in our team, which involves different operating systems, programming languages and IDEs.
We are in the year 2k11, in times of mega-supercomputing and who knows what else, and I simply refuse to type hex codes in a file to get an ‘ä’ or ‘ç’.
If we write hex codes in sources no vendor will care for proper UTF-8 support in their products. Hexcodes are not the solution, they are the source of all that evil.
While that is a correct statement in general and I agree with you, it does not quite apply to regexps. In regexps, the \uXXXX notation is a standard way to represent a character by its exact Unicode code point. You have full control over what you are writing, so you won’t get any unexpected results if you use the hex notation in regexps. No encoding issues will ever bother you. BTW, there is \p{L} in regexps, which is enough in most cases.