July 27, 2011

dcortesi

QRegExp, perl and Unicode

 

According to the documentation for QRegExp [developer.qt.nokia.com], it “is modeled on Perl’s regexp language. It fully supports Unicode.”

However, Perl’s regexp language (documented here [perldoc.perl.org]) supports the special escape \N{name} to match a named Unicode character or character sequence, e.g. \N{KELVIN SIGN}, and also \p{property} and \P{property} to match or not match any of the numerous Unicode properties (listed at http://perldoc.perl.org/perluniprops.html#Properties-accessible-through-\p{}-and-\P{} ).

I do not see either \p or \N mentioned in the QRegExp doc page. Are they there but not documented? Are these features (named characters and properties) supported in some other way?

3 replies

July 27, 2011

Gerolf

“Is modeled on” does not mean “is equal to” :-)
AFAIK it only handles the syntax that is described in the docs, so I assume it can’t handle those two kinds of escape.


July 27, 2011

peppe

No. They’re simply not there. QRegExp supports a (minimal) subset of Perl’s regexps, and it’s not even PCRE compatible (for instance, there are no non-greedy operators).

You can work around the lack of \N support by (sigh…) using \x. Unfortunately, not only is there no direct equivalent of the \p escape, but the information provided by QChar is not enough to build a workaround.

The best I can suggest is to dump QRegExp and use libpcre to do your matches.
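
To make the \x workaround concrete, here is a minimal sketch in Python (the language used later in this thread). It rewrites each \N{name} in a pattern to a \xhhhh escape, which QRegExp documents for code points between 0x0000 and 0xFFFF. The helper name expand_named_escapes is hypothetical, and the sketch assumes Python's unicodedata database knows the character name; it does not attempt to handle named sequences or characters outside the BMP.

```python
import re
import unicodedata

# Matches a Perl-style named escape such as \N{KELVIN SIGN}.
_NAMED_ESCAPE = re.compile(r"\\N\{([^}]+)\}")


def expand_named_escapes(pattern):
    """Replace every \\N{name} in `pattern` with a \\xhhhh escape.

    Hypothetical helper, not part of Qt or PyQt. Raises KeyError for
    an unknown name and ValueError for anything that does not fit in
    a single four-digit \\x escape.
    """
    def to_hex_escape(match):
        char = unicodedata.lookup(match.group(1))
        if len(char) != 1 or ord(char) > 0xFFFF:
            raise ValueError(
                "cannot express %r as one \\xhhhh escape" % match.group(1))
        return "\\x%04X" % ord(char)

    return _NAMED_ESCAPE.sub(to_hex_escape, pattern)


if __name__ == "__main__":
    # The result is plain text that could then be handed to QRegExp.
    print(expand_named_escapes(r"\d+\s*\N{KELVIN SIGN}"))
```

This only covers \N; as noted above, there is no comparable trick for \p.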


July 27, 2011

dcortesi

Because I’m working in PyQt, libpcre is not available. Python’s native re module also lacks \p and \N and has other Unicode deficiencies. However, there is a good extension package (regex [pypi.python.org]) with rather complete Unicode support.

The difficulty that I see as a [Py]Qt newbie is in working on the one hand with a QPlainTextEdit and text cursor objects, and on the other with Python-based regex matching. Constantly crossing between the world of the editor document and the world of Python u"strings" looks like a very fruitful way to create confusion and mistakes. Comments?
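
One concrete place where the two worlds can disagree is character positions. The sketch below assumes that positions in a Qt text document count UTF-16 code units (QChars), while a Python 3 str is indexed by code point, so the two differ as soon as the text contains a character outside the BMP. The helper names utf16_len and match_spans_utf16 are hypothetical, and the standard re module stands in for whichever regex engine is actually used.

```python
import re


def utf16_len(text):
    """Number of UTF-16 code units needed to store `text`."""
    return len(text.encode("utf-16-le")) // 2


def match_spans_utf16(pattern, text):
    """Return (start, end) offsets of every match, in UTF-16 code units.

    Hypothetical helper. Re-encoding the prefix for each match is
    quadratic in the worst case, which is fine for a sketch but not
    for large documents.
    """
    spans = []
    for match in re.finditer(pattern, text):
        start = utf16_len(text[:match.start()])
        end = start + utf16_len(match.group(0))
        spans.append((start, end))
    return spans


if __name__ == "__main__":
    # U+1D11E (musical G clef) occupies two UTF-16 code units.
    sample = "a\U0001D11Eb b"
    print([m.span() for m in re.finditer("b", sample)])  # Python indices
    print(match_spans_utf16("b", sample))                # UTF-16 offsets
```

If the assumption holds, offsets of this kind are what one would pass to QTextCursor.setPosition() to select a match in the editor, rather than the raw Python match indices.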

 