June 1, 2012

P_Tarr
Lab Rat
18 posts

Funny QRegExp whitespace behavior

 

While trying to eliminate empty HTML paragraphs, I’ve run into some strange behavior when calling the replace function on a QString with a regular expression. The intent is to replace

example 1:

  <p attributes>  </p>

or

example 2:

  <p attributes>
   
  </p>

with an empty string (""). Example 2 differs from example 1 only by the addition of newlines. If A is a QString, then the expression

  A.replace(QRegExp("<p[^<]*>[\s]*</p>"),"");

fails, while explicitly listing the whitespace characters (including a space character) works just fine:

  A.replace(QRegExp("<p[^<]*>[ \n\r\t\f\v]*</p>"),"");

Does anyone know why QRegExp fails to detect whitespace properly in a QString?

2 replies

June 1, 2012

Tannin
Lab Rat
13 posts

You should be getting a warning like “unrecognized character escape sequence”, no? The \s (and \n and so on) is interpreted by the compiler and replaced before QRegExp even gets to see it, so you have to escape the backslashes!
Use

  A.replace(QRegExp("<p[^<]*>[\\s]*</p>"),"");
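
For illustration, here is a minimal sketch of the double-backslash pattern applied to both examples from the question. It is not the original Qt code: std::regex stands in for QRegExp so the snippet compiles without Qt (an assumption of this sketch), and the function name stripEmptyParagraphs is hypothetical. The string-literal rule it relies on belongs to the C++ compiler, so it should apply the same way whichever regex class receives the pattern.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes empty <p ...> </p> paragraphs from `html`.
// std::regex is used here as a stand-in for QRegExp (assumption of this sketch).
std::string stripEmptyParagraphs(const std::string& html) {
    // In source code, "\\s" is the two characters backslash and 's'.
    // The regex engine then reads that pair as "any whitespace character".
    static const std::regex emptyParagraph("<p[^<]*>[\\s]*</p>");
    return std::regex_replace(html, emptyParagraph, "");
}
```

With this pattern, both a paragraph containing only spaces and one containing only newlines should be reduced to an empty string.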

June 2, 2012

P_Tarr
Lab Rat
18 posts

Tannin
Thanks for the pointer. After reading your reply I found the note in the QRegExp documentation that says to use double backslashes. I tried it and it works with both QRegExp expressions. My compiler didn’t give me a warning. I’m using Qt Creator 2.4.0 with the standard supplied compiler.

After some additional testing it became clear that \s was being interpreted as the character s, but that both \n and \\n were being interpreted as a newline character (not sure why both work).
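
A likely explanation for why both forms work: "\n" is a valid C++ escape, so the compiler puts a real newline character into the pattern, and the regex engine matches that character literally. "\\n" reaches the engine as backslash plus n, which the engine itself treats as a newline escape. "\s" has no such luck, because it is not a C++ escape, and compilers such as GCC typically warn and keep just the s. The sketch below shows the difference, again using std::regex as a stand-in for QRegExp (an assumption; the helper and variable names are hypothetical).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern`, compiled as a regular expression,
// matches a string consisting of a single newline character.
bool matchesNewline(const std::string& pattern) {
    return std::regex_match(std::string("\n"), std::regex(pattern));
}

// What the regex engine actually receives for each source-level literal:
const std::string compilerNewline = "\n";   // 1 character: a real newline (0x0A)
const std::string regexNewline    = "\\n";  // 2 characters: backslash, 'n'
const std::string escapedS        = "\\s";  // 2 characters: backslash, 's'
const std::string bareS           = "s";    // what "\s" typically collapses to (GCC warns)
```

Here the first three patterns should match a newline, while the last one only matches the letter s, which is consistent with the behavior described above.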

 