January 13, 2012

aurora aurora
Lab Rat
137 posts

Regular expression help needed

 

I have a collection of files, and the contents of all of them have the following format:

    -- File name
    --
    -- listOne (L1)
    -- listTwo (L2)
    -- listThree (L3)
    -- HeaderLine (HE)
    --     listFour (L6)
    --     listFive (L2)
    -- listSix (L9)
    -- listSeven (L0)
    -- someline (SL)
    --    listeight (LL)
    --
    --
    REMAINING CONTENTS OF THE LINE
    -----------------------------------------------------------------------
    some more contents
    ------------------------------------------------------------------------

Here I want to store only L1, L2, L3 etc. in a list, but not HE, SL, or the remaining lines of the file.
How can I do that?
Please help me. I went through the QRegExp class documentation and wrote some code, but it is very long and it inserts some blank strings into the stored list:

    while(!f.atEnd() && (!line.contains("------------------------------------------")))
    {
        if(!line.contains("-- "))
        {
            flag=1;
            // backslashes are doubled so the regex engine sees \( and \)
            QRegExp rx("[\\(]([a-z]|[0-9]|[_]|[A-Z])+[\\)]");
            rx.indexIn(line);
            QRegExp rx1("([a-z]|[0-9]|[_]|[A-Z])+");
            rx1.indexIn(rx.cap(0));
            captured.append(rx1.cap(0));
            line=f.readLine();
        }
        else if(flag==1)
        {
            flag++;
            captured.pop_back();
            QRegExp rx("[\\(]([a-z]|[0-9]|[_]|[A-Z])+[\\)]");
            rx.indexIn(line);
            QRegExp rx1("([a-z]|[0-9]|[_]|[A-Z])+");
            rx1.indexIn(rx.cap(0));
            captured.append(rx1.cap(0));
            line=f.readLine();
        }
        else if(flag>0)
        {
            flag++;
            QRegExp rx("[\\(]([a-z]|[0-9]|[_]|[A-Z])+[\\)]");
            rx.indexIn(line);
            QRegExp rx1("([a-z]|[0-9]|[_]|[A-Z])+");
            rx1.indexIn(rx.cap(0));
            captured.append(rx1.cap(0));
            line=f.readLine();
        }
    }

Please help me solve this problem

9 replies

January 13, 2012

sierdzio sierdzio
Area 51 Engineer
2333 posts

All the regexps seem to be the same, so you can move that part of the code into a function; it would save you lines of code and make maintenance easier.

Also, if I get it right, all you need to do is store all whole lines containing “(XY)”, except those with “HE” and “SL”? Then why not do it like this:

    if (line.contains(QRegExp("\\(\\w\\w\\)"))) { // Get all lines with "(XY)"
        if (line.contains("HE") || line.contains("SL")) { // Throw away those with "HE" or "SL"
            continue;
        }
        // do your code here
    }
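
To illustrate the "move it into a function" suggestion, here is a minimal sketch. It uses the standard library's std::regex instead of QRegExp so that it compiles without Qt, and the name extractCode is made up for this example; treat it as one possible shape, not as the code from the original post.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (name invented for this sketch): returns the text inside
// a "(...)" group that ends the line, or an empty string if there is none.
std::string extractCode(const std::string &line)
{
    // \( and \) are literal parentheses; (\w+) captures letters, digits and '_';
    // \s*$ allows trailing whitespace before the end of the line.
    static const std::regex re(R"(\((\w+)\)\s*$)");
    std::smatch m;
    if (std::regex_search(line, m, re)) {
        return m[1].str();
    }
    return std::string();
}
```

With a helper like this, each of the three branches in the original loop could shrink to a single call, and lines without a parenthesised code can be detected by checking for an empty result instead of appending blank strings.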

 Signature 

(Z(:^

January 13, 2012

sierdzio sierdzio
Area 51 Engineer
2333 posts

Regexp might be wrong, but I’m in a hurry now and don’t have time to think it through. But you’ll probably get the idea.


January 13, 2012

aurora aurora
Lab Rat
137 posts

Thank you, but you misunderstood. Maybe I explained it badly. It is just a format; the words are not always the same.
I don't want to store the lines that have sublines.
For example:

    -- someline(kk)
    -- main line(mm)
    --     this is subline(ab)
    --     this is another subline(hh)

In such a case I want only the sublines, not the main line.


January 13, 2012

Volker Volker
Robot Herder
5428 posts

Best way to describe your goal would be to show the input list and the result that you expect.

January 15, 2012

aurora aurora
Lab Rat
137 posts
Volker wrote:
Best way to describe your goal would be to show the input list and the result that you expect.

OK. My input is the file shown above, and the regular expression must capture
only L1, L2, L3, L6, L2, L9, L0, LL.

It should not capture a line that has sublines, that's all.

January 15, 2012

Volker Volker
Robot Herder
5428 posts

The following snippet should show you the basic principle:

    QStringList l;
    l << "listOne (L1)";
    l << "listTwo (L2)";
    l << "listThree (L3)";
    l << "HeaderLine (HE)";
    l << "listFour (L6)";
    l << "listFive (L2)";
    l << "listSix (L9)";
    l << "listSeven (L0)";
    l << "someline (SL)";
    l << "listeight (LL)";

    QRegExp re("^.+\\s+\\((L[0-9L])\\)$");
    foreach(const QString &s, l) {
        qDebug() << "check string" << s;
        if(re.exactMatch(s)) {
            QString code = re.cap(1); // text of the first capture group
            qDebug() << "     found match" << code;
        } else {
            qDebug() << "     no match";
        }
    }

Short explanation of the regex:

  • ^.+
    anchors the match at the start of the string and matches one or more arbitrary characters
  • \\s+
    followed by one or more whitespace characters (space, tab, newline). The regex itself is \s+; the backslash is doubled because it sits inside a C++ string literal
  • \\(
    followed by a literal opening parenthesis. The regex itself is \(, with the backslash doubled for the C++ string literal
  • (
    start of a capture group
  • L[0-9L]
    a literal L followed by exactly one of 0, 1, 2… 9 or L
  • )
    end of the capture group
  • \\)
    followed by a literal closing parenthesis
  • $
    at the end of the string

The capture group contains what has been matched in between, which will be one of L0, L1, L2… L9, LL.
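
The double-backslash point can be checked with a small standalone sketch. It assumes std::regex from the standard library rather than QRegExp (so it builds without Qt), and the names kEscaped, kRaw and matchListCode are invented for this illustration.

```cpp
#include <regex>
#include <string>

// The escaped literal from the snippet above and its raw-string spelling
// produce exactly the same pattern text.
const std::string kEscaped = "^.+\\s+\\((L[0-9L])\\)$";
const std::string kRaw = R"(^.+\s+\((L[0-9L])\)$)";

// Hypothetical helper: returns the captured code (L0..L9 or LL),
// or an empty string when the input does not match.
std::string matchListCode(const std::string &s)
{
    static const std::regex re(kRaw);
    std::smatch m;
    // regex_match requires the whole string to match, similar in spirit
    // to QRegExp::exactMatch in the snippet above.
    return std::regex_match(s, m, re) ? m[1].str() : std::string();
}
```

A line such as "HeaderLine (HE)" yields an empty result here only because E is not in the character class [0-9L], not because the regex knows anything about sublines.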

January 16, 2012

aurora aurora
Lab Rat
137 posts

Sorry Volker, not like that.
I want all the text inside the round brackets at the end of every line.
And the regular expression should not capture a line that has sublines. Example input:

    -- afgh hkjhkh(gk_6)
    -- its main line (aa)     <<-- capture everything except this line, as it has sublines
    --       its sub line(bb)           <<-- subline
    --       its another subline(cc)      <<-- subline
    -- something(dd09)
    -- this is also(tr_8787)

And the output should be: gk_6, bb, cc, dd09, tr_8787

January 16, 2012

aureshinite aureshinite
Lab Rat
61 posts

Learn about regular expressions. Period.

January 16, 2012

Volker Volker
Robot Herder
5428 posts

It is up to you to detect what a “subline” is and to skip the regex on its parent line altogether.

I recommend studying the QString documentation. It has various helpful methods. Read through the method list and descriptions.
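
One possible way to detect sublines, sketched under a stated assumption: a subline is a "--" line whose text is indented more deeply than the line before it, so a line is skipped when the next code-bearing line is indented further. The sketch uses std::regex and std::string instead of the Qt classes so it compiles on its own, and collectCodes is a made-up name.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects the "(code)" at the end of each "--" line,
// skipping any line that is directly followed by a more deeply indented one.
std::vector<std::string> collectCodes(const std::vector<std::string> &lines)
{
    // Group 1: whitespace after "--" (the indentation); group 2: the code.
    static const std::regex re(R"(^--(\s*).*\((\w+)\)\s*$)");
    std::vector<std::pair<std::size_t, std::string>> entries; // (indent, code)

    for (const std::string &line : lines) {
        if (line.compare(0, 2, "--") != 0) {
            break; // assumption: the "--" header block ends at the first other line
        }
        std::smatch m;
        if (std::regex_match(line, m, re)) {
            entries.emplace_back(static_cast<std::size_t>(m.length(1)), m[2].str());
        }
    }

    std::vector<std::string> codes;
    for (std::size_t i = 0; i < entries.size(); ++i) {
        // A line "has sublines" when the next entry is indented further.
        const bool hasSubline =
            i + 1 < entries.size() && entries[i + 1].first > entries[i].first;
        if (!hasSubline) {
            codes.push_back(entries[i].second);
        }
    }
    return codes;
}
```

Fed the sample file from the first post, this keeps L1, L2, L3, L6, L2, L9, L0, LL and drops HE and SL, because each of those two lines is followed by a more deeply indented line. If the real files mark sublines some other way, only the hasSubline test would need to change.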
