Regular expression help needed
I have a collection of files, and the contents of all those files have the following format:
- -- File name
- --
- -- listOne (L1)
- -- listTwo (L2)
- -- listThree (L3)
- -- HeaderLine (HE)
- -- listFour (L6)
- -- listFive (L2)
- -- listSix (L9)
- -- listSeven (L0)
- -- someline (SL)
- -- listeight (LL)
- --
- --
- REMAINING CONTENTS OF THE LINE
- -----------------------------------------------------------------------
- some more contents
- ------------------------------------------------------------------------
Here I want to store only L1, L2, L3, etc. in a list, leaving out HE, SL and the remaining lines of the file.
How can I do that?
Please help me. I went through the QRegExp class documentation and wrote some code, but it is very long and it inserts some blank strings into the stored list:
- while(!f.atEnd() && (!line.contains("------------------------------------------")))
- {
- if(!line.contains("-- "))
- {
- flag=1;
- rx.indexIn(line);
- rx1.indexIn(rx.cap(0));
- captured.append(rx1.cap(0));
- line=f.readLine();
- }
- else if(flag==1)
- {
- flag++;
- captured.pop_back();
- rx.indexIn(line);
- rx1.indexIn(rx.cap(0));
- captured.append(rx1.cap(0));
- line=f.readLine();
- }
- else if(flag>0)
- { flag++;
- rx.indexIn(line);
- rx1.indexIn(rx.cap(0));
- captured.append(rx1.cap(0));
- line=f.readLine();
- }
- }
Please help me solve this problem
9 replies
All the regexps seem to be the same, so you can move that part of the code into a function. It would save you lines of code and make maintenance easier.
Also, if I get it right, all you need to do is store all whole lines containing "(XY)", except those with "HE" and "SL"? Then why not do it like this:
- if (line.contains(QRegExp("[\\(]\\w\\w[\\)]"))) { // Get all lines with "(XY)"
- if (line.contains("HE") || line.contains("SL")) { // Throw away those with "HE" or "SL"
- continue;
- }
- // do your code here
- }
Thank you, but you misunderstood; maybe I explained it badly. That is just the format, and the words are not always the same.
I don't want to store lines that have sublines.
For example:
- -- someline(kk)
- -- main line(mm)
- -- this is subline(ab)
- -- this is another subline(hh)
In such a case I want only the sublines.
The following snippet should show you the basic principle:
- QStringList l;
- l << "listOne (L1)";
- l << "listTwo (L2)";
- l << "listThree (L3)";
- l << "HeaderLine (HE)";
- l << "listFour (L6)";
- l << "listFive (L2)";
- l << "listSix (L9)";
- l << "listSeven (L0)";
- l << "someline (SL)";
- l << "listeight (LL)";
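- // pattern reassembled from the explanation below
- QRegExp re("^.+\\s+\\((L[0-9L])\\)$");
- foreach(const QString &s, l) {
- qDebug() << "check string" << s;
- if(re.exactMatch(s)) {
- QString code = re.cap(1);
- qDebug() << " found match" << code;
- } else {
- qDebug() << " no match";
- }
- }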
Short explanation of the regex:
- ^.+
  matches one or more characters from the start of the string
- \\s+
  followed by one or more whitespace characters (space, tab, newline)
- \\(
  followed by a literal opening parenthesis. The regex itself is \(, but the backslash has to be escaped inside a C++ string literal
- (
  start of a capture group
- L[0-9L]
  a literal L followed by exactly one of 0, 1, 2… 9 or L
- )
  end of the capture group
- \\)
  followed by a literal closing parenthesis
- $
  at the end of the string
The capture group contains what has been matched in between, which will be one of L0, L1, L2… L9, LL.
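To try the pattern explained above without a Qt project, here is a minimal sketch using the standard library's `std::regex` instead of QRegExp. The function name `extractCode` is made up for illustration, and returning an empty string for "no match" is just one possible convention.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the captured code (e.g. "L1") or an empty
// string when the line does not match the pattern from the explanation.
std::string extractCode(const std::string& line) {
    // Raw string literal, so the backslashes do not need to be doubled.
    static const std::regex re(R"re(^.+\s+\((L[0-9L])\)$)re");
    std::smatch m;
    // regex_match requires the whole line to match, like QRegExp::exactMatch.
    if (std::regex_match(line, m, re)) {
        return m[1].str();  // first capture group
    }
    return std::string();
}
```

With the sample list, `extractCode("listOne (L1)")` gives `"L1"`, while the lines ending in `(HE)` and `(SL)` give an empty string because the character after the opening parenthesis is not an `L`.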
Sorry Volker, that is not what I meant.
> All text inside the round brackets at the end of each line should be captured.
> And the regular expression should not capture a line that has sublines.
Example input:
- -- afgh hkjhkh(gk_6)
- -- its main line (aa) <<-- capture everything except this line, as it has sublines
- -- its sub line(bb) <<----subline
- -- its another subline(cc) <<-----subline
- -- something(dd09)
- -- this is also(tr_8787)
And the output should be: gk_6, bb, cc, dd09, tr_8787
It is up to you to detect what a "subline" is and to skip the regex for the lines that have them altogether.
I recommend studying the QString documentation. It has various helpful methods. Read through the method list and descriptions.
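As a rough sketch of that idea, the code below assumes that a subline is indented deeper than the line it belongs to. The thread never states the real rule, so that assumption may need replacing. It uses standard C++ (`std::regex`) rather than QRegExp so it can be run on its own, and the names `indentOf` and `collectCodes` are made up for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Number of leading spaces or tabs.
// ASSUMPTION: sublines are indented deeper than their parent line.
std::size_t indentOf(const std::string& line) {
    std::size_t n = 0;
    while (n < line.size() && (line[n] == ' ' || line[n] == '\t')) {
        ++n;
    }
    return n;
}

// Collects the text inside the trailing "(...)" of each line, skipping any
// line that is directly followed by a deeper-indented line (its subline).
std::vector<std::string> collectCodes(const std::vector<std::string>& lines) {
    // Non-empty text in parentheses at the end of the line.
    static const std::regex re(R"re(\(([^()]+)\)\s*$)re");
    std::vector<std::string> out;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const bool hasSubline =
            i + 1 < lines.size() && indentOf(lines[i + 1]) > indentOf(lines[i]);
        if (hasSubline) {
            continue;  // parent line: do not run the regex on it at all
        }
        std::smatch m;
        if (std::regex_search(lines[i], m, re)) {
            out.push_back(m[1].str());
        }
    }
    return out;
}
```

Because the capture group requires at least one character, lines without a trailing "(...)" add nothing to the list, which avoids the blank strings mentioned in the question. In Qt the same loop could be written with QStringList and QRegExp::indexIn.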