Regular Expression: finding multiple lines

André Campanini
Greenhorn

Joined: Jun 20, 2008
Posts: 19
Hello!

Does anyone know how I can do a "search and replace" using a regex that matches several lines?

For example, I want to open a TXT file and find:



and replace it with:



In other words, I want to comment it out.

Thanks!
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41574
    
  54
Most regexp libraries can perform multiline matching. Search the javadocs of the java.util.regex.Pattern class for "MULTILINE".


André Campanini
Greenhorn

Joined: Jun 20, 2008
Posts: 19
Hi, again.

I found a topic in this forum and tested the code, but I don't think I know how to use some methods very well, like replaceAll. The topic is: http://www.coderanch.com/t/411621/java/java/Pattern-matches-but-never-replaces

Using that code, searching for "void myMethod(){}" works well, but if I change the text to span multiple lines, it doesn't work:

"
void myMethod()
{
}
"

I'm posting the code that I tested.

Piet Verdriet
Ranch Hand

Joined: Feb 25, 2006
Posts: 266
How will you be replacing a method like this:


[ October 07, 2008: Message edited by: Piet Verdriet ]
Piet Verdriet
Ranch Hand

Joined: Feb 25, 2006
Posts: 266
Originally posted by Ulf Dittmer:
Most regexp libraries can perform multiline matching. Search the javadocs of the java.util.regex.Pattern class for "MULTILINE".


The MULTILINE option only causes the regex engine to treat each line as if it were a "complete String" of its own. So each line will have a ^ (beginning of String), some contents, and a $ (end of String).

What you're hinting at is probably the DOTALL option, which causes the dot to also match newline characters?
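
The difference between the two flags can be seen in a small, self-contained sketch. The class name FlagDemo and the helper countMatches are made up for illustration; only java.util.regex.Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {
    // Hypothetical helper: counts how many times the regex matches in the input.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "first\nsecond\nthird";

        // Without MULTILINE, ^ and $ anchor only the whole input.
        System.out.println(countMatches("^\\w+$", 0, text));                 // prints 0
        // With MULTILINE, ^ and $ also match at every line boundary.
        System.out.println(countMatches("^\\w+$", Pattern.MULTILINE, text)); // prints 3

        // Without DOTALL, '.' does not match line terminators.
        System.out.println(countMatches("first.*third", 0, text));              // prints 0
        // With DOTALL, '.' matches line terminators too, so one match spans all lines.
        System.out.println(countMatches("first.*third", Pattern.DOTALL, text)); // prints 1
    }
}
```

So MULTILINE changes what ^ and $ mean, while DOTALL changes what the dot means. A pattern that must span several lines usually needs DOTALL (or explicit \r?\n in the pattern), not MULTILINE.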
Piet Verdriet
Ranch Hand

Joined: Feb 25, 2006
Posts: 266
Try something like this:



But it will still break in a lot of cases! Better use a true Java source parser. ANTLR is an impressively easy to use parser generator.

Good luck!
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Originally posted by Piet Verdriet:
Try something like this:


But it will still break in a lot of cases! Better use a true Java source parser. ANTLR is an impressively easy to use parser generator.



I agree. Using a regex is normally a very poor approach for parsing recursive syntax.


Retired horse trader.
André Campanini
Greenhorn

Joined: Jun 20, 2008
Posts: 19
Hello!

I got what I wanted doing this:

regex = "(?m)void myMethod\\(\\)\r\n\\{\r\n\\}"
String newFileContent = fileContent.replaceAll(regex, "/*"+regex+"*/");

It replaces every method with comments... just what I wanted. I don't know whether this is the right way to use a regex... but it is working now...

Regards!!!
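
One thing to watch in the snippet above: the pattern text itself is passed as the replacement, so the output would most likely contain pattern syntax such as "(?m)" rather than the original source text. A minimal sketch of an alternative, assuming the method text is exactly as shown and that line breaks are either \r\n or \n, uses $0 to refer to whatever was matched. The class and method names here are hypothetical.

```java
class CommentOutLiteral {
    // Wraps every occurrence of the empty myMethod() in /* ... */.
    static String commentOut(String fileContent) {
        // \r?\n accepts both Windows and Unix line breaks (an assumption about the input files).
        String regex = "void myMethod\\(\\)\\r?\\n\\{\\r?\\n\\}";
        // In a replacement string, $0 stands for the entire matched text.
        return fileContent.replaceAll(regex, "/*$0*/");
    }

    public static void main(String[] args) {
        String fileContent = "class A {\r\nvoid myMethod()\r\n{\r\n}\r\n}";
        System.out.println(commentOut(fileContent));
    }
}
```

Because the replacement reuses the matched text, the original line breaks and spacing are preserved inside the comment.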
Piet Verdriet
Ranch Hand

Joined: Feb 25, 2006
Posts: 266
Originally posted by André Campanini:
Hello!

I got what I wanted doing this:

regex = "(?m)void myMethod\\(\\)\r\n\\{\r\n\\}"
String newFileContent = fileContent.replaceAll(regex, "/*"+regex+"*/");

It replaces every method with comments... just what I wanted. I don't know whether this is the right way to use a regex... but it is working now...

Regards!!!


You don't need the (?m) flag.

Doesn't your method have a body? If not (your method always looks the same), you don't need regex for it: a simple String.replace(...) will do.
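
As a rough illustration of the String.replace(...) suggestion, assuming the method text is known exactly (the names below are made up for the example):

```java
class LiteralReplace {
    // Wraps every exact occurrence of methodText in /* ... */.
    static String commentOut(String source, String methodText) {
        // String.replace treats both arguments as literal text,
        // so parentheses and braces need no escaping.
        return source.replace(methodText, "/*" + methodText + "*/");
    }

    public static void main(String[] args) {
        String method = "void myMethod()\r\n{\r\n}";
        String source = "class A {\r\n" + method + "\r\n}";
        System.out.println(commentOut(source, method));
    }
}
```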

If there is a method body, try your approach with the following:
André Campanini
Greenhorn

Joined: Jun 20, 2008
Posts: 19
The method HAS a body, I just didn't put it here this way in the example.
Piet Verdriet
Ranch Hand

Joined: Feb 25, 2006
Posts: 266
Originally posted by André Campanini:
The method HAS a body, I just didn't put it here this way in the example.


Ok. Then my points still stand: you don't need the (?m) flag. And try it with a String with these contents:

or

or

To name just three out of many, many things that can go wrong.
Charles Lyons
Author
Ranch Hand

Joined: Mar 27, 2003
Posts: 836
Take the advice given above: avoid regular expressions for this. If you need to change a method with just one signature (assuming it's provided by the user), you need to count braces and take care of certain exceptions (like when they're contained in strings, as shown above). The basic principle is simple though, and will probably be faster than regex:

Is that really difficult? Surely easier than regex?

Usual disclaimer: I haven't tested this and it is incomplete, so you'll need to finish and/or bug fix it yourself.
[ October 13, 2008: Message edited by: Charles Lyons ]

Charles Lyons (SCJP 1.4, April 2003; SCJP 5, Dec 2006; SCWCD 1.4b, April 2004)
Author of OCEJWCD Study Companion for Oracle Exam 1Z0-899 (ISBN 0955160340)
André Campanini
Greenhorn

Joined: Jun 20, 2008
Posts: 19
Thanks a lot for the tips! They will help me a lot, and help me clean up my code too!
Piet Verdriet
Ranch Hand

Joined: Feb 25, 2006
Posts: 266
Originally posted by Charles Lyons:
...Is that really difficult? Surely easier than regex?


Err, you make it sound too easy. Where the tricky part starts, you have a comment saying "TODO check for special cases to ignore". That's just the point: you can't simply catch all "special cases" in just one method: you need a true recursive descent parser. How are you planning to catch the special cases when a closing bracket is inside a String:

?

You could answer: "well, I'll count the opening quotes to see if it's inside a String literal". Okay, and what about Strings like these then:

?

My point here is (towards the OP): there exists no simple method that magically does what you ask with arbitrary source code. If that's fine with you, then go right ahead with a regex-solution! But be aware of all the things that can go wrong, and don't be amazed when your application breaks all of a sudden.

Whatever you do, best of luck!

[ October 13, 2008: Message edited by: Piet Verdriet ]
Charles Lyons
Author
Ranch Hand

Joined: Mar 27, 2003
Posts: 836
That's just the point: you can't simply catch all "special cases" in just one method: you need a true recursive descent parser.
I don't agree with that at all, based on my own experience writing mathematical parsers which did a similar thing. There are only a few places where an opening or closing brace can legally occur and where it doesn't group code into a block (which is part of the counting mechanism). If you don't believe me, read the Java Language Specification. Yes, there are plenty of examples of invalid code which will mess up the conversion, but they will also fail to compile so we can ignore them.

Off the top of my head you've hit the two places of interest for the missing key bit of code: inside a comment (either type) and inside a string or character. If you can think of another case (or find one in JLS), don't hesitate to quote it. The first and last cases are dead easy to consider given JLS rules. The string is slightly more tricky, but only needs a simple parser which understands string and character delimiters and, as you've demonstrated, the escape mechanism (which is itself easy if you read character-by-character as in my algorithm above, and ignore the case \" as a closing string delimiter). You really don't need a full-blown code syntax parser which is going to add huge overheads and extra libraries to what should be a simple job. If I put just half an hour to this, I think I could write a fully-compliant application (for the benefit of the OP, I won't). That's probably less time than it takes to become familiar with an external library. Better yet, this can easily be written in portable C and avoid the start up times of the JVM---then it'll be lightning fast on thousands of files.
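
To make the character-by-character idea concrete, here is a minimal sketch of a brace matcher that skips comments, string literals, and character literals. It is not the code originally posted in this thread. The class and method names are hypothetical, and it deliberately ignores Unicode escapes such as \u007D and the text blocks added in later Java versions, so treat it as a starting point rather than a JLS-complete solution.

```java
class BraceMatcher {
    /**
     * Returns the index of the '}' that closes the '{' at openIndex,
     * or -1 if there is none. Braces inside comments, string literals,
     * and character literals are not counted.
     */
    static int findMatchingBrace(String src, int openIndex) {
        int n = src.length();
        if (openIndex < 0 || openIndex >= n || src.charAt(openIndex) != '{') {
            return -1;
        }
        int depth = 0;
        int i = openIndex;
        while (i < n) {
            char c = src.charAt(i);
            if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // Line comment: skip to the end of the line.
                int eol = src.indexOf('\n', i);
                if (eol < 0) return -1;
                i = eol + 1;
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // Block comment: skip past the closing */.
                int end = src.indexOf("*/", i + 2);
                if (end < 0) return -1;
                i = end + 2;
            } else if (c == '"' || c == '\'') {
                // String or char literal: skip to the matching unescaped delimiter.
                i++;
                while (i < n && src.charAt(i) != c) {
                    if (src.charAt(i) == '\\') i++; // skip the escaped character
                    i++;
                }
                i++; // step past the closing delimiter
            } else {
                if (c == '{') {
                    depth++;
                } else if (c == '}') {
                    depth--;
                    if (depth == 0) return i;
                }
                i++;
            }
        }
        return -1; // unbalanced input
    }
}
```

Given the index of a method's opening brace (found with indexOf after the signature, for example), the returned index marks where the closing comment delimiter would go.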

After a little thought, there is one other case which can lead to the final code having a compile error: if the method already contains comments which end with */. In that case you can't just put a /* ... */ around the method. But this case needs to be considered as part of the routine above anyway, so it can be dealt with there: if we encounter a "*/" (which will be as part of a comment which we handle), replace it with something else (e.g. "*\/") or, for the courageous, convert that entire block to // comments. Slightly trickier, but still far from impossible.
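
A minimal sketch of the "*\/" idea follows, with hypothetical names. Note one assumption that goes slightly beyond the paragraph above: it rewrites every "*/" in the method text, including any inside string literals, because once the method is wrapped in a block comment any "*/" would end that comment early.

```java
class CommentWrapper {
    // Wraps methodText in /* ... */ after neutralizing any */ it already contains.
    static String wrapInBlockComment(String methodText) {
        // Replace each "*/" with "*\/" so it can no longer terminate the outer comment.
        String neutralized = methodText.replace("*/", "*\\/");
        return "/*" + neutralized + "*/";
    }

    public static void main(String[] args) {
        System.out.println(wrapInBlockComment("void m() { /* note */ }"));
    }
}
```

To restore the method later, the same substitution has to be reversed when the outer comment is removed.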

You saying one can't do it this way is just defeatist and may lead to markedly inefficient code---like many many things, you can do it with some careful forethought. From experience with Java one can cover all the sane cases. Analysing the JLS will cover all the cases (sane and not-so) guaranteeing the application won't "break all of a sudden". Of course, if you need to cover more situations than just this one, the benefits of a true syntax parser will outweigh the added complexity.
Piet Verdriet
Ranch Hand

Joined: Feb 25, 2006
Posts: 266
Originally posted by Charles Lyons:
...
You saying one can't do it this way
...


I didn't say that.
[ October 13, 2008: Message edited by: Piet Verdriet ]
Piet Verdriet
Ranch Hand

Joined: Feb 25, 2006
Posts: 266
Originally posted by Piet Verdriet:
...
My point here is (towards the OP): there exists no simple method
...


... sure, one can have different opinions of what a "simple" method is, but looking at all the cases I'd have to take into account, it isn't as trivial as I had the impression you led the OP to believe.

All IMO, of course.
[ October 13, 2008: Message edited by: Piet Verdriet ]
Charles Lyons
Author
Ranch Hand

Joined: Mar 27, 2003
Posts: 836
it isn't as trivial as I had the impression you led the OP to believe.
My apologies if that was the impression I conveyed. By "easy" I mean that it doesn't involve any complicated code or advanced libraries, and that it is a short program (I would hazard that the bulk of it can be accomplished in under 50 lines). The only techniques which need to be used are elementary string processing operations (namely charAt and indexOf). Naturally, as with all programs, you have to carefully think through the problem to come out with a decent and high-performance solution; isn't that part of the satisfaction that comes from writing a good program? Whether the solution is trivial depends on what background one comes from and how much experience they have (this is the intermediate, not beginner's, forum after all). Regardless, this can be done in a straightforward piece-by-piece way by thinking only about processing characters in a string linearly, starting with the simple program I gave and then building supplementary rules to cover the few special cases that exist. All good software is built in stages or modules, and this is no exception. I deliberately left the OP to think carefully about what would go in the // TODO bits of my code, since as you emphasised, those are the bits which require some intelligence!

It is all subjective though: one person's solution could easily be another's nightmare.
Piet Verdriet
Ranch Hand

Joined: Feb 25, 2006
Posts: 266
Originally posted by Charles Lyons:
My apologies if that was the impression I conveyed. By "easy" I mean that it doesn't involve any complicated code or advanced libraries, and that it is a short program ...


No problem! Looking at the OP's attempts at solving this, I get the impression that the OP's notion of "easy" might differ "slightly" from yours!


I don't say this to put the OP down, of course!

Thanks for your elaborate clarification Charles.

Regards,
Piet.
[ October 13, 2008: Message edited by: Piet Verdriet ]
 