Regular Expressions and String replacements

Norm Radder
Ranch Hand

Joined: Aug 10, 2005
Posts: 692
I'm looking for ideas for a program I'm modifying that runs edits over groups of files. The data is HTML, and the edits change HREF= links that point to a server engine with a query string into local references. For example, HREF="thesite/engine.php?id=22&topic=345" should become HREF="Topic345/22.html".
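As a sketch of the HREF case itself, assuming every engine link has exactly the form HREF="thesite/engine.php?id=N&topic=M" (parameters in that order, a literal & rather than &amp;, uppercase HREF), a single replaceAll() with two capturing groups can do the rewrite. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class HrefRewrite {
    // Assumed link shape: HREF="thesite/engine.php?id=<digits>&topic=<digits>"
    // Group 1 captures the id, group 2 captures the topic number.
    private static final Pattern ENGINE_LINK =
            Pattern.compile("HREF=\"thesite/engine\\.php\\?id=(\\d+)&topic=(\\d+)\"");

    static String rewrite(String html) {
        // $2 (topic) and $1 (id) are inserted into the local reference
        return ENGINE_LINK.matcher(html).replaceAll("HREF=\"Topic$2/$1.html\"");
    }

    public static void main(String[] args) {
        String in = "<a HREF=\"thesite/engine.php?id=22&topic=345\">x</a>";
        System.out.println(rewrite(in)); // <a HREF="Topic345/22.html">x</a>
    }
}
```

Links that don't match the assumed shape are left unchanged, so other HREFs in the page are not touched.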

I'm trying to use regex to solve this. The edit program gets its edit rules from a file. Up to now the edits have been simple replacements. Now I have pages that are more complicated.

Here's what I want to do. I'll use capital letters instead of URL strings.

Source data: ABC997DEF2G
Desired output: MNOP2YZ997QX

The pattern for matching would be: ABC\d{1,3}DEF\d{1,2}G
The pattern has five parts: constant, skipped digits, constant, skipped digits, and constant.

The replacement string for the above would be: MNOP<skip2>YZ<skip1>QX,
where <skip2> is the value skipped by \d{1,2} and <skip1> the value skipped by \d{1,3}.
The replacement rules would be: ABC by MNOP, DEF by YZ, and G by QX.
These could be placed in an array, rep[].
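One way to get at the skipped values in Java is to wrap each variable part of the pattern in parentheses so it becomes a numbered capturing group; the replacement string can then refer to the groups as $1 and $2, in any order. A minimal sketch using the example strings above (the class name is hypothetical):

```java
public class SkipSwap {
    // Each \d{...} is wrapped in parentheses, making it a capturing group:
    // group 1 = \d{1,3} (skip1), group 2 = \d{1,2} (skip2)
    static String swap(String input) {
        return input.replaceAll("ABC(\\d{1,3})DEF(\\d{1,2})G", "MNOP$2YZ$1QX");
    }

    public static void main(String[] args) {
        System.out.println(swap("ABC997DEF2G")); // MNOP2YZ997QX
    }
}
```

Groups are numbered by counting opening parentheses from the left, so $1 is the \d{1,3} part and $2 the \d{1,2} part.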

When Matcher.find() finds a match, start() and end() give me the bounds of the area to work on.
If the skipped strings were in an array skip[], the output record would be built by:
String outputRec = input.substring(0, matcher.start())
+ rep[0] + skip[1] + rep[1] + skip[0] + rep[2]
+ input.substring(matcher.end());
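The substring approach above works as written once the pattern has capturing groups, because Matcher.group(n) returns exactly the text skipped by each \d{...}. A sketch, assuming the rep[] contents listed earlier and handling only the first match in a record (replaceAll() would handle every match); the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubstringRebuild {
    // Constant replacements from the post: ABC -> MNOP, DEF -> YZ, G -> QX
    static final String[] rep = {"MNOP", "YZ", "QX"};
    static final Pattern P = Pattern.compile("ABC(\\d{1,3})DEF(\\d{1,2})G");

    static String rebuild(String input) {
        Matcher matcher = P.matcher(input);
        if (!matcher.find()) {
            return input; // no match: leave the record unchanged
        }
        // skip[0] and skip[1] hold the text matched by groups 1 and 2
        String[] skip = {matcher.group(1), matcher.group(2)};
        return input.substring(0, matcher.start())
                + rep[0] + skip[1] + rep[1] + skip[0] + rep[2]
                + input.substring(matcher.end());
    }

    public static void main(String[] args) {
        System.out.println(rebuild("xxABC997DEF2Gyy")); // xxMNOP2YZ997QXyy
    }
}
```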

So how to do this?

The pattern would find the string, and then substring() would extract the various parts of it.
But how do I get the variable parts of the string that were skipped by \d{...}?
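If the goal is to extract the skipped parts with substring() as described, Matcher also has start(int group) and end(int group), which give the bounds of each group inside the input. This sketch (hypothetical class and method names) collects the skipped parts that way; each substring is the same string group(n) would return.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupBounds {
    private static final Pattern P = Pattern.compile("ABC(\\d{1,3})DEF(\\d{1,2})G");

    /** Returns the text matched by each group of the first match, or an empty array. */
    static String[] skipped(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] skip = new String[m.groupCount()];
        for (int g = 1; g <= m.groupCount(); g++) {
            // start(g) and end(g) bound the text matched by group g
            skip[g - 1] = input.substring(m.start(g), m.end(g));
        }
        return skip;
    }

    public static void main(String[] args) {
        String[] skip = skipped("ABC997DEF2G");
        System.out.println(skip[0] + " " + skip[1]); // 997 2
    }
}
```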

What would the rules for my edit program look like? These are input to my program.
For example:
Find: ABC\d{1,3}DEF\d{1,2}G
Use \ and } as delimiters for the variable part of the pattern. Find them with String.indexOf().
Replace: MNOP\<skip2>YZ\<skip1>QX
Use \< and > as delimiters for the data matched by the variable part of the pattern.
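For the rule file, one possible design (a sketch, not the program's actual rule parser) is to have the program rewrite each \d{...} in the Find rule into a Java named group skip1, skip2, ..., and each \<skipN> in the Replace rule into the Java group reference ${skipN}. Named groups and ${name} references need Java 7 or later. The class and method names, and treating only \d{min,max} or \d{n} as a variable part, are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EditRule {
    // Hypothetical rule syntax: a variable part is \d{min,max} or \d{n}
    private static final Pattern VARIABLE_PART = Pattern.compile("\\\\d\\{\\d+(?:,\\d*)?\\}");
    // Hypothetical placeholder syntax: \<skipN>, with N counted from 1
    private static final Pattern SKIP_TOKEN = Pattern.compile("\\\\<skip(\\d+)>");

    /** Turns each \d{...} into a named group (?<skipN>\d{...}), numbered left to right. */
    static String toRegex(String find) {
        StringBuilder out = new StringBuilder();
        Matcher m = VARIABLE_PART.matcher(find);
        int last = 0;
        int n = 0;
        while (m.find()) {
            out.append(find, last, m.start());
            out.append("(?<skip").append(++n).append('>').append(m.group()).append(')');
            last = m.end();
        }
        out.append(find.substring(last));
        return out.toString();
    }

    /** Turns each \<skipN> into ${skipN} and quotes the literal text around it. */
    static String toReplacement(String replace) {
        StringBuilder out = new StringBuilder();
        Matcher m = SKIP_TOKEN.matcher(replace);
        int last = 0;
        while (m.find()) {
            // quoteReplacement keeps any $ or \ in the constant text literal
            out.append(Matcher.quoteReplacement(replace.substring(last, m.start())));
            out.append("${skip").append(m.group(1)).append('}');
            last = m.end();
        }
        out.append(Matcher.quoteReplacement(replace.substring(last)));
        return out.toString();
    }

    static String apply(String findRule, String replaceRule, String input) {
        return Pattern.compile(toRegex(findRule)).matcher(input).replaceAll(toReplacement(replaceRule));
    }

    public static void main(String[] args) {
        // The rules as they would appear in the rule file:
        //   Find:    ABC\d{1,3}DEF\d{1,2}G
        //   Replace: MNOP\<skip2>YZ\<skip1>QX
        String find = "ABC\\d{1,3}DEF\\d{1,2}G";
        String replace = "MNOP\\<skip2>YZ\\<skip1>QX";
        System.out.println(toRegex(find));          // ABC(?<skip1>\d{1,3})DEF(?<skip2>\d{1,2})G
        System.out.println(toReplacement(replace)); // MNOP${skip2}YZ${skip1}QX
        System.out.println(apply(find, replace, "ABC997DEF2G")); // MNOP2YZ997QX
    }
}
```

Apart from the \d{...} parts, the Find rule is used as a regex unchanged, so characters such as . and ? in URLs still need escaping there (for example engine\.php\?id=).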

Thanks for any ideas,
Henry Wong

Joined: Sep 28, 2004
Posts: 20516



Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Norm Radder
Ranch Hand

Joined: Aug 10, 2005
Posts: 692
Thanks. I knew it was easy.
I missed the $1 group reference in the replacement string of String.replaceAll().
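Because replaceAll() treats $ and \ in the replacement as group syntax (and the first argument as a regex), rules that are meant to be plain replacements can break if their text contains those characters. A minimal sketch of two safe options: String.replace() for purely literal edits, or Pattern.quote() plus Matcher.quoteReplacement() when going through replaceAll(). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralReplace {
    // Plain-text replacement: neither argument is treated specially
    static String viaReplace(String text, String from, String to) {
        return text.replace(from, to);
    }

    // Same result through replaceAll(): quote the pattern so regex metacharacters
    // are literal, and quote the replacement so '$' and '\' are not group syntax
    static String viaReplaceAll(String text, String from, String to) {
        return text.replaceAll(Pattern.quote(from), Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        String s = "total=$5 (approx.)";
        System.out.println(viaReplace(s, "$5", "$6"));    // total=$6 (approx.)
        System.out.println(viaReplaceAll(s, "$5", "$6")); // total=$6 (approx.)
        // Unquoted, s.replaceAll("\\$5", "$6") would throw, because the
        // replacement "$6" refers to a capturing group 6 that does not exist.
    }
}
```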