JavaRanch » Java Forums » Java » Java in General

Trouble with regex

Raj S Kumar
Ranch Hand

Joined: Aug 06, 2006
Posts: 48
I need to identify a set of characters in a string that holds a file path:

D:\Development\Devlocale\SVN\ABC\ABCD\EN\Hello_EN.rc

I need to pick out the characters 'EN' when they are preceded by \, _ or . and followed by \, _ or .

The problem is that I don't know where to start. I have read the Javadocs for the Pattern class but couldn't get started.

Could you please show me how to just find 'EN' anywhere in the string? I could build on that by adding the conditions.

Thanks in advance.
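A minimal sketch of the basic step asked for here, finding 'EN' anywhere in the path, using only java.util.regex. The class and method names (FindEn, findAll) are hypothetical, chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
public class FindEn {
    // The simplest possible pattern: the literal characters E followed by N.
    private static final Pattern EN = Pattern.compile("EN");

    // Returns the start index of every occurrence of "EN" in the input.
    static List<Integer> findAll(String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = EN.matcher(input);
        // Each call to find() resumes scanning where the previous match ended.
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Backslashes must be doubled inside a Java string literal.
        String path = "D:\\Development\\Devlocale\\SVN\\ABC\\ABCD\\EN\\Hello_EN.rc";
        for (int start : findAll(path)) {
            System.out.println("Found EN at index " + start);
        }
    }
}
```

On its own this also matches EN inside words such as "OPEN" or "ENGLISH", which is why the delimiter conditions matter.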


Raj S Kumar
Ranch Hand

Joined: Aug 06, 2006
Posts: 48
I've got the answer for the basic part.
I will post if I run into any further issues.
Peter Taucher
Ranch Hand

Joined: Nov 18, 2006
Posts: 174
It would be polite to post a resolution here as well. Other people may benefit from it in the future.

Regex Tutorial -> http://java.sun.com/docs/books/tutorial/essential/regex/

As a starting point for your pattern, this might help (but I'm no regex pro, so maybe you could do it more efficiently or elegantly):
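One possible starting point, sketched here as an assumption and not necessarily the snippet originally posted, puts the allowed delimiters in a character class on each side of EN. FindDelimitedEn and findAll are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
public class FindDelimitedEn {
    // In regex syntax [\\_.] is a character class matching a backslash, an
    // underscore or a literal dot (a dot needs no escaping inside []).
    // In a Java string literal every regex backslash is written twice.
    private static final Pattern DELIMITED_EN = Pattern.compile("[\\\\_.]EN[\\\\_.]");

    // Returns every match; the delimiters are part of each matched string.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = DELIMITED_EN.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String path = "D:\\Development\\Devlocale\\SVN\\ABC\\ABCD\\EN\\Hello_EN.rc";
        for (String match : findAll(path)) {
            System.out.println(match);
        }
    }
}
```

Because the delimiters are consumed by each match, two codes that share a delimiter (as in .EN.EN.) are not both found; the lookaround approach later in the thread avoids this.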


Censorship is the younger of two shameful sisters, the older one bears the name inquisition.
-- Johann Nepomuk Nestroy
Raj S Kumar
Ranch Hand

Joined: Aug 06, 2006
Posts: 48
Sure, I will post the resolution once it is done.
I just got an answer for the basic part; a lot more is yet to be done.
Raj S Kumar
Ranch Hand

Joined: Aug 06, 2006
Posts: 48
Hi,
I have to replace 'EN' with other language codes, such as 'DE', in file paths. The condition is that 'EN' is surrounded by a '\', '_' or '.' on each side.
Once replaced, the new language code should keep the same surrounding characters.

D:\Development\Devlocale\SVN\ABC\ABCD\EN\Hello_EN.rc

I have done it with the following code (it covers all the conditions). Is there a more efficient way of doing it?


Please note that the language codes (EN, DE) are returned by methods; I have substituted hardcoded values here.
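A hedged sketch of one way to do this replacement, not necessarily the approach used above: capture the delimiters and write them back with $1 and $2. replaceCode is a hypothetical helper, and its from/to parameters stand in for the methods that supply the codes:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
public class ReplaceLanguageCode {
    // Hypothetical helper: replaces a code that sits between two delimiters
    // (\, _ or .) and keeps the original delimiters via groups 1 and 2.
    static String replaceCode(String path, String from, String to) {
        // Pattern.quote treats the code as literal text even if it were to
        // contain regex metacharacters.
        Pattern pattern = Pattern.compile("([\\\\_.])" + Pattern.quote(from) + "([\\\\_.])");
        // Matcher.quoteReplacement stops '$' or '\' in the new code from being
        // interpreted as group references or escapes.
        return pattern.matcher(path).replaceAll("$1" + Matcher.quoteReplacement(to) + "$2");
    }

    public static void main(String[] args) {
        String path = "D:\\Development\\Devlocale\\SVN\\ABC\\ABCD\\EN\\Hello_EN.rc";
        // In real code "EN" and "DE" would come from the code-lookup methods.
        System.out.println(replaceCode(path, "EN", "DE"));
    }
}
```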
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19541

Use a positive lookahead and lookbehind; in other words, only look for EN and put the rest in a lookbehind / lookahead.

For example:

(?<=[\\\\_.])EN(?=[\\\\_.])

That regex may look odd, but it's quite easy:
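- (?<=[\\\\_.]) is a positive lookbehind: the character before EN must be a backslash, underscore or period (the four backslashes are Java string-literal escaping; the regex engine sees [\\_.])
- EN is the literal text EN
- (?=[\\\\_.]) is a positive lookahead: the character after EN must be a backslash, underscore or period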
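Rob's pattern can be tried out roughly as follows. LookaroundReplace and replaceEn are illustrative names; the sample path is the one from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
public class LookaroundReplace {
    // (?<=[\\_.]) : the previous character must be \, _ or . (not consumed)
    // EN          : the literal text that is actually matched
    // (?=[\\_.])  : the next character must be \, _ or . (not consumed)
    private static final Pattern EN_CODE = Pattern.compile("(?<=[\\\\_.])EN(?=[\\\\_.])");

    // Hypothetical helper: replaces each delimited EN with newCode.
    static String replaceEn(String path, String newCode) {
        // Only "EN" itself is matched, so the delimiters stay untouched
        // without any $1/$2 group references in the replacement.
        return EN_CODE.matcher(path).replaceAll(Matcher.quoteReplacement(newCode));
    }

    public static void main(String[] args) {
        String path = "D:\\Development\\Devlocale\\SVN\\ABC\\ABCD\\EN\\Hello_EN.rc";
        System.out.println(replaceEn(path, "DE"));
    }
}
```

Because only EN is matched, the replacement string is just the new code; there is no need to capture and restore the delimiters.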

The secret with lookbehind / lookahead is that its presence (or, with negative lookahead / lookbehind, its absence) is required, but it will not be part of the match. My regex will only match EN, and only if it is preceded and followed by the characters you've specified.
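To see what "not part of the match" buys you, this sketch counts matches in a made-up input where two codes share a delimiter, once with the delimiters consumed and once with lookaround. The class, constants and input are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
public class LookaroundVsConsuming {
    // Delimiters are consumed: they become part of each match.
    static final String CONSUMING = "[\\\\_.]EN[\\\\_.]";
    // Delimiters are only checked by lookbehind/lookahead, never consumed.
    static final String LOOKAROUND = "(?<=[\\\\_.])EN(?=[\\\\_.])";

    // Counts how many times find() succeeds on the input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical input in which two codes share the '.' between them.
        String input = "x.EN.EN.y";

        // The first match ".EN." uses up the shared '.', so the second EN
        // has no delimiter left in front of it and is missed.
        System.out.println("consuming:  " + countMatches(CONSUMING, input));

        // Lookbehind can still see the '.' that ended the previous match,
        // so both occurrences are found.
        System.out.println("lookaround: " + countMatches(LOOKAROUND, input));

        // With lookaround the matched text is only "EN".
        Matcher m = Pattern.compile(LOOKAROUND).matcher(input);
        if (m.find()) {
            System.out.println("first match: " + m.group() + " at " + m.start());
        }
    }
}
```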


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
How To Ask Questions How To Answer Questions
Raj S Kumar
Ranch Hand

Joined: Aug 06, 2006
Posts: 48
Hi Rob,
Thanks for the reply; this will surely help me a lot. I would like to understand the regex better so I can tweak it for my requirements.

Could you please point me to a link or a thread where I could understand the expression better?

Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19541

You can start with the Javadoc of java.util.regex.Pattern. The Java Tutorial's "Lesson: Regular Expressions" should also be good.
Peter Taucher
Ranch Hand

Joined: Nov 18, 2006
Posts: 174
Maybe here:
http://www.regular-expressions.info/lookaround.html

Never used that (lookahead/lookbehind). Rob, you're great!
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19541

I know, I know