
catching regex group in repetition

Rahul P Kumar
Ranch Hand

Joined: Sep 26, 2009
Posts: 188
I have text, say "hallo a s d java world". Here 'asd' is an acronym written with whitespace ('a s d'); it could also have been "a. s. d.". The spaces between the individual letters of an acronym cause a lot of trouble in my application. An acronym has at least two letters, but there is no upper limit. I want the output to be "hallo asd java world". What regex should I use, and what should the capturing group be? I've tried "(\\s+)(([a-z])(?:\\.?)(?:\\s+)){2,}". My intent is for the first group to capture the leading space so that I can preserve it, and for the second group to concatenate everything matched across the iterations, which is the part I find difficult. I tried this:



In the above example it retains only the last letter, i.e. 'd', while I intend to retain 'asd'.

So, I tried this:



This also replaces every match with tempText, which holds the first occurrence, so it seems tempText is not re-initialized for each match. I don't really understand why. I posted in the beginners section. There was no answer, so I am posting here in the hope of getting one.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18550
    

I posted in the beginners section. There was no answer, so I am posting here in the hope of getting one.


Please don't do that. The same people who visit the beginner's forum visit this one. All you are doing is crossposting.

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18550
    

My intent is for the first group to capture the leading space so that I can preserve it, and for the second group to concatenate everything matched across the iterations, which is the part I find difficult. I tried this:



In the above example it retains only the last letter, i.e. 'd', while I intend to retain 'asd'.


The reason is that group 3 is nested inside group 2, and the {2,} quantifier makes that nested group match several times within a single overall match. When a capturing group matches repeatedly like this, all you get is its last match. The regex engine doesn't magically concatenate the many matches for you.

Henry
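To illustrate the point above, here is a minimal sketch that runs the pattern from the question against the sample text and reads group 3 after the match. The class name RepeatedGroupDemo and the helper lastLetterCaptured are made up for this illustration; only the pattern and the input come from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Hypothetical helper: returns whatever group 3 holds after the first match.
    static String lastLetterCaptured(String text) {
        // Pattern taken verbatim from the question.
        Pattern p = Pattern.compile("(\\s+)(([a-z])(?:\\.?)(?:\\s+)){2,}");
        Matcher m = p.matcher(text);
        // Group 3 is overwritten on every repetition of group 2,
        // so only the letter from the final repetition survives.
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        // The overall match is " a s d ", but group 3 holds only the last letter.
        System.out.println(lastLetterCaptured("hallo a s d java world")); // prints d
    }
}
```

The whole acronym is inside the overall match, but no single group ever contains "asd", which is why replacing with a group reference cannot produce it.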
Rahul P Kumar
Ranch Hand

Joined: Sep 26, 2009
Posts: 188
Henry Wong wrote:

The reason is that group 3 is nested inside group 2, and the {2,} quantifier makes that nested group match several times within a single overall match. When a capturing group matches repeatedly like this, all you get is its last match. The regex engine doesn't magically concatenate the many matches for you.

Henry

OK, so how do I achieve what I want? I probably need some workaround, but that is what I am unable to figure out.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18550
    

The problem is that your match is too broad. You need to remove the spaces after individual letters, which means you need to match and replace one letter at a time. However, you can't tell whether a single letter is part of an acronym unless you check the whole acronym, which prevents you from matching one letter at a time.

Try using the lookahead and lookbehind constructs. With them, you can match one letter at a time while still being able to check whether the letter is part of an acronym.

Henry
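One possible way to apply the lookaround suggestion is sketched below. It is not a solution posted in the thread: the class AcronymJoiner, the method joinAcronym, and the exact pattern are assumptions made for illustration. The idea is to match only the separator (an optional dot plus whitespace) and to use a lookbehind and a lookahead to require a standalone single letter on each side, so nothing has to be captured or concatenated.

```java
import java.util.regex.Pattern;

class AcronymJoiner {
    // Matches the gap between two standalone letters:
    //   (?<=\b[a-z])  the gap is preceded by a one-letter word
    //   \.?\s+        the gap itself: optional dot, then whitespace
    //   (?=[a-z]\b)   the gap is followed by a one-letter word
    private static final Pattern GAP =
            Pattern.compile("(?<=\\b[a-z])\\.?\\s+(?=[a-z]\\b)");

    // Hypothetical helper: deletes every such gap, leaving the letters joined.
    static String joinAcronym(String text) {
        return GAP.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(joinAcronym("hallo a s d java world"));   // hallo asd java world
        System.out.println(joinAcronym("hallo a. s. d java world")); // hallo asd java world
    }
}
```

Because the lookarounds consume nothing, the space before the first letter and the space after the last letter are left untouched. Two limitations of this sketch: a dot after the final letter (as in "a. s. d.") stays in place, and any two adjacent one-letter words are joined even if they were not meant as an acronym.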
 
 