catching regex group in repetition

Rahul P Kumar
Ranch Hand

Joined: Sep 26, 2009
Posts: 188

I have text such as "hallo a s d java world". Here "a s d" is an acronym written with whitespace between its letters; it could just as well have been "a. s. d.". The spaces between the individual letters of an acronym cause a lot of trouble in my application. An acronym has at least two letters, but there is no upper limit. I want the output to be "hallo asd java world". What regex would do this, and what should the capturing groups be? I have tried "(\\s+)(([a-z](?:\\s+)){2,})". My intention is for the first group to capture the leading whitespace so that I can keep it, and for the second group to concatenate all the letters found across the iterations, which is the part I am finding difficult. In the example above it retains only the last letter, 'd', while I want to retain 'asd'. Please help me work this out.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39478
Welcome to JavaRanch

Please supply more details, including the code you are using to parse that String.

By the way: if asd is an acronym, then "a s d" isn't an acronym. In which case it should not be possible to parse "a s d" or "a. s. d."

Acronyms consist of a single word without spaces or full stops. Strictly, "asd" isn't an acronym because it is awkward to pronounce as a word. "NATO" is an acronym because it is formed from the initials of 4 words and is pronounced Nay-to, not enn-a-tee-o.
Rahul P Kumar
Ranch Hand

Joined: Sep 26, 2009
Posts: 188
The idea of "a s d" as an acronym was just an example. You can take "NATO" as an example instead. Suppose that somewhere in the text someone writes "hallo N. A. T. O. java world" where "N.A.T.O." was meant. In such cases I want to find the pattern and produce the compact form "N.A.T.O.". In my earlier example I had removed the dots for simplicity. With dots, my pattern looks like "(\\s+)(([A-Z](?:\\.?)(?:\\s+)){2,})". My intention is a pattern that starts with one or more whitespace characters, followed by two or more repetitions of a letter followed by an optional dot and one or more whitespace characters. The last two (the dot and the whitespace) are non-capturing groups.

NB: please do not focus on the fact that my previous post did not handle dots or upper/lower case; those are not the issue.

My code looks like this:

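import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final String COMPACT_ACRONYMS = "(\\s+)(([A-Z](?:\\.?)(?:\\s+)){2,})";

public static String compactSpacedAcronyms(String text) {
    Pattern p = Pattern.compile(COMPACT_ACRONYMS);
    Matcher m = p.matcher(text);
    // $3 refers to the repeated group, which holds only its last iteration
    text = m.replaceAll("$1$3");
    return text;
}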


This code matches the pattern correctly, but for the replacement I need some trick to compact the acronym. I understand that it finds 'N.', 'A.', 'T.' and 'O.' individually, but each finding overwrites the previous one, so in the end '$3' prints only 'O.'. Is there any way to print 'N.A.T.O.' so that my final text becomes "hallo N.A.T.O. java world"?
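
To make the behaviour described above concrete, here is a minimal sketch that runs the posted pattern against the example text and prints what each group holds. The class name LastIterationDemo and the helper method groups are made up for illustration. It relies on the documented java.util.regex rule that a capturing group inside a repetition keeps only the text of its last iteration, while a group wrapped around the whole repetition keeps the entire run (spaces included).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern string comes from the post above.
class LastIterationDemo {
    static final Pattern P =
        Pattern.compile("(\\s+)(([A-Z](?:\\.?)(?:\\s+)){2,})");

    // Returns groups 1..3 of the first match, or an empty array if nothing matches.
    static String[] groups(String text) {
        Matcher m = P.matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] g = groups("hallo N. A. T. O. java world");
        System.out.println("[" + g[1] + "]"); // prints "[N. A. T. O. ]" (group 2: the whole run)
        System.out.println("[" + g[2] + "]"); // prints "[O. ]" (group 3: last iteration only)
    }
}
```

So the full run is already available as group 2; what is still missing is a step that strips the whitespace inside it, which a plain "$n" replacement string cannot do.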

Rahul P Kumar
Ranch Hand

Joined: Sep 26, 2009
Posts: 188
OK, it's done. I had to use a workaround.

Leaving aside upper/lower case and dots, the code looks like this:

private static final String COMPACT_ACRONYMS = "(\\s+)(([a-z])(?:\\.?)(?:\\s+)){2,}";

public static String compactSpacedAcronyms(String text) {
    Pattern p = Pattern.compile(COMPACT_ACRONYMS);
    Matcher m = p.matcher(text);
    Pattern p1 = Pattern.compile("((?:\\s*)([a-z])(?:\\s*))");
    String tempText = null;
    if (m.find()) {
        tempText = m.group(); // capture above compact acronym in temp string
        System.out.println(tempText);
        Matcher m1 = p1.matcher(tempText);
        tempText = m1.replaceAll("$2"); // process this temp String further
        System.out.println(tempText);
    }
    // System.out.println(m.);
    text = m.replaceAll("$1" + tempText + "$1"); // replace original patterns with this tempstring
    return text;
}
Rahul P Kumar
Ranch Hand

Joined: Sep 26, 2009
Posts: 188
Sorry, this has a serious bug: the first find becomes the replacement for all occurrences, because the temp string is not reset for each match. How can I do that?
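
One possible way to recompute the replacement for every match is to loop with find() and build the result with appendReplacement() and appendTail(), rather than calling replaceAll() once with a fixed string. What follows is only a sketch under assumptions: the class name AcronymCompactor, the constant SPACED_ACRONYM and the pattern itself are invented for illustration and were not posted in this thread. The pattern handles both cases and optional dots, and uses a lookahead so that the whitespace after the acronym is left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not code from the thread.
class AcronymCompactor {
    // Assumed pattern: leading whitespace (group 1), then two or more single
    // letters, each with an optional dot, separated by whitespace (group 2).
    // The lookahead requires whitespace or end of input after the last letter
    // without consuming it.
    private static final Pattern SPACED_ACRONYM =
        Pattern.compile("(\\s+)((?:[A-Za-z]\\.?\\s+)+[A-Za-z]\\.?)(?=\\s|$)");

    static String compactSpacedAcronyms(String text) {
        Matcher m = SPACED_ACRONYM.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Recomputed for every match, so nothing is carried over.
            String compact = m.group(2).replaceAll("\\s+", "");
            // quoteReplacement guards against '$' or '\' in the replacement.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + compact));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "hallo asd java world"
        System.out.println(compactSpacedAcronyms("hallo a s d java world"));
        // prints "see N.A.T.O. and U.S.A. today"
        System.out.println(compactSpacedAcronyms("see N. A. T. O. and U. S. A. today"));
    }
}
```

Like the patterns above, this sketch needs whitespace before the first letter, so an acronym at the very start of the text is not touched. It would also merge ordinary one-letter words such as "a" or "I" when they stand next to another single letter, so the pattern may need tightening for real text.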
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19723

Please Use Code Tags instead of colouring.


Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18914

Topic restarted here...

http://www.coderanch.com/t/464471/Java-General/java/catching-regex-group-repitition


This topic will be locked.

Henry

