Regex: Pulling out comments from text: Grabbing multiple lines using Pattern, Matcher

Paulo Simon
Greenhorn

Joined: Aug 03, 2005
Posts: 3
Hi folks,

This is my first post at JavaRanch, and I hope I am posting to the right forum.

Here is the text file:
==================================================================
import java.util.*;

/**
;;
;;Comment 1

*/
code codename1
{
code_contents 1

};

/**
;;
;;Comment 2

*/
code codename2
{
code_contents 2

};
========================================================================

My output should be
==========================================================================
/**
;;
;;Comment 1

*/
/**
;;
;;Comment 2

*/
==========================================================================

I have written my code as shown below, but I am not able to get all the text I need. Need your kind help.
==========================================================
public static void main( String [] args )
{
try
{
String filename = "C:\\workspace\\xxx\\src\\document.txt";

// Read file into String
FileInputStream fis = new FileInputStream(filename);
int len= fis.available();
byte b[]= new byte[len];
fis.read(b);
String content = new String(b);
//System.out.println(content);
StringBuffer comBuf = new StringBuffer();
StringBuffer ruleBuf = new StringBuffer();


// "/**" == (\/\*\*) - Start
// "*/" == (\*\/) - End
// (\/\*\*)(.*)(\*\/) - Capture the comment block
// == (\\/\\*\\*)(.*)(\\*\\/)
//String commRx = "((\\/\\*\\*)(.*)(\\s*)(\\*\\/))";
String commRx = "((\\/\\*\\*)(\\w)(\\s*)(\\*\\/))";
Pattern commPattern = Pattern.compile(commRx,
(Pattern.MULTILINE));
Matcher commMatcher = commPattern.matcher(content);
ArrayList commArr = new ArrayList();

while( commMatcher.find() )
{
System.out.println(commMatcher.group());
}

}
catch( Exception ex )
{
ex.printStackTrace();
}
}

===================================================================


Thanks in Advance!!!
Paulo Simon
Greenhorn

Joined: Aug 03, 2005
Posts: 3
Also folks, do you know of a utility that helps build regular expressions for use in Java? There are tools out there, such as "Regex Coach", that do this, but I guess they are geared more toward Perl-style expressions.

For example, my expression works on the text above in Regex Coach but, for some reason, does not work inside Java.

Thanks again and waiting to see new ways to achieve this!!
Alan Moore
Ranch Hand

Joined: May 06, 2004
Posts: 262
The basic pattern for the multiline comments would be "/\\*\\*.*?\\*/", and you would compile it with the DOTALL flag set, not the MULTILINE flag.
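
A minimal sketch of that suggestion, assuming the file contents have already been read into a String. The class name CommentExtractor and the method extractComments are made up for illustration, and the sample text is a stand-in for the file in the question. The reluctant ".*?" stops at the first "*/", so each comment block is matched separately, and DOTALL lets "." match line terminators so a match can span several lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class CommentExtractor {

    // "/**", then as few characters as possible (reluctant .*?), then "*/".
    // DOTALL makes '.' match line terminators, so a comment can span lines.
    private static final Pattern COMMENT =
            Pattern.compile("/\\*\\*.*?\\*/", Pattern.DOTALL);

    // Returns every comment block found in the text, in order of appearance.
    static List<String> extractComments(String content) {
        List<String> comments = new ArrayList<>();
        Matcher m = COMMENT.matcher(content);
        while (m.find()) {
            comments.add(m.group());
        }
        return comments;
    }

    public static void main(String[] args) {
        // Stand-in for the file contents shown in the question.
        String content =
                "import java.util.*;\n\n"
              + "/**\n;;\n;;Comment 1\n\n*/\n"
              + "code codename1\n{\ncode_contents 1\n\n};\n\n"
              + "/**\n;;\n;;Comment 2\n\n*/\n"
              + "code codename2\n{\ncode_contents 2\n\n};\n";

        for (String comment : extractComments(content)) {
            System.out.println(comment);
        }
    }
}
```

With a greedy ".*" instead of ".*?", the same pattern would run from the first "/**" to the last "*/" and return everything in between as one match.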

As an alternative to Regex Coach, you might try RegexBuddy.
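
On why the flag matters in Java: MULTILINE only changes where "^" and "$" match, while DOTALL changes what "." matches. The sketch below counts matches of the same pattern under different flags. The class name FlagComparison and the sample text are made up for illustration. With no flags or with MULTILINE the count should be 0, because "." cannot cross the line break after "/**". With DOTALL, or with the equivalent embedded flag "(?s)", both comments are found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named here only for illustration.
class FlagComparison {

    // Counts how many times the regex matches the text under the given flags.
    static int countMatches(String regex, String text, int flags) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String regex = "/\\*\\*.*?\\*/";
        // Two multi-line comment blocks separated by a line of "code".
        String text = "/**\n;;Comment 1\n*/\ncode\n/**\n;;Comment 2\n*/\n";

        System.out.println("no flags:  " + countMatches(regex, text, 0));
        System.out.println("MULTILINE: " + countMatches(regex, text, Pattern.MULTILINE));
        System.out.println("DOTALL:    " + countMatches(regex, text, Pattern.DOTALL));
        // (?s) inside the pattern turns on DOTALL without a compile flag.
        System.out.println("(?s):      " + countMatches("(?s)" + regex, text, 0));
    }
}
```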
Paulo Simon
Greenhorn

Joined: Aug 03, 2005
Posts: 3
Thanks Alan. Works like a charm!!
 