
Java Regex for Haskell.

 
Lucky J Verma
Ranch Hand
Posts: 278
I need to write a regex that extracts comments from Haskell source code.
I created a pattern, but it doesn't seem to work for nested comments.

Can someone please verify the regex I wrote?
In the code below, source is the Haskell code.



import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test10 {

    public static void main(String[] args) {
        // Either a line comment (-- up to the newline) or a block comment ({- ... -})
        String regex = "--[^\\n]*\\n|\\{-.*?-\\}";
        String source = "--aa\n{- longer\ncomment -}";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(source);

        System.out.println(regex);
        // Print each match followed by its start offset
        while (m.find()) {
            System.out.println(m.group() + " " + m.start());
        }
    }
}

 
fred rosenberger
lowercase baba
Bartender
Posts: 12123
Lucky,

A little tip: if you wrap your Java in "code" tags, it is much easier to read, and folks are more likely to help.
 
Lucky J Verma
Ranch Hand
Posts: 278
Yes Fred :-)

 
fred rosenberger
lowercase baba
Bartender
Posts: 12123
Well... I guess that doesn't help too much if your original code isn't formatted very well. I cleaned it up for you.
 
Campbell Ritchie
Sheriff
Posts: 48952
Question too difficult for "beginning Java™". Moving discussion.
 
Vinoth Kumar Kannan
Ranch Hand
Posts: 276
Hi Lucky,

The regex does not match multi-line comments because, by default, "." matches any character except line terminators: \n (newline), \r (carriage return), \u0085 (next line), \u2028 (line separator) and \u2029 (paragraph separator). So the \n inside your block comment is never matched by ".*?".
To make "." match every character, use the two-argument overload, Pattern.compile(regex, Pattern.DOTALL).

When the DOTALL flag is set, "." matches any character, including line terminators. This should solve your problem.
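
Here is a minimal sketch of that change, using the regex and the sample source from the question. The class name DotallDemo and the helper findComments are made up for this example; only the Pattern.DOTALL argument is new compared with the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original post.
class DotallDemo {

    // Collects every match of the question's regex, compiled with DOTALL
    // so that "." inside the block-comment part also matches '\n'.
    static List<String> findComments(String source) {
        Pattern p = Pattern.compile("--[^\\n]*\\n|\\{-.*?-\\}", Pattern.DOTALL);
        Matcher m = p.matcher(source);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String source = "--aa\n{- longer\ncomment -}";
        for (String comment : findComments(source)) {
            System.out.println("[" + comment + "]");
        }
    }
}
```

With the sample source, this finds both the line comment and the two-line block comment. The line-comment alternative contains no ".", so DOTALL does not change how it behaves.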



Alternatively, the flag can be set inside the regex itself, by writing the embedded flag "(?s)" in front of the block-comment part and "(?-s)" after it.

In the first part of the regex (the line comment), ".*" won't match the \n character, so you don't need the "[^\\n]*" part there.
In the second part (the block comment), "(?s)" is a regex flag: from the place where "(?s)" appears up to the place where "(?-s)" appears, "." matches every character, line terminators included. If you leave out "(?-s)", the effect is the same as DOTALL for the entire remainder of the regex.
Beware: setting the flag in the Pattern.compile() method call applies it to the entire regex.
Embedded flags like this come in handy when you need to apply DOTALL, CASE_INSENSITIVE or any other flag to only part of a pattern.
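
The regex from this post did not survive in the thread, so the pattern below is only a reconstruction that fits the description above, not necessarily the exact one that was posted. The class name InlineFlagDemo, the helper findComments and the extra "x = 1" line in the sample source are assumptions added to show the scoping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the regex is a reconstruction, not the original.
class InlineFlagDemo {

    // Line comment: "--.*\n", where "." stops at line terminators.
    // Block comment: wrapped in (?s) ... (?-s), so "." also matches '\n' there.
    static final Pattern COMMENTS =
            Pattern.compile("--.*\\n|(?s)\\{-.*?-\\}(?-s)");

    static List<String> findComments(String source) {
        Matcher m = COMMENTS.matcher(source);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String source = "--aa\nx = 1\n{- longer\ncomment -}";
        for (String comment : findComments(source)) {
            System.out.println("[" + comment + "]");
        }
    }
}
```

Because the flag is switched on only for the second alternative, the line comment still ends at its own newline and does not swallow the "x = 1" line. One difference from "[^\\n]*" is worth knowing: "." also refuses to match \r, so on a source with Windows line endings (\r\n) the "--.*\\n" part would not match, while the original "[^\\n]*" version would.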
 