Java Regex for Haskell.

Lucky J Verma
Ranch Hand

Joined: Apr 11, 2007
Posts: 278
I need to write a regex that picks out the comments from Haskell source code.
I created a pattern, but it doesn't seem to work for nested comments.

Can someone please verify the regex I wrote?
Here, source is the Haskell code.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test10 {

    public static void main(String[] args) {
        // Intended to match a line comment (-- up to the newline) or a block comment ({- ... -})
        String regex = "--[^\\n]*\\n|\\{-.*?-\\}";
        String source = "--aa\n{- longer\ncomment -}";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(source);

        // Print each match followed by its start offset
        while (m.find()) {
            System.out.println(m.group() + " " + m.start());
        }
    }
}

fred rosenberger
lowercase baba

Joined: Oct 02, 2003
Posts: 11955


A little tip: if you wrap your Java code in "code" tags, it is much easier to read, and folks are more likely to help.

There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
Lucky J Verma
Ranch Hand

Joined: Apr 11, 2007
Posts: 278
Yes Fred :-)

fred rosenberger
lowercase baba

Joined: Oct 02, 2003
Posts: 11955

well...I guess that doesn't help too much if you don't have your original code formatted very well...I cleaned it up for you.
Campbell Ritchie

Joined: Oct 13, 2005
Posts: 46412
Question too difficult for "beginning Java™". Moving discussion.
Vinoth Kumar Kannan
Ranch Hand

Joined: Aug 19, 2009
Posts: 276

Hi Lucky,

The regex does not match multi-line comments because, by default, "." matches every character except line terminators: \n (newline), \r (carriage return), \u0085 (next line), \u2028 (line separator) and \u2029 (paragraph separator). So the \n inside the block comment is not matched.
To make "." match all characters, use the two-argument overload Pattern.compile(String regex, int flags) and pass Pattern.DOTALL as the flag.

When the DOTALL flag is set, "." matches any character, including line terminators. This should solve your problem.
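
A minimal sketch of the DOTALL approach, reusing the regex and the sample source from the original post. The class name DotAllDemo and the helper findComments are made up for illustration; only Pattern.compile(String, int) and Pattern.DOTALL come from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAllDemo {

    // Same regex as in the original post; only the compile flag changes.
    static final String REGEX = "--[^\\n]*\\n|\\{-.*?-\\}";

    // Hypothetical helper: returns every comment found in the source, in order.
    static List<String> findComments(String source) {
        // DOTALL lets "." in the block-comment branch match line terminators too.
        Pattern p = Pattern.compile(REGEX, Pattern.DOTALL);
        Matcher m = p.matcher(source);
        List<String> comments = new ArrayList<>();
        while (m.find()) {
            comments.add(m.group());
        }
        return comments;
    }

    public static void main(String[] args) {
        for (String comment : findComments("--aa\n{- longer\ncomment -}")) {
            System.out.println(comment);
        }
    }
}
```

The line-comment branch is unaffected by the flag here because it uses "[^\\n]*" rather than ".".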

Alternatively, the flag can be set inside the regex itself with the embedded flag "(?s)", using ".*" for the line-comment part and wrapping the block-comment part in "(?s)" ... "(?-s)".

In the first part of such a regex, ".*" does not match the \n character, so the "[^\\n]*" part is no longer needed.
In the second part, "(?s)" is an embedded flag: from the place where "(?s)" appears up to the place where "(?-s)" appears, "." matches every character, line terminators included. If "(?-s)" is left out, the effect is the same as the DOTALL flag for the rest of the regex.
Beware: setting the flag in the Pattern.compile() call applies it to the entire regex.
Embedded flags like this come in handy when DOTALL, CASE_INSENSITIVE, or any other flag should apply to only part of a regex.
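
Here is a rough sketch of the embedded-flag variant. The exact regex string is an assumption reconstructed from the description above, and the names InlineFlagDemo and findComments are hypothetical. The sample input puts a line of code after the line comment to show that ".*" in the first branch still stops at the newline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InlineFlagDemo {

    // Assumed regex: "." keeps its default meaning in the line-comment branch,
    // while (?s) ... (?-s) switches DOTALL on only around the block-comment branch.
    static final String REGEX = "--.*\\n|(?s)\\{-.*?-\\}(?-s)";

    // Hypothetical helper: returns every comment found in the source, in order.
    static List<String> findComments(String source) {
        // No flag argument: DOTALL is controlled from inside the regex.
        Matcher m = Pattern.compile(REGEX).matcher(source);
        List<String> comments = new ArrayList<>();
        while (m.find()) {
            comments.add(m.group());
        }
        return comments;
    }

    public static void main(String[] args) {
        String source = "--a\nx = 1\n{- b\nc -}\n";
        for (String comment : findComments(source)) {
            System.out.println(comment);
        }
    }
}
```

If Pattern.DOTALL were passed to Pattern.compile() with this regex instead, the greedy ".*" in the first branch could run past the end of the line comment, which is why the flag is limited to the second branch.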
