This regular expression works as expected in "The Regex Coach":
(?m)^(\w+)(?:\s+)??((?:.*(?:[\n]^\s+)?.*)*)?
In a Java program (tested on 1.4.2_3 and on 1.5 beta 1) it looks like this:
(?m)^(\\w+)(?:\\s+)??((?:.*(?:[\\n]^\\s+)?.*)*)?
___________________________^^^^^^^^^^^^^^_______ (the carets mark the section that handles continuation lines)
group(1) captures the term "Budgie" and group(2) should be its definition: "Active and amusing miniature parrot native to Australia".
EXCEPT that in Java the continuation line is ignored, and we get only "Active and amusing miniature parrot". I am sure the problem is in here: (?:[\\n]^\\s+)
Inserting $ in almost every conceivable position had no effect.
I also tried .? after [\\n], in case there was some character between the newline and ^ (the start of the next line). It doesn't seem to matter whether there are two, three, four or even five backslashes, e.g. [\\\\\n].
Either the continuation is ignored or the program captures nothing at all.
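For anyone who wants to reproduce this, a minimal stand-alone harness might look like the sketch below. The class name ContinuationDemo, the helper entries() and the inline sample are assumptions for illustration, not the original program; it is written for a current JDK, and it assumes the whole file is held in one String with "\n" line terminators and real spaces where the post shows underscores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical harness: class and method names are illustrative only.
class ContinuationDemo {
    // The pattern exactly as written in the Java source above.
    static final Pattern ENTRY = Pattern.compile(
        "(?m)^(\\w+)(?:\\s+)??((?:.*(?:[\\n]^\\s+)?.*)*)?");

    // Assumed input: one String, "\n" terminators, spaces for leading blanks.
    static final String SAMPLE =
          "Cat Natural loner\n"
        + "Dog Best friend\n"
        + "Budgie Active and amusing miniature parrot\n"
        + "    native to Australia\n"
        + "    and popular as a house pet\n"
        + "Eagle American mascot\n";

    // Collects {group(1), group(2)} for every match found in the text.
    static List<String[]> entries(String text) {
        List<String[]> result = new ArrayList<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            result.add(new String[] { m.group(1), m.group(2) });
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] e : entries(SAMPLE)) {
            // Brackets make leading whitespace and embedded newlines visible.
            System.out.println("term=[" + e[0] + "] definition=[" + e[1] + "]");
        }
    }
}
```

Running it on the same text that was pasted into The Regex Coach allows a direct comparison. If the real file is read line by line, or uses different line terminators such as "\r\n", SAMPLE should be changed to match before drawing conclusions.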
I want to produce output like this: <term>Cat</term> <meaning>Natural loner</meaning>
from a definition file in which blank lines are ignored and some terms have no definition. In the sample below I have added underscores in place of leading blanks, because the forum strips them.
Cat Natural loner
Dog Best friend
Budgie Active and amusing miniature parrot
____native to Australia
____and popular as a house pet
Eagle American mascot
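To make the target output concrete, here is a small sketch of the final formatting step, given a term and its (possibly multi-line) definition. The class name EntryFormatter and the method format() are hypothetical, and two choices are assumptions rather than stated requirements: continuation lines are joined with a single space, and no XML escaping is applied.

```java
// Hypothetical helper: names and whitespace handling are assumptions.
class EntryFormatter {
    // Trims the definition, collapses every run of whitespace (including
    // newlines and leading blanks of continuation lines) to one space,
    // and wraps the pair in <term>/<meaning> elements.
    static String format(String term, String definition) {
        String meaning = definition.trim().replaceAll("\\s+", " ");
        return "<term>" + term + "</term> <meaning>" + meaning + "</meaning>";
    }

    public static void main(String[] args) {
        System.out.println(format("Cat", " Natural loner"));
        System.out.println(format("Budgie",
            " Active and amusing miniature parrot\n"
            + "    native to Australia\n"
            + "    and popular as a house pet"));
    }
}
```

A term without a definition would come out with an empty <meaning></meaning> element under this sketch; whether that is the desired result is left open.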
Any suggestions as to what I could try?
[ August 13, 2004: Message edited by: Mike Rainville ]