Regular Expressions in String's split() method.

Siju Odeyemi
Greenhorn

Joined: Jan 16, 2003
Posts: 10

I've got a String variable expFile with the following value in it:



Then I split the string using the following method:



I'm trying to write a regular expression to split the file after every 10 paragraphs OR at every 1000 characters at most. Unfortunately, I can't seem to get the regular expression right. Can someone with regex skills please show me the light? I'm quite desperate.

Thanks in advance.
prem pillai
Ranch Hand

Joined: Nov 02, 2007
Posts: 87

Have a look at java.util.regex.Matcher.
Wouter Oet
Saloon Keeper

Joined: Oct 25, 2008
Posts: 2700

I think he knows about the Matcher class, since he is asking for someone with regex skills. However, what have you tried so far? The regex you're looking for isn't very complicated.


"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." --- Martin Fowler
Please correct my English.
Siju Odeyemi
Greenhorn

Joined: Jan 16, 2003
Posts: 10
prem & Wouter, thanks for the responses.

I don't know regexp syntax at all. I know that the split method breaks the string up every time it encounters the tag, but I need an expression that does what I explained in my opening post.

Cheers guys.

prem pillai
Ranch Hand

Joined: Nov 02, 2007
Posts: 87

but I need an expression that does what I explained in my opening post.


Why are you insisting that it should be done using a regex? If you are not comfortable with regexes, why don't you have a look at other options to break up your string? There are options available in the java.lang.String class itself. Why don't you give it a try the simple way first?
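A minimal sketch of the no-regex route, using only String.indexOf() and String.substring(). The thread never shows the real paragraph marker, so the `marker` parameter and the names `PlainChunker` and `chunk` are hypothetical. Cutting in the middle of a paragraph when the character cap is hit is just one possible reading of the requirement.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: chunk text with indexOf/substring instead of a regex. */
class PlainChunker {

    /**
     * Each chunk ends right after maxParagraphs markers, or at maxChars
     * characters, whichever limit is reached first.
     * The marker is an assumption; substitute whatever tag the file uses.
     */
    static List<String> chunk(String text, String marker, int maxParagraphs, int maxChars) {
        if (marker.isEmpty() || maxParagraphs < 1 || maxChars < 1) {
            throw new IllegalArgumentException("marker must be non-empty and limits positive");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            // Hard cap on the chunk length (or the end of the text).
            int limit = Math.min(start + maxChars, text.length());
            int end = start;
            int paragraphs = 0;
            while (paragraphs < maxParagraphs) {
                int hit = text.indexOf(marker, end);
                if (hit < 0 || hit + marker.length() > limit) {
                    break; // no further marker fits inside the cap
                }
                end = hit + marker.length();
                paragraphs++;
            }
            if (paragraphs < maxParagraphs) {
                // Paragraph quota not reached: cut at the cap instead.
                end = limit;
            }
            chunks.add(text.substring(start, end));
            start = end;
        }
        return chunks;
    }
}
```

For the question as asked, the call would be something like `PlainChunker.chunk(expFile, marker, 10, 1000)`, with `marker` set to the actual tag. Concatenating the returned chunks gives back the original string, since nothing is dropped.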

Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896
    
  40

Siju Odeyemi wrote:
I'm trying to write a regular expression to split the file after every 10 paragraphs OR at every 1000 characters at most. Unfortunately, I can't seem to get the regular expression right. Can someone with regex skills please show me the light? I'm quite desperate.


Generally, split() is good when you can describe what you want in terms of its delimiters. Descriptions like "10 paragraphs" say more about the content you want than about how the pieces are separated. In those cases, it is probably better to use the find() method instead of the split() method.
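As a rough illustration of the find() approach, here is a sketch that describes the wanted content ("up to maxChars characters") rather than a delimiter. It covers only the character limit, not the paragraph count. The names `FindChunker` and `chunks` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: pull fixed-size pieces out of a string with Matcher.find(). */
class FindChunker {

    /** Returns consecutive pieces of at most maxChars characters (maxChars >= 1). */
    static List<String> chunks(String text, int maxChars) {
        // DOTALL lets '.' match line terminators, so a line break
        // inside the text does not end a chunk early.
        Pattern piece = Pattern.compile(".{1," + maxChars + "}", Pattern.DOTALL);
        Matcher m = piece.matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group()); // each match starts where the previous one ended
        }
        return out;
    }
}
```

Because the quantifier is greedy, every piece except possibly the last is exactly maxChars long.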

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896
    
  40

Siju Odeyemi wrote:I don't know regexp syntax at all, I know that the split method ....


I seriously recommend against using regexes if you don't know how they work (or their syntax). With regex, it is very easy to write code that you don't understand, even with some experience; to try it with no experience at all is sure to wind up with code you don't understand (and completely unmaintainable).

Henry
Vinoth Kumar Kannan
Ranch Hand

Joined: Aug 19, 2009
Posts: 276

Siju Odeyemi wrote:
I don't know regexp syntax at all....


Regex is no big deal. It's easy, yes. A few tutorials and trying out some sample code would get you going.
I suggest you try reading this - http://www.regular-expressions.info/tutorial.html
This one is really good & easy to understand.


OCPJP 6
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39409
    
  28
Vinoth Kumar Kannan wrote: . . . Regex is no big deal. It's easy, yes. . . .
. . . and,

I'm from the Government; I'm here to help.
The cheque's in the post.
etc etc
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Henry Wong wrote:
I seriously recommend against using regexes if you don't know how they work (or their syntax). With regex, it is very easy to write code that you don't understand, even with some experience; to try it with no experience at all is sure to wind up with code you don't understand (and completely unmaintainable).


++

But if you do know how they work then


Retired horse trader.
Joanne Neal
Rancher

Joined: Aug 05, 2005
Posts: 3669
    
  15
James Sabre wrote:

As Vinoth said. Easy


Joanne
James Sabre
Ranch Hand

Joined: Sep 07, 2004
Posts: 781

Joanne Neal wrote:
James Sabre wrote:

As Vinoth said. Easy


Certainly not difficult and I would normally write it with comments to make it obvious; something along the lines


Regex don't have to be difficult and the biggest problem I see with regex is people trying to write them as one long string. Yes, one can write very very complex regex that are incomprehensible probably even to the author but the same applies to any computer language; it just happens to be easier to do with regex.
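For what it's worth, one way to avoid the "one long string" problem is Pattern.COMMENTS, which lets a regex carry whitespace and # comments. The sketch below is not the regex from the post above (that code did not survive in this copy of the thread). It assumes, purely for illustration, that each paragraph ends with a single '\n', and it handles only the paragraph count, not the 1000-character cap. `CommentedRegex`, `upToParagraphs` and `chunks` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: a commented regex that matches up to n paragraphs at a time. */
class CommentedRegex {

    /** Assumes a paragraph is a run of non-newline characters ending in '\n'. */
    static Pattern upToParagraphs(int n) {
        String regex =
              "(?:           # one paragraph:\n"
            + "  [^\\n]*     #   a run of non-newline characters\n"
            + "  \\n         #   then the assumed paragraph marker\n"
            + "){1," + n + "}  # repeated 1 to n times per chunk\n"
            + "|             # or\n"
            + "[^\\n]+ \\z   # a last paragraph with no trailing marker\n";
        return Pattern.compile(regex, Pattern.COMMENTS);
    }

    /** Collects the successive matches, each holding at most n paragraphs. */
    static List<String> chunks(String text, int n) {
        Matcher m = upToParagraphs(n).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }
}
```

With n = 10 this groups ten paragraphs per chunk. The character limit would still need separate handling, for example by post-processing each chunk.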

If you want to see really incomprehensible syntax then take a look at APL. I spent several years teaching APL and learned to both love and hate the mathematical notation.

Edit : :-( Must be complex regex since nobody has pointed out that my regex is actually rubbish so I have added weight to the arguments of those who are against regex. At this time I can't correct the regex. Funny really since my initial approach would have been to use Pattern with Matcher.find() and that is easy to code correctly. Using StringTokenizer would follow the same approach as Pattern and Matcher.find() so would probably be easier still.
Harivittal Atreya Hk
Greenhorn

Joined: Jul 14, 2010
Posts: 1
Why don't you try solving it with the StringTokenizer class? Since it's a static doc, you can specify the common occurrences at the end of each 1000 chars.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896
    
  40

James Sabre wrote:Edit : :-( Must be complex regex since nobody has pointed out that my regex is actually rubbish so I have added weight to the arguments of those who are against regex. At this time I can't correct the regex.


That's the other thing about regexes, a complex regex is just a mess of characters....

I won't try to fix this, but if you want to, I would first recommend adding matches for the characters in between the paragraph markers. The way it is written, it will only match if the markers are back to back.

Second, you will likely run into the issue that unbounded patterns are not allowed in look-behinds. To fix that, you can't use "*" or "+", which isn't a problem because the maximum match is 1000 characters anyway. You can cap each paragraph at 1000 characters, which bounds the look-behind at no more than 10,000 characters; anything that long would trigger the other part of the pattern anyway.

Third, there may be some issues with the start and end boundaries.
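To make the look-behind point concrete, here is a small sketch of split() with a bounded look-behind. It is not a fix for the regex discussed above. It only shows that a look-behind with a fixed or capped length works as a split point and keeps the marker attached to the preceding piece. The '\n' marker and the names `LookBehindSplit`, `afterMarker` and `afterMarkerBounded` are assumptions for the example.

```java
/** Sketch only: splitting with bounded look-behinds so markers are kept. */
class LookBehindSplit {

    /** Splits after every '\n'; the look-behind has a fixed length of 1. */
    static String[] afterMarker(String text) {
        return text.split("(?<=\\n)");
    }

    /**
     * Same split points, written with a capped quantifier ({0,1000})
     * instead of '*', so the look-behind has an obvious maximum length.
     */
    static String[] afterMarkerBounded(String text) {
        return text.split("(?<=[^\\n]{0,1000}\\n)");
    }
}
```

Both methods split "a\nb\nc" into "a\n", "b\n" and "c". The second form is only there to show the capped quantifier; for this simple case the first is all that is needed.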

And at this point, I am sure that I missed something...

Henry
 
 