regex?

nick kaushik
Ranch Hand

Joined: Sep 25, 2009
Posts: 48

Could you guys also tell me a regex for fishing the words out of a text, so that the results contain no spaces, newlines, tabs, etc. and no full stops?
I know the split method handles line feeds and whitespace with "\\s", but what about the dot?
e.g.
i am doing Java.
output should be:
i
am
doing
Java


"ye shall know the truth & the truth shall set you free..."
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19726

If you check out the Javadoc of java.util.regex.Pattern you will find something called character classes. You can use those, with \s being a predefined character class of its own. So to match only dot and whitespace you use "[.\\s]"; inside the brackets the dot is a literal dot, not "any character".
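As a rough sketch of the suggestion above, here is "[.\\s]" passed to String.split, next to a variant with a + quantifier. The class and method names are made up for illustration, and the + variant is an addition that is not in the post itself.

```java
import java.util.Arrays;

public class DotAndWhitespaceSplit {
    // Splits on a single dot or whitespace character, as "[.\\s]" is written in the post.
    static String[] splitSingle(String text) {
        return text.split("[.\\s]");
    }

    // Same character class with a + quantifier (an assumption, not from the post),
    // so a run such as ". " counts as one delimiter.
    static String[] splitRuns(String text) {
        return text.split("[.\\s]+");
    }

    public static void main(String[] args) {
        // prints [i, am, doing, Java]
        System.out.println(Arrays.toString(splitSingle("i am doing Java.")));
        // prints [i, am, , doing, Java] (empty string between the dot and the space)
        System.out.println(Arrays.toString(splitSingle("i am. doing Java.")));
        // prints [i, am, doing, Java]
        System.out.println(Arrays.toString(splitRuns("i am. doing Java.")));
    }
}
```

split drops trailing empty strings, which is why the final dot in the sample sentence leaves nothing behind.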


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
Harsha Smith
Ranch Hand

Joined: Jul 18, 2011
Posts: 287
or
or


only to decrease the readability
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19726

But that would not split on tab characters, for instance. "\\s" will take care of all whitespace characters, including line breaks; no special flag is needed for that. (The DOTALL flag of java.util.regex.Pattern only changes whether the dot matches line terminators when it is used outside a character class.)
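A small check of what "\\s" covers with default flags, and of what DOTALL actually changes. This is only a sketch; the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class WhitespaceSplitCheck {
    // Splits on runs of whitespace; \s covers space, tab, \n, \r, form feed and vertical tab.
    static String[] splitOnWhitespace(String text) {
        return text.split("\\s+");
    }

    // Reports whether a lone "." matches a newline, with or without DOTALL.
    static boolean dotMatchesNewline(boolean dotAll) {
        Pattern p = dotAll ? Pattern.compile(".", Pattern.DOTALL) : Pattern.compile(".");
        return p.matcher("\n").matches();
    }

    public static void main(String[] args) {
        // prints [i, am, doing, Java] with no flags set
        System.out.println(Arrays.toString(splitOnWhitespace("i\tam\ndoing\r\nJava")));
        // prints false, then true
        System.out.println(dotMatchesNewline(false));
        System.out.println(dotMatchesNewline(true));
    }
}
```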
nick kaushik
Ranch Hand

Joined: Sep 25, 2009
Posts: 48

And what about when I want only the words, i.e. no whitespace or line terminators, and I also want to extract the strings from inside (), {}, [] and so on?
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19726

Use a negated character class to split. The simplest regular expression is "[^a-zA-Z]+"; don't forget the + or you will get empty Strings between consecutive non-word characters.
This will not handle accented characters correctly; you need \p{L} for that. In combination with digits I think this should work: "[^\\p{L}&&[^0-9]]+".
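To make the difference concrete, here is a minimal sketch comparing "[^a-zA-Z]+" with a \p{L}-based class on bracketed and accented input. The names are hypothetical, and "[^\\p{L}]+" (letters only, no digits) is a simplification chosen for the example rather than the exact pattern from the post.

```java
import java.util.Arrays;

public class WordSplit {
    // Splits on runs of anything that is not an ASCII letter.
    static String[] asciiWords(String text) {
        return text.split("[^a-zA-Z]+");
    }

    // Splits on runs of anything that is not a Unicode letter (\p{L}).
    static String[] unicodeWords(String text) {
        return text.split("[^\\p{L}]+");
    }

    public static void main(String[] args) {
        // prints [foo, bar, baz, qux]
        System.out.println(Arrays.toString(asciiWords("foo(bar) {baz} [qux]")));
        // the accented letters act as delimiters: prints [caf, na, ve]
        System.out.println(Arrays.toString(asciiWords("caf\u00e9 na\u00efve")));
        // keeps both words whole
        System.out.println(Arrays.toString(unicodeWords("caf\u00e9 na\u00efve")));
    }
}
```

One thing to watch for: if the text starts with a delimiter, as in "(foo)", split returns an empty first element, so you may want to skip empty Strings in the result.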
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 8060

Rob Spoor wrote:...you need \p{L} for that, but in combination with digits I think this should work: "[^\\p{L}&&[^0-9]]+".

Question: I'm much more familiar with good old grep/egrep than I am with Java regexes. Would
"[^\\p{L}0-9]+"
not be the same as
"[^\\p{L}&&[^0-9]]+"
?

Winston


Isn't it funny how there's always time and money enough to do it WRONG?
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19726

I'm not sure, but I don't think so.
My intention was to take \\p{L}&&[^0-9] (in other words, all letters without digits), and negate that. Perhaps I didn't write that intention correctly.
[^\\p{L}0-9] takes \\p{L} and 0-9 (in other words, all letters and digits), and negates that.
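The second description can be checked directly. The sketch below compares "[^\\p{L}0-9]+" with a letters-only "[^\\p{L}]+"; the class and method names are invented for the example. The "&&" form is deliberately left out, since I'm not certain it is parsed the same way on every Java version.

```java
import java.util.Arrays;

public class LettersAndDigitsSplit {
    // Delimiter is anything that is neither a letter nor an ASCII digit, so digits survive.
    static String[] lettersAndDigits(String text) {
        return text.split("[^\\p{L}0-9]+");
    }

    // Delimiter is anything that is not a letter, so digits are treated as separators.
    static String[] lettersOnly(String text) {
        return text.split("[^\\p{L}]+");
    }

    public static void main(String[] args) {
        // prints [abc123, def, 456]
        System.out.println(Arrays.toString(lettersAndDigits("abc123 def-456")));
        // prints [abc, def]
        System.out.println(Arrays.toString(lettersOnly("abc123 def-456")));
    }
}
```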
 