
regex?

 
nick kaushik
Ranch Hand
Posts: 48
Chrome Java Netbeans IDE
Could you guys also tell me a regex for fishing the words out of a text, so that the results contain no spaces, newlines, tabs, etc. and no full stops?
I know the split method can split on line feeds and whitespace with "\\s", but what about the dot?
eg.
i am doing Java.
output should be:
i
am
doing
Java
 
Rob Spoor
Sheriff
Pie
Posts: 20526
54
Chrome Eclipse IDE Java Windows
If you check out the Javadoc of java.util.regex.Pattern you will find something called character classes. You can use those, and \s (a predefined character class for whitespace) can be placed inside one. So to match only a dot or a whitespace character you use "[.\\s]".
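As a minimal sketch of that suggestion, the example below applies the character class to the sample sentence from the question. The class and method names are made up for illustration, and the trailing + is an addition to the pattern above so that a run such as ". " counts as a single delimiter.

```java
public class SplitOnDotAndWhitespace {

    // Splits on runs of dots and whitespace characters.
    // Inside a character class the dot is a literal '.', not "any character".
    // The '+' is an assumption added here so consecutive delimiters
    // do not produce empty strings in the middle of the result.
    static String[] words(String text) {
        return text.split("[.\\s]+");
    }

    public static void main(String[] args) {
        for (String word : words("i am doing Java.")) {
            System.out.println(word);
        }
        // prints:
        // i
        // am
        // doing
        // Java
    }
}
```

String.split drops trailing empty strings, which is why the final dot does not leave an empty element at the end.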
 
Harsha Smith
Ranch Hand
Posts: 287
or
or


only to decrease the readability
 
Rob Spoor
Sheriff
Pie
Posts: 20526
54
Chrome Eclipse IDE Java Windows
But that would not split on tab characters, for instance. "\\s" will take care of all whitespace characters, including line breaks, and it needs no special flag for that; the DOTALL flag of java.util.regex.Pattern only changes what the . metacharacter matches.
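A quick way to check how "\\s" treats tabs and line breaks is to split a string that contains both. This is only a sketch with a made-up class name; it calls String.split without passing any Pattern flags, and the + is added so that two spaces in a row count as one delimiter.

```java
public class WhitespaceSplit {

    // Splits on runs of any whitespace: spaces, tabs, line feeds, and so on.
    // No Pattern flags are involved; String.split compiles the regex as-is.
    static String[] words(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        // A tab, a line feed, and a double space as separators.
        String text = "one\ttwo\nthree  four";
        for (String word : words(text)) {
            System.out.println(word);
        }
        // prints:
        // one
        // two
        // three
        // four
    }
}
```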
 
nick kaushik
Ranch Hand
Posts: 48
Chrome Java Netbeans IDE
And what about when I want only the words, i.e. no whitespace or line terminators, and also want to extract the words from inside (), {}, [] and so on?
 
Rob Spoor
Sheriff
Pie
Posts: 20526
54
Chrome Eclipse IDE Java Windows
Use a negated character class to split. The simplest regular expression is "[^a-zA-Z]+"; don't forget the + or you will get empty Strings between consecutive non-word characters.
This will not handle accented characters correctly; you need \p{L} for that. In combination with digits I think this should work: "[^\\p{L}&&[^0-9]]+".
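The sketch below tries the negated class on bracketed and accented input. The class name, the helper, and the sample strings are made up for illustration, and the second pattern, "[^\\p{L}]+", is a simplified letters-only variant rather than the exact expression quoted above. One thing the sketch also handles: when the text starts with a delimiter, split returns an empty first element, so empty pieces are filtered out.

```java
import java.util.ArrayList;
import java.util.List;

public class WordExtractor {

    // Splits text on the given delimiter regex and drops empty pieces.
    // split() removes trailing empty strings on its own, but a delimiter
    // at the very start of the text still yields an empty first element.
    static List<String> words(String text, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        for (String piece : text.split(delimiterRegex)) {
            if (!piece.isEmpty()) {
                result.add(piece);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // ASCII letters only: brackets, braces and spaces act as delimiters.
        System.out.println(words("(foo) {bar} [baz]", "[^a-zA-Z]+"));
        // prints [foo, bar, baz]

        // \u00e9 and \u00ef are accented letters (e acute, i diaeresis).
        String accented = "caf\u00e9 (na\u00efve)";

        // With [^a-zA-Z]+ the accented letters count as delimiters,
        // so the words are cut apart: caf / na / ve.
        System.out.println(words(accented, "[^a-zA-Z]+"));

        // With [^\p{L}]+ any Unicode letter is kept, so both words stay whole.
        System.out.println(words(accented, "[^\\p{L}]+"));
    }
}
```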
 
Winston Gutkowski
Bartender
Pie
Posts: 10417
63
Eclipse IDE Hibernate Ubuntu
Rob Spoor wrote:...you need \p{L} for that, but in combination with digits I think this should work: "[^\\p{L}&&[^0-9]]+".

Question: I'm much more familiar with good old grep/egrep than I am with Java regexes. Would
"[^\\p{L}0-9]+"
not be the same as
"[^\\p{L}&&[^0-9]]+"
?

Winston
 
Rob Spoor
Sheriff
Pie
Posts: 20526
54
Chrome Eclipse IDE Java Windows
I'm not sure, but I don't think so.
My intention was to take \\p{L}&&[^0-9] (in other words, all letters without digits), and negate that. Perhaps I didn't write that intention correctly.
[^\\p{L}0-9] takes \\p{L} and 0-9 (in other words, all letters and digits), and negates that.
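To make the difference between the two readings concrete, here is a small sketch with made-up names and a made-up sample string. It compares "[^\\p{L}0-9]+" (everything except letters and digits is a delimiter) with "[^\\p{L}]+" (everything except letters is a delimiter). Since no character is both a letter and one of 0-9, "all letters without digits" is the same set as "all letters", so the second pattern stands in for that reading. The sketch deliberately avoids the && form itself, because I have not verified how every Java version parses a negation combined with an intersection.

```java
import java.util.ArrayList;
import java.util.List;

public class LettersAndDigits {

    // Splits text on the given delimiter regex and drops empty pieces.
    static List<String> tokens(String text, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        for (String piece : text.split(delimiterRegex)) {
            if (!piece.isEmpty()) {
                result.add(piece);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // \u00e9 is an accented letter (e acute).
        String text = "room 42, caf\u00e9 no. 7";

        // Letters and digits are both kept: room, 42, the accented word, no, 7.
        System.out.println(tokens(text, "[^\\p{L}0-9]+"));

        // Only letters are kept; the digits are treated as delimiters too.
        System.out.println(tokens(text, "[^\\p{L}]+"));
    }
}
```

If the goal is words that may contain digits, the first pattern is the one that keeps them.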
 