Counting exact matches of substring.

 
Michael Boehm
Ranch Hand
Posts: 51
I have a string that contains words, numbers, line breaks, punctuation, and all sorts of other characters.
I want to count the number of exact occurrences of some words in the string.

I am experimenting using the following code


I am trying to work out how the regular expression should look when I want exact matches; e.g., given the text "foobar" and the substring "foo", the count should be 0.
The regular expression

almost works for counting occurrences of "foo", but not quite.
 
Anton Shaikin
Ranch Hand
Posts: 63
IntelliJ IDE Java Oracle
First of all, there is a nice overview of regular expressions in the Java API documentation for the Pattern class (here). I even use it for reference when working with regexps in other languages.
Also, you may want to have a look at the pretty good tutorial on regular expressions from Sun here.
So, before asking such questions, you could try to figure it out by first learning the basics of regular expressions.
Anyway, the correct pattern in your case would be:
^foo$
As you can find in the documentation for the Pattern class, ^ stands for the beginning of a line and $ for the end.
 
jishnu dasgupta
Ranch Hand
Posts: 103
Eclipse IDE Java Netbeans IDE
Hi Michael,

As Anton suggested, you probably need to look at your regular expression. As it stands, I believe it would match 1foo98, which I guess is not what you want.

On a personal note, if all you want is to count the number of occurrences, you might just as well use the Scanner class.
 
Michael Boehm
Ranch Hand
Posts: 51
Anton Shaykin wrote: First of all, there is a nice overview of regular expressions in the Java API documentation for the Pattern class (here). I even use it for reference when working with regexps in other languages.
Also, you may want to have a look at the pretty good tutorial on regular expressions from Sun here.
So, before asking such questions, you could try to figure it out by first learning the basics of regular expressions.
Anyway, the correct pattern in your case would be:
^foo$
As you can find in the documentation for the Pattern class, ^ stands for the beginning of a line and $ for the end.


I am familiar with the basics of regular expressions.
Have a look at the question again and you will see that ^foo$ is not the correct pattern in my case, as I want to count "foo" every time it appears as a word in a string.
The string might, for instance, be "baz23! foos23foo bar foobar barfoo!foo", and the count should be 2.
 
Michael Boehm
Ranch Hand
Posts: 51
jishnu dasgupta wrote: Hi Michael,

As Anton suggested, you probably need to look at your regular expression. As it stands, I believe it would match 1foo98, which I guess is not what you want.

On a personal note, if all you want is to count the number of occurrences, you might just as well use the Scanner class.


I would want to count that as an occurrence. It seems like \bfoo\b should work [EDIT: Absolutely not]
 
jishnu dasgupta
Ranch Hand
Posts: 103
Eclipse IDE Java Netbeans IDE
Michael Boehm wrote:
The string might for instance be "baz23! foos23foo bar foobar barfoo!foo" and the count should be 2.


Michael, isn't the word "foo" actually appearing 5 times in this String?
 
Michael Boehm
Ranch Hand
Posts: 51
jishnu dasgupta wrote:
Michael, isn't the word "foo" actually appearing 5 times in this String?


Not the way I want to count it. I only want to count exact matches, so for me "foo" only appears twice, since it isn't counted in, e.g., "foos" and "foobar".
 
Rob Spoor
Sheriff
Pie
Posts: 20527
54
Chrome Eclipse IDE Java Windows
So what you want is foo, preceded by nothing, whitespace or punctuation, and followed by nothing, whitespace or punctuation. That looks like a job for positive lookahead / lookbehind:
(?<=^|\s|\p{Punct})foo(?=$|\s|\p{Punct})

That will only result in one match:
- foos23foo does not match since this is one word containing foo, not the word foo itself
- foobar does not match since this is one word containing foo, not the word foo itself
- barfoo does not match since this is one word containing foo, not the word foo itself
- foo matches since it's preceded by only a punctuation character
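For readers who want to try the pattern above, here is a minimal sketch that counts its matches with Matcher.find(). The class name LookaroundCount and the count helper are made up for illustration; only the pattern itself comes from the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original thread.
class LookaroundCount {
    // The pattern from the post: "foo" bounded on each side by the
    // start/end of input, whitespace, or an ASCII punctuation character.
    static final Pattern WORD =
            Pattern.compile("(?<=^|\\s|\\p{Punct})foo(?=$|\\s|\\p{Punct})");

    // Counts non-overlapping matches by advancing the matcher until it fails.
    static int count(String text) {
        Matcher m = WORD.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Only the final "foo" (after '!', at end of input) qualifies.
        System.out.println(count("baz23! foos23foo bar foobar barfoo!foo")); // prints 1
    }
}
```

Note that digits count as part of a word here, so the second foo in foos23foo is rejected.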
 
Anton Shaikin
Ranch Hand
Posts: 63
IntelliJ IDE Java Oracle
That looks like a job for positive lookahead / lookbehind

Exactly, and that goes far beyond "Beginning Java". Regular expressions are all about formalizing your requirements, so first you have to define what you mean by "word", because in common regexp vocabulary a word character is described by the pattern [a-zA-Z_0-9]. As I see it, in your case you mean something different.
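One definition that seems consistent with Michael's examples (1foo98 counts, foos and foobar do not, and the sample string yields 2) is "foo not directly adjacent to an ASCII letter". That reading is an assumption, not something stated in the thread; under it, negative lookarounds on [A-Za-z] could be sketched as follows. The names LetterBoundedCount and count are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the "not next to a letter" rule is an assumed
// reading of the requirements, inferred from the examples in the thread.
class LetterBoundedCount {
    // Counts occurrences of word that have no ASCII letter immediately
    // before or after them. Digits and punctuation act as separators.
    static int count(String text, String word) {
        Pattern p = Pattern.compile(
                "(?<![A-Za-z])" + Pattern.quote(word) + "(?![A-Za-z])");
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Matches the foo after "23" and the foo after "!".
        System.out.println(count("baz23! foos23foo bar foobar barfoo!foo", "foo")); // prints 2
        System.out.println(count("1foo98", "foo")); // prints 1
    }
}
```

This also shows why \bfoo\b falls short for this reading: \b treats digits and underscores as word characters, so it would reject 1foo98.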
 
Campbell Ritchie
Sheriff
Posts: 48910
58
Anton Shaykin wrote: . . . that goes far beyond the "Beginning Java". . . ..
Agree. Moving thread.
 
Michael Boehm
Ranch Hand
Posts: 51
I managed to do what I wanted. I used an appropriate Pattern and then counted by calling split on the string containing the text. However, this is quite slow.
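The pattern and split code are not shown in the thread, so the following is only a sketch of one possible alternative. It assumes that a "word" is a maximal run of ASCII letters, and it tallies every such run in a single pass with Matcher.find(), so several words can be looked up afterwards without rescanning or building a split array. WordTally and tally are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; assumes a word is a maximal run of ASCII letters.
class WordTally {
    private static final Pattern LETTER_RUN = Pattern.compile("[A-Za-z]+");

    // Scans the text once and counts how often each letter run occurs.
    static Map<String, Integer> tally(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = LETTER_RUN.matcher(text);
        while (m.find()) {
            counts.merge(m.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = tally("baz23! foos23foo bar foobar barfoo!foo");
        System.out.println(counts.getOrDefault("foo", 0)); // prints 2
        System.out.println(counts.getOrDefault("bar", 0)); // prints 1
    }
}
```

Whether this is actually faster than the split approach depends on the input and on the pattern used, so it is worth measuring on real data.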
 
Luigi Plinge
Ranch Hand
Posts: 441
IntelliJ IDE Scala Windows
This works for what you described in your example, although it may be what you have already:
 