
Regular Expression for Detecting Emoticons

 
Chia-you Chai
Greenhorn
Posts: 13
Hello, I have a question on detecting the emoticons in a string.

If I have a sentence "

How can I use a regular expression to extract the emoticons :) and :D in this sentence?

Thank you for helping!!
 
Nicola Garofalo
Ranch Hand
Posts: 308
Unfortunately I am not very good with regular expressions, since I don't use them often and can't practice them as much as I would like, but I would like to try to give you an answer because I think regular expressions are a really expressive language.
Then

I arrived at this one:



If I am not wrong, this expression recognises any character (zero or more times), followed by a :) or :D sequence, followed by any character (zero or more times).

But, as I told you, I am not very good with it and you will surely find better answers.

 
Rob Spoor
Sheriff
Posts: 20531
That's indeed not such a good regex ;)

It says "any character any number of times" followed by a : or ) (because the [] denote a character class, or a choice of characters) or a : or D, followed again by any character any number of times.
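To make the character class point concrete, here is a minimal sketch. The class name CharacterClassDemo and its helper methods are made up for illustration, and it only tests the "[:)]" part on its own, not the full regex from the previous post.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not taken from the thread.
class CharacterClassDemo {
    // A character class matches exactly ONE character out of the listed set,
    // so "[:)]" means "a ':' or a ')'", not "a ':' followed by a ')'".
    static final Pattern CLASS = Pattern.compile("[:)]");

    // True if the whole string is a single character from the class.
    static boolean matchesWhole(String s) {
        return CLASS.matcher(s).matches();
    }

    // True if any character of the string is in the class.
    static boolean foundIn(String s) {
        return CLASS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(":"));      // true
        System.out.println(matchesWhole(")"));      // true
        System.out.println(matchesWhole(":)"));     // false: two characters
        System.out.println(foundIn("see (below)")); // true: a lone ')' is enough
    }
}
```

So a class like this accepts text with a lone ) in it even when there is no smiley at all.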

The regular expression needed is simple in this case:
- a :. Use the literal.
- a ) or D. Use either a character class or a | for this.

So you'd get ":[)D]" or ":(\\)|D)". Note that in the second example you need to escape the ). The first form is easier, but it can only be extended with other two-character smilies like :(, in which case the regex would become ":[)D(]". The second form allows you to add any number of characters after the :. With the additions :(, :'( and :wink: the regex would become ":(\\)|D|\\(|'\\(|wink:)".

With a given set of smilies, the regex can be generated automatically using Pattern.quote (the : is taken inside the parentheses so you can also add smilies like ;) that don't start with :):

This will create a regex like this:

Don't be put off by those \Q and \E; \Q means to treat everything until the next \E as literals instead of meta characters. See also the Javadoc of java.util.regex.Pattern.
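A minimal sketch of how such a generated regex could be built, assuming the smilies are available as a list of strings. The class SmileyRegexBuilder, the method buildRegex and the sample list are hypothetical names chosen for illustration; only Pattern.quote and Pattern.compile come from java.util.regex.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper; the names are illustrative only.
class SmileyRegexBuilder {
    // Quote every smiley so its characters are treated as literals,
    // then join the quoted pieces into one alternation.
    static String buildRegex(List<String> smilies) {
        return smilies.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        // Sample set, assumed for this sketch.
        List<String> smilies = List.of(":)", ":D", ":(", ":'(", ":wink:", ";)");
        String regex = buildRegex(smilies);
        System.out.println(regex);

        // The generated regex compiles and matches each smiley as a whole.
        Pattern pattern = Pattern.compile(regex);
        System.out.println(pattern.matcher(":wink:").matches()); // true
    }
}
```

If one smiley is a prefix of another, it is probably safer to list the longer one first, because the alternation tries its options from left to right.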
 
Nicola Garofalo
Ranch Hand
Posts: 308
I was sure of it.
Thank you for the explanation.
 
Rob Spoor
Sheriff
Posts: 20531
You're welcome.
 
Chia-you Chai
Greenhorn
Posts: 13
Thanks for the reply! That's a really nice post for improving my knowledge of regular expressions.
However, when I try to apply this regular expression to detect emoticons, I can only get the first emoticon in the sentence.


Is there any approach that lets me get all the emoticons in the sentence?

Thanks again
 
Rob Spoor
Sheriff
Posts: 20531
Changing the if into a while would be a start.
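A minimal sketch of that change, assuming the simple ":[)D]" regex from earlier in the thread. The class EmoticonFinder, the method findAll and the sample sentence are hypothetical, since the original code and sentence are not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
class EmoticonFinder {
    // A ':' followed by either ')' or 'D'.
    static final Pattern EMOTICON = Pattern.compile(":[)D]");

    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = EMOTICON.matcher(text);
        // "if (matcher.find())" would stop after the first match;
        // "while" keeps calling find() until no further match is left.
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample sentence made up for this sketch.
        System.out.println(findAll("Hello :) nice to meet you :D")); // [:), :D]
    }
}
```

Each call to find() continues from where the previous match ended, which is why the loop visits every emoticon once.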
 
Chia-you Chai
Greenhorn
Posts: 13
Yes! I changed the if to a while and it works! Thanks for helping!
 