
Need help with a regex

 
Billy Sclater
I'm trying to write some code that scans an HTML file for links to reviews of a game and returns them.
The file contains long URLs in double quotes (""). The specific URLs I am looking for contain
the game name and the word 'review'.

This is what I have so far:

Pattern p = Pattern.compile("http://.*?(?=.*gamename\\sreview).*?(?=\")");

I'm struggling somewhat! Can anyone help?
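For the requirement as first stated (game name and 'review' in the same quoted URL), one option is a pattern with two lookaheads, each confined to the current URL by [^"]*. In the attempt above, \\s demands a whitespace character between the game name and 'review', which a URL normally doesn't contain, and the unrestricted .* inside the lookahead can look past the closing quote into later text. The sketch below is a minimal example that assumes the URLs start with http:// and sit inside double quotes; matching is case-sensitive, and the class name, method name and sample HTML are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReviewLinkFinder {

    // Returns every quoted http:// URL that contains both gameName and "review".
    static List<String> findReviewLinks(String html, String gameName) {
        // [^"]* stops at the closing quote, so every part of the match stays
        // inside one quoted URL. The two lookaheads independently check that
        // the game name and "review" both occur before that quote, in any order.
        Pattern p = Pattern.compile(
                "\"(http://"
                + "(?=[^\"]*" + Pattern.quote(gameName) + ")"
                + "(?=[^\"]*review)"
                + "[^\"]*)\"");
        Matcher m = p.matcher(html);
        List<String> links = new ArrayList<>();
        while (m.find()) {
            links.add(m.group(1)); // group 1 is the URL without its quotes
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.example.com/zelda-review.html\">A</a>"
                + "<a href=\"http://www.example.com/zelda-news.html\">B</a>"
                + "<a href=\"http://www.example.com/mario-review.html\">C</a>";
        // prints [http://www.example.com/zelda-review.html]
        System.out.println(findReviewLinks(html, "zelda"));
    }
}
```

Splitting the two conditions into separate lookaheads avoids having to guess whether the game name comes before or after 'review' in the URL.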
 
Billy Sclater
I figured it out:

String MyRegex= "http://www[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]" + "gamename" + "[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
 
Richard Tookey
Now, I'm a great fan of regular expressions, but that is just dreadful, and it provides ammunition for the guys around here who preach that regexes were invented by the Devil.

In your original post you specified that the target would contain the word 'review', but I don't see it in your regex. Also, I assume the game name is supplied as a variable, so it needs to be escaped so that none of its characters are interpreted as regex metacharacters.
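To illustrate why the escaping matters, here is a small comparison using a hypothetical game name that contains dots. Used raw, each '.' matches any character, so the name can match text it shouldn't; passed through Pattern.quote (a standard java.util.regex.Pattern method), the name is matched literally. The class name and URLs are made up for the example.

```java
import java.util.regex.Pattern;

class QuoteDemo {

    // True if gameName, used directly as a regex, is found somewhere in text.
    static boolean rawMatch(String gameName, String text) {
        return Pattern.compile(gameName).matcher(text).find();
    }

    // True if gameName occurs literally in text. Pattern.quote wraps the name
    // in \Q...\E so characters such as '.' lose their special meaning.
    static boolean quotedMatch(String gameName, String text) {
        return Pattern.compile(Pattern.quote(gameName)).matcher(text).find();
    }

    public static void main(String[] args) {
        String name = "F.E.A.R.";                                   // hypothetical name
        String url = "http://www.example.com/FxExAxRx-review.html"; // hypothetical URL
        System.out.println(rawMatch(name, url));    // true: each '.' matched an 'x'
        System.out.println(quotedMatch(name, url)); // false: the dots must be literal
    }
}
```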
 
Billy Sclater
Yeah, I decided to ditch 'review'. When you say 'escape' the game name, do you mean just adding a forward slash in front of it? Could you show me what you mean?
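Escaping in Java regex uses a backslash, not a forward slash, and the backslash goes in front of each metacharacter inside the name rather than once in front of the whole name. The sketch below shows a hypothetical helper that does this by hand, next to Pattern.quote, which achieves the same effect by wrapping the text in \Q...\E. The helper's metacharacter list is an assumption covering the characters that are special outside a character class.

```java
class ManualEscape {

    // Hypothetical helper: puts a backslash in front of every regex
    // metacharacter so the name is matched literally.
    static String escapeRegex(String s) {
        // The character class lists \ . ^ $ | ? * + ( ) [ ] { }.
        // In the replacement, "\\\\" yields one literal backslash and
        // "$0" re-inserts the matched character after it.
        return s.replaceAll("[\\\\.^$|?*+()\\[\\]{}]", "\\\\$0");
    }

    public static void main(String[] args) {
        System.out.println(escapeRegex("F.E.A.R."));                     // prints F\.E\.A\.R\.
        System.out.println(java.util.regex.Pattern.quote("F.E.A.R.")); // prints \QF.E.A.R.\E
    }
}
```

In practice Pattern.quote is usually the simpler choice, since it doesn't depend on keeping a list of metacharacters up to date.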
 
Jeanne Boyarsky
Consider breaking it up to make it more readable (and to reduce duplication). For example, a first iteration of refactoring could be:


As a second iteration, you could use predefined character classes such as \d (digit) or \w (word character), or extract the common parts into another String. The idea is for the final regex to have less to read.
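A minimal sketch of that kind of refactoring, applied to the regex posted earlier in the thread: the repeated character classes become named constants, and in the second step \w replaces a-zA-Z0-9_ (in Java, without the UNICODE_CHARACTER_CLASS flag, \w is exactly [a-zA-Z_0-9]). The constant and method names are hypothetical, and the game name is passed through Pattern.quote as suggested above, so this is not a character-for-character copy of the original expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RefactoredRegex {

    // First iteration: name the pieces that the original regex repeated.
    static final String URL_CHAR = "[-a-zA-Z0-9+&@#/%?=~_|!:,.;]";
    static final String URL_END_CHAR = "[-a-zA-Z0-9+&@#/%=~_|]";
    static final String URL_PART = URL_CHAR + "*" + URL_END_CHAR;

    // Second iteration: \w already covers a-z, A-Z, 0-9 and '_'.
    static final String URL_CHAR_W = "[-\\w+&@#/%?=~|!:,.;]";
    static final String URL_END_CHAR_W = "[-\\w+&@#/%=~|]";
    static final String URL_PART_W = URL_CHAR_W + "*" + URL_END_CHAR_W;

    // Builds the full expression from a URL-part fragment and returns the
    // first match in html, or null if there is none.
    static String firstMatch(String urlPart, String gameName, String html) {
        String regex = "http://www" + urlPart + Pattern.quote(gameName) + urlPart;
        Matcher m = Pattern.compile(regex).matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.example.com/zelda-review.html\">A</a>";
        // Both print http://www.example.com/zelda-review.html
        System.out.println(firstMatch(URL_PART, "zelda", html));
        System.out.println(firstMatch(URL_PART_W, "zelda", html));
    }
}
```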
 
Billy Sclater
That's awesome, thanks!
 