Need help with regex expression

Billy Sclater
Ranch Hand

Joined: Nov 18, 2012
Posts: 143

I'm trying to write some code that scans an HTML file and returns the links to reviews of a game.
The file contains long URLs in double quotes (""). The specific URLs I am looking for contain
the game name and the word 'review'.

This is what I have so far:

Pattern p = Pattern.compile("http://.*?(?=.*gamename\\sreview).*?(?=\")");

I'm struggling somewhat! Can anyone help?
Billy Sclater
Ranch Hand

Joined: Nov 18, 2012
Posts: 143

I figured it out:

String MyRegex= "http://www[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]" + "gamename" + "[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
Richard Tookey

Joined: Aug 27, 2012
Posts: 1129

Now, I'm a great fan of regular expressions, but that is just dreadful and provides ammunition for the guys round here who preach that regexes were invented by the Devil.

In your OP you specified that the target would contain the word 'review', but I don't see it in your regex. Also, I assume the game name is supplied as a variable, so it needs to be escaped so that none of it is interpreted as regex metacharacters.
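A minimal sketch of what escaping the name could look like, using the standard `Pattern.quote` method. The class name `ReviewLinkFinder`, the method names, the sample HTML, and the assumption that 'review' appears after the game name inside the same quoted URL are all illustrative, not taken from the original code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReviewLinkFinder {

    // Builds a pattern for a double-quoted http URL that contains the game name
    // followed, somewhere later in the same URL, by the word "review".
    // The ordering (name first, then "review") is an assumption.
    public static Pattern buildPattern(String gameName) {
        // Pattern.quote makes the name a literal, so characters such as
        // '.', '+' or '(' in it are not treated as regex metacharacters.
        String name = Pattern.quote(gameName);
        return Pattern.compile(
                "\"(http://[^\"]*" + name + "[^\"]*review[^\"]*)\"",
                Pattern.CASE_INSENSITIVE);
    }

    // Returns every matching URL, without the surrounding quotes.
    public static List<String> findLinks(String html, String gameName) {
        List<String> links = new ArrayList<>();
        Matcher m = buildPattern(gameName).matcher(html);
        while (m.find()) {
            links.add(m.group(1)); // group 1 is the URL between the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical input; the '+' characters would break an unescaped regex.
        String html = "<a href=\"http://www.example.com/c++-quest/review-1\">x</a>"
                + "<a href=\"http://www.example.com/c++-quest/trailer\">y</a>";
        System.out.println(findLinks(html, "c++-quest"));
    }
}
```

Using `[^\"]*` keeps each match inside a single quoted string, which avoids the lookaheads from the first attempt. Note that the regex escape character is the backslash, and `Pattern.quote` saves you from inserting those by hand.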
Billy Sclater
Ranch Hand

Joined: Nov 18, 2012
Posts: 143

Yeah, I decided to ditch 'review'. When you say 'escape' the game name, do you mean just adding a forward slash in front of it? Could you show me what you mean?
Jeanne Boyarsky
author & internet detective

Joined: May 26, 2003
Posts: 31668

Consider breaking it up to make it more readable (and to reduce duplication). For example, a first iteration of refactoring could be:
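A sketch of such a first pass, assuming the only change is to pull the repeated run of URL character classes into one constant. The names `GameUrlRegex`, `URL_CHARS`, and `buildRegex` are hypothetical, and the game name is still concatenated unescaped, as in the posted expression:

```java
import java.util.regex.Pattern;

public class GameUrlRegex {

    // The run of URL characters that appeared twice in the posted expression.
    public static final String URL_CHARS =
            "[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

    // Produces the same regex string as the posted one-liner for a given name.
    public static String buildRegex(String gameName) {
        return "http://www" + URL_CHARS + gameName + URL_CHARS;
    }

    public static void main(String[] args) {
        String regex = buildRegex("gamename");
        // Hypothetical URL used only to exercise the pattern.
        System.out.println(
                Pattern.matches(regex, "http://www.example.com/gamename/review"));
    }
}
```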

As a second iteration, you could use predefined character classes such as digit (\d) or word character (\w), or extract the common parts into another String. The idea is for the final regex to have less to read.
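One way that second pass might look, assuming `\w` (letters, digits, and underscore in Java's default mode) stands in for the `a-zA-Z0-9_` ranges. The class and constant names are made up for illustration:

```java
import java.util.regex.Pattern;

public class GameUrlRegexV2 {

    // \w replaces a-zA-Z0-9_ ; the leading '-' is a literal hyphen.
    public static final String URL_BODY = "[-\\w+&@#/%?=~|!:,.;]*";
    // A URL may not end in punctuation such as '?', '!', ':', ',', '.' or ';'.
    public static final String URL_END = "[-\\w+&@#/%=~|]";
    public static final String URL_CHARS = URL_BODY + URL_END;

    public static Pattern buildPattern(String gameName) {
        // Pattern.quote keeps the name literal even if it contains metacharacters.
        return Pattern.compile(
                "http://www" + URL_CHARS + Pattern.quote(gameName) + URL_CHARS);
    }

    public static void main(String[] args) {
        Pattern p = buildPattern("gamename");
        // Hypothetical URL used only to exercise the pattern.
        System.out.println(
                p.matcher("http://www.example.com/gamename/review").matches());
    }
}
```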

Billy Sclater
Ranch Hand

Joined: Nov 18, 2012
Posts: 143

That's awesome, thanks!