URL string replacing

jack nicolson

Joined: Mar 07, 2008
Posts: 5
Hi All,
I have a small question: I want to extract the URL from an HTML page, like

<a href="" /a>

I want to extract this from the HTML page. The page has only one link, as above. I would like to know a regular expression pattern or a substring-finding method for it.

Waiting for your reply,
dharmendra Rathor

Joined: Mar 20, 2007
Posts: 17
If "<a href="" /a>" is part of an HTML file and we have to retrieve the URL from the HTML page, then we can use a tagged regular expression:

(<a href=")(http://[a-z,.]*/)(" /a>)

The value of the second tagged expression, i.e. (\2), will give the desired output.
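
In Java, a tagged expression corresponds to a capturing group, and \2 is read with Matcher.group(2). Here is a minimal sketch of that idea, not a definitive implementation. The class name TaggedRegexDemo, the method extractUrl and the URL http://example.com/ are made up for illustration, and the third group is closed with a parenthesis so that the pattern compiles.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TaggedRegexDemo {
    // The expression from the post, with the third group closed.
    private static final Pattern LINK =
            Pattern.compile("(<a href=\")(http://[a-z,.]*/)(\" /a>)");

    /** Returns the second capturing group (the URL), or null if nothing matches. */
    static String extractUrl(String html) {
        Matcher m = LINK.matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // http://example.com/ is a hypothetical URL used only for this demo.
        System.out.println(extractUrl("<a href=\"http://example.com/\" /a>"));
    }
}
```

Note that the character class [a-z,.] only accepts lowercase letters, commas and dots, so a URL with digits, capitals or a longer path would need a wider pattern.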
jack nicolson

Joined: Mar 07, 2008
Posts: 5
Thanks for your reply.
However, when I used your expression I got the same result as before.
I did it this way:

// line holds the HTTP response string
line = line.replaceAll("<a href=", "");
// line = line.replaceAll("http://[a-z,.]*/", "");
line = line.replaceAll("/a>", "");

Have I done this correctly?

The web page from which I want to extract the URL is:

<TITLE>Moved Temporarily</TITLE>
<H1>Moved Temporarily</H1>
The document has moved <A HREF="">here</A>.

I want to retrieve only this string ""

Hope this will help you solve my issue.

Campbell Ritchie

Joined: Oct 13, 2005
Posts: 46349
  • Try using either String.indexOf() or a regular expression to match the location of <A href=.
  • Get the index and add the length of <A href= to it. You now have a start index.
  • From that start index, find the first occurrence of "> using the indexOf() method or a regular expression.
  • You now have start and finish indices. Use those to obtain a substring.
  • Put that substring into an ArrayList<URL> or an ArrayList<String>
  • Repeat until you reach the end of the String.
  • I think that will probably work. Try it.
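
The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the names HrefExtractor, extractHrefs and indexOfIgnoreCase are invented, the URL is a placeholder, the marker includes the opening quote so the quote is not part of the result, and the search ignores case because the sample page writes <A HREF= in capitals.

```java
import java.util.ArrayList;
import java.util.List;

class HrefExtractor {
    // Assumed marker: includes the opening quote so it is left out of the result.
    private static final String MARKER = "<a href=\"";

    /** Case-insensitive indexOf, since pages may write <A HREF= in capitals. */
    static int indexOfIgnoreCase(String text, String target, int from) {
        for (int i = Math.max(from, 0); i + target.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, target, 0, target.length())) {
                return i;
            }
        }
        return -1;
    }

    /** Collects every quoted href value found in the given HTML text. */
    static List<String> extractHrefs(String html) {
        List<String> urls = new ArrayList<>();
        int from = 0;
        while (true) {
            int tag = indexOfIgnoreCase(html, MARKER, from);
            if (tag < 0) break;                    // no more links
            int start = tag + MARKER.length();     // first character of the URL
            int end = html.indexOf("\">", start);  // closing quote followed by >
            if (end < 0) break;                    // unterminated tag, stop
            urls.add(html.substring(start, end));
            from = end + 2;                        // continue after the ">
        }
        return urls;
    }

    public static void main(String[] args) {
        // http://example.com/new is a hypothetical URL used only for this demo.
        String page = "The document has moved <A HREF=\"http://example.com/new\">here</A>.";
        System.out.println(extractHrefs(page));
    }
}
```

Checking each index for -1 before calling substring() is what keeps the loop from throwing when a tag is missing or incomplete.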
    jack nicolson

    Joined: Mar 07, 2008
    Posts: 5
    Thanks for your suggestion. I tried accordingly, but I got an out-of-bounds exception:

    int i=line.indexOf("<A href=");
    int len = i+"<A href=".length();
    i = line.indexOf(">",len+1);
    Please let me know if I made any mistake in the code.


    Joanne Neal

    Joined: Aug 05, 2005
    Posts: 3742
    The exception will tell you which line of your code it happened on. Look at the documentation for the methods on that line and see what could cause the exception.

    Campbell Ritchie

    Joined: Oct 13, 2005
    Posts: 46349
    Look closely through the details of the substring() method. You may be getting problems because of the +1.
    In case you are getting lines ending with /a> rather than . . .>link text</a> you might try getting the index of /a> as well, and using that if it is less than the index of >. There is a simple method in the Math class which can do that for you.
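
A rough sketch of that suggestion, with assumptions stated: the names EndIndexDemo and between are invented, the URLs are placeholders, and the result keeps the surrounding quotes. It also checks for -1 before calling substring(), because indexOf() is case-sensitive and returns -1 when the marker is not found. Searching for <A href= in a page that writes <A HREF= is one possible cause of the exception, though the thread does not confirm it.

```java
class EndIndexDemo {
    /**
     * Returns the text between the marker and the nearer of "/a>" and ">",
     * or null if the marker or the closing ">" is not found.
     */
    static String between(String line, String marker) {
        int tag = line.indexOf(marker);           // -1 when absent (case-sensitive)
        if (tag < 0) return null;
        int start = tag + marker.length();
        int gt = line.indexOf(">", start);
        if (gt < 0) return null;                  // nothing closes the tag
        int slashA = line.indexOf("/a>", start);  // may be -1
        int end = (slashA < 0) ? gt : Math.min(gt, slashA);
        return line.substring(start, end).trim();
    }

    public static void main(String[] args) {
        // Both URLs are hypothetical and used only for this demo.
        System.out.println(between("<a href=\"http://example.com/\" /a>", "<a href="));
        System.out.println(between("moved <A HREF=\"http://example.com/new\">here</A>.", "<A HREF="));
        // Wrong case in the marker: indexOf returns -1, so the method returns null.
        System.out.println(between("moved <A HREF=\"http://example.com/new\">here</A>.", "<A href="));
    }
}
```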