URL string replacing

jack nicolson
Greenhorn

Joined: Mar 07, 2008
Posts: 5
Hi all,
I have a small question. I want to extract the URL from an HTML page, for example:

<a href="http://www.yahoo.com/" /a>

I want to extract http://www.yahoo.com/ from the HTML page. The page has only one link, as above. I would like to know a regular expression pattern or a substring-finding method for this.

Waiting for your reply,
Jack
dharmendra Rathor
Greenhorn

Joined: Mar 20, 2007
Posts: 17
If "<a href="http://www.yahoo.com/" /a>" is part of an HTML file and we have to retrieve the URL (http://www.yahoo.com/) from the HTML page, then we can use a tagged regular expression:

(<a href=")(http://[a-z,.]*/)(" /a>)

The value of tagged expression two (i.e. \2) will give the desired output.
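
For readers following along in Java, here is a minimal sketch of the capturing-group idea using java.util.regex. It is not the exact expression posted above: it assumes the URL is whatever sits between the double quotes of the href attribute, and it matches case-insensitively. The class name HrefRegexSketch and the method firstHref are hypothetical names chosen for illustration. Note that in Java the captured group is read with Matcher.group(n) rather than \2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefRegexSketch {
    // Group 1 captures everything between the quotes of the href attribute.
    // CASE_INSENSITIVE lets the same pattern match "<A HREF=" as well as "<a href=".
    // Assumption: the attribute value is double-quoted and href directly follows "<a ".
    private static final Pattern HREF =
            Pattern.compile("<a\\s+href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the first href value found in html, or null if there is none. */
    static String firstHref(String html) {
        Matcher m = HREF.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.yahoo.com/\" /a>";
        System.out.println(firstHref(html));
    }
}
```

Calling replaceAll() to strip the surrounding markup can also work, but reading the group from a Matcher avoids having to remove each piece of the tag separately.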
jack nicolson
Greenhorn

Joined: Mar 07, 2008
Posts: 5
Thanks for your reply.
However, when I used your expression, I got the same result as before.
This is what I did:


// line holds the HTTP response string
line = line.replaceAll("<a href=", "");
// line = line.replaceAll("http://[a-z,.]*/", "");
line = line.replaceAll("/a>", "");

Did I do this correctly?

The web page from which I want to extract the URL is:




<HTML>
<HEAD>
<TITLE>Moved Temporarily</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<H1>Moved Temporarily</H1>
The document has moved <A HREF="http://www.yahoo.com/feeds/default/private/full/?gsessionid=w8O_URi_sRmSo66ZbxfhYQ">here</A>.
</BODY>
</HTML>

I want to retrieve only this string "http://www.yahoo.com/feeds/default/private/full/?gsessionid=w8O_URi_sRmSo66ZbxfhYQ"

I hope this helps you solve my issue.

Thanks
Jack,
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 37941
    
  • Try using either String.indexOf() or a regular expression to match the location of <A href=.
  • Get the index and add the length of <A href= to it. You now have a start index.
  • From that start index, find the first occurrence of "> using the indexOf() method or a regular expression.
  • You now have start and finish indices. Use those to obtain a substring.
  • Put that substring into an ArrayList<URL> or an ArrayList<String>
  • Repeat until you reach the end of the String.
  • I think that will probably work. Try it.
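
The steps above can be sketched roughly as follows. This is only one way to write it, under a few assumptions that are not in the thread: the search runs on a lower-cased copy of the text so that <A HREF= and <a href= both match, the marker includes the opening double quote so the start index lands on the first character of the URL, and the input is plain ASCII so the lower-cased copy has the same length as the original. The names HrefIndexOfSketch and extractHrefs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HrefIndexOfSketch {
    /** Collects every double-quoted href value found in html, in document order. */
    static List<String> extractHrefs(String html) {
        List<String> urls = new ArrayList<>();
        // Assumption: ASCII input, so indices in the lower-cased copy
        // line up with indices in the original string.
        String lower = html.toLowerCase(Locale.ROOT);
        String marker = "<a href=\"";
        int from = 0;
        while (true) {
            int tag = lower.indexOf(marker, from);
            if (tag < 0) {
                break; // indexOf returns -1 when there are no more links
            }
            int start = tag + marker.length();     // first character of the URL
            int end = lower.indexOf("\">", start); // closing quote of the attribute
            if (end < 0) {
                break; // malformed tag: stop rather than guess
            }
            urls.add(html.substring(start, end));  // the end index is exclusive
            from = end;                            // continue after this link
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "The document has moved <A HREF=\"http://www.yahoo.com/feeds/"
                + "default/private/full/?gsessionid=w8O_URi_sRmSo66ZbxfhYQ\">here</A>.";
        System.out.println(extractHrefs(html));
    }
}
```

Checking each indexOf() result for -1 before using it matters here, because passing a negative or out-of-order index to substring() throws a StringIndexOutOfBoundsException.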
jack nicolson
Greenhorn

Joined: Mar 07, 2008
Posts: 5
Thanks for your suggestion. I tried it accordingly, but I got an out-of-bounds exception.

int i = line.indexOf("<A href=");
out.println(i);
int len = i + "<A href=".length();
i = line.indexOf(">", len + 1);
out.println(i);
line = line.substring(len + 1, i + 1);
out.println(line);

Please let me know if I made any mistake in the code.

Thanks,

Jack.
Joanne Neal
Rancher

Joined: Aug 05, 2005
Posts: 3419
The exception will tell you which line of your code it happened on. Look at the documentation for the methods on that line and see what could cause the exception.


Joanne
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 37941
Look closely at the details of the substring() method. You may be getting problems because of the +1.
In case you are getting lines ending with /a> rather than . . .>link text</a>, you might try getting the index of /a> as well, and using that if it is less than the index of >. There is a simple method in the Math class which can do that for you.
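
A small sketch of that last suggestion, assuming the Math method meant is Math.min(). Two facts from the String documentation are relevant: indexOf() returns -1 when the text is not found, and substring(begin, end) excludes the character at end and throws StringIndexOutOfBoundsException if begin is negative, end is past the length, or begin is greater than end. The helper names earliest and upToEarliest, and the class name, are hypothetical. The surrounding quotes are deliberately left in the result to keep the example short.

```java
public class SubstringBoundsSketch {
    /**
     * Returns the smaller of two indexOf() results, ignoring a -1 ("not found").
     * Returns -1 only when neither search found anything.
     */
    static int earliest(int a, int b) {
        if (a < 0) {
            return b;
        }
        if (b < 0) {
            return a;
        }
        return Math.min(a, b);
    }

    /**
     * Returns the trimmed text from start up to whichever of ">" or "/a>"
     * comes first, or null if start is invalid or neither token is found.
     */
    static String upToEarliest(String line, int start) {
        if (start < 0 || start > line.length()) {
            return null;
        }
        int end = earliest(line.indexOf(">", start), line.indexOf("/a>", start));
        // substring() stops just before end, so no +1 is needed on either index.
        return end < 0 ? null : line.substring(start, end).trim();
    }

    public static void main(String[] args) {
        String line = "<a href=\"http://www.yahoo.com/\" /a>";
        int start = line.indexOf("<a href=") + "<a href=".length();
        System.out.println(upToEarliest(line, start));
    }
}
```

Plain Math.min() on its own would pick -1 whenever one of the two searches fails, which is why the sketch checks for negative values first.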
     