URL string replacing

jack nicolson
Greenhorn

Joined: Mar 07, 2008
Posts: 5
Hi All,
I have a small question. I want to extract the URL from an HTML page, like

<a href="http://www.yahoo.com/" /a>

I want http://www.yahoo.com/ to be extracted from the HTML page. The page has only one link, as above. I would like to know the regular expression pattern or substring-finding method for it.

waiting for your reply,
Jack,
dharmendra Rathor
Greenhorn

Joined: Mar 20, 2007
Posts: 17
If "<a href="http://www.yahoo.com/" /a>" is part of an HTML file and we have to retrieve the URL (http://www.yahoo.com/) from the HTML page, then we can use a tagged regular expression:

(<a href=")(http://[a-z,.]*/)(" /a>)

The value of the second tagged expression (\2) will give the desired output.
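In Java, a tagged expression of this kind is a capturing group read back through java.util.regex. Here is a minimal sketch, assuming the href value is always wrapped in double quotes. The class and method names are made up for illustration, and the character class is widened to [^"]* so that paths and query strings match too, which means the URL ends up in group 1 rather than \2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefRegexSketch {
    // Group 1 captures everything between the double quotes of the href attribute.
    // CASE_INSENSITIVE lets the same pattern match <A HREF=...> as well.
    private static final Pattern HREF =
            Pattern.compile("<a\\s+href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the first href value found in html, or null if there is none. */
    static String firstHref(String html) {
        Matcher m = HREF.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.yahoo.com/\" /a>";
        System.out.println(firstHref(html));
    }
}
```

This reads the URL out of the match instead of using replaceAll to delete everything around it, so text elsewhere on the page does not need to be known in advance.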
jack nicolson
Greenhorn

Joined: Mar 07, 2008
Posts: 5
Thanks for your reply.
However, when I used your expression, I got the same result as before.
This is what I did:


// line holds the HTTP response string
line = line.replaceAll("<a href=","");
// line = line.replaceAll("http://[a-z,.]*/","");
line = line.replaceAll("/a>","");

Have I done something wrong?

The web page from which I want to extract the URL is:

<HTML>
<HEAD>
<TITLE>Moved Temporarily</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<H1>Moved Temporarily</H1>
The document has moved <A HREF="http://www.yahoo.com/feeds/default/private/full/?gsessionid=w8O_URi_sRmSo66ZbxfhYQ">here</A>.
</BODY>
</HTML>

I want to retrieve only this string "http://www.yahoo.com/feeds/default/private/full/?gsessionid=w8O_URi_sRmSo66ZbxfhYQ"

Hope this will help you solve my issue.

Thanks
Jack,
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 40071
    
  • Try using either String.indexOf() or a regular expression to match the location of <A href=.
  • Get the index and add the length of <A href= to it. You now have a start index.
  • From that start index, find the first occurrence of "> using the indexOf() method or a regular expression.
  • You now have start and finish indices. Use those to obtain a substring.
  • Put that substring into an ArrayList<URL> or an ArrayList<String>
  • Repeat until you reach the end of the String.
  • I think that will probably work. Try it.
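The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the class and method names are hypothetical, the href value is assumed to be double-quoted, and the search runs on a lower-cased copy so that <A HREF= and <a href= both match. It also looks for the closing quote rather than ">, so that a tag such as <a href="http://www.yahoo.com/" /a> is handled as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class HrefIndexOfSketch {
    private static final String START = "<a href=\"";

    /** Collects every double-quoted href value that follows an <a href=" marker. */
    static List<String> extractHrefs(String html) {
        List<String> urls = new ArrayList<>();
        // Assumption: lower-casing does not change the length of the text
        // (true for plain ASCII pages), so indices are valid in both strings.
        String lower = html.toLowerCase(Locale.ROOT);
        int from = 0;
        while (true) {
            int tag = lower.indexOf(START, from);
            if (tag < 0) {
                break;                              // no more links
            }
            int start = tag + START.length();       // first character of the URL
            int end = html.indexOf('"', start);     // closing quote of the attribute
            if (end < 0) {
                break;                              // unterminated attribute, give up
            }
            urls.add(html.substring(start, end));   // end index is exclusive
            from = end + 1;                         // continue after this link
        }
        return urls;
    }

    public static void main(String[] args) {
        String page = "The document has moved <A HREF=\"http://www.yahoo.com/feeds/default/private/full/"
                + "?gsessionid=w8O_URi_sRmSo66ZbxfhYQ\">here</A>.";
        System.out.println(extractHrefs(page));
    }
}
```

Every indexOf result is checked against -1 before it is used, since passing a negative index on to substring() is an easy way to get an exception.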
    jack nicolson
    Greenhorn

    Joined: Mar 07, 2008
    Posts: 5
    Thanks for your suggestion. I tried it accordingly, but I got an out-of-bounds exception:

    int i=line.indexOf("<A href=");
    out.println(i);
    int len = i+"<A href=".length();
    i = line.indexOf(">",len+1);
    out.println(i);
    line=line.substring(len+1,i+1);
    out.println(line);
    Please let me know if I have made any mistake in the code.

    Thanks,

    Jack.
    Joanne Neal
    Rancher

    Joined: Aug 05, 2005
    Posts: 3742
        
    The exception will tell you which line of your code it happened on. Look at the documentation for the methods on that line and see what could cause the exception.


    Joanne
    Campbell Ritchie
    Sheriff

    Joined: Oct 13, 2005
    Posts: 40071
        
    Look closely through the details of the substring() method. You may be getting problems because of the +1.
    In case you are getting lines ending with /a> rather than . . .>link text</a> you might try getting the index of /a> as well, and using that if it is less than the index of >. There is a simple method in the Math class which can do that for you.
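A minimal sketch of both hints, with a hypothetical class and method name. It drops the +1 because substring(begin, end) already excludes the end index, and it uses Math.min to pick whichever of > and /a> comes first. It also checks indexOf for -1. That may be the cause of the exception if the input is read line by line and some lines contain no link, though the thread does not confirm this.

```java
import java.util.Locale;

class HrefSubstringSketch {
    private static final String START = "<a href=";

    /** Returns the href value in line without its quotes, or null if the line has no link. */
    static String hrefOf(String line) {
        // Assumption: lower-casing keeps the length unchanged (true for ASCII text).
        String lower = line.toLowerCase(Locale.ROOT);
        int tag = lower.indexOf(START);
        if (tag < 0) {
            return null;                        // indexOf returns -1 when nothing matches
        }
        int start = tag + START.length();       // first character after <a href=
        int gt = lower.indexOf(">", start);
        if (gt < 0) {
            return null;                        // tag is never closed on this line
        }
        int slashA = lower.indexOf("/a>", start);
        // Use /a> only if it was found; otherwise Math.min would return -1.
        int end = (slashA >= 0) ? Math.min(gt, slashA) : gt;
        // No +1 on either index: begin is inclusive, end is exclusive.
        String value = line.substring(start, end).trim();
        // Strip the surrounding double quotes, if present.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(hrefOf("<a href=\"http://www.yahoo.com/\" /a>"));
    }
}
```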
     