URL string replacing

jack nicolson
Greenhorn
Joined: Mar 07, 2008
Posts: 5

Hi all, I have a small question. I want to extract a URL from HTML such as <a href="http://www.yahoo.com/" /a>, i.e. I want http://www.yahoo.com/ to be extracted from the HTML page. The page has only one link, as above. I want to know the regular expression pattern or substring-finding method for it. Waiting for your reply, Jack

dharmendra Rathor
Greenhorn
Joined: Mar 20, 2007
Posts: 17

If "<a href="http://www.yahoo.com/" /a>" is part of an HTML file and we have to retrieve the URL (http://www.yahoo.com/) from the HTML page, then we can use the tagged regular expression (<a href=")(http://[a-z,.]*/)(" /a>), and the value of tagged expression two, i.e. \2, will give the desired output.
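A minimal Java sketch of the tagged-expression idea above, using java.util.regex. In Java the text captured by tagged group two is read with Matcher.group(2); the \2 form is a back-reference used inside a pattern, and $2 is the form used in a replacement string. The class and method names are hypothetical, and the pattern is kept exactly as in the reply, so it only accepts lowercase letters, commas and dots between http:// and the final slash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TaggedRegexDemo {
    // The three tagged groups from the reply; group 2 is the URL itself.
    private static final Pattern LINK =
            Pattern.compile("(<a href=\")(http://[a-z,.]*/)(\" /a>)");

    // Returns the text captured by group 2, or null if nothing matches.
    static String extractUrl(String html) {
        Matcher m = LINK.matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // prints http://www.yahoo.com/
        System.out.println(extractUrl("<a href=\"http://www.yahoo.com/\" /a>"));
    }
}
```

Because the pattern is case-sensitive, the same code returns null for a link written as <A HREF="...">.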
jack nicolson
Greenhorn
Joined: Mar 07, 2008
Posts: 5

Thanks for your reply. However, when I used your expression I got the same result as before. I did it this way:

// line holds the HTTP response string
line = line.replaceAll("<a href=", "");
// line = line.replaceAll("http://[a-z,.]*/", "");
line = line.replaceAll("/a>", "");

Have I done this correctly? The web page from which I want to extract the URL is:

<HTML>
<HEAD>
<TITLE>Moved Temporarily</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<H1>Moved Temporarily</H1>
The document has moved <A HREF="http://www.yahoo.com/feeds/default/private/full/?gsessionid=w8O_URi_sRmSo66ZbxfhYQ">here</A>.
</BODY>
</HTML>

I want to retrieve only this string: "http://www.yahoo.com/feeds/default/private/full/?gsessionid=w8O_URi_sRmSo66ZbxfhYQ"

I hope this helps you solve my issue.
Thanks,
Jack
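One likely reason the replaceAll calls change nothing is that they look for lowercase <a href=, while this page writes <A HREF=, and Java regular expressions are case-sensitive by default. The character class [a-z,.] also cannot match the /, ?, _, =, digits and uppercase letters in this URL. A minimal sketch, assuming the URL is always enclosed in double quotes after href=, that matches case-insensitively and captures everything up to the closing quote (the class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefFinder {
    // CASE_INSENSITIVE lets "<a href=" also match "<A HREF=".
    // Group 1 captures every character up to the next double quote.
    private static final Pattern HREF =
            Pattern.compile("<a\\s+href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the first href value, or null if the page has no link.
    static String firstHref(String html) {
        Matcher m = HREF.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String page = "<HTML> <HEAD> <TITLE>Moved Temporarily</TITLE> </HEAD> "
                + "<BODY BGCOLOR=\"#FFFFFF\" TEXT=\"#000000\"> <H1>Moved Temporarily</H1> "
                + "The document has moved <A HREF=\"http://www.yahoo.com/feeds/default/private/full/"
                + "?gsessionid=w8O_URi_sRmSo66ZbxfhYQ\">here</A>. </BODY> </HTML>";
        // prints http://www.yahoo.com/feeds/default/private/full/?gsessionid=w8O_URi_sRmSo66ZbxfhYQ
        System.out.println(firstHref(page));
    }
}
```

Extracting with find() and group(1) avoids having to delete everything around the URL with several replaceAll calls.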
Campbell Ritchie
Sheriff
Joined: Oct 13, 2005
Posts: 32712

Try this:

1. Use either String.indexOf() or a regular expression to find the location of <A href=.
2. Take that index and add the length of <A href= to it. You now have a start index.
3. From that start index, find the first occurrence of "> using the indexOf() method or a regular expression.
4. You now have start and finish indices. Use them to obtain a substring.
5. Put that substring into an ArrayList<URL> or an ArrayList<String>.
6. Repeat until you reach the end of the String.

I think that will probably work. Try it.
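The steps above can be sketched in Java roughly as follows. This is a sketch under a few assumptions: the href value is always double-quoted, so the search string includes the opening quote and the end marker is ">; the search for <a href=" ignores case (via String.regionMatches) because the page in question writes <A HREF=; and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class HrefExtractor {
    private static final String OPEN = "<a href=\"";  // includes the opening quote
    private static final String CLOSE = "\">";         // closing quote plus end of tag

    // Case-insensitive indexOf built on String.regionMatches.
    static int indexOfIgnoreCase(String text, String target, int from) {
        for (int i = Math.max(from, 0); i <= text.length() - target.length(); i++) {
            if (text.regionMatches(true, i, target, 0, target.length())) {
                return i;
            }
        }
        return -1;
    }

    // Collects every href value, repeating until no more links are found.
    static List<String> extractHrefs(String html) {
        List<String> urls = new ArrayList<>();
        int from = 0;
        while (true) {
            int open = indexOfIgnoreCase(html, OPEN, from);
            if (open < 0) break;                    // no more links
            int start = open + OPEN.length();       // step 2: start index
            int end = html.indexOf(CLOSE, start);   // step 3: finish index
            if (end < 0) break;                     // unterminated tag
            urls.add(html.substring(start, end));   // steps 4 and 5
            from = end + CLOSE.length();            // step 6: continue after this link
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "The document has moved <A HREF=\"http://www.yahoo.com/feeds/"
                + "default/private/full/?gsessionid=w8O_URi_sRmSo66ZbxfhYQ\">here</A>.";
        System.out.println(extractHrefs(html));
    }
}
```

Every index is checked for -1 before it is used, which is what keeps substring() from being called with invalid bounds.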
jack nicolson
Greenhorn
Joined: Mar 07, 2008
Posts: 5

Thanks for your suggestion. I tried it accordingly, but I got an out-of-bounds exception:

int i = line.indexOf("<A href=");
out.println(i);
int len = i + "<A href=".length();
i = line.indexOf(">", len + 1);
out.println(i);
line = line.substring(len + 1, i + 1);
out.println(line);

Please let me know if I have made any mistake in the code.
Thanks,
Jack
Joanne Neal
Rancher
Joined: Aug 05, 2005
Posts: 3011

The exception will tell you which line of your code it happened on. Look at the documentation for the methods on that line and see what could cause the exception.

Joanne
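For reference, this is what the String documentation says about the two methods involved: indexOf returns -1 when the text is not found, and substring(beginIndex, endIndex) throws an IndexOutOfBoundsException (a StringIndexOutOfBoundsException in practice) if beginIndex is negative, endIndex is larger than the length, or beginIndex is larger than endIndex. indexOf is also case-sensitive, so "<A href=" is not found in a page that writes <A HREF=. The small check below illustrates both points; the class name, helper and sample string are made up for illustration.

```java
class IndexDemo {
    // Returns true if substring(begin, end) throws, false otherwise.
    static boolean substringFails(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String line = "moved <A HREF=\"http://www.yahoo.com/\">here</A>";
        // Case-sensitive search: "<A href=" does not occur in the text.
        int i = line.indexOf("<A href=");
        System.out.println(i);                            // prints -1
        // begin greater than end is one of the documented failure cases.
        System.out.println(substringFails(line, 10, 5));  // prints true
        System.out.println(substringFails(line, 0, 5));   // prints false
    }
}
```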
Campbell Ritchie
Sheriff
Joined: Oct 13, 2005
Posts: 32712

Look closely at the details of the substring() method. You may be getting problems because of the +1. In case you are getting lines ending with /a> rather than . . .>link text</a>, you might try getting the index of /a> as well, and using that if it is less than the index of >. There is a simple method in the Math class that can do that for you.
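Following those hints, a reworked version of the earlier snippet might look like the sketch below. Assumptions: the search for <a href= ignores case (the page writes <A HREF=), neither substring index gets a +1 (the end index of substring is already exclusive), Math.min picks whichever of > or /a> comes first while ignoring a marker that was not found (-1), and surrounding quotes are stripped afterwards. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedSnippet {
    // LITERAL: no regex metacharacters; CASE_INSENSITIVE: also matches "<A HREF=".
    private static final Pattern OPEN =
            Pattern.compile("<a href=", Pattern.LITERAL | Pattern.CASE_INSENSITIVE);

    // The smaller of two indexOf results, ignoring -1 ("not found").
    static int firstFound(int a, int b) {
        if (a < 0) return b;
        if (b < 0) return a;
        return Math.min(a, b);
    }

    // Returns the link target, or null if no complete link is found.
    static String extract(String line) {
        Matcher m = OPEN.matcher(line);
        if (!m.find()) {
            return null;                 // the equivalent of indexOf returning -1
        }
        int start = m.end();             // first character after <a href=
        int end = firstFound(line.indexOf(">", start), line.indexOf("/a>", start));
        if (end < 0) {
            return null;                 // no closing marker after the start
        }
        // No +1 on either index: substring's end index is exclusive.
        String value = line.substring(start, end).trim();
        // Strip the surrounding double quotes, if present.
        if (value.startsWith("\"")) value = value.substring(1);
        if (value.endsWith("\"")) value = value.substring(0, value.length() - 1);
        return value;
    }

    public static void main(String[] args) {
        // prints http://www.yahoo.com/feeds/
        System.out.println(extract("moved <A HREF=\"http://www.yahoo.com/feeds/\">here</A>."));
        // prints http://www.yahoo.com/
        System.out.println(extract("<a href=\"http://www.yahoo.com/\" /a>"));
    }
}
```

Checking each index before calling substring() turns the out-of-bounds exception into a null result that the caller can test for.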