how to extract search engine results

amitha reddy
Greenhorn

Joined: Apr 13, 2005
Posts: 1
Hi, I am new to Java.
I want to extract URLs from a search engine such as Google.
I tried it, but I am getting only one URL.
How can I get all the URLs?
Here is my code:
import java.net.*;
import java.io.*;
import java.util.*;

class Googly2
{
    public static void main(String[] args)
    {
        try {
            // Reading the keyword from text file
            DataInputStream din = new DataInputStream(new BufferedInputStream(new FileInputStream("C:\\Documents and Settings\\Administrator\\Desktop\\keywordfile.txt")));
            String str;
            while ((str = din.readLine()) != null)
            {
                System.out.println("\nKeyword :" + str);
                // Creating URL
                URL url = new URL("http://www.google.com/sponsoredlinks?hl=en&lr=&q=" + str);
                URLConnection conn = url.openConnection();
                conn.setRequestProperty("User-Agent", "");
                conn.connect();
                // Reading the page
                BufferedReader in =
                    new BufferedReader(new InputStreamReader(conn.getInputStream()));
                String line;
                String lines;
                String result = "";
                String urlResult = "";

                int c = 0;
                while ((line = in.readLine()) != null)
                {
                    if (line.indexOf("return ss") != -1)
                    {
                        c++;
                        System.out.println("C :" + c);
                    }
                    if (line.indexOf("return ss") != -1)
                    {
                        System.out.println("C :" + c);
                        urlResult = line.substring(line.indexOf("return ss"), line.indexOf("onMouseOut"));
                    }
                }

                System.out.println("\nURL : " + urlResult);
            }
        } catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
David Harkness
Ranch Hand

Joined: Aug 07, 2003
Posts: 1646
The if tests in your while loop are duplicates. Perhaps you want this:

Also, in the else block, you might want to make sure the line also contains "onMouseOut", or you'll get an exception.
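For illustration only, here is a minimal sketch of one way to act on the points above. It is not the code from the original reply. In the posted program, urlResult is overwritten on every matching line and printed once after the loop, so only the last match is shown; the sketch collects every match in a list instead, and skips any match where "onMouseOut" is missing so that substring cannot throw. The class name LinkExtractor, the method extractAll, and the sample lines in main are hypothetical, and whether the "return ss" and "onMouseOut" markers match the real page markup is an assumption carried over from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the original post.
class LinkExtractor
{
    // Markers taken from the question; assumed to appear in the page source.
    private static final String START = "return ss";
    private static final String END = "onMouseOut";

    // Collects every fragment that starts at START and stops just before the next END.
    public static List<String> extractAll(Iterable<String> lines)
    {
        List<String> results = new ArrayList<String>();
        for (String line : lines)
        {
            int from = 0;
            while (true)
            {
                int start = line.indexOf(START, from);
                if (start == -1)
                {
                    break; // no further matches on this line
                }
                int end = line.indexOf(END, start);
                if (end == -1)
                {
                    break; // END marker missing: skip rather than let substring throw
                }
                results.add(line.substring(start, end));
                from = end + END.length();
            }
        }
        return results;
    }

    public static void main(String[] args)
    {
        // Made-up sample lines standing in for the downloaded page.
        List<String> page = Arrays.asList(
            "x return ss('http://example.com/a') onMouseOut y",
            "no link on this line",
            "return ss('http://example.com/b') with the end marker missing",
            "return ss('http://example.com/c') onMouseOut");
        for (String fragment : extractAll(page))
        {
            System.out.println("URL : " + fragment);
        }
    }
}
```

In the posted program, the same idea would mean adding each match to a list inside the inner while loop and printing the list after it, in place of the single urlResult string. The extracted fragments still include the "return ss(" prefix, as in the original substring call, so they would need further trimming to get the bare URL.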