
Finding the largest table in a web page and displaying it

 
Rashmi Raju
Greenhorn
Posts: 4
Hi friends, I need to open a URL and display only the contents of the largest table in the web page, using regular expressions.
I have written code that opens a URL and displays all the data between the <table> and </table> tags in a web page.
Please help me correct the code below so that it finds the largest table and displays its content.
The code is:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConnectionTest {
    public static void main(String[] args) {
        // DOTALL lets '.' match line breaks, so a table may span several lines.
        Pattern regexp = Pattern.compile("<table(.*?)</table>", Pattern.DOTALL);
        try {
            URL url = new URL("http://www.shopping.com");
            URLConnection connection = url.openConnection();

            // Read the whole page first: matching one line at a time would miss
            // every table that spans more than one line.
            // BufferedReader replaces the deprecated DataInputStream.readLine().
            StringBuilder page = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(connection.getInputStream()))) {
                String inputLine;
                while ((inputLine = in.readLine()) != null) {
                    page.append(inputLine).append('\n');
                }
            }

            // Print every <table ...>...</table> block found in the page.
            Matcher matcher = regexp.matcher(page);
            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        } catch (MalformedURLException me) {
            System.out.println("MalformedURLException: " + me);
        } catch (IOException ioe) {
            System.out.println("IOException: " + ioe);
        }
    }
}

Can anyone please help me?
 
Ulf Dittmer
Rancher
Pie
Posts: 42966
73
How do you define "largest"? If by the length of its string contents, then that's easy to test. If you mean the number of rows or columns, then it gets trickier.

Note that tables may be nested, so you may need to adapt your regexps for that.
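
To illustrate the nesting problem: a reluctant pattern such as "<table(.*?)</table>" stops at the first closing tag, so an outer table gets cut off at the end of its first inner table. One possible workaround, sketched below, is to scan for opening and closing table tags and keep a depth counter. The class name NestedTableScanner and the method topLevelTables are made up for this example, and the sketch assumes reasonably well-formed HTML (no "<table>" text inside comments or scripts, no ">" inside attribute values).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class NestedTableScanner {
    // Matches either an opening <table ...> tag or a closing </table> tag.
    private static final Pattern TABLE_TAG =
            Pattern.compile("<table\\b[^>]*>|</table\\s*>", Pattern.CASE_INSENSITIVE);

    /** Returns every top-level table, each including any tables nested inside it. */
    static List<String> topLevelTables(String html) {
        List<String> tables = new ArrayList<>();
        Matcher tag = TABLE_TAG.matcher(html);
        int depth = 0;   // how many tables are currently open
        int start = -1;  // index where the current top-level table begins
        while (tag.find()) {
            boolean closing = tag.group().startsWith("</");
            if (!closing) {
                if (depth == 0) {
                    start = tag.start();
                }
                depth++;
            } else if (depth > 0) { // stray closing tags are ignored
                depth--;
                if (depth == 0) {
                    tables.add(html.substring(start, tag.end()));
                }
            }
        }
        return tables;
    }

    public static void main(String[] args) {
        String html = "<table><tr><td><table><tr><td>inner</td></tr></table></td></tr></table>"
                + "<table><tr><td>second</td></tr></table>";
        // Pick the longest top-level table by string length.
        String largest = "";
        for (String table : topLevelTables(html)) {
            if (table.length() > largest.length()) {
                largest = table;
            }
        }
        System.out.println(largest);
    }
}
```

Since an outer table always contains its inner tables, the longest table by string length is necessarily a top-level one, which is why the sketch only collects those.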

For HTML screen scraping I'd use a library like jWebUnit. It provides access to the page content via an API. That should be a lot easier than trying to write your own.
 
Rashmi Raju
Greenhorn
Posts: 4
Yes, it is by the length of its string contents.
Could you please tell me what to change in the code?
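
With "largest" taken to mean the longest matched string, one possible modification is to loop over all matches and remember the longest one instead of printing each match. The sketch below is only an illustration: the class name LargestTableFinder and the method findLargestTable are made up, the page is assumed to have been read into a single String already, and nested tables are not handled (the reluctant pattern stops at the first closing tag). The CASE_INSENSITIVE flag is an addition so that <TABLE> is matched as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class LargestTableFinder {
    // Same pattern as in the original post, plus CASE_INSENSITIVE (an addition).
    private static final Pattern TABLE =
            Pattern.compile("<table(.*?)</table>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns the longest <table ...>...</table> match in html, or null if there is none. */
    static String findLargestTable(String html) {
        String largest = null;
        Matcher matcher = TABLE.matcher(html);
        while (matcher.find()) {
            String candidate = matcher.group();
            // Keep the candidate only if it is longer than the best one so far.
            if (largest == null || candidate.length() > largest.length()) {
                largest = candidate;
            }
        }
        return largest;
    }

    public static void main(String[] args) {
        // Sample input standing in for the downloaded page.
        String html = "<html><table><tr><td>a</td></tr></table>"
                + "<p>text</p>"
                + "<table>\n<tr><td>a much longer cell</td></tr>\n</table></html>";
        System.out.println(findLargestTable(html));
    }
}
```

In the posted program, this would mean collecting the lines read from the connection into one String and passing that String to findLargestTable, since a table usually spans several lines.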
 