Finding the largest table in a web page and displaying it

Rashmi Raju
Greenhorn

Joined: Apr 24, 2008
Posts: 4
Hi friends, I need to open a URL and display only the contents of the largest table in the web page, using regular expressions.
I have written code that opens a URL and displays all the data between the <table> and </table> tags in the page.
Please help me correct the code below so that it finds the largest table and displays its content.
The code is:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConnectionTest {
    public static void main(String[] args) {
        try {
            URL url = new URL("http://www.shopping.com");
            URLConnection connection = url.openConnection();

            // Read the whole page into one String. A table usually spans many lines,
            // so matching one line at a time never sees a complete <table>...</table>.
            // (Assumes the page is UTF-8; DataInputStream.readLine() is deprecated.)
            StringBuilder page = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
                String inputLine;
                while ((inputLine = in.readLine()) != null) {
                    page.append(inputLine).append('\n');
                }
            }

            // DOTALL lets '.' match newlines; CASE_INSENSITIVE also catches <TABLE>
            Pattern regexp = Pattern.compile("<table(.*?)</table>",
                    Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
            Matcher matcher = regexp.matcher(page);

            // Print every table on the page, not just the first match
            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        } catch (MalformedURLException me) {
            System.out.println("MalformedURLException: " + me);
        } catch (IOException ioe) {
            System.out.println("IOException: " + ioe);
        }
    }
}

Can anyone please help me?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41114
    
How do you define "largest"? If by the length of its string contents, then that's easy to test. If you mean number of rows or columns, then it gets more tricky.
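A minimal sketch of the length-based test, assuming the whole page has already been read into one String (reading it line by line splits tables across lines). "Length" is taken here to mean the visible text left after stripping tags, which is an assumption; the class `LargestTable`, the helper `textLength` and the sample HTML are made up for illustration, and nested tables are not handled:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LargestTable {
    // Same pattern as in the question: DOTALL so '.' also matches newlines,
    // CASE_INSENSITIVE so <TABLE> is found too. Assumes tables are not nested.
    private static final Pattern TABLE =
            Pattern.compile("<table(.*?)</table>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Assumption: "length" = characters left after removing all tags
    private static int textLength(String table) {
        return table.replaceAll("<[^>]*>", "").length();
    }

    /** Returns the <table>...</table> match with the most text, or null if none. */
    public static String largestTable(String html) {
        Matcher matcher = TABLE.matcher(html);
        String largest = null;
        while (matcher.find()) {
            String table = matcher.group();
            if (largest == null || textLength(table) > textLength(largest)) {
                largest = table;
            }
        }
        return largest;
    }

    public static void main(String[] args) {
        // Hypothetical page text; in the real program this is the String read from the URL
        String html = "<p>intro</p>\n"
                + "<table><tr><td>a</td></tr></table>\n"
                + "<TABLE>\n<tr><td>bbb</td></tr>\n<tr><td>ccc</td></tr>\n</TABLE>";
        System.out.println(largestTable(html));
    }
}
```

Comparing `table.length()` instead of `textLength(table)` would count the markup as well; which one is right depends on what "largest" should mean for the page.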

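Note that tables may be nested, so you may need to adapt your regexps for that: a non-greedy pattern like <table(.*?)</table> stops at the first </table> it sees, which is the inner table's closing tag.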
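One way to cope with nesting without a single clever regex is to scan for opening and closing table tags and pair them with a stack, so each closing tag completes the innermost open table. A sketch under that assumption (the class `NestedTables` and the sample markup are hypothetical); it still gets confused by `<table` text inside comments, scripts or attribute values:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedTables {
    // Matches an opening tag (<table ...>) or a closing tag (</table>);
    // group(1) is "/" for a closing tag and "" for an opening one
    private static final Pattern TAG =
            Pattern.compile("<(/?)table\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Returns every complete table; inner tables come before the tables that contain them. */
    public static List<String> tables(String html) {
        List<String> result = new ArrayList<>();
        Deque<Integer> openStarts = new ArrayDeque<>(); // offsets of <table> tags not yet closed
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            if (m.group(1).isEmpty()) {
                openStarts.push(m.start());
            } else if (!openStarts.isEmpty()) { // a stray </table> is ignored
                int start = openStarts.pop();
                result.add(html.substring(start, m.end()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<table><tr><td><table><tr><td>inner</td></tr></table></td></tr></table>";
        for (String t : tables(html)) {
            System.out.println(t);
        }
    }
}
```

The list this returns can then be fed to whatever size comparison you settle on, instead of the raw regex matches.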

For HTML screen scraping I'd use a library like jWebUnit. It provides access to the page content via an API. That should be a lot easier than trying to write your own.
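This sketch does not use jWebUnit. As a stand-in it uses the HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`, an old HTML 3.2 parser) to count the rows of each table, which covers the "number of rows" meaning of largest. The class `TableRowCounter` and the sample HTML are hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TableRowCounter {
    /** Returns the row count of each table, in the order the tables open. */
    public static List<Integer> rowCounts(String html) throws IOException {
        List<Integer> counts = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>(); // indexes (into counts) of tables still open
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TABLE) {
                    counts.add(0);
                    open.push(counts.size() - 1);
                } else if (tag == HTML.Tag.TR && !open.isEmpty()) {
                    int idx = open.peek(); // a row belongs to the innermost open table
                    counts.set(idx, counts.get(idx) + 1);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TABLE && !open.isEmpty()) {
                    open.pop();
                }
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String html = "<table><tr><td>a</td></tr></table>"
                + "<table><tr><td>b</td></tr><tr><td>c</td></tr></table>";
        List<Integer> counts = rowCounts(html);
        if (counts.isEmpty()) {
            System.out.println("no tables");
            return;
        }
        int best = 0;
        for (int i = 1; i < counts.size(); i++) {
            if (counts.get(i) > counts.get(best)) {
                best = i;
            }
        }
        System.out.println("row counts: " + counts + ", most rows in table #" + best);
    }
}
```

A parser deals with nesting and sloppy markup for you, which is the main argument for a library over hand-written regexps.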


Rashmi Raju
Greenhorn

Joined: Apr 24, 2008
Posts: 4
Yes, it is by the length of its string contents.
Could you please tell me how to modify the code?
 