Display tables from web page

Raju Muni

Joined: Jul 02, 2008
Posts: 1
Hi guys, I am doing a web project where I need to parse the HTML code, find the largest table in terms of its width attribute, and display only that table, taking care of nested tables in the web page. Below is my code, which finds the largest table, but it gives an error when displaying it. Please help me with the program, as I am new to Java.

Looking forward to your help, as I am not very good at string programming in Java. Please help me out, guys, as early as possible.

[edit]Add code tags. CR[/edit]
[ July 02, 2008: Message edited by: Campbell Ritchie ]
Charles Lyons
Ranch Hand

Joined: Mar 27, 2003
Posts: 836
A few thoughts to help you (I hope):
  • You mention needing to take care of nested tables but not how: do you want to include nested tables in your results or not?
  • If you want to use regex, it may help you considerably, but you can help yourself further by incorporating as much as possible into a single pattern. Think about how you might capture an entire <table ... width="xxx" ...> start tag so that you can determine the width straight away in one invocation (hint: use a capturing group for the "xxx"). Once you've done this, your only challenge is to address the first point above (hint: keep a running counter of open table tags: increment it on each start tag and decrement it on each closing table tag).
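As a rough illustration of both hints, here is a minimal sketch that assumes the page is already in a String and that widths are plain numbers (width="300", with or without quotes). The class TableWidths, the method describeTables and the sample HTML are made up for this example. One pattern matches both opening and closing table tags, an optional capturing group picks up the width, and a counter tracks how deeply each table is nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableWidths {
    // One case-insensitive pattern for both <table ...> and </table> tags.
    // Group 1: "/" for a closing tag, "" for an opening tag.
    // Group 2: digits of a numeric width="..." attribute, or null if there is none
    //          (percentage widths such as width="100%" are deliberately skipped).
    static final Pattern TABLE_TAG = Pattern.compile(
            "<(/?)table\\b(?:[^>]*?\\bwidth\\s*=\\s*[\"']?(\\d+)(?![\\d%]))?[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Returns one "depth=D width=W" entry per opening table tag, in document order.
    static List<String> describeTables(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = TABLE_TAG.matcher(html);
        int depth = 0; // how many <table> tags are currently open
        while (m.find()) {
            if (m.group(1).equals("/")) {
                depth--; // a closing tag leaves one level of nesting
            } else {
                depth++; // an opening tag enters one level of nesting
                result.add("depth=" + depth + " width=" + m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample page: a 300-wide table containing a 500-wide one,
        // followed by a table whose width is a percentage.
        String html = "<table width=\"300\"><tr><td>"
                + "<table width='500'><tr><td>inner</td></tr></table>"
                + "</td></tr></table>"
                + "<TABLE border=\"1\" width=\"100%\"><tr><td>x</td></tr></TABLE>";
        describeTables(html).forEach(System.out::println);
    }
}
```

For the sample, main prints depth=1 width=300, depth=2 width=500 and depth=1 width=null. A depth of 1 marks a top-level table, so nested tables can be skipped or kept depending on your answer to the first question. Ignoring percentage widths is a choice you may want to change.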

Charles Lyons (SCJP 1.4, April 2003; SCJP 5, Dec 2006; SCWCD 1.4b, April 2004)
Author of OCEJWCD Study Companion for Oracle Exam 1Z0-899 (ISBN 0955160340)
Campbell Ritchie

Joined: Oct 13, 2005
Posts: 46405
Welcome to JavaRanch

I have edited your post to add "code" tags; you can see how much easier the code is to read. You can add them yourself in subsequent posts with the buttons beneath the "message" window.
Charles Lyons
Ranch Hand

Joined: Mar 27, 2003
Posts: 836
Thanks Campbell - now I can actually read the code.

It seems as though you already have the counter for start tags (your variable "start"). Let me help you with the regex:

Now that code will match every occurrence of an opening <table> tag (make sure you understand why it works!). So you just need to add some code to the loop that takes account of nested tables and either ignores or extracts them, plus the code to extract all the lines of text between matching start and end tags (hint: you can adapt the regex above and compile it with the DOTALL flag, so that . also matches line terminators; MULTILINE mode only changes how ^ and $ match).
[ July 03, 2008: Message edited by: Charles Lyons ]
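For the extraction step, a reluctant pattern such as <table\b[^>]*>(.*?)</table> compiled with Pattern.DOTALL does let . cross line breaks, but when tables are nested the .*? stops at the first </table>, which belongs to the inner table. The minimal sketch below therefore scans table tags and keeps a stack of open tables (a counter that also remembers where each table started), so every closing tag is paired with its own start tag. It assumes reasonably well-formed markup, numeric widths, and no table tags hidden in comments or scripts; the class WidestTable, the method widestTable and the sample HTML are hypothetical. For messy real-world pages, a proper HTML parser is more robust than regex.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WidestTable {
    // Group 1 is "/" for </table> and "" for <table ...>; group 2 is a numeric
    // width of up to 9 digits (so it always fits in an int), or null if absent.
    static final Pattern TABLE_TAG = Pattern.compile(
            "<(/?)table\\b(?:[^>]*?\\bwidth\\s*=\\s*[\"']?(\\d{1,9})(?![\\d%]))?[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Returns the full markup (start tag to matching end tag, nested tables included)
    // of the table with the largest numeric width, or null if no table has one.
    static String widestTable(String html) {
        // Each entry holds {offset of the <table> start tag, width or -1 if absent}.
        Deque<int[]> open = new ArrayDeque<>();
        String best = null;
        int bestWidth = -1;
        Matcher m = TABLE_TAG.matcher(html);
        while (m.find()) {
            if (!m.group(1).equals("/")) {
                int width = m.group(2) == null ? -1 : Integer.parseInt(m.group(2));
                open.push(new int[] {m.start(), width});
            } else if (!open.isEmpty()) {
                int[] table = open.pop(); // the innermost open table is the one closing
                if (table[1] > bestWidth) {
                    bestWidth = table[1];
                    best = html.substring(table[0], m.end());
                }
            }
            // A stray </table> with nothing open, or a table never closed, is ignored.
        }
        return best;
    }

    public static void main(String[] args) {
        String html = "<p>intro</p>"
                + "<table width=\"300\"><tr><td>"
                + "<table width=\"200\"><tr><td>inner</td></tr></table>"
                + "</td></tr></table>"
                + "<table width=\"150\"><tr><td>small</td></tr></table>";
        System.out.println(widestTable(html));
    }
}
```

With the sample in main, the outer 300-wide table wins and its nested 200-wide table comes back as part of its markup, so writing the returned string to your page displays only that table. If you decide to exclude nested tables instead, you could record a candidate only when the stack is empty after the pop.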