Hi all, I am working on a web project where I need to parse a page's HTML, find the largest table by its width attribute, and display only that table, taking nested tables into account. Below is my code. It finds the largest table but throws an error when displaying it. Please help me with the program, as I am new to Java.
I am not very good at string programming in Java, so I would appreciate any help as soon as possible. Looking forward to your help.
You mention needing to take care of nested tables but not how: do you want to include nested tables in your results or not?
Regex may help you considerably here, and you can help yourself further by doing as much of the work as possible in a single pattern. Think about how you could capture an entire <table ... width="xxx" ...> start tag so that you get the width straight away from one match (hint: use a capturing group for the "xxx"). Once you've done this, your only remaining challenge is the nested-table question above (hint: keep a counter of open table tags, incrementing it on each start tag and decrementing it on each closing table tag).
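A minimal sketch of both hints, assuming plain java.util.regex: one pattern matches either a closing </table> tag or an opening <table ...> tag, capturing the digits of a width attribute when the tag has one, and a depth counter tracks nesting. The class name TableWidths, the method describe and the sample HTML are hypothetical. The pattern assumes the width is a plain number (a value such as "100%" would yield only 100), and since it is a regex rather than an HTML parser, text like "<table" inside comments or scripts will fool it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableWidths {

    // Group 1: a closing </table> tag.
    // Group 2: the digits of width="xxx" in an opening <table ...> tag (null if absent).
    // Assumption: the width value is a plain integer, quoted or unquoted.
    static final Pattern TABLE_TAG = Pattern.compile(
            "(</table\\s*>)|<table\\b(?:[^>]*?\\bwidth\\s*=\\s*[\"']?(\\d+))?[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // Lists every opening <table> tag, in document order, with its nesting depth and width.
    static List<String> describe(String html) {
        List<String> result = new ArrayList<>();
        int depth = 0; // number of <table> tags currently open
        Matcher m = TABLE_TAG.matcher(html);
        while (m.find()) {
            if (m.group(1) != null) {
                depth--;                 // closing tag: leave one nesting level
            } else {
                result.add("depth " + depth + ", width " + m.group(2));
                depth++;                 // opening tag: go one level deeper
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<table width=\"300\"><tr><td>"
                + "<table border=\"1\" width=\"800\"><tr><td>inner</td></tr></table>"
                + "</td></tr></table>"
                + "<TABLE WIDTH=500><tr><td>second</td></tr></TABLE>";
        describe(html).forEach(System.out::println);
    }
}
```

Running main prints "depth 0, width 300", "depth 1, width 800" and "depth 0, width 500". The optional width group works because the lazy [^>]*? sits inside it, so the engine scans the whole tag for width= before giving up and matching the tag without a width.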
Charles Lyons (SCJP 1.4, April 2003; SCJP 5, Dec 2006; SCWCD 1.4b, April 2004)
Author of OCEJWCD Study Companion for Oracle Exam 1Z0-899 (ISBN 0955160340 / Amazon / Amazon UK)
I have edited your post to add "code" tags; you can see how much easier the code is to read. You can add them yourself in subsequent posts with the buttons beneath the "message" window.
Thanks, Campbell. Now I can actually read the code.
It seems as though you already have the counter for start tags (your variable "start"). Let me help you with the regex:

Now that code will match every occurrence of an opening <table> tag (make sure you understand why it works!). So you just need to add some code to the loop that takes nested tables into account, either ignoring or extracting them, plus code that extracts all the text between matching start and end tags (hint: you can adapt the regex above and compile it with the Pattern.DOTALL flag, so that "." also matches line breaks; MULTILINE mode only changes how "^" and "$" match).
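One possible sketch of the extraction step, not based on the original poster's code (which isn't shown here). A lazy DOTALL pattern such as <table.*?</table> would stop at the first inner </table>, so this version swaps the plain counter for a stack of open tables, pairing each closing tag with its own opening tag, and cuts the markup out with Matcher.start() and Matcher.end(). Because [^>] also matches line breaks, tags spanning several lines work without DOTALL. The names WidestTable, widestTable, OpenTable and the topLevelOnly flag are hypothetical; the flag answers the nested-table question either way. Tables without a width attribute, and tables that are never closed, are never chosen.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WidestTable {

    // Group 1: a closing </table> tag.
    // Group 2: the digits of width="xxx" in an opening <table ...> tag (null if absent).
    static final Pattern TABLE_TAG = Pattern.compile(
            "(</table\\s*>)|<table\\b(?:[^>]*?\\bwidth\\s*=\\s*[\"']?(\\d+))?[^>]*>",
            Pattern.CASE_INSENSITIVE);

    // A <table> whose closing tag has not been seen yet.
    static final class OpenTable {
        final int start;  // offset of the '<' of its start tag
        final int width;  // width attribute, or -1 if it has none
        final int depth;  // 0 for a top-level table, 1 for one nested a level deep, ...

        OpenTable(int start, int width, int depth) {
            this.start = start;
            this.width = width;
            this.depth = depth;
        }
    }

    // Returns the markup of the widest candidate table, from its start tag to its matching
    // end tag (nested tables included), or null if no candidate has a width attribute.
    // With topLevelOnly, nested tables are never candidates on their own.
    static String widestTable(String html, boolean topLevelOnly) {
        Deque<OpenTable> open = new ArrayDeque<>();
        String best = null;
        int bestWidth = -1;
        Matcher m = TABLE_TAG.matcher(html);
        while (m.find()) {
            if (m.group(1) == null) {
                // Opening tag: remember where it starts; open.size() is its nesting depth.
                int width = m.group(2) == null ? -1 : Integer.parseInt(m.group(2));
                open.push(new OpenTable(m.start(), width, open.size()));
            } else if (!open.isEmpty()) { // a stray </table> with nothing open is ignored
                // Closing tag: it belongs to the most recently opened table.
                OpenTable t = open.pop();
                boolean candidate = !topLevelOnly || t.depth == 0;
                if (candidate && t.width > bestWidth) {
                    bestWidth = t.width;
                    best = html.substring(t.start, m.end());
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String html = "<p>intro</p><table width=\"600\"><tr><td>"
                + "<table border=\"1\" width=\"800\"><tr><td>inner</td></tr></table>"
                + "</td></tr></table>"
                + "<TABLE WIDTH=500><tr><td>second</td></tr></TABLE>";
        System.out.println(widestTable(html, true));   // the 600-wide table, inner table included
        System.out.println(widestTable(html, false));  // the 800-wide nested table on its own
    }
}
```

With topLevelOnly set to true, a nested table is displayed only as part of its parent; with false, an inner table can win by itself.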