Java code for extracting data from an HTML Table on a web page
Nandu Vajjala
Greenhorn
Joined: Nov 12, 2005
Posts: 8
Hi
We need some pointers on how to extract data from an HTML table on a web page using a Java program.
For instance: http://www.fsa.gov.uk/ukla/hcaList.do
The page at that link has a table in the following format:
Company name | Country of Incorporation | Home member state
3I INFRASTRUCTURE PLC | CHANNEL ISLANDS | UNITED KINGDOM
888 HOLDINGS PLC | GIBRALTAR | UNITED KINGDOM
We need to extract the data and convert it to a CSV file.
You would be better off using JavaScript to extract the HTML data. Using Java means your program needs to act as an HTML client. Although open source solutions such as Jakarta's HttpClient already exist, JavaScript is a much better choice because the browser already supports it. In particular, you will probably need to use an HTA (HTML Application) file (see MSDN for details; it is basically an HTML file with embedded JavaScript, renamed to .hta).
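If you do want to stay with Java, as the original question asks, here is a rough sketch of the table-to-CSV step using only the standard library. The class name `TableToCsv` and its helper methods are made up for illustration, and the sample HTML in `main` is an assumption about how the page marks up its rows, not the real page source. It uses regular expressions, which only cope with simple, non-nested tables; a proper HTML parser would be more robust. Fetching the page itself (for example with HttpClient) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts the rows of a simple HTML table to CSV text.
class TableToCsv {
    // One <tr>...</tr> block; DOTALL lets '.' span line breaks.
    private static final Pattern ROW = Pattern.compile(
            "<tr\\b[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // One <td> or <th> cell; \b keeps <thead> from matching.
    private static final Pattern CELL = Pattern.compile(
            "<t[dh]\\b[^>]*>(.*?)</t[dh]>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one CSV line per table row that contains at least one cell.
    public static String toCsv(String html) {
        StringBuilder csv = new StringBuilder();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                cells.add(quote(clean(cell.group(1))));
            }
            if (!cells.isEmpty()) {
                csv.append(String.join(",", cells)).append('\n');
            }
        }
        return csv.toString();
    }

    // Strips nested tags, decodes two common entities, collapses whitespace.
    public static String clean(String cellHtml) {
        return cellHtml.replaceAll("<[^>]+>", "")
                .replace("&nbsp;", " ")
                .replace("&amp;", "&")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Quotes a field only when it contains a comma, quote, or line break.
    public static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) {
        // Assumed markup, built from the sample rows in the question.
        String html = "<table>"
                + "<tr><th>Company name</th><th>Country of Incorporation</th>"
                + "<th>Home member state</th></tr>"
                + "<tr><td>3I INFRASTRUCTURE PLC</td><td>CHANNEL ISLANDS</td>"
                + "<td>UNITED KINGDOM</td></tr>"
                + "<tr><td>888 HOLDINGS PLC</td><td>GIBRALTAR</td>"
                + "<td>UNITED KINGDOM</td></tr>"
                + "</table>";
        System.out.print(toCsv(html));
    }
}
```

In a real program you would pass the downloaded page text to `toCsv` and write the result to a file. If the page contains more than one table, you would first need to cut out the one you want.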