Parse HTML Table

Gary Sheldon
Ranch Hand

Joined: Nov 21, 2011
Posts: 44
Could somebody please let me know how I can parse an HTML table contained within a string, such as the one below:

<table border="0" cellspacing="0" cellpadding="0">
<thead>
<th>SOH</th>
<th>Order</th>
<th>PN</th>
<th>Description</th>
<th>TP</th>
<th>RRP</th>
</thead>

<tbody>
<tr>
<td>4</td>
<td>0</td>
<td>MU242</td>
<td>Test content</td>
<td></td>
<td></td>
</tr>

<tr>
<td>8</td>
<td>0</td>
<td>MS243/1</td>
<td>Other content</td>
<td>Something</td>
<td>£23.10</td>
</tr>

</tbody>
</table>

I need to collect each respective value from each row.

Any help would be much appreciated :)
Piyush Mangal
Ranch Hand

Joined: Jan 22, 2007
Posts: 196
You can use any of the available JavaScript libraries, such as jQuery or Prototype, for this.
Gary Sheldon
Ranch Hand

Joined: Nov 21, 2011
Posts: 44
I am pulling the raw data from a database, but I need to parse it and populate another set of tables. Could you please provide an example?
Tim Moores
Rancher

Joined: Sep 21, 2011
Posts: 2408
jQuery and such libraries will only help if the HTML is part of a web page inside a browser, not so much in your case.

Libraries such as NekoHTML, HtmlCleaner, jTidy and TagSoup clean up HTML and transform it to XML - thus enabling the use of XML APIs such as DOM and XPath.
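
To illustrate the second half of that suggestion, here is a minimal sketch that uses only the JDK's DOM and XPath APIs. It assumes the cleaning step has already been done and has produced well-formed XML without a namespace; the cleaning library itself is not shown. The class name CleanedTableXPath and the method readRows are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads table rows from markup that is ALREADY well-formed XML.
class CleanedTableXPath {

    // Returns one String[] per <tr> that has at least one <td> child.
    static List<String[]> readRows(String wellFormedXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(wellFormedXml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // "//tr[td]" skips header rows, which contain <th> cells only.
        NodeList rows = (NodeList) xpath.evaluate("//tr[td]", doc, XPathConstants.NODESET);
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            NodeList cells = (NodeList) xpath.evaluate("td", rows.item(i), XPathConstants.NODESET);
            String[] values = new String[cells.getLength()];
            for (int j = 0; j < values.length; j++) {
                values[j] = cells.item(j).getTextContent().trim();
            }
            result.add(values);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<table><thead><tr><th>SOH</th><th>PN</th></tr></thead>"
                + "<tbody><tr><td>4</td><td>MU242</td></tr>"
                + "<tr><td>8</td><td>MS243/1</td></tr></tbody></table>";
        for (String[] row : readRows(xml)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

If the cleaned output carries an XHTML namespace and the parser is made namespace-aware, the XPath expressions would need to account for it; the sketch above relies on the default, non-namespace-aware parser.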
Gary Sheldon
Ranch Hand

Joined: Nov 21, 2011
Posts: 44
So in essence, I need to pass the HTML string stored in the database to a library that translates it to XML, and then parse that?
Gary Sheldon
Ranch Hand

Joined: Nov 21, 2011
Posts: 44
Please help, I am banging my head against a brick wall. Could anyone please provide a real-world example?
Gary Sheldon
Ranch Hand

Joined: Nov 21, 2011
Posts: 44
What about this possibility, for example:

1. Find how many table rows (<tr>) exist in the string
2. Split the string at </tr>
3. Move to the first string between <td> and </td>
4. Get the value
5. Move to the next string between <td> and </td>
6. Repeat 4
7. Repeat 4 & 5 until the end of the row
8. Repeat 3 to 7 for each row

Just a thought; the HTML is well formatted.
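
The steps above could be sketched roughly as follows, using plain string operations from the JDK. This is only an illustration under strong assumptions: every cell is written exactly as <td>...</td> in lower case with no attributes, cells are never nested, and rows always end with </tr>. The names TableSplitter, cellsOf and parseRows are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a rigidly formatted HTML table with string operations.
class TableSplitter {

    // Steps 3 to 7: collect the text between each <td> and </td> in one row fragment.
    static List<String> cellsOf(String rowFragment) {
        List<String> cells = new ArrayList<>();
        int from = 0;
        while (true) {
            int open = rowFragment.indexOf("<td>", from);
            if (open < 0) break;
            int start = open + "<td>".length();
            int close = rowFragment.indexOf("</td>", start);
            if (close < 0) break; // unbalanced cell: stop rather than guess
            cells.add(rowFragment.substring(start, close).trim());
            from = close + "</td>".length();
        }
        return cells;
    }

    // Steps 2 and 8: split at </tr> and keep only fragments that contain <td> cells.
    static List<List<String>> parseRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        for (String fragment : html.split("</tr>")) {
            List<String> cells = cellsOf(fragment);
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<table><thead><th>SOH</th><th>PN</th></thead><tbody>"
                + "<tr><td>4</td><td>MU242</td></tr>"
                + "<tr><td>8</td><td>MS243/1</td></tr>"
                + "</tbody></table>";
        for (List<String> row : parseRows(html)) {
            System.out.println(row);
        }
    }
}
```

Header cells are ignored because only <td> is searched for, and step 1 (counting the rows) falls out of the size of the returned list. Anything outside the stated assumptions, such as <td class="..."> or upper-case tags, would be silently missed.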
Tim Moores
Rancher

Joined: Sep 21, 2011
Posts: 2408
Sure, if you're certain that the HTML is formed that way, and will not change, then that's a possibility.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18716
    
If you want to parse HTML in your Java code, then what you need (obvious as it may sound) is an HTML parser written in Java. So your Google keywords are java html parser... yup, I just checked, that gives you plenty of useful links.
Gary Sheldon
Ranch Hand

Joined: Nov 21, 2011
Posts: 44
I've been trying to find a useful HTML parser to use from JSP but haven't had any success. I found some libraries, but no clear instructions on how to use them.

I don't think the HTML will change anytime soon, but I would appreciate any links that demonstrate how to do what I've suggested.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18716
    
There are several perfectly good HTML parsers out there. Can you point to one of them where you consider the "implementation instructions" to be inadequate?
Gary Sheldon
Ranch Hand

Joined: Nov 21, 2011
Posts: 44
OK, I've looked at jsoup but I'm struggling with the implementation.

I have uploaded jsoup-1.6.2.jar to WEB-INF/lib and have the following code at the top of my JSP page:

<%@ page import="org.jsoup.nodes.Document"%>
<%@ page import="org.jsoup.Jsoup"%>
<%@ page import="org.jsoup.nodes.Element"%>

And then the following code to parse my string which contains the html table:

String html = msVals[i][6];
Document doc = Jsoup.parse(html);

Element tableRows = doc.select("tr").first();
Iterator<Element> ite = tableRows.select("td").iterator();

String soh = ite.next().text();
String onorder = ite.next().text();
String pn = ite.next().text();
String description = ite.next().text();
String tp = ite.next().text();
String rrp = ite.next().text();

out.println("SOH: "+soh+" Order: "+onorder+" PN: "+pn+" Description: "+description+" TP: "+tp+" RRP: "+rrp+"<br>");

Unfortunately it's not working; any help would be much appreciated.
Gary Sheldon
Ranch Hand

Joined: Nov 21, 2011
Posts: 44
Ideally I would like to loop through each table row and put the table cell values into an array, which I can then use for saving to a database later.
Gary Sheldon
Ranch Hand

Joined: Nov 21, 2011
Posts: 44
It's OK, I've sorted it. I hope the following proves useful for others:

<%@ page import="org.jsoup.nodes.Document"%>
<%@ page import="org.jsoup.Jsoup"%>
<%@ page import="org.jsoup.nodes.Element"%>
<%@ page import="org.jsoup.select.Elements"%>

String html = msVals[i][6];
Document doc = Jsoup.parse(html);

for (Element table : doc.select("table")) {
    for (Element row : table.select("tr")) {
        Elements tds = row.select("td");
        // Header rows contain <th> rather than <td>, so they are skipped here.
        // Six cells are read below, so require at least six.
        if (tds.size() >= 6) {
            soh = tds.get(0).text();
            onorder = tds.get(1).text();
            pn = tds.get(2).text();
            description = tds.get(3).text();
            tp = tds.get(4).text();
            rrp = tds.get(5).text();

            out.println("<tr><td>"+soh+"</td><td>"+onorder+"</td><td>"+pn+"</td><td>"+description+"</td><td>"+tp+"</td><td>"+rrp+"</td></tr>");
        }
    }
}
 