
Parse HTML Table

 
Gary Sheldon
Ranch Hand
Posts: 44
Could somebody please let me know how I can parse an HTML table contained within a string, such as the one below?

<table border="0" cellspacing="0" cellpadding="0">
<thead>
<th>SOH</th>
<th>Order</th>
<th>PN</th>
<th>Description</th>
<th>TP</th>
<th>RRP</th>
</thead>

<tbody>
<tr>
<td>4</td>
<td>0</td>
<td>MU242</td>
<td>Test content</td>
<td></td>
<td></td>
</tr>

<tr>
<td>8</td>
<td>0</td>
<td>MS243/1</td>
<td>Other content</td>
<td>Something</td>
<td>£23.10</td>
</tr>

</tbody>
</table>

I need to collect each respective value from each row.

Any help would be much appreciated :)
 
Piyush Mangal
Ranch Hand
Posts: 196
You can use any of the available JavaScript libraries for this, such as jQuery or Prototype.
 
Gary Sheldon
Ranch Hand
Posts: 44
I am pulling the raw data from a database but need to parse it and populate another set of tables. Could you please provide an example?
 
Tim Moores
Bartender
Posts: 2752
jQuery and such libraries will only help if the HTML is part of a web page inside a browser, not so much in your case.

Libraries such as NekoHTML, HtmlCleaner, jTidy and TagSoup clean up HTML and transform it to XML - thus enabling the use of XML APIs such as DOM and XPath.
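
As a rough illustration of the second half of that approach, here is a minimal sketch that assumes the HTML has already been converted into well-formed XML by one of those libraries (that cleanup step is not shown). It uses only the DOM and XPath APIs that ship with the JDK. The class name XmlTableReader and the method readRows are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads table cells out of markup that is already well-formed XML.
class XmlTableReader {

    // Returns one list of cell texts per <tr> found inside <tbody>.
    static List<List<String>> readRows(String wellFormedXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(wellFormedXml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList rows = (NodeList) xpath.evaluate("//tbody/tr", doc, XPathConstants.NODESET);

        List<List<String>> result = new ArrayList<>();
        for (int r = 0; r < rows.getLength(); r++) {
            // "td" is evaluated relative to the current row node.
            NodeList cells = (NodeList) xpath.evaluate("td", rows.item(r), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int c = 0; c < cells.getLength(); c++) {
                values.add(cells.item(c).getTextContent().trim());
            }
            result.add(values);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Sample input modelled on the table in the question, already well-formed.
        String xml = "<table><tbody>"
                + "<tr><td>4</td><td>0</td><td>MU242</td><td>Test content</td><td></td><td></td></tr>"
                + "<tr><td>8</td><td>0</td><td>MS243/1</td><td>Other content</td>"
                + "<td>Something</td><td>\u00a323.10</td></tr>"
                + "</tbody></table>";
        for (List<String> row : readRows(xml)) {
            System.out.println(row);
        }
    }
}
```

The XPath expression //tbody/tr skips the header cells because they sit in thead. If the cleaned-up document puts its elements in a namespace, the expressions would need adjusting.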
 
Gary Sheldon
Ranch Hand
Posts: 44
So in essence, I need to pass the HTML string stored in the database to a library that translates it to XML, and then parse that?
 
Gary Sheldon
Ranch Hand
Posts: 44
Please help, I am banging my head against a brick wall. Could anyone please provide a real-world example?
 
Gary Sheldon
Ranch Hand
Posts: 44
What about the following possibility, for example:

1. Find how many table rows (<tr>) exist in the string
2. Split the string at </tr>
3. Move to the first string between <td> and </td>
4. Get the value
5. Move to the next string between <td> and </td>
6. Repeat step 4
7. Repeat steps 4 and 5 until the end of the row
8. Repeat steps 3 to 7 for each row

Just a thought; the HTML is well formatted.
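
The steps above could be sketched roughly as follows, using plain string operations from the JDK. This is only an illustration under the stated assumption that every cell is written exactly as <td>...</td> (lowercase, no attributes, no nested tables). The class name TableSplitter and the method splitRows are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper implementing the split-and-scan idea from the list above.
class TableSplitter {

    // Returns one list of cell values per row that contains at least one <td>.
    static List<List<String>> splitRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        // Step 2: split the string at each closing row tag.
        for (String chunk : html.split("</tr>")) {
            List<String> cells = new ArrayList<>();
            int from = 0;
            // Steps 3 to 7: walk through every <td>...</td> pair in this chunk.
            while (true) {
                int open = chunk.indexOf("<td>", from);
                if (open < 0) {
                    break;
                }
                int close = chunk.indexOf("</td>", open);
                if (close < 0) {
                    break;
                }
                cells.add(chunk.substring(open + "<td>".length(), close).trim());
                from = close + "</td>".length();
            }
            // Chunks without data cells (header, trailing markup) are ignored.
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<table><thead><th>SOH</th><th>Order</th></thead><tbody>"
                + "<tr><td>4</td><td>0</td></tr>"
                + "<tr><td>8</td><td>0</td></tr>"
                + "</tbody></table>";
        for (List<String> row : splitRows(html)) {
            System.out.println(row);
        }
    }
}
```

Counting the rows first (step 1) is not needed here, since the loop simply runs once per chunk. Any markup inside a cell is returned as part of the value, and a <td> with attributes would be missed entirely, which is the main reason a real HTML parser is the safer choice if the format might ever change.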
 
Tim Moores
Bartender
Posts: 2752
Sure, if you're certain that the HTML is formed that way, and will not change, then that's a possibility.
 
Paul Clapham
Sheriff
Posts: 20986
If you want to parse HTML in your Java code, then what you need (obvious as it may sound) is an HTML parser written in Java. So your Google keywords are java html parser... yup, I just checked, that gives you plenty of useful links.
 
Gary Sheldon
Ranch Hand
Posts: 44
I've been trying to find a useful HTML parser for JSP but have not had any success. I found some libraries, but no clear instructions on how to implement them.

I don't think the HTML will change anytime soon, but I would appreciate any links that demonstrate how to do what I've suggested.
 
Paul Clapham
Sheriff
Posts: 20986
There are several perfectly good HTML parsers out there. Can you point to one of them where you consider the "implementation instructions" to be inadequate?
 
Gary Sheldon
Ranch Hand
Posts: 44
OK, I've looked at jsoup but am struggling with the implementation.

I have uploaded jsoup-1.6.2.jar to WEB-INF/lib and have the following code at the top of my JSP page:

<%@ page import="org.jsoup.nodes.Document"%>
<%@ page import="org.jsoup.Jsoup"%>
<%@ page import="org.jsoup.nodes.Element"%>

And then the following code to parse my string which contains the html table:

String html = msVals[i][6];
Document doc = Jsoup.parse(html);

Element tableRows = doc.select("tr").first();
Iterator<Element> ite = tableRows.select("td").iterator();

String soh = ite.next().text();
String onorder = ite.next().text();
String pn = ite.next().text();
String description = ite.next().text();
String tp = ite.next().text();
String rrp = ite.next().text();

out.println("SOH: "+soh+" Order: "+onorder+" PN: "+pn+" Description: "+description+" TP: "+tp+" RRP: "+rrp+"<br>");

Unfortunately it's not working; any help would be much appreciated.
 
Gary Sheldon
Ranch Hand
Posts: 44
Ideally I would like to loop through each table row and pass the table cell values into an array, to use for saving to a database later.
 
Gary Sheldon
Ranch Hand
Posts: 44
It's OK, I've sorted it. I hope the following proves useful for others:

<%@ page import="org.jsoup.nodes.Document"%>
<%@ page import="org.jsoup.Jsoup"%>
<%@ page import="org.jsoup.nodes.Element"%>
<%@ page import="org.jsoup.select.Elements"%>

String html = msVals[i][6];
Document doc = Jsoup.parse(html);

for (Element table : doc.select("table")) {
    for (Element row : table.select("tr")) {
        Elements tds = row.select("td");
        // Only handle rows with all six data cells; this also skips a header row that has only <th> cells
        if (tds.size() >= 6) {
            soh = tds.get(0).text();
            onorder = tds.get(1).text();
            pn = tds.get(2).text();
            description = tds.get(3).text();
            tp = tds.get(4).text();
            rrp = tds.get(5).text();

            out.println("<tr><td>"+soh+"</td><td>"+onorder+"</td><td>"+pn+"</td><td>"+description+"</td><td>"+tp+"</td><td>"+rrp+"</td></tr>");
        }
    }
}
 