XSLT on HTML DOM (x2)


R Harvey

Joined: Sep 17, 2002
Posts: 20
I've got two questions, but they're on the same project and closely related. First, I'll outline the project.
I'm trying to write a program that can download and extract 'data' from an HTML page on the web. My approach is to download the page, create a DOM of the page using JTidy, then select out the items I want by using (pretty long) XPath statements (in XSLT).
So, the first problem is that I'm getting loads of rubbish output from the 'HEAD' section of the HTML DOM tree. I've tried <xsl:output method="...."/> and this seems to have no effect. Is it some kind of namespace problem (that namespace voodoo always seems to give me a headache....)???
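One possible explanation, sketched in Java since that's what the project uses: in XSLT 1.0 the built-in template rules copy every text node that none of your own templates handles, so the text inside <title>, <style> and <script> lands in the output whatever <xsl:output> says. The sketch below assumes that is what's happening here. The PAGE string and the class, field and method names are made up for illustration, and the stock JDK transformer stands in for whatever processor is really in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class BuiltInTextRuleDemo {

    // Hypothetical stand-in for a page that has already been tidied into well-formed markup.
    static final String PAGE =
            "<html><head><title>Stock page</title><style>body { color: red }</style></head>"
          + "<body><table><tr><td>42</td></tr></table></body></html>";

    // Builds a stylesheet that extracts <td> values, plus any extra templates passed in.
    static String stylesheet(String extraTemplates) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='text'/>"
             + "<xsl:template match='td'><xsl:value-of select='.'/></xsl:template>"
             + extraTemplates
             + "</xsl:stylesheet>";
    }

    // No text() override: the built-in rules also copy the text found under <head>.
    static final String LEAKY = stylesheet("");

    // An empty text() template overrides the built-in rule and swallows unmatched text.
    static final String QUIET = stylesheet("<xsl:template match='text()'/>");

    // Runs the stylesheet over the XML and returns the result as a string.
    static String transform(String stylesheet, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without text() override: " + transform(LEAKY, PAGE));
        System.out.println("with text() override:    " + transform(QUIET, PAGE));
    }
}
```

If this is the cause, adding the empty text() template to the real stylesheet should leave only what your own templates emit. It would also mean namespaces aren't the culprit for this particular symptom.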
The second problem is worse. I've got to be able to reference specific elements within the page's structure, so I'm using some code like:
<xsl:template match="/html/body/center/table[2]/tr[2]....(etc.)"/>(etc.)
to try and reference these 'areas' directly. I've hooked up my DOM tree to the DomEcho02 program from the site (cheeky!!?!), so that I can see a tree structure of the DOM I'm transforming. Even though I'm using this graphical tool to build my XPath, it seems to get to a certain point, and then just stops working. No error, just no output. Can you think of a reason why this approach may be failing???
If you know of another (better) way to write this program I'd love to hear it, although I've already tried htmlparser from SourceForge and got completely and hopelessly lost with it!
Many thanks
Madhav Lakkapragada
Ranch Hand

Joined: Jun 03, 2000
Posts: 5040
then just stops working. No error, just no output. Can you think of a reason why this approach may be failing???
It's possible that the processor finds no content
in the element that you are trying to select using
the XPath. To debug such issues, I generally select
an attribute (like the ID) at various levels of the
XPath, just so I know how the runtime selection
is proceeding. I have had such accidents too.
Hope this helps.
- madhav
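The level-by-level check described above can be sketched in Java roughly like this. It's a variation rather than exactly what's described: instead of selecting an attribute at each level, it counts the nodes that each growing prefix of the path selects and reports the first prefix that selects nothing. The class name, the PAGE sample and the tbody example are invented for illustration, and the naive split on '/' assumes a plain absolute path with no '//' and no slashes inside predicates.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathPrefixProbe {

    // Hypothetical stand-in for the tidied page: two tables inside <center>.
    static final String PAGE =
            "<html><body><center>"
          + "<table><tr><td>a</td></tr></table>"
          + "<table><tr><td>b</td></tr><tr><td>c</td></tr></table>"
          + "</center></body></html>";

    // Parses a well-formed string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample page", e);
        }
    }

    // "/a/b[2]/c" becomes ["/a", "/a/b[2]", "/a/b[2]/c"].
    static List<String> prefixes(String absolutePath) {
        List<String> result = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String step : absolutePath.substring(1).split("/")) {
            prefix.append('/').append(step);
            result.add(prefix.toString());
        }
        return result;
    }

    // Number of nodes the path selects when evaluated from the document root.
    static int count(Document doc, String path) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(path, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + path, e);
        }
    }

    // First prefix that selects nothing, or null if every step selects something.
    static String firstEmptyPrefix(Document doc, String absolutePath) {
        for (String prefix : prefixes(absolutePath)) {
            if (count(doc, prefix) == 0) {
                return prefix;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse(PAGE);
        for (String prefix : prefixes("/html/body/center/table[2]/tbody/tr[2]")) {
            System.out.println(prefix + " selects " + count(doc, prefix) + " node(s)");
        }
    }
}
```

The first prefix that selects zero nodes is the step where the path and the actual tree disagree, which is usually a quicker lead than staring at the whole expression.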
OK, now that I've suggested something to you, could you
please post the direct link to that DomEcho02
program from the site (cheeky!!?!)?
Please don't tell me to go to the Sun website and
search; I never find anything in search unless it is
on this website. If you don't have the link handy,
that's OK, don't sweat.
- m

Take a Minute, Donate an Hour, Change a Life