kXml - Preventing expanding entity references in attribute values

Vivek Viswanathan
Ranch Hand

Joined: Mar 03, 2001
Posts: 350
Hi

I am using kXml to parse a well-formed HTML page, and I am having a problem because this parser expands entity references in attribute values.
Since the page that I am parsing is an HTML page, it contains something along these lines:

...href="http://foobar.com/FooToos.aspx?ito=4912&itc=0"...

As you can see, the parser reads &itc=0 in the attribute value and treats the & as the beginning of an entity reference. It then falls over because it never finds the terminating ; and complains that it could not resolve &itc.

But that is not an entity reference; it is a parameter passed to the page FooToos.aspx.

So, coming back to my question:

Has anyone modified the kXml source code so that it does not try to be too smart and expand every entity reference it encounters in attribute values?


Vivek Viswanathan SCJP 1.2, SCJP 1.6,SCJD,SCEA,SCWCD,IBM-484,IBM-486,IBM-141,Ms.NET C# 70-316,SCMAD, LPIC-I
Alexander Traud
Greenhorn

Joined: Jul 07, 2004
Posts: 16
I had a similar problem in my XHTML page and it turned out to be a bug on my side. I would say that the file is not well formed. Every literal "&" must be encoded as "&amp;", even within attribute values and even inside URLs. The W3C validator will not like your page either.

[ July 11, 2004: Message edited by: Alexander Traud ]
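
To illustrate the point about escaping, here is a minimal sketch that uses the JDK's built-in DOM parser as a stand-in for kXml (an assumption made only so the example runs on a desktop JVM; kXml's own error message will differ). The class and method names are hypothetical. It shows that the unescaped URL is rejected as not well formed, while the "&amp;" form parses and the attribute value comes back with a plain "&".

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; uses the JDK DOM parser, not kXml.
final class EscapedAmpersandDemo {
    private EscapedAmpersandDemo() {}

    /**
     * Returns the href attribute of the root element,
     * or null if the markup cannot be parsed (e.g. it is not well formed).
     */
    static String parseHref(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes("UTF-8")));
            return doc.getDocumentElement().getAttribute("href");
        } catch (Exception e) {
            // Covers the parse error raised for the bare ampersand.
            return null;
        }
    }

    public static void main(String[] args) {
        String raw =
                "<a href=\"http://foobar.com/FooToos.aspx?ito=4912&itc=0\"/>";
        String escaped =
                "<a href=\"http://foobar.com/FooToos.aspx?ito=4912&amp;itc=0\"/>";

        System.out.println(parseHref(raw));     // null: not well formed
        System.out.println(parseHref(escaped)); // http://foobar.com/FooToos.aspx?ito=4912&itc=0
    }
}
```

The parser turns "&amp;" back into "&" when it reports the attribute value, so the application still sees the original URL.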
Vivek Viswanathan
Ranch Hand

Joined: Mar 03, 2001
Posts: 350
Hi

Thanks for the reply. The only problem in my case is that the HTML page that I am parsing was not developed by me. It belongs to another web site, so I have no control over the HTML they generate.

Vivek
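
When the markup cannot be fixed at the source, one possible workaround is to escape bare ampersands in the downloaded text before handing it to the parser. The following is only a sketch under stated assumptions: the class and method names are hypothetical, it is not part of kXml, and it sticks to StringBuffer and charAt with J2ME in mind. An "&" is left alone only if it is followed by letters or digits (optionally after "#") and a closing ";".

```java
// Hypothetical helper, not part of kXml: escapes ampersands that do not
// start something shaped like an entity or character reference.
final class AmpersandEscaper {
    private AmpersandEscaper() {}

    static String escapeBareAmpersands(String input) {
        StringBuffer out = new StringBuffer(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '&' && !looksLikeReference(input, i + 1)) {
                out.append("&amp;"); // bare ampersand, e.g. in a query string
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // True if the text at 'start' has the shape "name;" or "#digits;".
    private static boolean looksLikeReference(String s, int start) {
        int i = start;
        if (i < s.length() && s.charAt(i) == '#') {
            i++; // character reference such as &#38;
        }
        int bodyStart = i;
        while (i < s.length() && isAsciiLetterOrDigit(s.charAt(i))) {
            i++;
        }
        return i > bodyStart && i < s.length() && s.charAt(i) == ';';
    }

    private static boolean isAsciiLetterOrDigit(char c) {
        return (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9');
    }
}
```

For the URL in the first post, escapeBareAmpersands turns "...?ito=4912&itc=0" into "...?ito=4912&amp;itc=0", and existing "&amp;" sequences are not escaped twice. The check is purely syntactic: something like "&itc;" would be left untouched, and entity names the parser does not know would still have to be dealt with separately.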
James Reilly
wrangler
Ranch Hand

Joined: Oct 01, 2003
Posts: 30
It ought to be pretty straightforward to write your own HTML to XHTML converter (as a servlet, CGI script, etc.) using something off-the-shelf like Tidy or JTidy. O'Reilly has something along those lines here: http://www.oreillynet.com/network/2000/04/28/feature/index.csp.

Off the top of my head, it doesn't sound like a great deal of work.
For an HTTP GET, it is probably relatively straightforward and perhaps
with some googling one could find lots of helper packages and APIs
for converting HTML to XHTML.

Theoretically, a MIDlet could do the conversion to XHTML itself.
But MIDlet size and memory (e.g. for large HTML pages) might be
problematic.

Your mileage may vary :-).
james
Vivek Viswanathan
Ranch Hand

Joined: Mar 03, 2001
Posts: 350
Cheers Mate.

But I had a talk with the site that provides the HTML page and asked whether they could offer me a web service instead of having me parse HTML pages. They told me that they can provide an XML reply (rather than HTML).
So I am back in the game now, parsing the XML document, though I had to ditch all the old code that I had written to parse the HTML page.
 