Problem with XML

Padma Prasad
Ranch Hand

Joined: Sep 16, 2002
Posts: 76
Hi,

I am parsing an XML file with JDOM. The file is generated through iBatis -> Oracle, and I parse it to add some elements, etc. But when my Java parser application tries to open the Oracle-generated XML, I get the error "jdom.input.JDOMParseException: Error on line 1 of document file:/C:/Projects/app/xml/t.xml: Character conversion error: "Unconvertible UTF-8 character beginning with 0x91" (line number may be too low). "

I generated another XML file with a different query. That one opened and I could parse it too.

Can someone tell me why I get this error when I open the XML file? Is there any solution to this?

Thanks in advance,
Padma.
Madhav Lakkapragada
Ranch Hand

Joined: Jun 03, 2000
Posts: 5040
Can you open the t.xml file using a text editor (Notepad / WordPad)?
With the info you gave here, I am wondering if the file is "compressed".

- m


Padma Prasad
Ranch Hand

Joined: Sep 16, 2002
Posts: 76
Hi Madhav,

Yes, I could open the file in a text editor. I found that a few records containing special characters are the reason behind this error. When I removed those, everything seemed to work. Since I cannot control the incoming data (with its special characters), I tried wrapping it in a CDATA section. But this also failed: even with CDATA, I still get the same old error.

" org.xml.sax.SAXParseException: Character conversion error: "Unconvertible UTF-8 character beginning with 0x93" (line number may be too low)."

What could be the reason? Does the data in a CDATA section also need validation?

Thanks,
Padma.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

The parser is assuming that the XML is encoded in UTF-8, whereas it is really encoded in some other encoding.

So either iBatis/Oracle is producing a document that forgot to declare its encoding (unlikely but not impossible) or you are passing the document to JDOM in such a way that you don't allow JDOM to use the correct encoding (more likely and quite possible). One such way would be to pass JDOM a FileReader that by default uses the system's default encoding. Would you like to post the code where you pass the document to JDOM?
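The FileReader pitfall mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the JDK's built-in DOM parser instead of JDOM so that it runs without extra libraries, and the class name, method names, and sample document are hypothetical. When the parser is handed raw bytes it reads the encoding declaration itself; when it is handed a Reader, the bytes have already been decoded (here deliberately with the wrong charset) and the declaration can no longer help.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class ReaderVsStreamDemo {

    // Hypothetical sample: a document that is declared AND encoded as ISO-8859-1.
    static byte[] sampleBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><NAME>caf\u00e9</NAME>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parses the source and returns the text of the root element.
    static String rootText(InputSource source) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(source).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Byte stream: the parser sees the raw bytes and honours the declared encoding.
    static String parseFromStream(byte[] bytes) {
        return rootText(new InputSource(new ByteArrayInputStream(bytes)));
    }

    // Character stream: decoding happens before the parser runs. UTF-8 stands in
    // for a platform default charset that does not match the document.
    static String parseFromReader(byte[] bytes) {
        Reader wrongCharset = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        return rootText(new InputSource(wrongCharset));
    }

    public static void main(String[] args) {
        byte[] bytes = sampleBytes();
        System.out.println("via stream: " + parseFromStream(bytes));
        System.out.println("via reader: " + parseFromReader(bytes));
    }
}
```

In this sketch the byte-stream variant recovers the accented character, while the Reader variant delivers damaged text to the parser. Passing a file name, File, or InputStream to the builder (as opposed to a FileReader) leaves encoding detection to the parser.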
Padma Prasad
Ranch Hand

Joined: Sep 16, 2002
Posts: 76
Hi Paul,

This is the code:

Document doc = null;
SAXBuilder buildr = new SAXBuilder();

try {
    log.debug("Before opening the file");
    doc = buildr.build("C:/Projects/app/xml/t.xml");
    log.debug("Opened the file");
} catch (JDOMException e) {
    System.out.println("File is not well-formed.");
    System.out.println(e.getMessage());
} catch (IOException e) {
    System.out.println("I/O Exception");
    System.out.println(e.getMessage());
}

I get the JDOMException.

Below is the record in the XML file where I get this error. (It works when I supply the XML file without this record.)

<RECORD>
<ID>12345</ID>
<NAME><![CDATA[ DESCRIPTION ]]></NAME>
<VALUE><![CDATA[ This summer the award-winning outdoor theatre Company, Illyria bring Jane Austen’s classic novel to life in the Castle Grounds ]]></VALUE>
</RECORD>

The apostrophe is Oracle-style. When I change it to a Windows-style apostrophe, the record validates.

The data in a CDATA section should not be validated, so why do I get this error? Also, why do I get a JDOMException, which is basically thrown when the XML file is not well-formed?

Thanks,
Padma.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

First of all, your claim that characters within CDATA sections "should not be validated" (i.e. don't have to be encoded in the same encoding as the rest of the document) is incorrect. All of an XML document has to be encoded in whatever encoding is specified in the prolog.

I don't see any problem with your JDOM code. So it follows that either Oracle is failing to declare the correct encoding for the document (a possibility, as I said; I have no idea what an "oracle style" quote might be) or that some other encoding-unaware process between Oracle generation and your parsing is damaging the document. For example you might be transferring it via HTTP in a way that assumes it is in a particular encoding.
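To illustrate the point about CDATA and the declared encoding, here is a small sketch, again using the JDK's DOM parser rather than JDOM. It assumes (this is a guess, not something confirmed in the thread) that the troublesome apostrophe arrives as the single windows-1252 byte 0x92; the class, the helper names, and the shortened record are hypothetical. The same bytes are parsed twice: once with no encoding declaration, so the parser assumes UTF-8, and once with a prolog that declares windows-1252.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class CdataEncodingDemo {

    // Builds a shortened, hypothetical record whose CDATA section contains the
    // raw byte 0x92 (the curly apostrophe in windows-1252, not valid UTF-8 here).
    static byte[] record(String prolog) {
        byte[] head = (prolog + "<VALUE><![CDATA[ Jane Austen")
                .getBytes(StandardCharsets.US_ASCII);
        byte[] tail = "s classic novel ]]></VALUE>".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head, 0, head.length);
        out.write(0x92);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    // Returns the root element's text, or null if the parser rejects the bytes.
    static String parse(byte[] bytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement().getTextContent();
        } catch (SAXException | IOException e) {
            return null; // e.g. an invalid UTF-8 byte sequence
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No declaration: UTF-8 is assumed, and CDATA does not exempt the byte.
        System.out.println("undeclared parses: " + (parse(record("")) != null));
        // Declared encoding matches the actual bytes.
        String declared = "<?xml version=\"1.0\" encoding=\"windows-1252\"?>";
        System.out.println("declared parses:   " + (parse(record(declared)) != null));
    }
}
```

If the assumption about the byte holds, the practical fix is to have the generating side either emit real UTF-8 or declare the encoding it actually writes; wrapping the text in CDATA changes nothing at the byte level.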
 