Problem with XML

 
Ranch Hand
Posts: 76
Hi,

I am parsing an XML file with JDOM. The XML file is generated through iBatis from Oracle, and I parse it to add some elements. But when my Java parser application tries to open the Oracle-generated XML, I get the error "jdom.input.JDOMParseException: Error on line 1 of document file:/C:/Projects/app/xml/t.xml: Character conversion error: "Unconvertible UTF-8 character beginning with 0x91" (line number may be too low). "

I generated another XML file with a different query. That one opened fine and I could parse it too.

Can someone tell me why I get this error when I open the XML file? Is there any solution to this?

Thanks in advance,
Padma.
 
Ranch Hand
Posts: 5040
Can you open the t.xml file using a text editor (Notepad / WordPad)?
With the info you gave here, I am wondering if the file is "compressed".

- m
 
Padma Prasad
Ranch Hand
Posts: 76
Hi Madhav,

Yes, I could open the file in a text editor. I found that a few records containing special characters are the reason behind this error. When I removed those records, everything seemed to work. Since I cannot control the incoming data (with its special characters), I tried wrapping it in a CDATA section. But this also failed. Even with CDATA added, I still get the same old error.

" org.xml.sax.SAXParseException: Character conversion error: "Unconvertible UTF-8 character beginning with 0x93" (line number may be too low)."

What could be the reason? Does the data in a CDATA section also need validation?

Thanks,
Padma.
 
Marshal
Posts: 28226
The parser is assuming that the XML is encoded in UTF-8, whereas it is really encoded in some other encoding.

So either iBatis/Oracle is producing a document that forgot to declare its encoding (unlikely but not impossible) or you are passing the document to JDOM in such a way that you don't allow JDOM to use the correct encoding (more likely and quite possible). One such way would be to pass JDOM a FileReader that by default uses the system's default encoding. Would you like to post the code where you pass the document to JDOM?
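To see why the parser complains, it can help to look at the raw bytes outside of any XML library. Below is a minimal sketch using only the JDK. The class name `EncodingCheck` and the sample bytes are made up for illustration, and windows-1252 is only a guess at what the file might really be encoded in (0x91 to 0x93 happen to be the curly quotes in that code page).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingCheck {

    /**
     * Strictly decodes the bytes in the given charset.
     * Returns null instead of silently substituting U+FFFD when the bytes are not valid.
     */
    static String decodeOrNull(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: "Austen", then the single byte 0x92, then "s".
        byte[] sample = {'A', 'u', 's', 't', 'e', 'n', (byte) 0x92, 's'};

        // Assumption: the producer wrote a Windows code page, not UTF-8.
        Charset cp1252 = Charset.forName("windows-1252");

        System.out.println("valid as UTF-8:        "
                + (decodeOrNull(sample, StandardCharsets.UTF_8) != null));
        System.out.println("valid as windows-1252: "
                + (decodeOrNull(sample, cp1252) != null));
    }
}
```

A lone byte in the range 0x80 to 0xBF can never start a UTF-8 sequence, which matches the "Unconvertible UTF-8 character beginning with 0x91" message. Running the same check over the real t.xml bytes would show whether the file is valid UTF-8 at all.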
 
Padma Prasad
Ranch Hand
Posts: 76
Hi Paul,

This is the code......

Document doc = null;
SAXBuilder buildr = new SAXBuilder();

try {
    log.debug("Before opening the file");
    doc = buildr.build("C:/Projects/app/xml/t.xml");
    log.debug("Opened the file");
} catch (JDOMException e) {
    System.out.println("File is not well-formed.");
    System.out.println(e.getMessage());
} catch (IOException e) {
    System.out.println("I/O Exception");
    System.out.println(e.getMessage());
}

I get the JDOMException.

Below is the record in the XML file where I get this error. (It works when I supply the XML file without this record.)

<RECORD>
<ID>12345</ID>
<NAME><![CDATA[ DESCRIPTION ]]></NAME>
<VALUE><![CDATA[ This summer the award-winning outdoor theatre Company, Illyria bring Jane Austen’s classic novel to life in the Castle Grounds ]]></VALUE>
</RECORD>

The apostrophe is oracle style. When I change it to a Windows-style apostrophe, the record is validated.

The data in a CDATA section should not be validated, so why do I get this error? Also, why do I get a JDOMException, which is basically thrown when the XML file is not well-formed?

Thanks,
Padma.
 
Paul Clapham
Marshal
Posts: 28226
First of all, your claim that characters within CDATA sections "should not be validated" (i.e. don't have to be encoded in the same encoding as the rest of the document) is incorrect. All of an XML document has to be encoded in whatever encoding is specified in the prolog.
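A small experiment makes the point about CDATA concrete. This is only a sketch: it uses the JDK's built-in DOM parser rather than JDOM so that it runs without extra jars, the class name `CdataEncodingDemo` and the sample element are invented, and windows-1252 is an assumed encoding, not something known about the actual file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class CdataEncodingDemo {

    // Assumption: the data was written in this Windows code page.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Builds a one-element document whose CDATA holds a curly apostrophe (byte 0x92). */
    static byte[] sampleDocument(String prolog) {
        String xml = prolog + "<VALUE><![CDATA[ Jane Austen\u2019s classic novel ]]></VALUE>";
        return xml.getBytes(CP1252);
    }

    /** Returns true if the parser accepts the bytes, false if it rejects them. */
    static boolean parses(byte[] xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return true;
        } catch (IOException | SAXException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // No declaration: the parser falls back to UTF-8 and trips over 0x92,
        // even though the byte sits inside a CDATA section.
        System.out.println(parses(sampleDocument("")));

        // Same bytes, but the prolog now declares the encoding actually used.
        System.out.println(parses(sampleDocument(
                "<?xml version=\"1.0\" encoding=\"windows-1252\"?>")));
    }
}
```

CDATA only switches off markup recognition for characters like < and &. The bytes still have to be decoded into characters first, and that decoding step is where the error comes from.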

I don't see any problem with your JDOM code. So it follows that either Oracle is failing to declare the correct encoding for the document (a possibility, as I said; I have no idea what an "oracle style" quote might be) or that some other encoding-unaware process between Oracle generation and your parsing is damaging the document. For example you might be transferring it via HTTP in a way that assumes it is in a particular encoding.
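If it turns out that the file really is in some Windows code page and simply lacks an encoding declaration, the proper fix is to have the producer declare the encoding in the prolog. As a stopgap, you could decode the bytes yourself and hand the parser a Reader. Here is a rough sketch with the JDK's DOM parser standing in for JDOM (JDOM's SAXBuilder also has a build(Reader) overload that could be used the same way). The class name `ExplicitCharsetParse` is hypothetical, and windows-1252 is again an assumption you would have to confirm.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ExplicitCharsetParse {

    /**
     * Decodes the stream with the caller-supplied charset and parses the result.
     * Because the parser receives characters rather than bytes, it does not
     * apply its own encoding detection.
     */
    static Document parse(InputStream in, Charset charset)
            throws IOException, SAXException, ParserConfigurationException {
        try (Reader reader = new InputStreamReader(in, charset)) {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader));
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed encoding of the incoming file; verify before relying on it.
        Charset cp1252 = Charset.forName("windows-1252");

        // Stand-in for the real file: a record with a curly apostrophe and no prolog.
        byte[] xml = "<VALUE><![CDATA[ Jane Austen\u2019s classic novel ]]></VALUE>"
                .getBytes(cp1252);

        Document doc = parse(new ByteArrayInputStream(xml), cp1252);
        System.out.println(doc.getDocumentElement().getTextContent().trim());
    }
}
```

Note that this is the opposite of the FileReader pitfall only because the charset is chosen deliberately. If you pick the wrong one, the document will still parse but the text will be silently wrong, so treat it as a workaround rather than a cure.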
 