java io UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence

Jignesh Gohel
Ranch Hand

Joined: Dec 28, 2004
Posts: 276
Hello,

I have two questions:
a) What is the difference between the response content types text/pdf and application/pdf?
b) In my application I am generating an XML document. The generated XML already specifies its encoding as "UTF-8".

Now, using this XML, I want to generate a PDF. To generate the PDF I am using JasperReports' class net.sf.jasperreports.engine.data.JRXmlDataSource.

The code snippet for the same is as follows:



But when the following line is executed:


I am getting this exception:



Can anybody please explain why this is happening and how to resolve it?

Thanks,
Jignesh


Regards,
Jignesh

The Art Of Life Is To Know When To Be Useless And When To Be Useful - CHUANG TZU
Nicholas Jordan
Ranch Hand

Joined: Sep 17, 2006
Posts: 1282
The first two bytes of a Unicode file can be a marker code that determines the byte ordering in the file. As a preliminary guess, it appears that you are taking xmlDataBuffer and using it as the input for a JRXmlDataSource, and the UTFDataFormatException suggests that the first and second bytes are not a 0xFE 0xFF pair. The obvious place to look is deep in the documentation for the two data types, for any and all information on how they handle the BOM (BOM == byte order mark).

Byte Order Mark. The Unicode character U+FEFF when used to indicate the byte order of a text


Source: Glossary of Unicode Terms
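To test the BOM guess rather than assume it, one could inspect the first bytes of the data directly. The following is only a minimal sketch: the class name BomCheckSketch and the method detectBom are made up for illustration, and it recognizes just the UTF-8 and UTF-16 marks. Note that UTF-8 data very often carries no BOM at all, so "none" is a normal result and not an error in itself.

```java
import java.nio.charset.StandardCharsets;

public class BomCheckSketch {

    // Hypothetical helper: returns a label for a recognized byte order mark,
    // or "none" when the data does not start with one. Only the UTF-8 and
    // UTF-16 marks are checked; UTF-32 is deliberately left out.
    public static String detectBom(byte[] data) {
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return "none";
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>";
        // Encoding a String as UTF-8 in Java does not prepend a BOM.
        System.out.println(detectBom(xml.getBytes(StandardCharsets.UTF_8)));
        // The "UTF-16" charset prepends a big-endian BOM when encoding.
        System.out.println(detectBom(xml.getBytes(StandardCharsets.UTF_16)));
    }
}
```

If the bytes handed to the parser show no BOM and the document is declared as UTF-8, the BOM is probably not the cause, and the encoding used to produce the bytes is the next thing to check.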

The exception's message tells us:




MIME == Multipurpose Internet Mail Extensions
described in - [RFC2045,RFC2046]

See: MIME Media Types
  •   text/pdf
    The PDF format has become a standard for document transfer between computer architectures. A PDF file retains formatting for the file being transmitted. (...snip...)
    SOURCE: FILExt - The File Extension Source
  •   application/pdf
    The program that displays text/pdf


[ March 16, 2008: Message edited by: Nicholas Jordan ]

"The differential equations that describe dynamic interactions of power generators are similar to that of the gravitational interplay among celestial bodies, which is chaotic in nature."
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
        

Here's your problem:

Your XML document declares that it is encoded in UTF-8. But you disregarded that, and encoded it to bytes using your system's default encoding, which is not UTF-8. So if the document contained non-ASCII characters, they would have been mangled. Here's what you want instead:

Actually I would try to avoid what you are doing, which is to convert chars to bytes and then have the parser convert the bytes back to chars. Even if you do it right, it's wasteful. If JRXmlDataSource has a constructor that takes a Reader, or an InputSource, then use a StringReader containing xmlDataBuffer.toString().
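The original snippets from this thread are missing, so the following is only a sketch of the two approaches described above, not the poster's actual code. It uses the JDK's own DOM parser in place of JRXmlDataSource; the class name XmlEncodingSketch and its method names are made up for illustration, and xmlDataBuffer is assumed to be a StringBuffer holding the generated XML. Which input types JRXmlDataSource accepts should be checked against the JasperReports documentation.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlEncodingSketch {

    // Approach 1: encode the text with the charset that the XML declaration
    // names (UTF-8) instead of the platform default used by getBytes().
    public static InputStream asUtf8Stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // The parser decodes the bytes according to the XML declaration.
    public static Document parseFromBytes(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(asUtf8Stream(xml));
    }

    // Approach 2: skip the chars -> bytes -> chars round trip by handing
    // the parser a character stream directly.
    public static Document parseFromChars(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Assumed stand-in for the poster's generated XML.
        StringBuffer xmlDataBuffer = new StringBuffer();
        xmlDataBuffer.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        xmlDataBuffer.append("<name>Jos\u00e9</name>");

        String xml = xmlDataBuffer.toString();
        System.out.println(parseFromBytes(xml).getDocumentElement().getTextContent());
        System.out.println(parseFromChars(xml).getDocumentElement().getTextContent());
    }
}
```

With either approach the non-ASCII character survives parsing. The exception in the original question is the kind of error a parser raises when the bytes were produced with a different charset than the one the declaration promises.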
     