Invalid byte 1 of 1-byte UTF-8 sequence

Santiago Rodriguez
Greenhorn

Joined: Aug 16, 2006
Posts: 10
Hi
I get the following error when I try to transform an XML document and an XSL stylesheet into a PDF (FOP):
javax.xml.transform.TransformerException: javax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Invalid byte 1 of 1-byte UTF-8 sequence.

Please help me...
Thanks
Santiago
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

That means your document isn't actually encoded in UTF-8, but you are reading it as though it were. This is often because whoever created the document failed to specify its encoding in the prolog.

So send it back to whoever created it and ask them to fix it up. If you don't feel you have the technical background to back up that claim yourself (and you probably shouldn't) then read this tutorial first:

http://skew.org/xml/tutorial/
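
To illustrate the point about the prolog, here is a minimal sketch that parses the same ISO-8859-1 bytes twice with the JDK's built-in parser: once without an encoding declaration and once with one. The class name, method name, and sample document are hypothetical, chosen only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class PrologEncodingDemo {

    // Returns true if the bytes parse as XML, false if the parser rejects them.
    static boolean parses(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xmlBytes));
            return true;
        } catch (Exception e) {
            // Covers both SAXException and IOException from the parser.
            return false;
        }
    }

    public static void main(String[] args) {
        // U+00A9 (copyright sign) is the single byte 0xA9 in ISO-8859-1,
        // which is not a valid UTF-8 sequence on its own.
        String noDeclaration = "<?xml version=\"1.0\"?><c>\u00a9</c>";
        String withDeclaration = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><c>\u00a9</c>";

        // No encoding declared: the parser assumes UTF-8 and rejects the byte.
        System.out.println(parses(noDeclaration.getBytes(StandardCharsets.ISO_8859_1)));
        // Encoding declared: the same bytes are decoded as ISO-8859-1.
        System.out.println(parses(withDeclaration.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The only difference between the two inputs is the `encoding` attribute in the XML declaration, which is the fix the document's creator would need to make (or they could save the file as real UTF-8).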
Ramamoorthy Govindaraj
Greenhorn

Joined: Dec 31, 2009
Posts: 5
I got the same exception and resolved it as follows. I hope this code helps others.

String output = "some contents...go here."; // or input from another source
// Workaround: re-decode the bytes as UTF-8 to avoid "Invalid byte 1 of 1-byte UTF-8 sequence".
// Note that getBytes() with no argument encodes using the platform default charset.
String s = new String(output.getBytes(), "UTF-8");
Writer writer = new BufferedWriter(new FileWriter("c:/temp/Jasper/invoice.html"));
try {
    writer.write(s);
} finally {
    writer.close();
}
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

That may work in the sense that it won't throw the exception any more. It may not prevent damage to the data caused by failing to read the document using UTF-8 in the first place.

You may well have seen pages on the web with things like Euro signs and A-with-a-hat characters where there should have been quotes or dashes. This is the sort of thing that happens if you don't use the right encodings.
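
The kind of damage described above can be reproduced in a few lines. This is a minimal sketch, assuming the text was written as UTF-8 and then read as windows-1252 (one common mismatch, not necessarily the one in any given case); the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes text as UTF-8, then (wrongly) decodes those bytes with another charset.
    static String misread(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 is available, as it is on mainstream JDKs.
        Charset cp1252 = Charset.forName("windows-1252");

        // U+2019 is a right single quotation mark (a "curly" apostrophe).
        String garbled = misread("\u2019", cp1252);

        // Print code points rather than the characters themselves, so the
        // result does not depend on the console's own encoding.
        garbled.chars().forEach(c -> System.out.printf("U+%04X%n", c));
    }
}
```

The single quote comes back as three characters: U+00E2 (a with circumflex), U+20AC (Euro sign), and U+2122 (trade mark sign). No exception is thrown, which is why this kind of corruption tends to go unnoticed until someone reads the output.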
Michael Angstadt
Ranch Hand

Joined: Jun 17, 2009
Posts: 273

I thought I would share my thoughts, since I was having the same problem (even though this thread is very old).

I did what Ramamoorthy Govindaraj suggested (except my input/output streams used files instead of Strings because my XML document was very large and storing the entire document in memory would have been inefficient):



But that still didn't work. When I opened the file in a text editor (Notepad++), I saw a question mark character at the very beginning of the file. After I deleted that character, I could parse the file successfully.
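
The post does not say what the stray character was. One common candidate for an unexpected character at the very start of a file is a byte order mark (BOM), so the following is a minimal sketch, under that assumption, of skipping a UTF-8 BOM before handing the stream to a parser. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class BomSkipper {

    // Wraps a stream and discards a leading UTF-8 byte order mark (EF BB BF), if present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // Read up to three bytes; a single read() call may return fewer.
        int n = 0;
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break; // end of stream
            }
            n += r;
        }

        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: put the bytes back so the caller sees the stream unchanged.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println((char) in.read()); // first byte after the BOM
    }
}
```

Because this works on a stream, it avoids loading a large document into memory. It only handles the UTF-8 BOM; a UTF-16 BOM (FF FE or FE FF) would mean the file is not UTF-8 at all and needs to be read with the matching encoding.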

Working with encodings is annoying because text files are supposed to be simple.


SCJP 6 || SCWCD 5
James Boswell
Bartender

Joined: Nov 09, 2011
Posts: 1012

Hi Michael

I think you may need to define the encoding for the output stream. Something like the following:
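
A minimal sketch of what setting the output encoding explicitly could look like. This is not the code from the original post; the class name, method name, and temporary file are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileWriterDemo {

    // Writes text to a file as UTF-8, regardless of the platform default encoding.
    static void writeUtf8(Path target, String text) throws IOException {
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(Files.newOutputStream(target), StandardCharsets.UTF_8))) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output location, used only for this demonstration.
        Path target = Files.createTempFile("encoding-demo", ".xml");
        writeUtf8(target, "<?xml version=\"1.0\" encoding=\"UTF-8\"?><price>\u20ac10</price>");
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

The difference from a plain `FileWriter` is that the charset is named in the code, so the bytes on disk match the encoding declared in the XML prolog on every machine.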
Brylle Lee
Greenhorn

Joined: Nov 14, 2011
Posts: 1
Character encoding differs from system to system, with some common standards including ISO-8859-1, UTF-8 plus other encodings such as Mac OS.
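
In Java, the encoding that a given system uses by default can be checked directly. A minimal sketch (the class and method names are hypothetical):

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {

    // Returns the name of the charset used when no encoding is specified,
    // for example by String.getBytes() with no argument.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
    }
}
```

The result depends on the operating system, locale, and JDK version (from Java 18 onward the default is UTF-8 unless overridden), so code that relies on it can behave differently from one machine to the next.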
Louie Poll
Greenhorn

Joined: Nov 19, 2011
Posts: 1
Paul Clapham wrote:That means your document isn't actually encoded in UTF-8, but you are reading it as though it were. This is often because whoever created the document failed to specify its encoding in the prolog.

So send it back to whoever created it and ask them to fix it up. If you don't feel you have the technical background to back up that claim yourself (and you probably shouldn't) then read this tutorial first:

http://skew.org/xml/tutorial/


I'm actually having the same problem, and it is really stressing me out. I hope this solves it.
Raju Sharmas
Greenhorn

Joined: Nov 21, 2011
Posts: 1
I had the same problem and was looking for solutions on the internet. This thread helped me a lot. Thanks to all.
john wise
Greenhorn

Joined: Nov 30, 2011
Posts: 1
Brylle Lee wrote:Character encoding differs from system to system, with some common standards including ISO-8859-1, UTF-8 plus other encodings such as Mac OS.

Does the encoding also differ between Windows versions?



unspoken hermit
Greenhorn

Joined: Dec 14, 2011
Posts: 1
I have an XML document and a Java class that should process it (on Windows XP).

Unfortunately, I am getting an exception:

"Invalid byte 1 of 1-byte UTF-8 sequence"

What's wrong?

Because I don't have the Java source, I can only change the XML document.
The XML document starts with:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
....

There may have been some line-ending conversion errors when I downloaded the XML file from a Linux server.

Could this be the problem ?


William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12761
The first thing I would do is examine the start of that document with an editor that can display HEX values to see what it really starts with.

Personally I use UltraEdit-32.

Do you know how the XML document was created?

Bill
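
If no hex editor is at hand, the same check of the file's first bytes can be done with a few lines of Java. This is a minimal sketch; the class name, method name, and the choice of 16 bytes are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HexHead {

    // Formats up to `count` leading bytes of `data` as space-separated hex pairs.
    static String toHex(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        int limit = Math.min(count, data.length);
        for (int i = 0; i < limit; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HexHead path/to/document.xml
        Path file = Paths.get(args[0]);
        byte[] head = new byte[16];
        int n;
        try (InputStream in = Files.newInputStream(file)) {
            n = in.read(head); // may read fewer than 16 bytes, or -1 for an empty file
        }
        System.out.println(toHex(head, Math.max(n, 0)));
    }
}
```

A file that starts with `3C 3F 78 6D 6C` begins with `<?xml` in ASCII or UTF-8. A leading `EF BB BF` is a UTF-8 byte order mark, and `FF FE` or `FE FF` suggests UTF-16.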
steven scortez
Greenhorn

Joined: Dec 16, 2011
Posts: 1
William Brogden wrote:The first thing I would do is examine the start of that document with an editor that can display HEX values to see what it really starts with.

Personally I use UltraEdit-32.

Do you know how the XML document was created?

Bill


Will, is UltraEdit-32 easy to use?

--


Life is short live it up
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12761
UltraEdit is always open on my desktop.
I organize all projects, including my personal papers, using the UE Project/Workspace concepts.
I edit all Java and XML with the keyword sensitive editor.
I compile all programs using UE's Project Tool Customization in combination with ANT capabilities.
Getting started for things like viewing files in HEX is easy.

UltraEdit is a commercial program but I feel good about spending money for good tools.

Bill
 