That means your document isn't actually encoded in UTF-8, but you are reading it as though it were. This is often because whoever created the document failed to specify its encoding in the prolog.
So send it back to whoever created it and ask them to fix it up. If you don't feel you have the technical background to back up that claim yourself (and you probably shouldn't) then read this tutorial first:
I got the same exception. Fortunately, I resolved it as follows; I hope this code helps others.
String output = "some contents...go here."; // or input from another source
// Re-decode the bytes as UTF-8; intended to avoid "Invalid byte 1 of 1-byte UTF-8 sequence".
// Note: getBytes() with no argument encodes using the JVM's default charset.
String s = new String(output.getBytes(), "UTF-8");
// Note: FileWriter also writes using the JVM's default charset.
Writer writer = new BufferedWriter(new FileWriter("c:/temp/Jasper/invoice.html"));
try {
    writer.write(s);
} finally {
    writer.close();
}
That may work in the sense that it won't throw the exception any more. It may not prevent damage to the data caused by failing to read the document using UTF-8 in the first place.
You may well have seen pages on the web with things like Euro signs and A-with-a-hat characters where there should have been quotes or dashes. This is the sort of thing that happens if you don't use the right encodings.
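A minimal sketch of how that kind of garbling arises, assuming the text was written as UTF-8 and then read as windows-1252 (the class and method names here are made up for illustration). A curly apostrophe (U+2019) is three bytes in UTF-8, E2 80 99, and windows-1252 maps those bytes to an a-with-a-hat, a Euro sign and a trademark sign:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    /** Encodes text as UTF-8, then (wrongly) decodes those bytes with another charset. */
    static String misdecode(String text, Charset wrong) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrong);
    }

    public static void main(String[] args) {
        // U+2019 is a right single quotation mark (a "curly" apostrophe).
        String original = "It\u2019s";
        String garbled = misdecode(original, Charset.forName("windows-1252"));
        // The apostrophe comes back as three unrelated characters.
        System.out.println(garbled);
    }
}
```

Once the wrong characters have been written back out, the original bytes may no longer be recoverable, which is why suppressing the exception alone is not enough.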
Michael Angstadt
Ranch Hand
Joined: Jun 17, 2009
Posts: 269
I thought I would share my thoughts, since I was having the same problem (even though this thread is very old).
I did what Ramamoorthy Govindaraj suggested (except my input/output streams used files instead of Strings because my XML document was very large and storing the entire document in memory would have been inefficient):
But that still didn't work. When I opened the file in a text editor (Notepad++), I saw a question mark character at the very beginning of the file. After I deleted that character, I could parse the file successfully.
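One common cause of a stray character at the very start of a file is a UTF-8 byte order mark (the bytes EF BB BF); whether that was the character here is an assumption. Under that assumption, a rough sketch of skipping it in code instead of deleting it by hand could look like this (BomSkipper and skipUtf8Bom are hypothetical names):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

final class BomSkipper {
    /** Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() may return fewer than requested.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break; // end of stream
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: put the bytes back so the caller sees the whole document.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

The returned stream can then be handed to the XML parser in place of the raw file stream.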
Working with encodings is annoying because text files are supposed to be simple.
SCJP 6 || SCWCD 5
James Boswell
Ranch Hand
Joined: Nov 09, 2011
Posts: 152
Hi Michael
I think you may need to define the encoding for the output stream. Something like the following:
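A minimal sketch of that idea, assuming the output goes to a file and using made-up names (Utf8Output, writeUtf8); this is an illustration, not the snippet from the original post. The point is that the charset is passed explicitly to the OutputStreamWriter instead of relying on the default:

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class Utf8Output {
    /** Writes content to the given file, always encoding it as UTF-8. */
    static void writeUtf8(String path, String content) throws IOException {
        // try-with-resources closes the writer (and the underlying stream) even on failure.
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8))) {
            writer.write(content);
        }
    }
}
```

If the document declares encoding="UTF-8" in its prolog, the bytes on disk then match that declaration regardless of the machine's default charset.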
Brylle Lee
Greenhorn
Joined: Nov 14, 2011
Posts: 1
Character encoding differs from system to system, with some common standards including ISO-8859-1, UTF-8 plus other encodings such as Mac OS.
Louie Poll
Greenhorn
Joined: Nov 19, 2011
Posts: 1
Paul Clapham wrote: That means your document isn't actually encoded in UTF-8, but you are reading it as though it were. This is often because whoever created the document failed to specify its encoding in the prolog.
So send it back to whoever created it and ask them to fix it up. If you don't feel you have the technical background to back up that claim yourself (and you probably shouldn't) then read this tutorial first:
I'm actually having the same problem, and it is really stressing me out. I hope this solves it.
Raju Sharmas
Greenhorn
Joined: Nov 21, 2011
Posts: 1
I also had the same problem and was looking for solutions on the internet. This thread helped me a lot. Thanks to all.
john wise
Greenhorn
Joined: Nov 30, 2011
Posts: 1
Brylle Lee wrote: Character encoding differs from system to system, with some common standards including ISO-8859-1, UTF-8 plus other encodings such as Mac OS.
Does the encoding also differ between versions of Windows?
William Brogden wrote: The first thing I would do is examine the start of that document with an editor that can display HEX values to see what it really starts with.
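If no hex-capable editor is at hand, the same check can be roughly sketched in a few lines of Java. The names below (HexPeek, hexOfStart) are made up for illustration:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class HexPeek {
    /** Returns the first `count` bytes of the file as space-separated hex pairs. */
    static String hexOfStart(Path file, int count) throws IOException {
        byte[] head = new byte[count];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // Keep reading until we have `count` bytes or hit end of file.
            while (n < count) {
                int r = in.read(head, n, count - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", head[i] & 0xFF));
        }
        return sb.toString();
    }
}
```

A file that really starts with an XML prolog begins with 3C 3F 78 6D 6C (the characters <?xml); a leading EF BB BF would indicate a UTF-8 byte order mark, and FF FE or FE FF would suggest UTF-16.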
Life is short live it up
William Brogden
Author and all-around good cowpoke
Rancher
Joined: Mar 22, 2000
Posts: 11689
UltraEdit is always open on my desktop.
I organize all projects, including my personal papers, using the UE Project/Workspace concepts.
I edit all Java and XML with the keyword-sensitive editor.
I compile all programs using UE's Project Tool Customization in combination with ANT capabilities.
Getting started with things like viewing files in hex is easy.