Problems Reading UTF-8 File

Hi guys, I need some help clarifying this issue.

I am working with the SunFtpClient class in a project that involves
downloading file contents from an FTP server on a Unix machine. I create some
files in Notepad, write the content, and then save them with UTF-8 encoding.

Next, I transfer the files from my machine to the FTP server
in binary mode. Up to here everything is OK. The problem appears
when I execute this piece of code. Let's review
the code, and then I will describe the problem.

The content of Message.txt is:
One Two Three Four Five Six

The content of LocalMessage.txt is:
?One Two Three Four Five Six

Somebody could ask: what is the problem?

The problem is that although I pass UTF-8 as the encoding to InputStreamReader,
the BOM bytes are not filtered out, and I suppose that the ? character at the start of LocalMessage.txt is the result of those bytes. Why isn't InputStreamReader converter=new InputStreamReader(ftp.get("Message.txt"),"UTF-8"); working correctly?

I appreciate your comments.

You are correct. When Notepad saves a file in UTF-8 encoding, it puts a BOM (byte order mark) at the beginning of the file: the three bytes EF BB BF, which are the UTF-8 encoding of the character U+FEFF. This is unnecessary, since UTF-8 is a byte-oriented encoding with no byte-order ambiguity, but Notepad does it anyway. So the BOM is there.
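To see this without the FTP part, here is a minimal sketch that prepends the three BOM bytes to some ASCII text and decodes the result with InputStreamReader, the same way your code does. The class name BomDemo and the sample text are made up for illustration, and it uses StandardCharsets.UTF_8 (Java 7+) instead of the "UTF-8" string; both name the same charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // The UTF-8 encoding of U+FEFF, which Notepad writes at the start of "UTF-8" files.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Decodes all bytes through InputStreamReader, as in the question's code.
    static String decodeUtf8(byte[] bytes) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "One Two Three".getBytes(StandardCharsets.UTF_8);
        // Build BOM + text, like a Notepad-saved UTF-8 file.
        byte[] withBom = new byte[UTF8_BOM.length + text.length];
        System.arraycopy(UTF8_BOM, 0, withBom, 0, UTF8_BOM.length);
        System.arraycopy(text, 0, withBom, UTF8_BOM.length, text.length);

        String decoded = decodeUtf8(withBom);
        // The BOM survives decoding as one extra leading character.
        System.out.printf("first char: U+%04X%n", (int) decoded.charAt(0)); // first char: U+FEFF
        System.out.println("length: " + decoded.length());                  // length: 14
    }
}
```

The 13-character string comes back as 14 characters, and the extra one at index 0 is U+FEFF.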

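You would think that a Java Reader that is decoding from UTF-8 would notice that there's a BOM at the beginning of the file, since the UTF-8 specification says it may be there. But no, it doesn't: the decoder passes the BOM through as the single character U+FEFF, which is most likely the ? you see in LocalMessage.txt. So it's up to you to read that first character and ignore it if it is U+FEFF.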
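One way to do that, sketched under the assumption that ftp.get(...) gives you a plain InputStream (it must, since you pass it to InputStreamReader): wrap the reader in a BufferedReader, mark the start, read one character, and reset unless that character was U+FEFF. The helper names openUtf8SkippingBom and readAll are hypothetical, not part of any library.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class BomSkipper {
    // Hypothetical helper: decodes a UTF-8 byte stream and drops a leading BOM (U+FEFF), if present.
    static BufferedReader openUtf8SkippingBom(InputStream raw) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(raw, StandardCharsets.UTF_8));
        reader.mark(1);             // remember the position before the first character
        int first = reader.read();
        if (first != '\uFEFF') {
            reader.reset();         // not a BOM (or an empty stream): put the character back
        }
        return reader;
    }

    // Hypothetical demo helper: reads a whole stream through openUtf8SkippingBom.
    static String readAll(InputStream raw) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = openUtf8SkippingBom(raw)) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'O', 'n', 'e'};
        byte[] withoutBom = {'O', 'n', 'e'};
        System.out.println(readAll(new ByteArrayInputStream(withBom)));    // One
        System.out.println(readAll(new ByteArrayInputStream(withoutBom))); // One
    }
}
```

In your code that would become something like BufferedReader converter = openUtf8SkippingBom(ftp.get("Message.txt"));. Checking the decoded character rather than the raw bytes keeps the helper short, because the UTF-8 decoder has already turned the three BOM bytes into one character, and files without a BOM pass through unchanged.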