A weird thing on CR and LF

 
Greenhorn
Posts: 22
Dear all, I encountered a weird problem. Let me explain.
First of all, I wrote a program to create a text file called unicode.txt, which was encoded using the UTF-16LE charset.

The content was only a single '\n', and I verified that the file size was 4 bytes. The binary representation was

according to UltraEdit.

Then I wrote another program that tries to read the file byte by byte

and the output just drove me crazy. It was
10
0(null)10
0(null)10
End of stream 5 bytes read.

P.S. (null) stands for the character whose ASCII value is zero.
So my question is: why is there no 13 (0x0D) in the output, and where does the last 10 come from?

Please explain ...

Thanks in advance !
 
Rancher
Posts: 43016
You need to be careful about encodings; any time you use methods like getBytes() and toString() without specifying an encoding you risk conversion problems. Rewrite the code to only deal with bytes -not characters and strings- and you should get exactly what's in the file.
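
A minimal sketch of the bytes-only approach, assuming the file name unicode.txt from the first post. The class name RawByteDump and the helper readAllBytes are made up for illustration and are not the original poster's code. No Reader, String, or charset is involved, so no conversion can happen:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class RawByteDump {
    // Reads every byte from the stream and returns the unsigned values (0-255).
    // InputStream.read() returns -1 at end of stream.
    static List<Integer> readAllBytes(InputStream in) throws IOException {
        List<Integer> values = new ArrayList<>();
        int b;
        while ((b = in.read()) != -1) {
            values.add(b);
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: default to the file name used in the first post.
        String filename = args.length > 0 ? args[0] : "unicode.txt";
        try (InputStream in = new FileInputStream(filename)) {
            List<Integer> values = readAllBytes(in);
            for (int v : values) {
                System.out.println(v);
            }
            System.out.println("End of stream, " + values.size() + " bytes read.");
        }
    }
}
```

Because the values never pass through a String, what gets printed should be exactly what a hex editor shows for the file.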
 
Sheriff
Posts: 21940

Tracy Tse wrote:


Wait, what? You first read the entire file into a String, then convert that into a byte[], then read from that again? Why not replace it with this:


That can be simplified without creating a new Integer and Character object:
Because in the first two ways the first value is a String, all + operations perform a String concatenation. The third form is actually what the second form does without appending the empty String at the start.
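
The code from this post did not survive, so the following is only a hedged illustration of the concatenation rule described above, not the original snippets. The class and method names are hypothetical. It shows that once a String is the left operand, every following + concatenates, while int + char without any String is plain integer addition:

```java
class ConcatDemo {
    // Verbose form: wraps both values in objects before concatenating.
    static String viaWrappers(int b) {
        return Integer.valueOf(b).toString() + Character.valueOf((char) b).toString();
    }

    // A String comes first, so every following + is a concatenation.
    static String viaEmptyString(int b) {
        return "" + b + (char) b;
    }

    // No String operand at all: int + char is integer addition.
    static int numericSum(int b) {
        return b + (char) b;
    }

    public static void main(String[] args) {
        int b = 65; // code of 'A'
        System.out.println(viaWrappers(b));    // 65A
        System.out.println(viaEmptyString(b)); // 65A
        System.out.println(numericSum(b));     // 130
    }
}
```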
 
Tracy Tse
Greenhorn
Posts: 22

Ulf Dittmer wrote:You need to be careful about encodings; any time you use methods like getBytes() and toString() without specifying an encoding you risk conversion problems. Rewrite the code to only deal with bytes -not characters and strings- and you should get exactly what's in the file.


I appreciate your advice, but I still cannot figure out what the problem is in essence.

My file was encoded using UTF-16LE, and now I want to read the file byte by byte (i.e. treat it as an ASCII text file), so I use the getBytes method without specifying an
encoding (by default it uses the platform's default charset; my OS is Windows XP SP3).
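
A small sketch of why getBytes is sensitive to the charset. The class name GetBytesDemo and the helper encode are hypothetical. The same one-character String produces different byte sequences depending on the charset, so bytes obtained from a String are the result of a fresh encoding step and not necessarily the bytes that were in the file:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class GetBytesDemo {
    // Encodes the text with an explicitly named charset.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String s = "\n";
        System.out.println(Arrays.toString(encode(s, StandardCharsets.UTF_16LE))); // [10, 0]
        System.out.println(Arrays.toString(encode(s, StandardCharsets.US_ASCII))); // [10]
        // No charset given: the result depends on the platform default.
        System.out.println(Arrays.toString(s.getBytes()));
    }
}
```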
 
Rob Spoor
Sheriff
Posts: 21940
What does BufferedInputFile.read look like? If that code is not using UTF-16LE for reading the contents then that's where the problem lies.
 
Tracy Tse
Greenhorn
Posts: 22


Thanks for the code optimization suggestions. I rewrote the code according to your thoughts, and it works.
So I guess the problem is in the code below

And the source code for the implementation of BufferedInputFile is below

What do you think?
 
Tracy Tse
Greenhorn
Posts: 22

Rob Prime wrote:What does BufferedInputFile.read look like? If that code is not using UTF-16LE for reading the contents then that's where the problem lies.


Please see my previous reply!
 
Rob Spoor
Sheriff
Posts: 21940
new FileReader(filename) always uses the platform default encoding. Replace it with "new InputStreamReader(new FileInputStream(filename), "UTF-16LE")" and see if that works.
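
A hedged sketch of the difference the charset makes. The actual BufferedInputFile source is not shown in the thread, so the read helper here is a hypothetical stand-in that reads line by line and re-appends '\n'. The sample bytes are an assumption (they are "\r\n" in UTF-16LE), and ISO-8859-1 stands in for a single-byte platform default. For a real file, pass new FileInputStream(filename) as the stream:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetAwareRead {
    // Hypothetical stand-in for BufferedInputFile.read: decodes the stream with
    // the given charset, reads it line by line, and appends '\n' after each line.
    static String read(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample: "\r\n" encoded as UTF-16LE (0D 00 0A 00).
        byte[] sample = {0x0D, 0x00, 0x0A, 0x00};
        String right = read(new ByteArrayInputStream(sample), StandardCharsets.UTF_16LE);
        String wrong = read(new ByteArrayInputStream(sample), StandardCharsets.ISO_8859_1);
        System.out.println(right.length()); // 1
        System.out.println(wrong.length()); // 5
    }
}
```

With the assumed sample bytes, decoding as UTF-16LE yields a single '\n', while the single-byte charset yields five characters (10, 0, 10, 0, 10): readLine() treats the lone '\r' as a line end, and the helper adds a '\n' after each of the three resulting lines. That pattern matches the output in the first post, but since neither the file bytes nor the helper survive in the thread, treat it as a plausible explanation and not a confirmed one.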
 
Tracy Tse
Greenhorn
Posts: 22
Thanks, it works.
 
Rob Spoor
Sheriff
Posts: 21940
You're welcome.
If the BufferedInputFile.read method is used in more places with different encodings you may want to consider adding the encoding as a parameter. You can overload the method to use a default encoding:
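
The code from this post is missing, so here is one possible shape of such an overload, as a sketch only. The class name BufferedInputFileSketch is hypothetical, and falling back to Charset.defaultCharset() in the one-argument version is an assumption chosen so that existing callers keep their current behavior:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class BufferedInputFileSketch {
    // Existing callers keep working: this overload falls back to the
    // platform default charset (an assumption, not the original code).
    static String read(String filename) throws IOException {
        return read(filename, Charset.defaultCharset());
    }

    // Callers that know the file's encoding pass it explicitly.
    static String read(String filename, Charset encoding) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(filename), encoding))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

A caller reading the file from this thread would then write something like read("unicode.txt", StandardCharsets.UTF_16LE).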
 