IO Streams and Characters.

Dinakar Kas
Ranch Hand

Joined: Jul 11, 2010
Posts: 34
Hi all,

My question is: when we read a file using the read() method of the InputStream class, like this:

InputStream in = new FileInputStream("a.txt");

int b;
while ((b = in.read()) != -1) {
    System.out.println((char) b);
}

the read method returns byte values that range from 0 to 255.
So, can we read a text file containing only ASCII and extended ASCII using streams? (If I don't want to use FileReader.)
What happens if I read a file containing some Unicode characters, as I mentioned above?

Also, I read that characters take two bytes of storage. Is it that characters in the ASCII range take only 1 byte and characters above that range take 2 bytes?

Thanks,
Dinakar.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19783

Dinakar Kas wrote: (If I don't want to use FileReader)

Why not? That's what it's for -- to read text files instead of binary files.

What happens if I read a file containing some Unicode characters, as I mentioned above?

Why don't you try it out?

Also, I read that characters take two bytes of storage. Is it like characters in the range of ASCII take only 1 byte and above those values take 2 bytes?

Character encoding. In Java, every char uses two bytes (well, more accurately, 16 bits) in memory; when characters are converted to bytes using a character encoding, each character can require one or more bytes. The ASCII encoding only supports characters from 0-127 (inclusive), and only takes one byte per character. UTF-16 takes two bytes per character for most characters (four for characters outside the Basic Multilingual Plane).
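To make the byte counts concrete, here is a minimal sketch that encodes a few single characters with two encodings and prints how many bytes each needs. The class name EncodingSizes, the helper encodedLength and the sample characters are hypothetical, chosen only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Number of bytes the text occupies once encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",       // U+0041, inside the ASCII range
            "\u00e9",  // U+00E9, e with acute accent, outside ASCII
            "\u20ac"   // U+20AC, euro sign
        };
        for (String s : samples) {
            // Each sample is one Java char; UTF-8 needs 1, 2 and 3 bytes
            // respectively, while UTF-16LE needs 2 bytes for each of them.
            System.out.println("U+" + Integer.toHexString(s.charAt(0)).toUpperCase()
                + " chars=" + s.length()
                + " UTF-8=" + encodedLength(s, StandardCharsets.UTF_8)
                + " UTF-16LE=" + encodedLength(s, StandardCharsets.UTF_16LE));
        }
    }
}
```

The same character can therefore have a different size on disk depending on the encoding chosen, even though it is always one 16-bit char in memory here.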


Dinakar Kas
Ranch Hand

Joined: Jul 11, 2010
Posts: 34
Hi Rob,
Thanks for your response. I still have some questions.

I created a .txt file with ANSI encoding. I read the file and wrote it to disk again. It worked fine when I used streams.
Then I changed the encoding of the file to Unicode and again read the file and wrote it to disk using streams. It did not work; the output was gibberish. One thing that surprises me is that when I read the file and write it to disk using FileReader and a Writer, the output is still nonsense.

My program is as follows:



Any input is highly appreciated.
Thanks.
Dinakar.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19783

First of all, I think you have one misunderstanding about FileReader. To be honest, this class is quite rubbish. It doesn't detect the real encoding of the file, but assumes the system default is used. A better solution is to use an InputStreamReader around a FileInputStream and specify the encoding manually.
I know that I mentioned using FileReader before, and it's still good if the file uses the system default encoding, but only then.
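As an illustration of wrapping a FileInputStream in an InputStreamReader with an explicit encoding, here is a minimal sketch. The class name ExplicitEncodingRead, the helper readAll, the temporary sample file and the choice of UTF-16 are all assumptions made for the example, not code from this thread.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitEncodingRead {
    // Decodes everything from the stream using the charset the caller names,
    // instead of relying on the platform default.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] cbuf = new char[1024];
            int n; // number of characters actually read in this pass
            while ((n = reader.read(cbuf)) != -1) {
                text.append(cbuf, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: a temporary file written as UTF-16.
        Path file = Files.createTempFile("sample", ".txt");
        Files.write(file, "h\u00e9llo".getBytes(StandardCharsets.UTF_16));
        try (InputStream in = new FileInputStream(file.toFile())) {
            String text = readAll(in, StandardCharsets.UTF_16);
            System.out.println(text.equals("h\u00e9llo"));
        }
        Files.delete(file);
    }
}
```

The charset passed to readAll has to match the one the file was actually saved with; the code cannot discover it on its own.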

There is one huge flaw in your code.
The reading is as it should be; the writing isn't, though. First of all, the third parameter is the number of characters to write. Even if you need to write the entire array, that parameter should be cbuf.length, not cbuf.length - 1. This way you're missing one character most of the time.

That said, you should never assume that you'll need to write the entire array. Although it's probably true for files for most iterations, it's wrong most of the time for the last iteration. Your file size will most likely not be a multiple of 20. If your file size is 32 your code will first write 19 characters (ignoring number 20), then write another 19, where only 12 should be written.

That's where b comes into play. It's the number of characters actually read into cbuf. Therefore, that's also the number of characters to write:
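A minimal sketch of such a loop follows. It is not the code from the original post: the names cbuf and b and the buffer size of 20 follow the discussion above, while the class name CopyChars, the helper copy and the in-memory StringReader and StringWriter (standing in for the file reader and writer) are assumptions for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class CopyChars {
    // Copies all characters from in to out, at most 20 per pass.
    static void copy(Reader in, Writer out) throws IOException {
        char[] cbuf = new char[20];
        int b; // number of characters actually read into cbuf
        while ((b = in.read(cbuf)) != -1) {
            // Write exactly the characters that were read, starting at index 0.
            out.write(cbuf, 0, b);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "abcdefghijklmnopqrstuvwxyz012345"; // 32 characters
        StringWriter out = new StringWriter();
        copy(new StringReader(text), out); // first pass copies 20, second copies 12
        System.out.println(out.toString().equals(text)); // prints "true"
    }
}
```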
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42918
If you use streams, then the actual bytes aren't altered - you can read and write them without knowing how the characters in the file are encoded.
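A minimal sketch of a byte-for-byte copy, to show that no encoding is involved at this level. The class name CopyBytes, the helper copy and the in-memory streams are assumptions for illustration; with files, a FileInputStream and FileOutputStream would take their place.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CopyBytes {
    // Copies raw bytes; nothing is decoded, so the encoding never matters.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int n; // number of bytes actually read in this pass
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample bytes in UTF-16; any other encoding would copy just as well.
        byte[] original = "h\u00e9llo".getBytes(StandardCharsets.UTF_16);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(original), out);
        System.out.println(Arrays.equals(original, out.toByteArray())); // prints "true"
    }
}
```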

But if you use Readers/Writers, you need to tell the JVM which encoding to use every single time. If you don't, then it's going to assume the platform default encoding, which most of the time is not what you want and, just as importantly, generally is not UTF-8.

So the problem is that you're specifying UTF-8 during writing, but you're not specifying it during reading. The fix is to specify it for reading too, which means using a FileInputStream and an InputStreamReader instead of a FileReader.

Edit: ... which is pretty much what Rob just said. Too late :-(
Dinakar Kas
Ranch Hand

Joined: Jul 11, 2010
Posts: 34
Thanks, Rob and Ulf, for explaining where I was going wrong.

I have written a program using InputStreamReader, and it works well.



Thanks once again for the input.
Dinakar
 