Reading Tabs

Anthony Smith
Ranch Hand

Joined: Sep 10, 2001
Posts: 285
I have a text file that contains the following
(<TAB> is an actual TAB keystroke):
US<TAB> United States USA
CA<TAB> Canada CAN

I just wanted to be able to access the 3 elements on each line, so I did the following:
import java.io.*;

public class file {

    public static void main(String[] args) {

        File csv = new File("wl.txt");
        try {
            DataInputStream in = new DataInputStream(
                    new FileInputStream("wl.txt"));

            DataOutputStream out = new DataOutputStream(
                    new FileOutputStream("w2.txt"));
            char chr;

            while (true) {

                // first field: everything up to the first TAB
                StringBuffer country_code = new StringBuffer(2);
                while ((chr = in.readChar()) != '\t') {
                    country_code.append(chr);
                }
                System.out.println("CC: " + country_code);

                // second field: everything up to the next TAB
                StringBuffer country_name = new StringBuffer(20);
                while ((chr = in.readChar()) != '\t') {
                    country_name.append(chr);
                }
                System.out.println("CN: " + country_name);

                // third field: everything up to the end of the line
                StringBuffer district = new StringBuffer(20);
                char lineSep = System.getProperty("line.separator").charAt(0);

                while ((chr = in.readChar()) != lineSep) {
                    district.append(chr);
                }
                System.out.println("D: " + district);
            }
        }
        catch (EOFException e) {
            // System.
        }
        catch (Exception e) {
        }
    }
}


When I look at the output of the following line, all I see is '?': System.out.println(chr);
What am I doing wrong?
Jim Yingst

Joined: Jan 30, 2000
Posts: 18671
The readChar() method of DataInputStream reads exactly two bytes and treats them as a single 16-bit Unicode char, high byte first. The problem is, most text files aren't stored that way - they're usually in your system's default encoding. On Windows in the Americas and Western Europe this is usually Cp1252, which is Microsoft's variant of the Latin-1 encoding (itself a superset of ASCII). It's a one-byte encoding - which means that the DataInputStream is grabbing two Cp1252 characters at a time and reinterpreting them as one Unicode char, which results in gibberish. Instead of DataInputStream, try a FileReader wrapped in a BufferedReader.
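The two-byte behavior of readChar() can be checked in isolation. The following is only a small sketch: the class name ReadCharDemo and the helper firstChar are made up for illustration, and the bytes are built in memory rather than read from wl.txt.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original post.
public class ReadCharDemo {

    // Reads one char the way DataInputStream does: two bytes, high byte first.
    static char firstChar(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readChar();
        }
    }

    public static void main(String[] args) throws IOException {
        // "US" followed by a TAB, encoded one byte per character,
        // as it would be in a typical single-byte text file.
        byte[] data = "US\t".getBytes(StandardCharsets.ISO_8859_1);

        // 'U' is 0x55 and 'S' is 0x53, so the two bytes are fused into 0x5553.
        char chr = firstChar(data);
        System.out.println("code unit: 0x" + Integer.toHexString(chr)); // prints "code unit: 0x5553"
    }
}
```

The fused value is neither 'U' nor 'S', and a console that cannot display that character will typically show '?' in its place.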

The FileReader takes care of translating bytes from the system default encoding into characters, and the BufferedReader takes care of reading one line at a time. What you do with each line you've read is up to you...
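A minimal sketch of that approach might look like the following. It assumes Java 8 or later and that every field on a line is separated by a TAB; the class name TabFileReader and the method readRows are invented for this example, and splitting on TAB is just one possible thing to do with each line.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not part of the original post.
public class TabFileReader {

    // Reads the input line by line and splits each non-empty line on TABs.
    static List<String[]> readRows(BufferedReader in) throws IOException {
        List<String[]> rows = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\t");
            for (int i = 0; i < fields.length; i++) {
                fields[i] = fields[i].trim(); // drop stray spaces around a field
            }
            rows.add(fields);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // FileReader decodes bytes using the platform default encoding;
        // BufferedReader adds readLine(). The reader is closed automatically.
        try (BufferedReader in = new BufferedReader(new FileReader("wl.txt"))) {
            for (String[] fields : readRows(in)) {
                System.out.println(String.join(" | ", fields));
            }
        }
    }
}
```

Because readLine() strips the line terminator itself, there is no need to look up line.separator or compare against its first character.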

"I'm not back." - Bill Harding, Twister