Is StringTokenizer a "Java Gotcha" for use in CSV processing?

 
Julian Corallo
Greenhorn
Posts: 8
Hi,
I dunno - maybe I'm just bitter!
I thought StringTokenizer was the way forward for processing comma-separated records, until I discovered that the "delimiter" you specify is actually a set of "delimiters": every character in the string is treated as a separate delimiter!!
So I have a comma separated string where the data in that string is enclosed in quotes - supposedly so that the data can contain the actual delimiter:
String test = "'somedata1','somedata2','some,data3'"; // 3rd item contains a comma
String delim = "','"; // doesn't work as a delimiter....
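To illustrate the behaviour described above, here is a minimal sketch. The class and method names (DelimiterSetDemo, tokens) are made up for illustration, and the code uses generics, so it assumes a newer JDK than 1.3.1. It shows that StringTokenizer treats the second argument as a set of single-character delimiters rather than as one three-character separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterSetDemo {
    // Collects every token StringTokenizer produces for the given delimiter set.
    static List<String> tokens(String input, String delims) {
        List<String> out = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(input, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String test = "'somedata1','somedata2','some,data3'";
        // "','" is read as the delimiter characters ' and , so the
        // comma inside the third item also splits it.
        System.out.println(tokens(test, "','"));
        // prints [somedata1, somedata2, some, data3]
    }
}
```

The third item comes back as two tokens, which is exactly the problem described in this post.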
Any ideas?
java 1.3.1 btw...
Cheers,
Julian
 
eammon bannon
Ranch Hand
Posts: 140
StringTokenizer is a pretty basic class, and it is behaving (unsurprisingly) as documented. What you want is a StreamTokenizer, which gives you more control over how you define your delimiters. Importantly for you, it handles quoted strings as single tokens.
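A rough sketch of that suggestion, assuming every field is wrapped in single quotes as in the original example. The names QuotedFieldsDemo and quotedFields are hypothetical, and the generics assume a newer JDK than 1.3.1. Note that StreamTokenizer also interprets backslash escapes inside quoted strings and, by default, treats '/' outside quotes as a comment character, which may or may not suit your data.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class QuotedFieldsDemo {
    // Returns the contents of every single-quoted string in the line.
    static List<String> quotedFields(String line) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        st.quoteChar('\''); // already a quote char by default; set here for clarity
        List<String> fields = new ArrayList<String>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // For a quoted string, ttype is the quote character
                // and sval holds the text between the quotes.
                if (st.ttype == '\'') {
                    fields.add(st.sval);
                }
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new IllegalStateException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(quotedFields("'somedata1','somedata2','some,data3'"));
        // prints [somedata1, somedata2, some,data3]
    }
}
```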
 
Tom Blough
Ranch Hand
Posts: 263
If everything is enclosed in single quotes, why not use that as the delimiter and just toss the tokens that consist of a single comma?
If you are using Java 1.4 or above, you could use String.split() or regular expressions to parse the lines. If not, you'll have to obtain or write a string parser class that does what you want.
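The first idea could look roughly like the sketch below. The names (TossCommaTokensDemo, fields) are invented for illustration, and trimming the token before comparing it to a comma is an added assumption so that ", " separators are dropped as well. Generics aside, it only relies on StringTokenizer, so the approach itself does not need Java 1.4.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TossCommaTokensDemo {
    // Splits on the single quote and discards the tokens that are just
    // the comma sitting between two quoted fields.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(line, "'");
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (!token.trim().equals(",")) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("'somedata1','somedata2','some,data3'"));
        // prints [somedata1, somedata2, some,data3]
    }
}
```

Two limitations worth knowing about: a field whose entire content is a comma is thrown away along with the separators, and empty fields ('') disappear because StringTokenizer never returns empty tokens.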
Tom Blough
[ March 26, 2004: Message edited by: Tom Blough ]
 
Julian Corallo
Greenhorn
Posts: 8
Thanks guys. I've had a play with StreamTokenizer - it's a bit of a clunky class to use but does the job just fine.
For info - I'm processing records dumped from a database, where char and varchar data is enclosed in quotes, and other data (numbers, timestamps etc.) is not...
e.g. 'column one char data', 'column two, char data', 3, 4, 'col 5' ....
Many thanks,
Jules
[ March 27, 2004: Message edited by: Julian Corallo ]
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
PJ Plauger (very cool author back in structured days, still writing great C++) once said you can think of any input stream with syntax rules as a mini language. Non-trivial syntax rapidly gets you beyond simplistic parsers like StringTokenizer. Look into regular expressions if you don't know them yet. Could be a very strong way to approach this.
BTW: Be sure to test a string in your database that contains a single quote. We hit this in names all the time like O'Reilly or something. How do they come out in your CSV file? You may have to handle escape characters.
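As a sketch of what handling escapes might involve, here is a small hand-written splitter. It assumes the dump escapes an embedded quote by doubling it ('O''Reilly'), which is a common SQL convention but is an assumption here; check what your database actually writes. The names QuoteAwareSplitter and split are made up, and StringBuilder and generics assume Java 5 or later.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplitter {
    // Splits on commas outside single quotes. Inside quotes, '' is read
    // as one literal quote. Unquoted fields are trimmed.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<String>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false; // currently between an opening and closing quote
        boolean quoted = false;   // the current field was a quoted one
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '\'') {
                    cur.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '\'') {
                    cur.append('\''); // doubled quote: keep one, skip the other
                    i++;
                } else {
                    inQuotes = false; // closing quote
                }
            } else if (c == '\'') {
                cur.setLength(0);     // drop any whitespace before the quote
                inQuotes = true;
                quoted = true;
            } else if (c == ',') {
                fields.add(quoted ? cur.toString() : cur.toString().trim());
                cur.setLength(0);
                quoted = false;
            } else if (!quoted) {
                cur.append(c);        // part of an unquoted field
            }
            // Anything after a closing quote and before the comma is ignored.
        }
        fields.add(quoted ? cur.toString() : cur.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("'O''Reilly', 'column two, char data', 3"));
        // prints [O'Reilly, column two, char data, 3]
    }
}
```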
 
chi Lin
Ranch Hand
Posts: 348
Just for fun (as you've resolved this already), I implemented a method that
extracts the items between delimiters, where the delimiter can be assigned
dynamically.

output :

column one char data
column two, char data
col 5
 
Thomas Paul
mister krabs
Ranch Hand
Posts: 13974
Why not use regular expressions? It would greatly simplify the code.
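One possible shape for a regex solution is sketched below. It needs java.util.regex, so Java 1.4 or later (and the generics and String.replace overload used here assume Java 5+). The pattern, the class name RegexFieldsDemo and the method name fields are illustrative choices, not the only way to do it. The pattern matches either a single-quoted field, where '' is assumed to stand for an embedded quote, or an unquoted run of characters up to the next comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFieldsDemo {
    // Group 1: contents of a quoted field ('' allowed inside).
    // Group 2: an unquoted field, starting at its first non-space character.
    private static final Pattern FIELD = Pattern.compile(
            "\\s*(?:'((?:[^']|'')*)'|([^,'\\s][^,']*))");

    static List<String> fields(String line) {
        List<String> out = new ArrayList<String>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add(m.group(1).replace("''", "'")); // undo quote doubling
            } else {
                out.add(m.group(2).trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("'column one char data', 'column two, char data', 3, 4, 'col 5'"));
        // prints [column one char data, column two, char data, 3, 4, col 5]
    }
}
```

Because the pattern looks for fields rather than separators, completely empty unquoted fields (two commas in a row) are skipped. If those matter in your data, the pattern would need extending.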
 
Julian Corallo
Greenhorn
Posts: 8
Hi
Thanks for all your replies.
I'm not experienced with regular expressions - can you give me an example of how I can approach this problem using them?
To be honest, I think I'm getting into knots with the StreamTokenizer now...
String myString = "'one word','two,words',three words, 4, 5.6, 787, 0.1111, ' ', 'one_word'";
The annoyance comes when it gets to three words: it splits it into two tokens because there is a whitespace character in between. If I set ordinaryChar(' ') then it just makes it worse.
Here is my test code:

Any help much appreciated....
Cheers,
Jules
[ March 29, 2004: Message edited by: Julian Corallo ]
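For the "three words" problem, one option that stays with StreamTokenizer is to clear its default syntax table and make the space a word character instead of an ordinary one. This is only a sketch under that assumption; the names MixedRecordDemo and fields are hypothetical, and generics assume a newer JDK than 1.3.1. After resetSyntax() number parsing is off, so numeric columns come back as plain strings.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class MixedRecordDemo {
    static List<String> fields(String line) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        st.resetSyntax();       // start with every character "ordinary"
        st.wordChars(' ', 255); // space and above are part of words
        st.ordinaryChar(',');   // except the comma, which separates fields
        st.quoteChar('\'');     // and the single quote, which delimits strings
        List<String> out = new ArrayList<String>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '\'') {
                    out.add(st.sval); // quoted field, kept exactly as written
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    String word = st.sval.trim();
                    if (word.length() > 0) { // skip the blanks around quoted fields
                        out.add(word);
                    }
                }
                // Comma tokens fall through and are ignored.
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String myString = "'one word','two,words',three words, 4, 5.6, 787, 0.1111, ' ', 'one_word'";
        System.out.println(fields(myString));
    }
}
```

With this setup "three words" should arrive as one token. A doubled quote such as 'O''Reilly' is not treated as an escape here; it would come back as two separate quoted tokens.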
 
chi Lin
Ranch Hand
Posts: 348

[ March 30, 2004: Message edited by: chi Lin ]
 
Dirk Schreckmann
Sheriff
Posts: 7023
If you're looking for an introduction to learning regular expressions in Java, let me recommend the two articles I wrote for the JavaRanch Journal.
An Introduction to java.util.regex - Lesson 1
An Introduction to java.util.regex - Lesson 2
Note that these articles mention a third-party regex package, available with a tutorial from http://www.javaregex.com. So, if you don't have access to Java 1.4 and the java.util.regex package, you could use that one instead.
Now, if you really want to learn the java.util.regex engine and understand regular expressions, then you may want to get your hands on a copy of a new book by JavaRanch's Mehran (Max) Habibi, "Java Regular Expressions: Taming the java.util.regex Engine".
[ March 30, 2004: Message edited by: Dirk Schreckmann ]
 