Is StringTokenizer a "Java Gotcha" for use in CSV processing?

Julian Corallo
Greenhorn

Joined: Mar 26, 2004
Posts: 8
Hi,
I dunno - maybe I'm just bitter!
I thought StringTokenizer was the way forward for processing comma-separated records, until I discovered that the "delimiter" you specify is actually a set of "delimiters"!!
So I have a comma separated string where the data in that string is enclosed in quotes - supposedly so that the data can contain the actual delimiter:
String test = "'somedata1','somedata2','some,data3'"; // 3rd item contains a comma
String delim = "','"; // doesn't work as a delimiter....
Any ideas?
java 1.3.1 btw...
Cheers,
Julian
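For anyone reading along, here is a minimal sketch of the behaviour described above. It assumes a current JDK (it uses generics, which 1.3.1 does not have), and the class and method names are made up for illustration. StringTokenizer treats the delimiter argument as a set of individual characters, so "','" means "split on any ' or any ,".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterSetDemo {
    // Collects every token StringTokenizer produces for the given delimiter string.
    static List<String> tokens(String input, String delims) {
        List<String> out = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(input, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String test = "'somedata1','somedata2','some,data3'";
        // "','" is read as the delimiter characters ' and , (not as the sequence ',')
        System.out.println(tokens(test, "','"));
        // prints [somedata1, somedata2, some, data3]
    }
}
```

The third item comes back as two tokens, which is exactly the problem in the question.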
eammon bannon
Ranch Hand

Joined: Mar 16, 2004
Posts: 140
StringTokenizer is a pretty basic object, and it is behaving (unsurprisingly) as documented. What you want is a StreamTokenizer, which gives you more control over how you define your delimiters - importantly for you, it returns a quoted string as a single token.
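A rough sketch of that suggestion, assuming every field is wrapped in single quotes as in the original test string. The class and method names are hypothetical, and the code is written for a current JDK (generics). When StreamTokenizer reads a quoted string, ttype is set to the quote character and sval holds the text between the quotes.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class QuotedTokenDemo {
    // Returns the contents of every single-quoted field on the line.
    static List<String> fields(String line) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        st.quoteChar('\''); // already a quote character by default; stated for clarity
        List<String> out = new ArrayList<String>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '\'') {
                    out.add(st.sval); // text between the quotes, commas included
                }
                // anything else (such as the ',' separator) is skipped
            }
        } catch (IOException e) {
            throw new IllegalStateException("Unexpected I/O error reading a String: " + e);
        }
        return out;
    }

    public static void main(String[] args) {
        String test = "'somedata1','somedata2','some,data3'";
        System.out.println(fields(test));
        // prints [somedata1, somedata2, some,data3]
    }
}
```

One thing to keep in mind: StreamTokenizer also interprets backslash escapes inside quoted strings, which may or may not suit the data.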
Tom Blough
Ranch Hand

Joined: Jul 31, 2003
Posts: 263
If everything is enclosed in single quotes, why not use that as the delimiter and just toss the tokens that consist of a single comma?
If you are using Java 1.4 or above, you could use String.split() or regular expressions to parse the lines. If not, you'll have to obtain or write a string parser class that does what you want.
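Both ideas can be sketched roughly as follows. This assumes every field is quoted and the line starts and ends with a quote; the class and method names are invented for the example, and the code targets a current JDK (String.split needs 1.4 or later).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class QuoteDelimiterDemo {
    // Tokenize on the quote character and discard the lone-comma tokens.
    static List<String> byTokenizer(String line) {
        List<String> out = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(line, "'");
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (!token.equals(",")) {
                out.add(token);
            }
        }
        return out;
    }

    // Strip the outer quotes, then split on the three-character sequence ','.
    static List<String> bySplit(String line) {
        String inner = line.substring(1, line.length() - 1);
        return Arrays.asList(inner.split("','"));
    }

    public static void main(String[] args) {
        String test = "'somedata1','somedata2','some,data3'";
        System.out.println(byTokenizer(test)); // prints [somedata1, somedata2, some,data3]
        System.out.println(bySplit(test));     // prints [somedata1, somedata2, some,data3]
    }
}
```

The tokenizer version silently drops empty fields ('') and any field whose whole value is a comma, so it only suits data where neither can occur.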
Tom Blough
[ March 26, 2004: Message edited by: Tom Blough ]

Tom Blough
Cum catapultae proscriptae erunt tum soli proscripti catapultas habebunt.
Julian Corallo
Greenhorn

Joined: Mar 26, 2004
Posts: 8
Thanks guys. I've had a play with StreamTokenizer - it's a bit of a clunky class to use but does the job just fine.
For info - I'm processing records dumped from a database - where char and varchar data is encapsulated, and other data, numbers, timestamps etc are not...
e.g. 'column one char data', 'column two, char data', 3, 4, 'col 5' ....
Many thanks,
Jules
[ March 27, 2004: Message edited by: Julian Corallo ]
Stan James
(instanceof Sidekick)
Ranch Hand

Joined: Jan 29, 2003
Posts: 8791
PJ Plauger (very cool author back in structured days, still writing great C++) once said you can think of any input stream with syntax rules as a mini language. Non-trivial syntax rapidly gets you beyond simplistic parsers like StringTokenizer. Look into regular expressions if you don't know them yet. Could be a very strong way to approach this.
BTW: Be sure to test a string in your database that contains a single quote. We hit this in names all the time like O'Reilly or something. How do they come out in your CSV file? You may have to handle escape characters.
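To make the escape point concrete, here is a small hand-written parser sketch. It ASSUMES the dump escapes an embedded quote by doubling it (SQL style, 'O''Reilly'); that is a guess, so check what the real export does. Names are hypothetical and the code is for a current JDK.

```java
import java.util.ArrayList;
import java.util.List;

class EscapedQuoteDemo {
    // Splits one record; quoted fields may contain commas and doubled quotes ('').
    static List<String> parse(String line) {
        List<String> out = new ArrayList<String>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean wasQuoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '\'') {
                    field.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '\'') {
                    field.append('\''); // assumed escape: '' means one literal quote
                    i++;
                } else {
                    inQuotes = false;   // closing quote
                }
            } else if (c == '\'') {
                inQuotes = true;
                wasQuoted = true;
                field.setLength(0);     // drop padding before the opening quote
            } else if (c == ',') {
                out.add(wasQuoted ? field.toString() : field.toString().trim());
                field.setLength(0);
                wasQuoted = false;
            } else if (!wasQuoted) {
                field.append(c);        // unquoted data; padding after a closing quote is ignored
            }
        }
        out.add(wasQuoted ? field.toString() : field.toString().trim());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("'O''Reilly', 'a,b', 3"));
        // prints [O'Reilly, a,b, 3]
    }
}
```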


A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
chi Lin
Ranch Hand

Joined: Aug 24, 2001
Posts: 348
Just for fun (as you've resolved this already), I implemented a method that
extracts the items between delimiters; the delimiter can be assigned
dynamically.

output :

column one char data
column two, char data
col 5
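The code from this post is not shown above. As a stand-in only (not the poster's method), here is one way such a helper could look, using indexOf to pull out whatever sits between pairs of a caller-supplied delimiter. The names are invented and the code assumes a current JDK.

```java
import java.util.ArrayList;
import java.util.List;

class BetweenDelimitersDemo {
    // Returns each stretch of text enclosed by a pair of the given delimiter.
    static List<String> between(String line, String delim) {
        if (delim.length() == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> out = new ArrayList<String>();
        int open = line.indexOf(delim);
        while (open >= 0) {
            int start = open + delim.length();
            int close = line.indexOf(delim, start);
            if (close < 0) {
                break; // unmatched opening delimiter: stop
            }
            out.add(line.substring(start, close));
            open = line.indexOf(delim, close + delim.length());
        }
        return out;
    }

    public static void main(String[] args) {
        String record = "'column one char data', 'column two, char data', 3, 4, 'col 5'";
        for (String item : between(record, "'")) {
            System.out.println(item);
        }
    }
}
```

Run against the sample record from earlier in the thread, this yields the three quoted items and ignores the unquoted numbers.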
Thomas Paul
mister krabs
Ranch Hand

Joined: May 05, 2000
Posts: 13974
Why not use regular expressions? It would greatly simplify the code.
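A minimal sketch of a regex approach, assuming Java 1.4 or later for java.util.regex (and generics from a current JDK). The pattern and names are illustrative, not a full CSV grammar: a field is either a quoted run, captured without its quotes, or a run of characters containing no comma or quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFieldsDemo {
    // Alternative 1: 'quoted text' (group 1). Alternative 2: bare text without , or '.
    private static final Pattern FIELD = Pattern.compile("'([^']*)'|[^,']+");

    static List<String> fields(String line) {
        List<String> out = new ArrayList<String>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add(m.group(1)); // quoted: keep as is, commas and spaces included
            } else {
                String bare = m.group().trim();
                if (bare.length() > 0) {
                    out.add(bare);   // unquoted value such as a number
                }
                // otherwise it was only padding around a separator
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String record = "'column one char data', 'column two, char data', 3, 4, 'col 5'";
        for (String field : fields(record)) {
            System.out.println(field);
        }
    }
}
```

Known gaps: an empty unquoted field (two commas in a row) is dropped, and a quote inside a quoted field is not handled.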


Associate Instructor - Hofstra University
Amazon Top 750 reviewer - Blog - Unresolved References - Book Review Blog
Julian Corallo
Greenhorn

Joined: Mar 26, 2004
Posts: 8
Hi
Thanks for all your replies.
I'm not experienced with regular expressions - can you give me an example of how I can approach this problem using them?
To be honest, I think I'm getting into knots with the StreamTokenizer now...
String myString = "'one word','two,words',three words, 4, 5.6, 787, 0.1111, ' ', 'one_word'";
The annoyance comes when it gets to three words: it splits it into two tokens because there is a whitespace char in between. If I set ordinaryChar(' ') then it just makes it worse.
Here is my test code:

Any help much appreciated....
Cheers,
Jules
[ March 29, 2004: Message edited by: Julian Corallo ]
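One possible way around the whitespace problem, offered as a sketch rather than a fix for the test code in question: clear StreamTokenizer's syntax table with resetSyntax(), then declare everything from the space upward a word character, with only the comma and the quote treated specially. Names are hypothetical; the code assumes a current JDK.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class MixedRecordDemo {
    static List<String> fields(String line) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        st.resetSyntax();       // blank table: no whitespace skipping, no number parsing
        st.wordChars(' ', 255); // spaces and everything above count as word characters
        st.ordinaryChar(',');   // except the separator
        st.quoteChar('\'');     // and the quote
        List<String> out = new ArrayList<String>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '\'') {
                    out.add(st.sval);            // quoted field, kept verbatim
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    String bare = st.sval.trim();
                    if (bare.length() > 0) {
                        out.add(bare);           // unquoted field, padding removed
                    }
                }
            }
        } catch (IOException e) {
            throw new IllegalStateException("Unexpected I/O error reading a String: " + e);
        }
        return out;
    }

    public static void main(String[] args) {
        String myString = "'one word','two,words',three words, 4, 5.6, 787, 0.1111, ' ', 'one_word'";
        for (String field : fields(myString)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

With this setup "three words" stays in one piece, and numbers come back as strings (in sval) rather than in nval, so they need converting separately.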
chi Lin
Ranch Hand

Joined: Aug 24, 2001
Posts: 348

[ March 30, 2004: Message edited by: chi Lin ]
Dirk Schreckmann
Sheriff

Joined: Dec 10, 2001
Posts: 7023
If you're looking for an introduction to learning regular expressions in Java, let me recommend the two articles I wrote for the JavaRanch Journal.
An Introduction to java.util.regex - Lesson 1
An Introduction to java.util.regex - Lesson 2
Note that these articles mention a third party regex package available with tutorial from http://www.javaregex.com. So, if you don't have access to Java 1.4 and the java.util.regex package, then you could use this one.
Now, if you really want to learn the java.util.regex engine and understand regular expressions, then you may want to get your hands on a copy of a new book by JavaRanch's Mehran (Max) Habibi, "Java Regular Expressions: Taming the java.util.regex Engine".
[ March 30, 2004: Message edited by: Dirk Schreckmann ]
