Hi, I dunno, maybe I'm just bitter! I thought StringTokenizer was the way forward for processing comma-separated records, until I discovered that the "delimiter" you specify is actually a set of "delimiters"! So I have a comma-separated string where each field is enclosed in quotes, supposedly so that the data can contain the delimiter itself:

String test = "'somedata1','somedata2','some,data3'"; // 3rd item contains a comma
String delim = "','"; // doesn't work as a delimiter...

Any ideas? Java 1.3.1, btw.
Cheers, Julian
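To see what is happening, here is a minimal sketch (the class and method names are made up for illustration, and it uses generics, so it needs Java 5 or later rather than 1.3.1). StringTokenizer treats its second argument as a set of single delimiter characters, so "','" means "split on either ',' or '\''", and the comma inside the third field splits it too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterSetDemo {
    // Collects every token StringTokenizer produces for the given delimiter string.
    static List<String> tokens(String input, String delim) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delim);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String test = "'somedata1','somedata2','some,data3'";
        // "','" is read as the character set { ',', '\'' }, not as a 3-char separator,
        // so the comma inside the third field also splits it.
        System.out.println(tokens(test, "','")); // [somedata1, somedata2, some, data3]
    }
}
```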
StringTokenizer is a pretty basic object, and it is behaving (unsurprisingly) as documented: every character in the delimiter string is treated as a separate delimiter. What you want is StreamTokenizer, which gives you more control over how you define your delimiters; importantly for you, it treats quoted strings as single tokens.
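A minimal sketch of the StreamTokenizer approach, assuming every field is wrapped in single quotes and fields are separated by commas (the class and method names are hypothetical, and the code targets a current JDK). quoteChar('\'') makes everything between a pair of quotes one token, so the embedded comma survives. Note that StreamTokenizer also interprets backslash escapes inside quoted strings, which may matter if your data contains backslashes.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class QuotedFieldTokenizer {
    // Returns the contents of each single-quoted field; separators are skipped.
    static List<String> quotedFields(String line) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        st.resetSyntax();               // start from a blank syntax table
        st.quoteChar('\'');             // text between single quotes is one token
        st.whitespaceChars(',', ',');   // treat the comma as a separator
        st.whitespaceChars(' ', ' ');   // and ignore spaces around it
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '\'') { // a quoted-string token; sval holds its contents
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            // StringReader does not really throw, but nextToken() declares it
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(quotedFields("'somedata1','somedata2','some,data3'"));
    }
}
```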
If everything is enclosed in single quotes, why not use that as the delimiter and just toss the tokens that consist of a single comma? If you are using Java 1.4 or above, you could use String.split() or regular expressions to parse the lines. If not, you'll have to obtain or write a string parser class that does what you want. Tom Blough [ March 26, 2004: Message edited by: Tom Blough ]
Tom Blough
Cum catapultae proscriptae erunt tum soli proscripti catapultas habebunt. (When catapults are outlawed, only outlaws will have catapults.)
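Tom's two suggestions might look something like this; a sketch assuming every field is quoted and fields are separated by a bare comma with no surrounding spaces. The class and method names are invented for illustration, the code uses generics (Java 5+), and String.split() itself needs Java 1.4 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class QuoteDelimiterSplit {
    // Use the quote as the only delimiter and discard the bare "," tokens
    // that sit between a closing quote and the next opening quote.
    static List<String> byQuoteTokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "'");
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (!tok.equals(",")) {
                out.add(tok);
            }
        }
        return out;
    }

    // Java 1.4+ alternative: strip the outer quotes, then split on the literal "','".
    // The -1 limit keeps empty trailing fields instead of dropping them.
    static String[] bySplit(String line) {
        String inner = line.substring(1, line.length() - 1);
        return inner.split("','", -1);
    }
}
```

The StringTokenizer version silently drops empty fields ('') and any field whose whole content is a single comma; the split() version keeps empty fields, but breaks if a separator ever has spaces around it.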
Thanks, guys. I've had a play with StreamTokenizer; it's a bit of a clunky class to use, but it does the job just fine. For info, I'm processing records dumped from a database, where char and varchar data is enclosed in quotes and other data (numbers, timestamps, etc.) is not, e.g.: 'column one char data', 'column two, char data', 3, 4, 'col 5' ... Many thanks, Jules [ March 27, 2004: Message edited by: Julian Corallo ]
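For records like the one above, StreamTokenizer's default syntax already does most of the work: ' is a quote character, spaces are whitespace, and parseNumbers() is on, so unquoted numbers come back as TT_NUMBER with the value in nval (always a double, so 3 becomes 3.0). A sketch under those assumptions (hypothetical class name; current JDK):

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class RecordTokenizer {
    // Splits one dumped record into fields. Quoted char/varchar data comes back as
    // String; unquoted numeric columns come back as Double.
    static List<Object> fields(String record) {
        // Default syntax: ' and " are quote chars, space is whitespace,
        // numbers are parsed, and ',' is an ordinary char that we skip.
        StreamTokenizer st = new StreamTokenizer(new StringReader(record));
        List<Object> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case '\'':
                        out.add(st.sval);   // quoted text, commas and spaces intact
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add(st.nval);   // unquoted number, boxed to Double
                        break;
                    case ',':
                        break;              // field separator
                    default:
                        throw new IllegalArgumentException("Unexpected token: " + st);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Unquoted values that are not plain numbers, such as timestamps, would not come through as a single token and would need extra handling. Also note that the default syntax makes '/' a comment character outside quotes.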
PJ Plauger (a very cool author back in the structured-programming days, still writing great C++) once said you can think of any input stream with syntax rules as a mini-language. Non-trivial syntax rapidly takes you beyond simplistic parsers like StringTokenizer. Look into regular expressions if you don't know them yet; they could be a very strong way to approach this. BTW: be sure to test a string in your database that contains a single quote. We hit this all the time with names like O'Reilly. How do they come out in your CSV file? You may have to handle escape characters.
A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi
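If the dump escapes an embedded quote by doubling it ('O''Reilly'), which is the usual SQL convention but only an assumption here (check what your export tool actually writes), a regular expression can treat '' inside a quoted field as part of the data. A sketch (java.util.regex needs Java 1.4+; the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedQuoteParser {
    // ASSUMPTION: an embedded quote is written as two quotes, e.g. 'O''Reilly'.
    // Group 1: quoted field body (a run of non-quote chars or '' pairs).
    // Group 2: unquoted field (no commas or quotes). Leading spaces are skipped.
    private static final Pattern FIELD =
        Pattern.compile("\\s*(?:'((?:[^']|'')*)'|([^,']+))");

    static List<String> parse(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add(m.group(1).replace("''", "'")); // un-double the quotes
            } else {
                out.add(m.group(2).trim());
            }
        }
        return out;
    }
}
```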
Hi, thanks for all your replies. I'm not experienced with regular expressions; can you give me an example of how I could approach this problem using them? To be honest, I think I'm getting tied in knots with StreamTokenizer now...

String myString = "'one word','two,words',three words, 4, 5.6, 787, 0.1111, ' ', 'one_word'";

The annoyance comes when it gets to three words: it splits it into two tokens because there is a whitespace character in between. If I set ordinaryChar(' '), it just makes it worse. Here is my test code:
Any help much appreciated.... Cheers, Jules [ March 29, 2004: Message edited by: Julian Corallo ]
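One regex-based answer to Jules's question, as a sketch rather than a complete CSV parser: each field is either a single-quoted string (group 1) or a run of characters that are neither commas nor quotes (group 2), with leading whitespace skipped. Matcher.find() walks from field to field, so the separating commas are simply never matched. It requires java.util.regex (Java 1.4+); the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MixedFieldParser {
    // Group 1: text inside single quotes (may contain commas and spaces).
    // Group 2: unquoted data, i.e. a run of chars that are neither ',' nor '\''.
    private static final Pattern FIELD =
        Pattern.compile("\\s*(?:'([^']*)'|([^,']+))");

    static List<String> parse(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add(m.group(1));        // quoted: keep contents exactly, even " "
            } else {
                out.add(m.group(2).trim()); // unquoted: drop surrounding spaces
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String myString = "'one word','two,words',three words, 4, 5.6, 787, 0.1111, ' ', 'one_word'";
        for (String field : parse(myString)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Empty unquoted fields (two commas in a row) are skipped rather than returned as empty strings, and quotes inside quoted data are not handled; both would need extra rules.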