In Line 1 there are 4 words (aaa, bbb, ccc, ddd), which get assigned to the variables first, second, third and fourth. In Line 2 there are only 3 words; the word that should be assigned to the variable third is missing.
For Line 2, I want the variable 'third' to be left unassigned. But StringTokenizer treats every space as a delimiter, so the fourth word ends up in 'third' instead.
How can this be done?
[ EJFH: Added "CODE" tags to preserve formatting in data file. ] [ August 27, 2007: Message edited by: Ernest Friedman-Hill ]
How do you know that it's the third word that's missing and not the first, second or fourth? From your post there doesn't appear to be a double space or any other marker to show which word is missing.
If there is a double space then you can split the line successfully using the String.split(..) method (which is actually the preferred way of splitting strings these days).
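Here is a rough sketch of the split idea. It assumes the data file uses exactly one space between words, so a missing word leaves two spaces in a row; the sample lines and the class name SplitFields are made up for illustration, since the original data file isn't shown here.

```java
import java.util.Arrays;

class SplitFields {
    // Split on every single space. The negative limit keeps trailing
    // empty strings, so a missing last word is not silently dropped.
    static String[] fields(String line) {
        return line.split(" ", -1);
    }

    public static void main(String[] args) {
        // Assumed sample data: the second line has two spaces where ccc would be.
        String[] full = fields("aaa bbb ccc ddd");
        String[] gap = fields("aaa bbb  ddd");

        System.out.println(Arrays.toString(full)); // [aaa, bbb, ccc, ddd]
        System.out.println(Arrays.toString(gap));  // [aaa, bbb, , ddd]
        System.out.println("third missing: " + gap[2].isEmpty()); // true
    }
}
```

The missing word shows up as an empty string at index 2, so 'third' can be left unset whenever that element is empty.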
I don't know if it is the best solution, but take a look at StringTokenizer's constructors. There is one that lets you specify whether the delimiters (the spaces) should also be returned as tokens. Then you can check what it returns: if it returns two consecutive spaces, a word is missing at that position.
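Something along these lines might work for the StringTokenizer approach. It is only a sketch: it uses the three-argument constructor with returnDelims set to true, and it assumes a single space separates words so that two spaces in a row mean an empty field. The class name TokenizerFields and the sample line are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerFields {
    static List<String> fields(String line) {
        // returnDelims = true: each space comes back as its own one-character token.
        StringTokenizer st = new StringTokenizer(line, " ", true);
        List<String> out = new ArrayList<>();
        boolean lastWasDelim = true; // treat the start of the line like a delimiter
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.equals(" ")) {
                if (lastWasDelim) {
                    out.add(""); // two delimiters in a row: the field between them is empty
                }
                lastWasDelim = true;
            } else {
                out.add(tok);
                lastWasDelim = false;
            }
        }
        if (lastWasDelim) {
            out.add(""); // line ended on a delimiter, so the last field is empty
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample line with two spaces where the third word would be.
        System.out.println(fields("aaa bbb  ddd")); // [aaa, bbb, , ddd]
    }
}
```

An empty string in the list marks a missing word, so the caller can skip the assignment to 'third' when element 2 is empty.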
If your text file is the kind of fixed format that it looks like (meaning that each word position is fixed at a specified index; e.g. firstName starts at position 1 in the line, lastName starts at position 10, middleInitial starts at position 25, etc.) then you might want to look up the substring methods of class String; they take a beginning index and an optional ending index for the positions in which you expect to find words. If you split each line into substrings at the appropriate beginning indexes, you can then check each resulting substring for a word or the absence of one. If your words are all the same length when they exist, then the option mentioned earlier of looking for contiguous spaces would probably be easier and less memory intensive.
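A small sketch of the substring idea, in case it helps. The column layout here is purely an assumption for illustration (three-letter words starting at indexes 0, 4, 8 and 12); the real start positions would have to come from your file's format. The class name FixedWidthFields is also made up.

```java
class FixedWidthFields {
    // Hypothetical layout: each word is up to 3 characters wide and
    // starts at one of these zero-based column indexes.
    static final int[] STARTS = {0, 4, 8, 12};
    static final int WIDTH = 3;

    // Returns the word in the given column, or "" if the column is blank
    // or the line is too short to reach it.
    static String field(String line, int index) {
        int start = STARTS[index];
        if (start >= line.length()) {
            return "";
        }
        int end = Math.min(start + WIDTH, line.length());
        return line.substring(start, end).trim();
    }

    public static void main(String[] args) {
        // Assumed sample line: the third column (indexes 8 to 10) is blank.
        String line = "aaa bbb     ddd";
        System.out.println("[" + field(line, 2) + "]"); // []
        System.out.println("[" + field(line, 3) + "]"); // [ddd]
    }
}
```

Because each word is read from its own column, a blank column comes back as an empty string no matter how many spaces surround it.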