I'm working on a basic email filter. It reads in a text file of several hundred email messages and, for each message, increments a message counter, tokenises the message, counts the number of occurrences of each token in that email, and stores the counts in a dictionary.

I have built a couple of classes that handle parts of this, and they work OK. However, I can't figure out how to split the text file into individual messages. At the moment I just read the whole text file in, with no way of separating one message from the next. The messages in the text file are separated by 10 dashes (----------).

Here are my classes so far: Tokeniser.java Comwords.java (removes common words like 'the', 'a' etc.)

Does anyone have any ideas how I might solve this problem? Thanks
posted 11 years ago
Can you just test for a line of ten dashes immediately after the read - the first line inside your loop?
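As a rough sketch of that idea, something like the following might do. The class and method names (MessageSplitter, split) are made up for illustration and are not from your code, and it assumes the separator sits on a line of its own containing exactly ten dashes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original Tokeniser/Comwords classes.
public class MessageSplitter {
    // Assumption: messages are separated by a line of exactly ten dashes.
    static final String SEPARATOR = "----------";

    // Reads line by line; a separator line closes the current message.
    static List<String> split(Reader source) throws IOException {
        List<String> messages = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().equals(SEPARATOR)) {
                addIfNotEmpty(messages, current);
                current.setLength(0); // start collecting the next message
            } else {
                current.append(line).append('\n');
            }
        }
        addIfNotEmpty(messages, current); // last message may have no trailing separator
        return messages;
    }

    // Skips empty chunks, e.g. when the file starts or ends with a separator.
    private static void addIfNotEmpty(List<String> messages, StringBuilder text) {
        if (text.length() > 0) {
            messages.add(text.toString());
        }
    }

    public static void main(String[] args) throws IOException {
        String sample = "Hello Bob\nHow are you?\n----------\nBuy cheap pills\n----------\n";
        List<String> messages = split(new StringReader(sample));
        System.out.println(messages.size() + " messages found");
    }
}
```

For a real file you would pass a FileReader instead of the StringReader, then hand each returned string to your tokeniser and bump the message counter once per list entry.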
For a real change of direction, if you have JDK 5 look at Scanner. That could probably read one message at a time by using the dashes as a delimiter.
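A minimal sketch of the Scanner approach, under the same assumptions: the class name ScannerSplitter is invented for the example, and the delimiter pattern expects each separator to be a full line of exactly ten dashes with a line break before and after it. A file that ends with the dashes but no final newline would need extra handling.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, not part of the original classes.
public class ScannerSplitter {
    // Assumption: a separator is a line break, ten dashes, then another line break.
    static final String DELIMITER = "\\r?\\n-{10}\\r?\\n";

    // Each call to next() returns everything up to the next separator line.
    static List<String> split(Readable source) {
        List<String> messages = new ArrayList<String>();
        Scanner scanner = new Scanner(source);
        scanner.useDelimiter(DELIMITER);
        while (scanner.hasNext()) {
            String message = scanner.next().trim();
            if (message.length() > 0) { // ignore empty chunks
                messages.add(message);
            }
        }
        scanner.close();
        return messages;
    }

    public static void main(String[] args) {
        String sample = "Hello Bob\nHow are you?\n----------\nBuy cheap pills\n----------\n";
        for (String message : split(new StringReader(sample))) {
            System.out.println("[" + message + "]");
        }
    }
}
```

Because the pattern requires a line break straight after the tenth dash, a line of eleven or more dashes is not treated as a separator.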
Any chance you'll be bitten by somebody including a line of dashes in their mail message?
A good question is never answered. It is not a bolt to be tightened into place but a seed to be planted and to bear more seed toward the hope of greening the landscape of the idea. John Ciardi