Add text file to array but include punctuation

 
Greenhorn
Posts: 11
Hi,

I need to write a program that very crudely mimics file compression.

We need to read a file in, replace words with ASCII chars, write to a file (including the compression "key" at the beginning of the file) and eventually decompress the file (using the key at the beginning of the file).

My code may be very clunky, but it works pretty well and this assignment is due next week and has taken me two weeks to get this far, so I can't really reinvent the wheel.

Here is high level what I am doing:

Read file in and split at spaces, punctuation, numbers (any non-word). Just the words go to a BST. BST goes to a 2D array where words are matched with the ASCII symbols. I read through the original buffered reader and sub the symbols for the words and write it to the file. My problem is that I cannot figure out how to preserve the punctuation from the original file. I don't want or need it for the word and symbol matching, but I do need it to really compress the original file or I will end up with just a bunch of words.

I realize I can remove my split, but then how do I parse my file so that I can sub the symbols? I looked at tokenizer, but that just seems to work in the same way the split does.

Thoughts?

Thanks!
Megan
 
Rancher
Posts: 3742
16

Meg Berg wrote: I looked at tokenizer, but that just seems to work in the same way the split does.


Assuming you mean the StringTokenizer class, it does have one feature that String.split doesn't have. It has a constructor that takes a boolean argument, returnDelims. If you set this to true, the tokenizer will include the delimiting characters in the list of tokens it returns.
So, for each token returned, you check whether it is in your list of compressible tokens. If it is, you compress it and add it to your output; otherwise you just add it to your output as is.
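
As a rough sketch of that loop (not the original poster's code): the class name, the delimiter string, and the two-column table standing in for the 2D array are all made up for illustration. The only library piece relied on is the three-argument StringTokenizer constructor with returnDelims set to true.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerSketch {
    // Assumed delimiter set: whitespace, common punctuation, and digits.
    // Adjust it to whatever counts as a "non-word" in the assignment.
    static final String DELIMS = " \t\r\n.,;:!?\"'()[]{}-0123456789";

    // Splits a line into words AND delimiters; with returnDelims = true,
    // each delimiter character comes back as its own one-character token.
    static List<String> tokenize(String line) {
        StringTokenizer st = new StringTokenizer(line, DELIMS, true);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Hypothetical lookup in a 2D array: row[0] is the word, row[1] its symbol.
    // Returns the token unchanged when it is not in the table.
    static String lookup(String token, String[][] table) {
        for (String[] row : table) {
            if (row[0].equals(token)) {
                return row[1];
            }
        }
        return token;
    }

    // Replaces known words with their symbols; everything else
    // (punctuation, spaces, unknown words) is copied through as is.
    static String compressLine(String line, String[][] table) {
        StringBuilder out = new StringBuilder();
        for (String token : tokenize(line)) {
            out.append(lookup(token, table));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[][] table = { {"the", "~"}, {"cat", "^"} };
        System.out.println(compressLine("the cat, the hat.", table)); // prints "~ ^, ~ hat."
    }
}
```

Note that with an apostrophe in the delimiter set, a word like "don't" comes back as three tokens (don, ', t), so the set may need tuning to match how the words were split when the table was built.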
 
Joanne Neal
Rancher
Posts: 3742
16

Meg Berg wrote: BST goes to a 2D array where words are matched with the ASCII symbols.


Just as an FYI, a Map would be a better option than a 2D array here, but obviously if you haven't studied Maps yet, then just treat this as a bit of advice to bear in mind for the future.
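
For future reference, a minimal sketch of what the Map version might look like. The class and method names and the symbol string are invented for illustration; only HashMap, put and getOrDefault come from the standard library.

```java
import java.util.HashMap;
import java.util.Map;

public class MapSketch {
    // Pairs each word with one symbol character, in order.
    // 'words' stands in for the words read out of the BST; 'symbols'
    // is a hypothetical string of the ASCII characters used as codes.
    static Map<String, String> buildCodes(String[] words, String symbols) {
        Map<String, String> codes = new HashMap<>();
        for (int i = 0; i < words.length && i < symbols.length(); i++) {
            codes.put(words[i], String.valueOf(symbols.charAt(i)));
        }
        return codes;
    }

    // Returns the symbol for a known word, or the token itself otherwise,
    // so punctuation and unknown words pass through untouched.
    static String encode(String token, Map<String, String> codes) {
        return codes.getOrDefault(token, token);
    }

    public static void main(String[] args) {
        Map<String, String> codes = buildCodes(new String[] {"cat", "the"}, "^~");
        System.out.println(encode("the", codes)); // prints "~"
        System.out.println(encode(",", codes));   // prints ","
    }
}
```

The main gain over the 2D array is that the lookup is a single call instead of a loop over every row.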
 
Meg Berg
Greenhorn
Posts: 11
Thanks Joanne. I really appreciate your help! That is exactly what I needed to do. I will remember the maps for the future.

Megan