The character encoding is 8 bit US ASCII.
US-ASCII Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
Seems like too much work to me, but if you do want the charset to be user-configurable you kind of need to do this.
E.g. UTF-16 is also out, unless you want to do additional work to figure out how to validate the length of an input to make sure its encoded length is within the available space. It can be done, but it's more conceptual overhead. E.g. users may be confused why they were normally allowed 16 chars for one field, but when they switch to UTF-16 it's 8, and when they use UTF-8 and insert a two-byte character such as é they can only have 15. Easiest to just require that the encoding is 1 byte per char, I think.
Not sure which type of overhead you mean, so I'll clarify my own words. When I referred to "conceptual overhead" I didn't mean anything to do with performance or how much code must be written, but with increased complexity in understanding how the code works (either for users or other programmers).
So - let's say we have a "name" field which has 16 bytes of storage in the DB file. Using US-ASCII, this means we can tell the user that the field length is 16, right? Now let's say he restarts the program to use UTF-16. Hurm, well for starters any old data that was stored in US-ASCII is now going to look like crap. But let's say he ignores this and only concentrates on creating some new data. If he tries to enter more than 8 chars for the name, we now need to inform the user that the field length is effectively 8, right? More than that, we don't have space in the file for storage. From the user's point of view, this is an additional complication - why did the length change? Seems weird.
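You can see the problem directly in Java: the same field value needs a different number of bytes depending on the charset. A minimal sketch (the "Fred" / "Frédéric" values are just made-up examples):

```java
import java.nio.charset.StandardCharsets;

public class EncodedLengthDemo {
    public static void main(String[] args) {
        String name = "Fred"; // 4 characters
        // US-ASCII: exactly 1 byte per char
        System.out.println(name.getBytes(StandardCharsets.US_ASCII).length); // 4
        // UTF-16: 2 bytes per char, plus a 2-byte byte-order mark
        System.out.println(name.getBytes(StandardCharsets.UTF_16).length);   // 10
        // UTF-8: ASCII chars take 1 byte each, but each é takes 2
        System.out.println("Frédéric".getBytes(StandardCharsets.UTF_8).length); // 10
    }
}
```

So a 16-byte field holds 16 chars in US-ASCII, but at most 7 in UTF-16 (after the BOM) and anywhere from 8 to 16 in UTF-8 - exactly the "why did the length change?" confusion described above.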
So I'd keep this option out of any GUI config screen that's accessible to the average user. Maybe just let encoding be configured by editing the props file. (I know, the user shouldn't be required to do this - but in the requirements they're not expected to change encoding anyway; I don't see a problem here.)
I'd include comments in the file explaining that the encoding must be 1 byte per char, and warning that existing data may no longer be interpreted correctly if the encoding is changed. (Unfortunately comments are lost by the store() method of Properties, so this should also be documented somewhere else.)
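One partial workaround: store() drops whatever comments were in the loaded file, but its header argument lets you re-emit the warning on every save. A sketch, with a hypothetical property key name (the assignment doesn't mandate one):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

public class EncodingConfig {
    // Hypothetical key name - pick whatever your props file actually uses.
    static final String KEY = "db.encoding";

    // store() discards comments from the loaded file, but the header
    // argument is written back out (as # lines) on every save.
    static void save(Properties props, OutputStream out) throws IOException {
        props.store(out,
            KEY + " must name a 1-byte-per-char charset; changing it may\n"
            + "make existing data unreadable");
    }
}
```

That keeps the warning in the file itself, though it still belongs in choices.txt (or wherever you document design decisions) as well.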
All numeric values are stored in the header information using the formats of the DataInputStream and DataOutputStream classes. All text values, and all fields (which are text only), contain only 8 bit characters, null terminated if less than the maximum length for the field. The character encoding is 8 bit US ASCII.
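Reading one of those null-terminated fixed-length fields is straightforward once you've read the numeric header values with DataInputStream. A sketch, assuming US-ASCII (the field length would come from your header, not be hard-coded):

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SchemaReader {
    // Reads one fixed-width text field: the field occupies exactly
    // 'length' bytes on disk, null terminated if the value is shorter.
    static String readFixedField(DataInputStream in, int length) throws IOException {
        byte[] raw = new byte[length];
        in.readFully(raw);
        // Find the terminating null, if any
        int end = 0;
        while (end < length && raw[end] != 0) end++;
        return new String(raw, 0, end, StandardCharsets.US_ASCII);
    }
}
```

Note readFully() rather than read() - you want to consume the full field width even when the value is short, so the stream stays aligned with the next field.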
What is IMO?
Most of this discussion has gone way beyond what I think is useful or necessary to actually implement for the assignment, but I find it interesting to speculate about future enhancements nonetheless.
Even if you have a conversion program, there are issues of how does the user know when to run it? I suppose it would best be run from the same GUI screen where you configure the encoding in the first place. The moment you change the encoding, maybe a popup should say "changing encoding will force the DB file to be reformatted - are you sure you want to do this now?" Maybe it should have the option of making a new DB file using the new encoding, rather than replacing the old one. Hmmm...
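For what it's worth, the per-field core of such a conversion tool is small. A hypothetical helper (names and the truncation policy are my own invention - and note that blind truncation can cut a multi-byte character in half, which a real tool would have to handle):

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class FieldReencoder {
    // Re-encodes one fixed-width field value from the old charset to the
    // new one, padding with nulls (or truncating) back to the field width.
    static byte[] reencode(byte[] old, Charset from, Charset to, int width) {
        // Find the terminating null in the old bytes
        int end = 0;
        while (end < old.length && old[end] != 0) end++;
        String value = new String(old, 0, end, from);
        byte[] fresh = value.getBytes(to);
        // copyOf pads with zeros if shorter, truncates if longer
        return Arrays.copyOf(fresh, width);
    }
}
```

The hard part isn't this loop, it's the workflow you describe: knowing when to run it, and whether to rewrite in place or write a new file.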
Let me know if you do put something like this in your program - I'd be interested to hear how it goes.
Now I agree with you that supporting 8-bit character sets should be enough.
I used RAF to read the header since it was simpler, but switched to FileChannel to access the records. Come to think of it, I suppose that column names must still be in US-ASCII even if another encoding is used for the rest of the file - but that's not a big problem, IMO. I'll make sure it's documented though.
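One nice thing about the FileChannel approach is the positional read, which doesn't touch a shared file pointer. A sketch of what I mean - the header and record lengths here are hypothetical placeholders (the real values come from the file header):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class RecordReader {
    // Hypothetical sizes - in practice, read these from the file header.
    static final int HEADER_LENGTH = 70;
    static final int RECORD_LENGTH = 160;

    static byte[] readRecord(FileChannel channel, int recNo) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(RECORD_LENGTH);
        long pos = HEADER_LENGTH + (long) recNo * RECORD_LENGTH;
        // read(buf, position) doesn't move the channel's own position,
        // so concurrent readers don't need to synchronize on a seek.
        while (buf.hasRemaining()) {
            if (channel.read(buf, pos + buf.position()) < 0) break; // EOF
        }
        return buf.array();
    }
}
```

With RAF you'd have to seek() then read(), and synchronize the two so another thread's seek can't slip in between.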