Hi Gordon,
Welcome to JavaRanch!
There are several things going on here.
First, Java uses Unicode, and each char is 16 bits wide, so every one-byte character in the file takes two bytes once it is in memory. Therefore, the minimum amount of space needed to store the data is 10 MB, not 5 MB.
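To make the arithmetic concrete, here is a minimal sketch. It assumes one char per byte of input (a plain ASCII file) and ignores the small object header every array carries. The class name `CharFootprint` and the 5,000,000-character figure are hypothetical stand-ins for the 5 MB file.

```java
// Minimal sketch: bytes needed for a char[] holding N characters.
// Assumes one char per input byte (e.g. a plain ASCII file) and ignores
// the few bytes of object header that every array carries.
final class CharFootprint {

    // Character.BYTES is 2: a Java char is one 16-bit UTF-16 code unit.
    static long bytesNeeded(long charCount) {
        return charCount * Character.BYTES;
    }

    public static void main(String[] args) {
        long asciiFileBytes = 5_000_000L; // hypothetical 5 MB ASCII file
        System.out.println(Character.SIZE + " bits per char");
        System.out.println(bytesNeeded(asciiFileBytes) + " bytes as char[]");
    }
}
```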
Second, Java is a garbage-collected language, so it often keeps objects around that it is no longer using (until the collector gets to them), and it tries to keep some free space available. So whereas a space-efficient (but slower) malloc library might grow its heap only as needed, the Java heap is always larger than the data it actually holds.
Third, the JVM itself is kinda big; just loading the core classes takes up a non-negligible amount of space.
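You can watch both effects from inside a program with `Runtime`. This is a minimal sketch (the class name `HeapSnapshot` is hypothetical). The figures vary with the JVM, version and flags, and `System.gc()` is only a request. Note also that these numbers cover only the Java heap: the JVM's own code and, on current JVMs, class metadata live outside it, which is part of why the whole process is bigger still.

```java
// Minimal sketch: compare the heap the JVM has committed with the part
// currently occupied by objects. Results are JVM- and flag-dependent,
// and System.gc() is only a hint, so treat the output as rough.
final class HeapSnapshot {

    // Bytes occupied by objects (live ones plus garbage not yet collected).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Bytes the JVM has committed to the heap so far.
    static long committedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        System.gc(); // request (not force) a collection before measuring
        System.out.printf("used %,d of %,d committed heap bytes (max %,d)%n",
                usedBytes(), committedBytes(), Runtime.getRuntime().maxMemory());
    }
}
```

Even before your own code allocates anything, "used" is already well above zero and "committed" is larger than "used".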
Lastly, how big the heap grows depends on how much "object churn" there is in the program, that is, how many objects are created and then discarded. This one is pretty bad, actually: it creates one String for each line of the file, and every time the StringBuffer runs out of room it allocates a new char array about twice the size, copies everything over, and abandons the old one. There are much more efficient ways to write this program.
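To see that growth in action, the sketch below counts how often a StringBuffer's capacity changes while characters are appended one at a time. It relies only on the public `capacity()` method; the class and method names are hypothetical. The number of resizes grows slowly, since the documented rule in `StringBuffer.ensureCapacity` is roughly "twice the old capacity plus 2", but each resize copies the entire contents and, while the copy is in progress, both the old and the new arrays are live.

```java
// Minimal sketch: count how often a StringBuffer allocates a bigger
// backing array while n characters are appended one at a time.
final class GrowthCounter {

    static int countResizes(StringBuffer sb, int n) {
        int resizes = 0;
        int capacity = sb.capacity();
        for (int i = 0; i < n; i++) {
            sb.append('x');
            if (sb.capacity() != capacity) { // a new, larger array was allocated
                resizes++;
                capacity = sb.capacity();
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        int n = 5_000_000; // hypothetical: one char per byte of a 5 MB file
        System.out.println("default capacity: "
                + countResizes(new StringBuffer(), n) + " resizes");
        System.out.println("pre-sized:        "
                + countResizes(new StringBuffer(n), n) + " resizes");
    }
}
```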
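- You could use the StringBuffer constructor that takes an initial capacity as an argument. If that capacity is at least as large as the final text (the file's size in bytes is a safe upper bound for ASCII or UTF-8 text), the internal char array never has to be resized and copied.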
- More importantly, you could just use FileReader's read(char[], int, int) method to read the data into a char[] buffer, then append those characters directly to the StringBuffer, without ever creating Strings. You could wrap the FileReader in a BufferedReader as well, but if you're already reading in big chunks the extra buffering doesn't buy you much, and leaving it out uses a little less memory.
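The two suggestions combine naturally. This is a minimal sketch, not a definitive implementation. It assumes the platform default charset (which FileReader uses) is single-byte or UTF-8, so the file length in bytes bounds the char count, and that the file is well under 2 GB. `ChunkedFileLoader`, `load` and the 64K chunk size are hypothetical choices.

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

// Minimal sketch: read a whole text file into a StringBuffer without
// creating a String per line. Assumes a single-byte or UTF-8 default
// charset (so bytes >= chars) and a file well under 2 GB.
final class ChunkedFileLoader {

    static StringBuffer load(File file) throws IOException {
        // Pre-size from the byte length so the array should never grow.
        StringBuffer sb = new StringBuffer((int) file.length());
        char[] chunk = new char[64 * 1024]; // hypothetical chunk size
        try (Reader in = new FileReader(file)) {
            int n;
            // read() fills up to chunk.length chars and returns -1 at EOF.
            while ((n = in.read(chunk, 0, chunk.length)) != -1) {
                sb.append(chunk, 0, n); // copy chars straight in, no String
            }
        }
        return sb;
    }

    public static void main(String[] args) throws IOException {
        StringBuffer text = load(new File(args[0]));
        System.out.println(text.length() + " chars loaded");
    }
}
```

If the default charset turns out to be one where chars outnumber bytes, the code is still correct; the StringBuffer simply grows as usual.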
Hope this helps.