Just trying to figure out when it is better to use a HashMap and when it is better to create a table in a relational database. The key difference I am thinking about is that a HashMap is created and kept in memory, while a database table is stored on disk or other secondary storage.
I am thinking that the fact that a HashMap is in memory has some advantages and some disadvantages, and any of these may be more important than another depending on the size of the data to be processed. One advantage is that the CPU can access memory faster than secondary storage. A disadvantage is that the HashMap must be rebuilt and repopulated every time the program is started.
So if anyone has thoughts or especially experience along these lines, please reply. Thanks in advance. PL
P.S. I should mention that I do realize a HashMap entry only holds two fields (a key and a value), and that would be fine for the project I have in mind.
P.P.S. Maybe I could have better phrased my question by asking about any practical limitations on the size of a HashMap with regard to populating and processing it.
Thanks again. PL
If you're going to do work with the data, use memory. If you're going to store data, use a database. It's not a trivial difference.
You bring up a good point. I am thinking of working with a dataset that will be changed via additions and deletions, and I will also need to do comparisons. I am considering the Hashtable class mostly because of its close relation to the equals method, but I am still concerned that the size of the table could become a factor, as I only have 1 GB of memory and the number of entries could be in the hundreds, with potential for thousands.
This is an educational project for the benefit of gaining experience. Thanks again. PL
"Hundreds" and "thousands" of entries is meaningless if we don't know the size in question, whether or not a paging performance hit is acceptable, and so on. We don't know if you're working with the entire data set at once, or if it could be cached/page from a DB using an existing or custom caching mechanism.
Appreciate your interest, but the whole point of the post was that I am asking for recommendations. If I had already figured out the stuff you say I left out of my post, then I wouldn't need to post. Kind of a Catch-22 in reverse.
Anyway, there are two fields per entry. The first field would be either a key based on some as-yet-to-be-devised algorithm or (if using a DB table) a unique primary key. The second field would be a String of 256 characters or less.
The idea of using a HashMap in conjunction with a DB via a caching mechanism sounds interesting and may be the best bet; something like the sketch below is what I picture.
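A minimal read-through cache sketch, assuming the map just fronts a database lookup. SongDao and loadTitle are made-up names standing in for whatever JDBC code would sit behind it:

import java.util.HashMap;
import java.util.Map;

// Made-up DAO interface standing in for the real database access code.
interface SongDao {
    String loadTitle(String key);  // e.g. a SELECT by primary key via JDBC
}

// Read-through cache: check the in-memory map first and only hit the
// database on a miss, remembering the result for next time.
public class SongTitleCache {
    private final Map<String, String> cache = new HashMap<String, String>();
    private final SongDao dao;

    public SongTitleCache(SongDao dao) {
        this.dao = dao;
    }

    public String getTitle(String key) {
        String title = cache.get(key);
        if (title == null) {
            title = dao.loadTitle(key);  // fall back to the database
            if (title != null) {
                cache.put(key, title);   // remember it for next time
            }
        }
        return title;
    }

    public void remove(String key) {
        cache.remove(key);  // keep the cache in step with DB deletions
    }
}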
Thanks. PL
There's no Catch-22: you know the size of the entries and roughly how many there are--and presumably the nature of the processing (do you need full, partial, sparse, etc. views into the dataset?).
Assuming your guess of "thousands" is correct, we'll use 10k as an example.
10k entries at, say, 0.5 KB per entry is 5,000 KB, or roughly 5 MB. That's essentially zero memory in a 1 GB system.
(Unless my math is wrong, which is entirely possible--but even assuming I'm off by an order of magnitude or two, we're still talking < half your physical memory.)
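If you want to sanity-check that estimate empirically, a crude approach is to compare used heap before and after populating the map. This is only a rough sketch: System.gc() is just a hint to the JVM, so treat the printed number as a ballpark figure.

import java.util.HashMap;
import java.util.Map;

public class MapMemoryCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        Map<Integer, String> map = new HashMap<Integer, String>();
        for (int i = 0; i < 10000; i++) {
            // Build a distinct ~256-character value per entry so strings
            // aren't shared and the measurement isn't understated.
            StringBuilder sb = new StringBuilder(256);
            sb.append(i);
            while (sb.length() < 256) {
                sb.append('x');
            }
            map.put(Integer.valueOf(i), sb.toString());
        }

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("Approx. " + ((after - before) / 1024)
                + " KB used for " + map.size() + " entries");
    }
}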
David, thanks for doing the math. It looks right to me, and that is reassuring, because my initial interest in doing this project was to work with the HashMap class and with overriding the equals method and the hashCode method to go beyond the definition of equality in the Object class.
The concept that interests me is that two distinct objects can have the same identifying string such as a name or title. The implementation of this will involve computing a key based on an algorithm that incorporates unique attributes of an object. For example, If object is a recording of a song the algorithm could incorporate date time, and address of recording session . The second field would just be the songtitle and there could be multiple occurences of a given songtitle in the hashmap. The overridden methods would be coded to go beyond the equality of the second fields and test for a better definition of equality.
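Something like this minimal sketch, with made-up field names: two keys are equal only when the session date/time and address both match, so the same song title can appear under distinct keys.

import java.util.HashMap;
import java.util.Map;

// Key for a recording: two keys are equal only if the session
// date/time and address both match, regardless of song title.
final class RecordingKey {
    private final String sessionDateTime;  // e.g. "2009-08-29 14:00"
    private final String sessionAddress;

    RecordingKey(String sessionDateTime, String sessionAddress) {
        this.sessionDateTime = sessionDateTime;
        this.sessionAddress = sessionAddress;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RecordingKey)) return false;
        RecordingKey other = (RecordingKey) o;
        return sessionDateTime.equals(other.sessionDateTime)
                && sessionAddress.equals(other.sessionAddress);
    }

    @Override
    public int hashCode() {
        // Must use the same fields as equals() so equal keys hash alike.
        return 31 * sessionDateTime.hashCode() + sessionAddress.hashCode();
    }
}

public class RecordingDemo {
    public static void main(String[] args) {
        Map<RecordingKey, String> recordings = new HashMap<RecordingKey, String>();
        // Two distinct sessions can map to the same song title.
        recordings.put(new RecordingKey("2009-08-29 14:00", "123 Studio Ln"), "My Song");
        recordings.put(new RecordingKey("2009-09-01 10:30", "456 Hall St"), "My Song");
        System.out.println(recordings.size());  // prints 2
    }
}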
I guess this is all simple stuff for an experienced coder/developer but it is new to me. Thanks.