File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
The moose likes Performance and the fly likes Which collection to use? Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Performance
Bookmark "Which collection to use?" Watch "Which collection to use?" New topic

Which collection to use?

Ranch Hand

Joined: Nov 22, 2008
Posts: 18944
Hey all,
New to THIS board. I need some advice. I need a fast way to access some data that I've gotten from a database and put into some type of collection. Connecting to the database on each loop iteration is out of the question because of overhead. Unfortunately, there is a large amount of data, and there are two keys for the data - a unique key can only be made across these two sub-keys. I have:
Composite key(string) 1;
Composite key(string) 2;
data (string) 1;
data (string) 2;
Which collection would be the best to use to access data1 and data2 elements? I thought of using a TreeMap, and setting up a key object using the two sub keys and implementing a comparitor. Is this fastest? I've never done this before
Marius Holm
Ranch Hand

Joined: Sep 11, 2000
Posts: 84
Hi Nut.
I am afraid I have difficulties understanding your question. My mother tongue is not english so this happens from time to time. You say you have gotten data from the DB, and put into some type of collection. This means you are through with accessing the DB? If 'What kind of data structure could I use to store and access (large amounts of) data in the following format, in an efficient manner?' is a valid rephrase of your question, a proposed answer follows:
You can tailor a class to your data needs, and store instances of this class in an array (or maybe a Vector Object or some tree structure) What works best, depends of course on how you intend to use the data. Generally, if you will access them more than once, all calculation should be done during creation/ initialization of your structure. To explain: Let's say you have a huge amount of data, say a million records have been extracted from your DB, and you are about to insert them into your to-be-memory-resident data structure. Now, any way you want to do this, if you are planning on accessing all of the records by using your composite key, you will have to generate the key (from the two sub-keys) one million times. This you can not escape. The concern is whether you can get away with one million. If you plan to access (search through) the structure just once after it's been created, you can generate the key during the search, and then throw away the structure. If you plan to do further searches, the good thing to do is to design your class with room for an extra field, the generated key, and create the key while initializing the structure. That way you don't have to re-create the key next time. (Note that String operations are quite costly, as String Objects are immutable objects, and thus a concat() operation for instance, creates a new String Object. Object creation is especially heavy, but no operation should be done more than once if possible). I don't know what you mean by 'setting up a key object using the two sub keys and implementing a comparitor'. If that means creating another Object (a Key Object) for every record, I would not recommend it. Avoid Object creation when not necessary, at least when we speak large numbers. You should implement your comparison by writing a method accessible to the Object that accesses the data (alternatively, use a static method in your 'Data' class). Note that the method will be used whenever you need to compare, that means a million times (here) if you search through all records once, or 1M EXP 1M (math notation is not my strongest side but that is 1M raised to the power of 1M) if you do a recursive (bubble-)sorting in a worst case scenario. Your computer will be out of date (and so will you) long before you will be able to calculate that number, so I mention it here just to give the big picture on algorithm optimization The point is really that one million records are more than enough for you to invest some time in clever coding...
An array or some other Object? Again, that depends. You will probably know the number of records when you start initializing your structure. If not, it would probably be an easy task to add an int primitive to keep track of this while reading from the DB. (Another approach is to spawn another Thread and have that Thread ask the DB while the 'main' Thread gets the data. That's Object creation all right, but only once. It's the amount that's significant) If you know the record count, there are few situations where you should not use an array. A Vector, for instance, have more functionality than an array, but is more costly to use. For instance, you have to use a method whenever inserting or retrieving data, and it can only hold objects that are or subclass the 'Object' object. (Your tailored class can subclass Object if you decide to use a Vector, that way you won't have to cast it for insertion/deletion) I am not familiar with the 'TreeMap' Object you mention, (I actually don't do much straight Java these days, more scripting in fact, so I'm a bit rusty, but what I say here are general design guidelines that apply to all coding independent of language) and it could be a good choice, but from the name it seems to be made for working with tree structures, which you probably don't want. Tree structures are fascinating, so fascinating that many programmers tend to forget or ignore that working on tree structures usually involves recursion, which except from a very limited set of areas are extremely ineffective. No matter how well Object Oriented your code may be, you should remember the fact that in all computers on the market today, memory (and disk drives) is accessed sequentially, and in 99% (or 95, 90? Let's just say an overwhelming majority )of situations, code written to access an array in a sequential way will be the fastest. (There are some exceptions to this, but they apply to complex data scenarios and I don't think you need such approaches, and to say a lot about this would be exhaustive. I will mention as examples database tech, where indexing is used to retrieve data (In ISAM DB's like MS Access, this is in fact (additional) data about the data, that also is accessed sequentially.) and well-implemented (and large) tree structures. DB servers like Oracle and MS SQL Server employs tree structures for fast data access. (They also employ complex arithmetic to keep their trees tidy)
Also, I wonder where you have your data now that you read them from your DB, but still have not put them in your structure? But let's assume you have written a function that reads all the records and stores them (or will, when you have your data structure ).
OK, enough talk, here's a proposed class fer ya:
class DataClass{
String key;
String data1;
String data2;
//You say you need to access data1 and data2.
//So I saved us the extra space for two String variables
//that you would need to access the k1 and k2.
//Not a lot, it may seem, but it adds up to two million
//strings in our scenario.
public DataClass(String k1,String k2,String d1,String d2){
This class can hold what you need, two data Strings and one key (you didn't need the sub-keys, did you? If you do, just add two more string variables) It also performs the key generation upon construction so you are finished with that.
Now lets say you have a method to access the DB something like this:
DataClass[] mydata; //You need a global variable to keep the
// data between methods. (Global is of
//course a relative expression)
public void getData(){
int recordcount=-1;
UnKnownDBObject cursor; //This is to hold the object returned
//from the DB query. What kind of
//object depends. (That wasn't what
//your question really was, was it?)
//Here goes the code to connect to your database
//and perform the query. This code is dependent on your
//local system. It usually returns some database object like
//a cursor, which you wrap in, or convert to, some object
//supported by your language (Java). This object will often
//keep the recordcount. If not, do a seperate query for the
//count while the data query is running. At any rate, we
//assume that the recordcount and the cursor variable is set
//here. (If there are no exceptions);
catch(Exception e){
System.out.println("An exception occured: "+e);
//The query returned records

mydata=new DataClass[recordcount];
UnknownDBObject2 record; //for working with one and one rec.
for(int i=recordcount;--i>=0 {
mydata[i]=new DataClass(record[0],record[1],record[2],record[3]);
The comparison and stuff is trivial, if you need help on that too, I will check in in a while..
Hope that helps,
Marius Holm
Ranch Hand

Joined: Sep 11, 2000
Posts: 84
Uhh... that wasn't meant to be a smiley in that code, just: " ;) "
The line should be:
for(int i=recordcount;--i>=0;){
Susan Hoover
Ranch Hand

Joined: Jan 04, 2001
Posts: 64
Java Nut,
(With a name like that, you must be a child of the 60's. )
I had the same problem. I had a composite key of 3 or more strings that I needed to use as the key to a HashMap. The way I solved it was to create a CompoundKey class that takes a String[] as a constructor and implements hashCode, equals, and compareTo. Make sure that your CompoundKey class is declared to implement Comparable.
If you don't need the items to be in any particular order, I would use HashMap instead of TreeMap. TreeMap takes O(log n) time for get/put, whereas HashMap takes constant time.
Jim Yingst

Joined: Jan 30, 2000
Posts: 18671
"Java Nut" was also a child of "before we were enforcing a naming policy here." (Note the original posting date.) If I remember correctly, Java Nut later turned out to be Paul Smiley, who also has been a moderator her in the past when his time was more available.
Anyway, Susan's solution sounds like the best solution in general. A quick-and-dirty version of this might be to simply splice the strings together into one big string. However this is only safe if the strings have fixed length, or you can pad them to have fixed length. E.g.
<pre>Case 1:

key1 = "ab"
key2 = "cd"
combinedKey = "abcd"

Case 2:

key1 = "abc"
key2 = "d"
combinedKey = "abcd"</pre>
These two situations should have generated unique keys, but didn't. If we establish a maximum string length for each key (I'll pick 4), we can pad each string with spaces (or another char) out to its max length, and fix it thus:
<pre>Case 1:

key1 = "ab"
key2 = "cd"
combinedKey = "ab cd "

Case 2:

key1 = "abc"
key2 = "d"
combinedKey = "abc d "</pre>

[This message has been edited by Jim Yingst (edited January 17, 2001).]

"I'm not back." - Bill Harding, Twister
I agree. Here's the link:
subject: Which collection to use?
jQuery in Action, 3rd edition