Win a copy of Re-engineering Legacy Software this week in the Refactoring forum
or Docker in Action in the Cloud/Virtualization forum!

# Deciding on a data structure to optimize performance

Fred Hamilton
Ranch Hand
Posts: 684
In my program, I have a need to temporarily store large numbers of integer pairs. The logical choice for storing an integer pair is an array of two elements int[] = m.

I suppose I can group these two element arrays in one of two structures, ArrayList<int[]> or int[][].

the actual integers, prior to storage, will contained in variables i & j.

If I am to use ArrayList<int[]> list1, then as far as I can see for each pair I need 4 lines of code as follows.

m = new int[2];
m[0] = i;
m[1] = j;

if I am to use a 2D array with 2 columns, then i only need two lines of code, aside from the initial aray declaration.

m[index][0] = i;
m[index][1] = j;

The pairs will need to be order according to various criteria. There will be multiple criteria for each list. I can order the pairs by sorting the 2d array after it has been assembled, presumably I will need one sort for each criteria, or If I use an ArrayList I would order at the time the list is assembled, using list1.add( int, <E> );

once the set is assembled, there is a loop on the set and some analysis with each iteration of the loop, after which one element (pair) is chosen.

This whole process gets repeated many thousands of times in a short period of time. Speed is everything.

if it wasn't for the ordering, I would stick with 2D arrays, but with the ordering requirements I am leaning towards using ArrayList<int[]> which may provide slight performance gains over repeated sorts of arrays, and would also make my life a little easier as a prograsmmer. But like I say, the process gets repeated many thousands of times, and any kind of significant performance gain would push me one way or the other,

So here is the question. Given these two choices, would go with ArrayList<int[]> as i am leaning towards, or would you go with int[][].

fred rosenberger
lowercase baba
Bartender
Posts: 12087
29
Go with whichever one makes the code easier to read/understand.

Only after you have shown a performance problem should you consider refactoring to something faster, and then use a profiler to determine where the slowdown really is.

Fred Hamilton
Ranch Hand
Posts: 684
Thanks Fred. Both methods are easy enough for me to understand and code, and readability is not an issue that would depend greatly on the data structure. Intuition tells me that the built in support for inserts that ArrayList has makes it the way to go. I suppose I could just do it both ways and compare the performance using a simple timer, since I just want to know which of two structures might have a speed advantage. Considering the sheer volume of processing involved. I was just wondering what an expert would say about the relative merits of the two data structures.

regards.

edit: like I said, speed is everything. Performance only becomes an issue when there is a faster way of getting the job done.

Paul Clapham
Sheriff
Posts: 20771
30
Generally if you find there's something which uses built-in functions, as opposed to code you write yourself, you'll find the built-in functions work better. For example if you were writing code to sort an array, you would probably write a bubble sort, which is less efficient than the built-in sort function. The people who wrote the built-in functions were aware of all the "Java is slow" propaganda articles out there on the internet and they did spend time making sure they worked well.

You could of course (in this example) spend some (a lot of) time and write a sort which worked faster than the standard sort, because it took advantage of some specific feature of your data. But you wouldn't do that unless you had a good idea of what to do. You don't, in this case, so you wouldn't do that. So stick with the shorter code.

Fred Hamilton
Ranch Hand
Posts: 684
Paul Clapham wrote:Generally if you find there's something which uses built-in functions, as opposed to code you write yourself, you'll find the built-in functions work better. For example if you were writing code to sort an array, you would probably write a bubble sort, which is less efficient than the built-in sort function. The people who wrote the built-in functions were aware of all the "Java is slow" propaganda articles out there on the internet and they did spend time making sure they worked well.

You could of course (in this example) spend some (a lot of) time and write a sort which worked faster than the standard sort, because it took advantage of some specific feature of your data. But you wouldn't do that unless you had a good idea of what to do. You don't, in this case, so you wouldn't do that. So stick with the shorter code.

In fact, the ArrayList method did turn out to be faster, which would support your suggestions here.

regards.

Nawapunth Manusitthipol
Greenhorn
Posts: 14
Have you try LinkedList<int[]>? If I am not mistaken, ArrayList use array as backend and its insert operation performance is not as good as link-list that LinkedList use.

Paul Clapham
Sheriff
Posts: 20771
30
Nawa Man wrote:Have you try LinkedList<int[]>? If I am not mistaken, ArrayList use array as backend and its insert operation performance is not as good as link-list that LinkedList use.

This may be true. But if you read the original post you will see that nowhere did it mention inserts to the list. Only appends. So your post isn't particularly useful.

Nawapunth Manusitthipol
Greenhorn
Posts: 14
Ummm ... I believe he say it here

The pairs will need to be order according to various criteria. There will be multiple criteria for each list. I can order the pairs by sorting the 2d array after it has been assembled, presumably I will need one sort for each criteria, or If I use an ArrayList I would order at the time the list is assembled, using <b>list1.add( int, <E> );</b>

Fred Hamilton
Ranch Hand
Posts: 684
I'll give LinkedList a looksee then, thanks. My original description wasn't a model of clarity I guess. I'm not actually trying to achieve the effect of a complete sort, but there is ordering in the sense that as elements are encountered, they are to grouped into at least two categories within the same list. I suppose an alternative to inserts with a LinkedList I could just use a separate array for each category, then do some sort of concatenation at the end. It's a game tree sort of thing, the faster the program thinks, the stronger it plays, so I'll have to try that option too.

William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13056
6
Have you considered a custom object:

m[index] = new myObj(i,j) ;

Bill

Fred Hamilton
Ranch Hand
Posts: 684
William Brogden wrote:Have you considered a custom object:

m[index] = new myObj(i,j) ;

Bill

Thanks, I thought about it things like that. Probably you are right it would simplify coding, but my experience with that sort of thing in this particular application (game tree analysis) is that it would have a negative impact on performance, which can't be tolerated in this case. After all, the object under discussion is a simple integer pair, of which many thousands are created, evaluated, and discarded all the time. Not some more complex object.

That being said, if you still think it's a good idea, I'd welcome further discussion. Learning is the primary goal here.

regards.

edited for grammar

William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13056
6
that it would have a negative impact on performance

I'll bet you \$1.00 it will improve or have zero effect performance - reason being there will be no variable access via an array index:

Furthermore, if your sort criteria can be represented as a value of an instance variable, sorting will be simpler and the criteria will only have to be calculated once during sorting (think of the total number of comparisons in any sort algorithm).

Bill

Fred Hamilton
Ranch Hand
Posts: 684
OK Bill, I'll give it a try and report back here with the results. I'm not sure exactly when I'll find the time, but it'll be soon enough. In the mean time, if you check my website tomorrow I'll post the algorithm I have now using ArrayList<int[]>

regards.

Jesper de Jong
Java Cowboy
Saloon Keeper
Posts: 15208
36
Fred Hamilton wrote:Intuition tells me that ...

The problem with performance optimization is that intuition is often misleading. It happens very often that people use certain constructions a certain way, because they think it will be good for performance, without really knowing if that's so. That's why prof. Donald Knuth came with his famous saying "Premature optimization is the root of all evil".

The only way to be sure is to try it out, and measure which method uses the least cycles and the least memory. Write some small test programs to measure the performance of different options.

Fred Hamilton
Ranch Hand
Posts: 684
OK here is the test I will be conducting over the next few days. I'll wait a day or two before I actually begin the work to give interested parties the chance to make comments on the situation, hopefully to improve the validity of the tests.

The situation will be the construction and analysis of a game tree (chess) to a depth of five ply, from the starting position. five ply means three moves by white and two by black.

A node is represented by an integer pair. The integers range from 0 - 63 inclusive. The numbers represent the 64 squares of the chessboard. Top left is 0, bottom right is 63. A node is a move. There is a string called pStr, the first 64 characters represent the pieces on the chessboard. UpperCase for white, lowerCase for Black, '1' for empty square. Experienced chess programmers may see this as less than optimal, but it can't be changed any time soon without rejigging the entire program.

my program has a recursive method which, at each node, calls for a list of replies (nodes) which are the possible responses. In a typical position there are between 30-50 possible moves. This list of replies (moves) will be grouped into three subsections, captures, moves that give check, and all other moves.

OK my preliminary work says that at a depth of five ply, there will be a total of 95,595 lists, containing a total of 2,425,928 nodes.

The purpose of the test is to evaluate different data structures and node objects, in order to choose one that optimizes performance. We have the following methods that will be avaluated.

Two types of node objects. int[2] and a custom object that will use comparators to create the subsections.

The actual lists will be: 1. An array of nodes that will have to be sorted after being built. 2. An ArrayList of nodes which can be arranged as it is being built by using inserts. 3. Separate arrays for each of the three categories, these arrays will be concatenated in some fashion before the single list is returned.

My initial method is an ArrayList of type <int[]>. The actual code can be viewed here: http://fchess.pastebin.com/f6c2d3664 The method which kicks of the process is getCandidates( String pStr ) . the actual ordering is done in methods addMoveW & addMoveB.

Anyways, like I said I'll wait a day or two before I actually do the work, in case people wish to comment on methodology of the test, or maybe place some bets. I will be evaluating each nethod using by coding a simple millisecond timer using the Date class.

edited for spelling and grammar