getting the index of the closest value in an array that is sorted in reverse order

 
Ranch Hand
Posts: 265
I am maintaining two arrays (x and y time-series data) that I pass to an application to plot. I've now been asked to return the time given a specific value of the data. This works nicely with ArrayLists: I can do a binarySearch when the data increases over time, and a binarySearch with a comparator when the data decreases with time. So far, so good; the following works:
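The snippet referred to here didn't survive in this copy of the thread; a minimal sketch of the ArrayList approach being described (plain binarySearch for ascending data, a reverse-order comparator for descending data) might look like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ListSearchDemo {
    public static void main(String[] args) {
        // Data that increases over time: plain binarySearch works.
        List<Double> ascending = new ArrayList<>(Arrays.asList(1.0, 2.0, 3.0, 4.0));
        System.out.println(Collections.binarySearch(ascending, 3.0));  // prints 2

        // Data that decreases over time: same list type, reverse-order comparator.
        List<Double> descending = new ArrayList<>(Arrays.asList(4.0, 3.0, 2.0, 1.0));
        System.out.println(Collections.binarySearch(descending, 3.0,
                Collections.reverseOrder()));                          // prints 1
    }
}
```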



However, the program is time-sensitive, and I have found ArrayList<Double> to be a lot slower than double[]. So I tried to create the equivalent with double[] arrays. That doesn't quite work, and I know the problem: there is an Arrays.binarySearch(double[] a, double key), but no version that takes a comparator. For that I need to use objects.

I'm concerned that if I go from double[] to Double[] I'll start taking the performance hits I saw with ArrayList<Double>. So can anyone tell me: a) is there a better (fastest-execution) way to find the index of the element closest to x in a double[] array, and b) what sort of performance difference there might be between Double[] and double[]?

I am using the index to pull a value from another array, so I have two arrays I need to keep in sync. The plot package also requires two separate double[] arrays to plot x,y data, and I can't change that.

Doesn't compile


Best I have so far
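Neither snippet survives in this copy. The non-compiling attempt is presumably a call to an Arrays.binarySearch(double[], double, Comparator) overload, which does not exist. One hedged workaround sketch (method and class names are mine): a hand-written binary search over a descending double[] that avoids boxing entirely and also reports the closest element when the key is absent:

```java
public class DescendingSearch {

    // Arrays.binarySearch(double[], double, Comparator) does not exist,
    // which is why the comparator approach fails to compile for primitives.
    // This searches a double[] sorted in DESCENDING order and returns the
    // index of the element closest to key.
    static int closestDescending(double[] a, double key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] > key)      lo = mid + 1;  // key lies further right (values shrink)
            else if (a[mid] < key) hi = mid - 1;  // key lies further left
            else                   return mid;    // exact match
        }
        // Here a[lo - 1] > key > a[lo]; pick whichever neighbour is nearer.
        if (lo >= a.length) return a.length - 1;  // key below the smallest value
        if (lo == 0)        return 0;             // key above the largest value
        return (key - a[lo] <= a[lo - 1] - key) ? lo : lo - 1;
    }

    public static void main(String[] args) {
        double[] x = {9.0, 7.0, 5.0, 3.0, 1.0};
        System.out.println(closestDescending(x, 6.2)); // prints 1 (7.0 is nearest to 6.2)
    }
}
```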
 
Bartender
Posts: 10780

Jon Swanson wrote:However, the program is time-sensitive, and I have found ArrayList<Double> to be a lot slower than double[]. So I tried to create the equivalent with double[] arrays. That doesn't quite work, and I know the problem: there is an Arrays.binarySearch(double[] a, double key), but no version that takes a comparator. For that I need to use objects.


I'm not sure you do (see below).

I'm concerned that if I go from double[] to Double[] I'll start taking the performance hits I saw with ArrayList<Double>. So can anyone tell me: a) is there a better (fastest-execution) way to find the index of the element closest to x in a double[] array, and b) what sort of performance difference there might be between Double[] and double[]?


It sounds like you have quite a few things going on here:
1. Access time per element.
2. Mutability of the array size.
3. Search time for a specific value.
(feel free to add any others that I've missed).

Dealing with them individually:
1. ArrayList should be close to an array for element retrieval. The only real overhead is the method call itself.
2. ArrayList's algorithm is about as good as it gets.
3. Double's compareTo() method is pretty quick. Again, the only real overhead is the method call itself.

The main issue with a List<Double>, as opposed to a double[], is that every element has to be boxed. This adds time and space; and if you change values a lot, you pay a double overhead (boxing and unboxing) every time it's done.

If searching is your only concern, you might want to check out the time difference between calling Arrays.binarySearch() on a double[] and a Double[]; but I'd be surprised if it's outlandishly different.

However, here's a possibility: create a MutableArray wrapper class that gives you access to all the internals of an array, but has an ensureSize() method that does the same thing that ArrayList does: increase the size of the array if it's too short.

You could also add a "view" class (filched from 'Effective Java'). On its own, it doesn't help you much, but it provides you with another way of looking at (and optionally updating) the array.
Then anything that changes the array (eg, a sort), or searches it, can be done on the array itself, while anything that simply needs to view it can use the List (or its Iterators).
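The view-class code didn't make it into this copy of the post; the 'Effective Java' idea referred to is an AbstractList adapter over a primitive array, along these lines (the class name here is mine):

```java
import java.util.AbstractList;

// A List<Double> "view" over a double[], in the spirit of the adapter
// shown in 'Effective Java'. The backing array is shared: set() writes
// through, and nothing is boxed until an element is actually read, so
// sorts and searches can still run directly on the raw array.
public class DoubleArrayView extends AbstractList<Double> {
    private final double[] a;

    public DoubleArrayView(double[] a) { this.a = a; }

    @Override public Double get(int i) { return a[i]; }  // boxed on demand
    @Override public Double set(int i, Double v) {
        double old = a[i];
        a[i] = v;                                        // write-through
        return old;
    }
    @Override public int size() { return a.length; }

    public static void main(String[] args) {
        double[] data = {1.5, 2.5, 3.5};
        DoubleArrayView view = new DoubleArrayView(data);
        System.out.println(view);      // prints [1.5, 2.5, 3.5]
        view.set(0, 9.0);
        System.out.println(data[0]);   // prints 9.0
    }
}
```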

HIH

Winston
 
Jon Swanson
Ranch Hand
Posts: 265
I think access time might be my biggest concern: not for this particular lookup, but for the whole package, of which these lookups are one piece. I started out using ArrayLists, since I will not know the eventual size of the arrays, but the main function I wrote was taking 14 ms to run. The values are not changed often, but they are iterated over multiple times. When I changed the function to use double[] where I had used ArrayList<Double>, it ran in 4 ms.

This only matters because the function is used in a Monte Carlo estimator and is called 1K-10K times, which means 14 or 4 seconds, and now it is a bigger deal. The client wants the 4-second run time reduced to 0.4 seconds, which at the moment does not seem realistic; but on the other hand, I don't want to increase it with these lookups more than I have to. Plus, I am a little concerned that if I maintain both an ArrayList and an array and use one or the other at different points, I'm asking for bugs, or at least maintenance issues.

Here are the timing tests on the ArrayList<Double>, Double[] and double[], just doing the search. I created an array of 1000 elements with values from 0 to 999 and searched for 25.5, 500.1 and 975.9. The worst performance was for 25.5, so I will report that.

ArrayList<Double>: 72 ms forward, 28 ms reverse
Double[]: 48 ms forward, 19 ms reverse
double[]: 10 ms forward; reverse does not work

The average values are:

ArrayList<Double>: 30 ms forward, 14 ms reverse
Double[]: 19 ms forward, 11 ms reverse
double[]: 5 ms forward; reverse does not work

So double[] is faster, as I found with my other code. Across repeated runs, reverse is always faster, and ArrayList<Double> and Double[] average out to more similar times, with Double[] a little bit ahead. But double[] is consistently faster than the other two, and its reported times are also more consistent. Which brings me back to the fact that binarySearch does not accept a comparator when the first argument is a double[], so I need some workaround to use a double[] sorted in descending order.

I'm tempted to just keep two more arrays- the x values in reverse order and the y values in reverse order and use those if the x values decrease with time.
 
Winston Gutkowski
Bartender
Posts: 10780

Jon Swanson wrote:Here are the timing tests on the ArrayList<Double>, Double[] and double[], just doing the search. I created an array of 1000 elements with values from 0 to 999 and searched for 25.5, 500.1 and 975.9. The worst performance was for 25.5, so I will report that.

ArrayList<Double> 72 ms forward 28 ms reverse


Wow. 72 milliseconds? Are you sure you're doing a binary search? I ran a test search on 100,000 random Doubles in an ArrayList, and it came in at 450 nanoseconds (averaged over 10,000,000 searches) on my clunky old Dell.

Makes me wonder if there might be something else wrong.

Winston
 
Jon Swanson
Ranch Hand
Posts: 265
Sorry, I have gotten into such a habit of multiplying by 1000 to estimate the MC times that I did it automatically. It is taking 72 µs, not ms. Still, that is a lot longer than what you are seeing. This is what I did:
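The posted code is missing from this copy of the thread; a minimal single-shot harness along the lines described (one nanoTime() pair per search, 1000 elements with values 0 to 999, searching for 25.5) would be something like:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SingleShotTiming {
    public static void main(String[] args) {
        int n = 1000;
        double[] primitive = new double[n];
        List<Double> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            primitive[i] = i;
            boxed.add((double) i);
        }

        // One nanoTime() pair around a single search, as in the post.
        // (As the later replies point out, the timer call itself costs tens
        // of nanoseconds, so single-shot numbers are mostly overhead.)
        long t0 = System.nanoTime();
        int idx = Arrays.binarySearch(primitive, 25.5);  // -27: not found, insertion point 26
        long t1 = System.nanoTime();
        System.out.println("double[]:          " + (t1 - t0) + " ns (index " + idx + ")");

        t0 = System.nanoTime();
        idx = Collections.binarySearch(boxed, 25.5);     // also -27
        t1 = System.nanoTime();
        System.out.println("ArrayList<Double>: " + (t1 - t0) + " ns (index " + idx + ")");
    }
}
```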



Nothing fancy at all. For my really long processes (i.e., seconds) it seems to report timings consistent with what I observe in practice. Would you be willing to run it on your old Dell? Or should I add some loops around each binarySearch to get a more accurate average time?
 
Winston Gutkowski
Bartender
Posts: 10780

Jon Swanson wrote:Nothing fancy at all. For my really long processes (i.e., seconds) it seems to report timings consistent with what I observe in practice. Would you be willing to run it on your old Dell? Or should I add some loops around each binarySearch to get a more accurate average time?


I would certainly do that, because I suspect you're running into the Heisenberg principle. Simply put, measuring the time something takes itself takes time, and so affects the result (a call to System.nanoTime() takes 70-100 ns on my machine); so the easiest way to eliminate that is to run the "thing to measure" a lot of times and take the total elapsed time divided by the number of calls. I've actually written a Stopwatch class to do precisely that.

BTW, my test (other than running each search 10,000,000 times) was similar in effect to yours, except that I added 100,000 random double values and then sorted the result before searching. I also used 0.5 as my search value, which was unlikely to be found (theoretically, the worst case; although it's possible the compiler could optimize the constant).

If I see anything else, I'll get back to you; but even 72 µs seems an awful lot: about 150 times slower than what I'm seeing for your worst case (ArrayList<Double>).

Winston
 
Bartender
Posts: 6109
I second Winston's advice: Put LOTS of searches inside your timing loop. For what it's worth, here's my attempt. In a list/array of 10,000,000 elements, I did 10,000 searches for each of 2,000 random values (20,000,000 calls to binarySearch()), half of which were in the list and half of which were not, with the search values shuffled to a random order.

A single call to binarySearch() on an array took about 5 ns.
A single call to binarySearch() on a List took about 19 ns, whether forward or backward. So yes, slower, but not by an order of magnitude, and not enough to cause me concern. As Winston already pointed out, it's just a little overhead for unboxing and method invocations.
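Jeff's code isn't shown in this copy of the thread; a scaled-down sketch of that kind of benchmark (many searches per timing window, forward and reverse lists, results accumulated in a sink variable so the JIT can't discard the calls) could look like this — sizes here are smaller than the post's 10,000,000 elements:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ArrayVsListBench {
    static final int SIZE = 1_000_000;  // the post used 10,000,000; smaller for a quick run
    static final int REPS = 1_000_000;

    public static void main(String[] args) {
        double[] arr = new double[SIZE];
        for (int i = 0; i < SIZE; i++) arr[i] = i;
        List<Double> forward = new ArrayList<>(SIZE);
        for (double d : arr) forward.add(d);
        List<Double> reverse = new ArrayList<>(forward);
        Collections.reverse(reverse);

        long sink = 0;  // accumulate results so the JIT can't elide the searches

        long t0 = System.nanoTime();
        for (int r = 0; r < REPS; r++) sink += Arrays.binarySearch(arr, 123456.5);
        report("double[]     ", System.nanoTime() - t0);

        t0 = System.nanoTime();
        for (int r = 0; r < REPS; r++) sink += Collections.binarySearch(forward, 123456.5);
        report("list forward ", System.nanoTime() - t0);

        t0 = System.nanoTime();
        for (int r = 0; r < REPS; r++)
            sink += Collections.binarySearch(reverse, 123456.5, Collections.reverseOrder());
        report("list reverse ", System.nanoTime() - t0);

        System.out.println("(sink = " + sink + ")");
    }

    static void report(String label, long elapsed) {
        System.out.printf("%s %.1f ns/search%n", label, elapsed / (double) REPS);
    }
}
```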


 
Marshal
Posts: 79180

Jeff Verdegan wrote: . . .
nanosPerSearch: 5.346484294062453
. . .
nanosPerSearch: 19.865068028600355 . . .

You’re sure it wasn’t 5.34649 and 18.866 ns?

I think this is now too complicated for “beginning” and shall move it.
 
Jon Swanson
Ranch Hand
Posts: 265
Thanks for running the tests for me. I wrapped the binarySearch in a loop and then looked at the worst case times on my arrays of 1000 numbers.



I hadn't realized the overhead of the time call would be so high. But what seems odd is that the ratio of the ArrayList time to the double[] time decreases with the number of iterations, which makes me think there is some optimization going on that contaminates the benchmark. The 19/5 ratio Jeff provided is pretty consistent with the other application I was running, which used ArrayList<Double> versus double[]; though that required a lot more calculation, and the difference in run time was about 14 seconds versus 4 seconds (I really do mean seconds this time). It used a lot of arrays and accessed the array elements a lot of times, so whether a difference of 19 ns versus 5 ns matters depends on how many times the calculation runs. At least the problem is not my computer; you had me worried I had a virus.

My loop includes the assignment of the return value to index. I was wondering if that mattered. If I take that out-



the variability from run to run seems higher than the difference between actually assigning the result to a variable or not.
 
Winston Gutkowski
Bartender
Posts: 10780

Jon Swanson wrote:Thanks for running the tests for me. I wrapped the binarySearch in a loop and then looked at the worst case times on my arrays of 1000 numbers.


I hate to say it, but I think you're looking at this far too simplistically. You assume that one call to x() takes half the time of two calls to x(), and it simply isn't true.
The machine you're running on has LOTS of things to do other than running your test (the vast majority of which neither you nor I need to know about). And if it didn't, it wouldn't be a proper test, would it?

I ran my 10,000,000 search tests several times before I gave you my results, and I'm sure Jeff did with his too. But at the end of the day, it's all hogwash. Our tests are based on the machine spending most of its time doing our test, which is far from likely in the real world.

My advice: Look to your code. 72 microseconds for a search? Please. Are you running this on a Sinclair?

Winston
 
Jon Swanson
Ranch Hand
Posts: 265
The timings Jeff provided are pretty consistent with the timing differences I see in the complete code (which runs in 14 or 4 seconds, depending on whether array lists or arrays are used), so I think I will stick with arrays for now. Since there does not seem to be an option to binary-search an array sorted in reverse order, I will just use the rate constant to determine whether the data increases or decreases with time, and create new arrays in reverse order at the start. I can binary-search those and use the original arrays for plots.
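A sketch of that plan (class and method names are mine): keep the original arrays for plotting, build a reversed copy once, binary-search the ascending copy, and map the index back to the original array:

```java
import java.util.Arrays;

public class ReverseSearchDemo {

    // Reversed copy; the original stays untouched for the plot package.
    static double[] reversedCopy(double[] a) {
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++) r[i] = a[a.length - 1 - i];
        return r;
    }

    public static void main(String[] args) {
        double[] xDescending = {9.0, 7.0, 5.0, 3.0};  // data that decreases with time
        double[] xAscending = reversedCopy(xDescending);

        int i = Arrays.binarySearch(xAscending, 5.0);  // plain ascending search
        int original = xDescending.length - 1 - i;     // map back to the plot arrays
        System.out.println(original);                  // prints 2
    }
}
```

The index mapping (length - 1 - i) means the y array can still be read at the original position without keeping it in two orders.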
 