Why is reading an array sequentially faster than reading it in a random manner (like a[9], a[179])?

Abhishek Basu
Greenhorn

Joined: Aug 18, 2012
Posts: 3
Why is reading an array sequentially faster than reading it in a random manner (like a[9], a[179])?
Matthew Brown
Bartender

Joined: Apr 06, 2010
Posts: 4342

Hi Abhishek. Welcome to the Ranch!

What makes you think it is?
Abhishek Basu
Greenhorn

Joined: Aug 18, 2012
Posts: 3
This is actually a question from my interview, and I have also run some code and found that random access is a few milliseconds slower.
Abhishek Basu
Greenhorn

Joined: Aug 18, 2012
Posts: 3
Given an array of 1000 elements, why is it substantially faster to read all of the array values sequentially (i.e. array[0], array[1], …, array[999]) rather than in a random order (array[17], array[962], etc.)? You can assume that the array is preloaded with data.

This is the actual question.
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 60767

How are you generating the random order?


Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 2986
If this is true (and I wouldn't be too certain that it is), then I would guess that hardware is often designed to read a block of data at once for efficiency, much like reading a byte[] from a stream is faster than reading one byte at a time. If the stuff you want to read is sequential, this works great. If not, it results in reading a bunch of stuff you don't actually need just to get the few bits you do need.
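
For example, here's a rough sketch of that stream effect (not a rigorous benchmark, and "data.bin" is a hypothetical file name):

import java.io.*;

public class BlockReadDemo {
    public static void main(String[] args) throws IOException {
        // Read the file one byte per call.
        long start = System.nanoTime();
        try (InputStream in = new FileInputStream("data.bin")) {
            while (in.read() != -1) { /* one byte per read() */ }
        }
        System.out.println("byte-at-a-time:  " + (System.nanoTime() - start) / 1_000_000 + " ms");

        // Read the same file one 8 KB block per call.
        start = System.nanoTime();
        try (InputStream in = new FileInputStream("data.bin")) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) { /* one block per read() */ }
        }
        System.out.println("block-at-a-time: " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}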
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

Abhishek Basu wrote:Given an array of 1000 elements, why is it substantially faster to read all of the array values sequentially (i.e. array[0], array[1], …, array[999]) rather than in a random order (array[17], array[962], etc.)?


Don't accept questions with hidden assumptions. The answer to this one is "I don't think it is faster. Do you have some evidence to demonstrate that?"
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 2986
To be fair...
Abhishek Basu wrote:...and I have also run some code and found that random access is a few milliseconds slower.

This is good. But "milliseconds" isn't very concrete, and many measurements can easily have an error of milliseconds. So you may need to scale up this experiment by a factor of 10,000 or so to make sure the difference is real.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

My version of that experiment:
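
A minimal sketch along these lines, assuming an Integer[] of SIZE elements, summing as the workload, and a "time" method that repeats each run; the class and helper names here are made up:

import java.util.Random;

public class SequentialVsRandom {
    static final int SIZE = 1_000_000;
    static final int REPEATS = 100;

    public static void main(String[] args) {
        Integer[] data = new Integer[SIZE];
        for (int i = 0; i < SIZE; i++) data[i] = i;

        // Sequential index order: 0, 1, 2, ...
        int[] sequential = new int[SIZE];
        for (int i = 0; i < SIZE; i++) sequential[i] = i;

        // The same indexes, shuffled, so both runs do exactly the same work.
        int[] random = sequential.clone();
        shuffle(random, new Random(42));

        System.out.println("sequential: " + time(data, sequential) + " ms");
        System.out.println("random:     " + time(data, random) + " ms");
    }

    // Sums the array in the given index order, REPEATS times over.
    static long time(Integer[] data, int[] order) {
        long start = System.currentTimeMillis();
        long sum = 0;
        for (int r = 0; r < REPEATS; r++) {
            for (int i : order) sum += data[i];
        }
        long elapsed = System.currentTimeMillis() - start;
        if (sum == 42) System.out.println();  // use sum so the JIT can't discard the loops
        return elapsed;
    }

    // Fisher-Yates shuffle.
    static void shuffle(int[] a, Random rnd) {
        for (int i = a.length - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    }
}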



So here's some evidence that can be discussed. The difference is significant in this particular test. Let the speculation begin.
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 2986
Paul Clapham wrote:Let the speculation begin.

Ummmm... I'm sticking with my already-posted speculation.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

When I use SIZE = 1000 in that code (and a suitably larger number of repeats in the "time" method), then contrary to the statement in the original post, I don't see a significant difference between the sequential access and the random access. (So my answer was technically correct -- there isn't a difference for arrays with 1000 elements. Not in my tests, anyway.)

But when I use SIZE = 10000 I do see a significant difference (50% longer for random access). So the effect depends on the size of the array. The larger the array, the bigger the difference.
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 2986
My guess, then, is that for a smaller array, the whole thing gets buffered/cached, and accessing the cache randomly is as fast as accessing it sequentially. But when you exceed the size that can be buffered or cached, you get more slowdown as the system starts jumping around and reading different blocks.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12760

Pretty mysterious alright -

Operating system memory paging?

Bill
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 2986
Something like that - either OS paging or hardware buffering.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541

I remember when I was learning Fortran, one of the things they told you was that when processing a two-dimensional array you should always try to read the data one column at a time, rather than one row at a time, because of this kind of effect. (Or maybe it was one row at a time rather than one column at a time... it's been a long time since I wrote in Fortran.)
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7520

Paul Clapham wrote:I remember when I was learning Fortran, one of the things they told you was that when processing a two-dimensional array you should always try to read the data one column at a time, rather than one row at a time, because of this kind of effect. (Or maybe it was one row at a time rather than one column at a time... it's been a long time since I wrote in Fortran.)

For Java I suspect it would be rows, since 2D arrays are arrays of arrays.
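
A quick sketch of what I mean (the class name is made up):

public class RowVsColumn {
    public static void main(String[] args) {
        int n = 4096;
        int[][] grid = new int[n][n];

        // Row-wise: walks each inner array sequentially.
        long start = System.nanoTime();
        long sum = 0;
        for (int row = 0; row < n; row++)
            for (int col = 0; col < n; col++)
                sum += grid[row][col];
        System.out.println("row-wise:    " + (System.nanoTime() - start) / 1_000_000 + " ms (sum=" + sum + ")");

        // Column-wise: jumps to a different inner array on every access.
        start = System.nanoTime();
        sum = 0;
        for (int col = 0; col < n; col++)
            for (int row = 0; row < n; row++)
                sum += grid[row][col];
        System.out.println("column-wise: " + (System.nanoTime() - start) / 1_000_000 + " ms (sum=" + sum + ")");
    }
}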

BTW, I tried a similar experiment to yours, except with int[]s instead of Integer[]s, and my results were similar, except that the jump for me came between array sizes of 10K and 100K (averaged over 100,000 sums), and random access was only about 70% slower, not several times. For array sizes of 100, the random selection was often faster, but even averaging over 100,000 sums the results varied so much that I wouldn't trust them as far as I could throw them.

Winston


Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3606

My guess is that there is no OS paging involved; it is all down to the processor's built-in caches. In the past I used some computationally intensive software, and the size of the memory caches on the CPU (the "Level 1" and "Level 2" caches, sized in KBs and MBs) made a bigger difference than raw CPU speed. That was several years ago, though, so I don't know how CPU caches have evolved since then.
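
A sketch of how you could look for those cache levels from Java (the sizes and names here are made up; on typical hardware the time per read jumps as the working set outgrows each cache level):

import java.util.Random;

public class CacheLevelDemo {
    static final int HOPS = 50_000_000;

    public static void main(String[] args) {
        // Working sets from 16 KB (fits in L1) up to 64 MB (main memory).
        for (int size = 4 * 1024; size <= 16 * 1024 * 1024; size *= 4) {
            int[] next = randomCycle(size, new Random(42));
            long start = System.nanoTime();
            int index = 0;
            for (int i = 0; i < HOPS; i++) {
                index = next[index];  // each hop is a dependent random read
            }
            double nsPerHop = (System.nanoTime() - start) / (double) HOPS;
            System.out.printf("%8d KB: %.2f ns/read (index=%d)%n",
                    size * 4 / 1024, nsPerHop, index);
        }
    }

    // Builds an array where next[i] points to another slot, forming one big
    // random cycle, so the walk visits every slot in unpredictable order.
    static int[] randomCycle(int size, Random rnd) {
        int[] order = new int[size];
        for (int i = 0; i < size; i++) order[i] = i;
        for (int i = size - 1; i > 0; i--) {  // Fisher-Yates shuffle
            int j = rnd.nextInt(i + 1);
            int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
        }
        int[] next = new int[size];
        for (int i = 0; i < size; i++) {
            next[order[i]] = order[(i + 1) % size];
        }
        return next;
    }
}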
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12760
The size of the CPU's memory cache seems to me to be a likely suspect.

Bill
Anand Hariharan
Rancher

Joined: Aug 22, 2006
Posts: 257

Paul Clapham wrote:When I use SIZE = 1000 in that code (and a suitably larger number of repeats in the "time" method), then contrary to the statement in the original post, I don't see a significant difference between the sequential access and the random access. (So my answer was technically correct -- there isn't a difference for arrays with 1000 elements. Not in my tests, anyway.)

But when I use SIZE = 10000 I do see a significant difference (50% longer for random access). So the effect depends on the size of the array. The larger the array, the bigger the difference.


Spatial locality of reference


"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." -- Antoine de Saint-Exupery
Avi Stramer
Greenhorn

Joined: Aug 23, 2012
Posts: 6

CPU caches have something to do with the reason, but are not themselves the reason.

The reason for this is something called striding. When you access memory in a predictable pattern, a modern CPU will use one of its units to prefetch future memory locations into the cache, in the background, at the "stride" it detected. You have to access memory at a consistent stride for the CPU to recognize this. A primitive array generally gets a contiguous block of memory (at least of virtual memory), so if you walk its indexes sequentially, the CPU will recognize the stride and prefetch future memory locations into its cache.

The reason you don't see this effect on small array sizes is that the whole array fits on one memory page, so random vs. sequential access no longer matters much.

That's the short answer, hope it helps.
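
Here's a sketch of that (the names and sizes are made up): summing one int out of every cache line touches just as many lines as summing them all, so on typical hardware the times stay roughly flat until the stride outgrows a 64-byte cache line.

public class StrideDemo {
    public static void main(String[] args) {
        int[] a = new int[16 * 1024 * 1024];  // 64 MB, much bigger than the CPU caches
        for (int stride : new int[] {1, 2, 4, 8, 16, 32, 64}) {
            long start = System.nanoTime();
            long sum = 0;
            // 16 ints of 4 bytes = one 64-byte cache line, so strides up to 16
            // still touch every cache line in the array.
            for (int i = 0; i < a.length; i += stride) sum += a[i];
            System.out.printf("stride %2d: %3d ms (sum=%d)%n",
                    stride, (System.nanoTime() - start) / 1_000_000, sum);
        }
    }
}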
Don Redd
Ranch Hand

Joined: Jan 05, 2012
Posts: 82

As far as I know, array elements are stored consecutively in memory, and I think accessing the next element in sequence takes less time than accessing a random one. (For example, if the array has been paged out to disk, accessing it randomly can lead to extra disk-operation overhead.)
Avi Stramer
Greenhorn

Joined: Aug 23, 2012
Posts: 6

Good recent blog entry from Martin Thompson on the subject: http://mechanical-sympathy.blogspot.com/2012/08/memory-access-patterns-are-important.html