Why is reading an array sequentially faster than reading it in a random manner (like a[9], a[179])?

Abhishek Basu
Greenhorn
Posts: 3
Why is reading an array sequentially faster than reading it in a random manner (like a[9], a[179])?

Matthew Brown
Bartender
Posts: 4567
8
Hi Abhishek. Welcome to the Ranch!

What makes you think it is?

Abhishek Basu
Greenhorn
Posts: 3
This is actually a question from my interview, and I have also run some code and found that random access of the array is milliseconds slower.

Abhishek Basu
Greenhorn
Posts: 3
Given an array of 1000 elements, why is it substantially faster to read all of the array values sequentially (i.e. array[0], array[1], …. array[999]) rather than in a random order (array[17], array[962], etc...)? You can assume that the array is preloaded with data.

This is the actual question.

Bear Bibeault
Author and ninkuma
Marshal
Posts: 64830
86
How are you generating the random order?

Mike Simmons
Ranch Hand
Posts: 3076
14
• 2
If this is true (and I wouldn't be too certain that it is), then I would guess that hardware is often designed to read a block of data at once for efficiency. Much like reading a byte[] from a stream is faster than reading one byte at a time. If the stuff you want to read is sequential, this works great. If not, it results in reading a bunch of stuff you don't actually need, to get the few bits you do need.

Paul Clapham
Sheriff
Posts: 21107
32
• 1
Abhishek Basu wrote:Given an array of 1000 elements, why is it substantially faster to read all of the array values sequentially (i.e. array[0], array[1], …. array[999]) rather than in a random order (array[17], array[962], etc...)?

Don't accept questions with hidden assumptions. The answer to this one is "I don't think it is faster. Do you have some evidence to demonstrate that?"

Mike Simmons
Ranch Hand
Posts: 3076
14
To be fair...
Abhishek Basu wrote:...and I have also run some code and found that random access of the array is milliseconds slower.

This is good. But "milliseconds" isn't very concrete, and many measurements can easily have an error of several milliseconds. So you may need to scale the experiment up by a factor of 10,000 or so to make sure the difference is real.

Paul Clapham
Sheriff
Posts: 21107
32
• 1
My version of that experiment:

Output:

So here's some evidence that can be discussed. The difference is significant in this particular test. Let the speculation begin.
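The code and output from that post aren't preserved here. As a hypothetical sketch (the class and names below are my own, not from the thread), a benchmark along those lines might look like this:

```java
import java.util.Random;

// Hypothetical sketch of a sequential-vs-random read benchmark
// (the original code from the post isn't shown in the thread).
public class ArrayAccessBenchmark {
    static final int SIZE = 10_000;
    static final int REPEATS = 10_000;

    // Sum the array in the given index order, print the elapsed time,
    // and return the sum (so the JIT can't eliminate the work).
    static long time(String label, int[] data, int[] order) {
        long start = System.nanoTime();
        long sum = 0;
        for (int r = 0; r < REPEATS; r++) {
            for (int index : order) {
                sum += data[index];
            }
        }
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + ms + " ms");
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[SIZE];
        int[] sequential = new int[SIZE];
        int[] shuffled = new int[SIZE];
        Random rnd = new Random(42);
        for (int i = 0; i < SIZE; i++) {
            data[i] = rnd.nextInt(100);
            sequential[i] = i;
            shuffled[i] = i;
        }
        // Fisher-Yates shuffle of the visiting order.
        for (int i = SIZE - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = shuffled[i];
            shuffled[i] = shuffled[j];
            shuffled[j] = tmp;
        }
        long s1 = time("sequential", data, sequential);
        long s2 = time("random    ", data, shuffled);
        // Both orders visit every element REPEATS times, so the sums match.
        assert s1 == s2;
    }
}
```

Both loops read exactly the same elements the same number of times; only the visiting order differs, so any timing gap is down to memory access behaviour rather than the arithmetic.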

Mike Simmons
Ranch Hand
Posts: 3076
14
Paul Clapham wrote:Let the speculation begin.

Ummmm... I'm sticking with my already-posted speculation.

Paul Clapham
Sheriff
Posts: 21107
32
When I use SIZE = 1000 in that code (and a suitably larger number of repeats in the "time" method) then contrary to the statement in the original post, I don't see a significant difference between the sequential access and the random access. (So my answer was technically correct -- there isn't a difference for arrays with 1000 elements. Not in my tests anyway.)

But when I use SIZE = 10000 I do see a significant difference (50% longer for random access). So the reason depends on the size of the array. The larger the array, the bigger the difference.

Mike Simmons
Ranch Hand
Posts: 3076
14
My guess, then, is that for a smaller array, the whole thing gets buffered/cached, and accessing the cache randomly is as fast as accessing it sequentially. But when you exceed the size that can be buffered or cached, you get more slowdown as the system starts jumping around and reading different blocks.

William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13061
6

Pretty mysterious alright -

Operating system memory paging?

Bill

Mike Simmons
Ranch Hand
Posts: 3076
14
Something like that - either OS paging or hardware buffering.

Paul Clapham
Sheriff
Posts: 21107
32
I remember when I was learning Fortran, one of the things they told you was that when processing a two-dimensional array you should always try to read the data one column at a time, rather than one row at a time, because of this kind of effect. (Or maybe it was one row at a time rather than one column at a time... it's been a long time since I wrote in Fortran.)

Winston Gutkowski
Bartender
Posts: 10417
63
Paul Clapham wrote:I remember when I was learning Fortran, one of the things they told you was that when processing a two-dimensional array you should always try to read the data one column at a time, rather than one row at a time, because of this kind of effect. (Or maybe it was one row at a time rather than one column at a time... it's been a long time since I wrote in Fortran.)

I suspect for Java it would definitely be rows, since 2D arrays are arrays of arrays.

BTW, I tried a similar experiment to yours, except with int[]s instead of Integer[]s, and my results were similar, except the jump for me came between array sizes of 10K and 100K (averaged over 100,000 sums), and it was only about 70% longer, not several times.

For array sizes of 100, the random selection was often faster; but even averaging over 100,000 sums, the results varied so much that I wouldn't trust them as far as I could throw them.

Winston
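The row-vs-column point above can be seen directly in Java (a hypothetical sketch of my own, not code from the thread): a 2D array is an array of row arrays, so row-major traversal walks one contiguous block at a time, while column-major traversal hops between row objects on every step.

```java
// Hypothetical sketch: summing a Java 2D array row-by-row vs. column-by-column.
// Each matrix[i] is its own contiguous int[], so the row-major loop reads
// memory sequentially; the column-major loop jumps between row arrays.
public class RowVsColumn {
    static long sumRowMajor(int[][] matrix) {
        long sum = 0;
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < matrix[i].length; j++) {
                sum += matrix[i][j];   // walks one row's contiguous block
            }
        }
        return sum;
    }

    static long sumColumnMajor(int[][] matrix) {
        long sum = 0;
        for (int j = 0; j < matrix[0].length; j++) {
            for (int i = 0; i < matrix.length; i++) {
                sum += matrix[i][j];   // lands in a different row array each step
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 2_000;
        int[][] matrix = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                matrix[i][j] = i + j;

        long t0 = System.nanoTime();
        long a = sumRowMajor(matrix);
        long t1 = System.nanoTime();
        long b = sumColumnMajor(matrix);
        long t2 = System.nanoTime();
        System.out.println("row-major:    " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("column-major: " + (t2 - t1) / 1_000_000 + " ms");
        assert a == b;  // same elements either way
    }
}
```

(Fortran is the opposite, column-major: consecutive elements of a column are adjacent in memory, which is presumably what Paul's instructors were getting at.)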

Martin Vajsar
Sheriff
Posts: 3752
62
My guess is that there is no OS paging involved; it is all due to built-in processor caches. In the past I used some computationally intensive software, and the size of the memory caches on the CPU (the "Level 1" and "Level 2" caches, sized in KBs and MBs) made a bigger difference than raw CPU speed. That was several years ago, though, so I don't know how CPU caches have evolved since then.

William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13061
6
The size of the CPU memory cache seems to me a likely suspect.

Bill

Anand Hariharan
Rancher
Posts: 272
Paul Clapham wrote:When I use SIZE = 1000 in that code (and a suitably larger number of repeats in the "time" method) then contrary to the statement in the original post, I don't see a significant difference between the sequential access and the random access. (So my answer was technically correct -- there isn't a difference for arrays with 1000 elements. Not in my tests anyway.)

But when I use SIZE = 10000 I do see a significant difference (50% longer for random access). So the reason depends on the size of the array. The larger the array, the bigger the difference.

Spatial locality of reference

Avi Stramer
Greenhorn
Posts: 6
CPU caches have something to do with the reason, but are not themselves the reason.

The reason for this is something called striding. When you access memory in a predictable pattern, a modern CPU will use one of its units to prefetch future memory locations into cache (in the background) at the "stride" it detected. You have to access memory at a consistent stride for the CPU to recognize this. A primitive array will generally get a contiguous block of memory (at least in virtual memory), so if you walk its indexes sequentially, the CPU will recognize the stride and prefetch future memory locations into its cache.

The reason you don't see this effect on small array sizes is that the whole array fits on one memory page, so random vs. sequential access no longer matters much.

That's the short answer, hope it helps.
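As a hypothetical illustration of what "a consistent stride" means (my own sketch, not from the post): the loops below touch the same elements, but only the first two follow a constant pattern a hardware prefetcher could detect.

```java
import java.util.Random;

// Hypothetical sketch of three access patterns over the same array:
// stride 1 and stride 16 are predictable (prefetcher-friendly);
// the shuffled order is not.
public class StridePatterns {
    // Visit indexes 0, stride, 2*stride, ... then 1, 1+stride, ... :
    // each inner loop advances by a constant, detectable stride.
    static long sumWithStride(int[] data, int stride) {
        long sum = 0;
        for (int start = 0; start < stride; start++) {
            for (int i = start; i < data.length; i += stride) {
                sum += data[i];
            }
        }
        return sum;
    }

    static long sumRandomOrder(int[] data, int[] order) {
        long sum = 0;
        for (int index : order) {  // no pattern for the prefetcher to latch onto
            sum += data[index];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1 << 20];
        java.util.Arrays.fill(data, 1);
        int[] order = new int[data.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Fisher-Yates shuffle to build an unpredictable visiting order.
        Random rnd = new Random(1);
        for (int i = order.length - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int t = order[i]; order[i] = order[j]; order[j] = t;
        }
        System.out.println(sumWithStride(data, 1));    // sequential
        System.out.println(sumWithStride(data, 16));   // constant stride
        System.out.println(sumRandomOrder(data, order));
    }
}
```

All three calls compute the same sum; timing them individually is what would expose the prefetcher's effect.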

Don Redd
Ranch Hand
Posts: 82
As far as I know, array elements are stored consecutively in memory, and I think accessing the next element in sequence takes less time than accessing a random one (e.g. if the array is on disk, accessing it randomly can add disk-seek overhead).

Avi Stramer
Greenhorn
Posts: 6
• 1
Good recent blog entry from Martin Thompson on the subject: http://mechanical-sympathy.blogspot.com/2012/08/memory-access-patterns-are-important.html