
Why is reading an array sequentially faster than reading it in a random order (e.g. a[9], a[179])?

 
Abhishek Basu
Greenhorn
Posts: 3
Why is reading an array sequentially faster than reading it in a random order (e.g. a[9], a[179])?
 
Matthew Brown
Bartender
Posts: 4565
Hi Abhishek. Welcome to the Ranch!

What makes you think it is?
 
Abhishek Basu
Greenhorn
Posts: 3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
This is actually a question from my interview, and I have also run some code and found that it is milliseconds slower than the random access of the array.
 
Abhishek Basu
Greenhorn
Posts: 3
Given an array of 1000 elements, why is it substantially faster to read all of the array values sequentially (i.e. array[0], array[1], …. array[999]) rather than in a random order (array[17], array[962], etc...)? You can assume that the array is preloaded with data.

This is the actual question.
 
Bear Bibeault
Author and ninkuma
Marshal
Posts: 64633
How are you generating the random order?
 
Mike Simmons
Ranch Hand
Posts: 3028
If this is true (and I wouldn't be too certain that it is), then I would guess that hardware is often designed to read a block of data at once for efficiency. Much like reading a byte[] from a stream is faster than reading one byte at a time. If the stuff you want to read is sequential, this works great. If not, it results in reading a bunch of stuff you don't actually need, to get the few bits you do need.
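Mike's stream analogy can be demonstrated directly. A minimal sketch (the class name, data size, and buffer size are mine, not from the thread) showing that block reads and byte-at-a-time reads produce the same result, while the block version makes far fewer calls:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BlockReadDemo {
    // Read one byte at a time: one read() call per byte.
    static int readSingly(InputStream in) throws IOException {
        int sum = 0, b;
        while ((b = in.read()) != -1) sum += b;
        return sum;
    }

    // Read a block at a time: the same data in far fewer calls.
    static int readInBlocks(InputStream in) throws IOException {
        int sum = 0, n;
        byte[] buf = new byte[1024];
        while ((n = in.read(buf)) != -1)
            for (int i = 0; i < n; i++) sum += buf[i];
        return sum;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 100);
        int single = readSingly(new ByteArrayInputStream(data));
        int blocks = readInBlocks(new ByteArrayInputStream(data));
        System.out.println(single == blocks); // same answer either way
    }
}
```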
 
Paul Clapham
Sheriff
Posts: 20776
Abhishek Basu wrote:Given an array of 1000 elements, why is it substantially faster to read all of the array values sequentially (i.e. array[0], array[1], …. array[999]) rather than in a random order (array[17], array[962], etc...)?


Don't accept questions with hidden assumptions. The answer to this one is "I don't think it is faster. Do you have some evidence to demonstrate that?"
 
Mike Simmons
Ranch Hand
Posts: 3028
To be fair...
Abhishek Basu wrote:...and I have also run some code and found that it is milliseconds slower than the random access of the array.

This is good. But "milliseconds" isn't very concrete, and many measurements can easily have an error of milliseconds. So you may need to scale the experiment up by a factor of 10,000 or so to make sure the difference is real.
 
Paul Clapham
Sheriff
Posts: 20776
My version of that experiment:

[code not preserved]

Output:

[output not preserved]
So here's some evidence that can be discussed. The difference is significant in this particular test. Let the speculation begin.
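A minimal sketch of the kind of sequential-vs-random benchmark being discussed (the class name, array size, and repeat count are my own assumptions, not Paul's actual code):

```java
import java.util.Random;

public class ArrayAccessBenchmark {
    static final int SIZE = 10_000;
    static final int REPEATS = 10_000;

    // Sum the array in the order given by 'indexes', returning elapsed nanos.
    static long time(int[] array, int[] indexes) {
        long start = System.nanoTime();
        long sum = 0;
        for (int r = 0; r < REPEATS; r++)
            for (int i : indexes) sum += array[i];
        long elapsed = System.nanoTime() - start;
        if (sum == 42) System.out.println(); // keep 'sum' live so the loop isn't optimized away
        return elapsed;
    }

    public static void main(String[] args) {
        int[] array = new int[SIZE];
        int[] sequential = new int[SIZE];
        int[] random = new int[SIZE];
        Random rnd = new Random();
        for (int i = 0; i < SIZE; i++) {
            array[i] = i;
            sequential[i] = i;                 // 0, 1, 2, ...
            random[i] = rnd.nextInt(SIZE);     // arbitrary order
        }
        System.out.printf("sequential: %d ms%n", time(array, sequential) / 1_000_000);
        System.out.printf("random: %d ms%n", time(array, random) / 1_000_000);
    }
}
```

As Mike noted above, single runs are noisy; warming up the JIT and averaging many runs matters before drawing conclusions from the numbers.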
 
Mike Simmons
Ranch Hand
Posts: 3028
10
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Paul Clapham wrote:Let the speculation begin.

Ummmm... I'm sticking with my already-posted speculation.
 
Paul Clapham
Sheriff
Posts: 20776
When I use SIZE = 1000 in that code (and a suitably larger number of repeats in the "time" method) then contrary to the statement in the original post, I don't see a significant difference between the sequential access and the random access. (So my answer was technically correct -- there isn't a difference for arrays with 1000 elements. Not in my tests anyway.)

But when I use SIZE = 10000 I do see a significant difference (50% longer for random access). So the reason depends on the size of the array. The larger the array, the bigger the difference.
 
Mike Simmons
Ranch Hand
Posts: 3028
10
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
My guess, then, is that for a smaller array, the whole thing gets buffered/cached, and accessing the cache randomly is as fast as accessing it sequentially. But when you exceed the size that can be buffered or cached, you get more slowdown as the system starts jumping around and reading different blocks.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13056

Pretty mysterious, all right.

Operating system memory paging?

Bill
 
Mike Simmons
Ranch Hand
Posts: 3028
10
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Something like that - either OS paging or hardware buffering.
 
Paul Clapham
Sheriff
Posts: 20776
I remember when I was learning Fortran, one of the things they told you was that when processing a two-dimensional array you should always try to read the data one column at a time, rather than one row at a time, because of this kind of effect. (Or maybe it was one row at a time rather than one column at a time... it's been a long time since I wrote in Fortran.)
 
Winston Gutkowski
Bartender
Posts: 10111
Paul Clapham wrote:I remember when I was learning Fortran, one of the things they told you was that when processing a two-dimensional array you should always try to read the data one column at a time, rather than one row at a time, because of this kind of effect. (Or maybe it was one row at a time rather than one column at a time... it's been a long time since I wrote in Fortran.)

I suspect for Java it would be rows, since 2D arrays are arrays of arrays.

BTW, I tried a similar experiment to yours, except with int[]s instead of Integer[]s. My results were similar, except that the jump for me came between array sizes of 10K and 100K (averaged over 100,000 sums), and it was only about 70% longer, not several times. For array sizes of 100, the random selection was often faster; but even averaging over 100,000 sums, the results varied so much that I wouldn't trust them as far as I could throw them.

Winston
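The row-vs-column effect mentioned above is easy to try in Java, where a 2D array is an array of row arrays. A minimal sketch (class name, array size, and timing approach are mine, not from the thread):

```java
public class RowVsColumn {
    public static void main(String[] args) {
        int n = 2_000;
        int[][] grid = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                grid[i][j] = i + j;

        // Row-major: walks each inner array sequentially.
        long t0 = System.nanoTime();
        long rowSum = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                rowSum += grid[i][j];
        long rowTime = System.nanoTime() - t0;

        // Column-major: hops to a different inner array on every access.
        long t1 = System.nanoTime();
        long colSum = 0;
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++)
                colSum += grid[i][j];
        long colTime = System.nanoTime() - t1;

        System.out.println(rowSum == colSum); // same total either way
        System.out.printf("row-major %d ms, column-major %d ms%n",
                rowTime / 1_000_000, colTime / 1_000_000);
    }
}
```

The sums are identical; any timing gap comes purely from the access pattern, though single unwarmed runs like this are only suggestive.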
 
Martin Vajsar
Sheriff
Posts: 3751
My guess is that there is no OS paging involved; it is all due to the processor's built-in caches. In the past I used some computationally intensive software, and the size of the CPU's memory caches (the "Level 1" and "Level 2" caches, sized in kBs and MBs) made a bigger difference than raw CPU speed. That was several years ago, though; I don't know how CPU caches have evolved since then.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13056
The size of the CPU's memory cache seems to me to be a likely suspect.

Bill
 
Anand Hariharan
Rancher
Posts: 272
Paul Clapham wrote:When I use SIZE = 1000 in that code (and a suitably larger number of repeats in the "time" method) then contrary to the statement in the original post, I don't see a significant difference between the sequential access and the random access. (So my answer was technically correct -- there isn't a difference for arrays with 1000 elements. Not in my tests anyway.)

But when I use SIZE = 10000 I do see a significant difference (50% longer for random access). So the reason depends on the size of the array. The larger the array, the bigger the difference.


Spatial locality of reference
 
Avi Stramer
Greenhorn
Posts: 6
CPU caches have something to do with the reason, but are not themselves the reason.

The reason for this is something called striding. When you access memory in a predictable pattern, a modern CPU will use one of its units to prefetch future memory locations into cache (in the background) at the "stride" it detected. You have to access memory at a consistent stride for the CPU to recognize this. A primitive array will generally get a contiguous block of memory (at least of virtual memory), so if you walk its indexes sequentially, the CPU will recognize the stride and prefetch future memory locations into its cache.

The reason you don't see this effect on small array sizes is that the whole array fits in cache, so random vs. sequential no longer matters much.

That's the short answer, hope it helps.
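What "a consistent stride" means can be sketched in code (class name and stride value are mine; this only illustrates the access pattern itself, not a measurement of the prefetcher):

```java
public class StrideDemo {
    // Sequential walk: a stride of one element. Trivially predictable.
    static long sumSequential(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i];
        return sum;
    }

    // Constant-stride walk: still predictable, so a prefetcher can follow it.
    // A stride coprime to the length visits every element exactly once.
    static long sumStrided(int[] a, int stride) {
        long sum = 0;
        int idx = 0;
        for (int n = 0; n < a.length; n++) {
            sum += a[idx];
            idx = (idx + stride) % a.length;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[1_000_000];
        for (int i = 0; i < a.length; i++) a[i] = i % 7;
        // 17 is coprime to 1,000,000, so the strided walk hits each index once.
        System.out.println(sumSequential(a) == sumStrided(a, 17));
    }
}
```

A random walk, by contrast, has no stride at all for the hardware to detect, which is Avi's point.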
 
Don Redd
Ranch Hand
Posts: 82
As far as I know, array elements are stored consecutively in memory, and I think accessing the next element in sequence takes less time than accessing a random one (e.g. if the array is on disk, accessing it randomly can lead to disk-seek overhead).
 
Avi Stramer
Greenhorn
Posts: 6
Good recent blog entry from Martin Thompson on the subject: http://mechanical-sympathy.blogspot.com/2012/08/memory-access-patterns-are-important.html
 