File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
The moose likes Performance and the fly likes Can this be improved at all Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Performance
Bookmark "Can this be improved at all" Watch "Can this be improved at all" New topic
Author

Can this be improved at all

Rose Ellis
Greenhorn

Joined: Jun 08, 2012
Posts: 16

Hi everyone,

I'm trying to learn to be more efficient in my code and practising on simple stuff for now.
I've written this tiny program to calculate the sum of all primes up to 2000000. Initially it took over 5 minutes to get the answer, but I managed to improve it to run in 2.6. Still I'm not satisfied and would love to improve further. Maybe you could give me some pointers



If a cluttered desk is a sign of a cluttered mind, then what are we to think of an empty desk?
[Albert Einstein] - or just another way to justify my mess
William P O'Sullivan
Ranch Hand

Joined: Mar 28, 2012
Posts: 860

I was able to shave around 2.3s off.

Original:

"Improved"


I moved the check for 1,3,5,7 outside of the loop as this would be executed anyway 2,000,000 times.
Given that every even number besides 2 is not a prime, this saves at least 1,000,000 method execution
calls, stack manipulation etc.


I also converted boolean isPrime() to int isPrime() returning 0 if not and the input if is to add to the sum.

WP
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 10916
    
  12

You shouldn't need to loop to (potentiallyPrime / 2), but only up to the square root of potentially prime.

You can also not check every number, but only every other number - instead of "number++" do a "number +=2"

Even better, ever prime number is equal to (6n + 1) or (6n - 1), once you get past 3, so that requires even less checking.


There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
William P O'Sullivan
Ranch Hand

Joined: Mar 28, 2012
Posts: 860

WOW!

Changed:

Results in:


Ya learn something new every day.

Thanks Fred.

WP
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 10916
    
  12

I don't have time to look at the code, but you may be able to improve it slightly if you calculate the sqrt once outside of the loop on line 43 (of the original code) and save it, rather than re-calculate each time through...
Rose Ellis
Greenhorn

Joined: Jun 08, 2012
Posts: 16

wow, many great tips. I'll definitely try them out! Thanks guys

A few questions:

This sqrt business, I read somewhere that I could use this for primes, but I don't quite understand the logic behind it. Why only numbers that are up to the square root of the potential prime should be considered? Could someone elaborate on that?

How does returning int, rather than boolean save me time?

Btw, William P O'Sullivan your original resulting time is slightly greater than mine 172794. I take it, it also depends on the CPU?
William P O'Sullivan
Ranch Hand

Joined: Mar 28, 2012
Posts: 860

I too did not know about the sqrt trick, but if you think about it, makes perfect sense.

As for the returning boolean, vs. int then adding to the sum, I figured that if already know
the number should be added (via true result) then execute a redundant "if", why bother?
simply add the result or 0 (non prime). Removing or reordering
conditional clauses is a good optimization technique especially in SQL!

Yes, I presume since my company balks at upgrading my laptop, I took yours first as
a yardstick then went from there.

WP
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3436
    
  47

Is this Project Euler's Problem 10? After you submit the correct result, you can download an accompanying PDF document. It describes a very efficient algorithm for generating primes, called Sieve of Eratosthenes. You'll be able to cut the time down substantially, probably to a few seconds.

Towards the end of this thread you'll find my implementation of the Sieve of Eratosthenes, which employs the optimization described in the above-mentioned document. It can generate 100 million primes in about 40 seconds on my computer. :-)
Rose Ellis
Greenhorn

Joined: Jun 08, 2012
Posts: 16

Thanks William

Martin Vajsar wrote:Is this Project Euler's Problem 10? After you submit the correct result, you can download an accompanying PDF document. It describes a very efficient algorithm for generating primes, called Sieve of Eratosthenes. You'll be able to cut the time down substantially, probably to a few seconds.

Towards the end of this thread you'll find my implementation of the Sieve of Eratosthenes, which employs the optimization described in the above-mentioned document. It can generate 100 million primes in about 40 seconds on my computer. :-)


Yes it is in fact a problem from Euler. I know there is a document explaining the best solution but I just wanted to see how my existing code could be improved because even though I have a couple of years experience in coding in Java I am honestly clueless about performance and what makes the programs slow down and what speeds them up

The sqrt already made it run in a blink of an eye and I'm quite happy with that, but I will definitely take a look at the current algorithms used for primes

Thanks
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 10916
    
  12

Rose Ellis wrote:This sqrt business, I read somewhere that I could use this for primes, but I don't quite understand the logic behind it. Why only numbers that are up to the square root of the potential prime should be considered? Could someone elaborate on that?


divisors come in pairs....so if you find that x % 2 is 0, you know that x % (x/2) will also be zero. If you find that x % 3 = 0, then x % (x/3) is zero, etc.
Rose Ellis
Greenhorn

Joined: Jun 08, 2012
Posts: 16

So as I said I made a little change in my code. Replaced the line 45 in my original post with the following:



The execution time dropped from 2.6mins to 2.4seconds. I'd say this is a considerable improvement. Then I went on and changed the return type from boolean to int as sugessted by William. However, the execution time went back up to 55seconds. Why is that? I thought this change was supposed to speed it up further. My latest code is this:

Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3436
    
  47

The "boolean" version printed out the sum on the screen only if the number was prime. The "int" version prints out the sum for each number, regardless of whether it is prime or not. Printing out text/numbers is very expensive operation, relative to the rest of your code at least. I'd suggest to remove it from the inner loop altogether and only output the final result, if you want to meaningfully compare the performance of different approaches.
Rose Ellis
Greenhorn

Joined: Jun 08, 2012
Posts: 16

Martin Vajsar wrote:The "boolean" version printed out the sum on the screen only if the number was prime. The "int" version prints out the sum for each number, regardless of whether it is prime or not. Printing out text/numbers is very expensive operation, relative to the rest of your code at least. I'd suggest to remove it from the inner loop altogether and only output the final result, if you want to meaningfully compare the performance of different approaches.


I see. Didn't think about that. Never would have thought that printing takes so much time. I thought that adding 0 was slowing it down. As soon as I removed the line, the result was computed in 0.89seconds. Amazing

So quick question. In my project we're are using org.slf4j.Logger to output debugging statements. If printing is so expensive, does that mean that our app performance is reduced? Or is Logger somewhat cheaper then System.out?
William P O'Sullivan
Ranch Hand

Joined: Mar 28, 2012
Posts: 860

SWEET!

Yes ANY I/O is expensive, and since System.out is usually not conditional, unlike logging levels, you will take a performance hit.

WP
Martin Vajsar
Sheriff

Joined: Aug 22, 2010
Posts: 3436
    
  47

The impact of logging depends heavily on your system and how much do you actually log. It should be possible to measure that, though, by running load tests with various levels of logging, and monitoring response times and CPU/IO usage.

But remember, you've printed out two million lines in your code. I've got a system which logs quite heavily, and that one would take a day or more to generate two million log calls. An overhead of a minute or two is probably not noticeable at all.
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 10916
    
  12

A couple of important take-aways:

1) speed is (almost) never the correct thing to go for. It especially is not what to go for first. Write clean, clear code that you can understand if you put it aside for a week or a month. Then run it. Then if it is too slow, look for ways to improve it.
2) EVERYTHING impacts performance. This includes other processes, so if you have a browser running flash during one run and not the other, you may notice a difference.
3) re-read rule #1
Rose Ellis
Greenhorn

Joined: Jun 08, 2012
Posts: 16

I totally agree fred, that's why I decided to remove the following from my code, because in terms of speed it doesn't add much value, but I don't like using constants in my code



This was a very interesting exercise for me. Thanks everyone for the tips, I really learnt something
Winston Gutkowski
Bartender

Joined: Mar 17, 2011
Posts: 7063
    
  16

Rose Ellis wrote:This was a very interesting exercise for me. Thanks everyone for the tips, I really learnt something

In the hope that you haven't abandoned this, I'll say one other thing:

You've probably already worked out that using previous results is definitely the way to go when using a divisor algorithm. The main problem with that is that it needs a list of "previous primes" to work with. A List has the advantage of being flexible, but if you're really into speed, an array is (a) quicker and (b) smaller.

One thing that might help is the fact that, given some limit L, a reasonable approximation of the number of primes < L is L/ln(L) (ln = the natural log of L). So, if you want to find all the primes less than x, you need an array that is approximately sqrt(x) / ln(sqrt(x)).

The trouble is, the approximation is almost always an underestimate - in fact it's always an underestimate once you get past L=60 or so - which means that it's less than the size of array you need. Fortunately, once you get to L = 100 it's within 20%; by L=1000000, it's within 10% (but don't forget, it's always an underestimate). There are even better estimation formulas around, including a great one for values of L > 56, but the proper one may be more trouble than it's worth (it's quite involved).

All can be found at Wikipedia's Prime Number Theorem page if you're interested.

Winston

Isn't it funny how there's always time and money enough to do it WRONG?
Artlicles by Winston can be found here
Ivan Jozsef Balazs
Rancher

Joined: May 22, 2012
Posts: 865
    
    5
> This sqrt business

If a number is not prime, then it can be composed as a product of two lesser numbers.

n = a*b;

The smaller of "a" and "b" must be at most the square root of "n", otherwise the product would exceed "n".
So the smallest (non-trivial) divisor of the number can be at most sqrt(n).

If n is the square of a prime, its smallest (non-trivial) divisor equals to sqrt(n).



 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Can this be improved at all
 
Similar Threads
How do you evaluate getTime() repeatedly?
Grid size limit
"write end dead" or "pipe" broken error
why use int over double?
how many project Euler problems have you solved?