This week's book giveaway is in the Big Data forum. We're giving away four copies of Elasticsearch in Action and have Radu Gheorghe & Matthew Lee Hinman on-line! See this thread for details.

I'm trying to learn to be more efficient in my code and practising on simple stuff for now.
I've written this tiny program to calculate the sum of all primes up to 2000000. Initially it took over 5 minutes to get the answer, but I managed to improve it to run in 2.6. Still I'm not satisfied and would love to improve further. Maybe you could give me some pointers

If a cluttered desk is a sign of a cluttered mind, then what are we to think of an empty desk?
[Albert Einstein] - or just another way to justify my mess

I moved the check for 1,3,5,7 outside of the loop as this would be executed anyway 2,000,000 times.
Given that every even number besides 2 is not a prime, this saves at least 1,000,000 method execution
calls, stack manipulation etc.

I also converted boolean isPrime() to int isPrime() returning 0 if not and the input if is to add to the sum.

I don't have time to look at the code, but you may be able to improve it slightly if you calculate the sqrt once outside of the loop on line 43 (of the original code) and save it, rather than re-calculate each time through...

wow, many great tips. I'll definitely try them out! Thanks guys

A few questions:

This sqrt business, I read somewhere that I could use this for primes, but I don't quite understand the logic behind it. Why only numbers that are up to the square root of the potential prime should be considered? Could someone elaborate on that?

How does returning int, rather than boolean save me time?

Btw, William P O'Sullivan your original resulting time is slightly greater than mine 172794. I take it, it also depends on the CPU?

I too did not know about the sqrt trick, but if you think about it, makes perfect sense.

As for the returning boolean, vs. int then adding to the sum, I figured that if already know
the number should be added (via true result) then execute a redundant "if", why bother?
simply add the result or 0 (non prime). Removing or reordering
conditional clauses is a good optimization technique especially in SQL!

Yes, I presume since my company balks at upgrading my laptop, I took yours first as
a yardstick then went from there.

Is this Project Euler's Problem 10? After you submit the correct result, you can download an accompanying PDF document. It describes a very efficient algorithm for generating primes, called Sieve of Eratosthenes. You'll be able to cut the time down substantially, probably to a few seconds.

Towards the end of this thread you'll find my implementation of the Sieve of Eratosthenes, which employs the optimization described in the above-mentioned document. It can generate 100 million primes in about 40 seconds on my computer. :-)

Martin Vajsar wrote:Is this Project Euler's Problem 10? After you submit the correct result, you can download an accompanying PDF document. It describes a very efficient algorithm for generating primes, called Sieve of Eratosthenes. You'll be able to cut the time down substantially, probably to a few seconds.

Towards the end of this thread you'll find my implementation of the Sieve of Eratosthenes, which employs the optimization described in the above-mentioned document. It can generate 100 million primes in about 40 seconds on my computer. :-)

Yes it is in fact a problem from Euler. I know there is a document explaining the best solution but I just wanted to see how my existing code could be improved because even though I have a couple of years experience in coding in Java I am honestly clueless about performance and what makes the programs slow down and what speeds them up

The sqrt already made it run in a blink of an eye and I'm quite happy with that, but I will definitely take a look at the current algorithms used for primes

Rose Ellis wrote:This sqrt business, I read somewhere that I could use this for primes, but I don't quite understand the logic behind it. Why only numbers that are up to the square root of the potential prime should be considered? Could someone elaborate on that?

divisors come in pairs....so if you find that x % 2 is 0, you know that x % (x/2) will also be zero. If you find that x % 3 = 0, then x % (x/3) is zero, etc.

So as I said I made a little change in my code. Replaced the line 45 in my original post with the following:

The execution time dropped from 2.6mins to 2.4seconds. I'd say this is a considerable improvement. Then I went on and changed the return type from boolean to int as sugessted by William. However, the execution time went back up to 55seconds. Why is that? I thought this change was supposed to speed it up further. My latest code is this:

The "boolean" version printed out the sum on the screen only if the number was prime. The "int" version prints out the sum for each number, regardless of whether it is prime or not. Printing out text/numbers is very expensive operation, relative to the rest of your code at least. I'd suggest to remove it from the inner loop altogether and only output the final result, if you want to meaningfully compare the performance of different approaches.

Martin Vajsar wrote:The "boolean" version printed out the sum on the screen only if the number was prime. The "int" version prints out the sum for each number, regardless of whether it is prime or not. Printing out text/numbers is very expensive operation, relative to the rest of your code at least. I'd suggest to remove it from the inner loop altogether and only output the final result, if you want to meaningfully compare the performance of different approaches.

I see. Didn't think about that. Never would have thought that printing takes so much time. I thought that adding 0 was slowing it down. As soon as I removed the line, the result was computed in 0.89seconds. Amazing

So quick question. In my project we're are using org.slf4j.Logger to output debugging statements. If printing is so expensive, does that mean that our app performance is reduced? Or is Logger somewhat cheaper then System.out?

The impact of logging depends heavily on your system and how much do you actually log. It should be possible to measure that, though, by running load tests with various levels of logging, and monitoring response times and CPU/IO usage.

But remember, you've printed out two million lines in your code. I've got a system which logs quite heavily, and that one would take a day or more to generate two million log calls. An overhead of a minute or two is probably not noticeable at all.

1) speed is (almost) never the correct thing to go for. It especially is not what to go for first. Write clean, clear code that you can understand if you put it aside for a week or a month. Then run it. Then if it is too slow, look for ways to improve it.
2) EVERYTHING impacts performance. This includes other processes, so if you have a browser running flash during one run and not the other, you may notice a difference.
3) re-read rule #1

I totally agree fred, that's why I decided to remove the following from my code, because in terms of speed it doesn't add much value, but I don't like using constants in my code

This was a very interesting exercise for me. Thanks everyone for the tips, I really learnt something

Rose Ellis wrote:This was a very interesting exercise for me. Thanks everyone for the tips, I really learnt something

In the hope that you haven't abandoned this, I'll say one other thing:

You've probably already worked out that using previous results is definitely the way to go when using a divisor algorithm. The main problem with that is that it needs a list of "previous primes" to work with. A List has the advantage of being flexible, but if you're really into speed, an array is (a) quicker and (b) smaller.

One thing that might help is the fact that, given some limit L, a reasonable approximation of the number of primes < L is L/ln(L) (ln = the natural log of L). So, if you want to find all the primes less than x, you need an array that is approximately sqrt(x) / ln(sqrt(x)).

The trouble is, the approximation is almost always an underestimate - in fact it's always an underestimate once you get past L=60 or so - which means that it's less than the size of array you need. Fortunately, once you get to L = 100 it's within 20%; by L=1000000, it's within 10% (but don't forget, it's always an underestimate). There are even better estimation formulas around, including a great one for values of L > 56, but the proper one may be more trouble than it's worth (it's quite involved).

If a number is not prime, then it can be composed as a product of two lesser numbers.

n = a*b;

The smaller of "a" and "b" must be at most the square root of "n", otherwise the product would exceed "n".
So the smallest (non-trivial) divisor of the number can be at most sqrt(n).

If n is the square of a prime, its smallest (non-trivial) divisor equals to sqrt(n).

Don't get me started about those stupid light bulbs.