Matthew Brown wrote:It doesn't actually return an infinite number of primes. It returns an iterator that will generate as many primes as it is asked to. So stick it in a for loop and it will run forever (or until you run out of memory). It's a sort of lazy generation.

It will run until the prime numbers exceed Integer.MAX_VALUE, at which point my generator throws a RuntimeException. But it's more likely that the user will just abort before then because it takes so long.

Yep, I just like to read the type of code that's possible with methods such as Primes.each() and Primes.greaterThan(). For instance, here's the source of my factorize() method:

Stephan van Hulst wrote:It will run until the prime numbers exceed Integer.MAX_VALUE, at which point my generator throws a RuntimeException. But it's more likely that the user will just abort before then because it takes so long.

You all got me hooked to Project Euler too.

I've coded the Sieve of Eratosthenes for the prime-related problems and toyed with it a bit. It can get first 100 million primes in 80 seconds. Given that the 100,000,000th prime is 2,038,074,743, I don't think it makes much sense to go further with Integers.

Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3366

9

posted

0

Hey Martin, I'm curious to see the program you used to generate the first 100 million primes.

1) Don't concentrate on design issues. This is what I hacked out quickly (I polished it a bit, but it only helped so much). This code is ugly. The important part is mathematics, not design.

2) Some (ok, most) of the optimization were taken from the Project Euler's Problem 10 PDF:

- Skipping even number and not tracking even numbers in the sieve is obvious.
- The 6k±1 rule excludes even number and multiples of three, improving on the obvious "skip even numbers" optimization by a factor of three a third. Well, obvious, but I missed on this. Including multiples of five would complicate things substantially with appropriately diminished returns (skipping only about one fifth of remaining numbers).
- Marking multiples of n in the sieve starting at n*n is very significant improvement, which makes processing larger numbers run much faster. Again, easy to derive, but I missed on that.
- No need to mark multiples higher than square root of largest expected number, this shortens code path slightly, but is not actually that important.
- Approximations of number of primes less than X and of upper bound for Nth prime are taken from Wikipedia (Prime Counting Theorem). I wouldn't have an ambition to derive these myself

Use int[] primes = Primes.listOfFirstNPrimes(Primes.MAX_NUM_OF_PRIMES); to obtain the list of hundred million primes. You need to give your JVM enough memory (I've used -Xmx1024m). It runs in around 75-80 seconds on my notebook.

Martin Vajsar wrote:Edit: this actually surprises me a lot. I'd expect the difference to be much smaller, given the amount of work BitSet.set() has to do.

Yeah, me too. I remember a few years ago I was doing this problem, and I found that BitSet was notably slower than a boolean[] array. However the latter used a lot more memory - I forget if it was a factor of 8 or 32. It might be worth more investigation under a modern JVM, to see how the numbers work out today.

I'd blow up the memory with boolean[] array, but I've got rid of the BitSet; I use a long[] array and manipulate the bits of my own. I don't need the BitSet's ability to grow (I know the right size beforehand), neither the internal checks. Time is slightly under 40 seconds now.

Stephan van Hulst
Bartender

Joined: Sep 20, 2010
Posts: 3366

9

posted

0

Wow, nice. I'm impressed. My 'simple' code is probably an order of 100 times slower.

Sorry for bringing up this zombie thread, but I wanted to share: this year I got an upgrade and I tried to run the program on the new hardware. It completes in 17 seconds.

The moral of this story probably is that Moore's law eventually solves all performance problems.