Ok, so I have a need for a super fast square root function - common enough thing. Math.sqrt just is not fast enough - I need every bit of performance while doing some N body simulations. So I found this article which lists different ways along with their performance and accuracy. So I'd like to use #14, which it says is 375% faster than the C Math.sqrt, but doing it is a bit beyond me.

It would make a nice little Java library if someone could make a JNA or JNI for this code! (Uses inline method for Assembly)

Jeff Gent wrote:Ok, so I have a need for a super fast square root function - common enough thing. Math.sqrt just is not fast enough - I need every bit of performance while doing some N body simulations. So I found this article which lists different ways along with their performance and accuracy. So I'd like to use #14, which it says is 375% faster than the C Math.sqrt, but doing it is a bit beyond me.

Well, first: it looks like the method might be doing some pointer manipulation, which ain't allowed in Java.

Second: if speed is paramount, why aren't you doing the whole thing in assembler?

Third: 9ns isn't fast enough for you? That's what Math.sqrt() takes on my 4-year old clunker Dell (averaged over 100,000,000 calls).

Fourth: Math.sqrt() is likely to be optimized as and when new algorithms turn up; so its performance is only likely to get better. I compared it with the method from this article, written 2 years ago, which seemed quite neat. The author claims that it's 3 times faster (although not as accurate); I found it to be more than 3 times slower with no repetitions, and four times slower with 1.

Winston

PS: If you're really interested in this stuff, you might want to check out Paul Hsieh's page, which has been around for a while now.

Bats fly at night, 'cause they aren't we. And if we tried, we'd hit a tree -- Ogden Nash (or should've been).
Articles by Winston can be found here

I tried a little while back, but couldn't get it to work, perhaps because of the assembler components. I've made a very basic JNI once, but this one is more difficult.

Winston Gutkowski wrote:Well, first: it looks like the method might be doing some pointer manipulation, which ain't allowed in Java.

Even with JNA or JNI?

Winston Gutkowski wrote:Second: if speed is paramount, why aren't you doing the whole thing in assembler?

I don't know assembler. I'm familiar with Java.

Winston Gutkowski wrote:Third: 9ns isn't fast enough for you? That's what Math.sqrt() takes on my 4-year old clunker Dell (averaged over 100,000,000 calls).

It does distance calculations, so take that * itself * 3, and iterate it 100 times. My Java Profiler shows that the vast majority of the time is spent on sqrt.

Winston Gutkowski wrote:Fourth: Math.sqrt() is likely to be optimized as and when new algorithms turn up; so its performance is only likely to get better. I compared it with the method from this article, written 2 years ago, which seemed quite neat. The author claims that it's 3 times faster (although not as accurate); I found it to be more than 3 times slower with no repetitions, and four times slower with 1.

So because it might be faster in two years, I shouldn't mess it now? How about when it gets faster, I'll switch back.

Just a diversion: are you sure you need to calculate any square roots?

For example, if you wanted to know which of a set of points was closest to a given point, you might at first compute the distance of each of the points to the given point, thus calling the sqrt function once for each of the points. But it's sufficient to work with the squares of the distances; you don't need to call sqrt in this case. Perhaps your algorithm can be similarly adjusted to call sqrt fewer times?

Paul Clapham wrote:Just a diversion: are you sure you need to calculate any square roots?

For example, if you wanted to know which of a set of points was closest to a given point, you might at first compute the distance of each of the points to the given point, thus calling the sqrt function once for each of the points. But it's sufficient to work with the squares of the distances; you don't need to call sqrt in this case. Perhaps your algorithm can be similarly adjusted to call sqrt fewer times?

Thanks, that's a very good point. In places where I could eliminate the sqrt, I have. So some of my distance calculations are just sq(x2 - x1) + sq(y2 - y1) + sq(z2 - z1), others are an approximate distance. In fact, if I could find a more efficient sq function, that would be even better.. haha. My sq() is just (a * a).