GeeCON Prague 2014*
The moose likes Java in General and the fly likes JVM customization for faster Array Access Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


JavaRanch » Java Forums » Java » Java in General
Bookmark "JVM customization for faster Array Access" Watch "JVM customization for faster Array Access" New topic
Author

JVM customization for faster Array Access

Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
Hi

As you know JAVA Array access is 2X-3X slower than C or Equivalent Language without bound checks...

As a Research Effort,
I am wondering If I can customize the JVM for Arrays without bounds checking or Java Language Extension for unmanaged Arrays(same functionality but little fancy term ).

Any Ideas on this???

Will OpenJDK license allow me for this customization...? ( If not i still would be interested in doing this for my personal usage and my organizational usage)
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1282

Hi Pratap,

are you experiencing any real problems from this supposedly slower array access? Did you know that the JIT compiler may optimize your code and throw away the bounds checks when it is save?

Marco
Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
JIT compiler may optimize the code by translating to native code...
Native code includes boundary checks..( In our supposed implementation it will not include boundary checks so we will get garbage value for out of boundary Array access where in standard implementation ArrayIndexOutOfBounds Exception will be thrown)

But Are you sure about the Boundary checks.. I think it still checks for the boundary for every Array access(As for my knowledge there isn't any other way).

I would love to hear if you think it won't and how it performs..


I am talking about high performance algorithm perspective...
You must be aware of number of Array access if you use Dynamic programming.
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1282

Have a look at this article on Wikipedia ;-)

I'm not an expert in JIT compiler implementation but it seems that the compiler/JVM most probably will elimante array bounds checking when it is save to remove...

Marco
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14196
    
  20

Pratap koritala wrote:Will OpenJDK license allow me for this customization...?

The whole point of making Java open source is so that people can look at the source code, experiment with it, and even submit patches for it. If you need to know the details of the license, read the legal documents on the OpenJDK website.


Java Beginners FAQ - JavaRanch SCJP FAQ - The Java Tutorial - Java SE 8 API documentation
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39103
    
  23
Pratap koritala wrote:As you know JAVA Array access is 2X-3X slower than C or Equivalent Language without bound checks...
Is it?
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1282

I think it isn't. But even if Java is slower here, I don't see any problems besides some rare special cases.

Anyway, it's easy to find some weaknesses for every programming language or platform. And if it really can't solve a specific problem, you should take one which fits the problem better.

Marco
Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
I'm merely pointing to a feature rather than weakness...

To say concretely ,



In the above code, Do I really need one more boundary check...
I am already checking for i>n for every loop, why to check again for array access?

ArrayIndexOutofBounds surely helps but not when I know what I am doing(Certainly)..

Campbell Ritchie : Is it?

I strongly recommend to visit Online judges for Proof...


BTW, Do you know that some people prefer to write
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1282

Hi Pratap,

as I said the JIT compiler will probably elimante unnecessary boundary checks. For this to work the compiler has to be sure that the array index will definitely be within the valid range of an array. Of course you can't convince the compiler to omit the boundary checks just because you promise not to put in forbidden values for the index The compiler or JVM must be able to infer from the code what values the index may get at runtime. Clearly this optimization won't work if the program logic is too complicated for the compiler to "see" the index values.

And YES, it's definitely a good idea to have one check too much because in languages like C most security bugs come from the fact that boundaries are not checked!

To the exmpale with the try-catch: This is simply bullshit! To catch and ignore an exception which would tell you that your application has tried to use a wrong array index (and of course other exceptions, too) is plain stupid. In almost all cases this exception could give you valuable hints to a bug in your application logic.

Anyway, why do you worry so much about these boundary checks?

Marco
Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
Hi ,

I know this doesn't make sense for lot of people ,But I am talking in high performance context...

To the example with the try-catch: This is simply bullshit! To catch and ignore an exception which would tell you that your application has tried to use a wrong array index (and of course other exceptions, too) is plain stupid. In almost all cases this exception could give you valuable hints to a bug in your application logic.


I mistaken with the code..



I don't think it is a bull shit, you are potentially eliminating one condition.. (i<n)


Thanks for your comments....


>
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1282

I don't think it is a bull shit, you are potentially eliminating one condition.. (i<n)

OK, now I can see the real advantage of this code. Of course in that case it's not bullshit ;-) It's still questionable if such performance hacks are necessary and reasonable in a language like Java.

On the one hand if you have really, really performance problems with a specific part of an application it's maybe better to use a better suited language at least for this part.

On the other hand modern JVMs and JIT compiler can do very much to optimize an application even many times during runtime depending on the actual behavior and needs of the application. So before using such hacks which is always hard when mainting code from another developer for example, it would be a good idea to find out what the compiler can really do for you! Just because the source code may look like it would reduce performance doesn't mean the actually executed bytecode does really look as you would expect! Inlining, loop unrolling, omiting boundary checks... Unfortunately I don't know exact details but there's surely enough information available. Additionally you could disassemble the byte code to have a look at it at compile time to see what the compiler really does although the bytecode may change at runtime.

Marco
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18876
    
  40

As you know JAVA Array access is 2X-3X slower than C or Equivalent Language without bound checks...


What?!? On a modern day processor, a conditional branch is a one-cycle operation. Heck, with pipelining, even this one cycle may be gone. To the compiler, this is not even a hiccup -- and I highly doubt that the optimizer even worries about.

Where did you get your 2X-3X slower than C numbers from?

Henry



Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1282

OK, I seems now we have an expert here
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Indeed-Java numerical performance is fast enough that some folks are even moving away from Fortran. That said, it sure seems like you'd want to just use a JNI library rather than customizing a JVM to eliminate bounds checking.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 39103
    
  23
Pratap koritala wrote:
Campbell Ritchie : Is it?

I strongly recommend to visit Online judges for Proof...
Where are these online judges? Please quote a link.

I trust you would never use a try-catch like that yourself. Even the for loop you quoted first is flawed by not using the a.length field.

Even if there is a performance overhead (and I am sure Henry is correct), there is another much more serious problem with dispensing with bounds checking. In C/C++ overflowing the bounds of an array (most commonly a null-terminated string expressed as a *char) allows writers of malware to gain access to memory. Abandoning bounds checking in your new JVM would probably lay your code open to such attacks.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12792
    
    5
ArrayIndexOutofBounds surely helps but not when I know what I am doing(Certainly)..


Man! I bet every Microsoft programmer who wrote a C++ buffer overflow security risk said exacty the same thing.



Bill
Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252

(Please remember this is not java VS SOMETHING...)

Henry Wong wrote:

Where did you get your 2X-3X slower than C numbers from?



2X-3X is about only the code related to Accessing the array...( Marco Ehrentreich posts has been helpful)

Isn't your instruction pipeline will fail because of if condition...
Isn't that overhead to check for bounds by if loop...
Isn't that overhead analyzing the code for possible optimization, which will never happen because of complex code...(Excluding compile time)
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18876
    
  40


Isn't your instruction pipeline will fail because of if condition...


What "if" condition? What do you mean by fail?

Keep in mind that this is not a Java "if" condition -- it's a conditional branch check (which is part of the code that the compiler generated). Just because the Java "if" condition also has a condition branch, in its generated code, doesn't mean that the array access, will have all the code of a Java "if" condition.

A little history. With older processors, the branch, if not taken, is one cycle. With current processors, the branch, if not taken, is still one cycle. And the branch, if taken, is only one or two cycles. Since the compiler knows that this is array access, it can easily make sure that the valid access is the branch not taken.

As for parallel pipelining, that is processor dependent -- but yes, some processors can do the conditional branch, not taken, at the same time as the next instruction... so even the one cycle is moot.

Isn't that overhead to check for bounds by if loop...


Actual Java "if" conditions and loops are a different story. And this needs to be taken care of by the optimizer. And yes, we can have a separate topic to debates about the merits and comparison of it.

Isn't that overhead analyzing the code for possible optimization, which will never happen because of complex code...(Excluding compile time)


What complex code? It's an array access. The optimizer may get rid of the array access altogether, or optimize the calculation of the index, but that too is a different topic.

Hang on, I'll give the proof in figures... (Please remember this is not java VS SOMETHING DEBATE...)

2X-3X is about only the code related to Accessing the array...( Marco Ehrentreich posts has been helpful)


What debate? You are the one that did the Java to C comparison. You are the one that quoted the 2X-3X number. I merely asked for backup for the number that you threw out as an "as you know" fact. Debate implies that I countered your points -- I just asked for clarification of your facts.

Henry
Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
Let me tell you story and motivation behind these posts..


I've used JAVA for a programming Competition with Extensive use of Arrays.( A pure Dynamic programming solution)
My implementation is just like every body else, But I lost because of time Out (Time Limit Exceded).

I merely asked for backup for the number that you threw out as an "as you know" fact

After that, I changed it to C++ and it ran within the time limit. Difference is around 1.5-1.7 times ( The Claim I made 2x-3x is array access time that I derived from the differences).



That Debate comment not meant for Henry: It's for..
Man! I bet every Microsoft programmer who wrote a C++ buffer overflow security risk said exacty the same thing.


I Agree with William ,if the usage meant for General Public.

It's for,
A language Extension(Apart from regular arrays) for targeted programmers (who participate in Programming Competition) to use this kind of feature..


Steve Luke
Bartender

Joined: Jan 28, 2003
Posts: 4181
    
  21

Pratap koritala wrote:I'm merely pointing to a feature rather than weakness...

To say concretely ,



In the above code, Do I really need one more boundary check...


In this case? Yeah, you need two boundary checks, because n has no relationship to the size of the arrray. If n=6000, then what should you do when you get past the size of the array (error)? If n=500, then need to keep that i < n check because you don't need to iterate over the entire array. The only time the boundary check is redundant is when n is the same size of the array.

But maybe your concrete example was bad?


I merely asked for backup for the number that you threw out as an "as you know" fact

After that, I changed it to C++ and it ran within the time limit. Difference is around 1.5-1.7 times ( The Claim I made 2x-3x is array access time that I derived from the differences).

So, your premise that we should all know that Java array access is 2-3x slower than C/C++ was founded on the anecdotal evidence that your code wasn't up to the competition's standards? Perhaps it suggests a need to optimize your code rather than inherent speed differences in languages. To test the real differences in the language speeds you should do more precise timing tests, and make those tests public (show us the code you used in both cases) so they can be repeated.

So, your question shouldn't be: Java is slower than C++ because of boundary checks, how can I get rid of the boundary checks? It should be: This code sample runs slower in Java than I need it to. How can I make it faster? Or: This code runs too slow in Java, but not too slow in C. Should I run this code in Java or not?


Steve
Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252

So, your question shouldn't be: Java is slower than C++ because of boundary checks, how can I get rid of the boundary checks? It should be: This code sample runs slower in Java than I need it to. How can I make it faster? Or: This code runs too slow in Java, but not too slow in C. Should I run this code in Java or not?


Didn't you see my post asking for language extension...
and also mentioned, It is for programming Competition offering participants this customization.
I encourage you to post a new thread for those...

A language Extension(Apart from regular arrays) for targeted programmers (who participate in Programming Competition) to use this kind of feature..



Perhaps it suggests a need to optimize your code rather than inherent speed differences in languages.

What kind of optimization needed, if the same code is performed good on the other one ... ( I am not complaining about choice of language, there are always risks for a choice)





Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
So, it seems Building a language construct for unmanaged arrays is not worth as there is so little to gain.........
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

What kind of optimization needed, if the same code is performed good on the other one

How could we know without knowing anything about the code? The compiler? Length of execution (so JIT has a better chance)? Anything?

Saying the "same code" across languages is a bit weird.
Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
Ok, Here's the code

David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

The first couple of things that jump out are the compares against n; comparing against 0 is generally faster (not just in Java, and Java performance will vary across JVMs). The exception trick works for dumb benchmarks/code competitions. Don't know what optimizations/JVM you used. Don't know which version of which JVM; there are noticeable performance boosts in some/some versions.
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

Also see here as an example of the wide variety of performance results under different JVMs (and C thrown in for good measure).

Hopefully you get the point.
Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
David Newton wrote:Also see here as an example of the wide variety of performance results under different JVMs (and C thrown in for good measure).


In that article,Major discussion of benchmark has been on Array access...
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1282

Just out of curiosity, does anyone know some tutorials or articles with a comprehensive list of common optimizations a typical JIT compiler in Java will make?

@Pratap: What exactly should this code sample do? Sorry, for today it's to hard to think about it myself Besides the fact that the same code was obviously faster in C++ I'd rather think about an improvement of the algorithm itself instead of compiler optimizations alone. 3 or 4 times nested loops usually never result in very good performance results regarding the time for a computation. If you could come up with a solution to eliminate at least some of the nested loops, I guess this would speed up the application enough to win the competition!

Marco
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18876
    
  40

As many already mentioned, there are tons of reasons why comparing Java and C is like comparing apples and oranges. There are just so many differences that it will be impossible to pinpoint anything in particular. But putting all of that aside...

Just brainstorming a couple of ideas with everyone...

1. Could it be partly caused by the double array accesses? Since Java doesn't support multidimensional arrays, it has to (load and) dereference the array object twice, verses a single load and rereference for C/C++.

2. I wonder how long does it take to run this application? If it is really short, Java may be at a disadvantage as it haven't warmed up (ie. JIT compiled) yet. (can't tell from the application, as the definition of "n" is not included)

[EDIT: Sorry. David already mentioned the second point.]

Henry
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1282

1) Unfortunately no idea, but your speculation sounds reasonable to me.

2) In case the application doesn't run long enough for the JIT compiler to optimize anything, as far as I know it could help to use a server JVM which usually has a higher startup time but does more optimizations before the application starts.

Marco


Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18876
    
  40

Marco Ehrentreich wrote:2) In case the application doesn't run long enough for the JIT compiler to optimize anything, as far as I know it could help to use a server JVM which usually has a higher startup time but does more optimizations before the application starts.


I think its the client JIT that you want for short live programs. The client JIT will immediately JIT compile the classes as soon as it is loaded.

With the server JIT, it will run in interpreter mode, for a while, gathering data about the application, and then, later, it will JIT compile the classes with the profile data that it learned while running it with the interpreter.

With short lived programs, by the time the JIT compiler starts compiling, the application could be done -- hence, it basically ran interpreted.

Henry
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1282

Mmmh... yes, I think, you're right. I recently explained to someone else that startup time for a server application should be faster with a server JVM and indeed it was faster... Just forget my last post

But after thinking about your last post regarding two dimensional array I wonder if it could improve performance to use a one dimensional array and calculate offset and index manually. But I can hardly imagine that this could be faster than the internal array handling of the JVM.

A simpler idea would be to profile this code sample and see what part takes the most time and discuss that for more optimizations.

Marco
Garrett Rowe
Ranch Hand

Joined: Jan 17, 2006
Posts: 1296
Marco Ehrentreich wrote:Just out of curiosity, does anyone know some tutorials or articles with a comprehensive list of common optimizations a typical JIT compiler in Java will make?

http://java.sun.com/products/hotspot/docs/whitepaper/Java_HotSpot_WP_Final_4_30_01.html

If you scroll down to: Java HotSpot Server VM Optimizations, you'll notice this
Range check elimination -- The Java programming language specification requires array bounds checking to be performed with each array access. An index bounds check can be eliminated when the compiler can prove that an index used for an array access is within bounds.
So it seems that the server VM may make this optimization but not the client VM.


Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them. - Laurence J. Peter
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1282

Thanks for the information, Garrett! It seems these are only the few optimization methods mentioned in Wikipedia, too.
So it seems that the server VM may make this optimization but not the client VM.

This one is new to me. I've ever thought that each modern JVM would make some optimizations like this. What is the JIT compiler in a client VM then good for? Are there other things the compiler optimizes besideds such code optimizations?

The client VM compiler doesn't try to execute many of the more complex optimizations performed by the compiler in the server VM, but in exchange requires less time to analyze and compile a piece of code. This means the client VM starts up faster and requires a smaller memory footprint.

@Henry: According to the Sun docs it's really the client VM which has faster startup time. But because server and client VM/compiler support different levels of optimization techniques the situation has changed anyway.

Now it would be interesting to profile the full application and test the impact of different JVMs...

Marco
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18876
    
  40

@Henry: According to the Sun docs it's really the client VM which has faster startup time. But because server and client VM/compiler support different levels of optimization techniques the situation has changed anyway.


I purposely try not to discuss which JIT has a "faster startup" time, because it really depends on how you define "startup". If the definition is running your classes, then the server JIT is faster, as it initially uses the interpreter, and hence, can run your classes sooner. If the definition is running natively, then the client JIT is definitely faster, because the server actually runs initially with the interpreter, gathering profile data, before it compiles the bytecodes.

As for my definition of startup time, I choose the second definition, as that is when the JITs are completely warmed up (running at full speed). This seems to be the same definition used by the Sun docs.

Henry
Pratap koritala
Ranch Hand

Joined: Sep 27, 2006
Posts: 252
Just wanted to give you actual figures on much simpler code...

I've ran same Equivalent Java and C++ code(posted At the End) on perfect bench mark suitable system.


On a different kinds of test data, summing up all

It took 4.2 sec for c++ ( Time limit for C++ is 10 Sec)

and Java took > 20 Sec (Exact time unknown as it is Timed out, There is already 2X grace time for java and it didn't even complete in the grace time also despite being same code)

You can verify that at,
http://www.codechef.com/JULY09/status/CUBESUM,supercharger/


Reasons could be following...
No support for Multi-dimensional Array.(I am not sure about this)
Array Index checking



henry wong wrote:I merely asked for backup for the number that you threw out as an "as you know" fact


Is this "AS YOU KNOW FACT"? ( Now,difference is clearly visible)


Sorry, I forgot that, I cannot post the code as it is currently running competition...

But, I will provide you if you can PM me(After verifying identity).


Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1282

Hi Pratap,

it's still unclear what exactly is the cause of this big difference. But as I wrote some posts above, I'd recommend to run this application and use a profiler to identify where exactly the most time is spent. Then a further discussion will be more reasonable, I think.

Marco
Vlado Zajac
Ranch Hand

Joined: Aug 03, 2004
Posts: 245
Even the innermost loop does input and output so most time should be spent in I/O.

Java I/O is slow when handling text data since it uses 16-bit chars.
Dipanjan Kailthya
Greenhorn

Joined: Jul 04, 2009
Posts: 20
I was looking at the successful times for this problem submitted by other people and found that the user "Darko Aleksic" posted a correct java solution that supposedly took 3.71 seconds which, while not the fastest out there, is still faster than the OP's C++ code....

http://www.codechef.com/JULY09/problems/CUBESUM/


Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18876
    
  40

Pratap koritala wrote:Just wanted to give you actual figures on much simpler code...


At the end of the day, everything stated here, on possible problems is pure speculation. If you want to hunt down the problem, you will need to profile the application.

We would be happy to humor any speculation. In fact, this topic has already went down many speculative discussions -- I particularly liked the one with the server / client JIT differences. But keep in mind, it is all speculation -- no conclusions will ever be drawn from it.

Is this "AS YOU KNOW FACT"? ( Now,difference is clearly visible)


I guess you missed the previous paragraph that mentioned that I was "brainstorming". Or are you now trying to prove that I am a hypocrite?

Again, if you want to hunt down the problem, you will need to profile the application.

Henry
 
GeeCON Prague 2014
 
subject: JVM customization for faster Array Access