1) Some instructions do take longer than others. For example, iconst_0, which pushes a 0 onto the stack, will not take as long to execute as, say, imul, which pops the top two elements off the stack, multiplies them, and pushes the result. imul simply does more work.
There is no list of how long each one takes because, as you say, it is VM dependent. The VM spec lists all the bytecodes and what they do, but does not specify an upper bound on their execution time. This is how it should be, IMO.
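To see these two instructions for yourself, here is a small sketch. The class and method names are made up for illustration, and the bytecode in the comments is what javac typically emits for methods this simple; run javap -c on your own build to confirm.

```java
// Hypothetical demo class; the names are illustrative only.
// Compile:      javac BytecodeDemo.java
// Disassemble:  javap -c BytecodeDemo
class BytecodeDemo {

    // Typically disassembles to:
    //   iconst_0   (push the int constant 0)
    //   ireturn
    static int zero() {
        return 0;
    }

    // Typically disassembles to:
    //   iload_0    (push a)
    //   iload_1    (push b)
    //   imul       (pop both, multiply, push the result)
    //   ireturn
    static int multiply(int a, int b) {
        return a * b;
    }

    public static void main(String[] args) {
        System.out.println(zero());
        System.out.println(multiply(6, 7));
    }
}
```

The disassembly shows what each method asks the VM to do, not how long it takes. Timing still depends on the VM you run it on.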
2) This depends, in part, on your execution environment. If you are running a VM with a JIT, bytecode analysis will be less important than if you are running fully interpreted in an embedded environment. If you are running fully interpreted, you can analyze the bytecode by disassembling the .class file to find inefficiencies. You can then rework your source code to eliminate them. Remember that the javac compiler does little to no optimization. For more, see the Performance chapter of my book.
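As a rough illustration of that workflow, here is one kind of inefficiency that disassembly can expose. The class and method names are hypothetical, and this is a sketch rather than an excerpt from the book. In the first method, javap -c will normally show the invokevirtual for String.length() inside the loop body, because javac does not hoist it. The second method moves the call out of the loop in the source.

```java
// Hypothetical example; inspect the loops with: javap -c LoopDemo
class LoopDemo {

    // s.length() is evaluated on every pass through the loop condition,
    // so the invokevirtual appears inside the loop in the bytecode.
    static int countSpacesNaive(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ' ') {
                count++;
            }
        }
        return count;
    }

    // The length is read once into a local; the loop condition then
    // compares against a local variable instead of making a call.
    static int countSpacesHoisted(String s) {
        int count = 0;
        int len = s.length();
        for (int i = 0; i < len; i++) {
            if (s.charAt(i) == ' ') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSpacesNaive("a b c"));   // prints 2
        System.out.println(countSpacesHoisted("a b c")); // prints 2
    }
}
```

Both methods return the same result. Whether the rewrite is measurably faster depends on the VM: a JIT may well make the difference disappear, while a fully interpreted VM is more likely to show it.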
3) I wrote an article about bytecode that can be found here. In addition, my book contains lots of details about bytecode, its optimization, and its relevance. Also, Bill Venners' book "Inside the Java Virtual Machine" talks about bytecode as well.
I hope this helps,
Peter Haggar
------------------
Senior Software Engineer, IBM
author of:
Practical Java