Sorry, I'm new to this forum, and sorry for the long answer, but I work in this area and hopefully can help explain things from the JDK perspective.
BCI (bytecode instrumentation of class files) is not something that either jvmpi or jvmti does itself. They provide ways for users of the interface to do BCI, so it's the user of jvmpi or jvmti that actually does the instrumentation.
Using jvmpi or jvmti, you can instrument classfiles in memory before the Java virtual machine sees them (the ClassFileLoadHook event), or you can redefine a class after the virtual machine has loaded it (the RedefineClasses interface). Dynamic BCI with RedefineClasses, after a class has already been loaded, is risky, especially with jvmpi. The jvmti in the latest JDK 5 update is considerably more robust.
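For anyone curious what those two paths look like from the agent writer's side, here is a minimal sketch using java.lang.instrument, the Java-level agent API that is built on jvmti, rather than a native jvmti agent. The class names and the `redefine` helper are hypothetical. The transformer does no BCI at all (returning null means "leave the class unchanged"); it only marks the spot where a BCI library would rewrite the bytes, and counts whether it was called for an initial load or a redefinition. Packaging is assumed: a jar whose manifest names the class in `Premain-Class` (plus `Can-Redefine-Classes: true` for redefinition), started with `-javaagent`.

```java
import java.lang.instrument.ClassDefinition;
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical agent class, for illustration only.
public class LoadHookAgent {

    // Sees every classfile image the VM hands to registered transformers.
    static final class CountingTransformer implements ClassFileTransformer {
        final AtomicInteger initialLoads = new AtomicInteger();
        final AtomicInteger redefinitions = new AtomicInteger();

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            // classBeingRedefined is null on an initial load (the load-hook
            // case) and non-null when the class is being redefined.
            if (classBeingRedefined == null) {
                initialLoads.incrementAndGet();
            } else {
                redefinitions.incrementAndGet();
            }
            // A real agent would pass classfileBuffer to a BCI library here
            // and return the rewritten image. null means "no change".
            return null;
        }
    }

    static final CountingTransformer TRANSFORMER = new CountingTransformer();

    // Called by the VM before main() when started with -javaagent:agent.jar.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(TRANSFORMER);
    }

    // Redefinition after the class is already loaded: the riskier path.
    static void redefine(Instrumentation inst, Class<?> target, byte[] newImage)
            throws Exception {
        if (!inst.isRedefineClassesSupported()) {
            throw new UnsupportedOperationException(
                    "agent jar manifest lacks Can-Redefine-Classes: true");
        }
        inst.redefineClasses(new ClassDefinition(target, newImage));
    }
}
```

Note that the interface only delivers the bytes and accepts replacement bytes; everything interesting (the actual rewriting) would have to happen inside `transform`.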
But jvmti and jvmpi are just the interfaces that let you inject the instrumented classfile. Something else has to actually perform the BCI on the classfile.
You could even instrument the classfiles on disk and avoid jvmpi and jvmti completely, just running your instrumented classes, but I don't think many people do that. The total number of classes available is so much larger than the number an application actually loads and uses that capturing each classfile image just before the VM sees it (the ClassFileLoadHook event approach) is usually preferred.
HPROF in JDK 5 uses jvmti, the ClassFileLoadHook event, and BCI. In JDK 5 and 6 the BCI itself is done by a native library called java_crw_demo (libjava_crw_demo.so or java_crw_demo.dll). It is independent of any jvm* interface; it's just a piece of native code that does primitive BCI on classfiles. The source to java_crw_demo is also included in the JDK 5 and 6 downloads; look in the demo/jvmti directory.
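To give a feel for what "primitive BCI on classfiles" starts from, here is a small sketch, written in Java rather than native C and not taken from java_crw_demo, that reads the fixed header of a classfile image: the 0xCAFEBABE magic number, the minor and major version, and the constant pool count. The class name is hypothetical. A real BCI library has to go much further, parsing the constant pool and every method's bytecode, then writing out a new, consistent image.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: parses only the first 10 bytes of a classfile image.
public class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    final int minor;
    final int major;
    final int constantPoolCount;

    ClassFileHeader(byte[] image) {
        // ByteBuffer is big-endian by default, which matches the classfile format.
        // A truncated image throws BufferUnderflowException.
        ByteBuffer buf = ByteBuffer.wrap(image);
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a classfile (bad magic)");
        }
        // u2 fields are unsigned, so mask off the sign extension from getShort().
        minor = buf.getShort() & 0xFFFF;
        major = buf.getShort() & 0xFFFF;
        constantPoolCount = buf.getShort() & 0xFFFF;
    }
}
```

For example, passing it the `classfileBuffer` delivered by a load hook for a class compiled for JDK 5 would report major version 49.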
There are other BCI libraries as well, such as BCEL.
The HPROF agent also does CPU sampling, which doesn't use BCI and uses very little of jvmti (or of jvmpi in the past): it just has a thread that wakes up at a fixed interval and samples the Java stacks of all Java threads. I think this is actually the lowest-overhead way to profile Java threads by sampling.
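The sampling idea can be sketched in plain Java with `Thread.getAllStackTraces()`; a native agent like HPROF would get the stacks through jvmti instead. This is an illustration under my own assumptions, not HPROF's implementation: the class name is hypothetical, it only records the top frame of each thread, and it only counts threads in the RUNNABLE state as a rough stand-in for "using CPU".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sampler: wakes up every intervalMillis and samples all thread stacks.
public class StackSampler implements Runnable {
    private final long intervalMillis;
    private final Map<String, AtomicLong> topFrameCounts = new ConcurrentHashMap<>();
    private volatile boolean running = true;

    StackSampler(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    @Override
    public void run() {
        Thread self = Thread.currentThread();
        while (running) {
            sampleOnce(self);
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Takes one sample of every live thread except the sampler itself.
    void sampleOnce(Thread self) {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            StackTraceElement[] stack = e.getValue();
            if (t == self || stack.length == 0 || t.getState() != Thread.State.RUNNABLE) {
                continue;
            }
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            topFrameCounts.computeIfAbsent(top, k -> new AtomicLong()).incrementAndGet();
        }
    }

    void stop() {
        running = false;
    }

    Map<String, AtomicLong> counts() {
        return topFrameCounts;
    }
}
```

To use it, run it on a daemon thread (`new Thread(new StackSampler(10))`, `setDaemon(true)`, `start()`), then call `stop()` and read `counts()` when the application finishes. The application's own code is never modified, which is why this approach stays cheap.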
Also, jvmpi distorts the virtual machine by creating internal overhead; jvmti does so much less. Ultimately jvmpi is going away, replaced by jvmti.
-kto