> What are typically the speed gains that are possible with dynarec?
If you read the literature, the highest gains tend to be around 20% to 30% when performing dynamic optimization on native code. Performance gains become more uncertain when the input is non-native binary code. In this case, it's going to be highly dependent on the difference between the destination architecture and the source architecture. If the destination processor cannot natively execute code written for the source processor, and you do not translate to native code, you have no choice but to use an interpreter loop. Even when emulating complex architectures, an interpreter will often spend a good 5 to 10% of its time just performing indirect calls to instruction handlers and performing bookkeeping. If you execute optimized IL blocks, this figure may increase to 20% or more.
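To make the dispatch overhead concrete, here is a minimal sketch of an interpreter loop in Java. The three-opcode machine, the instruction encoding and every name in it are made up for illustration; the point is only that each guest instruction costs a fetch, a decode and an indirect call before any real work is done.

```java
class TinyInterpreter {
    // Hypothetical opcodes; each instruction is encoded as (opcode << 8) | operand.
    static final int OP_LOAD = 0, OP_ADD = 1, OP_HALT = 2;

    // One handler per opcode, reached through an indirect (interface) call.
    interface Handler { void execute(TinyInterpreter cpu, int operand); }

    static final Handler[] HANDLERS = {
        (cpu, v) -> cpu.acc = v,        // OP_LOAD
        (cpu, v) -> cpu.acc += v,       // OP_ADD
        (cpu, v) -> cpu.halted = true,  // OP_HALT
    };

    int acc;
    boolean halted;
    long dispatches;  // counts how often the dispatch overhead is paid

    static int encode(int opcode, int operand) {
        return (opcode << 8) | (operand & 0xFF);
    }

    int run(int[] program) {
        int pc = 0;
        acc = 0;
        halted = false;
        while (!halted) {
            int insn = program[pc++];                 // fetch
            int opcode = insn >>> 8;                  // decode
            int operand = insn & 0xFF;
            dispatches++;                             // bookkeeping
            HANDLERS[opcode].execute(this, operand);  // indirect call
        }
        return acc;
    }
}
```

Running a program of LOAD 5, ADD 7, HALT through `run` pays the fetch/decode/dispatch cost three times. A translator pays it once per block at translation time instead.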
System emulators tend to see very large gains from dynamic translation, due both to peephole optimization of instruction blocks and to executing whole blocks at a time, which decreases the overhead of interrupt processing and multi-processor synchronization.
> I'm thinking about doing dead flag removal as well, but I have the feeling it
> might not make much of a difference when doing it in java since these kinds of
> things are typically those things that HotSpot should be very good at.
That depends. I'm not sure in which phase dead code elimination is performed in the Java environment when a JIT is used. You are using EBCL to emit Java bytecode directly. Unless HotSpot is known to do dead code elimination, I would expect the Java compiler to do it. If that is the case, then you wouldn't get the benefit of dead code elimination when using EBCL unless it has a similar feature.
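If you would rather not depend on what the JIT does, dead flag removal can be done in the translator itself, before any bytecode is emitted. Below is a rough sketch of a backward liveness pass over one block. The two-flag model, the `Insn` class and all names are assumptions for illustration, not taken from your emulator or from any library.

```java
import java.util.List;

class DeadFlagSketch {
    // Hypothetical flag bits for a made-up source CPU.
    static final int Z = 1, C = 2;

    static final class Insn {
        final String name;
        final int reads;   // flags this instruction consumes
        final int writes;  // flags this instruction produces
        int needed;        // subset of 'writes' that something later reads

        Insn(String name, int reads, int writes) {
            this.name = name;
            this.reads = reads;
            this.writes = writes;
        }
    }

    // Walk the block backwards, tracking which flags are still live.
    // liveOut is the set of flags assumed live at block exit; pass all
    // flags to stay conservative when the successor block is unknown.
    static void markNeededFlags(List<Insn> block, int liveOut) {
        int live = liveOut;
        for (int i = block.size() - 1; i >= 0; i--) {
            Insn insn = block.get(i);
            insn.needed = insn.writes & live;             // only these need code
            live = (live & ~insn.writes) | insn.reads;    // kill writes, add reads
        }
    }
}
```

For a block of ADD, SUB, JZ where ADD and SUB both write Z and C and JZ reads Z, the pass leaves `needed == 0` on ADD, so the translator can skip emitting its flag computation entirely.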
> Does anybody have experience with this, and especially using java? Is there some
> good documentation somewhere about dynarec techniques that I should be aware
> about? Thoughts, suggestions?
The best sources on the subject tend to be academic research papers published through ACM and IEEE. As Bart mentioned, there is also the dynarec mailing list.
Dynamic recompilers are really not all that complex, and when used in emulators the process stays mostly the same. Basic blocks or "super" blocks of instructions are converted to an intermediate representation. Peephole optimizations are performed on the IR nodes, including constant folding, constant propagation, inlining and dead code elimination. The compiled block is cached and is then executed whenever it is encountered again. Interrupts are always processed between block executions. That's pretty much it. Everything else tends to be influenced by the particular system being emulated and the types of programs being run. As an example, self-modifying code in a large address space will require some sort of paging scheme to do reasonably fine-grained code invalidation.
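The loop described above can be sketched in Java roughly as follows. This is only an illustration under heavy assumptions: the guest machine has four made-up opcodes stored as (opcode, operand) word pairs, a "compiled" block is a Java lambda rather than emitted bytecode, the only optimization is folding a run of LOADI/ADDI into one operation, and the page size is arbitrary. None of the names come from a real library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BlockCacheSketch {
    static final int OP_LOADI = 0, OP_ADDI = 1, OP_JMP = 2, OP_HALT = 3;
    static final int PAGE_SHIFT = 4;  // hypothetical 16-word pages

    // A translated block; returns the next guest pc, or -1 to stop.
    interface CompiledBlock { int run(BlockCacheSketch cpu); }

    final int[] memory;
    final Map<Integer, CompiledBlock> cache = new HashMap<>();
    final Map<Integer, List<Integer>> blocksByPage = new HashMap<>();
    int acc, translations, pendingInterrupts, interruptsServiced;

    BlockCacheSketch(int[] memory) { this.memory = memory; }

    // Translate one basic block, folding LOADI/ADDI runs into a constant.
    CompiledBlock translate(int startPc) {
        translations++;
        int pc = startPc, k = 0;
        boolean setsAcc = false;
        while (true) {
            int op = memory[pc], arg = memory[pc + 1];
            pc += 2;
            if (op == OP_LOADI) { setsAcc = true; k = arg; }  // earlier adds are dead
            else if (op == OP_ADDI) { k += arg; }             // constant folding
            else break;                                       // JMP or HALT ends the block
        }
        final int next = (memory[pc - 2] == OP_JMP) ? memory[pc - 1] : -1;
        final boolean set = setsAcc;
        final int c = k;
        // Remember which pages the block's source words live on.
        for (int p = startPc >> PAGE_SHIFT; p <= (pc - 1) >> PAGE_SHIFT; p++) {
            blocksByPage.computeIfAbsent(p, x -> new ArrayList<>()).add(startPc);
        }
        return cpu -> { cpu.acc = set ? c : cpu.acc + c; return next; };
    }

    // Guest store: drop every cached block that overlaps the written page.
    void write(int addr, int value) {
        memory[addr] = value;
        List<Integer> stale = blocksByPage.remove(addr >> PAGE_SHIFT);
        if (stale != null) for (int start : stale) cache.remove(start);
    }

    void run(int pc) {
        while (pc >= 0) {
            // Interrupts are only looked at between blocks.
            if (pendingInterrupts > 0) { pendingInterrupts--; interruptsServiced++; }
            CompiledBlock block = cache.get(pc);
            if (block == null) { block = translate(pc); cache.put(pc, block); }
            pc = block.run(this);
        }
    }
}
```

Invalidating by page is deliberately coarse: a single store throws away every block on that page, but the bookkeeping stays cheap and blocks on other pages survive. A real emulator would pick the page size and the invalidation policy based on how the guest programs actually modify code.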