> Ah yes - we studied pipelines and stuff in Uni, so does it only affect branches?
> What about situations when a register is loaded and then used, are they never
> trouble for emulators?
> How does it manifest itself in Tekken 2?
The term "delayed" is kind of a misnomer, at least when compared to delayed branches. The operations aren't delayed in either case, but I usually associate the term with branches. The branch itself is not actually delayed; it's just that in an N-stage pipeline where the last stage is execute, the first N-1 stages are already full when the branch executes. The "delay" comes from the fact that the pipeline is not flushed. You reduce wasted pipeline activity by letting the instructions in those earlier stages complete after the branch. The savings grow with the number of pipeline stages, at the cost of making it less likely that you will be able to fill the delay slots with useful work.
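To make the branch case concrete, here is a minimal Python sketch of a toy interpreter with a single branch delay slot. It is not real MIPS: the tuple instruction format and the opcode names ("li", "jump", "halt") are made up for illustration. The point is only that the instruction after a taken branch still executes before control reaches the target.

```python
def run(program):
    """Toy machine with one branch delay slot (hypothetical instruction format)."""
    regs = {}
    pc = 0
    pending_target = None  # target of a branch whose delay slot has not run yet
    trace = []             # program counters in the order they executed
    while pc < len(program):
        op, *args = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending_target is not None:
            # This instruction is the delay slot: run it, then go to the target.
            next_pc, pending_target = pending_target, None
        if op == "li":          # li rd, imm
            regs[args[0]] = args[1]
        elif op == "jump":      # jump target (takes effect after the next instruction)
            pending_target = args[0]
        elif op == "halt":
            break
        pc = next_pc
    return regs, trace


program = [
    ("li", "r1", 1),   # 0
    ("jump", 4),       # 1
    ("li", "r2", 2),   # 2: delay slot, still executes
    ("li", "r3", 3),   # 3: skipped
    ("halt",),         # 4
]
regs, trace = run(program)
```

In this run r2 is written even though it sits after the jump, and r3 never is.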
"Load delay" is a better name than "delayed load". Because of the external memory fetch and writeback, a load cannot complete within a single pipeline execute stage. Unlike a delayed branch, the load truly isn't done after the first execute stage. The original MIPS processors performed no interlocking; they simply imposed the restriction that you can't use the load's target register in the instruction slot following the load. Later processors got rid of this restriction by letting the pipeline continue until the target register is used, at which point the pipeline stalls until the load completes.
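The difference between the two approaches can be sketched with a toy model in Python. Everything here is an assumption for illustration: the (op, dest, source) tuples, the one-instruction delay, and the single-cycle stall. It is not a cycle-accurate model of any real MIPS part, and it ignores corner cases such as the next instruction writing the same register as the load.

```python
def run(program, memory, interlock):
    """Toy load-delay model. interlock=False mimics the 'no interlock' rule,
    interlock=True mimics a pipeline that stalls on use of a pending load."""
    regs = {"r1": 0, "r2": 0, "r3": 0}
    pending = None   # (register, value) for a load whose result has not landed yet
    stalls = 0
    for op, rd, src in program:
        in_flight, pending = pending, None
        if interlock and in_flight is not None and op == "move" and src == in_flight[0]:
            # Interlocked pipeline: wait for the load, then let the instruction proceed.
            regs[in_flight[0]] = in_flight[1]
            in_flight = None
            stalls += 1
        if op == "lw":        # lw rd, address
            pending = (rd, memory[src])
        elif op == "move":    # move rd, rs
            regs[rd] = regs[src]
        if in_flight is not None:
            # The previous load lands only after this instruction read its operands.
            regs[in_flight[0]] = in_flight[1]
    if pending is not None:
        regs[pending[0]] = pending[1]
    return regs, stalls


memory = {0x10: 42}
program = [
    ("lw", "r1", 0x10),     # load r1
    ("move", "r2", "r1"),   # uses r1 in the load delay slot
    ("move", "r3", "r1"),   # uses r1 one instruction later
]
no_interlock = run(program, memory, interlock=False)
with_interlock = run(program, memory, interlock=True)
```

Without the interlock, r2 picks up the old value of r1 (0 here) while r3 gets 42. With the interlock, both get 42 at the cost of one stall.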
The Nuon behaves like the original MIPS in terms of loads and any other instruction taking two cycles. You simply can't assume that the result of the operation is complete after one cycle.
What doesn't make sense to me is that, in addition to having interrupts, the R3000 has an instruction cache. A pipeline stall or interrupt can occur between instructions, so for all intents and purposes you can't take advantage of the multi-cycle behavior, because the operation might or might not be complete when you try to read the result. This pretty much means that you can always emulate such instructions as if they only take one cycle. My guess is that Tekken 2's use truly is a bug, where a register was inadvertently used in the first cycle of a branch target and the delay slot instruction was a load targeting that register. The register was probably only meant to be read a single time, so they got lucky and the software worked as expected.
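Here is a hypothetical sequence of the kind guessed at above, as a small Python sketch. It is not Tekken 2's actual code, and the instruction tuples and register names are invented. A load sits in a branch delay slot and the first instruction at the branch target reads the loaded register. Running it with and without the load delay shows where a one-cycle emulation would diverge from the toy hardware model.

```python
def run(program, memory, load_delay):
    """Toy model with a branch delay slot and an optional one-instruction load delay."""
    regs = {"r1": 7, "r2": 0}            # r1 starts out holding an old value
    pc, branch_to, pending = 0, None, None
    while program[pc][0] != "halt":
        op, a, b = program[pc]
        next_pc = pc + 1
        if branch_to is not None:        # this instruction is the branch delay slot
            next_pc, branch_to = branch_to, None
        in_flight, pending = pending, None
        if op == "jump":                 # jump target
            branch_to = a
        elif op == "lw":                 # lw rd, address
            if load_delay:
                pending = (a, memory[b])
            else:
                regs[a] = memory[b]      # one-cycle emulation: result visible at once
        elif op == "move":               # move rd, rs
            regs[a] = regs[b]
        if in_flight is not None:
            regs[in_flight[0]] = in_flight[1]
        pc = next_pc
    if pending is not None:
        regs[pending[0]] = pending[1]
    return regs


memory = {0x10: 42}
program = [
    ("jump", 3, None),        # 0: branch
    ("lw", "r1", 0x10),       # 1: delay slot is a load into r1
    ("halt", None, None),     # 2: not reached
    ("move", "r2", "r1"),     # 3: branch target reads r1 right away
    ("halt", None, None),     # 4
]
with_delay = run(program, memory, load_delay=True)
one_cycle = run(program, memory, load_delay=False)
```

With the delay modeled, r2 ends up with the old value 7; the one-cycle version gives 42. Code that happens to rely on the old value only works when the delay is emulated.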
The load behavior is always this way. It's not limited to branches. I'm guessing that mistakes involving the load restriction happen mostly across separate subroutines, as (I'm assuming) in the Tekken 2 case, since it's pretty easy to check contiguous statements.
The Nuon assembler will actually spit out warnings for all instances of register port write-contention and for situations where the result of an operation is used when it may not be ready. That's one of the few things I like about it.
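A check of that kind is easy to sketch for contiguous statements. The following Python is only an illustration of the idea, not how the Nuon assembler does it: the (op, dest, sources) tuple format is hypothetical, it only knows about loads with a one-instruction delay, and it cannot see across branches or subroutine boundaries.

```python
def load_use_warnings(program):
    """Flag instructions that read a register loaded by the immediately
    preceding instruction (hypothetical one-slot load delay)."""
    warnings = []
    for i in range(1, len(program)):
        prev_op, prev_dest, _ = program[i - 1]
        _, _, sources = program[i]
        if prev_op == "lw" and prev_dest in sources:
            warnings.append((i, prev_dest))
    return warnings


program = [
    ("lw",  "r1", ("r4",)),        # 0: r1 = mem[r4]
    ("add", "r2", ("r1", "r3")),   # 1: reads r1 too early
    ("lw",  "r5", ("r4",)),        # 2: r5 = mem[r4]
    ("nop", None, ()),             # 3: fills the load delay slot
    ("add", "r6", ("r5", "r2")),   # 4: safe, r5 has landed
]
warnings = load_use_warnings(program)
```

Here only instruction 1 is flagged, for reading r1.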