> Maybe I don't understand you correctly, but shouldn't that be: The less
> interleaving you do, the more idle processing instructions you emulate? I mean I
> can understand that if a main cpu triggers a coprocessor to do something, the
> coproc. should leave any idle loops as fast as possible. One simple way to do
> that is to increase interleaving, right?
I suppose you could call it increasing or decreasing the amount of interleaving depending on which side you approach it from. If you have four processors, A, B, C and D, the maximum interleave would be achieved by executing a single instruction from each processor per round. If one processor is idle processing, it suffers the biggest waste of time, so I would say that decreasing the amount of interleave minimizes the time spent emulating a polling loop. It's not quite that easy, though. The load on an active processor is path dependent: there is often no way to determine the processing requirement for a given processor, and sometimes no way to tell whether a processor is idle at all. The ideal solution would be to keep processors in lock-step while both are active, and then, when one processor goes idle waiting for the other, execute instructions only on the active processor until the wait condition is satisfied.
I currently mitigate this by forcing interpretation of branches. The idle loops tend to be only a few packets long and fit within a single basic block, whereas non-idle code tends to have much larger basic blocks. Because I switch to the next processor after each basic block, more active code than idle code gets executed per round. This helps considerably, but it is still far from optimal: if the active code contains lots of branches, the behavior degenerates toward maximum interleave.
> Hmmm, so what you're saying is that to ensure proper synchronization you have to
> interleave as much as possible, right? In my own emulator, the only overhead
> this adds is more interrupt checking. Sure this adds overhead and it can
> add up, but the overhead is not really noticeable on my emu but maybe that's
> because I only emulated fairly simple machines.
I don't really need to provide maximum interleave, but there are some features that would benefit from it. The Nuon processors can use a low-latency communications bus to talk with each other, and they often poll its receive status. Other than that, there really is no hardware-based synchronization. The standard software configuration is one main processor and three slave rendering processors. In most games, load balancing between master and slaves is non-existent, so there is significant emulation overhead, particularly when the three slaves are waiting for their next rendering task.