> > The
> > indirect-pointer overhead is negligible compared to overhead associated with
> > interpretation and multi-processor synchronization.
> From what I understand, polymorphism is not just indirect calls, but it leads to
> conditional branches which can slow things down considerably.
My quote didn't mention polymorphism at all, only "OO," and I don't see any reason why polymorphism is required to be considered object-oriented. Virtual functions still do not require conditional branches that I know of, or at least they shouldn't in most cases. All the compiler needs to do is assign a fixed vtable slot that is shared between the classes containing the virtual method. When calling a virtual method, the compiler simply loads the vtable entry belonging to the object instance; the entry is a pointer to the correct function. I suppose that, combined with multiple inheritance and other peculiarities, it may be possible to run into a situation where the vtable ordering cannot be kept consistent across classes, so the method lookup requires some comparisons. The overhead I was referring to was the overhead of indirect addressing to access member variables. If you emulate an array of processors, you would probably have this overhead anyway; but if there is only one emulated processor, it's an overhead that would still exist in a non-static C++ class implementation, yet not in a C implementation.
> What slows down multiple processor synchronization so much? In my own emulator,
> running 3 CPU's sync'ed with much interleaving takes exactly 3 times as much
> time as just 1 CPU (obviously without interleaving), so I don't see any
The overhead stems from the increased latency of instruction execution on a given processor and the effect that latency has on timing events. The raw cycles spent on bookkeeping are roughly linearly proportional to the number of processors being emulated. At the other extreme, a program that has a main processor waiting for coprocessors to finish tasks, and/or relies on vsync, will suffer a non-linear slowdown for a couple of reasons. One big one is that idle coprocessors still need to be emulated; putting idle processors to sleep may not be an option if they do more than spinwait. This means that the slower your emulator is, the more idle-loop instructions you emulate, and those emulated cycles are a complete waste if the processor is just spinwaiting. Additionally, if the program is vsync-based, where rendering must occur during vblank, the wasted cycles may cause vblanks to be missed. A vicious cycle forms, and as you make the core faster you see a non-linear speed gain due to the timing dependencies.
Parallel code adds overhead by nature. On a real multi-processor machine, more function calls and instructions may be executed when a program is configured to use multiple processors versus a single processor. This can still end up being faster on a real system because the code executes in parallel. An emulator running on a single processor, however, has to emulate the instructions sequentially in an interleaved fashion. This is not parallel, and depending on the slowdown factor of the emulator, the overhead of a parallel program may be magnified to the point where it creates noticeable performance degradation.