> > I'm wondering whether (because C does quite a good job on the register copies but
> > a bad job on the flag calculation) dead flag analysis could bring C emulators
> > close to asm ones and possibly even past them!
> You could always write a better emulator in assembly which uses the same trick.
> What's nice about this approach is that you don't have to worry about
> instruction decoding anymore. This can normally take a considerable amount of
> time and memory. If you look at the fast assembly 68K emulators, they generate
> many different permutations of each instruction simply to avoid having to
> extract register fields.
> Starscream and Turbo68K had mostly the same approach and generated a lot of
> instructions. A68K is faster and I think that's mostly due to the fact that it
> generates much less code.
> I'm not sure exactly what Generator does, but you can easily generate different
> permutations for instructions based on flags and then have a fetch/decode stage
> which outputs code like this:
>     push reg1
>     push reg2
>     call _OpXXXX
>     add esp,8       ; pop both 4-byte arguments
>     push reg1
>     push reg2
>     call _OpYYYY
>     add esp,8
>     ... etc. ...
> Or, to make it truly portable, have an array of structs of a type like this:
>     struct instruction
>     {
>         void (*Handler)(struct instruction *);
>         UINT32 field1, field2, field3; /* etc. */
>     };
> And simply iterate through this. I think this is what ElSemi was talking about
> (correct me if I'm wrong.)
> It's a good approach, especially when you don't have to keep re-fetching code
> (if it's mostly in ROM, for instance, or in protected non-writeable memory
> pages) and you want to avoid having to write a full-blown dynarec.
> I don't know if it would be worth it for emulating a 68K on X86, though. Flag
> calculation is simple (LAHF, SETO AL), and decoding can be pretty optimal, as
> A68K shows.
> It's probably great for something like ARM or a low-end MIPS or PPC.
Actually, instead of predecoding to the register value, I predecode to a pointer to the register.
I have 2 cases:
For CPUs that don't have implicit side effects on register access (like the i960 and SHARC), I store the address
of the register to read/write, with a structure like this:
This way, an opcode like ADD can be written as:
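
A minimal sketch of this pointer-predecoding idea, not taken from any real emulator: the names `Inst`, `op_add`, `decode_add`, `run` and `regs`, the 16-entry register file, and the bit layout of the opcode are all assumptions made up for illustration. The point it shows is that the decoder resolves register numbers to addresses once, so the handler only dereferences pointers.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical predecoded instruction: operands are pointers into the
   register file, resolved once at decode time. */
typedef struct Inst Inst;
struct Inst {
    void (*handler)(const Inst *);  /* routine implementing the opcode */
    uint32_t *dst;                  /* destination register */
    const uint32_t *src1;           /* first source register */
    const uint32_t *src2;           /* second source register */
};

static uint32_t regs[16];           /* assumed register file size */

/* ADD needs no field extraction: it just dereferences the pointers. */
static void op_add(const Inst *i)
{
    *i->dst = *i->src1 + *i->src2;
}

/* Assumed encoding, for illustration only:
   dst in bits 8-11, src1 in bits 4-7, src2 in bits 0-3. */
static Inst decode_add(uint32_t opcode)
{
    Inst i;
    i.handler = op_add;
    i.dst  = &regs[(opcode >> 8) & 15];
    i.src1 = &regs[(opcode >> 4) & 15];
    i.src2 = &regs[opcode & 15];
    return i;
}

/* Execute a block of predecoded instructions in order. */
static void run(const Inst *code, size_t count)
{
    for (size_t n = 0; n < count; n++)
        code[n].handler(&code[n]);
}
```

With this layout, `decode_add(0x312)` would produce an instruction that adds `regs[1]` and `regs[2]` into `regs[3]`, and `run` replays the decoded block without looking at the opcode bits again.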
For CPUs with register side effects, like the MB86235 used in model2c, I store handlers that perform
the register reads. DSPs can also perform more than one op in parallel within the same instruction, so all
registers must be fetched before any write. I can't write registers directly in the handlers, because the
"next" micro-op in the same opcode might read a register that has already been written; it would then see
the changed value instead of the original one, making the execution sequential rather than parallel.
OP controls the execution order of the ops, so a typical ADDER+MUL op is
AFLAGS(dst); //This just computes simple flags like ADDERZero and ADDERSign.
Side effects must also be performed in the correct priority order. Emulating DSPs is a real pain :).
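
The fetch-everything-then-write rule can be sketched roughly as below. This is only an illustration under assumptions: `DualOp`, `exec_dual`, the `rd0`/`rd1`/`wr0`/`wr1` accessors and the four-entry register array `r` are hypothetical names, and real MB86235 register handlers would carry actual side effects that these trivial ones do not.

```c
#include <stdint.h>

/* Hypothetical accessor types: reads go through handlers because a
   register read may have side effects on this kind of CPU. */
typedef uint32_t (*ReadFn)(void);
typedef void (*WriteFn)(uint32_t value);

/* One predecoded instruction holding an adder op and a multiplier op
   that the hardware executes in parallel. */
typedef struct {
    ReadFn  add_a, add_b;   /* adder sources */
    WriteFn add_dst;        /* adder destination */
    ReadFn  mul_a, mul_b;   /* multiplier sources */
    WriteFn mul_dst;        /* multiplier destination */
} DualOp;

static uint32_t r[4];       /* assumed tiny register file */

static uint32_t rd0(void) { return r[0]; }
static uint32_t rd1(void) { return r[1]; }
static void wr0(uint32_t v) { r[0] = v; }
static void wr1(uint32_t v) { r[1] = v; }

static void exec_dual(const DualOp *op)
{
    /* Phase 1: fetch every source before anything is written. */
    uint32_t a = op->add_a();
    uint32_t b = op->add_b();
    uint32_t c = op->mul_a();
    uint32_t d = op->mul_b();

    /* Phase 2: commit the results. Both ops saw the original values. */
    op->add_dst(a + b);
    op->mul_dst(c * d);
}
```

If the adder wrote `r[0]` before the multiplier read it, the multiplier would pick up the sum instead of the original value, which is exactly the sequential behaviour the two phases avoid.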
About the flags thing: I think I could generate flags and noflags variations of each ALU operation, using the
noflags one by default. Then, on finding a conditional instruction, I would trace back to the nearest preceding
op that affects that flag and replace its handler with the flags version. I haven't thought about it much, but
it might run into trouble with flags that are still live at the end of subroutines and such. I'll try it when
I have some free time.
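
A rough sketch of that handler-patching pass, under loud assumptions: it treats the code as one straight-line block, models a single zero flag, and uses made-up names (`Inst`, `patch_flags`, `add_flags`, `add_noflags`, `cond_nop`, `reads_flags`). It deliberately ignores branches and flags that stay live past the end of a subroutine, which is the hard part mentioned above.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct Inst Inst;
struct Inst {
    void (*handler)(const Inst *);        /* currently selected variant */
    void (*flags_handler)(const Inst *);  /* NULL if the op never sets flags */
    uint32_t *dst;
    const uint32_t *src;
    int reads_flags;                      /* nonzero for conditional insts */
};

static uint32_t regs[4];
static int flag_zero;                     /* the only flag modelled here */

/* Cheap variant: no flag computation at all. */
static void add_noflags(const Inst *i) { *i->dst += *i->src; }

/* Full variant: same result, plus the flag update. */
static void add_flags(const Inst *i)
{
    *i->dst += *i->src;
    flag_zero = (*i->dst == 0);
}

/* Placeholder for a conditional instruction (does nothing here). */
static void cond_nop(const Inst *i) { (void)i; }

/* For every flag consumer, walk backwards to the nearest flag producer
   and switch that one instruction to its flags-computing handler. */
static void patch_flags(Inst *code, size_t count)
{
    for (size_t n = 0; n < count; n++) {
        if (!code[n].reads_flags)
            continue;
        for (size_t k = n; k-- > 0; ) {
            if (code[k].flags_handler != NULL) {
                code[k].handler = code[k].flags_handler;
                break;
            }
        }
    }
}
```

For a block of two adds followed by a conditional, only the second add would be switched to `add_flags`; the first keeps the cheaper `add_noflags` handler because its flags are overwritten before anything reads them.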