Welcome to Emulationworld




Subject: Re: Ahhh! That's genius!
Posted by: Riff
Posted on: 12/15/03 00:38 AM



> > Or, to make it truly portable, have an array of structs of a type like this:
> >
> > struct instruction
> > {
> > void (*Handler)(struct instruction *);
> > UINT32 field1, field2, field3; /* etc. */
> > };
> >
> > And simply iterate through this. I think this is what ElSemi was talking about
> > (correct me if I'm wrong.)

This is more or less how instructions are cached in Nuance, except that I store indices into a handler array instead of storing direct pointers. Each instruction cache entry contains up to five intermediate instructions, which I call Nuances. NuonEm uses a similar structure, but its fields are explicitly named and referenced in a fixed execution order. Nuance reschedules the instructions within each packet and uses a generic for-loop to call the handlers. This method is well suited both to caching pre-decoded instruction packets that are to be interpreted and to serving as an intermediate code representation for compiler routines.
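A minimal C sketch of such a cache entry, for illustration only; it is not Nuance's actual code. The names (Nuance, CacheEntry, handler_table, execute_packet) and the three-register-plus-immediate operand layout are assumptions. Each entry holds up to five pre-decoded operations that name their handler by index, and a plain for-loop dispatches them in whatever order the scheduler left them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not Nuance's real data layout. */
#define MAX_NUANCES 5

typedef struct {
    uint32_t regs[32];
} CpuState;

/* One pre-decoded intermediate instruction. */
typedef struct {
    uint8_t  handler;           /* index into handler_table, not a pointer */
    uint8_t  dest, src1, src2;  /* pre-decoded register fields */
    uint32_t imm;               /* pre-decoded immediate */
} Nuance;

/* One instruction cache entry: up to MAX_NUANCES operations for one packet. */
typedef struct {
    uint32_t pc;                /* guest address the packet was decoded from */
    uint8_t  count;             /* number of valid entries in ops[] */
    Nuance   ops[MAX_NUANCES];
} CacheEntry;

typedef void (*Handler)(CpuState *, const Nuance *);

static void op_nop(CpuState *s, const Nuance *n) { (void)s; (void)n; }

static void op_add(CpuState *s, const Nuance *n) {
    s->regs[n->dest] = s->regs[n->src1] + s->regs[n->src2];
}

static void op_movi(CpuState *s, const Nuance *n) {
    s->regs[n->dest] = n->imm;
}

enum { H_NOP, H_ADD, H_MOVI };
static const Handler handler_table[] = { op_nop, op_add, op_movi };

/* Generic dispatch: ops[] is assumed to be already scheduled into a safe order. */
static void execute_packet(CpuState *s, const CacheEntry *e) {
    assert(e->count <= MAX_NUANCES);
    for (int i = 0; i < e->count; i++)
        handler_table[e->ops[i].handler](s, &e->ops[i]);
}
```

Storing an index rather than a function pointer shrinks the handler field to a single byte, and the same array of entries can be walked by an analysis or code generation pass instead of being executed.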

> > It's a good approach, especially when you don't have to keep re-fetching code
> > (if it's mostly in ROM, for instance, or in protected non-writeable memory
> > pages) and you want to avoid having to write a full-blown dynarec.

I assume that by fetching you really mean decoding. The amount of data fetched will usually be greater for the cached representation than for the original encoding. Indeed, that is the whole point: to trade size for speed.

I'd argue that it's a good solution in any situation where decoding is moderately expensive or worse, unless the host platform is short on memory. Memory is cheap and plentiful; decoding is often quite expensive. This is particularly true for variable-length instruction encodings and VLIW streams, where you don't know where an instruction packet ends until you have fully decoded it.
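To make the size-for-speed trade concrete, here is a hedged sketch of a direct-mapped decode cache keyed by guest PC. It uses a made-up variable-length encoding in which the low two bits of the first byte give the packet length; the encoding, CACHE_SIZE and all function names are assumptions. The expensive decode runs only on a miss, and invalidate() is the hook you would need if guest code can be overwritten (the ROM case quoted above is what lets you skip it).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy encoding: the low two bits of the first byte give the
   packet length (1..4 bytes), so the end of a packet is only known once
   it has been decoded. */
#define CACHE_SIZE  1024u        /* must be a power of two */
#define INVALID_TAG 0xFFFFFFFFu

typedef struct {
    uint32_t tag;      /* guest PC this slot holds, or INVALID_TAG */
    uint8_t  length;   /* packet length in bytes */
    uint8_t  opcode;
    uint32_t operand;
} DecodedPacket;

static DecodedPacket decode_cache[CACHE_SIZE];
static unsigned decode_calls;    /* how often the slow path ran */

static void cache_init(void) {
    for (uint32_t i = 0; i < CACHE_SIZE; i++)
        decode_cache[i].tag = INVALID_TAG;
    decode_calls = 0;
}

/* Slow path: stands in for an expensive decoder. */
static void decode(const uint8_t *mem, uint32_t pc, DecodedPacket *out) {
    uint8_t first = mem[pc];
    out->length  = (uint8_t)((first & 3u) + 1u);
    out->opcode  = (uint8_t)(first >> 2);
    out->operand = 0;
    for (uint32_t i = 1; i < out->length; i++)
        out->operand = (out->operand << 8) | mem[pc + i];
    out->tag = pc;
    decode_calls++;
}

/* Fast path: reuse the decoded form if this PC is already cached. */
static const DecodedPacket *fetch_decoded(const uint8_t *mem, uint32_t pc) {
    DecodedPacket *slot = &decode_cache[pc & (CACHE_SIZE - 1u)];
    if (slot->tag != pc)
        decode(mem, pc, slot);
    assert(slot->tag == pc);
    return slot;
}

/* Hypothetical hook: call with the start PC of any cached packet whose
   bytes were overwritten (self-modifying code, DMA into code space). */
static void invalidate(uint32_t pc) {
    DecodedPacket *slot = &decode_cache[pc & (CACHE_SIZE - 1u)];
    if (slot->tag == pc)
        slot->tag = INVALID_TAG;
}
```

Each slot is larger than the bytes it describes, which is exactly the trade being argued for: repeated execution of the same PC pays for decoding only once.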

Besides, the generic array structure is suitable both for interpretation and for analysis and code generation. For simple CPUs it might be a waste of memory, but it still has a good chance of being faster if a given code sequence is executed frequently.

> >
> > I don't know if it would be worth it for emulating a 68K on X86, though. Flag
> > calculation is simple (LAHF, SETO AL), and decoding can be pretty optimal, as
> > A68K shows.

The only problem is that this works only if you write the instruction handlers in assembly, and only if the host has native flags and flag-extraction instructions. The most expensive common flag to calculate is overflow, and without native flag calculation the only choice is to compute it manually. However, the overflow flag is almost never examined, so it's not needed in the majority of cases.
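For reference, computing overflow by hand for 32-bit two's-complement add and subtract takes only a few logical operations on the operands and the result. This is the standard sign-bit formulation, shown as a sketch; byte and word operand sizes would test bit 7 or bit 15 instead of bit 31.

```c
#include <stdint.h>

/* Overflow for r = a + b: both operands share a sign and the result's sign
   differs from it, i.e. r differs in sign from both a and b. */
static int add_overflow(uint32_t a, uint32_t b) {
    uint32_t r = a + b;                       /* unsigned wraparound is well defined */
    return (int)(((a ^ r) & (b ^ r)) >> 31);
}

/* Overflow for r = a - b: the operands differ in sign and the result's sign
   differs from a's. */
static int sub_overflow(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    return (int)(((a ^ b) & (a ^ r)) >> 31);
}
```

Cheap as this is, it still adds work and a dependency on the result to every arithmetic handler, which is why skipping it when nobody reads V pays off.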

For an interpreter, I would use an all-or-nothing approach: do the flag analysis, then choose between a handler that calculates all flags and one that calculates none. Compared with generating a variant for every combination of live flags, this significantly reduces the number of handler routines you have to generate.
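A hedged sketch of the all-or-nothing selection for a single ADD on a hypothetical CPU with C, V, Z and N flags. The flag bit layout, the Cpu struct and the handler names are illustrative and not taken from any real emulator. The decoder picks one of two handlers per instruction, based on whether any flag was found live after it.

```c
#include <stdint.h>

/* Hypothetical flag bits. */
enum { FLAG_C = 1, FLAG_V = 2, FLAG_Z = 4, FLAG_N = 8 };

typedef struct {
    uint32_t d[8];
    uint32_t flags;
} Cpu;

typedef void (*AddHandler)(Cpu *, int dst, int src);

/* Variant that computes every flag. */
static void add_flags(Cpu *c, int dst, int src) {
    uint32_t a = c->d[dst], b = c->d[src], r = a + b;
    uint32_t f = 0;
    if (r < a) f |= FLAG_C;                      /* unsigned carry out */
    if (((a ^ r) & (b ^ r)) >> 31) f |= FLAG_V;  /* signed overflow */
    if (r == 0) f |= FLAG_Z;
    if (r >> 31) f |= FLAG_N;
    c->d[dst] = r;
    c->flags = f;
}

/* Variant that computes none: flags are left untouched. */
static void add_noflags(Cpu *c, int dst, int src) {
    c->d[dst] += c->d[src];
}

/* Chosen once, at decode time: any live flag means all flags are computed. */
static AddHandler select_add_handler(uint32_t live_flags_after) {
    return live_flags_after ? add_flags : add_noflags;
}
```

Two variants per operation keeps the handler count linear in the number of operations, at the cost of occasionally computing flags that only partly matter.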

For a recompiler, I don't see the point in not doing dead flag elimination. You're going to emit machine code anyway, and not only will you save the time spent calculating flags, you also eliminate dependencies on the result of the operation, which may significantly reduce execution time. At the very least, you will make better use of your emulator's code cache and of the host CPU's instruction cache.

> Also DSPs can perform more than 1 op in the same inst in parallel, so all regs
> must be fetched before any write. I can't use direct register write in the
> handlers because the "next" microop in the same opcode might be reading the reg
> we have already written, and it will read the changed value, not the original
> one, thus making the execution sequential, and not parallel.

This is how the Nuon Aries3 processor works. It can execute up to seven instructions in parallel using five resource units: ALU, ECU, RCU, MUL and MEM. Instead of using pointers directly, I store pointers to the various register banks in addition to the standard register index values. Almost all of the units can operate on four-register vectors, and I didn't want a separate operand fetch stage, so I use an all-or-nothing approach. In almost all cases the instructions can be reordered to eliminate operand dependencies: I do full-blown dependency tracking in the decode stage and choose an optimal ordering in the scheduling phase. If there is a dependency that reordering can't remove, I copy *all* possible input registers to temporaries (a hefty penalty), but parallel instructions rarely contain operand dependencies, so usually the instructions can be arranged to eliminate them completely. I'm not so sure now that this is a big win over having a separate operand fetch stage, but it's ultimately desirable for the compiling phases, as it reduces code size as well as register spilling, or at least eliminates unnecessary register saves.
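A sketch of how the reordering decision could be made, under stated assumptions: each slot in a packet is reduced to hypothetical read/write register bitmasks, at most one slot writes any given register, and parallel semantics require every slot to see the pre-packet register values. Slot i must then execute before slot j whenever j writes something i reads. A simple greedy topological ordering either finds a safe sequential order or reports a cycle, which is the case where copying inputs to temporaries becomes necessary. This is an illustration, not Nuance's scheduler.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SLOTS 7

/* Hypothetical per-slot summary: bitmasks over up to 32 registers. */
typedef struct {
    uint32_t reads;
    uint32_t writes;
} SlotDeps;

/* Fills order[] with a sequential order that preserves parallel semantics and
   returns 1, or returns 0 if the constraints form a cycle (e.g. a register
   swap), in which case the caller must fall back to copying inputs. */
static int schedule_packet(const SlotDeps *s, int n, int *order) {
    assert(n <= MAX_SLOTS);
    int placed[MAX_SLOTS] = {0};
    for (int k = 0; k < n; k++) {
        int pick = -1;
        for (int j = 0; j < n && pick < 0; j++) {
            if (placed[j]) continue;
            /* j may run next only if no other unplaced slot still reads what j writes */
            int blocked = 0;
            for (int i = 0; i < n; i++)
                if (i != j && !placed[i] && (s[j].writes & s[i].reads))
                    blocked = 1;
            if (!blocked) pick = j;
        }
        if (pick < 0) return 0;  /* every remaining slot is blocked: cycle */
        placed[pick] = 1;
        order[k] = pick;
    }
    return 1;
}
```

A register swap (r1 = r2 and r2 = r1 in the same packet) is the classic case with no valid order, while a slot that reads and writes the same register (r1 = r1 + r2) never blocks itself.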

>
> about the flags thing, I think I could generate the flags and noflags variation
> of each alu operation, using the noflags by default and then when finding a
> conditional inst, trace back to the first op instruction that affects that flag
> and replacing the handler by the flags version. I haven't thought on it too much,
> but it might have some trouble with flags at the end of subs and such. I'll try
> when I have some free time.

That doesn't quite work, because non-conditional instructions might also use flags as inputs (add-with-carry instructions such as the 68000's ADDX, for example). You have the right idea in that you can force synchronization at branches and work backwards, but you have to do a complete dependency analysis, keeping track of which registers/flags are outputs of instruction N and which registers/flags are input dependencies of instructions N+1, ..., N+M. If the mask at instruction N shows that a given output flag is a future input dependency, then that instruction needs to update the flag. The simplest synchronization method is to force dependencies on all flags at branch points (as well as at instructions that modify all flags, or where you don't know whether they were modified), but the best method will depend on what kind of program flow you allow. If you wanted an extremely optimal reduction, you would have to follow both paths at each branch to determine the smallest set of flags that must be synchronized.

A backwards one-pass method works for both basic blocks and superblocks in which a taken branch forces a return to the dispatch loop. If you are going to allow backward branches within your emitted code, it will work if you force flag updates at branch points or if you follow both paths to make sure all flags are synchronized. I imagine it would work for forward branches in a similar manner.
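Here is a hedged sketch of that backwards one-pass analysis in C. The flag bits are modelled loosely on the 68000 (C, V, Z, N, X), the FlagInfo layout is hypothetical, and it takes the simple route of treating all flags as live at every branch and at the block exit rather than following both paths. The uses mask covers non-conditional consumers such as ADDX, not just conditional branches.

```c
#include <stdint.h>

/* Hypothetical flag bits, loosely modelled on the 68000. */
enum { F_C = 1, F_V = 2, F_Z = 4, F_N = 8, F_X = 16, F_ALL = 31 };

typedef struct {
    uint8_t uses;        /* flags read (conditional branches, ADDX-style ops) */
    uint8_t defs;        /* flags written */
    uint8_t is_branch;   /* forces synchronisation of all flags */
    uint8_t must_update; /* output: flags this instruction has to compute */
} FlagInfo;

/* One backwards pass. 'live' holds the flags that some later instruction
   (or the code after the block) may read. */
static void analyse_flags(FlagInfo *ins, int n) {
    uint8_t live = F_ALL;                  /* conservative at block exit */
    for (int i = n - 1; i >= 0; i--) {
        if (ins[i].is_branch)
            live = F_ALL;                  /* simplest sync: all flags live */
        ins[i].must_update = (uint8_t)(ins[i].defs & live);
        /* flags written here are dead above it, unless this op also reads them */
        live = (uint8_t)((live & ~ins[i].defs) | ins[i].uses);
    }
}
```

For a block of ADD, ADD, ADDX, CMP, BEQ, the pass leaves the first ADD computing no flags, the second ADD and the ADDX computing only X (ADDX reads it, and CMP does not overwrite it before the block exit), and CMP computing all four condition codes.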
