Was thinking last night about Threaded Interpreters and thought of this... What about a system whereby the execute loop for a basic block (on ARM) is like this?:
r0 = base for the CPU context

    ldmia r4,{r1,r2,r3,pc}
    ldmia r4,{r1,r2,r3,pc}
    ldmia r4,{r1,r2,r3,pc}
    ldmia r4,{r1,r2,r3,pc}
    ...
That way, because r0-r3 are simply the parameters of the function, and lr is return pc, you could write the handler in C! e.g.:
    void OpcodeAdd(unsigned char *cpuBase, int destReg, int srcReg, int thirdReg)
    {
        cpuBase[destReg] = cpuBase[srcReg] + cpuBase[thirdReg];
    }
That way, on a non-ARM system, you could just have a slightly slower but portable execute loop, e.g.:
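    #ifdef ARM
        asm volatile (
            "ldmia r4,{r1,r2,r3,pc}\n\t"
            "ldmia r4,{r1,r2,r3,pc}\n\t"
            "ldmia r4,{r1,r2,r3,pc}\n\t"
            "ldmia r4,{r1,r2,r3,pc}\n\t"
        );
    #else
        for (;;) {
            /* Each record is four words: three operands, then the handler. */
            void (*func)(unsigned char *, int, int, int);
            func = (void (*)(unsigned char *, int, int, int))memory[i+3];
            func(cpuBase, memory[i], memory[i+1], memory[i+2]);
            i += 4;   /* step to the next record */
        }
    #endif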
So then your intermediate recompile of the basic blocks is like this:
    0x00: Dest register
    0x04: Source register
    0x08: Third register
    0x0c: Opcode handler (version with flags or without flags, depending on basic block analysis)
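To make the record layout above a bit more concrete, here is a rough sketch of the portable side only, in plain C. Everything named here (Cpu, Op, op_add, op_add_flags, make_add, run_block, demo_run) is made up for illustration, and so are the 16-register context with a single zero flag and the NULL handler used to end a block. None of that comes from the idea above; it is just one way the pieces might fit together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU context: 16 registers and one zero flag (assumption). */
typedef struct Cpu {
    uint32_t reg[16];
    int      zero_flag;
} Cpu;

typedef void (*Handler)(Cpu *cpu, int dest, int src, int third);

/* One record, fields in the same order as the layout above. With 32-bit
   ints and pointers the offsets come out as 0x00, 0x04, 0x08, 0x0c. */
typedef struct Op {
    int     dest;
    int     src;
    int     third;
    Handler handler;
} Op;

/* Add without touching the flags. */
static void op_add(Cpu *cpu, int dest, int src, int third)
{
    cpu->reg[dest] = cpu->reg[src] + cpu->reg[third];
}

/* Add that also updates the (hypothetical) zero flag. */
static void op_add_flags(Cpu *cpu, int dest, int src, int third)
{
    uint32_t result = cpu->reg[src] + cpu->reg[third];
    cpu->reg[dest] = result;
    cpu->zero_flag = (result == 0);
}

/* Stand-in for the recompile step: pick the handler variant depending on
   whether block analysis says the flags are needed later. */
static Op make_add(int dest, int src, int third, int flags_needed)
{
    Op op;
    op.dest    = dest;
    op.src     = src;
    op.third   = third;
    op.handler = flags_needed ? op_add_flags : op_add;
    return op;
}

/* Portable execute loop. A NULL handler ends the block (assumption). */
static void run_block(Cpu *cpu, const Op *op)
{
    assert(cpu != NULL && op != NULL);
    for (; op->handler != NULL; ++op)
        op->handler(cpu, op->dest, op->src, op->third);
}

/* Tiny demo: r0 = r1 + r2, then r3 = r0 + r0 with flags. Returns r3. */
uint32_t demo_run(void)
{
    Cpu cpu = { {0}, 0 };
    Op  block[3];

    cpu.reg[1] = 40;
    cpu.reg[2] = 2;

    block[0] = make_add(0, 1, 2, 0);   /* flags not needed */
    block[1] = make_add(3, 0, 0, 1);   /* flags needed */
    block[2] = make_add(0, 0, 0, 0);
    block[2].handler = NULL;           /* end-of-block marker */

    run_block(&cpu, block);
    return cpu.reg[3];
}
```

Calling demo_run() from a main() runs the two-record block through the loop. The ARM fast path would need the same records laid out as four 32-bit words each, which is why the handler pointer sits last in the struct.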
Superquick ARM threaded interpreter, yet portable as well - would this work do you think? Have I missed anything?
Update - just realised r0 is also return value, which means it probably makes sense to have "ldmia r4,{r0,r1,r2,pc}" and then the base pointer in r3 instead so it doesn't get squashed. Argggh - although r3 may get squashed anyway since it's a temporary.... hmmm - can this be made to work anyway?
You learn something old everyday...