Welcome to Emulationworld




Subject: Playstation Emu
Posted by: finaldave
Posted on: 12/16/03 09:16 AM



I was just having a quick look at the source code to FPSE and was shocked that the cpu emulator looks like it's actually very very compact (cpu2.cpp in the source, 500 lines) and just seems to have stuff like

case ADDU: rd = rs + rt; break;
case SUBU: rd = rs - rt; break;
case BNE: if (rt!=rs) { JUMP(PC + immS*4); } break;

Is this because MIPS is a very reduced instruction set (i.e. the decode is similar in each opcode)?
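
A minimal sketch of why the decode looks so uniform: every MIPS I instruction is a single 32-bit word with its register fields at fixed bit positions, so one set of shift-and-mask helpers serves every opcode. The `Cpu` struct and the function names here are hypothetical and are not taken from FPSE; only the field layout and the ADDU/SUBU/AND function codes come from the MIPS instruction set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU state for illustration; not FPSE's. */
typedef struct {
    uint32_t r[32];   /* general registers; r[0] always reads as zero */
    uint32_t pc;
} Cpu;

/* Fixed field positions shared by all instructions. */
static uint32_t op(uint32_t i)    { return i >> 26; }
static uint32_t rs(uint32_t i)    { return (i >> 21) & 31; }
static uint32_t rt(uint32_t i)    { return (i >> 16) & 31; }
static uint32_t rd(uint32_t i)    { return (i >> 11) & 31; }
static uint32_t funct(uint32_t i) { return i & 63; }

/* Execute one register-register instruction (major opcode 0).
   Returns 1 if handled, 0 if the function code is not in this subset. */
static int exec_special(Cpu *c, uint32_t i)
{
    assert(op(i) == 0);
    uint32_t a = c->r[rs(i)], b = c->r[rt(i)], out;
    switch (funct(i)) {
    case 0x21: out = a + b; break;   /* ADDU */
    case 0x23: out = a - b; break;   /* SUBU */
    case 0x24: out = a & b; break;   /* AND  */
    default:   return 0;
    }
    if (rd(i) != 0) c->r[rd(i)] = out;   /* writes to r0 are discarded */
    c->pc += 4;
    return 1;
}
```

With r1 = 5 and r2 = 7, the word 0x00221821 (ADDU r3, r1, r2) leaves 12 in r3. Branches and loads are deliberately left out of this sketch.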

Is emulating a MIPS 3000A in interpreted mode as simple as this source seems to suggest? (well... simple compared to say 68000 or 65816 at least!)
Or am I missing something in the source?


You learn something old everyday...



Subject: Re: Playstation Emu
Posted by: Bart T.
Posted on: 12/16/03 01:02 PM



> Is emulating a MIPS 3000A in interpreted mode as simple as this source seems to
> suggest? (well... simple compared to say 68000 or 65816 at least!)
> Or am I missing something in the source?

Take a look at the MIPS instruction set. It's a very simple processor.

No flags certainly makes things much easier.


----
Bart


Subject: Re: Playstation Emu
Posted by: galibert
Posted on: 12/16/03 09:37 PM



> Is emulating a MIPS 3000A in interpreted mode as simple as this source seems to
> suggest?

Yes, it is. Why do you think there are that many r3k recompilers and not so many for other processors? :-)

OG.





Subject: Re: Playstation Emu
Posted by: finaldave
Posted on: 12/17/03 05:05 AM



> > Is emulating a MIPS 3000A in interpreted mode as simple as this source seems to
> > suggest?
>
> Yes, it is. Why do you think there are that many r3k recompilers and not so
> many for other processors? :-)
>
> OG.

Actually I think I did miss the dynarec in FPSE when I looked (I *think* it has a dynarec).

Not having flags sounds great! It sounds like the perfect cpu to attempt your first dynarec for, since you'd only have to do dead register analysis rather than analysis on each flag.
Plus the end result would be loads of fun (getting PSX games to run faster). It's just a shame that ISOs are so big and tricky to fit on Pocket PCs and stuff. I have a load of old PSX CDs somewhere, I'll have to try and dig them out to make MiniPSX rips to play around with.
Remember the D1 Demo disk with Wipeout, Destruction Derby, and the Dinosaur/Manta ray demos? Great days!



You learn something old everyday...



Subject: Re: Playstation Emu
Posted by: smf
Posted on: 12/17/03 06:21 AM



The only complicated bit about the psx cpu is the "tekken2 bug", which requires you to emulate the instruction pipeline.

Not only do you need to delay branches (i.e. always execute the next instruction after a branch), you also have to delay loads that occur in that instruction. It only matters if the instruction at the branch destination uses that register. pcsx does a complicated analysis and generates code depending on that. For the MAME interpreter I just incorporated it into the branch delay code. I don't think any of the other open source emulators ever had it working properly.
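
A minimal sketch of the interaction described above, assuming a pre-decoded instruction format and a four-instruction subset. It is not the MAME or pcsx code: it delays every load by one instruction rather than special-casing the branch delay slot, and all names (`Insn`, `Cpu`, `step`, `ld_reg`) are hypothetical. The rule that a newer register write beats an older in-flight load is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

enum Kind { NOP, ADDU, LW, JR };                 /* tiny illustrative subset */
typedef struct { enum Kind kind; int d, s, t; } Insn;   /* pre-decoded form */

typedef struct {
    uint32_t r[32];
    uint32_t pc, next_pc;   /* next_pc is what gives branches their delay slot */
    int      ld_reg;        /* target of the load still in flight, 0 = none */
    uint32_t ld_val;
} Cpu;

static void step(Cpu *c, const Insn *code, const uint32_t *mem)
{
    assert(c->pc % 4 == 0);
    Insn i = code[c->pc / 4];

    /* The load issued by the previous instruction lands after this one runs. */
    int      done_reg = c->ld_reg;
    uint32_t done_val = c->ld_val;
    c->ld_reg = 0;

    uint32_t target = c->next_pc + 4;            /* default: keep going straight */
    switch (i.kind) {
    case ADDU:
        if (i.d) c->r[i.d] = c->r[i.s] + c->r[i.t];
        if (i.d == done_reg) done_reg = 0;       /* assumed: newer write wins */
        break;
    case LW:                                     /* LW d, 0(s) */
        c->ld_reg = i.d;
        c->ld_val = mem[c->r[i.s] / 4];
        break;
    case JR:
        target = c->r[i.s];
        break;
    case NOP:
        break;
    }

    if (done_reg) c->r[done_reg] = done_val;     /* commit the delayed load */
    c->pc = c->next_pc;                          /* the delay slot runs next */
    c->next_pc = target;
}
```

Running `JR r31` with `LW r2, 0(r4)` in its delay slot, the first instruction at the jump target still reads the old r2 and the second one reads the loaded value, which is the shape of the problem described in the post.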

smf





Subject: Re: Playstation Emu
Posted by: finaldave
Posted on: 12/17/03 06:21 PM



> The only complicated bit about the psx cpu is the "tekken2 bug", which requires
> you to emulate the instruction pipeline.
>
> Not only do you need to delay branches ( i.e. always execute the next
> instruction after a branch ) you also have to delay loads that occur in that
> instruction. It only matters if the instruction at the branch destination uses
> that register, pcsx does a complicated analysis and generates code depending on
> that. For the MAME interpreter I just incorporated it into the branch delay
> code. I don't think any of the other open source emulators ever had it working
> properly.
>
> smf
>

Ah yes - we studied pipelines and stuff in Uni, so does it only affect branches then?

What about situations when a register is loaded and then used, are they never trouble for emulators?

How does it manifest itself in Tekken 2?



You learn something old everyday...



Subject: this way last opcode costs more than the first
Posted by: Terry Bogard
Posted on: 12/17/03 07:40 PM



> I was just having a quick look at the source code to FPSE and was shocked that
> the cpu emulator looks like it's actually very very compact (cpu2.cpp in the
> source, 500 lines) and just seems to have stuff like
>
> case ADDU: rd = rs + rt; break;
> case SUBU: rd = rs - rt; break;
> case BNE: if (rt!=rs) { JUMP(PC + immS*4); } break;
>
> Is this because MIPS is a very reduced instruction set (i.e. the decode is
> similar in each opcode)?

Maybe the overhead isn't much with a fast CPU and a reduced instruction set, but when I tried to write a Z80 emu, I chose to initialize an array of function pointers *ONCE*, and then decode opcodes this way:

opcode = readOpcodeFromMemory(PC);

(functionArray[opcode]) (parameters);

Downsides were the init time, which was kinda long, and the fact that all functions implementing opcodes shared the same signature, which sometimes meant passing parameters the handler didn't use. But no time was wasted "seeking" the correct opcode implementation.

Was it a very bad idea?
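
For reference, a minimal sketch of the table dispatch described in this post, assuming a heavily cut-down Z80 state. The struct, the handler names, and the three-opcode subset are hypothetical; only the opcode values (0x00 NOP, 0x04 INC B, 0x78 LD A,B) are real Z80 encodings, and flag updates are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily cut-down Z80 state. */
typedef struct { uint8_t a, b; uint16_t pc; uint8_t mem[256]; } Z80;

/* Every handler shares one signature, as in the post. */
typedef void (*Handler)(Z80 *);

static void op_nop(Z80 *z)    { (void)z; }
static void op_inc_b(Z80 *z)  { z->b++; }        /* 0x04: INC B, flags ignored */
static void op_ld_a_b(Z80 *z) { z->a = z->b; }   /* 0x78: LD A,B */
static void op_unimpl(Z80 *z) { (void)z; }       /* placeholder for the rest */

static Handler table[256];

/* Fill the table once, before the first instruction runs. */
static void init_table(void)
{
    for (int i = 0; i < 256; i++) table[i] = op_unimpl;
    table[0x00] = op_nop;
    table[0x04] = op_inc_b;
    table[0x78] = op_ld_a_b;
}

/* One fetch and one indirect call per instruction; no searching. */
static void step(Z80 *z)
{
    uint8_t opcode = z->mem[z->pc & 0xFF];   /* toy 256-byte address space */
    z->pc++;
    assert(table[opcode] != 0);              /* catches a missing init_table() */
    table[opcode](z);
}
```

Filling a 256-entry table like this is a few hundred stores, so a long init time would more likely come from other setup work than from the table itself.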




Subject: Re: Playstation Emu
Posted by: Riff
Posted on: 12/17/03 07:57 PM



> Ah yes - we studied pipelines and stuff in Uni, so does it only affect branches
> then?
>
> What about situations when a register is loaded and then used, are they never
> trouble for emulators?
>
> How does it manifest itself in Tekken 2?

The use of the term "delayed" is kind of a misnomer, at least when compared to delayed branches. The operations aren't delayed in either case, but I usually associate the term delayed with branches. The branch is not actually delayed, it's just that in an N stage pipeline where the last stage is execute, the first (N-1) stages will be full after the branch is executed. The "delay" is created by the fact that the pipeline is not flushed. You reduce wasted pipeline activity by allowing the previous stages to complete after the branch. This reduction in waste increases proportionally to the number of pipeline stages at the cost of making it less likely that you will be able to fill the delay slots.

Load delays is a better name for delayed loads. Because of the external memory fetch and writeback, loads cannot complete in a single pipeline execute stage. Unlike in the case of delayed branches, the loads truly aren't done after the first execute stage. The original MIPS processors performed no interlocking with the simple restriction that you can't use the target load register in the instruction slot following the load. The later processors get rid of this restriction by allowing the pipeline to continue until the target register is used at which point the pipeline is stalled until the load completes.

The Nuon behaves like the original MIPS in terms of loads and any other instruction taking two cycles. You simply can't assume that the result of the operation is complete after one cycle.

What doesn't make sense to me is that in addition to having interrupts, the R3000 has an instruction cache. A pipeline stall or interrupt can occur between instructions so for all intents and purposes you can't take advantage of the multi-cycle behavior because the operation might or might not be complete when you try to read the result. This pretty much means that you can always emulate such instructions as if they only take one cycle. My guess is that Tekken2's use truly is a bug where a register was inadvertently used in the first cycle of a branch target where the delay slot instruction was a load targeting the register. The register was probably only meant to be read a single time so they got lucky and the software worked as expected.

The load behavior is always this way. It's not limited to branches. I'm guessing that mistakes involving the load restriction probably happen mostly in separate subroutines, as (I'm assuming) in the tekken2 case, since it's pretty easy to check contiguous statements.

The Nuon assembler will actually spit out warnings for all instances of register port write-contention and situations where the result of an operation is used when it may not be ready. That's one of the few things I like about it.


Subject: Re: this way last opcode costs more than the first
Posted by: Bart T.
Posted on: 12/17/03 09:46 PM




> opcode = readOpcodeFromMemory(PC);
>
> (functionArray[opcode]) (parameters);
>
> Downsides were the init time, which was kinda long, and the fact that all
> functions implementing opcodes shared the same signature, which was a useless
> parameter passing, sometimes. But no time wasted to "seek" the correct opcode
> implementation.

Switch statements can often be converted to jump tables by the compiler, which produces slightly better code than indirect function calls.


----
Bart


Subject: Re: Playstation Emu
Posted by: finaldave
Posted on: 12/18/03 04:59 AM



> > Is emulating a MIPS 3000A in interpreted mode as simple as this source seems to
> > suggest?
>
> Yes, it is. Why do you think there are that many r3k recompilers and not so
> many for other processors? :-)
>
> OG.
>

I suppose, also, if there is only one type of branch (did I read that correctly?), there is a very easy way to find all code segments. If there is only a branch on equal or branch on not equal, then there can be no dynamic jumps like in 68000.

That means that at load time, you could scan all binary paths and recompile the entire program.

ELF Start address
        |
        /\
       x  x
      /\  /\
      ...

Is this possible, or did I miss another jump opcode in the MIPS instruction set?


Ahhhh - is that why N64 emulators are able to recompile at load time as well, and do all the searches for sqrt and stuff like that?


Heh, I feel like I'm gradually working through the history of emulation and seeing how it's done... been through the Massage, Genecyst and Callus stages, i.e. hardware from the 80s and early 90's, and now slowly moving on to 1994,5 - PSX and N64, or rather 1998, the PSEmu Pro era.

*Update* - Oh bollocks:
jr Rsrc
Jump Register
Unconditionally jump to the instruction whose address is in register Rsrc.

Damn and blarst
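
A minimal sketch of the load-time scan idea and of where `jr` breaks it. The walk follows `beq`/`bne` and `j` targets, which are encoded in the instruction, and can only count each `jr` as a dead end, since its target lives in a register. The function and array names are hypothetical, the toy program is limited to 16 words, and other control-flow instructions (`jal`, `jalr`, the `bltz` family) are left out. A path that branches straight into an already-marked delay slot is not re-walked, which is acceptable for a sketch only.

```c
#include <assert.h>
#include <stdint.h>

#define NWORDS 16                 /* toy program size in 32-bit words */

static uint8_t is_code[NWORDS];   /* 1 = reached by the static walk */
static int     unresolved;        /* jr instructions the walk could not follow */

static void walk(const uint32_t *prog, uint32_t pc)
{
    int last = 0;                 /* set while sitting in the slot after j/jr */
    assert(pc % 4 == 0);
    while (pc / 4 < NWORDS && !is_code[pc / 4]) {
        uint32_t i = prog[pc / 4], op = i >> 26;
        is_code[pc / 4] = 1;
        if (last) return;         /* delay slot done, this path is finished */
        if (op == 4 || op == 5) {                       /* beq / bne */
            int32_t off = (int16_t)(i & 0xFFFF);
            walk(prog, pc + 4 + (uint32_t)(off * 4));   /* taken path */
            /* not-taken path continues below, starting with the delay slot */
        } else if (op == 2) {                           /* j target */
            walk(prog, ((pc + 4) & 0xF0000000u) | ((i & 0x03FFFFFFu) << 2));
            last = 1;
        } else if (op == 0 && (i & 63) == 8) {          /* jr rs */
            unresolved++;         /* target only known at run time */
            last = 1;
        }
        pc += 4;
    }
}
```

Any nonzero `unresolved` count means the scan may have missed code, so a recompiler built this way would still need a run-time fallback for register jumps.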


You learn something old everyday...



Subject: Re: Playstation Emu
Posted by: smf
Posted on: 12/18/03 05:24 AM



> The use of the term "delayed" is kind of a misnomer, at least when compared to
> delayed branches.

Not really, any changes to the program counter are delayed for one instruction.

> My guess is that Tekken2's
> use truly is a bug where a register was inadvertently used in the first cycle

I believe it was a missing nop in the jr ra of the called function, so it was executing the first instruction of the next function ( IIRC it was a mfc2 ). The instruction at ra then uses the register & needs it to be the old value, the pipeline should stall but it doesn't. In theory you should count the stalled cycles, but then you should also count the cache misses too. AFAIK the real r3000 doesn't have a stalling circuit, but the derivative the PSX uses does.

I don't think the pipelines can be advanced when a cache misses or you'd end up with pretty random behaviour with regards to whether instructions in the delay slot would be executed or not.

smf



Subject: Re: Playstation Emu
Posted by: ElSemi
Posted on: 12/18/03 01:24 PM



Even more complicated, SHARC DSPs are 2-delayed: the pipeline fetch pointer is 2 instructions ahead of the one currently executing. That makes the loop behaviour a real bitch, because the loop condition is checked 2 instructions before the end of the loop (the registers are fetched 2 insts ahead). So if you change the register involved in the loop condition in the last instruction of the loop, it will take another loop to exit. Weird, eh? Fortunately I haven't seen any model2 game using that, so that part of the emulation is silently ignored in my current emulator :). Still, all branches have to be 2 inst delayed, and the register dependencies between them caused me some problems.


> > The use of the term "delayed" is kind of a misnomer, at least when compared to
> > delayed branches.
>
> Not really, any changes to the program counter are delayed for one instruction.
>
> > My guess is that Tekken2's
> > use truly is a bug where a register was inadvertently used in the first cycle
>
> I believe it was a missing nop in the jr ra of the called function, so it was
> executing the first instruction of the next function ( IIRC it was a mfc2 ). The
> instruction at ra then uses the register & needs it to be the old value, the
> pipeline should stall but it doesn't. In theory you should count the stalled
> cycles, but then you should also count the cache misses too. AFAIK the real
> r3000 doesn't have a stalling circuit, but the derivative the PSX uses does.
>
> I don't think the pipelines can be advanced when a cache misses or you'd end up
> with pretty random behaviour with regards to whether instructions in the delay
> slot would be executed or not.
>
> smf
>





Subject: Re: Playstation Emu
Posted by: Riff
Posted on: 12/18/03 04:56 PM



> Even more complicated, SHARC DSPs are 2-delayed, the pipeline fetch pointer is 2
> instructions ahead of the one currently executing, that way the loop behaviour
> is a real bitch, because the loop condition is checked 2 instructions before the
> end of the loop (the registers are fetched 2 insts ahead) so if you change the
> register involved in the loop condition in the last instruction of the loop, it
> will take another loop to exit, weird, eh?. Fortunately I haven't seen any
> model2 game using that, so that part of the emulation is silently ignored in my
> current emulator :). Still all branches have to be 2 inst delayed, and the
> register dependencies between them caused me some problems.

It doesn't seem weird to me, as it's just the result of the instruction fetch and decode stages being before the execute stage. What I consider weird is the fact that both instructions following a delayed branch are fetched but the instruction in the second delay slot is discarded! On the Aries processors, which have three pipeline stages (and a fourth writeback stage for loads and vector multiplies), both delay slots are executed. I'm not sure why you would want to discard a second slot as it's not really that difficult to find useful work to do. I wonder if the goal was to ease porting code from MIPS to SHARC.

Delayed branches are pretty easy to deal with using a simple counter. The end of my execute loop looks like this:



pcexec = pcroute;
if(delayCount)
{
    delayCount--;
    if(!delayCount)
    {
        pcexec = pcfetchnext;
    }
}

The handlers for the program flow instructions will set delayCount to 3 or 1 and set pcfetchnext to the target address. If you compile code, you can eliminate a lot of the delay count checks by keeping track of which delay slots each instruction resides in. If an instruction is only in delay slot zero, I don't have to do the delay count check. If an instruction can be in delay slot one, I will only have to emit code to check if the delay count is non-zero, as the Aries processor inhibits the ECU unit instructions after a taken branch (to allow contiguous conditional branches). If an instruction can be in delay slot two, I'll emit code for the full check and decrement, and exit back to dispatch if the count is decremented to zero.
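
A minimal runnable sketch of the counter scheme above, under stated assumptions: addresses are plain instruction indices, `pcroute` is treated as simply the next sequential index (in the real emulator it is presumably a pipeline register), and `take_branch` stands in for whatever the program flow handlers do. The `Flow` struct and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t pcexec;        /* index of the instruction being executed */
    uint32_t pcfetchnext;   /* branch target, meaningful while delayCount != 0 */
    int      delayCount;    /* 0 = no branch pending */
} Flow;

/* Hypothetical branch handler helper: `slots` delay slots run before the
   jump takes effect, so 2 slots gives a count of 3 and 0 slots gives 1. */
static void take_branch(Flow *f, uint32_t target, int slots)
{
    assert(slots >= 0);
    f->pcfetchnext = target;
    f->delayCount  = slots + 1;
}

/* End-of-instruction bookkeeping, same shape as the loop tail above. */
static void advance(Flow *f)
{
    uint32_t pcroute = f->pcexec + 1;   /* assumption: next sequential index */
    f->pcexec = pcroute;
    if (f->delayCount) {
        f->delayCount--;
        if (!f->delayCount)
            f->pcexec = f->pcfetchnext;
    }
}
```

A branch taken at index 10 with two delay slots executes 11 and 12 before landing on the target; with zero slots the very next `advance` lands on the target.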


Subject: Re: Playstation Emu
Posted by: tratax
Posted on: 12/30/03 07:07 PM



> > The use of the term "delayed" is kind of a misnomer, at least when compared to
> > delayed branches.
>
> Not really, any changes to the program counter are delayed for one instruction.
>
> > My guess is that Tekken2's
> > use truly is a bug where a register was inadvertently used in the first cycle
>
> I believe it was a missing nop in the jr ra of the called function, so it was
> executing the first instruction of the next function ( IIRC it was a mfc2 ). The
> instruction at ra then uses the register & needs it to be the old value, the
> pipeline should stall but it doesn't. In theory you should count the stalled
> cycles, but then you should also count the cache misses too. AFAIK the real
> r3000 doesn't have a stalling circuit, but the derivative the PSX uses does.
>
> I don't think the pipelines can be advanced when a cache misses or you'd end up
> with pretty random behaviour with regards to whether instructions in the delay
> slot would be executed or not.
>
> smf
>
I think the big problem was that it is a COP2 instruction, which makes things much more interesting than any memory or register access. In this case it relies on the exact implementation of the COP2 engine.

I wonder, did Tekken2 work correctly on the PS2 in PS1 emulation mode? The R3K CPU is the same, but I thought the COP2 engine was emulated, so I wonder if they had to hack Tekken2 .. :)


Subject: Re: Playstation Emu
Posted by: tratax
Posted on: 12/30/03 07:11 PM



> > > Is emulating a MIPS 3000A in interpreted mode as simple as this source seems to
> > > suggest?
> >
> > Yes, it is. Why do you think there are that many r3k recompilers and not so
> > many for other processors? :-)
> >
> > OG.
> >
>
> I suppose, also, if there is only one type of branch (did I read that
> correctly?), there is a very easy way to find all code segments. If there is
> only a branch on equal or branch on not equal, then there can be no dynamic
> jumps like in 68000.
>
> That means that at load time, you could scan all binary paths and recompile the
> entire program.
>
> ELF Start address
> |
> /\
> x x
> /\ /\
> ...
>
> Is this possible, or did I miss another jump opcode in the MIPS instruction set?
>
>
> Ahhhh - is that why N64 emulators are able to recompile at load time as well,
> and do all the searches for sqrt and stuff like that?
>
>
> Heh, I feel like I'm gradually working through the history of emulation and
> seeing how it's done... been through the Massage, Genecyst and Callus stages,
> i.e. hardware from the 80s and early 90's, and now slowly moving on to 1994,5 -
> PSX and N64, or rather 1998, the PSEmu Pro era.
>
> *Update* - Oh bollocks:
> jr Rsrc
> Jump Register
> Unconditionally jump to the instruction whose address is in register Rsrc.
>
> Damn and blarst
>
>
> You learn something old everyday...
>

Dave,

PS and N64 development were both initially based on libraries, with no chipset documentation. Combine this with the fact that the devkit came with C examples and the GCC compiler, and it becomes pretty easy to scan for and replace library calls. Of course you have to take care about new library versions, but there is still plenty you can cheat, especially on N64 emulation.
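
A minimal sketch of what scanning for a library routine could look like: slide a masked word pattern over the loaded code and report where it matches. The masks hide the parts that vary between builds (stack offsets, call targets). The signature used in the usage note is made up for illustration and does not come from any real PS or N64 library; `SigWord` and `find_signature` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One pattern word: the code word must equal `value` under `mask`. */
typedef struct { uint32_t value, mask; } SigWord;

/* Returns the word index of the first match, or -1 if there is none. */
static int find_signature(const uint32_t *code, int ncode,
                          const SigWord *sig, int nsig)
{
    assert(nsig > 0);
    for (int at = 0; at + nsig <= ncode; at++) {
        int k = 0;
        while (k < nsig && (code[at + k] & sig[k].mask) == sig[k].value)
            k++;
        if (k == nsig)
            return at;            /* every pattern word matched */
    }
    return -1;
}
```

For example, the three-word pattern {0x27BD0000, 0xFFFF0000}, {0xAFBF0000, 0xFFFF0000}, {0x0C000000, 0xFC000000} matches any `addiu sp,sp,imm` followed by `sw ra,imm(sp)` and a `jal`. A real matcher would need much longer, per-library-version signatures to avoid false hits.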





Subject: Re: Playstation Emu
Posted by: R. Belmont
Posted on: 12/31/03 01:19 AM



> I wonder, did Tekken2 work correctly on the PS2 in PS1 emulation mode ? The R3K
> CPU is the same, but I thought the COP2 engine was emulated, so I wonder if they
> had to hack Tekken2 .. :)

I don't have Tekken2 to check, but the R3k isn't exactly the same - in PS2 mode at least it has full 4k data and instruction caches just like "real" MIPS chips :-)





Subject: Re: Playstation Emu
Posted by: tratax
Posted on: 01/01/04 06:06 PM



> > I wonder, did Tekken2 work correctly on the PS2 in PS1 emulation mode ? The R3K
> > CPU is the same, but I thought the COP2 engine was emulated, so I wonder if they
> > had to hack Tekken2 .. :)
>
> I don't have Tekken2 to check, but the R3k isn't exactly the same - in PS2 mode
> at least it has full 4k data and instruction caches just like "real" MIPS chips
> :-)

That's PS2 "turbo" mode. I bet they disable the data cache in PS1 mode, or many, many games that don't expect the wonders of a data cache will crash. And "some programmers" don't even expect to have to flush the instruction cache when loading in a new segment .. do they? (wink wink ;)

Maybe I'll try to find some tekken2 and run it on a PS2 myself one day.



Subject: Re: Playstation Emu
Posted by: smf
Posted on: 01/02/04 05:56 AM



> I think the big problem was that it is a COP2 instruction, which makes things
> much more interesting than any memory or register access.

Although the tekken 2 problem can be fixed by ignoring the cop2 instruction in the delay slot, it breaks other games.

I have it on good authority that it's a delayed load in a delay slot problem & treating it that way doesn't appear to cause problems for anything else ( although I have to admit I have unknown bugs :-) ).

cop2 stalling would be a different issue. I'm not aware of any situation where it would give you a result mid-calculation (which would require you to emulate it at a much lower level).

smf




