>But I may have to lose that at some point or at least make two
> graphics engines.
Are you sure a tile engine would be noticeably faster? It seems like you'd be giving up accuracy just to get rid of a very slight per-line overhead. In return you'd get a worse user experience (having to choose between 2 engines, unless you intend to make it switch automatically and transparently), less accurate emulation, and a barely noticeable speed improvement.
In the end, you're still rendering the same number of pixels. Have you profiled it to see whether line rendering is actually a bottleneck?
I tried implementing a tile renderer for Genital once and it turned out to be remarkably similar to the line engine, only it couldn't handle line scrolling.
> The first optimisation I have is the blank tile one - it remembers the 16-bit
> tile code which is used for a blank tile, and doesn't bother drawing it at all.
This could be a good one. Try implementing it and see how much of a boost it gives. Perhaps you could even do something awesome like self-modifying code, assuming the target devices don't have an instruction cache (I don't know much about ARM).
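A rough sketch of the blank-tile skip, just to show how little code it takes. The names (render_tile_row, draw_tile_row, TILES_PER_ROW) are made up for illustration, and draw_tile_row is a dummy stand-in for whatever your real rasterizer does:

```c
#include <stdint.h>
#include <string.h>

#define TILES_PER_ROW 40   /* assumption: 320-pixel-wide mode, 8-pixel tiles */

/* Hypothetical stand-in for the real tile rasterizer: fills 8 pixels
   with the low byte of the tile code so the effect is visible. */
static void draw_tile_row(uint8_t *dst, uint16_t code)
{
    memset(dst, (uint8_t)(code & 0xFF), 8);
}

/* Draws one row of name-table entries, skipping every entry equal to
   blank_code. Returns how many tiles were actually drawn. */
static int render_tile_row(uint8_t *line, const uint16_t *name_table,
                           uint16_t blank_code)
{
    int drawn = 0;
    for (int x = 0; x < TILES_PER_ROW; x++) {
        if (name_table[x] == blank_code)
            continue;                      /* known blank: nothing to plot */
        draw_tile_row(line + x * 8, name_table[x]);
        drawn++;
    }
    return drawn;
}
```

Comparing the full 16-bit code only catches one exact entry. Comparing just the tile index bits would probably also catch flipped or re-paletted copies of the same blank tile, but I haven't measured whether that matters in practice.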
> Reference piccy from Wiki:
> Just looking at a typical screenshot from Sonic1, you could save a lot of time
> if you could avoid plotting the background tiles which were hidden. (But is it
> worth the overhead of calculating which are invisible?)
The overhead of figuring out which tiles are invisible would be minimal, since it would only need to be done on VRAM writes.
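Something along these lines is what I have in mind, as a sketch only. It assumes all VRAM writes go through one function and that tiles are the usual 8x8, 4 bits per pixel, with pixel value 0 meaning transparent. The function and array names are invented. A write just marks the tile dirty, and the "is it fully opaque?" answer is recomputed the next time someone asks:

```c
#include <stdint.h>
#include <stdbool.h>

#define VRAM_SIZE  0x10000               /* 64 KB of video RAM */
#define TILE_BYTES 32                    /* 8x8 pixels at 4 bits per pixel */
#define NUM_TILES  (VRAM_SIZE / TILE_BYTES)

static uint8_t vram[VRAM_SIZE];
static uint8_t tile_dirty[NUM_TILES];    /* 1 = cached flag is stale */
static uint8_t tile_opaque[NUM_TILES];   /* 1 = no pixel in the tile is 0 */

/* Every VRAM write goes through here; the only extra cost is one store. */
void vram_write_byte(uint16_t addr, uint8_t value)
{
    vram[addr] = value;
    tile_dirty[addr / TILE_BYTES] = 1;
}

/* Returns true when every pixel of the tile is non-transparent.
   The scan only runs if the tile changed since the last query. */
bool tile_is_opaque(unsigned tile)
{
    if (tile_dirty[tile]) {
        const uint8_t *p = &vram[tile * TILE_BYTES];
        uint8_t opaque = 1;
        for (int i = 0; i < TILE_BYTES; i++) {
            /* each byte holds two pixels, one per nibble */
            if ((p[i] & 0xF0) == 0 || (p[i] & 0x0F) == 0) {
                opaque = 0;
                break;
            }
        }
        tile_opaque[tile] = opaque;
        tile_dirty[tile] = 0;
    }
    return tile_opaque[tile] != 0;
}
```

A background tile can then be skipped when the tile in front of it is opaque and covers exactly the same 8x8 cell. That only holds when both planes are scrolled to the same alignment and priority doesn't reverse the order, so treat it as a starting point rather than a complete visibility test.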
> My question is, there must be a lot of ideas and things people have tried in the
> past since there are so many emulators with tile rendering - what optimisations
> do you think worked well?
Charles MacDonald has a really accurate renderer in Genesis Plus, IIRC. Several years ago he sent me a rough draft of a document describing how Genecyst and Genesis Plus (then MekaDrive) quickly rendered front-to-back, which is required for accuracy. I wish I still had it (I may have a printed copy somewhere in my closet). If you're not already doing accurate front-to-back rendering, it would have been an interesting read. Genecyst encoded priority information into the unused bits of each 8-bit pixel, since only 6 bits were needed for color. This is why shadow/highlight mode was never implemented in it. MekaDrive used a similar technique but supported shadow/highlight accurately. I think it may have had 2 separate engines that it switched between at run-time.
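I can't reproduce what Genecyst actually did, but here is my own guess at how spare pixel bits can carry priority during a front-to-back pass. The bit assignments and the plot function are assumptions of mine, not taken from that document. Layers are submitted front to back (sprites, then plane A, then plane B), and a pixel is only replaced if nothing has been drawn there yet or if the newcomer is high priority and the existing pixel is not:

```c
#include <stdint.h>

#define COLOR_MASK 0x3F   /* 4 palettes x 16 colors = 6 bits */
#define PRIO_BIT   0x40   /* spare bit 6: pixel came from a high-priority tile */
#define FILLED_BIT 0x80   /* spare bit 7: some layer already claimed this pixel */

/* Plot one pixel of a layer into an 8-bit line buffer that was cleared
   to 0. Call for the frontmost layer first, the rearmost layer last. */
static void plot(uint8_t *dst, uint8_t color, int high_prio)
{
    if ((color & 0x0F) == 0)
        return;                          /* color 0 of any palette is transparent */

    uint8_t src = (uint8_t)((color & COLOR_MASK) | FILLED_BIT |
                            (high_prio ? PRIO_BIT : 0));

    /* Keep the existing pixel unless the spot is empty or the new
       pixel outranks it on priority. */
    if (!(*dst & FILLED_BIT) || ((src & PRIO_BIT) && !(*dst & PRIO_BIT)))
        *dst = src;
}
```

After all layers are done, pixels without FILLED_BIT get the backdrop color, and everything is masked with COLOR_MASK before the palette look-up. Both spare bits are used up here, so there is nowhere left to put shadow/highlight state. This sketch also ignores sprite-over-sprite ordering.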
If your target processor is really slow, has no cache, and you have enough memory, look-up tables may be beneficial. There are three kinds of tiles: fully transparent (probably only 1), partially transparent, and fully opaque. If a sizable portion of the screen is covered by fully opaque tiles, it could be worthwhile to write code that detects them and then plots them without doing a pixel-for-pixel transparency check. Maybe you could even figure this out for each tile line rather than just for the entire tile.
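A minimal sketch of the per-line version, assuming you keep an unpacked copy of each tile row (one byte per pixel, 0 = transparent), which is where the extra memory goes. The enum and function names are hypothetical. The classification would be computed once when the tile changes and stored next to the unpacked pixels:

```c
#include <stdint.h>
#include <string.h>

enum row_kind { ROW_TRANSPARENT, ROW_PARTIAL, ROW_OPAQUE };

/* Classify one unpacked tile row of 8 pixels. */
static enum row_kind classify_row(const uint8_t px[8])
{
    int solid = 0;
    for (int i = 0; i < 8; i++)
        if (px[i] != 0)
            solid++;
    if (solid == 0) return ROW_TRANSPARENT;
    if (solid == 8) return ROW_OPAQUE;
    return ROW_PARTIAL;
}

/* Plot one tile row using the cheapest path its classification allows. */
static void plot_row(uint8_t *dst, const uint8_t px[8], enum row_kind kind)
{
    switch (kind) {
    case ROW_TRANSPARENT:
        break;                           /* nothing to draw */
    case ROW_OPAQUE:
        memcpy(dst, px, 8);              /* no per-pixel test needed */
        break;
    case ROW_PARTIAL:
        for (int i = 0; i < 8; i++)      /* fall back to the per-pixel check */
            if (px[i] != 0)
                dst[i] = px[i];
        break;
    }
}
```

Whether the opaque path is a real win depends on how much of a typical screen is made of opaque rows, so it's worth counting them on a few games before committing to it.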
On low-speed processors, hand-tweaked assembly is the way to go :) At least for rasterizing.