Why Simics Won’t Run Super Mario

By Jakob Engblom

Emulating old computer gaming systems is a popular consumer-market application of simulation technology similar to Wind River Simics. There are a large number of emulators out there, emulating everything from old arcade games to the home computers of the 1980s to gaming consoles. These emulators might seem quite similar to Simics on the surface, but in practice, the simulation technology employed is quite different. You will never see Simics simulate an old Super NES running Super Mario Bros.

It all comes down to the level of timing accuracy required.

To accurately run games written to run well on the very limited hardware of the 1980s, you need a fairly detailed simulation of the hardware circuits and their timing. Game programming in the 1980s and early 1990s was a matter of writing code that was closely tailored to the hardware and intimately dependent on the hardware timing. A typical trick applied on many systems was to change the color palette, display mode settings, or even the contents of the screen buffer while the screen was being redrawn on the TV. Since every unit shipped was identical, this was a viable strategy. I wrote code like this myself on my old ZX Spectrum, achieving increased vertical color resolution by changing the color data in the graphics display while the graphics "system" was sending display data to the attached TV.
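To make the mid-frame trick concrete, here is a minimal sketch of a toy display in Python. Everything in it is hypothetical and invented for illustration (the constants, the `program` and `render` names, the single global palette); it models no real console. The only point is that the same program produces a different picture depending on how often the simulator synchronizes the CPU with the video hardware.

```python
CYCLES_PER_LINE = 4   # hypothetical: CPU cycles spent per scanline
LINES = 6             # hypothetical: scanlines per frame


def program():
    """Toy 'game': yields (cycle, palette_value) writes made during one frame."""
    yield (0, "red")                      # written at the top of the frame
    yield (3 * CYCLES_PER_LINE, "blue")   # timed to land as the beam reaches line 3


def render(sync_every):
    """Render one frame, synchronizing CPU and video every `sync_every` cycles."""
    writes = list(program())
    palette = "black"
    frame = [None] * LINES
    total_cycles = CYCLES_PER_LINE * LINES
    next_line = 0
    for start in range(0, total_cycles, sync_every):
        end = start + sync_every
        # The CPU runs ahead for the whole quantum and applies its writes.
        for cycle, value in writes:
            if start <= cycle < end:
                palette = value
        # The video unit then catches up, drawing every line that begins in
        # this quantum using whatever palette value is current now.
        while next_line < LINES and next_line * CYCLES_PER_LINE < end:
            frame[next_line] = palette
            next_line += 1
    return frame


detailed = render(sync_every=CYCLES_PER_LINE)          # sync once per scanline
coarse = render(sync_every=CYCLES_PER_LINE * LINES)    # sync once per frame
print(detailed)
print(coarse)
```

With per-scanline synchronization, the top half of the frame is red and the bottom half blue, as the program intended. With per-frame synchronization, both writes are applied before anything is drawn, and the effect is lost. The coarse version does far less bookkeeping, which is exactly the speed-for-detail trade discussed in the rest of this post.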

Ars Technica ran a very interesting article in August of 2011 about the most accurate emulation yet of the Super Nintendo Entertainment System (SNES), or Super Famicom as it was known in Japan. The article describes how successive generations of SNES emulators have got closer and closer to the actual hardware timing. The timing that needs to be simulated can be very detailed indeed, such as the interleaving of bus accesses between different hardware units. At the same time, the hardware requirements for running the emulator have increased a hundredfold from the first emulators in 1997 until today. Precision costs compute power, and this is essentially about using 100 times as many computation cycles on the host to run the target system just as fast.

As described in the article, the latest "BSNES" emulator is about 10 times more resource-intensive than the previous "ZSNES" emulator, for a tenfold increase in accuracy. That is actually very impressive, since the cost of precision usually tends to grow exponentially. However, the really interesting statistic is that this only gives you a few more playable games. ZSNES manages to run about 95% of all games, for a much lower cost in computation and implementation. Only in a few cases is the additional accuracy of BSNES absolutely needed in order to run games or render graphics effects correctly.

This is all very interesting, but in which way is the emulation of old video games relevant for what we do today with Simics and virtual platforms in the embedded systems field? It offers a neat illustration of the tradeoff between speed of execution and level of detail (and accuracy) of the virtual platform model. Simics is not trying to be BSNES; it is much more like ZSNES or its predecessors.

Simics is designed to emphasize speed of execution over detail of model. This is a deliberate choice, and it has proven critical in enabling the usage that Simics is seeing today. By being fast enough to run real-world software workloads in a reasonable time, Simics is acceptable to software developers as a daily tool. Interestingly, this was echoed in a DAC 2011 panel on ESL: without sufficient speed, software developers will not use a virtual platform. It is a crucial enabler.

The trade-off is pretty drastic: adding just a little detail to a simulator can quickly cut your speed down by a factor of ten. The diagram below has been shown in any number of variants in any number of venues, and the crucial fact is that the curve is not straight. It drops very quickly initially as you go from left to right.

[Figure: Speed vs detail]

The reason that building a fast simulator in the Simics style works is that current software does not rely on the physical system timing in the same way that old console games do. The hardware today is very variable in its timing: processor pipelines, multiple levels of cache, branch prediction, multiple processors competing for shared resources, and so on all conspire to make execution time variable and unpredictable. With the proliferation of levels of software abstraction, code that is tightly coupled to the hardware is almost impossible to write in general. Instead, hardware-software interfaces have become event-driven, using interrupts to signal completion, or using poll loops on status bits for handshakes. Code is also supposed to live across several generations of hardware, discouraging coding that is too tightly dependent on the characteristics of any particular hardware platform.
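The difference between the two coding styles can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a model of any real device or of any Simics API: the `Device` class, its `status_done` bit, and both driver functions are hypothetical names made up for the example.

```python
class Device:
    """Toy device: an operation completes `latency` time units after start()."""

    def __init__(self, latency):
        self.latency = latency
        self.now = 0
        self.done_at = None

    def start(self):
        self.done_at = self.now + self.latency

    def tick(self):
        """Advance time by one unit (stands in for the driver doing other work)."""
        self.now += 1

    @property
    def status_done(self):
        """Hypothetical status bit: set once the operation has completed."""
        return self.done_at is not None and self.now >= self.done_at

    def read_data(self):
        # Before completion the data register holds nothing useful.
        return 42 if self.status_done else None


def driver_polling(dev):
    """Modern style: handshake on the status bit, no assumption about latency."""
    dev.start()
    while not dev.status_done:
        dev.tick()
    return dev.read_data()


def driver_timed(dev, assumed_latency=5):
    """Old-console style: hard-codes how long the hardware takes."""
    dev.start()
    for _ in range(assumed_latency):
        dev.tick()
    return dev.read_data()


for latency in (1, 5, 50, 500):
    print(latency, driver_polling(Device(latency)), driver_timed(Device(latency)))
```

The polling driver returns valid data for every latency, so a simulator only needs to get the latency roughly right for it to behave correctly. The timed driver works only when the device is at least as fast as the driver assumes, and reads nothing useful otherwise. That is the kind of software that forces a simulator to model timing in detail.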

This is much more simulation-friendly since there is no need to know exactly how long an operation would take on hardware, as long as the simulator latency is of the right order of magnitude.  For most use cases, adding more detail than that just makes the simulation run slower, limiting its usefulness for the majority of software developers. It also makes the development of the models more expensive, since building a very accurate and detailed model is many times more costly than building a good enough model.

There is certainly software that does depend on the precise timing of the hardware. In real-time-critical software like signal processing and low-level firmware, there is hardware that does have very predictable timing, and software that depends on this timing. Still, such software is but a small part of the overall body of software that we run on today's systems. When needed, you can often build small workarounds into a simulator to make the software run right, just like the many game-specific patches in ZSNES discussed in the Ars Technica article. This is overall much more efficient than always simulating at a high level of detail just to make the really picky software happy.

In our experience, it is more valuable to run all of the software at a sufficient level of detail than to run a small part of the software at a very high level of detail. And just like we see with ZSNES, you can cover a very large base of software without going into too much detail on the hardware side.

I hope you now understand why Simics is not likely to run Super Mario at the level of fidelity of BSNES, and why doing so really does not make sense for most uses of a virtual platform. But I still applaud and admire the effort and dedication that went into building BSNES, and the way in which this is preserving an important part of our digital heritage.