Variable and Deterministic

A virtual platform like Wind River Simics is generally designed to be deterministic. Determinism brings a host of benefits to users (repeatability of bugs, reverse execution, bug transportation, etc.), but is also easily misunderstood. Quite often, users fear that they will lose an important "feature" of a physical machine – its built-in variation across runs of the same software. If the virtual platform is deterministic, surely it will not expose all the different ways a program can run that you get on a random physical machine? Yes, it can.

There is a huge difference between the repeatable determinism of a virtual machine and completely invariant, predictable, predetermined behavior.

The determinism of a virtual platform means that given the same initial state and exact same sequence of inputs (including the precise timing of the inputs), the execution will be the same. Anything that has happened on the virtual platform can be precisely repeated, but there is no effect on how things will execute in the future. The virtual platform does not restrain or limit the execution of software on the target machine.
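To make the definition concrete, here is a minimal sketch of a toy "machine" whose next state is a pure function of its current state and whatever input arrives at that cycle. It is not how Simics is implemented; `Machine`, `run`, and the update formula are all made up for illustration. The only point is that the same initial state plus the same timed inputs gives the same execution, and that moving an input by a single cycle gives a different one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Toy target: the whole state is a cycle counter and one register."""
    cycle: int
    reg: int


def run(initial, inputs, until):
    """Run from `initial` up to cycle `until`, applying timed inputs.

    `inputs` maps a cycle number to the value that arrives at that cycle.
    Returns the final state and the trace of register values.
    """
    cycle, reg = initial.cycle, initial.reg
    trace = []
    while cycle < until:
        # Pure function of (state, input): no host clock, no host randomness.
        reg = (reg * 31 + inputs.get(cycle, 0) + cycle) % 65521
        cycle += 1
        trace.append(reg)
    return Machine(cycle, reg), trace


start = Machine(cycle=0, reg=7)
inputs = {3: 100, 8: 42}

first = run(start, inputs, until=12)
second = run(start, inputs, until=12)             # same state, same inputs
shifted = run(start, {4: 100, 8: 42}, until=12)   # one input arrives a cycle later

print(first == second)           # identical final state and trace
print(first[1] == shifted[1])    # the trace diverges once the timing changes
```

Note that nothing in `run` tells you the trace ahead of time. You still have to execute it to find out what happens.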

Deterministic does not mean predetermined. In a predetermined execution, you would know what is going to happen before it happens. If we take the case of running a new piece of software on a target system, testing would be pointless – since we would know the future results without even running the program once. This is not how software usually behaves. Rather, when you run a fresh compile of a changed program, you do not know what will happen (you certainly hope that things will go well, produce the right results, etc., but until the program has run, you do not actually know). Using a virtual platform does not change this. Since the initial state (which includes the program code) is different, the execution will be different. The program will not repeat the same execution it had when a previous build was run on the virtual platform. Each time is different.

In a variable system, each time an action is performed, the effect might be different. A good example is running a multithreaded program on a multicore computer. Each time it runs, the precise pattern of thread starts, thread switches, lock and mutex operations, thread communications, and other concurrent properties of the program will be different. Sometimes this affects the program results, sometimes not. Such variation will be present on a deterministic virtual platform too, as long as the program starts from a different state each time. A typical example of this is running the same program many times in succession on the same system.
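A toy model shows how both properties can hold at once. The sketch below is not the race test program from the earlier post, and the scheduler in it is pure invention: `lcg`, `race_run`, and `session` are hypothetical names, and a real machine switches threads far less often than this one does. Two simulated threads increment a shared counter without a lock, and the choice of which thread runs next is derived from the machine state alone. Each run leaves the machine in a new state, so the next run starts somewhere else.

```python
def lcg(state):
    """One step of a 32-bit linear congruential generator."""
    return (state * 1664525 + 1013904223) % 2**32


def race_run(machine_state, increments=1000):
    """Two threads each add 1 to a shared counter `increments` times, unlocked.

    Every increment is a separate load and store. Which thread runs next is
    derived from the machine state only, so a run is a pure function of the
    state it starts in. Returns (final counter, machine state after the run).
    """
    shared = 0
    local = [0, 0]                       # value each thread last loaded
    phase = [0, 0]                       # 0 = about to load, 1 = about to store
    remaining = [increments, increments]
    state = machine_state
    while remaining[0] or remaining[1]:
        state = lcg(state)
        t = (state >> 16) & 1            # pick a thread from the machine state
        if not remaining[t]:
            t = 1 - t                    # that thread is done, run the other
        if phase[t] == 0:
            local[t] = shared            # load
            phase[t] = 1
        else:
            shared = local[t] + 1        # store, may overwrite the other thread
            phase[t] = 0
            remaining[t] -= 1
    return shared, state


def session(boot_state, runs=3):
    """Run the test several times in succession on the same machine."""
    results = []
    state = boot_state
    for _ in range(runs):
        result, state = race_run(state)  # next run starts where this one ended
        results.append(result)
    return results


print(session(12345))   # successive runs will, in all likelihood, differ
print(session(12345))   # the whole session repeats exactly
```

With a lock around the increment, every run would report 2000. Without it, the reported value falls short by however many updates were lost, and that number depends on the state the run happened to start in.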

A good example to understand how this works is to look at how the race condition test program discussed in a previous post works on Simics. In a typical demonstration, we run the program once, and show the output. As can be seen in the screenshot below, we print the actual value of the unprotected shared variable. This is a fairly good measure of how many race conditions the program hit – essentially, each race is going to subtract one from the sum.

The next time the program is run, will you get the same or a different result? In all likelihood, you get a different result, since the initial condition has changed as time passes while we type on the target system command line. We also cannot say ahead of time what each execution of the program will report, since that can only be decided by actually running the program in the current state of the target machine.

Variation on MPC8572

If we use Simics reverse execution to back up and redo this sequence of program runs, we will see the runs repeating with the exact same results as in the first set of runs. That is the repeatability and reversibility that you want to have in order to develop and debug multi-threaded software. Play the short (silent) movie below to see repeatability and reverse execution in action.

We could also impose a repeatable scenario by saving a checkpoint of the machine before running the tests, and then running the tests from a script. This would reproduce the exact same timing of input each time, and therefore execute the whole sequence of runs in the exact same way each time. But within that sequence, each successive run of the program would still get a different result from the run before it.
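The checkpoint-plus-script idea can be sketched in a few lines. This is only a toy model under stated assumptions: `Target`, `run_test`, and `script` are hypothetical names rather than Simics commands, the "checkpoint" is just a saved Python object, and the formula that turns machine state into a test result is an arbitrary stand-in.

```python
import copy


class Target:
    """Toy target machine whose whole state is a cycle counter."""

    def __init__(self):
        self.cycle = 0

    def idle(self, cycles):
        """Time passing on the target, e.g. while a command is typed."""
        self.cycle += cycles

    def run_test(self):
        """Hypothetical test program: its result depends on its start state."""
        lost = self.cycle % 97        # stand-in for the races hit on this run
        self.cycle += 1000 + lost     # the run itself advances the machine
        return 2000 - lost


def script(target, gaps=(500, 120, 9000)):
    """Scripted session: fixed idle gaps model exactly reproduced input timing."""
    results = []
    for gap in gaps:
        target.idle(gap)
        results.append(target.run_test())
    return results


checkpoint = Target()                        # state saved before any test runs
first = script(copy.deepcopy(checkpoint))    # restore, then run the script
second = script(copy.deepcopy(checkpoint))   # restore again, same script

print(first)    # [1985, 1917, 1922]
print(second)   # [1985, 1917, 1922]
```

The three results inside one session differ from each other, because each run starts from the state the previous run left behind. The session as a whole repeats exactly, because the checkpoint and the script fix both the initial state and the input timing.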

Another way to get variable behavior into a simulation is to add some form of explicit variation to the execution. Typically, a simulation module would be injecting variation into the system based on a pseudo-random number generator. Examples of variation would be to change the precise time that programs are started, to delay network packets in a virtual network, or to add jitter to the time that timers trigger. Such variation should be deterministic, in the sense that if we track the random seed or current state of the random-number generator, we should be able to repeat any execution precisely. A higher level of variation is to change the target hardware configuration, which will certainly cause the software to execute in a different way.
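As a rough illustration of deterministic variation, the sketch below adds pseudo-random jitter to packet delivery times in a pretend virtual network. Everything here is an assumption made for the example: `deliver`, the packet format, and the jitter bound are invented, and no real simulator module works exactly like this. What matters is that all of the randomness flows from one recorded seed.

```python
import random


def deliver(packets, seed, max_jitter_us=50):
    """Return (arrival_time_us, payload) pairs with pseudo-random delay added.

    `packets` is a list of (send_time_us, payload) pairs. All randomness
    comes from `seed`, so the seed is enough to replay the run.
    """
    rng = random.Random(seed)    # private generator, not the global one
    arrivals = [(t + rng.randint(0, max_jitter_us), payload)
                for t, payload in packets]
    return sorted(arrivals)      # order in which the target sees the packets


packets = [(0, "a"), (10, "b"), (20, "c"), (30, "d")]

seed = 2024                          # log this alongside the test result
run1 = deliver(packets, seed)
run2 = deliver(packets, seed)        # replay from the logged seed
other = deliver(packets, seed + 1)   # a new seed explores a new variation

print(run1 == run2)
print(run1)
print(other)
```

Using a private `random.Random` instance rather than the module-level functions keeps the injected variation isolated from any other code that draws random numbers, which is what makes the seed alone sufficient for replay.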

If the goal were to repeat the same execution, we would start the simulation from a checkpoint, run a target program under scripted control once, and then quit the simulator. In this case, each run would start from the same state and get the same inputs at the same time, resulting in the exact same execution with no variation. The key is that we get repeatability of a particular execution as it happened, even though the actual execution was not possible to predict before doing it.

I hope that this post has helped you understand the difference between variation, predictability, predeterminism, and determinism. Determinism is a very powerful property of a virtual platform. It does not detract from the platform's ability to test software; it just makes it trivial to go back and repeat any run with perfect fidelity.