True Concurrency is Truly Different (Again)

A recent article at Ars Technica describes yet another security flaw in Windows. Nothing much new in that respect, but this attack is interesting in that it is enabled by multicore hardware: it is not practical on a single processor, demonstrating once again how multicore is fundamentally different from multitasking on a single processor.

The attack targets software that hooks Windows kernel code to do additional work, such as anti-virus software. The idea is to get bad data into a hooked system call by replacing the input data after it has been validated by the hook but before it is consumed by the kernel. The hooks have to carefully validate the inputs they are given in order not to pass bad data on to the kernel. While the hook code is inspecting the inputs, they are still present in user-level memory, providing a window of opportunity to overwrite the data. This is not possible in sequential code, as the code would put the parameters in user-land memory, invoke the hooked call, and then regain control only after the call had completed.

However, if you add a second thread running within the same process, it can overwrite the parameters concurrently with the validation. This attack is not practical on a multitasking system with only one processor, as the chance of the scheduler switching threads at exactly the right moment is very small indeed. In a truly concurrent system with multiple processor cores, it is a different matter: the chance of a hit increases dramatically, making the attack practical (according to the researchers). Essentially, it is an intentional race condition, and apparently it is reasonable to expect it to "work" after just a few attempts.
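In pseudocode terms, the attack boils down to something like the following sketch (purely schematic C with made-up names; hooked_system_call stands in for whichever hooked entry point is targeted, and this is nowhere near a working exploit):

/* One thread repeatedly invokes a hooked call while a second thread
   flips the argument buffer between a value that passes validation and
   one that should never reach the kernel, hoping to hit the window
   between the hook's check and the kernel's use of the data. */
#include <pthread.h>
#include <string.h>

extern void hooked_system_call(const char *arg); /* hypothetical target */

static char arg_buffer[256];        /* parameter block in user memory */

static void *flipper(void *unused)
{
    for (;;) {
        strcpy(arg_buffer, "benign");    /* passes the hook's validation */
        strcpy(arg_buffer, "malicious"); /* what the kernel may consume */
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, flipper, NULL);
    for (int i = 0; i < 1000000; i++)   /* retry until the race is won */
        hooked_system_call(arg_buffer);
    return 0;
}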

This attack is yet another example of how true concurrency is different from interleaved time-sharing in a fundamental way. It reminded me of some early multicore experiments we did with Simics a few years ago, proving exactly this point: race conditions trigger much more often in truly concurrent environments than in multitasking environments.

We created a simple program containing a really bad race condition. The program had two threads. Both threads incremented a shared counter inside a tight loop, without any locking to protect against races. Essentially, the core of the loop did a read-modify-write cycle in three steps, and the race would manifest itself by both threads reading the same counter value and overwriting each other's results.
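A minimal sketch of what such a program can look like (this is my reconstruction using pthreads, not the actual test code from the experiment):

/* Two threads increment a shared counter with no locking. The final
   total should be 2 * ITERATIONS; any shortfall means that the
   read-modify-write cycles of the two threads collided. */
#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 1000000

static volatile long counter = 0;   /* shared, deliberately unprotected */

static void *worker(void *unused)
{
    for (long i = 0; i < ITERATIONS; i++)
        counter = counter + 1;      /* read, modify, write: three steps */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("expected %d, got %ld\n", 2 * ITERATIONS, counter);
    return 0;
}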

[Figure: Race condition illustrated]

In Simics, we used a single-core and a dual-core configuration of the same machine to execute this experiment. We ran the test program 20 times in succession on each configuration and counted the number of times the race condition showed up as an actual error in the program result (all automated from a Simics script). Additionally, we varied the clock frequency from 1 MHz to 10 GHz to see if that would have any effect. The results are shown below:

[Figure: Race condition measured]

As we can see, the race triggers only rarely on a single-core machine, where multitasking is the only way to switch between the threads. On the other hand, it triggers all the time on the dual-core machine, which supports the hypothesis behind the hack that I described above. As the clock frequency increases, the likelihood of an error being caused by multitasking shrinks, since the time slice of each task grows in terms of instructions executed (assuming a fixed scheduler task-switch rate), making it less likely that a preemption lands inside the short read-modify-write window.
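To put rough, purely illustrative numbers on this (a back-of-the-envelope estimate, not data from the experiment): with a scheduler ticking at 100 Hz, a 1 MHz core executes on the order of 10,000 instructions per time slice, while a 1 GHz core executes around 10,000,000. If the vulnerable read-modify-write window is three instructions wide, a preemption has roughly a 3-in-10,000 chance of landing inside it at 1 MHz, but only about 3-in-10,000,000 at 1 GHz. On two cores, by contrast, both threads are inside their loops simultaneously all the time, so collisions are essentially constant.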

This neatly demonstrates the value of Simics for multicore work: we discover real bugs inside the virtual world. The virtual world allows us to bend the hardware configuration to expose errors that might otherwise remain hidden (another example of a bug caused by a lack of testing on certain configurations can be found in Microsoft KB article 950182).

We can also use Simics to identify exactly when one thread overwrites another thread's data in a race. In one Simics demo, we run the same test program on an eight-core machine using several threads, along with a Simics script that tracks all writes to the shared data and raises an alarm each time the read-modify-write cycle of one thread is interrupted by another thread.

To achieve this, we put a memory-write breakpoint on the shared data structure. Such a breakpoint is non-intrusive in Simics and does not disturb the execution of the virtual machine. The location of the shared data is found using the built-in scriptable Simics debugger. When the breakpoint hits, we run a piece of script code that uses OS awareness (see my post about Analyzer for more on OS awareness) to check whether the thread that just wrote the counter variable was also its most recent reader. If not, we print a diagnostic to the Simics command line along the following lines:

Race condition detected!
Write to shared variable from tid 1384
on cpu p4080_simple.soc.cpu[5], at 4246640000 cycles

Last read from tid 1387 on cpu p4080_simple.soc.cpu[0],
at 4246649994 cycles, instruction at 0x10000b7c

Last write from tid 1387 on cpu p4080_simple.soc.cpu[0],
at 4246649996 cycles, instruction at 0x10000b84
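The core of the check behind this diagnostic can be sketched in plain C along the following lines (a model of the logic only, not actual Simics script code; on_read and on_write stand in for the breakpoint callbacks, and the tid argument for what OS awareness reports):

#include <stdio.h>

static int last_reader_tid = -1;   /* thread that last read the counter */
static int last_writer_tid = -1;   /* thread that last wrote it */

void on_read(int tid)              /* invoked on every read of the data */
{
    last_reader_tid = tid;
}

void on_write(int tid)             /* invoked on every write of the data */
{
    /* A well-formed read-modify-write is read and written by the same
       thread; if another thread read the value last, that thread's
       update is about to be lost. */
    if (last_reader_tid != -1 && last_reader_tid != tid)
        printf("Race condition detected! tid %d overwrote a value last "
               "read by tid %d\n", tid, last_reader_tid);
    last_writer_tid = tid;         /* recorded for the diagnostic above */
}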

As an obvious extension, we could stop the execution immediately and check which code in which thread was performing the unsynchronized access. What is nice about this approach is that it works regardless of the OS used and of how the program is coded, since we look at the hardware level and do not try to understand locks or other synchronization primitives.

If you want to read more about how Simics can help with multicore software development and debugging, please see our Simics white papers.
