I have a paper about "Transporting Bugs with Checkpoints" to be presented at the S4D (System, Software, SoC and Silicon Debug) conference in Southampton, UK, on September 15 and 16, 2010. The core concept presented is to leverage Wind River Simics checkpointing to capture and move a bug from the bug reporter to the responsible developer. It is a fairly simple idea, but getting it to work efficiently does require that some things are done right.
This approach to bug reporting solves two of the fundamental bug reporting problems with the same mechanism:
- How to reliably reproduce a bug at the developer side
- How to capture the relevant target system state required to cause a bug to trigger
Today, when you do this in a typical bug reporting system, the reporter has to describe both the steps to reproduce the bug and the necessary system setup. Doing this using plain text tends to cause missed steps as well as non-exhaustive system state descriptions. The following is a fairly typical example from the real world:
Given such a bug report, we often end up in iterations between the developer and the reporter. By an exchange of questions and answers, a clearer picture of the bug and its triggering context will eventually emerge. If the developer still does not succeed in recreating the bug, the final result is that the bug is marked as "unconfirmed" or "works for me" or "invalid" and never resolved.
For embedded systems, reporting a bug is even more complicated than for desktop software. In addition to the issues related to the software configuration, the hardware configuration also enters into the list of possible variables - including things like clock speed, jumper settings, neighbors in the network, attached specialized hardware, and similar details which are not obviously part of a bug report or easy to replicate. With a Simics virtual platform checkpoint, the hardware configuration is also part of the package. The checkpoint contains the complete hardware and software configuration at the time the bug triggered, which makes describing the state very easy.
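To make the "hardware is part of the package" idea concrete, here is a minimal sketch of what bundling hardware and software state into one transportable artifact could look like. This is not the Simics checkpoint format or API; the class names (`HardwareConfig`, `BugCheckpoint`) and all fields are hypothetical and chosen only to mirror the variables listed above.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class HardwareConfig:
    # Hypothetical fields mirroring the variables mentioned in the text.
    clock_mhz: int
    jumpers: dict            # e.g. {"J1": True}
    network_neighbors: list  # names of other nodes on the network
    attached_devices: list   # specialized hardware attached to the board


@dataclass
class BugCheckpoint:
    # One artifact holding both halves of the system state.
    hardware: HardwareConfig
    software_state: dict
    description: str = ""

    def to_json(self) -> str:
        """Serialize the whole bundle so it can be attached to a bug report."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "BugCheckpoint":
        """Rebuild the bundle on the developer side."""
        raw = json.loads(text)
        return cls(
            hardware=HardwareConfig(**raw["hardware"]),
            software_state=raw["software_state"],
            description=raw["description"],
        )
```

The point of the sketch is that the reporter never has to write the configuration down in prose: whatever the platform knew at the moment of capture travels along with the bug report.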
The reproduction of a bug is also facilitated by a virtual platform. Since a virtual platform should be deterministic, running from a checkpoint of a state just prior to a bug triggering should reproduce the bug, every time. The reproduction aspect is particularly interesting for the very complex systems being built today. If you take a multiple-board, multiple-SoC, multicore system and manage to make it run the same way twice, even in a controlled lab, you are very lucky. Bugs in parallel software are notoriously hard to reproduce, but with a checkpoint and a deterministic virtual platform, reproduction is trivial - even in a lab on the other side of the world.
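The determinism argument can be illustrated with a toy model. The sketch below is not Simics and uses no Simics API: `ToyMachine`, `reporter_session`, and `run_until_bug` are hypothetical names, and the "bug" is an arbitrary condition on a counter. The only assumption that matters is the one made in the text: all state that influences execution, including the source of apparent randomness, lives inside the checkpoint.

```python
import random


class ToyMachine:
    """Hypothetical stand-in for a deterministic virtual platform."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)  # all "randomness" is part of the state
        self.cycle = 0
        self.counter = 0

    def step(self):
        self.cycle += 1
        self.counter += self.rng.randint(1, 6)
        if self.counter % 17 == 0:  # arbitrary condition playing the bug
            raise RuntimeError(
                f"bug at cycle {self.cycle}, counter {self.counter}")

    def checkpoint(self):
        """Capture everything that influences future execution."""
        return {"cycle": self.cycle, "counter": self.counter,
                "rng": self.rng.getstate()}

    @classmethod
    def restore(cls, snap):
        machine = cls()
        machine.cycle = snap["cycle"]
        machine.counter = snap["counter"]
        machine.rng.setstate(snap["rng"])
        return machine


def run_until_bug(machine, max_steps=10_000):
    """Developer side: run forward and return the failure message, if any."""
    for _ in range(max_steps):
        try:
            machine.step()
        except RuntimeError as err:
            return str(err)
    return None


def reporter_session(seed, interval=5, max_steps=10_000):
    """Reporter side: checkpoint periodically, keep the last one before the bug."""
    machine = ToyMachine(seed)
    snap = machine.checkpoint()
    for _ in range(max_steps):
        if machine.cycle % interval == 0:
            snap = machine.checkpoint()
        try:
            machine.step()
        except RuntimeError as err:
            return snap, str(err)
    return snap, None
```

The reporter ships `snap`; the developer calls `run_until_bug(ToyMachine.restore(snap))` and hits the same failure at the same cycle, however many times the replay is repeated. A real target is obviously far larger, but the contract is the same: restore the complete state, run forward, get the same execution.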
See you in Southampton for more details on just how to implement bug transportation and how to make it really efficient and effective in practice.