Continuous Integration with Simics



Continuous integration is an important component of modern software engineering practice. As always, it can mean different things depending on who you ask, but a key part is typically the notion that rather than waiting until the last minute to integrate all the many pieces of code in a system, integration, and most importantly integration testing, is performed as early as possible, as soon as code is ready to run. This shortens the lead time from coding to deployed products, and catches errors earlier. Testing is done as part of the check-in cycle for all code, which puts access to test systems on the critical path for developers. Each piece of code added to a system should be tested as soon and as quickly as possible, so that feedback reaches the developers while the new software is still fresh in their minds. Testing soon and testing quickly might not be an issue for simple applications where any standard computer can be used for testing, but for embedded systems and distributed systems, it can be a real problem. Unless, of course, you use simulation and Simics.

Simics-style full-system simulation is a real enabler for continuous integration for complex systems or systems interacting with their environment. Using hardware boards is much more difficult than using a simulator, especially for the quick, short tests that should be run before any code is admitted into code repositories. A simulator can be automated, and it allows a company to use clusters or clouds of standard machines to run code for any variety of boards or system setups. What we have seen is that a typical continuous integration workflow starts with a developer submitting new code to the build system. If the build fails, they have to fix it. Once the code actually builds, quick unit tests (often known as “smoke tests”) are run to make sure the code is not totally broken. The overall flow tends to look something like the picture below – the precise number of test levels and their names will vary, but this three-tier structure is pretty typical.

[Figure: ci-flow – typical three-tier continuous integration test flow]
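The tiered flow above can be sketched as a simple gating function: each tier runs only if the previous one passed. This is a minimal Python sketch where the build and test callables are stand-ins for real CI jobs; nothing here is a Simics or Jenkins API:

```python
# Illustrative three-tier CI gate: each tier runs only if the previous
# one passed. The callables are stand-ins for real build and test jobs.
def run_pipeline(build, unit_tests, subsystem_tests, system_tests):
    """Return the name of the first stage that fails, or 'passed'."""
    if not build():
        return "build"
    for name, tests in [("unit", unit_tests),
                        ("subsystem", subsystem_tests),
                        ("system", system_tests)]:
        if not all(test() for test in tests):
            return name  # stop at the first failing tier
    return "passed"

# Example: a subsystem test fails, so system tests never run.
result = run_pipeline(
    build=lambda: True,
    unit_tests=[lambda: True],
    subsystem_tests=[lambda: False],
    system_tests=[lambda: True],
)
print(result)  # subsystem
```

The point of the gating is economics: the cheap, fast tiers filter out most failures before the expensive system-level runs are started.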

The unit tests should run very quickly, in no more than a few minutes. The developer wants results back in the time it takes to go for a cup of coffee. The execution has to be fast and the latency to get hold of a test platform must be short. With simulation in the Simics style, both of these can be achieved. In particular, getting a target to run on is much faster when using a simulator. Just start a new simulator process on a compute server, and run the test. No need to reserve hardware or initialize and load software on a hardware board. Just do it immediately.
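As a concrete illustration, launching such a unit test from a CI job can be as simple as assembling a Simics command line and handing it to a compute server. This is a hedged sketch: the flags follow common Simics command-line usage but should be verified against your Simics version, and the script and file names are hypothetical:

```python
# Sketch: build a command line that launches Simics in batch mode and
# runs one test script. Flag names (-batch-mode, -e) follow common
# Simics usage but should be checked against your version; the target
# and test script names are hypothetical examples.
def simics_test_command(target_script, test_script, timeout_s=300):
    return [
        "simics",
        "-batch-mode",               # exit when the scripts finish
        target_script,               # sets up the virtual board
        "-e", f"run-script {test_script}",
        "-e", f"run {timeout_s} s",  # bound the run in virtual time
    ]

cmd = simics_test_command("qsp-x86/firststeps.simics", "tests/unit_smoke.simics")
print(" ".join(cmd))
```

In a CI job, this list would go straight to something like `subprocess.run`, with a nonzero exit code failing the build; no board reservation step is needed.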

Once code passes unit testing, it can be subjected to larger-scale tests. First, some form of subsystem test is run where the code is tested in a real context but typically with quite small inputs. The goal is to get the subsystem-level tests done in hours. Code that passes subsystem tests is finally used in system-level tests where it is run along with all other code and functionality of the system, and subjected to long hard tests under high load and lots of traffic. The system-level tests can range in scope from simple functional tests that make sure that the system hangs together, to “burn-in” tests that run for weeks under full load to ensure that hardware and software can stand up to the rigors of the real world. Simics can handle most subsystem tests and some of the system-level tests. Still, the final system-level tests have to be run on hardware. At some point, it is simply necessary to test what is actually going to be shipped. The maxim is always to “test what you ship and ship what you test”. Thus, the physical hardware that will be shipped to the customer must be used for final testing.

Using a virtual platform like Simics can drastically reduce the number of hardware labs needed to enable continuous integration. The quick cycles that most affect developers become independent of hardware and can run whenever needed, regardless of hardware lab availability. It is very easy to integrate Simics as an automated test component in build automation systems like Jenkins, and Simics scripting can be used to automate runs. The screenshot below shows an example automated test session in Simics, with a script that runs a networked client program on one machine and checks the result (in this case, a segfault) on another machine that is acting as a server, while annotating what is going on in the Simics Timeline view. It is a demo expressed within Simics, but it shows just how easy it is to annotate results.

[Figure: ci-screenshot – example automated test session in Simics, annotated in the Timeline view]
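The pass/fail check in a session like this can be as simple as scanning the captured target console output. A minimal sketch; the marker strings are illustrative and would in practice match whatever the test harness on the target prints:

```python
# Sketch: classify an automated run by scanning the serial-console log
# captured from the target. The marker strings are illustrative.
def classify_run(console_log: str) -> str:
    if "Segmentation fault" in console_log:
        return "fail:segfault"
    if "TEST PASSED" in console_log:
        return "pass"
    return "fail:unknown"  # e.g. the target hung before printing anything

print(classify_run("server: Segmentation fault (core dumped)"))  # fail:segfault
```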

When issues are found, checkpoints can be used to capture the failed runs and bring them back to engineering, making the feedback loop much faster and bug reports much more precise. The flow would be something along these lines:

[Figure: ci-flow-feedback – continuous integration flow with checkpoint-based feedback to developers]

First of all, the target system is booted and set up to a point where test code can be applied. Depending on the nature of the test and the level of integration being performed, this can be anything from booting a single board to bringing up and initializing a complex self-organizing distributed multiple-network system. In any case, the starting point is saved as a Simics checkpoint (A) that can serve as the starting point for many tests. When a particular piece of code is to be tested, the checkpoint is brought up, and the system is run to load in the newly developed code. Once the code is in place and ready to run, another checkpoint (C) is saved. This checkpoint is then used as the starting point for one or many test runs on the code. Each test run would use different parameters and inputs to drive the code in different ways, as in the simple example shown above.
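This checkpoint chain can itself be generated as a small script. A sketch, assuming classic Simics CLI checkpoint commands (`read-configuration` / `write-configuration`); verify the exact names against your Simics version, and note that the agent upload command and file paths here are hypothetical:

```python
# Sketch: generate the Simics command sequence for the checkpoint chain:
# restore the booted checkpoint (A), load the code under test, and save
# a ready-to-run checkpoint (C). Command names follow classic Simics CLI
# usage but should be verified; matic0.upload and the paths are hypothetical.
def checkpoint_chain(boot_ckpt, payload, ready_ckpt):
    return "\n".join([
        f"read-configuration {boot_ckpt}",      # restore booted state (A)
        f"matic0.upload {payload} /opt/test/",  # hypothetical agent upload
        "run 5 s",                              # let the target settle
        f"write-configuration {ready_ckpt}",    # save test-ready state (C)
    ])

script = checkpoint_chain("booted-A.ckpt", "new-build.bin", "ready-C.ckpt")
print(script)
```

Because the boot happens once and is amortized over every subsequent test run, the per-test cost drops to restoring a checkpoint.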

If an issue is found in a test run, a Simics collaboration checkpoint (Q) is saved and passed as an attachment through the issue reporting system. As discussed in a previous blog post, collaboration checkpoints include both the system state and a recording of the inputs to the system, and thus it is sufficient to reproduce the issue for the developer. No more trying to explain what happened in a text-based bug description. Instead, the bug is perfectly transported from the test system to the developer. This closes the continuous integration loop and makes sure that issues found in automatic testing are promptly addressed by the developers.
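Packaging such a report can be automated too. This is a sketch of attaching the checkpoint to a machine-readable issue; the field names are illustrative, not any particular bug tracker's schema, and the replay command is an assumption to be adapted to your setup:

```python
# Sketch: attach a collaboration checkpoint to a machine-readable issue
# report so the developer can replay the exact failing run. Field names
# and the replay command are illustrative, not a real tracker's schema.
import json

def issue_with_checkpoint(summary, checkpoint_path, test_params):
    return json.dumps({
        "summary": summary,
        "attachment": checkpoint_path,  # checkpoint Q: state + input recording
        "reproduce": f"open checkpoint {checkpoint_path} in Simics and replay",
        "parameters": test_params,      # the inputs this particular run used
    }, indent=2)

report = issue_with_checkpoint(
    "server segfaults on malformed client packet",
    "runs/2024-q-fail.ckpt",
    {"packet_size": 9000},
)
print(report)
```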

Simics also has an interesting effect on the nature of the integration used for testing. As discussed previously on this blog, you can easily combine a simulation of the physical aspects of a system with the simulation of the computer part of the system. This makes it possible to continuously test how a control system interacts with its environment. Another important aspect is that with a simulator, you can take shortcuts and replace parts of the system with stubs and dummies. This makes it possible to test more integrations earlier than would be possible with hardware, since with hardware you pretty much only have the choice of having the real system and not having it. With Simics, you can do continuous integration testing all the way to system integration long before hardware is actually available or even before the hardware design has stabilized.
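The stub-and-dummy idea can be illustrated outside of Simics entirely. Here is a toy Python sketch in which a dummy environment model replays a fixed trace, so the control code can be integration-tested before the real model exists; all class and function names are made up for the example:

```python
# Sketch: swap a real environment model for a dummy so integration tests
# can run before all models exist. Everything here is illustrative.
class RealSensorModel:
    def read(self):
        raise NotImplementedError("hardware model not available yet")

class DummySensorModel:
    """Stand-in that replays a fixed trace instead of simulating physics."""
    def __init__(self, trace):
        self._trace = list(trace)

    def read(self):
        return self._trace.pop(0)

def control_step(sensor, setpoint=20.0):
    # Trivial controller under test: heat if the reading is below setpoint.
    return "heat_on" if sensor.read() < setpoint else "heat_off"

sensor = DummySensorModel([18.5, 21.0])
print(control_step(sensor), control_step(sensor))  # heat_on heat_off
```

When the real model arrives, it slots in behind the same interface, and the same tests keep running.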

Creating and managing multiple system and network configurations for testing is often difficult in hardware. The number of hardware lab setups is limited by hardware availability, and reconfiguring a hardware setup with different boards and network connections is time-consuming and error-prone. With Simics, it is possible to write scripts and save setups as software, making configuration an instant process. Configurations can also be saved in version control systems, allowing hardware and software configurations to be managed together. See a previous blog post for more on network simulation in Simics.
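Since configurations are just scripts, they can be generated programmatically and versioned like any other code. A sketch; the link and board commands below are illustrative, only loosely modeled on Simics component commands, and should not be read as the real CLI:

```python
# Sketch: generate a networked multi-board setup as a script instead of
# wiring a hardware lab. Command names are illustrative, only loosely
# modeled on Simics component commands; verify against your version.
def network_setup(num_boards, link_name="net0"):
    lines = [f"create-ethernet-link {link_name}"]
    for i in range(num_boards):
        lines.append(f"create-board board{i}")
        lines.append(f"connect {link_name} board{i}.eth0")
    return "\n".join(lines)

setup = network_setup(3)
print(setup)
```

Going from a 3-board to a 30-board network is one changed parameter, and the resulting text file diffs cleanly in version control alongside the software it tests.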

Testing can naturally be performed in parallel, since virtual platform availability is only limited by the number of servers that can be used to run Simics. This increases the amount of testing that can be performed within a given time, compared to only using hardware setups. Using techniques like checkpointing, it is possible to shorten test execution time by starting from booted setups rather than rebooting the test system for each test. The cycle time can also be reduced by using various shortcuts in the simulator to bring newly built software into the system, such as loading directly to RAM rather than flashing a boot FLASH.
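The fan-out can be sketched with a standard worker pool, with every run starting from the same ready-to-run checkpoint rather than rebooting. The `run_one_test` stub here stands in for launching a Simics process from checkpoint C; its pass/fail rule is arbitrary, just to make the example concrete:

```python
# Sketch: fan test runs out across workers, each starting from the same
# booted checkpoint. run_one_test is a stub standing in for "launch a
# Simics process from checkpoint C with these parameters".
from concurrent.futures import ThreadPoolExecutor

def run_one_test(checkpoint, params):
    # Stub result; a real version would spawn Simics and parse its output.
    return {"checkpoint": checkpoint, "params": params,
            "passed": params % 7 != 0}

def run_parallel(checkpoint, param_list, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: run_one_test(checkpoint, p), param_list))

results = run_parallel("ready-C.ckpt", range(10))
failures = [r["params"] for r in results if not r["passed"]]
print(failures)  # [0, 7]
```

The throughput limit is simply the number of servers available to run simulator processes, not the number of boards in a lab.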

This piece was an edited excerpt from the upcoming book about simulation and Simics that I have been working on, showing you the kinds of information we have in the book.