Debug, multicore, and more debug

I recently gave a talk at ICES (Innovative Center for Embedded Systems), an industry-academia collaboration at KTH in Kista, Stockholm, Sweden. The theme was embedded multicore, and I realized that my role at these events seems to have changed. A few years ago, I would be the "embedded guy", defending embedded systems as a whole against speakers who assumed that every system was a homogeneous shared-memory multiprocessor. This time was different. I have become the "debug fanatic".

There seems to be less need to explain embedded multicore nowadays. The academic community and multicore event participants seem to have finally accepted that there are systems out there that run more than one operating system on a single chip (thanks to hypervisors, AMP setups, local-memory configurations, etc.), and that we have highly heterogeneous hardware (with cores of different makes and types all being part of a single system). In the final panel discussion, I found myself saying "build the hardware any way you like, as long as I have good hardware debug support and introspection features".

I think this is indicative of a general truth: if you cannot debug it, you cannot build it. For multicore in particular, debug is scrambling to catch up.

One joker in the audience made the age-old claim that you only need a debugger if you write buggy code. True enough. There are classes of code that can be written to be correct by construction and always work. I have written such code myself. For example, during my CS undergraduate days, I wrote a memory allocation system for a simple OS that we created in a course project. I wrote the algorithm on paper and tested it on paper using various scenarios (brain-powered simulation, essentially). Once I was happy with the code, I went to the computer, typed it in, and it compiled and ran without a hitch for the rest of the project (modulo a single typing mistake). If only all problems were that easy.

Such nice code and such well-defined problems tend to be the exception. In the real world today, your code has to integrate with other people's code, as well as with operating-system and library APIs. In such circumstances, too much is out of your control for correct-by-construction coding to be feasible. I am not saying that you should be sloppy, but you have to be aware that factors beyond your control and insight will affect the overall correctness of the combined system. There will be bugs in the code that calls your code, and in the code that your code calls. There will be timing-dependent and ordering-dependent errors as the operating system schedules threads on different cores in different orders.
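
To make "ordering-dependent error" concrete, here is a small, hypothetical Python sketch (not taken from any real system). It assumes two threads that each increment a shared counter without a lock, using three separate steps (load, add, store), and it enumerates every possible interleaving of those steps. The step model and the names `run` and `all_schedules` are illustrative assumptions; real hardware adds caches, store buffers, and compiler reordering on top of this.

```python
from itertools import permutations

# Assumed model: each thread increments a shared counter in three separate
# steps (load into a private register, add one, store back), with no lock.
STEPS = ("load", "add", "store")


def run(schedule):
    """Execute one interleaving. `schedule` lists a thread id per step;
    each thread executes its own steps in program order."""
    shared = 0
    regs = {}  # per-thread private register
    pc = {}    # per-thread index into STEPS
    for tid in schedule:
        step = STEPS[pc.get(tid, 0)]
        pc[tid] = pc.get(tid, 0) + 1
        if step == "load":
            regs[tid] = shared
        elif step == "add":
            regs[tid] += 1
        else:  # "store"
            shared = regs[tid]
    return shared


def all_schedules(n_threads=2):
    """Every distinct interleaving that keeps each thread's steps in order."""
    base = [t for t in range(n_threads) for _ in STEPS]
    return sorted(set(permutations(base)))


schedules = all_schedules()
outcomes = {}
for s in schedules:
    outcomes.setdefault(run(s), []).append(s)

for value in sorted(outcomes):
    print(f"final counter = {value}: {len(outcomes[value])} of {len(schedules)} interleavings")
```

Running it reports that only 2 of the 20 interleavings (the two fully serial ones) produce the expected value 2; the other 18 lose an update and leave the counter at 1. On a physical multicore system you only observe whichever interleaving the scheduler and hardware happen to produce on a given run, which is why bugs like this are so hard to reproduce.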

In this world, debugging is a necessity, and we need to support it in every way we can. I prefer using virtual platforms like Simics for debugging as much as possible, since they provide a controlled and repeatable environment for most of the debugging in a project. By the time you get to the hardware, most bugs should already have been removed using other tools.

Note that there is no substitute for testing (and probably doing a bit of debugging) on the actual physical hardware that will run the software. No matter how you develop and debug your code, in the end it has to run on a particular physical system. On that system, you really want the most powerful on-chip, on-target debug tools you can find.

For more on Simics and debugging, see some of my previous blog posts: