Since the summer, I have been in contact with Tennessee Carmel-Veilleux at the École de technologie supérieure in Quebec, who has been doing some really cool work with Simics. Tennessee has learnt Simics very quickly, and has been very clever in how he applies Simics to the research problems he has encountered. In this interview, he tells us some more about his research and use of Simics.
I prefix my comments with "JE", and his with "TCV".
JE: We start with the basics. Can you introduce yourself?
TCV: My name is Tennessee Carmel-Veilleux and I am a master's student in the Electrical Engineering department at École de technologie supérieure (ÉTS) in Montréal, Québec. I hold my B.Eng EE from ÉTS as well as an Electronics Design Technician diploma from Collège Maisonneuve. I am part of the AREXIMAS joint project between ÉTS and Polytechnique. My advisor is Prof. Jean-François Boland and my co-advisor is Prof. Guy Bois from École Polytechnique. I am finishing my master's remotely from Waterloo, Ontario, where I live with my wife and our son.
JE: What is your university and research group?
TCV: École de technologie supérieure is an engineering school in the University of Québec's public university network. It has the largest undergraduate engineering program in the province of Québec. Admission is restricted to students who have previously completed technical collegiate studies diplomas, although they are now accepting non-technicians through a one-year technical apprenticeship.
I belong to the AREXIMAS (ARchitectural EXploration of Integrated Modular Avionics Systems) research project, a joint effort between École Polytechnique de Montréal and ÉTS. We are funded by the CRIAQ, a local aerospace academic research consortium that fosters links between industry and academia. The project currently has three faculty members, three master's students and one PhD student.
During my undergraduate studies I was significantly involved in the SONIA autonomous underwater vehicle (AUV) project. The goal of that project is to build intelligent submarine robots to participate in AUVSI and ONR's international AUV competition. Since starting my master's, I have been doing technical consulting with the team.
JE: What is your research about?
TCV: My research project is about the use of multicore embedded processors for Integrated Modular Avionics (IMA) applications. IMA is a systems architecture for avionics platforms that consolidates multiple avionics functions on highly-integrated embedded computers. Classical "federated" avionics systems used to have a single computer for every avionic function (such as Flight Management System (FMS), Flight Control System (FCS), actuator controls, cockpit data display systems, etc.) [1,2]. Newer aircraft such as the Airbus A380, Boeing 787 and Dassault Falcon 7X all use IMA to reduce the size, weight and power used by their avionics bay.
JE: Multicore and Avionics, sounds like a difficult combination to me?
TCV: Right now, the market is very slowly starting to evaluate multicore technology for avionics applications. On one hand, this is because multicore SoCs are moving so quickly that very little reliability and availability history has been established. The complexity of modern SoCs doesn't help with safety and certification concerns. On the other hand, IMA requires the use of robust partitioning, such as that provided by ARINC-653 operating systems, to ensure fault containment amongst the multiple separate applications that run on an IMA platform. There are still many open questions about how to certify multicore versions of these for flying.
This field has a lot in common with the current interest in hypervisors for running multiple completely separate applications and OSes on embedded systems. Multicore processors offer the enticing prospect of integrating an even higher number of applications in the same physical space as current single-core systems.
This is where my research project comes in: I am adapting an existing academic robust partitioning kernel called XtratuM to work on multicore PowerPC SoCs such as the Freescale MPC86xx, MPC85xx and QorIQ series. The picture below shows an overview of the XtratuM architecture.
- Provide a proof-of-concept multicore robust partitioning kernel that can be used for research on mitigating safety issues.
- Highlight implementation safety issues in moving from current single-core IMA systems to multi-core IMA systems.
Some companies are already working on certified multicore avionics operating systems. My research project aims to bring a research-grade platform to other academic users so that work can be done on solving already identified multicore safety problems without resorting to proprietary implementations. I want to allow other researchers to try different approaches to maximize IMA integration on a multicore processor while remaining safe. Surely not all ideas have been tried yet.
It is well understood in the IMA community that multicore processors trump all assumptions about computing platforms that underlie current IMA standards and implementations. We sort of need a paradigm shift because what integrators want are higher integration and low development and upgrade costs. These conflicting goals have many possible solutions with multicore/manycore processors, but for sure we need to see beyond the current ideas that try to cast multicore in the mold of a multiplexed single-processor (to follow the structure and strictures of existing standards).
JE: How do you use Simics in your research?
TCV: I used Simics as my main prototyping tool for code development. The Simics MPC8641-simple model (containing a Freescale MPC8641 SoC and external hardware needed to run code) has allowed me to validate my code base from one through eight cores and has made debugging of the kernel a snap. I've worked with a lot of architectures and development systems from many vendors, but Simics is simply the easiest embedded software prototyping environment I have used. I am also testing the kernel on an MPC8572-based hardware platform with a BDI-3000 debugger, but it is nowhere near as productive.
Up to now, I have used Simics to debug and test a bare-metal multi-core system bring-up, synchronization primitives (barriers, spinlocks) and low-level drivers for PIC, UART, clocks and exception processing. Reverse execution and scripting have been especially useful in helping to debug race conditions, interrupts and booting issues, which are usually incredibly tricky to debug with JTAG or hardware probes, since they are all intermittent and often time-dependent issues.
We are also working on possibly bringing Simics to the classroom for embedded programming labs.
Down the road, we hope to be able to target the QorIQ P4080 cycle-accurate model so that even more progress can be made in the development of multicore robust partitioning. Some of the most difficult problems being tackled right now in this field are WCET (worst-case execution time) estimation and resource-allocation and schedulability analysis in partitioned systems. These problems can benefit from comparing analytical solutions to actual implementations on current and upcoming complex computer architectures.
JE: As a side note, I actually worked on WCET analysis for my own PhD thesis. There are a lot of details that can affect predictability in multicore SoCs with many shared resources. Do you have any exciting results to share yet?
TCV: Robust partitioning implementations such as ARINC-653 mandate the use of a two-level scheduling scheme with static schedules employed for partition switching, and then fixed-priority scheduling for threads (tasks) in each partition. The AMP case where all cores would share scheduling quanta, but no two cores would ever share data from the same partition, appears to be the most tractable case.
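To make the two-level scheme concrete, here is a minimal single-core sketch in Python. It is not XtratuM or ARINC-653 code: the `Thread`, `Partition` and `run_major_frame` names, the tick granularity and the "larger number means higher priority" convention are all assumptions made for illustration. The outer loop walks a static partition schedule fixed offline, and the inner choice picks the highest-priority ready thread inside the active partition only.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    name: str
    priority: int   # assumption: larger number = higher priority
    remaining: int  # ticks of work left


@dataclass
class Partition:
    name: str
    threads: list = field(default_factory=list)

    def pick_thread(self):
        """Second level: fixed-priority choice among threads with work left."""
        ready = [t for t in self.threads if t.remaining > 0]
        return max(ready, key=lambda t: t.priority) if ready else None


def run_major_frame(schedule, partitions):
    """First level: walk a static schedule of (partition_name, slot_ticks).

    Returns one (partition_name, thread_name or "idle") entry per tick.
    """
    trace = []
    for part_name, slot_ticks in schedule:
        partition = partitions[part_name]
        for _ in range(slot_ticks):
            thread = partition.pick_thread()
            if thread is None:
                # Unused time is NOT handed to another partition.
                trace.append((part_name, "idle"))
            else:
                thread.remaining -= 1
                trace.append((part_name, thread.name))
    return trace


partitions = {
    "P1": Partition("P1", [Thread("low", 1, 2), Thread("high", 5, 1)]),
    "P2": Partition("P2", [Thread("only", 1, 5)]),
}
schedule = [("P1", 4), ("P2", 3)]  # fixed offline, repeated every major frame
trace = run_major_frame(schedule, partitions)
```

In this toy run, P1 idles for its last tick even though P2 still has work pending, and P2's thread is cut off when its slot ends. That is the time-partitioning property the static first level is meant to guarantee.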
The more interesting case of SMP static scheduling, where applications can be spread over several cores for the same partition time interval, is much more problematic than we had initially hypothesized. This is because of the difficulty of providing a usable parallel programming model without breaking space and time partitioning assumptions.
This picture shows an SMP time partitioning static schedule, with a problem spot at the partition context switch point.
JE: I don't immediately understand the problem. Could you explain it in some more detail?
TCV: The robust partitioning kernel is a hypervisor in this case. The hypervisor must ensure that partition switches occur at the right time to prevent overruns that could take time away from the next partition. Usually, the static schedule has some slack at the end of every schedule slot, to account for the overhead in partition switching. That slack must also account for overhead in hypervisor calls, like the case where we would do an IPC hypervisor call just before the partition switch time. One example would be the partition OS doing a hypercall to effect a "return from virtual interrupt".
In the SMP case, the programmer of the system software expects an abstraction where all cores are always active at the same time for each partition. However, since the hypervisor partition switch can happen with small differences in timing on each core, we can have odd situations.
An example would be that cpu1 acquires a spinlock for a short mutual exclusion, but gets switched-out by the hypervisor before having had time to release the lock. If cpu0's hypervisor runtime has not yet triggered the partition switch, because of jitter and timing differences, problems start to show up. Suppose that cpu0 also tries to acquire the lock. It will spin for much longer than anticipated, basically until the end of the partition time slice. Hundreds if not thousands of cycles can be "stolen" this way, which could throw off schedulability analysis. The application development team, assuming the lock would be held very shortly, might not have taken proper steps to ensure that this implicit lock-holder preemption was mitigated.
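The effect can be put into numbers with a small back-of-the-envelope model in Python. This is only a sketch: the `spin_cycles` helper and all cycle counts below are invented for illustration and do not come from measurements. It assumes the waiting core asks for the lock after the holder has taken it, and that a holder switched out before releasing keeps the lock for the rest of the time slice.

```python
def spin_cycles(acquire_at, hold_for, holder_switch_at,
                waiter_try_at, waiter_switch_at):
    """Cycles the waiting core spins inside the current time slice.

    All arguments are cycle counts measured from the start of the slice.
    """
    if waiter_try_at < acquire_at:
        raise ValueError("sketch only models a waiter arriving after the holder")
    release_at = acquire_at + hold_for
    # A holder that is switched out before releasing keeps the lock
    # until its partition runs again, i.e. not within this slice.
    lock_free_at = release_at if release_at <= holder_switch_at else float("inf")
    spin_until = min(lock_free_at, waiter_switch_at)
    return max(0, spin_until - waiter_try_at)


# Hypothetical numbers: cpu1 takes the lock at cycle 99_880 for 50 cycles,
# cpu0 asks for it at cycle 99_890 and is switched out at cycle 100_300.

# Case 1: cpu1 is switched at the same time as cpu0, so it releases normally.
expected = spin_cycles(99_880, 50, 100_300, 99_890, 100_300)

# Case 2: jitter makes cpu1 switch out at cycle 99_900, before the release.
preempted = spin_cycles(99_880, 50, 99_900, 99_890, 100_300)
```

With these made-up figures the wait grows from 40 cycles to 410 cycles, purely because the two cores did not switch partitions at the same instant.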
There are many corner cases like these that need to be handled. We don't want the programmers of system and application software to have to consider and take care to address them all, as that is not a tenable programming paradigm. For the example I just mentioned, there are software solutions at the hypervisor level that can eliminate it, but always at the cost of complexity or decreased processor utilization. In some other cases, common assumptions about SMP systems are completely trumped by the safety mechanisms in the hypervisor. That's another reason why I believe AMP schedules with no explicit dependency or sharing between cores are simpler in many respects. Maybe I'll prove myself wrong by the end of my project 🙂
JE: Ouch, that hurts… I can see how you end up with worst-case assumptions being added to worst-case assumptions until you eat up a significant portion of the time slice. Nothing you want regular programmers to have to concern themselves with, certainly.
Let's get back on the Simics track. I think I saw that you did some smart things with Simics to let you focus on the essential problems and not the accidental problems of your target system.
TCV: On a more practical side, I have been experimenting with different ways of simplifying low-level embedded development with Simics. I have recently blogged about using the fact that Simics is a functional simulator to make prototyping easier. It's easy to forget that sometimes the main problem to solve is completely decoupled from the low-level aspects of the system.
Having a super-optimized implementation of a driver that works identically between the virtual hardware and the real system is sometimes required. However, you can often simply replace entire portions of a system with behavioral equivalents that use the functional simulator's introspection capabilities. For instance, before sitting down to write a working multi-core timebase synchronization scheme, I coded a "magic instruction" that simply forced a full system clock synchronization from behind the scenes. I could replace tons of runtime and significant debugging time with fewer than 10 lines of Python code. The rest of the kernel did not care about whether the clock synchronization was based on my code or some virtual platform tricks.
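The idea can be illustrated without a real simulator. The sketch below is a toy model in plain Python, not Simics code and not Tennessee's script: `ToySimulator`, `Core`, `on_magic`, `execute_magic` and the magic number `SYNC_TIMEBASE` are all hypothetical names. It only shows the shape of the trick: the target executes a special no-op instruction, the simulator notices it, and a short host-side handler reaches into every core and aligns the timebases directly.

```python
class Core:
    """One simulated core with its own free-running timebase counter."""

    def __init__(self, name, timebase):
        self.name = name
        self.timebase = timebase


class ToySimulator:
    """Stand-in for a functional simulator with magic-instruction hooks."""

    def __init__(self, cores):
        self.cores = cores
        self.magic_handlers = {}

    def on_magic(self, number, handler):
        # Register a host-side handler for one magic-instruction number.
        self.magic_handlers[number] = handler

    def execute_magic(self, core, number):
        # Called when target code running on 'core' hits the magic instruction.
        handler = self.magic_handlers.get(number)
        if handler is not None:
            handler(self, core)


SYNC_TIMEBASE = 1  # hypothetical magic-instruction number


def sync_timebases(sim, requesting_core):
    # Behavioral equivalent of a timebase synchronization protocol:
    # copy the requesting core's timebase into every core.
    for core in sim.cores:
        core.timebase = requesting_core.timebase


sim = ToySimulator([Core("cpu0", 1_000), Core("cpu1", 1_037), Core("cpu2", 985)])
sim.on_magic(SYNC_TIMEBASE, sync_timebases)
sim.execute_magic(sim.cores[0], SYNC_TIMEBASE)  # the kernel "requests" a sync
timebases = [core.timebase for core in sim.cores]
```

The handler itself is a few lines long, in keeping with the size mentioned above. The kernel only has to execute the magic instruction where the real synchronization routine will eventually be called.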
JE: Could not agree more. Do you have any other technical tidbits to share?
TCV: My technical blog is at http://www.tentech.ca. It is a mix of embedded systems work, hardware design projects and open-source tools. I used to have "train projects" which progressed every day on the commuter train when I was back in Montréal. Nowadays, they are "after-the-baby-is-asleep" projects 🙂 I often post about solutions I find to my everyday work problems and that includes some nice ones about Simics.
Another interesting aspect I have been writing about is the use of live instrumentation and code patching to find bugs and try bugfixes in exception-handling and interrupts code. That part of embedded systems development is usually a terrible time sink, especially when no usable simulator is available.
JE: I saw your online movie about that. I have one comment on the methodology there though. You should use the "clear-recorder" command after patching the target code, to make sure that there was no residual recording of the future of the system which could cause strange artefacts to happen.
No academic write-up of a subject is complete without some references. In the text above, we find:
[1] R. Walter and C. Watkins, "Genesis platform," in Digital Avionics Handbook, C. R. Spitzer, Ed. Taylor & Francis Group LLC, 2007, vol. 2, ch. 12, pp. 12-1 to 12-28.
[2] J. W. Ramsey, "Integrated modular avionics: Less is more," Avionics Magazine, Access Intelligence LLC, Feb. 2007.
[3] R. N. Mahapatra, P. Bhojwani, and J. Lee, "Microprocessor evaluations for safety-critical, real-time applications: Authority for expenditure no. 43 phase 2 report," Federal Aviation Administration, Washington, DC 20591, Technical Report DOT/FAA/AR-08/14, June 2008.
[4] ARINC, Inc, Avionics Application Software Standard Interface, Part 1 - Required Services, ARINC, Inc Specification 653P1, Rev. 2, Mar. 2006.
[5] L. Kinnan, "Use of multicore processors in avionics systems and its potential impact on implementation and certification," in Digital Avionics Systems Conference, 2009. DASC '09. IEEE/AIAA 28th, Oct. 2009, pp. 1.E.4-1 to 1.E.4-6.
[6] A. Crespo, I. Ripoll, and M. Masmano, "Partitioned embedded architecture based on hypervisor: The XtratuM approach," in Dependable Computing Conference (EDCC), 2010 European, Apr. 2010, pp. 67-72.