Mitigating Risk

The phone rings.  A customer has an application destined for a high-risk environment, and somewhere in testing they’ve found an unanticipated condition.  A "Tiger Team" is formed: a group of experts who’ve "been here before", who understand the priorities and the risks, and know how serious the situation is.  Often the Tiger Team makes the go / no-go call, the final answer that either saves a mission or drops all that work into the dust-bin.  Such a team itself represents a *lot* of work, frustration, and time; investigations may run for days, weeks, or months, so long as they reach a conclusion on time.  And there’s the one facet that no engineering practice can change:  Time.

Developing space, mission-critical, or life-critical applications is serious business.  The deployment environment may be harsh, and much about it may be unknown.  To make sure missions succeed, known risks are mitigated as thoroughly as possible.  These risks may include high radiation, occlusion by the Sun (Solar Conjunction), extreme temperature swings, harsh chemical environments, vibration, or electrical noise.  Some of this you can plan for with hardware; some of it you can’t.  You address the things you know about; that’s all you can do.

With software engineering, you’re also faced with unknowns.  One of the biggest is the unknown date when you’ll finally have the actual hardware the project is going to run on.  Simulators can help a lot, enabling some degree of parallel development when hardware time is scarce.  I encourage most of my customers to leverage the simulator wherever possible.  It’s a good way to kick-start development, get more hands on the project, and bring new engineers up to speed.

Once you have your hardware, there may be some hardware-level debug issues.  Having a hardware debugging interface of some sort, an In-Circuit Emulator (ICE) for example, can be invaluable.  As long as you have an LED to blink, you can write polled-output routines that signal register settings and the like to help debug a hardware interaction, but nothing beats being able to dump arbitrary address ranges (registers, memory, etc.) at will.  An ICE gives you that ability to see inside the hardware.
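As an illustration of the LED approach, here is a minimal sketch in C of a polled routine that blinks out a 32-bit register value, most significant bit first, with a long pulse for a 1 and a short pulse for a 0. The hook types `led_write_fn` / `delay_fn`, the pulse widths, and the host-side recorder are all hypothetical: on a real board the hooks would write a GPIO register and busy-wait, and the recorder exists only so the routine can be exercised off-target.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical board hooks. On a target these would poke a GPIO register
   and spin-wait; they are passed in so the routine can also run on a host. */
typedef void (*led_write_fn)(int on);
typedef void (*delay_fn)(unsigned ms);

/* Assumed pulse timings; pick whatever is readable by eye on your board. */
enum { PULSE_ONE_MS = 600, PULSE_ZERO_MS = 200, GAP_MS = 400 };

/* Polled output: blink a 32-bit value MSB first. The pulse width encodes
   each bit (long = 1, short = 0); no interrupts or console are needed. */
void led_dump_u32(uint32_t value, led_write_fn led, delay_fn wait)
{
    for (int bit = 31; bit >= 0; --bit) {
        unsigned is_one = (value >> bit) & 1u;
        led(1);
        wait(is_one ? PULSE_ONE_MS : PULSE_ZERO_MS);
        led(0);
        wait(GAP_MS);               /* gap separates consecutive bits */
    }
}

/* Host-side stand-ins that record pulse widths instead of driving an LED. */
static int led_state;
static unsigned pulses[32];
static size_t n_pulses;

static void host_led(int on) { led_state = on; }

static void host_wait(unsigned ms)
{
    /* Only "LED on" intervals carry data; gaps are ignored. */
    if (led_state && n_pulses < 32)
        pulses[n_pulses++] = ms;
}
```

On the target you would pass in the board's own GPIO and delay routines; on a host, calling `led_dump_u32(0x80000001u, host_led, host_wait)` leaves 32 recorded pulse widths, long at both ends and short in between. It is slow and crude, which is exactly why an ICE that can dump address ranges at will is so much more productive.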

What do you do if the hardware just isn’t going to be available?  You can use similar boards, processors, and so on, but it’s not the *same* thing.  The minute you change the foundation, everything above it will have a new set of interactions with that foundation, and some of those may induce problems.  When hardware is updated, changes in timing or electrical characteristics, errata in revised chips, or even altered PAL equations can cause engineering delays, even if it’s just a newer revision of the same board.  Switching entire boards, or moving from engineering / test boards to "flight qualified" boards, can complicate the issue further.

So what can you do to reduce the risks of developing for hardware you don’t have on hand?  In the past few years, hardware emulation tools have come a long way.  Hardware emulation is like a simulator (vxSim, for example), except that a layer of software is designed to act *just like* the real hardware would, right down to the devices and the registers within those devices.
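To make "right down to the registers" concrete, here is a minimal sketch assuming an invented UART-like device. The offsets `REG_DATA` / `REG_STATUS`, the `STATUS_TX_READY` bit, and the `uart_model` type are hypothetical, not taken from any real part or from vxSim. The driver only polls a status register and writes a data register; the model answers those accesses the way the hardware would, so the driver logic can be exercised before the board exists.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical UART-like device: offsets and bits are invented for
   illustration only. */
#define REG_DATA        0x00u
#define REG_STATUS      0x04u
#define STATUS_TX_READY 0x1u

/* Register-level model of the device, as an emulator might keep it. */
typedef struct {
    uint32_t status;       /* current STATUS register contents */
    char     tx_log[64];   /* bytes the "wire" would have carried */
    size_t   tx_len;
} uart_model;

/* Emulated register read: only STATUS is readable in this sketch. */
static uint32_t model_read(uart_model *m, uint32_t off)
{
    return off == REG_STATUS ? m->status : 0u;
}

/* Emulated register write: a write to DATA while ready "transmits" a byte. */
static void model_write(uart_model *m, uint32_t off, uint32_t val)
{
    if (off == REG_DATA && (m->status & STATUS_TX_READY) &&
        m->tx_len < sizeof m->tx_log - 1) {
        m->tx_log[m->tx_len++] = (char)(val & 0xFFu);
        m->tx_log[m->tx_len] = '\0';
    }
}

/* Driver logic written purely in terms of register accesses: poll STATUS,
   then write DATA, just as it would on the real board. */
static void uart_puts(uart_model *dev, const char *s)
{
    for (; *s; ++s) {
        while (!(model_read(dev, REG_STATUS) & STATUS_TX_READY))
            ;  /* spin until the (emulated) transmitter is ready */
        model_write(dev, REG_DATA, (uint8_t)*s);
    }
}
```

In a real emulator the driver would perform ordinary (volatile) memory-mapped accesses and the emulator would intercept them; calling the model directly here just keeps the sketch self-contained. With `uart_model dev = { .status = STATUS_TX_READY };`, calling `uart_puts(&dev, "OK")` leaves `"OK"` in `dev.tx_log`.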

Leveraging known-good (mature) software designs is another way to limit risk.  Reusing well-understood application code is one example.  Selecting a software base that has been certified to adhere to accepted standards is another.  It’s always nice to know that software has been scrutinized by eyes other than the manufacturer’s and found fit.  And there’s no substitute for a rigorous test strategy, or for following the oft-learned adage: test as you will fly, fly as you did test.

Safe design practices can’t eliminate Tiger Teams completely; there will still be unknowns, still be conditions that aren’t discovered until the last moment.  But by combining sound practices, mature systems, and the right tools, many of the problems that lead to the need for Tiger Teams can be addressed, or discovered far enough in advance to keep a project from running out of Time.