Iterative Hardware-Software Interface Design

Making good design decisions is hard; making bad ones is easy. The best way to avoid a really bad design is to work through how it actually behaves in practice. One of my favorite examples is how Jeff Hawkins walked around with a mockup of the original Palm Pilot to test its real feel through daily "use".

The same principle applies to software architecture. When you design a software component that other programmers will use, the best way to make sure the design makes sense is to use it to accomplish something. Just creating an API and some unit tests is not likely to result in a design that is easy to use and that works well in practice. To me, this is one of the key insights of agile methods and iterative software development. By getting something half-finished into the hands of users and collecting feedback from actual use, the final design is much more likely to be good.

Working with embedded systems, I have seen quite a few hardware designs with bad programming interfaces. Quite often, it looks like the hardware design was never really reviewed or tested before being committed and literally set in stone. Usually, the difference between a good design and one that makes driver programmers tear their hair out in exasperation is very small. Adding a few extra status bits or making the layouts of registers just a bit more regular is often all that is needed to turn a bad design into a good one.

Most examples of bad hardware design resulting from not talking to the software teams are either confidential or vague word-of-mouth stories collected at hardware and software conferences. However, I have managed to find some public examples of hardware design issues as seen from the software side.

There is a very interesting transcript of a discussion with the Windows driver team at Microsoft. Over and over again, they are essentially telling the hardware team: "please, please, can we talk before you freeze the design?" Some examples of concrete issues:

If hardware engineers understood that Windows is not a real-time operating system and it cannot put tight bounds on interrupt latencies, then they would not create hardware that has to be "touched" within a timeout period.


If every hardware engineer just understood that write-only registers make debugging almost impossible, our job would be a lot easier. Many products are designed with registers that can be written, but not read. This makes the hardware design easier, but it means there is no way to snapshot the current state of the hardware, or do a debug dump of the registers, or do read-modify-write operations.


Another typical hardware trick is registers that automatically clear themselves when written. Although this is sometimes useful, it also makes debugging difficult when overused.
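
To make the write-only register complaint concrete, here is a minimal sketch of the workaround driver writers usually end up with: a software "shadow" copy of the last value written. Everything named here is hypothetical (the register, its bit names, and the helper functions), and a plain variable stands in for the memory-mapped register so the code can run on a host.

```c
#include <stdint.h>

/* Hypothetical write-only control register. On real hardware this would be
 * a fixed memory-mapped address; a plain variable stands in for it here. */
static volatile uint8_t fake_ctrl_reg;
#define CTRL_REG (&fake_ctrl_reg)

/* Hypothetical bit assignments. */
#define CTRL_ENABLE (1u << 0)
#define CTRL_IRQ_EN (1u << 1)

/* Software copy of the last value written. The driver needs it because the
 * hardware cannot be read back. */
static uint8_t ctrl_shadow;

/* Every write goes through this function so the shadow stays in step. */
static void ctrl_write(uint8_t value)
{
    ctrl_shadow = value;
    *CTRL_REG = value;
}

/* "Read"-modify-write, where the read comes from the shadow copy. */
static void ctrl_set_bits(uint8_t mask)
{
    ctrl_write((uint8_t)(ctrl_shadow | mask));
}

static void ctrl_clear_bits(uint8_t mask)
{
    ctrl_write((uint8_t)(ctrl_shadow & ~mask));
}
```

The shadow is only a guess at the hardware state. If the device resets itself, or some other piece of code writes the register directly, the two drift apart without any way for the driver to notice, which is the debugging problem the quote describes. A readable register would make the shadow unnecessary.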

Jack Ganssle has a more deeply embedded view:

Use many narrow I/O ports rather than a few wide ones. When a single port controls 3 LEDs, two interrupt masks, and a stepper motor, changing any output means managing every output. The code becomes a convoluted mess of ANDs/ORs. Any small hardware change requires a lot of software tuning. 
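
A small sketch of what that "mess of ANDs/ORs" looks like, next to the narrow-port alternative. The bit layout is an assumption made up for illustration (three LED bits, two interrupt mask bits, three stepper bits in one 8-bit port), as are the function names, and the ports are passed in as pointers so ordinary variables can stand in for hardware.

```c
#include <stdint.h>

/* Hypothetical layout of one wide 8-bit output port. */
#define LED_SHIFT      0u
#define LED_MASK       (0x7u << LED_SHIFT)       /* bits 0-2: three LEDs */
#define IRQ_MASK_SHIFT 3u
#define IRQ_MASK_MASK  (0x3u << IRQ_MASK_SHIFT)  /* bits 3-4: two interrupt masks */
#define STEP_SHIFT     5u
#define STEP_MASK      (0x7u << STEP_SHIFT)      /* bits 5-7: stepper motor control */

/* Wide port: every update has to preserve the fields it is not changing.
 * This version assumes the port can be read back; if it cannot, a software
 * shadow copy is needed as well. */
static void wide_set_field(volatile uint8_t *port, uint8_t mask,
                           unsigned shift, uint8_t value)
{
    uint8_t current = *port;
    current = (uint8_t)((current & ~mask) | ((value << shift) & mask));
    *port = current;
}

/* Narrow ports: each function has its own register, so an update is a
 * single store with no masking. */
static void narrow_set(volatile uint8_t *port, uint8_t value)
{
    *port = value;
}
```

With the wide port, moving the stepper bits in a hardware revision means revisiting every mask and shift, and an interrupt handler that touches the interrupt mask bits can race with mainline code that is updating the LEDs. With narrow ports, neither concern arises.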

Steve Chessin tells a long story about fault injection in the ACM Queue, with a very important conclusion:

Note that deciding whether to preserve coherency on a diagnostic access is an example of the many decisions a chip designer must make. Prior to Sun's e-cache parity crisis, these decisions were made by the hardware designers without consulting the software error-handling experts. Since that crisis, error and diagnostic reviews of new chips are a required part of the hardware design cycle.

These reviews are joint meetings of the chip designers and the software people responsible for error handling, diagnosis, and containment. They are held early enough in the design process so that any deficiencies in the treatment of errors by the hardware (such as a failure to capture important information) can be corrected, and suggestions for improvements can be incorporated.

All of these examples essentially say the same thing: if the hardware designers would talk to the software designers before finalizing the design, much software developer grief could be avoided. We simply have to connect hardware developers and software developers in an iterative, agile flow. Hardware developers need to give the software developers something to work on, collect feedback, and adjust the design before it is frozen.

The obvious technical solution to this is to use virtual prototypes, where the hardware team supplies the software team with early functional models of their devices, and the software team tries to write driver code and use the devices. The act of designing and writing a driver will expose bad design choices in the hardware-software interface. If this is done early enough in the development process, the cost of correcting the issues will be low. In the end, the device drivers are likely to be more stable and take less time to write (since they have better hardware beneath them), and thus the final product should ship faster with fewer issues.
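
As a rough illustration of the idea, here is a toy functional model of a made-up transmit-only serial device, together with a driver routine written against it. The device, its register map, and all function names are assumptions for the sake of the example. A real virtual prototype would sit behind a simulated bus so that the driver uses ordinary memory-mapped accesses; here the driver simply calls the model's read and write functions directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Register offsets of a hypothetical transmit-only serial device. */
enum { REG_DATA = 0x0, REG_STATUS = 0x4 };
#define STATUS_TX_READY 0x1u

/* Functional model: register state and behavior, no timing detail. */
struct tx_model {
    uint32_t status;
    unsigned busy_polls;   /* status reads left until the device is ready */
    uint8_t  sent[16];     /* bytes that made it onto the "wire" */
    size_t   sent_count;
    unsigned dropped;      /* writes that arrived while the device was busy */
};

static void model_reset(struct tx_model *m)
{
    m->status = STATUS_TX_READY;
    m->busy_polls = 0;
    m->sent_count = 0;
    m->dropped = 0;
}

static uint32_t model_read(struct tx_model *m, uint32_t offset)
{
    if (offset != REG_STATUS)
        return 0;
    /* Crude stand-in for transmit time: ready again after two polls. */
    if (m->busy_polls > 0 && --m->busy_polls == 0)
        m->status |= STATUS_TX_READY;
    return m->status;
}

static void model_write(struct tx_model *m, uint32_t offset, uint32_t value)
{
    if (offset != REG_DATA)
        return;
    if (!(m->status & STATUS_TX_READY) || m->sent_count >= sizeof m->sent) {
        m->dropped++;      /* the model makes driver mistakes visible */
        return;
    }
    m->sent[m->sent_count++] = (uint8_t)value;
    m->status &= ~STATUS_TX_READY;
    m->busy_polls = 2;
}

/* Driver code under development: poll for ready, then write each byte.
 * Returns 0 on success, -1 if the device never becomes ready. */
static int drv_send(struct tx_model *dev, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned tries = 0;
        while (!(model_read(dev, REG_STATUS) & STATUS_TX_READY)) {
            if (++tries > 100)
                return -1;
        }
        model_write(dev, REG_DATA, buf[i]);
    }
    return 0;
}
```

Even a model this small gives the software team something to push back on. Writing `drv_send` immediately raises questions such as whether the device should raise an interrupt when it becomes ready instead of being polled, and whether a dropped write should be reported in a status bit. Those are cheap to answer while the "hardware" is still a page of C.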