Quit Bugging Me: Induction

While working on a customer problem once, we ran into an interesting phenomenon.  The customer was upgrading a system built around a large VME cage with several boards, replacing older processor boards with what were then "new" boards.  The old boards ran at (I think) 33 MHz, the new ones at more like 133 MHz.

The overall system included motor control functions and sensor feedback, so there were D-to-A, A-to-D, and other boards in the chassis.  The problem was that, after a straightforward rebuild of the software for the new boards, the system gave anomalous readings.  When we checked the registers holding values from the A-to-D boards, we saw numbers that were not possible.  The readings indicated the motors were on and moving their load when in fact the motors were not even connected.  Switching back to the old processor boards eliminated the problem.  To see whether the new CPUs were reading the bus incorrectly, we added hardware to examine VME bus activity.  That meant putting a test board between the CPUs and the sampling boards.

The sampling boards (A-to-D) had been sitting in the chassis right next to the CPUs.  Their sampling rate was not far off from the new CPUs' clock rate.  Moving the sampling boards farther down the chassis reduced the severity of the anomalous reads, but did not eliminate the problem.  The bus analyzer showed that the data on the VME bus was consistent with the data reported by the new CPUs.  This provided the final clues needed to pinpoint the problem and fix the system.

The new boards had clocks that ran at a rate close to the sampling rate of the A-to-D boards.  Those clocks were physically next to the A-to-D boards in the VME chassis.  For whatever reason, the new boards' oscillators were radiating enough energy at the right frequency to corrupt samples taken by the A-to-D boards.  By moving the boards farther apart we reduced the effect (inverse square law), but not by enough to eliminate it.  To test this theory we placed steel plates in a VME slot, as a separator between the A-to-D board and the rest of the boards on the bus.  With the plates in place, both old and new CPUs got the same consistent readings from the A-to-D boards.  Removing the metal plates brought back the anomalous readings with the new CPUs.
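To put rough numbers on the inverse square law mentioned above, here is a minimal sketch.  It assumes an ideal point source radiating into free space, which a crowded VME chassis certainly is not (coupling between adjacent boards may fall off quite differently), and it measures distance in "slots" purely for illustration.  The function name and the slot counts are made up for this example; they are not measurements from the actual system.

```python
def relative_intensity(distance, reference_distance=1.0):
    """Radiated intensity at `distance`, as a fraction of the intensity
    at `reference_distance`, for an ideal inverse-square source.

    Assumption: point source in free space. Real board-to-board
    coupling inside a chassis will not follow this exactly.
    """
    if distance <= 0 or reference_distance <= 0:
        raise ValueError("distances must be positive")
    return (reference_distance / distance) ** 2


if __name__ == "__main__":
    # Hypothetical slot counts; 1 means "right next to the CPU board".
    for slots in (1, 2, 4, 8):
        fraction = relative_intensity(slots)
        print(f"{slots} slot(s) away: {fraction:.4f} of the adjacent-slot intensity")
```

Even under this idealized model, doubling the distance only cuts the intensity to a quarter.  If the interference is well above what the A-to-D input can tolerate, a few slots of separation will soften the symptom without curing it, which matches what we saw.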

At first it looked as though the new combination of software and boards was improperly accessing a device.  Switching back to the old baseline eliminated all the new problems.  But the analysis equipment verified that the readings reported by the CPUs were consistent with the data retrieved from the sampling hardware, so it wasn't a problem "within" the new software or hardware.  The real problem was a completely unexpected interaction between the legacy hardware and the upgrade hardware.