In 2000, I used model-based testing to identify the bug that was the likely cause of the Mars Polar Lander (MPL) crash. See “Mars Polar Lander Fault Identification Using Model-based Testing” for additional detail.
The question came up as to whether the bug was a requirement or code (implementation) problem. It is true that there are many cases when the issue with a system is caused by a requirement problem. The requirements can be wrong and even though the software completely satisfies the requirements (verification is correct), it doesn’t perform as needed (validation is not correct).
However, in the case of the MPL, the actual software had a defect – the requirements in this case were correct. The model was created from the requirements. The tests were generated from the model, and the tests exposed the defect in the software. Here’s a short description of the defect in the software implementation – see image at bottom.
If there were two consecutive sensor reads on the sensor of any of the three legs during the first phase of descent (5KM to 40 meters), those values were stored (latched) in state memory of the software. The SW was suppose to mark that sensor/leg bad (and it did), because it could not be on the surface yet. However, when the software switched mode at the second/final phase of descent (40 meters), the software used the sensor information stored in state memory, which caused the software to command the thruster to shutdown, resulting in a crash from approximately 40 meters above the surface of Mars.
Once the software marked a leg bad (during the first phase of descent), it should have cleared state memory, but it didn’t. That single line of additional code was added, after the fact, to correct the implementation.
The model represented the value of a sensor leg at two points in time, and the test driver injected the current and previous values for a leg sensor. Because there was no access to the state information, the technique used in the test driver (a common pattern used with MATRIXx) was to call the MPL twice, first with the previous sensor value, then call it again with the current state value of the sensor input, which propagates the information in state memory. On the second call both values are latched in state memory. This is the test case that exposed the defect.