How does Compaq’s “fail-fast” principle contribute to the reliability of the NonStop Himalaya S-series servers?
System processes and critical hardware modules are designed to be fail-fast. In other words, they must perform to specified standards, or they halt and go offline before any problem has a chance to propagate to other modules. Hardware and software are made fail-fast through extensive error checking, and some hardware components also perform periodic self-tests. The operating system performs rigorous internal consistency checks to verify its inputs, outputs, and data structures. In the extremely rare case where an error occurs within a system process, or the operating system detects a corrupted data structure, the operating system halts the processor and lets the backup processes in other processors take over. Because no two processors have identical states, the error condition is not repeated in the backup. In this way, no malfunctioning system process is allowed to continue after the error is detected. Systems from other vendors that do not support process pairs cannot react to failures in this way. In tho
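The following Python sketch illustrates the fail-fast and process-pair idea in miniature. It is not NonStop code and does not reflect any Compaq API: the names `State`, `ProcessPair`, and `FailFastHalt`, the balance-must-match-ledger invariant, and the simulated fault are all hypothetical, and plain Python objects stand in for separate processors and the checkpoint messages between them. The point it shows is that the primary checks its own state after every update and, on failure, stops rather than continuing, while the backup resumes from the last verified checkpoint.

```python
from dataclasses import dataclass, field


class FailFastHalt(Exception):
    """Raised when a consistency check fails; the process stops instead of continuing."""


@dataclass
class State:
    # Hypothetical process state: a running balance plus the ledger it must match.
    balance: int = 0
    ledger: list = field(default_factory=list)

    def check(self) -> None:
        # Internal consistency check: the balance must equal the ledger total.
        if self.balance != sum(self.ledger):
            raise FailFastHalt("balance does not match ledger")

    def copy(self) -> "State":
        return State(self.balance, list(self.ledger))


class ProcessPair:
    """Hypothetical pair: the primary does the work, the backup holds the last verified checkpoint."""

    def __init__(self) -> None:
        self.primary = State()
        self.backup = self.primary.copy()
        self.takeovers = 0

    def apply(self, amount: int, fault: bool = False) -> bool:
        self.primary.ledger.append(amount)
        self.primary.balance += amount
        if fault:
            self.primary.balance += 1  # simulated corruption of the primary's data
        try:
            self.primary.check()
        except FailFastHalt:
            # Fail fast: the primary halts and its suspect state is discarded, not repaired.
            self.takeovers += 1
            self.primary = self.backup           # backup takes over from the last checkpoint
            self.backup = self.primary.copy()    # a fresh backup is started from that state
            return False
        # Checkpoint: only state that passed the check is copied to the backup.
        self.backup = self.primary.copy()
        return True
```

For example, after `pair = ProcessPair()`, calling `pair.apply(10)` succeeds, while `pair.apply(5, fault=True)` trips the consistency check, so the backup takes over with a balance of 10 and the corrupted value never reaches it. In this sketch the faulty update is simply dropped; how a real system would handle the interrupted request is outside what the example models.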