What are examples of hardware and software faults and how are they dealt with in a Fault Tolerant System?
Examples of hardware faults are easy to find: disk failures, transient memory corruption, and so on. What counts as a software fault is largely a matter of definition. It could be a system-level fault, such as a page fault, a buffer overflow, or a user process being killed for whatever reason, or it could be an application-level fault. If it is an application-level fault, the application itself has to deal with it using an appropriate recovery mechanism. Hardware faults are usually guarded against with redundancy. To some extent, redundancy can be provided for software too. For example, in the Tandem NonStop System, a user can create a backup process for a process, so that if the primary (master) process is killed for some reason, the backup process takes over. The backup process typically runs on a different CPU, so that a single CPU failure does not take out both. A process can have multiple backup processes for increased reliability.
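
The primary/backup idea can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Tandem's actual API: the names Process, ProcessPair, handle and demo are hypothetical, the "state" is a single running total, and the primary is assumed to copy its state to every live backup after each request so that a backup can resume from the last checkpoint.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process whose entire state is a running total."""
    name: str
    cpu: int  # backups are placed on a different CPU than the primary
    state: dict = field(default_factory=lambda: {"total": 0})
    alive: bool = True


class ProcessPair:
    """Illustrative primary process with one or more backups (an assumption,
    not the real NonStop interface)."""

    def __init__(self, primary, backups):
        self.primary = primary
        self.backups = list(backups)

    def _checkpoint(self):
        # Copy the primary's state to every live backup.
        for backup in self.backups:
            if backup.alive:
                backup.state = dict(self.primary.state)

    def _take_over(self):
        # Promote the first live backup; it resumes from its last checkpoint.
        while self.backups:
            candidate = self.backups.pop(0)
            if candidate.alive:
                self.primary = candidate
                return
        raise RuntimeError("no live backup left")

    def handle(self, amount):
        # Fail over first if the primary has been killed.
        if not self.primary.alive:
            self._take_over()
        self.primary.state["total"] += amount
        self._checkpoint()
        return self.primary.state["total"]


def demo():
    pair = ProcessPair(
        Process("primary", cpu=0),
        [Process("backup-1", cpu=1), Process("backup-2", cpu=2)],
    )
    pair.handle(10)
    pair.handle(5)
    pair.primary.alive = False   # simulate the primary being killed
    total = pair.handle(1)       # served by the promoted backup
    return total, pair.primary.name


if __name__ == "__main__":
    print(demo())
```

In this sketch the request that arrives after the primary is killed is served by backup-1, which continues from the last checkpointed total rather than starting from zero. A real system would also have to detect the failure and handle a crash between the update and the checkpoint, which this example deliberately leaves out.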