Important Notice: Our web hosting provider recently started charging us for additional visits, which was unexpected. In response, we're seeking donations. Depending on the situation, we may explore different monetization options for our Community and Expert Contributors. It's crucial to provide more returns for their expertise and offer more Expert Validated Answers or AI Validated Answers. Learn more about our hosting issue here.

Why is fault tolerance vital to the success of grid computing?

April 26, 2017computing fault Grid success tolerance vital

0

Posted

Why is fault tolerance vital to the success of grid computing?

1 Answer

0

Posted

Comprehensive error handling and automated recovery are essential to robust, scalable systems. Grid computing especially pushes these limits of reliability, since it entails a large number of distributed components, all of which are subject to various types of unpredictable failures, such as computer hardware breakdowns, operating system crashes, loss of communications, and abnormal exits from application software. Historically, it has been too difficult to program distributed systems that can automatically recover from failures in individual components and keep working. For businesses, this has meant that distributed computing by PCs was more of an academic exercise than a commercial reality. Large-scale, distributed computing is now practical, but only if the middleware has been designed to handle a variety of inevitable failures. Without fault tolerance, systems can’t grow large or complex, which is just where grid computing is most needed and offers the greatest benefits.