High Availability and Fault Tolerance in the Server World

High availability and fault tolerance are central to modern enterprise server deployments, where services must stay up under high volumes of customer traffic. Redundancy is the traditional way to achieve high availability, but every added component is also one more component that can fail. Redundancy that is applied poorly, for example by introducing a new single point of failure, can therefore decrease system availability rather than improve it. So, should redundancy remain the top consideration, or are alternative methods available?
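
To make that trade-off concrete, the following Python sketch compares a single server with a redundant pair placed behind a failover switch. It is only an illustration under stated assumptions: failures are treated as independent, and every availability figure (0.99 per server, 0.98 and 0.99999 for the hypothetical switch) is invented for the example rather than measured.

```python
def series(*availabilities):
    """Availability when every component must be up (independent failures assumed)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities):
    """Availability when any one of the redundant components is enough."""
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


# Illustrative numbers only; none of them come from a real system.
single_server = 0.99
redundant_pair = parallel(single_server, single_server)

# The pair is only reachable through a failover switch, which sits in series with it.
with_weak_switch = series(0.98, redundant_pair)       # unreliable switch
with_strong_switch = series(0.99999, redundant_pair)  # highly reliable switch

print(f"single server:           {single_server:.6f}")
print(f"redundant pair alone:    {redundant_pair:.6f}")
print(f"pair behind weak switch: {with_weak_switch:.6f}")
print(f"pair behind good switch: {with_strong_switch:.6f}")
```

Under these assumptions, the redundant pair behind the weak switch is less available than the single server it replaced, while the same pair behind the reliable switch is more available. The redundancy itself is not the problem; the new single point of failure in front of it is.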

This paper explores what it takes to build highly available, fault-tolerant systems. Key topics include:

  • What is high availability?
  • Reliability
  • Fault tolerance
  • Lockstep
  • Achieving high availability through redundancy and fault tolerance
  • Elimination of single points of failure
  • Load balancing
  • Fast fault detection
  • Resiliency

