Although research in fault tolerance by replication has matured, the results have not been widely used in practice. Existing approaches incur large resource and performance overheads, and in many cases application programs must be altered to manage replication explicitly. Our view is that for fault tolerance by replication to gain acceptance in practice, the following conditions must be met:
- Failure-free performance must not be penalized because of replication.
- Replication and failure recovery must be transparent to application programs. Moreover, existing programs should be able to benefit from replication without modification.
- Replication techniques must support standard protocols and systems.
This paper discusses the design and implementation of a Highly Available Network File Server (HA-NFS) built according to the guidelines outlined above. Key areas covered in the paper include:
- Current Status
- Future Work