Highly Available Distributed Systems


Availability has become a first-order design goal in widely distributed file systems, sensor systems, p2p systems, directory services, and large-scale infrastructure. However, availability research lags far behind the need. This project seeks to close the gap by (1) studying the failure characteristics of large-scale distributed systems and infrastructures, (2) developing a realistic failure benchmark, (3) comparing and shedding light on existing techniques for improving the availability of these systems, and (4) developing new techniques better tuned to the failure characteristics of these systems.

See Haifeng Yu's home page for a list of publications.

Researchers