Collaboration Summit 2013 - Hardware Error Handling Improvement for Reliable KVM Hypervisor
The Linux Foundation Collaboration Summit 2013, San Francisco, California

By Mitsuhiro Tanino

In a virtual environment, many guests run on a single hypervisor, so the reliability of the KVM hypervisor is critical. One of the key features is "hardware error handling." To minimize the area of influence when a hardware error such as a Machine Check is detected, the failing hardware must be isolated and only the affected guest shut down. Hardware error handling in Linux has three key features: pre-failure detection, failure isolation, and continuity after isolation. These features are generally implemented in the upstream kernel, but some important issues remain unresolved. This presentation shows the current implementation of the three key features, details the unresolved issues, and explains current activities to solve them. The target audience is kernel developers who are interested in the reliability of virtual environments.