Virtual machines of high availability using hardware-assisted failure detection

Wei Jen Wang, Hung Lin Huang, Shan Hao Chuang, Shao Jui Chen, Chia Hung Kao, Deron Liang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Virtualization technology is widely used in today's cloud computing datacenters. With virtualization, each physical machine in a datacenter can be logically divided into several virtual machines, each of which can host different types of software services. However, many factors can reduce the availability of the whole system. For example, when a physical machine fails, every virtual machine on it fails too, and so does every software service running on those virtual machines. Detecting failures efficiently is difficult on a general-purpose computer architecture because the hardware does not provide enough information for fast failure detection. In contrast, ATCA (Advanced Telecommunications Computing Architecture) physical machines offer high hardware availability and support IPMI (Intelligent Platform Management Interface), which can quickly report hardware status. In this paper, we developed a novel failure model and designed a symmetric fault-tolerant mechanism using ATCA physical machines and KVM to provide high system availability. The proposed mechanism divides ATCA physical machines into pairs, so that each machine in a pair provides fault tolerance for the other. Once a failure is detected in the physical machine layer or the virtualization layer, the failed virtual machines are recovered on the other physical machine of the pair. We compared the proposed mechanism with an existing VM-based fault-tolerance tool. The results show that the proposed mechanism significantly reduces service downtime and therefore provides better availability for software services running on the virtual machines.
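The pairing and failover idea in the abstract can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model: each physical machine is a hypothetical `Host` object whose `healthy` flag stands in for whatever the IPMI-based or virtualization-layer detection reports, and "recovery" is reduced to reassigning VM names to the partner machine. `Host`, `make_pairs`, and `fail_over` are illustrative names only, not the authors' implementation and not part of any KVM or IPMI API.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical model of one physical (e.g. ATCA) machine."""
    name: str
    vms: list[str] = field(default_factory=list)
    # Stand-in for a hardware/hypervisor health signal (assumption, not IPMI itself).
    healthy: bool = True


def make_pairs(hosts: list[Host]) -> dict[str, Host]:
    """Pair hosts two by two; each host's partner is its backup, and vice versa."""
    if len(hosts) % 2 != 0:
        raise ValueError("symmetric pairing needs an even number of hosts")
    partner: dict[str, Host] = {}
    for a, b in zip(hosts[0::2], hosts[1::2]):
        partner[a.name] = b
        partner[b.name] = a
    return partner


def fail_over(hosts: list[Host], partner: dict[str, Host]) -> dict[str, str]:
    """Move the VMs of every failed host onto its healthy partner.

    Returns a mapping from VM name to its new host name. If both machines of a
    pair have failed, their VMs are left in place (this sketch has no fallback).
    """
    moved: dict[str, str] = {}
    for host in hosts:
        if host.healthy or not host.vms:
            continue
        backup = partner[host.name]
        if not backup.healthy:
            continue
        for vm in host.vms:
            backup.vms.append(vm)
            moved[vm] = backup.name
        host.vms.clear()
    return moved


if __name__ == "__main__":
    blades = [Host("blade-1", ["web", "db"]), Host("blade-2", ["cache"])]
    pairs = make_pairs(blades)
    blades[0].healthy = False  # simulate a detected hardware failure
    print(fail_over(blades, pairs))
```

Storing the partner relation in both directions means a single lookup serves either machine of a pair, which mirrors the abstract's statement that each machine in a pair provides fault tolerance for the other.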

Original language: English
Title of host publication: ICCST 2015 - The 49th Annual IEEE International Carnahan Conference on Security Technology
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781479986910
State: Published - 21 Jan 2016
Event: 49th Annual IEEE International Carnahan Conference on Security Technology, ICCST 2015 - Taipei, Taiwan
Duration: 21 Sep 2015 to 24 Sep 2015

Publication series

Name: Proceedings - International Carnahan Conference on Security Technology
Volume: 2015-January
ISSN (Print): 1071-6572

Conference

Conference: 49th Annual IEEE International Carnahan Conference on Security Technology, ICCST 2015
Country/Territory: Taiwan
City: Taipei
Period: 21/09/15 to 24/09/15

Keywords

  • ATCA
  • Failover
  • Fault tolerance
  • High availability
  • Virtual machine

