Cost Efficient Design of Fault Tolerant Geo-Distributed Data Centers

Abstract

Many critical e-commerce and financial services are deployed on geo-distributed data centers for scalability and availability. Recent market surveys show that data center failures are inevitable and result in huge financial losses. Fault tolerance in distributed data centers is typically handled by provisioning spare capacity to mask failure at a site. We argue that the operating cost and the data replication cost (for data availability) must be considered in spare capacity provisioning, along with minimizing the number of servers. Since the operating cost and client demand vary across space and time, we propose cost-aware capacity provisioning to minimize the total cost of ownership (TCO) for fault-tolerant data centers. We formulate the problem of spare capacity provisioning in fault-tolerant distributed data centers using mixed integer linear programming (MILP), with the objective of minimizing the TCO. The model accounts for heterogeneous client demand, data replication strategies (single and multiple site), variation in electricity price and carbon tax, and delay constraints while computing the spare capacity. Solving the MILP using real-world data, we observed a TCO saving of about 35% compared to a model that minimizes the total number of servers and 43% compared to a model that minimizes the average response time. We demonstrate that our model is beneficial when the cost of electricity, carbon tax, and bandwidth vary significantly across locations, which seems to be the case for most operators.
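To make the idea of cost-aware spare capacity concrete, the following is a minimal toy sketch, not the MILP from the paper. It assumes a single aggregate demand, one flat per-server cost per site, and tolerance of exactly one site failure; it ignores replication, bandwidth, carbon tax, and delay constraints. The function name `provision` and all numbers are hypothetical, and a brute-force search stands in for a real MILP solver.

```python
from itertools import product

def provision(cost_per_server, total_demand, max_servers):
    """Brute-force the cheapest integer server counts that survive any single-site failure.

    Illustrative only: one aggregate demand, flat per-server cost per site.
    """
    sites = range(len(cost_per_server))
    best_plan, best_cost = None, None
    for plan in product(range(max_servers + 1), repeat=len(cost_per_server)):
        # Feasible only if the surviving sites can carry all demand
        # when any one site fails.
        if any(sum(plan) - plan[f] < total_demand for f in sites):
            continue
        cost = sum(c * n for c, n in zip(cost_per_server, plan))
        if best_cost is None or cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan, best_cost

# Hypothetical input: three cheap sites and one expensive site, demand of 10 servers.
plan, cost = provision(cost_per_server=[1, 1, 1, 10], total_demand=10, max_servers=10)
print(plan, cost)  # (5, 5, 5, 0) 15
```

In this toy instance the cheapest plan uses 15 servers and avoids the expensive site entirely, whereas a plan such as (4, 4, 3, 3) tolerates the same failure with only 14 servers but costs 41. That gap is the kind of trade-off between server count and cost that the abstract describes.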

Publication
IEEE Transactions on Network and Service Management
Vignesh Sivaraman
Assistant Professor

My research interests include Information Centric Networks, Network Security, Privacy, and Verification.