Autonomic Power and Performance Management of Large-Scale Data Centers


With the rapid growth of servers and applications spurred by the Internet, server power consumption has become critically important and must be managed efficiently. Earlier work [1] reported that, with a projected 50 million square feet of data center hosting capacity in the U.S. by 2005 and density levels of 200 watts per square foot for servers, storage, switches, and data center support running 24x7, data centers could require 80 TWh per year, costing $8B per year at $100 per MWh and releasing 50 million tons of new CO2 annually. High energy consumption also translates into excessive heat dissipation, which in turn increases cooling costs and makes servers more prone to failure. The main goal of this research is to treat power as a first-class resource that must be managed autonomously alongside performance, fault, and security. We have pioneered research on optimizing power and picture quality in video compression. This effort extends that work to high-performance distributed computing centers, such as today's Internet data centers, and both leverages and contributes to emerging power management features for servers and devices that offer a wide range of low-power states and reduce the cost of transitions between them.
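As a quick consistency check on the figures quoted above, the following sketch recomputes annual energy and cost from the stated floor area, power density, 24x7 operation, and $100/MWh price. It gives 10 GW of load, about 87.6 TWh per year, and about $8.76B per year, the same order as the cited 80 TWh and $8B. The CO2 figure is not recomputed, because it depends on an emission factor the text does not state.

```python
# Back-of-the-envelope check of the data center energy figures quoted above.
floor_area_sqft = 50e6           # projected U.S. data center hosting capacity
power_density_w_per_sqft = 200   # servers, storage, switches, and support
hours_per_year = 24 * 365        # continuous 24x7 operation
price_usd_per_mwh = 100.0

power_w = floor_area_sqft * power_density_w_per_sqft  # total load in watts
energy_twh = power_w * hours_per_year / 1e12           # Wh -> TWh
cost_usd = energy_twh * 1e6 * price_usd_per_mwh       # TWh -> MWh, then price

print(f"{power_w / 1e9:.0f} GW, {energy_twh:.1f} TWh/yr, ${cost_usd / 1e9:.2f}B/yr")
```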

The objective of this research is to develop a theoretical framework and a general methodology for autonomic power and performance management of high-performance distributed computing centers that achieves: (a) online modeling, monitoring, and analysis of power consumption and performance; (b) adaptive learning that automatically identifies strategies to minimize power consumption while meeting quality-of-service requirements for a wide range of workloads and applications; (c) dynamic reconfiguration of computing, storage, and network resources according to the selected optimization strategies; and (d) new, effective optimization techniques that scale well and run continuously in a dynamic environment, making the system truly autonomic.
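To make objectives (b) and (c) concrete, here is a minimal sketch of the plan-and-reconfigure step of such a loop, under assumptions that are not taken from this project: a homogeneous cluster, a linear idle-to-peak server power model, a per-server utilization cap standing in for the QoS requirement, and a sleep state for unused servers. `ServerModel`, `active_power`, `plan_configuration`, and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServerModel:
    """Hypothetical per-server parameters chosen for illustration, not measurements."""
    idle_w: float = 150.0         # power when on but idle (W)
    peak_w: float = 250.0         # power at 100% utilization (W)
    sleep_w: float = 10.0         # power in a low-power sleep state (W)
    capacity_rps: float = 1000.0  # requests per second one server can sustain


def active_power(model: ServerModel, util: float) -> float:
    """Linear idle-to-peak approximation of an active server's power draw."""
    util = min(max(util, 0.0), 1.0)
    return model.idle_w + (model.peak_w - model.idle_w) * util


def plan_configuration(demand_rps: float, n_servers: int, model: ServerModel,
                       max_util: float = 0.7) -> tuple[int, float]:
    """Choose how many servers stay active so that cluster power is minimized
    while per-server utilization stays at or below max_util (a stand-in QoS
    target). Unused servers sleep. Returns (active_servers, total_watts)."""
    best = None
    for active in range(1, n_servers + 1):
        util = demand_rps / (active * model.capacity_rps)
        if util > max_util:
            continue  # too few servers: would violate the QoS target
        watts = (active * active_power(model, util)
                 + (n_servers - active) * model.sleep_w)
        if best is None or watts < best[1]:
            best = (active, watts)
    if best is None:
        raise ValueError("demand exceeds cluster capacity at the QoS target")
    return best


# Reconfigure whenever the monitored demand changes (monitoring is simulated here).
for demand in [2500.0, 600.0, 6800.0]:
    active, watts = plan_configuration(demand, n_servers=10, model=ServerModel())
    print(f"demand={demand:.0f} rps -> {active} active, {watts:.1f} W")
```

With these made-up numbers, 2,500 requests/s on 10 servers is served by 4 active servers drawing 910 W in total. Exhaustive search is adequate for one knob on a small cluster; with many devices and many low-power states per device the search space explodes, which is why objective (d) calls for optimization techniques that scale.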

As part of this work, we are developing new management techniques that address the following research challenges:
1. How can we efficiently and accurately model power and energy consumption from a system-level perspective that captures the complex interactions among different classes of devices, such as processors, memory, network, and I/O?
2. How can we predict, in real time, the behavior of system resources and their power consumption as workloads change dynamically by several orders of magnitude within a day or a week?
3. How can we design efficient, self-adjusting optimization mechanisms, including game-theoretic techniques, that continuously learn, execute, monitor, and improve themselves in meeting the collective objectives of power management and performance improvement?
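For challenge 2, one standard baseline for real-time workload prediction is Holt's linear (double) exponential smoothing, sketched below. This is a swapped-in textbook technique, not the project's predictor; the smoothing constants, the log-space variant, and the request series are illustrative assumptions.

```python
import math


def holt_forecast(series: list[float], alpha: float = 0.5, beta: float = 0.3) -> float:
    """One-step-ahead forecast using Holt's linear (double) exponential smoothing.

    alpha smooths the level and beta smooths the trend; the defaults are
    illustrative, not tuned values.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend


def holt_forecast_log(series: list[float], alpha: float = 0.5, beta: float = 0.3) -> float:
    """Same forecast applied to log(demand), so the trend acts multiplicatively.

    Assumes strictly positive demand. Working in log space is one possible way
    to cope with load swinging over orders of magnitude, not a result of this project.
    """
    logs = [math.log(y) for y in series]
    return math.exp(holt_forecast(logs, alpha, beta))


hourly_requests = [120.0, 480.0, 1900.0, 7600.0]  # hypothetical ramp, about 4x per hour
print(round(holt_forecast(hourly_requests)), round(holt_forecast_log(hourly_requests)))
```

On this made-up ramp, the linear version forecasts below the last observation because its additive trend lags multiplicative growth, while the log-space version continues the ramp. Which behavior is preferable depends on how real workloads evolve, so the choice should be validated against measured traces.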

The models and solution methods are developed in the following steps. First, a mixed programming model is developed to minimize the power consumption of the memory system, which sits at the lowest (component) level of a data center, while maintaining its performance requirements. Building on this model, a game model is constructed that captures the competition among the different systems/platforms within the data center for a limited electric power budget. Non-cooperative and cooperative solutions are determined and compared in order to find the most satisfying outcome for the entire system. In the next step, each element of each data center will be treated as an agent in an agent-based game. Using simulation and sensitivity analysis, the agents' most satisfying strategies will be determined with respect to the overall performance of the entire system.

Beyond developing a practical methodology, several theoretical issues must be examined, such as the existence and uniqueness of a non-cooperative Nash equilibrium and the choice of cooperative solution concepts. Since different solution concepts can lead to different outcomes, one objective of the research going forward is to identify the solution concepts that best fit the particular problems under investigation.
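The comparison between non-cooperative and cooperative outcomes can be illustrated with a toy game. This is a minimal sketch under a hypothetical payoff model, not the project's actual game: n platforms each choose a power draw p_i from a shared budget B, and platform i earns w_i * p_i * (1 - P/B), where P is the total draw, so performance per watt degrades as the budget is approached. The non-cooperative outcome is found by sequential best-response dynamics; the cooperative outcome is taken to be the Nash bargaining solution with a zero disagreement point. The functions `utility`, `nash_equilibrium`, and `nash_bargaining` are illustrative names.

```python
"""Toy power-budget game (hypothetical payoff model, for illustration only)."""


def utility(i: int, p: list[float], w: list[float], budget: float) -> float:
    """Payoff of platform i: w_i * p_i * (1 - P / B), with P the total draw."""
    return w[i] * p[i] * (1.0 - sum(p) / budget)


def nash_equilibrium(n: int, budget: float, tol: float = 1e-9,
                     max_rounds: int = 10_000) -> list[float]:
    """Non-cooperative outcome via sequential best-response dynamics.

    Setting d(utility)/d(p_i) = 0 gives p_i = (B - others) / 2, clipped at 0;
    w_i cancels. The game has a strictly concave weighted potential, so these
    updates converge. Closed form: p_i = B / (n + 1).
    """
    p = [0.0] * n
    for _ in range(max_rounds):
        largest_change = 0.0
        for i in range(n):
            others = sum(p) - p[i]
            best = max(0.0, (budget - others) / 2.0)
            largest_change = max(largest_change, abs(best - p[i]))
            p[i] = best
        if largest_change < tol:
            break
    return p


def nash_bargaining(n: int, budget: float) -> list[float]:
    """Cooperative outcome: maximize the product of payoffs (disagreement point 0).

    Equivalent to maximizing sum(log p_i) + n * log(1 - P / B); the first-order
    conditions give p_i = B / (2n), so only half of the budget is drawn in total.
    """
    return [budget / (2.0 * n)] * n


if __name__ == "__main__":
    weights = [1.0, 2.0]  # hypothetical performance-per-watt weights
    budget = 300.0        # hypothetical shared budget in watts
    for label, alloc in [("non-cooperative", nash_equilibrium(2, budget)),
                         ("cooperative", nash_bargaining(2, budget))]:
        payoffs = [utility(i, alloc, weights, budget) for i in range(2)]
        print(label, [round(x, 2) for x in alloc], [round(u, 2) for u in payoffs])
```

With two platforms and a 300 W budget, the equilibrium has each platform drawing 100 W (payoffs 33.33 and 66.67), while the bargaining solution has each drawing 75 W (payoffs 37.5 and 75.0). Acting selfishly, the platforms draw more power and both end up worse off, a small instance of why the choice of solution concept matters.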

This project is sponsored by NSF Grant No. 0615323.