CIS Colloquium, Mar 20, 2009, 03:00PM - 04:00PM, Wachman 447

Autonomic Cloud Systems Management: A Machine Learning Approach

Cheng-Zhong Xu, Wayne State University

Cloud computing, unlocked by virtualization, is emerging as an increasingly important service-oriented computing paradigm. Systems management is key to providing accurate service availability and performance data and to enabling on-demand, real-time capacity planning that meets service demands dynamically. Management remains essential because virtualization does not reduce the complexity of a system. In fact, running multiple virtual machines (VMs) on top of a physical computing infrastructure increases overall system complexity and poses new challenges in systems management. Optimizing one component may compromise others, leading to overall performance degradation, and frequent component failures can further reduce system productivity. This talk starts with a review of the challenges in the design of large-scale cloud computing systems. A machine learning approach is then introduced for tackling the performance and reliability problems. Two case studies will be presented. The first covers anomaly detection, bottleneck identification, and VM autoconfiguration. The second covers proactive failure management, which deals with failures before they occur in cloud systems. Empirical models built from statistical learning exhibit great potential to help overcome the challenges of scale and complexity in current and future networked computer systems.

Dr. Cheng-Zhong Xu is a Professor in the Department of Electrical and Computer Engineering at Wayne State University and the Director of the Laboratory for Networked Computing Systems. Dr. Xu's research interests include resource management in distributed and parallel systems, high-performance cluster computing, and scalable and secure Internet services. He has published more than 140 peer-reviewed journal and conference papers in these areas. He is the lead co-author of the book "Load Balancing in Parallel Computers" (Kluwer Academic, 1996), one of the first research monographs to address the load balancing issue systematically. Dr. Xu's most recent book, "Scalable and Secure Internet Services and Architecture" (Chapman & Hall/CRC Press, 2005), provides an in-depth analysis of Internet services in a unified framework from the performance perspective. Dr. Xu currently serves on the editorial boards of IEEE Transactions on Parallel and Distributed Systems and the Journal of Parallel and Distributed Computing. He is a recipient of Wayne State University's Faculty Research Award in 2000, the President's Award for Excellence in Teaching in 2002, and the Career Development Chair Award in 2003. Dr. Xu received a Ph.D. in Computer Science from the University of Hong Kong in 1993.

