Deduplication Storage System: How It Works and Why It Changes the Storage Industry

Professor Kai Li

Princeton University, Data Domain (an EMC Company)

Date and time: 11.30am - 12.30pm, Thursday 20th August, 2009

Venue: 12.13.03 (Building 12, Level 13, Room 03)

Abstract:

Deduplication has emerged as the hottest technology in the storage industry.  The latest deduplication storage system can achieve a 10x to 30x compression ratio on backup data, with an inline multi-stream deduplication throughput of 1.5 Gbytes/sec (by contrast, a common compression tool such as gzip or WinZip achieves about 2-3x compression at about 30 Mbytes/sec on a typical server).  Deduplication storage systems have now become the standard for backup and remote data replication in enterprise data centers.  What is deduplication?  How does a deduplication storage system work?  Can deduplication storage systems go beyond the backup use case?

This talk answers these questions by first giving an introduction to deduplication technology and then describing the internals of the Data Domain deduplication file system.  We will give an in-depth discussion of how to solve the key technical challenge of achieving high deduplication throughput with minimal CPU and memory resources, and of how deduplication technology can impact nearline storage and primary storage systems.

About the speaker:

Dr. Kai Li is a Charles Fitzmorris Professor in the Computer Science Department at Princeton University. His research interests include operating systems, computer architecture, storage systems, and large-scale data analysis and visualization systems. He has led several research projects at Princeton, including the Shared Virtual Memory project, which studies how to build shared memory on a cluster without physically shared memory; the Scalable I/O project, which attacks I/O bottleneck problems for supercomputers; the Scalable High-performance Really Inexpensive MultiProcessor (SHRIMP) project, which investigates how to build high-performance servers on a cluster; and the Scalable Display Wall project, which explores how to build and use a high-resolution, wall-size display system to visualize massive datasets. During his sabbatical from Princeton in 2001, he co-founded Data Domain, Inc., which built the first commercial deduplication storage system and became the leading company in the deduplication storage market.

 

He joined Princeton after receiving his Ph.D. degree from Yale University in 1986.  Prior to that, he received his B.S. degree from Jilin University in China and his M.S. degree from the University of Science and Technology of China, Academy of Sciences of China.  He became an ACM Fellow in 1998.


Seminar Organisation

Seminars are free and open to the general public. No booking is necessary. If you are interested in giving a presentation in this seminar series, or would like to suggest a speaker, please contact Xiaodong Li, the seminar co-ordinator.