Hierarchical Clustering and Summarization of Network Traffic Data

Abdun Mahmood

Department of Computer Science and Software Engineering, University of Melbourne

Date and time: 11.30am - 12.30pm, Friday 9th May, 2008

Venue: 10.08.04 (Building 10, Level 8, Room 4)

Abstract:

An important problem in Network Management is how to understand the pattern of usage of a network by its users. One problem faced when analyzing network traffic data traces is how to use the mix of different types of attributes present in the data in order to better understand the underlying traffic patterns. In general, data mining techniques can be made more efficient by exploiting the underlying structure of the data.

Network traffic data has a clear hierarchical pattern in some of its features, for example, IP addresses. Consequently, an important challenge is how to develop data mining techniques that can efficiently use this hierarchical structure. Analyzing the pattern of usage helps to identify potential resource bottlenecks and security problems. Resource bottlenecks may be due to new services, such as distributed file sharing, and security threats may be due to malicious activities, such as worm propagation and Distributed Denial of Service (DDoS) attacks. As networks continue to grow in capacity, any data mining technique used to identify patterns of network usage must cope with limited memory and computational resources.

In this talk, I present a hierarchical clustering technique to (1) identify similar types of network traffic, (2) adaptively sample input traffic data, and (3) summarize important trends in the output of the clustering method. To cope with the different types of attributes that characterize network traffic, we have developed a clustering technique that can exploit the hierarchical structure of attributes, together with categorical and numerical attributes. We identify the need to reduce the volume of network traffic data both at the input and in the output report of a network traffic analysis technique. To address the problem of scalability for clustering in this context, we have developed an adaptive sampling scheme coupled with a selection stage to ensure that both the larger traffic patterns from DoS attacks and the smaller traffic patterns from normal activities are represented in the sampled data that is used for clustering.

A key advantage of this approach is that we can increase the proportion of computational resources that are spent on new or unusual traffic patterns. To address the problem of how to generate a concise report of network traffic patterns from the results of clustering, we consider two different techniques to summarize network traffic reports so that the summarized report is a compact yet accurate summary of the traffic. A key advantage of this approach is that it can help identify redundant clusters in a hierarchical report of network traffic patterns. In comparison to existing methods for network traffic characterization, we demonstrate that our approach is more accurate and scalable for identifying a wide variety of patterns of network traffic usage.

About the speaker:

Abdun Naser Mahmood received the B.Sc. degree in Applied Physics & Electronics (with first class honours) in 1997, and the M.Sc. degree in Computer Science (with first class first position) in 1999, both from the University of Dhaka, Bangladesh. He joined the University of Dhaka as a lecturer in 2000 and took a leave of absence in 2003 to pursue his PhD degree at the University of Melbourne, Australia, where he is currently completing his thesis. His research interests include data mining techniques for network monitoring, and algorithm design for adaptive sorting and sampling.


Seminar Organisation

Seminars are free and open to the general public. No booking is necessary. If you are interested in giving a presentation in this seminar series, or would like to suggest speakers, please contact Xiaodong Li, the seminar co-ordinator.