Boosting Schema Matchers

Associate Professor Avigdor Gal

Faculty of Industrial Engineering & Management, Technion

Date and time: 12.00pm - 1.00pm, Tuesday 17th February, 2009

Venue: 12.08.02 (Building 12, Level 8, Room 2)

Abstract:

Schema matching is recognized as one of the basic operations required for data and schema integration, and thus has a great impact on its outcome. We propose a new approach to combining matchers into ensembles, called Schema Matcher Boosting (SMB). This approach is based on a well-known machine learning technique called boosting. We present a boosting algorithm for schema matching with a unique ensembler feature, namely the ability to choose the matchers that participate in an ensemble. SMB introduces a new promise for schema matcher designers: instead of trying to design a perfect schema matcher that is accurate for all schema pairs, a designer can focus on finding better-than-random schema matchers. We provide thorough comparative empirical results showing that SMB outperforms, on average, any individual matcher. In our experiments we compared SMB with more than 30 other matchers and several ensembling approaches, including the Meta-Learner of LSD, over a real-world data set of 230 schemata. Our empirical analysis shows that SMB is consistently dominant, far beyond any individual matcher. Finally, we observe that SMB performs better than the Meta-Learner in terms of precision, recall and F-Measure.
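
To illustrate the general idea of boosting over a pool of matchers, here is a minimal sketch of a generic AdaBoost-style loop. It is not the SMB algorithm presented in the talk. It assumes each matcher casts a +1/-1 vote on every candidate attribute pair and that ground-truth labels are available for training. The matcher names, votes, and labels are hypothetical.

```python
import math

def boost_matchers(decisions, labels, rounds=3):
    """Generic AdaBoost-style selection over a pool of matchers (illustrative).

    decisions: dict mapping a matcher name to its +1/-1 votes per candidate pair.
    labels: ground-truth +1/-1 labels for the same candidate pairs.
    Returns a list of (matcher name, weight) in the order they were chosen.
    """
    n = len(labels)
    weights = [1.0 / n] * n          # start with uniform weights over pairs
    ensemble = []
    for _ in range(rounds):
        # Weighted error of every matcher under the current pair weights.
        errors = {
            name: sum(w for w, v, y in zip(weights, votes, labels) if v != y)
            for name, votes in decisions.items()
        }
        best = min(errors, key=errors.get)
        err = errors[best]
        if err >= 0.5:               # no matcher is better than random: stop
            break
        perfect = err == 0.0
        err = max(err, 1e-10)        # avoid division by zero below
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((best, alpha))
        if perfect:                  # nothing left to reweight
            break
        # Raise the weight of pairs the chosen matcher got wrong, then normalise.
        weights = [w * math.exp(-alpha * y * v)
                   for w, v, y in zip(weights, decisions[best], labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_vote(ensemble, decisions, i):
    """Weighted vote of the chosen matchers on candidate pair i."""
    score = sum(alpha * decisions[name][i] for name, alpha in ensemble)
    return 1 if score > 0 else -1

# Hypothetical training data: six candidate pairs and three matchers.
labels = [1, 1, 1, -1, -1, -1]
decisions = {
    "name":   [1, 1, -1, -1, -1, -1],   # wrong on one pair
    "type":   [1, -1, 1, -1, -1, 1],    # wrong on two pairs
    "random": [-1, 1, -1, 1, -1, 1],    # wrong on four pairs
}
ensemble = boost_matchers(decisions, labels)
chosen = [name for name, _ in ensemble]
```

In this toy run the worse-than-random matcher is never selected, which mirrors the point made above: an ensembler that chooses its participants only needs matchers that do better than chance.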

About the speaker:

Avigdor Gal is an Associate Professor at the Faculty of Industrial Engineering & Management at the Technion. He received his D.Sc. degree from the Technion in 1995 in the area of temporal active databases. He has published more than 80 papers in journals (e.g. Journal of the ACM (JACM), ACM Transactions on Database Systems (TODS), IEEE Transactions on Knowledge and Data Engineering (TKDE), ACM Transactions on Internet Technology (TOIT), and the VLDB Journal), books (Temporal Databases: Research and Practice) and conferences (ICDE, ER, CoopIS, BPM) on the topics of data integration, temporal databases, information systems architectures, and active databases. Avigdor is a steering committee member of IFCIS, a member of IFIP WG 2.6, and a recipient of the IBM Faculty Award for 2002-2004. He is a member of the ACM and a senior member of IEEE.


Seminar Organisation

Seminars are free and open to the general public. No booking is necessary. If you are interested in giving a presentation in this seminar series, or would like to suggest speakers, please contact Xiaodong Li, the seminar co-ordinator.