Date and time: 11.30am-12.30pm, Friday 23rd July, 2004
Venue: 10.11.04
Chair: Margaret Hamilton
Abstract:
Search engines need to evaluate queries extremely fast, a challenging task given the quantities of data being indexed. A significant proportion of the queries posed to search engines involve phrases, and this talk shows how phrase queries can be efficiently supported with low disk overheads. The talk explains the nextword index structure we invented in 1998, and details experiments that show how it can be combined with a partial nextword, partial phrase, and conventional inverted index. Using our structures, phrase queries can be evaluated in a quarter of the time required to evaluate such queries with an inverted file alone, and the additional space overhead is only 26% of the size of the inverted file.
This work will appear soon in the ACM Transactions on Information Systems as H.E. Williams, J. Zobel, and D. Bahle, "Fast Phrase Querying with Combined Indexes".
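For readers unfamiliar with the idea, the following is a minimal sketch of how a nextword-style index can answer a phrase query. It is an illustration under simplifying assumptions, not the implementation described in the talk: the function names, the in-memory dictionary layout, and the whitespace tokenisation are all hypothetical, and the combination with partial phrase and inverted indexes is not shown. The sketch stores, for each pair of adjacent words, the positions at which the pair occurs, and resolves a phrase by checking that consecutive pairs line up.

```python
from collections import defaultdict


def build_nextword_index(docs):
    """Map firstword -> nextword -> doc_id -> positions of firstword.

    Hypothetical in-memory layout, chosen for illustration only.
    """
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for doc_id, text in enumerate(docs):
        terms = text.lower().split()
        for pos in range(len(terms) - 1):
            index[terms[pos]][terms[pos + 1]][doc_id].append(pos)
    return index


def phrase_query(index, phrase):
    """Return {doc_id: [start positions]} for an exact phrase of 2+ words."""
    terms = phrase.lower().split()
    if len(terms) < 2:
        raise ValueError("this sketch only handles phrases of two or more words")
    pairs = list(zip(terms, terms[1:]))

    # Candidate start positions come from the first word pair.
    first, nxt = pairs[0]
    postings = index.get(first, {}).get(nxt, {})
    candidates = {doc: set(positions) for doc, positions in postings.items()}

    # Each later pair must occur at the matching offset from the start.
    for offset, (first, nxt) in enumerate(pairs[1:], start=1):
        postings = index.get(first, {}).get(nxt, {})
        narrowed = {}
        for doc, starts in candidates.items():
            pair_positions = set(postings.get(doc, ()))
            kept = {p for p in starts if p + offset in pair_positions}
            if kept:
                narrowed[doc] = kept
        candidates = narrowed

    return {doc: sorted(starts) for doc, starts in candidates.items()}
```

Because each word pair is much rarer than either word alone, the lists fetched for a phrase are typically far shorter than the corresponding inverted lists for common words, which is the intuition behind the speed-up the abstract reports.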
About the speaker:
Hugh E. Williams is the Associate Professor in Information Retrieval in the School of Computer Science and IT at RMIT University. His research interests include building better web search engines, multimedia and genomic information retrieval, and designing fast data structures. When not at work, Hugh likes to go running, watch Richmond play footy, follow the cricket, and write books. Hugh has a PhD from RMIT University.
Seminars are free and open to the general public. No booking is necessary.
If you are interested in giving a presentation in this seminar series, or would like to suggest speakers, please contact James Harland, the seminar co-ordinator.
James Harland Last modified: Tue Jun 1 11:34:52 EST 2004