Tamer Elsayed
University of Maryland
9:30am, October 6, 2008
Hornbake 2119
Abstract
In this talk, I will discuss the problem of computing pairwise document similarity in large text collections. This general problem appears in many different applications such as text clustering, ad-hoc retrieval, and co-reference resolution. A simple MapReduce solution to the problem will be presented, along with different optimizations that make the solution more efficient but still effective. I will then illustrate how the solution can be leveraged in the context of one application, identity resolution in email collections.
About the Speaker
Tamer Elsayed is a Ph.D. candidate in the Computer Science Department at the University of Maryland. He has earned B.Sc. and M.Sc. degrees in Computer Science from Alexandria University (Egypt) and a second M.Sc. in Computer Science from the University of Maryland. His research interests include identity resolution in
informal media and large-scale text processing using the MapReduce framework.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment