Got comments, questions, gripes, issues, etc. on Problem Set 4: PageRank in my cloud computing course? Post here!
Friday, October 10, 2008
Sunday, October 5, 2008
Computing Pairwise Document Similarity in Large Collections: A MapReduce Perspective
Tamer Elsayed
University of Maryland
9:30am, October 6, 2008
Hornbake 2119
Abstract
In this talk, I will discuss the problem of computing pairwise document similarity in large text collections. This general problem appears in many different applications such as text clustering, ad-hoc retrieval, and co-reference resolution. A simple MapReduce solution to the problem will be presented, along with different optimizations that make the solution more efficient but still effective. I will then illustrate how the solution can be leveraged in the context of one application, identity resolution in email collections.
About the Speaker
Tamer Elsayed is a Ph.D. candidate in the Computer Science Department at the University of Maryland. He has earned B.Sc. and M.Sc. degrees in Computer Science from Alexandria University (Egypt) and a second M.Sc. in Computer Science from the University of Maryland. His research interests include identity resolution in informal media and large-scale text processing using the MapReduce framework.
Problem Set 3: Boolean retrieval
Got comments, questions, gripes, issues, etc. on Problem Set 3: Boolean retrieval in my cloud computing course? Post here!
Problem Set 2: Inverted index construction
Got comments, questions, gripes, issues, etc. on Problem Set 2: Inverted index construction in my cloud computing course? Post here!
Sorry, I meant to post this before the problem set was due... but I'd still be interested in feedback, reactions, comments, etc.