Friday, October 10, 2008

Problem Set 4: PageRank

Got comments, questions, gripes, issues, etc. on Problem Set 4: PageRank in my cloud computing course? Post here!

Sunday, October 5, 2008

Computing Pairwise Document Similarity in Large Collections: A MapReduce Perspective

Tamer Elsayed
University of Maryland

9:30am, October 6, 2008
Hornbake 2119

Abstract
In this talk, I will discuss the problem of computing pairwise document similarity in large text collections. This general problem appears in many different applications such as text clustering, ad-hoc retrieval, and co-reference resolution. A simple MapReduce solution to the problem will be presented, along with different optimizations that make the solution more efficient but still effective. I will then illustrate how the solution can be leveraged in the context of one application, identity resolution in email collections.

About the Speaker
Tamer Elsayed is a Ph.D. candidate in the Computer Science Department at the University of Maryland. He has earned B.Sc. and M.Sc. degrees in Computer Science from Alexandria University (Egypt) and a second M.Sc. in Computer Science from the University of Maryland. His research interests include identity resolution in informal media and large-scale text processing using the MapReduce framework.

Problem Set 3: Boolean retrieval

Got comments, questions, gripes, issues, etc. on Problem Set 3: Boolean retrieval in my cloud computing course? Post here!

Problem Set 2: Inverted index construction

Got comments, questions, gripes, issues, etc. on Problem Set 2: Inverted index construction in my cloud computing course? Post here!

Sorry, I meant to post this before the problem set was due... but I'd still be interested in feedback, reactions, comments, etc.
