Thursday, October 6, 2011

Paper Reading assignment: Due 10/13 (Updated the homework page too)

  • Homework 3 (or think of it as Homework 2 part b). Due 10/13 in class. (We might discuss your answers.)
  • ("Reading Comprehension") Read the paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine", which contains a description of the Google search engine circa 1998 (i.e., before it became a company). Answer the following questions:
    1. What are Fancy hits?
    2. Why are there two types of barrels--the short and the long?
    3. How is indexing parallelized?
    4. How does Google show that it doesn't quite care about recall?
    5. How does Google avoid crawling the same URL multiple times?
    6. What are some of the memory-saving techniques they use?
    7. Do they use TF/IDF directly or indirectly?
    8. Do they normalize the vectors? (why not?)
    9. Can they support proximity queries?
    10. How are "page synopses" made?
    11. List some of the parameters that need to be initialized for their search engine. Are the default values they pick reasonable?
