- Homework 3 (or think of it as Homework 2, part b). Due 10/13 in class. (We might discuss your answers.)
- ("Reading Comprehension") Read the paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine," which contains a description of the Google search engine circa 1998 (i.e., before it became a company). Answer the following questions:
- What are Fancy hits?
- Why are there two types of barrels--the short and the long?
- How is indexing parallelized?
- How does Google show that it doesn't quite care about recall?
- How does Google avoid crawling the same URL multiple times?
- What are some of the memory saving things they do?
- Do they use TF/IDF directly or indirectly?
- Do they normalize the vectors? (Why not?)
- Can they support proximity queries?
- How are "page synopses" made?
- List some of the parameters that need to be initialized for their search engine. Are the default values they pick reasonable?
This is the class blog for CSE494/598 at ASU. The class homepage is http://rakaposhi.eas.asu.edu/cse494
Thursday, October 6, 2011
Paper Reading assignment: Due 10/13 (Updated the homework page too)