Tuesday, August 30, 2011

Information collection

I was trying to understand a few basic questions:
a. Does a search engine query the entire Web whenever a new search is done, or does it keep a copy of all the websites and search that copy?
b. How does a search engine know when a new web page or document is added, or when existing documents are updated, somewhere on a remote web server? How does it update itself?

I found this video helpful in answering the above questions: http://www.youtube.com/watch?v=RLyKLo6StLg

Google, for example, has programs that it periodically sends out to the Web to find new pages or documents. These programs return with the words used on each page. Google has a big database that maps those words to the web page. So when a user does a search, Google looks in this database and finds the web pages matching the text entered by the user.
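
To make the word-to-page mapping concrete, here is a minimal sketch in Python. It is only a toy illustration of the idea, not how Google actually implements it. The page URLs, the page text, and the function names build_index and search are all made up for this example, and the "crawl" is assumed to have already happened.

```python
from collections import defaultdict

# Hypothetical stand-in for pages a crawler has already fetched.
# The URLs and text are invented for illustration.
crawled_pages = {
    "http://example.com/a": "search engines crawl the web",
    "http://example.com/b": "a crawler visits each web page",
}


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the pages matching the first word, then narrow down.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(crawled_pages)
print(sorted(search(index, "web")))
print(sorted(search(index, "crawler web")))
```

The search only looks at the stored index, never at the live pages. That is the answer to question (a): the engine searches its own copy. Re-running build_index after a fresh crawl is the toy equivalent of the periodic update in question (b).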



  1. Yes, information retrieval over documents generally maintains a bag of words. The same concept is used here, with web pages instead of documents. This video explains the concept clearly.