What are some of the challenges with crawling the web?
How can a web developer make his/her page rank better?
Some challenges I found:
Because bandwidth is finite and costly, a crawler is best served by spending it on the most important content. With so much server-side generated content on the web, avoiding duplicate content is difficult (e.g. many different combinations of GET parameters can return the same data). Spider traps can also be a severe problem. Take, for example, a web calendar: if the spider followed the links on the calendar, it would keep crawling to the next day/year, and so on...
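To illustrate the duplicate-content point, here is a minimal sketch of URL canonicalization: by sorting query parameters and dropping ones assumed not to affect the page (the `IGNORED_PARAMS` list below is just an illustrative guess), many GET-parameter combinations collapse to a single key in the crawler's visited set.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Parameters assumed not to change page content (illustrative list only)
IGNORED_PARAMS = {"utm_source", "utm_medium", "sessionid", "ref"}

def canonicalize(url):
    """Map URLs that differ only in parameter order, case of the
    host, or tracking noise to one canonical form."""
    parts = urlparse(url)
    # Keep only meaningful parameters, sorted for a stable ordering
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query)
        if k not in IGNORED_PARAMS
    )
    return urlunparse((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        parts.params,
        urlencode(query),
        "",  # drop the fragment: it never reaches the server
    ))

# Two URLs that would fetch the same data become one visited-set entry:
canonicalize("http://Example.com/cal?b=2&a=1&utm_source=x")
# and
canonicalize("http://example.com/cal?a=1&b=2")
```

A real crawler would check this canonical form against its visited set before fetching; for traps like the calendar above, the usual defense is an additional per-host page budget or link-depth limit rather than URL normalization alone.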
Web page rankings:
There is an entire field called SEO (search engine optimization), which is in a sense reverse engineering search engines to get better rankings; it is too large to cover here. The typical tips you hear are avoiding duplicate content, getting more inbound links, etc.