Friday, May 19, 2006

Google's White Paper

Last year I took an Internet marketing class at BYU from Paul Allen (Provo Labs CEO), and I wanted to share ten interesting things I learned by reading The Anatomy of a Large-Scale Hypertextual Web Search Engine.
1- PageRank can be thought of as a model of user behavior. Imagine a random surfer who starts on some page and keeps clicking on links without ever hitting "back", occasionally getting bored and jumping to another random page. The probability that this random surfer visits a page is its PageRank.

2- Anchors often provide more accurate descriptions of web pages than the pages themselves.

3- Metadata efforts have largely failed with web search engines, because any text on a page that is not directly shown to the user gets abused to manipulate search engines.

4- How the whole Google system works, which sounds pretty complicated to me.

5- Google uses a hand-optimized compact encoding, since it requires far less space than a simple encoding and far less bit manipulation than Huffman coding.

6- At peak speeds, Google’s system can crawl over 100 web pages per second using four crawlers.

7- PageRank can be personalized by increasing the weight of a user's home page or bookmarks.

8- The use of proximity information helps increase relevance a great deal for many queries.

9- Google really struggles to come up with high-quality pages for every query. That’s something I had not thought about before.

10- Google's data structures are optimized so that a large document collection can be crawled, indexed, and searched with little cost.
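
To make points 1 and 7 a bit more concrete, here is a minimal Python sketch of the random-surfer idea. It is only an illustration under my own assumptions, not Google's actual code: the function name `pagerank`, the toy link graph, the normalized form of the formula, and the way pages with no outgoing links are handled are all hypothetical choices. The damping factor of 0.85 is the value the paper says is usually used.

```python
def pagerank(links, damping=0.85, personalization=None, iterations=50):
    """Estimate PageRank by repeatedly applying the random-surfer update.

    links maps each page to the list of pages it links to.
    personalization (optional) maps pages to non-negative weights that say
    where a bored surfer prefers to jump; uniform if omitted.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)

    # Distribution the surfer uses when getting bored and jumping.
    if personalization is None:
        jump = {p: 1.0 / n for p in pages}
    else:
        total = sum(personalization.get(p, 0.0) for p in pages)
        jump = {p: personalization.get(p, 0.0) / total for p in pages}

    rank = dict(jump)  # starting guess
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps instead of clicking.
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Otherwise the surfer follows one outgoing link at random.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: on a page with no links the surfer always jumps.
                for p in pages:
                    new_rank[p] += damping * rank[page] * jump[p]
        rank = new_rank
    return rank


# Hypothetical three-page web, just for illustration.
toy_web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}

uniform = pagerank(toy_web)
# Point 7: put all the jump weight on one page, e.g. a user's bookmark.
personalized = pagerank(toy_web, personalization={"blog": 1.0})
```

In this toy web, "home" ends up with the highest rank because both other pages link to it, and putting the jump weight on "blog" raises that page's rank compared with the uniform case.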
