- Dec 2022
-
ilpubs.stanford.edu:8090
-
PageRank handles both these cases and everything in between by recursively propagating weights through the link structure of the web.
Another justification of the PageRank algorithm is that hyperlinks act like citations and influence a page's rank. For example, if a page is referred to by many other pages (well cited), it is deemed important and will have a higher PageRank.
-
Another intuitive justification is that a page can have a high PageRank if there are many pages that point to it, or if there are some pages that point to it and have a high PageRank.
So, does this mean that hyperlinks directly influence PageRank? If a website were linked to many times, would its PageRank significantly increase?
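The paper makes this recursion concrete with a simplified formula. For a page A with pages T1...Tn pointing to it, where C(T) is the number of links going out of page T and d is a damping factor:

PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

So additional inbound links do raise a page's PageRank, but each link is weighted by the rank of the linking page and diluted by that page's number of outgoing links; many links from low-rank pages count for little.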
-
One important variation is to only add the damping factor d to a single page, or a group of pages. This allows for personalization and can make it nearly impossible to deliberately mislead the system in order to get a higher ranking.
The model assumes a "random surfer" who clicks on links at random and never hits the back button. The probability that this surfer lands on a given page is that page's PageRank.
The damping factor is the probability that the random surfer will get bored and jump to another page entirely. Adding the damping factor to only a single page or group of pages enables personalized search results and makes it difficult to manipulate the system in order to get a higher ranking.
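A minimal Python sketch of this random-surfer iteration (the graph, damping value, and function name here are illustrative assumptions, not from the paper; this uses the probability-normalized variant, where all ranks sum to 1):

```python
# Minimal power-iteration sketch of PageRank (illustrative, not the
# paper's implementation). `links` maps each page to the pages it links to.
def pagerank(links, d=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # (1 - d) is the chance the bored surfer jumps to a random page
        new_ranks = {p: (1 - d) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = d * ranks[page] / len(outlinks)
                for target in outlinks:
                    new_ranks[target] += share
            else:
                # dangling page: treat it as linking to every page
                for p in pages:
                    new_ranks[p] += d * ranks[page] / n
        ranks = new_ranks
    return ranks

# Tiny example web: A and C both point to B, so B ends up ranked highest.
web = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
print(pagerank(web))
```

To personalize, the uniform (1 - d) / n jump would instead target only a chosen page or group of pages, as the quoted variation describes.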
-
Keywords
Definitions:
World Wide Web = An information system on the internet that allows documents to be connected to other documents by hypertext links.
-
Search Engine = A program that searches for and identifies items in a database that correspond to keywords or characters specified by the user.
-
Information Retrieval = The process of obtaining information system resources relevant to an information need from a collection of those resources, typically using searches based on some form of indexing.
-
PageRank = PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
-
Google = The search engine discussed in this paper; it uses the PageRank algorithm to deliver high-quality search results over large datasets.
-
-
Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems.
Prior to Google, how did other search engines work? Did they have crawlers, or did they have to manually add websites to their databases? Make sure to look into this later, or wait to see if it's addressed in the paper.
-
1.3.2 Academic Search Engine Research
Another goal of Google is a refinement of the previous goal (improved search quality): creating an improved search experience for researchers and students.
The World Wide Web was originally created to facilitate academic research, and Google aims to build a system that can support such research activities.
-
1.1 Web Search Engines
The biggest issue with earlier Web search engines, like the World Wide Web Worm (WWWW) and later AltaVista, was their limited scale. Query loads grew from roughly 1,500 per day (WWWW, 1994) to about 20 million per day (AltaVista, 1997), and indexes grew from 2 million to 100 million documents. This growth was expected given the technology, but existing systems were not built to handle it efficiently.
-
1.2. Google: Scaling with the Web
Google takes on the challenge of building a search engine with efficient crawling technology, efficient use of storage space, and fast query handling and ranking. These tasks will inevitably get harder as demand on the Web increases, but Google is designed to scale to very large data sets.
-
1.3.1 Improved Search Quality
Improved search quality means that users can find exactly what they want. At the time, search engines were unreliable, and it was little more than a myth that a search engine could find "almost anything on the Web".
-