What Is Google PageRank? A Guide (with Code)
While no longer a public-facing metric, Google PageRank still plays an important role in web search. This Google algorithm measures webpages’ importance and relevance based on the quantity and quality of links pointing to them, with incoming links counting as votes.
PageRank is far from being the only criterion in the ranking of Google search results, but it still has a significant impact on results and on search engine optimisation. A page with a high PageRank is more likely to appear among the first few results on the first page of search results and to attract more organic traffic.
This article sheds more light on Google PageRank, exploring its history and development, the PageRank algorithm, the importance of links, the connection between PageRank and SEO, strategies for boosting PageRank, and much more.
History and Development of PageRank
In 1996, Robin Li created RankDex, which used backlinks to measure the popularity and quality of the sites it indexed, and keywords in links’ anchor text to identify relevance. Li’s work on RankDex inspired a young Stanford University PhD student named Lawrence (Larry) Page to develop a web search algorithm that used the links pointing to a site. Page went on to co-found Google with Sergey Brin.
In 1998, Google appeared on the scene and revolutionised web search, thanks to its PageRank algorithm. An early paper by Google co-founders Sergey Brin and Larry Page explained that the citation (link) graph of the web was an important and largely unused resource. PageRank was the original algorithm that Google used to calculate web pages’ importance, and it’s what set Google apart from other search engines.
Here’s a glance at the history of Google PageRank:
- 1 April 1998: Page and Brin publish “The Anatomy of a Large-Scale Hypertextual Web Search Engine”.
- 1 September 1998: Page and Brin file the first PageRank patent.
- 4 September 1998: Google is incorporated.
- 11 December 2000: Google launches Google Toolbar.
- 17 June 2004: Google files the Reasonable Surfer patent.
- 12 October 2006: Google files the Seed Sets patent.
- 8 March 2016: Google retires Toolbar PageRank.
How PageRank Made Google the Leading Search Engine
The world wide web was a very different place in the summer of 1993. It was still unfamiliar territory to most people, and those who did use it had to do so with specialised catalogues that were maintained by hand rather than with search engines.
W3Catalog, one of the earliest ancestors of today’s search engines, was created when the University of Geneva’s Oscar Nierstrasz wrote Perl scripts that periodically rewrote those catalogues into a standard format. W3Catalog was released on 2 September 1993.
Other early search engines appeared later that year, such as Aliweb and JumpStation. Aliweb relied on notifications from website administrators, while JumpStation used a web robot to find pages and build its index. JumpStation was also the first search engine to use a web form as its query program’s interface.
In 1994, Yahoo! launched its web directory, adding a search function in 1995. While Yahoo! was the web’s first popular search engine, its search function depended on the directory rather than on full-text copies of web pages. Google’s arrival with PageRank in 1998 changed things: instead of relying on hand-maintained directories, or on bots that couldn’t assess how relevant a page was, it used the web’s link structure to produce more accurate search results.
Google made such an impact that, in 2000, Yahoo! partnered with Google to carry the young company’s search results and adverts. The partnership lasted for four years until Yahoo! developed its in-house search and ad serving systems in 2004.
The accuracy, accessibility, and relevance of Google’s search results continued to attract users, many of whom realised that they could go straight to the search engine rather than via sites such as Yahoo!. As a result, Google became the single most popular search engine on the internet.
Understanding the PageRank Algorithm
The complex PageRank algorithm bases its measurement of web pages’ importance and relevance on the number and quality of links pointing to those pages. The higher the number of links and the greater the value and relevance of those links’ originating pages, the higher the web page’s prestige.
The theory behind the original algorithm was that a link from one site to another was like a vote of authority and trust. A page with more links pointing to it should have a higher ranking and should be trusted more than other pages. That said, PageRank never counted all links from all pages equally.
Let’s delve into the calculations behind PageRank using an example from Brin and Page’s paper.
Assume that pages T1…Tn point to page A. The parameter d is a damping factor usually set to 0.85. C(A) is the number of links going out of page A. To calculate page A’s PageRank, the search engine uses the following calculation: PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn)).
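To make the formula concrete, the minimal Python sketch below evaluates it once for a single page A. The three inbound pages, their PageRank values and their outbound link counts are hypothetical numbers chosen for illustration, and `pagerank_of_page` is an illustrative helper, not code from the paper.

```python
def pagerank_of_page(inbound, d=0.85):
    """Evaluate PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) once.

    inbound: list of (PR(Ti), C(Ti)) pairs for the pages Ti that link to A.
    """
    return (1 - d) + d * sum(pr / c for pr, c in inbound)


# Hypothetical inbound pages T1..T3 as (PageRank, number of outbound links)
inbound_to_a = [(1.0, 2), (0.5, 1), (2.0, 4)]
print(round(pagerank_of_page(inbound_to_a), 4))  # prints 1.425
```

Each inbound page contributes its PageRank divided by its number of outbound links, so a page that links out to many URLs passes less to each of them.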
The Importance of Links and Backlinks
Links and backlinks play an important role in PageRank’s ranking of pages. In this context, links refer to links between pages on your website, while backlinks are links to your site’s pages from other websites.
A solid internal linking structure on your website makes it easier for PageRank to flow through your site and to pass authority to pages that have no backlinks from other websites. Backlinks are important for your website as they:
- Act as votes of confidence, especially when they’re from high-quality websites.
- Help websites distribute their authority (PageRank) – a backlink from a site with a high PageRank boosts your own site’s PageRank score.
- Help strengthen your website’s position, especially if they come from a diverse range of reputable websites.
To sum up, good-quality backlinks from reputable, relevant sites with a high PageRank are good endorsements for your site, as far as Google’s algorithms go. This ultimately means potentially higher search engine result page rankings and better search visibility for your website.
PageRank and SEO
Even though PageRank is not the primary factor in Google’s search algorithm, a high PageRank can still help your site rank higher in search results, while a low PageRank will have the opposite effect. Other factors commonly cited by SEOs as influencing rankings include:
- Backlinks
- Keyword optimisation
- Quality content
- Technical SEO
- User experience
- Schema markup
- Brand signals (how your brand is perceived online)
- Social signals such as likes and shares on social media
Calculating PageRank
When Google calculates PageRank, it assumes that every page has a quantifiable importance. The PageRank once displayed publicly in Google Toolbar summarised this as a score from 0 to 10 (0 being least important, 10 being most important), although the underlying values are continuous rather than whole numbers. The algorithm uses the number and value of the links pointing to a page to calculate its PageRank.
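At first, one backlink to a page counted as one vote for the page being linked to. In the original formulation, each page started with a value of 1, so PageRank summed to the total number of pages; later versions normalise PageRank into a probability distribution, so each of N pages starts at 1/N (for example, 0.25 in a four-page web).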
The process is iterative, making multiple passes over the web graph. At first, each page is given an equal PageRank score; with each iteration, the algorithm recalculates the scores based on the links between pages.
During each iteration, the algorithm divides a page’s PageRank score among its outgoing links. A fraction of the page’s importance is passed to each page it links to, allowing those pages to accumulate more PageRank.
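The passes described above can be sketched in Python on a hypothetical four-page site. This is a minimal illustration using the basic formula from Brin and Page’s paper, assuming every page has at least one outgoing link; the page names, the starting score of 1.0 and the helper `pagerank_iterations` are illustrative assumptions, and a real implementation would use sparse matrices over billions of pages.

```python
def pagerank_iterations(links, d=0.85, passes=3):
    """Run a fixed number of PageRank passes over a small link graph.

    links maps each page to the list of pages it links to. Every page
    starts with the same score (1.0); each pass splits every page's
    current score equally among its outgoing links.
    Assumes every page has at least one outgoing link.
    """
    scores = {page: 1.0 for page in links}
    for n in range(1, passes + 1):
        incoming = {page: 0.0 for page in links}
        for page, targets in links.items():
            share = scores[page] / len(targets)  # fraction passed along each link
            for target in targets:
                incoming[target] += share
        # Apply the basic formula to every page using last pass's scores
        scores = {page: (1 - d) + d * incoming[page] for page in links}
        print(f"pass {n}:", {p: round(s, 3) for p, s in scores.items()})
    return scores


# Hypothetical site: every page links back to the home page
site = {
    "home": ["products", "about"],
    "products": ["home", "blog"],
    "about": ["home"],
    "blog": ["home"],
}
pagerank_iterations(site)
```

Printing each pass shows the home page, which receives links from every other page, accumulating PageRank while the scores across the site still add up to the number of pages.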
The Basic Formula and Factors Influencing PageRank
As a quick reminder, the basic PageRank formula is:
$$PR(A) = (1 - d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where PR(A) is the PageRank of web page A.
The formula determines the PageRank of A based on:
- the damping factor (d), set between 0 and 1, representing the probability that a user will continue clicking on links. It is typically set to 0.85 (85%). In the original model d is the same for every link, but in practice the likelihood of a click depends on where the link sits: a link in the header menu is more likely to be clicked than one in the footer menu, an idea later reflected in Google’s Reasonable Surfer patent.
- PR(Ti) is the PageRank of inbound linking page Ti
- C(Ti) is the number of outbound links on inbound linking page Ti
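The damping factor has a simple interpretation as a "random surfer". The sketch below is an illustrative Monte Carlo simulation under the original model’s assumptions: at each step the surfer follows a random outgoing link with probability d and otherwise jumps to a random page. The four-page graph, the step count, the seed and the helper `random_surfer` are hypothetical choices. The share of visits each page receives approximates its PageRank normalised to sum to 1; the basic formula gives the same values multiplied by the number of pages.

```python
import random
from collections import Counter


def random_surfer(links, d=0.85, steps=100_000, seed=42):
    """Estimate normalised PageRank by simulating a random surfer.

    With probability d the surfer clicks a random outgoing link on the
    current page; otherwise (probability 1 - d) they jump to a random page.
    Assumes every page has at least one outgoing link.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    pages = list(links)
    current = rng.choice(pages)
    visits = Counter()
    for _ in range(steps):
        visits[current] += 1
        if rng.random() < d:
            current = rng.choice(links[current])  # keep clicking links
        else:
            current = rng.choice(pages)  # get bored and jump anywhere
    return {page: visits[page] / steps for page in pages}


site = {
    "home": ["products", "about"],
    "products": ["home", "blog"],
    "about": ["home"],
    "blog": ["home"],
}
print(random_surfer(site))
```

With enough steps, the visit shares settle close to the values the iterative calculation produces once they are divided by the number of pages; the home page is visited most because every other page links to it.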
In effect, A’s PageRank is built from the PageRank of each page linking to it, divided by the number of other URLs that linking page’s PageRank is also being distributed to.
Once page A gains more inbound links, its PageRank increases. The pages A links to then receive more PageRank, which can (directly or indirectly) feed back to page A and other pages on the website.
Because of this feedback loop, the formula is iterative: Google runs the calculation numerous times until it stabilises, as the change in PageRank with each successive iteration becomes minuscule and eventually plateaus.
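A minimal sketch of running the calculation "until it stabilises" follows, assuming a convergence tolerance of 1e-6 (an arbitrary choice for illustration, not a Google setting) and the basic formula above. It also illustrates the knock-on effect of a new inbound link on a hypothetical four-page site: when page B adds a link to page A, both A and the page A links to (C) end up with higher scores. The page names and the helper `pagerank_until_stable` are assumptions.

```python
def pagerank_until_stable(links, d=0.85, tol=1e-6, max_iter=1000):
    """Repeat the basic PageRank update until no score changes by more than tol.

    Assumes every page has at least one outgoing link.
    Returns the final scores and the number of iterations used.
    """
    scores = {page: 1.0 for page in links}
    for iteration in range(1, max_iter + 1):
        incoming = {page: 0.0 for page in links}
        for page, targets in links.items():
            for target in targets:
                incoming[target] += scores[page] / len(targets)
        new_scores = {page: (1 - d) + d * incoming[page] for page in links}
        # Largest change of any page since the previous iteration
        change = max(abs(new_scores[p] - scores[p]) for p in links)
        scores = new_scores
        if change < tol:
            break
    return scores, iteration


before = {"home": ["A", "B"], "A": ["C"], "B": ["home"], "C": ["home"]}
# Same site, but page B now also links to page A
after = {"home": ["A", "B"], "A": ["C"], "B": ["home", "A"], "C": ["home"]}

for label, graph in [("before", before), ("after", after)]:
    scores, iterations = pagerank_until_stable(graph)
    rounded = {p: round(s, 3) for p, s in scores.items()}
    print(label, f"({iterations} iterations):", rounded)
```

In this example B now splits its vote two ways, so it passes less to the home page, while A and C both gain.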
More Advanced PageRank
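Since Larry Page and Sergey Brin’s original ‘The Anatomy of a Large-Scale Hypertextual Web Search Engine’ paper, advances have been made in determining PageRank. The formula below is a hypothesised version that incorporates a number of additional weights:
$$PR(A) = (1 - d) + d \sum_{i=1}^{n} \frac{PR(T_i) \, D(T_i) \, F(T_i) \, T(T_i) \, P(T_i)}{C(T_i)}$$
This more advanced formula determines the PageRank of A with the following added weights, each applied to inbound linking page Ti: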
- Time decay factor (D) to reduce the importance of older URLs (even if this does sound counterintuitive to SEOs)
- Freshness (F) to reflect the freshness of the inbound page Ti to page A
- Trustworthiness score (T) of the inbound linking page Ti, where more trustworthy pages are given a higher weighting; think of this as the inverse of a spam score
- Personalised user preferences (P) for the inbound linking page Ti, representing the user’s browsing history and the links they’re most likely to click on; these signals may have been boosted by the introduction and widespread adoption of the Chrome browser.
How Site Structure and Link Quality Contribute to PageRank
The Code
The code below implements the hypothesised advanced PageRank formula as a Python function:
def calculate_pagerank(damping_factor, inbound_pageranks, decay_factors, freshness_scores,
                       trustworthiness_scores, personalized_scores, outbound_links):
    # Sum the weighted PageRank passed on by each inbound linking URL
    weighted_sum = 0.0
    for pr, decay, fresh, trust, personal, links in zip(
        inbound_pageranks, decay_factors, freshness_scores,
        trustworthiness_scores, personalized_scores, outbound_links,
    ):
        # Each linking URL shares its weighted PageRank across all of its outbound links
        weighted_sum += pr * decay * fresh * trust * personal / links
    # Apply the damping factor to get page A's PageRank
    return (1 - damping_factor) + damping_factor * weighted_sum
The function calculates page A’s PageRank in a single pass, iterating through the inbound link values and applying the advanced weights to each one.
To see the code in action, run it with some example scores:
# Example usage:
damping_factor = 0.85
inbound_pageranks = [40, 21, 15]  # PageRank scores of inbound linking URLs
decay_factors = [0.9, 0.8, 0.7]  # Decay factors for inbound linking URLs
freshness_scores = [0.95, 0.85, 0.75]  # Freshness scores for inbound linking URLs
trustworthiness_scores = [0.9, 0.85, 0.8]  # Trustworthiness scores for inbound linking URLs
personalized_scores = [0.8, 0.75, 0.7]  # Personalized scores for inbound linking URLs
outbound_links = [5, 8, 10]  # Outbound link counts for inbound linking URLs
pagerank = calculate_pagerank(damping_factor, inbound_pageranks, decay_factors, freshness_scores, trustworthiness_scores, personalized_scores, outbound_links)
print("PageRank score for URL 'A':", round(pagerank, 2))
PageRank score for URL 'A': 5.68
Based on these 3 inbound links, A has a PageRank of about 5.68.
In practical terms, it would be quite an undertaking to collect all the data on every single inbound link URL to your site pages, despite the Google founders’ claim that PageRank for 26 million web pages could be computed in a few hours on a medium-sized workstation. The web has grown exponentially larger and more complex since 1998; thankfully, computing power has grown exponentially too.
If you’re looking to evaluate the impact of your technical SEO recommendations, you’ll want to simulate and quantify the PageRank following your proposed implementation.
Assuming you’ve already determined the True Internal PageRank (TIPR) of your site pages, the goal is to work out the additional PageRank from your technical SEO recommendations, which may come from:
- Adding or removing inbound links
- Merging URLs via canonical tags, redirects or other methods
- Blocking, noindexing or nofollowing URLs
Using the function above, you can add the result to the pre-implementation TIPR to work out the new PageRank.
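One rough way to quantify such a change is to compute internal PageRank for the site’s link graph before and after the proposed implementation and compare each page. The sketch below assumes a hypothetical five-page site and models one recommendation: linking a guide page that currently has no internal links pointing to it from the home page. It uses the basic formula rather than the weighted function above, and the page names and helpers `internal_pagerank` and `pagerank_change` are illustrative assumptions.

```python
def internal_pagerank(links, d=0.85, tol=1e-8, max_iter=1000):
    """Basic PageRank over an internal link graph (every page needs an outgoing link)."""
    scores = {page: 1.0 for page in links}
    for _ in range(max_iter):
        incoming = {page: 0.0 for page in links}
        for page, targets in links.items():
            for target in targets:
                incoming[target] += scores[page] / len(targets)
        new_scores = {page: (1 - d) + d * incoming[page] for page in links}
        if max(abs(new_scores[p] - scores[p]) for p in links) < tol:
            return new_scores
        scores = new_scores
    return scores


def pagerank_change(current_links, proposed_links):
    """Return each page's PageRank change if the proposed link graph were implemented."""
    before = internal_pagerank(current_links)
    after = internal_pagerank(proposed_links)
    return {page: after[page] - before[page] for page in current_links}


# Hypothetical site: no internal links currently point to the guide page
current = {
    "home": ["category", "contact"],
    "category": ["home", "product"],
    "product": ["home"],
    "contact": ["home"],
    "guide": ["product"],
}
# Proposed recommendation: link to the guide from the home page
proposed = dict(current, home=["category", "contact", "guide"])

for page, delta in pagerank_change(current, proposed).items():
    print(f"{page}: {delta:+.3f}")
```

Adding the link also means the home page splits its PageRank three ways instead of two, so some existing pages may lose PageRank; comparing every page, not just the target, is the point of simulating the change.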
PageRank vs. Other Ranking Algorithms
Google’s PageRank algorithm operates quite differently from some other ranking algorithms. For example, Bing reportedly uses hubs and authorities when ranking search results. These two scores affect each other during update rounds: hubs that link to higher-scoring authorities, and authorities that receive links from high-value hubs, each get higher updated scores.
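The hubs-and-authorities idea comes from Jon Kleinberg’s HITS algorithm. The sketch below is a minimal HITS implementation on a hypothetical link graph, included only to show how the two scores reinforce each other in alternating rounds; it is not a description of how Bing actually ranks pages, and the graph, round count and helper `hits` are assumptions.

```python
import math


def hits(links, rounds=20):
    """Minimal HITS: alternate authority and hub updates, normalising each round.

    links maps each page to the pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hubs = {p: 1.0 for p in pages}
    authorities = {p: 1.0 for p in pages}
    for _ in range(rounds):
        # A page's authority is the sum of the hub scores of pages linking to it
        authorities = {p: 0.0 for p in pages}
        for page, targets in links.items():
            for target in targets:
                authorities[target] += hubs[page]
        # A page's hub score is the sum of the authority scores of pages it links to
        hubs = {p: sum(authorities[t] for t in links.get(p, [])) for p in pages}
        # Normalise so the scores stay bounded between rounds
        for scores in (authorities, hubs):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hubs, authorities


graph = {
    "directory": ["site_a", "site_b", "site_c"],
    "blog": ["site_a", "site_b"],
    "site_a": ["site_b"],
    "site_c": [],
}
hubs, authorities = hits(graph)
print("best hub:", max(hubs, key=hubs.get))  # prints: best hub: directory
print("best authority:", max(authorities, key=authorities.get))  # prints: best authority: site_b
```

Unlike PageRank, which gives each page a single score, HITS gives every page two: the directory scores well as a hub because it links to good authorities, while site_b scores well as an authority because good hubs link to it.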
Both Google and Bing generally deliver accurate results, although Bing can be slower to make updates, which could lead it to rank older websites with outdated information above newer sites with more relevant content.
Improving Your PageRank
Even though PageRank’s no longer a public-facing metric and is seldom mentioned in basic SEO strategies, it’s still possible to improve your site’s PageRank. The best way to do this is to improve your site’s internal links and to build good-quality backlinks to your site. A few strategies for doing this include:
- Find broken backlinks and ask site owners to fix them.
- Share site content with publishers who might link back to it.
- Write good-quality content for relevant sites as a guest blogger.
- Respond to online media requests.
- Ask site owners to remove links to your site from low-quality websites or sites that have been flagged by Google.
- Add your website to online directories.
- Create high-quality site content that other websites will want to link to.
Tools and Software for Analysing PageRank
Although it’s no longer possible to see Google’s PageRank, some tools and software estimate a comparable measure of a page’s link strength. Ahrefs’ URL Rating (UR) is one example and is often used as a replacement metric for PageRank.
UR indicates the strength of a page’s link profile on a scale from 0 to 100. The higher the score, the stronger the link profile. Like PageRank, UR takes internal and external links into account when calculating a page’s link profile strength. However, UR ignores the value of some links and does not include nofollow links in its calculations.
Limitations and Criticisms of PageRank
PageRank may have helped cement Google’s reputation as the world’s leading search engine, but it is not without limitations or criticism.
One of these limitations is that PageRank scores do not take current events into account. Rather than being calculated at the time of search, which would be a slow, costly process, PageRank scores are calculated during Google’s indexing of site pages on the web, when it scans pages for key phrases and topics.
The problem here is that Google doesn’t treat recently updated pages as authoritative until they have gained more exposure and links from other authoritative pages. Google has tried to get around this issue by handling news articles separately and by listing news results separately at the top of general search results.
Another limitation is that PageRank is query-independent: it scores pages by their links, so on its own it cannot interpret queries written in natural language or containing information beyond keywords. Nor can it rank keywords or key phrases by their importance to the user. To overcome these issues, Google has developed separate natural language processing and machine learning algorithms, which also help it deliver accurate voice search results.
The search engine has also updated its core ranking systems’ algorithms to increase the relevance of search results, reduce unoriginal content, and reduce spam (such as expired websites repurposed as spam repositories) in search results.
In summary, despite their limitations and pitfalls, PageRank scores are an important part of SEO. However, they’re only one of the many integral components that work together to get (and keep) a website at the top of the SERPs.