PageRank is the advance in search engine technology which made Google the pre-eminent search engine. Since Larry Page and Sergey Brin’s original paper, ‘The Anatomy of a Large-Scale Hypertextual Web Search Engine’, advances have been made in determining PageRank, which is the relative importance of content on the web, i.e. its authority. Being data-driven we cover
PageRank Formula
The Basic Formula and Factors Influencing PageRank
As a quick reminder, the basic PageRank formula is calculated below:
PageRank PR(A) is the PageRank of web page A.
The formula determines the PageRank of A based on:
- the damping factor (d) set between 0 and 1, representing the probability that a user will continue clicking on links. It typically has a value around 0.85 or 85%. Naturally this probability will depend on the location of the hyperlink. For example a hyperlink in the header menu is more likely to get clicked on than one in the footer menu.
- PR(Ti) is the PageRank of inbound linking page Ti
- C(Ti) is the number of outbound links on inbound linking page Ti
In effect, the PageRank of a page is determined by the PageRank of each page linking to it, divided by the number of outbound links that the linking page’s PageRank is shared across.
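Putting the terms above together, the basic formula as given in the Page and Brin paper can be written as follows, where T1 to Tn are the n pages linking to A:

```latex
PR(A) = (1 - d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}
```

As a quick check on the arithmetic, a page with a single inbound link from a page with PageRank 4 and 8 outbound links would receive 0.15 + 0.85 × (4 / 8) = 0.575 at d = 0.85.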
Of course, once Page A has more inbound links added to it, it becomes more powerful. The pages it links to then become more powerful too, which (directly or indirectly) feeds back to Page A and other pages in the website.
For that reason, the formula is iterative: Google would run the calculation numerous times until it stabilises, as the change in PageRank with each successive iteration becomes minuscule and eventually plateaus.
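The iteration can be sketched on a toy example. The code below is a minimal illustration of the basic formula only, not Google’s implementation: the function name iterate_pagerank, the three-page graph, the starting value of 1.0 and the stopping tolerance are all assumptions made for this sketch, and it assumes every page has at least one outbound link.

```python
def iterate_pagerank(links, d=0.85, tol=1e-6, max_iter=100):
    """Repeat the basic PageRank formula until the scores stop changing.

    `links` maps each page to the list of pages it links to (hypothetical data).
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # assumed starting value for every page
    for iteration in range(1, max_iter + 1):
        new_pr = {}
        for page in pages:
            # Sum PR(Ti) / C(Ti) over every page Ti that links to `page`
            inbound = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * inbound
        # Largest change in any page's score during this pass
        change = max(abs(new_pr[page] - pr[page]) for page in pages)
        pr = new_pr
        if change < tol:
            break
    return pr, iteration


# Hypothetical site: A links to B and C, B links to C, C links to A
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores, n_iter = iterate_pagerank(graph)
print(n_iter, {page: round(score, 3) for page, score in scores.items()})
```

Each pass reuses the scores from the previous pass, so the change per pass shrinks until it falls below the tolerance and the loop stops.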
More Advanced PageRank
The more advanced formula:
determines the PageRank of A with added weights:
- Time decay factor (D) to reduce the importance of older URLs (even if this does sound counterintuitive to SEOs)
- Freshness (F) to reflect the freshness of the inbound page Ti to page A
- Trustworthiness score (T) of page A where more trustworthy pages are given a higher weighting, think of this as the inverse of a spam score
- Personalised user preferences (P) assigned to page A representing the user’s browsing history and links they’re most likely to click on, which would have been boosted by the introduction and widespread adoption of the Chrome browser.
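One way to write the weighted formula, read off the Python function in the next section rather than taken from a published source, is shown below. Two things here are assumptions of this reconstruction: the trust weight is written Trust_i to avoid clashing with the page label T_i, and all four weights are indexed per inbound link i because that is how the function applies them, even though the list above describes T and P as scores of page A.

```latex
PR(A) = (1 - d) + d \sum_{i=1}^{n} \frac{PR(T_i) \cdot D_i \cdot F_i \cdot \mathrm{Trust}_i \cdot P_i}{C(T_i)}
```

Setting every weight to 1 recovers the basic formula.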
Python Code For PageRank
The code below executes the most advanced version of the PageRank formula as a Python function:
def calculate_pagerank(damping_factor, inbound_pageranks, decay_factors, freshness_scores, trustworthiness_scores, personalized_scores, outbound_links):
    # Sum the weighted PageRank passed on by each inbound linking URL
    pagerank = 0
    for i in range(len(inbound_pageranks)):
        pagerank += (inbound_pageranks[i] * decay_factors[i] * freshness_scores[i] * trustworthiness_scores[i] * personalized_scores[i] / outbound_links[i])
    # Apply the damping factor; int() truncates the score to a whole number
    pagerank = int((1 - damping_factor) + damping_factor * pagerank)
    return pagerank
The function loops through the inbound links, sums each link’s weighted PageRank contribution, applies the damping factor and returns the result truncated to a whole number.
To see the code in action, run it with the example scores below:
damping_factor = 0.85
inbound_pageranks = [40, 21, 15]  # PageRank scores of inbound linking URLs
decay = [0.9, 0.8, 0.7]  # Decay factors for inbound linking URLs
freshness = [0.95, 0.85, 0.75]  # Freshness scores for inbound linking URLs
trust = [0.9, 0.85, 0.8]  # Trustworthiness scores for inbound linking URLs
personalized = [0.8, 0.75, 0.7]  # Personalized scores for inbound linking URLs
outbound_links = [5, 8, 10]  # Outbound link counts for inbound linking URLs
pagerank = calculate_pagerank(damping_factor, inbound_pageranks, decay, freshness, trust, personalized, outbound_links)
print("PageRank score for URL 'A':", pagerank)

PageRank score for URL 'A': 5
Based on 3 inbound links, A has a PageRank of 5.
In practical terms, it would be quite an undertaking to collect all the data on every single inbound link URL to your site pages, despite the Google founders’ claim that the web’s PageRank could be computed on a medium-sized desktop computer. The web has grown vastly larger and more complex since 1998. Thankfully, computing power has grown enormously too.
If you’re looking to evaluate the impact of your technical SEO recommendations, you’ll want to simulate and quantify the PageRank following your proposed implementation.
Assuming you’ve already determined the True Internal PageRank (TIPR) of your site pages, the goal is to work out the additional PageRank from your technical SEO recommendations which may come from:
- Adding or removing inbound links
- Merging URLs via canonical tags, redirection or other
- Blocking, noindexing or nofollowing URLs
Using the function above you can add the result to the pre-implementation TIPR to work out the new PageRank.
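As a rough illustration of that workflow, the sketch below estimates the uplift from adding one inbound link. It is only a sketch under stated assumptions: weighted_pagerank is a hypothetical variant of the function above that skips the int() truncation so small changes stay visible, the dictionary keys are names chosen for this example, and the proposed link’s scores are made up.

```python
def weighted_pagerank(d, inbound_links):
    """Untruncated variant of the article's function (hypothetical helper)."""
    total = sum(
        link["pr"] * link["decay"] * link["freshness"]
        * link["trust"] * link["personalized"] / link["outlinks"]
        for link in inbound_links
    )
    return (1 - d) + d * total


# The three inbound links from the example above
current_links = [
    {"pr": 40, "decay": 0.9, "freshness": 0.95, "trust": 0.9,
     "personalized": 0.8, "outlinks": 5},
    {"pr": 21, "decay": 0.8, "freshness": 0.85, "trust": 0.85,
     "personalized": 0.75, "outlinks": 8},
    {"pr": 15, "decay": 0.7, "freshness": 0.75, "trust": 0.8,
     "personalized": 0.7, "outlinks": 10},
]

# Made-up scores for a link the recommendation would add
proposed_link = {"pr": 30, "decay": 0.9, "freshness": 0.9, "trust": 0.9,
                 "personalized": 0.8, "outlinks": 12}

before = weighted_pagerank(0.85, current_links)
after = weighted_pagerank(0.85, current_links + [proposed_link])
uplift = after - before  # estimated additional PageRank from the new link
print(f"before={before:.2f} after={after:.2f} uplift={uplift:.2f}")
```

Removing a link can be simulated the same way by dropping its entry from the list, in which case the uplift comes out negative.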