12 Jan

Making Google SEO predictable with data science


This is the outline of a presentation delivered at a seminar by Andreas Voniatis.


Using SEO for high rankings on Google is an effective way to drive the growth of your business – but in the face of RankBrain, Penguin, and Panda, how can you make sure that the SEO advice you receive will actually work? The answer lies in data science and the same mathematical approach that Google itself uses. This presentation explains the key aspects of SEO, from onsite to offsite practices, and – crucially – how you can assess your website and your competitors’ websites to accurately predict the impact of your optimisation.


You are a marketing manager. You already get business from referrals and word of mouth. However, you’d like your business to grow faster, and you know that high rankings on Google are one way to make that happen. You realise you need actionable SEO advice that will actually work.


Using data science, you will be able to get actionable advice. Here we can see a list of recommendations, in descending order of importance, that shows:

  • The probability of that advice increasing your Google rankings
  • The competitive benchmarks your site needs to be working towards

How SEO works

Broadly speaking, the main components of SEO to get right are:

  • Content
  • Site Design & Architecture
  • Links from traffic sources
  • Social Media

Content and site design are often referred to as ‘onsite’ or ‘technical SEO’. Links and social media are referred to as ‘offsite SEO’.

The descending order of importance is calculated from R², which tells us how strongly each factor is associated with rankings and is used here as an indication of how likely it is that following the advice will increase rankings.

The advice is also targeted: we don’t just tell you what needs changing, we also give you the target number to work towards, shown here.

How to predict ranking factors

First, you need to know your competitors. Knowing them tells you who to gather data from and which competitors to model. Then you need to check your competitors’ rankings, as these will be the main metric that you’re going to predict.

You also need a basket of ranking factors that your competitors could be doing well on in order to find new SEO advantages. You’ll need to gather (feature) data on each of your competitors for each of those potential ranking factors.

For example, if you want to test how much of a factor mobile UX is for increasing your rankings in your industry, then you’ll need to include mobile UX in your basket.

Once you have your feature data, you’re ready to perform data science and find out what really works in your industry.
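As an illustration of what that feature data might look like once gathered, here is a minimal sketch that combines rankings and factor measurements into one row per site. The function name, the site names and the scores are all hypothetical, and the layout is only one possible way to organise the data.

```python
def build_feature_table(avg_positions, factor_data):
    """Combine rankings and factor measurements into one row per site.

    Returns (rows, incomplete): complete rows ready for analysis, plus a list
    of (site, missing_factors) for sites that still need data collecting.
    """
    rows, incomplete = [], []
    for site, position in avg_positions.items():
        missing = [name for name, values in factor_data.items() if site not in values]
        if missing:
            incomplete.append((site, missing))
            continue
        row = {"site": site, "avg_position": position}
        row.update({name: values[site] for name, values in factor_data.items()})
        rows.append(row)
    return rows, incomplete


# Hypothetical measurements for three sites and two candidate ranking factors.
avg_positions = {
    "yoursite.example": 8.0,
    "rival-one.example": 2.0,
    "rival-two.example": 13.0,
}
factor_data = {
    "mobile_ux_score": {
        "yoursite.example": 71,
        "rival-one.example": 88,
        "rival-two.example": 64,
    },
    "title_reading_ease": {
        "yoursite.example": 58,
        "rival-one.example": 74,
    },
}

rows, incomplete = build_feature_table(avg_positions, factor_data)
print(rows)        # sites with a value for every factor
print(incomplete)  # sites that are still missing some factor data
```

Keeping incomplete sites separate makes it obvious where more data collection is needed before any analysis is run.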


The first step is to find your competitors by checking the top ranked sites in Google for ALL of your desired search phrases. You’ll need to do that every day for 30 days so that you get a sense of which sites consistently rank for your desired search phrases.
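One way to judge consistency, sketched here with made-up domains, is to count on how many of the checked days each site appears in the top results. The function name and the 80% cut-off are assumptions for illustration, not figures from the presentation.

```python
from collections import Counter


def consistent_competitors(daily_top_sites, min_share=0.8):
    """Return domains that appear in the top results on at least
    `min_share` of the days checked, mapped to their share of days."""
    days = len(daily_top_sites)
    if days == 0:
        return {}
    # set(day) ensures a domain is counted at most once per day.
    appearances = Counter(domain for day in daily_top_sites for domain in set(day))
    return {
        domain: count / days
        for domain, count in appearances.items()
        if count / days >= min_share
    }


# Hypothetical data: top-ranked domains for one search phrase on three days.
daily_top_sites = [
    ["a.example", "b.example", "c.example"],
    ["a.example", "c.example", "d.example"],
    ["a.example", "b.example", "c.example"],
]
print(consistent_competitors(daily_top_sites, min_share=0.8))
```

In practice the list would hold 30 days of results for every search phrase rather than three days for one.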


Once you have the data, you can run summary statistics on the competitors to see who is doing well and who isn’t.

When choosing which competitors to model:

  • Include sites whose average performance is above and below yours
  • Choose sites that follow a similar business (and content) model
  • Don’t include non-industry sites like Wikipedia (unless you’re in the online encyclopedia market)
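The summary statistics could be as simple as each site’s average position and how much it varies. The sketch below assumes daily positions have already been collected; the sites, the positions and the exclusion list are invented for illustration.

```python
from statistics import mean, pstdev


def summarise_rankings(rank_history, exclude=()):
    """Summarise each site's daily positions (1 = top of Google).

    Returns (site, average position, spread) tuples, best average first.
    """
    summary = [
        (site, mean(positions), pstdev(positions))
        for site, positions in rank_history.items()
        if site not in exclude and positions
    ]
    return sorted(summary, key=lambda row: row[1])


# Hypothetical daily positions for one search phrase.
rank_history = {
    "yoursite.example": [8, 9, 7, 8],
    "rival-one.example": [2, 3, 2, 1],
    "rival-two.example": [12, 15, 11, 14],
    "wikipedia.org": [1, 1, 1, 1],
}

# Non-industry sites are excluded before comparing.
for site, avg, spread in summarise_rankings(rank_history, exclude={"wikipedia.org"}):
    print(f"{site:20s} average position {avg:5.2f}, spread {spread:4.2f}")
```

Sorting by average position makes it easy to see which competitors sit above and below your own site.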

Ranking Factors

Courtesy of Search Engine Land, we can see that there are over 200 ranking factors. So if we already know what all these factors are, what is the point of testing them?

The point is that these generic ranking factors give us ideas we can test to make our SEO strategy more actionable and predictable.

Data sources

Once we know which ranking factors we wish to test, we need to decide where we will get the data from. For example, to get ranking data you may want to use a tool like Authority Labs.


Readability was one of the key ingredients of the Google Penguin algorithm, as identified by MathSight. If you wanted to test the hypothesis that readability or reading age is a factor, you could go to readability-score.com and use the bulk tool to get a full diagnosis of your own and your competitors’ site pages.
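To give a feel for what a readability score measures, here is a rough sketch of the standard Flesch Reading Ease formula. It is an assumption that this is the kind of metric meant here, the syllable counter is a crude approximation, and a dedicated tool will give somewhat different numbers.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing silent 'e' (as in "make") as not adding a syllable,
    # but keep it for words ending in "le" (as in "table").
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical title-tag style copy: one plain, one dense.
simple = "We sell red shoes. They fit well. Buy a pair now."
dense = ("Our organisation facilitates comprehensive optimisation "
         "of multinational ecommerce infrastructure.")
print(round(flesch_reading_ease(simple), 1))
print(round(flesch_reading_ease(dense), 1))
```

Short sentences made of short words score higher, so the plain copy comes out well ahead of the dense copy.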

Collecting the data

For some but not all ranking factors, the data needs to be collected daily, at random times of the day. The purpose is to construct a reliable dataset on which to perform the statistical tests that will help us make decisions.
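A schedule with one randomly timed collection per day could be generated along these lines. This is a minimal sketch: the function name, the start date and the seed are illustrative assumptions, and the actual collection job is left out.

```python
import random
from datetime import date, datetime, time, timedelta


def random_collection_schedule(start, days, seed=None):
    """Return one randomly timed datetime per day for `days` consecutive days."""
    rng = random.Random(seed)
    schedule = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        # Pick a random second within the day (0 to 86399).
        seconds = rng.randrange(24 * 60 * 60)
        schedule.append(datetime.combine(day, time.min) + timedelta(seconds=seconds))
    return schedule


# Hypothetical start date; a fixed seed makes the schedule reproducible.
for slot in random_collection_schedule(date(2024, 1, 12), days=5, seed=42):
    print(slot.isoformat(timespec="minutes"))
```

Randomising the time of day helps avoid always sampling Google at the same point in its daily fluctuations.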

Test your ranking factors

Now that you have your data set, you’re ready to perform statistical tests to establish which ranking factors are significant. This means checking each and every potential ranking factor to understand its nature and making the necessary adjustments so that each factor becomes predictable when performing regression.
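The presentation does not say which adjustments are meant. One common example, assumed here purely for illustration, is to check whether a factor is heavily skewed and apply a log transform if so. The function names, the threshold and the backlink counts are hypothetical.

```python
import math


def skewness(values):
    """Population skewness: 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / n / variance ** 1.5


def log_transform_if_skewed(values, max_skew=1.0):
    """Apply log(1 + v) when the data is strongly right-skewed.

    Otherwise return an unchanged copy. Negative values are never transformed.
    """
    if min(values) >= 0 and skewness(values) > max_skew:
        return [math.log1p(v) for v in values]
    return list(values)


# Hypothetical backlink counts: a few sites have far more links than the rest.
backlinks = [5, 8, 12, 20, 35, 80, 200, 900]
adjusted = log_transform_if_skewed(backlinks)
print(f"skew before: {skewness(backlinks):.2f}, after: {skewness(adjusted):.2f}")
```

A factor dominated by one or two extreme sites tends to fit a straight line poorly, which is why a transform of this kind is often tried before regression.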

Is your factor really a factor?

According to this graph (in the slide deck), yes. We can see that there is a 76% chance that increasing the reading ease of your title tags to 72 will increase your rankings by 20 positions.

Note how the analysis both specifies the required reading age AND tells you the likelihood of increasing your rankings as a result.

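Anything with an R² of 0.30 or above is treated here as a significant ranking factor. Strictly speaking, R² is the proportion of the variation in rankings that is explained by the line of best fit through your site’s and your competitors’ data points, so it measures how closely the data points follow that line rather than a literal probability.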
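For readers who want to see the calculation, here is a minimal sketch of fitting a line of best fit and computing R² by ordinary least squares. The data points are invented and are not the figures from the slide deck.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("xs and ys must each contain some variation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single-factor linear fit, R^2 is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: title-tag reading ease and average Google position per site.
reading_ease = [45, 52, 58, 63, 70, 74]
avg_position = [28, 24, 19, 15, 9, 6]

slope, intercept, r2 = fit_line(reading_ease, avg_position)
benchmark = 72  # target reading ease to work towards
print(f"R^2 = {r2:.2f}")
print(f"Predicted position at reading ease {benchmark}: {slope * benchmark + intercept:.1f}")
```

The slope and intercept give the predicted position at any benchmark value, while R² indicates how far that prediction can be trusted.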

Insight into action

So, if reading age is a significant factor, you will want to check all landing pages that don’t meet the required metric, rewrite that content until the copy meets the metric, and then rerun your SEO study to see how well your predictions performed.
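Finding the pages that need rewriting can be a simple filter against the benchmark. This sketch assumes you already have a score per landing page; the URLs and scores are made up.

```python
def pages_below_benchmark(page_scores, benchmark):
    """Return (url, score) pairs scoring under the benchmark, worst first."""
    failing = [(url, score) for url, score in page_scores.items() if score < benchmark]
    return sorted(failing, key=lambda item: item[1])


# Hypothetical reading-ease scores for four landing pages.
page_scores = {"/": 75.2, "/pricing": 61.0, "/services": 48.5, "/contact": 80.1}

for url, score in pages_below_benchmark(page_scores, benchmark=72):
    print(f"{url}: reading ease {score} (target 72)")
```

Listing the worst pages first gives a natural order in which to tackle the rewrites.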

Making Google predictable with data science

For your desired keywords, gather enough ranking data on the ranking sites so that you can perform statistical analysis to find and quantify the competitors in your industry.

Pick a ranking factor (feature) you wish to test and decide how you will collect the data. Remember that some features will require daily data collection.

Perform the appropriate statistical tests, such as linear regression, to see what benchmarks your site’s SEO should be working towards and what each benchmark is worth in terms of Google rankings.

Naturally, this means that no matter what Google algorithm changes (or new algorithms), new content designs, UX practices or social media trends emerge, you can keep on top of what’s working for your competitors. The key is to test as many Google ranking factors as possible in a statistically valid manner.

It’s not a perfect science, but it’s far more reliable than any alternative that doesn’t use data science.
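Testing several factors and listing them in descending order of R² might look like the following sketch. The 0.30 cut-off comes from the talk; the function names, factors and values are hypothetical.

```python
def r_squared(xs, ys):
    """R^2 of a straight-line fit between one factor and rankings."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # a factor with no variation explains nothing
    return (sxy * sxy) / (sxx * syy)


def rank_factors(features, positions, threshold=0.30):
    """Return (factor, R^2) pairs at or above the threshold, strongest first."""
    scored = [(name, r_squared(values, positions)) for name, values in features.items()]
    kept = [(name, r2) for name, r2 in scored if r2 >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical average positions for five sites and three candidate factors.
positions = [3, 7, 12, 18, 25]
features = {
    "title_reading_ease": [78, 71, 66, 58, 50],
    "mobile_ux_score": [90, 80, 85, 60, 75],
    "page_word_count": [800, 820, 790, 810, 805],
}

for name, r2 in rank_factors(features, positions):
    print(f"{name}: R^2 = {r2:.2f}")
```

Factors that fall below the cut-off drop out of the list, so what remains is a shortlist ordered by how closely each factor tracks rankings.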

Technology for effective SEO

To make the above work, you’ll need technology that collects data on over 100 Google ranking hypotheses for you and your competitors, and that uses artificial intelligence to make your SEO campaigns in Google more predictable and quantifiable.

Andreas Voniatis
Data Scientist