### Maths is still the foundation for solving technical problems

#### Cracking the Enigma code

Attempts to crack the Enigma code led to all kinds of technological advances as brilliant men and women sought to do the unthinkable: to break a code that seemed unbreakable. More than sixty years later, computing has moved forward at a pace that even those brilliant men and women might find startling. But some things haven’t really changed. In this article, we’ll look at how we’re still facing similar technological challenges today, although thankfully only the sales charts of companies are at stake.

In the film *The Imitation Game*, Hugh Alexander shows Alan Turing how the Bletchley Park team use histograms to mathematically deduce the Enigma code. Using histograms to summarise the letters in coded messages allows the team to see which letters occur most frequently and then identify, with some confidence, the vowels and consonants of the German alphabet. This worked even though the coded letters would change mid-message as part of the encryption process.

Their analysis, aided by the Polish government sharing their analytical methods, was then compared to a list of German words to help them identify the settings of the Enigma machine. Knowing the settings meant they could decode all of the Nazi-encrypted messages.
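The letter-counting step can be sketched in a few lines of Python. This is only a minimal illustration of a letter-frequency histogram, the kind of summary that works well against simple substitution ciphers; it is not a reconstruction of the methods actually used at Bletchley Park. The function names and the sample text are our own assumptions.

```python
from collections import Counter


def letter_histogram(text):
    """Count how often each letter A-Z appears, most frequent first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    return Counter(letters).most_common()


def print_histogram(text, width=30):
    """Print one bar per letter, scaled so the most common letter fills `width`."""
    counts = letter_histogram(text)
    if not counts:
        return
    highest = counts[0][1]
    for letter, count in counts:
        bar = "#" * max(1, round(width * count / highest))
        print(f"{letter} {count:3d} {bar}")


# Hypothetical sample text; any string works here.
print_histogram("WETTERBERICHT FUER DIE NACHT")
```

Letters with the longest bars are the first candidates for the most common letters of the underlying language.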

#### Cracking the Google code

Today, maths-driven data scientists like us have a number of codes to break, most of them related to search engines such as Google. For example, we try to understand why some pages of a client’s website rank well in Google and other pages do not. Like the Bletchley Park team, we plot histograms of each possible ranking factor.

These histograms help us understand a number of things, such as whether a factor like the time spent on a page is likely to have value for predicting rankings. We can also understand what improvements to site content and architecture will lead to a reader spending more time on the page.

Quite often, the distribution is skewed, so we need to do further work to ‘treat’ the data, for example by eliminating outliers, so that our model’s predictions of client SEO rankings will be accurate. In this example, rankings data is heavily positively skewed, which would generate inaccurate results if modelled untreated.

After treatment, the data is rendered more ‘normal’ looking in preparation for potential predictive modelling that will give clients actionable and statistically valid insights into how to improve their rankings.
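As a rough sketch of what ‘treatment’ can look like, the example below measures skewness before and after a log transform. The log transform is our choice for illustration; it is one common treatment alongside outlier removal, and not necessarily the one behind the charts described here. The ranking positions are made up.

```python
import math


def skewness(values):
    """Population skewness: positive means a long tail to the right."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        return 0.0
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / variance ** 1.5


def log_transform(values):
    """Compress large values with log(1 + x); requires values >= 0."""
    return [math.log1p(v) for v in values]


# Hypothetical ranking positions: most pages rank well, a few rank very badly.
positions = [1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20, 45, 90]

print(f"skewness before: {skewness(positions):.2f}")
print(f"skewness after:  {skewness(log_transform(positions)):.2f}")
```

A skewness closer to zero after the transform indicates a more symmetric, more ‘normal’ looking distribution, although it does not guarantee normality.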

### Working with multiple combinations is still difficult

#### 10,000 trillion possible ways

The last version of the Enigma machine had six scrambling rotors which generated over 10,000 trillion possible ways the machine could be configured.

Source: http://enigma.louisedade.co.uk/howitworks.html

To work with data on such a scale, Alan Turing designed and built an electro-mechanical computer known as a ‘bombe’ that could sift through all those probability calculations and guess the correct settings of the Enigma machine. Still, it was an amazing achievement that the team cracked the code with such odds stacked against them.

#### 200 possible ways – and that’s enough!

Fortunately, Google has approximately 200 signals, making the job of marketing data scientists today much easier than that of those at Bletchley Park. Although we don’t have a ‘bombe’, we use cloud computing servers instead, which help us:

- collect the data
- run algorithms to summarise and explore the data
- identify the crucial ranking factors

### Algorithm changes still keep us on our toes

#### Back to the beginning every few hours

To make matters worse, the code breakers not only had to sift through the large volume of calculations, but all the calculations also had to be done within hours. This was because the settings were changed on a daily basis.

Initially the military versions of the code-generating machines had three rotors, but the Germans added further rotors after becoming suspicious that the Allies were successfully decoding the messages. With each extra rotor, the number of combinations that had to be calculated went up by a factor of 26.
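As a worked illustration, consider only the starting positions of the rotors. Each rotor can start at any of 26 letters, so every extra rotor multiplies the count by 26. This simplification is ours: it ignores the choice and order of rotors and the plugboard, which push the real total far higher.

$$
26^3 = 17{,}576, \qquad 26^4 = 26 \times 17{,}576 = 456{,}976
$$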

For the codebreaking team, this was a matter of improving the computing efficiency so that the large numbers of calculations could be done quickly. In particular, Alan Turing turned to a lengthy probability analysis to help the computer identify ‘false stops’, that is, false rotor step changes.

#### Daily refinements make progress challenging

Although there are approximately 200 signals, these are split between a number of algorithms such as PageRank, Panda, Hummingbird and Penguin. These are theoretically refined on a daily basis as Google seeks to improve its ability to predict and rank sites by their ability to satisfy a reader’s search query.

While the search engine algorithms don’t suddenly change on a daily basis, there are static updates (such as Penguin and Panda) that can have a definite impact. For example, Google launched the Penguin update in April 2012 “to better catch sites deemed to be spamming its search results, in particular those doing so by buying links or obtaining them through link networks designed primarily to boost Google rankings.” (Search Engine Land).

To address this challenge, SEO data scientists turn to cloud computing and machine learning. These allow us to use ‘change point’ analysis to statistically prove which ranking factors have changed, by how much and what the new optimisation targets are. Most ranking factors will also be industry- and geography-specific. For example, the travel industry will have different content requirements from those of the legal sector.

#### What is change point analysis?

Change point analysis uses Bayesian mathematics to estimate the likelihood that a change (or multiple changes) took place. In the context of SEO, a site may have received a sudden upward (or downward) change in traffic. Change point analysis can help us determine whether the change in traffic was likely due to an event occurring on a certain date, such as a Google update.

The graph above shows the change point analysis confirming the site’s loss of traffic was indeed due to an event that occurred on 22nd May 2013, the same date Google Penguin 2.0 officially rolled out. The probability of traffic loss occurring because of this event is highest at that point. If we know what caused the loss in rankings, then we are far more likely to be successful in understanding the factors that win or lose traffic.
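To make the idea concrete, here is a minimal sketch of a Bayesian change point calculation. It is not the model behind the graph; it rests on simplifying assumptions of our own: exactly one change occurs, daily visits follow a Poisson distribution with a different rate before and after the change, each rate has a Gamma prior (the shape `a` and rate `b` values are arbitrary choices), and every day is equally likely beforehand to be the change point. The traffic figures are invented.

```python
import math


def segment_log_evidence(counts, a=1.0, b=0.01):
    """Log marginal likelihood of Poisson counts with a Gamma(a, b) prior on the rate.

    The factorial term is omitted because it is identical for every split
    and cancels when the posterior is normalised.
    """
    n, total = len(counts), sum(counts)
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + total) - (a + total) * math.log(b + n))


def change_point_posterior(counts):
    """Posterior probability that the rate changes just before index t."""
    n = len(counts)
    log_post = [
        segment_log_evidence(counts[:t]) + segment_log_evidence(counts[t:])
        for t in range(1, n)
    ]
    peak = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    norm = sum(weights)
    return {t: w / norm for t, w in zip(range(1, n), weights)}


# Hypothetical daily visits: traffic roughly halves from index 6 onwards.
daily_visits = [120, 131, 118, 125, 129, 122, 64, 58, 61, 66, 59, 63]

posterior = change_point_posterior(daily_visits)
most_likely = max(posterior, key=posterior.get)
print(f"most likely change point: index {most_likely}")
```

Plotting `posterior` against the date gives a curve that peaks at the most probable change date. Real traffic is usually noisier than a Poisson model allows, so a production analysis would need a more flexible likelihood and a comparison against the ‘no change’ case.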

### Concluding thoughts

Although the pace of technological change can feel breathtaking today, it’s reassuring – and humbling – to consider how many of the basic challenges we face are similar to those from sixty years ago. And we have so many more ways to tackle those challenges, with much less at stake, than those brilliant men and women at Bletchley Park. Still, we can learn from their approach.

In the same way they pioneered the use of computing to extract meaning from a fiendishly complicated data set, we too use the power of computing to comprehend an ever-expanding network of data:

1. We still use histograms, as they did at Bletchley Park, as a first step to understanding why some pages rank well on Google. We then treat the data and use predictive modelling.
2. Instead of using a physical computer as they did to crack the Enigma code, we now use cloud computing servers to collect data and run algorithms to summarise data and identify crucial ranking factors.
3. Building on what has been achieved in the last sixty years, we now use machine learning and change point analysis to statistically prove changes in rankings.

Maths-driven analysis remains one of the only ways to extract meaningful stories from complex data. Without that statistical rigour, we’re left with guesswork – and uncracked codes.

Contact us at [email protected] to measurably improve your search rankings. We’ll use the techniques described above to find solutions that are statistically proven to increase visibility, traffic, user engagement and online revenue.