
The SEO Tales: SEO Supercharged with Data Science and Python

by Andreas Voniatis Founder. Fractional SEO Consultant.

Hosted on the Duda YouTube page, three Python experts spoke live about how data science and Python can turbocharge your SEO efforts. Leading SEO consultants Andreas, Nitin, Olesia and Elias spent over an hour discussing how they use data science and Python in their own businesses.

The SEO Tales with Nitin: SEO Supercharged with Data Science and Python, filmed on July 5th, covered everything from what you need to know as a beginner to how to start using Python more effectively. 

The first guest Andreas is an SEO specialist using a data-driven science approach. Andreas, the founder of Artios, has been involved in SEO for 20 years now and uses Python to automate his processes. 

“I actually wanted to quit SEO about 13 years ago as I was falling out of love with it. Then I started to retrain for a career in data science. What was really funny is that the more I came to appreciate how search engines like Google work, made me fall in love with SEO again. It totally gave me a new way of looking at it. I’m self-taught in high mathematics, and since then, it’s led me to this point today. I started writing a book three years ago, and the book – Data-Driven SEO – has been published by Springer Apress.”

Olesia has been working with Python since 2015. Olesia focuses on using ChatGPT to automate the coding process required for building Python scripts and integrates the code into websites. 

Elias, from the US, has worked with data science and Python for a number of years. He realised that no package was all-encompassing for people wanting to boost their SEO efforts using data science models and Python, so he created Advertools.

Data science and Python both sound a bit Einstein at first. We hear you. But don’t worry; this session will take you well on your way to getting started in this field. Let’s find out what juicy topics were discussed during the live session.

Where Do You Start with Python & Data Science?

Learning to code often seems out of reach. But with a little willpower, you can go a long way.

Olesia says that before you start trying to write a script using Python, it’s best to grasp the basic concepts of data science first. Python only comes in handy when you understand what data you need to pull and how to use it. To get started, you can find a beginner data science course on YouTube or Coursera.

She also adds that you want to grasp the basic concepts of Python too. Platforms like DataCamp offer courses tailored to different skill levels. These courses typically cover essential topics such as data preprocessing, exploratory data analysis, and machine learning algorithms implemented in Python.

Olesia continues by explaining that as you progress, you can leverage the power of ChatGPT to your advantage. Use ChatGPT to generate code snippets that assist you in various tasks. The generated snippets can serve as starting points that you can modify and adapt to suit your specific needs. Writing the code manually can be hard, so ChatGPT speeds up the process. It only takes around 20 minutes or so to get working code once you have input your prompts.

How Do You Run a Crawler Using Python?

Elias explains that you can begin by using the crawl function of an SEO tool, such as the one provided by Advertools. Set the “follow links” parameter so that the crawler traverses through all the links on the website.

Once the crawl is complete, you can obtain the results as a text file. This file typically contains information about each URL discovered during the crawl.

Convert the sitemap data from the text file into a pandas DataFrame. This will allow you to analyse the data more effectively and generate insights. The DataFrame can provide a table-like structure with details about the URLs found in the sitemap.

Elias continues by showing that you can convert the URLs obtained from the crawl into another DataFrame. This DataFrame can include information such as the count of schemes (HTTP/HTTPS), the count of domains, and directories within the URLs.
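To make the URL breakdown concrete, here is a minimal sketch that uses only Python's standard library rather than Advertools itself. The URLs are hypothetical placeholders, and `url_to_row` is an illustrative helper name, not a function from any package.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical URLs standing in for the output of a crawl.
urls = [
    "https://example.com/blog/python-seo/",
    "https://example.com/blog/data-science/",
    "http://example.com/services/audits/",
    "https://shop.example.com/products/",
]


def url_to_row(url):
    """Split one URL into its scheme, domain and list of directories."""
    parts = urlsplit(url)
    directories = [segment for segment in parts.path.split("/") if segment]
    return {
        "url": url,
        "scheme": parts.scheme,
        "domain": parts.netloc,
        "directories": directories,
    }


rows = [url_to_row(url) for url in urls]

# Count how often each scheme, domain and top-level directory appears.
scheme_counts = Counter(row["scheme"] for row in rows)
domain_counts = Counter(row["domain"] for row in rows)
first_dir_counts = Counter(row["directories"][0] for row in rows if row["directories"])

print(scheme_counts)
print(domain_counts)
print(first_dir_counts)
```

A stray `http` entry in the scheme counts, or an unexpected subdomain in the domain counts, is often the first hint of a redirect or canonicalisation problem worth a closer look.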

Use these DataFrame tables to filter, analyse, and plot various aspects of the website’s URLs. For example, you can analyse the publishing frequency of URLs, identify pages that have not been discovered or crawled, and generate HTML reports to highlight these issues.
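As a rough illustration of this kind of analysis, the sketch below compares a sitemap with a crawl to find pages the crawler never reached, and counts sitemap entries per month as a proxy for publishing frequency. All URLs and dates are invented for the example, and plain dictionaries and sets stand in for the DataFrames.

```python
from collections import Counter

# Hypothetical sitemap: URL -> last modified date (YYYY-MM-DD).
sitemap = {
    "https://example.com/": "2023-06-01",
    "https://example.com/blog/python-seo/": "2023-06-14",
    "https://example.com/blog/data-science/": "2023-07-02",
    "https://example.com/old-landing-page/": "2022-11-20",
}

# Hypothetical set of URLs the crawler actually reached by following links.
crawled_urls = {
    "https://example.com/",
    "https://example.com/blog/python-seo/",
    "https://example.com/blog/data-science/",
    "https://example.com/tag/python/",
}

# Pages listed in the sitemap that the crawl never discovered.
not_crawled = sorted(set(sitemap) - crawled_urls)

# Pages the crawl found that are missing from the sitemap.
not_in_sitemap = sorted(crawled_urls - set(sitemap))

# Publishing frequency: number of sitemap entries per year-month.
updates_per_month = Counter(lastmod[:7] for lastmod in sitemap.values())

print("In sitemap but not crawled:", not_crawled)
print("Crawled but not in sitemap:", not_in_sitemap)
print("Entries per month:", dict(updates_per_month))
```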

For uncrawled pages, Elias says, investigate and determine the reasons why they were not crawled. Common reasons can include restrictions specified in the robots.txt file, issues with canonical tags, inadequate title tags, or improper usage of H1 tags.
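One of these checks, the robots.txt restriction, can be sketched with Python's built-in `urllib.robotparser`. The rules and URLs below are hypothetical, and the rules are parsed from a string so the example runs without fetching anything from a live site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for the site being audited.
robots_txt = """\
User-agent: *
Disallow: /checkout/
Disallow: /internal-search/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Hypothetical pages that did not show up in the crawl.
uncrawled = [
    "https://example.com/checkout/basket",
    "https://example.com/blog/orphaned-post/",
]

# Keep only the pages that robots.txt forbids crawlers from fetching.
blocked = [url for url in uncrawled if not parser.can_fetch("*", url)]

for url in uncrawled:
    reason = "blocked by robots.txt" if url in blocked else "allowed, check other causes"
    print(f"{url}: {reason}")
```

Pages that robots.txt allows but that were still missed need a different investigation, such as checking whether any internal links point to them.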

Finally, he creates a comprehensive report based on the findings to prioritise and address the issues that require attention. These reports can be customised to focus on specific areas that need improvement based on your website’s requirements.

How Do You Automate Website Keywords Using Python?

Andreas’ expertise comes into play for this question as he walks us through his automated process for finding and integrating keywords into the SEO process. Automating SEO using Python can greatly streamline and enhance your optimisation efforts.

“The end result is to get to a list of keywords that are grouped by search intent, and that’s what I think most people probably do in the industry. 90% of data science is trying to get things into the right format.”

We start by importing the search engine results pages (SERPs) for your target keywords. You can use Python libraries like BeautifulSoup or Scrapy to scrape the search engine results pages and retrieve relevant information.

Andreas continues by saying you need to filter out noisy URLs from the SERPs, such as those from search engine domains (e.g., Google). These URLs do not provide meaningful insights for your analysis.
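A minimal sketch of this filtering step might look like the following. The list of noisy domains and the SERP rows are assumptions made for illustration; the session does not specify exactly which domains Andreas excludes.

```python
from urllib.parse import urlsplit

# Assumed list of search engine domains to exclude (adjust to your own data).
NOISY_DOMAINS = {"google.com", "bing.com"}

# Hypothetical SERP rows: (keyword, rank, url).
serp_rows = [
    ("python seo", 1, "https://a.example/python-seo"),
    ("python seo", 2, "https://www.google.com/search?q=python+seo+course"),
    ("python seo", 3, "https://b.example/seo-scripts"),
    ("python seo", 4, "https://maps.google.com/?q=seo+agency"),
]


def is_noisy(url):
    """Return True if the URL belongs to a noisy domain or one of its subdomains."""
    host = urlsplit(url).netloc.lower()
    return any(host == domain or host.endswith("." + domain) for domain in NOISY_DOMAINS)


clean_rows = [row for row in serp_rows if not is_noisy(row[2])]

for keyword, rank, url in clean_rows:
    print(keyword, rank, url)
```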

“You want to put all the SERPs into a single string or single cell. So you have one row per keyword. And what we do is, by group, we simply take all of the URLs and stick them into a single value, which we call the search string here, and then we run the function.”
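The reshaping Andreas describes, one row per keyword with all ranking URLs joined into a single string, can be sketched as follows. The rows are hypothetical, and a plain dictionary stands in for the grouped table he refers to.

```python
from collections import defaultdict

# Hypothetical SERP rows after filtering: (keyword, rank, url).
serp_rows = [
    ("python seo", 2, "https://b.example/seo-scripts"),
    ("python seo", 1, "https://a.example/python-seo"),
    ("data science course", 1, "https://f.example/bootcamp"),
    ("data science course", 2, "https://e.example/courses"),
]

# Group URLs by keyword, keeping them in ranking order.
urls_by_keyword = defaultdict(list)
for keyword, rank, url in sorted(serp_rows, key=lambda row: (row[0], row[1])):
    urls_by_keyword[keyword].append(url)

# One "search string" per keyword: every ranking URL joined by a space.
serp_strings = {keyword: " ".join(urls) for keyword, urls in urls_by_keyword.items()}

for keyword, serp_string in serp_strings.items():
    print(f"{keyword}: {serp_string}")
```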

Once you have the data in the right format, you want to compare the SERPs. The aim is to pair each keyword with every other keyword. Once you have those pairs, you can run a similarity function on each one to see how much their search results overlap.

Andreas continues, “Once we’ve got those comparisons, we group by how similar or dissimilar they are.”
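The comparison and grouping can be sketched as below. The session does not spell out the exact similarity function Andreas uses (that is in his book), so this example swaps in a simple Jaccard overlap of the ranking URLs. The 0.4 threshold and the sample data are assumptions chosen for illustration only.

```python
from itertools import combinations

# Hypothetical search strings: one space-separated list of ranking URLs per keyword.
serp_strings = {
    "python seo": "a.example/python-seo b.example/seo-scripts c.example/automation",
    "seo with python": "b.example/seo-scripts a.example/python-seo d.example/pandas",
    "learn data science": "e.example/courses f.example/bootcamp g.example/degree",
    "data science course": "f.example/bootcamp e.example/courses h.example/mooc",
}


def jaccard(a, b):
    """Share of URLs two search strings have in common (0 = none, 1 = identical)."""
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


THRESHOLD = 0.4  # assumed cut-off; test what works for your own data

# Compare every keyword with every other keyword.
scores = {
    (kw1, kw2): jaccard(serp_strings[kw1], serp_strings[kw2])
    for kw1, kw2 in combinations(serp_strings, 2)
}

# Merge keywords whose SERPs are similar enough into the same group.
group_of = {keyword: keyword for keyword in serp_strings}


def find(keyword):
    """Follow links until reaching the keyword that represents the group."""
    while group_of[keyword] != keyword:
        keyword = group_of[keyword]
    return keyword


for (kw1, kw2), score in scores.items():
    if score >= THRESHOLD:
        group_of[find(kw2)] = find(kw1)

groups = {}
for keyword in serp_strings:
    groups.setdefault(find(keyword), []).append(keyword)

for members in groups.values():
    print(members)
```

Keywords that land in the same group share enough ranking URLs to suggest the same search intent, so they are candidates for targeting with a single page.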

When the keywords are grouped by search intent, you can strategically incorporate them throughout your website. By aligning content with specific search intents, you improve the website’s indexability and help search engines understand the purpose behind different pages.

If you want to use the code to run these functions, you can find this in Andreas’ book Data-Driven SEO.

How to Use Python for SEO Migrations?

Andreas starts off by showing us how to map legacy URLs to new staging URLs or new URLs. He explains:

“One of the problems of migrations is trying to map legacy URLs to new staging URLs or new URLs. And so one of the things that I do is to actually do a similarity score of the content, trying to match the titles of the new URLs and the URLs you’re about to migrate.”
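A minimal sketch of title-based URL mapping is shown below. It uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure, since the session does not say which one Andreas uses. The page titles, URLs and the 0.8 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical legacy pages: URL -> title.
legacy_pages = {
    "/services/seo-audit.html": "SEO Audit Services",
    "/blog/python-for-seo.html": "Python for SEO: A Beginner's Guide",
    "/about-us.html": "About Our Team",
}

# Hypothetical pages on the new (staging) site: URL -> title.
new_pages = {
    "/seo-audits/": "SEO Audit Service",
    "/guides/python-seo/": "Python for SEO: A Beginner's Guide",
    "/contact/": "Contact Us",
}


def similarity(a, b):
    """Case-insensitive similarity between two titles, from 0 to 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


THRESHOLD = 0.8  # assumed cut-off; tune it per site

redirect_map = {}
needs_review = []
for legacy_url, legacy_title in legacy_pages.items():
    # Find the new page whose title is closest to the legacy title.
    best_url, best_score = max(
        ((new_url, similarity(legacy_title, new_title)) for new_url, new_title in new_pages.items()),
        key=lambda pair: pair[1],
    )
    if best_score >= THRESHOLD:
        redirect_map[legacy_url] = best_url
    else:
        needs_review.append(legacy_url)

print("Redirects:", redirect_map)
print("Needs manual review:", needs_review)
```

Legacy URLs that fall below the threshold are not mapped automatically, which leaves a short list for a human to check instead of a full manual mapping exercise.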

Andreas adds, “Using Pareto is quite useful because it’s a bit similar to the cumulative distribution chart. With Pareto, you’ll see that if you list in descending order of volume, the total impressions, you can find how important each successive URL is. As you go down, that can help you work out which URLs are at risk. And if they’re not carried over, how it could impact the traffic of the site upon relaunch.”
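The Pareto view Andreas describes can be sketched by sorting URLs by impressions and tracking the cumulative share. The impression figures are invented, and the 80% cut-off is an assumption borrowed from the usual Pareto rule of thumb rather than a figure from the session.

```python
# Hypothetical impressions per legacy URL.
impressions = {
    "/": 5000,
    "/blog/python-seo/": 3500,
    "/services/": 800,
    "/about/": 400,
    "/contact/": 300,
}

total = sum(impressions.values())

# List URLs in descending order of impressions.
ranked = sorted(impressions.items(), key=lambda item: item[1], reverse=True)

# Track the running share of total impressions as we go down the list.
cumulative = 0
pareto = []
for url, count in ranked:
    cumulative += count
    pareto.append((url, count, cumulative / total))

CUTOFF = 0.8  # assumed share of impressions to protect first

# URLs that together account for the cut-off share of impressions.
priority_urls = []
for url, count, share in pareto:
    priority_urls.append(url)
    if share >= CUTOFF:
        break

for url, count, share in pareto:
    print(f"{url:<22} {count:>6} {share:.0%}")
print("Migrate these first:", priority_urls)
```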

You can use this data to review the different URLs and make any changes you need before the migration takes place, ensuring that none of the important URLs are missed.

Andreas says that a similarity score of 80% is generally the threshold that works best for him. You need to test what is workable for your client and go from there rather than blindly following a set number.

Olesia ended the conversation by adding a key point: many people overlook images during a migration. It’s worth keeping an eye on them too, as they play a crucial role in your SEO efforts.
