Automate Your SEO Tasks With Custom Extraction – Max Coupland

09 April 2019

Posted in: BSEO

Max Coupland’s talk on automating your SEO tasks with custom extraction was extremely practical and served as a real eye-opener for how you could feasibly achieve a 4-day work week.


Max outlined that his talk would cover building tools and processes to automate:

  • Keyword research.
  • Market share analysis.
  • Internal linking insights.

The weapons in Max’s time-saving arsenal were XPath and regular expressions (regex).

Max explained that both tools are used to extract information from the web or a document, tailored to your exact needs.

Understanding these two concepts will allow you to have virtually any information from any website, even Google SERP, at any scale, in the palm of your hands within a matter of minutes.

Furthermore, through machine automation, it is possible to crawl a website with millions of URLs without having to wait a week for the crawl to finish.

There are many tools already available, but not every website is built the same way, so it’s impossible for them to offer every possible SEO element for you to extract. That’s where ‘custom extraction’ comes in.

Crawling large websites

Custom extraction can help manage the size of a crawl by removing elements that are irrelevant.

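One way to do this is to use an exclude rule that takes all parameters out of the crawl using this simple regex string: .*[insert parameter].* For example, .*\?.* will exclude any URL with a query string attached. The question mark has to be escaped, because an unescaped ? is a regex quantifier and .*?.* would match every URL.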

You can also use the same kind of expression in an include rule, so that you only crawl a subdomain or a specific section of a site.
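As a rough illustration of how exclude and include rules behave, here is a minimal Python sketch. It assumes the crawler tests each pattern against the whole URL, which is why it uses re.fullmatch; the example.com URLs and the /blog/ include pattern are hypothetical and were not part of Max’s talk.

```python
import re

# Exclude rule: any URL containing a literal "?" (i.e. a query string).
# The "?" is escaped; unescaped, ".*?.*" would match every URL.
EXCLUDE_QUERY = re.compile(r".*\?.*")

# Include rule: hypothetical pattern limiting the crawl to a /blog/ section.
INCLUDE_BLOG = re.compile(r"https://www\.example\.com/blog/.*")


def should_crawl(url: str) -> bool:
    """Return True if the URL passes the include rule and is not excluded."""
    if EXCLUDE_QUERY.fullmatch(url):
        return False
    return INCLUDE_BLOG.fullmatch(url) is not None


# Hypothetical URLs for illustration only.
urls = [
    "https://www.example.com/blog/post-1",
    "https://www.example.com/blog/post-1?utm_source=x",
    "https://www.example.com/shop/item",
]
crawlable = [u for u in urls if should_crawl(u)]
print(crawlable)
```

Only the first URL survives: the second is dropped by the exclude rule and the third never matches the include rule. Your crawler’s own regex tester remains the best place to confirm how it applies a pattern.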

Custom extraction can also be used to pull out framed content. You can use regex to pull all the pages on a website that use a specific piece of framed content and extract them in a single exportable view.

All you would need to do is find the class used in the HTML to display that content by inspecting the element, then add the class to the XPath.

Another example Max mentioned is how XPath could be used to find every page missing GA code, iframes in the head of the page, rogue hreflang tags in the body, and much more.

All you need to do is find the elements they are wrapped in and use their XPaths in custom extraction, and you’ll save yourself all this time and hassle.
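To make the XPath checks above more concrete, here is a small Python sketch using only the standard library. It is an illustration under assumptions rather than the setup Max used: the sample pages, the promo-frame class and the google-analytics.com marker are all hypothetical, and ElementTree only supports a subset of XPath on well-formed markup, so real pages would usually go through a crawler’s own extraction engine instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample pages keyed by URL path.
PAGES = {
    "/a": (
        "<html><head>"
        "<script src='https://www.google-analytics.com/analytics.js'></script>"
        "</head><body><div class='promo-frame'>Spring sale</div></body></html>"
    ),
    "/b": (
        "<html><head><title>No tracking</title></head><body>"
        "<link rel='alternate' hreflang='fr' href='https://example.com/fr/'/>"
        "<p>Plain page</p></body></html>"
    ),
}

# Hypothetical class name, as you would find it via Inspect Element.
FRAME_XPATH = ".//div[@class='promo-frame']"
# Assumed marker for spotting the tracking script in the head.
GA_MARKER = "google-analytics.com"


def audit(html: str) -> dict:
    """Run three custom-extraction style checks against one page."""
    root = ET.fromstring(html)
    head_scripts = root.findall("./head/script")
    return {
        # Text of every element carrying the framed-content class.
        "framed_content": [el.text for el in root.findall(FRAME_XPATH)],
        # True if any script in the head points at the assumed GA marker.
        "has_ga": any(GA_MARKER in s.get("src", "") for s in head_scripts),
        # Count of link elements with an hreflang attribute inside the body.
        "hreflang_in_body": len(root.findall("./body//link[@hreflang]")),
    }


report = {path: audit(html) for path, html in PAGES.items()}
for path, row in report.items():
    print(path, row)
```

Each row in the report corresponds to one crawled page, which mirrors the single exportable view a crawler’s custom extraction tab would give you.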

He highlighted that DeepCrawl is a great tool for using regex to crawl your site or extract important information. It even comes with a live regex tester, so you can check that your string works before implementing it in the wild.

Keyword research

Max explained how custom extraction can be used to gather People Also Ask (PAA) queries and related searches displayed on the SERP, which can then be added to your keyword list.

He pointed out that Google’s SERP is a web page in itself and, like most web pages, uses classes, IDs and elements for structure. So you can simply inspect a PAA box and see what class Google uses for it.

This class can then be added to a custom XPath and combined with the URL of Google’s search results for our chosen keywords. Every Google search starts from the same root URL, so we just have to re-create how the URL would appear were we to conduct a Google search for each keyword.

Then, in the custom extraction tab in Screaming Frog (SF), we will see the PAA queries listed on the SERP pulled through for each keyword in our list.
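The URL re-creation step can be sketched in a few lines of Python. This is a minimal illustration, not Max’s exact process: the root URL and q parameter are assumptions based on how Google search URLs normally look, the keywords are hypothetical, and the PAA class is deliberately left as a placeholder because the talk recap does not give it and Google changes its markup over time.

```python
from urllib.parse import urlencode

# Assumed root URL for a Google search; the query goes in the "q" parameter.
GOOGLE_SEARCH_ROOT = "https://www.google.com/search"

# Placeholder: copy the real class from the PAA box via Inspect Element.
PAA_CLASS = "REPLACE-WITH-PAA-CLASS"
# XPath you would paste into the crawler's custom extraction settings.
PAA_XPATH = f"//div[@class='{PAA_CLASS}']"


def serp_url(keyword: str) -> str:
    """Re-create the URL a Google search for this keyword would produce."""
    return f"{GOOGLE_SEARCH_ROOT}?{urlencode({'q': keyword})}"


# Hypothetical keyword list for illustration only.
keywords = ["custom extraction", "xpath for seo"]
crawl_list = [serp_url(k) for k in keywords]

for url in crawl_list:
    print(url)
print("Custom extraction XPath:", PAA_XPATH)
```

The resulting list of URLs is what you would upload to the crawler in list mode, with the XPath set up as the custom extraction rule.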

Custom extraction can also be used to determine user intent for each keyword. Max cited the fantastic work of Rory Truesdale from Conductor for working this out.

Max identified that we could use custom extraction for all three pillars of SEO, but that it could also be used for outreach.

He described an example where his team built a parent’s salary calculator by scraping job sites for various salaries, then taking averages to work out how much parents should be paid for all of their household chores.

From this data they built an outreach asset that garnered some very high-authority links and, thanks to custom extraction, took very little time.

In summary, Max delivered some very useful actionable nuggets on how to automate SEO tasks to save yourself time.

It is clear that using custom extraction effectively will require more research for the beginner; however, the principles were explained clearly and the benefits expertly exhibited.
