How To Use Chrome Puppeteer to Fake Googlebot & Monitor Your Site – Tom Pool

09 April 2019

Posted in: BSEO

In this talk, Tom Pool delivered an insightful and candidly potty-mouthed assessment of the advantages of using Google Puppeteer to monitor your site.

Overview

Tom began his 210-slide talk by discussing the power of headless Chrome.

For those who don’t know, he explained, headless Chrome is effectively Chrome without a user interface, meaning you have to operate it directly through your machine’s command line, which isn’t for the faint of heart.

The main thing headless Chrome allows us to do is to scrape millions of JavaScript websites.

You can also copy the rendered HTML DOM into a text file, compare it with the page source, and export the differences. On top of that, you can generate screenshots of pages and crawl single-page applications.
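To make the source-versus-DOM comparison concrete, here is a minimal sketch using Puppeteer (the library introduced further down). It is an illustration rather than anything shown in the talk: the function names `addedLines` and `compareSourceToDom` and the output file `dom-diff.txt` are made up for this example, and it assumes Node.js 18 or later (for the built-in `fetch`) with Puppeteer installed via npm.

```javascript
// Sketch: compare a page's raw source with its rendered DOM and take a screenshot.
// Assumptions: Node.js 18+ (global fetch) and `npm install puppeteer`.

// Return the (trimmed, non-empty) lines found in `rendered` but not in `source`.
function addedLines(source, rendered) {
  const seen = new Set(source.split('\n').map((line) => line.trim()));
  return rendered
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '' && !seen.has(line));
}

async function compareSourceToDom(url, screenshotPath) {
  // Loaded lazily so addedLines() can be used without a browser installed.
  const puppeteer = require('puppeteer');
  const fs = require('node:fs/promises');

  // The HTML exactly as the server sends it, before any JavaScript runs.
  const source = await (await fetch(url)).text();

  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    const rendered = await page.content(); // the DOM after JavaScript has run
    await page.screenshot({ path: screenshotPath, fullPage: true });

    const diff = addedLines(source, rendered);
    await fs.writeFile('dom-diff.txt', diff.join('\n')); // hypothetical output file
    return diff;
  } finally {
    await browser.close();
  }
}
```

The line-by-line comparison is deliberately crude; a proper diff library would give more readable output, but this is enough to see which markup only exists after rendering.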

You can also automate web page checks and testing, emulating user behaviour to see how your site performs under heavy use.

Tom acknowledged, however, that using the command line isn’t ideal for everyone. Enter Chrome Puppeteer.

Google Puppeteer is a Node library which provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome or Chromium.

Tom did warn, though, that to run tests locally with Puppeteer you first have to install Node.js and npm, on either a PC or a Mac. He also warned that installation is harder on a Mac, although Homebrew, a handy package manager, makes it easier.

Once you have Puppeteer set up you can fake Googlebot with a few tweaks to the code.

Tom did caveat, however, that this is merely mimicking Googlebot and isn’t exactly the same as Googlebot.
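As a rough idea of what those tweaks might look like, the sketch below sets Googlebot's published user-agent string before loading a page. The helper names `googlebotProfile` and `renderAsGooglebot` and the viewport size are assumptions made for this example, not taken from Tom's slides. As he noted, this only imitates Googlebot; the real crawler behaves differently in other ways.

```javascript
// Sketch: load a page in headless Chrome while identifying as Googlebot.
// Assumption: Puppeteer is installed (`npm install puppeteer`).

// Googlebot's published "compatible" user-agent string.
const GOOGLEBOT_UA =
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

// Hypothetical helper: the settings applied to a page to imitate Googlebot.
function googlebotProfile() {
  return {
    userAgent: GOOGLEBOT_UA,
    viewport: { width: 1024, height: 768 }, // assumed size, adjust as needed
  };
}

async function renderAsGooglebot(url) {
  // Loaded lazily so googlebotProfile() works without a browser installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    const profile = googlebotProfile();
    await page.setUserAgent(profile.userAgent);
    await page.setViewport(profile.viewport);

    const response = await page.goto(url, { waitUntil: 'networkidle0' });
    return {
      status: response ? response.status() : null, // goto() can resolve to null
      html: await page.content(),
    };
  } finally {
    await browser.close();
  }
}
```

Comparing the HTML returned here with what a normal browser profile receives is one way to spot content that is served differently to crawlers.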

Puppeteer can then be installed on your server, where it can be given a list of URLs to work through and render as they would appear to Google.
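Working through a list of URLs could be sketched roughly as follows. The file format (one URL per line, with `#` for comments) and the names `parseUrlList` and `renderAll` are assumptions made for this example.

```javascript
// Sketch: render every URL from a text file, one after another.
// Assumption: the file holds one URL per line; blank lines and lines
// starting with "#" are ignored.

function parseUrlList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('#'));
}

async function renderAll(listPath) {
  // Loaded lazily so parseUrlList() works without a browser installed.
  const fs = require('node:fs/promises');
  const puppeteer = require('puppeteer');

  const urls = parseUrlList(await fs.readFile(listPath, 'utf8'));
  const browser = await puppeteer.launch({ headless: true });
  const results = [];
  try {
    for (const url of urls) {
      const page = await browser.newPage();
      try {
        await page.goto(url, { waitUntil: 'networkidle0' });
        results.push({ url, title: await page.title(), html: await page.content() });
      } catch (err) {
        // Record the failure and carry on with the remaining URLs.
        results.push({ url, error: err.message });
      } finally {
        await page.close();
      }
    }
  } finally {
    await browser.close();
  }
  return results;
}
```

Processing pages one at a time keeps memory use low, which matters on a small server; a larger crawl would probably want a few pages open in parallel.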

The next step, he explained, was to use Puppeteer to mimic ContentKing, a tool that monitors a site in real time and lets you know of any issues or changes to any of your pages (but it is expensive).

Using Puppeteer is free!

Tom has written a script (about 200 lines of code) to automate these checks and look out for changes through Puppeteer.

Tom did highlight a few example lines from the code that perform specific functions (shared below), but did not share the entire code. Presumably, they are bug-checking this and continuing to build it out into a packaged monitoring tool which they can monetise or simply roll out to their clients.
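Since the full script was not shared, the following is not Tom's code. It is only a guess at the general shape such a check could take: capture a few SEO-relevant fields from the rendered page and compare them with a previous snapshot. The field list and the names `detectChanges` and `snapshotPage` are assumptions for illustration.

```javascript
// Sketch: detect changes between two snapshots of a page's key elements.

// Compare two flat snapshot objects and list every field that differs.
function detectChanges(previous, current) {
  const changes = [];
  const keys = new Set([...Object.keys(previous), ...Object.keys(current)]);
  for (const key of keys) {
    if (previous[key] !== current[key]) {
      changes.push({ field: key, before: previous[key], after: current[key] });
    }
  }
  return changes;
}

// Capture a snapshot from an already-loaded Puppeteer page.
async function snapshotPage(page) {
  // The callback runs inside the browser, so it can use `document`.
  return page.evaluate(() => {
    const attr = (selector, name) => {
      const el = document.querySelector(selector);
      return el ? el.getAttribute(name) : null;
    };
    const h1 = document.querySelector('h1');
    return {
      title: document.title,
      description: attr('meta[name="description"]', 'content'),
      canonical: attr('link[rel="canonical"]', 'href'),
      robots: attr('meta[name="robots"]', 'content'),
      h1: h1 ? h1.textContent.trim() : null,
    };
  });
}
```

In use, yesterday's snapshot would be loaded from disk (for example a JSON file), passed to `detectChanges` along with today's, and then overwritten with the new one.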

Tom then went on to explain how he went one step further and automated the process, running the code daily on a Raspberry Pi via a cron job to monitor his chosen sites. He also set it up to send email alerts for specific changes.
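One simple way to get a similar effect, which may well differ from Tom's setup, is to have the script print a message only when something has changed and let cron email the output. The crontab lines, file paths, and the `formatAlert` helper below are all assumptions for illustration; cron only sends mail if the machine has a working mail setup.

```javascript
// Sketch: turn a list of detected changes into an alert message.
//
// Example crontab entry (hypothetical paths) that runs a monitoring script
// every day at 06:00. With MAILTO set, cron emails whatever the job prints:
//
//   MAILTO=you@example.com
//   0 6 * * * /usr/bin/node /home/pi/monitor.js

// `changes` is an array of { field, before, after } objects.
// Returns an empty string when nothing changed, so nothing is printed.
function formatAlert(url, changes) {
  if (changes.length === 0) return '';
  const lines = changes.map(
    (change) => `  ${change.field}: "${change.before}" -> "${change.after}"`
  );
  return [`Changes detected on ${url}:`, ...lines].join('\n');
}

// Print the alert only when there is something to report.
function report(url, changes) {
  const message = formatAlert(url, changes);
  if (message !== '') console.log(message);
  return message;
}
```

Staying silent when nothing has changed means an email only arrives when there is something worth looking at.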

Key Takeaways

  • Google Puppeteer is a Node library which provides a high-level API to control headless Chrome or Chromium
  • To use Puppeteer to run tests locally on your machine, you have to install Node.js and npm
  • Puppeteer can be installed on your server so it can be provided with a list of URLs to work through and render how they would appear in Google