Updating a Large WordPress Site to Gatsby Frontend, Part I: Planning

August 06, 2021

The undertaking: update a large custom-theme WordPress marketing site to a decoupled, headless CMS paired with the static frontend framework Gatsby. Gatsby is a static site generator (SSG) powered by Node, React, and a vast open-source plugin ecosystem.

This series of blog posts outlines the general process I used to accomplish this task. It mentions tools such as rsync and ssh. Rather than include extensive step-by-step instructions for each, I will link to the manuals or tutorials that helped me along the way.

This will be a full-stack, high-level dive into all facets of this project from concept to completion.

Let’s get to it!

First: Assess The Situation

The first step for any programming task should be to assess the situation. Below are some specs on the original website I was working with:

  • The database was a decade old with orphaned tables and data.
  • The WordPress installation had several custom post types and thousands of posts & pages.
  • The custom theme had lots of interwoven functionality, so there was no way this site would function if the theme were swapped for another.
  • The theme relied on a custom plugin for some functionality and lead-generating forms.
  • Several plugins were actively used for security, SEO, and optimization.
  • Tons of legacy files outside the wp-content folder, including thousands of images, HTML files, and miscellaneous clutter.

Phew! This was going to be a lot to organize while modernizing the frontend and optimizing the backend.

The original website was actively updated and regularly maintained by multiple people – which meant that I didn’t have the luxury of cloning to a staging environment and working on it for 2-3 months uninterrupted. Rather, I had to ensure the new website developments also included any SEO & content changes and new pages published by the team.

Next: Clean

I. Remove Unused Themes and Plugins

The next step was to determine what dead weight could be removed immediately. The first to go were unused themes and plugins. The less clutter, the better. This is a best practice regardless of any migration.

II. Remove Unused Images

There were thousands of images in subfolders that were linked to directly from within pages and posts. This presented some challenges: it was far too time-consuming to manually check whether each image was still referenced in the database and needed to be kept. Gatsby, though, provided an interesting way to help pinpoint unnecessary images right out of the box. More on that later.
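As a rough first pass that is separate from the Gatsby approach covered later in the series, the check can be scripted: gather the image file names mentioned in post content and compare them with the files on disk. The sketch below is only an illustration under stated assumptions. The function names, the extension list, and the sample paths are all hypothetical, and it matches by file name only, so it errs on the side of keeping files. It also ignores images referenced from theme files, CSS, or custom fields.

```python
import re

# Matches paths such as /images/2014/banner.jpg inside post HTML.
# The extension list is an assumption; adjust it to the files actually on disk.
IMAGE_REF = re.compile(r"""[^\s"'()<>]+\.(?:jpe?g|png|gif|svg|webp)""", re.IGNORECASE)

def referenced_images(post_contents):
    """Collect the lowercased file name of every image path found in the post bodies."""
    found = set()
    for html in post_contents:
        for match in IMAGE_REF.findall(html):
            found.add(match.rsplit("/", 1)[-1].lower())
    return found

def unreferenced(image_files, post_contents):
    """Return image files whose file name appears in none of the post bodies."""
    used = referenced_images(post_contents)
    return sorted(f for f in image_files if f.rsplit("/", 1)[-1].lower() not in used)

# Hypothetical sample data: two post bodies and three files on disk.
posts = [
    '<p><img src="/images/2014/banner.jpg" alt=""></p>',
    "<a href='https://example.com/legacy/Chart.PNG'>chart</a>",
]
on_disk = ["images/2014/banner.jpg", "legacy/chart.png", "legacy/unused-logo.gif"]
print(unreferenced(on_disk, posts))
```

In practice the post bodies would come from the post_content column of wp_posts, and anything the script flags still deserves a manual look before deletion.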

III. Remove Unnecessary Custom Fields

Over time, many marketing efforts were launched but didn’t gain traction. Some of these efforts required specific landing pages, and development involved custom fields built with the Advanced Custom Fields plugin or custom meta. Some of these pages were eventually removed or redirected, but their custom field data remained. This doesn’t cause much database bloat, but it can become difficult to determine later on which fields are still needed. It’s best to clean up unused data sooner rather than later whenever possible. In this case, however, some sleuthing was required to determine what could be purged.

One method I like to use is an SQL query that finds which published posts or pages might still use a custom field’s data.


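-- One row per published post and matching meta key.
-- Explicit columns (instead of SELECT *) keep the query valid under ONLY_FULL_GROUP_BY.
SELECT wp_posts.ID AS post_id, wp_posts.post_title, wp_postmeta.meta_key
  FROM wp_postmeta
  INNER JOIN wp_posts ON wp_postmeta.post_id = wp_posts.ID
  WHERE wp_posts.post_status = 'publish'
    AND wp_postmeta.meta_key LIKE '%some_custom_field%'
  GROUP BY wp_posts.ID, wp_postmeta.meta_key;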

This query looks for any published posts (or pages) that reference a custom field whose meta key matches %some_custom_field%. The % wildcards match any characters before or after the field name.

If this query returns zero rows, it’s safe to remove the custom field definition from Advanced Custom Fields and rule out supporting it in the new theme.
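The zero-rows-versus-some-rows check can be rehearsed on toy data before it is pointed at a production database. The sketch below is only an illustration: it uses Python’s built-in sqlite3 module as a stand-in for MySQL, a cut-down two-table schema, made-up field names, and a hypothetical helper called posts_using_field. None of it comes from the original project.

```python
import sqlite3

def make_toy_db():
    """Build an in-memory stand-in for wp_posts and wp_postmeta (assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_status TEXT);
        CREATE TABLE wp_postmeta (meta_id INTEGER PRIMARY KEY, post_id INTEGER,
                                  meta_key TEXT, meta_value TEXT);
        INSERT INTO wp_posts VALUES (1, 'publish'), (2, 'draft'), (3, 'publish');
        INSERT INTO wp_postmeta (post_id, meta_key, meta_value) VALUES
            (1, 'promo_banner', 'Spring sale'),
            (2, 'promo_banner', 'Old draft copy'),
            (3, 'hero_image', '42');
    """)
    return conn

def posts_using_field(conn, field):
    """Return IDs of published posts that have a meta_key containing `field`."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.ID
        FROM wp_postmeta AS m
        INNER JOIN wp_posts AS p ON m.post_id = p.ID
        WHERE p.post_status = 'publish'
          AND m.meta_key LIKE '%' || ? || '%'
        ORDER BY p.ID
        """,
        (field,),
    ).fetchall()
    return [row[0] for row in rows]

conn = make_toy_db()
print(posts_using_field(conn, "promo_banner"))   # only post 1; post 2 is a draft
print(posts_using_field(conn, "retired_field"))  # empty list: no published post uses it
```

An empty list corresponds to the "zero rows" case: no published content depends on the field, so it is a candidate for removal.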

Screenshot: the SQL query returns no results.

If the query did return results, I needed to dig a bit deeper and see whether the data was important enough to warrant including in the new site. To do this, I’d pick a handful of posts by post_id and open each one in wp-admin via …com/wp-admin/post.php?post=2379&action=edit to see where the data was presented.

Screenshot: the SQL query returns posts that have a meta_key populated (meta_key corresponds to the field in ACF).

If it wasn’t something that needed to live in the database anymore, it was removed.
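The removal itself can be sketched as a small, transactional delete. This is a hedged illustration rather than the exact procedure used on the project: purge_meta_key is a hypothetical helper, the field names are made up, and sqlite3 again stands in for MySQL. It matches meta keys exactly instead of using LIKE, so similarly named fields survive, and it also removes the underscore-prefixed companion row that ACF stores alongside each field value. Take a database backup before running anything like this against real data.

```python
import sqlite3

def purge_meta_key(conn, meta_key):
    """Delete rows for `meta_key` and ACF's `_`-prefixed reference row; return the count."""
    keys = (meta_key, "_" + meta_key)
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "DELETE FROM wp_postmeta WHERE meta_key IN (?, ?)", keys
        )
    return cur.rowcount

# Toy data standing in for a real wp_postmeta table (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_postmeta (meta_id INTEGER PRIMARY KEY, post_id INTEGER,
                              meta_key TEXT, meta_value TEXT);
    INSERT INTO wp_postmeta (post_id, meta_key, meta_value) VALUES
        (7, 'promo_banner', 'Spring sale'),
        (7, '_promo_banner', 'field_5f1a2b3c'),
        (7, 'promo_banner_cta', 'Still in use');
""")
deleted = purge_meta_key(conn, "promo_banner")
remaining = [row[0] for row in conn.execute("SELECT meta_key FROM wp_postmeta")]
print(deleted, remaining)
```

Here the two promo_banner rows are removed while promo_banner_cta is left alone, which is the reason for exact matching over a wildcard.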

Leaving this unused data in the database has no significant performance implications. However, it can become a source of confusion or uncertainty when troubleshooting. If it’s not used, lose it.

Then: Consider the Destination

I was planning to add some new plugins and remove others in the new installation. For example, I could drop the Contact Form 7 and Google XML Sitemaps plugins, because that functionality would be handled differently with Gatsby.

I also needed to add new plugins to the destination installation to support Gatsby and GraphQL.

For more information on how Gatsby, GraphQL, and WordPress integrate, check out the Sourcing with WordPress article by Gatsby.

I also planned to migrate from All in One SEO to Yoast, because Yoast has a nice GraphQL support plugin available: WPGraphQL Yoast SEO Addon. (At the time of this project, All in One SEO did not have GraphQL support available.)

The new backend also needed to support the numerous custom post types.

In the next article, I review the process used to set up the staging environment, and eventually get into the sync/pull bash scripts needed to fetch the latest content or changes from production.

Leave a comment. Let's discuss!

Written by Michael - His career path has allowed him to incorporate his creative eye with a love of programming, analytical thinking, and learning. Michael has been married to his lovely wife Yohana since 2012. They have four wonderful children, two St. Bernard dogs, and a chinchilla. Follow @missionmikedev on Twitter