Content Audit for Any Website

Screaming Frog Data | URL Profiler Data Example

 

Crawling the site.

  • Take inventory of all content available for indexation
  • Crawl the site with Screaming Frog or Xenu
  • Export and duplicate the crawled list
  • Separate all URLs with robots meta or X-Robots-Tag noindex directives
  • Separate other URLs based on server status errors
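
The separation steps above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the column names (Address, Status Code, Meta Robots 1, X-Robots-Tag 1) are assumptions about how the crawl export is laid out, and the inline sample stands in for the real exported file.

```python
import csv
import io

# Column names below are assumptions; rename them to match your own export.
URL_COL = "Address"
STATUS_COL = "Status Code"
ROBOTS_COLS = ("Meta Robots 1", "X-Robots-Tag 1")


def split_crawl(rows):
    """Return (noindex, errors, candidates) URL lists from crawl rows."""
    noindex, errors, candidates = [], [], []
    for row in rows:
        directives = " ".join((row.get(col) or "") for col in ROBOTS_COLS).lower()
        status = (row.get(STATUS_COL) or "").strip()
        if "noindex" in directives:
            # Robots meta tag or X-Robots-Tag header asks for no indexing.
            noindex.append(row[URL_COL])
        elif status.startswith(("4", "5")):
            # 4xx and 5xx responses are treated as server status errors.
            errors.append(row[URL_COL])
        else:
            candidates.append(row[URL_COL])
    return noindex, errors, candidates


# Tiny inline sample standing in for the exported crawl file.
SAMPLE = """Address,Status Code,Meta Robots 1,X-Robots-Tag 1
https://example.com/,200,,
https://example.com/tag/a,200,"noindex, follow",
https://example.com/old,404,,
https://example.com/file.pdf,200,,noindex
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
noindex, errors, candidates = split_crawl(rows)
print(len(noindex), len(errors), len(candidates))
```

For a real crawl, replace the inline sample with the exported file opened via open(path, newline="", encoding="utf-8").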

 

Understanding Sheet Data (Actionable metrics)

 

Gather 

  • Export the “Internal All” file from Screaming Frog
  • Combine this sheet with additional indexable URLs found outside of the crawl via GSC, GA, and elsewhere
  • Use the combined sheet as the seed list in URL Profiler
  • Check image below for URL Profiler settings.
  • Give URL Profiler your Google Analytics and Google Webmaster Tools (now Google Search Console, GSC) credentials
  • Give URL Profiler the Google PageSpeed Insights API key (Info).
  • Use the same Google account credentials as for Google Analytics and Webmaster Tools
  • Checking Google indexation needs around 10 proxy servers; otherwise Google will block the IP address and the results will not be correct
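
Combining the crawl export with URLs from other sources usually produces duplicates, so it helps to de-duplicate before loading the seed list. The sketch below is one possible way to do that, under the assumption that URLs differing only in host capitalization or fragment should count as the same page. The function names and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lowercase scheme and host and drop the fragment so duplicates collapse."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def build_seed_list(*sources):
    """Merge several URL lists, keeping the first occurrence of each URL."""
    seen = set()
    seed = []
    for source in sources:
        for url in source:
            if not url.strip():
                continue  # skip blank rows from exports
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                seed.append(key)
    return seed


# Hypothetical inputs: crawl export, Search Console pages, Analytics landing pages.
crawl = ["https://example.com/", "https://example.com/blog/post-1"]
gsc = ["https://Example.com/blog/post-1", "https://example.com/orphan-page"]
ga = ["https://example.com/orphan-page#top", "https://example.com/landing?src=ad"]

seed = build_seed_list(crawl, gsc, ga)
print("\n".join(seed))
```

Query strings are kept on purpose, because some sites serve different content per parameter; strip them only if you know they are tracking parameters.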

 

 

 

What to track? | What tool to use?
Indexed or not | A non-indexed URL is often a sign of an uncrawled or low-quality page.
Content uniqueness | Copyscape, Siteliner, and now URL Profiler can provide this data
Organic search traffic | Typically 90 days
Conversion tracking | Revenue and/or conversions
Publish date | Discover stale content
Internal links | Ensure the most important pages have the most internal links
External links | These can come from Moz, SEMRush, and a variety of other tools
Time on page | Landing pages resulting in low time-on-site
Bounce rate | Landing pages resulting in low pages-per-visit
Page speed | Page speed and mobile-friendliness through Google's tool
Page HTTP status | Response code 2xx, 3xx, 4xx, 5xx
Page canonical tags |

General Content & Auditing

The site content is classified into some general categories. Then we follow certain procedures to improve these general pages. Once this is done we move to part two to target specific issues and improve them. 

Each content classification below lists how to decide the category, the action to take, and how to decide on the action along with the exact procedure.

Trust Building Pages
How to decide: These are pages like:
  • About Us page
  • Contact Us page: should have a contact form, address and map, phone, and email
  • Testimonial page
  • Terms & Conditions page and Privacy Policy page
  • For e-commerce sites: refund policy, shipping information, return policy
Action: Always Improve OR Create
Procedure: Every site must have these pages.
  • Create them if the site is missing these pages
  • Improve them if they have too little content or lack functionality

Editorial content
How to decide: If the site has a blog, all the blog posts or articles come under this category.
  • Blogs
  • Articles
  • News
  • Tutorials/how-to content
Action: Improve, Keep, or Remove
Procedure:
  • Improve outdated blogs: the information in older pages may be valuable, but parts of it may need to be updated
  • Remove irrelevant blogs: information relevant 2 years ago may not be relevant today
  • Keep blogs that are current with relevant information

Archive Pages
How to decide: These are pages auto-generated by many CMSs. These could be category pages, tag pages, author archives, etc.
Action: Page Redesign
Procedure: As these are auto-generated, they may have duplicate content. We almost always have to redesign these pages so that duplicate content is minimized.

Sales Pages
How to decide: All product pages where users can make a purchase
Action: Improve

External Guest Posts
How to decide: These are articles you have written elsewhere in hopes of getting backlinks.
  • Remove guest posts that were published elsewhere

Internal Guest Posts
How to decide: These are articles that someone else has written on your site.

Stub Content
How to decide: e.g., “No content is here yet, but if you sign in and leave some user-generated content, then we’ll have content here for the next guy.” By the way, want our newsletter? Click an ad!

Curated Content
How to decide: Comprised almost entirely of bits and pieces of content that exist elsewhere.

Media & Related Pages
  • Image pages & files
  • PDF pages & files: Google indexes PDFs just as it indexes regular pages. Are PDFs being linked from indexed pages?
  • Video pages & files
  • Audio pages & files

Special issues & Fixes

This section is for content that has specific issues.

Each content classification below lists how to decide the category, the action to take, and how to decide on the action along with the exact procedure.

Prime Content
How to decide: Pages with high visits/entrances but low conversion, time-on-site, page views per session, etc.
Action: Improve & Analyse
Procedure: These pages are a really good place to add A/B testing and take the optimization to the next level.

Influencer Pages
How to decide: Pages with good link and social metrics.
Action: Analyse
Procedure: These pages are a really good place to learn what is working on your website. Find out why these pages are getting more shares and likes and how to repeat this formula for other pages.

Thin Content pages
How to decide: Thin or short content pages gloss over the topic, have too few words, or are all image-based.
Action: Improve, Remove or Consolidate
Procedure:
  • If there are many thin content pages belonging to one topic, consider consolidating such pages and implementing canonicalization
  • Remove pages that don’t serve any purpose
  • Improve pages that serve some purpose. There should be more than 1,500 words per page.

Misleading SEO
How to decide: Titles or keywords targeting queries for which the content doesn’t answer or deserve to rank. Generally not providing the information the visitor was expecting to find.
Action: Improve

Low Quality pages
How to decide: Poor grammar, written primarily for search engines (includes keyword stuffing), unhelpful, inaccurate
Action: Improve OR Remove

Irrelevant content
  • Content with no external links, no social shares, and very few or no entrances/visits
  • Pages with poor link, traffic, and social metrics related to low-quality content that isn’t worth updating
  • The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
  • Too many indexable blog tag or blog category pages
  • Prune from the site to remove duplicate content and out-of-date content that isn’t worth updating or consolidating
  • Recommend allowing the URL to return a 404 or 410 response code.
  • Remove all internal links, including from the sitemap.

Internally duplicated
How to decide:
  • Plagiarized content
  • Content duplicated on other pages (e.g., categories, product variants, archives, technical issues, etc.). Check with www.siteliner.com
Action: Remove or Canonicalize

Externally duplicated
How to decide:
  • Content that also exists on other sites (e.g., manufacturer product descriptions, product descriptions duplicated in feeds used for other channels like Amazon, shopping comparison sites and eBay, plagiarized content, etc.). Use www.copyscape.com
Action: Remove or Canonicalize
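
Some of the classifications above can be pre-filled from the metrics sheet before a human reviews each page. The following is a rough sketch only: the Page fields, the entrance, link, and share thresholds, and the order of the rules are all assumptions made for illustration. Only the 1,500-word figure comes from the guideline above, and using it as the thin-content cut-off is itself an assumption.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    word_count: int
    entrances: int
    external_links: int
    social_shares: int
    conversions: int


MIN_WORDS = 1500        # word-count guideline from the text above
LOW_ENTRANCES = 10      # hypothetical threshold
HIGH_ENTRANCES = 500    # hypothetical threshold
GOOD_LINKS = 20         # hypothetical threshold
GOOD_SHARES = 100       # hypothetical threshold


def classify(page):
    """Suggest a classification; a person should confirm every suggestion."""
    if (page.external_links == 0 and page.social_shares == 0
            and page.entrances <= LOW_ENTRANCES):
        return "Irrelevant content"
    if page.word_count < MIN_WORDS:
        return "Thin content"
    if page.external_links >= GOOD_LINKS and page.social_shares >= GOOD_SHARES:
        return "Influencer page"
    if page.entrances >= HIGH_ENTRANCES and page.conversions == 0:
        return "Prime content"
    return "Unclassified"


# Hypothetical sample rows.
pages = [
    Page("/a", 300, 2, 0, 0, 0),
    Page("/b", 400, 120, 3, 5, 1),
    Page("/c", 2000, 900, 1, 2, 0),
    Page("/d", 2500, 300, 40, 250, 3),
    Page("/e", 2000, 50, 1, 2, 1),
]
labels = {page.url: classify(page) for page in pages}
for url, label in labels.items():
    print(url, label)
```

Quality judgments such as misleading titles, poor grammar, or duplicated text are not covered here; they need manual review or tools like Siteliner and Copyscape.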

Understanding the Actions

Consolidate Content
  • When you have overlapping topics that don’t provide much unique value of their own, but could make a great resource when combined.
  • Mark the page in the set with the best metrics as “Improve” and in the Details column, outline which pages are going to be consolidated into it. This is the canonical page.
  • Mark the pages that are to be consolidated into the canonical page as “Consolidate” and provide further instructions in the Details column, such as:
      • Use portions of this content to round out /canonicalpage/ and then 301 redirect this page into /canonicalpage/
      • Update all internal links
  • Campaign-based or seasonal pages that could be consolidated into a single “Evergreen” landing page (e.g., Best Sellers of 2012 and Best Sellers of 2013 —> Best Sellers).
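
Once pages are marked “Consolidate”, the 301 redirects can be generated from the audit sheet rather than written by hand. The sketch below assumes an Apache server and emits mod_alias Redirect lines; the paths are hypothetical and the output would need adapting for any other server. It also resolves each old page to its final target so that one redirect never points at another.

```python
def resolve(path, mapping):
    """Follow the mapping to the final target, failing on redirect loops."""
    seen = {path}
    while path in mapping:
        path = mapping[path]
        if path in seen:
            raise ValueError(f"redirect loop at {path}")
        seen.add(path)
    return path


def redirect_rules(mapping):
    """Build one 'Redirect 301' line per consolidated page."""
    rules = []
    for old_path in mapping:
        target = resolve(old_path, mapping)
        rules.append(f"Redirect 301 {old_path} {target}")
    return rules


# Hypothetical consolidation sheet: old page -> page it is consolidated into.
consolidations = {
    "/best-sellers-2012/": "/best-sellers-2013/",
    "/best-sellers-2013/": "/best-sellers/",
}

rules = redirect_rules(consolidations)
print("\n".join(rules))
```

Redirecting straight to the final page avoids redirect chains. Internal links still need updating separately, as noted in the list above.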
Content left As-is
  • Important pages which have had their content stolen. Sort by entrances or visits (filtering out any that were already finished)
  • Pages with good traffic, conversions, time on site, etc. that also have good content.
  • These may or may not have any decent external links.

 

Writing up the report

A sample report is given below.

 

As a result of our comprehensive content audit, we are recommending the following, which will be covered in more detail below:

Removal of about 624 pages from Google index by deletion or consolidation:

  • 203 pages were marked for removal with a 404 error (no redirect needed)
  • 110 pages were marked for removal with a 301 redirect to another page
  • 311 pages were marked for consolidation of content into other pages, followed by a redirect to the page into which they were consolidated

Rewriting or improving of 668 pages:

  • 605 product pages are to be rewritten due to use of manufacturer product descriptions (duplicate content), these being prioritized from first to last within the Content Audit.
  • 63 “Other” pages are to be rewritten due to low-quality or duplicate content.

Keeping 226 pages as-is:

  • No rewriting or improvements needed

These changes reflect an immediate need to “improve or remove” content in order to avoid an obvious content-based penalty from Google (e.g. Panda) due to thin, low-quality and duplicate content, especially concerning Representative and Dealers pages with some added risk from Style pages.
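
The headline numbers in a report like the sample above can be tallied from the Action column of the audit sheet, which also checks that the subtotals add up. This is a small sketch: the action labels are hypothetical, and the list is built from the counts in the sample report rather than from a real sheet.

```python
from collections import Counter

# Action labels are hypothetical; the counts are those from the sample report.
audit_rows = (
    ["Remove (404)"] * 203
    + ["Remove (301)"] * 110
    + ["Consolidate"] * 311
    + ["Improve (product page)"] * 605
    + ["Improve (other)"] * 63
    + ["Keep"] * 226
)

counts = Counter(audit_rows)
removed = counts["Remove (404)"] + counts["Remove (301)"] + counts["Consolidate"]
improved = counts["Improve (product page)"] + counts["Improve (other)"]
kept = counts["Keep"]
total = removed + improved + kept

print(f"Remove or consolidate: {removed}")
print(f"Rewrite or improve:    {improved}")
print(f"Keep as-is:            {kept}")
print(f"Pages audited:         {total}")
```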