Screaming Frog Data | URL Profiler Data Example
Crawling the site
- Take inventory of all content available for indexation
- Crawl the site with Screaming Frog or Xenu
- Export and duplicate the crawled list
- Separate all URLs with robots meta or X-Robots-Tag noindex directives
- Separate the other URLs based on server status errors
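The separation steps above can be sketched in a few lines of Python. This is only an illustration: the column names (`Address`, `Status Code`, `Meta Robots 1`, `X-Robots-Tag 1`) are assumptions modelled on a typical crawler export and should be renamed to match the headers in your own file. Treating 4xx, 5xx, and missing status codes as "errors" is likewise an assumption.

```python
import csv
import io

# Assumed column names; adjust them to the headers of your own export.
URL_COL = "Address"
STATUS_COL = "Status Code"
ROBOTS_COLS = ("Meta Robots 1", "X-Robots-Tag 1")


def split_crawl(rows):
    """Split crawl rows into noindex, error, and other URL lists."""
    buckets = {"noindex": [], "error": [], "other": []}
    for row in rows:
        url = row[URL_COL]
        # Join the robots columns so one check covers meta and header.
        robots = " ".join(row.get(col) or "" for col in ROBOTS_COLS).lower()
        status = int(row.get(STATUS_COL) or 0)
        if "noindex" in robots:
            buckets["noindex"].append(url)
        elif status >= 400 or status == 0:
            # Assumption: 4xx, 5xx, and missing codes count as errors.
            buckets["error"].append(url)
        else:
            buckets["other"].append(url)
    return buckets


# Tiny inline sample standing in for the exported crawl file.
SAMPLE = """Address,Status Code,Meta Robots 1,X-Robots-Tag 1
https://example.com/,200,,
https://example.com/tag/a,200,"noindex, follow",
https://example.com/pdf,200,,noindex
https://example.com/gone,404,,
https://example.com/boom,500,,
"""

buckets = split_crawl(csv.DictReader(io.StringIO(SAMPLE)))
for name, urls in buckets.items():
    print(name, len(urls))
```

For a real export, replace `io.StringIO(SAMPLE)` with an opened file handle for the CSV.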
Understanding Sheet Data (Actionable metrics)
Gather
- Export the “Internal All” file from Screaming Frog
- Combine this sheet with additional indexable URLs found outside of the crawl via GSC, GA, and elsewhere
- Now use this sheet as a seed list in URL Profiler
- See the image below for the URL Profiler settings.
- Give URL Profiler your Google Analytics and Google Search Console (formerly Webmaster Tools) credentials
- Give URL Profiler the Google PageSpeed Insights API key (Info).
- Use the same credentials as for Google Analytics and Search Console
- The Google indexation check needs around 10 proxy servers; otherwise Google will block the IP address and the results will be incorrect
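Combining the crawl export with URLs found in GSC, GA, and elsewhere is mostly a de-duplication exercise. A minimal sketch follows, assuming each source has already been reduced to a plain list of URLs. The normalization rules (lower-casing scheme and host, dropping fragments) are an assumption, not something URL Profiler requires.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lower-case scheme and host and drop the fragment so duplicates match."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def build_seed_list(*sources):
    """Merge several URL lists, keeping first-seen order and dropping repeats."""
    seen = set()
    seeds = []
    for source in sources:
        for url in source:
            if not url.strip():
                continue  # skip blank rows from spreadsheet exports
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                seeds.append(key)
    return seeds


# Hypothetical inputs standing in for the crawl, GSC, and GA exports.
crawl_urls = ["https://example.com/", "https://example.com/shoes"]
gsc_urls = ["https://EXAMPLE.com/shoes", "https://example.com/old-page"]
ga_urls = ["https://example.com/old-page#reviews", "https://example.com", ""]

seeds = build_seed_list(crawl_urls, gsc_urls, ga_urls)
print("\n".join(seeds))
```

The resulting list can be saved one URL per line and used as the seed list.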
What to track? | Tool or notes |
Indexed or not | A non-indexed URL is often a sign of an uncrawled or low-quality page. |
Content uniqueness | Copyscape, Siteliner, and now URL Profiler can provide this data |
Organic Search Traffic | Typically the last 90 days |
Conversion Tracking | Revenue and/or conversions |
Publish date | Discover stale content |
Internal links | Ensure the most important pages have the most internal links |
External links | These can come from Moz, SEMrush, and a variety of other tools |
Time on Page | Landing pages resulting in low time-on-site |
Bounce Rate | Landing pages resulting in low pages-per-visit |
Page Speed | Page speed and mobile-friendliness, measured with Google's tools |
Page HTTP Status | Response code 2xx, 3xx, 4xx, 5xx |
Page Canonical tags | |
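One way to keep these metrics together is a single record per URL. The sketch below is illustrative only: the field names are invented for this example, and the 365-day staleness cutoff is an assumed placeholder, not a value from the audit process.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PageMetrics:
    """One row of the audit sheet; field names are illustrative only."""
    url: str
    indexed: bool
    http_status: int
    organic_sessions_90d: int
    conversions: int
    internal_links: int
    external_links: int
    bounce_rate: float
    publish_date: Optional[date] = None
    canonical: Optional[str] = None

    def status_class(self) -> str:
        """Return the response-code family, e.g. 404 -> '4xx'."""
        return f"{self.http_status // 100}xx"

    def is_stale(self, today: date, max_age_days: int = 365) -> bool:
        """True when the publish date is older than the assumed cutoff."""
        if self.publish_date is None:
            return False
        return (today - self.publish_date).days > max_age_days


page = PageMetrics(
    url="https://example.com/guide",
    indexed=True,
    http_status=200,
    organic_sessions_90d=120,
    conversions=3,
    internal_links=14,
    external_links=2,
    bounce_rate=0.62,
    publish_date=date(2015, 3, 1),
)
print(page.status_class(), page.is_stale(today=date(2017, 3, 1)))
```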
General Content & Auditing
We first classify the site content into the general categories below, then follow the procedure listed for each category to improve those pages. Once this is done, we move on to part two, which targets specific issues.
Content Classification | How to Decide this category | Action | How to decide Action & exact procedure |
Trust Building Pages | These are pages like | Always Improve OR Create | Every site must have these pages. |
Editorial content | If the site has a blog, all of its posts and articles come under this category. | Improve, Keep, or Remove | |
Archive Pages | These are pages auto-generated by many CMSs. They could be category pages, tag pages, author archives, etc. | Page Redesign | As these are auto-generated, they may have duplicate content. We almost always have to redesign these pages so that duplicate content is minimized. |
Sales Pages | All product pages where users can make a purchase | Improve | |
External Guest Posts | These are articles you have written elsewhere in hopes of getting backlinks. | | |
Internal Guest Posts | These are articles that someone else has written on your site. | | |
Stub Content | (e.g., “No content is here yet, but if you sign in and leave some user-generated-content, then we’ll have content here for the next guy.” By the way, want our newsletter? Click an AD!) | | |
Curated Content | Composed almost entirely of bits and pieces of content that exist elsewhere. | | |
Media & Related Pages | | | |
Special issues & Fixes
This part is for content that has specific issues.
Content Classification | How to Decide this category | Action | How to decide Action & exact procedure |
Prime Content | Pages with high visits/entrances but low conversion, time-on-site, page views per session, etc. | Improve & Analyse | These pages are a really good place to add A/B testing and take the optimization to the next level. |
Influencer Pages | Pages with good link and social metrics. | Analyse | These pages are a really good place to learn what is working on your website. Find out why these pages are getting more shares and likes, and how to repeat this formula for other pages. |
Thin Content pages | Pages that gloss over the topic, have too few words, or are entirely image-based. | Improve, Remove or Consolidate | |
Misleading SEO | Titles or keywords targeting queries for which content doesn’t answer or deserve to rank. Generally not providing the information the visitor was expecting to find. | Improve | |
Low Quality pages | Poor grammar, written primarily for search engines (includes keyword stuffing), unhelpful, inaccurate | Improve OR Remove | |
Irrelevant content | | | |
Internally duplicated | | Remove or Canonicalize | |
Externally duplicated | | Remove or Canonicalize | |
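Some of the categories in this table can be flagged automatically from the sheet data before a human review. The following is a rough sketch, not a definitive rule set. Every threshold is a placeholder chosen for illustration, and the record keys (`word_count`, `entrances`, `conversions`, `external_links`, `social_shares`) are hypothetical names for columns in your own sheet.

```python
# All thresholds are illustrative placeholders; tune them on your own data.
MIN_WORDS = 300
HIGH_ENTRANCES = 500
LOW_CONVERSION_RATE = 0.01
HIGH_LINKS_AND_SHARES = 50

# Actions copied from the table for the categories handled here.
ACTIONS = {
    "Prime Content": "Improve & Analyse",
    "Influencer Pages": "Analyse",
    "Thin Content pages": "Improve, Remove or Consolidate",
}


def classify(page):
    """Return the special-issue labels that apply to one page record."""
    labels = []
    entrances = page["entrances"]
    rate = page["conversions"] / entrances if entrances else 0.0
    if entrances >= HIGH_ENTRANCES and rate < LOW_CONVERSION_RATE:
        labels.append("Prime Content")
    if page["external_links"] + page["social_shares"] >= HIGH_LINKS_AND_SHARES:
        labels.append("Influencer Pages")
    if page["word_count"] < MIN_WORDS:
        labels.append("Thin Content pages")
    return labels


# Hypothetical page records for demonstration.
pages = [
    {"url": "https://example.com/landing", "word_count": 900,
     "entrances": 2000, "conversions": 4,
     "external_links": 3, "social_shares": 5},
    {"url": "https://example.com/infographic", "word_count": 120,
     "entrances": 10, "conversions": 0,
     "external_links": 40, "social_shares": 60},
]

for page in pages:
    for label in classify(page):
        print(page["url"], "->", label, "->", ACTIONS[label])
```

A page can receive more than one label. Categories such as Misleading SEO or Low Quality pages still need manual judgment and are left out on purpose.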
Understanding the Actions
Consolidate Content | | | |
Content left As-is | | | |
Writing up the report
An example report summary is given below.
As a result of our comprehensive content audit, we are recommending the following, which will be covered in more detail below:
- Removal of about 624 pages from Google's index by deletion or consolidation:
  - 203 pages were marked for Removal with a 404 error (no redirect needed)
  - 110 pages were marked for Removal with a 301 redirect to another page
  - 311 pages were marked for Consolidation of content into other pages
    - Followed by a redirect to the page into which they were consolidated
- Rewriting or improving of 668 pages
  - 605 Product Pages are to be rewritten because they use manufacturer product descriptions (duplicate content); these are prioritized from first to last within the Content Audit.
  - 63 “Other” pages are to be rewritten due to low-quality or duplicate content.
- Keeping 226 pages as-is
  - No rewriting or improvements needed
These changes reflect an immediate need to “improve or remove” content in order to avoid an obvious content-based penalty from Google (e.g. Panda) due to thin, low-quality and duplicate content, especially concerning Representative and Dealers pages with some added risk from Style pages.
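The headline numbers in a summary like this can be tallied directly from the action column of the audit sheet. Here is a small sketch, assuming each page has been given exactly one action label. The label names are hypothetical, and the counts simply reproduce the figures quoted in the example report.

```python
from collections import Counter

# Hypothetical action labels; the counts mirror the example report.
actions = (
    ["remove_404"] * 203
    + ["remove_301"] * 110
    + ["consolidate"] * 311
    + ["rewrite_product"] * 605
    + ["rewrite_other"] * 63
    + ["keep"] * 226
)

counts = Counter(actions)
removed = counts["remove_404"] + counts["remove_301"] + counts["consolidate"]
rewritten = counts["rewrite_product"] + counts["rewrite_other"]
total = sum(counts.values())

print(f"Remove or consolidate: {removed}")
print(f"Rewrite or improve: {rewritten}")
print(f"Keep as-is: {counts['keep']}")
print(f"Total audited: {total}")
```

Checking that the subtotals add up (203 + 110 + 311 = 624 and 605 + 63 = 668) before sending the report helps catch pages that were left without an action.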