Troubleshooting Non-Indexable URLs In Screaming Frog

Recently I came across an issue that took me a while to solve.

I was doing a site migration (same domain, new URLs) and ran a Screaming Frog crawl across the new site when I noticed something a little odd.

All the new HTML URLs that returned a 200 status code and a status of ‘OK’ were being reported as non-indexable. The URLs were canonicalised. I scratched my head.

Reasons for non-indexable URLs in Screaming Frog

According to Screaming Frog, a URL is reported as non-indexable for one of the following reasons. The URLs:

  • Are blocked by robots.txt
  • Give No Response
  • Redirect (3XX, meta refresh, or JavaScript redirect)
  • Give Client Error (4XX)
  • Give Server Error (5XX)
  • Are Noindex (or ‘None’)
  • Are Canonicalised
  • Are Nofollowed

I checked everything. The robots.txt file wasn’t blocking the URLs, there were no redirects, meta refreshes or client errors, noindex tags weren’t present, and nofollow tags weren’t present.
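The checklist above can also be walked through in code. What follows is only a rough sketch in Python: the `CrawlFacts` container, its field names and the `non_indexable_reasons` function are hypothetical names invented for illustration, and the logic simply mirrors the list above rather than Screaming Frog's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrawlFacts:
    """Hypothetical container for what a crawl observed about one URL."""
    url: str
    status_code: Optional[int] = 200   # None means the server gave no response
    blocked_by_robots: bool = False
    non_http_redirect: bool = False    # meta refresh or JavaScript redirect
    meta_robots: str = ""              # e.g. "noindex, follow"
    canonical: Optional[str] = None    # href of the canonical link element


def non_indexable_reasons(facts: CrawlFacts) -> List[str]:
    """Return every reason from the checklist that applies to this URL.

    An empty list means none of the listed reasons was found.
    """
    reasons = []
    if facts.blocked_by_robots:
        reasons.append("Blocked by robots.txt")

    code = facts.status_code
    if code is None:
        reasons.append("No Response")
    elif 300 <= code < 400 or facts.non_http_redirect:
        reasons.append("Redirected")
    elif 400 <= code < 500:
        reasons.append("Client Error")
    elif 500 <= code < 600:
        reasons.append("Server Error")

    directives = {d.strip().lower() for d in facts.meta_robots.split(",")}
    if directives & {"noindex", "none"}:
        reasons.append("Noindex")
    if "nofollow" in directives:
        reasons.append("Nofollowed")

    # A canonical pointing at any other URL marks the page as canonicalised.
    if facts.canonical and facts.canonical != facts.url:
        reasons.append("Canonicalised")
    return reasons
```

A page that returns 200 and passes every other check, but whose canonical differs from its own URL in any way (including the scheme), comes back with "Canonicalised" as the only reason.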

Puzzling. And troubling.

That only left a potential issue with the canonical link element.

I double-checked the canonical links on a couple of the pages, and finally spotted the answer.

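The new canonical tags were referencing ‘HTTP’ rather than ‘HTTPS’. Because a redirect was in place from HTTP to HTTPS for all URLs, each page's canonical pointed at a different URL (one that redirected) rather than at the page itself, so the SEO Spider flagged the URLs as canonicalised and therefore non-indexable.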

The canonical tags were changed to HTTPS and voilà, problem solved!
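This kind of mismatch is easy to miss by eye, so it can help to script the check. The sketch below uses only Python's standard library. The names `CanonicalCollector` and `canonical_scheme_mismatches` and the example.com URLs are made up for illustration. It assumes you already have the page's HTML as a string and that the canonical href is an absolute URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalCollector(HTMLParser):
    """Collects the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def canonical_scheme_mismatches(page_url, html):
    """Return canonical hrefs that match page_url's host and path
    but use a different scheme (for example http instead of https)."""
    parser = CanonicalCollector()
    parser.feed(html)
    page = urlsplit(page_url)
    mismatches = []
    for href in parser.canonicals:
        target = urlsplit(href)
        same_resource = (
            target.netloc.lower() == page.netloc.lower()
            and target.path == page.path
        )
        if same_resource and target.scheme.lower() != page.scheme.lower():
            mismatches.append(href)
    return mismatches


# Hypothetical page: served over HTTPS, but its canonical says HTTP.
html = '<head><link rel="canonical" href="http://example.com/new-page/"></head>'
print(canonical_scheme_mismatches("https://example.com/new-page/", html))
```

Running this over the HTML of a few migrated pages would surface any canonical that still points at the HTTP version of the same page.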

Comments (2)

  • Robbert
    12th July 2021, 8:30 pm

    Hi Simon,
    I’ve bumped into this post searching for this exact same issue. The odd thing is that the canonical tags are referencing https. So there should not be a redirect issue right? What I don’t understand is that every non-indexable / canonicalised page has two canonical link elements. Link element 1 is referencing to the Homepage. Link element 2 is self referencing. Any thoughts on this? thx

    • Simon Heyes
      27th July 2021, 3:36 pm

      Hi Robbert, that does sound a bit weird. In fact, that would be the reason Screaming Frog is flagging an issue. A page should only have one canonical tag. If a page has multiple canonical tags, Google will ignore all of them (see more here > https://webmasters.googleblog.com/2013/04/5-common-mistakes-with-relcanonical.html). Try just the self-referencing canonical & see if that works 🙂
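The situation described in this thread (two canonical link elements on one page) can be detected with a short script. This is a minimal sketch using Python's standard library. `CanonicalLinkParser`, `diagnose_canonicals` and the returned labels are hypothetical names chosen for illustration, not anything Screaming Frog or Google exposes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalLinkParser(HTMLParser):
    """Records the href of each <link rel="canonical"> in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens:
            self.hrefs.append(attr_map.get("href") or "")


def diagnose_canonicals(page_url, html):
    """Classify a page's canonical setup with an illustrative label."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    # Resolve relative hrefs against the page URL before comparing.
    targets = [urljoin(page_url, href) for href in parser.hrefs]
    if not targets:
        return "no canonical tag"
    if len(targets) > 1:
        return "multiple canonical tags"
    if targets[0] == page_url:
        return "self-referencing"
    return "canonicalised elsewhere"
```

A page carrying one canonical to the homepage and one to itself would be reported as "multiple canonical tags", which is the case to fix first.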
