Troubleshooting Non-Indexable URLs In Screaming Frog

Recently I came across an issue which took me a while to find an answer to.

I was doing a site migration (same domain, new URLs) and ran a Screaming Frog crawl across the new site when I noticed something a little odd.

All the new HTML URLs that returned a 200 status code and a status of ‘OK’ were being reported as non-indexable. The URLs were canonicalised. I scratched my head.

Reasons for non-indexable URLs in Screaming Frog

According to Screaming Frog, a URL is reported as non-indexable for one of the following reasons. The URLs:

  • Are blocked by robots.txt
  • Give No Response
  • Redirect (3XX, meta refresh, or JavaScript redirect)
  • Give Client Error (4XX)
  • Give Server Error (5XX)
  • Are Noindex (or ‘None’)
  • Are Canonicalised
  • Are Nofollowed
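
The list above can be turned into a rough triage helper. This is a minimal sketch and not Screaming Frog's own logic: the function name `indexability_reasons` and its parameters are made up for illustration, the nofollow case is left out for brevity, and the caller is assumed to have already collected the status code, robots.txt result, meta robots value and canonical URL.

```python
def indexability_reasons(status_code, url, blocked_by_robots=False,
                         meta_robots="", canonical=None):
    """Return the reasons a URL might be treated as non-indexable.

    Hypothetical helper: status_code is None when there was no response,
    meta_robots is the raw content of the meta robots tag, and canonical
    is the absolute URL found in the canonical link element (if any).
    """
    reasons = []
    if blocked_by_robots:
        reasons.append("Blocked by robots.txt")
    if status_code is None:
        reasons.append("No Response")
    elif 300 <= status_code < 400:
        reasons.append("Redirected")
    elif 400 <= status_code < 500:
        reasons.append("Client Error")
    elif 500 <= status_code < 600:
        reasons.append("Server Error")
    # Meta robots directives are comma separated and case insensitive.
    directives = {d.strip().lower() for d in meta_robots.split(",")}
    if directives & {"noindex", "none"}:
        reasons.append("Noindex")
    # A canonical that points anywhere other than the page itself.
    if canonical and canonical != url:
        reasons.append("Canonicalised")
    return reasons
```

An empty list means none of the checks fired, so a 200 page with a self-referencing canonical comes back clean.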

I checked everything. The robots.txt file wasn’t blocking the URLs, there were no redirects, meta refreshes or client errors, noindex tags weren’t present, and nofollow tags weren’t present.

Puzzling. And troubling.

That only left a potential issue with the canonical link element.

I double-checked the canonical links on a couple of the pages, and finally spotted the answer.

The new canonical tags were referencing ‘HTTP’ rather than ‘HTTPS’ URLs. As a redirect was in place from HTTP to HTTPS for all URLs, each canonical pointed at a URL that redirected instead of at the page itself, which led the SEO Spider to flag the URLs as non-indexable.
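
A check like this can be scripted for a handful of pages. The sketch below uses only the Python standard library and assumes the page HTML has already been downloaded; the class `CanonicalParser`, the function `check_canonical` and the example.com URLs are illustrative names, not part of Screaming Frog.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def check_canonical(page_url, html):
    """Classify the canonical link element(s) found in html."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.canonicals:
        return "no canonical"
    if len(parser.canonicals) > 1:
        return "multiple canonicals"
    # Resolve relative hrefs against the page URL before comparing.
    canonical = urljoin(page_url, parser.canonicals[0])
    if canonical == page_url:
        return "self-referencing"
    page, target = urlsplit(page_url), urlsplit(canonical)
    # Same host, path and query but a different scheme (http vs https).
    if (page.netloc, page.path, page.query) == (target.netloc, target.path, target.query):
        return "scheme mismatch"
    return "canonicalised elsewhere"


html = '<head><link rel="canonical" href="http://example.com/page/"></head>'
print(check_canonical("https://example.com/page/", html))
```

The sketch does not fetch anything or follow redirects. It only compares the canonical href with the page URL, which is enough to surface an HTTP canonical on an HTTPS page.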

The canonical tags were changed to HTTPS and voila, problem solved!

Comments(9)

  • Robbert
    12th July 2021, 8:30 pm

    Hi Simon,
    I bumped into this post while searching for this exact same issue. The odd thing is that the canonical tags are referencing https, so there should not be a redirect issue, right? What I don’t understand is that every non-indexable / canonicalised page has two canonical link elements. Link element 1 references the homepage. Link element 2 is self-referencing. Any thoughts on this? thx

    • Simon Heyes
      27th July 2021, 3:36 pm

      Hi Robbert, that does sound a bit weird. In fact, that would be the reason Screaming Frog is flagging the pages. A page should only have one canonical tag – if a page has multiple canonical tags, then Google will likely ignore all of them (see more here > https://webmasters.googleblog.com/2013/04/5-common-mistakes-with-relcanonical.html). Try keeping just the self-referencing canonical & see if that works 🙂

    • 22nd August 2021, 7:17 pm

      Hi Robbert and Simon,

      Just want to say, I am doing an audit of a website and encountering this issue, too. I found that in some cases my site has redirect chains of up to 6 hops, which I believe is why some of the URLs, although 200/OK and canonicalized, were marked as non-indexable.

      Thanks, Simon, for starting a discussion about this. Definitely a super interesting topic!

      • Simon Heyes
        31st August 2021, 11:40 am

        Thanks Lazarina, that would make sense! Hopefully the blog post or your comment will help clear up any issues. Thanks for commenting 🙂

  • Ben Carew
    8th June 2022, 12:50 am

    Hi Simon
    I have the exact same issue, and yet the site in question appears to be successfully indexed on Google nonetheless. Did you spot any actual issues arising from this problem (the http instead of https in the canonical tag), or was it just about fixing a reporting problem inside Screaming Frog?
    Thanks in advance.
    Ben

    • Simon Heyes
      16th June 2022, 11:59 am

      Hi Ben, thanks for commenting. Interestingly, some of the pages were indexed on Google as well, but for best-practice purposes I wanted everything set up correctly, so I made sure the canonical tag issue was fixed (it wasn’t an issue with Screaming Frog). Cheers, Simon

  • Mags
    11th July 2022, 7:55 pm

    I’m having this exact problem with several of my employer’s sites. 90-99% of the URLs are non-indexable and canonicalised, but also OK and 200 (according to the Screaming Frog tool). However, we do rank in Google.

    The only difference I notice is the / at the end of the URL. Could that be it? It’s not HTTP(S).
    Frog also says:
    Meta Robots 1 – ALL
    Meta Robots 1 – ALL,INDEX,FOLLOW

    Could it still affect SEO?

    • Simon Heyes
      18th July 2022, 9:58 am

      Hmm, strange one. If the content is being indexed, then that’s a good start. Yes, check the trailing slash to see if that is the cause.

  • MAGS
    11th July 2022, 8:11 pm

    One other thing: I checked about 10 sites for this issue, from competitors to very big sites, and none seem to have it. The exceptions are the site of the (large and successful) company that created/hosts our sites and the site of their marketing partner (who’s also responsible for a lot of our content)!