Recently I came across an issue which took me a while to find an answer to.
I was doing a site migration (same domain, new URLs) and ran a Screaming Frog crawl across the new site when I noticed something a little odd.
All the new html URLs that had a status code of 200 and had a status of ‘OK‘ seemed to be non-indexable. The URLs were canonicalised. I scratched my head.
According to Screaming Frog, a non-indexable URL indexability status can be attributed to one of the following issues. The URLs:
- Are blocked by robots.txt.
- Give no Response.
- Give Client Error (4XX)
- Give Server Error (5XX)
- Are Noindex (or ‘None’)
- Are Canonicalised
- Are Nofollowed
I checked everything. The robots.txt file wasn’t blocking the URLs, there were no redirects, meta refreshes or client errors, noindex tags weren’t present, and nofollow tags weren’t present.
Puzzling. And troubling.
That only left a potential issue with the canonical link element.
I doubled checked the canonical links on a couple of the pages, and finally spotted the answer.
The new canonical tags were referencing ‘HTTP’ rather than ‘HTTPS’. As a redirect was in place from HTTP to HTTPs for all URLs, this led the SEO spider to flag the URLs as non-indexable.
The canonical tags were changed to HTTPS and voila, problem solved!