Indexable and Non-Indexable URLs

Every URL discovered during a Screaming Frog SEO Spider crawl is classified as either indexable or non-indexable.

This matters because any content you want to rank must first be found and then be capable of being indexed. Only then can it appear in search results and draw traffic to your webpages.

After your crawl, check the ‘Indexability’ column in the tabs of the main (top) window.

This column lets you group your indexable and non-indexable URLs and better understand the status of each one.

Indexability of each URL within the crawl

The column shows whether each URL is indexable or non-indexable, that is, whether search engine spiders could crawl it and register it for indexing. It does not show whether the URL is actually indexed.

Indexable – a URL that can be crawled, responds with a 200 status code and is permitted to be indexed. Being indexable does not guarantee that search engines will index the URL; the final decision rests with them. However, a URL that is set up correctly to be crawled and indexed stands a much better chance of being read and indexed than one left to luck.

Non-indexable – URLs that cannot be crawled, do not respond with a 200 status code, or carry an instruction that sends search engines elsewhere or prevents indexing. This includes URLs blocked by robots.txt; timeouts; redirects such as 301, 302, 307 or JavaScript redirects; errors such as 404 (not found) or 500 (internal server error); and URLs with a noindex directive or a canonical pointing to a different URL.
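To make the rules above concrete, here is a minimal Python sketch that classifies a single crawled URL. It is not how Screaming Frog works internally: the `CrawlResult` record, the `classify` function and the reason strings are hypothetical, and JavaScript redirects are not modelled because they cannot be detected from the status code alone.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import urldefrag


@dataclass
class CrawlResult:
    """Hypothetical record of what a crawler saw for one URL (not a Screaming Frog API)."""
    url: str
    status_code: Optional[int]          # None means no response, e.g. a timeout
    blocked_by_robots: bool = False     # True if robots.txt disallows the URL
    meta_robots: str = ""               # robots meta tag content, e.g. "noindex, follow"
    canonical: Optional[str] = None     # href of rel="canonical", if present


def _has_noindex(meta_robots: str) -> bool:
    # Directives are comma-separated and case-insensitive; "none" implies noindex.
    tokens = {t.strip().lower() for t in meta_robots.split(",")}
    return "noindex" in tokens or "none" in tokens


def classify(r: CrawlResult) -> Tuple[str, str]:
    """Return (indexability, reason) using simplified, assumed rules."""
    if r.blocked_by_robots:
        return "Non-Indexable", "Blocked by robots.txt"
    if r.status_code is None:
        return "Non-Indexable", "No response"
    if 300 <= r.status_code < 400:
        return "Non-Indexable", "Redirected"
    if 400 <= r.status_code < 500:
        return "Non-Indexable", "Client error"
    if r.status_code >= 500:
        return "Non-Indexable", "Server error"
    if r.status_code != 200:
        return "Non-Indexable", "Non-200 status"
    if _has_noindex(r.meta_robots):
        return "Non-Indexable", "noindex"
    # A canonical pointing at a different URL (ignoring #fragments) defers to that URL.
    if r.canonical and urldefrag(r.canonical).url != urldefrag(r.url).url:
        return "Non-Indexable", "Canonicalised"
    return "Indexable", ""


pages = [
    CrawlResult("https://example.com/", 200),
    CrawlResult("https://example.com/old", 301),
    CrawlResult("https://example.com/draft", 200, meta_robots="noindex, follow"),
    CrawlResult("https://example.com/?ref=nav", 200, canonical="https://example.com/"),
]
for page in pages:
    print(page.url, *classify(page))
```

The checks run in a fixed order so that each URL gets a single reason, much as the ‘Indexability Status’ column reports one status per URL.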

Why is this important? It lets us see whether there is a problem with a URL or its content, so we can drill down to that specific URL and fix it.

The reason each URL is non-indexable is listed in the ‘Indexability Status’ column.

Non-indexable URLs are itemised separately for easy identification

This column shows why each URL is non-indexable, with reasons ranging from 301 redirects and canonicalised URLs to URLs blocked by robots.txt or carrying a noindex directive.

Refine your search parameters

Enter your search term in the search box [1], making sure you have selected the correct filter tab. The list will then display the matching non-indexable URLs [2], which you can also export [3].
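Once exported, the list can also be filtered and summarised outside the tool. The Python sketch below assumes the export is a CSV with ‘Address’, ‘Indexability’ and ‘Indexability Status’ columns and uses ‘Non-Indexable’ as the flag value; check these against the header row of your own export before relying on it. The inline `SAMPLE` rows are invented for illustration, and `load_export` is a hypothetical helper for reading a real file.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def non_indexable_urls(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (address, reason) for every row flagged as non-indexable."""
    return [
        (row["Address"], row["Indexability Status"])
        for row in rows
        if row.get("Indexability") == "Non-Indexable"  # assumed flag value
    ]


def load_export(path: str) -> List[Dict[str, str]]:
    # utf-8-sig strips a byte-order mark if the CSV was saved with one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


# Invented rows standing in for a real export file.
SAMPLE = """Address,Status Code,Indexability,Indexability Status
https://example.com/,200,Indexable,
https://example.com/old,301,Non-Indexable,Redirected
https://example.com/draft,200,Non-Indexable,noindex
https://example.com/?ref=nav,200,Non-Indexable,Canonicalised
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
flagged = non_indexable_urls(rows)
for address, reason in flagged:
    print(address, reason)

# Count URLs per reason to see which problem is most common.
print(Counter(reason for _, reason in flagged))
```

Grouping by reason is a quick way to decide what to fix first, for example a large number of redirected internal links versus a handful of accidental noindex tags.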

Non-indexable URLs filtered for easier analysis and exporting

You can also find non-indexable URLs in the right-hand pane. Scrolling through the ‘Response Codes’ section lets you see which URLs cannot be indexed and need to be fixed.

Checking for non-indexable URLs via the right-hand window pane

Other factors to be aware of

Indexability does not tell you whether Google will or will not index a URL. Ultimately, Google decides what it indexes.

For example, canonicals are hints rather than directives.

The aim of the indexability column is to show you what search engines are likely to do with your URLs. A URL marked as canonicalised, for instance, carries a canonical tag telling search engines that you would prefer a different URL to be indexed in its place.

Whether Google honours that canonical and consolidates link signals onto the URL you specified is entirely at Google’s discretion.

However, it is better to have these signals in place than to leave your site unstructured and hope search engines are smart enough to work it out. Sometimes they are and sometimes they are not, and they also have a limited crawl budget, so they will not spend unlimited time crawling your site if you do not help them.

In short, the signals behind indexability (status codes, canonicals and robots directives) are how you tell search engines the way you wish your website to be read.

You can also check URLs manually in Google Search Console, one at a time, but this is too time-consuming for more than a handful of pages. The Screaming Frog SEO Spider provides a much quicker way to run through the whole site, even if it is not a perfect predictor of what Google will index.

Manually entering a URL into the Search Console search box to verify it

Further support

The guide above should help illustrate the simple steps required to get started with the Screaming Frog SEO Spider.

You can read more about how URL paths are matched against rules in Google’s robots.txt specification.
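As a rough illustration of that matching, Google’s specification treats `*` as a wildcard for any sequence of characters, a trailing `$` as the end of the URL, and applies the most specific (longest) matching rule, preferring the less restrictive rule when an allow and a disallow rule are equally long. The Python sketch below implements only those points; it ignores user-agent groups, percent-encoding and other details, and `is_allowed` and the sample rules are hypothetical. The path passed in should include any query string, since rules are matched against the path and query together.

```python
import re
from typing import List, Optional, Tuple


def _rule_regex(rule_path: str) -> "re.Pattern[str]":
    """Translate a rule path to a regex: '*' matches anything, trailing '$' anchors the end."""
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(pattern + ("$" if anchored else ""))


def is_allowed(url_path: str, rules: List[Tuple[str, str]]) -> bool:
    """Apply the longest matching rule; on a length tie, allow beats disallow."""
    best: Optional[Tuple[int, bool]] = None
    for directive, rule_path in rules:
        if not rule_path:
            continue  # an empty rule value matches nothing
        # re.match anchors at the start, so rules match from the beginning of the path.
        if _rule_regex(rule_path).match(url_path):
            candidate = (len(rule_path), directive.lower() == "allow")
            # Tuples compare length first, then True (allow) beats False (disallow).
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


rules = [
    ("disallow", "/private/"),
    ("allow", "/private/public-page"),
    ("disallow", "/*.pdf$"),
]
for path in ["/private/secret", "/private/public-page", "/docs/guide.pdf", "/docs/guide.pdf?v=2", "/"]:
    print(path, is_allowed(path, rules))
```

Here `/private/public-page` stays crawlable because the longer allow rule outranks the shorter disallow rule, and `/docs/guide.pdf?v=2` is not caught by `/*.pdf$` because the `$` requires the URL to end in `.pdf`.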

For more information, check out the videos on YouTube.

Likewise, if you have any further questions, please get in touch via our contact page.