Indexing & Crawling Issues — IndxQ SEO

04 // Crawling

Indexing & Crawling

Diagnose and fix every GSC coverage status — from “Discovered, currently not indexed” to soft 404s, canonical conflicts, robots.txt blocks, and sitemap errors that are silently keeping your pages out of Google’s index.

11 in-depth guides

Updated March 2026

GSC coverage focus

⚙ GSC Coverage Status Reference

Status	Meaning	Badge
Indexed (appears in search results)	Successfully crawled and indexed.	Indexed
Discovered – currently not indexed	Found via sitemap or link. Not yet crawled.	Warn
Crawled – currently not indexed	Crawled but Google chose not to index.	Warn
Duplicate – submitted URL not selected	Canonical points elsewhere.	Warn
Blocked by robots.txt	Googlebot explicitly blocked.	Error
Soft 404	Returns 200 but content is thin/empty.	Error
Page with redirect	URL redirects to another destination.	Info

// Each status has a distinct cause and fix. Scroll down for the complete breakdown of every coverage state.

How It Works

The Googlebot Crawl-to-Index Pipeline

Before you can fix an indexing problem, you need to understand where in the pipeline your pages are breaking down. Each stage has distinct failure modes.

// Googlebot’s path from URL discovery to appearing in search results

1. 🔗 URL Discovery: Sitemap / Links
2. 🤖 robots.txt Check: Can fail here
3. 📥 Crawl Queue: Budget limits
4. 🔍 Rendering & Parsing: JS issues here
5. 📊 Indexing Decision: Quality filter

Most pages stall at stage 3 (crawl queue) or stage 5 (indexing decision). “Discovered not indexed” = stalled at queue. “Crawled not indexed” = failed at indexing decision. These require completely different fixes.
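As a rough illustration of the stage logic above, the sketch below maps three observations about a URL to the stage where it most likely stalled. The function name and its boolean inputs are hypothetical (they are not part of any Google API); the stage numbers follow the five-stage pipeline described in this section.

```python
def stalled_stage(blocked_by_robots: bool, crawled: bool, indexed: bool) -> str:
    """Guess where a discovered URL stalled in the crawl-to-index pipeline."""
    if indexed:
        return "none: page is indexed"
    if blocked_by_robots:
        return "stage 2: robots.txt check"
    if not crawled:
        # Corresponds to "Discovered, currently not indexed".
        return "stage 3: crawl queue"
    # Corresponds to "Crawled, currently not indexed".
    # Rendering problems (stage 4) can also surface here.
    return "stage 5: indexing decision"


if __name__ == "__main__":
    print(stalled_stage(blocked_by_robots=False, crawled=False, indexed=False))
    print(stalled_stage(blocked_by_robots=False, crawled=True, indexed=False))
```

The point of keeping the two "not indexed" branches separate is that they lead to different fixes: one is about discovery and crawl priority, the other about content.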

Status Deep-Dives

Every GSC Coverage Status, Explained & Fixed

Google Search Console reports 15+ distinct coverage statuses. Here are the most impactful — the ones that account for 90% of indexing problems across real sites.

Discovered — Currently Not Indexed GSC → Coverage → Excluded

Google has found the URL (via sitemap or internal link) but hasn’t crawled it yet. This is usually a crawl budget problem — Google decided the page wasn’t worth prioritising. It can also indicate low internal link equity pointing to the page, making it appear low-priority. Critically, this is NOT a quality issue — Google hasn’t seen the content yet.

crawl budget exhausted · weak internal linking · deep page depth (4+ clicks) · new page, not yet crawled · sitemap only, no internal links

Fix: Improve internal linking, submit URL in GSC, reduce crawl budget waste on low-value pages
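To find pages that are buried too deep or have no internal links at all, one option is a breadth-first search over your internal link graph. This is a minimal sketch, assuming you already have the graph as a dictionary (for example from a crawler export); the function names and the sample URLs are hypothetical, and the default of 3 clicks mirrors the depth threshold mentioned on this page.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search: minimum number of clicks from the homepage."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def deep_or_orphaned(links, home, sitemap_urls, max_depth=3):
    """Sitemap URLs deeper than max_depth, or not reachable by links at all."""
    depths = click_depths(links, home)
    # Unreachable (orphaned) URLs get a depth just past the limit.
    return sorted(u for u in sitemap_urls if depths.get(u, max_depth + 1) > max_depth)


if __name__ == "__main__":
    site = {
        "/": ["/blog/"],
        "/blog/": ["/blog/a/"],
        "/blog/a/": ["/blog/b/"],
        "/blog/b/": ["/blog/c/"],
    }
    print(deep_or_orphaned(site, "/", ["/blog/a/", "/blog/c/", "/orphan/"]))
```

URLs returned by `deep_or_orphaned` are the first candidates for new internal links from shallower pages.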

Crawled — Currently Not Indexed GSC → Coverage → Excluded

Google crawled the page and actively decided not to index it. This is a content quality signal — Google evaluated the page and determined it didn’t meet the threshold for inclusion in the index. Common on thin content, near-duplicate pages, and pages with low E-E-A-T signals. This is harder to fix than “discovered not indexed” because quality improvement is required.

thin / low-value content · near-duplicate content · poor E-E-A-T signals · no unique added value · auto-generated pages

Fix: Substantially improve content depth, uniqueness, and E-E-A-T signals — or noindex and consolidate
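Google does not publish how it detects near-duplicates, so the sketch below uses a common stand-in technique: word shingles compared with Jaccard similarity. It is only meant to help you spot pairs of your own pages that are candidates for consolidation; the function names and the shingle size of 3 are assumptions.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping word n-grams ("shingles") in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of two texts: 1.0 = same shingles, 0.0 = none shared."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty (or very short) pages are treated as identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    page_a = "buy red running shoes online today"
    page_b = "buy blue running shoes online today"
    print(round(jaccard(page_a, page_b), 2))
```

Pairs with a high score are worth reviewing by hand before you decide to merge, rewrite, or noindex one of them.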

Duplicate — Submitted URL Not Selected as Canonical GSC → Coverage → Excluded

You submitted a URL in your sitemap, but Google chose a different URL as the canonical version. This typically means your canonical tags are conflicting, you have multiple URLs serving similar content (www vs non-www, trailing slash vs no slash, HTTP vs HTTPS, or parameter variations), and Google is applying its own canonical judgement over yours.

www / non-www conflict · trailing slash inconsistency · missing canonical tag · canonical tag points elsewhere · URL parameters creating dupes

Fix: Add consistent self-canonicals, enforce one URL variant via 301 redirect, clean up URL parameters
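The URL variants listed above can be collapsed to one preferred form before you compare sitemap entries, canonicals, and internal links. This is a minimal sketch, assuming HTTPS, a non-www host, and trailing slashes as the preferred variant; the function name and the list of tracking parameters are hypothetical and should be adapted to your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that never change page content on this site.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def preferred_url(url: str, use_www: bool = False) -> str:
    """Normalise scheme, host, trailing slash, and tracking parameters."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if use_www else bare

    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # Add a trailing slash unless the last segment looks like a file name.
    if not path.endswith("/") and "." not in last_segment:
        path += "/"

    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit(("https", host, path, urlencode(kept), ""))


if __name__ == "__main__":
    variants = [
        "http://www.yoursite.com/exact-url",
        "https://yoursite.com/exact-url/",
        "https://yoursite.com/exact-url/?utm_source=newsletter",
    ]
    print({preferred_url(v) for v in variants})
```

If several sitemap or internal-link URLs normalise to the same string, they are variants of one page and only the preferred form should be linked, submitted, and used as the canonical.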

Blocked by robots.txt GSC → Coverage → Error

Googlebot is explicitly blocked from accessing this URL by your robots.txt file. This is usually an intentional configuration, but misconfiguration is extremely common, especially after CMS updates, migrations, or when a developer adds a blanket Disallow rule. A single Disallow: / blocks crawling of your entire site. Blocked URLs can still show up in the index without content if other pages link to them, but they cannot rank on their content.

Disallow: / applied globally · folder blocked unintentionally · post-migration config error · staging robots.txt deployed to prod

Fix: Audit robots.txt at yourdomain.com/robots.txt — remove or correct Disallow rules for pages you want indexed
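A quick way to audit a robots.txt file against a list of URLs you want indexed is Python's standard-library `urllib.robotparser`. This is a sketch under one important assumption: the standard-library parser does simple prefix matching and may not match Googlebot's handling of wildcard rules, so treat the result as a first pass. The helper name and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration; in practice, paste in your live robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
"""


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt disallows for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


if __name__ == "__main__":
    must_index = ["https://yoursite.com/blog/post/", "https://yoursite.com/cart/"]
    print(blocked_urls(ROBOTS_TXT, must_index))
    # A blanket rule blocks every URL in the list:
    print(blocked_urls("User-agent: *\nDisallow: /\n", must_index))
```

Running this against your list of key pages before every deployment is one way to catch a staging robots.txt before it reaches production.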

Soft 404 GSC → Coverage → Error

The server returns an HTTP 200 (success) status code, but the page content is essentially empty, shows an error message, or is so thin that Google treats it as a 404. Common on e-commerce sites with out-of-stock products, search result pages accidentally indexed, empty category archives, and user-generated content pages with deleted content.

empty category page · out-of-stock product page · deleted content, 200 returned · search result pages indexed · thin auto-generated archive

Fix: Return true 404/410 for deleted content, improve thin pages, noindex or add canonical to near-empty pages
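Google's own soft-404 classifier is not public, but a simple heuristic can flag likely candidates in a crawl export. The sketch below assumes you already have each page's status code and visible text; the function name, the 50-word threshold, and the phrase list are all hypothetical and should be tuned to your templates.

```python
import re

# Assumption: phrases your templates show when content is missing.
ERROR_PHRASES = ("page not found", "no results found", "no longer available")


def looks_like_soft_404(status: int, body_text: str, min_words: int = 50) -> bool:
    """Flag pages that return 200 but read like an error or an empty page."""
    if status != 200:
        return False  # a real 404/410 is not a *soft* 404
    text = body_text.lower()
    word_count = len(re.findall(r"\w+", text))
    return word_count < min_words or any(phrase in text for phrase in ERROR_PHRASES)


if __name__ == "__main__":
    print(looks_like_soft_404(200, "Sorry, page not found"))
    print(looks_like_soft_404(404, "Sorry, page not found"))
```

Flagged pages still need a manual decision: return a real 404/410, add content, or noindex.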

Excluded by “noindex” Tag GSC → Coverage → Excluded

A noindex directive in the page’s meta robots tag or X-Robots-Tag HTTP header is explicitly telling Google not to index this URL. Intentional for admin, thank-you, and tag pages — but a serious problem when found on content pages, landing pages, or product listings. Plugins like Yoast/Rank Math and theme builders frequently apply noindex accidentally.

SEO plugin misconfiguration · noindex in page template · custom post type excluded · theme builder default setting · staging setting leaked to prod

Fix: Crawl the site with Screaming Frog, filter for noindex, and remove the directive from pages that should be indexed
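If you want to spot-check individual pages without a full crawl, both noindex sources described above (the meta robots tag and the X-Robots-Tag header) can be checked with the standard library. This is a minimal sketch: the class and function names are hypothetical, and it ignores user-agent-scoped header values such as `googlebot: noindex`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str, headers: dict[str, str]) -> bool:
    """True if the page carries noindex in its meta tag or X-Robots-Tag header."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    header_directives = [d.strip().lower() for d in header.split(",")]
    return "noindex" in parser.directives or "noindex" in header_directives


if __name__ == "__main__":
    print(has_noindex('<meta name="robots" content="noindex, follow">', {}))
    print(has_noindex("<p>No meta tag</p>", {"X-Robots-Tag": "noindex, nofollow"}))
```

Checking the header as well as the HTML matters because a server-level noindex is invisible when you only view the page source.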

Alternate Page with Proper Canonical Tag GSC → Coverage → Excluded

This page is not indexed because it has a canonical tag pointing to a different URL, which Google is respecting. This is often the intended behaviour — you don’t want paginated pages, filtered views, or print versions indexed. However, it becomes a problem when canonical tags are set incorrectly and point important pages to the wrong canonical URL.

intended canonical exclusion · wrong canonical URL set · pagination canonicalled to page 1 · faceted nav URLs excluded

Fix: Verify canonical targets are correct: self-canonicals on important pages, exclusion canonicals on duplicates only
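To verify canonical targets page by page, you can extract the canonical link from the HTML and compare it with the URL you expected. The sketch below is one way to do that with the standard library; the class and function names are hypothetical, and the comparison is deliberately strict so that a trailing-slash mismatch is reported rather than hidden.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> in the document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_status(page_url: str, html: str) -> str:
    """Classify a page as 'self', 'missing', or 'points to <url>'."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonical:
        return "missing"
    target = urljoin(page_url, finder.canonical)  # resolve relative hrefs
    return "self" if target == page_url else f"points to {target}"


if __name__ == "__main__":
    html = '<link rel="canonical" href="https://yoursite.com/different-page/" />'
    print(canonical_status("https://yoursite.com/exact-url/", html))
```

An important page that reports anything other than `self` is worth checking against the intended exclusions listed above.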

Code Reference

robots.txt: Right vs Wrong Configurations

A single character in robots.txt can block your entire site from Google. These are the most dangerous misconfigurations we see — and their correct equivalents.

✗ Dangerous — Blocks entire site

Staging robots.txt deployed to production

This is the #1 robots.txt catastrophe. A developer forgets to update robots.txt when pushing staging to live, and the entire site is blocked from all crawlers. Left unfixed, this can wipe out organic traffic as pages drop out of the index.

✓ Correct — Selective blocking only

Block only admin and private paths

Block only directories that should never be indexed: admin panels, login pages, internal search results, and utility paths. Never block content directories or the root path.

# ✗ WRONG — blocks EVERYTHING from all crawlers
User-agent: *
Disallow: /

# ✗ WRONG — blocks Googlebot from all content
User-agent: Googlebot
Disallow: /

# ✓ CORRECT — only blocks admin/utility paths
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /?s=          # internal search results
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/

# ✓ Point Googlebot to your sitemap
Sitemap: https://yoursite.com/sitemap.xml
      

# ✓ Canonical tag: self-referential on all key pages
<link rel="canonical" href="https://yoursite.com/exact-url/" />

# ✗ WRONG: canonical pointing to a different page
<link rel="canonical" href="https://yoursite.com/different-page/" />

# ✓ X-Robots-Tag HTTP header: server-level noindex
# Use for PDFs, dynamic pages, and non-HTML resources
X-Robots-Tag: noindex, nofollow

# ✓ Meta robots: page-level noindex for admin/utility pages
<meta name="robots" content="noindex, follow">
# Use "noindex, follow" rather than "noindex, nofollow";
# nofollow stops the page's outbound links from passing link equity

Diagnostic Checklist

The Indexing Audit Checklist

Run through this sequence whenever GSC shows a significant number of excluded or errored pages. Work in order — each phase builds on the last.

Phase 1 // Immediate GSC Audit

▸ Open GSC → Indexing → Pages and record the count in each status category.
  Screenshot or export the data. You need a baseline to measure improvement against.

▸ Check for any “Blocked by robots.txt” entries; these are always urgent.
  Even one key page blocked by robots.txt is a critical issue. Check yourdomain.com/robots.txt directly.

▸ Check for “Excluded by noindex tag” on pages that should be indexed.
  Screaming Frog: crawl site → filter by Directives → noindex. Cross-reference with your intended index list.

▸ Count “Discovered not indexed” vs “Crawled not indexed”; these need different fixes.
  Discovered = crawl budget / discovery problem. Crawled = quality problem. Don’t confuse the two.
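Recording a baseline is easier to act on if you can diff it against a later export. The sketch below assumes a simplified CSV with `url` and `status` columns, which is a hypothetical layout rather than the exact format GSC exports; adjust the column names to whatever your export contains.

```python
import csv
import io
from collections import Counter


def status_counts(export_csv: str) -> Counter:
    """Count URLs per status in a CSV with (assumed) 'url' and 'status' columns."""
    return Counter(row["status"] for row in csv.DictReader(io.StringIO(export_csv)))


def status_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Change in URL count per status between two snapshots (unchanged ones omitted)."""
    statuses = sorted(set(before) | set(after))
    return {s: after.get(s, 0) - before.get(s, 0)
            for s in statuses if after.get(s, 0) != before.get(s, 0)}


if __name__ == "__main__":
    march = status_counts("url,status\n/a,Indexed\n/b,Soft 404\n/c,Soft 404\n")
    april = status_counts("url,status\n/a,Indexed\n/b,Indexed\n/c,Soft 404\n")
    print(status_delta(march, april))
```

A falling count in one status only counts as progress if the pages moved to "Indexed" rather than to another excluded status, which is why the delta covers every status at once.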

Phase 2 // Canonical & Duplicate Audit

▸ Verify every key page has a self-referential canonical tag.
  In Screaming Frog: Directives → Canonical → check that all important URLs self-canonicalise correctly.

▸ Check for www vs non-www and HTTP vs HTTPS inconsistencies across internal links.
  Mixed URL variants create duplicate content that splits canonical signals. Pick one and enforce via 301.

▸ Audit URL parameters and ensure faceted navigation parameters are canonicalised or blocked.
  E-commerce filter pages (?color=red&size=M) often generate thousands of near-duplicate URLs.

▸ Verify paginated pages (/page/2/) have deliberate canonicals or are noindexed.
  Do not leave pagination unconfigured. Google’s pagination guidance favours a self-referential canonical on each page; canonicalising every page to page 1 can hide deeper items from Google.

Phase 3 // Sitemap Health

▸ Submit sitemap in GSC and check for reported errors.
  GSC → Sitemaps: verify the sitemap is valid, has no errors, and was last fetched recently.

▸ Ensure sitemap only contains URLs you want indexed; remove noindexed and redirected URLs.
  A sitemap containing 301-redirected or noindex URLs sends conflicting signals to Google.

▸ Verify sitemap URL count matches your expected indexable page count.
  Large discrepancies (e.g. 800 URLs in sitemap but only 200 indexed) signal a quality or crawl budget issue.

▸ Set lastmod dates in sitemap and update them when content changes.
  Accurate lastmod dates signal freshness and help Google prioritise recrawling of updated content.
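The sitemap checks above can be partly automated by comparing sitemap entries with crawl results. This is a minimal sketch, assuming you already have a crawl result per URL as a dictionary with `status` and `noindex` keys; that structure, the function name, and the sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_problems(sitemap_xml: str, crawl: dict[str, dict]) -> dict[str, str]:
    """Map each problematic sitemap URL to the reason it should not be listed."""
    problems = {}
    root = ET.fromstring(sitemap_xml)
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        info = crawl.get(loc)
        if info is None:
            problems[loc] = "not crawled"
        elif info["status"] in (301, 302, 307, 308):
            problems[loc] = "redirects"
        elif info["status"] != 200:
            problems[loc] = f"returns {info['status']}"
        elif info.get("noindex"):
            problems[loc] = "noindex"
    return problems


if __name__ == "__main__":
    xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://yoursite.com/</loc><lastmod>2026-03-01</lastmod></url>
      <url><loc>https://yoursite.com/old/</loc></url>
      <url><loc>https://yoursite.com/thanks/</loc></url>
    </urlset>"""
    crawl = {
        "https://yoursite.com/": {"status": 200, "noindex": False},
        "https://yoursite.com/old/": {"status": 301, "noindex": False},
        "https://yoursite.com/thanks/": {"status": 200, "noindex": True},
    }
    print(sitemap_problems(xml, crawl))
```

Every URL in the result is a candidate for removal from the sitemap, or for a fix to the page itself if it was meant to be indexable.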

Phase 4 // Crawl Budget Optimisation

▸ Identify and remove internal links to 404, soft-404, and redirect pages.
  Every broken internal link wastes crawl budget and passes no link equity. Use Screaming Frog to find them all.

▸ Block low-value URL patterns in robots.txt (search results, session IDs, printer-friendly URLs).
  Crawl budget saved on noise = more budget available for your real content pages.

▸ Reduce redirect chains: no redirect should need more than one hop (A → B, not A → B → C → D).
  Each extra hop adds latency and spends crawl budget, and very long chains may not be followed to the end.

▸ Improve internal linking depth for priority pages: key content should be reachable in 3 clicks from the homepage.
  Pages buried at 5+ clicks deep are frequently skipped during crawls on large sites.
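Redirect chains and loops are easy to detect once you have a map of source URL to redirect target, for example from a crawler's redirect report. The sketch below walks that map; the function names, the `max_hops` cap, and the sample paths are assumptions for illustration.

```python
def redirect_chain(start: str, redirects: dict[str, str], max_hops: int = 10):
    """Follow redirects from start. Returns (chain of URLs, loop_detected)."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        if target in seen:
            return chain + [target], True  # redirect loop
        chain.append(target)
        seen.add(target)
    return chain, False


def chains_to_fix(redirects: dict[str, str]) -> dict[str, str]:
    """For every source whose redirect takes more than one hop, the final URL to 301 to."""
    fixes = {}
    for source in redirects:
        chain, loop = redirect_chain(source, redirects)
        if not loop and len(chain) - 1 > 1:
            fixes[source] = chain[-1]
    return fixes


if __name__ == "__main__":
    redirects = {"/a": "/b", "/b": "/c", "/c": "/d"}
    print(redirect_chain("/a", redirects))
    print(chains_to_fix(redirects))
```

Pointing each listed source straight at its final destination collapses the chain to a single hop; loops need to be untangled by hand.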

Quick Reference

GSC Status → Fix Decision Matrix

The fastest path from GSC status to the correct fix action — without misdiagnosing which problem you actually have.

GSC Status	Root Cause Type	Urgency	Primary Fix
Blocked by robots.txt	Configuration error	Critical	Edit robots.txt to remove the Disallow rule, then verify in the GSC robots.txt report (the standalone robots.txt Tester has been retired)
Excluded by noindex tag	Directive error	Critical	Remove noindex meta tag or HTTP header from pages that should be indexed
Discovered — not indexed	Crawl budget / discovery	High	Add internal links, submit URL in GSC, reduce low-value crawl targets
Crawled — not indexed	Content quality	High	Improve content depth and E-E-A-T — or noindex and consolidate with related pages
Duplicate — submitted URL not selected	Canonical conflict	High	Add self-canonical, enforce URL consistency via 301, fix sitemap URL variants
Soft 404	Server / content mismatch	High	Return real 404/410 for deleted content; improve thin pages to justify 200 status
Redirect error	Redirect chain / loop	Medium	Fix redirect chain to single 301 hop — detect with Screaming Frog redirect report
Alternate page — proper canonical	Intended or misconfigured	Check	Verify canonical target is correct — if intentional, no action needed
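The matrix above can double as a triage order when GSC shows several statuses at once. The sketch below encodes its urgency column as a lookup and sorts your status counts by urgency, then by volume. The status keys are ASCII-simplified versions of the matrix labels, and the function name and sample counts are hypothetical.

```python
URGENCY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Check": 3}

# (root cause type, urgency) per status, taken from the decision matrix.
MATRIX = {
    "Blocked by robots.txt": ("Configuration error", "Critical"),
    "Excluded by noindex tag": ("Directive error", "Critical"),
    "Discovered - not indexed": ("Crawl budget / discovery", "High"),
    "Crawled - not indexed": ("Content quality", "High"),
    "Duplicate - submitted URL not selected": ("Canonical conflict", "High"),
    "Soft 404": ("Server / content mismatch", "High"),
    "Redirect error": ("Redirect chain / loop", "Medium"),
    "Alternate page - proper canonical": ("Intended or misconfigured", "Check"),
}


def triage(counts: dict[str, int]) -> list[tuple[str, str, int]]:
    """Order statuses by urgency first, then by number of affected URLs."""
    rows = [(status, MATRIX[status][1], n)
            for status, n in counts.items() if status in MATRIX and n > 0]
    return sorted(rows, key=lambda row: (URGENCY_ORDER[row[1]], -row[2]))


if __name__ == "__main__":
    counts = {"Soft 404": 12, "Blocked by robots.txt": 3,
              "Redirect error": 40, "Crawled - not indexed": 90}
    for status, urgency, n in triage(counts):
        print(f"{urgency:<8} {n:>4}  {status}")
```

Sorting by urgency before volume keeps a handful of robots.txt-blocked pages ahead of a much larger pile of lower-priority exclusions.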

Pages Not Showing Up?

We’ll Find Every Indexing Issue and Tell You Exactly How to Fix It.

Request Free Crawl Audit →

Top Indexing & Crawling Guides

Where readers start when their pages disappear from Google — or never show up in the first place.

🔥 Most Read · Indexing & Crawling

“Discovered — Currently Not Indexed”: The Definitive Fix Guide for 2026

Why Google finds your pages but refuses to crawl them — the 6 specific causes of “Discovered not indexed,” how to diagnose which one you have, and the exact fixes for each. Includes crawl budget optimisation and internal linking strategies.

🕐 19 min read · Definitive guide

Coverage Status

Crawled — Currently Not Indexed: Why Content Quality Is the Only Fix

Unlike “discovered not indexed,” this status means Google evaluated your page and decided it wasn’t worth indexing. Here’s how to diagnose which quality signals failed — and how to fix them.

🕐 15 min read

Configuration

robots.txt Complete Guide: The Most Dangerous File on Your Website

A single misconfiguration in robots.txt can block your entire site from Google in minutes. How to write, test, and audit your robots.txt so it never becomes a catastrophe.

🕐 12 min read

Canonical Tags

Canonical Tags: The Complete 2026 Guide (With Every Edge Case)

How canonical tags work, when Google ignores them, the difference between hints and directives, and how to audit your site for canonical conflicts using Screaming Frog and GSC.

🕐 17 min read

Crawl Budget

Crawl Budget Optimisation: How to Get Google to Crawl Your Best Pages First

What crawl budget actually means, which sites need to worry about it, and the specific technical changes that direct Googlebot’s attention toward your most valuable content.

🕐 14 min read

Free Crawl & Index Audit

Pages Invisible to Google? Let’s Find Out Why.

Share your site and we’ll run a full crawl and indexing audit — GSC coverage analysis, robots.txt check, canonical conflict detection, sitemap health, and a prioritised fix list. Free, no pitch.

GSC Coverage report analysis — every excluded status explained
robots.txt audit — check for accidental blocking rules
Canonical tag conflict detection across key pages
Sitemap health check — errors, redirects, noindexed URLs
Internal link depth analysis for priority pages
Prioritised fix list delivered to your inbox within 48 hours

No sales calls

Response within 48 hours

100% free

Request Your Free Crawl Audit

// Reviewed by a technical SEO specialist.

WordPress SEO that actually ranks

Sayed Iftekharul Haque — SEO Strategist & Web Designer
