Indexing & Crawling Issues — IndxQ SEO
04 // Crawling

Indexing & Crawling

Diagnose and fix every GSC coverage status — from “Discovered, currently not indexed” to soft 404s, canonical conflicts, robots.txt blocks, and sitemap errors that are silently keeping your pages out of Google’s index.

11 in-depth guides
Updated March 2026
GSC coverage focus
GSC Coverage Status Reference

| Status | Meaning | Level |
|---|---|---|
| Indexed (appears in search results) | Successfully crawled and indexed. | Indexed |
| Discovered – currently not indexed | Found via sitemap or link. Not yet crawled. | Warn |
| Crawled – currently not indexed | Crawled but Google chose not to index. | Warn |
| Duplicate – submitted URL not selected | Canonical points elsewhere. | Warn |
| Blocked by robots.txt | Googlebot explicitly blocked. | Error |
| Soft 404 | Returns 200 but content is thin/empty. | Error |
| Page with redirect | URL redirects to another destination. | Info |
// Each status has a distinct cause and fix. Scroll down for the complete breakdown of every coverage state.
How It Works

The Googlebot Crawl-to-Index Pipeline

Before you can fix an indexing problem, you need to understand where in the pipeline your pages are breaking down. Each stage has distinct failure modes.

// Googlebot’s path from URL discovery to appearing in search results
1. URL Discovery: sitemap / links
2. robots.txt Check: can fail here
3. Crawl Queue: budget limits
4. Rendering & Parsing: JS issues here
5. Indexing Decision: quality filter
Most pages stall at stage 3 (crawl queue) or stage 5 (indexing decision). “Discovered not indexed” = stalled at queue. “Crawled not indexed” = failed at indexing decision. These require completely different fixes.

Status Deep-Dives

Every GSC Coverage Status, Explained & Fixed

Google Search Console reports 15+ distinct coverage statuses. Here are the most impactful — the ones that account for 90% of indexing problems across real sites.

Discovered – Currently Not Indexed
GSC → Coverage → Excluded

Google has found the URL (via sitemap or internal link) but hasn’t crawled it yet. This is usually a crawl budget problem: Google decided the page wasn’t worth prioritising. It can also indicate low internal link equity pointing to the page, making it appear low-priority. Critically, this is not a verdict on the page’s content, because Google hasn’t fetched it yet.

Common causes: crawl budget exhausted; weak internal linking; deep page depth (4+ clicks); new page, not yet crawled; sitemap only, no internal links
Fix: Improve internal linking, submit the URL in GSC, and reduce crawl budget waste on low-value pages
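To see which sitemap URLs are most likely to sit in the queue, the click-depth and orphan checks can be sketched in a few lines of Python. This is a minimal sketch, not a crawler: the `links` graph, the example URLs, and the `max_depth=3` threshold are assumptions for illustration, and in practice the graph would come from a crawl export.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search from the homepage; returns {url: clicks from home}."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def crawl_risk(links, home, sitemap_urls, max_depth=3):
    """Split sitemap URLs into orphans (no internal path) and deep pages."""
    depths = click_depths(links, home)
    orphans = sorted(u for u in sitemap_urls if u not in depths)
    deep = sorted(u for u in sitemap_urls if depths.get(u, 0) > max_depth)
    return orphans, deep


# Hypothetical link graph: page -> pages it links to.
links = {
    "/": ["/blog/", "/services/"],
    "/blog/": ["/blog/page/2/"],
    "/blog/page/2/": ["/blog/page/3/"],
    "/blog/page/3/": ["/blog/old-post/"],
}
sitemap_urls = ["/services/", "/blog/old-post/", "/landing/offer/"]

orphans, deep = crawl_risk(links, "/", sitemap_urls)
print("Sitemap only, no internal links:", orphans)
print("Deeper than 3 clicks:", deep)
```

Both lists are candidates for new internal links from pages closer to the homepage.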
Crawled – Currently Not Indexed
GSC → Coverage → Excluded

Google crawled the page and actively decided not to index it. This is a content quality signal: Google evaluated the page and determined it didn’t meet the threshold for inclusion in the index. It is common on thin content, near-duplicate pages, and pages with low E-E-A-T signals. This is harder to fix than “Discovered, currently not indexed” because the content itself has to improve.

Common causes: thin / low-value content; near-duplicate content; poor E-E-A-T signals; no unique added value; auto-generated pages
Fix: Substantially improve content depth, uniqueness, and E-E-A-T signals, or noindex and consolidate
Duplicate – Submitted URL Not Selected as Canonical
GSC → Coverage → Excluded

You submitted a URL in your sitemap, but Google chose a different URL as the canonical version. This typically means your canonical tags conflict or multiple URLs serve similar content (www vs non-www, trailing slash vs no slash, HTTP vs HTTPS, or parameter variations), so Google applies its own canonical judgement over yours.

Common causes: www / non-www conflict; trailing slash inconsistency; missing canonical tag; canonical tag points elsewhere; URL parameters creating dupes
Fix: Add consistent self-canonicals, enforce one URL variant via 301 redirect, clean up URL parameters
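One way to spot competing URL variants is to collapse every crawled URL to a single preferred form and see which ones collide. The sketch below assumes the preferred form is HTTPS, non-www, with a trailing slash, and assumes a small list of tracking parameters; both are illustrative choices, so adjust them to whatever variant your site actually enforces (and note that the trailing-slash rule only suits directory-style permalinks, not file URLs such as PDFs).

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that never change page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def preferred_url(url):
    """Collapse a URL to one assumed preferred form: https, no www, trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    if not path.endswith("/"):
        path += "/"
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    )
    return urlunsplit(("https", host, path, query, ""))


def duplicate_groups(urls):
    """Group URLs that collapse to the same preferred form."""
    groups = defaultdict(list)
    for url in urls:
        groups[preferred_url(url)].append(url)
    return {target: found for target, found in groups.items() if len(found) > 1}


# Hypothetical URLs found in a crawl.
crawled = [
    "http://www.yoursite.com/page",
    "https://yoursite.com/page/",
    "https://yoursite.com/page/?utm_source=x",
    "https://yoursite.com/other/",
]
for target, variants in duplicate_groups(crawled).items():
    print(target, "<-", variants)
```

Each group is a set of URLs that should 301 to, and canonicalise to, the single target.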
Blocked by robots.txt
GSC → Coverage → Error

Googlebot is explicitly blocked from accessing this URL by your robots.txt file. This is usually an intentional configuration, but misconfiguration is extremely common, especially after CMS updates, migrations, or when a developer adds a blanket Disallow rule. A single Disallow: / will block your entire site from being crawled.

Common causes: Disallow: / applied globally; folder blocked unintentionally; post-migration config error; staging robots.txt deployed to prod
Fix: Audit robots.txt at yourdomain.com/robots.txt and remove or correct Disallow rules for pages you want indexed
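A quick pre-deploy check can catch a staging robots.txt before it reaches production. The sketch below uses Python’s standard `urllib.robotparser`; the `KEY_URLS` list and the two robots.txt strings are hypothetical. Treat the result as an approximation: this parser applies rules in file order and does not implement Google’s wildcard and longest-match behaviour, so confirm edge cases in GSC.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical URLs to test against each file.
KEY_URLS = [
    "https://yoursite.com/",
    "https://yoursite.com/blog/some-post/",
    "https://yoursite.com/wp-admin/options.php",
]

STAGING = "User-agent: *\nDisallow: /"
PRODUCTION = "User-agent: *\nDisallow: /wp-admin/\nDisallow: /cart/"

print(blocked_urls(STAGING, KEY_URLS))     # every URL is blocked
print(blocked_urls(PRODUCTION, KEY_URLS))  # only the /wp-admin/ URL is blocked
```

Run against the live file after every deploy, any content URL in the returned list is a release blocker.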
Soft 404
GSC → Coverage → Error

The server returns an HTTP 200 (success) status code, but the page content is essentially empty, shows an error message, or is so thin that Google treats it as a 404. Common on e-commerce sites with out-of-stock products, search result pages accidentally indexed, empty category archives, and user-generated content pages with deleted content.

Common causes: empty category page; out-of-stock product page; deleted content, 200 returned; search result pages indexed; thin auto-generated archive
Fix: Return true 404/410 for deleted content, improve thin pages, noindex or add canonical to near-empty pages
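A rough heuristic can pre-screen crawled pages for likely soft 404s before Google flags them. The sketch below is an assumption-laden approximation, not Google’s classifier: the `ERROR_PHRASES` list and the 50-word threshold are illustrative, and the tag stripping is deliberately crude.

```python
import re

# Assumed phrases that suggest an error or empty-state template.
ERROR_PHRASES = ("page not found", "no products found", "no longer available", "no results")


def visible_words(html):
    """Strip scripts, styles, and tags, then split the remaining text into words."""
    text = re.sub(r"(?is)<(script|style).*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    return text.split()


def soft_404_reason(status, html, min_words=50):
    """Return a reason string if a 200 page looks like a soft 404, else None."""
    if status != 200:
        return None  # a real 404/410 is not a *soft* 404
    words = visible_words(html)
    text = " ".join(words).lower()
    for phrase in ERROR_PHRASES:
        if phrase in text:
            return f"error phrase: {phrase!r}"
    if len(words) < min_words:
        return f"thin content: {len(words)} words"
    return None


empty_category = "<html><body><h1>Shoes</h1><p>No products found.</p></body></html>"
print(soft_404_reason(200, empty_category))
print(soft_404_reason(404, "<h1>Page not found</h1>"))
```

Pages that return a reason are candidates for a true 404/410, a noindex, or a content rewrite.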
Excluded by “noindex” Tag
GSC → Coverage → Excluded

A noindex directive in the page’s meta robots tag or X-Robots-Tag HTTP header is explicitly telling Google not to index this URL. This is intentional for admin, thank-you, and tag pages, but a serious problem when found on content pages, landing pages, or product listings. Plugins like Yoast/Rank Math and theme builders frequently apply noindex accidentally.

Common causes: SEO plugin misconfiguration; noindex in page template; custom post type excluded; theme builder default setting; staging setting leaked to prod
Fix: Crawl the site with Screaming Frog, filter for noindex, and remove the directive from pages that should be indexed
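For a single page, the same check can be approximated in Python by looking in both places a noindex can hide. This is a minimal sketch: `noindex_sources` is a hypothetical helper that takes HTML and response headers you have already fetched, and it ignores user-agent-scoped header values such as `googlebot: noindex`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            content = attrs.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def noindex_sources(html, headers):
    """Return where a noindex was found: 'meta robots', 'X-Robots-Tag', or both."""
    sources = []
    parser = RobotsMetaParser()
    parser.feed(html)
    if "noindex" in parser.directives:
        sources.append("meta robots")
    header = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    if "noindex" in [d.strip().lower() for d in header.split(",")]:
        sources.append("X-Robots-Tag")
    return sources


page = '<head><meta name="robots" content="noindex, follow"></head>'
print(noindex_sources(page, {"Content-Type": "text/html"}))
```

An empty list means neither the markup nor the headers carry a noindex for that response.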
Alternate Page with Proper Canonical Tag
GSC → Coverage → Excluded

This page is not indexed because it has a canonical tag pointing to a different URL, which Google is respecting. This is often the intended behaviour: you don’t want paginated pages, filtered views, or print versions indexed. However, it becomes a problem when canonical tags are set incorrectly and point important pages to the wrong canonical URL.

Common causes: intended canonical exclusion; wrong canonical URL set; pagination canonicalled to page 1; faceted nav URLs excluded
Fix: Verify canonical targets are correct, with self-canonicals on important pages and exclusion canonicals on duplicates only

Code Reference

robots.txt: Right vs Wrong Configurations

A single character in robots.txt can block your entire site from Google. These are the most dangerous misconfigurations we see — and their correct equivalents.

✗ Dangerous — Blocks entire site
Staging robots.txt deployed to production

This is the #1 robots.txt catastrophe. A developer forgets to update robots.txt when pushing staging to live, and the entire site is blocked from all crawlers. Left unfixed, this can eliminate nearly all organic traffic as pages fall out of the index.

✓ Correct — Selective blocking only
Block only admin and private paths

Block only directories that should never be indexed: admin panels, login pages, internal search results, and utility paths. Never block content directories or the root path.

# ✗ WRONG: blocks EVERYTHING from all crawlers
User-agent: *
Disallow: /

# ✗ WRONG: blocks Googlebot from all content
User-agent: Googlebot
Disallow: /

# ✓ CORRECT: only blocks admin/utility paths
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /?s=          # internal search results
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/

# ✓ Point Googlebot to your sitemap
Sitemap: https://yoursite.com/sitemap.xml

# ✓ Canonical tag: self-referential on all key pages
<link rel="canonical" href="https://yoursite.com/exact-url/" />

# ✗ WRONG: canonical pointing to a different page
<link rel="canonical" href="https://yoursite.com/different-page/" />

# ✓ X-Robots-Tag HTTP header: server-level noindex
# Use for PDFs, dynamic pages, and non-HTML resources
X-Robots-Tag: noindex, nofollow

# ✓ Meta robots: page-level noindex for admin/utility pages
<meta name="robots" content="noindex, follow">
# Use "noindex, follow" not "noindex, nofollow":
# nofollow wastes link equity on the page's outbound links
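The canonical rules in this reference can be checked per page with a small parser. The sketch below is illustrative only: `canonical_status` is a hypothetical helper, it compares URLs as exact strings (so a trailing-slash mismatch counts as pointing elsewhere), and it only sees canonicals in the HTML, not in HTTP headers.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> in the document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and attrs.get("href"):
            self.canonicals.append(attrs["href"])


def canonical_status(page_url, html):
    """Classify a page as 'self', 'points elsewhere', 'missing', or 'conflicting'."""
    parser = CanonicalParser()
    parser.feed(html)
    found = parser.canonicals
    if not found:
        return "missing"
    if len(set(found)) > 1:
        return "conflicting"
    return "self" if found[0] == page_url else "points elsewhere"


html = '<head><link rel="canonical" href="https://yoursite.com/exact-url/" /></head>'
print(canonical_status("https://yoursite.com/exact-url/", html))
print(canonical_status("https://yoursite.com/other-page/", html))
```

A result of "points elsewhere" is fine for intentional duplicates but needs a fix on any page you expect to rank.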

Diagnostic Checklist

The Indexing Audit Checklist

Run through this sequence whenever GSC shows a significant number of excluded or errored pages. Work in order — each phase builds on the last.

Phase 1 // Immediate GSC Audit
  • Open GSC → Indexing → Pages and record the count in each status category
    Screenshot or export the data. You need a baseline to measure improvement against.
  • Check for any “Blocked by robots.txt” entries: these are always urgent
    Even one key page blocked by robots.txt is a critical issue. Check yourdomain.com/robots.txt directly.
  • Check for “Excluded by noindex tag” on pages that should be indexed
    Screaming Frog: crawl site → filter by Directives → noindex. Cross-reference with your intended index list.
  • Count “Discovered not indexed” vs “Crawled not indexed”: these need different fixes
    Discovered = crawl budget / discovery problem. Crawled = quality problem. Don’t confuse the two.
Phase 2 // Canonical & Duplicate Audit
  • Verify every key page has a self-referential canonical tag
    In Screaming Frog: Directives → Canonical → check all important URLs self-canonicalise correctly.
  • Check for www vs non-www and HTTP vs HTTPS inconsistencies across internal links
    Mixed URL variants create duplicate content that splits canonical signals. Pick one and enforce via 301.
  • Audit URL parameters: ensure faceted navigation parameters are canonicalised or blocked
    E-commerce filter pages (?color=red&size=M) often generate thousands of near-duplicate URLs.
  • Verify paginated pages (/page/2/) have correct canonicals or are noindexed
    Paginated pages rarely need to rank on their own, but do not leave them unconfigured. Google’s pagination guidance recommends a self-referential canonical on each page in the series rather than a canonical to page 1; noindex remains an option for low-value archives.
Phase 3 // Sitemap Health
  • Submit sitemap in GSC and check for reported errors
    GSC → Sitemaps: verify the sitemap is valid, has no errors, and was last fetched recently.
  • Ensure sitemap only contains URLs you want indexed: remove noindexed and redirected URLs
    A sitemap containing 301-redirected or noindex URLs sends conflicting signals to Google.
  • Verify sitemap URL count matches your expected indexable page count
    Large discrepancies (e.g. 800 URLs in sitemap but only 200 indexed) signal a quality or crawl budget issue.
  • Set lastmod dates in sitemap and update them when content changes
    Accurate lastmod dates signal freshness and help Google prioritise recrawling of updated content.
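The sitemap checks in this phase can be scripted once you have crawl data. The sketch below assumes a hypothetical `crawl` dictionary (URL to status code and noindex flag) exported from your own crawler; the sample sitemap and URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_entries(xml_text):
    """Return {loc: lastmod or None} for every <url> in a sitemap document."""
    root = ET.fromstring(xml_text)
    entries = {}
    for node in root.findall("sm:url", NS):
        loc = node.findtext("sm:loc", default="", namespaces=NS).strip()
        if loc:
            entries[loc] = node.findtext("sm:lastmod", default=None, namespaces=NS)
    return entries


def sitemap_problems(xml_text, crawl):
    """Flag sitemap URLs that redirect, error, carry noindex, or lack lastmod."""
    problems = []
    for loc, lastmod in sitemap_entries(xml_text).items():
        page = crawl.get(loc)
        if page is None:
            problems.append((loc, "not crawled"))
        elif page["status"] in (301, 302, 307, 308):
            problems.append((loc, "redirects"))
        elif page["status"] != 200:
            problems.append((loc, f"returns {page['status']}"))
        elif page["noindex"]:
            problems.append((loc, "noindex"))
        elif lastmod is None:
            problems.append((loc, "missing lastmod"))
    return problems


SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yoursite.com/</loc><lastmod>2026-03-01</lastmod></url>
  <url><loc>https://yoursite.com/old/</loc></url>
  <url><loc>https://yoursite.com/thanks/</loc><lastmod>2026-02-10</lastmod></url>
</urlset>"""

# Hypothetical crawl export.
CRAWL = {
    "https://yoursite.com/": {"status": 200, "noindex": False},
    "https://yoursite.com/old/": {"status": 301, "noindex": False},
    "https://yoursite.com/thanks/": {"status": 200, "noindex": True},
}

for loc, issue in sitemap_problems(SITEMAP, CRAWL):
    print(loc, "->", issue)
```

Every flagged URL should either be fixed or dropped from the sitemap so the file lists only indexable pages.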
Phase 4 // Crawl Budget Optimisation
  • Identify and remove internal links to 404, soft-404, and redirect pages
    Every broken internal link wastes crawl budget and passes no link equity. Use Screaming Frog to find them all.
  • Block low-value URL patterns in robots.txt (search results, session IDs, printer-friendly URLs)
    Crawl budget saved on noise = more budget available for your real content pages.
  • Reduce redirect chains: every redirect should resolve in a single hop (A → B, not A → B → C → D)
    Each extra hop wastes a unit of crawl budget and may dilute the PageRank passed to the final URL.
  • Improve internal linking depth for priority pages: key content should be reachable in 3 clicks from the homepage
    Pages buried at 5+ clicks deep are frequently skipped during crawls on large sites.
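Redirect chains and loops are easy to detect once redirects are exported as a source-to-target map. The sketch below assumes such a map (the `redirects` dictionary is hypothetical) and treats the allowed hop count as an adjustable parameter rather than a fixed rule.

```python
def redirect_chain(url, redirects, max_hops=10):
    """Follow a {source: target} redirect map; return (chain, is_loop)."""
    chain = [url]
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        if nxt in chain:  # we have been here before: redirect loop
            return chain + [nxt], True
        chain.append(nxt)
    return chain, False


def long_chains(redirects, max_allowed_hops=1):
    """Report redirect sources that loop or take more hops than allowed."""
    report = {}
    for start in redirects:
        chain, is_loop = redirect_chain(start, redirects)
        hops = len(chain) - 1
        if is_loop or hops > max_allowed_hops:
            report[start] = ("loop" if is_loop else f"{hops} hops", chain[-1])
    return report


# Hypothetical redirect map exported from a crawl.
redirects = {
    "/a": "/b",
    "/b": "/c",
    "/c": "/d",
    "/x": "/y",
    "/p": "/q",
    "/q": "/p",
}

for source, (issue, final) in long_chains(redirects).items():
    print(source, issue, "ends at", final)
```

For each flagged source, pointing the redirect straight at the final URL collapses the chain to one hop; loops need one side removed.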

Quick Reference

GSC Status → Fix Decision Matrix

The fastest path from GSC status to the correct fix action — without misdiagnosing which problem you actually have.

| GSC Status | Root Cause Type | Urgency | Primary Fix |
|---|---|---|---|
| Blocked by robots.txt | Configuration error | Critical | Edit robots.txt to remove the Disallow rule, then verify with the robots.txt report in GSC |
| Excluded by noindex tag | Directive error | Critical | Remove noindex meta tag or HTTP header from pages that should be indexed |
| Discovered – not indexed | Crawl budget / discovery | High | Add internal links, submit URL in GSC, reduce low-value crawl targets |
| Crawled – not indexed | Content quality | High | Improve content depth and E-E-A-T, or noindex and consolidate with related pages |
| Duplicate – submitted URL not selected | Canonical conflict | High | Add self-canonical, enforce URL consistency via 301, fix sitemap URL variants |
| Soft 404 | Server / content mismatch | High | Return real 404/410 for deleted content; improve thin pages to justify 200 status |
| Redirect error | Redirect chain / loop | Medium | Fix redirect chain to single 301 hop; detect with Screaming Frog redirect report |
| Alternate page – proper canonical | Intended or misconfigured | Check | Verify canonical target is correct; if intentional, no action needed |
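The matrix can also drive a simple triage script that orders your exported status counts by urgency. This is a sketch under assumptions: the status labels used as dictionary keys are shortened forms chosen for illustration and would need to match the exact strings in your GSC export, and the sample counts are invented.

```python
URGENCY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Check": 3}

# (root cause type, urgency) per status, taken from the decision matrix.
MATRIX = {
    "Blocked by robots.txt": ("Configuration error", "Critical"),
    "Excluded by noindex tag": ("Directive error", "Critical"),
    "Discovered, not indexed": ("Crawl budget / discovery", "High"),
    "Crawled, not indexed": ("Content quality", "High"),
    "Duplicate, submitted URL not selected": ("Canonical conflict", "High"),
    "Soft 404": ("Server / content mismatch", "High"),
    "Redirect error": ("Redirect chain / loop", "Medium"),
    "Alternate page, proper canonical": ("Intended or misconfigured", "Check"),
}


def triage(counts):
    """Order statuses by urgency, then by number of affected pages (largest first)."""
    rows = []
    for status, pages in counts.items():
        cause, urgency = MATRIX.get(status, ("Unknown", "Check"))
        if pages > 0:
            rows.append((URGENCY_ORDER[urgency], -pages, status, cause, urgency))
    return [(status, cause, urgency, -neg) for _, neg, status, cause, urgency in sorted(rows)]


# Hypothetical counts recorded in Phase 1 of the audit.
counts = {
    "Soft 404": 12,
    "Blocked by robots.txt": 3,
    "Crawled, not indexed": 240,
    "Redirect error": 0,
}

for status, cause, urgency, pages in triage(counts):
    print(f"{urgency:<8} {pages:>4}  {status} ({cause})")
```

Sorting by urgency first keeps a handful of robots.txt blocks ahead of a much larger pile of quality issues, which matches the order the matrix suggests.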
Pages Not Showing Up?

We’ll Find Every Indexing Issue and Tell You Exactly How to Fix It.

Request Free Crawl Audit →
Most Read

Top Indexing & Crawling Guides

Where readers start when their pages disappear from Google — or never show up in the first place.

🔥 Most Read · Indexing & Crawling
“Discovered — Currently Not Indexed”: The Definitive Fix Guide for 2026
Why Google finds your pages but refuses to crawl them — the 6 specific causes of “Discovered not indexed,” how to diagnose which one you have, and the exact fixes for each. Includes crawl budget optimisation and internal linking strategies.
🕐 19 min read · Definitive guide
Coverage Status
Crawled — Currently Not Indexed: Why Content Quality Is the Only Fix
Unlike “discovered not indexed,” this status means Google evaluated your page and decided it wasn’t worth indexing. Here’s how to diagnose which quality signals failed — and how to fix them.
🕐 15 min read
Configuration
robots.txt Complete Guide: The Most Dangerous File on Your Website
A single misconfiguration in robots.txt can block your entire site from Google in minutes. How to write, test, and audit your robots.txt so it never becomes a catastrophe.
🕐 12 min read
Canonical Tags
Canonical Tags: The Complete 2026 Guide (With Every Edge Case)
How canonical tags work, when Google ignores them, the difference between hints and directives, and how to audit your site for canonical conflicts using Screaming Frog and GSC.
🕐 17 min read
Crawl Budget
Crawl Budget Optimisation: How to Get Google to Crawl Your Best Pages First
What crawl budget actually means, which sites need to worry about it, and the specific technical changes that direct Googlebot’s attention toward your most valuable content.
🕐 14 min read
Free Crawl & Index Audit

Pages Invisible to Google? Let’s Find Out Why.

Share your site and we’ll run a full crawl and indexing audit — GSC coverage analysis, robots.txt check, canonical conflict detection, sitemap health, and a prioritised fix list. Free, no pitch.

  • GSC Coverage report analysis — every excluded status explained
  • robots.txt audit — check for accidental blocking rules
  • Canonical tag conflict detection across key pages
  • Sitemap health check — errors, redirects, noindexed URLs
  • Internal link depth analysis for priority pages
  • Prioritised fix list delivered to your inbox within 48 hours
No sales calls
Response within 48 hours
100% free
Request Your Free Crawl Audit
// Reviewed by a technical SEO specialist.

Sayed Iftekharul Haque — SEO Strategist & Web Designer

Founder of IndXQ. Specialises in SEO-first website redesigns, Core Web Vitals, and digital growth strategy. Available for projects via Fiverr, Upwork, and direct engagements. Connect on LinkedIn or watch free SEO tutorials on YouTube.

Published by IndXQ · Web Strategy & SEO · April 2026 · All rights reserved.
