SEO

Robots.txt and noindex QA checklist before launch

Use this checklist before launch to keep staging pages, filters, internal search, duplicate URLs, and retired pages from leaking into search or blocking the wrong pages.

Practical tool: Crawl control QA

Published: Jun 8, 2026

Read time: 10 min

Topic: Technical SEO / Shopify / WordPress / Headless / QA / Playbook

01

Use this before crawl rules change

Robots.txt and noindex mistakes are quiet launch risks. One line can keep an entire staging site out of search, protect private URLs, or accidentally block product, service, and article pages that should be indexed. The problem is that crawl-control rules often live across several systems.

This checklist is for Shopify stores, WordPress sites, headless builds, and redesign launches where URL structure, filters, preview environments, or old pages are changing. Run it before launch, after a platform migration, and whenever an SEO plugin, theme, middleware, CDN rule, or CMS template changes indexing behavior.

02

Step 1: Inventory every crawl-control surface

Start by listing every place that can tell crawlers what to crawl or index. Do not stop at the visible robots.txt file. Indexing behavior can also be controlled by page templates, SEO plugins, HTTP headers, route handlers, CDN rules, app extensions, password gates, and CMS fields.

Create one inventory row per control surface. Record the owner, where the rule is configured, which URL patterns it affects, how it is deployed, and how the team can roll it back. If nobody owns a rule, treat it as a launch risk.

  • Robots.txt rules: disallow, allow, sitemap references, platform-generated defaults, and custom overrides.
  • Page-level rules: meta robots noindex, nofollow, max-snippet, max-image-preview, and template defaults.
  • Header rules: X-Robots-Tag on PDFs, images, feeds, API routes, preview routes, and generated files.
  • Routing rules: redirects, rewrites, canonical URLs, hreflang alternates, sitemap entries, and status codes.
  • Platform settings: Shopify theme files and app output, WordPress reading settings and SEO plugins, headless middleware, and CDN edge logic.
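
The inventory can be kept as structured rows so that gaps are flagged automatically. The following is a minimal sketch, not a prescribed tool: the `ControlSurface` fields mirror the columns described above, the example rows are hypothetical, and treating a missing rollback path as a risk is an added assumption alongside the missing-owner rule.

```python
from dataclasses import dataclass


@dataclass
class ControlSurface:
    """One inventory row per crawl-control surface."""
    name: str                # e.g. "robots.txt" or "meta robots template"
    owner: str               # person or team; empty if nobody owns the rule
    configured_in: str       # where the rule lives
    url_patterns: list[str]  # URL patterns the rule affects
    deployed_via: str        # how the rule reaches production
    rollback: str            # how to undo it; empty if unknown


def launch_risks(rows: list[ControlSurface]) -> list[str]:
    """Return one note per surface that has no owner or no rollback path."""
    risks = []
    for row in rows:
        if not row.owner.strip():
            risks.append(f"{row.name}: no owner")
        if not row.rollback.strip():
            # Assumption: an unknown rollback path is also worth flagging.
            risks.append(f"{row.name}: no rollback path")
    return risks


# Hypothetical rows for illustration only.
inventory = [
    ControlSurface("robots.txt", "SEO lead", "theme repo", ["/*"],
                   "theme deploy", "revert commit"),
    ControlSurface("X-Robots-Tag on PDFs", "", "CDN rule", ["/files/*.pdf"],
                   "CDN dashboard", ""),
]

for risk in launch_risks(inventory):
    print(risk)
```

A spreadsheet works just as well; the point is that every row has the same columns and an empty owner cell is visible before launch.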

03

Step 2: Assign every URL group one index decision

Before editing rules, classify URL groups by intent. A redesign usually has product or service pages that should be indexed, utility pages that should work but not rank, duplicate states that should canonicalize, and retired URLs that should redirect or return a clear status code.

The useful output is a simple decision table. Each URL group should have one primary rule: index, noindex, disallow, redirect, canonicalize, 404, or 410. If one URL group needs two rules, write down which rule wins and how it will be tested.

  • Index: homepage, service pages, product pages, collection pages, articles, case studies, and market-specific landing pages that have search value.
  • Noindex: account pages, cart states, internal search results, thin filtered pages, campaign previews, and duplicate utility pages that crawlers can access.
  • Disallow: private admin paths, staging-only assets, internal scripts, and crawl traps that should not be requested by search bots.
  • Redirect: moved pages, merged content, renamed handles, retired campaign URLs with replacement pages, and legacy CMS routes.
  • 404 or 410: removed pages with no useful replacement, expired one-off pages, and test URLs that should disappear cleanly.
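
One way to make "which rule wins" explicit is an ordered decision table where the first matching pattern decides. The sketch below assumes hypothetical URL patterns and uses regular expressions only as an illustration; the decision names come from the list above.

```python
import re

ALLOWED_DECISIONS = {
    "index", "noindex", "disallow", "redirect", "canonicalize", "404", "410",
}

# Ordered table: the first matching pattern wins.
# Patterns are hypothetical examples, not a recommended rule set.
DECISIONS = [
    (r"^/admin(/|$)", "disallow"),
    (r"^/search", "noindex"),
    (r"^/collections/[^/]+\?.*filter", "noindex"),
    (r"^/collections/", "index"),
    (r"^/products/", "index"),
    (r"^/old-campaign/", "redirect"),
]


def decide(path: str) -> str:
    """Return the primary index decision for a URL path."""
    for pattern, decision in DECISIONS:
        if re.search(pattern, path):
            return decision
    # Anything unmatched still needs a decision before launch.
    return "unclassified"


for path in [
    "/collections/shoes",
    "/collections/shoes?filter.color=red",
    "/blog/launch-notes",
]:
    print(path, "->", decide(path))
```

Because the filtered-collection pattern sits above the general collection pattern, the order of the table itself documents the precedence. Any path that comes back as `unclassified` is a URL group that has not been given a decision yet.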

04

Step 3: QA platform defaults before custom rules

Most crawl-control bugs come from a default setting that nobody remembered. Shopify, WordPress, and headless stacks all have different places where robots and noindex rules can appear. Check the defaults first, then add custom rules only where the default behavior is not enough.

For each platform, test one live page, one draft or preview page, one search or filter page, one media or file URL, and one old URL. That sample usually exposes whether rules are coming from the platform, the theme, a plugin, middleware, or the hosting layer.

  • Shopify: review robots.txt.liquid, collection filter URLs, search URLs, product handles, app-injected tags, and market-specific paths.
  • WordPress: check the Reading setting, SEO plugin index rules, custom post type archives, media attachment pages, author archives, tag archives, and staging plugins.
  • Headless: inspect robots routes, sitemap generation, metadata components, middleware, preview mode, cache headers, and CDN edge functions.
  • Multilingual sites: confirm localized routes, hreflang targets, canonical targets, and market-specific sitemaps are not blocked by broad rules.

05

Step 4: Test noindex, robots.txt, and canonical interactions

Robots.txt, noindex, and canonical tags are not interchangeable. A page that is disallowed in robots.txt is not fetched by compliant crawlers, so search engines never see its noindex tag, and the URL can still end up indexed from links that point to it. A noindex page that remains in the sitemap sends a conflicting signal. A canonical target that redirects or carries noindex breaks the consolidation path.

Make a small interaction matrix before launch. For each risky URL pattern, record the robots.txt result, status code, meta robots value, X-Robots-Tag header, canonical target, sitemap inclusion, and expected search behavior.

  • Do not disallow pages only because they should be noindexed; crawlers need to access the page to see the noindex directive.
  • Remove noindex URLs from XML sitemaps unless there is a documented temporary reason during migration.
  • Make canonical targets return 200, be indexable, and appear in the correct localized or canonical sitemap.
  • Keep redirected URLs out of sitemaps and internal links once the redirect map is live.
  • Check faceted URLs and URL parameters against canonical, noindex, and robots rules together, not in separate reviews.
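
The interaction matrix can be checked mechanically once the values are recorded. The sketch below is an illustration under stated assumptions: the `MatrixRow` fields are a simplified version of the columns above, the rows are hypothetical, and the values are entered by hand or by whatever crawler the team already uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatrixRow:
    url: str
    robots_allowed: bool      # robots.txt result for this URL
    status: int               # HTTP status code
    noindex: bool             # meta robots or X-Robots-Tag says noindex
    canonical: Optional[str]  # canonical target, if any
    in_sitemap: bool


def conflicts(rows: list[MatrixRow]) -> list[str]:
    """Flag combinations of signals that contradict each other."""
    by_url = {row.url: row for row in rows}
    found = []
    for row in rows:
        if row.noindex and not row.robots_allowed:
            found.append(f"{row.url}: noindex is hidden behind a robots.txt disallow")
        if row.noindex and row.in_sitemap:
            found.append(f"{row.url}: noindex URL is listed in the sitemap")
        if 300 <= row.status < 400 and row.in_sitemap:
            found.append(f"{row.url}: redirected URL is listed in the sitemap")
        target = by_url.get(row.canonical) if row.canonical else None
        if target and target.url != row.url and (target.status != 200 or target.noindex):
            found.append(f"{row.url}: canonical target {target.url} is not an indexable 200")
    return found


# Hypothetical rows for illustration only.
matrix = [
    MatrixRow("/search?q=shoes", False, 200, True, None, False),
    MatrixRow("/collections/shoes?sort=price", True, 200, False, "/collections/old-shoes", False),
    MatrixRow("/collections/old-shoes", True, 301, False, None, True),
    MatrixRow("/products/shoe", True, 200, False, "/products/shoe", True),
]

for issue in conflicts(matrix):
    print(issue)
```

The canonical check only works for targets that are themselves in the matrix, so canonical targets of risky URL patterns should be added to the sample.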

06

Step 5: Crawl staging and production with the same URL sample

Use the same QA sample in staging and production so differences are visible. Include 10 to 20 priority URLs from each important group: homepage, service or product pages, category pages, blog posts, internal search, filtered URLs, localized routes, media files, old redirected URLs, and one intentionally removed URL.

The sample should include clean URLs and messy URLs. Add a URL with parameters, a URL with uppercase characters if the old site had them, a trailing-slash variant, a paginated or filtered URL, a preview URL, and a legacy URL from analytics or Search Console.

  • Compare robots.txt, status code, meta robots, canonical, sitemap presence, hreflang, and rendered title for each URL.
  • Check staging protection separately so staging is blocked from indexing without copying that block into production.
  • Confirm production is not carrying a temporary noindex from a launch freeze, preview plugin, or environment variable.
  • Test old URLs from the redirect map, not only new navigation links.
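
As a first pass on the robots.txt column, the same path sample can be evaluated against both environments' robots.txt files. This sketch uses Python's standard `urllib.robotparser` on robots.txt text that has already been saved; the file contents and paths are hypothetical. The standard parser does simple prefix matching and may not reproduce every search engine's matching rules (wildcards, for example), so treat the result as a screening step rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def robots_results(robots_txt: str, paths: list[str], agent: str = "*") -> dict[str, bool]:
    """Map each path to True (crawl allowed) or False (disallowed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


def environment_diff(staging_txt: str, production_txt: str,
                     paths: list[str]) -> dict[str, tuple[bool, bool]]:
    """Return paths whose robots.txt result differs: (staging, production)."""
    staging = robots_results(staging_txt, paths)
    production = robots_results(production_txt, paths)
    return {p: (staging[p], production[p]) for p in paths if staging[p] != production[p]}


# Hypothetical robots.txt contents saved from each environment.
STAGING = "User-agent: *\nDisallow: /\n"
PRODUCTION = "User-agent: *\nDisallow: /search\nDisallow: /cart\n"
SAMPLE = ["/", "/products/shoe", "/search?q=shoes", "/cart"]

for path, (on_staging, on_production) in environment_diff(STAGING, PRODUCTION, SAMPLE).items():
    print(f"{path}: staging allowed={on_staging}, production allowed={on_production}")
```

Differences are not automatically errors: a fully blocked staging site next to an open production site is the intended state. The list is useful because an empty diff on launch day would suggest the staging block was copied into production.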

07

Step 6: Verify headers and rendered HTML

Do not trust only the CMS preview or page source. JavaScript, server components, plugins, middleware, and CDN rules can change what crawlers receive. Test the final HTTP response and the rendered DOM for priority templates.

For files and non-HTML routes, check headers directly. PDFs, images, feeds, JSON endpoints, and generated XML files can receive X-Robots-Tag headers that never appear in page source. This matters when a redesign publishes gated files, downloadable specs, product feeds, or legacy assets.

  • Verify HTML source and rendered DOM both contain the intended robots and canonical values.
  • Check X-Robots-Tag headers on PDFs, images, feeds, API responses, XML files, and preview routes.
  • Confirm status codes are intentional: 200 for live indexable pages, 301 or 308 for permanent redirects, 404 or 410 for removed pages.
  • Retest after CDN cache clears because stale robots.txt, sitemap, or header responses can outlive the deploy.
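
The header and markup checks above can be combined into one small extractor. The sketch below assumes the HTML and response headers have already been captured by another tool; the markup and headers shown are hypothetical. It is deliberately simple: it does not handle crawler-specific meta tags or X-Robots-Tag values prefixed with a user agent name.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect meta robots tokens and the canonical link from HTML."""

    def __init__(self):
        super().__init__()
        self.robots = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = {key: (value or "") for key, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.robots += [t.strip().lower() for t in content.split(",") if t.strip()]
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href") or None


def index_signals(html: str, headers: dict[str, str]) -> dict:
    """Summarize the indexing signals found in markup and response headers."""
    parser = HeadSignals()
    parser.feed(html)
    header_value = next(
        (v for k, v in headers.items() if k.lower() == "x-robots-tag"), ""
    )
    header_tokens = [t.strip().lower() for t in header_value.split(",") if t.strip()]
    tokens = set(parser.robots) | set(header_tokens)
    return {
        "meta_robots": parser.robots,
        "x_robots_tag": header_tokens,
        "canonical": parser.canonical,
        # "none" is shorthand for "noindex, nofollow".
        "noindex": bool(tokens & {"noindex", "none"}),
    }


# Hypothetical response: the markup says index, the header says noindex.
html = (
    '<html><head><meta name="robots" content="index, follow">'
    '<link rel="canonical" href="https://example.com/products/shoe">'
    "</head><body></body></html>"
)
headers = {"Content-Type": "text/html", "X-Robots-Tag": "noindex"}

print(index_signals(html, headers))
```

Running the same function on the raw page source and on a saved copy of the rendered DOM makes it easy to see when a script, plugin, or edge rule changes the final values.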

08

Step 7: Create a launch and rollback checklist

Treat crawl-control changes like a release, not a setting update. Save a copy of the current robots.txt, sitemap output, key template metadata, SEO plugin rules, middleware rules, and CDN rules before launch. Then define exactly who can approve the final production crawl behavior.

A rollback does not always mean reverting the whole site. Sometimes the safest fix is restoring robots.txt, removing a noindex field, disabling one CDN header rule, regenerating the sitemap, or reverting a single metadata component.

  • Before launch: export robots.txt, sitemap URLs, index rule settings, metadata templates, redirect rules, and priority URL QA results.
  • During launch: publish rules, clear cache, regenerate sitemaps, crawl the priority sample, and test 20 high-value URLs manually.
  • Rollback triggers: homepage noindex, priority templates blocked, staging indexed, sitemap missing live URLs, or redirect chains on important pages.
  • After rollback: document the failed rule, owner, affected URLs, fix, and prevention step before the next deploy.
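
Saved copies are most useful when they can be compared against the live output right after deploy. This is a minimal sketch using Python's standard `difflib`; the robots.txt contents are hypothetical, and the same function could be pointed at a saved sitemap or template export.

```python
import difflib


def snapshot_diff(before: str, after: str, name: str) -> list[str]:
    """Unified diff between a pre-launch snapshot and the live output."""
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"{name} (saved before launch)",
        tofile=f"{name} (live after launch)",
        lineterm="",
    ))


def changed_lines(diff: list[str]) -> list[str]:
    """Keep only added and removed lines, dropping headers and context."""
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Hypothetical snapshots for illustration only.
saved = "User-agent: *\nDisallow: /search\nSitemap: https://example.com/sitemap.xml\n"
live = "User-agent: *\nDisallow: /\nSitemap: https://example.com/sitemap.xml\n"

for line in changed_lines(snapshot_diff(saved, live, "robots.txt")):
    print(line)
```

In this example the diff shows `Disallow: /search` replaced by `Disallow: /`, which is the kind of single-line change that should trigger the rollback path rather than a full site revert.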

09

Post-launch checks for the first two weeks

The first 24 to 48 hours are for obvious errors: robots.txt availability, sitemap fetches, homepage and priority template indexability, redirect status, and accidental staging exposure. After that, watch Search Console coverage, crawl stats, indexed pages, 404s, excluded-by-noindex reports, and server logs if available.

Keep the monitoring window open for 14 days. Crawlers do not revisit every URL immediately, and delayed cache, sitemap, or redirect issues may appear only after the first crawl wave. Log every issue with URL, rule source, expected behavior, actual behavior, owner, and prevention note.

10

Copy this robots/noindex QA template

URL group map: group name, example URLs, desired search behavior, robots.txt rule, meta robots rule, X-Robots-Tag rule, canonical rule, sitemap rule, owner, and rollback path.

Priority URL test: URL, status code, robots.txt result, meta robots, header robots, canonical target, sitemap presence, hreflang target, rendered title, and pass or fail.

Launch log: changed rule, environment, deploy time, cache clear time, validator, affected URL count, rollback trigger, and post-launch monitoring date.
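
For teams that prefer files over a shared spreadsheet, the three templates can be generated as blank CSV headers. This sketch simply turns the column lists above into snake_case headers; the template names are hypothetical.

```python
import csv
import io

# Column names taken from the three templates above.
TEMPLATES = {
    "url_group_map": [
        "group_name", "example_urls", "desired_search_behavior",
        "robots_txt_rule", "meta_robots_rule", "x_robots_tag_rule",
        "canonical_rule", "sitemap_rule", "owner", "rollback_path",
    ],
    "priority_url_test": [
        "url", "status_code", "robots_txt_result", "meta_robots",
        "header_robots", "canonical_target", "sitemap_presence",
        "hreflang_target", "rendered_title", "pass_or_fail",
    ],
    "launch_log": [
        "changed_rule", "environment", "deploy_time", "cache_clear_time",
        "validator", "affected_url_count", "rollback_trigger",
        "post_launch_monitoring_date",
    ],
}


def blank_template(name: str) -> str:
    """Return a CSV header row for the named template."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(TEMPLATES[name])
    return buffer.getvalue()


for name in TEMPLATES:
    print(f"{name}.csv")
    print(blank_template(name))
```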

Robots/noindex checklist

  • 01 Inventory every place that can control crawling or indexing, including robots.txt, meta robots, HTTP headers, canonicals, redirects, sitemaps, and platform settings.
  • 02 Classify URLs before editing rules so product, service, blog, filter, search, account, staging, and campaign pages each have a clear index decision.
  • 03 Test robots.txt and noindex rules with rendered HTML and HTTP headers, not only theme settings or source code.
  • 04 Check how noindex interacts with canonicals, redirects, sitemaps, hreflang, and faceted URLs before launch.
  • 05 Monitor Search Console coverage, crawl stats, 404s, and indexed staging URLs for the first two weeks after release.

I'm Max, founder of Build Build Studio. I work with a small network of trusted designers, developers, and specialists, keeping senior attention and direct communication close to every project.