Site Crawler

The Agentic SEO site crawler indexes your pages so the AI agent can understand your content, find thin pages, and suggest improvements grounded in what you actually have published.

How Crawling Works

When you trigger a crawl in Agentic SEO, the crawler starts from your sitemap (or the root URL if no sitemap is found) and visits each page to extract structured data. Progress streams in real time — you'll see each URL being processed as it happens.

The crawler respects your sitemap structure and stays within your domain. It doesn't follow external links or index third-party content.
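Agentic SEO doesn't expose its crawler internals, but the sitemap-first, same-domain behavior described above can be sketched in a few lines. Everything here (the `urls_to_crawl` function, the example URLs) is illustrative, not the actual implementation:

```python
from urllib.parse import urlparse
from xml.etree import ElementTree

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_to_crawl(sitemap_xml: str, site_root: str) -> list[str]:
    """Collect sitemap <loc> entries, keeping only same-domain URLs."""
    domain = urlparse(site_root).netloc
    urls = []
    for loc in ElementTree.fromstring(sitemap_xml).iter(f"{SITEMAP_NS}loc"):
        url = (loc.text or "").strip()
        # Stay within the site's domain; external links are never followed.
        if urlparse(url).netloc == domain:
            urls.append(url)
    return urls

sitemap = """<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/post-1</loc></url>
  <url><loc>https://cdn.other.com/asset</loc></url>
</urlset>"""

print(urls_to_crawl(sitemap, "https://example.com"))
# Only the same-domain URL survives the filter
```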

What Gets Extracted

For each crawled page, Agentic SEO extracts and stores:

  • Page title — the <title> tag
  • Meta description — for CTR analysis and optimization suggestions
  • Headings — H1 through H6 structure for content hierarchy analysis
  • Body content — the main text content of each page
  • URL structure — path hierarchy and slug patterns
  • Sitemap URLs — stored separately for comprehensive URL inventory
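To make the extraction list concrete, here is a minimal sketch of pulling the title, meta description, and headings out of a page with Python's standard-library HTML parser. The `PageExtractor` class is a hypothetical stand-in, not Agentic SEO's parser:

```python
from html.parser import HTMLParser

class PageExtractor(HTMLParser):
    """Collect <title>, meta description, and h1-h6 headings from a page."""
    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self.headings = []   # (tag, text) pairs in document order
        self._current = None # tag whose text we are currently capturing

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "description":
            self.meta_description = attrs.get("content", "")
        elif tag == "title" or tag in self.HEADINGS:
            self._current = tag

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current in self.HEADINGS:
            self.headings.append((self._current, data.strip()))

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

html = ("<html><head><title>Pricing</title>"
        "<meta name='description' content='Compare our plans'></head>"
        "<body><h1>Our Plans</h1></body></html>")
extractor = PageExtractor()
extractor.feed(html)
print(extractor.title, extractor.meta_description, extractor.headings)
```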

This data becomes the foundation for site context queries, internal link suggestions, content gap analysis, and writing style extraction.

Crawl Limits by Plan

Plan             Max Pages per Crawl   Best For
Starter (Free)   25 pages              Small blogs, single-product sites
Pro ($29/mo)     100 pages             Medium blogs, business sites
Agency ($79/mo)  500 pages             Large sites, e-commerce, client portfolios

Tip

If your site has more pages than your crawl limit allows, the crawler prioritizes pages from your sitemap. Make sure your sitemap lists your most important pages first.
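The prioritization rule above amounts to truncating the sitemap's URL list at your plan limit. A one-line sketch (function name and limits illustrative, plan limits taken from the table above):

```python
def pages_for_crawl(sitemap_urls: list[str], plan_limit: int) -> list[str]:
    # Sitemap order decides priority: the first `plan_limit` entries are crawled.
    return sitemap_urls[:plan_limit]

urls = [f"https://example.com/page-{i}" for i in range(1, 31)]
print(len(pages_for_crawl(urls, 25)))  # Starter plan: only the first 25 pages
```

This is why sitemap ordering matters: on the Starter plan, pages 26-30 above would never be crawled.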

How the Agent Uses Crawl Data

Once your site is crawled, the Agentic SEO agent can access your content through the site_context tool. This tool searches your crawled pages to answer questions about your existing content — without re-crawling each time.

Site Context Query Types

The site_context tool supports three query types:

Query Type     What It Does                                Example Use
search         Searches crawled page content by keyword    "Find all my pages about email marketing"
thin_pages     Identifies pages with low word count        "Which pages need more content?"
keyword_check  Checks if a keyword appears across pages    "Do I already have content covering this topic?"

For example, you might prompt the agent: "Check if I already have content about 'local SEO' before I write a new article."

The agent searches your crawled pages for existing coverage, preventing keyword cannibalization.
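The three query types can be pictured as a simple dispatch over the crawled-page store. This is an illustrative sketch only (the `site_context` function, the 300-word threshold, and the sample pages are assumptions, not the real tool):

```python
def site_context(pages: dict[str, str], query_type: str,
                 keyword: str = "", min_words: int = 300):
    """Dispatch a query over crawled pages, keyed by URL path."""
    if query_type == "search":
        # Pages whose body text mentions the keyword
        return [url for url, text in pages.items()
                if keyword.lower() in text.lower()]
    if query_type == "thin_pages":
        # Pages under an assumed word-count threshold
        return [url for url, text in pages.items()
                if len(text.split()) < min_words]
    if query_type == "keyword_check":
        # Does any page already cover this keyword?
        return any(keyword.lower() in text.lower() for text in pages.values())
    raise ValueError(f"unknown query type: {query_type}")

pages = {
    "/blog/local-seo-guide": "A long guide to local SEO " * 100,
    "/about": "We are a small team.",
}
print(site_context(pages, "keyword_check", "local SEO"))  # True
print(site_context(pages, "thin_pages"))                  # ['/about']
```

A `keyword_check` returning True is the signal to update existing coverage rather than write a competing article.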

When to Re-Crawl

Re-crawl your site when:

  • You've published significant new content since the last crawl
  • You've restructured your site or changed URL patterns
  • You want the agent to have fresh content data for analysis
  • You've updated meta descriptions or titles and want to verify changes

Note

Each crawl replaces the previous data — Agentic SEO stores one snapshot per project. Your GSC data is separate and unaffected by re-crawls.

Try these prompts

Find thin pages on my site that need more content
Which of my pages are missing meta descriptions?
Search my site for pages about "keyword research"
Analyze my site structure and suggest improvements

© 2026 Agentic SEO. All rights reserved.