Site Crawler
The Agentic SEO site crawler indexes your pages so the AI agent can understand your content, find thin pages, and suggest improvements grounded in what you've actually published.
How Crawling Works
When you trigger a crawl in Agentic SEO, the crawler starts from your sitemap (or the root URL if no sitemap is found) and visits each page to extract structured data. Progress streams in real time — you'll see each URL being processed as it happens.
The crawler respects your sitemap structure and stays within your domain. It doesn't follow external links or index third-party content.
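The sitemap-first, same-domain behavior described above can be sketched as a small filter: parse the sitemap's `<loc>` entries and keep only URLs on your own domain. This is a minimal illustration in Python's standard library, not the product's actual crawler code.

```python
from urllib.parse import urlparse
from xml.etree import ElementTree

def sitemap_urls(sitemap_xml: str, site_root: str) -> list[str]:
    """Parse a sitemap and keep only URLs on the site's own domain."""
    domain = urlparse(site_root).netloc
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    tree = ElementTree.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in tree.findall(".//sm:loc", ns)]
    # External links are dropped: the crawl stays within your domain.
    return [u for u in urls if urlparse(u).netloc == domain]

sitemap = """<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/seo-basics</loc></url>
  <url><loc>https://other-site.com/page</loc></url>
</urlset>"""
print(sitemap_urls(sitemap, "https://example.com"))
# keeps only https://example.com/blog/seo-basics
```

The third-party URL is filtered out before any page is fetched, which is why external links and third-party content never enter the index.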
What Gets Extracted
For each crawled page, Agentic SEO extracts and stores:
- Page title — the `<title>` tag
- Meta description — for CTR analysis and optimization suggestions
- Headings — H1 through H6 structure for content hierarchy analysis
- Body content — the main text content of each page
- URL structure — path hierarchy and slug patterns
- Sitemap URLs — stored separately for comprehensive URL inventory
This data becomes the foundation for site context queries, internal link suggestions, content gap analysis, and writing style extraction.
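To make the extraction step concrete, here is a minimal sketch of pulling the title, meta description, and heading hierarchy out of a page with Python's built-in HTML parser. The class name and fields are illustrative only; the crawler's real extraction pipeline and storage schema may differ.

```python
from html.parser import HTMLParser

class PageExtractor(HTMLParser):
    """Minimal sketch: extract title, meta description, and headings."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self.headings: list[tuple[str, str]] = []
        self._current = None  # tag whose text is being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "description":
            self.meta_description = attrs.get("content", "")
        elif tag == "title" or tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self._current = tag

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current:  # one of h1..h6
            self.headings.append((self._current, data.strip()))

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

extractor = PageExtractor()
extractor.feed("""<html><head><title>SEO Basics</title>
<meta name="description" content="A primer on SEO."></head>
<body><h1>SEO Basics</h1><h2>Why it matters</h2></body></html>""")
# extractor.title == "SEO Basics"
# extractor.headings == [("h1", "SEO Basics"), ("h2", "Why it matters")]
```

Storing headings as (level, text) pairs preserves the hierarchy needed for content-structure analysis.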
Crawl Limits by Plan
| Plan | Max Pages per Crawl | Best For |
|---|---|---|
| Starter (Free) | 25 pages | Small blogs, single-product sites |
| Pro ($29/mo) | 100 pages | Medium blogs, business sites |
| Agency ($79/mo) | 500 pages | Large sites, e-commerce, client portfolios |
If your site has more pages than your crawl limit allows, the crawler takes pages from your sitemap in order until the limit is reached. Make sure your sitemap lists your most important pages first.
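The limit behavior amounts to truncating the sitemap's URL list at your plan's cap. A small sketch, with the plan limits hard-coded from the table above (the real enforcement happens server-side):

```python
# Plan limits mirroring the table above (illustrative constants).
PLAN_LIMITS = {"starter": 25, "pro": 100, "agency": 500}

def pages_to_crawl(sitemap_urls: list[str], plan: str) -> list[str]:
    """Keep sitemap order and truncate at the plan's page limit,
    so the pages listed first in the sitemap are crawled first."""
    return sitemap_urls[:PLAN_LIMITS[plan]]

urls = [f"https://example.com/page-{i}" for i in range(40)]
print(len(pages_to_crawl(urls, "starter")))  # 25
```

Because truncation preserves order, reordering your sitemap is the lever for controlling which pages make the cut.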
How the Agent Uses Crawl Data
Once your site is crawled, the Agentic SEO agent can access your content through the `site_context` tool. This tool searches your crawled pages to answer questions about your existing content — without re-crawling each time.
Site Context Query Types
The `site_context` tool supports three query types:
| Query Type | What It Does | Example Use |
|---|---|---|
| `search` | Searches crawled page content by keyword | "Find all my pages about email marketing" |
| `thin_pages` | Identifies pages with low word count | "Which pages need more content?" |
| `keyword_check` | Checks if a keyword appears across your pages | "Do I already have content covering this topic?" |
For example, you might ask: "Check if I already have content about 'local SEO' before I write a new article." The agent searches your crawled pages for existing coverage, helping you avoid keyword cannibalization.
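The three query types can be understood as simple operations over the stored page records. This sketch uses a hypothetical in-memory record shape (`url`, `title`, `body`) and a 300-word thin-page threshold as assumptions; the product's actual schema and thresholds are not documented here.

```python
# Hypothetical crawled-page records; the real stored schema may differ.
pages = [
    {"url": "/blog/email-tips", "title": "Email Marketing Tips",
     "body": "Grow your list with email marketing campaigns."},
    {"url": "/about", "title": "About", "body": "We are a small team."},
]

def site_context(query_type: str, pages: list[dict],
                 term: str = "", min_words: int = 300):
    if query_type == "search":
        # Keyword search over titles and bodies.
        t = term.lower()
        return [p["url"] for p in pages
                if t in p["title"].lower() or t in p["body"].lower()]
    if query_type == "thin_pages":
        # Pages below the word-count threshold need more content.
        return [p["url"] for p in pages if len(p["body"].split()) < min_words]
    if query_type == "keyword_check":
        # Does any page already cover this keyword?
        return any(term.lower() in p["body"].lower() for p in pages)
    raise ValueError(f"unknown query type: {query_type}")

print(site_context("keyword_check", pages, "local SEO"))  # False
```

A `False` from `keyword_check` tells the agent the topic is uncovered, so a new article won't cannibalize existing pages.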
When to Re-Crawl
Re-crawl your site when:
- You've published significant new content since the last crawl
- You've restructured your site or changed URL patterns
- You want the agent to have fresh content data for analysis
- You've updated meta descriptions or titles and want to verify changes
Each crawl replaces the previous data — Agentic SEO stores one snapshot per project. Your GSC data is separate and unaffected by re-crawls.
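The one-snapshot-per-project rule means saving a crawl is an overwrite, not an append. A minimal sketch of that storage behavior, using an illustrative class name and an in-memory dict as assumptions:

```python
# Minimal sketch of single-snapshot storage: each crawl replaces the last.
class CrawlStore:
    def __init__(self):
        self._snapshots: dict[str, list[dict]] = {}  # project id -> latest crawl

    def save_crawl(self, project_id: str, pages: list[dict]) -> None:
        # Overwrite rather than append: one snapshot per project.
        self._snapshots[project_id] = pages

    def latest(self, project_id: str) -> list[dict]:
        return self._snapshots.get(project_id, [])

store = CrawlStore()
store.save_crawl("proj-1", [{"url": "/old"}])
store.save_crawl("proj-1", [{"url": "/new"}])
print(store.latest("proj-1"))  # [{'url': '/new'}]
```

Because GSC data lives outside this store, re-crawling never touches it.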
© 2026 Agentic SEO. All rights reserved.