Crawl Optimization
Maximize crawl value within your page limits — prioritize the right pages so the agent always has the context it needs.
How the Crawler Works
The crawler uses a sitemap-first approach. If a sitemap is found at the standard location or linked from your root page, those URLs form the crawl queue. Without a sitemap, the crawler falls back to link discovery starting from your homepage. Progress streams in real time via SSE, and extracted data is stored per-project as a JSON snapshot the agent can query instantly.
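The sitemap-first queue logic can be sketched in a few lines of Python. This is an illustrative model, not the product's actual implementation: the function name, the naive `href` regex, and the in-memory inputs are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def build_crawl_queue(sitemap_xml, homepage_html, homepage_url, limit):
    """Sitemap-first: take <loc> URLs in document order; if no sitemap
    is available, fall back to links discovered on the homepage."""
    urls = []
    if sitemap_xml:
        root = ET.fromstring(sitemap_xml)
        urls = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]
    if not urls:
        # Naive link discovery: absolute links found in the homepage markup.
        urls = [homepage_url] + re.findall(r'href="(https?://[^"]+)"', homepage_html)
    # Deduplicate while preserving order, then cap at the plan limit.
    seen, queue = set(), []
    for u in urls:
        if u not in seen:
            seen.add(u)
            queue.append(u)
    return queue[:limit]
```

Because the queue is capped at the plan limit in sitemap order, whatever appears first in your sitemap is what actually gets crawled.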
Most sitemap generators list pages newest-first. If your top pages are older pillar posts, move them to the top of your sitemap so they're always included within your crawl limit.
Crawl Limits by Plan
| Plan | Max Pages per Crawl | Best Fit |
|---|---|---|
| Free | 25 pages | Small blogs, landing pages, early-stage projects |
| Pro ($29/mo) | 100 pages | Business blogs, content-heavy SaaS |
| Agency ($79/mo) | 500 pages | Large sites, e-commerce, multi-section publications |
Each crawl replaces the previous snapshot. There is no historical archive, so re-crawl whenever your site changes significantly.
Prioritization Strategies
When your site has more pages than your crawl limit, prioritize the pages the agent needs most for content analysis, link suggestions, and writing style extraction.
- Put product pages, service pages, and high-traffic pillar content at the top of your sitemap
- List important blog posts before category or tag archive pages
- Exclude utility pages (author archives, pagination, login) from your sitemap entirely
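The three rules above amount to a reorder-and-filter pass over your sitemap URLs. A minimal sketch, assuming illustrative prefix and pattern lists (adjust both to your own site's URL structure):

```python
from urllib.parse import urlparse

PRIORITY_PREFIXES = ("/products/", "/services/", "/blog/")   # pillar content first
EXCLUDE_PATTERNS = ("/author/", "/page/", "/tag/", "/login")  # utility pages

def prioritize(urls):
    """Drop utility pages, then stable-sort so priority sections
    land inside the crawl limit. Original order is preserved within
    each tier."""
    kept = [u for u in urls
            if not any(p in urlparse(u).path for p in EXCLUDE_PATTERNS)]
    def rank(url):
        path = urlparse(url).path
        for i, prefix in enumerate(PRIORITY_PREFIXES):
            if path.startswith(prefix):
                return i
        return len(PRIORITY_PREFIXES)  # everything else sorts last
    return sorted(kept, key=rank)
```

Run your sitemap's URL list through a pass like this before regenerating it, and the crawl limit will be spent on the pages that matter.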
“Crawl my site focusing on the blog section first”
The agent prioritizes that section's pages within your plan limit.
For large sites on Pro, consider temporarily pruning your sitemap to a specific section (e.g., only /docs/ URLs) when you need deep analysis of that area.
When to Re-Crawl
Re-crawl when any of the following apply:
- New content published — the agent cannot see pages added after the last crawl
- Site restructured — changed URLs, merged sections, or reorganized navigation
- Key pages updated — title changes, rewrites, or meta description updates
- 4-6 weeks have passed — periodic refreshes keep agent context accurate
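The checklist above boils down to a simple staleness test. A hedged sketch, using the upper end of the 4-6 week guideline as the cutoff (the function and its parameters are illustrative, not part of the product):

```python
from datetime import datetime, timedelta, timezone

RECRAWL_AFTER = timedelta(weeks=6)  # upper end of the 4-6 week guideline

def needs_recrawl(last_crawl_iso, pages_changed_since=0, now=None):
    """True when content has been published or changed since the last
    crawl, or the snapshot is older than the refresh window."""
    now = now or datetime.now(timezone.utc)
    last = datetime.fromisoformat(last_crawl_iso)
    return pages_changed_since > 0 or (now - last) > RECRAWL_AFTER
```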
GSC data syncs independently. A re-crawl updates page content knowledge only — it does not affect your search performance data.
How the Agent Uses Crawl Data
The agent accesses crawl data through the site_context tool. This lets it search pages, identify thin content, and check keyword coverage against the stored snapshot without live network requests.
| Agent Task | How Crawl Data Helps |
|---|---|
| Content gap analysis | Checks which topics already have pages before suggesting new ones |
| Internal link suggestions | Finds relevant anchor opportunities across your site |
| Content audit | Identifies thin pages, missing meta descriptions, and H1 issues |
| Writing style extraction | Samples existing content to model your brand voice |
| Keyword cannibalization | Finds multiple pages targeting the same query |
“Analyze my crawled content for thin pages that need improvement”
The agent scans your crawl snapshot and returns pages below a word count threshold with expansion suggestions.
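Conceptually, that scan is a filter over the stored snapshot. A minimal sketch, assuming a hypothetical snapshot schema (`pages`, `text`, `meta_description`) and an illustrative word-count cutoff; the real snapshot format and threshold are internal to the product:

```python
import json

THIN_THRESHOLD = 300  # words; an illustrative cutoff, not the product's value

def find_thin_pages(snapshot_json, threshold=THIN_THRESHOLD):
    """Return (url, word_count) pairs for pages below the threshold,
    thinnest first, plus any page missing a meta description."""
    pages = json.loads(snapshot_json)["pages"]
    counts = [(p["url"], len(p.get("text", "").split())) for p in pages]
    thin = sorted((c for c in counts if c[1] < threshold), key=lambda c: c[1])
    missing_meta = [p["url"] for p in pages if not p.get("meta_description")]
    return thin, missing_meta
```

Because everything runs against the local snapshot, no live requests hit your site during analysis.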
Tips for Best Results
- Keep your sitemap current. Outdated sitemaps waste quota on deleted pages and miss new ones.
- Match crawl scope to your question. Crawl a specific section for focused analysis, or crawl broadly for a full audit.
- Re-crawl before content sprints. Fresh context prevents the agent from reasoning on stale data.
- Check crawl status first. Ask the agent what's been crawled and when before starting analysis.
On the Free plan, 25 pages goes further than you'd expect. Focus on your top pillar pages plus highest-traffic blog posts — enough for the agent to understand your site's structure, voice, and topical coverage.
© 2026 Agentic SEO. All rights reserved.