The AI crawler landscape is changing fast. AEO Pugmill exists to track it honestly and help WordPress sites prepare for it practically.
AEO Pugmill adds structured data and machine-readable endpoints to WordPress posts. Some outputs are served as separate URLs that bots can request independently. Others are embedded in the HTML of the page itself. The distinction matters for tracking, and for understanding the limits of what bot analytics can tell us.
Each separately served output is a distinct resource a bot can request. When a crawler fetches one, the plugin logs the bot name, the resource type, and the date. That per-resource granularity is what makes the network dashboard possible: it shows which bots are requesting which content formats.
A plain-text index of the site: title, description, and a list of posts with summaries and links to their Markdown versions. Follows the llms.txt specification. AI crawlers use it as a table of contents to decide which pages to fetch in full.
Dynamically generated by a WordPress rewrite rule; no static file is written to disk. WordPress intercepts requests to /llms.txt and renders the response in memory.
# AEO Pugmill
> https://aeopugmill.com
The WordPress plugin for Answer Engine Optimization — FAQPage schema, entity graphs, llms.txt, and AI bot analytics.
## Posts
- [How AI Crawlers Read Your Content](https://aeopugmill.com/how-ai-crawlers-work): What GPTBot, ClaudeBot, and PerplexityBot actually fetch, why structured data changes what they cite, and how to track it.
Markdown: https://aeopugmill.com/how-ai-crawlers-work/?aeopugmill_llm=1
- [What is llms.txt?](https://aeopugmill.com/what-is-llms-txt): The emerging open standard that gives AI crawlers a structured index of your site's content.
Markdown: https://aeopugmill.com/what-is-llms-txt/?aeopugmill_llm=1
## Pages
- [About](https://aeopugmill.com/about): The Pugmill network, what the plugin does, and how bot tracking works.
Markdown: https://aeopugmill.com/about/?aeopugmill_llm=1
Tracked as: llms_txt; each bot request is counted separately from HTML page visits.
A structured Markdown rendering of a single post. Includes metadata (publish date, modified date, featured image), the AEO summary, entity list, Q&A pairs, keywords, and the full post content converted to Markdown. Gives AI crawlers a clean, parse-ready version of the content without HTML markup or theme chrome.
Served by intercepting the standard post URL when the ?aeopugmill_llm=1 query parameter is present. The same permalink that normally returns the HTML page returns a Markdown document instead; no extra URL or file required.
# Circuit Breakers in Practice
URL: https://example.com/circuit-breakers
Published: 2026-01-15T10:30:00Z
Modified: 2026-03-10T14:22:15Z
## Summary
Circuit breaker patterns prevent cascading failures by wrapping
remote calls in a state machine that trips open after repeated errors.
**Keywords:** circuit breaker, microservices, fault tolerance
## Entities
- Martin Fowler (Person) — Software author and ThoughtWorks chief scientist
- Hystrix (Technology) — Netflix's circuit breaker library
## Q&A
**Q: When should a circuit breaker trip open?**
After a configurable threshold of consecutive failures within
a rolling time window.
## Content
The full post body in Markdown...
Tracked as: post_markdown
A Markdown overview of the site served at the home URL with the aeopugmill_llm=1 parameter. Lists the five most recent posts with summaries, plus links to the full content indexes at /llms.txt and /llms-full.txt.
Same query parameter mechanism as Post Markdown, but applied to the home URL (/?aeopugmill_llm=1). Returns a site-level Markdown overview rather than a single post.
Tracked as: site_summary
A standalone JSON-LD file containing the FAQPage schema, entity mentions, citations, and an associatedMedia link to the Markdown endpoint. Served only for posts that have AEO data. Gives bots direct access to the structured data without parsing the HTML page.
A dynamic endpoint registered via a WordPress rewrite rule matching /aeo/*.jsonld. The file does not exist on disk; WordPress intercepts the request and generates the JSON-LD response from the post's stored AEO metadata.
{
"@context": "https://schema.org",
"@graph": [
{
"@type": "FAQPage",
"mainEntity": [{
"@type": "Question",
"name": "When should a circuit breaker trip open?",
"acceptedAnswer": {
"@type": "Answer",
"text": "After a configurable threshold of consecutive failures..."
}
}]
},
{
"@type": "Article",
"headline": "Circuit Breakers in Practice",
"mentions": [{
"@type": "Person",
"name": "Martin Fowler",
"sameAs": "https://en.wikipedia.org/wiki/Martin_Fowler_(software_engineer)"
}],
"citation": [{
"@type": "WebPage",
"url": "https://martinfowler.com/bliki/CircuitBreaker.html",
"name": "CircuitBreaker — Martin Fowler"
}],
"associatedMedia": {
"@type": "MediaObject",
"encodingFormat": "text/markdown",
"contentUrl": "https://example.com/circuit-breakers/?aeopugmill_llm=1"
}
}
]
}
Tracked as: aeo_jsonld
A standard XML sitemap with one addition: each post entry includes an xhtml:link alternate pointing to its Markdown endpoint. Bots that understand alternate links can discover the structured version without a separate crawl of /llms.txt.
Extends WordPress's built-in sitemap provider via the wp_sitemaps_posts_entry filter. Each post entry gets an xhtml:link alternate pointing to its Markdown endpoint; no separate sitemap file is generated.
<url>
<loc>https://aeopugmill.com/how-ai-crawlers-work</loc>
<lastmod>2026-03-10</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
<xhtml:link rel="alternate" type="text/markdown"
href="https://aeopugmill.com/how-ai-crawlers-work/?aeopugmill_llm=1"/>
</url>
Tracked as: sitemap
The plugin appends a Sitemap directive and an LLMs-Txt directive to the WordPress-generated robots.txt. The LLMs-Txt line signals to AI crawlers that a structured content index is available.
Appended via WordPress's robots_txt filter hook; no file is written to disk. WordPress generates /robots.txt dynamically on each request, and the plugin adds its directives at that point.
Sitemap: https://aeopugmill.com/sitemap.xml
# AI content index
LLMs-Txt: https://aeopugmill.com/llms.txt
Tracked as: robots_txt
The standard WordPress RSS 2.0 feed, enriched with an xmlns:aeo namespace and per-item AEO elements: <aeo:summary>, <aeo:entity>, and <aeo:qa>. AI crawlers that consume RSS feeds receive the full AEO metadata (structured summaries, named entities, and Q&A pairs) alongside post content. Purely additive: existing feed elements including content:encoded are not modified. Can be disabled from the Compatibility tab if another plugin is already enriching the feed.
Hooks into WordPress's rss2_ns and rss2_item actions to add the xmlns:aeo namespace declaration and per-item AEO elements to the existing feed. The base RSS 2.0 feed is unchanged; AEO data is purely additive.
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:aeo="https://aeopugmill.com/ns/rss/1.0">
<channel>
<item>
<title>Circuit Breakers in Practice</title>
<link>https://example.com/circuit-breakers</link>
<content:encoded><![CDATA[...full post HTML...]]></content:encoded>
<aeo:summary>Circuit breaker patterns prevent cascading failures
by wrapping remote calls in a state machine that trips open
after repeated errors.</aeo:summary>
<aeo:entity type="Person" sameAs="https://en.wikipedia.org/wiki/Martin_Fowler_(software_engineer)">
Martin Fowler
</aeo:entity>
<aeo:qa>
<aeo:question>When should a circuit breaker trip open?</aeo:question>
<aeo:answer>After a configurable threshold of consecutive failures
within a rolling time window.</aeo:answer>
</aeo:qa>
</item>
</channel>
</rss>
Tracked as: rss_feed; RSS requests from bots are counted separately from HTML page visits.
These outputs are injected into the HTML <head> of each post. They are present when any bot (or person) loads the page. There is no separate URL to request; the data rides along with the HTML.
When a bot visits an HTML page, the plugin checks at that moment whether the post has AEO metadata stored. If it does, the visit is logged as aeo_post; if not, as html. Either way it is a single HTML page request, but the distinction matters: most WordPress sites have a mix of AEO-enriched and plain posts, so this split reveals which bots are landing on enriched content and which are only reaching plain pages. Over time, patterns emerge at the network level: AI answer engines tend to show higher aeo_post ratios than SEO crawlers, which suggests they are finding and returning to structured content rather than treating all HTML equally.
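That logging decision can be sketched in a few lines. This is an illustrative sketch, not the plugin's actual code; classify_visit and its arguments are hypothetical names:

```python
def classify_visit(resource_type: str, has_aeo_metadata: bool) -> str:
    """Return the resource type to log for a bot visit.

    Only HTML page visits are split: posts with stored AEO metadata
    log as "aeo_post", plain posts as "html". Every other endpoint
    (llms.txt, Markdown, JSON-LD, ...) logs its own type unchanged.
    """
    if resource_type == "html":
        return "aeo_post" if has_aeo_metadata else "html"
    return resource_type
```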
What the tracking cannot tell you is which specific embedded element a bot looked at. A visit to an AEO-enriched post could involve the FAQPage schema, the entity mentions, the citation list, or just the post text. The plugin records that enriched content was present; it cannot record what the bot did with it.
The data is embedded this way because that is where it belongs. FAQPage schema is most effective when it is part of the page's own <script type="application/ld+json"> block, where search engines and AI crawlers expect to find it. Separating it into standalone files would reduce its utility for traditional search while providing no clear benefit for AI crawlers, which already parse the full page.
Generated from the Q&A pairs stored in the post's AEO metadata. Each question becomes a Question node with an acceptedAnswer. This is the same schema format that Google uses for FAQ rich results; AI crawlers also parse it to extract question-answer pairs as citable facts.
Injected into the page <head> as a <script type="application/ld+json"> block via the wp_head action. Present on every page load; no separate URL or request needed.
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@graph": [{
"@type": "FAQPage",
"@id": "https://example.com/circuit-breakers/#faqpage",
"mainEntity": [{
"@type": "Question",
"name": "When should a circuit breaker trip open?",
"acceptedAnswer": {
"@type": "Answer",
"text": "After a configurable threshold of consecutive failures within a rolling time window."
}
}]
}]
}
</script>
Tracked as: html or aeo_post; not separately distinguishable from the page visit.
Each entity stored in the post's AEO metadata becomes a typed mentions entry in the Article JSON-LD node. The sameAs URL links to an authoritative reference (typically Wikipedia or an official site), giving AI systems a way to disambiguate the entity, for example "Martin Fowler the software author" versus any other Martin Fowler.
Merged into the same <script type="application/ld+json"> Article node as the FAQPage schema; a single wp_head block carries the Q&A, entity, and citation data together.
"mentions": [
{
"@type": "Person",
"name": "Martin Fowler",
"description": "Software author and ThoughtWorks chief scientist",
"sameAs": "https://en.wikipedia.org/wiki/Martin_Fowler_(software_engineer)"
},
{
"@type": "SoftwareApplication",
"name": "Hystrix",
"sameAs": "https://github.com/Netflix/Hystrix"
}
]
Tracked as: html or aeo_post
External links in the post content are extracted and added as citation entries in the Article JSON-LD node. Each citation includes the URL and the link's anchor text. This gives AI systems a structured list of the post's sources without parsing the HTML body.
Part of the same <script type="application/ld+json"> Article node injected via wp_head. Citations are extracted from post content links at render time and included in the unified JSON-LD block.
"citation": [
{
"@type": "WebPage",
"url": "https://martinfowler.com/bliki/CircuitBreaker.html",
"name": "CircuitBreaker — Martin Fowler"
},
{
"@type": "WebPage",
"url": "https://netflix.github.io/Hystrix/",
"name": "Hystrix Wiki"
}
]
Tracked as: html or aeo_post
The plugin injects description, Open Graph, and Twitter Card meta tags derived from the post's AEO summary (falling back to the WordPress excerpt if no summary exists). These are suppressed when an active SEO plugin is detected and the conflict toggle is enabled.
Injected via wp_head as standard <meta> tags. When a supported SEO plugin (Yoast, Rank Math, AIOSEO) is detected and the conflict toggle is enabled, the plugin suppresses these tags to avoid duplicates.
<meta name="description" content="Circuit breaker patterns prevent cascading failures by wrapping remote calls in a state machine.">
<meta property="og:title" content="Circuit Breakers in Practice">
<meta property="og:description" content="Circuit breaker patterns prevent cascading failures...">
<meta property="og:type" content="article">
<meta name="twitter:card" content="summary_large_image">
Tracked as: html or aeo_post
On every incoming request, the plugin checks the user-agent string against a list of 25 known bot signatures: AI answer engines (GPTBot, ClaudeBot, PerplexityBot, Google-Extended), training crawlers (CCBot, Bytespider, DeepSeekBot), search engines (Googlebot, Bingbot), and SEO tools (SemrushBot, AhrefsBot). Matching is case-insensitive substring comparison.
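The matching step reduces to a substring scan. A minimal sketch follows; the signature table here is a small illustrative subset of the plugin's 25, and match_bot is a hypothetical name:

```python
# Map of lowercase user-agent substrings to canonical bot names.
# Illustrative subset only; the plugin ships 25 signatures.
BOT_SIGNATURES = {
    "gptbot": "GPTBot",
    "claudebot": "ClaudeBot",
    "perplexitybot": "PerplexityBot",
    "ccbot": "CCBot",
    "googlebot": "Googlebot",
    "ahrefsbot": "AhrefsBot",
}


def match_bot(user_agent: str):
    """Case-insensitive substring match against the signature table.

    Returns the canonical bot name, or None when nothing matches.
    """
    ua = user_agent.lower()
    for signature, canonical in BOT_SIGNATURES.items():
        if signature in ua:
            return canonical
    return None
```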
When a match is found, the plugin records three things: the canonical bot name, the resource type requested (one of html, llms_txt, post_markdown, site_summary, sitemap, robots_txt, aeo_jsonld, rss_feed, or aeo_post), and the date. Counts are stored locally in a daily summary table; no per-request log is kept.
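The aggregation amounts to a counter keyed by date, bot, and resource type. A minimal in-memory sketch, assuming the plugin persists the same shape as a database table (class and method names are illustrative):

```python
from collections import Counter


class DailyBotSummary:
    """Count bot visits per (date, bot, resource_type) triple.

    Mirrors the daily summary table described in the text: only
    aggregate counts are kept, never a per-request log.
    """

    def __init__(self):
        self.counts = Counter()

    def record(self, date: str, bot: str, resource_type: str) -> None:
        """Increment the count for one bot request."""
        self.counts[(date, bot, resource_type)] += 1

    def total(self, date: str, bot: str, resource_type: str) -> int:
        """Return the stored count (0 if never seen)."""
        return self.counts[(date, bot, resource_type)]
```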
For content-bearing requests (HTML pages), the plugin also records content signals: word count bucket, content freshness, fact density, and URL depth. These are the metrics displayed in the Crawl Intelligence section of the network dashboard.
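Two of those signals are easy to illustrate. The bucket thresholds below are assumptions made for the sketch; the plugin's actual boundaries are not documented here:

```python
from urllib.parse import urlparse


def word_count_bucket(word_count: int) -> str:
    """Bucket a post's word count into coarse ranges.

    Thresholds (300, 1500) are illustrative assumptions only.
    """
    if word_count < 300:
        return "short"
    if word_count < 1500:
        return "medium"
    return "long"


def url_depth(url: str) -> int:
    """Count non-empty path segments: /blog/2026/post/ has depth 3."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])
```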
Network participation is opt-in. The plugin functions without contributing to the network. Enabling network intelligence sends a daily count summary: bot name, resource type, visit total, and content signal distributions. No URLs, no content, no user data.
The site identifier sent with each payload is a one-way SHA-256 hash of the site URL and an instance ID. It cannot be reversed to recover the domain. It exists to deduplicate contributions from the same site across days.
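A sketch of that identifier, assuming a simple concatenation before hashing (the exact input format the plugin uses is not specified here):

```python
import hashlib


def site_identifier(site_url: str, instance_id: str) -> str:
    """One-way SHA-256 digest of the site URL plus an instance ID.

    The digest is stable for the same inputs, so daily contributions
    from one site deduplicate, but it cannot be reversed to recover
    the domain. The "|" concatenation is an illustrative assumption.
    """
    material = f"{site_url}|{instance_id}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()
```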
When a post is published or updated, the plugin sends an IndexNow ping to notify search engines that the URL has changed. The ping includes a verification key stored as a static file at the site root. Pings are throttled to one burst per 30 minutes to avoid rate-limiting.
IndexNow is supported by Bing, Yandex, and other participating search engines. Google does not currently participate in the IndexNow protocol but receives a separate sitemap ping.
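The 30-minute throttle reduces to a timestamp check. A sketch with an injectable clock so the behaviour is testable; class and method names are illustrative, not the plugin's internals:

```python
import time


class PingThrottle:
    """Allow one ping burst per cooldown window (30 minutes here,
    matching the throttle described in the text)."""

    def __init__(self, cooldown_seconds: float = 30 * 60, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock          # injectable for testing
        self.last_burst = None      # time of last allowed burst

    def allow(self) -> bool:
        """Return True (and start a new window) if a burst may fire now."""
        now = self.clock()
        if self.last_burst is None or now - self.last_burst >= self.cooldown:
            self.last_burst = now
            return True
        return False
```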
The phrase comes from Juvenal, a Roman poet writing satirical verse in the late 1st and early 2nd centuries AD. It appears in Satire VI, his longest and most acerbic work, in a passage about the futility of keeping a wife faithful: you cannot trust the guards you hire to watch over her, because the guards themselves need watching. The original meaning was less about political theory and more about the impossibility of reliable oversight of any kind.
The phrase outlasted its domestic context entirely. By the time it entered political philosophy, it had become a foundational challenge to any system of power: who holds the overseers accountable? It now appears in arguments about police oversight, intelligence agencies, judicial review, and, increasingly, the systems that monitor digital behaviour.
AEO Pugmill borrows it literally. AI crawlers and search bots are themselves watchers; they index, train on, and cite the web. The plugin watches them back: logging which bots visit, which content they read, and how their behaviour shifts over time. The network turns that data into a public record. We don't decide how AI systems should behave. We just watch the watchers, and publish what we see.
AEO Pugmill is built by Janzen Works. The plugin is free and available to download directly. The network intelligence dashboard is open to anyone at aeopugmill.com.
Feedback, bug reports, and data questions: michael@janzenworks.com