The AI crawler landscape is changing fast. AEO Pugmill exists to track it honestly and help WordPress sites prepare for it practically.
AEO Pugmill adds structured data and machine-readable endpoints to WordPress posts. Some outputs are served as separate URLs that bots can request independently. Others are embedded in the HTML of the page itself. The distinction matters for tracking, and for understanding the limits of what bot analytics can tell us.
Each separately served output is a distinct resource a bot can request. When a crawler fetches one, the plugin logs the bot name, the resource type, and the date. That per-resource granularity is what makes the network dashboard possible: it shows which bots are requesting which content formats.
A plain-text index of the site: title, description, and a list of posts with summaries and links to their Markdown versions. Follows the llms.txt specification. AI crawlers use it as a table of contents to decide which pages to fetch in full.
Dynamically generated by a WordPress rewrite rule; no static file is written to disk. WordPress intercepts requests to /llms.txt and renders the response in memory.
# AEO Pugmill
> https://aeopugmill.com
The WordPress plugin for Answer Engine Optimization — FAQPage schema, entity graphs, llms.txt, and AI bot analytics.
## Posts
- [How AI Crawlers Read Your Content](https://aeopugmill.com/how-ai-crawlers-work): What GPTBot, ClaudeBot, and PerplexityBot actually fetch, why structured data changes what they cite, and how to track it.
Markdown: https://aeopugmill.com/how-ai-crawlers-work/?aeopugmill_llm=1
- [What is llms.txt?](https://aeopugmill.com/what-is-llms-txt): The emerging open standard that gives AI crawlers a structured index of your site's content.
Markdown: https://aeopugmill.com/what-is-llms-txt/?aeopugmill_llm=1
## Pages
- [About](https://aeopugmill.com/about): The Pugmill network, what the plugin does, and how bot tracking works.
Markdown: https://aeopugmill.com/about/?aeopugmill_llm=1
Tracked as: llms_txt; each bot request is counted separately from HTML page visits.
A structured Markdown rendering of a single post. Includes metadata (publish date, modified date, featured image), the AEO summary, entity list, Q&A pairs, keywords, and the full post content converted to Markdown. Gives AI crawlers a clean, parse-ready version of the content without HTML markup or theme chrome.
Served by intercepting the standard post URL when the ?aeopugmill_llm=1 query parameter is present. The same permalink that normally returns the HTML page returns a Markdown document instead; no extra URL or file required.
# Circuit Breakers in Practice
URL: https://example.com/circuit-breakers
Published: 2026-01-15T10:30:00Z
Modified: 2026-03-10T14:22:15Z
## Summary
Circuit breaker patterns prevent cascading failures by wrapping
remote calls in a state machine that trips open after repeated errors.
**Keywords:** circuit breaker, microservices, fault tolerance
## Entities
- Martin Fowler (Person) — Software author and ThoughtWorks chief scientist
- Hystrix (Technology) — Netflix's circuit breaker library
## Q&A
**Q: When should a circuit breaker trip open?**
After a configurable threshold of consecutive failures within
a rolling time window.
## Content
The full post body in Markdown...
Tracked as: post_markdown
A Markdown overview of the site served at the home URL with the aeopugmill_llm=1 parameter. Lists the five most recent posts with summaries, plus links to the full content indexes at /llms.txt and /llms-full.txt.
Same query parameter mechanism as Post Markdown, but applied to the home URL (/?aeopugmill_llm=1). Returns a site-level Markdown overview rather than a single post.
Tracked as: site_summary
A standalone JSON-LD file containing the FAQPage schema, entity mentions, citations, and an associatedMedia link to the Markdown endpoint. Served only for posts that have AEO data. Gives bots direct access to the structured data without parsing the HTML page.
A dynamic endpoint registered via a WordPress rewrite rule matching /aeo/*.jsonld. The file does not exist on disk; WordPress intercepts the request and generates the JSON-LD response from the post's stored AEO metadata.
{
"@context": "https://schema.org",
"@graph": [
{
"@type": "FAQPage",
"mainEntity": [{
"@type": "Question",
"name": "When should a circuit breaker trip open?",
"acceptedAnswer": {
"@type": "Answer",
"text": "After a configurable threshold of consecutive failures..."
}
}]
},
{
"@type": "Article",
"headline": "Circuit Breakers in Practice",
"mentions": [{
"@type": "Person",
"name": "Martin Fowler",
"sameAs": "https://en.wikipedia.org/wiki/Martin_Fowler_(software_engineer)"
}],
"citation": [{
"@type": "WebPage",
"url": "https://martinfowler.com/bliki/CircuitBreaker.html",
"name": "CircuitBreaker — Martin Fowler"
}],
"associatedMedia": {
"@type": "MediaObject",
"encodingFormat": "text/markdown",
"contentUrl": "https://example.com/circuit-breakers/?aeopugmill_llm=1"
}
}
]
}
Tracked as: aeo_jsonld
A standard XML sitemap with one addition: each post entry includes an xhtml:link alternate pointing to its Markdown endpoint. Bots that understand alternate links can discover the structured version without a separate crawl of /llms.txt.
Extends WordPress's built-in sitemap provider via the wp_sitemaps_posts_entry filter. Each post entry gets an xhtml:link alternate pointing to its Markdown endpoint; no separate sitemap file is generated.
<url>
<loc>https://aeopugmill.com/how-ai-crawlers-work</loc>
<lastmod>2026-03-10</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
<xhtml:link rel="alternate" type="text/markdown"
href="https://aeopugmill.com/how-ai-crawlers-work/?aeopugmill_llm=1"/>
</url>
Tracked as: sitemap
The plugin appends a Sitemap directive and an LLMs-Txt directive to the WordPress-generated robots.txt. The LLMs-Txt line signals to AI crawlers that a structured content index is available.
Appended via WordPress's robots_txt filter hook; no file is written to disk. WordPress generates /robots.txt dynamically on each request, and the plugin adds its directives at that point.
Sitemap: https://aeopugmill.com/sitemap.xml
# AI content index
LLMs-Txt: https://aeopugmill.com/llms.txt
Tracked as: robots_txt
The standard WordPress RSS 2.0 feed, enriched with an xmlns:aeo namespace and per-item AEO elements: <aeo:summary>, <aeo:entity>, and <aeo:qa>. AI crawlers that consume RSS feeds receive the full AEO metadata (structured summaries, named entities, and Q&A pairs) alongside post content. Purely additive: existing feed elements including content:encoded are not modified. Can be disabled from the Compatibility tab if another plugin is already enriching the feed.
Hooks into WordPress's rss2_ns and rss2_item actions to add the xmlns:aeo namespace declaration and per-item AEO elements to the existing feed. The base RSS 2.0 feed is unchanged; AEO data is purely additive.
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:aeo="https://aeopugmill.com/ns/rss/1.0">
<channel>
<item>
<title>Circuit Breakers in Practice</title>
<link>https://example.com/circuit-breakers</link>
<content:encoded><![CDATA[...full post HTML...]]></content:encoded>
<aeo:summary>Circuit breaker patterns prevent cascading failures
by wrapping remote calls in a state machine that trips open
after repeated errors.</aeo:summary>
<aeo:entity type="Person" sameAs="https://en.wikipedia.org/wiki/Martin_Fowler_(software_engineer)">
Martin Fowler
</aeo:entity>
<aeo:qa>
<aeo:question>When should a circuit breaker trip open?</aeo:question>
<aeo:answer>After a configurable threshold of consecutive failures
within a rolling time window.</aeo:answer>
</aeo:qa>
</item>
</channel>
</rss>
Tracked as: rss_feed; RSS requests from bots are counted separately from HTML page visits.
These outputs are injected into the HTML <head> of each post. They are present when any bot (or person) loads the page. There is no separate URL to request; the data rides along with the HTML.
When a bot visits an HTML page, the plugin checks at that moment whether the post has AEO metadata stored. If it does, the visit is logged as aeo_post; if not, as html. Either way it is a single HTML page request, but the distinction matters: most WordPress sites have a mix of AEO-enriched and plain posts, so this split reveals which bots are landing on enriched content and which are only reaching plain pages. Over time, patterns emerge at the network level: AI answer engines tend to show higher aeo_post ratios than SEO crawlers, which suggests they are finding and returning to structured content rather than treating all HTML equally.
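That logging decision can be sketched in a few lines. This is an illustrative sketch, not the plugin's actual code; classify_visit and its arguments are hypothetical names:

```python
def classify_visit(resource_type: str, has_aeo_metadata: bool) -> str:
    """Return the resource type to log for a bot visit.

    Only HTML page visits are split: posts with stored AEO metadata
    log as "aeo_post", plain posts as "html". Every other endpoint
    (llms.txt, Markdown, JSON-LD, ...) logs its own type unchanged.
    """
    if resource_type == "html":
        return "aeo_post" if has_aeo_metadata else "html"
    return resource_type
```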
What the tracking cannot tell you is which specific embedded element a bot looked at. A visit to an AEO-enriched post could involve the FAQPage schema, the entity mentions, the citation list, or just the post text. The plugin records that enriched content was present; it cannot record what the bot did with it.
The data is embedded this way because that is where it belongs. FAQPage schema is most effective when it is part of the page's own <script type="application/ld+json"> block, where search engines and AI crawlers expect to find it. Separating it into standalone files would reduce its utility for traditional search while providing no clear benefit for AI crawlers, which already parse the full page.
Generated from the Q&A pairs stored in the post's AEO metadata. Each question becomes a Question node with an acceptedAnswer. This is the same schema format that Google uses for FAQ rich results; AI crawlers also parse it to extract question-answer pairs as citable facts.
Injected into the page <head> as a <script type="application/ld+json"> block via the wp_head action. Present on every page load; no separate URL or request needed.
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@graph": [{
"@type": "FAQPage",
"@id": "https://example.com/circuit-breakers/#faqpage",
"mainEntity": [{
"@type": "Question",
"name": "When should a circuit breaker trip open?",
"acceptedAnswer": {
"@type": "Answer",
"text": "After a configurable threshold of consecutive failures within a rolling time window."
}
}]
}]
}
</script>
Tracked as: html or aeo_post; not separately distinguishable from the page visit.
Each entity stored in the post's AEO metadata becomes a typed mentions entry in the Article JSON-LD node. The sameAs URL links to an authoritative reference (typically Wikipedia or an official site), giving AI systems a way to disambiguate the entity, for example "Martin Fowler the software author" versus any other Martin Fowler.
Merged into the same <script type="application/ld+json"> Article node as the FAQPage schema; a single wp_head block carries the Q&A, entity, and citation data together.
"mentions": [
{
"@type": "Person",
"name": "Martin Fowler",
"description": "Software author and ThoughtWorks chief scientist",
"sameAs": "https://en.wikipedia.org/wiki/Martin_Fowler_(software_engineer)"
},
{
"@type": "SoftwareApplication",
"name": "Hystrix",
"sameAs": "https://github.com/Netflix/Hystrix"
}
]
Tracked as: html or aeo_post
External links in the post content are extracted and added as citation entries in the Article JSON-LD node. Each citation includes the URL and the link's anchor text. This gives AI systems a structured list of the post's sources without parsing the HTML body.
Part of the same <script type="application/ld+json"> Article node injected via wp_head. Citations are extracted from post content links at render time and included in the unified JSON-LD block.
"citation": [
{
"@type": "WebPage",
"url": "https://martinfowler.com/bliki/CircuitBreaker.html",
"name": "CircuitBreaker — Martin Fowler"
},
{
"@type": "WebPage",
"url": "https://netflix.github.io/Hystrix/",
"name": "Hystrix Wiki"
}
]
Tracked as: html or aeo_post
The plugin injects description, Open Graph, and Twitter Card meta tags derived from the post's AEO summary (falling back to the WordPress excerpt if no summary exists). These are suppressed when an active SEO plugin is detected and the conflict toggle is enabled.
Injected via wp_head as standard <meta> tags. When a supported SEO plugin (Yoast, Rank Math, AIOSEO) is detected and the conflict toggle is enabled, the plugin suppresses these tags to avoid duplicates.
<meta name="description" content="Circuit breaker patterns prevent cascading failures by wrapping remote calls in a state machine.">
<meta property="og:title" content="Circuit Breakers in Practice">
<meta property="og:description" content="Circuit breaker patterns prevent cascading failures...">
<meta property="og:type" content="article">
<meta name="twitter:card" content="summary_large_image">
Tracked as: html or aeo_post
On every incoming request, the plugin checks the user-agent string against a list of 25 known bot signatures: AI answer engines (GPTBot, ClaudeBot, PerplexityBot, Google-Extended), training crawlers (CCBot, Bytespider, DeepSeekBot), search engines (Googlebot, Bingbot), and SEO tools (SemrushBot, AhrefsBot). Matching is case-insensitive substring comparison.
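The matching step reduces to a substring scan. A minimal sketch follows; the signature table here is a small illustrative subset of the plugin's 25, and match_bot is a hypothetical name:

```python
# Map of lowercase user-agent substrings to canonical bot names.
# Illustrative subset only; the plugin ships 25 signatures.
BOT_SIGNATURES = {
    "gptbot": "GPTBot",
    "claudebot": "ClaudeBot",
    "perplexitybot": "PerplexityBot",
    "ccbot": "CCBot",
    "googlebot": "Googlebot",
    "ahrefsbot": "AhrefsBot",
}


def match_bot(user_agent: str):
    """Case-insensitive substring match against the signature table.

    Returns the canonical bot name, or None when nothing matches.
    """
    ua = user_agent.lower()
    for signature, canonical in BOT_SIGNATURES.items():
        if signature in ua:
            return canonical
    return None
```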
When a match is found, the plugin records three things: the canonical bot name, the resource type requested (one of html, llms_txt, post_markdown, site_summary, sitemap, robots_txt, aeo_jsonld, rss_feed, or aeo_post), and the date. Counts are stored locally in a daily summary table; no per-request log is kept.
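The aggregation amounts to a counter keyed by date, bot, and resource type. A minimal in-memory sketch, assuming the plugin persists the same shape as a database table (class and method names are illustrative):

```python
from collections import Counter


class DailyBotSummary:
    """Count bot visits per (date, bot, resource_type) triple.

    Mirrors the daily summary table described in the text: only
    aggregate counts are kept, never a per-request log.
    """

    def __init__(self):
        self.counts = Counter()

    def record(self, date: str, bot: str, resource_type: str) -> None:
        """Increment the count for one bot request."""
        self.counts[(date, bot, resource_type)] += 1

    def total(self, date: str, bot: str, resource_type: str) -> int:
        """Return the stored count (0 if never seen)."""
        return self.counts[(date, bot, resource_type)]
```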
For content-bearing requests (HTML pages), the plugin also records content signals: word count bucket, content freshness, fact density, and URL depth. These are the metrics displayed in the Crawl Intelligence section of the network dashboard.
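Two of those signals are easy to illustrate. The bucket thresholds below are assumptions made for the sketch; the plugin's actual boundaries are not documented here:

```python
from urllib.parse import urlparse


def word_count_bucket(word_count: int) -> str:
    """Bucket a post's word count into coarse ranges.

    Thresholds (300, 1500) are illustrative assumptions only.
    """
    if word_count < 300:
        return "short"
    if word_count < 1500:
        return "medium"
    return "long"


def url_depth(url: str) -> int:
    """Count non-empty path segments: /blog/2026/post/ has depth 3."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])
```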
Network participation is opt-in. The plugin functions without contributing to the network. Enabling network intelligence sends a daily count summary: bot name, resource type, visit total, and content signal distributions. No URLs, no content, no user data.
The site identifier sent with each payload is a one-way SHA-256 hash of the site URL and an instance ID. It cannot be reversed to recover the domain. It exists to deduplicate contributions from the same site across days.
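A sketch of that identifier, assuming a simple concatenation before hashing (the exact input format the plugin uses is not specified here):

```python
import hashlib


def site_identifier(site_url: str, instance_id: str) -> str:
    """One-way SHA-256 digest of the site URL plus an instance ID.

    The digest is stable for the same inputs, so daily contributions
    from one site deduplicate, but it cannot be reversed to recover
    the domain. The "|" concatenation is an illustrative assumption.
    """
    material = f"{site_url}|{instance_id}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()
```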
When a post is published or updated, the plugin sends an IndexNow ping to notify search engines that the URL has changed. The ping includes a verification key stored as a static file at the site root. Pings are throttled to one burst per 30 minutes to avoid rate-limiting.
IndexNow is supported by Bing, Yandex, and other participating search engines. Google does not currently participate in the IndexNow protocol but receives a separate sitemap ping.
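The 30-minute throttle reduces to a timestamp check. A sketch with an injectable clock so the behaviour is testable; class and method names are illustrative, not the plugin's internals:

```python
import time


class PingThrottle:
    """Allow one ping burst per cooldown window (30 minutes here,
    matching the throttle described in the text)."""

    def __init__(self, cooldown_seconds: float = 30 * 60, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock          # injectable for testing
        self.last_burst = None      # time of last allowed burst

    def allow(self) -> bool:
        """Return True (and start a new window) if a burst may fire now."""
        now = self.clock()
        if self.last_burst is None or now - self.last_burst >= self.cooldown:
            self.last_burst = now
            return True
        return False
```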
The phrase comes from Juvenal, a Roman poet writing satirical verse in the late 1st and early 2nd centuries AD. It appears in Satire VI, his longest and most acerbic work, in a passage about the futility of keeping a wife faithful: you cannot trust the guards you hire to watch over her, because the guards themselves need watching. The original meaning was less about political theory and more about the impossibility of reliable oversight of any kind.
The phrase outlasted its domestic context entirely. By the time it entered political philosophy, it had become a foundational challenge to any system of power: who holds the overseers accountable? It now appears in arguments about police oversight, intelligence agencies, judicial review, and, increasingly, the systems that monitor digital behaviour.
AEO Pugmill borrows it literally. AI crawlers and search bots are themselves watchers; they index, train on, and cite the web. The plugin watches them back: logging which bots visit, which content they read, and how their behaviour shifts over time. The network turns that data into a public record. We don't decide how AI systems should behave. We just watch the watchers, and publish what we see.
AEO Pugmill is built by Janzen Works. The plugin is free and available to download directly. The network intelligence dashboard is open to anyone at aeopugmill.com.
Feedback, bug reports, and data questions: michael@janzenworks.com