Signaling the Shift to Generative Engine Optimization (GEO)
Key findings — Fortune 500 adoption rates of robots.txt, JSON-LD, and llms.txt
As of March 2026, ProGEO.ai research found:
92.8% (n=464) of the Fortune 500 have implemented robots.txt.
Only 11% (n=55) of the Fortune 500 have named an AI user agent in robots.txt.
53.8% (n=269) of the Fortune 500 have implemented JSON-LD on their homepage.
Only 47.6% of those companies add interior page-specific JSON-LD (n=90 of 189 sampled).
Only 7.4% (n=37) of the Fortune 500 have implemented llms.txt.
Executive summary — From SEO to GEO
The tides of buyer behavior are changing. Generative AI (GenAI) platforms refer less traffic to websites than traditional search engines do, and search engine traffic itself is declining. “Zero-click” search behavior is reshaping how buyers find information.
According to Gartner®, “Chief Marketing Officers (CMOs) and their teams need to adjust their web content strategy to adapt to search engine’s evolving algorithms and appear in GenAI-powered search results.”
As of March 2026, ProGEO.ai research indicates that only 5.2% (n=26) of the Fortune 500 have implemented two technical protocols — JSON-LD and llms.txt — intended to optimize their website for GenAI-powered search platforms like ChatGPT, Claude, and Gemini.
This report presents original research across three signals — AI user agents named in robots.txt, JSON-LD, and llms.txt — analyzing adoption rates, implementation patterns, and strategic implications across all 500 companies.
The data reveals early adopters of generative engine optimization (GEO) — the practice of optimizing brand visibility for GenAI-powered search results.
Companies that establish brand visibility for AI systems now are positioning for a paradigm shift in how their buyers, analysts, and stakeholders find information.
Overview — Benchmarking generative engine optimization (GEO) maturity
As of March 2026, only 5.2% (n=26) of the Fortune 500 have implemented JSON-LD and llms.txt — two web crawling protocols closely associated with generative engine optimization (GEO) maturity.
The imperative is clear: maintain brand visibility as GenAI replaces search results.
The technical implications are complex. What does this mean, and why does it matter?
ProGEO.ai conducted original research to measure the Fortune 500 adoption rates across three signal categories — robots.txt, JSON-LD, and llms.txt — establishing a baseline for GEO maturity and enabling marketing leaders to benchmark their organizations against the results.
In the process of this analysis, a pattern emerged. The adoption rates map to Rogers’ diffusion of innovations — a framework for understanding the spread of technical adoption.
The Frequency of Generative Engine Optimization (GEO) — Fortune 500 Adoption Rates Mapped to Rogers’ Diffusion of Innovations Curve
The adoption rate of each signal tracks with the age of its underlying protocol; the older the standard or specification, the greater its adoption among the Fortune 500.
A brief history of robots.txt, JSON-LD, and llms.txt
robots.txt — The Robots Exclusion Protocol.
1994-1998: Robots.txt announced as an informal standard for search engines — becomes the de facto standard for Lycos, AltaVista, and Google.
2019: Google proposed formalizing the standard under the Internet Engineering Task Force.
2022: IETF published RFC 9309 — official standard.
JSON-LD — A structured data standard.
2011: W3C launched the JSON-LD Community Group.
2011: Google, Microsoft, and Yahoo! launched Schema.org — a centralized resource for documentation.
2014: JSON-LD 1.0 published as a W3C Recommendation.
2019: Google explicitly stated JSON-LD is its preferred format for structured data.
2020: JSON-LD 1.1 published as a W3C Recommendation.
llms.txt — A specification to serve content in Markdown for AI platforms.
2024: llms.txt specification published.
2024: At Anthropic’s request, Mintlify, a documentation platform, implemented llms.txt support.
2025: OpenAI and Google implemented llms.txt.
The pattern is consistent: platform endorsement drives adoption. For robots.txt and JSON-LD, the search giants (i.e., Google, Microsoft, and Yahoo!) drove adoption. For llms.txt, Mintlify’s support for the nascent specification — at the request of Anthropic — has been a pivotal moment.
The window for GEO is open for first movers to gain an advantage.
One dimension of that advantage requires technical implementation.
This report examines each signal in turn — robots.txt, JSON-LD, and llms.txt — what it does, how the Fortune 500 have implemented it, and what the adoption data reveals for enterprise brands evaluating their AI visibility.
Part I — robots.txt: The 30-year-old foundation for search crawlers wasn’t built for AI
Fortune 500 adoption rates for robots.txt —
92.8% (n=464) of the Fortune 500 have implemented robots.txt.
11% (n=55) of the Fortune 500 name at least one AI user agent in robots.txt.
76% (n=380) of the Fortune 500 include at least one Sitemap: directive in robots.txt.
What is robots.txt?
Robots.txt implements the Robots Exclusion Protocol — RFC 9309 — a standard that lets site operators [allow] or [disallow] web crawler access to their site. In principle, it serves as a digital doorman, but in practice it is a “Do Not Enter” sign — a social contract with no enforcement mechanism.
ProGEO.ai research finds that more than nine-in-ten (92.8%) Fortune 500 companies have implemented robots.txt, but only one-in-ten (11%) have named a single AI user agent in their robots.txt file. Robots.txt is nearly ubiquitous, but its application for managing AI user agents has yet to cross the chasm.
Sitemaps in robots.txt
ProGEO.ai research finds more than three-in-four (76%) Fortune 500 companies include at least one Sitemap: directive in their robots.txt. Sitemaps implement the URL inclusion protocol — providing crawlers with a structured list of URLs, their relative priority, and other metadata.
Robots.txt tells crawlers what they can access; sitemaps tell them where to find it. This baseline for information retrieval (IR) is leveraged by both search engines and GenAI systems.
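As an illustration, the two protocols commonly appear together in a single file. A minimal robots.txt (with a hypothetical domain and paths) that grants broad access, steers crawlers away from low-value pages, and points them to the sitemap might read:

```
# Allow all crawlers, but keep them out of internal search results
User-agent: *
Allow: /
Disallow: /search/

# Point crawlers to the structured list of URLs
Sitemap: https://www.example.com/sitemap.xml
```

The Disallow: /search/ rule is the crawl budget technique described above: it keeps crawlers focused on content pages rather than infinite query-string variations.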
The paradox of permissiveness
The default behavior of robots.txt is permissive. The absence of an explicit [allow] or [disallow] directive is treated as an implied [allow]. The absence of a robots.txt file is also treated as an implicit [allow]. Wildcard (*) rules typically [allow] all user agents and [disallow] specific directories — a crawl budget management technique — which helps crawlers focus on relevant content.
Naming a specific AI user agent in robots.txt requires an intentional decision. When a Fortune 500 company names an AI user agent, they are adopting a posture. The question is whether it is permissive or defensive.
The paradox of permissiveness is that the Fortune 500 companies that have not named an AI user agent are — by default — more accessible to AI crawlers than most of the 11% who have.
Training vs. search — Are Fortune 500 companies blocking AI?
Among the 55 Fortune 500 companies that have named an AI user agent, ProGEO.ai identified 270 total directives in robots.txt across 25 distinct AI user agents — split between 105 [allow], 116 [disallow], and 49 partial access directives. However, a distinct pattern emerges between training agents and search agents.
Directives for training crawlers — bots that collect content to build or fine-tune AI models — in robots.txt skew toward [disallow]. Directives for search crawlers — bots that retrieve content to generate responses — in robots.txt skew toward [allow].
Blocking a training agent may create gaps in how an AI model understands your brand. Blocking a search agent prevents that model from finding, retrieving, and citing your brand in generated responses.
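The training-versus-search split can be observed in how a single robots.txt file treats different named agents. The sketch below is a simplified classifier for illustration only — it is not the methodology used in this research, and it ignores some RFC 9309 grouping rules (such as blank-line group boundaries):

```python
def classify_agent(robots_txt: str, agent: str) -> str:
    """Label a named user agent's robots.txt posture as 'allow',
    'disallow', or 'partial'. Simplified heuristic: a full-site
    disallow -> 'disallow'; path-specific disallows -> 'partial';
    otherwise -> 'allow' (the default-open posture)."""
    active = False       # inside a group that names this agent?
    seen_rule = False    # has the current group emitted a rule yet?
    full_block = False
    path_block = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # strip comments
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if seen_rule:                     # a rule line ended the previous group
                active, seen_rule = False, False
            if value.lower() == agent.lower():
                active = True
        elif field in ("allow", "disallow"):
            seen_rule = True
            if active and field == "disallow":
                if value == "/":
                    full_block = True          # entire site blocked
                elif value:
                    path_block = True          # e.g. Disallow: /private/
    if full_block:
        return "partial" if path_block else "disallow"
    return "partial" if path_block else "allow"
```

An agent never named in the file falls through to "allow" — the default-open behavior described above as the paradox of permissiveness.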
AI user agent directives in Fortune 500 robots.txt files
| Bot Name | Allow | Disallow | Partial | Total |
|---|---|---|---|---|
| GPTBot | 10 | 14 | 8 | 32 |
| CCBot | 4 | 13 | 4 | 21 |
| ChatGPT-User | 11 | 3 | 7 | 21 |
| Google-Extended | 7 | 11 | 3 | 21 |
| ClaudeBot | 6 | 10 | 4 | 20 |
| PerplexityBot | 12 | 4 | 4 | 20 |
| Meta-ExternalAgent | 5 | 11 | 1 | 17 |
| OAI-SearchBot | 11 | 2 | 4 | 17 |
| Bytespider | 3 | 10 | 2 | 15 |
| Amazonbot | 5 | 7 | 1 | 13 |
| PetalBot | 2 | 9 | 0 | 11 |
| YouBot | 7 | 2 | 0 | 9 |
| anthropic-ai | 4 | 2 | 3 | 9 |
| Applebot-Extended | 3 | 5 | 0 | 8 |
| cohere-ai | 2 | 3 | 2 | 7 |
| Claude-SearchBot | 3 | 1 | 2 | 6 |
| Claude-Web | 3 | 2 | 1 | 6 |
| AI2Bot | 2 | 2 | 0 | 4 |
| Diffbot | 1 | 2 | 0 | 3 |
| Timpibot | 2 | 1 | 0 | 3 |
| GrokBot | 0 | 0 | 2 | 2 |
| ImagesiftBot | 0 | 1 | 1 | 2 |
| DeepSeekBot | 1 | 0 | 0 | 1 |
| Grok | 1 | 0 | 0 | 1 |
| VelenPublicWebCrawler | 0 | 1 | 0 | 1 |
| TOTAL | 105 | 116 | 49 | 270 |
GPTBot (i.e., OpenAI’s training crawler) is the most frequently named AI user agent (n=32), and its directives lean toward defensiveness. CCBot (i.e., Common Crawl), Google-Extended (i.e., Gemini), Meta-ExternalAgent, Bytespider (i.e., ByteDance), and PetalBot (i.e., Huawei) follow a similar pattern. These are the most restricted bots.
The inverse pattern emerges for search and retrieval agents. ChatGPT-User (i.e., OpenAI’s search agent), OAI-SearchBot, and PerplexityBot all lean toward permissiveness.
The Fortune 500 companies that have named AI user agents in robots.txt are generally choosing to remain open to AI-generated searches while restricting the use of their content for model training.
The limits of voluntary compliance — can you block AI user agents with robots.txt?
In December 2025, OpenAI announced that ChatGPT-User would no longer follow robots.txt directives for user-initiated browsing. Multiple reports suggest Bytespider (i.e., ByteDance) ignores robots.txt directives.
In response, some organizations have moved to enforce web crawling access at a different layer: web application firewalls (WAFs). Unlike robots.txt, which relies on voluntary compliance, WAFs can identify and block unwanted requests. (ProGEO.ai experienced this WAF-blocking while scanning the Fortune 500, which required a manual recovery pipeline to complete this research.)
WAF-based blocking has introduced its own complexity. Google uses the same user agent for all of its crawlers — from Search to Gemini. An organization that uses a WAF to block Googlebot in order to restrict Gemini also prevents Google from indexing its site for search. The separation of AI training (e.g., Google-Extended) from search indexing in robots.txt does not cleanly translate to WAF-based enforcement.
What this means for enterprise brands
92.8% of the Fortune 500 have implemented robots.txt, but only 11% have named an AI user agent — the delta between the two is the difference between a widespread standard, supported by search giants (i.e., Google, Microsoft, Yahoo!), and the early adopters of an emerging application with questionable efficacy.
For enterprise brands evaluating their AI visibility, robots.txt surfaces three priorities that require decisions:
Which AI user agents to name or whether to [allow] all with a (*) wildcard.
Whether to restrict training crawlers and permit search crawlers.
Whether the voluntary compliance of robots.txt provides sufficient control.
Part II — JSON-LD: Half of the Fortune 500 have structured data, but fewer are using it strategically
Fortune 500 Adoption Rates for JSON-LD —
53.8% (n=269) of the Fortune 500 have implemented JSON-LD on their homepage.
An average of 5.1 JSON-LD types per implementation.
Only 47.6% of the Fortune 500 that have implemented JSON-LD on their homepage add interior-page types (n=90 of 189 sampled).
What is JSON-LD?
JSON-LD (JavaScript Object Notation for Linked Data) is a W3C standard for encoding structured data on web pages. It provides explicit, machine-readable semantic signals — entity declarations that tell search engines and AI systems what is on the page and what it means.
Think of JSON-LD as the “Nutrition Facts” of your website — it declares standardized types (e.g., [Organization], [Article], [Person]).
Search engines use JSON-LD to generate rich results — the images, carousels, FAQs that appear in search. More importantly for generative engine optimization (GEO), JSON-LD informs knowledge graphs — the system that search engines and AI platforms use to map entity relationships.
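For example, a homepage [Organization] declaration is embedded in the page’s markup inside a script tag of type application/ld+json; the values below are hypothetical:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Corp",
  "url": "https://www.example.com",
  "logo": "https://www.example.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/example-corp",
    "https://en.wikipedia.org/wiki/Example_Corp"
  ]
}
```

The sameAs property links the entity to its profiles elsewhere on the web, which is how knowledge graphs reconcile an organization across sources.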
A partial example of Google’s “rich results” for Nvidia
How many Fortune 500 companies have implemented JSON-LD?
ProGEO.ai research finds that more than half (53.8%) of the Fortune 500 have implemented JSON-LD on their homepage, at an average of 5.1 types per implementation. The three most frequently used types are [Organization] (n=182), [WebSite] (n=147), and [SearchAction] (n=124).
These three types do important work for traditional search: [Organization] populates knowledge panels, [WebSite] enables sitelinks, and [SearchAction] powers the search box that appears within Google results.
These JSON-LD types are the structured data equivalent of having a robots.txt file — the baseline infrastructure that has been standard practice for years.
A more sophisticated strategy emerges on interior content pages.
To assess whether JSON-LD implementations extend beyond the homepage of these 269 companies, ProGEO.ai randomly sampled their interior content pages — 189 were successfully sampled (interior page scanning was blocked for 77 companies and inconclusive for 3).
JSON-LD implementation patterns among Fortune 500 companies
52.4% of the sampled implementations have JSON-LD only on their homepage or injected as a site-wide header — they do the same work on every page: declaring the [Organization], [WebSite], and [SearchAction]. This pattern is indicative of SEO or content management system (CMS) work that has not been extended to serve AI retrieval.
47.6% (n=90) of the sampled implementations are adding page-specific structured data to interior pages. The most common content-specific types are [Article] (n=146), [Person] (n=105), and [BreadcrumbList] (n=84). These types do the work that matters for GEO — building entity relationships for knowledge graphs and AI citation: they identify the author, mark the content as a distinct publishable unit, and establish its position in a site hierarchy.
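An interior-page implementation layers content-specific types on top of the site-wide baseline. A hypothetical [Article] with an embedded [Person] author might look like this:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example Article Title",
  "datePublished": "2026-01-15",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "jobTitle": "Director of Research"
  }
}
```

A [BreadcrumbList] on the same page would complete the pattern, anchoring the article’s position in the site hierarchy.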
What this means for enterprise brands
53.8% of the Fortune 500 have implemented JSON-LD on their homepages — the late majority of adoption. But this adoption rate conceals a maturity gap.
The relevant measure for generative engine optimization (GEO) is not whether a company has JSON-LD; it is whether that implementation extends to interior content pages. By that measure, fewer than half of the sampled implementations demonstrate that strategy — only about one-quarter of the Fortune 500.
For enterprise brands evaluating their structured data posture, the data surfaces two priorities:
Implement JSON-LD on your homepage to define company-specific schemas (e.g., [Organization]).
Extend JSON-LD beyond site-wide templates to include content-specific schemas (e.g., [Article], [Person], [BreadcrumbList]) on interior pages.
Part III — llms.txt: Early Adopters indicate intent for generative engine optimization (GEO)
Fortune 500 adoption rates for llms.txt —
7.4% (n=37) of the Fortune 500 have implemented llms.txt.
66.5% of llms.txt file content is prose (not just URL lists).
70.3% of the Fortune 500 that have implemented llms.txt have also implemented JSON-LD (n=26).
What is llms.txt?
llms.txt is a specification for curating and serving website content in Markdown (.md) — the format most efficiently processed by large language models (LLMs). The file uses an H1 header to declare the website name, H2 sections to list URLs, and Markdown to provide detailed context.
llms.txt is similar to robots.txt and sitemap.xml — a machine-readable file at the domain root. Sitemaps contain URLs and metadata for search engines; llms.txt contains URLs and curated content for AI systems. Sitemaps are maps. llms.txt is a guidebook.
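Per the llms.txt specification, the file opens with an H1 site name, an optional blockquote summary, and H2 sections of annotated links. A minimal, hypothetical example:

```markdown
# Example Corp

> Example Corp provides enterprise widgets. This file curates our most
> important pages for large language models.

## Products

- [Widget Platform](https://www.example.com/products/widgets.md): Overview of the flagship platform

## Company

- [About](https://www.example.com/about.md): History, leadership, and mission
```

The one-line annotations after each link are where the prose context lives — the detail that distinguishes a guidebook from a map.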
How many Fortune 500 companies have implemented llms.txt?
ProGEO.ai research finds that 7.4% (n=37) of the Fortune 500 have implemented llms.txt. For context: 92.8% of the Fortune 500 have implemented robots.txt and 53.8% have implemented JSON-LD. The Fortune 500 companies that have implemented llms.txt are the early adopters.
This research analyzes the llms.txt file structure, file size, and content composition across the 37 detected implementations. Companies were excluded from individual analysis for improper formatting (i.e., missing the required H1 header) or as statistical outliers identified through IQR analysis. Sample sizes vary as noted below.
How are early adopters using llms.txt?
ProGEO.ai research finds that approximately two-thirds (66.5%) of the typical llms.txt file is prose — this content provides GenAI platforms with context into what matters and why.
The typical llms.txt file structure and size
ProGEO.ai research finds a median of eight (8) headers in the typical llms.txt file (n=31) — typically one (1) required H1 header and six (6) H2 headers. The average file size is 6,721 characters (n=27), and the typical file contains a median of 31 URLs (n=29).
Outlier analysis revealed extreme variance. One company implemented 976 H1 headers, creating hierarchical ambiguity that undermines the specification’s purpose. One company published an llms.txt of 1.3 million characters — approximately 250,000 tokens, which exceeds the context window of some AI models.
Header counts in Fortune 500 llms.txt files (n=31)
| Level | Mean | Median | Mode | Companies Using |
|---|---|---|---|---|
| Total | 9 | 8 | 1 | 31/31 |
| H1 | 2 | 1 | 1 | 31/31 (100%) |
| H2 | 6.2 | 6 | — | 26/31 (84%) |
| H3 | 0.8 | 0 | 0 | 5/31 (16%) |
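Header counts like those in the table above can be derived with a simple Markdown parser. A minimal sketch in Python (a simplification — it does not exclude headers inside fenced code blocks, which a production analysis should handle):

```python
import re

def count_headers(llms_txt: str) -> dict:
    """Count ATX headers (H1-H3) in an llms.txt file by level."""
    counts = {"h1": 0, "h2": 0, "h3": 0}
    for line in llms_txt.splitlines():
        # An ATX header: 1-3 leading '#' followed by whitespace and text
        m = re.match(r"^(#{1,3})\s+\S", line)
        if m:
            counts[f"h{len(m.group(1))}"] += 1
    return counts

sample = "# Example Corp\n\n## Products\n- link\n\n## Company\n"
print(count_headers(sample))  # {'h1': 1, 'h2': 2, 'h3': 0}
```

Running this over each detected file yields the per-level distributions summarized above.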
Early adopters of llms.txt show multi-signal sophistication
70.3% (n=26) of the Fortune 500 companies that have implemented llms.txt have also implemented JSON-LD — these companies have invested in the structured data that fuels AI visibility for generative engine optimization.
Eight (8) Fortune 500 companies that have implemented llms.txt have also named an AI user agent in robots.txt. These companies have an overwhelmingly permissive posture toward AI — across 51 directives for AI user agents, 41 are [allow] and only one (1) is [disallow]. This permissive posture is the logical complement to publishing llms.txt.
Six (6) of those companies have also implemented JSON-LD on their homepage. These are the companies going all-in on AI visibility — they have implemented llms.txt, JSON-LD, and explicitly [allow] AI user agents in robots.txt:
Nvidia (nvidia.com)
Dell Technologies (dell.com)
Builders FirstSource (bldr.com)
Sonic Automotive (sonicautomotive.com)
FM (fmglobal.com)
Concentrix (concentrix.com)
NB: Concentrix explicitly disallows ClaudeBot from its entire site, but all other AI bots inherit partial permission from wildcard rules.
Taken together, just over 1% of the Fortune 500 (n=6) have implemented all three signals — llms.txt, JSON-LD, and AI directives in robots.txt.
Mixed messages for an emerging signal — do large language models (LLMs) support llms.txt?
Among the three signals measured in “Signaling the Shift to Generative Engine Optimization (GEO),” llms.txt is by far the most recent. robots.txt is more than 30 years old and has been an IETF standard since 2022. JSON-LD is more than 15 years old and has been a W3C standard since 2014. The llms.txt specification was published two years ago, in 2024 — it is not yet a standard.
After llms.txt launched in 2024, its first major milestone occurred when Mintlify — a documentation platform — announced support for llms.txt at the request of Anthropic.
However, as of March 2026, docs.anthropic.com/llms-full.txt returns a “page not found” result. On the other hand, as of March 2026, OpenAI is serving llms.txt.
Throughout 2025, John Mueller, a Search Advocate at Google, reiterated that “no AI system currently uses llms.txt.” However, as of March 2026, Google’s Gemini documentation serves an active llms.txt.
In February 2026, a 90-day OtterlyAI experiment found llms.txt provided no meaningful impact on AI crawler behavior.
The evidence for the efficacy of llms.txt is early and contested — exactly what is to be expected for a two-year-old specification. However, the adoption signal is clear: the largest brands in the United States have begun experimenting with GEO.
What this means for enterprise brands
Only 7.4% of the Fortune 500 have implemented llms.txt — these early adopters signal that the largest enterprises are beginning to recognize generative engine optimization (GEO) as a distinct discipline from search engine optimization (SEO).
The implementation data suggests that early adopters are approaching llms.txt files with sophistication — files are descriptive prose, not just URL dumps.
Companies with llms.txt are far more likely to have JSON-LD and a permissive posture toward AI user agents in robots.txt. The companies most deliberate about AI visibility are optimizing across multiple layers simultaneously.
For enterprise brands evaluating whether to implement llms.txt, the data surfaces three priorities:
Implement llms.txt — it is low-cost and low-risk.
Extend the impact of AI visibility with JSON-LD.
Audit robots.txt to ensure AI user agents have permission to access your website.
Ultimately, robots.txt, JSON-LD, and llms.txt are not the end of optimization; they are just the beginning. Effective GEO also requires content to fuel the engine.
Conclusion — From Structured Context to Content Strategy
This report has measured the delta between the widespread adoption of robots.txt and the early adopters of llms.txt, reflecting Rogers’ diffusion of innovations. The Fortune 500 companies experimenting with optimizing their websites for AI have initiated their journey toward generative engine optimization (GEO) maturity.
Likewise, the majority of Fortune 500 companies have implemented JSON-LD — a signal toward SEO maturity. However, the absence of JSON-LD from interior content pages demonstrates how the complexity of technical implementation can hinder progress.
Compared to the adoption rates of robots.txt and JSON-LD, the adoption rates of llms.txt and AI directives in robots.txt indicate that GEO is still an emerging practice compared to SEO.
The technical implementations measured in this report are just one dimension: they are infrastructure like plumbing. Another dimension is topical authority: the water these pipes carry.
AI systems cite content that is authoritative, evidence-based, and structured for extraction. Google’s E-E-A-T framework (i.e., experience, expertise, authority, and trust) describes the qualities that corporate communications and content marketing programs should provide to support SEO and GEO.
For the early adopters of GEO, the choice is not between infrastructure and content — they must do both: implement the structured data that provides clarity for AI systems and publish the kind of content that earns citations.
There are many best practices, but there is no one-size-fits-all approach to GEO. Organizations need to meet their customers where they are, which requires understanding their needs. These organizations and their content need to be adaptable, to flow and to fit into the shape of their container — just like water.
The tides of buyer behavior are changing.
“Be water, my friend.” — Bruce Lee
Methodology
In March 2026, ProGEO.ai scanned the entire Fortune 500 with a Python-based HTTP client (httpx). The scanner retrieved each company’s homepage, /robots.txt, and /llms.txt. Validation logic identified failed scans.
A Playwright headless browser rescanned websites that returned inconclusive results. ProGEO.ai conducted a manual review of the remaining websites to extract relevant signals.
All results were stored in an SQLite database, including the raw text of each scan and the method (automated or manual) attributed to each record.
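The validation logic is the research’s own; as an illustration only, a response classifier along these lines (the status codes and markers below are hypothetical, not the actual criteria) can separate successful scans from blocked or inconclusive ones:

```python
def classify_scan(status_code: int, body: str) -> str:
    """Classify an HTTP scan result as 'ok', 'blocked', or 'inconclusive'.
    Illustrative heuristics only, not the actual research criteria."""
    if status_code in (403, 429):
        return "blocked"              # likely a WAF or rate limiting
    if status_code == 200:
        lowered = body.lower()
        # Bot-detection challenges often return 200 with challenge markup
        if "captcha" in lowered or "access denied" in lowered:
            return "blocked"
        if not body.strip():
            return "inconclusive"     # empty body: rescan with a headless browser
        return "ok"
    if status_code == 404:
        return "ok"                   # a definitive "not implemented" answer
    return "inconclusive"
```

A 404 counts as a successful scan because it answers the research question definitively: the file is not implemented. "Blocked" and "inconclusive" results are the ones that feed the headless-browser and manual-review stages.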
Limitations
This study represents a point-in-time snapshot. Company policies may have changed since data collection. Adoption signals may be undercounted for companies protected by aggressive bot-detection infrastructure, despite manual review.
Gartner Press Release, “Marketing Leaders Must Adjust Web Content to Succeed with GenAI-Powered Search,” April 16, 2025.
GARTNER is a trademark of Gartner, Inc. and/or its affiliates.
Ready to see how AI views your brand?
Give us 30 minutes and ProGEO.ai will give you an initial assessment of your location in the AI landscape – and what it will take to own your narrative.