How JSON-LD Helps AI Understand Your Website

JSON-LD and Schema.org help AI crawlers and Google understand your site. See how server-rendered structured data avoids JavaScript gaps.

OpenReplay Team

Jun 3, 2026 · 8 min read


JSON-LD is a <script>-based format for embedding Schema.org vocabulary as structured, machine-readable metadata in your HTML. It lets search engines and AI systems identify entities — articles, organizations, products, breadcrumbs, people — without inferring them from prose. For Google Search, correct JSON-LD makes pages eligible for specific search features and helps populate the Knowledge Graph. For AI crawlers like GPTBot, ClaudeBot, and PerplexityBot, there is one hard constraint frontend developers consistently miss: these crawlers do not execute JavaScript, so JSON-LD must be present in the server response to reach them at all.

This article covers what JSON-LD actually does, where it helps and where it doesn’t, three working examples you can adapt, the rendering gap that breaks client-injected markup for AI crawlers, and how to validate what you ship.

Key Takeaways

JSON-LD is Google’s recommended structured-data format and uses Schema.org vocabulary to describe entities and their relationships in machine-readable form.
AI crawlers including GPTBot, ClaudeBot, and PerplexityBot do not execute JavaScript — JSON-LD injected via useEffect, Google Tag Manager, or any client-side path is invisible to them.
AI Overviews and AI Mode do not require AI-specific markup; standard Google Search eligibility applies, and structured data helps interpretation rather than guaranteeing inclusion.
Google stopped surfacing FAQ rich results in May 2026, with Search Console and Rich Results Test support being removed in June 2026; FAQPage remains a valid Schema.org type but is no longer a rich-result growth tactic.
For long-term validation, validator.schema.org is the safer default — it checks the full Schema.org vocabulary, while the Google Rich Results Test only covers Google-supported features.

What JSON-LD is and why it exists

JSON-LD (JavaScript Object Notation for Linked Data) is a W3C-standardized syntax for expressing Linked Data in JSON. On the web, it’s almost always used to embed Schema.org vocabulary inside a <script type="application/ld+json"> tag in the document. The data describes the page’s entities — what it’s about, who wrote it, what organization published it, how it relates to other pages — in a form a parser can consume without reading prose.

The “why” is interpretive cost. Without structured data, a crawler has to infer that “Apple” on a page refers to the company, not the fruit; that the byline “Jane Chen” is the author and not a subject; that a price in the markup belongs to the product on the page and not to a related item in a sidebar. Schema.org gives those facts explicit types and properties. JSON-LD delivers them in a block that’s decoupled from the visible DOM, which is why Google recommends it over Microdata and RDFa — you can generate and update it without touching layout.

A few claims worth being precise about: JSON-LD is not a confirmed ranking factor in Google’s documented ranking systems. It does not guarantee rich results — eligibility is necessary but not sufficient. It does not guarantee inclusion in AI Overviews or AI Mode. And AI systems do not require JSON-LD; they can read prose. What JSON-LD does is reduce the inferential work, which makes correct interpretation more likely and ambiguous content less likely to be misclassified.

How JSON-LD fits AI Overviews and AI Mode

AI Overviews and AI Mode operate on top of Google’s existing Search index. Google’s published guidance is that there is no AI-specific markup — pages become eligible for AI features through the same content and structured-data signals that govern regular Search. If your page is correctly indexed, structured, and matches the user’s intent, the structured data you ship for Search is the same data those features draw on.

The honest framing: structured data helps AI systems interpret your content correctly. It does not buy inclusion in generated answers. Treat it as a way to remove ambiguity, not as a growth lever.

The AI crawler rendering gap

This is the single highest-leverage technical point for frontend developers, and it’s missing from most JSON-LD writeups: AI crawlers should generally be treated as non-rendering crawlers, so structured data should be present in the initial HTML response.

OpenAI’s GPTBot fetches HTML and does not render client-side JavaScript.
Anthropic’s ClaudeBot operates as a standard HTTP crawler without JS execution.
PerplexityBot likewise reads raw HTML.

Googlebot is the exception: it generally renders JavaScript on a deferred pass, so client-injected JSON-LD will eventually be processed for Search. AI crawlers have no equivalent rendering pass. If your JSON-LD only exists after hydration, it is invisible to exactly the crawlers this section is about.

A common failure mode in production frontend apps: a developer adds JSON-LD inside a useEffect hook or pushes it through Google Tag Manager. The block appears in the rendered DOM, the Rich Results Test (which renders JavaScript) reports it as valid, and the team assumes the job is done. Meanwhile, every non-rendering crawler receives an HTML response with no structured data at all.

The fix is to put the <script type="application/ld+json"> block in the initial server response.

Server-rendering JSON-LD by framework

Framework	Where to put JSON-LD
Next.js App Router	Inline `<script type="application/ld+json">` rendered in a Server Component such as `page.tsx` or `layout.tsx`
Next.js Pages Router	Inline `<script type="application/ld+json">` inside `<Head>` from `next/head`, built from props (for example from `getStaticProps` or `getServerSideProps`), not inside `useEffect`
Nuxt 3	`useHead` with a `script` entry of type `application/ld+json`, called in component setup so it is included in the SSR output
Astro	Inline `<script type="application/ld+json">` directly in the `.astro` template (static by default, no JS execution required)
SvelteKit	`<svelte:head>` block containing the script, rendered server-side

The principle is identical across stacks: the JSON-LD block must exist in the bytes that come back from the server, not in bytes added by the browser after first paint.
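A framework-agnostic sketch of that server-side step, assuming your server assembles HTML as a string. `jsonLdScriptTag` and `renderHead` are hypothetical helper names, not a framework API. The `<` escaping follows the approach Next.js documents for JSON-LD: without it, a string value containing `</script>` would close the tag early, and JSON parsers decode `\u003c` back to `<`.

```typescript
// Any JSON-serializable Schema.org object.
type JsonLd = Record<string, unknown>;

// Hypothetical helper: serializes a Schema.org object into a <script> tag
// string that becomes part of the HTML the server sends.
function jsonLdScriptTag(data: JsonLd): string {
  // JSON.stringify does not escape "<", so a value like "</script>" could
  // terminate the tag. "\u003c" is a valid JSON escape for "<".
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

// Hypothetical template fragment: the tags are in the response body itself,
// so a crawler that never runs JavaScript still receives them.
function renderHead(schemas: JsonLd[]): string {
  return ["<head>", ...schemas.map(jsonLdScriptTag), "</head>"].join("\n");
}
```

The frameworks in the table wrap the same idea; the escaping step still applies whenever you hand a serialized JSON string to a Server Component, `useHead`, or `<svelte:head>`.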

Three working JSON-LD examples

The examples below use Schema.org vocabulary as of version 30.0, released March 19, 2026.

BlogPosting (an Article subtype)

In the Schema.org hierarchy, BlogPosting is a subtype of SocialMediaPosting, which is a subtype of Article, which is itself a subtype of CreativeWork. Declaring "@type": "BlogPosting" therefore inherits all Article properties, so there is no need to declare both.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "@id": "https://openreplay.com/blog/json-ld-ai-search#article",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://openreplay.com/blog/json-ld-ai-search"
  },
  "headline": "How JSON-LD Helps AI Understand Your Website",
  "description": "A frontend-focused guide to JSON-LD, Schema.org, and the server-rendering requirement for AI crawlers.",
  "image": "https://openreplay.com/images/blog/json-ld-ai-search.png",
  "datePublished": "2026-05-20T09:00:00-04:00",
  "dateModified": "2026-05-20T09:00:00-04:00",
  "author": {
    "@type": "Organization",
    "name": "OpenReplay Team",
    "url": "https://openreplay.com"
  },
  "publisher": {
    "@id": "https://openreplay.com/#organization"
  }
}
</script>

Use this on long-form article pages. mainEntityOfPage ties the article to its canonical URL; dateModified matters because Google uses it when freshness is a query signal.
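In practice, this block is usually generated from the same article record that renders the visible page, so the two cannot drift apart. A minimal sketch, assuming a hypothetical `ArticleMeta` shape from your CMS; the field names are illustrative, and the `#article` / `#organization` `@id` conventions follow the examples in this article.

```typescript
// Hypothetical shape of an article record from a CMS.
interface ArticleMeta {
  url: string; // canonical URL of the page
  title: string; // same string as the visible <h1>
  description: string;
  imageUrl: string;
  publishedAt: string; // ISO 8601, e.g. "2026-05-20T09:00:00-04:00"
  modifiedAt?: string; // set only when content substantively changes
  authorName: string;
  authorUrl: string;
}

// Assumed site-wide Organization @id, matching the Organization example.
const ORGANIZATION_ID = "https://openreplay.com/#organization";

function buildBlogPosting(meta: ArticleMeta): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "@id": `${meta.url}#article`,
    mainEntityOfPage: { "@type": "WebPage", "@id": meta.url },
    headline: meta.title,
    description: meta.description,
    image: meta.imageUrl,
    datePublished: meta.publishedAt,
    // Fall back to the publish date when the article has never been updated.
    dateModified: meta.modifiedAt ?? meta.publishedAt,
    author: { "@type": "Organization", name: meta.authorName, url: meta.authorUrl },
    // Reference, not a copy: the Organization is defined once elsewhere.
    publisher: { "@id": ORGANIZATION_ID },
  };
}
```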

Organization

The Organization type describes the publishing entity. Use sameAs to point at canonical external profiles — this is how systems disambiguate your brand from others with similar names.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://openreplay.com/#organization",
  "name": "OpenReplay",
  "url": "https://openreplay.com",
  "logo": {
    "@type": "ImageObject",
    "url": "https://openreplay.com/images/logo.png",
    "width": 512,
    "height": 512
  },
  "sameAs": [
    "https://github.com/openreplay",
    "https://www.linkedin.com/company/openreplay",
    "https://x.com/OpenReplayHQ"
  ]
}
</script>

Define this once and include it site-wide, typically in the document head of every page. Reference its @id from other schema blocks (as BlogPosting.publisher does above) so a single canonical Organization definition is used everywhere instead of being duplicated.
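Reusing an @id only helps if the defining block is actually present on the page, so it can be worth checking for references that point at nothing. A minimal sketch, assuming your blocks are available as parsed JSON; `findDanglingIds` is a hypothetical name, and it treats a node containing only `@id` as a reference and any node with other properties as a definition, which simplifies full JSON-LD processing.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Walks every block and sorts nodes carrying an "@id" into definitions
// (nodes with other properties) and bare references ({"@id": ...} only).
// Returns referenced ids that no block on the page defines.
function findDanglingIds(blocks: Json[]): string[] {
  const defined = new Set<string>();
  const referenced = new Set<string>();

  const visit = (value: Json): void => {
    if (Array.isArray(value)) {
      value.forEach(visit);
      return;
    }
    if (value === null || typeof value !== "object") return;
    const id = value["@id"];
    if (typeof id === "string") {
      const otherKeys = Object.keys(value).filter((k) => k !== "@id" && k !== "@context");
      (otherKeys.length === 0 ? referenced : defined).add(id);
    }
    Object.values(value).forEach(visit);
  };

  blocks.forEach(visit);
  return Array.from(referenced).filter((id) => !defined.has(id));
}
```

On a page that ships BlogPosting without the Organization block, this returns the `#organization` id, which is the signal to add the site-wide block to that template.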

BreadcrumbList

BreadcrumbList describes the page’s place in the site hierarchy. Google supports breadcrumb rich results, and the same data helps AI systems understand site structure.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://openreplay.com/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Blog",
      "item": "https://openreplay.com/blog"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "How JSON-LD Helps AI Understand Your Website"
    }
  ]
}
</script>

The last item omits item because it’s the current page. position is 1-indexed.
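Both rules (1-indexed positions, no item on the current page) are easy to get wrong by hand, so breadcrumbs are often generated from the route trail. A sketch assuming a hypothetical `Crumb` list ordered from the site root to the current page:

```typescript
// Hypothetical input: the trail from the site root to the current page.
interface Crumb {
  name: string;
  url: string;
}

function buildBreadcrumbList(trail: Crumb[]): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: trail.map((crumb, index) => {
      const isLast = index === trail.length - 1;
      return {
        "@type": "ListItem",
        position: index + 1, // positions start at 1, not 0
        name: crumb.name,
        // The current page (last crumb) omits "item", as in the example above.
        ...(isLast ? {} : { item: crumb.url }),
      };
    }),
  };
}
```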

Keep structured data aligned with visible content

Google’s structured-data guidelines are explicit: content in JSON-LD must match what’s on the page. Marking up reviews that don’t exist on the page, prices that don’t appear, or author names that contradict visible bylines is a manual-action risk. The structured data is supposed to describe the page, not enhance it with claims that aren’t substantiated in the DOM.

This also matters for AI systems. If your visible prose says one thing and your JSON-LD says another, you’ve created the exact ambiguity structured data is supposed to resolve.
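A lightweight guard against this kind of drift is to compare key JSON-LD fields with the values your template actually renders, for example in a unit test. A sketch with hypothetical names; which fields to compare is your choice, and it only checks exact string equality after collapsing whitespace.

```typescript
// Values the page visibly renders (hypothetical: taken from the same props
// the template uses for the <h1> and byline).
interface VisibleContent {
  headline: string;
  authorName: string;
}

// Returns human-readable mismatches; an empty array means the fields agree.
function findMismatches(
  schema: { headline?: string; author?: { name?: string } },
  visible: VisibleContent,
): string[] {
  const normalize = (s: string | undefined): string => (s ?? "").replace(/\s+/g, " ").trim();
  const problems: string[] = [];
  if (normalize(schema.headline) !== normalize(visible.headline)) {
    problems.push(`headline: "${schema.headline}" vs visible "${visible.headline}"`);
  }
  if (normalize(schema.author?.name) !== normalize(visible.authorName)) {
    problems.push(`author: "${schema.author?.name}" vs visible "${visible.authorName}"`);
  }
  return problems;
}
```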

Validate before you ship

Two tools cover the workflow:

Tool	Checks	Best for
Google Rich Results Test	Eligibility for Google-supported rich result types	Pre-deployment check for Article, Product, Breadcrumb, etc.
Schema Markup Validator	Conformance to the full Schema.org vocabulary	General validation; any type Google doesn’t surface as a rich result

The Rich Results Test renders the page, so it sees client-injected JSON-LD that AI crawlers won't. To confirm what non-rendering crawlers see, fetch the raw HTML with curl, which never executes JavaScript: curl -s -A "GPTBot" https://example.com/your-page | grep application/ld+json. The -A flag sets the User-Agent header, which matters if your server or CDN treats bots differently. If the block isn't in that raw response, AI crawlers won't see it.
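The same check can run in a test suite instead of a shell. A sketch with a hypothetical `extractJsonLd` helper that pulls every JSON-LD block out of a raw HTML string and reports blocks that fail to parse. A regex is a shortcut, not a full HTML parser: it assumes the type attribute is quoted and that payloads escape `</script>`.

```typescript
interface ExtractResult {
  blocks: unknown[]; // successfully parsed JSON-LD payloads
  errors: string[]; // parse errors for payloads that are not valid JSON
}

// Finds every <script type="application/ld+json"> block in raw HTML and parses it.
function extractJsonLd(html: string): ExtractResult {
  const pattern =
    /<script\b[^>]*\btype\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const result: ExtractResult = { blocks: [], errors: [] };
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      result.blocks.push(JSON.parse(match[1]));
    } catch (err) {
      // A block that fails to parse is useless to crawlers, so report it.
      result.errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  return result;
}
```

In Node 18 or later you could feed it the body of a plain request, for example `await (await fetch(url, { headers: { "User-Agent": "GPTBot" } })).text()`; like curl, `fetch` never executes the page's JavaScript.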

Google Search Console’s Enhancements section reports parsing errors and eligibility for the structured-data types Google tracks. Use it for ongoing monitoring, not as a substitute for pre-deployment validation.

A note on FAQPage

FAQPage is still a valid Schema.org type and Google still uses it for content understanding. But Google notes in its FAQPage documentation that FAQ-related Search features are being removed: FAQ rich results stopped appearing in Search in May 2026; Search Console reporting and Rich Results Test support are being removed in June 2026; Search Console API support follows in August 2026. If you’re choosing schema to implement now for visible rich-result lift, FAQPage isn’t the answer. Implement it if you have a genuine FAQ page and want the semantics on record — not as a SERP real-estate play.

This is also why validator.schema.org is the safer default for long-term validation work: it doesn’t depend on which features Google chooses to surface as rich results.

What to ship first

If you’re starting from zero, the highest-value sequence is: a single Organization block site-wide, BreadcrumbList on every page that isn’t the homepage, and Article/BlogPosting on editorial content. Server-render all of it. Validate with both tools. Then check the raw HTML response with a non-rendering user agent to confirm AI crawlers actually receive what you wrote.
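That starting sequence can be encoded as a small per-page rule. A sketch assuming a hypothetical `PageInfo` descriptor from your router or CMS; it only decides which types a page should emit, not how to build them.

```typescript
// Hypothetical page descriptor; adapt to your router or CMS.
interface PageInfo {
  path: string; // e.g. "/blog/json-ld-ai-search"
  isEditorial: boolean; // article or blog post
}

type SchemaKind = "Organization" | "BreadcrumbList" | "BlogPosting";

function schemaKindsFor(page: PageInfo): SchemaKind[] {
  const kinds: SchemaKind[] = ["Organization"]; // site-wide
  if (page.path !== "/") kinds.push("BreadcrumbList"); // every non-homepage page
  if (page.isEditorial) kinds.push("BlogPosting"); // editorial content only
  return kinds;
}
```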

Structured data won’t move rankings on its own and won’t talk your way into AI Overviews. What it does — reliably, when shipped correctly — is remove the interpretive guesswork that causes search engines and AI systems to get your content wrong. For frontend teams, the work that matters most is not which schema types to add but where in the rendering pipeline they live. Put the JSON-LD in the server response, and the rest is content.

FAQs

Should I use JSON-LD, Microdata, or RDFa for structured data?

Use JSON-LD. Google explicitly recommends it over Microdata and RDFa because it lives in a single script block decoupled from the visible DOM, which means you can generate and update structured data without touching layout markup. All three formats can express Schema.org vocabulary and are parsed by Google, but JSON-LD is easier to maintain, easier to server-render, and the format Google's own examples use throughout its structured-data documentation.

Can I put multiple JSON-LD blocks on the same page?

Yes. Google supports multiple separate script tags on a single page, which is the standard pattern for combining types like Organization, BreadcrumbList, and BlogPosting. The alternative is a single script containing a graph array under the @graph key, where each entity gets its own @id. Both approaches are valid; the @graph pattern is cleaner when entities reference each other via @id, since it keeps a single canonical definition per entity.
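A sketch of the @graph variant, assuming each entity is first built as a standalone object with its own @context, as in the three examples above. `toGraph` is a hypothetical helper name.

```typescript
type SchemaNode = Record<string, unknown>;

// Wraps several entities in one JSON-LD document under "@graph".
// The top-level "@context" applies to every node in the graph, so any
// per-node "@context" is removed from the copies (inputs are not mutated).
function toGraph(nodes: SchemaNode[]): SchemaNode {
  const graph = nodes.map((node) => {
    const copy: SchemaNode = { ...node };
    delete copy["@context"];
    return copy;
  });
  return { "@context": "https://schema.org", "@graph": graph };
}
```

The result is serialized into a single `<script type="application/ld+json">` tag, and nodes keep pointing at each other through their `@id` values.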

Does JSON-LD work for single-page applications that fetch content client-side?

Only if the JSON-LD is in the initial server response. SPAs that inject structured data after hydration are invisible to non-rendering crawlers like GPTBot, ClaudeBot, and PerplexityBot. The fix is server-side rendering or static generation for pages that need structured data — using Next.js, Nuxt, Astro, or SvelteKit in their SSR or SSG modes — so the script tag exists in the HTML bytes before any JavaScript executes.

How often should I update dateModified in BlogPosting markup?

Update dateModified only when the article's substantive content changes — corrections, new sections, updated facts, refreshed examples. Do not bump it on every deploy or for cosmetic edits like typo fixes or styling tweaks. Google uses dateModified as a freshness signal for time-sensitive queries, and inflating it without real content changes contradicts the rule that structured data must match visible content, which is a manual-action risk.
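One way to keep that rule out of individual judgment calls is to record what kind of change each edit was and derive dateModified from it. A sketch; the `ChangeKind` categories are hypothetical labels an editor or CI step would supply, mirroring the examples in the answer above.

```typescript
// Hypothetical categories recorded for each change to an article.
type ChangeKind =
  | "correction"
  | "new-section"
  | "updated-facts"
  | "refreshed-examples"
  | "typo"
  | "styling"
  | "redeploy";

const SUBSTANTIVE: ReadonlySet<ChangeKind> = new Set<ChangeKind>([
  "correction",
  "new-section",
  "updated-facts",
  "refreshed-examples",
]);

// Returns the dateModified to publish: bumped to `now` only when at least
// one change is substantive; otherwise the previous value is kept.
function nextDateModified(previous: string, changes: ChangeKind[], now: Date): string {
  return changes.some((c) => SUBSTANTIVE.has(c)) ? now.toISOString() : previous;
}
```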
