How JSON-LD Helps AI Understand Your Website

JSON-LD is a <script>-based format for embedding Schema.org vocabulary as structured, machine-readable metadata in your HTML. It lets search engines and AI systems identify entities — articles, organizations, products, breadcrumbs, people — without inferring them from prose. For Google Search, correct JSON-LD makes pages eligible for specific search features and helps populate the Knowledge Graph. For AI crawlers like GPTBot, ClaudeBot, and PerplexityBot, there is one hard constraint frontend developers consistently miss: these crawlers do not execute JavaScript, so JSON-LD must be present in the server response to reach them at all.

This article covers what JSON-LD actually does, where it helps and where it doesn’t, three working examples you can adapt, the rendering gap that breaks client-injected markup for AI crawlers, and how to validate what you ship.

Key Takeaways

  • JSON-LD is Google’s recommended structured-data format and uses Schema.org vocabulary to describe entities and their relationships in machine-readable form.
  • AI crawlers including GPTBot, ClaudeBot, and PerplexityBot do not execute JavaScript — JSON-LD injected via useEffect, Google Tag Manager, or any client-side path is invisible to them.
  • AI Overviews and AI Mode do not require AI-specific markup; standard Google Search eligibility applies, and structured data helps interpretation rather than guaranteeing inclusion.
  • Google stopped surfacing FAQ rich results in May 2026, with Search Console and Rich Results Test support being removed in June 2026; FAQPage remains a valid Schema.org type but is no longer a rich-result growth tactic.
  • For long-term validation, validator.schema.org is the safer default — it checks the full Schema.org vocabulary, while the Google Rich Results Test only covers Google-supported features.

What JSON-LD is and why it exists

JSON-LD (JavaScript Object Notation for Linked Data) is a W3C-standardized syntax for expressing Linked Data in JSON. On the web, it’s almost always used to embed Schema.org vocabulary inside a <script type="application/ld+json"> tag in the document. The data describes the page’s entities — what it’s about, who wrote it, what organization published it, how it relates to other pages — in a form a parser can consume without reading prose.

The “why” is interpretive cost. Without structured data, a crawler has to infer that “Apple” on a page refers to the company, not the fruit; that the byline “Jane Chen” is the author and not a subject; that a price in the markup belongs to the product on the page and not to a related item in a sidebar. Schema.org gives those facts explicit types and properties. JSON-LD delivers them in a block that’s decoupled from the visible DOM, which is why Google recommends it over Microdata and RDFa — you can generate and update it without touching layout.

A few claims worth being precise about: JSON-LD is not a confirmed ranking factor in Google’s documented ranking systems. It does not guarantee rich results — eligibility is necessary but not sufficient. It does not guarantee inclusion in AI Overviews or AI Mode. And AI systems do not require JSON-LD; they can read prose. What JSON-LD does is reduce the inferential work, which makes correct interpretation more likely and ambiguous content less likely to be misclassified.

How JSON-LD fits AI Overviews and AI Mode

AI Overviews and AI Mode operate on top of Google’s existing Search index. Google’s published guidance is that there is no AI-specific markup — pages become eligible for AI features through the same content and structured-data signals that govern regular Search. If your page is correctly indexed, structured, and matches the user’s intent, the structured data you ship for Search is the same data those features draw on.

The honest framing: structured data helps AI systems interpret your content correctly. It does not buy inclusion in generated answers. Treat it as a way to remove ambiguity, not as a growth lever.

The AI crawler rendering gap

This is the single highest-leverage technical point for frontend developers, and it’s missing from most JSON-LD writeups: AI crawlers should generally be treated as non-rendering crawlers, so structured data should be present in the initial HTML response.

Googlebot is the exception: it generally renders JavaScript on a deferred pass, so client-injected JSON-LD will eventually be processed for Search. AI crawlers will not catch up. If your JSON-LD only exists after hydration, it’s invisible to the systems most readers are now asking about.

A common failure mode in production frontend apps: a developer adds JSON-LD inside a useEffect hook, or pushes it via Google Tag Manager. The block appears in the rendered DOM, the Rich Results Test (which renders JavaScript) shows it as valid, and the team assumes the job is done. Meanwhile, every non-rendering crawler receives an HTML response with no structured data at all.

The fix is to put the <script type="application/ld+json"> block in the initial server response.

Server-rendering JSON-LD by framework

| Framework | Where to put JSON-LD |
| --- | --- |
| Next.js App Router | Inline <script type="application/ld+json"> in a Server Component (a page or layout) |
| Next.js Pages Router | Inline <script> inside <Head> from next/head, on pages rendered through getServerSideProps or getStaticProps |
| Nuxt 3 | useHead with a script entry, called in a server context |
| Astro | Inline <script type="application/ld+json"> directly in the .astro template (static by default, no JS execution required) |
| SvelteKit | <svelte:head> block containing the script, rendered server-side |

The principle is identical across stacks: the JSON-LD block must exist in the bytes that come back from the server, not in bytes added by the browser after first paint.
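
As a framework-agnostic illustration of that principle, the sketch below serializes a JSON-LD object into a script tag string on the server. The helper name jsonLdScriptTag is hypothetical and not part of any framework API. The one non-obvious step is escaping "<", so that a string value containing "</script>" cannot close the tag early.

```typescript
// Illustrative helper; the name jsonLdScriptTag is hypothetical, not a framework API.
type JsonLdObject = { [key: string]: unknown };

function jsonLdScriptTag(data: JsonLdObject): string {
  // Escape "<" so a string value containing "</script>" cannot end the tag early.
  // "\u003c" is a valid JSON escape, so parsers still read the original value.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

// Example: the tag a server template would write into the document head.
const organizationTag = jsonLdScriptTag({
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://openreplay.com/#organization",
  name: "OpenReplay",
  url: "https://openreplay.com",
});
```

Whatever the framework, the returned string has to be written into the HTML during the server render (or static build), not appended by client code after hydration.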

Three working JSON-LD examples

The examples below use Schema.org vocabulary as of version 30.0, released March 19, 2026.

BlogPosting (an Article subtype)

In the Schema.org hierarchy, BlogPosting descends from Article (by way of SocialMediaPosting), and Article is itself a subtype of CreativeWork. Declaring "@type": "BlogPosting" therefore inherits all Article properties, so there is no need to declare both.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "@id": "https://openreplay.com/blog/json-ld-ai-search#article",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://openreplay.com/blog/json-ld-ai-search"
  },
  "headline": "How JSON-LD Helps AI Understand Your Website",
  "description": "A frontend-focused guide to JSON-LD, Schema.org, and the server-rendering requirement for AI crawlers.",
  "image": "https://openreplay.com/images/blog/json-ld-ai-search.png",
  "datePublished": "2026-05-20T09:00:00-04:00",
  "dateModified": "2026-05-20T09:00:00-04:00",
  "author": {
    "@type": "Organization",
    "name": "OpenReplay Team",
    "url": "https://openreplay.com"
  },
  "publisher": {
    "@id": "https://openreplay.com/#organization"
  }
}
</script>

Use this on long-form article pages. mainEntityOfPage ties the article to its canonical URL; dateModified matters because Google uses it when freshness is a query signal.

Organization

The Organization type describes the publishing entity. Use sameAs to point at canonical external profiles — this is how systems disambiguate your brand from others with similar names.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://openreplay.com/#organization",
  "name": "OpenReplay",
  "url": "https://openreplay.com",
  "logo": {
    "@type": "ImageObject",
    "url": "https://openreplay.com/images/logo.png",
    "width": 512,
    "height": 512
  },
  "sameAs": [
    "https://github.com/openreplay",
    "https://www.linkedin.com/company/openreplay",
    "https://x.com/OpenReplayHQ"
  ]
}
</script>

Ship this once, site-wide — typically in the document head of every page. Reuse the @id from other schema blocks (as BlogPosting.publisher does above) so a single canonical Organization definition is referenced everywhere instead of duplicated.
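
Referencing by @id only works if the referenced definition is actually present on the page. The following is a minimal sketch of a build-time check, under the assumption that a node whose only key is "@id" is a pointer and any richer node is a definition. The function name findDanglingIds and the LdNode type are illustrative.

```typescript
// Illustrative types and names; not part of any library.
type LdNode = { [key: string]: unknown };

function isPlainObject(value: unknown): value is LdNode {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns every @id that is referenced somewhere but never defined.
function findDanglingIds(blocks: LdNode[]): string[] {
  const defined = new Set<string>();
  const referenced = new Set<string>();

  const visit = (value: unknown): void => {
    if (Array.isArray(value)) {
      value.forEach(visit);
      return;
    }
    if (!isPlainObject(value)) return;
    const id = value["@id"];
    if (typeof id === "string") {
      // Assumption: a node with "@id" as its only key is a pointer, not a definition.
      if (Object.keys(value).length === 1) referenced.add(id);
      else defined.add(id);
    }
    Object.values(value).forEach(visit);
  };

  blocks.forEach(visit);
  return Array.from(referenced).filter((id) => !defined.has(id));
}

const articleBlock: LdNode = {
  "@type": "BlogPosting",
  "@id": "https://openreplay.com/blog/json-ld-ai-search#article",
  publisher: { "@id": "https://openreplay.com/#organization" },
};
const organizationBlock: LdNode = {
  "@type": "Organization",
  "@id": "https://openreplay.com/#organization",
  name: "OpenReplay",
};

const danglingWithOrg = findDanglingIds([articleBlock, organizationBlock]);
const danglingWithoutOrg = findDanglingIds([articleBlock]);
```

With both blocks present nothing is reported. With the Organization block missing, the publisher reference is flagged.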

BreadcrumbList

BreadcrumbList describes the page’s place in the site hierarchy. Google supports breadcrumb rich results, and the same data helps AI systems understand site structure.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://openreplay.com/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Blog",
      "item": "https://openreplay.com/blog"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "How JSON-LD Helps AI Understand Your Website"
    }
  ]
}
</script>

The last item omits item because it’s the current page. position is 1-indexed.
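
Those two rules are easy to get wrong by hand, so breadcrumbs are usually generated. A minimal sketch, assuming the crumbs arrive as an ordered array ending with the current page; the Crumb shape and the buildBreadcrumbList name are illustrative.

```typescript
// Illustrative input shape: an ordered trail ending with the current page.
interface Crumb {
  name: string;
  url: string;
}

interface BreadcrumbItem {
  "@type": "ListItem";
  position: number;
  name: string;
  item?: string;
}

function buildBreadcrumbList(crumbs: Crumb[]) {
  const itemListElement = crumbs.map((crumb, index): BreadcrumbItem => {
    const element: BreadcrumbItem = {
      "@type": "ListItem",
      position: index + 1, // positions are 1-indexed
      name: crumb.name,
    };
    // Every crumb except the last (the current page) links to its URL.
    if (index < crumbs.length - 1) element.item = crumb.url;
    return element;
  });

  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement,
  };
}

const breadcrumbs = buildBreadcrumbList([
  { name: "Home", url: "https://openreplay.com/" },
  { name: "Blog", url: "https://openreplay.com/blog" },
  {
    name: "How JSON-LD Helps AI Understand Your Website",
    url: "https://openreplay.com/blog/json-ld-ai-search",
  },
]);
```

Passing the result through JSON.stringify yields the same structure as the hand-written block above.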

Keep structured data aligned with visible content

Google’s structured-data guidelines are explicit: content in JSON-LD must match what’s on the page. Marking up reviews that don’t exist on the page, prices that don’t appear, or author names that contradict visible bylines is a manual-action risk. The structured data is supposed to describe the page, not enhance it with claims that aren’t substantiated in the DOM.

This also matters for AI systems. If your visible prose says one thing and your JSON-LD says another, you’ve created the exact ambiguity structured data is supposed to resolve.
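
A rough way to catch this kind of drift in CI is to confirm that key JSON-LD values also appear in the page's visible text. The sketch below is deliberately crude and rests on stated assumptions: it strips tags with regular expressions rather than a real HTML parser, does not decode HTML entities, and uses case-insensitive substring matching. The function names are hypothetical.

```typescript
// Crude text extraction: drops script/style contents, then all remaining tags.
// Assumption: good enough for a smoke test; a real HTML parser is more reliable.
function visibleText(html: string): string {
  return html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Returns the names of claims whose value does not appear in the visible text.
function findUnbackedClaims(html: string, claims: { [field: string]: string }): string[] {
  const text = visibleText(html).toLowerCase();
  return Object.keys(claims).filter(
    (field) => !text.includes((claims[field] ?? "").toLowerCase())
  );
}

const pageHtml =
  "<article><h1>How JSON-LD Helps AI Understand Your Website</h1>" +
  '<p class="byline">By OpenReplay Team</p>' +
  '<script type="application/ld+json">{"author":"Someone Else"}</script></article>';

const unbackedClaims = findUnbackedClaims(pageHtml, {
  headline: "How JSON-LD Helps AI Understand Your Website",
  author: "Someone Else", // contradicts the visible byline
});
```

Here the headline is backed by the h1, while the author value exists only inside the script block and is reported as unbacked.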

Validate before you ship

Two tools cover the workflow:

| Tool | Checks | Best for |
| --- | --- | --- |
| Google Rich Results Test | Eligibility for Google-supported rich result types | Pre-deployment check for Article, Product, Breadcrumb, etc. |
| Schema Markup Validator | Conformance to the full Schema.org vocabulary | General validation; any type Google doesn’t surface as a rich result |

Rich Results Test renders the page, so it sees client-injected JSON-LD that AI crawlers won’t. To confirm what non-rendering crawlers see, run curl -A "GPTBot" https://your-page and grep the response for application/ld+json. If the block isn’t in that raw HTML, AI crawlers won’t see it.
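
The same raw-HTML check can be automated. The sketch below is a hypothetical helper (extractJsonLd is not a library function) that pulls every application/ld+json block out of an HTML string without executing any JavaScript, which approximates what a non-rendering crawler can see. It uses a regular expression rather than an HTML parser, so treat it as a smoke test.

```typescript
// Extracts and parses every JSON-LD block found in raw, unrendered HTML.
function extractJsonLd(html: string): unknown[] {
  const pattern =
    /<script\b[^>]*\btype\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      blocks.push(JSON.parse(match[1] ?? ""));
    } catch {
      // Malformed JSON is skipped here; a stricter check could report it instead.
    }
  }
  return blocks;
}

// Server-rendered page: the block is in the response bytes.
const serverRenderedHtml =
  '<html><head><script type="application/ld+json">' +
  '{"@type":"Organization","name":"OpenReplay"}' +
  '</script></head><body><div id="root"></div></body></html>';

// SPA shell: JSON-LD would only appear after client-side JavaScript runs.
const spaShellHtml =
  '<html><head><script src="/app.js"></script></head>' +
  '<body><div id="root"></div></body></html>';

const foundOnServerPage = extractJsonLd(serverRenderedHtml);
const foundOnSpaShell = extractJsonLd(spaShellHtml);
```

In a deploy pipeline, the input would be the response body saved from the curl command above, and an empty result would fail the build.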

Google Search Console’s Enhancements section reports parsing errors and eligibility for the structured-data types Google tracks. Use it for ongoing monitoring, not as a substitute for pre-deployment validation.

A note on FAQPage

FAQPage is still a valid Schema.org type and Google still uses it for content understanding. But Google notes in its FAQPage documentation that FAQ-related Search features are being removed: FAQ rich results stopped appearing in Search in May 2026; Search Console reporting and Rich Results Test support are being removed in June 2026; Search Console API support follows in August 2026. If you’re choosing schema to implement now for visible rich-result lift, FAQPage isn’t the answer. Implement it if you have a genuine FAQ page and want the semantics on record — not as a SERP real-estate play.

This is also why validator.schema.org is the safer default for long-term validation work: it doesn’t depend on which features Google chooses to surface as rich results.

What to ship first

If you’re starting from zero, the highest-value sequence is: a single Organization block site-wide, BreadcrumbList on every page that isn’t the homepage, and Article/BlogPosting on editorial content. Server-render all of it. Validate with both tools. Then check the raw HTML response with a non-rendering user agent to confirm AI crawlers actually receive what you wrote.

Structured data won’t move rankings on its own and won’t talk your way into AI Overviews. What it does — reliably, when shipped correctly — is remove the interpretive guesswork that causes search engines and AI systems to get your content wrong. For frontend teams, the work that matters most is not which schema types to add but where in the rendering pipeline they live. Put the JSON-LD in the server response, and the rest is content.

FAQs

Should I use JSON-LD, Microdata, or RDFa?

Use JSON-LD. Google explicitly recommends it over Microdata and RDFa because it lives in a single script block decoupled from the visible DOM, which means you can generate and update structured data without touching layout markup. All three formats can express Schema.org vocabulary and are parsed by Google, but JSON-LD is easier to maintain, easier to server-render, and the format Google's own examples use throughout its structured-data documentation.

Can I put multiple JSON-LD blocks on one page?

Yes. Google supports multiple separate script tags on a single page, which is the standard pattern for combining types like Organization, BreadcrumbList, and BlogPosting. The alternative is a single script containing an array of entities under the @graph key, where each entity gets its own @id. Both approaches are valid; the @graph pattern is cleaner when entities reference each other via @id, since it keeps a single canonical definition per entity.
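
A minimal sketch of the @graph pattern, assuming each entity is first built as a standalone object. The buildGraph helper is hypothetical. It moves @context to the top level and, as a simplifying assumption, keeps only the first definition seen for each @id.

```typescript
type Entity = { [key: string]: unknown };

// Merges standalone entities into a single document with a top-level @graph.
function buildGraph(entities: Entity[]): { "@context": string; "@graph": Entity[] } {
  const seenIds = new Set<string>();
  const graph: Entity[] = [];

  for (const entity of entities) {
    const node: Entity = { ...entity };
    delete node["@context"]; // declared once at the top level instead

    const id = node["@id"];
    if (typeof id === "string") {
      if (seenIds.has(id)) continue; // assumption: the first definition wins
      seenIds.add(id);
    }
    graph.push(node);
  }

  return { "@context": "https://schema.org", "@graph": graph };
}

const organizationEntity: Entity = {
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://openreplay.com/#organization",
  name: "OpenReplay",
};
const articleEntity: Entity = {
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "@id": "https://openreplay.com/blog/json-ld-ai-search#article",
  publisher: { "@id": "https://openreplay.com/#organization" },
};

const graphDocument = buildGraph([organizationEntity, articleEntity, organizationEntity]);
```

The resulting object goes into one script tag, and the publisher reference resolves against the Organization node in the same graph.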

Do AI crawlers see JSON-LD in a single-page application?

Only if the JSON-LD is in the initial server response. SPAs that inject structured data after hydration are invisible to non-rendering crawlers like GPTBot, ClaudeBot, and PerplexityBot. The fix is server-side rendering or static generation for pages that need structured data (for example, Next.js, Nuxt, Astro, or SvelteKit in their SSR or SSG modes), so the script tag exists in the HTML bytes before any JavaScript executes.

When should I update dateModified?

Update dateModified only when the article's substantive content changes: corrections, new sections, updated facts, refreshed examples. Do not bump it on every deploy or for cosmetic edits like typo fixes or styling tweaks. Google uses dateModified as a freshness signal for time-sensitive queries, and inflating it without real content changes contradicts the rule that structured data must match visible content, which is a manual-action risk.
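
One way to avoid bumping the date on every deploy is to carry the previous value forward unless the body actually changed. The sketch below is an assumption-laden illustration: the names are hypothetical, and it only recognizes whitespace-only differences as cosmetic. Whether a wording change such as a typo fix counts as substantive remains an editorial decision the code cannot make.

```typescript
// Hypothetical record of what was last published.
interface PublishedRevision {
  body: string;
  dateModified: string; // ISO 8601 timestamp
}

// Collapses whitespace so reformatting alone does not count as a change.
function normalizeBody(body: string): string {
  return body.replace(/\s+/g, " ").trim();
}

// Keeps the previous dateModified unless the normalized body differs.
function nextDateModified(previous: PublishedRevision, newBody: string, now: Date): string {
  const unchanged = normalizeBody(previous.body) === normalizeBody(newBody);
  return unchanged ? previous.dateModified : now.toISOString();
}

const lastPublished: PublishedRevision = {
  body: "JSON-LD must be server-rendered.",
  dateModified: "2026-05-20T09:00:00-04:00",
};
const deployTime = new Date("2026-06-01T12:00:00Z"); // illustrative deploy time

const afterReformat = nextDateModified(lastPublished, "JSON-LD  must be\nserver-rendered. ", deployTime);
const afterNewSection = nextDateModified(
  lastPublished,
  "JSON-LD must be server-rendered. Updated section.",
  deployTime
);
```

A redeploy with only whitespace changes keeps the original timestamp, while added content takes the deploy time.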
