LangShake: The New Standard for AI-Ready Web Content

A lightweight, open framework to deliver high-quality structured data to LLMs

What is LangShake?

LangShake is a modern web protocol that makes your site machine-readable, AI-friendly, and verifiable, all without changing your frontend.

It introduces a three-part system to expose structured content to Large Language Models (LLMs), search agents, and automated crawlers: a sitemap.xml extension, per-page content JSON files, and a manifest at .well-known/llm.json.

sitemap.xml Extension

LangShake adds two fields to each <url> entry in your existing sitemap.xml: langshake:schema-url links to the page's structured content, and langshake:checksum lets consumers verify its integrity.

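<!-- Fragment: the langshake: prefix must be bound with xmlns:langshake on the enclosing <urlset>. -->
<url>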
  <loc>https://example.com/article</loc>
  <lastmod>2025-04-16T17:58:00Z</lastmod>
  <langshake:schema-url>https://example.com/langshake/article.json</langshake:schema-url>
  <langshake:checksum>8f7a9b3cf5a...a8b9c07e8f</langshake:checksum>
</url>
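
As a rough sketch of how a crawler might read these fields, the Python below parses a sitemap and collects the LangShake entries. The namespace URI bound to the langshake: prefix is not given above, so LANGSHAKE_NS is a hypothetical placeholder, and the function name read_langshake_entries is ours rather than part of any LangShake tooling.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: the real LangShake namespace URI is not stated in the text;
# this value is a hypothetical placeholder.
LANGSHAKE_NS = "https://example.com/ns/langshake"
NS = {"sm": SITEMAP_NS, "langshake": LANGSHAKE_NS}

SAMPLE_SITEMAP = f"""<urlset xmlns="{SITEMAP_NS}" xmlns:langshake="{LANGSHAKE_NS}">
  <url>
    <loc>https://example.com/article</loc>
    <lastmod>2025-04-16T17:58:00Z</lastmod>
    <langshake:schema-url>https://example.com/langshake/article.json</langshake:schema-url>
    <langshake:checksum>8f7a9b3cf5a...a8b9c07e8f</langshake:checksum>
  </url>
  <url>
    <loc>https://example.com/plain-page</loc>
  </url>
</urlset>"""


def read_langshake_entries(sitemap_xml: str) -> list[dict]:
    """Return one dict per <url> entry that carries a langshake:schema-url."""
    root = ET.fromstring(sitemap_xml)
    entries = []
    for url in root.findall("sm:url", NS):
        schema_url = url.findtext("langshake:schema-url", namespaces=NS)
        if schema_url is None:
            continue  # ordinary sitemap entry without LangShake fields
        entries.append({
            "loc": url.findtext("sm:loc", namespaces=NS),
            "schema_url": schema_url,
            "checksum": url.findtext("langshake:checksum", namespaces=NS),
        })
    return entries


entries = read_langshake_entries(SAMPLE_SITEMAP)
print(entries[0]["schema_url"])
```

Entries without the LangShake fields are skipped, so the same sitemap keeps working for crawlers that know nothing about the extension.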

Content JSON Files — Schema.org Data

Each page gets a standalone JSON file that follows the Schema.org vocabulary. LLMs and crawlers read this file instead of parsing HTML, and its checksum field lets them detect whether the content has been modified.

{
  "@context": "http://schema.org",
  "@type": "Article",
  "headline": "LangShake: Revolutionizing LLM Training Data",
  "description": "A guide to implementing the LangShake protocol...",
  "articleBody": "...",
  "author": { "name": "Jane Smith" },
  "publisher": { "name": "Example Corp" },
  "checksum": "8f7a9b3cf5a...a8b9c07e8f"
}
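
The text above does not say which hash function the checksum uses or which bytes it covers. The sketch below therefore makes two loudly labeled assumptions: SHA-256, computed over a canonical JSON serialization (sorted keys, compact separators) of every field except checksum itself. The helper names compute_checksum and verify_checksum are illustrative only.

```python
import hashlib
import json


def compute_checksum(doc: dict) -> str:
    """Hash a content document, excluding its own "checksum" field.

    Assumptions (not confirmed by the LangShake text): SHA-256 over
    UTF-8 encoded JSON with sorted keys and compact separators.
    """
    payload = {k: v for k, v in doc.items() if k != "checksum"}
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_checksum(doc: dict) -> bool:
    """Return True when the stored checksum matches the recomputed one."""
    return doc.get("checksum") == compute_checksum(doc)


article = {
    "@context": "http://schema.org",
    "@type": "Article",
    "headline": "LangShake: Revolutionizing LLM Training Data",
    "author": {"name": "Jane Smith"},
}
article["checksum"] = compute_checksum(article)
print(verify_checksum(article))   # True

article["headline"] = "Tampered headline"
print(verify_checksum(article))   # False
```

Excluding the checksum field from the hashed payload avoids the circularity of a document that would otherwise have to contain its own hash.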

.well-known/llm.json — The Manifest

This file serves as a high-level entry point for AI agents. It describes your site, lists the structured content modules, and can optionally include context for LLMs (llm_context) and a Merkle root for integrity checks.

{
  "version": "1.0",
  "site": {
    "name": "Example Corp",
    "language": "en"
  },
  "modules": [
    "/langshake/article.json"
  ],
  "llm_context": {
    "summary": "We build open-source AI tools.",
    "principles": ["Transparency", "Ethics"]
  },
  "verification": {
    "strategy": "merkle",
    "merkleRoot": "abc123..."
  }
}
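
To illustrate how an agent might consume the manifest, the sketch below resolves the module paths against the site origin and recomputes a Merkle root from per-module checksums. The Merkle construction is not specified above, so this assumes a plain binary tree with SHA-256 over concatenated child hashes, duplicating the last node on odd-sized levels, and hex-encoded leaf checksums. The function names are hypothetical.

```python
import hashlib
from urllib.parse import urljoin


def resolve_modules(manifest: dict, site_origin: str) -> list[str]:
    """Turn the manifest's relative module paths into absolute URLs."""
    return [urljoin(site_origin, path) for path in manifest.get("modules", [])]


def merkle_root(leaf_hashes: list[str]) -> str:
    """Compute a Merkle root from hex-encoded leaf hashes.

    Assumed scheme (not confirmed by the LangShake text): binary tree,
    parent = SHA-256(left || right), last node duplicated on odd levels.
    """
    if not leaf_hashes:
        raise ValueError("at least one leaf hash is required")
    level = [bytes.fromhex(h) for h in leaf_hashes]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels with the last node
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


manifest = {
    "version": "1.0",
    "modules": ["/langshake/article.json"],
    "verification": {"strategy": "merkle"},
}
print(resolve_modules(manifest, "https://example.com"))

# Stand-in leaf checksums; real ones would come from the content JSON files.
leaves = [hashlib.sha256(name.encode()).hexdigest() for name in ("a", "b", "c")]
manifest["verification"]["merkleRoot"] = merkle_root(leaves)
print(merkle_root(leaves) == manifest["verification"]["merkleRoot"])  # True
```

With a single root in the manifest, a consumer can detect a change to any module by recomputing the root from the checksums it has fetched and comparing one value.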

Why LangShake?

Efficiency

Structured JSON metadata replaces most HTML parsing, which reduces computational overhead and speeds up data collection.

Integrity

Checksum verification lets consumers confirm that content has not been modified since it was published, which makes for a more reliable training dataset.

Compatibility

Builds on the established sitemap.xml and Schema.org standards, making adoption straightforward for websites already using these technologies.

Developer-Friendly

A CLI tool abstracts the complexity, and caching speeds up local builds, so development teams can adopt the protocol with little effort.

Context-Ready

Developers can add nuance for LLMs in a dedicated field (llm_context in the manifest), enabling more precise and contextually appropriate AI interactions.

Future-Proof

The modular, extensible spec leaves room for Merkle verification, LLM summaries, and more, so the protocol can adapt to emerging technologies.

Call for Experts

We're inviting experts in AI, web standards, and data engineering to provide feedback and contribute to LangShake's development. Your expertise can help shape the future of LLM training data collection.
