---
title: "How to Build a Website in 2026"
description: "An entity-first website built on Astro using a simple CMS and large JSON databases enables fast, machine-readable graphs optimized for modern LLM search, RAG synthesis, and explicit data indexing."
category: Essays
publishDate: 2026-05-22T06:54:00.000Z
---

The shift from keyword-matching to entity-modeling isn't a theory; it's the architecture that separates the next generation of search-native products from the last.

In 2026, the web design and search optimization industries have collided at a precise, inescapable intersection: algorithmic graph theory ([Ali et al., 2024](#references)). The era of treating a website as a digital filing cabinet full of individual text documents is dead. String matching is dead.

Search platforms no longer view a page as a collection of matching textual phrases. Instead, they view it as a vector space populated by recognized entities, attributes, and mathematical predicates ([Siddharth et al., 2021](#references)). They look for a coherent model of a domain, expressed as an immutable knowledge graph that their web systems can traverse, validate, cross-reference, and cite ([Zhang et al., 2024](#references)).

Most websites are still architected as flat collections of documents, with each individual asset optimized around an isolated phrase someone might type into a search query box. We reject that framing entirely. A premium website in 2026 is a standalone, deterministic representation of a real-world ecosystem—the core concepts, the actors, the locations, the products, and the credentials—and your URLs are merely customized, contextual *projections* of that central data layer.

## The Confluence of LLMs and Graph Theory

The intersection of Large Language Models (LLMs) and entity-first web architecture forms the foundation of modern search. To understand why an entity-first design is necessary, we must examine how modern information retrieval systems function under the hood.

In early search architecture, web pages were treated as bags of words. Today, search engines use Retrieval-Augmented Generation (RAG) to ground their language models in verified facts. The LLM does not generate answers purely from its static training data; instead, it acts as a synthesis engine that processes raw documents retrieved from a core search index and shapes them into a natural language response.  

This mechanism relies heavily on Query Fan-Out. When a user inputs a complex query, the underlying model breaks it down into a multi-directional cluster of parallel sub-queries to gather comprehensive background information. If your website is structured as a collection of disjointed documents, the model cannot map the relationships between your pages during this rapid retrieval phase.

An entity-first site architecture provides the structural clarity these systems require. By organizing your data into a unified, machine-readable graph, you ensure that when an LLM traces an industry concept, your content is structurally prepared to be extracted, synthesized, and cited.

## Google’s Strategic Shift: The Intermediary AI Layer

Recent product rollouts and patent filings reveal how search engines are evolving from discovery platforms into primary destinations. The release of Google's official optimization guide, *Optimizing your website for generative AI features on Google Search*, confirms that generative features like AI Overviews and AI Mode share the same foundational infrastructure as traditional organic search ([Google Search Central, 2026](#references)). There is no separate AI index; the systems deciding which pages appear in standard search results are the same ones feeding the RAG pipeline.

However, the presentation layer is changing significantly. The issuance of [Google Patent US12536233B1](https://patents.google.com/patent/US12536233B1/en), *AI-generated content page tailored to a specific user*, outlines an architecture where the search engine evaluates a user's intent history and generates a customized intermediate landing page ([USPTO, 2026](#references)).

If a website presents thin content, weak site navigation, or ambiguous data, the search engine can bypass the destination domain entirely. It synthesizes a custom, personalized interface using data pulled from the index, shifting the user's click from an external website to an internal, platform-hosted experience.

Simultaneously, Google's research into decentralized, on-device models demonstrates a shift toward extraction based on user behavior rather than explicit keyword matching. By decomposing a user's multi-screen journey into summarized interaction blocks, smaller models infer user goals from sequential actions.

When search engines determine relevance based on behavioral context, linear content optimization becomes insufficient. Websites must present distinct, extractable data points that align with specific stages of the user's decision-making process.

## The Core Philosophy: Aesthetic Precision Meets Topological Depth

To build a website that survives the programmatic landscape of 2026, your structural engineering must be as meticulous as your typography. We balance the clean, typography-first minimalism of elegant web layout with the deep, unapologetic technical rigor of semantic topical mapping.

The strategy is simple yet absolute:

> "Model the entity world once within an unyielding database schema; enforce structural consistency and verifiable source data; project pages from it only where the data density earns a page; and dedicate human authorship exclusively to the nuanced, interpretive analysis a graph cannot calculate."

## The Death of the Monolithic CMS: Why We Build on Astro

To realize an entity-first architecture, you must completely abandon the legacy CMS paradigms of the past decade. The industry-standard approach of the last twenty years—typified by WordPress, Drupal, and heavy, monolithic relational-database setups—is fundamentally incompatible with entity-modeling at scale.

Legacy platforms treat content as unstructured blobs of text stored in a rigid database table, forced through an expensive runtime computation loop every single time a visitor requests a URL. They are document-first, slow, plug-in dependent, and structurally fragile.

In 2026, we build exclusively on [Astro.build](https://astro.build), paired with a highly structured, simple, entity-based headless CMS for human authorship and a larger, version-controlled JSON database repository for systemic facts. This technical stack delivers three massive architectural shifts:

### 1. Build-Time Graph Compilation

Astro functions as a compiler, not a runtime processing engine. It allows us to pull data from our massive JSON files and our headless CMS simultaneously at build time. Astro maps the relationships between these files, validates our data-density thresholds, compiles the dense cross-linking arrays, and outputs pure, hyper-optimized, static HTML. There is no live SQL database to exploit, no server-side bottleneck, and zero runtime overhead.
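The relationship-mapping step can be sketched in plain TypeScript. This is a minimal illustration rather than Astro's own API: the `EntityRecord` shape, the `relatedIds` field, and the `compileGraph` function are hypothetical names, and the records are inlined here where a real build would read them from JSON files. The point is that every edge is resolved before any HTML is emitted, so a dangling reference stops the build instead of shipping a dead link.

```typescript
// Hypothetical record shape; in a real build these would be read from JSON files.
type EntityRecord = {
  id: string;
  kind: string;
  label: string;
  relatedIds: string[];
};

type CompiledEntity = {
  record: EntityRecord;
  related: EntityRecord[];
};

// Resolve every relation up front so a broken edge fails the build
// instead of producing a dead link at request time.
function compileGraph(records: EntityRecord[]): CompiledEntity[] {
  const byId = new Map<string, EntityRecord>();
  records.forEach((record) => {
    if (byId.has(record.id)) {
      throw new Error(`Duplicate entity id: ${record.id}`);
    }
    byId.set(record.id, record);
  });

  return records.map((record) => {
    const related = record.relatedIds.map((relatedId) => {
      const target = byId.get(relatedId);
      if (target === undefined) {
        throw new Error(`${record.id} references missing entity ${relatedId}`);
      }
      return target;
    });
    return { record, related };
  });
}

const sampleRecords: EntityRecord[] = [
  { id: "org-acme", kind: "organization", label: "Acme Training", relatedIds: ["role-welder"] },
  { id: "role-welder", kind: "occupation", label: "Welder", relatedIds: [] },
];

const compiledGraph = compileGraph(sampleRecords);
console.log(compiledGraph[0].related[0].label);
```

Failing loudly on a missing target is a design choice: a static build has no runtime to recover in, so the compile step is the only place to catch a broken edge.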

### 2. Zero-JS Server-First Performance

Traditional single-page application (SPA) frameworks force the client's browser to execute massive packages of JavaScript just to render simple text and layout links. This ruins performance metrics and creates indexing hazards for modern crawling bots.

Astro’s server-first architecture strips away all unnecessary client-side JavaScript by default. The result is instant, sub-10ms edge-delivery of pure typography and clean semantic markup.

### 3. Decoupling Truth From Presentation

By migrating our core dataset into structured JSON databases and tracking human editorial text within a lightweight, entity-driven headless CMS, our information asset becomes entirely portable.

We are no longer bound to a specific database format or theme engine. The data exists as pure, machine-readable truth. Astro simply acts as the high-performance lens that projects that truth into the digital wild.

## The Five Structural Principles

These are not abstract design ideals. They are architectural dictates that govern every database table, every layout file, every relational edge, and every editorial workflow.

### 1. Entities Over Documents

Model the things in the world first; pages come second as superficial renderings. In the database architecture, the schema definition for a primary entity is vastly more fundamental than the web layout displaying it. Structured data must live in an isolated, normalized data layer while human-authored text lives as localized documents. They are completely different classes of digital objects, and your architecture must honor that boundary.

An enterprise organization, an executive role, or a core industry concept is not a "page." It is a discrete entity node in the world possessing predictable attributes and multi-directional relationships ([Siddharth et al., 2021](#references)). The page layout at any given URL is simply one ephemeral view of that entity. When you update the entity in your data tier, every semantic view, schema string, and internal cross-reference across your system must update simultaneously.
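One way to make "every view updates simultaneously" concrete is to derive each view as a pure function of the entity. The sketch below assumes a hypothetical `OrgEntity` shape and a `projectViews` helper; neither comes from a specific CMS or from Astro.

```typescript
// Hypothetical entity shape; the single source of truth for every view.
type OrgEntity = {
  slug: string;
  legalName: string;
  city: string;
};

type EntityViews = {
  path: string;
  pageTitle: string;
  linkLabel: string;
};

// Every view is a pure function of the entity, so nothing can drift out of sync.
function projectViews(org: OrgEntity): EntityViews {
  return {
    path: `/organizations/${org.slug}/`,
    pageTitle: `${org.legalName} (${org.city})`,
    linkLabel: `${org.legalName}, based in ${org.city}`,
  };
}

const acmeOrg: OrgEntity = { slug: "acme-training", legalName: "Acme Training", city: "Denver" };
const viewsBeforeMove = projectViews(acmeOrg);

// One change in the data tier propagates to every derived view.
const relocatedOrg: OrgEntity = { ...acmeOrg, city: "Austin" };
const viewsAfterMove = projectViews(relocatedOrg);

console.log(viewsBeforeMove.pageTitle);
console.log(viewsAfterMove.pageTitle);
console.log(viewsAfterMove.linkLabel);
```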

### 2. Structural Uniformity as an Authority Signal

An authoritative repository treats every entity within a shared classification exactly the same way. Every entity profile must answer identical core questions within an identical structure; every systematic index must deploy an unyielding, standardized math function; every statistic must trace back to an identical taxonomy of source documentation.

We enforce this uniformity *by design* through rigid relational schemas, programmatic scoring routines, and strict design systems. This makes structural asymmetry impossible.

A web system that maps its sector systematically proves that it understands the parameters of its industry ([Bergeaud et al., 2017](#references)). When search engines calculate entity salience (the mathematical confidence with which an algorithm can isolate the entities on your page and map their relative distance to other nodes), the consistency of your layout structure functions as a primary indexing filter.

### 3. The Strict Data-Density Threshold

We do not generate long-tail programmatic index pages or multi-tiered filtering matrices unless the underlying knowledge graph holds enough unique data points to cross a specific data threshold. A programmatic page that contains only hollow templated sentences and empty tabular rows is an architectural failure, a user liability, and a semantic lie. It asserts an entity relationship that your data engine cannot substantiate.

Our data-density threshold serves as a core philosophy embedded directly into the codebase: the system will refuse to construct a URL or declare an index entry unless it possesses the statistical mass required to prove its own utility.
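A density gate of this kind might look like the following sketch. The `CandidatePage` shape, the rule for counting a fact as populated, and the floor of three facts are illustrative assumptions; the essay does not specify an actual threshold.

```typescript
type FactValue = string | number | null;

// Hypothetical shape for a programmatic page the build is considering.
type CandidatePage = {
  slug: string;
  facts: { [attribute: string]: FactValue };
};

// Assumed floor for illustration only.
const MIN_POPULATED_FACTS = 3;

// A fact counts as populated when it is neither null nor an empty string.
function countPopulatedFacts(page: CandidatePage): number {
  return Object.keys(page.facts).filter((key) => {
    const value = page.facts[key];
    return value !== null && value !== "";
  }).length;
}

// Only pages at or above the floor are handed to the page generator.
function selectBuildablePages(pages: CandidatePage[], floor: number): CandidatePage[] {
  return pages.filter((page) => countPopulatedFacts(page) >= floor);
}

const candidatePages: CandidatePage[] = [
  {
    slug: "welding-programs-denver",
    facts: { programCount: 12, medianTuition: 8400, accreditor: "Example Board" },
  },
  {
    slug: "welding-programs-smalltown",
    facts: { programCount: 1, medianTuition: null, accreditor: "" },
  },
];

const buildablePages = selectBuildablePages(candidatePages, MIN_POPULATED_FACTS);
console.log(buildablePages.map((page) => page.slug));
```

In an Astro project, a filter like this would sit in front of whatever generates the route list, so a page below the floor is never emitted in the first place.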

### 4. Immutable Data Provenance

We never fabricate, approximate, or loose-type a data metric. Every numeric attribute across the platform is explicitly tied to a verified primary source repository, a public record dataset, or an active API endpoint; otherwise, it is explicitly flagged to the user and the crawler as an unverified variable ([Xu et al., 2024](#references)). This goes far beyond editorial transparency; it is the programmatic construction of an authoritative web asset.

Trust is compiled as a first-class programmatic attribute of every single entity node, not slapped onto a page later as a content-marketing checklist.

Our detailed sources blocks, active cross-citations, and real-time validation dates represent the user-facing expression of a framework where data origin is hard-coded into the system schema. This structural execution aligns precisely with the patterns uncovered in [Google's Knowledge Graph Reconciliation Patent (US20190251173A1)](https://patents.google.com/patent/US20190251173A1/en), which details how search engines extract candidate data tuples from the open web, isolate conflicting assertions, and cluster them based on provenance, source authority, and entity-attribute reliability.
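Hard-coding data origin into the schema can be approximated at the type level. In this sketch, every metric must carry either a source reference or an explicit `null`, and the display layer flags the `null` case as unverified. The type names, field names, and the example.org URL are assumptions made for illustration.

```typescript
type SourceRef = {
  publisher: string;
  url: string;
  retrievedOn: string; // ISO date of the last validation
};

// The type forces a decision: a metric has a source, or it is explicitly null.
type SourcedMetric = {
  attribute: string;
  value: number;
  source: SourceRef | null;
};

type DisplayMetric = {
  attribute: string;
  value: number;
  verified: boolean;
  citation: string;
};

// Unsourced values are still shown, but never without the unverified flag.
function toDisplayMetric(metric: SourcedMetric): DisplayMetric {
  if (metric.source === null) {
    return {
      attribute: metric.attribute,
      value: metric.value,
      verified: false,
      citation: "Unverified: no primary source on record",
    };
  }
  return {
    attribute: metric.attribute,
    value: metric.value,
    verified: true,
    citation: `${metric.source.publisher}, retrieved ${metric.source.retrievedOn}`,
  };
}

const sourcedEnrollment: SourcedMetric = {
  attribute: "enrollment",
  value: 1240,
  source: { publisher: "Example Public Records", url: "https://example.org/records/1240", retrievedOn: "2026-04-01" },
};
const unsourcedEnrollment: SourcedMetric = { attribute: "enrollment", value: 900, source: null };

console.log(toDisplayMetric(sourcedEnrollment).citation);
console.log(toDisplayMetric(unsourcedEnrollment).citation);
```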

### 5. Compute What Is Mechanical; Author What Is Human

Our development framework draws a definitive line between content that should be generated by an engine (entity attribute intersections, comparison indices, directional matrices) and content that requires manual human execution (deep investigative analysis, first-party reporting, hand-curated perspectives, and institutional critique).

The first category is mathematically projected directly from your graph database and should never be manually modified by an editor. The second category is hand-written, maintained under strict version control, and serves as your platform's authentic voice.

Understanding this split is how you win the modern web. Generic informational summaries have been completely commoditized; algorithms can synthesize basic answers on the fly. The strategic edge shifts entirely to human interpretation, distinct corporate perspective, and structural authority that an isolated graph cannot invent.

## The Strategic Blueprint for Next-Generation Optimization

As search platforms prioritize synthesis over link aggregation, optimization strategies are shifting from keyword targeting to maximizing information extraction density. The goal is no longer simply to rank for a specific term, but to format your site's data so clearly that it is seamlessly ingested by the model's retrieval system.

### 1. Transition to Modular Content Design

Traditional web layouts rely on continuous, linear text. To optimize for RAG synthesis, content must adopt a modular, componentized structure. This approach organizes information into clear, distinct sections:

* The Informational Core: A highly concise definition or answer placed at the beginning of a section, designed for quick extraction by an indexing bot.
* The Structured Proof Tier: A supporting data matrix, table, or structured chart that clarifies the primary statement. Synthesis models frequently lift tabular structures to populate comparison features in generative interfaces.
* The Expert Analysis Block: A deep-dive narrative providing unique insights, first-hand professional experience, or institutional perspective.

### 2. High-Density Internal Relationships

The internal linking structure of a website should reflect the actual relationships within the industry it covers. Every link must act as a clear semantic bridge between distinct entities. Instead of generic navigational phrasing, anchor text must explicitly state the relationship between the connected pages.

### 3. Absolute Technical Indexing Stability

Because generative answer engines depend entirely on the primary search index for live data retrieval, technical framework health is a critical priority. If an asset is blocked by rendering errors and never reaches the index, it cannot be retrieved for the RAG pipeline, and poor Core Web Vitals or high latency put crawling and indexing at further risk. Building on a modern, high-performance architecture like Astro helps keep your content readily accessible to search crawlers.

## The Semantic Pipeline: Six Engine Overhauls

In building out our system, we map our architectural goals directly against the algorithmic mechanisms outlined in documented retrieval patents. We use these patents as conceptual maps to locate structural gaps in traditional web design, rather than treating them as specific variables to optimize against.

Our core priority rule remains absolute: prioritize engineering changes that simultaneously deliver exceptional information design and flawless user utility. Purely theoretical strategy bets are treated as lower confidence; they are executed only when the baseline architecture is fully secured.

### Tier A: Foundational Execution Overhauls

#### A1. Map the Entity Graph into Rendered HTML

Every entity template must loop through its relational database links and generate explicit internal link modules using highly descriptive anchor text ([Zhang et al., 2024](#references)). Your relationships already exist inside your database engine; your layout layer must project those implicit edges as crawlable links so that standard parsing systems can easily traverse the latent graph.

This model directly satisfies the extraction patterns detailed in [Google's Entity Extraction Patent (US11017304B2)](https://patents.google.com/patent/US11017304B2/en), which outlines how semantic nodes and relationship triplets are isolated from human language layouts ([Zuo et al., 2022](#references)):

* Primary Entity Profiles: Systematically calculate and print deep contextual links pointing to associated sub-entities, localized operating markets, required certifications, and immediate market competitors.
* Classification & System Rubrics: Programmatically link out to operational baselines, parent organizations, core metric lexicons, and the primary high-value locations where those rubrics apply.
* Localized Hub Indexes: Automate navigation lists that link down to hyper-local market operators, regional infrastructure options, and specific industry specializations trending within that distinct coordinate zone.
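The link modules described above could be generated from typed edges, with the anchor text derived from the relationship itself rather than written by hand. The relation kinds, the `GraphEdge` shape, and the phrasing templates below are hypothetical and would need to match your own graph.

```typescript
// Hypothetical relation kinds; a real graph would define its own vocabulary.
type RelationKind = "offers-certification" | "operates-in" | "competes-with";

type GraphEdge = {
  kind: RelationKind;
  targetLabel: string;
  targetPath: string;
};

// The anchor text states the relationship, not just the target name.
function anchorText(sourceLabel: string, edge: GraphEdge): string {
  switch (edge.kind) {
    case "offers-certification":
      return `${edge.targetLabel} certification offered by ${sourceLabel}`;
    case "operates-in":
      return `${sourceLabel} operations in ${edge.targetLabel}`;
    case "competes-with":
      return `${edge.targetLabel}, a competitor of ${sourceLabel}`;
  }
}

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Project every edge as a plain, crawlable anchor element.
function renderRelatedLinks(sourceLabel: string, edges: GraphEdge[]): string {
  const items = edges.map((edge) => {
    const href = escapeHtml(edge.targetPath);
    const text = escapeHtml(anchorText(sourceLabel, edge));
    return `<li><a href="${href}">${text}</a></li>`;
  });
  return `<ul>${items.join("")}</ul>`;
}

const acmeEdges: GraphEdge[] = [
  { kind: "operates-in", targetLabel: "Denver", targetPath: "/markets/denver/" },
  { kind: "competes-with", targetLabel: "Beta Institute", targetPath: "/organizations/beta-institute/" },
];

console.log(renderRelatedLinks("Acme Training", acmeEdges));
```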

#### A2. Project the Entity Schema Model as a JSON-LD Data Layer

While our interior database systems define entity structures cleanly behind the firewall, we must use deep JSON-LD serialization to explicitly declare those identical networks to any external consumption engine. The schema layer should not be treated as a marketing addition designed to win a visual layout badge in a search result; it is an executive document detailing the architecture of your data model.

* Enterprise Entities & Facilities: Serialize into comprehensive `Organization` or `EducationalOrganization` data schemas, explicitly mapping global corporate identifiers, confirmed physical `address` fields, and `numberOfEmployees` metrics sourced from our data tier.
* Operational Roles & Skill Tracks: Serialize into deep `Occupation` schemas, strictly populating national labor classification codes, localized `estimatedSalary` variations, and required certification properties.
* System Evaluations & Rankings: Output detailed `ItemList` architectures with sequential `ListItem` positions, tying each array element to its deterministic database ID.
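As a sketch of the first bullet, an organization record can be serialized into schema.org JSON-LD as follows. The `OrganizationEntity` shape and the sample values are assumptions; the schema.org property names (`address`, `PostalAddress`, `numberOfEmployees`, `QuantitativeValue`) are taken from the public vocabulary.

```typescript
// Hypothetical internal record; field names are illustrative.
type OrganizationEntity = {
  legalName: string;
  url: string;
  streetAddress: string;
  locality: string;
  region: string;
  postalCode: string;
  country: string;
  employeeCount: number | null;
};

type JsonLdDocument = { [key: string]: unknown };

function toOrganizationJsonLd(org: OrganizationEntity): JsonLdDocument {
  const doc: JsonLdDocument = {
    "@context": "https://schema.org",
    "@type": "Organization",
    name: org.legalName,
    url: org.url,
    address: {
      "@type": "PostalAddress",
      streetAddress: org.streetAddress,
      addressLocality: org.locality,
      addressRegion: org.region,
      postalCode: org.postalCode,
      addressCountry: org.country,
    },
  };
  // Omit the property entirely rather than emit a guess.
  if (org.employeeCount !== null) {
    doc["numberOfEmployees"] = { "@type": "QuantitativeValue", value: org.employeeCount };
  }
  return doc;
}

// Escape "<" so the payload can never close the script element early.
function renderJsonLdScript(doc: JsonLdDocument): string {
  const payload = JSON.stringify(doc).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${payload}</script>`;
}

const acmeEntity: OrganizationEntity = {
  legalName: "Acme Training",
  url: "https://example.org/organizations/acme-training/",
  streetAddress: "100 Example Street",
  locality: "Denver",
  region: "CO",
  postalCode: "80202",
  country: "US",
  employeeCount: 85,
};

console.log(renderJsonLdScript(toOrganizationJsonLd(acmeEntity)));
```

Leaving out `numberOfEmployees` when the value is unknown keeps the schema layer consistent with the provenance rule: the markup asserts only what the data tier can back up.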

#### A3. Hardcode Internal Link Integrity and Clean Sitemap Discipline

Your platform's XML sitemaps must operate as an automated reflection of your data-density thresholds. If an entity page falls beneath your system's data-density floor following an external data pull, the system must handle the containment automatically:

* The routing architecture must immediately throw a `404` or append a `noindex` tag to that specific rendering node.
* The sitemap generator must dynamically drop the URL from all asset listings.
* The system must verify that every single active page in the graph is nested no more than three standard directory links away from a root index hub, completely eradicating orphan nodes.

This technical hygiene limits your system's exposure to structural bloat, keeping your layout perfectly aligned with the ingestion rules established in [Google's Context-Based Rich Content Directory Patent (US12482215B1)](https://patents.google.com/patent/US12482215B1/en), which addresses computational load optimization when traversing expansive knowledge networks.
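The containment rules above can be sketched as a single planning pass: drop pages below the density floor, then run a breadth-first search from the root hub over the remaining pages to measure click depth. The `SitePage` shape, the function name, and the sample numbers are hypothetical; the three-link limit comes from the rule stated above.

```typescript
type SitePage = { path: string; factCount: number; linksTo: string[] };

type SitemapPlan = {
  sitemap: string[];         // pages that stay indexable
  noindex: string[];         // pages below the density floor
  depthViolations: string[]; // live pages not reachable within maxDepth links
};

function planSitemap(pages: SitePage[], rootPath: string, minFacts: number, maxDepth: number): SitemapPlan {
  const live = pages.filter((p) => p.path === rootPath || p.factCount >= minFacts);
  const noindex = pages
    .filter((p) => p.path !== rootPath && p.factCount < minFacts)
    .map((p) => p.path);

  const liveByPath = new Map<string, SitePage>();
  live.forEach((p) => liveByPath.set(p.path, p));

  // Breadth-first search from the root gives each page its minimum click depth.
  const depthByPath = new Map<string, number>();
  const queue: string[] = [];
  if (liveByPath.has(rootPath)) {
    depthByPath.set(rootPath, 0);
    queue.push(rootPath);
  }
  for (let head = 0; head < queue.length; head += 1) {
    const currentPath = queue[head];
    const currentPage = liveByPath.get(currentPath);
    const currentDepth = depthByPath.get(currentPath);
    if (currentPage === undefined || currentDepth === undefined) continue;
    currentPage.linksTo.forEach((target) => {
      // Links through pruned pages do not count toward reachability.
      if (liveByPath.has(target) && !depthByPath.has(target)) {
        depthByPath.set(target, currentDepth + 1);
        queue.push(target);
      }
    });
  }

  const depthViolations = live
    .filter((p) => {
      const depth = depthByPath.get(p.path);
      return depth === undefined || depth > maxDepth;
    })
    .map((p) => p.path);

  return { sitemap: live.map((p) => p.path), noindex, depthViolations };
}

const demoPages: SitePage[] = [
  { path: "/", factCount: 10, linksTo: ["/orgs/", "/thin/"] },
  { path: "/orgs/", factCount: 8, linksTo: ["/orgs/acme/"] },
  { path: "/orgs/acme/", factCount: 6, linksTo: [] },
  { path: "/thin/", factCount: 1, linksTo: ["/only-linked-from-thin/"] },
  { path: "/only-linked-from-thin/", factCount: 5, linksTo: [] },
];

const demoPlan = planSitemap(demoPages, "/", 3, 3);
console.log(demoPlan);
```

The sample shows why the depth check runs after pruning: removing `/thin/` orphans the page that was only reachable through it, and the plan reports that page as a violation so the build can fail or the linking can be repaired.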

### Tier B & C: Speculative Context Overhauls

#### B1. Construct an Industry Lexicon as an Autonomous Database Entity Type

Do not relegate your industry's complex definitions to plain text entries buried in generic blog posts. Instead, treat every industry acronym, technical system metric, and compliance standard as an unyielding entity node inside an independent database lexicon.

Each terminology entry requires its own dedicated canonical URL, a clean structural definition, a fully valid `DefinedTerm` schema declaration, and an automated internal link cluster pointing back to the real-world companies, operational roles, and geographic markets that are governed by that distinct concept. This strategy converts a simple reference page into a heavy structural anchor that binds your entire graph together.
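A lexicon entry with its own canonical URL and `DefinedTerm` declaration might be serialized like this. The `/lexicon/` URL pattern, the `LexiconTerm` shape, and the sample definition are assumptions; `DefinedTerm` and `inDefinedTermSet` are schema.org terms.

```typescript
// Hypothetical lexicon record stored alongside other entity types.
type LexiconTerm = {
  slug: string;
  term: string;
  definition: string;
};

// Assumed URL pattern: every term lives under /lexicon/<slug>/.
function lexiconUrl(siteOrigin: string, entry: LexiconTerm): string {
  return `${siteOrigin}/lexicon/${entry.slug}/`;
}

function toDefinedTermJsonLd(siteOrigin: string, entry: LexiconTerm): { [key: string]: unknown } {
  const canonicalUrl = lexiconUrl(siteOrigin, entry);
  return {
    "@context": "https://schema.org",
    "@type": "DefinedTerm",
    "@id": canonicalUrl,
    name: entry.term,
    description: entry.definition,
    url: canonicalUrl,
    // Points at the glossary as a whole, so terms are grouped into one set.
    inDefinedTermSet: `${siteOrigin}/lexicon/`,
  };
}

const sampleTerm: LexiconTerm = {
  slug: "clock-hours",
  term: "Clock hours",
  definition: "Instructional time measured in hours of supervised attendance.",
};

console.log(JSON.stringify(toDefinedTermJsonLd("https://example.org", sampleTerm), null, 2));
```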

#### B2. Convert System Rating Frameworks and Evaluation Metrics into Clear Data Signals

If your site evaluates and ranks external institutions, platforms, or systems, you must abandon arbitrary narrative reviews. Translate your complex internal analytical rubrics into machine-readable data arrays.

Expose your foundational criteria metrics—such as historic success rates, operational cost efficiency, compliance metrics, and infrastructure capability rankings—as distinct data cells printed directly onto the user-facing template.

Simultaneously, output this data through highly structured `Review` and `AggregateRating` schemas, explicitly setting your platform's scoring framework as the definitive `ratingExplanation`. This transforms a subjective opinion into a structured data assertion that can be ingested, analyzed, and indexed as an authoritative fact.
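To illustrate, a weighted rubric can be reduced to one score and emitted as a `Review` whose `Rating` carries the method in `ratingExplanation`. The criteria, weights, 0 to 100 scale, and publisher name below are invented for the example; only the schema.org property names are real.

```typescript
// Hypothetical rubric row: each criterion is scored 0-100 and weighted.
type RubricCriterion = { label: string; weight: number; score: number };

// Weighted mean, rounded to one decimal place.
function weightedScore(criteria: RubricCriterion[]): number {
  let totalWeight = 0;
  let weightedSum = 0;
  criteria.forEach((criterion) => {
    totalWeight += criterion.weight;
    weightedSum += criterion.weight * criterion.score;
  });
  if (totalWeight <= 0) {
    throw new Error("Rubric weights must sum to a positive number");
  }
  return Math.round((weightedSum / totalWeight) * 10) / 10;
}

function toReviewJsonLd(subjectName: string, publisherName: string, criteria: RubricCriterion[]): { [key: string]: unknown } {
  const explanation = criteria
    .map((criterion) => `${criterion.label} (weight ${criterion.weight}): ${criterion.score}`)
    .join("; ");
  return {
    "@context": "https://schema.org",
    "@type": "Review",
    itemReviewed: { "@type": "Organization", name: subjectName },
    author: { "@type": "Organization", name: publisherName },
    reviewRating: {
      "@type": "Rating",
      ratingValue: weightedScore(criteria),
      bestRating: 100,
      worstRating: 0,
      ratingExplanation: `Weighted mean of: ${explanation}`,
    },
  };
}

const sampleRubric: RubricCriterion[] = [
  { label: "Completion rate", weight: 5, score: 80 },
  { label: "Cost efficiency", weight: 3, score: 60 },
  { label: "Compliance record", weight: 2, score: 90 },
];

console.log(weightedScore(sampleRubric)); // (400 + 180 + 180) / 10 = 76
console.log(JSON.stringify(toReviewJsonLd("Acme Training", "Example Rankings", sampleRubric), null, 2));
```

The same `sampleRubric` rows that feed the schema can be printed as visible table cells on the page, which keeps the markup and the user-facing template in agreement.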

#### C1. Deploy Dynamic Relative-Entity Journey Navigation Modules

Incorporate smart context blocks that compute real-time relational connections based on the layout paths of your user base (e.g., *"Users analyzing Enterprise Platform Alpha frequently compare it with System Beta"*).

This approach is executed purely for information utility and navigational clarity. We treat the subsequent algorithm-indexing benefits as a natural consequence of exceptional user interface design, rather than treating the search engine as the primary customer.
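A "frequently compared with" module can be computed from session logs with a simple co-occurrence count. The sketch below assumes sessions are already available as arrays of viewed entity IDs; how they are collected, and whether the counts are refreshed at build time or on a schedule, is left open.

```typescript
type CoViewCount = { entityId: string; sessions: number };

// Count, per other entity, the number of sessions that also viewed focusId.
function topCoViewed(sessionLogs: string[][], focusId: string, limit: number): CoViewCount[] {
  const counts = new Map<string, number>();
  sessionLogs.forEach((viewedIds) => {
    if (viewedIds.indexOf(focusId) === -1) return;
    // Each entity counts once per session, however often it was viewed.
    const seenInSession = new Set<string>();
    viewedIds.forEach((id) => {
      if (id === focusId || seenInSession.has(id)) return;
      seenInSession.add(id);
      counts.set(id, (counts.get(id) || 0) + 1);
    });
  });

  const ranked: CoViewCount[] = [];
  counts.forEach((sessions, entityId) => ranked.push({ entityId, sessions }));
  // Highest count first; ties broken alphabetically for a stable order.
  ranked.sort((a, b) => b.sessions - a.sessions || (a.entityId < b.entityId ? -1 : 1));
  return ranked.slice(0, limit);
}

const sampleSessions: string[][] = [
  ["alpha", "beta", "gamma"],
  ["alpha", "beta"],
  ["beta", "gamma"],
  ["alpha", "alpha", "delta"],
];

console.log(topCoViewed(sampleSessions, "alpha", 2));
```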

## The Industry Consensus: Insights from Semantic Experts

The shift toward a unified information architecture is widely supported by leading search experts and industry analysts. Rather than treating Generative Engine Optimization (GEO) as an entirely new discipline, the consensus views it as a deeper, more rigorous extension of advanced technical SEO ([Search Engine Journal, 2026](#references)).

### The Holistic View: Koray Tuğberk Gübür

Koray Tuğberk Gübür, founder of Holistic SEO & Digital, has long advocated for the integration of [Topical Authority and Semantic SEO](https://www.holisticseo.digital/) frameworks. His research outlines how search engines construct internal concepts of web entities based on structural consistency and semantic logic.

Gübür’s framework emphasizes that search engines do not analyze pages in isolation; they evaluate the comprehensive network density of an entire domain. If a site provides a uniform layout and clear semantic relationships across all its pages, it builds the structural trust required to anchor its industry sector.

### The Structural Focus: Chris Pearson

Chris Pearson, pioneer of semantic web typography and high-performance layout design, emphasizes the critical connection between site architecture, performance, and information delivery. Pearson’s design philosophy focuses on removing unnecessary code bloat to ensure that both human readers and search crawlers can parse a page's core content without friction.

In a search landscape increasingly shaped by AI synthesis, minor layout inefficiencies or rendering delays can prevent content from being indexed. A clean, asset-optimized design is a fundamental requirement for modern information visibility.

### The Agency Perspective: Industry Insights

Leading digital marketing analysts confirm that modern optimization requires shifting focus away from traditional search engine result page (SERP) positions toward broader digital ecosystem tracking. As noted by [Rachel Harvey, SEO Director at Impressive Digital](https://impressive.com.au/google-ai-overview-guidelines-release/), optimization for generative search features relies heavily on maintaining a clean technical foundation and delivering verifiable, firsthand expertise.

The industry is moving toward measuring holistic metrics, including:

* Model Citation Frequency: How often a brand or platform is referenced as a trusted source within generative summaries.
* Share of Voice (SoV) within Synthesized Responses: The percentage of AI-generated answers within a specific industry that include your data points.
* Entity Attribute Confidence: The speed and accuracy with which a machine-learning model maps your brand's core offerings to its internal knowledge graph.
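The first two metrics reduce to simple arithmetic once you have a sample of generated answers and the domains each one cites. The sketch below assumes such a sample already exists as `SampledAnswer` records; collecting it is outside the scope of this example, and the definitions used here are one plausible reading of the metrics rather than an industry standard.

```typescript
// Hypothetical record of one sampled generative answer.
type SampledAnswer = { query: string; citedDomains: string[] };

// Citation frequency: number of sampled answers that cite the domain.
function citationCount(answers: SampledAnswer[], domain: string): number {
  return answers.filter((answer) => answer.citedDomains.indexOf(domain) !== -1).length;
}

// Share of voice: citing answers as a percentage of all sampled answers.
function shareOfVoice(answers: SampledAnswer[], domain: string): number {
  if (answers.length === 0) return 0;
  return (citationCount(answers, domain) / answers.length) * 100;
}

const sampledAnswers: SampledAnswer[] = [
  { query: "best welding programs", citedDomains: ["example.org", "other.example"] },
  { query: "welding certification cost", citedDomains: ["example.org"] },
  { query: "welder salary by state", citedDomains: ["other.example"] },
  { query: "accredited trade schools", citedDomains: ["example.org"] },
];

console.log(citationCount(sampledAnswers, "example.org")); // 3 of 4 answers
console.log(shareOfVoice(sampledAnswers, "example.org")); // 75
```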

The consensus across the industry is clear: the websites that thrive in an AI-driven environment are those built on structured data, verifiable facts, and exceptional performance. By moving away from legacy monolithic platforms and adopting a fast, entity-first architecture, your digital properties remain highly effective components of the modern web ecosystem.

## Architecture-Driven Implementation

Executing this strategy does not require you to constantly build out new application systems. It simply requires you to design three independent projections derived from a single, unchanging data engine tier.

This architecture brings our foundational manifesto principle to life: model your target ecosystem once inside a pristine database layer, and project that model out into whatever specialized formats your targets (human users, system crawlers, language networks) require. Your discovery pipeline is merely the machine-facing expression of your interior database logic.

### Structural Sequence of Implementation

1. [JSON-LD Data Engine Serialization (A2):](#a2-project-the-entity-schema-model-as-a-json-ld-data-layer) Secure your highest concrete technical advantage by transforming your core database arrays into valid, explicit schema.org script models across all templates.
2. [HTML Cross-Linking Projections (A1):](#a1-map-the-entity-graph-into-rendered-html) Programmatically map out related-entity link frameworks across every profile template, utilizing data variables that are already compiled within your system.
3. [Dynamic Sitemap & Threshold Containment (A3):](#a3-hardcode-internal-link-integrity-and-clean-sitemap-discipline) Hardcode crawl-budget optimization loops to ensure your sitemaps prune away low-density and thin programmatic paths automatically.
4. [Lexicon Entity Framework (B1):](#b1-construct-an-industry-lexicon-as-an-autonomous-database-entity-type) Deploy your structural industry glossary module to serve as a collection of relational bridges across disparate database classifications.
5. [Machine-Readable System Rubrics (B2):](#b2-convert-system-rating-frameworks-and-evaluation-metrics-into-clear-data-signals) Structuralize all performance rankings and scoring mechanisms into comparative public data tables.
6. [Relative Navigation Journeys (C1):](#c1-deploy-dynamic-relative-entity-journey-navigation-modules) Inject smart contextual link matrices across template footers to optimize human navigation across highly complex data intersections.

### The Honest Bottom Line

The engineering investments that represent exceptional information architecture—dense relational internal linking, serialized data schemas, and rigorous programmatic crawl hygiene—are fundamentally worth building whether or not an individual search algorithm patent is actively running in production. The strategic moves that lean exclusively on speculative ranking theories are lower-priority tasks.

Build the foundational information data tier first; treat the algorithmic edge as a natural byproduct of great engineering.

A platform designed from the ground up around these structural parameters is uniquely insulated from the chaotic shifts of the modern web landscape because its core domain model is clean, accurate, and completely separate from its presentation layer.

Exposing that internal model to external search engines and large language networks isn't a complex optimization trick; it is simply a matter of technical presentation. The architecture remains bulletproof regardless.

## References

* Ali, A., Tufail, A., De Silva, L. C., & Abas, P. E. (2024). Innovating Patent Retrieval: A Comprehensive Review of Techniques, Trends, and Challenges in Prior Art Searches. *Applied System Innovation*, *7*(5), 91. <https://doi.org/10.3390/asi7050091>
* Bergeaud, A., Potiron, Y., & Raimbault, J. (2017). Classifying patents based on their semantic content. *PLOS ONE*, *12*(4), e0176310. <https://doi.org/10.1371/journal.pone.0176310>
* Google Search Central. (2026). Optimizing your website for generative AI features on Google Search. *Google for Developers*. <https://developers.google.com/search/docs/fundamentals/ai-optimization-guide>
* Search Engine Journal. (2026). Google's New AI Search Guide Calls AEO And GEO 'Still SEO'. *Search Engine Journal*. <https://www.searchenginejournal.com/googles-new-ai-search-guide-calls-aeo-and-geo-still-seo/575026/>
* Siddharth, L., Blessing, L. T. M., Wood, K. L., & Luo, J. (2021). Engineering Knowledge Graph From Patent Database. *Journal of Computing and Information Science in Engineering*, *22*(2). <https://doi.org/10.1115/1.4052293>
* United States Patent and Trademark Office (USPTO). (2026). US Patent No. US12536233B1: AI-generated content page tailored to a specific user. *Google Patents*. <https://patents.google.com/patent/US12536233B1/en>
* United States Patent and Trademark Office (USPTO). (2019). US Patent Application No. US20190251173A1: Knowledge Graph Reconciliation. *Google Patents*. <https://patents.google.com/patent/US20190251173A1/en>
* United States Patent and Trademark Office (USPTO). (2021). US Patent No. US11017304B2: Entity Extraction. *Google Patents*. <https://patents.google.com/patent/US11017304B2/en>
* United States Patent and Trademark Office (USPTO). (2025). US Patent No. US12482215B1: Context-Based Rich Content Directory. *Google Patents*. <https://patents.google.com/patent/US12482215B1/en>
* Xu, J., Yu, C., Xu, J., Ding, Y., Torvik, V. I., Kang, J., Sung, M., & Song, M. (2024). PubMed knowledge graph 2.0: Connecting papers, patents, and clinical trials in biomedical science. *arXiv*. <https://doi.org/10.48550/arxiv.2410.07969>
* Zhang, L., Hu, K., Ma, X., & Sun, X. (2024). Combining Semantic and Structural Features for Reasoning on Patent Knowledge Graphs. *Applied Sciences*, *14*(15), 6807. <https://doi.org/10.3390/app14156807>
* Zuo, H., Yin, Y., & Childs, P. (2022). Patent-KG: Patent Knowledge Graph Extraction for Engineering Design. *Proceedings of the Design Society*, *2*, 821-830. <https://doi.org/10.1017/pds.2022.84>