
LLM Hallucination Rate Evaluation — PatSnap Insights
AI & Engineering Intelligence

Deploying large language models in high-stakes engineering environments demands more than accuracy benchmarks — it requires quantifiable, auditable hallucination rate measurement. This analysis synthesises approximately 60 patent filings from JPMorgan Chase Bank, Vodafone, Microsoft, Google, ABB, and others to map the four principal technical approaches: pre-generation probability scoring, VAE-based runtime detection, composite health frameworks, and domain-specific correction pipelines.

PatSnap Insights Team · Innovation Intelligence Analysts · 12 min read
Reviewed by the PatSnap Insights editorial team

Pre-generation hallucination probability: stopping errors before they start

The most proactive approach to evaluating LLM hallucination rate is to compute a probability of hallucination before the model generates any response at all — enabling hard-gating of high-risk engineering queries rather than post-hoc correction. JPMorgan Chase Bank’s 2025 patent on hallucination probability prediction formalises this: an incoming query is perturbed n times into lexically divergent but semantically equivalent variations; n+1 independent agents then sample outputs for each variant; a statistical simulation algorithm is applied across those sampled outputs; and the resulting empirical expected hallucination rate becomes the ground truth label for training an encoder classifier. The classifier ultimately returns a probability-of-hallucination value before the LLM generates any response — allowing a risk threshold to gate whether the query proceeds to the model or triggers human expert review.
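A minimal sketch of this gating loop, in Python; the `perturb` paraphraser and the `agents` callables are placeholders for real LLM endpoints, and modal-answer disagreement stands in for the filing's statistical simulation step:

```python
from collections import Counter

def estimate_hallucination_rate(query, perturb, agents):
    """Empirical hallucination-rate estimate: perturb the query n times,
    sample an answer from every agent for each variant, and measure
    answer inconsistency across the pool."""
    variants = [query] + perturb(query)          # original + n paraphrases
    answers = [agent(v) for v in variants for agent in agents]
    # Treat the modal answer as the consensus; the share of answers that
    # disagree with it is the empirical hallucination proxy.
    _, consensus_count = Counter(answers).most_common(1)[0]
    return 1.0 - consensus_count / len(answers)

def gate_query(query, perturb, agents, threshold=0.3):
    """Hard-gate: route high-risk queries to human review before any
    production response is generated."""
    p = estimate_hallucination_rate(query, perturb, agents)
    return ("human_review", p) if p > threshold else ("llm", p)
```

In a production pipeline the empirical rates from this loop would become the supervised labels for the encoder classifier, so the expensive multi-agent sampling runs only at training time.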

~60 patent filings on LLM hallucination detection (US, EP, GB, WO, CN, KR, JP+)
3 jurisdictions covered by Vodafone’s VAE-based RAG hallucination detection patent
4 principal technical themes across the patent dataset
n+1 independent agents used in JPMorgan’s pre-generation perturbation framework

The companion JPMorgan filing (2026) formalises the training pipeline: a plurality of LLMs perturb training queries n times, generating perturbed outputs whose consistency is measured via computational statistical simulation to derive empirical probability estimations that become supervised training labels. The statistical robustness of using Monte Carlo-style sampling across multiple agent outputs — rather than a single confidence score derived from token probabilities — makes this approach particularly defensible in engineering audit trails where probabilistic traceability is required by regulators or quality management systems.

JPMorgan Chase Bank’s pre-generation hallucination system perturbs an incoming query n times into semantically equivalent variations, deploys n+1 independent agents to sample outputs for each variant, and applies statistical simulation to derive an empirical hallucination probability before the LLM generates any response — enabling hard-gating of high-risk engineering queries.

Microsoft Technology Licensing’s 2025 forward-backward traversal method complements query-perturbation approaches with a geometric consistency check. A primary forward prompt yields a primary answer; backward traversals — using answer-question pairs with the primary answer embedded but the primary question withheld — generate candidate questions. A vector distance between candidate question embeddings and the primary question embedding serves as a hallucination indicator. The method is model-agnostic and tolerant of varying temperature and sampling parameters (top-p, top-k), so it can probe response consistency across the stochastic conditions that approximate real engineering query variance. Because it requires no internal model access, it also suits the third-party LLM integrations common in engineering decision support platforms.
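The distance check can be sketched as follows, with a toy bag-of-words embedding standing in for a real sentence encoder; the candidate questions are assumed to come from the backward traversals:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding standing in for a real sentence encoder."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (na * nb) if na and nb else 1.0

def forward_backward_score(primary_question, candidate_questions):
    """Mean embedding distance between the back-generated candidate
    questions and the primary question; a larger distance is a
    stronger hallucination signal."""
    pq = embed(primary_question)
    dists = [cosine_distance(pq, embed(c)) for c in candidate_questions]
    return sum(dists) / len(dists)
```

Averaging over several candidates sampled at different temperature and top-p settings is what gives the check its tolerance to stochastic decoding conditions.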

What is pre-generation hallucination probability estimation?

Pre-generation hallucination probability estimation is a class of techniques that compute a risk score for a query before the LLM generates any response. Rather than detecting errors after output, these methods use query perturbation, multi-agent sampling, and statistical simulation to estimate the likelihood that a given input will elicit a hallucinated response — enabling threshold-based gating in high-stakes engineering workflows.

Figure 1 — Pre-generation hallucination probability estimation: process flow
[Figure 1 diagram] Process flow: Incoming Query → n Query Perturbations → n+1 Agent Sampling → Statistical Simulation → Hallucination Probability
JPMorgan Chase Bank’s patented pipeline estimates hallucination probability before generation by perturbing the query n times, sampling across n+1 agents, and applying statistical simulation — producing a risk score that can gate whether a query proceeds to the LLM.

SRI International’s 2025 hallucination prevention system addresses a different temporal moment: inline generation. The system monitors token-by-token generation uncertainty against a predetermined threshold and injects “think tokens” — additional computation prompts — whenever generated tokens exhibit uncertainty exceeding expected bounds. This mechanism operates during streaming rather than post-hoc, making it applicable to engineering decision interfaces where latency constraints prevent full-output analysis. According to NIST’s AI Risk Management Framework, inline uncertainty monitoring of this kind aligns with the “Govern” and “Manage” functions of responsible AI deployment.
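A minimal sketch of the streaming guard, assuming a hypothetical API that yields each emitted token alongside its next-token probability distribution; the entropy measure and the `<think>` marker are illustrative stand-ins for the filing's uncertainty metric and think tokens:

```python
import math

THINK_TOKEN = "<think>"   # hypothetical extra-computation marker

def token_entropy(probs):
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def stream_with_think_tokens(steps, threshold=1.0):
    """Inject a think token whenever next-token uncertainty exceeds the
    predetermined threshold.  `steps` is an iterable of
    (token, next_token_probs) pairs from a hypothetical streaming API."""
    out = []
    for token, probs in steps:
        if token_entropy(probs) > threshold:
            out.append(THINK_TOKEN)   # prompt extra computation inline
        out.append(token)
    return out
```

Because the check runs per token, the added latency is bounded and constant, which is what makes it viable for streaming interfaces.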

Runtime detection in RAG-enhanced LLMs: catching misalignment at inference

Retrieval-Augmented Generation (RAG) systems do not eliminate hallucination in large language models — they shift the hallucination signature from outright fabrication to subtle misalignment between retrieved evidence and generated assertion. Vodafone Group Services Limited’s EP filing (2026) addresses this directly: an LLM output vector is fed into the encoder portion of a Variational Autoencoder (VAE), which maps the output into a dimensionally reduced latent space distribution. The VAE is pre-trained on labeled datasets of normal versus hallucination outputs, enabling it to compute a likelihood metric for whether any new output vector deviates from the characteristic distribution of factually grounded responses — without requiring ground truth at inference time.
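The scoring side of this idea can be sketched as follows; the trained encoder itself is omitted (assume `z` is the latent code it produced for an output vector), and a diagonal-Gaussian log-likelihood stands in for the filing's likelihood metric:

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def latent_log_likelihood(z, means, variances):
    """Log-likelihood of a latent code under the diagonal-Gaussian latent
    distribution the VAE's encoder learned for factually grounded outputs."""
    return sum(gaussian_logpdf(zi, m, v)
               for zi, m, v in zip(z, means, variances))

def is_candidate_hallucination(z, means, variances, threshold):
    """Flag outputs whose latent codes are unlikely under the 'normal'
    distribution; no ground truth is needed at inference time."""
    return latent_log_likelihood(z, means, variances) < threshold
```

The threshold would be set from the labeled normal-versus-hallucination training data, trading false alarms against missed detections for the criticality level of the pipeline.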

Vodafone Group Services Limited’s VAE-based hallucination detection patent, filed across EP, US, and GB jurisdictions (2026), maps LLM output vectors into a dimensionally reduced latent space to compute a likelihood metric distinguishing hallucinated from factually grounded responses in RAG-enhanced LLMs — without requiring ground truth at inference time.

The GB filing explicitly notes that a detected candidate hallucination may cause the output to be discarded, the user to be alerted, or a revised prompt to be generated automatically — three distinct response modes that engineering decision pipelines can select based on criticality level. Vodafone’s parallel filings across EP, US, and GB indicate a coherent global IP strategy for closed-domain RAG hallucination detection, suggesting the organisation views this capability as a core defensible asset.

“VAE-based latent space analysis provides scalable runtime hallucination detection for RAG systems without requiring ground truth at inference time — essential for domains lacking comprehensive reference corpora.”

For industrial asset management specifically, ABB Switzerland’s CN filing (2025) introduces a verification plan methodology: after an LLM returns an answer about an industrial asset, a set of follow-up verification questions is constructed based on the technical context, and the degree to which the LLM’s answers to verification questions align with expected answers constitutes a confidence metric. The patent draws an analogy to forensic interrogation — consistent fabrication across multiple cross-questions is difficult to maintain, so inconsistency in follow-up responses signals hallucination. This approach is particularly suited to process engineering and asset maintenance contexts where domain-grounded expected answers can be pre-established, and it does not require labeled training data for each asset type.
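A minimal sketch of the verification-plan scoring, assuming exact-match comparison against pre-established expected answers (a production system would use semantic rather than string matching):

```python
def verification_confidence(llm, verification_plan):
    """Cross-question confidence metric: ask each follow-up verification
    question and score agreement with the pre-established expected answer.
    `llm` is any callable returning the model's answer as a string."""
    matches = sum(
        1 for question, expected in verification_plan.items()
        if llm(question).strip().lower() == expected.strip().lower()
    )
    return matches / len(verification_plan)
```

A low score signals the interrogation-style inconsistency the patent describes: a fabricated primary answer rarely survives several grounded cross-questions.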

Explore the full patent landscape for LLM hallucination detection in PatSnap Eureka — filter by assignee, jurisdiction, and filing date.

Explore Full Patent Data in PatSnap Eureka →
Figure 2 — LLM hallucination detection approaches: patent count by technical theme and key assignee
[Figure 2 chart] Patent counts by technical theme: Pre-generation Estimation (2), VAE Runtime Detection (3), Composite Eval Frameworks (5), Domain-specific Correction (3)
Patent counts by technical theme across the dataset’s principal approaches — composite evaluation frameworks (including LTI MindTree, Accenture, ServiceNow, BMC, and Oracle filings) represent the largest cluster, reflecting the governance imperative in regulated engineering environments.

Google LLC’s parallel US and WO filings (2025) take a generative-correction approach: if a first response is detected to contain hallucination, a second response is generated and checked, with only the verified non-hallucinated response rendered to the client. Google also filed on monitoring generative model quality using an expert system to benchmark LLM output quality against modified model versions, incorporating backstop prompts that a model must answer acceptably before production clearance — a pattern directly analogous to qualification testing in engineering certification processes. Standards bodies such as ISO and IEEE are increasingly examining how such iterative verification patterns map onto existing software quality assurance frameworks.
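The regenerate-and-verify loop reduces to a few lines; `generate` and `detect` are placeholders for the model call and whatever hallucination detector is in use:

```python
def respond_without_hallucination(generate, detect, max_attempts=3):
    """Detect-and-regenerate loop: only a response that passes the
    hallucination check is rendered to the client; otherwise a fresh
    response is generated, up to max_attempts times."""
    for attempt in range(max_attempts):
        response = generate(attempt)
        if not detect(response):
            return response
    return None   # escalate: no verified response was produced
```

Returning `None` after the attempt budget is exhausted is the point where a qualification-style pipeline would escalate to human review rather than render an unverified answer.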

Key finding: RAG does not eliminate hallucination

Retrieval-Augmented Generation shifts the hallucination signature from fabrication to subtle misalignment between retrieved evidence and generated assertion. Vodafone’s VAE-based detection patent demonstrates that dimensional reduction of output vectors into a trained latent space can distinguish hallucination from normal outputs without requiring ground truth at inference time — essential for engineering domains lacking comprehensive reference corpora.

End-to-end evaluation frameworks and composite health scoring for regulated engineering environments

Evaluating hallucination rate for high-stakes deployment requires more than binary detection — it requires calibrated, multi-dimensional scoring that can serve as an operational quality gate with auditable thresholds. LTI MindTree Ltd.’s 2025 end-to-end LLM evaluation system evaluates both input prompts and output responses across multiple characteristics encompassing quality and quantity dimensions. Each input characteristic is assigned a normalized score via statistical techniques to derive a composite health score; outputs are evaluated both with and without ground truth references. A scorer module employing threshold-based statistical techniques aggregates input prompt health and output prompt response health into a final LLM health score — allowing organizations to set engineering-specific acceptance thresholds below which an LLM version is not cleared for production use.
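A simplified sketch of the composite scoring, assuming min-max normalization and a weighted mean as the aggregation rule (the patent leaves the exact statistical techniques open):

```python
def normalize(value, lo, hi):
    """Min-max normalize one characteristic score into [0, 1]."""
    return max(0.0, min(1.0, (value - lo) / (hi - lo)))

def health_score(characteristics, weights=None):
    """Weighted mean of normalized characteristic scores.
    `characteristics` maps name -> (raw_value, lo, hi)."""
    weights = weights or {name: 1.0 for name in characteristics}
    total = sum(weights.values())
    return sum(w * normalize(*characteristics[n])
               for n, w in weights.items()) / total

def clear_for_production(input_health, output_health, threshold=0.8):
    """Gate: Score A (input) and Score B (output) must jointly clear the
    engineering-specific acceptance threshold."""
    return (input_health + output_health) / 2 >= threshold
```

The acceptance threshold is the auditable artifact: an organization can document why a given LLM version was or was not cleared for production.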

LTI MindTree Ltd.’s end-to-end LLM evaluation system (2025) aggregates normalized scores across multiple input prompt and output response characteristics into a composite LLM health score, enabling organizations to set engineering-specific acceptance thresholds below which an LLM version is not cleared for production use in regulated environments such as aerospace, energy infrastructure, or pharmaceuticals.

Accenture Global Solutions’ 2026 Responsible AI Operations (RAIOPS) evaluation method frames hallucination evaluation within a broader governance paradigm: prompts and responses are stored as associations, and user-specified evaluation criteria drive the computation of evaluation metrics. Results are visualized as knowledge graph representations or numerical scores indicating whether the LLM needs optimization or tuning — a governance-oriented feedback loop directly applicable to engineering decision-support system certification processes. This approach aligns with the accountability principles articulated in the OECD’s AI Principles, which emphasize transparency and human oversight in high-stakes AI deployments.

BMC Software’s 2024 domain-specific hallucination detection pipeline demonstrates per-assertion reliability quantification: a domain-specific ML model trained on resolved incident tickets assigns a hallucination score to each resolution statement by cross-referencing it against source worklog data or training data. Hallucinated content is flagged and removed before the resolution is finalized. This exemplifies how hallucination rate can be estimated at the assertion level within a structured engineering artifact — an incident ticket — rather than at the output level only, providing more actionable reliability signals for engineering quality assurance teams.
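A toy sketch of per-assertion filtering; simple term overlap with the worklog stands in for BMC's trained domain-specific scoring model:

```python
def assertion_score(statement, source_terms):
    """Fraction of a statement's terms grounded in the source worklog;
    a toy stand-in for a trained domain-specific scoring model."""
    terms = statement.lower().split()
    return sum(1 for t in terms if t in source_terms) / len(terms)

def filter_resolution(statements, worklog, threshold=0.5):
    """Flag and drop assertions that are insufficiently grounded in the
    worklog before the resolution is finalized."""
    source_terms = set(worklog.lower().split())
    return [s for s in statements
            if assertion_score(s, source_terms) >= threshold]
```

Scoring at the assertion level lets a quality team keep the grounded parts of a generated resolution instead of discarding the whole output.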

BMC Software’s domain-specific hallucination detection pipeline (2024) assigns a hallucination score to each resolution statement in an incident ticket by cross-referencing it against source worklog data, enabling per-assertion reliability quantification within structured engineering artifacts rather than at the output level only.

ServiceNow’s 2025 Framework for Trustworthy Generative Artificial Intelligence generalises this pattern: a validation model configured to detect a specific fault property in an LLM output computes a likelihood metric; if the metric exceeds a fault threshold, the output is labeled untrustworthy. The architecture supports real-time computation of metrics by pre-processing modules, which is essential for maintaining responsive engineering advisory systems without sacrificing trust assurance. NEC Laboratories Europe’s 2026 filing extends this to computational biology and medical AI contexts, using attribution links between text spans to identify hallucination candidates — a technique transferable to engineering documentation analysis where traceability between claim and source is a compliance requirement.
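The fault-threshold gate generalises to a small dispatcher; each validator pairs a detector for one fault property with its fault threshold (the detectors here are illustrative):

```python
def label_output(output, validators):
    """Trust gate: each validator is a (detector, fault_threshold) pair
    for one fault property; any likelihood metric that exceeds its
    threshold marks the output untrustworthy."""
    for detect, fault_threshold in validators:
        if detect(output) > fault_threshold:
            return "untrustworthy"
    return "trusted"
```

Because each detector is independent, pre-processing modules can compute the metrics in parallel, which is how the architecture keeps the check real-time.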

Use PatSnap Eureka to analyse composite LLM evaluation patents by assignee, claim depth, and jurisdiction.

Analyse Patents with PatSnap Eureka →
Figure 3 — Composite LLM health score architecture: input and output evaluation dimensions
[Figure 3 diagram] Input Prompt Health (quality and quantity dimensions; normalized scoring per characteristic) yields Score A; Output Response Health (evaluated with and without ground truth; hallucination flag per assertion) yields Score B; a Scorer Module applies threshold-based statistical aggregation to produce the Final LLM Health Score, with an acceptance threshold gating production deployment.
LTI MindTree Ltd.’s end-to-end LLM evaluation architecture aggregates input prompt health (Score A) and output response health (Score B) through a scorer module into a final LLM health score — enabling organizations to set engineering-specific acceptance thresholds for production clearance.

Microsoft Technology Licensing’s 2025 calibrated confidence estimation filing adds a complementary dimension: description-based and cause-based confidence scores are calibrated using historical event data in a target domain, making them directly applicable to engineering root-cause analysis. Oracle International Corporation’s 2026 machine learning traceback-enabled decision rationale patent emphasizes explainability and traceability of AI-driven decisions — critical requirements for engineering audit and compliance that align with guidance from bodies such as WIPO on AI transparency in industrial innovation contexts.
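One simple way to realize calibration against historical event data is histogram binning, sketched here under the assumption that the history is a list of (raw confidence, was_correct) pairs from the target domain:

```python
def fit_calibrator(history, n_bins=5):
    """Histogram-binning calibration: map raw confidence bins to the
    empirical correctness rate observed in historical (confidence,
    was_correct) events from the target domain."""
    bins = [[] for _ in range(n_bins)]
    for conf, correct in history:
        bins[min(int(conf * n_bins), n_bins - 1)].append(correct)
    rates = [sum(b) / len(b) if b else None for b in bins]

    def calibrate(conf):
        rate = rates[min(int(conf * n_bins), n_bins - 1)]
        return rate if rate is not None else conf   # raw score if bin is empty
    return calibrate
```

The calibrated value answers the question an auditor actually asks: of past answers the model rated this confidently, what fraction turned out to be correct?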

Who is patenting LLM hallucination detection: key assignees and innovation trends across ~60 filings

The patent dataset encompasses approximately 60 filings and pending applications across US, EP, GB, WO, CN, KR, JP, and other jurisdictions — with dominant assignees spanning financial services, telecommunications, enterprise software, and industrial automation. This breadth of sectors reflects growing urgency to deploy trustworthy LLMs in mission-critical decision support, and the filing patterns reveal distinct technical strategies by organisation.

JPMorgan Chase Bank leads in pre-generation hallucination probability estimation, filing both the system-level patent and the encoder training methodology. Their approach of multi-agent query perturbation with statistical simulation represents the most technically rigorous pre-generation framework in the dataset. JPMorgan also addresses code generation hallucination via guardrails in a separate 2025 filing on improving code generation quality through code guardrails.

Vodafone Group Services Limited pursues a VAE-based runtime detection architecture across three jurisdictions (EP, US, GB), demonstrating a coherent global IP strategy for closed-domain RAG hallucination detection. Microsoft Technology Licensing, LLC contributes both the forward-backward hallucination detection technique and a calibrated confidence estimation filing — covering model-agnostic and domain-calibrated approaches respectively. Adobe Inc. contributes three filings across 2024, 2025, and 2026 on template-based hallucination prevention focused on factual consistency checking against structured templates — a method extensible to engineering specification documents.

Oracle International Corporation addresses both machine learning traceback-enabled decision rationales and responding to hallucinations in generative LLMs, emphasizing explainability and traceability. NEC Laboratories Europe discloses explainer, output verification, and hallucination correction for LLMs with explicit application to computational biology and medical AI — using attribution links between text spans to identify hallucination candidates. Google LLC holds two parallel filings on iterative hallucination detection-and-regeneration, plus a generative model quality monitoring filing using expert system benchmarking with backstop prompts.

The LLM hallucination detection patent dataset encompasses approximately 60 filings across US, EP, GB, WO, CN, KR, JP, and other jurisdictions, with dominant assignees including JPMorgan Chase Bank, Google LLC, Vodafone Group Services Limited, Microsoft Technology Licensing LLC, Oracle International Corporation, Adobe Inc., LTI MindTree Ltd., Accenture Global Solutions, BMC Software, ServiceNow, and NEC Laboratories Europe — spanning financial services, telecommunications, enterprise software, and industrial automation sectors.

The clustering of assignees across these four sectors — financial services, telecommunications, enterprise software, and industrial automation — signals that hallucination rate evaluation is no longer a research-stage concern. It is an active IP battleground where organisations are seeking defensible technical positions before regulatory frameworks for AI in high-stakes engineering contexts are formalised. Engineering leaders evaluating LLM deployment should monitor this patent landscape as a leading indicator of which technical approaches are gaining commercial confidence. PatSnap’s IP intelligence platform and R&D analytics tools provide structured access to this landscape for technology scouting and freedom-to-operate analysis.


References

  1. System and method for implementing a model that predicts the probability of hallucination for any query imposed to an LLM — JPMorgan Chase Bank, 2025
  2. Method and system of training an encoder classifier model in predicting hallucination of a machine learning (ML) model before a generation of a query — JPMorgan Chase Bank, 2026
  3. Detecting candidate hallucinations in outputs of a retrieval-augmented generation enhanced large language model (EP) — Vodafone Group Services Limited, 2026
  4. Detecting candidate hallucinations in outputs of a retrieval-augmented generation enhanced large language model (US) — Vodafone Group Services Limited, 2026
  5. Detecting candidate hallucinations in outputs of a retrieval-augmented generation enhanced large language model (GB) — Vodafone Group Services Limited, 2026
  6. Language model hallucination detection — Microsoft Technology Licensing, LLC, 2025
  7. Detection of hallucinations in large language model responses (US) — Google LLC, 2025
  8. Detection of hallucinations in large language model responses (WO) — Google LLC, 2025
  9. System and method for preventing hallucinations — SRI International, 2025
  10. Method and system for performing end-to-end evaluation of a large language model (LLM) — LTI MindTree Ltd., 2025
  11. Method and system for evaluating integration of responsible AI with LLM operations — Accenture Global Solutions Limited, 2026
  12. Domain-specific hallucination detection and correction for machine learning models — BMC Software, Inc., 2024
  13. Framework for Trustworthy Generative Artificial Intelligence — ServiceNow, Inc., 2025
  14. Explainer, output verification, and hallucination correction for output of large language models — NEC Laboratories Europe GmbH, 2026
  15. Producing calibrated confidence estimates for open-ended answers by generative artificial intelligence models — Microsoft Technology Licensing, LLC, 2025
  16. Responding to hallucinations in generative large language models — Oracle International Corporation, 2025
  17. Machine learning traceback-enabled decision rationales as models for explainability — Oracle International Corporation, 2026
  18. Information retrieval from LLM with reduced hallucination for industrial applications — ABB Switzerland, 2025
  19. Hallucination prevention for natural language insights — Adobe Inc., 2024
  20. Hallucination prevention for natural language insights — Adobe Inc., 2025
  21. Hallucination prevention for natural language insights — Adobe Inc., 2026
  22. Method and system for improving code generation quality of large language model through code guardrails — JPMorgan Chase Bank, 2025
  23. NIST AI Risk Management Framework — National Institute of Standards and Technology
  24. OECD AI Principles — Organisation for Economic Co-operation and Development
  25. ISO/IEC AI Standards — International Organization for Standardization
  26. IEEE Standards for AI — Institute of Electrical and Electronics Engineers
  27. WIPO — World Intellectual Property Organization (AI and IP transparency guidance)

All data and statistics in this article are sourced from the references above and from PatSnap’s proprietary innovation intelligence platform.
