PatentBench-Novelty Search

Understanding Novelty Search

Novelty search is a key patent task that involves systematically identifying prior art worldwide to determine whether a technical solution is new and inventive under patent law.

It plays a critical role throughout the innovation process, including:

R&D planning: guiding the direction and feasibility of new developments

Pre-filing: verifying that an invention is patentable before submission

Patent examination: helping examiners assess the novelty of applications

Key Findings

This benchmark evaluates six AI tools for patent novelty search: Patsnap's Novelty Search AI Agent, Claude Opus 4.8 (with web search), Perplexity Pro (with web search), ChatGPT 5.4 (with web search), Gemini 3.1 Pro (with web search), and DeepSeek 3.2 (with web search).

The evaluation uses a curated cross-jurisdiction patent family dataset of 340 test samples. Each sample contains a problem statement and a standard answer: the family set of X references cited by examiners across different patent offices. This design creates a practical benchmark answer that closely reflects real-world novelty search requirements.

Benchmark results show that Patsnap's Novelty Search AI Agent achieved an 85% X Hit Rate and a 37% X Recall Rate within the top 100 results, outperforming the general-purpose AI tools tested in this benchmark. Claude Opus 4.8 ranked second, with a 52.37% X Hit Rate and an 11.68% X Recall Rate.

The evaluation dataset is evenly distributed across IPC classifications, covering both mainstream technologies and niche domains. In terms of language, 68% of the data is in English and 32% is in Chinese, so the benchmark tests how well each tool handles multilingual patent content.

For receiving-office distribution, applications from the United States (US) and China (CN) each make up about 32%, while those from the European Patent Office (EP) and WIPO (WO) each account for roughly 18%. This balanced mix reflects the different examination styles across major patent jurisdictions and makes the evaluation more realistic and globally representative.

Language distribution of patent texts in 340 test samples

Distribution of IPC samples across 340 test samples

Note: Percentages may not sum to 100% due to rounding to one decimal place.

Distribution of receiving offices for 340 test samples

1) X Hit Rate

Patsnap’s Novelty Search AI Agent identified at least one relevant X document in 85% of test cases, an essential capability for speeding up decision-making in patent examination and early-stage R&D.

2) X Recall Rate

Patsnap’s Novelty Search AI Agent retrieved 37% of all relevant X documents, enabling more thorough analysis and more informed patent claim drafting.

A high X Recall Rate is key during R&D planning and before filing a patent. Patsnap’s Novelty Search AI Agent helps teams (in-house researchers, patent professionals, or external agents) find more relevant X documents. This supports better technical decisions and stronger patent claims, increasing the chances of patent approval.

3) Typical Test Result Sample

In this test, the patent specification, or problem statement, was submitted to each AI tool. The returned results were evaluated against a predefined standard-answer set of X references.

The sample below shows how the benchmark evaluates the returned patent references. Green references are hits in the standard-answer family set, and the bottom rows calculate X Hit Rate and X Recall Rate directly from those hits.

A single-sample benchmark test

In this sample, Patsnap's Novelty Search AI Agent identified three of the four relevant patent families, achieving a 100% X Hit Rate and a 75% X Recall Rate. Claude Opus 4.8 returned eight results in this sample; its two hits appear at positions 3 and 8, identifying two of the four relevant families and achieving a 100% X Hit Rate and a 50% X Recall Rate for this sample. Perplexity Pro and DeepSeek 3.2 each identified one relevant family, while ChatGPT 5.4 and Gemini 3.1 Pro did not identify a relevant family in this example.
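The per-sample arithmetic above can be reproduced with a minimal sketch. It assumes that a sample counts as a hit when at least one returned family is in the standard-answer set, and that recall is the share of answer families returned. The family identifiers and the function name `sample_metrics` are hypothetical placeholders, not data or code from the benchmark.

```python
def sample_metrics(returned_families, answer_families):
    """Return (hit, recall) for one test sample.

    hit    -> 1.0 if at least one returned family is in the answer set
    recall -> fraction of answer families that were returned
    """
    answer = set(answer_families)
    found = answer.intersection(returned_families)
    hit = 1.0 if found else 0.0
    recall = len(found) / len(answer) if answer else 0.0
    return hit, recall


# Hypothetical standard answer with four relevant patent families.
answer = {"FAM-A", "FAM-B", "FAM-C", "FAM-D"}

# A tool that returns three of the four relevant families.
three_of_four = ["FAM-A", "OTHER-1", "FAM-B", "FAM-C"]

# A tool that returns eight results with hits at positions 3 and 8.
two_of_four = ["OTHER-1", "OTHER-2", "FAM-A", "OTHER-3",
               "OTHER-4", "OTHER-5", "OTHER-6", "FAM-B"]

print(sample_metrics(three_of_four, answer))  # (1.0, 0.75)
print(sample_metrics(two_of_four, answer))    # (1.0, 0.5)
print(sample_metrics(["OTHER-1"], answer))    # (0.0, 0.0)
```

The three calls correspond to the 100%/75%, 100%/50%, and no-hit cases described for this sample.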

These sample-level results are illustrative. The overall benchmark conclusions are based on the full dataset of 340 cross-jurisdiction patent family samples.

Future Research

Future benchmarks will continue to expand the dataset and refine the evaluation methods for greater accuracy, coverage, and representativeness. As the dataset grows, the benchmark will provide a more robust view of how AI tools perform in professional patent novelty search.

Methodology

The Patsnap PatentBench Novelty Search benchmark encompasses four key dimensions:

1) Test samples to establish the “benchmark value”

Each test sample is a basic unit in the benchmark. It consists of a problem statement and a standard answer. Because patent tasks rarely have perfect one-line answers, domain experts construct standard-answer sets that closely approximate the ideal outcome for professional novelty search.

For this benchmark, X and Y references cited by examiners at different receiving offices were collected, deduplicated, and normalized by patent family. This creates a reusable reference set for evaluation and comparison.
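As a rough illustration of the collect, deduplicate, and normalize step, the sketch below groups examiner citations by patent family. The record layout, the publication numbers, the `family_of` lookup table, and the function name `build_answer_set` are all assumptions made for the example; the benchmark's actual pipeline is not described in this report.

```python
def build_answer_set(citations, family_of):
    """Collapse examiner citations into {family_id: sorted citation categories}.

    Citations whose publication has no known family are skipped
    (an assumption of this sketch, not a documented rule).
    """
    families = {}
    for citation in citations:
        family_id = family_of.get(citation["publication"])
        if family_id is None:
            continue
        families.setdefault(family_id, set()).add(citation["category"])
    return {fam: sorted(cats) for fam, cats in families.items()}


# Hypothetical citations from different receiving offices.
citations = [
    {"office": "US", "publication": "US-0000001-A1", "category": "X"},
    {"office": "EP", "publication": "EP-0000002-A1", "category": "X"},
    {"office": "CN", "publication": "CN-0000003-A", "category": "Y"},
    {"office": "US", "publication": "US-0000001-A1", "category": "X"},  # duplicate
]

# Hypothetical publication-to-family lookup.
family_of = {
    "US-0000001-A1": "FAM-1",
    "EP-0000002-A1": "FAM-1",  # same family as the US publication
    "CN-0000003-A": "FAM-2",
}

print(build_answer_set(citations, family_of))
# {'FAM-1': ['X'], 'FAM-2': ['Y']}
```

Normalizing to the family level means a tool is credited for finding any member of a cited family, regardless of which office's publication it returns.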

2) Datasets to create reliable and unbiased test results

The dataset contains 340 carefully selected cross-jurisdiction patent family samples. The samples are controlled for language and IPC distribution, helping the benchmark reflect real-world diversity across patent texts and technical fields.

3) Evaluation metrics - the core of benchmarking

Evaluation metrics are used to measure and compare performance. They can be single or combined indicators, carefully designed by experts to reflect the practical needs of patent professionals.

Novelty search aims to identify relevant prior art to determine whether a patent claim is truly new. It follows the general principles of traditional search logic, but with a specialized focus on patent validity.

The Patsnap PatentBench uses the following indicators to measure the quality of search results:

X Hit Rate

Proportion of samples where a correct answer (a patent family in the standard-answer set) appears among the top 1, 3, or 5 results.

X Recall Rate

Percentage of correct answers found within the top 100 results.
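The two indicators can be sketched over a whole dataset as follows. This is a minimal illustration under stated assumptions: each sample is a pair of a ranked result list and a set of answer families, and X Recall Rate is averaged per sample (the report does not say whether recall is averaged per sample or pooled across all samples). The function names and sample data are hypothetical.

```python
def x_hit_rate(samples, k):
    """Share of samples with at least one answer family in the top k results."""
    hits = sum(
        1 for ranked, answer in samples
        if set(ranked[:k]) & set(answer)
    )
    return hits / len(samples)


def x_recall_rate(samples, k=100):
    """Mean per-sample share of answer families found in the top k results."""
    per_sample = [
        len(set(ranked[:k]) & set(answer)) / len(answer)
        for ranked, answer in samples
    ]
    return sum(per_sample) / len(per_sample)


# Hypothetical samples: (ranked results, standard-answer families).
samples = [
    (["F1", "F9", "F2"], {"F1", "F2", "F3", "F4"}),  # hit at rank 1, recall 2/4
    (["F8", "F7", "F5"], {"F5"}),                    # hit at rank 3, recall 1/1
    (["F6"], {"F0", "F10"}),                         # no hit, recall 0/2
]

for k in (1, 3, 5):
    print(f"X Hit Rate @ top {k}: {x_hit_rate(samples, k):.2f}")
print(f"X Recall Rate @ top 100: {x_recall_rate(samples):.2f}")
```

With these three samples, one has a hit in the top 1 and two have a hit in the top 3, while the mean recall is (0.5 + 1.0 + 0.0) / 3 = 0.5.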

4) Comparison of AI tools with industry experts

AI tools are designed to support and enhance the work of professionals, so benchmark comparisons should include both specialized AI agents and general-purpose models. This report compares Patsnap's Novelty Search AI Agent with five general-purpose AI tools that support web search.
