Overview
The Patsnap PatentBench is a benchmark specifically for novelty search tasks in real-world patent scenarios.
It evaluates the performance of three AI tools: Patsnap’s Novelty Search AI Agent, ChatGPT-o3 (with web search), and DeepSeek-R1 (with web search).
The benchmark is based on 89 test samples, each consisting of a “test question” and a “standard answer”: a curated set of X documents (prior art citations that examiners judge to prejudice novelty or inventive step on their own) from various patent offices, closely representing the ideal references used in actual novelty searches.
Understanding Novelty Search
Novelty search is a key patent task that involves systematically identifying prior art worldwide to determine whether a technical solution is new and inventive under patent law.
It plays a critical role throughout the innovation process, including:
R&D planning: guiding the direction and feasibility of new developments
Pre-filing: verifying that an invention is patentable before submission
Patent examination: helping examiners assess the novelty of applications
Background of the Patsnap PatentBench
The Patsnap PatentBench–Novelty Search benchmark encompasses four key steps:
1) Evaluation Dataset Design
Evaluating novelty search objectively demands standardized metrics grounded in global patent examination practice. To that end, we built a high-quality benchmarking dataset from international patent families undergoing parallel examination across multiple patent offices.
The process begins with a preliminary screening focused on the comparability of claim texts among candidate family members. Leveraging our proprietary claims consistency alignment model, we perform semantic alignment and assess technical similarity across claims. A second round of refined filtering ensures textual consistency, effectively eliminating the “noise” introduced by linguistic variations.
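Patsnap’s claims consistency alignment model is proprietary, but the underlying idea of cross-lingual semantic alignment can be sketched with an off-the-shelf multilingual embedding model. The model choice and the similarity threshold below are illustrative assumptions, not the benchmark’s actual parameters:

```python
# Minimal sketch of claim alignment via embedding similarity. Patsnap's
# model is proprietary; this only illustrates the general idea. The model
# name and the 0.85 threshold are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def claims_are_consistent(claim_a: str, claim_b: str, threshold: float = 0.85) -> bool:
    """True if two claim texts (possibly in different languages) are close
    enough semantically to treat the family members as the same invention."""
    embeddings = model.encode([claim_a, claim_b], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item() >= threshold
```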
Next, we identify the X prior art references—those actually cited by patent examiners when assessing novelty and inventive step—as the gold standard for evaluation. These references are then deduplicated, standardized, and integrated, with harmonized document identifiers, citation formats, and office attributions. The result is a consistent, reusable reference set for benchmarking and comparative analysis.
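The exact harmonization rules are not published; as a rough sketch, identifier harmonization and deduplication might look like the following (the normalization rules here are simplified assumptions):

```python
import re

def normalize_pub_number(raw: str) -> str:
    """Collapse formatting differences so that e.g. 'EP 1 234 567 A1' and
    'EP1234567A1' map to the same identifier. Simplified rules; the
    benchmark's actual harmonization logic is not published."""
    compact = re.sub(r"[\s\-/,.]", "", raw.upper())
    # Drop the kind code (a trailing letter plus optional digit) so different
    # publication stages of the same document deduplicate together.
    return re.sub(r"[A-Z]\d?$", "", compact)

def build_reference_set(citations: list[dict]) -> dict[str, dict]:
    """Merge examiner citations from multiple offices into one deduplicated
    reference set keyed by the harmonized identifier."""
    merged: dict[str, dict] = {}
    for cite in citations:
        key = normalize_pub_number(cite["pub_number"])
        entry = merged.setdefault(key, {"id": key, "offices": set()})
        entry["offices"].add(cite["office"])
    return merged
```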
Throughout the dataset construction process, we adhere to the principle of minimal disclosure while implementing rigorous quality control and consistency checks. This ensures that the resulting sample set is representative, stable, well-distributed, and fair—providing a reliable foundation for evaluating novelty search performance in real-world patent examination contexts.
Illustration of a single data point
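Schematically, each data point pairs a test question with its gold answer set. The sketch below is hypothetical; all field names and identifiers are invented for illustration, and the actual schema is not published:

```python
# Hypothetical shape of one benchmark datapoint (illustrative only).
sample = {
    "question": {               # the "test question"
        "language": "en",
        "ipc_section": "H",
        "disclosure": "Full technical description of the claimed invention...",
    },
    "answer": {                 # the "standard answer"
        "x_documents": ["EP1234567", "US7654321"],  # harmonized document IDs
        "citing_offices": ["EPO", "USPTO"],
    },
}
```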
2) Building the Dataset
The benchmark uses 89 validated test samples, with controlled distribution across:
Languages: 38.2% Chinese, 61.8% English
IPC classifications: evenly spread across sections A–H
Language distribution of patent texts in 89 test samples
Distribution of IPC codes across 89 test samples
3) Defining Evaluation Metrics
Evaluation metrics are used to measure and compare performance. They can be single or combined indicators, carefully designed by experts to reflect the practical needs of patent professionals.
Novelty search aims to identify relevant prior art to determine whether a patent claim is truly new. It follows the general principles of traditional search logic, but with a specialized focus on patent validity.
The Patsnap PatentBench uses the following indicators to measure the quality of search results:
X Hit Rate: proportion of samples where at least one correct answer appears among the top-K results.
For each test sample, if any X document from the “X Document Collection” appears in the top-K results, the sample is marked “1”; otherwise, “0.” The proportion of samples marked “1” is the X Hit Rate. Varying K (e.g., top 100, top 5, top 1) reflects how well AI tools perform prior art searches under different time constraints.
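As a concrete reading of this definition, here is a minimal Python sketch, assuming each sample carries an ordered results list of harmonized document IDs and its gold X-document set (field names are illustrative, not the benchmark’s actual schema):

```python
def hit_rate_at_k(samples: list[dict], k: int) -> float:
    """Fraction of test samples where at least one gold X document appears
    in the tool's top-k results. Field names are illustrative."""
    hits = 0
    for sample in samples:
        top_k = set(sample["results"][:k])
        if top_k & set(sample["x_documents"]):
            hits += 1
    return hits / len(samples)
```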
X Recall Rate: measures how comprehensively an AI tool retrieves X documents, which is crucial during R&D planning and pre-filing. A higher recall helps teams refine technical solutions and draft stronger claims. It is calculated as the proportion of X documents retrieved in the top 100 results, relative to the total number of X documents across all test samples.
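Note that this definition pools X documents across all samples rather than averaging per-sample recall. Under the same assumed schema as before:

```python
def recall_at_k(samples: list[dict], k: int = 100) -> float:
    """Pooled X Recall Rate: X documents retrieved in the top-k results,
    summed over all samples, divided by the total number of X documents
    across all samples (not a per-sample average)."""
    found = total = 0
    for sample in samples:
        gold = set(sample["x_documents"])
        total += len(gold)
        found += len(gold & set(sample["results"][:k]))
    return found / total
```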
This study uses the top 100 search results to calculate both the X Hit Rate and the X Recall Rate. This range is wide enough to catch important prior art yet still manageable for manual review, matching how patent professionals typically search.
4) Comparison Tools
This benchmark compares:
Patsnap’s Novelty Search AI Agent
ChatGPT-o3 (web-enabled)
DeepSeek-R1 (web-enabled)
Patsnap’s AI Agent is purpose-built for patent novelty search, automatically generating reports from technical input. In contrast, ChatGPT-o3 and DeepSeek-R1 are general-purpose language models known for their reasoning capabilities.
Key Findings
Benchmark results show that Patsnap’s Novelty Search AI Agent achieved a 76% X Hit Rate and a 32% X Recall Rate within the top 100 results, significantly outperforming two leading general-purpose AI tools.
1) X Hit Rate
Patsnap’s Novelty Search AI Agent successfully identified at least one relevant X document in 76% of test cases—an essential capability for speeding up decision-making in patent examination and early-stage R&D.
X Hit Rate: the percentage of tests with a relevant hit in the top 100 results
2) X Recall Rate
Patsnap’s Novelty Search AI Agent retrieved 32% of all relevant X documents, enabling more thorough analysis and more informed patent claim drafting. A high X Recall Rate is key during R&D planning and pre-filing: it helps teams, whether in-house researchers, patent professionals, or external agents, find more of the relevant X documents, supporting better technical decisions and stronger patent claims and increasing the chances of patent approval.
X Recall Rate: the share of X documents found in the top 100 results
3) Typical Test Result Sample
In this test, the patent specification (the “problem statement”) was submitted to each AI tool. Their results were then evaluated against a predefined set of X documents (the “model answer”).
Patsnap’s Novelty Search AI Agent successfully identified all four relevant patent families within the top 100 results, achieving an X Hit Rate of 100% and an X Recall Rate of 100%.
By comparison, both ChatGPT-o3 and DeepSeek-R1 also achieved a 100% X Hit Rate. However, ChatGPT retrieved only one relevant patent family, leading to a much lower X Recall Rate of 25%, while DeepSeek failed to retrieve any, resulting in an X Recall Rate of 0%.
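For concreteness, the Patsnap and ChatGPT-o3 scores for this sample drop directly out of the metric sketches above. The family IDs F1–F4 below are placeholders for the four gold patent families:

```python
# Usage example: reproducing this sample's scores with hit_rate_at_k and
# recall_at_k from the sketches above. "other" pads the remaining slots.
gold = ["F1", "F2", "F3", "F4"]
patsnap = {"x_documents": gold, "results": gold + ["other"] * 96}
chatgpt = {"x_documents": gold, "results": ["F2"] + ["other"] * 99}

print(hit_rate_at_k([patsnap], k=100))  # 1.0  -> 100% X Hit Rate
print(recall_at_k([patsnap], k=100))    # 1.0  -> 100% X Recall Rate
print(hit_rate_at_k([chatgpt], k=100))  # 1.0  -> 100% X Hit Rate
print(recall_at_k([chatgpt], k=100))    # 0.25 -> 25% X Recall Rate
```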
These findings highlight that while general-purpose LLMs excel in reasoning, they struggle with highly specialized tasks like patent novelty search. In comparison, domain-specific AI tools like Patsnap’s Novelty Search AI Agent offer superior accuracy and relevance, underscoring their essential role in patent-focused workflows.
A single-sample benchmark test
Future Research
The initial dataset of 89 samples represents the first phase of testing. Future benchmarks will expand this dataset and refine evaluation methods for greater accuracy and coverage.
In real-world patent work, professionals consider more than just retrieval quality—they also weigh factors like efficiency and cost. Key trade-offs include:
Depth vs. speed: How thorough the search needs to be versus how quickly results are needed
In-house vs. outsourced: Whether to conduct searches internally or rely on external experts
Risk vs. resources: Balancing the chance of missing critical prior art against the time and cost of exhaustive searches
Patsnap Novelty Search AI Agent
Use Cases and Impact
Patsnap’s Novelty Search AI Agent stands out in benchmark testing thanks to its domain-specific fine-tuning and advanced Retrieval-Augmented Generation (RAG) technology.
Built on an open-source base model, the agent has been systematically refined with specialized patent knowledge, allowing it to understand the nuances of patent language and search logic. By integrating RAG, it combines real-time data retrieval with generative capabilities, enabling high-quality, low-hallucination search results. This empowers the agent to accurately identify key technical features, apply precise search strategies, and outperform general-purpose models in professional novelty search tasks.
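Patsnap’s pipeline itself is proprietary, but the retrieve-then-generate pattern it builds on can be sketched in a few lines. Here a TF-IDF index over a toy corpus stands in for the real patent search backend, and the final LLM call is left as a stub; everything below is an illustrative assumption, not the agent’s actual implementation:

```python
# Minimal retrieve-then-generate (RAG) sketch. A TF-IDF index over a toy
# corpus stands in for the real patent search backend.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = {  # toy stand-in for a patent index: {doc_id: abstract}
    "EP1234567": "A lithium battery electrode with silicon nanowires for higher capacity.",
    "US7654321": "Method for coating graphite anodes to improve cycling stability.",
    "CN10987654": "Solid-state electrolyte composition for lithium cells.",
}

vectorizer = TfidfVectorizer().fit(corpus.values())
doc_matrix = vectorizer.transform(corpus.values())
doc_ids = list(corpus)

def retrieve(query: str, k: int = 3) -> list[str]:
    """Rank indexed documents against the technical disclosure."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    return [doc_ids[i] for i in scores.argsort()[::-1][:k]]

def novelty_prompt(disclosure: str) -> str:
    """Ground the generative step on retrieved prior art."""
    evidence = "\n".join(f"[{d}] {corpus[d]}" for d in retrieve(disclosure))
    return (
        "Assess the novelty of the disclosure against the prior art below.\n"
        f"Disclosure: {disclosure}\nPrior art:\n{evidence}"
    )

# In the real agent, this prompt would be sent to the fine-tuned LLM, which
# drafts a novelty report grounded in the retrieved documents.
print(novelty_prompt("A silicon-nanowire anode for lithium-ion batteries."))
```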
For IP professionals in corporations and patent firms, this translates into a powerful productivity boost. Tasks like searching, filtering, and ranking—once taking hours—can now be completed in minutes. This shift allows experts to spend less time on repetitive work and more on strategic analysis and decision-making, transforming workflows from “three days of searching” to “three hours of insight.”
R&D teams also benefit significantly during early-stage project evaluations. The agent enables fast and effective novelty searches from the outset, helping teams avoid investing in non-novel ideas and reducing wasted resources. This leads to a more efficient and impactful innovation process.