Patent Drafting Analysis of DeepMind Technologies Limited’s Multi-Agent Reinforcement Learning with Matchmaking Policies | US 11,627,165 B2

IP Drafting Analysis · US 11,627,165 B2

A structural and strategic analysis of US 11,627,165 B2, examining claim architecture across method, system, and computer-readable medium (CRM) formats, drafting quality signals, §101 eligibility exposure, and critical prosecution gaps in DeepMind's matchmaking-based RL training system.

US 11,627,165 B2 · Filed: Jan 24, 2020 · Granted: Apr 11, 2023 · G06N 3/08 · H04L 9/40 · G06K 9/62
Spec Words: 8,200 (across 6 sections)
Total Claims: 30 (3 independent · 27 dependent)
Figure Sheets: 3 (system architecture, training process flows)
Published by PatSnap Insights Team · 12 min read · Verified by PatSnap Eureka Data
Overview

Structural Overview

The detailed description dominates at roughly half of the total word count (~4,800 words), reflecting substantive technical depth in explaining the matchmaking policy training loop. The background section is lean at ~680 words and offers minimal prior art context. The claim set comprises 30 claims: 3 independent claims (method, system, and CRM) and 27 dependents, a 9:1 ratio that sits just above the 4–8:1 range this analysis uses as the norm for AI/software patents. The three drawing sheets are minimalist, covering a system architecture (FIG. 1) and two simple flow diagrams (FIG. 2, FIG. 3), which provide only coarse-grained structural support for the detailed claim limitations.

Section Word Distribution

Section | Words
Detailed Desc. | 4,800
Claims | 2,400
Summary | 1,440
Background | 680
Brief Desc. | 340
Abstract | 200
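The share attributed to each section can be recomputed from the per-section counts listed above. The sketch below is illustrative only; the dictionary keys reuse the chart labels and the helper name `section_shares` is hypothetical. Note that the listed counts sum to 9,860 words rather than the 8,200 headline figure, and the source does not say which sections the headline covers.

```python
# Per-section word counts as listed in the word distribution above.
section_words = {
    "Detailed Desc.": 4800,
    "Claims": 2400,
    "Summary": 1440,
    "Background": 680,
    "Brief Desc.": 340,
    "Abstract": 200,
}


def section_shares(counts):
    """Return each section's fraction of the summed word count."""
    total = sum(counts.values())
    return {name: words / total for name, words in counts.items()}


shares = section_shares(section_words)
total_words = sum(section_words.values())

print(f"Total across listed sections: {total_words}")
for name, share in shares.items():
    print(f"{name:<15} {share:6.1%}")
```

On these figures the detailed description comes to just under half of the listed total, which is consistent with the "roughly 50%" characterization.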

Figure Inventory — 3 Sheets

Figure | Description | Role
FIG. 1 | Shows the overall reinforcement learning system 100, including agents 102A-N, environment 104, policy neural network 110, training engine 120, policy data 140, training data 130, labeled task instances 132, learner policies 142A-M with respective matchmaking policies 144A-M, and fixed policy 152. | System architecture
FIG. 2 | Flow diagram of example process 200 for training a policy neural network, showing three sequential steps: maintain pool of candidate action selection policies (202), maintain matchmaking policies (204), and train the policy neural network (206). | Flow diagram
FIG. 3 | Flow diagram of example process 300 for updating learner policies based on training data, showing per-learner-policy steps: select one or more policies (302), generate training data for the learner policy (304), and update the respective set of policy parameters (306). | Flow diagram
Analysis powered by PatSnap Eureka. Patent text and figures publicly available from USPTO.
Claims

Claim Architecture Analysis

The patent contains 3 independent claims: Claim 1 (method), Claim 20 (CRM/non-transitory storage media), and Claim 21 (system), providing tripartite enforcement coverage across method, storage medium, and apparatus formats. The dependent:independent ratio of 9:1 sits just above the software/AI industry norm of 4–8:1, reflecting a deliberately layered fallback strategy. The dependents are unevenly distributed: per the dependency tree, Claims 2–19 depend from Claim 1, Claims 22–30 follow system Claim 21 (Claims 22–29 parallel Claims 2–9, and Claim 30 parallels Claim 13), and CRM Claim 20 has no dependents of its own. Fallback depth is therefore concentrated on the method claim.

Core inventive concept: The claims address the challenge of training a policy neural network to control agents performing tasks in multi-agent environments where the state and strategic spaces are extremely large — a problem that arises because a single training opponent set cannot cover the diversity of strategies needed. The solution, as expressed across Claims 1, 20, and 21, is maintaining a pool of candidate action selection policies where each learner policy has its own "matchmaking policy" defining a probability distribution over the pool, allowing each learner to be trained against different, strategically selected opponents, thereby encouraging exploration of diverse state and strategy spaces.
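The per-learner matchmaking idea can be made concrete with a small sketch. This is not the patent's implementation: the `Policy` class, the `sample_opponents` helper, the policy names, and the probability values are all hypothetical, chosen only to show two learners drawing opponents from the same pool under different distributions.

```python
import random
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    is_learner: bool  # False marks a fixed (frozen) policy


def sample_opponents(pool, matchmaking_probs, k=1, rng=None):
    """Draw k opponents from the pool using one learner's matchmaking distribution."""
    rng = rng or random.Random()
    if len(matchmaking_probs) != len(pool):
        raise ValueError("need one probability per pool entry")
    if abs(sum(matchmaking_probs) - 1.0) > 1e-9:
        raise ValueError("matchmaking probabilities must sum to 1")
    return rng.choices(pool, weights=matchmaking_probs, k=k)


pool = [
    Policy("learner_a", True),
    Policy("learner_b", True),
    Policy("fixed_0", False),
]

# Hypothetical distributions: learner_a mostly faces the fixed policy,
# learner_b only ever faces learners (probability 0 on the fixed policy).
matchmaking = {
    "learner_a": [0.1, 0.1, 0.8],
    "learner_b": [0.5, 0.5, 0.0],
}

rng = random.Random(0)
for learner, probs in matchmaking.items():
    opponents = sample_opponents(pool, probs, k=3, rng=rng)
    print(learner, "->", [p.name for p in opponents])
```

Because each learner carries its own distribution, two learners sharing one pool can still see very different opponent mixes, which is the diversity effect the claims are directed at.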

Independent Claim Dissection

Claim 1
Preamble: A method of training a policy neural network having a plurality of policy parameters and used to select actions to be performed by an agent to control the agent to perform a particular task while interacting with one or more other agents in an environment
Transition: comprising
Key body elements: maintaining a pool of candidate action selection policies including learner policies and fixed policies; maintaining per-learner-policy matchmaking policies defining distributions over the pool; at each training iteration for each learner policy: selecting policies via matchmaking, generating training data via agent interaction, updating policy parameters via RL loss function; determining criteria for converting a learner to a fixed policy; generating new fixed policy with same parameter values

Claim 20
Preamble: One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations for training a policy neural network having a plurality of policy parameters and used to select actions to be performed by an agent to control the agent to perform a particular task while interacting with one or more other agents in an environment
Transition: comprising
Key body elements: operations mirroring Claim 1: maintaining candidate action selection pool with learner and fixed policies; maintaining per-learner matchmaking policies; iterative selection via matchmaking, training data generation, RL-based parameter update; criteria-based conversion of learner to fixed policy; new fixed policy generation with same parameter values

Claim 21
Preamble: A system comprising one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to perform operations for training a policy neural network having a plurality of policy parameters and used to select actions to be performed by an agent to control the agent to perform a particular task while interacting with one or more other agents in an environment
Transition: comprising
Key body elements: operations mirroring Claims 1 and 20: maintaining candidate action selection pool with learner and fixed policies; maintaining per-learner matchmaking policies defining distributions; iterative matchmaking-based policy selection, training data generation, RL parameter updates; criteria-based learner-to-fixed conversion; new fixed policy generation
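The sequence of operations summarized for the independent claims can be sketched as a toy loop. Everything here is an assumption made for illustration: `play_and_collect` and `rl_update` are stand-ins for real environment interaction and a real RL loss, the conversion criterion is a fixed iteration count (the variant described for Claim 12), and the learner is assumed to keep training after a frozen copy is added to the pool.

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    params: list
    is_learner: bool
    iterations: int = 0


def play_and_collect(learner, opponents, rng):
    # Stand-in for agent/environment interaction: returns fake "training data".
    return [rng.random() for _ in opponents]


def rl_update(learner, data, lr=0.1):
    # Stand-in for a gradient step on an RL loss: nudges every parameter.
    step = lr * sum(data) / max(len(data), 1)
    learner.params = [p + step for p in learner.params]


def uniform_over_pool(pool):
    # Matchmaking rule returning unnormalized weights for the current pool.
    return [1.0] * len(pool)


def train(pool, matchmaking, n_iters, snapshot_every, rng):
    for _ in range(n_iters):
        # Iterate over a copy so snapshots appended below do not disturb the loop.
        for learner in [p for p in pool if p.is_learner]:
            weights = matchmaking[learner.name](pool)
            opponents = rng.choices(pool, weights=weights, k=1)
            rl_update(learner, play_and_collect(learner, opponents, rng))
            learner.iterations += 1
            # Conversion criterion (assumed): a set number of iterations completed.
            if learner.iterations % snapshot_every == 0:
                frozen = Policy(
                    name=f"{learner.name}_fixed_{learner.iterations}",
                    params=copy.deepcopy(learner.params),  # same parameter values
                    is_learner=False,
                )
                pool.append(frozen)
    return pool


rng = random.Random(0)
pool = [
    Policy("learner_a", [0.0, 0.0], True),
    Policy("fixed_0", [1.0, 1.0], False),
]
train(pool, {"learner_a": uniform_over_pool}, n_iters=6, snapshot_every=3, rng=rng)
print([p.name for p in pool])
```

Passing the matchmaking rule as a function of the current pool, rather than a fixed vector, is a design choice of this sketch: the pool grows as snapshots are added, so the distribution has to be re-evaluated each iteration.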

Claim Dependency Tree

1. Method: training policy neural network via pool of candidate policies + per-learner matchmaking distributions + RL parameter updates + learner-to-fixed conversion
2. Adds: matchmaking policies for two or more learner policies are different from each other
3. Further: learner policies each assigned a respective type from plurality of types, each type associated with different matchmaking policy
4. Adds: matchmaking policy for at least one learner is uniform across learner policies of same type and zero for different types and fixed policies
5. Adds: matchmaking policy for at least one learner is uniform across all learner policies and zero for fixed policies
6. Adds: matchmaking policy for at least one learner is uniform across all policies in the pool
7. Adds: RL loss function depends on plurality of hyperparameters; hyperparameter values different for two or more learner policies
8. Further: hyperparameters include one or more hyperparameters of an RL algorithm used in training
9. Further: hyperparameters include internal reward hyperparameters defining whether RL loss depends on internal reward and how it is computed
10. Adds: one or more fixed policies defined by policy parameter values determined through supervised learning on labeled task instances
11. Further: supervised learning comprises first supervised learning using first training data and second supervised learning using only selected portion with threshold performance
12. Adds: determining criteria satisfied comprises determining a predetermined number of training iterations have been completed
13. Adds: in response to conversion criteria satisfied, setting policy parameters of particular learner based on current values of other policies in pool
14. Further: setting policy parameters to new set determined based on current sets of values for policy parameters defining one or more other policies in pool
15. Further: in response, modifying hyperparameters of the RL loss function for the particular learner policy
16. Further: in response, modifying the matchmaking policy for the particular learner policy
17. Adds: for at least one selected policy, updating its policy parameters by training on training data through RL to optimize RL loss function
18. Adds: determining criteria satisfied comprises determining agent controlled by learner has attained threshold level of performance on the particular task
19. Adds: matchmaking policy specifies higher-performing learner policies are more likely to be selected than lower-performing ones
20. CRM: non-transitory storage media encoding operations identical in structure to Claim 1 method for training policy neural network with matchmaking-based learner-opponent selection
21. System: one or more computers + storage devices performing operations structurally identical to Claim 1 for training policy neural network with matchmaking policies
22. Adds: matchmaking policies for two or more learner policies are different (parallels Claim 2)
23. Further: learner policies each assigned type from plurality; type associated with different matchmaking policy (parallels Claim 3)
24. Adds: type-based uniform matchmaking, zero for other types and fixed policies (parallels Claim 4)
25. Adds: uniform across all learner policies, zero for fixed (parallels Claim 5)
26. Adds: uniform across all policies in pool (parallels Claim 6)
27. Adds: RL loss depends on hyperparameters different across learner policies (parallels Claim 7)
28. Further: hyperparameters include RL algorithm hyperparameters (parallels Claim 8)
29. Further: internal reward hyperparameters (parallels Claim 9)
30. Adds: in response to conversion criteria, setting policy parameters of learner based on current values of other policies in pool (parallels Claim 13)

Metric | This Application | Software / AI Industry Norm
Total claims | 30 | 15 – 25
Independent claim count | 3 | 2 – 4
Dependent : Independent ratio | 9.0 : 1 | 4 – 8 : 1
Method claims present? | Yes (Claim 1) | Common
System / apparatus claims? | Yes (Claim 21) | Common
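The ratio in the table is simple arithmetic on the claim counts, and a short check makes the comparison with the norm range explicit. The helper name `dependent_ratio` is hypothetical, the norm bounds are the ones quoted in the table, and the per-parent split is an assumption read off the dependency tree (Claims 2–19 under Claim 1, Claims 22–30 under Claim 21, none under Claim 20).

```python
def dependent_ratio(total_claims, independent_claims):
    """Return (number of dependents, dependents per independent claim)."""
    dependents = total_claims - independent_claims
    return dependents, dependents / independent_claims


dependents, ratio = dependent_ratio(total_claims=30, independent_claims=3)

# Norm range as quoted in the table above.
norm_low, norm_high = 4.0, 8.0

# Assumed split of dependents per independent claim, per the dependency tree.
dependents_by_parent = {1: 18, 20: 0, 21: 9}

print(f"{dependents} dependents, ratio {ratio:.1f}:1")
print(f"Above upper norm bound by {ratio - norm_high:.1f}")
print(f"Per-parent split: {dependents_by_parent}")
```

The headline 9:1 figure is an average; under the assumed split the method claim carries 18 dependents while the CRM claim carries none.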
Drafting Quality

Drafting Quality Signals

The patent demonstrates strong claim architecture through its tripartite independent claim structure (Claims 1, 20, 21) and a dependent claim ratio of 9:1, providing layered fallback against validity attacks, particularly through Claims 3–9, which add distinct technical limitations around policy typing, hyperparameter diversity, and internal reward schemes. The primary quality weakness lies in §101 eligibility exposure: the independent claims recite computational operations without tying them to a specific hardware architecture or physical effect beyond generic computers and storage media, leaving the claims exposed to a challenge at Alice step one.

Antecedent Basis
Antecedent basis is well-managed throughout the claim set. Claim 1 introduces "a pool of candidate action selection policies" and consistently references it as "the pool" in subsequent limitations. Similarly, "a respective matchmaking policy" is introduced then correctly referenced as "the matchmaking policy for the learner policy" in the selection step. The dependent claims that reference "the matchmaking policies" (e.g., Claims 2, 7, 19) trace cleanly back to the maintaining step in Claim 1. No orphaned "the" references were identified across Claims 1–30.
Spec–Claim Consistency
The specification provides direct written description support for the key independent claim limitations. FIG. 1 maps to the pool maintenance step (learner policies 142A-M, fixed policy 152, training engine 120), FIG. 2 maps to the three-step training process in Claims 1/20/21, and FIG. 3 maps to the per-learner-policy iteration sub-steps (selection 302, training data generation 304, parameter update 306). The "learner-to-fixed conversion" limitation in Claims 1, 20, and 21 is supported by detailed written description at columns 11–12. However, Equation 1 (the weighting function for matchmaking selection) appears in the spec but has no corresponding claim language referencing a probability weighting function, representing a mild asymmetry.
Transition Word Usage
All three independent claims use "comprising" as the transition word, which is the strategically correct choice for this technology domain — it renders the claims open-ended, meaning a system or method that adds further steps or components to the claimed structure still infringes. The use of "comprising" in both the method claim preamble and in the pool-maintenance sub-limitation ("the pool of candidate action selection policies comprising: (i)...(ii)...") is also correct and consistent. No restrictive "consisting of" or "consisting essentially of" language appears, and there were no missed opportunities to broaden by use of "including" or "having."
§112(f) Means-Plus-Function Risk
No "means for" or "step for" language appears in any of the 30 claims, and the claims do not use functional label constructs (e.g., "selection means," "training module") that would trigger §112(f) interpretation. The independent claims use active verb forms ("maintaining," "selecting," "generating," "updating," "determining"), which do not by themselves invoke §112(f) treatment under USPTO examination guidelines. The system claims (Claims 21–30) are drafted as computer-implemented operations rather than as named structural components, which also avoids §112(f) exposure. This is a clean, low-risk drafting approach for this technology domain.
⚠️ §101 Eligibility Risk
The independent claims carry meaningful exposure at Alice step one: Claim 1 recites maintaining data structures, selecting policies, generating training data, and updating parameters, all computational operations that a court may characterize as a mathematical concept or mental process. The hardware recitations in Claims 20 and 21 ("non-transitory computer-readable storage media" and "system comprising one or more computers") give those claims a structural anchor, but generic computer components alone rarely supply an inventive concept at Alice step two (USPTO Step 2B), so the added protection is limited. Claim 1 (method) lacks any explicit hardware anchor. A stronger filing would have included at least one dependent claim specifying the distributed computing architecture (e.g., actor-learner architecture referenced in the spec) as a concrete hardware limitation to reinforce the §101 position during prosecution.
Dependent Claim Fallback Quality
The dependent claims from Claim 1 add genuinely distinct fallback positions. Claims 3–6 provide a structured hierarchy of matchmaking distribution specificity (type-specific → learner-only uniform → all-policy uniform), each representing a narrower but distinct scope. Claims 7–9 add a meaningful technical dimension around hyperparameter diversity, which is a key aspect of the diversified learning strategy described in the specification. Claims 10–11 add the supervised learning initialization mechanism. However, Claims 22–30 (which follow system Claim 21) are near-verbatim parallels of Claims 2–9 and Claim 13, adding little incremental protection beyond what the tripartite structure already provides, while CRM Claim 20 has no dependents at all; a continuation adding CRM-specific dependent variations would have strengthened the portfolio.
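The three uniform matchmaking variants described for Claims 4–6 differ only in which pool entries receive non-zero probability. The sketch below builds each distribution for a small hypothetical pool; the dictionary layout, the type names "main" and "exploiter", and the choice to include the learner itself among same-type candidates are assumptions, not details taken from the patent.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total == 0:
        raise ValueError("no eligible policies to match against")
    return [w / total for w in weights]


def same_type_uniform(pool, learner_type):
    """Claim 4 pattern: uniform over learners of one type, zero elsewhere."""
    return normalize(
        [1.0 if p["learner"] and p["type"] == learner_type else 0.0 for p in pool]
    )


def learners_uniform(pool):
    """Claim 5 pattern: uniform over all learners, zero for fixed policies."""
    return normalize([1.0 if p["learner"] else 0.0 for p in pool])


def pool_uniform(pool):
    """Claim 6 pattern: uniform over every policy in the pool."""
    return normalize([1.0] * len(pool))


# Hypothetical pool; the type names are illustrative only.
pool = [
    {"name": "main_0", "learner": True, "type": "main"},
    {"name": "main_1", "learner": True, "type": "main"},
    {"name": "exploiter_0", "learner": True, "type": "exploiter"},
    {"name": "fixed_0", "learner": False, "type": None},
]

print(same_type_uniform(pool, "main"))  # [0.5, 0.5, 0.0, 0.0]
print(learners_uniform(pool))
print(pool_uniform(pool))               # [0.25, 0.25, 0.25, 0.25]
```

Seen this way, each variant is one mask over the same pool, which is why the three claims read as successive fallback positions rather than unrelated features.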
⚠️ Abstract Quality
The abstract describes the invention at a functional level — "maintaining data specifying a pool of candidate action selection policies; maintaining data specifying respective matchmaking policy; and training the policy neural network using a reinforcement learning technique" — which is accurate but insufficiently differentiated from generic RL training systems. An examiner reading only the abstract would not identify the novel contribution: the per-learner-policy matchmaking distributions that solve the large-state-space exploration problem. The abstract omits the learner-to-fixed-policy conversion mechanism, which is the distinguishing structural feature of the claims, and fails to name the specific technical problem being solved (insufficient strategy diversity in multi-agent RL training).
⚠️ Figure Support Quality
The three figures provide only high-level structural support for the claims, and several key claim limitations lack dedicated figure support. FIG. 1 supports the pool architecture (learner policies 142A-M, matchmaking policies 144A-M, fixed policy 152), FIG. 2 maps to the top-level three-step process in Claims 1/20/21, and FIG. 3 maps to the per-learner iteration sub-steps. However, no figure illustrates the learner-to-fixed-policy conversion mechanism (the distinguishing limitation in all three independent claims), the hyperparameter diversity mechanism (Claim 7), the internal reward computation (Claim 9), or the supervised learning initialization (Claim 10). A stronger filing would have included a figure showing the conversion trigger logic and the resulting pool state transition.
Scorecard

Strategic Intent Scorecard

Multi-dimensional assessment of this application's patent strategy quality, based on claim structure, specification depth, and prosecution positioning.

Dimension | Score (out of 5.0)
Claim Breadth | 3.8
Prosecution Defensibility | 3.5
Spec–Claim Consistency | 3.6
Dependent Claim Coverage | 4.0
Claim Type Diversity | 4.5
Figure Support Quality | 2.8

Key observation: Claim Type Diversity scores highest (4.5/5.0) because the tripartite structure across Claims 1 (method), 20 (CRM), and 21 (system) provides enforcement coverage in all three formats most relevant to AI software deployments, significantly complicating design-around attempts. Figure Support Quality scores lowest (2.8/5.0) because the three figures (one system architecture diagram and two high-level flow diagrams) leave the learner-to-fixed conversion mechanism (independent claims), hyperparameter diversity (Claim 7), and supervised learning initialization (Claim 10) without any dedicated diagrammatic support, so those limitations rest on the written description text alone. Practitioners drafting continuations should prioritize adding figures that illustrate the conversion trigger logic and pool state transitions to anchor the deeper dependent claims.
Critical Gaps

3 Critical Gaps in This Claim Set

A senior-attorney lens on the three highest-priority structural weaknesses — what each exposes in prosecution and litigation, and what a stronger filing would have done differently.

Matchmaking weighting function unclaimed
Method claim §101 hardware gap
No cooperative multi-agent task claims

Disclaimer: This analysis is generated by PatSnap Eureka AI based on publicly available patent data from the USPTO. It does not constitute legal advice and should not be relied upon as such. Patent data may be subject to change as prosecution progresses. Scores and assessments reflect automated analysis and may not capture all relevant legal or technical nuances. Always consult a qualified patent attorney for formal legal opinions on patentability, freedom to operate, or infringement.