Patent Drafting Analysis of DeepMind Technologies Limited’s Asynchronous Deep Reinforcement Learning | US 11,783,182 B2

A structural and strategic analysis of US 11,783,182 B2, DeepMind's core patent on its asynchronous RL training system, examining claim architecture, drafting quality, critical gaps, and prosecution positioning.

US 11,783,182 B2 · Filed: Feb 8, 2021 · Granted: Oct 10, 2023 · Classification: G06N 3/08, G06N 3/045, G06N 3/04
Spec Words: 5,200 (across 5 sections)
Total Claims: 20 (3 independent · 17 dependent)
Figure Sheets: 5 (system architecture and training flow diagrams)
Published by PatSnap Insights Team · 12 min read · Verified by PatSnap Eureka Data
Overview

Structural Overview

The detailed description dominates at approximately 50% of total specification words (~2,600 of ~5,200), giving solid technical depth on the asynchronous training architecture, while the claims section, at ~1,730 words, is notably large relative to the specification, reflecting verbose independent claim language. The patent presents 20 claims in a tripartite structure: 3 independent claims covering a method (Claim 1), a system (Claim 9), and a computer-readable medium (CRM, Claim 17), with the 17 dependents distributed across the three independents. Five drawing sheets cover the system architecture and process flows, but they are limited to block diagrams and flow diagrams; none details the shared memory update mechanism.

Section Word Distribution

Detailed Description: 2,600 words
Claims: 1,730 words
Summary: 520 words
Background: 310 words
Brief Description of the Drawings: 260 words
Abstract: 105 words

Figure Inventory — 5 Sheets

FIG. 1 (System architecture): Block diagram of the neural network training system 100 showing workers 102A-102N, actors 104A-104N, environment replicas 106A-106N, and shared memory 110.
FIG. 2 (Flow diagram): Flow diagram of process 200 showing the per-worker training loop: determine parameters (202), receive observation (204), select action (206), receive reward (208), compute gradient (210-212), conditionally write to shared memory (214-220).
FIG. 3 (Flow diagram): Flow diagram of process 300 for performing a Q-learning iteration: receive observation/action/reward (302), determine maximum target network output (304), determine error (306), compute gradient (308).
FIG. 4 (Flow diagram): Flow diagram of process 400 for performing a SARSA iteration: receive inputs (402), select next action (404), determine target network output (406), determine error (408), compute gradient using error (410).
FIG. 5 (Flow diagram): Flow diagram of process 500 for training a policy neural network: determine policy network parameters (502), receive observations and select actions (504), determine long-term reward (506), compute per-observation errors (508), gradient updates (510-512), conditionally write to shared memory (514-520).
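FIGS. 3 and 4 differ mainly in how the bootstrap target is formed. The sketch below illustrates that difference under stated assumptions and is not the patent's implementation: it assumes a hypothetical linear Q-function (one weight vector per action), a discount factor `gamma`, and a separate set of target-network weights. The helper names (`q_value`, `q_learning_gradient`, `sarsa_gradient`) are illustrative only. Q-learning bootstraps from the maximum target-network output over next actions; SARSA bootstraps from the target-network output for the next action the worker actually selected.

```python
from typing import List, Sequence

# Hypothetical linear Q-function: one weight vector per action.
Weights = List[List[float]]


def q_value(weights: Weights, features: Sequence[float], action: int) -> float:
    """Q(s, a) as the dot product of the action's weights with the observation features."""
    return sum(w * x for w, x in zip(weights[action], features))


def _error_gradient(weights: Weights, features: Sequence[float],
                    action: int, target: float) -> Weights:
    """Gradient of 0.5 * (target - Q(s, a))**2 w.r.t. the weights, target held fixed."""
    error = target - q_value(weights, features, action)
    grad = [[0.0] * len(features) for _ in weights]
    grad[action] = [-error * x for x in features]  # only the taken action's row is non-zero
    return grad


def q_learning_gradient(weights: Weights, target_weights: Weights,
                        obs: Sequence[float], action: int, reward: float,
                        next_obs: Sequence[float], gamma: float = 0.99,
                        terminal: bool = False) -> Weights:
    # FIG. 3 (304): maximum target-network output over all next actions.
    bootstrap = 0.0 if terminal else max(
        q_value(target_weights, next_obs, a) for a in range(len(target_weights)))
    # FIG. 3 (306-308): error against the current network, then its gradient.
    return _error_gradient(weights, obs, action, reward + gamma * bootstrap)


def sarsa_gradient(weights: Weights, target_weights: Weights,
                   obs: Sequence[float], action: int, reward: float,
                   next_obs: Sequence[float], next_action: int,
                   gamma: float = 0.99, terminal: bool = False) -> Weights:
    # FIG. 4 (406): target-network output for the next action the worker actually selected.
    bootstrap = 0.0 if terminal else q_value(target_weights, next_obs, next_action)
    return _error_gradient(weights, obs, action, reward + gamma * bootstrap)
```

Descending either gradient moves Q(s, a) toward its target; the only difference between the two functions is the bootstrap term.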
Analysis powered by PatSnap Eureka. Patent text and figures publicly available from USPTO.
Claims

Claim Architecture Analysis

The patent presents 3 independent claims: Claim 1 (method), Claim 9 (system), and Claim 17 (non-transitory computer storage media, i.e., a CRM claim), each covering the asynchronous multi-worker training architecture with per-worker exploration policies. The dependent-to-independent ratio of 5.67:1 falls within the 4–8:1 software/AI norm, with the 17 dependent claims adding exploration policy variants (Claims 2–4, 10–12, 18–20), gradient accumulation mechanisms (Claims 5–6, 13–14), and implementation details (Claims 7–8, 15–16). The tripartite independent claim structure provides enforcement coverage across method, system, and storage-medium formats, though the verbose claim bodies may invite narrowing amendments during prosecution.

Core inventive concept: The claims target the slow, communication-heavy nature of synchronous deep RL training by reciting plural independently operating workers, each associated with a respective actor and environment replica, wherein each worker's exploration policy is "parameterized by a set of exploration policy parameters" that are "specific to the worker and are different from values of exploration policy parameters of each of one or more other workers." This enables diverse parallel exploration without requiring a replay memory or inter-worker synchronization.
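To make the worker architecture concrete, here is a minimal sketch, assuming a hypothetical one-step toy environment with three actions (`ARM_MEANS`) and a table of action values standing in for the deep neural network. Each thread plays the role of one worker with its own epsilon and its own random seed (its "environment replica"); it accumulates gradients locally and writes updated values to a shared parameter store only when a write criterion (every `write_every` steps, a hypothetical choice) is met, loosely following FIG. 2 and Claim 6. Observations (step 204) are omitted because the toy environment has a single state. All names are illustrative, not taken from the patent.

```python
import random
import threading
from typing import List

ARM_MEANS = [0.1, 0.5, 0.9]  # hypothetical toy environment: one state, three actions


class SharedMemory:
    """Parameter store shared by all workers (cf. shared memory 110 in FIG. 1)."""

    def __init__(self, num_params: int) -> None:
        self.params = [0.0] * num_params
        self._lock = threading.Lock()

    def read(self) -> List[float]:
        with self._lock:
            return list(self.params)

    def apply(self, accumulated_grad: List[float], lr: float) -> None:
        # Updated values = current shared values minus lr * the worker's accumulated gradient.
        with self._lock:
            self.params = [p - lr * g for p, g in zip(self.params, accumulated_grad)]


def worker(shared: SharedMemory, epsilon: float, seed: int, steps: int,
           write_every: int = 5, lr: float = 0.05) -> None:
    rng = random.Random(seed)  # this worker's own actor/environment-replica randomness
    local = shared.read()  # (202) determine current parameter values
    accum = [0.0] * len(local)
    for t in range(1, steps + 1):
        # (206) epsilon-greedy selection using this worker's own epsilon
        if rng.random() < epsilon:
            action = rng.randrange(len(local))
        else:
            action = max(range(len(local)), key=lambda a: local[a])
        # (208) noisy reward from this worker's environment replica
        reward = ARM_MEANS[action] + rng.gauss(0.0, 0.1)
        # (210-212) gradient of 0.5 * (reward - Q(a))**2 w.r.t. Q(a), accumulated locally
        accum[action] += local[action] - reward
        # (214-220) write to shared memory only when the criterion is met, then refresh
        if t % write_every == 0:
            shared.apply(accum, lr)
            accum = [0.0] * len(local)
            local = shared.read()


def train(epsilons: List[float], steps: int = 2000) -> List[float]:
    """Run one thread per worker, each with a distinct epsilon, and return shared values."""
    shared = SharedMemory(len(ARM_MEANS))
    threads = [threading.Thread(target=worker, args=(shared, eps, i, steps))
               for i, eps in enumerate(epsilons)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return shared.read()
```

For example, `train([0.1, 0.3, 0.5])` runs three workers that differ only in epsilon and seed. The lock around shared-memory writes is a simplification chosen for this sketch; the claims as summarized here do not specify a locking scheme.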

Independent Claim Dissection

Claim 1
Preamble: A method of training a deep neural network having a plurality of parameters that is used to select actions to be performed by an agent that interacts with an environment by performing actions selected from a predetermined set of actions
Transition: comprising
Key body elements: using a plurality of workers to generate training data; each worker operates independently, associated with a respective actor and environment replica with a distinct exploration policy; each worker repeatedly determines current DNN parameters, receives observations, selects actions, receives rewards, accumulates training data; applying reinforcement learning technique to determine current gradients; determining updated DNN parameter values using gradients

Claim 9
Preamble: A system
Transition: comprising
Key body elements: one or more computers; one or more storage devices storing instructions that when executed cause the computers to perform operations for training a deep neural network using a plurality of workers, each worker independently operating with distinct exploration policy parameters, generating training data via actor-environment replica interaction, applying reinforcement learning technique to determine gradients, and determining updated DNN parameter values

Claim 17
Preamble: One or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations to train an industrial plant controller that controls operation of an industrial plant
Transition: comprising
Key body elements: training a deep neural network with a plurality of parameters for agent action selection; using plural independent workers each with a distinct exploration policy; each worker generates training data via repeated observation-action-reward cycles with environment replicas; applying reinforcement learning technique to determine gradients; determining updated DNN parameter values using gradients

Claim Dependency Tree

1 Method: training DNN via plural independent workers, each with distinct exploration policy, accumulating training data, applying RL technique to determine gradients
2 Adds: epsilon-greedy exploration policy with per-worker different epsilon probability parameter
3 Adds: sampling new epsilon value from probability distribution when update criterion is met (depends on Claim 2)
4 Adds: softmax temperature parameter tau for per-worker exploration, sampling action from probability distribution
5 Adds: per-worker application of RL technique to worker-generated training data to generate per-worker current gradients
6 Adds: per-worker accumulated gradient update, shared memory criteria check, conditional write of updated parameter values to shared memory (depends on Claim 5)
7 Adds: each worker executes independently on the same computer
8 Adds: DNN is Q network generating Q values for observation-action pairs, epsilon-greedy action selection using Q values
9 System: one or more computers with storage devices executing instructions for multi-worker asynchronous DNN training with distinct per-worker exploration policies
10 Adds: epsilon-greedy exploration policy with per-worker different epsilon probability (mirrors Claim 2)
11 Adds: sampling new epsilon from probability distribution when criterion satisfied (depends on Claim 10)
12 Adds: softmax temperature tau for per-worker action selection from probability distribution
13 Adds: per-worker RL technique applied to worker-generated training data for per-worker gradients
14 Adds: accumulated gradient update, shared memory criteria check, conditional write of updated parameters (depends on Claim 13)
15 Adds: each worker executes independently on the same computer
16 Adds: DNN is Q network, Q value generation for observation-action pairs, epsilon-greedy selection using Q values
17 CRM: non-transitory computer storage media storing instructions to train industrial plant controller via asynchronous multi-worker DNN training with distinct per-worker exploration policies
18 Adds: epsilon-greedy exploration policy with per-worker different epsilon (mirrors Claims 2, 10)
19 Adds: sampling new epsilon from probability distribution when criterion satisfied (depends on Claim 18)
20 Adds: softmax temperature tau for per-worker action selection via probability distribution sampling
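Claims 2–4 (and their mirrors) vary only how each worker explores. The following is a minimal sketch of the three mechanisms as summarized in the tree, not the patent's code: the function names, the resampling criterion (every `resample_every` steps), and the discrete epsilon distribution (`candidates`, `weights`) are hypothetical choices for illustration.

```python
import math
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Claim 2 style: random action with probability epsilon, else the highest-value action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def maybe_resample_epsilon(epsilon: float, step: int, resample_every: int,
                           candidates: Sequence[float], weights: Sequence[float],
                           rng: random.Random) -> float:
    """Claim 3 style: draw a new epsilon from a distribution when the criterion is met.

    Both the criterion and the distribution here are hypothetical.
    """
    if step > 0 and step % resample_every == 0:
        return rng.choices(candidates, weights=weights, k=1)[0]
    return epsilon


def softmax_action(scores: Sequence[float], tau: float, rng: random.Random) -> int:
    """Claim 4 style: sample an action from a softmax over scores with temperature tau."""
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp((s - m) / tau) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]
```

Giving each worker a different `epsilon` or `tau` (or a different seed for resampling) is what makes the exploration policy parameters "specific to the worker" in this sketch; a low `tau` approaches greedy selection and a high `tau` approaches uniform sampling.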
Metric: this application (software/AI industry norm)
Total claims: 20 (15–25)
Independent claim count: 3 (2–4)
Dependent : independent ratio: 5.67 : 1 (4–8 : 1)
Method claims present: Yes, Claim 1 (common)
System / apparatus claims present: Yes, Claim 9 (common)
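The 5.67:1 ratio and the split of dependents (7 under Claim 1, 7 under Claim 9, 3 under Claim 17) follow directly from the dependency tree. The sketch below encodes the tree as a parent map and recomputes them. Apart from the five dependencies the tree states explicitly (3 on 2, 6 on 5, 11 on 10, 14 on 13, 19 on 18), it assumes each dependent depends directly on its independent claim; that assumption does not affect the counts. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

ParentMap = Dict[int, Optional[int]]  # claim number -> parent claim (None = independent)


def build_parent_map() -> ParentMap:
    parent: ParentMap = {1: None, 9: None, 17: None}
    # Dependencies stated explicitly in the tree.
    parent.update({3: 2, 6: 5, 11: 10, 14: 13, 19: 18})
    # Assumption: every other dependent hangs directly off its independent claim.
    for independent, dependents in ((1, range(2, 9)), (9, range(10, 17)), (17, range(18, 21))):
        for claim in dependents:
            parent.setdefault(claim, independent)
    return parent


def root(claim: int, parent: ParentMap) -> int:
    """Follow parent links up to the independent claim."""
    while parent[claim] is not None:
        claim = parent[claim]
    return claim


def claim_stats(parent: ParentMap) -> Tuple[int, int, int, float, Dict[int, int]]:
    independents = [c for c, p in parent.items() if p is None]
    dependents = [c for c, p in parent.items() if p is not None]
    per_independent = Counter(root(c, parent) for c in dependents)
    ratio = round(len(dependents) / len(independents), 2)
    return len(parent), len(independents), len(dependents), ratio, dict(per_independent)
```

`claim_stats(build_parent_map())` yields 20 total claims, 3 independent, 17 dependent, a ratio of 5.67, and a 7/7/3 split, which makes the truncated mirror under Claim 17 easy to see.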
Drafting Quality

Drafting Quality Signals

The patent demonstrates strong structural coverage through its tripartite independent claim architecture (Claims 1, 9, 17) and provides adequate written description support in the detailed description for the core worker-based asynchronous training mechanism. However, the near-identical mirroring of dependent claims across the three independent claims (Claims 10–16 mirror Claims 2–8, and Claims 18–20 mirror Claims 2–4) weakens the fallback positions, and the purely functional claim language describing the RL training process, without hardware-specific structural anchors, creates §101 exposure.

Antecedent Basis
Antecedent basis is generally clean throughout the 20 claims. In Claim 1, "the worker" correctly refers back to "each worker" introduced in the preceding limitation, and "the actor associated with the worker" properly references "a respective actor" established earlier. "The deep neural network" in the gradient-determining step traces back to the preamble's "a deep neural network." No floating "the" references were identified across Claims 2–20 that lack proper antecedent basis.
Spec–Claim Consistency
FIG. 1 and the detailed description at col. 3–4 map directly to the "plurality of workers" and "respective actor" limitations of Claims 1, 9, and 17, and to the "shared memory" limitations of Claims 6 and 14. FIG. 2 (steps 202–220) provides direct support for Claim 6's accumulated gradient update and conditional shared-memory write limitation. The Q-learning and SARSA embodiments in FIGS. 3–4 support the RL technique application recited in the independent claims, and FIG. 5 supports the policy network variant. No independent claim limitation lacks a corresponding description passage.
Transition Word Usage
All three independent claims use "comprising" — the correct open-ended transition for a software/AI patent that should not exclude implementations with additional components such as a centralized parameter server or experience replay buffer. Claim 17's CRM preamble uses "storing instructions" followed by "comprising" for the operations, which is the accepted format for CRM claims at the USPTO. No restrictive "consisting of" or "consisting essentially of" transitions appear, which is strategically appropriate for a training system architecture where additional operational steps are likely.
§112(f) Means-Plus-Function Risk
No "means for" or "step for" language appears in any of the 20 claims, so §112(f) is not directly invoked. The system claim (Claim 9) is drafted around "one or more computers" and "one or more storage devices" rather than functional "means" elements. Under Williamson v. Citrix, the absence of "means" raises only a rebuttable presumption against §112(f); here, the functional language in the claims (e.g., "configured to operate independently," "configured to generate training data") is anchored to the recited computer and storage-device structure, which courts have generally treated as sufficient to avoid §112(f) interpretation.
⚠️
§101 Eligibility Risk
Claims 1, 9, and 17 present moderate Alice/Mayo exposure because the core inventive concept (asynchronous parallel RL training with diverse per-worker exploration policies) is framed as an abstract mathematical/computational process rather than a concrete hardware improvement. The §101 defense rests primarily on Claim 9's "one or more computers" and Claim 17's CRM anchor; however, in district court litigation (§101 is not an available ground in inter partes review, which is limited to §§102 and 103), an adversary could argue that exploration policy diversification is a mathematical concept practiced on generic computers. The specification's reference to FPGA/ASIC implementations (col. 9–10) is helpful but is not reflected in the claims as a structural limitation.
⚠️
Dependent Claim Fallback Quality
The 17 dependent claims are almost entirely duplicated across the three independent claims: Claims 10–16 mirror Claims 2–8, and Claims 18–20 mirror only Claims 2–4. This symmetrical mirroring adds enforcement coverage across claim types but provides no technical fallback positions beyond those already in Claims 2–8. The strongest fallbacks are Claims 3, 11, and 19 (sampling epsilon from a distribution) and Claims 6 and 14 (accumulated gradients with a conditional shared-memory write), which add specific mechanisms. However, no dependent claim is directed to the target network synchronization frequency described at col. 5–6 or to the specific RMSProp update rule mentioned in the specification, representing missed fallback opportunities.
⚠️
Abstract Quality
The abstract states: "One of the systems includes a plurality of workers, wherein each worker is configured to operate independently of each other worker, and wherein each worker is associated with a respective actor that interacts with a respective replica of the environment during the training of the deep neural network." This accurately describes the system architecture but omits the distinguishing feature — per-worker diversified exploration policy parameterization — which is the key claim element over prior parallel RL approaches. An examiner reading only the abstract might not appreciate why the per-worker exploration policy differentiation is the novel contribution, potentially leading to an examiner search that misses relevant prior art combinations.
Figure Support Quality
FIG. 1 supports the structural limitations of Claims 9 and 17, clearly depicting workers (102A-N), actors (104A-N), environment replicas (106A-N), and shared memory (110). FIGS. 2 and 5 support the gradient accumulation and conditional shared-memory write steps in Claims 6 and 14. FIGS. 3 and 4 support the Q-learning (Claims 8 and 16) and SARSA embodiments of the reinforcement learning technique recited in the independent claims. However, no figure depicts the per-worker differentiation of exploration policy parameters, the core novel element, which is described only in text; the most important limitation therefore lacks diagrammatic support.
Scorecard

Strategic Intent Scorecard

Multi-dimensional assessment of this application's patent strategy quality, based on claim structure, specification depth, and prosecution positioning.

Claim Breadth: 3.5/5.0
Prosecution Defensibility: 3.8/5.0
Spec–Claim Consistency: 4.2/5.0
Dependent Claim Coverage: 2.5/5.0
Claim Type Diversity: 4.5/5.0
Figure Support Quality: 3.5/5.0
Key observation: Claim Type Diversity scores highest (4.5/5.0) because the tripartite structure of method (Claim 1), system (Claim 9), and CRM (Claim 17) — with the CRM claim specifically tied to an industrial plant controller application — provides unusually robust enforcement options across different defendant profiles (software developers, cloud platform operators, and industrial automation vendors). Dependent Claim Coverage scores lowest (2.5/5.0) because the 17 dependent claims are mechanically duplicated across all three independent claims with no unique technical fallbacks — specifically, no claims are directed to the target network synchronization mechanism, the RMSProp optimizer variant, or the asynchronous gradient conflict resolution described in the specification. Practitioners should note that the absence of specific hardware-tied dependent claims creates vulnerability in any §101 challenge under Alice's second step.
Critical Gaps

3 Critical Gaps in This Claim Set

A senior-attorney lens on the three highest-priority structural weaknesses — what each exposes in prosecution and litigation, and what a stronger filing would have done differently.

1. Missing target network synchronization claims
2. CRM claim narrowed to industrial plant only
3. No distributed multi-machine apparatus claim
Disclaimer: This analysis is generated by PatSnap Eureka AI based on publicly available patent data from the USPTO. It does not constitute legal advice and should not be relied upon as such. Patent data may be subject to change as prosecution progresses. Scores and assessments reflect automated analysis and may not capture all relevant legal or technical nuances. Always consult a qualified patent attorney for formal legal opinions on patentability, freedom to operate, or infringement.

Ask anything about this patent.
PatSnap Eureka searches patents and data to answer instantly.
Powered by PatSnap Eureka
Link copied to clipboard

Eureka built for innovation research

Eureka built for research
Domain-specific AI agents for IP, Engineering, Life Sciences, and Materials
Patents, Scientific Literature, Compounds & More Unified in One Platform
Ask, Research, Solve, Draft, and Validate Your Work from Weeks to Minutes
Try it for Free

Help us improve this page

Found incorrect or outdated information? Let us know and we'll get it fixed.