Patent Drafting Analysis of Microsoft Technology Licensing’s Asynchronous Neural Network Training | US 12,099,927 B2
IP Drafting Analysis · US 12,099,927 B2
A structural and strategic analysis of Microsoft's distributed neural network training patent, examining claim architecture, drafting quality, critical gaps, and prosecution positioning across apparatus, worker-node, and method claim types.
US 12,099,927 B2 · Filed: Mar 28, 2022 · Granted: Sep 24, 2024 · G06N 3/08 · G06N 3/04 · G06N 3/063
System architecture, message flow, and training schedules
Published by PatSnap Insights Team · 12 min read · Verified by PatSnap Eureka Data
Overview
Structural Overview
The detailed description dominates at approximately 63% of total specification words, providing strong written-description support for the pipelined, asynchronous training architecture across 11 figures. The claim set comprises 21 claims with 3 independent claims — an apparatus (Claim 1), a worker-node apparatus (Claim 9), and a method (Claim 17) — supported by 18 dependent claims at a 6:1 dependent-to-independent ratio. Figure coverage is comprehensive, spanning system overview (FIG. 1), node topology (FIGS. 2–5), control/worker flow diagrams (FIGS. 6–8), timing schedules (FIG. 9), a recurrent network example (FIG. 10), and the computing hardware embodiment (FIG. 11).
Section Word Distribution
Figure Inventory — 11 Sheets
| Figure | Description | Role |
| --- | --- | --- |
| FIG. 1 | Schematic of distributed neural network training system 104 connected to communications network 100, showing training data 102, control node 106, and deployed devices including smart phone 114, AR device 116, laptop 118, and smart watch 120. | System architecture |
| FIG. 2 | Schematic of control node 200 connected to six worker nodes W1–W6, each containing fast memory 204, and a training data store 202, illustrating the pipeline topology. | System architecture |
| FIG. 3 | Schematic of a layered neural network showing input layer 302, hidden layers 304–310, and output node 312, illustrating how the network is partitioned across worker nodes. | Claim support |
| FIG. 4 | Schematic of the pipeline of FIG. 2 with dashed arrows showing forward pass message paths 400 and 402, from control node 200 through worker nodes W1–W6. | System architecture |
| FIG. 5 | Schematic of the pipeline showing backward pass message paths 500, 502, and 504, with gradient updates propagated from worker node W6 back to W1 and completion message 504 returned to control node 200. | System architecture |
| FIG. 6 | Flow diagram of the control node operation, starting at 600, checking training/test instance availability 602, evaluating rate criteria 606, creating and sending messages 610, and updating inflight records 618. | Flow diagram |
| FIG. 7 | Flow diagram of a worker node during a forward process, receiving message 700, processing with local neural network subgraph 706, checking last node 708, sending forward message 710, and computing loss 718 when applicable. | Flow diagram |
| FIG. 8 | Flow diagram of a worker node during a backward process, receiving backward pass message 800, computing gradient 802, checking gradient threshold 804, asynchronously updating local parameters 806, and sending message to next node 812. | Flow diagram |
| FIG. 9 | Timing diagrams comparing three pipeline scheduling scenarios 902, 904, 906 on three machines over time, illustrating efficiency gains of asynchronous model parallelism versus synchronous update alternatives. | Claim support |
| FIG. 10 | Schematic of a variable-length recurrent neural network 1000 with branch node 1010, replicated neural network nodes 1004, 1006, 1008, classifier 1012, and downstream control/application 1016, illustrating variable-length input handling. | System architecture |
| FIG. 11 | Block diagram of exemplary computing-based device 1100 comprising processor 1102, fast memory 1106, communications interface 1104, memory 1108, operating system 1110, forward process 1112, and backward process 1114. | Claim support |
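The control-node flow summarized for FIG. 6 (check instance availability, evaluate rate criteria, send messages, update in-flight records) can be illustrated with a small sketch. This is not the patent's implementation: the `ControlNode` class, its method names, and the choice of a simple cap on in-flight instances as the "rate criterion" are all assumptions made for illustration.

```python
from collections import deque


class ControlNode:
    """Hypothetical control node that injects instances into a pipeline.

    Assumption: the rate criterion is a fixed cap on in-flight instances.
    """

    def __init__(self, instances, max_inflight):
        self.pending = deque(instances)    # instances not yet sent
        self.max_inflight = max_inflight   # assumed rate criterion
        self.inflight = set()              # record of instances in the pipeline

    def step(self):
        """Send as many pending instances as the rate criterion permits."""
        sent = []
        while self.pending and len(self.inflight) < self.max_inflight:
            instance_id = self.pending.popleft()
            self.inflight.add(instance_id)  # update the in-flight record
            sent.append(instance_id)
        return sent

    def on_completion(self, instance_id):
        """Handle a completion message returned by the pipeline."""
        self.inflight.discard(instance_id)


control = ControlNode(instances=range(5), max_inflight=2)
first_batch = control.step()       # the cap allows two instances in flight
control.on_completion(0)           # instance 0 finished its backward pass
second_batch = control.step()      # one slot freed, so one more is sent
```

Capping in-flight instances is only one plausible reading of "rate criteria"; the point of the sketch is that the control node throttles injection based on its own bookkeeping rather than on a global synchronization barrier.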
Analysis powered by PatSnap Eureka. Patent text and figures publicly available from USPTO.
Claims
Claim Architecture Analysis
The patent contains 3 independent claims: Claim 1 (system/apparatus — full training apparatus), Claim 9 (apparatus — worker node), and Claim 17 (computer-implemented method at a worker node), providing tripartite structural coverage across the training system, individual node, and procedural dimensions. The 18 dependent claims yield a 6:1 dependent-to-independent ratio, which is at the lower end of norms for G06N-class AI/ML patents, suggesting moderate fallback depth. The strategy of having both a system-level apparatus claim (Claim 1) and a standalone worker-node apparatus claim (Claim 9) enables independent enforcement at the node level without requiring proof of the full training system.
Core inventive concept: The claims address the computational bottleneck and offline-training limitation of large neural networks by partitioning the neural network across a pipeline of worker nodes — each holding a subgraph in local memory — and enabling each worker node to "asynchronously update parameters of the subgraph of the neural network stored in the memory according to data in the received message," thereby eliminating synchronization barriers that would otherwise idle the pipeline. The asynchronous update mechanism, recited explicitly in Claims 2, 3, 10, 11, 18, and 19, is the central differentiator over conventional synchronous gradient-descent training.
Independent Claim Dissection
| Claim | Preamble | Transition | Key Body Elements |
| --- | --- | --- | --- |
| Claim 1 | A neural network training apparatus | comprising | a network of individual worker nodes forming a pipeline enabling different control flows for individual training/test instances; a control node configured to send training data instances to trigger parallelized message passing operations implementing a training algorithm; each worker node comprising a memory storing a neural network subgraph, and a processor programmed to receive a message and update subgraph parameters according to received message data |
| Claim 9 | A worker node of a neural network training apparatus | comprising | a memory storing a subgraph of a neural network; a processor configured to receive a message from a control node comprising training data instances triggering parallelized message passing operations, and update parameters of the neural network subgraph stored in memory according to data in the received message |
| Claim 17 | A computer implemented method at a worker node of a neural network training apparatus | comprising | accessing from memory a subgraph of a neural network; receiving a message from a control node comprising training data instances into a network of individual worker nodes triggering parallelized message passing operations; updating parameters of the subgraph according to data in the received message |
Claim Dependency Tree
Claim 1. Apparatus: pipeline of worker nodes + control node, parallelized message passing, subgraph memory, parameter update per message
Claim 11. Adds: updating occurs without reference to, and independently of, other parameter updates at other worker nodes
Claim 12. Adds: training data instances are a graphical representation of an organic molecule
Claim 13. Further: organic molecule includes rings of bonded atoms
Claim 14. Adds: accumulator accumulating gradients from other worker nodes; processor asynchronously updates parameters using accumulated gradients when criteria are met
Claim 15. Further: criteria comprise one or more of number of accumulated gradients, NN architecture type, data instance features, worker performance, communications performance, subgraph factors
Claim 16. Adds: memory is on-chip memory and subgraph parameters stored in on-chip memory
Claim 17. Method: accessing subgraph from memory, receiving control-node message with training data triggering parallelized message passing, updating subgraph parameters per received message
Claim 19. Adds: updating occurs without reference to, and independently of, other updates at other individual worker nodes
Claim 20. Adds: training data instances are a graphical representation of an organic molecule
Claim 21. Adds: updating subgraph parameters comprises computing at least one gradient of a loss function using data in the received message
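The accumulator limitation summarized for Claim 14 (accumulate gradients, then update once criteria are met) can be sketched as follows. The `AccumulatingWorker` class is hypothetical, and two choices are assumptions rather than anything stated in the claims: the criterion is a minimum count of accumulated gradients, and the update uses their mean.

```python
class AccumulatingWorker:
    """Hypothetical worker that buffers gradients before updating.

    Assumptions: the update criterion is a minimum number of accumulated
    gradients, and the update applies the mean of the buffered gradients.
    """

    def __init__(self, weight, lr, min_gradients):
        self.weight = weight
        self.lr = lr
        self.min_gradients = min_gradients
        self.accumulated = []

    def receive_gradient(self, grad):
        """Buffer a gradient; update and report True once the criterion is met."""
        self.accumulated.append(grad)
        if len(self.accumulated) < self.min_gradients:
            return False
        mean_grad = sum(self.accumulated) / len(self.accumulated)
        self.weight -= self.lr * mean_grad
        self.accumulated.clear()
        return True


worker = AccumulatingWorker(weight=1.0, lr=0.5, min_gradients=2)
updated = [worker.receive_gradient(g) for g in (1.0, 3.0, 2.0)]
```

In this run the second gradient triggers an update with mean gradient 2.0, and the third gradient stays buffered for the next update. Claim 15's other listed criteria (architecture type, worker performance, and so on) would replace or extend the simple count check.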
| Metric | This Application | Software / Cloud Norm |
| --- | --- | --- |
| Total claims | 21 | 15 – 30 |
| Independent claim count | 3 | 2 – 5 |
| Dependent : Independent ratio | 6.0 : 1 | 4 – 9 : 1 |
| Method claims present? | Yes (Claim 17) | Common |
| System / apparatus claims? | Yes (Claims 1, 9) | Always |
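The ratio in the table follows directly from the claim counts reported in this analysis. As a quick arithmetic check, using only numbers stated here (21 claims, independents 1, 9, and 17, and the 4 to 9 norm range from the table; the variable names are illustrative):

```python
total_claims = 21
independent_claims = {1, 9, 17}

# Every claim that is not independent is counted as dependent.
dependent_claims = [c for c in range(1, total_claims + 1)
                    if c not in independent_claims]

ratio = len(dependent_claims) / len(independent_claims)

# Norm range for the dependent : independent ratio, taken from the table.
norm_low, norm_high = 4, 9
within_norm = norm_low <= ratio <= norm_high
```

With 18 dependent claims over 3 independent claims the ratio is 6.0, which falls inside the table's stated norm range.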
Drafting Quality
Drafting Quality Signals
The claims demonstrate solid multi-tier apparatus and method coverage with consistent antecedent basis and strong spec-to-claim mapping across FIGS. 6–8 and the detailed description. However, two weaknesses stand out: the absence of a computer-readable medium (CRM) claim type, which is a notable strategic omission, and the repetition of the asynchronous update limitation across the dependents of three parallel independent claims, which uses up claim count without adding fallback positions. A continuation or reissue could address both.
Antecedent Basis
Antecedent basis is consistently maintained throughout the claim set. In Claim 1, "a network of individual worker nodes" is introduced and subsequently referenced as "the network of individual worker nodes" in the control node limitation. Claim 9 introduces "a memory" and "a processor" and uses these terms consistently in dependents (Claims 10–16). Claim 17's method steps follow the same pattern with "a memory" and "a subgraph" properly introduced before use. No orphaned "the" references were identified across the 21 claims.
All key claim limitations map to specific figures and passages. The "pipeline enabling different control flows" limitation in Claim 1 is directly supported by the detailed description at columns 3–4 discussing variable control flow and FIG. 10 (recurrent network example). The "asynchronously update parameters" limitation (Claims 2, 10, 18) maps to FIG. 8 (backward pass flow, step 806) and columns 7–8 describing the asynchronous update process. The "on-chip memory" limitation of Claims 8 and 16 maps directly to FIG. 11 (fast memory 1106) and columns 5–6 describing SRAM usage. No unsupported claim limitations were identified.
All three independent claims (1, 9, 17) use "comprising" as the transitional phrase, which is strategically optimal for this technology domain — it preserves open-ended coverage and prevents a competitor from designing around by adding additional components or steps. The use of "comprising" in Claim 1's worker node sub-limitation ("each of the individual worker nodes comprising") properly extends open-ended coverage to the node-level elements. No missed opportunities for "consisting essentially of" were identified, and the "comprising" choice is appropriate given the complex, multi-component system architecture.
No "means for" or "step for" language appears in any of the 21 claims, so no presumption of §112(f) treatment arises. Functional language is present: for example, Claim 1 recites "a processor programmed to: receive a message... and update parameters". The structural anchoring to a named "processor" weighs against §112(f) treatment under MPEP 2181 guidance. The specification at FIG. 11 and columns 11–12 provides detailed hardware description of processors 1102 and memory components, further insulating against §112(f) challenges. The risk is effectively managed.
The claims carry moderate Alice/Mayo exposure because the core inventive concept — asynchronously updating neural network parameters — could be characterized as an abstract mathematical operation (gradient-based parameter optimization) under Step 2A Prong 1 of the USPTO's 2019 Revised Guidance. The §101 defense rests primarily on the structural hardware tie-in: Claim 1 recites a specific pipeline of worker nodes each having a memory and processor, and Claims 8/16 add on-chip memory — these structural elements provide the "practical application" hook. However, the method claim (Claim 17) lacks explicit hardware recitation beyond memory access, making it more vulnerable than the apparatus claims if challenged under Alice Step 2. A stronger filing would have included hardware-specific limitations in Claim 17's body or added a CRM claim with specific structural memory limitations.
The dependent claim set has a structural redundancy problem: Claims 2, 10, and 18 all add the identical asynchronous update limitation to their respective independent claims; Claims 3, 11, and 19 all add the identical "without reference to, and independently of" qualifier; and Claims 4, 12, and 20 all add the identical organic molecule data limitation — resulting in 9 of 18 dependent claims being direct mirrors across three independent claims rather than adding novel fallback positions. The genuinely value-added dependents are Claim 14 (gradient accumulator with threshold-based update trigger), Claim 15 (multi-factor dynamic criteria adjustment), and Claim 7 (rate control), which provide meaningful prosecution fallback. A stronger filing would have redistributed the mirrored claims into unique technology-specific limitations such as specific loss functions, gradient clipping, or model compression at deployment.
The abstract accurately describes the training apparatus architecture and mentions asynchronous parameter updates, the core novel feature, but an examiner reading only the abstract might not identify the differentiation from prior synchronous distributed training systems. The abstract states "at least some of the message passing operations asynchronously update parameters of individual subgraphs of the neural network at the individual worker nodes", which captures the key limitation. It omits, however, the pipeline's capacity for different control flows per instance, which is the specific structural element recited in Claim 1 and which distinguishes this from simpler distributed training approaches. A more precisely drafted abstract would have highlighted the variable control flow capability as a primary distinguishing feature.
Figure coverage is comprehensive and well-matched to the claim set. FIG. 2 directly supports the "network of worker nodes" and "control node" elements of Claim 1; FIG. 4 supports the forward pass message passing limitation; FIG. 5 supports the backward pass and gradient update flow; FIG. 8 step 806 provides direct visual support for the "asynchronously update parameters" limitation of Claims 2, 10, and 18; and FIG. 11 supports the on-chip memory limitations of Claims 8 and 16. The only minor gap is that the Claim 1 limitation enabling "different control flows for individual training or test instances" relies primarily on FIG. 10 (the variable-length recurrent network) and text description rather than a dedicated flow diagram, which could have been more explicitly illustrated.
Scorecard
Strategic Intent Scorecard
Multi-dimensional assessment of this application's patent strategy quality, based on claim structure, specification depth, and prosecution positioning.
| Dimension | Score (out of 5.0) |
| --- | --- |
| Claim Breadth | 3.5 |
| Prosecution Defensibility | 3.8 |
| Spec–Claim Consistency | 4.5 |
| Dependent Claim Coverage | 3.0 |
| Claim Type Diversity | 3.0 |
| Figure Support Quality | 4.2 |
Key observation: Spec–Claim Consistency scores highest (4.5/5.0) because the claim limitations map precisely to named figures and passages. For example, the asynchronous update limitation maps to FIG. 8 step 806 and the on-chip memory limitation maps to FIG. 11's fast memory 1106, which minimizes written-description vulnerability. Dependent Claim Coverage ties with Claim Type Diversity for the lowest score (3.0/5.0) because 9 of 18 dependent claims are structural mirrors across three independent claims (Claims 2/10/18, 3/11/19, 4/12/20) that add no new fallback positions beyond what the other independent claims already provide, significantly reducing prosecution and post-grant flexibility. Practitioners should note that a continuation filing that elevates the gradient accumulator mechanism (currently only in Claim 14) to a standalone independent claim, and that adds a CRM claim, would substantially strengthen the patent family's enforcement profile.
A senior-attorney lens on the three highest-priority structural weaknesses — what each exposes in prosecution and litigation, and what a stronger filing would have done differently.
GAP 01 · HIGHEST IMPACT
No Computer-Readable Medium Claim Filed
The patent contains apparatus claims (Claims 1 and 9) and a method claim (Claim 17) but entirely omits a computer-readable medium (CRM) or computer-program-product claim covering the software implementation of the asynchronous training algorithm. This creates a direct design-around opportunity: a competitor that distributes software implementing the same asynchronous neural network training algorithm on commodity hardware without selling the hardware apparatus would fall outside all 21 claims. A stronger filing would have included a CRM claim reciting a non-transitory computer-readable medium storing instructions that, when executed, perform the steps of Claim 17, tying the software distribution channel to the claim scope and closing the software-only enforcement gap.
GAP 02 · HIGH IMPACT
Asynchronous Update Confined to Dependent Claims Only
The core differentiating feature — asynchronous parameter updating — is not recited in any of the three independent claims (Claims 1, 9, 17); it appears only in dependent Claims 2, 10, and 18. This means the independent claims are facially readable on synchronous distributed training systems, and an accused infringer who uses synchronous updates could argue the independent claims read on the prior art without any asynchronous limitation. The specific prosecution risk is that if Claims 2, 10, and 18 are invalidated (e.g., through IPR citing Plotnikova et al. 2015 or the Yu 2017 reference already cited), the remaining independent claims may be unenforceable against the very systems the patent was designed to cover. A stronger filing would have incorporated the asynchronous update limitation directly into the body of each independent claim, relying on dependent claims to add further specificity such as the gradient accumulator mechanism.
GAP 03 · HIGH IMPACT
Rate Control Mechanism Underprotected as Sole Dependent
US 12,099,927 B2 protects a neural network training apparatus and method in which a network of worker nodes, each storing a subgraph of a neural network in local memory, receives training data instances from a control node via parallelized message passing operations, and each worker node asynchronously updates its local subgraph parameters according to data in the received messages. The patent also protects the worker node apparatus itself and the corresponding computer-implemented method, covering the scenario where different training instances follow different control flow paths through the pipeline.
The patent is owned by Microsoft Technology Licensing, LLC, headquartered in Redmond, WA, USA. The inventors are Ryota Tomioka (Cambridge, GB), Matthew Alastair Johnson (Cambridge, GB), Daniel Stefan Tarlow (Cambridge, GB), Samuel Alexander Webster (Cambridge, GB), Dimitrios Vytiniotis (Cambridge, GB), Alexander Lloyd Gaunt (Cambridge, GB), and Maik Riechert (Cambridge, GB).
Claim 1 is a system/apparatus claim covering a full neural network training apparatus comprising a pipeline of worker nodes and a control node that sends training data to trigger parallelized message passing, with each worker node having memory and a processor to update subgraph parameters. Claim 9 is a standalone apparatus claim directed to an individual worker node comprising memory storing a neural network subgraph and a processor that receives control-node messages and updates subgraph parameters. Claim 17 is a computer-implemented method claim at a worker node covering accessing a subgraph from memory, receiving training data messages, and updating subgraph parameters accordingly.
This patent covers a way to train large neural networks faster and more efficiently by spreading the network across multiple computers (worker nodes), each handling a piece of the network. Instead of waiting for all computers to finish updating before moving on — as traditional training requires — each computer updates its portion of the network independently and immediately when it receives new information, a technique called asynchronous updating. This approach enables continuous, online training without the need for expensive graphics processing units and without having to wait for a full training cycle to complete before the network can be used.
G06N 3/08 (2023.01): Learning algorithms or learning methods for artificial neural networks.
G06N 3/04 (2023.01): Architecture of artificial neural networks, including feed-forward, recurrent, and graph-structured networks.
G06N 3/063 (2023.01): Physical realization (hardware implementation) of artificial neural networks using electronic means, including hardware acceleration implementations.
Disclaimer: This analysis is generated by PatSnap Eureka AI based on publicly available patent data from the USPTO. It does not constitute legal advice and should not be relied upon as such. Patent data may be subject to change as prosecution progresses. Scores and assessments reflect automated analysis and may not capture all relevant legal or technical nuances. Always consult a qualified patent attorney for formal legal opinions on patentability, freedom to operate, or infringement.