
Patent Drafting Analysis of Microsoft Technology Licensing’s Asynchronous Neural Network Training | US 12,099,927 B2

A structural and strategic analysis of Microsoft's distributed neural network training patent, examining claim architecture, drafting quality, critical gaps, and prosecution positioning across apparatus, worker-node, and method claim types.

US 12,099,927 B2 · Filed: Mar 28, 2022 · Granted: Sep 24, 2024 · Classifications: G06N 3/08, G06N 3/04, G06N 3/063
Spec words: 6,200 (across 6 sections)
Total claims: 21 (3 independent · 18 dependent)
Figure sheets: 11 (system architecture, message flow, and training schedules)
Published by PatSnap Insights Team · 12 min read
Overview

Structural Overview

The detailed description dominates at approximately 63% of total specification words, providing strong written-description support for the pipelined, asynchronous training architecture across 11 figures. The claim set comprises 21 claims with 3 independent claims — an apparatus (Claim 1), a worker-node apparatus (Claim 9), and a method (Claim 17) — supported by 18 dependent claims at a 6:1 dependent-to-independent ratio. Figure coverage is comprehensive, spanning system overview (FIG. 1), node topology (FIGS. 2–5), control/worker flow diagrams (FIGS. 6–8), timing schedules (FIG. 9), a recurrent network example (FIG. 10), and the computing hardware embodiment (FIG. 11).

Section Word Distribution

Detailed Description: 3,900 words
Claims: 1,950 words
Summary: 780 words
Background: 620 words
Brief Description of the Drawings: 550 words
Abstract: 200 words

Figure Inventory — 11 Sheets

Figure | Description | Role
FIG. 1 | Schematic of distributed neural network training system 104 connected to communications network 100, showing training data 102, control node 106, and deployed devices including smart phone 114, AR device 116, laptop 118, and smart watch 120. | System architecture
FIG. 2 | Schematic of control node 200 connected to six worker nodes W1–W6, each containing fast memory 204, and a training data store 202, illustrating the pipeline topology. | System architecture
FIG. 3 | Schematic of a layered neural network showing input layer 302, hidden layers 304–310, and output node 312, illustrating how the network is partitioned across worker nodes. | Claim support
FIG. 4 | Schematic of the pipeline of FIG. 2 with dashed arrows showing forward pass message paths 400 and 402, from control node 200 through worker nodes W1–W6. | System architecture
FIG. 5 | Schematic of the pipeline showing backward pass message paths 500, 502, and 504, with gradient updates propagated from worker node W6 back to W1 and completion message 504 returned to control node 200. | System architecture
FIG. 6 | Flow diagram of the control node operation, starting at 600, checking training/test instance availability 602, evaluating rate criteria 606, creating and sending messages 610, and updating inflight records 618. | Flow diagram
FIG. 7 | Flow diagram of a worker node during a forward process, receiving message 700, processing with local neural network subgraph 706, checking last node 708, sending forward message 710, and computing loss 718 when applicable. | Flow diagram
FIG. 8 | Flow diagram of a worker node during a backward process, receiving backward pass message 800, computing gradient 802, checking gradient threshold 804, asynchronously updating local parameters 806, and sending message to next node 812. | Flow diagram
FIG. 9 | Timing diagrams comparing three pipeline scheduling scenarios 902, 904, 906 on three machines over time, illustrating efficiency gains of asynchronous model parallelism versus synchronous update alternatives. | Claim support
FIG. 10 | Schematic of a variable-length recurrent neural network 1000 with branch node 1010, replicated neural network nodes 1004, 1006, 1008, classifier 1012, and downstream control/application 1016, illustrating variable-length input handling. | System architecture
FIG. 11 | Block diagram of exemplary computing-based device 1100 comprising processor 1102, fast memory 1106, communications interface 1104, memory 1108, operating system 1110, forward process 1112, and backward process 1114. | Claim support
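FIG. 9 is described as comparing pipeline schedules on three machines. The arithmetic behind such a comparison can be sketched with a deliberately simple toy model, which is an assumption for illustration and not the patent's own analysis: count only forward-pass time slots and assume each stage spends one slot per instance. With a synchronization barrier after every batch of B instances, a K-stage pipeline must fill and drain each time, giving utilization B / (B + K - 1); when instances stream continuously because parameters are updated asynchronously, the fill-and-drain cost is paid only once over the whole run. The function name `utilization` is hypothetical.

```python
def utilization(stages: int, instances_between_barriers: int) -> float:
    """Toy model: fraction of stage time slots doing useful work when a
    `stages`-deep pipeline fills and drains once per barrier interval.

    Assumes one time slot per stage per instance and ignores the backward pass.
    """
    if stages < 1 or instances_between_barriers < 1:
        raise ValueError("stages and instances_between_barriers must be >= 1")
    busy_slots = stages * instances_between_barriers
    # Each interval lasts (instances + stages - 1) slots because of the fill/drain bubble.
    total_slots = stages * (instances_between_barriers + stages - 1)
    return busy_slots / total_slots


if __name__ == "__main__":
    machines = 3  # echoes the three-machine schedules described for FIG. 9
    # Synchronous updates: a barrier after every batch of 4 instances.
    print(f"synchronous, batch of 4: {utilization(machines, 4):.3f}")
    # Asynchronous updates: 100 instances stream through with no intermediate barrier.
    print(f"asynchronous, 100 streamed: {utilization(machines, 100):.3f}")
```

Under these assumptions the synchronous case keeps the machines busy about two-thirds of the time, while the streamed case approaches full utilization as the run grows longer.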
Analysis powered by PatSnap Eureka. Patent text and figures publicly available from USPTO.
Claims

Claim Architecture Analysis

The patent contains 3 independent claims: Claim 1 (system/apparatus: the full training apparatus), Claim 9 (apparatus: a worker node), and Claim 17 (computer-implemented method at a worker node), providing tripartite structural coverage of the training system, the individual node, and the procedure. The 18 dependent claims yield a 6:1 dependent-to-independent ratio, which sits within the 4–9:1 range typical of software and cloud filings, suggesting moderate fallback depth. Pairing a system-level apparatus claim (Claim 1) with a standalone worker-node apparatus claim (Claim 9) enables enforcement at the node level without requiring proof of the full training system.

Core inventive concept: The claims address the computational bottleneck and offline-training limitation of large neural networks by partitioning the neural network across a pipeline of worker nodes, each holding a subgraph in local memory, and enabling each worker node to "asynchronously update parameters of the subgraph of the neural network stored in the memory according to data in the received message," thereby eliminating synchronization barriers that would otherwise idle the pipeline. The asynchronous update mechanism, recited expressly in dependent Claims 2, 10, and 18 and reinforced by the "without reference to, and independently of" qualifier in Claims 3, 11, and 19, is the central differentiator over conventional synchronous gradient-descent training.
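To make the mechanism concrete, here is a minimal single-process sketch under heavy simplifying assumptions: each "subgraph" is a single scalar weight, the whole network computes y = w1 · w2 · w3 · x, messages are simulated with an in-memory queue, and the loss is squared error. All names (`Worker`, `train`, `max_in_flight`) are hypothetical, and this is not the patent's implementation. What it illustrates is that each worker applies its gradient step as soon as its backward message arrives, using only its own cached input, while other instances are still in flight, so no worker waits on a global synchronization barrier.

```python
from collections import deque
from typing import Dict, List, Tuple


class Worker:
    """Holds one pipeline stage's parameters (here, a single scalar weight)."""

    def __init__(self, weight: float) -> None:
        self.weight = weight
        self.cache: Dict[int, float] = {}  # instance id -> input saved on the forward pass

    def forward(self, inst_id: int, x: float) -> float:
        self.cache[inst_id] = x
        return self.weight * x

    def backward(self, inst_id: int, grad_out: float, lr: float) -> float:
        x = self.cache.pop(inst_id)
        grad_in = grad_out * self.weight   # gradient passed to the previous stage
        self.weight -= lr * grad_out * x   # immediate local update: no barrier, no coordination
        return grad_in


def train(data: List[Tuple[float, float]], n_workers: int = 3, lr: float = 0.01,
          epochs: int = 200, max_in_flight: int = 2) -> List[Worker]:
    workers = [Worker(1.0) for _ in range(n_workers)]
    instances = deque((e * len(data) + i, x, y)
                      for e in range(epochs) for i, (x, y) in enumerate(data))
    queue: deque = deque()  # messages: (direction, stage, instance id, value, target)
    in_flight = 0
    while instances or queue:
        # Control node: inject a new instance while under the in-flight limit.
        if instances and in_flight < max_in_flight:
            inst_id, x, y = instances.popleft()
            queue.append(("fwd", 0, inst_id, x, y))
            in_flight += 1
        direction, stage, inst_id, value, target = queue.popleft()
        worker = workers[stage]
        if direction == "fwd":
            out = worker.forward(inst_id, value)
            if stage + 1 < n_workers:
                queue.append(("fwd", stage + 1, inst_id, out, target))
            else:
                # Last stage: gradient of 0.5 * (out - target)**2 starts the backward pass.
                queue.append(("bwd", stage, inst_id, out - target, target))
        else:
            grad_in = worker.backward(inst_id, value, lr)
            if stage > 0:
                queue.append(("bwd", stage - 1, inst_id, grad_in, target))
            else:
                in_flight -= 1  # completion returned to the control node
    return workers


if __name__ == "__main__":
    samples = [(x, 2.0 * x) for x in (0.5, 1.0, -1.0, 1.5)]
    ws = train(samples)
    print("learned product of weights:", ws[0].weight * ws[1].weight * ws[2].weight)
```

Because a worker may compute an instance's gradient after another instance has already changed a weight, some updates are slightly stale; that is the trade-off asynchronous schemes accept in exchange for keeping the pipeline busy.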

Independent Claim Dissection

Claim | Preamble | Transition | Key Body Elements
Claim 1 | A neural network training apparatus | comprising | a network of individual worker nodes forming a pipeline enabling different control flows for individual training/test instances; a control node configured to send training data instances to trigger parallelized message passing operations implementing a training algorithm; each worker node comprising a memory storing a neural network subgraph, and a processor programmed to receive a message and update subgraph parameters according to received message data
Claim 9 | A worker node of a neural network training apparatus | comprising | a memory storing a subgraph of a neural network; a processor configured to receive a message from a control node comprising training data instances triggering parallelized message passing operations, and update parameters of the neural network subgraph stored in memory according to data in the received message
Claim 17 | A computer implemented method at a worker node of a neural network training apparatus | comprising | accessing from memory a subgraph of a neural network; receiving a message from a control node comprising training data instances into a network of individual worker nodes triggering parallelized message passing operations; updating parameters of the subgraph according to data in the received message

Claim Dependency Tree

1. Apparatus: pipeline of worker nodes + control node, parallelized message passing, subgraph memory, parameter update per message
2. Adds: updating parameters comprises asynchronously updating subgraph parameters
3. Adds: updating occurs without reference to, and independently of, other updates at other worker nodes
4. Adds: training data instances are a graphical representation of an organic molecule
5. Further: organic molecule includes rings of bonded atoms
6. Adds: control node keeps record of number of in-flight training data instances in the network
7. Adds: control node controls rate at which it sends training data instances into the network
8. Adds: worker nodes comprise on-chip memory and subgraph parameters are stored in on-chip memory
9. Worker node apparatus: memory storing neural network subgraph, processor receiving control-node message and updating subgraph parameters per message data
10. Adds: updating parameters comprises asynchronously updating subgraph parameters
11. Adds: updating occurs without reference to, and independently of, other parameter updates at other worker nodes
12. Adds: training data instances are a graphical representation of an organic molecule
13. Further: organic molecule includes rings of bonded atoms
14. Adds: accumulator accumulating gradients from other worker nodes; processor asynchronously updates parameters using accumulated gradients when criteria are met
15. Further: criteria comprise one or more of number of accumulated gradients, NN architecture type, data instance features, worker performance, communications performance, subgraph factors
16. Adds: memory is on-chip memory and subgraph parameters stored in on-chip memory
17. Method: accessing subgraph from memory, receiving control-node message with training data triggering parallelized message passing, updating subgraph parameters per received message
18. Adds: updating comprises asynchronously updating subgraph parameters
19. Adds: updating occurs without reference to, and independently of, other updates at other individual worker nodes
20. Adds: training data instances are a graphical representation of an organic molecule
21. Adds: updating subgraph parameters comprises computing at least one gradient of a loss function using data in the received message
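Claims 6 and 7, together with the control-node flow of FIG. 6 (checking instance availability, evaluating rate criteria, sending messages, and updating in-flight records), can be sketched as follows. This is a hypothetical illustration: the class and method names are invented, and a fixed cap on in-flight instances stands in for whatever rate criteria the specification actually describes.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Set


class ControlNode:
    """Hypothetical control node that records in-flight instances and throttles injection."""

    def __init__(self, instances: Iterable[int], max_in_flight: int) -> None:
        self.pending: Deque[int] = deque(instances)
        self.in_flight: Set[int] = set()
        self.max_in_flight = max_in_flight

    def rate_ok(self) -> bool:
        # Assumed rate criterion: keep the in-flight count under a fixed cap.
        return len(self.in_flight) < self.max_in_flight

    def pump(self, send: Callable[[int], None]) -> int:
        """Send instances while any remain and the rate criterion allows; return the count sent."""
        sent = 0
        while self.pending and self.rate_ok():
            inst = self.pending.popleft()
            send(inst)                # create and send a message into the pipeline
            self.in_flight.add(inst)  # update the in-flight record
            sent += 1
        return sent

    def on_complete(self, inst: int) -> None:
        # Called when the first worker reports that the backward pass for `inst` finished.
        self.in_flight.discard(inst)


if __name__ == "__main__":
    sent_log = []
    node = ControlNode(range(5), max_in_flight=2)
    node.pump(sent_log.append)   # sends 0 and 1, then the cap is reached
    node.on_complete(0)
    node.pump(sent_log.append)   # one slot freed, so instance 2 is sent
    print(sent_log, sorted(node.in_flight))  # [0, 1, 2] [1, 2]
```

A production rate controller could weigh other signals, such as worker or network load; the fixed cap is only the simplest stand-in.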
Metric | This Application | Software / Cloud Norm
Total claims | 21 | 15–30
Independent claim count | 3 | 2–5
Dependent : Independent ratio | 6.0 : 1 | 4–9 : 1
Method claims present? | Yes (Claim 17) | Common
System / apparatus claims? | Yes (Claims 1, 9) | Always
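The derived rows of the metrics table follow directly from the claim counts. A short helper, with hypothetical names, recomputes the dependent count and ratio and checks each figure against the norm ranges stated in the table; it confirms that the 6.0:1 ratio sits inside the 4–9:1 band.

```python
from typing import Dict, Tuple

# Norm ranges copied from the metrics table (the page's own benchmarks).
NORMS: Dict[str, Tuple[float, float]] = {
    "total": (15, 30),
    "independent": (2, 5),
    "dep_ratio": (4.0, 9.0),
}


def claim_metrics(total: int, independent: int) -> Dict[str, float]:
    """Derive the dependent-claim count and dependent-to-independent ratio."""
    dependent = total - independent
    return {
        "total": total,
        "independent": independent,
        "dependent": dependent,
        "dep_ratio": dependent / independent,
    }


def within_norms(metrics: Dict[str, float]) -> Dict[str, bool]:
    """Report, per metric, whether it falls inside its inclusive norm range."""
    return {key: lo <= metrics[key] <= hi for key, (lo, hi) in NORMS.items()}


if __name__ == "__main__":
    m = claim_metrics(total=21, independent=3)
    print(m)                # dependent = 18, dep_ratio = 6.0
    print(within_norms(m))  # every metric falls inside its norm range
```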
Drafting Quality

Drafting Quality Signals

The claims demonstrate solid multi-tier apparatus and method coverage with consistent antecedent basis and strong spec-to-claim mapping across FIGS. 6–8 and the detailed description. However, the absence of a computer-readable medium (CRM) claim and the placement of the asynchronous update limitation only in mirrored dependent claims (Claims 2, 10, and 18) under all three independent claims are notable strategic omissions that a continuation or reissue could address.

Antecedent Basis
Antecedent basis is consistently maintained throughout the claim set. In Claim 1, "a network of individual worker nodes" is introduced and subsequently referenced as "the network of individual worker nodes" in the control node limitation. Claim 9 introduces "a memory" and "a processor" and uses these terms consistently in dependents (Claims 10–16). Claim 17's method steps follow the same pattern with "a memory" and "a subgraph" properly introduced before use. No orphaned "the" references were identified across the 21 claims.
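A first-pass antecedent-basis screen can be automated with a crude heuristic: for each claim term of interest, find its first occurrence preceded by an article and flag it if that article is "the" or "said" rather than "a" or "an". The sketch is hypothetical and deliberately simple; it ignores terms inherited from parent claims, plurals, and intervening modifiers (so "the received message" is not matched as "the message"), and the sample text is an invented paraphrase rather than claim language from this patent.

```python
import re
from typing import List


def missing_antecedents(claim: str, terms: List[str]) -> List[str]:
    """Flag terms whose first article-marked occurrence uses 'the' or 'said'.

    Crude heuristic: only an article immediately before the term is considered,
    and terms introduced in parent claims are not tracked.
    """
    flagged = []
    for term in terms:
        pattern = re.compile(r"\b(a|an|the|said)\s+" + re.escape(term) + r"\b", re.IGNORECASE)
        first = pattern.search(claim)  # leftmost article + term occurrence
        if first and first.group(1).lower() in ("the", "said"):
            flagged.append(term)
    return flagged


# Invented sample text, not claim language from US 12,099,927 B2.
SAMPLE_CLAIM = (
    "A worker node comprising: a memory storing a subgraph of a neural network; "
    "and a processor configured to receive a message, update parameters of the "
    "subgraph stored in the memory, and forward the gradient to the accumulator."
)
TERMS = ["memory", "subgraph", "processor", "message", "gradient", "accumulator"]

if __name__ == "__main__":
    print(missing_antecedents(SAMPLE_CLAIM, TERMS))  # ['gradient', 'accumulator']
```

A clean result from a screen like this is only a starting point; dependent claims still need to be read against the full chain of claims they depend from.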
Spec–Claim Consistency
All key independent claim limitations map to specific figures and paragraphs. The "pipeline enabling different control flows" limitation in Claim 1 is directly supported by the detailed description at columns 3–4 discussing variable control flow and FIG. 10 (recurrent network example). The "asynchronously update parameters" limitation (Claims 2, 10, 18) maps to FIG. 8 (backward pass flow, step 806) and columns 7–8 describing the asynchronous update process. The "on-chip memory" limitation of Claims 8 and 16 maps directly to FIG. 11 (fast memory 1106) and columns 5–6 describing SRAM usage. No unsupported claim limitations were identified.
Transition Word Usage
All three independent claims (1, 9, 17) use "comprising" as the transitional phrase, which is well suited to this technology domain: it preserves open-ended coverage and prevents a competitor from designing around the claims simply by adding components or steps. The use of "comprising" in Claim 1's worker node sub-limitation ("each of the individual worker nodes comprising") properly extends open-ended coverage to the node-level elements. No missed opportunities for "consisting essentially of" were identified, and the "comprising" choice is appropriate given the complex, multi-component system architecture.
§112(f) Means-Plus-Function Risk
No "means for" or "step for" language appears in any of the 21 claims, which creates a rebuttable presumption that §112(f) does not apply. Functional language is present (for example, Claim 1 recites "a processor programmed to: receive a message... and update parameters"), but anchoring those functions to a named "processor" generally supports treating the term as reciting sufficient structure under MPEP 2181 guidance. The specification at FIG. 11 and columns 11–12 describes processor 1102 and the memory components in hardware terms, further insulating against §112(f) challenges. The risk appears well managed.
⚠️
§101 Eligibility Risk
The claims carry moderate Alice/Mayo exposure because the core inventive concept — asynchronously updating neural network parameters — could be characterized as an abstract mathematical operation (gradient-based parameter optimization) under Step 2A Prong 1 of the USPTO's 2019 Revised Guidance. The §101 defense rests primarily on the structural hardware tie-in: Claim 1 recites a specific pipeline of worker nodes each having a memory and processor, and Claims 8/16 add on-chip memory — these structural elements provide the "practical application" hook. However, the method claim (Claim 17) lacks explicit hardware recitation beyond memory access, making it more vulnerable than the apparatus claims if challenged under Alice Step 2. A stronger filing would have included hardware-specific limitations in Claim 17's body or added a CRM claim with specific structural memory limitations.
⚠️
Dependent Claim Fallback Quality
The dependent claim set has a structural redundancy problem: Claims 2, 10, and 18 all add the identical asynchronous update limitation to their respective independent claims; Claims 3, 11, and 19 all add the identical "without reference to, and independently of" qualifier; and Claims 4, 12, and 20 all add the identical organic molecule data limitation. These three triplets make 9 of the 18 dependent claims direct mirrors across the three independent claims rather than novel fallback positions, and two further pairs (Claims 5/13 on molecular rings and Claims 8/16 on on-chip memory) repeat the same pattern. The genuinely value-added dependents are Claim 14 (gradient accumulator with threshold-based update trigger), Claim 15 (multi-factor dynamic criteria adjustment), and Claim 7 (rate control), which provide meaningful prosecution fallback. A stronger filing would have redistributed the mirrored claims into unique technology-specific limitations such as specific loss functions, gradient clipping, or model compression at deployment.
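Claim 14's accumulator can be illustrated with a minimal sketch that implements only one of the Claim 15 criteria, the number of accumulated gradients. The class and its parameters are hypothetical, and averaging the accumulated gradients before a plain gradient-descent step is an assumption, since the claim summary says only that parameters are updated "using accumulated gradients when criteria are met."

```python
from typing import List


class AccumulatingWorker:
    """Hypothetical worker that holds a parameter vector for its subgraph and
    accumulates incoming gradients until an update criterion is met."""

    def __init__(self, params: List[float], lr: float, threshold: int) -> None:
        self.params = list(params)
        self.lr = lr
        self.threshold = threshold  # criterion: number of accumulated gradients
        self._sum = [0.0] * len(params)
        self._count = 0

    def receive_gradient(self, grad: List[float]) -> bool:
        """Accumulate one gradient; return True if this triggered a parameter update."""
        for i, g in enumerate(grad):
            self._sum[i] += g
        self._count += 1
        if self._count >= self.threshold:
            self._apply()
            return True
        return False

    def _apply(self) -> None:
        # Average the accumulated gradients and take one local gradient-descent step,
        # without coordinating with any other worker.
        for i in range(len(self.params)):
            self.params[i] -= self.lr * self._sum[i] / self._count
        self._sum = [0.0] * len(self.params)
        self._count = 0


if __name__ == "__main__":
    w = AccumulatingWorker([1.0, -1.0], lr=0.5, threshold=2)
    w.receive_gradient([0.2, 0.4])   # below threshold: parameters unchanged
    w.receive_gradient([0.6, 0.0])   # threshold reached: one averaged update applied
    print(w.params)
```

Other Claim 15 factors, such as worker or communications performance, could be folded into the trigger condition, but how the patent combines them is not described here.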
⚠️
Abstract Quality
The abstract accurately describes the training apparatus architecture and mentions asynchronous parameter updates, the core novel feature, but an examiner reading only the abstract might not see how the invention differs from prior synchronous distributed training systems. The abstract states "at least some of the message passing operations asynchronously update parameters of individual subgraphs of the neural network at the individual worker nodes," which captures the key limitation, but it omits the pipeline's capacity for different control flows per instance, the structural element recited in the body of Claim 1 that distinguishes this system from simpler distributed training approaches. A more precisely drafted abstract would have highlighted the variable control flow capability as a primary distinguishing feature.
Figure Support Quality
Figure coverage is comprehensive and well matched to the claim set. FIG. 2 directly supports the "network of worker nodes" and "control node" elements of Claim 1; FIG. 4 supports the forward pass message passing limitation; FIG. 5 supports the backward pass and gradient update flow; FIG. 8 step 806 provides direct visual support for the "asynchronously update parameters" limitation of Claims 2, 10, and 18; and FIG. 11 supports the on-chip memory limitations of Claims 8 and 16. The only minor gap is that the Claim 1 body limitation enabling "different control flows for individual training or test instances" relies primarily on FIG. 10 (the variable-length recurrent network) and the text rather than on a dedicated flow diagram, so it could have been illustrated more explicitly.
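The "different control flows for individual training or test instances" limitation can be pictured with a toy routing rule loosely modeled on the FIG. 10 description of a branch node feeding replicated recurrent nodes and a classifier. In this hypothetical sketch each message carries its own position in its input sequence, so a short instance leaves the recurrent loop early while a long one loops longer, with no pipeline-wide schedule dictating the path. The stage names and `Message` fields are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    """Hypothetical forward message carrying an instance's own routing state."""
    instance_id: int
    tokens: List[int]
    position: int = 0
    path: List[str] = field(default_factory=list)


def next_stage(msg: Message, current: str) -> str:
    """Routing rule for a toy variable-length recurrent pipeline."""
    if current == "input":
        return "recurrent" if msg.tokens else "classifier"
    if current == "recurrent":
        # Branch: loop back while tokens remain, otherwise exit to the classifier.
        return "recurrent" if msg.position < len(msg.tokens) else "classifier"
    if current == "classifier":
        return "done"
    raise ValueError(f"unknown stage {current!r}")


def run(msg: Message) -> List[str]:
    """Route one message to completion and return the stages it visited."""
    stage = "input"
    while stage != "done":
        msg.path.append(stage)
        if stage == "recurrent":
            msg.position += 1  # consume one token per recurrent step
        stage = next_stage(msg, stage)
    return msg.path


if __name__ == "__main__":
    for i, tokens in enumerate(([7, 8, 9], [5], [])):
        print(i, run(Message(i, tokens)))
```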
Scorecard

Strategic Intent Scorecard

Multi-dimensional assessment of this application's patent strategy quality, based on claim structure, specification depth, and prosecution positioning.

Claim Breadth: 3.5 / 5
Prosecution Defensibility: 3.8 / 5
Spec–Claim Consistency: 4.5 / 5
Dependent Claim Coverage: 3.0 / 5
Claim Type Diversity: 3.0 / 5
Figure Support Quality: 4.2 / 5
Key observation: Spec–Claim Consistency scores highest (4.5/5.0) because every structural limitation in Claims 1, 9, and 17 maps to named figures and paragraphs; for example, the asynchronous update limitation maps to FIG. 8 step 806 and the on-chip memory limitation maps to FIG. 11's fast memory 1106, leaving little apparent written-description vulnerability. Dependent Claim Coverage scores lowest (3.0/5.0, tied with Claim Type Diversity) because 9 of 18 dependent claims are structural mirrors across the three independent claims (Claims 2/10/18, 3/11/19, 4/12/20) that add no fallback positions beyond those already available under the other independent claims, significantly reducing prosecution and post-grant flexibility. Practitioners should note that a continuation filing that recasts the gradient accumulator mechanism (currently only in Claim 14) as a standalone independent claim, together with a CRM claim, would substantially strengthen the patent family's enforcement profile.
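The mirror count can be reproduced from the dependency tree. In the sketch below, each dependent claim is tagged with a shorthand label for the limitation it adds; the labels are hypothetical paraphrases of the dependency-tree summaries, not claim text. Grouping identical labels recovers the three triplets (9 claims) and also surfaces two further duplicated pairs, Claims 5/13 and 8/16.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical shorthand labels for the limitation each dependent claim adds.
ADDED_LIMITATION: Dict[int, str] = {
    2: "async-update", 3: "independent-update", 4: "organic-molecule",
    5: "molecule-rings", 6: "in-flight-record", 7: "rate-control",
    8: "on-chip-memory", 10: "async-update", 11: "independent-update",
    12: "organic-molecule", 13: "molecule-rings", 14: "gradient-accumulator",
    15: "accumulator-criteria", 16: "on-chip-memory", 18: "async-update",
    19: "independent-update", 20: "organic-molecule", 21: "loss-gradient",
}


def mirrored_groups(added: Dict[int, str]) -> Dict[str, List[int]]:
    """Group dependent claims that add the same limitation; keep only repeats."""
    by_label: Dict[str, List[int]] = defaultdict(list)
    for claim in sorted(added):
        by_label[added[claim]].append(claim)
    return {label: claims for label, claims in by_label.items() if len(claims) > 1}


if __name__ == "__main__":
    groups = mirrored_groups(ADDED_LIMITATION)
    for label, claims in groups.items():
        print(f"{label}: {claims}")
    triplet_claims = sum(len(c) for c in groups.values() if len(c) == 3)
    print("claims in mirrored triplets:", triplet_claims)
    print("claims in any mirrored group:", sum(len(c) for c in groups.values()))
```

Under this labeling, only Claims 6, 7, 14, 15, and 21 add limitations that appear nowhere else in the claim set.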
Critical Gaps

3 Critical Gaps in This Claim Set

A senior-attorney lens on the three highest-priority structural weaknesses — what each exposes in prosecution and litigation, and what a stronger filing would have done differently.

1. Missing CRM software-distribution claim
2. Async update excluded from independent claims
3. Rate control mechanism underprotected in dependents
Disclaimer: This analysis is generated by PatSnap Eureka AI based on publicly available patent data from the USPTO. It does not constitute legal advice and should not be relied upon as such. Patent data may be subject to change as prosecution progresses. Scores and assessments reflect automated analysis and may not capture all relevant legal or technical nuances. Always consult a qualified patent attorney for formal legal opinions on patentability, freedom to operate, or infringement.
