Patent Drafting Analysis of STMicroelectronics’ Convolutional Network MAC Hardware Accelerator | US 11,740,870 B2
A structural and strategic analysis of US 11,740,870 B2 covering claim architecture, drafting quality signals, dependency coverage, prosecution positioning, and critical gaps in STMicroelectronics' digit-serial multiply-accumulate accelerator patent.
Structural Overview
The detailed description dominates at approximately 56% of total words, reflecting deep technical exposition across 56 columns covering digit-serial architectures, folding transformations, gating techniques, and sign-magnitude coding. The claim set comprises 28 claims with 6 independent claims spanning system, mobile device, method, and MAC accelerator apparatus types, which gives enforcement coverage across several claim categories. The 50 drawing sheets, an unusually large set for a semiconductor patent, cover circuit schematics, timing tables, data-flow diagrams, and histograms, and give thorough visual support for the disclosed embodiments.
Section Word Distribution
Figure Inventory: 50 Sheets
| Figure | Description | Role |
|---|---|---|
| FIG. 1 | Schematic representation of a neural network showing input layer, hidden layers 1 and 2, and output layer connections. | Other |
| FIG. 2 | Example of ANN operation for image classification showing input image, inference block, and output class probabilities. | Other |
| FIG. 3 | Embodiment of artificial neuron scheme compared with a biological neuron, showing dendrites, nucleus, axon, and mathematical model. | Other |
| FIG. 4 | Examples of activation functions including Sigmoid, Hyperbolic Tangent, Rectified Linear Unit (ReLU), and Leaky ReLU. | Other |
| FIG. 5 | Example connections between neuron 501 of a first convolutional layer and neurons in region 505 and region 507 of the previous layer. | Other |
| FIG. 6 | Example steps of a convolution computation showing kernel 601 superimposed over input feature map 605 and corresponding output matrix 610. | Claim support |
| FIG. 7 | Example operation of a convolutional layer with M depth-slices on N input feature maps having C channels, showing input fmaps and filters. | Other |
| FIG. 8A | Example of discrete convolution operation with kernel 601 and padded input 805p, illustrating nine computation steps. | Claim support |
| FIG. 8B | Enlarged portions of FIG. 8A showing discrete convolution with padded input 805p and kernel 601 positions. | Claim support |
| FIG. 8C | Enlarged portions of FIG. 8A showing further discrete convolution steps with kernel 601 and padded input 805p. | Claim support |
| FIG. 8D | Enlarged portions of FIG. 8A showing final discrete convolution steps with kernel 601 and padded input 805p. | Claim support |
| FIG. 9 | Example operation of Max-pooling and Average-pooling layers with 2x2 pooling and stride 2 applied to a 4x4 input matrix. | Other |
| FIG. 10 | Embodiment of a bit-serial adder architecture showing a full-adder with inputs A0, B0, carry register, TIME signal, and output X0. | Key embodiment |
| FIG. 11 | Schematic representation of a digit-serial layout cell showing Cap, Bit 0 through Bit N-1, and Control sections. | Key embodiment |
| FIG. 12 | Graphical representation of example area occupation versus digit size showing predicted and measured relative area curves. | Other |
| FIG. 13 | Example of throughput versus digit-size for different word sizes (8b, 12b, 16b, 24b) showing sample rate in MHz. | Other |
| FIG. 14 | Example illustration of area-time product (AT efficiency measure) versus digit-size showing optimal efficiency at small digit sizes. | Other |
| FIG. 15 | Example scheme of a synchronous system showing combinational logic block M with inputs A, B, C, outputs X, Y, Z, and feedback loops with delay elements. | Key embodiment |
| FIG. 16 | Example scheme of a circuit obtained using bit-level unfolding transformation showing blocks M(0), M(1), through M(N-1) with combinatorial logic. | Key embodiment |
| FIG. 17 | Example digit-serial adder circuit with digit-size 3 showing three adder stages with inputs A0/B0, A1/B1, A2/B2 and TIME signal. | Key embodiment |
| FIG. 18 | Data-flow of a parallel W-bits x W-bits multiplier unit showing digit inputs, parallel multiply stages M0 through MP-1, and digit outputs. | Key embodiment |
| FIG. 19 | Data-flow of a digit-serial multiplier obtained using folding transformation showing parallel_in, digit_in, parallel x digit block H, and Ldigit_out/Hdigit_out outputs. | Key embodiment |
| FIG. 20 | Example two's complement computation of -5x5 showing partial product rows and final sum via sign bit extension chain. | Other |
| FIG. 21 | Example two's complement computation of -5x7 showing partial products with last partial product inversion and carry-in addition. | Other |
| FIG. 22 | Embodiment of a signed digit-serial multiplier unit with digit-size 2 and word size 4 implemented using carry-save array multiplier architecture with DELAY, summing arrays, PARALLEL/SERIAL and SERIAL/PARALLEL blocks. | Key embodiment |
| FIG. 23 | Embodiment of a bit-serial multiplier unit with factors of word size 4 bits showing multiply-accumulate chain with delay registers D and coefficient inputs a3, a2, a1, a0. | Key embodiment |
| FIG. 24 | Embodiment of multiplier unit 2401 having multiple digit-cells 2405, with expanded view of a single digit-cell including partial product generator, carry save array, and digit-to-digit-serial component. | Key embodiment |
| FIG. 25 | Example embodiment of a bit-serial multiplier unit and digit cell with partial product generator, carry save array, showing inputs B and A with digit cells b3b2b1b0. | Key embodiment |
| FIG. 26 | Embodiment of a bit-serial multiplier unit and digit cell showing dual partial product generators with inputs A_low and A_up and inputs b3b2b1b0. | Key embodiment |
| FIG. 27 | Embodiment of data-flow of accumulator unit or circuit showing chained digit_adder stages (digit_adder0 through digit_adderP-1) with carry propagation and digit outputs. | Key embodiment |
| FIG. 28 | Data-flow of a digit-serial accumulation unit showing digit_adder (H_A) block with digit_in, partial delay (P-delta), and digit_out. | Key embodiment |
| FIG. 29 | Architecture obtained using word-level unfolding showing input X through P/S converter, Stream 1 through Stream P processing, and S/P converter to output Y. | Key embodiment |
| FIG. 30 | Embodiment of a digit-serial architecture-based MAC hardware accelerator showing feature input, ds_mult0 through ds_mult(aP-1) multiplier chain, ds_acc0 through ds_acc(aP-1) accumulators, digit-serial converter, and parallel accumulation output. | Key embodiment |
| FIG. 31 | Example embodiment of an accumulation digit-cell showing N-bit_adder, N-bit_FF_stage (register), c_out flip-flop, and new_acc control signal. | Key embodiment |
| FIG. 32 | Embodiment of a digit-cell of an example digit-serial accumulation architecture showing adder_i, register_(i+1), and acc_out_(i+1) with carry out flip-flop. | Key embodiment |
| FIG. 33 | Embodiment of a digit-serial accumulation architecture showing four d_acc_cells (d_acc_cell0 through d_acc_cell3) with mult_out inputs and d_accout/c_out outputs. | Key embodiment |
| FIG. 34 | Architecture of digit-serial multiplier unit embodiment showing W-bit_FF_stage, digit x parallel multiplier, W-bit_FF_stage (partial sum register), combinatorial digit-shifter, and digit serial output. | Key embodiment |
| FIG. 35 | Example embodiment of a digit-serial architecture-based MAC hardware accelerator showing feature input, mem. word digit 0-3 multipliers with carry signals, s2p converter, and accumulation output. | Key embodiment |
| FIG. 36 | Control logic embodiment for generating temp and new_acc control signals for multiplier units 0-3 and accumulators, including a modulus-P counter. | Flow diagram |
| FIG. 37 | Embodiment of an output converter showing count[1:0] and last signals, four NFF shift register chains, 2:1 multiplexers, and acc_out_lshw output. | Key embodiment |
| FIG. 38 | Example histograms of trained weights of convolutional layers conv1 through conv5 showing distribution concentrated near zero. | Claim support |
| FIG. 39 | Embodiment of gating logic showing enable signal generation, gated clock generation block, and gated functional unit with clk input. | Claim support |
| FIG. 40 | Embodiment of flip-flop based gating logic showing enable signal generation, D flip-flop with OR gate, and en_gclk/gclk outputs to gated functional unit. | Claim support |
| FIG. 41A | Embodiment of gating logic showing OR gate combining digit_input_i bits with temp_i, D flip-flop generating en_gclk, AND gate producing gclk for gated multiplier unit. | Claim support |
| FIG. 41B | Example schematic timing information for gating logic of FIG. 41A showing clk, mem_out, en, en_gclk, gclk, and mult_in waveforms. | Claim support |
| FIG. 42 | MAC unit block diagram showing feature input, parallel input, four digit input/sign ports (mem. word digit 0-3 with digit_N[3]), weight input from memory, acc_output, and batch result. | System architecture |
| FIG. 43 | Embodiment of array multiplier structure showing MB cells arranged with feature sign inputs a0, a1 and partial sum input to generate multiplication results. | Key embodiment |
| FIG. 44 | Embodiment of a modified carry save array multiplier structure for sign-magnitude multiplication showing MB cells with a0, a1 inputs and b3b2b1b0 digits. | Key embodiment |
| FIG. 45 | Example histogram of weights of a trained SqueezeNet convolutional layer showing weight values distribution from -0.15 to 0.25. | Claim support |
| FIG. 46 | Schematic design flow for power consumption estimation showing RTL Verilog, synthesis, gate-level simulation, switching activity annotation, and Synopsys PrimeTime power analysis steps. | Other |
| FIG. 47 | Reference architecture diagram showing five-pipeline-stage pipelined bit-parallel MAC with WxN multipliers 0-3 (pipe1-pipe4) and accumulation unit (pipe5). | Other |
| FIG. 48 | Schematic diagram of exemplary electronic processor-based device 4800 including NN accelerator 4801, processor circuitry 4802, memory circuitry 4804, GPU 4812, and various I/O interfaces. | System architecture |
| FIG. 49 | Embodiment of processor-based device 4900 showing convolution accelerator 4901 operating with SoC 4910 and co-processor subsystem 4915 including dual DSP clusters and global RAM. | System architecture |
| FIG. 50 | Table 1 timing diagram of digit-serial MAC circuit showing 27 clock cycles of MAC unit 0-3 operations across accumulation sets acc1-acc3 for MACs 1-8. | Claim support |
| FIG. 51 | Table 2 sequence of operations computed by the cluster multiplier unit showing 15 clock cycles across multiplier units 0-3 for MAC1-MAC4 operations. | Claim support |
| FIG. 52 | Table 3 example of standard weight memorization strategy for P=4, showing memory locations and digit columns (Digit 3, 2, 1, 0) with weight assignments iw_n. | Claim support |
| FIG. 53 | Table 4 alternative weight memorization strategy for P=4 digit-serial MAC architecture showing interleaved digit storage across memory words. | Claim support |
| FIG. 54 | Table 5 example relationship between count[1:0] counter values, accumulation register ports, digit-serial converter assignments (conv1/conv2), and bit-parallel output. | Claim support |
| FIG. 55 | Table 6 example of trained weights of AlexNet convolutional layers conv1-conv5 showing weights range, 95% occurrences interval, biggest magnitude boundary, and corresponding binary word. | Claim support |
| FIG. 56 | Table 7 area comparison between described digit-serial MAC architecture (3897 gate count) and reference bit-parallel architecture (3665 gate count). | Other |
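The figure inventory pairs weight histograms concentrated near zero (FIGS. 38 and 45) with a sign-magnitude multiplier structure (FIG. 44). One plausible reading of why the two appear together can be sketched in a few lines of Python: a small negative weight has many zero digits in sign-magnitude form but almost none in two's complement. The word size of 8 bits, the digit size of 2 bits, the helper names, and the choice to carry the sign separately from the magnitude digits are all assumptions made for illustration, not details taken from the patent.

```python
def split_digits(value, word_bits=8, digit_bits=2):
    """Split an unsigned word into digits, least significant digit first."""
    mask = (1 << digit_bits) - 1
    return [(value >> shift) & mask for shift in range(0, word_bits, digit_bits)]


def twos_complement_digits(value, word_bits=8, digit_bits=2):
    """Digits of the two's complement encoding of a signed value."""
    encoded = value & ((1 << word_bits) - 1)
    return split_digits(encoded, word_bits, digit_bits)


def sign_magnitude_digits(value, word_bits=8, digit_bits=2):
    """Return (sign, digits of |value|); the sign is kept separately (assumption)."""
    sign = 1 if value < 0 else 0
    return sign, split_digits(abs(value), word_bits, digit_bits)


weight = -3  # a small weight, typical of a distribution concentrated near zero
tc_digits = twos_complement_digits(weight)
sign, sm_digits = sign_magnitude_digits(weight)

print(tc_digits)        # [1, 3, 3, 3]: no zero digits
print(sign, sm_digits)  # 1 [3, 0, 0, 0]: three zero digits
```

Under these assumptions, three of the four digits of the sign-magnitude form are zero, which is the kind of digit a zero-detecting gating scheme could skip.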
Claim Architecture Analysis
The patent contains 28 claims with 6 independent claims covering system (Claims 1, 7, 10, 12), mobile computing device (Claim 14), method (Claim 19), and MAC hardware accelerator apparatus (Claim 23) types, providing enforcement coverage across deployment modalities. The 22 dependent claims yield a dependent-to-independent ratio of 3.67:1, below the semiconductor industry norm of 4-8:1, suggesting some missed fallback opportunities. The multimodal independent claim strategy, spanning system, device, method, and apparatus, is strategically sound, though the lack of an explicit SoC-level claim and the absence of a computer-readable medium (CRM) claim are notable omissions.
Independent Claim Dissection
| Claim | Preamble | Transition | Key Body Elements |
|---|---|---|---|
| Claim 1 | A system, | comprising: | an addressable memory array; one or more processing cores; an accelerator framework including a plurality of MAC hardware accelerators that multiply input weight digits by input feature sequentially; pre-processing logic controlling gating of multipliers of the MAC hardware accelerators |
| Claim 7 | A system, | comprising: | an addressable memory array; one or more processing cores; an accelerator framework including a plurality of MAC hardware accelerators multiplying input weight digits by input feature sequentially; MAC hardware accelerator having P digit-serial multipliers producing one W-bit by W-bit multiplication result per clock cycle |
| Claim 10 | A system, | comprising: | an addressable memory array; one or more processing cores; an accelerator framework including a plurality of MAC hardware accelerators multiplying input weight digits by input feature sequentially; MAC hardware accelerators performing multiplication operations via sign-magnitude coding |
| Claim 12 | A system, | comprising: | an addressable memory array; one or more processing cores; an accelerator framework including a plurality of MAC hardware accelerators multiplying input weight digits by input feature sequentially; individual digits of input weight stored in a series of words of a memory array |
| Claim 14 | A mobile computing device, | comprising: | an imaging sensor that captures images; processing circuitry implementing a deep convolutional neural network including a memory and an accelerator framework with a plurality of MAC hardware accelerators that multiply input weight digits by input feature sequentially |
| Claim 19 | A method, | comprising: | performing a plurality of multiply accumulate operations using a plurality of MAC hardware accelerators that multiply a digit-serial input by a parallel input by sequentially multiplying individual digits; generating an output based on results; controlling gating of multipliers using pre-processing logic of the accelerator framework |
| Claim 23 | A Multiple Accumulate (MAC) hardware accelerator, | comprising: | a plurality of multipliers multiplying a digit-serial input having a plurality of digits by a parallel input having a plurality of bits by sequentially multiplying individual digits of the digit-serial input; circuitry coupled to the plurality of multipliers outputting a result based on multiplication of the digit-serial input by the parallel input |
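The core limitation shared by these claims, multiplying a digit-serial input by a parallel input one digit at a time, can be illustrated with a short software model. This is a minimal sketch and not the claimed circuit: it handles unsigned operands only, and the function name, the 8-bit word size W, and the 2-bit digit size N are hypothetical choices for the example.

```python
def digit_serial_multiply(weight, feature, word_bits=8, digit_bits=2):
    """Multiply an unsigned weight by a feature, consuming one weight digit per step.

    Models P = W / N sequential steps: each step forms a digit x parallel
    partial product and adds it to the running sum at the digit's position.
    """
    mask = (1 << digit_bits) - 1
    num_digits = word_bits // digit_bits  # P digits per word
    acc = 0
    for step in range(num_digits):
        digit = (weight >> (step * digit_bits)) & mask  # next digit, LSB first
        partial = digit * feature                       # digit x parallel product
        acc += partial << (step * digit_bits)           # align to digit weight
    return acc


w, x = 173, 91
assert digit_serial_multiply(w, x) == w * x
print(digit_serial_multiply(w, x))  # 15743
```

In this model each product takes P steps. If P such multipliers start on staggered steps, the group completes one product per step in steady state, which is one way to read the "one W-bit by W-bit multiplication result per clock cycle" language summarized for Claim 7.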
Claim Dependency Tree
| Metric | This Application | Semiconductor / AI Hardware Norm |
|---|---|---|
| Total claims | 28 | 20 – 30 |
| Independent claim count | 6 | 3 – 6 |
| Dependent : Independent ratio | 3.67 : 1 | 4 – 8 : 1 |
| Method claims present? | Yes — Claim 19 | Common |
| System / apparatus claims? | Yes — Claims 1, 7, 10, 12, 23 | Always |
Drafting Quality Signals
The patent's greatest strength is its detailed figure support (50 sheets), which provides rich written description for the hardware claims, particularly the gating logic in Claims 3-5, which maps directly to FIGS. 39-41B. The principal weakness is that the independent claims recite functional language without numeric parameters. For example, Claims 1 and 23 do not recite the digit-size N or word-size W as claim limitations, leaving potential design-around opportunities through minor parameter changes.
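To make the gating idea concrete, the sketch below models a MAC loop that skips a multiplier step whenever the incoming weight digit is zero, loosely following the FIG. 41A description of an OR gate over the digit input bits producing the clock enable. It is an illustrative software model only: the function name, the 8-bit word and 2-bit digit sizes, and the separate handling of the weight sign are assumptions, and the `temp_i` control signal from the figure is not modeled.

```python
def gated_mac(weights, features, word_bits=8, digit_bits=2):
    """Accumulate sum(w * x), skipping digit steps whose weight digit is zero.

    Returns (accumulated result, active steps, gated steps).
    """
    mask = (1 << digit_bits) - 1
    acc = 0
    active = 0
    gated = 0
    for w, x in zip(weights, features):
        sign = -1 if w < 0 else 1  # sign handled separately (assumption)
        magnitude = abs(w)
        for shift in range(0, word_bits, digit_bits):
            digit = (magnitude >> shift) & mask
            if digit == 0:  # OR of the digit bits is 0: multiplier clock gated
                gated += 1
                continue
            active += 1
            acc += sign * ((digit * x) << shift)
    return acc, active, gated


result, active, gated = gated_mac([1, -2, 0, 3], [10, 20, 30, 40])
print(result, active, gated)  # 90 3 13
```

With these small example weights, 13 of the 16 digit steps are gated, which shows why the benefit of such a scheme depends on how often weight digits are zero.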
Strategic Intent Scorecard
Multi-dimensional assessment of this application's patent strategy quality, based on claim structure, specification depth, and prosecution positioning.
3 Critical Gaps in This Claim Set
A senior-attorney lens on the three highest-priority structural weaknesses — what each exposes in prosecution and litigation, and what a stronger filing would have done differently.
Disclaimer: This analysis is generated by PatSnap Eureka AI based on publicly available patent data from the USPTO. It does not constitute legal advice and should not be relied upon as such. Patent data may be subject to change as prosecution progresses. Scores and assessments reflect automated analysis and may not capture all relevant legal or technical nuances. Always consult a qualified patent attorney for formal legal opinions on patentability, freedom to operate, or infringement.