
Edge AI data throughput optimization without more power


Edge AI inference workloads demand ever-greater data throughput, yet the memory bandwidth and power budgets of embedded and IoT-class hardware are severely constrained. This analysis maps four interconnected technical approaches — model compression, sparsity-aware scheduling, split computing, and intelligent data reduction — that demonstrably raise effective throughput per watt without widening the memory bus or increasing clock rates.

PatSnap Insights Team · Innovation Intelligence Analysts · 12-minute read
Reviewed by the PatSnap Insights editorial team

The memory bandwidth bottleneck: why more compute alone won’t solve it

The root cause of data throughput limitations in edge AI devices is not compute density but memory bandwidth. Inter-layer batching in CNN accelerators creates temporal bandwidth spikes — bursts of memory demand that saturate the narrow buses connecting on-chip SRAM to the compute fabric — and no amount of additional MAC units resolves a traffic jam at the memory interface. Research formally characterizing this problem dates to at least 2018, when compute unit partitioning was proposed specifically to smooth these memory traffic spikes.

10×: DRAM access reduction via temporal sparsity (EdgeDRNN)
~45: patent & literature records analyzed (2018–2026)
~44%: share of filings from China in the analyzed dataset
4: interconnected technical domains in the landscape

The challenge of improving data throughput on edge AI devices without increasing memory bandwidth or power consumption resolves into four interconnected technical domains: on-device model compression, sparsity-aware hardware scheduling, CNN and DNN partitioning with split computing, and intelligent data reduction at the source. These are not mutually exclusive. The most recent patent filings — concentrated in 2025 and early 2026 — combine two or more strategies simultaneously, signaling that the field has moved beyond single-axis optimization.


The innovation timeline in this domain spans three phases. A foundational phase from 2018 to 2020 established the theoretical frameworks — including communication-efficient edge AI algorithms and offloading strategies for IoT devices. A development phase from 2021 to 2023 produced the greatest cluster density of hardware accelerator architectures, distributed CNN inference systems, and split-computing frameworks. The maturity and integration phase from 2024 to 2026 is characterized by convergence: sparse-aware mixed accelerator architectures from Peking University, VLIW-based parallel edge hardware from Xi’an Electronic Science and Technology University, SNN-based lossless compression from Tata Consultancy Services, and energy-aware inference scheduling from Shanghai University — all filed within the past two years.

Scope note

This analysis is derived from a targeted set of patent and literature records spanning 2018 to early 2026. It represents a snapshot of innovation signals within that dataset and should not be interpreted as a comprehensive view of the full industry.

Model compression and weight encoding: the first lever for edge AI throughput

Quantizing weights to 4-bit or 8-bit fixed-point and pruning zero-weight connections reduce the volume of data that must be fetched from on-chip SRAM per inference cycle, without any increase in memory bus width or clock rate. This is the most widely represented approach in the analyzed dataset, and it is increasingly treated as a baseline rather than a differentiator.
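
To make the bandwidth arithmetic concrete, the sketch below shows symmetric per-tensor fixed-point quantization in Python. It is a minimal illustration of the general technique, not the method of any specific filing in the dataset; the function names and the example tensor are ours.

```python
import numpy as np

def quantize_symmetric(weights: np.ndarray, bits: int = 8):
    """Symmetric per-tensor fixed-point quantization (illustrative sketch)."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8-bit, 7 for 4-bit
    scale = np.abs(weights).max() / qmax       # map the largest magnitude to qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# A float32 weight tensor shrinks 4x when stored as int8; 4-bit values need
# bit-packing on top of this, halving the footprint again. Less stored data
# means less data fetched from SRAM per inference cycle.
w = np.random.randn(256, 256).astype(np.float32)
q8, s8 = quantize_symmetric(w, bits=8)
print(w.nbytes, "->", q8.nbytes)               # 262144 -> 65536 bytes
```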

A dedicated AI Accelerator Core (AIAC) architecture proposed by Malla Reddy Deemed to Be University (IN, 2026) uses parallel MAC arrays executing 4-bit and 8-bit quantized models stored entirely in on-chip SRAM, eliminating off-chip DRAM accesses during inference. This is the logical endpoint of the quantization trajectory: if all weights fit in SRAM, off-chip bandwidth ceases to be a constraint during the inference pass itself.

Arithmetic coding applied to 5-bit quantized CNN weights takes a complementary approach. By encoding weights offline with range scaling and decoding them in hardware at inference time, the technique achieves lossless compression of the weight set — effectively expanding the logical weight capacity of a fixed memory footprint. A separate line of work, Resource Constrained Training (RCT, 2024), maintains only a quantized model copy throughout training with dynamic per-layer bitwidth adjustment, reducing both on-chip memory footprint and the energy cost of off-chip data movement during the training cycle itself, not just at inference.

Figure 1 — Edge AI throughput techniques: patent cluster representation in the analyzed dataset
[Bar chart: approximate patent/literature record counts per cluster in the analyzed dataset, 2018–2026. Model Compression ~15, Sparsity-Aware Scheduling ~10, Split Computing & Distributed Inference ~12, Intelligent Data Reduction ~8.]
Model compression is the most represented cluster in the analyzed dataset (~15 records), but split computing and sparsity-aware scheduling are rapidly closing the gap as the field matures into hardware-software co-design.

“Model compression — quantization plus pruning — is table stakes, not differentiation. R&D investment should now focus on the next layer: sparsity-aware dynamic scheduling and compute-in-memory architectures that extract additional throughput from already-compressed models.”

Standards bodies including IEEE have been publishing work on fixed-point neural network inference since at least 2017, and the technique is now embedded in virtually every hardware accelerator filing in this dataset. According to WIPO’s most recent patent landscape reports on AI hardware, quantization-related claims appear across the majority of edge inference accelerator patent families. The implication: organizations that have not yet implemented 4-bit or 8-bit quantization are behind the baseline, while those seeking competitive differentiation must look to the techniques described in the following sections.

Map the full patent landscape for edge AI model compression and quantization with PatSnap Eureka.

Explore patent data in PatSnap Eureka →

Sparsity-aware scheduling: extracting throughput from zero-valued computations

Sparsity-aware hardware scheduling exploits the natural sparsity of pruned or ReLU-activated neural networks to skip zero-valued computations entirely, reducing effective memory read operations and improving throughput per unit of power. The key insight is that a multiply-accumulate operation against a zero operand consumes energy and clock cycles while contributing nothing to the output — and modern pruned networks may have sparsity rates well above 50%.
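
The principle can be illustrated with a minimal sketch of a zero-skipping dot product. This is a software illustration of the general idea only, not a reproduction of any patented scheduler; on real sparsity-aware hardware each skipped multiply-accumulate also avoids the corresponding weight fetch.

```python
import numpy as np

def dense_mac(weights, activations):
    """Baseline: every multiply-accumulate executes, zeros included."""
    acc = 0.0
    for w, a in zip(weights, activations):
        acc += w * a
    return acc

def sparse_mac(weights, activations):
    """Skip MACs where either operand is zero; on sparsity-aware hardware
    each skip also avoids the corresponding weight fetch from SRAM."""
    acc, macs = 0.0, 0
    for w, a in zip(weights, activations):
        if w != 0.0 and a != 0.0:
            acc += w * a
            macs += 1
    return acc, macs

rng = np.random.default_rng(0)
w = rng.standard_normal(1024) * (rng.random(1024) > 0.6)   # ~60% pruned weights
a = np.maximum(rng.standard_normal(1024), 0.0)             # ReLU: ~50% zero activations
ref = dense_mac(w, a)
out, macs = sparse_mac(w, a)
print(f"identical result: {np.isclose(ref, out)}, MACs executed: {macs} of 1024")
```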

Peking University’s sparse-aware hybrid accelerator (CN, 2025) demonstrates the state of the art in this cluster. A sparsity information extractor associates each neural network layer with its sparsity features and feeds a latency estimation unit that evaluates execution time across heterogeneous accelerator configurations. A load distributor then finds the minimum-latency configuration, and a forward detector routes layer outputs directly to the partner accelerator’s input buffer — eliminating intermediate DRAM writes entirely. The result is a multi-dimensional optimization that combines sparsity exploitation with heterogeneous compute routing.

The EdgeDRNN accelerator (2020) implements the delta network algorithm on an FPGA-hosted GRU-RNN, exploiting temporal sparsity to reduce DRAM weight memory accesses by up to 10×, achieving latency comparable to a 92W GPU at a fraction of the power consumption.
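
The delta-network update at the heart of EdgeDRNN can be sketched in a few lines. The code below is a simplified software illustration of the published idea, assuming a plain matrix-vector product, dense storage, and an arbitrary fixed delta threshold; it is not the FPGA implementation.

```python
import numpy as np

def delta_matvec(W, x_t, x_prev, state, threshold=0.05):
    """Delta-network style matrix-vector update (simplified sketch).

    Only columns of W whose input changed by more than `threshold` since the
    last propagated value are fetched and applied; the running `state` carries
    the rest. On EdgeDRNN-like hardware this is what cuts DRAM weight reads.
    """
    delta = x_t - x_prev
    active = np.abs(delta) > threshold            # inputs that actually changed
    state = state + W[:, active] @ delta[active]  # fetch only the active columns
    x_prev = x_prev.copy()
    x_prev[active] = x_t[active]                  # remember last propagated values
    return state, x_prev, int(active.sum())

rng = np.random.default_rng(1)
W = rng.standard_normal((128, 256))
x_prev = rng.standard_normal(256)
state = W @ x_prev                                # initialise with one full pass
x_t = x_prev + rng.standard_normal(256) * 0.02    # slowly varying input
state, x_prev, fetched = delta_matvec(W, x_t, x_prev, state)
print(f"weight columns fetched: {fetched} of 256")
```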

The dynamically reconfigurable column streaming engine (DycSe, 2023) takes a complementary approach: programmable adder modules avoid zero-padding penalties and adapt to different CNN layer shapes, reducing wasted compute cycles and memory fetches across varying inference workloads. Importantly, DycSe was explicitly designed for edge AI accelerators deployed in radiation fields — space environments and nuclear power stations — where permanent fault tolerance is required alongside low power, a constraint absent from most benchmark-oriented designs.

Figure 2 — Innovation phase timeline: patent filing density by period in the edge AI throughput optimization landscape
[Timeline chart: patent filing density (low to peak) by innovation phase — foundational phase (2018–2020), development phase (2021–2023), maturity & integration phase (2024–2026).]
The 2024–2026 maturity phase shows the highest concentration of active and pending applications in the dataset, indicating that the field has not yet reached saturation and that competitive filing windows remain open.

The broader principle is confirmed by Nature-published research on neuromorphic computing: event-driven, spike-based computation naturally produces sparse activation patterns that translate directly into reduced memory traffic when hardware is designed to exploit them. This convergence between algorithmic sparsity and neuromorphic hardware principles is visible in the most recent filings, particularly the brain-inspired scheduling architecture from Shenzhen Power Supply Bureau (CN, 2025) and the Tata Consultancy Services SNN compression work (EP/IN, 2025).

Split computing and distributed inference: sharing the memory load across devices

Split computing partitions a DNN model at an optimal layer boundary, running front-end inference layers on-device and offloading back-end computation to a nearby edge server or peer device. This approach reduces the per-device memory bandwidth requirement without any change to the underlying model architecture — and without transmitting raw sensor data off-device, which would itself impose bandwidth and privacy costs.
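
A toy sketch of the split-inference data path follows, assuming a small fully connected network and a fixed split point. The layer shapes, the float16 payload encoding, and the transport step are illustrative choices, not details taken from any filing.

```python
import numpy as np

# Toy network: a list of (weight, bias) layers with ReLU between them.
rng = np.random.default_rng(2)
layers = [(rng.standard_normal((64, 128)) * 0.1, np.zeros(64)),
          (rng.standard_normal((32, 64)) * 0.1, np.zeros(32)),
          (rng.standard_normal((10, 32)) * 0.1, np.zeros(10))]

def run_layers(x, layer_slice):
    for W, b in layer_slice:
        x = np.maximum(W @ x + b, 0.0)
    return x

def split_inference(x, split_point):
    """Run layers [0, split_point) on-device, the rest 'remotely'.
    Only the intermediate activation crosses the link, never the raw input."""
    head_out = run_layers(x, layers[:split_point])        # on-device front end
    payload = head_out.astype(np.float16).tobytes()       # what actually gets sent
    remote_in = np.frombuffer(payload, dtype=np.float16).astype(np.float32)
    return run_layers(remote_in, layers[split_point:]), len(payload)

x = rng.standard_normal(128)
out, sent = split_inference(x, split_point=1)
print(f"bytes sent at the split: {sent} vs. {x.astype(np.float16).nbytes} for raw input")
```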

The DeeperThings framework (2021) demonstrates the distributed extreme: fully-connected and convolutional layers are partitioned across multiple IoT devices using a communication-aware layer fusion method that jointly optimizes memory, computation, and communication demands. This removes the assumption that a single device handles even the front-end layers, distributing the problem across a peer mesh.

Samsung Electronics’ 2024 US patent for DNN execution in IoT edge networks describes a method where an IoT device selects an optimal edge device from those in communication range, identifies network throughput, and determines the DNN split ratio dynamically — with the split ratio recomputed periodically to adapt to network variation.

The split-point optimization problem has itself become a focus of active research. Wuhan University’s energy-consumption prediction method (CN, 2024) constructs a global prediction model combining per-layer latency and energy data, then uses a greedy algorithm to find the optimal model split point that minimizes total energy while meeting latency constraints. Critically, the system transmits only intermediate activation tensors at the split point rather than raw input data — a design choice with direct implications for both bandwidth and data privacy.

The LMOS (Latency-Memory Optimized Splitting) algorithm (2022) formulates CNN splitting as a multi-objective optimization, achieving Pareto-efficient latency/memory trade-offs on real-world edge devices. What distinguishes the 2024–2026 generation is the shift from static to dynamic split points: Samsung and Wuhan University both describe systems that recompute the optimal partition in response to changing network conditions, moving the problem from offline optimization to real-time adaptive control.
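
The split-point searches described above can be illustrated with a simple selection loop over candidate partitions. The sketch below assumes per-layer latency and energy profiles have already been measured; every number, unit, and function name is invented for illustration, and this is neither the patented Wuhan method nor the LMOS algorithm itself.

```python
# Illustrative split-point search. Times are in milliseconds, energies in
# millijoules, and every value is made up; in practice the per-layer profiles
# come from on-device measurement and the search reruns as bandwidth changes.

def pick_split(dev_lat, dev_energy, srv_lat, act_bytes, bw_bytes_per_ms,
               tx_energy_per_byte, latency_budget_ms):
    """Keep the first `s` layers on-device; transmit the activation at the cut.
    act_bytes[0] is the raw input size (full offload); act_bytes[-1] is 0."""
    best = None
    for s in range(len(dev_lat) + 1):
        tx = act_bytes[s]
        latency = sum(dev_lat[:s]) + tx / bw_bytes_per_ms + sum(srv_lat[s:])
        energy = sum(dev_energy[:s]) + tx * tx_energy_per_byte
        if latency <= latency_budget_ms and (best is None or energy < best[1]):
            best = (s, energy, latency)
    return best

print(pick_split(dev_lat=[4.0, 6.0, 3.0], dev_energy=[8.0, 14.0, 7.0],
                 srv_lat=[0.5, 0.8, 0.4], act_bytes=[200_000, 40_000, 60_000, 0],
                 bw_bytes_per_ms=1_000, tx_energy_per_byte=1e-4,
                 latency_budget_ms=60.0))
# -> (1, 12.0, 45.2): splitting after layer 1 minimises device energy within budget
```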

Key finding: IP risk in dynamic split computing

Static split points (predominant in 2021–2022 filings) are being replaced by dynamic, bandwidth-aware split ratio computation across 2024–2025 filings from Samsung and Chinese universities. IP strategists should examine freedom-to-operate around dynamic DNN splitting methods, particularly in jurisdictions where these assignees hold active grants.

Application domains for split computing in this dataset span railway inspection (China Railway Fourth Survey and Design Institute, CN, using ResNet adaptive split-computing), UAV-based high-resolution image inference (Beijing University of Posts and Telecommunications, CN), and self-driving car sensor caching (an enhanced edge gateway patent, IN, 2023). The OECD’s AI policy observatory has identified distributed edge inference as a strategic priority for national AI infrastructure, reinforcing the policy tailwinds behind this technical cluster.

Track dynamic DNN split computing patents across jurisdictions with PatSnap Eureka’s freedom-to-operate tools.

Analyse split computing patents in PatSnap Eureka →

Intelligent data reduction at the source: cutting bandwidth before inference begins

Reducing the volume of data entering the inference pipeline before any compute occurs is the earliest possible intervention point for bandwidth management. These approaches operate at the sensor or data acquisition layer, not at the model layer, and their effectiveness is independent of the inference architecture downstream.

IBM’s reservoir-layer approach (US, 2021) places a reservoir layer at the edge that compresses time-series data via random projection, reducing dimensionality and hence memory traffic while preserving the temporal structure of the data for downstream inference. This technique draws on reservoir computing principles that are well-established in the academic literature but had not previously been applied directly to edge sensor network optimization at this level of specificity.
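
The underlying dimensionality-reduction step can be sketched as a fixed random projection applied to each windowed block of sensor samples. This is a generic illustration of the principle only, not IBM’s patented reservoir layer; the window length, compressed dimension, and test signal are arbitrary choices.

```python
import numpy as np

# Compress a windowed time series at the edge before transmission/inference.
rng = np.random.default_rng(3)

window = 512          # raw samples per block
k = 64                # compressed dimension (8x reduction)
P = rng.standard_normal((k, window)) / np.sqrt(k)   # fixed random projection

def compress_block(samples: np.ndarray) -> np.ndarray:
    """Project a length-512 block of sensor samples down to 64 values."""
    return P @ samples

t = np.linspace(0, 1, window)
block = np.sin(2 * np.pi * 5 * t) + 0.1 * rng.standard_normal(window)
z = compress_block(block)
print(block.nbytes, "->", z.nbytes)    # 4096 -> 512 bytes at float64
```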

Tata Consultancy Services’ SNN-based lossless compression (EP and IN, 2025) represents a qualitatively different approach: applying spiking neural network dynamics to achieve lossless data compression for edge communication. Unlike arithmetic coding or compressed sensing — which introduce approximation or require recovery algorithms — SNN-based compression achieves lossless reconstruction at the receiving node while reducing the transmitted data volume. Within the analyzed dataset, this is the first application of spiking neural network principles to lossless edge compression, and the sub-domain has minimal granted prior art.


The LazyAI paradigm (Model Institute of Engineering and Technology, IN, 2025) addresses a different inefficiency: inference that runs on data that has not meaningfully changed since the previous cycle. By gating inference so that it is only triggered when incoming sensor data exceeds a meaningful change threshold, redundant compute cycles and their associated memory accesses are eliminated. This is particularly effective for continuously streaming sensor inputs — environmental monitoring, industrial process control, wearable health sensors — where most cycles may produce data nearly identical to the previous sample.
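
A change-threshold gate of this kind amounts to a few lines of control logic. The sketch below is an illustrative approximation rather than the patented LazyAI system; the mean-absolute-change criterion, the threshold value, and the drifting-sensor example are our assumptions.

```python
import numpy as np

def gated_inference(stream, run_model, threshold=0.05):
    """Run the model only when the input has changed meaningfully since the
    last inference; otherwise reuse the cached result (illustrative gate)."""
    outputs, last_x, last_y, runs = [], None, None, 0
    for x in stream:
        if last_x is None or np.abs(x - last_x).mean() > threshold:
            last_y = run_model(x)        # pay for compute + memory traffic
            last_x, runs = x, runs + 1
        outputs.append(last_y)           # redundant cycles are skipped entirely
    return outputs, runs

# A slowly drifting sensor signal: most samples barely change between cycles.
rng = np.random.default_rng(4)
stream = [np.full(16, 20.0) + 0.01 * i + 0.005 * rng.standard_normal(16)
          for i in range(100)]
_, runs = gated_inference(stream, run_model=lambda x: x.mean(), threshold=0.05)
print(f"inference ran on {runs} of 100 samples")
```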

Tunable compressed sensing (CS) in AIoT systems (2022) demonstrates that adjusting the CS compression rate at the sensor node can reduce network-level data traffic significantly, with a YOLOv5-based edge gateway performing CS recovery before inference. The rate tuning introduces a controllable trade-off between compression ratio and reconstruction fidelity that can be adjusted in response to downstream accuracy requirements — a degree of adaptability absent from fixed-rate encoding schemes. The ITU’s standardization work on IoT data compression provides a regulatory reference point for organizations deploying CS-based reduction in regulated industrial environments.
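
On the sensor side, tunable compressed sensing reduces to choosing how many random measurements to transmit per frame. The sketch below shows only the measurement step on a synthetic sparse signal; gateway-side recovery (for example, basis pursuit or OMP before the detector runs) is not shown, and in practice the measurement matrix would be fixed and shared with the gateway rather than regenerated per frame.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1024                               # samples per frame at the sensor

def cs_measure(x: np.ndarray, rate: float) -> np.ndarray:
    """Project a length-n frame to m = rate * n random measurements.
    (In practice the matrix is seeded once and shared with the gateway.)"""
    m = max(1, int(rate * len(x)))
    Phi = rng.standard_normal((m, len(x))) / np.sqrt(m)   # measurement matrix
    return Phi @ x

frame = np.zeros(n)
frame[rng.choice(n, 20, replace=False)] = rng.standard_normal(20)   # sparse signal

for rate in (0.5, 0.25, 0.1):          # the tunable knob: compression vs. fidelity
    y = cs_measure(frame, rate)
    print(f"rate={rate}: transmit {y.nbytes} bytes instead of {frame.nbytes}")
```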

Patent landscape: who is filing where, and what the geographic distribution signals

China accounts for the largest share of patent filings in this landscape — approximately 20 out of ~45 patent documents, representing around 44% — with the majority filed in 2024–2026. The assignee base is broad: Peking University dominates the hardware architecture sub-domain, while Beijing University of Posts and Telecommunications, Wuhan University, Tsinghua University, and multiple commercial entities cover adjacent sub-domains. This breadth, combined with the recency of filings, indicates a coordinated national scaling of edge AI infrastructure investment.

Figure 3 — Geographic distribution of patent filings in the edge AI throughput optimization dataset
[Chart: geographic distribution of the ~45 analyzed records, 2018–2026. China (CN) ~44%, India (IN) ~22%, United States (US) ~20%, Europe/WO ~11%, Korea (KR) ~2%. Proportions are approximate and based on the analyzed patent dataset only.]
China’s ~44% share of the analyzed filing dataset — concentrated in 2024–2026 — signals rapidly narrowing windows for Western and Korean competitors to establish blocking positions in key architectural sub-domains.

India is the second most active jurisdiction in this dataset, with filings from Malla Reddy University, Robert Bosch GmbH (Indian applications), Tata Consultancy Services, Samsung Electronics (Indian applications), and several smaller research institutions. The Indian filings skew toward 2025–2026, indicating recent acceleration that may reflect both national AI policy incentives and the presence of large multinational R&D centers in the region.

United States filings include Intel Corporation, IBM, EMC IP Holding (Dell Technologies), Samsung Electronics, and Ubotica Technologies — concentrated among established semiconductor and cloud infrastructure players. Ubotica Technologies (US/EP) holds three related filings across jurisdictions in the low-bandwidth neural network update sub-domain, giving it an unusually concentrated position in that specific area. Siemens Aktiengesellschaft spans WO, US, and EP with consistent hardware-accelerator transfer learning claims for factory edge devices.

Innovation in this landscape is broadly distributed across many assignees rather than concentrated in one dominant player, suggesting an open competitive landscape in which fast-moving organizations can still establish meaningful IP positions — particularly in the emerging sub-domains identified below. The EPO’s annual patent index consistently shows AI hardware as one of the fastest-growing technical fields by new application volume, a trend that this dataset’s 2025–2026 concentration clearly reflects at the edge inference layer.

Five forward vectors are visible in the 2025–2026 filings:

  1. Brain-inspired (neuromorphic) scheduling for heterogeneous data: Shenzhen Power Supply Bureau’s architecture classifies heterogeneous sensor data, matches each class to candidate processing paths via quality metrics, and assigns priority scores based on real-time importance.
  2. Adaptive quantization matrices coupled to inference pipelines: Hangzhou Hongsen Zhihang Technology’s low-latency large model inference system targets UAV/drone inference under latency constraints.
  3. Compute-in-memory (CIM) architectures for edge calibration: Tsinghua University’s system uses separate analog and digital storage-compute layers, reducing the write energy associated with full weight updates.
  4. SNN-based lossless compression: the Tata Consultancy Services filings discussed above.
  5. Bandwidth-aware shared memory pool switching: Suzhou Yuannao Intelligent Technology (CN, 2026) introduces multi-level heat-aware dynamic thresholding to sustain throughput under AI training and inference workloads without expanding physical bandwidth.

The strategic implication for IP teams: hardware-software co-design is the dominant architectural paradigm in the highest-value recent patents. Product teams entering this space should pursue co-design from the outset rather than layering software optimizations onto general-purpose processors, and patent portfolios should reflect the coupling between hardware datapaths and specific algorithmic optimizations rather than claiming either in isolation. PatSnap’s innovation intelligence platform, used by over 18,000 customers across 120+ countries, provides the cross-jurisdictional filing analytics needed to track these rapidly evolving positions in real time.

Frequently asked questions

Edge AI data throughput optimization — key questions answered

Still have questions? Let PatSnap Eureka answer them with live patent and literature data.

Ask PatSnap Eureka for a deeper answer →

References

  1. Edge AI-Enabled Signal Processor for Low-Bandwidth Devices — Malla Reddy Deemed to Be University, 2026, IN
  2. Sparse-Aware Scheduler, Hybrid Acceleration Architecture, Intelligent Edge Chip and Device — Peking University, 2025, CN
  3. Sparse-Aware Scheduler, Hybrid Acceleration Architecture, Intelligent Edge Chip and Device — Peking University, 2024, CN
  4. Enabling High Speed and Low Power Operation of a Sensor Network — IBM, 2021, US
  5. Method and Device for Execution of Deep Neural Network in IoT Edge Network — Samsung Electronics, 2024, US
  6. Reduced Data Transmission in Edge Communication Using SNN-Based Lossless Data Compression — Tata Consultancy Services, 2025, EP
  7. Reduced Data Transmission in Edge Communication Using SNN-Based Lossless Data Compression — Tata Consultancy Services, 2025, IN
  8. Energy-Consumption Prediction-Based Edge-End Collaborative AI Model Inference Method — Wuhan University, 2024, CN
  9. Hardware Accelerator Extension to Transfer Learning — Siemens Aktiengesellschaft, 2022, US
  10. Hardware Accelerator Extension to Transfer Learning — Siemens Aktiengesellschaft, 2024, EP
  11. Systems and Methods for Deploying and Updating Neural Networks at the Edge — Ubotica Technologies, 2025, US
  12. LazyAI System and Method for Optimizing AI Computational Efficiency in Edge Devices — Model Institute of Engineering and Technology, 2025, IN
  13. Edge-Side Compute Acceleration Method Based on Brain-Inspired AI Architecture — Shenzhen Power Supply Bureau, 2025, CN
  14. Low-Latency Edge Large Model Inference Acceleration Method and System — Hangzhou Hongsen Zhihang Technology, 2025, CN
  15. Rapid Calibration Method for Edge-End Devices (CIM Architecture) — Tsinghua University, 2025, CN
  16. Resource Scheduling Method (Bandwidth-Aware Shared Memory Pool) — Suzhou Yuannao Intelligent Technology, 2026, CN
  17. DeeperThings: Fully Distributed CNN Inference on Resource-Constrained Edge Devices (2021, literature)
  18. EdgeDRNN: Recurrent Neural Network Accelerator for Edge Inference (2020, literature)
  19. Arithmetic Coding-Based 5-Bit Weight Encoding and Hardware Decoder for CNN Inference in Edge Devices (2021, literature)
  20. Partitioning Compute Units in CNN Acceleration for Statistical Memory Traffic Shaping (2018, literature)
  21. RCT: Resource Constrained Training for Edge AI (2024, literature)
  22. DycSe: A Low-Power, Dynamic Reconfiguration Column Streaming-Based Convolution Engine (2023, literature)
  23. Data Traffic Reduction with Compressed Sensing in an AIoT System (2022, literature)
  24. Latency-Memory Optimized Splitting of Convolution Neural Networks for Resource Constrained Edge Devices (2022, literature)
  25. Communication-Efficient Edge AI: Algorithms and Systems (2020, literature)
  26. WIPO — Patent Landscape Reports on AI Hardware
  27. EPO — Annual Patent Index: AI Hardware Filing Trends
  28. OECD AI Policy Observatory — Distributed Edge Inference
  29. IEEE — Fixed-Point Neural Network Inference Standards
  30. ITU — IoT Data Compression Standardization
  31. Nature — Neuromorphic Computing and Sparse Activation Research

All data and statistics in this article are sourced from the references above and from PatSnap’s proprietary innovation intelligence platform.
