Near-Memory vs In-Memory Computing — PatSnap Eureka
Near-Memory vs. In-Memory Computing for Bandwidth-Bound AI
The memory wall — not compute throughput — limits AI inference and training. Understanding when to use near-memory computing (NMC) versus in-memory computing (IMC) is the critical architectural decision for R&D engineers and IP professionals designing next-generation AI accelerators.
Why Bandwidth — Not Compute — Is the AI Bottleneck
The fundamental tension driving all near-memory and in-memory computing inventions is the "memory wall": the growing mismatch between processor computational throughput and the bandwidth available to supply data from off-chip DRAM. For large language model (LLM) inference, recommendation systems, and graph neural networks, arithmetic intensity (operations performed per byte fetched) is low, so bandwidth, not FLOPS, is the binding constraint.
As documented in the Huawei computation task scheduling patent (2023), the two paradigms address this differently: near-memory computing (NMC) tightly couples the memory and compute processor together, reducing data-transfer latency and power through short wiring, while in-memory computing (IMC) breaks the von Neumann constraint entirely by performing computation directly inside the memory array. In NMC, the compute unit accesses memory faster than any bus-connected processor, but the memory array retains its conventional read/write semantics — it does not compute.
Fudan University's sparse neural network near-memory inference patent (2024) quantifies the stakes: data movement energy is approximately two orders of magnitude higher than computation energy in conventional architectures. This is the core motivation for both paradigms.
Samsung's Neural Processing Device patent (2022) introduces operational intensity (ops/byte) as a hardware-measured metric: beyond approximately 256 ops/byte, increasing memory bandwidth no longer improves algorithm performance. This threshold marks the boundary between the compute-bound regime above it and the memory-bound regime below it, and the memory-bound regime is where NMC and IMC interventions are justified.
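The threshold behaves like the ridge point of a roofline model, where peak compute divided by peak bandwidth gives the intensity at which a kernel stops being bandwidth-limited. The sketch below classifies kernels that way. The peak figures are hypothetical, chosen so the ridge lands at 256 ops/byte to match the Samsung number; they are not taken from the patent, and `Machine`, `attainable_ops_per_s`, and `regime` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    peak_ops_per_s: float    # hypothetical peak compute throughput
    peak_bytes_per_s: float  # hypothetical peak memory bandwidth

    @property
    def ridge_point(self) -> float:
        """Operational intensity (ops/byte) at which the roofline bends."""
        return self.peak_ops_per_s / self.peak_bytes_per_s


def attainable_ops_per_s(m: Machine, intensity: float) -> float:
    """Roofline bound: throughput is capped by peak compute or by bandwidth x intensity."""
    return min(m.peak_ops_per_s, intensity * m.peak_bytes_per_s)


def regime(m: Machine, intensity: float) -> str:
    return "memory-bound" if intensity < m.ridge_point else "compute-bound"


# Hypothetical accelerator: 256 TOPS over 1 TB/s puts the ridge at 256 ops/byte.
acc = Machine(peak_ops_per_s=256e12, peak_bytes_per_s=1e12)

for oi in (1, 64, 256, 1024):
    print(f"{oi:>5} ops/byte: {regime(acc, oi)}, "
          f"{attainable_ops_per_s(acc, oi) / 1e12:.0f} TOPS attainable")
```

In this model, raising `peak_bytes_per_s` increases throughput only for kernels whose intensity sits below the ridge point. That is the sense in which extra bandwidth stops helping beyond the threshold.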
How NMC and IMC Each Attack the Bandwidth Bottleneck
Both paradigms reduce data movement, but through fundamentally different mechanisms — with distinct trade-offs for AI workload types, model sizes, and update frequency.
Shortened Data Path via 3D-Stacked DRAM and TSV Integration
NMC places compute logic physically adjacent to — but not inside — the memory array. As described in the Fudan University Processing-In-Controller (PIC) patent (2025), a near-memory processing module is integrated inside the DRAM controller and packaged with 3D-stacked DRAM via TSV (through-silicon via) or hybrid bonding interconnects. The memory cells themselves are not modified to perform arithmetic. Weight prefetching from DRAM into the computing circuit runs in parallel with ongoing GEMV operations, eliminating weight-read wait latency. Asynchronous FIFOs support flexible clock frequency configurations, enabling low-power data movement through frequency downscaling. NMC retains standard DRAM read/write semantics — making it compatible with frequent weight updates and on-device fine-tuning.
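Overlapping weight prefetch with GEMV hides read latency, and a toy timing model shows how. It assumes fixed, hypothetical per-layer fetch and compute durations, one fetch engine, and two weight buffers, so the fetch for layer i+1 can proceed while layer i computes. It illustrates the general double-buffering idea, not the Fudan PIC circuit itself.

```python
def serial_time(fetch: list[float], compute: list[float]) -> float:
    """No prefetch: every layer waits for its weights, then runs its GEMV."""
    return sum(f + c for f, c in zip(fetch, compute))


def prefetched_time(fetch: list[float], compute: list[float]) -> float:
    """Double-buffered prefetch: the fetch for layer i+1 overlaps the GEMV of layer i."""
    n = len(fetch)
    fetch_end = [0.0] * n
    comp_end = [0.0] * n
    for i in range(n):
        # One fetch engine, two weight buffers: fetch i starts after fetch i-1
        # finishes and after layer i-2 has released its buffer.
        start_f = max(fetch_end[i - 1] if i >= 1 else 0.0,
                      comp_end[i - 2] if i >= 2 else 0.0)
        fetch_end[i] = start_f + fetch[i]
        # GEMV i starts once its weights are present and GEMV i-1 is done.
        start_c = max(fetch_end[i], comp_end[i - 1] if i >= 1 else 0.0)
        comp_end[i] = start_c + compute[i]
    return comp_end[-1]


fetch = [2.0] * 4    # hypothetical per-layer weight-fetch time (arbitrary units)
compute = [3.0] * 4  # hypothetical per-layer GEMV time
print(serial_time(fetch, compute), prefetched_time(fetch, compute))  # 20.0 14.0
```

When fetch is shorter than compute, only the first layer's fetch stays exposed. When fetch dominates, the pipeline becomes bandwidth-bound and the total approaches the sum of the fetch times.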
3D-stacked HBM · TSV · PIC architecture · GEMV prefetch
MAC Operations Embedded Inside the Storage Array
IMC moves the compute primitive inside the memory bit-cell itself, eliminating even the short-range data movement that NMC still requires for read operations. It is realized in two substrate flavors: analog (ReRAM, MRAM, PCM, or Flash, where conductance values represent weights) and digital (SRAM-based multiply-accumulate or TCAM-based lookup). Chinese Academy of Sciences ICT patents (2024, 2025) describe how ReRAM-based IMC systems pre-write neural network weights as resistance values into crossbar arrays, achieving 2–3 orders of magnitude better energy efficiency than conventional CMOS-based accelerators by eliminating the recurring read-compute-write cycle. Princeton University documents approximately 10× better energy efficiency (TOPS/W) and 10× better compute density (TOPS/mm²) vs. optimized digital accelerators, but notes that IMC does not reduce memory write cost and that most demonstrations are limited to less than 128 Kb of capacity.
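An analog crossbar computes a matrix-vector product in a single read. Weights are stored as conductances G[i][j] and inputs are applied as row voltages V[i]. Ohm's law gives each cell current G[i][j]·V[i], and Kirchhoff's current law sums those currents along each column. The minimal idealized sketch below follows that arrangement. The optional multiplicative noise term is a hypothetical stand-in for device non-ideality, not a calibrated device model, and `crossbar_mvm` is an illustrative name.

```python
import random


def crossbar_mvm(G, V, noise_std=0.0, seed=0):
    """Idealized analog crossbar read.

    G[i][j]: conductance (siemens) of the cell at row i, column j (the weight).
    V[i]: voltage (volts) applied to row i (the input activation).
    Returns column currents I[j] = sum_i G[i][j] * V[i]: Ohm's law gives each
    cell current, Kirchhoff's current law sums them along the column.
    noise_std: hypothetical relative conductance variation per cell.
    """
    rng = random.Random(seed)
    rows, cols = len(G), len(G[0])
    if len(V) != rows:
        raise ValueError("one input voltage per row is required")
    currents = [0.0] * cols
    for j in range(cols):
        for i in range(rows):
            g = G[i][j]
            if noise_std:
                g *= 1.0 + rng.gauss(0.0, noise_std)
            currents[j] += g * V[i]
    return currents


G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]   # weights programmed as conductances
V = [0.1, 0.2]       # inputs applied as row voltages
ideal = crossbar_mvm(G, V)                   # approximately [7e-07, 1e-06] amperes
noisy = crossbar_mvm(G, V, noise_std=0.05)   # same product, perturbed by device variation
```

A real array also digitizes each column current with an ADC and typically represents signed weights with differential conductance pairs. Both are omitted here. The ADC is part of the peripheral cost discussed below.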
ReRAM crossbar · SRAM MAC · TCAM lookup · analog non-ideality
Full-Analog Non-MAC Operations Eliminate ADC/DAC Overhead
Conventional compute-in-memory (CIM, used here interchangeably with IMC) architectures achieve excellent throughput on MAC operations but depend on digital-domain circuits for batch normalization and activation functions, creating bottlenecks in speed, power, and area. Tsinghua University's 2025 patent uses analog bias arrays and global resistor-ladder voltage dividers to perform ReLU and batch normalization entirely in the analog domain, eliminating ADC/DAC conversion overhead for non-MAC operations. UESTC's TCAM+LUT-based IMC accelerator (2023) instead uses digital ternary content-addressable memory to perform multiply operations in parallel via search semantics, avoiding analog non-idealities while still executing computation inside the storage array. The two approaches target different limits of analog IMC: Tsinghua's design removes conversion overhead, while UESTC's digital design sidesteps the precision degradation that affects analog IMC at advanced CMOS nodes.
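Search-based multiplication can be illustrated with a small software model of a ternary CAM. Each entry stores a value and a care-mask, and a key matches when it agrees with the value on every cared-for bit. Storing the products of two 2-bit operands as entries turns a multiply into one parallel search. This is a hypothetical illustration of the general TCAM+LUT idea, not the UESTC circuit. The class, the key layout, and the operand width are invented for the example.

```python
from typing import List, Optional, Tuple

BITS = 2                     # operand width (hypothetical, kept tiny)
B_FIELD = (1 << BITS) - 1    # mask selecting operand b in the key
A_FIELD = B_FIELD << BITS    # mask selecting operand a in the key
FULL = A_FIELD | B_FIELD


class TCAM:
    """Software model of a ternary CAM. Each entry is (value, care_mask, payload);
    a key matches when it equals `value` on every bit where care_mask is 1."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, int, int]] = []

    def write(self, value: int, care_mask: int, payload: int) -> None:
        self.entries.append((value, care_mask, payload))

    def search(self, key: int) -> Optional[int]:
        # Hardware compares every entry in parallel; this model scans in order
        # and returns the first match.
        for value, care_mask, payload in self.entries:
            if (key ^ value) & care_mask == 0:
                return payload
        return None


lut = TCAM()
for a in range(1, 1 << BITS):
    for b in range(1, 1 << BITS):
        lut.write((a << BITS) | b, FULL, a * b)   # exact-match product entries
# Two ternary entries cover every zero product using don't-care bits.
lut.write(0, A_FIELD, 0)   # a == 0, b is "don't care"
lut.write(0, B_FIELD, 0)   # b == 0, a is "don't care"


def tcam_multiply(a: int, b: int) -> int:
    result = lut.search((a << BITS) | b)
    if result is None:
        raise ValueError("operands out of range")
    return result


print(tcam_multiply(3, 2))  # 6
```

The two don't-care entries replace the seven exact-match zero-product entries a plain 16-entry LUT would need. This shows where ternary matching saves capacity.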
Full-analog BN · ReLU in analog · TCAM search semantics
Dynamic Routing Between Paradigms Based on Workload Efficiency
Mentium Technologies Inc. has patented hybrid architectures (2021) that route workloads between conventional digital accelerators and IMC accelerators based on measured processing efficiency. These patents quantify a critical IMC liability: efficiency drops when the number of network parameters is small, because peripheral circuit power (ADC, DAC, sense amplifiers) can exceed the power saved by eliminating data transfer. Zhejiang Laboratory's Heterogeneous Storage-Compute Fusion System (2020) pairs a 3D-stacked DRAM NMC module (for bandwidth-intensive streaming) with a memristor-based IMC module (for stationary weights), mapping workloads to each substrate based on data stationarity. Princeton's Configurable IMC Engine (2019) includes a dedicated near-memory computing path as a fallback for operations better suited to conventional digital execution.
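The small-model liability can be captured in a simple energy model. Assume, hypothetically, that IMC pays a fixed peripheral energy per inference (ADC, DAC, sense amplifiers) plus a small per-parameter cost, while a digital accelerator pays a larger per-parameter data-movement cost. The crossover parameter count is where the two are equal, and a router sends smaller models to the digital path. All energy figures are illustrative placeholders, not values from the Mentium patents.

```python
from dataclasses import dataclass


@dataclass
class EnergyModel:
    imc_fixed_pj: float          # peripheral energy per inference: ADC, DAC, sense amps
    imc_per_param_pj: float      # in-array MAC energy per weight
    digital_per_param_pj: float  # weight fetch + MAC energy per weight, digital path

    def imc_energy(self, n_params: int) -> float:
        return self.imc_fixed_pj + self.imc_per_param_pj * n_params

    def digital_energy(self, n_params: int) -> float:
        return self.digital_per_param_pj * n_params

    def crossover_params(self) -> float:
        # fixed + a*n = d*n  =>  n = fixed / (d - a); IMC wins above this count.
        return self.imc_fixed_pj / (self.digital_per_param_pj - self.imc_per_param_pj)


def route(model: EnergyModel, n_params: int) -> str:
    """Send the workload to whichever substrate the model predicts is cheaper."""
    if model.imc_energy(n_params) < model.digital_energy(n_params):
        return "imc"
    return "digital"


m = EnergyModel(imc_fixed_pj=1e6, imc_per_param_pj=0.5, digital_per_param_pj=2.5)
print(m.crossover_params())                    # 500000.0
print(route(m, 10_000), route(m, 10_000_000))  # digital imc
```

The model assumes the digital per-parameter cost exceeds the in-array cost. Otherwise IMC never wins and there is no crossover.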
Dynamic routing · peripheral overhead · data stationarity · NVM+DRAM fusion
Patent Filing Trends and Performance Benchmarks
Quantitative signals from the NMC and IMC patent dataset — filing velocity by paradigm and key efficiency benchmarks from cited inventions.
NMC vs. IMC Patent Filing Velocity (2019–2025)
IMC filings have outpaced NMC consistently since 2021, reflecting growing academic and industry investment in compute-in-memory architectures for AI acceleration.
Energy Efficiency Gains by Architecture Type
IMC achieves 2–3 orders of magnitude better energy efficiency than conventional CMOS-based accelerators for MAC operations, and Princeton documents 10× TOPS/W vs. digital accelerators. NMC delivers smaller gains (about 5× over the CPU-DRAM path, per Fudan 2025) with fewer trade-offs.
NMC vs. IMC: Architectural Trade-offs for AI Workloads
A structured comparison across the dimensions that matter most for bandwidth-bound AI inference and training, drawn directly from the patent dataset.
| Dimension | Near-Memory Computing (NMC) | In-Memory Computing (IMC) |
|---|---|---|
| Proximity to data | Compute sits in the controller or package adjacent to DRAM; read operations from the memory array are still required | MAC result is generated by the physics of the array; resident weights are never "read" in the traditional sense |
| Energy efficiency | ~5× vs. the CPU-DRAM path via 3D stacking and TSV interconnects (Fudan, 2025) | 10× TOPS/W vs. digital accelerators (Princeton); 2–3 orders of magnitude vs. CMOS-based accelerators (CAS ICT) (leads) |
| Memory write cost | Symmetric read/write with standard, well-characterized DRAM semantics; suitable for frequent weight updates (leads) | Write cost is NOT reduced; expensive for NVM substrates with limited write endurance (Princeton, 2023) |
| Scalability | Scales with DRAM technology node using standard digital logic, with no precision degradation (Princeton, 2023) (leads) | Most demonstrations limited to <128 Kb; use at advanced CMOS nodes not demonstrated; analog non-ideality at scale (Princeton) |
| Weight update workloads | Preferred for fine-tuning and on-device learning: symmetric write cost, no endurance constraints | Costly weight writes; NVM substrates have limited write endurance, so unsuitable for frequent updates |
| Small model efficiency | Efficient across a wider range of model sizes, with no peripheral-circuit overhead cliff | Efficiency collapses for small parameter counts because peripheral circuit power (ADC, DAC) exceeds the savings (Mentium, 2021) |
| Operational complexity | Separate compute module, so conventional memory transactions do not compete with compute | PIM compute and conventional read/write compete for DRAM access, requiring dynamic priority management (Google, 2025) |
| Best-fit AI workloads | Recommendation engines, graph neural networks, streaming bandwidth-bound workloads, on-device fine-tuning | Stationary-weight inference: CNNs, transformer attention with pre-loaded weights, LLM inference with the KV cache in a CIM tier |
Key Assignees and Their Strategic Positions
The NMC and IMC patent landscape spans US, China, South Korea, Japan, India, and WIPO/PCT jurisdictions. These are the dominant players and their distinct technical focus areas.
Princeton University — IMC Foundations
The most prolific IMC architecture assignee in the dataset, with patents in US, WO, KR, IN, JP, and CN jurisdictions. Covers scalable configurable IMC core arrays with on-chip networks, configurable bit cells, and near-memory computing path fallbacks. Establishes the foundational ~10× energy efficiency claims and documents scalability barriers most clearly — including the <128Kb capacity limit and advanced CMOS node integration challenges.
IBM — System-Level IMC Deployment
Focuses on practical large-scale IMC deployment: 2D mesh CIM accelerator architectures with multiple analog CIM tiles interconnected via on-chip networks, 3D crossbar DNN weight assignment optimization, and weight/bias reuse for varying minibatch sizes. Addresses the operational question of how to maximize throughput of an existing IMC hardware substrate across diverse inference workloads at production scale.
Samsung Electronics — Bandwidth-Aware Scheduling
Contributes bandwidth-aware scheduling at the neural processing unit level: the processor determines available or predicted bandwidth for both external storage and system buses, then selects a scheduling strategy that co-optimizes memory access and processing element array utilization. Introduces the operational intensity threshold concept (~256 ops/byte) as a hardware-observable criterion for memory-bound vs. compute-bound phase identification.
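One way to read this scheduling claim is as a small optimization. First, estimate effective bandwidth as the tighter of the external-memory and system-bus figures. Then pick the candidate dataflow that maximizes attainable throughput under a roofline bound. The candidate strategies, their intensities, and their utilization caps are hypothetical. The patent's actual strategy set and selection rule are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    ops_per_byte: float    # data reuse achieved by this dataflow (hypothetical)
    pe_utilization: float  # fraction of the PE array it keeps busy (hypothetical)


def select_strategy(strategies, peak_ops_per_s, ext_mem_bw, bus_bw):
    """Return the strategy with the highest roofline-attainable throughput."""
    effective_bw = min(ext_mem_bw, bus_bw)  # the narrower link bounds data supply

    def attainable(s: Strategy) -> float:
        return min(s.pe_utilization * peak_ops_per_s, s.ops_per_byte * effective_bw)

    return max(strategies, key=attainable)


candidates = [
    Strategy("large tiles, high reuse", ops_per_byte=512, pe_utilization=0.60),
    Strategy("small tiles, low reuse", ops_per_byte=64, pe_utilization=0.95),
]
PEAK = 100e12  # hypothetical 100 TOPS PE array

# Ample bandwidth: the dataflow that keeps more PEs busy wins.
print(select_strategy(candidates, PEAK, ext_mem_bw=2e12, bus_bw=4e12).name)
# Bus-constrained: the high-reuse dataflow wins.
print(select_strategy(candidates, PEAK, ext_mem_bw=2e12, bus_bw=0.2e12).name)
```

With ample bandwidth, both dataflows are compute-capped, so the higher-utilization one wins. When the bus narrows, the low-reuse dataflow drops to the bandwidth roof and the high-reuse dataflow overtakes it.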
Apple — Dynamic Phase Detection
Drives the dynamic phase detection direction, with efficiency control metrics and adaptive operating point management for tasks that transition between computation-bound and memory-bound phases within a single inference run. Particularly relevant for large language model token generation, which is inherently memory-bound during the autoregressive decode phase — a key commercial AI deployment scenario.
Matching AI Workloads to the Right Memory Paradigm
The choice between NMC and IMC is not merely architectural — it is workload-driven. Memory-bound workloads are those in which the arithmetic intensity (compute operations per byte fetched) is low, meaning the bandwidth bottleneck, not compute throughput, limits performance. This is precisely the regime of large language model (LLM) inference, recommendation systems, and graph neural networks.
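A back-of-the-envelope estimate shows why LLM decode falls in this regime. Assume a decode step multiplies an FP16 weight matrix $W \in \mathbb{R}^{m \times n}$ (2 bytes per weight, read once per step) by a batch of $B$ activation vectors. Count one multiply and one add per weight per vector, treat each floating-point operation as one "op", and ignore activation and KV-cache traffic:

$$
I_{\text{decode}} = \frac{\text{operations}}{\text{bytes moved}} \approx \frac{2\,m\,n\,B}{2\,m\,n} = B \quad \text{ops/byte}
$$

At $B = 1$ the intensity is about 1 op/byte, roughly 256 times below the ~256 ops/byte threshold from Samsung's patent, so the step is memory-bound. Under these simplifying assumptions, weight reuse reaches the threshold only when on the order of 256 sequences are decoded together.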
Apple's efficiency and power control patents (2025) describe a controller that monitors real-time operational parameters to determine whether a task is in a computation-bound or memory-bound phase, then adaptively drives the computation engine to an efficient operating point — explicitly calling out LLM inference as a key target workload. The autoregressive decode phase of LLM generation is inherently memory-bound, making it a prime candidate for NMC or IMC acceleration.
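Below is a minimal sketch of such a controller. It assumes hypothetical per-interval counters (operations issued, bytes transferred) and a hypothetical table of compute-engine operating points. Each interval is classified by measured intensity. In memory-bound phases, the controller picks the slowest operating point that still keeps up with the memory system. This illustrates phase-adaptive operating points in general, not Apple's claimed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_ghz: float
    ops_per_cycle: int  # hypothetical engine width

    @property
    def ops_per_s(self) -> float:
        return self.freq_ghz * 1e9 * self.ops_per_cycle


# Hypothetical DVFS table for the computation engine, slowest first.
OPERATING_POINTS = [OperatingPoint(f, 4096) for f in (0.4, 0.8, 1.2, 1.6)]


def choose_operating_point(ops, bytes_moved, mem_bw_bytes_per_s):
    """Classify an interval from its counters and pick an operating point.

    If some operating point can keep up with what memory can feed
    (intensity x bandwidth), the interval is memory-bound and the slowest such
    point is chosen, since running faster would only leave the engine idle.
    Otherwise the interval is compute-bound and the fastest point is chosen.
    """
    intensity = ops / bytes_moved                  # measured ops/byte
    memory_limited_rate = intensity * mem_bw_bytes_per_s
    for point in OPERATING_POINTS:                 # slowest first
        if point.ops_per_s >= memory_limited_rate:
            return "memory-bound", point
    return "compute-bound", OPERATING_POINTS[-1]


# Decode-like interval: ~1 op/byte on a hypothetical 1 TB/s memory system.
print(choose_operating_point(ops=2e9, bytes_moved=2e9, mem_bw_bytes_per_s=1e12))
# Prefill-like interval: ~1000 ops/byte.
print(choose_operating_point(ops=1e12, bytes_moved=1e9, mem_bw_bytes_per_s=1e12))
```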
Amazon's arithmetic-intensity-based load cloning patent (2025) operationalizes the arithmetic intensity concept directly in a neural network compiler: load operations with an arithmetic intensity factor (AIF) above a threshold are selected and cloned, with computation clusters inserted between each clone, so that the available local memory bandwidth is more fully utilized. This software-level mechanism complements both NMC and IMC hardware by ensuring that workload scheduling respects bandwidth availability.
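The transformation can be sketched on a toy instruction list. Each load carries an AIF, and loads above the threshold are split into clones. A copy of the consuming compute cluster follows each clone, so memory transfers and computation interleave. The IR classes, the `aif` field, the even split, and the threshold value are hypothetical simplifications, not the patented compiler pass.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    nbytes: int
    aif: float  # arithmetic intensity factor for this load (hypothetical field)


@dataclass
class Compute:
    name: str


def clone_loads(program, aif_threshold, n_clones):
    """Split each load whose AIF exceeds the threshold into n_clones partial
    loads, placing a copy of its consuming compute cluster after each clone so
    that transfers and computation interleave."""
    out, i = [], 0
    while i < len(program):
        instr = program[i]
        if isinstance(instr, Load) and instr.aif > aif_threshold:
            j = i + 1
            while j < len(program) and isinstance(program[j], Compute):
                j += 1                      # the cluster consuming this load
            cluster = program[i + 1:j]
            if instr.nbytes % n_clones:
                raise ValueError("toy model assumes an even split")
            for k in range(n_clones):
                out.append(Load(f"{instr.name}.{k}", instr.nbytes // n_clones, instr.aif))
                out.extend(Compute(f"{c.name}.{k}") for c in cluster)
            i = j
        else:
            out.append(instr)
            i += 1
    return out


program = [Load("w", 4096, aif=8.0), Compute("mac"), Compute("add"),
           Load("bias", 64, aif=0.5), Compute("relu")]
print([op.name for op in clone_loads(program, aif_threshold=4.0, n_clones=2)])
# ['w.0', 'mac.0', 'add.0', 'w.1', 'mac.1', 'add.1', 'bias', 'relu']
```

The low-AIF `bias` load passes through unchanged. Only the high-intensity load is worth splitting, because its compute cluster is long enough to cover the next clone's transfer.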
For large model inference, the data placement problem is critical. Hangzhou Micro-Nano Core Electronics' AI model compiler (2025) describes a two-tier CIM + near-DRAM architecture: KV caches with high access frequency are assigned to the CIM tier, while model weights reside in the DRAM tier with asynchronous prefetch instructions inserted by the compiler to hide data transfer latency. This is a concrete example of how NMC and IMC are combined in practice for LLM inference. The IEEE has documented similar hierarchical memory strategies in semiconductor research literature.
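A simplified placement pass in the spirit of that description works in two steps. First, rank tensors by access frequency, greedily fill a capacity-limited CIM tier with the hottest ones, and leave the rest in DRAM. Then emit an asynchronous prefetch for each DRAM-resident tensor one operation ahead of its use. Tensor names, sizes, frequencies, the greedy policy, and the one-op prefetch distance are all hypothetical.

```python
from dataclasses import dataclass

MiB = 1 << 20


@dataclass
class Tensor:
    name: str
    nbytes: int
    accesses_per_token: float  # hypothetical access frequency


def place(tensors, cim_capacity_bytes):
    """Greedy hot-first placement: the most frequently accessed tensors go to the
    capacity-limited CIM tier; everything else stays in DRAM."""
    placement, used = {}, 0
    for t in sorted(tensors, key=lambda t: t.accesses_per_token, reverse=True):
        if used + t.nbytes <= cim_capacity_bytes:
            placement[t.name] = "CIM"
            used += t.nbytes
        else:
            placement[t.name] = "DRAM"
    return placement


def schedule_with_prefetch(uses, placement):
    """uses[i] is the tensor read by op i. Each DRAM tensor gets a PREFETCH issued
    before the preceding op, so its transfer overlaps that op's compute; a DRAM
    tensor read by the very first op has nothing earlier to hide behind."""
    out, issued = [], set()

    def maybe_prefetch(name):
        if placement[name] == "DRAM" and name not in issued:
            out.append(f"PREFETCH {name}")
            issued.add(name)

    if uses:
        maybe_prefetch(uses[0])
    for i, name in enumerate(uses):
        if i + 1 < len(uses):
            maybe_prefetch(uses[i + 1])
        out.append(f"OP reads {name}")
    return out


tensors = [Tensor("kv_cache", 8 * MiB, 32.0),
           Tensor("attn_weights", 16 * MiB, 1.0),
           Tensor("ffn_weights", 32 * MiB, 1.0)]
placement = place(tensors, cim_capacity_bytes=8 * MiB)
print(placement)
print(schedule_with_prefetch(["attn_weights", "kv_cache", "ffn_weights"], placement))
```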
Google's memory access scheduling patent for PIM architectures (2025) addresses contention management when PIM compute operations and conventional memory read/write transactions compete for DRAM access, a challenge that conventional NMC architectures (with a separate compute module) largely avoid.
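The contention problem can be illustrated with a toy arbiter for a single DRAM channel. PIM compute commands and conventional read/write requests wait in separate queues. The arbiter favors one class by default, but it promotes the other queue's head once that head has been passed over a set number of times, so neither class starves. The priority rule and the aging limit are hypothetical and are not the policy claimed in Google's patent.

```python
from collections import deque


def arbitrate(pim_requests, rw_requests, prefer="rw", max_wait=3):
    """Toy single-channel DRAM arbiter issuing one command per cycle.

    Serves the `prefer` class by default, but if the head of the other queue
    has been passed over `max_wait` times, it is promoted (aging), so neither
    PIM compute commands nor conventional reads/writes starve.
    Returns a list of (cycle, kind, request) tuples.
    """
    queues = {"pim": deque(pim_requests), "rw": deque(rw_requests)}
    waited = {"pim": 0, "rw": 0}
    other = {"pim": "rw", "rw": "pim"}
    issued, cycle = [], 0
    while queues["pim"] or queues["rw"]:
        pick = prefer if queues[prefer] else other[prefer]
        if queues[other[pick]] and waited[other[pick]] >= max_wait:
            pick = other[pick]              # aged request overrides preference
        issued.append((cycle, pick, queues[pick].popleft()))
        waited[pick] = 0
        if queues[other[pick]]:
            waited[other[pick]] += 1        # the other head was passed over
        cycle += 1
    return issued


order = arbitrate(["p0", "p1"], ["r0", "r1", "r2", "r3", "r4"])
print([req for _, _, req in order])
# ['r0', 'r1', 'r2', 'p0', 'r3', 'r4', 'p1']
```

Lowering `max_wait` shifts bandwidth toward PIM compute at the cost of read/write latency. Tuning that balance at run time is what dynamic priority management has to do.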
Key Takeaways from the Patent Dataset
Distilled from 50+ patents across 6 jurisdictions: the findings that matter most for R&D engineers and IP professionals designing or evaluating memory computing architectures.
NMC Retains Read Semantics; IMC Eliminates Weight Movement Entirely
NMC reduces data movement distance but retains memory read semantics, placing compute in the controller or package adjacent to DRAM. IMC eliminates weight movement entirely by performing MAC operations inside the storage array, yielding energy efficiency gains of 2–3 orders of magnitude per the CAS ICT Automatic Synthesis Method for CNN Accelerator Architecture patents.
NMC: SEN, 2023 · IMC: CAS ICT, 2025
IMC Efficiency Collapses for Small Parameter Counts
IMC accelerator efficiency drops when the number of network parameters is small, because peripheral circuit power (ADC, DAC) can exceed the power saved by eliminating data transfer — as quantified in Mentium Technologies' Digital-Analog Hybrid System Architecture patent (2021). NMC remains efficient across a wider range of model sizes.
Mentium Technologies, 2021
Memory Write Cost Asymmetry Is a Critical IMC Liability
Princeton University's Scalable Array Architecture for In-Memory Computing explicitly documents that IMC reduces read and compute cost but not write cost — making NMC preferable for workloads with frequent weight updates or fine-tuning. NVM substrates have limited write endurance that compounds this liability.
Princeton University, 2023
Operational Intensity (~256 ops/byte) Is the Defining Routing Metric
Samsung's Neural Processing Device patent documents an empirical threshold of approximately 256 ops/byte, beyond which increasing memory bandwidth no longer improves algorithm performance — defining the memory-bound regime where NMC and IMC interventions are justified. Below this threshold, bandwidth is the binding constraint.
Samsung Electronics, 2022
Near-Memory vs. In-Memory Computing: Key Questions Answered
Near-memory computing (NMC) tightly couples the memory and compute processor together, reducing data-transfer latency and power through short wiring, while in-memory computing (IMC) breaks the von Neumann constraint entirely by performing computation directly inside the memory. In NMC, the compute unit accesses memory faster than any bus-connected processor, but the memory array retains its conventional read/write operational semantics — it does not compute.
ReRAM-based IMC systems achieve 2–3 orders of magnitude better energy efficiency than conventional CMOS-based accelerators by eliminating the recurring read-compute-write cycle across memory buses. Princeton University's IMC array patents document that IMC-based neural network accelerators can simultaneously achieve approximately 10× better energy efficiency (TOPS/W) and 10× better compute density (TOPS/mm²) compared to optimized digital accelerators.
IMC accelerator efficiency drops when the number of network parameters is small, because peripheral circuit power (ADC, DAC) can exceed the power saved by eliminating data transfer — directly quantifying the regime where NMC or conventional digital execution is preferable. NMC remains efficient across a wider range of model sizes.
Samsung's Neural Processing Device patent documents an empirical threshold of approximately 256 ops/byte, beyond which increasing memory bandwidth no longer improves algorithm performance, defining the transition between memory-bound and compute-bound regimes. Below this threshold, NMC and IMC interventions are justified.
Princeton University's Scalable Array Architecture for In-Memory Computing explicitly documents that IMC reduces memory read and compute cost but does not reduce memory write cost — writing new weights to the crossbar is expensive, particularly for NVM substrates with limited write endurance. This makes NMC preferable for workloads with frequent weight updates such as on-device learning or fine-tuning.
Hybrid NMC+IMC architectures represent the dominant practical direction, as demonstrated by Zhejiang Laboratory's Heterogeneous Storage-Compute Fusion System pairing 3D DRAM NMC with memristor IMC, and Princeton's Configurable In-Memory Computing Engine incorporating a dedicated near-memory computing path fallback within an IMC chip. The Hangzhou Micro-Nano Core Electronics compiler for hierarchical memory AI accelerators also implements a two-tier CIM + near-DRAM architecture where KV caches with high access frequency are assigned to the CIM tier while model weights reside in the DRAM tier.
References
- Techniques to Utilize Near Memory Compute Circuitry for Memory-Bound Workloads — SEN, SUJOY, 2023
- Controller-Internal Near-Memory Computing Acceleration Circuit with Weight Prefetching — Fudan University, 2025
- A Method for Achieving Load-Balanced Sparse Neural Network Near-Memory Inference Acceleration — Fudan University, 2024
- Computation Task Scheduling Device, Computing Device, and Method — Huawei Technologies, 2023
- Automatic Synthesis Method for CNN Accelerator Architecture Oriented to In-Memory Computing — Chinese Academy of Sciences ICT, 2025
- Automatic Synthesis Method for CNN Accelerator Architecture Oriented to In-Memory Computing — Chinese Academy of Sciences ICT, 2024
- Full-Analog Non-Multiply-Accumulate Operation Method and Apparatus for Analog In-Memory Computing — Tsinghua University, 2025
- TCAM and LUT-Based IMC Architecture Neural Network Accelerator — UESTC, 2023
- Scalable Array Architecture for In-Memory Computing (US) — The Trustees of Princeton University, 2023
- Scalable Array Architecture for In-Memory Computing (WO) — The Trustees of Princeton University, 2021
- Configurable In-Memory Computing Engine, Platform, Bit Cells and Layouts — The Trustees of Princeton University, 2019
- Two-Dimensional Mesh for Compute-In-Memory Accelerator Architecture — IBM, 2023
- Assigning DNN Weights to a 3D Crossbar Array — IBM, 2024
- Reusing Weights and Biases in an AI Accelerator for Different Minibatch Sizes — IBM, 2025
- Digital-IMC Hybrid System Architecture for Neural Network Acceleration — Mentium Technologies, 2021
- Digital-Analog Hybrid System Architecture for Neural Network Acceleration — Mentium Technologies, 2021
- Neural Network Operation Method, Device, and Electronic Equipment Based on Neural Network — Samsung Electronics, 2024
- Neural Processing Device and Operation Method — Samsung Electronics, 2022
- Efficiency and Power Control of Tasks Having Computation Bound and Memory Bound Phases — Apple Inc., 2025
- Arithmetic-Intensity Based Load Cloning — Amazon, 2025
- Method and System for Compiling AI Models for AI Accelerators with Hierarchical Memory — Hangzhou Micro-Nano Core Electronics, 2025
- Memory Access Scheduling for Parallel Computations Using a Processing-In-Memory Architecture — Google LLC, 2025
- Neural Network Accelerator with Parameters Resident on the Chip — Google LLC, 2020
- Heterogeneous Storage-Compute Fusion System for DNN Inference Acceleration — Zhejiang Laboratory, 2020
- Adaptive Utilization-Based In-Memory Computing Accelerator, System, and Method — South University of Science and Technology, 2025
- IEEE — Semiconductor and Memory Architecture Research
- JEDEC — DRAM and HBM Standards
- arXiv — LLM Inference Memory Bottleneck Research
All data and statistics on this page are sourced from the references above and from PatSnap's proprietary innovation intelligence platform. Patent analysis conducted via PatSnap Eureka.