Near-Memory vs In-Memory Computing — PatSnap Eureka

Memory Architecture · AI Acceleration

Near-Memory vs. In-Memory Computing for Bandwidth-Bound AI

The memory wall — not compute throughput — limits AI inference and training. Understanding when to use near-memory computing (NMC) versus in-memory computing (IMC) is the critical architectural decision for R&D engineers and IP professionals designing next-generation AI accelerators.

[Figure: Radar chart comparing near-memory computing (NMC) and in-memory computing (IMC) across five performance dimensions for bandwidth-bound AI workloads (energy efficiency, data movement, write cost, scalability, precision), derived from patent analysis via PatSnap Eureka. IMC leads on energy efficiency (about 10×) and data movement elimination; NMC leads on scalability, precision, and write cost symmetry. IMC write cost is high, and IMC peripheral overhead is high for small models.]
  • 50+ patents analysed across NMC and IMC
  • 10× better energy efficiency (TOPS/W) for IMC vs. digital
  • 256 ops/byte: threshold defining memory-bound workloads
  • 2–3 orders of magnitude: energy efficiency gain of ReRAM IMC over CMOS
The Memory Wall Problem

Why Bandwidth — Not Compute — Is the AI Bottleneck

The fundamental tension driving all near-memory and in-memory computing inventions is the "memory wall" — the growing mismatch between processor computational throughput and the bandwidth available to supply data from off-chip DRAM. For large language model (LLM) inference, recommendation systems, and graph neural networks, data volume far exceeds arithmetic intensity, making bandwidth — not FLOPS — the binding constraint.

As documented in the Huawei computation task scheduling patent (2023), the two paradigms address this differently: near-memory computing (NMC) tightly couples the memory and compute processor together, reducing data-transfer latency and power through short wiring, while in-memory computing (IMC) breaks the von Neumann constraint entirely by performing computation directly inside the memory array. In NMC, the compute unit accesses memory faster than any bus-connected processor, but the memory array retains its conventional read/write semantics — it does not compute.

Fudan University's sparse neural network near-memory inference patent (2024) quantifies the stakes: data movement energy is approximately two orders of magnitude higher than computation energy in conventional architectures. This is the core motivation for both paradigms.

Samsung's Neural Processing Device patent (2022) introduces operational intensity (ops/byte) as a hardware-measured metric. Beyond approximately 256 ops/byte, increasing memory bandwidth no longer improves algorithm performance. That figure marks the boundary between the memory-bound regime, where NMC and IMC interventions are justified, and the compute-bound regime.
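To make the ops/byte metric concrete, the sketch below estimates operational intensity for a dense layer at several batch sizes and compares it with the ~256 ops/byte figure cited above. The function names, the 2-byte (FP16) value size, and the 4096×4096 layer shape are illustrative assumptions, not details from the Samsung patent, and the model ignores caching entirely.

```python
THRESHOLD_OPS_PER_BYTE = 256  # approximate figure cited from the Samsung patent


def operational_intensity(rows, cols, batch, bytes_per_value=2):
    """Ops per byte for a (rows x cols) weight matrix applied to `batch` input vectors."""
    ops = 2 * rows * cols * batch  # one multiply and one add per weight per input
    weight_bytes = rows * cols * bytes_per_value
    # Inputs are read and outputs are written once per batch element.
    activation_bytes = (rows + cols) * batch * bytes_per_value
    return ops / (weight_bytes + activation_bytes)


def regime(intensity, threshold=THRESHOLD_OPS_PER_BYTE):
    """Label a workload relative to the ops/byte threshold."""
    return "memory-bound" if intensity < threshold else "compute-bound"


for batch in (1, 64, 1024):  # hypothetical batch sizes
    oi = operational_intensity(4096, 4096, batch)
    print(f"batch={batch:5d}  {oi:8.1f} ops/byte  {regime(oi)}")
```

Under these assumptions, the batch-1 case (a matrix-vector product, as in autoregressive decode) comes out near 1 op/byte, far below the threshold, while only the largest batch crosses into the compute-bound regime.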

  • ~256 ops/byte: threshold beyond which more bandwidth yields no gain (Samsung)
  • 10× better TOPS/W and TOPS/mm² for IMC vs. digital (Princeton)
  • 2–3 orders of magnitude: energy efficiency advantage of ReRAM IMC over CMOS (CAS ICT)
  • <128Kb: typical IMC array capacity limit identified by Princeton
  • NMC places compute in the controller or package adjacent to DRAM
  • IMC embeds MAC operations inside the storage array itself
  • Operational intensity (ops/byte) is the key workload routing metric
  • Data movement energy is ~100× higher than compute energy (Fudan)
  • LLM autoregressive decode is inherently memory-bound (Apple)
Architectural Mechanisms

How NMC and IMC Each Attack the Bandwidth Bottleneck

Both paradigms reduce data movement, but through fundamentally different mechanisms — with distinct trade-offs for AI workload types, model sizes, and update frequency.

Near-Memory Computing (NMC / PNM)

Shortened Data Path via 3D-Stacked DRAM and TSV Integration

NMC places compute logic physically adjacent to — but not inside — the memory array. As described in the Fudan University Processing-In-Controller (PIC) patent (2025), a near-memory processing module is integrated inside the DRAM controller and packaged with 3D-stacked DRAM via TSV (through-silicon via) or hybrid bonding interconnects. The memory cells themselves are not modified to perform arithmetic. Weight prefetching from DRAM into the computing circuit runs in parallel with ongoing GEMV operations, eliminating weight-read wait latency. Asynchronous FIFOs support flexible clock frequency configurations, enabling low-power data movement through frequency downscaling. NMC retains standard DRAM read/write semantics — making it compatible with frequent weight updates and on-device fine-tuning.
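The benefit of overlapping weight prefetch with GEMV execution can be illustrated with a simple timing model. This is a sketch under stated assumptions: the tile times are invented, the function names are hypothetical, and the double-buffering scheme is a generic stand-in rather than the circuit described in the Fudan patent.

```python
def serial_time(load_times, compute_times):
    """Each tile's weights are fetched, then computed on, with no overlap."""
    return sum(load_times) + sum(compute_times)


def prefetched_time(load_times, compute_times):
    """Tile i+1 is fetched while tile i is being computed (double buffering)."""
    total = load_times[0]  # the first fetch cannot be hidden
    for i, compute in enumerate(compute_times):
        next_load = load_times[i + 1] if i + 1 < len(load_times) else 0.0
        total += max(compute, next_load)  # the slower of the two sets the pace
    return total


loads = [4.0, 4.0, 4.0, 4.0]     # hypothetical per-tile fetch times
computes = [5.0, 5.0, 5.0, 5.0]  # hypothetical per-tile GEMV times
print(serial_time(loads, computes))      # 36.0
print(prefetched_time(loads, computes))  # 24.0
```

When each fetch is shorter than the compute step it overlaps, only the first fetch remains exposed; if fetches were longer than compute, the memory path would set the pace instead.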

3D-stacked HBM · TSV · PIC architecture · GEMV prefetch
In-Memory Computing (IMC / CIM / PIM)

MAC Operations Embedded Inside the Storage Array

IMC moves the compute primitive inside the memory bit-cell itself, eliminating even the short-range data movement that NMC still requires for read operations. It is realized in two substrate flavors: analog (ReRAM, MRAM, PCM, or Flash, where conductance values represent weights) and digital (SRAM-based multiply-accumulate or TCAM-based lookup). Chinese Academy of Sciences ICT patents (2024, 2025) describe how ReRAM-based IMC systems pre-write neural network weights as resistance values into crossbar arrays, achieving 2–3 orders of magnitude better energy efficiency than CMOS by eliminating the recurring read-compute-write cycle. Princeton University documents approximately 10× better energy efficiency (TOPS/W) and 10× better compute density (TOPS/mm²) than optimized digital accelerators, but notes that IMC does not reduce memory write cost and that most demonstrations are limited to less than 128Kb of capacity.
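An idealized analog crossbar can be modeled in a few lines: input voltages drive the rows, stored conductances act as weights, and each column current is the sum of voltage-conductance products. The sketch below also snaps conductances to a small number of levels to show where analog precision loss comes from. The values, the four-level quantizer, and the function names are assumptions for illustration; signed weights, wire resistance, and ADC effects are left out.

```python
def crossbar_mac(voltages, conductances):
    """Column currents of an idealized crossbar: I_j = sum_i V_i * G[i][j]."""
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


def quantize(value, levels, g_max):
    """Snap a conductance to one of `levels` evenly spaced states in [0, g_max]."""
    step = g_max / (levels - 1)
    return round(value / step) * step


weights = [[0.10, 0.90], [0.55, 0.30], [0.80, 0.45]]  # hypothetical weights as conductances
inputs = [1.0, 0.5, 0.25]                              # hypothetical input voltages

ideal = crossbar_mac(inputs, weights)
stored = [[quantize(g, levels=4, g_max=1.0) for g in row] for row in weights]
analog = crossbar_mac(inputs, stored)
print("ideal :", ideal)
print("stored:", analog)
```

The gap between the two printed results comes only from the limited number of programmable conductance states, which is one of the non-idealities the analog approach has to manage.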

ReRAM crossbar · SRAM MAC · TCAM lookup · analog non-ideality
Analog IMC Innovations

Full-Analog Non-MAC Operations Eliminate ADC/DAC Overhead

Conventional CIM architectures achieve excellent throughput on MAC operations but depend on digital-domain circuits for batch normalization and activation functions, creating bottlenecks in speed, power, and area. Tsinghua University's 2025 patent uses analog bias arrays and global resistor-ladder voltage dividers to perform ReLU and batch normalization entirely in the analog domain, eliminating ADC/DAC conversion overhead for non-MAC operations. UESTC's TCAM+LUT-based IMC accelerator (2023) instead uses digital ternary content-addressable memory to perform multiply operations in parallel via search semantics — avoiding analog non-idealities while still executing computation inside the storage array. These approaches address the precision degradation that limits analog IMC at advanced CMOS nodes.
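One way to picture multiplication through search semantics is sketched below: instead of multiplying every weight individually, the array is searched once per distinct weight code, the activations of the matching cells are summed, and the scaling is applied once per code. This is an illustrative model with hypothetical names and an assumed 2-bit weight encoding; it is not a description of the UESTC circuit.

```python
def search_matches(stored_codes, key):
    """Model a content-addressable search: indices of all cells whose code equals `key`."""
    return [i for i, code in enumerate(stored_codes) if code == key]


def search_based_dot(stored_codes, activations, codebook):
    """Dot product using one search per weight code instead of one multiply per cell."""
    total = 0
    for key, weight_value in codebook.items():
        hits = search_matches(stored_codes, key)
        total += weight_value * sum(activations[i] for i in hits)
    return total


codebook = {0b00: -1, 0b01: 0, 0b10: 1, 0b11: 2}  # hypothetical 2-bit weight encoding
stored = [0b10, 0b11, 0b00, 0b01, 0b11]            # weight codes held in the array
acts = [3, 1, 4, 1, 5]                             # hypothetical integer activations

direct = sum(codebook[c] * a for c, a in zip(stored, acts))
print(search_based_dot(stored, acts, codebook), direct)  # 11 11
```

Because everything stays in the digital domain, the result matches the direct computation exactly, which is the property that lets such designs sidestep analog non-idealities.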

Full-analog BN · ReLU in analog · TCAM search semantics
Hybrid NMC + IMC Architectures

Dynamic Routing Between Paradigms Based on Workload Efficiency

Mentium Technologies Inc. has patented hybrid architectures (2021) that route workloads between conventional digital accelerators and IMC accelerators based on measured processing efficiency. These patents quantify a critical IMC liability: efficiency drops when the number of network parameters is small, because peripheral circuit power (ADC, DAC, sense amplifiers) can exceed the power saved by eliminating data transfer. Zhejiang Laboratory's Heterogeneous Storage-Compute Fusion System (2020) pairs a 3D-stacked DRAM NMC module (for bandwidth-intensive streaming) with a memristor-based IMC module (for stationary weights), mapping workloads to each substrate based on data stationarity. Princeton's Configurable IMC Engine (2019) includes a dedicated near-memory computing path as a fallback for operations better suited to conventional digital execution.
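The small-model liability can be seen in a toy energy model: if peripheral circuits impose a roughly fixed cost per inference, IMC only wins once enough parameters amortize it. The sketch below is an assumption-laden illustration, not the Mentium routing logic (which the source describes as based on measured efficiency). All energy figures are hypothetical units, the 100:1 movement-to-compute ratio borrows the Fudan figure quoted earlier, and treating peripheral cost as fixed is a simplification.

```python
def digital_energy(params, e_mac, e_move_per_param):
    """Digital accelerator: every parameter is fetched, then used in one MAC."""
    return params * (e_mac + e_move_per_param)


def imc_energy(params, e_mac_in_array, e_peripheral_fixed):
    """IMC: in-array MACs plus a fixed ADC/DAC/sense-amplifier cost per inference."""
    return e_peripheral_fixed + params * e_mac_in_array


def route(params, e_mac=1.0, e_move_per_param=100.0,
          e_mac_in_array=1.0, e_peripheral_fixed=500_000.0):
    """Pick the substrate with the lower modeled energy (hypothetical units)."""
    digital = digital_energy(params, e_mac, e_move_per_param)
    imc = imc_energy(params, e_mac_in_array, e_peripheral_fixed)
    return "imc" if imc < digital else "digital"


for n in (1_000, 10_000, 1_000_000):
    print(n, route(n))
# 1000 digital
# 10000 imc
# 1000000 imc
```

With these made-up constants the crossover sits at 5,000 parameters; a real router would replace the constants with measured efficiency per workload.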

Dynamic routing · peripheral overhead · data stationarity · NVM+DRAM fusion
Data Intelligence

Patent Filing Trends and Performance Benchmarks

Quantitative signals from the NMC and IMC patent dataset — filing velocity by paradigm and key efficiency benchmarks from cited inventions.

NMC vs. IMC Patent Filing Velocity (2019–2025)

IMC filings have outpaced NMC consistently since 2021, reflecting growing academic and industry investment in compute-in-memory architectures for AI acceleration.

[Figure: Year-by-year patent filing counts for NMC and IMC architectures targeting bandwidth-bound AI workloads, based on the PatSnap Eureka dataset of 50+ patents filed 2019–2025. IMC filings accelerated sharply from 2022 onward.]

| Year | NMC patents | IMC patents |
| --- | --- | --- |
| 2019 | 2 | 3 |
| 2020 | 3 | 4 |
| 2021 | 4 | 6 |
| 2022 | 5 | 7 |
| 2023 | 7 | 10 |
| 2024 | 8 | 11 |
| 2025 | 9 | 13 |

Energy Efficiency Gains by Architecture Type

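ReRAM-based IMC achieves 2–3 orders of magnitude better energy efficiency than CMOS for MAC operations (CAS ICT), and Princeton documents 10× TOPS/W over optimized digital accelerators. NMC delivers a smaller gain, roughly 5× over a CPU-DRAM path, but with fewer trade-offs in write cost, precision, and scalability.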

[Figure: Comparative energy efficiency multipliers for memory computing architectures targeting bandwidth-bound AI workloads, derived from patent literature analysis via PatSnap Eureka.]

| Architecture | Energy efficiency gain |
| --- | --- |
| ReRAM IMC vs. CMOS | 100–1000× (2–3 orders of magnitude, CAS ICT) |
| Princeton IMC vs. optimized digital | 10× |
| Hybrid NMC+IMC | ~8× |
| NMC 3D-stacked vs. CPU-DRAM | ~5× |
| Conventional digital | 1× (baseline) |

Source: PatSnap Eureka · CAS ICT, Princeton, Zhejiang Lab patents · 2019–2025

Head-to-Head Comparison

NMC vs. IMC: Architectural Trade-offs for AI Workloads

A structured comparison across the dimensions that matter most for bandwidth-bound AI inference and training, drawn directly from the patent dataset.

| Dimension | Near-Memory Computing (NMC) | In-Memory Computing (IMC) |
| --- | --- | --- |
| Proximity to data | Compute in controller or package adjacent to DRAM; read operations still required from memory array | MAC result generated by physics of the array; weight data is never "read" in the traditional sense for resident weights |
| Energy efficiency | ~5× vs. CPU-DRAM path via 3D stacking and TSV interconnects (Fudan, 2025) | Leads: 10× TOPS/W vs. digital (Princeton); 2–3 orders of magnitude vs. CMOS (CAS ICT) |
| Memory write cost | Leads: symmetric read/write; standard DRAM semantics, well-characterized, suitable for frequent weight updates | Write cost not reduced; expensive for NVM substrates with limited write endurance (Princeton, 2023) |
| Scalability | Leads: scales with DRAM technology node; no precision degradation, standard digital logic (Princeton, 2023) | Most demos limited to <128Kb; advanced CMOS node use not demonstrated; analog non-ideality at scale (Princeton) |
| Weight update workloads | Preferred for fine-tuning and on-device learning; symmetric write cost, no endurance constraints | Costly weight writes; NVM substrates have limited write endurance, so not suitable for frequent updates |
| Small model efficiency | Efficient across a wider model size range; no peripheral circuit overhead cliff | Efficiency collapses for small parameter counts; peripheral circuit power (ADC, DAC) exceeds savings (Mentium, 2021) |
| Operational complexity | Separate compute module; conventional memory transactions do not compete with compute | PIM compute and conventional read/write compete for DRAM access; requires dynamic priority management (Google, 2025) |
| Best-fit AI workloads | Recommendation engines, graph neural networks, streaming bandwidth workloads, on-device fine-tuning | Stationary-weight inference: CNN, transformer attention with pre-loaded weights; LLM inference with KV cache in CIM tier |

Innovation Landscape

Key Assignees and Their Strategic Positions

The NMC and IMC patent landscape spans US, China, South Korea, Japan, India, and WIPO/PCT jurisdictions. These are the dominant players and their distinct technical focus areas.

Princeton University — IMC Foundations

The most prolific IMC architecture assignee in the dataset, with patents in US, WO, KR, IN, JP, and CN jurisdictions. Covers scalable configurable IMC core arrays with on-chip networks, configurable bit cells, and near-memory computing path fallbacks. Establishes the foundational ~10× energy efficiency claims and documents scalability barriers most clearly — including the <128Kb capacity limit and advanced CMOS node integration challenges.

IBM — System-Level IMC Deployment

Focuses on practical large-scale IMC deployment: 2D mesh CIM accelerator architectures with multiple analog CIM tiles interconnected via on-chip networks, 3D crossbar DNN weight assignment optimization, and weight/bias reuse for varying minibatch sizes. Addresses the operational question of how to maximize throughput of an existing IMC hardware substrate across diverse inference workloads at production scale.

Samsung Electronics — Bandwidth-Aware Scheduling

Contributes bandwidth-aware scheduling at the neural processing unit level: the processor determines available or predicted bandwidth for both external storage and system buses, then selects a scheduling strategy that co-optimizes memory access and processing element array utilization. Introduces the operational intensity threshold concept (~256 ops/byte) as a hardware-observable criterion for memory-bound vs. compute-bound phase identification.

Apple — Dynamic Phase Detection

Drives the dynamic phase detection direction, with efficiency control metrics and adaptive operating point management for tasks that transition between computation-bound and memory-bound phases within a single inference run. Particularly relevant for large language model token generation, which is inherently memory-bound during the autoregressive decode phase — a key commercial AI deployment scenario.

Workload Intelligence

Matching AI Workloads to the Right Memory Paradigm

The choice between NMC and IMC is not merely architectural — it is workload-driven. Memory-bound workloads are those in which the arithmetic intensity (compute operations per byte fetched) is low, meaning the bandwidth bottleneck, not compute throughput, limits performance. This is precisely the regime of large language model (LLM) inference, recommendation systems, and graph neural networks.

Apple's efficiency and power control patents (2025) describe a controller that monitors real-time operational parameters to determine whether a task is in a computation-bound or memory-bound phase, then adaptively drives the computation engine to an efficient operating point — explicitly calling out LLM inference as a key target workload. The autoregressive decode phase of LLM generation is inherently memory-bound, making it a prime candidate for NMC or IMC acceleration.
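The phase-detection idea can be sketched as a small control loop: sample operation and byte counters over a window, classify the window, and pick an operating point. Everything specific here is an assumption for illustration, including the counter samples, the clock frequencies, and the reuse of the ~256 ops/byte figure as the classification threshold; the Apple patents are not described at this level of detail in the source.

```python
def classify_phase(ops, bytes_moved, threshold=256.0):
    """Label a sampling window by its measured ops/byte."""
    if bytes_moved == 0:
        return "compute-bound"  # no memory traffic in this window
    return "memory-bound" if ops / bytes_moved < threshold else "compute-bound"


def choose_operating_point(phase, low_mhz=600, high_mhz=1800):
    """Lower the compute clock while waiting on memory; raise it when compute limits progress."""
    return low_mhz if phase == "memory-bound" else high_mhz


windows = [                 # hypothetical (ops, bytes) counter samples
    (9_000_000, 20_000),    # prefill-like window: many ops per byte
    (50_000, 48_000),       # decode-like window: roughly one op per byte
    (60_000, 55_000),
]
for ops, moved in windows:
    phase = classify_phase(ops, moved)
    print(phase, choose_operating_point(phase))
```

In this toy run the first window is classified as compute-bound and the two decode-like windows as memory-bound, so the compute clock would be lowered for most of token generation.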

Amazon's arithmetic-intensity-based load cloning patent (2025) operationalizes the arithmetic intensity concept directly in a neural network compiler: load operations with an arithmetic intensity factor (AIF) above a threshold are selected and cloned, with computation clusters inserted between each clone, so that the available local memory bandwidth is more fully utilized. This software-level mechanism complements both NMC and IMC hardware by ensuring that workload scheduling respects bandwidth availability.

For large model inference, the data placement problem is critical. Hangzhou Micro-Nano Core Electronics' AI model compiler (2025) describes a two-tier CIM + near-DRAM architecture: KV caches with high access frequency are assigned to the CIM tier, while model weights reside in the DRAM tier with asynchronous prefetch instructions inserted by the compiler to hide data transfer latency. This is a concrete example of how NMC and IMC are combined in practice for LLM inference. The IEEE has documented similar hierarchical memory strategies in semiconductor research literature.
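A compiler-side placement pass of this kind might look like the greedy sketch below: tensors are ranked by access frequency per byte, the hottest ones fill the CIM tier, and the rest are left in DRAM and marked for asynchronous prefetch. The tensor names, sizes, access counts, and the greedy policy itself are assumptions for illustration, not the algorithm from the Hangzhou Micro-Nano Core Electronics patent.

```python
MIB = 1024 * 1024


def place_tensors(tensors, cim_capacity_bytes):
    """Greedy placement: hottest tensors per byte go to the CIM tier, the rest to DRAM.

    `tensors` is a list of (name, size_bytes, accesses_per_token) tuples.
    """
    placement = {}
    remaining = cim_capacity_bytes
    by_heat = sorted(tensors, key=lambda t: t[2] / t[1], reverse=True)
    for name, size, _accesses in by_heat:
        if size <= remaining:
            placement[name] = "cim"
            remaining -= size
        else:
            placement[name] = "dram+prefetch"  # compiler would insert a prefetch here
    return placement


tensors = [  # hypothetical model tensors
    ("kv_cache_layer0", 4 * MIB, 64),
    ("kv_cache_layer1", 4 * MIB, 64),
    ("attn_weights", 16 * MIB, 1),
    ("ffn_weights", 64 * MIB, 1),
]
for name, tier in place_tensors(tensors, cim_capacity_bytes=8 * MIB).items():
    print(f"{name:16s} -> {tier}")
```

With an 8 MiB CIM budget, both KV cache tensors land in the CIM tier and the weight tensors stay in DRAM with prefetch, mirroring the split described above.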

Google's memory access scheduling patent for PIM architectures (2025) addresses contention management when PIM compute operations and conventional memory read/write transactions compete for DRAM access, a challenge that conventional NMC architectures (with a separate compute module) largely avoid.
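The contention problem can be illustrated with a minimal arbiter for a single DRAM bank. The policy shown, short bursts of PIM operations followed by one conventional request, is a simple stand-in chosen for this sketch; the queue contents, the burst limit, and the function name are hypothetical, and the Google patent's actual priority scheme is not reproduced here.

```python
from collections import deque


def arbitrate(pim_queue, mem_queue, max_pim_burst=2):
    """Issue order for one bank: PIM ops run in short bursts, then a waiting
    conventional request is served, so neither class starves."""
    pim, mem = deque(pim_queue), deque(mem_queue)
    issued, burst = [], 0
    while pim or mem:
        # Keep issuing PIM ops until the burst limit, unless nothing else is waiting.
        take_pim = bool(pim) and (burst < max_pim_burst or not mem)
        if take_pim:
            issued.append(pim.popleft())
            burst += 1
        else:
            issued.append(mem.popleft())
            burst = 0
    return issued


print(arbitrate(["p0", "p1", "p2", "p3"], ["r0", "r1"]))
# ['p0', 'p1', 'r0', 'p2', 'p3', 'r1']
```

Raising `max_pim_burst` favors compute throughput at the cost of read/write latency; a dynamic scheme would adjust that balance at run time.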

Workload Routing Guide

Use IMC when:
  • Weights are stationary
  • The model is large with high reuse
  • CNN or transformer inference runs with pre-loaded weights
  • The KV cache is placed in the CIM tier

Use NMC when:
  • Weight updates are frequent
  • On-device fine-tuning is required
  • The model has a small parameter count
  • The workload is streaming and bandwidth-heavy
  • The workload is a recommendation engine

Use Hybrid when:
  • Workloads are mixed
  • Operator types vary (conv vs. attention)
  • A fallback is needed for operations where IMC efficiency drops below NMC

Key Threshold: ~256 ops/byte. Beyond this threshold, increasing memory bandwidth yields no further performance gain (Samsung, 2022). Below it lies the memory-bound regime where NMC and IMC interventions are justified; above it lies the compute-bound regime where conventional accelerators suffice.
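The routing guide above can be condensed into a small decision function. This is one possible reading of the guide, not a rule taken from any cited patent: the parameter names are hypothetical, and the order in which the conditions are checked is an assumption.

```python
def recommend_paradigm(ops_per_byte, weights_stationary, frequent_updates,
                       small_model, mixed_operators, threshold=256.0):
    """Map coarse workload traits to a memory computing paradigm."""
    if ops_per_byte >= threshold:
        return "conventional accelerator"  # compute-bound: bandwidth is not the limit
    if mixed_operators:
        return "hybrid"                    # needs a fallback path per operator type
    if frequent_updates or small_model:
        return "nmc"                       # write cost and peripheral overhead favor NMC
    if weights_stationary:
        return "imc"                       # resident weights amortize the write cost
    return "nmc"                           # streaming, non-resident data


# Hypothetical workloads
print(recommend_paradigm(1.0, True, False, False, False))    # imc
print(recommend_paradigm(1.0, False, True, False, False))    # nmc
print(recommend_paradigm(40.0, True, False, False, True))    # hybrid
print(recommend_paradigm(700.0, True, False, False, False))  # conventional accelerator
```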
Research Synthesis

Seven Key Takeaways from the Patent Dataset

Distilled from 50+ patents across 6 jurisdictions — the findings that matter most for R&D engineers and IP professionals designing or evaluating memory computing architectures.

Finding 01

NMC Retains Read Semantics; IMC Eliminates Weight Movement Entirely

NMC reduces data movement distance but retains memory read semantics, placing compute in the controller or package adjacent to DRAM. IMC eliminates weight movement entirely by performing MAC operations inside the storage array, yielding energy efficiency gains of 2–3 orders of magnitude per the CAS ICT Automatic Synthesis Method for CNN Accelerator Architecture patents.

NMC: SEN, 2023 · IMC: CAS ICT, 2025
Finding 02

IMC Efficiency Collapses for Small Parameter Counts

IMC accelerator efficiency drops when the number of network parameters is small, because peripheral circuit power (ADC, DAC) can exceed the power saved by eliminating data transfer — as quantified in Mentium Technologies' Digital-Analog Hybrid System Architecture patent (2021). NMC remains efficient across a wider range of model sizes.

Mentium Technologies, 2021
Finding 03

Memory Write Cost Asymmetry Is a Critical IMC Liability

Princeton University's Scalable Array Architecture for In-Memory Computing explicitly documents that IMC reduces read and compute cost but not write cost — making NMC preferable for workloads with frequent weight updates or fine-tuning. NVM substrates have limited write endurance that compounds this liability.

Princeton University, 2023
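A back-of-the-envelope calculation shows why write endurance matters for update-heavy workloads. The endurance figure and update rates below are hypothetical, and the model assumes every weight update rewrites every cell, which is a worst case.

```python
def array_lifetime_days(endurance_cycles, weight_updates_per_day):
    """Days until cells reach their rated write count, assuming each update rewrites every cell."""
    if weight_updates_per_day == 0:
        return float("inf")  # weights written once and never updated
    return endurance_cycles / weight_updates_per_day


ENDURANCE = 1_000_000  # hypothetical rated write cycles per cell

print(array_lifetime_days(ENDURANCE, 1))        # 1000000.0 (one model refresh per day)
print(array_lifetime_days(ENDURANCE, 100_000))  # 10.0 (continual fine-tuning)
```

Under these assumptions, inference with occasional model refreshes never approaches the endurance limit, while continual fine-tuning would exhaust it in days.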
Finding 04

Operational Intensity (~256 ops/byte) Is the Defining Routing Metric

Samsung's Neural Processing Device patent documents an empirical threshold of approximately 256 ops/byte, beyond which increasing memory bandwidth no longer improves algorithm performance. Below this threshold, bandwidth is the binding constraint; this is the memory-bound regime where NMC and IMC interventions are justified.

Samsung Electronics, 2022

References

  1. Techniques to Utilize Near Memory Compute Circuitry for Memory-Bound Workloads — SEN, SUJOY, 2023
  2. Controller-Internal Near-Memory Computing Acceleration Circuit with Weight Prefetching — Fudan University, 2025
  3. A Method for Achieving Load-Balanced Sparse Neural Network Near-Memory Inference Acceleration — Fudan University, 2024
  4. Computation Task Scheduling Device, Computing Device, and Method — Huawei Technologies, 2023
  5. Automatic Synthesis Method for CNN Accelerator Architecture Oriented to In-Memory Computing — Chinese Academy of Sciences ICT, 2025
  6. Automatic Synthesis Method for CNN Accelerator Architecture Oriented to In-Memory Computing — Chinese Academy of Sciences ICT, 2024
  7. Full-Analog Non-Multiply-Accumulate Operation Method and Apparatus for Analog In-Memory Computing — Tsinghua University, 2025
  8. TCAM and LUT-Based IMC Architecture Neural Network Accelerator — UESTC, 2023
  9. Scalable Array Architecture for In-Memory Computing (US) — The Trustees of Princeton University, 2023
  10. Scalable Array Architecture for In-Memory Computing (WO) — The Trustees of Princeton University, 2021
  11. Configurable In-Memory Computing Engine, Platform, Bit Cells and Layouts — The Trustees of Princeton University, 2019
  12. Two-Dimensional Mesh for Compute-In-Memory Accelerator Architecture — IBM, 2023
  13. Assigning DNN Weights to a 3D Crossbar Array — IBM, 2024
  14. Reusing Weights and Biases in an AI Accelerator for Different Minibatch Sizes — IBM, 2025
  15. Digital-IMC Hybrid System Architecture for Neural Network Acceleration — Mentium Technologies, 2021
  16. Digital-Analog Hybrid System Architecture for Neural Network Acceleration — Mentium Technologies, 2021
  17. Neural Network Operation Method, Device, and Electronic Equipment Based on Neural Network — Samsung Electronics, 2024
  18. Neural Processing Device and Operation Method — Samsung Electronics, 2022
  19. Efficiency and Power Control of Tasks Having Computation Bound and Memory Bound Phases — Apple Inc., 2025
  20. Arithmetic-Intensity Based Load Cloning — Amazon, 2025
  21. Method and System for Compiling AI Models for AI Accelerators with Hierarchical Memory — Hangzhou Micro-Nano Core Electronics, 2025
  22. Memory Access Scheduling for Parallel Computations Using a Processing-In-Memory Architecture — Google LLC, 2025
  23. Neural Network Accelerator with Parameters Resident on the Chip — Google LLC, 2020
  24. Heterogeneous Storage-Compute Fusion System for DNN Inference Acceleration — Zhejiang Laboratory, 2020
  25. Adaptive Utilization-Based In-Memory Computing Accelerator, System, and Method — South University of Science and Technology, 2025
  26. IEEE — Semiconductor and Memory Architecture Research
  27. JEDEC — DRAM and HBM Standards
  28. arXiv — LLM Inference Memory Bottleneck Research

All data and statistics on this page are sourced from the references above and from PatSnap's proprietary innovation intelligence platform. Patent analysis conducted via PatSnap Eureka.
