
NVLink-C2C reduces latency in multi-chip GPU modules

Semiconductor & Hardware

NVIDIA’s NVLink-C2C chip-to-chip interconnect achieves lower latency in multi-chip GPU modules by moving the interconnect from the PCB to the package level — eliminating trace delays, enabling hardware address translation across CPU-GPU boundaries, and delivering 1.56× the bandwidth of PCIe 5.0 (rising to 3× with NVLink 6.0/7.0). This analysis draws on more than 60 patent records to explain exactly how it works.

PatSnap Insights Team · Innovation Intelligence Analysts · 10 minute read
Reviewed by the PatSnap Insights editorial team

Die-to-die proximity: the primary latency lever in NVLink-C2C

NVLink-C2C reduces latency in multi-chip GPU modules primarily by moving the interconnect from the PCB to the package level — eliminating the propagation delays, connector parasitics, and re-serialization overhead that board-level links like PCIe unavoidably introduce. Unlike conventional interconnects that carry signals through PCB traces and connectors, NVLink-C2C operates as a package-level, die-to-die link that tightly couples heterogeneous chiplets — most prominently the Hopper GPU and Grace CPU — within a single multi-chip module (MCM).
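As a rough illustration of why package distance matters, one-way trace delay scales with trace length divided by signal velocity, which in turn depends on the dielectric constant. The sketch below uses hypothetical placeholder lengths and dielectric constants, not measured NVLink-C2C figures:

```python
# Back-of-envelope propagation-delay comparison (all numbers illustrative).
C_MM_PER_PS = 0.29979  # speed of light in vacuum, mm per picosecond

def trace_delay_ps(length_mm: float, dielectric_constant: float) -> float:
    """One-way propagation delay through a trace of the given length."""
    velocity = C_MM_PER_PS / dielectric_constant ** 0.5  # mm/ps
    return length_mm / velocity

# Hypothetical routes: a long board-level path vs a short package-level path.
pcb_delay = trace_delay_ps(length_mm=150.0, dielectric_constant=4.0)
pkg_delay = trace_delay_ps(length_mm=5.0, dielectric_constant=3.0)

print(f"PCB route:     ~{pcb_delay:.0f} ps one-way")
print(f"Package route: ~{pkg_delay:.0f} ps one-way")
```

Even this crude model shows an order-of-magnitude gap in raw flight time before connector parasitics and re-serialization are even counted.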

- 60+ — patent & literature records surveyed
- 400 Gb/s — NVLink 5.0 bandwidth per plane
- 800 Gb/s — NVLink 6.0/7.0 bandwidth per plane
- 256 Gb/s — PCIe 5.0 bandwidth (8 × 32G)

A 2023 patent from Hangzhou Yunhe Zhiwang Technology explicitly identifies the Grace Hopper architecture as comprising four principal components: the Hopper GPU, Grace CPU, NVLink C2C interconnect, and NVSwitch fabric chip. The document contrasts this approach with earlier hard-wired mesh or switch-based GPU interconnects, noting that NVLink C2C represents an evolutionary step in intra-node coupling.

The physical proximity enabled by die-to-die packaging directly suppresses propagation delay. Research from Jilin University (2024) explains that MCM GPU designs connect individual GPU Processing Modules (GPMs) via package-level interconnects, and that compared to multi-GPU systems linked by PCIe, this arrangement delivers “low latency and high bandwidth.” The paper further details how L1 TLB miss requests can traverse GPM-to-GPM package interconnects to reach a remote GPM’s L2 TLB, reducing the address-translation penalty that would otherwise compound memory-access latency.

NVIDIA’s Grace Hopper Superchip comprises four principal components: the Hopper GPU, the Grace CPU, the NVLink C2C interconnect, and the NVSwitch fabric chip. NVLink-C2C serves as the package-level die-to-die link that couples the GPU and CPU within a single multi-chip module.

What is NVLink-C2C?

NVLink-C2C (chip-to-chip) is a package-level variant of NVIDIA’s NVLink protocol. It operates as a die-to-die physical link within a multi-chip module, coupling heterogeneous chiplets at package distances rather than across PCB traces. This eliminates the connector parasitics and re-serialization overhead that limit conventional board-level interconnects such as PCIe.

AMD’s parallel work on chiplet GPU architectures corroborates this design philosophy. A 2022 Advanced Micro Devices patent describes a passive interposer-based crosslink connecting GPU chiplets, enabling cache-coherent communication across chiplets that “appears as a single device” to software. This validates the broader industry trend: die-to-die proximity interconnects operating at package level — of which NVLink-C2C is NVIDIA’s implementation — are the primary mechanism for preserving single-die-equivalent latency semantics in multi-chiplet GPU modules. Standards bodies including IEEE have tracked the emergence of die-to-die interface standards as a direct response to this design pressure.

Figure 1 — NVLink-C2C vs PCIe: interconnect bandwidth comparison across protocol generations
NVLink 5.0 delivers 400 Gb/s per plane versus PCIe 5.0’s 256 Gb/s; NVLink 6.0/7.0 extends this to 800 Gb/s — a 3× bandwidth advantage over PCIe 5.0 that directly reduces queuing latency under sustained AI workloads. Source: Shanghai Shi’ao Communication Equipment patent, 2025.

Bandwidth, protocol, and the PCIe comparison: quantifying the latency gap

NVLink’s design goal is explicitly to provide higher bandwidth and lower latency than traditional PCIe interfaces, and the patent record quantifies this gap with precision. A 2025 patent from Shanghai Shi’ao Communication Equipment places NVLink 5.0 at 2 × 200G = 400 Gb/s per plane, while PCIe 5.0 operates at 8 × 32G = 256 Gb/s — a 1.56× bandwidth advantage that translates directly to lower queuing latency under sustained load. NVLink 6.0 and 7.0 extend this to 4 × 200G = 800 Gb/s per plane.
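The per-plane figures cited from the patent reduce to simple lane-count arithmetic, which a short sketch can reproduce:

```python
# Per-plane bandwidth as quantified in the 2025 Shanghai Shi'ao patent:
# aggregate bandwidth = lane count x per-lane signalling rate.
def plane_bandwidth_gbps(lanes: int, rate_gbps: float) -> float:
    return lanes * rate_gbps

pcie5   = plane_bandwidth_gbps(8, 32)    # PCIe 5.0: 8 x 32G
nvlink5 = plane_bandwidth_gbps(2, 200)   # NVLink 5.0: 2 x 200G
nvlink6 = plane_bandwidth_gbps(4, 200)   # NVLink 6.0/7.0: 4 x 200G

print(f"PCIe 5.0:       {pcie5:.0f} Gb/s")
print(f"NVLink 5.0:     {nvlink5:.0f} Gb/s  ({nvlink5 / pcie5:.2f}x PCIe 5.0)")
print(f"NVLink 6.0/7.0: {nvlink6:.0f} Gb/s  ({nvlink6 / pcie5:.2f}x PCIe 5.0)")
```

The 400/256 ratio yields the 1.56× advantage quoted above, and 800/256 the roughly 3× advantage of NVLink 6.0/7.0.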

“NVLink’s design goal is to provide higher bandwidth and lower latency than traditional PCIe interfaces, significantly improving GPU-to-GPU data exchange speeds — particularly important for high-performance computing, deep learning, and graphics-intensive applications.”

A 2025 patent from Ningchang Information Industry states this positioning directly, framing NVLink’s latency advantage as especially consequential for deep learning and HPC workloads. A complementary Dell Products patent (2025) characterises GPU-to-GPU communication as “non-uniform” — meaning there are order-of-magnitude speed differences between GPU pairs connected directly via NVLink versus those traversing PCIe switches — and recommends mapping communication-intensive tasks to NVLink-connected GPU pairs to minimise latency penalties.
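The Dell placement idea can be sketched as a greedy heuristic: rank task pairs by traffic and hand the heaviest pairs the directly NVLink-linked GPU pairs first. The 4-GPU topology, task names, and traffic volumes below are hypothetical illustrations, not details from the patent:

```python
# Sketch of a Dell-style placement heuristic (hypothetical topology and costs).
from itertools import combinations

# Hypothetical 4-GPU node: only these pairs share a direct NVLink.
nvlink = {(0, 1), (2, 3)}

def is_nvlink(a: int, b: int) -> bool:
    return (min(a, b), max(a, b)) in nvlink

# Hypothetical traffic volumes (GB) between task pairs.
traffic = {("attn", "mlp"): 40.0, ("embed", "head"): 2.0}

# NVLink-connected GPU pairs sort to the front (False < True).
free_pairs = sorted(combinations(range(4), 2), key=lambda p: not is_nvlink(*p))

# Assign the heaviest-traffic task pairs to the fastest GPU pairs first.
placements = {}
for tasks, _volume in sorted(traffic.items(), key=lambda kv: -kv[1]):
    placements[tasks] = free_pairs.pop(0)

print(placements)
```

Here the communication-heavy ("attn", "mlp") pair lands on the NVLink-connected GPUs (0, 1), while lighter traffic can tolerate switched paths.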

NVLink 5.0 delivers 400 Gb/s per plane (2 × 200G) versus PCIe 5.0’s 256 Gb/s (8 × 32G), a 1.56× bandwidth advantage. NVLink 6.0 and 7.0 further extend per-plane bandwidth to 800 Gb/s, representing a 3× advantage over PCIe 5.0. Higher bandwidth directly reduces queuing latency under sustained GPU workloads.

Intel’s 2025 GPU-to-GPU hardware compression patent explicitly enumerates NVLink and NVLink-C2C protocols — including NVLink v5 — among viable GPU-to-GPU communication protocols operating at speeds up to 120 GB/s and beyond. A separate Intel GPU virtualisation patent (2025) similarly lists NVLink-C2C protocols as a supported option for high-speed links operating at 30–120 GB/s. This acknowledgment from a direct competitor underlines NVLink-C2C’s position as an industry reference point for high-bandwidth, low-latency chip interconnect design.

The same Shanghai Shi’ao patent identifies that low-latency PCIe/NVLink switch chips face more severe Moore’s Law scaling challenges than Ethernet switch chips, making capacity-constrained low-latency switching a system-level bottleneck for GPU supernode scaling. NVLink-C2C partially sidesteps this problem by operating below the switch layer entirely, at die-to-die granularity — meaning it avoids the switch fabric’s latency floor altogether for intra-module traffic. Researchers tracking semiconductor scaling at IEEE have noted this switch-scaling constraint as a structural challenge for interconnect roadmaps beyond 2025.


Cache coherence, memory unification, and hardware address translation

Beyond raw signalling bandwidth, NVLink-C2C reduces effective application-level latency through architectural features that eliminate software-mediated data movement. NVIDIA’s 2024 multi-processor–coprocessor interface patent documents NVLink’s support for Address Translation Services (ATS), “allowing the PPU to directly access the CPU’s page table and providing full access to CPU memory from the PPU.” This hardware-accelerated virtual address translation across the CPU-GPU boundary is architecturally enabled by NVLink-C2C’s coherent, low-latency physical link.

Key finding: ATS eliminates software-layer copy latency

NVLink’s Address Translation Services (ATS) support — documented in NVIDIA’s 2024 patent — allows the GPU to directly access the CPU’s page table without pinning memory regions or marshalling data through software copy engines. Both operations add multi-microsecond latencies in PCIe-based configurations; NVLink-C2C’s coherent link removes this overhead entirely.

NVIDIA’s 2021 fabric-attached memory patent establishes the broader NVLink philosophy: the NVLink high-speed data link interconnect “allows a GPU to access another GPU’s local memory almost as if it were its own,” enabling developers to pool the memory resources of multiple GPUs. The document contrasts NVLink favourably against PCIe, noting that while NVLink is slower than on-chip memory bandwidth, it is “much faster than PCIe or other such data links typically used to provide access to main system memory.” When NVLink-C2C operates at die-to-die package distances rather than across board-level cables, this gap narrows further toward on-chip latency characteristics.

NVLink supports Address Translation Services (ATS), allowing a GPU to directly access the CPU’s page table and access CPU memory without software-mediated copy operations. This capability is enabled by NVLink-C2C’s coherent, low-latency physical link and eliminates multi-microsecond software-layer copy latencies present in PCIe-based multi-GPU configurations.

Jilin University’s 2024 MCM GPU TLB optimisation patent adds a microarchitectural dimension specific to multi-chip modules: the ability to forward L1 TLB misses across GPM-to-GPM package interconnects to a remote GPM’s L2 TLB effectively expands the TLB’s reach without requiring software intervention. This feature is viable only because the package-level interconnect operates at latencies low enough to make remote TLB lookups feasible within the GPU’s memory pipeline timing budget — a constraint that board-level PCIe links cannot satisfy. According to WIPO filing data, TLB optimisation for heterogeneous multi-chip compute nodes is among the fastest-growing sub-categories in GPU architecture patent filings.
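A toy expected-penalty model shows why remote-L2 forwarding only pays off when the interconnect hop is cheap relative to a page-table walk. All cycle counts and hit rates below are hypothetical illustrations, not figures from the Jilin University patent:

```python
# Illustrative model of cross-GPM TLB forwarding (all numbers hypothetical):
# forwarding an L1 TLB miss to a remote GPM's L2 TLB pays a package-
# interconnect hop but can avoid a far costlier page-table walk.
L2_LOCAL = 20    # local L2 TLB lookup, cycles
PKG_HOP = 30     # GPM-to-GPM package-interconnect round trip, cycles
PAGE_WALK = 500  # full page-table walk on a complete miss, cycles

def miss_penalty(local_l2_hit: float, remote_l2_hit: float) -> float:
    """Expected penalty of an L1 TLB miss under the given hit rates."""
    p_local = local_l2_hit
    p_remote = (1 - p_local) * remote_l2_hit
    p_walk = 1 - p_local - p_remote
    return (p_local * L2_LOCAL
            + p_remote * (L2_LOCAL + PKG_HOP)
            + p_walk * (L2_LOCAL + PKG_HOP + PAGE_WALK))

without_forwarding = miss_penalty(local_l2_hit=0.6, remote_l2_hit=0.0)
with_forwarding = miss_penalty(local_l2_hit=0.6, remote_l2_hit=0.5)
print(f"{without_forwarding:.0f} vs {with_forwarding:.0f} cycles expected")
```

With a board-level hop costing hundreds of cycles instead of tens, the remote lookup would lose its advantage, which is the timing-budget constraint the paragraph above describes.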

Figure 2 — NVLink-C2C interconnect hierarchy: latency tiers from die-to-die to inter-node
The Grace Hopper interconnect hierarchy maps NVLink-C2C (intra-module, lowest latency) → NVSwitch (intra-node) → external network fabric (inter-node), with each tier adding incremental latency. Source: Hangzhou Yunhe Zhiwang Technology patent, 2023.

NVLink-C2C in the broader GPU interconnect hierarchy

NVLink-C2C does not operate in isolation — it functions as the lowest-latency intra-module tier within a hierarchical interconnect architecture. At the cluster level, NVSwitch fabric chips aggregate GPU-to-GPU bandwidth across modules, while NVLink-C2C handles the tightest-coupling tier: the Grace-Hopper die pair. A 2023 patent from Hangzhou Yunhe Zhiwang Technology explicitly separates the DPU/NIC (handling external network traffic) from the NVLink C2C intra-module link and the NVSwitch inter-module fabric, confirming the three-tier model.

NVIDIA’s 2021 fabric-attached memory patent establishes NVSwitch as the scale-out complement to NVLink, enabling GPU-to-GPU peer communication “as fast, highly scalable multi-processor interconnects” to avoid bandwidth bottlenecks. In this hierarchy, NVLink-C2C sits below NVSwitch — handling the lowest-latency die-to-die transfers — while NVSwitch handles intra-node multi-GPU traffic, and Ethernet or InfiniBand handles inter-node communication. Each tier adds latency; NVLink-C2C’s role is to keep the innermost tier as close to on-chip speed as physically possible.

NVIDIA’s 2025 device link management patent describes neural-network-driven link power and frequency management for GPU communication links, referencing NVSwitch-class switches as components whose overall operating power can be reduced dynamically. The system benefits most when more active GPUs are “dynamically turbo-boosted for improved overall performance” — a capability that depends on the underlying NVLink-C2C fabric presenting predictable, sub-microsecond communication delays that allow the power management system to make rapid adjustments without disturbing workload timing.

The Grace Hopper Superchip’s interconnect architecture operates in three tiers: NVLink-C2C handles intra-module die-to-die transfers at the lowest latency; NVSwitch manages intra-node multi-GPU traffic; and Ethernet or InfiniBand handles inter-node communication. Each tier adds incremental latency above the NVLink-C2C baseline.
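The tier selection can be sketched as a lookup keyed on module and node locality; the microsecond values below are hypothetical placeholders chosen only to preserve the ordering of the tiers, not measured NVLink or NVSwitch latencies:

```python
# Illustrative three-tier latency model (placeholder values, ordering only).
TIER_LATENCY_US = {
    "nvlink_c2c": 0.1,  # intra-module, die-to-die
    "nvswitch": 1.0,    # intra-node, through the switch fabric
    "network": 10.0,    # inter-node, Ethernet / InfiniBand
}

def transfer_latency_us(src_module: int, dst_module: int,
                        src_node: int, dst_node: int) -> float:
    """Pick the cheapest tier that reaches the destination GPU."""
    if src_node != dst_node:
        return TIER_LATENCY_US["network"]
    if src_module != dst_module:
        return TIER_LATENCY_US["nvswitch"]
    return TIER_LATENCY_US["nvlink_c2c"]

print(transfer_latency_us(0, 0, 0, 0))  # same module -> C2C tier
print(transfer_latency_us(0, 1, 0, 0))  # same node, different module
print(transfer_latency_us(0, 1, 0, 1))  # different node
```

Schedulers that co-locate chatty kernels on the same module effectively keep traffic in the cheapest branch of this lookup.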


Competitive landscape: who else is solving the multi-chip interconnect latency problem?

The patent corpus surveyed — spanning more than 60 records — reveals that NVIDIA is not alone in pursuing die-to-die proximity as the solution to multi-chip GPU latency. The competitive landscape is broad, technically deep, and accelerating, with NVIDIA, AMD, Intel, IBM, and a growing cluster of Chinese research institutions each contributing distinct approaches.

NVIDIA

NVIDIA is the most heavily represented assignee across the data set, with patents spanning NVLink fabric architecture, fabric-attached memory, multi-format GPU docking boards, device link management, and multi-processor–coprocessor interfaces. Three patents collectively define NVLink-C2C’s latency and power-management capabilities: the 2024 multi-processor–coprocessor interface patent (documenting ATS support), the 2021 fabric-attached memory patent (establishing the memory-pooling model), and the 2025 device link management patent (describing neural-network-driven frequency management). The PatSnap R&D intelligence platform tracks NVIDIA’s full NVLink filing history across all jurisdictions.

AMD

Advanced Micro Devices is advancing chiplet-based GPU architectures with passive crosslink interposers, documented in patents from 2022 and 2024. AMD’s GPU chiplet patents describe passive interposer-based crosslinks that enable cache-coherent communication across chiplets that “appears as a single device” to software — independently converging on the same physical proximity principle as NVLink-C2C. AMD also holds a 2022 patent on proactive management of inter-GPU network links, addressing per-layer neural network clock and link-width management to minimise communication overhead. This validates that the latency benefits of die-to-die proximity are architectural rather than purely proprietary to NVIDIA’s protocol.

Intel

Intel appears as the second most prominent assignee, with patents covering GPU-to-GPU lossless and lossy hardware compression on network links, and configurable bandwidth throttling for GPU virtualisation. Intel’s 2025 hardware compression patent explicitly enumerates NVLink-C2C protocols — including NVLink v5 — as viable GPU-to-GPU communication protocols, demonstrating competitive awareness and positioning compression as an orthogonal latency-mitigation technique for GPU links operating at up to 120 GB/s and beyond.

IBM and Chinese research institutions

IBM contributes reconfigurable CPU/GPU interconnect topology patents focused on thermal and bandwidth management (2020). Chinese research institutions and enterprises — including Jilin University, Shandong Inspur, Zhejiang University, Baidu, and others — represent a fast-growing innovation cluster focused on MCM GPU TLB optimisation, topology-switching, cluster routing, and optical-electrical hybrid GPU interconnects. According to WIPO data, Chinese filers have increased their share of GPU interconnect patent applications materially since 2022, indicating that competitive pressure on NVLink-C2C’s latency advantages is intensifying across the global R&D landscape. The PatSnap patent analytics suite provides jurisdiction-level filing trend data for this technology area.

Figure 3 — Key assignees in the GPU multi-chip interconnect patent landscape (relative filing depth)
NVIDIA is the most heavily represented assignee in the GPU multi-chip interconnect patent corpus (60+ records), followed by Intel and AMD. Chinese research institutions represent a fast-growing cluster whose filings are accelerating. Source: PatSnap patent analysis, 2025.

References

  1. A multi-purpose network interface expansion chip and electronic device — Hangzhou Yunhe Zhiwang Technology, 2023
  2. Method for improving MCM GPU address-translation efficiency through TLB optimisation — Jilin University, 2024
  3. GPU chiplets with high-bandwidth crosslinks — Advanced Micro Devices, 2022
  4. GPU chiplets with high-bandwidth crosslinks — Advanced Micro Devices, 2024
  5. High-speed optical-electrical switching network for GPU interconnect and implementation method — Shanghai Shi’ao Communication Equipment, 2025
  6. Routing control method, device, and system for multi-machine GPU communication in virtual environments — Ningchang Information Industry (Beijing), 2025
  7. Lossless and lossy automatic hardware compression in GPU-to-GPU network links — Intel Corporation, 2025
  8. Configurable fabric bandwidth throttling for GPU virtualisation workloads — Intel Corporation, 2025
  9. Robust and efficient multi-processor–coprocessor interface — NVIDIA Corporation, 2024
  10. Techniques for efficient fabric-attached memory — NVIDIA Corporation, 2021
  11. Device link management — NVIDIA Corporation, 2025
  12. Proactive management of inter-GPU network links — Advanced Micro Devices, 2022
  13. Method, device, and product for GPU clusters — Dell Products, 2025
  14. Reconfigurable CPU/GPU interconnect to mitigate power/thermal throttling — IBM, 2020
  15. Reconfigurable network infrastructure — IBM, 2020
  16. WIPO — World Intellectual Property Organization: Patent filing trend data
  17. IEEE — Institute of Electrical and Electronics Engineers: Die-to-die interface standards and semiconductor scaling
  18. PatSnap R&D Intelligence Platform — NVLink filing history and patent analytics

All data and statistics in this article are sourced from the references above and from PatSnap's proprietary innovation intelligence platform.
