
NVLink-C2C reduces latency in multi-chip GPU modules

Semiconductor & Hardware

NVIDIA’s NVLink-C2C chip-to-chip interconnect achieves lower latency in multi-chip GPU modules by moving the interconnect from the PCB to the package level — eliminating trace delays, enabling hardware address translation across CPU-GPU boundaries, and delivering 1.56× the bandwidth of PCIe 5.0 (rising to 3× with NVLink 6.0/7.0). This analysis draws on more than 60 patent records to explain exactly how it works.

PatSnap Insights Team · Innovation Intelligence Analysts · 10 minute read
Reviewed by the PatSnap Insights editorial team

Die-to-die proximity: the primary latency lever in NVLink-C2C

NVLink-C2C reduces latency in multi-chip GPU modules primarily by moving the interconnect from the PCB to the package level — eliminating the propagation delays, connector parasitics, and re-serialization overhead that board-level links like PCIe unavoidably introduce. Unlike conventional interconnects that carry signals through PCB traces and connectors, NVLink-C2C operates as a package-level, die-to-die link that tightly couples heterogeneous chiplets — most prominently the Hopper GPU and Grace CPU — within a single multi-chip module (MCM).
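As a rough illustration of why package distance matters, one-way trace delay scales with trace length divided by signal velocity, which in turn depends on the dielectric constant. The sketch below uses hypothetical placeholder lengths and dielectric constants, not measured NVLink-C2C figures:

```python
# Back-of-envelope propagation-delay comparison (all numbers illustrative).
C_MM_PER_PS = 0.29979  # speed of light in vacuum, mm per picosecond

def trace_delay_ps(length_mm: float, dielectric_constant: float) -> float:
    """One-way propagation delay through a trace of the given length."""
    velocity = C_MM_PER_PS / dielectric_constant ** 0.5  # mm/ps
    return length_mm / velocity

# Hypothetical routes: a long board-level path vs a short package-level path.
pcb_delay = trace_delay_ps(length_mm=150.0, dielectric_constant=4.0)
pkg_delay = trace_delay_ps(length_mm=5.0, dielectric_constant=3.0)

print(f"PCB route:     ~{pcb_delay:.0f} ps one-way")
print(f"Package route: ~{pkg_delay:.0f} ps one-way")
```

Even this crude model shows an order-of-magnitude gap in raw flight time before connector parasitics and re-serialization are even counted.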

- 60+ — patent & literature records surveyed
- 400 Gb/s — NVLink 5.0 bandwidth per plane
- 800 Gb/s — NVLink 6.0/7.0 bandwidth per plane
- 256 Gb/s — PCIe 5.0 bandwidth (8 × 32G)

A 2023 patent from Hangzhou Yunhe Zhiwang Technology explicitly identifies the Grace Hopper architecture as comprising four principal components: the Hopper GPU, Grace CPU, NVLink C2C interconnect, and NVSwitch fabric chip. The document contrasts this approach with earlier hard-wired mesh or switch-based GPU interconnects, noting that NVLink C2C represents an evolutionary step in intra-node coupling.

The physical proximity enabled by die-to-die packaging directly suppresses propagation delay. Research from Jilin University (2024) explains that MCM GPU designs connect individual GPU Processing Modules (GPMs) via package-level interconnects, and that compared to multi-GPU systems linked by PCIe, this arrangement delivers “low latency and high bandwidth.” The paper further details how L1 TLB miss requests can traverse GPM-to-GPM package interconnects to reach a remote GPM’s L2 TLB, reducing the address-translation penalty that would otherwise compound memory-access latency.

NVIDIA’s Grace Hopper Superchip comprises four principal components: the Hopper GPU, the Grace CPU, the NVLink C2C interconnect, and the NVSwitch fabric chip. NVLink-C2C serves as the package-level die-to-die link that couples the GPU and CPU within a single multi-chip module.

What is NVLink-C2C?

NVLink-C2C (chip-to-chip) is a package-level variant of NVIDIA’s NVLink protocol. It operates as a die-to-die physical link within a multi-chip module, coupling heterogeneous chiplets at package distances rather than across PCB traces. This eliminates the connector parasitics and re-serialization overhead that limit conventional board-level interconnects such as PCIe.

AMD’s parallel work on chiplet GPU architectures corroborates this design philosophy. A 2022 Advanced Micro Devices patent describes a passive interposer-based crosslink connecting GPU chiplets, enabling cache-coherent communication across chiplets that “appears as a single device” to software. This validates the broader industry trend: die-to-die proximity interconnects operating at package level — of which NVLink-C2C is NVIDIA’s implementation — are the primary mechanism for preserving single-die-equivalent latency semantics in multi-chiplet GPU modules. Standards bodies including IEEE have tracked the emergence of die-to-die interface standards as a direct response to this design pressure.

Figure 1 — NVLink-C2C vs PCIe: interconnect bandwidth comparison across protocol generations
NVLink 5.0 delivers 400 Gb/s per plane versus PCIe 5.0’s 256 Gb/s; NVLink 6.0/7.0 extends this to 800 Gb/s — a 3× bandwidth advantage over PCIe 5.0 that directly reduces queuing latency under sustained AI workloads. Source: Shanghai Shi’ao Communication Equipment patent, 2025.

Bandwidth, protocol, and the PCIe comparison: quantifying the latency gap

NVLink’s design goal is explicitly to provide higher bandwidth and lower latency than traditional PCIe interfaces, and the patent record quantifies this gap with precision. A 2025 patent from Shanghai Shi’ao Communication Equipment places NVLink 5.0 at 2 × 200G = 400 Gb/s per plane, while PCIe 5.0 operates at 8 × 32G = 256 Gb/s — a 1.56× bandwidth advantage that translates directly to lower queuing latency under sustained load. NVLink 6.0 and 7.0 extend this to 4 × 200G = 800 Gb/s per plane.
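The per-plane figures cited from the patent reduce to simple lane-count arithmetic, which a short sketch can reproduce:

```python
# Per-plane bandwidth as quantified in the 2025 Shanghai Shi'ao patent:
# aggregate bandwidth = lane count x per-lane signalling rate.
def plane_bandwidth_gbps(lanes: int, rate_gbps: float) -> float:
    return lanes * rate_gbps

pcie5   = plane_bandwidth_gbps(8, 32)    # PCIe 5.0: 8 x 32G
nvlink5 = plane_bandwidth_gbps(2, 200)   # NVLink 5.0: 2 x 200G
nvlink6 = plane_bandwidth_gbps(4, 200)   # NVLink 6.0/7.0: 4 x 200G

print(f"PCIe 5.0:       {pcie5:.0f} Gb/s")
print(f"NVLink 5.0:     {nvlink5:.0f} Gb/s  ({nvlink5 / pcie5:.2f}x PCIe 5.0)")
print(f"NVLink 6.0/7.0: {nvlink6:.0f} Gb/s  ({nvlink6 / pcie5:.2f}x PCIe 5.0)")
```

The 400/256 ratio yields the 1.56× advantage quoted above, and 800/256 the roughly 3× advantage of NVLink 6.0/7.0.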

“NVLink’s design goal is to provide higher bandwidth and lower latency than traditional PCIe interfaces, significantly improving GPU-to-GPU data exchange speeds — particularly important for high-performance computing, deep learning, and graphics-intensive applications.”

A 2025 patent from Ningchang Information Industry states this positioning directly, framing NVLink’s latency advantage as especially consequential for deep learning and HPC workloads. A complementary Dell Products patent (2025) characterises GPU-to-GPU communication as “non-uniform” — meaning there are order-of-magnitude speed differences between GPU pairs connected directly via NVLink versus those traversing PCIe switches — and recommends mapping communication-intensive tasks to NVLink-connected GPU pairs to minimise latency penalties.
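The Dell placement idea can be sketched as a greedy heuristic: rank task pairs by traffic and hand the heaviest pairs the directly NVLink-linked GPU pairs first. The 4-GPU topology, task names, and traffic volumes below are hypothetical illustrations, not details from the patent:

```python
# Sketch of a Dell-style placement heuristic (hypothetical topology and costs).
from itertools import combinations

# Hypothetical 4-GPU node: only these pairs share a direct NVLink.
nvlink = {(0, 1), (2, 3)}

def is_nvlink(a: int, b: int) -> bool:
    return (min(a, b), max(a, b)) in nvlink

# Hypothetical traffic volumes (GB) between task pairs.
traffic = {("attn", "mlp"): 40.0, ("embed", "head"): 2.0}

# NVLink-connected GPU pairs sort to the front (False < True).
free_pairs = sorted(combinations(range(4), 2), key=lambda p: not is_nvlink(*p))

# Assign the heaviest-traffic task pairs to the fastest GPU pairs first.
placements = {}
for tasks, _volume in sorted(traffic.items(), key=lambda kv: -kv[1]):
    placements[tasks] = free_pairs.pop(0)

print(placements)
```

Here the communication-heavy ("attn", "mlp") pair lands on the NVLink-connected GPUs (0, 1), while lighter traffic can tolerate switched paths.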

NVLink 5.0 delivers 400 Gb/s per plane (2 × 200G) versus PCIe 5.0’s 256 Gb/s (8 × 32G), a 1.56× bandwidth advantage. NVLink 6.0 and 7.0 further extend per-plane bandwidth to 800 Gb/s, representing a 3× advantage over PCIe 5.0. Higher bandwidth directly reduces queuing latency under sustained GPU workloads.

Intel’s 2025 GPU-to-GPU hardware compression patent explicitly enumerates NVLink and NVLink-C2C protocols — including NVLink v5 — among viable GPU-to-GPU communication protocols operating at speeds up to 120 GB/s and beyond. A separate Intel GPU virtualisation patent (2025) similarly lists NVLink-C2C protocols as a supported option for high-speed links operating at 30–120 GB/s. This acknowledgment from a direct competitor underlines NVLink-C2C’s position as an industry reference point for high-bandwidth, low-latency chip interconnect design.

The same Shanghai Shi’ao patent identifies that low-latency PCIe/NVLink switch chips face more severe Moore’s Law scaling challenges than Ethernet switch chips, making capacity-constrained low-latency switching a system-level bottleneck for GPU supernode scaling. NVLink-C2C partially sidesteps this problem by operating below the switch layer entirely, at die-to-die granularity — meaning it avoids the switch fabric’s latency floor altogether for intra-module traffic. Researchers tracking semiconductor scaling at IEEE have noted this switch-scaling constraint as a structural challenge for interconnect roadmaps beyond 2025.


Cache coherence, memory unification, and hardware address translation

Beyond raw signalling bandwidth, NVLink-C2C reduces effective application-level latency through architectural features that eliminate software-mediated data movement. NVIDIA’s 2024 multi-processor–coprocessor interface patent documents NVLink’s support for Address Translation Services (ATS), “allowing the PPU to directly access the CPU’s page table and providing full access to CPU memory from the PPU.” This hardware-accelerated virtual address translation across the CPU-GPU boundary is architecturally enabled by NVLink-C2C’s coherent, low-latency physical link.

Key finding: ATS eliminates software-layer copy latency

NVLink’s Address Translation Services (ATS) support — documented in NVIDIA’s 2024 patent — allows the GPU to directly access the CPU’s page table without pinning memory regions or marshalling data through software copy engines. Both operations add multi-microsecond latencies in PCIe-based configurations; NVLink-C2C’s coherent link removes this overhead entirely.

NVIDIA’s 2021 fabric-attached memory patent establishes the broader NVLink philosophy: the NVLink high-speed data link interconnect “allows a GPU to access another GPU’s local memory almost as if it were its own,” enabling developers to pool the memory resources of multiple GPUs. The document contrasts NVLink favourably against PCIe, noting that while NVLink is slower than on-chip memory bandwidth, it is “much faster than PCIe or other such data links typically used to provide access to main system memory.” When NVLink-C2C operates at die-to-die package distances rather than across board-level cables, this gap narrows further toward on-chip latency characteristics.

NVLink supports Address Translation Services (ATS), allowing a GPU to directly access the CPU’s page table and access CPU memory without software-mediated copy operations. This capability is enabled by NVLink-C2C’s coherent, low-latency physical link and eliminates multi-microsecond software-layer copy latencies present in PCIe-based multi-GPU configurations.

Jilin University’s 2024 MCM GPU TLB optimisation patent adds a microarchitectural dimension specific to multi-chip modules: the ability to forward L1 TLB misses across GPM-to-GPM package interconnects to a remote GPM’s L2 TLB effectively expands the TLB’s reach without requiring software intervention. This feature is viable only because the package-level interconnect operates at latencies low enough to make remote TLB lookups feasible within the GPU’s memory pipeline timing budget — a constraint that board-level PCIe links cannot satisfy. According to WIPO filing data, TLB optimisation for heterogeneous multi-chip compute nodes is among the fastest-growing sub-categories in GPU architecture patent filings.
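A toy expected-penalty model shows why remote-L2 forwarding only pays off when the interconnect hop is cheap relative to a page-table walk. All cycle counts and hit rates below are hypothetical illustrations, not figures from the Jilin University patent:

```python
# Illustrative model of cross-GPM TLB forwarding (all numbers hypothetical):
# forwarding an L1 TLB miss to a remote GPM's L2 TLB pays a package-
# interconnect hop but can avoid a far costlier page-table walk.
L2_LOCAL = 20    # local L2 TLB lookup, cycles
PKG_HOP = 30     # GPM-to-GPM package-interconnect round trip, cycles
PAGE_WALK = 500  # full page-table walk on a complete miss, cycles

def miss_penalty(local_l2_hit: float, remote_l2_hit: float) -> float:
    """Expected penalty of an L1 TLB miss under the given hit rates."""
    p_local = local_l2_hit
    p_remote = (1 - p_local) * remote_l2_hit
    p_walk = 1 - p_local - p_remote
    return (p_local * L2_LOCAL
            + p_remote * (L2_LOCAL + PKG_HOP)
            + p_walk * (L2_LOCAL + PKG_HOP + PAGE_WALK))

without_forwarding = miss_penalty(local_l2_hit=0.6, remote_l2_hit=0.0)
with_forwarding = miss_penalty(local_l2_hit=0.6, remote_l2_hit=0.5)
print(f"{without_forwarding:.0f} vs {with_forwarding:.0f} cycles expected")
```

With a board-level hop costing hundreds of cycles instead of tens, the remote lookup would lose its advantage, which is the timing-budget constraint the paragraph above describes.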

Figure 2 — NVLink-C2C interconnect hierarchy: latency tiers from die-to-die to inter-node
The Grace Hopper interconnect hierarchy maps NVLink-C2C (intra-module, lowest latency) → NVSwitch (intra-node) → external network fabric (inter-node), with each tier adding incremental latency. Source: Hangzhou Yunhe Zhiwang Technology patent, 2023.

NVLink-C2C in the broader GPU interconnect hierarchy

NVLink-C2C does not operate in isolation — it functions as the lowest-latency intra-module tier within a hierarchical interconnect architecture. At the cluster level, NVSwitch fabric chips aggregate GPU-to-GPU bandwidth across modules, while NVLink-C2C handles the tightest-coupling tier: the Grace-Hopper die pair. A 2023 patent from Hangzhou Yunhe Zhiwang Technology explicitly separates the DPU/NIC (handling external network traffic) from the NVLink C2C intra-module link and the NVSwitch inter-module fabric, confirming the three-tier model.

NVIDIA’s 2021 fabric-attached memory patent establishes NVSwitch as the scale-out complement to NVLink, enabling GPU-to-GPU peer communication “as fast, highly scalable multi-processor interconnects” to avoid bandwidth bottlenecks. In this hierarchy, NVLink-C2C sits below NVSwitch — handling the lowest-latency die-to-die transfers — while NVSwitch handles intra-node multi-GPU traffic, and Ethernet or InfiniBand handles inter-node communication. Each tier adds latency; NVLink-C2C’s role is to keep the innermost tier as close to on-chip speed as physically possible.

NVIDIA’s 2025 device link management patent describes neural-network-driven link power and frequency management for GPU communication links, referencing NVSwitch-class switches as components whose overall operating power can be reduced dynamically. The system benefits most when more active GPUs are “dynamically turbo-boosted for improved overall performance” — a capability that depends on the underlying NVLink-C2C fabric presenting predictable, sub-microsecond communication delays that allow the power management system to make rapid adjustments without disturbing workload timing.

The Grace Hopper Superchip’s interconnect architecture operates in three tiers: NVLink-C2C handles intra-module die-to-die transfers at the lowest latency; NVSwitch manages intra-node multi-GPU traffic; and Ethernet or InfiniBand handles inter-node communication. Each tier adds incremental latency above the NVLink-C2C baseline.
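The tier selection can be sketched as a lookup keyed on module and node locality; the microsecond values below are hypothetical placeholders chosen only to preserve the ordering of the tiers, not measured NVLink or NVSwitch latencies:

```python
# Illustrative three-tier latency model (placeholder values, ordering only).
TIER_LATENCY_US = {
    "nvlink_c2c": 0.1,  # intra-module, die-to-die
    "nvswitch": 1.0,    # intra-node, through the switch fabric
    "network": 10.0,    # inter-node, Ethernet / InfiniBand
}

def transfer_latency_us(src_module: int, dst_module: int,
                        src_node: int, dst_node: int) -> float:
    """Pick the cheapest tier that reaches the destination GPU."""
    if src_node != dst_node:
        return TIER_LATENCY_US["network"]
    if src_module != dst_module:
        return TIER_LATENCY_US["nvswitch"]
    return TIER_LATENCY_US["nvlink_c2c"]

print(transfer_latency_us(0, 0, 0, 0))  # same module -> C2C tier
print(transfer_latency_us(0, 1, 0, 0))  # same node, different module
print(transfer_latency_us(0, 1, 0, 1))  # different node
```

Schedulers that co-locate chatty kernels on the same module effectively keep traffic in the cheapest branch of this lookup.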


Competitive landscape: who else is solving the multi-chip interconnect latency problem?

The patent corpus surveyed — spanning more than 60 records — reveals that NVIDIA is not alone in pursuing die-to-die proximity as the solution to multi-chip GPU latency. The competitive landscape is broad, technically deep, and accelerating, with NVIDIA, AMD, Intel, IBM, and a growing cluster of Chinese research institutions each contributing distinct approaches.

NVIDIA

NVIDIA is the most heavily represented assignee across the data set, with patents spanning NVLink fabric architecture, fabric-attached memory, multi-format GPU docking boards, device link management, and multi-processor–coprocessor interfaces. Three patents collectively define NVLink-C2C’s latency and power-management capabilities: the 2024 multi-processor–coprocessor interface patent (documenting ATS support), the 2021 fabric-attached memory patent (establishing the memory-pooling model), and the 2025 device link management patent (describing neural-network-driven frequency management). The PatSnap R&D intelligence platform tracks NVIDIA’s full NVLink filing history across all jurisdictions.

AMD

Advanced Micro Devices is advancing chiplet-based GPU architectures with passive crosslink interposers, documented in patents from 2022 and 2024. AMD’s GPU chiplet patents describe passive interposer-based crosslinks that enable cache-coherent communication across chiplets that “appears as a single device” to software — independently converging on the same physical proximity principle as NVLink-C2C. AMD also holds a 2022 patent on proactive management of inter-GPU network links, addressing per-layer neural network clock and link-width management to minimise communication overhead. This validates that the latency benefits of die-to-die proximity are architectural rather than purely proprietary to NVIDIA’s protocol.

Intel

Intel appears as the second most prominent assignee, with patents covering GPU-to-GPU lossless and lossy hardware compression on network links, and configurable bandwidth throttling for GPU virtualisation. Intel’s 2025 hardware compression patent explicitly enumerates NVLink-C2C protocols — including NVLink v5 — as viable GPU-to-GPU communication protocols, demonstrating competitive awareness and positioning compression as an orthogonal latency-mitigation technique for GPU links operating at up to 120 GB/s and beyond.

IBM and Chinese research institutions

IBM contributes reconfigurable CPU/GPU interconnect topology patents focused on thermal and bandwidth management (2020). Chinese research institutions and enterprises — including Jilin University, Shandong Inspur, Zhejiang University, Baidu, and others — represent a fast-growing innovation cluster focused on MCM GPU TLB optimisation, topology-switching, cluster routing, and optical-electrical hybrid GPU interconnects. According to WIPO data, Chinese filers have increased their share of GPU interconnect patent applications materially since 2022, indicating that competitive pressure on NVLink-C2C’s latency advantages is intensifying across the global R&D landscape. The PatSnap patent analytics suite provides jurisdiction-level filing trend data for this technology area.

Figure 3 — Key assignees in the GPU multi-chip interconnect patent landscape (relative filing depth)
NVIDIA is the most heavily represented assignee in the GPU multi-chip interconnect patent corpus (60+ records), followed by Intel and AMD. Chinese research institutions represent a fast-growing cluster whose filings are accelerating. Source: PatSnap patent analysis, 2025.

References

  1. A multi-purpose network interface expansion chip and electronic device — Hangzhou Yunhe Zhiwang Technology, 2023
  2. Method for improving MCM GPU address-translation efficiency through TLB optimisation — Jilin University, 2024
  3. GPU chiplets with high-bandwidth crosslinks — Advanced Micro Devices, 2022
  4. GPU chiplets with high-bandwidth crosslinks — Advanced Micro Devices, 2024
  5. High-speed optical-electrical switching network for GPU interconnect and implementation method — Shanghai Shi’ao Communication Equipment, 2025
  6. Routing control method, device, and system for multi-machine GPU communication in virtual environments — Ningchang Information Industry (Beijing), 2025
  7. Lossless and lossy automatic hardware compression in GPU-to-GPU network links — Intel Corporation, 2025
  8. Configurable fabric bandwidth throttling for GPU virtualisation workloads — Intel Corporation, 2025
  9. Robust and efficient multi-processor–coprocessor interface — NVIDIA Corporation, 2024
  10. Techniques for efficient fabric-attached memory — NVIDIA Corporation, 2021
  11. Device link management — NVIDIA Corporation, 2025
  12. Proactive management of inter-GPU network links — Advanced Micro Devices, 2022
  13. Method, device, and product for GPU clusters — Dell Products, 2025
  14. Reconfigurable CPU/GPU interconnect to mitigate power/thermal throttling — IBM, 2020
  15. Reconfigurable network infrastructure — IBM, 2020
  16. WIPO — World Intellectual Property Organization: Patent filing trend data
  17. IEEE — Institute of Electrical and Electronics Engineers: Die-to-die interface standards and semiconductor scaling
  18. PatSnap R&D Intelligence Platform — NVLink filing history and patent analytics

All data and statistics in this article are sourced from the references above and from PatSnap's proprietary innovation intelligence platform.
