What Are NVIDIA’s “Vera Rubin” Chips? (And Why It’s More Than Just a GPU)
If you’ve seen “Vera Rubin” mentioned in NVIDIA keynotes and roadmaps, it can sound like a single chip name.
It’s not.
“Vera Rubin” is NVIDIA’s next-generation data-center AI platform: a coordinated stack that combines a new CPU (“Vera”), a new GPU architecture (“Rubin”), and the networking + rack-scale system design that ties hundreds of accelerators into one “AI factory.”
Below is the practical breakdown: what it is, what’s inside, when it arrives, and why NVIDIA is pushing platform-level chips rather than “just faster GPUs.”
1) The naming: “Vera” vs “Rubin” vs “Vera Rubin”
NVIDIA uses the two related names on purpose:
Vera = the CPU (Arm-based, custom design), successor to the Grace CPU line in this platform context.
Rubin = the GPU architecture/platform generation that follows Blackwell for data-center AI.
Vera Rubin = the combined CPU+GPU system (plus interconnect and rack design) that NVIDIA sells as an integrated supercomputing building block (examples: NVL72 / NVL144 depending on generation/branding).
So when people say “Vera Rubin chips,” they usually mean the Rubin-generation AI systems (often referencing both the Vera CPU and Rubin GPU together).
2) What NVIDIA is actually shipping: a rack-scale AI “computer,” not a card
The most important shift is this: NVIDIA is productizing the rack.
Instead of thinking “I buy GPUs and build a cluster,” the Rubin era is designed around pre-defined, validated rack-scale systems (NVIDIA markets each rack as an “AI supercomputer”). One flagship example is:
NVIDIA Vera Rubin NVL72: a unified system that combines 72 Rubin GPUs + 36 Vera CPUs plus NVIDIA networking and DPUs.
This matches how frontier AI is built today: training + inference at massive scale, with enormous attention to networking, memory movement, reliability, and power/cooling.
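To make “the product is the rack” concrete, here is a minimal sketch that treats the published NVL72 building block as one schedulable unit. The RackSystem class and its fields are illustrative inventions, not an NVIDIA API; the 72/36 counts are the ones NVIDIA quotes above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RackSystem:
    """Illustrative model of a rack-scale AI system; not an NVIDIA API."""
    name: str
    gpus: int  # Rubin GPU packages in the rack's NVLink domain
    cpus: int  # Vera CPUs hosting and orchestrating those GPUs

    @property
    def gpus_per_cpu(self) -> float:
        return self.gpus / self.cpus

# Counts come from the NVL72 description above: 72 Rubin GPUs + 36 Vera CPUs.
vera_rubin_nvl72 = RackSystem("Vera Rubin NVL72", gpus=72, cpus=36)
print(vera_rubin_nvl72.gpus_per_cpu)  # 2.0 -> two Rubin GPUs per Vera CPU
```

The point of modeling it this way: capacity planning in the Rubin era happens in units of racks, not cards.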
3) What’s inside the Vera Rubin platform (the “six chips” story)
NVIDIA itself describes Rubin as “six new chips, one AI supercomputer”—because the platform isn’t only CPU+GPU. It includes the silicon needed to scale up (inside a rack) and scale out (across racks).
Commonly referenced components include:
A) Rubin GPU (the main accelerator)
Rubin is the next step after Blackwell for NVIDIA’s data-center AI acceleration, optimized for modern transformer workloads and inference at scale.
B) Vera CPU (the host + orchestration CPU)
Vera is described as a custom Arm CPU design and is paired tightly with Rubin using high-bandwidth coherent interconnect (NVLink-C2C).
C) NVLink / NVSwitch generation for scale-up
Rubin-era systems are built around a new NVLink generation (often discussed as NVLink 6 in NVIDIA materials) to connect many GPUs into a single “giant GPU-like” memory/compute fabric at rack scale.
D) ConnectX SuperNICs + DPU (BlueField) for data movement and security
NVIDIA’s DPUs and NICs are increasingly not optional: they offload networking, storage, and security tasks so the GPUs stay busy on AI math.
E) Scale-out switch silicon (Spectrum-X Ethernet / Quantum InfiniBand)
Rounding out the picture is the switch silicon that connects racks to each other; NVIDIA positions its Spectrum-X Ethernet and Quantum InfiniBand lines here, since scale-out bandwidth is what lets many NVL-class racks train or serve a single model.
4) Why Rubin exists: inference is becoming the “big” problem
Training is still huge, but the market is also shifting toward:
multi-step agentic workflows
longer context windows
massive key-value (KV) cache pressure
“always-on” inference where utilization, latency, and energy cost matter as much as peak FLOPS
NVIDIA’s Rubin messaging emphasizes exactly this: turning rack-scale power and silicon into sustained, usable inference throughput rather than headline peak numbers.
That’s why Rubin is presented as a full platform: for large-scale inference, the bottleneck is often memory movement and networking, not raw GPU math. A back-of-envelope KV-cache calculation (sketched below) shows why.
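Here is that back-of-envelope calculation: a minimal sketch assuming a hypothetical 70B-class model with grouped-query attention (80 layers, 8 KV heads, head dimension 128, FP16 values). None of these numbers are Rubin specs; they exist only to show the order of magnitude.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Rough KV-cache footprint: 2 tensors (K and V) per layer per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

# Hypothetical 70B-class model: 80 layers, 8 grouped-query KV heads,
# head_dim 128, FP16 (2 bytes per value).
per_token = kv_cache_bytes(1, layers=80, kv_heads=8, head_dim=128)
million = kv_cache_bytes(1_000_000, layers=80, kv_heads=8, head_dim=128)
print(f"{per_token / 1024:.0f} KiB per token")           # ~320 KiB
print(f"{million / 1e9:.0f} GB for a 1M-token context")  # ~328 GB, one sequence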
5) Rubin CPX: a “context” GPU class aimed at huge inference memory
One of the more interesting Rubin-era products is Rubin CPX, positioned as a GPU built for the compute-heavy “context” (prefill) phase of massive-context inference (think million-token-class workloads, code assistants, and generative video pipelines).
Whether every claim holds in real deployments will depend on software and pricing, but the direction is clear: NVIDIA is splitting the lineup by workload type (training-heavy vs context-heavy inference) rather than one “do everything” GPU. The quick arithmetic below shows why the context phase can justify its own silicon.
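As a rough illustration: prefill is compute-bound, and the standard ~2 FLOPs-per-parameter-per-token approximation for a forward pass puts numbers on it. The model size and sustained throughput below are hypothetical, chosen only to show the scale of the work.

```python
def prefill_flops(params: float, prompt_tokens: int) -> float:
    """Standard approximation: ~2 FLOPs per parameter per prompt token."""
    return 2.0 * params * prompt_tokens

# Hypothetical 70B-parameter model ingesting a 1M-token prompt.
work = prefill_flops(70e9, 1_000_000)  # 1.4e17 FLOPs
effective_pflops = 1.0                 # assumed sustained PFLOP/s per accelerator
print(f"{work / 1e15:.0f} PFLOPs of prefill work")                       # 140
print(f"~{work / (effective_pflops * 1e15):.0f} s on one accelerator")   # ~140 s
```

Pair that with the ~328 GB KV-cache estimate above and the split makes sense: the context phase wants dense, cheap compute; the decode phase wants fast, large memory.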
6) Timeline: when Vera Rubin arrives (and what’s next)
The dates move around depending on whether you count announcement, sampling, or production, but the consistent public roadmap pattern is:
Vera Rubin (Rubin generation) targeted for 2H 2026
Rubin Ultra in 2027 (a bigger follow-on platform step)
Industry coverage also consistently frames Rubin as Blackwell’s successor and highlights the infrastructure implications (power density, cooling, networking) as a major part of the story.
7) What makes Vera Rubin “different” from Blackwell in practical terms
Even without drowning in spec sheets, the practical differences NVIDIA is pushing are:
Platform-first design (validated rack-scale systems, not “here’s a GPU, good luck with the cluster”)
More emphasis on inference economics: throughput per watt, better utilization, faster data plumbing (a rough cost sketch appears at the end of this section)
Memory + interconnect as first-class features for long-context and agentic workloads
Networking/DPUs tightly integrated so the rack behaves like one machine
In short: Rubin is not only “faster.” It’s NVIDIA doubling down on the idea that the AI computer is the rack.
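The inference-economics point reduces to simple arithmetic. The sketch below computes the electricity cost alone of generating a million tokens; every number in it is hypothetical, and the formula ignores capex, utilization, and cooling overhead, so treat it as the shape of the argument, not a price quote.

```python
def energy_cost_per_million_tokens(tokens_per_second: float,
                                   watts: float,
                                   usd_per_kwh: float = 0.10) -> float:
    """Electricity cost alone for 1M tokens at a given throughput and power draw."""
    seconds = 1_000_000 / tokens_per_second
    kwh = watts * seconds / 3_600_000  # watt-seconds -> kWh
    return kwh * usd_per_kwh

# Hypothetical 1 kW accelerator at two different sustained throughputs:
print(f"${energy_cost_per_million_tokens(5_000, 1_000):.4f} per 1M tokens")   # ~$0.0056
print(f"${energy_cost_per_million_tokens(20_000, 1_000):.4f} per 1M tokens")  # ~$0.0014
```

Same wattage, 4x the throughput, a quarter of the energy cost per token: that is why “throughput per watt” is the metric NVIDIA keeps repeating.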
8) What this means for businesses buying AI infrastructure
If you’re advising a company (or building for clients) and “Rubin” comes up, here’s the framing that stays useful:
If you’re small/mid-size: you likely won’t buy Rubin systems directly at launch. But cloud providers will—and your cost/performance envelope for inference services can shift when these platforms roll out.
If you’re building “AI features” into products: Rubin-era improvements often show up as:
cheaper inference for the same latency
bigger context windows without exploding cost
more reliable high-throughput pipelines
If you’re building data centers: Rubin is part of the trend toward very high rack power densities and infrastructure planned years in advance. Operators are already being told to prepare for the power, cooling, and networking realities of the next generation.
9) A simple mental model
Think of Vera Rubin like this:
Vera (CPU) coordinates the system
Rubin (GPU) does the AI math
NVLink/NVSwitch makes 72 GPUs behave like one giant pool
NICs + DPUs keep data feeding the GPUs securely and efficiently
The “product” is the rack, not the card
That’s the core idea.