By Patrick Moorhead, CEO and Chief Analyst, Moor Insights & Strategy

I Wrote the Playbook Before the Keynote. Here’s How It Played Out.

Last week, I published my GTC 2026 preview with a specific thesis: NVIDIA must prove it can unify training GPUs, prefill accelerators, Groq decode processors, and standalone CPUs under a single software layer. I laid out what I expected Jensen Huang to announce, what the risks were, and what I’d advise the company to do. Then I flew to San Jose and watched the keynote from the SAP Center.

I’ve attended every GTC since 2011. This was the most architecturally complete keynote I’ve seen Jensen deliver. Seven new chips in full production. Five rack-scale systems. A unified software stack spanning training, inference, agentic orchestration, and storage. A physical AI ecosystem broader than anything I expected. And a Disney robot named Olaf walking across the stage, trained entirely in NVIDIA’s Isaac simulation environment.

Jensen opened by celebrating CUDA’s 20th anniversary and closed by declaring that “every SaaS company will become a GaaS company,” an agents-as-a-service company. In between, he laid out the economics of token factories in a way that should get every infrastructure CEO’s attention.

The short version: NVIDIA delivered on the heterogeneous platform thesis. The Groq LPU integration landed exactly as I predicted. The Vera CPU moved from sleeper to center stage. The software wall got taller. What surprised me was the speed and the scale: a $1 trillion demand pipeline through 2027, the LPX rack shipping in the second half of 2026, Samsung already manufacturing the Groq LP30 chip, and Satya Nadella confirming Vera Rubin is already running at Microsoft Azure. What wasn’t fully addressed: enterprise simplification and the energy constraint I flagged for 2027.
Seven Chips, Five Racks, One AI Factory: The Vera Rubin Platform

Huang unveiled the NVIDIA Vera Rubin platform on March 16: seven new chips, all in full production, shipping as five rack-scale systems. The chips: the Rubin GPU, Vera CPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch, and the newly integrated Groq 3 LPU. The racks: Vera Rubin NVL72 for GPU compute, Vera CPU for agentic orchestration, Groq 3 LPX for ultra-low-latency decode, BlueField-4 STX for context memory storage, and Spectrum-6 SPX for Ethernet spine networking.

As my colleague Matt Kimball wrote in his CES 2026 research note, NVIDIA positioned Vera Rubin as a new platform, not a new chip generation. GTC 2026 validated that framing. The NVL72 integrates 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6. NVIDIA claims 10x inference throughput per watt and one-tenth the cost per token versus Blackwell, and says the NVL72 handles large mixture-of-experts models with one-quarter the GPU count of the prior generation. If those efficiency claims hold at production scale, they change AI factory economics for every buyer in the stack.
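NVIDIA's rack-level claims can be turned into a quick sanity check. The sketch below is back-of-envelope arithmetic, not a model of real workloads: the Blackwell baseline is a normalized placeholder I chose for illustration, and only the 10x throughput-per-watt claim comes from the keynote.

```python
# Back-of-envelope: what a claimed 10x throughput-per-watt gain implies
# for token output at a fixed facility power budget.
# The baseline figure below is a normalized assumption, not a published spec.

BLACKWELL_TOKENS_PER_SEC_PER_WATT = 1.0  # normalized baseline (assumption)
CLAIMED_GAIN = 10                        # NVIDIA's Vera Rubin vs. Blackwell claim

def tokens_per_sec(power_watts: float, tps_per_watt: float) -> float:
    """Aggregate token throughput for a given power envelope."""
    return power_watts * tps_per_watt

GIGAWATT = 1e9  # watts

baseline = tokens_per_sec(GIGAWATT, BLACKWELL_TOKENS_PER_SEC_PER_WATT)
vera_rubin = tokens_per_sec(GIGAWATT, BLACKWELL_TOKENS_PER_SEC_PER_WATT * CLAIMED_GAIN)

# Same 1 GW facility, 10x the tokens -- which is why the claim, if it holds,
# matters most to power-constrained buyers.
print(f"Baseline:   {baseline:.2e} tokens/s per GW")
print(f"Vera Rubin: {vera_rubin:.2e} tokens/s per GW ({vera_rubin / baseline:.0f}x)")
```

The point of the arithmetic is that when power, not capital, is the binding constraint, a throughput-per-watt gain translates one-for-one into output per facility.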
On stage, Jensen showed the hardware: 100 percent liquid-cooled, cable-free compute trays that cut installation from two days to two hours, and the sixth-generation NVLink switching system. He also confirmed that Satya Nadella had already reported Vera Rubin up and running at Microsoft Azure, and that NVIDIA’s supply chain can now manufacture “thousands per week” of these racks, “potentially multi-gigawatts of AI factories per month.”

As Anshel Sag wrote at GTC 2025, the base-model Rubin was slated for early 2026 with HBM4 memory. NVIDIA delivered on that milestone. But the real story isn’t the GPU itself; it’s the architecture around it. No other semiconductor company has shipped this many purpose-built, co-designed components simultaneously. That said, shipping components and proving they work together at hyperscale are two different things.

From $500 Billion to $1 Trillion: The Demand Pipeline Doubled in 12 Months

The demand story Jensen told on stage is staggering. At last year’s GTC, he saw $500 billion of high-confidence demand for Blackwell and Rubin through 2026. This year, standing on the same stage, he said he now sees “at least $1 trillion” through 2027, adding: “I am certain computing demand will be much higher than that.”

The external data backs it up. Microsoft, Alphabet, Amazon, and Meta are on track to spend upward of $650 billion on AI investments this year, nearly tripling 2023 levels. As I told Yahoo Finance in February, AI infrastructure is essentially sold out through the end of 2027. NVIDIA posted fourth-quarter revenue of $68.1 billion, beating estimates by more than $8 billion, with data center revenue of $62.3 billion. Vera Rubin’s efficiency gains arrive precisely when customers need to extract more intelligence from every watt and every dollar of infrastructure spend.
The Groq Integration: My Prediction Landed, and Jensen Showed the Economics

In my pre-GTC analysis, I made a specific architectural prediction: the more likely near-term path for Groq integration was a disaggregated configuration, with LPU racks sitting alongside GPU racks, connected by NVLink, managed by NVIDIA’s software layer. That’s exactly what NVIDIA announced.

But Jensen went further than the press release by showing the token factory economics. He walked through a 2D framework: throughput (tokens per watt) on the Y axis, token speed (latency and intelligence) on the X axis, with pricing tiers from free to ultra-premium at $150 per million tokens. Vera Rubin alone shifts the entire frontier up, enabling 5x more revenue per gigawatt of data center versus Blackwell. The problem: NVL72 runs out of steam beyond about 400 tokens per second. It simply doesn’t have enough bandwidth for the ultra-premium tier.

That’s where Groq comes in. The Groq 3 LPX rack packs 256 LPU processors with 128 gigabytes of on-chip SRAM and 640 terabytes per second of scale-up bandwidth. GPUs handle the attention math; LPUs accelerate decode operations at every layer for every output token, connected to Vera Rubin via a custom Spectrum-X interconnect. Jensen was specific about the deployment mix: “I would add Groq to maybe 25 percent of my total data center. The rest is all 100 percent Vera Rubin.” Combined, NVIDIA claims 35x more inference throughput per megawatt. He thanked Samsung for manufacturing the LP30 chip and confirmed it ships in the second half of 2026.
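The token factory framing lends itself to a simple revenue model. The sketch below is illustrative only: the $150-per-million-token ultra-premium price and the 35x combined throughput-per-megawatt claim come from the keynote, but the baseline throughput figure and the full-utilization assumption are placeholders of mine, not disclosed numbers.

```python
# Sketch of the token-factory framing: revenue scales with tokens served,
# so a throughput-per-megawatt gain translates directly into revenue per MW
# at a given price tier. The baseline tokens/s/MW is an assumed placeholder;
# the $150/M-token price and the 35x combined claim are from the keynote.

SECONDS_PER_YEAR = 365 * 24 * 3600
ULTRA_PREMIUM_PER_TOKEN = 150 / 1_000_000  # $150 per million tokens

def revenue_per_mw_year(tokens_per_sec_per_mw: float, price_per_token: float) -> float:
    """Annual revenue from one megawatt of inference capacity, fully utilized."""
    return tokens_per_sec_per_mw * SECONDS_PER_YEAR * price_per_token

# Assumed baseline: a Blackwell-class megawatt serving 1M tokens/s (placeholder).
baseline_tps_per_mw = 1_000_000
combined_tps_per_mw = baseline_tps_per_mw * 35  # claimed GPU + LPU combined gain

baseline_rev = revenue_per_mw_year(baseline_tps_per_mw, ULTRA_PREMIUM_PER_TOKEN)
combined_rev = revenue_per_mw_year(combined_tps_per_mw, ULTRA_PREMIUM_PER_TOKEN)

print(f"Baseline: ${baseline_rev:,.0f}/MW-year")
print(f"Combined: ${combined_rev:,.0f}/MW-year ({combined_rev / baseline_rev:.0f}x)")
```

Whatever the absolute numbers turn out to be, the structure of the argument is what matters: at a fixed price tier and fixed power budget, throughput multiples become revenue multiples, which is why the 35x claim needs third-party validation.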
Jensen also explained why Groq was attractive to him: it’s a deterministic data flow processor, statically compiled, compiler-scheduled, with massive on-chip SRAM, designed for one workload: inference. That single-workload focus limited Groq’s standalone reach, but paired with Vera Rubin and Dynamo, NVIDIA gets the best of both architectures. I’ve been consistent on the heterogeneous thesis. The AI pipeline is splitting into three distinct workloads, and NVIDIA had to fill the gaps. Now it has. If execution holds, it’s the strongest total cost of ownership story in the market.

The Vera CPU: Jensen Called It a Multi-Billion Dollar Business

In the pre-GTC piece, I called the CPU resurgence “one of the sleeper storylines.” Jensen put that to rest. He said on stage: “We never thought we would be selling CPU standalone. We are selling a lot of CPU standalone. This is already, for sure, going to be a multi-billion dollar business.”

NVIDIA launched the Vera CPU as a dedicated rack-scale product: 256 liquid-cooled processors, 400 terabytes of memory, and 300 terabytes per second of memory bandwidth. The chip uses 88 Arm Olympus cores with 3x more memory bandwidth per core than x86, twice the energy efficiency, and 1.5x better single-thread performance versus today’s x86 server CPUs. Jensen framed the need simply: AI agents call tools, run SQL, compile code, and validate results on CPUs. If the CPUs are slow, the GPUs sit idle. He called Vera “the only data center CPU in the world that uses LPDDR5,” emphasizing extreme single-thread performance and performance per watt.

I posted on X before GTC that NVIDIA is executing the old Intel server playbook, but faster: anchor the GPU, then expand up and down the stack until you own the architecture conversation. The Vera CPU rack is that strategy made concrete. As Matt Kimball put it in his CES 2026 analysis, CPUs aren’t becoming less relevant in AI systems; they’re becoming more specialized.
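The rack-level Vera numbers reduce to simple per-socket arithmetic, which is a useful way to sanity-check vendor specs. Only the rack totals (256 processors, 400 TB of memory, 300 TB/s of bandwidth) and the 88-core count are from the announcement; the division into per-socket and per-core figures is mine.

```python
# Per-socket sanity check on the Vera CPU rack announcement:
# 256 processors, 400 TB of memory, 300 TB/s of aggregate memory bandwidth.
PROCESSORS = 256
RACK_MEMORY_TB = 400
RACK_BANDWIDTH_TBS = 300
CORES_PER_CHIP = 88  # Arm Olympus cores, per the announcement

memory_per_cpu_gb = RACK_MEMORY_TB * 1000 / PROCESSORS          # ~1,563 GB/socket
bandwidth_per_cpu_gbs = RACK_BANDWIDTH_TBS * 1000 / PROCESSORS  # ~1,172 GB/s/socket
bandwidth_per_core_gbs = bandwidth_per_cpu_gbs / CORES_PER_CHIP # ~13.3 GB/s/core

print(f"{memory_per_cpu_gb:.0f} GB and {bandwidth_per_cpu_gbs:.0f} GB/s per socket")
print(f"~{bandwidth_per_core_gbs:.1f} GB/s per core")
```

Roughly 1.2 TB/s per socket is well beyond what conventional DDR5 server parts deliver today, which is consistent with the LPDDR5-based design Jensen emphasized, though the exact memory configuration was not broken out on stage.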
Alibaba, ByteDance, Meta, and Oracle Cloud Infrastructure are collaborating on deployment, alongside Dell Technologies, HPE, Lenovo, and Supermicro on manufacturing. Whether enterprises outside the hyperscaler tier adopt Vera at volume will depend on pricing and how quickly agentic workloads become standard.

The Software Wall Keeps Rising: Dynamo, OpenShell, and “Every SaaS Becomes GaaS”

I predicted NemoClaw would be the software headline at GTC. NVIDIA went further than I expected. Jensen framed the three inflections that got us here: ChatGPT started the generative era, o1 started the reasoning era, and Claude Code started the agentic era. He said “100 percent of NVIDIA is using a combination of Claude Code, Codex, and Cursor. There’s not one software engineer today who is not assisted by one or many AI agents.” That’s the demand driver behind the software stack NVIDIA is building.
Dynamo 1.0 is now in production as the open-source inference operating system for AI factories, boosting Blackwell inference by up to 7x and adopted across AWS, Azure, Google Cloud, Oracle Cloud, and enterprise customers including PayPal, Pinterest, and ByteDance. The Agent Toolkit with OpenShell provides enterprise security guardrails for autonomous agents. The NemoClaw stack installs Nemotron models and OpenShell in a single command. Jensen compared OpenClaw to Windows and Mac, calling it “the operating system for personal AI” and declaring it “as big a deal as HTML, as big as Linux.” Adobe, Atlassian, SAP, Salesforce, ServiceNow, CrowdStrike, and Siemens are adopting it. The Nemotron Coalition brings Cursor, LangChain, Mistral AI, Perplexity, and others together to build open frontier models on NVIDIA DGX Cloud. NVIDIA also expanded its open model families: Nemotron 3 for agentic AI, Isaac GR00T N1.7, Cosmos 3, and Alpamayo 1.5.

Jensen’s provocation, “Every SaaS company will become a GaaS company,” meaning agents-as-a-service, is directionally right in my view, though the timeline will be longer than Jensen implies. The enterprise IT stack doesn’t get rebuilt in two years.

I wrote at GTC 2024 that NIM was “bigger than Blackwell” for enterprises, calling it the ultimate embrace-and-extend play. Jensen reinforced this with the CUDA flywheel: 20 years, hundreds of millions of installed GPUs, and Ampere GPUs shipped six years ago with cloud pricing going up because the useful life of CUDA-compatible hardware is so long. The lock-in is architecturally embedded, and it’s the hardest thing for any competitor to replicate on a two-year timeline.

Physical AI Ecosystem Breadth Exceeded My Expectations

In the pre-GTC piece, I wrote that physical AI was “not meaningful 2026 revenue, but it is the 2028 to 2030 setup.” I stand by the revenue call. What I underestimated was the pace of ecosystem adoption.
ABB Robotics, FANUC, KUKA, and YASKAWA are all adopting NVIDIA Omniverse and Isaac simulation frameworks; NVIDIA says these four represent a combined global installed base exceeding two million industrial robots. Figure, Agility, and AGIBOT are building humanoid robots on Isaac GR00T models and Jetson Thor. On autonomous vehicles, BYD, Geely, Isuzu, and Nissan are adopting NVIDIA DRIVE Hyperion for level 4 vehicles, with Uber planning a robotaxi network starting in 2027 and scaling to 28 cities by 2028. In healthcare, Roche deployed more than 3,500 Blackwell GPUs for drug discovery. And Disney brought a walking Olaf robot on stage, trained in Isaac simulation using a physics solver co-developed with DeepMind. That last one was pure theater, but the underlying tech (NVIDIA Warp, the Newton physics engine, Cosmos world models) is the same stack powering the industrial applications.

I’ve been tracking NVIDIA’s robotics push since the company demonstrated BMW factory applications at GTC 2020, and I’ve spoken with robotics CEOs who are building entire development stacks on NVIDIA’s three-computer architecture. The ecosystem lock-in forming in physical AI mirrors what CUDA created in the data center. Whether anyone can offer a credible alternative at this scale is the right question. Right now, the answer is no. But physical AI revenue remains pre-commercial for most of these partners, and the path from simulation to deployed production robots is long.

What NVIDIA Didn’t Fully Address: Complexity, Energy, and Enterprise
Three risks from my pre-GTC analysis remain partially unresolved.

Complexity. Five rack types, seven chips, and multiple interconnects are a lot for anyone who isn’t a hyperscaler. The MGX modular architecture and the token factory economics framework Jensen presented help, but enterprise CIOs still need a reference architecture they can deploy without a team of NVIDIA engineers. DGX Spark and DGX Station paired with NemoClaw are a start, but the gap between “desktop AI” and “full AI factory” remains wide.

Energy. NVIDIA announced DSX Max-Q and DSX Flex for dynamic power provisioning and grid flexibility. Those are software optimization tools, not energy sources. As I wrote before the keynote, energy is the most underappreciated constraint on the 2028 outlook. I’m confident about 2026 and 2027. The year after that requires solutions the industry hasn’t fully delivered.

Groq integration execution. Samsung is manufacturing the LP30, and NVIDIA says second-half 2026 availability. That’s more aggressive than I expected, which is positive. But the 35x throughput-per-megawatt claim and the token factory revenue projections need third-party validation at customer scale. If those numbers hold, the Groq deal will look prescient. If they don’t, it’s a $20 billion bet that takes longer to pay off than the market is pricing in.

Questions I Have

In the pre-GTC piece, I pushed on four advisory points.

Simplify the heterogeneous compute message: partially addressed, B+. Jensen’s token factory framework helps, but enterprise buyers need a simpler on-ramp.

Ship an air-cooled enterprise inference solution: not addressed for Vera Rubin at GTC; grade incomplete.

Show concrete Groq integration timelines: addressed with second-half 2026 availability, Samsung manufacturing, and a specific 25/75 deployment ratio; grade A-minus pending validation.
Own the co-packaged optics narrative: addressed with Spectrum-6 SPX in production plus both copper and CPO scale-up confirmed for Feynman; grade B.

One new advisory: get customers on the record validating Vera Rubin performance at production scale. Jensen noted that Satya confirmed the Azure deployment. Now get Anthropic, Meta, or OpenAI on stage at the next earnings call or Computex to confirm what they’re seeing in their token factories. NVIDIA’s own benchmarks are a starting point, not a finish line. The SemiAnalysis sweep was a good step. Now show it at customer scale.

GTC 2026 Validates the Platform Thesis. Now Execute.
GTC 2026 confirmed what I wrote before the keynote: NVIDIA is now a heterogeneous AI infrastructure platform company. The Vera Rubin platform is the most architecturally complete AI infrastructure announcement any semiconductor company has made. The software wall got taller. The physical AI ecosystem is broader than I anticipated. And Jensen’s $1 trillion demand pipeline through 2027 is a number that would have been unthinkable two years ago.

As I wrote at GTC 2025, that show was a demonstration of NVIDIA’s confidence in its own vision. GTC 2026 goes further: it’s a demonstration that the AI factory is the defining infrastructure category of this decade. Near-term demand through 2027 is as strong as at any point in this cycle. The real test comes when energy constraints, market-share compression toward 70 percent, and maturing custom silicon pressure the economics.

As I told Marketplace in May 2025, AMD and Intel are one to two years behind in raw training performance, and Google’s TPU and Amazon’s Trainium are real alternatives. Custom silicon isn’t going away. But no competitor offers NVIDIA’s breadth: GPUs, LPUs, CPUs, storage, networking, and the software stack tying it all together.

I believe NVIDIA’s position is structural, not cyclical. Chips can be replicated. CUDA, NIMs, NeMo, Dynamo, OpenShell, Omniverse, and the developer ecosystem can’t be replicated in two years. Jensen reminded us that CUDA is 20 years old and that six-year-old Ampere GPUs are still commanding rising cloud prices. That’s the bet, and GTC 2026 is the strongest evidence yet that it’s the right one.
Sources

Patrick Moorhead, “NVIDIA GTC 2026: Heterogeneous Compute, Groq, and the Next Phase of the AI Build-Out,” Moor Insights & Strategy (pre-GTC analysis)
Patrick Moorhead, “NVIDIA’s AI Omniverse Expands at GTC 2025,” Moor Insights & Strategy, May 6, 2025
Matt Kimball, “NVIDIA at CES 2026: Vera Rubin and the Changing Shape of AI Infrastructure,” Moor Insights & Strategy, January 12, 2026
Broadcast analysis: Patrick Moorhead on NVIDIA earnings, Yahoo Finance, February 25, 2026
Broadcast analysis: Patrick Moorhead on NVIDIA competitive position, Marketplace, May 28, 2025
Patrick Moorhead, LinkedIn post on NVIDIA NIM at GTC 2024, March 18, 2024
NVIDIA Vera Rubin Platform press release, March 16, 2026
NVIDIA Vera CPU press release, March 16, 2026
NVIDIA Dynamo 1.0 press release, March 16, 2026
NVIDIA Agent Toolkit press release, March 16, 2026
NVIDIA Nemotron Coalition press release, March 16, 2026
NVIDIA Open Models press release, March 16, 2026
NVIDIA Robot Ecosystem press release, March 16, 2026
NVIDIA DRIVE Hyperion L4 press release, March 16, 2026
NVIDIA Vera Rubin DSX Reference Design press release, March 16, 2026
“Roche Scales NVIDIA AI Factories Globally,” NVIDIA Blog, March 16, 2026
“Big Tech set to spend $650 billion in 2026 as AI investments soar,” Yahoo Finance, February 6, 2026
NVIDIA GTC 2026 keynote by Jensen Huang, March 16, 2026 (live attendance and transcript)