By Patrick Moorhead, CEO and Chief Analyst, Moor Insights & Strategy

I Wrote the Playbook Before the Keynote. Here’s How It Played Out.

Last week, I published my GTC 2026 preview with a specific thesis: NVIDIA must prove it can unify training GPUs, prefill accelerators, Groq decode processors, and standalone CPUs under a single software layer. I laid out what I expected Jensen Huang to announce, what the risks were, and what I’d advise the company to do. Then I flew to San Jose and watched the keynote from the SAP Center.

I’ve attended every GTC since 2011. This was the most architecturally complete keynote I’ve seen Jensen deliver. Seven new chips in full production. Five rack-scale systems. A unified software stack spanning training, inference, agentic orchestration, and storage. A physical AI ecosystem broader than anything I expected. And a Disney robot named Olaf walking across the stage, trained entirely in NVIDIA’s Isaac simulation environment.

Jensen opened by celebrating CUDA’s 20th anniversary and closed by declaring that “every SaaS company will become a GaaS company,” an agents-as-a-service company. In between, he laid out the economics of token factories in a way that should get every infrastructure CEO’s attention.

The short version: NVIDIA delivered on the heterogeneous platform thesis. The Groq LPU integration landed exactly as I predicted. The Vera CPU moved from sleeper to center stage. The software wall got taller. What surprised me was the speed and the scale: a $1 trillion demand pipeline through 2027, the LPX rack shipping in the second half of 2026, Samsung already manufacturing the Groq LP30 chip, and Satya Nadella confirming Vera Rubin is already running at Microsoft Azure. What wasn’t fully addressed: enterprise simplification and the energy constraint I flagged for 2027.
Seven Chips, Five Racks, One AI Factory: The Vera Rubin Platform

Huang unveiled the NVIDIA Vera Rubin platform on March 16: seven new chips, all in full production, shipping as five rack-scale systems. The chips: the Rubin GPU, Vera CPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet switch, and the newly integrated Groq 3 LPU. The racks: Vera Rubin NVL72 for GPU compute, Vera CPU for agentic orchestration, Groq 3 LPX for ultra-low-latency decode, BlueField-4 STX for context memory storage, and Spectrum-6 SPX for Ethernet spine networking.

As my colleague Matt Kimball wrote in his CES 2026 research note, NVIDIA positioned Vera Rubin as a new platform, not a new chip generation. GTC 2026 validated that framing. The NVL72 integrates 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6. NVIDIA claims 10x inference throughput per watt and one-tenth the cost per token versus Blackwell, and says the NVL72 handles large mixture-of-experts models with one-quarter the GPU count of the prior generation. If those efficiency claims hold at production scale, they change AI factory economics for every buyer in the stack.
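NVIDIA's rack-level claims can be turned into a quick sanity check. The sketch below is back-of-envelope arithmetic, not a model of real workloads: the Blackwell baseline is a normalized placeholder I chose for illustration, and only the 10x throughput-per-watt claim comes from the keynote.

```python
# Back-of-envelope: what a claimed 10x throughput-per-watt gain implies
# for token output at a fixed facility power budget.
# The baseline figure below is a normalized assumption, not a published spec.

BLACKWELL_TOKENS_PER_SEC_PER_WATT = 1.0  # normalized baseline (assumption)
CLAIMED_GAIN = 10                        # NVIDIA's Vera Rubin vs. Blackwell claim

def tokens_per_sec(power_watts: float, tps_per_watt: float) -> float:
    """Aggregate token throughput for a given power envelope."""
    return power_watts * tps_per_watt

GIGAWATT = 1e9  # watts

baseline = tokens_per_sec(GIGAWATT, BLACKWELL_TOKENS_PER_SEC_PER_WATT)
vera_rubin = tokens_per_sec(GIGAWATT, BLACKWELL_TOKENS_PER_SEC_PER_WATT * CLAIMED_GAIN)

# Same 1 GW facility, 10x the tokens -- which is why the claim, if it holds,
# matters most to power-constrained buyers.
print(f"Baseline:   {baseline:.2e} tokens/s per GW")
print(f"Vera Rubin: {vera_rubin:.2e} tokens/s per GW ({vera_rubin / baseline:.0f}x)")
```

The point of the arithmetic is that when power, not capital, is the binding constraint, a throughput-per-watt gain translates one-for-one into output per facility.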
On stage, Jensen showed the hardware: 100 percent liquid-cooled, cable-free compute trays that cut installation from two days to two hours, and the sixth-generation NVLink switching system. He also confirmed that Satya Nadella had already reported Vera Rubin up and running at Microsoft Azure, and that NVIDIA’s supply chain can now manufacture “thousands per week” of these racks, “potentially multi-gigawatts of AI factories per month.”

As Anshel Sag wrote at GTC 2025, the base-model Rubin was slated for early 2026 with HBM4 memory. NVIDIA delivered on that milestone. But the real story isn’t the GPU itself; it’s the architecture around it. No other semiconductor company has shipped this many purpose-built, co-designed components simultaneously. That said, shipping components and proving they work together at hyperscale are two different things.

From $500 Billion to $1 Trillion: The Demand Pipeline Doubled in 12 Months

The demand story Jensen told on stage is staggering. At last year’s GTC, he saw $500 billion of high-confidence demand for Blackwell and Rubin through 2026. This year, standing on the same stage, he said he now sees “at least $1 trillion” through 2027, adding: “I am certain computing demand will be much higher than that.”

The external data backs it up. Microsoft, Alphabet, Amazon, and Meta are on track to spend upward of $650 billion on AI investments this year, nearly tripling 2023 levels. As I told Yahoo Finance in February, AI infrastructure is essentially sold out through the end of 2027. NVIDIA posted fourth-quarter revenue of $68.1 billion, beating estimates by more than $8 billion, with data center revenue of $62.3 billion. Vera Rubin’s efficiency gains arrive precisely when customers need to extract more intelligence from every watt and every dollar of infrastructure spend.
The Groq Integration: My Prediction Landed, and Jensen Showed the Economics

In my pre-GTC analysis, I made a specific architectural prediction: the more likely near-term path for Groq integration was a disaggregated configuration, with LPU racks sitting alongside GPU racks, connected by NVLink, managed by NVIDIA’s software layer. That’s exactly what NVIDIA announced.

But Jensen went further than the press release by showing the token factory economics. He walked through a 2D framework: throughput (tokens per watt) on the Y axis, token speed (latency and intelligence) on the X axis, with pricing tiers from free to ultra-premium at $150 per million tokens. Vera Rubin alone shifts the entire frontier up, enabling 5x more revenue per gigawatt of data center versus Blackwell. The problem: NVL72 runs out of steam beyond about 400 tokens per second. It simply doesn’t have enough bandwidth for the ultra-premium tier.

That’s where Groq comes in. The Groq 3 LPX rack packs 256 LPU processors with 128 gigabytes of on-chip SRAM and 640 terabytes per second of scale-up bandwidth. GPUs handle the attention math; LPUs accelerate decode operations at every layer for every output token, connected to Vera Rubin via a custom Spectrum-X interconnect. Jensen was specific about the deployment mix: “I would add Groq to maybe 25 percent of my total data center. The rest is all 100 percent Vera Rubin.” Combined, NVIDIA claims 35x more inference throughput per megawatt. He thanked Samsung for manufacturing the LP30 chip and confirmed it ships in the second half of 2026.
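The token factory framing lends itself to a simple revenue model. The sketch below is illustrative only: the $150-per-million-token ultra-premium price and the 35x combined throughput-per-megawatt claim come from the keynote, but the baseline throughput figure and the full-utilization assumption are placeholders of mine, not disclosed numbers.

```python
# Sketch of the token-factory framing: revenue scales with tokens served,
# so a throughput-per-megawatt gain translates directly into revenue per MW
# at a given price tier. The baseline tokens/s/MW is an assumed placeholder;
# the $150/M-token price and the 35x combined claim are from the keynote.

SECONDS_PER_YEAR = 365 * 24 * 3600
ULTRA_PREMIUM_PER_TOKEN = 150 / 1_000_000  # $150 per million tokens

def revenue_per_mw_year(tokens_per_sec_per_mw: float, price_per_token: float) -> float:
    """Annual revenue from one megawatt of inference capacity, fully utilized."""
    return tokens_per_sec_per_mw * SECONDS_PER_YEAR * price_per_token

# Assumed baseline: a Blackwell-class megawatt serving 1M tokens/s (placeholder).
baseline_tps_per_mw = 1_000_000
combined_tps_per_mw = baseline_tps_per_mw * 35  # claimed GPU + LPU combined gain

baseline_rev = revenue_per_mw_year(baseline_tps_per_mw, ULTRA_PREMIUM_PER_TOKEN)
combined_rev = revenue_per_mw_year(combined_tps_per_mw, ULTRA_PREMIUM_PER_TOKEN)

print(f"Baseline: ${baseline_rev:,.0f}/MW-year")
print(f"Combined: ${combined_rev:,.0f}/MW-year ({combined_rev / baseline_rev:.0f}x)")
```

Whatever the absolute numbers turn out to be, the structure of the argument is what matters: at a fixed price tier and fixed power budget, throughput multiples become revenue multiples, which is why the 35x claim needs third-party validation.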
Jensen also explained why Groq was attractive to him: it’s a deterministic data flow processor, statically compiled, compiler-scheduled, with massive on-chip SRAM, designed for one workload: inference. That single-workload focus limited Groq’s standalone reach, but paired with Vera Rubin and Dynamo, NVIDIA gets the best of both architectures. I’ve been consistent on the heterogeneous thesis. The AI pipeline is splitting into three distinct workloads, and NVIDIA had to fill the gaps. Now it has. If execution holds, it’s the strongest total cost of ownership story in the market.

The Vera CPU: Jensen Called It a Multi-Billion Dollar Business

In the pre-GTC piece, I called the CPU resurgence “one of the sleeper storylines.” Jensen put that to rest. He said on stage: “We never thought we would be selling CPU standalone. We are selling a lot of CPU standalone. This is already, for sure, going to be a multi-billion dollar business.”

NVIDIA launched the Vera CPU as a dedicated rack-scale product: 256 liquid-cooled processors, 400 terabytes of memory, and 300 terabytes per second of memory bandwidth. The chip uses 88 Arm Olympus cores with 3x more memory bandwidth per core than x86, twice the energy efficiency, and 1.5x better single-thread performance versus today’s x86 server CPUs. Jensen framed the need simply: AI agents call tools, run SQL, compile code, and validate results on CPUs. If the CPUs are slow, the GPUs sit idle. He called Vera “the only data center CPU in the world that uses LPDDR5,” emphasizing extreme single-thread performance and performance per watt.

I posted on X before GTC that NVIDIA is executing the old Intel server playbook, but faster: anchor the GPU, then expand up and down the stack until you own the architecture conversation. The Vera CPU rack is that strategy made concrete. As Matt Kimball put it in his CES 2026 analysis, CPUs aren’t becoming less relevant in AI systems; they’re becoming more specialized.
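The rack-level Vera numbers reduce to simple per-socket arithmetic, which is a useful way to sanity-check vendor specs. Only the rack totals (256 processors, 400 TB of memory, 300 TB/s of bandwidth) and the 88-core count are from the announcement; the division into per-socket and per-core figures is mine.

```python
# Per-socket sanity check on the Vera CPU rack announcement:
# 256 processors, 400 TB of memory, 300 TB/s of aggregate memory bandwidth.
PROCESSORS = 256
RACK_MEMORY_TB = 400
RACK_BANDWIDTH_TBS = 300
CORES_PER_CHIP = 88  # Arm Olympus cores, per the announcement

memory_per_cpu_gb = RACK_MEMORY_TB * 1000 / PROCESSORS          # ~1,563 GB/socket
bandwidth_per_cpu_gbs = RACK_BANDWIDTH_TBS * 1000 / PROCESSORS  # ~1,172 GB/s/socket
bandwidth_per_core_gbs = bandwidth_per_cpu_gbs / CORES_PER_CHIP # ~13.3 GB/s/core

print(f"{memory_per_cpu_gb:.0f} GB and {bandwidth_per_cpu_gbs:.0f} GB/s per socket")
print(f"~{bandwidth_per_core_gbs:.1f} GB/s per core")
```

Roughly 1.2 TB/s per socket is well beyond what conventional DDR5 server parts deliver today, which is consistent with the LPDDR5-based design Jensen emphasized, though the exact memory configuration was not broken out on stage.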
Alibaba, ByteDance, Meta, and Oracle Cloud Infrastructure are collaborating on deployment, alongside Dell Technologies, HPE, Lenovo, and Supermicro on manufacturing. Whether enterprises outside the hyperscaler tier adopt Vera at volume will depend on pricing and how quickly agentic workloads become standard.

The Software Wall Keeps Rising: Dynamo, OpenShell, and “Every SaaS Becomes GaaS”

I predicted NemoClaw would be the software headline at GTC. NVIDIA went further than I expected. Jensen framed the three inflections that got us here: ChatGPT started the generative era, o1 started the reasoning era, and Claude Code started the agentic era. He said “100 percent of NVIDIA is using a combination of Claude Code, Codex, and Cursor. There’s not one software engineer today who is not assisted by one or many AI agents.” That’s the demand driver behind the software stack NVIDIA is building.
Dynamo 1.0 is now in production as the open-source inference operating system for AI factories, boosting Blackwell inference by up to 7x and adopted across AWS, Azure, Google Cloud, Oracle Cloud, and enterprise customers including PayPal, Pinterest, and ByteDance. The Agent Toolkit with OpenShell provides enterprise security guardrails for autonomous agents. The NemoClaw stack installs Nemotron models and OpenShell in a single command. Jensen compared OpenClaw to Windows and Mac, calling it “the operating system for personal AI” and declaring it “as big a deal as HTML, as big as Linux.” Adobe, Atlassian, SAP, Salesforce, ServiceNow, CrowdStrike, and Siemens are adopting it. The Nemotron Coalition brings Cursor, LangChain, Mistral AI, Perplexity, and others together to build open frontier models on NVIDIA DGX Cloud. NVIDIA also expanded its open model families: Nemotron 3 for agentic AI, Isaac GR00T N1.7, Cosmos 3, and Alpamayo 1.5.

Jensen’s provocation, “Every SaaS company will become a GaaS company,” meaning agents-as-a-service, is directionally right in my view, though the timeline will be longer than Jensen implies. The enterprise IT stack doesn’t get rebuilt in two years.

I wrote at GTC 2024 that NIM was “bigger than Blackwell” for enterprises, calling it the ultimate embrace-and-extend play. Jensen reinforced this with the CUDA flywheel: 20 years, hundreds of millions of installed GPUs, and Ampere GPUs shipped six years ago with cloud pricing going up because the useful life of CUDA-compatible hardware is so long. The lock-in is architecturally embedded, and it’s the hardest thing for any competitor to replicate on a two-year timeline.

Physical AI Ecosystem Breadth Exceeded My Expectations

In the pre-GTC piece, I wrote that physical AI was “not meaningful 2026 revenue, but it is the 2028 to 2030 setup.” I stand by the revenue call. What I underestimated was the pace of ecosystem adoption.
ABB Robotics, FANUC, KUKA, and YASKAWA are all adopting NVIDIA Omniverse and Isaac simulation frameworks; NVIDIA says these four represent a combined global installed base exceeding two million industrial robots. Figure, Agility, and AGIBOT are building humanoid robots on Isaac GR00T models and Jetson Thor. On autonomous vehicles, BYD, Geely, Isuzu, and Nissan are adopting NVIDIA DRIVE Hyperion for level 4 vehicles, with Uber planning a robotaxi network starting in 2027 and scaling to 28 cities by 2028. In healthcare, Roche deployed more than 3,500 Blackwell GPUs for drug discovery. And Disney brought a walking Olaf robot on stage, trained in Isaac simulation using a physics solver co-developed with DeepMind. That last one was pure theater, but the underlying tech (NVIDIA Warp, the Newton physics engine, Cosmos world models) is the same stack powering the industrial applications.

I’ve been tracking NVIDIA’s robotics push since the company demonstrated BMW factory applications at GTC 2020, and I’ve spoken with robotics CEOs who are building entire development stacks on NVIDIA’s three-computer architecture. The ecosystem lock-in forming in physical AI mirrors what CUDA created in the data center. Whether anyone can offer a credible alternative at this scale is the right question. Right now, the answer is no. But physical AI revenue remains pre-commercial for most of these partners, and the path from simulation to deployed production robots is long.

What NVIDIA Didn’t Fully Address: Complexity, Energy, and Enterprise
Three risks from my pre-GTC analysis remain partially unresolved.

Complexity. Five rack types, seven chips, and multiple interconnects are a lot for anyone who isn’t a hyperscaler. The MGX modular architecture and the token factory economics framework Jensen presented help, but enterprise CIOs still need a reference architecture they can deploy without a team of NVIDIA engineers. DGX Spark and DGX Station paired with NemoClaw are a start, but the gap between “desktop AI” and “full AI factory” remains wide.

Energy. NVIDIA announced DSX Max-Q and DSX Flex for dynamic power provisioning and grid flexibility. Those are software optimization tools, not energy sources. As I wrote before the keynote, energy is the most underappreciated constraint on the 2028 outlook. I’m confident about 2026 and 2027. The year after that requires solutions the industry hasn’t fully delivered.

Groq integration execution. Samsung is manufacturing the LP30, and NVIDIA says second-half 2026 availability. That’s more aggressive than I expected, which is positive. But the 35x throughput-per-megawatt claim and the token factory revenue projections need third-party validation at customer scale. If those numbers hold, the Groq deal will look prescient. If they don’t, it’s a $20 billion bet that takes longer to pay off than the market is pricing in.

Questions I Have

In the pre-GTC piece, I pushed on four advisory points.

Simplify the heterogeneous compute message: partially addressed, B+. Jensen’s token factory framework helps, but enterprise buyers need a simpler on-ramp.

Ship an air-cooled enterprise inference solution: not addressed for Vera Rubin at GTC; grade incomplete.

Show concrete Groq integration timelines: addressed with second-half 2026 availability, Samsung manufacturing, and a specific 25/75 deployment ratio; grade A-minus pending validation.
Own the co-packaged optics narrative: addressed with Spectrum-6 SPX in production plus both copper and CPO scale-up confirmed for Feynman; grade B.

One new advisory: get customers on the record validating Vera Rubin performance at production scale. Jensen noted that Satya confirmed the Azure deployment. Now get Anthropic, Meta, or OpenAI on stage at the next earnings call or Computex to confirm what they’re seeing in their token factories. NVIDIA’s own benchmarks are a starting point, not a finish line. The SemiAnalysis sweep was a good step. Now show it at customer scale.

GTC 2026 Validates the Platform Thesis. Now Execute.
GTC 2026 confirmed what I wrote before the keynote: NVIDIA is now a heterogeneous AI infrastructure platform company. The Vera Rubin platform is the most architecturally complete AI infrastructure announcement any semiconductor company has made. The software wall got taller. The physical AI ecosystem is broader than I anticipated. And Jensen’s $1 trillion demand pipeline through 2027 is a number that would have been unthinkable two years ago.

As I wrote at GTC 2025, that show was a demonstration of NVIDIA’s confidence in its own vision. GTC 2026 goes further: it’s a demonstration that the AI factory is the defining infrastructure category of this decade. Near-term demand through 2027 is as strong as at any point in this cycle. The real test comes when energy constraints, market-share compression toward 70 percent, and maturing custom silicon pressure the economics.

As I told Marketplace in May 2025, AMD and Intel are one to two years behind in raw training performance, and Google’s TPU and Amazon’s Trainium are real alternatives. Custom silicon isn’t going away. But no competitor offers NVIDIA’s breadth: GPUs, LPUs, CPUs, storage, networking, and the software stack tying it all together.

I believe NVIDIA’s position is structural, not cyclical. Chips can be replicated. CUDA, NIMs, NeMo, Dynamo, OpenShell, Omniverse, and the developer ecosystem can’t be replicated in two years. Jensen reminded us that CUDA is 20 years old and that six-year-old Ampere GPUs are still commanding rising cloud prices. That’s the bet, and GTC 2026 is the strongest evidence yet that it’s the right one.
Sources

Patrick Moorhead, “NVIDIA GTC 2026: Heterogeneous Compute, Groq, and the Next Phase of the AI Build-Out,” Moor Insights & Strategy (pre-GTC analysis)
Patrick Moorhead, “NVIDIA’s AI Omniverse Expands at GTC 2025,” Moor Insights & Strategy, May 6, 2025
Matt Kimball, “NVIDIA at CES 2026: Vera Rubin and the Changing Shape of AI Infrastructure,” Moor Insights & Strategy, January 12, 2026
Broadcast analysis: Patrick Moorhead on NVIDIA earnings, Yahoo Finance, February 25, 2026
Broadcast analysis: Patrick Moorhead on NVIDIA competitive position, Marketplace, May 28, 2025
Patrick Moorhead, LinkedIn post on NVIDIA NIM at GTC 2024, March 18, 2024
NVIDIA Vera Rubin Platform press release, March 16, 2026
NVIDIA Vera CPU press release, March 16, 2026
NVIDIA Dynamo 1.0 press release, March 16, 2026
NVIDIA Agent Toolkit press release, March 16, 2026
NVIDIA Nemotron Coalition press release, March 16, 2026
NVIDIA Open Models press release, March 16, 2026
NVIDIA Robot Ecosystem press release, March 16, 2026
NVIDIA DRIVE Hyperion L4 press release, March 16, 2026
NVIDIA Vera Rubin DSX Reference Design press release, March 16, 2026
“Roche Scales NVIDIA AI Factories Globally,” NVIDIA Blog, March 16, 2026
“Big Tech set to spend $650 billion in 2026 as AI investments soar,” Yahoo Finance, February 6, 2026
NVIDIA GTC 2026 keynote by Jensen Huang, March 16, 2026 (live attendance and transcript)