NVIDIA Unveils Blackwell Ultra to Revolutionize AI Reasoning

Rongchai Wang
Mar 20, 2025 03:29

NVIDIA introduces Blackwell Ultra, a platform designed for the era of AI reasoning, offering enhanced performance for training, post-training, and test-time scaling.

NVIDIA has announced the launch of Blackwell Ultra, a new accelerated computing platform tailored for the evolving needs of AI reasoning. This platform is designed to enhance the capabilities of AI systems by optimizing training, post-training, and test-time scaling, according to NVIDIA.

Advancements in AI Scaling

Over the past five years, the compute required for AI pretraining has grown by a factor of 50 million, driving major advances in model capability. The focus is now shifting toward refining models to enhance their reasoning abilities. This involves post-training scaling, which uses domain-specific and synthetic data to improve an AI model's conversational skills and understanding of nuanced context.

A new scaling law, termed ‘test-time scaling’ or ‘long thinking’, has emerged. This approach dynamically increases compute resources during AI inference, enabling deeper reasoning. Unlike traditional models that generate responses in a single pass, these advanced models can think and refine answers in real time, moving closer to autonomous intelligence.
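One common form of test-time scaling is best-of-N sampling: spend more inference compute by drawing several candidate answers and keeping the one a verifier rates highest. The sketch below is illustrative only; `generate` and `score` are hypothetical stand-ins for a model call and a reward model, not part of any NVIDIA API.

```python
import random

def generate(prompt, seed):
    """Stand-in for a sampled model call: returns one candidate answer.
    In practice this would invoke an LLM with sampling enabled."""
    random.seed(seed)
    return {"answer": f"candidate-{seed}", "quality": random.random()}

def score(candidate):
    """Stand-in for a verifier/reward model that rates a candidate."""
    return candidate["quality"]

def best_of_n(prompt, n):
    """Test-time scaling: a larger n spends more inference compute
    to search for a better-scoring answer."""
    candidates = [generate(prompt, seed) for seed in range(n)]
    return max(candidates, key=score)

# Spending more compute at inference time (n=16 vs. n=1) can only
# raise the best verifier score found.
low = best_of_n("What is 17 * 24?", 1)
high = best_of_n("What is 17 * 24?", 16)
assert score(high) >= score(low)
```

Real deployments combine this with iterative refinement ("long thinking"), where the model critiques and revises its own intermediate answers rather than only sampling independently.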

The Blackwell Ultra Platform

The Blackwell Ultra platform is at the core of NVIDIA’s GB300 NVL72 systems, comprising a liquid-cooled, rack-scale solution that connects 36 NVIDIA Grace CPUs and 72 Blackwell Ultra GPUs. This setup forms a massive GPU domain with a total NVLink bandwidth of 130 TB/s, significantly enhancing AI inference performance.
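The headline numbers imply a per-GPU link speed: 130 TB/s of aggregate NVLink bandwidth shared across 72 GPUs works out to roughly 1.8 TB/s each, consistent with fifth-generation NVLink. A quick check of that arithmetic:

```python
total_nvlink_tb_per_s = 130   # aggregate NVLink bandwidth, per the GB300 NVL72 figures above
num_gpus = 72                 # Blackwell Ultra GPUs in one NVL72 rack

per_gpu = total_nvlink_tb_per_s / num_gpus
assert round(per_gpu, 1) == 1.8  # ~1.8 TB/s per GPU
```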

With up to 288 GB of HBM3e memory per GPU, Blackwell Ultra supports large-scale AI models and complex tasks, offering improved performance and reduced latency. Its Tensor Cores deliver 1.5x more AI compute FLOPS than the original Blackwell GPUs, optimizing memory usage and enabling breakthroughs in AI research and real-time analytics.

Enhanced Inference and Networking

NVIDIA’s Blackwell Ultra also features PCIe Gen6 connectivity with NVIDIA ConnectX-8 800G SuperNIC, which boosts network bandwidth to 800 Gb/s. This increased bandwidth enhances performance at scale, supported by NVIDIA Dynamo, an open-source library that scales up AI services and manages workloads across GPU nodes efficiently.

Dynamo’s disaggregated serving optimizes performance by separating the context and generation phases for large language model (LLM) inference, thus reducing costs and improving scalability. With a total data throughput of 800 Gb/s per GPU, GB300 NVL72 integrates seamlessly with NVIDIA’s Quantum-X800 and Spectrum-X platforms, meeting the demands of modern AI factories.
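Disaggregated serving splits LLM inference into a compute-bound context (prefill) phase and a memory-bandwidth-bound generation (decode) phase, so each can run on a GPU pool sized for its own bottleneck. The sketch below is a conceptual illustration of that split, not Dynamo's actual API; all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    max_new_tokens: int

def prefill(req):
    """Context phase: process the full prompt in one compute-bound pass.
    Returns a stand-in for the KV cache handed off to the decode pool."""
    return {"kv_cache": f"kv({len(req.prompt)} chars)", "req": req}

def decode(state):
    """Generation phase: emit tokens one at a time (memory-bandwidth-bound)."""
    return [f"tok{i}" for i in range(state["req"].max_new_tokens)]

# Disaggregated serving routes each phase to its own worker pool
# instead of running both phases on the same GPUs.
prefill_pool = [prefill]   # stand-in for context-phase GPU workers
decode_pool = [decode]     # stand-in for generation-phase GPU workers

req = Request(prompt="Explain NVLink in one sentence.", max_new_tokens=4)
state = prefill_pool[0](req)     # phase 1 on the prefill pool
tokens = decode_pool[0](state)   # phase 2 on the decode pool
assert len(tokens) == 4
```

Because the two phases no longer compete for the same GPUs, prefill capacity and decode capacity can be scaled independently, which is the cost and scalability benefit the article describes.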

Impact on AI Factories

The introduction of Blackwell Ultra is expected to boost AI factory outputs significantly. NVIDIA GB300 NVL72 systems promise a 10x increase in throughput per user and a 5x improvement in throughput per megawatt, culminating in a 50x overall increase in AI factory output performance.
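As the article presents them, the two per-axis gains multiply to give the headline figure:

```python
per_user_gain = 10       # 10x throughput per user, per NVIDIA's claim
per_megawatt_gain = 5    # 5x throughput per megawatt, per NVIDIA's claim

overall = per_user_gain * per_megawatt_gain
assert overall == 50     # the quoted 50x AI factory output increase
```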

This advancement in AI reasoning will facilitate real-time insights, enhance predictive analytics, and improve AI agents across various industries, including finance, healthcare, and e-commerce. Organizations will be able to handle larger models and workloads without compromising on speed, making advanced AI capabilities more practical and accessible.

NVIDIA Blackwell Ultra products are anticipated to be available from partners in the second half of 2025, with support from major cloud service providers and server manufacturers.

Image source: Shutterstock

