NVIDIA Blackwell Fleet
Blackwells are here.
NVIDIA B200, GB200, B300, and GB300 systems for your most demanding AI workloads.
The architecture high-performing teams trust
HGX B200
Faster training, efficient scalability
The second-generation Transformer Engine features FP8 and new precisions, enabling 3X faster training on GPT-MoE-1.8T.
HGX GB200
Real-time inference for LLMs
Achieve up to 30X faster real-time trillion-parameter LLM inference compared to the NVIDIA H100 Tensor Core GPU.
Compare the entire NVIDIA Blackwell fleet
| Instance | GPU | GPU Memory | vCPUs | Storage | Network Bandwidth |
|---|---|---|---|---|---|
| NVIDIA HGX B200 | HGX B200 | 192 GB HBM3e | 2× Intel Xeon 6 Performance 6767P | OS: 2× 960 GB M.2 (RAID 1); Data: 4× 3.84 TB NVMe (15.36 TB total) | 0.8 TB/s |
| NVIDIA HGX GB200* | GB200 NVL72 | 186 GB HBM3e @ 8 TB/s | 72-core Arm-based NVIDIA Grace CPU (serves as vCPUs) | — | — |
| NVIDIA HGX B300* | HGX B300 | Up to 2.3 TB total | 2× Intel Xeon CPUs (exact model unspecified) | OS: 2× 1.9 TB NVMe M.2 SSDs; Internal: 8× 3.84 TB NVMe E1.S SSDs | 1.6 TB/s |
| NVIDIA HGX GB300* | GB300 NVL72 | 20.1 TB HBM3e total | 2,592 Arm cores (36 NVIDIA Grace CPUs) | Not available | 7.2 TB/s |

*Liquid-cooled systems.
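To confirm which Blackwell variant and how much memory a given instance actually exposes, a quick device query is enough. A minimal sketch in Python, assuming PyTorch with CUDA is installed (nvidia-smi reports the same information):

```python
# List each visible GPU's name, total memory, and compute capability.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, "
          f"{props.total_memory / 1024**3:.0f} GiB, "
          f"sm_{props.major}{props.minor}")
```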
NVIDIA HGX B200
For any stage of your develop-to-deploy pipeline
An engine built to boost
The second-generation NVIDIA Blackwell Transformer Engine's custom Tensor Core technology, paired with NVIDIA's software stack, delivers lightning-fast inference and training for LLMs and MoE models.
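In practice, tapping the Transformer Engine's FP8 path from PyTorch takes only a few lines. A minimal sketch, assuming the transformer_engine package is installed; the layer size, batch, and recipe settings are illustrative only:

```python
# Minimal FP8 forward/backward pass with NVIDIA Transformer Engine.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Hybrid format: E4M3 for forward activations/weights, E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()
optimizer = torch.optim.AdamW(layer.parameters(), lr=1e-4)
x = torch.randn(16, 4096, device="cuda")

# fp8_autocast runs the enclosed forward pass in FP8 where supported.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(x)
loss = out.float().pow(2).mean()
loss.backward()
optimizer.step()
```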
Hardware-based security
The industry's first TEE-I/O-capable GPU, with TEE-I/O-capable hosts and inline protection over NVLink.
NVLink and NVLink switch
Fifth-generation NVLink interconnect scales up to 576 GPUs. The NVLink Switch chip delivers 130 TB/s of GPU bandwidth in a 72-GPU domain, plus 4X bandwidth efficiency with NVIDIA SHARP FP8 support.
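The 130 TB/s figure is easy to sanity-check: fifth-generation NVLink provides 1.8 TB/s of bandwidth per GPU, and an NVL72 domain connects 72 GPUs. A quick back-of-the-envelope check:

```python
# Aggregate NVLink bandwidth in a 72-GPU NVL72 domain.
per_gpu_tb_s = 1.8        # NVLink 5 bandwidth per GPU, TB/s
gpus = 72
print(f"{per_gpu_tb_s * gpus:.1f} TB/s")  # 129.6 TB/s, i.e. ~130 TB/s
```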
NVIDIA HGX GB200
Architecture for trillion-parameter inference and training performance
AI superchip
208 billion transistors in a single AI engine. Every NVIDIA Grace Blackwell features two reticle-limit dies connected by a 10 TB/s chip-to-chip interconnect.
Built to boost
The second-generation Blackwell Transformer Engine's custom Tensor Core technology, paired with NVIDIA's software stack, delivers lightning-fast inference and training for LLMs and MoE models.
Massive-scale training
The second-generation Transformer Engine, featuring FP8 precision, delivers 4X faster training for LLMs at scale.
NVIDIA HGX B300
The building block of reasoning
Real-time inference and training
The same capabilities hyperscalers rely on: 144 petaFLOPS for inference and 72 petaFLOPS for training.
Automate AI operations
NVIDIA Mission Control and NVIDIA AI Enterprise help automate infrastructure, cluster management, and model deployment so teams can scale efficiently.
Reasoning at scale
Blackwell Ultra Tensor Cores feature 2X faster attention-layer acceleration and 1.5X more AI compute FLOPS than standard Blackwell GPUs.
NVIDIA HGX GB300
Efficiency for data center workloads, without performance compromise
AI reasoning inference
NVIDIA Blackwell Ultra's Tensor Cores boast 1.5X more AI compute FLOPS compared to Blackwell GPUs.
More memory
288 GB of HBM3e enables larger batch sizes and boosted throughput at massive context lengths.
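To see why that headroom matters, consider KV-cache sizing for long-context inference. A rough sketch using the standard 2 × layers × KV heads × head dim × sequence length formula; the model shape below is a hypothetical 70B-class configuration, not any specific model:

```python
# Estimate KV-cache footprint for a hypothetical 70B-class model.
layers, kv_heads, head_dim = 80, 8, 128   # illustrative model shape
dtype_bytes = 1                           # FP8 KV cache
seq_len, batch = 128_000, 8

kv_per_seq = 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes
total_gib = kv_per_seq * batch / 1024**3
print(f"{total_gib:.0f} GiB of KV cache")  # ~156 GiB, well under 288 GB
```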
Superior RDMA
NVIDIA ConnectX-8 SuperNIC’s input/output (IO) module hosts two ConnectX-8 devices for 800 Gb/s of network connectivity per GPU.
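A quick cross-check ties this per-GPU figure back to the comparison table above: assuming 800 Gb/s of connectivity per GPU across the fleet (8, 16, and 72 GPUs per system, respectively), the per-system network bandwidth numbers fall out from GPU count alone.

```python
# Derive per-system network bandwidth from 800 Gb/s per GPU.
per_gpu_gbps = 800
for system, gpus in [("HGX B200", 8), ("HGX B300", 16), ("GB300 NVL72", 72)]:
    tb_s = per_gpu_gbps * gpus / 8 / 1000   # Gb/s -> GB/s -> TB/s
    print(f"{system}: {tb_s:.1f} TB/s")
# HGX B200: 0.8 TB/s, HGX B300: 1.6 TB/s, GB300 NVL72: 7.2 TB/s
```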