Hopper (microarchitecture)

GPU microarchitecture designed by Nvidia

Nvidia Hopper
  • Fabrication process: TSMC N4
  • Predecessor: Ampere (consumer, professional)

[Image: Grace Hopper, eponym of the architecture]

Hopper is the codename for Nvidia’s datacenter GPU microarchitecture, released in parallel with Ada Lovelace (for the consumer segment).[citation needed] It is named after the American computer scientist and United States Navy Rear Admiral Grace Hopper. Hopper was once rumored to be Nvidia’s first generation of GPUs to use multi-chip modules (MCMs), although the H100 announcement showed a massive monolithic die.[1][2][3][4][5][6] Nvidia officially announced the Hopper GPU microarchitecture and the H100 GPU at GTC 2022 on March 22, 2022.[7]

Details

The Hopper architecture introduces the following improvements:[8]

  • CUDA Compute Capability 9.0[9]
  • TSMC N4 FinFET process
  • Fourth-generation Tensor Cores with FP8, FP16, bfloat16, TensorFloat-32 (TF32) and FP64 support and sparsity acceleration.
  • New Nvidia Transformer Engine with FP8 and FP16
  • New DPX instructions for accelerating dynamic-programming algorithms (see the sketch after this list)
  • High Bandwidth Memory 3 (HBM3) on H100 80GB
  • Doubled FP32 cores per streaming multiprocessor (SM)
  • NVLink 4.0
  • PCI Express 5.0 with SR-IOV support (SR-IOV is reserved only for H100)
  • Second-generation Multi-Instance GPU (MIG) virtualization and GPU partitioning in H100, supporting up to seven instances
  • PureVideo feature set hardware video decoding, with 8 NVDEC units on H100
  • New hardware-based JPEG decoding with 7 single-core NVJPG hardware decoders, supporting YUV420, YUV422, YUV444, YUV400 and RGBA; not to be confused with Nvidia nvJPEG, the GPU-accelerated library for JPEG encoding and decoding
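
The DPX instructions accelerate the fused add/max patterns that dominate dynamic-programming recurrences such as Smith–Waterman sequence alignment. The following is a minimal sketch, assuming CUDA 12 or later, where DPX intrinsics such as __viaddmax_s32 (computing max(a + b, c) in a single operation) are hardware-accelerated on Hopper and software-emulated on earlier architectures; the kernel and data are illustrative only.

// Minimal DPX sketch: a Smith-Waterman-style relaxation step,
// score = max(score + cost, 0), one fused DPX operation per element.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dp_step(const int *cost, int *score, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // __viaddmax_s32(a, b, c) = max(a + b, c); maps to a single
        // DPX instruction on compute capability 9.0 (Hopper).
        score[i] = __viaddmax_s32(score[i], cost[i], 0);
    }
}

int main() {
    const int n = 4;
    int h_cost[n] = {5, -3, 2, -8};
    int h_score[n] = {1, 1, 1, 1};
    int *d_cost, *d_score;
    cudaMalloc(&d_cost, n * sizeof(int));
    cudaMalloc(&d_score, n * sizeof(int));
    cudaMemcpy(d_cost, h_cost, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_score, h_score, n * sizeof(int), cudaMemcpyHostToDevice);
    dp_step<<<1, n>>>(d_cost, d_score, n);
    cudaMemcpy(h_score, d_score, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; i++)
        printf("%d ", h_score[i]);   // expected: 6 0 3 0
    printf("\n");
    cudaFree(d_cost);
    cudaFree(d_score);
    return 0;
}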

Chips

Comparison of Compute Capability: GP100 vs GV100 vs GA100 vs GH100[10][11]

GPU features NVIDIA Tesla P100 NVIDIA Tesla V100 NVIDIA A100 NVIDIA H100
GPU codename GP100 GV100 GA100 GH100
GPU architecture NVIDIA Pascal NVIDIA Volta NVIDIA Ampere NVIDIA Hopper
Transistors 15.3 billion 21.1 billion 54.2 billion 80 billion
Process TSMC 16nm TSMC 12nm TSMC 7nm TSMC 4nm
Die size 610 mm² 828 mm² 815 mm² 814 mm²
Compute capability 6.0 7.0 8.0 9.0
Threads / warp 32 32 32 32
Max warps / SM 64 64 64 64
Max threads / SM 2048 2048 2048 2048
Max thread blocks / SM 32 32 32 32
Max thread blocks / thread block cluster N/A N/A N/A 16
Max 32-bit registers / SM 65536 65536 65536 65536
Max registers / block 65536 65536 65536 65536
Max registers / thread 255 255 255 255
Max thread block size 1024 1024 1024 1024
FP32 cores / SM 64 64 64 128
Ratio of SM registers to FP32 cores 1024 1024 1024 512
Shared memory size / SM 64 KB Configurable up to 96 KB Configurable up to 164 KB Configurable up to 228 KB
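
Most of the per-SM limits in the table above can be queried at run time through the standard CUDA runtime API. The short sketch below uses cudaGetDeviceProperties; on an H100 it should report compute capability 9.0 and 228 KB of configurable shared memory per SM.

// Query the tabulated per-SM limits for the installed GPU (device 0).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("GPU:                   %s\n", prop.name);
    printf("Compute capability:    %d.%d\n", prop.major, prop.minor);
    printf("Threads / warp:        %d\n", prop.warpSize);
    printf("Max threads / SM:      %d\n", prop.maxThreadsPerMultiProcessor);
    printf("Max thread block size: %d\n", prop.maxThreadsPerBlock);
    printf("32-bit registers / SM: %d\n", prop.regsPerMultiprocessor);
    printf("Shared memory / SM:    %zu KB\n",
           prop.sharedMemPerMultiprocessor / 1024);
    return 0;
}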

Comparison of precision support[12][13]

  • NVIDIA Tesla P4: CUDA cores FP32, FP64, INT8; Tensor cores none
  • NVIDIA P100: CUDA cores FP16, FP32, FP64; Tensor cores none
  • NVIDIA Volta: CUDA cores FP16, FP32, FP64, INT8; Tensor cores FP16
  • NVIDIA Turing: CUDA cores FP16, FP32, FP64, INT8; Tensor cores FP16, INT1, INT4, INT8
  • NVIDIA A100: CUDA cores FP16, FP32, FP64, INT8, BF16; Tensor cores FP16, FP64, INT1, INT4, INT8, TF32, BF16
  • NVIDIA H100: CUDA cores FP16, FP32, FP64, INT8, BF16; Tensor cores FP8, FP16, FP64, INT8, TF32, BF16

Legend:

  • FPnn: floating point with nn bits
  • INTn: integer with n bits
  • INT1: binary
  • TF32: TensorFloat32
  • BF16: bfloat16
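
The floating-point formats in the legend differ mainly in mantissa width, which the sketch below makes visible by round-tripping a value through each type. It is a minimal sketch assuming the host-side conversion intrinsics of cuda_fp16.h and cuda_bf16.h (CUDA 11 or later); TF32 has no storage type in CUDA, so it is approximated here by truncating the FP32 mantissa to TF32's 10 bits, whereas the hardware conversion may round instead.

// Round-trip a value through FP16 and BF16, and approximate TF32,
// to show how much precision each format retains.
#include <cstdio>
#include <cstdint>
#include <cstring>
#include <cuda_fp16.h>
#include <cuda_bf16.h>

// Illustrative TF32 emulation (assumption): truncate FP32's 23-bit
// mantissa to TF32's 10 bits; real hardware conversion may round.
static float tf32_approx(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    u &= 0xFFFFE000u;               // clear the 13 low mantissa bits
    std::memcpy(&f, &u, sizeof f);
    return f;
}

int main() {
    float x = 3.14159265f;
    printf("FP32: %.8f\n", x);                                     // 23-bit mantissa
    printf("FP16: %.8f\n", __half2float(__float2half(x)));         // 10-bit mantissa
    printf("BF16: %.8f\n", __bfloat162float(__float2bfloat16(x))); // 7-bit mantissa
    printf("TF32: %.8f (approximation)\n", tf32_approx(x));        // 10-bit mantissa
    return 0;
}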

Comparison of Decode Performance

Concurrent streams H.264 decode (1080p30) H.265 (HEVC) decode (1080p30) VP9 decode (1080p30)
V100 16 22 22
A100 75 157 108
H100 170 340 260

Images/sec[11] JPEG 4:4:4 decode (1080p) JPEG 4:2:0 decode (1080p)
A100 1490 2950
H100 3310 6350

Products using Hopper

See also

References