Hopper (microarchitecture)
From Wikipedia, the free encyclopedia
GPU microarchitecture designed by Nvidia
Fabrication process | TSMC N4 |
---|---|
Predecessor | Ampere (consumer, professional) |
Hopper is the codename for Nvidia’s datacenter GPU microarchitecture, released in parallel with Ada Lovelace, its counterpart for the consumer segment.[citation needed] It is named after the American computer scientist and United States Navy Rear Admiral Grace Hopper. Hopper was once rumored to be Nvidia’s first generation of GPUs to use multi-chip modules (MCMs), although the H100 announcement revealed a large monolithic die.[1][2][3][4][5][6] Nvidia officially announced the Hopper GPU microarchitecture and the H100 GPU at GTC 2022 on March 22, 2022.[7]
Details
The Hopper architecture introduces the following improvements:[8]
- CUDA Compute Capability 9.0[9]
- TSMC N4 FinFET process
- Fourth-generation Tensor Cores with FP8, FP16, bfloat16, TensorFloat-32 (TF32), and FP64 support, plus sparsity acceleration
- New Nvidia Transformer Engine combining FP8 and FP16 precision
- New DPX instructions for dynamic programming (see the sketch after this list)
- High Bandwidth Memory 3 (HBM3) on the 80 GB H100
- Double FP32 cores per Streaming Multiprocessor (SM)
- NVLink 4.0
- PCI Express 5.0 with SR-IOV support (SR-IOV is reserved for the H100)
- Second-generation Multi-Instance GPU (MIG) virtualization and GPU partitioning on the H100, supporting up to seven instances
- PureVideo feature set hardware video decoding
- 8 NVDEC units on the H100
- New hardware-based single-core JPEG decode with 7 NVJPG hardware decoders, supporting YUV420, YUV422, YUV444, YUV400, and RGBA. Not to be confused with nvJPEG, Nvidia’s GPU-accelerated library for JPEG encoding and decoding
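The DPX instructions accelerate the max/add inner loops of dynamic-programming algorithms such as Smith–Waterman or Needleman–Wunsch alignment. A minimal sketch of one such step, assuming the DPX intrinsics added in CUDA 12 (here `__viaddmax_s32`, which computes max(a + b, c)); the kernel and data are illustrative, not from an Nvidia sample:

```cuda
#include <cstdio>

// Toy dynamic-programming step: out[i] = max(prev[i] + gap, best[i]).
// __viaddmax_s32 (CUDA 12+) lowers to a single DPX instruction on
// compute capability 9.0; on earlier GPUs it is emulated in software.
__global__ void dp_step(const int* prev, const int* best, int* out,
                        int gap, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = __viaddmax_s32(prev[i], gap, best[i]);
}

int main() {
    const int n = 256;
    int *prev, *best, *out;
    cudaMallocManaged(&prev, n * sizeof(int));
    cudaMallocManaged(&best, n * sizeof(int));
    cudaMallocManaged(&out,  n * sizeof(int));
    for (int i = 0; i < n; ++i) { prev[i] = i; best[i] = 100; }

    dp_step<<<1, n>>>(prev, best, out, /*gap=*/-2, n);
    cudaDeviceSynchronize();
    printf("out[0]=%d out[200]=%d\n", out[0], out[200]);  // 100, 198

    cudaFree(prev); cudaFree(best); cudaFree(out);
    return 0;
}
```

Built with `nvcc -arch=sm_90 dp_step.cu`; the same code compiles for older architectures, where the intrinsic falls back to an add-then-max sequence without the hardware speedup.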
Chips
Comparison of Compute Capability: GP100 vs GV100 vs GA100 vs GH100[10][11]
GPU features | NVIDIA Tesla P100 | NVIDIA Tesla V100 | NVIDIA A100 | NVIDIA H100 |
---|---|---|---|---|
GPU codename | GP100 | GV100 | GA100 | GH100 |
GPU architecture | NVIDIA Pascal | NVIDIA Volta | NVIDIA Ampere | NVIDIA Hopper |
Transistors | 15.3 billion | 21.1 billion | 54.2 billion | 80 billion |
Process | TSMC 16 nm | TSMC 12 nm | TSMC 7 nm | TSMC 4 nm |
Die size | 610 mm² | 828 mm² | 815 mm² | 814 mm² |
Compute capability | 6.0 | 7.0 | 8.0 | 9.0 |
Threads / warp | 32 | 32 | 32 | 32 |
Max warps / SM | 64 | 64 | 64 | 64 |
Max threads / SM | 2048 | 2048 | 2048 | 2048 |
Max thread blocks / SM | 32 | 32 | 32 | 32 |
Max thread blocks / thread block cluster | N/A | N/A | N/A | 16 |
Max 32-bit registers / SM | 65536 | 65536 | 65536 | 65536 |
Max registers / block | 65536 | 65536 | 65536 | 65536 |
Max registers / thread | 255 | 255 | 255 | 255 |
Max thread block size | 1024 | 1024 | 1024 | 1024 |
FP32 cores / SM | 64 | 64 | 64 | 128 |
Ratio of SM registers to FP32 cores | 1024 | 1024 | 1024 | 512 |
Shared memory size / SM | 64 KB | Configurable up to 96 KB | Configurable up to 164 KB | Configurable up to 228 KB |
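The thread block cluster row in the table above is new with compute capability 9.0: Hopper can co-schedule up to 16 thread blocks as one cluster, with distributed shared memory and a cluster-wide barrier. A minimal launch sketch, assuming the CUDA 12 runtime and cooperative groups API (kernel name and dimensions are illustrative):

```cuda
#include <cooperative_groups.h>
#include <cstdio>
namespace cg = cooperative_groups;

// Compile-time cluster shape of 2 x 1 x 1 thread blocks (requires sm_90).
__global__ void __cluster_dims__(2, 1, 1) cluster_kernel() {
    cg::cluster_group cluster = cg::this_cluster();
    if (threadIdx.x == 0)
        printf("block %u of %u in its cluster\n",
               cluster.block_rank(), cluster.num_blocks());
    cluster.sync();  // barrier across every block in the cluster
}

int main() {
    // The grid size (4 blocks) must be a multiple of the cluster size (2).
    cluster_kernel<<<4, 128>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Compiled with `nvcc -arch=sm_90`; launching a clustered kernel on a pre-Hopper GPU fails, since clusters did not exist before compute capability 9.0.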
Comparison of Precision Support Matrix[12][13]
Supported CUDA core precisions
GPU | FP8 | FP16 | FP32 | FP64 | INT1 | INT4 | INT8 | TF32 | BF16 |
---|---|---|---|---|---|---|---|---|---|
NVIDIA Tesla P4 | No | No | Yes | Yes | No | No | Yes | No | No |
NVIDIA P100 | No | Yes | Yes | Yes | No | No | No | No | No |
NVIDIA Volta | No | Yes | Yes | Yes | No | No | Yes | No | No |
NVIDIA Turing | No | Yes | Yes | Yes | No | No | Yes | No | No |
NVIDIA A100 | No | Yes | Yes | Yes | No | No | Yes | No | Yes |
NVIDIA H100 | No | Yes | Yes | Yes | No | No | Yes | No | Yes |
Supported Tensor core precisions
GPU | FP8 | FP16 | FP32 | FP64 | INT1 | INT4 | INT8 | TF32 | BF16 |
---|---|---|---|---|---|---|---|---|---|
NVIDIA Tesla P4 | No | No | No | No | No | No | No | No | No |
NVIDIA P100 | No | No | No | No | No | No | No | No | No |
NVIDIA Volta | No | Yes | No | No | No | No | No | No | No |
NVIDIA Turing | No | Yes | No | No | Yes | Yes | Yes | No | No |
NVIDIA A100 | No | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes |
NVIDIA H100 | Yes | Yes | No | Yes | No | No | Yes | Yes | Yes |
Legend:
- FPnn: floating point with nn bits
- INTn: integer with n bits
- INT1: binary
- TF32: TensorFloat32
- BF16: bfloat16
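H100 is the only row above with FP8 Tensor Core support, which comes in two encodings, E4M3 and E5M2. Host-visible conversions for both are exposed through the `cuda_fp8.h` header shipped since CUDA 11.8; a small round-trip sketch (the exact rounded values are determined by each 8-bit format’s precision):

```cuda
#include <cuda_fp8.h>
#include <cstdio>

int main() {
    float x = 3.14159f;

    // E4M3: 4 exponent / 3 mantissa bits -- finer precision, smaller range.
    __nv_fp8_e4m3 a(x);
    // E5M2: 5 exponent / 2 mantissa bits -- coarser precision, wider range.
    __nv_fp8_e5m2 b(x);

    printf("original:        %f\n", x);
    printf("e4m3 round trip: %f\n", static_cast<float>(a));  // nearest e4m3 value
    printf("e5m2 round trip: %f\n", static_cast<float>(b));  // nearest e5m2 value
    return 0;
}
```

The Transformer Engine mentioned under Details manages scaling and selects among these formats automatically, trading range against precision.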
Comparison of Decode Performance
GPU (concurrent streams) | H.264 decode (1080p30) | H.265 (HEVC) decode (1080p30) | VP9 decode (1080p30) |
---|---|---|---|
V100 | 16 | 22 | 22 |
A100 | 75 | 157 | 108 |
H100 | 170 | 340 | 260 |
GPU (images/sec)[11] | JPEG 4:4:4 decode (1080p) | JPEG 4:2:0 decode (1080p) |
---|---|---|
A100 | 1490 | 2950 |
H100 | 3310 | 6350 |
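The JPEG throughput above is delivered by the dedicated NVJPG engines, which applications reach through the nvJPEG library in the CUDA toolkit. A minimal decode sketch, assuming nvJPEG’s documented C API (`file_data`/`file_size` stand in for a JPEG the caller has loaded; error checking is omitted, and whether the hardware engines or CUDA kernels do the work depends on the GPU and the backend nvJPEG selects):

```cuda
#include <nvjpeg.h>
#include <cuda_runtime.h>

// Decode one in-memory JPEG to planar RGB on the device.
void decode_jpeg(const unsigned char* file_data, size_t file_size) {
    nvjpegHandle_t handle;
    nvjpegJpegState_t state;
    nvjpegCreateSimple(&handle);
    nvjpegJpegStateCreate(handle, &state);

    // Query dimensions and chroma subsampling (4:2:0, 4:4:4, ...).
    int nComp, widths[NVJPEG_MAX_COMPONENT], heights[NVJPEG_MAX_COMPONENT];
    nvjpegChromaSubsampling_t subsampling;
    nvjpegGetImageInfo(handle, file_data, file_size,
                       &nComp, &subsampling, widths, heights);

    // One device buffer per output channel (R, G, B planes).
    nvjpegImage_t out = {};
    for (int c = 0; c < 3; ++c) {
        out.pitch[c] = widths[0];
        cudaMalloc(reinterpret_cast<void**>(&out.channel[c]),
                   static_cast<size_t>(widths[0]) * heights[0]);
    }

    nvjpegDecode(handle, state, file_data, file_size,
                 NVJPEG_OUTPUT_RGB, &out, /*stream=*/0);
    cudaDeviceSynchronize();

    for (int c = 0; c < 3; ++c) cudaFree(out.channel[c]);
    nvjpegJpegStateDestroy(state);
    nvjpegDestroy(handle);
}
```

Link with `-lnvjpeg`. On an H100 the library can dispatch to the hardware NVJPG decoders, which is what yields the concurrent throughput shown in the table.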
Products using Hopper
- Nvidia H100
References
- ^ kopite7kimi (June 10, 2019). “After Ampere, the next codename of GeForce is Hopper, in memory of Grace Hopper”. @kopite7kimi. Retrieved December 1, 2019.
- ^ “Hardware- und Nachrichten-Links des 11./12. November 2019”. www.3dcenter.org (in German). Retrieved December 1, 2019.
- ^ Hagedoorn, Hilbert. “NVIDIA Next Gen-GPU Hopper could be offered in chiplet design”. Guru3D.com. Retrieved December 1, 2019.
- ^ Pirzada, Usman (November 16, 2019). “NVIDIA Next Generation Hopper GPU Leaked – Based On MCM Design, Launching After Ampere”. Wccftech. Retrieved December 1, 2019.
- ^ “NVIDIA Hopper GPU Architecture and H100 Accelerator Announced: Working Smarter and Harder”. AnandTech. March 22, 2022.
- ^ “NVIDIA Hopper Architecture In-Depth”. Nvidia. March 22, 2022.
- ^ “NVIDIA Announces Hopper Architecture, the Next Generation of Accelerated Computing”.
- ^ “NVIDIA Hopper GPU Architecture”.
- ^ “CUDA C++ Programming Guide”.
- ^ “NVIDIA A100 Tensor Core GPU Architecture” (PDF). www.nvidia.com. Retrieved September 18, 2020.
- ^ a b “NVIDIA H100 Tensor Core GPU Architecture Whitepaper”. NVIDIA.
- ^ “NVIDIA Tensor Cores: Versatility for HPC & AI”. NVIDIA.
- ^ “Abstract”. docs.nvidia.com.