
ML Training

Training throughput for deep learning models using GPU-accelerated compute.
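Below is a minimal sketch of how a tokens/sec figure like the ones on this page can be measured. It assumes a data-parallel job where every GPU processes the same number of sequences per step. The function names, the warm-up count, and wrapping one optimizer step in a callable are illustrative assumptions, not this benchmark's actual harness.

```python
import time
from typing import Callable


def tokens_per_step(micro_batch_per_gpu: int, grad_accum_steps: int,
                    num_gpus: int, seq_len: int) -> int:
    """Tokens consumed by one optimizer step across all data-parallel GPUs."""
    return micro_batch_per_gpu * grad_accum_steps * num_gpus * seq_len


def measure_tokens_per_second(train_step: Callable[[], None], step_tokens: int,
                              warmup_steps: int = 5, timed_steps: int = 20) -> float:
    """Wall-clock training throughput in tokens/sec.

    train_step must run one full optimizer step and block until the GPU work
    has finished (e.g. by calling torch.cuda.synchronize() at the end). CUDA
    kernels launch asynchronously and would otherwise escape the timer.
    """
    # Warm-up steps are excluded: they include one-time costs such as
    # kernel compilation, memory-allocator growth, and data-loader start-up.
    for _ in range(warmup_steps):
        train_step()
    start = time.perf_counter()
    for _ in range(timed_steps):
        train_step()
    elapsed = time.perf_counter() - start
    return step_tokens * timed_steps / elapsed
```

For example, a hypothetical GPT-2 Medium run with 8 sequences per GPU, no gradient accumulation, 8 GPUs, and GPT-2's 1,024-token context would use `tokens_per_step(8, 1, 8, 1024)`, which is 65,536 tokens per step.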

Chart: GPT-2 Training Throughput (tokens/sec; higher is better, sorted by performance)

Our Recommendations

A3 (a3-highgpu-8g)

Large-scale model training and fine-tuning

8x H100 GPUs with 1800 Gbps NVLink interconnect, 48,000 tokens/sec GPT-2 training throughput, and 15.8 PFLOPS of FP8 compute. This is the machine for serious training workloads: in these benchmarks it delivers roughly 9x the GPT-2 throughput and 5x the ResNet-50 throughput of the 8x L4 G2 configuration.

G2 (g2-standard-96)

Budget training and fine-tuning smaller models

8x L4 GPUs provide solid training performance for smaller models at roughly 1/3 the cost of A3, with 5,400 tokens/sec GPT-2 training throughput. A good fit for fine-tuning, small-model training, and research experiments.
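The throughput and cost figures above can be combined into a rough tokens-per-dollar comparison. This sketch takes the stated 48,000 and 5,400 tokens/sec and the "roughly 1/3 the cost" ratio at face value, and normalizes A3's hourly price to 1.0; it uses no real GCP prices.

```python
A3_TOKENS_PER_SEC = 48_000        # a3-highgpu-8g, from the benchmark table
G2_TOKENS_PER_SEC = 5_400         # g2-standard-96, from the benchmark table
A3_RELATIVE_HOURLY_COST = 1.0     # normalization, not a real price
G2_RELATIVE_HOURLY_COST = 1.0 / 3 # "roughly 1/3 the cost of A3", assumed exact


def tokens_per_cost_unit(tokens_per_sec: float, relative_hourly_cost: float) -> float:
    """Tokens trained per unit of spend, where A3's hourly cost is 1.0."""
    tokens_per_hour = tokens_per_sec * 3600
    return tokens_per_hour / relative_hourly_cost


speedup = A3_TOKENS_PER_SEC / G2_TOKENS_PER_SEC
a3_value = tokens_per_cost_unit(A3_TOKENS_PER_SEC, A3_RELATIVE_HOURLY_COST)
g2_value = tokens_per_cost_unit(G2_TOKENS_PER_SEC, G2_RELATIVE_HOURLY_COST)

print(f"A3 throughput vs G2: {speedup:.1f}x")
print(f"A3 tokens per unit cost vs G2: {a3_value / g2_value:.1f}x")
```

Under these assumptions A3 is about 8.9x faster and trains about 3x more GPT-2 tokens per unit of spend, so G2's advantage is a lower hourly bill rather than a lower cost per token.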

Budget Pick: G2 (g2-standard-24)

A single L4 GPU with enough CPU and memory for most fine-tuning jobs. Use Spot VMs for roughly 70% savings on training runs that can tolerate interruptions.
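Spot VMs can be reclaimed at any time, so a training run that tolerates interruptions has to checkpoint regularly and resume from its last checkpoint on restart. The sketch below shows that pattern with the standard library only. The JSON state, the file path, and the `loss` stand-in are hypothetical placeholders for real model and optimizer state (in PyTorch you would save `state_dict()`s instead), and it assumes the preemption notice reaches the process as SIGTERM, which the loop polls between steps.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional

# Hypothetical location; in practice, point this at storage that outlives the VM.
CHECKPOINT_PATH = os.path.join(tempfile.gettempdir(), "train_state.json")


class StopFlag:
    """Set from a SIGTERM handler; polled by the training loop between steps."""

    def __init__(self) -> None:
        self.requested = False

    def handle(self, signum, frame) -> None:  # signature matches signal handlers
        self.requested = True


def save_checkpoint(state: Dict[str, Any], path: str) -> None:
    # Write to a temporary file, then rename atomically, so a preemption
    # during the write cannot leave a truncated checkpoint behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> Dict[str, Any]:
    # Resume from the last checkpoint if one exists, else start fresh.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(total_steps: int, path: str, checkpoint_every: int = 10,
          stop: Optional[StopFlag] = None) -> Dict[str, Any]:
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        state["step"] += 1
        state["loss"] = 1.0 / state["step"]  # stand-in for a real training step
        stopping = stop is not None and stop.requested
        finished = state["step"] == total_steps
        if stopping or finished or state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
        if stopping:
            break  # exit cleanly; a restarted VM resumes from this checkpoint
    return state
```

In a real job you would register the flag with `signal.signal(signal.SIGTERM, stop.handle)` before calling `train`, then relaunch the same command after the VM comes back. Checkpointing more often loses less work per preemption but spends more time writing state.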

All Benchmark Data

VM Series | Machine Type | Metric | Result | Notes
--- | --- | --- | --- | ---
A3 | a3-highgpu-8g | GPT-2 training throughput | 48,000 tokens/s | GPT-2 Medium, 8x H100, DeepSpeed ZeRO-3, FP16
G2 | g2-standard-96 | GPT-2 training throughput | 5,400 tokens/s | GPT-2 Medium, 8x L4, DeepSpeed ZeRO-2, FP16
A3 | a3-highgpu-8g | ResNet-50 training throughput | 12,800 images/s | ResNet-50, 8x H100, mixed precision, PyTorch DDP
G2 | g2-standard-96 | ResNet-50 training throughput | 2,400 images/s | ResNet-50, 8x L4, mixed precision, PyTorch DDP
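The GPT-2 rows name DeepSpeed ZeRO-3 (A3) and ZeRO-2 (G2) with FP16. Here is a minimal sketch of DeepSpeed configs matching those notes, built as Python dicts. The micro-batch and gradient-accumulation values are hypothetical because the benchmark does not publish them, and the real runs may have set further options (optimizer, offload, communication buckets).

```python
import json
from typing import Any, Dict


def deepspeed_config(zero_stage: int, micro_batch_per_gpu: int,
                     grad_accum_steps: int = 1) -> Dict[str, Any]:
    """Minimal DeepSpeed config: FP16 mixed precision plus the given ZeRO stage."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError(f"ZeRO stage must be 0-3, got {zero_stage}")
    return {
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        # FP16 training; DeepSpeed uses dynamic loss scaling by default.
        "fp16": {"enabled": True},
        # Stage 2 shards optimizer state and gradients across GPUs;
        # stage 3 additionally shards the model parameters.
        "zero_optimization": {"stage": zero_stage},
    }


# Hypothetical batch settings; the benchmark does not report them.
a3_gpt2 = deepspeed_config(zero_stage=3, micro_batch_per_gpu=16)
g2_gpt2 = deepspeed_config(zero_stage=2, micro_batch_per_gpu=4, grad_accum_steps=4)

print(json.dumps(a3_gpt2, indent=2))
```

Either dict can be passed as the `config` argument of `deepspeed.initialize` or written out as a JSON file. The ResNet-50 rows used PyTorch DDP, so these configs do not apply to them.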