ML Training

Training throughput for deep learning models using GPU-accelerated compute.

GPT-2 Training Throughput

Higher is better. Sorted by performance.

Our Recommendations

A3 (a3-highgpu-8g)

Large-scale model training and fine-tuning

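8x H100 GPUs with 1,800 Gbps NVLink interconnect. 48,000 tokens/s GPT-2 training throughput and 15.8 PFLOPS of peak FP8 compute. This is the machine for serious training workloads: in the benchmarks below it delivers roughly 9x the GPT-2 throughput and 5x the ResNet-50 throughput of the 8x L4 g2-standard-96.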

G2 (g2-standard-96)

Budget training and fine-tuning smaller models

8x L4 GPUs provide solid training performance for smaller models at roughly 1/3 the cost of A3, with 5,400 tokens/s GPT-2 throughput. Good for fine-tuning, small model training, and research experiments.
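
To put the cost claim in context, the two GPT-2 throughput figures can be normalized by relative cost. The sketch below is only illustrative: it assumes G2 costs exactly one third of A3 (the text says "roughly"), uses relative cost units rather than real prices, and the function name is made up for this example.

```python
# Throughput figures from the recommendations; cost units are an assumption.
A3_TOKENS_PER_SEC = 48_000
G2_TOKENS_PER_SEC = 5_400
A3_RELATIVE_COST = 1.0        # baseline
G2_RELATIVE_COST = 1.0 / 3.0  # "roughly 1/3 the cost of A3", taken as exact


def tokens_per_cost_unit(tokens_per_sec: float, relative_cost: float) -> float:
    """Throughput delivered per unit of relative hourly cost."""
    return tokens_per_sec / relative_cost


a3_value = tokens_per_cost_unit(A3_TOKENS_PER_SEC, A3_RELATIVE_COST)
g2_value = tokens_per_cost_unit(G2_TOKENS_PER_SEC, G2_RELATIVE_COST)

print(f"A3: {a3_value:,.0f} tokens/s per cost unit")
print(f"G2: {g2_value:,.0f} tokens/s per cost unit")
print(f"A3 advantage: {a3_value / g2_value:.2f}x")
```

Under these assumptions A3 delivers about three times the GPT-2 throughput per unit of spend, so the case for G2 rests on a lower absolute bill for jobs that fit on L4 GPUs, not on better cost efficiency in this benchmark. The two runs also used different DeepSpeed ZeRO stages, so the comparison is approximate.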

Budget Pick: G2 (g2-standard-24)

A single L4 GPU with enough CPU and memory for most fine-tuning jobs. Use Spot pricing for roughly 70% savings on training runs that can tolerate interruptions.
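
A run can only "handle interruptions" if it can resume from saved progress. The following is a minimal sketch of that checkpoint-and-resume pattern using only the Python standard library. The file path, the `train` loop, and the state layout are all hypothetical; a real job would save model and optimizer state with its framework's own checkpoint API and write to storage that outlives the VM.

```python
import json
import os
import tempfile

# Hypothetical checkpoint location, used only for this illustration.
CHECKPOINT_PATH = os.path.join(tempfile.gettempdir(), "train_state.json")


def save_checkpoint(state: dict, path: str = CHECKPOINT_PATH) -> None:
    """Write the state atomically so an interruption mid-write cannot corrupt it."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path: str = CHECKPOINT_PATH) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def train(total_steps: int, checkpoint_every: int, path: str = CHECKPOINT_PATH) -> dict:
    """Resume from the last checkpoint and run until total_steps is reached."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        state["step"] += 1  # placeholder for one real training step
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state
```

If the VM is reclaimed partway through, restarting the same script picks up from the last saved step, so at most `checkpoint_every` steps of work are lost.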

All Benchmark Data

VM Series	Machine Type	Metric	Result	Notes
A3	a3-highgpu-8g	GPT-2 Training Throughput	48,000 tokens/s	GPT-2 medium, 8x H100, DeepSpeed ZeRO-3, FP16
G2	g2-standard-96	GPT-2 Training Throughput	5,400 tokens/s	GPT-2 medium, 8x L4, DeepSpeed ZeRO-2, FP16
A3	a3-highgpu-8g	ResNet-50 Training Images/sec	12,800 images/s	ResNet-50, 8x H100, mixed precision, PyTorch DDP
G2	g2-standard-96	ResNet-50 Training Images/sec	2,400 images/s	ResNet-50, 8x L4, mixed precision, PyTorch DDP
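
Because both machine types carry eight GPUs, the table can also be read per GPU. The sketch below divides each result by the GPU count to get an average per-GPU rate and the H100-to-L4 ratio for each benchmark. The data structure and function name are illustrative, and the division ignores multi-GPU scaling overhead.

```python
# Rows copied from the benchmark table: (machine type, metric, result, GPU count).
RESULTS = [
    ("a3-highgpu-8g", "GPT-2 tokens/s", 48_000, 8),
    ("g2-standard-96", "GPT-2 tokens/s", 5_400, 8),
    ("a3-highgpu-8g", "ResNet-50 images/s", 12_800, 8),
    ("g2-standard-96", "ResNet-50 images/s", 2_400, 8),
]


def per_gpu(results):
    """Map (machine, metric) to the average throughput per GPU."""
    return {(machine, metric): total / gpus for machine, metric, total, gpus in results}


rates = per_gpu(RESULTS)
for metric in ("GPT-2 tokens/s", "ResNet-50 images/s"):
    h100 = rates[("a3-highgpu-8g", metric)]
    l4 = rates[("g2-standard-96", metric)]
    print(f"{metric}: H100 {h100:,.0f}, L4 {l4:,.0f}, ratio {h100 / l4:.1f}x")
```

The gap is wider for GPT-2 (about 8.9x) than for ResNet-50 (about 5.3x). The GPT-2 runs also differ in ZeRO stage, so part of that difference may come from configuration and not only from hardware.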