Meta has released PyTorch 2.7, with significant quality-of-life improvements for practitioners and new capabilities for Apple Silicon users.
torch.compile Default for Training
torch.compile is now enabled by default in the new Trainer APIs and on supported nn.Module training paths. Typical models see 20-40% higher training throughput with zero code changes; when incompatible ops are detected, PyTorch falls back to eager mode automatically.
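For readers who want to opt in explicitly (or were on earlier releases where compilation was opt-in), a minimal sketch of wrapping a model with torch.compile follows. The model and shapes are illustrative; the sketch uses the debugging "eager" backend so it runs without a GPU or C++ toolchain, whereas real training would use the default inductor backend.

```python
import torch
import torch.nn as nn

# A toy model; any nn.Module works the same way.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# backend="eager" skips codegen so this sketch runs anywhere;
# drop the argument to use the default (inductor) backend.
compiled = torch.compile(model, backend="eager")

x = torch.randn(2, 8)
out = compiled(x)  # same semantics as model(x)
print(tuple(out.shape))
```

The compiled module is a drop-in replacement: the forward signature, parameters, and optimizer wiring are unchanged, which is what makes the zero-code-change default possible.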
Distributed on Apple Silicon
PyTorch now supports multi-machine distributed training on Apple Silicon (M2 Ultra and later) via Thunderbolt 5 and an NCCL-compatible MLX backend. Small research labs can build low-power training clusters from Mac Studios, achieving competitive throughput for mid-scale models.
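The release notes don't spell out the backend string or launch flow for the MLX backend, so as an assumption-laden sketch, here is the standard torch.distributed setup using the portable "gloo" backend as a stand-in (a single-process world so it runs on one machine; in a real cluster each node would run one rank per accelerator, launched via torchrun):

```python
import os
import torch
import torch.distributed as dist

# Stand-in rendezvous config for a single-process demo; torchrun
# normally sets these environment variables for you.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# "gloo" is a stand-in here; the Apple Silicon path would use the
# MLX backend name, which this article does not specify.
dist.init_process_group(backend="gloo", rank=0, world_size=1)

t = torch.ones(4)
dist.all_reduce(t)  # sums across ranks; identity with world_size=1
result = t.tolist()
print(result)

dist.destroy_process_group()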
Export Improvements
torch.export handles dynamic shapes more gracefully with a new Dim API, and ExecuTorch (on-device inference) is now officially stable, with broad hardware-backend coverage.