NFNets: High-Performance Large-Scale Image Recognition Without Normalization (ML Paper Explained)

#nfnets #deepmind #machinelearning

Batch Normalization is a core component of modern deep learning. It enables training at larger batch sizes, prevents mean shift, provides implicit regularization, and lets networks reach higher performance than they would without it. However, BatchNorm also has disadvantages, such as its dependence on batch size and its computational overhead, especially in distributed settings. Normalizer-Free Networks, developed at Google DeepMind, are a class of CNNs that achieve state-of-the-art classification accuracy on ImageNet without batch normalization. They do so by using adaptive gradient clipping (AGC), combined with a number of improvements to the general network architecture. The resulting networks train faster, are more accurate, and provide better transfer learning performance. Code is provided in JAX.

OUTLINE:
0:00 - Intro & Overview
2:40 - What’s the problem with BatchNorm?
11:00 - Paper contribution Overview
13:30 - Beneficial properties of BatchNorm
15:30 - Previous work: NF-