Interpolation Phase Transition in Neural Networks: Memorization and Generalization

Joe Zhong
Postdoc, Stanford

Abstract

A mystery of modern neural networks is their surprising generalization power in the overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if the actual labels are replaced by purely random ones; despite this, they achieve good prediction error on unseen data. To demystify this phenomenon, we focus on two-layer neural networks in the neural tangent (NT) regime. Under a simple data model in which the n inputs are d-dimensional isotropic vectors and the network has N hidden neurons, we show that as soon as $Nd \gg n$, the minimum eigenvalue of the empirical NT kernel is bounded away from zero, and therefore the network can exactly interpolate arbitrary labels. Next, we study the generalization error of NT ridge regression (including min-$\ell_2$ norm interpolation). We show that, in the same overparametrization regime $Nd \gg n$, in terms of generalization error, NT ridge regression is well approximated by kernel ridge regression with respect to the infinite-width kernel.
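To make the setup concrete, the following is a minimal numerical sketch (not code from the paper) of the objects the abstract refers to: it forms the first-layer NT feature map of a randomly initialized two-layer network with ReLU activation on isotropic inputs, checks that the minimum eigenvalue of the empirical NT kernel is bounded away from zero when $Nd \gg n$, and runs NT ridge regression in kernel form. The specific values of n, d, N, the regularization parameter, the ReLU activation, and the normalizations are illustrative assumptions, not choices made in the paper.

```python
# Illustrative sketch (assumptions: ReLU activation, +/-1 second-layer weights,
# Gaussian isotropic inputs, toy sizes with N*d >> n). Not the authors' code.
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 200, 30, 400          # sample size, input dimension, hidden width (Nd >> n)
lam = 1e-3                      # ridge penalty; lam -> 0 recovers min-l2-norm interpolation

X = rng.standard_normal((n, d)) / np.sqrt(d)   # isotropic input vectors
y = rng.choice([-1.0, 1.0], size=n)            # arbitrary (here purely random) labels

W = rng.standard_normal((N, d)) / np.sqrt(d)   # first-layer weights at initialization
b = rng.choice([-1.0, 1.0], size=N)            # fixed second-layer signs

# First-layer NT feature map: gradient of f(x) = (1/sqrt(N)) * sum_i b_i * relu(<w_i, x>)
# with respect to the first-layer weights, flattened into an n x (N*d) matrix.
pre = X @ W.T                                  # (n, N) preactivations
act_grad = (pre > 0).astype(float)             # relu'(<w_i, x>)
Phi = ((act_grad * b)[:, :, None] * X[:, None, :]).reshape(n, N * d) / np.sqrt(N)

# Empirical NT kernel; its minimum eigenvalue stays bounded away from zero here.
K = Phi @ Phi.T
print("min eigenvalue of empirical NT kernel:", np.linalg.eigvalsh(K).min())

# NT ridge regression in kernel form; the fit interpolates y as lam -> 0.
alpha = np.linalg.solve(K + lam * np.eye(n), y)
print("training MSE:", np.mean((K @ alpha - y) ** 2))
```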