Robust Fine-Tuning of Zero-Shot Models

Researchers from the University of Washington, Columbia University, OpenAI, the Allen Institute for Artificial Intelligence, and Toyota Research Institute have teamed up to present a new method for fine-tuning pre-trained models such as GPT-3, BERT, DALL-E, EfficientNet, or CLIP on application-specific datasets. The key insight is that as you fine-tune these models, you gain in-distribution accuracy but sacrifice the zero-shot flexibility, or out-of-distribution generalization, of the pre-trained “foundation” models. The authors present Weight-Space Ensembling, in which you take a linear interpolation between the weights of the zero-shot and fine-tuned models and run inference with the interpolated weights. This strikes a balance between in-distribution and out-of-distribution accuracy. The authors connect the method to Linear Mode Connectivity to explain why it works, in contrast to random weight-space ensembles, which do not. This is another very interesting study on the generalization capability of deep neural networks, including the problem of distribution shift.
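The interpolation itself amounts to averaging two sets of weights. Below is a minimal PyTorch sketch, assuming the zero-shot and fine-tuned models share the same architecture; the names `weight_space_ensemble`, `zero_shot_model`, `fine_tuned_model`, and the mixing coefficient `alpha` are illustrative, not the authors' API.

```python
import copy
import torch

def weight_space_ensemble(zero_shot_model, fine_tuned_model, alpha=0.5):
    """Linearly interpolate the weights of two models with identical
    architectures: theta = (1 - alpha) * theta_zs + alpha * theta_ft.

    Assumes both models were initialized from the same pre-trained
    checkpoint, so their weights lie in a linearly connected region.
    """
    zs_state = zero_shot_model.state_dict()
    ft_state = fine_tuned_model.state_dict()

    merged_state = copy.deepcopy(zs_state)
    with torch.no_grad():
        for key in merged_state:
            # Interpolate every parameter and buffer entry-wise.
            merged_state[key] = (1 - alpha) * zs_state[key] + alpha * ft_state[key]

    # Load the interpolated weights into a fresh copy of the model.
    ensembled_model = copy.deepcopy(zero_shot_model)
    ensembled_model.load_state_dict(merged_state)
    return ensembled_model
```

Setting `alpha = 0` recovers the zero-shot model and `alpha = 1` the fully fine-tuned one; intermediate values trade off in-distribution accuracy against out-of-distribution robustness.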