Parameter Prediction for Unseen Deep Architectures (w/ First Author Boris Knyazev)

#deeplearning #neuralarchitecturesearch #metalearning Deep Neural Networks are usually trained from a given parameter initialization using SGD until convergence at a local optimum. This paper goes a different route: Given a novel network architecture for a known dataset, can we predict the final network parameters without ever training them? The authors build a Graph-Hypernetwork and train on a novel dataset of various DNN-architectures to predict high-performing weights. The results show that not only can the GHN predict weights with non-trivial performance, but it can also generalize beyond the distribution of training architectures to predict weights for networks that are much larger, deeper, or wider than ever seen in training. OUTLINE: 0:00 - Intro & Overview 6:20 - DeepNets-1M Dataset 13:25 - How to train the Hypernetwork 17:30 - Recap on Graph Neural Networks 23:40 - Message Passing mirrors forward and backward propagation 25:20 - How to deal with different output shapes 28:45 - Differentiable Normal

6 views