Meta-Learning approaches for Supervised and Unsupervised Few-Shot Learning


Antreas Antoniou

The field of few-shot learning has recently seen substantial advancements. Most of these advancements came from casting few-shot learning as a meta-learning problem. \emph{Model Agnostic Meta Learning}, or MAML, is one of the current best approaches to few-shot learning via meta-learning. MAML works by learning initialization parameters for a neural network such that, after a few gradient steps (with respect to some training-set loss) from those parameters, the model generalizes well on a validation set. MAML is simple, elegant and very powerful; however, it suffers from a variety of issues, such as diminishing gradients, high computational cost and architecture-conditional training instability. In this report, we propose various improvements to MAML, which we call \emph{\newmaml}, that not only stabilize the system, but also substantially improve its generalization performance (from 91.3\% to 97.8\% in the 20-way 1-shot Omniglot setting), speed up its convergence and reduce its computational overhead.
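The bi-level structure described above can be made concrete with a minimal sketch. The toy setting below is hypothetical (a single scalar parameter and per-task quadratic losses $L_t(\theta) = (\theta - t)^2$, chosen so every gradient has a closed form); it is not the report's architecture, but it shows the two loops: an inner gradient step on a task's training loss, and an outer update of the shared initialization that differentiates \emph{through} that inner step.

```python
# Toy MAML sketch: scalar parameter theta, per-task loss L_t(x) = (x - t)^2.
# All derivatives are written out by hand so the example is self-contained.

def inner_update(theta, task_target, alpha):
    """One inner-loop gradient step on the task's training loss."""
    grad = 2.0 * (theta - task_target)  # d/dtheta of (theta - t)^2
    return theta - alpha * grad

def meta_gradient(theta, task_target, alpha):
    """Gradient of the post-update loss w.r.t. the *initialization* theta.
    Differentiating through the inner step gives d(theta')/d(theta) = 1 - 2*alpha."""
    theta_prime = inner_update(theta, task_target, alpha)
    return 2.0 * (theta_prime - task_target) * (1.0 - 2.0 * alpha)

def maml(tasks, theta=0.0, alpha=0.1, beta=0.05, meta_steps=500):
    """Outer loop: move the shared initialization so that a single
    inner step on any sampled task already generalizes well."""
    for _ in range(meta_steps):
        meta_grad = sum(meta_gradient(theta, t, alpha) for t in tasks) / len(tasks)
        theta -= beta * meta_grad
    return theta

# With tasks centred on 1.0, the learned initialization settles near the
# point from which one inner step reaches every task quickly.
theta_star = maml(tasks=[0.5, 1.0, 1.5])
```

Note that the meta-gradient involves the derivative of the inner update itself; in a real network this second-order term is exactly what makes MAML computationally expensive, which motivates the improvements proposed in \emph{\newmaml}.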

However, MAML learns a \emph{static} parameter initialization, which means that the model is initialized with the same learned parameters for every received task. Learning a static parameter initialization can be very restrictive, since it pushes the model to learn parameters that lie approximately at the center of the training-task space. If the tasks are very far apart, the resulting learned initialization will lie far from the target tasks, thus requiring either more update steps (i.e. more compute) or more data to generalize well. In this report, we propose an approach capable of learning \emph{dynamic} parameter initializations instead of static ones, which we call \emph{HyperMAML}. We propose doing so using a weight generation network (i.e. a dynamic hypernetwork) that is conditioned on the current task at hand. Preliminary experimental results provide some indication of the technique's performance and hint at how it could be improved.
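The dynamic-initialization idea can be sketched as follows. All concrete choices here are hypothetical illustrations (a linear hypernetwork, a mean-pooled support-set embedding, the dimensions): the point is only the contrast with vanilla MAML, where $\theta_0$ is one shared vector rather than a function of the task.

```python
# Sketch of a task-conditioned weight generator (dynamic hypernetwork).
# Shapes and the linear form are illustrative assumptions, not the report's design.
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM, PARAM_DIM = 8, 16
W_h = rng.normal(scale=0.1, size=(PARAM_DIM, EMBED_DIM))  # hypernetwork weights
b_h = np.zeros(PARAM_DIM)

def task_embedding(support_set):
    """Condition on the task at hand: here, simply the mean support example."""
    return support_set.mean(axis=0)

def generate_init(support_set):
    """Dynamic initialization: theta_0 is a *function* of the task,
    instead of a single shared vector as in vanilla MAML."""
    return W_h @ task_embedding(support_set) + b_h

# Two dissimilar tasks receive two different starting points, from which
# the usual MAML inner-loop adaptation then proceeds.
task_a = rng.normal(size=(5, EMBED_DIM))
task_b = rng.normal(size=(5, EMBED_DIM)) + 3.0
theta_a, theta_b = generate_init(task_a), generate_init(task_b)
```

In training, the hypernetwork parameters (here `W_h`, `b_h`) would take the place of the static initialization in the outer loop, so the meta-objective is optimized with respect to the generator rather than a fixed $\theta_0$.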

Furthermore, the field of few-shot learning has only been investigated in the supervised setting, where per-class labels are available. In contrast, the highly desirable unsupervised few-shot learning setting, where no labels of any kind are required, has seen little to no investigation. In this report, we propose a method for training few-shot learning models in an unsupervised manner by leveraging semantic similarities between randomly labeled samples, which we call \emph{Unsupervised Model Agnostic Meta Learning} or \umaml. Models trained with \umaml{} can be directly used on real-labeled datasets with good generalization performance (65\% test accuracy on 20-way 1-shot Omniglot, where the supervised variant achieves 75\%).
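To illustrate what "randomly labeled samples" can mean in episode form, the sketch below builds an $N$-way 1-shot episode from an unlabeled pool by treating each sampled example as its own pseudo-class. The construction (one example per pseudo-class, an optional perturbation to produce queries) is an illustrative assumption, not necessarily the exact sampling scheme of \umaml{}; the key property is that the labels $0, \dots, N-1$ carry no real-world meaning, only the grouping within the episode matters.

```python
# Sketch of unsupervised episode construction with arbitrary pseudo-labels.
# The perturbation function and one-example-per-class scheme are assumptions.
import random

def make_episode(unlabeled_pool, n_way=20, perturb=lambda x: x):
    """Build an n-way 1-shot episode from unlabeled data.
    Each sampled example becomes its own pseudo-class; labels are arbitrary."""
    samples = random.sample(unlabeled_pool, n_way)
    support = [(x, label) for label, x in enumerate(samples)]
    # Queries are perturbed copies of the support examples (e.g. augmentations).
    query = [(perturb(x), label) for label, x in enumerate(samples)]
    return support, query

pool = list(range(1000))  # stand-in for a pool of unlabeled images
support, query = make_episode(pool)
```

Episodes built this way can be fed to the same meta-learning loop as in the supervised case, which is what allows a model trained without any real labels to be evaluated directly on real-labeled benchmarks such as Omniglot.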

Supervisors: Amos Storkey & Tim Hospedales