William Toner

Learning and using Data Symmetries in Classification

The ultimate goal of Artificial General Intelligence (AGI) is to produce an intelligence capable of self-learning. That is, an AGI should be able to learn about and solve a wide variety of tasks without having been explicitly trained on them beforehand. Needless to say, this goal is some way off. In particular, most current machine learning methods require a high degree of hand-holding: architectures and methods are typically crafted to solve specific problems and generalise poorly to other domains or tasks. A case in point is the typical approach to image classification, in which one takes a large collection of labelled images and uses them to train a convolutional neural network (CNN). The CNN is structurally imbued with translational equivariance, meaning that this essential property of image data is imposed on the network rather than learnt by it.
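To make the notion of translational equivariance concrete, the following is a minimal sketch in Python. It assumes a one-dimensional signal with circular (wrap-around) boundaries as a stand-in for an image; the helper names `roll`, `circular_conv` and `dense` are illustrative only and are not taken from any particular library. It checks that shifting the input and then convolving gives the same result as convolving and then shifting, and that a generic dense (MLP-style) layer does not share this property.

```python
import random


def roll(xs, shift):
    """Cyclically shift a list to the right by `shift` positions."""
    n = len(xs)
    return [xs[(i - shift) % n] for i in range(n)]


def circular_conv(signal, kernel):
    """Slide `kernel` over `signal`, wrapping around at the boundary."""
    n = len(signal)
    return [
        sum(k * signal[(i + j) % n] for j, k in enumerate(kernel))
        for i in range(n)
    ]


def dense(weights, signal):
    """A fully connected linear layer: one independent weight row per output."""
    return [sum(w * x for w, x in zip(row, signal)) for row in weights]


def close(a, b, tol=1e-9):
    """Element-wise comparison of two equal-length lists."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


random.seed(0)
n, shift = 16, 5
signal = [random.gauss(0, 1) for _ in range(n)]
kernel = [random.gauss(0, 1) for _ in range(3)]
weights = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]

# Convolution commutes with translation: the symmetry is built in.
conv_equivariant = close(
    roll(circular_conv(signal, kernel), shift),
    circular_conv(roll(signal, shift), kernel),
)

# A dense layer with unconstrained weights has no such guarantee.
dense_equivariant = close(
    roll(dense(weights, signal), shift),
    dense(weights, roll(signal, shift)),
)

print("convolution equivariant:", conv_equivariant)
print("dense layer equivariant:", dense_equivariant)
```

The convolution shares one small kernel across all positions, which is what forces the equivariance; the dense layer would have to learn that weight-sharing pattern from data.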

At the same time, it is important to remember that there is no such thing as a free lunch (Wolpert and Macready, 1997) and that no progress can be made on any task without the application of inductive biases. If we instead construct networks with few or no inbuilt assumptions about the domain, we can expect poor performance. In image classification this is seen in the inferior performance of multi-layer perceptrons (MLPs) relative to CNNs, despite the former being, in principle, able to represent a broader class of functions. How, then, can we reconcile the apparent tension between these two facts? On the one hand, we cannot make progress on a task without applying inductive biases. On the other, we wish to design systems which can operate in a largely autonomous way.

One potential resolution is to take a more hierarchical approach. Rather than building systems which directly exploit a known symmetry in the data, we give the system the ability to look for symmetries. The prior thereby shifts from being explicit and task-dependent to being a broader method applicable to any problem. For example, in classification, rather than supplying the explicit information that the class of an image is translation invariant, we can attempt to build models which look for and utilise class-invariant symmetries. The goal of this PhD is to build methods which discover and exploit data invariances in a single pipeline. Much of this work aims to build on existing work which utilises invariances to achieve superior performance.
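As a purely illustrative sketch of what "looking for" a class-invariant symmetry could mean, the Python snippet below scores candidate transformations by how often they leave a classifier's prediction unchanged. Everything in it is a hypothetical toy and not the method developed in this project: the classifier `toy_classifier`, the candidate transformations `shift_by_two` and `invert`, and the score `invariance_rate` are assumptions introduced only for this example.

```python
import random


def toy_classifier(x):
    """Hypothetical classifier: class 1 if total intensity is at least 32."""
    return 1 if sum(x) >= 32 else 0


def shift_by_two(x):
    """Candidate symmetry: cyclic translation by two positions."""
    n = len(x)
    return [x[(i - 2) % n] for i in range(n)]


def invert(x):
    """Candidate symmetry: intensity inversion on a 0..9 scale."""
    return [9 - v for v in x]


def invariance_rate(classifier, transform, data):
    """Fraction of inputs whose predicted class survives the transform."""
    unchanged = sum(classifier(transform(x)) == classifier(x) for x in data)
    return unchanged / len(data)


random.seed(0)
# 200 toy "images": 7 integer pixels each, with values in 0..9.
data = [[random.randint(0, 9) for _ in range(7)] for _ in range(200)]

shift_rate = invariance_rate(toy_classifier, shift_by_two, data)
invert_rate = invariance_rate(toy_classifier, invert, data)

print("translation invariance rate:", shift_rate)
print("inversion invariance rate:  ", invert_rate)
```

Here translation scores 1.0 because it only reorders pixels and so preserves the total intensity, whereas inversion maps a total of s to 63 - s and therefore always crosses the threshold. A score of this kind could be used to retain the former as a class-invariant symmetry and discard the latter.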

Supervisors: Amos Storkey &