Miguel Jacques

Towards better extrapolation using physical and symbolic biases

As neural networks become more complex and their range of applications broader, their failure modes also become more visible. These include susceptibility to adversarial examples and difficulty incorporating discrete quantities, among many others. Another issue is sane generalization to out-of-domain examples, or extrapolation, which is what we attempt to tackle. Here we define two types of extrapolation: spatial and temporal. Spatial extrapolation refers to the behavior of feedforward networks when given inputs outside the training domain, such as an image with a known pattern but an unseen color, or an image with a known color but an unseen orientation. It also applies to non-perceptual cases, where we want to perform numerical operations between vectors, but at test time these vectors have magnitudes larger than those seen during training. This is a plausible case when performing arithmetic or logical operations [Trask et al., 2018].

Temporal extrapolation refers to the behavior of recurrent/autoregressive networks when used to make predictions over time intervals longer than those seen during training. Examples include motion prediction from position vectors or video (e.g. [Watters et al., 2017]), or accumulation of values over time, where these values can grow beyond those seen during training.

Extrapolation is a fundamentally ill-posed problem, because we are trying to model the data distribution in a region of input space for which we have no data. It is nevertheless an important problem to tackle if we want to build networks that behave in a sane and expected manner (under our human notion of what “expected” behavior means) when deployed in the real world, where they will certainly encounter cases that fall outside the training domain. This becomes particularly important when training data is limited or incomplete.

We, as humans, have some built-in (or learned) priors about the behavior of objects in the world. For example, we know that certain functions govern most of the motion or design patterns we see (lines, parabolas, etc.), which we can rely on to make accurate predictions with limited information. We argue that providing such inductive biases to neural network systems, or giving them the ability to learn such biases, will improve their ability to extrapolate.

Improving the extrapolation ability of a model, be it in space or time, will necessarily involve a trade-off between generality and specificity. To make a model extrapolate better, we have to incorporate more of our human knowledge about the problem in order to constrain the model, so that we have some guarantees about how it will behave when faced with an unseen input. The challenge is to design the inductive biases so that the model retains some learning flexibility, i.e., we do not include inductive biases so strong that the whole model becomes trivially replaceable by a hand-coded solution (as is the case in some of the applications of [Trask et al., 2018]).
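
As a rough illustration of this trade-off, the following toy sketch (not a method from this project) compares two regressors trained on a parabolic trajectory observed only for t in [0, 1]. One model carries a parabola-shaped inductive bias; the other is a generic kernel smoother with no such bias. The trajectory, its constants, the bandwidth, and all function names are hypothetical choices made for the example.

```python
import math


def true_fn(t):
    # Hypothetical ground truth: height of a thrown ball (a parabola in t).
    return 5.0 * t - 4.9 * t * t


# Training data covers only the interval t in [0, 1].
train_t = [i / 10 for i in range(11)]
train_y = [true_fn(t) for t in train_t]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_quadratic(ts, ys):
    """Strong inductive bias: least-squares fit of y = c0 + c1*t + c2*t^2."""
    feats = [[1.0, t, t * t] for t in ts]
    ata = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    aty = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(3)]
    c0, c1, c2 = solve_linear(ata, aty)
    return lambda t: c0 + c1 * t + c2 * t * t


def fit_kernel_smoother(ts, ys, bandwidth=0.2):
    """Weak inductive bias: Gaussian-weighted average of the training targets."""
    def predict(t):
        w = [math.exp(-0.5 * ((t - ti) / bandwidth) ** 2) for ti in ts]
        return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    return predict


quad_model = fit_quadratic(train_t, train_y)
smooth_model = fit_kernel_smoother(train_t, train_y)

# Compare inside the training domain (t = 0.5) and outside it (t = 2.0).
for t in (0.5, 2.0):
    print(f"t={t}: true={true_fn(t):.3f}  "
          f"quadratic={quad_model(t):.3f}  smoother={smooth_model(t):.3f}")
```

Both models track the trajectory reasonably inside the training interval, but at t = 2 only the quadratic model follows the true curve. The kernel smoother is a weighted average of training targets, so its prediction can never leave the range of values seen during training. The quadratic model extrapolates well only because its functional form was chosen by hand to match the data, which is exactly the loss of generality described above.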

Supervisors: Tim Hospedales & Chris Williams