Current approaches to machine learning, particularly deep learning, are often successful on single tasks for which data is plentiful. However, we may want one model to perform well across many (usually related) tasks, for example when storage is constrained and we can only keep a single, all-purpose model. This situation might arise if we wanted a single computer vision system on a mobile phone to, say, recognise objects, create a depth map of the surroundings, and suggest camera settings. Additionally, multi-task systems are a step towards more ‘generally intelligent’ systems that are not constrained to a particular task. We will explore how to efficiently add parameters to a pretrained system so that it performs well on many tasks. We will also consider how to choose an informative subset of data, and a good training procedure, so that new tasks can be added to a multi-task model without forgetting old ones.
Methods for avoiding the forgetting of old tasks have been explored in the setting where new tasks arrive sequentially, but less so in the multi-task setting, where we assume some access to old data. New challenges may arise because the tasks we consider are more complicated than those in most previous work. Adding parameters efficiently has also been explored, specifically in computer vision, and we will build on some of this work by adapting it to new model architectures and considering alternatives.
As a starting point we will take a language processing model that has been pretrained on a very large corpus of English text and is available to download. This model has previously been fine-tuned on text classification tasks, achieving strong results. We intend to find the best way to add a small number of parameters and obtain good performance in a multi-task setting, on many text classification tasks at once. Future work will concentrate on generalising to other architectures and on finding the best training procedures for adding new tasks to a pretrained system.