Data science is the study of the computational principles, methods, and systems for extracting knowledge from data. Large data sets are now generated by almost every activity in science, society, and commerce — ranging from molecular biology to social media, from sustainable energy to health care.

Artificial Intelligence (AI) is the study of computational systems that demonstrate capabilities of perception, reasoning, learning, and action that are typical of human intelligence. Recent years have also seen explosive growth in the capabilities of modern AI technologies. The great recent successes of modern AI, such as object recognition and game playing, are based on data-driven approaches rooted in machine learning and deep networks.

The CDT recognises that researchers who wish to be leaders in AI and to shape the technology landscape must be deeply conversant with both data science and AI. Only then will they understand how to research, develop, and direct the technologies that will deliver pioneering breakthroughs.

Data science asks: how can we efficiently find patterns in these vast streams of data? Many research areas have tackled parts of this problem: machine learning and artificial intelligence provide methods for finding patterns and making predictions and decisions from data; databases are needed for efficiently accessing data and ensuring its quality; statistics and optimization provide fundamental mathematical ideas and methods; ideas from algorithms are required to build systems that scale to big data streams; and natural language processing, computer vision, and speech processing are each needed to analyse different types of unstructured data. Recently, these distinct disciplines have begun to converge into a single field called data science. At the same time, modern AI is itself highly dependent on data science: neither the collection of the right data nor the powerful methods for analysing it to build automated systems would be progressing as they are without the data science and machine learning that underpin them.

The vast changes happening in data science and AI raise pressing ethical questions about the societal impact of their methods and applications. These questions cannot be separated from the research: they are a fundamental part of it.

The Centre for Doctoral Training (CDT) in Data Science and AI, like the EPSRC CDT in Data Science before it, is based at the University of Edinburgh. These CDTs are training a new generation of data scientists and AI practitioners and researchers. Students develop the technical skills and interdisciplinary awareness necessary to become R&D leaders in this emerging area. The first cohort of the EPSRC Centre for Doctoral Training in Data Science started the programme in September 2014, and the new CDT in Data Science and AI continues and extends this programme into the future.