Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder estimated to affect about 1% of people worldwide. ASD is a term used to describe a set of conditions characterised by early-appearing social communication deficits and unusually restricted, repetitive behaviours and interests. There is a considerable genetic component to ASD risk (74% to 95%, according to one meta-analysis), yet the strongest identifiable contributors of genetic risk are de novo mutations found in fewer than 1% of individuals with ASD. A large number of heterogeneous individual genetic variants have been associated with ASD risk, but their individual effects are small. This heterogeneity, both genetic and in the characteristics (phenotypes) expressed, has made it challenging to uncover insights with clinical implications.
One potential avenue forward is the analysis of phenotypic data. Recent years have seen the emergence of detailed phenotype data, thanks to the use of Electronic Health Records (EHRs) and progress in natural language processing methods. EHRs have become ubiquitous, and automatic extraction from them has facilitated the curation of cohorts with rich phenotype information. By analysing phenotype data, we hope to uncover new insights into ASD and answer a number of research questions. For example, is ASD genuinely a spectrum? If not, what defines the categories into which it can be stratified? Given a new patient with minimal phenotyping, can we place them into a patient network and gain insight into their prognosis or potential missing features? Can we learn which phenotypes are informative in patient networks for learning tasks such as link prediction, node classification and missing-feature imputation?
By representing patient phenotype data as a network (graph), we can naturally capture the relationships between entities and take advantage of the plethora of graph algorithms available to answer these kinds of research questions. Furthermore, the techniques we develop to analyse and learn from these patient networks need not be limited to ASD; the prevalence of EHRs allows similar questions to be answered for other neurological disorders and beyond. This is the area in which my PhD research will be situated: using methods from network analysis and machine learning to investigate ASD through patient phenotype data. I hope to generate new insights that will benefit patients and clinicians, as well as researchers.
Supervisors: Ian Simpson & Colin McLean