Mica Teo

Bayesian nonparametric scalar-on-image regression models: Identifying brain regions of interest in Alzheimer’s disease



Alzheimer’s disease (AD) is a damaging brain disease and an increasing burden on society. Unfortunately, a definitive diagnosis typically cannot be made until autopsy, as it requires histopathologic examination of brain tissue, an invasive procedure. Vast amounts of clinical, biological and neuroimaging data for studying AD are being collected through projects such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the UK Biobank. The overall goal of this project is to use these large and diverse data to improve diagnosis and understanding of the disease, particularly in its early stages, when a diagnosis is most critical and any proposed drugs or therapies are most likely to be effective.

To capture the complex pattern of changes associated with AD and improve diagnostic accuracy in the early stages, when changes can be very subtle, there is a strong need to investigate new approaches that use the entire brain image. However, the massive dimension of the images, often in the millions, combined with the relatively small sample size, usually in the hundreds at best, poses serious challenges. In the statistical literature, diagnosing AD based on neuroimages can be framed as a scalar-on-image (SI) regression problem. SI regression belongs to the “large p, small n” paradigm; thus, many SI models utilize shrinkage methods that additionally incorporate the spatial information in the image. In the SI regression problem, each covariate represents the image value at a single voxel, i.e. a very tiny part of the brain, and its effect on the response is most often weak, unreliable and uninterpretable. Moreover, neighbouring voxels are highly correlated, making standard regression methods, even with shrinkage, problematic due to multicollinearity.
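For concreteness, the SI regression problem described above can be written in a generic form. The notation here is an assumption made for illustration, not taken from the project: y_i is the scalar response (e.g. diagnosis) for subject i, x_{ij} is the image value at voxel j, \beta_j is the voxel-level coefficient, \alpha is an intercept, and g is a link function (identity for a continuous response; logit or probit for a binary diagnosis).

```latex
g\bigl(\mathbb{E}[y_i \mid x_i]\bigr) = \alpha + \sum_{j=1}^{p} x_{ij}\,\beta_j,
\qquad i = 1, \dots, n, \qquad p \gg n .
```

With p in the millions and n in the hundreds, the p coefficients cannot be estimated without strong structural assumptions, which is where shrinkage and spatial information enter.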

To overcome these difficulties, this project develops novel SI regression models that group together voxels with similar effects on the response so that they share a common coefficient within the SI model. The proposed model for clustering voxels utilises the spatial coordinates of the voxels to enforce that groups represent spatially contiguous regions. In this case, features represent brain regions, which are automatically defined to be the most discriminative based on the chosen classifier. This not only improves the signal and eases interpretability, but also addresses the multicollinearity problem and reduces the computational burden by drastically decreasing the image dimension. In particular, I will develop novel Bayesian nonparametric (BNP) models; BNP is an exciting and expanding field characterized by flexible models whose complexity adapts to the data. Advantages of the proposed BNP approach include allowing the data to determine the number of regions; quantification of uncertainty in the diagnosis and other unknowns, such as the number of regions; and incorporation of prior knowledge from previous studies or the expertise of doctors and clinicians.
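The dimension reduction obtained by sharing a coefficient within each region can be illustrated with a small numerical sketch. This is not the proposed BNP model: the partition below is fixed by hand (four quadrants of a toy 8 x 8 image), whereas the project would infer both the number of regions and their membership from the data. All names (z, theta, M, X_region) and all sizes are hypothetical. The sketch only checks that, once every voxel in a region shares one coefficient, the voxel-level linear predictor equals a regression on region-wise sums of the image.

```python
import numpy as np

rng = np.random.default_rng(0)

n, side = 50, 8          # hypothetical: 50 subjects, 8 x 8 toy "image"
p = side * side          # number of voxels (covariates)

# Hypothetical fixed partition into K = 4 spatially contiguous quadrants.
# In the proposed approach the partition would be unknown and inferred.
rows, cols = np.divmod(np.arange(p), side)
z = 2 * (rows >= side // 2) + (cols >= side // 2)   # region label per voxel
K = int(z.max()) + 1

theta = np.array([0.0, 1.5, 0.0, -2.0])   # one coefficient per region
beta = theta[z]                           # implied voxel-level coefficients

X = rng.normal(size=(n, p))               # simulated image data, one row per subject

# Membership matrix M (p x K): M[j, k] = 1 if voxel j belongs to region k.
M = np.eye(K)[z]
X_region = X @ M                          # region-wise sums, shape (n, K)

# The linear predictor is identical at voxel level and region level.
eta_voxel = X @ beta
eta_region = X_region @ theta
assert np.allclose(eta_voxel, eta_region)

print(X.shape, "->", X_region.shape)
```

The regression is then fitted on K region-level features instead of p voxel-level ones, which is the source of both the computational saving and the reduced multicollinearity mentioned above.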

The aim of this research is to develop scalable BNP SI regression models for automatic identification of brain regions to diagnose AD. To this end, the three goals are methodological, computational and applied.

1. Methodological: a novel BNP SI regression model that clusters voxels into spatially contiguous regions.

2. Computational:

(a) posterior inference through Markov chain Monte Carlo (MCMC);

(b) approximate inference through scalable approaches, including maximum a posteriori (MAP) and variational Bayes (VB).

3. Applied: an AD study for diagnosis based on magnetic resonance imaging (MRI) data.

Supervisors: Sara Wade & Vanda Inacio De Carvalho