Chaudhury, Sultan Raja
(2024)
Investigating polygenic risk scores in Alzheimer's disease.
PhD thesis, University of Nottingham.
Abstract
Alzheimer’s disease (AD), the most common cause of dementia, is one of the most studied diseases in the UK because of its impact on quality of life, its neurodegenerative symptoms, and the burden it places on health and social care. AD is most common in older people, as age is a significant risk factor, but it can also occur in younger people. Diagnosis and treatment have developed over the years through improved therapies and screening methods, but no cure or definitive means of disease prevention has been found.
Polygenic risk scoring is a relatively new approach, enabled by advances in genotyping and sequencing technologies. Polygenic risk scores quantify an individual's risk from variants with an observed effect in genes associated with the disease, using effect sizes estimated by genome-wide association studies (GWAS) of cases and controls. Models are built at a range of significance (p-value) thresholds to identify the threshold that gives the greatest predictive ability. Polygenic risk scoring has become increasingly popular as a tool for screening research cohorts, selecting candidates for trials, and further understanding complex genetic diseases and the relationships between endophenotypes and disease status.
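As a rough illustration of the scoring step, a polygenic score can be computed as the sum of GWAS effect sizes multiplied by each individual's effect-allele dosages. Only SNPs whose GWAS p-value falls below a chosen threshold are included, and each threshold yields one candidate model. The sketch below is a minimal version under stated assumptions. The SNP IDs, effect sizes and p-values are hypothetical. It also omits the linkage-disequilibrium clumping step that PRSice-2 performs before thresholding.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical GWAS summary statistics: SNP ID -> (effect size, p-value).
# Effect sizes are per copy of the effect allele (e.g. log odds ratios).
Stats = Dict[str, Tuple[float, float]]
# Genotype dosages for one individual: SNP ID -> effect-allele dosage (0 to 2).
Dosages = Dict[str, float]


def polygenic_score(dosages: Dosages, stats: Stats, p_threshold: float) -> float:
    """Sum of effect size x dosage over SNPs whose GWAS p-value is below p_threshold."""
    score = 0.0
    for snp, (beta, p_value) in stats.items():
        # SNPs missing from this individual's genotypes contribute nothing here.
        if p_value < p_threshold and snp in dosages:
            score += beta * dosages[snp]
    return score


def scores_by_threshold(
    samples: Sequence[Dosages], stats: Stats, thresholds: Sequence[float]
) -> Dict[float, List[float]]:
    """Score every sample at each p-value threshold (one candidate model per threshold)."""
    return {t: [polygenic_score(d, stats, t) for d in samples] for t in thresholds}


if __name__ == "__main__":
    # Hypothetical example data.
    stats = {"rs_a": (0.5, 1e-9), "rs_b": (0.2, 0.01), "rs_c": (-0.1, 0.3)}
    samples = [{"rs_a": 2, "rs_b": 1, "rs_c": 0}, {"rs_a": 0, "rs_b": 2, "rs_c": 2}]
    for t, scores in scores_by_threshold(samples, stats, [5e-8, 0.05, 0.5]).items():
        print(t, scores)
```

Relaxing the threshold adds more SNPs to the score. In this toy data, the second individual's score rises at 0.05 and then falls again at 0.5, because rs_c has a negative effect.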
This project investigates polygenic risk scores in Alzheimer’s disease, analysing late-onset AD (LOAD) cases, controls and undiagnosed samples recruited by the Brains for Dementia Research resource; mild cognitive impairment (MCI) cases recruited by the Inflammation, Cognition and Stress study; and sporadic early-onset AD (sEOAD) cases and controls recruited from research centres across the UK.
Genetic data were obtained using either the NeuroX or NeuroChip array, quality controlled using recommended software and methodology, and imputed using the Michigan Imputation Server. Polygenic risk score (PRS) analysis was undertaken using PRSice-2 software, to calculate the likelihood of developing AD and to identify effective models for determining disease status in LOAD and sEOAD. The most predictive LOAD model was then used to predict likelihood in undiagnosed samples and MCI cases. A subset of genes expressed at the synapse was also analysed to assess its predictive ability in AD. This approach used the most up-to-date analysis software and improved data sources to build on previously published work.
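The abstract does not list the quality-control or imputation-filtering thresholds. The sketch below therefore uses common illustrative cut-offs for call rate and minor allele frequency, plus a filter on imputation quality (R²) of the kind the Michigan Imputation Server reports for imputed variants. The `VariantSummary` record and all three threshold values are hypothetical, not those used in the thesis.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class VariantSummary:
    """Hypothetical per-variant summary assembled before PRS analysis."""
    snp_id: str
    call_rate: float          # fraction of samples with a non-missing genotype
    minor_allele_freq: float
    imputed: bool
    r_squared: float          # imputation quality; ignored for genotyped variants


# Illustrative cut-offs only (assumed, not taken from the thesis).
MIN_CALL_RATE = 0.98
MIN_MAF = 0.01
MIN_IMPUTATION_RSQ = 0.3


def passes_qc(v: VariantSummary) -> bool:
    """Apply call-rate and MAF filters, plus an R-squared filter for imputed variants."""
    if v.call_rate < MIN_CALL_RATE or v.minor_allele_freq < MIN_MAF:
        return False
    # Only imputed variants are filtered on imputation quality.
    return not v.imputed or v.r_squared >= MIN_IMPUTATION_RSQ


def filter_variants(variants: Iterable[VariantSummary]) -> List[str]:
    """Return the IDs of variants that pass every filter, in input order."""
    return [v.snp_id for v in variants if passes_qc(v)]
```

Real pipelines also apply sample-level checks (for example sex concordance and relatedness) and Hardy-Weinberg tests, which this sketch leaves out.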
For LOAD, the updated PRS software identified a model with predictive ability similar to that previously reported (area under the precision-recall curve, AUPRC = 81.5%). Imputation identified additional variants passing the best-model threshold, which implicate more genes in AD risk.
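AUPRC summarises how precision (the fraction of predicted cases that are true cases) trades off against recall as the score cut-off is lowered. The abstract does not say how AUPRC was estimated. The sketch below assumes average precision, one common estimator. Ties are broken by input order, so the result is only exact when scores are distinct.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision, a common estimate of the area under the precision-recall curve.

    labels: 1 for cases, 0 for controls. Tied scores keep their input order
    (sorted() is stable), so this simple version is exact only for distinct scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_cases = sum(labels)
    if n_cases == 0:
        raise ValueError("average precision is undefined without any cases")
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where this case is recovered.
            precision_sum += true_positives / rank
    return precision_sum / n_cases
```

Unlike AUROC, the AUPRC of a random score is roughly the proportion of cases in the sample. An AUPRC value should therefore be read against that baseline.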
For sEOAD, the updated PRS software also confirmed a model with predictive ability similar to that previously reported (area under the receiver operating characteristic curve, AUROC = 73.0%). Analysis of imputed data identified a predictive model (AUROC = 72.9%) at a more stringent p-value threshold, which also implicated many more genes in AD risk.
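AUROC can be read as the probability that a randomly chosen case has a higher score than a randomly chosen control. This is the Mann-Whitney formulation. The sketch below computes AUROC that way and picks the p-value threshold whose scores give the highest AUROC. Using AUROC as the selection criterion is an assumption for illustration. PRSice-2 itself selects its best-fit threshold from a regression model fit rather than from AUROC. The pairwise loop is O(cases × controls), which is fine for a sketch but slow for large cohorts.

```python
from typing import Dict, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUROC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def best_threshold(scores_by_threshold: Dict[float, Sequence[float]],
                   labels: Sequence[int]) -> float:
    """Return the p-value threshold whose scores give the highest AUROC."""
    return max(scores_by_threshold, key=lambda t: auroc(scores_by_threshold[t], labels))
```

When two thresholds tie on AUROC, `max` returns the first one encountered. A caller preferring sparser models could list the most stringent threshold first.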
PRS analysis of synaptic genes using the updated PRS software showed greater predictive ability for LOAD than previously reported, while requiring fewer SNPs (AUPRC = 85.5%). A predictive model was also found for sEOAD (AUROC = 72.5%), and for LOAD and sEOAD cases combined (AUROC = 74.2%; AUPRC = 77.5%).
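The abstract does not describe how synaptic genes were mapped to SNPs. One simple approach keeps only SNPs that fall inside the genomic coordinates of the chosen genes, optionally extended by a flanking window. The restricted SNP set is then scored in the same way as the genome-wide model. In the sketch below, the gene names, coordinates and the default flank of 0 bp are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical gene regions: gene -> (chromosome, start, end) in base pairs.
GeneRegions = Dict[str, Tuple[str, int, int]]
# Hypothetical SNP positions: SNP ID -> (chromosome, position).
SnpPositions = Dict[str, Tuple[str, int]]


def snps_in_gene_set(snps: SnpPositions, genes: GeneRegions,
                     flank_bp: int = 0) -> List[str]:
    """SNPs lying within any gene region, optionally extended by flank_bp on each side."""
    selected = []
    for snp_id, (chrom, pos) in snps.items():
        for g_chrom, start, end in genes.values():
            if chrom == g_chrom and start - flank_bp <= pos <= end + flank_bp:
                selected.append(snp_id)
                break  # one matching gene is enough
    return selected
```

For example, with hypothetical genes `{"GENE_A": ("1", 1000, 2000)}`, a SNP at chromosome 1, position 1500 is kept. A SNP at position 2500 is kept only if `flank_bp` is at least 500.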
Applying the best LOAD model to MCI cases and undiagnosed samples successfully stratified individuals into tiers of risk. When MCI samples were cross-referenced with their most recent conversion status, converters were found across all risk tiers, most often in the moderate tier, followed by the high tier.
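The abstract does not define how the risk tiers were set. The sketch below assumes a hypothetical three-tier scheme with cut-offs at the tertiles of a reference score distribution, for example scores from the cohort used to build the model. New samples must be scored with the same SNPs, weights and p-value threshold for their scores to be comparable with that reference.

```python
import statistics
from bisect import bisect_right
from typing import List, Sequence

TIER_NAMES = ("low", "moderate", "high")  # hypothetical three-tier scheme


def tier_cutoffs(reference_scores: Sequence[float]) -> List[float]:
    """Tertile boundaries of a reference score distribution (two cut points)."""
    return statistics.quantiles(reference_scores, n=len(TIER_NAMES))


def assign_tiers(scores: Sequence[float], cutoffs: Sequence[float]) -> List[str]:
    """Map each score to a tier; a score equal to a cutoff goes to the higher tier."""
    return [TIER_NAMES[bisect_right(cutoffs, s)] for s in scores]
```

`statistics.quantiles` uses its default "exclusive" method here. Other quantile definitions would shift the cut points slightly for small reference samples.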
The identification of predictive models for LOAD and sEOAD that remain consistent despite changes in methods, the successful modelling of synaptic genes for LOAD and sEOAD, and the moderate success in stratifying risk in undiagnosed samples highlight the utility of PRS in AD research. Continued improvement of these analyses, through access to larger, more comprehensive datasets and advances in software and methods, can enable greater accuracy and utility. This can ultimately establish polygenic risk scoring as a mechanism for understanding genetic risk for AD and other dementia subtypes, and support further research in other complex diseases.