Models and methods to integrate epidemiological and whole genome sequence data for effectively analysing infectious disease outbreak data

Marsh, J. S. (2024) Models and methods to integrate epidemiological and whole genome sequence data for effectively analysing infectious disease outbreak data. PhD thesis, University of Nottingham.

Available under Licence Creative Commons Attribution.

Abstract

Advances in sequencing technology and the reduction in associated costs have enabled scientists to obtain highly detailed genomic data on disease-causing pathogens on a scale never seen before. Combining genomic data with traditional epidemiological data (e.g. incidence data) provides a unique opportunity to determine the actual transmission pathway of the pathogen through a population. Despite recent advances, existing approaches have limitations, such as simplifications to the underlying biological processes, arbitrary phenomenological models, or approximations to the likelihood function.



We present a novel modelling framework for integrating epidemiological and whole genome sequence data that overcomes the above limitations: (i) we use the matrix of pairwise genetic distances between sequences as a summary statistic for the genetic data, and (ii) we explicitly derive the joint probability distribution of pairwise genetic distances under the assumption of microevolution mutation models. We develop bespoke and computationally efficient data-augmentation MCMC algorithms to infer the transmission network, infection times and unobserved genetic distances from pathogen sequences at the time of transmission.
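As a rough illustration of the summary statistic in (i), the sketch below builds a pairwise distance matrix from a few toy sequences. It assumes the distance between two sequences is the number of sites at which they differ (a Hamming or SNP distance) and that the sequences are already aligned to equal length; the abstract does not state the exact definition used in the thesis, and the function names and toy data here are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distance_matrix(seqs):
    """Return the symmetric matrix of pairwise distances between sequences."""
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = hamming(seqs[i], seqs[j])
    return d


# Toy aligned sequences (illustrative only, not real pathogen data).
toy = ["ACGTACGT", "ACGTACGA", "ACCTACGA"]
matrix = pairwise_distance_matrix(toy)
for row in matrix:
    print(row)
```

The matrix is symmetric with a zero diagonal, so only the upper triangle carries information; an inference scheme that works from this summary never needs the raw sequences themselves.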



The framework is general and applicable to a variety of outbreak scenarios. For example, we consider a discrete-time transmission model for healthcare-associated infections, demonstrate the performance of our framework on simulated data, and analyse an outbreak of S. aureus in an intensive care unit in Brighton during 2011-2012. Our approach integrates healthcare worker data at an individual level and considers the possibility of multiple distinct genetic subtypes.
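To give a feel for what a discrete-time transmission model looks like, here is a minimal sketch of a generic chain-binomial ward model. It is not the model from the thesis: it assumes a closed ward with no admissions, discharges or healthcare workers, and a daily colonisation probability of 1 - exp(-beta * C) for each susceptible patient, where C is the current number of colonised patients. The parameter values and function names are hypothetical.

```python
import math
import random


def colonisation_probability(beta: float, n_colonised: int) -> float:
    """Daily probability that one susceptible patient becomes colonised (assumed form)."""
    return 1.0 - math.exp(-beta * n_colonised)


def simulate_ward(n_patients, n_initial, beta, n_days, seed=0):
    """Simulate daily colonised counts in a closed ward (illustrative assumption)."""
    rng = random.Random(seed)
    colonised = n_initial
    history = [colonised]
    for _ in range(n_days):
        p = colonisation_probability(beta, colonised)
        susceptible = n_patients - colonised
        # Each susceptible patient is colonised independently with probability p.
        new_cases = sum(rng.random() < p for _ in range(susceptible))
        colonised += new_cases
        history.append(colonised)
    return history


history = simulate_ward(n_patients=20, n_initial=1, beta=0.05, n_days=14, seed=1)
print(history)
```

In a data-augmentation setting, the unobserved colonisation days would be treated as latent variables and updated within the MCMC scheme rather than simulated forward as above.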



Finally, we integrate genetic data with a continuous-time SEIR model and analyse an outbreak of foot-and-mouth disease in Darlington, a town in the north east of England, in 2001. We validate our inferred transmission network against previous modelling studies and demonstrate that pairwise genetic distance is an informative summary of the raw sequence data.

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Kypraios, T.
O'Neill, P. D.
Keywords: stochastic epidemic models; Bayesian inference; genetic data; transmission trees; MRSA; foot and mouth disease; pairwise genetic distance data; who infected whom
Subjects: R Medicine > RA Public aspects of medicine
Faculties/Schools: UK Campuses > Faculty of Science > School of Mathematical Sciences
Item ID: 77270
Depositing User: Marsh, Joseph
Date Deposited: 24 Jul 2024 04:40
Last Modified: 24 Jul 2024 04:40
URI: https://eprints.nottingham.ac.uk/id/eprint/77270
