Artificial Intelligence for Chemical Synthesis: Improving the Workflow of Medicinal Chemists using Computer-Aided Synthesis Planning

Haywood, Alexe L. (2024) Artificial Intelligence for Chemical Synthesis: Improving the Workflow of Medicinal Chemists using Computer-Aided Synthesis Planning. PhD thesis, University of Nottingham.

PDF (Thesis - as examined)
Available under Licence Creative Commons Attribution.

Abstract

Machine learning techniques have numerous applications in modern drug discovery. Advances in computing power, machine learning algorithms and data availability have inspired renewed interest in artificial intelligence and automation in chemical synthesis. The field of Computer-Aided Synthesis Planning (CASP) aims to improve chemists’ workflow by shortening the time required to synthesise compounds, giving them more time to analyse and design future experiments. In this thesis, we review contemporary CASP methodologies before developing machine learning models to predict reaction yield. State-of-the-art approaches to forward reaction prediction and retrosynthetic analysis tasks are outlined and compared using quantitative metrics.

Predicting reaction yield is a newer aspect of CASP that has received significantly less attention than forward reaction prediction and retrosynthetic planning. This is owing, in part, to a lack of curated reaction data reporting reaction yield. Using a combinatorial benchmark dataset generated by high-throughput experimentation, we evaluate machine learning models to predict reaction yield. Our research focuses on linear, tree-based, and Support Vector Regression (SVR) machine learning algorithms. Chemical reactivity regression tasks frequently use molecular descriptors based on time-consuming, computationally demanding quantum chemical calculations. Along with quantum chemical descriptors, we investigate a range of topological representations that are quicker to calculate and applicable to all molecules. SVR emerges as the most promising machine learning model across all molecular descriptors in a preliminary cross-validation test evaluating interpolation.
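
To illustrate what a combinatorial dataset and an interpolation-style cross-validation test can look like, the following is a minimal sketch and not the procedure used in the thesis. The component names, the fold count, and the helper functions build_reactions and k_fold_indices are all hypothetical; the record above does not name the benchmark's components or the cross-validation settings.

```python
import itertools
import random

# Hypothetical reaction components; the real benchmark's components are not
# listed in this record, so these names are placeholders only.
CATALYSTS = ["cat_A", "cat_B", "cat_C"]
BASES = ["base_A", "base_B"]
SUBSTRATES = ["sub_A", "sub_B", "sub_C", "sub_D"]


def build_reactions():
    """Enumerate every combination of components (a full factorial design)."""
    return list(itertools.product(CATALYSTS, BASES, SUBSTRATES))


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


reactions = build_reactions()
splits = list(k_fold_indices(len(reactions), k=4))
```

Because the folds are drawn at random from a fully combinatorial set, a component that appears in a test fold will usually also appear in the training folds. A random split of this kind therefore mainly probes interpolation between known components.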

Rigorous out-of-sample tests are designed to reliably assess the extrapolation capabilities of the most promising SVR models. The performance of SVR models built on topological representations surpasses that of models constructed on quantum chemical descriptors. The top SVR models built on each descriptor are subjected to additional validation. A collection of previously unseen prospective chemical reactions is compiled. Predictions are presented for synthetic assessment to validate and explore the extent of the generalisability of the top SVR models.
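
One common way to build an out-of-sample test on a combinatorial dataset is to hold out every reaction that contains a given component, so that the model is evaluated on a component it has never seen. The sketch below assumes this leave-one-component-out design; the thesis's actual test design is not described in this record, and the component names and the helper leave_one_component_out are hypothetical.

```python
import itertools

# Hypothetical reaction components; placeholders, not the benchmark's own.
CATALYSTS = ["cat_A", "cat_B", "cat_C"]
BASES = ["base_A", "base_B"]
SUBSTRATES = ["sub_A", "sub_B", "sub_C", "sub_D"]

reactions = list(itertools.product(CATALYSTS, BASES, SUBSTRATES))


def leave_one_component_out(reactions, position):
    """Yield (held_out, train, test) splits for one component slot.

    `position` selects the slot in each reaction tuple (here 0 = catalyst,
    1 = base, 2 = substrate). Every reaction using the held-out component
    goes to the test set; all remaining reactions form the training set.
    """
    levels = sorted({reaction[position] for reaction in reactions})
    for held_out in levels:
        test = [r for r in reactions if r[position] == held_out]
        train = [r for r in reactions if r[position] != held_out]
        yield held_out, train, test


# Hold out each substrate in turn.
substrate_splits = list(leave_one_component_out(reactions, position=2))
```

Unlike a random split, no training reaction shares the held-out component with the test set, so the score reflects extrapolation to an unseen component and not interpolation among known ones.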

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Hirst, Jonathan
Keywords: machine learning, Computer-Aided Synthesis Planning, drug discovery, chemical synthesis
Subjects: Q Science > Q Science (General); R Medicine > R Medicine (General) > R855 Medical technology. Biomedical engineering. Electronics; R Medicine > RS Pharmacy and materia medica
Faculties/Schools: UK Campuses > Faculty of Science > School of Chemistry
Item ID: 77169
Depositing User: Haywood, Alexe
Date Deposited: 24 Jul 2024 04:40
Last Modified: 24 Jul 2024 04:40
URI: https://eprints.nottingham.ac.uk/id/eprint/77169
