Efficient Training and Implementation of Gaussian Process Potentials

Broad, Jack W. (2022) Efficient Training and Implementation of Gaussian Process Potentials. PhD thesis, University of Nottingham.

PDF (Final copy of thesis, complete with all corrections) (Thesis - as examined). Available under Licence Creative Commons Attribution.

Abstract

Molecular simulations are a powerful tool for translating information about the intermolecular interactions within a system to thermophysical properties via statistical mechanics. However, the accuracy of any simulation is limited by the potentials that model the microscopic interactions. Most first principles methods are too computationally expensive for use at every time-step or cycle of a simulation, which typically requires thousands of energy evaluations. Meanwhile, cheaper semi-empirical potentials give rise to only qualitatively accurate simulations. Consequently, methods for efficient first principles predictions in simulations are of interest.

Machine-learned potentials (MLPs) have shown promise in this area, offering first principles predictions at a fraction of the cost of ab initio calculation. Of particular interest are Gaussian process (GP) potentials, which achieve equivalent accuracy to other MLPs with smaller training sets. They therefore offer the best route to employing information from expensive ab initio calculations, for which building a large data set is time-consuming.

GP potentials, however, are among the most computationally intensive MLPs. Thus, they are far more costly to employ in simulations than semi-empirical potentials. This work addresses the computational expense of GP potentials by both reducing the training set size at a given accuracy and developing a method to invoke GP potentials efficiently for first principles prediction in simulations.
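To illustrate where this expense arises, the short Python sketch below walks through standard GP regression on a one-dimensional pair potential. The kernel choice, training separations, stand-in energies and hyperparameters are all illustrative assumptions rather than details from the thesis: the point is only that training requires factorising an n-by-n kernel matrix (an O(n^3) step) and that every subsequent prediction needs kernel evaluations against all n training points.

import numpy as np

def sq_exp_kernel(x1, x2, length=1.0, variance=1.0):
    # Squared-exponential kernel between two 1-D arrays of separations.
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length**2)

# Hypothetical training data: separations and energies from a stand-in
# Lennard-Jones form, used here in place of real ab initio values.
R = np.linspace(3.0, 8.0, 20)
E = 4.0 * ((3.4 / R) ** 12 - (3.4 / R) ** 6)

# Training: factorise the n x n kernel matrix, an O(n^3) operation.
K = sq_exp_kernel(R, R) + 1e-6 * np.eye(len(R))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, E))

def gp_mean(r_new):
    # Prediction: O(n) kernel evaluations against all training points.
    k_star = sq_exp_kernel(np.atleast_1d(r_new), R)
    return (k_star @ alpha)[0]

print(gp_mean(4.0))

Because a simulation requests many such predictions per move, this per-prediction cost, and not only the one-off training cost, is what makes GP potentials expensive to deploy.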

By varying the cross-over distance between the GP and a long-range function in line with the accuracy of the GP, training by sequential design requires up to 40% fewer training points at fixed accuracy. This method was applied successfully to the CO-Ne, HF-Ne, HF-Na+, CO2-Ne, 2CO, 2HF and 2HCl systems, and can be extended easily to other interactions and methods of prediction. Meanwhile, a significant reduction in the time taken for Monte Carlo displacement and volume-change moves is achieved by parallelising the requisite GP calculations. Although this exploits, in part, the framework of GP regression, the distribution of the calculations themselves is general to other methods of prediction. The work also shows that current kernels and input transforms for modelling intermolecular interactions are not easily improved.
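The cross-over idea can be sketched schematically as below. The -C6/r^6 tail, the hard switch at a cross-over separation r_cross and the placeholder GP mean are assumptions chosen only to show the structure of the scheme; the abstract does not specify these details.

import math

C6 = 50.0  # placeholder leading dispersion coefficient (assumed value)

def long_range(r):
    # Analytic asymptotic form: a simple -C6/r^6 dispersion tail.
    return -C6 / r**6

def blended_energy(r, gp_mean, r_cross):
    # Query the GP only inside the cross-over separation; beyond it the
    # cheap analytic tail is used, so the GP needs no training data there.
    return gp_mean(r) if r < r_cross else long_range(r)

# Placeholder GP mean (a shallow Gaussian well) standing in for a trained model.
gp_mean = lambda r: -0.01 * math.exp(-(r - 4.0) ** 2)

print(blended_energy(3.8, gp_mean, r_cross=7.0))  # GP region
print(blended_energy(9.0, gp_mean, r_cross=7.0))  # long-range region

Moving r_cross in step with the GP's current accuracy changes how much of the surface sequential design must sample, which is the mechanism behind the reduction in training points described above.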

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Graham, Richard S.
Wheatley, Richard J.
Keywords: Machine Learning, Applied Mathematics, Machine-learned potentials, Gaussian processes
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics > QA273 Probabilities
Faculties/Schools: UK Campuses > Faculty of Science > School of Mathematical Sciences
Item ID: 69868
Depositing User: Broad, Jack
Date Deposited: 15 Oct 2022 04:40
Last Modified: 15 Oct 2022 04:40
URI: https://eprints.nottingham.ac.uk/id/eprint/69868
