Statistical inference for molecular shapes

Czogiel, Irina (2010) Statistical inference for molecular shapes. PhD thesis, University of Nottingham.



This thesis is concerned with developing statistical methods for evaluating and comparing molecular shapes. Techniques from statistical shape analysis serve as a basis for our methods. However, as molecules are fuzzy objects of electron clouds which constantly undergo vibrational motions and conformational changes, these techniques should be modified to be more suitable for the distinctive features of molecular shape.

The first part of this thesis is concerned with the continuous nature of molecules. Based on molecular properties which have been measured at the atom positions, a continuous field-based representation of a molecule is obtained using methods from spatial statistics. Within the framework of reproducing kernel Hilbert spaces, a similarity index for two molecular shapes is proposed which can then be used for the pairwise alignment of molecules. The alignment is carried out using Markov chain Monte Carlo methods and posterior inference. In the Bayesian setting, it is also possible to introduce additional parameters (mask vectors) which allow for the fact that only part of the molecules may be similar. We apply our methods to a dataset of 31 steroid molecules which fall into three activity classes with respect to the binding activity to a common receptor protein. To investigate which molecular features distinguish the activity classes, we also propose a generalisation of the pairwise method to the simultaneous alignment of several molecules.
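To illustrate the general idea of a kernel-based similarity index, the following is a minimal sketch and not the index defined in the thesis. It assumes each molecule is represented by a field f(x) = sum_i w_i k(x, a_i), where a_i are atom positions, w_i are property values measured at the atoms, and k is a Gaussian kernel; the function names, the choice of kernel, the bandwidth, and the cosine-type normalisation are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between point sets a (n x 3) and b (m x 3)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def rkhs_inner(pos_a, w_a, pos_b, w_b, bandwidth=1.0):
    """RKHS inner product of two fields of the form sum_i w_i k(., a_i)."""
    return float(w_a @ gaussian_kernel(pos_a, pos_b, bandwidth) @ w_b)

def similarity_index(pos_a, w_a, pos_b, w_b, bandwidth=1.0):
    """Normalised inner product (hypothetical index), bounded by 1 in absolute value."""
    ab = rkhs_inner(pos_a, w_a, pos_b, w_b, bandwidth)
    aa = rkhs_inner(pos_a, w_a, pos_a, w_a, bandwidth)
    bb = rkhs_inner(pos_b, w_b, pos_b, w_b, bandwidth)
    return ab / np.sqrt(aa * bb)

# Toy data: random "atoms" with one property value each (not real molecules).
rng = np.random.default_rng(0)
positions = rng.normal(size=(6, 3))
weights = rng.normal(size=6)
shifted = positions + np.array([5.0, 0.0, 0.0])  # same molecule, badly aligned

s_self = similarity_index(positions, weights, positions, weights)
s_shifted = similarity_index(positions, weights, shifted, weights)
```

Under these assumptions, the index equals 1 when the two fields coincide and drops as one molecule is moved away from the other, which is why an index of this kind can serve as an objective for pairwise alignment over rotations and translations.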

The second part of this thesis is concerned with the dynamic aspect of molecular shapes. Here, we consider a dataset containing time series of DNA configurations which have been obtained using molecular dynamics simulations. For each considered DNA duplex, both a damaged and an undamaged version are available, and the objective is to investigate whether or not the damage induces a significant difference in the mean shape of the molecule. To do so, we consider bootstrap hypothesis tests for the equality of mean shapes. In particular, we investigate the use of a computationally inexpensive algorithm which is based on the Procrustes tangent space. Two versions of this algorithm are proposed. The first version is designed for independent configuration matrices while the second version is specifically designed to accommodate temporal dependence of the configurations within each group and is hence more suitable for the DNA data.
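As a rough illustration of a tangent-space bootstrap test for equal mean shapes, the sketch below covers only the case of independent configuration matrices; it is not the algorithm proposed in the thesis and does not handle temporal dependence. The function names, the test statistic (squared distance between group means in tangent coordinates), and the way the null hypothesis is imposed (centring each group before resampling) are assumptions chosen for simplicity.

```python
import numpy as np

def preshape(x):
    """Remove location and scale from a (landmarks x dims) configuration."""
    x = x - x.mean(axis=0)
    return x / np.linalg.norm(x)

def rotate_onto(x, ref):
    """Rotate x (no reflection) to minimise the distance to ref."""
    u, _, vt = np.linalg.svd(x.T @ ref)
    u[:, -1] *= np.sign(np.linalg.det(u @ vt))  # force a proper rotation
    return x @ (u @ vt)

def procrustes_align(configs, n_iter=10):
    """Simple generalised Procrustes alignment; returns shapes and unit-size mean."""
    shapes = np.array([preshape(x) for x in configs])
    mean = shapes[0]
    for _ in range(n_iter):
        shapes = np.array([rotate_onto(s, mean) for s in shapes])
        mean = preshape(shapes.mean(axis=0))
    return shapes, mean

def tangent_coords(shapes, mean):
    """Project aligned preshapes onto the tangent space at the mean."""
    flat = shapes.reshape(len(shapes), -1)
    m = mean.ravel()
    return flat - np.outer(flat @ m, m)

def bootstrap_mean_shape_test(group1, group2, n_boot=500, seed=0):
    """Two-sample bootstrap test in tangent space (independent samples only)."""
    rng = np.random.default_rng(seed)
    shapes, mean = procrustes_align(np.concatenate([group1, group2]))
    v = tangent_coords(shapes, mean)
    v1, v2 = v[: len(group1)], v[len(group1):]

    def stat(a, b):
        return np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2)

    observed = stat(v1, v2)
    # Impose the null hypothesis by giving both groups the same (zero) mean.
    c1, c2 = v1 - v1.mean(axis=0), v2 - v2.mean(axis=0)
    exceed = 0
    for _ in range(n_boot):
        b1 = c1[rng.integers(0, len(c1), len(c1))]
        b2 = c2[rng.integers(0, len(c2), len(c2))]
        exceed += stat(b1, b2) >= observed
    return observed, (exceed + 1) / (n_boot + 1)

# Toy data: 5 landmarks in 3D, small noise; the second group has one moved landmark.
rng = np.random.default_rng(1)
base = rng.normal(size=(5, 3))
moved = base.copy()
moved[0] += np.array([0.5, 0.0, 0.0])
same_a = base + 0.02 * rng.normal(size=(30, 5, 3))
same_b = base + 0.02 * rng.normal(size=(30, 5, 3))
diff_b = moved + 0.02 * rng.normal(size=(30, 5, 3))

_, p_same = bootstrap_mean_shape_test(same_a, same_b)
_, p_diff = bootstrap_mean_shape_test(same_a, diff_b)
```

Resampling individual configurations with replacement, as done here, treats the observations as independent; for time series such as the DNA simulations described above, a resampling scheme that respects the dependence within each group would be needed instead.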

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Dryden, I.L.; Brignell, C.J.
Subjects: Q Science > QA Mathematics > QA273 Probabilities
Faculties/Schools: UK Campuses > Faculty of Science > School of Mathematical Sciences
Item ID: 12217
Depositing User: EP, Services
Date Deposited: 02 Dec 2011 11:49
Last Modified: 16 Dec 2017 12:38
