Data mining techniques for protein sequence analysis

Hamby, Stephen Edward (2010) Data mining techniques for protein sequence analysis. PhD thesis, University of Nottingham.


Abstract

This thesis concerns two areas of bioinformatics that are related through their role in protein structure and function: protein structure prediction and the post-translational modification of proteins.

The backbone dihedral angles Ψ and Φ are predicted using support vector regression. For the prediction of Ψ, the effect of adding structural information is examined, as is the normalisation of the Ψ and Φ dihedral angles. An application of the predicted dihedral angles is also investigated. The Karplus equation describes the relationship between dihedral angles and three-bond J couplings determined from NMR experiments, and it can yield more than one angle for a given coupling constant. We investigate using predicted Φ dihedral angles to determine the correct solution of the Karplus equation.

Glycosylation is an important post-translational modification of proteins and is involved in many different facets of biology. This work investigates the prediction of N-linked and O-linked glycosylation sites using the random forest machine learning algorithm and pairwise patterns in the data. This methodology produces more accurate results than state-of-the-art prediction methods. The black-box nature of random forest is addressed by using the TREPAN algorithm to generate a decision tree with comprehensible rules that represents the decision-making process of the random forest. The predictions of our program, GPP, do not distinguish between glycans at a given glycosylation site, so we use farthest-first clustering with the aim of classifying each glycosylation site by the sugar that links the glycan to the protein.

This thesis demonstrates the prediction of protein backbone torsion angles and improves on the current state of the art for the prediction of glycosylation sites. It also investigates potential applications and the interpretation of these methods.
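The abstract does not say how the predicted Φ angle is used to pick a Karplus solution. One plausible approach is sketched below as an illustration only, not as the method used in the thesis. The sketch inverts the Karplus equation for a measured ³J(HN,Hα) coupling and keeps the root closest to a predicted Φ. The equation form J = A cos²θ + B cosθ + C with θ = Φ - 60°, and the coefficients A = 6.51, B = -1.76, C = 1.60 Hz (Vuister and Bax), are assumptions. The function names `karplus_j`, `karplus_solutions` and `pick_solution` are hypothetical.

```python
import math

# Assumed Karplus coefficients (Hz) for 3J(HN,Ha); theta = phi - 60 degrees.
A, B, C = 6.51, -1.76, 1.60


def wrap(angle):
    """Wrap an angle in degrees to the interval (-180, 180]."""
    a = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if a == -180.0 else a


def karplus_j(phi):
    """Forward Karplus equation: coupling constant (Hz) for a phi angle (degrees)."""
    c = math.cos(math.radians(phi - 60.0))
    return A * c * c + B * c + C


def karplus_solutions(j):
    """All phi angles (degrees) consistent with coupling j; up to four values."""
    # Substitute x = cos(theta): A*x^2 + B*x + (C - j) = 0.
    disc = B * B - 4.0 * A * (C - j)
    if disc < 0:
        return []
    roots = {(-B + s * math.sqrt(disc)) / (2.0 * A) for s in (1.0, -1.0)}
    phis = set()
    for x in roots:
        # Only roots with |x| <= 1 (small tolerance) correspond to real angles.
        if -1.0 - 1e-9 <= x <= 1.0 + 1e-9:
            t = math.degrees(math.acos(max(-1.0, min(1.0, x))))
            # cos(theta) = x has two solutions, theta = +t and theta = -t.
            for sign in (1.0, -1.0):
                phis.add(round(wrap(60.0 + sign * t), 6))
    return sorted(phis)


def circular_distance(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def pick_solution(j, phi_predicted):
    """Choose the Karplus solution nearest (circularly) to a predicted phi."""
    candidates = karplus_solutions(j)
    if not candidates:
        return None
    return min(candidates, key=lambda p: circular_distance(p, phi_predicted))
```

For example, `pick_solution(karplus_j(-65.0), -70.0)` should recover a Φ of about -65° from the four candidate angles. Circular distance is used because angles near -180° and 180° are physically close.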

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Hirst, J.D.; Besley, N.A.
Subjects: Q Science > QD Chemistry > QD241 Organic chemistry > QD415 Biochemistry; Q Science > QH Natural history. Biology > QH301 Biology (General)
Faculties/Schools: UK Campuses > Faculty of Science > School of Chemistry
Item ID: 11498
Depositing User: EP, Services
Date Deposited: 06 Apr 2011 14:14
Last Modified: 14 Oct 2017 13:39
URI: https://eprints.nottingham.ac.uk/id/eprint/11498
