An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome

Hindle, Matthew Morritt (2012) An integrated approach to enhancing functional annotation of sequences for data analysis of a transcriptome. PhD thesis, University of Nottingham.

Abstract

Given the ever-increasing quantity of sequence data, functional annotation of new gene sequences remains a significant challenge for bioinformatics. This is a particular problem for transcriptomics studies in crop plants, where large genomes and evolutionarily distant model organisms mean that identifying the function of a given gene used on a microarray is often a non-trivial task. Information pertinent to gene annotations is spread across technically and semantically heterogeneous biological databases. Combining and exploiting these data in a consistent way has the potential to improve our ability to assign functions to new or uncharacterised genes.

Methods: The Ondex data integration framework was further developed to integrate databases pertinent to plant gene annotation and to provide data inference tools. The CoPSA annotation pipeline was created to provide automated annotation of novel plant genes using this knowledgebase. CoPSA was used to derive annotations for Affymetrix GeneChips available for plant species. A conjoint approach was used to align GeneChip sequences to orthologous proteins and to identify protein domain regions. These proteins and domains were used together with multiple lines of evidence to predict functional annotations for sequences on the GeneChip. Quality was assessed with reference to other annotation pipelines. These improved gene annotations were used in the analysis of a time-series transcriptomics study of the differential responses of durum wheat varieties to water stress.

Results and Conclusions: The integration of plant databases using Ondex showed that it was possible to increase the overall quantity and quality of information available, and thereby improve the resulting annotation. Direct data aggregation benefits were observed, as well as new information derived from inference across databases. The CoPSA pipeline was shown to improve coverage of the wheat microarray compared to the NetAffx and BLAST2GO pipelines. Using these annotations in the analysis of data from a transcriptomics study of durum wheat water stress responses yielded new biological insights into water stress and highlighted potential candidate genes that could be used by breeders to improve drought response.

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Hodgman, T.C.
Keywords: Bioinformatics, Wheat, Transcriptomics, Data Integration, Query, Drought, Water Stress
Subjects: Q Science > QH Natural history. Biology > QH301 Biology (General)
Faculties/Schools: UK Campuses > Faculty of Science > School of Biosciences
Item ID: 12580
Depositing User: EP, Services
Date Deposited: 12 Nov 2012 12:20
Last Modified: 19 Dec 2017 14:44
URI: https://eprints.nottingham.ac.uk/id/eprint/12580
