Translating nucleic acid binding protein function from model species to minor crops using transfer learning

Bonthala, Venkata Suresh (2018) Translating nucleic acid binding protein function from model species to minor crops using transfer learning. PhD thesis, University of Nottingham.

PDF (Corrected and final version) (Thesis - as examined) - Repository staff only until 21 July 2020. Subsequently available to Repository staff only.

Abstract

Genomic elements such as proteins and genes are the basic units of the genome and are involved in every biological process. Predicting the function of these elements is therefore the first step towards understanding how plants function under various stress conditions. To date, various computational methods have been developed to predict the function of a given protein sequence. The recent increase in the number of such methods has created its own set of problems, making them difficult to apply to newly sequenced genomes, especially those of non-model crops. For these reasons, there is an immediate need for sophisticated computational methods to predict the function of a given protein sequence. This thesis presents three novel computational tools, based on transfer learning algorithms, to predict the function of a given protein sequence. The tools, and their performance in terms of prediction accuracy, are: 1) TL-RBPPred, for prediction of RNA-binding proteins, which outperformed SPOT-Seq, RNApred, RBPPred and BLASTp on HumanSet (AUC of 0.977), YeastSet (AUC of 0.971), ArabidopsisSet (AUC of 0.972) and GlymaxSet (AUC of 0.97); 2) TL-DBPPred, for prediction of DNA-binding proteins, which outperformed DNABP, enDNA-Prot, iDNA-Prot, nDNAProt, iDNA-Prot|Dis, DNAbinder and BLASTp on a testing dataset (AUC of 0.988); and 3) TL-TFPred, for prediction of transcription factors, which outperformed PlantTFcat, iTAK and BLASTp on a testing dataset (AUC of 0.999). Further, both TL-RBPPred and TL-DBPPred were tested on the transcriptome of the non-model crop Bambara groundnut (Vigna subterranea (L.) Verdc.) to identify RNA-binding and DNA-binding proteins, respectively. The results of these tests indicated that the two methods outperformed existing state-of-the-art tools such as SPOT-Seq, RBPPred, iDNA-Prot and iDNA-Prot|Dis in terms of prediction accuracy (AUC).
Based on this performance, the developed methods will be useful in predicting the function of given protein sequences (DNA-binding proteins, RNA-binding proteins and transcription factors) in model species as well as non-model crops.

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Twycross, Jamie
Mayes, Sean
Massawe, Festo
Keywords: bambara groundnut, non-model crops, translate protein function, transfer learning
Subjects: Q Science > QH Natural history. Biology
Faculties/Schools: UNMC Malaysia Campus > Faculty of Science > School of Biosciences
Item ID: 52289
Depositing User: BONTHALA, VENKATA SURESH
Date Deposited: 20 Aug 2018 04:40
Last Modified: 13 Sep 2018 08:15
URI: http://eprints.nottingham.ac.uk/id/eprint/52289
