Virtual genome walking across the 32 Gb Ambystoma mexicanum genome; assembling gene models and intronic sequence

Evans, Teri and Johnson, Andrew D. and Loose, Matthew (2018) Virtual genome walking across the 32 Gb Ambystoma mexicanum genome; assembling gene models and intronic sequence. Scientific Reports, 8 (1). 618/1-618/13. ISSN 2045-2322

Available under Licence Creative Commons Attribution.

Abstract

Large, repeat-rich genomes present challenges for assembly using short-read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA, making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage, but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking, as it locally assembles whole-genome reads based on a reference transcriptome, identifying exons and iteratively extending them into surrounding genomic sequence. These assemblies are then linked and refined to generate gene models that include upstream and downstream genomic sequence as well as intronic sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR, and aid in future whole-genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n. The software pipeline is available from https://github.com/LooseLab/iterassemble.
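To make the idea of iterative extension concrete, the following is a toy sketch only; it is not the iterassemble pipeline and does not reproduce the authors' method. It assumes error-free reads on a single strand, uses exact suffix/prefix overlaps, and ignores reverse complements and repeats. The function names (`extend_right`, `extend_left`, `walk`), the `min_overlap` and `max_rounds` parameters, and the example sequences are all hypothetical.

```python
def extend_right(contig, reads, min_overlap):
    """Extend contig rightwards by one read, or return it unchanged."""
    for read in reads:
        # Leave at least one base of the read to add, so every
        # successful extension makes the contig strictly longer.
        max_len = min(len(contig), len(read) - 1)
        # Try the longest suffix/prefix overlap first.
        for k in range(max_len, min_overlap - 1, -1):
            if contig.endswith(read[:k]):
                return contig + read[k:]
    return contig


def extend_left(contig, reads, min_overlap):
    """Extend leftwards by reversing the strings and reusing extend_right."""
    reversed_reads = [read[::-1] for read in reads]
    return extend_right(contig[::-1], reversed_reads, min_overlap)[::-1]


def walk(seed, reads, min_overlap=3, max_rounds=100):
    """Grow a seed (e.g. an exon) in both directions until no read extends it."""
    contig = seed
    # max_rounds is a safety cap: repetitive sequence could otherwise
    # keep extending the contig indefinitely.
    for _ in range(max_rounds):
        extended = extend_right(contig, reads, min_overlap)
        extended = extend_left(extended, reads, min_overlap)
        if extended == contig:
            break
        contig = extended
    return contig


# Hypothetical example: three overlapping reads from a made-up locus,
# seeded with an "exon" taken from the middle of it.
reads = ["GATTACAGGC", "AGGCTTCAGT", "CAGTCCATGA"]
print(walk("GCTTCAG", reads))
```

The round cap hints at why repeats are the central difficulty: a repeated overlap can extend a contig into the wrong locus or around a cycle, which is one reason a real pipeline needs the linking and refinement steps the abstract mentions.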

Item Type: Article
Schools/Departments: University of Nottingham, UK > Faculty of Medicine and Health Sciences > School of Life Sciences
Identification Number: 10.1038/s41598-017-19128-6
Depositing User: Eprints, Support
Date Deposited: 23 Jan 2018 10:58
Last Modified: 23 Jan 2018 20:42
URI: http://eprints.nottingham.ac.uk/id/eprint/49288
