Optimization of de novo transcriptome assembly from high-throughput short read sequencing data improves functional annotation for non-model organisms

Berat Z Haznedaroglu1, Darryl Reeves2, Hamid Rismani-Yazdi1,3 and Jordan Peccia1*

BMC Bioinformatics 2012, 13:170 doi:10.1186/1471-2105-13-170

Background

The k-mer hash length is a key factor affecting the output of de novo transcriptome assembly packages that use de Bruijn graph algorithms. An assembly constructed with any single k-mer choice may lose unique contiguous sequences (contigs) and relevant biological information. A common solution to this problem is to cluster several single k-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, evaluations of assembly strategies do not consider the impact of k-mer selection on the annotation output. This study provides an in-depth k-mer selection analysis focused on the degree of functional annotation achieved for a non-model organism for which no reference genome information is available. Individual k-mers and clustered assemblies (CAs) were considered using three representative software packages. Pair-wise comparisons (between individual k-mers and CAs) were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG) ortholog identifiers (KOIs), and to determine a strategy that maximizes the recovery of biological information in a de novo transcriptome assembly.
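The pair-wise comparison described above can be pictured as a set difference between the KOIs annotated in each assembly. The following is only a minimal sketch of that idea, not the authors' pipeline: the function name `missing_kois`, the assembly labels (`k21`, `k31`, `CA`) and the KOI sets are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def missing_kois(annotations):
    """Return, for every ordered pair (a, b), the KOIs found in a but absent from b.

    `annotations` maps an assembly label to the set of KOIs annotated in it.
    This helper is hypothetical and only illustrates the comparison idea.
    """
    report = {}
    for a, b in combinations(sorted(annotations), 2):
        report[(a, b)] = annotations[a] - annotations[b]
        report[(b, a)] = annotations[b] - annotations[a]
    return report


# Toy data (assumed for illustration, not taken from the study):
# two single k-mer assemblies and one clustered assembly (CA).
annotations = {
    "k21": {"K00001", "K00002", "K00845"},
    "k31": {"K00001", "K00845", "K01810"},
    "CA": {"K00001", "K00002", "K00845", "K01810"},
}

report = missing_kois(annotations)
for (a, b), kois in sorted(report.items()):
    print(f"in {a} but not in {b}: {sorted(kois)}")
```

In this toy case each single k-mer assembly lacks one KOI that the other recovers, while the clustered assembly contains both, which is the kind of loss the comparison is meant to expose.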

See the suggested workflow in the Supplementary information.

