To access the full text documents, please follow this link: http://hdl.handle.net/10459.1/48329

High Performance computing improvements on bioinformatics consistency-based multiple sequence alignment tools
Orobitg Cortada, Miquel; Guirado Fernández, Fernando; Cores Prado, Fernando; Lladós Segura, Jordi; Notredame, Cedric
Multiple Sequence Alignment (MSA) is essential for a wide range of applications in Bioinformatics. Traditionally, alignment accuracy was the main metric used to evaluate the quality of MSA tools. However, with the growth of sequencing data, other features, such as performance and the capacity to align larger datasets, are gaining importance. To meet these new requirements without affecting accuracy, the use of high-performance computing (HPC) resources and techniques is crucial. In this paper, we apply HPC techniques to T-Coffee, one of the more accurate but less scalable MSA tools. We integrate three innovative solutions into T-Coffee: the Balanced Guide Tree, to increase parallelism and performance; the Optimized Library Method, to enhance scalability; and the Multiple Tree Alignment, which explores different alignments in parallel to improve accuracy. The results show that the resulting tool, MTA-TCoffee, improves scalability in both execution time and the number of sequences that can be aligned. Furthermore, rather than degrading alignment accuracy, as might be expected, these improvements increase it significantly. Finally, we emphasize that the presented methods are not restricted to T-Coffee, but can be implemented in any other alignment tool that uses similar algorithms (progressive alignment, consistency or guide trees).
This work was supported by the Government of Spain TIN2011–28689-C02–02, TIN2010–12011-E, Consolider CSD2007–00050 and the CUR of GENCAT. Cedric Notredame is funded by the Plan Nacional BFU2011–28575 and the European Commission FP7, LEISHDRUG Project (No. 223414) and The Quantomics Project (KBBE-2A-222664).
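The abstract's claim that a balanced guide tree increases parallelism can be illustrated with a small sketch. It is not the paper's Balanced Guide Tree algorithm: it only compares the shape of a fully chained tree with a balanced one, under the assumptions that each internal node is one progressive alignment step, that sibling subtrees can be aligned concurrently, and that enough workers are available. The function names (`chained_tree`, `balanced_tree`, `critical_path`) and the sequence labels are hypothetical.

```python
def chained_tree(leaves):
    """Fully unbalanced guide tree: each step adds one sequence to the growing alignment."""
    tree = leaves[0]
    for leaf in leaves[1:]:
        tree = (tree, leaf)
    return tree


def balanced_tree(leaves):
    """Balanced guide tree: split the leaves in half recursively."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return (balanced_tree(leaves[:mid]), balanced_tree(leaves[mid:]))


def critical_path(tree):
    """Number of alignment steps that must run one after another.

    A leaf (a single sequence) needs no step; an internal node needs one step
    after both of its children are finished, and the children are assumed to
    be processed in parallel.
    """
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    return 1 + max(critical_path(left), critical_path(right))


# Hypothetical sequence labels, used only to build the two tree shapes.
names = [f"seq{i}" for i in range(8)]

print(critical_path(chained_tree(names)))   # 7 sequential steps
print(critical_path(balanced_tree(names)))  # 3 sequential steps
```

Both trees contain the same seven alignment steps for eight sequences; only the length of the longest dependency chain differs, which is what bounds the achievable speed-up when steps are distributed across processors.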
-Multiple Sequence Alignment
-Consistency
-T-Coffee
-High Performance Computing
-Scalability
-Computer science
(c) Elsevier, 2015
Article
Article - Submitted version
Elsevier
         
