Author

Lladós Segura, Jordi

Guirado Fernández, Fernando

Cores Prado, Fernando

Lérida Monsó, Josep Lluís

Notredame, Cedric

Publication date

2015-09-23T14:02:33Z

2015-05-01

Abstract

Multiple sequence alignment (MSA) is crucial for high-throughput next-generation sequencing applications, which require large-scale alignments with thousands of sequences. However, the alignment quality of current MSA tools decreases sharply when the number of sequences grows to several thousand. This accuracy degradation can be mitigated by using global consistency information, as in the T-Coffee MSA tool, which implements a consistency library. However, consistency-based methods do not scale well, because the computational resources required to calculate and store the consistency information grow quadratically. In this paper, we propose an alternative method for building the consistency library. To allow unlimited scalability, consistency information must be discarded so that the library does not exceed the memory available in the environment. Our first approach deals with the memory limitation by identifying the most important entries, those that provide better consistency. This method achieves scalability, although with a negative impact on accuracy. Our second proposal aims to reduce this accuracy degradation and presents three different methods for attaining a better alignment.


This work has been supported by the Government of Spain under grant TIN2011-28689-C02-02. Cedric Notredame is funded by the Plan Nacional BFU2011-28575 and the Quantomics project (KBBE-2A-222664).

Document Type

Article
publishedVersion

Language

English

Subjects and keywords

Large-Scale Alignments; Scalability; Consistency; T-Coffee; Multiple Sequence Alignment; Programming languages (Electronic computers); Computer science; Computer network architectures

Publisher

Springer Verlag

Related items

info:eu-repo/grantAgreement/MICINN//TIN2011-28689-C02-02/ES/EJECUCION EFICIENTE DE APLICACIONES MULTIDISCIPLINARES: NUEVOS DESAFIOS EN LA ERA MULTI%2FMANY CORE/

info:eu-repo/grantAgreement/MICINN//BFU2011-28575/ES/NGS-COFFEE: PRODUCCION DE ALINEAMIENTOS GENOMICOS MULTIPLES MEDIANTE EL ENRIQUECIMIENTO DE SECUENCIAS DE ADN CON INFORMACION EXPERIMENTAL PROVENIENTE DE CHIP-SEQ Y RNA-SEQ/

Reproduction of the document published at: https://doi.org/10.1007/s11227-014-1362-z

Journal of Supercomputing, 2015, vol. 71, no. 5, p. 1833-1845

Rights

cc-by (c) Lladós Segura, Jordi et al., 2015

http://creativecommons.org/licenses/by/3.0/es
