SALAI-Net: species-agnostic local ancestry inference network

dc.contributor
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.contributor.author
Oriol Sabat, Benet
dc.contributor.author
Mas Montserrat, Daniel
dc.contributor.author
Giró Nieto, Xavier
dc.contributor.author
Ioannidis, Alexander
dc.date.issued
2022-09-16
dc.identifier
Oriol, B. [et al.]. SALAI-Net: species-agnostic local ancestry inference network. "Bioinformatics", 16 September 2022, vol. 38, no. Supplement_2, p. ii27-ii33.
dc.identifier
1367-4803
dc.identifier
https://github.com/AI-sandbox/SALAI-Net
dc.identifier
https://hdl.handle.net/2117/375450
dc.identifier
10.1093/bioinformatics/btac464
dc.description.abstract
Availability and implementation: We provide an open source implementation and links to publicly available data at github.com/AI-sandbox/SALAI-Net. Data is publicly available as follows: https://www.internationalgenome.org (1000 Genomes), https://www.simonsfoundation.org/simons-genome-diversity-project (Simons Genome Diversity Project), https://www.sanger.ac.uk/resources/downloads/human/hapmap3.html (HapMap), ftp://ngs.sanger.ac.uk/production/hgdp/hgdp_wgs.20190516 (Human Genome Diversity Project) and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA448733 (Canid genomes).
dc.description.abstract
Local ancestry inference (LAI) is the high-resolution prediction of ancestry labels along a DNA sequence. LAI is important in the study of human history and migrations, and it is beginning to play a role in precision medicine applications including ancestry-adjusted genome-wide association studies (GWASs) and polygenic risk scores (PRSs). Existing LAI models do not generalize well between species, chromosomes or even ancestry groups, requiring re-training for each different setting. Furthermore, such methods can lack interpretability, which is an important element in each of these applications. We present SALAI-Net, a portable statistical LAI method that can be applied to any set of species and ancestries (species-agnostic), requiring only haplotype data and no other biological parameters. Inspired by identity-by-descent methods, SALAI-Net estimates population labels for each segment of DNA by performing a reference matching approach, which leads to an interpretable and fast technique. We benchmark our models on whole-genome data of humans and we test these models’ ability to generalize to dog breeds when trained on human data. SALAI-Net outperforms previous methods in terms of balanced accuracy, while generalizing between different settings, species and datasets. Moreover, it is up to two orders of magnitude faster and uses considerably less RAM than competing methods.
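The reference matching idea described in the abstract can be illustrated with a small sketch. This is not the SALAI-Net implementation (see the linked repository for that); it is a simplified, hypothetical illustration that assumes binary-encoded haplotypes, fixed-size non-overlapping windows, and a plain "best matching reference per population" rule. The function name `window_reference_matching` and all parameters are assumptions made for this example.

```python
import numpy as np


def window_reference_matching(query, references, ref_labels, window_size):
    """Assign a population label to each window of a query haplotype.

    Hypothetical, simplified illustration of reference matching:
    for every window, the query is compared with each reference
    haplotype, and the window receives the label of the population
    whose best-matching reference agrees with the query the most.
    """
    query = np.asarray(query)
    references = np.asarray(references)
    ref_labels = np.asarray(ref_labels)
    populations = np.unique(ref_labels)

    predictions = []
    for start in range(0, query.shape[0], window_size):
        stop = min(start + window_size, query.shape[0])
        # Fraction of positions where each reference matches the query.
        matches = (references[:, start:stop] == query[start:stop]).mean(axis=1)
        # Keep the best-matching reference within each population.
        best_per_pop = np.array(
            [matches[ref_labels == pop].max() for pop in populations]
        )
        predictions.append(populations[np.argmax(best_per_pop)])
    return np.array(predictions)


# Toy data: one reference haplotype per population, 8 SNPs each.
references = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],  # population 0
    [1, 1, 1, 1, 1, 1, 1, 1],  # population 1
])
ref_labels = np.array([0, 1])

# Admixed query: first half resembles population 0, second half population 1.
query = np.array([0, 0, 0, 0, 1, 1, 1, 1])

labels = window_reference_matching(query, references, ref_labels, window_size=4)
print(labels)
```

Because each window's label traces back to a specific best-matching reference haplotype, a prediction of this kind can be inspected directly, which is the sense in which a matching-based approach is interpretable.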
dc.description.abstract
This paper was published as part of a special issue financially supported by ECCB2022. Some of the computing for this project was performed on the Sherlock cluster at Stanford University. We would like to thank Stanford University and the Stanford Research Computing Center for providing computational resources and support that contributed to these research results. A.G.I. and D.M.M. received support from NIH under award R01HG010140. Conflict of Interest: AGI is a co-founder of Galatea Bio Inc.
dc.description.abstract
Peer Reviewed
dc.description.abstract
Sustainable Development Goals::3 - Good Health and Well-being
dc.description.abstract
Sustainable Development Goals::3 - Good Health and Well-being::3.4 - By 2030, reduce by one third premature mortality from non-communicable diseases through prevention and treatment, and promote mental health and well-being
dc.description.abstract
Postprint (published version)
dc.format
application/pdf
dc.language
eng
dc.relation
https://academic.oup.com/bioinformatics/article/38/Supplement_2/ii27/6701999
dc.rights
Open Access
dc.subject
UPC subject areas::Computer science::Computer science applications::Bioinformatics
dc.subject
Bioinformatics
dc.subject
Genome
dc.subject
Local ancestry inference
dc.title
SALAI-Net: species-agnostic local ancestry inference network
dc.type
Article


This item appears in the following collection(s)

E-prints [73026]