SALAI-Net: species-agnostic local ancestry inference network

Oriol Sabat, Benet; Mas Montserrat, Daniel; Giró Nieto, Xavier; Ioannidis, Alexander

Author

Oriol Sabat, Benet

Mas Montserrat, Daniel

Giró Nieto, Xavier

Ioannidis, Alexander

Other authors

Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions

Publication date

2022-09-16

Abstract

Availability and implementation: We provide an open source implementation and links to publicly available data at github.com/AI-sandbox/SALAI-Net. Data is publicly available as follows: https://www.internationalgenome.org (1000 Genomes), https://www.simonsfoundation.org/simons-genome-diversity-project (Simons Genome Diversity Project), https://www.sanger.ac.uk/resources/downloads/human/hapmap3.html (HapMap), ftp://ngs.sanger.ac.uk/production/hgdp/hgdp_wgs.20190516 (Human Genome Diversity Project) and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA448733 (Canid genomes).

Local ancestry inference (LAI) is the high-resolution prediction of ancestry labels along a DNA sequence. LAI is important in the study of human history and migrations, and it is beginning to play a role in precision medicine applications including ancestry-adjusted genome-wide association studies (GWASs) and polygenic risk scores (PRSs). Existing LAI models do not generalize well between species, chromosomes or even ancestry groups, requiring re-training for each different setting. Furthermore, such methods can lack interpretability, which is an important element in each of these applications. We present SALAI-Net, a portable statistical LAI method that can be applied to any set of species and ancestries (species-agnostic), requiring only haplotype data and no other biological parameters. Inspired by identity-by-descent methods, SALAI-Net estimates population labels for each segment of DNA by performing a reference matching approach, which leads to an interpretable and fast technique. We benchmark our models on whole-genome data of humans and we test these models’ ability to generalize to dog breeds when trained on human data. SALAI-Net outperforms previous methods in terms of balanced accuracy, while generalizing between different settings, species and datasets. Moreover, it is up to two orders of magnitude faster and uses considerably less RAM than competing methods.
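The reference matching idea mentioned in the abstract can be illustrated with a toy sketch. This is not the SALAI-Net model, which is a trained network; it is only a minimal, assumed example of labelling each window of a query haplotype by the population of its most similar reference haplotype. The function names, the fixed window size, and the simple allele-match similarity are all hypothetical choices made for illustration.

```python
def window_similarity(query, reference, start, end):
    """Fraction of positions in [start, end) where the two haplotypes carry the same allele."""
    matches = sum(q == r for q, r in zip(query[start:end], reference[start:end]))
    return matches / (end - start)


def infer_local_ancestry(query, references, labels, window_size):
    """Toy reference matching (illustrative only, not the published method).

    query:       list of 0/1 alleles for one haplotype
    references:  list of reference haplotypes, same length as query
    labels:      population label for each reference haplotype
    window_size: number of SNPs per window (hypothetical parameter)

    Returns one population label per window.
    """
    n_snps = len(query)
    calls = []
    for start in range(0, n_snps, window_size):
        end = min(start + window_size, n_snps)
        # Best similarity achieved by any reference of each population.
        best = {}
        for ref, label in zip(references, labels):
            sim = window_similarity(query, ref, start, end)
            best[label] = max(best.get(label, 0.0), sim)
        # Highest-scoring population wins; ties go to the alphabetically first label.
        calls.append(max(sorted(best), key=lambda lab: best[lab]))
    return calls


# A query that switches from population "A" to population "B" halfway along.
query = [0, 0, 0, 0, 1, 1, 1, 1]
references = [[0] * 8, [1] * 8]
labels = ["A", "B"]
print(infer_local_ancestry(query, references, labels, window_size=4))
```

Because each call is tied to a specific reference haplotype and window, the output of a matching scheme like this can be traced back to the data that produced it, which is the kind of interpretability the abstract refers to.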

This paper was published as part of a special issue financially supported by ECCB2022. Some of the computing for this project was performed on the Sherlock cluster at Stanford University. We would like to thank Stanford University and the Stanford Research Computing Center for providing computational resources and support that contributed to these research results. A.G.I. and D.M.M. received support from NIH under award R01HG010140. Conflict of Interest: A.G.I. is a co-founder of Galatea Bio Inc.

Peer Reviewed

Sustainable Development Goals::3 - Good Health and Well-being

Sustainable Development Goals::3 - Good Health and Well-being::3.4 - By 2030, reduce by one third premature mortality from non-communicable diseases through prevention and treatment, and promote mental health and well-being

Postprint (published version)

Document Type

Article

Language

English

Subjects and keywords

UPC subject areas::Computer science::Computer applications::Bioinformatics; Bioinformatics; Genome; Local ancestry inference

Related items

https://academic.oup.com/bioinformatics/article/38/Supplement_2/ii27/6701999

Rights

Open Access
