dc.contributor.author
Ros Freixedes, Roger
dc.contributor.author
Battagin, Mara
dc.contributor.author
Johnsson, Martin
dc.contributor.author
Gorjanc, Gregor
dc.contributor.author
Mileham, Alan J.
dc.contributor.author
Rounsley, Steve D.
dc.contributor.author
Hickey, John M.
dc.date.accessioned
2024-12-05T22:52:22Z
dc.date.available
2024-12-05T22:52:22Z
dc.date.issued
2020-02-03T08:25:03Z
dc.date.issued
2018-12-13
dc.identifier
https://doi.org/10.1186/s12711-018-0436-4
dc.identifier
http://hdl.handle.net/10459.1/67919
dc.identifier.uri
http://hdl.handle.net/10459.1/67919
dc.description.abstract
Background: Inherent sources of error and bias that affect the quality of sequence data include index hopping and bias towards the reference allele. The impact of these artefacts is likely greater for low-coverage data than for high-coverage data because low-coverage data has scant information and many standard tools for processing sequence data were designed for high-coverage data. With the proliferation of cost-effective low-coverage sequencing, there is a need to understand the impact of these errors and bias on resulting genotype calls from low-coverage sequencing. Results: We used a dataset of 26 pigs sequenced both at 2× with multiplexing and at 30× without multiplexing to show that index hopping and bias towards the reference allele due to alignment had little impact on genotype calls. However, pruning of alternative haplotypes supported by a number of reads below a predefined threshold, which is a default and desired step of some variant callers for removing potential sequencing errors in high-coverage data, introduced an unexpected bias towards the reference allele when applied to low-coverage sequence data. This bias reduced best-guess genotype concordance of low-coverage sequence data by 19.0 absolute percentage points. Conclusions: We propose a simple pipeline to correct the preferential bias towards the reference allele that can occur during variant discovery and we recommend that users of low-coverage sequence data be wary of unexpected biases that may be produced by bioinformatic tools that were designed for high-coverage sequence data.
dc.description.abstract
The authors acknowledge the financial support from the BBSRC ISPG to The Roslin Institute BBS/E/D/30002275, from Genus plc, Innovate UK (Grant 102271), and from grant numbers BB/N004736/1, BB/N015339/1, BB/L020467/1, and BB/M009254/1. M. Johnsson acknowledges financial support from the Swedish Research Council Formas Dnr 2016-01386.
dc.format
application/pdf
dc.publisher
BMC (part of Springer Nature)
dc.relation
Reproduction of the document published at: https://doi.org/10.1186/s12711-018-0436-4
dc.relation
Genetics Selection Evolution, 2018, vol. 50, article number 64
dc.rights
cc-by (c) Ros Freixedes, Roger et al., 2018
dc.rights
info:eu-repo/semantics/openAccess
dc.rights
https://creativecommons.org/licenses/by/4.0/
dc.title
Impact of index hopping and bias towards the reference allele on accuracy of genotype calls from low-coverage sequencing
dc.type
info:eu-repo/semantics/article
dc.type
info:eu-repo/semantics/publishedVersion