REAL

VariantMetaCaller: automated fusion of variant calling pipelines for quantitative, precision-based filtering.

Gézsi, András and Bolgár, Bence Márton and Marx, Péter and Sárközy, Péter and Szalai, Csaba and Antal, Péter (2015) VariantMetaCaller: automated fusion of variant calling pipelines for quantitative, precision-based filtering. BMC GENOMICS, 16 (1). p. 875. ISSN 1471-2164

Full text: appeared_art_10.1186_s12864_015_2050_y_u.pdf (PDF, 2MB)

Abstract

BACKGROUND: The low concordance between different variant calling methods still poses a challenge for the widespread application of next-generation sequencing in research and clinical practice. A wide range of variant annotations can be used to filter call sets and improve the precision of the variant calls, but choosing appropriate filtering thresholds is not straightforward. Variant quality score recalibration provides an alternative to hard filtering, but it requires large-scale genomic data.

RESULTS: We evaluated germline variant calling pipelines based on the BWA and Bowtie 2 aligners in combination with the GATK UnifiedGenotyper, GATK HaplotypeCaller, FreeBayes and SAMtools variant callers, using simulated and real benchmark sequencing data (NA12878 with Illumina Platinum Genomes). We argue that these pipelines are not merely discordant but extract complementary, useful information. We introduce VariantMetaCaller to test the hypothesis that the automated fusion of measurement-related information yields better performance than the recommended hard-filtering settings or recalibration, and than fusing the individual call sets without using annotations. VariantMetaCaller uses Support Vector Machines to combine multiple information sources generated by variant calling pipelines and estimates probabilities of variants. This novel method had significantly higher sensitivity and precision than the individual variant callers across all target region sizes, ranging from a few hundred kilobases to whole exomes. We also demonstrated that VariantMetaCaller supports quantitative, precision-based filtering of variants under a wider range of conditions. Specifically, the computed probabilities of the variants can be used to rank the variants, and for a given threshold, the probabilities can be used to estimate precision. Precision can then be translated directly into the number of true variant calls or, equivalently, the number of false calls, which makes it possible to find a problem-specific balance between sensitivity and precision.

CONCLUSIONS: VariantMetaCaller can be applied both to small target regions and to whole exomes, and it can be used for organisms for which highly accurate variant call sets are not yet available. It can therefore be a viable alternative to hard filtering in cases where variant quality score recalibration cannot be used. VariantMetaCaller is freely available at http://bioinformatics.mit.bme.hu/VariantMetaCaller .
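The precision-based filtering described in the abstract can be illustrated with a short sketch. It is not taken from the VariantMetaCaller source; it assumes that each call already carries an estimated probability of being true (in VariantMetaCaller these come from the SVM), that the estimated precision of a filtered set is the mean probability of the retained calls, and that the estimated number of true calls is the sum of those probabilities. The names `summarize_threshold`, `largest_set_at_precision` and `FilterSummary` are hypothetical, and the probabilities are invented.

```python
import math
from dataclasses import dataclass
from typing import List


# Hypothetical summary type; assumes precision = mean probability of kept calls.
@dataclass
class FilterSummary:
    threshold: float
    kept: int                  # number of calls with probability >= threshold
    expected_true: float       # sum of the kept calls' probabilities
    expected_false: float      # kept - expected_true
    expected_precision: float  # mean probability of kept calls (nan if none kept)


def summarize_threshold(probs: List[float], threshold: float) -> FilterSummary:
    """Keep calls whose probability is at least `threshold` and estimate
    the precision and the numbers of true and false calls among them."""
    kept = [p for p in probs if p >= threshold]
    n = len(kept)
    expected_true = sum(kept)
    precision = expected_true / n if n else math.nan
    return FilterSummary(threshold, n, expected_true, n - expected_true, precision)


def largest_set_at_precision(probs: List[float], target: float) -> int:
    """Return how many top-ranked calls can be kept while their estimated
    precision (mean probability) stays at or above `target`."""
    ranked = sorted(probs, reverse=True)  # rank calls by probability
    total = 0.0
    best = 0
    for k, p in enumerate(ranked, start=1):
        total += p
        if total / k >= target:
            best = k
        else:
            # The running mean of a descending list never increases,
            # so no longer prefix can meet the target either.
            break
    return best


if __name__ == "__main__":
    # Invented per-call probabilities, for illustration only.
    probs = [0.99, 0.95, 0.9, 0.6, 0.3, 0.1]
    print(summarize_threshold(probs, 0.5))
    print(largest_set_at_precision(probs, 0.9))
```

Because the running mean over calls sorted by descending probability can only decrease, the scan can stop at the first prefix that misses the target. On the made-up probabilities above, a 0.9 precision target keeps the top three calls.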

Item Type: Article
Subjects: Q Science > QH Natural history > QH426 Genetics
SWORD Depositor: MTMT SWORD
Depositing User: MTMT SWORD
Date Deposited: 05 Oct 2016 07:47
Last Modified: 05 Oct 2016 07:47
URI: http://real.mtak.hu/id/eprint/41517
