1
Alosaimi S, van Biljon N, Awany D, Thami PK, Defo J, Mugo JW, Bope CD, Mazandu GK, Mulder NJ, Chimusa ER. Simulation of African and non-African low and high coverage whole genome sequence data to assess variant calling approaches. Brief Bioinform 2020; 22:6042242. [PMID: 33341897 DOI: 10.1093/bib/bbaa366] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/25/2020] [Revised: 11/14/2020] [Accepted: 01/08/2020] [Indexed: 12/15/2022]
Abstract
Current variant calling (VC) approaches have been designed to leverage populations of long-range haplotypes and were benchmarked using populations of European descent, whereas most genetic diversity is found in non-European populations, such as those of Africa. When applied to these genetically diverse populations, VC tools may produce false positive and false negative results, which can lead to misleading conclusions in the prioritization of mutations and in assessing the clinical relevance and actionability of genes. The most prominent question is which tool or pipeline achieves high sensitivity and precision when analysing African data with either low or high sequence coverage, given the high genetic diversity and heterogeneity of these data. Here, a total of 100 synthetic Whole Genome Sequencing (WGS) samples, mimicking the genetic profile of African and European subjects at different coverage levels (high/low), have been generated to assess the performance of nine different VC tools on these contrasting datasets. The performance of these tools was assessed in terms of false positive and false negative call rates by comparing the simulated golden variants to the variants identified by each VC tool. Combining our results on sensitivity and positive predictive value (PPV), VarDict [PPV = 0.999 and Matthews correlation coefficient (MCC) = 0.832] and BCFtools (PPV = 0.999 and MCC = 0.813) perform best on African population data at both high and low coverage. Overall, current VC tools produce higher false positive and false negative rates when analysing African data than when analysing European data. This highlights the need to develop VC approaches with high sensitivity and precision tailored to populations characterized by high genetic variation and low linkage disequilibrium.
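The abstract reports sensitivity, PPV and MCC obtained by comparing each tool's calls with the simulated golden variants. As a minimal sketch of how such metrics could be computed, and not the authors' pipeline, the Python below treats both variant sets as sets of `(chrom, pos, ref, alt)` tuples. The function names, the toy variants and the `n_sites` parameter are all hypothetical. In particular, MCC needs a true-negative count, which the abstract does not define; here it is assumed to be the number of evaluated sites with neither a golden variant nor a call.

```python
import math


def confusion_counts(truth, calls, n_sites):
    """Count TP, FP, FN, TN for one call set against the golden set.

    n_sites is an assumed size of the evaluated universe of candidate
    sites; it is only needed to derive the true-negative count.
    """
    tp = len(truth & calls)   # golden variants that were called
    fp = len(calls - truth)   # calls absent from the golden set
    fn = len(truth - calls)   # golden variants that were missed
    tn = n_sites - tp - fp - fn
    return tp, fp, fn, tn


def calling_metrics(tp, fp, fn, tn):
    """Return sensitivity, PPV and MCC; undefined ratios fall back to 0.0."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "mcc": mcc}


# Toy data (hypothetical): four golden variants, four calls, three shared.
golden = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr1", 300, "G", "A"),
    ("chr1", 400, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr1", 300, "G", "A"),
    ("chr1", 500, "A", "T"),
}

counts = confusion_counts(golden, called, n_sites=1000)
result = calling_metrics(*counts)
print(counts, result)
```

Because the true-negative count depends on how the universe of sites is chosen, MCC values from a sketch like this are only comparable between tools evaluated over the same set of sites.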
Affiliation(s)
- Shatha Alosaimi
  - Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa
- Noëlle van Biljon
  - Department of Statistical Sciences, University of Cape Town, Cape Town, South Africa
- Denis Awany
  - Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa
- Prisca K Thami
  - Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa
- Joel Defo
  - Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa
- Jacquiline W Mugo
  - Faculty of Health Sciences, Division of Computational Biology, Department of Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Christian D Bope
  - Faculty of Sciences, Department of Mathematics and Computer Science, University of Kinshasa, Kinshasa, DRC
- Gaston K Mazandu
  - Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa; Faculty of Health Sciences, Division of Computational Biology, Department of Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Nicola J Mulder
  - Faculty of Health Sciences, Division of Computational Biology, Department of Biomedical Sciences, University of Cape Town, Cape Town, South Africa; Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Anzio Road, Observatory, Cape Town 7925, South Africa
- Emile R Chimusa
  - Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa; Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Anzio Road, Observatory, Cape Town 7925, South Africa
2
Mugo JW, Geza E, Defo J, Elsheikh SSM, Mazandu GK, Mulder NJ, Chimusa ER. A multi-scenario genome-wide medical population genetics simulation framework. Bioinformatics 2018; 33:2995-3002. [PMID: 28957497 DOI: 10.1093/bioinformatics/btx369] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 09/19/2016] [Accepted: 06/21/2017] [Indexed: 11/14/2022]
Abstract
Motivation: Recent technological advances in high-throughput sequencing and genotyping have facilitated an improved understanding of genomic structure and disease-associated genetic factors. In this context, simulation models can play a critical role in revealing various evolutionary and demographic effects on genomic variation, enabling researchers to assess existing and design novel analytical approaches. Although various simulation frameworks have been suggested, they do not account for natural selection in admixture processes. Most are tailored to a single chromosome or a genomic region, very few capture large-scale genomic data, and most are not accessible for genomic communities. Results: Here we develop a multi-scenario genome-wide medical population genetics simulation framework called 'FractalSIM'. FractalSIM has the capability to accurately mimic and generate genome-wide data under various genetic models on genetic diversity, genomic variation affecting diseases and DNA sequence patterns of admixed and/or homogeneous populations. Moreover, the framework accounts for natural selection in both homogeneous and admixture processes. The outputs of FractalSIM have been assessed using popular tools, and the results demonstrated its capability to accurately mimic real scenarios. They can be used to evaluate the performance of a range of genomic tools from ancestry inference to genome-wide association studies. Availability and implementation: The FractalSIM package is available at http://www.cbio.uct.ac.za/FractalSIM. Contact: emile.chimusa@uct.ac.za. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jacquiline W Mugo
  - Department of Integrative Biomedical Sciences, Computational Biology Division, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Observatory, Cape Town 7925, South Africa
- Ephifania Geza
  - Department of Integrative Biomedical Sciences, Computational Biology Division, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Observatory, Cape Town 7925, South Africa; African Institute for Mathematical Sciences, Muizenberg, Cape Town 7945, South Africa
- Joel Defo
  - Department of Integrative Biomedical Sciences, Computational Biology Division, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Observatory, Cape Town 7925, South Africa
- Samar S M Elsheikh
  - Department of Integrative Biomedical Sciences, Computational Biology Division, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Observatory, Cape Town 7925, South Africa
- Gaston K Mazandu
  - Department of Integrative Biomedical Sciences, Computational Biology Division, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Observatory, Cape Town 7925, South Africa; African Institute for Mathematical Sciences, Muizenberg, Cape Town 7945, South Africa
- Nicola J Mulder
  - Department of Integrative Biomedical Sciences, Computational Biology Division, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Observatory, Cape Town 7925, South Africa
- Emile R Chimusa
  - Department of Pathology, Division of Human Genetics, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Observatory, Cape Town 7925, South Africa