1. Hansen NF, Wang X, Tegegn MB, Liu Z, Gouveia MH, Hill G, Lin JC, Okulosubo T, Shriner D, Thein SL, Mullikin JC. Random forest classifiers trained on simulated data enable accurate short read-based genotyping of structural variants in the alpha globin region at Chr16p13.3. bioRxiv 2023:2023.11.27.568683. [PMID: 38076833 PMCID: PMC10705532 DOI: 10.1101/2023.11.27.568683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2023]
Abstract
In regions where reads do not align well to a reference, it is generally difficult to characterize structural variation using short read sequencing. Here, we use machine learning classifiers and short sequence reads to genotype structural variants in the alpha globin locus on chromosome 16, a medically relevant region that is challenging to genotype in individuals. Using models trained only with simulated data, we accurately genotype two hard-to-distinguish deletions in two separate human cohorts. Furthermore, population allele frequencies produced by our methods across a wide set of ancestries agree more closely with previously determined frequencies than those obtained using currently available genotyping software.
Affiliation(s)
- Nancy F. Hansen, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
- Xunde Wang, Sickle Cell Branch, National Heart, Lung and Blood Institute, NIH, Bethesda, MD 20892, USA
- Mickias B. Tegegn, Sickle Cell Branch, National Heart, Lung and Blood Institute, NIH, Bethesda, MD 20892, USA
- Zhi Liu, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
- Mateus H. Gouveia, Center for Research on Genomics and Global Health, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
- Gracelyn Hill, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
- Jennifer C. Lin, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
- Temiloluwa Okulosubo, Sickle Cell Branch, National Heart, Lung and Blood Institute, NIH, Bethesda, MD 20892, USA
- Daniel Shriner, Center for Research on Genomics and Global Health, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
- Swee Lay Thein, Sickle Cell Branch, National Heart, Lung and Blood Institute, NIH, Bethesda, MD 20892, USA
- James C. Mullikin, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892, USA
2. The impact of malaria-protective red blood cell polymorphisms on parasite biomass in children with severe Plasmodium falciparum malaria. Nat Commun 2022; 13:3307. [PMID: 35676275 PMCID: PMC9178016 DOI: 10.1038/s41467-022-30990-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Accepted: 05/24/2022] [Indexed: 11/08/2022]
Abstract
Severe falciparum malaria is a major cause of preventable child mortality in sub-Saharan Africa. Plasma concentrations of P. falciparum Histidine-Rich Protein 2 (PfHRP2) have diagnostic and prognostic value in severe malaria. We investigate the potential use of plasma PfHRP2 and the sequestration index (the ratio of PfHRP2 to parasite density) as quantitative traits for case-only genetic association studies of severe malaria. Data from 2198 Kenyan children diagnosed with severe malaria, genotyped for 14 major candidate genes, show that polymorphisms in four major red cell genes that lead to hemoglobin S, O blood group, α-thalassemia, and the Dantu blood group are associated with substantially lower admission plasma PfHRP2 concentrations, consistent with protective effects against extensive parasitized erythrocyte sequestration. In contrast, the known protective ATP2B4 polymorphism is associated with higher plasma PfHRP2 concentrations, lower parasite densities, and a higher sequestration index. We provide testable hypotheses for the mechanism of protection of ATP2B4.