51
Arriaga-MacKenzie IS, Matesi G, Chen S, Ronco A, Marker KM, Hall JR, Scherenberg R, Khajeh-Sharafabadi M, Wu Y, Gignoux CR, Null M, Hendricks AE. Summix: A method for detecting and adjusting for population structure in genetic summary data. Am J Hum Genet 2021; 108:1270-1282. [PMID: 34157305 PMCID: PMC8322937 DOI: 10.1016/j.ajhg.2021.05.016] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/05/2021] [Accepted: 05/26/2021] [Indexed: 12/11/2022] Open
Abstract
Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestry, African (AFR), non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix's ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. Given an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
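The deconvolution idea can be illustrated with a minimal sketch. It assumes a least-squares fit of observed allele frequencies to a mixture of reference frequencies, restricted here to two reference groups so that the solution has a closed form; the abstract does not state the objective function, and Summix itself handles five reference groups. The function names and all frequencies below are hypothetical.

```python
def estimate_two_way_mixture(observed, ref_a, ref_b):
    """Least-squares proportion of ancestry A in a two-way mixture.

    Models observed[i] ~ p * ref_a[i] + (1 - p) * ref_b[i] and clips p to [0, 1].
    This is an illustrative assumption, not the published Summix algorithm.
    """
    num = sum((o - b) * (a - b) for o, a, b in zip(observed, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference allele frequencies are identical")
    return min(1.0, max(0.0, num / den))


def adjust_allele_frequency(observed_af, prop_a, ref_b_af):
    """Back out the ancestry-A allele frequency by removing the ancestry-B share."""
    if prop_a <= 0:
        raise ValueError("ancestry A proportion must be positive")
    adjusted = (observed_af - (1 - prop_a) * ref_b_af) / prop_a
    return min(1.0, max(0.0, adjusted))


# Hypothetical reference allele frequencies at four variants.
ref_a = [0.10, 0.50, 0.80, 0.30]
ref_b = [0.40, 0.20, 0.60, 0.90]
# Simulated summary data: an 84% / 16% mixture of the two references.
observed = [0.84 * a + 0.16 * b for a, b in zip(ref_a, ref_b)]

prop_a = estimate_two_way_mixture(observed, ref_a, ref_b)
adjusted_first = adjust_allele_frequency(observed[0], prop_a, ref_b[0])
print(round(prop_a, 4), round(adjusted_first, 4))
```

With noiseless simulated input the estimate recovers the mixing proportion, and the adjusted frequency recovers the ancestry-A reference value; real summary data would add sampling noise and more reference groups.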
52
Abstract
The detection of introgression from genomic data is transforming our view of species and the origins of adaptive variation. Among the most widely used approaches to detect introgression is the so-called ABBA-BABA test or D-statistic, which identifies excess allele sharing between nonsister taxa. Part of the appeal of D is its simplicity, but this also limits its informativeness, particularly about the timing and direction of introgression. Here we present a simple extension, D frequency spectrum or DFS, in which D is partitioned according to the frequencies of derived alleles. We use simulations over a large parameter space to show how DFS carries information about various factors. In particular, recent introgression reliably leads to a peak in DFS among low-frequency derived alleles, whereas violation of model assumptions can lead to a lack of signal at low frequencies. We also reanalyze published empirical data from six different animal and plant taxa, and interpret the results in the light of our simulations, showing how DFS provides novel insights. We currently see DFS as a descriptive tool that will augment both simple and sophisticated tests for introgression, but in the future it may be usefully incorporated into probabilistic inference frameworks.
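As a rough illustration of the statistic being partitioned, the sketch below computes D from per-site derived allele frequencies in three populations (P1, P2, P3), assuming the outgroup is fixed for the ancestral allele. The partitioning helper takes a caller-supplied frequency class, because the abstract does not say in which population the derived allele frequency is measured; both function names are hypothetical and this is not the authors' implementation.

```python
from collections import defaultdict


def abba_baba_d(sites):
    """D statistic from (p1, p2, p3) derived allele frequencies per site.

    Assumes the outgroup carries only the ancestral allele.
    """
    abba = sum((1 - p1) * p2 * p3 for p1, p2, p3 in sites)
    baba = sum(p1 * (1 - p2) * p3 for p1, p2, p3 in sites)
    total = abba + baba
    return (abba - baba) / total if total else float("nan")


def partitioned_d(sites, frequency_class):
    """Compute D separately within each frequency class.

    frequency_class maps a site tuple to a bin label; the choice of bins
    is left to the caller and is an assumption of this sketch.
    """
    groups = defaultdict(list)
    for site in sites:
        groups[frequency_class(site)].append(site)
    return {label: abba_baba_d(group) for label, group in sorted(groups.items())}


# Hypothetical sites: two ABBA patterns and one BABA pattern.
example_sites = [(0.0, 1.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(abba_baba_d(example_sites))
print(partitioned_d(example_sites, lambda s: round(max(s[0], s[1]), 1)))
```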
53
Marta M, Sánchez-Pozos K, Jaimes-Santoyo J, Monroy-Escutia J, Rivera-Santiago C, de Los Ángeles Granados-Silvestre M, Ortiz-López MG. Pharmacogenetic Evaluation of Metformin and Sulphonylurea Response in Mexican Mestizos with Type 2 Diabetes. Curr Drug Metab 2021; 21:291-300. [PMID: 32407269 DOI: 10.2174/1389200221666200514125443] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/02/2019] [Revised: 02/19/2020] [Accepted: 04/08/2020] [Indexed: 01/20/2023]
Abstract
BACKGROUND In Mexico, approximately 25% of patients with type 2 diabetes (T2D) have adequate glycemic control. Polymorphisms in pharmacogenetic genes have been shown to have clinical consequences resulting in drug toxicity or therapeutic inefficacy. OBJECTIVE The study aimed to evaluate the impact of variants in genes known to be involved in response to oral hypoglycemic drugs, such as CYP2C9, OCT, MATE, ABCA1 and C11orf65, in the Mexican Mestizo population of T2D patients. METHODS In this study, 265 patients with T2D were enrolled from the Hospital Juárez de México, Mexico City. Genotyping was performed by TaqMan® assays. SNP-SNP interactions were analyzed using the multifactor dimensionality reduction (MDR) method. RESULTS Carriers of the del allele of rs72552763 could achieve better glycemic control than noncarriers. There was a significant difference in plasma glucose and HbA1c levels among rs622342 genotypes. The results suggested an SNP-SNP interaction between rs72552763 and rs622342 in OCT1 and rs12943590 in MATE2. CONCLUSION The interaction between rs72552763 and rs622342 in OCT1 and rs12943590 in MATE2 suggests an important role for these polymorphisms in metformin response in the Mexican Mestizo population with T2D.
54
Friedlaender A, Tsantoulis P, Chevallier M, De Vito C, Addeo A. The Impact of Variant Allele Frequency in EGFR Mutated NSCLC Patients on Targeted Therapy. Front Oncol 2021; 11:644472. [PMID: 33869038 PMCID: PMC8044828 DOI: 10.3389/fonc.2021.644472] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 12/21/2020] [Accepted: 03/05/2021] [Indexed: 12/22/2022] Open
Abstract
EGFR mutations represent the most common currently targetable oncogenic driver in non-small cell lung cancer. There has been tremendous progress in targeting this alteration over the course of the last decade, and third generation tyrosine kinase inhibitors offer previously unseen survival rates among these patients. Nonetheless, a better understanding is still needed, as roughly a third of patients do not respond to targeted therapy and there is substantial heterogeneity among responders. Allelic frequency, or the variant EGFR allele frequency, corresponds to the fraction of sequencing reads harboring the mutation. The allelic fraction is influenced by the proportion of tumor cells in the sample, the presence of copy number alterations and, most importantly, the proportion of cells within the tumor that carry the mutation. Mutations that occur early in tumor evolution, often called clonal or truncal, have a higher allelic frequency than late, subclonal mutations, and are more often drivers of cancer evolution and attractive therapeutic targets. Most, but not all, EGFR mutations are clonal. Although an exact estimate of clonal proportion is hard to derive computationally, the allelic frequency is readily available to clinicians and could be a useful surrogate. We hypothesized that tumors with low allelic frequency of the EGFR mutation will respond less favorably to targeted treatment.
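The three influences named in the abstract (tumor cell proportion, copy number, and the fraction of tumor cells carrying the mutation) can be combined in a commonly used expected-frequency relation. The sketch below is an illustration under that standard model, not the authors' analysis; the parameter names and example values are hypothetical.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """Observed VAF: fraction of sequencing reads that carry the mutation."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def expected_vaf(purity, cancer_cell_fraction=1.0, mutant_copies=1,
                 tumor_copy_number=2, normal_copy_number=2):
    """Expected VAF given tumor purity, clonality, and copy number.

    purity: proportion of tumor cells in the sample (0 to 1).
    cancer_cell_fraction: proportion of tumor cells carrying the mutation.
    mutant_copies: mutated copies per carrying cell.
    """
    mutant_alleles = purity * cancer_cell_fraction * mutant_copies
    all_alleles = purity * tumor_copy_number + (1 - purity) * normal_copy_number
    return mutant_alleles / all_alleles


# Hypothetical sample: 60% tumor cells, diploid locus, heterozygous mutation.
clonal = expected_vaf(purity=0.6)                               # every tumor cell carries it
subclonal = expected_vaf(purity=0.6, cancer_cell_fraction=0.3)  # 30% of tumor cells carry it
print(round(clonal, 3), round(subclonal, 3))
```

Under these assumptions a clonal mutation yields a higher expected frequency than a subclonal one at the same purity, which is the relationship the abstract relies on when treating allelic frequency as a surrogate for clonality.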
55
Cheng X, Liu Y, Lin N, Deng S, Wan Q. Association between Interleukin-1β Polymorphism at Rs16944 and Glucose Metabolism: A Cohort Study. Immunol Invest 2021; 51:619-629. [PMID: 33739224 DOI: 10.1080/08820139.2020.1860085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/06/2022]
Abstract
Background: This study explored the correlation between the interleukin-1β gene rs16944 polymorphism and diabetes through epidemiological and follow-up investigations. Methods: The study was conducted on 600 subjects with normal glucose metabolism recruited from participants of the Risk Evaluation of cAncers in Chinese type 2 diabeTic Individuals: A lONgitudinal (REACTION) study in Luzhou, China in 2011. All subjects received a unified standardized questionnaire, physical examination, laboratory examination, and follow-up in 2016. Subjects were divided into normal glucose metabolism (NC), pre-diabetes (PDM), and type 2 diabetes mellitus (T2DM) groups according to their glucose metabolism after follow-up. The IL-1β gene rs16944 polymorphism was analyzed using the polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) technique. Results: After follow-up, 386, 156, and 58 cases were observed in the NC, PDM, and T2DM groups, respectively. Serum IL-1β levels were compared to baselines at follow-up in the 3 groups; the difference in the T2DM group was statistically significant. The frequency distributions of the IL-1β gene rs16944 genotypes, i.e., CC, CT, and TT, were significantly different in the 3 groups, and the distributions in the T2DM and NC groups were significantly different. The frequency distributions of the C and T alleles of IL-1β rs16944 were not significantly different. Logistic regression analysis identified the CC+CT genotype as an independent risk factor for the development of diabetes in patients with normal glucose metabolism (OR = 2.457, 95% CI: 1.238-4.877). Conclusions: The IL-1β gene rs16944 C/T polymorphism may cause genetic susceptibility to T2DM in the Luzhou population. The CC+CT genotypes may increase T2DM risk.
56
McGaughran A, Laver R, Fraser C. Evolutionary Responses to Warming. Trends Ecol Evol 2021; 36:591-600. [PMID: 33726946 DOI: 10.1016/j.tree.2021.02.014] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 12/06/2020] [Revised: 02/23/2021] [Accepted: 02/26/2021] [Indexed: 12/24/2022]
Abstract
Climate change is predicted to dramatically alter biological diversity and distributions, driving extirpations, extinctions, and extensive range shifts across the globe. Warming can also, however, lead to phenotypic or behavioural plasticity, as species adapt to new conditions. Recent genomic research indicates that some species are capable of rapid evolution as selection favours adaptive responses to environmental change and altered or novel niche spaces. New advances are providing mechanistic insights into how temperature might accelerate evolution in the Anthropocene. These discoveries highlight intriguing new research directions - such as using geothermal and polar systems combined with powerful genomic tools - that will help us to understand the processes underpinning adaptive evolution and better project how ecosystems will change in a warming world.
57
Davidson AL, Leonard C, Koufariotis LT, Parsons MT, Hollway GE, Pearson JV, Newell F, Waddell N, Spurdle AB. Considerations for using population frequency data in germline variant interpretation: Cancer syndrome genes as a model. Hum Mutat 2021; 42:530-536. [PMID: 33600021 DOI: 10.1002/humu.24183] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/16/2020] [Revised: 02/05/2021] [Accepted: 02/14/2021] [Indexed: 01/01/2023]
Abstract
Aggregate population genomics data from large cohorts are vital for assessing germline variant pathogenicity. However, there are no specifications on how sequencing quality metrics should be considered, and whether exome-derived and genome-derived allele frequencies should be considered in isolation. Germline genome sequence data were simulated for nine read-depths to identify a minimum acceptable read-depth for detecting variants. gnomAD exome-derived and genome-derived datasets were assessed for read-depth, for six key cancer genes selected for variant curation by ClinGen expert panels. Non-Finnish European allele frequency (AF) or filter AF of coding variants in these genes, assigned into frequency bins using modified ACMG-AMP criteria, was compared between exome-derived and genome-derived datasets. A 30X read-depth achieved acceptable precision and recall for detection of substitutions, but poor recall for small insertions/deletions. Exome-derived and genome-derived datasets exhibited low read-depth for different gene exons. Individual variants were mostly assigned to non-divergent AF bins (>95%) or filter AF bins (>97%). Two major bin divergences were resolved by applying the minimal acceptable read-depth threshold. These findings show the importance of assessing read-depth separately for population datasets sourced from different short-read sequencing technologies before assigning a frequency-based ACMG-AMP classification code for variant interpretation.
58
Bogaerts‐Márquez M, Guirao‐Rico S, Gautier M, González J. Temperature, rainfall and wind variables underlie environmental adaptation in natural populations of Drosophila melanogaster. Mol Ecol 2021; 30:938-954. [PMID: 33350518 PMCID: PMC7986194 DOI: 10.1111/mec.15783] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/07/2020] [Revised: 12/16/2020] [Accepted: 12/18/2020] [Indexed: 02/06/2023]
Abstract
While several studies in a diverse set of species have shed light on the genes underlying adaptation, our knowledge of the selective pressures that explain the observed patterns lags behind. Drosophila melanogaster is a valuable organism to study environmental adaptation because this species originated in Southern Africa and has recently expanded worldwide, and also because it has a functionally well-annotated genome. In this study, we aimed to decipher which environmental variables are relevant for adaptation of D. melanogaster natural populations in Europe and North America. We analysed 36 whole-genome pool-seq samples of D. melanogaster natural populations collected in 20 European and 11 North American locations. We used the BayPass software to identify single nucleotide polymorphisms (SNPs) and transposable elements (TEs) showing signatures of adaptive differentiation across populations, as well as significant associations with 59 environmental variables related to temperature, rainfall, evaporation, solar radiation, wind, daylight hours, and soil type. We found that in addition to temperature and rainfall, wind-related variables are also relevant for D. melanogaster environmental adaptation. Interestingly, 23%-51% of the genes that showed significant associations with environmental variables were not found overly differentiated across populations. In addition to SNPs, we also identified 10 reference transposable element insertions associated with environmental variables. Our results showed that genome-environment association analysis can identify adaptive genetic variants that are undetected by population differentiation analysis while also allowing the identification of candidate environmental drivers of adaptation.
59
Innan H, Sakamoto T. Multi-dimensional diffusion process of allele frequencies in population genetics. Proc Jpn Acad Ser B Phys Biol Sci 2021; 97:134-143. [PMID: 33692229 PMCID: PMC8019856 DOI: 10.2183/pjab.97.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2020] [Accepted: 01/04/2021] [Indexed: 06/12/2023]
Abstract
One of the ultimate goals of population genetics is to theoretically describe the behavior of allele frequency. Diffusion theory has been commonly used for this purpose mainly in one-locus one-population models, although it is not easy to handle diffusion theory in models with multiple loci or with multiple populations. This review introduces several successful cases, where multi-dimensional diffusion equations contributed to addressing evolutionary questions, thereby demonstrating its strong potential in population genetics.
60
Khacha-Ananda S, Mahawong P. Genetic analysis of 12 X-short tandem repeats loci in a northern Thai population. Med Sci Law 2021; 61:34-43. [PMID: 33045921 DOI: 10.1177/0025802420965000] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Short tandem repeats (STRs) are widely used as DNA markers in paternity testing and criminal investigations because of their high genetic polymorphism among individuals in a population. However, many factors influence genetic variations of STRs. Therefore, understanding STR information within individual populations could provide a database and scientifically reliable STR genotyping for forensic genetic purposes. We aimed to examine allele frequencies of X-STRs, including some forensic parameters, in a northern Thai population. A retrospective descriptive study was conducted by collecting X-STR data from unrelated individuals living in a northern region of Thailand. The allele frequency and forensic parameters - for example polymorphism information content (PIC), power of discrimination in females and males (PDf and PDm), mean exclusion chance (MEC) and haplotype frequency - were calculated. The Hardy-Weinberg equilibrium was analysed. A total of 132 alleles were observed, with corresponding allele frequency ranging from 0.0064 to 0.4904. The PIC of all loci was >0.6, representing high genetic polymorphism, except DXS8378 and DXS7423. Notably, DXS10135 was the most diverse locus with the highest PD and MEC, while DXS7423 was the least polymorphic marker with the lowest PD and MEC. The highest haplotype diversity in male data was on linkage group III (DXS10101-DXS10103-HPRTB), at 0.9895. The genetic distance analysis demonstrated that the northern Thai population had a close relationship with Taiwanese (DA = 0.023). No locus deviated significantly from Hardy-Weinberg equilibrium except DXS10148. This study has established a northern Thai X-STRs reference database to be used as a tool for forensic genetic purposes.
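Several of the forensic parameters named above follow directly from the allele frequencies at a locus. The sketch below uses the textbook expressions for PIC and for X-chromosomal power of discrimination in males and females; the abstract does not state which formulas the authors applied, so these are an assumption, and the four-allele locus is hypothetical.

```python
def forensic_parameters(allele_freqs):
    """PIC, PD (male) and PD (female) for one X-STR locus.

    allele_freqs: allele frequencies at the locus, summing to 1.
    PIC       = 1 - sum(p^2) - sum over i<j of 2 * p_i^2 * p_j^2
    PD male   = 1 - sum(p^2)
    PD female = 1 - 2 * (sum(p^2))^2 + sum(p^4)
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    s2 = sum(p ** 2 for p in allele_freqs)
    s4 = sum(p ** 4 for p in allele_freqs)
    # sum over i<j of 2 * p_i^2 * p_j^2 equals (sum p^2)^2 - sum p^4
    pic = 1 - s2 - (s2 ** 2 - s4)
    pd_male = 1 - s2
    pd_female = 1 - 2 * s2 ** 2 + s4
    return {"PIC": pic, "PDm": pd_male, "PDf": pd_female}


# Hypothetical locus with four equally frequent alleles.
params = forensic_parameters([0.25, 0.25, 0.25, 0.25])
print(params)
```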
61
Wei KHC, Mantha A, Bachtrog D. The Theory and Applications of Measuring Broad-Range and Chromosome-Wide Recombination Rate from Allele Frequency Decay around a Selected Locus. Mol Biol Evol 2020; 37:3654-3671. [PMID: 32658965 PMCID: PMC7743735 DOI: 10.1093/molbev/msaa171] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/22/2022] Open
Abstract
Recombination is the exchange of genetic material between homologous chromosomes via physical crossovers. High-throughput sequencing approaches detect crossovers genome wide to produce recombination rate maps but are difficult to scale as they require large numbers of recombinants individually sequenced. We present a simple and scalable pooled-sequencing approach to experimentally infer near chromosome-wide recombination rates by taking advantage of non-Mendelian allele frequency generated from a fitness differential at a locus under selection. As more crossovers decouple the selected locus from distal loci, the distorted allele frequency attenuates distally toward Mendelian and can be used to estimate the genetic distance. Here, we use marker selection to generate distorted allele frequency and theoretically derive the mathematical relationships between allele frequency attenuation, genetic distance, and recombination rate in marker-selected pools. We implemented nonlinear curve-fitting methods that robustly estimate the allele frequency decay from batch sequencing of pooled individuals and derive chromosome-wide genetic distance and recombination rates. Empirically, we show that marker-selected pools closely recapitulate genetic distances inferred from scoring recombinants. Using this method, we generated novel recombination rate maps of three wild-derived strains of Drosophila melanogaster, which strongly correlate with previous measurements. Moreover, we show that this approach can be extended to estimate chromosome-wide crossover interference with reciprocal marker selection and discuss how it can be applied in the absence of visible markers. Altogether, we find that our method is a simple and cost-effective approach to generate chromosome-wide recombination rate maps requiring only one or two libraries.
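The attenuation described above can be sketched with a deliberately simplified model. It assumes that, among selected gametes, the allele linked to the marker has frequency 1 - r at a locus with recombination fraction r, and that r relates to genetic distance through Haldane's map function, which ignores crossover interference. The abstract does not give the authors' derivation, so both assumptions and all function names here are illustrative only.

```python
import math


def haldane_recombination_fraction(distance_morgans):
    """Haldane map function: recombination fraction from genetic distance."""
    return 0.5 * (1 - math.exp(-2 * distance_morgans))


def haldane_distance(recombination_fraction):
    """Inverse Haldane map function: genetic distance (Morgans) from r."""
    if not 0 <= recombination_fraction < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -0.5 * math.log(1 - 2 * recombination_fraction)


def linked_allele_frequency(distance_morgans):
    """Assumed frequency of the marker-linked allele in the selected pool.

    Equals 1 at the selected locus and decays toward the Mendelian 0.5.
    """
    return 1 - haldane_recombination_fraction(distance_morgans)


def distance_from_frequency(allele_frequency):
    """Invert the assumed decay to estimate distance from the selected locus."""
    return haldane_distance(1 - allele_frequency)


for d in (0.0, 0.1, 0.5, 2.0):
    print(d, round(linked_allele_frequency(d), 4))
```

Fitting this decay curve to pooled-sequencing allele frequencies along a chromosome would, under these assumptions, give a genetic distance for each position; the published method uses its own derivations and curve-fitting procedure.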
62
Chien WM, Chang CT, Chiang YC, Hwang SY. Ecological Factors Generally Not Altitude Related Played Main Roles in Driving Potential Adaptive Evolution at Elevational Range Margin Populations of Taiwan Incense Cedar (Calocedrus formosana). Front Genet 2020; 11:580630. [PMID: 33262787 PMCID: PMC7686793 DOI: 10.3389/fgene.2020.580630] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/06/2020] [Accepted: 10/21/2020] [Indexed: 12/05/2022] Open
Abstract
Population diversification can be shaped by a combination of environmental factors as well as geographic isolation interacting with gene flow. We surveyed genetic variation of 243 samples from 12 populations of Calocedrus formosana using amplified fragment length polymorphism (AFLP) and scored a total of 437 AFLP fragments using 11 selective amplification primer pairs. The AFLP variation was used to assess the role of gene flow in the pattern of genetic diversity and to test the role of environments in driving population adaptive evolution. This study found a relatively lower level of genetic diversity and a higher level of population differentiation in C. formosana compared with those estimated in previous studies of conifers occurring in Taiwan, including Cunninghamia konishii, Keteleeria davidiana var. formosana, and Taiwania cryptomerioides. BAYESCAN detected 26 FST outlier loci that were found to be associated strongly with various environmental variables using multiple univariate logistic regression, latent factor mixed model, and Bayesian logistic regression. We found several environmentally dependent adaptive loci with high frequencies in low- or high-elevation populations, suggesting their involvement in local adaptation. Ecological factors, including relative humidity and sunshine hours, that are generally not altitude related could have been the most important selective drivers for population divergent evolution in C. formosana. The present study provides fundamental information in relation to adaptive evolution and can be useful for assisted migration programs of C. formosana in the future conservation of this species.
63
Zou J, Shen G, Qiang W, Zhu YY, Li WX. Study on the polymorphisms of HLA-A, -B, -C, -DQB1 and -DRB1 alleles and haplotypes in Hubei Han population of China. Int J Immunogenet 2020; 48:8-15. [PMID: 32996280 DOI: 10.1111/iji.12516] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/20/2020] [Revised: 09/02/2020] [Accepted: 09/04/2020] [Indexed: 11/30/2022]
Abstract
The present study aimed to analyse the frequencies of human leukocyte antigen HLA-A, -B, -C, -DQB1 and -DRB1 alleles and haplotypes in 3,732 Han individuals from Hubei, China. All samples were typed at the HLA-A, -B, -C, -DQB1 and -DRB1 loci using the sequence-based typing method; subsequently, the HLA polymorphisms were analysed. A total of 47 HLA-A, 89 HLA-B, 43 HLA-C, 49 HLA-DRB1 and 24 HLA-DQB1 alleles were identified in the Hubei Han population. The three most frequent alleles at each locus were A*11:01 (0.2617), A*24:02 (0.1590) and A*02:07 (0.1281); B*46:01 (0.1502), B*40:01 (0.1409) and B*58:01 (0.0616); C*01:02 (0.2023), C*07:02 (0.1691) and C*03:04 (0.1175); DQB1*03:01 (0.2000), DQB1*03:03 (0.1900) and DQB1*06:01 (0.1187); and DRB1*09:01 (0.1790), DRB1*15:01 (0.1062) and DRB1*12:02 (0.0841), respectively. Meanwhile, the three most frequent two-loci haplotypes were A*02:07-C*01:02 (0.0929), B*46:01-C*01:02 (0.1366) and DQB1*03:03-DRB1*09:01 (0.1766). The three most frequent three-loci haplotypes were A*02:07-B*46:01-C*01:02 (0.0883), B*46:01-DQB1*03:03-DRB1*09:01 (0.0808) and C*01:02-DQB1*03:03-DRB1*09:01 (0.0837). The three most frequent four-loci haplotypes were A*02:07-B*46:01-C*01:02-DQB1*03:03 (0.0494), B*46:01-DRB1*09:01-C*01:02-DQB1*03:03 (0.0729) and A*02:07-B*46:01-DQB1*03:03-DRB1*09:01 (0.0501). The most frequent five-loci haplotype was A*02:07-B*46:01-C*01:02-DQB1*03:03-DRB1*09:01 (0.0487). Heat maps and multiple correspondence analysis based on the frequencies of HLA specificity indicated that the Hubei Han population may be grouped with Southern Chinese populations. Our results lay a foundation for future population studies, disease association studies and donor recruitment strategies.
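The reported allele and haplotype frequencies allow a quick check of how far a haplotype departs from independence between its loci. The sketch below computes the standard linkage disequilibrium coefficient D and its normalized form D' for A*02:07-C*01:02 using the frequencies quoted above; the abstract itself does not report linkage disequilibrium, so this is a worked illustration rather than a result of the study, and the function name is hypothetical.

```python
def linkage_disequilibrium(p_a, p_b, p_ab):
    """Return (D, D') for two alleles given allele and haplotype frequencies.

    D  = p_ab - p_a * p_b
    D' = D / D_max, where D_max depends on the sign of D.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max else float("nan")
    return d, d_prime


# Frequencies quoted in the abstract for A*02:07, C*01:02 and their haplotype.
d_value, d_prime_value = linkage_disequilibrium(0.1281, 0.2023, 0.0929)
print(round(d_value, 4), round(d_prime_value, 3))
```

A positive D means the two alleles occur together more often than their individual frequencies would predict under independence.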
64
Shikov AE, Barbitoff YA, Glotov AS, Danilova MM, Tonyan ZN, Nasykhova YA, Mikhailova AA, Bespalova ON, Kalinin RS, Mirzorustamova AM, Kogan IY, Baranov VS, Chernov AN, Pavlovich DM, Azarenko SV, Fedyakov MA, Tsay VV, Eismont YA, Romanova OV, Hobotnikov DN, Vologzhanin DA, Mosenko SV, Ponomareva TA, Talts YA, Anisenkova AU, Lisovets DG, Sarana AM, Urazov SP, Scherbak SG, Glotov OS. Analysis of the Spectrum of ACE2 Variation Suggests a Possible Influence of Rare and Common Variants on Susceptibility to COVID-19 and Severity of Outcome. Front Genet 2020; 11:551220. [PMID: 33133145 PMCID: PMC7550667 DOI: 10.3389/fgene.2020.551220] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 04/14/2020] [Accepted: 08/28/2020] [Indexed: 01/08/2023] Open
Abstract
OBJECTIVES In March 2020, the World Health Organization declared that an infectious respiratory disease caused by a new severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2, causing coronavirus disease 2019 (COVID-19)] became a pandemic. In our study, we have analyzed a large publicly available dataset, the Genome Aggregation Database (gnomAD), as well as a cohort of 37 Russian patients with COVID-19 to assess the influence of different classes of genetic variants in the angiotensin-converting enzyme-2 (ACE2) gene on the susceptibility to COVID-19 and the severity of disease outcome. RESULTS We demonstrate that the European populations slightly differ in alternative allele frequencies at the 2,754 variant sites in ACE2 identified in the gnomAD database. We find that the Southern European population has a lower frequency of missense variants and slightly higher frequency of regulatory variants. However, we found no statistical support for the significance of these differences. We also show that the Russian population is similar to other European populations when comparing the frequencies of the ACE2 variants. Evaluation of the effect of various classes of ACE2 variants on COVID-19 outcome in a cohort of Russian patients showed that common missense and regulatory variants do not explain the differences in disease severity. At the same time, we find several rare ACE2 variants (including rs146598386, rs73195521, rs755766792, and others) that are likely to affect the outcome of COVID-19. Our results demonstrate that the spectrum of genetic variants in ACE2 may partially explain the differences in severity of the COVID-19 outcome.
65
Wang J, Liu H, Bertrand RE, Sarrion-Perdigones A, Gonzalez Y, Venken KJT, Chen R. A novel statistical method for interpreting the pathogenicity of rare variants. Genet Med 2020; 23:59-68. [PMID: 32884132 PMCID: PMC7796914 DOI: 10.1038/s41436-020-00948-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/22/2020] [Revised: 08/11/2020] [Accepted: 08/12/2020] [Indexed: 01/09/2023] Open
Abstract
Purpose: To achieve the ultimate goal of personalized treatment of patients, accurate molecular diagnosis and precise interpretation of the impact of genetic variants on gene function are essential. With sequencing costs becoming increasingly affordable, accurately distinguishing benign from pathogenic variants after sequencing has become the major bottleneck. Although large normal population sequence databases have become a key resource in filtering benign variants, they are not effective at filtering extremely rare variants. Methods: To address this challenge, we developed a novel statistical test by combining sequencing data from a patient cohort with a normal control population database. By comparing the expected and observed allele frequency in the patient cohort, variants that are likely benign can be identified. Results: The performance of this new method is evaluated on both simulated and real datasets coupled with experimental validation. As a result, we demonstrate that this new test is well-powered to identify benign variants and is particularly effective for variants with low frequency in the normal population. Conclusion: Overall, as a general test that can be applied to any type of variant in the context of all Mendelian diseases, our work provides a general framework for filtering benign variants with very low population allele frequency.
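The comparison of expected and observed allele frequency can be illustrated with a generic one-sided binomial test. This is a stand-in chosen for illustration, not the authors' statistic, which the abstract does not specify; the function name, cohort size, and frequencies are hypothetical.

```python
import math


def binomial_upper_tail(observed_count, n_alleles, expected_freq):
    """P(X >= observed_count) for X ~ Binomial(n_alleles, expected_freq).

    Terms are evaluated in log space so large cohorts do not overflow.
    Requires 0 < expected_freq < 1.
    """
    if not 0 < expected_freq < 1:
        raise ValueError("expected_freq must be strictly between 0 and 1")
    log_p = math.log(expected_freq)
    log_q = math.log(1 - expected_freq)
    total = 0.0
    for i in range(observed_count, n_alleles + 1):
        log_choose = (math.lgamma(n_alleles + 1) - math.lgamma(i + 1)
                      - math.lgamma(n_alleles - i + 1))
        total += math.exp(log_choose + i * log_p + (n_alleles - i) * log_q)
    return min(1.0, total)


# Hypothetical cohort: 1,000 patients (2,000 alleles) and a variant whose
# control-population allele frequency is 0.0005 (one expected copy).
p_enriched = binomial_upper_tail(observed_count=6, n_alleles=2000, expected_freq=0.0005)
p_as_expected = binomial_upper_tail(observed_count=1, n_alleles=2000, expected_freq=0.0005)
print(p_enriched, p_as_expected)
```

Under this stand-in, a variant seen in patients about as often as the control frequency predicts gives a large tail probability, which is consistent with a benign variant, whereas strong enrichment gives a small one.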
66
Requena D, Médico A, Chacón RD, Ramírez M, Marín-Sánchez O. Identification of Novel Candidate Epitopes on SARS-CoV-2 Proteins for South America: A Review of HLA Frequencies by Country. Front Immunol 2020; 11:2008. [PMID: 33013857 PMCID: PMC7494848 DOI: 10.3389/fimmu.2020.02008] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/15/2020] [Accepted: 07/24/2020] [Indexed: 01/08/2023] Open
Abstract
Coronavirus disease (COVID-19), caused by the virus SARS-CoV-2, is already responsible for more than 4.3 million confirmed cases and 295,000 deaths worldwide as of May 15, 2020. Ongoing efforts to control the pandemic include the development of peptide-based vaccines and diagnostic tests. In these approaches, HLA allelic diversity plays a crucial role. Despite its importance, current knowledge of HLA allele frequencies in South America is very limited. In this study, we have performed a literature review of datasets reporting HLA frequencies of South American populations, available in scientific literature and/or in the Allele Frequency Net Database. This allowed us to enrich the current scenario with more than 12.8 million data points. As a result, we are presenting updated HLA allelic frequencies based on country, including 91 alleles that were previously thought to have frequencies either under 5% or of an unknown value. Using alleles with an updated frequency of at least 5% in any South American country, we predicted epitopes in SARS-CoV-2 proteins using NetMHCpan (I and II) and MHCflurry. Then, the best predicted epitopes (class-I and -II) were selected based on their binding to South American alleles (Coverage Score). Class II predicted epitopes were also filtered based on their three-dimensional exposure. We obtained 14 class-I and four class-II candidate epitopes with experimental evidence (reported in the Immune Epitope Database and Analysis Resource), having good coverage scores for South America. Additionally, we are presenting 13 HLA-I and 30 HLA-II novel candidate epitopes without experimental evidence, including 16 class-II candidates in highly exposed conserved areas of the NTD and RBD regions of the Spike protein. These novel candidates have even better coverage scores for South America than those with experimental evidence. Finally, we show that recent similar studies presenting candidate epitopes also predicted some of our candidates but discarded them in the selection process, resulting in candidates with suboptimal coverage for South America. In conclusion, the candidate epitopes presented provide valuable information for the development of epitope-based strategies against SARS-CoV-2, such as peptide vaccines and diagnostic tests. Additionally, the updated HLA allelic frequencies provide a better representation of South America and may impact different immunogenetic studies.
|
67
|
Bodner M, Parson W. The STRidER Report on Two Years of Quality Control of Autosomal STR Population Datasets. Genes (Basel) 2020; 11:E901. [PMID: 32784546 PMCID: PMC7463946 DOI: 10.3390/genes11080901] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/03/2020] [Revised: 08/03/2020] [Accepted: 08/04/2020] [Indexed: 01/20/2023] Open
Abstract
STRidER, the STRs for Identity ENFSI Reference Database, is a curated, freely publicly available online allele frequency database, quality control (QC) and software platform for autosomal Short Tandem Repeats (STRs) developed under the endorsement of the International Society for Forensic Genetics. Continuous updates comprise additional STR loci and populations in the frequency database and many further STR-related aspects. One significant innovation is the autosomal STR data QC provided prior to publication of datasets. Such scrutiny was lacking previously, leaving QC to authors, reviewers and editors, which led to an unacceptably high error rate in scientific papers. The results from scrutinizing 184 STR datasets containing >177,000 individual genotypes submitted in the first two years of STRidER QC since 2017 revealed that about two-thirds of the STR datasets were either withdrawn by the authors after initial feedback or rejected based on a conservative error rate. Almost no error-free submissions were received, which clearly shows that centralized QC and data curation are essential to maintain the high-quality standard required in forensic genetics. While many errors had minor impact on the resulting allele frequencies, multiple error categories were commonly found within single datasets. Several datasets contained serious flaws. We discuss the factors that caused the errors to draw attention to redundant pitfalls and thus contribute to better quality of autosomal STR datasets and allele frequency reports.
|
68
|
Dissanayake R, Braich S, Cogan NOI, Smith K, Kaur S. Characterization of Genetic and Allelic Diversity Amongst Cultivated and Wild Lentil Accessions for Germplasm Enhancement. Front Genet 2020; 11:546. [PMID: 32587602 PMCID: PMC7298104 DOI: 10.3389/fgene.2020.00546] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/09/2019] [Accepted: 05/06/2020] [Indexed: 12/13/2022] Open
Abstract
Intensive breeding of cultivated lentil has resulted in a relatively narrow genetic base, which limits the options to increase crop productivity through selection. Assessment of genetic diversity in the wild gene pool of lentil, as well as characterization of useful and novel alleles/genes that can be introgressed into elite germplasm, presents new opportunities and pathways for germplasm enhancement, followed by successful crop improvement. In the current study, a lentil collection consisting of 467 wild and cultivated accessions that originated from 10 diverse geographical regions was assessed, to understand genetic relationships among different lentil species/subspecies. A total of 422,101 high-confidence SNP markers were identified against the reference lentil genome (cv. CDC Redberry). Phylogenetic analysis clustered the germplasm collection into four groups, namely, Lens culinaris/Lens orientalis, Lens lamottei/Lens odemensis, Lens ervoides, and Lens nigricans. A weak correlation was observed between geographical origin and genetic relationship, except for some accessions of L. culinaris and L. ervoides. Genetic distance matrices revealed a comparable level of variation within the gene pools of L. culinaris (Nei’s coefficient 0.01468–0.71163), L. ervoides (Nei’s coefficient 0.01807–0.71877), and L. nigricans (Nei’s coefficient 0.02188–1.2219). In order to understand any genic differences at species/subspecies level, allele frequencies were calculated from a subset of 263 lentil accessions. Among all cultivated and wild lentil species, L. nigricans exhibited the greatest allelic differentiation across the genome compared to all other species/subspecies. Major differences were observed on six genomic regions with the largest being on Chromosome 1 (c. 1 Mbp). These results indicate that L. nigricans is the most distantly related to L. culinaris and additional structural variations are likely to be identified from genome sequencing studies. 
This would provide further insights into evolutionary relationships between cultivated and wild lentil germplasm, for germplasm improvement and introgression.
|
69
|
Wang W, Zhang W, Zhang J, He J, Zhu F. Distribution of HLA allele frequencies in 82 Chinese individuals with coronavirus disease-2019 (COVID-19). HLA 2020; 96:194-196. [PMID: 32424945 PMCID: PMC7276866 DOI: 10.1111/tan.13941] [Citation(s) in RCA: 131] [Impact Index Per Article: 32.8] [Received: 04/29/2020] [Revised: 05/12/2020] [Accepted: 05/15/2020] [Indexed: 01/04/2023]
Abstract
COVID‐19 is a respiratory disease caused by a novel coronavirus and is currently a global pandemic. HLA variation is associated with COVID‐19 because HLA plays a pivotal role in the immune response to pathogens. Here, 82 individuals with COVID‐19 were genotyped for HLA‐A, ‐B, ‐C, ‐DRB1, ‐DRB3/4/5, ‐DQA1, ‐DQB1, ‐DPA1, and ‐DPB1 loci using next‐generation sequencing (NGS). Frequencies of the HLA‐C*07:29, C*08:01G, B*15:27, B*40:06, DRB1*04:06, and DPB1*36:01 alleles were higher, while the frequencies of the DRB1*12:02 and DPB1*04:01 alleles were lower in COVID‐19 patients than in the control population, with uncorrected statistical significance. Only HLA‐C*07:29 and B*15:27 were significant when the corrected P‐value was considered. These data suggested that some HLA alleles may be associated with the occurrence of COVID‐19.
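The gap between uncorrected and corrected significance reported above can be illustrated with a minimal sketch. The abstract does not state which correction was applied, so Bonferroni is assumed here purely for illustration; the function name `bonferroni`, the allele labels, and the p-values are all hypothetical and are not taken from the study.

```python
def bonferroni(p_values):
    """Return Bonferroni-corrected p-values: min(1, p * number_of_tests).

    Assumption: Bonferroni is used only as an example of a multiple-testing
    correction; the study's actual method is not given in the abstract.
    """
    n_tests = len(p_values)
    return {allele: min(1.0, p * n_tests) for allele, p in p_values.items()}


# Hypothetical uncorrected p-values, invented for illustration only.
example = {"allele_1": 0.001, "allele_2": 0.02, "allele_3": 0.04, "allele_4": 0.5}

corrected = bonferroni(example)

# Alleles that stay below the 0.05 threshold after correction.
significant = [allele for allele, p in corrected.items() if p < 0.05]
print(significant)
```

In this toy input, three alleles pass the 0.05 threshold before correction but only one passes afterwards, which mirrors the pattern the abstract describes without reproducing its numbers.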
|
70
|
Zuo Q, Duan Y, Wang B, Xu H, Wu W, Zhao J, Wu D, Chu X, Chen W. Genomic analysis of blood samples with serologic ABO discrepancy identifies 12 novel alleles in a Chinese Han population. Transfus Med 2020; 30:308-316. [PMID: 32452063 DOI: 10.1111/tme.12686] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/24/2019] [Revised: 02/13/2020] [Accepted: 04/24/2020] [Indexed: 11/29/2022]
Abstract
OBJECTIVES This study aimed at identifying new ABO alleles from 155 unrelated blood samples with potential ABO discrepancy in a Chinese Han population of 835 144 donors. BACKGROUND Serological strategies and genotyping are crucial for the precise determination of ABO discrepancy. METHODS Their ABO phenotypes and plasma glycosyltransferase activity were determined by standard forward and reverse typing and dilution tests. The genomic DNA of the ABO gene was amplified by polymerase chain reaction and sequenced. The frequency of ABO subgroup alleles associated with ABO discrepancy was analysed. RESULTS Serological analysis indicated that 53, 96 and 6 samples with ABO discrepancy were identified in the A, B and O subgroups, respectively. Genetic analysis revealed 12 novel alleles among the 46 associated with serologic ABO discrepancy. The majority of novel alleles arose from point mutations or single-base insertions in Exons 6 to 7 of the ABO gene. The most frequent alleles were ABO*cisAB.01 (14/53, 26.42%) and ABO*A2.05 (7/53, 13.2%) in the A subgroup and ABO*BA.02 (34/96, 35.42%) and ABO*BEL.11 (15/96, 15.62%) in the B subgroup. Samples with the same ABO subgroup allele displayed different phenotypes, such as ABO*AX.13, ABO*BW.03, ABO*BW.12, ABO*BW.15, ABO*BEL.03, ABO*BEL.10 and ABO*BEL.11. CONCLUSION This study identified 12 novel alleles among the 46 associated with serologic ABO discrepancies. ABO genotyping is needed for the accurate evaluation of blood phenotype to improve the safety of blood transfusion.
|
71
|
Hashimoto S, Nakajima F, Imanishi T, Kawai Y, Kato K, Kimura T, Miyata S, Takanashi M, Nishio M, Tokunaga K, Satake M. Implications of HLA diversity among regions for bone marrow donor searches in Japan. HLA 2020; 96:24-42. [PMID: 32222025 DOI: 10.1111/tan.13881] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/22/2019] [Revised: 03/03/2020] [Accepted: 03/18/2020] [Indexed: 12/20/2022]
Abstract
Japan is an island country, and the Japanese people have had minimal genetic exchange with other ethnolinguistic groups. Consequently, the population is highly uniform and has limited HLA diversity relative to people from other countries. However, Japan has three ethnolinguistic groups, and HLA distributions differ depending on geographic region. To collect an HLA-rich variety of bone marrow bank donor registrants, it is essential to know the precise distribution of HLA in Japan. We analyzed HLA alleles and haplotypes based on HLA information of 177 041 bone marrow donor registrants. Registrants were grouped depending on the prefecture and region (a group of prefectures) as commonly used in Japan. The prefectures did not show the same distributions, but the tendency was similar for each region. We found that Okinawa Prefecture and the mainland can be clearly divided as haplotypes: [A*24:02-C*01:02-B*54:01-DRB1*04:05] and [A*24:02-C*01:02-B*59:01-DRB1*04:05] were typically found in Okinawa (P = .02, P < .001). Moreover, these types were found almost exclusively in Japan and Korea. Donor registration centers of the Japan Marrow Donor Program are currently located in all prefectures. It is essential to deploy registration centers to collect registrants with a large variety of HLA types covering all of Japan.
|
72
|
Do MD, Le LGH, Nguyen VT, Dang TN, Nguyen NH, Vu HA, Mai TP. High-Resolution HLA Typing of HLA-A, -B, -C, -DRB1, and -DQB1 in Kinh Vietnamese by Using Next-Generation Sequencing. Front Genet 2020; 11:383. [PMID: 32425978 PMCID: PMC7204072 DOI: 10.3389/fgene.2020.00383] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 11/07/2019] [Accepted: 03/27/2020] [Indexed: 12/19/2022] Open
Abstract
Human leukocyte antigen (HLA) genotyping displays the particular characteristics of HLA alleles and haplotype frequencies in each population. Although it is considered the current gold standard for HLA typing, high-resolution sequence-based HLA typing is currently unavailable in Kinh Vietnamese populations. In this study, high-resolution sequence-based HLA typing (3-field) was performed using an amplicon-based next-generation sequencing platform to identify the HLA-A, -B, -C, -DRB1, and -DQB1 alleles of 101 unrelated healthy Kinh Vietnamese individuals from southern Vietnam. A total of 28 HLA-A, 41 HLA-B, 21 HLA-C, 26 HLA-DRB1, and 25 HLA-DQB1 alleles were identified. The most frequently occurring HLA alleles were A∗11:01:01, B∗15:02:01, C∗07:02:01, DRB1∗12:02:01, and DQB1∗03:01:01. Haplotype calculation showed that A∗29:01:01∼B∗07:05:01, DRB1∗12:02:01∼DQB1∗3:01:01, A∗29:01:01∼C∗15:05:02∼B∗07:05:01, A∗33:03:01∼B∗58:01:01∼DRB1∗03:01:01, and A∗29:01:01∼C∗15:05:02∼B∗07:05:01∼DRB1∗10:01:01∼DQB1∗05:01:01 were the most common haplotypes in the southern Kinh Vietnamese population. Allele distribution and haplotype analyses demonstrated that the Vietnamese population shares HLA features with South-East Asians but retains unique characteristics. Data from this study will be potentially applicable in medicine and anthropology.
|
73
|
Wu P, Hou L, Zhang Y, Zhang L. Phylogenetic Tree Inference: A Top-Down Approach to Track Tumor Evolution. Front Genet 2020; 10:1371. [PMID: 32117420 PMCID: PMC7020887 DOI: 10.3389/fgene.2019.01371] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/29/2019] [Accepted: 12/16/2019] [Indexed: 12/21/2022] Open
Abstract
Recently, an increasing number of studies sequence multiple biopsies of primary tumors, and even paired metastatic tumors, to understand heterogeneity and the evolutionary trajectory of cancer progression. Although several algorithms are available to infer the phylogeny, most tools rely on accurate measurements of mutation allele frequencies from deep sequencing, which is often hard to achieve for clinical samples (especially FFPE samples). In this study, we present a novel and easy-to-use method, PTI (Phylogenetic Tree Inference), which uses an iterative top-down approach to infer the phylogenetic tree structure of multiple tumor biopsies from the same patient using just the presence or absence of somatic mutations without their allele frequencies. Therefore, PTI can be used in a wide range of cases even when allele frequency data is not available. Comparison with existing state-of-the-art methods, such as LICHeE, Treeomics, and BAMSE, shows that PTI achieves similar or slightly better performance within a short run time. Moreover, this method is generally applicable to infer phylogeny for any other data sets (such as epigenetics) with a similar zero-and-one feature-by-sample matrix.
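The zero-and-one feature-by-sample matrix mentioned above can be sketched as follows. This is not the PTI algorithm, whose steps the abstract does not give; it only shows how presence/absence calls might be arranged into such a matrix and how mutations could be grouped by how many biopsies carry them. The function names, biopsy labels, and mutation identifiers are hypothetical.

```python
def build_matrix(samples):
    """Build a mutation-by-sample presence/absence (0/1) matrix.

    `samples` maps a biopsy name to the set of mutations detected in it.
    Returns the ordered sample names and a dict: mutation -> list of 0/1.
    """
    names = sorted(samples)
    mutations = sorted(set().union(*samples.values()))
    matrix = {
        m: [1 if m in samples[name] else 0 for name in names]
        for m in mutations
    }
    return names, matrix


def classify(matrix):
    """Label each mutation by how many samples carry it.

    'trunk'   = present in every sample,
    'private' = present in exactly one sample,
    'shared'  = anything in between.
    These labels are an illustrative grouping, not PTI's output.
    """
    labels = {}
    for mutation, row in matrix.items():
        count = sum(row)
        if count == len(row):
            labels[mutation] = "trunk"
        elif count == 1:
            labels[mutation] = "private"
        else:
            labels[mutation] = "shared"
    return labels


# Hypothetical presence/absence calls for three biopsies of one patient.
calls = {
    "biopsy_A": {"m1", "m2", "m3"},
    "biopsy_B": {"m1", "m2", "m4"},
    "biopsy_C": {"m1", "m5"},
}

sample_names, presence = build_matrix(calls)
groups = classify(presence)
print(sample_names)
print(presence)
print(groups)
```

A top-down method would presumably start from mutations common to all samples and split the samples further using the remaining ones; that ordering is an assumption about what "top-down" means here and is not implemented in this sketch.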
|
74
|
Zhang C, Wang D, Wang J, Sun Q, Tian L, Tang X, Yuan Z, He H, Yu S. Genetic Dissection and Validation of Chromosomal Regions for Transmission Ratio Distortion in Intersubspecific Crosses of Rice. Front Plant Sci 2020; 11:563548. [PMID: 33193492 PMCID: PMC7655136 DOI: 10.3389/fpls.2020.563548] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/19/2020] [Accepted: 09/17/2020] [Indexed: 05/17/2023]
Abstract
Transmission ratio distortion (TRD) refers to a widespread phenomenon in which one allele is transmitted by heterozygotes more frequently to the progeny than the opposite allele. TRD is considered a mark suggesting the presence of a reproductive barrier. However, the genetic and molecular mechanisms underlying TRD in rice remain largely unknown. In the present study, a population of backcross inbred lines (BILs) derived from the cross of a japonica cultivar Nipponbare (NIP) and an indica variety 9311 was utilized to study the genetic basis of TRD. A total of 18 genomic regions were identified for TRD in the BILs. Among them, 12 and 6 regions showed indica (9311) and japonica (NIP) alleles with preferential transmission, respectively. A series of F2 populations were used to confirm the TRD effects, including six genomic regions that were confirmed by chromosome segment substitution line (CSSL)-derived F2 populations from intersubspecific allelic combinations. However, none of the regions was confirmed by the CSSL-derived populations from intrasubspecific allelic combination. Furthermore, significant epistatic interaction was found between TRD1.3 and TRD8.1, suggesting that TRD could positively contribute to breaking intersubspecific reproductive barriers. Our results have laid the foundation for identifying the TRD genes and provide an effective strategy to break down TRD for breeding wide-compatible lines, which will be further utilized in the intersubspecific hybrid breeding programs.
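One common way to flag distortion at a marker in an F2 population is a chi-square goodness-of-fit test against the Mendelian 1:2:1 genotype ratio. The abstract does not say which test the authors used, so the sketch below is an assumed, generic approach; the function name and the genotype counts are hypothetical.

```python
# Chi-square critical value for 2 degrees of freedom at alpha = 0.05.
CRITICAL_DF2_ALPHA_005 = 5.991


def chi_square_f2(n_aa, n_ab, n_bb):
    """Chi-square statistic for F2 genotype counts against a 1:2:1 ratio.

    n_aa, n_ab, n_bb are the observed counts of the two homozygous classes
    and the heterozygous class at one marker.
    """
    total = n_aa + n_ab + n_bb
    expected = (0.25 * total, 0.50 * total, 0.25 * total)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: one marker that fits 1:2:1 and one that does not.
balanced = chi_square_f2(25, 50, 25)
distorted = chi_square_f2(40, 50, 10)

print(balanced, balanced > CRITICAL_DF2_ALPHA_005)
print(distorted, distorted > CRITICAL_DF2_ALPHA_005)
```

A statistic above the critical value suggests departure from Mendelian segregation at that marker. A genome-wide scan would also need a multiple-testing adjustment, and backcross-derived lines such as BILs have different expected ratios than the 1:2:1 assumed here.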
|
75
|
Barbitoff YA, Skitchenko RK, Poleshchuk OI, Shikov AE, Serebryakova EA, Nasykhova YA, Polev DE, Shuvalova AR, Shcherbakova IV, Fedyakov MA, Glotov OS, Glotov AS, Predeus AV. Whole-exome sequencing provides insights into monogenic disease prevalence in Northwest Russia. Mol Genet Genomic Med 2019; 7:e964. [PMID: 31482689 PMCID: PMC6825859 DOI: 10.1002/mgg3.964] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 05/09/2019] [Accepted: 08/07/2019] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Allele frequency data from large exome and genome aggregation projects such as the Genome Aggregation Database (gnomAD) are of ultimate importance to the interpretation of medical resequencing data. However, allele frequencies might significantly differ in poorly studied populations that are underrepresented in large-scale projects, such as the Russian population. METHODS In this work, we leveraged our access to a large dataset of 694 exome samples to analyze genetic variation in Northwest Russia. We compared the spectrum of genetic variants to the dbSNP build 151, and made estimates of ClinVar-based autosomal recessive (AR) disease allele prevalence as compared to gnomAD r. 2.1. RESULTS An estimated 9.3% of discovered variants were not present in dbSNP. We report statistically significant overrepresentation of pathogenic variants for several Mendelian disorders, including phenylketonuria (PAH, rs5030858), Wilson's disease (ATP7B, rs76151636), factor VII deficiency (F7, rs36209567), kyphoscoliosis type of Ehlers-Danlos syndrome (FKBP14, rs542489955), and several other recessive pathologies. We also make primary estimates of monogenic disease incidence in the population, with retinal dystrophy, cystic fibrosis, and phenylketonuria being the most frequent AR pathologies. CONCLUSION Our observations demonstrate the utility of population-specific allele frequency data to the diagnosis of monogenic disorders using high-throughput technologies.
|