26
Chande AT, Nagar SD, Rishishwar L, Mariño-Ramírez L, Medina-Rivas MA, Valderrama-Aguirre AE, Jordan IK, Gallo JE. The Impact of Ethnicity and Genetic Ancestry on Disease Prevalence and Risk in Colombia. Front Genet 2021; 12:690366. [PMID: 34650589 PMCID: PMC8507149 DOI: 10.3389/fgene.2021.690366] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/02/2021] [Accepted: 08/11/2021] [Indexed: 11/13/2022]
Abstract
Currently, the vast majority of genomic research cohorts are made up of participants with European ancestry. Genomic medicine will only reach its full potential when genomic studies become more broadly representative of global populations. We are working to support the establishment of genomic medicine in developing countries in Latin America via studies of ethnically and ancestrally diverse Colombian populations. The goal of this study was to analyze the effect of ethnicity and genetic ancestry on observed disease prevalence and predicted disease risk in Colombia. Population distributions of Colombia's three major ethnic groups (Mestizo, Afro-Colombian, and Indigenous) were compared to disease prevalence and socioeconomic indicators. Indigenous and Mestizo ethnicity show the highest correlations with disease prevalence, whereas the effect of Afro-Colombian ethnicity is substantially lower. Mestizo ethnicity is mostly negatively correlated with six high-impact health conditions and positively correlated with seven of eight common cancers; Indigenous ethnicity shows the opposite effect. Malaria prevalence in particular is strongly correlated with ethnicity. Disease prevalence co-varies across geographic regions, consistent with the regional distribution of ethnic groups. Ethnicity is also correlated with regional variation in human development, partially explaining the observed differences in disease prevalence. Patterns of genetic ancestry and admixture for a cohort of 624 individuals from Medellín were compared to disease risk inferred via polygenic risk scores (PRS). African genetic ancestry is most strongly correlated with predicted disease risk, whereas European and Native American ancestry show weaker effects. African ancestry is mostly positively correlated with disease risk, and European ancestry is mostly negatively correlated. 
The relationships between ethnicity and disease prevalence do not show an overall correspondence with the relationships between ancestry and disease risk. We discuss possible reasons for the divergent health effects of ethnicity and ancestry as well as the implications of our results for the development of precision medicine in Colombia.
27
Nagar SD, Conley AB, Chande AT, Rishishwar L, Sharma S, Mariño-Ramírez L, Aguinaga-Romero G, González-Andrade F, Jordan IK. Genetic ancestry and ethnic identity in Ecuador. HGG ADVANCES 2021; 2:100050. [PMID: 35047841 PMCID: PMC8756502 DOI: 10.1016/j.xhgg.2021.100050] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/19/2021] [Accepted: 08/09/2021] [Indexed: 02/05/2023]
Abstract
We investigated the ancestral origins of four Ecuadorian ethnic groups (Afro-Ecuadorian, Mestizo, Montubio, and the Indigenous Tsáchila) in an effort to gain insight into the relationship between ancestry, culture, and the formation of ethnic identities in Latin America. The observed patterns of genetic ancestry are largely concordant with ethnic identities and historical records of conquest and colonization in Ecuador. Nevertheless, a number of exceptional findings highlight the complex relationship between genetic ancestry and ethnicity in Ecuador. Afro-Ecuadorians show far less African ancestry, and more Native American ancestry, than any other Afro-descendant population in the Americas. Mestizos in Ecuador show high levels of Native American ancestry, with substantially less European ancestry, despite the relatively small Indigenous population in the country. The recently recognized Montubio ethnic group is highly admixed, with substantial contributions from all three continental ancestries. The Tsáchila show two distinct ancestry subgroups: most individuals show almost exclusively Native American ancestry, and a smaller group shows a characteristically Mestizo pattern. Considered together with historical data and sociological studies, our results indicate the extent to which ancestry and culture interact, often in unexpected ways, to shape ethnic identity in Ecuador.
28
Im SB, Gupta S, Jain M, Chande AT, Carleton HA, Jordan IK, Rishishwar L. Genome-Enabled Molecular Subtyping and Serotyping for Shiga Toxin-Producing Escherichia coli. FRONTIERS IN SUSTAINABLE FOOD SYSTEMS 2021. [DOI: 10.3389/fsufs.2021.752873] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/18/2023]
Abstract
Foodborne pathogens are a major public health burden in the United States, leading to 9.4 million illnesses annually. Since 1996, a national laboratory-based surveillance program, PulseNet, has used molecular subtyping and serotyping methods with the aim of reducing the burden of foodborne illness through early detection of emerging outbreaks. PulseNet-affiliated laboratories have used pulsed-field gel electrophoresis (PFGE) and immunoassays to subtype and serotype bacterial isolates. Widespread use of serotyping and PFGE for foodborne illness surveillance over the years has resulted in the accumulation of a wealth of routine surveillance and outbreak epidemiological data. This valuable source of data has been used to understand the seasonal frequency, geographic distribution, demographic information, exposure information, disease severity, and source of foodborne isolates. In 2019, PulseNet adopted whole genome sequencing (WGS) at a national scale, replacing PFGE with higher-resolution methods such as core genome multilocus sequence typing. Consequently, PulseNet's recent shift to genome-based subtyping methods has rendered the vast collection of historic surveillance data associated with serogroups and PFGE patterns potentially unusable. The goal of this study was to develop a bioinformatics method to associate the WGS data currently used by PulseNet for bacterial pathogen subtyping with previously characterized serogroups and PFGE patterns. Previous efforts to associate WGS with PFGE patterns relied on predicting DNA molecular weight based on restriction site analysis. However, these approaches failed owing to the non-uniform usage of genomic restriction sites by PFGE restriction enzymes. We developed a machine learning approach to classify isolates to their most probable serogroup and PFGE pattern, based on comparisons of genomic k-mer signatures. 
We applied our WGS classification method to 5,970 Shiga toxin-producing Escherichia coli (STEC) isolates collected as part of PulseNet's routine foodborne surveillance activities between 2003 and 2018. Our machine learning classifier is able to associate STEC WGS to higher-level serogroups with very high accuracy and lower-level PFGE patterns with somewhat lower accuracy. Taken together, these classifications support the ability of public health investigators to associate currently generated WGS data with historical epidemiological knowledge linked to serogroups and PFGE patterns in support of outbreak surveillance for food safety and public health.
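The core idea of assigning an isolate to a serogroup by comparing genomic k-mer signatures can be illustrated with a minimal sketch. It assumes a simple 1-nearest-neighbour rule on Jaccard similarity between k-mer sets. The function names, the default k = 21, the toy sequences, and the "O157"/"O26" labels are illustrative assumptions; this is not the machine learning model used in the study.

```python
def kmer_set(seq: str, k: int) -> frozenset:
    """Distinct k-mers of a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return frozenset(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 if both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def classify(query: str, references: list, k: int = 21):
    """Return the label (e.g. a serogroup) of the reference genome whose
    k-mer signature is most similar to the query (1-nearest neighbour).

    references: list of (label, sequence) tuples (hypothetical layout).
    """
    query_kmers = kmer_set(query, k)
    best_label, best_score = None, -1.0
    for label, ref_seq in references:
        score = jaccard(query_kmers, kmer_set(ref_seq, k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy example with hypothetical short "genomes" and a small k.
refs = [("O157", "ACGTACGTTTGACCA"), ("O26", "GGGCCCAAATTTGGG")]
print(classify("ACGTACGTTTGACCT", refs, k=4))  # O157
```

With real assemblies, a larger k makes shared k-mers more specific to related lineages. A trained classifier, as described in the abstract, would replace the single nearest-neighbour vote.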
29
Mariño-Ramírez L, Ahmad M, Rishishwar L, Nagar SD, Lee KK, Norris ET, Jordan IK. Vitamin D and socioeconomic deprivation mediate COVID-19 ethnic health disparities. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2021:2021.09.20.21263865. [PMID: 34611667 PMCID: PMC8491858 DOI: 10.1101/2021.09.20.21263865] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/04/2023]
Abstract
Ethnic minorities in developed countries suffer a disproportionately high burden of COVID-19 morbidity and mortality, and COVID-19 ethnic disparities have been attributed to social determinants of health. Vitamin D has been proposed as a modifiable risk factor that could mitigate COVID-19 health disparities. We investigated the relationship between vitamin D and COVID-19 susceptibility and severity using the UK Biobank, a large prospective cohort study of the United Kingdom population. Structural equation modelling was used to evaluate the ability of vitamin D, socioeconomic deprivation, and other known risk factors to mediate COVID-19 ethnic health disparities. Asian ethnicity is associated with higher COVID-19 susceptibility, compared to the majority White population, and Asian and Black ethnicity are both associated with higher COVID-19 severity. Socioeconomic deprivation mediates all three ethnic disparities and shows the highest overall signal of mediation for any COVID-19 risk factor. Vitamin supplements, including vitamin D, mediate the Asian disparity in COVID-19 susceptibility, and serum 25-hydroxyvitamin D (calcifediol) levels mediate Asian and Black COVID-19 severity disparities. Several measures of overall health also mediate COVID-19 ethnic disparities, underscoring the importance of comorbidities. Our results support ethnic minorities' use of vitamin D as both a prophylactic and a supplemental therapeutic for COVID-19.
30
Etienne KA, Berkow EL, Gade L, Nunnally N, Lockhart SR, Beer K, Jordan IK, Rishishwar L, Litvintseva AP. Genomic Diversity of Azole-Resistant Aspergillus fumigatus in the United States. mBio 2021; 12:e0180321. [PMID: 34372699 PMCID: PMC8406307 DOI: 10.1128/mbio.01803-21] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/30/2021] [Accepted: 07/09/2021] [Indexed: 12/19/2022]
Abstract
Azole resistance in pathogenic Aspergillus fumigatus has become a global public health issue threatening the use of medical azoles. The environmentally occurring resistance mutations TR34/L98H (TR34) and TR46/Y121F/T289A (TR46) are widespread across multiple continents and emerging in the United States. We used whole-genome single nucleotide polymorphism (SNP) analysis of 179 nationally representative clinical and environmental A. fumigatus genomes from the United States, along with 18 non-U.S. genomes, to evaluate genetic diversity and the genetic basis of the emergence of azole resistance in the United States. We demonstrated the presence of two clades of A. fumigatus isolates: clade A (17%) comprised a global collection of clinical and environmental azole-resistant strains, including all strains with the TR34/L98H allele from India, The Netherlands, the United Kingdom, and the United States, and clade B (83%) consisted mainly of U.S. isolates without this marker. The TR34/L98H polymorphism was shared among azole-resistant A. fumigatus strains from India, The Netherlands, the United Kingdom, and the United States, suggesting a common origin for this resistance mechanism. Six percent of azole-resistant A. fumigatus isolates from the United States with the TR34 resistance marker had a mixture of clade A and clade B alleles, suggestive of recombination. Additionally, the presence of equal proportions of both mating types further suggests ongoing recombination. This study demonstrates the genetic background for the emergence of azole resistance in the United States, supporting a single introduction and subsequent propagation, possibly through recombination, of environmentally driven resistance mutations. 
IMPORTANCE Aspergillus fumigatus is one of the most common causes of invasive mold infections in patients with immune deficiencies and has also been reported in patients with severe influenza and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Triazole drugs are the first line of therapy for this infection; however, their efficacy has been compromised by the emergence of azole resistance in A. fumigatus, which was proposed to be selected for by exposure to azole fungicides in the environment [P. E. Verweij, E. Snelders, G. H. J. Kema, E. Mellado, et al., Lancet Infect Dis 9:789-795, 2009, https://doi.org/10.1016/S1473-3099(09)70265-8]. Isolates with the environmentally driven resistance mutations TR34/L98H (TR34) and TR46/Y121F/T289A (TR46) have been reported worldwide. Here, we used genomic analysis of a large sample of resistant and susceptible A. fumigatus isolates to demonstrate a single introduction of TR34 into the United States and to suggest that it spreads into the susceptible population through recombination between resistant and susceptible isolates.
31
Nagar SD, Nápoles AM, Jordan IK, Mariño-Ramírez L. Socioeconomic deprivation and genetic ancestry interact to modify type 2 diabetes ethnic disparities in the United Kingdom. EClinicalMedicine 2021; 37:100960. [PMID: 34386746 PMCID: PMC8343245 DOI: 10.1016/j.eclinm.2021.100960] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 02/03/2021] [Revised: 05/19/2021] [Accepted: 05/25/2021] [Indexed: 12/14/2022]
Abstract
BACKGROUND Type 2 diabetes (T2D) is a complex common disease that disproportionately impacts minority ethnic groups in the United Kingdom (UK). Socioeconomic deprivation (SED) is widely considered a potential explanation for T2D ethnic disparities in the UK, whereas the effect of genetic ancestry (GA) on such disparities has yet to be studied. METHODS We leveraged data from the UK Biobank prospective cohort study, with participants enrolled between 2006 and 2010, to model the relationship between SED (Townsend index), GA (clustering principal components of whole genome genotype data), and T2D status (ICD-10 codes) across the three largest ethnic groups in the UK (Asian, Black, and White) using multivariable logistic regression. FINDINGS The Asian group shows the highest T2D prevalence (17·9%), followed by the Black (11·7%) and White (5·5%) ethnic groups. We find that both SED (OR: 1·11, 95% CI: 1·10-1·11) and non-European GA (OR South Asian versus European: 4·37, 95% CI: 4·10-4·66; OR African versus European: 2·52, 95% CI: 2·23-2·85) are significantly associated with the observed T2D disparities. GA and SED show significant interaction effects on T2D, with SED being a relatively greater risk factor for T2D for individuals with South Asian and African ancestry, compared to those with European ancestry. INTERPRETATION The significant interactions between SED and GA underscore how the effects of environmental risk factors can differ among ancestry groups, suggesting the need for group-specific interventions. FUNDING This work was supported by the National Institutes of Health (NIH) Distinguished Scholars Program (DSP) to LMR and the Division of Intramural Research (DIR) of the National Institute on Minority Health and Health Disparities (NIMHD) at NIH.
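The SED × GA interaction described above can be written, in generic form, as a logistic model with product terms. This is an illustrative sketch only. The abstract does not give the exact parameterization, so the symbols, the choice of European ancestry as the reference group, and the covariate vector $\mathbf{z}_i$ are assumptions.

$$
\operatorname{logit} \Pr(\mathrm{T2D}_i = 1) = \beta_0 + \beta_{\mathrm{SED}}\,\mathrm{SED}_i + \sum_{g \in \{\mathrm{SAS},\,\mathrm{AFR}\}} \left( \beta_g\,\mathrm{GA}_{ig} + \gamma_g\,\mathrm{SED}_i\,\mathrm{GA}_{ig} \right) + \boldsymbol{\theta}^{\top}\mathbf{z}_i
$$

In this model, $\mathrm{GA}_{ig}$ is 1 if individual $i$ falls in the South Asian (SAS) or African (AFR) ancestry cluster $g$, and 0 otherwise. The odds ratio per unit increase in the Townsend index is then $e^{\beta_{\mathrm{SED}}}$ for the European reference group and $e^{\beta_{\mathrm{SED}} + \gamma_g}$ for ancestry group $g$. A positive $\gamma_g$ therefore corresponds to SED being a relatively greater risk factor in group $g$.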
32
Chande AT, Rishishwar L, Ban D, Nagar SD, Conley AB, Rowell J, Valderrama-Aguirre AE, Medina-Rivas MA, Jordan IK. The Phenotypic Consequences of Genetic Divergence between Admixed Latin American Populations: Antioquia and Chocó, Colombia. Genome Biol Evol 2020; 12:1516-1527. [PMID: 32681795 PMCID: PMC7513793 DOI: 10.1093/gbe/evaa154] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 07/12/2020] [Indexed: 12/11/2022]
Abstract
Genome-wide association studies have uncovered thousands of genetic variants that are associated with a wide variety of human traits. Knowledge of how trait-associated variants are distributed within and between populations can provide insight into the genetic basis of group-specific phenotypic differences, particularly for health-related traits. We analyzed the genetic divergence levels for 1) individual trait-associated variants and 2) collections of variants that function together to encode polygenic traits, between two neighboring populations in Colombia that have distinct demographic profiles: Antioquia (Mestizo) and Chocó (Afro-Colombian). Genetic ancestry analysis showed 62% European, 32% Native American, and 6% African ancestry for Antioquia compared with 76% African, 10% European, and 14% Native American ancestry for Chocó, consistent with demography and previous results. Ancestry differences can confound cross-population comparison of polygenic risk scores (PRS); however, we did not find any systematic bias in PRS distributions for the two populations studied here, and population-specific differences in PRS were, for the most part, small and symmetrically distributed around zero. Both genetic differentiation at individual trait-associated single nucleotide polymorphisms and population-specific PRS differences between Antioquia and Chocó largely reflected anthropometric phenotypic differences that can be readily observed between the populations, along with reported disease prevalence differences. Cases where population-specific differences in genetic risk did not align with observed trait (disease) prevalence point to the importance of environmental contributions to phenotypic variance, for both infectious and complex common diseases. The results reported here are distributed via a web-based platform for searching trait-associated variants and PRS divergence levels at http://map.chocogen.com (last accessed August 12, 2020).
33
Medina-Cordoba LK, Chande AT, Rishishwar L, Mayer LW, Valderrama-Aguirre LC, Valderrama-Aguirre A, Gaby JC, Kostka JE, Jordan IK. Genomic characterization and computational phenotyping of nitrogen-fixing bacteria isolated from Colombian sugarcane fields. Sci Rep 2021; 11:9187. [PMID: 33911103 PMCID: PMC8080613 DOI: 10.1038/s41598-021-88380-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/28/2020] [Accepted: 04/07/2021] [Indexed: 01/26/2023]
Abstract
Previous studies have shown that the sugarcane microbiome harbors diverse plant growth-promoting microorganisms, including nitrogen-fixing bacteria (diazotrophs), which can serve as biofertilizers. The genomes of 22 diazotrophs from Colombian sugarcane fields were sequenced to investigate potential biofertilizers. A genome-enabled computational phenotyping approach was developed to prioritize sugarcane-associated diazotrophs according to their potential as biofertilizers. This method selects isolates that have potential for nitrogen fixation and other plant growth-promoting (PGP) phenotypes while showing low risk for virulence and antibiotic resistance. Intact nitrogenase (nif) genes and operons were found in 18 of the isolates. Isolates also encode phosphate solubilization and siderophore production operons, as well as other PGP genes. The majority of sugarcane isolates showed uniformly low predicted virulence and antibiotic resistance compared to clinical isolates. Six strains with the highest overall genotype scores were experimentally evaluated for nitrogen fixation, phosphate solubilization, and the production of siderophores, gibberellic acid, and indole acetic acid. Results from the biochemical assays were consistent with, and validated, the computational phenotype predictions. A genotypic and phenotypic threshold was observed that separated strains by their potential for PGP versus predicted pathogenicity. Our results indicate that computational phenotyping is a promising tool for the assessment of bacteria detected in agricultural ecosystems.
34
Wozniak JE, Chande AT, Burd EM, Band VI, Satola SW, Farley MM, Jacob JT, Jordan IK, Weiss DS. Absence of mgrB Alleviates Negative Growth Effects of Colistin Resistance in Enterobacter cloacae. Antibiotics (Basel) 2020; 9:antibiotics9110825. [PMID: 33227907 PMCID: PMC7699182 DOI: 10.3390/antibiotics9110825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2020] [Revised: 11/16/2020] [Accepted: 11/17/2020] [Indexed: 11/21/2022]
Abstract
Colistin is an important last-line antibiotic to treat highly resistant Enterobacter infections. Resistance to colistin has emerged among clinical isolates but has been associated with a significant growth defect. Here, we describe a clinical Enterobacter isolate with a deletion of mgrB, a regulator of colistin resistance, leading to high-level resistance in the absence of a growth defect. The identification of a path to resistance unrestrained by growth defects suggests colistin resistance could become more common in Enterobacter.
35
Nagar SD, Conley AB, Jordan IK. Population structure and pharmacogenomic risk stratification in the United States. BMC Biol 2020; 18:140. [PMID: 33050895 PMCID: PMC7557099 DOI: 10.1186/s12915-020-00875-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/17/2020] [Accepted: 09/22/2020] [Indexed: 01/01/2023]
Abstract
BACKGROUND Pharmacogenomic (PGx) variants mediate how individuals respond to medication, and response differences among racial/ethnic groups have been attributed to patterns of PGx diversity. We hypothesized that genetic ancestry (GA) would provide higher resolution for stratifying PGx risk, since it serves as a more reliable surrogate for genetic diversity than self-identified race/ethnicity (SIRE), which includes a substantial social component. We analyzed a cohort of 8628 individuals from the United States (US), for whom we had both SIRE information and whole genome genotypes, with a focus on the three largest SIRE groups in the US: White, Black (African-American), and Hispanic (Latino). Our approach to the question of PGx risk stratification entailed the integration of two distinct methodologies: population genetics and evidence-based medicine. This integrated approach allowed us to consider the clinical implications of the observed patterns of PGx variation found within and between population groups. RESULTS Whole genome genotypes were used to characterize individuals' continental ancestry fractions (European, African, and Native American), and individuals were grouped according to their GA profiles. SIRE and GA groups were found to be highly concordant. Continental ancestry predicts individuals' SIRE with > 96% accuracy, and accordingly, GA provides only a marginal increase in resolution for PGx risk stratification. In light of the concordance between SIRE and GA, taken together with the fact that information on SIRE is readily available to clinicians, we evaluated PGx variation between SIRE groups to explore the potential clinical utility of race and ethnicity. PGx variants are highly divergent relative to the genomic background; 82 variants show significant frequency differences among SIRE groups, and genome-wide patterns of PGx variation are almost entirely concordant with SIRE. 
The vast majority of PGx variation is found within rather than between groups, a well-established fact for almost all genetic variants, which is often taken to argue against the clinical utility of population stratification. Nevertheless, analysis of highly differentiated PGx variants illustrates how SIRE partitions PGx variation based on groups' characteristic ancestry patterns. These cases underscore the extent to which SIRE carries clinically valuable information for stratifying PGx risk among populations, albeit with less utility for predicting individual-level PGx alleles (genotypes), supporting the concept of population pharmacogenomics. CONCLUSIONS Perhaps most interestingly, we show that individuals who identify as Black or Hispanic stand to gain far more from the consideration of race/ethnicity in treatment decisions than individuals from the majority White population.
36
Espitia-Navarro HF, Chande AT, Nagar SD, Smith H, Jordan IK, Rishishwar L. STing: accurate and ultrafast genomic profiling with exact sequence matches. Nucleic Acids Res 2020; 48:7681-7689. [PMID: 32619234 PMCID: PMC7430640 DOI: 10.1093/nar/gkaa566] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/13/2020] [Revised: 06/16/2020] [Accepted: 07/01/2020] [Indexed: 11/30/2022]
Abstract
Genome-enabled approaches to molecular epidemiology have become essential to public health agencies and the microbial research community. We developed the algorithm STing to provide turn-key solutions for molecular typing and gene detection directly from next-generation sequencing data of microbial pathogens. Our implementation of STing uses an innovative k-mer search strategy that eliminates the computational overhead associated with the time-consuming steps of quality control, assembly, and alignment required by more traditional methods. We compared STing to six of the most widely used programs for genome-based molecular typing and demonstrated its ease of use, accuracy, speed, and efficiency. STing shows superior accuracy and performance for standard multilocus sequence typing (MLST) schemes, along with larger genome-scale typing schemes, and it enables rapid automated detection of antimicrobial resistance and virulence factor genes. STing determines the sequence type for traditional 7-gene MLST with 100% accuracy in less than 10 seconds per isolate. We hope that the adoption of STing will help to democratize microbial genomics and thereby maximize its benefit for public health.
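The general idea of typing directly from exact k-mer matches, without assembly or alignment, can be shown with a simplified sketch. This is not STing's algorithm. The data layout, the function names, the default k = 31, and the rule that an allele is called only when every one of its k-mers occurs in the reads are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All k-mers (substrings of length k) of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def call_allele(read_kmers: Set[str], alleles: Dict[str, str], k: int) -> Optional[str]:
    """Return the allele whose k-mers are best covered by the reads, but only
    if coverage is complete (every allele k-mer seen); otherwise None.
    Ties keep the first allele encountered."""
    best_id, best_cov = None, 0.0
    for allele_id, allele_seq in alleles.items():
        allele_kmers = kmers(allele_seq, k)
        if not allele_kmers:
            continue  # allele shorter than k: cannot be typed this way
        cov = len(allele_kmers & read_kmers) / len(allele_kmers)
        if cov > best_cov:
            best_id, best_cov = allele_id, cov
    return best_id if best_cov == 1.0 else None


def sequence_type(reads: List[str], scheme: Dict[str, Dict[str, str]],
                  profiles: Dict[Tuple[str, ...], str], k: int = 31) -> Optional[str]:
    """Call one allele per locus (loci in sorted order) and look up the ST."""
    read_kmers: Set[str] = set()
    for read in reads:
        read_kmers |= kmers(read, k)  # k-mers never span two reads
    calls = tuple(call_allele(read_kmers, scheme[locus], k) for locus in sorted(scheme))
    if None in calls:
        return None  # at least one locus had no fully covered allele
    return profiles.get(calls)  # None if the allele combination is novel


# Toy two-locus scheme with hypothetical alleles, reads, and profile.
scheme = {
    "adk": {"1": "ACGTTGCAAT", "2": "ACGTAGCAAT"},
    "gyrB": {"1": "TTGACCGTAA", "2": "TTGACCGTTA"},
}
profiles = {("1", "2"): "ST-7"}
reads = ["GGACGTTGCAATCC", "CTTGACCGTTAGG"]
print(sequence_type(reads, scheme, profiles, k=5))  # ST-7
```

Requiring full k-mer coverage is a conservative choice. It avoids calling an allele that differs from the true one by a single SNP, at the cost of returning no call when read coverage is incomplete.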
37
Clayton EA, Rishishwar L, Huang TC, Gulati S, Ban D, McDonald JF, Jordan IK. Abstract 2115: An atlas of transposable element derived alternative splicing in cancer. Cancer Res 2020. [DOI: 10.1158/1538-7445.am2020-2115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Transposable element (TE) derived sequences comprise more than half of the human genome, and their presence has been documented to alter gene expression in a number of different ways, including the generation of alternatively spliced transcript isoforms. Alternative splicing has been associated with tumorigenesis for a number of different cancers. The objective of this study was to broadly characterize the role of human TEs in generating alternatively spliced transcript isoforms in cancer. To do so, we screened for the presence of TE-derived sequences co-located with alternative splice sites that are differentially utilized in normal versus cancer tissues. We analyzed a comprehensive set of alternative splice variants characterized for 614 matched normal-tumor tissue pairs across 13 cancer types, resulting in the discovery of 4,820 TE-generated alternative splice events distributed among 723 cancer-associated genes. SINEs (Alu) and LINEs (L1) were found to contribute the majority of TE-generated alternative splice sites in cancer genes. A number of cancer-associated genes, including MYH11, WHSC1, and CANT1, were shown to have overexpressed TE-derived isoforms across a range of cancer types. TE-derived isoforms were also linked to cancer-specific fusion transcripts, suggesting a novel mechanism for the generation of transcriptome diversity via trans-splicing mediated by dispersed TE repeats.
Citation Format: Evan A. Clayton, Lavanya Rishishwar, Tzu-Chuan Huang, Saurabh Gulati, Dongjo Ban, John F. McDonald, I. King Jordan. An atlas of transposable element derived alternative splicing in cancer [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 2115.
38
Chande AT, Rishishwar L, Conley AB, Valderrama-Aguirre A, Medina-Rivas MA, Jordan IK. Ancestry effects on type 2 diabetes genetic risk inference in Hispanic/Latino populations. BMC MEDICAL GENETICS 2020; 21:132. [PMID: 32580712 PMCID: PMC7315475 DOI: 10.1186/s12881-020-01068-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/04/2020] [Accepted: 06/10/2020] [Indexed: 12/18/2022]
Abstract
Background Hispanic/Latino (HL) populations bear a disproportionately high burden of type 2 diabetes (T2D). The ability to predict T2D genetic risk using polygenic risk scores (PRS) offers great promise for improved screening and prevention. However, there are a number of complications related to the accurate inference of genetic risk across HL populations with distinct ancestry profiles. We investigated how ancestry affects the inference of T2D genetic risk using PRS in diverse HL populations from Colombia and the United States (US). In Colombia, we compared T2D genetic risk for the Mestizo population of Antioquia to the Afro-Colombian population of Chocó, and in the US, we compared European-American versus Mexican-American populations. Methods Whole genome sequences and genotypes from the 1000 Genomes Project and the ChocoGen Research Project were used for genetic ancestry inference and for T2D PRS calculation. Continental ancestry fractions for HL genomes were inferred via comparison with African, European, and Native American reference genomes, and PRS were calculated using T2D risk variants taken from multiple genome-wide association studies (GWAS) conducted on cohorts with diverse ancestries. A correction for ancestry bias in T2D risk inference based on the frequencies of ancestral versus derived alleles was developed and applied to PRS calculations in the HL populations studied here. Results T2D genetic risk in Colombian and US HL populations is positively correlated with African and Native American ancestry and negatively correlated with European ancestry. The Afro-Colombian population of Chocó has higher predicted T2D risk than Antioquia, and the Mexican-American population has higher predicted risk than the European-American population. The inferred relative risk of T2D is robust to differences in the ancestry of the GWAS cohorts used for variant discovery. 
For trans-ethnic GWAS, population-specific variants and variants with same direction effects across populations yield consistent results. Nevertheless, the control for bias in T2D risk prediction confirms that explicit consideration of genetic ancestry can yield more reliable cross-population genetic risk inferences. Conclusions T2D associations that replicate across populations provide for more reliable risk inference, and modeling population-specific frequencies of ancestral and derived risk alleles can help control for biases in PRS estimation.
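A polygenic risk score is conventionally computed as an individual's risk-allele dosages weighted by GWAS effect sizes and summed across variants. The sketch below shows only that standard weighted sum. It does not implement the ancestral/derived allele bias correction described in the abstract, whose details are not given here. The function names, the hypothetical variant IDs, and the rule of skipping variants without a genotype call are illustrative assumptions.

```python
import math
from typing import Dict


def log_odds_weights(odds_ratios: Dict[str, float]) -> Dict[str, float]:
    """Convert per-variant GWAS odds ratios into log-odds effect sizes."""
    return {variant: math.log(odds) for variant, odds in odds_ratios.items()}


def polygenic_risk_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2) for one individual.

    Variants with no genotype call are skipped here (an assumption; real
    pipelines may impute them or fill in the population mean dosage).
    """
    return sum(beta * dosages[v] for v, beta in weights.items() if v in dosages)


# Hypothetical example: three risk variants, one without a genotype call.
weights = log_odds_weights({"rs_a": 1.2, "rs_b": 0.9, "rs_c": 1.5})
person = {"rs_a": 2, "rs_b": 1}
print(polygenic_risk_score(person, weights))
```

Because each score depends on which risk alleles are common in a population, comparing raw scores across groups with different ancestry profiles can be biased. This is the issue the abstract's correction is designed to address.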
39
Clayton EA, Rishishwar L, Huang TC, Gulati S, Ban D, McDonald JF, Jordan IK. An atlas of transposable element-derived alternative splicing in cancer. Philos Trans R Soc Lond B Biol Sci 2020; 375:20190342. [PMID: 32075558 PMCID: PMC7061986 DOI: 10.1098/rstb.2019.0342] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Accepted: 11/06/2019] [Indexed: 12/18/2022]
Abstract
Transposable element (TE)-derived sequences comprise more than half of the human genome, and their presence has been documented to alter gene expression in several ways, including the generation of alternatively spliced transcript isoforms. Alternative splicing has been associated with tumorigenesis in a number of cancers. The objective of this study was to broadly characterize the role of human TEs in generating alternatively spliced transcript isoforms in cancer. To do so, we screened for the presence of TE-derived sequences co-located with alternative splice sites that are differentially used in normal versus cancer tissues. We analysed a comprehensive set of alternative splice variants characterized for 614 matched normal-tumour tissue pairs across 13 cancer types, resulting in the discovery of 4820 TE-generated alternative splice events distributed among 723 cancer-associated genes. Alu short interspersed nuclear elements (SINEs) and L1 long interspersed nuclear elements (LINEs) were found to contribute the majority of TE-generated alternative splice sites in cancer genes. A number of cancer-associated genes, including MYH11, WHSC1 and CANT1, were shown to have overexpressed TE-derived isoforms across a range of cancer types. TE-derived isoforms were also linked to cancer-specific fusion transcripts, suggesting a novel mechanism for the generation of transcriptome diversity via trans-splicing mediated by dispersed TE repeats. This article is part of a discussion meeting issue 'Crossroads between transposons and gene regulation'.
40
Norris ET, Rishishwar L, Chande AT, Conley AB, Ye K, Valderrama-Aguirre A, Jordan IK. Admixture-enabled selection for rapid adaptive evolution in the Americas. Genome Biol 2020; 21:29. [PMID: 32028992 PMCID: PMC7006128 DOI: 10.1186/s13059-020-1946-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 09/25/2019] [Accepted: 01/24/2020] [Indexed: 02/08/2023]
Abstract
Background Admixture occurs when previously isolated populations come together and exchange genetic material. We hypothesize that admixture can enable rapid adaptive evolution in human populations by introducing novel genetic variants (haplotypes) at intermediate frequencies, and we test this hypothesis through the analysis of whole genome sequences sampled from admixed Latin American populations in Colombia, Mexico, Peru, and Puerto Rico. Results Our screen for admixture-enabled selection relies on the identification of loci that contain more or less ancestry from a given source population than would be expected given the genome-wide ancestry frequencies. We employ a combined evidence approach to evaluate levels of ancestry enrichment at single loci across multiple populations and multiple loci that function together to encode polygenic traits. We find cross-population signals of African ancestry enrichment at the major histocompatibility locus on chromosome 6, consistent with admixture-enabled selection for enhanced adaptive immune response. Several of the human leukocyte antigen genes at this locus, such as HLA-A, HLA-DRB51, and HLA-DRB5, show independent evidence of positive selection prior to admixture, based on extended haplotype homozygosity in African populations. A number of traits related to inflammation, blood metabolites, and both the innate and adaptive immune system show evidence of admixture-enabled polygenic selection in Latin American populations. Conclusions The results reported here, considered together with the ubiquity of admixture in human evolution, suggest that admixture serves as a fundamental mechanism that drives rapid adaptive evolution in human populations.
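One simple way to operationalize "more or less ancestry than expected given the genome-wide ancestry frequencies" is to standardize each locus's local ancestry fraction against the distribution across all loci, and then combine per-population z-scores with Stouffer's method. The sketch below assumes exactly that; the locus names and fractions are invented for illustration, and this is not necessarily the statistic or combined-evidence procedure used in the study.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import Dict, List


def ancestry_enrichment_z(local_fraction: Dict[str, float]) -> Dict[str, float]:
    """z-score of each locus's local ancestry fraction (for one source population)
    relative to the mean and standard deviation across all loci."""
    values = list(local_fraction.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("local ancestry is constant across loci")
    return {locus: (f - mu) / sd for locus, f in local_fraction.items()}


def stouffer(z_scores: List[float]) -> float:
    """Combine independent z-scores (e.g., one per population) with Stouffer's method."""
    return sum(z_scores) / sqrt(len(z_scores))


# Hypothetical African local-ancestry fractions at five loci in two admixed populations.
pop_a = {"locus1": 0.10, "locus2": 0.12, "locus3": 0.11, "locus4": 0.09, "locus5": 0.25}
pop_b = {"locus1": 0.05, "locus2": 0.06, "locus3": 0.04, "locus4": 0.05, "locus5": 0.15}

z_a, z_b = ancestry_enrichment_z(pop_a), ancestry_enrichment_z(pop_b)
combined = {locus: stouffer([z_a[locus], z_b[locus]]) for locus in pop_a}
# Locus with the strongest cross-population ancestry enrichment.
print(max(combined, key=combined.get))
```

With genome-scale data the mean and standard deviation would be taken over many thousands of loci, and neighboring loci are correlated, so the combined z-scores should be read as a ranking aid rather than calibrated p-values.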
41
Clayton EA, Khalid S, Ban D, Wang L, Jordan IK, McDonald JF. Tumor suppressor genes and allele-specific expression: mechanisms and significance. Oncotarget 2020; 11:462-479. [PMID: 32064050 PMCID: PMC6996918 DOI: 10.18632/oncotarget.27468] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/21/2019] [Accepted: 01/13/2020] [Indexed: 12/12/2022]
Abstract
Recent findings indicate that allele-specific expression (ASE) at specific cancer driver gene loci may be important in the onset/progression of the disease. Of particular interest are loss-of-function (LOF) alleles of tumor suppressor genes (TSGs). While LOF tumor suppressor mutations are typically considered to be recessive, if these mutant alleles can be significantly differentially expressed relative to wild-type alleles in heterozygotes, the clinical consequences could be significant. LOF TSG alleles are shown to be segregating at high frequencies in worldwide populations of normal/healthy individuals. Matched sets of normal and tumor tissues isolated from 233 cancer patients representing four diverse tumor types demonstrate functionally important changes in patterns of ASE in individuals heterozygous for LOF TSG alleles associated with cancer onset/progression. While a variety of molecular mechanisms were identified as potentially contributing to changes in ASE patterns in cancer, changes in DNA copy number and allele-specific alternative splicing possibly mediated by antisense RNA emerged as predominant factors. In conclusion, LOF TSG alleles are segregating in human populations at significant frequencies, indicating that many otherwise healthy individuals are at elevated risk of developing cancer. Changes in ASE between normal and cancer tissues indicate that LOF TSG alleles may contribute to cancer onset/progression even when heterozygous with wild-type functional alleles.
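ASE at a heterozygous site is commonly assessed by comparing the RNA-seq reads supporting each allele against a 50:50 expectation. The sketch below assumes an exact two-sided binomial test for that comparison; the read counts are hypothetical, and the study's own ASE methodology may differ (for example, in how it handles reference-mapping bias or overdispersion).

```python
from math import comb


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of allele-informative reads supporting the alternative allele."""
    return alt_reads / (ref_reads + alt_reads)


def ase_binomial_p(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial test of allelic balance (p = 0.5).

    Because the null distribution is symmetric, the two-sided p-value is twice
    the smaller tail, capped at 1.
    """
    n = ref_reads + alt_reads
    k = min(ref_reads, alt_reads)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Hypothetical read counts at a heterozygous LOF site in matched normal and tumor samples.
for tissue, (ref, alt) in {"normal": (48, 52), "tumor": (20, 80)}.items():
    print(tissue, round(allelic_ratio(ref, alt), 2), ase_binomial_p(ref, alt))
```

A shift from a balanced ratio in normal tissue to a skewed ratio in the matched tumor is the kind of ASE change the abstract describes; copy-number changes at the locus would need to be accounted for before attributing such a shift to regulation.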
42
Jordan IK, Rishishwar L, Conley AB. Native American admixture recapitulates population-specific migration and settlement of the continental United States. PLoS Genet 2019; 15:e1008225. [PMID: 31545791 PMCID: PMC6756731 DOI: 10.1371/journal.pgen.1008225] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 09/20/2018] [Accepted: 05/31/2019] [Indexed: 11/19/2022]
Abstract
European and African descendants settled the continental US during the 17th-19th centuries, coming into contact with established Native American populations. The resulting admixture among these groups yielded a significant reservoir of Native American ancestry in the modern US population. We analyzed the patterns of Native American admixture seen for the three largest genetic ancestry groups in the US population: African descendants, Western European descendants, and Spanish descendants. The three groups show distinct Native American ancestry profiles, which are indicative of their historical patterns of migration and settlement across the country. Native American ancestry in the modern African descendant population does not coincide with local geography, instead forming a single group with origins in the southeastern US, consistent with the Great Migration of the early 20th century. Western European descendants show Native American ancestry that tracks their geographic origins across the US, indicative of ongoing contact during westward expansion, and Native American ancestry can resolve Spanish descendant individuals into distinct local groups formed by more recent migration from Mexico and Puerto Rico. We found an anomalous pattern of Native American ancestry from the US southwest, which most likely corresponds to the Nuevomexicano descendants of early Spanish settlers of the region. We addressed a number of controversies surrounding this population, including the extent of Sephardic Jewish ancestry. Nuevomexicanos are less admixed than nearby Mexican-American individuals, with more European and less Native American and African ancestry, and while they do show demonstrable Sephardic Jewish ancestry, the fraction is no greater than that seen in other New World Spanish descendant populations.
43
Chande AT, Wang L, Rishishwar L, Conley AB, Norris ET, Valderrama-Aguirre A, Jordan IK. GlobAl Distribution of GEnetic Traits (GADGET) web server: polygenic trait scores worldwide. Nucleic Acids Res 2018; 46:W121-W126. [PMID: 29788182 PMCID: PMC6031022 DOI: 10.1093/nar/gky415] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/31/2018] [Accepted: 05/03/2018] [Indexed: 11/14/2022]
Abstract
Human populations from around the world show striking phenotypic variation across a wide variety of traits. Genome-wide association studies (GWAS) are used to uncover genetic variants that influence the expression of heritable human traits; accordingly, population-specific distributions of GWAS-implicated variants may shed light on the genetic basis of human phenotypic diversity. With this in mind, we developed the GlobAl Distribution of GEnetic Traits web server (GADGET; http://gadget.biosci.gatech.edu). The GADGET web server provides users with a dynamic visual platform for exploring the relationship between worldwide genetic diversity and the genetic architecture underlying numerous human phenotypes. GADGET integrates trait-implicated single nucleotide polymorphisms (SNPs) from GWAS with population genetic data from the 1000 Genomes Project to calculate genome-wide polygenic trait scores (PTS) for 818 phenotypes in 2504 individual genomes. Population-specific distributions of PTS are shown for 26 human populations across 5 continental population groups, with traits ordered based on the extent of variation observed among populations. Users of GADGET can also upload custom trait SNP sets to visualize global PTS distributions for their own traits of interest.
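GADGET orders traits by the extent of variation observed among populations. The abstract does not specify the statistic, so the sketch below assumes one simple choice: the variance of the per-population mean PTS. The individual IDs, trait names, and scores are hypothetical; YRI and CEU are used only as example 1000 Genomes population codes.

```python
from statistics import mean, pvariance
from typing import Dict, List


def population_means(scores: Dict[str, float], population_of: Dict[str, str]) -> Dict[str, float]:
    """Mean polygenic trait score for each population."""
    groups: Dict[str, List[float]] = {}
    for individual, score in scores.items():
        groups.setdefault(population_of[individual], []).append(score)
    return {pop: mean(values) for pop, values in groups.items()}


def rank_traits(trait_scores: Dict[str, Dict[str, float]],
                population_of: Dict[str, str]) -> List[str]:
    """Order traits from most to least variable across population means."""
    spread = {
        trait: pvariance(list(population_means(scores, population_of).values()))
        for trait, scores in trait_scores.items()
    }
    return sorted(spread, key=spread.get, reverse=True)


population_of = {"i1": "YRI", "i2": "YRI", "i3": "CEU", "i4": "CEU"}
trait_scores = {
    "trait_a": {"i1": 1.0, "i2": 2.0, "i3": 2.0, "i4": 1.0},  # same mean in both populations
    "trait_b": {"i1": 1.0, "i2": 1.0, "i3": 3.0, "i4": 3.0},  # differs between populations
}
print(rank_traits(trait_scores, population_of))  # ['trait_b', 'trait_a']
```

Variance of population means ignores within-population spread; a ratio of between- to within-population variance (as in an ANOVA F statistic) would be a natural alternative if traits differ widely in score scale.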
44
Crisan CV, Chande AT, Williams K, Raghuram V, Rishishwar L, Steinbach G, Watve SS, Yunker P, Jordan IK, Hammer BK. Analysis of Vibrio cholerae genomes identifies new type VI secretion system gene clusters. Genome Biol 2019; 20:163. [PMID: 31405375 PMCID: PMC6691524 DOI: 10.1186/s13059-019-1765-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 01/03/2019] [Accepted: 07/18/2019] [Indexed: 12/19/2022]
Abstract
BACKGROUND Like many bacteria, Vibrio cholerae deploys a harpoon-like type VI secretion system (T6SS) to compete against other microbes in environmental and host settings. The T6SS punctures adjacent cells and delivers toxic effector proteins that are harmless to bacteria carrying cognate immunity factors. Only four effector/immunity pairs encoded on one large and three auxiliary gene clusters have been characterized from largely clonal, patient-derived strains of V. cholerae. RESULTS We sequence two dozen V. cholerae strain genomes from diverse sources and develop a novel and adaptable bioinformatics tool based on hidden Markov models. We identify two new T6SS auxiliary gene clusters and describe Aux 5 here. Four Aux 5 loci are present in the host strain, each with an atypical effector/immunity gene organization. Structural prediction of the putative effector indicates it is a lipase, which we name TleV1 (type VI lipase effector Vibrio). Ectopic TleV1 expression induces toxicity in Escherichia coli, which is rescued by co-expression of the TliV1a immunity factor. A clinical V. cholerae reference strain expressing the Aux 5 cluster uses TleV1 to lyse its parental strain upon contact via its T6SS but is unable to kill parental cells expressing the TliV1a immunity factor. CONCLUSION We develop a novel bioinformatics method and identify new T6SS gene clusters in V. cholerae. We also show the TleV1 toxin is delivered in a T6SS-dependent manner by V. cholerae and can lyse other bacterial cells. Our web-based tool can be modified to identify additional novel T6SS genomic loci in diverse bacterial species.
45
Norris ET, Rishishwar L, Wang L, Conley AB, Chande AT, Dabrowski AM, Valderrama-Aguirre A, Jordan IK. Assortative Mating on Ancestry-Variant Traits in Admixed Latin American Populations. Front Genet 2019; 10:359. [PMID: 31105740 PMCID: PMC6491930 DOI: 10.3389/fgene.2019.00359] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/25/2018] [Accepted: 04/04/2019] [Indexed: 11/13/2022]
Abstract
Assortative mating is a universal feature of human societies, and individuals from ethnically diverse populations are known to mate assortatively based on similarities in genetic ancestry. However, little is currently known regarding the exact phenotypic cues, or their underlying genetic architecture, which inform ancestry-based assortative mating. We developed a novel approach, using genome-wide analysis of ancestry-specific haplotypes, to evaluate ancestry-based assortative mating on traits whose expression varies among the three continental population groups – African, European, and Native American – that admixed to form modern Latin American populations. Application of this method to genome sequences sampled from Colombia, Mexico, Peru, and Puerto Rico revealed widespread ancestry-based assortative mating. We discovered a number of anthropometric traits (body mass, height, and facial development) and neurological attributes (educational attainment and schizophrenia) that serve as phenotypic cues for ancestry-based assortative mating. Major histocompatibility complex (MHC) loci show population-specific patterns of both assortative and disassortative mating in Latin America. Ancestry-based assortative mating in the populations analyzed here appears to be driven primarily by African ancestry. This study serves as an example of how population genomic analyses can yield novel insights into human behavior.
46
Nagar SD, Moreno AM, Norris ET, Rishishwar L, Conley AB, O'Neal KL, Vélez-Gómez S, Montes-Rodríguez C, Jaraba-Álvarez WV, Torres I, Medina-Rivas MA, Valderrama-Aguirre A, Jordan IK, Gallo JE. Population Pharmacogenomics for Precision Public Health in Colombia. Front Genet 2019; 10:241. [PMID: 30967898 PMCID: PMC6439339 DOI: 10.3389/fgene.2019.00241] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 11/17/2018] [Accepted: 03/04/2019] [Indexed: 11/13/2022]
Abstract
While genomic approaches to precision medicine hold great promise, they remain prohibitively expensive for developing countries. The precision public health paradigm, whereby healthcare decisions are made at the level of populations as opposed to individuals, provides one way for the genomics revolution to directly impact health outcomes in the developing world. Genomic approaches to precision public health require a deep understanding of local population genomics, which is still missing for many developing countries. We are investigating the population genomics of genetic variants that mediate drug response in an effort to inform healthcare decisions in Colombia. Our work focuses on two neighboring populations with distinct ancestry profiles: Antioquia and Chocó. Antioquia has primarily European genetic ancestry followed by Native American and African components, whereas Chocó shows mainly African ancestry with lower levels of Native American and European admixture. We performed a survey of the global distribution of pharmacogenomic variants followed by a more focused study of pharmacogenomic allele frequency differences between the two Colombian populations. Worldwide, we found pharmacogenomic variants to have both unusually high minor allele frequencies and high levels of population differentiation. A number of these pharmacogenomic variants also show anomalous effect allele frequencies within and between the two Colombian populations, and these differences were found to be associated with their distinct genetic ancestry profiles. For example, the C allele of the single nucleotide polymorphism (SNP) rs4149056 [Solute Carrier Organic Anion Transporter Family Member 1B1 (SLCO1B1)*5], which is associated with an increased risk of toxicity to a commonly prescribed statin, is found at relatively high frequency in Antioquia and is associated with European ancestry.
In addition to pharmacogenomic alleles related to increased toxicity risk, we also have evidence that alleles related to dosage and metabolism have large frequency differences between the two populations, which are associated with their specific ancestries. Using these findings, we have developed and validated an inexpensive allele-specific PCR assay to test for the presence of such population-enriched pharmacogenomic SNPs in Colombia. These results serve as an example of how population-centered approaches to pharmacogenomics can help to realize the promise of precision medicine in resource-limited settings.
47
Bernardy EE, Petit RA, Moller AG, Blumenthal JA, McAdam AJ, Priebe GP, Chande AT, Rishishwar L, Jordan IK, Read TD, Goldberg JB. Whole-Genome Sequences of Staphylococcus aureus Isolates from Cystic Fibrosis Lung Infections. Microbiol Resour Announc 2019; 8:e01564-18. [PMID: 30687841 PMCID: PMC6346173 DOI: 10.1128/mra.01564-18] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/28/2018] [Accepted: 12/12/2018] [Indexed: 02/05/2023]
Abstract
Staphylococcus aureus is an early colonizer in the lungs of individuals with cystic fibrosis (CF), but surprisingly, only a limited number of genomes from CF-associated S. aureus isolates have been sequenced. Here, we present the whole-genome sequences of 65 S. aureus isolates obtained from 50 individuals with CF.
48
Norris ET, Wang L, Conley AB, Rishishwar L, Mariño-Ramírez L, Valderrama-Aguirre A, Jordan IK. Genetic ancestry, admixture and health determinants in Latin America. BMC Genomics 2018; 19:861. [PMID: 30537949 PMCID: PMC6288849 DOI: 10.1186/s12864-018-5195-7] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Indexed: 02/07/2023]
Abstract
BACKGROUND Modern Latin American populations were formed via genetic admixture among ancestral source populations from Africa, the Americas and Europe. We are interested in studying how combinations of genetic ancestry in admixed Latin American populations may impact genomic determinants of health and disease. For this study, we characterized the impact of ancestry and admixture on genetic variants that underlie health- and disease-related phenotypes in population genomic samples from Colombia, Mexico, Peru, and Puerto Rico. RESULTS We analyzed a total of 347 admixed Latin American genomes along with 1102 putative ancestral source genomes from Africans, Europeans, and Native Americans. We characterized the genetic ancestry, relatedness, and admixture patterns for each of the admixed Latin American genomes, finding a spectrum of ancestry proportions within and between populations. We then identified single nucleotide polymorphisms (SNPs) with anomalous ancestry-enrichment patterns, i.e. SNPs that exist in any given Latin American population at a higher frequency than expected based on the population's genetic ancestry profile. For this set of ancestry-enriched SNPs, we inspected their phenotypic impact on disease, metabolism, and the immune system. All four of the Latin American populations show ancestry-enrichment for a number of shared pathways, yielding evidence of similar selection pressures on these populations during their evolution. For example, all four populations show ancestry-enriched SNPs in multiple genes from immune system pathways, such as the cytokine receptor interaction, T cell receptor signaling, and antigen presentation pathways. We also found SNPs with excess African or European ancestry that are associated with ancestry-specific gene expression patterns and play crucial roles in the immune system and infectious disease responses. 
Genes from both the innate and adaptive immune system were found to be regulated by ancestry-enriched SNPs with population-specific regulatory effects. CONCLUSIONS Ancestry-enriched SNPs in Latin American populations have a substantial effect on health- and disease-related phenotypes. The concordant impact observed for the same phenotypes across populations points to a process of adaptive introgression, whereby ancestry-enriched SNPs with specific functional utility appear to have been retained in modern populations by virtue of their effects on health and fitness.
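The criterion of a SNP occurring "at a higher frequency than expected based on the population's genetic ancestry profile" can be made concrete under a simple admixture model, in which the expected allele frequency is the ancestry-weighted average of the source-population frequencies, E[p] = sum over k of q_k * p_k. The sketch below assumes that model; the ancestry proportions and allele frequencies are hypothetical, and the study's exact test statistic may differ.

```python
from typing import Dict


def expected_frequency(ancestry: Dict[str, float], source_freq: Dict[str, float]) -> float:
    """Ancestry-weighted mean of source-population allele frequencies."""
    if abs(sum(ancestry.values()) - 1.0) > 1e-6:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(q * source_freq[pop] for pop, q in ancestry.items())


def enrichment(observed: float, ancestry: Dict[str, float], source_freq: Dict[str, float]) -> float:
    """Observed minus expected frequency; positive values indicate an excess."""
    return observed - expected_frequency(ancestry, source_freq)


# Hypothetical admixed population (AFR/EUR/NAT ancestry) and one SNP's source frequencies.
ancestry = {"AFR": 0.10, "EUR": 0.60, "NAT": 0.30}
source_freq = {"AFR": 0.50, "EUR": 0.20, "NAT": 0.10}
print(round(expected_frequency(ancestry, source_freq), 3))  # 0.2
print(round(enrichment(0.35, ancestry, source_freq), 3))    # 0.15
```

Deciding whether a given excess is "anomalous" additionally requires a null distribution, for example the spread of such differences across all SNPs genome-wide.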
49
Rishishwar L, Wang L, Wang J, Yi SV, Lachance J, Jordan IK. Evidence for positive selection on recent human transposable element insertions. Gene 2018; 675:69-79. [DOI: 10.1016/j.gene.2018.06.077] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 06/20/2018] [Accepted: 06/24/2018] [Indexed: 11/29/2022]
50
Rishishwar L, Mariño-Ramírez L, Jordan IK. Benchmarking computational tools for polymorphic transposable element detection. Brief Bioinform 2017; 18:908-918. [PMID: 27524380 DOI: 10.1093/bib/bbw072] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 04/14/2016] [Indexed: 12/19/2022]
Abstract
Transposable elements (TEs) are an important source of human genetic variation with demonstrable effects on phenotype. Recently, a number of computational methods for the detection of polymorphic TE (polyTE) insertion sites from next-generation sequence data have been developed. The use of such tools will become increasingly important as the pace of human genome sequencing accelerates. For this report, we performed a comparative benchmarking and validation analysis of polyTE detection tools in an effort to inform their selection and use by the TE research community. We analyzed a core set of seven tools with respect to ease of use and accessibility, polyTE detection performance and runtime parameters. An experimentally validated set of 893 human polyTE insertions was used for this purpose, along with a series of simulated data sets that allowed us to assess the impact of sequence coverage on tool performance. The recently developed tool MELT showed the best overall performance followed by Mobster and then RetroSeq. PolyTE detection tools can best detect Alu insertion events in the human genome with reduced reliability for L1 insertions and substantially lowered performance for SVA insertions. We also show evidence that different polyTE detection tools are complementary with respect to their ability to detect a complete set of insertion events. Accordingly, a combined approach, coupled with manual inspection of individual results, may yield the best overall performance. In addition to the benchmarking results, we also provide notes on tool installation and usage as well as suggestions for future polyTE detection algorithm development.
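Benchmarking against an experimentally validated insertion set typically reduces to matching predicted insertion sites to validated ones and computing precision, recall, and F1. The sketch below assumes greedy one-to-one matching on the same chromosome within a fixed window; the 100 bp window and the coordinates are hypothetical choices, not parameters taken from the study.

```python
from typing import List, Tuple

Insertion = Tuple[str, int]  # (chromosome, position) of a TE insertion call


def benchmark(predicted: List[Insertion], validated: List[Insertion],
              window: int = 100) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted insertions against a validated set.

    Each validated insertion can be matched by at most one prediction on the
    same chromosome within `window` bp (greedy, first match wins).
    """
    unmatched = list(validated)
    true_pos = 0
    for chrom, pos in predicted:
        for i, (v_chrom, v_pos) in enumerate(unmatched):
            if chrom == v_chrom and abs(pos - v_pos) <= window:
                true_pos += 1
                del unmatched[i]  # safe: we break out of the loop immediately
                break
    false_pos = len(predicted) - true_pos
    false_neg = len(validated) - true_pos
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if validated else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


validated = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
predicted = [("chr1", 1050), ("chr2", 10000)]
print(benchmark(predicted, validated))
```

Because different tools are complementary, the same function could score the union of two tools' call sets; with this greedy matching, however, near-duplicate calls from different tools would count as false positives unless they are merged first.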