1
Genomic data in the All of Us Research Program. Nature 2024; 627:340-346. [PMID: 38374255 PMCID: PMC10937371 DOI: 10.1038/s41586-023-06957-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 12/08/2023] [Indexed: 02/21/2024]
Abstract
Comprehensively mapping the genetic basis of human disease across diverse individuals is a long-standing goal for the field of human genetics1-4. The All of Us Research Program is a longitudinal cohort study aiming to enrol a diverse group of at least one million individuals across the USA to accelerate biomedical research and improve human health5,6. Here we describe the programme's genomics data release of 245,388 clinical-grade genome sequences. This resource is unique in its diversity as 77% of participants are from communities that are historically under-represented in biomedical research and 46% are individuals from under-represented racial and ethnic minorities. All of Us identified more than 1 billion genetic variants, including more than 275 million previously unreported genetic variants, more than 3.9 million of which had coding consequences. Leveraging linkage between genomic data and the longitudinal electronic health record, we evaluated 3,724 genetic variants associated with 117 diseases and found high replication rates across both participants of European ancestry and participants of African ancestry. Summary-level data are publicly available, and individual-level data can be accessed by researchers through the All of Us Researcher Workbench using a unique data passport model with a median time from initial researcher registration to data access of 29 hours. We anticipate that this diverse dataset will advance the promise of genomic medicine for all.
2
HIVseqDB: a portable resource for NGS and sample metadata integration for HIV-1 drug resistance analysis. Bioinformatics Advances 2024; 4:vbae008. [PMID: 38312948 PMCID: PMC10834361 DOI: 10.1093/bioadv/vbae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Revised: 12/29/2023] [Accepted: 01/12/2024] [Indexed: 02/06/2024]
Abstract
Summary Human immunodeficiency virus (HIV) remains a public health threat, with drug resistance being a major concern in HIV treatment. Next-generation sequencing (NGS) is a powerful tool for identifying low-abundance drug resistance mutations (LA-DRMs) that conventional Sanger sequencing cannot reliably detect. To fully understand the significance of LA-DRMs, it is necessary to integrate NGS data with clinical and demographic data. However, freely available tools for NGS-based HIV-1 drug resistance analysis do not integrate these data. This poses a challenge in interpretation of the impact of LA-DRMs, mainly for resource-limited settings due to the shortage of bioinformatics expertise. To address this challenge, we present HIVseqDB, a portable, secure, and user-friendly resource for integrating NGS data with associated clinical and demographic data for analysis of HIV drug resistance. HIVseqDB currently supports uploading of NGS data and associated sample data, HIV-1 drug resistance data analysis, browsing of uploaded data, and browsing and visualizing of analysis results. Each function of HIVseqDB corresponds to an individual Django application. This ensures efficient incorporation of additional features with minimal effort. HIVseqDB can be deployed on various computing environments, such as on-premises high-performance computing facilities and cloud-based platforms. Availability and implementation HIVseqDB is available at https://github.com/AlfredUg/HIVseqDB. A deployed instance of HIVseqDB is available at https://hivseqdb.org.
3
Sick individuals, sick populations revisited: a test of the Rose hypothesis for type 2 diabetes disparities. BMJ Public Health 2023; 1:e000655. [PMID: 38239263 PMCID: PMC10795613 DOI: 10.1136/bmjph-2023-000655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2024]
Abstract
Introduction The Rose hypothesis predicts that since genetic variation is greater within than between populations, genetic risk factors will be associated with individuals' risk of disease but not population disparities, and since socioenvironmental variation is greater between than within populations, socioenvironmental risk factors will be associated with population disparities but not individuals' disease risk. Methods We used the UK Biobank to test the Rose hypothesis for type 2 diabetes (T2D) ethnic disparities in the UK. Our cohort consists of 26 912 participants from Asian, black and white ethnic groups. Participants were characterised as T2D cases or controls based on the presence or absence of T2D diagnosis codes in electronic health records. T2D genetic risk was measured using a polygenic risk score (PRS), and socioeconomic deprivation was measured with the Townsend Index (TI). The variation of genetic (PRS) and socioeconomic (TI) risk factors within and between ethnic groups was calculated using analysis of variance. Multivariable logistic regression was used to associate PRS and TI with T2D cases, and mediation analysis was used to analyse the effect of PRS and TI on T2D ethnic group disparities. Results T2D prevalence differs for Asian 23.34% (OR=5.14, CI=4.68 to 5.65), black 16.64% (OR=3.81, CI=3.44 to 4.22) and white 7.35% (reference) ethnic groups in the UK. Both genetic and socioenvironmental T2D risk factors show greater within (w) than between (b) ethnic group variation: PRS w=64.60%, b=35.40%; TI w=71.18%, b=28.19%. Nevertheless, both genetic risk (PRS OR=1.96, CI=1.87 to 2.07) and socioeconomic deprivation (TI OR=1.09, CI=1.08 to 1.10) are associated with T2D individual risk and mediate T2D ethnic disparities (Asian PRS=22.5%, TI=9.8%; black PRS=32.0%, TI=25.3%). Conclusion A relative excess of within-group versus between-group variation does not preclude T2D risk factors from contributing to T2D ethnic disparities. 
Our results support an integrative approach to health disparities research that includes both genetic and socioenvironmental risk factors.
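The within-group versus between-group split reported in this abstract can be illustrated with a sum-of-squares partition, which is the standard decomposition behind a one-way analysis of variance. This is a minimal sketch under stated assumptions: the abstract does not specify the exact computation, and the function name and toy values below are hypothetical, not UK Biobank data.

```python
def variance_partition(groups):
    """Split total variation into within-group and between-group fractions.

    `groups` maps a group label to a list of numeric values (for example a
    risk score per participant). Returns (within_fraction, between_fraction).
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of individuals around their own group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    ss_total = ss_between + ss_within
    return ss_within / ss_total, ss_between / ss_total


# Hypothetical toy data, for illustration only.
toy_scores = {"group_a": [1.0, 2.0, 3.0], "group_b": [3.0, 4.0, 5.0]}
within, between = variance_partition(toy_scores)
```

A risk factor can show a larger within fraction than between fraction and still differ in mean across groups, which is the situation the abstract's conclusion addresses.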
4
The role of admixture in the rare variant contribution to inflammatory bowel disease. Genome Med 2023; 15:97. [PMID: 37968638 PMCID: PMC10647102 DOI: 10.1186/s13073-023-01244-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 10/10/2023] [Indexed: 11/17/2023]
Abstract
BACKGROUND Identification of rare variants involved in complex, polygenic diseases like Crohn's disease (CD) has accelerated with the introduction of whole exome/genome sequencing association studies. Rare variants can be used in both diagnostic and therapeutic assessments; however, since they are likely to be restricted to specific ancestry groups, their contributions to risk assessment need to be evaluated outside the discovery population. Prior studies implied that the three known rare variants in NOD2 are absent in West African and Asian populations and only contribute in African Americans via admixture. METHODS Whole genome sequencing (WGS) data from 3418 African American individuals, comprising 1774 inflammatory bowel disease (IBD) cases and 1644 controls, were used to assess odds ratios and allele frequencies (AF), as well as haplotype-specific ancestral origins of European-derived CD variants discovered in a large exome-wide association study. Local and global ancestry inference was performed to assess the contribution of admixture to IBD, contrasting European and African American cohorts. RESULTS Twenty-five rare variants associated with CD in European discovery cohorts are typically found at five-fold lower frequency in African Americans. Correspondingly, where comparisons could be made, the rare variants were found to have a predicted four-fold reduced burden for IBD in African Americans, when compared to European individuals. Almost all of the rare CD European variants were found on European haplotypes in the African American cohort, implying that they contribute to disease risk in African Americans primarily due to recent admixture. In addition, the proportion of European ancestry correlates with the number of rare CD European variants each African American individual carries, as well as with their polygenic risk of disease. Similar findings were observed for 23 mutations affecting 10 other common complex diseases for which the rare variants were discovered in European cohorts.
CONCLUSIONS European-derived Crohn's disease rare variants are even more rare in African Americans and contribute to disease risk mainly due to admixture, which needs to be accounted for when performing cross-ancestry genetic assessments.
5
Ancestry-attenuated effects of socioeconomic deprivation on type 2 diabetes disparities in the All of Us cohort. BMC Global and Public Health 2023; 1:22. [PMID: 38045036 PMCID: PMC10693462 DOI: 10.1186/s44263-023-00025-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 09/28/2023] [Indexed: 12/05/2023]
Abstract
Background Diabetes is a common disease with a major burden on morbidity, mortality, and productivity. Type 2 diabetes (T2D) accounts for roughly 90% of all diabetes cases in the USA and has a greater observed prevalence among those who identify as Black or Hispanic. Methods This study aimed to assess T2D racial and ethnic disparities using the All of Us Research Program data and to measure associations between genetic ancestry (GA), socioeconomic deprivation, and T2D. We used the All of Us Researcher Workbench to analyze T2D prevalence and model its associations with GA, individual-level (iSDI), and zip code-based (zSDI) socioeconomic deprivation indices among participant self-identified race and ethnicity (SIRE) groups. Results The study cohort consists of 86,488 participants from the four largest SIRE groups in All of Us: Asian (n = 2311), Black (n = 16,282), Hispanic (n = 16,966), and White (n = 50,292). SIRE groups show characteristic genetic ancestry patterns, consistent with their diverse origins, together with a continuum of ancestry fractions within and between groups. The Black and Hispanic groups show the highest levels of socioeconomic deprivation, followed by the Asian and White groups. Black participants show the highest age- and sex-adjusted T2D prevalence (21.9%), followed by the Hispanic (19.9%), Asian (15.1%), and White (14.8%) groups. Minority SIRE groups and socioeconomic deprivation, both iSDI and zSDI, are positively associated with T2D when the entire cohort is analyzed together. However, SIRE and GA both show negative interaction effects with iSDI and zSDI on T2D. Higher levels of iSDI and zSDI are negatively associated with T2D in the Black and Hispanic groups, and higher levels of iSDI and zSDI are negatively associated with T2D at high levels of African and Native American ancestry. Conclusions Socioeconomic deprivation is associated with a higher prevalence of T2D in Black and Hispanic minority groups, compared to the majority White group.
Nonetheless, socioeconomic deprivation is associated with reduced T2D risk within the Black and Hispanic groups. These results are paradoxical and have not been reported elsewhere, with possible explanations related to the nature of the All of Us data along with SIRE group differences in access to healthcare, diet, and lifestyle.
6
Population Pharmacogenomics for Health Equity. Genes (Basel) 2023; 14:1840. [PMID: 37895188 PMCID: PMC10606908 DOI: 10.3390/genes14101840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 09/19/2023] [Accepted: 09/20/2023] [Indexed: 10/29/2023]
Abstract
Health equity means the opportunity for all people and populations to attain optimal health, and it requires intentional efforts to promote fairness in patient treatments and outcomes. Pharmacogenomic variants are genetic differences associated with how patients respond to medications, and their presence can inform treatment decisions. In this perspective, we contend that the study of pharmacogenomic variation within and between human populations (population pharmacogenomics) can and should be leveraged in support of health equity. The key observation in support of this contention is that racial and ethnic groups exhibit pronounced differences in the frequencies of numerous pharmacogenomic variants, with direct implications for clinical practice. The use of race and ethnicity to stratify pharmacogenomic risk provides a means to avoid potential harm caused by biases introduced when treatment regimens do not consider genetic differences between population groups, particularly when majority group genetic profiles are assumed to hold for minority groups. We focus on the mitigation of adverse drug reactions as an area where population pharmacogenomics can have a direct and immediate impact on public health.
7
Ancestry-attenuated effects of socioeconomic deprivation on type 2 diabetes disparities in the All of Us cohort. Research Square 2023:rs.3.rs-2976764. [PMID: 37790565 PMCID: PMC10543018 DOI: 10.21203/rs.3.rs-2976764/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Background Diabetes is a common disease with a major burden on morbidity, mortality, and productivity. Type 2 diabetes (T2D) accounts for roughly 90% of all diabetes cases in the United States and has greater observed prevalence among those who identify as Black or Hispanic. Methods The aims of this study were to determine whether T2D racial and ethnic disparities can be observed in data from the All of Us Research Program and to measure associations of genetic ancestry (GA) and socioeconomic deprivation with T2D. The All of Us Researcher Workbench was used to calculate T2D prevalence and to model T2D associations with GA, individual-level (iSDI) and zip code-based (zSDI) socioeconomic deprivation indices within and between participant self-identified race and ethnicity (SIRE) groups. Results The study cohort consists of 86,488 participants from the four largest SIRE groups in All of Us: Asian (n=2,311), Black (n=16,282), Hispanic (n=16,966), and White (n=50,292). SIRE groups show characteristic genetic ancestry patterns, consistent with their diverse origins, together with a continuum of ancestry fractions within and between groups. The Black and Hispanic groups show the highest median SDI values, followed by the Asian and White groups. Black participants show the highest age- and sex-adjusted T2D prevalence (21.9%), followed by the Hispanic (19.9%), Asian (15.1%), and White (14.8%) groups. Minority SIRE groups and socioeconomic deprivation are positively associated with T2D when the entire cohort is analyzed together. However, SIRE and GA both show negative interaction effects with SDI on T2D. Higher levels of SDI are negatively associated with T2D in the Black and Hispanic groups, and higher levels of SDI are negatively associated with T2D at high levels of African and Native American ancestry.
Conclusion Socioeconomic deprivation is positively associated with the SIRE group T2D disparities observed here but negatively associated with T2D within the Black and Hispanic groups that show the highest T2D prevalence. These results are paradoxical and have not been reported elsewhere. We discuss possible explanations for this paradox related to the nature of the All of Us data along with SIRE group differences in access to healthcare, diet, and lifestyle.
8
Genomic analysis of Chlamydia psittaci from a multistate zoonotic outbreak in two chicken processing plants. J Genomics 2023; 11:40-44. [PMID: 37670735 PMCID: PMC10475345 DOI: 10.7150/jgen.86558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 07/27/2023] [Indexed: 09/07/2023]
Abstract
Four Chlamydia psittaci isolates were recovered from clinical specimens from ill workers during a multistate outbreak at two chicken processing plants. Whole genome sequencing analyses revealed high similarity to C. psittaci genotype D. The isolates differed from each other by only two single nucleotide polymorphisms, indicating a common source.
9
Race, Ethnicity, and Pharmacogenomic Variation in the United States and the United Kingdom. Pharmaceutics 2023; 15:1923. [PMID: 37514109 PMCID: PMC10383154 DOI: 10.3390/pharmaceutics15071923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 06/30/2023] [Accepted: 07/05/2023] [Indexed: 07/30/2023]
Abstract
The relevance of race and ethnicity to genetics and medicine has long been a matter of debate. An emerging consensus holds that race and ethnicity are social constructs and thus poor proxies for genetic diversity. The goal of this study was to evaluate the relationship between race, ethnicity, and clinically relevant pharmacogenomic variation in cosmopolitan populations. We studied racially and ethnically diverse cohorts of 65,120 participants from the United States All of Us Research Program (All of Us) and 31,396 participants from the United Kingdom Biobank (UKB). Genome-wide patterns of pharmacogenomic variation (6311 drug response-associated variants for All of Us and 5966 variants for UKB) were analyzed with machine learning classifiers to predict participants' self-identified race and ethnicity. Pharmacogenomic variation predicts race/ethnicity with averages of 92.1% accuracy for All of Us and 94.3% accuracy for UKB. Group-specific prediction accuracies range from 99.0% for the White group in UKB to 92.9% for the Hispanic group in All of Us. Prediction accuracies are substantially lower for individuals who identified with more than one group in All of Us (16.7%) or as Mixed in UKB (70.7%). There are numerous individual pharmacogenomic variants with large allele frequency differences between race/ethnicity groups in both cohorts. Frequency differences for toxicity-associated variants predict hundreds of adverse drug reactions per 1000 treated participants for minority groups in All of Us. Our results indicate that race and ethnicity can be used to stratify pharmacogenomic risk in the US and UK populations and should not be discounted when making treatment decisions.
We resolve the contradiction between the results reported here and the orthodoxy of race and ethnicity as non-genetic, social constructs by emphasizing the distinction between global and local patterns of human genetic diversity, and we stress the current and future limitations of race and ethnicity as proxies for pharmacogenomic variation.
10
Transcriptome Analysis Identifies Tumor Immune Microenvironment Signaling Networks Supporting Metastatic Castration-Resistant Prostate Cancer. Onco 2023; 3:81-95. [PMID: 38435029 PMCID: PMC10906979 DOI: 10.3390/onco3020007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2024]
Abstract
Prostate cancer (PCa) is the second most common cause of cancer death in American men. Metastatic castration-resistant prostate cancer (mCRPC) is the most lethal form of PCa and preferentially metastasizes to the bones through incompletely understood molecular mechanisms. Herein, we processed RNA sequencing data from patients with mCRPC (n = 60) and identified 14 gene clusters (modules) highly correlated with mCRPC bone metastasis. We used a novel combination of weighted gene co-expression network analysis (WGCNA) and upstream regulator and gene ontology analyses of clinically annotated transcriptomes to identify the genes. The cyan module (M14) had the strongest positive correlation (0.81, p = 4 × 10^-15) with mCRPC bone metastasis. It was associated with two significant biological pathways through KEGG enrichment analysis (parathyroid hormone synthesis, secretion, and action; and protein digestion and absorption). In particular, we identified 10 hub genes (ALPL, PHEX, RUNX2, ENPP1, PHOSPHO1, PTH1R, COL11A1, COL24A1, COL22A1, and COL13A1) using cytoHubba of Cytoscape. We also found high gene expression for collagen formation, degradation, absorption, cell-signaling peptides, and bone regulation processes through Gene Ontology (GO) enrichment analysis.
11
The landscape of health disparities in the UK Biobank. Database (Oxford) 2023; 2023:7143539. [PMID: 37114803 PMCID: PMC10132819 DOI: 10.1093/database/baad026] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/26/2022] [Revised: 03/01/2023] [Accepted: 04/05/2023] [Indexed: 04/29/2023]
Abstract
The UK Biobank (UKB), a large-scale biomedical database that includes demographic and electronic health record data for more than half a million ethnically diverse participants, is a potentially valuable resource for the study of health disparities. However, publicly accessible databases that catalog health disparities in the UKB do not exist. We developed the UKB Health Disparities Browser with the aims of (i) facilitating the exploration of the landscape of health disparities in the UK and (ii) directing the attention to areas of disparities research that might have the greatest public health impact. Health disparities were characterized for UKB participant groups defined by age, country of residence, ethnic group, sex and socioeconomic deprivation. We defined disease cohorts for UKB participants by mapping participant International Classification of Diseases, Tenth Revision (ICD-10) diagnosis codes to phenotype codes (phecodes). For each of the population attributes used to define population groups, disease percent prevalence values were computed for all groups from phecode case-control cohorts, and the magnitude of the disparities was calculated by both the difference and ratio of the range of disease prevalence values among groups to identify high- and low-prevalence disparities. We identified numerous diseases and health conditions with disparate prevalence values across population attributes, and we deployed an interactive web browser to visualize the results of our analysis: https://ukbatlas.health-disparities.org. The interactive browser includes overall and group-specific prevalence data for 1513 diseases based on a cohort of >500 000 participants from the UKB. Researchers can browse and sort by disease prevalence and prevalence differences to visualize health disparities for each of the five population attributes, and users can search for diseases of interest by disease names or codes. Database URL https://ukbatlas.health-disparities.org/.
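The disparity magnitude described in this abstract (the difference and the ratio across the range of group prevalence values) can be sketched as follows. This is an illustrative reading of the abstract, not the browser's actual code; the function name and the prevalence values are hypothetical.

```python
def disparity_magnitude(prevalence_by_group):
    """Return (difference, ratio) between the highest and lowest group prevalence.

    `prevalence_by_group` maps a group label to percent prevalence of one
    disease. The ratio is reported as infinity when the lowest prevalence is 0.
    """
    values = list(prevalence_by_group.values())
    highest, lowest = max(values), min(values)
    difference = highest - lowest
    ratio = highest / lowest if lowest > 0 else float("inf")
    return difference, ratio


# Hypothetical prevalence values (percent), for illustration only.
example_prevalence = {"group_a": 12.0, "group_b": 6.0, "group_c": 3.0}
diff, ratio = disparity_magnitude(example_prevalence)
```

The two measures highlight different diseases: the difference favors common conditions with large absolute gaps, while the ratio can flag rare conditions with large relative gaps.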
12
Ethnic disparities in mortality and group-specific risk factors in the UK Biobank. PLOS Global Public Health 2023; 3:e0001560. [PMID: 36963080 PMCID: PMC10021328 DOI: 10.1371/journal.pgph.0001560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 01/09/2023] [Indexed: 02/25/2023]
Abstract
Despite a substantial overall decrease in mortality, disparities among ethnic minorities in developed countries persist. This study investigated mortality disparities and their associated risk factors for the three largest ethnic groups in the United Kingdom: Asian, Black, and White. Study participants were sampled from the UK Biobank (UKB), a prospective cohort enrolled between 2006 and 2010. Genetics, biological samples, and health information and outcomes data of UKB participants were downloaded, and data-fields were prioritized based on participants with death registry records. The Kaplan-Meier method was used to evaluate survival differences among ethnic groups; survival random forest feature selection followed by Cox proportional-hazard modeling was used to identify and estimate the effects of shared and ethnic group-specific mortality risk factors. The White ethnic group showed significantly worse survival probability than the Asian and Black groups. In all three ethnic groups, endoscopy and colonoscopy procedures showed significant protective effects on overall mortality. Asian and Black women showed lower relative risk of mortality than men, whereas no significant effect of sex was seen for the White group. The strongest ethnic group-specific mortality associations were ischemic heart disease for Asians, COVID-19 for Blacks, and cancers of respiratory/intrathoracic organs for Whites. Mental health-related diagnoses, including substance abuse, anxiety, and depression, were a major risk factor for overall mortality in the Asian group. The effect of mental health on Asian mortality, particularly for digestive cancers, was exacerbated by an observed hesitance to answer mental health questions, possibly related to cultural stigma.
C-reactive protein (CRP) serum levels were associated with both overall and cause-specific mortality due to COVID-19 and digestive cancers in the Black group, where elevated CRP has previously been linked to psychosocial stress due to discrimination. Our results point to mortality risk factors that are group-specific and modifiable, supporting targeted interventions towards greater health equity.
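For readers unfamiliar with the Kaplan-Meier method named in this abstract, the estimator can be sketched in a few lines of plain Python. This is a generic textbook version with hypothetical toy data; it is not the authors' analysis code, and a real study would typically use an established survival analysis library.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    `durations` are follow-up times; `events` are 1 if death was observed at
    that time and 0 if the participant was censored. Returns a list of
    (time, survival_probability) pairs, one per distinct death time.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored participants both leave the risk set.
        n_at_risk -= removed
    return curve


# Hypothetical toy data: five participants, two of them censored.
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Computing one such curve per ethnic group and comparing them is the kind of survival comparison the abstract describes; the toy curve here steps down at times 1, 2, and 3.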
13
Abstract
This study assesses racial and ethnic differences in overall burden of firearm-related mortality and in change in firearm-related mortality among youths from 1999 to 2020.
14
QuasiFlow: a Nextflow pipeline for analysis of NGS-based HIV-1 drug resistance data. Bioinformatics Advances 2022; 2:vbac089. [PMID: 36699347 PMCID: PMC9722223 DOI: 10.1093/bioadv/vbac089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 11/10/2022] [Accepted: 11/24/2022] [Indexed: 11/30/2022]
Abstract
Summary Next-generation sequencing (NGS) enables reliable detection of resistance mutations in minority variants of human immunodeficiency virus type 1 (HIV-1). There is a paucity of evidence for the association of minority resistance variants with treatment failure, and this requires evaluation. However, the tools for analyzing HIV-1 drug resistance (HIVDR) testing data are mostly web-based, which requires uploading data to webservers. This is a challenge for laboratories with internet connectivity issues and in settings where data transfer across networks is restricted. We present QuasiFlow, a pipeline for reproducible analysis of NGS-based HIVDR testing data across different computing environments. Since QuasiFlow depends entirely on command-line tools and a local copy of the reference database, it eliminates challenges associated with uploading HIV-1 NGS data onto webservers. The pipeline takes raw sequence reads in FASTQ format as input and generates a user-friendly report in PDF/HTML format. The drug resistance scores obtained using QuasiFlow were 100% and 99.12% identical to those obtained using the web-based HIVdb program and HyDRA web, respectively, at a mutation detection threshold of 20%. Availability and implementation QuasiFlow and corresponding documentation are publicly available at https://github.com/AlfredUg/QuasiFlow. The pipeline is implemented in Nextflow and requires regular updating of the Stanford HIV drug resistance interpretation algorithm. Supplementary information Supplementary data are available at Bioinformatics Advances online.
15
Investigation of hypertension and type 2 diabetes as risk factors for dementia in the All of Us cohort. Sci Rep 2022; 12:19797. [PMID: 36396674 PMCID: PMC9672061 DOI: 10.1038/s41598-022-23353-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/17/2021] [Accepted: 10/31/2022] [Indexed: 11/18/2022]
Abstract
The World Health Organization recently defined hypertension and type 2 diabetes (T2D) as modifiable comorbidities leading to dementia and Alzheimer's disease. In the United States (US), hypertension and T2D are health disparities, with higher prevalence seen for Black and Hispanic minority groups compared to the majority White population. We hypothesized that elevated prevalence of hypertension and T2D risk factors in Black and Hispanic groups may be associated with dementia disparities. We interrogated this hypothesis using a cross-sectional analysis of participant data from the All of Us (AoU) Research Program, a large observational cohort study of US residents. The specific objectives of our study were: (1) to compare the prevalence of dementia, hypertension, and T2D in the AoU cohort to previously reported prevalence values for the US population, (2) to investigate the association of hypertension, T2D, and race/ethnicity with dementia, and (3) to investigate whether race/ethnicity modifies the association of hypertension and T2D with dementia. AoU participants were recruited from 2018 to 2019 as part of the initial project cohort (R2019Q4R3). Participants aged 40-80 with electronic health records and demographic data (age, sex, race, and ethnicity) were included for analysis, yielding a final cohort of 125,637 individuals. AoU participants show similar prevalence of hypertension (32.1%) and T2D (13.9%) compared to the US population (32.0% and 10.5%, respectively); however, the prevalence of dementia for AoU participants (0.44%) is an order of magnitude lower than seen for the US population (5%). AoU participants with dementia show a higher prevalence of hypertension (81.6% vs. 31.9%) and T2D (45.9% vs. 11.4%) compared to non-dementia participants. Dominance analysis of a multivariable logistic regression model with dementia as the outcome shows that hypertension, age, and T2D have the strongest associations with dementia.
Hispanic was the only race/ethnicity group that showed a significant association with dementia, and the association of sex with dementia was non-significant. The association of T2D with dementia is likely explained by concurrent hypertension, since > 90% of participants with T2D also had hypertension. Black race and Hispanic ethnicity interact with hypertension, but not T2D, to increase the odds of dementia. This study underscores the utility of the AoU participant cohort to study disease prevalence and risk factors. We do notice a lower participation of aged minorities and participants with dementia, revealing an opportunity for targeted engagement. Our results indicate that targeting hypertension should be a priority for risk factor modifications to reduce dementia incidence.
16
Comorbidities and ethnic health disparities in the UK biobank. JAMIA Open 2022; 5:ooac057. [PMID: 36313969 PMCID: PMC9272510 DOI: 10.1093/jamiaopen/ooac057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 06/15/2022] [Accepted: 06/24/2022] [Indexed: 11/15/2022]
Abstract
Objective: The goal of this study was to investigate the relationship between comorbidities and ethnic health disparities in a diverse, cosmopolitan population. Materials and Methods: We used the UK Biobank (UKB), a large prospective cohort study of the UK population. Study participants self-identified with 1 of 5 ethnic groups and participant comorbidities were characterized using the 31 disease categories captured by the Elixhauser Comorbidity Index. Ethnic disparities in comorbidities were quantified as the extent to which disease prevalence within categories varies across ethnic groups and the extent to which pairs of comorbidities co-occur within ethnic groups. Disease-risk factor comorbidity pairs were identified where one comorbidity is known to be a risk factor for a co-occurring comorbidity. Results: The Asian ethnic group shows the greatest average number of comorbidities, followed by the Black and then White groups. The Chinese group shows the lowest average number of comorbidities. Comorbidity prevalence varies significantly among the ethnic groups for almost all disease categories, with diabetes and hypertension showing the largest differences across groups. Diabetes and hypertension both show ethnic-specific comorbidities that may contribute to the observed disease prevalence disparities. Discussion: These results underscore the extent to which comorbidities vary among ethnic groups and reveal group-specific disease comorbidities that may underlie ethnic health disparities. Conclusion: The study of comorbidity distributions across ethnic groups can be used to inform targeted group-specific interventions to reduce ethnic health disparities.
|
17
|
The Apportionment of Pharmacogenomic Variation: Race, Ethnicity, and Adverse Drug Reactions. MEDICAL RESEARCH ARCHIVES 2022; 10:10.18103/mra.v10i9.2986. [PMID: 36304842 PMCID: PMC9600569 DOI: 10.18103/mra.v10i9.2986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
Fifty years ago, Richard Lewontin found that the vast majority of human genetic variation falls within (~85%) rather than between (~15%) racial groups. This result has been replicated numerous times since and is widely taken to support the notion that genetic differences between racial groups are trivial and thus irrelevant for clinical decision-making. The aim of this study was to consider how the apportionment of pharmacogenomic variation within and between racial and ethnic groups relates to risk disparities for adverse drug reactions. We confirmed that the majority of pharmacogenomic variation falls within (97.3%) rather than between (2.78%) the three largest racial and ethnic groups in the United States: Black, Hispanic, and White. Nevertheless, pharmacogenomic variants showing far greater within-group than between-group variation can have high predictive value for adverse drug reactions, particularly for minority racial and ethnic groups. We predicted excess adverse drug reactions for minority Black and Hispanic groups, compared to the majority White group, and considered these results in light of the apportionment of genetic variation within and between groups. For 85% within-group and 15% between-group variation, 700 excess adverse drug reactions per 1,000 patients are predicted under a recessive effect model and 300 under a dominant model. We found high numbers of predicted Black and Hispanic excess adverse drug reactions for widely prescribed platinum chemotherapy compounds, such as cisplatin and oxaliplatin, as well as controlled narcotics, including fentanyl and tramadol. Our results indicate that race and ethnicity, while imprecise proxies for genetic diversity, correlate with patterns of pharmacogenomic variation in a way that is clearly relevant to medical treatment decisions. The effects of this variation are particularly pronounced for Black and Hispanic minority groups, owing to genetic differences from the majority White group. 
Treatment decisions that are made based on (assumed) White pharmacogenomic variant frequencies can be harmful for minority groups. Ignoring clinically relevant genetic differences among racial and ethnic groups, however well-intentioned, will exacerbate rather than ameliorate health disparities.
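The within-group versus between-group split described in this abstract can be illustrated for a single biallelic variant. The sketch below is a minimal example, assuming equal group weights and an expected-heterozygosity measure of diversity; the abstract does not state which statistic the authors used, and the function name and allele frequencies are hypothetical.

```python
def apportion_diversity(group_freqs):
    """Split expected heterozygosity for one biallelic variant into
    within-group and between-group fractions (equal group weights).

    Assumption: diversity is measured as expected heterozygosity, 2p(1-p).
    """
    if not group_freqs:
        raise ValueError("need at least one group allele frequency")
    n = len(group_freqs)
    p_bar = sum(group_freqs) / n                  # pooled allele frequency
    h_total = 2 * p_bar * (1 - p_bar)             # diversity in the pooled population
    if h_total == 0:
        raise ValueError("variant is monomorphic; apportionment is undefined")
    # Average diversity found inside each group
    h_within = sum(2 * p * (1 - p) for p in group_freqs) / n
    within = h_within / h_total
    return within, 1 - within


# Hypothetical allele frequencies for three groups (not taken from the study)
within, between = apportion_diversity([0.2, 0.5, 0.8])
print(f"within: {within:.2%}, between: {between:.2%}")
```

Even in this toy case, where the allele frequency differs fourfold between two of the groups, most of the diversity still falls within groups, which is the pattern the abstract contrasts with predictive value for adverse drug reactions.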
|
18
|
Effects of genetic ancestry and socioeconomic deprivation on ethnic differences in serum creatinine. Gene 2022; 837:146709. [PMID: 35772650 PMCID: PMC9288982 DOI: 10.1016/j.gene.2022.146709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Accepted: 06/24/2022] [Indexed: 11/18/2022]
Abstract
The inclusion of ethnicity in equations for estimating the glomerular filtration rate (eGFR) from serum creatinine levels has been challenged since ethnicity is socially defined and therefore a poor proxy for biological differences. We hypothesized that genetic ancestry (GA) would be more strongly associated with creatinine levels among healthy individuals than self-identified ethnicity. We studied a diverse cohort of 35,590 participants characterized as part of the UK Biobank, grouped by self-reported ethnicity: Black, East Asian, Mixed, Other, South Asian, and White. We used multivariable modeling to test for associations between ethnicity, GA, socioeconomic deprivation, and serum creatinine levels, including covariates for age, sex, height, and body mass index. Model fit comparisons and relative importance analysis were used to compare the effects of ethnicity and GA on creatinine levels. Black ethnicity shows a positive effect on participant serum creatinine levels (β = 9.36 ± 0.38), whereas East Asian (β = -1.80 ± 0.66) and South Asian (β = -0.28 ± 0.36) ethnicity show negative effects on creatinine. Male sex (β = 17.69 ± 0.34) and height (β = 0.13 ± 0.02) also show high positive associations with creatinine levels, while socioeconomic deprivation (β = -0.04 ± 0.04) shows no significant association. African ancestry has the highest association (β = 13.81 ± 0.52) with creatinine levels. Overall, GA (9.06%) explains significantly more of the variation in creatinine levels than ethnicity (4.96%), with African ancestry (6.36%) alone explaining more of the variation than ethnicity. We found that GA explains more of the variation in serum creatinine levels than socioeconomic deprivation, suggesting the possibility that ethnic differences in creatinine are shaped by genetic rather than social factors.
|
19
|
Mutations in SORL1 and MTHFDL1 possibly contribute to the development of Alzheimer's disease in a multigenerational Colombian Family. PLoS One 2022; 17:e0269955. [PMID: 35905044 PMCID: PMC9337667 DOI: 10.1371/journal.pone.0269955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2021] [Accepted: 05/31/2022] [Indexed: 11/19/2022] Open
Abstract
Alzheimer's disease (AD) is the most common cause of dementia in the elderly, affecting over 50 million people worldwide in 2020, and this number is projected to triple to 152 million by 2050. Much of the increase will be in developing countries like Colombia. In familial forms, highly penetrant mutations have been identified in three genes, APP, PSEN1, and PSEN2, supporting a role for amyloid-β peptide. In sporadic forms, more than 30 risk genes have been identified that are involved in lipid metabolism, the immune system, and synaptic functioning mechanisms. We used whole-exome sequencing (WES) to evaluate a family of 97 members, spanning three generations, with familial AD and without mutations in APP, PSEN1, or PSEN2. We sequenced two affected members and one unaffected member with the aim of identifying genetic variants that could explain the presence of the disease in the family, and the candidate variants were validated in eleven members. We also built a structural model to try to determine the effect on protein function. WES analysis identified two rare variants in the SORL1 and MTHFD1L genes segregating in the family with other potential risk variants in APOE, ABCA7, and CHAT, suggesting an oligogenic inheritance. Additionally, the structural 3D models of the SORL1 and MTHFD1L variants show that these variants produce polarity changes that favor hydrophobic interactions, resulting in local structural changes that could affect the protein function and may contribute to the development of the disease in this family.
|
20
|
Transcriptome analysis identifies networks and key drivers in tumor microenvironment for metastatic castration-resistant prostate cancer. THE JOURNAL OF IMMUNOLOGY 2022. [DOI: 10.4049/jimmunol.208.supp.179.02] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
Abstract
Prostate cancer (PCa) is the second most common cause of cancer-related death in men. The most lethal form of PCa is metastatic castration-resistant prostate cancer (CRPC) that has progressed to the bone. The molecular mechanism underlying PCa progression to bone has not been fully elucidated. Here, we identify gene networks and characterize the cell types that contribute to the heterogeneity and complexity of CRPC associated with bone metastasis. Networks were investigated by using a novel weighted gene co-expression network analysis (WGCNA) method and examining overrepresentation of upstream regulators and signaling pathways within co-expressed transcriptome modules across clinically annotated transcriptomes from mCRPC patients (N=60). Functional enrichment analysis, including Gene Ontology (GO) enrichment, Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment, and protein-protein interaction analysis, was used to identify biological functions of related hub genes overrepresented in our module of interest. WGCNA identified significant modules that are associated with CRPC bone metastasis. KEGG and GO analysis results revealed that genes in these modules were mainly related to collagen formation, cell signaling peptides, and bone regulation processes. Genes positively correlated with bone metastasis were enriched in the following biological pathways: PI3K-Akt signaling, ECM-receptor interaction, and protein digestion and absorption. This study provides novel insights into the biological pathways associated with CRPC metastasis to the bone and its tumor immune microenvironment (TIME) niche. The modules associated with bone metastasis and overall survival represent both known and novel pathways.
Supported by NIH/NCI Grant: MSM/Tuskegee U/UAB Comp. Cancer Center Partnership Name & Number of Grant Award: NCI - U54CA118638
|
21
|
Epigenetics and cancer disparities: when nature might be nurture. Oncoscience 2022; 9:23-24. [PMID: 35479648 PMCID: PMC9033022 DOI: 10.18632/oncoscience.555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Indexed: 11/25/2022] Open
|
22
|
Association of genetic ancestry and molecular signatures with cancer survival disparities: a pan-cancer analysis. Cancer Res 2022; 82:1222-1233. [DOI: 10.1158/0008-5472.can-21-2105] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/30/2021] [Revised: 09/20/2021] [Accepted: 01/18/2022] [Indexed: 11/16/2022]
|
23
|
Correction to: Genetic Ancestry Inference for Pharmacogenomics. Methods Mol Biol 2022; 2547:C1. [PMID: 37794232 DOI: 10.1007/978-1-0716-2573-6_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/06/2023]
|
24
|
Genetic Ancestry Inference for Pharmacogenomics. Methods Mol Biol 2022; 2547:595-609. [PMID: 36068478 PMCID: PMC9486757 DOI: 10.1007/978-1-0716-2573-6_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Genetic ancestry inference can be used to stratify patient cohorts and to model pharmacogenomic variation within and between populations. We provide a detailed guide to genetic ancestry inference using genome-wide genetic variant datasets, with an emphasis on two widely used techniques: principal components analysis (PCA) and ADMIXTURE analysis. PCA can be used for patient stratification and categorical ancestry inference, whereas ADMIXTURE is used to characterize genetic ancestry as a continuous variable. Visualization methods are critical for the interpretation of genetic ancestry inference methods, and we provide instructions for how the results of PCA and ADMIXTURE can be effectively visualized.
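As a rough illustration of the PCA step this abstract mentions, the sketch below computes first-principal-component scores for a small genotype matrix using power iteration in plain Python. It is a toy under stated assumptions rather than the chapter's protocol: the genotype values, the function name `first_pc_scores`, and the choice of power iteration are all hypothetical, and real analyses run on genome-wide variant sets with dedicated software.

```python
def first_pc_scores(genotypes, iterations=200):
    """Return unit-length PC1 scores, one per individual.

    genotypes: list of rows (individuals), each a list of 0/1/2 allele counts.
    Assumption: variants are only mean-centered, not variance-normalized.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Center each variant (column) on its mean allele count
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Individual-by-individual similarity (Gram) matrix
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered]
            for r1 in centered]
    # Power iteration from a fixed, deterministic starting vector
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        new = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = sum(x * x for x in new) ** 0.5
        if norm == 0:
            raise ValueError("no variation in genotype matrix")
        vec = [x / norm for x in new]
    return vec


# Hypothetical genotypes: individuals 0-2 and 3-5 come from two distinct groups
genotypes = [
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [0, 0, 2, 1],
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [2, 2, 0, 1],
]
scores = first_pc_scores(genotypes)
for i, s in enumerate(scores):
    print(f"individual {i}: PC1 = {s:+.3f}")
```

In this toy data the two groups fall on opposite sides of zero along PC1, which is the kind of separation used for categorical stratification; continuous ancestry fractions of the sort ADMIXTURE reports would need a different model.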
|
25
|
Comparing Genetic and Socioenvironmental Contributions to Ethnic Differences in C-Reactive Protein. Front Genet 2021; 12:738485. [PMID: 34733313 PMCID: PMC8558394 DOI: 10.3389/fgene.2021.738485] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/08/2021] [Accepted: 10/05/2021] [Indexed: 02/03/2023] Open
Abstract
C-reactive protein (CRP) is a routinely measured blood biomarker for inflammation. Elevated levels of circulating CRP are associated with response to infection, risk for a number of complex common diseases, and psychosocial stress. The objective of this study was to compare the contributions of genetic ancestry, socioenvironmental factors, and inflammation-related health conditions to ethnic differences in C-reactive protein levels. We used multivariable regression to compare CRP blood serum levels between Black and White ethnic groups from the United Kingdom Biobank (UKBB) prospective cohort study. CRP serum levels are significantly associated with ethnicity in an age and sex adjusted model. Study participants who identify as Black have higher average CRP than those who identify as White, CRP increases with age, and females have higher average CRP than males. Ethnicity and sex show a significant interaction effect on CRP. Black females have higher average CRP levels than White females, whereas White males have higher average CRP than Black males. Significant associations between CRP, ethnicity, and genetic ancestry are almost completely attenuated in a fully adjusted model that includes socioenvironmental factors and inflammation-related health conditions. BMI, smoking, and socioeconomic deprivation all have high relative effects on CRP. These results indicate that socioenvironmental factors contribute more to CRP ethnic differences than genetics. Differences in CRP are associated with ethnic disparities for a number of chronic diseases, including type 2 diabetes, essential hypertension, sarcoidosis, and lupus erythematosus. Our results indicate that ethnic differences in CRP are linked to both socioenvironmental factors and numerous ethnic health disparities.
|
26
|
The Impact of Ethnicity and Genetic Ancestry on Disease Prevalence and Risk in Colombia. Front Genet 2021; 12:690366. [PMID: 34650589 PMCID: PMC8507149 DOI: 10.3389/fgene.2021.690366] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/02/2021] [Accepted: 08/11/2021] [Indexed: 11/13/2022] Open
Abstract
Currently, the vast majority of genomic research cohorts are made up of participants with European ancestry. Genomic medicine will only reach its full potential when genomic studies become more broadly representative of global populations. We are working to support the establishment of genomic medicine in developing countries in Latin America via studies of ethnically and ancestrally diverse Colombian populations. The goal of this study was to analyze the effect of ethnicity and genetic ancestry on observed disease prevalence and predicted disease risk in Colombia. Population distributions of Colombia's three major ethnic groups - Mestizo, Afro-Colombian, and Indigenous - were compared to disease prevalence and socioeconomic indicators. Indigenous and Mestizo ethnicity show the highest correlations with disease prevalence, whereas the effect of Afro-Colombian ethnicity is substantially lower. Mestizo ethnicity is mostly negatively correlated with six high-impact health conditions and positively correlated with seven of eight common cancers; Indigenous ethnicity shows the opposite effect. Malaria prevalence in particular is strongly correlated with ethnicity. Disease prevalence co-varies across geographic regions, consistent with the regional distribution of ethnic groups. Ethnicity is also correlated with regional variation in human development, partially explaining the observed differences in disease prevalence. Patterns of genetic ancestry and admixture for a cohort of 624 individuals from Medellín were compared to disease risk inferred via polygenic risk scores (PRS). African genetic ancestry is most strongly correlated with predicted disease risk, whereas European and Native American ancestry show weaker effects. African ancestry is mostly positively correlated with disease risk, and European ancestry is mostly negatively correlated. 
The relationships between ethnicity and disease prevalence do not show an overall correspondence with the relationships between ancestry and disease risk. We discuss possible reasons for the divergent health effects of ethnicity and ancestry as well as the implication of our results for the development of precision medicine in Colombia.
|
27
|
Abstract
We investigated the ancestral origins of four Ecuadorian ethnic groups (Afro-Ecuadorian, Mestizo, Montubio, and the Indigenous Tsáchila) in an effort to gain insight into the relationship between ancestry, culture, and the formation of ethnic identities in Latin America. The observed patterns of genetic ancestry are largely concordant with ethnic identities and historical records of conquest and colonization in Ecuador. Nevertheless, a number of exceptional findings highlight the complex relationship between genetic ancestry and ethnicity in Ecuador. Afro-Ecuadorians show far less African ancestry, and the highest levels of Native American ancestry, seen for any Afro-descendant population in the Americas. Mestizos in Ecuador show high levels of Native American ancestry, with substantially less European ancestry, despite the relatively low Indigenous population in the country. The recently recognized Montubio ethnic group is highly admixed, with substantial contributions from all three continental ancestries. The Tsáchila show two distinct ancestry subgroups, with most individuals showing almost exclusively Native American ancestry and a smaller group showing a Mestizo-characteristic pattern. Considered together with historical data and sociological studies, our results indicate the extent to which ancestry and culture interact, often in unexpected ways, to shape ethnic identity in Ecuador.
|
28
|
Genome-Enabled Molecular Subtyping and Serotyping for Shiga Toxin-Producing Escherichia coli. FRONTIERS IN SUSTAINABLE FOOD SYSTEMS 2021. [DOI: 10.3389/fsufs.2021.752873] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/18/2023] Open
Abstract
Foodborne pathogens are a major public health burden in the United States, leading to 9.4 million illnesses annually. Since 1996, a national laboratory-based surveillance program, PulseNet, has used molecular subtyping and serotyping methods with the aim of reducing the burden of foodborne illness through early detection of emerging outbreaks. PulseNet affiliated laboratories have used pulsed-field gel electrophoresis (PFGE) and immunoassays to subtype and serotype bacterial isolates. Widespread use of serotyping and PFGE for foodborne illness surveillance over the years has resulted in the accumulation of a wealth of routine surveillance and outbreak epidemiological data. This valuable source of data has been used to understand seasonal frequency, geographic distribution, demographic information, exposure information, disease severity, and source of foodborne isolates. In 2019, PulseNet adopted whole genome sequencing (WGS) at a national scale to replace PFGE with higher-resolution methods such as core genome multilocus sequence typing. Consequently, PulseNet's recent shift to genome-based subtyping methods has rendered the vast collection of historic surveillance data associated with serogroups and PFGE patterns potentially unusable. The goal of this study was to develop a bioinformatics method to associate the WGS data that are currently used by PulseNet for bacterial pathogen subtyping to previously characterized serogroup and PFGE patterns. Previous efforts to associate WGS to PFGE patterns relied on predicting DNA molecular weight based on restriction site analysis. However, these approaches failed owing to the non-uniform usage of genomic restriction sites by PFGE restriction enzymes. We developed a machine learning approach to classify isolates to their most probable serogroup and PFGE pattern, based on comparisons of genomic k-mer signatures. 
We applied our WGS classification method to 5,970 Shiga toxin-producing Escherichia coli (STEC) isolates collected as part of PulseNet's routine foodborne surveillance activities between 2003 and 2018. Our machine learning classifier is able to associate STEC WGS to higher-level serogroups with very high accuracy and lower-level PFGE patterns with somewhat lower accuracy. Taken together, these classifications support the ability of public health investigators to associate currently generated WGS data with historical epidemiological knowledge linked to serogroups and PFGE patterns in support of outbreak surveillance for food safety and public health.
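To make the idea of comparing genomic k-mer signatures concrete, the sketch below assigns a query sequence to the reference label whose k-mer set it overlaps most, using Jaccard similarity. This is a deliberately simple nearest-reference stand-in, not the study's classifier, whose model and features are not described in the abstract; the sequences, the value of `k`, and the function names are hypothetical.

```python
def kmer_set(seq, k=4):
    """Return the set of all overlapping k-mers in a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(query, references, k=4):
    """Return (label, similarity) of the reference closest to the query.

    references: dict mapping a label (e.g. a serogroup) to a sequence.
    """
    query_kmers = kmer_set(query, k)
    best_label, best_score = None, -1.0
    for label, ref_seq in references.items():
        score = jaccard(query_kmers, kmer_set(ref_seq, k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy reference sequences keyed by serogroup label (made-up sequences)
references = {
    "O157": "ATGCGTACGTTAGC",
    "O26": "TTTTGGGGCCCCAAAA",
}
label, score = classify("ATGCGTACGTTAGG", references)
print(label, round(score, 3))
```

A set-overlap measure like this is cheap to compute on whole genomes, but it ignores k-mer abundance and gives every k-mer equal weight, which a trained classifier would typically not do.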
|
29
|
Vitamin D and socioeconomic deprivation mediate COVID-19 ethnic health disparities. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2021:2021.09.20.21263865. [PMID: 34611667 PMCID: PMC8491858 DOI: 10.1101/2021.09.20.21263865] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/04/2023]
Abstract
Ethnic minorities in developed countries suffer a disproportionately high burden of COVID-19 morbidity and mortality, and COVID-19 ethnic disparities have been attributed to social determinants of health. Vitamin D has been proposed as a modifiable risk factor that could mitigate COVID-19 health disparities. We investigated the relationship between vitamin D and COVID-19 susceptibility and severity using the UK Biobank, a large prospective cohort study of the United Kingdom population. Structural equation modelling was used to evaluate the ability of vitamin D, socioeconomic deprivation, and other known risk factors to mediate COVID-19 ethnic health disparities. Asian ethnicity is associated with higher COVID-19 susceptibility, compared to the majority White population, and Asian and Black ethnicity are both associated with higher COVID-19 severity. Socioeconomic deprivation mediates all three ethnic disparities and shows the highest overall signal of mediation for any COVID-19 risk factor. Vitamin supplements, including vitamin D, mediate the Asian disparity in COVID-19 susceptibility, and serum 25-hydroxyvitamin D (calcifediol) levels mediate Asian and Black COVID-19 severity disparities. Several measures of overall health also mediate COVID-19 ethnic disparities, underscoring the importance of comorbidities. Our results support ethnic minorities' use of vitamin D as both a prophylactic and a supplemental therapeutic for COVID-19.
|
30
|
Genomic Diversity of Azole-Resistant Aspergillus fumigatus in the United States. mBio 2021; 12:e0180321. [PMID: 34372699 PMCID: PMC8406307 DOI: 10.1128/mbio.01803-21] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/30/2021] [Accepted: 07/09/2021] [Indexed: 12/19/2022] Open
Abstract
Azole resistance in pathogenic Aspergillus fumigatus has become a global public health issue threatening the use of medical azoles. The environmentally occurring resistance mutations, TR34/L98H (TR34) and TR46/Y121F/T289A (TR46), are widespread across multiple continents and emerging in the United States. We used whole-genome single nucleotide polymorphism (SNP) analysis on 179 nationally represented clinical and environmental A. fumigatus genomes from the United States along with 18 non-U.S. genomes to evaluate the genetic diversity and foundation of the emergence of azole resistance in the United States. We demonstrated the presence of clades of A. fumigatus isolates: clade A (17%) comprised a global collection of clinical and environmental azole-resistant strains, including all strains with the TR34/L98H allele from India, The Netherlands, the United Kingdom, and the United States, and clade B (83%) consisted of isolates without this marker mainly from the United States. The TR34/L98H polymorphism was shared among azole-resistant A. fumigatus strains from India, The Netherlands, the United Kingdom, and the United States, suggesting the common origin of this resistance mechanism. Six percent of azole-resistant A. fumigatus isolates from the United States with the TR34 resistance marker had a mixture of clade A and clade B alleles, suggestive of recombination. Additionally, the presence of equal proportions of both mating types further suggests the ongoing presence of recombination. This study demonstrates the genetic background for the emergence of azole resistance in the United States, supporting a single introduction and subsequent propagation, possibly through recombination of environmentally driven resistance mutations. 
IMPORTANCE Aspergillus fumigatus is one of the most common causes of invasive mold infections in patients with immune deficiencies and has also been reported in patients with severe influenza and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Triazole drugs are the first line of therapy for this infection; however, their efficacy has been compromised by the emergence of azole resistance in A. fumigatus, which was proposed to be selected for by exposure to azole fungicides in the environment [P. E. Verweij, E. Snelders, G. H. J. Kema, E. Mellado, et al., Lancet Infect Dis 9:789-795, 2009, https://doi.org/10.1016/S1473-3099(09)70265-8]. Isolates with environmentally driven resistance mutations, TR34/L98H (TR34) and TR46/Y121F/T289A (TR46), have been reported worldwide. Here, we used genomic analysis of a large sample of resistant and susceptible A. fumigatus isolates to demonstrate a single introduction of TR34 in the United States and suggest that it spreads into the susceptible population through recombination between resistant and susceptible isolates.
|
31
|
Socioeconomic deprivation and genetic ancestry interact to modify type 2 diabetes ethnic disparities in the United Kingdom. EClinicalMedicine 2021; 37:100960. [PMID: 34386746 PMCID: PMC8343245 DOI: 10.1016/j.eclinm.2021.100960] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 02/03/2021] [Revised: 05/19/2021] [Accepted: 05/25/2021] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Type 2 diabetes (T2D) is a complex common disease that disproportionately impacts minority ethnic groups in the United Kingdom (UK). Socioeconomic deprivation (SED) is widely considered as a potential explanation for T2D ethnic disparities in the UK, whereas the effect of genetic ancestry (GA) on such disparities has yet to be studied. METHODS We leveraged data from the UK Biobank prospective cohort study, with participants enrolled between 2006 and 2010, to model the relationship between SED (Townsend index), GA (clustering principal components of whole genome genotype data), and T2D status (ICD-10 codes) across the three largest ethnic groups in the UK - Asian, Black, and White - using multivariable logistic regression. FINDINGS The Asian group shows the highest T2D prevalence (17·9%), followed by the Black (11·7%) and White (5·5%) ethnic groups. We find that both SED (OR: 1·11, 95% CI: 1·10-1·11) and non-European GA (OR South Asian versus European: 4·37, 95% CI: 4·10-4·66; OR African versus European: 2·52, 95% CI: 2·23-2·85) are significantly associated with the observed T2D disparities. GA and SED show significant interaction effects on T2D, with SED being a relatively greater risk factor for T2D for individuals with South Asian and African ancestry, compared to those with European ancestry. INTERPRETATION The significant interactions between SED and GA underscore how the effects of environmental risk factors can differ among ancestry groups, suggesting the need for group-specific interventions. FUNDING This work was supported by the National Institutes of Health (NIH) Distinguished Scholars Program (DSP) to LMR and the Division of Intramural Research (DIR) of the National Institute on Minority Health and Health Disparities (NIMHD) at NIH.
|
32
|
The Phenotypic Consequences of Genetic Divergence between Admixed Latin American Populations: Antioquia and Chocó, Colombia. Genome Biol Evol 2021; 12:1516-1527. [PMID: 32681795 PMCID: PMC7513793 DOI: 10.1093/gbe/evaa154] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 07/12/2020] [Indexed: 12/11/2022] Open
Abstract
Genome-wide association studies have uncovered thousands of genetic variants that are associated with a wide variety of human traits. Knowledge of how trait-associated variants are distributed within and between populations can provide insight into the genetic basis of group-specific phenotypic differences, particularly for health-related traits. We analyzed the genetic divergence levels for 1) individual trait-associated variants and 2) collections of variants that function together to encode polygenic traits, between two neighboring populations in Colombia that have distinct demographic profiles: Antioquia (Mestizo) and Chocó (Afro-Colombian). Genetic ancestry analysis showed 62% European, 32% Native American, and 6% African ancestry for Antioquia compared with 76% African, 10% European, and 14% Native American ancestry for Chocó, consistent with demography and previous results. Ancestry differences can confound cross-population comparison of polygenic risk scores (PRS); however, we did not find any systematic bias in PRS distributions for the two populations studied here, and population-specific differences in PRS were, for the most part, small and symmetrically distributed around zero. Both genetic differentiation at individual trait-associated single nucleotide polymorphisms and population-specific PRS differences between Antioquia and Chocó largely reflected anthropometric phenotypic differences that can be readily observed between the populations along with reported disease prevalence differences. Cases where population-specific differences in genetic risk did not align with observed trait (disease) prevalence point to the importance of environmental contributions to phenotypic variance, for both infectious and complex, common disease. The results reported here are distributed via a web-based platform for searching trait-associated variants and PRS divergence levels at http://map.chocogen.com (last accessed August 12, 2020).
|
33
|
Genomic characterization and computational phenotyping of nitrogen-fixing bacteria isolated from Colombian sugarcane fields. Sci Rep 2021; 11:9187. [PMID: 33911103 PMCID: PMC8080613 DOI: 10.1038/s41598-021-88380-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/28/2020] [Accepted: 04/07/2021] [Indexed: 01/26/2023]
Abstract
Previous studies have shown the sugarcane microbiome harbors diverse plant growth promoting microorganisms, including nitrogen-fixing bacteria (diazotrophs), which can serve as biofertilizers. The genomes of 22 diazotrophs from Colombian sugarcane fields were sequenced to investigate potential biofertilizers. A genome-enabled computational phenotyping approach was developed to prioritize sugarcane associated diazotrophs according to their potential as biofertilizers. This method selects isolates that have potential for nitrogen fixation and other plant growth promoting (PGP) phenotypes while showing low risk for virulence and antibiotic resistance. Intact nitrogenase (nif) genes and operons were found in 18 of the isolates. Isolates also encode phosphate solubilization and siderophore production operons, and other PGP genes. The majority of sugarcane isolates showed uniformly low predicted virulence and antibiotic resistance compared to clinical isolates. Six strains with the highest overall genotype scores were experimentally evaluated for nitrogen fixation, phosphate solubilization, and the production of siderophores, gibberellic acid, and indole acetic acid. Results from the biochemical assays were consistent and validated computational phenotype predictions. A genotypic and phenotypic threshold was observed that separated strains by their potential for PGP versus predicted pathogenicity. Our results indicate that computational phenotyping is a promising tool for the assessment of bacteria detected in agricultural ecosystems.
|
34
|
Absence of mgrB Alleviates Negative Growth Effects of Colistin Resistance in Enterobacter cloacae. Antibiotics (Basel) 2020; 9:antibiotics9110825. [PMID: 33227907 PMCID: PMC7699182 DOI: 10.3390/antibiotics9110825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2020] [Revised: 11/16/2020] [Accepted: 11/17/2020] [Indexed: 11/21/2022]
Abstract
Colistin is an important last-line antibiotic to treat highly resistant Enterobacter infections. Resistance to colistin has emerged among clinical isolates but has been associated with a significant growth defect. Here, we describe a clinical Enterobacter isolate with a deletion of mgrB, a regulator of colistin resistance, leading to high-level resistance in the absence of a growth defect. The identification of a path to resistance unrestrained by growth defects suggests colistin resistance could become more common in Enterobacter.
|
35
|
Population structure and pharmacogenomic risk stratification in the United States. BMC Biol 2020; 18:140. [PMID: 33050895 PMCID: PMC7557099 DOI: 10.1186/s12915-020-00875-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/17/2020] [Accepted: 09/22/2020] [Indexed: 01/01/2023]
Abstract
BACKGROUND Pharmacogenomic (PGx) variants mediate how individuals respond to medication, and response differences among racial/ethnic groups have been attributed to patterns of PGx diversity. We hypothesized that genetic ancestry (GA) would provide higher resolution for stratifying PGx risk, since it serves as a more reliable surrogate for genetic diversity than self-identified race/ethnicity (SIRE), which includes a substantial social component. We analyzed a cohort of 8628 individuals from the United States (US), for whom we had both SIRE information and whole genome genotypes, with a focus on the three largest SIRE groups in the US: White, Black (African-American), and Hispanic (Latino). Our approach to the question of PGx risk stratification entailed the integration of two distinct methodologies: population genetics and evidence-based medicine. This integrated approach allowed us to consider the clinical implications for the observed patterns of PGx variation found within and between population groups. RESULTS Whole genome genotypes were used to characterize individuals' continental ancestry fractions (European, African, and Native American), and individuals were grouped according to their GA profiles. SIRE and GA groups were found to be highly concordant. Continental ancestry predicts individuals' SIRE with >96% accuracy, and accordingly, GA provides only a marginal increase in resolution for PGx risk stratification. In light of the concordance between SIRE and GA, taken together with the fact that information on SIRE is readily available to clinicians, we evaluated PGx variation between SIRE groups to explore the potential clinical utility of race and ethnicity. PGx variants are highly diverged compared to the genomic background; 82 variants show significant frequency differences among SIRE groups, and genome-wide patterns of PGx variation are almost entirely concordant with SIRE.
The vast majority of PGx variation is found within rather than between groups, a well-established fact for almost all genetic variants, which is often taken to argue against the clinical utility of population stratification. Nevertheless, analysis of highly differentiated PGx variants illustrates how SIRE partitions PGx variation based on groups' characteristic ancestry patterns. These cases underscore the extent to which SIRE carries clinically valuable information for stratifying PGx risk among populations, albeit with less utility for predicting individual-level PGx alleles (genotypes), supporting the concept of population pharmacogenomics. CONCLUSIONS Perhaps most interestingly, we show that individuals who identify as Black or Hispanic stand to gain far more from the consideration of race/ethnicity in treatment decisions than individuals from the majority White population.
|
36
|
STing: accurate and ultrafast genomic profiling with exact sequence matches. Nucleic Acids Res 2020; 48:7681-7689. [PMID: 32619234 PMCID: PMC7430640 DOI: 10.1093/nar/gkaa566] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/13/2020] [Revised: 06/16/2020] [Accepted: 07/01/2020] [Indexed: 11/30/2022]
Abstract
Genome-enabled approaches to molecular epidemiology have become essential to public health agencies and the microbial research community. We developed the algorithm STing to provide turn-key solutions for molecular typing and gene detection directly from next generation sequence data of microbial pathogens. Our implementation of STing uses an innovative k-mer search strategy that eliminates the computational overhead associated with the time-consuming steps of quality control, assembly, and alignment, required by more traditional methods. We compared STing to six of the most widely used programs for genome-based molecular typing and demonstrate its ease of use, accuracy, speed and efficiency. STing shows superior accuracy and performance for standard multilocus sequence typing schemes, along with larger genome-scale typing schemes, and it enables rapid automated detection of antimicrobial resistance and virulence factor genes. STing determines the sequence type of traditional 7-gene MLST with 100% accuracy in less than 10 seconds per isolate. We hope that the adoption of STing will help to democratize microbial genomics and thereby maximize its benefit for public health.
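To make the idea of typing from exact k-mer matches concrete, here is a toy sketch that assigns an allele to a locus by counting which allele's k-mers are seen in raw reads, with no assembly or alignment step. It is not the STing implementation: the function names, the allele labels, the short sequences and the choice of k are all hypothetical, and reverse-complement reads and sequencing errors are ignored.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(alleles: dict, k: int) -> dict:
    """Map each k-mer to the set of allele names that contain it."""
    index = defaultdict(set)
    for name, seq in alleles.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def call_allele(reads: list, alleles: dict, k: int = 5) -> str:
    """Return the allele whose k-mers are most completely covered by the reads."""
    index = build_index(alleles, k)
    seen = defaultdict(set)  # allele name -> its k-mers observed in any read
    for read in reads:
        for kmer in kmers(read, k):
            for name in index.get(kmer, ()):
                seen[name].add(kmer)

    def coverage(name: str) -> float:
        total = len(set(kmers(alleles[name], k)))
        return len(seen[name]) / total if total else 0.0

    # sorted() makes tie-breaking deterministic (alphabetical by allele name).
    return max(sorted(alleles), key=coverage)


if __name__ == "__main__":
    # Hypothetical alleles of one locus that differ at a single base.
    alleles = {"locus_1": "ACGTACGTTAGC", "locus_2": "ACGTACCTTAGC"}
    reads = ["ACGTACGTT", "ACGTTAGC"]
    print(call_allele(reads, alleles))
```

Because the lookup is a hash-table membership test per k-mer, the cost grows roughly linearly with the number of read bases, which is the intuition behind skipping assembly and alignment.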
|
37
|
Abstract 2115: An atlas of transposable element derived alternative splicing in cancer. Cancer Res 2020. [DOI: 10.1158/1538-7445.am2020-2115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Transposable element (TE) derived sequences comprise more than half of the human genome, and their presence has been documented to alter gene expression in a number of different ways, including the generation of alternatively spliced transcript isoforms. Alternative splicing has been associated with tumorigenesis for a number of different cancers. The objective of this study was to broadly characterize the role of human TEs in generating alternatively spliced transcript isoforms in cancer. To do so, we screened for the presence of TE-derived sequences co-located with alternative splice sites that are differentially utilized in normal versus cancer tissues. We analyzed a comprehensive set of alternative splice variants characterized for 614 matched normal-tumor tissue pairs across 13 cancer types, resulting in the discovery of 4,820 TE-generated alternative splice events distributed among 723 cancer-associated genes. SINEs (Alu) and LINEs (L1) were found to contribute the majority of TE-generated alternative splice sites in cancer genes. A number of cancer-associated genes - including MYH11, WHSC1, and CANT1 - were shown to have overexpressed TE-derived isoforms across a range of cancer types. TE-derived isoforms were also linked to cancer-specific fusion transcripts, suggesting a novel mechanism for the generation of transcriptome diversity via trans-splicing mediated by dispersed TE repeats.
Citation Format: Evan A. Clayton, Lavanya Rishishwar, Tzu-Chuan Huang, Saurabh Gulati, Dongjo Ban, John F. McDonald, I. King Jordan. An atlas of transposable element derived alternative splicing in cancer [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 2115.
|
38
|
Ancestry effects on type 2 diabetes genetic risk inference in Hispanic/Latino populations. BMC Med Genet 2020; 21:132. [PMID: 32580712 PMCID: PMC7315475 DOI: 10.1186/s12881-020-01068-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/04/2020] [Accepted: 06/10/2020] [Indexed: 12/18/2022]
Abstract
Background Hispanic/Latino (HL) populations bear a disproportionately high burden of type 2 diabetes (T2D). The ability to predict T2D genetic risk using polygenic risk scores (PRS) offers great promise for improved screening and prevention. However, there are a number of complications related to the accurate inference of genetic risk across HL populations with distinct ancestry profiles. We investigated how ancestry affects the inference of T2D genetic risk using PRS in diverse HL populations from Colombia and the United States (US). In Colombia, we compared T2D genetic risk for the Mestizo population of Antioquia to the Afro-Colombian population of Chocó, and in the US, we compared European-American versus Mexican-American populations. Methods Whole genome sequences and genotypes from the 1000 Genomes Project and the ChocoGen Research Project were used for genetic ancestry inference and for T2D polygenic risk score (PRS) calculation. Continental ancestry fractions for HL genomes were inferred via comparison with African, European, and Native American reference genomes, and PRS were calculated using T2D risk variants taken from multiple genome-wide association studies (GWAS) conducted on cohorts with diverse ancestries. A correction for ancestry bias in T2D risk inference based on the frequencies of ancestral versus derived alleles was developed and applied to PRS calculations in the HL populations studied here. Results T2D genetic risk in Colombian and US HL populations is positively correlated with African and Native American ancestry and negatively correlated with European ancestry. The Afro-Colombian population of Chocó has higher predicted T2D risk than Antioquia, and the Mexican-American population has higher predicted risk than the European-American population. The inferred relative risk of T2D is robust to differences in the ancestry of the GWAS cohorts used for variant discovery. 
For trans-ethnic GWAS, population-specific variants and variants with same direction effects across populations yield consistent results. Nevertheless, the control for bias in T2D risk prediction confirms that explicit consideration of genetic ancestry can yield more reliable cross-population genetic risk inferences. Conclusions T2D associations that replicate across populations provide for more reliable risk inference, and modeling population-specific frequencies of ancestral and derived risk alleles can help control for biases in PRS estimation.
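The abstract does not spell out the scoring formula, so the following is only a minimal sketch of a polygenic risk score under stated assumptions: each variant contributes its risk-allele dosage (0, 1 or 2), optional positive weights stand in for GWAS effect sizes, and the score is scaled to lie between 0 and 1. The variant identifiers and weights are hypothetical, and the ancestral-versus-derived allele correction described in the abstract is not implemented here.

```python
def polygenic_risk_score(dosages: dict, weights: dict = None) -> float:
    """Normalized weighted sum of risk-allele dosages.

    dosages: variant id -> number of risk alleles carried (0, 1 or 2).
    weights: variant id -> positive weight (assumed, e.g. an effect size);
             if omitted, every variant is weighted equally.
    Returns a value in [0, 1]: 0 means no risk alleles, 1 means homozygous
    for the risk allele at every scored variant.
    """
    if not dosages:
        raise ValueError("at least one variant is required")
    if weights is None:
        weights = {variant: 1.0 for variant in dosages}

    for variant, dosage in dosages.items():
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {variant} must be 0, 1 or 2")
        if weights[variant] <= 0:
            raise ValueError(f"weight for {variant} must be positive")

    weighted_sum = sum(weights[v] * d for v, d in dosages.items())
    max_possible = 2 * sum(weights[v] for v in dosages)
    return weighted_sum / max_possible


if __name__ == "__main__":
    # Hypothetical genotype at three risk variants.
    genotype = {"variant_a": 2, "variant_b": 0, "variant_c": 1}
    print(polygenic_risk_score(genotype))
```

Comparing the distribution of such scores between populations is where ancestry-related bias can enter, since risk-allele frequencies and effect-size estimates depend on the cohorts used for discovery.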
|
39
|
An atlas of transposable element-derived alternative splicing in cancer. Philos Trans R Soc Lond B Biol Sci 2020; 375:20190342. [PMID: 32075558 PMCID: PMC7061986 DOI: 10.1098/rstb.2019.0342] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Accepted: 11/06/2019] [Indexed: 12/18/2022]
Abstract
Transposable element (TE)-derived sequences comprise more than half of the human genome, and their presence has been documented to alter gene expression in a number of different ways, including the generation of alternatively spliced transcript isoforms. Alternative splicing has been associated with tumorigenesis for a number of different cancers. The objective of this study was to broadly characterize the role of human TEs in generating alternatively spliced transcript isoforms in cancer. To do so, we screened for the presence of TE-derived sequences co-located with alternative splice sites that are differentially used in normal versus cancer tissues. We analysed a comprehensive set of alternative splice variants characterized for 614 matched normal-tumour tissue pairs across 13 cancer types, resulting in the discovery of 4820 TE-generated alternative splice events distributed among 723 cancer-associated genes. Short interspersed nuclear elements (Alu) and long interspersed nuclear elements (L1) were found to contribute the majority of TE-generated alternative splice sites in cancer genes. A number of cancer-associated genes, including MYH11, WHSC1 and CANT1, were shown to have overexpressed TE-derived isoforms across a range of cancer types. TE-derived isoforms were also linked to cancer-specific fusion transcripts, suggesting a novel mechanism for the generation of transcriptome diversity via trans-splicing mediated by dispersed TE repeats. This article is part of a discussion meeting issue 'Crossroads between transposons and gene regulation'.
|
40
|
Admixture-enabled selection for rapid adaptive evolution in the Americas. Genome Biol 2020; 21:29. [PMID: 32028992 PMCID: PMC7006128 DOI: 10.1186/s13059-020-1946-2] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 09/25/2019] [Accepted: 01/24/2020] [Indexed: 02/08/2023]
Abstract
Background Admixture occurs when previously isolated populations come together and exchange genetic material. We hypothesize that admixture can enable rapid adaptive evolution in human populations by introducing novel genetic variants (haplotypes) at intermediate frequencies, and we test this hypothesis through the analysis of whole genome sequences sampled from admixed Latin American populations in Colombia, Mexico, Peru, and Puerto Rico. Results Our screen for admixture-enabled selection relies on the identification of loci that contain more or less ancestry from a given source population than would be expected given the genome-wide ancestry frequencies. We employ a combined evidence approach to evaluate levels of ancestry enrichment at single loci across multiple populations and multiple loci that function together to encode polygenic traits. We find cross-population signals of African ancestry enrichment at the major histocompatibility locus on chromosome 6, consistent with admixture-enabled selection for enhanced adaptive immune response. Several of the human leukocyte antigen genes at this locus, such as HLA-A, HLA-DRB51, and HLA-DRB5, show independent evidence of positive selection prior to admixture, based on extended haplotype homozygosity in African populations. A number of traits related to inflammation, blood metabolites, and both the innate and adaptive immune system show evidence of admixture-enabled polygenic selection in Latin American populations. Conclusions The results reported here, considered together with the ubiquity of admixture in human evolution, suggest that admixture serves as a fundamental mechanism that drives rapid adaptive evolution in human populations.
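The screen described above compares the ancestry observed at a locus with the genome-wide expectation. One simple way to express that comparison, offered here as an assumption rather than as the authors' exact statistic, is a z-score of each locus's ancestry fraction against the mean and standard deviation across all loci. The locus names and fractions below are hypothetical.

```python
from statistics import mean, pstdev


def ancestry_enrichment(local_fractions: dict) -> dict:
    """Z-score of per-locus ancestry fraction relative to the genome-wide average.

    local_fractions: locus name -> fraction of haplotypes at that locus assigned
    to one source population (for example, African ancestry), between 0 and 1.
    Positive scores mean more of that ancestry than expected; negative, less.
    """
    if not local_fractions:
        raise ValueError("at least one locus is required")
    values = list(local_fractions.values())
    genome_wide_mean = mean(values)
    spread = pstdev(values)  # population standard deviation across loci
    if spread == 0:
        return {locus: 0.0 for locus in local_fractions}
    return {
        locus: (fraction - genome_wide_mean) / spread
        for locus, fraction in local_fractions.items()
    }


if __name__ == "__main__":
    # Hypothetical local ancestry fractions for one source population.
    fractions = {"locus_1": 0.1, "locus_2": 0.1, "locus_3": 0.1, "locus_4": 0.5}
    scores = ancestry_enrichment(fractions)
    print(max(scores, key=scores.get))
```

A combined-evidence approach, as mentioned in the abstract, would then look for loci whose scores are extreme in the same direction across several populations rather than relying on one population alone.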
|
41
|
Tumor suppressor genes and allele-specific expression: mechanisms and significance. Oncotarget 2020; 11:462-479. [PMID: 32064050 PMCID: PMC6996918 DOI: 10.18632/oncotarget.27468] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/21/2019] [Accepted: 01/13/2020] [Indexed: 12/12/2022]
Abstract
Recent findings indicate that allele-specific expression (ASE) at specific cancer driver gene loci may be of importance in onset/progression of the disease. Of particular interest are loss-of-function (LOF) alleles of tumor suppressor genes (TSGs). While LOF tumor suppressor mutations are typically considered to be recessive, if these mutant alleles can be significantly differentially expressed relative to wild-type alleles in heterozygotes, the clinical consequences could be significant. LOF TSG alleles are shown to be segregating at high frequencies in world-wide populations of normal/healthy individuals. Matched sets of normal and tumor tissues isolated from 233 cancer patients representing four diverse tumor types demonstrate functionally important changes in patterns of ASE in individuals heterozygous for LOF TSG alleles associated with cancer onset/progression. While a variety of molecular mechanisms were identified as potentially contributing to changes in ASE patterns in cancer, changes in DNA copy number and allele-specific alternative splicing possibly mediated by antisense RNA emerged as predominant factors. In conclusion, LOF TSGs are segregating in human populations at significant frequencies, indicating that many otherwise healthy individuals are at elevated risk of developing cancer. Changes in ASE between normal and cancer tissues indicate that LOF TSG alleles may contribute to cancer onset/progression even when heterozygous with wild-type functional alleles.
|
42
|
Native American admixture recapitulates population-specific migration and settlement of the continental United States. PLoS Genet 2019; 15:e1008225. [PMID: 31545791 PMCID: PMC6756731 DOI: 10.1371/journal.pgen.1008225] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 09/20/2018] [Accepted: 05/31/2019] [Indexed: 11/19/2022]
Abstract
European and African descendants settled the continental US during the 17th-19th centuries, coming into contact with established Native American populations. The resulting admixture among these groups yielded a significant reservoir of Native American ancestry in the modern US population. We analyzed the patterns of Native American admixture seen for the three largest genetic ancestry groups in the US population: African descendants, Western European descendants, and Spanish descendants. The three groups show distinct Native American ancestry profiles, which are indicative of their historical patterns of migration and settlement across the country. Native American ancestry in the modern African descendant population does not coincide with local geography, instead forming a single group with origins in the southeastern US, consistent with the Great Migration of the early 20th century. Western European descendants show Native American ancestry that tracks their geographic origins across the US, indicative of ongoing contact during westward expansion, and Native American ancestry can resolve Spanish descendant individuals into distinct local groups formed by more recent migration from Mexico and Puerto Rico. We found an anomalous pattern of Native American ancestry from the US southwest, which most likely corresponds to the Nuevomexicano descendants of early Spanish settlers to the region. We addressed a number of controversies surrounding this population, including the extent of Sephardic Jewish ancestry. Nuevomexicanos are less admixed than nearby Mexican-American individuals, with more European and less Native American and African ancestry, and while they do show demonstrable Sephardic Jewish ancestry, the fraction is no greater than seen for other New World Spanish descendant populations.
|
43
|
GlobAl Distribution of GEnetic Traits (GADGET) web server: polygenic trait scores worldwide. Nucleic Acids Res 2019; 46:W121-W126. [PMID: 29788182 PMCID: PMC6031022 DOI: 10.1093/nar/gky415] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/31/2018] [Accepted: 05/03/2018] [Indexed: 11/14/2022]
Abstract
Human populations from around the world show striking phenotypic variation across a wide variety of traits. Genome-wide association studies (GWAS) are used to uncover genetic variants that influence the expression of heritable human traits; accordingly, population-specific distributions of GWAS-implicated variants may shed light on the genetic basis of human phenotypic diversity. With this in mind, we developed the GlobAl Distribution of GEnetic Traits web server (GADGET http://gadget.biosci.gatech.edu). The GADGET web server provides users with a dynamic visual platform for exploring the relationship between worldwide genetic diversity and the genetic architecture underlying numerous human phenotypes. GADGET integrates trait-implicated single nucleotide polymorphisms (SNPs) from GWAS, with population genetic data from the 1000 Genomes Project, to calculate genome-wide polygenic trait scores (PTS) for 818 phenotypes in 2504 individual genomes. Population-specific distributions of PTS are shown for 26 human populations across 5 continental population groups, with traits ordered based on the extent of variation observed among populations. Users of GADGET can also upload custom trait SNP sets to visualize global PTS distributions for their own traits of interest.
|
44
|
Analysis of Vibrio cholerae genomes identifies new type VI secretion system gene clusters. Genome Biol 2019; 20:163. [PMID: 31405375 PMCID: PMC6691524 DOI: 10.1186/s13059-019-1765-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 01/03/2019] [Accepted: 07/18/2019] [Indexed: 12/19/2022]
Abstract
BACKGROUND Like many bacteria, Vibrio cholerae deploys a harpoon-like type VI secretion system (T6SS) to compete against other microbes in environmental and host settings. The T6SS punctures adjacent cells and delivers toxic effector proteins that are harmless to bacteria carrying cognate immunity factors. Only four effector/immunity pairs encoded on one large and three auxiliary gene clusters have been characterized from largely clonal, patient-derived strains of V. cholerae. RESULTS We sequence two dozen V. cholerae strain genomes from diverse sources and develop a novel and adaptable bioinformatics tool based on hidden Markov models. We identify two new T6SS auxiliary gene clusters and describe Aux 5 here. Four Aux 5 loci are present in the host strain, each with an atypical effector/immunity gene organization. Structural prediction of the putative effector indicates it is a lipase, which we name TleV1 (type VI lipase effector Vibrio). Ectopic TleV1 expression induces toxicity in Escherichia coli, which is rescued by co-expression of the TliV1a immunity factor. A clinical V. cholerae reference strain expressing the Aux 5 cluster uses TleV1 to lyse its parental strain upon contact via its T6SS but is unable to kill parental cells expressing the TliV1a immunity factor. CONCLUSION We develop a novel bioinformatics method and identify new T6SS gene clusters in V. cholerae. We also show the TleV1 toxin is delivered in a T6SS manner by V. cholerae and can lyse other bacterial cells. Our web-based tool can be modified to identify additional novel T6SS genomic loci in diverse bacterial species.
|
45
|
Assortative Mating on Ancestry-Variant Traits in Admixed Latin American Populations. Front Genet 2019; 10:359. [PMID: 31105740 PMCID: PMC6491930 DOI: 10.3389/fgene.2019.00359] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/25/2018] [Accepted: 04/04/2019] [Indexed: 11/13/2022]
Abstract
Assortative mating is a universal feature of human societies, and individuals from ethnically diverse populations are known to mate assortatively based on similarities in genetic ancestry. However, little is currently known regarding the exact phenotypic cues, or their underlying genetic architecture, which inform ancestry-based assortative mating. We developed a novel approach, using genome-wide analysis of ancestry-specific haplotypes, to evaluate ancestry-based assortative mating on traits whose expression varies among the three continental population groups – African, European, and Native American – that admixed to form modern Latin American populations. Application of this method to genome sequences sampled from Colombia, Mexico, Peru, and Puerto Rico revealed widespread ancestry-based assortative mating. We discovered a number of anthropometric traits (body mass, height, and facial development) and neurological attributes (educational attainment and schizophrenia) that serve as phenotypic cues for ancestry-based assortative mating. Major histocompatibility complex (MHC) loci show population-specific patterns of both assortative and disassortative mating in Latin America. Ancestry-based assortative mating in the populations analyzed here appears to be driven primarily by African ancestry. This study serves as an example of how population genomic analyses can yield novel insights into human behavior.
|
46
|
Population Pharmacogenomics for Precision Public Health in Colombia. Front Genet 2019; 10:241. [PMID: 30967898 PMCID: PMC6439339 DOI: 10.3389/fgene.2019.00241] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 11/17/2018] [Accepted: 03/04/2019] [Indexed: 11/13/2022]
Abstract
While genomic approaches to precision medicine hold great promise, they remain prohibitively expensive for developing countries. The precision public health paradigm, whereby healthcare decisions are made at the level of populations as opposed to individuals, provides one way for the genomics revolution to directly impact health outcomes in the developing world. Genomic approaches to precision public health require a deep understanding of local population genomics, which is still missing for many developing countries. We are investigating the population genomics of genetic variants that mediate drug response in an effort to inform healthcare decisions in Colombia. Our work focuses on two neighboring populations with distinct ancestry profiles: Antioquia and Chocó. Antioquia has primarily European genetic ancestry followed by Native American and African components, whereas Chocó shows mainly African ancestry with lower levels of Native American and European admixture. We performed a survey of the global distribution of pharmacogenomic variants followed by a more focused study of pharmacogenomic allele frequency differences between the two Colombian populations. Worldwide, we found pharmacogenomic variants to have both unusually high minor allele frequencies and high levels of population differentiation. A number of these pharmacogenomic variants also show anomalous effect allele frequencies within and between the two Colombian populations, and these differences were found to be associated with their distinct genetic ancestry profiles. For example, the C allele of the single nucleotide polymorphism (SNP) rs4149056 [Solute Carrier Organic Anion Transporter Family Member 1B1 (SLCO1B1)*5], which is associated with an increased risk of toxicity to a commonly prescribed statin, is found at relatively high frequency in Antioquia and is associated with European ancestry.
In addition to pharmacogenomic alleles related to increased toxicity risk, we also have evidence that alleles related to dosage and metabolism have large frequency differences between the two populations, which are associated with their specific ancestries. Using these findings, we have developed and validated an inexpensive allele-specific PCR assay to test for the presence of such population-enriched pharmacogenomic SNPs in Colombia. These results serve as an example of how population-centered approaches to pharmacogenomics can help to realize the promise of precision medicine in resource-limited settings.
|
47
|
Whole-Genome Sequences of Staphylococcus aureus Isolates from Cystic Fibrosis Lung Infections. Microbiol Resour Announc 2019; 8:e01564-18. [PMID: 30687841 PMCID: PMC6346173 DOI: 10.1128/mra.01564-18] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/28/2018] [Accepted: 12/12/2018] [Indexed: 02/05/2023]
Abstract
Staphylococcus aureus is an early colonizer in the lungs of individuals with cystic fibrosis (CF), but surprisingly, only a limited number of genomes from CF-associated S. aureus isolates have been sequenced. Here, we present the whole-genome sequences of 65 S. aureus isolates obtained from 50 individuals with CF.
|
48
|
Abstract
BACKGROUND: Modern Latin American populations were formed via genetic admixture among ancestral source populations from Africa, the Americas, and Europe. We are interested in studying how combinations of genetic ancestry in admixed Latin American populations may impact genomic determinants of health and disease. For this study, we characterized the impact of ancestry and admixture on genetic variants that underlie health- and disease-related phenotypes in population genomic samples from Colombia, Mexico, Peru, and Puerto Rico.
RESULTS: We analyzed a total of 347 admixed Latin American genomes along with 1102 putative ancestral source genomes from Africans, Europeans, and Native Americans. We characterized the genetic ancestry, relatedness, and admixture patterns for each of the admixed Latin American genomes, finding a spectrum of ancestry proportions within and between populations. We then identified single nucleotide polymorphisms (SNPs) with anomalous ancestry-enrichment patterns, i.e., SNPs that exist in any given Latin American population at a higher frequency than expected based on the population's genetic ancestry profile. For this set of ancestry-enriched SNPs, we inspected their phenotypic impact on disease, metabolism, and the immune system. All four of the Latin American populations show ancestry-enrichment for a number of shared pathways, yielding evidence of similar selection pressures on these populations during their evolution. For example, all four populations show ancestry-enriched SNPs in multiple genes from immune system pathways, such as the cytokine receptor interaction, T cell receptor signaling, and antigen presentation pathways. We also found SNPs with excess African or European ancestry that are associated with ancestry-specific gene expression patterns and play crucial roles in the immune system and infectious disease responses. Genes from both the innate and adaptive immune system were found to be regulated by ancestry-enriched SNPs with population-specific regulatory effects.
CONCLUSIONS: Ancestry-enriched SNPs in Latin American populations have a substantial effect on health- and disease-related phenotypes. The concordant impact observed for the same phenotypes across populations points to a process of adaptive introgression, whereby ancestry-enriched SNPs with specific functional utility appear to have been retained in modern populations by virtue of their effects on health and fitness.
|
49
|
Evidence for positive selection on recent human transposable element insertions. Gene 2018; 675:69-79. [DOI: 10.1016/j.gene.2018.06.077] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 06/20/2018] [Accepted: 06/24/2018] [Indexed: 11/29/2022]
|
50
|
Benchmarking computational tools for polymorphic transposable element detection. Brief Bioinform 2018; 18:908-918. [PMID: 27524380 DOI: 10.1093/bib/bbw072] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 04/14/2016] [Indexed: 12/19/2022]
Abstract
Transposable elements (TEs) are an important source of human genetic variation with demonstrable effects on phenotype. Recently, a number of computational methods for the detection of polymorphic TE (polyTE) insertion sites from next-generation sequence data have been developed. The use of such tools will become increasingly important as the pace of human genome sequencing accelerates. For this report, we performed a comparative benchmarking and validation analysis of polyTE detection tools in an effort to inform their selection and use by the TE research community. We analyzed a core set of seven tools with respect to ease of use and accessibility, polyTE detection performance and runtime parameters. An experimentally validated set of 893 human polyTE insertions was used for this purpose, along with a series of simulated data sets that allowed us to assess the impact of sequence coverage on tool performance. The recently developed tool MELT showed the best overall performance followed by Mobster and then RetroSeq. PolyTE detection tools can best detect Alu insertion events in the human genome with reduced reliability for L1 insertions and substantially lowered performance for SVA insertions. We also show evidence that different polyTE detection tools are complementary with respect to their ability to detect a complete set of insertion events. Accordingly, a combined approach, coupled with manual inspection of individual results, may yield the best overall performance. In addition to the benchmarking results, we also provide notes on tool installation and usage as well as suggestions for future polyTE detection algorithm development.
|