1. Islam SJ, Kim JH, Topel M, Liu C, Ko YA, Mujahid MS, Sims M, Mubasher M, Ejaz K, Morgan-Billingslea J, Jones K, Waller EK, Jones D, Uppal K, Dunbar SB, Pemu P, Vaccarino V, Searles CD, Baltrus P, Lewis TT, Quyyumi AA, Taylor H. Cardiovascular Risk and Resilience Among Black Adults: Rationale and Design of the MECA Study. J Am Heart Assoc 2020; 9:e015247. [PMID: 32340530 PMCID: PMC7428584 DOI: 10.1161/jaha.119.015247] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/19/2022]
Abstract
Background Cardiovascular disease incidence, prevalence, morbidity, and mortality have declined in the past several decades; however, disparities persist among subsets of the population. Notably, blacks have not experienced the same improvements on the whole as whites. Furthermore, frequent reports of relatively poorer health statistics among the black population have led to a broad assumption that black race reliably predicts relatively poorer health outcomes. However, substantial intraethnic and intraracial heterogeneity exists; moreover, individuals with similar risk factors and environmental exposures are often known to experience vastly different cardiovascular health outcomes. Thus, some individuals have good outcomes even in the presence of cardiovascular risk factors, a concept known as resilience. Methods and Results The MECA (Morehouse‐Emory Center for Health Equity) Study was designed to investigate the multilevel exposures that contribute to “resilience” in the face of risk for poor cardiovascular health among blacks in the greater Atlanta, GA, metropolitan area. We used census tract data to determine “at‐risk” and “resilient” neighborhoods with high or low prevalence of cardiovascular morbidity and mortality, based on cardiovascular death, hospitalization, and emergency department visits for blacks. More than 1400 individuals from these census tracts assented to demographic, health, and psychosocial questionnaires administered through telephone surveys. Afterwards, ≈500 individuals were recruited to enroll in a clinical study, where risk biomarkers, such as oxidative stress, and inflammatory markers, endothelial progenitor cells, metabolomic and microRNA profiles, and subclinical vascular dysfunction were measured. In addition, comprehensive behavioral questionnaires were collected and ideal cardiovascular health metrics were assessed using the American Heart Association's Life Simple 7 measure. Last, 150 individuals with low Life Simple 7 were recruited and randomized to a behavioral mobile health (eHealth) plus health coach or eHealth only intervention and followed up for improvement. Conclusions The MECA Study is investigating socioenvironmental and individual behavioral measures that promote resilience to cardiovascular disease in blacks by assessing biological, functional, and molecular mechanisms. REGISTRATION URL: https://www.clinicaltrials.gov. Unique identifier: NCT03308812.
Affiliation(s)
- Shabatun J Islam: Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, GA
- Jeong Hwan Kim: Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, GA
- Matthew Topel: Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, GA
- Chang Liu: Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, GA; Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA
- Yi-An Ko: Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA
- Mahasin S Mujahid: Division of Epidemiology, School of Public Health, University of California, Berkeley, CA
- Mario Sims: Department of Medicine, University of Mississippi Medical Center, Jackson, MS
- Mohamed Mubasher: Department of Community Health and Preventive Medicine, Morehouse School of Medicine, Atlanta, GA
- Kiran Ejaz: Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, GA
- Jan Morgan-Billingslea: Department of Community Health and Preventive Medicine, Morehouse School of Medicine, Atlanta, GA
- Kia Jones: Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, GA
- Edmund K Waller: Department of Hematology and Oncology, Winship Cancer Institute, Emory University School of Medicine, Atlanta, GA
- Dean Jones: Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Department of Medicine, Emory University School of Medicine, Atlanta, GA
- Karan Uppal: Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Department of Medicine, Emory University School of Medicine, Atlanta, GA
- Sandra B Dunbar: Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GA
- Priscilla Pemu: Department of Medicine, Morehouse School of Medicine, Atlanta, GA
- Viola Vaccarino: Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, GA; Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA
- Charles D Searles: Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, GA
- Peter Baltrus: Department of Community Health and Preventive Medicine, Morehouse School of Medicine, Atlanta, GA; National Center for Primary Care, Morehouse School of Medicine, Atlanta, GA
- Tené T Lewis: Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA
- Arshed A Quyyumi: Division of Cardiology, Department of Medicine, Emory University School of Medicine, Atlanta, GA
- Herman Taylor: Department of Medicine, Morehouse School of Medicine, Atlanta, GA
2. Quinn TP, Erb I, Richardson MF, Crowley TM. Understanding sequencing data as compositions: an outlook and review. Bioinformatics 2019; 34:2870-2878. [PMID: 29608657 PMCID: PMC6084572 DOI: 10.1093/bioinformatics/bty175] [Citation(s) in RCA: 148] [Impact Index Per Article: 29.6] [Received: 10/24/2017] [Accepted: 03/26/2018] [Indexed: 12/30/2022] [Open access]
Abstract
Motivation Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models. Results The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study. Supplementary information Supplementary data are available at Bioinformatics online.
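The relative nature of sequencing counts described in this abstract can be illustrated with a log-ratio transformation. The sketch below uses the centered log-ratio (CLR), a transformation commonly used in CoDA; the abstract itself does not name it, so treat this as an illustration under that assumption rather than the review's own method. The function name `clr`, the `pseudocount` parameter, and the sample counts are all hypothetical.

```python
import math

def clr(counts, pseudocount=0.0):
    """Centered log-ratio: log of each component minus the mean log.

    `pseudocount` is an illustrative assumption for handling zeros;
    the abstract does not prescribe any zero-handling strategy.
    """
    shifted = [c + pseudocount for c in counts]
    if any(v <= 0 for v in shifted):
        raise ValueError("all components must be positive after the pseudocount")
    logs = [math.log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Two hypothetical samples with identical proportions but different library sizes.
shallow = [10, 20, 70]
deep = [100, 200, 700]
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(clr(shallow), clr(deep)))
```

Here `same` is true because multiplying every count by a constant (a change in library size) cancels out in the log-ratio, which is the sense in which only relative abundances carry information.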
Affiliation(s)
- Thomas P Quinn: Bioinformatics Core Research Group, Deakin University, Geelong, Australia
- Ionas Erb: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Mark F Richardson: Bioinformatics Core Research Group, Deakin University, Geelong, Australia; Centre for Integrative Ecology, School of Life and Environmental Sciences, Deakin University, Geelong, Australia
- Tamsyn M Crowley: Bioinformatics Core Research Group, Deakin University, Geelong, Australia; Poultry Hub Australia, University of New England, Armidale, Australia
3. Rahmatallah Y, Zybailov B, Emmert-Streib F, Glazko G. GSAR: Bioconductor package for Gene Set analysis in R. BMC Bioinformatics 2017; 18:61. [PMID: 28118818 PMCID: PMC5259853 DOI: 10.1186/s12859-017-1482-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 07/29/2016] [Accepted: 01/10/2017] [Indexed: 01/01/2023] [Open access]
Abstract
BACKGROUND Gene set analysis (in the form of functionally related genes or pathways) has become the method of choice for analyzing omics data in general and gene expression data in particular. There are many statistical methods that either summarize gene-level statistics for a gene set or apply a multivariate statistic that accounts for intergene correlations. Most available methods detect complex departures from the null hypothesis but lack the ability to identify the specific alternative hypothesis that rejects the null. RESULTS GSAR (Gene Set Analysis in R) is an open-source R/Bioconductor software package for gene set analysis (GSA). It implements self-contained multivariate non-parametric statistical methods testing a complex null hypothesis against specific alternatives, such as differences in mean (shift), variance (scale), or net correlation structure. The package also provides a graphical visualization tool, based on the union of two minimum spanning trees, for correlation networks to examine the change in the correlation structures of a gene set between two conditions and highlight influential genes (hubs). CONCLUSIONS Package GSAR provides a set of multivariate non-parametric statistical methods that test a complex null hypothesis against specific alternatives. The methods in package GSAR are applicable to any type of omics data that can be represented in a matrix format. The package, with detailed instructions and examples, is freely available under the GPL (>= 2) license from the Bioconductor web site.
Affiliation(s)
- Yasir Rahmatallah: Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Boris Zybailov: Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
- Frank Emmert-Streib: Computational Medicine and Statistical Learning Laboratory, Tampere University of Technology, Korkeakoulunkatu 1, Tampere, FI-33720, Finland
- Galina Glazko: Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, 72205, USA
4. K-Profiles: A Nonlinear Clustering Method for Pattern Detection in High Dimensional Data. BioMed Research International 2015; 2015:918954. [PMID: 26339652 PMCID: PMC4538770 DOI: 10.1155/2015/918954] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/05/2014] [Accepted: 12/18/2014] [Indexed: 01/23/2023]
Abstract
With modern technologies such as microarray, deep sequencing, and liquid chromatography-mass spectrometry (LC-MS), it is possible to measure the expression levels of thousands of genes/proteins simultaneously to unravel important biological processes. A first step towards elucidating hidden patterns and understanding the massive data is the application of clustering techniques. Nonlinear relations, which unlike linear correlations have been mostly unutilized, are prevalent in high-throughput data. In many cases, nonlinear relations can model the biological relationship more precisely and reflect critical patterns in the biological systems. Using the general dependency measure, Distance Based on Conditional Ordered List (DCOL), that we introduced before, we designed the nonlinear K-profiles clustering method, which can be seen as the nonlinear counterpart of the K-means clustering algorithm. The method has a built-in statistical testing procedure that ensures genes not belonging to any cluster do not impact the estimation of cluster profiles. Results from extensive simulation studies showed that K-profiles clustering not only outperformed the traditional linear K-means algorithm, but also performed significantly better than our previous General Dependency Hierarchical Clustering (GDHC) algorithm. We further analyzed a gene expression dataset, on which K-profiles clustering generated biologically meaningful results.
5. Peng H, Ma J, Bai Y, Lu J, Yu T. MeDiA: Mean Distance Association and Its Applications in Nonlinear Gene Set Analysis. PLoS One 2015; 10:e0124620. [PMID: 25915206 PMCID: PMC4411044 DOI: 10.1371/journal.pone.0124620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2014] [Accepted: 03/17/2015] [Indexed: 11/23/2022] [Open access]
Abstract
Probabilistic association discovery aims at identifying the association between random vectors, regardless of the number of variables involved or linear/nonlinear functional forms. Recently, applications in high-dimensional data have generated rising interest in probabilistic association discovery. We developed a framework based on functions on the observation graph, named MeDiA (Mean Distance Association). We generalize its property to a group of functions on the observation graph. The group of functions encapsulates major existing methods in association discovery, e.g. mutual information and Brownian Covariance, and can be expanded to more complicated forms. We conducted a numerical comparison of the statistical power of related methods under multiple scenarios. We further demonstrated the application of MeDiA as a method of gene set analysis that captures a broader range of responses than traditional gene set analysis methods.
Affiliation(s)
- Hesen Peng: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
- Junjie Ma: Department of Hematology, Yantai Yuhuangding Hospital, Yantai, Shandong, China
- Yun Bai: Department of Pharmaceutical Sciences, School of Pharmacy, Philadelphia College of Osteopathic Medicine, Suwanee, Georgia, United States of America
- Jianwei Lu: School of Software Engineering, Tongji University, Shanghai, China; Advanced Institute of Translational Medicine, Tongji University, Shanghai, China
- Tianwei Yu: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
6. Rahmatallah Y, Emmert-Streib F, Glazko G. Gene Sets Net Correlations Analysis (GSNCA): a multivariate differential coexpression test for gene sets. Bioinformatics 2013; 30:360-8. [PMID: 24292935 PMCID: PMC4023302 DOI: 10.1093/bioinformatics/btt687] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION To date, gene set analysis approaches primarily focus on identifying differentially expressed gene sets (pathways). Methods for identifying differentially coexpressed pathways also exist but are mostly based on aggregated pairwise correlations or other pairwise measures of coexpression. Instead, we propose Gene Sets Net Correlations Analysis (GSNCA), a multivariate differential coexpression test that accounts for the complete correlation structure between genes. RESULTS In GSNCA, weight factors are assigned to genes in proportion to the genes' cross-correlations (intergene correlations). The problem of finding the weight vectors is formulated as an eigenvector problem with a unique solution. GSNCA tests the null hypothesis that for a gene set there is no difference in the weight vectors of the genes between two conditions. In simulation studies and the analyses of experimental data, we demonstrate that GSNCA captures changes in the structure of genes' cross-correlations rather than differences in the averaged pairwise correlations. Thus, GSNCA infers differences in coexpression networks while bypassing the method-dependent steps of network inference. As an additional result from GSNCA, we define hub genes as genes with the largest weights and show that these genes correspond frequently to major and specific pathway regulators, as well as to genes that are most affected by the biological difference between two conditions. In summary, GSNCA is a new approach for the analysis of differentially coexpressed pathways that also evaluates the importance of the genes in the pathways, thus providing unique information that may result in the generation of novel biological hypotheses. AVAILABILITY AND IMPLEMENTATION Implementation of the GSNCA test in R is available upon request from the authors.
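The weight construction described in this abstract can be sketched as follows. This is a minimal illustration, assuming the weights are the leading eigenvector of the matrix of absolute pairwise Pearson correlations with a zeroed diagonal, and that two conditions are compared by the L1 distance between their weight vectors. Neither detail is stated in the abstract, and the resampling step that would be needed to obtain a p-value is omitted. All function names are hypothetical; this is not the authors' R implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def gene_weights(expr, iters=500):
    """Weights for a gene set; `expr` is a list of per-gene expression lists.

    Assumption: weights solve an eigenvector problem on |correlation|
    with the diagonal set to zero, normalized to sum to 1.
    """
    p = len(expr)
    a = [[0.0 if i == j else abs(pearson(expr[i], expr[j])) for j in range(p)]
         for i in range(p)]
    w = [1.0 / p] * p
    for _ in range(iters):
        # Power iteration on (A + I): same eigenvectors as A, but the
        # identity shift avoids oscillation for periodic matrices.
        nxt = [w[i] + sum(a[i][j] * w[j] for j in range(p)) for i in range(p)]
        total = sum(nxt)
        w = [v / total for v in nxt]
    return w

def weight_shift(expr_a, expr_b):
    """L1 distance between weight vectors of two conditions (assumed statistic)."""
    wa, wb = gene_weights(expr_a), gene_weights(expr_b)
    return sum(abs(x - y) for x, y in zip(wa, wb))

# Hypothetical toy data: genes 0 and 1 are tightly coexpressed, gene 2 is not.
condition = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10.5], [3, 1, 4, 1, 5]]
weights = gene_weights(condition)
```

In this toy example the loosely correlated third gene receives the smallest weight, which matches the abstract's description of hub genes as those with the largest weights.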
Affiliation(s)
- Yasir Rahmatallah: Division of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA and Computational Biology and Machine Learning Laboratory, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast BT9 7BL, UK
7. Kryukov F, Dementyeva E, Kubiczkova L, Jarkovsky J, Brozova L, Petrik J, Nemec P, Sevcikova S, Minarik J, Stefanikova Z, Kuglik P, Hajek R. Cell cycle genes co-expression in multiple myeloma and plasma cell leukemia. Genomics 2013; 102:243-9. [PMID: 23831116 DOI: 10.1016/j.ygeno.2013.06.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 02/24/2013] [Revised: 06/06/2013] [Accepted: 06/25/2013] [Indexed: 02/01/2023]
Abstract
The objective of this study was to describe co-expression correlations of cell cycle regulatory genes in multiple myeloma (MM) and plasma cell leukemia (PCL). Our results highlight the presence of a dynamic equilibrium between co-expression of activator and inhibitor gene sets. Moreover, the inhibitor set is more sensitive to changes in the activators, but not vice versa. We have shown that CDKN2A expression is associated with short-term survival in newly diagnosed MM patients (survival was 30.3 ± 3.9 months for the 'low' expression group and 7.5 ± 5.6 months for the 'high' expression group, p<0.0001). Moreover, the low-expression CDKN2A group showed a time-to-progression benefit in newly diagnosed patients (remission was 20.8 ± 3.6 months for the 'low' and 8.4 ± 2.7 months for the 'high' expression group, p<0.0001) as well as in the whole studied cohort of MM patients (remission was 20.8 ± 2.8 months for the 'low' and 9.8 ± 1.1 months for the 'high' expression group, p<0.0001). The overexpression of inhibitors can be explained as a compensatory reaction to growing "oncogenic stress".
Affiliation(s)
- Fedor Kryukov: Babak Myeloma Group, Department of Pathological Physiology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
8. Yu T, Peng H. Hierarchical clustering of high-throughput expression data based on general dependences. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1080-1085. [PMID: 24334400 PMCID: PMC3905248 DOI: 10.1109/tcbb.2013.99] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 06/03/2023]
Abstract
High-throughput expression technologies, such as gene expression arrays and liquid chromatography-mass spectrometry (LC-MS), measure thousands of features, i.e., genes or metabolites, on a continuous scale. In such data, both linear and nonlinear relations exist between features. Nonlinear relations can reflect critical regulation patterns in the biological system. However, they are not identified and utilized by traditional clustering methods based on linear associations. Clustering based on general dependences, i.e., both linear and nonlinear relations, is hampered by the high dimensionality and high noise level of the data. We developed a sensitive nonparametric measure of general dependence between (groups of) random variables in high dimensions. Based on this dependence measure, we developed a hierarchical clustering method. In simulation studies, the method outperformed correlation- and mutual information (MI)-based hierarchical clustering methods in clustering features with nonlinear dependences. We applied the method to a microarray data set measuring gene expression in a cell-cycle time series to show that it generates biologically relevant results. The R code is available at http://userwww.service.emory.edu/~tyu8/GDHC.
Affiliation(s)
- Tianwei Yu: Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA
- Hesen Peng: Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA
9. Yu T, Bai Y. Analyzing LC/MS metabolic profiling data in the context of existing metabolic networks. Current Metabolomics 2012; 1:83-91. [PMID: 24010053 DOI: 10.2174/2213235x11301010084] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
Abstract
Metabolic profiling is the unbiased detection and quantification of low molecular-weight metabolites in a living system. It is rapidly developing in biological and translational research, contributing to disease mechanism elucidation, environmental chemical surveillance, biomarker detection, and health outcome prediction. Recent developments in experimental and computational technology allow more and more known metabolites to be detected and quantified from complex samples. As the coverage of the metabolic network improves, it has become feasible to examine metabolic profiling data from a systems perspective, i.e. interpreting the data and performing statistical inference in the context of pathways and genome-scale metabolic networks. Recently a number of methods have been developed in this area, and much improvement in algorithms and databases is still needed. In this review, we survey some methods for the analysis of metabolic profiling data based on metabolic networks.
Affiliation(s)
- Tianwei Yu: Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA
10. Yu T, Bai Y. Improving gene expression data interpretation by finding latent factors that co-regulate gene modules with clinical factors. BMC Genomics 2011; 12:563. [PMID: 22087761 PMCID: PMC3282832 DOI: 10.1186/1471-2164-12-563] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/10/2011] [Accepted: 11/16/2011] [Indexed: 12/31/2022] [Open access]
Abstract
Background In the analysis of high-throughput data with a clinical outcome, researchers mostly focus on genes/proteins that show first-order relations with the clinical outcome. While this approach yields biomarkers and biological mechanisms that are easily interpretable, it may miss information that is important to the understanding of disease mechanism and/or treatment response. Here we test the hypothesis that unobserved factors can be mobilized by the living system to coordinate the response to the clinical factors. Results We developed a computational method named Guided Latent Factor Discovery (GLFD) to identify hidden factors that act in combination with the observed clinical factors to control gene modules. In simulation studies, the method recovered masked factors effectively. Using real microarray data, we demonstrate that the method identifies latent factors that are biologically relevant, and extracts more information than analyzing only the first-order response to the clinical outcome. Conclusions Finding latent factors using GLFD brings extra insight into the mechanisms of the disease/drug response. The R code of the method is available at http://userwww.service.emory.edu/~tyu8/GLFD.
Affiliation(s)
- Tianwei Yu: Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA