1
Ai D, Chen L, Xie J, Cheng L, Zhang F, Luan Y, Li Y, Hou S, Sun F, Xia LC. Identifying local associations in biological time series: algorithms, statistical significance, and applications. Brief Bioinform 2023; 24:bbad390. [PMID: 37930023 DOI: 10.1093/bib/bbad390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 08/21/2023] [Accepted: 09/14/2023] [Indexed: 11/07/2023] Open
Abstract
Local associations refer to spatial-temporal correlations that emerge from the biological realm, such as time-dependent gene co-expression or seasonal interactions between microbes. One can reveal the intricate dynamics and inherent interactions of biological systems by examining the biological time series data for these associations. To accomplish this goal, local similarity analysis algorithms and statistical methods that facilitate the local alignment of time series and assess the significance of the resulting alignments have been developed. Although these algorithms were initially devised for gene expression analysis from microarrays, they have been adapted and accelerated for multi-omics next generation sequencing datasets, achieving high scientific impact. In this review, we present an overview of the historical developments and recent advances for local similarity analysis algorithms, their statistical properties, and real applications in analyzing biological time series data. The benchmark data and analysis scripts used in this review are freely available at http://github.com/labxscut/lsareview.
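The local alignment idea summarized in this abstract can be illustrated with a small dynamic program. The sketch below is not taken from the review or from its scripts; it assumes the commonly described form of the local similarity score, in which two normalized series of equal length are aligned within a maximum delay and the best positive or negative running sum of products is kept. The function name `local_similarity` and the parameter `max_delay` are hypothetical.

```python
def local_similarity(x, y, max_delay):
    """Signed local similarity score of two equal-length, normalized series.

    Illustrative sketch only: pos/neg hold the best positively / negatively
    associated running sums ending at the pair (x[i-1], y[j-1]).
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("series must be non-empty and of equal length")
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best, sign = 0.0, 1
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if abs(i - j) > max_delay:
                continue  # alignments beyond the allowed time shift are ignored
            prod = x[i - 1] * y[j - 1]
            # Extend the alignment along the diagonal, or restart from zero.
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            if pos[i][j] > best:
                best, sign = pos[i][j], 1
            if neg[i][j] > best:
                best, sign = neg[i][j], -1
    return sign * best / n


# A signal that reappears one step later is only found when a delay is allowed.
shifted = local_similarity([1, 2, 0, 0], [0, 1, 2, 0], max_delay=1)
aligned = local_similarity([1, 2, 0, 0], [0, 1, 2, 0], max_delay=0)
```

Assessing whether such a score is significant (for example by permutation) is a separate step that this sketch does not cover.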
Affiliation(s)
- Dongmei Ai: School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Lulu Chen: School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Jiemin Xie: Department of Statistics and Financial Mathematics, School of Mathematics, South China University of Technology, Guangzhou 510641, China
- Longwei Cheng: School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Fang Zhang: Shenwan Hongyuan Securities Co. Ltd., Shanghai 200031, China
- Yihui Luan: School of Mathematics, Shandong University, Jinan 250100, China
- Yang Li: Department of Statistics and Financial Mathematics, School of Mathematics, South China University of Technology, Guangzhou 510641, China
- Shengwei Hou: Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Fengzhu Sun: Department of Quantitative and Computational Biology, University of Southern California, California, 90007, USA
- Li Charlie Xia: Department of Statistics and Financial Mathematics, School of Mathematics, South China University of Technology, Guangzhou 510641, China
2
Ma Z, Davis SW, Ho YY. Flexible copula model for integrating correlated multi-omics data from single-cell experiments. Biometrics 2022. [PMID: 35622236 DOI: 10.1111/biom.13701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Accepted: 05/18/2022] [Indexed: 11/27/2022]
Abstract
With recent advances in technologies to profile multi-omics data at the single-cell level, integrative multi-omics data analysis has become increasingly popular. It is increasingly common that information such as methylation changes, chromatin accessibility, and gene expression is jointly collected in a single-cell experiment. In biomedical studies, it is often of interest to study the associations between various data types and to examine how these associations might change according to other factors such as cell types and gene regulatory components. However, since each data type usually has a distinct marginal distribution, joint analysis of these changes of associations using multi-omics data is statistically challenging. In this paper, we propose a flexible copula-based framework to model covariate-dependent correlation structures independent of their marginals. In addition, the proposed approach can jointly combine a wide variety of univariate marginal distributions, either discrete or continuous, including the class of zero-inflated distributions. The performance of the proposed framework is demonstrated through a series of simulation studies. Finally, it is applied to a set of experimental data to investigate the dynamic relationship between single-cell RNA-sequencing, chromatin accessibility, and DNA methylation at different germ layers during mouse gastrulation.
Affiliation(s)
- Zichen Ma: Department of Public Health Sciences, Clemson University, Clemson, SC, USA
- Shannon W Davis: Department of Biological Sciences, University of South Carolina, Columbia, SC, USA
- Yen-Yi Ho: Department of Statistics, University of South Carolina, Columbia, SC, USA
3
Yang Z, Ho YY. Modeling dynamic correlation in zero-inflated bivariate count data with applications to single-cell RNA sequencing data. Biometrics 2021; 78:766-776. [PMID: 33720414 PMCID: PMC8477913 DOI: 10.1111/biom.13457] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/28/2019] [Revised: 03/03/2021] [Accepted: 03/08/2021] [Indexed: 12/13/2022]
Abstract
Interactions between biological molecules in a cell are tightly coordinated and often highly dynamic. As a result of these varying signaling activities, changes in gene coexpression patterns could often be observed. The advancements in next‐generation sequencing technologies bring new statistical challenges for studying these dynamic changes of gene coexpression. In recent years, methods have been developed to examine genomic information from individual cells. Single‐cell RNA sequencing (scRNA‐seq) data are count‐based, and often exhibit characteristics such as overdispersion and zero inflation. To explore the dynamic dependence structure in scRNA‐seq data and other zero‐inflated count data, new approaches are needed. In this paper, we consider overdispersion and zero inflation in count outcomes and propose a ZEro‐inflated negative binomial dynamic COrrelation model (ZENCO). The observed count data are modeled as a mixture of two components: success amplifications and dropout events in ZENCO. A latent variable is incorporated into ZENCO to model the covariate‐dependent correlation structure. We conduct simulation studies to evaluate the performance of our proposed method and to compare it with existing approaches. We also illustrate the implementation of our proposed approach using scRNA‐seq data from a study of minimal residual disease in melanoma.
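The two-component mixture mentioned in this abstract can be made concrete with the standard zero-inflated negative binomial probability mass function for a single count. This is a generic textbook sketch, not the ZENCO implementation: it covers only one marginal and omits the latent-variable correlation structure. The names `nb_log_pmf` and `zinb_pmf`, and the mean/size parameterization, are assumptions made for illustration.

```python
import math


def nb_log_pmf(k, mean, size):
    """Log-probability of count k under a negative binomial (mean > 0, size > 0)."""
    return (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mean))
        + k * math.log(mean / (size + mean))
    )


def zinb_pmf(k, mean, size, dropout):
    """Probability of count k under a zero-inflated negative binomial.

    With probability `dropout` the count is a structural zero; otherwise it
    is drawn from the negative binomial component.
    """
    nb = math.exp(nb_log_pmf(k, mean, size))
    if k == 0:
        return dropout + (1.0 - dropout) * nb
    return (1.0 - dropout) * nb


# Zero inflation raises the probability of observing a zero count.
p_zero_plain = zinb_pmf(0, mean=2.0, size=1.0, dropout=0.0)
p_zero_inflated = zinb_pmf(0, mean=2.0, size=1.0, dropout=0.5)
```

A smaller `size` gives stronger overdispersion, and `dropout = 0` recovers the ordinary negative binomial.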
Affiliation(s)
- Zhen Yang: Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
- Yen-Yi Ho: Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
4
Lu J, Lu Y, Ding Y, Xiao Q, Liu L, Cai Q, Kong Y, Bai Y, Yu T. DNLC: differential network local consistency analysis. BMC Bioinformatics 2019; 20:489. [PMID: 31874600 PMCID: PMC6929334 DOI: 10.1186/s12859-019-3046-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/15/2019] [Accepted: 08/21/2019] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND: The biological network is highly dynamic. Functional relations between genes can be activated or deactivated depending on the biological conditions. On the genome-scale network, subnetworks that gain or lose local expression consistency may shed light on the regulatory mechanisms related to the changing biological conditions, such as disease status or tissue developmental stages. RESULTS: In this study, we develop a new method to select genes and modules on the existing biological network, in which local expression consistency changes significantly between clinical conditions. The method is called DNLC: Differential Network Local Consistency. In simulations, our algorithm detected artificially created local consistency changes effectively. We applied the method to two publicly available datasets, and the method detected novel genes and network modules that were biologically plausible. CONCLUSIONS: The new method is effective in finding modules in which the gene expression consistency changes between clinical conditions. It is a useful tool that complements traditional differential expression analyses to make discoveries from gene expression data. The R package is available at https://cran.r-project.org/web/packages/DNLC.
Affiliation(s)
- Jianwei Lu: School of Software Engineering, Tongji University, Shanghai, China; Institute of Advanced Translational Medicine, Tongji University, Shanghai, China
- Yao Lu: School of Software Engineering, Tongji University, Shanghai, China
- Yusheng Ding: School of Software Engineering, Tongji University, Shanghai, China
- Qingyang Xiao: Department of Environmental Health, Emory University, Atlanta, GA USA
- Linqing Liu: School of Software Engineering, Tongji University, Shanghai, China
- Qingpo Cai: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA USA
- Yunchuan Kong: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA USA
- Yun Bai: Department of Pharmaceutical Sciences, School of Pharmacy, Philadelphia College of Osteopathic Medicine, Georgia Campus, Suwanee, GA USA
- Tianwei Yu: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA USA
5
A hypergraph-based method for large-scale dynamic correlation study at the transcriptomic scale. BMC Genomics 2019; 20:397. [PMID: 31117943 PMCID: PMC6530038 DOI: 10.1186/s12864-019-5787-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/30/2018] [Accepted: 05/09/2019] [Indexed: 12/22/2022] Open
Abstract
Background: The biological regulatory system is highly dynamic. Correlations between functionally related genes change over different biological conditions, which are often unobserved in the data. At the gene level, the dynamic correlations result in three-way gene interactions involving a pair of genes that change correlation, and a third gene that reflects the underlying cellular conditions. This type of ternary relation can be quantified by the Liquid Association statistic. Studying these three-way interactions at the gene triplet level has revealed important regulatory mechanisms in the biological system. Currently, due to the extremely large number of possible combinations of triplets within a high-throughput gene expression dataset, no method is available to examine the ternary relationship at the biological system level and formally address the false discovery issue. Results: Here we propose a new method, Hypergraph for Dynamic Correlation (HDC), to construct module-level three-way interaction networks. The method is able to present integrative uniform hypergraphs to reflect the global dynamic correlation pattern in the biological system, providing guidance to down-stream gene triplet-level analyses. To validate the method’s ability, we conducted two real data experiments using a melanoma RNA-seq dataset from The Cancer Genome Atlas (TCGA) and a yeast cell cycle dataset. The resulting hypergraphs are clearly biologically plausible, and suggest novel relations relevant to the biological conditions in the data. Conclusions: We believe the new approach provides a valuable alternative method to analyze omics data that can extract higher order structures. The software is at https://github.com/yunchuankong/HypergraphDynamicCorrelation.
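The Liquid Association statistic named in this abstract can be illustrated for a single gene triplet. The sketch below is not the HDC method and is not taken from its software. It assumes the usual estimator in which all three variables are converted to normal scores and the statistic is the mean of the triple product, which is positive when the correlation between the pair rises with the third variable. The function names and the simulated data are hypothetical.

```python
import random
from statistics import NormalDist


def normal_scores(values):
    """Replace each value by the standard-normal quantile of its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def liquid_association(x, y, z):
    """Estimate how the x-y correlation changes with z (mean triple product)."""
    xs, ys, zs = normal_scores(x), normal_scores(y), normal_scores(z)
    return sum(a * b * c for a, b, c in zip(xs, ys, zs)) / len(xs)


# Simulated triplet: y follows x when z is high and opposes x when z is low,
# so x and y are uncorrelated overall but dynamically correlated through z.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(500)]
x = [rng.gauss(0, 1) for _ in range(500)]
y_switch = [xi if zi > 0 else -xi for xi, zi in zip(x, z)]
y_indep = [rng.gauss(0, 1) for _ in range(500)]

la_switch = liquid_association(x, y_switch, z)  # clearly positive
la_indep = liquid_association(x, y_indep, z)    # close to zero
```

Scanning every triplet in a genome-scale dataset this way is the combinatorial burden the abstract refers to, which is why module-level summaries are attractive.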
6
Ai D, Li X, Pan H, Chen J, Cram JA, Xia LC. Explore mediated co-varying dynamics in microbial community using integrated local similarity and liquid association analysis. BMC Genomics 2019; 20:185. [PMID: 30967122 PMCID: PMC6456937 DOI: 10.1186/s12864-019-5469-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND: Discovering the key microbial species and environmental factors of a microbial community and characterizing their relationships with other members are critical to ecosystem studies. The microbial co-occurrence patterns across a variety of environmental settings have been extensively characterized. However, previous studies were limited by their restriction toward pairwise relationships, while there was ample evidence of third-party mediated co-occurrence in microbial communities. METHODS: We implemented and applied the triplet-based liquid association analysis in combination with the local similarity analysis procedure to microbial ecology data. We developed an intuitive scheme to visualize those complex triplet associations along with pairwise correlations. Using a time series from the marine microbial ecosystem as an example, we identified pairs of operational taxonomic units (OTUs) where the strength of their associations appeared to relate to the values of a third "mediator" variable. These "mediator" variables appear to modulate the associations between pairs of bacteria. RESULTS: Using this analysis, we were able to assess the OTUs' ability to regulate their functional partners in the community, typically not manifested in the pairwise correlation patterns. For example, we identified Flavobacteria as a multifaceted player in the marine microbial ecosystem, and its clades were involved in mediating other OTU pairs. By contrast, SAR11 clades were not active mediators of the community, despite being abundant and highly correlated with other OTUs. Our results suggested that Flavobacteria are more likely to respond to situations where particles and unusual sources of dissolved organic material are prevalent, such as after a plankton bloom. On the other hand, SAR11s are oligotrophic chemoheterotrophs with inflexible metabolisms, and their relationships with other organisms may be less governed by environmental or biological factors.
CONCLUSIONS: By integrating liquid association with local similarity analysis to explore the mediated co-varying dynamics, we presented a novel perspective and a useful toolkit to analyze and interpret time series data from microbial communities. Our augmented association network analysis is thus more representative of the true underlying dynamic structure of the microbial community. The analytic software in this study was implemented as new functionalities of the ELSA (Extended local similarity analysis) tool, which is available for free download (http://bitbucket.org/charade/elsa).
Affiliation(s)
- Dongmei Ai: School of Mathematics and Physics, University of Science and Technology Beijing, Xueyuan Road, Haidian District, Beijing, 100001 China
- Xiaoxin Li: School of Mathematics and Physics, University of Science and Technology Beijing, Xueyuan Road, Haidian District, Beijing, 100001 China
- Hongfei Pan: School of Mathematics and Physics, University of Science and Technology Beijing, Xueyuan Road, Haidian District, Beijing, 100001 China
- Jiamin Chen: Department of Medicine, Stanford University School of Medicine, 269 Campus Dr., Stanford, CA 94305 USA
- Jacob A. Cram: Center for Environmental Science, University of Maryland, Cambridge, MA 21613 USA
- Li C. Xia: Department of Medicine, Stanford University School of Medicine, 269 Campus Dr., Stanford, CA 94305 USA
7
Yu T. A new dynamic correlation algorithm reveals novel functional aspects in single cell and bulk RNA-seq data. PLoS Comput Biol 2018; 14:e1006391. [PMID: 30080856 PMCID: PMC6095616 DOI: 10.1371/journal.pcbi.1006391] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/02/2018] [Revised: 08/16/2018] [Accepted: 07/24/2018] [Indexed: 01/21/2023] Open
Abstract
Dynamic correlations are pervasive in high-throughput data. Large numbers of gene pairs can change their correlation patterns in response to observed/unobserved changes in physiological states. Finding changes in correlation patterns can reveal important regulatory mechanisms. Currently there is no method that can effectively detect global dynamic correlation patterns in a dataset. Given the challenging nature of the problem, the currently available methods use genes as surrogate measurements of physiological states, which cannot faithfully represent true underlying biological signals. In this study we develop a new method that directly identifies strong latent dynamic correlation signals from the data matrix, named DCA: Dynamic Correlation Analysis. At the center of the method is a new metric for the identification of pairs of variables that are highly likely to be dynamically correlated, without knowing the underlying physiological states that govern the dynamic correlation. We validate the performance of the method with extensive simulations. We applied the method to three real datasets: a single cell RNA-seq dataset, a bulk RNA-seq dataset, and a microarray gene expression dataset. In all three datasets, the method reveals novel latent factors with clear biological meaning, bringing new insights into the data. Dynamic correlation is an important area in expression data. However it hasn’t received much attention because of the lack of effective methods that can unravel the complex relationship. Here we describe a new method that represents a substantial improvement over existing approaches. It achieves the goal of efficiently finding patterns of dynamic correlation in RNA-seq data, as well as detecting biological functions associated with the dynamic correlation patterns. Unlike traditional methods that focus on first-order structures, linear or nonlinear, our method finds second-order patterns that bring insights into the regulations of the complex system. 
Some of the interesting discoveries by the new method, such as immunological functions of some intestinal epithelial cells, are validated by recent biological publications.
Affiliation(s)
- Tianwei Yu: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, United States of America
8
Xu X, Wang M, Li L, Che R, Li P, Pei L, Li H. Genome-wide trait-trait dynamics correlation study dissects the gene regulation pattern in maize kernels. BMC Plant Biol 2017; 17:163. [PMID: 29037150 PMCID: PMC5644097 DOI: 10.1186/s12870-017-1119-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/02/2017] [Accepted: 10/09/2017] [Indexed: 06/07/2023]
Abstract
BACKGROUND: Dissecting the genetic basis and regulatory mechanisms for the biosynthesis and accumulation of nutrients in maize could lead to the improved nutritional quality of this crop. Gene expression is regulated at the genomic, transcriptional, and post-transcriptional levels, all of which can produce diversity among traits. However, the expression of most genes connected with a particular trait usually does not have a direct association with the variation of that trait. In addition, expression profiles of genes involved in a single pathway may vary as the intrinsic cellular state changes. To work around these issues, we utilized a statistical method, liquid association (LA), to investigate the complex pattern of gene regulation in maize kernels. RESULTS: We applied LA to the expression profiles of 28,769 genes to dissect dynamic trait-trait correlation patterns in maize kernels. Among the 1000 LA pairs (LAPs) with the largest LA scores, 686 LAPs were identified as conditionally correlated. We also identified 830 and 215 LA-scouting leaders based on the positive and negative LA scores, which were significantly enriched for some biological processes and molecular functions. Our analysis of the dynamic co-expression patterns in the carotene biosynthetic pathway clearly indicated the important role of lcyE, CYP97A, ZEP1, and VDE in this pathway, which may change the direction of carotene biosynthesis by controlling the influx and efflux of the substrate. The dynamic trait-trait correlation patterns between gene expression and oil concentration in the fatty acid metabolic pathway and its complex regulatory network were also assessed. Of 26 oil-associated genes, 23 were correlated with oil concentration conditioning on 580 LA-scouting genes, and 5% of these LA-scouting genes were annotated as enzymes in the oil metabolic pathway.
CONCLUSIONS By focusing on the carotenoid and oil biosynthetic pathways in maize, we showed that a genome-wide LA analysis provides a novel and effective way to detect transcriptional regulatory relationships. This method will help us understand the biological role of maize kernel genes and will benefit maize breeding programs.
Affiliation(s)
- Xiuqin Xu: School of Biological and Science Technology, University of Jinan, Jinan, 250022 China
- Min Wang: National Maize Improvement Center of China, Key Laboratory of Crop Genomics and Genetic Improvement, China Agricultural University, Beijing, 100193 China
- Lianbo Li: School of Biological and Science Technology, University of Jinan, Jinan, 250022 China
- Ronghui Che: School of Biological and Science Technology, University of Jinan, Jinan, 250022 China
- Peng Li: School of Biological and Science Technology, University of Jinan, Jinan, 250022 China
- Laming Pei: School of Biological and Science Technology, University of Jinan, Jinan, 250022 China
- Hui Li: School of Biological and Science Technology, University of Jinan, Jinan, 250022 China
9
Ma T, Song C, Tseng GC. Discussant paper on ‘Statistical contributions to bioinformatics: Design, modelling, structure learning and integration’. Stat Modelling 2017. [DOI: 10.1177/1471082x17705992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Affiliation(s)
- Tianzhou Ma: Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Chi Song: Division of Biostatistics, College of Public Health, Ohio State University, Columbus, OH, USA
- George C. Tseng: Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA