1
Dutta D, Sen A, Satagopan JM. Identifying genes associated with disease outcomes using joint sparse canonical correlation analysis-An application in renal clear cell carcinoma. Genet Epidemiol 2024; 48:414-432. [PMID: 38751238 PMCID: PMC11589067 DOI: 10.1002/gepi.22566] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/20/2023] [Revised: 04/04/2024] [Accepted: 04/22/2024] [Indexed: 11/27/2024]
Abstract
Somatic changes like copy number aberrations (CNAs) and epigenetic alterations like methylation have pivotal effects on disease outcomes and prognosis in cancer by regulating gene expression, which drives critical biological processes. To identify potential biomarkers and molecular targets and understand how they impact disease outcomes, it is important to identify key groups of CNAs, the associated methylation, and the gene expressions they impact, through a joint integrative analysis. Here, we propose a novel analysis pipeline, the joint sparse canonical correlation analysis (jsCCA), an extension of sCCA, to effectively identify an ensemble of CNAs, methylation sites and gene (expression) components in the context of disease endpoints, especially tumor characteristics. Our approach detects potentially orthogonal gene components that are highly correlated with sets of methylation sites which in turn are correlated with sets of CNA sites. It then identifies the genes within these components that are associated with the outcome. Further, we aggregate the effect of each gene expression set on tumor stage by constructing "gene component scores" and test their interaction with traditional risk factors. Analyzing clinical and genomic data on 515 renal clear cell carcinoma (ccRCC) patients from the TCGA-KIRC, we found eight gene components to be associated with methylation sites, regulated by groups of proximally located CNA sites. Association analysis with tumor stage at diagnosis identified a novel association of expression of the ASAH1 gene, trans-regulated by methylation of several genes including SIX5 and by CNAs in the 10q25 region including TCF7L2. Further analysis to quantify the overall effect of gene sets on tumor stage revealed that two of the eight gene components have significant interaction with smoking in relation to tumor stage. These gene components represent distinct biological functions including immune function, inflammatory responses, and hypoxia-regulated pathways. Our findings suggest that jsCCA analysis can identify interpretable and important genes, regulatory structures, and clinically consequential pathways. Such methods are warranted for comprehensive analysis of multimodal data, especially in cancer genomics.
Affiliation(s)
- Diptavo Dutta
  - Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, USA
- Ananda Sen
  - Department of Biostatistics, University of Michigan, Ann Arbor, USA
  - Department of Family Medicine, University of Michigan, Ann Arbor, USA
- Jaya M. Satagopan
  - Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Piscataway, USA
2
Vich Vila A, Zhang J, Liu M, Faber KN, Weersma RK. Untargeted faecal metabolomics for the discovery of biomarkers and treatment targets for inflammatory bowel diseases. Gut 2024; 73:1909-1920. [PMID: 39002973 PMCID: PMC11503092 DOI: 10.1136/gutjnl-2023-329969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Accepted: 06/23/2024] [Indexed: 07/15/2024]
Abstract
The gut microbiome has been recognised as a key component in the pathogenesis of inflammatory bowel diseases (IBD), and the wide range of metabolites produced by gut bacteria are an important mechanism by which the human microbiome interacts with host immunity or host metabolism. High-throughput metabolomic profiling and novel computational approaches now allow for comprehensive assessment of thousands of metabolites in diverse biomaterials, including faecal samples. Several groups of metabolites, including short-chain fatty acids, tryptophan metabolites and bile acids, have been associated with IBD. In this Recent Advances article, we describe the contribution of metabolomics research to the field of IBD, with a focus on faecal metabolomics. We discuss the latest findings on the significance of these metabolites for IBD prognosis and therapeutic interventions and offer insights into the future directions of metabolomics research.
Affiliation(s)
- Arnau Vich Vila
  - Department of Gastroenterology and Hepatology, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Jingwan Zhang
  - Department of Medicine & Therapeutics, The Chinese University of Hong Kong, Hong Kong (SAR), People's Republic of China
  - Microbiota I-Center (MagIC), Hong Kong (SAR), People's Republic of China
- Moting Liu
  - Department of Gastroenterology and Hepatology, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Klaas Nico Faber
  - Department of Gastroenterology and Hepatology, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Rinse K Weersma
  - Department of Gastroenterology and Hepatology, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
3
Pusa T, Rousu J. Stable biomarker discovery in multi-omics data via canonical correlation analysis. PLoS One 2024; 19:e0309921. [PMID: 39250478 PMCID: PMC11383239 DOI: 10.1371/journal.pone.0309921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 08/20/2024] [Indexed: 09/11/2024]
Abstract
Multi-omics analysis offers a promising avenue to a better understanding of complex biological phenomena. In particular, untangling the pathophysiology of multifactorial health conditions such as the inflammatory bowel disease (IBD) could benefit from simultaneous consideration of several omics levels. However, taking full advantage of multi-omics data requires the adoption of suitable new tools. Multi-view learning, a machine learning technique that natively joins together heterogeneous data, is a natural source for such methods. Here we present a new approach to variable selection in unsupervised multi-view learning by applying stability selection to canonical correlation analysis (CCA). We apply our method, StabilityCCA, to simulated and real multi-omics data, and demonstrate its ability to find relevant variables and improve the stability of variable selection. In a case study on an IBD microbiome data set, we link together metagenomics and metabolomics, revealing a connection between their joint structure and the disease, and identifying potential biomarkers. Our results showcase the usefulness of multi-view learning in multi-omics analysis and demonstrate StabilityCCA as a powerful tool for biomarker discovery.
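The general idea of stability selection mentioned in this abstract can be sketched as follows: a sparse selector is run on many random subsamples, and a variable is kept only if it is selected in a large fraction of them. This is a minimal illustration and not the StabilityCCA implementation. The function names, the toy data, and the base selector (the `top_k` largest weights of the leading singular vector of the cross-covariance matrix, which corresponds to CCA only under the simplifying assumption of identity within-view covariance) are all assumptions made for this example.

```python
import math
import random


def cross_covariance(x, y):
    """Sample cross-covariance (p x q) between the columns of two views."""
    n = len(x)
    x_means = [sum(col) / n for col in zip(*x)]
    y_means = [sum(col) / n for col in zip(*y)]
    return [[sum((rx[a] - x_means[a]) * (ry[b] - y_means[b])
                 for rx, ry in zip(x, y)) / (n - 1)
             for b in range(len(y_means))] for a in range(len(x_means))]


def leading_x_weights(x, y, iterations=20):
    """Leading left singular vector of the cross-covariance by power iteration.

    Hypothetical stand-in for a CCA weight vector; assumes the
    cross-covariance is not identically zero.
    """
    c = cross_covariance(x, y)
    p, q = len(c), len(c[0])
    v = [1.0 / math.sqrt(q)] * q
    u = [0.0] * p
    for _ in range(iterations):
        u = [sum(c[a][b] * v[b] for b in range(q)) for a in range(p)]
        u_norm = math.sqrt(sum(w * w for w in u))
        u = [w / u_norm for w in u]
        v = [sum(c[a][b] * u[a] for a in range(p)) for b in range(q)]
        v_norm = math.sqrt(sum(w * w for w in v))
        v = [w / v_norm for w in v]
    return u


def stability_selection(x, y, top_k, rounds, fraction, seed):
    """Selection frequency of each X variable over random subsamples."""
    rng = random.Random(seed)
    n, p = len(x), len(x[0])
    size = int(n * fraction)
    counts = [0] * p
    for _ in range(rounds):
        rows = rng.sample(range(n), size)  # subsample without replacement
        weights = leading_x_weights([x[i] for i in rows], [y[i] for i in rows])
        ranked = sorted(range(p), key=lambda a: abs(weights[a]), reverse=True)
        for a in ranked[:top_k]:
            counts[a] += 1
    return [count / rounds for count in counts]


# Toy data: a shared latent signal z drives X variables 0 and 1 and both
# Y variables; X variables 2 and 3 are unrelated bounded patterns.
z = [i - 9.5 for i in range(20)]
x = [[z[i], -0.8 * z[i], 1.0 if i % 2 == 0 else -1.0, float(i % 3 - 1)]
     for i in range(20)]
y = [[z[i], 0.5 * z[i]] for i in range(20)]

frequencies = stability_selection(x, y, top_k=2, rounds=50, fraction=0.5, seed=0)
stable = [a for a, freq in enumerate(frequencies) if freq >= 0.8]
```

In this toy example the signal is strong enough that the two signal variables are selected in every round. On real data the frequencies are spread between 0 and 1, and the cutoff (0.8 here, an arbitrary choice) controls how conservative the final variable set is.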
Affiliation(s)
- Taneli Pusa
  - Department of Computer Science, Aalto University, Espoo, Finland
- Juho Rousu
  - Department of Computer Science, Aalto University, Espoo, Finland
4
Liu W, Vu T, Konigsberg IR, Pratte KA, Zhuang Y, Kechris KJ. SmCCNet 2.0: a comprehensive tool for multi-omics network inference with shiny visualization. BMC Bioinformatics 2024; 25:276. [PMID: 39179997 PMCID: PMC11344457 DOI: 10.1186/s12859-024-05900-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Accepted: 08/14/2024] [Indexed: 08/26/2024]
Abstract
Sparse multiple canonical correlation network analysis (SmCCNet) is a machine learning technique for integrating omics data along with a variable of interest (e.g., phenotype of complex disease), and reconstructing multi-omics networks that are specific to this variable. We present the second-generation SmCCNet (SmCCNet 2.0) that adeptly integrates single or multiple omics data types along with a quantitative or binary phenotype of interest. In addition, this new package offers a streamlined setup process that can be configured manually or automatically, ensuring a flexible and user-friendly experience. Availability: This package is available on both CRAN (https://cran.r-project.org/web/packages/SmCCNet/index.html) and GitHub (https://github.com/KechrisLab/SmCCNet) under the MIT license. The network visualization tool is available at https://smccnet.shinyapps.io/smccnetnetwork/.
Affiliation(s)
- Weixuan Liu
  - Department of Biostatistics and Informatics, School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Thao Vu
  - Department of Biostatistics and Informatics, School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Iain R Konigsberg
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Katherine A Pratte
  - Department of Biostatistics, National Jewish Health, Denver, 80206, CO, USA
- Yonghua Zhuang
  - Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, 80045, CO, USA
- Katerina J Kechris
  - Department of Biostatistics and Informatics, School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
5
Volteras D, Shahrezaei V, Thomas P. Global transcription regulation revealed from dynamical correlations in time-resolved single-cell RNA sequencing. Cell Syst 2024; 15:694-708.e12. [PMID: 39121860 DOI: 10.1016/j.cels.2024.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 02/29/2024] [Accepted: 07/11/2024] [Indexed: 08/12/2024]
Abstract
Single-cell transcriptomics reveals significant variations in transcriptional activity across cells. Yet, it remains challenging to identify mechanisms of transcription dynamics from static snapshots. It is thus still unknown what drives global transcription dynamics in single cells. We present a stochastic model of gene expression with cell size- and cell cycle-dependent rates in growing and dividing cells that harnesses temporal dimensions of single-cell RNA sequencing through metabolic labeling protocols and cell cycle reporters. We develop a parallel and highly scalable approximate Bayesian computation method that corrects for technical variation and accurately quantifies absolute burst frequency, burst size, and degradation rate along the cell cycle at a transcriptome-wide scale. Using Bayesian model selection, we reveal scaling between transcription rates and cell size and unveil waves of gene regulation across the cell cycle-dependent transcriptome. Our study shows that stochastic modeling of dynamical correlations identifies global mechanisms of transcription regulation. A record of this paper's transparent peer review process is included in the supplemental information.
Affiliation(s)
- Dimitris Volteras
  - Department of Mathematics, Faculty of Natural Sciences, Imperial College London, London, SW7 2AZ, UK
- Vahid Shahrezaei
  - Department of Mathematics, Faculty of Natural Sciences, Imperial College London, London, SW7 2AZ, UK
- Philipp Thomas
  - Department of Mathematics, Faculty of Natural Sciences, Imperial College London, London, SW7 2AZ, UK
6
Rodosthenous T, Shahrezaei V, Evangelou M. Multi-view data visualisation via manifold learning. PeerJ Comput Sci 2024; 10:e1993. [PMID: 38855253 PMCID: PMC11157621 DOI: 10.7717/peerj-cs.1993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 03/25/2024] [Indexed: 06/11/2024]
Abstract
Non-linear dimensionality reduction can be performed by manifold learning approaches, such as stochastic neighbour embedding (SNE), locally linear embedding (LLE) and isometric feature mapping (ISOMAP). These methods aim to produce two or three latent embeddings, primarily to visualise the data in intelligible representations. This manuscript proposes extensions of Student's t-distributed SNE (t-SNE), LLE and ISOMAP, for dimensionality reduction and visualisation of multi-view data. Multi-view data refers to multiple types of data generated from the same samples. The proposed multi-view approaches provide more comprehensible projections of the samples compared to the ones obtained by visualising each data-view separately. Commonly, visualisation is used for identifying underlying patterns within the samples. By incorporating the obtained low-dimensional embeddings from the multi-view manifold approaches into the K-means clustering algorithm, it is shown that clusters of the samples are accurately identified. Through extensive comparisons of novel and existing multi-view manifold learning algorithms on real and synthetic data, the proposed multi-view extension of t-SNE, named multi-SNE, is found to have the best performance, quantified both qualitatively and quantitatively by assessing the clusterings obtained. The applicability of multi-SNE is illustrated by its implementation in the newly developed and challenging multi-omics single-cell data. The aim is to visualise and identify cell heterogeneity and cell types in biological tissues relevant to health and disease. In this application, multi-SNE provides an improved performance over single-view manifold learning approaches and a promising solution for unified clustering of multi-omics single-cell data.
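The step of passing low-dimensional embeddings to K-means, as described in this abstract, can be illustrated with a small sketch. It assumes the embedding has already been computed (the six 2-D points below are made-up stand-ins, not output of multi-SNE) and uses a plain K-means loop with fixed initial centres so the result is deterministic. The function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, centers, iterations=10):
    """Plain K-means: alternate nearest-centre assignment and centre update."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point takes the label of its nearest centre.
        labels = [min(range(len(centers)),
                      key=lambda c: squared_distance(point, centers[c]))
                  for point in points]
        # Update step: each centre moves to the mean of its members.
        updated = []
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(coord) / len(members)
                                     for coord in zip(*members)))
            else:
                updated.append(centers[c])  # keep an empty cluster's centre
        centers = updated
    return labels, centers


# Hypothetical 2-D embedding of six samples forming two well-separated groups.
embedding = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
             (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]

# Fixed initial centres (one point from each group) keep the run deterministic.
labels, centers = kmeans(embedding, [embedding[0], embedding[3]])
```

In practice the initial centres would be chosen randomly or by a seeding scheme, and the number of clusters is itself an analysis choice; both are fixed here only to keep the illustration short.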
Affiliation(s)
- Vahid Shahrezaei
  - Department of Mathematics, Imperial College London, London, United Kingdom
- Marina Evangelou
  - Department of Mathematics, Imperial College London, London, United Kingdom
7
Rashid MM, Selvarajoo K. Advancing drug-response prediction using multi-modal and -omics machine learning integration (MOMLIN): a case study on breast cancer clinical data. Brief Bioinform 2024; 25:bbae300. [PMID: 38904542 PMCID: PMC11190965 DOI: 10.1093/bib/bbae300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 05/30/2024] [Accepted: 06/11/2024] [Indexed: 06/22/2024]
Abstract
The inherent heterogeneity of cancer contributes to highly variable responses to any anticancer treatments. This underscores the need to first identify precise biomarkers through complex multi-omics datasets that are now available. Although much research has focused on this aspect, identifying biomarkers associated with distinct drug responders still remains a major challenge. Here, we develop MOMLIN, a multi-modal and -omics machine learning integration framework, to enhance drug-response prediction. MOMLIN jointly utilizes sparse correlation algorithms and class-specific feature selection algorithms, which identify multi-modal and -omics-associated interpretable components. MOMLIN was applied to 147 patients' breast cancer datasets (clinical, mutation, gene expression, tumor microenvironment cells and molecular pathways) to analyze drug-response class predictions for non-responders and variable responders. Notably, MOMLIN achieves an average AUC of 0.989, which is at least 10% greater when compared with current state-of-the-art methods (data integration analysis for biomarker discovery using latent components, multi-omics factor analysis, sparse canonical correlation analysis). Moreover, MOMLIN not only detects known individual biomarkers such as genes at mutation/expression level but, most importantly, also correlates multi-modal and -omics network biomarkers for each response class. For example, an interaction between ER-negative-HMCN1-COL5A1 mutations-FBXO2-CSF3R expression-CD8 emerges as a multimodal biomarker for responders, potentially affecting antimicrobial peptides and FLT3 signaling pathways. In contrast, for resistance cases, a distinct combination of lymph node-TP53 mutation-PON3-ENSG00000261116 lncRNA expression-HLA-E-T-cell exclusions emerged as multimodal biomarkers, possibly impacting the neurotransmitter release cycle pathway. MOMLIN, therefore, is expected to advance precision medicine, such as to detect context-specific multi-omics network biomarkers and better predict drug-response classifications.
Affiliation(s)
- Md Mamunur Rashid
  - Biomolecular Sequence to Function Division, BII, (ASTAR), Singapore 138671, Republic of Singapore
- Kumar Selvarajoo
  - Biomolecular Sequence to Function Division, BII, (ASTAR), Singapore 138671, Republic of Singapore
  - Synthetic Biology Translational Research Program, Yong Loo Lin School of Medicine, NUS, Singapore 117456, Republic of Singapore
  - School of Biological Sciences, Nanyang Technological University (NTU), Singapore 639798, Republic of Singapore
8
Jilani M, Degras D, Haspel N. Elucidating Cancer Subtypes by Using the Relationship between DNA Methylation and Gene Expression. Genes (Basel) 2024; 15:631. [PMID: 38790260 PMCID: PMC11121157 DOI: 10.3390/genes15050631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 05/10/2024] [Accepted: 05/14/2024] [Indexed: 05/26/2024]
Abstract
Advancements in the field of next generation sequencing (NGS) have generated vast amounts of data for the same set of subjects. The challenge that arises is how to combine and reconcile results from different omics studies, such as epigenome and transcriptome, to improve the classification of disease subtypes. In this study, we introduce sCClust (sparse canonical correlation analysis with clustering), a technique to combine high-dimensional omics data using sparse canonical correlation analysis (sCCA), such that the correlation between datasets is maximized. This stage is followed by clustering the integrated data in a lower-dimensional space. We apply sCClust to gene expression and DNA methylation data for three cancer genomics datasets from the Cancer Genome Atlas (TCGA) to distinguish between underlying subtypes. We evaluate the identified subtypes using Kaplan-Meier plots and hazard ratio analysis on the three types of cancer: GBM (glioblastoma multiforme), lung cancer and colon cancer. Comparison with subtypes identified by both single- and multi-omics studies implies improved clinical association. We also perform pathway over-representation analysis in order to identify up-regulated and down-regulated genes as tentative drug targets. The main goal of the paper is twofold: the integration of epigenomic and transcriptomic datasets followed by elucidating subtypes in the latent space. The significance of this study lies in the enhanced categorization of cancer data, which is crucial to precision medicine.
Affiliation(s)
- Muneeba Jilani
  - Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
- David Degras
  - Department of Mathematics, University of Massachusetts Boston, Boston, MA 02125, USA
- Nurit Haspel
  - Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
9
Mazzella V, Dell'Anno A, Etxebarría N, González-Gaya B, Nuzzo G, Fontana A, Núñez-Pons L. High microbiome and metabolome diversification in coexisting sponges with different bio-ecological traits. Commun Biol 2024; 7:422. [PMID: 38589605 PMCID: PMC11001883 DOI: 10.1038/s42003-024-06109-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 03/26/2024] [Indexed: 04/10/2024]
Abstract
Marine Porifera host diverse microbial communities, which influence host metabolism and fitness. However, functional relationships between sponge microbiomes and metabolic signatures are poorly understood. We integrate microbiome characterization, metabolomics and microbial predicted functions of four coexisting Mediterranean sponges: Petrosia ficiformis, Chondrosia reniformis, Crambe crambe and Chondrilla nucula. Microscopy observations reveal anatomical differences in microbial densities. Microbiomes exhibit strong species-specific trends. C. crambe shares many rare amplicon sequence variants (ASV) with the surrounding seawater. This suggests important inputs of microbial diversity acquired by selective horizontal acquisition. Phylum Cyanobacteria is mainly represented in C. nucula and C. crambe. According to putative functions, the microbiomes of P. ficiformis and C. reniformis are functionally heterotrophic, while those of C. crambe and C. nucula are autotrophic. The four species display distinct metabolic profiles at single compound level. However, at molecular class level they share a "core metabolome". Concurrently, we find global microbiome-metabolome association when considering all four sponge species. Within each species, still, sets of microbes/metabolites are identified that drive multi-omics congruence. Our findings suggest that diverse microbial players and metabolic profiles may promote niche diversification, but also analogous phenotypic patterns of "symbiont evolutionary convergence" in sponge assemblages where holobionts co-exist in the same area.
Affiliation(s)
- Valerio Mazzella
  - Department of Integrative Marine Ecology (EMI), Stazione Zoologica Anton Dohrn, Ischia Marine Centre, 80077, Ischia, Naples, Italy
  - NBFC, National Biodiversity Future Center, Piazza Marina 61, Palermo, 90133, Italy
- Antonio Dell'Anno
  - NBFC, National Biodiversity Future Center, Piazza Marina 61, Palermo, 90133, Italy
  - Department of Life and Environmental Sciences, Polytechnic University of Marche, Via Brecce Bianche, 60131, Ancona, Italy
- Néstor Etxebarría
  - Department of Analytical Chemistry, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), Leioa, Basque Country, Spain
  - Research Centre for Experimental Marine Biology and Biotechnology (PIE), University of the Basque Country (UPV/EHU), Plentzia, Basque Country, Spain
- Belén González-Gaya
  - Department of Analytical Chemistry, Faculty of Science and Technology, University of the Basque Country (UPV/EHU), Leioa, Basque Country, Spain
  - Research Centre for Experimental Marine Biology and Biotechnology (PIE), University of the Basque Country (UPV/EHU), Plentzia, Basque Country, Spain
- Genoveffa Nuzzo
  - Bio-Organic Chemistry Unit, Institute of Biomolecular Chemistry-CNR, Via Campi Flegrei 34, 80078, Pozzuoli, Italy
- Angelo Fontana
  - Bio-Organic Chemistry Unit, Institute of Biomolecular Chemistry-CNR, Via Campi Flegrei 34, 80078, Pozzuoli, Italy
  - Department of Biology, University of Naples Federico II, Via Cinthia-Bld. 7, 80126, Napoli, Italy
- Laura Núñez-Pons
  - NBFC, National Biodiversity Future Center, Piazza Marina 61, Palermo, 90133, Italy
  - Department of Integrative Marine Ecology (EMI), Stazione Zoologica Anton Dohrn, Villa Comunale, 80121, Naples, Italy
10
Wróbel S, Turek C, Stępień E, Piwowar M. Data integration through canonical correlation analysis and its application to OMICs research. J Biomed Inform 2024; 151:104575. [PMID: 38086443 DOI: 10.1016/j.jbi.2023.104575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 12/04/2023] [Accepted: 12/08/2023] [Indexed: 02/23/2024]
Abstract
The subject of the paper is a review of a family of multidimensional data analysis methods, canonical analysis and its variants, and of their use in omics data research. The dynamic development of high-throughput methods, and with them the availability of large and constantly growing data resources, forces the development of new analytical approaches that allow the review of the analyzed processes, taking into account data from various levels of the organization of living organisms. The multidimensional perspective allows for the assessment of the analyzed phenomenon in a more realistic way, as it generally takes into account much more data (including OMICs data). Without omitting the complexity of an organism, the method simplifies the multidimensional view, finally giving the result so that the researcher can draw practical conclusions. This is particularly important in medical sciences, where the study of pathological processes is usually aimed at developing treatment regimens. One of the primary methods for studying biomedical processes in a multidimensional approach is canonical correlation analysis (CCA) and its variants. The use of CCA methodologies for simultaneous analysis of multiset biomolecular data opens up new avenues for studying previously undiscovered processes and interdependencies, such as those in the tumor microenvironment (TME) connected to intercellular communication. Because of the huge and still untapped potential of canonical correlation, available implementations of CCA techniques are presented in this review. In particular, the possibility of using canonical correlation analysis for OMICs data is emphasized.
Affiliation(s)
- Sonia Wróbel
  - Department of Medical Physics, Jagiellonian University, Marian Smoluchowski Institute of Physics, Krakow, Poland
- Cezary Turek
  - Department of Bioinformatics and Telemedicine, Jagiellonian University-Medical College, Krakow, Poland
- Ewa Stępień
  - Department of Medical Physics, Jagiellonian University, Marian Smoluchowski Institute of Physics, Krakow, Poland
  - Center for Theranostics, Jagiellonian University ul. Kopernika 40, 31-034 Kraków, Poland
  - Total-Body Jagiellonian-PET Laboratory, Jagiellonian University, Kraków, Poland
- Monika Piwowar
  - Department of Bioinformatics and Telemedicine, Jagiellonian University-Medical College, Krakow, Poland
11
Senar N, van de Wiel M, Zwinderman AH, Hof MH. TOSCCA: a framework for interpretation and testing of sparse canonical correlations. BIOINFORMATICS ADVANCES 2024; 4:vbae021. [PMID: 38456127 PMCID: PMC10919946 DOI: 10.1093/bioadv/vbae021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 01/24/2024] [Accepted: 02/14/2024] [Indexed: 03/09/2024]
Abstract
Summary: In clinical and biomedical research, multiple high-dimensional datasets are nowadays routinely collected from omics and imaging devices. Multivariate methods, such as Canonical Correlation Analysis (CCA), integrate two (or more) datasets to discover and understand underlying biological mechanisms. For an explorative method like CCA, interpretation is key. We present a sparse CCA method based on soft-thresholding that produces near-orthogonal components, allows for browsing over various sparsity levels, and supports permutation-based hypothesis testing. Our soft-thresholding approach avoids tuning of a penalty parameter. Such tuning is computationally burdensome and may render unintelligible results. In addition, unlike alternative approaches, our method is less dependent on the initialization. We examined the performance of our approach with simulations and illustrated its use on real cancer genomics data from drug sensitivity screens. Moreover, we compared its performance to Penalized Matrix Analysis (PMA), which is a popular alternative of sparse CCA with a focus on yielding interpretable results. Compared to PMA, our method offers improved interpretability of the results, while not compromising, or even improving, signal discovery. Availability and implementation: The software and simulation framework are available at https://github.com/nuria-sv/toscca.
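To make the role of soft-thresholding in sparse CCA concrete, here is a minimal sketch of the generic alternating scheme: weight vectors for the two views are updated in turn from the cross-covariance matrix, and small entries are shrunk to exactly zero at every step. It is not the TOSCCA algorithm (which, according to the abstract, avoids tuning a penalty parameter). The fixed thresholds, the function names, the toy data and the simplifying assumption of identity within-view covariance are all choices made for this illustration.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so covariances reduce to dot products."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def cross_covariance(x, y):
    """Sample cross-covariance matrix (p x q) between the columns of x and y."""
    xc, yc = center_columns(x), center_columns(y)
    n = len(xc)
    return [[sum(rx[a] * ry[b] for rx, ry in zip(xc, yc)) / (n - 1)
             for b in range(len(yc[0]))] for a in range(len(xc[0]))]


def soft_threshold(values, threshold):
    """Shrink entries toward zero; entries within the threshold become zero."""
    return [math.copysign(max(abs(v) - threshold, 0.0), v) for v in values]


def unit_norm(values):
    """Scale to unit Euclidean length (an all-zero vector is returned as is)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def sparse_cca(x, y, threshold_u, threshold_v, iterations=50):
    """First pair of sparse weight vectors by alternating thresholded updates."""
    c = cross_covariance(x, y)
    p, q = len(c), len(c[0])
    v = unit_norm([1.0] * q)
    u = [0.0] * p
    for _ in range(iterations):
        u = unit_norm(soft_threshold(
            [sum(c[a][b] * v[b] for b in range(q)) for a in range(p)],
            threshold_u))
        v = unit_norm(soft_threshold(
            [sum(c[a][b] * u[a] for a in range(p)) for b in range(q)],
            threshold_v))
    return u, v


# Toy data: the first column of each view follows a shared signal z;
# the remaining columns are unrelated patterns with zero mean.
z = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
pattern_a = [1, -1, 1, -1, 1, -1]
pattern_b = [1, 1, -2, -2, 1, 1]
pattern_c = [1, -1, -1, 1, 1, -1]
x = [[z[i], pattern_a[i], pattern_b[i]] for i in range(6)]
y = [[2 * z[i], pattern_c[i]] for i in range(6)]

u, v = sparse_cca(x, y, threshold_u=1.5, threshold_v=0.5)
# Only the shared-signal variables keep non-zero weights in u and v.
```

The thresholds play the role of the sparsity level: larger values zero out more variables, and a value that is too large removes every variable, which is why sparse CCA methods differ mainly in how this level is chosen.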
Affiliation(s)
- Nuria Senar
  - Department of Epidemiology & Data Science, Amsterdam School of Public Health, Amsterdam UMC, 1105 AZ Nord-Holland, The Netherlands
- Mark van de Wiel
  - Department of Epidemiology & Data Science, Amsterdam School of Public Health, Amsterdam UMC, 1105 AZ Nord-Holland, The Netherlands
- Aeilko H Zwinderman
  - Department of Epidemiology & Data Science, Amsterdam School of Public Health, Amsterdam UMC, 1105 AZ Nord-Holland, The Netherlands
- Michel H Hof
  - Department of Epidemiology & Data Science, Amsterdam School of Public Health, Amsterdam UMC, 1105 AZ Nord-Holland, The Netherlands
12
Zhang J, Ma Z, Yang Y, Guo L, Du L. Modeling genotype-protein interaction and correlation for Alzheimer's disease: a multi-omics imaging genetics study. Brief Bioinform 2024; 25:bbae038. [PMID: 38348747 PMCID: PMC10939371 DOI: 10.1093/bib/bbae038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 11/23/2023] [Accepted: 01/14/2024] [Indexed: 02/15/2024]
Abstract
Integrating and analyzing multiple omics data sets, including genomics, proteomics and radiomics, can significantly advance researchers' comprehensive understanding of Alzheimer's disease (AD). However, current methodologies primarily focus on the main effects of genetic variation and protein, overlooking non-additive effects such as genotype-protein interaction (GPI) and correlation patterns in brain imaging genetics studies. Importantly, these non-additive effects could contribute to intermediate imaging phenotypes, finally leading to disease occurrence. In general, the interaction between genetic variations and proteins, and their correlations are two distinct biological effects, and thus disentangling the two effects for heritable imaging phenotypes is of great interest and need. Unfortunately, this issue has been largely unexploited. In this paper, to fill this gap, we propose the Multi-Task Genotype-Protein Interaction and Correlation disentangling method (MT-GPIC) to identify GPI and extract correlation patterns between them. To ensure stability and interpretability, we use novel and off-the-shelf penalties to identify meaningful genetic risk factors, as well as exploit the interconnectedness of different brain regions. Additionally, since computing GPI poses a high computational burden, we develop a fast optimization strategy for solving MT-GPIC, which is guaranteed to converge. Experimental results on the Alzheimer's Disease Neuroimaging Initiative data set show that MT-GPIC achieves higher correlation coefficients and classification accuracy than state-of-the-art methods. Moreover, our approach could effectively identify interpretable phenotype-related GPI and correlation patterns in high-dimensional omics data sets. These findings not only enhance the diagnostic accuracy but also contribute valuable insights into the underlying pathogenic mechanisms of AD.
Affiliation(s)
- Jin Zhang
  - Department of Intelligent Science and Technology, Northwestern Polytechnical University School of Automation, 127 Youyi Road, 710072 Shaanxi, China
- Zikang Ma
  - Department of Intelligent Science and Technology, Northwestern Polytechnical University School of Automation, 127 Youyi Road, 710072 Shaanxi, China
- Yan Yang
  - Department of Intelligent Science and Technology, Northwestern Polytechnical University School of Automation, 127 Youyi Road, 710072 Shaanxi, China
- Lei Guo
  - Department of Intelligent Science and Technology, Northwestern Polytechnical University School of Automation, 127 Youyi Road, 710072 Shaanxi, China
- Lei Du
  - Department of Intelligent Science and Technology, Northwestern Polytechnical University School of Automation, 127 Youyi Road, 710072 Shaanxi, China
|
13
|
Bhattachan P, Jeschke MG. Single-Cell Transcriptome Analysis in Health and Disease. Shock 2024; 61:19-27. [PMID: 37962963 PMCID: PMC10883422 DOI: 10.1097/shk.0000000000002274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2023]
Abstract
The analysis of the single-cell transcriptome has emerged as a powerful tool to gain insights on the basic mechanisms of health and disease. It is widely used to reveal the cellular diversity and complexity of tissues at cellular resolution by RNA sequencing of the whole transcriptome from a single cell. Equally, it is applied to discover an unknown, rare population of cells in the tissue. The prime advantage of single-cell transcriptome analysis is the detection of the stochastic nature of gene expression of the cell in tissue. Moreover, the availability of multiple platforms for the single-cell transcriptome has broadened its approaches to using cells of different sizes and shapes, including the capture of short or full-length transcripts, which is helpful in the analysis of challenging biological samples. With the development of numerous packages in R and Python, new directions in the computational analysis of single-cell transcriptomes can be taken to characterize healthy versus diseased tissues to obtain novel pathological insights. Downstream analyses such as differential gene expression analysis, gene ontology term analysis, Kyoto Encyclopedia of Genes and Genomes pathway analysis, cell-cell interaction analysis, and trajectory analysis have become standard practice in the workflow of single-cell transcriptome analysis to further examine the biology of different cell types. Here, we provide a broad overview of single-cell transcriptome analysis in health and disease conditions currently applied in various studies.
|
14
|
Fernandez ME, Martinez-Romero J, Aon MA, Bernier M, Price NL, de Cabo R. How is Big Data reshaping preclinical aging research? Lab Anim (NY) 2023; 52:289-314. [PMID: 38017182 DOI: 10.1038/s41684-023-01286-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 10/10/2023] [Indexed: 11/30/2023]
Abstract
The exponential scientific and technological progress during the past 30 years has favored the comprehensive characterization of aging processes with their multivariate nature, leading to the advent of Big Data in preclinical aging research. Spanning from molecular omics to organism-level deep phenotyping, Big Data demands large computational resources for storage and analysis, as well as new analytical tools and conceptual frameworks to gain novel insights leading to discovery. Systems biology has emerged as a paradigm that utilizes Big Data to gain insightful information enabling a better understanding of living organisms, visualized as multilayered networks of interacting molecules, cells, tissues and organs at different spatiotemporal scales. In this framework, where aging, health and disease represent emergent states from an evolving dynamic complex system, context given by, for example, strain, sex and feeding times, becomes paramount for defining the biological trajectory of an organism. Using bioinformatics and artificial intelligence, the systems biology approach is leading to remarkable advances in our understanding of the underlying mechanism of aging biology and assisting in creative experimental study designs in animal models. Future in-depth knowledge acquisition will depend on the ability to fully integrate information from different spatiotemporal scales in organisms, which will probably require the adoption of theories and methods from the field of complex systems. Here we review state-of-the-art approaches in preclinical research, with a focus on rodent models, that are leading to conceptual and/or technical advances in leveraging Big Data to understand basic aging biology and its full translational potential.
Affiliation(s)
- Maria Emilia Fernandez
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Jorge Martinez-Romero
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
  - Laboratory of Epidemiology and Population Science, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Miguel A Aon
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
  - Laboratory of Cardiovascular Science, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Michel Bernier
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Nathan L Price
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Rafael de Cabo
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
|
15
|
Hai Y, Ma J, Yang K, Wen Y. Bayesian linear mixed model with multiple random effects for prediction analysis on high-dimensional multi-omics data. Bioinformatics 2023; 39:btad647. [PMID: 37882747 PMCID: PMC10627352 DOI: 10.1093/bioinformatics/btad647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 09/24/2023] [Accepted: 10/24/2023] [Indexed: 10/27/2023] Open
Abstract
Motivation: Accurate disease risk prediction is an essential step in the modern quest for precision medicine. While high-dimensional multi-omics data have provided unprecedented data resources for prediction studies, their high-dimensionality and complex inter/intra-relationships have posed significant analytical challenges.
Results: We proposed a two-step Bayesian linear mixed model framework (TBLMM) for risk prediction analysis on multi-omics data. TBLMM models the predictive effects from multi-omics data using a hybrid of the sparsity regression and linear mixed model with multiple random effects. It can resemble the shape of the true effect size distributions and accounts for non-linear effects, including interactions, among multi-omics data via kernel fusion. It infers its parameters via a computationally efficient variational Bayes algorithm. Through extensive simulation studies and the prediction analyses on the positron emission tomography imaging outcomes using data obtained from the Alzheimer's Disease Neuroimaging Initiative, we have demonstrated that TBLMM can consistently outperform the existing method in predicting the risk of complex traits.
Availability and implementation: The corresponding R package is available on GitHub (https://github.com/YaluWen/TBLMM).
Affiliation(s)
- Yang Hai
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
  - Department of Statistics, University of Auckland, Auckland 1010, New Zealand
- Jixiang Ma
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
- Kaixin Yang
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
- Yalu Wen
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
  - Department of Statistics, University of Auckland, Auckland 1010, New Zealand
|
16
|
Kim J, Sandri BJ, Rao RB, Lock EF. Bayesian predictive modeling of multi-source multi-way data. Comput Stat Data Anal 2023; 186:107783. [PMID: 37274461 PMCID: PMC10237362 DOI: 10.1016/j.csda.2023.107783] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/06/2023]
Abstract
A Bayesian approach to predict a continuous or binary outcome from data that are collected from multiple sources with a multi-way (i.e., multidimensional tensor) structure is described. As a motivating example, molecular data from multiple 'omics sources, each measured over multiple developmental time points, as predictors of early-life iron deficiency (ID) in a rhesus monkey model are considered. The method uses a linear model with a low-rank structure on the coefficients to capture multi-way dependence and model the variance of the coefficients separately across each source to infer their relative contributions. Conjugate priors facilitate an efficient Gibbs sampling algorithm for posterior inference, assuming a continuous outcome with normal errors or a binary outcome with a probit link. Simulations demonstrate that the model performs as expected in terms of misclassification rates and correlation of estimated coefficients with true coefficients, with large gains in performance by incorporating multi-way structure and modest gains when accounting for differing signal sizes across the different sources. Moreover, it provides robust classification of ID monkeys for the motivating application.
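As a rough illustration of what "a linear model with a low-rank structure on the coefficients" can mean for multi-way predictors, the sketch below builds a features-by-time coefficient array from two small factor matrices and applies it to a three-way data array. This is only a toy linear predictor under assumed shapes; the function name `low_rank_predict` and the factors `U` and `V` are hypothetical, and the priors, source-specific variances, and Gibbs sampler described in the abstract are not implemented here.

```python
import numpy as np

def low_rank_predict(X, U, V):
    """Linear predictor for multi-way data with rank-R coefficients.

    X : (n, p, t) array, n subjects by p features by t time points.
    U : (p, R) feature-mode factors (hypothetical name).
    V : (t, R) time-mode factors (hypothetical name).
    The p-by-t coefficient array is B = U @ V.T, so it is described by
    R * (p + t) numbers instead of p * t.
    """
    B = U @ V.T
    # Sum over the feature and time axes of X against B for each subject.
    return np.tensordot(X, B, axes=([1, 2], [0, 1]))

# Tiny example: 2 subjects, 3 features, 2 time points, rank 1.
X = np.arange(12, dtype=float).reshape(2, 3, 2)
U = np.array([[1.0], [0.0], [2.0]])
V = np.array([[1.0], [-1.0]])
eta = low_rank_predict(X, U, V)
```

The point of the factorization is parsimony: with many features and time points, estimating `U` and `V` is far less demanding than estimating every entry of the coefficient array separately.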
Affiliation(s)
- Jonathan Kim
  - Division of Biostatistics, University of Minnesota, Minneapolis, 55455, USA
- Brian J. Sandri
  - Division of Neonatology, Department of Pediatrics, University of Minnesota, Minneapolis, MN, USA
  - Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis, MN, USA
- Raghavendra B. Rao
  - Division of Neonatology, Department of Pediatrics, University of Minnesota, Minneapolis, MN, USA
  - Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis, MN, USA
- Eric F. Lock
  - Division of Biostatistics, University of Minnesota, Minneapolis, 55455, USA
|
17
|
Way GP, Sailem H, Shave S, Kasprowicz R, Carragher NO. Evolution and impact of high content imaging. SLAS DISCOVERY: ADVANCING LIFE SCIENCES R & D 2023; 28:292-305. [PMID: 37666456 DOI: 10.1016/j.slasd.2023.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 08/09/2023] [Accepted: 08/29/2023] [Indexed: 09/06/2023]
Abstract
The field of high content imaging has steadily evolved and expanded substantially across many industry and academic research institutions since it was first described in the early 1990s. High content imaging refers to the automated acquisition and analysis of microscopic images from a variety of biological sample types. Integration of high content imaging microscopes with multiwell plate handling robotics enables high content imaging to be performed at scale and supports medium- to high-throughput screening of pharmacological, genetic and diverse environmental perturbations upon complex biological systems ranging from 2D cell cultures to 3D tissue organoids to small model organisms. In this perspective article the authors provide a collective view on the following key discussion points relevant to the evolution of high content imaging:
- Evolution and impact of high content imaging: An academic perspective
- Evolution and impact of high content imaging: An industry perspective
- Evolution of high content image analysis
- Evolution of high content data analysis pipelines towards multiparametric and phenotypic profiling applications
- The role of data integration and multiomics
- The role and evolution of image data repositories and sharing standards
- Future perspective of high content imaging hardware and software
Affiliation(s)
- Gregory P Way
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Heba Sailem
  - School of Cancer and Pharmaceutical Sciences, King's College London, UK
- Steven Shave
  - GlaxoSmithKline Medicines Research Centre, Gunnels Wood Rd, Stevenage SG1 2NY, UK; Edinburgh Cancer Research, Cancer Research UK Scotland Centre, Institute of Genetics and Cancer, University of Edinburgh, UK
- Richard Kasprowicz
  - GlaxoSmithKline Medicines Research Centre, Gunnels Wood Rd, Stevenage SG1 2NY, UK
- Neil O Carragher
  - Edinburgh Cancer Research, Cancer Research UK Scotland Centre, Institute of Genetics and Cancer, University of Edinburgh, UK
|
18
|
Massimino M, Martorana F, Stella S, Vitale SR, Tomarchio C, Manzella L, Vigneri P. Single-Cell Analysis in the Omics Era: Technologies and Applications in Cancer. Genes (Basel) 2023; 14:1330. [PMID: 37510235 PMCID: PMC10380065 DOI: 10.3390/genes14071330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 06/16/2023] [Accepted: 06/20/2023] [Indexed: 07/30/2023] Open
Abstract
Cancer molecular profiling obtained with conventional bulk sequencing describes average alterations obtained from the entire cellular population analyzed. In the era of precision medicine, this approach is unable to track tumor heterogeneity and cannot be exploited to unravel the biological processes behind clonal evolution. In the last few years, functional single-cell omics has improved our understanding of cancer heterogeneity. This approach requires isolation and identification of single cells starting from an entire population. A cell suspension obtained by tumor tissue dissociation or hematological material can be manipulated using different techniques to separate individual cells, employed for single-cell downstream analysis. Single-cell data can then be used to analyze cell-cell diversity, thus mapping evolving cancer biological processes. Despite its unquestionable advantages, single-cell analysis produces massive amounts of data with several potential biases, stemming from cell manipulation and pre-amplification steps. To overcome these limitations, several bioinformatic approaches have been developed and explored. In this work, we provide an overview of this entire process while discussing the most recent advances in the field of functional omics at single-cell resolution.
Affiliation(s)
- Michele Massimino
  - Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
  - Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Federica Martorana
  - Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
  - Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Stefania Stella
  - Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
  - Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Silvia Rita Vitale
  - Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
  - Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Cristina Tomarchio
  - Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
  - Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Livia Manzella
  - Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
  - Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
- Paolo Vigneri
  - Department of Clinical and Experimental Medicine, University of Catania, 95123 Catania, Italy
  - Center of Experimental Oncology and Hematology, A.O.U. Policlinico "G. Rodolico-S. Marco", 95123 Catania, Italy
  - Humanitas Istituto Clinico Catanese, University Oncology Department, 95045 Catania, Italy
|
19
|
Du L, Zhang J, Zhao Y, Shang M, Guo L, Han J. inMTSCCA: An Integrated Multi-task Sparse Canonical Correlation Analysis for Multi-omic Brain Imaging Genetics. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:396-413. [PMID: 37442417 PMCID: PMC10634656 DOI: 10.1016/j.gpb.2023.03.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/13/2021] [Revised: 01/29/2023] [Accepted: 03/14/2023] [Indexed: 07/15/2023]
Abstract
Identifying genetic risk factors for Alzheimer's disease (AD) is an important research topic. To date, different endophenotypes, such as imaging-derived endophenotypes and proteomic expression-derived endophenotypes, have shown great value in uncovering risk genes compared to case-control studies. Biologically, a co-varying pattern of different omics-derived endophenotypes could result from the shared genetic basis. However, existing methods mainly focus on the effect of endophenotypes alone; the effect of cross-endophenotype (CEP) associations remains largely unexploited. In this study, we used both endophenotypes and their CEP associations of multi-omic data to identify genetic risk factors, and proposed two integrated multi-task sparse canonical correlation analysis (inMTSCCA) methods, i.e., pairwise endophenotype correlation-guided MTSCCA (pcMTSCCA) and high-order endophenotype correlation-guided MTSCCA (hocMTSCCA). pcMTSCCA employed pairwise correlations between magnetic resonance imaging (MRI)-derived, plasma-derived, and cerebrospinal fluid (CSF)-derived endophenotypes as an additional penalty. hocMTSCCA used high-order correlations among these multi-omic data for regularization. To identify genetic risk factors at individual and group levels, as well as altered endophenotypic markers, we introduced sparsity-inducing penalties for both models. We compared pcMTSCCA and hocMTSCCA with three related methods on both simulation and real (consisting of neuroimaging data, proteomic analytes, and genetic data) datasets. The results showed that our methods obtained better or comparable canonical correlation coefficients (CCCs) and better feature subsets than benchmarks. Most importantly, the identified genetic loci and heterogeneous endophenotypic markers showed high relevance. Therefore, jointly using multi-omic endophenotypes and their CEP associations is promising to reveal genetic risk factors.
The source code and manual of inMTSCCA are available at https://ngdc.cncb.ac.cn/biocode/tools/BT007330.
Affiliation(s)
- Lei Du
  - Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Jin Zhang
  - Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Ying Zhao
  - Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Muheng Shang
  - Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Lei Guo
  - Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Junwei Han
  - Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
|
20
|
Big Data in Gastroenterology Research. Int J Mol Sci 2023; 24:ijms24032458. [PMID: 36768780 PMCID: PMC9916510 DOI: 10.3390/ijms24032458] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/30/2022] [Revised: 01/18/2023] [Accepted: 01/20/2023] [Indexed: 01/28/2023] Open
Abstract
Studying individual data types in isolation provides only limited and incomplete answers to complex biological questions and particularly falls short in revealing sufficient mechanistic and kinetic details. In contrast, multi-omics approaches to studying health and disease permit the generation and integration of multiple data types on a much larger scale, offering a comprehensive picture of biological and disease processes. Gastroenterology and hepatobiliary research are particularly well-suited to such analyses, given the unique position of the luminal gastrointestinal (GI) tract at the nexus between the gut (mucosa and luminal contents), brain, immune and endocrine systems, and GI microbiome. The generation of 'big data' from multi-omic, multi-site studies can enhance investigations into the connections between these organ systems and organisms and more broadly and accurately appraise the effects of dietary, pharmacological, and other therapeutic interventions. In this review, we describe a variety of useful omics approaches and how they can be integrated to provide a holistic depiction of the human and microbial genetic and proteomic changes underlying physiological and pathophysiological phenomena. We highlight the potential pitfalls and alternatives to help avoid the common errors in study design, execution, and analysis. We focus on the application, integration, and analysis of big data in gastroenterology and hepatobiliary research.
|
21
|
Athieniti E, Spyrou GM. A guide to multi-omics data collection and integration for translational medicine. Comput Struct Biotechnol J 2022; 21:134-149. [PMID: 36544480 PMCID: PMC9747357 DOI: 10.1016/j.csbj.2022.11.050] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 06/23/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/02/2022] Open
Abstract
The emerging high-throughput technologies have led to the shift in the design of translational medicine projects towards collecting multi-omics patient samples and, consequently, their integrated analysis. However, the complexity of integrating these datasets has triggered new questions regarding the appropriateness of the available computational methods. Currently, there is no clear consensus on the best combination of omics to include and the data integration methodologies required for their analysis. This article aims to guide the design of multi-omics studies in the field of translational medicine regarding the types of omics and the integration method to choose. We review articles that perform the integration of multiple omics measurements from patient samples. We identify five objectives in translational medicine applications: (i) detect disease-associated molecular patterns, (ii) subtype identification, (iii) diagnosis/prognosis, (iv) drug response prediction, and (v) understand regulatory processes. We describe common trends in the selection of omic types combined for different objectives and diseases. To guide the choice of data integration tools, we group them into the scientific objectives they aim to address. We describe the main computational methods adopted to achieve these objectives and present examples of tools. We compare tools based on how they deal with the computational challenges of data integration and comment on how they perform against predefined objective-specific evaluation criteria. Finally, we discuss examples of tools for downstream analysis and further extraction of novel insights from multi-omics datasets.
Affiliation(s)
- Efi Athieniti
  - Department of Bioinformatics, The Cyprus Institute of Neurology and Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
- George M. Spyrou
  - Department of Bioinformatics, The Cyprus Institute of Neurology and Genetics, 6 Iroon Avenue, 2371 Ayios Dometios, Nicosia, Cyprus
|
22
|
A multi-marker integrative analysis reveals benefits and risks of bariatric surgery. Sci Rep 2022; 12:18877. [PMID: 36344536 PMCID: PMC9640526 DOI: 10.1038/s41598-022-23241-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/17/2022] [Accepted: 10/27/2022] [Indexed: 11/09/2022] Open
Abstract
Bariatric surgery (BS) is an effective intervention for severe obesity and associated comorbidities. Although several studies have addressed the clinical and metabolic effects of BS, an integrative analysis of the complex body response to surgery is still lacking. We conducted a longitudinal data study with 36 patients with severe obesity who were tested before, 6 and 12 months after restrictive BS for more than one hundred blood biomarkers, including clinical, oxidative stress and metabolic markers, peptide mediators and red blood cell membrane lipids. By using a synthetic data-driven modeling based on principal component and correlation analyses, we provided evidence that, besides the early, well-known glucose metabolism- and weight loss-associated beneficial effects of BS, a tardive, weight-independent increase of the hepatic cholesterol metabolism occurs that is associated with potentially detrimental inflammatory and metabolic effects. Canonical correlation analysis indicated that oxidative stress is the most predictive feature of the BS-induced changes of both glucose and lipids metabolism. Our results show the power of multi-level correlation analysis to uncover the network of biological pathways affected by BS. This approach highlighted potential health risks of restrictive BS that are disregarded with the current practice to use weight loss as surrogate of BS success.
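The abstract above relies on canonical correlation analysis without describing it. For orientation only, here is a minimal sketch of classical CCA for the first canonical pair, computed by whitening the two blocks with Cholesky factors and taking an SVD. The function name `cca_first_pair` and the small `ridge` term are assumptions made for this illustration; nothing here reproduces the study's own pipeline or data.

```python
import numpy as np

def cca_first_pair(X, Y, ridge=1e-8):
    """First canonical correlation and weight vectors for X (n x p), Y (n x q).

    `ridge` is a small stabilizer added to the covariance diagonals
    (an assumption of this sketch, not part of textbook CCA).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + ridge * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    # Cholesky factors: Sxx = Lx Lx^T and Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Map the singular vectors back to weights on the original variables.
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0])
    return s[0], a, b
```

The leading singular value is the first canonical correlation, and `X @ a` and `Y @ b` are the corresponding canonical variates. With more variables than samples, as is common in omics data, the covariance matrices become singular and a regularized or sparse variant is usually needed.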
|
23
|
Palzer EF, Wendt CH, Bowler RP, Hersh CP, Safo SE, Lock EF. sJIVE: Supervised Joint and Individual Variation Explained. Comput Stat Data Anal 2022; 175:107547. [PMID: 36119152 PMCID: PMC9481062 DOI: 10.1016/j.csda.2022.107547] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/03/2022]
Abstract
Analyzing multi-source data, which are multiple views of data on the same subjects, has become increasingly common in molecular biomedical research. Recent methods have sought to uncover underlying structure and relationships within and/or between the data sources, and other methods have sought to build a predictive model for an outcome using all sources. However, existing methods that do both are presently limited because they either (1) only consider data structure shared by all datasets while ignoring structures unique to each source, or (2) they extract underlying structures first without consideration to the outcome. The proposed method, supervised joint and individual variation explained (sJIVE), can simultaneously (1) identify shared (joint) and source-specific (individual) underlying structure and (2) build a linear prediction model for an outcome using these structures. These two components are weighted to compromise between explaining variation in the multi-source data and in the outcome. Simulations show sJIVE to outperform existing methods when large amounts of noise are present in the multi-source data. An application to data from the COPDGene study explores gene expression and proteomic patterns associated with lung function.
Affiliation(s)
- Elise F. Palzer
  - Division of Biostatistics, University of Minnesota, Minneapolis, 55455, USA
- Christine H. Wendt
  - Division of Pulmonary, Allergy and Critical Care, University of Minnesota, Minneapolis, 55455, USA
- Russell P. Bowler
  - Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, National Jewish Health, Denver, CO, USA
- Craig P. Hersh
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Sandra E. Safo
  - Division of Biostatistics, University of Minnesota, Minneapolis, 55455, USA
- Eric F. Lock
  - Division of Biostatistics, University of Minnesota, Minneapolis, 55455, USA
|
24
|
Lin PC, Tsai YS, Yeh YM, Shen MR. Cutting-Edge AI Technologies Meet Precision Medicine to Improve Cancer Care. Biomolecules 2022; 12:1133. [PMID: 36009026 PMCID: PMC9405970 DOI: 10.3390/biom12081133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 08/11/2022] [Accepted: 08/15/2022] [Indexed: 11/18/2022] Open
Abstract
To provide precision medicine for better cancer care, researchers must work on clinical patient data, such as electronic medical records, physiological measurements, biochemistry, computerized tomography scans, digital pathology, and the genetic landscape of cancer tissue. To interpret big biodata in cancer genomics, an operational flow based on artificial intelligence (AI) models and medical management platforms with high-performance computing must be set up for precision cancer genomics in clinical practice. To work in the fast-evolving fields of patient care, clinical diagnostics, and therapeutic services, clinicians must understand the fundamentals of the AI tool approach. Therefore, the present article covers the following five themes: (i) computational prediction of pathogenic variants of cancer susceptibility genes; (ii) AI model for mutational analysis; (iii) single-cell genomics and computational biology; (iv) text mining for identifying gene targets in cancer; and (v) the NVIDIA graphics processing units, DRAGEN field programmable gate arrays systems and AI medical cloud platforms in clinical next-generation sequencing laboratories. Based on AI medical platforms and visualization, large amounts of clinical biodata can be rapidly copied and understood using an AI pipeline. The use of innovative AI technologies can deliver more accurate and rapid cancer therapy targets.
Affiliation(s)
- Peng-Chan Lin
  - Department of Oncology, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 704, Taiwan
  - Department of Genomic Medicine, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 704, Taiwan
- Yi-Shan Tsai
  - Department of Medical Imaging, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 704, Taiwan
- Yu-Min Yeh
  - Department of Oncology, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 704, Taiwan
- Meng-Ru Shen
  - Institute of Clinical Medicine, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 704, Taiwan
  - Department of Obstetrics and Gynecology, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 704, Taiwan
  - Department of Pharmacology, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan 704, Taiwan
|
25
|
Aggregative trans-eQTL analysis detects trait-specific target gene sets in whole blood. Nat Commun 2022; 13:4323. [PMID: 35882830 PMCID: PMC9325868 DOI: 10.1038/s41467-022-31845-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/13/2021] [Accepted: 07/06/2022] [Indexed: 01/13/2023]
Abstract
Large scale genetic association studies have identified many trait-associated variants and understanding the role of these variants in the downstream regulation of gene-expressions can uncover important mediating biological mechanisms. Here we propose ARCHIE, a summary statistic based sparse canonical correlation analysis method to identify sets of gene-expressions trans-regulated by sets of known trait-related genetic variants. Simulation studies show that compared to standard methods, ARCHIE is better suited to identify "core"-like genes through which effects of many other genes may be mediated and can capture disease-specific patterns of genetic associations. By applying ARCHIE to publicly available summary statistics from the eQTLGen consortium, we identify gene sets which have significant evidence of trans-association with groups of known genetic variants across 29 complex traits. Around half (50.7%) of the selected genes do not have any strong trans-associations and are not detected by standard methods. We provide further evidence for causal basis of the target genes through a series of follow-up analyses. These results show ARCHIE is a powerful tool for identifying sets of genes whose trans-regulation may be related to specific complex traits.
|
26
|
Pan-Cancer Analysis for Immune Cell Infiltration and Mutational Signatures Using Non-Negative Canonical Correlation Analysis. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12136596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Mutational signatures indicate the mutational processes and substitution patterns in cancer cell genomes. However, the functional consequences of mutational signatures remain unclear, and there have been no comprehensive systematic studies to examine the relationships between the mutational signatures and the immune cell infiltration. Here, the relationship between mutational signatures and immune cell infiltration using non-negative canonical correlation analysis based on 8927 patients across 25 tumor types was investigated. By inspecting mutational signatures with the maximal coefficients determined by the non-negative canonical correlation analysis, the study identified mutational signatures related to immune cell infiltration composed of tumor microenvironments. The analysis was validated by showing that the genes associated with the identified mutational signatures were linked to overall survival by a Kaplan–Meier curve and a log-rank test and were mainly related to immunity by gene set enrichment analysis. These results will help expand our knowledge of tumor biology and recognize the functional roles and associations of immune systems with mutational signatures.
|
27
|
Collective effects of human genomic variation on microbiome function. Sci Rep 2022; 12:3839. [PMID: 35264618 PMCID: PMC8907173 DOI: 10.1038/s41598-022-07632-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/27/2021] [Accepted: 02/22/2022] [Indexed: 11/09/2022]
Abstract
Studies of the impact of host genetics on gut microbiome composition have mainly focused on the impact of individual single nucleotide polymorphisms (SNPs) on gut microbiome composition, without considering their collective impact or the specific functions of the microbiome. To assess the aggregate role of human genetics on the gut microbiome composition and function, we apply sparse canonical correlation analysis (sCCA), a flexible, multivariate data integration method. A critical attribute of metagenome data is its sparsity, and here we propose application of a Tweedie distribution to accommodate this. We use the TwinsUK cohort to analyze the gut microbiomes and human variants of 250 individuals. Sparse CCA, or sCCA, identified SNPs in microbiome-associated metabolic traits (BMI, blood pressure) and microbiome-associated disorders (type 2 diabetes, some neurological disorders) and certain cancers. Both common and rare microbial functions such as secretion system proteins or antibiotic resistance were found to be associated with host genetics. sCCA applied to microbial species abundances found known associations such as Bifidobacteria species, as well as novel associations. Despite our small sample size, our method can identify not only previously known associations, but novel ones as well. Overall, we present a new and flexible framework for examining host-microbiome genetic interactions, and we provide a new dimension to the current debate around the role of human genetics on the gut microbiome.
|
28
|
Abstract
Multi-omics data analysis is an important aspect of cancer molecular biology studies and has led to ground-breaking discoveries. Many efforts have been made to develop machine learning methods that automatically integrate omics data. Here, we review machine learning tools categorized as either general-purpose or task-specific, covering both supervised and unsupervised learning for integrative analysis of multi-omics data. We benchmark the performance of five machine learning approaches using data from the Cancer Cell Line Encyclopedia, reporting accuracy on cancer type classification and mean absolute error on drug response prediction, and evaluating runtime efficiency. This review provides recommendations to researchers regarding suitable machine learning method selection for their specific applications. It should also promote the development of novel machine learning methodologies for data integration, which will be essential for drug discovery, clinical trial design, and personalized treatments.
Affiliation(s)
- Zhaoxiang Cai
  - ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
- Rebecca C. Poulos
  - ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
- Jia Liu
  - ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
  - Faculty of Medicine, Western Sydney University, Campbelltown, NSW, Australia
- Qing Zhong
  - ProCan®, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, 214 Hawkesbury Rd, Westmead, NSW 2145, Australia
|
29
|
Correa R, Alonso-Pupo N, Hernández Rodríguez EW. Multi-omics data integration approaches for precision oncology. Mol Omics 2022; 18:469-479. [DOI: 10.1039/d1mo00411e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Abstract
Next-generation sequencing (NGS) has been pivotal to enhance the molecular characterization of human malignancies, allowing multiple omics data types to be available for cancer researchers and practitioners. In this context,...
|
30
|
Pierre-Jean M, Mauger F, Deleuze JF, Le Floch E. PIntMF: Penalized Integrative Matrix Factorization method for multi-omics data. Bioinformatics 2021; 38:900-907. [PMID: 34849583 PMCID: PMC8796362 DOI: 10.1093/bioinformatics/btab786] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/11/2021] [Revised: 09/30/2021] [Accepted: 11/11/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION: It is more and more common to perform multi-omics analyses to explore the genome at diverse levels and not only at a single level. Through integrative statistical methods, multi-omics data have the power to reveal new biological processes, potential biomarkers and subgroups in a cohort. Matrix factorization (MF) is an unsupervised statistical method that allows a clustering of individuals, but also reveals relevant omics variables from the various blocks.
RESULTS: Here, we present PIntMF (Penalized Integrative Matrix Factorization), an MF model with sparsity, positivity and equality constraints. To induce sparsity in the model, we used a classical Lasso penalization on variable and individual matrices. For the matrix of samples, sparsity helps in the clustering, while normalization (matching an equality constraint) of inferred coefficients is added to improve interpretation. Moreover, we added an automatic tuning of the sparsity parameters using the famous glmnet package. We also proposed three criteria to help the user to choose the number of latent variables. PIntMF was compared with other state-of-the-art integrative methods including feature selection techniques in both synthetic and real data. PIntMF succeeds in finding relevant clusters as well as variables in two types of simulated data (correlated and uncorrelated). Next, PIntMF was applied to two real datasets (Diet and cancer), and it revealed interpretable clusters linked to available clinical data. Our method outperforms the existing ones on two criteria (clustering and variable selection). We show that PIntMF is an easy, fast and powerful tool to extract patterns and cluster samples from multi-omics data.
AVAILABILITY AND IMPLEMENTATION: An R package is available at https://github.com/mpierrejean/pintmf.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Florence Mauger
  - Centre National de Recherche en Génomique Humaine, CEA, Université de Paris-Saclay, Evry, France
- Jean-François Deleuze
  - Centre National de Recherche en Génomique Humaine, CEA, Université de Paris-Saclay, Evry, France
- Edith Le Floch
  - Centre National de Recherche en Génomique Humaine, CEA, Université de Paris-Saclay, Evry, France
|
31
|
Dysregulated Expression and Methylation Analysis Identified TLX1NB as a Novel Recurrence Marker in Low-Grade Gliomas. Int J Genomics 2020; 2020:5069204. [PMID: 33102572 PMCID: PMC7576335 DOI: 10.1155/2020/5069204] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/17/2020] [Accepted: 09/22/2020] [Indexed: 11/17/2022]
Abstract
Low-grade gliomas (LGGs) are the most common CNS tumors, and the main therapy for LGGs is complete surgical resection, due to its curative effect. However, LGG recurrence occurs frequently. Biomarkers play a crucial role in evaluating the recurrence and prognosis of LGGs. Numerous studies have focused on LGG prognosis. However, the multiomics research investigating the roles played by gene methylation and expression in LGG recurrence remains limited. In this study, we integrated the TCGA and GEO datasets, analyzing RNA and methylation data for recurrence (R) and nonrecurrence (NR) groups. We found a low expression of TLX1NB and high methylation in recurrence patients. Low expression of TLX1NB is associated with poor survival (OS: p = 0.04). The expression of TLX1NB is likely to play a role in the prognosis of LGG. Therefore, TLX1NB may represent an alternative early biomarker for the recurrence of low-grade gliomas.
|