1
Dutta S, Mudaranthakam DP, Li Y, Sardiu ME. PerSEveML: a web-based tool to identify persistent biomarker structure for rare events using an integrative machine learning approach. Mol Omics 2024; 20:348-358. [PMID: 38690925; DOI: 10.1039/d4mo00008k]
Abstract
Omics data sets often pose a computational challenge due to their high dimensionality, large size, and non-linear structures. Analyzing these data sets becomes especially daunting in the presence of rare events. Machine learning (ML) methods have gained traction for analyzing rare events, yet there has been limited exploration of bioinformatics tools that integrate ML techniques to comprehend the underlying biology. Expanding upon our previously developed computational framework of an integrative machine learning approach, we introduce PerSEveML, an interactive web-based tool that uses crowd-sourced intelligence to predict rare events and determine feature selection structures. PerSEveML provides a comprehensive overview of the integrative approach through evaluation metrics that help users understand the contribution of individual ML methods to the prediction process. Additionally, PerSEveML calculates entropy and rank scores, which visually organize input features into a persistent structure of selected, unselected, and fluctuating categories that help researchers uncover meaningful hypotheses regarding the underlying biology. We have evaluated PerSEveML on three diverse biologically complex data sets with extremely rare events from small to large scale and have demonstrated its ability to generate valid hypotheses. PerSEveML is available at https://biostats-shinyr.kumc.edu/PerSEveML/ and https://github.com/sreejatadutta/PerSEveML.
Affiliation(s)
- Sreejata Dutta
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
- Dinesh Pal Mudaranthakam
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
  - University of Kansas Cancer Center, Kansas City, USA
- Yanming Li
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
  - University of Kansas Cancer Center, Kansas City, USA
- Mihaela E Sardiu
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
  - University of Kansas Cancer Center, Kansas City, USA
  - Kansas Institute for Precision Medicine, University of Kansas Medical Center, Kansas City, Kansas, USA
2
Dutta S, Mudaranthakam DP, Li Y, Sardiu ME. PerSEveML: A Web-Based Tool to Identify Persistent Biomarker Structure for Rare Events Using Integrative Machine Learning Approach. bioRxiv 2023:2023.10.25.564000. [PMID: 38196661; PMCID: PMC10775315; DOI: 10.1101/2023.10.25.564000]
Abstract
Omics datasets often pose a computational challenge due to their high dimensionality, large size, and non-linear structures. Analyzing these datasets becomes especially daunting in the presence of rare events. Machine learning (ML) methods have gained traction for analyzing rare events, yet there has been limited exploration of bioinformatics tools that integrate ML techniques to comprehend the underlying biology. Expanding upon our previously developed computational framework of an integrative machine learning approach [1], we introduce PerSEveML, an interactive web-based tool that uses crowd-sourced intelligence to predict rare events and determine feature selection structures. PerSEveML provides a comprehensive overview of the integrative approach through evaluation metrics that help users understand the contribution of individual ML methods to the prediction process. Additionally, PerSEveML calculates entropy and rank scores, which visually organize input features into a persistent structure of selected, unselected, and fluctuating categories that help researchers uncover meaningful hypotheses regarding the underlying biology. We have evaluated PerSEveML on three diverse, biologically complex datasets with extremely rare events, ranging from small to large scale, and have demonstrated its ability to generate valid hypotheses. PerSEveML is available at https://biostats-shinyr.kumc.edu/PerSEveML/ and https://github.com/sreejatadutta/PerSEveML.
Affiliation(s)
- Sreejata Dutta
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
- Dinesh Pal Mudaranthakam
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
  - University of Kansas Cancer Center, Kansas City, USA
- Yanming Li
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
  - University of Kansas Cancer Center, Kansas City, USA
- Mihaela E Sardiu
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
  - University of Kansas Cancer Center, Kansas City, USA
  - Kansas Institute for Precision Medicine, University of Kansas Medical Center, Kansas City, Kansas, USA
3
Fu J, Zhu F, Xu CJ, Li Y. Metabolomics meets systems immunology. EMBO Rep 2023; 24:e55747. [PMID: 36916532; PMCID: PMC10074123; DOI: 10.15252/embr.202255747] [Received: 07/08/2022] [Revised: 12/24/2022] [Accepted: 02/24/2023]
Abstract
Metabolic processes play a critical role in immune regulation. Metabolomics is the systematic analysis of small molecules (metabolites) in organisms or biological samples, providing an opportunity to comprehensively study interactions between metabolism and immunity in physiology and disease. Integrating metabolomics into systems immunology allows exploration of the interactions among multilayered features in the biological system and of the molecular mechanisms regulating these features. Here, we provide an overview of recent technological developments in metabolomic applications in immunological research. First, we compare two widely used metabolomics approaches: targeted and untargeted metabolomics. Second, we provide a comprehensive overview of the analysis workflow and the available computational tools, covering sample preparation, raw spectral data preprocessing, data processing, statistical analysis, and interpretation. Third, we describe how to integrate metabolomics with other omics approaches in immunological studies using available tools. Finally, we discuss new developments in metabolomics and its prospects for immunology research. This review provides guidance to researchers using metabolomics and multiomics in immunity research, thus facilitating the application of systems immunology to disease research.
Affiliation(s)
- Jianbo Fu
  - Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz Centre for Infection Research (HZI) and Hannover Medical School (MHH), Hannover, Germany
  - TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Helmholtz Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Cheng-Jian Xu
  - Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz Centre for Infection Research (HZI) and Hannover Medical School (MHH), Hannover, Germany
  - TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Helmholtz Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
  - Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, The Netherlands
- Yang Li
  - Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz Centre for Infection Research (HZI) and Hannover Medical School (MHH), Hannover, Germany
  - TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Helmholtz Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
  - Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, The Netherlands
4
Wu J, Liu H, Zhao X, Hong H, Werner J. Editorial: Cell signaling status alteration in development and disease. Front Cell Dev Biol 2022; 10:1068887. [PMID: 36531965; PMCID: PMC9752079; DOI: 10.3389/fcell.2022.1068887] [Received: 10/13/2022] [Accepted: 10/24/2022]
Affiliation(s)
- Jun Wu
  - School of Life Sciences, East China Normal University, Shanghai, China
- Haipeng Liu
  - Shanghai Pulmonary Hospital, School of Medicine, Tongji University, Shanghai, China
- Xiaodong Zhao
  - Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- Huixiao Hong
  - National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States
- Johannes Werner
  - Center for Data Processing, University of Tübingen, Tübingen, Germany
5
Yang L, Yang Y, Huang L, Cui X, Liu Y. From single- to multi-omics: future research trends in medicinal plants. Brief Bioinform 2022; 24:6840072. [PMID: 36416120; PMCID: PMC9851310; DOI: 10.1093/bib/bbac485] [Received: 08/15/2022] [Revised: 10/13/2022] [Accepted: 10/14/2022]
Abstract
Medicinal plants are the main source of natural metabolites with specialised pharmacological activities and have been widely examined by plant researchers. Numerous omics studies of medicinal plants have been performed to identify molecular markers of species and functional genes controlling key biological traits, as well as to understand the biosynthetic pathways of bioactive metabolites and the regulatory mechanisms of environmental responses. Omics technologies widely applied to medicinal plants include taxonomics, transcriptomics, metabolomics, proteomics, genomics, pangenomics, epigenomics and mutagenomics. However, because of the complexity of biological regulatory networks, single-omics approaches usually fail to explain specific biological phenomena. In recent years, reports of integrated multi-omics studies of medicinal plants have increased. Until now, there have been few assessments of recent developments and upcoming trends in omics studies of medicinal plants. We highlight recent developments in omics research of medicinal plants, summarise the typical bioinformatics resources available for analysing omics datasets, and discuss related future directions and challenges. This information facilitates further studies of medicinal plants and the refinement of current approaches, and leads to new ideas.
Affiliation(s)
- Lifang Yang
  - Kunming University of Science and Technology, China
- Ye Yang
  - Kunming University of Science and Technology, China
- Luqi Huang
  - Academician, Chinese Academy of Engineering; research on the development of traditional Chinese medicine
  - China Academy of Chinese Medical Sciences, China
- Xiuming Cui
  - Yunnan Provincial Key Laboratory of Panax notoginseng, Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China (corresponding author)
- Yuan Liu
  - Yunnan Provincial Key Laboratory of Panax notoginseng, Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China (corresponding author)
6
Zhang S, Sun X, Mou M, Amahong K, Sun H, Zhang W, Shi S, Li Z, Gao J, Zhu F. REGLIV: Molecular regulation data of diverse living systems facilitating current multiomics research. Comput Biol Med 2022; 148:105825. [PMID: 35872412; DOI: 10.1016/j.compbiomed.2022.105825] [Received: 02/18/2022] [Revised: 06/29/2022] [Accepted: 07/03/2022]
Abstract
Multiomics is a powerful technique in molecular biology that facilitates the identification of new associations among different molecules (genes, proteins and metabolites). It has attracted tremendous research interest from scientists worldwide and has led to an explosive growth in the number of published studies. Most of these studies are based on the regulation data provided in available databases. Therefore, it is essential to have molecular regulation data that are strictly validated in the living systems of various cell lines and in vivo models. However, no database has yet been developed to provide comprehensive molecular regulation information validated by living systems. Herein, a new database, Molecular Regulation Data of Living System Facilitating Multiomics Study (REGLIV), is introduced to describe various types of molecular regulation tested in living systems. REGLIV contains (1) 2996 regulations describing the changes in 1109 metabolites triggered by alterations in 284 genes or proteins and (2) 1179 regulations describing the variations in 926 proteins induced by 125 endogenous metabolites. Overall, REGLIV is unique in (a) providing molecular regulations with a clearly defined regulatory direction rather than simple correlation, (b) focusing on molecular regulations that are validated in a living system rather than only in an in vitro test, and (c) describing the disease-, tissue- and species-specific properties underlying each regulation. Therefore, REGLIV has important implications for the future practice not only of multiomics but also of other fields relevant to molecular regulation. REGLIV is freely accessible at: https://idrblab.org/regliv/.
Affiliation(s)
- Song Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Xiuna Sun
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Minjie Mou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Kuerbannisha Amahong
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Huaicheng Sun
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Wei Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Shuiyang Shi
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Zhaorong Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
- Jianqing Gao
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
7
Liu T, Salguero P, Petek M, Martinez-Mira C, Balzano-Nogueira L, Ramšak Ž, McIntyre L, Gruden K, Tarazona S, Conesa A. PaintOmics 4: new tools for the integrative analysis of multi-omics datasets supported by multiple pathway databases. Nucleic Acids Res 2022; 50:W551-W559. [PMID: 35609982; PMCID: PMC9252773; DOI: 10.1093/nar/gkac352] [Received: 03/12/2022] [Revised: 04/22/2022] [Accepted: 04/25/2022]
Abstract
PaintOmics is a web server for the integrative analysis and visualisation of multi-omics datasets using biological pathway maps. PaintOmics 4 has several notable updates that improve and extend analyses. Three pathway databases are now supported: KEGG, Reactome and MapMan, providing more comprehensive pathway knowledge for animals and plants. New metabolite analysis methods fill gaps in traditional pathway-based enrichment methods. The metabolite hub analysis selects compounds with a high number of significant genes in their neighbouring network, suggesting regulation by gene expression changes. The metabolite class activity analysis tests the hypothesis that a metabolic class has a higher-than-expected proportion of significant elements, indicating that these compounds are regulated in the experiment. Finally, PaintOmics 4 includes a regulatory omics module to analyse the contribution of trans-regulatory layers (microRNAs, transcription factors and RNA-binding proteins) to pathway regulation. We show the performance of PaintOmics 4 on both mouse and plant data to highlight how these new analysis features provide novel insights into regulatory biology. PaintOmics 4 is available at https://paintomics.org/.
Affiliation(s)
- Tianyuan Liu
  - Department of Mechanical Engineering, School of Engineering, Cardiff University, Cardiff, UK
- Pedro Salguero
  - Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
- Marko Petek
  - Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Živa Ramšak
  - Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Lauren McIntyre
  - Department of Molecular Genetics and Microbiology, Genetics Institute, University of Florida, Gainesville, USA
- Kristina Gruden
  - Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Sonia Tarazona
  - Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Valencia, Spain
- Ana Conesa
  - Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
8
Novel data archival system for multi-omics data of human exposure to harmful substances. Mol Cell Toxicol 2022. [DOI: 10.1007/s13273-022-00226-0]
9
Rajczewski AT, Jagtap PD, Griffin TJ. An overview of technologies for MS-based proteomics-centric multi-omics. Expert Rev Proteomics 2022; 19:165-181. [PMID: 35466851; PMCID: PMC9613604; DOI: 10.1080/14789450.2022.2070476]
Abstract
INTRODUCTION: Mass spectrometry-based proteomics reveals dynamic molecular signatures underlying phenotypes reflecting normal and perturbed conditions in living systems. Although valuable on its own, the proteome represents only one level of molecular information; the genome, epigenome, transcriptome, and metabolome all provide complementary information. Multi-omic analysis integrating information from one or more of these other domains with proteomic information provides a more complete picture of the molecular contributors to dynamic biological systems. AREAS COVERED: Here, we discuss improvements to mass spectrometry-based technologies that have enabled deep, quantitative characterization of complex proteomes, focusing on peptide-based, bottom-up approaches. These advances are facilitating the integration of proteomics data with other 'omic information, providing a more complete picture of living systems. We also describe the current state of bioinformatics software and approaches for integrating proteomics and other 'omics data, which are critical for enabling new discoveries driven by multi-omics. EXPERT COMMENTARY: Multi-omics, centered on the integration of proteomics information with other 'omic information, holds tremendous promise for biological and biomedical studies. Continued advances in approaches for generating deep, reliable proteomic data and in bioinformatics tools aimed at integrating data across 'omic domains will ensure that the discoveries offered by these multi-omic studies continue to increase.
Affiliation(s)
- Andrew T. Rajczewski
  - Department of Biochemistry, Molecular and Cell Biology Building, University of Minnesota, 420 Washington Ave SE 7-129, Minneapolis, MN, 55455, USA
- Pratik D. Jagtap
  - Department of Biochemistry, Molecular and Cell Biology Building, University of Minnesota, 420 Washington Ave SE 7-129, Minneapolis, MN, 55455, USA
- Timothy J. Griffin
  - Department of Biochemistry, Molecular and Cell Biology Building, University of Minnesota, 420 Washington Ave SE 7-129, Minneapolis, MN, 55455, USA
10
Gruca A, Henzel J, Kostorz I, Stęclik T, Wróbel Ł, Sikora M. MAINE: a web tool for multi-omics feature selection and rule-based data exploration. Bioinformatics 2021; 38:1773-1775. [PMID: 34954788; PMCID: PMC8896606; DOI: 10.1093/bioinformatics/btab862] [Received: 03/15/2021] [Revised: 09/22/2021] [Accepted: 12/22/2021]
Abstract
SUMMARY: Patient multi-omics datasets are often characterized by high dimensionality; however, usually only a small fraction of the features is informative, that is, changes in their values are directly related to the disease outcome or patient survival. In medical sciences, in addition to a robust feature selection procedure, the ability to discover human-readable patterns in the analyzed data is also desirable. To address this need, we created MAINE (Multi-omics Analysis and Exploration). The unique functionality of MAINE is the ability to discover multidimensional dependencies between the selected multi-omics features and event outcome prediction, as well as patient survival probability. Learned patterns are visualized in the form of interpretable decision/survival trees and rules. AVAILABILITY AND IMPLEMENTATION: MAINE is freely available at maine.ibemag.pl as an online web application. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Joanna Henzel
  - Department of Computer Networks and Systems, Silesian University of Technology, 44-100 Gliwice, Poland
- Iwona Kostorz
  - Łukasiewicz Research Network – Institute of Innovative Technologies EMAG, 40-189 Katowice, Poland
- Tomasz Stęclik
  - Łukasiewicz Research Network – Institute of Innovative Technologies EMAG, 40-189 Katowice, Poland
- Łukasz Wróbel
  - Department of Computer Networks and Systems, Silesian University of Technology, 44-100 Gliwice, Poland