1
Kundu P, Beura S, Mondal S, Das AK, Ghosh A. Machine learning for the advancement of genome-scale metabolic modeling. Biotechnol Adv 2024; 74:108400. [PMID: 38944218 DOI: 10.1016/j.biotechadv.2024.108400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 05/13/2024] [Accepted: 06/23/2024] [Indexed: 07/01/2024]
Abstract
Constraint-based modeling (CBM) has evolved as the core systems biology tool to map the interrelations between genotype, phenotype, and external environment. The recent advancement of high-throughput experimental approaches and multi-omics strategies has generated a plethora of new and precise information from wide-ranging biological domains. On the other hand, the continuously growing field of machine learning (ML) and its specialized branch of deep learning (DL) provide essential computational architectures for decoding complex and heterogeneous biological data. In recent years, both multi-omics and ML have assisted in the escalation of CBM. Condition-specific omics data, such as transcriptomics and proteomics, helped contextualize the model prediction while analyzing a particular phenotypic signature. At the same time, the advanced ML tools have eased the model reconstruction and analysis to increase the accuracy and prediction power. However, the development of these multi-disciplinary methodological frameworks mainly occurs independently, which limits the concatenation of biological knowledge from different domains. Hence, we have reviewed the potential of integrating multi-disciplinary tools and strategies from various fields, such as synthetic biology, CBM, omics, and ML, to explore the biochemical phenomenon beyond the conventional biological dogma. How the integrative knowledge of these intersected domains has improved bioengineering and biomedical applications has also been highlighted. We categorically explained the conventional genome-scale metabolic model (GEM) reconstruction tools and their improvement strategies through ML paradigms. Further, the crucial role of ML and DL in omics data restructuring for GEM development has also been briefly discussed. Finally, the case-study-based assessment of the state-of-the-art method for improving biomedical and metabolic engineering strategies has been elaborated. 
Therefore, this review demonstrates how integrating experimental and in silico strategies can help map the ever-expanding knowledge of biological systems driven by condition-specific cellular information. This multiview approach will elevate the application of ML-based CBM in the biomedical and bioengineering fields for the betterment of society and the environment.
Affiliation(s)
- Pritam Kundu
  - School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Satyajit Beura
  - Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Suman Mondal
  - P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Kumar Das
  - Department of Bioscience and Biotechnology, Indian Institute of Technology, Kharagpur, West Bengal 721302, India
- Amit Ghosh
  - School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
2
Procopio A, Cesarelli G, Donisi L, Merola A, Amato F, Cosentino C. Combined mechanistic modeling and machine-learning approaches in systems biology - A systematic literature review. Computer Methods and Programs in Biomedicine 2023; 240:107681. [PMID: 37385142 DOI: 10.1016/j.cmpb.2023.107681] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/17/2023] [Revised: 06/14/2023] [Accepted: 06/14/2023] [Indexed: 07/01/2023]
Abstract
BACKGROUND AND OBJECTIVE Mechanistic-based Model simulations (MM) are an effective approach commonly employed, for research and learning purposes, to better investigate and understand the inherent behavior of biological systems. Recent advancements in modern technologies and the large availability of omics data allowed the application of Machine Learning (ML) techniques to different research fields, including systems biology. However, the availability of information regarding the analyzed biological context, sufficient experimental data, as well as the degree of computational complexity, represent some of the issues that both MMs and ML techniques could present individually. For this reason, recently, several studies suggest overcoming or significantly reducing these drawbacks by combining the above-mentioned two methods. In the wake of the growing interest in this hybrid analysis approach, with the present review, we want to systematically investigate the studies available in the scientific literature in which both MMs and ML have been combined to explain biological processes at genomics, proteomics, and metabolomics levels, or the behavior of entire cellular populations. METHODS Elsevier Scopus®, Clarivate Web of Science™ and National Library of Medicine PubMed® databases were enquired using the queries reported in Table 1, resulting in 350 scientific articles. RESULTS Only 14 of the 350 documents returned by the comprehensive search conducted on the three major online databases met our search criteria, i.e. present a hybrid approach consisting of the synergistic combination of MMs and ML to treat a particular aspect of systems biology. CONCLUSIONS Despite the recent interest in this methodology, from a careful analysis of the selected papers, it emerged how examples of integration between MMs and ML are already present in systems biology, highlighting the great potential of this hybrid approach to both at micro and macro biological scales.
Affiliation(s)
- Anna Procopio
  - Department of Experimental and Clinical Medicine, Università degli Studi Magna Græcia, Catanzaro, 88100, Italy
- Giuseppe Cesarelli
  - Department of Electrical Engineering and Information Technology, Università degli Studi di Napoli Federico II, Napoli, 80125, Italy
- Leandro Donisi
  - Department of Advanced Medical and Surgical Sciences, Università della Campania Luigi Vanvitelli, Napoli, 80138, Italy
- Alessio Merola
  - Department of Experimental and Clinical Medicine, Università degli Studi Magna Græcia, Catanzaro, 88100, Italy
- Francesco Amato
  - Department of Electrical Engineering and Information Technology, Università degli Studi di Napoli Federico II, Napoli, 80125, Italy
- Carlo Cosentino
  - Department of Experimental and Clinical Medicine, Università degli Studi Magna Græcia, Catanzaro, 88100, Italy
3
Shen B, Lin Y, Bi C, Zhou S, Bai Z, Zheng G, Zhou J. Translational Informatics for Parkinson's Disease: from Big Biomedical Data to Small Actionable Alterations. Genomics, Proteomics & Bioinformatics 2019; 17:415-429. [PMID: 31786313 PMCID: PMC6943761 DOI: 10.1016/j.gpb.2018.10.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/21/2018] [Revised: 08/29/2018] [Accepted: 11/02/2018] [Indexed: 02/05/2023]
Abstract
Parkinson's disease (PD) is a common neurological disease in elderly people, and its morbidity and mortality are increasing with the advent of global ageing. The traditional paradigm of moving from small data to big data in biomedical research is shifting toward big data-based identification of small actionable alterations. To highlight the use of big data for precision PD medicine, we review PD big data and informatics for the translation of basic PD research to clinical applications. We emphasize some key findings in clinically actionable changes, such as susceptibility genetic variations for PD risk population screening, biomarkers for the diagnosis and stratification of PD patients, risk factors for PD, and lifestyles for the prevention of PD. The challenges associated with the collection, storage, and modelling of diverse big data for PD precision medicine and healthcare are also summarized. Future perspectives on systems modelling and intelligent medicine for PD monitoring, diagnosis, treatment, and healthcare are discussed in the end.
Affiliation(s)
- Bairong Shen
  - Institutes for Systems Genetics, West China Hospital, Sichuan University, Chengdu 610041, China
- Yuxin Lin
  - Center for Systems Biology, Soochow University, Suzhou 215006, China
- Cheng Bi
  - Center for Systems Biology, Soochow University, Suzhou 215006, China
- Shengrong Zhou
  - Center for Systems Biology, Soochow University, Suzhou 215006, China
- Zhongchen Bai
  - Center for Translational Biomedical Informatics, Guizhou University School of Medicine, Guiyang 550025, China
- Guangmin Zheng
  - Center for Translational Biomedical Informatics, Guizhou University School of Medicine, Guiyang 550025, China
- Jing Zhou
  - Center for Translational Biomedical Informatics, Guizhou University School of Medicine, Guiyang 550025, China
4
Qian F, Guo J, Jiang Z, Shen B. Translational Bioinformatics for Cholangiocarcinoma: Opportunities and Challenges. Int J Biol Sci 2018; 14:920-929. [PMID: 29989102 PMCID: PMC6036745 DOI: 10.7150/ijbs.24622] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/28/2017] [Accepted: 02/02/2018] [Indexed: 02/07/2023]
Abstract
Translational bioinformatics is becoming a driven force and a new scientific paradigm for cancer research in the era of big data. To promote the cross-disciplinary communication and research, we take cholangiocarcinoma as an example to review the present status and the future perspectives of the bioinformatics models applied in cancer study. We first summarize the present application of computational methods to the study of cholangiocarcinoma ranged from pattern recognition of biological data, knowledge based data annotation to systems biological level modeling and clinical translation. Then the future opportunities and challenges about database or knowledge base building, novel model developing and molecular mechanism exploring as well as the intelligent decision supporting system construction for the precision diagnosis, prognosis and treatment of cholangiocarcinoma are discussed.
Affiliation(s)
- Fuliang Qian
  - Center for Systems Biology, Soochow University, Suzhou 215006, China
- Junping Guo
  - The Affiliated Yixing Hospital of Jiangsu University, Yixing, 214200, China
- Zhi Jiang
  - Center for Systems Biology, Soochow University, Suzhou 215006, China
- Bairong Shen
  - Center for Systems Biology, Soochow University, Suzhou 215006, China; Guizhou University School of Medicine, Guiyang, 550025, China; Institute for Systems Genetics, West China Hospital, Sichuan University, Chengdu, 610041, China
5
K T, N KV, S S. Distribution based Fuzzy Estimate Spectral Clustering for Cancer Detection with Protein Sequence and Structural Motifs. Asian Pac J Cancer Prev 2018; 19:1935-1940. [PMID: 30051675 PMCID: PMC6165630 DOI: 10.22034/apjcp.2018.19.7.1935] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022]
Abstract
Objective: In biological data analysis, protein sequence and structural motifs are an amino-acid sequence patterns that are widespread and used as tools for detecting the cancer at an earlier stage. To improve the cancer detection with minimum space and time complexity, Distribution based Fuzzy Estimate Spectral Clustering (DFESC) technique is developed. Methods: Initially, the protein sequence motifs are taken from dataset to form the cluster. The Distribution based spectral clustering is applied to group the protein sequence by measuring the generalized jaccard similarity between each protein sequences. To develop the clustering accuracy, soft computing technique namely fuzzy logic is applied to calculate membership value of each sequence motifs. Results: The outcome showed that the presented DFESC technique effectively identifies the cancer in terms of clustering accuracy, false positive rate, and cancer detection time and space complexity. Conclusion: Based on the observations, evaluation of DFESC technique provides improved result for premature detection of cancer using protein sequence and structural motifs.
Affiliation(s)
- Thenmozhi K
  - Department of Computer Applications, Selvam College of Technology, Namakkal, Tamil Nadu, India (corresponding author)
- Shanthi S
  - Department of Computer Applications, Kongu Engineering College, Erode, Tamil Nadu, India
6
Networks Models of Actin Dynamics during Spermatozoa Postejaculatory Life: A Comparison among Human-Made and Text Mining-Based Models. BioMed Research International 2016; 2016:9795409. [PMID: 27642606 PMCID: PMC5013236 DOI: 10.1155/2016/9795409] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 05/27/2016] [Revised: 07/26/2016] [Accepted: 07/27/2016] [Indexed: 11/25/2022]
Abstract
Here we realized a networks-based model representing the process of actin remodelling that occurs during the acquisition of fertilizing ability of human spermatozoa (HumanMade_ActinSpermNetwork, HM_ASN). Then, we compared it with the networks provided by two different text mining tools: Agilent Literature Search (ALS) and PESCADOR. As a reference, we used the data from the online repository Kyoto Encyclopaedia of Genes and Genomes (KEGG), referred to the actin dynamics in a more general biological context. We found that HM_ASN and the networks from KEGG data shared the same scale-free topology following the Barabasi-Albert model, thus suggesting that the information is spread within the network quickly and efficiently. On the contrary, the networks obtained by ALS and PESCADOR have a scale-free hierarchical architecture, which implies a different pattern of information transmission. Also, the hubs identified within the networks are different: HM_ASN and KEGG networks contain as hubs several molecules known to be involved in actin signalling; ALS was unable to find other hubs than “actin,” whereas PESCADOR gave some nonspecific result. This seems to suggest that the human-made information retrieval in the case of a specific event, such as actin dynamics in human spermatozoa, could be a reliable strategy.