1. Verma S, Magazzù G, Eftekhari N, Lou T, Gilhespy A, Occhipinti A, Angione C. Cross-attention enables deep learning on limited omics-imaging-clinical data of 130 lung cancer patients. Cell Reports Methods 2024:100817. [PMID: 38981473 DOI: 10.1016/j.crmeth.2024.100817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 04/18/2024] [Accepted: 06/17/2024] [Indexed: 07/11/2024]
Abstract
Deep-learning tools that extract prognostic factors derived from multi-omics data have recently contributed to individualized predictions of survival outcomes. However, the limited size of integrated omics-imaging-clinical datasets poses challenges. Here, we propose two biologically interpretable and robust deep-learning architectures for survival prediction of non-small cell lung cancer (NSCLC) patients, learning simultaneously from computed tomography (CT) scan images, gene expression data, and clinical information. The proposed models integrate patient-specific clinical, transcriptomic, and imaging data and incorporate Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome pathway information, adding biological knowledge within the learning process to extract prognostic gene biomarkers and molecular pathways. While both models accurately stratify patients in high- and low-risk groups when trained on a dataset of only 130 patients, introducing a cross-attention mechanism in a sparse autoencoder significantly improves the performance, highlighting tumor regions and NSCLC-related genes as potential biomarkers and thus offering a significant methodological advancement when learning from small imaging-omics-clinical samples.
Affiliation(s)
- Suraj Verma
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK
- Thai Lou
  - Gateshead Health NHS Foundation Trust, Gateshead, UK
- Alex Gilhespy
  - South Tyneside and Sunderland NHS Foundation Trust, Sunderland, UK
- Annalisa Occhipinti
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK; Centre for Digital Innovation, Teesside University, Middlesbrough, UK; National Horizons Centre, Teesside University, Darlington, UK
- Claudio Angione
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK; Centre for Digital Innovation, Teesside University, Middlesbrough, UK; National Horizons Centre, Teesside University, Darlington, UK

2. Tarzi C, Zampieri G, Sullivan N, Angione C. Emerging methods for genome-scale metabolic modeling of microbial communities. Trends Endocrinol Metab 2024; 35:533-548. [PMID: 38575441 DOI: 10.1016/j.tem.2024.02.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 02/28/2024] [Accepted: 02/29/2024] [Indexed: 04/06/2024]
Abstract
Genome-scale metabolic models (GEMs) are consolidating as platforms for studying mixed microbial populations, by combining biological data and knowledge with mathematical rigor. However, deploying these models to answer research questions can be challenging due to the increasing number of available computational tools, the lack of universal standards, and their inherent limitations. Here, we present a comprehensive overview of foundational concepts for building and evaluating genome-scale models of microbial communities. We then compare tools in terms of requirements, capabilities, and applications. Next, we highlight the current pitfalls and open challenges to consider when adopting existing tools and developing new ones. Our compendium can be relevant for the expanding community of modelers, both at the entry and experienced levels.
Affiliation(s)
- Chaimaa Tarzi
  - School of Computing, Engineering and Digital Technologies, Teesside University, Southfield Rd, Middlesbrough, TS1 3BX, North Yorkshire, UK
- Guido Zampieri
  - Department of Biology, University of Padova, Padova, 35122, Veneto, Italy
- Neil Sullivan
  - Complement Genomics Ltd, Station Rd, Lanchester, Durham, DH7 0EX, County Durham, UK
- Claudio Angione
  - School of Computing, Engineering and Digital Technologies, Teesside University, Southfield Rd, Middlesbrough, TS1 3BX, North Yorkshire, UK; Centre for Digital Innovation, Teesside University, Southfield Rd, Middlesbrough, TS1 3BX, North Yorkshire, UK; National Horizons Centre, Teesside University, 38 John Dixon Ln, Darlington, DL1 1HG, North Yorkshire, UK

3. Turanli B, Gulfidan G, Aydogan OO, Kula C, Selvaraj G, Arga KY. Genome-scale metabolic models in translational medicine: the current status and potential of machine learning in improving the effectiveness of the models. Mol Omics 2024; 20:234-247. [PMID: 38444371 DOI: 10.1039/d3mo00152k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
The genome-scale metabolic model (GEM) has emerged as one of the leading modeling approaches for systems-level metabolic studies and has been widely explored for a broad range of organisms and applications. Owing to the development of genome sequencing technologies and available biochemical data, it is possible to reconstruct GEMs for model and non-model microorganisms as well as for multicellular organisms such as humans and animal models. GEMs will evolve in parallel with the availability of biological data, new mathematical modeling techniques and the development of automated GEM reconstruction tools. The use of high-quality, context-specific GEMs, a subset of the original GEM in which inactive reactions are removed while maintaining metabolic functions in the extracted model, for model organisms along with machine learning (ML) techniques could increase their applications and effectiveness in translational research in the near future. Here, we briefly review the current state of GEMs, discuss the potential contributions of ML approaches for more efficient and frequent application of these models in translational research, and explore the extension of GEMs to integrative cellular models.
Affiliation(s)
- Beste Turanli
  - Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey
  - Health Biotechnology Joint Research and Application Center of Excellence, Istanbul, Turkey
- Gizem Gulfidan
  - Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey
- Ozge Onluturk Aydogan
  - Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey
- Ceyda Kula
  - Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey
  - Health Biotechnology Joint Research and Application Center of Excellence, Istanbul, Turkey
- Gurudeeban Selvaraj
  - Concordia University, Centre for Research in Molecular Modeling & Department of Chemistry and Biochemistry, Quebec, Canada
  - Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha Dental College and Hospital, Department of Biomaterials, Bioinformatics Unit, Chennai, India
- Kazim Yalcin Arga
  - Marmara University, Faculty of Engineering, Department of Bioengineering, Istanbul, Turkey
  - Health Biotechnology Joint Research and Application Center of Excellence, Istanbul, Turkey
  - Marmara University, Genetic and Metabolic Diseases Research and Investigation Center, Istanbul, Turkey

4. Gonçalves DM, Henriques R, Costa RS. Predicting metabolic fluxes from omics data via machine learning: Moving from knowledge-driven towards data-driven approaches. Comput Struct Biotechnol J 2023; 21:4960-4973. [PMID: 37876626 PMCID: PMC10590844 DOI: 10.1016/j.csbj.2023.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 10/01/2023] [Accepted: 10/01/2023] [Indexed: 10/26/2023] Open
Abstract
The accurate prediction of phenotypes in microorganisms is a main challenge for systems biology. Genome-scale models (GEMs) are a widely used mathematical formalism for predicting metabolic fluxes using constraint-based modeling methods such as flux balance analysis (FBA). However, they require prior knowledge of the metabolic network of an organism and appropriate objective functions, often hampering the prediction of metabolic fluxes under different conditions. Moreover, the integration of omics data to improve the accuracy of phenotype predictions in different physiological states is still in its infancy. Here, we present a novel approach for predicting fluxes under various conditions. We explore the use of supervised machine learning (ML) models using transcriptomics and/or proteomics data and compare their performance against the standard parsimonious FBA (pFBA) approach using case studies of Escherichia coli organism as an example. Our results show that the proposed omics-based ML approach is promising to predict both internal and external metabolic fluxes with smaller prediction errors in comparison to the pFBA approach. The code, data, and detailed results are available at the project's repository[1].
Affiliation(s)
- Daniel M. Gonçalves
  - INESC-ID, Rua Alves Redol, 9, Lisbon, 1000-029, Portugal
  - Instituto Superior Técnico, Av. Rovisco Pais, 1, Lisbon, 1049-001, Portugal
  - LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, Universidade NOVA de Lisboa, Caparica, 2829-516, Portugal
- Rui Henriques
  - INESC-ID, Rua Alves Redol, 9, Lisbon, 1000-029, Portugal
  - Instituto Superior Técnico, Av. Rovisco Pais, 1, Lisbon, 1049-001, Portugal
- Rafael S. Costa
  - LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, Universidade NOVA de Lisboa, Caparica, 2829-516, Portugal

5. Chicco D, Cumbo F, Angione C. Ten quick tips for avoiding pitfalls in multi-omics data integration analyses. PLoS Comput Biol 2023; 19:e1011224. [PMID: 37410704 DOI: 10.1371/journal.pcbi.1011224] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/08/2023] Open
Abstract
Data are the most important elements of bioinformatics: Computational analysis of bioinformatics data, in fact, can help researchers infer new knowledge about biology, chemistry, biophysics, and sometimes even medicine, influencing treatments and therapies for patients. Bioinformatics and high-throughput biological data coming from different sources can even be more helpful, because each of these different data chunks can provide alternative, complementary information about a specific biological phenomenon, similar to multiple photos of the same subject taken from different angles. In this context, the integration of bioinformatics and high-throughput biological data gets a pivotal role in running a successful bioinformatics study. In the last decades, data originating from proteomics, metabolomics, metagenomics, phenomics, transcriptomics, and epigenomics have been labelled -omics data, as a unique name to refer to them, and the integration of these omics data has gained importance in all biological areas. Even if this omics data integration is useful and relevant, due to its heterogeneity, it is not uncommon to make mistakes during the integration phases. We therefore decided to present these ten quick tips to perform an omics data integration correctly, avoiding common mistakes we experienced or noticed in published studies in the past. Even if we designed our ten guidelines for beginners, by using a simple language that (we hope) can be understood by anyone, we believe our ten recommendations should be taken into account by all the bioinformaticians performing omics data integration, including experts.
Affiliation(s)
- Davide Chicco
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Fabio Cumbo
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Claudio Angione
  - School of Computing Engineering and Digital Technologies, Teesside University, Middlesbrough, United Kingdom

6. Sopic M, Robinson EL, Emanueli C, Srivastava P, Angione C, Gaetano C, Condorelli G, Martelli F, Pedrazzini T, Devaux Y. Integration of epigenetic regulatory mechanisms in heart failure. Basic Res Cardiol 2023; 118:16. [PMID: 37140699 PMCID: PMC10158703 DOI: 10.1007/s00395-023-00986-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 03/27/2023] [Accepted: 04/10/2023] [Indexed: 05/05/2023]
Abstract
The number of "omics" approaches is continuously growing. Among others, epigenetics has appeared as an attractive area of investigation by the cardiovascular research community, notably considering its association with disease development. Complex diseases such as cardiovascular diseases have to be tackled using methods integrating different omics levels, so called "multi-omics" approaches. These approaches combine and co-analyze different levels of disease regulation. In this review, we present and discuss the role of epigenetic mechanisms in regulating gene expression and provide an integrated view of how these mechanisms are interlinked and regulate the development of cardiac disease, with a particular attention to heart failure. We focus on DNA, histone, and RNA modifications, and discuss the current methods and tools used for data integration and analysis. Enhancing the knowledge of these regulatory mechanisms may lead to novel therapeutic approaches and biomarkers for precision healthcare and improved clinical outcomes.
Affiliation(s)
- Miron Sopic
  - Department of Medical Biochemistry, Faculty of Pharmacy, University of Belgrade, Belgrade, Serbia
- Emma L Robinson
  - Division of Cardiology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Costanza Emanueli
  - National Heart & Lung Institute, Imperial College London, London, UK
- Claudio Angione
  - School of Computing, Engineering & Digital Technologies, Teesside University, Tees Valley, Middlesbrough, TS1 3BA, UK
  - Centre for Digital Innovation, Teesside University, Campus Heart, Tees Valley, Middlesbrough, TS1 3BX, UK
  - National Horizons Centre, Darlington, DL1 1HG, UK
- Carlo Gaetano
  - Laboratorio di Epigenetica, Istituti Clinici Scientifici Maugeri IRCCS, Via Maugeri 10, 27100, Pavia, Italy
- Gianluigi Condorelli
  - IRCCS-Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, MI, Italy
  - Institute of Genetic and Biomedical Research, National Research Council of Italy, Arnold-Heller-Str.3, 24105, Milan, Italy
- Fabio Martelli
  - Molecular Cardiology Laboratory, IRCCS-Policlinico San Donato, Via Morandi 30, San Donato Milanese, 20097, Milan, Italy
- Thierry Pedrazzini
  - Experimental Cardiology Unit, Division of Cardiology, Department of Cardiovascular Medicine, University of Lausanne Medical School, 1011, Lausanne, Switzerland
- Yvan Devaux
  - Cardiovascular Research Unit, Department of Population Health, Luxembourg Institute of Health, L-1445, Strassen, Luxembourg

7. Magazzù G, Zampieri G, Angione C. Clinical stratification improves the diagnostic accuracy of small omics datasets within machine learning and genome-scale metabolic modelling methods. Comput Biol Med 2022; 151:106244. [PMID: 36343407 DOI: 10.1016/j.compbiomed.2022.106244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 10/07/2022] [Accepted: 10/22/2022] [Indexed: 12/27/2022]
Abstract
BACKGROUND: Recently, multi-omic machine learning architectures have been proposed for the early detection of cancer. However, for rare cancers and their associated small datasets, it is still unclear how to use the available multi-omics data to achieve a mechanistic prediction of cancer onset and progression, due to the limited data available. Hepatoblastoma is the most frequent liver cancer in infancy and childhood, and its incidence has lately been increasing in several developed countries. Even though some studies have been conducted to understand the causes of its onset and discover potential biomarkers, the role of metabolic rewiring has not been investigated in depth so far. METHODS: Here, we propose and implement an interpretable multi-omics pipeline that combines mechanistic knowledge from genome-scale metabolic models with machine learning algorithms, and we use it to characterise the underlying mechanisms controlling hepatoblastoma. RESULTS AND CONCLUSIONS: While the obtained machine learning models generally present a high diagnostic classification accuracy, our results show that the type of omics combinations used as input to the machine learning models strongly affects the detection of important genes, reactions and metabolic pathways linked to hepatoblastoma. Our method also suggests that, in the context of computer-aided diagnosis of cancer, optimal diagnostic accuracy can be achieved by adopting a combination of omics that depends on the patient's clinical characteristics.
Affiliation(s)
- Giuseppe Magazzù
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, England, United Kingdom
- Guido Zampieri
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, England, United Kingdom; Department of Biology, University of Padova, Padova, Italy
- Claudio Angione
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, England, United Kingdom; Centre for Digital Innovation, Teesside University, Middlesbrough, England, United Kingdom; National Horizons Centre, Teesside University, Darlington, England, United Kingdom

8. Gosselin MRF, Mournetas V, Borczyk M, Verma S, Occhipinti A, Róg J, Bozycki L, Korostynski M, Robson SC, Angione C, Pinset C, Gorecki DC. Loss of full-length dystrophin expression results in major cell-autonomous abnormalities in proliferating myoblasts. eLife 2022; 11:75521. [PMID: 36164827 PMCID: PMC9514850 DOI: 10.7554/elife.75521] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/12/2021] [Accepted: 09/02/2022] [Indexed: 12/05/2022] Open
Abstract
Duchenne muscular dystrophy (DMD) affects myofibers and muscle stem cells, causing progressive muscle degeneration and repair defects. It was unknown whether dystrophic myoblasts—the effector cells of muscle growth and regeneration—are affected. Using transcriptomic, genome-scale metabolic modelling and functional analyses, we demonstrate, for the first time, convergent abnormalities in primary mouse and human dystrophic myoblasts. In Dmd^mdx myoblasts lacking full-length dystrophin, the expression of 170 genes was significantly altered. Myod1 and key genes controlled by MyoD (Myog, Mymk, Mymx, epigenetic regulators, ECM interactors, calcium signalling and fibrosis genes) were significantly downregulated. Gene ontology analysis indicated enrichment in genes involved in muscle development and function. Functionally, we found increased myoblast proliferation, reduced chemotaxis and accelerated differentiation, which are all essential for myoregeneration. The defects were caused by the loss of expression of full-length dystrophin, as similar and not exacerbated alterations were observed in dystrophin-null Dmd^mdx-βgeo myoblasts. Corresponding abnormalities were identified in human DMD primary myoblasts and a dystrophic mouse muscle cell line, confirming the cross-species and cell-autonomous nature of these defects. The genome-scale metabolic analysis in human DMD myoblasts showed alterations in the rate of glycolysis/gluconeogenesis, leukotriene metabolism, and mitochondrial beta-oxidation of various fatty acids. These results reveal the disease continuum: DMD defects in satellite cells, the myoblast dysfunction affecting muscle regeneration, which is insufficient to counteract muscle loss due to myofiber instability. Contrary to the established belief, our data demonstrate that DMD abnormalities occur in myoblasts, making these cells a novel therapeutic target for the treatment of this lethal disease.
Affiliation(s)
- Maxime R F Gosselin
  - School of Pharmacy and Biomedical Sciences, University of Portsmouth, Portsmouth, United Kingdom
- Malgorzata Borczyk
  - Laboratory of Pharmacogenomics, Maj Institute of Pharmacology PAS, Krakow, Poland
- Suraj Verma
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, United Kingdom
- Annalisa Occhipinti
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, United Kingdom
- Justyna Róg
  - School of Pharmacy and Biomedical Sciences, University of Portsmouth, Portsmouth, United Kingdom
  - Laboratory of Cellular Metabolism, Nencki Institute of Experimental Biology, Warsaw, Poland
- Lukasz Bozycki
  - School of Pharmacy and Biomedical Sciences, University of Portsmouth, Portsmouth, United Kingdom
  - Laboratory of Cellular Metabolism, Nencki Institute of Experimental Biology, Warsaw, Poland
- Michal Korostynski
  - Laboratory of Pharmacogenomics, Maj Institute of Pharmacology PAS, Krakow, Poland
- Samuel C Robson
  - School of Pharmacy and Biomedical Sciences, University of Portsmouth, Portsmouth, United Kingdom
  - Centre for Enzyme Innovation, University of Portsmouth, Portsmouth, United Kingdom
- Claudio Angione
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, United Kingdom
- Dariusz C Gorecki
  - School of Pharmacy and Biomedical Sciences, University of Portsmouth, Portsmouth, United Kingdom

9. Sampaio M, Rocha M, Dias O. Exploring synergies between plant metabolic modelling and machine learning. Comput Struct Biotechnol J 2022; 20:1885-1900. [PMID: 35521559 PMCID: PMC9052043 DOI: 10.1016/j.csbj.2022.04.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Revised: 04/08/2022] [Accepted: 04/11/2022] [Indexed: 11/03/2022] Open

10. Angione C, Silverman E, Yaneske E. Using machine learning as a surrogate model for agent-based simulations. PLoS One 2022; 17:e0263150. [PMID: 35143521 PMCID: PMC8830643 DOI: 10.1371/journal.pone.0263150] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/24/2021] [Accepted: 01/12/2022] [Indexed: 02/02/2023] Open
Abstract
In this proof-of-concept work, we evaluate the performance of multiple machine-learning methods as surrogate models for use in the analysis of agent-based models (ABMs). Analysing agent-based modelling outputs can be challenging, as the relationships between input parameters can be non-linear or even chaotic even in relatively simple models, and each model run can require significant CPU time. Surrogate modelling, in which a statistical model of the ABM is constructed to facilitate detailed model analyses, has been proposed as an alternative to computationally costly Monte Carlo methods. Here we compare multiple machine-learning methods for ABM surrogate modelling in order to determine the approaches best suited as a surrogate for modelling the complex behaviour of ABMs. Our results suggest that, in most scenarios, artificial neural networks (ANNs) and gradient-boosted trees outperform Gaussian process surrogates, currently the most commonly used method for the surrogate modelling of complex computational models. ANNs produced the most accurate model replications in scenarios with high numbers of model runs, although training times were longer than the other methods. We propose that agent-based modelling would benefit from using machine-learning methods for surrogate modelling, as this can facilitate more robust sensitivity analyses for the models while also reducing CPU time consumption when calibrating and analysing the simulation.
Affiliation(s)
- Claudio Angione
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, United Kingdom
  - Healthcare Innovation Centre, Teesside University, Middlesbrough, United Kingdom
  - National Horizons Centre, Teesside University, Darlington, United Kingdom
  - Centre for Digital Innovation, Teesside University, Middlesbrough, United Kingdom
- Eric Silverman
  - Institute for Health and Wellbeing, University of Glasgow, Glasgow, United Kingdom
- Elisabeth Yaneske
  - School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, United Kingdom

11. Pio G, Mignone P, Magazzù G, Zampieri G, Ceci M, Angione C. Integrating genome-scale metabolic modelling and transfer learning for human gene regulatory network reconstruction. Bioinformatics 2022; 38:487-493. [PMID: 34499112 DOI: 10.1093/bioinformatics/btab647] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/22/2020] [Revised: 07/23/2021] [Accepted: 09/06/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Gene regulation is responsible for controlling numerous physiological functions and dynamically responding to environmental fluctuations. Reconstructing the human network of gene regulatory interactions is thus paramount to understanding the cell functional organization across cell types, as well as to elucidating pathogenic processes and identifying molecular drug targets. Although significant effort has been devoted towards this direction, existing computational methods mainly rely on gene expression levels, possibly ignoring the information conveyed by mechanistic biochemical knowledge. Moreover, except for a few recent attempts, most of the existing approaches only consider the information of the organism under analysis, without exploiting the information of related model organisms. RESULTS: We propose a novel method for the reconstruction of the human gene regulatory network, based on a transfer learning strategy that synergically exploits information from human and mouse, conveyed by gene-related metabolic features generated in silico from gene expression data. Specifically, we learn a predictive model from metabolic activity inferred via tissue-specific metabolic modelling of artificial gene knockouts. Our experiments show that the combination of our transfer learning approach with the constructed metabolic features provides a significant advantage in terms of reconstruction accuracy, as well as additional clues on the contribution of each constructed metabolic feature. AVAILABILITY AND IMPLEMENTATION: The method, the datasets and all the results obtained in this study are available at: https://doi.org/10.6084/m9.figshare.c.5237687. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gianvito Pio
  - Department of Computer Science, University of Bari Aldo Moro, Bari 70125, Italy
  - Big Data Lab, National Interuniversity Consortium for Informatics (CINI), Rome 00185, Italy
- Paolo Mignone
  - Department of Computer Science, University of Bari Aldo Moro, Bari 70125, Italy
  - Big Data Lab, National Interuniversity Consortium for Informatics (CINI), Rome 00185, Italy
- Giuseppe Magazzù
  - School of Computing, Engineering & Digital Technologies, Teesside University, Tees Valley TS1 3BA, UK
- Guido Zampieri
  - School of Computing, Engineering & Digital Technologies, Teesside University, Tees Valley TS1 3BA, UK
  - Department of Biology, University of Padova, Padova 35121, Italy
- Michelangelo Ceci
  - Department of Computer Science, University of Bari Aldo Moro, Bari 70125, Italy
  - Big Data Lab, National Interuniversity Consortium for Informatics (CINI), Rome 00185, Italy
  - Department of Knowledge Technologies, Jozef Stefan Institute, Ljubljana 1000, Slovenia
- Claudio Angione
  - School of Computing, Engineering & Digital Technologies, Teesside University, Tees Valley TS1 3BA, UK
  - Centre for Digital Innovation, Teesside University, Campus Heart, Tees Valley TS1 3BX, UK
  - Healthcare Innovation Centre, Teesside University, Campus Heart, Tees Valley TS1 3BX, UK

12. Khaleghi MK, Savizi ISP, Lewis NE, Shojaosadati SA. Synergisms of machine learning and constraint-based modeling of metabolism for analysis and optimization of fermentation parameters. Biotechnol J 2021; 16:e2100212. [PMID: 34390201 DOI: 10.1002/biot.202100212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/23/2021] [Revised: 08/10/2021] [Accepted: 08/11/2021] [Indexed: 11/06/2022]
Abstract
Recent noteworthy advances in the development of high-performing microbial and mammalian strains have enabled the sustainable production of bio-economically valuable substances such as bio-compounds, biofuels, and biopharmaceuticals. However, to obtain an industrially viable mass-production scheme, much time and effort are required. The robust and rational design of fermentation processes requires analysis and optimization of different extracellular conditions and medium components, which have a massive effect on growth and productivity. In this regard, knowledge- and data-driven modeling methods have received much attention. Constraint-based modeling (CBM) is a knowledge-driven mathematical approach that has been widely used in fermentation analysis and optimization due to its capabilities of predicting the cellular phenotype from genotype through high-throughput means. On the other hand, machine learning (ML) is a data-driven statistical method that identifies the data patterns within sophisticated biological systems and processes, where there is inadequate knowledge to represent underlying mechanisms. Furthermore, ML models are becoming a viable complement to constraint-based models in a reciprocal manner when one is used as a pre-step of another. As a result, a more predictable model is produced. This review highlights the applications of CBM and ML independently and the combination of these two approaches for analyzing and optimizing fermentation parameters.
Affiliation(s)
- Mohammad Karim Khaleghi
  - Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran
- Iman Shahidi Pour Savizi
  - Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran
- Nathan E Lewis
  - Department of Bioengineering, University of California, San Diego, USA
  - Department of Pediatrics, University of California, San Diego, USA
- Seyed Abbas Shojaosadati
  - Biotechnology Department, Faculty of Chemical Engineering, Tarbiat Modares University, Tehran, Iran

13. Sahu A, Blätke MA, Szymański JJ, Töpfer N. Advances in flux balance analysis by integrating machine learning and mechanism-based models. Comput Struct Biotechnol J 2021; 19:4626-4640. [PMID: 34471504 PMCID: PMC8382995 DOI: 10.1016/j.csbj.2021.08.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/15/2021] [Revised: 08/03/2021] [Accepted: 08/03/2021] [Indexed: 02/08/2023] Open
Abstract
The availability of multi-omics data sets and genome-scale metabolic models for various organisms provides a platform for modeling and analyzing genotype-to-phenotype relationships. Flux balance analysis is the main tool for predicting flux distributions in genome-scale metabolic models, and various data-integrative approaches enable modeling context-specific network behavior. Due to its linear nature, this optimization framework is readily scalable to multi-tissue or -organ and even multi-organism models. However, both data and model size can hamper a straightforward biological interpretation of the estimated fluxes. Moreover, flux balance analysis simulates metabolism at steady state and thus, in its most basic form, does not consider kinetics or regulatory events. The integration of flux balance analysis with complementary data analysis and modeling techniques offers the potential to overcome these challenges. In particular, machine learning approaches have emerged as the tool of choice for data reduction and selection of the most important variables in big data sets. Kinetic models and formal languages can be used to simulate dynamic behavior. This review article provides an overview of integrative studies that combine flux balance analysis with machine learning approaches, kinetic models, such as physiology-based pharmacokinetic models, and formal graphical modeling languages, such as Petri nets. We discuss the mathematical aspects and biological applications of these integrated approaches and outline challenges and future perspectives.
Affiliation(s)
- Ankur Sahu
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, 06466 Gatersleben, Germany
- Mary-Ann Blätke
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, 06466 Gatersleben, Germany
- Jędrzej Jakub Szymański
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, 06466 Gatersleben, Germany
- Nadine Töpfer
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, 06466 Gatersleben, Germany