1. Abbasi F, Rousu J. New methods for drug synergy prediction: A mini-review. Curr Opin Struct Biol 2024; 86:102827. [PMID: 38705070 DOI: 10.1016/j.sbi.2024.102827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 04/12/2024] [Accepted: 04/12/2024] [Indexed: 05/07/2024]
Abstract
In this mini-review, we explore the new prediction methods for drug combination synergy relying on high-throughput combinatorial screens. The fast progress of the field is witnessed in the more than thirty original machine learning methods published since 2021, a clear majority of them based on deep learning techniques. We aim to put these articles under a unifying lens by highlighting the core technologies, the data sources, the input data types and synergy scores used in the methods, as well as the prediction scenarios and evaluation protocols that the articles deal with. Our finding is that the best methods accurately solve the synergy prediction scenarios involving known drugs or cell lines while the scenarios involving new drugs or cell lines still fall short of an accurate prediction level.
Affiliation(s)
- Fatemeh Abbasi
  - Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Juho Rousu
  - Department of Computer Science, Aalto University, Espoo, Finland
2. Armah-Sekum RE, Szedmak S, Rousu J. Protein function prediction through multi-view multi-label latent tensor reconstruction. BMC Bioinformatics 2024; 25:174. [PMID: 38698340 PMCID: PMC11067221 DOI: 10.1186/s12859-024-05789-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Accepted: 04/17/2024] [Indexed: 05/05/2024] Open
Abstract
BACKGROUND: In the last two decades, the use of high-throughput sequencing technologies has accelerated the pace of discovery of proteins. However, due to the time and resource limitations of rigorous experimental functional characterization, the functions of a vast majority of them remain unknown. As a result, computational methods offering accurate, fast and large-scale assignment of functions to new and previously unannotated proteins are sought after. Leveraging the underlying associations between the multiplicity of features that describe proteins could reveal functional insights into the diverse roles of proteins and improve performance on the automatic function prediction task. RESULTS: We present GO-LTR, a multi-view multi-label prediction model that relies on a high-order tensor approximation of model weights combined with non-linear activation functions. The model is capable of learning high-order relationships between multiple input views representing the proteins and predicting high-dimensional multi-label output consisting of protein functional categories. We demonstrate the competitiveness of our method on various performance measures. Experiments show that GO-LTR learns polynomial combinations between different protein features, resulting in improved performance. Additional investigations establish GO-LTR's practical potential in assigning functions to proteins under diverse challenging scenarios: very low sequence similarity to previously observed sequences, and rarely observed and highly specific terms in the gene ontology. IMPLEMENTATION: The code and data used for training GO-LTR are available at https://github.com/aalto-ics-kepaco/GO-LTR-prediction .
Affiliation(s)
- Robert Ebo Armah-Sekum
  - Department of Computer Science, Aalto University, Konemiehentie 2, 02150, Espoo, Finland
- Sandor Szedmak
  - Department of Computer Science, Aalto University, Konemiehentie 2, 02150, Espoo, Finland
- Juho Rousu
  - Department of Computer Science, Aalto University, Konemiehentie 2, 02150, Espoo, Finland
3. Astero M, Rousu J. Learning symmetry-aware atom mapping in chemical reactions through deep graph matching. J Cheminform 2024; 16:46. [PMID: 38650016 PMCID: PMC11036715 DOI: 10.1186/s13321-024-00841-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 04/07/2024] [Indexed: 04/25/2024] Open
Abstract
Accurate atom mapping, which establishes correspondences between atoms in reactants and products, is a crucial step in analyzing chemical reactions. In this paper, we present a novel end-to-end approach that formulates the atom mapping problem as a deep graph matching task. Our proposed model, AMNet (Atom Matching Network), utilizes molecular graph representations and employs various atom and bond features using graph neural networks to capture the intricate structural characteristics of molecules, ensuring precise atom correspondence predictions. Notably, AMNet incorporates the consideration of molecule symmetry, enhancing accuracy while simultaneously reducing computational complexity. The integration of the Weisfeiler-Lehman isomorphism test for symmetry identification refines the model's predictions. Furthermore, our model maps the entire atom set in a chemical reaction, offering a comprehensive approach beyond focusing solely on the main molecules in reactions. We evaluated AMNet's performance on a subset of USPTO reaction datasets, addressing various tasks, including assessing the impact of molecular symmetry identification, understanding the influence of feature selection on AMNet performance, and comparing its performance with the state-of-the-art method. The result reveals an average accuracy of 97.3% on mapped atoms, with 99.7% of reactions correctly mapped when the correct mapped atom is within the top 10 predicted atoms. Scientific contribution: The paper introduces a novel end-to-end deep graph matching model for atom mapping, utilizing molecular graph representations to capture structural characteristics effectively. It enhances accuracy by integrating symmetry detection through the Weisfeiler-Lehman test, reducing the number of possible mappings and improving efficiency. Unlike previous methods, it maps the entire reaction, not just main components, providing a comprehensive view. Additionally, by integrating efficient graph matching techniques, it reduces computational complexity, making atom mapping more feasible.
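The symmetry identification step mentioned in the abstract can be illustrated with a minimal sketch of one-dimensional Weisfeiler-Lehman colour refinement. This is not AMNet's implementation: the function name `wl_symmetry_classes`, the input encoding (element symbols plus bond index pairs), and the decision to ignore bond orders are all assumptions made for illustration. Atoms that end up with the same colour are candidates for being symmetric; the test is a necessary condition for equivalence, not a proof of it.

```python
from collections import defaultdict


def wl_symmetry_classes(atoms, bonds, rounds=None):
    """Group atom indices that 1-WL colour refinement cannot tell apart.

    atoms: list of element symbols, e.g. ["C", "C", "O"] (hypothetical encoding)
    bonds: list of (i, j) index pairs; bond orders are ignored in this sketch
    """
    n = len(atoms)
    neighbours = defaultdict(list)
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Initial colours come from the element symbol alone.
    palette = {sym: k for k, sym in enumerate(sorted(set(atoms)))}
    colours = [palette[a] for a in atoms]

    for _ in range(rounds or n):
        # A signature is the atom's own colour plus the sorted neighbour colours.
        signatures = [
            (colours[i], tuple(sorted(colours[j] for j in neighbours[i])))
            for i in range(n)
        ]
        palette = {sig: k for k, sig in enumerate(sorted(set(signatures)))}
        new_colours = [palette[s] for s in signatures]
        # Refinement only ever splits classes, so an unchanged class count
        # means the partition is stable.
        if len(set(new_colours)) == len(set(colours)):
            break
        colours = new_colours

    classes = defaultdict(list)
    for i, c in enumerate(colours):
        classes[c].append(i)
    return sorted(classes.values())


# Heavy-atom skeleton of propane: the two terminal carbons are equivalent.
print(wl_symmetry_classes(["C", "C", "C"], [(0, 1), (1, 2)]))
```

Grouping atoms this way shrinks the set of distinct candidate mappings, which is the effect the abstract attributes to symmetry detection.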
Affiliation(s)
- Maryam Astero
  - Computer Science, Aalto University, Konemiehentie 2, 02150, Espoo, Finland
- Juho Rousu
  - Computer Science, Aalto University, Konemiehentie 2, 02150, Espoo, Finland
4. Sandström H, Rissanen M, Rousu J, Rinke P. Data-Driven Compound Identification in Atmospheric Mass Spectrometry. Adv Sci (Weinh) 2024; 11:e2306235. [PMID: 38095508 PMCID: PMC10885664 DOI: 10.1002/advs.202306235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 11/04/2023] [Indexed: 02/24/2024]
Abstract
Aerosol particles found in the atmosphere affect the climate and worsen air quality. To mitigate these adverse impacts, aerosol particle formation and aerosol chemistry in the atmosphere need to be better mapped out and understood. Currently, mass spectrometry is the single most important analytical technique in atmospheric chemistry and is used to track and identify compounds and processes. Large amounts of data are collected in each measurement of current time-of-flight and orbitrap mass spectrometers using modern rapid data acquisition practices. However, compound identification remains a major bottleneck during data analysis due to lacking reference libraries and analysis tools. Data-driven compound identification approaches could alleviate the problem, yet remain rare to non-existent in atmospheric science. In this perspective, the authors review the current state of data-driven compound identification with mass spectrometry in atmospheric science and discuss current challenges and possible future steps toward a digital era for atmospheric mass spectrometry.
Affiliation(s)
- Hilda Sandström
  - Department of Applied Physics, Aalto University, P.O. Box 11000, FI-00076, Aalto, Espoo, Finland
- Matti Rissanen
  - Aerosol Physics Laboratory, Tampere University, FI-33720, Tampere, Finland
  - Department of Chemistry, University of Helsinki, P.O. Box 55, A.I. Virtasen aukio 1, FI-00560, Helsinki, Finland
- Juho Rousu
  - Department of Computer Science, Aalto University, P.O. Box 11000, FI-00076, Aalto, Espoo, Finland
- Patrick Rinke
  - Department of Applied Physics, Aalto University, P.O. Box 11000, FI-00076, Aalto, Espoo, Finland
5. Sabzevari M, Szedmak S, Penttilä M, Jouhten P, Rousu J. Strain design optimization using reinforcement learning. PLoS Comput Biol 2022; 18:e1010177. [PMID: 35658018 PMCID: PMC9200333 DOI: 10.1371/journal.pcbi.1010177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2021] [Revised: 06/15/2022] [Accepted: 05/06/2022] [Indexed: 11/18/2022] Open
Abstract
Engineered microbial cells present a sustainable alternative to fossil-based synthesis of chemicals and fuels. Cellular synthesis routes are readily assembled and introduced into microbial strains using state-of-the-art synthetic biology tools. However, the optimization of the strains required to reach industrially feasible production levels is far less efficient. It typically relies on trial and error, leading to high uncertainty in total duration and cost. New techniques that can cope with the complexity and limited mechanistic knowledge of cellular regulation are called for to guide the strain optimization.
In this paper, we put forward a multi-agent reinforcement learning (MARL) approach that learns from experiments to tune the metabolic enzyme levels so that the production is improved. Our method is model-free and does not assume prior knowledge of the microbe’s metabolic network or its regulation. The multi-agent approach is well-suited to make use of parallel experiments such as multi-well plates commonly used for screening microbial strains.
We demonstrate the method’s capabilities using the genome-scale kinetic model of Escherichia coli, k-ecoli457, as a surrogate for in vivo cell behaviour in cultivation experiments. We investigate aspects of the method’s performance that are relevant for practical applicability in strain engineering, i.e. the speed of convergence towards the optimum response, noise tolerance, and the statistical stability of the solutions found. We further evaluate the proposed MARL approach in improving L-tryptophan production by yeast Saccharomyces cerevisiae, using publicly available experimental data on the performance of a combinatorial strain library.
Overall, our results show that multi-agent reinforcement learning is a promising approach for guiding the strain optimization beyond mechanistic knowledge, with the goal of obtaining industrially attractive production levels faster and more reliably.
Affiliation(s)
- Maryam Sabzevari
  - Department of Computer Science, Aalto University, Espoo, Finland
- Sandor Szedmak
  - Department of Computer Science, Aalto University, Espoo, Finland
- Merja Penttilä
  - VTT Technical Research Centre of Finland Ltd, Espoo, Finland
- Paula Jouhten
  - VTT Technical Research Centre of Finland Ltd, Espoo, Finland
- Juho Rousu
  - Department of Computer Science, Aalto University, Espoo, Finland
6. Kong W, Midena G, Chen Y, Athanasiadis P, Wang T, Rousu J, He L, Aittokallio T. Systematic review of computational methods for drug combination prediction. Comput Struct Biotechnol J 2022; 20:2807-2814. [PMID: 35685365 PMCID: PMC9168078 DOI: 10.1016/j.csbj.2022.05.055] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/14/2022] [Revised: 05/27/2022] [Accepted: 05/27/2022] [Indexed: 12/26/2022] Open
Abstract
Synergistic effects between drugs are rare, highly context-dependent and patient-specific. Hence, there is a need to develop novel approaches to stratify patients for optimal therapy regimens, especially in the context of personalized design of combinatorial treatments. Computational methods enable systematic in-silico screening of combination effects, and can thereby prioritize the most potent combinations for further testing among the massive number of potential combinations. To help researchers choose a prediction method that best fits various real-world applications, we carried out a systematic literature review of 117 computational methods developed to date for drug combination prediction, and classified the methods in terms of their combination prediction tasks and input data requirements. Most current methods focus on prediction or classification of combination synergy, and only a few methods consider the efficacy and potential toxicity of the combinations, which are the key determinants of therapeutic success of drug treatments. Furthermore, there is a need to further develop methods that enable dose-specific predictions of combination effects across multiple doses, which is important for clinical translation of the predictions, as well as model-based identification of biomarkers predictive of heterogeneous drug combination responses. Even if most of the computational methods reviewed focus on anticancer applications, many of the modelling approaches are also applicable to antiviral and other diseases or indications.
7. Bach E, Rogers S, Williamson J, Rousu J. Probabilistic framework for integration of mass spectrum and retention time information in small molecule identification. Bioinformatics 2021; 37:1724-1731. [PMID: 33244585 PMCID: PMC8289373 DOI: 10.1093/bioinformatics/btaa998] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/18/2020] [Revised: 10/27/2020] [Accepted: 11/17/2020] [Indexed: 11/14/2022] Open
Abstract
Motivation: Identification of small molecules in a biological sample remains a major bottleneck in molecular biology, despite a decade of rapid development of computational approaches for predicting molecular structures using mass spectrometry (MS) data. Recently, there has been increasing interest in utilizing other information sources, such as liquid chromatography (LC) retention time (RT), to improve identifications solely based on MS information, such as precursor mass-per-charge and tandem mass spectrometry (MS2). Results: We put forward a probabilistic modelling framework to integrate MS and RT data of multiple features in an LC-MS experiment. We model the MS measurements and all pairwise retention order information as a Markov random field and use efficient approximate inference for scoring and ranking potential molecular structures. Our experiments show improved identification accuracy by combining MS2 data and retention orders using our approach, thereby outperforming state-of-the-art methods. Furthermore, we demonstrate the benefit of our model when only a subset of LC-MS features has MS2 measurements available besides MS1. Availability and implementation: Software and data are freely available at https://github.com/aalto-ics-kepaco/msms_rt_score_integration. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Eric Bach
  - Department of Computer Science, School of Science, Aalto University, Espoo, Finland
- Simon Rogers
  - School of Computing Science, University of Glasgow, Glasgow, UK
- John Williamson
  - School of Computing Science, University of Glasgow, Glasgow, UK
- Juho Rousu
  - Department of Computer Science, School of Science, Aalto University, Espoo, Finland
8. Wang T, Szedmak S, Wang H, Aittokallio T, Pahikkala T, Cichonska A, Rousu J. Modeling drug combination effects via latent tensor reconstruction. Bioinformatics 2021; 37:i93-i101. [PMID: 34252952 PMCID: PMC8336593 DOI: 10.1093/bioinformatics/btab308] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/08/2023] Open
Abstract
Motivation: Combination therapies have emerged as a powerful treatment modality to overcome drug resistance and improve treatment efficacy. However, the number of possible drug combinations increases very rapidly with the number of individual drugs in consideration, which makes comprehensive experimental screening infeasible in practice. Machine-learning models offer time- and cost-efficient means to aid this process by prioritizing the most effective drug combinations for further pre-clinical and clinical validation. However, the complexity of the underlying interaction patterns across multiple drug doses and in different cellular contexts poses challenges to the predictive modeling of drug combination effects. Results: We introduce comboLTR, a highly time-efficient method for learning complex, non-linear target functions for describing the responses of therapeutic agent combinations in various doses and cancer cell contexts. The method is based on polynomial regression via powerful latent tensor reconstruction. It uses a combination of recommender system-style features indexing the data tensor of response values in different contexts, and chemical and multi-omics features as inputs. We demonstrate that comboLTR outperforms state-of-the-art methods in terms of predictive performance and running time, and produces highly accurate results even in the challenging and practical inference scenario where full dose–response matrices are predicted for completely new drug combinations with no available combination and monotherapy response measurements in any training cell line. Availability and implementation: comboLTR code is available at https://github.com/aalto-ics-kepaco/ComboLTR. Supplementary information: Supplementary data are available at Bioinformatics online.
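As a rough illustration of what "polynomial regression via latent tensor reconstruction" can mean, the sketch below evaluates a degree-D polynomial whose weight tensor is represented as a sum of rank-one terms, so the full tensor never has to be stored. This is a generic low-rank form written under stated assumptions, not the comboLTR code: the names `ltr_predict` and `full_tensor_predict_order2`, the nested-list layout of `factors`, and the absence of any training loop are all illustrative choices, and the published model's exact parameterisation may differ.

```python
import math


def ltr_predict(x, factors):
    """Evaluate f(x) = sum_t prod_d <p_t^(d), x> for one input vector x.

    factors[t][d] is the (hypothetical) weight vector p_t^(d); the number of
    vectors per term sets the polynomial degree, the number of terms the rank.
    """
    total = 0.0
    for rank_one_term in factors:
        total += math.prod(
            sum(p_i * x_i for p_i, x_i in zip(p, x)) for p in rank_one_term
        )
    return total


def full_tensor_predict_order2(x, factors):
    """Reference for degree 2: build W = sum_t p_t^(1) (p_t^(2))^T explicitly."""
    n = len(x)
    W = [
        [sum(term[0][i] * term[1][j] for term in factors) for j in range(n)]
        for i in range(n)
    ]
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


# Rank-2, degree-2 toy model on a 2-dimensional input (made-up numbers).
x = [1.0, 2.0]
factors = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [1.0, -1.0]],
]
print(ltr_predict(x, factors), full_tensor_predict_order2(x, factors))
```

The factored form costs on the order of rank × degree × n operations per prediction, whereas the explicit tensor grows as n to the power of the degree; that gap is one plausible reason a method of this kind can be time-efficient.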
Affiliation(s)
- Tianduanyi Wang
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
  - Institute for Molecular Medicine Finland FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
- Sandor Szedmak
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Haishan Wang
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Tero Aittokallio
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
  - Institute for Molecular Medicine Finland FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
  - Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Oslo Centre for Biostatistics and Epidemiology (OCBE), University of Oslo, Oslo, Norway
- Tapio Pahikkala
  - Department of Computing, University of Turku, Turku, Finland
- Anna Cichonska
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
  - Institute for Molecular Medicine Finland FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
- Juho Rousu
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
9. Hjörleifsson Eldjárn G, Ramsay A, van der Hooft JJJ, Duncan KR, Soldatou S, Rousu J, Daly R, Wandy J, Rogers S. Ranking microbial metabolomic and genomic links in the NPLinker framework using complementary scoring functions. PLoS Comput Biol 2021; 17:e1008920. [PMID: 33945539 PMCID: PMC8130963 DOI: 10.1371/journal.pcbi.1008920] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 10/07/2020] [Revised: 05/18/2021] [Accepted: 03/26/2021] [Indexed: 12/31/2022] Open
Abstract
Specialised metabolites from microbial sources are well-known for their wide range of biomedical applications, particularly as antibiotics. When mining paired genomic and metabolomic data sets for novel specialised metabolites, establishing links between Biosynthetic Gene Clusters (BGCs) and metabolites represents a promising way of finding such novel chemistry. However, due to the lack of detailed biosynthetic knowledge for the majority of predicted BGCs, and the large number of possible combinations, this is not a simple task. This problem is becoming ever more pressing with the increased availability of paired omics data sets. Current tools are not effective at identifying valid links automatically, and manual verification is a considerable bottleneck in natural product research. We demonstrate that using multiple link-scoring functions together makes it easier to prioritise true links relative to others. Based on standardising a commonly used score, we introduce a new, more effective score, and introduce a novel score using an Input-Output Kernel Regression approach. Finally, we present NPLinker, a software framework to link genomic and metabolomic data. Results are verified using publicly available data sets that include validated links.
Affiliation(s)
- Andrew Ramsay
  - School of Computing Science, University of Glasgow, Glasgow, United Kingdom
- Katherine R. Duncan
  - Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, United Kingdom
- Sylvia Soldatou
  - School of Pharmacy and Life Sciences, Robert Gordon University, Aberdeen, United Kingdom
- Juho Rousu
  - Department of Computer Science, Aalto University, Espoo, Finland
- Rónán Daly
  - Glasgow Polyomics, University of Glasgow, Glasgow, United Kingdom
- Joe Wandy
  - Glasgow Polyomics, University of Glasgow, Glasgow, United Kingdom
- Simon Rogers
  - School of Computing Science, University of Glasgow, Glasgow, United Kingdom
10. Dührkop K, Nothias LF, Fleischauer M, Reher R, Ludwig M, Hoffmann MA, Petras D, Gerwick WH, Rousu J, Dorrestein PC, Böcker S. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nat Biotechnol 2021; 39:462-471. [PMID: 33230292 DOI: 10.1038/s41587-020-0740-8] [Citation(s) in RCA: 233] [Impact Index Per Article: 77.7] [Received: 04/15/2020] [Accepted: 10/16/2020] [Indexed: 12/12/2022]
Abstract
Metabolomics using nontargeted tandem mass spectrometry can detect thousands of molecules in a biological sample. However, structural molecule annotation is limited to structures present in libraries or databases, restricting analysis and interpretation of experimental data. Here we describe CANOPUS (class assignment and ontology prediction using mass spectrometry), a computational tool for systematic compound class annotation. CANOPUS uses a deep neural network to predict 2,497 compound classes from fragmentation spectra, including all biologically relevant classes. CANOPUS explicitly targets compounds for which neither spectral nor structural reference data are available and predicts classes lacking tandem mass spectrometry training data. In evaluation using reference data, CANOPUS reached very high prediction performance (average accuracy of 99.7% in cross-validation) and outperformed four baseline methods. We demonstrate the broad utility of CANOPUS by investigating the effect of microbial colonization in the mouse digestive system, through analysis of the chemodiversity of different Euphorbia plants and regarding the discovery of a marine natural product, revealing biological insights at the compound class level.
Affiliation(s)
- Kai Dührkop
  - Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
- Louis-Félix Nothias
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Raphael Reher
  - Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
- Marcus Ludwig
  - Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
- Martin A Hoffmann
  - Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
  - International Max Planck Research School 'Exploration of Ecological Interactions with Molecular and Chemical Techniques', Max Planck Institute for Chemical Ecology, Jena, Germany
- Daniel Petras
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
- William H Gerwick
  - Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Juho Rousu
  - Helsinki Institute for Information Technology, Department of Computer Science, Aalto University, Espoo, Finland
- Pieter C Dorrestein
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Sebastian Böcker
  - Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
11. Wang T, Gautam P, Rousu J, Aittokallio T. Systematic mapping of cancer cell target dependencies using high-throughput drug screening in triple-negative breast cancer. Comput Struct Biotechnol J 2020; 18:3819-3832. [PMID: 33335681 PMCID: PMC7720026 DOI: 10.1016/j.csbj.2020.11.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/30/2020] [Revised: 10/23/2020] [Accepted: 11/01/2020] [Indexed: 12/31/2022] Open
Abstract
While high-throughput drug screening offers possibilities to profile phenotypic responses of hundreds of compounds, elucidation of the cell context-specific mechanisms of drug action requires additional analyses. To that end, we developed a computational target deconvolution pipeline that identifies the key target dependencies based on collective drug response patterns in each cell line separately. The pipeline combines quantitative drug-cell line responses with drug-target interaction networks among both intended on- and potent off-targets to identify pharmaceutically actionable and selective therapeutic targets. To demonstrate its performance, the target deconvolution pipeline was applied to 310 small molecules tested on 20 genetically and phenotypically heterogeneous triple-negative breast cancer (TNBC) cell lines to identify cell line-specific target mechanisms in terms of cytotoxic and cytostatic drug target vulnerabilities. The functional essentiality of each protein target was quantified with a target addiction score (TAS), as a measure of dependency of the cell line on the therapeutic target. The target dependency profiling was shown to capture inhibitory information that is complementary to that obtained from the structure or sensitivity of the drugs. Comparison of the TAS profiles and gene essentiality scores from CRISPR-Cas9 knockout screens revealed that certain proteins with low gene essentiality showed high target addictions, suggesting that they might be functioning as protein groups, and therefore be resistant to single gene knock-out. The comparative analysis discovered protein groups of potential multi-target synthetic lethal interactions, for instance, among histone deacetylases (HDACs). Our integrated approach also recovered a number of well-established TNBC cell line-specific drivers and known TNBC therapeutic targets, such as HDACs and cyclin-dependent kinases (CDKs). 
The present work provides novel insights into druggable vulnerabilities for TNBC, and opportunities to identify multi-target synthetic lethal interactions for further studies.
Affiliation(s)
- Tianduanyi Wang
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - Helsinki Institute for Information Technology (HIIT), Department of Computer Science, Aalto University, Espoo, Finland
- Prson Gautam
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Juho Rousu
  - Helsinki Institute for Information Technology (HIIT), Department of Computer Science, Aalto University, Espoo, Finland
- Tero Aittokallio
  - Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
  - Helsinki Institute for Information Technology (HIIT), Department of Computer Science, Aalto University, Espoo, Finland
  - Institute for Cancer Research, Department of Cancer Genetics, Oslo University Hospital, Oslo, Norway
  - Centre for Biostatistics and Epidemiology (OCBE), Faculty of Medicine, University of Oslo, Oslo, Norway
12. Voutilainen S, Heinonen M, Andberg M, Jokinen E, Maaheimo H, Pääkkönen J, Hakulinen N, Rouvinen J, Lähdesmäki H, Kaski S, Rousu J, Penttilä M, Koivula A. Substrate specificity of 2-deoxy-D-ribose 5-phosphate aldolase (DERA) assessed by different protein engineering and machine learning methods. Appl Microbiol Biotechnol 2020; 104:10515-10529. [PMID: 33147349 PMCID: PMC7671976 DOI: 10.1007/s00253-020-10960-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/22/2020] [Revised: 10/01/2020] [Accepted: 10/12/2020] [Indexed: 11/29/2022]
Abstract
In this work, deoxyribose-5-phosphate aldolase (Ec DERA, EC 4.1.2.4) from Escherichia coli was chosen as the protein engineering target for improving the substrate preference towards smaller, non-phosphorylated aldehyde donor substrates, in particular towards acetaldehyde. The initial broad set of mutations was directed to 24 amino acid positions in the active site or in the close vicinity, based on the 3D complex structure of the E. coli DERA wild-type aldolase. The specific activity of the DERA variants containing one to three amino acid mutations was characterised using three different substrates. A novel machine learning (ML) model utilising Gaussian processes and feature learning was applied for the 3rd mutagenesis round to predict new beneficial mutant combinations. This led to the most clear-cut (two- to threefold) improvement in acetaldehyde (C2) addition capability with the concomitant abolishment of the activity towards the natural donor molecule glyceraldehyde-3-phosphate (C3P) as well as the non-phosphorylated equivalent (C3). The Ec DERA variants were also tested on aldol reaction utilising formaldehyde (C1) as the donor. Ec DERA wild-type was shown to be able to carry out this reaction, and furthermore, some of the improved variants on acetaldehyde addition reaction turned out to have also improved activity on formaldehyde. KEY POINTS: • DERA aldolases are promiscuous enzymes. • Synthetic utility of DERA aldolase was improved by protein engineering approaches. • Machine learning methods aid the protein engineering of DERA.
Affiliation(s)
- Sanni Voutilainen
  - VTT Technical Research Centre of Finland Ltd, P.O. Box 1000, FI-02044 VTT, Espoo, Finland.
- Markus Heinonen
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Helsinki Institute for Information Technology, Espoo, Finland
- Martina Andberg
  - VTT Technical Research Centre of Finland Ltd, P.O. Box 1000, FI-02044 VTT, Espoo, Finland
- Emmi Jokinen
  - Department of Computer Science, Aalto University, Espoo, Finland
- Hannu Maaheimo
  - VTT Technical Research Centre of Finland Ltd, P.O. Box 1000, FI-02044 VTT, Espoo, Finland
- Johan Pääkkönen
  - Department of Chemistry, University of Eastern Finland, PO Box 111, FI-80101, Joensuu, Finland
- Nina Hakulinen
  - Department of Chemistry, University of Eastern Finland, PO Box 111, FI-80101, Joensuu, Finland
- Juha Rouvinen
  - Department of Chemistry, University of Eastern Finland, PO Box 111, FI-80101, Joensuu, Finland
- Harri Lähdesmäki
  - Department of Computer Science, Aalto University, Espoo, Finland
- Samuel Kaski
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Helsinki Institute for Information Technology, Espoo, Finland
- Juho Rousu
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Helsinki Institute for Information Technology, Espoo, Finland
- Merja Penttilä
  - VTT Technical Research Centre of Finland Ltd, P.O. Box 1000, FI-02044 VTT, Espoo, Finland
- Anu Koivula
  - VTT Technical Research Centre of Finland Ltd, P.O. Box 1000, FI-02044 VTT, Espoo, Finland
13
Heinonen M, Osmala M, Mannerström H, Wallenius J, Kaski S, Rousu J, Lähdesmäki H. Bayesian metabolic flux analysis reveals intracellular flux couplings. Bioinformatics 2020; 35:i548-i557. [PMID: 31510676 PMCID: PMC6612884 DOI: 10.1093/bioinformatics/btz315] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/01/2023] Open
Abstract
Motivation: Metabolic flux balance analysis (FBA) is a standard tool for analyzing metabolic reaction rates compatible with measurements, steady-state and the metabolic reaction network stoichiometry. Flux analysis methods commonly place model assumptions on fluxes due to the convenience of formulating the problem as a linear programming model, while many methods do not consider the inherent uncertainty in flux estimates. Results: We introduce a novel paradigm of Bayesian metabolic flux analysis that models the reactions of the whole genome-scale cellular system in probabilistic terms, and can infer the full flux vector distribution of genome-scale metabolic systems based on exchange and intracellular (e.g. 13C) flux measurements, steady-state assumptions, and objective function assumptions. The Bayesian model couples all fluxes jointly in a simple truncated multivariate posterior distribution, which reveals informative flux couplings. Our model is a plug-in replacement for conventional metabolic balance methods, such as FBA. Our experiments indicate that we can characterize the genome-scale flux covariances, reveal flux couplings, and determine more intracellular unobserved fluxes in Clostridium acetobutylicum from 13C data than flux variability analysis. Availability and implementation: The COBRA compatible software is available at github.com/markusheinonen/bamfa. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Markus Heinonen
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Helsinki Institute for Information Technology, Espoo, Finland
- Maria Osmala
  - Department of Computer Science, Aalto University, Espoo, Finland
- Samuel Kaski
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Helsinki Institute for Information Technology, Espoo, Finland
- Juho Rousu
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Helsinki Institute for Information Technology, Espoo, Finland
- Harri Lähdesmäki
  - Department of Computer Science, Aalto University, Espoo, Finland
14
Abstract
Motivation: In the analysis of metabolism, two distinct and complementary approaches are frequently used: Principal component analysis (PCA) and stoichiometric flux analysis. PCA is able to capture the main modes of variability in a set of experiments and does not make many prior assumptions about the data, but does not inherently take into account the flux mode structure of metabolism. Stoichiometric flux analysis methods, such as Flux Balance Analysis (FBA) and Elementary Mode Analysis, on the other hand, are able to capture the metabolic flux modes; however, they are primarily designed for the analysis of single samples at a time, and are not best suited for exploratory analysis on large sets of samples. Results: We propose a new methodology for the analysis of metabolism, called Principal Metabolic Flux Mode Analysis (PMFA), which marries the PCA and stoichiometric flux analysis approaches in an elegant regularized optimization framework. In short, the method incorporates a variance maximization objective from PCA coupled with a stoichiometric regularizer, which penalizes projections that are far from any flux modes of the network. For interpretability, we also introduce a sparse variant of PMFA that favours flux modes that contain a small number of reactions. Our experiments demonstrate the versatility and capabilities of our methodology. The proposed method can be applied to genome-scale metabolic networks in an efficient way, as PMFA does not enumerate elementary modes. In addition, the method is more robust on out-of-steady-state experimental data than competing flux mode analysis approaches. Availability and implementation: Matlab software for PMFA and SPMFA and the dataset used for experiments are available at https://github.com/aalto-ics-kepaco/PMFA. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sahely Bhadra
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
  - Computer Science and Engineering, Indian Institute of Technology, Palakkad, India
- Peter Blomberg
  - VTT Technical Research Centre of Finland Ltd, Espoo, Finland
- Sandra Castillo
  - VTT Technical Research Centre of Finland Ltd, Espoo, Finland
- Juho Rousu
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
15
Brouard C, Bassé A, d'Alché-Buc F, Rousu J. Improved Small Molecule Identification through Learning Combinations of Kernel Regression Models. Metabolites 2019; 9:E160. [PMID: 31374904 PMCID: PMC6724104 DOI: 10.3390/metabo9080160] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/29/2019] [Revised: 07/30/2019] [Accepted: 07/31/2019] [Indexed: 01/15/2023] Open
Abstract
In small molecule identification from tandem mass (MS/MS) spectra, input-output kernel regression (IOKR) currently provides the state-of-the-art combination of fast training and prediction and high identification rates. The IOKR approach can be simply understood as predicting a fingerprint vector from the MS/MS spectrum of the unknown molecule, and solving a pre-image problem to find the molecule with the most similar fingerprint. In this paper, we bring forward the following improvements to the IOKR framework: firstly, we formulate the IOKRreverse model that can be understood as mapping molecular structures into the MS/MS feature space and solving a pre-image problem to find the molecule whose predicted spectrum is the closest to the input MS/MS spectrum. Secondly, we introduce an approach to combine several IOKR and IOKRreverse models computed from different input and output kernels, called IOKRfusion. The method is based on minimizing structured Hinge loss of the combined model using a mini-batch stochastic subgradient optimization. Our experiments show a consistent improvement of top-k accuracy both in positive and negative ionization mode data.
Affiliation(s)
- Céline Brouard
  - Unité de Mathématiques et Informatique Appliquées de Toulouse, UR 875, INRA, 31326 Castanet Tolosan, France.
- Antoine Bassé
  - LTCI, Télécom Paris, Institut Polytechnique de Paris, 75634 Paris, France
- Juho Rousu
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, 00076 Espoo, Finland
16
Bach E, Szedmak S, Brouard C, Böcker S, Rousu J. Liquid-chromatography retention order prediction for metabolite identification. Bioinformatics 2018; 34:i875-i883. [DOI: 10.1093/bioinformatics/bty590] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 12/21/2022] Open
Affiliation(s)
- Eric Bach
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Sandor Szedmak
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Céline Brouard
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Sebastian Böcker
  - Department for Computer Science, Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
- Juho Rousu
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
17
Cichonska A, Pahikkala T, Szedmak S, Julkunen H, Airola A, Heinonen M, Aittokallio T, Rousu J. Learning with multiple pairwise kernels for drug bioactivity prediction. Bioinformatics 2018; 34:i509-i518. [PMID: 29949975 PMCID: PMC6022556 DOI: 10.1093/bioinformatics/bty277] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Indexed: 11/17/2022] Open
Abstract
Motivation: Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g. drugs and their targets. Kernel-based approaches have emerged as powerful tools for solving problems of that kind, and especially multiple kernel learning (MKL) offers promising benefits as it enables integrating various types of complex biomedical information sources in the form of kernels, along with learning their importance for the prediction task. However, the immense size of pairwise kernel spaces remains a major bottleneck, making the existing MKL algorithms computationally infeasible even for a small number of input pairs. Results: We introduce pairwiseMKL, the first method for time- and memory-efficient learning with multiple pairwise kernels. pairwiseMKL first determines the mixture weights of the input pairwise kernels, and then learns the pairwise prediction function. Both steps are performed efficiently without explicit computation of the massive pairwise matrices, therefore making the method applicable to solving large pairwise learning problems. We demonstrate the performance of pairwiseMKL in two related tasks of quantitative drug bioactivity prediction using up to 167 995 bioactivity measurements and 3120 pairwise kernels: (i) prediction of anticancer efficacy of drug compounds across a large panel of cancer cell lines; and (ii) prediction of target profiles of anticancer compounds across their kinome-wide target spaces. We show that pairwiseMKL provides accurate predictions using sparse solutions in terms of selected kernels, and therefore it also automatically identifies data sources relevant for the prediction problem. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco. Supplementary information: Supplementary data are available at Bioinformatics online.
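To illustrate how a pairwise kernel can be used without ever being formed, the sketch below applies the standard Kronecker-product identity (A ⊗ B) vec(X) = vec(A X Bᵀ), with vec taken row by row. It is not the pairwiseMKL implementation: it assumes, for illustration only, that the pairwise kernel is the Kronecker product of a drug kernel and a cell-line kernel, and the names `K_drug`, `K_cell`, `pairwise_matvec` and the toy values are hypothetical.

```python
def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def kron(A, B):
    """Explicit Kronecker product; its size grows as (len(A) * len(B)) squared."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def pairwise_matvec(A, B, X):
    """Compute (A kron B) @ vec(X) without building A kron B.

    X has one row per drug and one column per cell line, and vec(X)
    concatenates the rows of X.
    """
    Y = matmul(matmul(A, X), transpose(B))
    return [y for row in Y for y in row]


# Hypothetical toy kernels: 2 drugs and 3 cell lines.
K_drug = [[2.0, 0.5], [0.5, 1.0]]
K_cell = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.3], [0.0, 0.3, 1.0]]
X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Reference result using the explicit 6 x 6 pairwise kernel.
vec_x = [v for row in X for v in row]
explicit = [sum(k * x for k, x in zip(row, vec_x)) for row in kron(K_drug, K_cell)]

# Same result using only the 2 x 2 and 3 x 3 factors.
implicit = pairwise_matvec(K_drug, K_cell, X)
```

With n drugs and m cell lines, the explicit pairwise kernel has (nm)² entries, whereas the factored product only touches the n × n and m × m kernels, which is the kind of saving the abstract refers to.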
Affiliation(s)
- Anna Cichonska
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
  - Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
- Tapio Pahikkala
  - Department of Information Technology, University of Turku, Turku, Finland
- Sandor Szedmak
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Heli Julkunen
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Antti Airola
  - Department of Information Technology, University of Turku, Turku, Finland
- Markus Heinonen
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Tero Aittokallio
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
  - Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
- Juho Rousu
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
18
Bhadra S, Rousu J. Analysis of Fluxomic Experiments with Principal Metabolic Flux Mode Analysis. Methods Mol Biol 2018; 1807:141-161. [PMID: 30030809 DOI: 10.1007/978-1-4939-8561-6_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
In the analysis of metabolism, two distinct and complementary approaches are frequently used: Principal component analysis (PCA) and stoichiometric flux analysis. PCA is able to capture the main modes of variability in a set of experiments and does not make many prior assumptions about the data, but does not inherently take into account the flux mode structure of metabolism. Stoichiometric flux analysis methods, such as Flux Balance Analysis (FBA) and Elementary Mode Analysis, on the other hand, are able to capture the metabolic flux modes; however, they are primarily designed for the analysis of single samples at a time, and assume the stoichiometric steady state of the metabolic network. We will discuss a new methodology for the analysis of metabolism, called Principal Metabolic Flux Mode Analysis (PMFA), which marries the PCA and stoichiometric flux analysis approaches in an elegant regularized optimization framework. In short, the method incorporates a variance maximization objective from PCA coupled with a stoichiometric regularizer, which penalizes projections that are far from any flux modes of the network. For interpretability, we also discuss a sparse variant of PMFA that favors flux modes that contain a small number of reactions. PMFA has several benefits: (1) it can be applied to large metabolic networks in an efficient way, as PMFA does not enumerate elementary modes, (2) it is more robust to steady-state violations than competing approaches, and (3) it can compactly capture the variation in the data by a few factors. This chapter will describe the detailed steps for carrying out this analysis on experimental data from fluxomic and gene expression measurements.
Affiliation(s)
- Juho Rousu
  - Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
19
Cankorur-Cetinkaya A, Dias JML, Kludas J, Slater NKH, Rousu J, Oliver SG, Dikicioglu D. Erratum: CamOptimus: a tool for exploiting complex adaptive evolution to optimise experiments and processes in biotechnology. Microbiology (Reading) 2017; 163:1369. [DOI: 10.1099/mic.0.000530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2017] [Accepted: 08/21/2017] [Indexed: 11/18/2022] Open
Affiliation(s)
- Ayca Cankorur-Cetinkaya
  - Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Joao M. L. Dias
  - Department of Haematology, Cambridge University Hospitals NHS Trust, Cambridge, CB2 0QQ, UK
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Jana Kludas
  - Helsinki Institute for Information Technology HIIT; Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, FI-02150, Finland
- Nigel K. H. Slater
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, UK
- Juho Rousu
  - Helsinki Institute for Information Technology HIIT; Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, FI-02150, Finland
- Stephen G. Oliver
  - Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Duygu Dikicioglu
  - Present address: Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, UK
  - Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
20
Abstract
Motivation: An important problem in metabolomics is to identify metabolites using tandem mass spectrometry data. Machine learning methods have been proposed recently to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output space and can handle structured output space such as the molecule space. Results: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, consisting of mapping the predicted output feature vectors back to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods. Availability and implementation: Contact: celine.brouard@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online.
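The two phases described above can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the "spectra" are two-dimensional vectors, the "molecules" are binary fingerprint tuples, the input kernel is Gaussian, the output kernel is linear, and the function names, the regularisation value `lam` and all data are hypothetical.

```python
import math


def gaussian_kernel(u, v, gamma=1.0):
    """Input kernel on toy 'spectra' (assumed choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def linear_kernel(u, v):
    """Output kernel on toy 'fingerprints' (assumed choice)."""
    return float(sum(a * b for a, b in zip(u, v)))


def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def iokr_predict(x_new, X_train, Y_train, candidates, lam=0.1):
    n = len(X_train)
    # Phase 1: kernel ridge regression into the output feature space.
    # The prediction is h(x) = sum_i alpha_i * phi(y_i), where
    # alpha = (K_x + lam * I)^{-1} k_x(x_new).
    K = [[gaussian_kernel(X_train[i], X_train[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_new = [gaussian_kernel(x, x_new) for x in X_train]
    alpha = solve(K, k_new)

    # Phase 2: pre-image step. Choose the candidate whose feature vector is
    # closest to h(x); the constant term ||h(x)||^2 is dropped.
    def sq_dist(y):
        cross = sum(a * linear_kernel(y, yi) for a, yi in zip(alpha, Y_train))
        return linear_kernel(y, y) - 2.0 * cross

    return min(candidates, key=sq_dist)


# Hypothetical training pairs and candidate set.
X_train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
Y_train = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
candidates = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]

predicted = iokr_predict((0.9, 0.1), X_train, Y_train, candidates)
```

Here the query lies close to the second training input, so the second training fingerprint is returned. Only kernel evaluations are needed in both phases, which is why the output objects do not have to be vectors.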
Affiliation(s)
- Céline Brouard
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Helsinki Institute for Information Technology, Espoo, Finland
- Huibin Shen
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Helsinki Institute for Information Technology, Espoo, Finland
- Kai Dührkop
  - Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
- Sebastian Böcker
  - Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
- Juho Rousu
  - Department of Computer Science, Aalto University, Espoo, Finland
  - Helsinki Institute for Information Technology, Espoo, Finland
21
Cankorur-Cetinkaya A, Dias JML, Kludas J, Slater NKH, Rousu J, Oliver SG, Dikicioglu D. CamOptimus: a tool for exploiting complex adaptive evolution to optimize experiments and processes in biotechnology. Microbiology (Reading) 2017. [PMID: 28635591 PMCID: PMC5817226 DOI: 10.1099/mic.0.000477] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 01/09/2023]
Abstract
Multiple interacting factors affect the performance of engineered biological systems in synthetic biology projects. The complexity of these biological systems means that experimental design should often be treated as a multiparametric optimization problem. However, the available methodologies are either impractical, due to a combinatorial explosion in the number of experiments to be performed, or are inaccessible to most experimentalists due to the lack of publicly available, user-friendly software. Although evolutionary algorithms may be employed as alternative approaches to optimize experimental design, the lack of simple-to-use software again restricts their use to specialist practitioners. In addition, the lack of subsidiary approaches to further investigate critical factors and their interactions prevents the full analysis and exploitation of the biotechnological system. We have addressed these problems and, here, provide a simple-to-use and freely available graphical user interface to empower a broad range of experimental biologists to employ complex evolutionary algorithms to optimize their experimental designs. Our approach exploits a Genetic Algorithm to discover the subspace containing the optimal combination of parameters, and Symbolic Regression to construct a model to evaluate the sensitivity of the experiment to each parameter under investigation. We demonstrate the utility of this method using an example in which the culture conditions for the microbial production of a bioactive human protein are optimized. CamOptimus is available through: (https://doi.org/10.17863/CAM.10257).
Affiliation(s)
- Ayca Cankorur-Cetinkaya
  - Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Joao M L Dias
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Department of Haematology, Cambridge University Hospitals NHS Trust, Cambridge, CB2 0QQ, UK
- Jana Kludas
  - Helsinki Institute for Information Technology HIIT; Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, FI-02150, Finland
- Nigel K H Slater
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, UK
- Juho Rousu
  - Helsinki Institute for Information Technology HIIT; Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, FI-02150, Finland
- Stephen G Oliver
  - Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Duygu Dikicioglu
  - Cambridge Systems Biology Centre and Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
  - Present address: Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, UK
22
Schymanski EL, Ruttkies C, Krauss M, Brouard C, Kind T, Dührkop K, Allen F, Vaniya A, Verdegem D, Böcker S, Rousu J, Shen H, Tsugawa H, Sajed T, Fiehn O, Ghesquière B, Neumann S. Critical Assessment of Small Molecule Identification 2016: automated methods. J Cheminform 2017; 9:22. [PMID: 29086042 PMCID: PMC5368104 DOI: 10.1186/s13321-017-0207-1] [Citation(s) in RCA: 96] [Impact Index Per Article: 13.7] [Received: 12/14/2016] [Accepted: 03/13/2017] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND: The fourth round of the Critical Assessment of Small Molecule Identification (CASMI) Contest (www.casmi-contest.org) was held in 2016, with two new categories for automated methods. This article covers the 208 challenges in Categories 2 and 3, without and with metadata, from organization, participation, results and post-contest evaluation of CASMI 2016 through to perspectives for future contests and small molecule annotation/identification. RESULTS: The Input Output Kernel Regression (CSI:IOKR) machine learning approach performed best in "Category 2: Best Automatic Structural Identification-In Silico Fragmentation Only", won by Team Brouard with 41% challenge wins. The winner of "Category 3: Best Automatic Structural Identification-Full Information" was Team Kind (MS-FINDER), with 76% challenge wins. The best methods were able to achieve over 30% Top 1 ranks in Category 2, with all methods ranking the correct candidate in the Top 10 in around 50% of challenges. This success rate rose to 70% Top 1 ranks in Category 3, with candidates in the Top 10 in over 80% of the challenges. The machine learning and chemistry-based approaches are shown to perform in complementary ways. CONCLUSIONS: The improvement in (semi-)automated fragmentation methods for small molecule identification has been substantial. The achieved high rates of correct candidates in the Top 1 and Top 10, despite large candidate numbers, open up great possibilities for high-throughput annotation of untargeted analysis for "known unknowns". As more high-quality training data becomes available, the improvements in machine learning methods will likely continue, but the alternative approaches still provide valuable complementary information. Improved integration of experimental context will also improve identification success further for "real life" annotations. The true "unknown unknowns" remain to be evaluated in future CASMI contests.
Affiliation(s)
- Emma L Schymanski
  - Eawag: Swiss Federal Institute for Aquatic Science and Technology, Überlandstrasse 133, 8600, Dübendorf, Switzerland.
- Christoph Ruttkies
  - Department of Stress and Developmental Biology, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120, Halle, Germany
- Martin Krauss
  - Department of Effect-Directed Analysis, UFZ: Helmholtz Centre for Environmental Research, Permoserstrasse 15, 04318, Leipzig, Germany
- Céline Brouard
  - Department of Computer Science, Aalto University, Konemiehentie 2, 02150, Espoo, Finland
  - Helsinki Institute for Information Technology, Tekniikantie 14, 02150, Espoo, Finland
- Tobias Kind
  - West Coast Metabolomics Center and Genome Center, University of California Davis, 451 Health Sciences Drive, Davis, CA, 95616, USA
- Kai Dührkop
  - Chair of Bioinformatics, Friedrich-Schiller-University, Jena, Ernst-Abbe-Platz 2, 07743, Jena, Germany
- Felicity Allen
  - Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E9, Canada
- Arpana Vaniya
  - West Coast Metabolomics Center and Genome Center, University of California Davis, 451 Health Sciences Drive, Davis, CA, 95616, USA
  - Department of Chemistry, University of California Davis, One Shields Avenue, Davis, CA, 95616, USA
- Dries Verdegem
  - Metabolomics Expertise Center, Vesalius Research Center (VRC), VIB, KU Leuven - University of Leuven, 3000, Louvain, Belgium
- Sebastian Böcker
  - Chair of Bioinformatics, Friedrich-Schiller-University, Jena, Ernst-Abbe-Platz 2, 07743, Jena, Germany
- Juho Rousu
  - Department of Computer Science, Aalto University, Konemiehentie 2, 02150, Espoo, Finland
  - Helsinki Institute for Information Technology, Tekniikantie 14, 02150, Espoo, Finland
- Huibin Shen
  - Department of Computer Science, Aalto University, Konemiehentie 2, 02150, Espoo, Finland
  - Helsinki Institute for Information Technology, Tekniikantie 14, 02150, Espoo, Finland
- Hiroshi Tsugawa
  - RIKEN Center for Sustainable Resource Science (CSRS), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Tanvir Sajed
  - Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E9, Canada
- Oliver Fiehn
  - West Coast Metabolomics Center and Genome Center, University of California Davis, 451 Health Sciences Drive, Davis, CA, 95616, USA
  - Department of Biochemistry, Faculty of Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- Bart Ghesquière
  - Metabolomics Expertise Center, Vesalius Research Center (VRC), VIB, KU Leuven - University of Leuven, 3000, Louvain, Belgium
- Steffen Neumann
  - Department of Stress and Developmental Biology, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120, Halle, Germany
23
Affiliation(s)
- Aalt D J van Dijk
  - Biometris, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
  - Applied Bioinformatics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
  - Bioinformatics Group, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Harri Lähdesmäki
  - Department of Computer Science, Aalto University, 00076, Aalto, Finland
- Dick de Ridder
  - Bioinformatics Group, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Juho Rousu
  - Department of Computer Science, Aalto University, 00076, Aalto, Finland.
24
Kludas J, Arvas M, Castillo S, Pakula T, Oja M, Brouard C, Jäntti J, Penttilä M, Rousu J. Machine Learning of Protein Interactions in Fungal Secretory Pathways. PLoS One 2016; 11:e0159302. [PMID: 27441920 PMCID: PMC4956264 DOI: 10.1371/journal.pone.0159302] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 02/15/2016] [Accepted: 06/30/2016] [Indexed: 12/18/2022] Open
Abstract
In this paper we apply machine learning methods for predicting protein interactions in fungal secretion pathways. We assume an inter-species transfer setting, where training data is obtained from a single species and the objective is to predict protein interactions in other, related species. In our methodology, we combine several state-of-the-art machine learning approaches, namely, multiple kernel learning (MKL), pairwise kernels and kernelized structured output prediction in the supervised graph inference framework. For MKL, we apply recently proposed centered kernel alignment and p-norm path following approaches to integrate several feature sets describing the proteins, demonstrating improved performance. For graph inference, we apply input-output kernel regression (IOKR) in supervised and semi-supervised modes as well as output kernel trees (OK3). In our experiments simulating increasing genetic distance, input-output kernel regression proved to be the most robust prediction approach. We also show that the MKL approaches improve the predictions compared to a uniform combination of the kernels. We evaluate the methods on the task of predicting protein-protein interactions in the secretion pathways in fungi, with S. cerevisiae (baker's yeast) as the source and T. reesei as the target of the inter-species transfer learning. We identify completely novel candidate secretion proteins conserved in filamentous fungi. These proteins could contribute to their unique secretion capabilities.
Affiliation(s)
- Jana Kludas
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
- Mikko Arvas
- VTT Technical Research Centre of Finland, Espoo, Finland
- Tiina Pakula
- VTT Technical Research Centre of Finland, Espoo, Finland
- Merja Oja
- VTT Technical Research Centre of Finland, Espoo, Finland
- Céline Brouard
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
- Jussi Jäntti
- VTT Technical Research Centre of Finland, Espoo, Finland
- Merja Penttilä
- VTT Technical Research Centre of Finland, Espoo, Finland
- Juho Rousu
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
25
Cichonska A, Rousu J, Marttinen P, Kangas AJ, Soininen P, Lehtimäki T, Raitakari OT, Järvelin MR, Salomaa V, Ala-Korpela M, Ripatti S, Pirinen M. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis. Bioinformatics 2016; 32:1981-9. [PMID: 27153689 PMCID: PMC4920109 DOI: 10.1093/bioinformatics/btw052] [Citation(s) in RCA: 76] [Impact Index Per Article: 9.5] [Received: 07/16/2015] [Revised: 12/04/2015] [Accepted: 01/19/2016] [Indexed: 01/22/2023] Open
Abstract
MOTIVATION: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, the modest sample sizes of individual cohorts and the restricted availability of individual-level genotype-phenotype data across cohorts limit the use of multivariate tests. RESULTS: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows excellent agreement with the pooled individual-level analysis of the original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/aalto-ics-kepaco CONTACTS: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anna Cichonska
- Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland, Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
- Juho Rousu
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
- Pekka Marttinen
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
- Antti J Kangas
- Computational Medicine, University of Oulu, Oulu University Hospital and Biocenter Oulu, Oulu, Finland
- Pasi Soininen
- Computational Medicine, University of Oulu, Oulu University Hospital and Biocenter Oulu, Oulu, Finland, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland
- Terho Lehtimäki
- Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland
- Olli T Raitakari
- Department of Clinical Physiology and Nuclear Medicine, University of Turku and Turku University Hospital, Turku, Finland, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, Turku, Finland
- Marjo-Riitta Järvelin
- Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment & Health, School of Public Health, Imperial College London, London, UK, Centre for Life Course Epidemiology, Faculty of Medicine, University of Oulu, Oulu, Finland, Biocenter Oulu, University of Oulu, Oulu, Finland, Unit of Primary Care, Oulu University Hospital, Oulu, Finland
- Veikko Salomaa
- National Institute for Health and Welfare, Helsinki, Finland
- Mika Ala-Korpela
- Computational Medicine, University of Oulu, Oulu University Hospital and Biocenter Oulu, Oulu, Finland, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Samuli Ripatti
- Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland, Public Health, University of Helsinki, Helsinki, Finland and Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Matti Pirinen
- Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
26
Honeyborne I, McHugh TD, Kuittinen I, Cichonska A, Evangelopoulos D, Ronacher K, van Helden PD, Gillespie SH, Fernandez-Reyes D, Walzl G, Rousu J, Butcher PD, Waddell SJ. Profiling persistent tubercule bacilli from patient sputa during therapy predicts early drug efficacy. BMC Med 2016; 14:68. [PMID: 27055815 PMCID: PMC4825072 DOI: 10.1186/s12916-016-0609-3] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 12/23/2015] [Accepted: 03/23/2016] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: New treatment options are needed to maintain and improve therapy for tuberculosis, which caused the death of 1.5 million people in 2013 despite the potential for an 86% treatment success rate. A greater understanding of Mycobacterium tuberculosis (M.tb) bacilli that persist through drug therapy will aid drug development programs. Predictive biomarkers for treatment efficacy are also a research priority. METHODS AND RESULTS: Genome-wide transcriptional profiling was used to map the mRNA signatures of M.tb from the sputa of 15 patients before and 3, 7 and 14 days after the start of standard regimen drug treatment. The mRNA profiles of bacilli through the first 2 weeks of therapy reflected drug activity at 3 days, with transcriptional signatures at days 7 and 14 consistent with reduced M.tb metabolic activity similar to the profile of pre-chemotherapy bacilli. These results suggest that a pre-existing drug-tolerant M.tb population dominates sputum before and after early drug treatment, and that the mRNA signature at day 3 marks the killing of a drug-sensitive sub-population of bacilli. Modelling patient indices of disease severity with bacterial gene expression patterns demonstrated that both microbiological and clinical parameters were reflected in the divergent M.tb responses, and provided evidence that factors such as bacterial load and disease pathology influence the host-pathogen interplay and the phenotypic state of bacilli. Transcriptional signatures were also defined that predicted measures of early treatment success (rate of decline in bacterial load over 3 days, TB test positivity at 2 months, and bacterial load at 2 months). CONCLUSIONS: This study defines the transcriptional signature of M.tb bacilli that have been expectorated in sputum after two weeks of drug therapy, characterizing the phenotypic state of bacilli that persist through treatment. We demonstrate that variability in clinical manifestations of disease is detectable in bacterial sputa signatures, and that the changing M.tb mRNA profiles 0-2 weeks into chemotherapy predict the efficacy of treatment 6 weeks later. These observations advocate assaying dynamic bacterial phenotypes through drug therapy as biomarkers for treatment success.
Affiliation(s)
- Isobella Honeyborne
- Centre for Clinical Microbiology, University College London, London, NW3 2PF, UK
- Timothy D McHugh
- Centre for Clinical Microbiology, University College London, London, NW3 2PF, UK
- Iitu Kuittinen
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Anna Cichonska
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland; Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland
- Katharina Ronacher
- Department of Science and Technology/National Research Foundation Centre of Excellence for Biomedical Tuberculosis Research and Medical Research Council Centre for TB Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Western Cape, South Africa
- Paul D van Helden
- Department of Science and Technology/National Research Foundation Centre of Excellence for Biomedical Tuberculosis Research and Medical Research Council Centre for TB Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Western Cape, South Africa
- Stephen H Gillespie
- Medical and Biological Sciences Building, University of St Andrews, North Haugh, St Andrews, Fife, KY16 9TF, UK
- Delmiro Fernandez-Reyes
- Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK; Department of Paediatrics, University College Hospital, College of Medicine of the University of Ibadan, Ibadan, Nigeria
- Gerhard Walzl
- Department of Science and Technology/National Research Foundation Centre of Excellence for Biomedical Tuberculosis Research and Medical Research Council Centre for TB Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Western Cape, South Africa
- Juho Rousu
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
- Philip D Butcher
- Institute for Infection and Immunity, St George's University of London, London, SW17 0RE, UK
- Simon J Waddell
- Brighton and Sussex Medical School, University of Sussex, Brighton, BN1 9PX, UK
27
Rantasalo A, Czeizler E, Virtanen R, Rousu J, Lähdesmäki H, Penttilä M, Jäntti J, Mojzita D. Synthetic Transcription Amplifier System for Orthogonal Control of Gene Expression in Saccharomyces cerevisiae. PLoS One 2016; 11:e0148320. [PMID: 26901642 PMCID: PMC4762949 DOI: 10.1371/journal.pone.0148320] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 07/22/2015] [Accepted: 01/15/2016] [Indexed: 12/26/2022] Open
Abstract
This work describes the development and characterization of a modular synthetic expression system that provides a broad range of adjustable and predictable expression levels in S. cerevisiae. The system works as a fixed-gain transcription amplifier, where the input signal is transferred via a synthetic transcription factor (sTF) onto a synthetic promoter containing a defined core promoter, generating a transcription output signal. The system activation is based on the bacterial LexA DNA-binding domain, a set of modified, modular LexA-binding sites and a selection of transcription activation domains. We show both experimentally and computationally that the tuning of the system is achieved through the selection of three separate modules, each of which enables an adjustable output signal: 1) the transcription-activation domain of the sTF, 2) the binding-site modules in the output promoter, and 3) the core promoter modules which define the transcription initiation site in the output promoter. The system has a novel bidirectional architecture that enables generation of compact, yet versatile expression modules for multiple genes with highly diversified expression levels, ranging from negligible to very strong, using one synthetic transcription factor. In contrast to most existing modular gene expression regulation systems, the present system is independent of externally added compounds. Furthermore, the established system was minimally affected by the several growth conditions tested. These features suggest that it can be highly useful in large-scale biotechnology applications.
Affiliation(s)
- Anssi Rantasalo
- VTT Technical Research Centre of Finland, P.O. Box 1000, FI-02044 VTT, Espoo, Finland
- Elena Czeizler
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, P.O. Box 15400, FI-00076 Aalto, Espoo, Finland
- Riitta Virtanen
- VTT Technical Research Centre of Finland, P.O. Box 1000, FI-02044 VTT, Espoo, Finland
- Juho Rousu
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, P.O. Box 15400, FI-00076 Aalto, Espoo, Finland
- Harri Lähdesmäki
- Aalto University, Department of Computer Science, P.O. Box 15400, FI-00076 Aalto, Espoo, Finland
- Merja Penttilä
- VTT Technical Research Centre of Finland, P.O. Box 1000, FI-02044 VTT, Espoo, Finland
- Jussi Jäntti
- VTT Technical Research Centre of Finland, P.O. Box 1000, FI-02044 VTT, Espoo, Finland
- Dominik Mojzita
- VTT Technical Research Centre of Finland, P.O. Box 1000, FI-02044 VTT, Espoo, Finland
28
Cichonska A, Rousu J, Aittokallio T. Identification of drug candidates and repurposing opportunities through compound-target interaction networks. Expert Opin Drug Discov 2015; 10:1333-45. [PMID: 26429153 DOI: 10.1517/17460441.2015.1096926] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Indexed: 12/25/2022]
Abstract
INTRODUCTION: System-wide identification of both on- and off-targets of chemical probes provides improved understanding of their therapeutic potential and possible adverse effects, thereby accelerating and de-risking the drug discovery process. Given the high costs of experimental profiling of the complete target space of drug-like compounds, computational models offer a systematic means of guiding these mapping efforts. These models suggest the most potent interactions for further experimental or pre-clinical evaluation, both in cell line models and in patient-derived material. AREAS COVERED: The authors focus here on network-based machine learning models and their use in the prediction of novel compound-target interactions in both target-based and phenotype-based drug discovery applications. While currently used mainly to complement the experimentally mapped compound-target networks for drug repurposing applications, such as extending the target space of already approved drugs, these network pharmacology approaches may also suggest completely unexpected and novel investigational probes for drug development. EXPERT OPINION: Although the studies reviewed here have already demonstrated that network-centric modeling approaches have the potential to identify candidate compounds and selective targets in disease networks, many challenges remain. In particular, these challenges include how to incorporate the cellular context and genetic background into the disease networks to enable more stratified and selective target predictions, as well as how to make the prediction models more realistic for practical drug discovery and therapeutic applications.
Affiliation(s)
- Anna Cichonska
- University of Helsinki, Institute for Molecular Medicine Finland FIMM, Helsinki, Finland; Aalto University, Helsinki Institute for Information Technology HIIT, Department of Computer Science, Espoo, Finland
- Juho Rousu
- Aalto University, Helsinki Institute for Information Technology HIIT, Department of Computer Science, Espoo, Finland
- Tero Aittokallio
- University of Helsinki, Institute for Molecular Medicine Finland FIMM, Helsinki, Finland; University of Turku, Department of Mathematics and Statistics, Turku, Finland
29
Abstract
Motivation: Metabolite identification from tandem mass spectrometric data is a key task in metabolomics. Various computational methods have been proposed for the identification of metabolites from tandem mass spectra. Fragmentation tree methods explore the space of possible ways in which the metabolite can fragment, and base the metabolite identification on scoring of these fragmentation trees. Machine learning methods have been used to map mass spectra to molecular fingerprints; predicted fingerprints, in turn, can be used to score candidate molecular structures. Results: Here, we combine fragmentation tree computations with kernel-based machine learning to predict molecular fingerprints and identify molecular structures. We introduce a family of kernels capturing the similarity of fragmentation trees, and combine these kernels using recently proposed multiple kernel learning approaches. Experiments on two large reference datasets show that the new methods significantly improve molecular fingerprint prediction accuracy. These improvements result in better metabolite identification, doubling the number of metabolites ranked at the top position of the candidates list. Contact: huibin.shen@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Huibin Shen
- Department of Information and Computer Science, Aalto University, Espoo, Finland, Helsinki Institute for Information Technology, Espoo, Finland and Chair for Bioinformatics, Friedrich Schiller University Jena, Jena, Germany
- Kai Dührkop
- Department of Information and Computer Science, Aalto University, Espoo, Finland, Helsinki Institute for Information Technology, Espoo, Finland and Chair for Bioinformatics, Friedrich Schiller University Jena, Jena, Germany
- Sebastian Böcker
- Department of Information and Computer Science, Aalto University, Espoo, Finland, Helsinki Institute for Information Technology, Espoo, Finland and Chair for Bioinformatics, Friedrich Schiller University Jena, Jena, Germany
- Juho Rousu
- Department of Information and Computer Science, Aalto University, Espoo, Finland, Helsinki Institute for Information Technology, Espoo, Finland and Chair for Bioinformatics, Friedrich Schiller University Jena, Jena, Germany
30
31
Pitkänen E, Jouhten P, Hou J, Syed MF, Blomberg P, Kludas J, Oja M, Holm L, Penttilä M, Rousu J, Arvas M. Comparative genome-scale reconstruction of gapless metabolic networks for present and ancestral species. PLoS Comput Biol 2014; 10:e1003465. [PMID: 24516375 PMCID: PMC3916221 DOI: 10.1371/journal.pcbi.1003465] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Received: 07/07/2013] [Accepted: 12/18/2013] [Indexed: 12/12/2022] Open
Abstract
We introduce a novel computational approach, CoReCo, for comparative metabolic reconstruction and provide genome-scale metabolic network models for 49 important fungal species. Leveraging the exponential growth in sequenced genome availability, our method reconstructs genome-scale gapless metabolic networks simultaneously for a large number of species by integrating sequence data in a probabilistic framework. High reconstruction accuracy is demonstrated by comparisons to the well-curated Saccharomyces cerevisiae consensus model and large-scale knock-out experiments. Our comparative approach is particularly useful in scenarios where the quality of available sequence data is lacking, and when reconstructing evolutionarily distant species. Moreover, the reconstructed networks are fully carbon mapped, allowing their use in 13C flux analysis. We demonstrate the functionality and usability of the reconstructed fungal models with a computational steady-state biomass production experiment, as these fungi include some of the most important production organisms in industrial biotechnology. In contrast to many existing reconstruction techniques, only minimal manual effort is required before the reconstructed models are usable in flux balance experiments. CoReCo is available at http://esaskar.github.io/CoReCo/.
Advances in next-generation sequencing technologies are revolutionizing molecular biology. Sequencing-enabled cost-effective characterization of microbial genomes is a particularly exciting development in metabolic engineering. There, considerable effort has been put into reconstructing genome-scale metabolic networks that describe the collection of hundreds to thousands of biochemical reactions available for a microbial cell. These network models are instrumental in understanding microbial metabolism and guiding metabolic engineering efforts to improve biochemical yields. We have developed a novel computational method, CoReCo, which bridges the growing gap between the availability of sequenced genomes and respective reconstructed metabolic networks. The method reconstructs genome-scale metabolic networks simultaneously for related microbial species. It utilizes the available sequencing data from these species to correct for incomplete and missing data. We used the method to reconstruct metabolic networks for a set of 49 fungal species, providing the method with protein sequence data and a phylogenetic tree describing the evolutionary relationships between the species. We demonstrate the applicability of the method by comparing a metabolic reconstruction of Saccharomyces cerevisiae to the manually curated, high-quality consensus network. We also provide an easy-to-use implementation of the method, usable both in single computer and distributed computing environments.
Affiliation(s)
- Esa Pitkänen
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Department of Medical Genetics, Genome-Scale Biology Research Program, University of Helsinki, Helsinki, Finland
- Paula Jouhten
- VTT Technical Research Centre of Finland, Espoo, Finland
- Jian Hou
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Department of Information and Computer Science, Aalto University, Espoo, Finland
- Peter Blomberg
- VTT Technical Research Centre of Finland, Espoo, Finland
- Jana Kludas
- Department of Information and Computer Science, Aalto University, Espoo, Finland
- Merja Oja
- VTT Technical Research Centre of Finland, Espoo, Finland
- Liisa Holm
- Institute of Biotechnology & Department of Biosciences, University of Helsinki, Helsinki, Finland
- Merja Penttilä
- VTT Technical Research Centre of Finland, Espoo, Finland
- Juho Rousu
- Department of Information and Computer Science, Aalto University, Espoo, Finland
- Mikko Arvas
- VTT Technical Research Centre of Finland, Espoo, Finland
32
Shen H, Zamboni N, Heinonen M, Rousu J. Metabolite Identification through Machine Learning - Tackling CASMI Challenge Using FingerID. Metabolites 2013; 3:484-505. [PMID: 24958002 PMCID: PMC3901273 DOI: 10.3390/metabo3020484] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 04/01/2013] [Revised: 05/24/2013] [Accepted: 05/30/2013] [Indexed: 01/28/2023] Open
Abstract
Metabolite identification is a major bottleneck in metabolomics due to the number and diversity of the molecules. To alleviate this bottleneck, computational methods and tools that reliably filter the set of candidates are needed for further analysis by human experts. Recent efforts in assembling large public mass spectral databases such as MassBank have opened the door for developing a new genre of metabolite identification methods that rely on machine learning as the primary vehicle for identification. In this paper we describe the machine learning approach used in FingerID, its application to the CASMI challenges and some results that were not part of our challenge submission. In short, FingerID learns to predict molecular fingerprints from a large collection of MS/MS spectra, and uses the predicted fingerprints to retrieve and rank candidate molecules from a given large molecular database. Furthermore, we introduce a web server for FingerID, which was applied for the first time to the CASMI challenges. The challenge results show that the new machine learning framework produces competitive results on those challenge molecules that were found within the relatively restricted KEGG compound database. Additional experiments on the PubChem database confirm the feasibility of the approach even on a much larger database, although room for improvement still remains.
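The retrieve-and-rank step described in the abstract above can be sketched in a few lines. This is an illustration only, not the FingerID implementation: the abstract does not state the scoring function, so Tanimoto similarity between binary fingerprints is an assumption made here, and the function names and the candidate fingerprints are hypothetical toy data.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two equal-length binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0

def rank_candidates(predicted, candidates):
    """Rank candidate molecules by similarity to the predicted fingerprint.

    `candidates` maps a molecule name to its fingerprint; ties are broken
    alphabetically by name so that the ordering is deterministic.
    """
    scored = [(tanimoto(predicted, fp), name) for name, fp in candidates.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(name, score) for score, name in scored]

# Toy example (hypothetical): the fingerprint a model predicted from an MS/MS
# spectrum, and three candidate molecules from a molecular database.
predicted_fp = [1, 0, 1, 1, 0]
database = {
    "candidate_A": [1, 0, 1, 1, 0],
    "candidate_B": [1, 1, 0, 0, 0],
    "candidate_C": [0, 0, 1, 1, 1],
}
ranking = rank_candidates(predicted_fp, database)
```

A larger candidate database such as PubChem changes only the size of `database`; the ranking logic stays the same, which is consistent with the abstract's remark that the approach remains feasible on a much larger database.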
Affiliation(s)
- Huibin Shen
- Helsinki Institute for Information Technology HIIT; Department of Information and Computer Science, Aalto University, Konemiehentie 2, FI-02150 Espoo, Finland.
- Nicola Zamboni
- Institute of Molecular Systems Biology, ETH Zürich, Wolfgang-Pauli Street 16, 8093 Zürich, Switzerland.
- Markus Heinonen
- IBISC, Université d'Evry-Val d'Essonne, Bâtiment IBGBI, 23 Bd de France, 91037 cedex Evry, France.
- Juho Rousu
- Helsinki Institute for Information Technology HIIT; Department of Information and Computer Science, Aalto University, Konemiehentie 2, FI-02150 Espoo, Finland.
33
Rousu J, Agranoff DD, Sodeinde O, Shawe-Taylor J, Fernandez-Reyes D. Biomarker discovery by sparse canonical correlation analysis of complex clinical phenotypes of tuberculosis and malaria. PLoS Comput Biol 2013; 9:e1003018. [PMID: 23637585 PMCID: PMC3630122 DOI: 10.1371/journal.pcbi.1003018] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 07/18/2012] [Accepted: 02/18/2013] [Indexed: 11/25/2022] Open
Abstract
Biomarker discovery aims to find small subsets of relevant variables in 'omics data that correlate with the clinical syndromes of interest. Although clinical phenotypes are usually characterized by a complex set of clinical parameters, current computational approaches assume univariate targets, e.g. diagnostic classes, against which associations are sought. We propose an approach based on asymmetrical sparse canonical correlation analysis (SCCA) that finds multivariate correlations between the 'omics measurements and the complex clinical phenotypes. We correlated plasma proteomics data to multivariate overlapping complex clinical phenotypes from tuberculosis and malaria datasets. We discovered relevant 'omic biomarkers that have a high correlation to profiles of clinical measurements and are remarkably sparse, containing 1.5–3% of all 'omic variables. We show that using clinical view projections we obtain remarkable improvements in diagnostic class prediction, up to 11% in tuberculosis and up to 5% in malaria. Our approach finds proteomic biomarkers that correlate with complex combinations of clinical biomarkers. Using the clinical biomarkers improves the accuracy of diagnostic class prediction while not requiring the measurement of plasma proteomic profiles of each subject. Our approach makes it feasible to use 'omics data to build accurate diagnostic algorithms that can be deployed to community health centres lacking the expensive 'omics measurement capabilities.
Many infectious diseases such as tuberculosis and malaria are challenging both for scientists trying to understand the biochemical basis of the diseases and for medical doctors making a diagnosis. The challenges arise both from the dependence of the diseases on sets of proteins and from the complexity of the symptoms. Biomarkers denote small sets of measurements that correlate with the phenotype of interest. They have potential use both in advancing the basic biomedical research of infectious diseases and in facilitating predictive diagnostic tools. We propose a new method for biomarker discovery that works by finding canonical correlations between two sets of data, the plasma proteomic profiles and clinical profiles of the subjects. We show that the method is able to find candidate proteomic biomarkers that correlate with combinations of clinical variables, called the clinical biomarkers. Using the clinical biomarkers improves the accuracy of diagnostic class prediction while not requiring the expensive plasma proteomic profiles to be measured for each subject.
Affiliation(s)
- Juho Rousu
- Helsinki Institute for Information Technology, Department of Information and Computer Science, Aalto University, Espoo, Finland
34
Heinonen M, Shen H, Zamboni N, Rousu J. Metabolite identification and molecular fingerprint prediction through machine learning. Bioinformatics 2012; 28:2333-41. [DOI: 10.1093/bioinformatics/bts437] [Citation(s) in RCA: 119] [Impact Index Per Article: 9.9] [Indexed: 11/14/2022] Open
35
Heinonen M, Lappalainen S, Mielikäinen T, Rousu J. Computing Atom Mappings for Biochemical Reactions without Subgraph Isomorphism. J Comput Biol 2011; 18:43-58. [DOI: 10.1089/cmb.2009.0216] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 11/12/2022] Open
Affiliation(s)
- Markus Heinonen
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Sampsa Lappalainen
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Juho Rousu
- Department of Computer Science, University of Helsinki, Helsinki, Finland
36
Pitkänen E, Rousu J, Ukkonen E. Computational methods for metabolic reconstruction. Curr Opin Biotechnol 2010; 21:70-7. [DOI: 10.1016/j.copbio.2010.01.010] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 10/31/2009] [Revised: 01/17/2010] [Accepted: 01/20/2010] [Indexed: 12/19/2022]
|
37
|
Pitkänen E, Jouhten P, Rousu J. Inferring branching pathways in genome-scale metabolic networks. BMC Syst Biol 2009; 3:103. [PMID: 19874610 PMCID: PMC2791103 DOI: 10.1186/1752-0509-3-103] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Received: 07/20/2009] [Accepted: 10/29/2009] [Indexed: 11/17/2022]
Abstract
Background A central problem in computational metabolic modelling is how to find biochemically plausible pathways between metabolites in a metabolic network. Two general, complementary frameworks have been utilized to find metabolic pathways: constraint-based modelling and graph-theoretical path finding approaches. In constraint-based modelling, one aims to find pathways where metabolites are balanced in a pseudo steady-state. Constraint-based methods, such as elementary flux mode analysis, typically have a high computational cost stemming from a large number of steady-state pathways in a typical metabolic network. On the other hand, graph-theoretical approaches avoid the computational complexity of constraint-based methods by solving a simpler problem of finding shortest paths. However, while scaling well with network size, graph-theoretic methods generally tend to return more false positive pathways than constraint-based methods. Results In this paper, we introduce a computational method, ReTrace, for finding biochemically relevant, branching metabolic pathways in an atom-level representation of metabolic networks. The method finds compact pathways which transfer a high fraction of atoms from source to target metabolites by considering combinations of linear shortest paths. In contrast to current steady-state pathway analysis methods, our method scales up well and is able to operate on genome-scale models. Further, we show that the pathways produced are biochemically meaningful by an example involving the biosynthesis of inosine 5'-monophosphate (IMP). In particular, the method is able to avoid typical problems associated with graph-theoretic approaches such as the need to define side metabolites or pathways not carrying any net carbon flux appearing in results. Finally, we discuss an application involving reconstruction of amino acid pathways of a recently sequenced organism demonstrating how measurement data can be easily incorporated into ReTrace analysis. ReTrace is licensed under GPL and is freely available for academic use at http://www.cs.helsinki.fi/group/sysfys/software/retrace/.
Conclusion ReTrace is a useful method in metabolic path finding tasks, combining some of the best aspects of constraint-based and graph-theoretic methods. It finds use in a multitude of tasks ranging from metabolic engineering to metabolic reconstruction of recently sequenced organisms.
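The graph-theoretic building block that the abstract above contrasts with constraint-based modelling, finding a shortest path between two metabolites, can be sketched with a plain breadth-first search. This is only an illustration of that generic step: the function name, the toy network, and the metabolite-level (rather than atom-level) graph are assumptions, and the sketch does not reproduce ReTrace's combination of paths or its atom tracing.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search for a shortest path in an unweighted digraph."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to the source to recover the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None  # target is not reachable from source


# Hypothetical toy network: edges point from substrate to product.
toy_network = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
}
path = shortest_path(toy_network, "A", "F")
```

A purely topological search like this treats every edge alike, which is one reason such methods can return pathways that move few or no atoms from source to target.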
Affiliation(s)
- Esa Pitkänen
- Department of Computer Science, University of Helsinki, Finland.
|
38
|
Abstract
BACKGROUND In this paper we describe work in progress in developing kernel methods for enzyme function prediction. Our focus is on developing so-called structured output prediction methods, where the enzymatic reaction is the combinatorial target object for prediction. We compared two structured output prediction methods, the Hierarchical Max-Margin Markov algorithm (HM3) and the Maximum Margin Regression algorithm (MMR), in hierarchical classification of enzyme function. As sequence features we use various string kernels and the GTG feature set derived from the global alignment trace graph of protein sequences. RESULTS In our experiments, in predicting enzyme EC classification we obtain over 85% accuracy (predicting the four-digit EC code) and over 91% microlabel F1 score (predicting individual EC digits). In predicting the Gold Standard enzyme families, we obtain over 79% accuracy (predicting family correctly) and over 89% microlabel F1 score (predicting superfamilies and families). In the latter case, structured output methods are significantly more accurate than a nearest neighbor classifier. A polynomial kernel over the GTG feature set turned out to be a prerequisite for accurate function prediction. Combining GTG with string kernels boosted accuracy slightly in the case of EC class prediction. CONCLUSION Structured output prediction with GTG features is shown to be computationally feasible and to have accuracy on par with state-of-the-art approaches in enzyme function prediction.
Affiliation(s)
- Katja Astikainen
- Department of Computer Science, PO Box 68, FI-00014 University of Helsinki, Finland
- Liisa Holm
- Institute of Biotechnology, P.O. Box 56, FI-00014 University of Helsinki, Finland
- Esa Pitkänen
- Department of Computer Science, PO Box 68, FI-00014 University of Helsinki, Finland
- Sandor Szedmak
- Electronics and Computer Science, University of Southampton, SO17 1BJ, UK
- Juho Rousu
- Department of Computer Science, PO Box 68, FI-00014 University of Helsinki, Finland
|
39
|
Heinonen M, Rantanen A, Mielikäinen T, Kokkonen J, Kiuru J, Ketola RA, Rousu J. FiD: a software for ab initio structural identification of product ions from tandem mass spectrometric data. Rapid Commun Mass Spectrom 2008; 22:3043-3052. [PMID: 18763276 DOI: 10.1002/rcm.3701] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.5] [Indexed: 05/26/2023]
Abstract
We present FiD (Fragment iDentificator), a software tool for the structural identification of product ions produced with tandem mass spectrometric measurement of low molecular weight organic compounds. Tandem mass spectrometry (MS/MS) has proven to be an indispensable tool in modern, cell-wide metabolomics and fluxomics studies. In such studies, the structural information of the MS(n) product ions is usually needed in the downstream analysis of the measurement data. The manual identification of the structures of MS(n) product ions is, however, a nontrivial task requiring expertise, and calls for computer assistance. Commercial software tools, such as Mass Frontier and ACD/MS Fragmenter, rely on fragmentation rule databases for the identification of MS(n) product ions. FiD, on the other hand, conducts a combinatorial search over all possible fragmentation paths and outputs a ranked list of alternative structures. This gives the user an advantage in situations where the MS/MS data of compounds with less well-known fragmentation mechanisms are processed. FiD software implements two fragmentation models, the single-step model that ignores intermediate fragmentation states and the multi-step model, which allows for complex fragmentation pathways. The software works for MS/MS data produced both in positive- and negative-ion modes. The software has an easy-to-use graphical interface with built-in visualization capabilities for structures of product ions and fragmentation pathways. In our experiments involving amino acids and sugar-phosphates, often found, e.g., in the central carbon metabolism of yeasts, FiD software correctly predicted the structures of product ions on average in 85% of the cases. The FiD software is free for academic use and is available for download from www.cs.helsinki.fi/group/sysfys/software/fragid.
Affiliation(s)
- Markus Heinonen
- Department of Computer Science, University of Helsinki, Helsinki, Finland.
|
40
|
Rantanen A, Rousu J, Jouhten P, Zamboni N, Maaheimo H, Ukkonen E. An analytic and systematic framework for estimating metabolic flux ratios from 13C tracer experiments. BMC Bioinformatics 2008; 9:266. [PMID: 18534038 PMCID: PMC2430715 DOI: 10.1186/1471-2105-9-266] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Received: 01/14/2008] [Accepted: 06/06/2008] [Indexed: 11/10/2022] Open
Abstract
Background Metabolic fluxes provide invaluable insight on the integrated response of a cell to environmental stimuli or genetic modifications. Current computational methods for estimating the metabolic fluxes from 13C isotopomer measurement data rely either on manual derivation of analytic equations constraining the fluxes or on the numerical solution of a highly nonlinear system of isotopomer balance equations. In the first approach, analytic equations have to be tediously derived for each organism, substrate or labelling pattern, while in the second approach, the global nature of an optimum solution is difficult to prove and comprehensive measurements of external fluxes to augment the 13C isotopomer data are typically needed. Results We present a novel analytic framework for estimating metabolic flux ratios in the cell from 13C isotopomer measurement data. In the presented framework, equation systems constraining the fluxes are derived automatically from the model of the metabolism of an organism. The framework is designed to be applicable with all metabolic network topologies, 13C isotopomer measurement techniques, substrates and substrate labelling patterns. By analyzing nuclear magnetic resonance (NMR) and mass spectrometry (MS) measurement data obtained from the experiments on glucose with the model micro-organisms Bacillus subtilis and Saccharomyces cerevisiae we show that our framework is able to automatically produce the flux ratios discovered so far by the domain experts with tedious manual analysis. Furthermore, we show by in silico calculability analysis that our framework can rapidly produce flux ratio equations – as well as predict when the flux ratios are unobtainable by linear means – also for substrates not related to glucose. 
Conclusion The core of the 13C metabolic flux analysis framework introduced in this article consists of flow and independence analysis of metabolic fragments and techniques for manipulating isotopomer measurements with vector space techniques. These methods facilitate efficient, analytic computation of the ratios between the fluxes of pathways that converge to a common junction metabolite. The framework can be seen as a generalization and formalization of the existing tradition for computing metabolic flux ratios where equations constraining flux ratios are manually derived, usually without explicitly showing the formal proofs of the validity of the equations.
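The idea of a flux ratio at a junction metabolite can be illustrated with a small linear-algebra example. The sketch assumes that the labelling distribution measured at the junction is a flux-weighted mixture of the distributions delivered by the converging pathways, and recovers the mixing fractions by least squares. The function name `mixing_fractions` and the numbers are hypothetical; the framework in the article derives its equation systems automatically from the metabolic network model, which this toy example does not attempt.

```python
import numpy as np


def mixing_fractions(incoming, junction):
    """Least-squares fractions f with incoming @ f ~ junction and sum(f) ~ 1.

    incoming: (n_isotopomers, n_pathways), one column per converging pathway.
    junction: (n_isotopomers,) distribution measured at the junction.
    """
    n_pathways = incoming.shape[1]
    # Append the normalisation sum(f) = 1 as one extra linear equation.
    A = np.vstack([incoming, np.ones((1, n_pathways))])
    b = np.append(junction, 1.0)
    f, *_ = np.linalg.lstsq(A, b, rcond=None)
    return f


# Hypothetical two-pathway junction with a known 30/70 split.
incoming = np.array([[0.8, 0.1],
                     [0.2, 0.9]])
junction = incoming @ np.array([0.3, 0.7])
fractions = mixing_fractions(incoming, junction)
```

If two converging pathways deliver identical labelling patterns, the columns of `incoming` coincide and the fractions are no longer uniquely determined, which mirrors the abstract's remark that some flux ratios are unobtainable by linear means.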
Affiliation(s)
- Ari Rantanen
- Department of Computer Science, University of Helsinki, Finland.
|
41
|
Abstract
This supplement contains extended versions of a selected subset of papers presented at the workshop PMSB 2007, Probabilistic Modeling and Machine Learning in Structural and Systems Biology, Tuusula, Finland, from June 17 to 18, 2006.
|
42
|
Rantanen A, Mielikäinen T, Rousu J, Maaheimo H, Ukkonen E. Planning optimal measurements of isotopomer distributions for estimation of metabolic fluxes. Bioinformatics 2006; 22:1198-206. [PMID: 16504982 DOI: 10.1093/bioinformatics/btl069] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Flux estimation using isotopomer information of metabolites is currently the most reliable method to obtain quantitative estimates of the activity of metabolic pathways. However, the development of isotopomer measurement techniques for intermediate metabolites is a demanding task. Careful planning of isotopomer measurements is thus needed to maximize the available flux information while minimizing the experimental effort. RESULTS In this paper we study the question of finding the smallest subset of metabolites to measure that ensures the same level of isotopomer information as the measurement of every metabolite in the metabolic network. We study the computational complexity of this optimization problem in the case of the so-called positional enrichment data, give methods for obtaining exact and fast approximate solutions, and evaluate empirically the efficacy of the proposed methods by analyzing a metabolic network that models the central carbon metabolism of Saccharomyces cerevisiae.
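One common way to get a fast approximate answer to a smallest-subset question of this kind is a greedy covering heuristic. The sketch below assumes, purely for illustration, that each candidate metabolite "covers" a known set of information items and that the goal is to cover all of them with few measurements. That modelling choice, the function name `greedy_cover`, and the toy data are hypothetical and are not taken from the paper.

```python
def greedy_cover(coverage):
    """Greedily pick keys of `coverage` until every item is covered.

    coverage: dict mapping a candidate name to the set of items it covers.
    Returns the chosen candidates in the order they were picked.
    """
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered:
        # Pick the candidate covering the most still-uncovered items;
        # iterating in sorted order breaks ties by name deterministically.
        best = max(sorted(coverage),
                   key=lambda name: len(coverage[name] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


# Hypothetical example: four measurable metabolites, five information items.
coverage = {
    "M1": {1, 2, 3},
    "M2": {3, 4},
    "M3": {4, 5},
    "M4": {1, 5},
}
selected = greedy_cover(coverage)
```

A greedy choice is fast but not guaranteed to be the smallest possible subset, which is why an exact method is still useful on small networks.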
Affiliation(s)
- Ari Rantanen
- Department of Computer Science P.O. Box 68 (Gustaf Hällströmin katu 2b) 00014 University of Helsinki Finland.
|
43
|
|
44
|
|
45
|
Abstract
The isotopomer distributions of metabolites are invaluable pieces of information in the computation of the flux distribution in a metabolic network. We describe the use of tandem mass spectrometry with the daughter ion scanning technique in the discovery of positional isotopomer distributions (PID). This technique increases the possibilities of mass spectrometry since given the same fragment ions, it uncovers more information than the full scanning mode. The mathematics of the new technique is slightly more complicated than the techniques needed by full scanning mode methods. Our experiments, however, show that in practice the inadequacy of the fragmentation of amino acids in the tandem mass spectrometer does not allow uncovering the PID exactly even if the daughter ion scanning is used. The computational techniques have been implemented in a MATLAB application called PIDC (Positional Isotopomer Distribution Calculator).
Affiliation(s)
- Ari Rantanen
- Department of Computer Science, FIN-00014 University of Helsinki, Finland.
|
46
|
Elomaa T, Rousu J. J Intell Inf Syst 2002; 18:55-70. [DOI: 10.1023/a:1012920624627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
|
47
|
|