1
Flores JE, Claborne DM, Weller ZD, Webb-Robertson BJM, Waters KM, Bramer LM. Missing data in multi-omics integration: Recent advances through artificial intelligence. Front Artif Intell 2023; 6:1098308. [PMID: 36844425 PMCID: PMC9949722 DOI: 10.3389/frai.2023.1098308] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 11/14/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023] Open
Abstract
Biological systems function through complex interactions between various 'omics (biomolecules), and a more complete understanding of these systems is only possible through an integrated, multi-omic perspective. This has created a need for integration approaches that can capture the complex, often non-linear, interactions that define these biological systems and that are adapted to the challenges of combining heterogeneous data across 'omic views. A principal challenge to multi-omic integration is missing data, because not all biomolecules are measured in all samples. Due to cost, instrument sensitivity, or other experimental factors, data for a biological sample may be missing for one or more 'omic technologies. Recent methodological developments in artificial intelligence and statistical learning have greatly facilitated the analysis of multi-omics data; however, many of these techniques assume access to completely observed data. A subset of these methods incorporates mechanisms for handling partially observed samples, and these methods are the focus of this review. We describe recently developed approaches, noting their primary use cases and highlighting each method's approach to handling missing data. We additionally provide an overview of the more traditional missing data workflows and their limitations, and we discuss potential avenues for further development as well as how the missing data issue and its current solutions may generalize beyond the multi-omics context.
Affiliation(s)
- Javier E. Flores
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Daniel M. Claborne
  - Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Zachary D. Weller
  - Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
- Bobbie-Jo M. Webb-Robertson
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Katrina M. Waters
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
- Lisa M. Bramer
  - Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
  - *Correspondence: Lisa M. Bramer
2
Maghsoudi Z, Nguyen H, Tavakkoli A, Nguyen T. A comprehensive survey of the approaches for pathway analysis using multi-omics data integration. Brief Bioinform 2022; 23:6761962. [PMID: 36252928 PMCID: PMC9677478 DOI: 10.1093/bib/bbac435] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/28/2022] [Revised: 08/26/2022] [Accepted: 09/08/2022] [Indexed: 02/07/2023] Open
Abstract
Pathway analysis has been widely used to detect pathways and functions associated with complex disease phenotypes. The proliferation of this approach is due to better interpretability of its results and its higher statistical power compared with the gene-level statistics. A plethora of pathway analysis methods that utilize multi-omics setup, rather than just transcriptomics or proteomics, have recently been developed to discover novel pathways and biomarkers. Since multi-omics gives multiple views into the same problem, different approaches are employed in aggregating these views into a comprehensive biological context. As a result, a variety of novel hypotheses regarding disease ideation and treatment targets can be formulated. In this article, we review 32 such pathway analysis methods developed for multi-omics and multi-cohort data. We discuss their availability and implementation, assumptions, supported omics types and databases, pathway analysis techniques and integration strategies. A comprehensive assessment of each method's practicality, and a thorough discussion of the strengths and drawbacks of each technique will be provided. The main objective of this survey is to provide a thorough examination of existing methods to assist potential users and researchers in selecting suitable tools for their data and analysis purposes, while highlighting outstanding challenges in the field that remain to be addressed for future development.
Affiliation(s)
- Zeynab Maghsoudi
  - Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
- Ha Nguyen
  - Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
- Alireza Tavakkoli
  - Department of Computer Science and Engineering, University of Nevada, Reno, 89557, Nevada, USA
- Tin Nguyen
  - Corresponding author: Tin Nguyen, Department of Computer Science and Engineering, University of Nevada, Reno, NV, USA. Tel.: +1-775-784-6619;
3
Danchin A. In vivo, in vitro and in silico: an open space for the development of microbe-based applications of synthetic biology. Microb Biotechnol 2022; 15:42-64. [PMID: 34570957 PMCID: PMC8719824 DOI: 10.1111/1751-7915.13937] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/11/2021] [Accepted: 09/14/2021] [Indexed: 12/24/2022] Open
Abstract
Living systems are studied using three complementary approaches: living cells, cell-free systems and computer-mediated modelling. Progress in understanding, which allows researchers to create novel chassis and industrial processes, rests on a cycle that combines in vivo, in vitro and in silico studies. This design-build-test-learn loop between experiments and analyses combines physiology, genetics, biochemistry and bioinformatics in a way that keeps moving forward. Because computer-aided approaches are not directly constrained by the material nature of the entities of interest, we illustrate here how this virtuous cycle allows researchers to explore chemistry that is foreign to that of extant life, from whole chassis to novel metabolic cycles. Particular emphasis is placed on the importance of evolution.
Affiliation(s)
- Antoine Danchin
  - Kodikos Labs, Institut Cochin, 24 rue du Faubourg Saint-Jacques, Paris 75014, France
4
Lee LY, Pandey AK, Maron BA, Loscalzo J. Network medicine in Cardiovascular Research. Cardiovasc Res 2020; 117:2186-2202. [PMID: 33165538 DOI: 10.1093/cvr/cvaa321] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 06/22/2020] [Revised: 09/08/2020] [Accepted: 10/30/2020] [Indexed: 12/21/2022] Open
Abstract
The ability to generate multi-omics data coupled with deeply characterizing the clinical phenotype of individual patients promises to improve understanding of complex cardiovascular pathobiology. There remains an important disconnection between the magnitude and granularity of these data and our ability to improve phenotype-genotype correlations for complex cardiovascular diseases. This shortcoming may be due to limitations associated with traditional reductionist analytical methods, which tend to emphasize a single molecular event in the pathogenesis of diseases more aptly characterized by crosstalk between overlapping molecular pathways. Network medicine is a rapidly growing discipline that considers diseases as the consequences of perturbed interactions between multiple interconnected biological components. This powerful integrative approach has enabled a number of important discoveries in complex disease mechanisms. In this review, we introduce the basic concepts of network medicine and highlight specific examples by which this approach has accelerated cardiovascular research. We also review how network medicine is well-positioned to promote rational drug design for patients with cardiovascular diseases, with particular emphasis on advancing precision medicine.
Affiliation(s)
- Laurel Y Lee
  - Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, 75 Francis Street, Boston, MA 02115, USA
- Arvind K Pandey
  - Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, 75 Francis Street, Boston, MA 02115, USA
- Bradley A Maron
  - Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, 75 Francis Street, Boston, MA 02115, USA; Department of Cardiology, Boston VA Healthcare System, Boston, MA, USA
- Joseph Loscalzo
  - Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, 75 Francis Street, Boston, MA 02115, USA
5
Lee LYH, Loscalzo J. Network Medicine in Pathobiology. Am J Pathol 2019; 189:1311-1326. [PMID: 31014954 DOI: 10.1016/j.ajpath.2019.03.009] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 01/31/2019] [Accepted: 03/05/2019] [Indexed: 12/11/2022]
Abstract
The past decade has witnessed exponential growth in the generation of high-throughput human data across almost all known dimensions of biological systems. The discipline of network medicine has rapidly evolved in parallel, providing an unbiased, comprehensive biological framework through which to interrogate and integrate systematically these large-scale, multi-omic data to enhance our understanding of disease mechanisms and to design drugs that reflect a deep knowledge of molecular pathobiology. In this review, we discuss the key principles of network medicine and the human disease network and explore the latest applications of network medicine in this multi-omic era. We also highlight the current conceptual and technological challenges, which serve as exciting opportunities by which to improve and expand the network-based applications beyond the artificial boundaries of the current state of human pathobiology.
Affiliation(s)
- Joseph Loscalzo
  - Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
6
Misra BB, Langefeld CD, Olivier M, Cox LA. Integrated Omics: Tools, Advances, and Future Approaches. J Mol Endocrinol 2018; 62:JME-18-0055. [PMID: 30006342 DOI: 10.1530/jme-18-0055] [Citation(s) in RCA: 214] [Impact Index Per Article: 35.7] [Received: 02/24/2018] [Revised: 07/02/2018] [Accepted: 07/12/2018] [Indexed: 12/13/2022]
Abstract
With the rapid adoption of high-throughput omic approaches, such as genomics, transcriptomics, proteomics, and metabolomics, to analyze biological samples, each analysis can generate tera- to peta-byte sized data files on a daily basis. These data file sizes, together with differences in nomenclature among these data types, make the integration of these multi-dimensional omics data into a biologically meaningful context challenging. Variously named integrated omics, multi-omics, poly-omics, trans-omics, pan-omics, or shortened to just 'omics', the field faces challenges that include differences in data cleaning, normalization, biomolecule identification, data dimensionality reduction, biological contextualization, statistical validation, data storage and handling, sharing, and data archiving. The ultimate goal is the holistic realization of a 'systems biology' understanding of the biological question at hand. Commonly used approaches in these efforts are currently limited by the 3 i's: integration, interpretation, and insights. Post integration, these very large datasets aim to yield unprecedented views of cellular systems at exquisite resolution for transformative insights into processes, events, and diseases through various computational and informatics frameworks. With the continued reduction in costs and processing time for sample analyses, and the increasing types of omics datasets generated, such as glycomics, lipidomics, microbiomics, and phenomics, an increasing number of scientists in this interdisciplinary domain of bioinformatics face these challenges. We discuss recent approaches, existing tools, and potential caveats in the integration of omics datasets for the development of standardized analytical pipelines that could be adopted by the global omics research community.
Affiliation(s)
- Biswapriya B Misra
  - B Misra, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Carl D Langefeld
  - C Langefeld, Biostatistical Sciences, Wake Forest University School of Medicine, Winston-Salem, United States
- Michael Olivier
  - M Olivier, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Laura A Cox
  - L Cox, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
7
Dunn W, Burgun A, Krebs MO, Rance B. Exploring and visualizing multidimensional data in translational research platforms. Brief Bioinform 2017; 18:1044-1056. [PMID: 27585944 PMCID: PMC5862238 DOI: 10.1093/bib/bbw080] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 05/16/2016] [Revised: 07/30/2016] [Accepted: 08/03/2016] [Indexed: 01/20/2023] Open
Abstract
The unprecedented advances in technology and scientific research over the past few years have provided the scientific community with new and more complex forms of data. Large data sets collected from single groups or cross-institution consortiums containing hundreds of omic and clinical variables corresponding to thousands of patients are becoming increasingly commonplace in the research setting. Before any core analyses are performed, visualization often plays a key role in the initial phases of research, especially for projects where no initial hypotheses are dominant. Proper visualization of data at a high level facilitates researchers' ability to find trends, identify outliers and perform quality checks. In addition, research has uncovered the important role of visualization in data analysis and its implied benefits in facilitating our understanding of disease and ultimately improving patient care. In this work, we present a review of the current landscape of existing tools designed to facilitate the visualization of multidimensional data in translational research platforms. Specifically, we reviewed the biomedical literature for translational platforms allowing the visualization and exploration of clinical and omics data, and identified 11 platforms: cBioPortal, interactive genomics patient stratification explorer, Igloo-Plot, The Georgetown Database of Cancer Plus, tranSMART, an unnamed data-cube-based model supporting heterogeneous data, Papilio, Caleydo Domino, Qlucore Omics, Oracle Health Sciences Translational Research Center and OmicsOffice® powered by TIBCO Spotfire. In a health sector continuously witnessing an increase in data from multifarious sources, visualization tools used to better grasp these data will grow in importance, and we believe our work will be useful in guiding investigators in similar situations.
Affiliation(s)
- William Dunn
  - Inserm University Paris Descartes UMR_S894 Centre de Psychiatrie et Neurosciences Laboratoire de Physiopathologie des maladies Psychiatriques, Paris, France
- Anita Burgun
  - University Hospital Georges Pompidou (HEGP); AP-HP, Paris, France; INSERM; UMRS1138, Paris Descartes University, Paris, France
- Marie-Odile Krebs
  - Inserm University Paris Descartes UMR_S894 Centre de Psychiatrie et Neurosciences Laboratoire de Physiopathologie des maladies Psychiatriques, Paris, France
  - Université Paris Descartes, Faculté de Médecine Paris Descartes, Service Hospitalo Universitaire, Centre Hospitalier Sainte-Anne, CNRS GDR 3557 – Institut de Psychiatrie, Paris, France
- Bastien Rance
  - University Hospital Georges Pompidou (HEGP); AP-HP, Paris, France; INSERM; UMRS1138, Paris Descartes University, Paris, France
8
Liu B, Shen X, Pan W. Integrative and regularized principal component analysis of multiple sources of data. Stat Med 2016; 35:2235-50. [PMID: 26756854 DOI: 10.1002/sim.6866] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 02/08/2015] [Revised: 09/28/2015] [Accepted: 12/14/2015] [Indexed: 12/14/2022]
Abstract
Integration of data of disparate types has become increasingly important to enhancing the power for new discoveries by combining complementary strengths of multiple types of data. One application is to uncover tumor subtypes in human cancer research in which multiple types of genomic data are integrated, including gene expression, DNA copy number, and DNA methylation data. In spite of their successes, existing approaches based on joint latent variable models require stringent distributional assumptions and may suffer from unbalanced scales (or units) of different types of data and non-scalability of the corresponding algorithms. In this paper, we propose an alternative based on integrative and regularized principal component analysis, which is distribution-free, computationally efficient, and robust against unbalanced scales. The new method performs dimension reduction simultaneously on multiple types of data, seeking data-adaptive sparsity and scaling. As a result, in addition to feature selection for each type of data, integrative clustering is achieved. Numerically, the proposed method compares favorably against its competitors in terms of accuracy (in identifying hidden clusters), computational efficiency, and robustness against unbalanced scales. In particular, compared with a popular method, the new method was competitive in identifying tumor subtypes associated with distinct patient survival patterns when applied to a combined analysis of DNA copy number, mRNA expression, and DNA methylation data in a glioblastoma multiforme study. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Binghui Liu
  - School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024, Jilin Province, China; School of Statistics, University of Minnesota, 224 Church St. S.E., Minneapolis, 55455, MN, U.S.A.; Division of Biostatistics, University of Minnesota, 420 Delaware St. S.E., Minneapolis, 55455, MN, U.S.A.
- Xiaotong Shen
  - School of Statistics, University of Minnesota, 224 Church St. S.E., Minneapolis, 55455, MN, U.S.A.
- Wei Pan
  - Division of Biostatistics, University of Minnesota, 420 Delaware St. S.E., Minneapolis, 55455, MN, U.S.A.
9
de Oliveira Dal'Molin CG, Orellana C, Gebbie L, Steen J, Hodson MP, Chrysanthopoulos P, Plan MR, McQualter R, Palfreyman RW, Nielsen LK. Metabolic Reconstruction of Setaria italica: A Systems Biology Approach for Integrating Tissue-Specific Omics and Pathway Analysis of Bioenergy Grasses. Front Plant Sci 2016; 7:1138. [PMID: 27559337 PMCID: PMC4978736 DOI: 10.3389/fpls.2016.01138] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/01/2016] [Accepted: 07/18/2016] [Indexed: 05/19/2023]
Abstract
The urgent need for major gains in industrial crop productivity and in biofuel production from bioenergy grasses has reinforced attention on understanding C4 photosynthesis. Systems biology studies of C4 model plants may reveal important features of C4 metabolism. Here we chose foxtail millet (Setaria italica) as a C4 model plant and developed protocols to perform systems biology studies. As part of the systems approach, we have developed and used a genome-scale metabolic reconstruction in combination with multi-omics technologies to gain more insight into the metabolism of S. italica. mRNA, protein, and metabolite abundances were measured in mature and immature stem/leaf phytomers, and the multi-omics data were integrated into the metabolic reconstruction framework to capture key metabolic features in different developmental stages of the plant. RNA-Seq reads were mapped to the S. italica genome, resulting in 83% coverage of the protein-coding genes of S. italica. Besides revealing similarities and differences in central metabolism of mature and immature tissues, transcriptome analysis indicates significant gene expression of two malic enzyme isoforms (NADP-ME and NAD-ME). Although much greater expression levels of NADP-ME genes are observed and confirmed by the corresponding protein abundances in the samples, the expression of multiple genes, combined with the significant abundance of metabolites that participate in C4 metabolism of the NAD-ME and NADP-ME subtypes, suggests that S. italica may use mixed decarboxylation modes of C4 photosynthetic pathways at different plant developmental stages. The overall analysis also indicates different levels of regulation in mature and immature tissues in carbon fixation, glycolysis, the TCA cycle, and amino acid, fatty acid, lignin, and cellulose syntheses. Altogether, the multi-omics analysis reveals different biological entities and their interrelation and regulation over plant development. With this study, we demonstrated that this systems approach is powerful enough to complement the functional metabolic annotation of bioenergy grasses.
Affiliation(s)
- Cristiana G. de Oliveira Dal'Molin
  - Centre for Systems and Synthetic Biology, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
  - *Correspondence: Cristiana G. de Oliveira Dal'Molin
- Camila Orellana
  - Centre for Systems and Synthetic Biology, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
- Leigh Gebbie
  - Centre for Systems and Synthetic Biology, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
- Jennifer Steen
  - Centre for Systems and Synthetic Biology, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
- Mark P. Hodson
  - Centre for Systems and Synthetic Biology, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
  - Metabolomics Australia, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
- Panagiotis Chrysanthopoulos
  - Centre for Systems and Synthetic Biology, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
  - Metabolomics Australia, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
- Manuel R. Plan
  - Centre for Systems and Synthetic Biology, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
  - Metabolomics Australia, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
- Richard McQualter
  - Centre for Systems and Synthetic Biology, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
- Robin W. Palfreyman
  - Centre for Systems and Synthetic Biology, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
- Lars K. Nielsen
  - Centre for Systems and Synthetic Biology, Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
10
Lewis AM, Abu-Absi NR, Borys MC, Li ZJ. The use of 'Omics technology to rationally improve industrial mammalian cell line performance. Biotechnol Bioeng 2015; 113:26-38. [PMID: 26059229 DOI: 10.1002/bit.25673] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 12/09/2014] [Revised: 03/25/2015] [Accepted: 06/01/2015] [Indexed: 02/06/2023]
Abstract
Biologics represent an increasingly important class of therapeutics, with 7 of the 10 top selling drugs from 2013 being in this class. Furthermore, health authority approval of biologics in the immuno-oncology space is expected to transform treatment of patients with debilitating and deadly diseases. The growing importance of biologics in the healthcare field has also resulted in the recent approvals of several biosimilars. These recent developments, combined with pressure to provide treatments at lower costs to payers, are resulting in increasing need for the industry to quickly and efficiently develop high yielding, robust processes for the manufacture of biologics with the ability to control quality attributes within narrow distributions. Achieving this level of manufacturing efficiency and the ability to design processes capable of regulating growth, death and other cellular pathways through manipulation of media, feeding strategies, and other process parameters will undoubtedly be facilitated through systems biology tools generated in academic and public research communities. Here we discuss the intersection of systems biology, 'Omics technologies, and mammalian bioprocess sciences. Specifically, we address how these methods in conjunction with traditional monitoring techniques represent a unique opportunity to better characterize and understand host cell culture state, shift from an empirical to rational approach to process development and optimization of bioreactor cultivation processes. We summarize the following six key areas: (i) research applied to parental, non-recombinant cell lines; (ii) systems level datasets generated with recombinant cell lines; (iii) datasets linking phenotypic traits to relevant biomarkers; (iv) data depositories and bioinformatics tools; (v) in silico model development, and (vi) examples where these approaches have been used to rationally improve cellular processes. 
We critically assess relevant and state of the art research being conducted in academic, government and industrial laboratories. Furthermore, we apply our expertise in bioprocess to define a potential model for integration of these systems biology approaches into biologics development.
Affiliation(s)
- Amanda M Lewis
  - Biologics Development, Global Manufacturing and Supply, Bristol-Myers Squibb Company, 35 South Street, Hopkinton 01748, Massachusetts
- Nicholas R Abu-Absi
  - Biologics Development, Global Manufacturing and Supply, Bristol-Myers Squibb Company, 35 South Street, Hopkinton 01748, Massachusetts
- Michael C Borys
  - Biologics Development, Global Manufacturing and Supply, Bristol-Myers Squibb Company, 35 South Street, Hopkinton 01748, Massachusetts
- Zheng Jian Li
  - Biologics Development, Global Manufacturing and Supply, Bristol-Myers Squibb Company, 35 South Street, Hopkinton 01748, Massachusetts
11
Dos Santos CC. Shedding metabo'light' on the search for sepsis biomarkers. Crit Care 2015; 19:277. [PMID: 26148483 PMCID: PMC4493818 DOI: 10.1186/s13054-015-0969-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]
Abstract
The clinical presentation of severe infection with generalized inflammation is similar, if not identical, to systemic inflammation induced by sterile tissue injury. Novel models and unbiased technologies are urgently needed for biomarker identification and disease profiling in sepsis. Here we briefly review the article of Kamisoglu and colleagues in this issue of Critical Care on comparing metabolomics data from different studies to assess whether responses elicited by endotoxin recapitulate, at least in part, those seen in clinical sepsis.
Affiliation(s)
- Claudia C Dos Santos
  - Interdepartmental Division of Critical Care, The Keenan Research Centre of the Li Ka Shing Knowledge Institute of St Michael's Hospital, 30 Bond Street, Toronto, ON, M5B 1W8, Canada
12
Fondi M, Liò P. Multi-omics and metabolic modelling pipelines: challenges and tools for systems microbiology. Microbiol Res 2015; 171:52-64. [PMID: 25644953 DOI: 10.1016/j.micres.2015.01.003] [Citation(s) in RCA: 86] [Impact Index Per Article: 9.6] [Received: 10/03/2014] [Revised: 01/02/2015] [Accepted: 01/03/2015] [Indexed: 12/27/2022]
Abstract
Integrated -omics approaches are quickly spreading across microbiology research labs, leading to (i) the possibility of detecting previously hidden features of microbial cells, such as multi-scale spatial organization, and (ii) the tracing of molecular components across multiple cellular functional states. This promises to reduce the knowledge gap between genotype and phenotype and poses new challenges for computational microbiologists. We underline how the capability to unravel the complexity of microbial life will strongly depend on the integration of the huge and diverse amount of information that can be derived today from -omics experiments. In this work, we present opportunities and challenges of multi-omics data integration in current systems biology pipelines. We discuss which layers of biological information are important for biotechnological and clinical purposes, with a special focus on bacterial metabolism and modelling procedures. A general review of the most recent computational tools for performing large-scale dataset integration is also presented, together with a possible framework to guide the design of systems biology experiments by microbiologists.
Affiliation(s)
- Marco Fondi
  - Florence Computational Biology Group (ComBo), University of Florence, Via Madonna del Piano 6, Sesto Fiorentino, Florence 50019, Italy; Laboratory of Microbial and Molecular Evolution, Department of Biology, University of Florence, Via Madonna del Piano 6, Sesto Fiorentino, Florence 50019, Italy
- Pietro Liò
  - University of Cambridge, Computer Laboratory, 15 JJ Thomson Avenue, CB3 0FD Cambridge, UK
13
Abstract
Standard protocols are available for applying Phenotype MicroArray (PM) technology to characterize different groups of microorganisms. Nevertheless, several crucial steps require attention in order to obtain high-quality and reproducible data from PM, such as the choice of the Dye mix, the type and concentration of the carbon source in metabolic experiments, and the use of a buffered medium. Strains to be tested should be systematically screened for auxotrophies before starting PM experiments. Detailed protocols to obtain defined and reproducible phenotypic profiles for bacteria and yeasts are shown. Moreover, the opm R packages and the DuctApe suite, innovative software for the analysis of kinetic data produced by PM and for panphenome description, are reported.
Affiliation(s)
- Carlo Viti
- Dipartimento di Scienze delle Produzioni Agroalimentari e dell'Ambiente (DISPAA), University of Florence, P.le delle Cascine, 24, Florence, 50144, Italy.
14
Abstract
With the availability of numerous curated databases, researchers are now able to use the multitude of biological data efficiently by integrating these resources via hyperlinks and cross-references. A large proportion of bioinformatics research, however, still involves labor-intensive tasks such as fetching, parsing, and merging datasets and functional annotations from distributed multi-domain databases. This data integration issue is one of the key challenges in bioinformatics. As part of a solution to this problem, we provide an identifier conversion and data aggregation service named G-Links, which (1) gathers resource URI information from 130 databases and 30 web services in a gene-centric manner so that users can retrieve all available links about a given gene, (2) provides a RESTful API for easy retrieval of links, including facet searching based on keywords and/or predicate types, and (3) produces a variety of outputs, such as a visual HTML page, tab-delimited text, and Semantic Web formats such as Notation3 and RDF. G-Links and its documentation are available at http://link.g-language.org/.
Affiliation(s)
- Kazuki Oshita
- Institute for Advanced Biosciences, Keio University, Fujisawa, 252-0882, Japan
- Masaru Tomita
- Institute for Advanced Biosciences, Keio University, Fujisawa, 252-0882, Japan
- Kazuharu Arakawa
- Institute for Advanced Biosciences, Keio University, Fujisawa, 252-0882, Japan
15
Rajasundaram D, Selbig J, Persson S, Klie S. Co-ordination and divergence of cell-specific transcription and translation of genes in arabidopsis root cells. Ann Bot 2014; 114:1109-23. [PMID: 25149544 PMCID: PMC4195562 DOI: 10.1093/aob/mcu151] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 05/05/2023]
Abstract
BACKGROUND AND AIMS A key challenge in biology is to systematically investigate and integrate the different levels of information available at the global and single-cell level. Recent studies have elucidated spatiotemporal expression patterns of root cell types in Arabidopsis thaliana, and genome-wide quantification of polysome-associated mRNA levels, i.e. the translatome, has also been obtained for the corresponding cell types. Translational control has been increasingly recognized as an important regulatory step in protein synthesis. The aim of this study was to investigate coupled transcription and translation using publicly available root datasets. METHODS Using cell-type-specific datasets of the root transcriptome and translatome of arabidopsis, a systematic assessment was made of the degree of co-ordination and divergence between these two levels of cellular organization. The computational analysis considered correlation and variation of expression across cell types at both system levels, and also provided insights into the degree to which co-regulatory relationships are preserved between the two processes. KEY RESULTS The overall distribution of correlations between the expression and translation levels of genes is almost bimodal (mean/median value of 0.08/0.12), with a second, less strongly pronounced 'mode' at negative values of Pearson's correlation coefficient. The analysis also confirms that previously identified key transcriptional activators of secondary cell wall development display highly conserved patterns of transcription and translation across the investigated cell types. Moreover, the biological processes that display conserved and divergent patterns based on cell-type-specific expression and translation levels were identified. CONCLUSIONS In agreement with previous studies in animal cells, a large degree of uncoupling was found between the transcriptome and translatome. However, components and processes were also identified that are under co-ordinated transcriptional and translational control in plant root cells.
Affiliation(s)
- Dhivyaa Rajasundaram
- Institute of Biochemistry and Biology, University of Potsdam, Potsdam-Golm, 14476, Germany; Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany
- Joachim Selbig
- Institute of Biochemistry and Biology, University of Potsdam, Potsdam-Golm, 14476, Germany; Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany
- Staffan Persson
- Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany; ARC Centre of Excellence in Plant Cell Walls, School of Botany, University of Melbourne, Parkville, VIC 3010, Australia
- Sebastian Klie
- Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany; Targenomix GmbH, Potsdam-Golm, 14476, Germany
16
Hollinshead W, He L, Tang YJ. Biofuel production: an odyssey from metabolic engineering to fermentation scale-up. Front Microbiol 2014; 5:344. [PMID: 25071754 PMCID: PMC4088188 DOI: 10.3389/fmicb.2014.00344] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 03/18/2014] [Accepted: 06/20/2014] [Indexed: 12/21/2022] Open
Abstract
Metabolic engineering has developed microbial cell factories that can convert renewable carbon sources into biofuels. Current molecular biology tools can efficiently alter enzyme levels to redirect carbon fluxes toward biofuel production, but low product yield and titer in large bioreactors still stand in the way of cheap biofuels. There are three major roadblocks to economical biofuel production. First, carbon fluxes from the substrate dissipate into a complex metabolic network. Besides the desired product, microbial hosts direct carbon flux to synthesize biomass, overflow metabolites, and heterologous enzymes. Second, microbial hosts need to oxidize a large portion of the substrate to generate both ATP and NAD(P)H to power biofuel synthesis. High cell maintenance, triggered by the metabolic burdens of genetic modifications, can significantly affect the ATP supply. As a result, fermentation of advanced biofuels (such as biodiesel and hydrocarbons) often requires aerobic respiration to resolve the ATP shortage. Third, mass transfer limitations in large bioreactors create heterogeneous growth conditions and micro-environmental fluctuations (such as suboptimal O2 level and pH) that induce metabolic stresses and genetic instability. To overcome these limitations, fermentation engineering should merge with systems metabolic engineering. Modern fermentation engineers need to adopt new metabolic flux analysis tools that integrate kinetics, hydrodynamics, and 13C-proteomics to reveal the dynamic physiologies of the microbial host under large bioreactor conditions. Based on metabolic analyses, fermentation engineers may employ rational pathway modifications, synthetic biology circuits, and bioreactor control algorithms to optimize large-scale biofuel production.
Affiliation(s)
- Whitney Hollinshead
- Department of Energy, Environmental and Chemical Engineering, Washington University St. Louis, MO, USA
- Lian He
- Department of Energy, Environmental and Chemical Engineering, Washington University St. Louis, MO, USA
- Yinjie J Tang
- Department of Energy, Environmental and Chemical Engineering, Washington University St. Louis, MO, USA
17
You L, Zhang B, Tang YJ. Application of stable isotope-assisted metabolomics for cell metabolism studies. Metabolites 2014; 4:142-65. [PMID: 24957020 PMCID: PMC4101500 DOI: 10.3390/metabo4020142] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 01/13/2014] [Revised: 03/18/2014] [Accepted: 03/20/2014] [Indexed: 01/28/2023] Open
Abstract
The application of stable isotopes in metabolomics has facilitated the study of cell metabolism. Stable isotope-assisted metabolomics requires: (1) properly designed tracer experiments; (2) stringent sampling and quenching protocols to minimize isotopic alterations; (3) efficient metabolite separations; (4) high-resolution mass spectrometry to resolve overlapping peaks and background noise; and (5) data analysis methods and databases to decipher isotopic clusters over a broad range of mass-to-charge ratios (m/z). This paper reviews mass spectrometry-based techniques for the precise determination of metabolites and their isotopologues. It also discusses applications of isotopic approaches to track substrate utilization, identify unknown metabolites and their chemical formulas, measure metabolite concentrations, determine putative metabolic pathways, and investigate microbial community populations and their carbon assimilation patterns. In addition, 13C-metabolite fingerprinting and metabolic models can be integrated to quantify carbon fluxes (enzyme reaction rates). The fluxome, in combination with other "omics" analyses, may give systems-level insights into the regulatory mechanisms underlying gene functions. More importantly, 13C-tracer experiments significantly improve the potential of low-resolution gas chromatography-mass spectrometry (GC-MS) for broad-scope metabolism studies. We foresee isotope-assisted metabolomics becoming an indispensable tool in industrial biotechnology, environmental microbiology, and medical research.
Affiliation(s)
- Le You
- Department of Energy, Environmental and Chemical Engineering, Washington University, St. Louis, MO 63130, USA.
- Baichen Zhang
- Plant Metabolomics Group, Institute of Plant Physiology and Ecology, Shanghai Institute for Biological Sciences, CAS, Shanghai 20032, China.
- Yinjie J Tang
- Department of Energy, Environmental and Chemical Engineering, Washington University, St. Louis, MO 63130, USA.