1
Eltohamy KM, Menezes-Blackburn D, Klumpp E, Liu C, Jin J, Xing C, Lu Y, Liang X. Microbially Induced Soil Colloidal Phosphorus Mobilization Under Anoxic Conditions. Environmental Science & Technology 2024; 58:7554-7566. [PMID: 38647007 DOI: 10.1021/acs.est.3c10022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2024]
Abstract
Understanding the behavior of colloidal phosphorus (Pcoll) under anoxic conditions is pivotal for addressing soil phosphorus (P) mobilization and transport and its impact on nutrient cycling. Our study investigated Pcoll dynamics in acidic floodplain soil during a 30-day flooding event. The sudden oxic-to-anoxic shift led to a significant rise in pore-water Pcoll levels, which exceeded soluble P levels by more than 2.7-fold. Colloidal fractions transitioned from dispersed forms (<220 nm) to colloid-associated microaggregates (>220 nm), as confirmed by electron microscopy. The observed increase in colloidal sizes was paralleled by their heightened ability to form aggregates. Compared to sterile control conditions, anoxia prompted the transformation of initially dispersed colloids into larger particles through microbial activity. Curiously, the 16S rRNA and ITS microbial diversity analysis indicated that fungi were more strongly associated with anoxia-induced colloidal release than bacteria. These microbially induced shifts in Pcoll lead to its higher mobility and transport, with direct implications for P release from soil into floodwaters.
Affiliation(s)
- Kamel M Eltohamy
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
  - Department of Water Relations & Field Irrigation, National Research Centre, Dokki, Cairo 12622, Egypt
- Daniel Menezes-Blackburn
  - Department of Soils, Water and Agricultural Engineering, Sultan Qaboos University, P.O. Box 34, Al-Khoud 123, Sultanate of Oman
- Erwin Klumpp
  - Institute of Bio- and Geosciences, Agrosphere (IBG-3), Forschungszentrum Jülich GmbH, Jülich 52425, Germany
- Chunlong Liu
  - Key Laboratory of Mollisols Agroecology, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin 150081, China
- Junwei Jin
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
- Chaogang Xing
  - Analysis Center of Agrobiology and Environmental Sciences of Zhejiang University, Hangzhou 310058, China
- Yuanyuan Lu
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
- Xinqiang Liang
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
  - Key Laboratory of Mollisols Agroecology, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Harbin 150081, China
2
Chen HM, Liu JX, Liu D, Hao GF, Yang GF. Human-virus protein-protein interactions maps assist in revealing the pathogenesis of viral infection. Rev Med Virol 2024; 34:e2517. [PMID: 38282401 DOI: 10.1002/rmv.2517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 09/12/2023] [Accepted: 01/16/2024] [Indexed: 01/30/2024]
Abstract
Many significant viral infections have been recorded in human history, which have caused enormous negative impacts worldwide. Human-virus protein-protein interactions (PPIs) mediate viral infection and immune processes in the host. The identification, quantification, localization, and construction of human-virus PPIs maps are critical prerequisites for understanding the biophysical basis of the viral invasion process and characterising the framework for all protein functions. With the technological revolution and the introduction of artificial intelligence, the human-virus PPIs maps have been expanded rapidly in the past decade and shed light on solving complicated biomedical problems. However, there is still a lack of prospective insight into the field. In this work, we comprehensively review and compare the effectiveness, potential, and limitations of diverse approaches for constructing large-scale PPIs maps in human-virus, including experimental methods based on biophysics and biochemistry, databases of human-virus PPIs, computational methods based on artificial intelligence, and tools for visualising PPIs maps. The work aims to provide a toolbox for researchers, hoping to better assist in deciphering the relationship between humans and viruses.
Affiliation(s)
- Hui-Min Chen
  - National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
- Jia-Xin Liu
  - National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
- Di Liu
  - CAS Key Laboratory of Special Pathogens and Biosafety, Wuhan Institute of Virology, Center for Biosafety Mega-Science, Chinese Academy of Sciences, Wuhan, China
- Ge-Fei Hao
  - National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
- Guang-Fu Yang
  - National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
3
Kolomeets M, Desnitsky V, Kotenko I, Chechulin A. Graph Visualization: Alternative Models Inspired by Bioinformatics. Sensors (Basel, Switzerland) 2023; 23:3747. [PMID: 37050807 PMCID: PMC10099065 DOI: 10.3390/s23073747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 03/05/2023] [Accepted: 03/13/2023] [Indexed: 06/19/2023]
Abstract
Currently, the methods and means of human-machine interaction and visualization as its integral part are being increasingly developed. In various fields of scientific knowledge and technology, there is a need to find and select the most effective visualization models for various types of data, as well as to develop automation tools for the process of choosing the best visualization model for a specific case. There are many data visualization tools in various application fields, but at the same time, the main difficulty lies in presenting data of an interconnected (node-link) structure, i.e., networks. Typically, a lot of software means use graphs as the most straightforward and versatile models. To facilitate visual analysis, researchers are developing ways to arrange graph elements to make comparing, searching, and navigating data easier. However, in addition to graphs, there are many other visualization models that are less versatile but have the potential to expand the capabilities of the analyst and provide alternative solutions. In this work, we collected a variety of visualization models, which we call alternative models, to demonstrate how different concepts of information representation can be realized. We believe that adapting these models to improve the means of human-machine interaction will help analysts make significant progress in solving the problems researchers face when working with graphs.
4
Koutrouli M, Karatzas E, Papanikolopoulou K, Pavlopoulos GA. NORMA: The Network Makeup Artist - A Web Tool for Network Annotation Visualization. Genomics, Proteomics & Bioinformatics 2022; 20:578-586. [PMID: 34171457 PMCID: PMC9801029 DOI: 10.1016/j.gpb.2021.02.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/05/2020] [Revised: 07/08/2020] [Accepted: 11/20/2020] [Indexed: 01/26/2023]
Abstract
The Network Makeup Artist (NORMA) is a web tool for interactive network annotation visualization and topological analysis, able to handle multiple networks and annotations simultaneously. Precalculated annotations (e.g., Gene Ontology, Pathway enrichment, community detection, or clustering results) can be uploaded and visualized in a network, either as colored pie-chart nodes or as color-filled areas in a 2D/3D Venn-diagram-like style. In the case where no annotation exists, algorithms for automated community detection are offered. Users can adjust the network views using standard layout algorithms or allow NORMA to slightly modify them for visually better group separation. Once a network view is set, users can interactively select and highlight any group of interest in order to generate publication-ready figures. Briefly, with NORMA, users can encode three types of information simultaneously. These are 1) the network, 2) the communities or annotations of interest, and 3) node categories or expression values. Finally, NORMA offers basic topological analysis and direct topological comparison across any of the selected networks. NORMA service is available at http://norma.pavlopouloslab.info, whereas the code is available at https://github.com/PavlopoulosLab/NORMA.
Affiliation(s)
- Mikaela Koutrouli
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari 16672, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari 16672, Greece
  - Department of Informatics and Telecommunications, University of Athens, Athens 15703, Greece
- Georgios A. Pavlopoulos (corresponding author)
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari 16672, Greece
5
Baltoumas FA, Zafeiropoulou S, Karatzas E, Koutrouli M, Thanati F, Voutsadaki K, Gkonta M, Hotova J, Kasionis I, Hatzis P, Pavlopoulos GA. Biomolecule and Bioentity Interaction Databases in Systems Biology: A Comprehensive Review. Biomolecules 2021; 11:1245. [PMID: 34439912 PMCID: PMC8391349 DOI: 10.3390/biom11081245] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/28/2021] [Revised: 08/16/2021] [Accepted: 08/18/2021] [Indexed: 02/06/2023] Open
Abstract
Technological advances in high-throughput techniques have resulted in tremendous growth of complex biological datasets providing evidence regarding various biomolecular interactions. To cope with this data flood, computational approaches, web services, and databases have been implemented to deal with issues such as data integration, visualization, exploration, organization, scalability, and complexity. Nevertheless, as the number of such sets increases, it is becoming more and more difficult for an end user to know what the scope and focus of each repository is and how redundant the information between them is. Several repositories have a more general scope, while others focus on specialized aspects, such as specific organisms or biological systems. Unfortunately, many of these databases are self-contained or poorly documented and maintained. For a clearer view, in this article we provide a comprehensive categorization, comparison and evaluation of such repositories for different bioentity interaction types. We discuss most of the publicly available services based on their content, sources of information, data representation methods, user-friendliness, scope and interconnectivity, and we comment on their strengths and weaknesses. We aim for this review to reach a broad readership varying from biomedical beginners to experts and serve as a reference article in the field of Network Biology.
Affiliation(s)
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Sofia Zafeiropoulou
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Mikaela Koutrouli
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark
- Foteini Thanati
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Kleanthi Voutsadaki
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Maria Gkonta
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Joana Hotova
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Ioannis Kasionis
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Pantelis Hatzis
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
  - Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
  - Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, 11527 Athens, Greece
6
Koutrouli M, Hatzis P, Pavlopoulos GA. Exploring Networks in the STRING and Reactome Database. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11516-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022] Open
7
Sarmah DT, Bairagi N, Chatterjee S. Tracing the footsteps of autophagy in computational biology. Brief Bioinform 2020; 22:5985288. [PMID: 33201177 PMCID: PMC8293817 DOI: 10.1093/bib/bbaa286] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/12/2020] [Revised: 09/29/2020] [Accepted: 09/30/2020] [Indexed: 12/11/2022] Open
Abstract
Autophagy plays a crucial role in maintaining cellular homeostasis through the degradation of unwanted materials like damaged mitochondria and misfolded proteins. However, the contribution of autophagy toward a healthy cell environment is not only limited to the cleaning process. It also assists in protein synthesis when the system lacks the amino acids’ inflow from the extracellular environment due to diet consumptions. Reduction in the autophagy process is associated with diseases like cancer, diabetes, non-alcoholic steatohepatitis, etc., while uncontrolled autophagy may facilitate cell death. We need a better understanding of the autophagy processes and their regulatory mechanisms at various levels (molecules, cells, tissues). This demands a thorough understanding of the system with the help of mathematical and computational tools. The present review illuminates how systems biology approaches are being used for the study of the autophagy process. A comprehensive insight is provided on the application of computational methods involving mathematical modeling and network analysis in the autophagy process. Various mathematical models based on the system of differential equations for studying autophagy are covered here. We have also highlighted the significance of network analysis and machine learning in capturing the core regulatory machinery governing the autophagy process. We explored the available autophagic databases and related resources along with their attributes that are useful in investigating autophagy through computational methods. We conclude the article addressing the potential future perspective in this area, which might provide a more in-depth insight into the dynamics of autophagy.
Affiliation(s)
- Nandadulal Bairagi
  - Centre for Mathematical Biology and Ecology, Department of Mathematics, Jadavpur University, Kolkata, India
- Samrat Chatterjee
  - Translational Health Science and Technology Institute, Faridabad, India
8
Koutrouli M, Karatzas E, Paez-Espino D, Pavlopoulos GA. A Guide to Conquer the Biological Network Era Using Graph Theory. Front Bioeng Biotechnol 2020; 8:34. [PMID: 32083072 PMCID: PMC7004966 DOI: 10.3389/fbioe.2020.00034] [Citation(s) in RCA: 99] [Impact Index Per Article: 24.8] [Received: 10/11/2019] [Accepted: 01/15/2020] [Indexed: 12/24/2022] Open
Abstract
Networks are one of the most common ways to represent biological systems as complex sets of binary interactions or relations between different bioentities. In this article, we discuss the basic graph theory concepts and the various graph types, as well as the available data structures for storing and reading graphs. In addition, we describe several network properties and we highlight some of the widely used network topological features. We briefly mention the network patterns, motifs and models, and we further comment on the types of biological and biomedical networks along with their corresponding computer- and human-readable file formats. Finally, we discuss a variety of algorithms and metrics for network analyses regarding graph drawing, clustering, visualization, link prediction, perturbation, and network alignment as well as the current state-of-the-art tools. We expect this review to reach a very broad spectrum of readers varying from experts to beginners while encouraging them to enhance the field further.
Affiliation(s)
- Mikaela Koutrouli
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
  - Department of Informatics and Telecommunications, University of Athens, Athens, Greece
- David Paez-Espino
  - Lawrence Berkeley National Laboratory, Department of Energy, Joint Genome Institute, Walnut Creek, CA, United States
9
Podpečan V, Ramšak Ž, Gruden K, Toivonen H, Lavrač N. Interactive exploration of heterogeneous biological networks with Biomine Explorer. Bioinformatics 2019; 35:5385-5388. [PMID: 31233141 PMCID: PMC6954666 DOI: 10.1093/bioinformatics/btz509] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/20/2018] [Revised: 04/18/2019] [Accepted: 06/19/2019] [Indexed: 01/11/2023] Open
Abstract
SUMMARY: Biomine Explorer is a web application that enables interactive exploration of large heterogeneous biological networks constructed from selected publicly available biological knowledge sources. It is built on top of Biomine, a system which integrates cross-references from several biological databases into a large heterogeneous probabilistic network. Biomine Explorer offers user-friendly interfaces for search, visualization, exploration and manipulation as well as public and private storage of discovered subnetworks with permanent links suitable for inclusion into scientific publications. A JSON-based web API for network search queries is also available for advanced users. AVAILABILITY AND IMPLEMENTATION: Biomine Explorer is implemented as a web application, which is publicly available at https://biomine.ijs.si. Registration is not required but registered users can benefit from additional features such as private network repositories.
Affiliation(s)
- Vid Podpečan
  - Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
- Živa Ramšak
  - Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Kristina Gruden
  - Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Hannu Toivonen
  - Department of Computer Science, University of Helsinki, Helsinki, Finland
- Nada Lavrač
  - Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
  - Centre for Information Technologies and Applied Mathematics, University of Nova Gorica, Nova Gorica, Slovenia
10
Cruz A, Arrais JP, Machado P. Interactive and coordinated visualization approaches for biological data analysis. Brief Bioinform 2019; 20:1513-1523. [PMID: 29590305 DOI: 10.1093/bib/bby019] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/30/2017] [Revised: 01/24/2018] [Indexed: 12/11/2022] Open
Abstract
The field of computational biology has become largely dependent on data visualization tools to analyze the increasing quantities of data gathered through the use of new and growing technologies. Aside from the volume, which often results in large amounts of noise and complex relationships with no clear structure, the visualization of biological data sets is hindered by their heterogeneity, as data are obtained from different sources and contain a wide variety of attributes, including spatial and temporal information. This requires visualization approaches that are able to not only represent various data structures simultaneously but also provide exploratory methods that allow the identification of meaningful relationships that would not be perceptible through data analysis algorithms alone. In this article, we present a survey of visualization approaches applied to the analysis of biological data. We focus on graph-based visualizations and tools that use coordinated multiple views to represent high-dimensional multivariate data, in particular time series gene expression, protein-protein interaction networks and biological pathways. We then discuss how these methods can be used to help solve the current challenges surrounding the visualization of complex biological data sets.
Affiliation(s)
- António Cruz
  - Universidade de Coimbra Faculdade de Ciencias e Tecnologia, Departamento de Engenharia Informática
- Joel P Arrais
  - Universidade de Coimbra Faculdade de Ciencias e Tecnologia, Departamento de Engenharia Informática
- Penousal Machado
  - Universidade de Coimbra Faculdade de Ciencias e Tecnologia, Departamento de Engenharia Informática
11
Singh A, Rawlings CJ, Hassani-Pak K. KnetMaps: a BioJS component to visualize biological knowledge networks. F1000Res 2018; 7:1651. [PMID: 30755790 PMCID: PMC6347035 DOI: 10.12688/f1000research.16605.1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Accepted: 10/11/2018] [Indexed: 11/20/2022] Open
Abstract
KnetMaps is a BioJS component for the interactive visualization of biological knowledge networks. It is well suited for applications that need to visualise complementary, connected and content-rich data in a single view in order to help users to traverse pathways linking entities of interest, for example to go from genotype to phenotype. KnetMaps loads data in JSON format, visualizes the structure and content of knowledge networks using lightweight JavaScript libraries, and supports interactive touch gestures. KnetMaps uses effective visualization techniques to prevent information overload and to allow researchers to progressively build their knowledge.
Affiliation(s)
- Ajit Singh
  - Computational and Analytical Sciences, Rothamsted Research, Harpenden, AL5 2JQ, UK
- Keywan Hassani-Pak
  - Computational and Analytical Sciences, Rothamsted Research, Harpenden, AL5 2JQ, UK
12
Brandizi M, Singh A, Rawlings C, Hassani-Pak K. Towards FAIRer Biological Knowledge Networks Using a Hybrid Linked Data and Graph Database Approach. J Integr Bioinform 2018; 15:/j/jib.ahead-of-print/jib-2018-0023/jib-2018-0023.xml. [PMID: 30085931 PMCID: PMC6340125 DOI: 10.1515/jib-2018-0023] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/16/2018] [Accepted: 06/07/2018] [Indexed: 01/01/2023] Open
Abstract
The speed and accuracy of new scientific discoveries – be it by humans or artificial intelligence – depends on the quality of the underlying data and on the technology to connect, search and share the data efficiently. In recent years, we have seen the rise of graph databases and semi-formal data models such as knowledge graphs to facilitate software approaches to scientific discovery. These approaches extend work based on formalised models, such as the Semantic Web. In this paper, we present our developments to connect, search and share data about genome-scale knowledge networks (GSKN). We have developed a simple application ontology based on OWL/RDF with mappings to standard schemas. We are employing the ontology to power data access services like resolvable URIs, SPARQL endpoints, JSON-LD web APIs and Neo4j-based knowledge graphs. We demonstrate how the proposed ontology and graph databases considerably improve search and access to interoperable and reusable biological knowledge (i.e. the FAIRness data principles).
Affiliation(s)
- Marco Brandizi
  - Rothamsted Research, Computational and Analytical Sciences Department, Harpenden, AL5 2JQ, UK
- Ajit Singh
  - Rothamsted Research, Computational and Analytical Sciences Department, Harpenden, AL5 2JQ, UK
- Christopher Rawlings
  - Rothamsted Research, Computational and Analytical Sciences Department, Harpenden, AL5 2JQ, UK
- Keywan Hassani-Pak
  - Rothamsted Research, Computational and Analytical Sciences Department, Harpenden, AL5 2JQ, UK
13
Pavlopoulos GA, Kontou PI, Pavlopoulou A, Bouyioukos C, Markou E, Bagos PG. Bipartite graphs in systems biology and medicine: a survey of methods and applications. Gigascience 2018; 7:1-31. [PMID: 29648623 PMCID: PMC6333914 DOI: 10.1093/gigascience/giy014] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Received: 07/09/2017] [Revised: 01/15/2018] [Accepted: 02/13/2018] [Indexed: 11/14/2022] Open
Abstract
The latest advances in high-throughput techniques during the past decade allowed the systems biology field to expand significantly. Today, the focus of biologists has shifted from the study of individual biological components to the study of complex biological systems and their dynamics at a larger scale. Through the discovery of novel bioentity relationships, researchers reveal new information about biological functions and processes. Graphs are widely used to represent bioentities such as proteins, genes, small molecules, ligands, and others as nodes, and their connections as edges, within a network. In this review, special focus is given to the usability of bipartite graphs and their impact on the field of network biology and medicine. Furthermore, their topological properties and how these can be applied to certain biological case studies are discussed. Finally, available methodologies and software are presented, and useful insights on how bipartite graphs can shape the path toward the solution of challenging biological problems are provided.
Affiliation(s)
- Georgios A Pavlopoulos
  - Lawrence Berkeley Labs, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Panagiota I Kontou
  - University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
- Athanasia Pavlopoulou
  - Izmir International Biomedicine and Genome Institute (iBG-Izmir), Dokuz Eylül University, 35340, Turkey
- Costas Bouyioukos
  - Université Paris Diderot, Sorbonne Paris Cité, Epigenetics and Cell Fate, UMR7216, CNRS, France
- Evripides Markou
  - University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
- Pantelis G Bagos
  - University of Thessaly, Department of Computer Science and Biomedical Informatics, Papasiopoulou 2–4, Lamia, 35100, Greece
14
Empirical Comparison of Visualization Tools for Larger-Scale Network Analysis. Adv Bioinformatics 2017; 2017:1278932. [PMID: 28804499 PMCID: PMC5540468 DOI: 10.1155/2017/1278932] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 02/22/2017] [Revised: 05/14/2017] [Accepted: 06/04/2017] [Indexed: 12/19/2022] Open
Abstract
Gene expression, signal transduction, protein/chemical interactions, biomedical literature cooccurrences, and other concepts are often captured in biological network representations where nodes represent a certain bioentity and edges the connections between them. While many tools to manipulate, visualize, and interactively explore such networks already exist, only few of them can scale up and follow today's indisputable information growth. In this review, we shortly list a catalog of available network visualization tools and, from a user-experience point of view, we identify four candidate tools suitable for larger-scale network analysis, visualization, and exploration. We comment on their strengths and their weaknesses and empirically discuss their scalability, user friendliness, and postvisualization capabilities.
15
Hassani-Pak K, Rawlings C. Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes. J Integr Bioinform 2017; 14:/j/jib.ahead-of-print/jib-2016-0002/jib-2016-0002.xml. [PMID: 28609292 PMCID: PMC6042805 DOI: 10.1515/jib-2016-0002] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 01/12/2017] [Accepted: 02/16/2017] [Indexed: 02/06/2023] Open
Abstract
Genetics and “omics” studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.
16
Sun L, Zhu Y, Mahmood ASMA, Tudor CO, Ren J, Vijay-Shanker K, Chen J, Schmidt CJ. WebGIVI: a web-based gene enrichment analysis and visualization tool. BMC Bioinformatics 2017; 18:237. [PMID: 28472919 PMCID: PMC5418709 DOI: 10.1186/s12859-017-1664-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/06/2017] [Accepted: 04/28/2017] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND: A major challenge of high-throughput transcriptome studies is presenting the data to researchers in an interpretable format. In many cases, the outputs of such studies are gene lists, which are then examined for enriched biological concepts. One approach to help the researcher interpret large gene datasets is to associate genes with informative terms (iTerms) obtained from the biomedical literature using the eGIFT text-mining system. However, examining large lists of iTerm and gene pairs is a daunting task. RESULTS: We have developed WebGIVI, an interactive web-based visualization tool (http://raven.anr.udel.edu/webgivi/) to explore gene:iTerm pairs. WebGIVI was built with the Cytoscape and Data-Driven Documents (D3) JavaScript libraries and can be used to relate genes to iTerms and then visualize gene and iTerm pairs. WebGIVI can accept a gene list that is used to retrieve the gene symbols and corresponding iTerm list. This list can be submitted to visualize the gene iTerm pairs using two distinct methods: a Concept Map or a Cytoscape Network Map. In addition, WebGIVI supports uploading and visualization of any two-column tab-separated data. CONCLUSIONS: WebGIVI provides an interactive and integrated network graph of genes and iTerms that allows filtering, sorting, and grouping, which can aid biologists in developing hypotheses based on the input gene lists. In addition, WebGIVI can visualize hundreds of nodes and generate a high-resolution image, which is important for most research publications. The source code can be freely downloaded at https://github.com/sunliang3361/WebGIVI. The WebGIVI tutorial is available at http://raven.anr.udel.edu/webgivi/tutorial.php.
Affiliation(s)
- Liang Sun
- Department of Animal and Food Sciences, University of Delaware, Newark, DE USA
- Current address: Computing Service, The Samuel Roberts Noble Foundation, Ardmore, OK 73401 USA
- Yongnan Zhu
- Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, MD USA
- Department of Computer Science, Hangzhou Dianzi University, Hangzhou, 310018 Zhejiang Province People’s Republic of China
- Catalina O. Tudor
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716 USA
- Jia Ren
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711 USA
- K. Vijay-Shanker
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716 USA
- Jian Chen
- Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, MD USA
- Carl J. Schmidt
- Department of Animal and Food Sciences, University of Delaware, Newark, DE USA
17
Hassani-Pak K, Castellote M, Esch M, Hindle M, Lysenko A, Taubert J, Rawlings C. Developing integrated crop knowledge networks to advance candidate gene discovery. Appl Transl Genom 2016; 11:18-26. [PMID: 28018846 PMCID: PMC5167366 DOI: 10.1016/j.atg.2016.10.003] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 06/03/2016] [Accepted: 10/24/2016] [Indexed: 12/03/2022]
Abstract
The chances of raising crop productivity to enhance global food security would be greatly improved if we had a complete understanding of all the biological mechanisms that underpinned traits such as crop yield, disease resistance or nutrient and water use efficiency. With more crop genomes emerging all the time, we are nearer having the basic information, at the gene-level, to begin assembling crop gene catalogues and using data from other plant species to understand how the genes function and how their interactions govern crop development and physiology. Unfortunately, the task of creating such a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed in myriad databases in a variety of data formats with variable quality and coverage. In this paper we present a general approach for building genome-scale knowledge networks that provide a unified representation of heterogeneous but interconnected datasets to enable effective knowledge mining and gene discovery. We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species, wheat and barley. We present the global characteristics of such knowledge networks and with an example linking a seed size phenotype to a barley WRKY transcription factor orthologous to TTG2 from Arabidopsis, we illustrate the value of integrated data in biological knowledge discovery. The software we have developed (www.ondex.org) and the knowledge resources (http://knetminer.rothamsted.ac.uk) we have created are all open-source and provide a first step towards systematic and evidence-based gene discovery in order to facilitate crop improvement.
Affiliation(s)
- Keywan Hassani-Pak
- Rothamsted Research, Department of Computational and Systems Biology, UK
- Martin Castellote
- Rothamsted Research, Department of Computational and Systems Biology, UK
- INTA EEA-Balcarce, Laboratory of Agrobiotechnology, Argentina
- Maria Esch
- Rothamsted Research, Department of Computational and Systems Biology, UK
- Matthew Hindle
- Rothamsted Research, Department of Computational and Systems Biology, UK
- Artem Lysenko
- Rothamsted Research, Department of Computational and Systems Biology, UK
- Jan Taubert
- Rothamsted Research, Department of Computational and Systems Biology, UK
18
Mısırlı G, Hallinan J, Pocock M, Lord P, McLaughlin JA, Sauro H, Wipat A. Data Integration and Mining for Synthetic Biology Design. ACS Synth Biol 2016; 5:1086-1097. [PMID: 27110921 DOI: 10.1021/acssynbio.5b00295] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 02/08/2023]
Abstract
One aim of synthetic biologists is to create novel and predictable biological systems from simpler modular parts. This approach is currently hampered by a lack of well-defined and characterized parts and devices. However, there is a wealth of existing biological information, which can be used to identify and characterize biological parts, and their design constraints in the literature and numerous biological databases. However, this information is spread among these databases in many different formats. New computational approaches are required to make this information available in an integrated format that is more amenable to data mining. A tried and tested approach to this problem is to map disparate data sources into a single data set, with common syntax and semantics, to produce a data warehouse or knowledge base. Ontologies have been used extensively in the life sciences, providing this common syntax and semantics as a model for a given biological domain, in a fashion that is amenable to computational analysis and reasoning. Here, we present an ontology for applications in synthetic biology design, SyBiOnt, which facilitates the modeling of information about biological parts and their relationships. SyBiOnt was used to create the SyBiOntKB knowledge base, incorporating and building upon existing life sciences ontologies and standards. The reasoning capabilities of ontologies were then applied to automate the mining of biological parts from this knowledge base. We propose that this approach will be useful to speed up synthetic biology design and ultimately help facilitate the automation of the biological engineering life cycle.
Affiliation(s)
- Göksel Mısırlı
- School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
- Jennifer Hallinan
- School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
- Matthew Pocock
- School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
- Turing Ate My Hamster Ltd, NE27 0RT Newcastle upon Tyne, United Kingdom
- Phillip Lord
- School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
- Herbert Sauro
- Department of Bioengineering, University of Washington, Seattle, Washington 98105, United States
- Anil Wipat
- School of Computing Science, Newcastle University, NE1 7RU Newcastle upon Tyne, United Kingdom
19
Lysenko A, Roznovăţ IA, Saqi M, Mazein A, Rawlings CJ, Auffray C. Representing and querying disease networks using graph databases. BioData Min 2016; 9:23. [PMID: 27462371 PMCID: PMC4960687 DOI: 10.1186/s13040-016-0102-8] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 12/16/2015] [Accepted: 06/30/2016] [Indexed: 11/19/2022] Open
Abstract
Background Systems biology experiments generate large volumes of data of multiple modalities and this information presents a challenge for integration due to a mix of complexity together with rich semantics. Here, we describe how graph databases provide a powerful framework for storage, querying and envisioning of biological data. Results We show how graph databases are well suited for the representation of biological information, which is typically highly connected, semi-structured and unpredictable. We outline an application case that uses the Neo4j graph database for building and querying a prototype network to provide biological context to asthma related genes. Conclusions Our study suggests that graph databases provide a flexible solution for the integration of multiple types of biological data and facilitate exploratory data mining to support hypothesis generation.
Affiliation(s)
- Artem Lysenko
- Rothamsted Research, Harpenden, West Common, Hertfordshire, AL5 2JQ UK
- Irina A Roznovăţ
- European Institute for Systems Biology and Medicine (EISBM), CIRI UMR CNRS 5308, CNRS-ENS-UCBL-INSERM, Lyon, France
- Mansoor Saqi
- European Institute for Systems Biology and Medicine (EISBM), CIRI UMR CNRS 5308, CNRS-ENS-UCBL-INSERM, Lyon, France
- Alexander Mazein
- European Institute for Systems Biology and Medicine (EISBM), CIRI UMR CNRS 5308, CNRS-ENS-UCBL-INSERM, Lyon, France
- Charles Auffray
- European Institute for Systems Biology and Medicine (EISBM), CIRI UMR CNRS 5308, CNRS-ENS-UCBL-INSERM, Lyon, France
20
Batley J, Edwards D. The application of genomics and bioinformatics to accelerate crop improvement in a changing climate. CURRENT OPINION IN PLANT BIOLOGY 2016; 30:78-81. [PMID: 26926905 DOI: 10.1016/j.pbi.2016.02.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 10/07/2015] [Revised: 02/02/2016] [Accepted: 02/04/2016] [Indexed: 05/22/2023]
Abstract
The changing climate and growing global population will increase pressure on our ability to produce sufficient food. The breeding of novel crops and the adaptation of current crops to the new environment are required to ensure continued food production. Advances in genomics offer the potential to accelerate the genomics-based breeding of crop plants. However, relating genomic data to climate-related agronomic traits for use in breeding remains a huge challenge, and one which will require coordination of diverse skills and expertise. Bioinformatics, when combined with genomics, has the potential to help maintain food security in the face of climate change through the accelerated production of climate-ready crops.
Affiliation(s)
- Jacqueline Batley
- School of Plant Biology and Institute of Agriculture, University of Western Australia, Crawley 6009, Australia
- David Edwards
- School of Plant Biology and Institute of Agriculture, University of Western Australia, Crawley 6009, Australia
21
Mullen J, Cockell SJ, Tipney H, Woollard PM, Wipat A. Mining integrated semantic networks for drug repositioning opportunities. PeerJ 2016; 4:e1558. [PMID: 26844016 PMCID: PMC4736989 DOI: 10.7717/peerj.1558] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 09/30/2015] [Accepted: 12/11/2015] [Indexed: 11/20/2022] Open
Abstract
Current research and development approaches to drug discovery have become less fruitful and more costly. One alternative paradigm is that of drug repositioning. Many marketed examples of repositioned drugs have been identified through serendipitous or rational observations, highlighting the need for more systematic methodologies to tackle the problem. Systems level approaches have the potential to enable the development of novel methods to understand the action of therapeutic compounds, but require an integrative approach to biological data. Integrated networks can facilitate systems level analyses by combining multiple sources of evidence to provide a rich description of drugs, their targets and their interactions. Classically, such networks can be mined manually where a skilled person is able to identify portions of the graph (semantic subgraphs) that are indicative of relationships between drugs and highlight possible repositioning opportunities. However, this approach is not scalable. Automated approaches are required to systematically mine integrated networks for these subgraphs and bring them to the attention of the user. We introduce a formal framework for the definition of integrated networks and their associated semantic subgraphs for drug interaction analysis and describe DReSMin, an algorithm for mining semantically-rich networks for occurrences of a given semantic subgraph. This algorithm allows instances of complex semantic subgraphs that contain data about putative drug repositioning opportunities to be identified in a computationally tractable fashion, scaling close to linearly with network data. We demonstrate the utility of our approach by mining an integrated drug interaction network built from 11 sources. This work identified and ranked 9,643,061 putative drug-target interactions, showing a strong correlation between highly scored associations and those supported by literature. We discuss the 20 top ranked associations in more detail, of which 14 are novel and 6 are supported by the literature. We also show that our approach better prioritizes known drug-target interactions than other state-of-the-art approaches for predicting such interactions.
Affiliation(s)
- Joseph Mullen
- Interdisciplinary Computing and Complex BioSystems Research Group, School of Computing Science, University of Newcastle-upon-Tyne, Newcastle upon Tyne, United Kingdom
- Simon J Cockell
- Bioinformatics Support Unit, University of Newcastle-upon-Tyne, United Kingdom
- Hannah Tipney
- Computational Biology, Target Sciences, GSK R&D, GlaxoSmithKline, Stevenage, Hertfordshire, United Kingdom
- Peter M Woollard
- Computational Biology, Target Sciences, GSK R&D, GlaxoSmithKline, Stevenage, Hertfordshire, United Kingdom
- Anil Wipat
- Interdisciplinary Computing and Complex BioSystems Research Group, School of Computing Science, University of Newcastle-upon-Tyne, Newcastle upon Tyne, United Kingdom
22
Al-Harazi O, Al Insaif S, Al-Ajlan MA, Kaya N, Dzimiri N, Colak D. Integrated Genomic and Network-Based Analyses of Complex Diseases and Human Disease Network. J Genet Genomics 2015; 43:349-67. [PMID: 27318646 DOI: 10.1016/j.jgg.2015.11.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 06/11/2015] [Revised: 10/22/2015] [Accepted: 11/20/2015] [Indexed: 12/16/2022]
Abstract
A disease phenotype generally reflects various pathobiological processes that interact in a complex network. The highly interconnected nature of the human protein interaction network (interactome) indicates that, at the molecular level, it is difficult to consider diseases as being independent of one another. Recently, genome-wide molecular measurements, data mining and bioinformatics approaches have provided the means to explore human diseases from a molecular basis. The exploration of diseases and a system of disease relationships based on the integration of genome-wide molecular data with the human interactome could offer a powerful perspective for understanding the molecular architecture of diseases. Recently, subnetwork markers have proven to be more robust and reliable than individual biomarker genes selected based on gene expression profiles alone, and achieve higher accuracy in disease classification. We have applied one of these methodologies to idiopathic dilated cardiomyopathy (IDCM) data that we have generated using a microarray and identified significant subnetworks associated with the disease. In this paper, we review the recent endeavours in this direction, and summarize the existing methodologies and computational tools for network-based analysis of complex diseases and molecular relationships among apparently different disorders and human disease network. We also discuss the future research trends and topics of this promising field.
Affiliation(s)
- Olfat Al-Harazi
- Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
- Sadiq Al Insaif
- Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
- Monirah A Al-Ajlan
- Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia; College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia
- Namik Kaya
- Department of Genetics, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
- Nduna Dzimiri
- Department of Genetics, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
- Dilek Colak
- Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
23
Lapatas V, Stefanidakis M, Jimenez RC, Via A, Schneider MV. Data integration in biological research: an overview. JOURNAL OF BIOLOGICAL RESEARCH (THESSALONIKE, GREECE) 2015; 22:9. [PMID: 26336651 PMCID: PMC4557916 DOI: 10.1186/s40709-015-0032-5] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 04/20/2015] [Accepted: 08/10/2015] [Indexed: 11/16/2022]
Abstract
Data sharing, integration and annotation are essential to ensure the reproducibility of the analysis and interpretation of the experimental findings. Often these activities are perceived as a role that bioinformaticians and computer scientists have to take with little or no input from the experimental biologist. On the contrary, biological researchers, being the producers and often the end users of such data, have a big role in enabling biological data integration. The quality and usefulness of data integration depend on the existence and adoption of standards, shared formats, and mechanisms that are suitable for biological researchers to submit and annotate the data, so that the data can be easily searched, conveniently linked and consequently used for further biological analysis and discovery. Here, we provide background on what data integration is from a computational science point of view, how it has been applied to biological research, which key aspects contributed to its success, and future directions.
Affiliation(s)
- Vasileios Lapatas
- Department of Informatics, Ionian University, 7 Tsirigoti Square, Corfu, 49100 Greece
- Michalis Stefanidakis
- Department of Informatics, Ionian University, 7 Tsirigoti Square, Corfu, 49100 Greece
- Allegra Via
- Biocomputing Group, Sapienza University, Piazzale Aldo Moro 5, Rome, 00185 Italy
24
Athanasiadis EI, Bourdakou MM, Spyrou GM. ZoomOut: Analyzing Multiple Networks as Single Nodes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015; 12:1213-1216. [PMID: 26451833 DOI: 10.1109/tcbb.2015.2424411] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 06/05/2023]
Abstract
We have developed the ZoomOut web server in order to provide the research community with a tool for analysis, visualization and clustering of networks as a super network, based on their calculated feature properties. Networks can be analysed and be further treated as single nodes in a super network that describes their relations. Specifically, the user interface is divided into three main sections: the Workspace, the Networks Feature Calculations and the Clustering Networks section. In the Workspace section, users are able to upload and manage multiple networks for further processing. In the Networks Feature Calculations section, a variety of network properties are calculated as features for each uploaded network. In the Clustering Networks section, users are able to apply clustering by selecting from the list of previously calculated features. All processed networks can also be visualized as a super interactive network, where interconnections among networks are based on the calculated clustering distances. To the best of our knowledge, this is the first available web-service that allows users to manage, quantify and visualize multiple networks at the same time, handling them as parts of a larger network with properties calculated in an upper scale. The ZoomOut web-application is available at http://bioserver-3.bioacademy.gr/Bioserver/ZoomOut.
25
Pavlopoulos GA, Malliarakis D, Papanikolaou N, Theodosiou T, Enright AJ, Iliopoulos I. Visualizing genome and systems biology: technologies, tools, implementation techniques and trends, past, present and future. Gigascience 2015; 4:38. [PMID: 26309733 PMCID: PMC4548842 DOI: 10.1186/s13742-015-0077-2] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 05/26/2015] [Accepted: 08/03/2015] [Indexed: 01/31/2023] Open
Abstract
"A picture is worth a thousand words." This widely used adage sums up in a few words the notion that a successful visual representation of a concept should enable easy and rapid absorption of large amounts of information. Although, in general, the notion of capturing complex ideas using images is very appealing, would 1000 words be enough to describe the unknown in a research field such as the life sciences? Life sciences is one of the biggest generators of enormous datasets, mainly as a result of recent and rapid technological advances; their complexity can make these datasets incomprehensible without effective visualization methods. Here we discuss the past, present and future of genomic and systems biology visualization. We briefly comment on many visualization and analysis tools and the purposes that they serve. We focus on the latest libraries and programming languages that enable more effective, efficient and faster approaches for visualizing biological concepts, and also comment on the future human-computer interaction trends that would further enhance visualization.
Affiliation(s)
- Georgios A Pavlopoulos
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
- Nikolas Papanikolaou
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
- Theodosis Theodosiou
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
- Anton J Enright
- EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SD UK
- Ioannis Iliopoulos
- Bioinformatics & Computational Biology Laboratory, Division of Basic Sciences, University of Crete, Medical School, 70013 Heraklion, Crete Greece
26
Castaneda C, Nalley K, Mannion C, Bhattacharyya P, Blake P, Pecora A, Goy A, Suh KS. Clinical decision support systems for improving diagnostic accuracy and achieving precision medicine. J Clin Bioinforma 2015; 5:4. [PMID: 25834725 PMCID: PMC4381462 DOI: 10.1186/s13336-015-0019-3] [Citation(s) in RCA: 123] [Impact Index Per Article: 13.7] [Received: 12/04/2014] [Accepted: 02/27/2015] [Indexed: 01/08/2023] Open
Abstract
As research laboratories and clinics collaborate to achieve precision medicine, both communities are required to understand mandated electronic health/medical record (EHR/EMR) initiatives that will be fully implemented in all clinics in the United States by 2015. Stakeholders will need to evaluate current record keeping practices and optimize and standardize methodologies to capture nearly all information in digital format. Collaborative efforts from academic and industry sectors are crucial to achieving higher efficacy in patient care while minimizing costs. Currently existing digitized data and information are present in multiple formats and are largely unstructured. In the absence of a universally accepted management system, departments and institutions continue to generate silos of information. As a result, invaluable and newly discovered knowledge is difficult to access. To accelerate biomedical research and reduce healthcare costs, clinical and bioinformatics systems must employ common data elements to create structured annotation forms enabling laboratories and clinics to capture sharable data in real time. Conversion of these datasets to knowable information should be a routine institutionalized process. New scientific knowledge and clinical discoveries can be shared via integrated knowledge environments defined by flexible data models and extensive use of standards, ontologies, vocabularies, and thesauri. In the clinical setting, aggregated knowledge must be displayed in user-friendly formats so that physicians, non-technical laboratory personnel, nurses, data/research coordinators, and end-users can enter data, access information, and understand the output. The effort to connect astronomical numbers of data points, including ‘-omics’-based molecular data, individual genome sequences, experimental data, patient clinical phenotypes, and follow-up data is a monumental task. Roadblocks to this vision of integration and interoperability include ethical, legal, and logistical concerns. Ensuring data security and protection of patient rights while simultaneously facilitating standardization is paramount to maintaining public support. The capabilities of supercomputing need to be applied strategically. A standardized, methodological implementation must be applied to developed artificial intelligence systems with the ability to integrate data and information into clinically relevant knowledge. Ultimately, the integration of bioinformatics and clinical data in a clinical decision support system promises precision medicine and cost effective and personalized patient care.
Affiliation(s)
- Christian Castaneda
- Genomics and Biomarkers Program, Hackensack University Medical Center, Hackensack, NJ 07601 USA
- Kip Nalley
- Sophic Alliance, 2275 Research Blvd., Suite 500, Rockville, MD 20850 USA
- Ciaran Mannion
- Department of Pathology, Hackensack University Medical Center, Hackensack, NJ 07601 USA
- Pritish Bhattacharyya
- Department of Pathology, Hackensack University Medical Center, Hackensack, NJ 07601 USA
- Patrick Blake
- Sophic Alliance, 2275 Research Blvd., Suite 500, Rockville, MD 20850 USA
- Andrew Pecora
- John Theurer Cancer Center, Hackensack University Medical Center, D. Jurist Research Building, 40 Prospect Avenue, Hackensack, NJ 07601 USA
- Andre Goy
- John Theurer Cancer Center, Hackensack University Medical Center, D. Jurist Research Building, 40 Prospect Avenue, Hackensack, NJ 07601 USA
- K Stephen Suh
- Genomics and Biomarkers Program, Hackensack University Medical Center, Hackensack, NJ 07601 USA; John Theurer Cancer Center, Hackensack University Medical Center, D. Jurist Research Building, 40 Prospect Avenue, Hackensack, NJ 07601 USA
27
Tuszynski JA, Winter P, White D, Tseng CY, Sahu KK, Gentile F, Spasevska I, Omar SI, Nayebi N, Churchill CD, Klobukowski M, El-Magd RMA. Mathematical and computational modeling in biology at multiple scales. Theor Biol Med Model 2014; 11:52. [PMID: 25542608 PMCID: PMC4396153 DOI: 10.1186/1742-4682-11-52] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 10/20/2014] [Accepted: 11/25/2014] [Indexed: 01/08/2023] Open
Abstract
A variety of topics are reviewed in the area of mathematical and computational modeling in biology, covering the range of scales from populations of organisms to electrons in atoms. The use of maximum entropy as an inference tool in the fields of biology and drug discovery is discussed. Mathematical and computational methods and models in the areas of epidemiology, cell physiology and cancer are surveyed. The technique of molecular dynamics is covered, with special attention to force fields for protein simulations and methods for the calculation of solvation free energies. The utility of quantum mechanical methods in biophysical and biochemical modeling is explored. The field of computational enzymology is examined.
Affiliation(s)
- Jack A Tuszynski
- Department of Physics and Department of Oncology, University of Alberta, Edmonton, Canada.
|
28
|
Hanley SJ, Karp A. Genetic strategies for dissecting complex traits in biomass willows (Salix spp.). TREE PHYSIOLOGY 2014; 34:1167-80. [PMID: 24218244 DOI: 10.1093/treephys/tpt089] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 05/07/2023]
Abstract
Willows are highly diverse catkin-bearing trees and shrubs of the genus Salix. They occur in many growth forms, from tall trees to creeping alpines, and successfully occupy a wide variety of ecological niches. Shrubby willows (sub-genus Vetrix) have many characteristics that render them suited to cultivation in much faster growth cycles than conventional forestry. They respond well to coppicing, can be propagated vegetatively as cuttings and achieve rapid growth with low fertilizer inputs. As a result, willows grown as short rotation coppice are now among the leading commercially grown biomass crops in temperate regions. However, although willows have a long history of cultivation for traditional uses, their industrial use is relatively recent and, compared with major arable crops, they are largely undomesticated. Breeding programmes initiated to improve willow as a biomass crop achieved a doubling of yields within a period of <15 years. These advances were made by selecting for stem characteristics (height and diameter) and coppicing response (shoot number and shoot vigour), as well as resistance to pests, diseases and environmental stress, with little or no knowledge of the genetic basis of these traits. Genetics and genomics, combined with extensive phenotyping, have substantially improved our understanding of the basis of biomass traits in willow for more targeted breeding via marker-assisted selection. Here, we present the strategy we have adopted in which a genetic-based approach was used to dissect complex traits into more defined components for molecular breeding and gene discovery.
Affiliation(s)
- Steven J Hanley
- Department of AgroEcology, Rothamsted Research, Cropping Carbon Institute Programme, Harpenden, Hertfordshire AL5 2JQ, UK
| | - Angela Karp
- Department of AgroEcology, Rothamsted Research, Cropping Carbon Institute Programme, Harpenden, Hertfordshire AL5 2JQ, UK
|
29
|
Abstract
Systems biology has gained a tremendous amount of interest in the last few years. This is partly due to the realization that traditional approaches focusing only on a few molecules at a time cannot describe the impact of aberrant or modulated molecular environments across a whole system. Furthermore, a hypothesis-driven study aims to prove or disprove its postulations, whereas a hypothesis-free systems approach can yield an unbiased and novel testable hypothesis as an end-result. This latter approach foregoes assumptions which predict how a biological system should react to an altered microenvironment within a cellular context, across a tissue or impacting on distant organs. Additionally, re-use of existing data by systematic data mining and re-stratification, one of the cornerstones of integrative systems biology, is also gaining attention. While tremendous efforts using a systems methodology have already yielded excellent results, it is apparent that a lack of suitable analytic tools and purpose-built databases poses a major bottleneck in applying a systematic workflow. This review addresses the current approaches used in systems analysis and obstacles often encountered in large-scale data analysis and integration which tend to go unnoticed, but have a direct impact on the final outcome of a systems approach. Its wide applicability, ranging from basic research, disease descriptors, pharmacological studies, to personalized medicine, makes this emerging approach well suited to address biological and medical questions where conventional methods are not ideal.
Affiliation(s)
- Scott W Robinson
- Institute of Cardiovascular and Medical Sciences, University of Glasgow, BHF Glasgow Cardiovascular Research Centre, 126 University Place, Glasgow G12 8TA, UK
| | - Marco Fernandes
- Institute of Cardiovascular and Medical Sciences, University of Glasgow, BHF Glasgow Cardiovascular Research Centre, 126 University Place, Glasgow G12 8TA, UK
| | - Holger Husi
- Institute of Cardiovascular and Medical Sciences, University of Glasgow, BHF Glasgow Cardiovascular Research Centre, 126 University Place, Glasgow G12 8TA, UK
|
30
|
Li H, Liu C. 3DProIN: Protein-Protein Interaction Networks and Structure Visualization. AMERICAN JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY 2014; 2:32-37. [PMID: 25664223 DOI: 10.7726/ajbcb.2014.1003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
3DProIN is a computational tool to visualize protein-protein interaction networks in both two-dimensional (2D) and three-dimensional (3D) views. It models protein-protein interactions in a graph and explores the biologically relevant features of the tertiary structures of each protein in the network. Properties such as color, shape and name of each node (protein) of the network can be edited in either 2D or 3D views. 3DProIN is implemented using 3D Java and C programming languages. An internet crawling technique is also used to dynamically retrieve and parse protein interactions from the Protein Data Bank (PDB). It is a Java applet component embedded in a web page and can be used on different platforms, including Linux, Mac and Windows, using web browsers such as Firefox, Internet Explorer, Chrome and Safari. It was also converted into a Mac app and submitted to the App Store as a free app. Mac users can also download the app from our website. 3DProIN is available for academic research at http://bicompute.appspot.com.
Affiliation(s)
- Hui Li
- Department of Systems and Computer Science, Howard University, Washington, DC 20059, USA
| | - Chunmei Liu
- Department of Systems and Computer Science, Howard University, Washington, DC 20059, USA
|
31
|
Horn F, Rittweger M, Taubert J, Lysenko A, Rawlings C, Guthke R. Interactive exploration of integrated biological datasets using context-sensitive workflows. Front Genet 2014; 5:21. [PMID: 24600467 PMCID: PMC3929842 DOI: 10.3389/fgene.2014.00021] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/26/2013] [Accepted: 01/21/2014] [Indexed: 11/16/2022] Open
Abstract
Network inference utilizes experimental high-throughput data for the reconstruction of molecular interaction networks where new relationships between the network entities can be predicted. Despite the increasing amount of experimental data, the parameters of each modeling technique cannot be optimized based on the experimental data alone; it must also be qualitatively assessed whether the components of the resulting network describe the experimental setting. Candidate list prioritization and validation build upon data integration and data visualization. The application of tools supporting this procedure is limited to the exploration of smaller information networks because the display and interpretation of large amounts of information are challenging in terms of computational effort and user experience. The Ondex software framework was extended with customizable context-sensitive menus which allow additional integration and data analysis options for a selected set of candidates during interactive data exploration. We provide new functionalities for on-the-fly data integration using InterProScan, PubMed Central literature search, and sequence-based homology search. We applied the Ondex system to the integration of publicly available data for Aspergillus nidulans and analyzed transcriptome data. We demonstrate the advantages of our approach by proposing new hypotheses for the functional annotation of specific genes of differentially expressed fungal gene clusters. Our extension of the Ondex framework makes it possible to overcome the separation between data integration and interactive analysis. More specifically, computationally demanding calculations can be performed on selected sub-networks without losing any information from the whole network. Furthermore, our extensions allow for direct access to online biological databases which helps to keep the integrated information up-to-date.
Affiliation(s)
- Fabian Horn
- Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute Jena, Germany
| | - Martin Rittweger
- Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute Jena, Germany
| | - Jan Taubert
- Department of Computational and Systems Biology, Rothamsted Research Harpenden, UK
| | - Artem Lysenko
- Department of Computational and Systems Biology, Rothamsted Research Harpenden, UK
| | - Christopher Rawlings
- Department of Computational and Systems Biology, Rothamsted Research Harpenden, UK
| | - Reinhard Guthke
- Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute Jena, Germany
|
32
|
Pavlopoulos GA, Promponas VJ, Ouzounis CA, Iliopoulos I. Biological information extraction and co-occurrence analysis. Methods Mol Biol 2014; 1159:77-92. [PMID: 24788262 DOI: 10.1007/978-1-4939-0709-0_5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/03/2022]
Abstract
Nowadays, it is possible to identify terms corresponding to biological entities within passages in biomedical text corpora: critically, their potential relationships then need to be detected. These relationships are typically detected by co-occurrence analysis, revealing associations between bioentities through their coexistence in single sentences and/or entire abstracts. These associations implicitly define networks, whose nodes represent terms/bioentities/concepts being connected by relationship edges; edge weights might represent confidence for these semantic connections. This chapter provides a review of current methods for co-occurrence analysis, focusing on data storage, analysis, and representation. We highlight scenarios of these approaches implemented by useful tools for information extraction and knowledge inference in the field of systems biology. We illustrate the practical utility of two online resources providing services of this type, namely STRING and BioTextQuest, concluding with a discussion of current challenges and future perspectives in the field.
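The sentence-level co-occurrence idea described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function name `cooccurrence_network`, the naive whitespace tokenization, and the toy sentences are assumptions made only for illustration. Edge weights are simply raw co-occurrence counts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(sentences, terms):
    """Count how often each pair of terms appears in the same sentence."""
    lowered_terms = sorted({t.lower() for t in terms})
    edge_weights = Counter()
    for sentence in sentences:
        # Naive tokenization (an assumption): lowercase, drop . and , then split.
        tokens = set(sentence.lower().replace(".", " ").replace(",", " ").split())
        present = [t for t in lowered_terms if t in tokens]
        # Every pair of terms found in this sentence gains one unit of weight.
        for pair in combinations(present, 2):
            edge_weights[pair] += 1
    return edge_weights


# Hypothetical toy corpus; the sentences are illustrative, not curated facts.
sentences = [
    "BRCA1 interacts with RAD51 during DNA repair.",
    "RAD51 and BRCA1 are recruited together.",
    "TP53 regulates BRCA1 expression.",
]
network = cooccurrence_network(sentences, ["BRCA1", "RAD51", "TP53"])
```

Each key of `network` is an alphabetically ordered pair of lowercased terms (an edge), and the value is its weight. A real system would need proper entity recognition and a statistical confidence score rather than raw counts.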
Affiliation(s)
- Georgios A Pavlopoulos
- Division of Basic Sciences, University of Crete Medical School, Heraklion, 71110, Greece
|
33
|
Taubert J, Hassani-Pak K, Castells-Brooke N, Rawlings CJ. Ondex Web: web-based visualization and exploration of heterogeneous biological networks. Bioinformatics 2013; 30:1034-5. [PMID: 24363379 PMCID: PMC3967113 DOI: 10.1093/bioinformatics/btt740] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022] Open
Abstract
SUMMARY Ondex Web is a new web-based implementation of the network visualization and exploration tools from the Ondex data integration platform. New features such as context-sensitive menus and annotation tools provide users with intuitive ways to explore and manipulate the appearance of heterogeneous biological networks. Ondex Web is open source, written in Java and can be easily embedded into Web sites as an applet. Ondex Web supports loading data from a variety of network formats, such as XGMML, NWB, Pajek and OXL. AVAILABILITY AND IMPLEMENTATION http://ondex.rothamsted.ac.uk/OndexWeb.
Affiliation(s)
- Jan Taubert
- Rothamsted Research, Computational and Systems Biology, Harpenden, AL5 2JQ, UK
|
34
|
Lysenko A, Urban M, Bennett L, Tsoka S, Janowska-Sejda E, Rawlings CJ, Hammond-Kosack KE, Saqi M. Network-based data integration for selecting candidate virulence associated proteins in the cereal infecting fungus Fusarium graminearum. PLoS One 2013; 8:e67926. [PMID: 23861834 PMCID: PMC3701590 DOI: 10.1371/journal.pone.0067926] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 12/07/2012] [Accepted: 05/23/2013] [Indexed: 11/19/2022] Open
Abstract
The identification of virulence genes in plant pathogenic fungi is important for understanding the infection process, host range and for developing control strategies. The analysis of already verified virulence genes in phytopathogenic fungi in the context of integrated functional networks can give clues about the underlying mechanisms and pathways directly or indirectly linked to fungal pathogenicity and can suggest new candidates for further experimental investigation, using a 'guilt by association' approach. Here we study 133 genes in the globally important Ascomycete fungus Fusarium graminearum that have been experimentally tested for their involvement in virulence. An integrated network that combines information from gene co-expression, predicted protein-protein interactions and sequence similarity was employed and, using 100 genes known to be required for virulence, we found a total of 215 new proteins potentially associated with virulence of which 29 are annotated as hypothetical proteins. The majority of these potential virulence genes are located in chromosomal regions known to have a low recombination frequency. We have also explored the taxonomic diversity of these candidates and found 25 sequences, which are likely to be fungal specific. We discuss the biological relevance of a few of the potentially novel virulence associated genes in detail.
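A 'guilt by association' selection of the kind mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the authors' method: the function name `guilt_by_association`, the `min_links` threshold, and the toy network are all assumptions. An unlabelled node becomes a candidate when it is directly linked to enough nodes already known to have the property of interest.

```python
def guilt_by_association(edges, known, min_links=2):
    """Return unlabelled nodes linked to at least `min_links` known nodes."""
    known = set(known)
    links = {}
    for a, b in edges:
        # Treat each edge as undirected by checking both orientations.
        for node, neighbour in ((a, b), (b, a)):
            if node not in known and neighbour in known:
                links.setdefault(node, set()).add(neighbour)
    return {node: len(n) for node, n in links.items() if len(n) >= min_links}


# Hypothetical toy network; identifiers are placeholders, not real genes.
edges = [("g1", "v1"), ("g1", "v2"), ("g2", "v1"), ("g3", "g2"), ("v1", "v2")]
candidates = guilt_by_association(edges, known=["v1", "v2"])
```

Here only `g1` qualifies, because it is connected to both known nodes. An integrated network like the one in the study would combine several evidence types per edge, which this sketch ignores.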
Affiliation(s)
- Artem Lysenko
- Department of Computational and Systems Biology, Rothamsted Research, Harpenden, United Kingdom
| | - Martin Urban
- Department of Plant Biology and Crop Science, Rothamsted Research, Harpenden, United Kingdom
| | - Laura Bennett
- Department of Informatics, School of Natural and Mathematical Sciences, Kings College London, Strand, London, United Kingdom
| | - Sophia Tsoka
- Department of Informatics, School of Natural and Mathematical Sciences, Kings College London, Strand, London, United Kingdom
| | - Elzbieta Janowska-Sejda
- Department of Computational and Systems Biology, Rothamsted Research, Harpenden, United Kingdom
- Department of Plant Biology and Crop Science, Rothamsted Research, Harpenden, United Kingdom
| | - Chris J. Rawlings
- Department of Computational and Systems Biology, Rothamsted Research, Harpenden, United Kingdom
| | - Kim E. Hammond-Kosack
- Department of Plant Biology and Crop Science, Rothamsted Research, Harpenden, United Kingdom
| | - Mansoor Saqi
- Department of Computational and Systems Biology, Rothamsted Research, Harpenden, United Kingdom
|
35
|
Csermely P, Korcsmáros T, Kiss HJM, London G, Nussinov R. Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther 2013; 138:333-408. [PMID: 23384594 PMCID: PMC3647006 DOI: 10.1016/j.pharmthera.2013.01.016] [Citation(s) in RCA: 512] [Impact Index Per Article: 46.5] [Received: 01/22/2013] [Accepted: 01/22/2013] [Indexed: 02/02/2023]
Abstract
Despite considerable progress in genome- and proteome-based high-throughput screening methods and in rational drug design, the increase in approved drugs in the past decade did not match the increase of drug development costs. Network description and analysis not only give a systems-level understanding of drug action and disease complexity, but can also help to improve the efficiency of drug design. We give a comprehensive assessment of the analytical tools of network topology and dynamics. The state-of-the-art use of chemical similarity, protein structure, protein-protein interaction, signaling, genetic interaction and metabolic networks in the discovery of drug targets is summarized. We propose that network targeting follows two basic strategies. The "central hit strategy" selectively targets central nodes/edges of the flexible networks of infectious agents or cancer cells to kill them. The "network influence strategy" works against other diseases, where an efficient reconfiguration of rigid networks needs to be achieved by targeting the neighbors of central nodes/edges. It is shown how network techniques can help in the identification of single-target, edgetic, multi-target and allo-network drug target candidates. We review the recent boom in network methods helping hit identification, lead selection optimizing drug efficacy, as well as minimizing side-effects and drug toxicity. Successful network-based drug development strategies are shown through the examples of infections, cancer, metabolic diseases, neurodegenerative diseases and aging. Summarizing >1200 references we suggest an optimized protocol of network-aided drug development, and provide a list of systems-level hallmarks of drug quality. Finally, we highlight network-related drug development trends helping to achieve these hallmarks by a cohesive, global approach.
Affiliation(s)
- Peter Csermely
- Department of Medical Chemistry, Semmelweis University, P.O. Box 260, H-1444 Budapest 8, Hungary.
|
36
|
Arpino JAJ, Hancock EJ, Anderson J, Barahona M, Stan GBV, Papachristodoulou A, Polizzi K. Tuning the dials of Synthetic Biology. MICROBIOLOGY-SGM 2013; 159:1236-1253. [PMID: 23704788 PMCID: PMC3749727 DOI: 10.1099/mic.0.067975-0] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Indexed: 12/04/2022]
Abstract
Synthetic Biology is the ‘Engineering of Biology’ – it aims to use a forward-engineering design cycle based on specifications, modelling, analysis, experimental implementation, testing and validation to modify natural or design new, synthetic biology systems so that they behave in a predictable fashion. Motivated by the need for truly plug-and-play synthetic biological components, we present a comprehensive review of ways in which the various parts of a biological system can be modified systematically. In particular, we review the list of ‘dials’ that are available to the designer and discuss how they can be modelled, tuned and implemented. The dials are categorized according to whether they operate at the global, transcriptional, translational or post-translational level and the resolution that they operate at. We end this review with a discussion on the relative advantages and disadvantages of some dials over others.
Affiliation(s)
- James A J Arpino
- Centre for Synthetic Biology and Innovation, Imperial College London, South Kensington Campus, London SW7 2AZ, UK; Department of Mathematics, Imperial College London, South Kensington Campus, London SW7 2AZ, UK; Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Edward J Hancock
- Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, UK
| | - James Anderson
- St John's College, St Giles, Oxford OX1 3JP, UK; Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, UK
| | - Mauricio Barahona
- Department of Mathematics, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | - Guy-Bart V Stan
- Department of Bioengineering, Imperial College London, South Kensington Campus, London SW7 2AZ, UK; Centre for Synthetic Biology and Innovation, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
| | | | - Karen Polizzi
- Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK; Centre for Synthetic Biology and Innovation, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
|
37
|
Vincent J, Dai Z, Ravel C, Choulet F, Mouzeyar S, Bouzidi MF, Agier M, Martre P. dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2013; 2013:bat014. [PMID: 23660284 PMCID: PMC3649639 DOI: 10.1093/database/bat014] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/03/2023]
Abstract
The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/
Affiliation(s)
- Jonathan Vincent
- INRA, UMR1095 Genetics, Diversity and Ecophysiology of Cereals, 5 Chemin de Beaulieu, Clermont-Ferrand, F-63 039 Cedex 2, France
|
38
|
Abstract
PURPOSE OF REVIEW Skeletal muscle loss appears to be the most significant event in cancer cachexia and is associated with a poor outcome. The balance between mechanisms that control synthesis and degradation is fundamental when designing new therapies. This review aims to highlight the molecular mechanisms that are associated with protein kinetics. RECENT FINDINGS The mechanisms that promote muscle synthesis have recently been explored in detail, but the mechanisms behind degradation even more so. Specific advances in cellular signalling molecules related to autophagy pathways, including signal transducer and activator of transcription-3, activin type-2 receptor, TRAF6, and transcriptomic research, have been given special attention in this review to highlight their roles in degradation and as potential targets for therapeutics. Ways to quantify muscle loss are badly needed for outcome measures; recent research using radiolabelled amino acids is also discussed in this review. SUMMARY Only with an appreciation of the complex regulation of muscle protein synthesis and degradation can potential new therapeutics be developed. This review identifies known targets in molecular pathways of current interest, explores methods used to find novel genes which may be involved in muscle kinetics and also highlights ways in which muscle kinetics may be measured to assess the efficacy of such interventions.
|
39
|
Furlong LI. Human diseases through the lens of network biology. Trends Genet 2013; 29:150-9. [DOI: 10.1016/j.tig.2012.11.004] [Citation(s) in RCA: 150] [Impact Index Per Article: 13.6] [Received: 09/04/2012] [Revised: 10/24/2012] [Accepted: 11/09/2012] [Indexed: 12/13/2022]
|
40
|
Van Landeghem S, De Bodt S, Drebert ZJ, Inzé D, Van de Peer Y. The potential of text mining in data integration and network biology for plant research: a case study on Arabidopsis. THE PLANT CELL 2013; 25:794-807. [PMID: 23532071 PMCID: PMC3634689 DOI: 10.1105/tpc.112.108753] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 12/20/2012] [Revised: 02/27/2013] [Accepted: 03/08/2013] [Indexed: 05/21/2023]
Abstract
Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein-protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, thereby stimulating the application of text mining data in future plant biology studies.
Affiliation(s)
- Sofie Van Landeghem
- Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
| | - Stefanie De Bodt
- Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
| | - Zuzanna J. Drebert
- Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
| | - Dirk Inzé
- Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
| | - Yves Van de Peer
- Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
|
41
|
Agapito G, Guzzi PH, Cannataro M. Visualization of protein interaction networks: problems and solutions. BMC Bioinformatics 2013; 14 Suppl 1:S1. [PMID: 23368786 PMCID: PMC3548679 DOI: 10.1186/1471-2105-14-s1-s1] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Indexed: 12/13/2022] Open
Abstract
Background Visualization concerns the representation of data visually and is an important task in scientific research. Protein-protein interactions (PPI) are discovered using either wet lab techniques, such as mass spectrometry, or in silico prediction tools, resulting in large collections of interactions stored in specialized databases. The set of all interactions of an organism forms a protein-protein interaction network (PIN) and is an important tool for studying the behaviour of the cell machinery. Since graphic representation of PINs may highlight important substructures, e.g. protein complexes, visualization is increasingly used to study the underlying graph structure of PINs. Although graphs are well-known data structures, there are several open problems regarding PIN visualization: the high number of nodes and connections, the heterogeneity of nodes (proteins) and edges (interactions), and the possibility to annotate proteins and interactions with biological information extracted from ontologies (e.g. Gene Ontology), which enriches the PINs with semantic information but complicates their visualization. Methods In recent years many software tools for the visualization of PINs have been developed. Initially intended for visualization only, some of them have subsequently been enriched with new functions for PPI data management and PIN analysis. The paper analyzes the main software tools for PIN visualization considering four main criteria: (i) technology, i.e. availability/license of the software and supported OS (Operating System) platforms; (ii) interoperability, i.e. ability to import/export networks in various formats, ability to export data in a graphic format, extensibility of the system, e.g. through plug-ins; (iii) visualization, i.e. supported layout and rendering algorithms and availability of parallel implementation; (iv) analysis, i.e. availability of network analysis functions, such as clustering or mining of the graph, and the possibility to interact with external databases. Results Currently, many tools are available and it is not easy for users to choose among them. Some tools offer sophisticated 2D and 3D network visualization with many layout algorithms; other tools are more data-oriented and support integration of interaction data coming from different sources and data annotation. Finally, some specialized tools are dedicated to the analysis of pathways and cellular processes and are oriented toward systems biology studies, where the dynamic aspects of the processes being studied are central. Conclusion A current trend is the deployment of open, extensible visualization tools (e.g. Cytoscape) that may be incrementally enriched by the interactomics community with novel and more powerful functions for PIN analysis, through the development of plug-ins. Another emerging trend is the efficient, parallel implementation of the visualization engine, which may provide high interactivity and near real-time response, as in NAViGaTOR. From a technological point of view, open-source, free and extensible tools like Cytoscape offer long-term sustainability owing to their large developer and user communities, and provide great flexibility since new functions are continuously added by the developer community through new plug-ins. The emerging parallel, often closed-source tools like NAViGaTOR, however, can offer near real-time response even in the analysis of very large PINs.
Affiliation(s)
- Giuseppe Agapito
- Department of Medical and Surgical Sciences, Magna Graecia University of Catanzaro, Italy
42
43
Abstract
This article aims to introduce the nature of data integration to life scientists. Generally, the subject of data integration is not discussed outside the field of computational science and is covered in little detail, or even neglected, when teaching/training trainees. End users (here defined as wet-lab trainees, clinicians, lab researchers) will mostly interact with bioinformatics resources and tools through web interfaces that hide the data integration processes from the user. However, the lack of formal training or acquaintance with even simple database concepts and terminology is often a real obstacle to the full comprehension of the resources and tools the end users wish to access. Understanding how data integration works is fundamental to empowering trainees to see the limitations as well as the possibilities when exploring, retrieving, and analysing biological data from databases. Here we introduce a game-based learning activity for training/teaching the topic of data integration that trainers/educators can adopt and adapt for their classroom. In particular we provide an example using DAS (Distributed Annotation Systems) as a method for data integration.
Affiliation(s)
- Maria Victoria Schneider
- Outreach and Training Team, European Molecular Biology Laboratory Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
44
Signalling network construction for modelling plant defence response. PLoS One 2012; 7:e51822. [PMID: 23272172 PMCID: PMC3525666 DOI: 10.1371/journal.pone.0051822] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 06/28/2012] [Accepted: 11/06/2012] [Indexed: 12/28/2022] Open
Abstract
Plant defence signalling response against various pathogens, including viruses, is a complex phenomenon. In a resistant interaction a plant cell perceives the pathogen signal, transduces it within the cell and reprograms the cell metabolism, leading to the arrest of pathogen replication. This work focuses on signalling pathways crucial for the plant defence response, i.e., the salicylic acid, jasmonic acid and ethylene signal transduction pathways, in the Arabidopsis thaliana model plant. The initial signalling network topology was constructed manually by defining the representation formalism, encoding the information from public databases and literature, and composing a pathway diagram. The manually constructed network structure consists of 175 components and 387 reactions. In order to complement the network topology with possibly missing relations, a new approach to automated information extraction from biological literature was developed. This approach, named Bio3graph, allows for automated extraction of biological relations from the literature, resulting in a set of (component1, reaction, component2) triplets and composing a graph structure which can be visualised, compared to the manually constructed topology and examined by the experts. Using a plant defence response vocabulary of components and reaction types, Bio3graph was applied to a set of 9,586 relevant full text articles, resulting in 137 newly detected reactions between the components. Finally, the manually constructed topology and the new reactions were merged to form a network structure consisting of 175 components and 524 reactions. The resulting pathway diagram of plant defence signalling represents a valuable source for further computational modelling and interpretation of omics data. The developed Bio3graph approach, implemented as an executable language processing and graph visualisation workflow, is publicly available at http://ropot.ijs.si/bio3graph/ and can be utilised for modelling other biological systems, given that an adequate vocabulary is provided.
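The merge of a manually curated topology with extracted (component1, reaction, component2) triplets can be pictured as a set union. The following is only a minimal sketch of that idea, not the Bio3graph implementation: the `Triplet` type, the `merge_topologies` helper and the placeholder component names are all assumptions made for illustration. The counts reported in the abstract are consistent with such a union, since 387 + 137 = 524.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Triplet(NamedTuple):
    """One relation; field names follow the abstract, the type itself is hypothetical."""
    component1: str
    reaction: str
    component2: str


def merge_topologies(
    manual: Iterable[Triplet], extracted: Iterable[Triplet]
) -> Tuple[Set[Triplet], Set[Triplet]]:
    """Return (merged reactions, reactions that are new relative to the manual set)."""
    manual_set = set(manual)
    new = {t for t in extracted if t not in manual_set}
    return manual_set | new, new


# Placeholder data, not taken from the paper.
manual = [
    Triplet("compA", "activates", "compB"),
    Triplet("compB", "inhibits", "compC"),
]
extracted = [
    Triplet("compA", "activates", "compB"),  # already present in the manual topology
    Triplet("compC", "activates", "compD"),  # newly detected relation
]

merged, new = merge_topologies(manual, extracted)
components = {c for t in merged for c in (t.component1, t.component2)}
print(len(merged), "reactions,", len(components), "components,", len(new), "new")
```

Representing each relation as a hashable tuple keeps duplicate detection trivial; a real workflow would also need to normalise component synonyms before comparing triplets.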
45
Junker A, Rohn H, Schreiber F. Visual analysis of transcriptome data in the context of anatomical structures and biological networks. FRONTIERS IN PLANT SCIENCE 2012; 3:252. [PMID: 23162564 PMCID: PMC3498740 DOI: 10.3389/fpls.2012.00252] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/01/2012] [Accepted: 10/22/2012] [Indexed: 05/12/2023]
Abstract
The complexity and the temporal as well as spatial resolution of transcriptome datasets are constantly increasing due to extensive technological developments. Here we present methods for advanced visualization and intuitive exploration of transcriptomics data as necessary prerequisites for gaining biological knowledge. Color-coding of structural images based on the expression level enables fast visual data analysis in the context of the examined biological system. The network-based exploration of these visualizations allows for comparative analysis of genes with specific transcript patterns and supports the extraction of functional relationships even from large datasets. To illustrate the presented methods, the tool HIVE was applied for visualization and exploration of database-retrieved expression data for master regulators of Arabidopsis thaliana flower and seed development in the context of corresponding tissue-specific regulatory networks.
Affiliation(s)
- Astrid Junker
- Leibniz Institute of Plant Genetics and Crop Plant Research Gatersleben, Gatersleben, Germany
- Hendrik Rohn
- Leibniz Institute of Plant Genetics and Crop Plant Research Gatersleben, Gatersleben, Germany
- Falk Schreiber
- Leibniz Institute of Plant Genetics and Crop Plant Research Gatersleben, Gatersleben, Germany
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle, Germany
- Clayton School of Information Technology, Monash University, Clayton, VIC, Australia
46
Rohn H, Junker A, Hartmann A, Grafahrend-Belau E, Treutler H, Klapperstück M, Czauderna T, Klukas C, Schreiber F. VANTED v2: a framework for systems biology applications. BMC SYSTEMS BIOLOGY 2012; 6:139. [PMID: 23140568 PMCID: PMC3610154 DOI: 10.1186/1752-0509-6-139] [Citation(s) in RCA: 116] [Impact Index Per Article: 9.7] [Received: 07/26/2012] [Accepted: 11/01/2012] [Indexed: 12/21/2022]
Abstract
BACKGROUND Experimental datasets are becoming larger and increasingly complex, spanning different data domains, thereby expanding the requirements for respective tool support for their analysis. Networks provide a basis for the integration, analysis and visualization of multi-omics experimental datasets. RESULTS Here we present VANTED (version 2), a framework for systems biology applications, which comprises a comprehensive set of seven main tasks. These range from network reconstruction, data visualization, integration of various data types, network simulation to data exploration combined with a manifold support of systems biology standards for visualization and data exchange. The offered set of functionalities is instantiated by combining several tasks in order to enable users to view and explore a comprehensive dataset from different perspectives. We describe the system as well as an exemplary workflow. CONCLUSIONS VANTED is a stand-alone framework which supports scientists during the data analysis and interpretation phase. It is available as a Java open source tool from http://www.vanted.org.
Affiliation(s)
- Hendrik Rohn
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstr. 3, 06466 Gatersleben, Germany
- Astrid Junker
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstr. 3, 06466 Gatersleben, Germany
- Anja Hartmann
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstr. 3, 06466 Gatersleben, Germany
- Eva Grafahrend-Belau
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstr. 3, 06466 Gatersleben, Germany
- Hendrik Treutler
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstr. 3, 06466 Gatersleben, Germany
- Matthias Klapperstück
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstr. 3, 06466 Gatersleben, Germany
- Tobias Czauderna
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstr. 3, 06466 Gatersleben, Germany
- Christian Klukas
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstr. 3, 06466 Gatersleben, Germany
- Falk Schreiber
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstr. 3, 06466 Gatersleben, Germany
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, 06120 Halle, Germany
- Clayton School of Information Technology, Monash University, Victoria 3800, Australia
47
Eronen L, Toivonen H. Biomine: predicting links between biological entities using network models of heterogeneous databases. BMC Bioinformatics 2012; 13:119. [PMID: 22672646 PMCID: PMC3505483 DOI: 10.1186/1471-2105-13-119] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 08/31/2011] [Accepted: 04/17/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. RESULTS Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. CONCLUSIONS The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. 
In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.
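The abstract does not say which proximity measures were evaluated, so the following is only an illustrative sketch of one plausible choice on a graph whose edge weights lie in (0, 1]: the probability of the best path, i.e. the largest product of edge weights over any path between two nodes. The function name, the graph layout and the node names are assumptions, not part of Biomine.

```python
import heapq
import math
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # node -> neighbour -> edge weight in (0, 1]


def best_path_probability(graph: Graph, source: str, target: str) -> float:
    """Largest product of edge weights over any path from source to target.

    Maximising a product of weights in (0, 1] is the same as minimising the
    sum of -log(weight), so plain Dijkstra applies. Returns 0.0 if unreachable.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-d)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d - math.log(weight)
            if candidate < dist.get(neighbour, math.inf):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return 0.0


def add_edge(graph: Graph, a: str, b: str, weight: float) -> None:
    """Insert an undirected weighted edge."""
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


# Placeholder entities and weights, chosen only to exercise the function.
g: Graph = {}
add_edge(g, "gene1", "protein1", 0.9)
add_edge(g, "protein1", "disease1", 0.5)
add_edge(g, "gene1", "disease1", 0.3)

score = best_path_probability(g, "gene1", "disease1")
print(round(score, 3))  # the two-step path (0.9 * 0.5) beats the direct edge (0.3)
```

Candidate links could then be ranked by this score; weighting edge types differently, as the abstract describes, would amount to rescaling the weights before the search.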
Affiliation(s)
- Lauri Eronen
- Biocomputing Platforms Ltd, Innopoli 2, Tekniikantie 14, FI-02150 Espoo, Finland.
48
Garcia-Garcia J, Bonet J, Guney E, Fornes O, Planas J, Oliva B. Networks of Protein-Protein Interactions: From Uncertainty to Molecular Details. Mol Inform 2012; 31:342-62. [PMID: 27477264 DOI: 10.1002/minf.201200005] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 01/16/2012] [Accepted: 03/09/2012] [Indexed: 11/08/2022]
Abstract
Proteins are the bricks and mortar of cells. The work of proteins is structural and functional, as they are the principal element of the organization of the cell architecture, but they also play a relevant role in its metabolism and regulation. To perform all these functions, proteins need to interact with each other and with other bio-molecules, either to form complexes or to recognize precise targets of their action. For instance, a particular transcription factor may activate one gene or another depending on its interactions with other proteins and not only with DNA. Hence, the ability of a protein to interact with other bio-molecules, and the partners it has at each particular time and location, can be crucial to characterizing its role. Proteins rarely act alone; rather, they constitute an intertwined network of physical interactions or other types of relationships (such as metabolic and regulatory) or signaling cascades. In this context, understanding the function of a protein requires recognizing the members of its neighborhood and grasping how they associate, at both the systemic and the atomic level. The network of physical interactions between the proteins of a system, cell or organism is defined as the interactome. The purpose of this review is to deepen the description of interactomes at different levels of detail: from the molecular structure of complexes to the global topology of the network of interactions. The experimental and computational approaches and techniques used to attain each level are described. The limits of each technique and its integration into a model network, the challenges and current problems of completeness of an interactome, and the reliability of the interactions are reviewed and summarized. Finally, the application of the current knowledge of protein-protein interactions to modern network medicine and protein function annotation is also explored.
Affiliation(s)
- Javier Garcia-Garcia
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Jaume Bonet
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Emre Guney
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Oriol Fornes
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Joan Planas
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain
- Baldo Oliva
- Structural Bioinformatics Group, GRIB-IMIM, Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Catalonia, Spain.
49
De Ferrari L, Aitken S, van Hemert J, Goryanin I. EnzML: multi-label prediction of enzyme classes using InterPro signatures. BMC Bioinformatics 2012; 13:61. [PMID: 22533924 PMCID: PMC3483700 DOI: 10.1186/1471-2105-13-61] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 07/06/2011] [Accepted: 03/31/2012] [Indexed: 02/07/2023] Open
Abstract
Background Manual annotation of enzymatic functions cannot keep up with automatic genome sequencing. In this work we explore the capacity of InterPro sequence signatures to automatically predict enzymatic function. Results We present EnzML, a multi-label classification method that can efficiently account also for proteins with multiple enzymatic functions: 50,000 in UniProt. EnzML was evaluated using a standard set of 300,747 proteins for which the manually curated Swiss-Prot and KEGG databases have agreeing Enzyme Commission (EC) annotations. EnzML achieved more than 98% subset accuracy (exact match of all correct Enzyme Commission classes of a protein) for the entire dataset and between 87 and 97% subset accuracy in reannotating eight entire proteomes: human, mouse, rat, mouse-ear cress, fruit fly, the S. pombe yeast, the E. coli bacterium and the M. jannaschii archaebacterium. To understand the role played by the dataset size, we compared the cross-evaluation results of smaller datasets, either constructed at random or from specific taxonomic domains such as archaea, bacteria, fungi, invertebrates, plants and vertebrates. The results were confirmed even when the redundancy in the dataset was reduced using UniRef100, UniRef90 or UniRef50 clusters. Conclusions InterPro signatures are a compact and powerful attribute space for the prediction of enzymatic function. This representation makes multi-label machine learning feasible in reasonable time (30 minutes to train on 300,747 instances with 10,852 attributes and 2,201 class values) using the Mulan Binary Relevance Nearest Neighbours algorithm implementation (BR-kNN).
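Subset accuracy, as defined in the abstract, counts a protein as correct only when its predicted set of Enzyme Commission classes matches the true set exactly. A minimal sketch of the metric follows; the function name and the toy label sets are assumptions for illustration and are not taken from EnzML or Mulan.

```python
from typing import Iterable, Sequence


def subset_accuracy(
    true_labels: Sequence[Iterable[str]],
    predicted_labels: Sequence[Iterable[str]],
) -> float:
    """Fraction of instances whose predicted label set equals the true label set."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one instance is required")
    exact = sum(set(t) == set(p) for t, p in zip(true_labels, predicted_labels))
    return exact / len(true_labels)


# Toy data: EC-style strings used purely as labels.
y_true = [{"1.1.1.1"}, {"2.7.1.1", "3.1.3.2"}, set()]
y_pred = [{"1.1.1.1"}, {"2.7.1.1"}, set()]

accuracy = subset_accuracy(y_true, y_pred)
print(f"{accuracy:.3f}")  # the second protein misses one class, so 2 of 3 match
```

Because a partially correct prediction scores zero, subset accuracy is a stricter measure than per-label accuracy, which makes the reported figures above 98% more demanding than they might first appear.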
Affiliation(s)
- Luna De Ferrari
- Computational Systems Biology and Bioinformatics, School of Informatics, University of Edinburgh, Informatics Forum, 10 Crichton Street, UK.
50
Weile J, James K, Hallinan J, Cockell SJ, Lord P, Wipat A, Wilkinson DJ. Bayesian integration of networks without gold standards. Bioinformatics 2012; 28:1495-500. [PMID: 22492647 PMCID: PMC3356839 DOI: 10.1093/bioinformatics/bts154] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 01/26/2023]
Abstract
Motivation: Biological experiments give insight into networks of processes inside a cell, but are subject to error and uncertainty. However, due to the overlap between the large number of experiments reported in public databases, it is possible to assess the chances of individual observations being correct. In order to do so, existing methods rely on high-quality ‘gold standard’ reference networks, but such reference networks are not always available. Results: We present a novel algorithm for computing the probability of network interactions that operates without gold standard reference data. We show that our algorithm outperforms existing gold standard-based methods. Finally, we apply the new algorithm to a large collection of genetic interaction and protein–protein interaction experiments. Availability: The integrated dataset and a reference implementation of the algorithm as a plug-in for the Ondex data integration framework are available for download at http://bio-nexus.ncl.ac.uk/projects/nogold/ Contact: darren.wilkinson@ncl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jochen Weile
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada