1
Yuan Z, Peng J, Gao L, Shao R. Fractal and first-passage properties of a class of self-similar networks. Chaos (Woodbury, N.Y.) 2024; 34:033134. [PMID: 38526982 DOI: 10.1063/5.0196934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 03/01/2024] [Indexed: 03/27/2024]
Abstract
A class of self-similar networks, obtained by recursively replacing each edge of the current network with a well-designed structure (generator) and known as edge-iteration networks, has garnered considerable attention owing to its role in presenting rich network models to mimic real objects with self-similar structures. The generator dominates the structural and dynamic properties of edge-iteration networks. However, the general relationships between these networks' structural and dynamic properties and their generators remain unclear. We study the fractal and first-passage properties, such as the fractal dimension, walk dimension, resistance exponent, spectral dimension, and global mean first-passage time, which is the mean time for a walker starting from a randomly selected node to reach the fixed target node for the first time. We disclose the properties of the generators that dominate the fractal and first-passage properties of general edge-iteration networks. A clear relationship between the fractal and first-passage properties of the edge-iteration networks and the related properties of the generators is presented. The upper and lower bounds of these quantities are also discussed. Thus, networks can be customized to meet the requirements of fractal and dynamic properties by selecting an appropriate generator and tuning its structural parameters. The results obtained here shed light on the design and optimization of network structures.
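As a rough illustration of the global mean first-passage time mentioned in this abstract, the sketch below computes it exactly for an unbiased random walk on a small undirected graph. It is not the authors' method: the function names, the adjacency-dictionary input, and the choice to average uniformly over the non-target starting nodes are all assumptions made for this example.

```python
from fractions import Fraction


def mean_first_passage_times(adj, target):
    """Exact mean first-passage times to `target` for an unbiased random walk.

    `adj` maps each node to a list of neighbours (undirected, connected graph).
    Solves T[i] = 1 + (1/deg(i)) * sum_j T[j] for i != target, with T[target] = 0.
    """
    nodes = [v for v in adj if v != target]
    index = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    # Augmented matrix of the linear system (I - P) T = 1 restricted to non-target nodes.
    rows = []
    for v in nodes:
        row = [Fraction(0)] * (n + 1)
        row[index[v]] += 1
        for u in adj[v]:
            if u != target:
                row[index[u]] -= Fraction(1, len(adj[v]))
        row[n] = Fraction(1)
        rows.append(row)
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [x / pivot_value for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return {v: rows[index[v]][n] for v in nodes}


def global_mfpt(adj, target):
    """Average of the mean first-passage times over all non-target start nodes (assumed convention)."""
    times = mean_first_passage_times(adj, target)
    return sum(times.values()) / len(times)


# Hypothetical toy graph: a path 0 - 1 - 2 with the target at node 0.
path = {0: [1], 1: [0, 2], 2: [1]}
print(global_mfpt(path, 0))  # 7/2
```

On the three-node path the walker needs 3 steps on average from node 1 and 4 steps from node 2, so the uniform average is 7/2. Other conventions (for example, weighting start nodes by the stationary distribution) would give a different number.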
Affiliation(s)
- Zhenhua Yuan
  - School of Mathematics and Information Science, Guangzhou University, Guangzhou 510006, China
  - Guangdong Provincial Key Laboratory, Co-sponsored by the Province and City of Information Security Technology, Guangzhou University, Guangzhou 510006, China
  - Guangzhou Center for Applied Mathematics, Guangzhou University, Guangzhou 510006, China
- Junhao Peng
  - School of Mathematics and Information Science, Guangzhou University, Guangzhou 510006, China
  - Guangdong Provincial Key Laboratory, Co-sponsored by the Province and City of Information Security Technology, Guangzhou University, Guangzhou 510006, China
  - Guangzhou Center for Applied Mathematics, Guangzhou University, Guangzhou 510006, China
- Long Gao
  - School of Mathematics and Information Science, Guangzhou University, Guangzhou 510006, China
  - Guangdong Provincial Key Laboratory, Co-sponsored by the Province and City of Information Security Technology, Guangzhou University, Guangzhou 510006, China
  - Guangzhou Center for Applied Mathematics, Guangzhou University, Guangzhou 510006, China
- Renxiang Shao
  - School of Mathematics and Information Science, Guangzhou University, Guangzhou 510006, China
  - Guangdong Provincial Key Laboratory, Co-sponsored by the Province and City of Information Security Technology, Guangzhou University, Guangzhou 510006, China
  - Guangzhou Center for Applied Mathematics, Guangzhou University, Guangzhou 510006, China
2
Shah HA, Liu J, Yang Z, Feng J. Review of Machine Learning Methods for the Prediction and Reconstruction of Metabolic Pathways. Front Mol Biosci 2021; 8:634141. [PMID: 34222327 PMCID: PMC8247443 DOI: 10.3389/fmolb.2021.634141] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/27/2020] [Accepted: 06/01/2021] [Indexed: 11/13/2022]
Abstract
Prediction and reconstruction of metabolic pathways play significant roles in many fields, such as genetic engineering, metabolic engineering, and drug discovery, and are becoming some of the most active research topics in synthetic biology. With the growth of related data and the development of machine learning techniques, many machine learning-based methods have been proposed for the prediction or reconstruction of metabolic pathways. Machine learning techniques are showing state-of-the-art performance in handling the rapidly increasing volume of data in synthetic biology. To support researchers in this field, we briefly review the research progress of metabolic pathway reconstruction and prediction based on machine learning. Some challenging issues in the reconstruction of metabolic pathways are also discussed in this paper.
Affiliation(s)
- Hayat Ali Shah
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
- Juan Liu
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
- Zhihui Yang
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
- Jing Feng
  - Institute of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan, China
3
Abstract
Domains are the structural, functional and evolutionary units of proteins. They combine to form multidomain proteins. The evolutionary history of this molecular combinatorics has been studied with phylogenomic methods. Here, we construct networks of domain organization and explore their evolution. A time series of networks revealed two ancient waves of structural novelty arising from ancient 'p-loop' and 'winged helix' domains and a massive 'big bang' of domain organization. The evolutionary recruitment of domains was highly modular, hierarchical and ongoing. Domain rearrangements elicited non-random and scale-free network structure. Comparative analyses of preferential attachment, randomness and modularity showed yin-and-yang complementary transition and biphasic patterns along the structural chronology. Remarkably, the evolving networks highlighted a central evolutionary role of cofactor-supporting structures of non-ribosomal peptide synthesis pathways, likely crucial to the early development of the genetic code. Some highly modular domains featured dual response regulation in two-component signal transduction systems with DNA-binding activity linked to transcriptional regulation of responses to environmental change. Interestingly, hub domains across the evolving networks shared the historical role of DNA binding and editing, an ancient protein function in molecular evolution. Our investigation unfolds historical source-sink patterns of evolutionary recruitment that further our understanding of protein architectures and functions.
4
Guo J, Singh P, Bassler KE. Reduced network extremal ensemble learning (RenEEL) scheme for community detection in complex networks. Sci Rep 2019; 9:14234. [PMID: 31578406 PMCID: PMC6775136 DOI: 10.1038/s41598-019-50739-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/15/2019] [Accepted: 09/17/2019] [Indexed: 11/30/2022]
Abstract
We introduce an ensemble learning scheme for community detection in complex networks. The scheme uses a Machine Learning algorithmic paradigm we call Extremal Ensemble Learning. It uses iterative extremal updating of an ensemble of network partitions, which can be found by a conventional base algorithm, to find a node partition that maximizes modularity. At each iteration, core groups of nodes that are in the same community in every ensemble partition are identified and used to form a reduced network. Partitions of the reduced network are then found and used to update the ensemble. The smaller size of the reduced network makes the scheme efficient. We use the scheme to analyze the community structure in a set of commonly studied benchmark networks and find that it outperforms all other known methods for finding the partition with maximum modularity.
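Two ingredients named in this abstract can be sketched in a few lines: the modularity of a node partition (the quantity being maximized) and the core groups of nodes that share a community in every partition of an ensemble. This is only a minimal illustration under the standard Newman modularity definition for unweighted, undirected graphs; it does not reproduce the RenEEL scheme itself, and all function names and toy data are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2m))^2 ].

    `edges` is a list of undirected edges (u, v); `community` maps node -> label.
    L_c is the number of edges inside community c, d_c the total degree of c.
    """
    m = len(edges)
    internal = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


def core_groups(partitions, nodes):
    """Group nodes that carry the same community label in every ensemble partition."""
    groups = {}
    for v in nodes:
        key = tuple(p[v] for p in partitions)
        groups.setdefault(key, []).append(v)
    return sorted(groups.values())


# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
p1 = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
p2 = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "b"}

print(round(modularity(edges, p1), 4))      # 0.3571
print(core_groups([p1, p2], range(6)))      # [[0, 1], [2], [3, 4, 5]]
```

In the scheme described above, each core group would become a single node of the reduced network; how that reduced network is built and partitioned is not shown here.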
Affiliation(s)
- Jiahao Guo
  - Department of Physics, University of Houston, Houston, Texas, 77204, USA
  - Texas Center for Superconductivity, University of Houston, Houston, Texas, 77204, USA
- Pramesh Singh
  - Department of Physics, University of Houston, Houston, Texas, 77204, USA
  - Texas Center for Superconductivity, University of Houston, Houston, Texas, 77204, USA
- Kevin E Bassler
  - Department of Physics, University of Houston, Houston, Texas, 77204, USA
  - Texas Center for Superconductivity, University of Houston, Houston, Texas, 77204, USA
  - Department of Mathematics, University of Houston, Houston, Texas, 77204, USA
5
Hoyos-Idrobo A, Varoquaux G, Kahn J, Thirion B. Recursive Nearest Agglomeration (ReNA): Fast Clustering for Approximation of Structured Signals. IEEE Transactions on Pattern Analysis and Machine Intelligence 2019; 41:669-681. [PMID: 29993861 DOI: 10.1109/tpami.2018.2815524] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/08/2023]
Abstract
In this work, we revisit fast dimension reduction approaches, such as random projections and random sampling. Our goal is to summarize the data to decrease the computational costs and memory footprint of subsequent analysis. Such dimension reduction can be very efficient when the signals of interest have a strong structure, as images do. We focus on this setting and investigate feature clustering schemes for data reduction that capture this structure. An impediment to fast dimension reduction is that good clustering comes with large algorithmic costs. We address it by contributing a linear-time agglomerative clustering scheme, Recursive Nearest Agglomeration (ReNA). Unlike existing fast agglomerative schemes, it avoids the creation of giant clusters. We empirically validate that it approximates the data as well as traditional variance-minimizing clustering schemes that have a quadratic complexity. In addition, we analyze signal approximation with feature clustering and show that it can remove noise, improving subsequent analysis steps. As a consequence, data reduction by clustering features with ReNA yields very fast and accurate models, making it possible to process large datasets on a budget. Our theoretical analysis is backed by extensive experiments on publicly available data that illustrate the computational efficiency and the denoising properties of the resulting dimension reduction scheme.
6
Zhao Y, Forst CV, Sayegh CE, Wang IM, Yang X, Zhang B. Molecular and genetic inflammation networks in major human diseases. Molecular BioSystems 2017; 12:2318-41. [PMID: 27303926 DOI: 10.1039/c6mb00240d] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Indexed: 12/14/2022]
Abstract
It has been well recognized that inflammation, alongside tissue repair and damage maintaining tissue homeostasis, determines the initiation and progression of complex diseases. Although the most critical inflammation-involved molecules, genetic susceptibilities, epigenetic factors, and environmental factors have been captured, our schemata on the role of inflammation in complex diseases remain largely patchy, in part due to the success of reductionism in terms of research methodology per se. Omics data alongside the advances in data integration technologies have enabled reconstruction of molecular and genetic inflammation networks which shed light on the underlying pathophysiology of complex diseases or clinical conditions. Given the proven beneficial role of anti-inflammation in coronary heart disease as well as other complex diseases and immunotherapy as a revolutionary transition in oncology, it becomes timely to review our current understanding of the molecular and genetic inflammation networks underlying major human diseases. In this review, we first briefly discuss the complexity of infectious diseases and then highlight recently uncovered molecular and genetic inflammation networks in other major human diseases including obesity, type II diabetes, coronary heart disease, late onset Alzheimer's disease, Parkinson's disease, and sporadic cancer. The commonality and specificity of these molecular networks are addressed in the context of genetics based on genome-wide association study (GWAS). The double-edged role of inflammation, such as how the aberrant type 1 and/or type 2 immunity leads to chronic and severe clinical conditions, remains open in terms of the inflammasome and the core inflammatome network features. Increasingly available large Omics and clinical data in tandem with systems biology approaches have offered an exciting yet challenging opportunity toward reconstruction of more comprehensive and dynamic molecular and genetic inflammation networks, which hold great promise in transitioning from network snapshots to video-style multi-scale interplays of disease mechanisms, in turn leading to effective clinical intervention.
Affiliation(s)
- Yongzhong Zhao
  - Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, 1425 Madison Avenue, NY 10029, USA
  - Institute of Genomics and Multiscale Biology, Mount Sinai School of Medicine, 1425 Madison Avenue, NY 10029, USA
- Christian V Forst
  - Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, 1425 Madison Avenue, NY 10029, USA
  - Institute of Genomics and Multiscale Biology, Mount Sinai School of Medicine, 1425 Madison Avenue, NY 10029, USA
- Camil E Sayegh
  - Vertex Pharmaceuticals (Canada) Incorporated, 275 Armand-Frappier, Laval, Quebec H7V 4A7, Canada
- I-Ming Wang
  - Informatics and Analysis, Merck Research Laboratories, Merck & Co., Inc., 770 Sumneytown Pike, West Point, PA 19486, USA
- Xia Yang
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, CA 90025, USA
- Bin Zhang
  - Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, 1425 Madison Avenue, NY 10029, USA
  - Institute of Genomics and Multiscale Biology, Mount Sinai School of Medicine, 1425 Madison Avenue, NY 10029, USA
7
Sun Y, Ma L, Zeng A, Wang WX. Spreading to localized targets in complex networks. Sci Rep 2016; 6:38865. [PMID: 27966613 PMCID: PMC5155210 DOI: 10.1038/srep38865] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 04/27/2016] [Accepted: 11/14/2016] [Indexed: 11/20/2022]
Abstract
As an important type of dynamics on complex networks, spreading is widely used to model many real processes such as epidemic contagion and information propagation. One of the most significant research questions in spreading is how to rank the spreading ability of nodes in the network. To this end, substantial effort has been made and a variety of effective methods have been proposed. These methods usually define the spreading ability of a node as the number of finally infected nodes given that the spreading is initialized from the node. However, in many real cases such as advertising and news propagation, the spreading only aims to cover a specific group of nodes. Therefore, it is necessary to study the spreading ability of nodes towards localized targets in complex networks. In this paper, we propose a reversed local path algorithm for this problem. Simulation results show that our method outperforms the existing methods in identifying the influential nodes with respect to these localized targets. Moreover, the influential spreaders identified by our method can effectively avoid infecting the non-target nodes in the spreading process.
Affiliation(s)
- Ye Sun
  - School of Systems Science, Beijing Normal University, Beijing 100875, P. R. China
- Long Ma
  - School of Systems Science, Beijing Normal University, Beijing 100875, P. R. China
- An Zeng
  - School of Systems Science, Beijing Normal University, Beijing 100875, P. R. China
- Wen-Xu Wang
  - School of Systems Science, Beijing Normal University, Beijing 100875, P. R. China
8
Miranda GHB, Machicao J, Bruno OM. Exploring Spatio-temporal Dynamics of Cellular Automata for Pattern Recognition in Networks. Sci Rep 2016; 6:37329. [PMID: 27874024 PMCID: PMC5118793 DOI: 10.1038/srep37329] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/03/2016] [Accepted: 10/18/2016] [Indexed: 11/13/2022]
Abstract
Network science is an interdisciplinary field which provides an integrative approach for the study of complex systems. In recent years, network modeling has been used for the study of emergent phenomena in many real-world applications. Pattern recognition in networks has been drawing attention to the importance of network characterization, which may lead to understanding the topological properties that are related to the network model. In this paper, the Life-Like Network Automata (LLNA) method is introduced, which was designed for pattern recognition in networks. LLNA uses the network topology as a tessellation of Cellular Automata (CA), whose dynamics produces a spatio-temporal pattern used to extract the feature vector for network characterization. The method was evaluated using synthetic and real-world networks. In the latter, three pattern recognition applications were used: (i) identifying organisms from distinct domains of life through their metabolic networks, (ii) identifying online social networks and (iii) classifying stomata distribution patterns varying according to different lighting conditions. LLNA was compared to structural measurements and surpasses them in real-world applications, achieving improvement in the classification rate as high as 23%, 4% and 7% respectively. Therefore, the proposed method is a good choice for pattern recognition applications using networks and demonstrates potential for general applicability.
Affiliation(s)
- Jeaneth Machicao
  - São Carlos Institute of Physics, University of São Paulo, São Carlos - SP, PO Box 369, 13560-970, Brazil
- Odemir Martinez Bruno
  - Institute of Mathematics and Computer Science, University of São Paulo, São Carlos - SP, Brazil
  - São Carlos Institute of Physics, University of São Paulo, São Carlos - SP, PO Box 369, 13560-970, Brazil
9
Zallot R, Harrison KJ, Kolaczkowski B, de Crécy-Lagard V. Functional Annotations of Paralogs: A Blessing and a Curse. Life (Basel) 2016; 6:life6030039. [PMID: 27618105 PMCID: PMC5041015 DOI: 10.3390/life6030039] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 07/01/2016] [Revised: 08/29/2016] [Accepted: 09/02/2016] [Indexed: 12/15/2022]
Abstract
Gene duplication followed by mutation is a classic mechanism of neofunctionalization, producing gene families with functional diversity. In some cases, a single point mutation is sufficient to change the substrate specificity and/or the chemistry performed by an enzyme, making it difficult to accurately separate enzymes with identical functions from homologs with different functions. Because sequence similarity is often used as a basis for assigning functional annotations to genes, non-isofunctional gene families pose a great challenge for genome annotation pipelines. Here we describe how integrating evolutionary and functional information such as genome context, phylogeny, metabolic reconstruction and signature motifs may be required to correctly annotate multifunctional families. These integrative analyses can also lead to the discovery of novel gene functions, as hints from specific subgroups can guide the functional characterization of other members of the family. We demonstrate how careful manual curation processes using comparative genomics can disambiguate subgroups within large multifunctional families and discover their functions. We present the COG0720 protein family as a case study. We also discuss strategies to automate this process to improve the accuracy of genome functional annotation pipelines.
Affiliation(s)
- Rémi Zallot
  - Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA
- Katherine J Harrison
  - Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA
- Bryan Kolaczkowski
  - Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA
- Valérie de Crécy-Lagard
  - Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA
10
Cuevas DA, Edirisinghe J, Henry CS, Overbeek R, O’Connell TG, Edwards RA. From DNA to FBA: How to Build Your Own Genome-Scale Metabolic Model. Front Microbiol 2016; 7:907. [PMID: 27379044 PMCID: PMC4911401 DOI: 10.3389/fmicb.2016.00907] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 01/29/2016] [Accepted: 05/27/2016] [Indexed: 11/19/2022]
Abstract
Microbiological studies are increasingly relying on in silico methods to perform exploration and rapid analysis of genomic data, and functional genomics studies are supplemented by the new perspectives that genome-scale metabolic models offer. A mathematical model consisting of a microbe's entire metabolic map can be rapidly determined from whole-genome sequencing and annotating the genomic material encoded in its DNA. Flux-balance analysis (FBA), a linear programming technique that uses metabolic models to predict the phenotypic responses imposed by environmental elements and factors, is the leading method to simulate and manipulate cellular growth in silico. However, the process of creating an accurate model to use in FBA consists of a series of steps involving a multitude of connections between bioinformatics databases, enzyme resources, and metabolic pathways. We present the methodology and procedure to obtain a metabolic model using PyFBA, an extensible Python-based open-source software package aimed to provide a platform where functional annotations are used to build metabolic models (http://linsalrob.github.io/PyFBA). Backed by the Model SEED biochemistry database, PyFBA contains methods to reconstruct a microbe's metabolic map, run FBA upon different media conditions, and gap-fill its metabolism. The extensibility of PyFBA facilitates novel techniques in creating accurate genome-scale metabolic models.
Affiliation(s)
- Daniel A. Cuevas
  - Computational Science Research Center, San Diego State University, San Diego, CA, USA
- Janaka Edirisinghe
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
- Chris S. Henry
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
- Ross Overbeek
  - Fellowship for Interpretation of Genomes, Burr Ridge, IL, USA
- Taylor G. O’Connell
  - Biological and Medical Informatics Research Center, San Diego State University, San Diego, CA, USA
- Robert A. Edwards
  - Computational Science Research Center, San Diego State University, San Diego, CA, USA
  - Biological and Medical Informatics Research Center, San Diego State University, San Diego, CA, USA
  - Department of Computer Science, San Diego State University, San Diego, CA, USA
  - Department of Biology, San Diego State University, San Diego, CA, USA
11
Aziz MF, Caetano-Anollés K, Caetano-Anollés G. The early history and emergence of molecular functions and modular scale-free network behavior. Sci Rep 2016; 6:25058. [PMID: 27121452 PMCID: PMC4848518 DOI: 10.1038/srep25058] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 10/01/2015] [Accepted: 04/08/2016] [Indexed: 12/17/2022]
Abstract
The formation of protein structural domains requires that biochemical functions, defined by conserved amino acid sequence motifs, be embedded into a structural scaffold. Here we trace domain history onto a bipartite network of elementary functional loop sequences and domain structures defined at the fold superfamily level of SCOP classification. The resulting 'elementary functionome' network and its loop motif and structural domain graph projections create evolutionary 'waterfalls' describing the emergence of primordial functions. Waterfalls reveal how ancient loops are shared by domain structures in two initial waves of functional innovation that involve founder 'p-loop' and 'winged helix' domain structures. They also uncover an ongoing dynamics of modular motif embedding in domain structures, which transfers 'preferential' cooption properties of ancient loops to emerging domains. Remarkably, we find that the emergence of molecular functions induces hierarchical modularity and power law behavior in network evolution as the network of motifs and structures expands metabolic pathways and translation.
Affiliation(s)
- M Fayez Aziz
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, United States
- Kelsey Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, United States
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, United States
12
Dhulekar N, Ray S, Yuan D, Baskaran A, Oztan B, Larsen M, Yener B. Prediction of Growth Factor-Dependent Cleft Formation During Branching Morphogenesis Using A Dynamic Graph-Based Growth Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:350-64. [PMID: 27070978 PMCID: PMC4917296 DOI: 10.1109/tcbb.2015.2452916] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 06/05/2023]
Abstract
This study considers the problem of describing and predicting cleft formation during the early stages of branching morphogenesis in mouse submandibular salivary glands (SMG) under the influence of varied concentrations of epidermal growth factors (EGF). Given a time-lapse video of a growing SMG, first we build a descriptive model that captures the underlying biological process and quantifies the ground truth. Tissue-scale (global) and morphological features related to regions of interest (local features) are used to characterize the biological ground truth. Second, we devise a predictive growth model that simulates EGF-modulated branching morphogenesis using a dynamic graph algorithm, which is driven by biological parameters such as EGF concentration, mitosis rate, and cleft progression rate. Given the initial configuration of the SMG, the evolution of the dynamic graph predicts the cleft formation, while maintaining the local structural characteristics of the SMG. We determined that higher EGF concentrations cause the formation of a higher number of buds and comparatively shallow cleft depths. Third, we compared the prediction accuracy of our model to the Glazier-Graner-Hogeweg (GGH) model, an on-lattice Monte-Carlo simulation model, under a specific energy function parameter set that allows new rounds of de novo cleft formation. The results demonstrate that the dynamic graph model yields simulations of gland growth comparable to those of the GGH model with a significantly lower computational complexity. Fourth, we enhanced this model to predict the SMG morphology for an EGF concentration without the assistance of ground-truth time-lapse biological video data; this is a substantial benefit of our model over other similar models that are guided and terminated by information regarding the final SMG morphology. Hence, our model is suitable for testing the impact of different biological parameters involved with the process of branching morphogenesis in silico, while reducing the requirement of in vivo experiments.
13
Ghorbani S, Tahmoorespur M, Masoudi Nejad A, Nasiri M, Asgari Y. Analysis of the enzyme network involved in cattle milk production using graph theory. Molecular Biology Research Communications 2015; 4:93-103. [PMID: 27844001 PMCID: PMC5019301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
Understanding cattle metabolism and its relationship with milk products is important in bovine breeding. A systemic view could lead to consequences that will result in a better understanding of existing concepts. Topological indices and quantitative characterizations mostly result from the application of graph theory to biological data. In the present work, the enzyme network involved in cattle milk production was reconstructed and analyzed based on available bovine genome information using several public datasets (NCBI, UniProt, KEGG, and BRENDA). The reconstructed network consisted of 3605 reactions named by KEGG compound numbers and 646 enzymes that catalyzed the corresponding reactions. The characteristics of the directed and undirected network were analyzed using graph theory. The mean path length was calculated to be 4.39 and 5.41 for directed and undirected networks, respectively. The top 11 hub enzymes whose abnormality could harm bovine health and reduce milk production were determined. Therefore, the aim of constructing the enzyme-centric network was twofold: first, to find out whether such a network followed the same properties as other biological networks, and second, to find the key enzymes. The results of the present study can improve our understanding of milk production in cattle. Also, analysis of the enzyme network can help improve the modeling and simulation of biological systems and help design desired phenotypes to increase milk production quality or quantity.
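The mean path length reported in this abstract for the directed and undirected versions of a network can be illustrated with a breadth-first search over all source nodes. The sketch below is an assumption about how such a figure might be computed, not the authors' pipeline: it averages shortest-path lengths over ordered pairs of distinct nodes that are actually reachable, and the function name and toy edge list are hypothetical.

```python
from collections import deque


def mean_path_length(nodes, edges, directed):
    """Average shortest-path length over reachable ordered pairs of distinct nodes."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)  # treat each edge as traversable in both directions
    total = 0
    pairs = 0
    for source in nodes:
        # Breadth-first search gives shortest hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# Hypothetical toy network: a directed 3-cycle 0 -> 1 -> 2 -> 0.
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (2, 0)]
print(mean_path_length(nodes, edges, directed=True))   # 1.5
print(mean_path_length(nodes, edges, directed=False))  # 1.0
```

In this toy cycle, ignoring edge direction shortens the paths. The two values in the abstract come from a much larger network, where unreachable pairs in the directed graph also affect which pairs enter the average.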
Affiliation(s)
- Ali Masoudi Nejad
  - Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Yazdan Asgari
  - Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
14
15
Wattam AR, Foster JT, Mane SP, Beckstrom-Sternberg SM, Beckstrom-Sternberg JM, Dickerman AW, Keim P, Pearson T, Shukla M, Ward DV, Williams KP, Sobral BW, Tsolis RM, Whatmore AM, O'Callaghan D. Comparative phylogenomics and evolution of the Brucellae reveal a path to virulence. J Bacteriol 2014; 196:920-30. [PMID: 24336939 PMCID: PMC3957692 DOI: 10.1128/jb.01091-13] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Received: 09/25/2013] [Accepted: 12/04/2013] [Indexed: 11/20/2022]
Abstract
Brucella species include important zoonotic pathogens that have a substantial impact on both agriculture and human health throughout the world. Brucellae are thought of as "stealth pathogens" that escape recognition by the host innate immune response, modulate the acquired immune response, and evade intracellular destruction. We analyzed the genome sequences of members of the family Brucellaceae to assess its evolutionary history from likely free-living soil-based progenitors into highly successful intracellular pathogens. Phylogenetic analysis split the genus into two groups: recently identified and early-dividing "atypical" strains and a highly conserved "classical" core clade containing the major pathogenic species. Lateral gene transfer events brought unique genomic regions into Brucella that differentiated them from Ochrobactrum and allowed the stepwise acquisition of virulence factors that include a type IV secretion system, a perosamine-based O antigen, and systems for sequestering metal ions that are absent in progenitors. Subsequent radiation within the core Brucella resulted in lineages that appear to have evolved within their preferred mammalian hosts, restricting their virulence to become stealth pathogens capable of causing long-term chronic infections.
Affiliation(s)
- Alice R. Wattam
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, USA
- Jeffrey T. Foster
- Center for Microbial Genetics & Genomics, Northern Arizona University, Flagstaff, Arizona, USA
- Stephen M. Beckstrom-Sternberg
- Center for Microbial Genetics & Genomics, Northern Arizona University, Flagstaff, Arizona, USA
- Translational Genomics Research Institute, Pathogen Genomics Division, Phoenix, Arizona, USA
- James M. Beckstrom-Sternberg
- Center for Microbial Genetics & Genomics, Northern Arizona University, Flagstaff, Arizona, USA
- Translational Genomics Research Institute, Pathogen Genomics Division, Phoenix, Arizona, USA
- Allan W. Dickerman
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, USA
- Paul Keim
- Center for Microbial Genetics & Genomics, Northern Arizona University, Flagstaff, Arizona, USA
- Translational Genomics Research Institute, Pathogen Genomics Division, Phoenix, Arizona, USA
- Talima Pearson
- Center for Microbial Genetics & Genomics, Northern Arizona University, Flagstaff, Arizona, USA
- Maulik Shukla
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, USA
- Doyle V. Ward
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Kelly P. Williams
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, USA
- Bruno W. Sobral
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, USA
- Renee M. Tsolis
- Department of Medical Microbiology and Immunology, School of Medicine, University of California, Davis, California, USA
- Adrian M. Whatmore
- Department of Bacteriology, Animal Health & Veterinary Laboratories Agency, Addlestone, United Kingdom
- David O'Callaghan
- INSERM U1047, UFR Médecine, Nîmes, France
- Université Montpellier 1, UFR Médecine, Nîmes, France
|
16
|
Cardinal-Fernández P, Nin N, Ruíz-Cabello J, Lorente JA. Systems medicine: a new approach to clinical practice. Arch Bronconeumol 2014; 50:444-51. [PMID: 24397963 DOI: 10.1016/j.arbres.2013.10.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2013] [Revised: 10/13/2013] [Accepted: 10/31/2013] [Indexed: 10/25/2022]
Abstract
Most respiratory diseases are considered complex diseases as their susceptibility and outcomes are determined by the interaction between host-dependent factors (genetic factors, comorbidities, etc.) and environmental factors (exposure to microorganisms or allergens, treatments received, etc.). The reductionist approach in the study of diseases has been of fundamental importance for the understanding of the different components of a system. Systems biology or systems medicine is a complementary approach aimed at analyzing the interactions between the different components within one organizational level (genome, transcriptome, proteome), and then between the different levels. Systems medicine is currently used for the interpretation and understanding of the pathogenesis and pathophysiology of different diseases, biomarker discovery, design of innovative therapeutic targets, and the drawing up of computational models for different biological processes. In this review we discuss the most relevant concepts of the theory underlying systems medicine, as well as its applications in the various biological processes in humans.
Affiliation(s)
- Pablo Cardinal-Fernández
- Servicio de Medicina Intensiva, Hospital Universitario de Getafe, Madrid, España; CIBER de Enfermedades Respiratorias, Madrid, España
- Nicolás Nin
- CIBER de Enfermedades Respiratorias, Madrid, España; Servicio de Medicina Intensiva, Hospital Universitario de Torrejón, Madrid, España
- Jesús Ruíz-Cabello
- CIBER de Enfermedades Respiratorias, Madrid, España; Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, España; Universidad Complutense de Madrid, Madrid, España
- José A Lorente
- Servicio de Medicina Intensiva, Hospital Universitario de Getafe, Madrid, España; CIBER de Enfermedades Respiratorias, Madrid, España; Universidad Europea de Madrid, Madrid, España.
|
17
|
Verma M, Lal D, Saxena A, Anand S, Kaur J, Kaur J, Lal R. Understanding alternative fluxes/effluxes through comparative metabolic pathway analysis of phylum actinobacteria using a simplified approach. Gene 2013; 531:306-17. [DOI: 10.1016/j.gene.2013.08.076] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2013] [Revised: 08/14/2013] [Accepted: 08/22/2013] [Indexed: 11/28/2022]
|
18
|
Moreira RN, Domingues S, Viegas SC, Amblar M, Arraiano CM. Synergies between RNA degradation and trans-translation in Streptococcus pneumoniae: cross regulation and co-transcription of RNase R and SmpB. BMC Microbiol 2012; 12:268. [PMID: 23167513 PMCID: PMC3534368 DOI: 10.1186/1471-2180-12-268] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2012] [Accepted: 10/31/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Ribonuclease R (RNase R) is an exoribonuclease that recognizes and degrades a wide range of RNA molecules. It is a stress-induced protein shown to be important for the establishment of virulence in several pathogenic bacteria. RNase R has also been implicated in the trans-translation process. Transfer-messenger RNA (tmRNA/SsrA RNA) and SmpB are the main effectors of trans-translation, an RNA and protein quality control system that resolves challenges associated with stalled ribosomes on non-stop mRNAs. Trans-translation has also been associated with deficiencies in stress-response mechanisms and pathogenicity. RESULTS In this work we study the expression of RNase R in the human pathogen Streptococcus pneumoniae and analyse the interplay of this enzyme with the main components of the trans-translation machinery (SmpB and tmRNA/SsrA). We show that RNase R is induced after a 37°C to 15°C temperature downshift and that its levels are dependent on SmpB. On the other hand, our results revealed a strong accumulation of the smpB transcript in the absence of RNase R at 15°C. Transcriptional analysis of the S. pneumoniae rnr gene demonstrated that it is co-transcribed with the flanking genes, secG and smpB. Transcription of these genes is driven from a promoter upstream of secG and the transcript is processed to yield mature independent mRNAs. This genetic organization seems to be a common feature of Gram-positive bacteria, and the biological significance of this gene cluster is further discussed. CONCLUSIONS This study unravels an additional contribution of RNase R to the trans-translation system by demonstrating that smpB is regulated by this exoribonuclease. RNase R, in turn, is shown to be under the control of SmpB. These proteins are therefore mutually dependent and cross-regulated. The data presented here shed light on the interactions between RNase R, trans-translation and cold-shock response in an important human pathogen.
Affiliation(s)
- Ricardo N Moreira
- Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Av. da República, Oeiras 2780-157, Portugal
|
19
|
Yu GX. RULEMINER: A KNOWLEDGE SYSTEM FOR SUPPORTING HIGH-THROUGHPUT PROTEIN FUNCTION ANNOTATIONS. J Bioinform Comput Biol 2011; 2:615-37. [PMID: 15617156 DOI: 10.1142/s0219720004000752] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2003] [Revised: 03/23/2004] [Accepted: 03/24/2004] [Indexed: 11/18/2022]
Abstract
In this paper, we present RuleMiner, a knowledge system to facilitate a seamless integration of multi-sequence analysis tools and define profile-based rules for supporting high-throughput protein function annotations. This system consists of three essential components, Protein Function Groups (PFGs), PFG profiles and rules. The PFGs, established from an integrated analysis of current knowledge of protein functions from Swiss-Prot database and protein family-based sequence classifications, cover all possible cellular functions available in the database. The PFG profiles illustrate detailed protein features in the PFGs as in sequence conservations, the occurrences of sequence-based motifs, domains and species distributions. The rules, extracted from the PFG profiles, describe the clear relationships between these PFGs and all possible features. As a result, the RuleMiner is able to provide an enhanced capability for protein function analysis, such as results from the integrated sequence analysis tools for given proteins can be comparatively analyzed due to the clear feature-PFG relationships. Also, much needed guidance is readily available for such analysis. If the rules describe one-to-one (unique) relationships between the protein features and the PFGs, then these features can be utilized as unique functional identifiers and cellular functions of unknown proteins can be reliably determined. Otherwise, additional information has to be provided.
Affiliation(s)
- Gong-Xin Yu
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA.
|
20
|
Sadovskaya NS, Sutormin RA, Gelfand MS. RECOGNITION OF TRANSMEMBRANE SEGMENTS IN PROTEINS: REVIEW AND CONSISTENCY-BASED BENCHMARKING OF INTERNET SERVERS. J Bioinform Comput Biol 2011; 4:1033-56. [PMID: 17099940 DOI: 10.1142/s0219720006002326] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2006] [Revised: 06/21/2006] [Accepted: 06/22/2006] [Indexed: 11/18/2022]
Abstract
Membrane proteins perform a number of crucial functions as transporters, receptors, and components of enzyme complexes. Identification of membrane proteins and prediction of their topology is thus an important part of genome annotation. We present here an overview of transmembrane segments in protein sequences, summarize data from large-scale genome studies, and report results of benchmarking of several popular internet servers.
Affiliation(s)
- Nataliya S Sadovskaya
- Institute for Information Transmission Problems, Russian Academy of Science, Bolshoi Karetny per. 19, Moscow 127994, Russia.
|
21
|
Mueller LAJ, Kugler KG, Netzer M, Graber A, Dehmer M. A network-based approach to classify the three domains of life. Biol Direct 2011; 6:53. [PMID: 21995640 PMCID: PMC3226542 DOI: 10.1186/1745-6150-6-53] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2011] [Accepted: 10/13/2011] [Indexed: 11/22/2022] Open
Abstract
Background Identifying group-specific characteristics in metabolic networks can provide better insight into evolutionary developments. Here, we present an approach to classify the three domains of life using topological information about the underlying metabolic networks. These networks have been shown to share domain-independent structural similarities, which pose a special challenge for our endeavour. We quantify specific structural information by using topological network descriptors to classify this set of metabolic networks. Such measures quantify the structural complexity of the underlying networks. In this study, we use such measures to capture domain-specific structural features of the metabolic networks to classify the data set. So far, it has been a challenging undertaking to examine what kind of structural complexity such measures do detect. In this paper, we apply two groups of topological network descriptors to metabolic networks and evaluate their classification performance. Moreover, we combine the two groups to perform a feature selection to estimate the structural features with the highest classification ability in order to optimize the classification performance. Results By combining the two groups, we can identify seven topological network descriptors that show a group-specific characteristic by ANOVA. A multivariate analysis using feature selection and supervised machine learning leads to a reasonable classification performance with a weighted F-score of 83.7% and an accuracy of 83.9%. We further demonstrate that our approach outperforms alternative methods. Also, our results reveal that entropy-based descriptors show the highest classification ability for this set of networks. Conclusions Our results show that these particular topological network descriptors are able to capture domain-specific structural characteristics for classifying metabolic networks between the three domains of life.
Affiliation(s)
- Laurin A J Mueller
- Institute for Bioinformatics and Translational Research, Department of Biomedical Sciences and Engineering, University for Health Sciences, Medical Informatics and Technology (UMIT), Austria
|
22
|
Moon JY, Lee D, Koolen JH, Kim S. Core-periphery disparity in fractal behavior of complex networks. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2011; 84:037103. [PMID: 22060535 DOI: 10.1103/physreve.84.037103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/15/2011] [Revised: 06/21/2011] [Indexed: 05/31/2023]
Abstract
We show that there is a disparity in fractal scaling behavior of the core and peripheral parts of empirical small-world scale-free networks. We decompose the network into a core and a periphery and measure the fractal dimension of each part separately using the box-counting method. We find that the core of small-world scale-free networks has a nonfractal structure, whereas the periphery exhibits either fractal or nonfractal scaling. The fractal dimension of the periphery is found to coincide with that of the whole network.
Affiliation(s)
- Joon-Young Moon
- Nonlinear and Complex Systems Laboratory, Department of Physics, Pohang University of Science and Technology, Pohang 790-784, Republic of Korea
|
23
|
Wurtele ES, Li J, Diao L, Zhang H, Foster CM, Fatland B, Dickerson J, Brown A, Cox Z, Cook D, Lee EK, Hofmann H. MetNet: Software to Build and Model the Biogenetic Lattice of Arabidopsis. Comp Funct Genomics 2011; 4:239-45. [PMID: 18629120 PMCID: PMC2447407 DOI: 10.1002/cfg.285] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2003] [Revised: 02/07/2003] [Accepted: 02/10/2003] [Indexed: 11/06/2022] Open
Abstract
MetNet (http://www.botany.iastate.edu/~mash/metnetex/metabolicnetex.html) is publicly available software in development for analysis of genome-wide RNA, protein and metabolite profiling data. The software is designed to enable the biologist to visualize, statistically analyse and model a metabolic and regulatory network map of Arabidopsis, combined with gene expression profiling data. It contains a JAVA interface to an interactions database (MetNetDB) containing information on regulatory and metabolic interactions derived from a combination of web databases (TAIR, KEGG, BRENDA) and input from biologists in their area of expertise. FCModeler captures input from MetNetDB in a graphical form. Sub-networks can be identified and interpreted using simple fuzzy cognitive maps. FCModeler is intended to develop and evaluate hypotheses, and provide a modelling framework for assessing the large amounts of data captured by high-throughput gene expression experiments. FCModeler and MetNetDB are currently being extended to three-dimensional virtual reality display. The MetNet map, together with gene expression data, can be viewed using multivariate graphics tools in GGobi linked with the data analytic tools in R. Users can highlight different parts of the metabolic network and see the relevant expression data highlighted in other data plots. Multi-dimensional expression data can be rotated through different dimensions. Statistical analysis can be computed alongside the visual. MetNet is designed to provide a framework for the formulation of testable hypotheses regarding the function of specific genes, and in the long term provide the basis for identification of metabolic and regulatory networks that control plant composition and development.
Affiliation(s)
- Eve Syrkin Wurtele
- Department of Genetics Cellular and Developmental Biology Iowa State University Ames IA 50011 USA
|
24
|
Li J, Gao S, Wang J, Zhang C. Construction of comprehensive gene network for human mitochondria. CHINESE SCIENCE BULLETIN-CHINESE 2010. [DOI: 10.1007/s11434-010-3028-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
25
|
Abstract
The advent of high throughput genome-scale bioinformatics has led to an exponential increase in available cellular system data. Systems metabolic engineering attempts to use data-driven approaches--based on the data collected with high throughput technologies--to identify gene targets and optimize phenotypical properties on a systems level. Current systems metabolic engineering tools are limited for predicting and defining complex phenotypes such as chemical tolerances and other global, multigenic traits. The most pragmatic systems-based tool for metabolic engineering to arise is the in silico genome-scale metabolic reconstruction. This tool has seen wide adoption for modeling cell growth and predicting beneficial gene knockouts, and we examine here how this approach can be expanded for novel organisms. This review will highlight advances of the systems metabolic engineering approach with a focus on de novo development and use of genome-scale metabolic reconstructions for metabolic engineering applications. We will then discuss the challenges and prospects for this emerging field to enable model-based metabolic engineering. Specifically, we argue that current state-of-the-art systems metabolic engineering techniques represent a viable first step for improving product yield that still must be followed by combinatorial techniques or random strain mutagenesis to achieve optimal cellular systems.
Affiliation(s)
- John Blazeck
- Department of Chemical Engineering, The University of Texas at Austin, 1 University Station, Austin, TX 78712, USA
|
26
|
Affiliation(s)
- J Wixon
- Bioinformatics Division, HGMP-RC, Hinxton, Cambridge CB10 1SB, UK
|
27
|
Aho T, Almusa H, Matilainen J, Larjo A, Ruusuvuori P, Aho KL, Wilhelm T, Lähdesmäki H, Beyer A, Harju M, Chowdhury S, Leinonen K, Roos C, Yli-Harja O. Reconstruction and validation of RefRec: a global model for the yeast molecular interaction network. PLoS One 2010; 5:e10662. [PMID: 20498836 PMCID: PMC2871048 DOI: 10.1371/journal.pone.0010662] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2009] [Accepted: 04/15/2010] [Indexed: 11/26/2022] Open
Abstract
Molecular interaction networks establish all cell biological processes. The networks are under intensive research that is facilitated by new high-throughput measurement techniques for the detection, quantification, and characterization of molecules and their physical interactions. For the common model organism yeast Saccharomyces cerevisiae, public databases store a significant part of the accumulated information and, on the way to better understanding of the cellular processes, there is a need to integrate this information into a consistent reconstruction of the molecular interaction network. This work presents and validates RefRec, the most comprehensive molecular interaction network reconstruction currently available for yeast. The reconstruction integrates protein synthesis pathways, a metabolic network, and a protein-protein interaction network from major biological databases. The core of the reconstruction is based on a reference object approach in which genes, transcripts, and proteins are identified using their primary sequences. This enables their unambiguous identification and non-redundant integration. The obtained total number of different molecular species and their connecting interactions is approximately 67,000. In order to demonstrate the capacity of RefRec for functional predictions, it was used for simulating the gene knockout damage propagation in the molecular interaction network in approximately 590,000 experimentally validated mutant strains. Based on the simulation results, a statistical classifier was subsequently able to correctly predict the viability of most of the strains. The results also showed that the usage of different types of molecular species in the reconstruction is important for accurate phenotype prediction. In general, the findings demonstrate the benefits of global reconstructions of molecular interaction networks. 
With all the molecular species and their physical interactions explicitly modeled, our reconstruction is able to serve as a valuable resource in additional analyses involving objects from multiple molecular -omes. For that purpose, RefRec is freely available in the Systems Biology Markup Language format.
Affiliation(s)
- Tommi Aho
- Department of Signal Processing, Tampere University of Technology, Tampere, Finland.
|
28
|
Chowbina SR, Wu X, Zhang F, Li PM, Pandey R, Kasamsetty HN, Chen JY. HPD: an online integrated human pathway database enabling systems biology studies. BMC Bioinformatics 2009; 10 Suppl 11:S5. [PMID: 19811689 PMCID: PMC3226194 DOI: 10.1186/1471-2105-10-s11-s5] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open
Abstract
Background Pathway-oriented experimental and computational studies have led to a significant accumulation of biological knowledge concerning three major types of biological pathway events: molecular signaling events, gene regulation events, and metabolic reaction events. A pathway consists of a series of molecular pathway events that link molecular entities such as proteins, genes, and metabolites. There are approximately 300 biological pathway resources as of April 2009 according to the Pathguide database; however, these pathway databases generally have poor coverage or poor quality, and are difficult to integrate, due to syntactic-level and semantic-level data incompatibilities. Results We developed the Human Pathway Database (HPD) by integrating heterogeneous human pathway data that are either curated at the NCI Pathway Interaction Database (PID), Reactome, BioCarta, KEGG or indexed from the Protein Lounge Web sites. Integration of pathway data at syntactic, semantic, and schematic levels was based on a unified pathway data model and data warehousing-based integration techniques. HPD provides a comprehensive online view that connects human proteins, genes, RNA transcripts, enzymes, signaling events, metabolic reaction events, and gene regulatory events. At the time of this writing HPD includes 999 human pathways and more than 59,341 human molecular entities. The HPD software provides both a user-friendly Web interface for online use and a robust relational database backend for advanced pathway querying. This pathway tool enables users to 1) search for human pathways from different resources by simply entering genes/proteins involved in pathways or words appearing in pathway names, 2) analyze pathway-protein association, 3) study pathway-pathway similarity, and 4) build integrated pathway networks. We demonstrated the usage and characteristics of the new HPD through three breast cancer case studies. 
Conclusion HPD http://bio.informatics.iupui.edu/HPD is a new resource for searching, managing, and studying human biological pathways. Users of HPD can search against large collections of human biological pathways, compare related pathways and their molecular entity compositions, and build high-quality, expanded-scope disease pathway models. The current HPD software can help users address a wide range of pathway-related questions in human disease biology studies.
Affiliation(s)
- Sudhir R Chowbina
- Indiana University School of Informatics, Indianapolis, IN 46202, USA
|
29
|
Roy S, Filkov V. Strong associations between microbe phenotypes and their network architecture. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2009; 80:040902. [PMID: 19905265 DOI: 10.1103/physreve.80.040902] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/31/2009] [Indexed: 05/28/2023]
Abstract
Understanding the dependence and interplay between architecture and function in biological networks has great relevance to disease progression, biological fabrication, and biological systems in general. We propose methods to assess the association of various microbe characteristics and phenotypes with the topology of their networks. We adopt an automated approach to characterize metabolic networks of 32 microbial species using 11 topological metrics from complex networks. Clustering allows us to extract the indispensable, independent, and informative metrics. Using hierarchical linear modeling, we identify relevant subgroups of these metrics and establish that they associate with microbial phenotypes surprisingly well. This work can serve as a stepping stone to cataloging biologically relevant topological properties of networks and toward better modeling of phenotypes. The methods we use can also be applied to networks from other disciplines.
Affiliation(s)
- Soumen Roy
- Department of Medicine and Institute for Genomics and Systems Biology, The University of Chicago, Chicago, Illinois 60637, USA
|
30
|
Félix L, Rosselló F, Valiente G. Efficient reconstruction of metabolic pathways by bidirectional chemical search. Bull Math Biol 2008; 71:750-69. [PMID: 19101770 PMCID: PMC2784519 DOI: 10.1007/s11538-008-9380-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2007] [Accepted: 11/24/2008] [Indexed: 11/25/2022]
Abstract
One of the main challenges in systems biology is the establishment of the metabolome: a catalogue of the metabolites and biochemical reactions present in a specific organism. Current knowledge of biochemical pathways as stored in public databases such as KEGG, is based on carefully curated genomic evidence for the presence of specific metabolites and enzymes that activate particular biochemical reactions. In this paper, we present an efficient method to build a substantial portion of the artificial chemistry defined by the metabolites and biochemical reactions in a given metabolic pathway, which is based on bidirectional chemical search. Computational results on the pathways stored in KEGG reveal novel biochemical pathways.
Affiliation(s)
- Liliana Félix
- Algorithms, Bioinformatics, Complexity and Formal Methods Research Group, Technical University of Catalonia, 08034 Barcelona, Spain
- Francesc Rosselló
- Department of Mathematics and Computer Science, University of the Balearic Islands, 07122 Palma de Mallorca, Spain
- Research Institute of Health Science (IUNICS), University of the Balearic Islands, 07122 Palma de Mallorca, Spain
- Gabriel Valiente
- Algorithms, Bioinformatics, Complexity and Formal Methods Research Group, Technical University of Catalonia, 08034 Barcelona, Spain
- Research Institute of Health Science (IUNICS), University of the Balearic Islands, 07122 Palma de Mallorca, Spain
|
31
|
Zhao J, Tao L, Yu H, Luo J, Cao Z, Li Y. Bow-tie topological features of metabolic networks and the functional significance. Chin Sci Bull 2008. [DOI: 10.1007/s11434-007-0143-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
32
|
Koonin EV, Wolf YI. Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res 2008; 36:6688-719. [PMID: 18948295 PMCID: PMC2588523 DOI: 10.1093/nar/gkn668] [Citation(s) in RCA: 465] [Impact Index Per Article: 29.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open
Abstract
The first bacterial genome was sequenced in 1995, and the first archaeal genome in 1996. Soon after these breakthroughs, an exponential rate of genome sequencing was established, with a doubling time of approximately 20 months for bacteria and approximately 34 months for archaea. Comparative analysis of the hundreds of sequenced bacterial and dozens of archaeal genomes leads to several generalizations on the principles of genome organization and evolution. A crucial finding that enables functional characterization of the sequenced genomes and evolutionary reconstruction is that the majority of archaeal and bacterial genes have conserved orthologs in other, often, distant organisms. However, comparative genomics also shows that horizontal gene transfer (HGT) is a dominant force of prokaryotic evolution, along with the loss of genetic material resulting in genome contraction. A crucial component of the prokaryotic world is the mobilome, the enormous collection of viruses, plasmids and other selfish elements, which are in constant exchange with more stable chromosomes and serve as HGT vehicles. Thus, the prokaryotic genome space is a tightly connected, although compartmentalized, network, a novel notion that undermines the ‘Tree of Life’ model of evolution and requires a new conceptual framework and tools for the study of prokaryotic evolution.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
|
33
|
Zhu G, Yang H, Yin C, Li B. Localizations on complex networks. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2008; 77:066113. [PMID: 18643342 DOI: 10.1103/physreve.77.066113] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/20/2007] [Revised: 02/18/2008] [Indexed: 05/09/2023]
Abstract
We study the structural characteristics of complex networks using the representative eigenvectors of the adjacency matrix. The probability distribution function of the components of the representative eigenvectors is proposed to describe the localization on networks where the Euclidean distance is invalid. Several quantities are used to describe the localization properties of the representative states, such as the participation ratio, the structural entropy, and the probability distribution function of the nearest neighbor level spacings for spectra of complex networks. Whole-cell networks in the real world and the Watts-Strogatz small-world and Barabasi-Albert scale-free networks are considered. The networks have nontrivial localization properties due to the nontrivial topological structures. It is found that the ascending-order-ranked series of the occurrence probabilities at the nodes behave generally multifractally. This characteristic can be used as a structural measure of complex networks.
Affiliation(s)
- Guimei Zhu
- Department of Modern Physics, University of Science and Technology of China, Hefei Anhui 230026, China
|
34
|
Planes FJ, Beasley JE. A critical examination of stoichiometric and path-finding approaches to metabolic pathways. Brief Bioinform 2008; 9:422-36. [DOI: 10.1093/bib/bbn018] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 11/13/2022] Open
|
35
|
Abstract
Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries, and occurrence profile tools for comparing genes, pathways, and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.
|
36
|
SPIKE--a database, visualization and analysis tool of cellular signaling pathways. BMC Bioinformatics 2008; 9:110. [PMID: 18289391 PMCID: PMC2263022 DOI: 10.1186/1471-2105-9-110] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Received: 08/31/2007] [Accepted: 02/20/2008] [Indexed: 12/17/2022] Open
Abstract
Background: Biological signaling pathways that govern cellular physiology form an intricate web of tightly regulated interlocking processes. Data on these regulatory networks are accumulating at an unprecedented pace. The assimilation, visualization and interpretation of these data have become a major challenge in biological research, and once met, will greatly boost our ability to understand cell functioning on a systems level. Results: To cope with this challenge, we are developing the SPIKE knowledge-base of signaling pathways. SPIKE contains three main software components: 1) A database (DB) of biological signaling pathways. Carefully curated information from the literature and data from large public sources constitute distinct tiers of the DB. 2) A visualization package that allows interactive graphic representations of regulatory interactions stored in the DB and superposition of functional genomic and proteomic data on the maps. 3) An algorithmic inference engine that analyzes the networks for novel functional interplays between network components. SPIKE is designed and implemented as a community tool and therefore provides a user-friendly interface that allows registered users to upload data to SPIKE DB. Our vision is that the DB will be populated by a distributed and highly collaborative effort undertaken by multiple groups in the research community, where each group contributes data in its field of expertise. Conclusion: The integrated capabilities of SPIKE make it a powerful platform for the analysis of signaling networks and the integration of knowledge on such networks with omics data.
|
37
|
Majumder HK. Searching the Tritryp genomes for drug targets. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2008; 625:133-40. [PMID: 18365664 PMCID: PMC7123030 DOI: 10.1007/978-0-387-77570-8_11] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 12/12/2022]
Abstract
The recent publication of the complete genome sequences of Leishmania major, Trypanosoma brucei and Trypanosoma cruzi revealed that each genome contains 8300-12,000 protein-coding genes, of which approximately 6500 are common to all three genomes, and ushers in a new, post-genomic era for trypanosomatid drug discovery. This vast amount of new information makes possible more comprehensive and accurate target identification using several new computational approaches, including identification of metabolic "choke-points", searching the parasite proteomes for orthologues of known drug targets, and identification of parasite proteins likely to interact with known drugs and drug-like small molecules. In this chapter, we describe several databases (such as GeneDB, BRENDA, KEGG, MetaCyc, the Therapeutic Target Database, and ChemBank) and algorithms (including PathoLogic, Pathway Hunter Tool, and AutoDock) which have been developed to facilitate the bioinformatic analyses underlying these approaches. While target identification is only the first step in the drug development pipeline, these new approaches give rise to renewed optimism for the discovery of new drugs to combat the devastating diseases caused by these parasites. Traditionally, drug discovery in the trypanosomatids (and other organisms) has proceeded from two different starting points: screening large numbers of existing compounds for activity against whole parasites or more focused screening of compounds for activity against defined molecular targets. Most existing anti-trypanosomatid drugs were developed using the former approach, although the latter has gained much attention in the last twenty years under the rubric of "rational drug design". Until recently, one of the major bottlenecks in anti-trypanosomatid drug development has been our ability to identify good targets, since only a very small percentage of the total number of trypanosomatid genes were known. That has now changed forever, with the recent (July 2005) publication of the "Tritryp" (Trypanosoma brucei, Trypanosoma cruzi and Leishmania major) genome sequences. This vast amount of information now makes possible several new approaches for target identification and ushers in a post-genomic era for trypanosomatid drug discovery.
Affiliation(s)
- Hemanta K. Majumder
- Molecular Parasitology Laboratory, Indian Institute of Chemical Biology, Kolkata, India
|
38
|
Glycerate 2-kinase of Thermotoga maritima and genomic reconstruction of related metabolic pathways. J Bacteriol 2007; 190:1773-82. [PMID: 18156253 DOI: 10.1128/jb.01469-07] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 11/20/2022] Open
Abstract
Members of a novel glycerate-2-kinase (GK-II) family were tentatively identified in a broad range of species, including eukaryotes and archaea and many bacteria that lack a canonical enzyme of the GarK (GK-I) family. The recently reported three-dimensional structure of GK-II from Thermotoga maritima (TM1585; PDB code 2b8n) revealed a new fold distinct from other known kinase families. Here, we verified the enzymatic activity of TM1585, assessed its kinetic characteristics, and used directed mutagenesis to confirm the essential role of the two active-site residues Lys-47 and Arg-325. The main objective of this study was to apply comparative genomics for the reconstruction of metabolic pathways associated with GK-II in all bacteria and, in particular, in T. maritima. Comparative analyses of approximately 400 bacterial genomes revealed a remarkable variety of pathways that lead to GK-II-driven utilization of glycerate via a glycolysis/gluconeogenesis route. In the case of T. maritima, a three-step serine degradation pathway was inferred based on the tentative identification of two additional enzymes, serine-pyruvate aminotransferase and hydroxypyruvate reductase (TM1400 and TM1401, respectively), that convert serine to glycerate via hydroxypyruvate. Both enzymatic activities were experimentally verified, and the entire pathway was validated by its in vitro reconstitution.
|
39
|
Zhao J, Ding GH, Tao L, Yu H, Yu ZH, Luo JH, Cao ZW, Li YX. Modular co-evolution of metabolic networks. BMC Bioinformatics 2007; 8:311. [PMID: 17723146 PMCID: PMC2001200 DOI: 10.1186/1471-2105-8-311] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Received: 03/08/2007] [Accepted: 08/27/2007] [Indexed: 11/25/2022] Open
Abstract
Background: The architecture of biological networks has been reported to exhibit a high level of modularity, and to some extent, topological modules of networks overlap with known functional modules. However, how the modular topology of the molecular network affects the evolution of its member proteins remains unclear. Results: In this work, the functional and evolutionary modularity of the Homo sapiens (H. sapiens) metabolic network was investigated from a topological point of view. Network decomposition shows that the metabolic network is organized in a highly modular core-periphery way, in which the core modules are tightly linked together and perform basic metabolism functions, whereas the periphery modules only interact with a few modules and accomplish relatively independent and specialized functions. Moreover, over half of the modules exhibit co-evolutionary features and belong to specific evolutionary ages. Peripheral modules tend to evolve more cohesively and faster than core modules do. Conclusion: The correlation between functional, evolutionary and topological modularity suggests that the evolutionary history and functional requirements of metabolic systems have been imprinted in the architecture of metabolic networks. Such systems-level analysis could demonstrate how the evolution of genes may be placed in a genome-scale network context, giving a novel perspective on molecular evolution.
Affiliation(s)
- Jing Zhao
- School of Life Sciences & Technology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shanghai Center for Bioinformation and Technology, Shanghai 200235, China
- Department of Mathematics, Logistical Engineering University, Chongqing 400016, China
- Guo-Hui Ding
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Lin Tao
- Shanghai Center for Bioinformation and Technology, Shanghai 200235, China
- Hong Yu
- Shanghai Center for Bioinformation and Technology, Shanghai 200235, China
- Zhong-Hao Yu
- School of Life Sciences & Technology, Shanghai Jiao Tong University, Shanghai 200240, China
- Jian-Hua Luo
- School of Life Sciences & Technology, Shanghai Jiao Tong University, Shanghai 200240, China
- Zhi-Wei Cao
- Shanghai Center for Bioinformation and Technology, Shanghai 200235, China
- Yi-Xue Li
- School of Life Sciences & Technology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shanghai Center for Bioinformation and Technology, Shanghai 200235, China
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
|
40
|
Computational prediction of protein-protein interactions. Mol Biotechnol 2007; 38:1-17. [PMID: 18095187 DOI: 10.1007/s12033-007-0069-2] [Citation(s) in RCA: 126] [Impact Index Per Article: 7.4] [Received: 07/04/2007] [Accepted: 07/16/2007] [Indexed: 01/19/2023]
Abstract
Recently a number of computational approaches have been developed for the prediction of protein-protein interactions. Complete genome sequencing projects have provided the vast amount of information needed for these analyses. These methods utilize the structural, genomic, and biological context of proteins and genes in complete genomes to predict protein interaction networks and functional linkages between proteins. Given that experimental techniques remain expensive, time-consuming, and labor-intensive, these methods represent an important advance in proteomics. Some of these approaches utilize sequence data alone to predict interactions, while others combine multiple computational and experimental datasets to accurately build protein interaction maps for complete genomes. These methods represent a complementary approach to current high-throughput projects whose aim is to delineate protein interaction maps in complete genomes. We will describe a number of computational protocols for protein interaction prediction based on the structural, genomic, and biological context of proteins in complete genomes, and detail methods for protein interaction network visualization and analysis.
|
41
|
Tsoka S. Computational methodologies for genome evolution and functional association. Comput Chem Eng 2007. [DOI: 10.1016/j.compchemeng.2006.11.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
42
|
Overbeek R, Bartels D, Vonstein V, Meyer F. Annotation of bacterial and archaeal genomes: improving accuracy and consistency. Chem Rev 2007; 107:3431-47. [PMID: 17658903 DOI: 10.1021/cr068308h] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Indexed: 11/28/2022]
Affiliation(s)
- Ross Overbeek
- Fellowship for Interpretation of Genomes, Burr Ridge, Illinois 60527, USA
|
43
|
Affiliation(s)
- Dmitrij Frishman
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
|
44
|
Lu LJ, Sboner A, Huang YJ, Lu HX, Gianoulis TA, Yip KY, Kim PM, Montelione GT, Gerstein MB. Comparing classical pathways and modern networks: towards the development of an edge ontology. Trends Biochem Sci 2007; 32:320-31. [PMID: 17583513 DOI: 10.1016/j.tibs.2007.06.003] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.8] [Received: 11/23/2006] [Revised: 05/02/2007] [Accepted: 06/06/2007] [Indexed: 02/04/2023]
Abstract
Pathways are integral to systems biology. Their classical representation has proven useful but is inconsistent in the meaning assigned to each arrow (or edge) and inadvertently implies the isolation of one pathway from another. Conversely, modern high-throughput (HTP) experiments offer standardized networks that facilitate topological calculations. Combining these perspectives, classical pathways can be embedded within large-scale networks and thus demonstrate the crosstalk between them. As more diverse types of HTP data become available, both perspectives can be effectively merged, embedding pathways simultaneously in multiple networks. However, the original problem still remains: the current edge representation is inadequate to accurately convey all the information in pathways. Therefore, we suggest that a standardized and well-defined edge ontology is necessary and propose a prototype as a starting point for reaching this goal.
Affiliation(s)
- Long J Lu
- Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520, USA
|
45
|
Abstract
The execution of complex biological processes requires the precise interaction and regulation of thousands of molecules. Systematic approaches to study large numbers of proteins, metabolites, and their modifications have revealed complex molecular networks. These biological networks are significantly different from random networks and often exhibit ubiquitous properties in terms of their structure and organization. Analyzing these networks provides novel insights into the basic mechanisms controlling normal cellular processes and disease pathologies.
Affiliation(s)
- Xiaowei Zhu
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, Connecticut 06520, USA
|
46
|
Iyer LM, Burroughs AM, Aravind L. The prokaryotic antecedents of the ubiquitin-signaling system and the early evolution of ubiquitin-like beta-grasp domains. Genome Biol 2007; 7:R60. [PMID: 16859499 PMCID: PMC1779556 DOI: 10.1186/gb-2006-7-7-r60] [Citation(s) in RCA: 134] [Impact Index Per Article: 7.9] [Received: 04/11/2006] [Revised: 06/12/2006] [Accepted: 07/06/2006] [Indexed: 11/14/2022] Open
Abstract
A systematic analysis of prokaryotic ubiquitin-related beta-grasp fold proteins provides new insights into the functional history of the ubiquitin family. Background: Ubiquitin (Ub)-mediated signaling is one of the hallmarks of all eukaryotes. Prokaryotic homologs of Ub (ThiS and MoaD) and E1 ligases have been studied in relation to sulfur incorporation reactions in thiamine and molybdenum/tungsten cofactor biosynthesis. However, there is no evidence for entire protein modification systems with Ub-like proteins and deconjugation by deubiquitinating enzymes in prokaryotes. Hence, the evolutionary assembly of the eukaryotic Ub-signaling apparatus remains unclear. Results: We systematically analyzed prokaryotic Ub-related β-grasp fold proteins using sensitive sequence profile searches and structural analysis. Consequently, we identified novel Ub-related proteins beyond the characterized ThiS, MoaD, TGS, and YukD domains. To understand their functional associations, we sought and recovered several conserved gene neighborhoods and domain architectures. These included novel associations involving diverse sulfur metabolism proteins, siderophore biosynthesis and the gene encoding the transfer mRNA binding protein SmpB, as well as domain fusions between Ub-like domains and PIN-domain related RNases. Most strikingly, we found conserved gene neighborhoods in phylogenetically diverse bacteria combining genes for JAB domains (the primary de-ubiquitinating isopeptidases of the proteasomal complex), along with E1-like adenylating enzymes and different Ub-related proteins. Further sequence analysis of other conserved genes in these neighborhoods revealed several Ub-conjugating enzyme/E2-ligase related proteins. Genes for an Ub-like protein and a JAB domain peptidase were also found in the tail assembly gene cluster of certain caudate bacteriophages. Conclusion: These observations imply that members of the Ub family had already formed strong functional associations with E1-like proteins, UBC/E2-related proteins, and JAB peptidases in the bacteria. Several of these Ub-like proteins and the associated protein families are likely to function together in signaling systems just as in eukaryotes.
Affiliation(s)
- Lakshminarayan M Iyer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- A Maxwell Burroughs
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Bioinformatics Program, Boston University, Cummington Street, Boston, Massachusetts 02215, USA
- L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
47
|
Markowitz VM. Microbial genome data resources. Curr Opin Biotechnol 2007; 18:267-72. [PMID: 17467973 DOI: 10.1016/j.copbio.2007.04.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 01/26/2007] [Revised: 03/18/2007] [Accepted: 04/18/2007] [Indexed: 11/17/2022]
Abstract
Studies of the genomes of individual microbial organisms as well as aggregate genomes (metagenomes) of microbial communities are expected to lead to advances in various areas, such as healthcare, environmental cleanup, and alternative energy production. A variety of specialized data resources manage the results of different microbial genome data processing and interpretation stages, and represent different degrees of microbial genome characterization. Scientists studying microbial genomes and metagenomes often need one or several of these resources. Given their diversity, these resources cannot be used effectively without determining the scope and type of individual resources as well as the relationship between their data.
Affiliation(s)
- Victor M Markowitz
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mail Stop 50A-1148, Berkeley CA 94720, USA.
|
48
|
Addison ER, Hobbs ET. PRISM: protein integration of sequence metrics. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2007; 2004:2991-4. [PMID: 17270907 DOI: 10.1109/iembs.2004.1403848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
PRISM is a comprehensive method for predicting protein sequence structure and function by providing a mechanism to combine information from numerous structure and function prediction tools into one comprehensive view. It uses an evidential reasoning calculus to combine information provided from multiple sources into a consensus solution. The output of PRISM is a comprehensive, one-stop-shop report of all information about a given known or novel protein.
|
49
|
Abstract
Integrating information in the molecular biosciences involves more than the cross-referencing of sequences or structures. Experimental protocols, results of computational analyses, annotations and links to relevant literature form integral parts of this information, and impart meaning to sequence or structure. In this review, we examine some existing approaches to integrating information in the molecular biosciences. We consider not only technical issues concerning the integration of heterogeneous data sources and the corresponding semantic implications, but also the integration of analytical results. Within the broad range of strategies for integration of data and information, we distinguish between platforms and developments. We discuss two current platforms and six current developments, and identify what we believe to be their strengths and limitations. We identify key unsolved problems in integrating information in the molecular biosciences, and discuss possible strategies for addressing them including semantic integration using ontologies, XML as a data model, and graphical user interfaces as integrative environments.
Affiliation(s)
- Alexander Garcia Castro
- ARC Centre in Bioinformatics, and Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
|
50
|
Liu Y, Li J, Sam L, Goh CS, Gerstein M, Lussier YA. An integrative genomic approach to uncover molecular mechanisms of prokaryotic traits. PLoS Comput Biol 2006; 2:e159. [PMID: 17112314 PMCID: PMC1636675 DOI: 10.1371/journal.pcbi.0020159] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 05/05/2006] [Accepted: 10/10/2006] [Indexed: 11/18/2022] Open
Abstract
With the mounting availability of genomic and phenotypic databases, data integration and mining become increasingly challenging. While efforts have been put forward to analyze prokaryotic phenotypes, current computational technologies either lack high-throughput capacity for genomic-scale analysis, or are limited in their capability to integrate and mine data across different scales of biology. Consequently, simultaneous analysis of associations among genomes, phenotypes, and gene functions is prohibited. Here, we developed a high-throughput computational approach, and demonstrated for the first time the feasibility of integrating large quantities of prokaryotic phenotypes along with genomic datasets for mining across multiple scales of biology (protein domains, pathways, molecular functions, and cellular processes). Applying this method over 59 fully sequenced prokaryotic species, we identified the genetic basis and molecular mechanisms underlying the phenotypes in bacteria. We identified 3,711 significant correlations between 1,499 distinct Pfam and 63 phenotypes, with 2,650 correlations and 1,061 anti-correlations. Manual evaluation of a random sample of these significant correlations showed a minimal precision of 30% (95% confidence interval: 20%-42%; n = 50). We stratified the most significant 478 predictions and subjected 100 to manual evaluation, of which 60 were corroborated in the literature. We furthermore unveiled 10 significant correlations between phenotypes and KEGG pathways, eight of which were corroborated in the evaluation, and 309 significant correlations between phenotypes and 166 GO concepts evaluated using a random sample (minimal precision = 72%; 95% confidence interval: 60%-80%; n = 50). Additionally, we conducted a novel large-scale phenomic visualization analysis to provide insight into the modular nature of common molecular mechanisms spanning multiple biological scales and reused by related phenotypes (metaphenotypes). We propose that this method elucidates which classes of molecular mechanisms are associated with phenotypes or metaphenotypes and holds promise in facilitating a computable systems biology approach to genomic and biomedical research.
Affiliation(s)
- Yang Liu
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Center for Biomedical Informatics, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Jianrong Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Center for Biomedical Informatics, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Lee Sam
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Center for Biomedical Informatics, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Chern-Sing Goh
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
- * To whom correspondence should be addressed. E-mail: (MG); (YAL)
- Yves A Lussier
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Center for Biomedical Informatics, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Department of Biomedical Informatics, Columbia University, New York, New York, United States of America
- * To whom correspondence should be addressed. E-mail: (MG); (YAL)
|