1
Schaffer LV, Ideker T. Mapping the multiscale structure of biological systems. Cell Syst 2021; 12:622-635. [PMID: 34139169 PMCID: PMC8245186 DOI: 10.1016/j.cels.2021.05.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/05/2021] [Revised: 05/04/2021] [Accepted: 05/14/2021] [Indexed: 01/14/2023]
Abstract
Biological systems are by nature multiscale, consisting of subsystems that factor into progressively smaller units in a deeply hierarchical structure. At any level of the hierarchy, an ever-increasing diversity of technologies can be applied to characterize the corresponding biological units and their relations, resulting in large networks of physical or functional proximities (e.g., proximities of amino acids within a protein, of proteins within a complex, or of cell types within a tissue). Here, we review general concepts and progress in using network proximity measures as a basis for creation of multiscale hierarchical maps of biological systems. We discuss the functionalization of these maps to create predictive models, including those useful in translation of genotype to phenotype, along with strategies for model visualization and challenges faced by multiscale modeling in the near future. Collectively, these approaches enable a unified hierarchical approach to biological data, with application from the molecular to the macroscopic.
Affiliation(s)
- Leah V Schaffer
  - Division of Genetics, Department of Medicine, University of California San Diego, San Diego, La Jolla, CA 92093, USA
- Trey Ideker
  - Division of Genetics, Department of Medicine, University of California San Diego, San Diego, La Jolla, CA 92093, USA

2
Janwa H, Massey SE, Velev J, Mishra B. On the Origin of Biomolecular Networks. Front Genet 2019; 10:240. [PMID: 31024611 PMCID: PMC6467946 DOI: 10.3389/fgene.2019.00240] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/04/2018] [Accepted: 03/04/2019] [Indexed: 12/17/2022]
Abstract
Biomolecular networks have already found great utility in characterizing complex biological systems arising from pairwise interactions amongst biomolecules. Here, we explore the important and hitherto neglected role of information asymmetry in the genesis and evolution of such pairwise biomolecular interactions. Information asymmetry between sender and receiver genes is identified as a key feature distinguishing early biochemical reactions from abiotic chemistry, and a driver of network topology as biomolecular systems become more complex. In this context, we review how graph theoretical approaches can be applied not only for a better understanding of various proximate (mechanistic) relations, but also, ultimate (evolutionary) structures encoded in such networks from among all types of variations they induce. Among many possible variations, we emphasize particularly the essential role of gene duplication in terms of signaling game theory, whereby sender and receiver gene players accrue benefit from gene duplication, leading to a preferential attachment mode of network growth. The study of the resulting dynamics suggests many mathematical/computational problems, the majority of which are intractable yet yield to efficient approximation algorithms, when studied through an algebraic graph theoretic lens. We relegate for future work the role of other possible generalizations, additionally involving horizontal gene transfer, sexual recombination, endo-symbiosis, etc., which enrich the underlying graph theory even further.
Affiliation(s)
- Heeralal Janwa
  - Department of Mathematics, University of Puerto Rico, San Juan, PR, United States
- Steven E Massey
  - Department of Biology, University of Puerto Rico, San Juan, PR, United States
- Julian Velev
  - Department of Physics, University of Puerto Rico, San Juan, PR, United States
- Bud Mishra
  - Departments of Computer Science, Mathematics and Cell Biology, Courant Institute and NYU School of Medicine, New York University, New York City, NY, United States

3
Fernandes R, Nogueira G, da Costa PJ, Pinto F, Romão L. Nonsense-Mediated mRNA Decay in Development, Stress and Cancer. Adv Exp Med Biol 2019; 1157:41-83. [DOI: 10.1007/978-3-030-19966-1_3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/14/2022]

4
Tegge AN, Rodrigues RR, Larkin AL, Vu L, Murali TM, Rajagopalan P. Transcriptomic Analysis of Hepatic Cells in Multicellular Organotypic Liver Models. Sci Rep 2018; 8:11306. [PMID: 30054499 PMCID: PMC6063915 DOI: 10.1038/s41598-018-29455-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/27/2017] [Accepted: 07/11/2018] [Indexed: 02/08/2023]
Abstract
Liver homeostasis requires the presence of both parenchymal and non-parenchymal cells (NPCs). However, systems biology studies of the liver have primarily focused on hepatocytes. Using an organotypic three-dimensional (3D) hepatic culture, we report the first transcriptomic study of liver sinusoidal endothelial cells (LSECs) and Kupffer cells (KCs) cultured with hepatocytes. Through computational pathway and interaction network analyses, we demonstrate that hepatocytes, LSECs and KCs have distinct expression profiles and functional characteristics. Our results show that LSECs in the presence of KCs exhibit decreased expression of focal adhesion kinase (FAK) signaling, a pathway linked to LSEC dedifferentiation. We report the novel result that peroxisome proliferator-activated receptor alpha (PPARα) is transcribed in LSECs. The expression of downstream processes corroborates active PPARα signaling in LSECs. We uncover transcriptional evidence in LSECs for a feedback mechanism between PPARα and farnesoid X-activated receptor (FXR) that maintains bile acid homeostasis; previously, this feedback was known to occur only in HepG2 cells. We demonstrate that KCs in 3D liver models display expression patterns consistent with an anti-inflammatory phenotype when compared to monocultures. These results highlight the distinct roles of LSECs and KCs in maintaining liver function and emphasize the need for additional mechanistic studies of NPCs in addition to hepatocytes in liver-mimetic microenvironments.
Affiliation(s)
- Allison N Tegge
  - Department of Computer Science, Virginia Tech, Blacksburg, USA
  - Department of Statistics, Virginia Tech, Blacksburg, USA
- Richard R Rodrigues
  - Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Tech, Blacksburg, USA
- Adam L Larkin
  - Department of Chemical Engineering, Virginia Tech, Blacksburg, USA
- Lucas Vu
  - Department of Chemical Engineering, Virginia Tech, Blacksburg, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, USA
  - ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, USA
- Padmavathy Rajagopalan
  - Department of Chemical Engineering, Virginia Tech, Blacksburg, USA
  - ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, USA
  - Virginia Tech-Wake Forest School of Biomedical Engineering and Sciences, Virginia Tech, Blacksburg, USA

5
Abstract
Motivation: Moonlighting proteins (MPs) are an important class of proteins that perform more than one independent cellular function. MPs have gained attention in recent years because they play important roles in various systems, including disease development. MPs also have a significant impact on computational function prediction and annotation in databases. Currently, MPs are not labeled as such in biological databases, even in cases where multiple distinct functions are known for the proteins. In this work, we propose a novel method named DextMP, which predicts whether a protein is an MP based on textual features extracted from the scientific literature and the UniProt database. Results: DextMP extracts three categories of textual information for a protein: titles, abstracts from literature, and function description in UniProt. Three language models were applied and compared: a state-of-the-art deep unsupervised learning algorithm along with two other language models of different types, Term Frequency-Inverse Document Frequency in the bag-of-words category and Latent Dirichlet Allocation in the topic modeling category. Cross-validation results on a dataset of known MPs and non-MPs showed that DextMP successfully predicted MPs with over 91% accuracy, a significant improvement over existing MP prediction methods. Lastly, we ran DextMP with the best performing language models and text-based feature combinations on three genomes, human, yeast and Xenopus laevis, and found that about 2.5–35% of the proteomes are potential MPs. Availability and Implementation: Code available at http://kiharalab.org/DextMP.
Affiliation(s)
- Ishita K Khan
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Mansurul Bhuiyan
  - Department of Computer Science, Indiana University-Purdue University Indianapolis (IUPUI), Indianapolis, IN, USA
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
  - Department of Biological Science, Purdue University, West Lafayette, IN, USA

6
Padmanabhan K, Shpanskaya K, Bello G, Doraiswamy PM, Samatova NF. Toward Personalized Network Biomarkers in Alzheimer's Disease: Computing Individualized Genomic and Protein Crosstalk Maps. Front Aging Neurosci 2017; 9:315. [PMID: 29085293 PMCID: PMC5649142 DOI: 10.3389/fnagi.2017.00315] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/14/2017] [Accepted: 09/15/2017] [Indexed: 01/12/2023]
Affiliation(s)
- Kanchana Padmanabhan
  - Department of Computer Science, North Carolina State University, Raleigh, NC, United States
  - Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Katie Shpanskaya
  - Stanford University School of Medicine, Stanford, CA, United States
- Gonzalo Bello
  - Department of Computer Science, North Carolina State University, Raleigh, NC, United States
- P Murali Doraiswamy
  - Department of Psychiatry, Duke University, Durham, NC, United States
  - Duke Institute for Brain Sciences, Duke University, Durham, NC, United States
- Nagiza F Samatova
  - Department of Computer Science, North Carolina State University, Raleigh, NC, United States
  - Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States

7
Sam SA, Teel J, Tegge AN, Bharadwaj A, Murali TM. XTalkDB: a database of signaling pathway crosstalk. Nucleic Acids Res 2016; 45:D432-D439. [PMID: 27899583 PMCID: PMC5210533 DOI: 10.1093/nar/gkw1037] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 07/23/2016] [Revised: 09/28/2016] [Accepted: 10/20/2016] [Indexed: 01/01/2023]
Abstract
Analysis of signaling pathways and their crosstalk is a cornerstone of systems biology. Thousands of papers have been published on these topics. Surprisingly, there is no database that carefully and explicitly documents crosstalk between specific pairs of signaling pathways. We have developed XTalkDB (http://www.xtalkdb.org) to fill this very important gap. XTalkDB contains curated information for 650 pairs of pathways from over 1600 publications. In addition, the database reports the molecular components (e.g. proteins, hormones, microRNAs) that mediate crosstalk between a pair of pathways and the species and tissue in which the crosstalk was observed. The XTalkDB website provides an easy-to-use interface for scientists to browse crosstalk information by querying one or more pathways or molecules of interest.
Affiliation(s)
- Sarah A Sam
  - Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
  - School of Neuroscience, Virginia Tech, Blacksburg, VA 24061, USA
- Joelle Teel
  - Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Allison N Tegge
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
  - Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA
- Aditya Bharadwaj
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
  - ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, VA 24061, USA

8
Glass K, Girvan M. Finding New Order in Biological Functions from the Network Structure of Gene Annotations. PLoS Comput Biol 2015; 11:e1004565. [PMID: 26588252 PMCID: PMC4654495 DOI: 10.1371/journal.pcbi.1004565] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 01/06/2015] [Accepted: 09/23/2015] [Indexed: 11/19/2022]
Abstract
The Gene Ontology (GO) provides biologists with a controlled terminology that describes how genes are associated with functions and how functional terms are related to one another. These term-term relationships encode how scientists conceive the organization of biological functions, and they take the form of a directed acyclic graph (DAG). Here, we propose that the network structure of gene-term annotations made using GO can be employed to establish an alternative approach for grouping functional terms that captures intrinsic functional relationships that are not evident in the hierarchical structure established in the GO DAG. Instead of relying on an externally defined organization for biological functions, our approach connects biological functions together if they are performed by the same genes, as indicated in a compendium of gene annotation data from numerous different sources. We show that grouping terms by this alternate scheme provides a new framework with which to describe and predict the functions of experimentally identified sets of genes.
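The central idea of this abstract, connecting two functional terms when the same genes are annotated to both, can be illustrated with a short sketch. It is not the authors' implementation: the toy annotation table, the function name `term_network`, the `min_shared` threshold, and the choice of a Jaccard index as the edge weight are all assumptions made here for illustration.

```python
from itertools import combinations

def term_network(annotations, min_shared=1):
    """Link two terms when at least `min_shared` genes are annotated to both.

    `annotations` maps a term name to the set of genes annotated with it.
    Returns {(term_a, term_b): weight}, where term_a < term_b and the weight
    is the Jaccard index of the two gene sets (an assumed weighting).
    """
    edges = {}
    for a, b in combinations(sorted(annotations), 2):
        shared = annotations[a] & annotations[b]
        if len(shared) >= min_shared:
            union = annotations[a] | annotations[b]
            edges[(a, b)] = len(shared) / len(union)
    return edges

# Hypothetical toy annotations (not real GO data).
toy = {
    "term_A": {"g1", "g2", "g3"},
    "term_B": {"g2", "g3", "g4"},
    "term_C": {"g5"},
}
network = term_network(toy)
```

In this toy example only `term_A` and `term_B` share genes, so they are the only pair linked; `term_C` stays isolated regardless of where the three terms would sit in a curated hierarchy.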
Affiliation(s)
- Kimberly Glass
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
  - Physics Department, University of Maryland, College Park, Maryland, United States of America
- Michelle Girvan
  - Physics Department, University of Maryland, College Park, Maryland, United States of America
  - Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America
  - Santa Fe Institute, Santa Fe, New Mexico, United States of America

9
Suphavilai C, Zhu L, Chen JY. A method for developing regulatory gene set networks to characterize complex biological systems. BMC Genomics 2015; 16 Suppl 11:S4. [PMID: 26576648 PMCID: PMC4652563 DOI: 10.1186/1471-2164-16-s11-s4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/18/2023]
Abstract
Background: Traditional approaches to studying molecular networks are based on linking genes or proteins. Higher-level networks linking gene sets or pathways have been proposed recently. Several types of gene set networks have been used to study complex molecular networks, such as co-membership gene set networks (M-GSNs) and co-enrichment gene set networks (E-GSNs). Gene set networks are useful for studying biological mechanisms of diseases and drug perturbations. Results: In this study, we proposed a new approach for constructing directed, regulatory gene set networks (R-GSNs) to reveal novel relationships among gene sets or pathways. We collected several gene set collections and high-quality gene regulation data in order to construct R-GSNs in a comparative study with co-membership gene set networks (M-GSNs). We described a method for constructing both global and disease-specific R-GSNs and determining their significance. To demonstrate the potential applications to disease biology studies, we constructed and analysed an R-GSN specifically built for Alzheimer's disease. Conclusions: R-GSNs can provide new biological insights complementary to those derived at the protein regulatory network level or from M-GSNs. When integrated properly with functional genomics data, R-GSNs can help enable future research on systems biology and translational bioinformatics.
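As a rough illustration of a directed gene set network, the sketch below draws an edge from one gene set to another when regulators in the first set target genes in the second. The abstract does not specify the construction or the significance test, so everything here is an assumption: the function name `regulatory_gene_set_network`, the raw edge count used as the score, the `min_edges` cutoff, and the toy data.

```python
def regulatory_gene_set_network(gene_sets, regulations, min_edges=1):
    """Build directed links between gene sets from gene-level regulation pairs.

    gene_sets:   {set_name: set of gene identifiers}
    regulations: list of (regulator, target) pairs; a list is expected
                 because it is scanned once per ordered pair of sets.
    Returns {(source_set, target_set): number of supporting regulation pairs}.
    No significance testing is performed in this sketch.
    """
    links = {}
    for src_name, src_genes in gene_sets.items():
        for dst_name, dst_genes in gene_sets.items():
            if src_name == dst_name:
                continue
            support = sum(
                1
                for regulator, target in regulations
                if regulator in src_genes and target in dst_genes
            )
            if support >= min_edges:
                links[(src_name, dst_name)] = support
    return links

# Hypothetical toy data (not taken from the paper).
toy_sets = {"set_A": {"tfA", "g1"}, "set_B": {"g2", "g3"}}
toy_regulations = [("tfA", "g2"), ("tfA", "g3"), ("g2", "g1")]
links = regulatory_gene_set_network(toy_sets, toy_regulations)
```

Unlike a co-membership network, the result is asymmetric: `set_A` to `set_B` is supported by two regulation pairs, while the reverse direction is supported by one.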

10
Jafari M, Mirzaie M, Sadeghi M. Interlog protein network: an evolutionary benchmark of protein interaction networks for the evaluation of clustering algorithms. BMC Bioinformatics 2015; 16:319. [PMID: 26437714 PMCID: PMC4595048 DOI: 10.1186/s12859-015-0755-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/16/2015] [Accepted: 09/29/2015] [Indexed: 11/10/2022]
Abstract
Background: In the field of network science, exploring principal and crucial modules or communities is critical in the deduction of relationships and organization of complex networks. This approach expands an arena, and thus allows further study of biological functions in the field of network biology. As the clustering algorithms that are currently employed in finding modules have innate uncertainties, external and internal validations are necessary. Methods: Sequence and network structure alignment has been used to define the Interlog Protein Network (IPN). This network is an evolutionarily conserved network with communal nodes and fewer false-positive links. In the current study, the IPN is employed as an evolution-based benchmark in the validation of the module finding methods. The clustering results of five algorithms, namely Markov Clustering (MCL), Restricted Neighborhood Search Clustering (RNSC), Cartographic Representation (CR), Laplacian Dynamics (LD), and Genetic Algorithm to find communities in Protein-Protein Interaction networks (GAPPI), are assessed by the IPN in four distinct Protein-Protein Interaction Networks (PPINs). Results: MCL proves to be the more accurate algorithm under this evolutionary benchmarking approach. Also, the biological relevance of proteins in the IPN modules generated by MCL is compatible with biological standard databases such as Gene Ontology, KEGG and Reactome. Conclusion: In this study, the IPN shows its potential for validation of clustering algorithms due to its biological logic and straightforward implementation.
Affiliation(s)
- Mohieddin Jafari
  - Drug Design and Bioinformatics Unit, Medical Biotechnology Department, Biotechnology Research Center, Pasteur Institute of Iran, 69 Pasteur St, PO Box 13164, Tehran, Iran
  - School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Shahid Lavasani St, PO Box 19395-5746, Tehran, Iran
- Mehdi Mirzaie
  - Department of Computational Biology, Faculty of High Technologies, Tarbiat Modares University, Jalal Ale Ahmad Highway, PO Box 14115-111, Tehran, Iran
- Mehdi Sadeghi
  - National Institute of Genetic Engineering and Biotechnology (NIGEB), Pajoohesh Blvd, 17 Km Tehran-Karaj Highway, PO Box 161-14965, Tehran, Iran

11
Tegge AN, Sharp N, Murali TM. Xtalk: a path-based approach for identifying crosstalk between signaling pathways. Bioinformatics 2015; 32:242-51. [PMID: 26400040 DOI: 10.1093/bioinformatics/btv549] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 10/22/2014] [Accepted: 09/04/2015] [Indexed: 12/26/2022]
Abstract
Motivation: Cells communicate with their environment via signal transduction pathways. On occasion, the activation of one pathway can produce an effect downstream of another pathway, a phenomenon known as crosstalk. Existing computational methods to discover such pathway pairs rely on simple overlap statistics. Results: We present Xtalk, a path-based approach for identifying pairs of pathways that may crosstalk. Xtalk computes the statistical significance of the average length of multiple short paths that connect receptors in one pathway to the transcription factors in another. By design, Xtalk reports the precise interactions and mechanisms that support the identified crosstalk. We applied Xtalk to signaling pathways in the KEGG and NCI-PID databases. We manually curated a gold standard set of 132 crosstalking pathway pairs and a set of 140 pairs that did not crosstalk, for which Xtalk achieved an area under the receiver operator characteristic curve of 0.65, a 12% improvement over the closest competing approach. The area under the receiver operator characteristic curve varied with the pathway, suggesting that crosstalk should be evaluated on a pathway-by-pathway level. We also analyzed an extended set of 658 pathway pairs in KEGG and a set of more than 7000 pathway pairs in NCI-PID. For the top-ranking pairs, we found substantial support in the literature (81% for KEGG and 78% for NCI-PID). We provide examples of networks computed by Xtalk that accurately recovered known mechanisms of crosstalk. Availability and Implementation: The XTALK software is available at http://bioinformatics.cs.vt.edu/~murali/software. Crosstalk networks are available at http://graphspace.org/graphs?tags=2015-bioinformatics-xtalk. Contact: ategge@vt.edu, murali@cs.vt.edu. Supplementary Information: Supplementary data are available at Bioinformatics online.
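The path-based statistic described in this abstract can be approximated with a much simpler sketch. The version below is a deliberate simplification and not the Xtalk software: it uses a single breadth-first shortest path per receptor and transcription factor pair rather than multiple short paths, it performs no significance test, and the function names and the toy network are hypothetical.

```python
from collections import deque

def shortest_path_length(graph, source, target):
    """Breadth-first search on a directed graph given as {node: [successors]}.

    Returns the number of edges on a shortest path from source to target,
    or None when the target cannot be reached.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def average_path_length(graph, receptors, tfs):
    """Average shortest-path length over all reachable (receptor, TF) pairs."""
    lengths = []
    for receptor in receptors:
        for tf in tfs:
            dist = shortest_path_length(graph, receptor, tf)
            if dist is not None:
                lengths.append(dist)
    return sum(lengths) / len(lengths) if lengths else None

# Hypothetical toy interaction network: receptors R1, R2 and factors TF1, TF2.
toy_graph = {
    "R1": ["K1"],
    "K1": ["TF1", "K2"],
    "K2": ["TF2"],
    "R2": ["K2"],
}
score = average_path_length(toy_graph, ["R1", "R2"], ["TF1", "TF2"])
```

A shorter average suggests that receptors of one pathway sit close to the transcription factors of the other; a full method would compare this value against a background distribution before calling it crosstalk.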
Affiliation(s)
- Allison N Tegge
  - Department of Computer Science, Department of Statistics and
- T M Murali
  - Department of Computer Science, ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, VA 24061, USA

12
Chapple CE, Herrmann C, Brun C. PrOnto database: GO term functional dissimilarity inferred from biological data. Front Genet 2015; 6:200. [PMID: 26089836 PMCID: PMC4452890 DOI: 10.3389/fgene.2015.00200] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 02/20/2015] [Accepted: 05/21/2015] [Indexed: 12/22/2022]
Abstract
Moonlighting proteins are defined by their involvement in multiple, unrelated functions. The computational prediction of such proteins requires a formal method of assessing the similarity of cellular processes, for example, by identifying dissimilar Gene Ontology terms. While many measures of Gene Ontology term similarity exist, most depend on abstract mathematical analyses of the structure of the GO tree and do not necessarily represent the underlying biology. Here, we propose two metrics of GO term functional dissimilarity derived from biological information, one based on the protein annotations and the other on the interactions between proteins. They have been collected in the PrOnto database, a novel tool which can be of particular use for the identification of moonlighting proteins. The database can be queried via a web-based interface which is freely available at http://tagc.univ-mrs.fr/pronto.
Affiliation(s)
- Charles E Chapple
  - Inserm, UMR_S1090 TAGC Marseille, France
  - Aix-Marseille Université, UMR_S1090 TAGC Marseille, France
- Carl Herrmann
  - Inserm, UMR_S1090 TAGC Marseille, France
  - Aix-Marseille Université, UMR_S1090 TAGC Marseille, France
- Christine Brun
  - Inserm, UMR_S1090 TAGC Marseille, France
  - Aix-Marseille Université, UMR_S1090 TAGC Marseille, France
  - Centre National de la Recherche Scientifique Marseille, France

13
Wang H, Huang H, Ding C. Correlated Protein Function Prediction via Maximization of Data-Knowledge Consistency. J Comput Biol 2015; 22:546-62. [PMID: 25922963 DOI: 10.1089/cmb.2014.0172] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/05/2023]
Abstract
Conventional computational approaches for protein function prediction usually predict one function at a time, fundamentally. As a result, the protein functions are treated as separate target classes. However, biological processes are highly correlated in reality, which makes multiple functions assigned to a protein not independent. Therefore, it would be beneficial to make use of function category correlations when predicting protein functions. In this article, we propose a novel Maximization of Data-Knowledge Consistency (MDKC) approach to exploit function category correlations for protein function prediction. Our approach banks on the assumption that two proteins are likely to have large overlap in their annotated functions if they are highly similar according to certain experimental data. We first establish a new pairwise protein similarity using protein annotations from knowledge perspective. Then by maximizing the consistency between the established knowledge similarity upon annotations and the data similarity upon biological experiments, putative functions are assigned to unannotated proteins. Most importantly, function category correlations are gracefully incorporated into our learning objective through the knowledge similarity. Comprehensive experimental evaluations on the Saccharomyces cerevisiae species have demonstrated promising results that validate the performance of our methods.
Affiliation(s)
- Hua Wang
  - Department of Electrical Engineering and Computer Science, Colorado School of Mines, Golden, Colorado
- Heng Huang
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas
- Chris Ding
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas

14
Ritz A, Tegge AN, Kim H, Poirel CL, Murali TM. Signaling hypergraphs. Trends Biotechnol 2014; 32:356-62. [PMID: 24857424 PMCID: PMC4299695 DOI: 10.1016/j.tibtech.2014.04.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 10/07/2013] [Revised: 04/01/2014] [Accepted: 04/04/2014] [Indexed: 01/10/2023]
Abstract
Signaling pathways function as the information-passing mechanisms of cells. A number of databases with extensive manual curation represent the current knowledge base for signaling pathways. These databases motivate the development of computational approaches for prediction and analysis. Such methods require an accurate and computable representation of signaling pathways. Pathways are often described as sets of proteins or as pairwise interactions between proteins. However, many signaling mechanisms cannot be described using these representations. In this opinion, we highlight a representation of signaling pathways that is underutilized: the hypergraph. We demonstrate the usefulness of hypergraphs in this context and discuss challenges and opportunities for the scientific community.
Affiliation(s)
- Anna Ritz
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Allison N Tegge
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Hyunju Kim
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
  - ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, VA, USA

15
Krüger DM, Ignacio Garzón J, Chacón P, Gohlke H. DrugScorePPI knowledge-based potentials used as scoring and objective function in protein-protein docking. PLoS One 2014; 9:e89466. [PMID: 24586799 PMCID: PMC3931789 DOI: 10.1371/journal.pone.0089466] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 11/12/2013] [Accepted: 01/20/2014] [Indexed: 02/06/2023]
Abstract
The distance-dependent knowledge-based DrugScorePPI potentials, previously developed for in silico alanine scanning and hot spot prediction on given structures of protein-protein complexes, are evaluated as a scoring and objective function for the structure prediction of protein-protein complexes. When applied for ranking “unbound perturbation” (“unbound docking”) decoys generated by Baker and coworkers, a 4-fold (1.5-fold) enrichment of acceptable docking solutions in the top ranks compared to a random selection is found. When applied as an objective function in FRODOCK for bound protein-protein docking on 97 complexes of the ZDOCK benchmark 3.0, DrugScorePPI/FRODOCK finds up to 10% (15%) more high accuracy solutions in the top 1 (top 10) predictions than the original FRODOCK implementation. When used as an objective function for global unbound protein-protein docking, fair docking success rates are obtained, which improve by ∼2-fold to 18% (58%) for an at least acceptable solution in the top 10 (top 100) predictions when performing knowledge-driven unbound docking. This suggests that DrugScorePPI balances well several different types of interactions important for protein-protein recognition. The results are discussed in view of the influence of crystal packing and the type of protein-protein complex docked. Finally, a simple criterion is provided with which to estimate a priori if unbound docking with DrugScorePPI/FRODOCK will be successful.
Affiliation(s)
- Dennis M. Krüger
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich-Heine-University, Düsseldorf, Germany
- José Ignacio Garzón
  - Rocasolano Physical Chemistry Institute, Consejo Superior de Investigaciones Científicas, Madrid, Spain
- Pablo Chacón
  - Rocasolano Physical Chemistry Institute, Consejo Superior de Investigaciones Científicas, Madrid, Spain
- Holger Gohlke
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich-Heine-University, Düsseldorf, Germany

16
Lasher CD, Rajagopalan P, Murali TM. Summarizing cellular responses as biological process networks. BMC Syst Biol 2013; 7:68. [PMID: 23895181 PMCID: PMC3751784 DOI: 10.1186/1752-0509-7-68] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/05/2012] [Accepted: 06/26/2013] [Indexed: 12/02/2022]
Abstract
Background: Microarray experiments can simultaneously identify thousands of genes that show significant perturbation in expression between two experimental conditions. Response networks, computed through the integration of gene interaction networks with expression perturbation data, may themselves contain tens of thousands of interactions. Gene set enrichment has become standard for summarizing the results of these analyses in terms of functionally coherent collections of genes such as biological processes. However, even these methods can yield hundreds of enriched functions that may overlap considerably. Results: We describe a new technique called Markov chain Monte Carlo Biological Process Networks (MCMC-BPN) capable of reporting a highly non-redundant set of links between processes that describe the molecular interactions that are perturbed under a specific biological context. Each link in the BPN represents the perturbed interactions that serve as the interfaces between the two processes connected by the link. We apply MCMC-BPN to publicly available liver-related datasets to demonstrate that the networks formed by the most probable inter-process links reported by MCMC-BPN show high relevance to each biological condition. We demonstrate MCMC-BPN’s ability to discern the few key links in a very large solution space by comparing its results with those of two other methods for detecting inter-process links. Conclusions: MCMC-BPN is successful in using few inter-process links to explain as many of the perturbed gene-gene interactions as possible. Thereby, BPNs summarize the important biological trends within a response network by reporting a digestible number of inter-process links that can be explored in greater detail.
Affiliation(s)
- Christopher D Lasher
- Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Tech, Blacksburg, VA 24061 USA
|
17
|
Theofilatos K, Dimitrakopoulos C, Likothanassis S, Kleftogiannis D, Moschopoulos C, Alexakos C, Papadimitriou S, Mavroudi S. The Human Interactome Knowledge Base (HINT-KB): an integrative human protein interaction database enriched with predicted protein–protein interaction scores using a novel hybrid technique. Artif Intell Rev 2013. [DOI: 10.1007/s10462-013-9409-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
|
18
|
Dutkowski J, Kramer M, Surma MA, Balakrishnan R, Cherry JM, Krogan NJ, Ideker T. A gene ontology inferred from molecular networks. Nat Biotechnol 2013; 31:38-45. [PMID: 23242164 DOI: 10.1038/nbt.2463] [Citation(s) in RCA: 124] [Impact Index Per Article: 11.3] [Received: 06/27/2012] [Indexed: 12/20/2022]
Abstract
Ontologies have proven very useful for capturing knowledge as a hierarchy of terms and their interrelationships. In biology a major challenge has been to construct ontologies of gene function given incomplete biological knowledge and inconsistencies in how this knowledge is manually curated. Here we show that large networks of gene and protein interactions in Saccharomyces cerevisiae can be used to infer an ontology whose coverage and power are equivalent to those of the manually curated Gene Ontology (GO). The network-extracted ontology (NeXO) contains 4,123 biological terms and 5,766 term-term relations, capturing 58% of known cellular components. We also explore robust NeXO terms and term relations that were initially not cataloged in GO, a number of which have now been added based on our analysis. Using quantitative genetic interaction profiling and chemogenomics, we find further support for many of the uncharacterized terms identified by NeXO, including multisubunit structures related to protein trafficking or mitochondrial function. This work enables a shift from using ontologies to evaluate data to using data to construct and evaluate ontologies.
Affiliation(s)
- Janusz Dutkowski
- Department of Medicine, University of California San Diego, La Jolla, California, USA.
|
19
|
Multi-edge gene set networks reveal novel insights into global relationships between biological themes. PLoS One 2012; 7:e45211. [PMID: 23028852 PMCID: PMC3441533 DOI: 10.1371/journal.pone.0045211] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 04/30/2012] [Accepted: 08/15/2012] [Indexed: 11/25/2022] Open
Abstract
Curated gene sets from databases such as KEGG Pathway and Gene Ontology are often used to systematically organize lists of genes or proteins derived from high-throughput data. However, the information content inherent to some relationships between the interrogated gene sets, such as pathway crosstalk, is often underutilized. A gene set network, where nodes representing individual gene sets such as KEGG pathways are connected to indicate a functional dependency, is well suited to visualize and analyze global gene set relationships. Here we introduce a novel gene set network construction algorithm that integrates gene lists derived from high-throughput experiments with curated gene sets to construct co-enrichment gene set networks. Along with previously described co-membership and linkage algorithms, we apply the co-enrichment algorithm to eight gene set collections to construct integrated multi-evidence gene set networks with multiple edge types connecting gene sets. We demonstrate the utility of this approach through examples of novel gene set networks such as the chromosome map co-differential expression gene set network. A total of twenty-four gene set networks are exposed via a web tool called MetaNet, where context-specific multi-edge gene set networks are constructed from enriched gene sets within user-defined gene lists. MetaNet is freely available at http://blaispathways.dfci.harvard.edu/metanet/.
|
20
|
Fung DCY, Li SS, Goel A, Hong SH, Wilkins MR. Visualization of the interactome: What are we looking at? Proteomics 2012; 12:1669-86. [DOI: 10.1002/pmic.201100454] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 12/30/2022]
Affiliation(s)
- David C. Y. Fung
- New South Wales Systems Biology Initiative; and School of Biotechnology and Biomolecular Sciences; The University of New South Wales; New South Wales Australia
- Simone S. Li
- New South Wales Systems Biology Initiative; and School of Biotechnology and Biomolecular Sciences; The University of New South Wales; New South Wales Australia
- Apurv Goel
- New South Wales Systems Biology Initiative; and School of Biotechnology and Biomolecular Sciences; The University of New South Wales; New South Wales Australia
- Seok-Hee Hong
- School of Information Technologies; Faculty of Engineering and Information Technologies; The University of Sydney; New South Wales Australia
- Marc R. Wilkins
- New South Wales Systems Biology Initiative; and School of Biotechnology and Biomolecular Sciences; The University of New South Wales; New South Wales Australia
|
21
|
From networks of protein interactions to networks of functional dependencies. BMC Syst Biol 2012; 6:44. [PMID: 22607727 PMCID: PMC3434018 DOI: 10.1186/1752-0509-6-44] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/20/2011] [Accepted: 05/20/2012] [Indexed: 11/23/2022]
Abstract
Background: As protein-protein interactions connect proteins that participate in either the same or different functions, networks of interacting and functionally annotated proteins can be converted into process graphs of inter-dependent function nodes (each node corresponding to interacting proteins with the same functional annotation). However, as proteins have multiple annotations, the process graph is non-redundant if only proteins participating directly in a given function are included in the related function node. Results: Reasoning that topological features (e.g., clusters of highly inter-connected proteins) might help in approaching a structured and non-redundant understanding of molecular function, an algorithm was developed that prioritizes inclusion of proteins into the function nodes that best overlap protein clusters. Specifically, the algorithm identifies function nodes (and their mutual relations), based on the topological analysis of a protein interaction network, which can be related to various biological domains, such as cellular components (e.g., peroxisome and cellular bud) or biological processes (e.g., cell budding) of the model organism S. cerevisiae. Conclusions: The method we have described allows a protein interaction network to be converted into a non-redundant process graph of inter-dependent function nodes. The examples we have described show that the resulting graph allows researchers to formulate testable hypotheses about dependencies among functions and the underlying mechanisms.
|
22
|
Tsiliki G, Kossida S. Fusion methodologies for biomedical data. J Proteomics 2011; 74:2774-85. [PMID: 21767675 DOI: 10.1016/j.jprot.2011.07.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 03/23/2011] [Revised: 06/13/2011] [Accepted: 07/01/2011] [Indexed: 12/12/2022]
Abstract
Data fusion methods are powerful tools for integrating the different views of an organism provided by various types of experimental data. We describe various methodologies for integrating and drawing inferences from a collection of biomedical data, primarily focusing on protein and gene expression data. Computational experiments performed using biomedical data, including known protein-protein interactions, hydropathy profiles, gene expression data and amino acid sequences, demonstrate the utility of this approach. Overall, studies agree that methodologies using carefully selected data of various types to predict particular classes, groups and interactions perform better than those applied to a single type of data.
Affiliation(s)
- Georgia Tsiliki
- Bioinformatics and Medical Informatics Group, Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, 115 27, Athens, Greece.
|
23
|
Lin M, Zhou X, Shen X, Mao C, Chen X. The predicted Arabidopsis interactome resource and network topology-based systems biology analyses. Plant Cell 2011; 23:911-22. [PMID: 21441435 PMCID: PMC3082272 DOI: 10.1105/tpc.110.082529] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 12/30/2010] [Revised: 12/30/2010] [Accepted: 03/10/2011] [Indexed: 05/17/2023]
Abstract
Predicted interactions are a valuable complement to experimentally reported interactions in molecular mechanism studies, particularly for higher organisms, for which reported experimental interactions represent only a small fraction of their total interactomes. With careful engineering consideration of the lessons from previous efforts, the predicted Arabidopsis interactome resource (PAIR) presents 149,900 potential molecular interactions, which are expected to cover approximately 24% of the entire interactome with approximately 40% precision. This study demonstrates that, although PAIR still has limited coverage, it is rich enough to capture many significant functional linkages within and between higher-order biological systems, such as pathways and biological processes. These inferred interactions can power several network topology-based systems biology analyses, such as gene set linkage analysis, protein function prediction, and identification of regulatory genes demonstrating insignificant expression changes. The drastically expanded molecular network in PAIR has considerably improved the capability of these analyses to integrate existing knowledge and suggest novel insights into the function and coordination of genes and gene networks.
Affiliation(s)
- Mingzhi Lin
- State Key Laboratory of Plant Physiology and Biochemistry, Zhejiang University, Hangzhou 310058, People’s Republic of China
- Department of Bioinformatics, Zhejiang University, Hangzhou 310058, People’s Republic of China
- Xi Zhou
- Department of Bioinformatics, Zhejiang University, Hangzhou 310058, People’s Republic of China
- Xueling Shen
- Institute of Biochemistry, Zhejiang University, Hangzhou 310058, People’s Republic of China
- Chuanzao Mao
- State Key Laboratory of Plant Physiology and Biochemistry, Zhejiang University, Hangzhou 310058, People’s Republic of China
- Xin Chen
- State Key Laboratory of Plant Physiology and Biochemistry, Zhejiang University, Hangzhou 310058, People’s Republic of China
- Department of Bioinformatics, Zhejiang University, Hangzhou 310058, People’s Republic of China
- Institute of Biochemistry, Zhejiang University, Hangzhou 310058, People’s Republic of China
|
24
|
Lasher CD, Rajagopalan P, Murali TM. Discovering networks of perturbed biological processes in hepatocyte cultures. PLoS One 2011; 6:e15247. [PMID: 21245926 PMCID: PMC3016309 DOI: 10.1371/journal.pone.0015247] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 08/12/2010] [Accepted: 11/02/2010] [Indexed: 12/20/2022] Open
Abstract
The liver plays a vital role in glucose homeostasis, the synthesis of bile acids and the detoxification of foreign substances. Liver culture systems are widely used to test adverse effects of drugs and environmental toxicants. The two most prevalent liver culture systems are hepatocyte monolayers (HMs) and collagen sandwiches (CS). Despite their wide use, comprehensive transcriptional programs and interaction networks in these culture systems have not been systematically investigated. We integrated an existing temporal transcriptional dataset for HM and CS cultures of rat hepatocytes with a functional interaction network of rat genes. We aimed to exploit the functional interactions to identify statistically significant linkages between perturbed biological processes. To this end, we developed a novel approach to compute Contextual Biological Process Linkage Networks (CBPLNs). CBPLNs revealed numerous meaningful connections between different biological processes and gene sets, which we were successful in interpreting within the context of liver metabolism. Multiple phenomena captured by CBPLNs at the process level such as regulation, downstream effects, and feedback loops have well described counterparts at the gene and protein level. CBPLNs reveal high-level linkages between pathways and processes, making the identification of important biological trends more tractable than through interactions between individual genes and molecules alone. Our approach may provide a new route to explore, analyze, and understand cellular responses to internal and external cues within the context of the intricate networks of molecular interactions that control cellular behavior.
Affiliation(s)
- Christopher D. Lasher
- Genetics, Bioinformatics, and Computational Biology PhD Program, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- Padmavathy Rajagopalan
- Department of Chemical Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- ICTAS Center for Systems Biology of Engineered Tissues, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- T. M. Murali
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- ICTAS Center for Systems Biology of Engineered Tissues, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
|
25
|
Pancaldi V, Schubert F, Bähler J. Meta-analysis of genome regulation and expression variability across hundreds of environmental and genetic perturbations in fission yeast. Mol Biosyst 2010; 6:543-52. [DOI: 10.1039/b913876p] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 01/08/2023]
|
26
|
Das S, Yennamalli RM, Vishnoi A, Gupta P, Bhattacharya A. Single-nucleotide variations associated with Mycobacterium tuberculosis KwaZulu-Natal strains. J Biosci 2009; 34:397-404. [PMID: 19805901 DOI: 10.1007/s12038-009-0046-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
The occurrence of drug resistance in Mycobacterium tuberculosis, the aetiological agent of tuberculosis (TB), is hampering the management and control of TB in the world. Here we present a computational analysis of recently sequenced drug-sensitive (DS), multidrug-resistant (MDR) and extensively drug-resistant (XDR) strains of M. tuberculosis. Single-nucleotide variations (SNVs) were identified in a pair-wise manner using the anchor-based whole genome comparison (ABWGC) tool and its modified version. For this analysis, four fully sequenced genomes of different strains of M. tuberculosis were taken along with three KwaZulu-Natal (KZN) strains isolated from South Africa including one XDR and one MDR strain. KZN strains were compared with other fully sequenced strains and also among each other. The variations were analysed with respect to their biological influence as a result of either altered structure or synthesis. The results suggest that the DR phenotype may be due to changes in a number of genes. The database on KZN strains can be accessed through the website http://mirna.jnu.ac.in/mgdd/.
Affiliation(s)
- Sarbashis Das
- Center for Computational Biology and Bioinformatics, School of Information Technology, Jawaharlal Nehru University, New Delhi 110 067, India
|