1
Ding Z, Kihara D. Computational identification of protein-protein interactions in model plant proteomes. Sci Rep 2019; 9:8740. [PMID: 31217453 PMCID: PMC6584649 DOI: 10.1038/s41598-019-45072-8] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 01/02/2019] [Accepted: 05/30/2019] [Indexed: 12/12/2022]
Abstract
Protein-protein interactions (PPIs) play essential roles in many biological processes. A PPI network provides crucial information on how biological pathways are structured and coordinated from individual protein functions. In the past two decades, large-scale PPI networks of a handful of organisms were determined by experimental techniques. However, these experimental methods are time-consuming, expensive, and not easy to perform on new target organisms. Large-scale PPI data is particularly sparse in plant organisms. Here, we developed a computational approach for detecting PPIs trained and tested on known PPIs of Arabidopsis thaliana and applied to three plants, Arabidopsis thaliana, Glycine max (soybean), and Zea mays (maize) to discover new PPIs on a genome-scale. Our method considers a variety of features including protein sequences, gene co-expression, functional association, and phylogenetic profiles. This is the first PPI prediction method developed for and applied to benchmark datasets of Arabidopsis. The method showed a high prediction accuracy of over 90% and very high precision of close to 1.0. We predicted 50,220 PPIs in Arabidopsis thaliana, 13,175,414 PPIs in corn, and 13,527,834 PPIs in soybean. Newly predicted PPIs were classified into three confidence levels according to the availability of existing supporting evidence and discussed. Predicted PPIs in the three plant genomes are made available for future reference.
Affiliation(s)
- Ziyun Ding
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA.
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA.
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA.
- Department of Pediatrics, University of Cincinnati, Cincinnati, OH, 45229, USA.
2
Gurunathan S, Qasim M, Park C, Yoo H, Choi DY, Song H, Park C, Kim JH, Hong K. Cytotoxicity and Transcriptomic Analysis of Silver Nanoparticles in Mouse Embryonic Fibroblast Cells. Int J Mol Sci 2018; 19:ijms19113618. [PMID: 30453526 PMCID: PMC6275036 DOI: 10.3390/ijms19113618] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 09/25/2018] [Revised: 10/27/2018] [Accepted: 11/13/2018] [Indexed: 12/16/2022]
Abstract
The rapid development of nanotechnology has led to the use of silver nanoparticles (AgNPs) in biomedical applications, including antibacterial, antiviral, anti-inflammatory, and anticancer therapies. The molecular mechanism of AgNP-induced cytotoxicity has not been studied thoroughly using a combination of cellular assays and RNA sequencing (RNA-Seq) analysis. In this study, we prepared AgNPs using myricetin, an anti-oxidant polyphenol, and studied their effects on NIH3T3 mouse embryonic fibroblasts as an in vitro model system to explore the potential biomedical applications of AgNPs. AgNPs induced loss of cell viability and cell proliferation in a dose-dependent manner, as evidenced by increased leakage of lactate dehydrogenase (LDH) from cells. Reactive oxygen species (ROS) were a potential source of cytotoxicity. AgNPs also incrementally increased oxidative stress and the level of malondialdehyde, depleted glutathione and superoxide dismutase, reduced mitochondrial membrane potential and adenosine triphosphate (ATP), and caused DNA damage by increasing the level of 8-hydroxy-2′-deoxyguanosine and the expression of the p53 and p21 genes in NIH3T3 cells. Thus, activation of oxidative stress may be crucial for NIH3T3 cytotoxicity. Interestingly, gene ontology (GO) term analysis revealed alterations in epigenetics-related biological processes including nucleosome assembly and DNA methylation due to AgNP exposure. This study is the first demonstration that AgNPs can alter bulk histone gene expression. Therefore, our genome-scale study suggests that the apoptosis observed in NIH3T3 cells treated with AgNPs is mediated by the repression of genes required for cell survival and the aberrant enhancement of nucleosome assembly components to induce apoptosis.
Affiliation(s)
- Sangiliyandi Gurunathan
- Department of Stem Cell and Regenerative Biotechnology and Humanized Pig Center (SRC), Konkuk Institute of Technology, Konkuk University, Seoul 05029, Korea.
- Muhammad Qasim
- Department of Stem Cell and Regenerative Biotechnology and Humanized Pig Center (SRC), Konkuk Institute of Technology, Konkuk University, Seoul 05029, Korea.
- Chanhyeok Park
- Department of Stem Cell and Regenerative Biotechnology and Humanized Pig Center (SRC), Konkuk Institute of Technology, Konkuk University, Seoul 05029, Korea.
- Hyunjin Yoo
- Department of Stem Cell and Regenerative Biotechnology and Humanized Pig Center (SRC), Konkuk Institute of Technology, Konkuk University, Seoul 05029, Korea.
- Dong Yoon Choi
- Department of Stem Cell and Regenerative Biotechnology and Humanized Pig Center (SRC), Konkuk Institute of Technology, Konkuk University, Seoul 05029, Korea.
- Hyuk Song
- Department of Stem Cell and Regenerative Biotechnology and Humanized Pig Center (SRC), Konkuk Institute of Technology, Konkuk University, Seoul 05029, Korea.
- Chankyu Park
- Department of Stem Cell and Regenerative Biotechnology and Humanized Pig Center (SRC), Konkuk Institute of Technology, Konkuk University, Seoul 05029, Korea.
- Jin-Hoi Kim
- Department of Stem Cell and Regenerative Biotechnology and Humanized Pig Center (SRC), Konkuk Institute of Technology, Konkuk University, Seoul 05029, Korea.
- Kwonho Hong
- Department of Stem Cell and Regenerative Biotechnology and Humanized Pig Center (SRC), Konkuk Institute of Technology, Konkuk University, Seoul 05029, Korea.
3
Abstract
Motivation Moonlighting proteins (MPs) are an important class of proteins that perform more than one independent cellular function. MPs are gaining more attention in recent years as they are found to play important roles in various systems including disease developments. MPs also have a significant impact in computational function prediction and annotation in databases. Currently MPs are not labeled as such in biological databases even in cases where multiple distinct functions are known for the proteins. In this work, we propose a novel method named DextMP, which predicts whether a protein is a MP or not based on its textual features extracted from scientific literature and the UniProt database. Results DextMP extracts three categories of textual information for a protein: titles, abstracts from literature, and function description in UniProt. Three language models were applied and compared: a state-of-the-art deep unsupervised learning algorithm along with two other language models of different types, Term Frequency-Inverse Document Frequency in the bag-of-words and Latent Dirichlet Allocation in the topic modeling category. Cross-validation results on a dataset of known MPs and non-MPs showed that DextMP successfully predicted MPs with over 91% accuracy with significant improvement over existing MP prediction methods. Lastly, we ran DextMP with the best performing language models and text-based feature combinations on three genomes, human, yeast and Xenopus laevis, and found that about 2.5–35% of the proteomes are potential MPs. Availability and Implementation Code available at http://kiharalab.org/DextMP.
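The bag-of-words baseline named in this abstract, Term Frequency-Inverse Document Frequency, can be illustrated with a minimal sketch. This is a textbook formulation on made-up toy sentences, not DextMP's actual feature pipeline; the function name `tfidf` and the example documents are assumptions introduced here for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (textbook TF-IDF).

    Hypothetical helper: whitespace tokenization, relative term frequency,
    and idf = log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy "function descriptions" (made up for illustration).
docs = [
    "enzyme binds dna",
    "enzyme catalyzes glycolysis",
    "enzyme binds dna and catalyzes glycolysis",
]
weights = tfidf(docs)
```

A term that occurs in every document (here "enzyme") receives weight zero, while a term unique to one document (here "and") receives the largest idf factor; the resulting vectors could then feed any standard classifier.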
Affiliation(s)
- Ishita K Khan
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Mansurul Bhuiyan
- Department of Computer Science, Indiana University-Purdue University Indianapolis (IUPUI), Indianapolis, IN, USA
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Department of Biological Science, Purdue University, West Lafayette, IN, USA
4
Ding Z, Wei Q, Kihara D. Computing and Visualizing Gene Function Similarity and Coherence with NaviGO. Methods Mol Biol 2018; 1807:113-130. [PMID: 30030807 DOI: 10.1007/978-1-4939-8561-6_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/23/2022]
Abstract
Gene ontology (GO) is a controlled vocabulary of gene functions across all species, which is widely used for functional analyses of individual genes and large-scale proteomic studies. NaviGO is a webserver for visualizing and quantifying the relationship and similarity of GO annotations. Here, we walk through functionality of the NaviGO webserver ( http://kiharalab.org/web/navigo/ ) using an example input and explain what can be learned from analysis results. NaviGO has four main functions, accessed from each page of the webserver: "GO Parents," "GO Set", "GO Enrichment", and "Protein Set." For a given list of GO terms, the "GO Parents" tab visualizes the hierarchical relationship of GO terms, and the "GO Set" tab calculates six functional similarity and association scores and presents results in a network and a multidimensional scaling plot. For a set of proteins and their associated GO terms, the "GO Enrichment" tab calculates protein GO functional enrichment, while the "Protein Set" tab calculates functional association between proteins. The NaviGO source code can be also downloaded and used locally or integrated into other software pipelines.
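GO functional enrichment of a protein set is commonly assessed with a one-sided hypergeometric test. The sketch below shows that standard test only; the abstract does not state which statistic NaviGO uses, so the function name `enrichment_p_value`, its parameters, and the toy numbers are assumptions for illustration.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) under the hypergeometric distribution.

    population: number of proteins in the background set
    annotated:  background proteins carrying the GO term
    sample:     size of the query protein set
    hits:       query proteins carrying the GO term
    """
    upper = min(annotated, sample)
    # Count the draws with at least `hits` annotated proteins.
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favorable / comb(population, sample)

# Toy case: all 3 query proteins carry a term held by 3 of 10 background proteins.
p = enrichment_p_value(population=10, annotated=3, sample=3, hits=3)
```

In practice a p-value of this kind would be computed per GO term and then corrected for multiple testing.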
Affiliation(s)
- Ziyun Ding
- Department of Biological Science, Purdue University, West Lafayette, IN, USA
- Qing Wei
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Daisuke Kihara
- Department of Biological Science, Purdue University, West Lafayette, IN, USA
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
5
Xi J, Wang M, Li A. Discovering potential driver genes through an integrated model of somatic mutation profiles and gene functional information. Mol Biosyst 2017; 13:2135-2144. [DOI: 10.1039/c7mb00303j] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 02/01/2023]
Abstract
An integrated approach to identify driver genes based on information of somatic mutations, the interaction network and Gene Ontology similarity.
Affiliation(s)
- Jianing Xi
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH 230027, People’s Republic of China
- Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH 230027, People’s Republic of China
- Centers for Biomedical Engineering
- Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH 230027, People’s Republic of China
- Centers for Biomedical Engineering
6
Semantic particularity measure for functional characterization of gene sets using gene ontology. PLoS One 2014; 9:e86525. [PMID: 24489737 PMCID: PMC3904913 DOI: 10.1371/journal.pone.0086525] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 07/24/2013] [Accepted: 12/11/2013] [Indexed: 11/19/2022]
Abstract
BACKGROUND Genetic and genomic data analyses produce large sets of genes. Functional comparison of these gene sets is a key part of the analysis, as it identifies their shared functions and the functions that distinguish each set. The Gene Ontology (GO) initiative provides a unified reference for analyzing genes' molecular functions, biological processes and cellular components. Numerous semantic similarity measures have been developed to systematically quantify the weight of the GO terms shared by two genes. We studied how gene set comparisons can be improved by considering gene set particularity in addition to gene set similarity. RESULTS We propose a new approach to compute gene set particularities based on the information conveyed by GO terms. A GO term's informativeness can be computed using either its information content based on the term frequency in a corpus, or a function of the term's distance to the root. We defined the semantic particularity of a set of GO terms Sg1 compared to another set of GO terms Sg2. We combined our particularity measure with a similarity measure to compare gene sets. We demonstrated that the combination of semantic similarity and semantic particularity measures was able to identify genes with particular functions from among similar genes. This differentiation was not recognized using only a semantic similarity measure. CONCLUSION Semantic particularity should be used in conjunction with semantic similarity to perform functional analysis of GO-annotated gene sets. The principle is generalizable to other ontologies.
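The corpus-based informativeness mentioned in this abstract is usually the information content IC(t) = -log p(t), where p(t) is the relative frequency of term t in an annotation corpus. The sketch below shows only that standard quantity, not the paper's particularity measure, whose formula is not given in the abstract. The function name, the term identifiers, and the counts are hypothetical, and the counts are assumed to already include annotations of descendant terms.

```python
import math

def information_content(term_counts, root):
    """IC(t) = -log(count(t) / count(root)) for every annotated term.

    term_counts: {term: number of annotations to the term or its descendants}
    root:        key of the ontology root, which every annotation reaches
    """
    total = term_counts[root]
    return {
        term: -math.log(count / total)
        for term, count in term_counts.items()
        if count > 0  # terms never annotated have undefined IC and are skipped
    }

# Made-up counts: rarer (more specific) terms carry more information.
counts = {"root": 1000, "GO:A": 100, "GO:B": 10}
ic = information_content(counts, root="root")
```

The root term has zero information content, and IC grows as a term becomes rarer, which is why specific terms weigh more in similarity and particularity comparisons.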
7
Lee TL, Chiang JH. A Systems Biology Approach to Solving the Puzzle of Unknown Genomic Gene-Function Association Using Grid-Ready SVM Committee Machines. IEEE Comput Intell Mag 2012. [DOI: 10.1109/mci.2012.2215126] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
8
Revisiting the variation of clustering coefficient of biological networks suggests new modular structure. BMC Syst Biol 2012; 6:34. [PMID: 22548803 PMCID: PMC3465239 DOI: 10.1186/1752-0509-6-34] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 10/12/2011] [Accepted: 02/16/2012] [Indexed: 12/03/2022]
Abstract
Background A central idea in biology is the hierarchical organization of cellular processes. A commonly used method to identify the hierarchical modular organization of a network relies on detecting a global signature known as variation of clustering coefficient (so-called modularity scaling). Although several studies have suggested other possible origins of this signature, it is still widely used to identify hierarchical modularity, especially in the analysis of biological networks. Therefore, a further and systematic investigation of this signature for different types of biological networks is necessary. Results We analyzed a variety of biological networks and found that the commonly used signature of hierarchical modularity is actually the reflection of spoke-like topology, suggesting a different view of network architecture. We proved that the existence of super-hubs is the reason the clustering coefficient of a node follows a particular scaling law with degree k in metabolic networks. To study the modularity of biological networks, we systematically investigated the relationship between repulsion of hubs and variation of clustering coefficient. We provided direct evidence that repulsion between hubs is the underlying origin of the variation of clustering coefficient, and found that for biological networks having no anti-correlation between hubs, such as gene co-expression networks, the clustering coefficient does not depend on degree. Conclusions Here we have shown that the variation of clustering coefficient is neither sufficient nor exclusive for a network to be hierarchical. Our results suggest the existence of spoke-like modules as opposed to the “deterministic model” of hierarchical modularity, and suggest the need to reconsider the organizational principle of biological hierarchy.
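The quantity at the center of this abstract, the local clustering coefficient as a function of node degree, can be sketched as follows. The code uses the standard definition C = 2e / (k(k - 1)), where e is the number of links among a node's k neighbors. The toy hub-and-spoke graph and the names `clustering_coefficient` and `adj` are assumptions for illustration, not data or code from the paper.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: undefined cases are reported as 0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

# Toy spoke-like graph: one hub, four spokes, and a single spoke-spoke edge.
adj = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub"},
    "d": {"hub"},
}
by_degree = {
    node: (len(adj[node]), clustering_coefficient(adj, node)) for node in adj
}
```

In this toy graph the high-degree hub has a low coefficient (1/6) while the degree-2 nodes have a coefficient of 1, the kind of degree dependence that a spoke-like topology alone can produce.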
9
Sael L, Chitale M, Kihara D. Structure- and sequence-based function prediction for non-homologous proteins. J Struct Funct Genomics 2012; 13:111-23. [PMID: 22270458 DOI: 10.1007/s10969-012-9126-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 09/12/2011] [Accepted: 01/10/2012] [Indexed: 01/14/2023]
Abstract
The structural genomics projects have been accumulating an increasing number of protein structures, many of which remain functionally unknown. In parallel with experimental methods, computational methods are expected to make a significant contribution to the functional elucidation of such proteins. However, conventional computational methods that transfer functions from homologous proteins do not help much for these uncharacterized protein structures because they do not have apparent structural or sequence similarity with known proteins. Here, we briefly review two avenues of computational function prediction methods, i.e. structure-based methods and sequence-based methods. The focus is on our recent developments of local structure-based and sequence-based methods, which can effectively extract function information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify similar known ligand binding sites for pocket regions in a query protein without using global protein fold similarity information. Two sequence-based methods, protein function prediction and extended similarity group, make use of weakly similar sequences that are conventionally discarded in homology-based function annotation. Combined with experimental methods, we hope that computational methods will make a leading contribution to the functional elucidation of protein structures.
Affiliation(s)
- Lee Sael
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
10
Sun S, Dong X, Fu Y, Tian W. An iterative network partition algorithm for accurate identification of dense network modules. Nucleic Acids Res 2011; 40:e18. [PMID: 22121225 PMCID: PMC3273790 DOI: 10.1093/nar/gkr1103] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
Abstract
A key step in network analysis is to partition a complex network into dense modules. Currently, modularity is one of the most popular benefit functions used to partition network modules. However, recent studies suggested that it has an inherent limitation in detecting dense network modules. In this study, we observed that despite the limitation, modularity has the advantage of preserving the primary network structure of the undetected modules. Thus, we have developed a simple iterative Network Partition (iNP) algorithm to partition a network. The iNP algorithm provides a general framework in which any modularity-based algorithm can be implemented in the network partition step. Here, we tested iNP with three modularity-based algorithms: multi-step greedy (MSG), spectral clustering and Qcut. Compared with the original three methods, iNP achieved a significant improvement in the quality of network partition in a benchmark study with simulated networks, identified more modules with significantly better enrichment of functionally related genes in both yeast protein complex network and breast cancer gene co-expression network, and discovered more cancer-specific modules in the cancer gene co-expression network. As such, iNP should have a broad application as a general method to assist in the analysis of biological networks.
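The benefit function this abstract builds on, modularity, can be computed for a given partition with the standard Newman formula Q = sum over communities of (L_c / m - (d_c / 2m)^2), where m is the number of edges, L_c the number of edges inside community c, and d_c the summed degree of its nodes. The sketch below evaluates Q only; it does not implement iNP, MSG, spectral clustering, or Qcut. The function name and the toy graph are assumptions for illustration.

```python
def modularity(edges, community):
    """Newman modularity of an undirected, unweighted graph partition.

    edges:     list of (u, v) pairs, each edge listed once
    community: {node: community label}
    """
    m = len(edges)
    internal = {}    # edges with both endpoints inside a community
    degree_sum = {}  # total degree of the nodes in a community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {node: "A" for node in range(6)}
q_split = modularity(edges, two_modules)
q_merged = modularity(edges, one_module)
```

Splitting the toy graph into its two triangles gives Q = 5/14, while placing every node in one community gives Q = 0; a modularity-based partition step would therefore prefer the split.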
Affiliation(s)
- Siqi Sun
- State Key Laboratory of Genetic Engineering, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai 200433, PR China
11
Abstract
The nuclear receptors (NRs) of metazoans are an ancient family of transcription factors defined by conserved DNA- and ligand-binding domains (DBDs and LBDs, respectively). The Drosophila melanogaster genome project revealed 18 canonical NRs (with DBDs and LBDs both present) and 3 receptors with the DBD only. Annotation of subsequently sequenced insect genomes revealed only minor deviations from this pattern. A renewed focus on functional analysis of the isoforms of insect NRs is therefore required to understand the diverse roles of these transcription factors in embryogenesis, metamorphosis, reproduction, and homeostasis. One insect NR, ecdysone receptor (EcR), functions as a receptor for the ecdysteroid molting hormones of insects. Researchers have developed nonsteroidal ecdysteroid agonists for EcR that disrupt molting and can be used as safe pesticides. An exciting new technology allows EcR to be used in chimeric, ligand-inducible gene-switch systems with applications in pest management and medicine.
Affiliation(s)
- Susan E Fahrbach
- Department of Biology, Wake Forest University, Winston-Salem, North Carolina 27109, USA.
12
Quantification of protein group coherence and pathway assignment using functional association. BMC Bioinformatics 2011; 12:373. [PMID: 21929787 PMCID: PMC3189934 DOI: 10.1186/1471-2105-12-373] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 02/16/2011] [Accepted: 09/19/2011] [Indexed: 11/11/2022]
Abstract
Background Genomics and proteomics experiments produce a large amount of data that are awaiting functional elucidation. An important step in analyzing such data is to identify functional units, which consist of proteins that play coherent roles to carry out the function. Importantly, functional coherence is not identical with functional similarity. For example, proteins in the same pathway may not share the same Gene Ontology (GO) terms, but they work in a coordinated fashion so that the aimed function can be performed. Thus, simply applying existing functional similarity measures might not be the best solution to identify functional units in omics data. Results We have designed two scores for quantifying the functional coherence by considering association of GO terms observed in two biological contexts, co-occurrences in protein annotations and co-mentions in literature in the PubMed database. The counted co-occurrences of GO terms were normalized in a similar fashion as the statistical amino acid contact potential is computed in the protein structure prediction field. We demonstrate that the developed scores can identify functionally coherent protein sets, i.e. proteins in the same pathways, co-localized proteins, and protein complexes, with statistically significant score values showing a better accuracy than existing functional similarity scores. The scores are also capable of detecting protein pairs that interact with each other. It is further shown that the functional coherence scores can accurately assign proteins to their respective pathways. Conclusion We have developed two scores which quantify the functional coherence of sets of proteins. The scores reflect the actual associations of GO terms observed either in protein annotations or in literature. 
It has been shown that they have the ability to accurately distinguish biologically relevant groups of proteins from random ones as well as a good discriminative power for detecting interacting pairs of proteins. The scores were further successfully applied for assigning proteins to pathways.
13
Abstract
Classical algorithms that aim to identify biological pathways significantly related to the conditions under study frequently reduce pathways to gene sets, ignoring the constitutive non-equivalence of the genes within a defined pathway. Here we designed a network-based method to determine such non-equivalence in terms of gene weights. The gene weights determined are biologically consistent and robust to network perturbations. By integrating the gene weights into the classical gene set analysis, with a subsequent correction for the "over-counting" bias associated with multi-subunit proteins, we have developed a novel gene-weighted pathway analysis approach, implemented in an R package called "Gene Association Network-based Pathway Analysis" (GANPA). Through analysis of several microarray datasets, including the p53 dataset, an asthma dataset and three breast cancer datasets, we demonstrated that our approach is biologically reliable and reproducible, and therefore helpful for microarray data interpretation and hypothesis generation.