1
Castaneda EU, Baker EJ. KNeXT: a NetworkX-based topologically relevant KEGG parser. Front Genet 2024; 15:1292394. [PMID: 38415058 PMCID: PMC10896898 DOI: 10.3389/fgene.2024.1292394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 01/25/2024] [Indexed: 02/29/2024]
Abstract
Automating the recreation of gene and mixed gene-compound networks from Kyoto Encyclopedia of Genes and Genomes (KEGG) Markup Language (KGML) files is challenging because the data structure does not preserve the independent or loosely connected neighborhoods from which the networks were originally derived, referred to here as the topological environment. Identical accession numbers may overlap, causing neighborhoods to collapse artificially around duplicated identifiers. As a result, current parsers create misleading or erroneous graphical representations when mixed gene networks are converted to gene-only networks. To overcome these challenges we created a Python-based KEGG NetworkX Topological (KNeXT) parser that allows users to accurately recapitulate genetic networks and mixed networks from KGML map data. The software, archived on the Python Package Index (PyPI) to ensure broad application, is designed to ingest KGML files through built-in APIs and dynamically create high-fidelity topological representations. Using NetworkX's framework to generate tab-separated files additionally ensures that KNeXT results can be imported into other graph frameworks while maintaining programmatic access to the original x-y axis position of each node in the KEGG pathway. KNeXT is a well-described Python 3 package that allows users to rapidly download and aggregate specific KGML files and recreate KEGG pathways based on a range of user-defined settings. KNeXT is platform-independent and distinctive, and it is not written on top of other Python parsers. Furthermore, KNeXT enables users to parse entire local folders or single files through command line scripts and convert the output into NCBI or UniProt IDs. KNeXT gives researchers the ability to generate pathway visualizations while preserving the original context of a KEGG pathway. Source code is freely available at https://github.com/everest-castaneda/knext.
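The collapse caused by duplicated identifiers can be illustrated with a minimal sketch. It is not KNeXT's API or implementation: the toy KGML string, the helper names `edge_lists` and `count_components`, and the `name-id` labeling scheme are all assumptions made for illustration, and only the Python standard library is used. The KGML fragment has two separate neighborhoods that both contain the accession `hsa:100`.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy KGML: entries 1 and 3 share the accession hsa:100
# but sit in two unconnected neighborhoods of the map.
KGML = """<pathway name="path:hsa00000" org="hsa" number="00000">
  <entry id="1" name="hsa:100" type="gene"><graphics x="10" y="10"/></entry>
  <entry id="2" name="hsa:200" type="gene"><graphics x="50" y="10"/></entry>
  <entry id="3" name="hsa:100" type="gene"><graphics x="10" y="90"/></entry>
  <entry id="4" name="hsa:300" type="gene"><graphics x="50" y="90"/></entry>
  <relation entry1="1" entry2="2" type="PPrel"/>
  <relation entry1="3" entry2="4" type="PPrel"/>
</pathway>"""


def edge_lists(kgml_text):
    """Return (collapsed, unique) edge sets built from KGML relations."""
    root = ET.fromstring(kgml_text)
    names = {e.get("id"): e.get("name") for e in root.iter("entry")}
    collapsed, unique = set(), set()
    for rel in root.iter("relation"):
        a, b = rel.get("entry1"), rel.get("entry2")
        # Keyed by accession only: duplicated accessions merge into one node.
        collapsed.add((names[a], names[b]))
        # Keyed by accession plus entry id: each occurrence stays distinct.
        unique.add((f"{names[a]}-{a}", f"{names[b]}-{b}"))
    return collapsed, unique


def count_components(edges):
    """Count connected components of an undirected edge set."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node] - seen)
    return count


collapsed, unique = edge_lists(KGML)
print(count_components(collapsed), count_components(unique))
```

Keyed by accession alone, the two neighborhoods merge into a single component; keeping the entry id in the node label leaves them as two. Either edge set could be written out as tab-separated text and loaded into a graph library.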
Affiliation(s)
- Everest Uriel Castaneda
- Department of Biology, Baylor University, Waco, TX, United States
- School of Engineering and Computer Science, Baylor University, Waco, TX, United States
- Erich J Baker
- Department of Mathematics and Computer Science, Belmont University, Nashville, TN, United States
2
Krishna Siva Prasad M, Sharma P. Exploring intrinsic information content models for addressing the issues of traditional semantic measures to evaluate verb similarity. Comput Speech Lang 2022. [DOI: 10.1016/j.csl.2021.101280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
3
Elahi A, Babamir SM. Identification of Protein Complexes Based on Core-Attachment Structure and Combination of Centrality Measures and Biological Properties in PPI Weighted Networks. Protein J 2020; 39:681-702. [PMID: 33040223 DOI: 10.1007/s10930-020-09922-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/28/2020] [Indexed: 02/02/2023]
Abstract
In protein interaction networks, a complex is a group of proteins that carries out a biological process. Correctly identifying complexes helps to better understand cell function for therapeutic purposes, such as drug discovery. This paper uses core-attachment structure, centrality measures, and biological properties of proteins to identify protein complexes, with the aim of improving prediction accuracy over related work. Unlike most methods, which do not consider such properties, we exploit the inherent organization of complexes for their identification. Clustering methods are the common approach to identifying complexes in protein interaction networks; here we propose a method for more accurate identification. With this method, we determined the core of each complex and its attachment proteins using centrality measures, biological properties, and weight density, where the weight of each interaction was calculated from the proteins' Gene Ontology information. The network weighting and the measurement of protein importance build on our previous work. For comparison with other methods, we used the DIP, Collins, Krogan, and Human datasets. The results show that our method performed significantly better than other methods at detecting protein complexes. Using p-values, we show the biological significance of our predicted complexes. The proposed method identified an acceptable number of protein complexes, with the highest proportion of biological significance, which can support the functional annotation of proteins.
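The core-attachment idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' algorithm: weighted degree stands in for their combination of centrality measures, the edge weights are made-up numbers rather than Gene Ontology-derived values, and `build_graph`, `core_attachment`, `core_size`, and `attach_threshold` are hypothetical names.

```python
# Hypothetical weighted protein-protein interactions (weights are invented).
EDGES = [
    ("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7),
    ("D", "A", 0.6), ("D", "B", 0.6), ("E", "C", 0.2),
]


def build_graph(edges):
    """Build a symmetric adjacency map {node: {neighbor: weight}}."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def core_attachment(graph, core_size=3, attach_threshold=0.3):
    """Pick a core around the most central node, then add attachments."""
    # Weighted degree as a stand-in centrality measure (an assumption).
    centrality = {n: sum(nbrs.values()) for n, nbrs in graph.items()}
    seed = max(centrality, key=centrality.get)
    # Core: the seed plus its most strongly connected neighbors.
    ranked = sorted(graph[seed], key=lambda n: graph[seed][n], reverse=True)
    core = {seed, *ranked[: core_size - 1]}
    # Attachment: a non-core node whose mean weight to the core is high enough.
    attachments = set()
    for node, nbrs in graph.items():
        if node in core:
            continue
        mean_weight = sum(nbrs.get(c, 0.0) for c in core) / len(core)
        if mean_weight >= attach_threshold:
            attachments.add(node)
    return core, attachments


core, attachments = core_attachment(build_graph(EDGES))
print(sorted(core), sorted(attachments))
```

In this toy network the densely connected proteins A, B, and C form the core, D joins as an attachment because it interacts with two core members, and the weakly linked E is left out.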
4
Lavarenne J, Guyomarc'h S, Sallaud C, Gantet P, Lucas M. The Spring of Systems Biology-Driven Breeding. Trends in Plant Science 2018; 23:706-720. [PMID: 29764727 DOI: 10.1016/j.tplants.2018.04.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/06/2017] [Revised: 04/12/2018] [Accepted: 04/16/2018] [Indexed: 05/08/2023]
Abstract
Genetics and molecular biology have contributed to the development of rationalized plant breeding programs. Recent developments in both high-throughput experimental analyses of biological systems and in silico data processing offer the possibility to address the whole gene regulatory network (GRN) controlling a given trait. GRN models can be applied to identify topological features helping to shortlist potential candidate genes for breeding purposes. Time-series data sets can be used to support dynamic modelling of the network. This will enable a deeper comprehension of network behaviour and the identification of the few elements to be genetically rewired to push the system towards a modified phenotype of interest. This paves the way to design more efficient, systems biology-based breeding strategies.
Affiliation(s)
- Jérémy Lavarenne
- UMR DIADE, Université de Montpellier, IRD, 911 Avenue Agropolis, 34394 Montpellier cedex 5, France; Biogemma, Centre de Recherches de Chappes, Route d'Ennezat, 63720 Chappes, France
- Soazig Guyomarc'h
- UMR DIADE, Université de Montpellier, IRD, 911 Avenue Agropolis, 34394 Montpellier cedex 5, France
- Christophe Sallaud
- Biogemma, Centre de Recherches de Chappes, Route d'Ennezat, 63720 Chappes, France
- Pascal Gantet
- UMR DIADE, Université de Montpellier, IRD, 911 Avenue Agropolis, 34394 Montpellier cedex 5, France
- Mikaël Lucas
- UMR DIADE, Université de Montpellier, IRD, 911 Avenue Agropolis, 34394 Montpellier cedex 5, France
5
Zhang J, Jia K, Jia J, Qian Y. An improved approach to infer protein-protein interaction based on a hierarchical vector space model. BMC Bioinformatics 2018; 19:161. [PMID: 29699476 PMCID: PMC5921294 DOI: 10.1186/s12859-018-2152-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/15/2017] [Accepted: 04/09/2018] [Indexed: 02/06/2023]
Abstract
BACKGROUND: Comparing and classifying the functions of gene products is important in today's biomedical research. Semantic similarity derived from Gene Ontology (GO) annotation has been regarded as one of the most widely used indicators of protein interaction. Among the various approaches proposed, those based on the vector space model are relatively simple, but their effectiveness is far from satisfactory. RESULTS: We propose a Hierarchical Vector Space Model (HVSM) for computing semantic similarity between different genes or their products, which enhances the basic vector space model by introducing the relations between GO terms. Besides the directly annotated terms, HVSM also takes into account their ancestors and descendants related by "is_a" and "part_of" relations. Moreover, HVSM introduces the concept of a Certainty Factor to calibrate the semantic similarity based on the number of terms annotated to genes. To assess the performance of our method, we applied HVSM to Homo sapiens and Saccharomyces cerevisiae protein-protein interaction datasets. Compared with TCSS, Resnik, and other classic similarity measures, HVSM achieved significant improvement in distinguishing positive from negative protein interactions. We also tested its correlation with sequence, EC, and Pfam similarity using the online tool CESSM. CONCLUSIONS: HVSM showed an improvement of up to 4% compared to TCSS, 8% compared to IntelliGO, 12% compared to basic VSM, 6% compared to Resnik, 8% compared to Lin, 11% compared to Jiang, 8% compared to Schlicker, and 11% compared to SimGIC using AUC scores. The CESSM test showed that HVSM was comparable to SimGIC and superior to all other similarity measures in CESSM as well as TCSS. Supplementary information and the software are available at https://github.com/kejia1215/HVSM.
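A minimal sketch can show why adding related terms to the vector changes the similarity score. It is not the HVSM weighting scheme: the toy term hierarchy in `PARENTS`, the binary term vectors, and the helper names `with_ancestors` and `cosine` are assumptions for illustration, only ancestors are added, and the Certainty Factor is not modeled.

```python
import math

# Hypothetical GO-like hierarchy: child term -> set of parent terms.
PARENTS = {
    "term_leaf1": {"term_mid"},
    "term_leaf2": {"term_mid"},
    "term_mid": {"term_root"},
    "term_root": set(),
}


def with_ancestors(terms):
    """Expand a set of annotated terms with all of their ancestors."""
    expanded, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in expanded:
            expanded.add(term)
            stack.extend(PARENTS.get(term, ()))
    return expanded


def cosine(a, b):
    """Cosine similarity of two binary term vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


gene_a = {"term_leaf1"}
gene_b = {"term_leaf2"}

basic = cosine(gene_a, gene_b)
hierarchical = cosine(with_ancestors(gene_a), with_ancestors(gene_b))
print(basic, hierarchical)
```

With direct annotations only, the two genes share no term and the basic similarity is zero. Once ancestors are included, the shared parent and root terms give a non-zero score, which is the kind of signal a hierarchy-aware model can use.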
Affiliation(s)
- Jiongmin Zhang
- Department of Computer Science & Technology, East China Normal University, North Zhongshan Road, Shanghai, 200062 China
- Ke Jia
- Department of Computer Science & Technology, East China Normal University, North Zhongshan Road, Shanghai, 200062 China
- Jinmeng Jia
- School of life science, East China Normal University, Dongchuan Road, Shanghai, 200241 China
- Ying Qian
- Department of Computer Science & Technology, East China Normal University, North Zhongshan Road, Shanghai, 200062 China
6
Peng J, Zhang X, Hui W, Lu J, Li Q, Liu S, Shang X. Improving the measurement of semantic similarity by combining gene ontology and co-functional network: a random walk based approach. BMC Systems Biology 2018; 12:18. [PMID: 29560823 PMCID: PMC5861498 DOI: 10.1186/s12918-018-0539-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 02/08/2023]
Abstract
BACKGROUND: Gene Ontology (GO) is one of the most popular bioinformatics resources. In the past decade, Gene Ontology-based gene semantic similarity has been effectively used to model gene-to-gene interactions in multiple research areas. However, most existing semantic similarity approaches rely only on GO annotations and structure, or incorporate only local interactions in the co-functional network. This may lead to inaccurate GO-based similarity resulting from incomplete GO topology and gene annotations. RESULTS: We present NETSIM2, a new network-based method that allows researchers to measure GO-based gene functional similarities by considering the global structure of the co-functional network with a random walk with restart (RWR)-based method, and by selecting the significant term pairs to reduce noise. Based on Enzyme Commission (EC) number-based groups of yeast and Arabidopsis, evaluation tests show that NETSIM2 can enhance the accuracy of Gene Ontology-based gene functional similarity. CONCLUSIONS: Using NETSIM2 as an example, we found that the accuracy of semantic similarities can be significantly improved by effectively incorporating the global gene-to-gene interactions in the co-functional network, especially for species whose gene annotations in GO are far from complete.
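Random walk with restart itself can be sketched on a toy network. This shows the generic RWR iteration only, not NETSIM2: the four-gene path network, the function name `random_walk_with_restart`, and the parameter values (`restart=0.3`, `tol`, `max_iter`) are assumptions chosen for illustration.

```python
# Hypothetical unweighted co-functional network: a simple path a - b - c - d.
NETWORK = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}


def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * e_seed until convergence."""
    nodes = sorted(adj)
    p = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(max_iter):
        # Restart mass always returns to the seed node.
        nxt = {n: (restart if n == seed else 0.0) for n in nodes}
        for u in nodes:
            nbrs = adj[u]
            if not nbrs:
                # Isolated node: send its walking mass back to the seed.
                nxt[seed] += (1.0 - restart) * p[u]
                continue
            # Walking mass is split evenly among the neighbors of u.
            share = (1.0 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


scores = random_walk_with_restart(NETWORK, seed="a")
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

The steady-state scores decrease with distance from the seed gene, but every reachable gene receives some probability, which is how RWR captures the global structure of a network rather than only direct neighbors.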
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, China; Centre for Multidisciplinary Convergence Computing (CMCC), School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Xuanshuo Zhang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Weiwei Hui
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Junya Lu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Qianqian Li
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Shuhui Liu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, China
7
Díaz-Montaña JJ, Gómez-Vela F, Díaz-Díaz N. GNC-app: A new Cytoscape app to rate gene networks biological coherence using gene-gene indirect relationships. Biosystems 2018; 166:61-65. [PMID: 29408296 DOI: 10.1016/j.biosystems.2018.01.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 08/13/2017] [Revised: 12/22/2017] [Accepted: 01/27/2018] [Indexed: 01/13/2023]
Abstract
MOTIVATION: Gene networks are currently considered a powerful tool for modelling biological processes in bioinformatics. A number of approaches to infer gene networks, and various software tools to handle them in a visually simplified way, have been developed recently. However, there is still a need to assess the inferred networks in order to establish their relevance. RESULTS: In this paper, we present the new GNC-app for Cytoscape. GNC-app implements the GNC methodology for assessing the biological coherence of gene association networks and integrates it into Cytoscape. Implemented de novo, GNC-app significantly improves on the performance of the original algorithm so that large gene networks can be analysed more efficiently. It has also been integrated into Cytoscape to make the tool more accessible to non-technical users and to facilitate visual analysis of the results. This integration allows the user to analyse not only the global biological coherence of the network, but also the biological coherence at the level of individual gene-gene relationships. It also allows the user to leverage Cytoscape's capabilities and its rich ecosystem of apps to perform further analyses and visualizations of the network using such data. AVAILABILITY: The GNC-app is freely available at the official Cytoscape app store: http://apps.cytoscape.org/apps/gnc.
Affiliation(s)
- Juan J Díaz-Montaña
- Intelligent Data Analysis (DATAi), Division of Computer Science, Pablo de Olavide University, ES-41013 Seville, Spain
- Francisco Gómez-Vela
- Intelligent Data Analysis (DATAi), Division of Computer Science, Pablo de Olavide University, ES-41013 Seville, Spain
- Norberto Díaz-Díaz
- Intelligent Data Analysis (DATAi), Division of Computer Science, Pablo de Olavide University, ES-41013 Seville, Spain