1
Chen Q, Ma B, Xu M, Xu H, Yan Z, Wang F, Wang Y, Huang Z, Yin S, Zhao Y, Wang L, Wu H, Liu X. Comparative proteomics study of exosomes in Vibrio harveyi and Vibrio anguillarum. Microb Pathog 2023:106174. [PMID: 37244489 DOI: 10.1016/j.micpath.2023.106174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 05/17/2023] [Accepted: 05/23/2023] [Indexed: 05/29/2023]
Abstract
Exosomes are a class of extracellular vesicles released by bacteria that contain diverse biomolecules. In this study, we isolated exosomes from Vibrio harveyi and Vibrio anguillarum, both serious pathogens in mariculture, by ultracentrifugation, and analyzed the exosome proteins of the two vibrios by LC-MS/MS proteomics. The exosome proteins released by V. harveyi and V. anguillarum differed; they included virulence factors (such as lipase and phospholipase in V. harveyi, and metalloprotease and hemolysin in V. anguillarum) as well as proteins involved in important bacterial life activities (such as fatty acid biosynthesis, biosynthesis of antibiotics, and carbon metabolism). Subsequently, to verify whether the exosomes participated in bacterial toxicity, Ruditapes philippinarum was challenged with V. harveyi and V. anguillarum, and the genes encoding the exosomal virulence factors screened by proteomics were tested by quantitative real-time PCR. All the genes detected were upregulated, which suggested that exosomes were involved in vibrio toxicity. The results could provide an effective proteome database for decoding the pathogenic mechanism of vibrios from the exosome perspective.
Affiliation(s)
- Qian Chen
- Key Laboratory of Marine Biotechnology in Universities of Shandong, School of Life Sciences, Ludong University, Yantai, 264025, PR China
- Bangguo Ma
- Key Laboratory of Marine Biotechnology in Universities of Shandong, School of Life Sciences, Ludong University, Yantai, 264025, PR China
- Mingzhe Xu
- Key Laboratory of Marine Biotechnology in Universities of Shandong, School of Life Sciences, Ludong University, Yantai, 264025, PR China
- Huiwen Xu
- Key Laboratory of Marine Biotechnology in Universities of Shandong, School of Life Sciences, Ludong University, Yantai, 264025, PR China
- Zimiao Yan
- Key Laboratory of Marine Biotechnology in Universities of Shandong, School of Life Sciences, Ludong University, Yantai, 264025, PR China
- Fei Wang
- Key Laboratory of Marine Biotechnology in Universities of Shandong, School of Life Sciences, Ludong University, Yantai, 264025, PR China
- Yiran Wang
- Key Laboratory of Marine Biotechnology in Universities of Shandong, School of Life Sciences, Ludong University, Yantai, 264025, PR China
- Zitong Huang
- Key Laboratory of Marine Biotechnology in Universities of Shandong, School of Life Sciences, Ludong University, Yantai, 264025, PR China
- Shuchang Yin
- Key Laboratory of Marine Biotechnology in Universities of Shandong, School of Life Sciences, Ludong University, Yantai, 264025, PR China
- Yancui Zhao
- Key Laboratory of Marine Biotechnology in Universities of Shandong, School of Life Sciences, Ludong University, Yantai, 264025, PR China
- Lei Wang
- Key Laboratory of Marine Biotechnology in Universities of Shandong, School of Life Sciences, Ludong University, Yantai, 264025, PR China
- Hongyan Wu
- Key Laboratory of Marine Biotechnology in Universities of Shandong, School of Life Sciences, Ludong University, Yantai, 264025, PR China
- Xiaoli Liu
- Key Laboratory of Marine Biotechnology in Universities of Shandong, School of Life Sciences, Ludong University, Yantai, 264025, PR China.
2
Abeysinghe R, Yang Y, Bartels M, Zheng WJ, Cui L. An evidence-based lexical pattern approach for quality assurance of Gene Ontology relations. Brief Bioinform 2022; 23:bbac122. [PMID: 35419584 PMCID: PMC9116247 DOI: 10.1093/bib/bbac122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2021] [Revised: 03/11/2022] [Accepted: 03/15/2022] [Indexed: 11/14/2022] Open
Abstract
Gene Ontology (GO) is widely used in the biological domain. It is the most comprehensive ontology providing formal representation of gene functions (GO concepts) and relations between them. However, unintentional quality defects (e.g. missing or erroneous relations) may exist in GO due to its large number of concepts and the complexity of its structure. Such quality defects would impact the results of GO-based analyses and applications. In this work, we introduce a novel evidence-based lexical pattern approach for quality assurance of GO relations. We leverage two layers of evidence to suggest potentially missing relations in GO as follows. We first utilize related concept pairs (i.e. existing relations) in GO to extract relationship-specific lexical patterns, which serve as the first layer of evidence to automatically suggest potentially missing relations between unrelated concept pairs. For each suggested missing relation, we further identify two other existing relations as the second layer of evidence; these resemble the difference between the missing relation and the existing relation on which its suggestion is based. Applied to the 15 December 2021 release of GO, this approach suggested a total of 866 potentially missing relations. Local domain experts evaluated the entire set of potentially missing relations and identified 821 as missing relations and 45 as indicating erroneous existing relations. We submitted these findings to the GO consortium for further validation and received encouraging feedback. These results indicate that our evidence-based approach can be utilized to uncover missing relations and erroneous existing relations in GO.
Affiliation(s)
- Rashmie Abeysinghe
- Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Yuntao Yang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Mason Bartels
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- W Jim Zheng
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Licong Cui
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
3
Zheng F, Abeysinghe R, Cui L. Identification of missing concepts in biomedical terminologies using sequence-based formal concept analysis. BMC Med Inform Decis Mak 2021; 21:234. [PMID: 34753458 PMCID: PMC8579614 DOI: 10.1186/s12911-021-01592-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/19/2021] [Accepted: 07/21/2021] [Indexed: 11/15/2022] Open
Abstract
Background: As biomedical knowledge is rapidly evolving, concept enrichment of biomedical terminologies is an active research area involving automatic identification of missing or new concepts. Previously, we prototyped a lexical-based formal concept analysis (FCA) approach, in which concepts were derived by intersecting bags of words, to identify potentially missing concepts in the National Cancer Institute (NCI) Thesaurus. However, this prototype did not handle concept naming and positioning. In this paper, we introduce a sequence-based FCA approach to identify potentially missing concepts, supporting concept naming and positioning. Methods: We consider the concept name sequences as FCA attributes to construct the formal context. The concept-forming process is performed by computing the longest common substrings of concept name sequences. After new concepts are formalized, we further predict their potential positions in the original hierarchy by identifying their supertypes and subtypes among the original concepts. Automated validation via external terminologies in the Unified Medical Language System (UMLS) and biomedical literature in PubMed is performed to evaluate the effectiveness of our approach. Results: We applied our sequence-based FCA approach to all the sub-hierarchies under Disease or Disorder in the NCI Thesaurus (19.08d version) and five sub-hierarchies under Clinical Finding and Procedure in SNOMED CT (US Edition, March 2020 release). In total, 1397 potentially missing concepts were identified in the NCI Thesaurus and 7223 in SNOMED CT. For the NCI Thesaurus, 85 potentially missing concepts were found in external terminologies and 315 of the remaining 1312 appeared in biomedical literature. For SNOMED CT, 576 were found in external terminologies and 1159 of the remaining 6647 were found in biomedical literature. Conclusion: Our sequence-based FCA approach shows promise for identifying potentially missing concepts in biomedical terminologies.
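The concept-forming step named in the abstract above, computing the longest common substring of two concept-name sequences, can be sketched as follows. This is only an illustration under stated assumptions: names are split into lowercase words on whitespace, "substring" is taken to mean a contiguous run of words, and the function name `longest_common_word_run` and the example names are hypothetical rather than taken from the paper.

```python
def longest_common_word_run(name_a: str, name_b: str) -> list[str]:
    """Return the longest contiguous run of words shared by two concept names.

    Assumption: whitespace tokenization and case-insensitive matching.
    """
    a = name_a.lower().split()
    b = name_b.lower().split()
    best_len, best_end = 0, 0
    # prev[j] holds the length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended one word earlier in both names
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Hypothetical example names, for illustration only
shared = longest_common_word_run(
    "Recurrent Skin Squamous Cell Carcinoma",
    "Metastatic Skin Squamous Cell Carcinoma",
)
print(" ".join(shared))
```

Because the shared run keeps word order, it can serve directly as a candidate concept name, which is the naming capability the abstract contrasts with the earlier bag-of-words prototype.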
Affiliation(s)
- Fengbo Zheng
- Department of Computer Science, University of Kentucky, Lexington, KY, USA.,School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Rashmie Abeysinghe
- Department of Neurology, McGovern School of Medicine, University of Texas Health Science Center at Houston, Houston, TX, USA
- Licong Cui
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.
4
Zheng F, Cui L. A Lexical-based Formal Concept Analysis Method to Identify Missing Concepts in the NCI Thesaurus. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2021; 2020. [PMID: 34721941 DOI: 10.1109/bibm49941.2020.9313186] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
Biomedical terminologies have been increasingly used in modern biomedical research and applications to facilitate data management and ensure semantic interoperability. As part of the evolution process, new concepts are regularly added to biomedical terminologies in response to the evolving domain knowledge and emerging applications. Most existing concept enrichment methods suggest new concepts via directly importing knowledge from external sources. In this paper, we introduced a lexical method based on formal concept analysis (FCA) to identify potentially missing concepts in a given terminology by leveraging its intrinsic knowledge - concept names. We first construct the FCA formal context based on the lexical features of concepts. Then we perform multistage intersection to formalize new concepts and detect potentially missing concepts. We applied our method to the Disease or Disorder sub-hierarchy in the National Cancer Institute (NCI) Thesaurus (19.08d version) and identified a total of 8,983 potentially missing concepts. As a preliminary evaluation of our method to validate the potentially missing concepts, we further checked whether they were included in any external source terminology in the Unified Medical Language System (UMLS). The result showed that 592 out of 8,937 potentially missing concepts were found in the UMLS.
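The idea of forming candidate concepts by intersecting lexical features of concept names can be illustrated with a small sketch. It is a simplification under stated assumptions: each name is reduced to a set of lowercase words (duplicates ignored), only a single pairwise intersection stage is shown rather than the multistage procedure the abstract mentions, and the function name `candidate_concepts` and the example names are hypothetical.

```python
from itertools import combinations


def candidate_concepts(names: list[str]) -> set[frozenset[str]]:
    """Return word sets shared by two names that no existing name already has.

    Assumption: a concept's lexical features are the lowercase words of its name.
    """
    bags = {frozenset(name.lower().split()) for name in names}
    candidates = set()
    for x, y in combinations(bags, 2):
        common = x & y
        # Keep non-empty intersections that do not match an existing concept
        if common and common not in bags:
            candidates.add(common)
    return candidates


# Hypothetical example names, for illustration only
found = candidate_concepts([
    "Recurrent Lung Carcinoma",
    "Metastatic Lung Carcinoma",
    "Lung Neoplasm",
])
for words in sorted(found, key=len):
    print(sorted(words))
```

An unordered word set carries no naming or position information, so any candidate produced this way would still need a name and a place in the hierarchy.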
Affiliation(s)
- Fengbo Zheng
- Department of Computer Science, University of Kentucky, Lexington, Kentucky, USA.,School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Licong Cui
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
5
Abeysinghe R, Hinderer EW, Moseley HNB, Cui L. SSIF: Subsumption-based Sub-term Inference Framework to audit Gene Ontology. Bioinformatics 2020; 36:3207-3214. [PMID: 32065617 PMCID: PMC7214018 DOI: 10.1093/bioinformatics/btaa106] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/23/2019] [Revised: 02/08/2020] [Accepted: 02/11/2020] [Indexed: 01/02/2023] Open
Abstract
MOTIVATION The Gene Ontology (GO) is the unifying biological vocabulary for codifying, managing and sharing biological knowledge. Quality issues in GO, if not addressed, can cause misleading results or missed biological discoveries. Manual identification of potential quality issues in GO is a challenging and arduous task, given its growing size. We introduce an automated auditing approach for suggesting potentially missing is-a relations, which may further reveal erroneous is-a relations. RESULTS We developed a Subsumption-based Sub-term Inference Framework (SSIF) by leveraging a novel term-algebra on top of a sequence-based representation of GO concepts along with three conditional rules (monotonicity, intersection and sub-concept rules). Applying SSIF to the October 3, 2018 release of GO suggested 1938 unique potentially missing is-a relations. Domain experts evaluated a random sample of 210 potentially missing is-a relations. The results showed SSIF achieved a precision of 60.61, 60.49 and 46.03% for the monotonicity, intersection and sub-concept rules, respectively. AVAILABILITY AND IMPLEMENTATION SSIF is implemented in Java. The source code is available at https://github.com/rashmie/SSIF. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rashmie Abeysinghe
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.,Department of Computer Science
- Hunter N B Moseley
- Department of Molecular and Cellular Biochemistry.,Institute for Biomedical Informatics.,Markey Cancer Center.,Center for Environmental and Systems Biochemistry, University of Kentucky, Lexington, KY 40506, USA
- Licong Cui
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
6
Qiu WR, Xu A, Xu ZC, Zhang CH, Xiao X. Identifying Acetylation Protein by Fusing Its PseAAC and Functional Domain Annotation. Front Bioeng Biotechnol 2019; 7:311. [PMID: 31867311 PMCID: PMC6908504 DOI: 10.3389/fbioe.2019.00311] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/12/2019] [Accepted: 10/22/2019] [Indexed: 11/13/2022] Open
Abstract
Acetylation is a post-translational modification (PTM), a reaction, often with acetic acid, that introduces an acetyl group into an organic compound. Correctly identifying acetylated proteins helps in understanding the mechanism of acetylation in biological systems. Although many acetylation sites have been identified by high-throughput experimental studies via mass spectrometry, many acetylation sites remain to be discovered. Computational methods have shown their power for identifying acetylation sites with informatics techniques, which usually reduce experimental cost and improve effectiveness and efficiency. In fact, an approach that can distinguish acetylated proteins from non-acetylated ones would no doubt be a very meaningful and effective method for this problem. Here, we propose a novel computational method for identifying acetylated proteins by extracting features from sequence conservation information via a gray system model and from KNN scores based on functional domain annotation and subcellular localization. We performed 5-fold cross-validation on three datasets, along with extensive feature analysis and the Relief feature selection algorithm. The obtained accuracies are all satisfactory: on average, the accuracy is 77.10%, the Matthews correlation coefficient is 0.5457, and the AUC value is 0.8389. This work might provide useful insights for related experimental validation and for further studies of other PTM processes. For the convenience of related researchers, a web server named “iACetyP” was established and is accessible at http://www.jci-bioinfo.cn/iAcetyP.
Affiliation(s)
- Wang-Ren Qiu
- School of Information and Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China.,School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Ao Xu
- School of Information and Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Zhao-Chun Xu
- School of Information and Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Chun-Hua Zhang
- School of Information and Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xuan Xiao
- School of Information and Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
7
Hu Y, Zhao T, Zhang N, Zhang Y, Cheng L. A Review of Recent Advances and Research on Drug Target Identification Methods. Curr Drug Metab 2019; 20:209-216. [PMID: 30251599 DOI: 10.2174/1389200219666180925091851] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/01/2017] [Revised: 01/01/2018] [Accepted: 08/02/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND From a therapeutic viewpoint, understanding how drugs bind and regulate the functions of their target proteins to protect against disease is crucial. The identification of drug targets plays a significant role in drug discovery and in studying the mechanisms of diseases. Therefore, the development of methods to identify drug targets has become a popular topic. METHODS We systematically review recent work on identifying drug targets from the perspectives of data and methods. We compiled several databases that collect data more comprehensively and introduced several commonly used databases. We then divided the methods into two categories, biological experiments and machine learning, each of which is subdivided into subclasses and described in detail. RESULTS Machine learning algorithms make up the majority of new methods. Generally, an optimal set of features is chosen to predict successful new drug targets with similar properties. The most widely used features include sequence properties, network topological features, structural properties, and subcellular locations. Since various machine learning methods exist, improving their performance requires combining a better subset of features and choosing the appropriate model for the various datasets involved. CONCLUSION The application of experimental and computational methods in protein drug target identification has become increasingly popular in recent years. Current biological and computational methods still have many limitations due to unbalanced and incomplete datasets or imperfect feature selection methods.
Affiliation(s)
- Yang Hu
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Tianyi Zhao
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Ningyi Zhang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Ying Zhang
- Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088, China
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
8
Yang X, Li Y, Lv R, Qian H, Chen X, Yang CF. Study on the Multitarget Mechanism and Key Active Ingredients of Herba Siegesbeckiae and Volatile Oil against Rheumatoid Arthritis Based on Network Pharmacology. EVIDENCE-BASED COMPLEMENTARY AND ALTERNATIVE MEDICINE : ECAM 2019; 2019:8957245. [PMID: 31885670 PMCID: PMC6899322 DOI: 10.1155/2019/8957245] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 07/26/2019] [Accepted: 10/28/2019] [Indexed: 11/30/2022]
Abstract
BACKGROUND Herba Siegesbeckiae (HS, Xixiancao in Chinese) is widely used to treat inflammatory joint diseases such as rheumatoid arthritis (RA) and arthritis, but its molecular mechanisms and active ingredients have not been completely elucidated. METHODS In this study, the small molecule ligand library of HS was built based on Traditional Chinese Medicine Systems Pharmacology (TCMSP). The essential oil from HS was extracted through hydrodistillation and analyzed by gas chromatography-mass spectrometry (GC-MS). The targets of RA were screened based on the Comparative Toxicogenomics Database (CTD). The key genes were identified using four algorithms in cytoHubba in Cytoscape (maximum neighborhood component (MNC), degree, maximal clique centrality (MCC), and stress), and biological functions and pathways were also analyzed. The key active ingredients and mechanism of HS and essential oil against RA were verified by molecular docking technology (Sybyl 2.1.1). The interaction between 6 active ingredients (degree ≥ 5) and CSF2, IL1β, TNF, and IL6 was examined using the software Ligplot. RESULTS HS contained 31 small molecule constituents and its essential oil contained 16 main chemical components (relative content >1%), giving 47 chemical components in total. Network analysis and Venn diagrams identified 9 core targets of RA (TNF, IL1β, CSF2, IFNG, CTLA4, IL18, CD26, CXCL8, and IL6). In addition, molecular docking simulation indicated that CSF2, IL1β, TNF, and IL6 had good binding activity with the corresponding compounds (degree > 10). The 6 compounds (degree ≥ 5) of HS and essential oil had good interaction with 5 or more targets. CONCLUSION This study validated and predicted the mechanism and key active ingredients of HS and volatile oil in treating RA. Additionally, this study provided a good foundation for further experimental studies.
Affiliation(s)
- Xin Yang
- Guizhou University of Traditional Chinese Medicine, Guiyang 550025, China
- Yahui Li
- Guizhou University of Traditional Chinese Medicine, Guiyang 550025, China
- Runlin Lv
- Guizhou University of Traditional Chinese Medicine, Guiyang 550025, China
- Haibing Qian
- Guizhou University of Traditional Chinese Medicine, Guiyang 550025, China
- Xiangyun Chen
- Guizhou University of Traditional Chinese Medicine, Guiyang 550025, China
- Chang Fu Yang
- Guizhou University of Traditional Chinese Medicine, Guiyang 550025, China
9
Xiao X, Chen WJ, Qiu WR. A Novel Prediction of Quaternary Structural Type of Proteins with Gene Ontology. Protein Pept Lett 2019; 27:313-320. [PMID: 31749418 DOI: 10.2174/0929866526666191014144618] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/23/2019] [Revised: 05/20/2019] [Accepted: 06/29/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND Information on the quaternary structure attributes of proteins is very important because it is closely related to their biological functions. With the rapid development of new-generation sequencing technology, we face a challenge: how to automatically identify the quaternary structure attributes of new polypeptide chains from their sequence information (i.e., whether they form a monomer, a hetero-oligomer, or a homo-oligomer). OBJECTIVE In this article, our goal is to find a new way to represent protein sequences, thereby improving the prediction rate of protein quaternary structure. METHODS We developed a prediction system for protein quaternary structural type in which a protein sequence is expressed by combining Pfam functional domains and gene ontology. We turn protein features into numerical sequences and complete the prediction of quaternary structure through specific machine learning algorithms and a verification algorithm. RESULTS Our data set contains 5495 protein samples. With the method provided in this paper, we classify proteins as monomers, hetero-oligomers, or homo-oligomers, and the prediction rate is 74.38%, which is 3.24% higher than that of previous studies. With this new feature extraction method, we can further classify the quaternary structure of proteins, and the results are correspondingly improved. CONCLUSION By applying the new prediction system, we have improved the prediction rate compared with previous results. We have reason to believe that the feature extraction method in this paper is more practical and can be used as a reference for other protein classification problems.
Affiliation(s)
- Xuan Xiao
- School of Information, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
- Wei-Jie Chen
- School of Information, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
- Wang-Ren Qiu
- School of Information, Jingdezhen Ceramic Institute, Jingdezhen 333403, China.,Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
10
Wang T, Peng J, Peng Q, Wang Y, Chen J. FSM: Fast and scalable network motif discovery for exploring higher-order network organizations. Methods 2019; 173:83-93. [PMID: 31306744 DOI: 10.1016/j.ymeth.2019.07.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/01/2019] [Revised: 06/30/2019] [Accepted: 07/09/2019] [Indexed: 01/06/2023] Open
Abstract
Networks exhibit rich and diverse higher-order organizational structures. Network motifs, which are recurring significant patterns of inter-connections, are recognized as fundamental units for studying the higher-order organization of networks. However, the principle for selecting representative network motifs for local motif-based clustering remains largely unexplored. We present a scalable algorithm called FSM for network motif discovery. FSM is advantageous in two ways. First, it accelerates the motif discovery process by effectively reducing the number of subgraph isomorphism labeling operations. Second, FSM adopts multiple heuristic optimizations for subgraph enumeration and classification to further improve its performance. Experimental results on biological networks show that, compared with existing network motif discovery algorithms, FSM is more efficient in both computation and memory usage. Furthermore, with the large, frequent, and sparse network motifs discovered by FSM, the higher-order organizational structures of biological networks were successfully revealed, indicating that FSM is suitable for selecting representative network motifs for exploring higher-order network organizations.
Affiliation(s)
- Tao Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Qidi Peng
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
- Jin Chen
- Institute for Biomedical Informatics, University of Kentucky, Lexington, KY, USA.
11
Peng J, Wang X, Shang X. Combining gene ontology with deep neural networks to enhance the clustering of single cell RNA-Seq data. BMC Bioinformatics 2019; 20:284. [PMID: 31182005 PMCID: PMC6557741 DOI: 10.1186/s12859-019-2769-6] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 12/24/2022] Open
Abstract
Background: Single cell RNA sequencing (scRNA-seq) is applied to assay the individual transcriptomes of large numbers of cells. Gene expression at the single-cell level provides an opportunity for better understanding of cell function and for new discoveries in biomedical areas. To ensure that single-cell gene expression data are interpreted appropriately, it is crucial to develop new computational methods. Results: In this article, we reconstruct a neural network based on Gene Ontology (GO) for dimension reduction of scRNA-seq data. By integrating GO with both unsupervised and supervised models, we propose two novel methods, named GOAE (Gene Ontology AutoEncoder) and GONN (Gene Ontology Neural Network), respectively. Conclusions: The evaluation results show that the proposed models outperform some state-of-the-art dimensionality reduction approaches. Furthermore, by incorporating GO, we provide an opportunity to interpret the underlying biological mechanism behind the neural network-based model.
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.,Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China.,Centre for Multidisciplinary Convergence Computing, School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Xiaoyu Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.,Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China.
12
Yu MK, Ma J, Ono K, Zheng F, Fong SH, Gary A, Chen J, Demchak B, Pratt D, Ideker T. DDOT: A Swiss Army Knife for Investigating Data-Driven Biological Ontologies. Cell Syst 2019; 8:267-273.e3. [PMID: 30878356 PMCID: PMC7042149 DOI: 10.1016/j.cels.2019.02.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/23/2018] [Revised: 12/08/2018] [Accepted: 02/08/2019] [Indexed: 01/08/2023]
Abstract
Systems biology requires not only genome-scale data but also methods to integrate these data into interpretable models. Previously, we developed approaches that organize omics data into a structured hierarchy of cellular components and pathways, called a "data-driven ontology." Such hierarchies recapitulate known cellular subsystems and discover new ones. To broadly facilitate this type of modeling, we report the development of a software library called the Data-Driven Ontology Toolkit (DDOT), consisting of a Python package (https://github.com/idekerlab/ddot) to assemble and analyze ontologies and a web application (http://hiview.ucsd.edu) to visualize them. Using DDOT, we programmatically assemble a compendium of ontologies for 652 diseases by integrating gene-disease mappings with a gene similarity network derived from omics data. For example, the ontology for Fanconi anemia describes known and novel disease mechanisms in its hierarchy of 194 genes and 74 subsystems. DDOT provides an easy interface to share ontologies online at the Network Data Exchange.
Affiliation(s)
- Michael Ku Yu
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA; Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, La Jolla, CA 92093, USA; Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| | - Jianzhu Ma
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Keiichiro Ono
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Fan Zheng
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Samson H Fong
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA; Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Aaron Gary
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Jing Chen
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Barry Demchak
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Dexter Pratt
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Trey Ideker
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA; Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, La Jolla, CA 92093, USA; Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA.
|
13
|
Feng S, Fu P, Zheng W. A hierarchical multi-label classification method based on neural networks for gene function prediction. BIOTECHNOL BIOTEC EQ 2018. [DOI: 10.1080/13102818.2018.1521302] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/13/2023]
Affiliation(s)
- Shou Feng
- Department of Automatic Test and Control, School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin, PR China
| | - Ping Fu
- Department of Automatic Test and Control, School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin, PR China
| | - Wenbin Zheng
- Department of Automatic Test and Control, School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin, PR China
|
14
|
Peng J, Xue H, Hui W, Lu J, Chen B, Jiang Q, Shang X, Wang Y. An online tool for measuring and visualizing phenotype similarities using HPO. BMC Genomics 2018; 19:571. [PMID: 30367579 PMCID: PMC6101067 DOI: 10.1186/s12864-018-4927-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
Background The Human Phenotype Ontology (HPO) is one of the most popular bioinformatics resources. Recently, HPO-based phenotype semantic similarity has been effectively applied to model patient phenotype data. However, the existing tools are adapted from Gene Ontology (GO)-based term similarity measures, and the design of their models is not optimized for the unique features of HPO. In addition, existing tools only allow HPO terms as input and only provide pure text-based outputs. Results We present PhenoSimWeb, a web application that allows researchers to measure HPO-based phenotype semantic similarities using four approaches borrowed from GO-based similarity measurements. In addition, we provide an approach that considers the unique properties of HPO. PhenoSimWeb also accepts text that describes phenotypes as input, since clinical phenotype data is generally recorded as text, and it provides a graphical interface to visualize the resulting phenotype network. Conclusions PhenoSimWeb is an easy-to-use and functional online application. Researchers can use it to calculate phenotype similarity conveniently, predict phenotype-associated genes or diseases, and visualize the network of phenotype interactions. PhenoSimWeb is available at http://120.77.47.2:8080.
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Hansheng Xue
- Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China
| | - Weiwei Hui
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Junya Lu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Bolin Chen
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.
| | - Yadong Wang
- Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China.
|
15
|
Hu Y, Zhao T, Zhang N, Zang T, Zhang J, Cheng L. Identifying diseases-related metabolites using random walk. BMC Bioinformatics 2018; 19:116. [PMID: 29671398 PMCID: PMC5907145 DOI: 10.1186/s12859-018-2098-1] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Indexed: 02/06/2023]
Abstract
Background Metabolites disrupted by an abnormal state of the human body are regarded as effects of disease. Compared with causes of disease such as genes, these markers are easier to capture for the prevention and diagnosis of metabolic diseases. A large number of metabolic markers of diseases remain to be explored, which motivates this work. Methods Existing metabolite-disease associations were extracted from the Human Metabolome Database (HMDB) as prior knowledge using the text mining tool NCBO Annotator. Next, we calculated the similarity of each pair of metabolites based on the similarity of their disease sets. All metabolite-pair similarities were then used to construct a weighted metabolite association network (WMAN). Subsequently, the network was used to predict novel metabolic markers of diseases using random walk. Results In total, 604 metabolites and 228 diseases were extracted from HMDB. Of the 604 metabolites, 453 were selected to construct the WMAN, in which each metabolite is a node and the similarity of two metabolites is the weight of the edge linking them. The performance of the network was validated using the leave-one-out method, which yielded an area under the receiver operating characteristic curve (AUC) of 0.7048. Further case studies identifying novel metabolites of diabetes mellitus were supported by recent studies. Conclusion In this paper, we presented a novel method for prioritizing metabolite-disease pairs. Its performance supports its reliability for exploring novel metabolic markers of diseases.
Affiliation(s)
- Yang Hu
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
| | - Tianyi Zhao
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
| | - Ningyi Zhang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
| | - Tianyi Zang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China.
| | - Jun Zhang
- Department of rehabilitation, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, 150001, People's Republic of China.
| | - Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, China.
|
16
|
Peng J, Zhang X, Hui W, Lu J, Li Q, Liu S, Shang X. Improving the measurement of semantic similarity by combining gene ontology and co-functional network: a random walk based approach. BMC SYSTEMS BIOLOGY 2018; 12:18. [PMID: 29560823 PMCID: PMC5861498 DOI: 10.1186/s12918-018-0539-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 02/08/2023]
Abstract
BACKGROUND Gene Ontology (GO) is one of the most popular bioinformatics resources. In the past decade, Gene Ontology-based gene semantic similarity has been effectively used to model gene-to-gene interactions in multiple research areas. However, most existing semantic similarity approaches rely only on GO annotations and structure, or incorporate only local interactions in the co-functional network. This may lead to inaccurate GO-based similarity because the GO topology and gene annotations are incomplete. RESULTS We present NETSIM2, a new network-based method that allows researchers to measure GO-based gene functional similarities by considering the global structure of the co-functional network with a random walk with restart (RWR)-based method, and by selecting the significant term pairs to reduce noise. Based on Enzyme Commission (EC) number-based groups of yeast and Arabidopsis, evaluation tests show that NETSIM2 can enhance the accuracy of Gene Ontology-based gene functional similarity. CONCLUSIONS Using NETSIM2 as an example, we found that the accuracy of semantic similarities can be significantly improved by effectively incorporating the global gene-to-gene interactions in the co-functional network, especially for species whose gene annotations in GO are far from complete.
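The random walk with restart (RWR) idea named in this abstract can be illustrated with a minimal sketch. This is not the NETSIM2 implementation: the toy network, the gene names, and the restart probability of 0.5 are assumptions made purely for illustration. The walker starts at a seed gene; at each step it either jumps back to the seed or moves to a neighbour in proportion to edge weight, and the steady-state probabilities rank all genes by network proximity to the seed.

```python
def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walk that restarts at `seed`.

    `adj` maps every node to a dict {neighbour: edge weight}; every node
    must appear as a key. `restart` is an assumed, illustrative value.
    """
    nodes = sorted(adj)
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    prob = {u: 1.0 if u == seed else 0.0 for u in nodes}
    for _ in range(max_iter):
        # Restart mass goes to the seed; the remainder flows along edges.
        nxt = {u: (restart if u == seed else 0.0) for u in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Isolated node: send its mass back to the seed.
                nxt[seed] += (1.0 - restart) * prob[u]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * prob[u] * w / out_weight[u]
        delta = sum(abs(nxt[u] - prob[u]) for u in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


# Hypothetical co-functional network: a chain g1 - g2 - g3 - g4.
toy_network = {
    "g1": {"g2": 1.0},
    "g2": {"g1": 1.0, "g3": 1.0},
    "g3": {"g2": 1.0, "g4": 1.0},
    "g4": {"g3": 1.0},
}
scores = random_walk_with_restart(toy_network, seed="g1")
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

Because the scores depend on every path from the seed rather than on direct neighbours only, the walk captures the global network structure that the abstract contrasts with local interactions.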
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, China; Centre for Multidisciplinary Convergence Computing (CMCC), School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| | - Xuanshuo Zhang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Weiwei Hui
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Junya Lu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Qianqian Li
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Shuhui Liu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, China
|
17
|
Peng J, Wang H, Lu J, Hui W, Wang Y, Shang X. Identifying term relations cross different gene ontology categories. BMC Bioinformatics 2017; 18:573. [PMID: 29297309 PMCID: PMC5751813 DOI: 10.1186/s12859-017-1959-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Indexed: 11/24/2022]
Abstract
Background The Gene Ontology (GO) is a community-based bioinformatics resource that employs ontologies to represent biological knowledge and describes information about gene and gene product function. GO includes three independent categories: molecular function, biological process and cellular component. For better biological reasoning, identifying the biological relationships between terms in different categories is important. However, the existing measures of similarity between terms in different categories either use the GO data only or use only part of the combined gene co-function network information. Results We propose an iterative ranking-based method called CroGO2 to measure cross-category GO term similarities by incorporating level information of GO terms with both direct and indirect interactions in the gene co-function network. Conclusions The evaluation test shows that CroGO2 performs better than the existing methods. A genome-specific term association network for yeast is also generated by connecting terms with high confidence scores. The linkages in the term association network are supported by the literature. Given a gene set, the related terms identified using the association network overlap with the related terms identified by GO enrichment analysis.
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Honggang Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Junya Lu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Weiwei Hui
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
|
18
|
Hu Y, Zhou M, Shi H, Ju H, Jiang Q, Cheng L. Measuring disease similarity and predicting disease-related ncRNAs by a novel method. BMC Med Genomics 2017; 10:71. [PMID: 29297338 PMCID: PMC5751624 DOI: 10.1186/s12920-017-0315-9] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 11/27/2022]
Abstract
Background Similar diseases are always caused by similar molecular origins, such as disease-related protein-coding genes (PCGs), and the molecular associations reflect their similarity. Therefore, current methods for calculating disease similarity often use functional interactions of PCGs. However, the existing methods have neglected the fact that genes can also be associated in the gene functional network (GFN) through intermediate nodes. Methods Here we present a novel method, InfDisSim, to deduce the similarity of diseases. InfDisSim uses the whole network, based on random walk with damping, to model the information flow. A benchmark set of similar disease pairs was employed to evaluate the performance of InfDisSim. Results The area under the receiver operating characteristic curve (AUC) was calculated to assess the performance. InfDisSim reaches an AUC of 0.9786, which indicates very good performance. Furthermore, after calculating the disease similarity with InfDisSim, we reconfirmed that similar diseases tend to have common therapeutic drugs (Pearson correlation γ2 = 0.1315, p = 2.2e-16). Finally, the disease similarity computed by InfDisSim was employed to construct an lncRNA similarity network (LSN) and a miRNA similarity network (MSN), which were further exploited to predict potential associations of lncRNA-disease pairs and miRNA-disease pairs, respectively. The high AUCs (0.9893 and 0.9007) based on leave-one-out cross validation show that the LSN and MSN are well suited to predicting novel disease-related lncRNAs and miRNAs, respectively. Conclusions The high AUC based on benchmark data indicates that the method performs well. The method is valuable in the prediction of disease-related lncRNAs and miRNAs.
Affiliation(s)
- Yang Hu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
| | - Meng Zhou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, China
| | - Hongbo Shi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, China
| | - Hong Ju
- Department of information engineering, Heilongjiang biological science and technology Career Academy, Harbin, 150001, China
| | - Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China.
| | - Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, China.
|
19
|
Yang F, Wu D, Lin L, Yang J, Yang T, Zhao J. The integration of weighted gene association networks based on information entropy. PLoS One 2017; 12:e0190029. [PMID: 29272314 PMCID: PMC5741255 DOI: 10.1371/journal.pone.0190029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/27/2017] [Accepted: 12/06/2017] [Indexed: 01/18/2023]
Abstract
Constructing genome-scale weighted gene association networks (WGAN) from multiple data sources is one of the research hot spots in systems biology. In this paper, we employ information entropy to describe the degree of uncertainty of gene-gene links and propose a strategy for data integration of weighted networks. We use this method to integrate four existing human weighted gene association networks and construct a much larger WGAN, which includes richer biological information while still keeping high functional relevance between linked gene pairs. The new WGAN shows satisfactory performance in disease gene prediction, which suggests the reliability of our integration strategy. Compared with existing integration methods, our method takes advantage of the inherent characteristics of the component networks and pays less attention to the biological background of the data. It can make full use of existing biological networks with low computational effort.
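To make the notion of link uncertainty concrete, the sketch below computes the Shannon entropy of a single gene-gene link. It rests on an assumption that the abstract does not state: that a link weight can be read as the probability p that the link is real, so that the link is a two-outcome event. The function name is hypothetical, and the paper's actual integration strategy is not reproduced here.

```python
import math


def link_entropy(p):
    """Binary Shannon entropy (in bits) of a link with confidence `p`.

    Assumes `p` is a probability in [0, 1]; this reading of a link
    weight is an illustrative assumption, not taken from the paper.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("link confidence must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


# A weight of 0.5 is maximally uncertain; weights near 0 or 1 are not.
for weight in (0.5, 0.9, 0.99):
    print(weight, round(link_entropy(weight), 4))
```

Under this reading, entropy is largest for links whose weight says least about whether the association exists, which is the kind of quantity an integration scheme could use to favour the more decisive source network.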
Affiliation(s)
- Fan Yang
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Duzhi Wu
- Rongzhi College of Chongqing Technology and Business, Chongqing, China
| | - Limei Lin
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Jian Yang
- School of Pharmacy, Second Military Medical University, Shanghai, China
| | - Tinghong Yang
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Jing Zhao
- Institute of Interdisciplinary Complex Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
|
20
|
Hu Y, Zhao L, Liu Z, Ju H, Shi H, Xu P, Wang Y, Cheng L. DisSetSim: an online system for calculating similarity between disease sets. J Biomed Semantics 2017; 8:28. [PMID: 29297411 PMCID: PMC5763469 DOI: 10.1186/s13326-017-0140-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 12/21/2022]
Abstract
Background Functional similarity between molecules results in similar phenotypes, such as diseases. Therefore, examining the diseases that molecules induce is an effective way to reveal their function. However, the lack of a tool for obtaining the similarity score of pair-wise disease sets (SSDS) limits this type of application. Results Here, we introduce DisSetSim, an online system that solves this problem. Five state-of-the-art methods (Resnik’s, Lin’s, Wang’s, PSB, and SemFunSim) were first implemented to measure the similarity score of pair-wise diseases (SSD). The “pair-wise-best pairs-average” (PWBPA) method was then implemented to calculate the SSDS from the SSD. The system was applied to calculate the functional similarity of miRNAs based on their induced disease sets. The results were further used to predict potential disease-miRNA relationships. Conclusions The area under the receiver operating characteristic curve (AUC) of 0.9296 based on leave-one-out cross validation shows that the PWBPA method achieves a high true positive rate and a low false positive rate. The system can be accessed from http://www.bio-annotation.cn:8080/DisSetSim/.
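The step from pair-wise disease scores (SSD) to a set-level score (SSDS) can be sketched as a best-match average. The abstract does not give the exact PWBPA formula, so the version below is one common reading, stated as an assumption: each disease is paired with its best-scoring partner in the other set, and all of those best scores are averaged. The function names and the toy score table are hypothetical.

```python
def best_match_average(set_a, set_b, sim):
    """Average of each item's best pairwise score against the other set.

    `sim(x, y)` is any symmetric pairwise similarity in [0, 1]. This is
    an assumed reading of a best-pairs average, not DisSetSim's code.
    """
    a_items, b_items = list(set_a), list(set_b)
    if not a_items or not b_items:
        return 0.0
    best_for_a = [max(sim(a, b) for b in b_items) for a in a_items]
    best_for_b = [max(sim(a, b) for a in a_items) for b in b_items]
    return (sum(best_for_a) + sum(best_for_b)) / (len(a_items) + len(b_items))


# Hypothetical pairwise disease scores, made up for illustration only.
TOY_SSD = {
    frozenset(("asthma", "copd")): 0.6,
    frozenset(("asthma", "glioma")): 0.1,
    frozenset(("copd", "glioma")): 0.1,
}


def toy_sim(x, y):
    """Look up a toy score; identical diseases score 1.0."""
    if x == y:
        return 1.0
    return TOY_SSD.get(frozenset((x, y)), 0.0)


score = best_match_average({"asthma"}, {"copd", "glioma"}, toy_sim)
print(round(score, 4))
```

Any of the pairwise measures listed in the abstract could be supplied as `sim`; the set-level step only needs a function that scores two diseases.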
Affiliation(s)
- Yang Hu
- Harbin Institute of Technology, School of Life Science and Technology, Harbin, 150001, People's Republic of China
| | - Lingling Zhao
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
| | - Zhiyan Liu
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
| | - Hong Ju
- Department of information engineering, Heilongjiang Biological Science and Technology Career Academy, Harbin, 150001, People's Republic of China
| | - Hongbo Shi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, People's Republic of China
| | - Peigang Xu
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
| | - Yadong Wang
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China.
| | - Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, People's Republic of China.
|
21
|
Zhang C, Li X, Li S, Feng Z. Dynamically analyzing cell interactions in biological environments using multiagent social learning framework. J Biomed Semantics 2017; 8:31. [PMID: 29297360 PMCID: PMC5763467 DOI: 10.1186/s13326-017-0142-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/30/2022]
Abstract
BACKGROUND The biological environment is uncertain and its dynamics are similar to those of a multiagent environment, so research results from the multiagent systems area can provide valuable insights into biology. Learning in a multiagent environment is highly dynamic, since the environment is no longer stationary and each agent's behavior changes adaptively in response to other coexisting learners, and vice versa. The dynamics become more unpredictable when we move from fixed-agent interaction environments to a multiagent social learning framework. Analytical understanding of the underlying dynamics is important and challenging. RESULTS In this work, we present a social learning framework with homogeneous learners (e.g., Policy Hill Climbing (PHC) learners), and model the behavior of players in the social learning framework as a hybrid dynamical system. By analyzing the dynamical system, we obtain some conditions for convergence or non-convergence. We experimentally verify the predictive power of our model using a number of representative games. Experimental results confirm the theoretical analysis. CONCLUSION Under the multiagent social learning framework, we modeled the behavior of agents in a biological environment and theoretically analyzed the dynamics of the model. We present some sufficient conditions for convergence or non-convergence and prove them theoretically. The model can be used to predict the convergence of the system.
Affiliation(s)
- Chengwei Zhang
- School of Computer Science and Technology, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
| | - Xiaohong Li
- School of Computer Science and Technology, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
| | - Shuxin Li
- School of Computer Science and Technology, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
| | - Zhiyong Feng
- School of Computer Software, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350 China
|
22
|
Abstract
Background More than 1/3 of human genes are regulated by microRNAs. The identification of microRNAs (miRNAs) is a precondition for discovering the regulatory mechanism of miRNAs and developing cures for genetic diseases. The traditional identification method is biological experiment, but it suffers from long turnaround, high cost, and missing the miRNAs that only exist in a specific period or at a low expression level. Therefore, to overcome these defects, machine learning methods are applied to identify miRNAs. Results In this study, to distinguish real from pseudo miRNAs and to classify different species, we extracted 98-dimensional features based on the primary and secondary structure. We then proposed the BP-Adaboost method, which addresses the overfitting of the BP neural network by constructing multiple BP neural network classifiers and assigning weights to these classifiers. On the 4 evaluation terms, the proposed method achieved a great improvement in identifying true pre-RNA compared with other methods. In identifying the species of pre-RNA, the proposed method was also more accurate than other algorithms. Conclusions The BP-Adaboost method achieved more than 98% accuracy in identifying real and pseudo miRNAs, which is much higher than not only BP but also many other algorithms. In the second experiment, restricted by the data, the algorithm could not reach high accuracy in identifying 7 species, but it still performed better than other algorithms.
|
23
|
Peng J, Li Q, Shang X. Investigations on factors influencing HPO-based semantic similarity calculation. J Biomed Semantics 2017; 8:34. [PMID: 29297376 PMCID: PMC5763495 DOI: 10.1186/s13326-017-0144-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
Abstract
Background Although disease diagnosis has greatly benefited from next generation sequencing technologies, it is still difficult to make the right diagnosis purely based on sequencing technologies for many diseases with complex phenotypes and high genetic heterogeneity. Recently, calculating Human Phenotype Ontology (HPO)-based phenotype semantic similarity has contributed substantially to disease diagnosis. However, the factors that affect the accuracy of HPO-based semantic similarity have not been evaluated systematically. Results In this study, we proposed a new framework called HPOFactor to evaluate these factors. Our model includes four components: (1) the size of the annotation set, (2) the evidence code of annotations, (3) the quality of annotations and (4) the coverage of annotations. Conclusions HPOFactor analyzes the four factors systematically based on two kinds of experiments: causative gene prediction and disease prediction. Furthermore, semantic similarity measurements could be designed based on the characteristics of these factors.
Affiliation(s)
- Jiajie Peng
- Northwestern Polytechnical University, 127 West Youyi Road, Xi'an, 710072, China
| | - Qianqian Li
- Northwestern Polytechnical University, 127 West Youyi Road, Xi'an, 710072, China
| | - Xuequn Shang
- Northwestern Polytechnical University, 127 West Youyi Road, Xi'an, 710072, China.
|
24
|
Dongliang X, Jingchang P, Bailing W. Multiple kernels learning-based biological entity relationship extraction method. J Biomed Semantics 2017; 8:38. [PMID: 29297359 PMCID: PMC5763518 DOI: 10.1186/s13326-017-0138-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
Background Automatically extracting protein entity interaction information from biomedical literature can help to build protein relation networks and design new drugs. MEDLINE, the most authoritative textual database in the field of biomedicine, includes more than 20 million literature abstracts, and this number grows exponentially over time. This rapid expansion makes the biomedical literature difficult to absorb or analyze manually. Efficient, automated search engines based on text mining techniques are therefore necessary to explore it. Results The P, R, and F values of the tag graph method on the AIMed corpus are 50.82, 69.76, and 58.61%, respectively. The P, R, and F values of the tag graph kernel method on the other four evaluation corpora are 2–5% higher than those of the all-paths graph kernel. The P, R, and F values of the feature kernel and tag graph kernel fusion method are 53.43, 71.62, and 61.30%, respectively. The P, R, and F values of the feature kernel and tag graph kernel fusion method are 55.47, 70.29, and 60.37%, respectively. This indicates that the two kinds of kernel fusion methods perform better than a single kernel. Conclusion In comparison with the all-paths graph kernel method, the tag graph kernel method is superior in terms of overall performance. Experiments show that, on the five corpora, the multi-kernel method performs better than the three separate single-kernel methods and the dual fused-kernel methods used here.
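The P, R, and F values quoted in this abstract are presumably precision, recall, and F-score. A minimal sketch of how these are computed from counts of extracted relations follows; it assumes that F is the balanced F1 score (the harmonic mean of P and R), which the abstract does not state, and the counts in the usage lines are invented for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and balanced F1 from raw counts.

    tp: relations extracted and correct
    fp: relations extracted but wrong
    fn: true relations that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented counts: 6 correct extractions, 2 spurious, 6 missed.
p, r, f = precision_recall_f1(tp=6, fp=2, fn=6)
print(f"P={p:.2%} R={r:.2%} F={f:.2%}")
```

Scores reported as averages over cross-validation folds need not equal the harmonic mean of the averaged P and R, so published triples may differ slightly from what this formula gives when applied to the averaged values.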
Affiliation(s)
- Xu Dongliang
- School of Mechanical, Electrical and Information Engineering, ShanDong University, WenHua West Road, WeiHai, 264209, China
| | - Pan Jingchang
- School of Mechanical, Electrical and Information Engineering, ShanDong University, WenHua West Road, WeiHai, 264209, China.
| | - Wang Bailing
- School of Computer Science and Technology, Harbin Institute of Technology, WenHua West Road, WeiHai, 264209, China
|
25
|
Roy S, Yun D, Madahian B, Berry MW, Deng LY, Goldowitz D, Homayouni R. Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts. Front Bioeng Biotechnol 2017; 5:48. [PMID: 28894735 PMCID: PMC5581332 DOI: 10.3389/fbioe.2017.00048] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/18/2017] [Accepted: 07/31/2017] [Indexed: 01/09/2023]
Abstract
In this study, we developed and evaluated a novel text-mining approach, using non-negative tensor factorization (NTF), to simultaneously extract and functionally annotate transcriptional modules consisting of sets of genes, transcription factors (TFs), and terms from MEDLINE abstracts. A sparse 3-mode term × gene × TF tensor was constructed that contained weighted frequencies of 106,895 terms in 26,781 abstracts shared among 7,695 genes and 994 TFs. The tensor was decomposed into sub-tensors using non-negative tensor factorization (NTF) across 16 different approximation ranks. Dominant entries of each of 2,861 sub-tensors were extracted to form term–gene–TF annotated transcriptional modules (ATMs). More than 94% of the ATMs were found to be enriched in at least one KEGG pathway or GO category, suggesting that the ATMs are functionally relevant. One advantage of this method is that it can discover potentially new gene–TF associations from the literature. Using a set of microarray and ChIP-Seq datasets as gold standard, we show that the precision of our method for predicting gene–TF associations is significantly higher than chance. In addition, we demonstrate that the terms in each ATM can be used to suggest new GO classifications to genes and TFs. Taken together, our results indicate that NTF is useful for simultaneous extraction and functional annotation of transcriptional regulatory networks from unstructured text, as well as for literature based discovery. A web tool called Transcriptional Regulatory Modules Extracted from Literature (TREMEL), available at http://binf1.memphis.edu/tremel, was built to enable browsing and searching of ATMs.
Affiliation(s)
- Sujoy Roy: Bioinformatics Program, University of Memphis, Memphis, TN, United States; Center for Translational Informatics, University of Memphis, Memphis, TN, United States
- Daqing Yun: Computer and Information Sciences Program, Harrisburg University of Science and Technology, Harrisburg, PA, United States
- Behrouz Madahian: Department of Mathematical Sciences, University of Memphis, Memphis, TN, United States
- Michael W Berry: Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, United States
- Lih-Yuan Deng: Department of Mathematical Sciences, University of Memphis, Memphis, TN, United States
- Daniel Goldowitz: Center for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, BC, Canada
- Ramin Homayouni: Bioinformatics Program, University of Memphis, Memphis, TN, United States; Center for Translational Informatics, University of Memphis, Memphis, TN, United States; Department of Biological Sciences, University of Memphis, Memphis, TN, United States
|
26
|
Peng J, Lu J, Shang X, Chen J. Identifying consistent disease subnetworks using DNet. Methods 2017; 131:104-110. [PMID: 28807723 DOI: 10.1016/j.ymeth.2017.07.024] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/01/2017] [Revised: 07/25/2017] [Accepted: 07/26/2017] [Indexed: 12/12/2022]
Abstract
Identifying disease-specific subnetworks from the large body of available genome-wide gene expression data is critical for elucidating how genes perform high-level biological functions together. Various algorithms have been developed for disease gene identification. However, the topological structure of disease networks (or even of fractions of those networks) has been left largely unexplored. In this article, we present DNet, a method for identifying significant disease subnetworks by integrating both network structure and gene expression information. Our work will lead to the identification of missing key disease genes, which are highly expressed in a disease-specific gene expression dataset. The experimental evaluation of our method on both the Leukemia and the Duchenne Muscular Dystrophy gene expression datasets shows that DNet performs better than existing state-of-the-art methods. In addition, a case study found literature support for the discovered disease subnetworks.
Affiliation(s)
- Jiajie Peng: School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Junya Lu: School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Xuequn Shang: School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Jin Chen: Institute for Biomedical Informatics, University of Kentucky, Lexington, USA; Department of Internal Medicine, University of Kentucky, Lexington, USA; Department of Computer Science, University of Kentucky, Lexington, USA
|
27
|
Zhou Y, Zhan C, Huang Y, Liu H. Comprehensive bioinformatics analyses of Crohn's disease. Mol Med Rep 2017; 15:2267-2272. [PMID: 28260036 DOI: 10.3892/mmr.2017.6250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2016] [Accepted: 01/17/2017] [Indexed: 11/06/2022]
Abstract
Crohn's disease (CD) is a chronic, relapsing inflammatory disease with increasing incidence and prevalence worldwide. In previous years, the accumulation of microarray data has provided an approach to obtain further insight into CD. In the present study, the microarray data of CD were comprehensively analyzed using multiple bioinformatics methods, and the pathobiological process of the disease was examined. Gene expression data from colon tissues of patients with CD were obtained from the Gene Expression Omnibus database, after which differentially expressed genes were identified between CD and control sample groups. Subsequently, Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses were performed to determine the functions and pathways in which the differentially expressed genes were enriched. The TargetScan and miRDB databases were then used to predict which microRNAs (miRNAs) regulated the differentially expressed genes. As a result, a total of 432 differentially expressed genes (229 upregulated and 203 downregulated), among them matrix metallopeptidase 3 and glutathione S‑transferase α1, were identified in CD samples. These differentially expressed genes were significantly involved in regulation of the inflammatory response, innate immune response, cell migration, extracellular matrix organization, the Janus kinase/signal transducers and activators of transcription signaling pathway, and cytokine‑cytokine receptor interaction. The miRNA-gene network showed that miR‑149‑3p and miR‑4447 regulated the most differentially expressed genes. These findings extend current understanding of the mechanisms underlying CD, and the differentially expressed genes and regulator miRNAs identified may be used as potential biomarkers and therapeutic targets for CD.
Affiliation(s)
- Yi Zhou: Department of Gastroenterology, Zhongshan Hospital, Fudan University, Shanghai 200032, P.R. China
- Cheng Zhan: Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, Shanghai 200032, P.R. China
- Yingyu Huang: Department of Gastroenterology, Zhongshan Hospital, Fudan University, Shanghai 200032, P.R. China
- Hongchun Liu: Department of Gastroenterology, Zhongshan Hospital, Fudan University, Shanghai 200032, P.R. China
|
28
|
Abstract
Background: Identifying the genes associated with human diseases is crucial for disease diagnosis and drug design. Computational approaches, especially network-based approaches, have recently been developed to identify disease-related genes effectively from existing biomedical networks. Meanwhile, advances in biotechnology enable researchers to produce multi-omics data, enriching our understanding of human diseases and revealing the complex relationships between genes and diseases. However, none of the existing computational approaches is able to integrate the huge amount of omics data into a weighted integrated network and utilize it to enhance disease-related gene discovery. Results: We propose a new network-based disease gene prediction method called SLN-SRW (Simplified Laplacian Normalization-Supervised Random Walk) to generate and model the edge weights of a new biomedical network that integrates biomedical data from heterogeneous sources, thereby enhancing disease-related gene discovery. Conclusions: The experimental results show that SLN-SRW significantly improves the performance of disease gene prediction on both real and synthetic data sets. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3263-4) contains supplementary material, which is available to authorized users.
|
29
|
Tian Z, Wang C, Guo M, Liu X, Teng Z. An improved method for functional similarity analysis of genes based on Gene Ontology. BMC SYSTEMS BIOLOGY 2016; 10:119. [PMID: 28155727 PMCID: PMC5259995 DOI: 10.1186/s12918-016-0359-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 12/11/2022]
Abstract
Background: Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interaction, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of GO terms. However, these leading approaches may make error-prone judgments, especially when they measure the specificity of GO terms and the information content (IC) of a term set. Therefore, estimating gene functional similarity reliably is still a challenging problem. Results: We propose WIS, an effective method to measure gene functional similarity. First, WIS computes the IC of a term from its depth, the number of its ancestors, and the topology of its descendants in the GO graph. Second, WIS calculates the IC of a term set by considering the weighted inherited semantics of terms. Finally, WIS estimates gene functional similarity based on the IC overlap ratio of term sets. WIS is superior to some other representative measures in experiments on functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction, and correlation with gene expression. Further analysis suggests that WIS fully takes into account the specificity of terms and the weighted inherited semantics between GO terms. Conclusions: The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at http://nclab.hit.edu.cn/WIS/. Electronic supplementary material: The online version of this article (doi:10.1186/s12918-016-0359-z) contains supplementary material, which is available to authorized users.
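The last step named in this abstract, an IC overlap ratio between two term sets, can be illustrated with a minimal sketch. This is not the WIS formula, which the abstract does not give: the IC values are assumed to be precomputed and supplied as a dictionary, the weighted inherited semantics are left out, and the function name `ic_overlap_ratio` and the term labels are hypothetical.

```python
def ic_overlap_ratio(terms_a, terms_b, ic):
    """Share of total IC that is carried by terms common to both sets.

    terms_a, terms_b: sets of term identifiers annotated to two genes.
    ic: dict mapping each term identifier to a non-negative IC value
        (assumed to be computed elsewhere).
    """
    shared = terms_a & terms_b
    union = terms_a | terms_b
    total = sum(ic[t] for t in union)
    if total == 0:
        # No informative terms at all: define the similarity as 0.
        return 0.0
    return sum(ic[t] for t in shared) / total


# Placeholder IC values and annotations, chosen only for illustration.
ic = {"term_a": 1.0, "term_b": 2.0, "term_c": 3.0}
gene1 = {"term_a", "term_b"}
gene2 = {"term_b", "term_c"}

similarity = ic_overlap_ratio(gene1, gene2, ic)  # shared IC 2.0 over total IC 6.0
```

A ratio of this kind is 1 for identical annotation sets and 0 for disjoint ones, and it lets a shared specific term count for more than a shared generic one.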
Affiliation(s)
- Zhen Tian: Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Chunyu Wang: Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Maozu Guo: Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Xiaoyan Liu: Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Zhixia Teng: Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China; Department of Information Management and Information System, Northeast Forestry University, Harbin, 150001, People's Republic of China
|
30
|
Integrating Information in Biological Ontologies and Molecular Networks to Infer Novel Terms. Sci Rep 2016; 6:39237. [PMID: 27976738 PMCID: PMC5157009 DOI: 10.1038/srep39237] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/08/2016] [Accepted: 11/21/2016] [Indexed: 12/24/2022]
Abstract
Currently, most terms and term-term relationships in Gene Ontology (GO) are defined manually, which creates cost, consistency and completeness issues. Recent studies have demonstrated the feasibility of inferring GO automatically from biological networks, which represents an important complementary approach to GO construction. These methods (NeXO and CliXO) are unsupervised, which means 1) they cannot use the information contained in the existing GO, 2) the way they integrate biological networks may not optimize accuracy, and 3) they are not customized to infer the three different sub-ontologies of GO. Here we present a semi-supervised method called Unicorn that extends these previous methods to tackle the three problems. Unicorn uses a sub-tree of an existing GO sub-ontology as training data to learn the parameters for integrating multiple networks. Cross-validation results show that Unicorn reliably inferred the left-out parts of each specific GO sub-ontology. In addition, when trained with an old version of GO together with biological networks, Unicorn successfully re-discovered some terms and term-term relationships present only in a new version of GO. Unicorn also successfully inferred some novel terms that were not contained in GO but have biological meanings well supported by the literature. Availability: The source code of Unicorn is available at http://yiplab.cse.cuhk.edu.hk/unicorn/.
|
31
|
OAHG: an integrated resource for annotating human genes with multi-level ontologies. Sci Rep 2016; 6:34820. [PMID: 27703231 PMCID: PMC5050487 DOI: 10.1038/srep34820] [Citation(s) in RCA: 70] [Impact Index Per Article: 8.8] [Received: 08/19/2016] [Accepted: 09/20/2016] [Indexed: 01/04/2023]
Abstract
OAHG is an integrated resource that aims to provide comprehensive functional annotation for human protein-coding genes (PCGs), miRNAs, and lncRNAs using multi-level ontologies: Gene Ontology (GO), Disease Ontology (DO), and Human Phenotype Ontology (HPO). Many previous studies have focused on inferring putative properties and biological functions of PCGs and non-coding RNA genes from different perspectives. During the past several decades, a few databases have been designed to annotate the functions of PCGs, miRNAs, and lncRNAs separately. Some of the functional descriptions in these databases were mapped to standardized terminologies, such as GO, which can help further analysis. Despite these developments, there is no comprehensive resource recording the function of these three important types of genes. The current version of OAHG, release 1.0 (Jun 2016), integrates three ontologies (GO, DO, and HPO), six gene functional databases and two interaction databases. Currently, OAHG contains 1,434,694 entries involving 16,929 PCGs, 637 miRNAs, 193 lncRNAs, and 24,894 ontology terms. In the performance evaluation, OAHG shows consistency with existing gene interactions and with the structure of the ontologies. For example, terms with more similar structure could be associated with more associated genes (Pearson correlation γ2 = 0.2428, p < 2.2e-16).
|
32
|
Peng J, Li H, Liu Y, Juan L, Jiang Q, Wang Y, Chen J. InteGO2: a web tool for measuring and visualizing gene semantic similarities using Gene Ontology. BMC Genomics 2016; 17 Suppl 5:530. [PMID: 27586009 PMCID: PMC5009821 DOI: 10.1186/s12864-016-2828-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The Gene Ontology (GO) has been used in high-throughput omics research as a major bioinformatics resource. The hierarchical structure of GO provides users with a convenient platform for biological information abstraction and hypothesis testing. Computational methods have been developed to identify functionally similar genes. However, none of the existing measurements takes into account all the rich information in GO. Web-based applications built on these existing methods compute gene functional similarities but provide purely text-based outputs. Without a graphical visualization interface, the results are difficult to interpret. RESULTS: We present InteGO2, a web tool that allows researchers to calculate GO-based gene semantic similarities using seven widely used GO-based similarity measurements. We also provide an integrative measurement that synergistically integrates all the individual measurements to improve the overall performance. Using HTML5 and cytoscape.js, we provide a graphical interface in InteGO2 to visualize the resulting gene functional association networks. CONCLUSIONS: InteGO2 is an easy-to-use HTML5-based web tool. With it, researchers can conveniently measure gene or gene product functional similarity and visualize the network of functional interactions in a graphical interface. InteGO2 can be accessed via http://mlg.hit.edu.cn:8089/.
Affiliation(s)
- Jiajie Peng: School of Computer Science, Northwestern Polytechnical University, Xi'an, China; Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, 48824, MI, USA
- Hongxiang Li: School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yongzhuang Liu: School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Liran Juan: School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Qinghua Jiang: School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang: School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jin Chen: Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, 48824, MI, USA; Department of Computer Science and Engineering, Michigan State University, East Lansing, 48824, MI, USA
|
33
|
Peng J, Wang T, Hu J, Wang Y, Chen J. Constructing Networks of Organelle Functional Modules in Arabidopsis. Curr Genomics 2016; 17:427-438. [PMID: 28479871 PMCID: PMC5320545 DOI: 10.2174/1389202917666160726151048] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 03/01/2015] [Revised: 05/30/2015] [Accepted: 06/05/2015] [Indexed: 11/22/2022]
Abstract
With the rapid accumulation of gene expression data, gene functional module identification has become a widely used approach in functional analysis. However, tools to identify organelle functional modules and analyze their relationships are still missing. We present a soft thresholding approach to construct networks of functional modules using gene expression datasets, in which nodes are strongly co-expressed genes that encode proteins residing in the same subcellular localization, and links represent strong inter-module connections. Our algorithm has three steps. First, we identify functional modules by analyzing gene expression data. Next, we use a self-adaptive approach to construct a mixed network of functional modules and genes. Finally, we link functional modules that are tightly connected in the mixed network. Analysis of experimental data from Arabidopsis demonstrates that our approach is effective in improving the interpretability of high-throughput transcriptomic data and inferring function of unknown genes.
Affiliation(s)
- Jiajie Peng: School of Computer Science, Northwestern Polytechnical University, Xi'an, P.R. China; Department of Energy Plant Research Lab, Michigan State University, East Lansing, USA
- Tao Wang: School of Computer Science, Northwestern Polytechnical University, Xi'an, P.R. China
- Jianping Hu: Department of Energy Plant Research Lab, Michigan State University, East Lansing, USA; Department of Plant Biology, Michigan State University, East Lansing, USA
- Yadong Wang: School of Computer Science, Northwestern Polytechnical University, Xi'an, P.R. China
- Jin Chen: Department of Energy Plant Research Lab, Michigan State University, East Lansing, USA; Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
|
34
|
Identifying Liver Cancer-Related Enhancer SNPs by Integrating GWAS and Histone Modification ChIP-seq Data. BIOMED RESEARCH INTERNATIONAL 2016; 2016:2395341. [PMID: 27429976 PMCID: PMC4939323 DOI: 10.1155/2016/2395341] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/09/2016] [Revised: 05/30/2016] [Accepted: 06/01/2016] [Indexed: 11/18/2022]
Abstract
Many disease-related single nucleotide polymorphisms (SNPs) have been inferred from genome-wide association studies (GWAS) in recent years. Numerous studies have shown that some SNPs located in protein-coding regions are associated with numerous diseases by affecting gene expression. However, in noncoding regions, the mechanism of how SNPs contribute to disease susceptibility remains unclear. Enhancer elements are functional segments of DNA located in noncoding regions that play an important role in regulating gene expression. The SNPs located in enhancer elements may affect gene expression and lead to disease. We presented a method for identifying liver cancer-related enhancer SNPs through integrating GWAS and histone modification ChIP-seq data. We identified 22 liver cancer-related enhancer SNPs, 9 of which were regulatory SNPs involved in distal transcriptional regulation. The results highlight that these enhancer SNPs may play important roles in liver cancer.
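One plausible building block of such an integration is a coordinate overlap between GWAS SNP positions and enhancer regions called from histone modification ChIP-seq peaks. The sketch below shows only that overlap step under stated assumptions: the abstract does not describe the authors' pipeline, the function name `snps_in_enhancers` and all identifiers and coordinates are hypothetical, peaks are treated as half-open intervals, and peaks on the same chromosome are assumed not to overlap each other.

```python
from bisect import bisect_right


def snps_in_enhancers(snps, peaks):
    """Return IDs of SNPs whose position falls inside any enhancer peak.

    snps: list of (snp_id, chrom, pos) tuples.
    peaks: list of (chrom, start, end) tuples, half-open [start, end),
           assumed non-overlapping within a chromosome.
    """
    # Group peaks by chromosome and sort them by start coordinate.
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    starts = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts[chrom] = [s for s, _ in intervals]

    hits = []
    for snp_id, chrom, pos in snps:
        if chrom not in by_chrom:
            continue
        # Index of the last peak that starts at or before the SNP position.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < by_chrom[chrom][i][1]:
            hits.append(snp_id)
    return hits


# Placeholder SNPs and peaks, chosen only for illustration.
snps = [("snp1", "chr1", 150), ("snp2", "chr1", 500), ("snp3", "chr2", 20)]
peaks = [("chr1", 100, 200), ("chr2", 10, 30), ("chr1", 300, 400)]

enhancer_snps = snps_in_enhancers(snps, peaks)  # snp1 and snp3 fall inside peaks
```

Sorting the peak starts once per chromosome keeps each lookup to a binary search, which matters when a genome-wide SNP list is checked against many peaks.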
|