1
|
Mansouri V, Bandarian F, Razi F, Razzaghi Z, Rezaei-Tavirani M, Rezaei M, Arjmand B, Rezaei-Tavirani M. NF-kappa B signaling pathway is associated with metformin resistance in type 2 diabetes patients. J Diabetes Metab Disord 2024; 23:2021-2030. [PMID: 39610517 PMCID: PMC11599502 DOI: 10.1007/s40200-024-01458-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2024] [Accepted: 06/18/2024] [Indexed: 11/30/2024]
Abstract
Introduction Metformin is an essential medicine and the most widely prescribed first-line treatment for type 2 diabetes (T2D). Metformin improves glycemic control in T2D patients without causing hypoglycemia. This assessment aims to understand the molecular mechanisms in patients who do not respond to metformin. Methods Gene expression profiles of T2D patients who responded and did not respond to metformin were extracted from Gene Expression Omnibus (GEO) and evaluated with the GEO2R program to find the significant differentially expressed genes (DEGs). The significant DEGs were studied via action map and gene ontology analyses. Results Results indicate that 563 significant DEGs discriminate the non-responder group from the responder group. The "NF-kappa B signaling pathway" and 11 DEGs, including BIRC3, CCL4L2, CXCL2, ICAM1, LYN, MYD88, RELA, SYK, TLR4, TNFAIP3, and TRIM25, were identified as the core of drug resistance. Conclusion Gene expression differs between diabetic patients who respond to metformin and those who do not. Results indicate that dysregulation of the "NF-kappa B signaling pathway" and of TNFAIP3, BIRC3, RELA, MYD88, TLR4, and ICAM1 is associated with drug resistance in T2D patients.
Affiliation(s)
- Vahid Mansouri
- Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Fatemeh Bandarian
- Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Farideh Razi
- Diabetes Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Zahra Razzaghi
- Laser Application in Medical Sciences Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mitra Rezaei
- Genomic Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Babak Arjmand
- Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Iranian Cancer Control Center (MACSA), Tehran, Iran
- Mostafa Rezaei-Tavirani
- Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
2
|
Recent advances in predicting lncRNA-disease associations based on computational methods. Drug Discov Today 2023; 28:103432. [PMID: 36370992 DOI: 10.1016/j.drudis.2022.103432] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/30/2022] [Revised: 10/19/2022] [Accepted: 11/03/2022] [Indexed: 11/11/2022]
Abstract
Mutations in and dysregulation of long non-coding RNAs (lncRNAs) are closely associated with the development of various human complex diseases, but only a few lncRNAs have been experimentally confirmed to be associated with human diseases. Predicting new potential lncRNA-disease associations (LDAs) will help us to understand the pathogenesis of human diseases and to detect disease markers, as well as in disease diagnosis, prevention and treatment. Computational methods can effectively narrow down the screening scope of biological experiments, thereby reducing the duration and cost of such experiments. In this review, we outline recent advances in computational methods for predicting LDAs, focusing on LDA databases, lncRNA/disease similarity calculations, and advanced computational models. In addition, we analyze the limitations of various computational models and discuss future challenges and directions for development.
|
3
|
Tan J, Li X, Zhang L, Du Z. Recent advances in machine learning methods for predicting LncRNA and disease associations. Front Cell Infect Microbiol 2022; 12:1071972. [PMID: 36530425 PMCID: PMC9748103 DOI: 10.3389/fcimb.2022.1071972] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/17/2022] [Accepted: 11/11/2022] [Indexed: 12/03/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are involved in almost the entire cell life cycle through different mechanisms and play an important role in many key biological processes. Mutations and dysregulation of lncRNAs have been implicated in many complex human diseases. Therefore, identifying the relationship between lncRNAs and diseases not only contributes to biologists' understanding of disease mechanisms, but also provides new ideas and solutions for disease diagnosis, treatment, prognosis and prevention. Since the existing experimental methods for predicting lncRNA-disease associations (LDAs) are expensive and time consuming, machine learning methods for predicting lncRNA-disease associations have become increasingly popular among researchers. In this review, we summarize some of the human diseases studied by LDAs prediction models, association and similarity features of LDAs prediction, performance evaluation methods of models and some advanced machine learning prediction models of LDAs. Finally, we discuss the potential limitations of machine learning-based methods for LDAs prediction and provide some ideas for designing new prediction models.
|
4
|
Azadifar S, Ahmadi A. A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning. BMC Bioinformatics 2022; 23:422. [PMID: 36241966 PMCID: PMC9563530 DOI: 10.1186/s12859-022-04954-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/12/2022] [Accepted: 09/20/2022] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Selecting and prioritizing candidate disease genes is necessary before conducting laboratory studies, as identifying disease genes from a large number of candidates using laboratory methods is very costly and time-consuming. There are many machine learning-based gene prioritization methods. These methods differ in various aspects, including the feature vectors of genes, the datasets used (which have different structures), and the learning model. Creating a suitable feature vector for genes and an appropriate learning model on a variety of data with different and non-Euclidean structures, including graphs, as well as the lack of negative data, are important challenges for these methods. Graph neural networks have recently emerged in machine learning and related fields, and they have demonstrated superior performance on a broad range of problems. METHODS In this study, a new semi-supervised learning method based on graph convolutional networks is presented, using a novel feature vector constructed for each gene. In the proposed method, we first construct three feature vectors for each gene using terms from the Gene Ontology (GO) database. Then, we train a graph convolutional network on these vectors using protein-protein interaction (PPI) network data to identify candidate disease genes. Our model discovers hidden layer representations that encode both local graph structure and node features. This method is characterized by the simultaneous consideration of topological information of the biological network (e.g., PPI) and other sources of evidence. Finally, a validation was performed to demonstrate the efficiency of our method. RESULTS Several experiments are performed on 16 diseases to evaluate the proposed method's performance. The experiments demonstrate that our proposed method achieves the best results, in terms of precision, area under the ROC curve (AUC), and F1-score, when compared with eight state-of-the-art network and machine learning-based disease gene prioritization methods. CONCLUSION This study shows that the proposed semi-supervised learning method appropriately classifies and ranks candidate disease genes using a graph convolutional network and an innovative method to create three feature vectors for genes based on the molecular function, cellular component, and biological process terms from GO data.
Affiliation(s)
- Saeid Azadifar
- Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
- Ali Ahmadi
- Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
|
5
|
Pagano-Márquez R, Córdoba-Caballero J, Martínez-Poveda B, Quesada AR, Rojano E, Seoane P, Ranea JAG, Medina MÁ. Deepening the knowledge of rare diseases dependent on angiogenesis through semantic similarity clustering and network analysis. Brief Bioinform 2022; 23:6613395. [PMID: 35731990 PMCID: PMC9294413 DOI: 10.1093/bib/bbac220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Revised: 04/28/2022] [Accepted: 05/11/2022] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND Angiogenesis is regulated by multiple genes whose variants can lead to different disorders. Among them, rare diseases are a heterogeneous group of pathologies, most of them genetic, whose information may be of interest to determine the still unknown genetic and molecular causes of other diseases. In this work, we use the information on rare diseases dependent on angiogenesis to investigate the genes that are associated with this biological process and to determine whether there are interactions between the genes involved in its deregulation. RESULTS We propose a systemic approach supported by the use of pathological phenotypes to group diseases by semantic similarity. We grouped 158 angiogenesis-related rare diseases into 18 clusters based on their phenotypes. Of them, 16 clusters had traceable gene connections in a high-quality interaction network. These disease clusters are associated with 130 different genes. We searched for genes associated with angiogenesis through ClinVar pathogenic variants. Of the seven retrieved genes, our system confirms six. Furthermore, it allowed us to identify common affected functions among these disease clusters. AVAILABILITY https://github.com/ElenaRojano/angio_cluster. CONTACT seoanezonjic@uma.es and elenarojano@uma.es.
Affiliation(s)
- Raquel Pagano-Márquez
- Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain
- José Córdoba-Caballero
- Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain
- Beatriz Martínez-Poveda
- Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain; CIBER de Enfermedades Cardiovasculares, CIBERCV, Av. Monforte de Lemos, 3-5, Pabellon 11, Planta 0, 28029, Madrid, Spain; Biomedical Research Institute of Malaga, IBIMA, Calle Doctor Miguel Diaz Recio 28, 29010, Malaga, Spain
- Ana R Quesada
- Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain; Biomedical Research Institute of Malaga, IBIMA, Calle Doctor Miguel Diaz Recio 28, 29010, Malaga, Spain; CIBER de Enfermedades Raras, CIBERER, Av. Monforte de Lemos, 3-5, Pabellon 11, Planta 0, 28029, Madrid, Spain
- Elena Rojano
- Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain; Biomedical Research Institute of Malaga, IBIMA, Calle Doctor Miguel Diaz Recio 28, 29010, Malaga, Spain; CIBER de Enfermedades Raras, CIBERER, Av. Monforte de Lemos, 3-5, Pabellon 11, Planta 0, 28029, Madrid, Spain
- Pedro Seoane
- Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain; Biomedical Research Institute of Malaga, IBIMA, Calle Doctor Miguel Diaz Recio 28, 29010, Malaga, Spain
- Juan A G Ranea
- Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain; Biomedical Research Institute of Malaga, IBIMA, Calle Doctor Miguel Diaz Recio 28, 29010, Malaga, Spain; CIBER de Enfermedades Raras, CIBERER, Av. Monforte de Lemos, 3-5, Pabellon 11, Planta 0, 28029, Madrid, Spain
- Miguel Ángel Medina
- Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain; Biomedical Research Institute of Malaga, IBIMA, Calle Doctor Miguel Diaz Recio 28, 29010, Malaga, Spain; CIBER de Enfermedades Raras, CIBERER, Av. Monforte de Lemos, 3-5, Pabellon 11, Planta 0, 28029, Madrid, Spain
|
6
|
Slater LT, Russell S, Makepeace S, Carberry A, Karwath A, Williams JA, Fanning H, Ball S, Hoehndorf R, Gkoutos GV. Evaluating semantic similarity methods for comparison of text-derived phenotype profiles. BMC Med Inform Decis Mak 2022; 22:33. [PMID: 35123470 PMCID: PMC8818208 DOI: 10.1186/s12911-022-01770-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Accepted: 01/21/2022] [Indexed: 11/16/2022] Open
Abstract
Background Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, semantic similarity measures have the capacity to enable and enhance 'patient-like me' analyses, automated coding, differential diagnosis, and outcome prediction. While a large body of work exists exploring the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or better methods in the area. Methods We develop a platform for reproducible benchmarking and comparison of experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from all text narrative associated with admissions in the Medical Information Mart for Intensive Care (MIMIC-III). Results 300 semantic similarity configurations were evaluated, as well as one embedding-based approach. On average, measures that did not make use of an external information content measure performed slightly better; however, the best-performing configurations when measured by area under the receiver operating characteristic curve and Top Ten Accuracy used term-specificity and annotation-frequency measures. Conclusion We identified and interpreted the performance of a large number of semantic similarity configurations for the task of classifying diagnosis from text-derived phenotype profiles in one setting. We also provided a basis for further research on other settings and related tasks in the area.
|
7
|
Lastra-Díaz JJ, Lara-Clares A, Garcia-Serrano A. HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey. BMC Bioinformatics 2022; 23:23. [PMID: 34991460 PMCID: PMC8734250 DOI: 10.1186/s12859-021-04539-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2020] [Accepted: 12/15/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are being extensively used in many applications in biomedical text mining and genomics respectively, which has encouraged the development of semantic measures libraries based on the aforementioned ontologies. However, current state-of-the-art semantic measures libraries have some performance and scalability drawbacks derived from their ontology representations based on relational databases, or naive in-memory graph representations. Likewise, a recent reproducible survey on word similarity shows that one hybrid IC-based measure which integrates a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for their real-time computation prevents both their practical use in any application and the use of any other path-based semantic similarity measure. RESULTS To bridge the two aforementioned gaps, this work introduces for the first time an updated version of the HESML Java software library especially designed for the biomedical domain, which implements the most efficient and scalable ontology representation reported in the literature, together with a new method for the approximation of Dijkstra's algorithm for taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure. CONCLUSIONS We introduce a set of reproducible benchmarks showing that HESML outperforms by several orders of magnitude the current state-of-the-art libraries in the three aforementioned biomedical ontologies, as well as the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL scales linearly with the dimension of the common ancestor subgraph regardless of the ontology size. Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies like SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
Affiliation(s)
- Juan J. Lastra-Díaz
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
- Alicia Lara-Clares
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
- Ana Garcia-Serrano
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
|
8
|
Milano M. Using Gene Ontology to Annotate and Prioritize Microarray Data. Methods Mol Biol 2022; 2401:273-287. [PMID: 34902135 DOI: 10.1007/978-1-0716-1839-4_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The results of high-throughput experiments consist of numerous candidate genes, proteins, or other molecules potentially associated with diseases. A challenge for omics science is extracting knowledge from these results and filtering promising gene or protein candidates. In clinical scenarios in particular, the focus is on highlighting the behavior of the few molecules related to a specific disease. In this context, different computational approaches, also referred to as gene prioritization methods, aim to identify the genes most related to a disease among a larger set of candidate genes. The identification requires the use of domain-specific knowledge that is often encoded into ontologies.
Affiliation(s)
- Marianna Milano
- Department of Medical and Surgical Sciences, University of Catanzaro, Catanzaro, Italy
|
9
|
Arjmand B, Khodadoost M, Jahani Sherafat S, Rezaei Tavirani M, Ahmadi N, Okhovatian F, Rezaei Tavirani M. Low-Level Laser Therapy Effects on Rat Blood Hemostasis Via Significant Alteration in Fibrinogen and Plasminogen Expression Level. J Lasers Med Sci 2021; 12:e59. [PMID: 35155144 PMCID: PMC8837859 DOI: 10.34172/jlms.2021.59] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/17/2021] [Accepted: 07/03/2021] [Indexed: 01/20/2024]
Abstract
Introduction: Many reports describe the significant role of low-level laser therapy (LLLT) in different processes such as regenerative medicine and bone formation. The aim of this study is to assess the role of LLLT in blood hemostasis in rats via bioinformatic investigation. Methods: The differentially expressed plasma proteins of rats treated with LLLT, taken from the literature, together with 50 added first neighbors, were investigated via network analysis to find the critical dysregulated proteins and biological processes by using Cytoscape software, the STRING database, and ClueGO. Results: A scale-free network including 55 nodes was constructed from the queried proteins and the added first neighbors. Fibrinogen gamma, fibrinogen alpha, and plasminogen were highlighted as the central genes of the analyzed network. Fibrinolysis was determined as the main group of biological processes that were affected by LLLT. Conclusion: Findings indicate that LLLT affects blood hemostasis, which is an important point in approving the therapeutic application of LLLT and also in preventing its possible complications.
Affiliation(s)
- Babak Arjmand
- Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Mahmood Khodadoost
- School of Traditional Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Somayeh Jahani Sherafat
- Laser Application in Medical Sciences Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mostafa Rezaei Tavirani
- Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Nayebali Ahmadi
- Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Farshad Okhovatian
- Physiotherapy Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
10
|
BiGAN: LncRNA-disease association prediction based on bidirectional generative adversarial network. BMC Bioinformatics 2021; 22:357. [PMID: 34193046 PMCID: PMC8247109 DOI: 10.1186/s12859-021-04273-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 03/05/2021] [Accepted: 06/15/2021] [Indexed: 12/11/2022] Open
Abstract
Background An increasing number of studies have shown that lncRNAs are crucial for the control of hormones and the regulation of various physiological processes in the human body, and deletion mutations in RNA are related to many human diseases. LncRNA-disease association prediction is very useful for understanding the pathogenesis, diagnosis, and prevention of diseases, and is helpful for labelling relevant biological information. Results In this manuscript, we propose a computational model named bidirectional generative adversarial network (BiGAN), which consists of an encoder, a generator, and a discriminator, to predict new lncRNA-disease associations. We construct features for lncRNA-disease pairs by utilizing the disease semantic similarity, lncRNA sequence similarity, and Gaussian interaction profile kernel similarities of lncRNAs and diseases. The BiGAN maps the latent features of similarity features to predict unverified associations between lncRNAs and diseases. The computational results show that the BiGAN performs significantly better than other state-of-the-art approaches in cross-validation. We employed the proposed model to predict candidate lncRNAs for renal cancer and colon cancer. The results are promising. Case studies show that almost 70% of lncRNAs in the top 10 prediction lists are verified by recent biological research. Conclusion The experimental results indicated that our proposed model had an accurate predictive ability for the association of lncRNA-disease pairs.
|
11
|
Nayarisseri A. Experimental and Computational Approaches to Improve Binding Affinity in Chemical Biology and Drug Discovery. Curr Top Med Chem 2021; 20:1651-1660. [PMID: 32614747 DOI: 10.2174/156802662019200701164759] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022]
Abstract
Drug discovery is one of the most complicated processes, and establishing a single drug may require multidisciplinary efforts to design efficient and commercially viable compounds. The main purpose of drug design is to identify a chemical compound or inhibitor that can bind to the active site of a specific cavity on a target protein. Traditional drug design methods involve various experiment-based approaches, including random screening of chemicals that are found in nature or can be synthesized directly in chemical laboratories. Besides the long design cycle, high cost is a major concern. Modern computer-based algorithms, including structure-based drug design, have accelerated the drug design and discovery process. Over the past decade, remarkable progress has been made in all areas of drug design and discovery. CADD (Computer Aided Drug Designing) tools shorten the conventional design cycle and also generate chemically more stable and worthy compounds, and hence reduce the cost of drug discovery. This editorial introduces a special issue of seven research and review articles that emphasize computational approaches, alongside experimental approaches based on chemical synthesis, for improving binding affinity in chemical biology and drug discovery, including de novo drug design. This set of articles explores the role of systems biology and the evaluation of ligand affinity in drug design and discovery for the future.
Affiliation(s)
- Anuraj Nayarisseri
- In silico Research Laboratory, Eminent Biosciences, Mahalakshmi Nagar, Indore - 452010, Madhya Pradesh, India
|
12
|
Zamanian-Azodi M, Arjmand B, Razzaghi M, Rezaei Tavirani M, Ahmadzadeh A, Rostaminejad M. Platelet and Haemostasis are the Main Targets in Severe Cases of COVID-19 Infection; a System Biology Study. Arch Acad Emerg Med 2021; 9:e27. [PMID: 34027422 PMCID: PMC8126352 DOI: 10.22037/aaem.v9i1.1108] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/01/2023]
Abstract
Introduction: Many proteomics-based and bioinformatics-based efforts have been made to detect the molecular mechanism of COVID-19 infection. Identification of the main protein targets and pathways of severe cases of COVID-19 infection is the aim of this study. Methods: Published differentially expressed proteins were screened, and the significant proteins were investigated via protein-protein interaction network analysis using Cytoscape software V. 3.7.2 and the STRING database. The studied proteins were assessed via action map analysis to determine the relationship between individual proteins using CluePedia. The related biological terms were investigated using ClueGO, and the terms were clustered and discussed. Results: Among the 35 queried proteins, six (FGA, FGB, FGG, and FGL1 plus TLN1 and THBS1) were identified as critical proteins. A total of 38 biological terms, clustered in 4 groups, were introduced as the affected terms. "Platelet degranulation" and "hereditary factor I deficiency disease" were introduced as the main classes of terms disturbed by the COVID-19 virus. Conclusion: It can be concluded that platelet damage and disturbed haemostasis could be the main targets in severe cases of coronavirus infection. It is vital to follow patients' condition by examining the introduced critical differentially expressed proteins (DEPs).
Affiliation(s)
- Mona Zamanian-Azodi
- Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Babak Arjmand
- Cell Therapy and Regenerative Medicine Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Mohammadreza Razzaghi
- Laser Application in Medical Sciences Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mostafa Rezaei Tavirani
- Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Alireza Ahmadzadeh
- Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mohammad Rostaminejad
- Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
13
|
Joodaki M, Ghadiri N, Maleki Z, Lotfi Shahreza M. A scalable random walk with restart on heterogeneous networks with Apache Spark for ranking disease-related genes through type-II fuzzy data fusion. J Biomed Inform 2021; 115:103688. [PMID: 33545331 DOI: 10.1016/j.jbi.2021.103688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/02/2020] [Revised: 01/10/2021] [Accepted: 01/23/2021] [Indexed: 12/11/2022]
Abstract
One of the important missions of biology and medical science is to find disease-related genes. Recent research uses gene/protein networks to find such genes. Due to false positive interactions in these networks, the results are often not accurate or reliable. Integrating multiple gene/protein networks could overcome this drawback, yielding a network with fewer false positive interactions. The integration method plays a crucial role in the quality of the constructed network. In this paper, we integrate several sources to build a reliable heterogeneous network, i.e., a network that includes nodes of different types. Because the gene/protein sources differ, four gene-gene similarity networks are constructed first and then integrated by applying the type-II fuzzy voter scheme. The resulting gene-gene network is linked to a disease-disease similarity network (itself the outcome of integrating four sources) through a bipartite disease-gene network. We propose a novel algorithm, namely random walk with restart on the heterogeneous network method with fuzzy fusion (RWRHN-FF). By running RWRHN-FF over the heterogeneous network, disease-related genes are determined. Experimental results using leave-one-out cross-validation indicate that RWRHN-FF outperforms existing methods. The proposed algorithm can be applied to find new genes for prostate, breast, gastric, and colon cancers. Since the RWRHN-FF algorithm converges slowly on large heterogeneous networks, we propose a parallel implementation of the RWRHN-FF algorithm on the Apache Spark platform for high-throughput and reliable network inference. Experiments run on heterogeneous networks of different sizes indicate faster convergence compared to other non-distributed modes of implementation.
Affiliation(s)
- Mehdi Joodaki
- Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
- Nasser Ghadiri
- Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
- Zeinab Maleki
- Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
|
14
|
Yang Y, Fu X, Qu W, Xiao Y, Shen HB. MiRGOFS: a GO-based functional similarity measurement for miRNAs, with applications to the prediction of miRNA subcellular localization and miRNA-disease association. Bioinformatics 2019; 34:3547-3556. [PMID: 29718114 DOI: 10.1093/bioinformatics/bty343] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 11/05/2017] [Accepted: 04/26/2018] [Indexed: 01/22/2023] Open
Abstract
Motivation Benefiting from high-throughput experimental technologies, whole-genome analysis of microRNAs (miRNAs) has become increasingly common to uncover important regulatory roles of miRNAs and identify miRNA biomarkers for disease diagnosis. As complementary information to the high-throughput experimental data, domain knowledge like the Gene Ontology and KEGG pathway is usually used to guide gene function analysis. However, functional annotation for miRNAs is scarce in the public databases. Till now, only a few methods have been proposed for measuring the functional similarity between miRNAs based on public annotation data, and these methods cover a very limited number of miRNAs and are therefore not applicable to large-scale miRNA analysis. Results In this paper, we propose a new method to measure the functional similarity for miRNAs, called miRGOFS, which has two notable features: (i) it adopts a new GO semantic similarity metric which considers both common ancestors and descendants of GO terms; (ii) it computes similarity between GO sets in an asymmetric manner, and weights each GO term by its statistical significance. The miRGOFS-based predictor achieves an F1 of 61.2% on a benchmark dataset of miRNA localization, and AUC values of 87.7% and 81.1% on two benchmark sets of miRNA-disease association, respectively. Compared with the existing functional similarity measurements of miRNAs, miRGOFS has the advantages of higher accuracy and larger coverage of human miRNAs (over 1000 miRNAs). Availability and implementation http://www.csbio.sjtu.edu.cn/bioinf/MiRGOFS/. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yang Yang
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Xiaofeng Fu
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Wenhao Qu
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Yiqun Xiao
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China; Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
|
15
|
Lu C, Yang M, Luo F, Wu FX, Li M, Pan Y, Li Y, Wang J. Prediction of lncRNA-disease associations based on inductive matrix completion. Bioinformatics 2019; 34:3357-3364. [PMID: 29718113 DOI: 10.1093/bioinformatics/bty327] [Citation(s) in RCA: 180] [Impact Index Per Article: 30.0] [Received: 12/08/2017] [Accepted: 04/25/2018] [Indexed: 12/23/2022] Open
Abstract
Motivation Accumulating evidence indicates that long non-coding RNAs (lncRNAs) play pivotal roles in various biological processes. Mutations and dysregulations of lncRNAs are implicated in miscellaneous human diseases. Predicting lncRNA-disease associations is beneficial to disease diagnosis as well as treatment. Although many computational methods have been developed, precisely identifying lncRNA-disease associations, especially for novel lncRNAs, remains challenging. Results In this study, we propose a method (named SIMCLDA) for predicting potential lncRNA-disease associations based on inductive matrix completion. We compute Gaussian interaction profile kernel of lncRNAs from known lncRNA-disease interactions and functional similarity of diseases based on disease-gene and gene-gene ontology associations. Then, we extract primary feature vectors from Gaussian interaction profile kernel of lncRNAs and functional similarity of diseases by principal component analysis, respectively. For a new lncRNA, we calculate the interaction profile according to the interaction profiles of its neighbors. Finally, we complete the association matrix based on the inductive matrix completion framework using the primary feature vectors from the constructed feature matrices. Computational results show that SIMCLDA can effectively predict lncRNA-disease associations with higher accuracy compared with previous methods. Furthermore, case studies show that SIMCLDA can effectively predict candidate lncRNAs for renal cancer, gastric cancer and prostate cancer. Availability and implementation https://github.com//bioinfomaticsCSU/SIMCLDA. Supplementary information Supplementary data are available at Bioinformatics online.
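The Gaussian interaction profile (GIP) kernel mentioned in this abstract can be sketched as follows. The abstract does not state the formula, so the sketch assumes the commonly used definition K(i, j) = exp(-γ‖IP(i) − IP(j)‖²), with γ set to a bandwidth parameter divided by the mean squared norm of the interaction profiles. The function name, the default bandwidth of 1.0, and the toy association matrix are illustrative assumptions, not values taken from SIMCLDA.

```python
import math


def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a binary matrix.

    assoc: list of rows; assoc[i][d] is 1 if entity i (e.g. an lncRNA) has a
           known association with column d (e.g. a disease), else 0.
    gamma_prime: bandwidth parameter (assumed default, not from the paper).
    """
    n = len(assoc)
    # Normalise the bandwidth by the average squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    gamma = gamma_prime / mean_sq_norm
    kernel = []
    for i in range(n):
        row = []
        for j in range(n):
            # Squared Euclidean distance between the two interaction profiles.
            sq_dist = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            row.append(math.exp(-gamma * sq_dist))
        kernel.append(row)
    return kernel


# Toy example (hypothetical data): three lncRNAs, three diseases.
toy_assoc = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
K = gip_kernel(toy_assoc)
```

Rows with identical profiles get similarity 1, and similarity decays as the profiles diverge; the resulting matrix is symmetric and could then be passed to a dimensionality-reduction step such as the principal component analysis the abstract describes.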
Affiliation(s)
- Chengqian Lu
- School of Information Science and Engineering, Central South University, Changsha, People's Republic of China
- Mengyun Yang
- School of Information Science and Engineering, Central South University, Changsha, People's Republic of China
- Feng Luo
- School of Computing, Clemson University, Clemson, SC, USA
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatchewan, Canada
- Min Li
- School of Information Science and Engineering, Central South University, Changsha, People's Republic of China
- Yi Pan
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA, USA
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha, People's Republic of China
|
16
|
LLCLPLDA: a novel model for predicting lncRNA-disease associations. Mol Genet Genomics 2019; 294:1477-1486. [PMID: 31250107 DOI: 10.1007/s00438-019-01590-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/15/2019] [Accepted: 06/21/2019] [Indexed: 12/19/2022]
Abstract
Long noncoding RNAs play a significant role in the occurrence of diseases. Thus, studying the relationship prediction between lncRNAs and disease is becoming more popular. Researchers hope to determine effective treatments by revealing the occurrence and development of diseases at the molecular level. However, the traditional biological experimental way to verify the association between lncRNAs and disease is very time-consuming and expensive. Therefore, we developed a method called LLCLPLDA to predict potential lncRNA-disease associations. First, locality-constrained linear coding (LLC) is leveraged to project the features of lncRNAs and diseases to local-constraint features, and then, a label propagation (LP) strategy is used to mix up the initial association matrix and the obtained features of lncRNAs and diseases. To demonstrate the performance of our method, we compared LLCLPLDA with five methods in the leave-one-out cross-validation and fivefold cross-validation scheme, and the experimental results show that the proposed method outperforms the other five methods. Additionally, we conducted case studies on three diseases: cervical cancer, gliomas, and breast cancer. The top five predicted lncRNAs for cervical cancer and gliomas were verified, and four of the five lncRNAs for breast cancer were also confirmed.
|
17
|
Sabir JSM, El Omri A, Shaik NA, Banaganapalli B, Al-Shaeri MA, Alkenani NA, Hajrah NH, Awan ZA, Zrelli H, Elango R, Khan M. Identification of key regulatory genes connected to NF-κB family of proteins in visceral adipose tissues using gene expression and weighted protein interaction network. PLoS One 2019; 14:e0214337. [PMID: 31013288 PMCID: PMC6478283 DOI: 10.1371/journal.pone.0214337] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/13/2019] [Accepted: 03/11/2019] [Indexed: 12/12/2022] Open
Abstract
Obesity is connected to the activation of chronic inflammatory pathways in both adipocytes and macrophages located in adipose tissues. The nuclear factor (NF)-κB is a central molecule involved in inflammatory pathways linked to the pathology of different complex metabolic disorders. Investigating the gene expression data in the adipose tissue would potentially unravel disease relevant gene interactions. The present study is aimed at creating a signature molecular network and at prioritizing the potential biomarkers interacting with NF-κB family of proteins in obesity using systems biology approaches. The dataset GSE88837 associated with obesity was downloaded from Gene Expression Omnibus (GEO) database. Statistical analysis represented the differential expression of a total of 2650 genes in adipose tissues (p < 0.05). Using concepts like correlation, semantic similarity, and theoretical graph parameters we narrowed down genes to a network of 23 genes strongly connected with NF-κB family with higher significance. Functional enrichment analysis revealed 21 of 23 target genes of NF-κB were found to have a critical role in the pathophysiology of obesity. Interestingly, GEM and PPP1R13L were predicted as novel genes which may act as potential targets or biomarkers of obesity as they occur with the other 21 target genes with known obesity relationship. Our study concludes that NF-κB and prioritized target genes regulate the inflammation in adipose tissues through several molecular signaling pathways like NF-κB, PI3K-Akt, glucocorticoid receptor regulatory network, angiogenesis and cytokine pathways. These integrated systems biology approaches can be applied for elucidating functional protein interaction networks of NF-κB protein family in different complex diseases. Our integrative and network-based approach for finding therapeutic targets in genomic data could accelerate the identification of novel drug targets for obesity.
Affiliation(s)
- Jamal S. M. Sabir
- Center of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Genomics and Biotechnology Section and Research Group, Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Abdelfatteh El Omri
- Center of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Genomics and Biotechnology Section and Research Group, Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- * E-mail: (MK); (AEO)
- Noor A. Shaik
- Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Babajan Banaganapalli
- Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Majed A. Al-Shaeri
- Center of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Naser A. Alkenani
- Biology-Zoology Division, Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Nahid H. Hajrah
- Center of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Genomics and Biotechnology Section and Research Group, Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Zuhier A. Awan
- Department of Clinical Biochemistry, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Houda Zrelli
- Center of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Genomics and Biotechnology Section and Research Group, Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Ramu Elango
- Department of Genetic Medicine, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
- Muhummadh Khan
- Center of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Genomics and Biotechnology Section and Research Group, Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- * E-mail: (MK); (AEO)
|
18
|
Abstract
Inherited metabolic disorders (IMDs) are debilitating inherited diseases, with phenotypic, biochemical and genetic heterogeneity, frequently leading to prolonged diagnostic odysseys. Mitochondrial disorders represent one of the most severe classes of IMDs, wherein defects in >350 genes lead to multi-system disease. Diagnostic rates have improved considerably following the adoption of next-generation sequencing (NGS) technologies, but are still far from perfect. Phenomic annotation is an emerging concept which is being utilised to enhance interpretation of NGS results. To test whether phenomic correlations have utility in mitochondrial disease and IMDs, we created a gene-to-phenotype interaction network with searchable elements, for Leigh syndrome, a frequently observed paediatric mitochondrial disorder. The Leigh Map comprises data on 92 genes and 275 phenotypes standardised in human phenotype ontology terms, with 80% predictive accuracy. This commentary highlights the usefulness of the Leigh Map and similar resources and the challenges associated with integrating phenomic technologies into clinical practice.
Affiliation(s)
- Joyeeta Rahman
- UCL Great Ormond Street Institute of Child Health, London, UK
- Shamima Rahman
- UCL Great Ormond Street Institute of Child Health, London, UK
|
19
|
Liu W, Liu J, Rajapakse JC. Gene Ontology Enrichment Improves Performances of Functional Similarity of Genes. Sci Rep 2018; 8:12100. [PMID: 30108262 PMCID: PMC6092333 DOI: 10.1038/s41598-018-30455-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/23/2017] [Accepted: 07/25/2018] [Indexed: 12/23/2022] Open
Abstract
There exists a plethora of measures to evaluate functional similarity (FS) between genes, which is widely used in many bioinformatics applications including detecting molecular pathways, identifying co-expressed genes, predicting protein-protein interactions, and prioritization of disease genes. Measures of FS between genes are mostly derived from Information Contents (IC) of Gene Ontology (GO) terms annotating the genes. However, existing measures evaluating IC of terms based either on the representations of terms in the annotating corpus or on the knowledge embedded in the GO hierarchy do not consider the enrichment of GO terms by the querying pair of genes. The enrichment of a GO term by a pair of genes is dependent on whether the term is annotated by one gene (i.e., partial annotation) or by both genes (i.e., complete annotation) in the pair. In this paper, we propose a method that incorporates enrichment of GO terms by a gene pair in computing their FS and show that GO enrichment improves the performances of 46 existing FS measures in the prediction of sequence homologies, gene expression correlations, protein-protein interactions, and disease associated genes.
Affiliation(s)
- Wenting Liu
- Human Genetics, Genome Institute of Singapore, Singapore, Singapore
- Jianjun Liu
- Human Genetics, Genome Institute of Singapore, Singapore, Singapore
- Jagath C Rajapakse
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
|
20
|
Doğan T. HPO2GO: prediction of human phenotype ontology term associations for proteins using cross ontology annotation co-occurrences. PeerJ 2018; 6:e5298. [PMID: 30083448 PMCID: PMC6076985 DOI: 10.7717/peerj.5298] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 03/04/2018] [Accepted: 07/03/2018] [Indexed: 01/24/2023] Open
Abstract
Analysing the relationships between biomolecules and the genetic diseases is a highly active area of research, where the aim is to identify the genes and their products that cause a particular disease due to functional changes originating from mutations. Biological ontologies are frequently employed in these studies, which provide researchers with extensive opportunities for knowledge discovery through computational data analysis. In this study, a novel approach is proposed for the identification of relationships between biomedical entities by automatically mapping phenotypic abnormality defining HPO terms with biomolecular function defining GO terms, where each association indicates the occurrence of the abnormality due to the loss of the biomolecular function expressed by the corresponding GO term. The proposed HPO2GO mappings were extracted by calculating the frequency of the co-annotations of the terms on the same genes/proteins, using already existing curated HPO and GO annotation sets. This was followed by the filtering of the unreliable mappings that could be observed due to chance, by statistical resampling of the co-occurrence similarity distributions. Furthermore, the biological relevance of the finalized mappings was discussed over selected cases, using the literature. The resulting HPO2GO mappings can be employed in different settings to predict and to analyse novel gene/protein–ontology term–disease relations. As an application of the proposed approach, HPO term–protein associations (i.e., HPO2protein) were predicted. In order to test the predictive performance of the method on a quantitative basis, and to compare it with the state-of-the-art, CAFA2 challenge HPO prediction target protein set was employed. The results of the benchmark indicated the potential of the proposed approach, as HPO2GO performance was among the best (Fmax = 0.35).
The automated cross ontology mapping approach developed in this work may be extended to other ontologies as well, to identify unexplored relation patterns at the systemic level. The datasets, results and the source code of HPO2GO are available for download at: https://github.com/cansyl/HPO2GO.
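The co-annotation counting step described in this abstract can be sketched in a few lines. The abstract does not give the similarity formula, so the Dice-style score used here is an assumption standing in for it, and the statistical resampling filter is not reproduced. The function name and the placeholder term and gene identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import product


def co_occurrence_scores(hpo_annotations, go_annotations):
    """Score HPO/GO term pairs by how often they annotate the same genes.

    hpo_annotations, go_annotations: dicts mapping gene -> set of term IDs.
    Returns {(hpo_term, go_term): score} for pairs sharing at least one gene,
    where score = 2 * shared / (genes with HPO term + genes with GO term).
    The Dice-style score is an illustrative assumption, not the paper's measure.
    """
    # Invert the gene -> terms maps into term -> genes maps.
    hpo_genes = defaultdict(set)
    for gene, terms in hpo_annotations.items():
        for term in terms:
            hpo_genes[term].add(gene)
    go_genes = defaultdict(set)
    for gene, terms in go_annotations.items():
        for term in terms:
            go_genes[term].add(gene)

    scores = {}
    for hpo_term, go_term in product(hpo_genes, go_genes):
        shared = len(hpo_genes[hpo_term] & go_genes[go_term])
        if shared:  # keep only term pairs co-annotated on at least one gene
            total = len(hpo_genes[hpo_term]) + len(go_genes[go_term])
            scores[(hpo_term, go_term)] = 2 * shared / total
    return scores


# Toy example with placeholder identifiers (not real ontology terms).
toy_hpo = {"gene1": {"HP:A"}, "gene2": {"HP:A", "HP:B"}, "gene3": {"HP:B"}}
toy_go = {"gene1": {"GO:X"}, "gene2": {"GO:X"}, "gene3": {"GO:Y"}}
mapping_scores = co_occurrence_scores(toy_hpo, toy_go)
```

A filtering step would then discard pairs whose score could plausibly arise by chance, which is the role the abstract assigns to resampling.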
Affiliation(s)
- Tunca Doğan
- Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey; Cancer Systems Biology Laboratory (KanSiL), Graduate School of Informatics, Middle East Technical University, Ankara, Turkey; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
|
21
|
MGOGP: a gene module-based heuristic algorithm for cancer-related gene prioritization. BMC Bioinformatics 2018; 19:215. [PMID: 29871590 PMCID: PMC5989416 DOI: 10.1186/s12859-018-2216-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/13/2017] [Accepted: 05/23/2018] [Indexed: 01/13/2023] Open
Abstract
Background Prioritizing genes according to their associations with a cancer allows researchers to explore genes in more informed ways. So far, gene-centric or network-centric gene prioritization methods have predominated. Genes and their protein products carry out cellular processes in the context of functional modules. Dysfunctional gene modules have been previously reported to have associations with cancer. However, gene module information has seldom been considered in cancer-related gene prioritization. Results In this study, we propose a novel method, MGOGP (Module and Gene Ontology-based Gene Prioritization), for cancer-related gene prioritization. Different from other methods, MGOGP ranks genes considering information of both individual genes and their affiliated modules, and utilizes Gene Ontology (GO) based fuzzy measure value as well as known cancer-related genes as heuristics. The performance of the proposed method is comprehensively validated by using both breast cancer and prostate cancer datasets, and by comparison with other methods. Results show that MGOGP outperforms other methods, and successfully prioritizes more genes with literature confirmed evidence. Conclusions This work will aid researchers in the understanding of the genetic architecture of complex diseases, and improve the accuracy of diagnosis and the effectiveness of therapy. Electronic supplementary material The online version of this article (10.1186/s12859-018-2216-0) contains supplementary material, which is available to authorized users.
|
22
|
|
23
|
Tian Z, Guo M, Wang C, Liu X, Wang S. Refine gene functional similarity network based on interaction networks. BMC Bioinformatics 2017; 18:550. [PMID: 29297381 PMCID: PMC5751769 DOI: 10.1186/s12859-017-1969-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND In recent years, biological interaction networks have become the basis of some essential studies and achieved success in many applications. Some typical networks such as protein-protein interaction networks have already been investigated systematically. However, little work has been available for the construction of gene functional similarity networks so far. In this research, we will try to build a highly reliable gene functional similarity network to promote its further application. RESULTS Here, we propose a novel method to construct and refine the gene functional similarity network. It mainly contains three steps. First, we establish an integrated gene functional similarity network based on different functional similarity calculation methods. Then, we construct a referenced gene-gene association network based on the protein-protein interaction networks. Finally, we refine the spurious edges in the integrated gene functional similarity network with the help of the referenced gene-gene association network. Experimental results indicate that the refined gene functional similarity network (RGFSN) exhibits a scale-free, small world and modular architecture, with a degree distribution that is best fit by a power law. In addition, we conduct a protein complex prediction experiment for human based on RGFSN and achieve an outstanding result, which implies it has high reliability and wide application significance. CONCLUSIONS Our efforts are insightful for constructing and refining gene functional similarity networks, which can be applied to build other high quality biological networks.
Affiliation(s)
- Zhen Tian
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Maozu Guo
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, 100044 People’s Republic of China
- Chunyu Wang
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Xiaoyan Liu
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Shiming Wang
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
|
24
|
Mazandu GK, Chimusa ER, Mulder NJ. Gene Ontology semantic similarity tools: survey on features and challenges for biological knowledge discovery. Brief Bioinform 2017; 18:886-901. [PMID: 27473066 DOI: 10.1093/bib/bbw067] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 01/27/2016] [Indexed: 01/02/2023] Open
Abstract
Gene Ontology (GO) semantic similarity tools enable retrieval of semantic similarity scores, which incorporate biological knowledge embedded in the GO structure for comparing or classifying different proteins or list of proteins based on their GO annotations. This facilitates a better understanding of biological phenomena underlying the corresponding experiment and enables the identification of processes pertinent to different biological conditions. Currently, about 14 tools are available, which may play an important role in improving protein analyses at the functional level using different GO semantic similarity measures. Here we survey these tools to provide a comprehensive view of the challenges and advances made in this area to avoid redundant effort in developing features that already exist, or implementing ideas already proven to be obsolete in the context of GO. This helps researchers, tool developers, as well as end users, understand the underlying semantic similarity measures implemented through knowledge of pertinent features of, and issues related to, a particular tool. This should empower users to make appropriate choices for their biological applications and ensure effective knowledge discovery based on GO annotations.
|
25
|
Tian Z, Guo M, Wang C, Xing L, Wang L, Zhang Y. Constructing an integrated gene similarity network for the identification of disease genes. J Biomed Semantics 2017; 8:32. [PMID: 29297379 PMCID: PMC5763299 DOI: 10.1186/s13326-017-0141-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Discovering novel genes that are involved in human diseases is a challenging task in biomedical research. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods are mainly based on protein-protein interaction (PPI) networks. However, since these PPI networks contain false positives and only cover less than half of known human genes, their reliability and coverage are very low. Therefore, it is highly necessary to fuse multiple genomic data to construct a credible gene similarity network and then infer disease genes on the whole genomic scale. RESULTS We proposed a novel method, named RWRB, to infer causal genes of diseases of interest. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is reconstructed based on similarity network fusion (SNF) method. Finally, we employ the random walk with restart algorithm on the phenotype-gene bilayer network, which combines phenotype similarity network, IGSN as well as phenotype-gene association network, to prioritize candidate disease genes. We investigate the effectiveness of RWRB through leave-one-out cross-validation methods in inferring phenotype-gene relationships. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB stems from IGSN, which has wider coverage and higher reliability compared with current PPI networks. Moreover, we conduct a comprehensive case study for Alzheimer's disease and predict some novel disease genes that are supported by the literature. CONCLUSIONS RWRB is an effective and reliable algorithm in prioritizing candidate disease genes on the genomic scale. Software and supplementary information are available at http://nclab.hit.edu.cn/~tianzhen/RWRB/ .
Affiliation(s)
- Zhen Tian
- School of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Maozu Guo
- School of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Chunyu Wang
- School of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- LinLin Xing
- School of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001 People’s Republic of China
- Lei Wang
- Institute of Health Service and Medical Information Academy of Military Medical Sciences Beijing, Beijing, 100850 China
- Yin Zhang
- Institute of Health Service and Medical Information Academy of Military Medical Sciences Beijing, Beijing, 100850 China
|
26
|
Mazza A, Klockmeier K, Wanker E, Sharan R. An integer programming framework for inferring disease complexes from network data. Bioinformatics 2017; 32:i271-i277. [PMID: 27307626 PMCID: PMC4908347 DOI: 10.1093/bioinformatics/btw263] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022] Open
Abstract
MOTIVATION Unraveling the molecular mechanisms that underlie disease calls for methods that go beyond the identification of single causal genes to inferring larger protein assemblies that take part in the disease process. RESULTS Here, we develop an exact, integer-programming-based method for associating protein complexes with disease. Our approach scores proteins based on their proximity in a protein-protein interaction network to a prior set that is known to be relevant for the studied disease. These scores are combined with interaction information to infer densely interacting protein complexes that are potentially disease-associated. We show that our method outperforms previous ones and leads to predictions that are well supported by current experimental data and literature knowledge. AVAILABILITY AND IMPLEMENTATION The datasets we used, the executables and the results are available at www.cs.tau.ac.il/roded/disease_complexes.zip CONTACT roded@post.tau.ac.il.
Affiliation(s)
- Arnon Mazza
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Erich Wanker
- Max Delbrück Center for Molecular Medicine, Berlin, Germany
- Roded Sharan
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
|
27
|
Yu G, Lu C, Wang J. NoGOA: predicting noisy GO annotations using evidences and sparse representation. BMC Bioinformatics 2017; 18:350. [PMID: 28732468 PMCID: PMC5521088 DOI: 10.1186/s12859-017-1764-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/02/2017] [Accepted: 07/14/2017] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Due to resource limitations, only a small portion of annotations are manually checked by curators, and the others are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently reports that there are still considerable noisy (or incorrect) annotations. Given the wide application of annotations, how to identify noisy annotations is an important yet seldom studied open problem. RESULTS We introduce a novel approach called NoGOA to predict noisy annotations. First, NoGOA applies sparse representation on the gene-term association matrix to reduce the impact of noisy annotations, and takes advantage of sparse representation coefficients to measure the semantic similarity between genes. Second, it preliminarily predicts noisy annotations of a gene based on aggregated votes from semantic neighborhood genes of that gene. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived at different periods, and then weights entries of the association matrix via estimated ratios and propagates weights to ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and that removing noisy annotations improves the performance of gene function prediction. CONCLUSIONS The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations. Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA
Affiliation(s)
- Guoxian Yu
  - College of Computer and Information Sciences, Southwest University, Chongqing, China
- Chang Lu
  - College of Computer and Information Sciences, Southwest University, Chongqing, China
- Jun Wang
  - College of Computer and Information Sciences, Southwest University, Chongqing, China
|
28
|
Lastra-Díaz JJ, García-Serrano A, Batet M, Fernández M, Chirigati F. HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. INFORM SYST 2017. [DOI: 10.1016/j.is.2017.02.002] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 10/20/2022]
|
29
|
Silberberg Y, Kupiec M, Sharan R. GLADIATOR: a global approach for elucidating disease modules. Genome Med 2017; 9:48. [PMID: 28549478 PMCID: PMC5446740 DOI: 10.1186/s13073-017-0435-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 08/15/2016] [Accepted: 05/04/2017] [Indexed: 12/18/2022]
Abstract
BACKGROUND Understanding the genetic basis of disease is an important challenge in biology and medicine. The observation that disease-related proteins often interact with one another has motivated numerous network-based approaches for deciphering disease mechanisms. In particular, protein-protein interaction networks were successfully used to illuminate disease modules, i.e., interacting proteins working in concert to drive a disease. The identification of these modules can further our understanding of disease mechanisms. METHODS We devised a global method for the prediction of multiple disease modules simultaneously named GLADIATOR (GLobal Approach for DIsease AssociaTed mOdule Reconstruction). GLADIATOR relies on a gold-standard disease phenotypic similarity to obtain a pan-disease view of the underlying modules. To traverse the search space of potential disease modules, we applied a simulated annealing algorithm aimed at maximizing the correlation between module similarity and the gold-standard phenotypic similarity. Importantly, this optimization is employed over hundreds of diseases simultaneously. RESULTS GLADIATOR's predicted modules highly agree with current knowledge about disease-related proteins. Furthermore, the modules exhibit high coherence with respect to functional annotations and are highly enriched with known curated pathways, outperforming previous methods. Examination of the predicted proteins shared by similar diseases demonstrates the diverse role of these proteins in mediating related processes across similar diseases. Last, we provide a detailed analysis of the suggested molecular mechanism predicted by GLADIATOR for hyperinsulinism, suggesting novel proteins involved in its pathology. CONCLUSIONS GLADIATOR predicts disease modules by integrating knowledge of disease-related proteins and phenotypes across multiple diseases. 
The predicted modules are functionally coherent and are more in line with current biological knowledge compared to modules obtained using previous disease-centric methods. The source code for GLADIATOR can be downloaded from http://www.cs.tau.ac.il/~roded/GLADIATOR.zip .
Affiliation(s)
- Yael Silberberg
  - Department of Molecular Microbiology and Biotechnology, Tel Aviv University, Tel Aviv, Israel
- Martin Kupiec
  - Department of Molecular Microbiology and Biotechnology, Tel Aviv University, Tel Aviv, Israel
- Roded Sharan
  - The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
|
30
|
Kulmanov M, Hoehndorf R. Evaluating the effect of annotation size on measures of semantic similarity. J Biomed Semantics 2017; 8:7. [PMID: 28193260 PMCID: PMC5307803 DOI: 10.1186/s13326-017-0119-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 10/15/2016] [Accepted: 02/01/2017] [Indexed: 01/29/2023]
Abstract
Background Ontologies are widely used as metadata in biological and biomedical datasets. Measures of semantic similarity utilize ontologies to determine how similar two entities annotated with classes from ontologies are, and semantic similarity is increasingly applied in applications ranging from diagnosis of disease to investigation in gene networks and functions of gene products. Results Here, we analyze a large number of semantic similarity measures and the sensitivity of similarity values to the number of annotations of entities, difference in annotation size and to the depth or specificity of annotation classes. We find that most similarity measures are sensitive to the number of annotations of entities, difference in annotation size as well as to the depth of annotation classes; well-studied and richly annotated entities will usually show higher similarity than entities with only a few annotations even in the absence of any biological relation. Conclusions Our findings may have significant impact on the interpretation of results that rely on measures of semantic similarity, and we demonstrate how the sensitivity to annotation size can lead to a bias when using semantic similarity to predict protein-protein interactions.
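The reported sensitivity to annotation size can be illustrated with a small simulation. The sketch below is not the authors' experiment. It assumes a plain Jaccard overlap as a stand-in for the many measures the paper analyzes, draws annotations uniformly at random from a hypothetical universe of 200 terms, and ignores the ontology structure entirely. The function names are illustrative.

```python
import random

def jaccard(a, b):
    """Set-overlap similarity between two annotation sets."""
    return len(a & b) / len(a | b)

def mean_random_similarity(n_terms, n_annotations, trials=500, seed=0):
    """Average similarity of two unrelated entities with random annotations."""
    rng = random.Random(seed)
    universe = range(n_terms)
    total = 0.0
    for _ in range(trials):
        # Both entities are annotated independently, so any overlap is chance.
        a = set(rng.sample(universe, n_annotations))
        b = set(rng.sample(universe, n_annotations))
        total += jaccard(a, b)
    return total / trials

# Sparsely versus richly annotated entities drawn from the same term universe.
sparse = mean_random_similarity(n_terms=200, n_annotations=5)
rich = mean_random_similarity(n_terms=200, n_annotations=50)
```

Because the annotation sets are random, neither pair has any biological relation, yet the richly annotated pair overlaps more by chance alone. This is the kind of bias the abstract warns about when similarity is used to predict protein-protein interactions.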
Affiliation(s)
- Maxat Kulmanov
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
  - Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
  - Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
|
31
|
Abstract
Background Identifying the genes associated with human diseases is crucial for disease diagnosis and drug design. Computational approaches, especially network-based approaches, have recently been developed to identify disease-related genes effectively from existing biomedical networks. Meanwhile, advances in biotechnology enable researchers to produce multi-omics data, enriching our understanding of human diseases and revealing the complex relationships between genes and diseases. However, none of the existing computational approaches is able to integrate the huge amount of omics data into a weighted integrated network and utilize it to enhance disease-related gene discovery. Results We propose a new network-based disease gene prediction method called SLN-SRW (Simplified Laplacian Normalization-Supervised Random Walk) to generate and model the edge weights of a new biomedical network that integrates biomedical data from heterogeneous sources, thereby enhancing disease-related gene discovery. Conclusions The experimental results show that SLN-SRW significantly improves the performance of disease gene prediction on both the real and the synthetic data sets. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3263-4) contains supplementary material, which is available to authorized users.
|
32
|
Li X, Lin Y, Gu C. A network similarity integration method for predicting microRNA-disease associations. RSC Adv 2017. [DOI: 10.1039/c7ra05348g] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/05/2023]
Abstract
The NSIM integrates the disease similarity network, miRNA similarity network, and known miRNA-disease association network on the basis of cousin similarity to predict not only novel miRNA-disease associations but also isolated diseases.
Affiliation(s)
- Xiaoying Li
  - College of Information Science and Engineering, Hunan University, Changsha, China
- Yaping Lin
  - College of Information Science and Engineering, Hunan University, Changsha, China
- Changlong Gu
  - College of Information Science and Engineering, Hunan University, Changsha, China
|
33
|
Tian Z, Wang C, Guo M, Liu X, Teng Z. An improved method for functional similarity analysis of genes based on Gene Ontology. BMC SYSTEMS BIOLOGY 2016; 10:119. [PMID: 28155727 PMCID: PMC5259995 DOI: 10.1186/s12918-016-0359-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/11/2022]
Abstract
Background Measures of gene functional similarity are essential tools for gene clustering, gene function prediction, evaluation of protein-protein interaction, disease gene prioritization and other applications. In recent years, many gene functional similarity methods have been proposed based on the semantic similarity of GO terms. However, these leading approaches may make error-prone judgments, especially when they measure the specificity of GO terms as well as the IC of a term set. Therefore, how to estimate the gene functional similarity reliably is still a challenging problem. Results We propose WIS, an effective method to measure the gene functional similarity. First of all, WIS computes the IC of a term by employing its depth, the number of its ancestors as well as the topology of its descendants in the GO graph. Secondly, WIS calculates the IC of a term set by means of considering the weighted inherited semantics of terms. Finally, WIS estimates the gene functional similarity based on the IC overlap ratio of term sets. WIS is superior to some other representative measures on the experiments of functional classification of genes in a biological pathway, collaborative evaluation of GO-based semantic similarity measures, protein-protein interaction prediction and correlation with gene expression. Further analysis suggests that WIS fully takes into account the specificity of terms and the weighted inherited semantics of terms between GO terms. Conclusions The proposed WIS method is an effective and reliable way to compare gene function. The web service of WIS is freely available at http://nclab.hit.edu.cn/WIS/.
Affiliation(s)
- Zhen Tian
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Chunyu Wang
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Maozu Guo
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Xiaoyan Liu
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Zhixia Teng
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
  - Department of Information Management and Information System, Northeast Forestry University, Harbin, 150001, People's Republic of China
|
34
|
Tian Z, Wang C, Guo M, Liu X, Teng Z. SGFSC: speeding the gene functional similarity calculation based on hash tables. BMC Bioinformatics 2016; 17:445. [PMID: 27814675 PMCID: PMC5096311 DOI: 10.1186/s12859-016-1294-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/04/2016] [Accepted: 10/19/2016] [Indexed: 12/23/2022]
Abstract
Background In recent years, many measures of gene functional similarity have been proposed and widely used in all kinds of essential research. These methods are mainly divided into two categories: pairwise approaches and group-wise approaches. However, a common problem with these methods is their time consumption, especially when measuring the gene functional similarities of a large number of gene pairs. The problem of computational efficiency for pairwise approaches is even more prominent because they are dependent on the combination of semantic similarity. Therefore, the efficient measurement of gene functional similarity remains a challenging problem. Results To speed current gene functional similarity calculation methods, a novel two-step computing strategy is proposed: (1) establish a hash table for each method to store essential information obtained from the Gene Ontology (GO) graph and (2) measure gene functional similarity based on the corresponding hash table. There is no need to traverse the GO graph repeatedly for each method with the help of the hash table. The analysis of time complexity shows that the computational efficiency of these methods is significantly improved. We also implement a novel Speeding Gene Functional Similarity Calculation tool, namely SGFSC, which is bundled with seven typical measures using our proposed strategy. Further experiments show the great advantage of SGFSC in measuring gene functional similarity on the whole genomic scale. Conclusions The proposed strategy is successful in speeding current gene functional similarity calculation methods. SGFSC is an efficient tool that is freely available at http://nclab.hit.edu.cn/SGFSC. The source code of SGFSC can be downloaded from http://pan.baidu.com/s/1dFFmvpZ.
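The two-step strategy described in the abstract (build a hash table from the GO graph once, then answer similarity queries from the table alone) can be sketched as follows. This is a minimal illustration and not the SGFSC implementation. It assumes the table stores each term's ancestor set, and it uses a simple ancestor-overlap (Jaccard) score as a stand-in for the seven measures the tool bundles. The toy ontology and all function names are hypothetical.

```python
def build_ancestor_table(parents):
    """Step 1: traverse the GO-like DAG once, caching each term's ancestor set."""
    table = {}

    def ancestors(term):
        if term not in table:
            found = {term}  # a term is counted among its own ancestors here
            for parent in parents.get(term, ()):
                found |= ancestors(parent)
            table[term] = found
        return table[term]

    for term in parents:
        ancestors(term)
    return table

def term_similarity(table, t1, t2):
    """Step 2: answer queries from the table only, with no graph traversal."""
    a, b = table[t1], table[t2]
    return len(a & b) / len(a | b)

# Hypothetical toy ontology: child -> list of parents.
toy_parents = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}
table = build_ancestor_table(toy_parents)
sim_close = term_similarity(table, "dna_binding", "rna_binding")  # 0.5
sim_far = term_similarity(table, "dna_binding", "catalysis")      # 0.25
```

The cost of graph traversal is paid once in `build_ancestor_table`. After that, each query is a dictionary lookup plus set operations, which is the property the abstract credits for the speed-up on large numbers of gene pairs.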
Affiliation(s)
- Zhen Tian
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Maozu Guo
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Xiaoyan Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Zhixia Teng
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
  - Department of Information Management and Information System, Northeast Forestry University, Harbin, 150001, People's Republic of China
|
35
|
Li P, Nie Y, Yu J. Fusing literature and full network data improves disease similarity computation. BMC Bioinformatics 2016; 17:326. [PMID: 27578323 PMCID: PMC5006367 DOI: 10.1186/s12859-016-1205-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/12/2016] [Accepted: 08/24/2016] [Indexed: 01/01/2023]
Abstract
Background Identifying relatedness among diseases could help deepen understanding of the underlying pathogenic mechanisms of diseases and facilitate drug repositioning projects. A number of methods for computing disease similarity have been developed; however, none of them were designed to utilize information of the entire protein interaction network, using instead only those interactions involving disease-causing genes. Most previously published methods required gene-disease association data; unfortunately, many diseases still have very few or no associated genes, which has impeded broad adoption of those methods. In this study, we propose a new method (MedNetSim) for computing disease similarity by integrating medical literature and protein interaction network. MedNetSim consists of a network-based method (NetSim), which employs the entire protein interaction network, and a MEDLINE-based method (MedSim), which computes disease similarity by mining the biomedical literature. Results Among function-based methods, NetSim achieved the best performance. Its average AUC (area under the receiver operating characteristic curve) reached 95.2%. MedSim, whose performance was even comparable to some function-based methods, acquired the highest average AUC among all semantic-based methods. Integration of MedSim and NetSim (MedNetSim) further improved the average AUC to 96.4%. We further studied the effectiveness of different data sources. It was found that quality of protein interaction data was more important than its volume. On the contrary, higher volume of gene-disease association data was more beneficial, even with a lower reliability. Utilizing higher volume of disease-related gene data further improved the average AUC of MedNetSim and NetSim to 97.5% and 96.7%, respectively. Conclusions Integrating biomedical literature and protein interaction network can be an effective way to compute disease similarity. When disease-related gene data are insufficient, literature-based methods such as MedSim can be a great addition to function-based algorithms. It may be beneficial to steer more resources toward studying gene-disease associations and improving the quality of protein interaction data. Disease similarities can be computed using the proposed methods at http://www.digintelli.com:8000/.
Affiliation(s)
- Ping Li
  - State Key Laboratory of Biochemical Engineering, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Yaling Nie
  - State Key Laboratory of Biochemical Engineering, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Jingkai Yu
  - State Key Laboratory of Biochemical Engineering, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, 100190, China
|
36
|
Deng Y, Gao L, Guo X, Wang B. Integrating phenotypic features and tissue-specific information to prioritize disease genes. SCIENCE CHINA INFORMATION SCIENCES 2016; 59:070101. [DOI: 10.1007/s11432-016-5584-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
|
37
|
Al-Dalky R, Taha K, Al Homouz D, Qasaimeh M. Applying Monte Carlo Simulation to Biomedical Literature to Approximate Genetic Network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:494-504. [PMID: 26415184 DOI: 10.1109/tcbb.2015.2481399] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 06/05/2023]
Abstract
Biologists often need to know the set of genes associated with a given set of genes or a given disease. We propose in this paper a classifier system called Monte Carlo for Genetic Network (MCforGN) that can construct genetic networks, identify functionally related genes, and predict gene-disease associations. MCforGN identifies functionally related genes based on their co-occurrences in the abstracts of biomedical literature. For a given gene g, the system first extracts the set of genes found within the abstracts of biomedical literature associated with g. It then ranks these genes to determine the ones with high co-occurrences with g. It overcomes the limitations of current approaches that employ analytical deterministic algorithms by applying Monte Carlo Simulation to approximate genetic networks. It does so by conducting repeated random sampling to obtain numerical results and to optimize these results. Moreover, it analyzes results to obtain the probabilities of different genes' co-occurrences using series of statistical tests. MCforGN can detect gene-disease associations by employing a combination of centrality measures (to identify the central genes in disease-specific genetic networks) and Monte Carlo Simulation. MCforGN aims at enhancing state-of-the-art biological text mining by applying novel extraction techniques. We evaluated MCforGN by comparing it experimentally with nine approaches. Results showed marked improvement.
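The co-occurrence ranking and repeated random sampling mentioned in the abstract can be sketched in a few lines. This is only an illustration of the general idea, not MCforGN's actual procedure: the abstract gives no details of the sampling scheme or the statistical tests, so the resampling below (drawing abstracts with replacement and averaging) is an assumption. The corpus is hypothetical, with each abstract reduced to the set of gene names it mentions, and the function names are illustrative.

```python
import random
from collections import Counter

def cooccurrence_counts(abstracts, gene):
    """Count how many abstracts mentioning `gene` also mention each other gene."""
    counts = Counter()
    for genes in abstracts:
        if gene in genes:
            counts.update(g for g in genes if g != gene)
    return counts

def monte_carlo_frequency(abstracts, gene, samples=200, sample_size=4, seed=0):
    """Estimate co-occurrence frequencies by repeated random sampling of abstracts."""
    rng = random.Random(seed)
    relevant = [genes for genes in abstracts if gene in genes]
    totals = Counter()
    for _ in range(samples):
        # Draw abstracts with replacement and tally co-mentions in the draw.
        draw = [rng.choice(relevant) for _ in range(sample_size)]
        totals.update(cooccurrence_counts(draw, gene))
    return {g: totals[g] / (samples * sample_size) for g in totals}

# Hypothetical corpus: each abstract is the set of gene names it mentions.
toy_abstracts = [
    {"TP53", "MDM2"},
    {"TP53", "MDM2", "CDKN1A"},
    {"TP53", "CDKN1A"},
    {"TP53", "MDM2"},
    {"BRCA1", "RAD51"},
]
counts = cooccurrence_counts(toy_abstracts, "TP53")
ranked = [g for g, _ in counts.most_common()]  # ["MDM2", "CDKN1A"]
estimates = monte_carlo_frequency(toy_abstracts, "TP53")
```

On a corpus this small the exact counts are trivially available. The sampling step only becomes useful when the literature is too large to enumerate, which is the setting the abstract targets.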
|
38
|
Domeniconi G, Masseroli M, Moro G, Pinoli P. Cross-organism learning method to discover new gene functionalities. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2016; 126:20-34. [PMID: 26724853 DOI: 10.1016/j.cmpb.2015.12.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/04/2015] [Revised: 11/16/2015] [Accepted: 12/08/2015] [Indexed: 06/05/2023]
Abstract
BACKGROUND Knowledge of gene and protein functions is paramount for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. Analyses for biomedical knowledge discovery greatly benefit from the availability of gene and protein functional feature descriptions expressed through controlled terminologies and ontologies, i.e., of gene and protein biomedical controlled annotations. In the last years, several databases of such annotations have become available; yet, these valuable annotations are incomplete, include errors and only some of them represent highly reliable human curated information. Computational techniques able to reliably predict new gene or protein annotations with an associated likelihood value are thus paramount. METHODS Here, we propose a novel cross-organisms learning approach to reliably predict new functionalities for the genes of an organism based on the known controlled annotations of the genes of another, evolutionarily related and better studied, organism. We leverage a new representation of the annotation discovery problem and a random perturbation of the available controlled annotations to allow the application of supervised algorithms to predict with good accuracy unknown gene annotations. Taking advantage of the numerous gene annotations available for a well-studied organism, our cross-organisms learning method creates and trains better prediction models, which can then be applied to predict new gene annotations of a target organism. RESULTS We tested and compared our method with the equivalent single organism approach on different gene annotation datasets of five evolutionarily related organisms (Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum). 
Results show both the usefulness of the perturbation method of available annotations for better prediction model training and a great improvement of the cross-organism models with respect to the single-organism ones, without influence of the evolutionary distance between the considered organisms. The generated ranked lists of reliably predicted annotations, which describe novel gene functionalities and have an associated likelihood value, are very valuable both to complement available annotations, for better coverage in biomedical knowledge discovery analyses, and to quicken the annotation curation process, by focusing it on the prioritized novel annotations predicted.
Affiliation(s)
- Giacomo Domeniconi
  - DISI, Università degli Studi di Bologna, Via Venezia 52, 47521 Cesena, Italy
- Marco Masseroli
  - DEIB, Politecnico di Milano, Piazza L. Da Vinci 32, 20133 Milan, Italy
- Gianluca Moro
  - DISI, Università degli Studi di Bologna, Via Venezia 52, 47521 Cesena, Italy
- Pietro Pinoli
  - DEIB, Politecnico di Milano, Piazza L. Da Vinci 32, 20133 Milan, Italy
|
39
|
Investigating the impact human protein–protein interaction networks have on disease-gene analysis. INT J MACH LEARN CYB 2016. [DOI: 10.1007/s13042-016-0503-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
|
40
|
Rahmani H, Blockeel H, Bender A. Using a Human Drug Network for generating novel hypotheses about drugs. INTELL DATA ANAL 2016. [DOI: 10.3233/ida-150800] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Affiliation(s)
- Hossein Rahmani
  - School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
  - Department of Knowledge Engineering, Universiteit Maastricht, Maastricht, The Netherlands
- Hendrik Blockeel
  - Department of Computer Science, KU Leuven, Leuven, Belgium
  - Leiden Institute of Advanced Computer Science, Leiden University, CA Leiden, The Netherlands
- Andreas Bender
  - Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Cambridge, UK
|
41
|
Taha K, Yoo PD. Predicting the functions of a protein from its ability to associate with other molecules. BMC Bioinformatics 2016; 17:34. [PMID: 26767846 PMCID: PMC4714473 DOI: 10.1186/s12859-016-0882-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/12/2015] [Accepted: 01/05/2016] [Indexed: 11/25/2022]
Abstract
BACKGROUND All proteins associate with other molecules. These associated molecules are highly predictive of the potential functions of proteins. The association of a protein and a molecule can be determined from their co-occurrences in biomedical abstracts. Extensive semantically related co-occurrences of a protein's name and a molecule's name in the sentences of biomedical abstracts can be considered as indicative of the association between the protein and the molecule. Dependency parsers extract textual relations from a text by determining the grammatical relations between words in a sentence. They can be used for determining the textual relations between proteins and molecules. Despite their success, they may extract textual relations with low precision. This is because they do not consider the semantic relationships between terms in a sentence (i.e., they consider only the structural relationships between the terms). Moreover, they may not be well suited for complex sentences and for long-distance textual relations. RESULTS We introduce an information extraction system called PPFBM that predicts the functions of unannotated proteins from the molecules that associate with these proteins. PPFBM represents each protein by the other molecules that associate with it in the abstracts referenced in the protein's entries in reliable biological databases. It automatically extracts each co-occurrence of a protein-molecule pair that represents semantic relationship between the pair. Towards this, we present novel semantic rules that identify the semantic relationship between each co-occurrence of a protein-molecule pair using the syntactic structures of sentences and linguistics theories. PPFBM determines the functions of an un-annotated protein p as follows. First, it determines the set S r of annotated proteins that is semantically similar to p by matching the molecules representing p and the annotated proteins. 
Then, it assigns p the functional category FC if the significance of the frequency of occurrences of S r in abstracts associated with proteins annotated with FC is statistically significantly different from the significance of the frequency of occurrences of S r in abstracts associated with proteins annotated with all other functional categories. We evaluated the quality of PPFBM by comparing it experimentally with two other systems. Results showed marked improvement. CONCLUSIONS The experimental results demonstrated that PPFBM outperforms other systems that predict protein function from the textual information found within biomedical abstracts. This is because these systems do not consider the semantic relationships between terms in a sentence (i.e., they consider only the structural relationships between the terms). PPFBM's advantage over these systems increases steadily as the number of training proteins increases; that is, its predictions become more accurate as the training set grows. This is because each new set of test proteins is added to the current set of training proteins. A demo of PPFBM that annotates each input yeast protein (from SGD, the Saccharomyces Genome Database, available at http://www.yeastgenome.org/download-data/curation) with the functions of Gene Ontology terms is available at http://ecesrvr.kustar.ac.ae:8080/PPFBM/ (see the Appendix for more details about the demo).
Affiliation(s)
- Kamal Taha
  - Department of Electrical and Computer Engineering, Khalifa University, Abu Dhabi, United Arab Emirates
- Paul D Yoo
  - Faculty of Science and Technology, Bournemouth University, Bournemouth, UK
|
42
|
Faisal FE, Meng L, Crawford J, Milenković T. The post-genomic era of biological network alignment. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2015; 2015:3. [PMID: 28194172 PMCID: PMC5270500 DOI: 10.1186/s13637-015-0022-9] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 01/21/2015] [Accepted: 05/18/2015] [Indexed: 11/10/2022]
Abstract
Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by genomic sequence alignment. Here, we review computational challenges behind the network alignment problem, existing approaches for solving the problem, ways of evaluating their alignment quality, and the approaches' biomedical applications. We discuss recent innovative efforts of improving the existing view of network alignment. We conclude with open research questions in comparative biological network research that could further our understanding of principles of life, evolution, disease, and therapeutics.
Affiliation(s)
- Fazle E Faisal
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
  - Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
  - ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
- Lei Meng
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Joseph Crawford
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
  - Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
  - ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
- Tijana Milenković
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
  - Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
  - ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
|
43
|
Gene Prioritization by Compressive Data Fusion and Chaining. PLoS Comput Biol 2015; 11:e1004552. [PMID: 26465776 PMCID: PMC4605714 DOI: 10.1371/journal.pcbi.1004552] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 04/29/2015] [Accepted: 09/12/2015] [Indexed: 01/17/2023] Open
Abstract
Data integration procedures combine heterogeneous data sets into predictive models, but they are limited to data explicitly related to the target object type, such as genes. Collage is a new data fusion approach to gene prioritization. It considers data sets of various association levels with the prediction task, utilizes collective matrix factorization to compress the data, and chaining to relate different object types contained in a data compendium. Collage prioritizes genes based on their similarity to several seed genes. We tested Collage by prioritizing bacterial response genes in Dictyostelium as a novel model system for prokaryote-eukaryote interactions. Using 4 seed genes and 14 data sets, only one of which was directly related to the bacterial response, Collage proposed 8 candidate genes that were readily validated as necessary for the response of Dictyostelium to Gram-negative bacteria. These findings establish Collage as a method for inferring biological knowledge from the integration of heterogeneous and coarsely related data sets.
|
44
|
Yang H, Robinson PN, Wang K. Phenolyzer: phenotype-based prioritization of candidate genes for human diseases. Nat Methods 2015; 12:841-3. [PMID: 26192085 PMCID: PMC4718403 DOI: 10.1038/nmeth.3484] [Citation(s) in RCA: 282] [Impact Index Per Article: 28.2] [Received: 02/20/2015] [Accepted: 05/18/2015] [Indexed: 12/21/2022]
Abstract
Prior biological knowledge and phenotype information may help to identify disease genes from human whole-genome and whole-exome sequencing studies. We developed Phenolyzer (http://phenolyzer.usc.edu), a tool that uses prior information to implicate genes involved in diseases. Phenolyzer exhibits superior performance over competing methods for prioritizing Mendelian and complex disease genes, based on disease or phenotype terms entered as free text.
Affiliation(s)
- Hui Yang
  - Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California, USA
  - Neuroscience Graduate Program, University of Southern California, Los Angeles, California, USA
- Peter N Robinson
  - Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Kai Wang
  - Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California, USA
  - Department of Psychiatry, University of Southern California, Los Angeles, California, USA
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, USA
|
45
|
Browne F, Wang H, Zheng H. A computational framework for the prioritization of disease-gene candidates. BMC Genomics 2015; 16 Suppl 9:S2. [PMID: 26330267 PMCID: PMC4547404 DOI: 10.1186/1471-2164-16-s9-s2] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 12/18/2022] Open
Abstract
Background The identification of genes and uncovering the role they play in diseases is an important and complex challenge. Genome-wide linkage and association studies have made advancements in identifying genetic variants that underpin human disease. An important challenge now is to identify meaningful disease-associated genes from a long list of candidate genes implicated by these analyses. The application of gene prioritization can enhance our understanding of disease mechanisms and aid in the discovery of drug targets. The integration of protein-protein interaction networks along with disease datasets and contextual information is an important tool in unraveling the molecular basis of diseases. Results In this paper we propose a computational pipeline for the prioritization of disease-gene candidates. Diverse heterogeneous data are integrated, including gene expression, protein-protein interaction networks, ontology-based similarity, topological measures, and tissue-specific data. The pipeline was applied to prioritize Alzheimer's Disease (AD) genes, whereby a list of 32 prioritized genes was generated. This approach correctly identified key AD susceptibility genes: PSEN1 and TRAF1. Biological process enrichment analysis revealed that the prioritized genes are involved in processes modulated in AD pathogenesis, including regulation of neurogenesis and generation of neurons. Relatively high predictive performance (AUC: 0.70) was observed when classifying AD and normal gene expression profiles from individuals using leave-one-out cross validation. Conclusions This work provides a foundation for future investigation of diverse heterogeneous data integration for disease-gene prioritization.
|
46
|
Jeong JC, Chen X. A New Semantic Functional Similarity over Gene Ontology. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:322-334. [PMID: 26357220 DOI: 10.1109/tcbb.2014.2343963] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/05/2023]
Abstract
Identifying functionally similar or closely related genes and gene products has a significant impact on biological and clinical studies as well as drug discovery. In this paper, we propose an effective and practically useful method for measuring both gene and gene product similarity by integrating the topology of the Gene Ontology, known functional domains and their functional annotations. The proposed method is comprehensively evaluated through statistical analysis of the similarities derived from sequence, structure and phylogenetic profiles, and clustering analysis of disease gene clusters. Our results show that the proposed method clearly outperforms other conventional methods. Furthermore, literature analysis also reveals that the proposed method is both statistically and biologically promising for identifying functionally similar genes or gene products. In particular, we demonstrate that the proposed functional similarity metric is capable of discovering new disease-related genes or gene products.
|
47
|
Zhang SB, Lai JH. Semantic similarity measurement between gene ontology terms based on exclusively inherited shared information. Gene 2015; 558:108-17. [DOI: 10.1016/j.gene.2014.12.062] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/16/2014] [Revised: 12/15/2014] [Accepted: 12/24/2014] [Indexed: 11/25/2022]
|
48
|
Na D, Son H, Gsponer J. Categorizer: a tool to categorize genes into user-defined biological groups based on semantic similarity. BMC Genomics 2014; 15:1091. [PMID: 25495442 PMCID: PMC4298957 DOI: 10.1186/1471-2164-15-1091] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 06/02/2014] [Accepted: 12/04/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Commonalities between large sets of genes obtained from high-throughput experiments are often identified by searching for enrichments of genes with the same Gene Ontology (GO) annotations. The GO analysis tools used for these enrichment analyses assume that GO terms are independent and that the semantic distances between all parent-child terms are identical, which is not true in a biological sense. In addition, these tools output lists of often redundant or overly specific GO terms, which are difficult to interpret in the context of the biological question investigated by the user. Therefore, there is a demand for a robust and reliable method for gene categorization and enrichment analysis. RESULTS We have developed Categorizer, a tool that classifies genes into user-defined groups (categories) and calculates p-values for the enrichment of the categories. Categorizer identifies the biologically best-fit category for each gene by taking advantage of a specialized semantic similarity measure for GO terms. We demonstrate that Categorizer provides improved categorization and enrichment results of genetic modifiers of Huntington's disease compared to a classical GO Slim-based approach or categorizations using other semantic similarity measures. CONCLUSION Categorizer enables more accurate categorizations of genes than currently available methods. This new tool will help experimental and computational biologists analyze genomic and proteomic data according to their specific needs in a more reliable manner.
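The enrichment step mentioned in the Categorizer abstract can be illustrated with a short sketch. The abstract does not say which statistical test the tool uses, so the one-sided hypergeometric test below is an assumption (it is a common choice for category enrichment), and the function name, parameter names, and toy counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, category_total, sample, category_hits):
    """Upper-tail hypergeometric p-value: P(X >= category_hits).

    population      -- number of genes in the background set
    category_total  -- background genes assigned to the category
    sample          -- number of genes in the list under study
    category_hits   -- genes in the list assigned to the category
    NOTE: assumed test; the abstract does not name Categorizer's statistic.
    """
    upper = min(category_total, sample)
    tail = sum(
        comb(category_total, k) * comb(population - category_total, sample - k)
        for k in range(category_hits, upper + 1)
    )
    return tail / comb(population, sample)


# Toy counts (made up): 20 background genes, 5 in the category,
# a list of 4 genes of which 2 fall in the category.
p = enrichment_p_value(population=20, category_total=5, sample=4, category_hits=2)
print(round(p, 4))
```

A small p-value would suggest the category is over-represented in the gene list relative to the background; with these toy counts the overlap is unremarkable.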
Affiliation(s)
- Jörg Gsponer
  - Department of Biochemistry and Molecular Biology, Centre for High-throughput Biology, University of British Columbia, 2125 East Mall, Vancouver, BC V6T 1Z4, Canada.
|
49
|
Taha K. RGFinder: a system for determining semantically related genes using GO graph minimum spanning tree. IEEE Trans Nanobioscience 2014; 14:24-37. [PMID: 25343765 DOI: 10.1109/tnb.2014.2363295] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
Biologists often need to know the set S' of genes that are the most functionally and semantically related to a given set S of genes. For determining the set S', most current gene similarity measures overlook the structural dependencies among the Gene Ontology (GO) terms annotating the set S, which may lead to erroneous results. We introduce in this paper a biological search engine called RGFinder that considers the structural dependencies among GO terms by employing the concept of existence dependency. RGFinder assigns a weight to each edge in the GO graph to represent the degree of relatedness between the two GO terms connected by the edge. The value of the weight is determined based on the following factors: 1) the type of the relation represented by the edge (e.g., an "is-a" relation is assigned a different weight than a "part-of" relation), 2) the functional relationship between the two GO terms connected by the edge, and 3) the string-substring relationship between the names of the two GO terms connected by the edge. RGFinder then constructs a minimum spanning tree of the GO graph based on these weights. In the framework of RGFinder, the set S' is annotated to the GO terms located at the lowest convergences of the subtree of the minimum spanning tree that passes through the GO terms annotating set S. We evaluated RGFinder experimentally and compared it with four gene set enrichment systems. Results showed marked improvement.
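The weighting and spanning-tree steps described in the RGFinder abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: it weights edges by relation type alone (the first of the three factors listed), the numeric weights are invented, the term identifiers are placeholders rather than real GO IDs, and Kruskal's algorithm is used because the abstract does not say which spanning-tree algorithm RGFinder employs.

```python
# Hypothetical weights per relation type (lower = more closely related).
RELATION_WEIGHT = {"is_a": 1.0, "part_of": 2.0}


def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm over (term_a, term_b, relation) edges."""
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    # Visit edges from lightest to heaviest; keep those joining two components.
    for a, b, rel in sorted(edges, key=lambda e: RELATION_WEIGHT[e[2]]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b, rel))
    return tree


# Toy graph with placeholder term names (not real GO identifiers).
nodes = ["term_A", "term_B", "term_C", "term_D"]
edges = [
    ("term_B", "term_A", "is_a"),
    ("term_C", "term_A", "part_of"),
    ("term_C", "term_B", "is_a"),
    ("term_D", "term_C", "is_a"),
]
mst = minimum_spanning_tree(nodes, edges)
for edge in mst:
    print(edge)
```

In this toy graph the heavier "part_of" edge is left out because the lighter "is_a" edges already connect every term.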
|
50
|
Luo Y, Riedlinger G, Szolovits P. Text mining in cancer gene and pathway prioritization. Cancer Inform 2014; 13:69-79. [PMID: 25392685 PMCID: PMC4216063 DOI: 10.4137/cin.s13874] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 04/01/2014] [Revised: 05/18/2014] [Accepted: 05/18/2014] [Indexed: 12/18/2022] Open
Abstract
Prioritization of cancer-implicated genes has received growing attention as an effective way to reduce wet-lab cost through computational analysis that ranks candidate genes according to the likelihood that experimental verification will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expression, function annotations, gene regulation, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing pathways may be more desirable than prioritizing only genes.
Affiliation(s)
- Yuan Luo
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Gregory Riedlinger
  - Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
- Peter Szolovits
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
|