1
|
Shah E, Maji P. Scalable Non-Linear Graph Fusion for Prioritizing Cancer-Causing Genes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1130-1143. [PMID: 32966220 DOI: 10.1109/tcbb.2020.3026219] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
In the past few decades, both gene expression data and protein-protein interaction (PPI) networks have been extensively studied, due to their ability to depict important characteristics of disease-associated genes. In this regard, the paper presents a new gene prioritization algorithm to identify and prioritize cancer-causing genes by judiciously integrating the complementary information obtained from the two data sources. The proposed algorithm selects disease-causing genes by maximizing the importance of the selected genes and the functional similarity among them. A new quantitative index is introduced to evaluate the importance of a gene. It considers whether a gene exhibits a differential expression pattern across sick and healthy individuals and has strong connectivity in the PPI network, both of which are important characteristics of a potential biomarker. As disease-associated genes are expected to have similar expression profiles and topological structures, a scalable non-linear graph fusion technique, termed ScaNGraF, is proposed to learn a disease-dependent functional similarity network from the co-expression and common-neighbor-based similarity networks. ScaNGraF, which is based on a message passing algorithm, efficiently combines the shared and complementary information provided by different data sources at significantly lower computational cost. A new measure, termed DiCoIN, is introduced to evaluate the quality of a learned affinity network. The performance of the proposed graph fusion technique and gene selection algorithm is extensively compared with that of some existing methods, using several cancer data sets.
|
2
|
Mahapatra S, Bhuyan R, Das J, Swarnkar T. Integrated multiplex network based approach for hub gene identification in oral cancer. Heliyon 2021; 7:e07418. [PMID: 34258466 PMCID: PMC8258848 DOI: 10.1016/j.heliyon.2021.e07418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2020] [Revised: 01/27/2021] [Accepted: 06/23/2021] [Indexed: 02/01/2023] Open
Abstract
Background: The incidence of Oral Cancer (OC) is high in Asian countries, and the disease often goes undetected at its early stage. The study of genetics, especially genetic networks, holds great promise in this endeavor. Hub genes in a genetic network are prominent in regulating the whole network structure of genes. Thus, identification of such genes related to specific cancer types can help in reducing the gap in OC prognosis. Methods: Traditional network biology is unable to decipher the inter-dependencies within and across diverse biological networks. A multiplex network provides a powerful representation of such systems and encodes much richer information than isolated networks. In this work, we focused on the entire multiplex structure of the genetic network, integrating the gene expression profile and DNA methylation profile for OC. Further, hub genes were identified by considering their connectivity in the multiplex structure as well as in the respective protein-protein interaction (PPI) network. Results: 46 hub genes were inferred by our approach with a high prediction accuracy (96%), an outstanding Matthews correlation coefficient (93%) and significant biological implications. Among them, the genes PIK3CG, PIK3R5, MYH7, CDC20 and CCL4 were differentially expressed and predominantly enriched in molecular cascades specific to OC. Conclusions: The hub genes identified in this work carry ontological signatures specific to cancer, which may facilitate improved understanding of the tumorigenesis process and the underlying molecular events. The results indicate the effectiveness of our integrated multiplex network approach for hub gene identification. This work opens an innovative research route for multi-omics biological data analysis.
Affiliation(s)
- S. Mahapatra
- Department of Computer Application, Siksha O Anusandhan Deemed to be University, Bhubaneswar, India
- R. Bhuyan
- Department of Oral Pathology & Microbiology, Siksha O Anusandhan Deemed to be University, Bhubaneswar, India
- J. Das
- Centre for Genomics & Biomedical Informatics, Siksha O Anusandhan Deemed to be University, Bhubaneswar, India
- T. Swarnkar
- Department of Computer Application, Siksha O Anusandhan Deemed to be University, Bhubaneswar, India
|
3
|
Das D, Krishnan SR, Roy A, Bulusu G. A network-based approach reveals novel invasion and Maurer's clefts-related proteins in Plasmodium falciparum. Mol Omics 2019; 15:431-441. [PMID: 31631203 DOI: 10.1039/c9mo00124g] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/21/2023]
Abstract
Malaria continues to be a major concern in developing countries despite continuous efforts to find a cure for the disease. Understanding the pathogenesis mechanism is necessary to identify more effective drug targets against malaria. Many years of experimental research have generated a large amount of data for the malarial parasite, Plasmodium falciparum. These data are useful to understand the importance of certain parasite proteins, but it often remains unclear how these proteins come together, interact with other proteins and carry out their function. Identification of all proteins involved in pathogenesis is an important step towards understanding the molecular mechanism of pathogenesis. In this study, dynamic stage-specific protein-protein interaction networks were created based on gene expression data during the parasite's intra-erythrocytic stages and static protein-protein interaction data. Using previously known proteins of a biological event as seed proteins, the random walk with restart (RWR) method was used on the dynamic protein-protein interaction networks to identify novel proteins related to that event. Two screening procedures namely, permutation test and GO enrichment test were performed to increase the reliability of the RWR predictions. The proposed method was first validated on Plasmodium falciparum proteins related to invasion, where it could reproduce the existing knowledge from a small set of seed proteins. It was then used to identify novel Maurer's clefts resident proteins, where it could identify 152 parasite proteins. We show that the current approach can annotate conserved proteins with unknown function. The predicted proteins can help build a mechanistic model for disease pathogenesis, which will be useful in identifying new drug targets.
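The permutation screening mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the test was set up, so everything here is an assumption: the scoring function `seed_neighbor_count` is a hypothetical stand-in for an RWR score, the number of permutations is arbitrary, and the `(hits + 1) / (n_perm + 1)` estimate is one common convention for an empirical p-value.

```python
import random


def seed_neighbor_count(adj, gene, seeds):
    """Toy relevance score (hypothetical): number of seeds adjacent to `gene`."""
    return sum(1 for neighbour in adj.get(gene, ()) if neighbour in seeds)


def permutation_p_value(adj, gene, seeds, n_perm=1000, rng=None):
    """Empirical p-value of a gene's score against random seed sets.

    adj   : dict mapping each node to a set of neighbouring nodes
    seeds : set of known (seed) proteins
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    observed = seed_neighbor_count(adj, gene, seeds)
    pool = [node for node in adj if node != gene]
    hits = 0
    for _ in range(n_perm):
        # Draw a random seed set of the same size as the real one.
        random_seeds = set(rng.sample(pool, len(seeds)))
        if seed_neighbor_count(adj, gene, random_seeds) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A candidate whose score is rarely matched under random seed sets gets a small p-value and would be retained; a candidate that scores just as well with random seeds would be discarded.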
Affiliation(s)
- Dibyajyoti Das
- TCS Innovation Labs - Hyderabad (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad, India.
|
4
|
Inferring novel genes related to oral cancer with a network embedding method and one-class learning algorithms. Gene Ther 2019; 26:465-478. [PMID: 31455874 DOI: 10.1038/s41434-019-0099-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/28/2019] [Revised: 06/18/2019] [Accepted: 07/15/2019] [Indexed: 12/14/2022]
Abstract
Oral cancer (OC) is one of the most common cancers threatening human lives. However, OC pathogenesis has yet to be fully uncovered, and thus designing effective treatments remains difficult. Identifying genes related to OC is an important step toward this goal. In this study, we proposed three computational models for inferring novel OC-related genes. In contrast to previously proposed computational methods, which lacked learning procedures, each proposed model adopted a one-class learning algorithm, which can provide deep insight into the features of validated OC-related genes. A network embedding algorithm (i.e., node2vec) was applied to the protein-protein interaction network to produce representations of genes. The features of the OC-related genes were used to train the one-class algorithm, and the performance of the final inference model was improved through a feature selection procedure. Then, candidate genes were produced by applying the trained inference model to other genes. Three tests were performed to screen out the important candidate genes. Accordingly, we obtained three inferred gene sets, any two of which were different. The inferred genes were also different from previously reported genes, and some of them have been included in the public Oral Cancer Gene Database. Finally, we analyzed several inferred genes to confirm whether they are novel OC-related genes.
|
5
|
Lu S, Zhu ZG, Lu WC. Inferring novel genes related to colorectal cancer via random walk with restart algorithm. Gene Ther 2019; 26:373-385. [PMID: 31308477 DOI: 10.1038/s41434-019-0090-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/25/2018] [Revised: 05/20/2019] [Accepted: 06/11/2019] [Indexed: 12/12/2022]
Abstract
Colorectal cancer (CRC) is the third most common type of cancer. In recent decades, genomic analysis has played an increasingly important role in understanding the molecular mechanisms of CRC. However, its pathogenesis has not been fully uncovered. Identifying CRC-related genes as completely as possible is an important way to investigate its pathogenesis. Therefore, we proposed a new computational method for the identification of novel CRC-associated genes. The proposed method is based on existing proven CRC-associated genes, human protein-protein interaction networks, and the random walk with restart algorithm. The utility of the method is indicated by comparing it to methods based on guilt-by-association or the shortest path algorithm. Using the proposed method, we successfully identified 298 novel CRC-associated genes. Previous studies have validated the involvement of the majority of these 298 novel genes in CRC-associated biological processes, thus suggesting the efficacy and accuracy of our method.
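For readers unfamiliar with random walk with restart (RWR), the following is a minimal sketch of the generic iteration p ← r·p0 + (1 − r)·W·p on a weighted, undirected network. It is not the authors' implementation: the restart probability, the tolerance, and the dict-of-dicts network layout are assumptions chosen for illustration, and the sketch assumes every node has at least one neighbour.

```python
def random_walk_with_restart(adj, seeds, restart=0.8, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of each node.

    adj     : dict node -> dict neighbour -> positive edge weight (symmetric)
    seeds   : iterable of seed nodes (known disease genes)
    restart : probability of jumping back to the seed set (assumed value)
    """
    nodes = list(adj)
    seeds = [s for s in seeds if s in adj]
    if not seeds:
        raise ValueError("no seed node is present in the network")
    strength = {u: sum(adj[u].values()) for u in nodes}
    # Initial vector: probability mass spread uniformly over the seeds.
    p0 = {u: 0.0 for u in nodes}
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            if strength[u] == 0:
                continue  # isolated node: nothing to propagate
            share = (1.0 - restart) * p[u] / strength[u]
            for v, w in adj[u].items():
                nxt[v] += share * w  # move along edges in proportion to weight
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p
```

Nodes that are not seeds can then be ranked by their probability; higher values indicate closer network proximity to the seed set.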
Affiliation(s)
- Sheng Lu
- Department of General Surgery, Rui Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai Institute of Digestive Surgery, Shanghai, 200025, China
- Zheng-Gang Zhu
- Department of General Surgery, Rui Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai Institute of Digestive Surgery, Shanghai, 200025, China
- Wen-Cong Lu
- Department of Chemistry, College of Sciences, Shanghai University, Shanghai, 200444, China.
|
6
|
Sheng M, Dong Z, Xie Y. Identification of tumor-educated platelet biomarkers of non-small-cell lung cancer. Onco Targets Ther 2018; 11:8143-8151. [PMID: 30532555 PMCID: PMC6241732 DOI: 10.2147/ott.s177384] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Lung cancer is a severe cancer with a high death rate. The 5-year survival rate for stage III lung cancer is much lower than stage I. Early detection and intervention of lung cancer patients can significantly increase their survival time. However, conventional lung cancer-screening methods, such as chest X-rays, sputum cytology, positron-emission tomography (PET), low-dose computed tomography (CT), magnetic resonance imaging, and gene-mutation, -methylation, and -expression biomarkers of lung tissue, are invasive, radiational, or expensive. Liquid biopsy is non-invasive and does little harm to the body. It can reflect early-stage dysfunctions of tumorigenesis and enable early detection and intervention. METHODS In this study, we analyzed RNA-sequencing data of tumor-educated platelets (TEPs) in 402 non-small-cell lung cancer (NSCLC) patients and 231 healthy controls. A total of 48 biomarker genes were selected with advanced minimal-redundancy, maximal-relevance, and incremental feature-selection (IFS) methods. RESULTS A support vector-machine (SVM) classifier based on the 48 biomarker genes accurately predicted NSCLC with leave-one-out cross-validation (LOOCV) sensitivity, specificity, accuracy, and Matthews correlation coefficients of 0.925, 0.827, 0.889, and 0.760, respectively. Network analysis of the 48 genes revealed that the WASF1 actin cytoskeleton module, PRKAB2 kinase module, RSRC1 ribosomal protein module, PDHB carbohydrate-metabolism module, and three intermodule hubs (TPM2, MYL9, and PPP1R12C) may play important roles in NSCLC tumorigenesis and progression. CONCLUSION The 48-gene TEP liquid-biopsy biomarkers will facilitate early screening of NSCLC and prolong the survival of cancer patients.
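The four reported performance figures are linked through the confusion matrix, which the following sketch makes explicit. The abstract does not give the confusion-matrix counts; the values 372 true positives and 191 true negatives below are an assumption, back-calculated by rounding 0.925 × 402 and 0.827 × 231, and should not be read as figures from the paper.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Assumed counts (not from the paper): 402 patients, 231 controls.
tp, tn = 372, 191
sens, spec, acc, mcc = binary_metrics(tp, 402 - tp, tn, 231 - tn)
print(round(sens, 3), round(spec, 3), round(acc, 3), round(mcc, 3))
```

Under these assumed counts the four metrics round to the values quoted in the abstract, which suggests the reported figures are mutually consistent.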
Affiliation(s)
- Meiling Sheng
- Department of Respiration, Jinhua People's Hospital, Jinhua, Zhejiang 321000, China
- Zhaohui Dong
- Department of Intensive Care Unit, First Hospital of Huzhou, First Affiliated Hospital of Huzhou University, Huzhou, Zhejiang 313000, China
- Yanping Xie
- Department of Respiratory Medicine, First Hospital of Huzhou, First Affiliated Hospital of Huzhou University, Huzhou, Zhejiang 313000, China
|
7
|
Lu S, Zhao K, Wang X, Liu H, Ainiwaer X, Xu Y, Ye M. Use of Laplacian Heat Diffusion Algorithm to Infer Novel Genes With Functions Related to Uveitis. Front Genet 2018; 9:425. [PMID: 30349554 PMCID: PMC6186792 DOI: 10.3389/fgene.2018.00425] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/03/2018] [Accepted: 09/10/2018] [Indexed: 12/17/2022] Open
Abstract
Uveitis is the inflammation of the uvea and is a serious eye disease that can cause blindness in middle-aged and young people. However, the pathogenesis of this disease has not been fully uncovered, which makes it difficult to design effective treatments. Comprehensively identifying the genes related to this disease can help improve and accelerate the understanding of uveitis. In this study, a new computational method was developed to infer potential related genes based on validated ones. We employed a large protein–protein interaction network reported in STRING, to which the Laplacian heat diffusion algorithm was applied using validated genes as seed nodes. Except for the validated ones, all genes in the network were filtered by three tests, namely, permutation, association, and function tests, which evaluated the genes based on their specialties and associations with uveitis. Results indicated that 59 inferred genes were obtained, several of which were confirmed to be highly related to uveitis by literature review. In addition, the inferred genes were compared with those reported in a previous study, indicating that our reported genes are necessary supplements.
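As a rough illustration of Laplacian heat diffusion, the sketch below integrates dh/dt = −L·h with L = D − A using small explicit Euler steps, starting with one unit of heat on each seed. This is a numerical approximation written for clarity, not the authors' implementation; the diffusion time `t`, the step count, and the network layout are assumptions, and the step size must stay small relative to the largest node degree for the scheme to remain stable.

```python
def heat_diffusion(adj, seeds, t=0.5, steps=1000):
    """Approximate the heat on each node after diffusing for time `t`.

    adj   : dict node -> dict neighbour -> edge weight (symmetric)
    seeds : set of seed nodes, each starting with one unit of heat
    """
    nodes = list(adj)
    heat = {u: (1.0 if u in seeds else 0.0) for u in nodes}
    dt = t / steps
    for _ in range(steps):
        # (-L h)[u] = sum over neighbours v of w(u, v) * (h[v] - h[u])
        flow = {
            u: sum(w * (heat[v] - heat[u]) for v, w in adj[u].items())
            for u in nodes
        }
        heat = {u: heat[u] + dt * flow[u] for u in nodes}
    return heat
```

Because the Laplacian of a symmetric network conserves total heat, the values can be compared directly across nodes; non-seed genes that end up with the most heat are the natural candidates for the follow-up tests.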
Affiliation(s)
- Shiheng Lu
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Ke Zhao
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Xuefei Wang
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Hui Liu
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Xiamuxiya Ainiwaer
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Yan Xu
- School of Life Sciences, Shanghai University, Shanghai, China
- Min Ye
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
|
8
|
Chen L, Zhang YH, Zhang Z, Huang T, Cai YD. Inferring Novel Tumor Suppressor Genes with a Protein-Protein Interaction Network and Network Diffusion Algorithms. MOLECULAR THERAPY-METHODS & CLINICAL DEVELOPMENT 2018; 10:57-67. [PMID: 30069494 PMCID: PMC6068090 DOI: 10.1016/j.omtm.2018.06.007] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 03/04/2018] [Accepted: 06/19/2018] [Indexed: 02/07/2023]
Abstract
Extensive studies on tumor suppressor genes (TSGs) are helpful to understand the pathogenesis of cancer and design effective treatments. However, identifying TSGs using traditional experiments is quite difficult and time consuming. Developing computational methods to identify possible TSGs is an alternative way. In this study, we proposed two computational methods that integrated two network diffusion algorithms, including Laplacian heat diffusion (LHD) and random walk with restart (RWR), to search possible genes in the whole network. These two computational methods were LHD-based and RWR-based methods. To increase the reliability of the putative genes, three strict screening tests followed to filter genes obtained by these two algorithms. After comparing the putative genes obtained by the two methods, we designated twelve genes (e.g., MAP3K10, RND1, and OTX2) as common genes, 29 genes (e.g., RFC2 and GUCY2F) as genes that were identified only by the LHD-based method, and 128 genes (e.g., SNAI2 and FGF4) as genes that were inferred only by the RWR-based method. Some obtained genes can be confirmed as novel TSGs according to recent publications, suggesting the utility of our two proposed methods. In addition, the reported genes in this study were quite different from those reported in a previous one.
Affiliation(s)
- Lei Chen
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People’s Republic of China
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People’s Republic of China
- Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People’s Republic of China
- Zhenghua Zhang
- Department of Clinical Oncology, Jing’an District Centre Hospital of Shanghai (Huashan Hospital Fudan University Jing’An Branch), Shanghai 200040, People’s Republic of China
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People’s Republic of China
- Corresponding author: Tao Huang, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People’s Republic of China.
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, People’s Republic of China
- Corresponding author: Yu-Dong Cai, School of Life Sciences, Shanghai University, Shanghai 200444, People’s Republic of China.
|
9
|
Computational Approach to Investigating Key GO Terms and KEGG Pathways Associated with CNV. BIOMED RESEARCH INTERNATIONAL 2018; 2018:8406857. [PMID: 29850576 PMCID: PMC5925134 DOI: 10.1155/2018/8406857] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/26/2017] [Revised: 02/28/2018] [Accepted: 03/06/2018] [Indexed: 12/25/2022]
Abstract
Choroidal neovascularization (CNV) is a severe eye disease that leads to blindness, especially in the elderly population. Various endogenous and exogenous regulatory factors promote its pathogenesis. However, the detailed molecular biological mechanisms of CNV have not been fully revealed. In this study, by using advanced computational tools, a number of key gene ontology (GO) terms and KEGG pathways were selected for CNV. A total of 29 validated genes associated with CNV and 17,639 nonvalidated genes were encoded based on the features derived from the GO terms and KEGG pathways by using the enrichment theory. The widely accepted feature selection method, maximum relevance and minimum redundancy (mRMR), was applied to analyze and rank the features. An extensive literature review of the top 45 ranked features was conducted to confirm their close associations with CNV. Identifying the molecular biological mechanisms of CNV as described by the GO terms and KEGG pathways may contribute to improving the understanding of the pathogenesis of CNV.
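The mRMR ranking step can be sketched as a greedy loop that, at each round, picks the feature with the largest relevance to the class label minus its average redundancy with the features already chosen. This is one common formulation (the difference form, with mutual information on discrete values); the abstract does not state which variant or estimator was used, so the function names and the discretised inputs are assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def mrmr_rank(features, target):
    """Rank features greedily by relevance minus mean redundancy.

    features : dict name -> list of discrete values (one per sample)
    target   : list of class labels (one per sample)
    """
    relevance = {f: mutual_information(v, target) for f, v in features.items()}
    selected, remaining = [], set(features)
    while remaining:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                mutual_information(features[f], features[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this scheme an exact copy of an already selected feature is pushed toward the end of the ranking, because its redundancy cancels its relevance.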
|
11
|
Zou Q, He W. Special Protein Molecules Computational Identification. Int J Mol Sci 2018; 19:ijms19020536. [PMID: 29439426 PMCID: PMC5855758 DOI: 10.3390/ijms19020536] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2018] [Revised: 02/02/2018] [Accepted: 02/10/2018] [Indexed: 01/29/2023] Open
Abstract
Computational identification of special protein molecules is a key issue in understanding protein function. It can guide molecular experiments and help to save costs. I assessed 18 papers published in the special issue of Int. J. Mol. Sci., and also discussed the related works. The computational methods employed in this special issue focused on machine learning, network analysis, and molecular docking. New methods and new topics were also proposed. In addition, there were several wet experiments, with proven results showing promise. I hope our special issue will help research on the identification of protein molecules.
Affiliation(s)
- Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin 300354, China.
- Wenying He
- School of Computer Science and Technology, Tianjin University, Tianjin 300354, China.
|
12
|
Abstract
In the post-genomic era, an important task is to explore the function of individual biological molecules (i.e., gene, noncoding RNA, protein, metabolite) and their organization in living cells. To this end, gene regulatory networks (GRNs) are constructed to show the relationships between biological molecules, in which the vertices of the network denote biological molecules and the edges represent connections between nodes (Strogatz, Nature 410:268-276, 2001; Bray, Science 301:1864-1865, 2003). By interpreting GRNs, biologists can understand not only the function of biological molecules but also the organization of the components of living cells, since a gene regulatory network is a comprehensive physiological map of living cells and reflects the influence of genetic and epigenetic factors (Strogatz, Nature 410:268-276, 2001; Bray, Science 301:1864-1865, 2003). In this paper, we review inference methods for GRN reconstruction and approaches for analyzing network structure. As the network method is a powerful tool for studying complex diseases and biological processes, its applications in pathway analysis and disease gene identification are also introduced.
|
13
|
Deciphering the Relationship between Obesity and Various Diseases from a Network Perspective. Genes (Basel) 2017; 8:genes8120392. [PMID: 29258237 PMCID: PMC5748710 DOI: 10.3390/genes8120392] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/20/2017] [Revised: 12/02/2017] [Accepted: 12/13/2017] [Indexed: 12/14/2022] Open
Abstract
The number of obesity cases is rapidly increasing in developed and developing countries, thereby causing significant health problems worldwide. The pathologic factors of obesity at the molecular level are not fully characterized, although the imbalance between energy intake and consumption is widely recognized as the main reason for fat accumulation. Previous studies reported that obesity can be caused by the dysfunction of genes associated with other diseases, such as myocardial infarction, hence providing new insights into dissecting the pathogenesis of obesity by investigating its associations with other diseases. In this study, we investigated the relationship between obesity and diseases from Online Mendelian Inheritance in Man (OMIM) databases on the protein–protein interaction (PPI) network. The obesity genes and genes of one OMIM disease were mapped onto the network, and the interaction scores between the two gene sets were investigated on the basis of the PPI of individual gene pairs, thereby inferring the relationship between obesity and this disease. Results suggested that diseases related to nutrition and endocrine are the top two diseases that are closely associated with obesity. This finding is consistent with our general knowledge and indicates the reliability of our obtained results. Moreover, we inferred that diseases related to psychiatric factors and bone may also be highly related to obesity because the two diseases followed the diseases related to nutrition and endocrine according to our results. Numerous obesity–disease associations were identified in the literature to confirm the relationships between obesity and the aforementioned four diseases. These new results may help understand the underlying molecular mechanisms of obesity–disease co-occurrence and provide useful insights for disease prevention and intervention.
|
14
|
Chen L, Pan H, Zhang YH, Feng K, Kong X, Huang T, Cai YD. Network-Based Method for Identifying Co-Regeneration Genes in Bone, Dentin, Nerve and Vessel Tissues. Genes (Basel) 2017; 8:genes8100252. [PMID: 28974058 PMCID: PMC5664102 DOI: 10.3390/genes8100252] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/28/2017] [Accepted: 09/28/2017] [Indexed: 12/26/2022] Open
Abstract
Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein–protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method.
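The searching procedure described in this abstract can be sketched as follows: for every pair of regeneration genes from two different tissues, find a shortest path in the interaction network and count how often each intermediate gene appears. The sketch makes simplifying assumptions that are not from the paper: edges are unweighted (so breadth-first search is used in place of a weighted shortest-path algorithm), only one shortest path per pair is traced, and the function names are hypothetical.

```python
from collections import Counter, deque


def shortest_path(adj, source, target):
    """Return one shortest path from source to target as a list, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adj[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def candidate_genes(adj, genes_a, genes_b):
    """Count interior genes on shortest paths linking two known gene sets."""
    known = set(genes_a) | set(genes_b)
    counts = Counter()
    for s in genes_a:
        for t in genes_b:
            path = shortest_path(adj, s, t)
            if path:
                # Skip the endpoints and any gene that is already known.
                counts.update(g for g in path[1:-1] if g not in known)
    return counts
```

The resulting counts are only raw candidates; in the method summarised above they would still pass through the permutation test and the two screening rules.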
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, China.
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
- Hongying Pan
- Department of Oral Medicine, Infection and Immunity, Harvard School of Dental Medicine, Harvard University, Boston, MA 02115, USA.
- Department of Orthopedic Surgery, Brigham and Women's Hospital, Harvard University, Boston, MA 02115, USA.
- Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
- Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic, Guangzhou 510507, Guangdong, China.
- XiangYin Kong
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China.
|
15
|
A computational method using the random walk with restart algorithm for identifying novel epigenetic factors. Mol Genet Genomics 2017; 293:293-301. [PMID: 28932904 DOI: 10.1007/s00438-017-1374-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/10/2017] [Accepted: 09/11/2017] [Indexed: 12/31/2022]
Abstract
Epigenetic regulation has long been recognized as a significant factor in various biological processes, such as development, transcriptional regulation, spermatogenesis, and chromosome stabilization. Epigenetic alterations lead to many human diseases, including cancer, depression, autism, and immune system defects. Although efforts have been made to identify epigenetic regulators, it remains a challenge to systematically uncover all the components of epigenetic regulation at the genome level using experimental approaches. Advances in constructing protein-protein interaction (PPI) networks provide an excellent opportunity to identify novel epigenetic factors computationally at the genome level. In this study, we identified potential epigenetic factors by using a computational method that applied the random walk with restart (RWR) algorithm to a PPI network, using reported epigenetic factors as seed nodes. False positives were identified by their specific roles in the PPI network or by a low-confidence interaction and a weak functional relationship with epigenetic regulators. After filtering out the false positives, 26 candidate epigenetic factors were finally obtained. According to previous studies, 22 of these are thought to be involved in epigenetic regulation, suggesting the robustness of our method. Our study provides a novel computational approach that successfully identified 26 potential epigenetic factors, paving the way to a deeper understanding of the epigenetic mechanism.
|
16
|
Zhang YH, Huang T, Chen L, Xu Y, Hu Y, Hu LD, Cai Y, Kong X. Identifying and analyzing different cancer subtypes using RNA-seq data of blood platelets. Oncotarget 2017; 8:87494-87511. [PMID: 29152097 PMCID: PMC5675649 DOI: 10.18632/oncotarget.20903] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 06/15/2017] [Accepted: 08/16/2017] [Indexed: 12/11/2022] Open
Abstract
Detection and diagnosis of cancer are especially important for early prevention and effective treatments. Traditional methods of cancer detection are usually time-consuming and expensive. Liquid biopsy, a newly proposed noninvasive detection approach, can promote the accuracy and decrease the cost of detection according to a personalized expression profile. However, few studies have been performed to analyze this type of data, which can promote more effective methods for detection of different cancer subtypes. In this study, we applied some reliable machine learning algorithms to analyze data retrieved from patients who had one of six cancer subtypes (breast cancer, colorectal cancer, glioblastoma, hepatobiliary cancer, lung cancer and pancreatic cancer) as well as healthy persons. Quantitative gene expression profiles were used to encode each sample. Then, they were analyzed by the maximum relevance minimum redundancy method. Two feature lists were obtained in which genes were ranked rigorously. The incremental feature selection method was applied to the mRMR feature list to extract the optimal feature subset, which can be used in the support vector machine algorithm to determine the best performance for the detection of cancer subtypes and healthy controls. The ten-fold cross-validation for the constructed optimal classification model yielded an overall accuracy of 0.751. On the other hand, we extracted the top eighteen features (genes), including TTN, RHOH, RPS20, TRBC2, in another feature list, the MaxRel feature list, and performed a detailed analysis of them. The results indicated that these genes could be important biomarkers for discriminating different cancer subtypes and healthy controls.
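The ranking and selection pipeline described above (mRMR ranking followed by incremental feature selection) can be sketched as follows. This is only an illustration under stated assumptions. Features are assumed to be already discretised. The greedy criterion is the common "relevance minus mean redundancy" form. The evaluator is a leave-one-out nearest-centroid classifier standing in for the support vector machine with ten-fold cross-validation used in the study. All function names are hypothetical.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        px = np.mean(x == xv)
        for yv in np.unique(y):
            py = np.mean(y == yv)
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def mrmr_rank(X, y):
    """Greedy mRMR: repeatedly pick the feature with the highest relevance
    I(f; y) minus its mean mutual information with features already picked."""
    X, y = np.asarray(X), np.asarray(y)
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    ranked = [int(np.argmax(relevance))]
    remaining = set(range(n_features)) - set(ranked)
    while remaining:
        def score(j):
            redundancy = np.mean([mutual_information(X[:, j], X[:, k]) for k in ranked])
            return relevance[j] - redundancy
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        classes = np.unique(y[mask])
        centroids = np.array([X[mask & (y == c)].mean(axis=0) for c in classes])
        pred = classes[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]
        hits += int(pred == y[i])
    return hits / len(y)

def incremental_feature_selection(X, y, ranked, evaluate):
    """Evaluate the top-1, top-2, ... prefixes of `ranked`; keep the best prefix."""
    X = np.asarray(X)
    best_score, best_k = -np.inf, 0
    for k in range(1, len(ranked) + 1):
        s = evaluate(X[:, ranked[:k]], y)
        if s > best_score:               # strict '>' keeps the shortest best prefix
            best_score, best_k = s, k
    return ranked[:best_k], best_score

# Toy data: feature 0 equals the label, feature 1 is a noisy copy, feature 2 is noise.
toy_y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
toy_X = np.column_stack([toy_y,
                         [0, 0, 0, 1, 1, 1, 1, 0],
                         [0, 1, 0, 1, 0, 1, 0, 1]])
toy_ranked = mrmr_rank(toy_X, toy_y)
toy_selected, toy_score = incremental_feature_selection(
    toy_X, toy_y, toy_ranked, loo_nearest_centroid_accuracy)
```

In practice the evaluator would be swapped for the classifier and cross-validation scheme of interest. The structure of the loop over ranked prefixes stays the same.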
Affiliation(s)
- Yu-Hang Zhang
  - Department of General Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, People's Republic of China; Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- Tao Huang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
- YaoChen Xu
  - Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- Yu Hu
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- Lan-Dian Hu
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
- Yudong Cai
  - School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China
- Xiangyin Kong
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
|
17
|
Li L, Wang Y, An L, Kong X, Huang T. A network-based method using a random walk with restart algorithm and screening tests to identify novel genes associated with Menière's disease. PLoS One 2017; 12:e0182592. [PMID: 28787010 PMCID: PMC5546581 DOI: 10.1371/journal.pone.0182592] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 05/12/2017] [Accepted: 07/20/2017] [Indexed: 12/28/2022]
Abstract
As a chronic illness derived from hair cells of the inner ear, Menière’s disease (MD) negatively influences the quality of life of individuals and leads to a number of symptoms, such as dizziness, temporary hearing loss, and tinnitus. The complete identification of novel genes related to MD would help elucidate its underlying pathological mechanisms and improve its diagnosis and treatment. In this study, a network-based method was developed to identify novel MD-related genes based on known MD-related genes. A human protein-protein interaction (PPI) network was constructed using the PPI information reported in the STRING database. A classic ranking algorithm, the random walk with restart (RWR) algorithm, was employed to search for novel genes using known genes as seed nodes. To make the identified genes more reliable, a series of screening tests, including a permutation test, an interaction test and an enrichment test, were designed to select essential genes from those obtained by the RWR algorithm. As a result, several inferred genes, such as CD4, NOTCH2 and IL6, were discovered. Finally, a detailed biological analysis was performed on fifteen of the important inferred genes, which indicated their strong associations with MD.
Affiliation(s)
- Lin Li
  - Department of Otorhinolaryngology and Head & Neck, China-Japan Union Hospital of Jilin University, Changchun, China
- YanShu Wang
  - Department of Anesthesia, The First Hospital of Jilin University, Changchun, China
- Lifeng An
  - Department of Otorhinolaryngology and Head & Neck, China-Japan Union Hospital of Jilin University, Changchun, China
- XiangYin Kong
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Tao Huang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
|
18
|
Identifying novel fruit-related genes in Arabidopsis thaliana based on the random walk with restart algorithm. PLoS One 2017; 12:e0177017. [PMID: 28472169 PMCID: PMC5417634 DOI: 10.1371/journal.pone.0177017] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/27/2017] [Accepted: 04/20/2017] [Indexed: 01/03/2023]
Abstract
Fruit is essential for plant reproduction and is responsible for protection and dispersal of seeds. The development and maturation of fruit are tightly regulated by numerous genetic factors that respond to environmental and internal stimulation. In this study, we attempted to identify novel fruit-related genes in a model organism, Arabidopsis thaliana, using a computational method. Based on validated fruit-related genes, the random walk with restart (RWR) algorithm was applied to a protein-protein interaction (PPI) network using these genes as seeds. The identified genes with high probabilities were filtered by the permutation test and linkage tests. In the permutation test, the genes that were selected due to the structure of the PPI network were discarded. In the linkage tests, the importance of each candidate gene was measured from two aspects: (1) its functional associations with validated genes and (2) its similarity with validated genes on gene ontology (GO) terms and KEGG pathways. Finally, 255 inferred genes were obtained. Subsequent extensive analysis of important genes revealed that they mainly contribute to ubiquitination (UBQ9, UBQ8, UBQ11, UBQ10), serine hydroxymethyl transfer (SHM7, SHM5, SHM6) or glycol-metabolism (HXKL2_ARATH, CSY5, GAPCP1), suggesting essential roles during the development and maturation of fruit in Arabidopsis thaliana.
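The idea behind the permutation test, discarding genes that score highly because of network structure rather than because of the seeds, can be sketched as below. The abstract does not give the exact statistic, so this is one plausible formulation and an assumption: for each node, count how often a random seed set of the same size gives it an RWR score at least as high as the real seeds do. The function names, the restart probability and the number of permutations are hypothetical. The steady state is obtained here by solving the linear system (I - (1 - r) W) p = r p0 directly.

```python
import numpy as np

def rwr_scores(adj, seeds, restart=0.8):
    """Steady-state RWR scores from the linear system (I - (1 - r) W) p = r p0."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0            # avoid division by zero for isolated nodes
    W = adj / col_sums                       # column-normalised transition matrix
    n = adj.shape[0]
    p0 = np.zeros(n)
    p0[list(seeds)] = 1.0 / len(seeds)
    return np.linalg.solve(np.eye(n) - (1.0 - restart) * W, restart * p0)

def permutation_p_values(adj, seeds, n_perm=1000, restart=0.8, rng=None):
    """Per node: fraction of random seed sets (same size as `seeds`) whose RWR
    score for that node is at least the score obtained with the real seeds."""
    rng = np.random.default_rng(rng)
    n = np.asarray(adj).shape[0]
    observed = rwr_scores(adj, seeds, restart)
    exceed = np.zeros(n)
    for _ in range(n_perm):
        random_seeds = rng.choice(n, size=len(seeds), replace=False)
        exceed += rwr_scores(adj, random_seeds, restart) >= observed
    return exceed / n_perm

# Toy network: node 0 is a hub linked to every other node; node 5 is the seed.
toy_net = np.zeros((6, 6))
toy_net[0, 1:] = 1
toy_net[1:, 0] = 1
toy_p = permutation_p_values(toy_net, seeds=[5], n_perm=200, rng=0)
```

A node with a large value scores well under most random seed sets, which suggests its rank reflects topology (for example, hub status) rather than proximity to the real seeds. Such nodes would be candidates for removal under a chosen cutoff.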
|