1
Yuan F, Cao X, Zhang YH, Chen L, Huang T, Li Z, Cai YD. Identification of Novel Lung Cancer Driver Genes Connecting Different Omics Levels With a Heat Diffusion Algorithm. Front Cell Dev Biol 2022; 10:825272. [PMID: 35155435 PMCID: PMC8826452 DOI: 10.3389/fcell.2022.825272] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/30/2021] [Accepted: 01/06/2022] [Indexed: 12/21/2022]
Abstract
A cancer driver gene is a gene whose abnormal alterations initiate or promote tumorigenesis. Driver genes can be used to reveal the fundamental pathological mechanisms of tumorigenesis. These genes may have pathological changes at different omics levels. Thus, identifying cancer driver genes involving two or more omics levels is essential. In this study, a computational investigation was conducted on lung cancer driver genes. Four omics levels, namely, epigenomics, genomics, transcriptomics, and post-transcriptomics, were involved. Starting from the driver genes at each level, the Laplacian heat diffusion algorithm was executed on a protein–protein interaction network to discover latent driver genes at that level. A subsequent screening procedure comprising three tests (permutation, association, and function tests) was performed to exclude false-positive genes and retain essential ones. Finally, an intersection operation was performed to obtain novel driver genes involving two omics levels. Analyses of the obtained genes indicated that they were associated with fundamental pathological mechanisms of lung cancer at the two corresponding omics levels.
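The diffusion step described in this abstract can be illustrated with a minimal sketch. It assumes the common formulation in which heat placed on the seed genes spreads as exp(-tL) applied to the initial heat vector, where L is the combinatorial graph Laplacian. The toy path graph, the diffusion time `t`, and the function name `heat_diffusion` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def heat_diffusion(adjacency, seeds, t=0.5):
    """Heat at every node after diffusing for time t from the seed nodes."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A          # combinatorial Laplacian L = D - A
    h0 = np.zeros(A.shape[0])
    h0[list(seeds)] = 1.0                   # one unit of heat on each seed
    # L is symmetric, so exp(-t L) = V diag(exp(-t w)) V^T
    w, V = np.linalg.eigh(L)
    return V @ (np.exp(-t * w) * (V.T @ h0))

# Hypothetical toy network: a path 0 - 1 - 2 - 3 with node 0 as the only seed.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
heat = heat_diffusion(A, seeds=[0], t=0.5)
ranking = np.argsort(-heat)                 # nodes ordered by decreasing heat
```

Non-seed nodes with the most heat would be the raw candidates passed on to the screening tests; on a real network a sparse matrix exponential would replace the dense eigendecomposition used here.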
Affiliation(s)
- Fei Yuan: Department of Science and Technology, Binzhou Medical University Hospital, Binzhou, China
- Xiaoyu Cao: Department of Neurology, Binzhou Medical University Hospital, Binzhou, China
- Yu-Hang Zhang: Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Lei Chen: College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Tao Huang: Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China; CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- ZhanDong Li: College of Food Engineering, Jilin Engineering Normal University, Changchun, China
- Yu-Dong Cai: School of Life Sciences, Shanghai University, Shanghai, China
- *Correspondence: Tao Huang; ZhanDong Li; Yu-Dong Cai
2
Sheng M, Cai H, Yang Q, Li J, Zhang J, Liu L. A Random Walk-Based Method to Identify Candidate Genes Associated With Lymphoma. Front Genet 2021; 12:792754. [PMID: 34899868 PMCID: PMC8655984 DOI: 10.3389/fgene.2021.792754] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/11/2021] [Accepted: 11/02/2021] [Indexed: 11/16/2022]
Abstract
Lymphoma is a serious type of cancer, especially for adolescents and older adults, although it is quite rare compared with other types of cancer. The cause of this malignancy remains unclear. Genetic factors are deemed to be highly associated with the initiation and progression of lymphoma, and several genes have been related to this disease. Determining the pathogenesis of lymphoma by identifying the related genes is important. In this study, we present a random walk-based method to infer novel lymphoma-associated genes. From the 1,458 reported lymphoma-associated genes and a protein–protein interaction network, raw candidate genes were mined using the random walk with restart algorithm. The raw genes were further filtered using three screening tests (i.e., permutation, linkage, and enrichment tests). These tests could control false-positive genes and screen out essential candidate genes with strong linkages to validated lymphoma-associated genes. A total of 108 inferred genes were obtained. Analytical results indicated that some inferred genes, such as RAC3, TEC, IRAK2/3/4, PRKCE, SMAD3, BLK, TXK, and PRKCQ, were associated with the initiation and progression of lymphoma.
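As a rough illustration of the random walk with restart step, the sketch below iterates the standard update p ← (1 − r)·W·p + r·p0 until the probability vector stabilises, where W is the column-normalised adjacency matrix and p0 spreads probability evenly over the seed genes. The restart probability `restart=0.8`, the tolerance, and the toy graph are assumptions chosen for illustration; the abstract does not state the settings used.

```python
import numpy as np

def random_walk_with_restart(adjacency, seeds, restart=0.8, tol=1e-6, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds."""
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0           # avoid division by zero for isolated nodes
    W = A / col_sums                        # column-normalised transition matrix
    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)      # uniform start over the seed genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 change below tolerance: converged
            return p_next
        p = p_next
    return p

# Hypothetical toy network: a path 0 - 1 - 2 - 3 with node 0 as the only seed.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
scores = random_walk_with_restart(A, seeds=[0])
```

Genes closer to the seeds in the network receive higher probabilities, which is why the non-seed genes with the highest scores serve as raw candidates.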
Affiliation(s)
- Minjie Sheng: Department of Ophthalmology, Yangpu Hospital, School of Medicine, Tongji University, Shanghai, China
- Haiying Cai: Department of Ophthalmology, Yangpu Hospital, School of Medicine, Tongji University, Shanghai, China
- Qin Yang: Department of Ophthalmology, Yangpu Hospital, School of Medicine, Tongji University, Shanghai, China
- Jing Li: Department of Ophthalmology, Yangpu Hospital, School of Medicine, Tongji University, Shanghai, China
- Jian Zhang: Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai, China; Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai, China; National Clinical Research Center for Eye Diseases, Shanghai, China; Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China
- Lihua Liu: Department of Ophthalmology, Yangpu Hospital, School of Medicine, Tongji University, Shanghai, China
3
Identification of Latent Oncogenes with a Network Embedding Method and Random Forest. BIOMED RESEARCH INTERNATIONAL 2020; 2020:5160396. [PMID: 33029511 PMCID: PMC7530476 DOI: 10.1155/2020/5160396] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/20/2020] [Revised: 09/09/2020] [Accepted: 09/14/2020] [Indexed: 12/29/2022]
Abstract
An oncogene is a special type of gene that can promote tumor initiation. Thorough study of oncogenes is helpful for understanding the causes of cancers. Experimental techniques were long the popular way to detect oncogenes. However, their drawbacks, such as high cost and long duration, have become increasingly evident in recent years. Newly proposed computational methods provide an alternative way to study oncogenes and can provide useful clues for further investigation of candidate genes. Considering the limitations of some previous computational methods, such as the lack of learning procedures and the treatment of genes as individual subjects, a novel computational method was proposed in this study. The method adopted features derived from multiple protein networks, viewing proteins at a system level. A classic machine learning algorithm, random forest, was applied to these features to capture the essential characteristics of oncogenes, thereby building the prediction model. All genes except validated oncogenes were ranked by a measurement yielded by the prediction model. The top genes were quite different from potential oncogenes discovered by previous methods, and they are candidates for confirmation as novel oncogenes. This indicates that the newly identified genes can be essential supplements to previous results.
4
Identification of COVID-19 Infection-Related Human Genes Based on a Random Walk Model in a Virus-Human Protein Interaction Network. BIOMED RESEARCH INTERNATIONAL 2020; 2020:4256301. [PMID: 32685484 PMCID: PMC7345912 DOI: 10.1155/2020/4256301] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/24/2020] [Accepted: 06/26/2020] [Indexed: 12/15/2022]
Abstract
Coronaviruses are crown-shaped viruses that were first identified in the 1960s, and three typical examples of the most recent coronavirus disease outbreaks are severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS), and COVID-19. In particular, COVID-19 is currently causing a worldwide pandemic, threatening human health globally. The identification of viral pathogenic mechanisms is important for further developing effective drugs and targeted clinical treatment methods. The delayed revelation of viral infectious mechanisms is currently one of the technical obstacles in the prevention and treatment of infectious diseases. In this study, we proposed a random walk model to identify the potential pathological mechanisms of COVID-19 on a virus–human protein interaction network, and we effectively identified a group of proteins that have already been determined to be potentially important for COVID-19 infection and for similar SARS infections, which helps the further development of drugs and targeted therapeutic methods against COVID-19. Moreover, we constructed a standard computational workflow for predicting the pathological biomarkers and related pharmacological targets of infectious diseases.
5
Prediction of Drug Side Effects with a Refined Negative Sample Selection Strategy. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2020; 2020:1573543. [PMID: 32454877 PMCID: PMC7232712 DOI: 10.1155/2020/1573543] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 01/28/2020] [Revised: 04/14/2020] [Accepted: 04/23/2020] [Indexed: 01/07/2023]
Abstract
Drugs are an important way to treat various diseases. However, they inevitably produce side effects, bringing great risks to human bodies and pharmaceutical companies. Predicting the side effects of drugs has become one of the essential problems in drug research, and designing efficient computational methods is one way to address it. Some studies paired a drug and a side effect as a sample, thereby modeling the problem as a binary classification problem. However, the selection of negative samples is a key problem in this case. In this study, a novel negative sample selection strategy was designed for obtaining high-quality negative samples. This strategy applied the random walk with restart (RWR) algorithm on a chemical-chemical interaction network to select, as negative samples, pairs of drugs and side effects in which the drug was less likely to have the corresponding side effect. Through several tests with a fixed feature extraction scheme and different machine-learning algorithms, models with the selected negative samples produced high performance. The best model even yielded nearly perfect performance. These models had much higher performance than those without such a strategy or with another selection strategy. Furthermore, it is not necessary to consider the balance of positive and negative samples under such a strategy.
6
Inferring novel genes related to oral cancer with a network embedding method and one-class learning algorithms. Gene Ther 2019; 26:465-478. [PMID: 31455874 DOI: 10.1038/s41434-019-0099-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/28/2019] [Revised: 06/18/2019] [Accepted: 07/15/2019] [Indexed: 12/14/2022]
Abstract
Oral cancer (OC) is one of the most common cancers threatening human lives. However, OC pathogenesis has yet to be fully uncovered, and thus designing effective treatments remains difficult. Identifying genes related to OC is an important way of achieving this purpose. In this study, we proposed three computational models for inferring novel OC-related genes. In contrast to previously proposed computational methods, which lacked learning procedures, each proposed model adopted a one-class learning algorithm, which can provide deep insight into the features of validated OC-related genes. A network embedding algorithm (i.e., node2vec) was applied to the protein-protein interaction network to produce the representation of genes. The features of the OC-related genes were used to train the one-class algorithm, and the performance of the final inferring model was improved through a feature selection procedure. Then, candidate genes were produced by applying the trained inferring model to other genes. Three tests were performed to screen out the important candidate genes. Accordingly, we obtained three inferred gene sets, any two of which were different. The inferred genes also differed from previously reported genes, and some of them have been included in the public Oral Cancer Gene Database. Finally, we analyzed several inferred genes to confirm whether they are novel OC-related genes.
7
Lu S, Zhu ZG, Lu WC. Inferring novel genes related to colorectal cancer via random walk with restart algorithm. Gene Ther 2019; 26:373-385. [PMID: 31308477 DOI: 10.1038/s41434-019-0090-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/25/2018] [Revised: 05/20/2019] [Accepted: 06/11/2019] [Indexed: 12/12/2022]
Abstract
Colorectal cancer (CRC) is the third most common type of cancer. In recent decades, genomic analysis has played an increasingly important role in understanding the molecular mechanisms of CRC. However, its pathogenesis has not been fully uncovered. Identifying genes related to CRC as completely as possible is an important way to investigate its pathogenesis. Therefore, we proposed a new computational method for the identification of novel CRC-associated genes. The proposed method is based on existing proven CRC-associated genes, human protein-protein interaction networks, and the random walk with restart algorithm. The utility of the method is indicated by comparing it with methods based on guilt-by-association or the shortest path algorithm. Using the proposed method, we successfully identified 298 novel CRC-associated genes. Previous studies have validated the involvement of the majority of these 298 novel genes in CRC-associated biological processes, thus suggesting the efficacy and accuracy of our method.
Affiliation(s)
- Sheng Lu: Department of General Surgery, Rui Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai Institute of Digestive Surgery, Shanghai, 200025, China
- Zheng-Gang Zhu: Department of General Surgery, Rui Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai Institute of Digestive Surgery, Shanghai, 200025, China
- Wen-Cong Lu: Department of Chemistry, College of Sciences, Shanghai University, Shanghai, 200444, China
8
Wang T, Chen L, Zhao X. Prediction of Drug Combinations with a Network Embedding Method. Comb Chem High Throughput Screen 2019; 21:789-797. [DOI: 10.2174/1386207322666181226170140] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/26/2018] [Revised: 11/02/2018] [Accepted: 11/28/2018] [Indexed: 01/10/2023]
Abstract
Aim and Objective:
Several diseases have complicated mechanisms. A single drug cannot treat such diseases well because they always involve several targets, and single-target drugs cannot modulate these targets simultaneously. Drug combination is an effective way to treat such diseases. However, determining effective drug combinations via traditional methods is time-consuming and costly, so quick and inexpensive methods are urgently needed. Designing effective computational methods that incorporate advanced computational techniques to predict drug combinations is an alternative and feasible way.
Method:
In this study, we proposed a novel network embedding method, which can extract topological features of each drug combination from a drug network that was constructed using chemical-chemical interaction information retrieved from STITCH. These topological features were combined with the individual features of drug combinations reported in one previous study. Several advanced computational methods were employed to construct an effective prediction model: the synthetic minority oversampling technique (SMOTE) was used to tackle the imbalanced dataset, and the minimum redundancy maximum relevance (mRMR) and incremental feature selection (IFS) methods were adopted to analyze features and extract optimal features for building an optimal support vector machine (SVM) classifier.
Results and Conclusion:
The constructed optimal SVM classifier yielded an MCC of 0.806, which is superior to that of the classifier using only individual features, with or without SMOTE. The performance of the classifier can be improved by combining the topological features and essential features of a drug combination.
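For readers unfamiliar with the reported metric, the Matthews correlation coefficient (MCC) is computed from the four cells of a binary confusion matrix. The sketch below uses the standard definition; the confusion-matrix counts are made up for illustration and are not the paper's results.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is taken as 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts, chosen only to show the arithmetic.
example = mcc(tp=90, tn=90, fp=10, fn=10)   # (8100 - 100) / 10000 = 0.8
perfect = mcc(tp=50, tn=50, fp=0, fn=0)     # perfect prediction gives 1.0
```

Because the MCC uses all four cells, it remains informative on imbalanced datasets, which is the setting SMOTE is meant to address.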
Affiliation(s)
- Tianyun Wang: College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Lei Chen: College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Xian Zhao: College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
9
Lu S, Zhao K, Wang X, Liu H, Ainiwaer X, Xu Y, Ye M. Use of Laplacian Heat Diffusion Algorithm to Infer Novel Genes With Functions Related to Uveitis. Front Genet 2018; 9:425. [PMID: 30349554 PMCID: PMC6186792 DOI: 10.3389/fgene.2018.00425] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/03/2018] [Accepted: 09/10/2018] [Indexed: 12/17/2022]
Abstract
Uveitis is inflammation of the uvea, a serious eye disease that can cause blindness in young and middle-aged people. However, the pathogenesis of this disease has not been fully uncovered, which makes designing effective treatments difficult. Completely identifying the genes related to this disease can help improve and accelerate the comprehension of uveitis. In this study, a new computational method was developed to infer potential related genes based on validated ones. We employed a large protein–protein interaction network reported in STRING, on which the Laplacian heat diffusion algorithm was applied using validated genes as seed nodes. Except for the validated ones, all genes in the network were filtered by three tests, namely, permutation, association, and function tests, which evaluated the genes based on their specificity and associations to uveitis. As a result, 59 inferred genes were obtained, several of which were confirmed to be highly related to uveitis by literature review. In addition, the inferred genes were compared with those reported in a previous study, indicating that our reported genes are necessary supplements.
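The abstract does not spell out how the permutation test works, so the following is only a generic sketch of the idea: a candidate's score from the real seed set is compared with its scores from random seed sets of the same size, and candidates that score highly regardless of the seeds (typically network hubs) are flagged as non-specific. The function name `permutation_p_values`, the stand-in `neighbor_score` (a count of adjacent seeds, used here in place of the heat diffusion score), and the toy network are all assumptions for illustration.

```python
import numpy as np

def permutation_p_values(score_fn, n_nodes, seeds, n_perm=1000, rng=None):
    """Empirical p-value per node: how often random seed sets score it at least as high."""
    rng = np.random.default_rng() if rng is None else rng
    real = score_fn(list(seeds))
    at_least_as_high = np.zeros(n_nodes)
    for _ in range(n_perm):
        random_seeds = rng.choice(n_nodes, size=len(seeds), replace=False)
        at_least_as_high += score_fn(random_seeds) >= real
    return (at_least_as_high + 1) / (n_perm + 1)   # add-one correction avoids p = 0

# Hypothetical toy network: node 0 is a hub linked to every node; node 3 is
# additionally linked to the two seed nodes 1 and 2.
A = np.zeros((8, 8))
for i, j in [(0, k) for k in range(1, 8)] + [(3, 1), (3, 2)]:
    A[i, j] = A[j, i] = 1.0

def neighbor_score(seed_nodes):
    """Stand-in score: number of seed nodes adjacent to each node."""
    return A[:, seed_nodes].sum(axis=1)

p_values = permutation_p_values(neighbor_score, 8, seeds=[1, 2],
                                rng=np.random.default_rng(0))
```

In this toy case the hub (node 0) scores highly for almost any seed set and so receives a large p-value, whereas node 3 scores highly mainly for the real seeds and receives a small one.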
Affiliation(s)
- Shiheng Lu: Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Ke Zhao: Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Xuefei Wang: Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Hui Liu: Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Xiamuxiya Ainiwaer: Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
- Yan Xu: School of Life Sciences, Shanghai University, Shanghai, China
- Min Ye: Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
10
Ozturk K, Dow M, Carlin DE, Bejar R, Carter H. The Emerging Potential for Network Analysis to Inform Precision Cancer Medicine. J Mol Biol 2018; 430:2875-2899. [PMID: 29908887 PMCID: PMC6097914 DOI: 10.1016/j.jmb.2018.06.016] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 03/12/2018] [Revised: 05/30/2018] [Accepted: 06/06/2018] [Indexed: 12/19/2022]
Abstract
Precision cancer medicine promises to tailor clinical decisions to patients using genomic information. Indeed, successes of drugs targeting genetic alterations in tumors, such as imatinib that targets BCR-ABL in chronic myelogenous leukemia, have demonstrated the power of this approach. However, biological systems are complex, and patients may differ not only by the specific genetic alterations in their tumor, but also by more subtle interactions among such alterations. Systems biology and more specifically, network analysis, provides a framework for advancing precision medicine beyond clinical actionability of individual mutations. Here we discuss applications of network analysis to study tumor biology, early methods for N-of-1 tumor genome analysis, and the path for such tools to the clinic.
Affiliation(s)
- Kivilcim Ozturk: Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Michelle Dow: Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Daniel E Carlin: Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
- Rafael Bejar: Moores Cancer Center, Division of Hematology and Oncology, University of California San Diego, La Jolla, CA 92093, USA
- Hannah Carter: Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA; Moores Cancer Center and Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA; CIFAR, MaRS Centre, West Tower, 661 University Ave., Suite 505, Toronto, ON M5G 1M1, Canada