1
Yao Y, Lv Y, Tong L, Liang Y, Xi S, Ji B, Zhang G, Li L, Tian G, Tang M, Hu X, Li S, Yang J. ICSDA: a multi-modal deep learning model to predict breast cancer recurrence and metastasis risk by integrating pathological, clinical and gene expression data. Brief Bioinform 2022; 23:6761046. [PMID: 36242564 DOI: 10.1093/bib/bbac448] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/23/2022] [Revised: 07/18/2022] [Accepted: 07/18/2022] [Indexed: 12/14/2022]
Abstract
Breast cancer patients often experience recurrence and metastasis after surgery. Predicting the risk of recurrence and metastasis for a breast cancer patient is essential for the development of precision treatment. In this study, we proposed a novel multi-modal deep learning prediction model that integrates hematoxylin & eosin (H&E)-stained histopathological images, clinical information and gene expression data. Specifically, we segmented tumor regions in the H&E images into image blocks (256 × 256 pixels) and encoded each image block into a 1D feature vector using a deep neural network. Then, the attention module scored each area of the H&E-stained images and combined image features with clinical and gene expression data to predict the risk of recurrence and metastasis for each patient. To test the model, we downloaded all 196 breast cancer samples from The Cancer Genome Atlas for which clinical, gene expression and H&E information were simultaneously available. The samples were then divided into training and testing sets at a ratio of 7:3, with the distribution of the samples preserved across the two sets by stratified (hierarchical) sampling. The multi-modal model achieved an area-under-the-curve value of 0.75 on the testing set, better than models based solely on H&E images, sequencing data or clinical data. This study might have clinical significance in identifying high-risk breast cancer patients, who may benefit from postoperative adjuvant treatment.
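The attention step described in this abstract can be illustrated with a minimal sketch. It assumes softmax-normalized patch scores and simple concatenation of the pooled image vector with the other two modalities; the function names `attention_pool` and `fuse_modalities` are hypothetical, and in the actual model the scores would come from a trained attention module rather than being supplied by hand.

```python
import math


def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patch_features, patch_scores):
    """Weighted average of per-patch feature vectors (one vector per image block)."""
    weights = softmax(patch_scores)
    dim = len(patch_features[0])
    pooled = [
        sum(w * feat[i] for w, feat in zip(weights, patch_features))
        for i in range(dim)
    ]
    return pooled, weights


def fuse_modalities(pooled_image, clinical, expression):
    """Assumed fusion: concatenate image, clinical and expression features."""
    return list(pooled_image) + list(clinical) + list(expression)
```

The fused vector would then be passed to a classifier that outputs the recurrence and metastasis risk; that classifier is not shown here.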
Affiliation(s)
- Yuhua Yao
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China
  - Key Laboratory of Data Science and Intelligence Education, Ministry of Education, Hainan Normal University, Haikou, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou, China
- Yaping Lv
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China
  - Genies Beijing Co., Ltd., Beijing 100102, China
- Ling Tong
  - Chifeng Municipal Hospital, Chifeng, Inner Mongolia 024000, China
- Yuebin Liang
  - Genies Beijing Co., Ltd., Beijing 100102, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Shuxue Xi
  - Genies Beijing Co., Ltd., Beijing 100102, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Binbin Ji
  - Genies Beijing Co., Ltd., Beijing 100102, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Guanglu Zhang
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China
- Ling Li
  - Basic Courses Department, Zhejiang Shuren University, Hangzhou 310000, China
- Geng Tian
  - Genies Beijing Co., Ltd., Beijing 100102, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Min Tang
  - School of Life Sciences, Jiangsu University, Zhenjiang, 212013, China
- Xiyue Hu
  - Dept. of Colorectal Surgery, National Cancer Center/ Cancer Hospital, Chinese Academy of Medical Science, 17 Panjiayuan Nanli, Chaoyang District, Beijing, China, 100021
- Shijun Li
  - Chifeng Municipal Hospital, Chifeng, Inner Mongolia 024000, China
- Jialiang Yang
  - Genies Beijing Co., Ltd., Beijing 100102, China
  - Chifeng Municipal Hospital, Chifeng, Inner Mongolia 024000, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
2
Zhang S, Jiang H, Gao B, Yang W, Wang G. Identification of Diagnostic Markers for Breast Cancer Based on Differential Gene Expression and Pathway Network. Front Cell Dev Biol 2022; 9:811585. [PMID: 35096840 PMCID: PMC8790293 DOI: 10.3389/fcell.2021.811585] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/09/2021] [Accepted: 12/13/2021] [Indexed: 11/13/2022]
Abstract
Background: Breast cancer is the second most common cancer in the world; its incidence continues to rise worldwide and seriously threatens women's health. It is therefore important to explore the characteristic changes of breast cancer at the gene level, including the screening of differentially expressed genes and the identification of diagnostic markers. Methods: The gene expression profiles of breast cancer were obtained from the TCGA database. The edgeR R package was used to screen for differentially expressed genes between breast cancer patients and normal samples. Function and pathway enrichment analysis of these genes identified significantly enriched functions and pathways. Next, the enriched pathways were downloaded from the KEGG website, and their gene interaction relations were extracted to construct a KEGG pathway gene interaction network. Potential diagnostic markers of breast cancer were obtained by combining the differentially expressed genes with the key genes in the network. Finally, these markers were used to construct a diagnostic prediction model of breast cancer, and the predictive ability of the model and the diagnostic ability of the markers were verified with internal and external data. Results: A total of 1060 differentially expressed genes were identified between breast cancer patients and normal controls. Enrichment analysis revealed 28 significantly enriched pathways (p < 0.05). These pathways were downloaded from the KEGG website, and their gene interaction relations were extracted to construct the KEGG pathway gene interaction network, which contained 1277 nodes and 7345 edges. The key nodes, those with a degree greater than 30, were extracted from the network and comprised 154 genes. These 154 key genes shared 23 genes with the differentially expressed genes, and these 23 genes serve as potential diagnostic markers for breast cancer. The 23 genes were used as features to construct an SVM classification model, which showed good predictive ability on both the training dataset and the validation dataset (AUC = 0.960 and 0.907, respectively). Conclusion: This study showed that differences in gene expression level are important for the diagnosis of breast cancer and identified 23 breast cancer diagnostic markers, which provide valuable information for clinical diagnosis and basic treatment experiments.
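The marker-selection step in this abstract (keep network nodes with degree above 30, then intersect them with the differentially expressed genes) can be sketched as follows. This is only an illustration under stated assumptions: the network is treated as an undirected edge list, self-loops and duplicate edges are ignored, and the names `node_degrees` and `candidate_markers` are hypothetical rather than taken from the paper.

```python
from collections import defaultdict


def node_degrees(edges):
    """Degree of each gene in an undirected interaction network."""
    neighbors = defaultdict(set)
    for gene_a, gene_b in edges:
        if gene_a == gene_b:
            continue  # assumption: self-loops do not count toward degree
        neighbors[gene_a].add(gene_b)
        neighbors[gene_b].add(gene_a)
    return {gene: len(linked) for gene, linked in neighbors.items()}


def candidate_markers(edges, differential_genes, min_degree=30):
    """Genes that are both key nodes (degree > min_degree) and differentially expressed."""
    degrees = node_degrees(edges)
    key_genes = {gene for gene, degree in degrees.items() if degree > min_degree}
    return sorted(key_genes & set(differential_genes))
```

With the numbers reported above, `edges` would hold the 7345 network edges and `differential_genes` the 1060 differentially expressed genes; the result would then be the list of shared genes used as classifier features.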
Affiliation(s)
- Shumei Zhang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Haoran Jiang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Bo Gao
  - Department of Radiology, The Second Affiliated Hospital, Harbin Medical University, Harbin, China
- Wen Yang
  - International Medical Center, Shenzhen University General Hospital, Shenzhen, China
- Guohua Wang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
3
Zhang S, Zhang J, Zhang Q, Liang Y, Du Y, Wang G. Identification of Prognostic Biomarkers for Bladder Cancer Based on DNA Methylation Profile. Front Cell Dev Biol 2022; 9:817086. [PMID: 35174173 PMCID: PMC8841402 DOI: 10.3389/fcell.2021.817086] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/17/2021] [Accepted: 12/22/2021] [Indexed: 12/14/2022]
Abstract
Background: DNA methylation is an important epigenetic modification that plays an important role in regulating gene expression at the transcriptional level. In tumor research, changes in DNA methylation have been found to lead to abnormalities of gene structure and function, which can provide early warning of tumorigenesis. Our study aims to explore the relationship between the occurrence and development of tumors and the level of DNA methylation. Moreover, this study provides a set of prognostic biomarkers that can more accurately predict the survival and health of patients after treatment. Methods: Datasets of bladder cancer patients and control samples were collected from the TCGA database, and differential analysis was used to obtain genes with differential DNA methylation levels between tumor samples and normal samples. A protein-protein interaction network was then constructed, and potential tumor markers were obtained by extracting hub genes from the subnetwork. A Cox proportional hazards regression model and survival analysis were used to construct the prognostic model and screen out the prognostic markers of bladder cancer, so as to provide a reference for tumor prognosis monitoring and improvement of treatment plans. Results: We found that DNA methylation was indeed related to the occurrence of bladder cancer. Genes with differential DNA methylation could serve as potential biomarkers for bladder cancer. Through univariate and multivariate Cox proportional hazards regression analysis, we concluded that FASLG and PRKCZ can be used as prognostic biomarkers for bladder cancer. Patients can be classified into high- or low-risk groups using this two-gene prognostic model. By detecting the methylation status of these genes, we can evaluate the survival of patients. Conclusion: Our analysis indicates that the methylation status of tumor-related genes can be used as a prognostic biomarker of bladder cancer.
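A two-gene prognostic model of this kind is commonly applied as a linear risk score followed by a cutoff. The sketch below assumes a score of the form (coefficient × methylation level) summed over FASLG and PRKCZ and a median cutoff; the abstract states neither the fitted coefficients nor the cutoff rule, so any coefficient values passed in are placeholders, and `risk_score` and `split_by_median` are hypothetical names.

```python
import statistics


def risk_score(methylation, coefficients):
    """Linear risk score: sum of coefficient * methylation level over the model genes."""
    return sum(coefficients[gene] * methylation[gene] for gene in coefficients)


def split_by_median(scores):
    """Assumed rule: patients scoring above the cohort median are labeled high risk."""
    cutoff = statistics.median(scores.values())
    return {
        patient: ("high" if score > cutoff else "low")
        for patient, score in scores.items()
    }
```

In practice the coefficients would be taken from the fitted multivariate Cox model, and the two resulting groups would be compared with a survival analysis.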
Affiliation(s)
- Shumei Zhang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Jingyu Zhang
  - Department of Neurology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Qichao Zhang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Yingjian Liang
  - Department of General Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, China
  - Key Laboratory of Hepatosplenic Surgery, Ministry of Education, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Youwen Du
  - School of Life Sciences, Anhui Medical University, Hefei, China
- Guohua Wang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
  - *Correspondence: Guohua Wang,
4
Han K, Cao P, Wang Y, Xie F, Ma J, Yu M, Wang J, Xu Y, Zhang Y, Wan J. A Review of Approaches for Predicting Drug-Drug Interactions Based on Machine Learning. Front Pharmacol 2022; 12:814858. [PMID: 35153767 PMCID: PMC8835726 DOI: 10.3389/fphar.2021.814858] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 11/14/2021] [Accepted: 12/20/2021] [Indexed: 01/01/2023]
Abstract
Drug-drug interactions (DDIs) play a vital role in drug research. However, they may also cause adverse reactions in patients, with serious consequences. Manual detection of DDIs is time-consuming and expensive, so computational methods are urgently needed. Computers can identify DDIs in two ways: by identifying known interactions, or by predicting unknown ones. In this paper, we review the research progress of machine learning in predicting unknown DDIs. Among these methods, the literature-based method is special because it combines DDI extraction with DDI prediction. We first introduce the common databases, then briefly describe each method, and summarize the advantages and disadvantages of some prediction models. Finally, we discuss the challenges and prospects of machine learning methods in predicting DDIs. This review aims to provide useful guidance for interested researchers to further promote bioinformatics algorithms for DDI prediction.
Affiliation(s)
- Ke Han
  - Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
  - College of Pharmacy, Harbin University of Commerce, Harbin, China
- Peigang Cao
  - Beidahuang Industry Group General Hospital, Harbin, China
- Yu Wang
  - Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Fang Xie
  - Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Jiaqi Ma
  - Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Mengyao Yu
  - Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Jianchun Wang
  - Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Yaoqun Xu
  - Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Yu Zhang
  - Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Jie Wan
  - Laboratory for Space Environment and Physical Sciences, Harbin Institute of Technology, Harbin, China
5
Fan Y, Dong X, Li M, Liu P, Zheng J, Li H, Zhang Y. LncRNA KRT19P3 Is Involved in Breast Cancer Cell Proliferation, Migration and Invasion. Front Oncol 2022; 11:799082. [PMID: 35059320 PMCID: PMC8763666 DOI: 10.3389/fonc.2021.799082] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/21/2021] [Accepted: 12/08/2021] [Indexed: 12/13/2022]
Abstract
Long non-coding RNAs (lncRNAs) are recognized as critical regulatory molecules in breast carcinoma (BC). In addition, the progression of BC is closely associated with the immune system. However, the relationship between lncRNAs and the tumor immune system in BC has not been fully studied. LncRNA KRT19P3 has been reported to inhibit the progression of gastric cancer. In the present study, we first found that KRT19P3 was downregulated in BC tissues compared with paracancerous tissue. We then showed that KRT19P3 could be used as a marker to differentiate BC from paracancerous tissue. Increased expression of KRT19P3 markedly inhibited the proliferation, migration, and invasion of BC cells in vitro and the growth of BC tumors in vivo. Conversely, KRT19P3 knockdown by siRNA transfection markedly promoted the proliferation, migration, and invasion of BC cells. Comparison of clinical parameters showed an inverse relationship between KRT19P3 expression and pathological grade. Furthermore, immunohistochemistry (IHC) was used to determine the positive rates of Ki-67, programmed death-ligand 1 (PD-L1), and CD8 expression in BC tissues. Correlation analysis showed that Ki-67 and PD-L1 were inversely correlated with KRT19P3, whereas CD8 was positively correlated with KRT19P3. In conclusion, this study demonstrated that lncRNA KRT19P3 inhibits BC progression and may affect the expression of PD-L1 in BC, which in turn affects CD8+ T cells (CD8-positive cytotoxic T lymphocytes) in the immune microenvironment.
Affiliation(s)
- Yanping Fan
  - Pathology Department, First Affiliated Hospital of Weifang Medical University (Weifang People's Hospital), Weifang, China
  - Department of Basic Medicine, Weifang Medical University, Weifang, China
- Xiaotong Dong
  - Pathology Department, First Affiliated Hospital of Weifang Medical University (Weifang People's Hospital), Weifang, China
  - Department of Basic Medicine, Weifang Medical University, Weifang, China
- Meizeng Li
  - Pathology Department, First Affiliated Hospital of Weifang Medical University (Weifang People's Hospital), Weifang, China
  - Department of Basic Medicine, Weifang Medical University, Weifang, China
- Pengju Liu
  - School of Economics, Qingdao University, Qingdao, China
- Jie Zheng
  - Department of Basic Medicine, Weifang Medical University, Weifang, China
- Hongli Li
  - Department of Basic Medicine, Weifang Medical University, Weifang, China
- Yunxiang Zhang
  - Pathology Department, First Affiliated Hospital of Weifang Medical University (Weifang People's Hospital), Weifang, China
6
Yu L, Zheng Y, Gao L. MiRNA-disease association prediction based on meta-paths. Brief Bioinform 2022; 23:6501422. [PMID: 35018405 DOI: 10.1093/bib/bbab571] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 11/03/2021] [Revised: 12/02/2021] [Accepted: 12/11/2021] [Indexed: 01/09/2023]
Abstract
Since miRNAs can participate in the posttranscriptional regulation of gene expression, they may provide ideas for the development of new drugs or become new biomarkers for drug targets or disease diagnosis. In this work, we propose an miRNA-disease association prediction method based on meta-paths (MDPBMP). First, an miRNA-disease-gene heterogeneous information network was constructed, and seven symmetrical meta-paths were defined according to different semantics. After constructing the initial feature vector for the node, the vector information carried by all nodes on the meta-path instance is extracted and aggregated to update the feature vector of the starting node. Then, the vector information obtained by the nodes on different meta-paths is aggregated. Finally, miRNA and disease embedding feature vectors are used to calculate their associated scores. Compared with the other methods, MDPBMP obtained the highest AUC value of 0.9214. Among the top 50 predicted miRNAs for lung neoplasms, esophageal neoplasms, colon neoplasms and breast neoplasms, 49, 48, 49 and 50 have been verified. Furthermore, for breast neoplasms, we deleted all the known associations between breast neoplasms and miRNAs from the training set. These results also show that for new diseases without known related miRNA information, our model can predict their potential miRNAs. Code and data are available at https://github.com/LiangYu-Xidian/MDPBMP.
Affiliation(s)
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, P.R. China
- Yujia Zheng
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, P.R. China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, P.R. China
7
Chen Y, Juan L, Lv X, Shi L. Bioinformatics Research on Drug Sensitivity Prediction. Front Pharmacol 2021; 12:799712. [PMID: 34955863 PMCID: PMC8696280 DOI: 10.3389/fphar.2021.799712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2021] [Accepted: 11/18/2021] [Indexed: 11/28/2022]
Abstract
Modeling-based anti-cancer drug sensitivity prediction has been extensively studied in recent years. Most drug sensitivity prediction models use only gene expression data and neglect the remarkable impacts of gene mutation, methylation, and copy number variation on drug sensitivity. Drug sensitivity prediction can both help protect patients from some adverse drug reactions and improve the efficacy of treatment. Genomics data are extremely useful for the drug sensitivity prediction task. This article reviews the role of drug sensitivity prediction and describes a variety of methods for predicting drug sensitivity. Moreover, the research significance of drug sensitivity prediction and the existing problems are discussed.
Affiliation(s)
- Yaojia Chen
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Liran Juan
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Xiao Lv
  - Beidahuang Industry Group General Hospital, Harbin, China
- Lei Shi
  - Department of Spine Surgery Changzheng Hospital, Naval Medical University, Shanghai, China
8
Guo Y, Ju Y, Chen D, Wang L. Research on the Computational Prediction of Essential Genes. Front Cell Dev Biol 2021; 9:803608. [PMID: 34938741 PMCID: PMC8685449 DOI: 10.3389/fcell.2021.803608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 11/22/2021] [Indexed: 11/19/2022]
Abstract
Genes, the nucleotide sequences that encode a polypeptide chain or functional RNA, are the basic genetic units controlling biological traits. They underpin the basic structures and functions of organisms, and they store information related to biological factors and processes such as blood type, gestation, growth, and apoptosis. The environment and genetics jointly affect important physiological processes such as reproduction, cell division, and protein synthesis. Genes are related to a wide range of phenomena including growth, decline, illness, aging, and death. During the evolution of organisms, a class of genes has persisted in a conserved form across multiple species. These genes are often located on the dominant strand of DNA and tend to have higher expression levels. The proteins they encode usually either perform very important functions or are responsible for maintaining and repairing these essential functions. Such genes are called persistent genes. Among them, the genes that are irreplaceable for the organism's life activities are the essential genes. For example, when starch is the only source of energy, the genes related to starch digestion are essential genes. Without them, the organism will die because it cannot obtain enough energy to maintain basic functions. The function of the proteins encoded by these genes is thought to be fundamental to life. Nowadays, DNA can be extracted from blood, saliva, or tissue cells for genetic testing, and detailed genetic information can be obtained using the most advanced scientific instruments and technologies. The information gained from genetic testing is useful for assessing the potential risks of disease and for helping determine the prognosis and development of diseases. Such information is also useful for developing personalized medication and providing targeted health guidance to improve quality of life. Therefore, it is of great theoretical and practical significance to identify important and essential genes. In this paper, the research status of essential genes and the essential genome database of bacteria are reviewed, the computational prediction method of essential genes based on communication coding theory is expounded, and the significance and practical application value of essential genes are discussed.
Affiliation(s)
- Yuxin Guo
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen, China
- Dong Chen
  - College of Electrical and Information Engineering, Quzhou University, Quzhou, China
- Lihong Wang
  - Beidahuang Industry Group General Hospital, Harbin, China
9
Gong Y, Liao B, Wang P, Zou Q. DrugHybrid_BS: Using Hybrid Feature Combined With Bagging-SVM to Predict Potentially Druggable Proteins. Front Pharmacol 2021; 12:771808. [PMID: 34916947 PMCID: PMC8669608 DOI: 10.3389/fphar.2021.771808] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/07/2021] [Accepted: 11/15/2021] [Indexed: 01/09/2023]
Abstract
Drug targets are biological macromolecules or biomolecular structures that can specifically bind a particular drug to produce a therapeutic effect or regulate physiological functions. Because of the important value and role of drug targets, the prediction of potential drug targets has become a research hotspot in recent years. The key first step in the research and development of modern new drugs is to identify potential drug targets. In this paper, a new predictor, DrugHybrid_BS, is developed based on hybrid features and Bagging-SVM to identify potentially druggable proteins. This method combines three features: monoDiKGap (k = 2), cross-covariance, and grouped amino acid composition. It removes redundant features and analyzes key features through MRMD and MRMD2.0. The cross-validation results show that 96.9944% of the potentially druggable proteins can be accurately identified, and the accuracy on the independent test set reached 96.5665%. These results suggest that DrugHybrid_BS has the potential to become a useful predictive tool for druggable proteins. In addition, the hybrid key features combined with Bagging-SVM can identify 80.0343% of the potentially druggable proteins, which indicates the significance of these features for research.
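Of the three feature types named in this abstract, grouped amino acid composition is the simplest to illustrate: residues are assigned to physicochemical groups and the fraction of the sequence in each group becomes a feature. The sketch below assumes a common five-group scheme (aliphatic, aromatic, positively charged, negatively charged, uncharged); the grouping actually used by DrugHybrid_BS may differ, and `grouped_composition` is a hypothetical name.

```python
# Assumed five-group scheme; the paper's own grouping may be different.
RESIDUE_GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}

# Reverse lookup: residue letter -> group name.
GROUP_OF = {
    residue: group
    for group, residues in RESIDUE_GROUPS.items()
    for residue in residues
}


def grouped_composition(sequence):
    """Fraction of recognized residues that fall into each group."""
    counts = {group: 0 for group in RESIDUE_GROUPS}
    recognized = 0
    for residue in sequence.upper():
        group = GROUP_OF.get(residue)
        if group is None:
            continue  # skip ambiguous or non-standard letters such as X
        counts[group] += 1
        recognized += 1
    if recognized == 0:
        return {group: 0.0 for group in counts}
    return {group: count / recognized for group, count in counts.items()}
```

The resulting five values would be concatenated with the other feature types before feature selection and classification.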
Affiliation(s)
- Yuxin Gong
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
- Bo Liao
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
- Peng Wang
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
10
Kan Y, Jiang L, Guo Y, Tang J, Guo F. Two-stage-vote ensemble framework based on integration of mutation data and gene interaction network for uncovering driver genes. Brief Bioinform 2021; 23:6426028. [PMID: 34791034 DOI: 10.1093/bib/bbab429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2021] [Revised: 08/30/2021] [Accepted: 09/18/2021] [Indexed: 11/14/2022]
Abstract
Identifying driver genes accurately among the massive number of mutated genes promotes accurate diagnosis and treatment of cancer. In recent years, work on uncovering driver genes by integrating mutation data and gene interaction networks has been gaining attention. However, it remains unclear whether integrating various types of mutation information (frequency and functional impact) with gene networks is more effective for prioritizing driver genes. Hence, we build a two-stage-vote ensemble framework based on somatic mutations and mutual interactions. Specifically, we first represent and combine various kinds of mutation information, which are propagated through networks by an improved iterative framework. The first vote is conducted on the iteration results by voting methods, and the second vote is performed to get ensemble results of the first vote for the final driver gene list. Compared with four excellent previous approaches, our method has better performance in identifying driver genes on 33 types of cancer from The Cancer Genome Atlas. Meanwhile, we also conduct a comparative analysis of two kinds of mutation information, five gene interaction networks and four voting strategies. Our framework offers a new view for data integration and helps uncover more latent cancer genes.
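The two-stage voting idea can be sketched with a simple rank-aggregation rule. The example assumes a Borda count at both stages and groups the first-stage input by network; the abstract mentions four voting strategies without naming them, so this is one plausible instance rather than the authors' procedure, and `borda_vote` and `two_stage_vote` are hypothetical names.

```python
from collections import defaultdict


def borda_vote(rankings):
    """Merge ranked gene lists: a gene at position i in a list of n earns n - i points."""
    points = defaultdict(int)
    for ranking in rankings:
        size = len(ranking)
        for position, gene in enumerate(ranking):
            points[gene] += size - position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(points, key=lambda gene: (-points[gene], gene))


def two_stage_vote(rankings_by_group):
    """Stage 1 merges the rankings inside each group; stage 2 merges the stage-1 results."""
    first_stage = [borda_vote(rankings) for rankings in rankings_by_group.values()]
    return borda_vote(first_stage)
```

Each inner list would be a gene ranking produced by one propagation run, and the final output is a single prioritized driver gene list.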
Affiliation(s)
- Yingxin Kan
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Limin Jiang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yan Guo
  - Comprehensive cancer center, Department of Internal Medicine, University of New Mexico, Albuquerque, U.S
- Jijun Tang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  - School of Computational Science and Engineering, University of South Carolina, Columbia, U.S
- Fei Guo
  - School of Computer Science and Engineering, Central South University, Changsha, China
11
Assessing the Adequacy of Hemodialysis Patients via the Graph-Based Takagi-Sugeno-Kang Fuzzy System. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:9036322. [PMID: 34367320 PMCID: PMC8337127 DOI: 10.1155/2021/9036322] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Accepted: 07/10/2021] [Indexed: 01/09/2023]
Abstract
Maintenance hemodialysis is the main treatment for end-stage renal disease in China. The Kt/V value is the gold standard of hemodialysis adequacy. However, Kt/V requires repeated blood drawing and evaluation, so it is hard to monitor dialysis adequacy frequently. To meet the need for repeated clinical assessments of dialysis adequacy, we sought a noninvasive way to assess it. We therefore collected clinically relevant data and developed a machine learning (ML)-based model to predict dialysis adequacy for clinical hemodialysis patients. We collected data from 250 patients, including gender, age, ultrafiltration (UF), predialysis body weight (preBW), postdialysis body weight (postBW), blood pressure (BP), heart rate (HR), and blood flow (BF). An efficient graph-based Takagi-Sugeno-Kang Fuzzy System (G-TSK-FS) model is proposed to predict the dialysis adequacy of hemodialysis patients. The root mean square error (RMSE) of our model is 0.1578. The proposed G-TSK-FS model could be used as a feasible method to predict dialysis adequacy, providing a new approach for clinical practice.
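For readers unfamiliar with the metric reported above, RMSE is the square root of the mean squared difference between predicted and measured values. The sketch below shows the standard computation; the function name `rmse` and the idea of comparing predicted against measured Kt/V values are illustrative assumptions, not details taken from the paper.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A lower value means the predictions sit closer to the measured adequacy values, expressed in the same units as the target.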
12
A Self-Representation-Based Fuzzy SVM Model for Predicting Vascular Calcification of Hemodialysis Patients. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:2464821. [PMID: 34367315 PMCID: PMC8337133 DOI: 10.1155/2021/2464821] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Revised: 06/30/2021] [Accepted: 07/08/2021] [Indexed: 01/09/2023]
Abstract
In end-stage renal disease (ESRD), vascular calcification risk factors are critical to the survival of hemodialysis patients. To assess the level of vascular calcification effectively, machine learning algorithms can be used to predict vascular calcification risk in ESRD patients. Because the amount of collected data is imbalanced across risk levels, the classification task is affected. Therefore, an effective fuzzy support vector machine based on self-representation (FSVM-SR) is proposed in this work to predict vascular calcification risk. Our method is also compared with other conventional machine learning methods, and the results show that it better completes the vascular calcification risk classification task.