1
|
Wang H, Zhang Z, Li H, Li J, Li H, Liu M, Liang P, Xi Q, Xing Y, Yang L, Zuo Y. A cost-effective machine learning-based method for preeclampsia risk assessment and driver genes discovery. Cell Biosci 2023; 13:41. [PMID: 36849879 PMCID: PMC9972636 DOI: 10.1186/s13578-023-00991-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2022] [Accepted: 02/15/2023] [Indexed: 03/01/2023] Open
Abstract
BACKGROUND The placenta, as a unique exchange organ between mother and fetus, is essential for successful human pregnancy and fetal health. Preeclampsia (PE) caused by placental dysfunction contributes to both maternal and infant morbidity and mortality. Accurate identification of PE patients plays a vital role in the formulation of treatment plans. However, the traditional clinical methods of PE have a high misdiagnosis rate. RESULTS Here, we first designed a computational biology method that used single-cell transcriptome (scRNA-seq) of healthy pregnancy (38 wk) and early-onset PE (28-32 wk) to identify pathological cell subpopulations and predict PE risk. Based on machine learning methods and feature selection techniques, we observed that the Tuning ReliefF (TURF) score hybrid with XGBoost (TURF_XGB) achieved optimal performance, with 92.61% accuracy and 92.46% recall for classifying nine cell subpopulations of healthy placentas. Biological landscapes of placenta heterogeneity could be mapped by the 110 marker genes screened by TURF_XGB, which revealed the superiority of the TURF feature mining. Moreover, we processed the PE dataset with LASSO to obtain 497 biomarkers. Integration analysis of the above two gene sets revealed that dendritic cells were closely associated with early-onset PE, and C1QB and C1QC might drive preeclampsia by mediating inflammation. In addition, an ensemble model-based risk stratification card was developed to classify preeclampsia patients, and its area under the receiver operating characteristic curve (AUC) could reach 0.99. For broader accessibility, we designed an accessible online web server ( http://bioinfor.imu.edu.cn/placenta ). CONCLUSION Single-cell transcriptome-based preeclampsia risk assessment using an ensemble machine learning framework is a valuable asset for clinical decision-making. C1QB and C1QC may be involved in the development and progression of early-onset PE by affecting the complement and coagulation cascades pathway that mediate inflammation, which has important implications for better understanding the pathogenesis of PE.
Collapse
Affiliation(s)
- Hao Wang
- grid.411643.50000 0004 1761 0411The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070 China ,Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot, 010010 China
| | - Zhaoyue Zhang
- grid.54549.390000 0004 0369 4060School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054 China
| | - Haicheng Li
- grid.411643.50000 0004 1761 0411The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070 China ,Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot, 010010 China
| | - Jinzhao Li
- grid.411643.50000 0004 1761 0411The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070 China
| | - Hanshuang Li
- grid.411643.50000 0004 1761 0411The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070 China
| | - Mingzhu Liu
- grid.411643.50000 0004 1761 0411The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070 China ,Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot, 010010 China
| | - Pengfei Liang
- grid.411643.50000 0004 1761 0411The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070 China
| | - Qilemuge Xi
- grid.411643.50000 0004 1761 0411The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070 China
| | - Yongqiang Xing
- School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China.
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China.
| | - Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China. .,Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot, 010010, China.
| |
Collapse
|
2
|
Zhang W, Liu B. iSnoDi-LSGT: identifying snoRNA-disease associations based on local similarity constraints and global topological constraints. RNA (NEW YORK, N.Y.) 2022; 28:1558-1567. [PMID: 36192132 PMCID: PMC9670808 DOI: 10.1261/rna.079325.122] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/22/2022] [Accepted: 09/26/2022] [Indexed: 06/16/2023]
Abstract
Growing evidence proves that small nucleolar RNAs (snoRNAs) have important functions in various biological processes, the malfunction of which leads to the emergence and development of complex diseases. However, identifying snoRNA-disease associations is an ongoing challenging task due to the considerable time- and money-consuming biological experiments. Therefore, it is urgent to design efficient and economical methods for the identification of snoRNA-disease associations. In this regard, we propose a computational method named iSnoDi-LSGT, which utilizes snoRNA sequence similarity and disease similarity as local similarity constraints. The iSnoDi-LSGT predictor further employs network embedding technology to extract topological features of snoRNAs and diseases, based on which snoRNA topological similarity and disease topological similarity are calculated as global topological constraints. To the best of our knowledge, the iSnoDi-LSGT is the first computational method for snoRNA-disease association identification. The experimental results indicate that the iSnoDi-LSGT predictor can effectively predict unknown snoRNA-disease associations. The web server of the iSnoDi-LSGT predictor is freely available at http://bliulab.net/iSnoDi-LSGT.
Collapse
Affiliation(s)
- Wenxiang Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
| |
Collapse
|
3
|
Yao Y, Lv Y, Tong L, Liang Y, Xi S, Ji B, Zhang G, Li L, Tian G, Tang M, Hu X, Li S, Yang J. ICSDA: a multi-modal deep learning model to predict breast cancer recurrence and metastasis risk by integrating pathological, clinical and gene expression data. Brief Bioinform 2022; 23:6761046. [PMID: 36242564 DOI: 10.1093/bib/bbac448] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2022] [Revised: 07/18/2022] [Accepted: 07/18/2022] [Indexed: 12/14/2022] Open
Abstract
Breast cancer patients often have recurrence and metastasis after surgery. Predicting the risk of recurrence and metastasis for a breast cancer patient is essential for the development of precision treatment. In this study, we proposed a novel multi-modal deep learning prediction model by integrating hematoxylin & eosin (H&E)-stained histopathological images, clinical information and gene expression data. Specifically, we segmented tumor regions in H&E into image blocks (256 × 256 pixels) and encoded each image block into a 1D feature vector using a deep neural network. Then, the attention module scored each area of the H&E-stained images and combined image features with clinical and gene expression data to predict the risk of recurrence and metastasis for each patient. To test the model, we downloaded all 196 breast cancer samples from the Cancer Genome Atlas with clinical, gene expression and H&E information simultaneously available. The samples were then divided into the training and testing sets with a ratio of 7: 3, in which the distributions of the samples were kept between the two datasets by hierarchical sampling. The multi-modal model achieved an area-under-the-curve value of 0.75 on the testing set better than those based solely on H&E image, sequencing data and clinical data, respectively. This study might have clinical significance in identifying high-risk breast cancer patients, who may benefit from postoperative adjuvant treatment.
Collapse
Affiliation(s)
- Yuhua Yao
- School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China.,Key Laboratory of Data Science and Intelligence Education, Ministry of Education, Hainan Normal University, Haikou, China.,Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou, China
| | - Yaping Lv
- School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China.,Genies Beijing Co., Ltd., Beijing 100102, China
| | - Ling Tong
- Chifeng Municipal Hospital, Chifeng, Inner Mongolia 024000, China
| | - Yuebin Liang
- Genies Beijing Co., Ltd., Beijing 100102, China.,Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
| | - Shuxue Xi
- Genies Beijing Co., Ltd., Beijing 100102, China.,Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
| | - Binbin Ji
- Genies Beijing Co., Ltd., Beijing 100102, China.,Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
| | - Guanglu Zhang
- School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China
| | - Ling Li
- Basic Courses Department, Zhejiang Shuren University, Hangzhou 310000, China
| | - Geng Tian
- Genies Beijing Co., Ltd., Beijing 100102, China.,Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
| | - Min Tang
- School of Life Sciences, Jiangsu University, Zhenjiang, 212013, China
| | - Xiyue Hu
- Dept. of Colorectal Surgery, National Cancer Center/ Cancer Hospital, Chinese Academy of Medical Science, 17 Panjiayuan Nanli, Chaoyang District, Beijing, China, 100021
| | - Shijun Li
- Chifeng Municipal Hospital, Chifeng, Inner Mongolia 024000, China
| | - Jialiang Yang
- Genies Beijing Co., Ltd., Beijing 100102, China.,Chifeng Municipal Hospital, Chifeng, Inner Mongolia 024000, China.,Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
| |
Collapse
|
4
|
Han YM, Yang H, Huang QL, Sun ZJ, Li ML, Zhang JB, Deng KJ, Chen S, Lin H. Risk prediction of diabetes and pre-diabetes based on physical examination data. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:3597-3608. [PMID: 35341266 DOI: 10.3934/mbe.2022166] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Diabetes is a metabolic disorder caused by insufficient insulin secretion and insulin secretion disorders. From health to diabetes, there are generally three stages: health, pre-diabetes and type 2 diabetes. Early diagnosis of diabetes is the most effective way to prevent and control diabetes and its complications. In this work, we collected the physical examination data from Beijing Physical Examination Center from January 2006 to December 2017, and divided the population into three groups according to the WHO (1999) Diabetes Diagnostic Standards: normal fasting plasma glucose (NFG) (FPG < 6.1 mmol/L), mildly impaired fasting plasma glucose (IFG) (6.1 mmol/L ≤ FPG < 7.0 mmol/L) and type 2 diabetes (T2DM) (FPG > 7.0 mmol/L). Finally, we obtained1,221,598 NFG samples, 285,965 IFG samples and 387,076 T2DM samples, with a total of 15 physical examination indexes. Furthermore, taking eXtreme Gradient Boosting (XGBoost), random forest (RF), Logistic Regression (LR), and Fully connected neural network (FCN) as classifiers, four models were constructed to distinguish NFG, IFG and T2DM. The comparison results show that XGBoost has the best performance, with AUC (macro) of 0.7874 and AUC (micro) of 0.8633. In addition, based on the XGBoost classifier, three binary classification models were also established to discriminate NFG from IFG, NFG from T2DM, IFG from T2DM. On the independent dataset, the AUCs were 0.7808, 0.8687, 0.7067, respectively. Finally, we analyzed the importance of the features and identified the risk factors associated with diabetes.
Collapse
Affiliation(s)
- Yu-Mei Han
- Beijing Physical Examination Center, Beijing, China
| | - Hui Yang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Qin-Lai Huang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zi-Jie Sun
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | | | | | - Ke-Jun Deng
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shuo Chen
- Beijing Physical Examination Center, Beijing, China
| | - Hao Lin
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
Collapse
|
5
|
Jiao S, Zou Q, Guo H, Shi L. iTTCA-RF: a random forest predictor for tumor T cell antigens. J Transl Med 2021; 19:449. [PMID: 34706730 PMCID: PMC8554859 DOI: 10.1186/s12967-021-03084-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2021] [Accepted: 09/16/2021] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy represents the most promising treatment strategy due to its high efficacy and selectivity and lower side effects compared with traditional treatment. The identification of tumor T cell antigens is one of the most important tasks for antitumor vaccines development and molecular function investigation. Although several machine learning predictors have been developed to identify tumor T cell antigen, more accurate tumor T cell antigen identification by existing methodology is still challenging. METHODS In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 tumor T cell antigens (negative samples). Four types feature encoding methods have been studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors and grouped amino acid and peptide composition. To improve the feature representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using random forest algorithm. RESULTS Finally, the top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF provides satisfactory performance, with balanced accuracy, specificity and sensitivity values of 83.71%, 78.73% and 88.69% over tenfold cross-validation as well as 73.14%, 62.67% and 83.61% over independent tests, respectively. The online prediction server was freely accessible at http://lab.malab.cn/~acy/iTTCA . CONCLUSIONS We have proven that the proposed predictor iTTCA-RF is superior to the other latest models, and will hopefully become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I.
Collapse
Affiliation(s)
- Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Huannan Guo
- Department of Oncology, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China.
| | - Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China.
| |
Collapse
|
6
|
Yang H, Qi C, Li B, Cheng L. Non-coding RNAs as Novel Biomarkers in Cancer Drug Resistance. Curr Med Chem 2021; 29:837-848. [PMID: 34348605 DOI: 10.2174/0929867328666210804090644] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2021] [Revised: 06/09/2021] [Accepted: 06/15/2021] [Indexed: 11/22/2022]
Abstract
Chemotherapy is often the primary and most effective anticancer treatment; however, drug resistance remains a major obstacle to it being curative. Recent studies have demonstrated that non-coding RNAs (ncRNAs), especially microRNAs and long non-coding RNAs, are involved in drug resistance of tumor cells in many ways, such as modulation of apoptosis, drug efflux and metabolism, epithelial-to-mesenchymal transition, DNA repair, and cell cycle progression. Exploring the relationships between ncRNAs and drug resistance will not only contribute to our understanding of the mechanisms of drug resistance and provide ncRNA biomarkers of chemoresistance, but will also help realize personalized anticancer treatment regimens. Due to the high cost and low efficiency of biological experimentation, many researchers have opted to use computational methods to identify ncRNA biomarkers associated with drug resistance. In this review, we summarize recent discoveries related to ncRNA-mediated drug resistance and highlight the computational methods and resources available for ncRNA biomarkers involved in chemoresistance.
Collapse
Affiliation(s)
- Haixiu Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081. China
| | - Changlu Qi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081. China
| | - Boyan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081. China
| | - Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081. China
| |
Collapse
|
7
|
Chen Z, Shen Z, Zhang Z, Zhao D, Xu L, Zhang L. RNA-Associated Co-expression Network Identifies Novel Biomarkers for Digestive System Cancer. Front Genet 2021; 12:659788. [PMID: 33841514 PMCID: PMC8033200 DOI: 10.3389/fgene.2021.659788] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2021] [Accepted: 02/25/2021] [Indexed: 01/04/2023] Open
Abstract
Cancers of the digestive system are malignant diseases. Our study focused on colon cancer, esophageal cancer (ESCC), rectal cancer, gastric cancer (GC), and rectosigmoid junction cancer to identify possible biomarkers for these diseases. The transcriptome data were downloaded from the TCGA database (The Cancer Genome Atlas Program), and a network was constructed using the WGCNA algorithm. Two significant modules were found, and coexpression networks were constructed. CytoHubba was used to identify hub genes of the two networks. GO analysis suggested that the network genes were involved in metabolic processes, biological regulation, and membrane and protein binding. KEGG analysis indicated that the significant pathways were the calcium signaling pathway, fatty acid biosynthesis, and pathways in cancer and insulin resistance. Some of the most significant hub genes were hsa-let-7b-3p, hsa-miR-378a-5p, hsa-miR-26a-5p, hsa-miR-382-5p, and hsa-miR-29b-2-5p and SECISBP2 L, NCOA1, HERC1, HIPK3, and MBNL1, respectively. These genes were predicted to be associated with the tumor prognostic reference for this patient population.
Collapse
Affiliation(s)
- Zheng Chen
- School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Zijie Shen
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Zilong Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Da Zhao
- School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Lijun Zhang
- School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China
| |
Collapse
|