1
Chrysostomou A, Furlan C, Saccenti E. Machine learning based analysis of single-cell data reveals evidence of subject-specific single-cell gene expression profiles in acute myeloid leukaemia patients and healthy controls. Biochimica et Biophysica Acta. Gene Regulatory Mechanisms 2024; 1867:195062. [PMID: 39366464 DOI: 10.1016/j.bbagrm.2024.195062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Revised: 09/01/2024] [Accepted: 09/24/2024] [Indexed: 10/06/2024]
Abstract
Acute Myeloid Leukaemia (AML) is characterized by uncontrolled growth of immature myeloid cells, disrupting normal blood production. Treatment typically involves chemotherapy, targeted therapy, and stem cell transplantation, but many patients develop chemoresistance, leading to poor outcomes due to the disease's high heterogeneity. In this study, we used publicly available single-cell RNA sequencing data and machine learning to classify AML patients and healthy controls using monocyte, dendritic and progenitor cell populations. We found that gene expression profiles of AML patients and healthy controls can be classified at the individual level with high accuracy (>70%) when using progenitor cells, suggesting the existence of subject-specific single-cell transcriptomic profiles. The analysis also revealed molecular determinants of patient heterogeneity (e.g. TPSD1, CT45A1, and GABRA4), which could support new strategies for patient stratification and personalized treatment in leukaemia.
Affiliation(s)
- Andreas Chrysostomou
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, the Netherlands
- Cristina Furlan
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, the Netherlands
- Edoardo Saccenti
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, the Netherlands
2
Cheng W, Liu J, Wang C, Jiang R, Jiang M, Kong F. Application of image recognition technology in pathological diagnosis of blood smears. Clin Exp Med 2024; 24:181. [PMID: 39105953 PMCID: PMC11303489 DOI: 10.1007/s10238-024-01379-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2024] [Accepted: 05/13/2024] [Indexed: 08/07/2024]
Abstract
Traditional manual blood smear diagnosis methods are time-consuming and prone to errors, often relying heavily on the experience of clinical laboratory analysts for accuracy. As breakthroughs in key technologies such as neural networks and deep learning continue to drive digital transformation in the medical field, image recognition technology is increasingly being leveraged to enhance existing medical processes. In recent years, advancements in computer technology have led to improved efficiency in the identification of blood cells in blood smears through the use of image recognition technology. This paper provides a comprehensive summary of the methods and steps involved in utilizing image recognition algorithms for diagnosing diseases in blood smears, with a focus on malaria and leukemia. Furthermore, it offers a forward-looking research direction for the development of a comprehensive blood cell pathological detection system.
Affiliation(s)
- Wangxinjun Cheng
  - Center of Hematology, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, 330006, China
  - Queen Mary College, Nanchang University, Nanchang, 330006, China
- Jingshuang Liu
  - Center of Hematology, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, 330006, China
  - Queen Mary College, Nanchang University, Nanchang, 330006, China
- Chaofeng Wang
  - Center of Hematology, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, 330006, China
  - Queen Mary College, Nanchang University, Nanchang, 330006, China
- Ruiyin Jiang
  - Queen Mary College, Nanchang University, Nanchang, 330006, China
- Mei Jiang
  - Department of Clinical Laboratory, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, 330006, China
- Fancong Kong
  - Center of Hematology, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, 330006, China
3
Nuryani N, Pambudi Utomo T, Wiyono N, Sutomo AD, Ling S. Cuffless Hypertension Detection using Swarm Support Vector Machine Utilizing Photoplethysmogram and Electrocardiogram. J Biomed Phys Eng 2023; 13:477-488. [PMID: 37868942 PMCID: PMC10589690 DOI: 10.31661/jbpe.v0i0.2206-1504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Accepted: 01/11/2023] [Indexed: 10/24/2023]
Abstract
Background: Hypertension is associated with severe complications, and its detection is important to provide early information about a hypertension event, which is essential to prevent further complications. Objective: This study aimed to investigate a strategy for hypertension detection without a cuff using parameters of bioelectric signals, i.e., Electrocardiogram (ECG) and Photoplethysmogram (PPG), and a Swarm-based Support Vector Machine (SSVM) algorithm. Material and Methods: This experimental study was conducted to develop a hypertension detection system. ECG and PPG records from normal and hypertensive participants were collected from the Medical Information Mart for Intensive Care (MIMIC) and processed to extract the parameters used as SSVM inputs, which comprised Pulse Arrival Time (PAT) and the characteristics of PPG signal derivatives. The SSVM was a Support Vector Machine (SVM) algorithm optimized using particle swarm optimization with Quantum Delta-potential-well (QDPSO). SSVMs with different inputs were investigated to find the optimal detection performance. Results: The proposed strategy achieved 96% in terms of F1-score, accuracy, sensitivity, and specificity, outperforming the other methods tested, and could support the development of a cuff-free hypertension monitoring system. Conclusion: Hypertension detection using SSVM with ECG and PPG parameters performs acceptably. Detection performance was lower when using only PPG than when using both ECG and PPG.
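The four figures quoted in the Results (F1-score, accuracy, sensitivity, specificity) can all be derived from a single binary confusion matrix. The following is a minimal sketch of that arithmetic; the function name and the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # fraction of positive (hypertensive) records detected
    specificity = tn / (tn + fp)  # fraction of negative (normal) records correctly rejected
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
```

Reporting all four values together, as the abstract does, guards against a classifier that scores well on accuracy alone because one class dominates.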
Affiliation(s)
- Nuryani Nuryani
  - Department of Physics, University of Sebelas Maret Jl. Ir. Sutami 36A Kentingan Jebres Surakarta 57126, Indonesia
- Trio Pambudi Utomo
  - Department of Physics, University of Sebelas Maret Jl. Ir. Sutami 36A Kentingan Jebres Surakarta 57126, Indonesia
- Nanang Wiyono
  - Faculty of Medicine, University of Sebelas Maret Jl. Ir. Sutami 36A Kentingan Jebres Surakarta 57126, Indonesia
- Artono Dwijo Sutomo
  - Department of Physics, Graduate Program, University of Sebelas Maret Jl. Ir. Sutami 36A Kentingan Jebres Surakarta 57126, Indonesia
- Steve Ling
  - Centre for Health Technologies, University of Technology Sydney, Broadway NSW 2007, Australia
4
Lin W, Niu R, Park SM, Zou Y, Kim SS, Xia X, Xing S, Yang Q, Sun X, Yuan Z, Zhou S, Zhang D, Kwon HJ, Park S, Il Kim C, Koo H, Liu Y, Wu H, Zheng M, Yoo H, Shi B, Park JB, Yin J. IGFBP5 is an ROR1 ligand promoting glioblastoma invasion via ROR1/HER2-CREB signaling axis. Nat Commun 2023; 14:1578. [PMID: 36949068 PMCID: PMC10033905 DOI: 10.1038/s41467-023-37306-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/13/2021] [Accepted: 03/10/2023] [Indexed: 03/24/2023] Open
Abstract
Diffuse infiltration is the main reason for therapeutic resistance and recurrence in glioblastoma (GBM). However, potential targeted therapies for GBM stem-like cells (GSCs), which are responsible for GBM invasion, are limited. Herein, we report that Insulin-like Growth Factor-Binding Protein 5 (IGFBP5) is a ligand for Receptor tyrosine kinase like Orphan Receptor 1 (ROR1) and represents a promising target for GSC invasion. Using a GSC-derived brain tumor model, GSCs were characterized into invasive or non-invasive subtypes, and RNA sequencing analysis revealed that IGFBP5 was differentially expressed between these two subtypes. GSC invasion capacity was inhibited by IGFBP5 knockdown and enhanced by IGFBP5 overexpression both in vitro and in vivo, particularly in a patient-derived xenograft model. IGFBP5 binds to ROR1 and facilitates ROR1/HER2 heterodimer formation, which then induces CREB-mediated ETV5 and FBXW9 expression, thereby promoting GSC invasion and tumorigenesis. Importantly, tumor-specific targeting and penetrating nanocapsule-mediated delivery of CRISPR/Cas9-based IGFBP5 gene editing significantly suppressed GSC invasion and downstream gene expression, and prolonged the survival of orthotopic tumor-bearing mice. Collectively, our data reveal the IGFBP5-ROR1/HER2-CREB signaling axis as a potential GBM therapeutic target.
Affiliation(s)
- Weiwei Lin
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Gyeonggi, 10408, Republic of Korea
  - Research Institute, National Cancer Center, Goyang, Gyeonggi, 10408, Republic of Korea
  - Department of Life Science, Ewha Womans University, Seoul, 03760, Republic of Korea
- Rui Niu
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
- Seong-Min Park
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Gyeonggi, 10408, Republic of Korea
  - Personalized Genomic Medicine Research Center, KRIBB, Daejeon, 34141, Republic of Korea
- Yan Zou
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
  - Centre for Motor Neuron Disease Research, Macquarie Medical School, Faculty of Medicine & Health Sciences, Macquarie University, Sydney, NSW, 2109, Australia
- Sung Soo Kim
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Gyeonggi, 10408, Republic of Korea
- Xue Xia
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
- Songge Xing
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
- Qingshan Yang
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
- Xinhong Sun
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
- Zheng Yuan
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
- Shuchang Zhou
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
- Dongya Zhang
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
- Hyung Joon Kwon
  - Department of Cancer Control and Population Health, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Gyeonggi, 10408, Republic of Korea
- Saewhan Park
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Gyeonggi, 10408, Republic of Korea
- Chan Il Kim
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Gyeonggi, 10408, Republic of Korea
- Harim Koo
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Gyeonggi, 10408, Republic of Korea
- Yang Liu
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
- Haigang Wu
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
- Meng Zheng
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
- Heon Yoo
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Gyeonggi, 10408, Republic of Korea
  - Research Institute, National Cancer Center, Goyang, Gyeonggi, 10408, Republic of Korea
- Bingyang Shi
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
  - Centre for Motor Neuron Disease Research, Macquarie Medical School, Faculty of Medicine & Health Sciences, Macquarie University, Sydney, NSW, 2109, Australia
- Jong Bae Park
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Gyeonggi, 10408, Republic of Korea
  - Research Institute, National Cancer Center, Goyang, Gyeonggi, 10408, Republic of Korea
- Jinlong Yin
  - Henan-Macquarie University Joint Centre for Biomedical Innovation, School of Life Sciences, Henan University, Kaifeng, Henan, 475004, China
  - Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, National Cancer Center, Goyang, Gyeonggi, 10408, Republic of Korea
5
Li J, Huang F, Ma Q, Guo W, Feng K, Huang T, Cai YD. Identification of genes related to immune enhancement caused by heterologous ChAdOx1-BNT162b2 vaccines in lymphocytes at single-cell resolution with machine learning methods. Front Immunol 2023; 14:1131051. [PMID: 36936955 PMCID: PMC10017451 DOI: 10.3389/fimmu.2023.1131051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Accepted: 02/13/2023] [Indexed: 03/06/2023] Open
Abstract
The widely used ChAdOx1 nCoV-19 (ChAd) vector and BNT162b2 (BNT) mRNA vaccines have been shown to induce robust immune responses. Recent studies demonstrated that the immune responses of people who received one dose of ChAdOx1 and one dose of BNT were better than those of people who received two homologous ChAdOx1 or two BNT doses. However, how heterologous vaccines function has not been extensively investigated. In this study, single-cell RNA sequencing data from three classes of samples (volunteers vaccinated with heterologous ChAdOx1-BNT and volunteers vaccinated with homologous ChAd-ChAd and BNT-BNT vaccinations after 7 days) were divided into three types of immune cells (3654 B, 8212 CD4+ T, and 5608 CD8+ T cells). To identify differences in gene expression in various cell types induced by vaccines administered through different vaccination strategies, multiple advanced feature selection methods (max-relevance and min-redundancy, Monte Carlo feature selection, least absolute shrinkage and selection operator, light gradient boosting machine, and permutation feature importance) and classification algorithms (decision tree and random forest) were integrated into a computational framework. The feature selection methods analyzed the importance of gene features, yielding multiple gene lists. These lists were fed into incremental feature selection, incorporating decision tree and random forest, to extract essential genes and classification rules and to build efficient classifiers. Highly ranked genes include PLCG2, whose differential expression is important to the B cell immune pathway and is positively correlated with immune cells, such as CD8+ T cells, and B2M, which is associated with thymic T cell differentiation. This study contributes to the mechanistic explanation of results showing a stronger immune response to a heterologous ChAdOx1-BNT vaccination schedule than to two doses of either BNT or ChAdOx1, offering a theoretical foundation for vaccine modification.
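The incremental feature selection step described in the abstract can be sketched as follows: given a ranked gene list, a classifier is evaluated on the top 1, top 2, ... features, and the smallest prefix with the best score is kept. This is only an illustration under stated assumptions. A nearest-centroid classifier with leave-one-out evaluation is swapped in for the decision tree and random forest used in the study, and the toy matrix, labels and ranking are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to `sample`."""
    by_label = {}
    for row, label in zip(train_x, train_y):
        by_label.setdefault(label, []).append(row)
    best_label, best_dist = None, float("inf")
    for label, rows in by_label.items():
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(x, y):
    """Leave-one-out accuracy of the nearest-centroid classifier."""
    hits = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, x[i]) == y[i]
    return hits / len(x)


def incremental_feature_selection(x, y, ranked):
    """Score the top-k ranked features for every k; return the best prefix and all scores."""
    scores = []
    for k in range(1, len(ranked) + 1):
        subset = [[row[j] for j in ranked[:k]] for row in x]
        scores.append(loo_accuracy(subset, y))
    best_k = scores.index(max(scores)) + 1  # smallest k that reaches the top score
    return ranked[:best_k], scores


# Hypothetical toy data: 6 cells x 3 genes, two sample classes.
x = [[0, 7, 0], [1, 2, 0], [3, 5, 0], [2, 3, 4], [4, 8, 4], [5, 1, 4]]
y = ["A", "A", "A", "B", "B", "B"]
ranked = [0, 2, 1]  # assumed to come from an upstream ranking method
selected, scores = incremental_feature_selection(x, y, ranked)
```

On this toy input the first ranked gene alone misclassifies two cells, while the top two genes classify all six correctly, so the two-gene prefix is returned.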
Affiliation(s)
- Jing Li
  - School of Computer Science, Baicheng Normal University, Baicheng, Jilin, China
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- QingLan Ma
  - School of Life Sciences, Shanghai University, Shanghai, China
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) and Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Tao Huang
  - CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Science, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
6
Identification of Smoking-Associated Transcriptome Aberration in Blood with Machine Learning Methods. BioMed Research International 2023; 2023:5333361. [PMID: 36644165 PMCID: PMC9833906 DOI: 10.1155/2023/5333361] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 09/03/2022] [Revised: 12/15/2022] [Accepted: 12/15/2022] [Indexed: 01/06/2023]
Abstract
Long-term cigarette smoking causes various human diseases, including respiratory disease, cancer, and gastrointestinal (GI) disorders. Alterations in gene expression and variable splicing processes induced by smoking are associated with the development of diseases. This study applied advanced machine learning methods to identify the isoforms with important roles in distinguishing current smokers from former smokers based on the expression profile of isoforms from current and former smokers collected in one previous study. These isoforms were treated as features, which were first analyzed by Boruta to select features highly correlated with the target variables. Then, the selected features were evaluated by four feature ranking algorithms, resulting in four feature lists. The incremental feature selection method was applied to each list to obtain the optimal feature subsets and build high-performance classification models. Furthermore, a series of classification rules was obtained from the decision tree with the highest performance. Finally, the rationality of the mined isoforms (features) and classification rules was verified by reviewing previous research. Features such as isoforms ENST00000464835 (expressed by LRRN3), ENST00000622663 (expressed by SASH1), and ENST00000284311 (expressed by GPR15), and pathways (cytotoxicity mediated by natural killer cell and cytokine-cytokine receptor interaction) revealed by the enrichment analysis, were highly relevant to smoking response, suggesting the robustness of our analysis pipeline.
7
Xu Y, Huang F, Guo W, Feng K, Zhu L, Zeng Z, Huang T, Cai YD. Characterization of chromatin accessibility patterns in different mouse cell types using machine learning methods at single-cell resolution. Front Genet 2023; 14:1145647. [PMID: 36936430 PMCID: PMC10014730 DOI: 10.3389/fgene.2023.1145647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 02/20/2023] [Indexed: 03/06/2023] Open
Abstract
Chromatin accessibility is a generic property of the eukaryotic genome, which refers to the degree of physical compaction of chromatin. Recent studies have shown that chromatin accessibility is cell type dependent, indicating chromatin heterogeneity across cell lines and tissues. The identification of markers used to distinguish cell types at the chromosome level is important to understand cell function and classify cell types. In the present study, we investigated transcriptionally active chromosome segments identified by sci-ATAC-seq at single-cell resolution, including 69,015 cells belonging to 77 different cell types. Each cell was represented by existence status on 20,783 genes that were obtained from 436,206 active chromosome segments. The gene features were analyzed by Boruta, resulting in 3897 genes, which were ranked in a list by Monte Carlo feature selection. This list was further analyzed by the incremental feature selection (IFS) method, yielding essential genes, classification rules and an efficient random forest (RF) classifier. To improve the performance of the optimal RF classifier, its features were further processed by an autoencoder, light gradient boosting machine and the IFS method. The final RF classifier, with an MCC of 0.838, was constructed. Some marker genes such as H2-Dmb2, which is specifically expressed in antigen-presenting cells (e.g., dendritic cells or macrophages), and Tenm2, which is specifically expressed in T cells, were identified in this study. Our analysis revealed numerous potential epigenetic modification patterns that are unique to particular cell types, thereby advancing knowledge of the critical functions of chromatin accessibility in cell processes.
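The MCC quoted for the final classifier is the Matthews correlation coefficient, which for more than two classes is commonly computed from the counts of true and predicted labels per class. The sketch below uses that standard multiclass generalization; whether the study computed it in exactly this way is an assumption, and the function name is hypothetical.

```python
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient for any number of classes."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = [sum(t == k for t in y_true) for k in labels]
    pred_counts = [sum(p == k for p in y_pred) for k in labels]
    cov_tp = correct * n - sum(t * p for t, p in zip(true_counts, pred_counts))
    cov_tt = n * n - sum(t * t for t in true_counts)
    cov_pp = n * n - sum(p * p for p in pred_counts)
    if cov_tt == 0 or cov_pp == 0:
        return 0.0  # undefined when only one class occurs; 0 by convention
    return cov_tp / sqrt(cov_tt * cov_pp)


# Hypothetical labels for illustration only.
perfect = multiclass_mcc(["a", "b", "c", "a"], ["a", "b", "c", "a"])
partial = multiclass_mcc([1, 1, 0, 0], [1, 0, 0, 0])
```

With two classes this expression reduces to the familiar binary formula, which is why the second call can be checked against (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).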
Affiliation(s)
- Yaochen Xu
  - Department of Mathematics, School of Sciences, Shanghai University, Shanghai, China
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) and Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, China
- KaiYan Feng
  - Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou, China
- Lin Zhu
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Zhenbing Zeng
  - Department of Mathematics, School of Sciences, Shanghai University, Shanghai, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China

*Correspondence: Zhenbing Zeng; Tao Huang; Yu-Dong Cai
8
Jian F, Huang F, Zhang YH, Huang T, Cai YD. Identifying anal and cervical tumorigenesis-associated methylation signaling with machine learning methods. Front Oncol 2022; 12:998032. [PMID: 36249027 PMCID: PMC9557006 DOI: 10.3389/fonc.2022.998032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 09/14/2022] [Indexed: 11/13/2022] Open
Abstract
Cervical and anal carcinoma are neoplastic diseases with various intraepithelial neoplasia stages. The underlying mechanisms for cancer initiation and progression have not been fully revealed. DNA methylation has been shown to be aberrantly regulated during tumorigenesis in anal and cervical carcinoma, revealing the important roles of DNA methylation signaling as a biomarker to distinguish cancer stages in clinics. In this research, several machine learning methods were used to analyze the methylation profiles on anal and cervical carcinoma samples, which were divided into three classes representing various stages of tumor progression. Advanced feature selection methods, including Boruta, LASSO, LightGBM, and MCFS, were used to select methylation features that are highly correlated with cancer progression. Some methylation probes including cg01550828 and its corresponding gene RNF168 have been reported to be associated with human papilloma virus-related anal cancer. As for biomarkers for cervical carcinoma, cg27012396 and its functional gene HDAC4 were confirmed to regulate the glycolysis and survival of hypoxic tumor cells in cervical carcinoma. Furthermore, we developed effective classifiers for identifying various tumor stages and derived classification rules that reflect the quantitative impact of methylation on tumorigenesis. The current study identified methylation signals associated with the development of cervical and anal carcinoma at qualitative and quantitative levels using advanced machine learning methods.
Affiliation(s)
- Fangfang Jian
  - Department of Obstetrics & Gynecology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- FeiMing Huang
  - School of Life Sciences, Shanghai University, Shanghai, China
- Yu-Hang Zhang
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China

*Correspondence: Tao Huang; Yu-Dong Cai
9
Beheshti Z. BMPA-TVSinV: A Binary Marine Predators Algorithm using time-varying sine and V-shaped transfer functions for wrapper-based feature selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
10
Li Z, Wang D, Guo W, Zhang S, Chen L, Zhang YH, Lu L, Pan X, Huang T, Cai YD. Identification of cortical interneuron cell markers in mouse embryos based on machine learning analysis of single-cell transcriptomics. Front Neurosci 2022; 16:841145. [PMID: 35911980 PMCID: PMC9337837 DOI: 10.3389/fnins.2022.841145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2021] [Accepted: 06/28/2022] [Indexed: 11/13/2022] Open
Abstract
Mammalian cortical interneurons (CINs) can be classified into more than two dozen cell types that possess diverse electrophysiological and molecular characteristics and participate in various essential biological processes in the human neural system. However, the mechanism that generates diversity in CINs remains controversial. This study aims to predict CIN diversity in mouse embryos by using single-cell transcriptomics and machine learning methods. Data from 2,669 single-cell transcriptome sequencing results were employed. The 2,669 cells were classified into three categories, caudal ganglionic eminence (CGE) cells, dorsal medial ganglionic eminence (dMGE) cells, and ventral medial ganglionic eminence (vMGE) cells, corresponding to the three regions in the mouse subpallium where the cells were collected. These transcriptomic profiles were first analyzed by the minimum redundancy and maximum relevance method. A feature list was obtained, which was then fed into incremental feature selection, incorporating two classification algorithms (random forest and repeated incremental pruning to produce error reduction), to extract key genes and construct powerful classifiers and classification rules. The optimal classifier achieved an MCC of 0.725 and category-specific prediction accuracies of 0.958, 0.760, and 0.737 for the CGE, dMGE, and vMGE cells, respectively. The related genes and rules may provide helpful information for deepening the understanding of CIN diversity.
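The minimum redundancy and maximum relevance ranking named in the abstract can be illustrated with a greedy selection that, at each step, picks the feature with the highest mutual information with the class label minus its mean mutual information with the features already chosen. This is a minimal sketch assuming discretized expression values and the difference form of the criterion; the study's exact variant is not stated in the abstract, and all names and toy values are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[u] * count_b[v]))
        for (u, v), c in count_ab.items()
    )


def mrmr_rank(features, labels, n_select):
    """Greedy mRMR: maximize relevance minus mean redundancy at each step."""
    relevance = [mutual_information(f, labels) for f in features]
    selected = []
    remaining = list(range(len(features)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(features[j], features[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < n_select:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical binarized expression of three genes across eight cells.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
gene_a = [0, 0, 0, 0, 1, 1, 1, 0]
gene_a_copy = list(gene_a)          # fully redundant with gene_a
gene_b = [0, 1, 0, 0, 1, 1, 0, 1]   # weaker but complementary signal
order = mrmr_rank([gene_a, gene_a_copy, gene_b], labels, n_select=2)
```

In this toy case the duplicate gene is as relevant as the first pick but fully redundant with it, so the weaker yet complementary gene is chosen second.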
Affiliation(s)
- Zhandong Li
  - College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China
- Deling Wang
  - State Key Laboratory of Oncology in South China, Department of Radiology, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
- Wei Guo
  - Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Shiqi Zhang
  - Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Yu-Hang Zhang
  - Channing Division of Network Medicine, Harvard Medical School, Brigham and Women’s Hospital, Boston, MA, United States
- Lin Lu
  - Department of Radiology, Columbia University Irving Medical Center, New York, NY, United States
- XiaoYong Pan
  - Key Laboratory of System Control and Information Processing, Ministry of Education of China, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
- Tao Huang
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China

*Correspondence: Tao Huang; Yu-Dong Cai
11
Huang F, Chen L, Guo W, Zhou X, Feng K, Huang T, Cai Y. Identifying COVID-19 Severity-Related SARS-CoV-2 Mutation Using a Machine Learning Method. Life (Basel) 2022; 12:806. [PMID: 35743837 PMCID: PMC9225528 DOI: 10.3390/life12060806] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/12/2022] [Revised: 05/22/2022] [Accepted: 05/25/2022] [Indexed: 12/22/2022] Open
Abstract
SARS-CoV-2 shows great evolutionary capacity through a high frequency of genomic variation during transmission. Evolved SARS-CoV-2 often demonstrates resistance to previous vaccines and can cause poor clinical status in patients. Mutations in the SARS-CoV-2 genome involve mutations in structural and nonstructural proteins, and some of these proteins such as spike proteins have been shown to be directly associated with the clinical status of patients with severe COVID-19 pneumonia. In this study, we collected genome-wide mutation information of virulent strains and the severity of COVID-19 pneumonia in patients varying depending on their clinical status. Important protein mutations and untranslated region mutations were extracted using machine learning methods. First, through Boruta and four ranking algorithms (least absolute shrinkage and selection operator, light gradient boosting machine, max-relevance and min-redundancy, and Monte Carlo feature selection), mutations that were highly correlated with the clinical status of the patients were screened out and sorted in four feature lists. Some mutations such as D614G and V1176F were shown to be associated with viral infectivity. Moreover, previously unreported mutations such as A320V of nsp14 and I164ILV of nsp14 were also identified, which suggests their potential roles. We then applied the incremental feature selection method to each feature list to construct efficient classifiers, which can be directly used to distinguish the clinical status of COVID-19 patients. Meanwhile, four sets of quantitative rules were set up, which can help us to more intuitively understand the role of each mutation in differentiating the clinical status of COVID-19 patients. Identified key mutations linked to virologic properties will help better understand the mechanisms of infection and will aid in the development of antiviral treatments.
Affiliation(s)
- Feiming Huang
- School of Life Sciences, Shanghai University, Shanghai 200444, China;
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China;
| | - Wei Guo
- Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) and Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200025, China;
| | - Xianchao Zhou
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai 200025, China;
| | - Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510060, China;
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yudong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China;
|
12
|
Feng CH, Disis ML, Cheng C, Zhang L. Multimetric feature selection for analyzing multicategory outcomes of colorectal cancer: random forest and multinomial logistic regression models. J Transl Med 2022; 102:236-244. [PMID: 34537824 DOI: 10.1038/s41374-021-00662-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/05/2021] [Revised: 08/10/2021] [Accepted: 08/12/2021] [Indexed: 11/09/2022]
Abstract
Colorectal cancer (CRC) is one of the most common cancers worldwide, and a leading cause of cancer deaths. Better classifying multicategory outcomes of CRC with clinical and omic data may help adjust treatment regimens based on individual's risk. Here, we selected the features that were useful for classifying four-category survival outcome of CRC using the clinical and transcriptomic data, or clinical, transcriptomic, microsatellite instability and selected oncogenic-driver data (all data) of TCGA. We also optimized multimetric feature selection to develop the best multinomial logistic regression (MLR) and random forest (RF) models that had the highest accuracy, precision, recall and F1 score, respectively. We identified 2073 differentially expressed genes of the TCGA RNASeq dataset. MLR overall outperformed RF in the multimetric feature selection. In both RF and MLR models, precision, recall and F1 score increased as the feature number increased and peaked at the feature number of 600-1000, while the models' accuracy remained stable. The best model was the MLR one with 825 features based on sum of squared coefficients using all data, and attained the best accuracy of 0.855, F1 of 0.738 and precision of 0.832, which were higher than those using clinical and transcriptomic data. The top-ranked features in the MLR model of the best performance using clinical and transcriptomic data were different from those using all data. However, pathologic staging, HBS1L, TSPYL4, and TP53TG3B were the overlapping top-20 ranked features in the best models using clinical and transcriptomic, or all data. Thus, we developed a multimetric feature-selection based MLR model that outperformed RF models in classifying four-category outcome of CRC patients. Interestingly, adding microsatellite instability and oncogenic-driver data to clinical and transcriptomic data improved models' performances. 
Precision and recall of tuned algorithms may change significantly as the feature number changes, but accuracy appears not sensitive to these changes.
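The four metrics named in this abstract can be computed for a multicategory outcome as in the sketch below. Macro averaging over classes is an assumption (the abstract does not state the averaging scheme), and the labels and predictions are hypothetical. The toy case shows one way accuracy can look stable while precision and recall are much lower.

```python
def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # Undefined ratios (no predictions or no true samples) are scored as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical imbalanced example: the model always predicts the majority class.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10
scores = macro_scores(y_true, y_pred)
print(scores)
```

Here accuracy is 0.8 even though class "B" is never predicted, which drags the macro-averaged precision and recall down to 0.4 and 0.5.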
Affiliation(s)
| | - Mary L Disis
- UW Medicine Cancer Vaccine Institute, University of Washington, Seattle, WA, USA
| | - Chao Cheng
- Department of Medicine, Section of Epidemiology and Population Sciences, Baylor College of Medicine, Houston, TX, USA.,Department of Medicine, Baylor College of Medicine, Houston, TX, USA.,Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
| | - Lanjing Zhang
- Department of Biological Sciences, Rutgers University, Newark, NJ, USA. .,Department of Pathology, Princeton Medical Center, Plainsboro, NJ, USA. .,Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA. .,Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ, USA.
|
13
|
Guo D, Fan Y, Yue JR, Lin T. A regulatory miRNA-mRNA network is associated with transplantation response in acute kidney injury. Hum Genomics 2021; 15:69. [PMID: 34886903 PMCID: PMC8656037 DOI: 10.1186/s40246-021-00363-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/28/2020] [Accepted: 10/11/2021] [Indexed: 02/08/2023]
Abstract
Background: Acute kidney injury (AKI) is a life-threatening complication characterized by rapid decline in renal function, which frequently occurs after transplantation surgery. However, the molecular mechanism underlying the development of post-transplant (post-Tx) AKI still remains unknown. An increasing number of studies have demonstrated that certain microRNAs (miRNAs) exert crucial functions in AKI. The present study sought to elucidate the molecular mechanisms in post-Tx AKI by constructing a regulatory miRNA–mRNA network.
Results: Based on two datasets (GSE53771 and GSE53769), three key modules, which contained 55 mRNAs, 76 mRNAs, and 151 miRNAs, were identified by performing weighted gene co-expression network analysis (WGCNA). The miRDIP v4.1 was applied to predict the interactions of key module mRNAs and miRNAs, and the miRNA–mRNA pairs with confidence of more than 0.2 were selected to construct a regulatory miRNA–mRNA network by Cytoscape. The miRNA–mRNA network consisted of 82 nodes (48 mRNAs and 34 miRNAs) and 125 edges. Two miRNAs (miR-203a-3p and miR-205-5p) and ERBB4 with higher node degrees compared with other nodes might play a central role in post-Tx AKI. Additionally, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis indicated that this network was mainly involved in kidney-/renal-related functions and PI3K–Akt/HIF-1/Ras/MAPK signaling pathways.
Conclusion: We constructed a regulatory miRNA–mRNA network to provide novel insights into post-Tx AKI development, which might help discover new biomarkers or therapeutic drugs for enhancing the ability for early prediction and intervention and decreasing mortality rate of AKI after transplantation.
Supplementary Information: The online version contains supplementary material available at 10.1186/s40246-021-00363-y.
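The network construction summarized in this abstract (keep predicted pairs above a confidence cutoff, then rank nodes by degree) can be sketched in a few lines. The pair names and scores below are hypothetical placeholders, not values from miRDIP or from the study; only the 0.2 cutoff comes from the abstract.

```python
from collections import Counter


def build_network(pairs, min_confidence=0.2):
    """Keep miRNA-mRNA pairs whose confidence is strictly above the cutoff."""
    return [(mirna, mrna) for mirna, mrna, conf in pairs if conf > min_confidence]


def node_degrees(edges):
    """Count how many edges touch each node (miRNA or mRNA)."""
    degree = Counter()
    for mirna, mrna in edges:
        degree[mirna] += 1
        degree[mrna] += 1
    return degree


# Hypothetical predicted interactions: (miRNA, mRNA, confidence score).
pairs = [
    ("miR-A", "GENE1", 0.9),
    ("miR-A", "GENE2", 0.5),
    ("miR-B", "GENE1", 0.3),
    ("miR-B", "GENE3", 0.1),  # dropped: below the cutoff
    ("miR-C", "GENE1", 0.2),  # dropped: not more than 0.2
]

edges = build_network(pairs)
degrees = node_degrees(edges)
print(degrees.most_common(2))
```

Nodes with the highest degree are the hub candidates, which is the sense in which the abstract singles out miR-203a-3p, miR-205-5p and ERBB4.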
Affiliation(s)
- Duan Guo
- Department of Palliative Medicine, West China School of Public Health and West China fourth Hospital, Sichuan University, Chengdu, 610041, China.,Palliative Medicine Research Center, West China-PUMC C.C. Chen Institute of Health, Sichuan University, Chengdu, 610041, China
| | - Yu Fan
- Department of Urology, National Clinical Research Center for Geriatrics and Organ Transplantation Center, West China Hospital of Sichuan University, No. 37 Guoxue Xiang, Chengdu, 610041, China
| | - Ji-Rong Yue
- Department of Geriatrics and National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu, 610041, China.
| | - Tao Lin
- Department of Urology, National Clinical Research Center for Geriatrics and Organ Transplantation Center, West China Hospital of Sichuan University, No. 37 Guoxue Xiang, Chengdu, 610041, China.
|
14
|
A survey on artificial intelligence techniques for chronic diseases: open issues and challenges. Artif Intell Rev 2021. [DOI: 10.1007/s10462-021-10084-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023]
|
15
|
Chen L, Li Z, Zeng T, Zhang YH, Feng K, Huang T, Cai YD. Identifying COVID-19-Specific Transcriptomic Biomarkers with Machine Learning Methods. BIOMED RESEARCH INTERNATIONAL 2021; 2021:9939134. [PMID: 34307679 PMCID: PMC8272456 DOI: 10.1155/2021/9939134] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/17/2021] [Revised: 06/03/2021] [Accepted: 06/24/2021] [Indexed: 12/11/2022]
Abstract
COVID-19, a severe respiratory disease caused by the novel coronavirus SARS-CoV-2, has been spreading all over the world. Patients infected with SARS-CoV-2 may show no symptoms, i.e., presymptomatic and asymptomatic patients. Both groups can still spread the virus to other susceptible people, thereby making the control of COVID-19 difficult. The two major challenges for COVID-19 diagnosis at present are as follows: (1) patients may share similar symptoms with other respiratory infections, and (2) patients may not have any symptoms but can still spread the virus. Therefore, new biomarkers at different omics levels are required for the large-scale screening and diagnosis of COVID-19. Although some initial analyses identified a group of candidate gene biomarkers for COVID-19, the previous work could not identify biomarkers suitable for clinical use, which requires a disease-specific diagnosis that separates COVID-19 from other infectious diseases. As an extension of the previous study, optimized machine learning models were applied in the present study to identify specific qualitative host biomarkers associated with COVID-19 infection on the basis of a publicly released transcriptomic dataset, which included healthy controls and patients with bacterial infection, influenza, COVID-19, and other kinds of coronavirus. This dataset was first analysed by the Boruta and Max-Relevance and Min-Redundancy feature selection methods one after the other, resulting in a feature list. This list was fed into the incremental feature selection method, incorporating one of the classification algorithms to extract essential biomarkers and build efficient classifiers and classification rules. The capacity of these findings to distinguish COVID-19 from other similar respiratory infectious diseases at the transcriptomic level was also validated, which may improve the efficacy and accuracy of COVID-19 diagnosis.
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Zhandong Li
- College of Food Engineering, Jilin Engineering Normal University, Changchun 130052, China
| | - Tao Zeng
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yu-Hang Zhang
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - KaiYan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China
|
16
|
CWLy-RF: A novel approach for identifying cell wall lyases based on random forest classifier. Genomics 2021; 113:2919-2924. [PMID: 34186189 DOI: 10.1016/j.ygeno.2021.06.038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/15/2021] [Revised: 06/20/2021] [Accepted: 06/25/2021] [Indexed: 02/05/2023]
Abstract
Drug resistance of pathogenic bacteria has become increasingly serious due to the abuse of antibiotics in recent years. Researchers have found that cell wall lyases are effective antibacterial agents that can specifically recognize target bacteria and degrade bacterial peptidoglycan. Traditional wet experiments are usually expensive, time-consuming and laborious for the identification of lyases. Therefore, there is an urgent need to develop prediction tools based on computer methods to identify lyases quickly and accurately. In this paper, a new predictor, CWLy-RF, is proposed based on the random forest (RF) algorithm to identify cell wall lyases. In this method, we combined three features, namely, 400D, 188D and the composition of k-spaced amino acid group pairs, using mixed-feature representation methods. Afterward, we improved the feature representation ability with the selected top 100 features by using the information gain method and trained a predictive model using RF. The constructed prediction model is evaluated by using 10-fold cross-validation. The accuracy obtained was 96.09%, the AUC was 0.993, the MCC was 0.922, the sensitivity was 94.92%, and the specificity was 97.32%. We have proved that the proposed predictor CWLy-RF is superior to other latest models, and it will hopefully become an effective and useful tool for identifying lyases.
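The evaluation metrics reported in this abstract (accuracy, sensitivity, specificity and MCC) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the counts are hypothetical and do not reproduce the paper's results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for a lyase vs non-lyase classifier.
metrics = binary_metrics(tp=45, fp=10, tn=40, fn=5)
print(metrics)
```

MCC uses all four cells, so it is less easily inflated by class imbalance than accuracy alone.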
|
17
|
Giamas G, Gagliano T. Cancer gene therapy 2020: highlights from a challenging year. Cancer Gene Ther 2021; 29:1-3. [PMID: 33963297 PMCID: PMC8103066 DOI: 10.1038/s41417-021-00340-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/11/2021] [Revised: 04/04/2021] [Accepted: 04/13/2021] [Indexed: 11/09/2022]
Affiliation(s)
- Georgios Giamas
- Department of Biochemistry and Biomedicine, School of Life Sciences, University of Sussex, Falmer, Brighton, UK.
| | - Teresa Gagliano
- Department of Medical Science, University of Udine, Udine, Italy.
|
18
|
Sánchez-Corrales YE, Pohle RVC, Castellano S, Giustacchini A. Taming Cell-to-Cell Heterogeneity in Acute Myeloid Leukaemia With Machine Learning. Front Oncol 2021; 11:666829. [PMID: 33996595 PMCID: PMC8117935 DOI: 10.3389/fonc.2021.666829] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/11/2021] [Accepted: 04/06/2021] [Indexed: 12/21/2022]
Abstract
Acute Myeloid Leukaemia (AML) is a phenotypically and genetically heterogeneous blood cancer characterised by very poor prognosis, with disease relapse being the primary cause of treatment failure. AML heterogeneity arises from different genetic and non-genetic sources, including its proposed hierarchical structure, with leukemic stem cells (LSCs) and progenitors giving rise to a variety of more mature leukemic subsets. Recent advances in single-cell molecular and phenotypic profiling have highlighted the intra- and inter-patient heterogeneity of AML, which has so far limited the success of cell-based immunotherapy approaches against single targets. Machine Learning (ML) can be uniquely used to find non-trivial patterns in high-dimensional datasets and identify rare sub-populations. Here we review some recent ML tools that, applied to single-cell data, could help disentangle cell heterogeneity in AML by identifying distinct core molecular signatures of leukemic cell subsets. We discuss the advantages and limitations of unsupervised and supervised ML approaches to cluster and classify cell populations in AML, for the identification of biomarkers and the design of personalised therapies.
Affiliation(s)
- Yara E. Sánchez-Corrales
- Genetics and Genomic Medicine Department, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - Ruben V. C. Pohle
- Molecular and Cellular Immunology Section, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - Sergi Castellano
- Genetics and Genomic Medicine Department, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
- University College London (UCL) Genomics, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - Alice Giustacchini
- Molecular and Cellular Immunology Section, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
|
19
|
Peng X, Chen L, Zhou JP. Identification of Carcinogenic Chemicals with Network Embedding and Deep Learning Methods. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200414084317] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/21/2023]
Abstract
Background: Cancer is the second leading cause of human death in the world. To date, many factors have been confirmed to be the cause of cancer. Among them, carcinogenic chemicals have been widely accepted as the important ones. Traditional methods for detecting carcinogenic chemicals are of low efficiency and high cost.
Objective: The aim of this study was to design an efficient computational method for the identification of carcinogenic chemicals.
Methods: A new computational model was proposed for detecting carcinogenic chemicals. As a data-driven model, carcinogenic and non-carcinogenic chemicals were obtained from Carcinogenic Potency Database (CPDB). These chemicals were represented by features extracted from five chemical networks, representing five types of chemical associations, via a network embedding method, Mashup. Obtained features were fed into a powerful deep learning method, recurrent neural network, to build the model.
Results: The jackknife test on such model provided the F-measure of 0.971 and AUROC of 0.971.
Conclusion: The proposed model was quite effective and was superior to the models with traditional machine learning algorithms, classic chemical encoding schemes or direct usage of chemical associations.
Affiliation(s)
- Xuefei Peng
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Jian-Peng Zhou
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
|
20
|
Yu X, Pan X, Zhang S, Zhang YH, Chen L, Wan S, Huang T, Cai YD. Identification of Gene Signatures and Expression Patterns During Epithelial-to-Mesenchymal Transition From Single-Cell Expression Atlas. Front Genet 2021; 11:605012. [PMID: 33584803 PMCID: PMC7876317 DOI: 10.3389/fgene.2020.605012] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/19/2020] [Accepted: 12/21/2020] [Indexed: 11/13/2022]
Abstract
Cancer, which refers to abnormal cell proliferative diseases with systematic pathogenic potential, is one of the leading threats to human health. The final causes for patients’ deaths are usually cancer recurrence, metastasis, and drug resistance against continuing therapy. Epithelial-to-mesenchymal transition (EMT), which is the transformation of tumor cells (TCs), is a prerequisite for pathogenic cancer recurrence, metastasis, and drug resistance. Conventional biomarkers can only define and recognize large tissues with obvious EMT markers but cannot accurately monitor detailed EMT processes. In this study, a systematic workflow was established integrating effective feature selection, multiple machine learning models [Random forest (RF), Support vector machine (SVM)], rule learning, and functional enrichment analyses to find new biomarkers and their functional implications for distinguishing single-cell isolated TCs with unique epithelial or mesenchymal markers using public single-cell expression profiling. Our discovered signatures may provide an effective and precise transcriptomic reference to monitor EMT progression at the single-cell level and contribute to the exploration of detailed tumorigenesis mechanisms during EMT.
Affiliation(s)
- Xiangtian Yu
- Clinical Research Center, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| | - XiaoYong Pan
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
| | - ShiQi Zhang
- Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
| | - Yu-Hang Zhang
- CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China.,Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, China
| | - Sibao Wan
- School of Life Sciences, Shanghai University, Shanghai, China
| | - Tao Huang
- CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
|
21
|
Li JF, Ma XJ, Ying LL, Tong YH, Xiang XP. Multi-Omics Analysis of Acute Lymphoblastic Leukemia Identified the Methylation and Expression Differences Between BCP-ALL and T-ALL. Front Cell Dev Biol 2021; 8:622393. [PMID: 33553159 PMCID: PMC7859262 DOI: 10.3389/fcell.2020.622393] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/28/2020] [Accepted: 12/15/2020] [Indexed: 02/06/2023]
Abstract
Acute lymphoblastic leukemia (ALL) is a common, heterogeneous cancer that is mainly divided into BCP-ALL and T-ALL, accounting for 80–85% and 15–20% of cases, respectively. There are many differences between BCP-ALL and T-ALL, including prognosis, treatment, drug screening, gene research and so on. In this study, starting with methylation and gene expression data, we analyzed the molecular differences between BCP-ALL and T-ALL and identified the multi-omics signatures using Boruta and Monte Carlo feature selection methods. There were 7 expression signature genes (CD3D, VPREB3, HLA-DRA, PAX5, BLNK, GALNT6, SLC4A8) and 168 methylation sites corresponding to 175 methylation signature genes. The overall accuracy and the accuracies for BCP-ALL and T-ALL of the RIPPER (Repeated Incremental Pruning to Produce Error Reduction) classifier using these signatures, evaluated with 10-fold cross validation repeated 3 times, were 0.973, 0.990, and 0.933, respectively. Two genes overlapped between the 175 methylation signature genes and the 7 expression signature genes: CD3D and VPREB3. The network analysis of the methylation and expression signature genes suggested that their common gene, CD3D, not only differed at both the methylation and expression levels but also played a key regulatory role as a hub in the network. Our results provide insights into the underlying molecular mechanisms of ALL and may facilitate more precise diagnosis and treatment of ALL.
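The evaluation scheme mentioned in this abstract, 10-fold cross validation repeated 3 times, amounts to reshuffling the samples before each repetition and holding out each fold once. The index generator below is a minimal sketch under that reading; the function name, the seed handling and the interleaved fold assignment are assumptions, and no classifier is included.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=3, seed=0):
    """Yield (repetition, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh shuffle for every repetition
        folds = [idx[i::k] for i in range(k)]  # k near-equal, disjoint folds
        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            yield rep, fold, train_idx, test_idx


# Small example: 25 samples, 5 folds, 3 repetitions -> 15 train/test splits.
splits = list(repeated_kfold(25, k=5, repeats=3, seed=0))
print(len(splits))
```

Per-class and overall accuracies would then be averaged over all splits, with every sample held out exactly once per repetition.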
Affiliation(s)
- Jin-Fan Li
- Department of Pathology, The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Xiao-Jing Ma
- Department of Pathology, The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Lin-Lin Ying
- Department of Pathology, The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Ying-Hui Tong
- Department of Pharmacy, Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institute of Cancer and Basic Medicine (IBMC), Chinese Academy of Sciences, Hangzhou, China
| | - Xue-Ping Xiang
- Department of Pathology, The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
|
22
|
Gu C, Shi X, Dang X, Chen J, Chen C, Chen Y, Pan X, Huang T. Identification of Common Genes and Pathways in Eight Fibrosis Diseases. Front Genet 2021; 11:627396. [PMID: 33519923 PMCID: PMC7844395 DOI: 10.3389/fgene.2020.627396] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 11/09/2020] [Accepted: 12/15/2020] [Indexed: 01/05/2023]
Abstract
Acute and chronic inflammation often leads to fibrosis, which is also the common and final pathological outcome of chronic inflammatory diseases. To explore the common genes and pathogenic pathways among different fibrotic diseases, we collected all the reported genes of eight fibrotic diseases: eye fibrosis, heart fibrosis, hepatic fibrosis, intestinal fibrosis, lung fibrosis, pancreas fibrosis, renal fibrosis, and skin fibrosis. We calculated the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichment scores of all fibrotic disease genes. Each gene was encoded using KEGG and GO enrichment scores, which reflected how much a gene can affect each function. For each fibrotic disease, by comparing the KEGG and GO enrichment scores between reported disease genes and other genes using the Monte Carlo feature selection (MCFS) method, the key KEGG and GO features were identified. We compared the gene overlaps among the eight fibrotic diseases, and connective tissue growth factor (CTGF) was finally identified as the common key molecule. The key KEGG and GO features of the eight fibrotic diseases were all screened by the MCFS method. Interestingly, we also found overlaps of pathways between renal fibrosis and skin fibrosis, such as GO:1901890-positive regulation of cell junction assembly, as well as common regulatory genes, such as CTGF, which is the key molecule regulating fibrogenesis. We hope to offer new insight into the cellular and molecular mechanisms underlying fibrosis and thereby help lead to the development of new drugs that specifically delay or even improve the symptoms of fibrosis.
Affiliation(s)
- Chang Gu
- Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
| | - Xin Shi
- Department of Cardiology, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
| | - Xuening Dang
- Department of Colorectal and Anal Surgery, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shanghai Colorectal Cancer Research Center, Shanghai, China
| | - Jiafei Chen
- Department of Thoracic Surgery, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, China
| | - Chunji Chen
- Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
| | - Yumei Chen
- Department of Nuclear Medicine, Ren Ji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Xufeng Pan
- Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
|
23
|
Li D, Lin H, Li L. Multiple Feature Selection Strategies Identified Novel Cardiac Gene Expression Signature for Heart Failure. Front Physiol 2020; 11:604241. [PMID: 33304275 PMCID: PMC7693561 DOI: 10.3389/fphys.2020.604241] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/09/2020] [Accepted: 10/15/2020] [Indexed: 02/02/2023]
Abstract
Heart failure (HF) is a serious condition in which the blood pumped by the heart is insufficient to meet the demands of the body at a normal cardiac filling pressure. Approximately 26 million patients worldwide suffer from heart failure; about 17–45% of patients admitted to a hospital with heart failure die within 1 year, and the majority die within 5 years. The molecular mechanisms underlying the progression of heart failure have been poorly studied. We compared the gene expression profiles between patients with heart failure (n = 177) and without heart failure (n = 136) using multiple feature selection strategies and identified 38 HF signature genes. The support vector machine (SVM) classifier based on these 38 genes, evaluated with leave-one-out cross validation (LOOCV), achieved a sensitivity of 0.983 and a specificity of 0.963. The network analysis suggested that the hub gene SMOC2 may play important roles in HF. Other genes, such as FCN3, HMGN2, and SERPINA3, also showed great promise. Our results can facilitate the early detection of heart failure and can reveal its molecular mechanisms.
Affiliation(s)
- Dan Li
- Department of Cardiovascular Medicine, First Hospital Affiliated to Harbin Medical University, Harbin, China
| | - Hong Lin
- Internal Medicine-Cardiovascular Department, Harbin Chest Hospital, Harbin, China
| | - Luyifei Li
- Department of Cardiovascular Medicine, First Hospital Affiliated to Harbin Medical University, Harbin, China
|
24
|
Eckardt JN, Bornhäuser M, Wendt K, Middeke JM. Application of machine learning in the management of acute myeloid leukemia: current practice and future prospects. Blood Adv 2020; 4:6077-6085. [PMID: 33290546 PMCID: PMC7724910 DOI: 10.1182/bloodadvances.2020002997] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 07/20/2020] [Accepted: 10/26/2020] [Indexed: 12/19/2022]
Abstract
Machine learning (ML) is rapidly emerging in several fields of cancer research. ML algorithms can deal with vast amounts of medical data and provide a better understanding of malignant disease. Its ability to process information from different diagnostic modalities and functions to predict prognosis and suggest therapeutic strategies indicates that ML is a promising tool for the future management of hematologic malignancies; acute myeloid leukemia (AML) is a model disease of various recent studies. An integration of these ML techniques into various applications in AML management can assure fast and accurate diagnosis as well as precise risk stratification and optimal therapy. Nevertheless, these techniques come with various pitfalls and need a strict regulatory framework to ensure safe use of ML. This comprehensive review highlights and discusses recent advances in ML techniques in the management of AML as a model disease of hematologic neoplasms, enabling researchers and clinicians alike to critically evaluate this upcoming, potentially practice-changing technology.
Affiliation(s)
- Jan-Niklas Eckardt
- Department of Internal Medicine I, University Hospital Carl Gustav Carus, Dresden, Germany
| | - Martin Bornhäuser
- Department of Internal Medicine I, University Hospital Carl Gustav Carus, Dresden, Germany
- National Center for Tumor Diseases, Dresden (NCT/UCC), Dresden, Germany
- German Consortium for Translational Cancer Research, DKFZ, Heidelberg, Germany
| | - Karsten Wendt
- Institute of Circuits and Systems, Technical University Dresden, Dresden, Germany
| | - Jan Moritz Middeke
- Department of Internal Medicine I, University Hospital Carl Gustav Carus, Dresden, Germany
|
25
|
Wu Z, Shou L, Wang J, Huang T, Xu X. The Methylation Pattern for Knee and Hip Osteoarthritis. Front Cell Dev Biol 2020; 8:602024. [PMID: 33240895 PMCID: PMC7677303 DOI: 10.3389/fcell.2020.602024] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/02/2020] [Accepted: 10/22/2020] [Indexed: 01/08/2023]
Abstract
Osteoarthritis (OA) is one of the most prevalent chronic joint diseases in middle-aged and elderly people, but in recent years the number of young people suffering from the disease has increased quickly. Osteoarthritis is a common degenerative disease caused by the combination and interaction of many factors, such as natural and environmental factors. DNA methylation reflects the effects of environmental factors. Several studies of DNA methylation at specific genes in OA cartilage have indicated the potentially important role of DNA methylation in OA. To systematically investigate the methylation pattern in knee and hip osteoarthritis, we analyzed the methylation profiles in cartilage of 16 OA hip samples, 19 control hip samples and 62 OA knee samples. Twelve discriminative methylation sites were identified using the minimal Redundancy Maximal Relevance (mRMR) and Incremental Feature Selection (IFS) methods. An SVM classifier built on these 12 methylation sites, from genes such as MEIS1, GABRG3, RXRA, and EN1, perfectly classified the OA hip, control hip and OA knee samples when evaluated with leave-one-out cross-validation (LOOCV). These 12 methylation sites can not only serve as biomarkers but also shed light on the underlying mechanisms of OA.
Affiliation(s)
- Zhen Wu
- Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
| | - Lu Shou
- Department of Pneumology, Tongde Hospital of Zhejiang Province, Hangzhou, China
| | - Jian Wang
- Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
| | - Xinwei Xu
- Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
|
26
|
Ren X, Wang S, Huang T. Decipher the connections between proteins and phenotypes. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2020; 1868:140503. [PMID: 32707349 DOI: 10.1016/j.bbapap.2020.140503] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/31/2020] [Revised: 06/30/2020] [Accepted: 07/16/2020] [Indexed: 10/23/2022]
Abstract
As the outward-most representation of life, phenotype is the fundamental basis on which humans understand life and disease. But with the advent of molecular and sequencing techniques, a growing portion of scientific research focuses primarily on the molecular level of life. Our understanding of molecular variations and mechanisms can only be fully utilized when it is translated to the phenotypic level. In this study, we constructed a similarity network for the phenotype ontology and then applied network analysis methods to discover phenotype/disease clusters. We then used machine learning models to predict protein-phenotype associations. Each protein was characterized by the functional profiles of its interaction neighbors in the protein-protein interaction network. Our methods can not only predict protein-phenotype associations but also reveal the underlying mechanisms from protein to phenotype.
Affiliation(s)
- Xiaohui Ren
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Steven Wang
- Department of Molecular Biology, Columbia University, New York, USA
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.
|
27
|
Li M, Chen F, Zhang Y, Xiong Y, Li Q, Huang H. Identification of Post-myocardial Infarction Blood Expression Signatures Using Multiple Feature Selection Strategies. Front Physiol 2020; 11:483. [PMID: 32581823 PMCID: PMC7287215 DOI: 10.3389/fphys.2020.00483] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/06/2020] [Accepted: 04/20/2020] [Indexed: 12/24/2022]
Abstract
Myocardial infarction (MI) is a type of serious heart attack in which the blood flow to the heart is suddenly interrupted, resulting in injury to the heart muscles due to a lack of oxygen supply. Although clinical diagnosis methods can be used to identify the occurrence of MI, using the changes of molecular markers or characteristic molecules in blood to characterize the early phase and later trend of MI will help us choose a more reasonable treatment plan. Previously, comparative transcriptome studies focused on finding differentially expressed genes between MI patients and healthy people. However, signature molecules altered in different phases of MI have not been well characterized. We developed a set of computational approaches integrating multiple machine learning algorithms, including Monte Carlo feature selection (MCFS), incremental feature selection (IFS), and support vector machine (SVM), to identify gene expression characteristics of different phases of MI. A total of 134 genes were selected as features for building optimal SVM classifiers to distinguish acute MI from post-MI. Subsequently, functional enrichment analyses followed by protein-protein interaction analysis of the 134 genes identified several hub genes (IL1R1, TLR2, and TLR4) associated with progression of MI, which can be used as new diagnostic molecules for MI.
Affiliation(s)
- Ming Li
- Department of Cardiology, Eastern Hospital, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China
| | - Fuli Chen
- Department of Cardiology, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China
| | - Yaling Zhang
- Department of Nephrology, Eastern Hospital, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China
| | - Yan Xiong
- Department of Cardiology, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China
| | - Qiyong Li
- Department of Cardiology, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China
| | - Hui Huang
- Department of Cardiology, Sichuan Academy of Medical Sciences & Sichuan Provincial People’s Hospital, Chengdu, China
|
28
|
Tao X, Wu X, Huang T, Mu D. Identification and Analysis of Dysfunctional Genes and Pathways in CD8+ T Cells of Non-Small Cell Lung Cancer Based on RNA Sequencing. Front Genet 2020; 11:352. [PMID: 32457792 PMCID: PMC7227791 DOI: 10.3389/fgene.2020.00352] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/02/2020] [Accepted: 03/23/2020] [Indexed: 12/26/2022]
Abstract
Lung cancer, the most common of malignant tumors, is typically of the non-small cell (NSCLC) type. T-cell-based immunotherapies are a promising and powerful approach to treating NSCLCs. To characterize the CD8+ T cells of non-small cell lung cancer, we re-analyzed the published RNA-Seq gene expression profiles of 36 CD8+ T cell samples isolated from tumor (TIL) and 32 from adjacent uninvolved lung (NTIL). With an advanced Monte Carlo method of feature selection, we identified the CD8+ TIL-specific expression patterns. These patterns revealed the key dysfunctional genes and pathways in CD8+ TIL and shed light on the molecular mechanisms of immunity and the use of immunotherapy.
Affiliation(s)
- Xuefang Tao
- Affiliated Hospital of Shaoxing University, Shaoxing, China
| | - Xiaotang Wu
- Shanghai Engineering Research Center of Pharmaceutical Translation, Shanghai, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Deguang Mu
- Department of Respiratory Medicine, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou, China
|
29
|
Yan Q, Hu D, Li M, Chen Y, Wu X, Ye Q, Wang Z, He L, Zhu J. The Serum MicroRNA Signatures for Pancreatic Cancer Detection and Operability Evaluation. Front Bioeng Biotechnol 2020; 8:379. [PMID: 32411694 PMCID: PMC7201024 DOI: 10.3389/fbioe.2020.00379] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/06/2020] [Accepted: 04/06/2020] [Indexed: 12/19/2022]
Abstract
Pancreatic cancer (PC) has high morbidity and mortality. It is the fourth leading cause of cancer death, and its diagnosis and treatment are difficult. Liquid biopsy makes early diagnosis of pancreatic cancer possible. We analyzed the expression profiles of 2,555 serum miRNAs in 100 pancreatic cancer patients and 150 healthy controls. With advanced feature selection methods, we identified 13 pancreatic cancer signature miRNAs that can classify the pancreatic cancer patients and healthy controls. For pancreatic cancer treatment, surgery is still the first choice, but many pancreatic cancer patients are already inoperable. Therefore, we compared the 79 inoperable and 21 operable patients and identified 432 miRNAs that can predict whether a pancreatic cancer patient is operable. The functional analysis of the 13 pancreatic cancer signatures and the 432 operability miRNAs revealed the molecular mechanisms of pancreatic cancer and shed light on the diagnosis and therapy of pancreatic cancer in clinical practice.
Affiliation(s)
- Qiuliang Yan
- Department of General Surgery, Jinhua People's Hospital, Jinhua, China
| | - Dandan Hu
- Department of General Surgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Maolan Li
- Department of General Surgery, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Yan Chen
- Department of General Surgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Xiangsong Wu
- Department of General Surgery, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Qinghuang Ye
- Department of General Surgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Zhijiang Wang
- Department of General Surgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Lingzhe He
- Department of General Surgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Jinhui Zhu
- Department of General Surgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
|
30
|
HMMPred: Accurate Prediction of DNA-Binding Proteins Based on HMM Profiles and XGBoost Feature Selection. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2020; 2020:1384749. [PMID: 32300371 PMCID: PMC7142336 DOI: 10.1155/2020/1384749] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 01/29/2020] [Accepted: 03/16/2020] [Indexed: 02/08/2023]
Abstract
Prediction of DNA-binding proteins (DBPs) has become a popular research topic in protein science due to its crucial role in all aspects of biological activities. Even though considerable efforts have been devoted to developing powerful computational methods to solve this problem, it is still a challenging task in the field of bioinformatics. A hidden Markov model (HMM) profile has been proved to provide important clues for improving the prediction performance of DBPs. In this paper, we propose a method, called HMMPred, which extracts the features of amino acid composition and auto- and cross-covariance transformation from the HMM profiles, to help train a machine learning model for identification of DBPs. Then, a feature selection technique is performed based on the extreme gradient boosting (XGBoost) algorithm. Finally, the selected optimal features are fed into a support vector machine (SVM) classifier to predict DBPs. The experimental results tested on two benchmark datasets show that the proposed method is superior to most of the existing methods and could serve as an alternative tool to identify DBPs.
|
31
|
Liu F, Dong H, Mei Z, Huang T. Investigation of miRNA and mRNA Co-expression Network in Ependymoma. Front Bioeng Biotechnol 2020; 8:177. [PMID: 32266223 PMCID: PMC7096354 DOI: 10.3389/fbioe.2020.00177] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/02/2020] [Accepted: 02/20/2020] [Indexed: 12/18/2022]
Abstract
Ependymoma (EPN) is a rare primary tumor of the central nervous system (CNS) that affects both children and adults. Despite the definition and classification of distinct molecular subgroups, there remains a group of EPNs with a balanced genome, which makes it difficult to predict the prognosis of patients with EPN. The role of the miRNA-mRNA network in EPN is still poorly understood. We assessed the involvement of miRNA-mRNA pairs in EPN by applying a weighted co-expression network analysis (WGCNA) approach. Using whole genome expression profile analysis followed by functional enrichment, we detected hub genes involved in active proliferation and DNA replication of nerve cells. Key genes including CYP11B1, KRT33B, RUNX1T1, SIK1, MAP3K4, MLANA, and SFRP5 identified in co-expression networks were regulated by miR-15a and miR-24-1. These seven miRNA-mRNA pairs were considered to influence not only pathways in cancer and tumor suppression processes, but also the MAPK, NF-kappaB, and WNT signaling pathways, which are associated with tumorigenesis and development. This study provides novel insight into potential diagnostic biomarkers of EPN and may have value in choosing therapeutic targets with clinical utility.
Affiliation(s)
- Feili Liu
- Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, China
| | - Hang Dong
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Zi Mei
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
|
32
|
Zhang H, Jin Z, Cheng L, Zhang B. Integrative Analysis of Methylation and Gene Expression in Lung Adenocarcinoma and Squamous Cell Lung Carcinoma. Front Bioeng Biotechnol 2020; 8:3. [PMID: 32117905 PMCID: PMC7019569 DOI: 10.3389/fbioe.2020.00003] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/31/2019] [Accepted: 01/03/2020] [Indexed: 12/18/2022]
Abstract
Lung cancer is a highly prevalent type of cancer with a poor 5-year survival rate of about 4-17%. Eighty percent of lung cancers are non-small-cell lung cancer (NSCLC). For a long time, the treatment of NSCLC has been guided mostly by tumor stage, and there has been no significant difference between the therapy strategies for lung adenocarcinoma (LUAD) and squamous cell lung carcinoma (SCLC), the two major subtypes of NSCLC. In recent years, important molecular differences between LUAD and SCLC have been increasingly identified, indicating that targeted therapy will become more and more histologically specific in the future. To investigate the differences between LUAD and SCLC on a multi-omics scale, we analyzed the methylation and gene expression data together. With the Boruta method to remove irrelevant features and the MCFS (Monte Carlo Feature Selection) method to identify the significantly important features, we identified 113 key methylation features and 23 key gene expression features. HNF1B and TP63 were found to be dysfunctional on both methylation and gene expression levels. The experimentally determined interaction network suggested that TP63 may play an important role in connecting methylation genes and expression genes. Many of the discovered signature genes have been supported by literature. Our results may provide directions for precision diagnosis and therapy of LUAD and SCLC.
Affiliation(s)
- Hao Zhang
- Department of Respiratory and Critical Care Medicine, Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, China
| | - Zhou Jin
- Department of Respiratory and Critical Care Medicine, Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, China; Department of Respiration, Hospital of Traditional Chinese Medicine of Zhenhai, Ningbo, China
| | - Ling Cheng
- Shanghai Engineering Research Center of Pharmaceutical Translation, Shanghai, China
| | - Bin Zhang
- Department of Respiratory and Critical Care Medicine, Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, China
|
33
|
Zhang J, Hu H, Xu S, Jiang H, Zhu J, Qin E, He Z, Chen E. The Functional Effects of Key Driver KRAS Mutations on Gene Expression in Lung Cancer. Front Genet 2020; 11:17. [PMID: 32117436 PMCID: PMC7010953 DOI: 10.3389/fgene.2020.00017] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/11/2019] [Accepted: 01/07/2020] [Indexed: 12/11/2022]
Abstract
Lung cancer is a common malignant cancer. Kirsten rat sarcoma oncogene (KRAS) mutations are considered a key driver of lung cancers. KRAS p.G12C mutations are the most predominant in NSCLC, comprising about 11–16% of lung adenocarcinomas (p.G12C accounts for 45–50% of mutant KRAS). However, it is still not clear how the KRAS mutation triggers lung cancer. To study the molecular mechanisms of KRAS mutation in lung cancer, we analyzed the gene expression profiles of 156 KRAS mutation samples and other negative samples with a two-stage feature selection approach: (1) minimal Redundancy Maximal Relevance (mRMR) and (2) Incremental Feature Selection (IFS). Finally, 41 predictive genes for KRAS mutation were identified and a KRAS mutation predictor was constructed. Its leave-one-out cross-validation MCC was 0.879. Our results are helpful for understanding the roles of KRAS mutation in lung cancer.
Affiliation(s)
- Jisong Zhang
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
| | - Huihui Hu
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
| | - Shan Xu
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
| | - Hanliang Jiang
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
| | - Jihong Zhu
- Department of Anesthesiology, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
| | - E Qin
- Department of Respiratory Medicine, Shaoxing People's Hospital (Shaoxing Hospital, Zhejiang University School of Medicine), Shaoxing, China
| | - Zhengfu He
- Department of Thoracic Surgery, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
| | - Enguo Chen
- Department of Pulmonary and Critical Care Medicine, Sir Run Run Shaw Hospital of Zhejiang University, Hangzhou, China
|
34
|
Chen L, Li D, Shao Y, Wang H, Liu Y, Zhang Y. Identifying Microbiota Signature and Functional Rules Associated With Bacterial Subtypes in Human Intestine. Front Genet 2019; 10:1146. [PMID: 31803234 PMCID: PMC6872643 DOI: 10.3389/fgene.2019.01146] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/17/2019] [Accepted: 10/21/2019] [Indexed: 12/12/2022]
Abstract
Gut microbiomes are integral microflora located in the human intestine with particular symbiosis. Among all microorganisms in the human intestine, bacteria are the most significant subgroup and contain many unique and functional species. The distribution patterns of bacteria in the human intestine not only reflect the different microenvironments in different sections of the intestine but also indicate that bacteria may have unique biological functions corresponding to their proper regions of the intestine. However, describing the functional differences between the bacterial subgroups and their distributions in different individuals is difficult using traditional computational approaches. Here, we first introduced four effective sets of bacterial features from independent databases. We then presented a novel computational approach to identify potential distinctive features among bacterial subgroups based on a systematic dataset on the gut microbiome from approximately 1,500 human gut bacterial strains. We also established a group of quantitative rules for explaining such distinctions. The results may reveal the microstructural characteristics of the intestinal flora and deepen our understanding of the regulatory role of bacterial subgroups in the human intestine.
Affiliation(s)
- Lijuan Chen
- College of Animal Science and Technology, Anhui Agricultural University, Hefei, China
| | - Daojie Li
- College of Animal Science and Technology, Anhui Agricultural University, Hefei, China
| | - Ye Shao
- School of Medicine, Huaqiao University, Quanzhou, China
| | - Hui Wang
- College of Animal Science and Technology, Anhui Agricultural University, Hefei, China
| | - Yuqing Liu
- Anhui Province Key Laboratory of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, China
| | - Yunhua Zhang
- Anhui Province Key Laboratory of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, China
|
35
|
Pan X, Zeng T, Yuan F, Zhang YH, Chen L, Zhu L, Wan S, Huang T, Cai YD. Screening of Methylation Signature and Gene Functions Associated With the Subtypes of Isocitrate Dehydrogenase-Mutation Gliomas. Front Bioeng Biotechnol 2019; 7:339. [PMID: 31803734 PMCID: PMC6871504 DOI: 10.3389/fbioe.2019.00339] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 09/09/2019] [Accepted: 10/30/2019] [Indexed: 02/05/2023]
Abstract
Isocitrate dehydrogenase (IDH) is an oncogene, and the expression of a mutated IDH promotes cell proliferation and inhibits cell differentiation. IDH exists in three different isoforms, mutations of which can cause many solid tumors, especially gliomas in adults. No effective method for classifying gliomas on genetic signatures is currently available. DNA methylation may be applied to distinguish cancer cells from normal tissues. In this study, we focused on three subtypes of IDH-mutation gliomas by examining methylation data. Several advanced computational methods were used, such as Monte Carlo feature selection (MCFS), incremental feature selection (IFS), and support vector machine (SVM). The MCFS method was adopted to analyze methylation features, resulting in a feature list. Then, the IFS method incorporating SVM was applied to the list to extract important methylation features and construct an optimal SVM classifier. As a result, several methylation features (sites) were found to relate to glioma subclasses; these sites are annotated onto multiple genes, such as FLJ37543, LCE3D, FAM89A, ADCY5, ESR1, C2orf67, REST, and EPHA7. These genes are enriched in biological functions including cellular developmental process, neuron differentiation, cellular component morphogenesis, and G-protein-coupled receptor signaling pathway. Our results, which are supported by literature reports and independent dataset validation, showed that the identified genes and functions contribute to the detailed glioma subtypes. This study provides basic research on IDH-mutation gliomas.
Affiliation(s)
- XiaoYong Pan
- School of Life Sciences, Shanghai University, Shanghai, China; Key Laboratory of System Control and Information Processing, Ministry of Education of China, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China; IDLab, Department for Electronics and Information Systems, Ghent University, Ghent, Belgium
| | - Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
| | - Fei Yuan
- Department of Science and Technology, Binzhou Medical University Hospital, Binzhou, China
| | - Yu-Hang Zhang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China; Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, China
| | - LiuCun Zhu
- School of Life Sciences, Shanghai University, Shanghai, China
| | - SiBao Wan
- School of Life Sciences, Shanghai University, Shanghai, China
| | - Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
|