1
Li Y, Zou Z, Gao Z, Wang Y, Xiao M, Xu C, Jiang G, Wang H, Jin L, Wang J, Wang HZ, Guo S, Wu J. Prediction of lung cancer risk in Chinese population with genetic-environment factor using extreme gradient boosting. Cancer Med 2022; 11:4469-4478. [PMID: 35499292 PMCID: PMC9741969 DOI: 10.1002/cam4.4800] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/12/2020] [Revised: 04/22/2022] [Accepted: 04/24/2022] [Indexed: 02/03/2023]
Abstract
BACKGROUND Detecting early-stage lung cancer is critical to reducing lung cancer mortality; however, existing models based on germline variants perform poorly, and new models are needed. This study aimed to use extreme gradient boosting to develop a predictive model for the early diagnosis of lung cancer in a multicenter case-control study.
MATERIALS AND METHODS A total of 974 cases and 1005 controls were recruited in Shanghai and Taizhou, and 61 single nucleotide polymorphisms (SNPs) were genotyped. Multivariate logistic regression was used to assess the association between signal SNPs and lung cancer risk. Logistic regression (LR) and extreme gradient boosting (XGBoost), a large-scale machine learning algorithm, were used to build the lung cancer risk models. For both models, 10-fold cross-validation was performed, and predictive performance was evaluated by the area under the curve (AUC).
RESULTS After FDR adjustment, TYMS rs3819102 and BAG6 rs1077393 were significantly associated with lung cancer risk (p < 0.05). For lung cancer risk prediction, the model built on epidemiological factors alone attained an AUC of 0.703 with LR and 0.744 with XGBoost. Compared with the LR model built on epidemiological factors alone, adding SNPs and applying XGBoost increased the AUC to 0.759 (p < 0.001). BAG6 rs1077393 was the most important SNP predictor in the XGBoost model, followed by TERT rs2735845 and CAMKK1 rs7214723. In the analysis restricted to lung adenocarcinoma (ADC), adding SNPs and applying XGBoost significantly raised the AUC from 0.639 to 0.699 (p = 0.009), while the best model for lung squamous cell carcinoma (SCC) was the LR model with epidemiological factors and SNPs (AUC = 0.833), compared with the XGBoost model (AUC = 0.816).
CONCLUSION Our lung cancer risk prediction models in the Chinese population have a strong predictive ability, especially for SCC. Adding SNPs and applying the XGBoost algorithm to the epidemiology-based logistic regression risk prediction model significantly improves model performance.
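The evaluation scheme named in this abstract (10-fold cross-validation scored by AUC) can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the function names `auc` and `kfold_indices`, the random seed, and the toy labels and scores are assumptions made for illustration. The AUC is computed in its rank form, as the probability that a randomly chosen case receives a higher score than a randomly chosen control.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve in its rank (Mann-Whitney) form:
    the fraction of case/control pairs in which the case scores higher,
    with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and yield (train, test) index lists for k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# Toy check (hypothetical scores): three of the four case/control pairs
# are ordered correctly, so the AUC is 0.75.
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])

# Fold sizes for a cohort of 974 cases + 1005 controls = 1979 subjects.
fold_sizes = [len(test) for _, test in kfold_indices(974 + 1005, k=10)]
```

In a real analysis, a model would be fitted on each `train` split and `auc` evaluated on the matching `test` split; the per-fold AUCs are then averaged.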
Affiliation(s)
- Yutao Li
  - School of Life Sciences, Fudan University, Shanghai, China
- Zixiu Zou
  - School of Life Sciences, Fudan University, Shanghai, China
- Zhunyi Gao
  - Company 6 of Basic Medical School, Navy Military Medical University, Shanghai, China
- Yi Wang
  - School of Life Sciences, Fudan University, Shanghai, China
- Man Xiao
  - Department of Biochemistry and Molecular Biology, Hainan Medical University, Haikou, China
- Chang Xu
  - Clinical College of Xiangnan University, Chenzhou, China
- Gengxi Jiang
  - Department of Thoracic Surgery, the First Affiliated Hospital of Naval Medical University (Second Military Medical University), Shanghai, China
- Haijian Wang
  - School of Life Sciences, Fudan University, Shanghai, China
- Li Jin
  - School of Life Sciences, Fudan University, Shanghai, China
- Jiucun Wang
  - School of Life Sciences, Fudan University, Shanghai, China
- Huai Zhou Wang
  - Department of Laboratory Diagnosis, the First Affiliated Hospital of Naval Medical University (Second Military Medical University), Shanghai, China
- Shicheng Guo
  - School of Life Sciences, Fudan University, Shanghai, China
- Junjie Wu
  - School of Life Sciences, Fudan University, Shanghai, China
  - Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital, Fudan University, Shanghai, China
  - Department of Pulmonary and Critical Care Medicine, Shanghai Geriatric Medical Center, Shanghai, China
2
Wang H, Wang X, Xu L, Cao H, Zhang J. Nonnegative matrix factorization-based bioinformatics analysis reveals that TPX2 and SELENBP1 are two predictors of the inner sub-consensuses of lung adenocarcinoma. Cancer Med 2021; 10:9058-9077. [PMID: 34734491 PMCID: PMC8683537 DOI: 10.1002/cam4.4386] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/09/2021] [Revised: 09/21/2021] [Accepted: 10/14/2021] [Indexed: 12/24/2022]
Abstract
Background Lung adenocarcinoma (LUAD) is a heterogeneous disease. However, the inner sub-groups of LUAD have not been fully studied. Markers that predict the sub-groups and prognosis of LUAD are badly needed.
Aims To identify biomarkers associated with the sub-groups and prognosis of LUAD.
Materials and Methods Using nonnegative matrix factorization (NMF) clustering, LUAD patients from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) datasets and LUAD cell lines from the Genomics of Drug Sensitivity in Cancer (GDSC) dataset were divided into sub-consensuses based on gene expression profiling. The overall survival of LUAD patients in each sub-consensus was determined by Kaplan-Meier survival analysis. Common genes that were differentially expressed in each sub-consensus of LUAD patients and LUAD cell lines were identified using TBtools. The predictive accuracy of TPX2 and SELENBP1 for the inner sub-consensuses of LUAD was determined by receiver operating characteristic (ROC) analysis. Kaplan-Meier survival analysis was also used to test the prognostic significance of TPX2 and SELENBP1 in LUAD patients.
Results Using NMF clustering, LUAD patients in the TCGA, GSE30219, GSE42127, GSE50081, GSE68465, and GSE72094 datasets were divided into three sub-consensuses. Sub-consensus3 LUAD patients had low overall survival and frequent TP53 mutations. Similarly, LUAD cell lines were divided into three sub-consensuses by the NMF method, and sub-consensus2 cell lines were resistant to EGFR inhibitors. Comparison of the genes differentially expressed across sub-consensuses of LUAD patients and LUAD cell lines revealed that TPX2 was highly expressed in sub-consensus3 LUAD patients and sub-consensus2 LUAD cell lines. In contrast, SELENBP1 was highly expressed in sub-consensus1 LUAD patients and sub-consensus1 LUAD cell lines. The expression levels of TPX2 and SELENBP1 could distinguish sub-consensus3 LUAD patients or sub-consensus2 LUAD cell lines from the other sub-consensuses of LUAD patients or cell lines. Moreover, compared with normal lung tissues, TPX2 expression was high and SELENBP1 expression was low in LUAD tissues. Furthermore, higher TPX2 expression was associated with lower relapse-free survival and lower overall survival of LUAD patients, whereas higher SELENBP1 expression was associated with higher relapse-free survival and higher overall survival. Finally, TP53-mutant LUAD patients had higher TPX2 and lower SELENBP1 expression.
Discussion Both iCluster and NMF have proved to be robust LUAD classification systems. However, LUAD patients in different iClusters showed no significant difference in overall survival, whereas sub-consensus3 LUAD patients from the NMF classification had lower overall survival than the other sub-consensuses.
Conclusions By integrated analysis of 1765 LUAD patients and 64 LUAD cell lines, we showed that NMF is a robust method for classifying the inner sub-consensuses of LUAD. TPX2 and SELENBP1 were differentially expressed in different LUAD sub-consensuses and predicted the inner sub-consensuses of LUAD with high accuracy. TPX2 was an unfavorable prognostic biomarker of LUAD that was up-regulated in LUAD tissues and associated with low overall survival. SELENBP1 was a favorable prognostic biomarker of LUAD that was down-regulated in LUAD tissues and associated with prolonged overall survival.
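The Kaplan-Meier survival analysis used in this abstract rests on the product-limit estimator, which the following minimal sketch illustrates. It uses only the standard library; the function name `kaplan_meier` and the toy follow-up data are hypothetical and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up duration for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (t, S(t)) pairs, one per distinct time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= removed
    return curve

# Hypothetical toy cohort: four patients, one censored at t = 2.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Comparing two such curves (for example, high versus low TPX2 expression) would additionally need a test such as the log-rank test, which is not shown here.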
Affiliation(s)
- Haiwei Wang
  - Fujian Key Laboratory for Prenatal Diagnosis and Birth Defect, Fujian Maternity and Child Health Hospital, Affiliated Hospital of Fujian Medical University, Fuzhou, Fujian, China
  - Key Laboratory of Technical Evaluation of Fertility Regulation for Non-human Primate, National Health and Family Planning Commission, Fuzhou, Fujian, China
- Xinrui Wang
  - Fujian Key Laboratory for Prenatal Diagnosis and Birth Defect, Fujian Maternity and Child Health Hospital, Affiliated Hospital of Fujian Medical University, Fuzhou, Fujian, China
  - Key Laboratory of Technical Evaluation of Fertility Regulation for Non-human Primate, National Health and Family Planning Commission, Fuzhou, Fujian, China
- Liangpu Xu
  - Fujian Key Laboratory for Prenatal Diagnosis and Birth Defect, Fujian Maternity and Child Health Hospital, Affiliated Hospital of Fujian Medical University, Fuzhou, Fujian, China
  - Key Laboratory of Technical Evaluation of Fertility Regulation for Non-human Primate, National Health and Family Planning Commission, Fuzhou, Fujian, China
- Hua Cao
  - Fujian Key Laboratory for Prenatal Diagnosis and Birth Defect, Fujian Maternity and Child Health Hospital, Affiliated Hospital of Fujian Medical University, Fuzhou, Fujian, China
  - Key Laboratory of Technical Evaluation of Fertility Regulation for Non-human Primate, National Health and Family Planning Commission, Fuzhou, Fujian, China
- Ji Zhang
  - State Key Laboratory for Medical Genomics, Shanghai Institute of Hematology, Rui-Jin Hospital Affiliated to School of Medicine, Shanghai Jiao Tong University, Shanghai, China
3
Cai X, Lin L, Zhang Q, Wu W, Su A. Bioinformatics analysis of the circRNA-miRNA-mRNA network for non-small cell lung cancer. J Int Med Res 2021; 48:300060520929167. [PMID: 32527185 PMCID: PMC7294496 DOI: 10.1177/0300060520929167] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Non-small cell lung cancer (NSCLC) accounts for approximately 80% of all lung cancers, but its pathogenesis has not been fully elucidated. Therefore, it is valuable to explore the pathogenesis of NSCLC to improve diagnosis and identify novel treatment biomarkers.
METHODS Circular (circ)RNA, micro (mi)RNA, and gene expression datasets of NSCLC were analyzed to identify those that were differentially expressed between tumor and healthy tissues. Common genes were found and pathway enrichment analyses were performed. Survival analysis was used to identify hub genes, and their level of methylation and association with immune cell infiltration were analyzed. Finally, an NSCLC circRNA-miRNA-mRNA network was constructed.
RESULTS Eight miRNAs and 211 common genes were identified. Gene ontology and Kyoto Encyclopedia of Genes and Genomes analyses revealed that cell projection morphogenesis, blood vessel morphogenesis, muscle cell proliferation, and synapse organization were enriched. Ten hub genes were found, of which the expression of DTL and RRM2 was significantly related to NSCLC patient prognosis. Significant methylation changes and immune cell infiltration correlations with DTL and RRM2 were also detected.
CONCLUSIONS The hsa_circ_0001947/hsa-miR-637/RRM2 and hsa_circ_0072305/hsa-miR-127-5p/DTL networks were constructed, and the identified molecules may be involved in the occurrence and development of NSCLC.
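One simple way to assemble a circRNA-miRNA-mRNA network of the kind described here is to join circRNA-miRNA pairs with miRNA-mRNA pairs on the shared miRNA. The sketch below is illustrative only: the function name `build_axes` is hypothetical, and the input pairs are just the two axes reported in the abstract, not the full interaction data the authors analyzed.

```python
def build_axes(circ_mirna_pairs, mirna_mrna_pairs):
    """Join (circRNA, miRNA) and (miRNA, mRNA) pairs on the shared miRNA,
    returning every circRNA/miRNA/mRNA axis as a 3-tuple."""
    targets = {}
    for mirna, mrna in mirna_mrna_pairs:
        targets.setdefault(mirna, []).append(mrna)
    axes = []
    for circ, mirna in circ_mirna_pairs:
        for mrna in targets.get(mirna, []):
            axes.append((circ, mirna, mrna))
    return axes

# Pairs taken from the two axes named in the abstract.
circ_mirna = [
    ("hsa_circ_0001947", "hsa-miR-637"),
    ("hsa_circ_0072305", "hsa-miR-127-5p"),
]
mirna_mrna = [
    ("hsa-miR-637", "RRM2"),
    ("hsa-miR-127-5p", "DTL"),
]
axes = build_axes(circ_mirna, mirna_mrna)
```

A circRNA whose miRNA has no listed target simply contributes no axis, so unmatched pairs drop out of the network.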
Affiliation(s)
- Xueying Cai
  - Department of Respiratory Medicine, Zhongshan Hospital, Xiamen University, Xiamen City, Fujian Province, China
- Lixuan Lin
  - Department of Basic Medicine, College of Life Sciences, Sichuan University, Chengdu, China
- Qiuhua Zhang
  - Department of Internal Medicine and Oncology, Zhongshan Hospital, Xiamen University, Xiamen City, Fujian Province, China
- Weixin Wu
  - Department of Internal Medicine and Oncology, Zhongshan Hospital, Xiamen University, Xiamen City, Fujian Province, China
- An Su
  - Department of Internal Medicine and Oncology, Zhongshan Hospital, Xiamen University, Xiamen City, Fujian Province, China
4
Islam R, Ahmed L, Paul BK, Ahmed K, Bhuiyan T, Moni MA. Identification of molecular biomarkers and pathways of NSCLC: insights from a systems biomedicine perspective. J Genet Eng Biotechnol 2021; 19:43. [PMID: 33742334 PMCID: PMC7979844 DOI: 10.1186/s43141-021-00134-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/07/2020] [Accepted: 02/14/2021] [Indexed: 12/24/2022]
Abstract
BACKGROUND Worldwide, more than 80% of identified lung cancer cases are non-small cell lung cancer (NSCLC). We used the microarray gene expression dataset GSE10245 to identify key biomarkers and associated pathways in NSCLC.
RESULTS Differentially expressed genes (DEGs) were extracted from GSE10245 using the R statistical language. Functional analysis was performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID). The DifferentialNet database was used to construct a protein-protein interaction (PPI) network, which was visualized with Cytoscape. Clusters were identified in the PPI network using the Molecular Complex Detection (MCODE) method. Finally, survival analysis was performed to obtain overall survival (OS) values for the key genes. A total of 1082 DEGs were identified after applying the statistical criteria. Functional analysis showed that overexpressed DEGs were strongly involved in epidermis development and keratinocyte differentiation, while under-expressed DEGs were principally associated with positive regulation of the nitric oxide biosynthetic process and signal transduction. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis showed that overexpressed DEGs were highly involved in the cell cycle, while under-expressed DEGs were involved in cell adhesion molecules. The PPI network comprised 474 nodes and 2233 connections.
CONCLUSIONS Using the connectivity method, 12 genes were identified as hub genes. Survival analysis showed worse OS for SFN, DSP, and PHGDH. These results indicate that stratifin (SFN) may play a crucial role in the development of NSCLC.
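Selecting hub genes by connectivity, as described above, amounts to ranking the nodes of the PPI network by degree. The following minimal sketch assumes that reading; the function name `hub_genes`, the tie-breaking rule (alphabetical), and the placeholder node names are illustrative and do not reproduce the study's network or its exact ranking procedure.

```python
from collections import Counter

def hub_genes(edges, top_n=12):
    """Rank nodes of an undirected interaction network by degree.

    edges: iterable of (gene_a, gene_b) pairs; duplicates (in either
           orientation) and self-loops are ignored.
    Returns the top_n (gene, degree) pairs, highest degree first,
    with ties broken alphabetically.
    """
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

# Placeholder network: node names are hypothetical, not real interactions.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "B"), ("D", "D")]
toy_hubs = hub_genes(toy_edges, top_n=2)
```

The default `top_n=12` mirrors the number of hub genes reported in the abstract; any other cut-off can be passed explicitly.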
Affiliation(s)
- Rakibul Islam
  - Department of Software Engineering, Daffodil International University (DIU), Ashulia, Savar, Dhaka, 1342, Bangladesh
- Liton Ahmed
  - Department of Software Engineering, Daffodil International University (DIU), Ashulia, Savar, Dhaka, 1342, Bangladesh
- Bikash Kumar Paul
  - Department of Software Engineering, Daffodil International University (DIU), Ashulia, Savar, Dhaka, 1342, Bangladesh
  - Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail, 1902, Bangladesh
  - Group of Bio-photomatiχ, Mawlana Bhashani Science and Technology University (MBSTU), Santosh, Tangail, 1902, Bangladesh
- Kawsar Ahmed
  - Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail, 1902, Bangladesh
  - Group of Bio-photomatiχ, Mawlana Bhashani Science and Technology University (MBSTU), Santosh, Tangail, 1902, Bangladesh
- Touhid Bhuiyan
  - Department of Software Engineering, Daffodil International University (DIU), Ashulia, Savar, Dhaka, 1342, Bangladesh
- Mohammad Ali Moni
  - WHO Collaborating Centre on eHealth, School of Public Health and Community Medicine, Faculty of Medicine, University of New South Wales, Sydney, Australia
5
A prognostic model for overall survival of patients with early-stage non-small cell lung cancer: a multicentre, retrospective study. Lancet Digit Health 2020; 2:e594-e606. [PMID: 33163952 PMCID: PMC7646741 DOI: 10.1016/s2589-7500(20)30225-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 12/21/2022]
Abstract
Background Intratumoural heterogeneity has been previously shown to be related to clonal evolution and genetic instability and associated with tumour progression. Phenotypically, it is reflected in the diversity of appearance and morphology within cell populations. Computer-extracted features relating to tumour cellular diversity on routine tissue images might correlate with outcome. This study investigated the prognostic ability of computer-extracted features of tumour cellular diversity (CellDiv) from haematoxylin and eosin (H&E)-stained histology images of non-small cell lung carcinomas (NSCLCs).
Methods In this multicentre, retrospective study, we included 1057 patients with early-stage NSCLC with corresponding diagnostic histology slides and overall survival information from four different centres. CellDiv features quantifying local cellular morphological diversity from H&E-stained histology images were extracted from the tumour epithelium region. A Cox proportional hazards model based on CellDiv was used to construct risk scores for lung adenocarcinoma (LUAD; 270 patients) and lung squamous cell carcinoma (LUSC; 216 patients) separately using data from two of the cohorts, and was validated in the two remaining independent cohorts (comprising 236 patients with LUAD and 335 patients with LUSC). We used multivariable Cox regression analysis to examine the predictive ability of CellDiv features for 5-year overall survival, controlling for the effects of clinical and pathological parameters. We did a gene set enrichment and Gene Ontology analysis on 405 patients to identify associations with differentially expressed biological pathways implicated in lung cancer pathogenesis.
Findings For prognosis of patients with early-stage LUSC, the CellDiv LUSC model included 11 discriminative CellDiv features, whereas for patients with early-stage LUAD, the model included 23 features. In the independent validation cohorts, patients predicted to be at a higher risk by the univariable CellDiv model had significantly worse 5-year overall survival (hazard ratio 1·48 [95% CI 1·06–2·08]; p=0·022 for The Cancer Genome Atlas [TCGA] LUSC group, 2·24 [1·04–4·80]; p=0·039 for the University of Bern LUSC group, and 1·62 [1·15–2·30]; p=0·0058 for the TCGA LUAD group). The identified CellDiv features were also found to be strongly associated with apoptotic signalling and cell differentiation pathways.
Interpretation CellDiv features were strongly prognostic of 5-year overall survival in patients with early-stage NSCLC and also associated with apoptotic signalling and cell differentiation pathways. The CellDiv-based risk stratification model could potentially help to determine which patients with early-stage NSCLC might receive added benefit from adjuvant therapy.
Funding National Institutes of Health and US Department of Defense.
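The hazard ratios, 95% confidence intervals, and p values in the Findings can be related to one another through the usual normal (Wald) approximation on the log hazard ratio scale. The sketch below shows that relation; it is a rough consistency check under the assumption of a symmetric Wald interval, not the authors' analysis, and the function name `wald_p_from_ci` is hypothetical. Because the published values are rounded, and the reported p values may come from a different test, only approximate agreement should be expected.

```python
import math

def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate the Wald z statistic and two-sided p value for a hazard
    ratio from its 95% confidence interval.

    Assumes the interval is exp(log(hr) +/- z_crit * se), so the standard
    error of log(hr) is the CI width on the log scale divided by 2 * z_crit.
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hazard ratios and intervals as reported in the abstract.
reported = {
    "TCGA LUSC": (1.48, 1.06, 2.08),
    "University of Bern LUSC": (2.24, 1.04, 4.80),
    "TCGA LUAD": (1.62, 1.15, 2.30),
}
approx_p = {name: wald_p_from_ci(*vals)[1] for name, vals in reported.items()}
```

Comparing `approx_p` with the published p values gives a quick sanity check that a hazard ratio and its interval were transcribed consistently.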