1
Jiang L, Jia L, Wang Y, Wu Y, Yue J. Adap-BDCM: Adaptive Bilinear Dynamic Cascade Model for Classification Tasks on CNV Datasets. Interdiscip Sci 2024:10.1007/s12539-024-00635-w. [PMID: 38758306 DOI: 10.1007/s12539-024-00635-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 04/18/2024] [Accepted: 04/23/2024] [Indexed: 05/18/2024]
Abstract
Copy number variation (CNV) is an essential genetic driving factor of cancer formation and progression, making intelligent classification based on CNV feasible. However, current machine learning and deep learning methods face several challenges, such as the design of base classifier combination schemes in ensemble methods and the selection of the number of layers in neural networks, which often result in low accuracy. Therefore, an adaptive bilinear dynamic cascade model (Adap-BDCM) is developed to further enhance the accuracy and applicability of these methods for intelligent classification on CNV datasets. In this model, a feature selection module is introduced to mitigate the interference of redundant information, and a bilinear model based on the gated attention mechanism is proposed to extract more beneficial deep fusion features. Furthermore, an adaptive base classifier selection scheme is designed to overcome the difficulty of manually designing base classifier combinations and enhance the applicability of the model. Lastly, a novel feature fusion scheme with an attribute recall submodule is constructed, which helps the model avoid getting stuck in local solutions and missing valuable information. Experiments show that Adap-BDCM performs best in cancer classification, stage prediction, and recurrence prediction on CNV datasets. This study can assist physicians in making diagnoses faster and better.
Affiliation(s)
- Liancheng Jiang
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China
- Liye Jia
- College of Computer Science and Technology, Taiyuan Normal University, Taiyuan, 030619, China
- Yizhen Wang
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China
- Yongfei Wu
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China
- Junhong Yue
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China.
2
Rydzewski NR, Shi Y, Li C, Chrostek MR, Bakhtiar H, Helzer KT, Bootsma ML, Berg TJ, Harari PM, Floberg JM, Blitzer GC, Kosoff D, Taylor AK, Sharifi MN, Yu M, Lang JM, Patel KR, Citrin DE, Sundling KE, Zhao SG. A platform-independent AI tumor lineage and site (ATLAS) classifier. Commun Biol 2024; 7:314. [PMID: 38480799 PMCID: PMC10937974 DOI: 10.1038/s42003-024-05981-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 02/27/2024] [Indexed: 03/17/2024] Open
Abstract
Histopathologic diagnosis and classification of cancer play a critical role in guiding treatment. Advances in next-generation sequencing have ushered in new complementary molecular frameworks. However, existing approaches do not independently assess both site-of-origin (e.g. prostate) and lineage (e.g. adenocarcinoma) and have minimal validation in metastatic disease, where classification is more difficult. Utilizing gradient-boosted machine learning, we developed ATLAS, a pair of separate AI Tumor Lineage and Site-of-origin models from RNA expression data on 8249 tumor samples. We assessed performance independently in 10,376 total tumor samples, including 1490 metastatic samples, achieving an accuracy of 91.4% for cancer site-of-origin and 97.1% for cancer lineage. High confidence predictions (encompassing the majority of cases) were accurate 98-99% of the time in both localized and, remarkably, metastatic samples. We also identified emergent properties of our lineage scores for tumor types on which the model was never trained (zero-shot learning). Adenocarcinoma/sarcoma lineage scores differentiated epithelioid from biphasic/sarcomatoid mesothelioma. Also, predicted lineage de-differentiation identified neuroendocrine/small cell tumors and was associated with poor outcomes across tumor types. Our platform-independent single-sample approach can be easily translated to existing RNA-seq platforms. ATLAS can complement and guide traditional histopathologic assessment in challenging situations and tumors of unknown primary.
Affiliation(s)
- Nicholas R Rydzewski
- Radiation Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Yue Shi
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Chenxuan Li
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Hamza Bakhtiar
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Kyle T Helzer
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Matthew L Bootsma
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Tracy J Berg
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Paul M Harari
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- John M Floberg
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- Grace C Blitzer
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- David Kosoff
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- Department of Medicine, University of Wisconsin, Madison, WI, USA
- Amy K Taylor
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- Department of Medicine, University of Wisconsin, Madison, WI, USA
- Marina N Sharifi
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- Department of Medicine, University of Wisconsin, Madison, WI, USA
- Menggang Yu
- Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
- Joshua M Lang
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA
- Department of Medicine, University of Wisconsin, Madison, WI, USA
- Krishnan R Patel
- Radiation Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Deborah E Citrin
- Radiation Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Kaitlin E Sundling
- Department of Pathology and Laboratory Medicine, University of Wisconsin, Madison, WI, USA
- Wisconsin State Laboratory of Hygiene, University of Wisconsin, Madison, WI, USA
- Shuang G Zhao
- Department of Human Oncology, University of Wisconsin, Madison, WI, USA.
- Carbone Cancer Center, University of Wisconsin, Madison, WI, USA.
- William S. Middleton Veterans Hospital, Madison, WI, USA.
3
Huang J, Xie S, Huang J, Zheng Z, Lin Z, Lin J, Tang K, Meng M, Zhao Y, Liao W, Liu C, Gu Y, Li S, Chen H, Chen R. Imaging features and deep learning for prediction of pulmonary epithelioid hemangioendothelioma in CT images. J Thorac Dis 2024; 16:935-947. [PMID: 38505025 PMCID: PMC10944745 DOI: 10.21037/jtd-23-455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 09/08/2023] [Indexed: 03/21/2024]
Abstract
Background: Pulmonary epithelioid hemangioendothelioma (PEH) is a rare vascular tumour, and its early diagnosis remains challenging. This study aims to comprehensively analyse the imaging features of PEH and develop a model for predicting PEH. Methods: Retrospective and pooled analyses of imaging findings were performed in PEH patients at our center (n=25) and in published cases (n=71), respectively. Relevant computed tomography (CT) images were extracted and used to build a deep learning model for PEH identification and differentiation from other diseases. Results: In this study, bilateral multiple nodules/masses (n=19) appeared to be more common, with most nodules less than 2 cm. In addition to the common types and features, the patterns of mixed type (n=4) and isolated nodules (n=4), punctate calcifications (5/25) and lymph node enlargement (10/25) were also observed. The presence of pleural effusion is associated with a poor prognosis in PEH. The deep learning model, with an area under the receiver operating characteristic curve (AUC) of 0.71 [95% confidence interval (CI): 0.69-0.72], has a differentiation accuracy of 100% and 74% for the training and test sets, respectively. Conclusions: This study confirmed the heterogeneity of the imaging findings in PEH and showed several previously undescribed types and features. The current deep learning model based on CT has potential for clinical application and needs to be further explored in the future.
Affiliation(s)
- Junfeng Huang
- Department of Allergy and Clinical Immunology, National Center for Respiratory Medicine, National Clinical Research Center for Respiratory Disease, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Shuojia Xie
- Department of Allergy and Clinical Immunology, National Center for Respiratory Medicine, National Clinical Research Center for Respiratory Disease, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Nanshan School of Medicine, Guangzhou Medical University, Guangzhou, China
- Junjie Huang
- Department of Radiology, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Department of Medical Imaging, Foshan Hospital of Traditional Chinese Medicine, Foshan, China
- Ziwen Zheng
- Department of Allergy and Clinical Immunology, National Center for Respiratory Medicine, National Clinical Research Center for Respiratory Disease, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Zikai Lin
- Department of Allergy and Clinical Immunology, National Center for Respiratory Medicine, National Clinical Research Center for Respiratory Disease, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Nanshan School of Medicine, Guangzhou Medical University, Guangzhou, China
- Jinsheng Lin
- Department of Allergy and Clinical Immunology, National Center for Respiratory Medicine, National Clinical Research Center for Respiratory Disease, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Kailun Tang
- Department of Allergy and Clinical Immunology, National Center for Respiratory Medicine, National Clinical Research Center for Respiratory Disease, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Clinical Medical College of Henan University, Kaifeng, China
- Mingqiang Meng
- The School of Biomedical Engineering, Southern Medical University, Guangzhou, China
- Guangdong Artificial Intelligence and Digital Economy Laboratory (Guangzhou), Guangzhou, China
- Yulin Zhao
- Nanshan School of Medicine, Guangzhou Medical University, Guangzhou, China
- Wanzhe Liao
- Nanshan School of Medicine, Guangzhou Medical University, Guangzhou, China
- Chunping Liu
- Department of Allergy and Clinical Immunology, National Center for Respiratory Medicine, National Clinical Research Center for Respiratory Disease, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Yingying Gu
- Department of Allergy and Clinical Immunology, National Center for Respiratory Medicine, National Clinical Research Center for Respiratory Disease, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Shiyue Li
- Department of Allergy and Clinical Immunology, National Center for Respiratory Medicine, National Clinical Research Center for Respiratory Disease, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Huai Chen
- Department of Radiology, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Ruchong Chen
- Department of Allergy and Clinical Immunology, National Center for Respiratory Medicine, National Clinical Research Center for Respiratory Disease, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
4
Ma W, Wu H, Chen Y, Xu H, Jiang J, Du B, Wan M, Ma X, Chen X, Lin L, Su X, Bao X, Shen Y, Xu N, Ruan J, Jiang H, Ding Y. New techniques to identify the tissue of origin for cancer of unknown primary in the era of precision medicine: progress and challenges. Brief Bioinform 2024; 25:bbae028. [PMID: 38343328 PMCID: PMC10859692 DOI: 10.1093/bib/bbae028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 12/10/2023] [Accepted: 01/11/2024] [Indexed: 02/15/2024] Open
Abstract
Despite a standardized diagnostic examination, cancer of unknown primary (CUP) is a rare metastatic malignancy with an unidentified tissue of origin (TOO). Patients diagnosed with CUP are typically treated with empiric chemotherapy, although their prognosis is worse than those with metastatic cancer of a known origin. TOO identification of CUP has been employed in precision medicine, and subsequent site-specific therapy is clinically helpful. For example, molecular profiling, including genomic profiling, gene expression profiling, epigenetics and proteins, has facilitated TOO identification. Moreover, machine learning has improved identification accuracy, and non-invasive methods, such as liquid biopsy and image omics, are gaining momentum. However, the heterogeneity in prediction accuracy, sample requirements and technical fundamentals among the various techniques is noteworthy. Accordingly, we systematically reviewed the development and limitations of novel TOO identification methods, compared their pros and cons and assessed their potential clinical usefulness. Our study may help patients shift from empirical to customized care and improve their prognoses.
Affiliation(s)
- Wenyuan Ma
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Hui Wu
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yiran Chen
- Department of Surgical Oncology, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Hongxia Xu
- Zhejiang University-University of Edinburgh Institute (ZJU-UoE Institute), Zhejiang University School of Medicine, Zhejiang University, Haining, China
- Junjie Jiang
- Department of Gastroenterology, Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Bang Du
- Real Doctor AI Research Centre, School of Medicine, Zhejiang University, Hangzhou 310058, China
- Mingyu Wan
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xiaolu Ma
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xiaoyu Chen
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Lili Lin
- Department of Nuclear Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xinhui Su
- Department of Nuclear Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xuanwen Bao
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yifei Shen
- Department of Laboratory Medicine, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Nong Xu
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Jian Ruan
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Haiping Jiang
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yongfeng Ding
- Department of Medical Oncology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
5
Pan P, Li J, Wang B, Tan X, Yin H, Han Y, Wang H, Shi X, Li X, Xie C, Chen L, Chen L, Bai Y, Li Z, Tian G. Molecular characterization of colorectal adenoma and colorectal cancer via integrated genomic transcriptomic analysis. Front Oncol 2023; 13:1067849. [PMID: 37546388 PMCID: PMC10401844 DOI: 10.3389/fonc.2023.1067849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Accepted: 06/21/2023] [Indexed: 08/08/2023] Open
Abstract
Introduction: Colorectal adenoma can develop into colorectal cancer. Determining the risk of tumorigenesis in colorectal adenoma would be critical for avoiding the development of colorectal cancer; however, genomic features that could help predict the risk of tumorigenesis remain uncertain. Methods: In this work, DNA and RNA parallel capture sequencing data covering 519 genes from colorectal adenoma and colorectal cancer samples were collected. The somatic mutation profiles were obtained from DNA sequencing data, and the expression profiles were obtained from RNA sequencing data. Results: Despite some similarities between the adenoma samples and the cancer samples, different mutation frequencies, co-occurrences, and mutually exclusive patterns were detected in the mutation profiles of patients with colorectal adenoma and colorectal cancer. Differentially expressed genes were also detected between the two patient groups using RNA sequencing. Finally, two random forest classification models were built, one based on mutation profiles and one based on expression profiles. The models distinguished adenoma and cancer samples with accuracy levels of 81.48% and 100.00%, respectively, showing the potential of the 519-gene panel for monitoring adenoma patients in clinical practice. Conclusion: This study revealed molecular characteristics and correlations between colorectal adenoma and colorectal cancer, and it demonstrated that the 519-gene panel may be used for early monitoring of the progression of colorectal adenoma to cancer.
Affiliation(s)
- Peng Pan
- Department of Gastroenterology, Shanghai Changhai Hospital, Shanghai, China
- Jingnan Li
- Department of Gastroenterology, Peking Union Medical College Hospital, Beijing, China
- Bo Wang
- Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Xiaoyan Tan
- Department of Gastroenterology, Maoming People's Hospital, Maoming, China
- Hekun Yin
- Department of Gastroenterology, Jiangmen Central Hospital, Jiangmen, China
- Yingmin Han
- Department of Bioinformatics, Boke Biotech Co., Ltd., Wuxi, China
- Haobin Wang
- Department of Bioinformatics, Boke Biotech Co., Ltd., Wuxi, China
- Xiaoli Shi
- Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Xiaoshuang Li
- Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Cuinan Xie
- Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Longfei Chen
- Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Lanyou Chen
- Department of Science, Geneis Beijing Co., Ltd., Beijing, China
- Yu Bai
- Department of Gastroenterology, Shanghai Changhai Hospital, Shanghai, China
- Zhaoshen Li
- Department of Gastroenterology, Shanghai Changhai Hospital, Shanghai, China
- Geng Tian
- Department of Bioinformatics, Boke Biotech Co., Ltd., Wuxi, China
6
Li S, Wang B, Chang M, Hou R, Tian G, Tong L. A Novel Algorithm for Detecting Microsatellite Instability Based on Next-Generation Sequencing Data. Front Oncol 2022; 12:916379. [PMID: 35847873 PMCID: PMC9280483 DOI: 10.3389/fonc.2022.916379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Accepted: 05/27/2022] [Indexed: 11/25/2022] Open
Abstract
Objectives: Microsatellite instability (MSI) is the condition of genetic hypermutability caused by spontaneous acquisition or loss of nucleotides during DNA replication. MSI has been found to be a clinically useful immunotherapy biomarker. The main DNA-based method for MSI detection is polymerase chain reaction (PCR) amplification and fragment length analysis, which are costly and laborious. Thus, we developed a novel method to detect MSI based on next-generation sequencing (NGS) data. Methods: We chose six markers of MSI. After alignment and read counting, a histogram was plotted showing the read counts at different lengths for each marker. We then designed an algorithm to discover peaks in the generated histograms so that the number of peaks discovered in the NGS data resembled that found by the PCR-based method. Results: We selected nine samples as the training dataset, 101 samples for validation, and 68 samples as the test dataset from Chifeng Municipal Hospital, Inner Mongolia, China. The NGS-based method achieved 100% accuracy for the validation dataset and 98.53% accuracy for the test dataset, in which only one false positive was detected. Conclusions: Accurate MSI judgments were achieved using NGS data, which could provide MSI detection comparable to the gold-standard PCR-based methods.
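The abstract does not spell out the peak-finding rule, so the following is only a minimal Python sketch of the general idea: build a read-count histogram over repeat lengths for one marker and count its local maxima. The function names, the `min_fraction` noise threshold, and the local-maximum rule are assumptions made for illustration, not the authors' algorithm.

```python
from collections import Counter


def length_histogram(read_lengths):
    """Count reads per observed repeat length for a single marker."""
    return Counter(read_lengths)


def find_peaks(hist, min_fraction=0.1):
    """Return lengths that are local maxima of the histogram.

    A length counts as a peak when its read count is strictly greater
    than the count one unit shorter, at least as large as the count one
    unit longer, and at least `min_fraction` of the tallest bar.
    `min_fraction` is a hypothetical noise filter, not a published value.
    """
    if not hist:
        return []
    tallest = max(hist.values())
    peaks = []
    for length in sorted(hist):
        count = hist[length]
        left = hist.get(length - 1, 0)
        right = hist.get(length + 1, 0)
        if count >= min_fraction * tallest and count > left and count >= right:
            peaks.append(length)
    return peaks


# Toy data: one marker with a single allele length, one with a shifted second peak.
stable_reads = [25] * 80 + [24] * 15 + [26] * 5
shifted_reads = [25] * 50 + [24] * 10 + [21] * 30 + [20] * 8 + [22] * 6

print(find_peaks(length_histogram(stable_reads)))   # [25]
print(find_peaks(length_histogram(shifted_reads)))  # [21, 25]
```

How the peak counts across the six markers are turned into an MSI call is not stated in the abstract, so that step is left out here.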
Affiliation(s)
- Shijun Li
- Pathology Department, Chifeng Municipal Hospital, Chifeng, China
- Bo Wang
- Science Department, Geneis Beijing Co., Ltd., Beijing, China
- Miaomiao Chang
- Pathology Department, Chifeng Municipal Hospital, Chifeng, China
- Rui Hou
- Science Department, Geneis Beijing Co., Ltd., Beijing, China
- Geng Tian
- Science Department, Geneis Beijing Co., Ltd., Beijing, China
- Ling Tong
- Pathology Department, Chifeng Municipal Hospital, Chifeng, China
- *Correspondence: Geng Tian; Ling Tong
7
Informative SNP Selection Based on a Fuzzy Clustering and Improved Binary Particle Swarm Optimization Algorithm. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:3837579. [PMID: 35756402 PMCID: PMC9225903 DOI: 10.1155/2022/3837579] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/17/2022] [Revised: 04/14/2022] [Accepted: 04/30/2022] [Indexed: 12/04/2022]
Abstract
Single-nucleotide polymorphism (SNP) involves the replacement of a single nucleotide in a deoxyribonucleic acid (DNA) sequence and is often linked to the development of specific diseases. Although current genotyping methods can tag SNP loci within biological samples to provide accurate genetic information for an associated disease, they have limited prediction accuracy. Furthermore, they are complex to perform and may result in the prediction of an excessive number of tag SNP loci, which may not always be associated with the disease. Therefore, in this manuscript, we aimed to evaluate the impact of a newly optimized fuzzy clustering and binary particle swarm optimization algorithm (FCBPSO) on the accuracy and running time of informative SNP selection. Fuzzy clustering and FCBPSO were first applied to identify the equivalence relation and the candidate tag SNP set to reduce the redundancy between loci. The FCBPSO algorithm was then optimized and used to obtain the final tag SNP set. The prediction performance and running time of the newly developed model were compared with other traditional methods, including NMC, SPSO, and MCMR. The prediction accuracy of the FCBPSO algorithm was generally higher than that of the other algorithms, especially as the number of tag SNPs increased. When the number of tag SNPs was low, however, the prediction accuracy of FCBPSO was slightly lower than that of MCMR. The running time of the FCBPSO algorithm was nevertheless always lower than that of MCMR. FCBPSO not only reduced the size and dimension of the optimization problem but also simplified the training of the prediction model. This improved the prediction accuracy of the model and reduced the running time when compared with other traditional methods.
8
Ding Y, Jiang J, Xu J, Chen Y, Zheng Y, Jiang W, Mao C, Jiang H, Bao X, Shen Y, Li X, Teng L, Xu N. Site-specific therapy in cancers of unknown primary site: a systematic review and meta-analysis. ESMO Open 2022; 7:100407. [PMID: 35248824 PMCID: PMC8897579 DOI: 10.1016/j.esmoop.2022.100407] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/17/2022] [Revised: 01/22/2022] [Accepted: 01/25/2022] [Indexed: 12/01/2022] Open
Abstract
Background: Cancer of unknown primary site (CUP) is a term applied to characterize pathologically confirmed metastatic cancer with unknown primary tumor origin. It remains uncertain whether patients with CUP benefit from site-specific therapy guided by molecular profiling. Patients and methods: A systematic search in PubMed, Web of Science, Embase, Cochrane Library, and ClinicalTrials.gov, and of conference abstracts from January 1976 to January 2021 was performed to identify studies investigating the efficacy of site-specific therapy on patients with CUP. The quality of included studies was evaluated using the Cochrane risk of bias tool and Newcastle–Ottawa scale. Eligible studies were weighted and pooled for meta-analysis. Hazard ratios (HRs) for overall survival (OS) and progression-free survival (PFS) were assessed to compare the efficacy of site-specific therapy with empiric therapy in patients with CUP. In addition, subgroup analyses were conducted. Results: Five studies comprising 1114 patients were identified, of which 454 patients received site-specific therapy, and 660 patients received empiric therapy. Our meta-analysis revealed that site-specific therapy was not significantly associated with improved PFS [HR 0.93, 95% confidence interval (CI) 0.74-1.17, P = 0.534] and OS (HR 0.75, 95% CI 0.55-1.03, P = 0.069), compared with empiric therapy. However, during subgroup analysis significantly improved OS was associated with site-specific therapy in the high-accuracy predictive assay subgroup (HR 0.46, 95% CI 0.26-0.81, P = 0.008) compared with the low accuracy predictive assay subgroup (HR 0.93, 95% CI 0.75-1.15, P = 0.509). Furthermore, compared with patients with less responsive tumor types, more survival benefit from site-specific therapy was found in patients with more responsive tumors (HR 0.67, 95% CI 0.46-0.97, P = 0.037).
Conclusions: Our results suggest that site-specific therapy is not significantly associated with improved survival outcomes; however, it might benefit patients with CUP with responsive tumor types. Studies evaluating the role of site-specific therapy guided by molecular profiling in CUP provided contradictory results. Site-specific therapy is not significantly associated with improved survival outcomes in the overall CUP population. Molecularly defined site-specific therapy may improve OS only when high-accuracy assays assign CUP to responsive tumor types.
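As a reading aid for the hazard ratios above, the sketch below shows the standard way a two-sided P value can be approximated from a reported HR and its 95% CI, assuming the interval is symmetric on the log scale and the test statistic is normal. The function name `hr_z_and_p` is made up for this example, and because the published figures are rounded, the result will only roughly match the reported P values.

```python
import math


def hr_z_and_p(hr, ci_low, ci_high, z_crit=1.96):
    """Approximate z statistic and two-sided P value from an HR and its CI.

    Assumes the CI was built as exp(log(HR) +/- z_crit * SE), so the
    standard error of log(HR) is the CI width on the log scale divided
    by 2 * z_crit.
    """
    log_hr = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided tail probability of a standard normal: P(|Z| > |z|).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Values taken from the abstract: overall OS and the high-accuracy assay subgroup.
for label, hr, lo, hi in [
    ("OS, all studies", 0.75, 0.55, 1.03),
    ("OS, high-accuracy assays", 0.46, 0.26, 0.81),
]:
    z, p = hr_z_and_p(hr, lo, hi)
    print(f"{label}: z = {z:.2f}, approx. P = {p:.3f}")
```

A CI that spans 1 (as for overall OS) corresponds to an approximate P value above 0.05, while the high-accuracy subgroup interval, which excludes 1, falls below it.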
Affiliation(s)
- Y Ding
- Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- J Jiang
- Department of Surgical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- J Xu
- Department of Thoracic Surgery, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Y Chen
- Department of Surgical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Y Zheng
- Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- W Jiang
- Department of Colorectal Surgery, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- C Mao
- Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- H Jiang
- Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- X Bao
- Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Y Shen
- Centre of Clinical Laboratory, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China; Key Laboratory of Clinical In Vitro Diagnostic Techniques of Zhejiang Province, Hangzhou, China; Institute of Laboratory Medicine, Zhejiang University, Hangzhou, China
- X Li
- Department of Surgery, The Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- L Teng
- Department of Surgical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China.
- N Xu
- Department of Medical Oncology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China.
9
Wu Q, Li D. CRIA: An Interactive Gene Selection Algorithm for Cancers Prediction Based on Copy Number Variations. FRONTIERS IN PLANT SCIENCE 2022; 13:839044. [PMID: 35386679 PMCID: PMC8978562 DOI: 10.3389/fpls.2022.839044] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/19/2021] [Accepted: 01/19/2022] [Indexed: 05/05/2023]
Abstract
Genomic copy number variations (CNVs) are among the most important structural variations of genes found to be related to the risk of individual cancer and therefore they can be utilized to provide a clue to the research on the formation and progression of cancer. In this paper, an improved computational gene selection algorithm called CRIA (correlation-redundancy and interaction analysis based on gene selection algorithm) is introduced to screen genes that are closely related to cancer from the whole genome based on the value of gene CNVs. The CRIA algorithm mainly consists of two parts. Firstly, the main effect feature is selected out from the original feature set that has the largest correlation with the class label. Secondly, after the analysis involving correlation, redundancy and interaction for each feature in the candidate feature set, we choose the feature that maximizes the value of the custom selection criterion and add it into the selected feature set and then remove it from the candidate feature set in each selection round. Based on the real datasets, CRIA selects the top 200 genes to predict the type of cancer. The experiments' results of our research show that, compared with the state-of-the-art related methods, the CRIA algorithm can extract the key features of CNVs and a better classification performance can be achieved based on them. In addition, the interpretable genes highly related to cancer can be known, which may provide new clues at the genetic level for the treatment of the cancer.
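The two-part selection loop described in this abstract can be sketched as a greedy forward search. This is only an illustration: the abstract does not give CRIA's custom selection criterion, so the score below (absolute Pearson correlation with the label minus mean absolute correlation with already-selected features) is an assumed stand-in that ignores the interaction term, and the names `pearson`, `greedy_select`, and the toy gene columns are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def greedy_select(features, labels, k):
    """Pick k feature names from a dict mapping name -> column of values.

    ASSUMPTION: score = relevance - redundancy. This is a placeholder for
    the paper's criterion, which the abstract does not specify.
    """
    candidates = dict(features)
    relevance = {name: abs(pearson(col, labels)) for name, col in candidates.items()}

    # Part 1: the main-effect feature has the largest correlation with the label.
    first = max(relevance, key=relevance.get)
    selected = [first]
    del candidates[first]

    # Part 2: each round, move the best-scoring candidate into the selected set.
    while candidates and len(selected) < k:
        def score(name):
            redundancy = sum(
                abs(pearson(candidates[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(candidates, key=score)
        selected.append(best)
        del candidates[best]
    return selected


# Toy CNV-like columns (hypothetical values, not data from the paper).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy = {
    "gene_a": [0, 0, 0, 1, 2, 2, 2, 2],  # strongly label-correlated
    "gene_b": [0, 0, 1, 1, 2, 2, 2, 1],  # relevant but redundant with gene_a
    "gene_c": [1, 0, 1, 0, 1, 1, 0, 1],  # weaker, but adds new information
}
top_two = greedy_select(toy, labels, 2)
```

With this stand-in score, the redundant `gene_b` is passed over in favour of `gene_c` in the second round, which is the behaviour a redundancy-aware criterion is meant to produce.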
|
10
|
Pang H, Zhang G, Yan N, Lang J, Liang Y, Xu X, Cui Y, Wu X, Li X, Shan M, Wang X, Meng X, Liu J, Tian G, Cai L, Yuan D, Wang X. Evaluating the Risk of Breast Cancer Recurrence and Metastasis After Adjuvant Tamoxifen Therapy by Integrating Polymorphisms in Cytochrome P450 Genes and Clinicopathological Characteristics. Front Oncol 2021; 11:738222. [PMID: 34868931 PMCID: PMC8639703 DOI: 10.3389/fonc.2021.738222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2021] [Accepted: 10/25/2021] [Indexed: 11/13/2022]
Abstract
Tamoxifen (TAM) is the most commonly used adjuvant endocrine drug for hormone receptor-positive (HR+) breast cancer patients. However, how to accurately evaluate the risk of breast cancer recurrence and metastasis after adjuvant TAM therapy is still a major concern. In recent years, many studies have shown that the clinical outcomes of TAM-treated breast cancer patients are influenced by the activity of some cytochrome P450 (CYP) enzymes that catalyze the formation of active TAM metabolites like endoxifen and 4-hydroxytamoxifen. In this study, we aimed to first develop and validate an algorithm combining polymorphisms in CYP genes and clinicopathological signatures to identify a subpopulation of breast cancer patients who might benefit most from TAM adjuvant therapy and meanwhile evaluate major risk factors related to TAM resistance. Specifically, a total of 256 patients with invasive breast cancer who received adjuvant endocrine therapy were selected. The genotypes at 10 loci from three TAM metabolism-related CYP genes were detected by time-of-flight mass spectrometry and multiplex long PCR. Combining the 10 loci with nine clinicopathological characteristics, we obtained 19 important features whose association with cancer recurrence was assessed by importance score via random forests. After that, a logistic regression model was trained to calculate TAM risk-of-recurrence score (TAM RORs), which is adopted to assess a patient's risk of recurrence after TAM treatment. The sensitivity and specificity of the model in an independent test cohort were 86.67% and 64.56%, respectively. This study showed that breast cancer patients with high TAM RORs were less sensitive to TAM treatment and manifested more invasive characteristics, whereas those with low TAM RORs were highly sensitive to TAM treatment, and their conditions were stable during the follow-up period. There were some risk factors that had a significant effect on the efficacy of TAM. 
These were tissue classification (tumor Grade < 2 vs. Grade ≥ 2, p = 2.2e-16), the number of lymph node metastases (Node-Negative vs. Node < 4, p = 5.3e-07; Node < 4 vs. Node ≥ 4, p = 0.003; Node-Negative vs. Node ≥ 4, p = 7.2e-15), and the expression levels of estrogen receptor (ER) and progesterone receptor (PR) (ER < 50% vs. ER ≥ 50%, p = 1.3e-12; PR < 50% vs. PR ≥ 50%, p = 2.6e-08). Notably, different genotypes of CYP2D6*10(C188T) showed significant differences in prediction function (CYP2D6*10 CC vs. TT, p < 0.019; CYP2D6*10 CT vs. TT, p < 0.037). More than 50% of Chinese individuals carry the CYP2D6*10 mutation, so the genotype of CYP2D6*10(C188T) should be tested before TAM therapy.
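For readers unfamiliar with the reported metrics, the sketch below shows how a logistic risk score and the sensitivity and specificity of thresholded predictions are computed in general. It is not the authors' model: the function names, weights, and labels are hypothetical toy values, and nothing here reproduces the 86.67% and 64.56% figures from the abstract.

```python
from math import exp


def risk_score(features, weights, bias):
    """Logistic-regression style score in (0, 1): sigmoid(bias + w . x)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = recurrence."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy cohort: 3 recurrences, 4 non-recurrences.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Sensitivity is the fraction of true recurrences the model flags, and specificity is the fraction of non-recurrences it correctly clears; in the toy cohort these are 2/3 and 3/4.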
Affiliation(s)
- Hui Pang
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
| | - Guoqiang Zhang
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
| | - Na Yan
- Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
- Department of Science, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
| | - Jidong Lang
- Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
- Department of Science, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
| | - Yuebin Liang
- Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
- Department of Science, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
| | - Xinyuan Xu
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
| | - Yaowen Cui
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
| | - Xueya Wu
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
| | - Xianjun Li
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
| | - Ming Shan
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
| | - Xiaoqin Wang
- Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
| | - Xiangzhi Meng
- Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Jiaxiang Liu
- Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Geng Tian
- Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
- Department of Science, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
| | - Li Cai
- Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
| | - Dawei Yuan
- Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
| | - Xin Wang
- Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| |
|
11
|
Smith J, Shi Y, Benedikt M, Nikolic M. Scalable analysis of multi-modal biomedical data. Gigascience 2021; 10:giab058. [PMID: 34508579 PMCID: PMC8434767 DOI: 10.1093/gigascience/giab058] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/18/2020] [Revised: 05/31/2021] [Accepted: 08/18/2021] [Indexed: 11/15/2022]
Abstract
BACKGROUND Targeted diagnosis and treatment options are dependent on insights drawn from multi-modal analysis of large-scale biomedical datasets. Advances in genomics sequencing, image processing, and medical data management have supported data collection and management within medical institutions. These efforts have produced large-scale datasets and have enabled integrative analyses that provide a more thorough look of the impact of a disease on the underlying system. The integration of large-scale biomedical data commonly involves several complex data transformation steps, such as combining datasets to build feature vectors for learning analysis. Thus, scalable data integration solutions play a key role in the future of targeted medicine. Though large-scale data processing frameworks have shown promising performance for many domains, they fail to support scalable processing of complex datatypes. SOLUTION To address these issues and achieve scalable processing of multi-modal biomedical data, we present TraNCE, a framework that automates the difficulties of designing distributed analyses with complex biomedical data types. PERFORMANCE We outline research and clinical applications for the platform, including data integration support for building feature sets for classification. We show that the system is capable of outperforming the common alternative, based on "flattening" complex data structures, and runs efficiently when alternative approaches are unable to perform at all.
Affiliation(s)
- Jaclyn Smith
- University of Oxford, Computer Science, Wolfson Building, Parks Road, Oxford OX1 3QD, UK
| | - Yao Shi
- University of Oxford, Computer Science, Wolfson Building, Parks Road, Oxford OX1 3QD, UK
| | - Michael Benedikt
- University of Oxford, Computer Science, Wolfson Building, Parks Road, Oxford OX1 3QD, UK
| | - Milos Nikolic
- University of Edinburgh, School of Informatics, Informatics Forum, 10 Crichton St, Newington, Edinburgh EH8 9AB, Scotland
| |
|
12
|
Zhao L, Li Y, Wang Y, Gao Q, Ge Z, Sun X, Li Y. Development and Validation of a Nomogram for the Prediction of Hospital Mortality of Patients With Encephalopathy Caused by Microbial Infection: A Retrospective Cohort Study. Front Microbiol 2021; 12:737066. [PMID: 34489922 PMCID: PMC8417384 DOI: 10.3389/fmicb.2021.737066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/06/2021] [Accepted: 08/02/2021] [Indexed: 12/12/2022]
Abstract
Background: Hospital mortality is high for patients with encephalopathy caused by microbial infection. Microbial infections often induce sepsis. The damage to the central nervous system (CNS) is defined as sepsis-associated encephalopathy (SAE). However, the relationship between pathogenic microorganisms and the prognosis of SAE patients is still unclear, especially for gut microbiota, and there is no clinical tool to predict hospital mortality for SAE patients. The study aimed to explore the relationship between pathogenic microorganisms and the hospital mortality of SAE patients and develop a nomogram for the prediction of hospital mortality in SAE patients. Methods: The study is a retrospective cohort study. The lasso regression model was used for data dimension reduction and feature selection. A model of hospital mortality in SAE patients was developed by multivariable Cox regression analysis. Calibration and discrimination were used to assess the performance of the nomogram. Decision curve analysis (DCA) was used to evaluate the clinical utility of the model. Results: Our study did not find that intestinal infection or gastrointestinal microorganisms (such as Escherichia coli) were related to the prognosis of SAE. Lasso regression and multivariate Cox regression indicated that factors including respiratory failure, lactate, international normalized ratio (INR), albumin, SpO2, temperature, and renal replacement therapy were significantly correlated with hospital mortality. The AUC of the nomogram (0.812) exceeded that of the Simplified Acute Physiology Score (0.745), indicating excellent discrimination. DCA demonstrated that using the nomogram or including the prognostic signature score status was better than without the nomogram or using the SAPS II at predicting hospital mortality. Conclusion: The prognosis of SAE patients has nothing to do with intestinal and microbial infections. 
We developed a nomogram that predicts hospital mortality in patients with SAE according to clinical data. The nomogram exhibited excellent discrimination and calibration capacity, favoring its clinical utility.
Affiliation(s)
- Lina Zhao
- Emergency Department, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
- Department of Critical Care Medicine, Chifeng Municipal Hospital, Chifeng Clinical Medical College of Inner Mongolia Medical University, Chifeng, China
| | - Yun Li
- Department of Anesthesiology, Chifeng Municipal Hospital, Chifeng Clinical Medical College of Inner Mongolia Medical University, Chifeng, China
| | - Yunying Wang
- Department of Critical Care Medicine, Chifeng Municipal Hospital, Chifeng Clinical Medical College of Inner Mongolia Medical University, Chifeng, China
| | - Qian Gao
- Department of Neurology, Yidu Central Hospital Affiliated to Weifang Medical University, Weifang, China
| | - Zengzheng Ge
- Emergency Department, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| | - Xibo Sun
- Department of Neurology, Yidu Central Hospital Affiliated to Weifang Medical University, Weifang, China
| | - Yi Li
- Emergency Department, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| |
|
13
|
Yang J, Hui Y, Zhang Y, Zhang M, Ji B, Tian G, Guo Y, Tang M, Li L, Guo B, Ma T. Application of Circulating Tumor DNA as a Biomarker for Non-Small Cell Lung Cancer. Front Oncol 2021; 11:725938. [PMID: 34422670 PMCID: PMC8375502 DOI: 10.3389/fonc.2021.725938] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 06/16/2021] [Accepted: 07/19/2021] [Indexed: 12/21/2022]
Abstract
Background Non-small cell lung cancer (NSCLC) is one of the most prevalent causes of cancer-related death worldwide. Recently, there are many important medical advancements on NSCLC, such as therapies based on tyrosine kinase inhibitors and immune checkpoint inhibitors. Most of these therapies require tumor molecular testing for selecting patients who would benefit most from them. As invasive biopsy is highly risky, NSCLC molecular testing based on liquid biopsy has received more and more attention recently. Objective We aimed to introduce liquid biopsy and its potential clinical applications in NSCLC patients, including cancer diagnosis, treatment plan prioritization, minimal residual disease detection, and dynamic monitoring on the response to cancer treatment. Method We reviewed recent studies on circulating tumor DNA (ctDNA) testing, which is a minimally invasive approach to identify the presence of tumor-related mutations. In addition, we evaluated potential clinical applications of ctDNA as blood biomarkers for advanced NSCLC patients. Results Most studies have indicated that ctDNA testing is critical in diagnosing NSCLC, predicting clinical outcomes, monitoring response to targeted therapies and immunotherapies, and detecting cancer recurrence. Moreover, the changes of ctDNA levels are associated with tumor mutation burden and cancer progression. Conclusion The ctDNA testing is promising in guiding the therapies on NSCLC patients.
Affiliation(s)
- Jialiang Yang
- Chifeng Municipal Hospital, Chifeng, China
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Geneis Beijing Co., Ltd., Beijing, China
| | - Yan Hui
- Chifeng Municipal Hospital, Chifeng, China
| | | | | | - Binbin Ji
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Geneis Beijing Co., Ltd., Beijing, China
| | - Geng Tian
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Geneis Beijing Co., Ltd., Beijing, China
| | - Yangqiang Guo
- China National Intellectual Property Administration, Beijing, China
| | - Min Tang
- School of Life Sciences, Jiangsu University, Zhenjiang, China
| | | | - Bella Guo
- Genetron Health (Beijing) Co. Ltd., Beijing, China
| | - Tonghui Ma
- Genetron Health (Beijing) Co. Ltd., Beijing, China
| |
|
14
|
Meng Y, Jin M. HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis. Front Cell Dev Biol 2021; 9:696359. [PMID: 34277640 PMCID: PMC8278475 DOI: 10.3389/fcell.2021.696359] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/16/2021] [Accepted: 05/19/2021] [Indexed: 11/15/2022]
Abstract
The emergence of high-throughput RNA-seq data has offered unprecedented opportunities for cancer diagnosis. However, capturing biological data with highly nonlinear and complex associations by most existing approaches for cancer diagnosis has been challenging. In this study, we propose a novel hierarchical feature selection and second learning probability error ensemble model (named HFS-SLPEE) for precision cancer diagnosis. Specifically, we first integrated protein-coding gene expression profiles, non-coding RNA expression profiles, and DNA methylation data to provide rich information; afterward, we designed a novel hierarchical feature selection method, which takes the CpG-gene biological associations into account and can select a compact set of superior features; next, we used four individual classifiers with significant differences and apparent complementary to build the heterogeneous classifiers; lastly, we developed a second learning probability error ensemble model called SLPEE to thoroughly learn the new data consisting of classifiers-predicted class probability values and the actual label, further realizing the self-correction of the diagnosis errors. Benchmarking comparisons on TCGA showed that HFS-SLPEE performs better than the state-of-the-art approaches. Moreover, we analyzed in-depth 10 groups of selected features and found several novel HFS-SLPEE-predicted epigenomics and epigenetics biomarkers for breast invasive carcinoma (BRCA) (e.g., TSLP and ADAMTS9-AS2), lung adenocarcinoma (LUAD) (e.g., HBA1 and CTB-43E15.1), and kidney renal clear cell carcinoma (KIRC) (e.g., IRX2 and BMPR1B-AS1).
Affiliation(s)
| | - Min Jin
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| |
|
15
|
Gao B, Baudis M. Signatures of Discriminative Copy Number Aberrations in 31 Cancer Subtypes. Front Genet 2021; 12:654887. [PMID: 34054918 PMCID: PMC8155688 DOI: 10.3389/fgene.2021.654887] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/17/2021] [Accepted: 04/15/2021] [Indexed: 12/13/2022]
Abstract
Copy number aberrations (CNA) are one of the most important classes of genomic mutations related to oncogenetic effects. In the past three decades, a vast amount of CNA data has been generated by molecular-cytogenetic and genome sequencing based methods. While this data has been instrumental in the identification of cancer-related genes and promoted research into the relation between CNA and histo-pathologically defined cancer types, the heterogeneity of source data and derived CNV profiles pose great challenges for data integration and comparative analysis. Furthermore, a majority of existing studies have been focused on the association of CNA to pre-selected "driver" genes with limited application to rare drivers and other genomic elements. In this study, we developed a bioinformatics pipeline to integrate a collection of 44,988 high-quality CNA profiles of high diversity. Using a hybrid model of neural networks and attention algorithm, we generated the CNA signatures of 31 cancer subtypes, depicting the uniqueness of their respective CNA landscapes. Finally, we constructed a multi-label classifier to identify the cancer type and the organ of origin from copy number profiling data. The investigation of the signatures suggested common patterns, not only of physiologically related cancer types but also of clinico-pathologically distant cancer types such as different cancers originating from the neural crest. Further experiments of classification models confirmed the effectiveness of the signatures in distinguishing different cancer types and demonstrated their potential in tumor classification.
Affiliation(s)
- Bo Gao
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
| | - Michael Baudis
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
| |
|
16
|
Liu H, Qiu C, Wang B, Bing P, Tian G, Zhang X, Ma J, He B, Yang J. Evaluating DNA Methylation, Gene Expression, Somatic Mutation, and Their Combinations in Inferring Tumor Tissue-of-Origin. Front Cell Dev Biol 2021; 9:619330. [PMID: 34012960 PMCID: PMC8126648 DOI: 10.3389/fcell.2021.619330] [Citation(s) in RCA: 73] [Impact Index Per Article: 24.3] [Received: 10/20/2020] [Accepted: 03/22/2021] [Indexed: 12/18/2022]
Abstract
Carcinoma of unknown primary (CUP) is a type of metastatic cancer, the primary tumor site of which cannot be identified. CUP occupies approximately 5% of cancer incidences in the United States with usually unfavorable prognosis, making it a big threat to public health. Traditional methods to identify the tissue-of-origin (TOO) of CUP like immunohistochemistry can only deal with around 20% CUP patients. In recent years, more and more studies suggest that it is promising to solve the problem by integrating machine learning techniques with big biomedical data involving multiple types of biomarkers including epigenetic, genetic, and gene expression profiles, such as DNA methylation. Different biomarkers play different roles in cancer research; for example, genomic mutations in a patient’s tumor could lead to specific anticancer drugs for treatment; DNA methylation and copy number variation could reveal tumor tissue of origin and molecular classification. However, there is no systematic comparison on which biomarker is better at identifying the cancer type and site of origin. In addition, it might also be possible to further improve the inference accuracy by integrating multiple types of biomarkers. In this study, we used primary tumor data rather than metastatic tumor data. Although the use of primary tumors may lead to some biases in our classification model, their tumor-of-origins are known. In addition, previous studies have suggested that the CUP prediction model built from primary tumors could efficiently predict TOO of metastatic cancers (Lal et al., 2013; Brachtel et al., 2016). We systematically compared the performances of three types of biomarkers including DNA methylation, gene expression profile, and somatic mutation as well as their combinations in inferring the TOO of CUP patients. 
First, we downloaded the gene expression profile, somatic mutation and DNA methylation data of 7,224 tumor samples across 21 common cancer types from the cancer genome atlas (TCGA) and generated seven different feature matrices through various combinations. Second, we performed feature selection by the Pearson correlation method. The selected features for each matrix were used to build up an XGBoost multi-label classification model to infer cancer TOO, an algorithm proven to be effective in a few previous studies. The performance of each biomarker and combination was compared by the 10-fold cross-validation process. Our results showed that the TOO tracing accuracy using gene expression profile was the highest, followed by DNA methylation, while somatic mutation performed the worst. Meanwhile, we found that simply combining multiple biomarkers does not have much effect in improving prediction accuracy.
Affiliation(s)
- Haiyan Liu
- Academician Workstation, Changsha Medical University, Changsha, China
- College of Information Engineering, Changsha Medical University, Changsha, China
| | - Chun Qiu
- Department of Oncology, Hainan General Hospital, Haikou, China
| | - Bo Wang
- Geneis Beijing Co., Ltd., Beijing, China
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
| | - Pingping Bing
- Academician Workstation, Changsha Medical University, Changsha, China
| | - Geng Tian
- Geneis Beijing Co., Ltd., Beijing, China
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
| | - Xueliang Zhang
- Department of Oncology, Jiamusi Cancer Hospital, Jiamusi, China
| | - Jun Ma
- College of Information Engineering, Changsha Medical University, Changsha, China
| | - Bingsheng He
- Academician Workstation, Changsha Medical University, Changsha, China
| | - Jialiang Yang
- Academician Workstation, Changsha Medical University, Changsha, China
- Geneis Beijing Co., Ltd., Beijing, China
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
| |
|
17
|
Zhuang J, Liu D, Lin M, Qiu W, Liu J, Chen S. PseUdeep: RNA Pseudouridine Site Identification with Deep Learning Algorithm. Front Genet 2021; 12:773882. [PMID: 34868261 PMCID: PMC8637112 DOI: 10.3389/fgene.2021.773882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2021] [Accepted: 10/04/2021] [Indexed: 11/16/2022]
Abstract
Background: Pseudouridine (Ψ) is a common ribonucleotide modification that plays a significant role in many biological processes. The identification of Ψ modification sites is of great significance for research on disease mechanisms and biological processes, and machine learning algorithms are desirable because exploratory laboratory techniques are expensive and time-consuming. Results: In this work, we propose a deep learning framework, called PseUdeep, to identify Ψ sites of three species: H. sapiens, S. cerevisiae, and M. musculus. In this method, three encoding methods are used to extract the features of RNA sequences, that is, one-hot encoding, K-tuple nucleotide frequency pattern, and position-specific nucleotide composition. The three feature matrices are convolved twice and fed into the capsule neural network and bidirectional gated recurrent unit network with a self-attention mechanism for classification. Conclusion: Compared with other state-of-the-art methods, our model gets the highest prediction accuracy on the independent testing data set S-200, where the accuracy improves by 12.38%; on the independent testing data set H-200, the accuracy improves by 0.68%. Moreover, the dimensions of the features we derive from the RNA sequences are only 109, 109, and 119 in H. sapiens, M. musculus, and S. cerevisiae, which is much smaller than those used in the traditional algorithms. On evaluation via tenfold cross-validation and two independent testing data sets, PseUdeep outperforms the best traditional machine learning model available. PseUdeep source code and data sets are available at https://github.com/dan111262/PseUdeep.
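Two of the three sequence encodings named in this abstract can be illustrated with a short sketch. This is a generic rendering, not PseUdeep's code: the abstract does not define its exact K-tuple frequency pattern, so the sliding-window relative frequency used here is an assumption, and `ALPHABET`, `one_hot`, and `k_tuple_frequency` are hypothetical names.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides, in a fixed column order


def one_hot(seq):
    """Encode each nucleotide as a 4-element indicator row (A, C, G, U)."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


def k_tuple_frequency(seq, k):
    """Relative frequency of every possible k-mer, in lexicographic ACGU order.

    ASSUMPTION: overlapping windows with stride 1, normalized by the number
    of windows; the paper's exact definition may differ.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing unknown symbols
            counts[kmer] += 1
    return [counts[m] / total for m in kmers]


encoded = one_hot("AU")
dinucleotide = k_tuple_frequency("ACAC", 2)
```

For a sequence of length L, `one_hot` yields an L x 4 matrix, while `k_tuple_frequency` yields a fixed-length vector of 4^k entries regardless of L, which is why the two encodings are typically fed to the network as separate feature matrices.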
Affiliation(s)
- Jujuan Zhuang
- College of Science, Dalian Maritime University, Dalian, China
| | - Danyang Liu
- College of Science, Dalian Maritime University, Dalian, China
| | - Meng Lin
- College of Science, Dalian Maritime University, Dalian, China
| | - Wenjing Qiu
- Electrical and Information Engineering, Anhui University of Technology, Anhui, China
- Geneis (Beijing) Co., Ltd., Beijing, China
| | | | - Size Chen
- Department of Oncology, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
- Guangdong Provincial Engineering Research Center for Esophageal Cancer Precise Therapy, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
- Central Laboratory, The First Affiliated Hospital of Guangdong Pharmaceutical University, Guangzhou, China
- *Correspondence: Size Chen,
| |
|
18
|
Zhang Y, Feng T, Wang S, Dong R, Yang J, Su J, Wang B. A Novel XGBoost Method to Identify Cancer Tissue-of-Origin Based on Copy Number Variations. Front Genet 2020; 11:585029. [PMID: 33329723 PMCID: PMC7716814 DOI: 10.3389/fgene.2020.585029] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/19/2020] [Accepted: 10/05/2020] [Indexed: 01/18/2023]
Abstract
The discovery of cancer of unknown primary (CUP) is of great significance in designing more effective treatments and improving the diagnostic efficiency in cancer patients. In this study, we develop an appropriate machine learning model for tracing the tissue of origin of CUP with high accuracy after feature engineering and model evaluation. Based on copy number variation data consisting of 4,566 training cases and 1,262 independent validation cases, an XGBoost classifier is applied to 10 types of cancer. Extremely randomized trees (Extra trees) are used for dimension reduction so that fewer variables replace the original high-dimensional variables. Features with the top 300 weights are selected, and principal component analysis is applied to eliminate noise. We find that the XGBoost classifier achieves the highest overall accuracy of 0.8913 in the 10-fold cross-validation for training samples and 0.7421 on independent validation datasets for predicting tumor tissue of origin. Furthermore, by contrasting various performance indices, such as precision and recall rate, the experimental results show that the XGBoost classifier significantly improves the classification performance of various tumors with less prediction error, as compared to other classifiers, such as K-nearest neighbors (KNN), Bayes, support vector machine (SVM), and Adaboost. Our method can infer tissue of origin for the 10 cancer types with acceptable accuracy in both cross-validation and independent validation data. It may be used as an auxiliary diagnostic method to determine the actual clinicopathological status of specific cancer.
Affiliation(s)
- Yulin Zhang
- College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, China
| | - Tong Feng
- College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, China
| | - Shudong Wang
- College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao, China
| | - Ruyi Dong
- Geneis (Beijing) Co., Ltd., Beijing, China
| | | | - Jionglong Su
- School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi’an Jiaotong-Liverpool University, Suzhou, China
| | - Bo Wang
- Geneis (Beijing) Co., Ltd., Beijing, China
| |
|