1. Yao Y, Lv Y, Tong L, Liang Y, Xi S, Ji B, Zhang G, Li L, Tian G, Tang M, Hu X, Li S, Yang J. ICSDA: a multi-modal deep learning model to predict breast cancer recurrence and metastasis risk by integrating pathological, clinical and gene expression data. Brief Bioinform 2022; 23:6761046. [PMID: 36242564 DOI: 10.1093/bib/bbac448] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/23/2022] [Revised: 07/18/2022] [Accepted: 07/18/2022] [Indexed: 12/14/2022]
Abstract
Breast cancer patients often experience recurrence and metastasis after surgery. Predicting the risk of recurrence and metastasis for a breast cancer patient is essential for the development of precision treatment. In this study, we proposed a novel multi-modal deep learning prediction model that integrates hematoxylin & eosin (H&E)-stained histopathological images, clinical information and gene expression data. Specifically, we segmented tumor regions in H&E images into image blocks (256 × 256 pixels) and encoded each image block into a 1D feature vector using a deep neural network. Then, the attention module scored each area of the H&E-stained images and combined image features with clinical and gene expression data to predict the risk of recurrence and metastasis for each patient. To test the model, we downloaded all 196 breast cancer samples from The Cancer Genome Atlas for which clinical, gene expression and H&E information were simultaneously available. The samples were then divided into training and testing sets at a ratio of 7:3, with the sample distributions kept consistent between the two sets by hierarchical sampling. The multi-modal model achieved an area-under-the-curve value of 0.75 on the testing set, better than models based solely on H&E images, sequencing data or clinical data. This study might have clinical significance in identifying high-risk breast cancer patients, who may benefit from postoperative adjuvant treatment.
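The attention step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring vector `attn_vector`, the toy two-dimensional patch features, and the helper names `attention_pool` and `fuse` are all assumptions made for illustration. The sketch scores each image-block feature vector, turns the scores into softmax weights, pools the blocks into one slide-level vector, and concatenates that vector with clinical and gene expression features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patch_features, attn_vector):
    """Score each patch with a dot product against a (hypothetical)
    learned vector, then return the attention-weighted mean of the patches."""
    scores = [sum(f * w for f, w in zip(feat, attn_vector))
              for feat in patch_features]
    alphas = softmax(scores)
    dim = len(patch_features[0])
    pooled = [sum(a * feat[j] for a, feat in zip(alphas, patch_features))
              for j in range(dim)]
    return pooled, alphas


def fuse(pooled_image, clinical, expression):
    """Concatenate the three modalities into one input vector."""
    return pooled_image + clinical + expression


# Toy data: three image blocks with 2-D features (real encoders emit far more).
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn_vector = [0.5, -0.5]  # assumed, stands in for learned parameters
pooled, alphas = attention_pool(patches, attn_vector)
fused = fuse(pooled, [0.3], [1.2, 0.7])
print(alphas, fused)
```

In a trained model the fused vector would feed a classifier head that outputs the recurrence and metastasis risk; here it is only printed.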
Affiliation(s)
- Yuhua Yao
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China
  - Key Laboratory of Data Science and Intelligence Education, Ministry of Education, Hainan Normal University, Haikou, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou, China
- Yaping Lv
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China
  - Genies Beijing Co., Ltd., Beijing 100102, China
- Ling Tong
  - Chifeng Municipal Hospital, Chifeng, Inner Mongolia 024000, China
- Yuebin Liang
  - Genies Beijing Co., Ltd., Beijing 100102, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Shuxue Xi
  - Genies Beijing Co., Ltd., Beijing 100102, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Binbin Ji
  - Genies Beijing Co., Ltd., Beijing 100102, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Guanglu Zhang
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China
- Ling Li
  - Basic Courses Department, Zhejiang Shuren University, Hangzhou 310000, China
- Geng Tian
  - Genies Beijing Co., Ltd., Beijing 100102, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
- Min Tang
  - School of Life Sciences, Jiangsu University, Zhenjiang 212013, China
- Xiyue Hu
  - Dept. of Colorectal Surgery, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Science, 17 Panjiayuan Nanli, Chaoyang District, Beijing 100021, China
- Shijun Li
  - Chifeng Municipal Hospital, Chifeng, Inner Mongolia 024000, China
- Jialiang Yang
  - Genies Beijing Co., Ltd., Beijing 100102, China
  - Chifeng Municipal Hospital, Chifeng, Inner Mongolia 024000, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China
2. Lu Q, Chen F, Li Q, Chen L, Tong L, Tian G, Zhou X. A Machine Learning Method to Trace Cancer Primary Lesion Using Microarray-Based Gene Expression Data. Front Oncol 2022; 12:832567. [PMID: 35530331 PMCID: PMC9071249 DOI: 10.3389/fonc.2022.832567] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/10/2021] [Accepted: 03/21/2022] [Indexed: 11/17/2022]
Abstract
Cancer of unknown primary site (CUP) is a heterogeneous group of cancers whose tissue of origin remains unknown after detailed investigation by conventional clinical methods. CUP accounts for roughly 3%–5% of all human malignancies. CUP patients are usually treated with broad-spectrum chemotherapy, which often leads to a poor prognosis. Recent studies suggest that treatment targeting the primary lesion of CUP will significantly improve the prognosis of the patient. Therefore, it is urgent to develop an efficient method to accurately detect the tissue of origin of CUP in clinical cancer research. In this work, we developed a novel framework that uses Extreme Gradient Boosting (XGBoost) to trace the primary site of CUP based on microarray-based gene expression data. First, we downloaded the microarray-based gene expression profiles of 59,385 genes for 57,08 samples from The Cancer Genome Atlas (TCGA) and 6,364 genes for 3,101 samples from the Gene Expression Omnibus (GEO). Both datasets were divided into training and independent testing data at a ratio of 4:1. Then, we selected 200 and 290 genes from the TCGA and GEO training data, respectively, to train XGBoost models for the identification of the primary site of CUP. The overall 5-fold cross-validation accuracies of our methods were 96.9% and 95.3% on the TCGA and GEO training datasets, respectively. Meanwhile, the macro-precision on the independent testing data reached 96.75% and 98.8% on TCGA and GEO, respectively. Experimental results demonstrated that the XGBoost framework can not only reduce the cost of tracing the primary site of cancer in the clinic but also has high efficiency, which might be useful in clinical practice.
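The macro-precision reported in this abstract is the unweighted mean of the per-class precisions, so rare primary sites count as much as common ones. The following sketch shows that calculation on invented labels; the function name `macro_precision` and the toy tissue labels are assumptions for illustration, not data from the study.

```python
from collections import defaultdict


def macro_precision(y_true, y_pred):
    """Return (macro-precision, per-class precision).

    Precision for a class = correct predictions of that class divided by
    all predictions of that class. Classes never predicted score 0.0."""
    true_pos = defaultdict(int)
    predicted = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        if t == p:
            true_pos[p] += 1
    classes = sorted(set(y_true) | set(y_pred))
    per_class = {
        c: (true_pos[c] / predicted[c] if predicted[c] else 0.0)
        for c in classes
    }
    return sum(per_class.values()) / len(classes), per_class


# Toy example with three hypothetical primary sites.
y_true = ["lung", "lung", "breast", "breast", "colon", "colon"]
y_pred = ["lung", "breast", "breast", "breast", "colon", "lung"]
macro, per_class = macro_precision(y_true, y_pred)
print(per_class, round(macro, 4))
```

Here the per-class precisions are 1/2 for lung, 2/3 for breast and 1 for colon, so the macro average is 13/18.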
Affiliation(s)
- Qingfeng Lu
  - Oncology Department, Daqing Oilfield General Hospital, Daqing, China
- Fengxia Chen
  - Department of Thoracic Surgery, Hainan General Hospital, Haikou, China
- Qianyue Li
  - Department of R&D, Geneis (Beijing) Co., Ltd., Beijing, China
- Lihong Chen
  - Department of Emergency, Qingdao Eighth People's Hospital, Qingdao, China
- Ling Tong
  - Department of Pathology, Chifeng Municipal Hospital, Chifeng Clinical Medical School of Inner Mongolia Medical University, Chifeng, China
- Geng Tian
  - Department of R&D, Geneis (Beijing) Co., Ltd., Beijing, China
- Xiaohong Zhou
  - Second Division of Cancer, Jiamusi Cancer Hospital, Jiamusi, China
3. Pang H, Zhang G, Yan N, Lang J, Liang Y, Xu X, Cui Y, Wu X, Li X, Shan M, Wang X, Meng X, Liu J, Tian G, Cai L, Yuan D, Wang X. Evaluating the Risk of Breast Cancer Recurrence and Metastasis After Adjuvant Tamoxifen Therapy by Integrating Polymorphisms in Cytochrome P450 Genes and Clinicopathological Characteristics. Front Oncol 2021; 11:738222. [PMID: 34868931 PMCID: PMC8639703 DOI: 10.3389/fonc.2021.738222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2021] [Accepted: 10/25/2021] [Indexed: 11/13/2022]
Abstract
Tamoxifen (TAM) is the most commonly used adjuvant endocrine drug for hormone receptor-positive (HR+) breast cancer patients. However, how to accurately evaluate the risk of breast cancer recurrence and metastasis after adjuvant TAM therapy is still a major concern. In recent years, many studies have shown that the clinical outcomes of TAM-treated breast cancer patients are influenced by the activity of some cytochrome P450 (CYP) enzymes that catalyze the formation of active TAM metabolites like endoxifen and 4-hydroxytamoxifen. In this study, we aimed to first develop and validate an algorithm combining polymorphisms in CYP genes and clinicopathological signatures to identify a subpopulation of breast cancer patients who might benefit most from TAM adjuvant therapy and meanwhile evaluate major risk factors related to TAM resistance. Specifically, a total of 256 patients with invasive breast cancer who received adjuvant endocrine therapy were selected. The genotypes at 10 loci from three TAM metabolism-related CYP genes were detected by time-of-flight mass spectrometry and multiplex long PCR. Combining the 10 loci with nine clinicopathological characteristics, we obtained 19 important features whose association with cancer recurrence was assessed by importance score via random forests. After that, a logistic regression model was trained to calculate the TAM risk-of-recurrence score (TAM RORs), which is used to assess a patient's risk of recurrence after TAM treatment. The sensitivity and specificity of the model in an independent test cohort were 86.67% and 64.56%, respectively. This study showed that breast cancer patients with high TAM RORs were less sensitive to TAM treatment and manifested more invasive characteristics, whereas those with low TAM RORs were highly sensitive to TAM treatment, and their conditions were stable during the follow-up period. Several risk factors had a significant effect on the efficacy of TAM: tissue classification (tumor Grade < 2 vs. Grade ≥ 2, p = 2.2e-16), the number of lymph node metastases (Node-Negative vs. Node < 4, p = 5.3e-07; Node < 4 vs. Node ≥ 4, p = 0.003; Node-Negative vs. Node ≥ 4, p = 7.2e-15), and the expression levels of estrogen receptor (ER) and progesterone receptor (PR) (ER < 50% vs. ER ≥ 50%, p = 1.3e-12; PR < 50% vs. PR ≥ 50%, p = 2.6e-08). Notably, different genotypes of CYP2D6*10(C188T) show significant differences in prediction function (CYP2D6*10 CC vs. TT, p < 0.019; CYP2D6*10 CT vs. TT, p < 0.037). More than 50% of Chinese individuals carry the CYP2D6*10 mutation, so the genotype of CYP2D6*10(C188T) should be tested before TAM therapy.
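To make the scoring and evaluation steps in this abstract concrete, here is a minimal sketch of how a logistic regression risk score and the sensitivity and specificity metrics are computed. The coefficients, intercept, feature values and labels are invented for illustration; the abstract does not give the fitted model, and the names `risk_score` and `sensitivity_specificity` are hypothetical.

```python
import math


def risk_score(features, coefficients, intercept):
    """Logistic regression output: probability-like score in (0, 1)."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Labels are 1 (recurrence) and 0 (no recurrence)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Assumed toy model: two features with made-up weights.
neutral = risk_score([0.0, 0.0], [1.0, 1.0], 0.0)
high = risk_score([2.0, 1.0], [1.0, 1.0], 0.0)

# Assumed toy cohort: 3 recurrences, 4 non-recurrences.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(neutral, high, sens, spec)
```

A score threshold (not stated in the abstract) would turn the continuous score into the binary predictions that sensitivity and specificity are computed from.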
Affiliation(s)
- Hui Pang
  - Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Guoqiang Zhang
  - Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Na Yan
  - Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
  - Department of Science, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Jidong Lang
  - Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
  - Department of Science, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Yuebin Liang
  - Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
  - Department of Science, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Xinyuan Xu
  - Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Yaowen Cui
  - Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Xueya Wu
  - Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Xianjun Li
  - Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Ming Shan
  - Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Xiaoqin Wang
  - Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
- Xiangzhi Meng
  - Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jiaxiang Liu
  - Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Geng Tian
  - Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
  - Department of Science, Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Li Cai
  - Department of Medical Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Dawei Yuan
  - Department of Science, Geneis (Beijing) Co., Ltd., Beijing, China
- Xin Wang
  - Department of Breast Surgical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
4. Zhang Y, Xia L, Ma D, Wu J, Xu X, Xu Y. 90-Gene Expression Profiling for Tissue Origin Diagnosis of Cancer of Unknown Primary. Front Oncol 2021; 11:722808. [PMID: 34692498 PMCID: PMC8529103 DOI: 10.3389/fonc.2021.722808] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/18/2021] [Accepted: 09/21/2021] [Indexed: 11/13/2022]
Abstract
Cancer of unknown primary (CUP), in which metastatic disease exists without an identifiable primary location, accounts for about 3-5% of all cancer diagnoses. Successful diagnosis and treatment of such patients are difficult. This study aimed to assess the expression characteristics of 90 genes as a method of identifying the primary site from CUP samples. We validated a 90-gene expression assay and explored its potential diagnostic utility in 44 patients at Jiangsu Cancer Hospital. For each specimen, the expression of 90 tumor-specific genes in malignant tumors was analyzed, and similarity scores were obtained. The predicted types of malignant tumors were compared with the reference diagnosis to calculate the accuracy. In addition, we verified the consistency of the expression profiles of the 90 genes in CUP secondary malignancies and metastatic malignancies in The Cancer Genome Atlas. We also reported a detailed description of the next-generation coding sequences for CUP patients. Across the clinical specimens collected, the overall accuracy of the tumor type predicted by the 90-gene expression assay, measured against the reference diagnosis, was 95.4%. In addition, the 90-gene expression profile generally accurately classified CUP into the cluster of its primary tumor. Sequencing of the exome transcriptome containing 556 high-frequency gene mutation oncogenes was not significantly related to the 90-gene analysis. Our results demonstrate that the expression characteristics of these 90 genes can be used as a powerful tool to accurately identify the primary sites of CUP. In the future, the inclusion of the 90-gene expression assay in pathological diagnosis will help oncologists use precise treatments, thereby improving the care and outcomes of CUP patients.
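The abstract states that similarity scores were obtained for each specimen but does not say how they were computed. As one possible reading, the sketch below scores a sample against per-tumor-type reference expression profiles with cosine similarity and picks the best match. Cosine similarity, the three-gene toy profiles, and the names `cosine_similarity` and `predict_origin` are assumptions for illustration only, not the assay's actual method.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two expression vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_origin(sample, reference_profiles):
    """Score the sample against every reference profile and return the
    tumor type with the highest similarity, plus all scores."""
    scores = {
        tumor: cosine_similarity(sample, profile)
        for tumor, profile in reference_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical reference profiles over 3 genes (the real assay uses 90).
references = {
    "lung": [5.0, 1.0, 0.5],
    "breast": [0.5, 4.0, 1.0],
    "colorectal": [1.0, 0.5, 6.0],
}
sample = [0.8, 3.5, 1.2]
best, scores = predict_origin(sample, references)
print(best, scores)
```

Accuracy in the sense used by the abstract would then be the fraction of specimens whose predicted type matches the reference diagnosis.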
Affiliation(s)
- Yi Zhang
  - Department of Pathology, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, China
- Lei Xia
  - Department of Pathology, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, China
- Dawei Ma
  - Department of Pathology, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, China
- Jing Wu
  - Department of Radiation Oncology, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, China
- Xinyu Xu
  - Department of Pathology, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, China
- Youtao Xu
  - Department of Thoracic Surgery, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, The Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, China
5. Wharton KA, Wood D, Manesse M, Maclean KH, Leiss F, Zuraw A. Tissue Multiplex Analyte Detection in Anatomic Pathology - Pathways to Clinical Implementation. Front Mol Biosci 2021; 8:672531. [PMID: 34386519 PMCID: PMC8353449 DOI: 10.3389/fmolb.2021.672531] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 02/26/2021] [Accepted: 07/14/2021] [Indexed: 12/12/2022]
Abstract
Background: Multiplex tissue analysis has revolutionized our understanding of the tumor microenvironment (TME) with implications for biomarker development and diagnostic testing. Multiplex labeling is used for specific clinical situations, but there remain barriers to expanded use in anatomic pathology practice. Methods: We review immunohistochemistry (IHC) and related assays used to localize molecules in tissues, with reference to United States regulatory and practice landscapes. We review multiplex methods and strategies used in clinical diagnosis and in research, particularly in immuno-oncology. Within the framework of assay design and testing phases, we examine the suitability of multiplex immunofluorescence (mIF) for clinical diagnostic workflows, considering its advantages and challenges to implementation. Results: Multiplex labeling is poised to radically transform pathologic diagnosis because it can answer questions about tissue-level biology and single-cell phenotypes that cannot be addressed with traditional IHC biomarker panels. Widespread implementation will require improved detection chemistry, illustrated by InSituPlex technology (Ultivue, Inc., Cambridge, MA) that allows coregistration of hematoxylin and eosin (H&E) and mIF images, greater standardization and interoperability of workflow and data pipelines to facilitate consistent interpretation by pathologists, and integration of multichannel images into digital pathology whole slide imaging (WSI) systems, including interpretation aided by artificial intelligence (AI). Adoption will also be facilitated by evidence that justifies incorporation into clinical practice, an ability to navigate regulatory pathways, and adequate health care budgets and reimbursement. We expand the brightfield WSI system “pixel pathway” concept to multiplex workflows, suggesting that adoption might be accelerated by data standardization centered on cell phenotypes defined by coexpression of multiple molecules. 
Conclusion: Multiplex labeling has the potential to complement next generation sequencing in cancer diagnosis by allowing pathologists to visualize and understand every cell in a tissue biopsy slide. Until mIF reagents, digital pathology systems including fluorescence scanners, and data pipelines are standardized, we propose that diagnostic labs will play a crucial role in driving adoption of multiplex tissue diagnostics by using retrospective data from tissue collections as a foundation for laboratory-developed test (LDT) implementation and use in prospective trials as companion diagnostics (CDx).