1
Li S, Yi H, Leng Q, Wu Y, Mao Y. New perspectives on cancer clinical research in the era of big data and machine learning. Surg Oncol 2024; 52:102009. [PMID: 38215544 DOI: 10.1016/j.suronc.2023.102009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 10/16/2023] [Indexed: 01/14/2024]
Abstract
In the 21st century, the development of medical science has entered the era of big data, and machine learning has become an essential tool for mining medical big data. The establishment of the SEER database has provided a wealth of epidemiological data for cancer clinical research, and the number of studies based on SEER and machine learning has been growing in recent years. This article reviews recent research based on SEER and machine learning and finds that the current focus of such studies is primarily on the development and validation of models using machine learning algorithms, with the main directions being lymph node metastasis prediction, distant metastasis prediction, and prognosis-related research. Compared to traditional models, machine learning algorithms have the advantage of stronger adaptability, but also suffer from disadvantages such as overfitting and poor interpretability, which need to be weighed in practical applications. At present, machine learning algorithms, as the foundation of artificial intelligence, have just begun to emerge in the field of cancer clinical research. The future development of oncology will enter a more precise era of cancer research, characterized by larger data, higher dimensions, and more frequent information exchange. Machine learning is bound to shine brightly in this field.
Affiliation(s)
- Shujun Li
- Department of Hematology, Xiangya Hospital, Central South University, Changsha, 410008, China; National Clinical Research Center for Geriatric Diseases (Xiangya Hospital), China; Hunan Hematology Oncology Clinical Medical Research Center, China
- Hang Yi
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Qihao Leng
- Xiangya School of Medicine, Central South University, Changsha, 410013, Hunan Province, China
- You Wu
- Institute for Hospital Management, School of Medicine, Tsinghua University, 30 Shuangqing Rd, Haidian District, Beijing, China; Department of Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, 21205, USA.
- Yousheng Mao
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China.
2
Huang Y, Li J, Li M, Aparasu RR. Application of machine learning in predicting survival outcomes involving real-world data: a scoping review. BMC Med Res Methodol 2023; 23:268. [PMID: 37957593 PMCID: PMC10641971 DOI: 10.1186/s12874-023-02078-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/18/2023] [Accepted: 10/20/2023] [Indexed: 11/15/2023] Open
Abstract
BACKGROUND: Despite the interest in machine learning (ML) algorithms for analyzing real-world data (RWD) in healthcare, the use of ML in predicting time-to-event data, a common scenario in clinical practice, is less explored. ML models are capable of algorithmically learning from large, complex datasets and can offer advantages in predicting time-to-event data. We reviewed the recent applications of ML for survival analysis using RWD in healthcare. METHODS: PUBMED and EMBASE were searched from database inception through March 2023 to identify peer-reviewed English-language studies of ML models for predicting time-to-event outcomes using RWD. Two reviewers extracted information on the data source, patient population, survival outcome, ML algorithms, and the Area Under the Curve (AUC). RESULTS: Of 257 citations, 28 publications were included. Random survival forests (N = 16, 57%) and neural networks (N = 11, 39%) were the most popular ML algorithms. There was variability across AUC for these ML models (median 0.789, range 0.6-0.950). ML algorithms were predominantly considered for predicting overall survival in oncology (N = 12, 43%). ML survival models were often used to predict disease prognosis or clinical events (N = 27, 96%) in oncology, while fewer were used for treatment outcomes (N = 1, 4%). CONCLUSIONS: The ML algorithms, random survival forests and neural networks, are mainly used for RWD to predict survival outcomes such as disease prognosis or clinical events in oncology. This review shows that more opportunities remain to apply these ML algorithms to inform treatment decision-making in clinical practice. More methodological work is also needed to ensure the utility and applicability of ML models in survival outcomes.
Affiliation(s)
- Yinan Huang
- Department of Pharmacy Administration, School of Pharmacy, University of Mississippi, University, MS, 38677, USA
- Jieni Li
- Department of Pharmaceutical Health Outcomes and Policy, College of Pharmacy, University of Houston, Houston, TX, 77204, USA
- Mai Li
- Department of Industrial Engineering, Cullen College of Engineering, University of Houston, Houston, TX, USA
- Rajender R Aparasu
- Department of Pharmaceutical Health Outcomes and Policy, College of Pharmacy, University of Houston, Houston, TX, 77204, USA.
3
Li W, Qin Y, Chen X, Wang X. Mining of clinical and prognosis related genes in the tumor microenvironment of endometrial cancer: A field synopsis of observational study. Medicine (Baltimore) 2023; 102:e34047. [PMID: 37352078 PMCID: PMC10289639 DOI: 10.1097/md.0000000000034047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 05/26/2023] [Accepted: 05/30/2023] [Indexed: 06/25/2023] Open
Abstract
Endometrial cancer (EC) is the sixth most common malignant tumor in women worldwide, and its morbidity and mortality are on the rise. The purpose of this study was to explore potential tumor microenvironment (TME)-related biomarkers associated with the clinical features and prognosis of EC. The Estimating Stromal and Immune Cells in Malignancy Using Expression Data (ESTIMATE) algorithm was used to calculate TME immune and stromal scores of EC samples and to analyze the relationship between immune/stromal scores, clinical features, and prognosis. Heat maps and Venn maps were used to screen for differentially expressed genes (DEGs). The ESTIMATE algorithm revealed immune score was significantly correlated with overall survival and tumor grade in patients with EC. A total of 1448 DEGs were screened, of which 387 were intersecting genes. Gene Ontology (GO) analysis revealed that the biological processes (BP) related to intersecting genes mainly included T cell activation and regulation of lymphocyte activation. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed that the intersecting genes were closely related to immune-related signaling pathways. Thirty core genes with more than 7 nodes were identified using protein-protein interaction (PPI) analysis. Six independent prognostic genes of EC were identified using Kaplan-Meier survival analysis and multivariate Cox analysis, namely CD5, BATF, CACNA2D2, LTA, CD52, and NOL4, which are all immune-infiltrating genes that are closely related to clinical features. The current study identified 6 key genes closely related to immune infiltration in the TME of EC that predict clinical outcomes, which may provide new insights into novel prognostic biomarkers and immunotherapy for patients with EC.
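Note: the abstract above names Kaplan-Meier survival analysis but gives no computational detail. The following is a minimal sketch of the standard product-limit estimator, not the authors' pipeline. The function name `kaplan_meier` and the toy follow-up data are hypothetical and chosen only for illustration.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each subject
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= leaving
    return curve


# Hypothetical toy cohort: four subjects, the second one censored at t=2.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

In a study like the one summarized above, such curves would typically be computed separately for high- and low-score groups and compared, for example with a log-rank test; that comparison step is not shown here.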
Affiliation(s)
- Wenxue Li
- Department of Obstetrics and Gynecology, The Affiliated Weihai Second Municipal Hospital of Qingdao University, Weihai, Shandong, China
- Yujing Qin
- Department of Obstetrics and Gynecology, The Affiliated Weihai Second Municipal Hospital of Qingdao University, Weihai, Shandong, China
- Xiujuan Chen
- Department of Obstetrics and Gynecology, The Affiliated Weihai Second Municipal Hospital of Qingdao University, Weihai, Shandong, China
- Xiaolei Wang
- Department of Obstetrics and Gynecology, The Affiliated Weihai Second Municipal Hospital of Qingdao University, Weihai, Shandong, China
4
Syed Soffian SS, Mohammed Nawi A, Hod R, Abdul Maulud KN, Mohd Azmi AT, Hasim Hashim MH, Chan HK, Abu Hassan MR. Spatial clustering of colorectal cancer in Malaysia. Geospatial Health 2023; 18. [PMID: 37246545 DOI: 10.4081/gh.2023.1158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Accepted: 02/14/2023] [Indexed: 05/30/2023]
Abstract
INTRODUCTION: The rise in colorectal cancer (CRC) incidence has become a global concern. As geographical variations in CRC incidence suggest a role for area-level determinants, the current study was designed to identify the spatial distribution pattern of CRC at the neighbourhood level in Malaysia. METHOD: Newly diagnosed CRC cases between 2010 and 2016 in Malaysia were identified from the National Cancer Registry. Residential addresses were geocoded. Clustering analysis was subsequently performed to examine the spatial dependence between CRC cases. Socio-demographic characteristics of individuals were also compared between the clusters. Identified clusters were categorized into urban and semi-rural areas based on the population background. RESULT: Most of the 18 405 individuals included in the study were male (56%), aged between 60 and 69 years (30.3%) and only presented for care at stages 3 or 4 of the disease (71.3%). The states shown to have CRC clusters were Kedah, Penang, Perak, Selangor, Kuala Lumpur, Melaka, Johor, Kelantan, and Sarawak. The spatial autocorrelation analysis detected a significant clustering pattern (Moran's Index 0.244, p < 0.01, Z score > 2.58). CRC clusters in Penang, Selangor, Kuala Lumpur, Melaka, Johor, and Sarawak were in urbanized areas, while those in Kedah, Perak and Kelantan were in semi-rural areas. CONCLUSION: The presence of several clusters in urbanized and semi-rural areas implied a role for ecological determinants at the neighbourhood level in Malaysia. Such findings could be used to guide policymakers in resource allocation and cancer control.
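Note: the abstract above reports a global Moran's Index but not how it was computed. As a rough illustration only, the usual definition is I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where w_ij are spatial weights and W is their total. The sketch below assumes a simple binary adjacency matrix; the function name `morans_i` and the toy counts are hypothetical and do not come from the study.

```python
def morans_i(values, weights):
    """Global Moran's I for area values and a square spatial weight matrix.

    values  : one observation per area (e.g. case counts or rates)
    weights : weights[i][j] > 0 when areas i and j are neighbours, else 0
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between neighbouring areas.
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    squared = sum(d * d for d in dev)
    return (n / total_weight) * cross / squared


# Hypothetical toy map: four areas in a row, each adjacent to the next.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([10, 9, 2, 1], adjacency)     # similar values sit together
alternating = morans_i([10, 1, 10, 1], adjacency)  # neighbours are dissimilar
```

Positive values indicate that similar values tend to be neighbours (clustering), and negative values indicate dispersion. Significance testing, such as the Z score quoted in the abstract, requires an additional permutation or normal-approximation step that is not shown here.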
Affiliation(s)
- Azmawati Mohammed Nawi
- Department of Community Health, Faculty of Medicine, Universiti Kebangsaan Malaysia, Kuala Lumpur.
- Rozita Hod
- Department of Community Health, Faculty of Medicine, Universiti Kebangsaan Malaysia, Kuala Lumpur.
- Khairul Nizam Abdul Maulud
- Earth Observation Centre, Institute of Climate Change, Universiti Kebangsaan Malaysia, Bangi; Department of Civil Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan.
- Ahmad Tarmizi Mohd Azmi
- Earth Observation Centre, Institute of Climate Change, Universiti Kebangsaan Malaysia, Bangi.
- Huan-Keat Chan
- Clinical Research Center, Sultanah Bahiyah Hospital, Alor Setar.
5
Lin FPY, Salih OS, Scott N, Jameson MB, Epstein RJ. Development and Validation of a Machine Learning Approach Leveraging Real-World Clinical Narratives as a Predictor of Survival in Advanced Cancer. JCO Clin Cancer Inform 2022; 6:e2200064. [DOI: 10.1200/cci.22.00064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022] Open
Abstract
PURPOSE: Predicting short-term mortality in patients with advanced cancer remains challenging. Whether digitalized clinical text can be used to build models to enhance survival prediction in this population is unclear. MATERIALS AND METHODS: We conducted a single-center retrospective cohort study in patients with advanced solid tumors. Clinical correspondence authored by oncologists at the first patient encounter was extracted from the electronic medical records. Machine learning (ML) models were trained using narratives from the derivation cohort, before being tested on a temporal validation cohort at the same site. Performance was benchmarked against Eastern Cooperative Oncology Group performance status (PS), comparing ML models alone (comparison 1) or in combination with PS (comparison 2), assessed by areas under receiver operating characteristic curves (AUCs) for predicting vital status at 11 time points from 2 to 52 weeks. RESULTS: ML models were built on the derivation cohort (4,791 patients from 2001 to April 2017) and tested on the validation cohort of 726 patients (May 2017-June 2019). In 441 patients (61%) where clinical narratives were available and PS was documented, ML models outperformed the predictivity of PS (mean AUC improvement, 0.039, P < .001, comparison 1). Inclusion of both clinical text and PS in ML models resulted in further improvement in prediction accuracy over PS with a mean AUC improvement of 0.050 (P < .001, comparison 2); the AUC was > 0.80 at all assessed time points for models incorporating clinical text. Exploratory analysis of oncologists' narratives revealed recurring descriptors correlating with survival, including referral patterns, mobility, physical functions, and concomitant medications. CONCLUSION: Applying ML to oncologists' narratives with or without including patients' PS significantly improved survival prediction to 12 months, suggesting the utility of clinical text in building prognostic support tools.
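Note: the abstract above compares models by area under the ROC curve (AUC) at fixed time points. As a hedged illustration of what that metric measures, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The function name `auc` and the toy scores are hypothetical; this is not the authors' evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores : predicted risk for each patient (higher means more likely positive)
    labels : 1 for the positive class (e.g. died by the time point), 0 otherwise
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical toy comparison of two scoring rules on the same four patients.
labels = [1, 1, 0, 0]
model_a = auc([0.9, 0.8, 0.3, 0.1], labels)  # ranks every positive above every negative
model_b = auc([0.9, 0.2, 0.8, 0.1], labels)  # misranks one positive-negative pair
improvement = model_a - model_b
```

A "mean AUC improvement" of the kind reported above would average such differences over the assessed time points. The pairwise loop is quadratic in cohort size, so a rank-based formulation would be preferable for large cohorts.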
Affiliation(s)
- Frank Po-Yen Lin
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia
- NHMRC Clinical Trials Centre, Sydney University, Camperdown, Australia
- Department of Medical Oncology, Waikato Hospital, Hamilton, New Zealand
- School of Clinical Medicine, University of New South Wales, Sydney, Australia
- Osama S.M. Salih
- Department of Medical Oncology, Waikato Hospital, Hamilton, New Zealand
- Auckland City Hospital, Auckland, New Zealand
- Nina Scott
- Waikato Clinical Campus, University of Auckland, Hamilton, New Zealand
- Michael B. Jameson
- Department of Medical Oncology, Waikato Hospital, Hamilton, New Zealand
- Waikato Clinical Campus, University of Auckland, Hamilton, New Zealand
- Richard J. Epstein
- School of Clinical Medicine, University of New South Wales, Sydney, Australia
- Cancer Research Division, Garvan Institute of Medical Research, Sydney, Australia
- New Hope Cancer Centre, Beijing United Hospital, Beijing, China
6
Area-Level Determinants in Colorectal Cancer Spatial Clustering Studies: A Systematic Review. Int J Environ Res Public Health 2021; 18:ijerph181910486. [PMID: 34639786 PMCID: PMC8508304 DOI: 10.3390/ijerph181910486] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/19/2021] [Revised: 10/01/2021] [Accepted: 10/03/2021] [Indexed: 12/12/2022]
Abstract
The increasing pattern of colorectal cancer (CRC) in specific geographic regions, compounded by the interaction of multifactorial determinants, shows a tendency to cluster. The review aimed to identify and synthesize available evidence on clustering patterns of CRC incidence, specifically related to the associated determinants. Articles were systematically searched from four databases: Scopus, Web of Science, PubMed, and EBSCOHost. The approach for identification of the final articles followed PRISMA guidelines. Selected full-text articles were English-language spatial studies published between 2016 and 2021 that focused on CRC cluster identification. Systematic reviews, conference proceedings, book chapters, and reports were excluded. From the final 12 articles, data on the spatial statistics used and associated factors were extracted. Identified factors linked with CRC clusters were further classified into ecology (health care accessibility, urbanicity, dirty streets, tree coverage), biology (age, sex, ethnicity, overweight and obesity, daily consumption of milk and fruit), and social determinants (median income level, smoking status, health cost, employment status, housing violations, and domestic violence). Future spatial studies that incorporate the physical environment related to CRC clusters and the potential interaction between the ecology, biology and social determinants are warranted to provide more insight into the complex mechanism of CRC cluster patterns.
7
Parimbelli E, Wilk S, Cornet R, Sniatala P, Sniatala K, Glaser SLC, Fraterman I, Boekhout AH, Ottaviano M, Peleg M. A review of AI and Data Science support for cancer management. Artif Intell Med 2021; 117:102111. [PMID: 34127240 DOI: 10.1016/j.artmed.2021.102111] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/07/2020] [Revised: 12/23/2020] [Accepted: 05/11/2021] [Indexed: 02/09/2023]
Abstract
INTRODUCTION: Thanks to improvement of care, cancer has become a chronic condition. But due to the toxicity of treatment, the importance of supporting the quality of life (QoL) of cancer patients increases. Monitoring and managing QoL relies on data collected by the patient in his/her home environment, its integration, and its analysis, which supports personalization of cancer management recommendations. We review the state-of-the-art of computerized systems that employ AI and Data Science methods to monitor the health status and provide support to cancer patients managed at home. OBJECTIVE: Our main objective is to analyze the literature to identify open research challenges that a novel decision support system for cancer patients and clinicians will need to address, point to potential solutions, and provide a list of established best-practices to adopt. METHODS: We designed a review study, in compliance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, analyzing studies retrieved from PubMed related to monitoring cancer patients in their home environments via sensors and self-reporting: what data is collected, what are the techniques used to collect data, semantically integrate it, infer the patient's state from it and deliver coaching/behavior change interventions. RESULTS: Starting from an initial corpus of 819 unique articles, a total of 180 papers were considered in the full-text analysis and 109 were finally included in the review. Our findings are organized and presented in four main sub-topics consisting of data collection, data integration, predictive modeling and patient coaching. CONCLUSION: Development of modern decision support systems for cancer needs to utilize best practices like the use of validated electronic questionnaires for quality-of-life assessment, adoption of appropriate information modeling standards supplemented by terminologies/ontologies, adherence to FAIR data principles, external validation, stratification of patients in subgroups for better predictive modeling, and adoption of formal behavior change theories. Open research challenges include supporting emotional and social dimensions of well-being, including PROs in predictive modeling, and providing better customization of behavioral interventions for the specific population of cancer patients.
Affiliation(s)
- S Wilk
- Poznan University of Technology, Poland
- R Cornet
- Amsterdam University Medical Centre, the Netherlands
- S L C Glaser
- Amsterdam University Medical Centre, the Netherlands
- I Fraterman
- Netherlands Cancer Institute, Amsterdam, the Netherlands
- A H Boekhout
- Netherlands Cancer Institute, Amsterdam, the Netherlands
8
Bioinformatic profiling of prognosis-related genes in the breast cancer immune microenvironment. Aging (Albany NY) 2019; 11:9328-9347. [PMID: 31715586 PMCID: PMC6874454 DOI: 10.18632/aging.102373] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 05/13/2019] [Accepted: 10/12/2019] [Indexed: 02/07/2023]
Abstract
In the microenvironment of breast cancer, immune cell infiltration is associated with an improved prognosis. To identify immune-related prognostic markers and therapeutic targets, we determined the lymphocyte-specific kinase (LCK) metagene scores of samples from breast cancer patients in The Cancer Genome Atlas. The LCK metagene score correlated highly with other immune-related scores, as well as with the clinical stage, prognosis and tumor suppressor gene mutation status (BRCA2, TP53, PTEN) of patients in the four breast cancer subtypes. A weighted gene co-expression network analysis was performed to detect representative genes from LCK metagene-related gene modules. In two of these modules, the levels of the co-expressed genes correlated highly with LCK metagene levels, so we conducted an enrichment analysis to discover their functions. We also identified differentially expressed genes in samples with high and low LCK metagene scores. By examining the overlapping results from these analyses, we obtained 115 genes, and found that 22 of them were independent predictors of overall survival in breast cancer patients. These genes were validated for their prognostic and diagnostic value with external data sets and paired tumor and non-tumor tissues. The genes identified herein could serve as diagnostic/prognostic markers and immune-related therapeutic targets in breast cancer.