1
Lin H, Ni L, Phuong C, Hong JC. Natural Language Processing for Radiation Oncology: Personalizing Treatment Pathways. Pharmgenomics Pers Med 2024; 17:65-76. [PMID: 38370334 PMCID: PMC10874185 DOI: 10.2147/pgpm.s396971] [Received: 08/23/2023] [Accepted: 01/29/2024]
Abstract
Natural language processing (NLP), a technology that translates human language into machine-readable data, is revolutionizing numerous sectors, including cancer care. This review outlines the evolution of NLP and its potential for crafting personalized treatment pathways for cancer patients. Leveraging NLP's ability to transform unstructured medical data into structured learnable formats, researchers can tap into the potential of big data for clinical and research applications. Significant advancements in NLP have spurred interest in developing tools that automate information extraction from clinical text, potentially transforming medical research and clinical practices in radiation oncology. Applications discussed include symptom and toxicity monitoring, identification of social determinants of health, improving patient-physician communication, patient education, and predictive modeling. However, several challenges impede the full realization of NLP's benefits, such as privacy and security concerns, biases in NLP models, and the interpretability and generalizability of these models. Overcoming these challenges necessitates a collaborative effort between computer scientists and the radiation oncology community. This paper serves as a comprehensive guide to understanding the intricacies of NLP algorithms, their performance assessment, past research contributions, and the future of NLP in radiation oncology research and clinics.
Affiliation(s)
- Hui Lin
- Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
- UC Berkeley-UCSF Graduate Program in Bioengineering, University of California, Berkeley and San Francisco, San Francisco, CA, USA
- Lisa Ni
- Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
- Christina Phuong
- Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
- Julian C Hong
- Department of Radiation Oncology, University of California San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA
- Joint Program in Computational Precision Health, University of California, Berkeley and San Francisco, Berkeley, CA, USA
2
Saha A, Burns L, Kulkarni AM. A scoping review of natural language processing of radiology reports in breast cancer. Front Oncol 2023; 13:1160167. [PMID: 37124523 PMCID: PMC10130381 DOI: 10.3389/fonc.2023.1160167] [Received: 02/06/2023] [Accepted: 03/28/2023]
Abstract
Various natural language processing (NLP) algorithms have been applied in the literature to analyze radiology reports pertaining to the diagnosis and subsequent care of cancer patients. Applications of this technology include cohort selection for clinical trials, population of large-scale data registries, and quality improvement in radiology workflows including mammography screening. This scoping review is the first to examine such applications in the specific context of breast cancer. Of 210 articles initially identified, 44 met our inclusion criteria for this review. Extracted data elements included both clinical and technical details of studies that developed or evaluated NLP algorithms applied to free-text radiology reports of breast cancer. Our review illustrates an emphasis on applications in diagnostic and screening processes over treatment or therapeutic applications and describes growth in deep learning and transfer learning approaches in recent years, although rule-based approaches continue to be useful. Furthermore, we observe increased efforts in code and software sharing, but not in data sharing.
Affiliation(s)
- Ashirbani Saha
- Department of Oncology, McMaster University, Hamilton, ON, Canada
- Hamilton Health Sciences and McMaster University, Escarpment Cancer Research Institute, Hamilton, ON, Canada
- Correspondence: Ashirbani Saha
- Levi Burns
- Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada
3
Drug repositioning for cancer in the era of big omics and real-world data. Crit Rev Oncol Hematol 2022; 175:103730. [DOI: 10.1016/j.critrevonc.2022.103730] [Received: 01/03/2022] [Revised: 05/25/2022] [Accepted: 05/27/2022]
4
Zeng J, Gensheimer MF, Rubin DL, Athey S, Shachter RD. Uncovering interpretable potential confounders in electronic medical records. Nat Commun 2022; 13:1014. [PMID: 35197467 PMCID: PMC8866497 DOI: 10.1038/s41467-022-28546-8] [Received: 05/25/2021] [Accepted: 01/28/2022]
Abstract
Randomized clinical trials (RCT) are the gold standard for informing treatment decisions. Observational studies are often plagued by selection bias, and expert-selected covariates may insufficiently adjust for confounding. We explore how unstructured clinical text can be used to reduce selection bias and improve medical practice. We develop a framework based on natural language processing to uncover interpretable potential confounders from text. We validate our method by comparing the estimated hazard ratio (HR) with and without the confounders against established RCTs. We apply our method to four cohorts built from localized prostate and lung cancer datasets from the Stanford Cancer Institute and show that our method shifts the HR estimate towards the RCT results. The uncovered terms can also be interpreted by oncologists for clinical insights. We present this proof-of-concept study to enable more credible causal inference using observational data, uncover meaningful insights from clinical text, and inform high-stakes medical decisions.
Affiliation(s)
- Jiaming Zeng
- Department of Management Science and Engineering, Stanford University, Stanford, CA, 94305, USA.
- Michael F Gensheimer
- Department of Radiation Oncology, Stanford School of Medicine, Stanford, CA, 94305, USA
- Daniel L Rubin
- Department of Biomedical Data Science, Radiology, and Medicine, Stanford School of Medicine, Stanford, CA, 94305, USA
- Susan Athey
- Graduate School of Business, Stanford University, Stanford, CA, 94305, USA
- Ross D Shachter
- Department of Management Science and Engineering, Stanford University, Stanford, CA, 94305, USA
5
Wang S, Tseng B, Hernandez-Boussard T. Development and evaluation of novel ophthalmology domain-specific neural word embeddings to predict visual prognosis. Int J Med Inform 2021; 150:104464. [PMID: 33892445 PMCID: PMC8183292 DOI: 10.1016/j.ijmedinf.2021.104464] [Received: 01/20/2021] [Revised: 03/20/2021] [Accepted: 04/11/2021]
Abstract
OBJECTIVE To develop and evaluate novel word embeddings (WEs) specific to ophthalmology, using text corpora from published literature and electronic health records (EHR). MATERIALS AND METHODS We trained ophthalmology-specific WEs using 121,740 PubMed abstracts and 89,282 EHR notes using the word2vec continuous bag-of-words architecture. PubMed and EHR WEs were compared to general-domain GloVe WEs and general biomedical-domain BioWordVec embeddings using a novel ophthalmology-domain-specific 200-question analogy test and prediction of prognosis in 5547 low vision patients using EHR notes as inputs to a deep learning model. RESULTS We found that many words representing important ophthalmic concepts in the EHR were missing from the general-domain GloVe vocabulary, but covered in the ophthalmology abstract corpus. On ophthalmology analogy testing, PubMed WEs scored 95.0%, outperforming EHR (86.0%) and GloVe (91.0%) WEs but scoring lower than BioWordVec (99.5%). On predicting low vision prognosis, PubMed and EHR WEs resulted in similar AUROC (0.830 and 0.826, respectively), outperforming GloVe (0.778) and BioWordVec (0.784). CONCLUSION We found that using ophthalmology domain-specific WEs improved performance in ophthalmology-related clinical prediction compared to general WEs. Deep learning models using clinical notes as inputs can predict the prognosis of visually impaired patients. This work provides a framework to improve predictive models using domain-specific WEs.
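Analogy tests of the kind used above ("a is to b as c is to ?") are commonly scored with the vector-offset rule, often called 3CosAdd: the answer is the vocabulary word whose vector is closest, by cosine similarity, to b - a + c, with the three query words excluded. The abstract does not state which scoring rule the authors used, so the following is only a minimal sketch under that assumption; the words and 3-dimensional vectors are hypothetical toy values, not the study's PubMed or EHR embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset (3CosAdd) rule."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))  # exclude query words
    return max(candidates, key=lambda w: cosine(emb[w], target))

def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in questions)
    return correct / len(questions)

# Hypothetical toy embeddings ("iop" = intraocular pressure); a real
# evaluation would load trained word2vec, GloVe, or BioWordVec vectors.
toy_emb = {
    "glaucoma": [1.0, 0.0, 0.2],
    "iop":      [1.0, 1.0, 0.2],
    "diabetes": [0.0, 0.0, 1.0],
    "glucose":  [0.0, 1.0, 1.0],
    "retina":   [0.5, 0.1, 0.5],
}
questions = [("glaucoma", "iop", "diabetes", "glucose")]
print(analogy_accuracy(toy_emb, questions))
```

A percentage score such as those reported above would then be this accuracy over the full question set, multiplied by 100.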
Affiliation(s)
- Sophia Wang
- Byers Eye Institute, Department of Ophthalmology, Stanford University, 2370 Watson Court, Palo Alto, CA, 94303, United States.
- Benjamin Tseng
- Byers Eye Institute, Department of Ophthalmology, Stanford University, 2370 Watson Court, Palo Alto, CA, 94303, United States.
- Tina Hernandez-Boussard
- Center for Biomedical Informatics Research, School of Medicine, Stanford University, 1265 Welch Road, Stanford, CA, 94305, United States.
6
Eyuboglu S, Angus G, Patel BN, Pareek A, Davidzon G, Long J, Dunnmon J, Lungren MP. Multi-task weak supervision enables anatomically-resolved abnormality detection in whole-body FDG-PET/CT. Nat Commun 2021; 12:1880. [PMID: 33767174 PMCID: PMC7994797 DOI: 10.1038/s41467-021-22018-1] [Received: 03/23/2020] [Accepted: 02/16/2021]
Abstract
Computational decision support systems could provide clinical value in whole-body FDG-PET/CT workflows. However, the limited availability of labeled data, combined with the large size of PET/CT imaging exams, makes it challenging to apply existing supervised machine learning systems. Leveraging recent advancements in natural language processing, we describe a weak supervision framework that extracts imperfect, yet highly granular, regional abnormality labels from free-text radiology reports. Our framework automatically labels each region in a custom ontology of anatomical regions, providing a structured profile of the pathologies in each imaging exam. Using these generated labels, we then train an attention-based, multi-task CNN architecture to detect and estimate the location of abnormalities in whole-body scans. We demonstrate empirically that our multi-task representation is critical for strong performance on rare abnormalities with limited training data. The representation also contributes to more accurate mortality prediction from imaging data, suggesting the potential utility of our framework beyond abnormality detection and location estimation.
Affiliation(s)
- Sabri Eyuboglu
- Department of Computer Science, Stanford University, Stanford, CA, USA.
- Geoffrey Angus
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Bhavik N Patel
- Department of Radiology, Stanford University, Stanford, CA, USA
- Anuj Pareek
- Department of Radiology, Stanford University, Stanford, CA, USA
- Guido Davidzon
- Department of Radiology, Stanford University, Stanford, CA, USA
- Jin Long
- Center for Artificial Intelligence in Medicine and Imaging, Stanford University, Stanford, CA, USA
- Jared Dunnmon
- Department of Computer Science, Stanford University, Stanford, CA, USA
7
Banerjee I, de Sisternes L, Hallak JA, Leng T, Osborne A, Rosenfeld PJ, Gregori G, Durbin M, Rubin D. Prediction of age-related macular degeneration disease using a sequential deep learning approach on longitudinal SD-OCT imaging biomarkers. Sci Rep 2020; 10:15434. [PMID: 32963300 PMCID: PMC7508843 DOI: 10.1038/s41598-020-72359-y] [Received: 04/09/2020] [Accepted: 08/23/2020]
Abstract
We propose a hybrid sequential prediction model called “Deep Sequence”, integrating radiomics-engineered imaging features, demographic, and visual factors with a recurrent neural network (RNN) model in the same platform to predict the risk of exudation within a future time frame in non-exudative AMD eyes. The proposed model provides scores associated with the risk of exudation in the short term (within 3 months) and long term (within 21 months), handling challenges related to the variability of OCT scan characteristics and the size of the training cohort. We used a retrospective clinical trial dataset that includes 671 AMD fellow eyes with 13,954 observations before any signs of exudation for training and validation in a tenfold cross-validation setting. Deep Sequence achieved high performance for the prediction of exudation within 3 months (0.96 ± 0.02 AUROC) and within 21 months (0.97 ± 0.02 AUROC) on cross-validation. Training the proposed model on this clinical trial dataset and testing it on an external real-world clinical dataset showed high performance for prediction within 3 months (0.82 AUROC) but a clear decrease in performance for prediction within 21 months (0.68 AUROC). While performance differences at longer time intervals may stem from dataset differences, we believe that the high performance and generalizability achieved in short-term predictions may have a high clinical impact, allowing for optimal patient follow-up and adding the possibility of more frequent, detailed screening and tailored treatments for those patients with imminent risk of exudation.
Affiliation(s)
- Imon Banerjee
- Department of Biomedical Informatics, Emory University, Atlanta, GA, 30322, USA; Department of Radiology, Emory University, Atlanta, GA, 30322, USA.
- Joelle A Hallak
- Department of Ophthalmology and Visual Sciences, University of Illinois at Chicago, Chicago, IL, 60612, USA
- Theodore Leng
- Byers Eye Institute At Stanford, Stanford University School of Medicine, Palo Alto, CA, 94303, USA
- Philip J Rosenfeld
- Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, FL, 33136, USA
- Giovanni Gregori
- Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, FL, 33136, USA
- Mary Durbin
- Carl Zeiss Meditec, Inc., Dublin, CA, 94568, USA
- Daniel Rubin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
8
Banerjee I, Bozkurt S, Caswell-Jin JL, Kurian AW, Rubin DL. Natural Language Processing Approaches to Detect the Timeline of Metastatic Recurrence of Breast Cancer. JCO Clin Cancer Inform 2020; 3:1-12. [PMID: 31584836 DOI: 10.1200/cci.19.00034]
Abstract
PURPOSE Electronic medical records (EMRs) and population-based cancer registries contain information on cancer outcomes and treatment, yet rarely capture information on the timing of metastatic cancer recurrence, which is essential for understanding cancer survival outcomes. We developed a natural language processing (NLP) system to identify patient-specific timelines of metastatic breast cancer recurrence. PATIENTS AND METHODS We used the OncoSHARE database, which includes merged data from the California Cancer Registry and EMRs of 8,956 women diagnosed with breast cancer from 2000 to 2018. We curated a comprehensive vocabulary by interviewing expert clinicians and processing radiology and pathology reports and progress notes. We developed and evaluated the following two distinct NLP approaches to analyze free-text notes: a traditional rule-based model, using rules for metastatic detection from the literature and curated by domain experts; and a contemporary neural network model. For each 3-month period (quarter) from 2000 to 2018, we applied both models to infer recurrence status for that quarter. We trained the NLP models using 894 randomly selected patient records that were manually reviewed by clinical experts and evaluated model performance using 179 hold-out patients (20%) as a test set. RESULTS The median follow-up time was 19 quarters (5 years) for the training set and 15 quarters (4 years) for the test set. The neural network model predicted the timing of distant metastatic recurrence with a sensitivity of 0.83 and specificity of 0.73, outperforming the rule-based model, which had a sensitivity of 0.88 and specificity of 0.35 (P < .001). CONCLUSION We developed an NLP method that enables identification of the occurrence and timing of metastatic breast cancer recurrence from EMRs. This approach may be adaptable to other cancer sites and could help to unlock the potential of EMRs for research on real-world cancer outcomes.
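The sensitivity and specificity reported above compare per-quarter recurrence calls against expert review. A minimal sketch of how such figures are computed, assuming each (patient, quarter) pair contributes one binary prediction and that both classes are present; the labels below are made up for illustration and are not study data:

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = recurrence).

    Assumes y_true contains at least one positive and one negative label;
    otherwise one of the denominators is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true recurrence quarters detected
    specificity = tn / (tn + fp)  # share of recurrence-free quarters kept negative
    return sensitivity, specificity

# Hypothetical per-quarter labels for one patient (1 = metastatic recurrence).
truth = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 1, 1, 0, 0, 1, 1, 1]
print(sensitivity_specificity(truth, pred))
```

The trade-off in the abstract is visible here: a model that flags more quarters as positive can raise sensitivity while lowering specificity, which is one way a rule-based model could reach 0.88 sensitivity but only 0.35 specificity.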
Affiliation(s)
- Imon Banerjee
- Stanford University School of Medicine, Stanford, CA
- Selen Bozkurt
- Stanford University School of Medicine, Stanford, CA
9
Spasic I, Nenadic G. Clinical Text Data in Machine Learning: Systematic Review. JMIR Med Inform 2020; 8:e17984. [PMID: 32229465 PMCID: PMC7157505 DOI: 10.2196/17984] [Received: 01/28/2020] [Revised: 02/24/2020] [Accepted: 02/24/2020]
Abstract
Background Clinical narratives represent the main form of communication within health care, providing a personalized account of patient history and assessments, and offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its feasibility to unlock evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data. Objective The main aim of this study was to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigated the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice. Methods Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified 110 relevant studies and extracted information about text data used to support machine learning, NLP tasks supported, and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation, and any relevant statistics. Results The majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents, with a handful of studies utilizing more. Relatively small datasets were utilized for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning was explored to iteratively sample a subset of data for manual annotation as a strategy for minimizing the annotation effort while maximizing the predictive performance of the model. 
Supervised learning was successfully used where clinical codes integrated with free-text notes into electronic health records were utilized as class labels. Similarly, distant supervision was used to utilize an existing knowledge base to automatically annotate raw text. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable because of the sensitive nature of data considered. Besides the small volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The majority of studies focused on text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management, and surveillance. Conclusions We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as a way of saving the annotation efforts. Future research in this field would benefit from alternatives such as data augmentation and transfer learning, or unsupervised learning, which do not require data annotation.
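The active-learning strategy described above repeatedly picks unlabeled documents for manual annotation. The review does not say which selection rules the included studies used; one common choice is least-confidence uncertainty sampling, sketched minimally below. The note IDs and predicted probabilities are hypothetical, and in practice the model would be retrained after each annotation round.

```python
def least_confident(probabilities, k):
    """Pick the k unlabeled items whose top predicted class probability is lowest.

    probabilities maps an item id to the current model's class probabilities;
    the items the model is least sure about are sent to annotators first.
    """
    confidence = {item: max(p) for item, p in probabilities.items()}
    return sorted(confidence, key=lambda item: confidence[item])[:k]

# Hypothetical model outputs for four unlabeled clinical notes (two classes).
unlabeled = {
    "note_a": [0.95, 0.05],
    "note_b": [0.55, 0.45],
    "note_c": [0.20, 0.80],
    "note_d": [0.48, 0.52],
}
print(least_confident(unlabeled, 2))
```

The intent is to spend the scarce annotation budget on the cases that are most informative to the current model, rather than labeling documents at random.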
Affiliation(s)
- Irena Spasic
- School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom
- Goran Nenadic
- Department of Computer Science, University of Manchester, Manchester, United Kingdom
10
Deep learning-based interpretation of basal/acetazolamide brain perfusion SPECT leveraging unstructured reading reports. Eur J Nucl Med Mol Imaging 2020; 47:2186-2196. [PMID: 31912255 DOI: 10.1007/s00259-019-04670-4] [Received: 09/24/2019] [Accepted: 12/23/2019]
Abstract
PURPOSE Basal/acetazolamide brain perfusion single-photon emission computed tomography (SPECT) has been used to evaluate functional hemodynamics in patients with carotid artery stenosis. We aimed to develop a deep learning model as a support system for interpreting brain perfusion SPECT leveraging unstructured text reports. METHODS In total, 7345 basal/acetazolamide brain perfusion SPECT images and their text reports were retrospectively collected. A long short-term memory (LSTM) network was trained using 500 randomly selected text reports to predict manually labeled structured information, including abnormalities of basal perfusion and vascular reserve for each vascular territory. Using this trained LSTM model, we extracted structured information from the remaining 6845 text reports to develop a deep learning model for interpreting SPECT images. The model was based on a 3D convolutional neural network (CNN), and its performance was tested on another 500 cases by measuring the area under the receiver-operating characteristic curve (AUC). We then applied the model to patients who underwent revascularization (n = 33) to compare the estimated output of the CNN model for pre- and post-revascularization SPECT with clinical outcomes. RESULTS The AUC of the LSTM model for extracting structured labels was 1.00 for basal perfusion and 0.99 for vascular reserve for all 9 brain regions. The AUC of the CNN model designed to identify abnormal perfusion was 0.83 for basal perfusion and 0.89 for vascular reserve. The output of the CNN model improved significantly after revascularization of the target vascular territory, and its changes in brain territories were concordant with clinical outcomes. CONCLUSION We developed a deep learning model to support the interpretation of brain perfusion SPECT by converting unstructured text reports into structured labels.
This model can be used as a support system not only to identify perfusion abnormalities but also to provide quantitative scores of abnormalities, particularly for patients who require revascularization.
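Both models above are evaluated by the area under the ROC curve (AUC). The AUC equals the probability that a randomly chosen positive (abnormal) case receives a higher score than a randomly chosen negative one, with ties counted as one half. A minimal pairwise sketch of that definition, assuming binary labels and made-up scores rather than the study's outputs; the double loop is quadratic and meant only for illustration:

```python
def roc_auc(labels, scores):
    """AUC as the Mann-Whitney probability P(score_pos > score_neg), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical abnormality scores for six vascular territories (1 = abnormal).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.2, 0.1]
print(roc_auc(labels, scores))
```

Here eight of the nine positive/negative pairs are ranked correctly, giving an AUC of 8/9; an AUC of 1.00, as reported for the LSTM label extractor, means every positive outranks every negative.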
11
Koutkias V, Bouaud J. Contributions on Clinical Decision Support from the 2018 Literature. Yearb Med Inform 2019; 28:135-137. [PMID: 31419825 PMCID: PMC6697519 DOI: 10.1055/s-0039-1677929]
Abstract
Objectives: To summarize recent research and select the best papers published in 2018 in the field of computerized clinical decision support for the Decision Support section of the International Medical Informatics Association (IMIA) yearbook.
Methods: A literature review was performed by searching two bibliographic databases for papers referring to clinical decision support systems (CDSSs). The aim was to identify a list of candidate best papers from the retrieved bibliographic records, which were then peer-reviewed by external reviewers. A consensus meeting of the IMIA editorial team finally selected the best papers on the basis of all reviews and the section editors' evaluation.
Results: Among 1,148 retrieved articles, 15 best paper candidates were selected, the review of which resulted in the selection of four best papers. The first paper introduces a deep learning model for estimating short-term life expectancy (>3 months) of metastatic cancer patients by analyzing free-text clinical notes in electronic medical records, while maintaining the temporal visit sequence. The second paper notes that CDSSs are becoming routinely integrated into health information systems and compares statistical anomaly detection models for identifying CDSS malfunctions, which, if they remain unnoticed, may have a negative impact on care delivery. The third paper candidly reports lessons learnt from the development of an oncology CDSS using artificial intelligence techniques and from its assessment in a large US cancer center. The fourth paper implements a preference learning methodology for detecting inconsistencies in clinical practice guidelines and illustrates the applicability of the proposed methodology to antibiotherapy.
Conclusions: Three of the four best papers rely on data-driven methods, and one builds on a knowledge-based approach. While there is currently a trend toward data-driven decision support, the promising results of such approaches still need to be confirmed by the adoption of these systems and their routine use.
Affiliation(s)
- Vassilis Koutkias
- Institute of Applied Biosciences, Centre for Research & Technology Hellas, Thermi, Thessaloniki, Greece
- Jacques Bouaud
- AP-HP, Delegation for Clinical Research and Innovation, Paris, France; Sorbonne Université, Université Paris 13, Sorbonne Paris Cité, INSERM, UMR_S 1142, LIMICS, Paris, France
12
Gensheimer MF, Henry AS, Wood DJ, Hastie TJ, Aggarwal S, Dudley SA, Pradhan P, Banerjee I, Cho E, Ramchandran K, Pollom E, Koong AC, Rubin DL, Chang DT. Automated Survival Prediction in Metastatic Cancer Patients Using High-Dimensional Electronic Medical Record Data. J Natl Cancer Inst 2019; 111:568-574. [PMID: 30346554 PMCID: PMC6579743 DOI: 10.1093/jnci/djy178] [Received: 05/11/2018] [Revised: 06/28/2018] [Accepted: 09/05/2018]
Abstract
BACKGROUND Oncologists use patients' life expectancy to guide decisions and may benefit from a tool that accurately predicts prognosis. Existing prognostic models generally use only a few predictor variables. We used an electronic medical record dataset to train a prognostic model for patients with metastatic cancer. METHODS The model was trained and tested using 12 588 patients treated for metastatic cancer in the Stanford Health Care system from 2008 to 2017. Data sources included provider note text, labs, vital signs, procedures, medication orders, and diagnosis codes. Patients were divided randomly into a training set used to fit the model coefficients and a test set used to evaluate model performance (80%/20% split). A regularized Cox model with 4126 predictor variables was used. A landmarking approach was used due to the multiple observations per patient, with t0 set to the time of metastatic cancer diagnosis. Performance was also evaluated using 399 palliative radiation courses in test set patients. RESULTS The C-index for overall survival was 0.786 in the test set (averaged across landmark times). For palliative radiation courses, the C-index was 0.745 (95% confidence interval [CI] = 0.715 to 0.775) compared with 0.635 (95% CI = 0.601 to 0.669) for a published model using performance status, primary tumor site, and treated site (two-sided P < .001). Our model's predictions were well-calibrated. CONCLUSIONS The model showed high predictive performance, which will need to be validated using external data. Because it is fully automated, the model can be used to examine providers' practice patterns and could be deployed in a decision support tool to help improve quality of care.
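The C-index reported above measures how well predicted risk orders observed survival times. A minimal sketch of Harrell's concordance index for right-censored data, assuming a higher risk score means shorter expected survival. The times, event flags, and scores are hypothetical, and this simplified version skips pairs with tied times and does not reproduce the landmarking or the averaging across landmark times described in the abstract.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C for right-censored data (event=1 means death observed).

    A pair (i, j) is comparable when subject i has an observed event before
    subject j's follow-up time; it is concordant when i also has the higher
    risk score. Risk ties count as 0.5; pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical survival times (months), event flags, and model risk scores.
times = [2, 5, 8, 12, 20]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.6, 0.7, 0.3, 0.2]
print(harrell_c_index(times, events, risks))
```

In this toy cohort, 7 of the 8 comparable pairs are ordered correctly (C = 0.875); 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.786 and 0.745 should be read.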
Affiliation(s)
- Eunpi Cho
- Stanford University, Stanford, CA; Genentech, South San Francisco, CA
- Albert C Koong
- Department of Radiation Oncology
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX
- Daniel L Rubin
- Department of Biomedical Data Science
- Department of Statistics
13
Fathiamini S, Johnson AM, Zeng J, Holla V, Sanchez NS, Meric-Bernstam F, Bernstam EV, Cohen T. Rapamycin - mTOR + BRAF = ? Using relational similarity to find therapeutically relevant drug-gene relationships in unstructured text. J Biomed Inform 2019; 90:103094. [PMID: 30615938 PMCID: PMC6386529 DOI: 10.1016/j.jbi.2019.103094] [Received: 09/04/2018] [Revised: 11/30/2018] [Accepted: 12/27/2018]
Affiliation(s)
- Safa Fathiamini
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, United States.
- Amber M Johnson
- Sheikh Khalifa Al Nahyan Ben Zayed Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Jia Zeng
- Sheikh Khalifa Al Nahyan Ben Zayed Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Vijaykumar Holla
- Sheikh Khalifa Al Nahyan Ben Zayed Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Nora S Sanchez
- Sheikh Khalifa Al Nahyan Ben Zayed Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Funda Meric-Bernstam
- Sheikh Khalifa Al Nahyan Ben Zayed Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Investigational Cancer Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Elmer V Bernstam
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, United States; Division of General Internal Medicine, Department of Internal Medicine, The University of Texas Health Science Center at Houston, TX, United States.
- Trevor Cohen
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States.