Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Seneviratne MG, Seto T, Blayney DW, Brooks JD, Hernandez-Boussard T. Architecture and Implementation of a Clinical Research Data Warehouse for Prostate Cancer. EGEMS (Wash DC) 2018;6:13. [PMID: 30094285 DOI: 10.5334/egems.234] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 11/20/2022]

Cited by Other Article(s)

Magoc T, Everson R, Harle CA. Enhancing an enterprise data warehouse for research with data extracted using natural language processing. J Clin Transl Sci 2023;7:e149. [PMID: 37456264 PMCID: PMC10346024 DOI: 10.1017/cts.2023.575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Revised: 05/14/2023] [Accepted: 05/31/2023] [Indexed: 07/18/2023] Open

Bozkurt S, Magnani CJ, Seneviratne MG, Brooks JD, Hernandez-Boussard T. Expanding the Secondary Use of Prostate Cancer Real World Data: Automated Classifiers for Clinical and Pathological Stage. Front Digit Health 2022;4:793316. [PMID: 35721793 PMCID: PMC9201076 DOI: 10.3389/fdgth.2022.793316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2021] [Accepted: 05/12/2022] [Indexed: 11/30/2022] Open

Abstract

Background

Explicit documentation of stage is an endorsed quality metric by the National Quality Forum. Clinical and pathological cancer staging is inconsistently recorded within clinical narratives but can be derived from text in the Electronic Health Record (EHR). To address this need, we developed a Natural Language Processing (NLP) solution for extraction of clinical and pathological TNM stages from the clinical notes in prostate cancer patients.

Methods

Data for patients diagnosed with prostate cancer between 2010 and 2018 were collected from a tertiary care academic healthcare system's EHR records in the United States. This system is linked to the California Cancer Registry, and contains data on diagnosis, histology, cancer stage, treatment and outcomes. A randomly selected sample of patients were manually annotated for stage to establish the ground truth for training and validating the NLP methods. For each patient, a vector representation of clinical text (written in English) was used to train a machine learning model alongside a rule-based model and compared with the ground truth.

Results

A total of 5,461 prostate cancer patients were identified in the clinical data warehouse and over 30% were missing stage information. Thirty-three to thirty-six percent of patients were missing a clinical stage and the models accurately imputed the stage in 21–32% of cases. Twenty-one percent had a missing pathological stage and using NLP 71% of missing T stages and 56% of missing N stages were imputed. For both clinical and pathological T and N stages, the rule-based NLP approach out-performed the ML approach with a minimum F1 score of 0.71 and 0.40, respectively. For clinical M stage the ML approach out-performed the rule-based model with a minimum F1 score of 0.79 and 0.88, respectively.

Conclusions

We developed an NLP pipeline to successfully extract clinical and pathological staging information from clinical narratives. Our results can serve as a proof of concept for using NLP to augment clinical and pathological stage reporting in cancer registries and EHRs to enhance the secondary use of these data.

Davila JR, Singh K, Hernandez-Boussard T, Wang S. Outcomes of Primary Trabeculectomy versus Combined Phacoemulsification-Trabeculectomy Using Automated Electronic Health Record Data Extraction. Curr Eye Res 2022;47:923-929. [PMID: 35317681 PMCID: PMC10000312 DOI: 10.1080/02713683.2022.2045611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]

Lee KS, Shin DG, Hwang JH, Kim R, Han CH, Yoo J. Construction of a bone marrow report registry using a clinical data warehouse. Int J Lab Hematol 2021;44:e140-e144. [PMID: 34889526 DOI: 10.1111/ijlh.13781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 11/30/2021] [Indexed: 11/28/2022]

Eschrich SA, Teer JK, Reisman P, Siegel E, Challa C, Lewis P, Fellows K, Malpica E, Carvajal R, Gonzalez G, Cukras S, Betin-Montes M, Aden-Buie G, Avedon M, Manning D, Tan AC, Fridley BL, Gerke T, Van Looveren M, Blake A, Greenman J, Rollison D. Enabling Precision Medicine in Cancer Care Through a Molecular Data Warehouse: The Moffitt Experience. JCO Clin Cancer Inform 2021;5:561-569. [PMID: 33989014 DOI: 10.1200/cci.20.00175] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/20/2022] Open

Evaluation of clustering and topic modeling methods over health-related tweets and emails. Artif Intell Med 2021;117:102096. [PMID: 34127235 DOI: 10.1016/j.artmed.2021.102096] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/31/2020] [Revised: 03/30/2021] [Accepted: 05/05/2021] [Indexed: 01/31/2023]

Abstract

BACKGROUND

The Internet provides different tools for communicating with patients, such as social media (e.g., Twitter) and email platforms. These platforms provide new data sources that shed light on patient experiences with health care and improve our understanding of patient-provider communication. Several existing topic modeling and document clustering methods have been adapted to analyze these new free-text data automatically. However, both tweets and emails are often composed of short texts, and existing topic modeling and clustering approaches have suboptimal performance on these short texts. Moreover, research over health-related short texts using these methods has become difficult to reproduce and benchmark, partially due to the absence of a detailed comparison of state-of-the-art topic modeling and clustering methods on these short texts.

METHODS

We trained eight state-of-the-art topic modeling and clustering algorithms on short texts from two health-related datasets (tweets and emails): Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), LDA with Gibbs Sampling (GibbsLDA), Online LDA, Biterm Model (BTM), Online Twitter LDA, and Gibbs Sampling for Dirichlet Multinomial Mixture (GSDMM), as well as the k-means clustering algorithm with two different feature representations: TF-IDF and Doc2Vec. We used cluster validity indices to evaluate the performance of topic modeling and clustering: two internal indices (i.e., assessing the goodness of a clustering structure without external information) and five external indices (i.e., comparing the results of a cluster analysis to externally provided class labels).
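
As an illustration of what an external validity index measures, the sketch below computes cluster purity in plain Python. Purity is used here only as a simple example; the abstract does not name the five external indices the authors used, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def purity(predicted, reference):
    """External index: share of items that carry the majority
    reference label of the cluster they were assigned to."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have equal length")
    if not predicted:
        raise ValueError("label lists must be non-empty")

    # Count reference labels inside each predicted cluster.
    by_cluster = {}
    for cluster_id, label in zip(predicted, reference):
        by_cluster.setdefault(cluster_id, Counter())[label] += 1

    # Sum the size of the majority label in every cluster.
    majority_total = sum(max(c.values()) for c in by_cluster.values())
    return majority_total / len(predicted)


# Hypothetical toy data: two clusters, two known topics.
predicted = [0, 0, 0, 1, 1, 1]
reference = ["hpv", "hpv", "lynch", "lynch", "lynch", "hpv"]
score = purity(predicted, reference)
print(f"purity = {score:.3f}")
```

An internal index, by contrast, would be computed from the clustering alone, without the `reference` labels.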

RESULTS

Overall, for numbers of clusters (k) from 2 to 50, Online Twitter LDA and GSDMM achieved the best performance in terms of internal indices, while LSI and k-means with TF-IDF had the highest external indices. Also, of all tweets (N = 286,971; HPV represents 94.6% of tweets and Lynch syndrome represents 5.4%), for k = 2, most of the methods could respect this initial clustering distribution. However, we found that model performance varies with the source of data and hyper-parameters such as the number of topics and the number of iterations used to train the models. We also conducted an error analysis using the Hamming loss metric, for which the poorest value was obtained by GSDMM on both datasets.

CONCLUSIONS

Researchers hoping to group or classify health-related short-text data can expect to select the most suitable topic modeling and clustering methods for their specific research questions. Therefore, we presented a comparison of the most commonly used topic modeling and clustering algorithms over two health-related, short-text datasets using both internal and external clustering validation indices. Internal indices suggested Online Twitter LDA and GSDMM as the best, while external indices suggested LSI and k-means with TF-IDF as the best. In summary, our work suggested that researchers can improve their analysis of model performance by using a variety of metrics, since there is not a single best metric.

Coquet J, Bievre N, Billaut V, Seneviratne M, Magnani CJ, Bozkurt S, Brooks JD, Hernandez-Boussard T. Assessment of a Clinical Trial-Derived Survival Model in Patients With Metastatic Castration-Resistant Prostate Cancer. JAMA Netw Open 2021;4:e2031730. [PMID: 33481032 PMCID: PMC7823224 DOI: 10.1001/jamanetworkopen.2020.31730] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/11/2022] Open

Abstract

IMPORTANCE

Randomized clinical trials (RCTs) are considered the criterion standard for clinical evidence. Despite their many benefits, RCTs have limitations, such as costliness, that may reduce the generalizability of their findings among diverse populations and routine care settings.

OBJECTIVE

To assess the performance of an RCT-derived prognostic model that predicts survival among patients with metastatic castration-resistant prostate cancer (CRPC) when the model is applied to real-world data from electronic health records (EHRs).

DESIGN, SETTING, AND PARTICIPANTS

The RCT-trained model and patient data from the RCTs were obtained from the Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge for prostate cancer, which occurred from March 16 to July 27, 2015. This challenge included 4 phase 3 clinical trials of patients with metastatic CRPC. Real-world data were obtained from the EHRs of a tertiary care academic medical center that includes a comprehensive cancer center. In this study, the DREAM challenge RCT-trained model was applied to real-world data from January 1, 2008, to December 31, 2019; the model was then retrained using EHR data with optimized feature selection. Patients with metastatic CRPC were divided into RCT and EHR cohorts based on data source. Data were analyzed from March 23, 2018, to October 22, 2020.

EXPOSURES

Patients who received treatment for metastatic CRPC.

MAIN OUTCOMES AND MEASURES

The primary outcome was the performance of an RCT-derived prognostic model that predicts survival among patients with metastatic CRPC when the model is applied to real-world data. Model performance was compared using 10-fold cross-validation according to time-dependent integrated area under the curve (iAUC) statistics.

RESULTS

Among 2113 participants with metastatic CRPC, 1600 participants were included in the RCT cohort, and 513 participants were included in the EHR cohort. The RCT cohort comprised a larger proportion of White participants (1390 patients [86.9%] vs 337 patients [65.7%]) and a smaller proportion of Hispanic participants (14 patients [0.9%] vs 42 patients [8.2%]), Asian participants (41 patients [2.6%] vs 88 patients [17.2%]), and participants older than 75 years (388 patients [24.3%] vs 191 patients [37.2%]) compared with the EHR cohort. Participants in the RCT cohort also had fewer comorbidities (mean [SD], 1.6 [1.8] comorbidities vs 2.5 [2.6] comorbidities, respectively) compared with those in the EHR cohort. Of the 101 variables used in the RCT-derived model, 10 were not available in the EHR data set, 3 of which were among the top 10 features in the DREAM challenge RCT model. The best-performing EHR-trained model included only 25 of the 101 variables included in the RCT-trained model. The performance of the RCT-trained and EHR-trained models was adequate in the EHR cohort (mean [SD] iAUC, 0.722 [0.118] and 0.762 [0.106], respectively); model optimization was associated with improved performance of the best-performing EHR model (mean [SD] iAUC, 0.792 [0.097]). The EHR-trained model classified 256 patients as having a high risk of mortality and 256 patients as having a low risk of mortality (hazard ratio, 2.7; 95% CI, 2.0-3.7; log-rank P < .001).

CONCLUSIONS AND RELEVANCE

In this study, although the RCT-trained models did not perform well when applied to real-world EHR data, retraining the models using real-world EHR data and optimizing variable selection was beneficial for model performance. As clinical evidence evolves to include more real-world data, both industry and academia will likely search for ways to balance model optimization with generalizability. This study provides a pragmatic approach to applying RCT-trained models to real-world data.

Magnani CJ, Bievre N, Baker LC, Brooks JD, Blayney DW, Hernandez-Boussard T. Real-world Evidence to Estimate Prostate Cancer Costs for First-line Treatment or Active Surveillance. EUR UROL SUPPL 2020;23:20-29. [PMID: 33367287 PMCID: PMC7751921 DOI: 10.1016/j.euros.2020.11.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022] Open

Abstract

Background

Prostate cancer is the most common cancer in men and second leading cause of cancer-related deaths. Changes in screening guidelines, adoption of active surveillance (AS), and implementation of high-cost technologies have changed treatment costs. Traditional cost-effectiveness studies rely on clinical trial protocols unlikely to capture actual practice behavior, and existing studies use data predating new technologies. Real-world evidence reflecting these changes is lacking.

Objective

To assess real-world costs of first-line prostate cancer management.

Design setting and participants

We used clinical electronic health records for 2008-2018 linked with the California Cancer Registry and the Medicare Fee Schedule to assess costs over 24 or 60 mo following diagnosis. We identified surgery or radiation treatments with structured methods, while we used both structured data and natural language processing to identify AS.

Outcome measurements and statistical analysis

Our results are risk-stratified calculated cost per day (CCPD) for first-line management, which are independent of treatment duration. We used the Kruskal-Wallis test to compare unadjusted CCPD while analysis of covariance log-linear models adjusted estimates for age and Charlson comorbidity.

Results and limitations

In 3433 patients, surgery (54.6%) was more common than radiation (22.3%) or AS (23.0%). Two years following diagnosis, AS ($2.97/d) was cheaper than surgery ($5.67/d) or radiation ($9.34/d) in favorable disease, while surgery ($7.17/d) was cheaper than radiation ($16.34/d) for unfavorable disease. At 5 yr, AS ($2.71/d) remained slightly cheaper than surgery ($2.87/d) and radiation ($4.36/d) in favorable disease, while for unfavorable disease surgery ($4.15/d) remained cheaper than radiation ($10.32/d). Study limitations include information derived from a single healthcare system and costs based on benchmark Medicare estimates rather than actual payment exchanges.

Patient summary

Active surveillance was cheaper than surgery (-47.6%) and radiation (-68.2%) at 2 yr for favorable-risk disease, which decreased by 5 yr (-5.6% and -37.8%, respectively). Surgery was less costly than radiation for unfavorable risk for both intervals (-56.1% and -59.8%, respectively).
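
The percentages in the patient summary can be reproduced from the per-day costs reported under Results and limitations. The sketch below assumes each percentage is the relative difference (cost - comparator) / comparator, rounded to one decimal place; the function name is illustrative and is not taken from the article.

```python
def percent_difference(cost, comparator):
    """Relative cost difference versus a comparator, in percent."""
    return round((cost - comparator) / comparator * 100, 1)


# Calculated cost per day (USD), copied from the abstract.
# Favorable-risk disease: active surveillance (AS) vs surgery, radiation.
as_vs_surgery_2yr = percent_difference(2.97, 5.67)
as_vs_radiation_2yr = percent_difference(2.97, 9.34)
as_vs_surgery_5yr = percent_difference(2.71, 2.87)
as_vs_radiation_5yr = percent_difference(2.71, 4.36)

# Unfavorable-risk disease: surgery vs radiation.
surgery_vs_radiation_2yr = percent_difference(7.17, 16.34)
surgery_vs_radiation_5yr = percent_difference(4.15, 10.32)

print(as_vs_surgery_2yr, as_vs_radiation_2yr)
print(as_vs_surgery_5yr, as_vs_radiation_5yr)
print(surgery_vs_radiation_2yr, surgery_vs_radiation_5yr)
```

Under that assumption the six values agree with the figures quoted in the summary.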

Bozkurt S, Paul R, Coquet J, Sun R, Banerjee I, Brooks JD, Hernandez-Boussard T. Phenotyping severity of patient-centered outcomes using clinical notes: A prostate cancer use case. Learn Health Syst 2020;4:e10237. [PMID: 33083539 PMCID: PMC7556418 DOI: 10.1002/lrh2.10237] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/15/2020] [Revised: 06/15/2020] [Accepted: 06/23/2020] [Indexed: 01/12/2023] Open

Abstract

Introduction

A learning health system (LHS) must improve care in ways that are meaningful to patients, integrating patient‐centered outcomes (PCOs) into core infrastructure. PCOs are common following cancer treatment, such as urinary incontinence (UI) following prostatectomy. However, PCOs are not systematically recorded because they can only be described by the patient, are subjective and captured as unstructured text in the electronic health record (EHR). Therefore, PCOs pose significant challenges for phenotyping patients. Here, we present a natural language processing (NLP) approach for phenotyping patients with UI to classify their disease into severity subtypes, which can increase opportunities to provide precision‐based therapy and promote a value‐based delivery system.

Methods

Patients undergoing prostate cancer treatment from 2008 to 2018 were identified at an academic medical center. Using a hybrid NLP pipeline that combines rule‐based and deep learning methodologies, we classified positive UI cases as mild, moderate, and severe by mining clinical notes.

Results

The rule‐based model accurately classified UI into disease severity categories (accuracy: 0.86), which outperformed the deep learning model (accuracy: 0.73). In the deep learning model, the recall rates for mild and moderate group were higher than the precision rate (0.78 and 0.79, respectively). A hybrid model that combined both methods did not improve the accuracy of the rule‐based model but did outperform the deep learning model (accuracy: 0.75).

Conclusion

Phenotyping patients based on indication and severity of PCOs is essential to advance a patient centered LHS. EHRs contain valuable information on PCOs and by using NLP methods, it is feasible to accurately and efficiently phenotype PCO severity. Phenotyping must extend beyond the identification of disease to provide classification of disease severity that can be used to guide treatment and inform shared decision‐making. Our methods demonstrate a path to a patient centered LHS that could advance precision medicine.

Cho S, Sin M, Tsapepas D, Dale LA, Husain SA, Mohan S, Natarajan K. Content Coverage Evaluation of the OMOP Vocabulary on the Transplant Domain Focusing on Concepts Relevant for Kidney Transplant Outcomes Analysis. Appl Clin Inform 2020;11:650-658. [PMID: 33027834 DOI: 10.1055/s-0040-1716528] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/26/2022] Open

Wang SY, Azad AD, Lin SC, Hernandez-Boussard T, Pershing S. Intraocular Pressure Changes after Cataract Surgery in Patients with and without Glaucoma: An Informatics-Based Approach. Ophthalmol Glaucoma 2020;3:343-349. [PMID: 32703703 DOI: 10.1016/j.ogla.2020.06.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/23/2020] [Revised: 05/28/2020] [Accepted: 06/03/2020] [Indexed: 10/24/2022]

Abstract

PURPOSE

To evaluate changes in intraocular pressure (IOP) after cataract surgery among patients with or without glaucoma using automated extraction of data from electronic health records (EHRs).

DESIGN

Retrospective cohort study.

PARTICIPANTS

Adults who underwent standalone cataract surgery at a single academic center from 2009-2018.

METHODS

Patient information was identified from procedure and billing codes, demographic tables, medication orders, clinical notes, and eye examination fields in the EHR. A previously validated natural language processing pipeline was used to identify laterality of cataract surgery from operative notes and laterality of eye medications from medication orders. Cox proportional hazards modeling evaluated factors associated with the main outcome of sustained postoperative IOP reduction.

MAIN OUTCOME MEASURES

Sustained post-cataract surgery IOP reduction, measured at 14 months or the last follow-up while using equal or fewer glaucoma medications compared with baseline and without additional glaucoma laser or surgery on the operative eye.

RESULTS

The median follow-up for 7574 eyes of 4883 patients who underwent cataract surgery was 244 days. The mean preoperative IOP for all patients was 15.2 mmHg (standard deviation [SD], 3.4 mmHg), which decreased to 14.2 mmHg (SD, 3.0 mmHg) at 12 months after surgery. Patients with IOP of 21.0 mmHg or more showed mean postoperative IOP reduction ranging from -6.2 to -6.9 mmHg. Cataract surgery was more likely to yield sustained IOP reduction for patients with primary open-angle glaucoma (hazard ratio [HR], 1.19; 95% confidence interval, 1.05-1.36) or narrow angles or angle closure (HR, 1.21; 95% confidence interval, 1.08-1.34) compared with patients without glaucoma. Those with a higher baseline IOP were more likely to achieve postoperative IOP reduction (HR, 1.06 per 1-mmHg increase in baseline IOP; 95% confidence interval, 1.05-1.07).

CONCLUSIONS

Our results suggest that patients with primary open-angle glaucoma or with narrow angles or chronic angle closure were more likely to achieve sustained IOP reduction after cataract surgery. Patients with higher baseline IOP had increasingly higher odds of achieving reduction in IOP. This evidence demonstrates the potential usefulness of a pipeline for automated extraction of ophthalmic surgical outcomes from EHR to answer key clinical questions on a large scale.

Meregaglia M, Ciani O, Banks H, Salcher-Konrad M, Carney C, Jayawardana S, Williamson P, Fattore G. A scoping review of core outcome sets and their 'mapping' onto real-world data using prostate cancer as a case study. BMC Med Res Methodol 2020;20:41. [PMID: 32103725 PMCID: PMC7045588 DOI: 10.1186/s12874-020-00928-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/17/2019] [Accepted: 02/17/2020] [Indexed: 12/14/2022] Open

Abstract

Background

A Core Outcomes Set (COS) is an agreed minimum set of outcomes that should be reported in all clinical studies related to a specific condition. Using prostate cancer as a case study, we identified, summarized, and critically appraised published COS development studies and assessed the degree of overlap between them and selected real-world data (RWD) sources.

Methods

We conducted a scoping review of the Core Outcome Measures in Effectiveness Trials (COMET) Initiative database to identify all COS studies developed for prostate cancer. Several characteristics (i.e., study type, methods for consensus, type of participants, outcomes included in COS and corresponding measurement instruments, timing, and sources) were extracted from the studies; outcomes were classified according to a predefined 38-item taxonomy. The study methodology was assessed based on the recent COS-STAndards for Development (COS-STAD) recommendations. A ‘mapping’ exercise was conducted between the COS identified and RWD routinely collected in selected European countries.

Results

Eleven COS development studies published between 1995 and 2017 were retrieved, of which 8 were classified as ‘COS for clinical trials and clinical research’, 2 as ‘COS for practice’ and 1 as ‘COS patient reported outcomes’. Recommended outcomes were mainly categorized into ‘mortality and survival’ (17%), ‘outcomes related to neoplasm’ (18%), and ‘renal and urinary outcomes’ (13%) with no relevant differences among COS study types. The studies generally fulfilled the criteria for the COS-STAD ‘scope specification’ domain but not the ‘stakeholders involved’ and ‘consensus process’ domains. About 72% overlap existed between COS and linked administrative data sources, with important gaps. Linking with patient registries improved coverage (85%), but was sometimes limited to smaller follow-up patient groups.

Conclusions

This scoping review identified few COS development studies in prostate cancer, some quite dated and with a growing level of methodological quality over time. This study revealed promising overlap between COS and RWD sources, though with important limitations; linking established, national patient registries to administrative data provide the best means to additionally capture patient-reported and some clinical outcomes over time. Thus, increasing the combination of different data sources and the interoperability of systems to follow larger patient groups in RWD is required.

Hernandez-Boussard T, Blayney DW, Brooks JD. Leveraging Digital Data to Inform and Improve Quality Cancer Care. Cancer Epidemiol Biomarkers Prev 2020;29:816-822. [PMID: 32066619 DOI: 10.1158/1055-9965.epi-19-0873] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/05/2019] [Revised: 10/03/2019] [Accepted: 02/12/2020] [Indexed: 12/20/2022] Open

Abstract

BACKGROUND

Efficient capture of routine clinical care and patient outcomes is needed at a population-level, as is evidence on important treatment-related side effects and their effect on well-being and clinical outcomes. The increasing availability of electronic health records (EHR) offers new opportunities to generate population-level patient-centered evidence on oncologic care that can better guide treatment decisions and patient-valued care.

METHODS

This study includes patients seeking care at an academic medical center, 2008 to 2018. Digital data sources are combined to address missingness, inaccuracy, and noise common to EHR data. Clinical concepts were identified and extracted from EHR unstructured data using natural language processing (NLP) and machine/deep learning techniques. All models are trained, tested, and validated on independent data samples using standard metrics.

RESULTS

We provide use cases for using EHR data to assess guideline adherence and quality measurements among patients with cancer. Pretreatment assessment was evaluated by guideline adherence and quality metrics for cancer staging metrics. Our studies in perioperative quality focused on medications administered and guideline adherence. Patient outcomes included treatment-related side effects and patient-reported outcomes.

CONCLUSIONS

Advanced technologies applied to EHRs present opportunities to advance population-level quality assessment, to learn from routinely collected clinical data for personalized treatment guidelines, and to augment epidemiologic and population health studies. The effective use of digital data can inform patient-valued care, quality initiatives, and policy guidelines.

IMPACT

A comprehensive set of health data analyzed with advanced technologies results in a unique resource that facilitates wide-ranging, innovative, and impactful research on prostate cancer. This work demonstrates new ways to use the EHRs and technology to advance epidemiologic studies and benefit oncologic care. See all articles in this CEBP Focus section, "Modernizing Population Science."

Li K, Banerjee I, Magnani CJ, Blayney DW, Brooks JD, Hernandez-Boussard T. Clinical Documentation to Predict Factors Associated with Urinary Incontinence Following Prostatectomy for Prostate Cancer. Res Rep Urol 2020;12:7-14. [PMID: 32158720 PMCID: PMC6986242 DOI: 10.2147/rru.s234178] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/10/2019] [Accepted: 12/11/2019] [Indexed: 02/01/2023] Open

Lenain R, Seneviratne MG, Bozkurt S, Blayney DW, Brooks JD, Hernandez-Boussard T. Machine Learning Approaches for Extracting Stage from Pathology Reports in Prostate Cancer. Stud Health Technol Inform 2019;264:1522-1523. [PMID: 31438212 DOI: 10.3233/shti190515] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/26/2022]

Extracting Patient-Centered Outcomes from Clinical Notes in Electronic Health Records: Assessment of Urinary Incontinence After Radical Prostatectomy. EGEMS 2019;7:43. [PMID: 31497615 PMCID: PMC6706996 DOI: 10.5334/egems.297] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023]

Bozkurt S, Kan KM, Ferrari MK, Rubin DL, Blayney DW, Hernandez-Boussard T, Brooks JD. Is it possible to automatically assess pretreatment digital rectal examination documentation using natural language processing? A single-centre retrospective study. BMJ Open 2019;9:e027182. [PMID: 31324681 PMCID: PMC6661600 DOI: 10.1136/bmjopen-2018-027182] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/30/2023] Open

Magnani CJ, Li K, Seto T, McDonald KM, Blayney DW, Brooks JD, Hernandez-Boussard T. PSA Testing Use and Prostate Cancer Diagnostic Stage After the 2012 U.S. Preventive Services Task Force Guideline Changes. J Natl Compr Canc Netw 2019;17:795-803. [PMID: 31319390 PMCID: PMC7195904 DOI: 10.6004/jnccn.2018.7274] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/15/2018] [Accepted: 01/15/2019] [Indexed: 12/28/2022]

Abstract

BACKGROUND

Most patients with prostate cancer are diagnosed with low-grade, localized disease and may not require definitive treatment. In 2012, the U.S. Preventive Services Task Force (USPSTF) recommended against prostate cancer screening to address overdetection and overtreatment. This study sought to determine the effect of guideline changes on prostate-specific antigen (PSA) screening and initial diagnostic stage for prostate cancer.

PATIENTS AND METHODS

A difference-in-differences analysis was conducted to compare changes in PSA screening (exposure) relative to cholesterol testing (control) after the 2012 USPSTF guideline changes, and chi-square test was used to determine whether there was a subsequent decrease in early-stage, low-risk prostate cancer diagnoses. Data were derived from a tertiary academic medical center's electronic health records, a national commercial insurance database (OptumLabs), and the SEER database for men aged ≥35 years before (2008-2011) and after (2013-2016) the guideline changes.
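
The logic of a difference-in-differences comparison can be sketched in its simplest two-group, two-period form. The testing rates below are hypothetical placeholders, not values from the study, and the abstract does not say how the authors specified their model, so this shows only the basic estimator.

```python
def difference_in_differences(exposed_pre, exposed_post,
                              control_pre, control_post):
    """Change in the exposed group minus change in the control group."""
    exposed_change = exposed_post - exposed_pre
    control_change = control_post - control_pre
    return exposed_change - control_change


# Hypothetical share of men tested in each period (NOT study data).
psa_pre, psa_post = 0.30, 0.22                   # exposure: PSA screening
cholesterol_pre, cholesterol_post = 0.50, 0.48   # control: cholesterol testing

did = difference_in_differences(psa_pre, psa_post,
                                cholesterol_pre, cholesterol_post)
print(f"difference-in-differences estimate: {did:+.3f}")
```

A negative estimate means the exposed test declined by more than the control test over the same period, which is the pattern the comparison is designed to isolate from background trends in testing.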

RESULTS

In both the academic center and insurance databases, PSA testing significantly decreased for all men compared with the control. The greatest decrease was among men aged 55 to 74 years at the academic center and among those aged ≥75 years in the commercial database. The proportion of early-stage prostate cancer diagnoses (

CONCLUSIONS

In primary care, PSA testing decreased significantly and fewer prostate cancers were diagnosed at an early stage, suggesting provider adherence to the 2012 USPSTF guideline changes. Long-term follow-up is needed to understand the effect of decreased screening on prostate cancer survival.

Coquet J, Bozkurt S, Kan KM, Ferrari MK, Blayney DW, Brooks JD, Hernandez-Boussard T. Comparison of orthogonal NLP methods for clinical phenotyping and assessment of bone scan utilization among prostate cancer patients. J Biomed Inform 2019;94:103184. [PMID: 31014980 PMCID: PMC6584041 DOI: 10.1016/j.jbi.2019.103184] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2018] [Revised: 04/15/2019] [Accepted: 04/19/2019] [Indexed: 01/31/2023]

Banerjee I, Li K, Seneviratne M, Ferrari M, Seto T, Brooks JD, Rubin DL, Hernandez-Boussard T. Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment. JAMIA Open 2019;2:150-159. [PMID: 31032481 PMCID: PMC6482003 DOI: 10.1093/jamiaopen/ooy057] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2018] [Revised: 11/14/2018] [Accepted: 11/28/2018] [Indexed: 11/13/2022] Open

Abstract

Background

The population-based assessment of patient-centered outcomes (PCOs) has been limited by the difficulty of collecting these data efficiently and accurately. Natural language processing (NLP) pipelines can determine whether a clinical note within an electronic medical record contains evidence of these outcomes. We present and demonstrate the accuracy of an NLP pipeline that aims to assess the presence, absence, or risk discussion of two important PCOs following prostate cancer treatment: urinary incontinence (UI) and bowel dysfunction (BD).

Methods

We propose a weakly supervised NLP approach that annotates electronic medical record clinical notes without requiring manual chart review. A weighted function of neural word embeddings was used to create a sentence-level vector representation of relevant expressions extracted from the clinical notes. Sentence vectors were used as input for a multinomial logistic model, with the output being presence, absence, or risk discussion of UI/BD. The classifier was trained on automated sentence annotations that depend only on domain-specific dictionaries (weak supervision).
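The weak-supervision step can be pictured with a minimal sketch that labels sentences from dictionaries alone. The term lists, the function name, and the precedence of risk over negation are all assumptions made for illustration; the study's actual dictionaries and rules are not reproduced here, and the embedding and multinomial logistic stages are omitted.

```python
import re

# Hypothetical term lists; the study's domain-specific dictionaries are not shown here.
TARGET_TERMS = {"incontinence", "leakage"}
NEGATION_TERMS = {"no", "denies", "without"}
RISK_TERMS = {"risk", "risks", "possible", "may"}


def weak_label(sentence):
    """Return 'risk', 'absence', 'presence', or None using dictionary lookups only."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    if not tokens & TARGET_TERMS:
        return None  # the sentence does not mention the outcome
    if tokens & RISK_TERMS:
        return "risk"  # assumed precedence: risk discussion outranks negation
    if tokens & NEGATION_TERMS:
        return "absence"
    return "presence"


sentences = [
    "Patient reports urinary incontinence since surgery.",
    "He denies any leakage.",
    "We discussed the risk of incontinence after prostatectomy.",
    "Follow-up in six weeks.",
]
labels = [weak_label(s) for s in sentences]
print(labels)
```

Labels produced this way are noisy, but they can stand in for manually reviewed annotations when training a downstream classifier on sentence vectors.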

Results

The model achieved an average F1 score of 0.86 for the sentence-level, three-tier classification task (presence/absence/risk) for both UI and BD. The model also outperformed a pre-existing rule-based model for note-level annotation of UI by a significant margin.

Conclusions

We demonstrate a machine learning method to categorize clinical notes based on important PCOs that trains a classifier on sentence vector representations labeled with a domain-specific dictionary, which eliminates the need for manual engineering of linguistic rules or manual chart review for extracting the PCOs. The weakly supervised NLP pipeline showed promising sensitivity and specificity for identifying important PCOs in unstructured clinical text notes compared to rule-based algorithms.

Affiliation(s)

Imon Banerjee: Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), 1265 Welch Road, Stanford, California 94305-5479, USA
Kevin Li: Stanford University School of Medicine, 291 Campus Drive, Stanford, California 94305-5479, USA
Martin Seneviratne: Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), 1265 Welch Road, Stanford, California 94305-5479, USA; Department of Biomedical Informatics, Stanford University School of Medicine, Medical School Office Building (MSOB), 1265 Welch Road, Stanford, California 94305-5479, USA
Michelle Ferrari: Department of Urology - Divisions, Stanford University School of Medicine, 875 Blake Wilbur, Stanford, California 94305-5479, USA
Tina Seto: IRT Research Technology, Stanford University School of Medicine, Stanford, California 94305-5479, USA
James D Brooks: Department of Urology - Divisions, Stanford University School of Medicine, 875 Blake Wilbur, Stanford, California 94305-5479, USA
Daniel L Rubin: Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), 1265 Welch Road, Stanford, California 94305-5479, USA; Department of Radiology, Stanford University School of Medicine, Stanford, California 94305-5479, USA; Department of Medicine (Biomedical Informatics), Stanford University School of Medicine, Medical School Office Building (MSOB), 1265 Welch Road, Stanford, California 94305-5479, USA
Tina Hernandez-Boussard: Department of Biomedical Data Science, Stanford University School of Medicine, Medical School Office Building (MSOB), 1265 Welch Road, Stanford, California 94305-5479, USA; Department of Medicine (Biomedical Informatics), Stanford University School of Medicine, Medical School Office Building (MSOB), 1265 Welch Road, Stanford, California 94305-5479, USA; Department of Surgery, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, California 94305-2200, USA

Seneviratne MG, Bozkurt S, Patel MI, Seto T, Brooks JD, Blayney DW, Kurian AW, Hernandez-Boussard T. Distribution of global health measures from routinely collected PROMIS surveys in patients with breast cancer or prostate cancer. Cancer 2019;125:943-951. [PMID: 30512191 PMCID: PMC6403006 DOI: 10.1002/cncr.31895] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2018] [Revised: 10/17/2018] [Accepted: 10/31/2018] [Indexed: 01/07/2023]

Seneviratne MG, Kahn MG, Hernandez-Boussard T. Merging heterogeneous clinical data to enable knowledge discovery. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2019;24:439-443. [PMID: 30864344 PMCID: PMC6447393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]

Seneviratne MG, Banda JM, Brooks JD, Shah NH, Hernandez-Boussard TM. Identifying Cases of Metastatic Prostate Cancer Using Machine Learning on Electronic Health Records. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018;2018:1498-1504. [PMID: 30815195 PMCID: PMC6371284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]

Bozkurt S, Park JI, Kan KM, Ferrari M, Rubin DL, Brooks JD, Hernandez-Boussard T. An Automated Feature Engineering for Digital Rectal Examination Documentation using Natural Language Processing. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018;2018:288-294. [PMID: 30815067 PMCID: PMC6371344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]